[PATCH v1] tree-ssa-sink: Improve code sinking pass.

2023-05-18 Thread Ajit Agarwal via Gcc-patches
Hello All:

This patch improves the code sinking pass to sink statements before calls
in order to reduce register pressure.
Review comments are incorporated.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit


tree-ssa-sink: Improve code sinking pass.

Currently, code sinking sinks statements into blocks after a call, which
increases register pressure on callee-saved registers.  This patch
improves code sinking so that statements are sunk before the call, into
the use blocks or the immediate dominator of the use blocks.

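To make the intent concrete, here is a source-level sketch (my illustration,
not part of the patch) of the effect on code shaped like the new
ssa-sink-20.c test: the computation feeding the conditional store is sunk
into the use block and placed ahead of the call, so its value no longer has
to stay live across the call in a callee-saved register.

    void bar (void);
    int j;

    /* Conceptually, after the improved sinking: */
    void foo (int a, int b, int c, int d, int e, int f)
    {
      if (a != 5)
        {
          int l = a + b + c + d + e + f;  /* sunk here, before the call */
          bar ();
          j = l;
        }
    }
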
2023-05-18  Ajit Kumar Agarwal  

gcc/ChangeLog:

* tree-ssa-sink.cc (statement_sink_location): Modified to
move statements before calls.
(block_call_p): New function.
(def_use_same_block): New function.
(select_best_block): Add heuristics to select the best
blocks in the immediate post dominator.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-sink-20.c: New testcase.
* gcc.dg/tree-ssa/ssa-sink-21.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c |  16 ++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c |  20 +++
 gcc/tree-ssa-sink.cc| 159 ++--
 3 files changed, 185 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
new file mode 100644
index 000..716bc1f9257
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink -fdump-tree-optimized -fdump-tree-sink-stats" } */
+
+void bar();
+int j;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+    {
+      bar();
+      j = l;
+    }
+}
+/* { dg-final { scan-tree-dump-times "Sunk statements: 5" 1 "sink" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
new file mode 100644
index 000..ff41e2ea8ae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */ 
+/* { dg-options "-O2 -fdump-tree-sink-stats -fdump-tree-sink-stats" } */
+
+void bar();
+int j, x;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+    {
+      bar();
+      if (b != 3)
+        x = 3;
+      else
+        x = 5;
+      j = l;
+    }
+}
+/* { dg-final { scan-tree-dump-times "Sunk statements: 5" 1 "sink" } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index 87b1d40c174..76556e7795b 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -171,6 +171,72 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
   return commondom;
 }
 
+/* Return TRUE if any immediate use of a def of STMT occurs in the
+   same block as STMT, FALSE otherwise.  */
+
+bool
+def_use_same_block (gimple *stmt)
+{
+  use_operand_p use_p;
+  def_operand_p def_p;
+  imm_use_iterator imm_iter;
+  ssa_op_iter iter;
+
+  FOR_EACH_SSA_DEF_OPERAND (def_p, stmt, iter, SSA_OP_DEF)
+    {
+      FOR_EACH_IMM_USE_FAST (use_p, imm_iter, DEF_FROM_PTR (def_p))
+        {
+          if (is_gimple_debug (USE_STMT (use_p)))
+            continue;
+
+          if (use_p
+              && (gimple_bb (USE_STMT (use_p)) == gimple_bb (stmt)))
+            return true;
+        }
+    }
+  return false;
+}
+
+/* Return TRUE if the block has only calls, FALSE otherwise. */
+
+bool
+block_call_p (basic_block bb)
+{
+  int i = 0;
+  bool is_call = false;
+  gimple_stmt_iterator gsi = gsi_last_bb (bb);
+  gimple *last_stmt = gsi_stmt (gsi);
+
+  if (last_stmt && gimple_code (last_stmt) == GIMPLE_COND)
+    {
+      if (!gsi_end_p (gsi))
+        gsi_prev (&gsi);
+
+      for (; !gsi_end_p (gsi);)
+        {
+          gimple *stmt = gsi_stmt (gsi);
+
+          /* We have already seen a call.  */
+          if (is_call)
+            return false;
+
+          if (is_gimple_call (stmt))
+            is_call = true;
+          else
+            return false;
+
+          if (!gsi_end_p (gsi))
+            gsi_prev (&gsi);
+
+          ++i;
+        }
+    }
+  if (is_call && i == 1)
+    return true;
+
+  return false;
+}
+
 /* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator
tree, return the best basic block between them (inclusive) to place
statements.
@@ -190,7 +256,8 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
 static basic_block
 select_best_block (basic_block early_bb,
   basic_block late_bb,
-  gimple *stmt)
+  gimple *stmt,
+  gimple *use)
 {
   basic_block best_bb = late_bb;
   basic_block temp_bb = late_bb;
@@ -230,14 +297,47 @@ select_best_block (basic_block early_bb,
   if (threshold > 100)
threshold = 100;
 }
-
   /* If BEST_BB is at the same nesting level, t

Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-18 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:
> On Tue, 16 May 2023 at 00:29, Richard Sandiford
>  wrote:
>>
>> Prathamesh Kulkarni  writes:
>> > Hi Richard,
>> > After committing the interleave+zip1 patch for vector initialization,
>> > it seems to regress the s32 case for this patch:
>> >
>> > int32x4_t f_s32(int32_t x)
>> > {
>> >   return (int32x4_t) { x, x, x, 1 };
>> > }
>> >
>> > code-gen:
>> > f_s32:
>> > movi    v30.2s, 0x1
>> > fmov    s31, w0
>> > dup     v0.2s, v31.s[0]
>> > ins     v30.s[0], v31.s[0]
>> > zip1    v0.4s, v0.4s, v30.4s
>> > ret
>> >
>> > instead of expected code-gen:
>> > f_s32:
>> > movi    v31.2s, 0x1
>> > dup     v0.4s, w0
>> > ins     v0.s[3], v31.s[0]
>> > ret
>> >
>> > Cost for fallback sequence: 16
>> > Cost for interleave and zip sequence: 12
>> >
>> > For the above case, the cost for interleave+zip1 sequence is computed as:
>> > halves[0]:
>> > (set (reg:V2SI 96)
>> > (vec_duplicate:V2SI (reg/v:SI 93 [ x ])))
>> > cost = 8
>> >
>> > halves[1]:
>> > (set (reg:V2SI 97)
>> > (const_vector:V2SI [
>> > (const_int 1 [0x1]) repeated x2
>> > ]))
>> > (set (reg:V2SI 97)
>> > (vec_merge:V2SI (vec_duplicate:V2SI (reg/v:SI 93 [ x ]))
>> > (reg:V2SI 97)
>> > (const_int 1 [0x1])))
>> > cost = 8
>> >
>> > followed by:
>> > (set (reg:V4SI 95)
>> > (unspec:V4SI [
>> > (subreg:V4SI (reg:V2SI 96) 0)
>> > (subreg:V4SI (reg:V2SI 97) 0)
>> > ] UNSPEC_ZIP1))
>> > cost = 4
>> >
>> > So the total cost becomes
>> > max(costs[0], costs[1]) + zip1_insn_cost
>> > = max(8, 8) + 4
>> > = 12
>> >
>> > While the fallback rtl sequence is:
>> > (set (reg:V4SI 95)
>> > (vec_duplicate:V4SI (reg/v:SI 93 [ x ])))
>> > cost = 8
>> > (set (reg:SI 98)
>> > (const_int 1 [0x1]))
>> > cost = 4
>> > (set (reg:V4SI 95)
>> > (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 98))
>> > (reg:V4SI 95)
>> > (const_int 8 [0x8])))
>> > cost = 4
>> >
>> > So total cost = 8 + 4 + 4 = 16, and we choose the interleave+zip1 sequence.
>> >
>> > I think the issue is probably that for the interleave+zip1 sequence we take
>> > max(costs[0], costs[1]) to reflect that both halves are interleaved,
>> > but for the fallback seq we use seq_cost, which assumes serial execution
>> > of insns in the sequence.
>> > For above fallback sequence,
>> > set (reg:V4SI 95)
>> > (vec_duplicate:V4SI (reg/v:SI 93 [ x ])))
>> > and
>> > (set (reg:SI 98)
>> > (const_int 1 [0x1]))
>> > could be executed in parallel, which would make its cost max(8, 4) + 4 = 12.
>>
>> Agreed.
>>
>> A good-enough substitute for this might be to ignore scalar moves
>> (for both alternatives) when costing for speed.
> Thanks for the suggestions. Just wondering for aarch64, if there's an easy
> way we can check if insn is a scalar move, similar to riscv's 
> scalar_move_insn_p
> that checks if get_attr_type(insn) is TYPE_VIMOVXV or TYPE_VFMOVFV ?

It should be enough to check that the pattern is a SET:

(a) whose SET_DEST has a scalar mode and
(b) whose SET_SRC is an aarch64_mov_operand

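For illustration only, a minimal sketch of such a check (the helper name is
made up here; single_set and aarch64_mov_operand are the existing routines):

    /* Return true if INSN is a single SET of a scalar-mode destination
       from an aarch64_mov_operand source, i.e. a plain scalar move.  */
    static bool
    scalar_move_insn_p (rtx_insn *insn)
    {
      rtx set = single_set (insn);
      if (!set)
        return false;
      machine_mode mode = GET_MODE (SET_DEST (set));
      return (is_a <scalar_mode> (mode)
              && aarch64_mov_operand (SET_SRC (set), mode));
    }
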
>> > I was wondering if we should we make cost for interleave+zip1 sequence
>> > more conservative
>> > by not taking max, but summing up costs[0] + costs[1] even for speed ?
>> > For this case,
>> > that would be 8 + 8 + 4 = 20.
>> >
>> > It generates the fallback sequence for other cases (s8, s16, s64) from
>> > the test-case.
>>
>> What does it do for the tests in the interleave+zip1 patch?  If it doesn't
>> make a difference there then it sounds like we don't have enough tests. :)
> Oh right, the tests in interleave+zip1 patch only check for s16 case,
> sorry about that :/
> Looking briefly at the code generated for s8, s32 and s64 case,
> (a) s8, and s16 seem to use same sequence for all cases.
> (b) s64 seems to use fallback sequence.
> (c) For vec-init-21.c, s8 and s16 cases prefer fallback sequence
> because costs are tied,
> while s32 case prefers interleave+zip1:
>
> int32x4_t f_s32(int32_t x, int32_t y)
> {
>   return (int32x4_t) { x, y, 1, 2 };
> }
>
> Code-gen with interleave+zip1 sequence:
> f_s32:
> movi    v31.2s, 0x1
> movi    v0.2s, 0x2
> ins     v31.s[0], w0
> ins     v0.s[0], w1
> zip1    v0.4s, v31.4s, v0.4s
> ret
>
> Code-gen with fallback sequence:
> f_s32:
> adrp    x2, .LC0
> ldr     q0, [x2, #:lo12:.LC0]
> ins     v0.s[0], w0
> ins     v0.s[1], w1
> ret
>
> Fallback sequence cost = 20
> interleave+zip1 sequence cost = 12
> I assume interleave+zip1 sequence is better in this case (chosen currently) ?
>
> I will send a patch to add cases for s8, s16 and s64 in a follow up patch 
> soon.
>>
>> Summing is only conservative if the fallback sequence is somehow "safer".
>> But I don't think it is.   Building an N-element vector from N scalars
>> can be done using N instructions in the fallback case and N+1 in

[PATCH 1/4] Missed opportunity to use [SU]ABD

2023-05-18 Thread Oluwatamilore Adebayo via Gcc-patches
From: oluade01 

This adds a recognition pattern for the non-widening
absolute difference (ABD).

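For reference, an illustrative source-level form of the idiom (my example,
not taken from the patch) that the new pattern is meant to recognize:

    /* out[i] = abs (a[i] - b[i]): the non-widening absolute difference.  */
    void
    abd_example (int n, const int *restrict a, const int *restrict b,
                 int *restrict out)
    {
      for (int i = 0; i < n; i++)
        out[i] = __builtin_abs (a[i] - b[i]);
    }
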
gcc/ChangeLog:

* doc/md.texi (sabd, uabd): Document them.
* internal-fn.def (ABD): Use new optab.
* optabs.def (sabd_optab, uabd_optab): New optabs.
* tree-vect-patterns.cc (vect_recog_absolute_difference):
Recognize the following idiom abs (a - b).
(vect_recog_sad_pattern): Refactor to use
vect_recog_absolute_difference.
(vect_recog_abd_pattern): Use patterns found by
vect_recog_absolute_difference to build a new ABD
internal call.
---
 gcc/doc/md.texi   |  10 ++
 gcc/internal-fn.def   |   3 +
 gcc/optabs.def|   2 +
 gcc/tree-vect-patterns.cc | 255 +-
 4 files changed, 239 insertions(+), 31 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 
07bf8bdebffb2e523f25a41f2b57e43c0276b745..3e65584d7efcd301f2c96a40edd82d30b84462b8
 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5778,6 +5778,16 @@ Other shift and rotate instructions, analogous to the
 Vector shift and rotate instructions that take vectors as operand 2
 instead of a scalar type.
 
+@cindex @code{uabd@var{m}} instruction pattern
+@cindex @code{sabd@var{m}} instruction pattern
+@item @samp{uabd@var{m}}, @samp{sabd@var{m}}
+Signed and unsigned absolute difference instructions.  These
+instructions find the difference between operands 1 and 2
+then return the absolute value.  A C code equivalent would be:
+@smallexample
+op0 = op1 > op2 ? op1 - op2 : op2 - op1;
+@end smallexample
+
 @cindex @code{avg@var{m}3_floor} instruction pattern
 @cindex @code{uavg@var{m}3_floor} instruction pattern
 @item @samp{avg@var{m}3_floor}
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 
7fe742c2ae713e7152ab05cfdfba86e4e0aa3456..0f1724ecf37a31c231572edf90b5577e2d82f468
 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -167,6 +167,9 @@ DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
 DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
 DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)
 
+DEF_INTERNAL_SIGNED_OPTAB_FN (ABD, ECF_CONST | ECF_NOTHROW, first,
+ sabd, uabd, binary)
+
 DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_FLOOR, ECF_CONST | ECF_NOTHROW, first,
  savg_floor, uavg_floor, binary)
 DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_CEIL, ECF_CONST | ECF_NOTHROW, first,
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 
695f5911b300c9ca5737de9be809fa01aabe5e01..29bc92281a2175f898634cbe6af63c18021e5268
 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -359,6 +359,8 @@ OPTAB_D (mask_fold_left_plus_optab, 
"mask_fold_left_plus_$a")
 OPTAB_D (extract_last_optab, "extract_last_$a")
 OPTAB_D (fold_extract_last_optab, "fold_extract_last_$a")
 
+OPTAB_D (uabd_optab, "uabd$a3")
+OPTAB_D (sabd_optab, "sabd$a3")
 OPTAB_D (savg_floor_optab, "avg$a3_floor")
 OPTAB_D (uavg_floor_optab, "uavg$a3_floor")
 OPTAB_D (savg_ceil_optab, "avg$a3_ceil")
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 
a49b09539776c0056e77f99b10365d0a8747fbc5..50f1822f220c023027f4b0f777965f3757842fa2
 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -770,6 +770,93 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info 
stmt2_info, tree new_rhs,
 }
 }
 
+/* Look for the following pattern
+   X = x[i]
+   Y = y[i]
+   DIFF = X - Y
+   DAD = ABS_EXPR <DIFF>
+
+   ABS_STMT should point to a statement of code ABS_EXPR or ABSU_EXPR.
+   If REJECT_UNSIGNED is true it aborts if the type of ABS_STMT is unsigned.
+   HALF_TYPE and UNPROM will be set should the statement be found to
+   be a widened operation.
+   DIFF_OPRNDS will be set to the two inputs of the MINUS_EXPR preceding
+   ABS_STMT, otherwise it will be set to the operations found by
+   vect_widened_op_tree.
+ */
+static bool
+vect_recog_absolute_difference (vec_info *vinfo, gassign *abs_stmt,
+   tree *half_type, bool reject_unsigned,
+   vect_unpromoted_value unprom[2],
+   tree diff_oprnds[2])
+{
+  if (!abs_stmt)
+    return false;
+
+  /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
+ inside the loop (in case we are analyzing an outer-loop).  */
+  enum tree_code code = gimple_assign_rhs_code (abs_stmt);
+  if (code != ABS_EXPR && code != ABSU_EXPR)
+    return false;
+
+  tree abs_oprnd = gimple_assign_rhs1 (abs_stmt);
+  if (!abs_oprnd)
+    return false;
+  tree abs_type = TREE_TYPE (abs_oprnd);
+  if (reject_unsigned && TYPE_UNSIGNED (abs_type))
+    return false;
+  if (!ANY_INTEGRAL_TYPE_P (abs_type) || TYPE_OVERFLOW_WRAPS (abs_type))
+    return false;
+
+  /* Peel off conversions from the ABS input.  This can involve sign
+ changes (e.g.  from an unsigned subtraction to a signed ABS input)
+ or signed promotion, but it can't 

Re: [PATCH] Machine_Mode: Extend machine_mode from 8 to 16 bits

2023-05-18 Thread Richard Sandiford via Gcc-patches
pan2...@intel.com writes:
> diff --git a/gcc/rtl-ssa/accesses.h b/gcc/rtl-ssa/accesses.h
> index c5180b9308a..c2103a5cb5c 100644
> --- a/gcc/rtl-ssa/accesses.h
> +++ b/gcc/rtl-ssa/accesses.h
> @@ -215,7 +215,11 @@ private:
>  
>// The values returned by the accessors above.
>unsigned int m_regno;
> -  access_kind m_kind : 8;
> +
> +  // The value returned by the accessor above.
> +  machine_mode m_mode : MACHINE_MODE_BITSIZE;
> +
> +  access_kind m_kind : 2;

There's no need to repeat the comment.  Just:

  // The values returned by the accessors above.
  unsigned int m_regno;
  machine_mode m_mode : MACHINE_MODE_BITSIZE;
  access_kind m_kind : 2;

would be enough.

OK with that change, thanks.  (There's no need to post the updated patch.)

Richard


RE: [PATCH] Machine_Mode: Extend machine_mode from 8 to 16 bits

2023-05-18 Thread Li, Pan2 via Gcc-patches
Committed with that change, thanks Richard Sandiford.

Pan

-Original Message-
From: Richard Sandiford  
Sent: Thursday, May 18, 2023 4:57 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, 
Yanzhang ; jeffreya...@gmail.com; rguent...@suse.de
Subject: Re: [PATCH] Machine_Mode: Extend machine_mode from 8 to 16 bits

pan2...@intel.com writes:
> diff --git a/gcc/rtl-ssa/accesses.h b/gcc/rtl-ssa/accesses.h index 
> c5180b9308a..c2103a5cb5c 100644
> --- a/gcc/rtl-ssa/accesses.h
> +++ b/gcc/rtl-ssa/accesses.h
> @@ -215,7 +215,11 @@ private:
>  
>// The values returned by the accessors above.
>unsigned int m_regno;
> -  access_kind m_kind : 8;
> +
> +  // The value returned by the accessor above.
> +  machine_mode m_mode : MACHINE_MODE_BITSIZE;
> +
> +  access_kind m_kind : 2;

There's no need to repeat the comment.  Just:

  // The values returned by the accessors above.
  unsigned int m_regno;
  machine_mode m_mode : MACHINE_MODE_BITSIZE;
  access_kind m_kind : 2;

would be enough.

OK with that change, thanks.  (There's no need to post the updated patch.)

Richard


[PATCH] Fix internal error on small array with negative lower bound

2023-05-18 Thread Eric Botcazou via Gcc-patches
Hi,

Ada supports arrays with negative indices, although the internal index type is
sizetype like in other languages, which is unsigned.  This means that negative
values are represented by very large numbers, which works with a bit of care.
The attached test exposes a small loophole in output_constructor_bitfield.

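As a plain-C model of the sizetype arithmetic (illustration only, not
compiler code): with a 64-bit sizetype the Ada bound -2 is represented as
the huge unsigned value 0xfffffffffffffffe, which does not fit in a signed
HOST_WIDE_INT and so trips tree_to_shwi's internal check, while unsigned
(modular) arithmetic via tree_to_uhwi still yields the correct small
relative index.

    #include <stdint.h>
    #include <stdio.h>

    int
    main (void)
    {
      uint64_t min_index = (uint64_t) -2;   /* lower bound -2 as sizetype */
      uint64_t index = 1;                   /* element at index 1 */
      /* Modular subtraction still gives the small relative index 3.  */
      printf ("%llu\n", (unsigned long long) (index - min_index));
      return 0;
    }
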
Tested on x86-64/Linux, OK for the mainline?


2023-05-18  Eric Botcazou 

* varasm.cc (output_constructor_bitfield): Call tree_to_uhwi instead
of tree_to_shwi on array indices.  Minor tweaks.


2023-05-18  Eric Botcazou 

* gnat.dg/specs/array6.ads: New test.

-- 
Eric Botcazou

diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 2256194d934..478cbfe6736 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -5585,19 +5585,18 @@ output_constructor_bitfield (oc_local_state *local, unsigned int bit_offset)
 
   /* Relative index of this element if this is an array component.  */
   HOST_WIDE_INT relative_index
-= (!local->field
-   ? (local->index
-	  ? (tree_to_shwi (local->index)
-	 - tree_to_shwi (local->min_index))
-	  : local->last_relative_index + 1)
-   : 0);
+= (local->field
+   ? 0
+   : (local->index
+	  ? tree_to_uhwi (local->index) - tree_to_uhwi (local->min_index)
+	  : local->last_relative_index + 1));
 
   /* Bit position of this element from the start of the containing
  constructor.  */
   HOST_WIDE_INT constructor_relative_ebitpos
-  = (local->field
-	 ? int_bit_position (local->field)
-	 : ebitsize * relative_index);
+= (local->field
+   ? int_bit_position (local->field)
+   : ebitsize * relative_index);
 
   /* Bit position of this element from the start of a possibly ongoing
  outer byte buffer.  */
-- { dg-do compile }

package Array6 is 

  type Range_Type is range -10 ..  10;
  type Array_Type is array (Range_Type range <> ) of Short_Short_Integer;

  type Record_Type is record 
A : Array_Type(-2..4);
  end record ;

  Rec : Record_Type := (A => (others => -1));

end Array6;


[committed gcc12 backport] arm: Fix vstrwq* backend + testsuite

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
From: Andrea Corallo 

Hi all,

this patch fixes the vstrwq* MVE intrinsics failing to emit the
correct sequence of instructions due to a missing predicate.  Also, the
immediate range is fixed to multiples of 2 in the range [-252, 252].

Best Regards

  Andrea

gcc/ChangeLog:

* config/arm/constraints.md (mve_vldrd_immediate): Move it to
predicates.md.
(Ri): Move constraint definition from predicates.md.
(Rl): Define new constraint.
* config/arm/mve.md (mve_vstrwq_scatter_base_wb_p_v4si): Add
missing constraint.
(mve_vstrwq_scatter_base_wb_p_fv4sf): Add missing Up constraint
for op 1, use mve_vstrw_immediate predicate and Rl constraint for
op 2. Fix asm output spacing.
(mve_vstrdq_scatter_base_wb_p_v2di): Add missing constraint.
* config/arm/predicates.md (Ri): Move constraint to constraints.md.
(mve_vldrd_immediate): Move it from
constraints.md.
(mve_vstrw_immediate): New predicate.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vstrwq_f32.c: Use
check-function-bodies instead of scan-assembler checks.  Use
extern "C" for C++ testing.
* gcc.target/arm/mve/intrinsics/vstrwq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_f32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_s32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_u32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_f32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_f32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_s32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_u32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_s32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_u32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_u32.c: Likewise.
---
 gcc/config/arm/constraints.md | 20 --
 gcc/config/arm/mve.md | 10 ++---
 gcc/config/arm/predicates.md  | 14 +++
 .../arm/mve/intrinsics/vstrwq_f32.c   | 32 ---
 .../arm/mve/intrinsics/vstrwq_p_f32.c | 40 ---
 .../arm/mve/intrinsics/vstrwq_p_s32.c | 40 ---
 .../arm/mve/intrinsics/vstrwq_p_u32.c | 40 ---
 .../arm/mve/intrinsics/vstrwq_s32.c   | 32 ---
 .../mve/intrinsics/vstrwq_scatter_base_f32.c  | 28 +++--
 .../intrinsics/vstrwq_scatter_base_p_f32.c| 36 +++--
 .../intrinsics/vstrwq_scatter_base_p_s32.c| 36 +++--
 .../intrinsics/vstrwq_scatter_base_p_u32.c| 36 +++--
 .../mve/intrinsics/vstrwq_scatter_base_s32.c  | 28 +++--
 .../mve/intrinsics/vstrwq_scatter_base_u32.c  | 28 +++--
 .../intrinsics/vstrwq_scatter_base_wb_f32.c   | 32 ---
 .../intrinsics/vstrwq_scatter_base_wb_p_f32.c | 40 ---
 .../intrinsics/vstrwq_scatter_base_wb_p_s32.c | 40 ---
 .../intrinsics/vstrwq_scatter_base_wb_p_u32.c | 40 ---
 .../intrinsics/vstrwq_scatter_base_wb_s32.c   | 32 ---
 .../intrinsics/vstrwq_scatter_base_wb_u32.c   | 32 ---
 .../intrinsics/vstrwq_scatter_offset_f32.c| 32 ---
 .../intrinsics/vstrwq_scatter_offset_p_f32.c  | 40 ---
 

[committed gcc12 backport] arm: Add vorrq_n overloading into vorrq _Generic

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
We found this as part of the wider testsuite updates.

The applicable tests are authored by Andrea earlier in this patch series

Ok for trunk?

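For context, a hypothetical usage sketch (my example, not from the patch)
of what the added _n dispatch enables: a scalar second argument to the
polymorphic vorrq now resolves to the corresponding vorrq_n intrinsic
(built with MVE enabled, e.g. -march=armv8.1-m.main+mve):

    #include <arm_mve.h>

    uint16x8_t
    set_low_byte_bits (uint16x8_t v)
    {
      /* With this overload in place, the integer literal selects
         __arm_vorrq_n_u16.  */
      return vorrq (v, 0x00ff);
    }
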
gcc/ChangeLog:

* config/arm/arm_mve.h (__arm_vorrq): Add _n variant.
---
 gcc/config/arm/arm_mve.h | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 6bf1794d2ff..39b3446617d 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -35852,6 +35852,10 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: 
__arm_vorrq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, 
uint8x16_t)), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: 
__arm_vorrq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, 
uint16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: 
__arm_vorrq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, 
uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vorrq_n_u16 
(__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vorrq_n_u32 
(__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vorrq_n_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vorrq_n_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: 
__arm_vorrq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, 
float16x8_t)), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: 
__arm_vorrq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, 
float32x4_t)));})
 
@@ -38637,7 +38641,11 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vorrq_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: 
__arm_vorrq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, 
uint8x16_t)), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: 
__arm_vorrq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, 
uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: 
__arm_vorrq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, 
uint32x4_t)));})
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: 
__arm_vorrq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, 
uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vorrq_n_u16 
(__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vorrq_n_u32 
(__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vorrq_n_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vorrq_n_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)));})
 
 #define __arm_vornq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
-- 
2.25.1



[committed gcc12 backport] [arm] complete vmsr/vmrs blank and case adjustments

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
From: Alexandre Oliva 

Back in September last year, some of the vmsr and vmrs patterns had an
extraneous blank removed, and the case of register names lowered, but
another instance remained, and so did a testcase.

for  gcc/ChangeLog

* config/arm/vfp.md (*thumb2_movsi_vfp): Drop blank after tab
after vmsr and vmrs, and lower the case of P0.

for  gcc/testsuite/ChangeLog

* gcc.target/arm/acle/cde-mve-full-assembly.c: Drop blank
after tab after vmsr, and lower the case of P0.
---
 gcc/config/arm/vfp.md |   4 +-
 .../arm/acle/cde-mve-full-assembly.c  | 264 +-
 2 files changed, 134 insertions(+), 134 deletions(-)

diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 932e4b7447e..7a430ef8d36 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -312,9 +312,9 @@ (define_insn "*thumb2_movsi_vfp"
 case 12: case 13:
   return output_move_vfp (operands);
 case 14:
-  return \"vmsr\\t P0, %1\";
+  return \"vmsr\\tp0, %1\";
 case 15:
-  return \"vmrs\\t %0, P0\";
+  return \"vmrs\\t%0, p0\";
 case 16:
   return \"mcr\\tp10, 7, %1, cr1, cr0, 0\\t @SET_FPSCR\";
 case 17:
diff --git a/gcc/testsuite/gcc.target/arm/acle/cde-mve-full-assembly.c 
b/gcc/testsuite/gcc.target/arm/acle/cde-mve-full-assembly.c
index 501cc84da10..e3e7f7ef3e5 100644
--- a/gcc/testsuite/gcc.target/arm/acle/cde-mve-full-assembly.c
+++ b/gcc/testsuite/gcc.target/arm/acle/cde-mve-full-assembly.c
@@ -567,80 +567,80 @@
contain back references).  */
 /*
 ** test_cde_vcx1q_mfloat16x8_tintint:
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
 ** vpst
 ** vcx1t   p0, q0, #32
 ** bx  lr
 */
 /*
 ** test_cde_vcx1q_mfloat32x4_tintint:
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
 ** vpst
 ** vcx1t   p0, q0, #32
 ** bx  lr
 */
 /*
 ** test_cde_vcx1q_muint8x16_tintint:
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
 ** vpst
 ** vcx1t   p0, q0, #32
 ** bx  lr
 */
 /*
 ** test_cde_vcx1q_muint16x8_tintint:
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
 ** vpst
 ** vcx1t   p0, q0, #32
 ** bx  lr
 */
 /*
 ** test_cde_vcx1q_muint32x4_tintint:
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
 ** vpst
 ** vcx1t   p0, q0, #32
 ** bx  lr
 */
 /*
 ** test_cde_vcx1q_muint64x2_tintint:
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
 ** vpst
 ** vcx1t   p0, q0, #32
 ** bx  lr
 */
 /*
 ** test_cde_vcx1q_mint8x16_tintint:
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movh

[committed gcc12 backport] arm: Stop vadcq, vsbcq intrinsics from overwriting the FPSCR NZ flags

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
Hi all,

We noticed that calls to the vadcq and vsbcq intrinsics, both of
which use __builtin_arm_set_fpscr_nzcvqc to set the Carry flag in
the FPSCR, would produce the following code:

```
< r2 is the *carry input >
vmrs    r3, FPSCR_nzcvqc
bic     r3, r3, #536870912
orr     r3, r3, r2, lsl #29
vmsr    FPSCR_nzcvqc, r3
```

when the MVE ACLE instead gives a different instruction sequence of:
```
< Rt is the *carry input >
VMRS Rs,FPSCR_nzcvqc
BFI Rs,Rt,#29,#1
VMSR FPSCR_nzcvqc,Rs
```

the bic + orr pair is slower and it's also wrong, because, if the
*carry input is greater than 1, then we risk overwriting the top two
bits of the FPSCR register (the N and Z flags).

This turned out to be a problem in the header file and the solution was
to simply add a `& 0x1u` to the `*carry` input: then the compiler knows
that we only care about the lowest bit and can optimise to a BFI.

Ok for trunk?
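As a plain-C illustration of the masked update (not the header code itself,
just the shape of the expression the compiler now sees):

    /* Clear bit 29 of the FPSCR value, then insert the low bit of CARRY
       there.  Because only bit 29 can change, the compiler can narrow
       this to vmrs + bfi + vmsr.  */
    unsigned int
    set_fpscr_carry (unsigned int fpscr_nzcvqc, unsigned int carry)
    {
      return (fpscr_nzcvqc & ~0x20000000u) | ((carry & 0x1u) << 29);
    }
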

Thanks,
Stam Markianos-Wright

gcc/ChangeLog:

* config/arm/arm_mve.h (__arm_vadcq_s32): Fix arithmetic.
(__arm_vadcq_u32): Likewise.
(__arm_vadcq_m_s32): Likewise.
(__arm_vadcq_m_u32): Likewise.
(__arm_vsbcq_s32): Likewise.
(__arm_vsbcq_u32): Likewise.
(__arm_vsbcq_m_s32): Likewise.
(__arm_vsbcq_m_u32): Likewise.
* config/arm/mve.md (get_fpscr_nzcvqc): Make unspec_volatile.

gcc/testsuite/ChangeLog:
* gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c: New.

(cherry picked from commit f1417d051be094ffbce228e11951f3e12e8fca1c)
---
 gcc/config/arm/arm_mve.h  | 16 ++---
 gcc/config/arm/mve.md |  2 +-
 .../arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c | 67 +++
 3 files changed, 76 insertions(+), 9 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 82ceec2bbfc..6bf1794d2ff 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -16055,7 +16055,7 @@ __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vadcq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | ((*__carry & 0x1u) << 29));
   int32x4_t __res = __builtin_mve_vadcq_sv4si (__a, __b);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -16065,7 +16065,7 @@ __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vadcq_u32 (uint32x4_t __a, uint32x4_t __b, unsigned * __carry)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | ((*__carry & 0x1u) << 29));
   uint32x4_t __res = __builtin_mve_vadcq_uv4si (__a, __b);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -16075,7 +16075,7 @@ __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vadcq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, 
unsigned * __carry, mve_pred16_t __p)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | ((*__carry & 0x1u) << 29));
   int32x4_t __res = __builtin_mve_vadcq_m_sv4si (__inactive, __a, __b, __p);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -16085,7 +16085,7 @@ __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vadcq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, 
unsigned * __carry, mve_pred16_t __p)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | ((*__carry & 0x1u) << 29));
   uint32x4_t __res =  __builtin_mve_vadcq_m_uv4si (__inactive, __a, __b, __p);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -16131,7 +16131,7 @@ __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vsbcq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | ((*__carry & 0x1u) << 29));
   int32x4_t __res = __builtin_mve_vsbcq_sv4si (__a, __b);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -16141,7 +16141,7

[committed gcc12 backport] arm testsuite: XFAIL or relax registers in some tests [PR109697]

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
Hi all,

This is a simple testsuite tidy-up patch, addressing two types of errors:

* The vcmp vector-scalar tests fail due to the compiler's preference
for vector-vector comparisons over vector-scalar comparisons.  This is
due to the lack of a cost model for MVE and the compiler not knowing that
the RTL vec_duplicate is free in those instructions.  For now, we simply
XFAIL these checks.
* The tests for pr108177 had strict usage of the q0 and r0 registers,
meaning that they would FAIL with -mfloat-abi=softfp.  The register checks
have now been relaxed.  A couple of these run-tests also had inconsistent
use of integer MVE with floating-point vectors, so I've now changed
these to use FP MVE.

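For context, a minimal sketch (my example, not from the testsuite) of the
kind of vector-scalar compare involved; without an MVE cost model the
compiler may duplicate the scalar and match a vector-vector VCMP instead
of the vector-scalar form, which is what the XFAILed checks look for:

    #include <arm_mve.h>

    mve_pred16_t
    cmp_cs_scalar (uint16x8_t v, uint16_t x)
    {
      /* Polymorphic form of vcmpcsq_n_u16.  */
      return vcmpcsq (v, x);
    }
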
gcc/testsuite/ChangeLog:
PR target/109697
* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_u8.c: XFAIL check.
* gcc.target/arm/mve/pr108177-1.c: Relax registers.
* gcc.target/arm/mve/pr108177-10.c: Relax registers.
* gcc.target/arm/mve/pr108177-11.c: Relax registers.
* gcc.target/arm/mve/pr108177-12.c: Relax registers.
* gcc.target/arm/mve/pr108177-13.c: Relax registers.
* gcc.target/arm/mve/pr108177-13-run.c: use mve_fp
* gcc.target/arm/mve/pr108177-14.c: Relax registers.
* gcc.target/arm/mve/pr108177-14-run.c: use mve_fp
* gcc.target/arm/mve/pr108177-2.c: Relax registers.
* gcc.target/arm/mve/pr108177-3.c: Relax registers.
* gcc.target/arm/mve/pr108177-4.c: Relax registers.
* gcc.target/arm/mve/pr108177-5.c: Relax registers.
* gcc.target/arm/mve/pr108177-6.c: Relax registers.
* gcc.target/arm/mve/pr108177-7.c: Relax registers.
* gcc.target/arm/mve/pr108177-8.c: Relax registers.
* gcc.target/arm/mve/pr108177-9.c: Relax registers.
---
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c  | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c  | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c  | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq

[committed gcc12 backport] arm testsuite: Remove redundant tests

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
Following Andrea's overhaul of the MVE testsuite, these tests are now
redundant, as equivalent checks have been added to each intrinsic's
.c test.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/mve_fp_vaddq_n.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vaddq_m.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vaddq_n.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_m_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_m_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_m_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_x_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_x_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_x_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vdwdupq_x_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vdwdupq_x_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vdwdupq_x_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_m_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_m_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_m_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_x_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_x_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_x_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_viwdupq_x_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_viwdupq_x_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_viwdupq_x_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_s64.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_u64.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_z_s64.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_z_u64.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_s64.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_u64.c: 
Removed.
* 
gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_z_s64.c: Removed.
* 
gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_z_u64.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_f16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_s16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_s32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_f16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_s16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_s32.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_u16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_u32.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_f16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_s16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_s32.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_u16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_u32.c: 
Removed.
* 
gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_f16.c: Removed.
* 
gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_s16.c: Removed.
* 
gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_s32.c: Removed.
* 
gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_u16.c: Removed.
* 
gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_f32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_s32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_z_f32.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_z_s32.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_z_u32.c: 
Removed.
* gc

[committed gcc12 backport] arm testsuite: Shifts and get_FPSCR ACLE optimisation fixes

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
These newly updated tests were rewritten by Andrea. Some of them
needed further manual fixing as follows:

* The #shift immediate value was not in the check-function-bodies as expected.
* The ACLE was specifying sub-optimal code: lsr+and instead of ubfx.  In
  this case the test rewritten from the ACLE had the lsr+and pattern,
  but the compiler was able to optimise it to ubfx, so I've changed the
  test to match on ubfx (see the sketch after this list).
* Added a separate test to check that shifts on constants are optimised
  to movs.

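For reference, a plain-C sketch (my illustration, not a file from the
testsuite) of the carry extraction in question:

    /* (fpscr >> 29) & 1 is the lsr+and form shown in the ACLE; the
       compiler can emit it as a single ubfx, which the tests now match.  */
    unsigned int
    get_fpscr_carry (unsigned int fpscr_nzcvqc)
    {
      return (fpscr_nzcvqc >> 29) & 0x1u;
    }
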
gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/srshr.c: Update shift value.
* gcc.target/arm/mve/intrinsics/srshrl.c: Update shift value.
* gcc.target/arm/mve/intrinsics/uqshl.c: Update shift value.
* gcc.target/arm/mve/intrinsics/uqshll.c: Update shift value.
* gcc.target/arm/mve/intrinsics/urshr.c: Update shift value.
* gcc.target/arm/mve/intrinsics/urshrl.c: Update shift value.
* gcc.target/arm/mve/intrinsics/vadciq_m_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadciq_m_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadciq_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadciq_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadcq_m_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadcq_m_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadcq_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadcq_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbciq_m_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbciq_m_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbciq_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbciq_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbcq_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbcq_u32.c: Update to ubfx.
* gcc.target/arm/mve/mve_const_shifts.c: New test.
---
 .../gcc.target/arm/mve/intrinsics/srshr.c |  2 +-
 .../gcc.target/arm/mve/intrinsics/srshrl.c|  2 +-
 .../gcc.target/arm/mve/intrinsics/uqshl.c | 14 +--
 .../gcc.target/arm/mve/intrinsics/uqshll.c| 14 +--
 .../gcc.target/arm/mve/intrinsics/urshr.c |  4 +-
 .../gcc.target/arm/mve/intrinsics/urshrl.c|  4 +-
 .../arm/mve/intrinsics/vadciq_m_s32.c |  8 +---
 .../arm/mve/intrinsics/vadciq_m_u32.c |  8 +---
 .../arm/mve/intrinsics/vadciq_s32.c   |  8 +---
 .../arm/mve/intrinsics/vadciq_u32.c   |  8 +---
 .../arm/mve/intrinsics/vadcq_m_s32.c  |  8 +---
 .../arm/mve/intrinsics/vadcq_m_u32.c  |  8 +---
 .../gcc.target/arm/mve/intrinsics/vadcq_s32.c |  8 +---
 .../gcc.target/arm/mve/intrinsics/vadcq_u32.c |  8 +---
 .../arm/mve/intrinsics/vsbciq_m_s32.c |  8 +---
 .../arm/mve/intrinsics/vsbciq_m_u32.c |  8 +---
 .../arm/mve/intrinsics/vsbciq_s32.c   |  8 +---
 .../arm/mve/intrinsics/vsbciq_u32.c   |  8 +---
 .../arm/mve/intrinsics/vsbcq_m_s32.c  |  8 +---
 .../arm/mve/intrinsics/vsbcq_m_u32.c  |  8 +---
 .../gcc.target/arm/mve/intrinsics/vsbcq_s32.c |  8 +---
 .../gcc.target/arm/mve/intrinsics/vsbcq_u32.c |  8 +---
 .../gcc.target/arm/mve/mve_const_shifts.c | 41 +++
 23 files changed, 81 insertions(+), 128 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/mve_const_shifts.c

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
index 94e3f42fd33..734375d58c0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
@@ -12,7 +12,7 @@ extern "C" {
 /*
 **foo:
 ** ...
-** srshr   (?:ip|fp|r[0-9]+), #shift(?:@.*|)
+** srshr   (?:ip|fp|r[0-9]+), #1(?:@.*|)
 ** ...
 */
 int32_t
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c
index 65f28ccbfde..a91943c38a0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c
@@ -12,7 +12,7 @@ extern "C" {
 /*
 **foo:
 ** ...
-** srshrl  (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #shift(?: @.*|)
+** srshrl  (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #1(?: @.*|)
 ** ...
 */
 int64_t
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c
index b23c9d97ba6..462531cad54 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c
@@ -12,7 +12,7 @@ extern "C" {
 /*
 **foo:
 ** ...
-** uqshl   (?:ip|fp|r[0-9]+), #shift(?:@.*|)
+** uqshl   (?:ip|fp|r[0-9]+), #1(?:@.*|)
 ** ...
 

[committed gcc12 backport] arm: Fix overloading of MVE scalar constant parameters on vbicq, vmvnq_m

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
We found this as part of the wider testsuite updates.

The applicable tests are authored by Andrea earlier in this patch series

Ok for trunk?

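For context, a hypothetical usage sketch (my example, not from the patch)
of the overloaded form with a constant scalar operand, which is the case
the changed coercion handles; a literal second argument to the polymorphic
vbicq dispatches to the corresponding vbicq_n intrinsic (built with MVE
enabled, e.g. -march=armv8.1-m.main+mve):

    #include <arm_mve.h>

    uint16x8_t
    clear_low_byte (uint16x8_t v)
    {
      /* Resolves to __arm_vbicq_n_u16 via the _Generic dispatch.  */
      return vbicq (v, 0x00ff);
    }
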
gcc/ChangeLog:

* config/arm/arm_mve.h (__arm_vbicq): Change coerce on
scalar constant.
(__arm_vmvnq_m): Likewise.
---
 gcc/config/arm/arm_mve.h | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 39b3446617d..0b35bd0eedd 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -35906,10 +35906,10 @@ extern void *__ARM_undef;
 #define __arm_vbicq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 
(__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 
(__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce1 (__p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 
(__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 
(__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3 (p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vbicq_s8 
(__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vbicq_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vbicq_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
@@ -38825,10 +38825,10 @@ extern void *__ARM_undef;
 #define __arm_vbicq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 
(__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 
(__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce1 (__p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 
(__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 
(__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3 (p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vbicq_s8 
(__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vbicq_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vbicq_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
@@ -40962,10 +40962,10 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: 
__arm_vmvnq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, 
uint8x16_t), p2), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: 
__arm_vmvnq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, 
uint16x8_t), p2), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: 
__arm_vmvnq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, 
uint32x4_t), p2), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmvnq_m_n_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce1(__p1, int

[committed trunk 5/9] arm: Fix overloading of MVE scalar constant parameters on vbicq

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
We found this as part of the wider testsuite updates.

The applicable tests are authored by Andrea earlier in this patch series

Ok for trunk?

gcc/ChangeLog:

* config/arm/arm_mve.h (__arm_vbicq): Change coerce on
scalar constant.
---
 gcc/config/arm/arm_mve.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 4ad1c99c288..30cec519791 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -10847,10 +10847,10 @@ extern void *__ARM_undef;
 #define __arm_vbicq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 
(__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 
(__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce1 (__p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 
(__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 
(__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3 (p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vbicq_s8 
(__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vbicq_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vbicq_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
@@ -11699,10 +11699,10 @@ extern void *__ARM_undef;
 #define __arm_vbicq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 
(__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 
(__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce1 (__p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 
(__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 
(__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3 (p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vbicq_s8 
(__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vbicq_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vbicq_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-- 
2.25.1



[commited trunk 4/9] arm: Stop vadcq, vsbcq intrinsics from overwriting the FPSCR NZ flags

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
Hi all,

We noticed that calls to the vadcq and vsbcq intrinsics, both of
which use __builtin_arm_set_fpscr_nzcvqc to set the Carry flag in
the FPSCR, would produce the following code:

```
< r2 is the *carry input >
vmrs    r3, FPSCR_nzcvqc
bic     r3, r3, #536870912
orr     r3, r3, r2, lsl #29
vmsr    FPSCR_nzcvqc, r3
```

when the MVE ACLE instead gives a different instruction sequence of:
```
< Rt is the *carry input >
VMRS Rs,FPSCR_nzcvqc
BFI Rs,Rt,#29,#1
VMSR FPSCR_nzcvqc,Rs
```

The bic + orr pair is slower, and it is also wrong: if the *carry input
is greater than 1, we risk overwriting the top two bits of the FPSCR
register (the N and Z flags).

This turned out to be a problem in the header file, and the solution was
simply to add a `& 0x1u` to the `*carry` input: then the compiler knows
that we only care about the lowest bit and can optimise to a BFI.
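
For reference, here is a minimal standalone sketch (not taken from the patch)
of the pattern; the 0x20000000u mask for bit 29 is an assumption based on the
shift by 29 used in the header:

```c
/* Hedged sketch, not from the patch: masking the carry input down to its
   lowest bit lets the compiler see that only FPSCR bit 29 can change, so
   the bit insertion can be done with a single BFI instead of BIC + ORR.  */
static inline unsigned
set_fpscr_carry (unsigned fpscr, unsigned carry)
{
  return (fpscr & ~0x20000000u) | ((carry & 0x1u) << 29);
}
```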

Ok for trunk?

Thanks,
Stam Markianos-Wright

gcc/ChangeLog:

* config/arm/arm_mve.h (__arm_vadcq_s32): Fix arithmetic.
(__arm_vadcq_u32): Likewise.
(__arm_vadcq_m_s32): Likewise.
(__arm_vadcq_m_u32): Likewise.
(__arm_vsbcq_s32): Likewise.
(__arm_vsbcq_u32): Likewise.
(__arm_vsbcq_m_s32): Likewise.
(__arm_vsbcq_m_u32): Likewise.
* config/arm/mve.md (get_fpscr_nzcvqc): Make unspec_volatile.

gcc/testsuite/ChangeLog:
* gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c: New.
---
 gcc/config/arm/arm_mve.h  | 16 ++---
 gcc/config/arm/mve.md |  2 +-
 .../arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c | 67 +++
 3 files changed, 76 insertions(+), 9 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 1774e6eca2b..4ad1c99c288 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -4098,7 +4098,7 @@ __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vadcq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | ((*__carry & 0x1u) << 29));
   int32x4_t __res = __builtin_mve_vadcq_sv4si (__a, __b);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -4108,7 +4108,7 @@ __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vadcq_u32 (uint32x4_t __a, uint32x4_t __b, unsigned * __carry)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | ((*__carry & 0x1u) << 29));
   uint32x4_t __res = __builtin_mve_vadcq_uv4si (__a, __b);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -4118,7 +4118,7 @@ __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vadcq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, 
unsigned * __carry, mve_pred16_t __p)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | ((*__carry & 0x1u) << 29));
   int32x4_t __res = __builtin_mve_vadcq_m_sv4si (__inactive, __a, __b, __p);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -4128,7 +4128,7 @@ __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vadcq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, 
unsigned * __carry, mve_pred16_t __p)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | ((*__carry & 0x1u) << 29));
   uint32x4_t __res =  __builtin_mve_vadcq_m_uv4si (__inactive, __a, __b, __p);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -4174,7 +4174,7 @@ __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vsbcq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | ((*__carry & 0x1u) << 29));
   int32x4_t __res = __builtin_mve_vsbcq_sv4si (__a, __b);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -4184,7 +4184,7 @@ __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, _

[commited trunk 8/9] arm testsuite: XFAIL or relax registers in some tests [PR109697]

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
Hi all,

This is a simple testsuite tidy-up patch, addressing two types of errors:

* The vcmp vector-scalar tests fail due to the compiler's preference for
vector-vector comparisons over vector-scalar comparisons. This is
due to the lack of a cost model for MVE and the compiler not knowing that
the RTL vec_duplicate is free in those instructions. For now, we simply
XFAIL these checks.
* The tests for pr108177 used the q0 and r0 registers too strictly,
meaning that they would FAIL with -mfloat-abi=softfp. The register checks
have now been relaxed. A couple of these run-tests also had inconsistent
use of integer MVE with floating-point vectors, so I've now changed these
to use FP MVE.

gcc/testsuite/ChangeLog:
PR target/109697
* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_u8.c: XFAIL check.
* gcc.target/arm/mve/pr108177-1.c: Relax registers.
* gcc.target/arm/mve/pr108177-10.c: Relax registers.
* gcc.target/arm/mve/pr108177-11.c: Relax registers.
* gcc.target/arm/mve/pr108177-12.c: Relax registers.
* gcc.target/arm/mve/pr108177-13.c: Relax registers.
* gcc.target/arm/mve/pr108177-13-run.c: use mve_fp
* gcc.target/arm/mve/pr108177-14.c: Relax registers.
* gcc.target/arm/mve/pr108177-14-run.c: use mve_fp
* gcc.target/arm/mve/pr108177-2.c: Relax registers.
* gcc.target/arm/mve/pr108177-3.c: Relax registers.
* gcc.target/arm/mve/pr108177-4.c: Relax registers.
* gcc.target/arm/mve/pr108177-5.c: Relax registers.
* gcc.target/arm/mve/pr108177-6.c: Relax registers.
* gcc.target/arm/mve/pr108177-7.c: Relax registers.
* gcc.target/arm/mve/pr108177-8.c: Relax registers.
* gcc.target/arm/mve/pr108177-9.c: Relax registers.
---
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c  | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c  | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c  | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq

[commited trunk 2/9] arm: Fix vstrwq* backend + testsuite

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
From: Andrea Corallo 

Hi all,

this patch fixes the vstrwq* MVE intrinsics failing to emit the
correct sequence of instructions due to a missing predicate. Also, the
immediate range is fixed to be multiples of 2 in the range [-252, 252].

Best Regards

  Andrea

gcc/ChangeLog:

* config/arm/constraints.md (mve_vldrd_immediate): Move it to
predicates.md.
(Ri): Move constraint definition from predicates.md.
(Rl): Define new constraint.
* config/arm/mve.md (mve_vstrwq_scatter_base_wb_p_v4si): Add
missing constraint.
(mve_vstrwq_scatter_base_wb_p_fv4sf): Add missing Up constraint
for op 1, use mve_vstrw_immediate predicate and Rl constraint for
op 2. Fix asm output spacing.
(mve_vstrdq_scatter_base_wb_p_v2di): Add missing constraint.
* config/arm/predicates.md (Ri) Move constraint to constraints.md
(mve_vldrd_immediate): Move it from
constraints.md.
(mve_vstrw_immediate): New predicate.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vstrwq_f32.c: Use
check-function-bodies instead of scan-assembler checks.  Use
extern "C" for C++ testing.
* gcc.target/arm/mve/intrinsics/vstrwq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_f32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_s32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_u32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_f32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_f32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_s32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_u32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_s32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_u32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_u32.c: Likewise.
---
 gcc/config/arm/constraints.md | 20 --
 gcc/config/arm/mve.md | 10 ++---
 gcc/config/arm/predicates.md  | 14 +++
 .../arm/mve/intrinsics/vstrwq_f32.c   | 32 ---
 .../arm/mve/intrinsics/vstrwq_p_f32.c | 40 ---
 .../arm/mve/intrinsics/vstrwq_p_s32.c | 40 ---
 .../arm/mve/intrinsics/vstrwq_p_u32.c | 40 ---
 .../arm/mve/intrinsics/vstrwq_s32.c   | 32 ---
 .../mve/intrinsics/vstrwq_scatter_base_f32.c  | 28 +++--
 .../intrinsics/vstrwq_scatter_base_p_f32.c| 36 +++--
 .../intrinsics/vstrwq_scatter_base_p_s32.c| 36 +++--
 .../intrinsics/vstrwq_scatter_base_p_u32.c| 36 +++--
 .../mve/intrinsics/vstrwq_scatter_base_s32.c  | 28 +++--
 .../mve/intrinsics/vstrwq_scatter_base_u32.c  | 28 +++--
 .../intrinsics/vstrwq_scatter_base_wb_f32.c   | 32 ---
 .../intrinsics/vstrwq_scatter_base_wb_p_f32.c | 40 ---
 .../intrinsics/vstrwq_scatter_base_wb_p_s32.c | 40 ---
 .../intrinsics/vstrwq_scatter_base_wb_p_u32.c | 40 ---
 .../intrinsics/vstrwq_scatter_base_wb_s32.c   | 32 ---
 .../intrinsics/vstrwq_scatter_base_wb_u32.c   | 32 ---
 .../intrinsics/vstrwq_scatter_offset_f32.c| 32 ---
 .../intrinsics/vstrwq_scatter_offset_p_f32.c  | 40 ---
 

[commited trunk 9/9] arm testsuite: Shifts and get_FPSCR ACLE optimisation fixes

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
These newly updated tests were rewritten by Andrea. Some of them
needed further manual fixing as follows:

* The #shift immediate value was not in the check-function-bodies as expected.
* The ACLE was specifying sub-optimal code: lsr+and instead of ubfx. In
  this case the test rewritten from the ACLE had the lsr+and pattern,
  but the compiler was able to optimise it to ubfx. Hence I've changed the
  tests to match on ubfx (a minimal C illustration follows after this list).
* Added a separate test to check that shifts of constants are optimised to
  movs.
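
As a minimal C illustration (not part of the patch), the carry extraction in
the intrinsics is a single-bit extract, which the compiler can now match
directly to UBFX:

```c
/* Hedged illustration only: extracting one bit of the FPSCR value this way
   previously matched as LSR + AND and now matches as a single UBFX.  */
static inline unsigned
carry_out (unsigned fpscr)
{
  return (fpscr >> 29) & 0x1u;
}
```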

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/srshr.c: Update shift value.
* gcc.target/arm/mve/intrinsics/srshrl.c: Update shift value.
* gcc.target/arm/mve/intrinsics/uqshl.c: Update shift value.
* gcc.target/arm/mve/intrinsics/uqshll.c: Update shift value.
* gcc.target/arm/mve/intrinsics/urshr.c: Update shift value.
* gcc.target/arm/mve/intrinsics/urshrl.c: Update shift value.
* gcc.target/arm/mve/intrinsics/vadciq_m_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadciq_m_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadciq_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadciq_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadcq_m_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadcq_m_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadcq_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadcq_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbciq_m_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbciq_m_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbciq_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbciq_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbcq_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbcq_u32.c: Update to ubfx.
* gcc.target/arm/mve/mve_const_shifts.c: New test.
---
 .../gcc.target/arm/mve/intrinsics/srshr.c |  2 +-
 .../gcc.target/arm/mve/intrinsics/srshrl.c|  2 +-
 .../gcc.target/arm/mve/intrinsics/uqshl.c | 14 +--
 .../gcc.target/arm/mve/intrinsics/uqshll.c| 14 +--
 .../gcc.target/arm/mve/intrinsics/urshr.c |  4 +-
 .../gcc.target/arm/mve/intrinsics/urshrl.c|  4 +-
 .../arm/mve/intrinsics/vadciq_m_s32.c |  8 +---
 .../arm/mve/intrinsics/vadciq_m_u32.c |  8 +---
 .../arm/mve/intrinsics/vadciq_s32.c   |  8 +---
 .../arm/mve/intrinsics/vadciq_u32.c   |  8 +---
 .../arm/mve/intrinsics/vadcq_m_s32.c  |  8 +---
 .../arm/mve/intrinsics/vadcq_m_u32.c  |  8 +---
 .../gcc.target/arm/mve/intrinsics/vadcq_s32.c |  8 +---
 .../gcc.target/arm/mve/intrinsics/vadcq_u32.c |  8 +---
 .../arm/mve/intrinsics/vsbciq_m_s32.c |  8 +---
 .../arm/mve/intrinsics/vsbciq_m_u32.c |  8 +---
 .../arm/mve/intrinsics/vsbciq_s32.c   |  8 +---
 .../arm/mve/intrinsics/vsbciq_u32.c   |  8 +---
 .../arm/mve/intrinsics/vsbcq_m_s32.c  |  8 +---
 .../arm/mve/intrinsics/vsbcq_m_u32.c  |  8 +---
 .../gcc.target/arm/mve/intrinsics/vsbcq_s32.c |  8 +---
 .../gcc.target/arm/mve/intrinsics/vsbcq_u32.c |  8 +---
 .../gcc.target/arm/mve/mve_const_shifts.c | 41 +++
 23 files changed, 81 insertions(+), 128 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/mve_const_shifts.c

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
index 94e3f42fd33..734375d58c0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
@@ -12,7 +12,7 @@ extern "C" {
 /*
 **foo:
 ** ...
-** srshr   (?:ip|fp|r[0-9]+), #shift(?:@.*|)
+** srshr   (?:ip|fp|r[0-9]+), #1(?:@.*|)
 ** ...
 */
 int32_t
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c
index 65f28ccbfde..a91943c38a0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c
@@ -12,7 +12,7 @@ extern "C" {
 /*
 **foo:
 ** ...
-** srshrl  (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #shift(?: @.*|)
+** srshrl  (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #1(?: @.*|)
 ** ...
 */
 int64_t
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c
index b23c9d97ba6..462531cad54 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c
@@ -12,7 +12,7 @@ extern "C" {
 /*
 **foo:
 ** ...
-** uqshl   (?:ip|fp|r[0-9]+), #shift(?:@.*|)
+** uqshl   (?:ip|fp|r[0-9]+), #1(?:@.*|)
 ** ...
 

[commited trunk 7/9] arm testsuite: Remove reduntant tests

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
Following Andrea's overhaul of the MVE testsuite, these tests are now
redundant, as equivalent checks have been added to each intrinsic's
.c test.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/mve_fp_vaddq_n.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vaddq_m.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vaddq_n.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_m_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_m_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_m_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_x_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_x_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_x_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vdwdupq_x_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vdwdupq_x_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vdwdupq_x_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_m_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_m_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_m_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_x_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_x_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_x_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_viwdupq_x_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_viwdupq_x_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_viwdupq_x_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_s64.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_u64.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_z_s64.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_z_u64.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_s64.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_u64.c: 
Removed.
* 
gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_z_s64.c: Removed.
* 
gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_z_u64.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_f16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_s16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_s32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_f16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_s16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_s32.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_u16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_u32.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_f16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_s16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_s32.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_u16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_u32.c: 
Removed.
* 
gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_f16.c: Removed.
* 
gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_s16.c: Removed.
* 
gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_s32.c: Removed.
* 
gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_u16.c: Removed.
* 
gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_f32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_s32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_z_f32.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_z_s32.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_z_u32.c: 
Removed.
* gc

[PATCH] aarch64: Implement vector FP absolute compare intrinsics with builtins

2023-05-18 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

While optimising some vector math library code with intrinsics we stumbled upon 
the issue in the testcase.
The compiler should be generating a FACGT instruction but instead we generate:
foo(__Float32x4_t, __Float32x4_t, __Float32x4_t):
fabs    v0.4s, v0.4s
adrp    x0, .LC0
ldr q31, [x0, #:lo12:.LC0]
fcmgt   v0.4s, v0.4s, v31.4s
ret

This is because the vcagtq_f32 intrinsic is open-coded in arm_neon.h as
return vabsq_f32 (__a) > vabsq_f32 (__b)
thus relying on the optimisers to merge it back together. But since one of the 
arms of the comparison
is a vector constant the combine pass optimises the abs into it and tries 
matching:
(set (reg:V4SI 101)
(neg:V4SI (gt:V4SI (reg:V4SF 100)
(const_vector:V4SF [
(const_double:SF 1.0e+2 [0x0.c8p+7]) repeated x4
]
and
(set (reg:V4SI 101)
(neg:V4SI (gt:V4SI (abs:V4SF (reg:V4SF 104))
(reg:V4SF 103

instead of what we want:
(insn 13 9 14 2 (set (reg/i:V4SI 32 v0)
(neg:V4SI (gt:V4SI (abs:V4SF (reg:V4SF 98))
(abs:V4SF (reg:V4SF 96)

I don't really see a good way around that with our current implementation of 
these intrinsics.
Therefore this patch reimplements these intrinsics with aarch64 builtins that 
generate the RTL for these
instructions directly. Apparently we already had them defined in 
aarch64-simd-builtins.def and have been
using them for the fp16 case already.
I realise that this approach is against the general principle of expressing
intrinsics in higher-level constructs, so I'm willing to listen to
counter-arguments. That said, the FACGT/FACGE instructions are as fast as the
non-ABS comparison instructions on all microarchitectures that I know of, so
it should always be a win to have them in the merged form rather than
splitting the fabs step out separately or trying to hoist it.
And the testcase does come from real library code that we're trying to optimise.
With this patch for the testcase we generate:
foo:
adrp    x0, .LC0
ldr q31, [x0, #:lo12:.LC0]
facgt   v0.4s, v0.4s, v31.4s
ret
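
For context, a hedged sketch of the kind of source involved (the actual
testcase is in the attached patch; the single-argument signature and the
100.0f threshold here are illustrative assumptions):

```c
#include <arm_neon.h>

/* Sketch only: an absolute-value compare against a constant vector, which
   should now emit a single FACGT with the constant loaded from the pool.  */
uint32x4_t
abs_gt_threshold (float32x4_t x)
{
  return vcagtq_f32 (x, vdupq_n_f32 (100.0f));
}
```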

Bootstrapped and tested on aarch64-none-linux-gnu.
I'll hold off on committing this to give folks a few days to comment, but will 
push by the end of next week if there are no objections.

Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/arm_neon.h (vcage_f64): Reimplement with builtins.
(vcage_f32): Likewise.
(vcages_f32): Likewise.
(vcageq_f32): Likewise.
(vcaged_f64): Likewise.
(vcageq_f64): Likewise.
(vcagts_f32): Likewise.
(vcagt_f32): Likewise.
(vcagt_f64): Likewise.
(vcagtq_f32): Likewise.
(vcagtd_f64): Likewise.
(vcagtq_f64): Likewise.
(vcale_f32): Likewise.
(vcale_f64): Likewise.
(vcaled_f64): Likewise.
(vcales_f32): Likewise.
(vcaleq_f32): Likewise.
(vcaleq_f64): Likewise.
(vcalt_f32): Likewise.
(vcalt_f64): Likewise.
(vcaltd_f64): Likewise.
(vcaltq_f32): Likewise.
(vcaltq_f64): Likewise.
(vcalts_f32): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/facgt_constpool_1.c: New test.


facgt.patch
Description: facgt.patch


[PATCH] rs6000: Update powerpc test fold-vec-extract-int.p8.c

2023-05-18 Thread Ajit Agarwal via Gcc-patches


Hello All:

Update the powerpc test for the extra zero_extend removal done by the default ree pass.
Bootstrapped and Regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit


rs6000: Update powerpc test fold-vec-extract-int.p8.c

Update the powerpc test for the extra zero_extend removal done by the default ree pass.

2023-04-16  Ajit Kumar Agarwal  

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/fold-vec-extract-int.p8.c: Update test.
---
 gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c
index 75eaf25943b..e8f1055ddc0 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c
@@ -13,7 +13,7 @@
 
 /* { dg-final { scan-assembler-times {\mvspltw\M} 3 { target lp64 } } } */
 /* { dg-final { scan-assembler-times {\mmfvsrwz\M} 3 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mrldicl\M} 7 { target { le } } } } */
+/* { dg-final { scan-assembler-times {\mrldicl\M} 5 { target { le } } } } */
 /* { dg-final { scan-assembler-times {\mrldicl\M} 4 { target { lp64 && be } } 
} } */
 /* { dg-final { scan-assembler-times {\msubfic\M} 3 { target { le } } } } */
 /* { dg-final { scan-assembler-times {\msldi\M} 3  { target lp64 } } } */
-- 
2.31.1



RE: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer

2023-05-18 Thread Li, Pan2 via Gcc-patches
Synced with today's (5/18/2023) upstream; passed the bootstrap and regression
test on x86.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of juzhe.zh...@rivai.ai
Sent: Tuesday, May 16, 2023 6:23 PM
To: gcc-patches@gcc.gnu.org
Cc: richard.sandif...@arm.com; rguent...@suse.de; Ju-Zhe Zhong 

Subject: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer

From: Ju-Zhe Zhong 

This patch implements a decrement IV for the length approach in loop control.

Address the comment from Kewen to incorporate the implementation inside
"vect_set_loop_controls_directly" instead of using a standalone function.

Address the comment from Richard to use MIN_EXPR to handle the following
three cases (see the conceptual sketch after this list):
1. single rgroup.
2. multiple rgroups for SLP.
3. multiple rgroups for non-SLP (tested on vec_pack_trunc).
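
A conceptual sketch only (plain C rather than the gimple the patch emits) of
how a decrementing IV with a MIN-capped length controls the loop; the names
'vf' and 'len' are illustrative and do not correspond to internal variables:

```c
/* 'vf' stands in for the vectorization factor, 'len' for the per-iteration
   active length; the inner loop stands in for the length-controlled vector
   statements.  */
void
sketch (int *dst, const int *src, int n, int vf)
{
  int remaining = n;                               /* decrementing IV */
  while (remaining > 0)
    {
      int len = remaining < vf ? remaining : vf;   /* MIN_EXPR */
      for (int i = 0; i < len; i++)
        dst[i] = src[i];
      dst += len;
      src += len;
      remaining -= len;
    }
}
```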


gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_adjust_loop_lens): New function.
(vect_set_loop_controls_directly): Add decrement IV support.
(vect_set_loop_condition_partial_vectors): Ditto.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): New variable.
(vect_get_loop_len): Add decrement IV support.
* tree-vect-stmts.cc (vectorizable_store): Ditto.
(vectorizable_load): Ditto.
* tree-vectorizer.h (LOOP_VINFO_USING_DECREMENTING_IV_P): New macro.
(vect_get_loop_len): Add decrement IV support.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.h: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.h: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c: New 
test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c: New 
test.

---
 .../rvv/autovec/partial/multiple_rgroup-1.c   |   6 +
 .../rvv/autovec/partial/multiple_rgroup-1.h   | 304 ++
 .../rvv/autovec/partial/multiple_rgroup-2.c   |   6 +
 .../rvv/autovec/partial/multiple_rgroup-2.h   | 546 ++
 .../autovec/partial/multiple_rgroup_run-1.c   |  19 +
 .../autovec/partial/multiple_rgroup_run-2.c   |  19 +
 gcc/tree-vect-loop-manip.cc   | 184 +-
 gcc/tree-vect-loop.cc |  37 +-
 gcc/tree-vect-stmts.cc|   9 +-
 gcc/tree-vectorizer.h |  13 +-
 10 files changed, 1132 insertions(+), 11 deletions(-)  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.c
new file mode 100644
index 000..69cc3be78f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup
+++ -1.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param 
+riscv-autovec-preference=fixed-vlmax" } */
+
+#include "multiple_rgroup-1.h"
+
+TEST_ALL (test_1)
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.h
new file mode 100644
index 000..fbc49f4855d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup
+++ -1.h
@@ -0,0 +1,304 @@
+#include 
+#include 
+
+#define test_1(TYPE1, TYPE2)   
\
+  void __attribute__ ((noinline, noclone)) 
\
+  test_1_##TYPE1_##TYPE2 (TYPE1 *__restrict f, TYPE2 *__restrict d, TYPE1 x,   
\
+ TYPE1 x2, TYPE2 y, int n)\
+  {
\
+for (int i = 0; i < n; ++i)
\
+  {
\
+   f[i * 2 + 0] = x;  \
+   f[i * 2 + 1] = x2; \
+   d[i] = y;  \
+  }
\
+  }
+
+#define run_1(TYPE1, TYPE2)
\
+  int n_1_##TYPE1_##TYPE2 = 1;  

Re: [PATCH] Fix internal error on small array with negative lower bound

2023-05-18 Thread Richard Biener via Gcc-patches
On Thu, May 18, 2023 at 11:51 AM Eric Botcazou via Gcc-patches
 wrote:
>
> Hi,
>
> Ada supports arrays with negative indices, although the internal index type is
> sizetype like in other languages, which is unsigned.  This means that negative
> values are represented by very large numbers, which works with a bit of care.
> The attached test exposes a small loophole in output_constructor_bitfield.
>
> Tested on x86-64/Linux, OK for the mainline?

Would it be better to use

  wi::to_uhwi (wi::to_wide (local->index) - wi::to_wide (local->min_index))

to honor the actual sign of the indices?  I think nothing forbids frontends to
use a signed TYPE_DOMAIN here?  But the difference should be always
representable in an unsigned value of course.

>
> 2023-05-18  Eric Botcazou 
>
> * varasm.cc (output_constructor_bitfield): Call tree_to_uhwi instead
> of tree_to_shwi on array indices.  Minor tweaks.
>
>
> 2023-05-18  Eric Botcazou 
>
> * gnat.dg/specs/array6.ads: New test.
>
> --
> Eric Botcazou


[PING] [C PATCH] Fix ICEs related to VM types in C [PR106465, PR107557, PR108423, PR109450]

2023-05-18 Thread Martin Uecker via Gcc-patches



Ping. Ok, for trunk?

Bootstrapped and tested on x86_64-linux-gnu with no regressions.



Fix ICEs related to VM types in C [PR106465, PR107557, PR108423, PR109450]

Size expressions were sometimes lost and not gimplified correctly, leading to
ICEs and incorrect evaluation order.  Fix this by 1) not recursing into
pointers when gimplifying parameters in the middle-end (the code is merged
with gimplify_type_sizes), which is incorrect because it might access
variables declared later for incomplete structs, 2) tracking size expressions
for struct/union members correctly, and 3) emitting code to evaluate size
expressions for missing cases (nested functions, empty declarations, and
structs/unions).
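
As a hedged illustration (not one of the new testcases; their exact contents
are in the patch), the kind of GNU C these fixes target looks like:

```c
/* Hypothetical example: a variably modified struct member and a nested
   function, both GNU C extensions whose size expressions must be evaluated
   at the right point.  */
void
outer (int n)
{
  struct s { int a[n]; } v;                  /* VM member type */
  int inner (void) { return sizeof (v.a); }  /* nested function */
  __builtin_printf ("%d\n", inner ());
}
```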

PR c/106465
PR c/107557
PR c/108423
PR c/109450

gcc/
* c/c-decl.cc (start_decl): Make sure size expression are
evaluated only in correct context.
(grokdeclarator): Size expression in fields may need a bind
expression, make sure DECL_EXPR is always created.
(grokfield, declspecs_add_type): Pass along size expressions.
(finish_struct): Remove unneeded DECL_EXPR.
(start_function): Evaluate size expressions for nested functions.
* c/c-parser.cc (c_parser_struct_declarations,
c_parser_struct_or_union_specifier): Pass along size expressions.
(c_parser_declaration_or_fndef): Evaluate size expression.
(c_parser_objc_at_property_declaration,
c_parser_objc_class_instance_variables): Adapt.
* function.cc (gimplify_parm_type): Remove function.
(gimplify_parameters): Call gimplify_parm_sizes.
* gimplify.cc (gimplify_type_sizes): Make function static.
(gimplify_parm_sizes): New function.

gcc/testsuite/
* gcc.dg/nested-vla-1.c: New test.
* gcc.dg/nested-vla-2.c: New test.
* gcc.dg/nested-vla-3.c: New test.
* gcc.dg/pr106465.c: New test.
* gcc.dg/pr107557-1.c: New test.
* gcc.dg/pr107557-2.c: New test.
* gcc.dg/pr108423-1.c: New test.
* gcc.dg/pr108423-2.c: New test.
* gcc.dg/pr108423-3.c: New test.
* gcc.dg/pr108423-4.c: New test.
* gcc.dg/pr108423-5.c: New test.
* gcc.dg/pr108423-6.c: New test.
* gcc.dg/pr109450-1.c: New test.
* gcc.dg/pr109450-2.c: New test.
* gcc.dg/typename-vla-2.c: New test.
* gcc.dg/typename-vla-3.c: New test.
* gcc.dg/typename-vla-4.c: New test.
* gcc.misc-tests/gcov-pr85350.c: Adapt.

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 90d7cd27cd5..f63c1108ab5 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -5378,7 +5378,8 @@ start_decl (struct c_declarator *declarator, struct 
c_declspecs *declspecs,
 if (lastdecl != error_mark_node)
   *lastloc = DECL_SOURCE_LOCATION (lastdecl);
 
-  if (expr)
+  /* Make sure the size expression is evaluated at this point.  */
+  if (expr && !current_scope->parm_flag)
 add_stmt (fold_convert (void_type_node, expr));
 
   if (TREE_CODE (decl) != FUNCTION_DECL && MAIN_NAME_P (DECL_NAME (decl))
@@ -7510,7 +7511,8 @@ grokdeclarator (const struct c_declarator *declarator,
&& c_type_variably_modified_p (type))
  {
tree bind = NULL_TREE;
-   if (decl_context == TYPENAME || decl_context == PARM)
+   if (decl_context == TYPENAME || decl_context == PARM
+   || decl_context == FIELD)
  {
bind = build3 (BIND_EXPR, void_type_node, NULL_TREE,
   NULL_TREE, NULL_TREE);
@@ -7519,10 +7521,11 @@ grokdeclarator (const struct c_declarator *declarator,
push_scope ();
  }
tree decl = build_decl (loc, TYPE_DECL, NULL_TREE, type);
-   DECL_ARTIFICIAL (decl) = 1;
pushdecl (decl);
-   finish_decl (decl, loc, NULL_TREE, NULL_TREE, NULL_TREE);
+   DECL_ARTIFICIAL (decl) = 1;
+   add_stmt (build_stmt (DECL_SOURCE_LOCATION (decl), DECL_EXPR, 
decl));
TYPE_NAME (type) = decl;
+
if (bind)
  {
pop_scope ();
@@ -8721,7 +8724,7 @@ start_struct (location_t loc, enum tree_code code, tree 
name,
 tree
 grokfield (location_t loc,
   struct c_declarator *declarator, struct c_declspecs *declspecs,
-  tree width, tree *decl_attrs)
+  tree width, tree *decl_attrs, tree *expr)
 {
   tree value;
 
@@ -8778,7 +8781,7 @@ grokfield (location_t loc,
 }
 
   value = grokdeclarator (declarator, declspecs, FIELD, false,
- width ? &width : NULL, decl_attrs, NULL, NULL,
+ wi

[PATCH 0/3] Fix nonportable shell syntax in "test" and "[" commands

2023-05-18 Thread Jonathan Wakely via Gcc-patches


Tested powerpc64le-linux.

I plan to push these as obvious, unless I hear objections.

Jonathan Wakely (2):
  gcc: Fix nonportable shell syntax in "test" and "[" commands
[PR105831]
  contrib: Fix nonportable shell syntax in "test" and "[" commands
[PR105831]

Michael Bäuerle (1):
  gcc: Fix nonportable shell syntax in "test" and "[" commands
[PR105831]

 contrib/bench-stringop   |  4 ++--
 contrib/reghunt/bin/reg-hunt |  2 +-
 contrib/repro_fail   |  4 ++--
 gcc/config.gcc   |  2 +-
 gcc/config/nvptx/gen-opt.sh  |  2 +-
 gcc/configure|  2 +-
 gcc/configure.ac |  2 +-
 gcc/testsuite/gcc.test-framework/gen_directive_tests | 12 ++--
 8 files changed, 15 insertions(+), 15 deletions(-)

-- 
2.40.1



[PATCH 2/3] gcc: Fix nonportable shell syntax in "test" and "[" commands [PR105831]

2023-05-18 Thread Jonathan Wakely via Gcc-patches
POSIX sh does not support the == for string comparisons, use = instead.

The gen_directive_tests script uses a bash shebang so == does work, but
there's no reason this script can't just use the more portable form
anyway.

PR bootstrap/105831

gcc/ChangeLog:

* config.gcc: Use = operator instead of ==.

gcc/testsuite/ChangeLog:

* gcc.test-framework/gen_directive_tests: Use = operator instead
of ==.
---
 gcc/config.gcc   |  2 +-
 gcc/testsuite/gcc.test-framework/gen_directive_tests | 12 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index e08c67d7cde..d88071773c9 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2441,7 +2441,7 @@ riscv*-*-elf* | riscv*-*-rtems*)
  tmake_file="${tmake_file} riscv/t-rtems"
  ;;
*)
- if test "x${with_multilib_generator}" == xdefault; then
+ if test "x${with_multilib_generator}" = xdefault; then
  case "x${enable_multilib}" in
  xno) ;;
  xyes) tmake_file="${tmake_file} riscv/t-elf-multilib" ;;
diff --git a/gcc/testsuite/gcc.test-framework/gen_directive_tests 
b/gcc/testsuite/gcc.test-framework/gen_directive_tests
index 29f0a734877..87b3f3d1b40 100644
--- a/gcc/testsuite/gcc.test-framework/gen_directive_tests
+++ b/gcc/testsuite/gcc.test-framework/gen_directive_tests
@@ -283,8 +283,8 @@ one() {
 echo "${GOOD_PROG}" >> $FILE1
 echo "${GOOD_PROG}" > $FILE2
 
-if [ "${FAIL_VERSION}" == "yes" ]; then
-   if [ "${EXP}" == "${EXP_PASS}" ]; then
+if [ "${FAIL_VERSION}" = "yes" ]; then
+   if [ "${EXP}" = "${EXP_PASS}" ]; then
NAME=${KIND}-${EXP_FAIL}
else
NAME=${KIND}-${EXP_XFAIL}
@@ -322,8 +322,8 @@ two() {
 echo "${GOOD_PROG}" >> $FILE1
 echo "${GOOD_PROG}" > $FILE2
 
-if [ "${FAIL_VERSION}" == "yes" ]; then
-   if [ "${EXP}" == "${EXP_PASS}" ]; then
+if  "yes" ]; then
+   if [ "${EXP}" = "${EXP_PASS}" ]; then
NAME=${KIND1}-${KIND2}-${EXP_FAIL}
else
NAME=${KIND1}-${KIND2}-${EXP_XFAIL}
@@ -364,8 +364,8 @@ three() {
 echo "${GOOD_PROG}" >> $FILE1
 echo "${GOOD_PROG}" > $FILE2
 
-if [ "${FAIL_VERSION}" == "${yes}" ]; then
-   if [ "${EXP}" == "${EXP_PASS}" ]; then
+if [ "${FAIL_VERSION}" = "${yes}" ]; then
+   if [ "${EXP}" = "${EXP_PASS}" ]; then
NAME=${KIND1}-${KIND2}-${KIND3}-${EXP_FAIL}
else
NAME=${KIND1}-${KIND2}-${KIND3}-${EXP_XFAIL}
-- 
2.40.1



[PATCH 1/3] gcc: Fix nonportable shell syntax in "test" and "[" commands [PR105831]

2023-05-18 Thread Jonathan Wakely via Gcc-patches
From: Michael Bäuerle 

POSIX sh does not support the == for string comparisons, use = instead.

gcc/ChangeLog:

PR bootstrap/105831
* config/nvptx/gen-opt.sh: Use = operator instead of ==.
* configure.ac: Likewise.
* configure: Regenerate.
---
 gcc/config/nvptx/gen-opt.sh | 2 +-
 gcc/configure   | 2 +-
 gcc/configure.ac| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/nvptx/gen-opt.sh b/gcc/config/nvptx/gen-opt.sh
index dc10722b96f..cab73f5e663 100644
--- a/gcc/config/nvptx/gen-opt.sh
+++ b/gcc/config/nvptx/gen-opt.sh
@@ -56,7 +56,7 @@ EnumValue
 Enum(ptx_isa) String(sm_$sm) Value(PTX_ISA_SM$sm)
 EOF
 
-if [ "$sm" == "$last" ]; then
+if [ "$sm" = "$last" ]; then
# Don't end with trailing empty line.
continue
 fi
diff --git a/gcc/configure b/gcc/configure
index 191f68581b3..5f67808b774 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -6409,7 +6409,7 @@ _ACEOF
 if test "$enable_largefile" != no; then
   case "$host, $build" in
 *-*-aix*,*|*,*-*-aix*)
-  if test "$ac_cv_sizeof_ino_t" == "4" -a "$ac_cv_sizeof_dev_t" == 4; then
+  if test "$ac_cv_sizeof_ino_t" = "4" -a "$ac_cv_sizeof_dev_t" = 4; then
 
 $as_echo "#define HOST_STAT_FOR_64BIT_INODES stat64x" >>confdefs.h
 
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 075424669c9..cc8dd9e20bf 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -473,7 +473,7 @@ AC_CHECK_SIZEOF(dev_t)
 if test "$enable_largefile" != no; then
   case "$host, $build" in
 *-*-aix*,*|*,*-*-aix*)
-  if test "$ac_cv_sizeof_ino_t" == "4" -a "$ac_cv_sizeof_dev_t" == 4; then
+  if test "$ac_cv_sizeof_ino_t" = "4" -a "$ac_cv_sizeof_dev_t" = 4; then
AC_DEFINE(HOST_STAT_FOR_64BIT_INODES, stat64x,
  [Define which stat syscall is able to handle 64bit indodes.])
   fi;;
-- 
2.40.1



[PATCH 3/3] contrib: Fix nonportable shell syntax in "test" and "[" commands [PR105831]

2023-05-18 Thread Jonathan Wakely via Gcc-patches
POSIX sh does not support the == for string comparisons, use = instead.

These contrib scripts all use a bash shebang so == does work, but
there's no reason they can't just use the more portable form anyway.

PR bootstrap/105831

contrib/ChangeLog:

* bench-stringop: Use = operator instead of ==.
* repro_fail: Likewise.

contrib/reghunt/ChangeLog:

* bin/reg-hunt: Use = operator instead of ==.
---
 contrib/bench-stringop   | 4 ++--
 contrib/reghunt/bin/reg-hunt | 2 +-
 contrib/repro_fail   | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/contrib/bench-stringop b/contrib/bench-stringop
index daf1bce6e6f..f058e066b3a 100755
--- a/contrib/bench-stringop
+++ b/contrib/bench-stringop
@@ -87,7 +87,7 @@ test "$2" "$3" "-mstringop-strategy=rep_byte 
-malign-stringops" rep1
 test "$2" "$3" "-mstringop-strategy=rep_byte -mno-align-stringops" rep1noalign
 test "$2" "$3" "-mstringop-strategy=rep_4byte -malign-stringops" rep4
 test "$2" "$3" "-mstringop-strategy=rep_4byte -mno-align-stringops" rep4noalign
-if [ "$mode" == 64 ]
+if [ "$mode" = 64 ]
 then
 test "$2" "$3" "-mstringop-strategy=rep_8byte -malign-stringops" rep8
 test "$2" "$3" "-mstringop-strategy=rep_8byte -mno-align-stringops" rep8noalign
@@ -109,7 +109,7 @@ echo "$best"
 
 test_all_sizes()
 {
-if [ "$mode" == 64 ]
+if [ "$mode" = 64 ]
 then
 echo "  block size  libcall rep1noalg   rep4noalg   rep8noalg   
loopnoalg   unrlnoalg   sse noalg   bytePGO dynamicBEST"
 else
diff --git a/contrib/reghunt/bin/reg-hunt b/contrib/reghunt/bin/reg-hunt
index 6427535dabe..aff4e9005b5 100755
--- a/contrib/reghunt/bin/reg-hunt
+++ b/contrib/reghunt/bin/reg-hunt
@@ -142,7 +142,7 @@ process_patch () {
 # build failures, quit now.
 
 if [ ${SKIP} -eq 0 ]; then
-  if [ "x${REG_NEWMID}" == "x" \
+  if [ "x${REG_NEWMID}" = "x" \
-o ${TEST_ID} -eq ${LATER_THAN} \
-o ${TEST_ID} -eq ${EARLIER_THAN} ]; then
 error "build failed for ${TEST_ID}"
diff --git a/contrib/repro_fail b/contrib/repro_fail
index 9ea79f2bccf..abb479d08aa 100755
--- a/contrib/repro_fail
+++ b/contrib/repro_fail
@@ -42,10 +42,10 @@ if [ $# -lt 2 ] ; then
 exit 1
 fi
 
-if [ "$1" == "--debug" ] ; then
+if [ "$1" = "--debug" ] ; then
 debug_args="-wrapper gdb,--args"
 shift
-elif [ "$1" == "--debug-tui" ] ; then
+elif [ "$1" = "--debug-tui" ] ; then
 debug_args="-wrapper gdb,--tui,--args"
 shift
 else
-- 
2.40.1



Re: [PATCH 2/3] gcc: Fix nonportable shell syntax in "test" and "[" commands [PR105831]

2023-05-18 Thread Jakub Jelinek via Gcc-patches
On Thu, May 18, 2023 at 01:56:46PM +0100, Jonathan Wakely via Gcc-patches wrote:
> --- a/gcc/testsuite/gcc.test-framework/gen_directive_tests
> +++ b/gcc/testsuite/gcc.test-framework/gen_directive_tests
> @@ -322,8 +322,8 @@ two() {
>  echo "${GOOD_PROG}" >> $FILE1
>  echo "${GOOD_PROG}" > $FILE2
>  
> -if [ "${FAIL_VERSION}" == "yes" ]; then
> - if [ "${EXP}" == "${EXP_PASS}" ]; then
> +if  "yes" ]; then

This line looks suspicious...

> + if [ "${EXP}" = "${EXP_PASS}" ]; then
>   NAME=${KIND1}-${KIND2}-${EXP_FAIL}
>   else
>   NAME=${KIND1}-${KIND2}-${EXP_XFAIL}
> @@ -364,8 +364,8 @@ three() {
>  echo "${GOOD_PROG}" >> $FILE1
>  echo "${GOOD_PROG}" > $FILE2
>  
> -if [ "${FAIL_VERSION}" == "${yes}" ]; then
> - if [ "${EXP}" == "${EXP_PASS}" ]; then
> +if [ "${FAIL_VERSION}" = "${yes}" ]; then
> + if [ "${EXP}" = "${EXP_PASS}" ]; then
>   NAME=${KIND1}-${KIND2}-${KIND3}-${EXP_FAIL}
>   else
>   NAME=${KIND1}-${KIND2}-${KIND3}-${EXP_XFAIL}
> -- 
> 2.40.1

Jakub



Re: [PATCH 1/3] gcc: Fix nonportable shell syntax in "test" and "[" commands [PR105831]

2023-05-18 Thread Jakub Jelinek via Gcc-patches
On Thu, May 18, 2023 at 01:56:45PM +0100, Jonathan Wakely via Gcc-patches wrote:
> From: Michael Bäuerle 
> 
> POSIX sh does not support the == for string comparisons, use = instead.
> 
> gcc/ChangeLog:
> 
>   PR bootstrap/105831
>   * config/nvptx/gen-opt.sh: Use = operator instead of ==.
>   * configure.ac: Likewise.
>   * configure: Regenerate.

LGTM.

Jakub



Re: [PATCH 3/3] contrib: Fix nonportable shell syntax in "test" and "[" commands [PR105831]

2023-05-18 Thread Jakub Jelinek via Gcc-patches
On Thu, May 18, 2023 at 01:56:47PM +0100, Jonathan Wakely via Gcc-patches wrote:
> POSIX sh does not support the == for string comparisons, use = instead.
> 
> These contrib scripts all use a bash shebang so == does work, but
> there's no reason they can't just use the more portable form anyway.
> 
>   PR bootstrap/105831
> 
> contrib/ChangeLog:
> 
>   * bench-stringop: Use = operator instead of ==.
>   * repro_fail: Likewise.
> 
> contrib/reghunt/ChangeLog:
> 
>   * bin/reg-hunt: Use = operator instead of ==.

LGTM.

Jakub



[PATCH v2 2/3] gcc: Fix nonportable shell syntax in "test" and "[" commands [PR105831]

2023-05-18 Thread Jonathan Wakely via Gcc-patches
Fixes a fat finger error in the v1 patch, spotted by Jakub.

-- >8 --

POSIX sh does not support the == for string comparisons, use = instead.

The gen_directive_tests script uses a bash shebang so == does work, but
there's no reason this script can't just use the more portable form
anyway.

PR bootstrap/105831

gcc/ChangeLog:

* config.gcc: Use = operator instead of ==.

gcc/testsuite/ChangeLog:

* gcc.test-framework/gen_directive_tests: Use = operator instead
of ==.
---
 gcc/config.gcc   |  2 +-
 gcc/testsuite/gcc.test-framework/gen_directive_tests | 12 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index e08c67d7cde..d88071773c9 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2441,7 +2441,7 @@ riscv*-*-elf* | riscv*-*-rtems*)
  tmake_file="${tmake_file} riscv/t-rtems"
  ;;
*)
- if test "x${with_multilib_generator}" == xdefault; then
+ if test "x${with_multilib_generator}" = xdefault; then
  case "x${enable_multilib}" in
  xno) ;;
  xyes) tmake_file="${tmake_file} riscv/t-elf-multilib" ;;
diff --git a/gcc/testsuite/gcc.test-framework/gen_directive_tests 
b/gcc/testsuite/gcc.test-framework/gen_directive_tests
index 29f0a734877..1cfc8432f60 100644
--- a/gcc/testsuite/gcc.test-framework/gen_directive_tests
+++ b/gcc/testsuite/gcc.test-framework/gen_directive_tests
@@ -283,8 +283,8 @@ one() {
 echo "${GOOD_PROG}" >> $FILE1
 echo "${GOOD_PROG}" > $FILE2
 
-if [ "${FAIL_VERSION}" == "yes" ]; then
-   if [ "${EXP}" == "${EXP_PASS}" ]; then
+if [ "${FAIL_VERSION}" = "yes" ]; then
+   if [ "${EXP}" = "${EXP_PASS}" ]; then
NAME=${KIND}-${EXP_FAIL}
else
NAME=${KIND}-${EXP_XFAIL}
@@ -322,8 +322,8 @@ two() {
 echo "${GOOD_PROG}" >> $FILE1
 echo "${GOOD_PROG}" > $FILE2
 
-if [ "${FAIL_VERSION}" == "yes" ]; then
-   if [ "${EXP}" == "${EXP_PASS}" ]; then
+if [ "${FAIL_VERSION}" = "yes" ]; then
+   if [ "${EXP}" = "${EXP_PASS}" ]; then
NAME=${KIND1}-${KIND2}-${EXP_FAIL}
else
NAME=${KIND1}-${KIND2}-${EXP_XFAIL}
@@ -364,8 +364,8 @@ three() {
 echo "${GOOD_PROG}" >> $FILE1
 echo "${GOOD_PROG}" > $FILE2
 
-if [ "${FAIL_VERSION}" == "${yes}" ]; then
-   if [ "${EXP}" == "${EXP_PASS}" ]; then
+if [ "${FAIL_VERSION}" = "${yes}" ]; then
+   if [ "${EXP}" = "${EXP_PASS}" ]; then
NAME=${KIND1}-${KIND2}-${KIND3}-${EXP_FAIL}
else
NAME=${KIND1}-${KIND2}-${KIND3}-${EXP_XFAIL}
-- 
2.40.1



Re: [PATCH 2/3] gcc: Fix nonportable shell syntax in "test" and "[" commands [PR105831]

2023-05-18 Thread Jonathan Wakely via Gcc-patches
On Thu, 18 May 2023 at 13:59, Jakub Jelinek wrote:

> On Thu, May 18, 2023 at 01:56:46PM +0100, Jonathan Wakely via Gcc-patches
> wrote:
> > --- a/gcc/testsuite/gcc.test-framework/gen_directive_tests
> > +++ b/gcc/testsuite/gcc.test-framework/gen_directive_tests
> > @@ -322,8 +322,8 @@ two() {
> >  echo "${GOOD_PROG}" >> $FILE1
> >  echo "${GOOD_PROG}" > $FILE2
> >
> > -if [ "${FAIL_VERSION}" == "yes" ]; then
> > - if [ "${EXP}" == "${EXP_PASS}" ]; then
> > +if  "yes" ]; then
>
> This line looks suspicious...
>

Yikes! I think instead of clicking on the first '=' character I must have
selected the whole of `"${FAIL_VERSION}" =` and then deleted it all.

v2 patch at https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618919.html


> > + if [ "${EXP}" = "${EXP_PASS}" ]; then
> >   NAME=${KIND1}-${KIND2}-${EXP_FAIL}
> >   else
> >   NAME=${KIND1}-${KIND2}-${EXP_XFAIL}
> > @@ -364,8 +364,8 @@ three() {
> >  echo "${GOOD_PROG}" >> $FILE1
> >  echo "${GOOD_PROG}" > $FILE2
> >
> > -if [ "${FAIL_VERSION}" == "${yes}" ]; then
> > - if [ "${EXP}" == "${EXP_PASS}" ]; then
> > +if [ "${FAIL_VERSION}" = "${yes}" ]; then
> > + if [ "${EXP}" = "${EXP_PASS}" ]; then
> >   NAME=${KIND1}-${KIND2}-${KIND3}-${EXP_FAIL}
> >   else
> >   NAME=${KIND1}-${KIND2}-${KIND3}-${EXP_XFAIL}
> > --
> > 2.40.1
>
> Jakub
>
>


Re: [PATCH v2 2/3] gcc: Fix nonportable shell syntax in "test" and "[" commands [PR105831]

2023-05-18 Thread Jakub Jelinek via Gcc-patches
On Thu, May 18, 2023 at 02:03:58PM +0100, Jonathan Wakely via Gcc-patches wrote:
> Fixes a fat finger error in the v1 patch, spotted by Jakub.
> 
> -- >8 --
> 
> POSIX sh does not support the == for string comparisons, use = instead.
> 
> The gen_directive_tests script uses a bash shebang so == does work, but
> there's no reason this script can't just use the more portable form
> anyway.
> 
>   PR bootstrap/105831
> 
> gcc/ChangeLog:
> 
>   * config.gcc: Use = operator instead of ==.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.test-framework/gen_directive_tests: Use = operator instead
>   of ==.

That looks better ;)  Ok.

Jakub



Re: [PATCH] rs6000: Update powerpc test fold-vec-extract-int.p8.c

2023-05-18 Thread Peter Bergner via Gcc-patches
On 5/18/23 6:16 AM, Ajit Agarwal via Gcc-patches wrote:
> -/* { dg-final { scan-assembler-times {\mrldicl\M} 7 { target { le } } } } */
> +/* { dg-final { scan-assembler-times {\mrldicl\M} 5 { target { le } } } } */
>  /* { dg-final { scan-assembler-times {\mrldicl\M} 4 { target { lp64 && be } 
> } } } */

Can you please check whether the big-endian count needs updating too?
Thanks.

Peter




Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-18 Thread Prathamesh Kulkarni via Gcc-patches
On Thu, 18 May 2023 at 13:37, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Tue, 16 May 2023 at 00:29, Richard Sandiford
> >  wrote:
> >>
> >> Prathamesh Kulkarni  writes:
> >> > Hi Richard,
> >> > After committing the interleave+zip1 patch for vector initialization,
> >> > it seems to regress the s32 case for this patch:
> >> >
> >> > int32x4_t f_s32(int32_t x)
> >> > {
> >> >   return (int32x4_t) { x, x, x, 1 };
> >> > }
> >> >
> >> > code-gen:
> >> > f_s32:
> >> > movi    v30.2s, 0x1
> >> > fmov    s31, w0
> >> > dup     v0.2s, v31.s[0]
> >> > ins     v30.s[0], v31.s[0]
> >> > zip1    v0.4s, v0.4s, v30.4s
> >> > ret
> >> >
> >> > instead of expected code-gen:
> >> > f_s32:
> >> > movi    v31.2s, 0x1
> >> > dup     v0.4s, w0
> >> > ins     v0.s[3], v31.s[0]
> >> > ret
> >> >
> >> > Cost for fallback sequence: 16
> >> > Cost for interleave and zip sequence: 12
> >> >
> >> > For the above case, the cost for interleave+zip1 sequence is computed as:
> >> > halves[0]:
> >> > (set (reg:V2SI 96)
> >> > (vec_duplicate:V2SI (reg/v:SI 93 [ x ])))
> >> > cost = 8
> >> >
> >> > halves[1]:
> >> > (set (reg:V2SI 97)
> >> > (const_vector:V2SI [
> >> > (const_int 1 [0x1]) repeated x2
> >> > ]))
> >> > (set (reg:V2SI 97)
> >> > (vec_merge:V2SI (vec_duplicate:V2SI (reg/v:SI 93 [ x ]))
> >> > (reg:V2SI 97)
> >> > (const_int 1 [0x1])))
> >> > cost = 8
> >> >
> >> > followed by:
> >> > (set (reg:V4SI 95)
> >> > (unspec:V4SI [
> >> > (subreg:V4SI (reg:V2SI 96) 0)
> >> > (subreg:V4SI (reg:V2SI 97) 0)
> >> > ] UNSPEC_ZIP1))
> >> > cost = 4
> >> >
> >> > So the total cost becomes
> >> > max(costs[0], costs[1]) + zip1_insn_cost
> >> > = max(8, 8) + 4
> >> > = 12
> >> >
> >> > While the fallback rtl sequence is:
> >> > (set (reg:V4SI 95)
> >> > (vec_duplicate:V4SI (reg/v:SI 93 [ x ])))
> >> > cost = 8
> >> > (set (reg:SI 98)
> >> > (const_int 1 [0x1]))
> >> > cost = 4
> >> > (set (reg:V4SI 95)
> >> > (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 98))
> >> > (reg:V4SI 95)
> >> > (const_int 8 [0x8])))
> >> > cost = 4
> >> >
> >> > So total cost = 8 + 4 + 4 = 16, and we choose the interleave+zip1 
> >> > sequence.
> >> >
> >> > I think the issue is probably that for the interleave+zip1 sequence we 
> >> > take
> >> > max(costs[0], costs[1]) to reflect that both halves are interleaved,
> >> > but for the fallback seq we use seq_cost, which assumes serial execution
> >> > of insns in the sequence.
> >> > For above fallback sequence,
> >> > set (reg:V4SI 95)
> >> > (vec_duplicate:V4SI (reg/v:SI 93 [ x ])))
> >> > and
> >> > (set (reg:SI 98)
> >> > (const_int 1 [0x1]))
> >> > could be executed in parallel, which would make its cost max(8, 4) + 4 
> >> > = 12.
> >>
> >> Agreed.
> >>
> >> A good-enough substitute for this might be to ignore scalar moves
> >> (for both alternatives) when costing for speed.
> > Thanks for the suggestions. Just wondering for aarch64, if there's an easy
> > way we can check if insn is a scalar move, similar to riscv's 
> > scalar_move_insn_p
> > that checks if get_attr_type(insn) is TYPE_VIMOVXV or TYPE_VFMOVFV ?
>
> It should be enough to check that the pattern is a SET:
>
> (a) whose SET_DEST has a scalar mode and
> (b) whose SET_SRC an aarch64_mov_operand
Hi Richard,
Thanks for the suggestions, the attached patch calls seq_cost to compute
cost for sequence and then subtracts cost of each scalar move insn from it.
Does that look OK ?
The patch is under bootstrap+test on aarch64-linux-gnu.

After applying the single-constant case patch on top, the cost of fallback
sequence is now reduced to 12 instead of 16:
Cost before ignoring scalar moves: 16
Ignoring cost = 4 for: (set (reg:SI 98)
(const_int 1 [0x1]))
Cost after ignoring scalar moves: 12
fallback_seq_cost = 12, zip1_seq_cost = 12

fallback_seq:
(set (reg:V4SI 95)
(vec_duplicate:V4SI (reg/v:SI 93 [ x ])))
(set (reg:SI 98)
(const_int 1 [0x1]))
(set (reg:V4SI 95)
(vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 98))
(reg:V4SI 95)
(const_int 8 [0x8])))

zip1_seq:
(set (reg:V2SI 96)
(vec_duplicate:V2SI (reg/v:SI 93 [ x ])))
(set (reg:V2SI 97)
(const_vector:V2SI [
(const_int 1 [0x1]) repeated x2
]))
(set (reg:V2SI 97)
(vec_merge:V2SI (vec_duplicate:V2SI (reg/v:SI 93 [ x ]))
(reg:V2SI 97)
(const_int 1 [0x1])))
(set (reg:V4SI 95)
(unspec:V4SI [
(subreg:V4SI (reg:V2SI 96) 0)
(subreg:V4SI (reg:V2SI 97) 0)
] UNSPEC_ZIP1))

So now the costs for both sequences are tied at 12, and it now chooses
the fallback sequence, which "fixes" this case.  More generally, though,
if the costs for both sequences are tied, how do we evaluate which
sequence would be better?  Currently we choose the fallback sequence
when the costs for both sequences are the same.
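For reference, a minimal sketch of the idea (not the attached patch; the
helper name is assumed, and it reuses scalar_move_insn_p from the attached
patch) that sums insn costs while skipping scalar moves:

/* Sketch only: cost of SEQ, skipping insns that scalar_move_insn_p
   (from the attached patch) identifies as scalar moves.  The attached
   patch instead subtracts those costs from seq_cost.  */
static unsigned
seq_cost_ignoring_scalar_moves (rtx_insn *seq, bool speed)
{
  unsigned cost = 0;
  for (rtx_insn *insn = seq; insn; insn = NEXT_INSN (insn))
    if (NONDEBUG_INSN_P (insn) && !scalar_move_insn_p (insn))
      cost += insn_cost (insn, speed);
  return cost;
}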

Re: [PATCH 1/3] gcc: Fix nonportable shell syntax in "test" and "[" commands [PR105831]

2023-05-18 Thread Bernhard Reutner-Fischer via Gcc-patches
On 18 May 2023 14:56:45 CEST, Jonathan Wakely via Gcc-patches 
 wrote:
>From: Michael Bäuerle 
>
>POSIX sh does not support the == for string comparisons, use = instead.
>
>gcc/ChangeLog:
>
>   PR bootstrap/105831

> 
>diff --git a/gcc/configure.ac b/gcc/configure.ac
>index 075424669c9..cc8dd9e20bf 100644
>--- a/gcc/configure.ac
>+++ b/gcc/configure.ac
>@@ -473,7 +473,7 @@ AC_CHECK_SIZEOF(dev_t)
> if test "$enable_largefile" != no; then
>   case "$host, $build" in
> *-*-aix*,*|*,*-*-aix*)
>-  if test "$ac_cv_sizeof_ino_t" == "4" -a "$ac_cv_sizeof_dev_t" == 4; then
>+  if test "$ac_cv_sizeof_ino_t" = "4" -a "$ac_cv_sizeof_dev_t" = 4; then

test(1) -a and -o are marked obsolescent in SUS and should be spelled out as && 
or ||, respectively: 
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/test.html

thanks,


Re: [PATCH 1/3] gcc: Fix nonportable shell syntax in "test" and "[" commands [PR105831]

2023-05-18 Thread Jonathan Wakely via Gcc-patches
On Thu, 18 May 2023 at 15:47, Bernhard Reutner-Fischer <
rep.dot@gmail.com> wrote:

> On 18 May 2023 14:56:45 CEST, Jonathan Wakely via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
> >From: Michael Bäuerle 
> >
> >POSIX sh does not support the == for string comparisons, use = instead.
> >
> >gcc/ChangeLog:
> >
> >   PR bootstrap/105831
>
> >
> >diff --git a/gcc/configure.ac b/gcc/configure.ac
> >index 075424669c9..cc8dd9e20bf 100644
> >--- a/gcc/configure.ac
> >+++ b/gcc/configure.ac
> >@@ -473,7 +473,7 @@ AC_CHECK_SIZEOF(dev_t)
> > if test "$enable_largefile" != no; then
> >   case "$host, $build" in
> > *-*-aix*,*|*,*-*-aix*)
> >-  if test "$ac_cv_sizeof_ino_t" == "4" -a "$ac_cv_sizeof_dev_t" ==
> 4; then
> >+  if test "$ac_cv_sizeof_ino_t" = "4" -a "$ac_cv_sizeof_dev_t" = 4;
> then
>
> test(1) -a and -o are marked obsolescent in SUS and should be spelled out
> as && or ||, respectively:
> https://pubs.opengroup.org/onlinepubs/9699919799/utilities/test.html
>


To be precise, it would be:

if test "$ac_cv_sizeof_ino_t" = "4" && test "$ac_cv_sizeof_dev_t" = 4; then

i.e. not just replacing -a with &&

The == causes real errors with some sh implementations, so fixing that
fixes bootstrap errors, and was the source of a bug report (and patch
submission). Using -a isn't causing errors for anybody AFAIK, so is less
important.

I'll take a look at the libstdc++ configury though, as I've been meaning to
modernise some of it and am already making changes there.


Re: [PATCH 08/14] fortran: use _P() defines from tree.h

2023-05-18 Thread Bernhard Reutner-Fischer via Gcc-patches
On Sun, 14 May 2023 15:10:12 +0200
Mikael Morin  wrote:

> Le 14/05/2023 à 01:23, Bernhard Reutner-Fischer via Gcc-patches a écrit :
> > From: Bernhard Reutner-Fischer 
> > 
> > gcc/fortran/ChangeLog:
> > 
> > * trans-array.cc (is_pointer_array): Use _P() defines from tree.h.
> > (gfc_conv_scalarized_array_ref): Ditto.
> > (gfc_conv_array_ref): Ditto.
> > * trans-decl.cc (gfc_finish_decl): Ditto.
> > (gfc_get_symbol_decl): Ditto.
> > * trans-expr.cc (gfc_trans_pointer_assignment): Ditto.
> > (gfc_trans_arrayfunc_assign): Ditto.
> > (gfc_trans_assignment_1): Ditto.
> > * trans-intrinsic.cc (gfc_conv_intrinsic_minmax): Ditto.
> > (conv_intrinsic_ieee_value): Ditto.
> > * trans-io.cc (gfc_convert_array_to_string): Ditto.
> > * trans-openmp.cc (gfc_omp_is_optional_argument): Ditto.
> > (gfc_trans_omp_clauses): Ditto.
> > * trans-stmt.cc (gfc_conv_label_variable): Ditto.
> > * trans.cc (gfc_build_addr_expr): Ditto.
> > (get_array_span): Ditto.  
> 
> OK from the fortran side.
> 
> Thanks

Thanks, I'll push it during the weekend.

I've fed gfortran.h into the script and found some CLASS_DATA spots,
see attached bootstrapped and tested patch.
Do we want to have that?
If so, I'd write a proper ChangeLog, of course.

Thanks!
diff --git a/gcc/fortran/class.cc b/gcc/fortran/class.cc
index 9d0c802b867..1466b07e260 100644
--- a/gcc/fortran/class.cc
+++ b/gcc/fortran/class.cc
@@ -889,7 +889,7 @@ copy_vtab_proc_comps (gfc_symbol *declared, gfc_symbol *vtype)
 
   vtab = gfc_find_derived_vtab (declared);
 
-  for (cmp = vtab->ts.u.derived->components; cmp; cmp = cmp->next)
+  for (cmp = CLASS_DATA (vtab); cmp; cmp = cmp->next)
 {
   if (gfc_find_component (vtype, cmp->name, true, true, NULL))
 	continue;
@@ -1078,7 +1078,7 @@ finalize_component (gfc_expr *expr, gfc_symbol *derived, gfc_component *comp,
   gfc_component *c;
 
   vtab = gfc_find_derived_vtab (comp->ts.u.derived);
-  for (c = vtab->ts.u.derived->components; c; c = c->next)
+  for (c = CLASS_DATA (vtab); c; c = c->next)
 	if (strcmp (c->name, "_final") == 0)
 	  break;
 
@@ -1143,7 +1143,7 @@ finalize_component (gfc_expr *expr, gfc_symbol *derived, gfc_component *comp,
 {
   gfc_component *c;
 
-  for (c = comp->ts.u.derived->components; c; c = c->next)
+  for (c = CLASS_DATA (comp); c; c = c->next)
 	finalize_component (e, comp->ts.u.derived, c, stat, fini_coarray, code,
 			sub_ns);
   gfc_free_expr (e);
@@ -1675,7 +1675,7 @@ generate_finalization_wrapper (gfc_symbol *derived, gfc_namespace *ns,
   gfc_component *comp;
 
   vtab = gfc_find_derived_vtab (derived->components->ts.u.derived);
-  for (comp = vtab->ts.u.derived->components; comp; comp = comp->next)
+  for (comp = CLASS_DATA (vtab); comp; comp = comp->next)
 	if (comp->name[0] == '_' && comp->name[1] == 'f')
 	  {
 	ancestor_wrapper = comp->initializer;
@@ -2752,7 +2752,7 @@ yes:
 {
   /* Return finalizer expression.  */
   gfc_component *final;
-  final = vtab->ts.u.derived->components->next->next->next->next->next;
+  final = CLASS_DATA (vtab)->next->next->next->next->next;
   gcc_assert (strcmp (final->name, "_final") == 0);
   gcc_assert (final->initializer
 		  && final->initializer->expr_type != EXPR_NULL);
diff --git a/gcc/fortran/data.cc b/gcc/fortran/data.cc
index d29eb12c1b1..f907bb35eb1 100644
--- a/gcc/fortran/data.cc
+++ b/gcc/fortran/data.cc
@@ -730,7 +730,7 @@ formalize_structure_cons (gfc_expr *expr)
   if (!cur || cur->n.component == NULL)
 return;
 
-  for (order = expr->ts.u.derived->components; order; order = order->next)
+  for (order = CLASS_DATA (expr); order; order = order->next)
 {
   cur = find_con_by_component (order, expr->value.constructor);
   if (cur)
diff --git a/gcc/fortran/dependency.cc b/gcc/fortran/dependency.cc
index b398b29a642..864470afdec 100644
--- a/gcc/fortran/dependency.cc
+++ b/gcc/fortran/dependency.cc
@@ -1253,7 +1253,7 @@ check_data_pointer_types (gfc_expr *expr1, gfc_expr *expr2)
 
   if (sym1->ts.type == BT_DERIVED && !seen_component_ref)
 {
-  for (cm1 = sym1->ts.u.derived->components; cm1; cm1 = cm1->next)
+  for (cm1 = CLASS_DATA (sym1); cm1; cm1 = cm1->next)
 	{
 	  if (cm1->ts.type == BT_DERIVED)
 	return false;
diff --git a/gcc/fortran/expr.cc b/gcc/fortran/expr.cc
index aa01a4d3d22..a6b4ef0a0bf 100644
--- a/gcc/fortran/expr.cc
+++ b/gcc/fortran/expr.cc
@@ -2671,7 +2671,7 @@ check_alloc_comp_init (gfc_expr *e)
   gcc_assert (e->expr_type == EXPR_STRUCTURE);
   gcc_assert (e->ts.type == BT_DERIVED || e->ts.type == BT_CLASS);
 
-  for (comp = e->ts.u.derived->components,
+  for (comp = CLASS_DATA (e),
ctor = gfc_constructor_first (e->value.constructor);
comp; comp = comp->next, ctor = gfc_constructor_next (ctor))
 {
@@ -5061,7 +5061,7 @@ component_initializer (gfc_component *c, bool generate)
   else if (c->ts.type == BT_DERIVED || c->t

[PATCH] stor-layout, aarch64: Express SRA intrinsics with RTL codes

2023-05-18 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This patch expresses the intrinsics for the SRA and RSRA instructions with
standard RTL codes rather than relying on UNSPECs.
These instructions perform a vector shift right plus accumulate with an
optional rounding constant addition for the RSRA variant.
There are a number of interesting points:

* The scalar-in-SIMD-registers variant for DImode SRA e.g. ssra d0, d1, #N
is left using the UNSPECs. Expressing it as a DImode plus+shift led to all
kinds of trouble as it started matching the existing define_insns for
"add x0, x0, asr #N" instructions and adding the SRA form as an extra
alternative required a significant amount of deduplication of iterators and
things still didn't work out well. I decided not to tackle that case in
this patch. It can be attempted later.

* For the RSRA variants that add a rounding constant (1 << (shift-1)) the
addition is notionally performed in a wider mode than the input types so that
overflow is handled properly. In RTL this can be represented with an appropriate
extend operation followed by a truncate back to the original modes.
However, for 128-bit input modes such as V4SI we don't have appropriate modes
defined for this widening, i.e. we'd need a V4DI mode to represent the
intermediate widened result.  This patch defines such modes for
V16HI, V8SI, V4DI and V2TI.  These will come in handy in the future too, as
we have more Advanced SIMD instructions that have similar intermediate
widening semantics.
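To make the rounding semantics concrete, here is a rough per-lane model in
plain C (illustrative only, not part of the patch; the 64-bit intermediate
stands in for the notional wider mode):

#include <stdint.h>

/* Rough model of RSRA on one 32-bit lane: the rounding constant is added
   in a wider (64-bit) type so the addition cannot overflow, the sum is
   shifted right, truncated back to 32 bits and accumulated.  */
static inline int32_t
rsra_lane (int32_t acc, int32_t x, unsigned shift)
{
  int64_t wide = (int64_t) x + ((int64_t) 1 << (shift - 1));
  return acc + (int32_t) (wide >> shift);
}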

* The above new modes led to a problem with stor-layout.cc. The new modes only
exist for the sake of the RTL optimisers understanding the semantics of the
instruction but are not intended to be moved to and from registers or memory,
assigned to types, used as TYPE_MODE, or to participate in auto-vectorisation.
This is expressed in aarch64 by aarch64_classify_vector_mode returning zero
for these new modes.  However, the code in stor-layout.cc
explicitly doesn't check this when picking a TYPE_MODE, due to modes being made
potentially available later through target switching (PR38240).
This led to these modes being picked as TYPE_MODE for declarations such as:
typedef int16_t vnx8hi __attribute__((vector_size (32))) when 256-bit
fixed-length SVE modes are available and vector_type_mode later struggling
to rectify this.
This issue is addressed with the new target hook
TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P that is intended to check if a
vector mode can be used in any legal target attribute configuration of the
port, as opposed to the existing TARGET_VECTOR_MODE_SUPPORTED_P that checks
only the initial target configuration. This allows a simple adjustment in
stor-layout.cc that still disqualifies these limited modes early on while
allowing consideration of modes that can be turned on in the future with
target attributes.

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for the non-aarch64 parts?

Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-modes.def (V16HI, V8SI, V4DI, V2TI): New modes.
* config/aarch64/aarch64-protos.h (aarch64_const_vec_rnd_cst_p):
Declare prototype.
(aarch64_const_vec_rsra_rnd_imm_p): Likewise.
* config/aarch64/aarch64-simd.md (*aarch64_simd_sra): Rename to...
(aarch64_sra_n_insn): ... This.
(aarch64_rsra_n_insn): New define_insn.
(aarch64_sra_n): New define_expand.
(aarch64_rsra_n): Likewise.
(aarch64_sra_n): Rename to...
(aarch64_sra_ndi): ... This.
* config/aarch64/aarch64.cc (aarch64_classify_vector_mode): Add
any_target_p argument.
(aarch64_extract_vec_duplicate_wide_int): Define.
(aarch64_const_vec_rsra_rnd_imm_p): Likewise.
(aarch64_const_vec_rnd_cst_p): Likewise.
(aarch64_vector_mode_supported_any_target_p): Likewise.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
* config/aarch64/iterators.md (UNSPEC_SRSRA, UNSPEC_URSRA): Delete.
(VSRA): Adjust for the above.
(sur): Likewise.
(V2XWIDE): New mode_attr.
(vec_or_offset): Likewise.
(SHIFTEXTEND): Likewise.
* config/aarch64/predicates.md (aarch64_simd_rsra_rnd_imm_vec): New
predicate.
* doc/tm.texi (TARGET_VECTOR_MODE_SUPPORTED_P): Adjust description to
clarify that it applies to current target options.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Document.
* doc/tm.texi.in: Regenerate.
* stor-layout.cc (mode_for_vector): Check
vector_mode_supported_any_target_p when iterating through vector modes.
* target.def (TARGET_VECTOR_MODE_SUPPORTED_P): Adjust description to
clarify that it applies to current target options.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Define.


sra.patch
Description: sra.patch


Re: [v2] RISC-V: Remove masking third operand of rotate instructions

2023-05-18 Thread Joern Rennecke
On Thu, 18 May 2023 at 16:37, Joern Rennecke  wrote
in https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618928.html :
>
> This breaks building libstdc++-v3 for
> -march=rv32imafdcv_zicsr_zifencei_zba_zbb_zbc_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b
> -mabi=ilp32f .

Sorry, I forgot the ChangeLog entry for my patch and missed the [v2]
part of the subject.

2023-05-18  Joern Rennecke  

gcc/ChangeLog:
* config/riscv/constraints.md (DsS, DsD): Restore agreement
with shiftm1 mode attribute.
diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index c448e6b37e9..44525b2da49 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -65,13 +65,13 @@
   "@internal
31 immediate"
   (and (match_code "const_int")
-   (match_test "ival == 31")))
+   (match_test "(ival & 31) == 31")))
 
 (define_constraint "DsD"
   "@internal
63 immediate"
   (and (match_code "const_int")
-   (match_test "ival == 63")))
+   (match_test "(ival & 63) == 63")))
 
 (define_constraint "DbS"
   "@internal"


Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-18 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:
> On Thu, 18 May 2023 at 13:37, Richard Sandiford
>  wrote:
>>
>> Prathamesh Kulkarni  writes:
>> > On Tue, 16 May 2023 at 00:29, Richard Sandiford
>> >  wrote:
>> >>
>> >> Prathamesh Kulkarni  writes:
>> >> > Hi Richard,
>> >> > After committing the interleave+zip1 patch for vector initialization,
>> >> > it seems to regress the s32 case for this patch:
>> >> >
>> >> > int32x4_t f_s32(int32_t x)
>> >> > {
>> >> >   return (int32x4_t) { x, x, x, 1 };
>> >> > }
>> >> >
>> >> > code-gen:
>> >> > f_s32:
>> >> > movi    v30.2s, 0x1
>> >> > fmov    s31, w0
>> >> > dup     v0.2s, v31.s[0]
>> >> > ins     v30.s[0], v31.s[0]
>> >> > zip1    v0.4s, v0.4s, v30.4s
>> >> > ret
>> >> >
>> >> > instead of expected code-gen:
>> >> > f_s32:
>> >> > movi    v31.2s, 0x1
>> >> > dup v0.4s, w0
>> >> > ins v0.s[3], v31.s[0]
>> >> > ret
>> >> >
>> >> > Cost for fallback sequence: 16
>> >> > Cost for interleave and zip sequence: 12
>> >> >
>> >> > For the above case, the cost for interleave+zip1 sequence is computed 
>> >> > as:
>> >> > halves[0]:
>> >> > (set (reg:V2SI 96)
>> >> > (vec_duplicate:V2SI (reg/v:SI 93 [ x ])))
>> >> > cost = 8
>> >> >
>> >> > halves[1]:
>> >> > (set (reg:V2SI 97)
>> >> > (const_vector:V2SI [
>> >> > (const_int 1 [0x1]) repeated x2
>> >> > ]))
>> >> > (set (reg:V2SI 97)
>> >> > (vec_merge:V2SI (vec_duplicate:V2SI (reg/v:SI 93 [ x ]))
>> >> > (reg:V2SI 97)
>> >> > (const_int 1 [0x1])))
>> >> > cost = 8
>> >> >
>> >> > followed by:
>> >> > (set (reg:V4SI 95)
>> >> > (unspec:V4SI [
>> >> > (subreg:V4SI (reg:V2SI 96) 0)
>> >> > (subreg:V4SI (reg:V2SI 97) 0)
>> >> > ] UNSPEC_ZIP1))
>> >> > cost = 4
>> >> >
>> >> > So the total cost becomes
>> >> > max(costs[0], costs[1]) + zip1_insn_cost
>> >> > = max(8, 8) + 4
>> >> > = 12
>> >> >
>> >> > While the fallback rtl sequence is:
>> >> > (set (reg:V4SI 95)
>> >> > (vec_duplicate:V4SI (reg/v:SI 93 [ x ])))
>> >> > cost = 8
>> >> > (set (reg:SI 98)
>> >> > (const_int 1 [0x1]))
>> >> > cost = 4
>> >> > (set (reg:V4SI 95)
>> >> > (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 98))
>> >> > (reg:V4SI 95)
>> >> > (const_int 8 [0x8])))
>> >> > cost = 4
>> >> >
>> >> > So total cost = 8 + 4 + 4 = 16, and we choose the interleave+zip1 
>> >> > sequence.
>> >> >
>> >> > I think the issue is probably that for the interleave+zip1 sequence we 
>> >> > take
>> >> > max(costs[0], costs[1]) to reflect that both halves are interleaved,
>> >> > but for the fallback seq we use seq_cost, which assumes serial execution
>> >> > of insns in the sequence.
>> >> > For above fallback sequence,
>> >> > set (reg:V4SI 95)
>> >> > (vec_duplicate:V4SI (reg/v:SI 93 [ x ])))
>> >> > and
>> >> > (set (reg:SI 98)
>> >> > (const_int 1 [0x1]))
>> >> > could be executed in parallel, which would make its cost max(8, 4) + 4 
>> >> > = 12.
>> >>
>> >> Agreed.
>> >>
>> >> A good-enough substitute for this might be to ignore scalar moves
>> >> (for both alternatives) when costing for speed.
>> > Thanks for the suggestions. Just wondering for aarch64, if there's an easy
>> > way we can check if insn is a scalar move, similar to riscv's 
>> > scalar_move_insn_p
>> > that checks if get_attr_type(insn) is TYPE_VIMOVXV or TYPE_VFMOVFV ?
>>
>> It should be enough to check that the pattern is a SET:
>>
>> (a) whose SET_DEST has a scalar mode and
>> (b) whose SET_SRC an aarch64_mov_operand
> Hi Richard,
> Thanks for the suggestions, the attached patch calls seq_cost to compute
> cost for sequence and then subtracts cost of each scalar move insn from it.
> Does that look OK ?
> The patch is under bootstrap+test on aarch64-linux-gnu.

Yeah, the patch looks reasonable (some comments below).  The testing
for this kind of patch is more than a formality though, so it would
be good to wait to see if the tests pass.

> [...]
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 29dbacfa917..7efd896d364 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -22332,6 +22332,32 @@ aarch64_unzip_vector_init (machine_mode mode, rtx 
> vals, bool even_p)
>return gen_rtx_PARALLEL (new_mode, vec);
>  }
>  
> +/* Return true if INSN is a scalar move.  */
> +
> +static bool
> +scalar_move_insn_p (rtx_insn *insn)
> +{
> +  rtx set = single_set (insn);
> +  if (!set)
> +return false;
> +  rtx src = SET_SRC (set);
> +  rtx dest = SET_DEST (set);
> +  return is_a(GET_MODE (dest)) && aarch64_mov_operand_p (src, 
> GET_MODE (src));

Long line.

> +}
> +
> +/* Ignore cost for scalar moves from cost of sequence. This function is 
> called
> +   for calculating sequence costs in aarch64_expand_vector_init.  */
> +
> +static unsigned
> +seq_cost_ignore_scalar_moves (rtx_insn *seq, bool speed)

Maybe more readable as "ignoring" rather tha

[avr,committed] Fix a trivial typo in gen-avr-mmcu-specs.cc.

2023-05-18 Thread Georg-Johann Lay

Applied as obvious, there was a trailing */ in a 1-line // comment.

https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=a726d007f197d13ec80b9d625bf8bab97c96384c

Johann


gcc/ChangeLog
* config/avr/gen-avr-mmcu-specs.cc: Remove stale */ after // comment.

--

diff --git a/gcc/config/avr/gen-avr-mmcu-specs.cc b/gcc/config/avr/gen-avr-mmcu-specs.cc
index 9344246cb7203a665db575a2bf7c0e8a29521963..b9a5ad44e4e5c350fbcc45d468684ff6d873574e 100644

--- a/gcc/config/avr/gen-avr-mmcu-specs.cc
+++ b/gcc/config/avr/gen-avr-mmcu-specs.cc
@@ -30,7 +30,7 @@
 #include "avr-devices.cc"

 // Get rid of "defaults.h".  We just need tm.h for `WITH_AVRLIBC' and
-// and `WITH_RTEMS'.  */
+// and `WITH_RTEMS'.
 #define GCC_DEFAULTS_H

 #include "tm.h"


RE: [PATCH] PR gcc/98350:Handle FMA friendly in reassoc pass

2023-05-18 Thread Cui, Lili via Gcc-patches
Attached are the CPU2017 3-run results:

On ICX:
507.cactuBSSN_r: improved by 1.7% for multi-copy.
503.bwaves_r: improved by 0.60% for single copy.
507.cactuBSSN_r: improved by 1.10% for single copy.
519.lbm_r: improved by 2.21% for single copy.
No measurable changes for other benchmarks.

On aarch64:
507.cactuBSSN_r: improved by 1.7% for multi-copy.
503.bwaves_r: improved by 6.00% for single copy.
No measurable changes for other benchmarks.

> -Original Message-
> From: Cui, Lili 
> Sent: Wednesday, May 17, 2023 9:02 PM
> To: gcc-patches@gcc.gnu.org
> Cc: richard.guent...@gmail.com; Cui, Lili 
> Subject: [PATCH] PR gcc/98350:Handle FMA friendly in reassoc pass
> 
> From: Lili Cui 
> 
> Make some changes in reassoc pass to make it more friendly to fma pass
> later.
> Using FMA instead of mult + add reduces register pressure and instructions
> retired.
> 
> There are mainly two changes
> 1. Put no-mult ops and mult ops alternately at the end of the queue, which is
> conducive to generating more fma and reducing the loss of FMA when
> breaking the chain.
> 2. Rewrite the rewrite_expr_tree_parallel function to try to build parallel
> chains according to the given correlation width, keeping the FMA chance as
> much as possible.
> 
> TEST1:
> 
> float
> foo (float a, float b, float c, float d, float *e) {
>return  *e  + a * b + c * d ;
> }
> 
> For "-Ofast -mfpmath=sse -mfma" GCC generates:
> vmulss  %xmm3, %xmm2, %xmm2
> vfmadd132ss %xmm1, %xmm2, %xmm0
> vaddss  (%rdi), %xmm0, %xmm0
> ret
> 
> With this patch GCC generates:
> vfmadd213ss   (%rdi), %xmm1, %xmm0
> vfmadd231ss   %xmm2, %xmm3, %xmm0
> ret
> 
> TEST2:
> 
> for (int i = 0; i < N; i++)
> {
>   a[i] += b[i]* c[i] + d[i] * e[i] + f[i] * g[i] + h[i] * j[i] + k[i] * l[i] 
> + m[i]* o[i] +
> p[i]; }
> 
> For "-Ofast -mfpmath=sse -mfma"  GCC generates:
>   vmovapd e(%rax), %ymm4
>   vmulpd  d(%rax), %ymm4, %ymm3
>   addq$32, %rax
>   vmovapd c-32(%rax), %ymm5
>   vmovapd j-32(%rax), %ymm6
>   vmulpd  h-32(%rax), %ymm6, %ymm2
>   vmovapd a-32(%rax), %ymm6
>   vaddpd  p-32(%rax), %ymm6, %ymm0
>   vmovapd g-32(%rax), %ymm7
>   vfmadd231pd b-32(%rax), %ymm5, %ymm3
>   vmovapd o-32(%rax), %ymm4
>   vmulpd  m-32(%rax), %ymm4, %ymm1
>   vmovapd l-32(%rax), %ymm5
>   vfmadd231pd f-32(%rax), %ymm7, %ymm2
>   vfmadd231pd k-32(%rax), %ymm5, %ymm1
>   vaddpd  %ymm3, %ymm0, %ymm0
>   vaddpd  %ymm2, %ymm0, %ymm0
>   vaddpd  %ymm1, %ymm0, %ymm0
>   vmovapd %ymm0, a-32(%rax)
>   cmpq$8192, %rax
>   jne .L4
>   vzeroupper
>   ret
> 
> with this patch applied GCC breaks the chain with width = 2 and generates 6
> fma:
> 
>   vmovapd a(%rax), %ymm2
>   vmovapd c(%rax), %ymm0
>   addq$32, %rax
>   vmovapd e-32(%rax), %ymm1
>   vmovapd p-32(%rax), %ymm5
>   vmovapd g-32(%rax), %ymm3
>   vmovapd j-32(%rax), %ymm6
>   vmovapd l-32(%rax), %ymm4
>   vmovapd o-32(%rax), %ymm7
>   vfmadd132pd b-32(%rax), %ymm2, %ymm0
>   vfmadd132pd d-32(%rax), %ymm5, %ymm1
>   vfmadd231pd f-32(%rax), %ymm3, %ymm0
>   vfmadd231pd h-32(%rax), %ymm6, %ymm1
>   vfmadd231pd k-32(%rax), %ymm4, %ymm0
>   vfmadd231pd m-32(%rax), %ymm7, %ymm1
>   vaddpd  %ymm1, %ymm0, %ymm0
>   vmovapd %ymm0, a-32(%rax)
>   cmpq$8192, %rax
>   jne .L2
>   vzeroupper
>   ret
> 
> gcc/ChangeLog:
> 
>   PR gcc/98350
>   * tree-ssa-reassoc.cc
>   (rewrite_expr_tree_parallel): Rewrite this function.
>   (rank_ops_for_fma): New.
>   (reassociate_bb): Handle new function.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR gcc/98350
>   * gcc.dg/pr98350-1.c: New test.
>   * gcc.dg/pr98350-2.c: Ditto.
> ---
>  gcc/testsuite/gcc.dg/pr98350-1.c |  31
>  gcc/testsuite/gcc.dg/pr98350-2.c |  11 ++
>  gcc/tree-ssa-reassoc.cc          | 256 +--
>  3 files changed, 215 insertions(+), 83 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr98350-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr98350-2.c
> 
> diff --git a/gcc/testsuite/gcc.dg/pr98350-1.c b/gcc/testsuite/gcc.dg/pr98350-1.c
> new file mode 100644
> index 000..185511c5e0a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr98350-1.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -mfpmath=sse -mfma -Wno-attributes " } */
> +
> +/* Test that the compiler properly optimizes multiply and add
> +   to generate more FMA instructions.  */ #define N 1024 double a[N];
> +double b[N]; double c[N]; double d[N]; double e[N]; double f[N]; double
> +g[N]; double h[N]; double j[N]; double k[N]; double l[N]; double m[N];
> +double o[N]; double p[N];
> +
> +
> +void
> +foo (void)
> +{
> +  for (int i = 0; i < N; i++)
> +  {
> +a[i] += b[i] * c[i] + d[i] * e[i] + f[i] * g[i] + h[i]

Re: [PATCH v1] tree-ssa-sink: Improve code sinking pass.

2023-05-18 Thread Segher Boessenkool
Hi!

On Thu, May 18, 2023 at 12:44:28PM +0530, Ajit Agarwal wrote:
> This patch improves code sinking pass to sink statements before call to reduce
> register pressure.

An example would be useful :-)

>   * tree-ssa-sink.cc (statement_sink_location): Modifed to
>   move statements before calls.

Spello ("modified").  But, you should write in the imperative mood
anyway, so "modify".  But, every change is a modification, so do without
the fluff altogether?  "Move statements before calls."

>   (block_call_p): New function.
>   (def_use_same_block): New function.
>   (select_best_block): Add heuristics to select the best
>   blocks in the immediate post dominator.

Please don't break lines
early
it makes things
harder to
read.

:-)

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */

This is the default, you can just leave it out.

> +/* { dg-options "-O2 -fdump-tree-sink -fdump-tree-optimized 
> -fdump-tree-sink-stats" } */

You don't need -fdump-tree-sink without options since you have
-fdump-tree-sink-stats as well.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */ 
> +/* { dg-options "-O2 -fdump-tree-sink-stats -fdump-tree-sink-stats" } */

You don't need to say it twice either :-)

> +/* Return TRUE if immediate uses of the defs in
> +   USE occur in the same block as USE, FALSE otherwise.  */
> +
> +bool
> +def_use_same_block (gimple *stmt)
> +{

There is no function parameter "use" here?  STMT instead?

> +  use_operand_p use_p;
> +  def_operand_p def_p;

Neither of these is a predicate.  Lose the _p please?

> +   if (use_p
> +   && (gimple_bb (USE_STMT (use_p)) == gimple_bb (stmt)))

Please fit this on one line.  And no parens around random things please.

> +/* Return TRUE if the block has only calls, FALSE otherwise. */
> +
> +bool
> +block_call_p (basic_block bb)

> +/* We have already seen a call.  */
> +if (is_call)
> +  return false;
> +
> +if (is_gimple_call (stmt))
> +  is_call = true;
> +else
> +  return false;

> +  if (is_call && i == 1)
> +return true;
> +
> +  return false;

This doesn't do what the function comment says?  It is very important
that function comments say exactly what a function does.  It can perhaps
leave out some details, but it should be correct by and large.

> + /* Update sinking point as stmt before call if the sinking block
> +has only calls. Otherwise update sinking point as the use
> +stmt. */

(two spaces after full stop, twice)

> + if (gsi_stmt (gsi) == use
> + && !is_gimple_call (last_stmt)
> + && (gimple_code (last_stmt) != GIMPLE_SWITCH)
> + && (gimple_code (last_stmt) != GIMPLE_COND)
> + && (gimple_code (last_stmt) != GIMPLE_GOTO)
> + && (!gimple_vdef (use) || !def_use_same_block (def_stmt)))

Please no unnecessary parens.  At first I didn't notice the last line
here *does* need it!


Segher


Re: [PATCH 2/3] Refactor widen_plus as internal_fn

2023-05-18 Thread Andre Vieira (lists) via Gcc-patches

How about this?

Not sure about the DEF_INTERNAL documentation I rewrote in 
internal-fn.def, was struggling to word these, so improvements welcome!


gcc/ChangeLog:

2023-04-25  Andre Vieira  
Joel Hutton  
Tamar Christina  

* config/aarch64/aarch64-simd.md (vec_widen_addl_lo_): Rename this ...
(vec_widen_add_lo_): ... to this.
(vec_widen_addl_hi_): Rename this ...
(vec_widen_add_hi_): ... to this.
(vec_widen_subl_lo_): Rename this ...
(vec_widen_sub_lo_): ... to this.
(vec_widen_subl_hi_): Rename this ...
(vec_widen_sub_hi_): ...to this.
* doc/generic.texi: Document new IFN codes.
	* internal-fn.cc (ifn_cmp): Function to compare ifn's for 
sorting/searching.

(lookup_hilo_internal_fn): Add lookup function.
(commutative_binary_fn_p): Add widen_plus fn's.
(widening_fn_p): New function.
(narrowing_fn_p): New function.
(direct_internal_fn_optab): Change visibility.
* internal-fn.def (DEF_INTERNAL_WIDENING_OPTAB_FN): Macro to define an
internal_fn that expands into multiple internal_fns for widening.
(DEF_INTERNAL_NARROWING_OPTAB_FN): Likewise but for narrowing.
(IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO,
 IFN_VEC_WIDEN_PLUS_EVEN, IFN_VEC_WIDEN_PLUS_ODD,
 IFN_VEC_WIDEN_MINUS, IFN_VEC_WIDEN_MINUS_HI, IFN_VEC_WIDEN_MINUS_LO,
 IFN_VEC_WIDEN_MINUS_ODD, IFN_VEC_WIDEN_MINUS_EVEN): Define widening
 plus, minus functions.
* internal-fn.h (direct_internal_fn_optab): Declare new prototype.
(lookup_hilo_internal_fn): Likewise.
(widening_fn_p): Likewise.
(narrowing_fn_p): Likewise.
* optabs.cc (commutative_optab_p): Add widening plus optabs.
* optabs.def (OPTAB_D): Define widen add, sub optabs.
* tree-cfg.cc (verify_gimple_call): Add checks for widening ifns.
* tree-inline.cc (estimate_num_insns): Return same
cost for widen add and sub IFNs as previous tree_codes.
* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
patterns with a hi/lo or even/odd split.
(vect_recog_sad_pattern): Refactor to use new IFN codes.
(vect_recog_widen_plus_pattern): Likewise.
(vect_recog_widen_minus_pattern): Likewise.
(vect_recog_average_pattern): Likewise.
* tree-vect-stmts.cc (vectorizable_conversion): Add support for
_HILO IFNs.
(supportable_widening_operation): Likewise.
* tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-widen-add.c: Test that new
IFN_VEC_WIDEN_PLUS is being used.
* gcc.target/aarch64/vect-widen-sub.c: Test that new
IFN_VEC_WIDEN_MINUS is being used.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 
bfc98a8d943467b33390defab9682f44efab5907..ffbbecb9409e1c2835d658c2a8855cd0e955c0f2
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4626,7 +4626,7 @@
   [(set_attr "type" "neon__long")]
 )
 
-(define_expand "vec_widen_addl_lo_"
+(define_expand "vec_widen_add_lo_"
   [(match_operand: 0 "register_operand")
(ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
(ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
@@ -4638,7 +4638,7 @@
   DONE;
 })
 
-(define_expand "vec_widen_addl_hi_"
+(define_expand "vec_widen_add_hi_"
   [(match_operand: 0 "register_operand")
(ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
(ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
@@ -4650,7 +4650,7 @@
   DONE;
 })
 
-(define_expand "vec_widen_subl_lo_"
+(define_expand "vec_widen_sub_lo_"
   [(match_operand: 0 "register_operand")
(ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
(ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
@@ -4662,7 +4662,7 @@
   DONE;
 })
 
-(define_expand "vec_widen_subl_hi_"
+(define_expand "vec_widen_sub_hi_"
   [(match_operand: 0 "register_operand")
(ANY_EXTEND: (match_operand:VQW 1 "register_operand"))
(ANY_EXTEND: (match_operand:VQW 2 "register_operand"))]
diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index 
8b2882da4fe7da07d22b4e5384d049ba7d3907bf..5e36dac2b1a10257616f12cdfb0b12d0f2879ae9
 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -1811,10 +1811,16 @@ a value from @code{enum annot_expr_kind}, the third is 
an @code{INTEGER_CST}.
 @tindex VEC_RSHIFT_EXPR
 @tindex VEC_WIDEN_MULT_HI_EXPR
 @tindex VEC_WIDEN_MULT_LO_EXPR
-@tindex VEC_WIDEN_PLUS_HI_EXPR
-@tindex VEC_WIDEN_PLUS_LO_EXPR
-@tindex VEC_WIDEN_MINUS_HI_EXPR
-@tindex VEC_WIDEN_MINUS_LO_EXPR
+@tindex IFN_VEC_WIDEN_PLUS
+@tindex IFN_VEC_WIDEN_PLUS_HI
+@tindex IFN_VEC_WIDEN_PLUS_LO
+@tindex IFN_VEC_WIDEN_PLUS_EVEN
+@tindex IFN_VEC_WIDEN_PLUS_ODD
+@tindex IFN_VEC_WIDEN_MINUS
+@tindex IFN_VEC_WIDEN_MINUS_HI
+@tindex IFN_V

Re: [PATCH] Fix internal error on small array with negative lower bound

2023-05-18 Thread Eric Botcazou via Gcc-patches
> Would it be better to use
> 
>   wi::to_uhwi (wi::to_wide (local->index) - wi::to_wide (local->min_index))
> 
> to honor the actual sign of the indices?  I think nothing forbids frontends
> to use a signed TYPE_DOMAIN here?  But the difference should be always
> representable in an unsigned value of course.

We use tree_to_uhwi everywhere else though, see categorize_ctor_elements_1:

  if (tree_fits_uhwi_p (lo_index) && tree_fits_uhwi_p (hi_index))
mult = (tree_to_uhwi (hi_index)
- tree_to_uhwi (lo_index) + 1);

or store_constructor

this_node_count = (tree_to_uhwi (hi_index)
   - tree_to_uhwi (lo_index) + 1);

so the proposed form looks better for the sake of consistency.

-- 
Eric Botcazou




Re: [PATCH V2, rs6000] Disable generation of scalar modulo instructions

2023-05-18 Thread Pat Haugen via Gcc-patches

Ping.

On 4/18/23 7:22 AM, Pat Haugen via Gcc-patches wrote:

Updated from prior patch to also disable for int128.


Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.
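For clarity, the replacement sequence computes the remainder with the usual
identity a % b == a - (a / b) * b; an illustrative C equivalent (not part of
the patch) of what the expander emits:

long
mod_via_div_mul_sub (long a, long b)
{
  long q = a / b;   /* div */
  long p = q * b;   /* mul */
  return a - p;     /* sub */
}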

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-04-18  Pat Haugen  

gcc/
 * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
 * config/rs6000/rs6000.md (mod3, *mod3): Disable.
 (define_expand umod3): New.
 (define_insn umod3): Rename to *umod3 and disable.
 (umodti3, modti3): Disable.

gcc/testsuite/
 * gcc.target/powerpc/clone1.c: Add xfails.
 * gcc.target/powerpc/clone3.c: Likewise.
 * gcc.target/powerpc/mod-1.c: Likewise.
 * gcc.target/powerpc/mod-2.c: Likewise.
 * gcc.target/powerpc/p10-vdivq-vmodq.c: Likewise.


diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..1cf0a0013c0 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2492,3 +2492,9 @@ while (0)
     rs6000_asm_output_opcode (STREAM);    \
  }    \
    while (0)
+
+/* Disable generation of scalar modulo instructions due to performance issues
+   with certain input values. This can be removed in the future when the
+   issues have been resolved.  */
+#define RS6000_DISABLE_SCALAR_MODULO 1
+
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 44f7dd509cb..4f397bc9179 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3421,6 +3421,17 @@ (define_expand "mod3"
  FAIL;

    operands[2] = force_reg (mode, operands[2]);
+
+  if (RS6000_DISABLE_SCALAR_MODULO)
+    {
+  temp1 = gen_reg_rtx (mode);
+  temp2 = gen_reg_rtx (mode);
+
+  emit_insn (gen_div3 (temp1, operands[1], operands[2]));
+  emit_insn (gen_mul3 (temp2, temp1, operands[2]));
+  emit_insn (gen_sub3 (operands[0], operands[1], temp2));
+  DONE;
+    }
  }
    else
  {
@@ -3440,17 +3451,42 @@ (define_insn "*mod3"
    [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
  (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
   (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
    "mods %0,%1,%2"
    [(set_attr "type" "div")
     (set_attr "size" "")])

+;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is
+;; removed.
+(define_expand "umod3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+    (umod:GPR (match_operand:GPR 1 "gpc_reg_operand")
+  (match_operand:GPR 2 "gpc_reg_operand")))]
+  ""
+{
+  rtx temp1;
+  rtx temp2;
+
+  if (!TARGET_MODULO)
+    FAIL;

-(define_insn "umod3"
+  if (RS6000_DISABLE_SCALAR_MODULO)
+    {
+  temp1 = gen_reg_rtx (mode);
+  temp2 = gen_reg_rtx (mode);
+
+  emit_insn (gen_udiv3 (temp1, operands[1], operands[2]));
+  emit_insn (gen_mul3 (temp2, temp1, operands[2]));
+  emit_insn (gen_sub3 (operands[0], operands[1], temp2));
+  DONE;
+    }
+})
+
+(define_insn "*umod3"
    [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
  (umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
    (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
    "modu %0,%1,%2"
    [(set_attr "type" "div")
     (set_attr "size" "")])
@@ -3507,7 +3543,7 @@ (define_insn "umodti3"
    [(set (match_operand:TI 0 "altivec_register_operand" "=v")
  (umod:TI (match_operand:TI 1 "altivec_register_operand" "v")
   (match_operand:TI 2 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGET_POWERPC64"
+  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
    "vmoduq %0,%1,%2"
    [(set_attr "type" "vecdiv")
     (set_attr "size" "128")])
@@ -3516,7 +3552,7 @@ (define_insn "modti3"
    [(set (match_operand:TI 0 "altivec_register_operand" "=v")
  (mod:TI (match_operand:TI 1 "altivec_register_operand" "v")
  (match_operand:TI 2 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGET_POWERPC64"
+  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
    "vmodsq %0,%1,%2"
    [(set_attr "type" "vecdiv")
     (set_attr "size" "128")])
diff --git a/gcc/testsuite/gcc.target/powerpc/clone1.c 
b/gcc/testsuite/gcc.target/powerpc/clone1.c

index c69fd2aa1b8..74323ca0e8c 100644
--- a/gcc/testsuite/gcc.target/powerpc/clone1.c
+++ b/gcc/testsuite/gcc.target/powerpc/clone1.c
@@ -21,6 +21,7 @@ long mod_func_or (long a, long b, long c)
    return mod_func (a, b) | c;
  }

-/* { dg-final { scan-assembler-times {\mdivd\M}  1 } } */
-/* { dg-final { scan-assembler-times {\mmulld\M} 1 } 

[PATCH] c++: scoped variable template-id of reference type [PR97340]

2023-05-18 Thread Patrick Palka via Gcc-patches
lookup_and_finish_template_variable calls convert_from_reference, which
means for a variable template-id of reference type the function returns
an INDIRECT_REF instead of the bare VAR_DECL.  But the downstream logic
of two callers, tsubst_qualified_id and finish_class_member_access_expr,
expect a DECL_P result and so we end up crashing when resolving the
template-id's in the first testcase.  (Note that these two callers
eventually call convert_from_reference as appropriate, so this earlier
call seems at best redundant.)

This patch fixes this by pulling out the convert_from_reference call
from lookup_and_finish_template_variable and into the callers that
actually need it, which turns out to be tsubst_copy_and_build (without
it we'd mishandle the second testcase).

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/97340

gcc/cp/ChangeLog:

* pt.cc (lookup_and_finish_template_variable): Don't call
convert_from_reference.
(tsubst_copy_and_build) : Call
convert_from_reference on the result of
lookup_and_finish_template_variable.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/var-templ80.C: New test.
* g++.dg/cpp1y/var-templ81.C: New test.
---
 gcc/cp/pt.cc |  3 ++-
 gcc/testsuite/g++.dg/cpp1y/var-templ80.C | 22 ++
 gcc/testsuite/g++.dg/cpp1y/var-templ81.C | 14 ++
 3 files changed, 38 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ80.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ81.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 351fc18b600..9e5b29f3099 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -10394,7 +10394,7 @@ lookup_and_finish_template_variable (tree templ, tree 
targs,
   complain &= ~tf_partial;
   var = finish_template_variable (var, complain);
   mark_used (var);
-  return convert_from_reference (var);
+  return var;
 }
 
 /* If the set of template parameters PARMS contains a template parameter
@@ -20462,6 +20462,7 @@ tsubst_copy_and_build (tree t,
  {
tree r = lookup_and_finish_template_variable (templ, targs,
  complain);
+   r = convert_from_reference (r);
r = maybe_wrap_with_location (r, EXPR_LOCATION (t));
RETURN (r);
  }
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ80.C 
b/gcc/testsuite/g++.dg/cpp1y/var-templ80.C
new file mode 100644
index 000..4439bee8292
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ80.C
@@ -0,0 +1,22 @@
+// PR c++/97340
+// { dg-do compile { target c++14 } }
+
+template
+struct A {
+  template
+  static constexpr const int& var = 0;
+};
+
+template
+struct B {
+  static constexpr int x1 = A::template var;
+  static constexpr int y1 = A{}.template var;
+
+  static constexpr int x2 = A::template var;
+  static constexpr int y2 = A{}.template var;
+
+  static constexpr int x3 = A::template var;
+  static constexpr int y3 = A{}.template var;
+};
+
+template struct B;
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ81.C 
b/gcc/testsuite/g++.dg/cpp1y/var-templ81.C
new file mode 100644
index 000..f9d2e6b1eed
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ81.C
@@ -0,0 +1,14 @@
+// Verify we don't ICE on an invalid use of unary * for a variable
+// template-id of reference type.
+// { dg-do compile { target c++14 } }
+
+template
+static constexpr const int& var = 0;
+
+template
+struct B {
+  static constexpr int x = *var; // { dg-error "argument of unary" }
+  static constexpr const int& y = *var; // { dg-error "argument of unary" }
+};
+
+template struct B;
-- 
2.41.0.rc0.4.g004e0f790f



Re: [PATCH 1/4] Missed opportunity to use [SU]ABD

2023-05-18 Thread Richard Sandiford via Gcc-patches
Thanks for the update.  Some of these comments would have applied
to the first version, so sorry for not catching them first time.

 writes:
> From: oluade01 
>
> This adds a recognition pattern for the non-widening
> absolute difference (ABD).
>
> gcc/ChangeLog:
>
>   * doc/md.texi (sabd, uabd): Document them.
>   * internal-fn.def (ABD): Use new optab.
>   * optabs.def (sabd_optab, uabd_optab): New optabs,
>   * tree-vect-patterns.cc (vect_recog_absolute_difference):
>   Recognize the following idiom abs (a - b).
>   (vect_recog_sad_pattern): Refactor to use
>   vect_recog_absolute_difference.
>   (vect_recog_abd_pattern): Use patterns found by
>   vect_recog_absolute_difference to build a new ABD
>   internal call.
> ---
>  gcc/doc/md.texi   |  10 ++
>  gcc/internal-fn.def   |   3 +
>  gcc/optabs.def|   2 +
>  gcc/tree-vect-patterns.cc | 255 +-
>  4 files changed, 239 insertions(+), 31 deletions(-)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 
> 07bf8bdebffb2e523f25a41f2b57e43c0276b745..3e65584d7efcd301f2c96a40edd82d30b84462b8
>  100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5778,6 +5778,16 @@ Other shift and rotate instructions, analogous to the
>  Vector shift and rotate instructions that take vectors as operand 2
>  instead of a scalar type.
>  
> +@cindex @code{uabd@var{m}} instruction pattern
> +@cindex @code{sabd@var{m}} instruction pattern
> +@item @samp{uabd@var{m}}, @samp{sabd@var{m}}
> +Signed and unsigned absolute difference instructions.  These
> +instructions find the difference between operands 1 and 2
> +then return the absolute value.  A C code equivalent would be:
> +@smallexample
> +op0 = op0 > op1 ? op0 - op1 : op1 - op0;

Should be:

  op0 = op1 > op2 ? op1 - op2 : op2 - op1;

since op0 is the output.

> +@end smallexample
> +
>  @cindex @code{avg@var{m}3_floor} instruction pattern
>  @cindex @code{uavg@var{m}3_floor} instruction pattern
>  @item @samp{avg@var{m}3_floor}
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 
> 7fe742c2ae713e7152ab05cfdfba86e4e0aa3456..0f1724ecf37a31c231572edf90b5577e2d82f468
>  100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -167,6 +167,9 @@ DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
>  DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
>  DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)
>  
> +DEF_INTERNAL_SIGNED_OPTAB_FN (ABD, ECF_CONST | ECF_NOTHROW, first,
> +   sabd, uabd, binary)
> +
>  DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_FLOOR, ECF_CONST | ECF_NOTHROW, first,
> savg_floor, uavg_floor, binary)
>  DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_CEIL, ECF_CONST | ECF_NOTHROW, first,
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 
> 695f5911b300c9ca5737de9be809fa01aabe5e01..29bc92281a2175f898634cbe6af63c18021e5268
>  100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -359,6 +359,8 @@ OPTAB_D (mask_fold_left_plus_optab, 
> "mask_fold_left_plus_$a")
>  OPTAB_D (extract_last_optab, "extract_last_$a")
>  OPTAB_D (fold_extract_last_optab, "fold_extract_last_$a")
>  
> +OPTAB_D (uabd_optab, "uabd$a3")
> +OPTAB_D (sabd_optab, "sabd$a3")
>  OPTAB_D (savg_floor_optab, "avg$a3_floor")
>  OPTAB_D (uavg_floor_optab, "uavg$a3_floor")
>  OPTAB_D (savg_ceil_optab, "avg$a3_ceil")
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 
> a49b09539776c0056e77f99b10365d0a8747fbc5..50f1822f220c023027f4b0f777965f3757842fa2
>  100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -770,6 +770,93 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info 
> stmt2_info, tree new_rhs,
>  }
>  }
>  
> +/* Look for the following pattern
> + X = x[i]
> + Y = y[i]
> + DIFF = X - Y
> + DAD = ABS_EXPR
> +
> +   ABS_STMT should point to a statement of code ABS_EXPR or ABSU_EXPR.
> +   If REJECT_UNSIGNED is true it aborts if the type of ABS_STMT is unsigned.
> +   HALF_TYPE and UNPROM will be set should the statement be found to
> +   be a widened operation.
> +   DIFF_OPRNDS will be set to the two inputs of the MINUS_EXPR preceding
> +   ABS_STMT, otherwise it will be set the operations found by
> +   vect_widened_op_tree.
> + */
> +static bool
> +vect_recog_absolute_difference (vec_info *vinfo, gassign *abs_stmt,
> + tree *half_type, bool reject_unsigned,
> + vect_unpromoted_value unprom[2],
> + tree diff_oprnds[2])
> +{
> +  if (!abs_stmt)
> +return false;
> +
> +  /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a 
> phi
> + inside the loop (in case we are analyzing an outer-loop).  */
> +  enum tree_code code = gimple_assign_rhs_code (abs_stmt);
> +  if (code != ABS_EXPR && code != ABSU_EXPR)
> +return false;
> +
> +  tree abs_oprnd = gimple_assign_rhs1 (abs_stmt);
> + 

[PATCH] c++: simplify norm_cache manipulation

2023-05-18 Thread Patrick Palka via Gcc-patches
Avoid performing two norm_cache lookups during normalization of a
concept-id by allocating and inserting a norm_entry* before rather than
after the fact, which is simpler and faster.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

* constraint.cc (normalize_concept_check): Avoid having to do
two norm_cache lookups.  Remove unnecessary early exit for an
ill-formed concept definition.
---
 gcc/cp/constraint.cc | 31 +--
 1 file changed, 9 insertions(+), 22 deletions(-)

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index c81e024e0e2..8cf0f2d0974 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -774,38 +774,25 @@ normalize_concept_check (tree check, tree args, norm_info 
info)
 
   if (!norm_cache)
 norm_cache = hash_table::create_ggc (31);
-  norm_entry entry = {tmpl, targs, NULL_TREE};
-  norm_entry **slot = nullptr;
-  hashval_t hash = 0;
-  bool insert = false;
+  norm_entry *entry = nullptr;
   if (!info.generate_diagnostics ())
 {
   /* Cache the normal form of the substituted concept-id (when not
 diagnosing).  */
-  hash = norm_hasher::hash (&entry);
-  slot = norm_cache->find_slot_with_hash (&entry, hash, NO_INSERT);
-  if (slot)
+  norm_entry elt = {tmpl, targs, NULL_TREE};
+  norm_entry **slot = norm_cache->find_slot (&elt, INSERT);
+  if (*slot)
return (*slot)->norm;
-  insert = true;
+  entry = ggc_alloc ();
+  *entry = elt;
+  *slot = entry;
 }
 
-  /* The concept may have been ill-formed.  */
   tree def = get_concept_definition (DECL_TEMPLATE_RESULT (tmpl));
-  if (def == error_mark_node)
-return error_mark_node;
-
   info.update_context (check, args);
   tree norm = normalize_expression (def, targs, info);
-  if (insert)
-{
-  /* Recompute SLOT since norm_cache may have been expanded during
-the recursive call.  */
-  slot = norm_cache->find_slot_with_hash (&entry, hash, INSERT);
-  gcc_checking_assert (!*slot);
-  entry.norm = norm;
-  *slot = ggc_alloc ();
-  **slot = entry;
-}
+  if (entry)
+entry->norm = norm;
   return norm;
 }
 
-- 
2.41.0.rc0.4.g004e0f790f



Re: [PATCH] Fix internal error on small array with negative lower bound

2023-05-18 Thread Richard Biener via Gcc-patches



> On 18.05.2023 at 19:44, Eric Botcazou wrote:
> 
> 
>> 
>> Would it be better to use
>> 
>>  wi::to_uhwi (wi::to_wide (local->index) - wi::to_wide (local->min_index))
>> 
>> to honor the actual sign of the indices?  I think nothing forbids frontends
>> to use a signed TYPE_DOMAIN here?  But the difference should be always
>> representable in an unsigned value of course.
> 
> We use tree_to_uhwi everywhere else though, see categorize_ctor_elements_1:
> 
>  if (tree_fits_uhwi_p (lo_index) && tree_fits_uhwi_p (hi_index))
>mult = (tree_to_uhwi (hi_index)
>- tree_to_uhwi (lo_index) + 1);
> 
> or store_constructor
> 
>this_node_count = (tree_to_uhwi (hi_index)
>   - tree_to_uhwi (lo_index) + 1);
> 
> so the proposed form looks better for the sake of consistency.

Ok, thanks for checking.

Richard 

> -- 
> Eric Botcazou
> 
> 


[COMMITTED] i386: Add infrastructure for QImode partial vector mult and shift operations

2023-05-18 Thread Uros Bizjak via Gcc-patches
QImode partial vector multiplications and shifts can be implemented using
their HImode counterparts.  Add infrastructure to handle V8QImode and
V4QImode vectors by extending (interleaving) their input operands to
V8HImode, performing V8HImode operation and truncating output back to
the original QImode vector.
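As a rough scalar model of the scheme (illustrative only; the real code works
on whole vectors via interleave/unpack and a truncating permutation, and the
function name here is made up):

#include <stdint.h>

/* Each byte is widened to 16 bits, the operation is performed in the
   wider mode, and only the low byte of each result is kept, mirroring
   the extend -> V8HImode op -> truncate sequence described above.  */
void
mulv8qi_model (uint8_t dst[8], const uint8_t a[8], const uint8_t b[8])
{
  for (int i = 0; i < 8; i++)
    {
      uint16_t wa = a[i];
      uint16_t wb = b[i];
      dst[i] = (uint8_t) (wa * wb);
    }
}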

The patch implements V8QImode and V4QImode multiplication for SSE2 targets,
using a generic permutation to truncate the output operand, but still taking
advantage of the VPMOVWB down-convert instruction, when available.

The patch also removes setting of the REG_EQUAL note on the last insn
of the ix86_expand_vecop_qihi expander.  This is what generic code does
automatically when a named pattern is expanded.

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_vecop_qihi_partial): New.
(ix86_expand_vecop_qihi): Add op2vec bool variable.
Do not set REG_EQUAL note.
* config/i386/i386-protos.h (ix86_expand_vecop_qihi_partial):
Add prototype.
* config/i386/i386.cc (ix86_multiplication_cost): Handle
V4QImode and V8QImode.
* config/i386/mmx.md (mulv8qi3): New expander.
(mulv4qi3): Ditto.
* config/i386/sse.md (mulv8qi3): Remove.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512vl-pr95488-1.c: Adjust
expected scan-assembler-times frequency and strings.
* gcc.target/i386/vect-mulv4qi.c: New test.
* gcc.target/i386/vect-mulv8qi.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 8a869eb3b30..d5116801498 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -23270,6 +23270,116 @@ ix86_expand_vec_shift_qihi_constant (enum rtx_code 
code,
   return true;
 }
 
+void
+ix86_expand_vecop_qihi_partial (enum rtx_code code, rtx dest, rtx op1, rtx op2)
+{
+  machine_mode qimode = GET_MODE (dest);
+  rtx qop1, qop2, hop1, hop2, qdest, hres;
+  bool op2vec = GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT;
+  bool uns_p = true;
+
+  switch (qimode)
+{
+case E_V4QImode:
+case E_V8QImode:
+  break;
+default:
+  gcc_unreachable ();
+}
+
+  qop1 = lowpart_subreg (V16QImode, force_reg (qimode, op1), qimode);
+
+  if (op2vec)
+qop2 = lowpart_subreg (V16QImode, force_reg (qimode, op2), qimode);
+  else
+qop2 = op2;
+
+  switch (code)
+{
+case MULT:
+  gcc_assert (op2vec);
+  /* Unpack data such that we've got a source byte in each low byte of
+each word.  We don't care what goes into the high byte of each word.
+Rather than trying to get zero in there, most convenient is to let
+it be a copy of the low byte.  */
+  hop1 = copy_to_reg (qop1);
+  hop2 = copy_to_reg (qop2);
+  emit_insn (gen_vec_interleave_lowv16qi (hop1, hop1, hop1));
+  emit_insn (gen_vec_interleave_lowv16qi (hop2, hop2, hop2));
+  break;
+
+case ASHIFTRT:
+  uns_p = false;
+  /* FALLTHRU */
+case ASHIFT:
+case LSHIFTRT:
+  hop1 = gen_reg_rtx (V8HImode);
+  ix86_expand_sse_unpack (hop1, qop1, uns_p, false);
+  /* vashr/vlshr/vashl  */
+  if (op2vec)
+   {
+ hop2 = gen_reg_rtx (V8HImode);
+ ix86_expand_sse_unpack (hop2, qop2, uns_p, false);
+   }
+  else
+   hop2 = qop2;
+
+  break;
+default:
+  gcc_unreachable ();
+}
+
+  if (code != MULT && op2vec)
+{
+  /* Expand vashr/vlshr/vashl.  */
+  hres = gen_reg_rtx (V8HImode);
+  emit_insn (gen_rtx_SET (hres,
+ simplify_gen_binary (code, V8HImode,
+  hop1, hop2)));
+}
+  else
+/* Expand mult/ashr/lshr/ashl.  */
+hres = expand_simple_binop (V8HImode, code, hop1, hop2,
+   NULL_RTX, 1, OPTAB_DIRECT);
+
+  if (TARGET_AVX512BW && TARGET_AVX512VL)
+{
+  if (qimode == V8QImode)
+   qdest = dest;
+  else
+   qdest = gen_reg_rtx (V8QImode);
+
+  emit_insn (gen_truncv8hiv8qi2 (qdest, hres));
+}
+  else
+{
+  struct expand_vec_perm_d d;
+  rtx qres = gen_lowpart (V16QImode, hres);
+  bool ok;
+  int i;
+
+  qdest = gen_reg_rtx (V16QImode);
+
+  /* Merge the data back into the right place.  */
+  d.target = qdest;
+  d.op0 = qres;
+  d.op1 = qres;
+  d.vmode = V16QImode;
+  d.nelt = 16;
+  d.one_operand_p = false;
+  d.testing_p = false;
+
+  for (i = 0; i < d.nelt; ++i)
+   d.perm[i] = i * 2;
+
+  ok = ix86_expand_vec_perm_const_1 (&d);
+  gcc_assert (ok);
+}
+
+  if (qdest != dest)
+emit_move_insn (dest, gen_lowpart (qimode, qdest));
+}
+
 /* Expand a vector operation CODE for a V*QImode in terms of the
same operation on V*HImode.  */
 
@@ -23281,6 +23391,7 @@ ix86_expand_vecop_qihi (enum rtx_code code, rtx dest, 
rtx op1, rtx op2)
   rtx (*gen_il) (rtx, rtx, rtx);
   rtx (*gen_ih) (rtx, rtx, rtx);
   rtx op1_l, op1_h, op2_l

Re: [PATCH] c++: simplify norm_cache manipulation

2023-05-18 Thread Jason Merrill via Gcc-patches

On 5/18/23 14:01, Patrick Palka wrote:

Avoid performing two norm_cache lookups during normalization of a
concept-id by allocating and inserting a norm_entry* before rather than
after the fact, which is simpler and faster.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


gcc/cp/ChangeLog:

* constraint.cc (normalize_concept_check): Avoid having to do
two norm_cache lookups.  Remove unnecessary early exit for an
ill-formed concept definition.
---
  gcc/cp/constraint.cc | 31 +--
  1 file changed, 9 insertions(+), 22 deletions(-)

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index c81e024e0e2..8cf0f2d0974 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -774,38 +774,25 @@ normalize_concept_check (tree check, tree args, norm_info 
info)
  
if (!norm_cache)

  norm_cache = hash_table<norm_hasher>::create_ggc (31);
-  norm_entry entry = {tmpl, targs, NULL_TREE};
-  norm_entry **slot = nullptr;
-  hashval_t hash = 0;
-  bool insert = false;
+  norm_entry *entry = nullptr;
if (!info.generate_diagnostics ())
  {
/* Cache the normal form of the substituted concept-id (when not
 diagnosing).  */
-  hash = norm_hasher::hash (&entry);
-  slot = norm_cache->find_slot_with_hash (&entry, hash, NO_INSERT);
-  if (slot)
+  norm_entry elt = {tmpl, targs, NULL_TREE};
+  norm_entry **slot = norm_cache->find_slot (&elt, INSERT);
+  if (*slot)
return (*slot)->norm;
-  insert = true;
+  entry = ggc_alloc<norm_entry> ();
+  *entry = elt;
+  *slot = entry;
  }
  
-  /* The concept may have been ill-formed.  */

tree def = get_concept_definition (DECL_TEMPLATE_RESULT (tmpl));
-  if (def == error_mark_node)
-return error_mark_node;
-
info.update_context (check, args);
tree norm = normalize_expression (def, targs, info);
-  if (insert)
-{
-  /* Recompute SLOT since norm_cache may have been expanded during
-the recursive call.  */
-  slot = norm_cache->find_slot_with_hash (&entry, hash, INSERT);
-  gcc_checking_assert (!*slot);
-  entry.norm = norm;
-  *slot = ggc_alloc<norm_entry> ();
-  **slot = entry;
-}
+  if (entry)
+entry->norm = norm;
return norm;
  }
  




Re: [PATCH] c++: scoped variable template-id of reference type [PR97340]

2023-05-18 Thread Jason Merrill via Gcc-patches

On 5/18/23 13:59, Patrick Palka wrote:

lookup_and_finish_template_variable calls convert_from_reference, which
means for a variable template-id of reference type the function returns
an INDIRECT_REF instead of the bare VAR_DECL.  But the downstream logic
of two callers, tsubst_qualified_id and finish_class_member_access_expr,
expect a DECL_P result and so we end up crashing when resolving the
template-id's in the first testcase.  (Note that these two callers
eventually call convert_from_reference as appropriate, so this earlier
call seems at best redundant.)

This patch fixes this by pulling out the convert_from_reference call
from lookup_and_finish_template_variable and into the callers that
actually need it, which turns out to be tsubst_copy_and_build (without
it we'd mishandle the second testcase).

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


PR c++/97340

gcc/cp/ChangeLog:

* pt.cc (lookup_and_finish_template_variable): Don't call
convert_from_reference.
(tsubst_copy_and_build) : Call
convert_from_reference on the result of
lookup_and_finish_template_variable.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/var-templ80.C: New test.
* g++.dg/cpp1y/var-templ81.C: New test.
---
  gcc/cp/pt.cc |  3 ++-
  gcc/testsuite/g++.dg/cpp1y/var-templ80.C | 22 ++
  gcc/testsuite/g++.dg/cpp1y/var-templ81.C | 14 ++
  3 files changed, 38 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ80.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ81.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 351fc18b600..9e5b29f3099 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -10394,7 +10394,7 @@ lookup_and_finish_template_variable (tree templ, tree 
targs,
complain &= ~tf_partial;
var = finish_template_variable (var, complain);
mark_used (var);
-  return convert_from_reference (var);
+  return var;
  }
  
  /* If the set of template parameters PARMS contains a template parameter

@@ -20462,6 +20462,7 @@ tsubst_copy_and_build (tree t,
  {
tree r = lookup_and_finish_template_variable (templ, targs,
  complain);
+   r = convert_from_reference (r);
r = maybe_wrap_with_location (r, EXPR_LOCATION (t));
RETURN (r);
  }
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ80.C 
b/gcc/testsuite/g++.dg/cpp1y/var-templ80.C
new file mode 100644
index 000..4439bee8292
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ80.C
@@ -0,0 +1,22 @@
+// PR c++/97340
+// { dg-do compile { target c++14 } }
+
+template
+struct A {
+  template
+  static constexpr const int& var = 0;
+};
+
+template
+struct B {
+  static constexpr int x1 = A::template var;
+  static constexpr int y1 = A{}.template var;
+
+  static constexpr int x2 = A::template var;
+  static constexpr int y2 = A{}.template var;
+
+  static constexpr int x3 = A::template var;
+  static constexpr int y3 = A{}.template var;
+};
+
+template struct B;
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ81.C 
b/gcc/testsuite/g++.dg/cpp1y/var-templ81.C
new file mode 100644
index 000..f9d2e6b1eed
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ81.C
@@ -0,0 +1,14 @@
+// Verify we don't ICE on an invalid use of unary * for a variable
+// template-id of reference type.
+// { dg-do compile { target c++14 } }
+
+template
+static constexpr const int& var = 0;
+
+template
+struct B {
+  static constexpr int x = *var; // { dg-error "argument of unary" }
+  static constexpr const int& y = *var; // { dg-error "argument of unary" }
+};
+
+template struct B;




Re: [PATCH 08/14] fortran: use _P() defines from tree.h

2023-05-18 Thread Mikael Morin

On 18/05/2023 at 17:18, Bernhard Reutner-Fischer wrote:

On Sun, 14 May 2023 15:10:12 +0200
Mikael Morin  wrote:


On 14/05/2023 at 01:23, Bernhard Reutner-Fischer via Gcc-patches wrote:

From: Bernhard Reutner-Fischer 

gcc/fortran/ChangeLog:

* trans-array.cc (is_pointer_array): Use _P() defines from tree.h.
(gfc_conv_scalarized_array_ref): Ditto.
(gfc_conv_array_ref): Ditto.
* trans-decl.cc (gfc_finish_decl): Ditto.
(gfc_get_symbol_decl): Ditto.
* trans-expr.cc (gfc_trans_pointer_assignment): Ditto.
(gfc_trans_arrayfunc_assign): Ditto.
(gfc_trans_assignment_1): Ditto.
* trans-intrinsic.cc (gfc_conv_intrinsic_minmax): Ditto.
(conv_intrinsic_ieee_value): Ditto.
* trans-io.cc (gfc_convert_array_to_string): Ditto.
* trans-openmp.cc (gfc_omp_is_optional_argument): Ditto.
(gfc_trans_omp_clauses): Ditto.
* trans-stmt.cc (gfc_conv_label_variable): Ditto.
* trans.cc (gfc_build_addr_expr): Ditto.
(get_array_span): Ditto.


OK from the fortran side.

Thanks


Thanks, i'll push it during the weekend.

I've fed gfortran.h into the script and found some CLASS_DATA spots,
see attached bootstrapped and tested patch.
Do we want to have that?

Some of it makes sense, but not all of it.

It is a macro to access the _data component of a class container.
So for class-related stuff it makes sense to use CLASS_DATA, and 
typically there will be a check that the type is BT_CLASS before.
But for cases where we loop over all of the components of a type that is 
not necessarily a class container, it doesn't make sense to use CLASS_DATA.
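
To make the distinction concrete, a rough sketch against the gfortran
internals (illustrative fragments only, not code from this patch; e, sym,
corank and as stand for the usual expression/symbol objects):

```c
/* Class container: the BT_CLASS check is what makes CLASS_DATA
   legitimate here.  */
if (e->ts.type == BT_CLASS && CLASS_DATA (e) && CLASS_DATA (e)->as)
  corank = CLASS_DATA (e)->as->corank;

/* Generic walk over the components of a type that need not be a class
   container: keep using the components list directly.  */
for (gfc_component *c = sym->components; c; c = c->next)
  resolve_component (c, sym);
```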


So I suggest to only keep the following hunks.



diff --git a/gcc/fortran/expr.cc b/gcc/fortran/expr.cc
index aa01a4d3d22..a6b4ef0a0bf 100644
--- a/gcc/fortran/expr.cc
+++ b/gcc/fortran/expr.cc
@@ -5847,9 +5847,9 @@ gfc_get_corank (gfc_expr *e)
   if (!gfc_is_coarray (e))
 return 0;
 
-  if (e->ts.type == BT_CLASS && e->ts.u.derived->components)

-corank = e->ts.u.derived->components->as
-? e->ts.u.derived->components->as->corank : 0;
+  if (e->ts.type == BT_CLASS && CLASS_DATA (e))
+corank = CLASS_DATA (e)->as
+? CLASS_DATA (e)->as->corank : 0;
   else
 corank = e->symtree->n.sym->as ? e->symtree->n.sym->as->corank : 0;
 
diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc

index 9c92958a397..6e26fb07ddd 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -15176,7 +15176,7 @@ resolve_component (gfc_component *c, gfc_symbol *sym)
   /* Check type-spec if this is not the parent-type component.  */
   if (((sym->attr.is_class
 && (!sym->components->ts.u.derived->attr.extension
-|| c != sym->components->ts.u.derived->components))
+   || c != CLASS_DATA (sym->components)))
|| (!sym->attr.is_class
&& (!sym->attr.extension || c != sym->components)))
   && !sym->attr.vtype
@@ -15189,7 +15189,7 @@ resolve_component (gfc_component *c, gfc_symbol *sym)
  component.  */
   if (super_type
   && ((sym->attr.is_class
-   && c == sym->components->ts.u.derived->components)
+  && c == CLASS_DATA (sym->components))
   || (!sym->attr.is_class && c == sym->components))
   && strcmp (super_type->name, c->name) == 0)
 c->attr.access = super_type->attr.access;
@@ -15435,7 +15435,7 @@ resolve_fl_derived0 (gfc_symbol *sym)
   return false;
 }
 
-  c = (sym->attr.is_class) ? sym->components->ts.u.derived->components

+  c = (sym->attr.is_class) ? CLASS_DATA (sym->components)
   : sym->components;
 
   success = true;

diff --git a/gcc/fortran/simplify.cc b/gcc/fortran/simplify.cc
index a7b4784d73a..6ba2040e61c 100644
--- a/gcc/fortran/simplify.cc
+++ b/gcc/fortran/simplify.cc
@@ -3116,28 +3116,28 @@ gfc_simplify_extends_type_of (gfc_expr *a, gfc_expr 
*mold)
   /* Return .false. if the dynamic type can never be an extension.  */
   if ((a->ts.type == BT_CLASS && mold->ts.type == BT_CLASS
&& !gfc_type_is_extension_of
-   (mold->ts.u.derived->components->ts.u.derived,
-a->ts.u.derived->components->ts.u.derived)
+   (CLASS_DATA (mold)->ts.u.derived,
+CLASS_DATA (a)->ts.u.derived)
&& !gfc_type_is_extension_of
-   (a->ts.u.derived->components->ts.u.derived,
-mold->ts.u.derived->components->ts.u.derived))
+   (CLASS_DATA (a)->ts.u.derived,
+CLASS_DATA (mold)->ts.u.derived))
   || (a->ts.type == BT_DERIVED && mold->ts.type == BT_CLASS
  && !gfc_type_is_extension_of
-   (mold->ts.u.derived->components->ts.u.derived,
+   (CLASS_DATA (mold)->ts.u.derived,
 a->ts.u.derived))
   || (a->ts.type == BT_CLASS && mold->ts.type == BT_DERIVED
  && 

Re: [PATCH v2] Fortran: Narrow return types [PR78798]

2023-05-18 Thread Bernhard Reutner-Fischer via Gcc-patches
On Sun, 14 May 2023 14:27:42 +0200
Mikael Morin  wrote:

> On 10/05/2023 at 18:47, Bernhard Reutner-Fischer via Fortran wrote:
> > From: Bernhard Reutner-Fischer 
> > 
> > gcc/fortran/ChangeLog:
> > 
> > PR fortran/78798
> > * array.cc (compare_bounds): Use narrower return type.
> > (gfc_compare_array_spec): Likewise.
> > (is_constant_element): Likewise.
> > (gfc_constant_ac): Likewise.  
> (...)
> > ---
> > Bootstrapped without new warnings and regression tested on
> > x86_64-linux with no regressions, OK for trunk?
> >   
> (...)
> > diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc
> > index b348bda6e6c..4e3aed84b9d 100644
> > --- a/gcc/fortran/check.cc
> > +++ b/gcc/fortran/check.cc
> > @@ -1156,7 +1156,7 @@ dim_rank_check (gfc_expr *dim, gfc_expr *array, int 
> > allow_assumed)
> >  dimension bi, returning 0 if they are known not to be identical,
> >  and 1 if they are identical, or if this cannot be determined.  */
> >   
> > -static int
> > +static bool
> >   identical_dimen_shape (gfc_expr *a, int ai, gfc_expr *b, int bi)
> >   {
> > mpz_t a_size, b_size;  
> 
> To be consistent, please change as well the local variable "ret" used as 
> return value from int to bool.
> 
> > diff --git a/gcc/fortran/cpp.cc b/gcc/fortran/cpp.cc
> > index c3b7c7f7bd9..d7890a97287 100644
> > --- a/gcc/fortran/cpp.cc
> > +++ b/gcc/fortran/cpp.cc
> > @@ -297,7 +297,7 @@ gfc_cpp_init_options (unsigned int 
> > decoded_options_count,
> > gfc_cpp_option.deferred_opt_count = 0;
> >   }
> >   
> > -int
> > +bool
> >   gfc_cpp_handle_option (size_t scode, const char *arg, int value 
> > ATTRIBUTE_UNUSED)
> >   {
> > int result = 1;  
> 
> Same here, change the type of variable "result".
> 
> (...)
> > diff --git a/gcc/fortran/dependency.cc b/gcc/fortran/dependency.cc
> > index a648d5c7903..b398b29a642 100644
> > --- a/gcc/fortran/dependency.cc
> > +++ b/gcc/fortran/dependency.cc  
> (...)
> 
> > @@ -1091,7 +1091,7 @@ gfc_check_argument_dependency (gfc_expr *other, 
> > sym_intent intent,
> >   /* Like gfc_check_argument_dependency, but check all the arguments in 
> > ACTUAL.
> >  FNSYM is the function being called, or NULL if not known.  */
> >   
> > -int
> > +bool
> >   gfc_check_fncall_dependency (gfc_expr *other, sym_intent intent,
> >  gfc_symbol *fnsym, gfc_actual_arglist *actual,
> >  gfc_dep_check elemental)  
> 
> Why not change the associated subfunctions 
> (gfc_check_argument_dependency, gfc_check_argument_var_dependency) as well ?

I have left these subfunctions alone for now to get the other hunks out
of the way. I have adjusted the patch according to your other comments
and pushed it as r14-973-gc072df1ab14450.

Thanks!

> 
> (...)
> > @@ -2098,7 +2098,7 @@ ref_same_as_full_array (gfc_ref *full_ref, gfc_ref 
> > *ref)
> > there is some kind of overlap.
> > 0 : array references are identical or not overlapping.  */
> >   
> > -int
> > +bool
> >   gfc_dep_resolver (gfc_ref *lref, gfc_ref *rref, gfc_reverse *reverse,
> >   bool identical)
> >   {  
> 
> The function comment states that the function may return 2, which 
> doesn't seem to be the case any more.  So please update the comment.
> 
> (...)> diff --git a/gcc/fortran/symbol.cc b/gcc/fortran/symbol.cc
> > index 221165d6dac..b4b36e27d75 100644
> > --- a/gcc/fortran/symbol.cc
> > +++ b/gcc/fortran/symbol.cc
> > @@ -3216,7 +3216,7 @@ gfc_find_symtree_in_proc (const char* name, 
> > gfc_namespace* ns)
> >  any parent namespaces if requested by a nonzero parent_flag.
> >  Returns nonzero if the name is ambiguous.  */
> >   
> > -int
> > +bool
> >   gfc_find_sym_tree (const char *name, gfc_namespace *ns, int parent_flag,
> >gfc_symtree **result)
> >   {  
> 
> Maybe change nonzero to true in the comment?
> 
> (...)
> 
> OK with all the above fixed.
> 
> Thanks.
> 



Re: [PATCH 01/14] ada: use _P() defines from tree.h

2023-05-18 Thread Bernhard Reutner-Fischer via Gcc-patches
On Mon, 15 May 2023 12:05:10 +0200
Eric Botcazou  wrote:

> > && DECL_RETURN_VALUE_P (inner))
> > diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
> > index 0c4f8b90c8e..460ef6f1f01 100644
> > --- a/gcc/ada/gcc-interface/utils.cc
> > +++ b/gcc/ada/gcc-interface/utils.cc
> > @@ -1966,7 +1966,7 @@ finish_record_type (tree record_type, tree field_list,
> > int rep_level, bool debug_info_p)
> >  {
> >const enum tree_code orig_code = TREE_CODE (record_type);
> > -  const bool had_size = TYPE_SIZE (record_type) != NULL_TREE;
> > +  const bool had_size = COMPLETE_TYPE_P (record_type);
> >const bool had_align = TYPE_ALIGN (record_type) > 0;
> >/* For all-repped records with a size specified, lay the QUAL_UNION_TYPE
> >   out just like a UNION_TYPE, since the size will be fixed.  */  
> 
> This one is not an improvement but more of a coincidence; the rest is OK.
> 

I've dropped this hunk and installed the rest as
r14-974-g04682fe764004b.
Thanks!


Re: [PATCH 01/14] ada: use _P() defines from tree.h

2023-05-18 Thread Bernhard Reutner-Fischer via Gcc-patches
On Sun, 14 May 2023 17:03:55 -0600
Jeff Law  wrote:

> On 5/13/23 17:23, Bernhard Reutner-Fischer via Gcc-patches wrote:
> > From: Bernhard Reutner-Fischer 
> > 
> > gcc/ada/ChangeLog:
> > 
> > * gcc-interface/decl.cc (gnat_to_gnu_entity): Use _P defines

> The series as a whole is OK.

Thanks.
I've dropped the go and rust hunks and installed the rest (with tweaks
as requested) as r14-974-g04682fe764004b .. r14-985-gca2007a9bb3074


[PATCH] RISC-V: improve codegen for large constants with same 32-bit lo and hi parts [2]

2023-05-18 Thread Vineet Gupta
[part #2 of PR/109279]

SPEC2017 deepsjeng uses large constants which currently generate less than
ideal code. This fix improves codegen for large constants which have the
same low and high parts, e.g.

long long f(void) { return 0x0101010101010101ull; }

Before
	li	a5,0x101
	addi	a5,a5,0x101
	mv	a0,a5
	slli	a5,a5,32
	add	a0,a5,a0
	ret

With patch
	li	a5,0x101
	addi	a5,a5,0x101
	slli	a0,a5,32
	add	a0,a0,a5
	ret
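
As a plain-C sanity check of the identity the patch exploits (my
illustration, not part of the patch): when the two 32-bit halves are
equal, the whole constant is just the low half shifted up and added to
itself.

```c
unsigned long long
build_const (void)
{
  /* Both 32-bit halves of 0x0101010101010101 are 0x01010101.  */
  unsigned long long lo = 0x01010101ull;
  return (lo << 32) + lo;   /* == 0x0101010101010101 */
}
```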

This is testsuite clean.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_split_integer): If loval is equal
  to hival, ASHIFT the corresponding regs.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.cc | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 79122699b6f5..4e1bb2f14cf8 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -703,13 +703,18 @@ riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
   unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
   rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
 
-  riscv_move_integer (hi, hi, hival, mode);
   riscv_move_integer (lo, lo, loval, mode);
 
-  hi = gen_rtx_fmt_ee (ASHIFT, mode, hi, GEN_INT (32));
-  hi = force_reg (mode, hi);
+  if (loval == hival)
+  hi = gen_rtx_ASHIFT (mode, lo, GEN_INT (32));
+  else
+{
+  riscv_move_integer (hi, hi, hival, mode);
+  hi = gen_rtx_ASHIFT (mode, hi, GEN_INT (32));
+}
 
-  return gen_rtx_fmt_ee (PLUS, mode, hi, lo);
+  hi = force_reg (mode, hi);
+  return gen_rtx_PLUS (mode, hi, lo);
 }
 
 /* Return true if X is a thread-local symbol.  */
-- 
2.34.1



[PATCH] c-family: implement -ffp-contract=on

2023-05-18 Thread Alexander Monakov via Gcc-patches
Implement -ffp-contract=on for C and C++ without changing default
behavior (=off for -std=cNN, =fast for C++ and -std=gnuNN).
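
To illustrate the semantics being implemented (my own example, not from
the patch): under -ffp-contract=on a multiply-add written in one
expression may be contracted into a single fused operation, but
contraction must not cross a statement boundary.

```c
double fused (double a, double b, double c)
{
  return a * b + c;      /* may become a single fma under =on */
}

double not_fused (double a, double b, double c)
{
  double t = a * b;      /* separate statements: no contraction */
  return t + c;          /* across this boundary under =on      */
}
```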

gcc/c-family/ChangeLog:

* c-gimplify.cc (fma_supported_p): New helper.
(c_gimplify_expr) [PLUS_EXPR, MINUS_EXPR]: Implement FMA
contraction.

gcc/ChangeLog:

* common.opt (fp_contract_mode) [on]: Remove fallback.
* config/sh/sh.md (*fmasf4): Correct flag_fp_contract_mode test.
* doc/invoke.texi (-ffp-contract): Update.
* trans-mem.cc (diagnose_tm_1): Skip internal function calls.
---
 gcc/c-family/c-gimplify.cc | 78 ++
 gcc/common.opt |  3 +-
 gcc/config/sh/sh.md|  2 +-
 gcc/doc/invoke.texi|  8 ++--
 gcc/trans-mem.cc   |  3 ++
 5 files changed, 88 insertions(+), 6 deletions(-)

diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
index ef5c7d919f..f7635d3b0c 100644
--- a/gcc/c-family/c-gimplify.cc
+++ b/gcc/c-family/c-gimplify.cc
@@ -41,6 +41,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "c-ubsan.h"
 #include "tree-nested.h"
 #include "context.h"
+#include "tree-pass.h"
+#include "internal-fn.h"
 
 /*  The gimplification pass converts the language-dependent trees
 (ld-trees) emitted by the parser into language-independent trees
@@ -686,6 +688,14 @@ c_build_bind_expr (location_t loc, tree block, tree body)
   return bind;
 }
 
+/* Helper for c_gimplify_expr: test if target supports fma-like FN.  */
+
+static bool
+fma_supported_p (enum internal_fn fn, tree type)
+{
+  return direct_internal_fn_supported_p (fn, type, OPTIMIZE_FOR_BOTH);
+}
+
 /* Gimplification of expression trees.  */
 
 /* Do C-specific gimplification on *EXPR_P.  PRE_P and POST_P are as in
@@ -739,6 +749,74 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p 
ATTRIBUTE_UNUSED,
break;
   }
 
+case PLUS_EXPR:
+case MINUS_EXPR:
+  {
+   tree type = TREE_TYPE (*expr_p);
+   /* For -ffp-contract=on we need to attempt FMA contraction only
+  during initial gimplification.  Late contraction across statement
+  boundaries would violate language semantics.  */
+   if (SCALAR_FLOAT_TYPE_P (type)
+   && flag_fp_contract_mode == FP_CONTRACT_ON
+   && cfun && !(cfun->curr_properties & PROP_gimple_any)
+   && fma_supported_p (IFN_FMA, type))
+ {
+   bool neg_mul = false, neg_add = code == MINUS_EXPR;
+
+   tree *op0_p = &TREE_OPERAND (*expr_p, 0);
+   tree *op1_p = &TREE_OPERAND (*expr_p, 1);
+
+   /* Look for ±(x * y) ± z, swapping operands if necessary.  */
+   if (TREE_CODE (*op0_p) == NEGATE_EXPR
+   && TREE_CODE (TREE_OPERAND (*op0_p, 0)) == MULT_EXPR)
+ /* '*EXPR_P' is '-(x * y) ± z'.  This is fine.  */;
+   else if (TREE_CODE (*op0_p) != MULT_EXPR)
+ {
+   std::swap (op0_p, op1_p);
+   std::swap (neg_mul, neg_add);
+ }
+   if (TREE_CODE (*op0_p) == NEGATE_EXPR)
+ {
+   op0_p = &TREE_OPERAND (*op0_p, 0);
+   neg_mul = !neg_mul;
+ }
+   if (TREE_CODE (*op0_p) != MULT_EXPR)
+ break;
+   auto_vec<tree, 3> ops (3);
+   ops.quick_push (TREE_OPERAND (*op0_p, 0));
+   ops.quick_push (TREE_OPERAND (*op0_p, 1));
+   ops.quick_push (*op1_p);
+
+   enum internal_fn ifn = IFN_FMA;
+   if (neg_mul)
+ {
+   if (fma_supported_p (IFN_FNMA, type))
+ ifn = IFN_FNMA;
+   else
+ ops[0] = build1 (NEGATE_EXPR, type, ops[0]);
+ }
+   if (neg_add)
+ {
+   enum internal_fn ifn2 = ifn == IFN_FMA ? IFN_FMS : IFN_FNMS;
+   if (fma_supported_p (ifn2, type))
+ ifn = ifn2;
+   else
+ ops[2] = build1 (NEGATE_EXPR, type, ops[2]);
+ }
+   for (auto &&op : ops)
+ if (gimplify_expr (&op, pre_p, post_p, is_gimple_val, fb_rvalue)
+ == GS_ERROR)
+   return GS_ERROR;
+
+   gcall *call = gimple_build_call_internal_vec (ifn, ops);
+   gimple_seq_add_stmt_without_update (pre_p, call);
+   *expr_p = create_tmp_var (type);
+   gimple_call_set_lhs (call, *expr_p);
+   return GS_ALL_DONE;
+ }
+   break;
+  }
+
 default:;
 }
 
diff --git a/gcc/common.opt b/gcc/common.opt
index a28ca13385..3daec85aef 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1662,9 +1662,8 @@ Name(fp_contract_mode) Type(enum fp_contract_mode) 
UnknownError(unknown floating
 EnumValue
 Enum(fp_contract_mode) String(off) Value(FP_CONTRACT_OFF)
 
-; Not implemented, fall back to conservative FP_CONTRACT_OFF.
 EnumValue
-Enum(fp_contract_mode) String(on) Value(FP_CONTRACT_OFF)
+Enum(fp_contract_mode) String(on) Value(FP_CONTRACT_ON)

[PATCH v2] rs6000: Add builtin for mffscrn instructions

2023-05-18 Thread Carl Love via Gcc-patches
GCC maintainers:

version 2.  Fixed an issue with the test case.  The dg-options line was
missing.

The following patch adds an overloaded builtin.  There are two possible
arguments for the builtin.  The builtin definitions are:

  double __builtin_mffscrn (unsigned long int);
  double __builtin_mffscrn (double);
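
For illustration, a rough usage sketch of the new builtin (my own
example, not part of the patch; it assumes a Power9-or-later target with
the patch applied, and follows the semantics described in the testcase
below: each call installs the new RN and returns the control bits that
were in effect before the call):

```c
#include <stdio.h>

int
main (void)
{
  union { double d; unsigned long long u; } v;

  /* Install RN = 0b10, then RN = 0b01; the second call hands back the
     FPSCR control bits as set by the first call.  */
  v.d = __builtin_mffscrn (2UL);
  v.d = __builtin_mffscrn (1UL);
  printf ("previous RN = %llu\n", v.u & 0x3);
  return 0;
}
```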

The patch has been tested on Power 10 with no regressions.  

Please let me know if the patch is acceptable for mainline.  Thanks.

Carl


rs6000: Add builtin for mffscrn instructions

This patch adds an overloaded __builtin_mffscrn for the Move From FPSCR
Control & Set RN instruction with an immediate argument.  It also adds the
builtin with a floating point register argument.  A new runnable test is
added for the new builtin.

gcc/

* config/rs6000/rs6000-builtins.def (__builtin_mffscrni,
__builtin_mffscrnd): Add builtin definitions.
* config/rs6000/rs6000-overload.def (__builtin_mffscrn): Add
overloaded definition.
* doc/extend.texi: Add documentation for __builtin_mffscrn.

gcc/testsuite/

* gcc.target/powerpc/builtin-mffscrn.c: Add testcase for new
builtin.
---
 gcc/config/rs6000/rs6000-builtins.def |   7 ++
 gcc/config/rs6000/rs6000-overload.def |   5 +
 gcc/doc/extend.texi   |   8 ++
 .../gcc.target/powerpc/builtin-mffscrn.c  | 106 ++
 4 files changed, 126 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 92d9b46e1b9..67125473684 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2875,6 +2875,13 @@
   pure vsc __builtin_vsx_xl_len_r (void *, signed long);
 XL_LEN_R xl_len_r {}
 
+; Immediate instruction only uses the least significant two bits of the
+; const int.
+  double __builtin_mffscrni (const int<2>);
+MFFSCRNI rs6000_mffscrni {}
+
+  double __builtin_mffscrnd (double);
+MFFSCRNF rs6000_mffscrn {}
 
 ; Builtins requiring hardware support for IEEE-128 floating-point.
 [ieee128-hw]
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index c582490c084..adda2df69ea 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -78,6 +78,11 @@
 ; like after a required newline, but nowhere else.  Lines beginning with
 ; a semicolon are also treated as blank lines.
 
+[MFFSCR, __builtin_mffscrn, __builtin_mffscrn]
+  double __builtin_mffscrn (const int<2>);
+MFFSCRNI
+  double __builtin_mffscrn (double);
+MFFSCRNF
 
 [BCDADD, __builtin_bcdadd, __builtin_vec_bcdadd]
   vsq __builtin_vec_bcdadd (vsq, vsq, const int);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ed8b9c8a87b..f16c046051a 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -18455,6 +18455,9 @@ int __builtin_dfp_dtstsfi_ov_td (unsigned int 
comparison, _Decimal128 value);
 
 double __builtin_mffsl(void);
 
+double __builtin_mffscrn (unsigned long int);
+double __builtin_mffscrn (double);
+
 @end smallexample
 The @code{__builtin_byte_in_set} function requires a
 64-bit environment supporting ISA 3.0 or later.  This function returns
@@ -18511,6 +18514,11 @@ the FPSCR.  The instruction is a lower latency version 
of the @code{mffs}
 instruction.  If the @code{mffsl} instruction is not available, then the
 builtin uses the older @code{mffs} instruction to read the FPSCR.
 
+The @code{__builtin_mffscrn} returns the contents of the control bits in the
+FPSCR, bits 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, RN).  The
+contents of bits [62:63] of the unsigned long int or double argument are placed
+into bits [62:63] of the FPSCR (RN).
+
 @node Basic PowerPC Built-in Functions Available on ISA 3.1
 @subsubsection Basic PowerPC Built-in Functions Available on ISA 3.1
 
diff --git a/gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c 
b/gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c
new file mode 100644
index 000..26c666a4091
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c
@@ -0,0 +1,106 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p9vector_hw } */
+/* { dg-options "-mpower9-vector -mdejagnu-cpu=power9" } */
+
+#include 
+
+#ifdef DEBUG
+#include 
+#endif
+
+#define MASK 0x3
+#define EXPECTED1 0x1
+#define EXPECTED2 0x2
+
+void abort (void);
+
+int
+main()
+{
+  unsigned long mask, result, expected;
+  double double_arg;
+  
+  union convert_t {
+double d;
+unsigned long ul;
+  } val;
+
+  /* Test immediate version of __builtin_mffscrn. */
+  /* Read FPSCR and set RN bits in FPSCR[62:63]. */
+  val.d = __builtin_mffscrn (EXPECTED2);
+
+  /* Read FPSCR, bits [62:63] should have been set to 0x2 by previous builtin
+ call.  */
+  val.d = __builtin_mffscrn (EXPECTED1);
+  /* The expected result is the 

Re: [PATCH] rs6000: Fix __builtin_vec_xst_trunc definition

2023-05-18 Thread Peter Bergner via Gcc-patches
On 5/10/23 1:06 PM, Carl Love wrote:
> -  void __builtin_altivec_tr_stxvrhx (vsq, signed long, signed int *);
> +  void __builtin_altivec_tr_stxvrhx (vsq, signed long, signed short *);
>  TR_STXVRHX vsx_stxvrhx {stvec}
>  
> -  void __builtin_altivec_tr_stxvrwx (vsq, signed long, signed short *);
> +  void __builtin_altivec_tr_stxvrwx (vsq, signed long, signed int *);
>  TR_STXVRWX vsx_stxvrwx {stvec}

In my estimation, these two changes are "obvious" fixes.




> +  void __builtin_vec_xst_trunc (vsq, signed long long, signed long *);
> +TR_STXVRLX  TR_STXVRLX_S
> +  void __builtin_vec_xst_trunc (vuq, signed long long, unsigned long *);
> +TR_STXVRLX  TR_STXVRLX_U

Not a comment on these two changes, and not a request to expand this
specific patch, but I believe I saw other built-ins that were missing
signed long */unsigned long * versions where they could/should accept
them.  Can you double-check whether there are other built-ins that
need similar changes and if so, please post a separate patch to fix
those as well?  Thanks.

Peter



Re: [PATCH] rs6000: Fix __builtin_vec_xst_trunc definition

2023-05-18 Thread Carl Love via Gcc-patches
Peter:

On Thu, 2023-05-18 at 16:28 -0500, Peter Bergner wrote:
> 



> 
> > +  void __builtin_vec_xst_trunc (vsq, signed long long, signed long
> > *);
> > +TR_STXVRLX  TR_STXVRLX_S
> > +  void __builtin_vec_xst_trunc (vuq, signed long long, unsigned
> > long *);
> > +TR_STXVRLX  TR_STXVRLX_U
> 
> Not a comment on these two changes, and not a request to expand this
> specific patch, but I believe I saw other built-ins that were missing
> signed long */unsigned long * versions where they could/should accept
> them.  Can you double-check whether there are other built-ins that
> need similar changes and if so, please post a separate patch to fix
> those as well?  Thanks.

OK, I will put that on my to do list to go look for that in other
builtins.  

 Carl 



Re: [PING] [C PATCH] Fix ICEs related to VM types in C [PR106465, PR107557, PR108423, PR109450]

2023-05-18 Thread Joseph Myers
On Thu, 18 May 2023, Martin Uecker via Gcc-patches wrote:

> +  /* we still have to evaluate size expressions */

Comments should start with a capital letter and end with ".  ".

> diff --git a/gcc/testsuite/gcc.dg/nested-vla-1.c 
> b/gcc/testsuite/gcc.dg/nested-vla-1.c
> new file mode 100644
> index 000..408a68524d8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/nested-vla-1.c
> @@ -0,0 +1,37 @@
> +/* { dg-do run } */
> +/* { dg-options "-std=gnu99" } */

I'm concerned with various undefined behavior in this and other tests; 
they look very fragile, relying on some optimizations and not others 
taking place.  I think they should be adjusted to avoid undefined behavior 
if all the evaluations from the abstract machine (in particular, of sizeof 
operands with variable size) take place, and other undefined behavior from 
calling functions through function pointers with incompatible type.

> + struct bar { char x[++n]; } (*bar2)(void) = bar;/* { dg-warning 
> "incompatible pointer type" } */
> +
> + if (2 != n)
> + __builtin_abort();
> +
> + if (2 != sizeof((*bar2)()))
> + __builtin_abort();

You're relying on the compiler not noticing that a function is being 
called through an incompatible type and thus not turning the call (which 
should be evaluated, because the operand of sizeof has a type with 
variable size) into a call to abort.

> diff --git a/gcc/testsuite/gcc.dg/nested-vla-2.c 
> b/gcc/testsuite/gcc.dg/nested-vla-2.c
> new file mode 100644
> index 000..504eec48c80
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/nested-vla-2.c
> @@ -0,0 +1,33 @@
> +/* { dg-do run } */
> +/* { dg-options "-std=gnu99" } */
> +
> +
> +int main()
> +{
> + int n = 1;
> +
> + typeof(char (*)[++n]) bar(void) { }
> +
> + if (2 != n)
> + __builtin_abort();
> +
> + if (2 != sizeof(*bar()))
> + __builtin_abort();

In this test, *bar() is evaluated, i.e. an undefined pointer is 
dereferenced; it would be better to return a valid pointer to a 
sufficiently large array to avoid that undefined behavior.

> diff --git a/gcc/testsuite/gcc.dg/pr106465.c b/gcc/testsuite/gcc.dg/pr106465.c
> new file mode 100644
> index 000..b03e2442f12
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr106465.c
> @@ -0,0 +1,86 @@
> +/* PR c/106465
> + * { dg-do run }
> + * { dg-options "-std=gnu99" }
> + * */
> +
> +int main()
> +{
> + int n = 3;
> + 
> + void g1(int m, struct { char p[++m]; }* b)  /* { dg-warning 
> "anonymous struct" } */
> + {
> + if (3 != m)
> + __builtin_abort();
> +
> + if (3 != sizeof(b->p))
> + __builtin_abort();
> + }

> + g1(2, (void*)0);

Similarly, this is dereferencing a null pointer in the evaluated operand 
of sizeof.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: RISC-V Test Errors and Failures

2023-05-18 Thread Vineet Gupta




On 5/17/23 00:52, Andreas Schwab wrote:

On Mai 16 2023, Vineet Gupta wrote:


Yes I was seeing similar tcl errors and such - and in my case an even
higher count.

They are coming from commit d6654a4be3b.



As of a726d007f197 today I get a gazillion splats for riscv multilib
DejaGnu runs and over 5k fails


ERROR: torture-init: torture_without_loops is not empty as expected
ERROR: tcl error code NONE
...
...
   = Summary of gcc testsuite =
                            | # of unexpected case / # of unique unexpected case
                            |   gcc    |   g++    | gfortran     |
 rv64imafdc/  lp64d/ medlow | 5033 / 4 |    1 / 1 |   72 /    12 |
 rv32imafdc/ ilp32d/ medlow | 5032 / 3 |    3 / 2 |   72 /    12 |
   rv32imac/  ilp32/ medlow |    1 / 1 |    3 / 2 |  109 /    19 |
   rv64imac/   lp64/ medlow | 5034 / 5 |    1 / 1 |  109 /    19 |

For a non multilib run things are sane:

   = Summary of gcc testsuite =
                            | # of unexpected case / # of unique unexpected case
                            |   gcc    |   g++    | gfortran     |
 rv64imafdc/  lp64d/ medlow |   11 / 4 |    1 / 1 |   72 /    12 |

It is really hard to test anything on upstream ATM.

-Vineet


[committed] c: Do not allow thread-local tentative definitions for C2x

2023-05-18 Thread Joseph Myers
C2x makes it clear that thread-local declarations can never be
tentative definitions (the legacy feature of C where you can e.g. do
"int i;" more than once at file scope, possibly with one of the
declarations initialized, and it counts as exactly one definition),
but are always definitions in the absence of "extern".  The wording
about external definitions was unclear in the thread-local case in C11
/ C17 (both about what counts as a tentative definition, and what is a
"definition" at all), not having been updated to cover the addition of
thread-local storage.
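
For contrast, a minimal sketch of the legacy tentative-definition
behaviour for ordinary (non-thread-local) file-scope objects (my example,
not from the patch):

```c
int i;        /* tentative definition                */
int i;        /* still only a tentative definition   */
int i = 1;    /* the single external definition of i */
```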

Implement this C2x requirement.  Arguably this is a defect fix that
would be appropriate to apply for all standard versions, but for now
the change is conditional on flag_isoc2x (however, it doesn't handle
_Thread_local / thread_local any different from GNU __thread).  Making
the change unconditional results in various TLS tests failing to
compile (gcc.dg/c11-thread-local-1.c gcc.dg/tls/thr-init-1.c
gcc.dg/tls/thr-init-2.c gcc.dg/torture/tls/thr-init-2.c
objc.dg/torture/tls/thr-init.m), though it's not clear if those tests
reflect any real code similarly trying to make use of thread-local
tentative definitions.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/c/
* c-decl.cc (diagnose_mismatched_decls): Do not handle
thread-local declarations as tentative definitions for C2x.
(finish_decl): Do not allow thread-local definition with
incomplete type for C2x.

gcc/testsuite/
* gcc.dg/c2x-thread-local-2.c: New test.

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 945e45bff89..b5b491cf2da 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -2442,8 +2442,20 @@ diagnose_mismatched_decls (tree newdecl, tree olddecl,
  return false;
}
 
-  /* Multiple initialized definitions are not allowed (6.9p3,5).  */
-  if (DECL_INITIAL (newdecl) && DECL_INITIAL (olddecl))
+  /* Multiple initialized definitions are not allowed (6.9p3,5).
+For this purpose, C2x makes it clear that thread-local
+declarations without extern are definitions, not tentative
+definitions, whether or not they have initializers.  The
+wording before C2x was unclear; literally it would have made
+uninitialized thread-local declarations into tentative
+definitions only if they also used static, but without saying
+explicitly whether or not other cases count as
+definitions at all.  */
+  if ((DECL_INITIAL (newdecl) && DECL_INITIAL (olddecl))
+ || (flag_isoc2x
+ && DECL_THREAD_LOCAL_P (newdecl)
+ && !DECL_EXTERNAL (newdecl)
+ && !DECL_EXTERNAL (olddecl)))
{
  auto_diagnostic_group d;
  error ("redefinition of %q+D", newdecl);
@@ -5714,10 +5726,12 @@ finish_decl (tree decl, location_t init_loc, tree init,
  /* A static variable with an incomplete type
 is an error if it is initialized.
 Also if it is not file scope.
+Also if it is thread-local (in C2x).
 Otherwise, let it through, but if it is not `extern'
 then it may cause an error message later.  */
  ? (DECL_INITIAL (decl) != NULL_TREE
-|| !DECL_FILE_SCOPE_P (decl))
+|| !DECL_FILE_SCOPE_P (decl)
+|| (flag_isoc2x && DECL_THREAD_LOCAL_P (decl)))
  /* An automatic variable with an incomplete type
 is an error.  */
  : !DECL_EXTERNAL (decl)))
diff --git a/gcc/testsuite/gcc.dg/c2x-thread-local-2.c 
b/gcc/testsuite/gcc.dg/c2x-thread-local-2.c
new file mode 100644
index 000..d199ff23848
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c2x-thread-local-2.c
@@ -0,0 +1,40 @@
+/* Test that thread-local declarations are not considered tentative definitions
+   in C2x.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c2x -pedantic-errors" } */
+
+thread_local int a; /* { dg-message "previous" } */
+thread_local int a; /* { dg-error "redefinition" } */
+
+static thread_local int b; /* { dg-message "previous" } */
+static thread_local int b; /* { dg-error "redefinition" } */
+
+thread_local int c; /* { dg-message "previous" } */
+thread_local int c = 1; /* { dg-error "redefinition" } */
+
+static thread_local int d; /* { dg-message "previous" } */
+static thread_local int d = 1; /* { dg-error "redefinition" } */
+
+thread_local int e = 1; /* { dg-message "previous" } */
+thread_local int e; /* { dg-error "redefinition" } */
+
+static thread_local int f = 1; /* { dg-message "previous" } */
+static thread_local int f; /* { dg-error "redefinition" } */
+
+/* Not being a tentative definition means that incomplete arrays are an error
+   rather than defaulting to size 1.  */
+thread_local int g[]; /* { dg-error "storage size" } */
+static thread_local int h[]; /* { dg-error "array size missing" } */
+extern thread_local int i[];
+
+thread_local int j[]

[PATCH 1/2] Improve do_store_flag for single bit comparison against 0

2023-05-18 Thread Andrew Pinski via Gcc-patches
While working on something else, I noticed we could improve
the code generation for the following function:
```
unsigned f(unsigned t)
{
  if (t & ~(1<<30)) __builtin_unreachable();
  return t != 0;
}
```
Right now we just emit a comparison against 0 instead
of just a shift right by 30.
There is code in do_store_flag which already optimizes
`(t & 1<<30) != 0` to `(t >> 30) & 1`. This patch
extends it to handle the case where we know only one
bit of t can be nonzero.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* expr.cc (do_store_flag): Extend the one-bit checking case
to handle the case where there is no AND but one bit is still
known to be nonzero.
---
 gcc/expr.cc | 27 +--
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 5ede094e705..91528e734e7 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -13083,15 +13083,30 @@ do_store_flag (sepops ops, rtx target, machine_mode 
mode)
   && integer_zerop (arg1)
   && (TYPE_PRECISION (ops->type) != 1 || TYPE_UNSIGNED (ops->type)))
 {
-  gimple *srcstmt = get_def_for_expr (arg0, BIT_AND_EXPR);
-  if (srcstmt
- && integer_pow2p (gimple_assign_rhs2 (srcstmt)))
+  wide_int nz = tree_nonzero_bits (arg0);
+
+  if (wi::popcount (nz) == 1)
{
+ tree op0;
+ tree op1;
+ gimple *srcstmt = get_def_for_expr (arg0, BIT_AND_EXPR);
+ /* If the defining statement was (x & POW2), then remove the and
+as we are going to add it back. */
+ if (srcstmt
+ && integer_pow2p (gimple_assign_rhs2 (srcstmt)))
+   {
+ op0 = gimple_assign_rhs1 (srcstmt);
+ op1 = gimple_assign_rhs2 (srcstmt);
+   }
+ else
+   {
+ op0 = arg0;
+ op1 = wide_int_to_tree (TREE_TYPE (op0), nz);
+   }
  enum tree_code tcode = code == NE ? NE_EXPR : EQ_EXPR;
  type = lang_hooks.types.type_for_mode (mode, unsignedp);
- tree temp = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (arg1),
-  gimple_assign_rhs1 (srcstmt),
-  gimple_assign_rhs2 (srcstmt));
+ tree temp = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (op0),
+  op0, op1);
  temp = fold_single_bit_test (loc, tcode, temp, arg1, type);
  if (temp)
return expand_expr (temp, target, VOIDmode, EXPAND_NORMAL);
-- 
2.31.1



[PATCH 2/2] Improve do_store_flag for comparing single bit against that bit

2023-05-18 Thread Andrew Pinski via Gcc-patches
This is a case which I noticed while working on the previous patch.
Sometimes we end up with `a == CST` instead of comparing against 0.
This happens in the following code:
```
unsigned f(unsigned t)
{
  if (t & ~(1<<30)) __builtin_unreachable();
  t ^= (1<<30);
  return t != 0;
}
```

We should handle the case where the nonzero-bits mask is the same as the
comparison operand.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* expr.cc (do_store_flag): Improve for single bit testing
not against zero but against that single bit.
---
 gcc/expr.cc | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 91528e734e7..a4628c51c0c 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -13080,12 +13080,15 @@ do_store_flag (sepops ops, rtx target, machine_mode 
mode)
  so we just call into the folder and expand its result.  */
 
   if ((code == NE || code == EQ)
-  && integer_zerop (arg1)
+  && (integer_zerop (arg1)
+ || integer_pow2p (arg1))
   && (TYPE_PRECISION (ops->type) != 1 || TYPE_UNSIGNED (ops->type)))
 {
   wide_int nz = tree_nonzero_bits (arg0);
 
-  if (wi::popcount (nz) == 1)
+  if (wi::popcount (nz) == 1
+ && (integer_zerop (arg1)
+ || wi::to_wide (arg1) == nz))
{
  tree op0;
  tree op1;
@@ -13103,11 +13106,13 @@ do_store_flag (sepops ops, rtx target, machine_mode 
mode)
  op0 = arg0;
  op1 = wide_int_to_tree (TREE_TYPE (op0), nz);
}
- enum tree_code tcode = code == NE ? NE_EXPR : EQ_EXPR;
+ enum tree_code tcode = EQ_EXPR;
+ if ((code == NE) ^ !integer_zerop (arg1))
+   tcode = NE_EXPR;
  type = lang_hooks.types.type_for_mode (mode, unsignedp);
  tree temp = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (op0),
   op0, op1);
- temp = fold_single_bit_test (loc, tcode, temp, arg1, type);
+ temp = fold_single_bit_test (loc, tcode, temp, build_zero_cst (type), 
type);
  if (temp)
return expand_expr (temp, target, VOIDmode, EXPAND_NORMAL);
}
-- 
2.31.1



[RFC V2] RISC-V : Support rv64 ilp32

2023-05-18 Thread Liao Shihua
This patch supports ilp32 on rv64.
It removes the option check for -march=rv64* -mabi=ilp32 and replaces
XLEN_SPEC in LINK_SPEC with ABI_LEN_SPEC. In addition, it updates some
machine descriptions.

The corresponding kernel support series is at this link:
https://lore.kernel.org/linux-riscv/20230518131013.3366406-1-guo...@kernel.org/

gcc/ChangeLog:

* config.gcc:
* config/riscv/elf.h (LINK_SPEC):
* config/riscv/linux.h (LINK_SPEC):
* config/riscv/riscv.cc (riscv_option_override):
* config/riscv/riscv.h (TARGET_ILP32):
(POINTER_SIZE):
(Pmode):
(ABI_LEN_SPEC):
* config/riscv/riscv.md:
---
 gcc/config.gcc|  3 +++
 gcc/config/riscv/elf.h|  2 +-
 gcc/config/riscv/linux.h  |  2 +-
 gcc/config/riscv/riscv.cc |  4 
 gcc/config/riscv/riscv.h  | 12 ++--
 gcc/config/riscv/riscv.md |  8 ++--
 6 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 6fd1594480a..db8e8f20791 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4658,6 +4658,9 @@ case "${target}" in
ilp32,rv32* | ilp32e,rv32e* \
| ilp32f,rv32*f* | ilp32f,rv32g* \
| ilp32d,rv32*d* | ilp32d,rv32g* \
+   | ilp32f,rv64*f* | ilp32f,rv64g* \
+   | ilp32d,rv64*d* | ilp32d,rv64g* \
+   | ilp32,rv64* \
| lp64,rv64* \
| lp64f,rv64*f* | lp64f,rv64g* \
| lp64d,rv64*d* | lp64d,rv64g*)
diff --git a/gcc/config/riscv/elf.h b/gcc/config/riscv/elf.h
index a725c00b637..bea531ebe89 100644
--- a/gcc/config/riscv/elf.h
+++ b/gcc/config/riscv/elf.h
@@ -18,7 +18,7 @@ along with GCC; see the file COPYING3.  If not see
 .  */
 
 #define LINK_SPEC "\
--melf" XLEN_SPEC DEFAULT_ENDIAN_SPEC "riscv \
+-melf" ABI_LEN_SPEC DEFAULT_ENDIAN_SPEC "riscv \
 %{mno-relax:--no-relax} \
 %{mbig-endian:-EB} \
 %{mlittle-endian:-EL} \
diff --git a/gcc/config/riscv/linux.h b/gcc/config/riscv/linux.h
index b9557a75dc7..4f33c88ef6e 100644
--- a/gcc/config/riscv/linux.h
+++ b/gcc/config/riscv/linux.h
@@ -58,7 +58,7 @@ along with GCC; see the file COPYING3.  If not see
   "%{mabi=ilp32:_ilp32}"
 
 #define LINK_SPEC "\
--melf" XLEN_SPEC DEFAULT_ENDIAN_SPEC "riscv" LD_EMUL_SUFFIX " \
+-melf" ABI_LEN_SPEC DEFAULT_ENDIAN_SPEC "riscv" LD_EMUL_SUFFIX " \
 %{mno-relax:--no-relax} \
 %{mbig-endian:-EB} \
 %{mlittle-endian:-EL} \
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5f44f6dc5c9..09ab940447d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6291,10 +6291,6 @@ riscv_option_override (void)
   && riscv_abi != ABI_LP64 && riscv_abi != ABI_ILP32E)
 error ("z*inx requires ABI ilp32, ilp32e or lp64");
 
-  /* We do not yet support ILP32 on RV64.  */
-  if (BITS_PER_WORD != POINTER_SIZE)
-error ("ABI requires %<-march=rv%d%>", POINTER_SIZE);
-
   /* Validate -mpreferred-stack-boundary= value.  */
   riscv_stack_boundary = ABI_STACK_BOUNDARY;
   if (riscv_preferred_stack_boundary_arg)
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 66fb07d6652..54fd328b5b0 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -77,6 +77,10 @@ extern const char *riscv_multi_lib_check (int argc, const 
char **argv);
 #define TARGET_64BIT   (__riscv_xlen == 64)
 #endif /* IN_LIBGCC2 */
 
+#ifndef TARGET_ILP32
+#define TARGET_ILP32   (riscv_abi <= ABI_ILP32D)
+#endif /*TARGET_ILP32*/
+
 #ifdef HAVE_AS_MISA_SPEC
 #define ASM_MISA_SPEC "%{misa-spec=*}"
 #else
@@ -172,7 +176,7 @@ ASM_MISA_SPEC
 #define SHORT_TYPE_SIZE 16
 #define INT_TYPE_SIZE 32
 #define LONG_LONG_TYPE_SIZE 64
-#define POINTER_SIZE (riscv_abi >= ABI_LP64 ? 64 : 32)
+#define POINTER_SIZE (TARGET_ILP32 ? 32 : 64)
 #define LONG_TYPE_SIZE POINTER_SIZE
 
 #define FLOAT_TYPE_SIZE 32
@@ -789,7 +793,7 @@ typedef struct {
After generation of rtl, the compiler makes no further distinction
between pointers and any other objects of this machine mode.  */
 
-#define Pmode word_mode
+#define Pmode (TARGET_ILP32 ? SImode : DImode)
 
 /* Give call MEMs SImode since it is the "most permissive" mode
for both 32-bit and 64-bit targets.  */
@@ -1039,6 +1043,10 @@ extern poly_int64 riscv_v_adjust_bytesize (enum 
machine_mode, int);
   "%{march=rv32*:32}" \
   "%{march=rv64*:64}" \
 
+#define ABI_LEN_SPEC \
+  "%{mabi=ilp32*:32}" \
+  "%{mabi=lp64*:64}" \
+
 #define ABI_SPEC \
   "%{mabi=ilp32:ilp32}" \
   "%{mabi=ilp32e:ilp32e}" \
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index bc384d9aedf..260b0907cf5 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2737,6 +2737,10 @@
   "reload_completed"
   [(const_int 0)]
 {
+  if (GET_MODE (operands[0]) != Pmode)
+operands[0] = convert_to_mode (Pmode, operands[0], 0);  
+  if (GET_MODE (operands[1]) != Pmode)
+operands[1] = convert_to_mode (Pmode, operands[1], 0);
   riscv_set_return_a

Re: [PATCH] avr: Set param_min_pagesize to 0 [PR105523]

2023-05-18 Thread SenthilKumar.Selvaraj--- via Gcc-patches
On 26/04/23, 5:51 PM, "Richard Biener" <richard.guent...@gmail.com> wrote:
> On Wed, Apr 26, 2023 at 12:56 PM <...> wrote:
> >
> > On Wed, Apr 26, 2023 at 3:15 PM Richard Biener via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > On Wed, Apr 26, 2023 at 11:42 AM Richard Biener
> > > <richard.guent...@gmail.com> wrote:
> > > >
> > > > On Wed, Apr 26, 2023 at 11:01 AM SenthilKumar.Selvaraj--- via
> > > > Gcc-patches <gcc-patches@gcc.gnu.org>
> > > > wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > This patch fixes PR 105523 by setting param_min_pagesize to 0 for the
> > > > > avr target. For this target, zero and offsets from zero are perfectly
> > > > > valid addresses, and the default value of param_min_pagesize ends up
> > > > > triggering warnings on valid memory accesses.
> > > >
> > > > I think the proper configuration is to have
> > > > DEFAULT_ADDR_SPACE_ZERO_ADDRESS_VALID
> > >
> > > Err, TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID
> >
> > That worked. Ok for trunk and backporting to 13 and 12 branches
> > (pending regression testing)?
> 
> 
> OK, but please let Denis time to comment.

Didn't hear from Denis. When running regression tests with this patch,
I found that some tests with -fdelete-null-pointer-checks were
failing. Commit 19416210b37db0584cd0b3f3b3961324b8973d25 made
-fdelete-null-pointer-checks false by default, while still allowing it
to be overridden from the command line (it was previously
unconditionally false).

To keep the same behavior, I modified the hook to report zero
addresses as valid only if -fdelete-null-pointer-checks is not set.
With this change, all regression tests pass.

Ok for trunk and backporting to 13 and 12 branches?

Regards
Senthil

PR 105523

gcc/ChangeLog:

* config/avr/avr.cc (avr_addr_space_zero_address_valid): New function.
(TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID): Return true if
flag_delete_null_pointer_checks is not set.

gcc/testsuite/ChangeLog:

* gcc.target/avr/pr105523.c: New test.


diff --git gcc/config/avr/avr.cc gcc/config/avr/avr.cc
index d5af40f..4c9eb84 100644
--- gcc/config/avr/avr.cc
+++ gcc/config/avr/avr.cc
@@ -9787,6 +9787,18 @@ avr_addr_space_diagnose_usage (addr_space_t as, 
location_t loc)
   (void) avr_addr_space_supported_p (as, loc);
 }
 
+/* Implement `TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID. Zero is a valid
+   address in all address spaces. Even in ADDR_SPACE_FLASH1 etc..,
+   a zero address is valid and means 0x, where RAMPZ is
+   set to the appropriate segment value.
+   If the user explicitly passes in -fdelete-null-pointer-checks though,
+   assume zero addresses are invalid.*/
+
+static bool
+avr_addr_space_zero_address_valid (addr_space_t as ATTRIBUTE_UNUSED)
+{
+  return flag_delete_null_pointer_checks == 0;
+}
 
 /* Look if DECL shall be placed in program memory space by
means of attribute `progmem' or some address-space qualifier.
@@ -14687,6 +14699,9 @@ avr_float_lib_compare_returns_bool (machine_mode mode, 
enum rtx_code)
 #undef  TARGET_ADDR_SPACE_DIAGNOSE_USAGE
 #define TARGET_ADDR_SPACE_DIAGNOSE_USAGE avr_addr_space_diagnose_usage
 
+#undef  TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID
+#define TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID avr_addr_space_zero_address_valid
+
 #undef  TARGET_MODE_DEPENDENT_ADDRESS_P
 #define TARGET_MODE_DEPENDENT_ADDRESS_P avr_mode_dependent_address_p
 
diff --git gcc/testsuite/gcc.target/avr/pr105523.c 
gcc/testsuite/gcc.target/avr/pr105523.c
new file mode 100644
index 000..fbbf7bf
--- /dev/null
+++ gcc/testsuite/gcc.target/avr/pr105523.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -Wall" } */
+
+/* Verify no "array subscript 0 is outside array bounds of" is generated
+   for accessing memory addresses in the 0-4096 range. */
+
+typedef __UINT8_TYPE__ uint8_t;
+
+#define SREG (*(volatile uint8_t*) (0x3F + __AVR_SFR_OFFSET__ ))
+
+void bar (void)
+{
+SREG = 0;
+}



[PATCH] MIPS: don't expand large block move

2023-05-18 Thread YunQiang Su
On platforms with LWL/LWR, mips_block_move_loop is always used, which
expands __builtin_memcpy/strcpy into a loop of lwl/lwr/swl/swr etc.

For short copies (normally <= 64 bytes) this gives better performance,
but when the source/destination are long, calling the memcpy/strcpy
library routine may perform better.

At the same time, the library routine may be optimized with SIMD, so on
platforms with SIMD the library call may perform much better.
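
A small illustration of the new cut-off (my own example; 64 bytes is the
threshold used in the patch below, and the inline path assumed here is
the lwl/lwr one this change affects):

```c
char dst[128], src[128];

void
small_copy (void)
{
  __builtin_memcpy (dst, src, 32);   /* still expanded inline */
}

void
large_copy (void)
{
  __builtin_memcpy (dst, src, 64);   /* >= 64 bytes: now left to the
                                        memcpy library routine */
}
```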

gcc/ChangeLog:
* config/mips/mips.cc (mips_expand_block_move): Don't expand
  if length >= 64.

gcc/testsuite/ChangeLog:
* gcc.target/mips/expand-block-move-large.c: New test.
---
 gcc/config/mips/mips.cc |  6 ++
 .../gcc.target/mips/expand-block-move-large.c   | 17 +
 2 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/mips/expand-block-move-large.c

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index ca491b981a3..00f26d5e923 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -8313,6 +8313,12 @@ mips_expand_block_move (rtx dest, rtx src, rtx length)
}
   else if (optimize)
{
+	/* When the length is big enough, the lib call has better performance
+	   than load/store insns.
+	   On most platforms, the value is about 64-128.
+	   And in fact the lib call may be optimized with SIMD.  */
+ if (INTVAL(length) >= 64)
+   return false;
  mips_block_move_loop (dest, src, INTVAL (length),
MIPS_MAX_MOVE_BYTES_PER_LOOP_ITER);
  return true;
diff --git a/gcc/testsuite/gcc.target/mips/expand-block-move-large.c 
b/gcc/testsuite/gcc.target/mips/expand-block-move-large.c
new file mode 100644
index 000..ae054551a2a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/expand-block-move-large.c
@@ -0,0 +1,17 @@
+/* { dg-options "isa_rev<=5" } */
+/* { dg-final { scan-assembler-not "lwl" } } */
+/* { dg-final { scan-assembler-not "swl" } } */
+/* { dg-final { scan-assembler-not "lwr" } } */
+/* { dg-final { scan-assembler-not "swr" } } */
+/* { dg-final { scan-assembler-not "ldl" } } */
+/* { dg-final { scan-assembler-not "sdl" } } */
+/* { dg-final { scan-assembler-not "ldr" } } */
+/* { dg-final { scan-assembler-not "sdr" } } */
+
+char a[4097], b[4097];
+
+NOCOMPRESSION void
+foo (volatile int *x)
+{
+  __builtin_memcpy(&a[1], &b[1], 64);
+}
-- 
2.30.2