Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization

2023-08-20 Thread Richard Biener via Gcc-patches
On Sat, 19 Aug 2023, Prathamesh Kulkarni wrote:

> On Fri, 18 Aug 2023 at 17:11, Richard Biener  wrote:
> >
> > On Fri, 18 Aug 2023, Richard Biener wrote:
> >
> > > On Thu, 17 Aug 2023, Prathamesh Kulkarni wrote:
> > >
> > > > On Tue, 15 Aug 2023 at 14:28, Richard Sandiford
> > > >  wrote:
> > > > >
> > > > > Richard Biener  writes:
> > > > > > On Mon, 14 Aug 2023, Prathamesh Kulkarni wrote:
> > > > > >> On Mon, 7 Aug 2023 at 13:19, Richard Biener 
> > > > > >>  wrote:
> > > > > > >> > It doesn't seem to make a difference for x86.  That said, the
> > > > > > >> > "fix" is probably sticking the correct target on the
> > > > > > >> > dump-check, it seems that vect_fold_extract_last is no longer
> > > > > > >> > correct here.
> > > > > > >> Um sorry, I did go thru various checks in target-supports.exp,
> > > > > > >> but not sure which one will be appropriate for this case,
> > > > > > >> and am stuck here :/ Could you please suggest how to proceed ?
> > > > > >
> > > > > > Maybe Richard S. knows the magic thing to test, he originally
> > > > > > implemented the direct conversion support.  I suggest to implement
> > > > > > such dg-checks if they are not present (I can't find them),
> > > > > > possibly quite specific to the modes involved (like we have
> > > > > > other checks with _qi_to_hi suffixes, for float modes maybe
> > > > > > just _float).
> > > > >
> > > > > Yeah, can't remember specific selectors for that feature.  TBH I think
> > > > > most (all?) of the tests were AArch64-specific.
> > > > Hi,
> > > > As Richi mentioned above, the test now vectorizes on AArch64 because
> > > > it has support for direct conversion
> > > > between vectors while x86 doesn't. IIUC this is because
> > > > supportable_convert_operation returns true
> > > > for V4HI -> V4SI on Aarch64 since it can use extend_v4hiv4si2 for
> > > > doing the conversion ?
> > > >
> > > > In the attached patch, I added a new target check vect_extend which
> > > > (currently) returns 1 only for aarch64*-*-*,
> > > > which makes the test PASS on both the targets, altho I am not sure if
> > > > this is entirely correct.
> > > > Does the patch look OK ?
> > >
> > > Can you make vect_extend more specific, say vect_extend_hi_si or
> > > what is specifically needed here?  Note I'll have to investigate
> > > why x86 cannot vectorize here since in fact it does have
> > > the extend operation ... it might be also worth splitting the
> > > sign/zero extend case, so - vect_sign_extend_hi_si or
> > > vect_extend_short_int?
> >
> > And now having analyzed _why_ x86 doesn't vectorize it's rather
> > why we get this vectorized with NEON which is because
> >
> > static opt_machine_mode
> > aarch64_vectorize_related_mode (machine_mode vector_mode,
> > scalar_mode element_mode,
> > poly_uint64 nunits)
> > {
> > ...
> >   /* Prefer to use 1 128-bit vector instead of 2 64-bit vectors.  */
> >   if (TARGET_SIMD
> >   && (vec_flags & VEC_ADVSIMD)
> >   && known_eq (nunits, 0U)
> >   && known_eq (GET_MODE_BITSIZE (vector_mode), 64U)
> >   && maybe_ge (GET_MODE_BITSIZE (element_mode)
> >* GET_MODE_NUNITS (vector_mode), 128U))
> > {
> >   machine_mode res = aarch64_simd_container_mode (element_mode, 128);
> >   if (VECTOR_MODE_P (res))
> > return res;
> >
> > which makes us get a V4SImode vector for a V4HImode loop vector_mode.
> Thanks for the explanation!
> >
> > So I think the appropriate effective dejagnu target is
> > aarch64-*-* (there's none specifically to advsimd, not sure if one
> > can disable that?)
> The attached patch uses aarch64*-*-* target check, and additionally
> for SVE (and other targets supporting vect_fold_extract_last) it
> checks if the condition reduction was carried out using
> FOLD_EXTRACT_LAST.
> Does that look OK ?

Works for me.

Richard.
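
For readers following along, the mode choice discussed in the quoted
aarch64_vectorize_related_mode excerpt can be modelled with a small
stand-alone sketch.  Everything here (struct toy_vec_mode,
toy_related_mode) is hypothetical and is not GCC code; it only mimics the
quoted "prefer one 128-bit vector" condition, and the fallback of keeping
the loop mode's total size is an assumption made for the illustration.

```c
#include <stdbool.h>

/* Toy stand-in for a vector mode: element width in bits and lane count.
   Illustrative only; this is not GCC's machine_mode.  */
struct toy_vec_mode
{
  unsigned elem_bits;
  unsigned nunits;
};

static unsigned
toy_mode_bits (struct toy_vec_mode m)
{
  return m.elem_bits * m.nunits;
}

/* Model of the quoted preference.  With no explicit lane request
   (nunits_request == 0) and a 64-bit loop vector mode, pick a 128-bit
   container once ELEM_BITS times the loop mode's lane count reaches 128.
   Otherwise honour the request, or (assumption) keep the loop mode's
   total size.  */
static struct toy_vec_mode
toy_related_mode (struct toy_vec_mode loop_mode, unsigned elem_bits,
                  unsigned nunits_request)
{
  struct toy_vec_mode res;
  res.elem_bits = elem_bits;
  if (nunits_request == 0
      && toy_mode_bits (loop_mode) == 64
      && elem_bits * loop_mode.nunits >= 128)
    res.nunits = 128 / elem_bits;
  else if (nunits_request != 0)
    res.nunits = nunits_request;
  else
    res.nunits = toy_mode_bits (loop_mode) / elem_bits;
  return res;
}
```

With a V4HI-like loop mode (16-bit elements, 4 lanes) and 32-bit elements
the model returns 4 lanes of 32 bits, i.e. the V4SI-like shape mentioned
above.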

> Thanks,
> Prathamesh
> >
> 
> > Richard.
> >
> > > > Thanks,
> > > > Prathamesh
> > > > >
> > > > > Thanks,
> > > > > Richard
> > > >
> > >
> > >
> >
> > --
> > Richard Biener 
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] tree-optimization/111048 - avoid flawed logic in fold_vec_perm

2023-08-20 Thread Richard Biener via Gcc-patches
On Sat, 19 Aug 2023, Prathamesh Kulkarni wrote:

> On Fri, 18 Aug 2023 at 14:52, Richard Biener  wrote:
> >
> > On Fri, 18 Aug 2023, Richard Sandiford wrote:
> >
> > > Richard Biener  writes:
> > > > The following avoids running into somehow flawed logic in fold_vec_perm
> > > > for non-VLA vectors.
> > > >
> > > > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> > > >
> > > > Richard.
> > > >
> > > > PR tree-optimization/111048
> > > > * fold-const.cc (fold_vec_perm_cst): Check for non-VLA
> > > > vectors first.
> > > >
> > > > * gcc.dg/torture/pr111048.c: New testcase.
> > >
> > > Please don't do this as a permanent thing.  It was a deliberate choice
> > > to have the is_constant be the fallback, so that the "generic" (VLA+VLS)
> > > logic gets more coverage.  Like you say, if something is wrong for VLS
> > > then the chances are that it's also wrong for VLA.
> >
> > Sure, feel free to undo this change together with the fix for the
> > VLA case.
> Hi,
> The attached patch reverts the workaround, and fixes the issue.
> Bootstrapped+tested on aarch64-linux-gnu with and without SVE, and
> x86_64-linux-gnu.
> OK to commit ?

OK.
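
As background for the fixed-length (VLS) fallback in the quoted patch,
where res_npatterns = res_nelts and res_nelts_per_pattern = 1, every
result element is computed explicitly.  A minimal sketch of that
element-wise fold, assuming byte elements and selector indices taken
modulo twice the vector length, might look like this.  The function name
toy_fold_vec_perm is made up for the illustration and this is not the
code in fold-const.cc.

```c
#include <stddef.h>

/* Fold a permute of two constant vectors with NELTS elements each.
   sel[i] indexes the concatenation of ARG0 and ARG1 (2 * NELTS
   elements); out-of-range indices wrap modulo 2 * NELTS.  */
static void
toy_fold_vec_perm (const unsigned char *arg0, const unsigned char *arg1,
                   const unsigned *sel, unsigned char *res, size_t nelts)
{
  for (size_t i = 0; i < nelts; i++)
    {
      size_t idx = sel[i] % (2 * nelts);
      res[i] = idx < nelts ? arg0[idx] : arg1[idx - nelts];
    }
}
```

For example, with arg0 = {0,1,2,3}, arg1 = {4,5,6,7} and sel = {0,4,1,5}
the result interleaves the low halves of both inputs.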

> Thanks,
> Prathamesh
> >
> > Richard.
> >
> > > Thanks,
> > > Richard
> > >
> > >
> > > > ---
> > > >  gcc/fold-const.cc   | 12 ++--
> > > >  gcc/testsuite/gcc.dg/torture/pr111048.c | 24 
> > > >  2 files changed, 30 insertions(+), 6 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.dg/torture/pr111048.c
> > > >
> > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > > > index 5c51c9d91be..144fd7481b3 100644
> > > > --- a/gcc/fold-const.cc
> > > > +++ b/gcc/fold-const.cc
> > > > @@ -10625,6 +10625,11 @@ fold_vec_perm_cst (tree type, tree arg0, tree 
> > > > arg1, const vec_perm_indices &sel,
> > > >unsigned res_npatterns, res_nelts_per_pattern;
> > > >unsigned HOST_WIDE_INT res_nelts;
> > > >
> > > > +  if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
> > > > +{
> > > > +  res_npatterns = res_nelts;
> > > > +  res_nelts_per_pattern = 1;
> > > > +}
> > > >/* (1) If SEL is a suitable mask as determined by
> > > >   valid_mask_for_fold_vec_perm_cst_p, then:
> > > >   res_npatterns = max of npatterns between ARG0, ARG1, and SEL
> > > > @@ -10634,7 +10639,7 @@ fold_vec_perm_cst (tree type, tree arg0, tree 
> > > > arg1, const vec_perm_indices &sel,
> > > >   res_npatterns = nelts in result vector.
> > > >   res_nelts_per_pattern = 1.
> > > >   This exception is made so that VLS ARG0, ARG1 and SEL work as 
> > > > before.  */
> > > > -  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> > > > +  else if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, 
> > > > reason))
> > > >  {
> > > >res_npatterns
> > > > = std::max (VECTOR_CST_NPATTERNS (arg0),
> > > > @@ -10648,11 +10653,6 @@ fold_vec_perm_cst (tree type, tree arg0, tree 
> > > > arg1, const vec_perm_indices &sel,
> > > >
> > > >res_nelts = res_npatterns * res_nelts_per_pattern;
> > > >  }
> > > > -  else if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
> > > > -{
> > > > -  res_npatterns = res_nelts;
> > > > -  res_nelts_per_pattern = 1;
> > > > -}
> > > >else
> > > >  return NULL_TREE;
> > > >
> > > > diff --git a/gcc/testsuite/gcc.dg/torture/pr111048.c 
> > > > b/gcc/testsuite/gcc.dg/torture/pr111048.c
> > > > new file mode 100644
> > > > index 000..475978aae2b
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/torture/pr111048.c
> > > > @@ -0,0 +1,24 @@
> > > > +/* { dg-do run } */
> > > > +/* { dg-additional-options "-mavx2" { target avx2_runtime } } */
> > > > +
> > > > +typedef unsigned char u8;
> > > > +
> > > > +__attribute__((noipa))
> > > > +static void check(const u8 * v) {
> > > > +if (*v != 15) __builtin_trap();
> > > > +}
> > > > +
> > > > +__attribute__((noipa))
> > > > +static void bug(void) {
> > > > +u8 in_lanes[32];
> > > > +for (unsigned i = 0; i < 32; i += 2) {
> > > > +  in_lanes[i + 0] = 0;
> > > > +  in_lanes[i + 1] = ((u8)0xff) >> (i & 7);
> > > > +}
> > > > +
> > > > +check(&in_lanes[13]);
> > > > +  }
> > > > +
> > > > +int main() {
> > > > +bug();
> > > > +}
> > >
> >
> > --
> > Richard Biener 
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PING^4] PATCH v5 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces.

2023-08-20 Thread Ajit Agarwal via Gcc-patches


Ping!

 Forwarded Message 
Subject: [PING^3] PATCH v5 4/4] ree: Improve ree pass for rs6000 target using 
defined ABI interfaces.
Date: Tue, 1 Aug 2023 13:48:58 +0530
From: Ajit Agarwal 
To: gcc-patches , Jeff Law , 
Richard Biener , Peter Bergner 
, Segher Boessenkool , 
rashmi.srid...@ibm.com

Ping!


 Forwarded Message 
Subject: [PING^2] PATCH v5 4/4] ree: Improve ree pass for rs6000 target using 
defined ABI interfaces.
Date: Tue, 18 Jul 2023 13:28:08 +0530
From: Ajit Agarwal 
To: gcc-patches 
CC: Jeff Law , Richard Biener 
, Segher Boessenkool , 
Peter Bergner 


Ping^2.

Please review.

Thanks & Regards
Ajit


This new version of patch 4 improves the ree pass for the rs6000 target using
defined ABI interfaces.
Bootstrapped and regtested on powerpc64-linux-gnu.

Review comments incorporated.

Thanks & Regards
Ajit

Improve ree pass for rs6000 target using defined abi interfaces

On the rs6000 target we see redundant zero and sign
extensions.  Improve the ree pass to eliminate
such redundant zero and sign extensions using defined
ABI interfaces.

2023-06-01  Ajit Kumar Agarwal  

gcc/ChangeLog:

* ree.cc (combine_reaching_defs): Use zero_extend and sign_extend
defined ABI interfaces.
(add_removable_extension): Use defined ABI interfaces for no
reaching defs.
(abi_extension_candidate_return_reg_p): New function.
(abi_extension_candidate_p): New function.
(abi_extension_candidate_argno_p): New function.
(abi_handle_regs_without_defs_p): New function.
(abi_target_promote_function_mode): New function.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/zext-elim-3.C
---
 gcc/ree.cc| 199 +++---
 .../g++.target/powerpc/zext-elim-3.C  |  13 ++
 2 files changed, 183 insertions(+), 29 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index fc04249fa84..2025a7c43da 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -514,7 +514,8 @@ get_uses (rtx_insn *insn, rtx reg)
 if (REGNO (DF_REF_REG (def)) == REGNO (reg))
   break;
 
-  gcc_assert (def != NULL);
+  if (def == NULL)
+return NULL;
 
   ref_chain = DF_REF_CHAIN (def);
 
@@ -750,6 +751,120 @@ get_extended_src_reg (rtx src)
   return src;
 }
 
+/* Return TRUE if target mode is equal to source mode of zero_extend
+   or sign_extend otherwise false.  */
+
+static bool
+abi_target_promote_function_mode (machine_mode mode)
+{
+  int unsignedp;
+  machine_mode tgt_mode =
+targetm.calls.promote_function_mode (NULL_TREE, mode, &unsignedp,
+NULL_TREE, 1);
+
+  if (tgt_mode == mode)
+return true;
+  else
+return false;
+}
+
+/* Return TRUE if the candidate insn is a zero extend and regno is
+   a return register.  */
+
+static bool
+abi_extension_candidate_return_reg_p (rtx_insn *insn, int regno)
+{
+  rtx set = single_set (insn);
+
+  if (GET_CODE (SET_SRC (set)) !=  ZERO_EXTEND)
+return false;
+
+  if (FUNCTION_VALUE_REGNO_P (regno))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if reg source operand of zero_extend is argument registers
+   and not return registers and source and destination operand are same
+   and mode of source and destination operand are not same.  */
+
+static bool
+abi_extension_candidate_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+
+  if (GET_CODE (SET_SRC (set)) !=  ZERO_EXTEND)
+return false;
+
+  machine_mode ext_dst_mode = GET_MODE (SET_DEST (set));
+  rtx orig_src = XEXP (SET_SRC (set),0);
+
+  bool copy_needed
+= (REGNO (SET_DEST (set)) != REGNO (XEXP (SET_SRC (set), 0)));
+
+  if (!copy_needed && ext_dst_mode != GET_MODE (orig_src)
+  && FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+  && !abi_extension_candidate_return_reg_p (insn, REGNO (orig_src)))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if the candidate insn is a zero extend and regno is
+   an argument register.  */
+
+static bool
+abi_extension_candidate_argno_p (rtx_code code, int regno)
+{
+  if (code !=  ZERO_EXTEND)
+return false;
+
+  if (FUNCTION_ARG_REGNO_P (regno))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if the candidate insn doesn't have defs and have
+ * uses without RTX_BIN_ARITH/RTX_COMM_ARITH/RTX_UNARY rtx class.  */
+
+static bool
+abi_handle_regs_without_defs_p (rtx_insn *insn)
+{
+  if (side_effects_p (PATTERN (insn)))
+return false;
+
+  struct df_link *uses
+= get_uses (insn, SET_DEST (PATTERN (insn)));
+
+  if (!uses)
+return false;
+
+  for (df_link *use = uses; use; use = use->next)
+{
+  if (!use->ref)
+   return false;
+
+  if (BLOCK_FOR_INSN (insn)
+ != BLOCK_FOR_INSN (DF_REF_INSN (use->ref)))
+   return false;
+
+  rtx_insn *use_insn = DF_REF_INSN (use->ref);
+
+  if (GET_CODE (PATTERN (use_insn)) == SET)
+   {
+ rtx_code code = GET_CO

[PING^4] [PATCH 3/4] ree: Improve functionality of ree pass for rs6000 target.

2023-08-20 Thread Ajit Agarwal via Gcc-patches
Ping!


 Forwarded Message 
Subject: [PING^3] [PATCH 3/4] ree: Improve functionality of ree pass for rs6000 
target.
Date: Tue, 1 Aug 2023 13:50:21 +0530
From: Ajit Agarwal 
To: gcc-patches , Jeff Law , 
Richard Biener , Peter Bergner 
, Segher Boessenkool , 
rashmi.srid...@ibm.com


Ping!

 Forwarded Message 
Subject: [PING^2] [PATCH 3/4] ree: Improve functionality of ree pass for rs6000 
target.
Date: Tue, 18 Jul 2023 13:31:27 +0530
From: Ajit Agarwal 
To: gcc-patches 
CC: Jeff Law , Richard Biener 
, Segher Boessenkool , 
Peter Bergner 

Ping^2.

Please review.

Thanks & Regards
Ajit


This patch improves the ree pass for the rs6000 target.
It eliminates redundant sign_extend/zero_extend/AND with varying constants.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit

ree: Improve ree pass for rs6000 target

On the rs6000 target we see redundant zero and sign extensions.  Improve the
ree pass to eliminate such redundant zero and sign extensions.  This adds
support for zero_extend/sign_extend/AND, including AND used as an extension
with constants other than 1.
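
To illustrate the general idea that an AND with a suitable constant acts
as a zero extension, here is a small generic sketch.  It is not the
predicate from this patch (the helper name toy_and_mask_zext_bits is
hypothetical), and it only covers the three low-bit masks that correspond
to 8-, 16- and 32-bit sources in a 64-bit register.

```c
#include <stdint.h>

/* Return the source width in bits (8, 16 or 32) if AND-ing a 64-bit
   value with MASK keeps exactly its low bits, which is what a zero
   extension from that width does.  Return 0 for any other mask.  */
static unsigned
toy_and_mask_zext_bits (uint64_t mask)
{
  switch (mask)
    {
    case UINT64_C (0xff):
      return 8;
    case UINT64_C (0xffff):
      return 16;
    case UINT64_C (0xffffffff):
      return 32;
    default:
      return 0;
    }
}

/* Reference zero extension of the low BITS bits of X (BITS < 64).  */
static uint64_t
toy_zext (uint64_t x, unsigned bits)
{
  return x & ((UINT64_C (1) << bits) - 1);
}
```

For a mask accepted by toy_and_mask_zext_bits, x & mask and
toy_zext (x, bits) give the same value for any x.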

2023-06-07  Ajit Kumar Agarwal  

gcc/ChangeLog:

* ree.cc (eliminate_across_bbs_p): Add checks to enable extension
elimination across and within basic blocks.
(def_arith_p): New function to check definition has arithmetic
operation.
(combine_set_extension): Modification to incorporate AND
and current zero_extend and sign_extend instruction.
(merge_def_and_ext): Add calls to eliminate_across_bbs_p and
zero_extend sign_extend and AND instruction.
(rtx_is_zext_p): New function.
(feasible_cfg): New function.
* rtl.h (reg_used_set_between_p): Add prototype.
* rtlanal.cc (reg_used_set_between_p): New function.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/zext-elim.C: New testcase.
* g++.target/powerpc/zext-elim-1.C: New testcase.
* g++.target/powerpc/zext-elim-2.C: New testcase.
* g++.target/powerpc/sext-elim.C: New testcase.
---
 gcc/ree.cc| 476 --
 gcc/rtl.h |   1 +
 gcc/rtlanal.cc|  15 +
 gcc/testsuite/g++.target/powerpc/sext-elim.C  |  18 +
 .../g++.target/powerpc/zext-elim-1.C  |  19 +
 .../g++.target/powerpc/zext-elim-2.C  |  11 +
 gcc/testsuite/g++.target/powerpc/zext-elim.C  |  30 ++
 7 files changed, 524 insertions(+), 46 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/sext-elim.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-1.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-2.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index fc04249fa84..dc6da21ec16 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -253,6 +253,66 @@ struct ext_cand
 
 static int max_insn_uid;
 
+/* Return TRUE if OP can be considered a zero extension from one or
+   more sub-word modes to larger modes up to a full word.
+
+   For example (and:DI (reg) (const_int X))
+
+   Depending on the value of X could be considered a zero extension
+   from QI, HI and SI to larger modes up to DImode.  */
+
+static bool
+rtx_is_zext_p (rtx insn)
+{
+  if (GET_CODE (insn) == AND)
+{
+  rtx set = XEXP (insn, 0);
+  if (REG_P (set))
+   {
+ rtx src = XEXP (insn, 1);
+
+ if (CONST_INT_P (src)
+ && IN_RANGE (exact_log2 (UINTVAL (src)), 0, 7))
+   return true;
+   }
+  else
+   return false;
+}
+
+  return false;
+}
+/* Return TRUE if OP can be considered a zero extension from one or
+   more sub-word modes to larger modes up to a full word.
+
+   For example (and:DI (reg) (const_int X))
+
+   Depending on the value of X could be considered a zero extension
+   from QI, HI and SI to larger modes up to DImode.  */
+
+static bool
+rtx_is_zext_p (rtx_insn *insn)
+{
+  rtx body = single_set (insn);
+
+  if (GET_CODE (body) == SET && GET_CODE (SET_SRC (body)) == AND)
+   {
+ rtx set = XEXP (SET_SRC (body), 0);
+
+ if (REG_P (set) && GET_MODE (SET_DEST (body)) == GET_MODE (set))
+   {
+ rtx src = XEXP (SET_SRC (body), 1);
+
+ if (CONST_INT_P (src)
+ && IN_RANGE (exact_log2 (UINTVAL (src)), 0, 7))
+   return true;
+   }
+ else
+  return false;
+   }
+
+   return false;
+}
+
 /* Update or remove REG_EQUAL or REG_EQUIV notes for INSN.  */
 
 static bool
@@ -319,7 +379,7 @@ combine_set_extension (ext_cand *cand, rtx_insn *curr_insn, 
rtx *orig_set)
 {
   rtx orig_src = SET_SRC (*orig_set);
   machine_mode orig_mode = GET_MODE (SET_DEST (*orig_set));
-  rtx new_set;
+  rtx new_set = NULL_RTX;
   rtx cand_pat = single_set (cand->insn);
 
   /* If the extension's source/destination registers are not the same
@@ -359,27 +419,41 @@ combine_set_extension (ext_cand 

[PING^2] [PATCH v8] tree-ssa-sink: Improve code sinking pass.

2023-08-20 Thread Ajit Agarwal via Gcc-patches
Ping!


 Forwarded Message 
Subject: [PING^1] [PATCH v8] tree-ssa-sink: Improve code sinking pass.
Date: Tue, 1 Aug 2023 13:47:10 +0530
From: Ajit Agarwal 
To: gcc-patches 
CC: Richard Biener , Jeff Law 
, Peter Bergner , Segher 
Boessenkool , rashmi.srid...@ibm.com

Ping! 


 Forwarded Message 
Subject: [PATCH v8] tree-ssa-sink: Improve code sinking pass.
Date: Tue, 18 Jul 2023 19:03:37 +0530
From: Ajit Agarwal 
To: gcc-patches 
CC: Richard Biener , Jeff Law 
, Segher Boessenkool , Peter 
Bergner 

Hello All:

This patch improves the code sinking pass to sink statements before calls to
reduce register pressure.
Review comments are incorporated.

For example:

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;
  l = a + b + c + d +e + f;
  if (a != 5)
{
  bar();
  j = l;
}
}

Code Sinking does the following:

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;
  
  if (a != 5)
{
  l = a + b + c + d +e + f; 
  bar();
  j = l;
}
}

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit

tree-ssa-sink: Improve code sinking pass

Currently, code sinking will sink code after function calls.  This increases
register pressure for callee-saved registers.  The following patch improves
code sinking by placing the sunk code before calls in the use block or in
the immediate dominator of the use blocks.

2023-07-18  Ajit Kumar Agarwal  

gcc/ChangeLog:

PR tree-optimization/81953
* tree-ssa-sink.cc (statement_sink_location): Move statements before
calls.
(def_use_same_block): New function.
(select_best_block): Add heuristics to select the best blocks in the
immediate post dominator.

gcc/testsuite/ChangeLog:

PR tree-optimization/81953
* gcc.dg/tree-ssa/ssa-sink-20.c: New testcase.
* gcc.dg/tree-ssa/ssa-sink-21.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c | 15 ++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 19 +++
 gcc/tree-ssa-sink.cc| 59 -
 3 files changed, 67 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
new file mode 100644
index 000..d3b79ca5803
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
new file mode 100644
index 000..84e7938c54f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j, x;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  if (b != 3)
+x = 3;
+  else
+x = 5;
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index b1ba7a2ad6c..e7190323abe 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -173,7 +173,8 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool 
*debug_stmts)
 
 /* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator
tree, return the best basic block between them (inclusive) to place
-   statements.
+   statements. The best basic block should be an immediate dominator of
+   best basic block if the use stmt is after the call.
 
We want the most control dependent block in the shallowest loop nest.
 
@@ -190,11 +191,22 @@ nearest_common_dominator_of_uses (def_operand_p def_p, 
bool *debug_stmts)
 static basic_block
 select_best_block (basic_block early_bb,
   basic_block late_bb,
-  gimple *stmt)
+  gimple *stmt,
+  gimple *use)
 {
   basic_block best_bb = late_bb;
   basic_block temp_bb = late_bb;
   int threshold;
+  /* Get the sinking threshold.  If the statement to be moved has memory
+ operands, then increase the threshold by 7% as those are even more
+ profitable to avoid, clamping at 100%.  */
+  threshold = param_sink_frequency_threshold;
+  if (gimple_vuse (stmt) || gimple_vdef (stmt))
+{
+  threshold += 7;
+  if (threshold > 100)
+   threshold = 100;
+}
 
   while (temp_bb != early_bb)
 {
@@ -203,34 +

Re: [PATCH v1] RISC-V: Support RVV VFWREDUSUM.VS rounding mode intrinsic API

2023-08-20 Thread juzhe.zh...@rivai.ai
Why does this patch not have HAS_FRM?



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-08-17 16:05
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFWREDUSUM.VS rounding mode intrinsic 
API
From: Pan Li 
 
This patch adds the rounding mode API for VFWREDUSUM.VS, as in the
samples below:
 
* __riscv_vfwredusum_vs_f32m1_f64m1_rm
* __riscv_vfwredusum_vs_f32m1_f64m1_rm_m
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc
(vfwredusum_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfwredusum_frm): New intrinsic function def.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/float-point-wredusum.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  2 ++
.../riscv/riscv-vector-builtins-bases.h   |  1 +
.../riscv/riscv-vector-builtins-functions.def |  1 +
.../riscv/rvv/base/float-point-wredusum.c | 33 +++
4 files changed, 37 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-wredusum.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index abf03bab0da..5ee7d3119db 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -2548,6 +2548,7 @@ static CONSTEXPR const freducop 
vfredosum_frm_obj;
static CONSTEXPR const reducop vfredmax_obj;
static CONSTEXPR const reducop vfredmin_obj;
static CONSTEXPR const widen_freducop vfwredusum_obj;
+static CONSTEXPR const widen_freducop 
vfwredusum_frm_obj;
static CONSTEXPR const widen_freducop vfwredosum_obj;
static CONSTEXPR const widen_freducop 
vfwredosum_frm_obj;
static CONSTEXPR const vmv vmv_x_obj;
@@ -2810,6 +2811,7 @@ BASE (vfredmin)
BASE (vfwredosum)
BASE (vfwredosum_frm)
BASE (vfwredusum)
+BASE (vfwredusum_frm)
BASE (vmv_x)
BASE (vmv_s)
BASE (vfmv_f)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index c1bb164a712..69d4562091f 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -247,6 +247,7 @@ extern const function_base *const vfredmin;
extern const function_base *const vfwredosum;
extern const function_base *const vfwredosum_frm;
extern const function_base *const vfwredusum;
+extern const function_base *const vfwredusum_frm;
extern const function_base *const vmv_x;
extern const function_base *const vmv_s;
extern const function_base *const vfmv_f;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index da1157f5a56..3ce06dc60b7 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -508,6 +508,7 @@ DEF_RVV_FUNCTION (vfwredosum, reduc_alu, no_mu_preds, 
wf_vs_ops)
DEF_RVV_FUNCTION (vfwredusum, reduc_alu, no_mu_preds, wf_vs_ops)
DEF_RVV_FUNCTION (vfwredosum_frm, reduc_alu_frm, no_mu_preds, wf_vs_ops)
+DEF_RVV_FUNCTION (vfwredusum_frm, reduc_alu_frm, no_mu_preds, wf_vs_ops)
/* 15. Vector Mask Instructions.  */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-wredusum.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-wredusum.c
new file mode 100644
index 000..6c888c10c0d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-wredusum.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+vfloat64m1_t
+test_riscv_vfwredusum_vs_f32m1_f64m1_rm (vfloat32m1_t op1, vfloat64m1_t op2,
+ size_t vl) {
+  return __riscv_vfwredusum_vs_f32m1_f64m1_rm (op1, op2, 0, vl);
+}
+
+vfloat64m1_t
+test_vfwredusum_vs_f32m1_f64m1_rm_m (vbool32_t mask, vfloat32m1_t op1,
+  vfloat64m1_t op2, size_t vl) {
+  return __riscv_vfwredusum_vs_f32m1_f64m1_rm_m (mask, op1, op2, 1, vl);
+}
+
+vfloat64m1_t
+test_riscv_vfwredusum_vs_f32m1_f64m1 (vfloat32m1_t op1, vfloat64m1_t op2,
+   size_t vl) {
+  return __riscv_vfwredusum_vs_f32m1_f64m1 (op1, op2, vl);
+}
+
+vfloat64m1_t
+test_vfwredusum_vs_f32m1_f64m1_m (vbool32_t mask, vfloat32m1_t op1,
+   vfloat64m1_t op2, size_t vl) {
+  return __riscv_vfwredusum_vs_f32m1_f64m1_m (mask, op1, op2, vl);
+}
+
+/* { dg-final { scan-assembler-times {vfwredusum\.vs\s+v[0-9]+,\s*v[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {frrm\s+[axs][0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {fsrm\s+[axs][0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {fsrmi\s+[01234]} 2 } } */
-- 
2.34.1
 
 


[PATCH] MATCH: [PR111002] Sink view_convert for vec_cond

2023-08-20 Thread Andrew Pinski via Gcc-patches
Like convert, we can sink view_convert into vec_cond, but
only if the conversion between the element types is a nop_conversion.
This allows conversions between signed and unsigned types only,
not between integer and float types, which mess up the vec_cond
so that isel no longer recognizes `a?-1:0` as such.
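
A scalar, per-lane sketch of why the signed/unsigned case is safe while
the integer/float case is not.  The helper names are made up for this
illustration, and the float observation assumes IEEE 754 binary32, where
the all-ones bit pattern is a NaN.

```c
#include <stdint.h>
#include <string.h>

/* One lane of the vec_cond: cond ? -1 : 0.  */
static int32_t
toy_select_lane (int cond)
{
  return cond ? -1 : 0;
}

/* Bit-for-bit reinterpretation, standing in for view_convert.  */
static uint32_t
toy_view_as_u32 (int32_t v)
{
  uint32_t r;
  memcpy (&r, &v, sizeof r);
  return r;
}

static float
toy_view_as_f32 (int32_t v)
{
  float r;
  memcpy (&r, &v, sizeof r);
  return r;
}

/* The same lane with the view conversion sunk into both arms.  */
static uint32_t
toy_select_lane_sunk (int cond)
{
  return cond ? toy_view_as_u32 (-1) : toy_view_as_u32 (0);
}
```

For the unsigned view the arms stay "all ones" and "all zeros", so the
select is still recognizable as a mask.  For the float view the true arm
becomes a NaN constant, which no longer looks like -1.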

OK? Bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-gnu.

PR tree-optimization/111002

gcc/ChangeLog:

* match.pd (view_convert(vec_cond(a,b,c))): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/cond_convert_8.c: New test.
---
 gcc/match.pd  |  9 
 .../gcc.target/aarch64/sve/cond_convert_8.c   | 22 +++
 2 files changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_convert_8.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 851f1af6eac..81666f28465 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4718,6 +4718,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   && types_match (TREE_TYPE (@0), truth_type_for (type)))
   (vec_cond @0 (convert! @1) (convert! @2
 
+/* Likewise for view_convert of nop_conversions. */
+(simplify
+ (view_convert (vec_cond:s @0 @1 @2))
+ (if (VECTOR_TYPE_P (type) && VECTOR_TYPE_P (TREE_TYPE (@1))
+  && known_eq (TYPE_VECTOR_SUBPARTS (type),
+  TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1)))
+  && tree_nop_conversion_p (TREE_TYPE (type), TREE_TYPE (TREE_TYPE (@1
+  (vec_cond @0 (view_convert! @1) (view_convert! @2
+
 /* Sink binary operation to branches, but only if we can fold it.  */
 (for op (tcc_comparison plus minus mult bit_and bit_ior bit_xor
 lshift rshift rdiv trunc_div ceil_div floor_div round_div
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_8.c 
b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_8.c
new file mode 100644
index 000..d8b96e5fcfb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_8.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256 -fdump-tree-optimized" } */
+/* PR tree-optimization/111002 */
+
+/* We should be able to remove the neg. */
+
+void __attribute__ ((noipa))
+f (int *__restrict r,
+   int *__restrict a,
+   short *__restrict pred)
+{
+  for (int i = 0; i < 1024; ++i)
+r[i] = pred[i] != 0 ? -1 : 0;
+}
+
+
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, p[0-7]+/z, #-1} 1 } } */
+/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.[hs], p[0-7]+/z, #1} } } */
+
+/* { dg-final { scan-tree-dump-not "VIEW_CONVERT_EXPR " "optimized" } } */
+/* { dg-final { scan-tree-dump-not " = -" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " = \\\(vector" "optimized" } } */
-- 
2.31.1



[PATCH] Mention Intel -march=gracemont for Alderlake-N.

2023-08-20 Thread liuhongt via Gcc-patches
---
 htdocs/gcc-14/changes.html | 4 
 1 file changed, 4 insertions(+)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index eae25f1a..2c888660 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -151,6 +151,10 @@ a work-in-progress.
 -march=lunarlake.
 Lunar Lake is based on Arrow Lake S.
   
+  GCC now supports the Intel CPU named Alderlake-N through
+  -march=gracemont.
+  Alderlake-N is E-core only, not hybrid architecture.
+  
 
 
 
-- 
2.31.1



[PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS

2023-08-20 Thread Juzhe-Zhong
This patch refactors Phase 3 (Demand fusion) and renames it to Earliest
fusion.
I do the refactor for the following reasons:

  1. The current implementation of phase 3 does too many things, which makes
     the code quite messy and not easy to maintain.
  2. The demand fusion I did previously makes the fusion explicitly,
     including how to fuse VSETVLs, where to make the VSETVL fusion happen,
     and checking whether the VSETVL fusion point (location) is correct and
     optimal, etc.

     We were doing too many of these things, so I had added the following
     functions:

    enum fusion_type get_backward_fusion_type (const bb_info *,
                                               const vector_insn_info &);
    bool hard_empty_block_p (const bb_info *, const vector_insn_info &) const;
    bool backward_demand_fusion (void);
    bool forward_demand_fusion (void);
    bool cleanup_illegal_dirty_blocks (void);

     to make sure the VSETVL fusion is optimal and correct.  I found in my
     downstream testing that this is not a reliable and optimal approach.

     Instead, this patch uses 'compute_earliest', which is the LCM function,
     to fuse multiple 'compatible' VSETVL demand infos if they have the same
     earliest edge.  We let LCM decide almost everything about demand fusion
     for us.  The only thing we do (not LCM) is check whether the VSETVL
     demand infos are compatible.  That's all we need to do.
     I believe this approach is much more reliable and optimal than before
     (we already have many testcases to check this refactor patch).
  3. Using the LCM approach to do the demand fusion is more reliable and gives
     a better CFG than before.
  ...

Here are the basics of this patch's approach:

Consider the following case:

for
  for
    for
      ...
        for
          if (...)
            VSETVL 1 demand: RATIO = 32 and TU policy.
          else if (...)
            VSETVL 2 demand: SEW = 16.
          else
            VSETVL 3 demand: MU policy.

   - 'compute_earliest' outputs the earliest edge of VSETVL 1, VSETVL 2 and
     VSETVL 3.  They all have the same earliest edge, which is outside the
     1st innermost loop.

   - Then we check that these 3 VSETVL demand infos are compatible, and fuse
     them into a single VSETVL info:
     demand SEW = 16, LMUL = MF2, TU, MU.

   - Then the later phase (Phase 4), LCM PRE (partial redundancy elimination),
     hoists this VSETVL to the outermost loop, so that we get optimal codegen.
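
The compatibility check and fusion in the example above can be sketched
roughly as follows. This is only an illustration under simplifying
assumptions: the Demand struct, fuse and lmul_of are hypothetical names, not
code from riscv-vsetvl.cc, and the real compatibility rules are richer than
"fields must not conflict". The sketch uses RATIO = SEW / LMUL to derive
LMUL.

```cpp
#include <cassert>
#include <optional>

// Hypothetical, simplified model of a VSETVL demand: a field is only set
// when the instruction actually demands it.
struct Demand {
  std::optional<int> sew;         // element width in bits
  std::optional<int> ratio;       // SEW / LMUL
  bool tail_undisturbed = false;  // TU policy demanded
  bool mask_undisturbed = false;  // MU policy demanded
};

// Merge one optional field; returns false if both sides demand
// different values.
static bool merge_field(std::optional<int> &dst,
                        const std::optional<int> &src) {
  if (!src) return true;
  if (dst && *dst != *src) return false;
  dst = src;
  return true;
}

// Fuse b into a; returns std::nullopt when the demands are incompatible.
std::optional<Demand> fuse(Demand a, const Demand &b) {
  if (!merge_field(a.sew, b.sew) || !merge_field(a.ratio, b.ratio))
    return std::nullopt;
  a.tail_undisturbed = a.tail_undisturbed || b.tail_undisturbed;
  a.mask_undisturbed = a.mask_undisturbed || b.mask_undisturbed;
  return a;
}

// LMUL as a fraction: {1, 2} stands for MF2, {2, 1} for M2.
struct Lmul { int num; int den; };

// Derive LMUL from SEW and RATIO (both assumed to be powers of two).
Lmul lmul_of(const Demand &d) {
  assert(d.sew && d.ratio);
  if (*d.sew >= *d.ratio) return {*d.sew / *d.ratio, 1};
  return {1, *d.ratio / *d.sew};
}
```

Fusing "RATIO = 32, TU", "SEW = 16" and "MU" in this model yields SEW = 16,
RATIO = 32, TU, MU, and lmul_of gives 1/2, i.e. MF2, which agrees with the
fused demand listed above.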

This patch is depending on: 
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627948.html

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (vsetvl_vtype_change_only_p): New 
function.
(find_reg_killed_by): Delete.
(after_or_same_p): New function.
(has_vsetvl_killed_avl_p): Delete.
(anticipatable_occurrence_p): Adapt function.
(get_same_bb_set): Delete.
(any_set_in_bb_p): Ditto.
(change_insn): Format.
(ge_sew_ratio_unavailable_p): Fix bug.
(backward_propagate_worthwhile_p): Delete.
(vector_insn_info::parse_insn): Adapt function.
(vector_insn_info::merge): Ditto.
(vector_insn_info::dump): Ditto.
(vector_infos_manager::vector_infos_manager): Refactor Phase 3.
(vector_infos_manager::all_empty_predecessor_p): Delete.
(vector_infos_manager::all_same_ratio_p): Refactor Phase 3.
(vector_infos_manager::all_same_avl_p): Ditto.
(vector_infos_manager::create_bitmap_vectors): Ditto.
(vector_infos_manager::free_bitmap_vectors): Ditto.
(vector_infos_manager::dump): Ditto.
(pass_vsetvl::update_block_info): New function.
(enum fusion_type): Refactor Phase 3.
(pass_vsetvl::get_backward_fusion_type): Delete.
(demands_can_be_fused_p): New function.
(pass_vsetvl::hard_empty_block_p): Delete.
(earliest_pred_can_be_fused_p): New function.
(pass_vsetvl::backward_demand_fusion): Delete.
(pass_vsetvl::earliest_fusion): New function.
(pass_vsetvl::forward_demand_fusion): Delete.
(pass_vsetvl::demand_fusion): Ditto.
(pass_vsetvl::cleanup_illegal_dirty_blocks): Ditto.
(pass_vsetvl::compute_local_properties): Adapt function.
(pass_vsetvl::refine_vsetvls): Ditto.
(pass_vsetvl::cleanup_vsetvls): Ditto.
(pass_vsetvl::commit_vsetvls): Ditto.
(pass_vsetvl::local_eliminate_vsetvl_insn): Ditto.
(get_first_vsetvl_before_rvv_insns): Ditto.
(pass_vsetvl::global_eliminate_vsetvl_insn): Ditto.
(pass_vsetvl::cleanup_earliest_vsetvls): New function.
(pass_vsetvl::df_post_optimization): Adapt function.
(pass_vsetvl::compute_probabilities): Ditto.
(pass_vsetvl::lazy_vsetvl): Ditto.
* config/riscv/riscv-vsetvl.def (DEF_SEW_LMUL_FUSE_RULE): Fix bug.
* config/riscv/riscv-vsetvl.h: Refactor Phase 3.
* config/riscv/t-riscv: 

Re: [PATCH-1, combine] Don't widen shift mode when target has rotate/mask instruction on original mode [PR93738]

2023-08-20 Thread HAO CHEN GUI via Gcc-patches
Jeff,
  Thanks a lot for your comments.

  The shift mode is widened on i1/i2 before they are combined with i3 into
newpat. newpat matches the rotate/mask pattern; i1/i2 themselves don't match
the rotate/mask pattern.

  I did an experiment that disables widening the shift mode for lshiftrt and
tested it on powerpc/x86/aarch64. No regressions occurred. I had thought that
widening the shift mode helps newpat matching, but it seems not to, or at
least it has no impact on powerpc/x86/aarch64.
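
For reference, the condition removed in the experiment
((nonzero_bits (op, mode) & ~GET_MODE_MASK (orig_mode)) == 0) says that
widening a logical right shift is safe when no bits above the original mode
can be set. A small stand-alone model of that idea, using 8-bit and 32-bit
integers in place of machine modes (the function names are made up for
illustration and are not combine.cc code):

```cpp
#include <cassert>
#include <cstdint>

// Logical right shift performed in the original 8-bit mode.
uint8_t shift_in_orig_mode(uint8_t x, unsigned count) {
  assert(count < 8);
  return static_cast<uint8_t>(x >> count);
}

// The same shift performed in a wider 32-bit mode, then truncated back
// to 8 bits.
uint8_t shift_in_wide_mode(uint32_t x, unsigned count) {
  assert(count < 8);
  return static_cast<uint8_t>(x >> count);
}

// Analogue of (nonzero_bits & ~GET_MODE_MASK (orig_mode)) == 0 for a
// known value: nothing is set above the low 8 bits.
bool upper_bits_zero(uint32_t x) {
  return (x & ~UINT32_C(0xFF)) == 0;
}
```

When upper_bits_zero holds, both shifts agree. When it does not (for example
x = 0x1F0 and count = 4), the wide shift brings a stray bit into the low
byte, so the two results differ.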

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 4bf867d74b0..0b9b115f9bb 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -10479,11 +10479,6 @@ try_widen_shift_mode (enum rtx_code code, rtx op, int 
count,
   return orig_mode;

 case LSHIFTRT:
-  /* Similarly here but with zero bits.  */
-  if (HWI_COMPUTABLE_MODE_P (mode)
- && (nonzero_bits (op, mode) & ~GET_MODE_MASK (orig_mode)) == 0)
-   return mode;
-
   /* We can also widen if the bits brought in will be masked off.  This
 operation is performed in ORIG_MODE.  */
   if (outer_code == AND)

Segher,
  Could you tell me what the purpose of widening the shift mode in
simplify_shift_const is? Does it definitely reduce the rtx cost, or does it
help match patterns? Thanks a lot.

Thanks
Gui Haochen


On 2023/8/5 7:32, Jeff Law wrote:
> 
> 
> On 7/20/23 18:59, HAO CHEN GUI wrote:
>> Hi Jeff,
>>
>> On 2023/7/21 5:27, Jeff Law wrote:
>>> Wouldn't it make more sense to just try rotate/mask in the original mode 
>>> before trying a shift in a widened mode?  I'm not sure why we need a target 
>>> hook here.
>>
>> There is no chance to try rotate/mask with the original mode when
>> expensive_optimizations is set. The subst widens the shift mode.
> But we can add it before the attempt in the wider mode.
> 
>>
>>    if (flag_expensive_optimizations)
>>  {
>>    /* Pass pc_rtx so no substitutions are done, just
>>   simplifications.  */
>>    if (i1)
>>  {
>>    subst_low_luid = DF_INSN_LUID (i1);
>>    i1src = subst (i1src, pc_rtx, pc_rtx, 0, 0, 0);
>>  }
>>
>>    subst_low_luid = DF_INSN_LUID (i2);
>>    i2src = subst (i2src, pc_rtx, pc_rtx, 0, 0, 0);
>>  }
>>
>> I don't know if the wider mode is helpful to other targets, so
>> I added the target hook.
> In this scenario we're often better off relying on rtx_costs (even with all 
> its warts) rather than adding yet another target hook.
> 
> I'd love to hear from Segher here to see if he's got other ideas.
> 
> jeff


[PATCH] LoongArch: initial ada support on linux

2023-08-20 Thread Yang Yujie
gcc/ChangeLog:

* ada/Makefile.rtl: Add LoongArch support.
* ada/libgnarl/s-linux__loongarch.ads: New.
* ada/libgnat/system-linux-loongarch.ads: New.
* config/loongarch/loongarch.h: mark normalized options
passed from driver to gnat1 as explicit for multilib.
---
 gcc/ada/Makefile.rtl   |  49 +++
 gcc/ada/libgnarl/s-linux__loongarch.ads| 134 +++
 gcc/ada/libgnat/system-linux-loongarch.ads | 145 +
 gcc/config/loongarch/loongarch.h   |   4 +-
 4 files changed, 330 insertions(+), 2 deletions(-)
 create mode 100644 gcc/ada/libgnarl/s-linux__loongarch.ads
 create mode 100644 gcc/ada/libgnat/system-linux-loongarch.ads

diff --git a/gcc/ada/Makefile.rtl b/gcc/ada/Makefile.rtl
index b94caa45b10..8908a5acf38 100644
--- a/gcc/ada/Makefile.rtl
+++ b/gcc/ada/Makefile.rtl
@@ -2118,6 +2118,55 @@ ifeq ($(strip $(filter-out cygwin% mingw32% 
pe,$(target_os))),)
   LIBRARY_VERSION := $(LIB_VERSION)
 endif
 
+# LoongArch Linux
+ifeq ($(strip $(filter-out loongarch% linux%,$(target_cpu) $(target_os))),)
+  LIBGNAT_TARGET_PAIRS = \
+  a-exetim.adbhttp://www.gnu.org/licenses/>.  --
+--  --
+--
+
+--  This is the LoongArch version of this package
+
+--  This package encapsulates cpu specific differences between implementations
+--  of GNU/Linux, in order to share s-osinte-linux.ads.
+
+--  PLEASE DO NOT add any with-clauses to this package or remove the pragma
+--  Preelaborate. This package is designed to be a bottom-level (leaf) package
+
+with Interfaces.C;
+with System.Parameters;
+
+package System.Linux is
+   pragma Preelaborate;
+
+   --
+   -- Time --
+   --
+
+   subtype int is Interfaces.C.int;
+   subtype long is Interfaces.C.long;
+   subtype suseconds_t is Interfaces.C.long;
+   type time_t is range -2 ** (System.Parameters.time_t_bits - 1)
+ .. 2 ** (System.Parameters.time_t_bits - 1) - 1;
+   subtype clockid_t   is Interfaces.C.int;
+
+   type timespec is record
+  tv_sec  : time_t;
+  tv_nsec : long;
+   end record;
+   pragma Convention (C, timespec);
+
+   type timeval is record
+  tv_sec  : time_t;
+  tv_usec : suseconds_t;
+   end record;
+   pragma Convention (C, timeval);
+
+   ---
+   -- Errno --
+   ---
+
+   EAGAIN: constant := 11;
+   EINTR : constant := 4;
+   EINVAL: constant := 22;
+   ENOMEM: constant := 12;
+   EPERM : constant := 1;
+   ETIMEDOUT : constant := 110;
+
+   -
+   -- Signals --
+   -
+
+   SIGHUP : constant := 1; --  hangup
+   SIGINT : constant := 2; --  interrupt (rubout)
+   SIGQUIT: constant := 3; --  quit (ASCD FS)
+   SIGILL : constant := 4; --  illegal instruction (not reset)
+   SIGTRAP: constant := 5; --  trace trap (not reset)
+   SIGIOT : constant := 6; --  IOT instruction
+   SIGABRT: constant := 6; --  used by abort, replace SIGIOT in the  future
+   SIGBUS : constant := 7; --  bus error
+   SIGFPE : constant := 8; --  floating point exception
+   SIGKILL: constant := 9; --  kill (cannot be caught or ignored)
+   SIGUSR1: constant := 10; --  user defined signal 1
+   SIGSEGV: constant := 11; --  segmentation violation
+   SIGUSR2: constant := 12; --  user defined signal 2
+   SIGPIPE: constant := 13; --  write on a pipe with no one to read it
+   SIGALRM: constant := 14; --  alarm clock
+   SIGTERM: constant := 15; --  software termination signal from kill
+   SIGSTKFLT  : constant := 16; --  coprocessor stack fault (Linux)
+   SIGCLD : constant := 17; --  alias for SIGCHLD
+   SIGCHLD: constant := 17; --  child status change
+   SIGCONT: constant := 18; --  stopped process has been continued
+   SIGSTOP: constant := 19; --  stop (cannot be caught or ignored)
+   SIGTSTP: constant := 20; --  user stop requested from tty
+   SIGTTIN: constant := 21; --  background tty read attempted
+   SIGTTOU: constant := 22; --  background tty write attempted
+   SIGURG : constant := 23; --  urgent condition on IO channel
+   SIGXCPU: constant := 24; --  CPU time limit exceeded
+   SIGXFSZ: constant := 25; --  filesize limit exceeded
+   SIGVTALRM  : constant := 26; --  virtual timer expired
+   SIGPROF: constant := 27; --  profiling timer expired
+   SIGWINCH   : constant := 28; --  window size change
+   SIGPOLL: constant := 29; --  pollable event occurred
+   SIGIO  : constant := 29; --  I/O now possible (4.2 BSD)
+   SIGPWR : constant := 30; --  power-fail restart
+   SIGSYS : constant := 31; --  bad system call
+   SIG32  : constant := 32; --  glibc internal signal
+   SIG33  : constant := 33; --  glibc internal signal
+   SIG34  : constant := 

Re: Intel AVX10.1 Compiler Design and Support

2023-08-20 Thread Hongtao Liu via Gcc-patches
On Sun, Aug 20, 2023 at 6:44 AM ZiNgA BuRgA via Gcc-patches
 wrote:
>
> Hi,
>
> With the proposed design of these switches, how would I restrict AVX10.1
> to particular AVX-512 subsets?
We can't; avx10.1 is treated as an indivisible ISA that contains all
AVX512-related instructions.

> We’ve been taking these cases as bugs (but yes, intrinsics are still allowed, 
> so in some cases it might prove difficult to guarantee this).
Intel SDE supports an avx10.1-256 target, which can be used to validate the
binary (i.e. to check whether an invalid 512-bit vector register or 64-bit
kmask register is used).
> I don’t see any other way of doing what you want within the constraints of 
> this design.
It looks like the requirement is an option such as
-mavx10-vector-width=256 (or maybe a reuse of -mprefer-vector-width=256)
that acts on the original -mavx512XXX options to produce an
avx10.1-256 compatible binary. We can't use -mavx10.1-256 for this, since it
may include avx512fp16 instructions and thus not be backward compatible with
SKX/CLX/ICX.
>
> For example, usage of the |_mm256_rol_epi32| intrinsic should be
> compatible on any AVX10/256 implementation, /as well as /any AVX-512VL
> without AVX10 implementation (e.g. Skylake-X).  But how do I signal that
> I want compatibility with both these targets?
>
>   * |-mavx512vl| lets the compiler use 512-bit registers -> incompatible
> with 256-bit AVX10.
>   * |-mavx512vl -mprefer-vector-width=256| might steer the compiler away
> from 512-bit registers, but I don't think it guarantees it.
>   * |-mavx10.1-256| lets the compiler use all Sapphire Rapids AVX-512
> features at 256-bit wide (so in theory, it could choose to compile
> it with |vpshldd|) -> incompatible with Skylake-X.
>   * |-mavx10.1-256 -mno-avx512fp16 -mno-avx512...| will emit a warning
> and ignore the attempts at disabling AVX-512 subsets.
>   * |-mavx10.1-256 -mavx512vl| takes the /union/ of the features, not
> the /intersection./
>
> Is there something like |-mavx512vl -mmax-vector-width=256|, or am I
> misunderstanding the situation?
>
> Thanks!



-- 
BR,
Hongtao


Re: [PATCH] RISC-V: Fix incorrect VTYPE fusion for floating point scalar move insn[PR111037]

2023-08-20 Thread Juzhe Zhong

I am so sorry for sending the wrong, duplicate patch.
Please disregard this patch.




[PATCH] LCM: Export 2 helpful functions as global for VSETVL PASS use in RISC-V backend

2023-08-20 Thread Juzhe-Zhong
This patch exports 'compute_antinout_edge' and 'compute_earliest' at global
scope so that they can be used in the VSETVL PASS of the RISC-V backend.

Demand fusion is the fusion of VSETVL information so that we emit a VSETVL
which dominates and pre-configures most of the RVV instructions, in order to
elide redundant VSETVLs.

For example:

for
  for
    for
      if (cond)
        VSETVL demand 1: SEW/LMUL = 16 and TU policy
      else
        VSETVL demand 2: SEW = 32

The VSETVL pass should be able to fuse demand 1 and demand 2 into a new
demand: SEW = 32, LMUL = M2, TU policy.
It should then emit such a VSETVL at the outermost level of the loop nest to
get the most optimal codegen and run-time execution.

Currently the VSETVL PASS Phase 3 (demand fusion) is really messy, unreliable
and unmaintainable.
I recently re-read the Dragon Book and Morgan's book, and found that
"earliest" allows us to do the demand fusion in a very reliable and optimal
way.

So, this patch exports these 2 functions, which are very helpful for the
VSETVL pass.
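
For readers who do not have lcm.cc at hand, here is a rough stand-alone model
of what "earliest" computes for one edge. It is a sketch using std::bitset
rather than GCC's sbitmap, and it is based on my reading of the usual
edge-based LCM formulation: an expression is earliest on edge (pred, succ)
if it is anticipated at the entry of succ, not available at the exit of
pred, and either killed in pred or not anticipated at the exit of pred. The
special cases for the entry and exit blocks are left out, and the BlockSets
and earliest_on_edge names are invented for this example.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

constexpr std::size_t kExprs = 4;  // number of tracked expressions (demands)
using Bits = std::bitset<kExprs>;

// Per-block dataflow sets, one bit per expression.
struct BlockSets {
  Bits antin;   // anticipated at block entry
  Bits antout;  // anticipated at block exit
  Bits avout;   // available at block exit
  Bits kill;    // killed inside the block
};

// Earliest set for a normal edge pred -> succ (entry/exit edges omitted).
Bits earliest_on_edge(const BlockSets &pred, const BlockSets &succ) {
  Bits difference = succ.antin & ~pred.avout;
  return difference & (pred.kill | ~pred.antout);
}
```

In the VSETVL setting each bit would stand for one demand, so demands whose
bits are set on the same edge are the candidates for fusion.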

gcc/ChangeLog:

* lcm.cc (compute_antinout_edge): Export as global use.
(compute_earliest): Ditto.
(compute_rev_insert_delete): Ditto.
* lcm.h (compute_antinout_edge): Ditto.
(compute_earliest): Ditto.

---
 gcc/lcm.cc | 7 ++-
 gcc/lcm.h  | 3 +++
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/lcm.cc b/gcc/lcm.cc
index 94a3ed43aea..03421e490e4 100644
--- a/gcc/lcm.cc
+++ b/gcc/lcm.cc
@@ -56,9 +56,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "lcm.h"
 
 /* Edge based LCM routines.  */
-static void compute_antinout_edge (sbitmap *, sbitmap *, sbitmap *, sbitmap *);
-static void compute_earliest (struct edge_list *, int, sbitmap *, sbitmap *,
- sbitmap *, sbitmap *, sbitmap *);
 static void compute_laterin (struct edge_list *, sbitmap *, sbitmap *,
 sbitmap *, sbitmap *);
 static void compute_insert_delete (struct edge_list *edge_list, sbitmap *,
@@ -79,7 +76,7 @@ static void compute_rev_insert_delete (struct edge_list 
*edge_list, sbitmap *,
This is done based on the flow graph, and not on the pred-succ lists.
Other than that, its pretty much identical to compute_antinout.  */
 
-static void
+void
 compute_antinout_edge (sbitmap *antloc, sbitmap *transp, sbitmap *antin,
   sbitmap *antout)
 {
@@ -170,7 +167,7 @@ compute_antinout_edge (sbitmap *antloc, sbitmap *transp, 
sbitmap *antin,
 
 /* Compute the earliest vector for edge based lcm.  */
 
-static void
+void
 compute_earliest (struct edge_list *edge_list, int n_exprs, sbitmap *antin,
  sbitmap *antout, sbitmap *avout, sbitmap *kill,
  sbitmap *earliest)
diff --git a/gcc/lcm.h b/gcc/lcm.h
index e08339352e0..7145d6fc46d 100644
--- a/gcc/lcm.h
+++ b/gcc/lcm.h
@@ -31,4 +31,7 @@ extern struct edge_list *pre_edge_rev_lcm (int, sbitmap *,
   sbitmap *, sbitmap *,
   sbitmap *, sbitmap **,
   sbitmap **);
+extern void compute_antinout_edge (sbitmap *, sbitmap *, sbitmap *, sbitmap *);
+extern void compute_earliest (struct edge_list *, int, sbitmap *, sbitmap *,
+ sbitmap *, sbitmap *, sbitmap *);
 #endif /* GCC_LCM_H */
-- 
2.36.3




[PATCH] RISC-V: Fix incorrect VTYPE fusion for floating point scalar move insn[PR111037]

2023-08-20 Thread Juzhe-Zhong
void foo(_Float16 y, int64_t *i64p)
{
  vint64m1_t vx =__riscv_vle64_v_i64m1 (i64p, 1);
  vx = __riscv_vadd_vv_i64m1 (vx, vx, 1);
  vfloat16m1_t vy =__riscv_vfmv_s_f_f16m1 (y, 1);
  asm volatile ("# use %0 %1" : : "vr"(vx), "vr" (vy));
}

zve64f:
foo:
vsetivli zero,1,e16,mf4,ta,ma
vle64.v v1,0(a0)
vfmv.s.f v2,fa0
vsetvli zero,zero,e64,m1,ta,ma
vadd.vv v1,v1,v1

zve64d:
foo:
vsetivli zero,1,e64,m1,ta,ma
vle64.v v1,0(a0)
vfmv.s.f v2,fa0
vadd.vv v1,v1,v1

PR target/111037

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (float_insn_valid_sew_p): New function.
(second_sew_less_than_first_sew_p): Fix bug.
(first_sew_less_than_second_sew_p): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr111037-1.c: New test.
* gcc.target/riscv/rvv/base/pr111037-2.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc  | 22 +--
 .../gcc.target/riscv/rvv/base/pr111037-1.c| 15 +
 .../gcc.target/riscv/rvv/base/pr111037-2.c|  8 +++
 3 files changed, 43 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-2.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 08c487d82c0..79cbac01047 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1183,18 +1183,36 @@ second_ratio_invalid_for_first_lmul_p (const 
vector_insn_info &info1,
   return calculate_sew (info1.get_vlmul (), info2.get_ratio ()) == 0;
 }
 
+static bool
+float_insn_valid_sew_p (const vector_insn_info &info, unsigned int sew)
+{
+  if (info.get_insn () && info.get_insn ()->is_real ()
+  && get_attr_type (info.get_insn ()->rtl ()) == TYPE_VFMOVFV)
+{
+  if (sew == 16)
+   return TARGET_VECTOR_ELEN_FP_16;
+  else if (sew == 32)
+   return TARGET_VECTOR_ELEN_FP_32;
+  else if (sew == 64)
+   return TARGET_VECTOR_ELEN_FP_64;
+}
+  return true;
+}
+
 static bool
 second_sew_less_than_first_sew_p (const vector_insn_info &info1,
  const vector_insn_info &info2)
 {
-  return info2.get_sew () < info1.get_sew ();
+  return info2.get_sew () < info1.get_sew ()
+|| !float_insn_valid_sew_p (info1, info2.get_sew ());
 }
 
 static bool
 first_sew_less_than_second_sew_p (const vector_insn_info &info1,
  const vector_insn_info &info2)
 {
-  return info1.get_sew () < info2.get_sew ();
+  return info1.get_sew () < info2.get_sew ()
+|| !float_insn_valid_sew_p (info2, info1.get_sew ());
 }
 
 /* return 0 if LMUL1 == LMUL2.
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-1.c
new file mode 100644
index 000..0b7b32fc3e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_zve64f_zvfh -mabi=ilp32d -O3" } */
+
+#include "riscv_vector.h"
+
+void foo(_Float16 y, int64_t *i64p)
+{
+  vint64m1_t vx =__riscv_vle64_v_i64m1 (i64p, 1);
+  vx = __riscv_vadd_vv_i64m1 (vx, vx, 1);
+  vfloat16m1_t vy =__riscv_vfmv_s_f_f16m1 (y, 1);
+  asm volatile ("# use %0 %1" : : "vr"(vx), "vr" (vy));
+}
+
+/* { dg-final { scan-assembler-times 
{vsetivli\s+zero,\s*1,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 1 } } */
+/* { dg-final { scan-assembler-times 
{vsetvli\s+zero,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-2.c
new file mode 100644
index 000..ac50da71726
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr111037-2.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_zve64d_zvfh -mabi=ilp32d -O3" } */
+
+#include "pr111037-1.c"
+
+/* { dg-final { scan-assembler-times 
{vsetivli\s+zero,\s*1,\s*e64,\s*m1,\s*t[au],\s*m[au]} 1 } } */
+/* { dg-final { scan-assembler-not {vsetvli} } } */
+/* { dg-final { scan-assembler-times {vsetivli} 1 } } */
-- 
2.36.3



[committed] Testsuite, darwin: account for macOS 13 and 14

2023-08-20 Thread FX Coudert via Gcc-patches
Committed as obvious, making gcc.dg/darwin-minversion-link.c pass on macOS 13 
and 14 (darwin22 and darwin23, respectively).

FX



0001-Testsuite-darwin-account-for-macOS-13-and-14.patch
Description: Binary data


[PATCH] libgomp, testsuite: Do not call nonstandard functions on darwin

2023-08-20 Thread FX Coudert via Gcc-patches
Hi,

testsuite/libgomp.c/simd-math-1.c calls nonstandard functions that are not 
available on darwin (and possibly other systems?). Because I did not want to 
disable their testing completely, I suggest we simply use preprocessor macros 
to avoid them on darwin.

This fixes the test failure on aarch64-apple-darwin.
OK to commit?

FX



0001-libgomp-testsuite-Do-not-call-nonstandard-functions-.patch
Description: Binary data


Re: [PATCH] testsuite: Adjust g++.dg/gomp/pr58567.C to new compiler message

2023-08-20 Thread Tobias Burnus

Hello Thiago,


On 18.08.23 23:24, Thiago Jung Bauermann wrote:

Tobias Burnus  writes:

the patch looks good to me. Thanks! Can you commit the patch yourself or
do you need someone to do this for you?

Thank you! I don't have commit access, so I would need someone to do
this for me.


Done now in commit r14-3344-g40a6803c6d8ca2.

Thanks,

Tobias
commit 40a6803c6d8ca244a7bdda8e4ec986c418362b24
Author: Thiago Jung Bauermann 
Date:   Sun Aug 20 20:46:05 2023 +0200

testsuite: Adjust g++.dg/gomp/pr58567.C to new compiler message

Commit 92d1425ca780 "c++: redundant targ coercion for var/alias tmpls"
changed the compiler error message in this testcase from

: In instantiation of 'void foo() [with T = int]':
:14:11:   required from here
:8:22: error: 'int' is not a class, struct, or union type
:8:22: error: 'int' is not a class, struct, or union type
:8:22: error: 'int' is not a class, struct, or union type
:8:3: error: expected iteration declaration or initialization
compiler exited with status 1

to:

: In instantiation of 'void foo() [with T = int]':
:14:11:   required from here
:8:22: error: 'int' is not a class, struct, or union type
:8:3: error: invalid type for iteration variable 'i'
compiler exited with status 1
Excess errors:
:8:3: error: invalid type for iteration variable 'i'

Andrew Pinski analysed the issue in PR 110756 and considered that it was a
testsuite issue in that the error message changed slightly.  Also, it's a
better error message.

Therefore, we only need to adjust the testcase to expect the new message.

gcc/testsuite/ChangeLog:
PR testsuite/110756
* g++.dg/gomp/pr58567.C: Adjust to new compiler error message.
---
 gcc/testsuite/g++.dg/gomp/pr58567.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/gomp/pr58567.C b/gcc/testsuite/g++.dg/gomp/pr58567.C
index 35a5bb027ff..866d831c65e 100644
--- a/gcc/testsuite/g++.dg/gomp/pr58567.C
+++ b/gcc/testsuite/g++.dg/gomp/pr58567.C
@@ -5,7 +5,7 @@
 template<typename T> void foo()
 {
   #pragma omp parallel for
-  for (typename T::X i = 0; i < 100; ++i)  /* { dg-error "'int' is not a class, struct, or union type|expected iteration declaration or initialization" } */
+  for (typename T::X i = 0; i < 100; ++i)  /* { dg-error "'int' is not a class, struct, or union type|invalid type for iteration variable 'i'" } */
 ;
 }
 


[PATCH, committed] Testsuite, darwin: Fix analyzer testcases

2023-08-20 Thread FX Coudert via Gcc-patches
Committed as obvious, fixing three more darwin-specific failures in analyzer 
testsuite.
This fixes:

FAIL: gcc.dg/plugin/taint-CVE-2011-0521-5.c 
-fplugin=./analyzer_kernel_plugin.so  (test for warnings, line 39)
FAIL: gcc.dg/plugin/taint-CVE-2011-0521-6.c 
-fplugin=./analyzer_kernel_plugin.so  (test for warnings, line 36)
XPASS: gcc.dg/plugin/taint-CVE-2011-0521-5-fixed.c 
-fplugin=./analyzer_kernel_plugin.so  (test for bogus messages, line 39)

Committed to trunk,
FX


0001-Testsuite-darwin-Fix-analyzer-testcases.patch
Description: Binary data


Re: [PATCH] Testsuite: mark IPA test as requiring alias support

2023-08-20 Thread FX Coudert via Gcc-patches
Hi,

> IMO, changes like this qualify as obvious, so I’d go ahead (thanks for this 
> test fail triage)

Makes sense. I’ve committed, as well as another one, attached, slightly 
amending the expected pattern of a sarif plugin test.

FX



0001-Testsuite-plugin-make-testcase-pattern-more-flexible.patch
Description: Binary data


[committed] i386: Micro-optimize ix86_expand_sse_extend

2023-08-20 Thread Uros Bizjak via Gcc-patches
The partial vector src is forced to a register as ops[1], so we can use
ops[1] instead of SRC in the call to ix86_expand_sse_cmp.  This change avoids
forcing operands[1] to a register in the sign/zero-extend expanders.
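
As background, the non-SSE4.1 path touched here builds the extension from a
compare and an interleave: the high half of each widened element is either
zero (zero-extend) or the result of a "0 > src" compare (sign-extend), and
punpck interleaves it with the source element. A scalar model of that idea
for V8QI -> V8HI, as I read the diff context (extend_v8qi is a made-up name
and the model assumes little-endian element layout):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of extending 8 bytes to 8 halfwords via compare + interleave.
std::array<uint16_t, 8> extend_v8qi(const std::array<uint8_t, 8> &src,
                                    bool unsigned_p) {
  std::array<uint16_t, 8> out{};
  for (std::size_t i = 0; i < src.size(); ++i) {
    // High half: zero for zero-extension, otherwise the (0 > src) mask,
    // which is all ones exactly when the element is negative.
    uint8_t high = 0;
    if (!unsigned_p && static_cast<int8_t>(src[i]) < 0)
      high = 0xFF;
    // Interleave: low byte from the source, high byte from the mask.
    out[i] = static_cast<uint16_t>(src[i] | (high << 8));
  }
  return out;
}
```

Both the compare and the interleave read the same source vector, which is
why one register copy (ops[1]) can serve both uses.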

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_sse_extend): Use ops[1]
instead of src in the call to ix86_expand_sse_cmp.
* config/i386/sse.md (v8qiv8hi2): Do not
force operands[1] to a register.
(v4hiv4si2): Ditto.
(v2siv2di2): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 460d496ef22..031e2f72d15 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -5667,7 +5667,7 @@ ix86_expand_sse_extend (rtx dest, rtx src, bool 
unsigned_p)
 ops[2] = force_reg (imode, CONST0_RTX (imode));
   else
 ops[2] = ix86_expand_sse_cmp (gen_reg_rtx (imode), GT, CONST0_RTX (imode),
- src, pc_rtx, pc_rtx);
+ ops[1], pc_rtx, pc_rtx);
 
   ix86_split_mmx_punpck (ops, false);
   emit_move_insn (dest, lowpart_subreg (GET_MODE (dest), ops[0], imode));
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 87c3bf07020..da85223a9b4 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -22923,8 +22923,7 @@ (define_expand "v8qiv8hi2"
 {
   if (!TARGET_SSE4_1)
 {
-  rtx op1 = force_reg (V8QImode, operands[1]);
-  ix86_expand_sse_extend (operands[0], op1, );
+  ix86_expand_sse_extend (operands[0], operands[1], );
   DONE;
 }
 
@@ -23240,8 +23239,7 @@ (define_expand "v4hiv4si2"
 {
   if (!TARGET_SSE4_1)
 {
-  rtx op1 = force_reg (V4HImode, operands[1]);
-  ix86_expand_sse_extend (operands[0], op1, );
+  ix86_expand_sse_extend (operands[0], operands[1], );
   DONE;
 }
 
@@ -23846,8 +23844,7 @@ (define_expand "v2siv2di2"
 {
   if (!TARGET_SSE4_1)
 {
-  rtx op1 = force_reg (V2SImode, operands[1]);
-  ix86_expand_sse_extend (operands[0], op1, );
+  ix86_expand_sse_extend (operands[0], operands[1], );
   DONE;
 }
 


Re: [PATCH] Testsuite: mark IPA test as requiring alias support

2023-08-20 Thread Iain Sandoe
Hi FX,

> On 20 Aug 2023, at 13:15, FX Coudert via Gcc-patches 
>  wrote:

> The fact that this test needs alias support was indicated in 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85656 but never committed. 
> Without it, the test fails on darwin.
> 
> OK to commit?

IMO, changes like this qualify as obvious, so I’d go ahead (thanks for this 
test fail triage)

Iain

> 
> FX
> 
> <0001-Testsuite-mark-IPA-test-as-requiring-alias-support.patch>



[PATCH] Testsuite: mark IPA test as requiring alias support

2023-08-20 Thread FX Coudert via Gcc-patches
Hi,

The fact that this test needs alias support was indicated in 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85656 but never committed. Without 
it, the test fails on darwin.

OK to commit?

FX



0001-Testsuite-mark-IPA-test-as-requiring-alias-support.patch
Description: Binary data


[PATCH] Testsuite, DWARF2: adjust regexp to match darwin output

2023-08-20 Thread FX Coudert via Gcc-patches
Hi,

This was a painful one to fix, because I hate regexps, especially when they are 
quoted. On darwin, we have this failure:

FAIL: gcc.dg/debug/dwarf2/inline4.c scan-assembler 
DW_TAG_inlined_subroutine[^(]*([^)]*)[^(]*(DIE 
(0x[0-9a-f]*) DW_TAG_formal_parameter[^(]*(DIE 
(0x[0-9a-f]*) DW_TAG_variable

That hideous regexp is trying to match (generated on Linux):

> .uleb128 0x4        # (DIE (0x5c) DW_TAG_inlined_subroutine)
> .long   0xa0        # DW_AT_abstract_origin
> .quad   .LBI4       # DW_AT_entry_pc
> .byte   .LVU2       # DW_AT_GNU_entry_view
> .quad   .LBB4       # DW_AT_low_pc
> .quad   .LBE4-.LBB4 # DW_AT_high_pc
> .byte   0x1         # DW_AT_call_file (u.c)
> .byte   0xf         # DW_AT_call_line
> .byte   0x14        # DW_AT_call_column
> .uleb128 0x5        # (DIE (0x7d) DW_TAG_formal_parameter)
> .long   0xad        # DW_AT_abstract_origin
> .long   .LLST0      # DW_AT_location
> .long   .LVUS0      # DW_AT_GNU_locviews
> .uleb128 0x6        # (DIE (0x8a) DW_TAG_variable)

It uses the parentheses to check what is between
DW_TAG_inlined_subroutine, DW_TAG_formal_parameter and DW_TAG_variable. There’s
only one block of parentheses in the middle, namely "(u.c)". However, on darwin,
the generated output is more compact:

> .uleb128 0x4    ; (DIE (0x188) DW_TAG_inlined_subroutine)
> .long   0x1b8   ; DW_AT_abstract_origin
> .quad   LBB4    ; DW_AT_low_pc
> .quad   LBE4    ; DW_AT_high_pc
> .uleb128 0x5    ; (DIE (0x19d) DW_TAG_formal_parameter)
> .long   0x1c6   ; DW_AT_abstract_origin
> .uleb128 0x6    ; (DIE (0x1a2) DW_TAG_variable)

I think that’s valid as well, and the test should pass (what the test really 
wants to check is that there is no DW_TAG_lexical_block emitted there, see 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37801 for its origin). It could be 
achieved in two ways:

1. making darwin emit the DW_AT_call_file
2. adjusting the regexp to match, making the internal block of parentheses 
optional 

I chose the second approach. It makes the test pass on darwin. If someone can 
test it on linux, it’d be appreciated :) I don’t have ready access to such a 
system right now.
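
To make the second approach concrete, here is a small stand-alone check
written with std::regex rather than the Tcl regexp the testsuite uses. The
patterns are my own reconstruction for illustration and are not copied from
the attached patch: kStrict requires one parenthesised block between the
first two DIEs, and kRelaxed wraps that block in an optional group.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Common tail: the formal_parameter DIE followed by the variable DIE.
const std::string kTail =
    R"re(\(DIE \(0x[0-9a-f]*\) DW_TAG_formal_parameter[^(]*)re"
    R"re(\(DIE \(0x[0-9a-f]*\) DW_TAG_variable)re";

// Requires exactly one parenthesised block, such as "(u.c)", after the
// inlined_subroutine DIE.
const std::string kStrict =
    R"re(DW_TAG_inlined_subroutine[^(]*\([^)]*\)[^(]*)re" + kTail;

// Same pattern with the parenthesised block made optional.
const std::string kRelaxed =
    R"re(DW_TAG_inlined_subroutine[^(]*(\([^)]*\)[^(]*)?)re" + kTail;

// True if the pattern matches somewhere in the assembler text.
bool matches(const std::string &pattern, const std::string &asm_text) {
  return std::regex_search(asm_text, std::regex(pattern));
}
```

On abbreviated versions of the two listings above, kStrict matches only the
Linux output, while kRelaxed matches both. One caveat of this particular
sketch: when the "(u.c)" block is absent, the optional group appears able to
absorb a single extra DIE instead, so the relaxed form may be weaker on
darwin than the original check is on Linux.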

Once that passes, OK to commit?
FX



0001-Testsuite-DWARF2-adjust-regexp-to-match-darwin-outpu.patch
Description: Binary data


Re: Re: [PATCH 1/4][V4][RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-08-20 Thread Fei Gao

Hi Kito

This issue is due to zcmp and shrink-wrap-separate conflict,
which has been addressed by an under-review patch.
[PATCH 0/2] resolve confilct between RISC-V zcmp and shrink-wrap-separate
https://patchwork.sourceware.org/project/gcc/list/?series=21577
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg311487.html

I'm preparing [PATCH 1/4][V5][RISC-V] support cm.push cm.pop cm.popret in zcmp
for the 1st issue you caught.
Please let me know if you want me to merge 
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg311486.html
into [PATCH 1/4][V5][RISC-V].

BR, 
Fei
On 2023-08-16 16:38  Kito Cheng  wrote:
>
>Another fail case for CFI:
>
>$ riscv64-unknown-elf-gcc _mulhc3.i
>-march=rv64imafd_zicsr_zifencei_zca_zcmp -mabi=lp64d -g  -O2  -o
>_mulhc3.s
>
>typedef float a __attribute__((mode(HF)));
>b, c;
>f() {
> a a, d, e = a + d;
> if (g() && e)
>   c = b;
>}
>
>
>0x10e508a maybe_record_trace_start
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:2584
>0x10e58fb scan_trace
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:2784
>0x10e5fab create_cfi_notes
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:2938
>0x10e6ee4 execute_dwarf2_frame
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:3309
>0x10e7c5a execute
>   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/dwarf2cfi.cc:3797
>
>On Wed, Aug 16, 2023 at 4:33 PM Kito Cheng  wrote:
>>
>> Hi Fei:
>>
>> Tried to use Jiawei's patch to test this patch and found some issues:
>>
>>
>> > @@ -5430,13 +5632,15 @@ riscv_expand_prologue (void)
>> >    /* Save the registers.  */
>> >    if ((frame->mask | frame->fmask) != 0)
>> >  {
>> > -  HOST_WIDE_INT step1 = riscv_first_stack_step (frame, remaining_size);
>> > -
>> > -  insn = gen_add3_insn (stack_pointer_rtx,
>> > -   stack_pointer_rtx,
>> > -   GEN_INT (-step1));
>> > -  RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>> > -  remaining_size -= step1;
>> > +  if (known_gt (remaining_size, frame->frame_pointer_offset))
>> > +    {
>> > +  HOST_WIDE_INT step1 = riscv_first_stack_step (frame, remaining_size);
>> > +  remaining_size -= step1;
>> > +  insn = gen_add3_insn (stack_pointer_rtx,
>> > +    stack_pointer_rtx,
>> > +    GEN_INT (-step1));
>> > +  RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>> > +    }
>> >    riscv_for_each_saved_reg (remaining_size, riscv_save_reg, false, false);
>> >  }
>> >
>>
>> I hit an issue here while building libgcc; I use
>> riscv-gnu-toolchain with --with-arch=rv64gzca_zcmp
>>
>> And the error message is:
>>
>> In file included from
>> ../../../../../riscv-gnu-toolchain-trunk/gcc/libgcc/unwind-dw2.c:1471:
>> ../../../../../riscv-gnu-toolchain-trunk/gcc/libgcc/unwind.inc: In
>> function '_Unwind_Backtrace':
>> ../../../../../riscv-gnu-toolchain-trunk/gcc/libgcc/unwind.inc:330:1:
>> internal compiler error: in gen_reg_rtx, at emit-rtl.cc:1176
>>  330 | }
>>  | ^
>> 0x83753a gen_reg_rtx(machine_mode)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/emit-rtl.cc:1176
>> 0xf5566f maybe_legitimize_operand
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:8047
>> 0xf5566f maybe_legitimize_operands(insn_code, unsigned int, unsigned
>> int, expand_operand*)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:8191
>> 0xf511d9 maybe_gen_insn(insn_code, unsigned int, expand_operand*)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:8210
>> 0xf58539 expand_binop_directly
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:1452
>> 0xf5 expand_binop(machine_mode, optab_tag, rtx_def*, rtx_def*,
>> rtx_def*, int, optab_methods)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs.cc:1539
>> 0xcbfdd0 force_operand(rtx_def*, rtx_def*)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/expr.cc:8231
>> 0xc8fca1 force_reg(machine_mode, rtx_def*)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/explow.cc:687
>> 0x144b8cd riscv_force_temporary
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.cc:1531
>> 0x144b8cd riscv_force_address
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.cc:1528
>> 0x144b8cd riscv_legitimize_move(machine_mode, rtx_def*, rtx_def*)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.cc:2387
>> 0x1af063e gen_movdf(rtx_def*, rtx_def*)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/config/riscv/riscv.md:2107
>> 0xcba503 rtx_insn* insn_gen_fn::operator()> rtx_def*>(rtx_def*, rtx_def*) const
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/recog.h:411
>> 0xcba503 emit_move_insn_1(rtx_def*, rtx_def*)
>>    ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/expr.cc:4164
>> 0x143d6c4 riscv_emit_move(rtx_def*, rtx_def*)
>

[PATCH] Testsuite, LTO: silence warning to make test pass on Darwin

2023-08-20 Thread FX Coudert via Gcc-patches
Hi,

On darwin (both x86_64-apple-darwin and aarch64-apple-darwin) we see the 
following test failure:

FAIL: gcc.dg/lto/20091013-1 c_lto_20091013-1_2.o assemble, -fPIC -r -nostdlib 
-O2 -flto

which is due to this extra warning:

In function 'fontcmp',
inlined from 'find_in_cache' at 
/tmp/gcc-darwin-arm64/gcc/testsuite/gcc.dg/lto/20091013-1_2.c:140:13,
inlined from 'WineEngCreateFontInstance' at 
/tmp/gcc-darwin-arm64/gcc/testsuite/gcc.dg/lto/20091013-1_2.c:160:15:
/tmp/gcc-darwin-arm64/gcc/testsuite/gcc.dg/lto/20091013-1_2.c:107:8: warning: 
'memcmp' specified bound 4 exceeds source size 0 [-Wstringop-overread]
/tmp/gcc-darwin-arm64/gcc/testsuite/gcc.dg/lto/20091013-1_2.c: In function 
'WineEngCreateFontInstance':
/tmp/gcc-darwin-arm64/gcc/testsuite/gcc.dg/lto/20091013-1_2.c:66:20: note: 
source object allocated here

Now, the main file for the test has:

/* { dg-extra-ld-options "-flinker-output=nolto-rel -Wno-stringop-overread" } */

and I believe the intent of -Wno-stringop-overread is to silence this warning, 
but that only applies to the linker, and the warning on darwin is produced by 
the compiler (in addition to the linker). Adding the flag to the compilation of 
the source file makes the test pass on darwin.

OK to commit?
FX




0001-Testsuite-LTO-silence-warning-to-make-test-pass-on-D.patch
Description: Binary data


Re: [PATCH] core: Support heap-based trampolines

2023-08-20 Thread FX Coudert via Gcc-patches
Hi,

A gentle ping on the revised patch, for Richard or another global reviewer.

Thanks,
FX



> On 5 Aug 2023, at 16:20, FX Coudert wrote:
> 
> Hi Richard,
> 
> Thanks for your feedback. Here is an amended version of the patch, taking 
> into consideration your requests and the following discussion. There is no 
> configure option for the libgcc part, and the documentation is amended. The 
> patch is split into three commits for core, target and libgcc.
> 
> Currently regtesting on x86_64 linux and darwin (it was fine before I split 
> up into three commits, so I’m re-testing to make sure I didn’t screw anything 
> up).
> 
> OK to commit?
> FX



0001-core-Support-heap-based-trampolines.patch
Description: Binary data


0002-target-Support-heap-based-trampolines.patch
Description: Binary data


0003-libgcc-support-heap-based-trampolines.patch
Description: Binary data


[committed] d: Merge upstream dmd, druntime 26f049fb26, phobos 330d6a4fd.

2023-08-20 Thread Iain Buclaw via Gcc-patches
Hi,

This patch merges the D front-end and run-time library with upstream dmd
26f049fb26, and standard library with phobos 330d6a4fd.

Synchronizing with the latest bug fixes in the v2.105.0-beta.1 release.

D front-end changes:

- Import dmd v2.105.0-beta.1.
- Added predefined version identifier VisionOS (ignored by GDC).
- Functions can no longer have `enum` storage class.
- The deprecation of the `body` keyword has been reverted, it is
  now an obsolete feature.
- The error for `scope class` has been reverted, it is now an
  obsolete feature.

D runtime changes:

- Import druntime v2.105.0-beta.1.

Phobos changes:

- Import phobos v2.105.0-beta.1.
- AliasSeq has been removed from std.math.
- extern(C) getdelim and getline have been removed from
  std.stdio.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 26f049fb26.
* dmd/VERSION: Bump version to v2.105.0-beta.1.
* d-codegen.cc (get_frameinfo): Check useGC in condition.
* d-lang.cc (d_handle_option): Set obsolete parameter when compiling
with -Wall.
(d_post_options): Set useGC to false when compiling with
-fno-druntime.  Propagate obsolete flag to compileEnv.
* expr.cc (ExprVisitor::visit (CatExp *)): Check useGC in condition.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32, committed
to mainline.

Regards,
Iain.

---
libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 26f049fb26.
* src/MERGE: Merge upstream phobos 330d6a4fd.
---
 gcc/d/d-codegen.cc|   2 +-
 gcc/d/d-lang.cc   |   3 +
 gcc/d/dmd/MERGE   |   2 +-
 gcc/d/dmd/VERSION |   2 +-
 gcc/d/dmd/clone.d |   2 +-
 gcc/d/dmd/common/string.d |   2 +-
 gcc/d/dmd/cond.d  |   1 +
 gcc/d/dmd/cparse.d|  10 +-
 gcc/d/dmd/dsymbolsem.d| 194 ++
 gcc/d/dmd/errors.d|  34 +--
 gcc/d/dmd/expression.d|  24 ++-
 gcc/d/dmd/expression.h|   6 +-
 gcc/d/dmd/expressionsem.d |   4 +-
 gcc/d/dmd/func.d  |  18 +-
 gcc/d/dmd/globals.d   |  10 +-
 gcc/d/dmd/globals.h   |  11 +-
 gcc/d/dmd/initsem.d   |  25 ++-
 gcc/d/dmd/lexer.d |   1 +
 gcc/d/dmd/nogc.d  |   2 +-
 gcc/d/dmd/parse.d |  86 +---
 gcc/d/dmd/semantic3.d |   3 +-
 gcc/d/dmd/target.d|   4 +-
 gcc/d/dmd/target.h|   2 +-
 gcc/d/dmd/traits.d|  23 ++-
 gcc/d/expr.cc |   2 +-
 gcc/testsuite/gdc.test/compilable/cppmangle.d |   1 -
 .../gdc.test/compilable/deprecate14283.d  |   8 +-
 .../gdc.test/compilable/emptystatement.d  |  19 ++
 .../gdc.test/compilable/imports/imp24022.c|   5 +
 .../gdc.test/compilable/parens_inc.d  |  23 +++
 gcc/testsuite/gdc.test/compilable/test23951.d |  10 +
 gcc/testsuite/gdc.test/compilable/test23966.d |  19 ++
 gcc/testsuite/gdc.test/compilable/test24022.d |  30 +++
 gcc/testsuite/gdc.test/compilable/test7172.d  |   6 +-
 .../gdc.test/fail_compilation/biterrors3.d|   2 +-
 .../gdc.test/fail_compilation/body.d  |  11 +
 .../gdc.test/fail_compilation/ccast.d |  21 +-
 .../gdc.test/fail_compilation/diag4596.d  |   4 +-
 .../gdc.test/fail_compilation/enum_function.d |  13 ++
 .../gdc.test/fail_compilation/fail10285.d |  12 +-
 .../gdc.test/fail_compilation/fail13116.d |   2 +-
 .../gdc.test/fail_compilation/fail15896.d |   1 +
 .../gdc.test/fail_compilation/fail22729.d |   2 +-
 .../gdc.test/fail_compilation/fail22780.d |   2 +-
 .../gdc.test/fail_compilation/fail4559.d  |  22 --
 .../gdc.test/fail_compilation/format.d|  21 +-
 .../fail_compilation/reserved_version.d   |   2 +
 .../gdc.test/fail_compilation/scope_class.d   |   2 +-
 .../gdc.test/fail_compilation/scope_type.d|  16 --
 .../gdc.test/fail_compilation/test23279.d |  14 ++
 .../gdc.test/fail_compilation/typeerrors.d|   2 +-
 gcc/testsuite/gdc.test/runnable/betterc.d |  11 +
 gcc/testsuite/gdc.test/runnable/sctor2.d  |   5 -
 gcc/testsuite/gdc.test/runnable/test24029.c   |  23 +++
 .../gdc.test/runnable/testcontracts.d |  16 --
 libphobos/libdruntime/MERGE   |   2 +-
 libphobos/libdruntime/core/int128.d   |   8 +-
 .../core/internal/array/comparison.d  |  25 ++-
 libphobos/libdruntime/core/lifetime.d |   6 +-
 libphobos/src/MERGE   |   2 +-
 libpho

Re: [PATCH v4 0/6] Add Loongson SX/ASX instruction support to LoongArch target.

2023-08-20 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-08-17 at 15:20 +0800, Chenghui Pan wrote:
> It seems ARMv8-A only guarantees to preserve the low 64 bits of each
> NEON/floating-point register. I'm not sure that I modified the
> testcase in the right way, and maybe we need more investigation. Any
> ideas or suggestions?

Sorry, the following sentence in GCC manual section 6.47.5.2 suggests my
test case is not valid:

"As with global register variables, it is recommended that you choose a
register that is normally saved and restored by function calls on your
machine, so that calls to library routines will not clobber it."

So when I use asm(name), the compiler has no obligation to guarantee
that it will ever work like a normal variable after a function call.
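
To illustrate the manual's recommendation, here is a small sketch, assuming GCC or Clang on x86-64, where `rbx` is call-saved under the System V ABI. The function names are hypothetical, and any other target takes the plain fallback path. The variable is only used as an inline asm operand, so the sketch does not rely on the value surviving a function call.

```c
#include <assert.h>

/* Route X through a local register variable bound to a call-saved
   register.  Only the x86-64 path uses asm(name); elsewhere this is
   an ordinary copy.  */
long pin_in_call_saved_reg(long x)
{
#if defined(__GNUC__) && defined(__x86_64__)
    /* rbx is normally saved and restored by function calls on this
       target, which is the kind of register the manual recommends.  */
    register long v __asm__("rbx") = x;

    /* An empty asm with a "+r" operand: being an inline asm operand is
       the documented use of a local register variable.  */
    __asm__ volatile("" : "+r"(v));
    return v;
#else
    return x;
#endif
}

/* The value passes through unchanged on both paths.  */
void check_pin(void)
{
    assert(pin_in_call_saved_reg(42) == 42);
    assert(pin_in_call_saved_reg(-7) == -7);
}
```

Picking a call-clobbered register instead would compile just as well, which is exactly why such a test case proves nothing about values held across a call.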

But I still need to verify that the compiler correctly understands that only
the low 64 bits of the vector register are saved.  I'll try to make
another test case...

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[committed] fix misleading indentation breaking bootstrap

2023-08-20 Thread Martin Uecker via Gcc-patches


Committed as obvious.


fix misleading indentation breaking bootstrap

Fix indentation issue introduced by 966f3c13
"Fix format attribute for printf".

gcc/c-family/ChangeLog:

* c-format.cc: Fix indentation.

diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index 122ff9bd1cd..b3ef2d44ce9 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -1214,8 +1214,8 @@ check_function_format (const_tree fn, tree attrs, int nargs,
skipped_default_format = true;
break;
  }
-   if (skipped_default_format)
- continue;
+  if (skipped_default_format)
+continue;
}
 
  if (warn_format)




[PATCHv2/COMMITTED] MATCH: Sink convert for vec_cond

2023-08-20 Thread Andrew Pinski via Gcc-patches
Convert can be sunk into a vec_cond if both sides
fold. Unlike other unary operations, we need to check that the type of
this vec_cond's first operand is the same as the new truth type, so that we
can still handle the resulting vec_cond.
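
As a rough scalar model of the transform (a sketch only, not the match.pd implementation), the two functions below treat each array element as one vector lane. The function names are hypothetical. The model deliberately ignores the truth-type check, which only matters for real vector types.

```c
#include <assert.h>
#include <stddef.h>

/* Before: select in the wide type, then convert each lane.
   Models convert (vec_cond (mask, a, b)).  */
void select_then_convert(unsigned short *r, const int *mask,
                         const int *a, const int *b, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        int wide = mask[i] ? a[i] : b[i];
        r[i] = (unsigned short) wide;
    }
}

/* After: convert both arms, then select in the narrow type.
   Models vec_cond (mask, convert (a), convert (b)).  */
void convert_then_select(unsigned short *r, const int *mask,
                         const int *a, const int *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        r[i] = mask[i] ? (unsigned short) a[i] : (unsigned short) b[i];
}

/* Both orders give the same lanes; the arms mirror the -1 / 0
   constants used in the new test.  */
void check_equivalence(void)
{
    const int mask[4] = { 1, 0, -1, 0 };
    const int a[4] = { -1, -1, -1, -1 };
    const int b[4] = { 0, 0, 0, 0 };
    unsigned short r1[4], r2[4];

    select_then_convert(r1, mask, a, b, 4);
    convert_then_select(r2, mask, a, b, 4);
    for (size_t i = 0; i < 4; ++i)
        assert(r1[i] == r2[i]);
    assert(r1[0] == (unsigned short) -1);
    assert(r1[1] == 0);
}
```

When both arms are constants, as here, the converted arms fold at compile time, which is the "both sides fold" condition above.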

I tried a few different versions of this patch:
- view_convert to the new truth_type, but that does not work as we always
  support all vec_cond afterwards.
- using expand_vec_cond_expr_p, but that would allow too much.

I also tried to see if view_convert can be handled here but we end up with:
  _3 = VEC_COND_EXPR <_2, {  Nan(-1),  Nan(-1),  Nan(-1),  Nan(-1) }, { 0.0, 0.0, 0.0, 0.0 }>;
which isel does not know how to handle as just being a view_convert from
`vector(4) ` to `vector(4) float`, and which causes a regression with
`g++.target/i386/pr88152.C`.

Note, in the case of the SVE testcase, we will sink negate after the convert
and be able to remove a few extra instructions in the end.
Also with this change gcc.target/aarch64/sve/cond_unary_5.c will now pass.

Committed as approved after bootstrapping and testing on x86_64-linux-gnu and 
aarch64-linux-gnu.

gcc/ChangeLog:

PR tree-optimization/111006
PR tree-optimization/110986
* match.pd: (op(vec_cond(a,b,c))): Handle convert for op.

gcc/testsuite/ChangeLog:

PR tree-optimization/111006
* gcc.target/aarch64/sve/cond_convert_7.c: New test.
---
 gcc/match.pd  |  8 +++
 .../gcc.target/aarch64/sve/cond_convert_7.c   | 23 +++
 2 files changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_convert_7.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 6b2d3a11776..851f1af6eac 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4710,6 +4710,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (op (vec_cond:s @0 @1 @2))
   (vec_cond @0 (op! @1) (op! @2))))
 
+/* Sink unary conversions to branches, but only if we do fold both
+   and the target's truth type is the same as we already have.  */
+(simplify
+ (convert (vec_cond:s @0 @1 @2))
+ (if (VECTOR_TYPE_P (type)
+  && types_match (TREE_TYPE (@0), truth_type_for (type)))
+  (vec_cond @0 (convert! @1) (convert! @2))))
+
 /* Sink binary operation to branches, but only if we can fold it.  */
 (for op (tcc_comparison plus minus mult bit_and bit_ior bit_xor
 lshift rshift rdiv trunc_div ceil_div floor_div round_div
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_7.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_7.c
new file mode 100644
index 00000000000..4bb95b92195
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_7.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256 -fdump-tree-optimized" } */
+
+/* This is a modified reduced version of cond_unary_5.c */
+
+void __attribute__ ((noipa))
+f0 (unsigned short *__restrict r,
+   int *__restrict a,
+   int *__restrict pred)
+{
+  for (int i = 0; i < 1024; ++i)
+  {
+int p = pred[i]?-1:0;
+r[i] = p ;
+  }
+}
+
+/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, p[0-7]+/z, #-1} 1 } } */
+/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.[hs], p[0-7]+/z, #1} } } */
+
+/* { dg-final { scan-tree-dump-not "VIEW_CONVERT_EXPR " "optimized" } } */
+/* { dg-final { scan-tree-dump-not " = -" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " = \\\(vector" "optimized" } } */
-- 
2.31.1