Re: [PATCH] tree-cfg: Fix misleading error message in verify_gimple_assign_single

2023-12-10 Thread Richard Biener
On Sun, Dec 10, 2023 at 4:00 PM xndcn  wrote:
>
> Hi, I am a newbie in GCC, and I do not have access to the git repo.
>
> I found some misleading error messages in the verify_gimple_assign_single 
> function of tree-cfg.cc. It prompts the error "invalid RHS for gimple memory 
> store: ", but it actually checks the lhs.

It might be a bit confusing, but it's correct.  There is a store
because !is_gimple_reg (lhs),
and the only case where !is_gimple_reg (rhs1) is correct is when this is an
aggregate copy (!is_gimple_reg_type (TREE_TYPE (lhs))).  Otherwise the _RHS_
needs to be a register.
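
A minimal C sketch of the two cases:

  struct S { int a[4]; };
  struct S x, y;
  int g;

  void f (int v)
  {
    x = y;      /* aggregate copy: !is_gimple_reg_type (TREE_TYPE (lhs)),
                   so a non-register RHS is valid.  */
    g = v + 1;  /* gimple memory store: the LHS is memory, so the RHS
                   (the result of v + 1) must be a register.  */
  }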

Richard.


Re: [PATCH] RISC-V: Recognize stepped series in expand_vec_perm_const.

2023-12-10 Thread Robin Dapp
On 12/11/23 03:09, juzhe.zh...@rivai.ai wrote:
> +   if (shuffle_series (d))
> + return true;
> 
> 
> Could you rename it to shuffle_series_patterns?
> 
> Just to make naming consistent.

Done, going to push with that change in a while.

Regards
 Robin



Re: [PATCH] RISC-V: Support highest overlap for wv instructions

2023-12-10 Thread Robin Dapp
LGTM, thanks.

Regards
 Robin



Re: [PATCH] expr: catch more `a*bool` while expanding [PR 112935]

2023-12-10 Thread Richard Biener
On Sun, Dec 10, 2023 at 10:30 AM Xi Ruoyao  wrote:
>
> On Sun, 2023-12-10 at 01:21 -0800, Andrew Pinski wrote:
> > diff --git a/gcc/expr.cc b/gcc/expr.cc
> > index 6da51f2aca2..4686cacd22f 100644
> > --- a/gcc/expr.cc
> > +++ b/gcc/expr.cc
> > @@ -10209,8 +10209,9 @@ expand_expr_real_2 (sepops ops, rtx target,
> > machine_mode tmode,
> >/* Expand X*Y as X&-Y when Y must be zero or one.  */
> >if (SCALAR_INT_MODE_P (mode))
> >   {
> > -   bool bit0_p = tree_nonzero_bits (treeop0) == 1;
> > -   bool bit1_p = tree_nonzero_bits (treeop1) == 1;
> > +   bool gimple_zero_one_valued_p (tree, tree (*)(tree));
>
> Should we declare this in the file scope instead?

An improvement might be to have genmatch create a special header
file with all the match declarations.

Though initially I hoped we could eventually have "inlined" match predicates
to be picked up by genmatch from .cc files, like hidden within comments:

// @match
// (match (...)
//  ...)
// @endmatch

(doesn't solve the declaration part though).

Richard.

> > +   bool bit0_p = gimple_zero_one_valued_p (treeop0, nullptr);
> > +   bool bit1_p = gimple_zero_one_valued_p (treeop1, nullptr);
>
> --
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University


Re: [PATCH] expr: catch more `a*bool` while expanding [PR 112935]

2023-12-10 Thread Richard Biener
On Sun, Dec 10, 2023 at 10:21 AM Andrew Pinski  wrote:
>
> After r14-1655-g52c92fb3f40050 (and the other commits
> which touch zero_one_valued_p), we end up with `bool * a`
> where the bool is an SSA name that might not have its
> non-zero bits set (to 0x1), even though its non-zero bits
> would in fact be 0x1.
> In the case of CoreMark, it is only phiopt4 which adds the new
> SSA name and nothing afterwards updates the nonzero bits on it.
> This fixes the regression by using gimple_zero_one_valued_p
> rather than tree_nonzero_bits to match the cases where the
> SSA_NAME didn't have the non-zero bits set.
> gimple_zero_one_valued_p handles one level of cast and also
> an `&`.
>
> Bootstrapped and tested on x86_64-linux-gnu.
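
A minimal sketch of the kind of code affected (hypothetical reduction,
not the PR's testcase):

  int f (int a, int x, int y)
  {
    int b = x < y;   /* b is always 0 or 1 */
    return a * b;    /* expandable as a & -b rather than a multiply */
  }

Here phiopt may create a fresh 0/1-valued SSA name for the comparison
without updating its nonzero-bits info, so tree_nonzero_bits () cannot
prove the value is 0 or 1, while gimple_zero_one_valued_p () can, by
looking at the defining statement.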

OK.

I do wonder whether we should do CCP-like ops late
to update both alignment and nonzero bits before RTL expansion.
The fold-builtins pass looks spot-on since we remove
assume_aligned & friends there.  I wouldn't exactly fire off CCP
but instead do an RPO walk and just update alignment & nonzero
bits via the bit_value_* helpers and without iteration (using
the SSA name store as lattice).

I'd also say RTL expansion should be re-done creating a
_copy_ of the CFG so we can keep the GIMPLE IL valid
and eventually can use ranger during RTL expansion.

Thanks,
Richard.

> gcc/ChangeLog:
>
> PR middle-end/112935
> * expr.cc (expand_expr_real_2): Use
> gimple_zero_one_valued_p instead of tree_nonzero_bits
> to find boolean defined expressions.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/expr.cc | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 6da51f2aca2..4686cacd22f 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -10209,8 +10209,9 @@ expand_expr_real_2 (sepops ops, rtx target, 
> machine_mode tmode,
>/* Expand X*Y as X&-Y when Y must be zero or one.  */
>if (SCALAR_INT_MODE_P (mode))
> {
> - bool bit0_p = tree_nonzero_bits (treeop0) == 1;
> - bool bit1_p = tree_nonzero_bits (treeop1) == 1;
> + bool gimple_zero_one_valued_p (tree, tree (*)(tree));
> + bool bit0_p = gimple_zero_one_valued_p (treeop0, nullptr);
> + bool bit1_p = gimple_zero_one_valued_p (treeop1, nullptr);
>
>   /* Expand X*Y as X&Y when both X and Y must be zero or one.  */
>   if (bit0_p && bit1_p)
> --
> 2.39.3
>


Re: [PATCH 15/21]middle-end: [RFC] conditionally support forcing final edge for debugging

2023-12-10 Thread Richard Biener
On Sat, 9 Dec 2023, Richard Sandiford wrote:

> Tamar Christina  writes:
> > Hi All,
> >
> > What do people think about having the ability to force only the
> > latch-connected exit as the exit as a param? I.e. what's in the patch but
> > as a param.
> >
> > I found this useful when debugging large example failures as it tells me
> > where I should be looking.  No hard requirement but just figured I'd ask
> > if we should.
> 
> If it's useful for that, then perhaps it would be worth making it a
> DEBUG_COUNTER instead of a --param, for easy bisection.

Or even better, make a debug counter that would skip the IV edge and
choose the "next".

Richard.

> Thanks,
> Richard
> 
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * tree-vect-loop.cc (vec_init_loop_exit_info): Allow forcing of exit.
> >
> > --- inline copy of patch -- 
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 
> > 27ab6abfa854f14f8a4cf3d9fcb1ac1c203a4198..d6b35372623e94e02965510ab557cb568c302ebe
> >  100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -964,6 +964,7 @@ vec_init_loop_exit_info (class loop *loop)
> >if (exits.length () == 1)
> >  return exits[0];
> >  
> > +#if 0
> >/* If we have multiple exits we only support counting IV at the moment.  
> > Analyze
> >   all exits and return one */
> >class tree_niter_desc niter_desc;
> > @@ -982,6 +983,16 @@ vec_init_loop_exit_info (class loop *loop)
> >  }
> >  
> >return candidate;
> > +#else
> > +  basic_block bb = ip_normal_pos (loop);
> > +  if (!bb)
> > +return NULL;
> > +
> > +  edge exit = EDGE_SUCC (bb, 0);
> > +  if (exit->dest == loop->latch)
> > +return EDGE_SUCC (bb, 1);
> > +  return exit;
> > +#endif
> >  }
> >  
> >  /* Function bb_in_loop_p
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH v2] -finline-stringops: check base blksize for memset [PR112778]

2023-12-10 Thread Richard Biener
On Sat, Dec 9, 2023 at 8:05 AM Alexandre Oliva  wrote:
>
> Scratch the previous one, the "slightly different version" I had before
> it was not entirely broken due to unnecessary, suboptimal and incorrect
> use of ctz.  Here I have yet another implementation of that loop that
> should perform better and even work correctly ;-)
>
>
> This one has so far regstrapped on x86_64-linux-gnu (v1 failed in
> regression testing, sorry), and bootstrapped with -finline-stringops on
> ppc64le-linux-gnu (still ongoing on x86-64-linux-gnu and
> aarch64-linux-gnu).  Ok to install?

OK

>
> The recently-added logic for -finline-stringops=memset introduced an
> assumption that doesn't necessarily hold, namely, that
> can_store_by_pieces of a larger size implies can_store_by_pieces by
> smaller sizes.  This patch checks all sizes the by-multiple-pieces machinery
> might use before committing to an expansion pattern.
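
A standalone illustration (plain C, not GCC code) of the sizes the new
check walks:

  #include <stdio.h>

  int main (void)
  {
    unsigned bits = 104;  /* 0b1101000: pieces of 8, 32 and 64 bytes */

    /* Mirrors the loop in can_store_by_multiple_pieces: every power
       of two present in BITS is a piece the expansion may emit, so
       each must be storable by pieces on its own.  The single-bit
       case is excluded, being covered by the BITS + LEN test.  */
    for (unsigned bit = 1; bit < bits && bit; bit <<= 1)
      if (bits & bit)
        printf ("piece of %u bytes\n", bit);
    return 0;
  }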
>
>
> for  gcc/ChangeLog
>
> PR target/112778
> * builtins.cc (can_store_by_multiple_pieces): New.
> (try_store_by_multiple_pieces): Call it.
>
> for  gcc/testsuite/ChangeLog
>
> PR target/112778
> * gcc.dg/inline-mem-cmp-pr112778.c: New.
> ---
>  gcc/builtins.cc|   57 
> 
>  gcc/testsuite/gcc.dg/inline-mem-cmp-pr112778.c |   10 
>  2 files changed, 58 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/inline-mem-cmp-pr112778.c
>
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 12a535d313f12..f6c96498f0783 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -4284,6 +4284,40 @@ expand_builtin_memset (tree exp, rtx target, 
> machine_mode mode)
>return expand_builtin_memset_args (dest, val, len, target, mode, exp);
>  }
>
> +/* Check that store_by_pieces allows BITS + LEN (so that we don't
> +   expand something too unreasonably long), and every power of 2 in
> +   BITS.  It is assumed that LEN has already been tested by
> +   itself.  */
> +static bool
> +can_store_by_multiple_pieces (unsigned HOST_WIDE_INT bits,
> + by_pieces_constfn constfun,
> + void *constfundata, unsigned int align,
> + bool memsetp,
> + unsigned HOST_WIDE_INT len)
> +{
> +  if (bits
> +  && !can_store_by_pieces (bits + len, constfun, constfundata,
> +  align, memsetp))
> +return false;
> +
> +  /* BITS set are expected to be generally in the low range and
> + contiguous.  We do NOT want to repeat the test above in case BITS
> + has a single bit set, so we terminate the loop when BITS == BIT.
> + In the unlikely case that BITS has the MSB set, also terminate in
> + case BIT gets shifted out.  */
> +  for (unsigned HOST_WIDE_INT bit = 1; bit < bits && bit; bit <<= 1)
> +{
> +  if ((bits & bit) == 0)
> +   continue;
> +
> +  if (!can_store_by_pieces (bit, constfun, constfundata,
> +   align, memsetp))
> +   return false;
> +}
> +
> +  return true;
> +}
> +
>  /* Try to store VAL (or, if NULL_RTX, VALC) in LEN bytes starting at TO.
> Return TRUE if successful, FALSE otherwise.  TO is assumed to be
> aligned at an ALIGN-bits boundary.  LEN must be a multiple of
> @@ -4341,7 +4375,11 @@ try_store_by_multiple_pieces (rtx to, rtx len, 
> unsigned int ctz_len,
>else
>  /* Huh, max_len < min_len?  Punt.  See pr100843.c.  */
>  return false;
> -  if (min_len >= blksize)
> +  if (min_len >= blksize
> +  /* ??? Maybe try smaller fixed-prefix blksizes before
> +punting?  */
> +  && can_store_by_pieces (blksize, builtin_memset_read_str,
> + &valc, align, true))
>  {
>min_len -= blksize;
>min_bits = floor_log2 (min_len);
> @@ -4367,8 +4405,9 @@ try_store_by_multiple_pieces (rtx to, rtx len, unsigned 
> int ctz_len,
>   happen because of the way max_bits and blksize are related, but
>   it doesn't hurt to test.  */
>if (blksize > xlenest
> -  || !can_store_by_pieces (xlenest, builtin_memset_read_str,
> -  &valc, align, true))
> +  || !can_store_by_multiple_pieces (xlenest - blksize,
> +   builtin_memset_read_str,
> +   &valc, align, true, blksize))
>  {
>if (!(flag_inline_stringops & ILSOP_MEMSET))
> return false;
> @@ -4386,17 +4425,17 @@ try_store_by_multiple_pieces (rtx to, rtx len, 
> unsigned int ctz_len,
>  of overflow.  */
>   if (max_bits < orig_max_bits
>   && xlenest + blksize >= xlenest
> - && can_store_by_pieces (xlenest + blksize,
> - builtin_memset_read_str,
> - &valc, align, true))
> + && can_store_by_multiple_pieces (xlenest,
> +

Re: [PATCH] -finline-stringops: avoid too-wide smallest_int_mode_for_size [PR112784]

2023-12-10 Thread Richard Biener
On Sat, Dec 9, 2023 at 3:25 AM Alexandre Oliva  wrote:
>
>
> smallest_int_mode_for_size may abort when the requested mode is not
> available.  Call int_mode_for_size instead, that signals the
> unsatisfiable request in a more graceful way.
>
> Regstrapped on x86_64-linux-gnu.  Ok to install?
>
>
> for  gcc/ChangeLog
>
> PR middle-end/112784
> * expr.cc (emit_block_move_via_loop): Call int_mode_for_size
> for maybe-too-wide sizes.
> (emit_block_cmp_via_loop): Likewise.
>
> for  gcc/testsuite/ChangeLog
>
> PR middle-end/112784
> * gcc.target/i386/avx512cd-inline-stringops-pr112784.c: New.
> ---
>  gcc/expr.cc|   22 
> 
>  .../i386/avx512cd-inline-stringops-pr112784.c  |   12 +++
>  2 files changed, 25 insertions(+), 9 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/i386/avx512cd-inline-stringops-pr112784.c
>
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 6da51f2aca296..178b3ec6d5adb 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -2449,15 +2449,17 @@ emit_block_move_via_loop (rtx x, rtx y, rtx size,
>  }
>emit_move_insn (iter, iter_init);
>
> -  scalar_int_mode int_move_mode
> -= smallest_int_mode_for_size (incr * BITS_PER_UNIT);
> -  if (GET_MODE_BITSIZE (int_move_mode) != incr * BITS_PER_UNIT)
> +  opt_scalar_int_mode int_move_mode
> += int_mode_for_size (incr * BITS_PER_UNIT, 1);
> +  if (!int_move_mode.exists ()

you can use .exists (&move_mode) here to ...

> +  || (GET_MODE_BITSIZE (as_a <scalar_int_mode> (int_move_mode))
> + != incr * BITS_PER_UNIT))
>  {
>move_mode = BLKmode;
>gcc_checking_assert (can_move_by_pieces (incr, align));
>  }
>else
> -move_mode = int_move_mode;
> +move_mode = as_a <scalar_int_mode> (int_move_mode);

... elide this else IIRC.
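
A sketch of the suggested shape (hedged: using a scalar_int_mode
temporary, which also makes the as_a <scalar_int_mode> casts
disappear; storing straight into move_mode would elide the else too
where its type allows):

  scalar_int_mode imode;
  opt_scalar_int_mode int_move_mode
    = int_mode_for_size (incr * BITS_PER_UNIT, 1);
  if (!int_move_mode.exists (&imode)
      || GET_MODE_BITSIZE (imode) != incr * BITS_PER_UNIT)
    {
      move_mode = BLKmode;
      gcc_checking_assert (can_move_by_pieces (incr, align));
    }
  else
    move_mode = imode;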

>
>x_addr = force_operand (XEXP (x, 0), NULL_RTX);
>y_addr = force_operand (XEXP (y, 0), NULL_RTX);
> @@ -2701,16 +2703,18 @@ emit_block_cmp_via_loop (rtx x, rtx y, rtx len, tree 
> len_type, rtx target,
>iter = gen_reg_rtx (iter_mode);
>emit_move_insn (iter, iter_init);
>
> -  scalar_int_mode int_cmp_mode
> -= smallest_int_mode_for_size (incr * BITS_PER_UNIT);
> -  if (GET_MODE_BITSIZE (int_cmp_mode) != incr * BITS_PER_UNIT
> -  || !can_compare_p (NE, int_cmp_mode, ccp_jump))
> +  opt_scalar_int_mode int_cmp_mode
> += int_mode_for_size (incr * BITS_PER_UNIT, 1);
> +  if (!int_cmp_mode.exists ()
> +  || (GET_MODE_BITSIZE (as_a <scalar_int_mode> (int_cmp_mode))
> + != incr * BITS_PER_UNIT)
> +  || !can_compare_p (NE, as_a <scalar_int_mode> (int_cmp_mode), ccp_jump))
>  {
>cmp_mode = BLKmode;
>gcc_checking_assert (incr != 1);
>  }
>else
> -cmp_mode = int_cmp_mode;
> +cmp_mode = as_a <scalar_int_mode> (int_cmp_mode);

Likewise.

I'll note that int_mode_for_size and smallest_int_mode_for_size
are not semantically equivalent in what they can return.  In
particular it seems you are incrementing by iter_incr even when
formerly smallest_int_mode_for_size would have returned a
larger than necessary mode, resulting in at least inefficient
code, copying/comparing pieces multiple times.

So int_mode_for_size looks more correct.

OK with the above change.

Richard.

>/* Save the base addresses.  */
>x_addr = force_operand (XEXP (x, 0), NULL_RTX);
> diff --git 
> a/gcc/testsuite/gcc.target/i386/avx512cd-inline-stringops-pr112784.c 
> b/gcc/testsuite/gcc.target/i386/avx512cd-inline-stringops-pr112784.c
> new file mode 100644
> index 0..c81f99c693c24
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512cd-inline-stringops-pr112784.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512cd -finline-stringops" } */
> +
> +struct S {
> +  int e;
> +} __attribute__((aligned(128)));
> +
> +int main() {
> +  struct S s1;
> +  struct S s2;
> +  int v = __builtin_memcmp(&s1, &s2, sizeof(s1));
> +}
>
> --
> Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive


Re: [PATCH 0/3] Espressif xtensa chips multilib

2023-12-10 Thread Alexey Lapshin
Hi Max!

Could you please consider merging patch 1 and patch 3?
Along with https://sourceware.org/pipermail/binutils/2023-July/128478.html

Regards,
Alexey


[Committed] RISC-V: Fix ICE in extract_single_source

2023-12-10 Thread Juzhe-Zhong
This patch fixes the following ICE in VSETVL PASS:
bug.c:39:1: internal compiler error: Segmentation fault
   39 | }
  | ^
0x1ad5a08 crash_signal
../../../../gcc/gcc/toplev.cc:316
0x7f7f55feb90f ???
./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x218d7c7 extract_single_source
../../../../gcc/gcc/config/riscv/riscv-vsetvl.cc:583
0x218d95d extract_single_source
../../../../gcc/gcc/config/riscv/riscv-vsetvl.cc:604
0x218fbc5 pre_vsetvl::compute_lcm_local_properties()
../../../../gcc/gcc/config/riscv/riscv-vsetvl.cc:2703
0x2190ef4 pre_vsetvl::earliest_fuse_vsetvl_info()
../../../../gcc/gcc/config/riscv/riscv-vsetvl.cc:2890
0x2193e62 pass_vsetvl::lazy_vsetvl()
../../../../gcc/gcc/config/riscv/riscv-vsetvl.cc:3537
0x219406a pass_vsetvl::execute(function*)
../../../../gcc/gcc/config/riscv/riscv-vsetvl.cc:3584

The root cause is that we have a case where the def info cannot be traced:

(insn 208 327 333 27 (use (reg/i:DI 10 a0)) "bug.c":36:1 -1
 (nil))

It's obvious: we conservatively disable any optimization in this situation if
the AVL def_info cannot be traced.

Committed as it is obvious.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (extract_single_source): Fix ICE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/avl_use_bug-1.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc  |  2 +
 .../riscv/rvv/vsetvl/avl_use_bug-1.c  | 39 +++
 2 files changed, 41 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_use_bug-1.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 90477f331d7..ed5a2b58ab0 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -579,6 +579,8 @@ extract_single_source (set_info *set)
   if (!set->insn ()->is_phi ())
 return nullptr;
   hash_set<set_info *> sets = get_all_sets (set, true, false, true);
+  if (sets.is_empty ())
+return nullptr;
 
   insn_info *first_insn = (*sets.begin ())->insn ();
   if (first_insn->is_artificial ())
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_use_bug-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_use_bug-1.c
new file mode 100644
index 000..330221c2d7b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_use_bug-1.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvl256b -mabi=lp64 
--param=riscv-autovec-lmul=m8 --param=riscv-autovec-preference=fixed-vlmax -O2" 
} */
+
+struct a_struct
+{
+  unsigned char a_character;
+};
+
+struct a_struct an_array[5];
+struct a_struct *a_ptr;
+int yabba = 1;
+
+int
+f (a, b)
+ unsigned char a;
+ unsigned long b;
+{
+  long i, j, p, q, r, s;
+
+  if (b != (unsigned long) 0)
+{
+  if (yabba)
+   return -1;
+  s = 400 / b;
+  for (i = 0; i < 11; i++)
+   {
+ for (j = 0; j < 256; j++)
+   {
+ if (((p - s < 0) ? -s : 0) < (( q - s < 0) ? -s : q))
+   r = i;
+   }
+   }
+}
+
+  if (yabba)
+return 0;
+  a_ptr = &an_array[a];
+  a_ptr->a_character = (unsigned char) r;
+}
-- 
2.36.3



Re: [PATCH 1/2] c-family: -Waddress-of-packed-member and casts

2023-12-10 Thread Richard Biener
On Wed, Nov 22, 2023 at 11:45 PM Jason Merrill  wrote:
>
> Tested x86_64-pc-linux-gnu, OK for trunk?

OK

> -- 8< --
>
> -Waddress-of-packed-member, in addition to the documented warning about
> taking the address of a packed member, also warns about casting from
> a pointer to a TYPE_PACKED type to a pointer to a type with greater
> alignment.
>
> This wrongly warns if the source is a pointer to enum when -fshort-enums
> is on, since that is also represented by TYPE_PACKED.
>
> And there's already -Wcast-align to catch casting from pointer to less
> aligned type (packed or otherwise) to pointer to more aligned type; even
> apart from the enum problem, this seems like a somewhat arbitrary subset of
> that warning.  Though that isn't currently on by default.
>
> So, this patch removes the undocumented type-based warning from
> -Waddress-of-packed-member.  Some of the tests where the warning is
> desirable I changed to use -Wcast-align=strict instead.  The ones that
> require -Wno-incompatible-pointer-types, I just removed.
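
A sketch of the -fshort-enums false positive described above:

  /* With -fshort-enums the enum fits in one byte and its type is
     TYPE_PACKED, so this cast used to trigger
     -Waddress-of-packed-member although no packed struct member is
     involved; -Wcast-align=strict is the warning that legitimately
     covers such casts.  */
  enum e { A, B };

  int *
  f (enum e *p)
  {
    return (int *) p;
  }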
>
> gcc/c-family/ChangeLog:
>
> * c-warn.cc (check_address_or_pointer_of_packed_member):
> Remove warning based on TYPE_PACKED.
>
> gcc/testsuite/ChangeLog:
>
> * c-c++-common/Waddress-of-packed-member-1.c: Don't expect
> a warning on the cast cases.
> * c-c++-common/pr51628-35.c: Use -Wcast-align=strict.
> * g++.dg/warn/Waddress-of-packed-member3.C: Likewise.
> * gcc.dg/pr88928.c: Likewise.
> * gcc.dg/pr51628-20.c: Removed.
> * gcc.dg/pr51628-21.c: Removed.
> * gcc.dg/pr51628-25.c: Removed.
> ---
>  gcc/c-family/c-warn.cc| 58 +--
>  .../Waddress-of-packed-member-1.c | 12 ++--
>  gcc/testsuite/c-c++-common/pr51628-35.c   |  6 +-
>  .../g++.dg/warn/Waddress-of-packed-member3.C  |  8 +--
>  gcc/testsuite/gcc.dg/pr51628-20.c | 11 
>  gcc/testsuite/gcc.dg/pr51628-21.c | 11 
>  gcc/testsuite/gcc.dg/pr51628-25.c |  9 ---
>  gcc/testsuite/gcc.dg/pr88928.c|  6 +-
>  8 files changed, 19 insertions(+), 102 deletions(-)
>  delete mode 100644 gcc/testsuite/gcc.dg/pr51628-20.c
>  delete mode 100644 gcc/testsuite/gcc.dg/pr51628-21.c
>  delete mode 100644 gcc/testsuite/gcc.dg/pr51628-25.c
>
> diff --git a/gcc/c-family/c-warn.cc b/gcc/c-family/c-warn.cc
> index d2938b91043..2a399ba6d14 100644
> --- a/gcc/c-family/c-warn.cc
> +++ b/gcc/c-family/c-warn.cc
> @@ -2991,10 +2991,9 @@ check_alignment_of_packed_member (tree type, tree 
> field, bool rvalue)
>return NULL_TREE;
>  }
>
> -/* Return struct or union type if the right hand value, RHS:
> -   1. Is a pointer value which isn't aligned to a pointer type TYPE.
> -   2. Is an address which takes the unaligned address of packed member
> -  of struct or union when assigning to TYPE.
> +/* Return struct or union type if the right hand value, RHS
> +   is an address which takes the unaligned address of packed member
> +   of struct or union when assigning to TYPE.
> Otherwise, return NULL_TREE.  */
>
>  static tree
> @@ -3021,57 +3020,6 @@ check_address_or_pointer_of_packed_member (tree type, 
> tree rhs)
>
>type = TREE_TYPE (type);
>
> -  if (TREE_CODE (rhs) == PARM_DECL
> -  || VAR_P (rhs)
> -  || TREE_CODE (rhs) == CALL_EXPR)
> -{
> -  tree rhstype = TREE_TYPE (rhs);
> -  if (TREE_CODE (rhs) == CALL_EXPR)
> -   {
> - rhs = CALL_EXPR_FN (rhs); /* Pointer expression.  */
> - if (rhs == NULL_TREE)
> -   return NULL_TREE;
> - rhs = TREE_TYPE (rhs);/* Pointer type.  */
> - /* We could be called while processing a template and RHS could be
> -a functor.  In that case it's a class, not a pointer.  */
> - if (!rhs || !POINTER_TYPE_P (rhs))
> -   return NULL_TREE;
> - rhs = TREE_TYPE (rhs);/* Function type.  */
> - rhstype = TREE_TYPE (rhs);
> - if (!rhstype || !POINTER_TYPE_P (rhstype))
> -   return NULL_TREE;
> - rvalue = true;
> -   }
> -  if (rvalue && POINTER_TYPE_P (rhstype))
> -   rhstype = TREE_TYPE (rhstype);
> -  while (TREE_CODE (rhstype) == ARRAY_TYPE)
> -   rhstype = TREE_TYPE (rhstype);
> -  if (TYPE_PACKED (rhstype))
> -   {
> - unsigned int type_align = min_align_of_type (type);
> - unsigned int rhs_align = min_align_of_type (rhstype);
> - if (rhs_align < type_align)
> -   {
> - auto_diagnostic_group d;
> - location_t location = EXPR_LOC_OR_LOC (rhs, input_location);
> - if (warning_at (location, OPT_Waddress_of_packed_member,
> - "converting a packed %qT pointer (alignment %d) 
> "
> - "to a %qT pointer (alignment %d) may result in "
> - "an unaligned pointer value",
> - rhstype, rhs_align

Re: [PATCH] strub: add note on attribute access

2023-12-10 Thread Richard Biener
On Sat, Dec 9, 2023 at 3:09 AM Alexandre Oliva  wrote:
>
> On Dec  7, 2023, Alexandre Oliva  wrote:
>
> > Thanks for raising the issue.  Maybe there should be at least a comment
> > there, and perhaps some asserts to check that pointer and reference
> > types don't make to indirect_parms.
>
> Document why attribute access doesn't need the same treatment as fn
> spec, and check that the assumption behind it holds.
>
> Regstrapped on x86_64-linux-gnu.  Ok to install?

OK

>
> for  gcc/ChangeLog
>
> * ipa-strub.cc (pass_ipa_strub::execute): Check that we don't
> add indirection to pointer parameters, and document attribute
> access non-interactions.
> ---
>  gcc/ipa-strub.cc |   11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/ipa-strub.cc b/gcc/ipa-strub.cc
> index 2afb7a455751d..8ec6824e8a802 100644
> --- a/gcc/ipa-strub.cc
> +++ b/gcc/ipa-strub.cc
> @@ -2889,6 +2889,13 @@ pass_ipa_strub::execute (function *)
> && (tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (nparm)))
> <= 4 * UNITS_PER_WORD
> {
> + /* No point in indirecting pointer types.  Presumably they
> +won't ever pass the size-based test above, but check the
> +assumption here, because getting this wrong would mess
> +with attribute access and possibly others.  We deal with
> +fn spec below.  */
> + gcc_checking_assert (!POINTER_TYPE_P (TREE_TYPE (nparm)));
> +
>   indirect_nparms.add (nparm);
>
>   /* ??? Is there any case in which it is not safe to suggest the 
> parms
> @@ -2976,7 +2983,9 @@ pass_ipa_strub::execute (function *)
> }
> }
>
> -   /* ??? Maybe we could adjust it instead.  */
> +   /* ??? Maybe we could adjust it instead.  Note we don't need
> +  to mess with attribute access: pointer-typed parameters are
> +  not modified, so they can remain unchanged.  */
> if (drop_fnspec)
>   remove_named_attribute_unsharing ("fn spec",
> &TYPE_ATTRIBUTES (nftype));
>
>
> --
> Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive


RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code

2023-12-10 Thread Tamar Christina
> > >
> > > Hmm, but we're visiting them then?  I wonder how you get along
> > > without doing adjustmens on the uses if you consider
> > >
> > > _1 = a < b;
> > > _2 = c != d;
> > > _3 = _1 | _2;
> > > if (_3 != 0)
> > >   exit loop;
> > >
> > > thus a combined condition like
> > >
> > > if (a < b || c != d)
> > >
> > > that we if-converted.  We need to recognize that _1, _2 and _3 have
> > > mask uses and thus possibly adjust them.
> > >
> > > What bad happens if you drop 'analyze_only'?  We're not really
> > > rewriting anything there.
> >
> > You mean drop it only in the above? We then fail to update the type for
> > the gcond.  So in certain circumstances like with
> >
> > int a, c, d;
> > short b;
> >
> > int
> > main ()
> > {
> >   int e[1];
> >   for (; b < 2; b++)
> > {
> >   a = 0;
> >   if (b == 28378)
> > a = e[b];
> >   if (!(d || b))
> > for (; c;)
> >   ;
> > }
> >   return 0;
> > }
> >
> > Unless we walk the statements regardless of whether they come from inside
> > the loop or not.
> 
> What do you mean by "fail to update the type for the gcond"?  If
> I understood correctly the 'analyze_only' short-cuts some
> checks, it doesn't add some?
> 
> But it's hard to follow what's actually done for a gcond ...
> 

Yes, I had realized I had misunderstood what this pattern was doing, and once
I had made the first wrong change it snowballed.

This is an updated patch where the only modification made is to
check_bool_pattern, to also return the type of the overall expression even if
we are going to handle the conditional through an optab expansion.  I'm
piggybacking on the fact that this function has seen enough of the operands
to be able to tell the precision needed when vectorizing.

This is needed because in the cases where the condition to the gcond was
already a bool, the precision would be 1 bit; to find the actual mask we have
to dig through the operands, which this function already does.
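
A sketch of the situation described (hypothetical reduction):

  /* Early-exit loop: the condition reaching the gcond is already a
     bool, so its precision is 1 bit; check_bool_pattern digs through
     the defining comparison (on int operands) to find the precision
     to vectorize the mask with.  */
  int
  f (int *a, int *b, int n)
  {
    for (int i = 0; i < n; i++)
      {
        _Bool c = a[i] > b[i];
        if (c)
          return i;   /* early exit */
      }
    return -1;
  }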

Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu and no 
issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
(check_bool_pattern, vect_recog_bool_pattern): Support gconds type
analysis.
* tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
lhs.
(vectorizable_early_exit): New.
(vect_analyze_stmt, vect_transform_stmt): Use it.
(vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.

--- inline copy of patch ---

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 
7debe7f0731673cd1bf25cd39d55e23990a73d0e..6bf1c0aba8ce94f70ce4e952efd1c5695b189690
 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple 
*pattern_stmt,
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
 {
   gcc_assert (!vectype
+ || is_a <gcond *> (pattern_stmt)
  || (VECTOR_BOOLEAN_TYPE_P (vectype)
  == vect_use_mask_type_p (orig_stmt_info)));
   STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
@@ -5210,10 +5211,12 @@ vect_recog_mixed_size_cond_pattern (vec_info *vinfo,
true if bool VAR can and should be optimized that way.  Assume it shouldn't
in case it's a result of a comparison which can be directly vectorized into
a vector comparison.  Fills in STMTS with all stmts visited during the
-   walk.  */
+   walk.  If VECTYPE then this value will contain the common type of the
+   operations making up the comparisons.  */
 
 static bool
-check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
+check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts,
+   tree *vectype)
 {
   tree rhs1;
   enum tree_code rhs_code;
@@ -5234,27 +5237,28 @@ check_bool_pattern (tree var, vec_info *vinfo, hash_set<gimple *> &stmts)
   switch (rhs_code)
 {
 case SSA_NAME:
-  if (! check_bool_pattern (rhs1, vinfo, stmts))
+  if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
return false;
   break;
 
 CASE_CONVERT:
   if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
return false;
-  if (! check_bool_pattern (rhs1, vinfo, stmts))
+  if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
return false;
   break;
 
 case BIT_NOT_EXPR:
-  if (! check_bool_pattern (rhs1, vinfo, stmts))
+  if (! check_bool_pattern (rhs1, vinfo, stmts, vectype))
return false;
   break;
 
 case BIT_AND_EXPR:
 case BIT_IOR_EXPR:
 case BIT_XOR_EXPR:
-  if (! check_bool_pattern (rhs1, vinfo, stmts)
- || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts))
+  if (! check_bool_pattern (rhs1, vinfo, stmts, vectype)
+ || ! check_bool_pattern (gimple_assign_rhs2 (def_stmt), vinfo, stmts,
+

[v3 PATCH] Simplify vector ((VCE (a cmp b ? -1 : 0)) < 0) ? c : d to just (VCE ((a cmp b) ? (VCE c) : (VCE d))).

2023-12-10 Thread liuhongt
> since you are looking at TYPE_PRECISION below you want
> VECTOR_INTEGER_TYPE_P here as well?  The alternative
> would be to compare TYPE_SIZE.
>
> Some of the checks feel redundant but are probably good for
> documentation purposes.
>
> OK with using VECTOR_INTEGER_TYPE_P
Actually, the data type doesn't need to be integer, i.e. x86 supports
vblendvps, so I'm using TYPE_SIZE here; the code is adjusted to

&& tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (type)))
&& (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (type)))
   <= TYPE_PRECISION (TREE_TYPE (TREE_TYPE (@6))))

Here's the updated patch.
Ok for trunk?

When I was working on PR112443, I noticed some misoptimizations:
after we fold _mm{,256}_blendv_epi8/pd/ps into gimple, the backend
fails to combine it back to v{,p}blendv{b,ps,pd} since the pattern is
too complicated, so I think maybe we should handle it at the gimple
level.

The dump is like

  _1 = c_3(D) >= { 0, 0, 0, 0 };
  _2 = VEC_COND_EXPR <_1, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
  _7 = VIEW_CONVERT_EXPR<vector(32) signed char>(_2);
  _8 = VIEW_CONVERT_EXPR<vector(32) signed char>(b_6(D));
  _9 = VIEW_CONVERT_EXPR<vector(32) signed char>(a_5(D));
  _10 = _7 < { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  _11 = VEC_COND_EXPR <_10, _8, _9>;

It can be optimized to

  _1 = c_2(D) >= { 0, 0, 0, 0 };
  _6 = VEC_COND_EXPR <_1, b_5(D), a_4(D)>;

since _7 is either -1 or 0, the selection of _7 < 0 ? _8 : _9 should
be equal to _1 ? b : a as long as TYPE_PRECISION of the component type
of the second VEC_COND_EXPR is less than or equal to that of the first one.
The patch adds a gimple pattern to handle that.

gcc/ChangeLog:

* match.pd (VCE (a cmp b ? -1 : 0) < 0) ? c : d ---> (VCE ((a
cmp b) ? (VCE:c) : (VCE:d))): New gimple simplication.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512vl-blendv-3.c: New test.
* gcc.target/i386/blendv-3.c: New test.
---
 gcc/match.pd  | 23 ++
 .../gcc.target/i386/avx512vl-blendv-3.c   |  6 +++
 gcc/testsuite/gcc.target/i386/blendv-3.c  | 46 +++
 3 files changed, 75 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/blendv-3.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 4d554ba4721..359c7b07dc3 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5190,6 +5190,29 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (optimize_vectors_before_lowering_p () && types_match (@0, @3))
   (vec_cond (bit_and @0 (bit_not @3)) @2 @1)))
 
+/*  ((VCE (a cmp b ? -1 : 0)) < 0) ? c : d is just
+(VCE ((a cmp b) ? (VCE c) : (VCE d))) when TYPE_PRECISION of the
+component type of the outer vec_cond is greater equal the inner one.  */
+(for cmp (simple_comparison)
+ (simplify
+  (vec_cond
+(lt (view_convert@5 (vec_cond@6 (cmp@4 @0 @1)
+   integer_all_onesp
+   integer_zerop))
+ integer_zerop) @2 @3)
+  (if (VECTOR_INTEGER_TYPE_P (TREE_TYPE (@0))
+   && VECTOR_INTEGER_TYPE_P (TREE_TYPE (@5))
+   && !TYPE_UNSIGNED (TREE_TYPE (@5))
+   && VECTOR_TYPE_P (TREE_TYPE (@6))
+   && VECTOR_TYPE_P (type)
+   && tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (type)))
+   && (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (type)))
+ <= TYPE_PRECISION (TREE_TYPE (TREE_TYPE (@6))))
+   && TYPE_SIZE (type) == TYPE_SIZE (TREE_TYPE (@6)))
+   (with { tree vtype = TREE_TYPE (@6);}
+ (view_convert:type
+   (vec_cond @4 (view_convert:vtype @2) (view_convert:vtype @3)))
+
 /* c1 ? c2 ? a : b : b  -->  (c1 & c2) ? a : b  */
 (simplify
  (vec_cond @0 (vec_cond:s @1 @2 @3) @3)
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c 
b/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
new file mode 100644
index 000..2777e72ab5f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-blendv-3.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times {vp?blendv(?:b|p[sd])[ \t]*} 6 } } */
+/* { dg-final { scan-assembler-not {vpcmp} } } */
+
+#include "blendv-3.c"
diff --git a/gcc/testsuite/gcc.target/i386/blendv-3.c 
b/gcc/testsuite/gcc.target/i386/blendv-3.c
new file mode 100644
index 000..fa0fb067a73
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/blendv-3.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -O2" } */
+/* { dg-final { scan-assembler-times {vp?blendv(?:b|p[sd])[ \t]*} 6 } } */
+/* { dg-final { scan-assembler-not {vpcmp} } } */
+
+#include <immintrin.h>
+
+__m256i
+foo (__m256i a, __m256i b, __m256i c)
+{
+  return _mm256_blendv_epi8 (a, b, ~c < 0);
+}
+
+__m256d
+foo1 (__m256d a, __m256d b, __m256i c)
+{
+  __m256i d = ~c < 0;
+  return _mm256_blendv_pd (a, b, (__m256d)d);
+}
+
+__m256
+foo2 (__m256 a, __m256 b, __m256i c)
+{
+  __m256i d = ~c < 0;
+  return _mm256_blendv_ps (a, b, (__m256)d);
+}
+
+__m128i
+foo4 (__m128i a, __m128i b, __m128i c)
+{
+  return _m

PING^2: [PATCH v2] Only allow (int)trunc(x) to (int)x simplification with -ffp-int-builtin-inexact [PR107723]

2023-12-10 Thread Xi Ruoyao
Ping again.

On Fri, 2023-12-01 at 13:44 +0800, Xi Ruoyao wrote:
> Ping.
> 
> On Fri, 2023-11-24 at 17:09 +0800, Xi Ruoyao wrote:
> > With -fno-fp-int-builtin-inexact, trunc is not allowed to raise
> > FE_INEXACT and it should produce an integral result (if the input is not
> > NaN or Inf).  Thus FE_INEXACT should not be raised.
> > 
> > But (int)x may raise FE_INEXACT when x is a non-integer, non-NaN, and
> > non-Inf value.  C23 recommends to do so in a footnote.
> > 
> > Thus we should not simplify (int)trunc(x) to (int)x if
> > -fno-fp-int-builtin-inexact is in-effect.
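
A worked illustration (sketch along the lines of the patch's testcase):

  #include <fenv.h>

  /* With -fno-fp-int-builtin-inexact and trapping math:
     __builtin_trunc (2.5) == 2.0 must not raise FE_INEXACT, and
     converting the integral 2.0 to long raises nothing, so f1 never
     raises FE_INEXACT.  But (long) 2.5 in f2 may raise FE_INEXACT
     (C23 recommends that in a footnote), so f2 is not a valid
     replacement for f1.  */
  long f1 (double y) { return (long) __builtin_trunc (y); }
  long f2 (double y) { return (long) y; }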
> > 
> > gcc/ChangeLog:
> > 
> > PR middle-end/107723
> > * convert.cc (convert_to_integer_1) [case BUILT_IN_TRUNC]: Break
> > early if !flag_fp_int_builtin_inexact and flag_trapping_math.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR middle-end/107723
> > * gcc.dg/torture/builtin-fp-int-inexact-trunc.c: New test.
> > ---
> > 
> > Change from v1: add flag_trapping_math into the condition.
> > 
> > Bootstrapped and regtested on x86_64-linux-gnu.  Ok for trunk?
> > 
> >  gcc/convert.cc   |  3 ++-
> >  .../gcc.dg/torture/builtin-fp-int-inexact-trunc.c    | 12 
> >  2 files changed, 14 insertions(+), 1 deletion(-)
> >  create mode 100644 
> > gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact-trunc.c
> > 
> > diff --git a/gcc/convert.cc b/gcc/convert.cc
> > index 46c8bcb31f8..f214b750188 100644
> > --- a/gcc/convert.cc
> > +++ b/gcc/convert.cc
> > @@ -591,7 +591,8 @@ convert_to_integer_1 (tree type, tree expr, bool dofold)
> >     CASE_FLT_FN (BUILT_IN_TRUNC):
> >     CASE_FLT_FN_FLOATN_NX (BUILT_IN_TRUNC):
> >       if (call_expr_nargs (s_expr) != 1
> > -     || !SCALAR_FLOAT_TYPE_P (TREE_TYPE (CALL_EXPR_ARG (s_expr, 0
> > +     || !SCALAR_FLOAT_TYPE_P (TREE_TYPE (CALL_EXPR_ARG (s_expr, 0)))
> > +     || (!flag_fp_int_builtin_inexact && flag_trapping_math))
> >         break;
> >       return convert_to_integer_1 (type, CALL_EXPR_ARG (s_expr, 0),
> >        dofold);
> > diff --git a/gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact-trunc.c 
> > b/gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact-trunc.c
> > new file mode 100644
> > index 000..09731183621
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact-trunc.c
> > @@ -0,0 +1,12 @@
> > +/* Test -fno-fp-int-builtin-inexact.  */
> > +/* { dg-do compile } */
> > +/* { dg-options "-fno-fp-int-builtin-inexact -fdump-tree-original" } */
> > +
> > +long
> > +x (double y)
> > +{
> > +  return __builtin_trunc (y);
> > +}
> > +
> > +/* Optimization should not discard the __builtin_trunc call.  */
> > +/* { dg-final { scan-tree-dump "__builtin_trunc" "original" } } */
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH] treat argp-based mem as frame related in dse

2023-12-10 Thread Jeff Law




On 12/10/23 20:07, Jiufu Guo wrote:


I'm having a bit of a hard time convincing myself this is correct
though.  I can't see how rewriting the load to read the source of the
prior store is unsafe.  If that fixes a problem, then it would seem
like we've gone wrong before here -- perhaps failing to use the fusage
loads to "kill" any available stores to the same or aliased memory
locations.

As you said in the latter case, the call's fusage would kill the previous
store. It is the kind of case like:

134: [argp:SI+0x8]=r134:SI
135: [argp:SI+0x4]=0x1
136: [argp:SI]=r132:SI
137: ax:SI=call [`memset'] argc:0xc
REG_CALL_DECL `memset'
REG_EH_REGION 0

This call insn is:
(call_insn/j 137 136 147 27 (set (reg:SI 0 ax)
  (call (mem:QI (symbol_ref:SI ("memset") [flags 0x41]  ) [0 __builtin_memset S1 A8])
  (const_int 12 [0xc]))) "pr102798.c":23:22 1086 {*sibcall_value}
   (expr_list:REG_UNUSED (reg:SI 0 ax)
  (expr_list:REG_CALL_DECL (symbol_ref:SI ("memset") [flags 0x41]  
)
  (expr_list:REG_EH_REGION (const_int 0 [0])
  (nil
  (expr_list:SI (use (mem/f:SI (reg/f:SI 16 argp) [0  S4 A32]))
  (expr_list:SI (use (mem:SI (plus:SI (reg/f:SI 16 argp) (const_int 4 
[0x4])) [0  S4 A32]))
  (expr_list:SI (use (mem:SI (plus:SI (reg/f:SI 16 argp) (const_int 
8 [0x8])) [0  S4 A32]))
  (nil)

The stores in "insns 134-136" are used by the call. "check_mem_read_rtx"
would prevent them to eliminated.

Right.  But unless I read something wrong, the patch wasn't changing
store removal, it was changing whether or not we forwarded the source
of the store into the destination of a subsequent load from the same
address.

"check_mem_read_rtx" has another behavior which checks the mem
and adds read_info to insn_info->read_rec. "read_rec" could prevent
the "store" from being eliminated during the dse's global alg. This
patch leverages this behavior.
And to avoid the "mem on fusage" to be replaced by leading store's rhs
"replace_read" was disabled if the mem is on the call's fusage.
Ah, so not only do we want to avoid the call to replace_read, but also 
avoid the early return.


By avoiding the early return, we proceed into later code which "kills" 
the tracked store, thus avoiding the problem.  Right?


jeff


Re: [PATCH 2/5] [ifcvt] optimize x=c ? (y shift_op z):y by RISC-V Zicond like insns

2023-12-10 Thread Jeff Law




On 12/10/23 21:01, Fei Gao wrote:

On 2023-12-11 04:43  Jeff Law  wrote:




On 12/5/23 01:12, Fei Gao wrote:

op=[ASHIFT, ASHIFTRT, LSHIFTRT, ROTATE, ROTATERT]

Conditional op, if zero
rd = (rc == 0) ? (rs1 op rs2) : rs1
-->
czero.nez rd, rs2, rc
op rd, rs1, rd

Conditional op, if non-zero
rd = (rc != 0) ? (rs1 op rs2) : rs1
-->
czero.eqz rd, rs2, rc
op rd, rs1, rd

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

* ifcvt.cc (noce_cond_zero_binary_op_supported): add support for shift like op.
   (get_base_reg): add support for subreg to handle shift amount 
operand.
   (noce_bbs_ok_for_cond_zero_arith): to replace shift amount operand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond_ifcvt_opt.c: add TCs for shift like op.

So I removed the SUBREG handling code which makes this patch merely an
addition of the shift/rotate ops which trivally work just like PLUS,
MINUS, IOR, XOR (by conditionally zero-ing the shift count) tested on
x86 and pushed it to the trunk.

As I noted before while I think handling SUBREGs is important, now is
not the time to be adding that support.


Thanks for your review.
Got your point to defer support for SUBREGs.

Shift-like pattern:
(set (reg/v:DI 137 [ y ])
         (ashift:DI (reg/v:DI 137 [ y ])
             (subreg:QI (reg/v:DI 138 [ z ]) 0)))

No Zicond instructions are generated with the SUBREG handling code removed.
So I noticed your changes in the testcases regarding the number of czero
instructions scanned.
Then this looks like an NFC patch.
Not on other targets -- not every target forces the shift count into a 
narrow mode.

jeff


Re: [PATCH] -finline-stringops: don't assume ptr_mode ptr in memset [PR112804]

2023-12-10 Thread Jeff Law




On 12/10/23 15:38, Alexandre Oliva wrote:

On Dec 10, 2023, Jeff Law  wrote:


On 12/8/23 19:25, Alexandre Oliva wrote:

On aarch64 -milp32, and presumably on other such targets, ptr can be
in a different mode than ptr_mode in the testcase.  Cope with it.
Regstrapped on x86_64-linux-gnu, also tested the new test on
aarch64-elf.  Ok to install?

for  gcc/ChangeLog
PR target/112804
* builtins.cc (try_store_by_multiple_pieces): Use ptr's mode
for the increment.
for  gcc/testsuite/ChangeLog
PR target/112804
* gcc.target/aarch64/inline-mem-set-pr112804.c: New.

Hmmm.  I'd like a little more information here; I wouldn't expect those
modes to differ even for ILP32 on a 64-bit target.  Are we just
papering over a problem elsewhere?


Possibly.  I've run into Pmode/ptr_mode issues before, so I didn't give
it much thought.

I'm not very familiar with ptr_mode.

I do remember that Pmode can ultimately differ from the mode that you'd 
get from turning POINTER_SIZE into an integer mode.  For example we 
might have Pmode defined as PSImode while POINTER_SIZE is 32.  That was 
the model used on the mn102 as pointers were quite literally 24 bits in 
a register.





Despite ptr_mode's being SImode, and DEST being a 32-bit pointer,
expand_builtin_memset_args gets from get_memory_rtx a (MEM:BLK (REG:DI))

The expansion of dest to RTL is convoluted.  It is first computed in:

(set (reg:DI 102)
 (plus:DI (reg/f:DI 95 virtual-stack-vars)
 (const_int -264 [0xfef8])))

in DImode because virtual-stack-vars is DImode, then it's
SUBREG-narrowed to SImode in expand_expr_addr_expr_1 because of
new_tmode (== pointer_mode), and that's the address stored in addr and
passed to memory_address in get_memory_rtx.

But then something unexpected happens: memory_address_addr_space extends
it back to DImode because address_mode = targetm.addr_space.address_mode
() = Pmode:

(set (reg:DI 103)
 (zero_extend:DI (subreg:SI (reg:DI 102) 0)))

and it's a MEM with (reg:DI 103) as the address that gets passed to
clear_storage_hints, and then try_store_by_multiple_pieces, where the
address is copied from the MEM into ptr, and then, in the update_needed
case I modified in the proposed patch, it gets incremented.


So it looks like my hunch that this was a Pmode/ptr_mode issue was
correct, and I don't see anything particularly fishy in the conversions
above.  Do you?
Not terribly fishy.  Mostly I think I just wasn't aware of ptr_mode.  It 
looks very similar to some of the kinds of bugs we found on the mn102 
way back in the day and since then we've grown some infrastructure to 
handle it better.


OK for the trunk. Thanks for walking me through it.

jeff




Re: [PATCH 5/5] [ifcvt] optimize extension for x=c ? (y op z) : y by RISC-V Zicond like insns

2023-12-10 Thread Jeff Law




On 12/5/23 01:12, Fei Gao wrote:

SIGN_EXTEND, ZERO_EXTEND and SUBREG has been considered
to support SImode in 64-bit machine.

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

* ifcvt.cc (noce_cond_zero_binary_op_supported): add support for 
extension
 (noce_bbs_ok_for_cond_zero_arith): likewise
 (noce_try_cond_zero_arith): support extension of LSHIFTRT case

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond_ifcvt_opt.c: add TCs for extension
So I think this needs to defer to gcc-15.  But even so I think getting 
some review on the effort is useful.




---
  gcc/ifcvt.cc  |  16 ++-
  .../gcc.target/riscv/zicond_ifcvt_opt.c   | 126 +-
  2 files changed, 139 insertions(+), 3 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index b84be53ec5c..306497a8e37 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -2934,6 +2934,10 @@ noce_cond_zero_binary_op_supported (rtx op)
  {
enum rtx_code opcode = GET_CODE (op);
  
+  /* Strip SIGN_EXTEND or ZERO_EXTEND if any.  */

+  if (opcode == SIGN_EXTEND || opcode == ZERO_EXTEND)
+opcode = GET_CODE (XEXP (op, 0));
So it seems to me that you need to record what the extension was so
that you can re-apply it to the result.



@@ -3114,7 +3122,11 @@ noce_try_cond_zero_arith (struct noce_if_info *if_info)
if (CONST_INT_P (*to_replace))
{
  if (noce_cond_zero_shift_op_supported (bin_code))
-   *to_replace = gen_rtx_SUBREG (E_QImode, target, 0);
+   {
+ *to_replace = gen_rtx_SUBREG (E_QImode, target, 0);
+ if (GET_CODE (a) == ZERO_EXTEND && bin_code == LSHIFTRT)
+   PUT_CODE (a, SIGN_EXTEND);
+   }
This doesn't look correct (ignoring the SUBREG issues with patch #4 in 
this series).


When we looked at this internally the conclusion was we needed to first 
strip the extension, recording what kind of extension it was, then 
reapply the same extension to the result of the now conditional 
operation.  And it's independent of SUBREG handling.
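
A sketch of that ordering (the emit helper name is hypothetical; a and
if_info come from the surrounding code):

  /* Record the extension, strip it, build the conditional-zero
     operation on the inner value, then reapply the same extension to
     the result.  */
  rtx_code ext_code = GET_CODE (a);   /* SIGN_EXTEND or ZERO_EXTEND */
  rtx inner = XEXP (a, 0);            /* the unextended operation */
  rtx tmp = noce_emit_cond_zero_op (if_info, inner);  /* hypothetical */
  rtx result = gen_rtx_fmt_e (ext_code, GET_MODE (a), tmp);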



Jeff


Re: [PATCH 4/5] [ifcvt] optimize x=c ? (y op const_int) : y by RISC-V Zicond like insns

2023-12-10 Thread Jeff Law




On 12/5/23 01:12, Fei Gao wrote:

op=[PLUS, MINUS, IOR, XOR, ASHIFT, ASHIFTRT, LSHIFTRT, ROTATE, ROTATERT, AND]

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

 * ifcvt.cc (noce_cond_zero_shift_op_supported): check if OP is shift 
like operation
 (noce_cond_zero_binary_op_supported): restructure & call 
noce_cond_zero_shift_op_supported
 (noce_bbs_ok_for_cond_zero_arith): add support for const_int
 (noce_try_cond_zero_arith): add support for x=c ? (y op const_int)

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/zicond_ifcvt_opt.c: add TCs for x=c ? (y op 
const_int) : y





---
  gcc/ifcvt.cc  |  45 +-
  .../gcc.target/riscv/zicond_ifcvt_opt.c   | 774 +-
  2 files changed, 811 insertions(+), 8 deletions(-)

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 29f33f956eb..b84be53ec5c 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -2910,6 +2910,20 @@ noce_try_sign_mask (struct noce_if_info *if_info)
return true;
  }
  
+/*  Check if OP is shift-like operation supported by conditional zero

+based if conversion, returning TRUE if satisfied otherwise FALSE.
+
+OP is the operation to check.  */
+static bool
+noce_cond_zero_shift_op_supported (enum rtx_code op)
+{
+  if (op == ASHIFT || op == ASHIFTRT || op == LSHIFTRT || op == ROTATE
+  || op == ROTATERT)
+return true;
Formatting nit.  Go ahead and bring down the || op == ROTATE test as 
well.  That leaves the two lines better balanced with all the shifts on 
one line and all the rotates on another.  It's minor, but we might as 
well keep it easy to read.



@@ -3089,7 +3111,18 @@ noce_try_cond_zero_arith (struct noce_if_info *if_info)
  return false;
}
  
-  *to_replace = target;

+  if (CONST_INT_P (*to_replace))
+   {
+ if (noce_cond_zero_shift_op_supported (bin_code))
+   *to_replace = gen_rtx_SUBREG (E_QImode, target, 0);
+ else if (SUBREG_P (bin_op0))
+   *to_replace = gen_rtx_SUBREG (GET_MODE (bin_op0), target, 0);
+ else
+   *to_replace = target;
Not all targets use QImode for their shift counts, so you can't just 
force that argument to QImode.


The way this works in our internal tree is that we re-expand the binary 
operation rather than replacing bits of existing RTL.  That allows the 
expanders to do the right thing automatically for the target WRT 
handling of things like the mode of the shift count.  In fact, I don't 
see how you can ever do replacement of a constant with a register with 
the current scheme since the original constant will be modeless, so you 
never know what mode to use.
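
A sketch of re-expanding rather than patching (exact arguments depend
on the surrounding code):

  /* expand_simple_binop lets the target's expander pick the correct
     mode for a shift count (not necessarily QImode) and forces a
     constant operand into a register as needed.  */
  rtx result = expand_simple_binop (GET_MODE (bin_op0), bin_code,
                                    bin_op0, target, NULL_RTX,
                                    0 /* unsignedp */, OPTAB_WIDEN);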




Jeff


Re: [PATCH V2] RISC-V: XFAIL scan dump fails for autovec PR111311

2023-12-10 Thread Kito Cheng
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr83518.C 
> b/gcc/testsuite/g++.dg/tree-ssa/pr83518.C
> index b8a2bd1ebbd..6f2fc56c82c 100644
> --- a/gcc/testsuite/g++.dg/tree-ssa/pr83518.C
> +++ b/gcc/testsuite/g++.dg/tree-ssa/pr83518.C
> @@ -24,4 +24,4 @@ unsigned test()
>return sum;
>  }
>
> -/* { dg-final { scan-tree-dump "return 15;" "optimized" { xfail 
> vect_variable_length } } } */
> +/* { dg-final { scan-tree-dump "return 15;" "optimized" { xfail { 
> vect_variable_length && aarch64*-*-* } } } } */

aarch64?


Re: Re: [PATCH][v3] RISC-V: Add avail interface into function_group_info

2023-12-10 Thread Feng Wang
2023-12-11 13:05 Kito Cheng  wrote:

> I am happy with this change now, however I didn't see any place calling the
> avail function yet? Did I miss something? Are you planning to send follow-up
> patches to add that?
> If so, LGTM.

Yes, the crypto vector will determine whether to register the corresponding
sub-extension's intrinsic functions by calling the AVAIL macro.

Re: [PATCH] RISC-V: Remove poly selftest when --preference=fixed-vlmax

2023-12-10 Thread Kito Cheng
LGTM

On Mon, Dec 11, 2023 at 1:19 PM Juzhe-Zhong  wrote:
>
> This patch fixes multiple ICEs in full coverage testing:
>
> cc1: internal compiler error: in riscv_legitimize_poly_move, at 
> config/riscv/riscv.cc:2456^M
> 0x1fd8d78 riscv_legitimize_poly_move^M
> ../../../../gcc/gcc/config/riscv/riscv.cc:2456^M
> 0x1fd9518 riscv_legitimize_move(machine_mode, rtx_def*, rtx_def*)^M
> ../../../../gcc/gcc/config/riscv/riscv.cc:2583^M
> 0x2936820 gen_movdi(rtx_def*, rtx_def*)^M
> ../../../../gcc/gcc/config/riscv/riscv.md:2099^M
> 0x11a0f28 rtx_insn* insn_gen_fn::operator()(rtx_def*, 
> rtx_def*) const^M
> ../../../../gcc/gcc/recog.h:431^M
> 0x13cf2f9 emit_move_insn_1(rtx_def*, rtx_def*)^M
> ../../../../gcc/gcc/expr.cc:4553^M
> 0x13d010c emit_move_insn(rtx_def*, rtx_def*)^M
> ../../../../gcc/gcc/expr.cc:4723^M
> 0x216f5e0 run_poly_int_selftest^M
> ../../../../gcc/gcc/config/riscv/riscv-selftests.cc:185^M
> 0x21701e6 run_poly_int_selftests^M
> ../../../../gcc/gcc/config/riscv/riscv-selftests.cc:226^M
> 0x2172109 selftest::riscv_run_selftests()^M
> ../../../../gcc/gcc/config/riscv/riscv-selftests.cc:371^M
> 0x3b8067b selftest::run_tests()^M
> ../../../../gcc/gcc/selftest-run-tests.cc:112^M
> 0x1ad90ee toplev::run_self_tests()^M
> ../../../../gcc/gcc/toplev.cc:2209^M
>
> Running target 
> riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m1/--param=riscv-autovec-preference=fixed-vlmax
>
> The root cause is that we are testing POLY value computation during
> FIXED-VLMAX and ICE in this code:
>
>   if (BYTES_PER_RISCV_VECTOR.is_constant ())
> {
>   gcc_assert (value.is_constant ());   ->  
> assert failed.
>   riscv_emit_move (dest, GEN_INT (value.to_constant ()));
>   return;
> }
>
> For example, a poly value [15, 16] is computed by csrr vlen + multiple scalar 
> integer instructions.
>
> However, such a compile-time-unknown value only needs to be computed when it
> is a scalable vector, that is !BYTES_PER_RISCV_VECTOR.is_constant (),
> since csrr vlenb = [16, 0] when -march=rv64gcv
> --param=riscv-autovec-preference=fixed-vlmax and we have no chance to compute
> the compile-time POLY value.
>
> Also, we never need to compute a compile-time-unknown value when it is a
> FIXED-VLMAX vector.  So disable the POLY selftest for FIXED-VLMAX.
>
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-selftests.cc (riscv_run_selftests): Remove poly 
> self test when FIXED-VLMAX.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/poly-selftest-1.c: New test.
>
> ---
>  gcc/config/riscv/riscv-selftests.cc| 14 +-
>  .../gcc.target/riscv/rvv/base/poly-selftest-1.c| 12 
>  2 files changed, 25 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/poly-selftest-1.c
>
> diff --git a/gcc/config/riscv/riscv-selftests.cc 
> b/gcc/config/riscv/riscv-selftests.cc
> index 0ac17fb70a1..289916b999e 100644
> --- a/gcc/config/riscv/riscv-selftests.cc
> +++ b/gcc/config/riscv/riscv-selftests.cc
> @@ -368,7 +368,19 @@ namespace selftest {
>  void
>  riscv_run_selftests (void)
>  {
> -  run_poly_int_selftests ();
> +  if (!BYTES_PER_RISCV_VECTOR.is_constant ())
> +/* We can know POLY value = [4, 4] when BYTES_PER_RISCV_VECTOR
> +   is !is_constant () since we can use csrr vlenb and scalar shift
> +   instruction to compute such POLY value and store it into a scalar
> +   register.  Whereas, we can't know [4, 4] when it is specified as
> +   FIXED-VLMAX since BYTES_PER_RISCV_VECTOR = 16 for -march=rv64gcv
> +   and csrr vlenb is 16 which is totally unrelated to any
> +   compile-time unknown POLY value.
> +
> +   Since we never need to compute a compile-time unknown POLY value
> +   when --param=riscv-autovec-preference=fixed-vlmax, disable poly
> +   selftests in such situation.  */
> +run_poly_int_selftests ();
>run_const_vector_selftests ();
>run_broadcast_selftests ();
>  }
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/poly-selftest-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/poly-selftest-1.c
> new file mode 100644
> index 000..0f128ac26b2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/poly-selftest-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O0 
> -fself-test=$srcdir/selftests --param=riscv-autovec-preference=fixed-vlmax 
> -S" } */
> +
> +/* Verify that -fself-test does not fail on a non empty source.  */
> +
> +int i;
> +void bar();
> +void foo()
> +{
> +  while (i--)
> +bar();
> +}
> +
> +/* { dg-regexp {^-fself-test: [0-9]+ pass\(es\) in [.0-9]+ seconds$|.*: 
> note: self-tests are not enabled in this bu

[PATCH] RISC-V: Remove poly selftest when --preference=fixed-vlmax

2023-12-10 Thread Juzhe-Zhong
This patch fixes multiple ICEs in full coverage testing:

cc1: internal compiler error: in riscv_legitimize_poly_move, at 
config/riscv/riscv.cc:2456^M
0x1fd8d78 riscv_legitimize_poly_move^M
../../../../gcc/gcc/config/riscv/riscv.cc:2456^M
0x1fd9518 riscv_legitimize_move(machine_mode, rtx_def*, rtx_def*)^M
../../../../gcc/gcc/config/riscv/riscv.cc:2583^M
0x2936820 gen_movdi(rtx_def*, rtx_def*)^M
../../../../gcc/gcc/config/riscv/riscv.md:2099^M
0x11a0f28 rtx_insn* insn_gen_fn::operator()(rtx_def*, 
rtx_def*) const^M
../../../../gcc/gcc/recog.h:431^M
0x13cf2f9 emit_move_insn_1(rtx_def*, rtx_def*)^M
../../../../gcc/gcc/expr.cc:4553^M
0x13d010c emit_move_insn(rtx_def*, rtx_def*)^M
../../../../gcc/gcc/expr.cc:4723^M
0x216f5e0 run_poly_int_selftest^M
../../../../gcc/gcc/config/riscv/riscv-selftests.cc:185^M
0x21701e6 run_poly_int_selftests^M
../../../../gcc/gcc/config/riscv/riscv-selftests.cc:226^M
0x2172109 selftest::riscv_run_selftests()^M
../../../../gcc/gcc/config/riscv/riscv-selftests.cc:371^M
0x3b8067b selftest::run_tests()^M
../../../../gcc/gcc/selftest-run-tests.cc:112^M
0x1ad90ee toplev::run_self_tests()^M
../../../../gcc/gcc/toplev.cc:2209^M

Running target 
riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m1/--param=riscv-autovec-preference=fixed-vlmax

The root cause is that we are testing POLY value computation during FIXED-VLMAX
and ICE in this code:

  if (BYTES_PER_RISCV_VECTOR.is_constant ())
{
  gcc_assert (value.is_constant ());   ->  
assert failed.
  riscv_emit_move (dest, GEN_INT (value.to_constant ()));
  return;
}

For example, a poly value [15, 16] is computed by csrr vlenb plus multiple
scalar integer instructions.

However, such a compile-time-unknown value can only be computed when we have a
scalable vector, that is !BYTES_PER_RISCV_VECTOR.is_constant (),
since csrr vlenb = [16, 0] when -march=rv64gcv
--param=riscv-autovec-preference=fixed-vlmax is used and we then have no way
to compute a compile-time-unknown POLY value.

Also, we never reach a situation where we must compute a compile-time-unknown
value for a FIXED-VLMAX vector.  So disable the POLY selftest for FIXED-VLMAX.
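
As a concrete illustration (a hedged C++ stand-in, not part of the patch; the
function name and the VLENB parameter are made up), this is how a
compile-time-unknown poly value such as [4, 4] can be materialized at run time
via "csrr t0, vlenb; srli t0, t0, 2":

#include <cstdint>

/* [4, 4] means 4 + 4x with VLENB == 16 * (x + 1), so the runtime value
   is simply VLENB / 4 -- the C++ analogue of csrr vlenb plus a shift.  */
uint64_t poly_4_4 (uint64_t vlenb) { return vlenb >> 2; }
/* vlenb == 16 -> 4 (x == 0); vlenb == 32 -> 8 (x == 1); and so on.  */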


gcc/ChangeLog:

* config/riscv/riscv-selftests.cc (riscv_run_selftests): Remove poly 
self test when FIXED-VLMAX.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/poly-selftest-1.c: New test.

---
 gcc/config/riscv/riscv-selftests.cc| 14 +-
 .../gcc.target/riscv/rvv/base/poly-selftest-1.c| 12 
 2 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/poly-selftest-1.c

diff --git a/gcc/config/riscv/riscv-selftests.cc 
b/gcc/config/riscv/riscv-selftests.cc
index 0ac17fb70a1..289916b999e 100644
--- a/gcc/config/riscv/riscv-selftests.cc
+++ b/gcc/config/riscv/riscv-selftests.cc
@@ -368,7 +368,19 @@ namespace selftest {
 void
 riscv_run_selftests (void)
 {
-  run_poly_int_selftests ();
+  if (!BYTES_PER_RISCV_VECTOR.is_constant ())
+/* We can know POLY value = [4, 4] when BYTES_PER_RISCV_VECTOR
+   is !is_constant () since we can use csrr vlenb and scalar shift
+   instruction to compute such POLY value and store it into a scalar
+   register.  Whereas, we can't know [4, 4] when it is specified as
+   FIXED-VLMAX since BYTES_PER_RISCV_VECTOR = 16 for -march=rv64gcv
+   and csrr vlenb is 16 which is totally unrelated to any
+   compile-time unknown POLY value.
+
+   Since we never need to compute a compile-time unknown POLY value
+   when --param=riscv-autovec-preference=fixed-vlmax, disable poly
+   selftests in such situation.  */
+run_poly_int_selftests ();
   run_const_vector_selftests ();
   run_broadcast_selftests ();
 }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/poly-selftest-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/poly-selftest-1.c
new file mode 100644
index 000..0f128ac26b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/poly-selftest-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O0 -fself-test=$srcdir/selftests 
--param=riscv-autovec-preference=fixed-vlmax -S" } */
+
+/* Verify that -fself-test does not fail on a non empty source.  */
+
+int i;
+void bar();
+void foo()
+{
+  while (i--)
+bar();
+}
+
+/* { dg-regexp {^-fself-test: [0-9]+ pass\(es\) in [.0-9]+ seconds$|.*: note: 
self-tests are not enabled in this build$} } */
-- 
2.36.3



Re: [PATCH 3/5] [ifcvt] optimize x=c ? (y AND z) : y by RISC-V Zicond like insns

2023-12-10 Thread Jeff Law




On 12/5/23 01:12, Fei Gao wrote:

Take the following case for example.

CFLAGS: -march=rv64gc_zbb_zicond -mabi=lp64d -O2

long
test_AND_ceqz (long x, long y, long z, long c)
{
   if (c)
 x = y & z;
   else
 x = y;
   return x;
}

Before patch:

and a2,a1,a2
czero.eqz   a0,a2,a3
czero.nez   a3,a1,a3
or  a0,a3,a0
ret

After patch:
and a0,a1,a2
czero.nez   a1,a1,a3
or  a0,a1,a0
FWIW I was having a conversation with Andrew W. a little while ago and 
his preference was to change the IOR into an ADD.  We have a better 
chance of using a compressed variant as c.add allows the full set of 
registers while c.or only allows a subset of registers.  Given one of 
the two input registers in that IOR will always have the value zero, it 
should be equivalent.


Given this is target independent code we have to be careful, but I'm not 
immediately aware of a target where using ADD would be worse than IOR 
here.   Anyway, feel free to submit a patch for that minor change, it'll 
need to queue for gcc-15, but should be non-controversial at that time.


The same trick can be done in the conditional move expander within riscv.cc.






Co-authored-by: Xiao Zeng

gcc/ChangeLog:

* ifcvt.cc (noce_cond_zero_binary_op_supported): Add support for AND.
 (noce_bbs_ok_for_cond_zero_arith): Likewise.
 (noce_try_cond_zero_arith): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond_ifcvt_opt.c: add TCs for AND.
This looks fine.  I'll need to adjust the matches in the test file 
because we're not supporting SUBREGs.  I can do that easy enough.


I 3-stage bootstrapped and regression tested this on x86_64 as well as 
regression tested it in rv64gc.


I'll push it to the trunk shortly.

jeff


Re: [PATCH][v3] RISC-V: Add avail interface into function_group_info

2023-12-10 Thread Kito Cheng
I am happy with this change now; however, I didn't see any place calling the
avail function yet. Did I miss something? Are you planning to send follow-up
patches to add that?
If so, LGTM.


Re: Re: [PATCH 2/5] [ifcvt] optimize x=c ? (y shift_op z):y by RISC-V Zicond like insns

2023-12-10 Thread Fei Gao
On 2023-12-11 04:43  Jeff Law  wrote:
>
>
>
>On 12/5/23 01:12, Fei Gao wrote:
>> op=[ASHIFT, ASHIFTRT, LSHIFTRT, ROTATE, ROTATERT]
>>
>> Conditional op, if zero
>> rd = (rc == 0) ? (rs1 op rs2) : rs1
>> -->
>> czero.nez rd, rs2, rc
>> op rd, rs1, rd
>>
>> Conditional op, if non-zero
>> rd = (rc != 0) ? (rs1 op rs2) : rs1
>> -->
>> czero.eqz rd, rs2, rc
>> op rd, rs1, rd
>>
>> Co-authored-by: Xiao Zeng
>>
>> gcc/ChangeLog:
>>
>> * ifcvt.cc (noce_cond_zero_binary_op_supported): add support for shift like 
>> op.
>>  (get_base_reg): add support for subreg to handle shift amount 
>>operand.
>>  (noce_bbs_ok_for_cond_zero_arith): to replace shift amount operand.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/zicond_ifcvt_opt.c: add TCs for shift like op.
>So I removed the SUBREG handling code which makes this patch merely an
>addition of the shift/rotate ops which trivally work just like PLUS,
>MINUS, IOR, XOR (by conditionally zero-ing the shift count) tested on
>x86 and pushed it to the trunk.
>
>As I noted before while I think handling SUBREGs is important, now is
>not the time to be adding that support. 

Thanks for your review.
Got your point to defer support for SUBREGs.

Shift-like pattern:
(set (reg/v:DI 137 [ y ])
        (ashift:DI (reg/v:DI 137 [ y ])
            (subreg:QI (reg/v:DI 138 [ z ]) 0)))

No Zicond instructions are generated with the SUBREG handling code removed.
So I noticed your changes in the testcases regarding the number of czero
instructions scanned.
Then this looks like an NFC patch.

BR, 
Fei


>
>Thanks!
>
>jeff

[PATCH V4 2/3] Using pli for constant splitting

2023-12-10 Thread Jiufu Guo
Hi,

For constant building, e.g. r120=0x66666666, which does not fit 'li' or 'lis',
'pli' is used to build this constant via 'emit_move_insn'.

While for a complicated constant, e.g. 0x6666666666666666ULL, when using
'rs6000_emit_set_long_const' to split the constant recursively, it fails to
use 'pli' to build the half-part constant: 0x66666666.

'rs6000_emit_set_long_const' could be updated to use 'pli' to build the half
part of the constant when necessary.  For example, for 0x6666666666666666ULL,
"pli 3,1717986918; rldimi 3,3,32,0" can be used.

Compare with previous:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639492.html
This verion is refreshed and updated testcase name.

Bootstrap & regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu Guo)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add code to use
pli for 34bit constant.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build-1.c: New test.

---
 gcc/config/rs6000/rs6000.cc  | 9 -
 gcc/testsuite/gcc.target/powerpc/const-build-1.c | 9 +
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/const-build-1.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 017000a4e02..531c40488b4 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10511,7 +10511,14 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c, 
int *num_insns)
   emit_insn (dest_or_insn);
   };
 
-  if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
+  if (TARGET_PREFIXED && SIGNED_INTEGER_34BIT_P (c))
+{
+  /* li/lis/pli */
+  count_or_emit_insn (dest, GEN_INT (c));
+  return;
+}
+
+ if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
   || (ud4 == 0 && ud3 == 0 && ud2 == 0 && !(ud1 & 0x8000)))
 {
   /* li */
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build-1.c 
b/gcc/testsuite/gcc.target/powerpc/const-build-1.c
new file mode 100644
index 000..7e35f8c507f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/const-build-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+/* { dg-require-effective-target power10_ok } */
+
+unsigned long long msk66() { return 0x6666666666666666ULL; }
+
+/* { dg-final { scan-assembler-times {\mpli\M} 1 } } */
+/* { dg-final { scan-assembler-not {\mli\M} } } */
+/* { dg-final { scan-assembler-not {\mlis\M} } } */
-- 
2.25.1



[PATCH V4 1/3]rs6000: accurate num_insns_constant_gpr

2023-12-10 Thread Jiufu Guo
Hi,

Trunk gcc supports more constants to be built via two instructions:
e.g. "li/lis; xori/xoris/rldicl/rldicr/rldic".
And then num_insns_constant should also be updated.

Function "rs6000_emit_set_long_const" is used to build complicated
constants; and "num_insns_constant_gpr" is used to compute 'how
many instructions are needed" to build the constant. So, these 
two functions should be aligned.

The idea of this patch is to reuse "rs6000_emit_set_long_const" to
compute/record the instruction number (when computing the insn count,
no instructions are emitted).
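
For readers unfamiliar with the pattern, here is a hedged standalone C++
sketch of the "count or emit" idea (the names are illustrative, not the
actual GCC code):

#include <iostream>

static void
count_or_emit (int *num_insns, const char *insn)
{
  if (num_insns)
    {
      ++*num_insns;   /* counting mode: record the cost, emit nothing */
      return;
    }
  std::cout << insn << '\n';   /* emitting mode */
}

int main ()
{
  int n = 0;
  count_or_emit (&n, "pli 3,1717986918");       /* counted */
  count_or_emit (&n, "rldimi 3,3,32,0");        /* counted */
  std::cout << n << " insns\n";                 /* prints "2 insns" */
  count_or_emit (nullptr, "pli 3,1717986918");  /* actually "emitted" */
  return 0;
}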

Compare with the previous version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639491.html
this version updates a lambda usage and comments.

Bootstrap & regtest pass ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu Guo)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add new
parameter to record number of instructions to build the constant.
(num_insns_constant_gpr): Call rs6000_emit_set_long_const to compute
num_insn.

---
 gcc/config/rs6000/rs6000.cc | 284 ++--
 1 file changed, 146 insertions(+), 138 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index cee22c359f3..1e3d1f7fc08 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1115,7 +1115,7 @@ static tree rs6000_handle_longcall_attribute (tree *, 
tree, tree, int, bool *);
 static tree rs6000_handle_altivec_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_builtin_vectorized_libmass (combined_fn, tree, tree);
-static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT);
+static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT, int * = nullptr);
 static int rs6000_memory_move_cost (machine_mode, reg_class_t, bool);
 static bool rs6000_debug_rtx_costs (rtx, machine_mode, int, int, int *, bool);
 static int rs6000_debug_address_cost (rtx, machine_mode, addr_space_t,
@@ -6054,21 +6054,9 @@ num_insns_constant_gpr (HOST_WIDE_INT value)
 
   else if (TARGET_POWERPC64)
 {
-  HOST_WIDE_INT low = sext_hwi (value, 32);
-  HOST_WIDE_INT high = value >> 31;
-
-  if (high == 0 || high == -1)
-   return 2;
-
-  high >>= 1;
-
-  if (low == 0 || low == high)
-   return num_insns_constant_gpr (high) + 1;
-  else if (high == 0)
-   return num_insns_constant_gpr (low) + 1;
-  else
-   return (num_insns_constant_gpr (high)
-   + num_insns_constant_gpr (low) + 1);
+  int num_insns = 0;
+  rs6000_emit_set_long_const (nullptr, value, &num_insns);
+  return num_insns;
 }
 
   else
@@ -10494,14 +10482,13 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int 
*shift, HOST_WIDE_INT *mask)
 
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
-   lis, ori and shl instructions.  */
+   lis, ori and shl instructions.  If NUM_INSNS is not NULL, then
+   only increase *NUM_INSNS as the number of insns, and do not emit
+   any insns.  */
 
 static void
-rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
+rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c, int *num_insns)
 {
-  rtx temp;
-  int shift;
-  HOST_WIDE_INT mask;
   HOST_WIDE_INT ud1, ud2, ud3, ud4;
 
   ud1 = c & 0xffff;
@@ -10509,168 +10496,189 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT 
c)
   ud3 = (c >> 32) & 0xffff;
   ud4 = (c >> 48) & 0xffff;
 
-  if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
-  || (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
-emit_move_insn (dest, GEN_INT (sext_hwi (ud1, 16)));
+  /* This lambda is used to emit one insn or just increase the insn count.
+ When counting the insn number, no need to emit the insn.  */
+  auto count_or_emit_insn = [&num_insns] (rtx dest_or_insn, rtx src = nullptr) 
{
+if (num_insns)
+  {
+   (*num_insns)++;
+   return;
+  }
+
+if (src)
+  emit_move_insn (dest_or_insn, src);
+else
+  emit_insn (dest_or_insn);
+  };
 
-  else if ((ud4 == 0xffff && ud3 == 0xffff && (ud2 & 0x8000))
-  || (ud4 == 0 && ud3 == 0 && ! (ud2 & 0x8000)))
+  if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
+  || (ud4 == 0 && ud3 == 0 && ud2 == 0 && !(ud1 & 0x8000)))
 {
-  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
+  /* li */
+  count_or_emit_insn (dest, GEN_INT (sext_hwi (ud1, 16)));
+  return;
+}
+
+  rtx temp
+= (num_insns || !can_create_pseudo_p ()) ? dest : gen_reg_rtx (DImode);
 
-  emit_move_insn (ud1 != 0 ? temp : dest,
- GEN_INT (sext_hwi (ud2 << 16, 32)));
+  if ((ud4 == 0xffff && ud3 == 0xffff && (ud2 & 0x8000))
+  || (ud4 == 0 && ud3 == 0 && !(ud2 & 0x8000)))
+{
+  /* lis[; ori] */
+  count_o

Re: [PATCH] treat argp-based mem as frame related in dse

2023-12-10 Thread Jiufu Guo


Hi,

Thanks again for your kind review!

Jeff Law  writes:

> On 12/8/23 00:18, Jiufu Guo wrote:
>>
>> Hi,
>>
>> Jeff Law  writes:
>>
>>> On 12/6/23 02:27, Jiufu Guo wrote:
 Hi,

 The issue mentioned in PR112525 would be able to be handled by
 updating dse.cc to treat arg_pointer_rtx similarly with frame_pointer_rtx.
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30271#c10 also mentioned
 this idea.
   One 
 thing, arpg area may be used to pass argument to callee. So, it would
 be needed to check if call insns are using that mem.

 Bootstrap & regtest pass on ppc64{,le} and x86_64.
 Is this ok for trunk?

 BR,
 Jeff (Jiufu Guo)


PR rtl-optimization/112525

 gcc/ChangeLog:

* dse.cc (get_group_info): Add arg_pointer_rtx as frame_related.
(check_mem_read_rtx): Add parameter to indicate if it is checking mem
for call insn.
(scan_insn): Add mem checking on call usage.

 gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr112525.c: New test.
>>> So conceptually the first chunk makes sense.  Though I do worry about
>>> Andrew's comment about it causing a bootstrap failure.  Even thought
>>> it was 15 years ago, it remains worrisome.
>>>
>> Yes, I understand your point.
>> At that time, it is a comparesion failure. It may be related to debug
>> info.  But I did not figure out possible failures.
>>
>>>
 @@ -2368,7 +2370,8 @@ check_mem_read_rtx (rtx *loc, bb_info_t bb_info)
  /* If this read is just reading back something that we just
 stored, rewrite the read.  */
 -if (store_info->rhs
 +if (!used_in_call
 +&& store_info->rhs
  && store_info->group_id == -1
  && store_info->cse_base == base
  && known_subrange_p (offset, width, store_info->offset,
 @@ -2650,6 +2653,12 @@ scan_insn (bb_info_t bb_info, rtx_insn *insn, int 
 max_active_local_stores)
   that is not relative to the frame.  */
add_non_frame_wild_read (bb_info);
+  for (rtx link = CALL_INSN_FUNCTION_USAGE (insn);
 + link != NULL_RTX;
 + link = XEXP (link, 1))
 +  if (GET_CODE (XEXP (link, 0)) == USE && MEM_P (XEXP (XEXP (link, 0),0)))
 +check_mem_read_rtx (&XEXP (XEXP (link, 0),0), bb_info, true);
>>> I'm having a bit of a hard time convincing myself this is correct
>>> though.  I can't see how rewriting the load to read the source of the
>>> prior store is unsafe.  If that fixes a problem, then it would seem
>>> like we've gone wrong before here -- perhaps failing to use the fusage
>>> loads to "kill" any available stores to the same or aliased memory
>>> locations.
>> As you said the later one, call's fusage would killing the previous
>> store. It is a kind of case like:
>>
>>134: [argp:SI+0x8]=r134:SI
>>135: [argp:SI+0x4]=0x1
>>136: [argp:SI]=r132:SI
>>137: ax:SI=call [`memset'] argc:0xc
>>REG_CALL_DECL `memset'
>>REG_EH_REGION 0
>>
>> This call insn is:
>> (call_insn/j 137 136 147 27 (set (reg:SI 0 ax)
>>  (call (mem:QI (symbol_ref:SI ("memset") [flags 0x41]  
>> ) [0 __builtin_memset S1 A8])
>>  (const_int 12 [0xc]))) "pr102798.c":23:22 1086 {*sibcall_value}
>>   (expr_list:REG_UNUSED (reg:SI 0 ax)
>>  (expr_list:REG_CALL_DECL (symbol_ref:SI ("memset") [flags 0x41]  
>> )
>>  (expr_list:REG_EH_REGION (const_int 0 [0])
>>  (nil
>>  (expr_list:SI (use (mem/f:SI (reg/f:SI 16 argp) [0  S4 A32]))
>>  (expr_list:SI (use (mem:SI (plus:SI (reg/f:SI 16 argp) (const_int 4 
>> [0x4])) [0  S4 A32]))
>>  (expr_list:SI (use (mem:SI (plus:SI (reg/f:SI 16 argp) 
>> (const_int 8 [0x8])) [0  S4 A32]))
>>  (nil)
>>
>> The stores in "insns 134-136" are used by the call. "check_mem_read_rtx"
>> would prevent them from being eliminated.
> Right.  But unless I read something wrong, the patch wasn't changing
> store removal, it was changing whether or not we forwarded the source
> of the store into the destination of a subsequent load from the same
> address.
"check_mem_read_rtx" has another behavior which checks the mem
and adds read_info to insn_info->read_rec. "read_rec" could prevent
the "store" from being eliminated during the dse's global alg. This
patch leverages this behavior.
And to avoid the "mem on fusage" to be replaced by leading store's rhs
"replace_read" was disabled if the mem is on the call's fusage.


BR,
Jeff (Jiufu Guo)

>
> So I'm still a bit confused how this patch impacted store removal.
>
> jeff


[Patch, rs6000] Clean up pre-checking of expand_block_compare

2023-12-10 Thread HAO CHEN GUI
Hi,
  This patch cleans up the pre-checking of expand_block_compare.  It does:
1. Assert that only P7 and above can enter this function, as that is already
guarded by the expander.
2. Return false when optimizing for size.
3. Remove the P7 CPU test, as only P7 and above can enter this function and P7
LE is excluded by targetm.slow_unaligned_access.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Clean up pre-checking of expand_block_compare

gcc/
* gcc/config/rs6000/rs6000-string.cc (expand_block_compare): Assert
only P7 and above can enter this function.  Return false when it's
optimized for size.  Remove the P7 CPU test as only P7 and above can
enter this function and P7 LE is excluded by the checking of
targetm.slow_unaligned_access on word_mode.

gcc/testsuite/
* gcc.target/powerpc/memcmp_for_size.c: New.


patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index d4030854b2a..dff69e90d0c 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1946,6 +1946,15 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, 
unsigned int base_align,
 bool
 expand_block_compare (rtx operands[])
 {
+  gcc_assert (TARGET_POPCNTD);
+
+  if (optimize_insn_for_size_p ())
+return false;
+
+  /* Allow this param to shut off all expansion.  */
+  if (rs6000_block_compare_inline_limit == 0)
+return false;
+
   rtx target = operands[0];
   rtx orig_src1 = operands[1];
   rtx orig_src2 = operands[2];
@@ -1959,23 +1968,9 @@ expand_block_compare (rtx operands[])
   if (TARGET_32BIT && TARGET_POWERPC64)
 return false;

-  bool isP7 = (rs6000_tune == PROCESSOR_POWER7);
-
-  /* Allow this param to shut off all expansion.  */
-  if (rs6000_block_compare_inline_limit == 0)
-return false;
-
-  /* targetm.slow_unaligned_access -- don't do unaligned stuff.
- However slow_unaligned_access returns true on P7 even though the
- performance of this code is good there.  */
-  if (!isP7
-  && (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
- || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2
-return false;
-
-  /* Unaligned l*brx traps on P7 so don't do this.  However this should
- not affect much because LE isn't really supported on P7 anyway.  */
-  if (isP7 && !BYTES_BIG_ENDIAN)
+  /* targetm.slow_unaligned_access -- don't do unaligned stuff.  */
+if (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
+   || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2)))
 return false;

   /* If this is not a fixed size compare, try generating loop code and
@@ -2023,14 +2018,6 @@ expand_block_compare (rtx operands[])
   if (!IN_RANGE (bytes, 1, max_bytes))
 return expand_compare_loop (operands);

-  /* The code generated for p7 and older is not faster than glibc
- memcmp if alignment is small and length is not short, so bail
- out to avoid those conditions.  */
-  if (!TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT
-  && ((base_align == 1 && bytes > 16)
- || (base_align == 2 && bytes > 32)))
-return false;
-
   rtx final_label = NULL;

   if (use_vec)
diff --git a/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c 
b/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c
new file mode 100644
index 000..c7e853ad593
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/memcmp_for_size.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+/* { dg-final { scan-assembler-times {\mb[l]? memcmp\M} 1 } }  */
+
+int foo (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 4);
+}


Re: [PATCH v26 00/23] Optimize type traits compilation performance

2023-12-10 Thread Ken Matsui
On Sun, Dec 10, 2023 at 10:19 AM Jason Merrill  wrote:
>
> On 12/7/23 00:11, Ken Matsui wrote:
> > This patch series optimizes type traits compilation performance by
> > implementing built-in type traits and using them in libstdc++.
> >
> > Changes in v26:
> >
> >   * Rebased on top of trunk.
> >   * Moved is_function_v under is_const_v.
> >   * Isolated patches for is_const, is_volatile, is_pointer, and
> >   is_unbounded_array, which contain performance regression, from
> >   this patch series since they are not ready for review yet.
>
> I've applied all the compiler patches, with a few small tweaks,
> including this one as a separate commit.  One other was a formatting
> fix, the last was using TYPE_PTRDATAMEM_P for CPTK_IS_MEMBER_OBJECT_POINTER.
>
> I'm leaving the library patches for library folks to apply.
>
> Thanks!
>

Thank you so much for your review and support!

> Jason
>
>


Re: [PATCH] c++: fix noexcept checking for trivial operations [PR96090]

2023-12-10 Thread Jason Merrill

On 12/10/23 21:04, Nathaniel Shead wrote:

On Sat, Dec 09, 2023 at 03:40:44PM -0500, Jason Merrill wrote:

On 11/27/23 06:07, Nathaniel Shead wrote:

Ping for https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634626.html.

I've been made aware since constructing this patch of CWG2820, which has
a proposed resolution that would change the result of the testcase
'noexcept(yesthrow_t())' (and similarly for the library builtin), but as
it hasn't yet been accepted I think at least ensuring the builtin
matches the behaviour of the operator is probably still sensible.


OK.


Thanks. Looking at the patch again as I went to push it I noticed a
couple of small things.

I've added another paragraph to the commit message to mention CWG2820,
fixed a trailing whitespace I'd missed before, and replaced a call to
'TREE_VEC_LENGTH (from)' with 'len' that I'd missed initially.

Bootstrapped and regtested on x86_64-pc-linux-gnu, still OK for trunk?


Please also mention CWG2820 in the testcase.  OK with that change.


-- >8 --

This patch stops eager folding of trivial operations (construction and
assignment) from occurring when checking for noexceptness. This was
previously done in PR c++/53025, but only for copy/move construction,
and the __is_nothrow_xible builtins did not receive the same treatment
when they were added.

To handle `is_nothrow_default_constructible`, the patch also ensures
that when no parameters are passed we do value initialisation instead of
just building the constructor call: in particular, value-initialisation
doesn't necessarily actually invoke the constructor for trivial default
constructors, and so we need to handle this case as well.

This is contrary to the proposed resolution of CWG2820; for now we just
ensure it matches the behaviour of the `noexcept` operator and create
testcases formalising this, and if that issue gets accepted we can
revisit.
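
A minimal sketch of the kind of case at issue (illustrative only, not the
actual noexcept81.C; it assumes P1286R2 semantics, which GCC applies, so an
explicit noexcept-specifier is permitted on a defaulted function):

struct S
{
  S () noexcept (false) = default;  // still trivial, yet declared throwing
};

// The trivial construction must not be folded away while evaluating
// noexcept, or the operator would wrongly yield true.
static_assert (!noexcept (S ()), "declared exception spec must be honoured");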

PR c++/96090
PR c++/100470

gcc/cp/ChangeLog:

* call.cc (build_over_call): Prevent folding of trivial special
members when checking for noexcept.
* method.cc (constructible_expr): Perform value-initialisation
for empty parameter lists.
(is_nothrow_xible): Treat as noexcept operator.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept81.C: New test.
* g++.dg/ext/is_nothrow_constructible7.C: New test.
* g++.dg/ext/is_nothrow_constructible8.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/call.cc| 17 ++---
  gcc/cp/method.cc  | 19 --
  gcc/testsuite/g++.dg/cpp0x/noexcept81.C   | 36 +++
  .../g++.dg/ext/is_nothrow_constructible7.C| 20 ++
  .../g++.dg/ext/is_nothrow_constructible8.C| 63 +++
  5 files changed, 141 insertions(+), 14 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept81.C
  create mode 100644 gcc/testsuite/g++.dg/ext/is_nothrow_constructible7.C
  create mode 100644 gcc/testsuite/g++.dg/ext/is_nothrow_constructible8.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index c7efc5b077a..4f0abf8e93f 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -10247,15 +10247,16 @@ build_over_call (struct z_candidate *cand, int flags, 
tsubst_flags_t complain)
/* Avoid actually calling copy constructors and copy assignment operators,
   if possible.  */
  
-  if (! flag_elide_constructors && !force_elide)
+  if (!force_elide
+  && (!flag_elide_constructors
+ /* It's unsafe to elide the operation when handling
+a noexcept-expression, it may evaluate to the wrong
+value (c++/53025, c++/96090).  */
+ || cp_noexcept_operand != 0))
  /* Do things the hard way.  */;
-  else if (cand->num_convs == 1
-   && (DECL_COPY_CONSTRUCTOR_P (fn)
-   || DECL_MOVE_CONSTRUCTOR_P (fn))
-  /* It's unsafe to elide the constructor when handling
- a noexcept-expression, it may evaluate to the wrong
- value (c++/53025).  */
-  && (force_elide || cp_noexcept_operand == 0))
+  else if (cand->num_convs == 1
+  && (DECL_COPY_CONSTRUCTOR_P (fn)
+  || DECL_MOVE_CONSTRUCTOR_P (fn)))
  {
tree targ;
tree arg = argarray[num_artificial_parms_for (fn)];
diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
index a70dd5d6adc..26e6eb79946 100644
--- a/gcc/cp/method.cc
+++ b/gcc/cp/method.cc
@@ -2091,6 +2091,7 @@ constructible_expr (tree to, tree from)
  {
tree expr;
cp_unevaluated cp_uneval_guard;
+  const int len = TREE_VEC_LENGTH (from);
if (CLASS_TYPE_P (to))
  {
tree ctype = to;
@@ -2098,11 +2099,16 @@ constructible_expr (tree to, tree from)
if (!TYPE_REF_P (to))
to = cp_build_reference_type (to, /*rval*/false);
tree ob = build_stub_object (to);
-  vec_alloc (args, TREE_VEC_LENGTH (from));
-  for (tree arg : tree_vec_range (from))
-   args->quick_push (build_stub_object

Re: [PATCH] RISC-V: Recognize stepped series in expand_vec_perm_const.

2023-12-10 Thread juzhe.zh...@rivai.ai
+ if (shuffle_series (d))
+   return true;

Could you rename it to shuffle_series_patterns?

Just to make naming consistent.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-12-09 21:18
To: gcc-patches; palmer; Kito Cheng; jeffreyalaw; juzhe.zh...@rivai.ai
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Recognize stepped series in expand_vec_perm_const.
Hi,
 
we currently try to recognize various forms of stepped (const_vector)
sequence variants in expand_const_vector.  Because of complications with
canonicalization and encoding it is easier to identify such patterns
in expand_vec_perm_const_1 already where perm.series_p () is available.
 
This patch introduces shuffle_series as new permutation pattern and
tries to recognize series like [base0 base1 base1 + step ...].  If such
a series is found the series is expanded by expand_vec_series and a
gather is emitted.
 
On top the patch fixes the step recognition in expand_const_vector
for stepped series where such a series would end up before.
 
This fixes several execution failures when running code compiled for a
scalable vector size of 128 on a target with vlen = 256 or higher.
The problem was only noticed there because the encoding for a reversed
[2 2]-element vector ("3 2 1 0") is { [1 2], [0 2], [1 4] }.
Some testcases that failed were:
 
vect-alias-check-18.c
vect-alias-check-1.F90
pr64365.c
 
On a 128-bit target, only the first two elements are used.  The
third element causing the complications only comes into effect at
vlen = 256.
 
With this patch the testsuite results are similar with vlen = 128 and
vlen = 256 (when built with -march=rv64gcv_zvl128b).
 
Regards
Robin
 
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (expand_const_vector):  Fix step
calculation.
(modulo_sel_indices): Also perform modulo for variable-length
constants.
(shuffle_series): Recognize series permutations.
(expand_vec_perm_const_1): Add shuffle_series.
---
gcc/config/riscv/riscv-v.cc | 66 +++--
1 file changed, 64 insertions(+), 2 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 9b99d0aca84..fd6ef0660a2 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1378,12 +1378,15 @@ expand_const_vector (rtx target, rtx src)
 { base0, base1, base1 + step, base1 + step * 2, ... }  */
  rtx base0 = builder.elt (0);
  rtx base1 = builder.elt (1);
-   rtx step = builder.elt (2);
+   rtx base2 = builder.elt (2);
+
+   scalar_mode elem_mode = GET_MODE_INNER (mode);
+   rtx step = simplify_binary_operation (MINUS, elem_mode, base2, base1);
+
  /* Step 1 - { base1, base1 + step, base1 + step * 2, ... }  */
  rtx tmp = gen_reg_rtx (mode);
  expand_vec_series (tmp, base1, step);
  /* Step 2 - { base0, base1, base1 + step, base1 + step * 2, ... }  */
-   scalar_mode elem_mode = GET_MODE_INNER (mode);
  if (!rtx_equal_p (base0, const0_rtx))
base0 = force_reg (elem_mode, base0);
@@ -3395,6 +3398,63 @@ shuffle_extract_and_slide1up_patterns (struct 
expand_vec_perm_d *d)
   return true;
}
+static bool
+shuffle_series (struct expand_vec_perm_d *d)
+{
+  if (!d->one_vector_p || d->perm.encoding ().npatterns () != 1)
+return false;
+
+  poly_int64 el1 = d->perm[0];
+  poly_int64 el2 = d->perm[1];
+  poly_int64 el3 = d->perm[2];
+
+  poly_int64 step1 = el2 - el1;
+  poly_int64 step2 = el3 - el2;
+
+  bool need_insert = false;
+  bool have_series = false;
+
+  /* Check for a full series.  */
+  if (known_ne (step1, 0) && d->perm.series_p (0, 1, el1, step1))
+have_series = true;
+
+  /* Check for a series starting at the second element.  */
+  else if (known_ne (step2, 0) && d->perm.series_p (1, 1, el2, step2))
+{
+  have_series = true;
+  need_insert = true;
+}
+
+  if (!have_series)
+return false;
+
+  /* Get a vector int-mode to be used for the permute selector.  */
+  machine_mode sel_mode = related_int_vector_mode (d->vmode).require ();
+  insn_code icode = optab_handler (vec_shl_insert_optab, sel_mode);
+
+  /* We need to be able to insert an element and shift the vector.  */
+  if (need_insert && icode == CODE_FOR_nothing)
+return false;
+
+  /* Success! */
+  if (d->testing_p)
+return true;
+
+  /* Create the series.  */
+  machine_mode eltmode = Pmode;
+  rtx series = gen_reg_rtx (sel_mode);
+  expand_vec_series (series, gen_int_mode (need_insert ? el2 : el1, eltmode),
+  gen_int_mode (need_insert ? step2 : step1, eltmode));
+
+  /* Insert the remaining element if necessary.  */
+  if (need_insert)
+emit_insn (GEN_FCN (icode) (series, series, gen_int_mode (el1, eltmode)));
+
+  emit_vlmax_gather_insn (d->target, d->op0, series);
+
+  return true;
+}
+
/* Recognize the pattern that can be shuffled by generic approach.  */
static bool
@@ -3475,6 +3535,8 @@ expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
return true;
  if (shuffle_extract_and_slide1up_patterns (d))
return true;
+   if (shuffle_series (d))
+ return true;

Re: [PATCH] c++: fix noexcept checking for trivial operations [PR96090]

2023-12-10 Thread Nathaniel Shead
On Sat, Dec 09, 2023 at 03:40:44PM -0500, Jason Merrill wrote:
> On 11/27/23 06:07, Nathaniel Shead wrote:
> > Ping for https://gcc.gnu.org/pipermail/gcc-patches/2023-October/634626.html.
> > 
> > I've been made aware since constructing this patch of CWG2820, which has
> > a proposed resolution that would change the result of the testcase
> > 'noexcept(yesthrow_t())' (and similarly for the library builtin), but as
> > it hasn't yet been accepted I think at least ensuring the builtin
> > matches the behaviour of the operator is probably still sensible.
> 
> OK.

Thanks. Looking at the patch again as I went to push it I noticed a
couple of small things.

I've added another paragraph to the commit message to mention CWG2820,
fixed a trailing whitespace I'd missed before, and replaced a call to
'TREE_VEC_LENGTH (from)' with 'len' that I'd missed initially.

Bootstrapped and regtested on x86_64-pc-linux-gnu, still OK for trunk?

-- >8 --

This patch stops eager folding of trivial operations (construction and
assignment) from occurring when checking for noexceptness. This was
previously done in PR c++/53025, but only for copy/move construction,
and the __is_nothrow_xible builtins did not receive the same treatment
when they were added.

To handle `is_nothrow_default_constructible`, the patch also ensures
that when no parameters are passed we do value initialisation instead of
just building the constructor call: in particular, value-initialisation
doesn't necessarily actually invoke the constructor for trivial default
constructors, and so we need to handle this case as well.

This is contrary to the proposed resolution of CWG2820; for now we just
ensure it matches the behaviour of the `noexcept` operator and create
testcases formalising this, and if that issue gets accepted we can
revisit.

PR c++/96090
PR c++/100470

gcc/cp/ChangeLog:

* call.cc (build_over_call): Prevent folding of trivial special
members when checking for noexcept.
* method.cc (constructible_expr): Perform value-initialisation
for empty parameter lists.
(is_nothrow_xible): Treat as noexcept operator.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept81.C: New test.
* g++.dg/ext/is_nothrow_constructible7.C: New test.
* g++.dg/ext/is_nothrow_constructible8.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/call.cc| 17 ++---
 gcc/cp/method.cc  | 19 --
 gcc/testsuite/g++.dg/cpp0x/noexcept81.C   | 36 +++
 .../g++.dg/ext/is_nothrow_constructible7.C| 20 ++
 .../g++.dg/ext/is_nothrow_constructible8.C| 63 +++
 5 files changed, 141 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept81.C
 create mode 100644 gcc/testsuite/g++.dg/ext/is_nothrow_constructible7.C
 create mode 100644 gcc/testsuite/g++.dg/ext/is_nothrow_constructible8.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index c7efc5b077a..4f0abf8e93f 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -10247,15 +10247,16 @@ build_over_call (struct z_candidate *cand, int flags, 
tsubst_flags_t complain)
   /* Avoid actually calling copy constructors and copy assignment operators,
  if possible.  */
 
-  if (! flag_elide_constructors && !force_elide)
+  if (!force_elide
+  && (!flag_elide_constructors
+ /* It's unsafe to elide the operation when handling
+a noexcept-expression, it may evaluate to the wrong
+value (c++/53025, c++/96090).  */
+ || cp_noexcept_operand != 0))
 /* Do things the hard way.  */;
-  else if (cand->num_convs == 1 
-   && (DECL_COPY_CONSTRUCTOR_P (fn) 
-   || DECL_MOVE_CONSTRUCTOR_P (fn))
-  /* It's unsafe to elide the constructor when handling
- a noexcept-expression, it may evaluate to the wrong
- value (c++/53025).  */
-  && (force_elide || cp_noexcept_operand == 0))
+  else if (cand->num_convs == 1
+  && (DECL_COPY_CONSTRUCTOR_P (fn)
+  || DECL_MOVE_CONSTRUCTOR_P (fn)))
 {
   tree targ;
   tree arg = argarray[num_artificial_parms_for (fn)];
diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
index a70dd5d6adc..26e6eb79946 100644
--- a/gcc/cp/method.cc
+++ b/gcc/cp/method.cc
@@ -2091,6 +2091,7 @@ constructible_expr (tree to, tree from)
 {
   tree expr;
   cp_unevaluated cp_uneval_guard;
+  const int len = TREE_VEC_LENGTH (from);
   if (CLASS_TYPE_P (to))
 {
   tree ctype = to;
@@ -2098,11 +2099,16 @@ constructible_expr (tree to, tree from)
   if (!TYPE_REF_P (to))
to = cp_build_reference_type (to, /*rval*/false);
   tree ob = build_stub_object (to);
-  vec_alloc (args, TREE_VEC_LENGTH (from));
-  for (tree arg : tree_vec_range (from))
-   args->quick_push (build_stub_object (arg));
-  expr = build_special_member_call (ob, complete_ctor_identifier, &args,
-   

[Patch, rs6000] Correct definition of macro of fixed point efficient unaligned

2023-12-10 Thread HAO CHEN GUI
Hi,
  The patch corrects the definition of
TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and changes its name to a more
comprehensible one.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Correct definition of macro of fixed point efficient unaligned

Macro TARGET_EFFICIENT_OVERLAPPING_UNALIGNED is used in rs6000-string.cc to
guard whether a platform is efficient at unaligned fixed-point loads/stores.
It's originally defined by TARGET_EFFICIENT_UNALIGNED_VSX, which is enabled
from P8 on and can be disabled by the -mno-vsx option.  So the definition is
wrong.  This patch corrects the problem and defines it as "!STRICT_ALIGNMENT",
which is true on P7 BE and on P8 and above.

gcc/
* config/rs6000/rs6000.h (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED):
Rename to...
(TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT): ...this, set it to
!STRICT_ALIGNMENT.
* config/rs6000/rs6000-string.cc (select_block_compare_mode):
Replace TARGET_EFFICIENT_OVERLAPPING_UNALIGNED with
TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT.
(select_block_compare_mode): Likewise.
(expand_block_compare_gpr): Likewise.
(expand_block_compare): Likewise.
(expand_strncmp_gpr_sequence): Likewise.

gcc/testsuite/
* gcc.target/powerpc/target_efficient_unaligned_fixedpoint-1.c: New.
* gcc.target/powerpc/target_efficient_unaligned_fixedpoint-2.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index 44a946cd453..d4030854b2a 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -305,7 +305,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset,
   else if (bytes == GET_MODE_SIZE (QImode))
 return QImode;
   else if (bytes < GET_MODE_SIZE (SImode)
-  && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  && TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT
   && offset >= GET_MODE_SIZE (SImode) - bytes)
 /* This matches the case were we have SImode and 3 bytes
and offset >= 1 and permits us to move back one and overlap
@@ -313,7 +313,7 @@ select_block_compare_mode (unsigned HOST_WIDE_INT offset,
unwanted bytes off of the input.  */
 return SImode;
   else if (word_mode_ok && bytes < UNITS_PER_WORD
-  && TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  && TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT
   && offset >= UNITS_PER_WORD-bytes)
 /* Similarly, if we can use DImode it will get matched here and
can do an overlapping read that ends at the end of the block.  */
@@ -1749,7 +1749,7 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, 
unsigned int base_align,
   load_mode_size = GET_MODE_SIZE (load_mode);
   if (bytes >= load_mode_size)
cmp_bytes = load_mode_size;
-  else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+  else if (TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT)
{
  /* Move this load back so it doesn't go past the end.
 P8/P9 can do this efficiently.  */
@@ -2026,7 +2026,7 @@ expand_block_compare (rtx operands[])
   /* The code generated for p7 and older is not faster than glibc
  memcmp if alignment is small and length is not short, so bail
  out to avoid those conditions.  */
-  if (!TARGET_EFFICIENT_OVERLAPPING_UNALIGNED
+  if (!TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT
   && ((base_align == 1 && bytes > 16)
  || (base_align == 2 && bytes > 32)))
 return false;
@@ -2168,7 +2168,7 @@ expand_strncmp_gpr_sequence (unsigned HOST_WIDE_INT 
bytes_to_compare,
   load_mode_size = GET_MODE_SIZE (load_mode);
   if (bytes_to_compare >= load_mode_size)
cmp_bytes = load_mode_size;
-  else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+  else if (TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT)
{
  /* Move this load back so it doesn't go past the end.
 P8/P9 can do this efficiently.  */
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 326c45221e9..2f3a82942c1 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -483,9 +483,9 @@ extern int rs6000_vector_align[];
 #define TARGET_NO_SF_SUBREGTARGET_DIRECT_MOVE_64BIT
 #define TARGET_ALLOW_SF_SUBREG (!TARGET_DIRECT_MOVE_64BIT)

-/* This wants to be set for p8 and newer.  On p7, overlapping unaligned
-   loads are slow. */
-#define TARGET_EFFICIENT_OVERLAPPING_UNALIGNED TARGET_EFFICIENT_UNALIGNED_VSX
+/* Like TARGET_EFFICIENT_UNALIGNED_VSX, indicates if unaligned fixed point
+   loads/stores are efficient.  */
+#define TARGET_EFFICIENT_UNALIGNED_FIXEDPOINT (!STRICT_ALIGNMENT)

 /* Byte/char syncs were added as phased in for ISA 2.06B, but are not present
in power7, so conditionalize them on p8 features.  TImode syncs need quad
diff --git 
a/gcc/testsuite/gcc.target/powerpc/target_efficient_unaligned_fixedpoint-1.c 
b/gcc/testsuite/gcc.target/powerpc/targ

Re: [PATCH] extend.texi: Mark builtin arguments with @var{...}

2023-12-10 Thread Sandra Loosemore

On 12/4/23 00:36, Jakub Jelinek wrote:

On Fri, Dec 01, 2023 at 10:43:57AM -0700, Sandra Loosemore wrote:

On 12/1/23 10:33, Jakub Jelinek wrote:

Shall we tweak that somehow?  If the argument names are unimportant, perhaps
it is fine to leave that out, but shouldn't we always use @var{...} around
the parameter names when specified?


Yup.  The Texinfo manual says:  "When using @deftypefn command and
variations, you should mark parameter names with @var to distinguish these
from data type names, keywords, and other parts of the literal syntax of the
programming language."


Here is a patch which does that (but not adding types to where they were
missing, that will be harder to search for).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


This is mostly OK, but I noticed some things you missed:


@@ -12946,13 +12946,13 @@ and @code{__ATOMIC_CONSUME}.
  
  @enddefbuiltin
  
-@defbuiltin{void __atomic_load (@var{type} *ptr, @var{type} *ret, int memorder)}

+@defbuiltin{void __atomic_load (@var{type} *@var{ptr}, @var{type} *ret, int 
@var{memorder})}


@var markup on "ret" here.


-@defbuiltin{void __atomic_store_n (@var{type} *ptr, @var{type} val, int 
memorder)}
+@defbuiltin{void __atomic_store_n (@var{type} *@var{ptr}, @var{type} val, int 
@var{memorder})}


And on "val" here...


-@defbuiltin{void __atomic_store (@var{type} *ptr, @var{type} *val, int 
memorder)}
+@defbuiltin{void __atomic_store (@var{type} *@var{ptr}, @var{type} *val, int 
@var{memorder})}


...and here...


-@defbuiltin{@var{type} __atomic_exchange_n (@var{type} *ptr, @var{type} val, 
int memorder)}
+@defbuiltin{@var{type} __atomic_exchange_n (@var{type} *@var{ptr}, @var{type} 
val, int @var{memorder})}


...and here.


-@defbuiltin{void __atomic_exchange (@var{type} *ptr, @var{type} *val, 
@var{type} *ret, int memorder)}
+@defbuiltin{void __atomic_exchange (@var{type} *@var{ptr}, @var{type} *val, 
@var{type} *ret, int @var{memorder})}


And both "val" and "ret" here.

OK with those things fixed.

-Sandra




RE: Re: [PATCH] RISC-V: Add vectorized strcmp.

2023-12-10 Thread Li, Pan2
FYI. I have the same failures as juzhe mentioned, with the emulator qemu
version qemu-riscv64 version 8.1.93 (v8.2.0-rc3). The entire log looks like
the below:

Executing on host: 
/home/box/panli/riscv-gnu-toolchain/build-gcc-newlib-stage2/gcc/xgcc 
-B/home/box/panli/riscv-gnu-toolchain/build-gcc-newlib-stage2/gcc/  
/home/box/panli/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
  -march=rv64gcv -mabi=lp64d -mcmodel=medlow --param=riscv-autovec-lmul=m1 
--param=riscv-autovec-preference=fixed-vlmax   -fdiagnostics-plain-output   
-ftree-vectorize -O3 --param riscv-autovec-lmul=m1 -O3 -minline-strcmp   
-lm  -o ./strcmp-run.exe(timeout = 600)
spawn -ignore SIGHUP 
/home/box/panli/riscv-gnu-toolchain/build-gcc-newlib-stage2/gcc/xgcc 
-B/home/box/panli/riscv-gnu-toolchain/build-gcc-newlib-stage2/gcc/ 
/home/box/panli/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c
 -march=rv64gcv -mabi=lp64d -mcmodel=medlow --param=riscv-autovec-lmul=m1 
--param=riscv-autovec-preference=fixed-vlmax -fdiagnostics-plain-output 
-ftree-vectorize -O3 --param riscv-autovec-lmul=m1 -O3 -minline-strcmp -lm -o 
./strcmp-run.exe^M
PASS: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c (test for excess errors)
spawn riscv64-unknown-elf-run ./strcmp-run.exe^M
qemu-riscv64: warning: CPU property 'Zve32f' is deprecated. Please use 'zve32f' 
instead^M
qemu-riscv64: warning: CPU property 'Zve64f' is deprecated. Please use 'zve64f' 
instead^M
FAIL: gcc.target/riscv/rvv/autovec/builtin/strcmp-run.c execution test

Pan

From: 钟居哲 
Sent: Saturday, December 9, 2023 10:18 PM
To: rdapp.gcc ; gcc-patches ; 
palmer ; kito.cheng ; Jeff Law 

Cc: rdapp.gcc 
Subject: Re: Re: [PATCH] RISC-V: Add vectorized strcmp.

I didn't use any special configuration:

--with-arch=rv64gcv_zvl256b --with-abi=lp64d --test --jobs=64 --with-sim=qemu 
--enable-gcc-checking=yes,assert,extra,rtlflag,rtl,gimple


juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-12-09 22:07
To: 钟居哲; 
gcc-patches; palmer; 
kito.cheng; Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Add vectorized strcmp.
> rv64gcv

With -minline-strcmp I assume?

Regards
Robin



[PATCH][v3] RISC-V: Add avail interface into function_group_info

2023-12-10 Thread Feng Wang
Patch v3: Fix typo and remove the modification of rvv.exp.
Patch v2: Using variadic macro and add the dependency into t-riscv.

In order to add other vector extensions, this patch adds
unsigned int (*avail) (void) to function_group_info to determine
whether to register an intrinsic based on the ISA info.
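
A hedged sketch of the shape of the change (field and function names are
illustrative, not the actual declarations):

struct function_group_info_sketch
{
  const char *name;
  unsigned int (*avail) (void);  /* nonzero if the ISA supports the group */
};

static unsigned int zvbb_avail (void) { return 1; /* e.g. TARGET_ZVBB */ }

static const function_group_info_sketch groups[] = {
  { "vandn", zvbb_avail },
};

/* Registration then becomes:
   if (groups[i].avail ()) register_group (groups[i]);  */
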
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-functions.def (DEF_RVV_FUNCTION):
Add AVAIL argument.
(read_vl): Using AVAIL argument default value.
(vlenb): Ditto.
(vsetvl): Ditto.
(vsetvlmax): Ditto.
(vle): Ditto.
(vse): Ditto.
(vlm): Ditto.
(vsm): Ditto.
(vlse): Ditto.
(vsse): Ditto.
(vluxei8): Ditto.
(vluxei16): Ditto.
(vluxei32): Ditto.
(vluxei64): Ditto.
(vloxei8): Ditto.
(vloxei16): Ditto.
(vloxei32): Ditto.
(vloxei64): Ditto.
(vsuxei8): Ditto.
(vsuxei16): Ditto.
(vsuxei32): Ditto.
(vsuxei64): Ditto.
(vsoxei8): Ditto.
(vsoxei16): Ditto.
(vsoxei32): Ditto.
(vsoxei64): Ditto.
(vleff): Ditto.
(vadd): Ditto.
(vsub): Ditto.
(vrsub): Ditto.
(vneg): Ditto.
(vwaddu): Ditto.
(vwsubu): Ditto.
(vwadd): Ditto.
(vwsub): Ditto.
(vwcvt_x): Ditto.
(vwcvtu_x): Ditto.
(vzext): Ditto.
(vsext): Ditto.
(vadc): Ditto.
(vmadc): Ditto.
(vsbc): Ditto.
(vmsbc): Ditto.
(vand): Ditto.
(vor): Ditto.
(vxor): Ditto.
(vnot): Ditto.
(vsll): Ditto.
(vsra): Ditto.
(vsrl): Ditto.
(vnsrl): Ditto.
(vnsra): Ditto.
(vncvt_x): Ditto.
(vmseq): Ditto.
(vmsne): Ditto.
(vmsltu): Ditto.
(vmslt): Ditto.
(vmsleu): Ditto.
(vmsle): Ditto.
(vmsgtu): Ditto.
(vmsgt): Ditto.
(vmsgeu): Ditto.
(vmsge): Ditto.
(vminu): Ditto.
(vmin): Ditto.
(vmaxu): Ditto.
(vmax): Ditto.
(vmul): Ditto.
(vmulh): Ditto.
(vmulhu): Ditto.
(vmulhsu): Ditto.
(vdivu): Ditto.
(vdiv): Ditto.
(vremu): Ditto.
(vrem): Ditto.
(vwmul): Ditto.
(vwmulu): Ditto.
(vwmulsu): Ditto.
(vmacc): Ditto.
(vnmsac): Ditto.
(vmadd): Ditto.
(vnmsub): Ditto.
(vwmaccu): Ditto.
(vwmacc): Ditto.
(vwmaccsu): Ditto.
(vwmaccus): Ditto.
(vmerge): Ditto.
(vmv_v): Ditto.
(vsaddu): Ditto.
(vsadd): Ditto.
(vssubu): Ditto.
(vssub): Ditto.
(vaaddu): Ditto.
(vaadd): Ditto.
(vasubu): Ditto.
(vasub): Ditto.
(vsmul): Ditto.
(vssrl): Ditto.
(vssra): Ditto.
(vnclipu): Ditto.
(vnclip): Ditto.
(vfadd): Ditto.
(vfsub): Ditto.
(vfrsub): Ditto.
(vfadd_frm): Ditto.
(vfsub_frm): Ditto.
(vfrsub_frm): Ditto.
(vfwadd): Ditto.
(vfwsub): Ditto.
(vfwadd_frm): Ditto.
(vfwsub_frm): Ditto.
(vfmul): Ditto.
(vfdiv): Ditto.
(vfrdiv): Ditto.
(vfmul_frm): Ditto.
(vfdiv_frm): Ditto.
(vfrdiv_frm): Ditto.
(vfwmul): Ditto.
(vfwmul_frm): Ditto.
(vfmacc): Ditto.
(vfnmsac): Ditto.
(vfmadd): Ditto.
(vfnmsub): Ditto.
(vfnmacc): Ditto.
(vfmsac): Ditto.
(vfnmadd): Ditto.
(vfmsub): Ditto.
(vfmacc_frm): Ditto.
(vfnmacc_frm): Ditto.
(vfmsac_frm): Ditto.
(vfnmsac_frm): Ditto.
(vfmadd_frm): Ditto.
(vfnmadd_frm): Ditto.
(vfmsub_frm): Ditto.
(vfnmsub_frm): Ditto.
(vfwmacc): Ditto.
(vfwnmacc): Ditto.
(vfwmsac): Ditto.
(vfwnmsac): Ditto.
(vfwmacc_frm): Ditto.
(vfwnmacc_frm): Ditto.
(vfwmsac_frm): Ditto.
(vfwnmsac_frm): Ditto.
(vfsqrt): Ditto.
(vfsqrt_frm): Ditto.
(vfrsqrt7): Ditto.
(vfrec7): Ditto.
(vfrec7_frm): Ditto.
(vfmin): Ditto.
(vfmax): Ditto.
(vfsgnj): Ditto.
(vfsgnjn): Ditto.
(vfsgnjx): Ditto.
(vfneg): Ditto.
(vfabs): Ditto.
(vmfeq): Ditto.
(vmfne): Ditto.
(vmflt): Ditto.
(vmfle): Ditto.
(vmfgt): Ditto.
(vmfge): Ditto.
(vfclass): Ditto.
(vfmerge): Ditto.
(vfmv_v): Ditto.
(vfcvt_x): Ditto.
(vfcvt_xu): Ditto.
(vfcvt_rtz_x): Ditto.
(vfcvt_rtz_xu): Ditto.
(vfcvt_f): Ditto.
(vfcvt_x_frm): Ditto.
(vfcvt_xu_frm): Ditto.
(vfcvt_f_frm): Ditto.
(vfwcvt_x): Ditto.
(vfwcvt_xu): Ditto.
(vfwcvt_rtz_x): Ditto.
(vfwcv

Re: [PATCH 0/4] v2 of Option handling: add documentation URLs

2023-12-10 Thread Mark Wielaard
Hi David,

On Fri, Dec 08, 2023 at 06:35:56PM -0500, David Malcolm wrote:
> On Tue, 2023-11-21 at 23:43 +, Joseph Myers wrote:
> > On Tue, 21 Nov 2023, Tobias Burnus wrote:
> > 
> > > On 21.11.23 14:57, David Malcolm wrote:
> > > > On Tue, 2023-11-21 at 02:09 +0100, Hans-Peter Nilsson wrote:
> > > > > Sorry for barging in though I did try finding the relevant
> > > > > discussion, but is committing this generated stuff necessary?
> > > > > Is it because we don't want to depend on Python being
> > > > > present at build time?
> > > > Partly, yes, [...]
> > > 
> > > I wonder how to ensure that this remains up to date. Should there
> > > be an item at
> > > 
> > > https://gcc.gnu.org/branching.html and/or
> > > https://gcc.gnu.org/releasing.html similar to the .pot generation?
> >
> > My suggestion earlier in the discussion was that it should be
> > added to the post-commit CI discussed starting at
> >  (I
> > think that CI is now in operation).  These are generated files
> > that ought to be kept up to date with each commit that affects
> > .opt files, unlike the .pot files where the expectation is that
> > they should be up to date for releases and updated from time to
> > time at other times for submission to the TP.
> > 
> I had a go at scripting the testing of this, but I am terrible at shell
> scripts (maybe I should use Python?).  Here's what I have so far:
> 
> $ cat contrib/regenerate-index-urls.sh
> 
> set -x
> 
> SRC_DIR=$1
> BUILD_DIR=$2
> NUM_JOBS=$3
> 
> # FIXME: error-checking!
> 
> mkdir -p $BUILD_DIR || exit 1
> cd $BUILD_DIR
> $SRC_DIR/configure --disable-bootstrap --enable-languages=c,d,fortran || exit 
> 2
> make html-gcc -j$NUM_JOBS || exit 3
> cd gcc || exit 4
> make regenerate-opt-urls || exit 5
> cd $SRC_DIR
> (git diff --exit-code > /dev/null) || { echo "regenerate-opt-urls needs to be
> run and the results committed"; exit 6; }
> 
> # e.g.
> #  time bash contrib/regenerate-index-urls.sh $(pwd) $(pwd)/../build-ci 64
> 
> This takes about 100 seconds of wallclock on my 64-core box (mostly
> configuring gcc, which as well as the usual sequence of unparallelized
> tests seems to require building libiberty and lto-plugin).  Is that
> something we want to do on every commit?  Is implementing the CI a
> blocker for getting the patches in? (if so, I'll likely need some help)

The CI builders don't have 64 cores, but a couple of hundred seconds
shouldn't be an issue to do on each commit (OSUOSL just got us a
second x86_64 container builder for larger jobs). The above can easily
be added to the existing gcc-autoregen builder:
https://builder.sourceware.org/buildbot/#/builders/gcc-autoregen
https://sourceware.org/cgit/builder/tree/builder/master.cfg#n3453

Once your patch is in please feel free to sent an email to
build...@sourceware.org
https://sourceware.org/mailman/listinfo/buildbot
And we'll add the above build steps and update the autotools
Containerfile to include the fortran (gfortran?) and d (gdc?) build
dependencies.

Cheers,

Mark


Re: [PATCH] -finline-stringops: don't assume ptr_mode ptr in memset [PR112804]

2023-12-10 Thread Andrew Pinski
On Sun, Dec 10, 2023 at 2:38 PM Alexandre Oliva  wrote:
>
> On Dec 10, 2023, Jeff Law  wrote:
>
> > On 12/8/23 19:25, Alexandre Oliva wrote:
> >> On aarch64 -milp32, and presumably on other such targets, ptr can be
> >> in a different mode than ptr_mode in the testcase.  Cope with it.
> >> Regstrapped on x86_64-linux-gnu, also tested the new test on
> >> aarch64-elf.  Ok to install?
> >>
> >> for  gcc/ChangeLog
> >> PR target/112804
> >> * builtins.cc (try_store_by_multiple_pieces): Use ptr's mode
> >> for the increment.
> >> for  gcc/testsuite/ChangeLog
> >> PR target/112804
> >> * gcc.target/aarch64/inline-mem-set-pr112804.c: New.
> > Hmmm.  I'd like a little more information here I wouldn't expect those
> > modes to differ even for ILP32 on a 64bit target.  Are we just
> > papering over a problem elsewhere?
>
> Possibly.  I've run into Pmode/ptr_mode issues before, so I didn't give
> it much thought.
>
>
> Despite ptr_mode's being SImode, and DEST being a 32-bit pointer,
> expand_builtin_memset_args gets from get_memory_rtx a (MEM:BLK (REG:DI))
>
> The expansion of dest to RTL is convoluted.  It is first computed in:
>
> (set (reg:DI 102)
> (plus:DI (reg/f:DI 95 virtual-stack-vars)
> (const_int -264 [0xfffffffffffffef8])))
>
> in DImode because virtual-stack-vars is DImode, then it's
> SUBREG-narrowed to SImode in expand_expr_addr_expr_1 because of
> new_tmode (== pointer_mode), and that's the address stored in addr and
> passed to memory_address in get_memory_rtx.
>
> But then something unexpected happens: memory_address_addr_space extends
> it back to DImode because address_mode = targetm.addr_space.address_mode
> () = Pmode:
>
> (set (reg:DI 103)
> (zero_extend:DI (subreg:SI (reg:DI 102) 0)))
>
> and it's a MEM with (reg:DI 103) as the address that gets passed to
> clear_storage_hints, and then try_store_by_multiple_pieces, where the
> address is copied from the MEM into ptr, and then, in the update_needed
> case I modified in the proposed patch, it gets incremented.
>
>
> So it looks like my hunch that this was a Pmode/ptr_mode issue was
> correct, and I don't see anything particularly fishy in the conversions
> above.  Do you?

That looks correct.
Just for info: on AARCH64 ILP32, even though the exposed pointer size is
32 bits, addressing for loads is always done in 64 bits.
That is why there is a difference between Pmode and ptr_mode here.
So Pmode still needs to be 64 bits due to the addressing requirement, while
ptr_mode is 32 bits.

x32 on x86_64 implements 2 different modes, one where Pmode==ptr_mode
and one where they differ. The default for x32 is Pmode==ptr_mode
IIRC.  This is because x86 has load instructions which handle
addresses that are 32 bits (zero-extended IIRC).
IA64 HPUX has a similar mode to AARCH64 ILP32, but I doubt anyone tests
that anymore or that it is even supported; it is also where most of the
original testing for Pmode!=ptr_mode happened.
PowerPC64 has a mode where Pmode==ptr_mode==SImode but
word_mode==DImode; this kind of works on Linux, but if you trap into
the kernel only the lower 32 bits are saved; it does work fully on
powerpc Darwin (Mac OS X) though.
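
(A hedged C++ rendering of the zero-extension in the RTL quoted above; the
function name is illustrative:)

#include <cstdint>

/* ptr_mode == SImode: the pointer value is 32 bits.  Pmode == DImode:
   the address the load/store unit sees is its zero-extension, i.e.
   (zero_extend:DI (subreg:SI (reg:DI ...) 0)).  */
uint64_t
form_address (uint32_t ptr32)
{
  return static_cast<uint64_t> (ptr32);
}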

Thanks,
Andrew Pinski

>
> --
> Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive


Re: [PATCH] -finline-stringops: don't assume ptr_mode ptr in memset [PR112804]

2023-12-10 Thread Alexandre Oliva
On Dec 10, 2023, Jeff Law  wrote:

> On 12/8/23 19:25, Alexandre Oliva wrote:
>> On aarch64 -milp32, and presumably on other such targets, ptr can be
>> in a different mode than ptr_mode in the testcase.  Cope with it.
>> Regstrapped on x86_64-linux-gnu, also tested the new test on
>> aarch64-elf.  Ok to install?
>> 
>> for  gcc/ChangeLog
>> PR target/112804
>> * builtins.cc (try_store_by_multiple_pieces): Use ptr's mode
>> for the increment.
>> for  gcc/testsuite/ChangeLog
>> PR target/112804
>> * gcc.target/aarch64/inline-mem-set-pr112804.c: New.
> Hmmm.  I'd like a little more information here.  I wouldn't expect those
> modes to differ even for ILP32 on a 64bit target.  Are we just
> papering over a problem elsewhere?

Possibly.  I've run into Pmode/ptr_mode issues before, so I didn't give
it much thought.


Despite ptr_mode's being SImode, and DEST being a 32-bit pointer,
expand_builtin_memset_args gets from get_memory_rtx a (MEM:BLK (REG:DI))

The expansion of dest to RTL is convoluted.  It is first computed in:

(set (reg:DI 102)
(plus:DI (reg/f:DI 95 virtual-stack-vars)
(const_int -264 [0xfffffffffffffef8])))

in DImode because virtual-stack-vars is DImode, then it's
SUBREG-narrowed to SImode in expand_expr_addr_expr_1 because of
new_tmode (== pointer_mode), and that's the address stored in addr and
passed to memory_address in get_memory_rtx.

But then something unexpected happens: memory_address_addr_space extends
it back to DImode because address_mode = targetm.addr_space.address_mode
() = Pmode:

(set (reg:DI 103)
(zero_extend:DI (subreg:SI (reg:DI 102) 0)))

and it's a MEM with (reg:DI 103) as the address that gets passed to
clear_storage_hints, and then try_store_by_multiple_pieces, where the
address is copied from the MEM into ptr, and then, in the update_needed
case I modified in the proposed patch, it gets incremented.
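
Concretely, the fix amounts to something like this sketch (illustrative,
not the literal hunk; `incr' stands for the step value):

  /* In try_store_by_multiple_pieces: PTR was copied from the MEM's
     address, so on Pmode != ptr_mode targets it can be wider than
     ptr_mode.  Take the increment's mode from PTR itself.  */
  emit_insn (gen_add2_insn (ptr, gen_int_mode (incr, GET_MODE (ptr))));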


So it looks like my hunch that this was a Pmode/ptr_mode issue was
correct, and I don't see anything particularly fishy in the conversions
above.  Do you?

-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


Re: [PATCH] Add some new DW_IDX_* constants

2023-12-10 Thread Tom Tromey
> "Jakub" == Jakub Jelinek  writes:

Jakub> LGTM for GCC (but it needs a ChangeLog entry).

Oops, yeah -- I am out of the habit of writing those.
I'll add one before I push this.

Tom


Re: [PATCH 2/5] [ifcvt] optimize x=c ? (y shift_op z):y by RISC-V Zicond like insns

2023-12-10 Thread Jeff Law




On 12/5/23 01:12, Fei Gao wrote:

op=[ASHIFT, ASHIFTRT, LSHIFTRT, ROTATE, ROTATERT]

Conditional op, if zero
rd = (rc == 0) ? (rs1 op rs2) : rs1
-->
czero.nez rd, rs2, rc
op rd, rs1, rd

Conditional op, if non-zero
rd = (rc != 0) ? (rs1 op rs2) : rs1
-->
czero.eqz rd, rs2, rc
op rd, rs1, rd

Co-authored-by: Xiao Zeng

gcc/ChangeLog:

* ifcvt.cc (noce_cond_zero_binary_op_supported): add support for shift 
like op.
 (get_base_reg): add support for subreg to handle shift amount operand.
 (noce_bbs_ok_for_cond_zero_arith): to replace shift amount operand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond_ifcvt_opt.c: add TCs for shift like op.
So I removed the SUBREG handling code, which makes this patch merely an
addition of the shift/rotate ops, which trivially work just like PLUS,
MINUS, IOR, XOR (by conditionally zero-ing the shift count); tested on
x86 and pushed it to the trunk.
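
For a concrete flavor of what now gets caught (an illustrative source
sketch, not one of the committed testcases):

  /* rd = (rc == 0) ? (rs1 << rs2) : rs1, as in the table above.  */
  long cond_sll (long rs1, long rs2, long rc)
  {
    return rc == 0 ? rs1 << rs2 : rs1;
  }

With Zicond this can become a czero.nez of the shift amount followed by
an unconditional sll, since shifting by zero leaves rs1 unchanged.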


As I noted before while I think handling SUBREGs is important, now is 
not the time to be adding that support.


Thanks!

jeff


Re: [PATCH] Add some new DW_IDX_* constants

2023-12-10 Thread Jakub Jelinek
On Sat, Dec 09, 2023 at 01:56:53PM -0700, Tom Tromey wrote:
> I've reimplemented the .debug_names code in GDB -- it was quite far
> from being correct, and the new implementation is much closer to what
> is specified by DWARF.
> 
> However, the new writer in GDB needs to emit some symbol properties,
> so that the reader can be fully functional.  This patch adds a few new
> DW_IDX_* constants, and tries to document the existing extensions as
> well.  (My patch series add more documentation of these to the GDB
> manual as well.)

LGTM for GCC (but it needs a ChangeLog entry).

> diff --git a/include/dwarf2.def b/include/dwarf2.def
> index 7ab3ee611fd4..75b75d901884 100644
> --- a/include/dwarf2.def
> +++ b/include/dwarf2.def
> @@ -802,8 +802,17 @@ DW_IDX (DW_IDX_parent, 4)
>  DW_IDX (DW_IDX_type_hash, 5)
>  DW_IDX_DUP (DW_IDX_lo_user, 0x2000)
>  DW_IDX (DW_IDX_hi_user, 0x3fff)
> +/* Internal linkage.  A flag.  */
>  DW_IDX (DW_IDX_GNU_internal, 0x2000)
> +/* External linkage.  A flag.  Note that gdb no longer generates this;
> +   the default is to assume external linkage.  */
>  DW_IDX (DW_IDX_GNU_external, 0x2001)
> +/* This entry is the program's entry point.  A flag.  */
> +DW_IDX (DW_IDX_GNU_main, 0x2002)
> +/* Language for this entry.  A DW_LANG_* value.  */
> +DW_IDX (DW_IDX_GNU_language, 0x2003)
> +/* This entry is a linkage name.  A flag.  */
> +DW_IDX (DW_IDX_GNU_linkage_name, 0x2004)
>  DW_END_IDX
>  
>  /* DWARF5 Unit type header encodings  */
> -- 
> 2.43.0

Jakub



[PATCH 1/2] analyzer: Remove check of unsigned_char in maybe_undo_optimize_bit_field_compare.

2023-12-10 Thread Andrew Pinski
From: Andrew Pinski 

The check for the type seems unnecessary and gets in the way sometimes.
Also with a patch I am working on for match.pd, it causes a failure to happen.
Before my patch the IR was:
  _1 = BIT_FIELD_REF ;
  _2 = _1 & 1;
  _3 = _2 != 0;
  _4 = (int) _3;
  __analyzer_eval (_4);

Where _2 was an unsigned char type.
And After my patch we have:
  _1 = BIT_FIELD_REF ;
  _2 = (int) _1;
  _3 = _2 & 1;
  __analyzer_eval (_3);

But in this case, the BIT_AND_EXPR is in an int type.
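
For reference, a source-level sketch of the kind of testcase involved
(illustrative; the analyzer tests declare __analyzer_eval themselves):

  extern void __analyzer_eval (int);
  struct bits { unsigned a : 1; unsigned b : 7; };
  void test (struct bits x)
  {
    __analyzer_eval (x.a == 1); /* the bit-field compare that gets folded
                                   into a masked byte read (BIT_FIELD_REF
                                   plus a mask of 1)  */
  }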

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/analyzer/ChangeLog:

* region-model-manager.cc (maybe_undo_optimize_bit_field_compare): 
Remove
the check for type being unsigned_char_type_node.
---
 gcc/analyzer/region-model-manager.cc | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/analyzer/region-model-manager.cc 
b/gcc/analyzer/region-model-manager.cc
index b631bcb04d0..26c34e38875 100644
--- a/gcc/analyzer/region-model-manager.cc
+++ b/gcc/analyzer/region-model-manager.cc
@@ -596,9 +596,6 @@ maybe_undo_optimize_bit_field_compare (tree type,
   tree cst,
   const svalue *arg1)
 {
-  if (type != unsigned_char_type_node)
-return NULL;
-
   const binding_map &map = compound_sval->get_map ();
   unsigned HOST_WIDE_INT mask = TREE_INT_CST_LOW (cst);
   /* If "mask" is a contiguous range of set bits, see if the
-- 
2.39.3



[PATCHv2 0/2] Fix PR 111972: Missed vectorization due to phiopt changes

2023-12-10 Thread Andrew Pinski
This patch set fixes PR 111972 and the fallout from it.

The first patch is a fix for -fanalyzer which I had implemented with
a different version of the 2nd patch while I was working at Marvell.

And the 2nd patch fixes the issue by having the following as
canonical forms:
* `a ^ 1` is the canonical form of `(convert_back)(zero_one == 0)` (and 
`(convert_back)(zero_one != 1)`).
* `(convert)a` is the canonical form of `(convert)(zero_one != 0)` (and 
`(convert)(zero_one == 1)`).

Andrew Pinski (2):
  Remove check of unsigned_char in
maybe_undo_optimize_bit_field_compare.
  MATCH: (convert)(zero_one !=/== 0/1) for outer type and zero_one type
are the same

 gcc/analyzer/region-model-manager.cc   |  3 --
 gcc/fold-const.cc  | 27 -
 gcc/match.pd   | 16 ++
 gcc/testsuite/gcc.dg/fold-even-1.c | 32 
 gcc/testsuite/gcc.dg/tree-ssa/pr110637-1.c | 10 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr110637-2.c | 13 +
 gcc/testsuite/gcc.dg/tree-ssa/pr110637-3.c | 14 +
 gcc/testsuite/gcc.dg/tree-ssa/pr111972-1.c | 34 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr69270.c|  4 +--
 gcc/testsuite/gcc.target/i386/pr110790-2.c | 16 --
 10 files changed, 103 insertions(+), 66 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.dg/fold-even-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr110637-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr110637-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr110637-3.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr111972-1.c

-- 
2.39.3



[PATCHv2 2/2] MATCH: (convert)(zero_one !=/== 0/1) for outer type and zero_one type are the same

2023-12-10 Thread Andrew Pinski
When I moved two_value to match.pd, I removed the check for the {0,+-1}
as I had placed it after the {0,+-1} case for cond in match.pd.
In the case of {0,+-1} and non-boolean, before we would optimize those
cases to just `(convert)a`, but after we would get `(convert)(a != 0)`,
which was not simplified down to `(convert)a`.
So this adds a pattern to match `(convert)(zeroone != 0)` and simplify
to `(convert)zeroone`.

Also this optimizes (convert)(zeroone == 0) into (zeroone^1) if the
types match, removing the opposite transformation from fold.
The opposite transformation was added with
https://gcc.gnu.org/pipermail/gcc-patches/2006-February/190514.html
It is no longer considered the canonicalization either, even VRP will
transform it back into `(~a) & 1` so removing it is a good idea.
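
To make the two directions concrete (illustrative source, with the
GIMPLE-level transform in mind):

  int f (int a)
  {
    a &= 1;        /* a is zero_one_valued_p here */
    return a == 0; /* now canonicalized to a ^ 1 */
  }
  int g (int a)
  {
    a &= 1;
    return a != 0; /* now canonicalized to just a */
  }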

Note the testcase pr69270.c needed a slight update due to not matching
exactly a scan pattern, this update makes it more robust and will match
before and afterwards and if there are other changes in this area too.

Note the testcase gcc.target/i386/pr110790-2.c needs a slight update
for better code generation in LP64 bit mode.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/111972
PR tree-optimization/110637
* match.pd (`(convert)(zeroone !=/== CST)`): Match
and simplify to ((convert)zeroone){,^1}.
* fold-const.cc (fold_binary_loc): Remove
transformation of `(~a) & 1` and `(a ^ 1) & 1)`
into `(convert)(a == 0)`.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr110637-1.c: New test.
* gcc.dg/tree-ssa/pr110637-2.c: New test.
* gcc.dg/tree-ssa/pr110637-3.c: New test.
* gcc.dg/tree-ssa/pr111972-1.c: New test.
* gcc.dg/tree-ssa/pr69270.c: Update testcase.
* gcc.target/i386/pr110790-2.c: Update testcase.
* gcc.dg/fold-even-1.c: Removed.

Signed-off-by: Andrew Pinski 
---
 gcc/fold-const.cc  | 27 -
 gcc/match.pd   | 16 ++
 gcc/testsuite/gcc.dg/fold-even-1.c | 32 
 gcc/testsuite/gcc.dg/tree-ssa/pr110637-1.c | 10 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr110637-2.c | 13 +
 gcc/testsuite/gcc.dg/tree-ssa/pr110637-3.c | 14 +
 gcc/testsuite/gcc.dg/tree-ssa/pr111972-1.c | 34 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr69270.c|  4 +--
 gcc/testsuite/gcc.target/i386/pr110790-2.c | 16 --
 9 files changed, 103 insertions(+), 63 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.dg/fold-even-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr110637-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr110637-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr110637-3.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr111972-1.c

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 2692b98ceac..f5d68ac323a 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -12077,33 +12077,6 @@ fold_binary_loc (location_t loc, enum tree_code code, 
tree type,
   goto bit_rotate;
 
 case BIT_AND_EXPR:
-  /* Fold (X ^ 1) & 1 as (X & 1) == 0.  */
-  if (TREE_CODE (arg0) == BIT_XOR_EXPR
- && INTEGRAL_TYPE_P (type)
- && integer_onep (TREE_OPERAND (arg0, 1))
- && integer_onep (arg1))
-   {
- tree tem2;
- tem = TREE_OPERAND (arg0, 0);
- tem2 = fold_convert_loc (loc, TREE_TYPE (tem), arg1);
- tem2 = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (tem),
- tem, tem2);
- return fold_build2_loc (loc, EQ_EXPR, type, tem2,
- build_zero_cst (TREE_TYPE (tem)));
-   }
-  /* Fold ~X & 1 as (X & 1) == 0.  */
-  if (TREE_CODE (arg0) == BIT_NOT_EXPR
- && INTEGRAL_TYPE_P (type)
- && integer_onep (arg1))
-   {
- tree tem2;
- tem = TREE_OPERAND (arg0, 0);
- tem2 = fold_convert_loc (loc, TREE_TYPE (tem), arg1);
- tem2 = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (tem),
- tem, tem2);
- return fold_build2_loc (loc, EQ_EXPR, type, tem2,
- build_zero_cst (TREE_TYPE (tem)));
-   }
   /* Fold !X & 1 as X == 0.  */
   if (TREE_CODE (arg0) == TRUTH_NOT_EXPR
  && integer_onep (arg1))
diff --git a/gcc/match.pd b/gcc/match.pd
index 4d554ba4721..e242eac92f5 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3332,6 +3332,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) || POINTER_TYPE_P (TREE_TYPE (@0)))
 (rcmp @0 @1
 
+/* (type)([0,1]@a != 0) -> (type)a
+   (type)([0,1]@a == 1) -> (type)a
+   (type)([0,1]@a == 0) -> a ^ 1
+   (type)([0,1]@a != 1) -> a ^ 1.  */
+(for eqne (eq ne)
+ (simplify
+  (convert (eqne zero_one_valued_p@0 INTEGER_CST@1))
+  (if ((integer_zerop (@1) || integer_onep (@1)))
+   (if ((eqne == EQ_EXPR) ^ intege

[pushed] aarch64: Fix invalid subregs for BE svread/write_za

2023-12-10 Thread Richard Sandiford
Multi-register svread_za and svwrite_za are implemented using one
pattern per register count, with the register contents being bitcast
on entry (for writes) or return (for reads).  Previously we relied
on subregs for this, with the subreg for reads being handled by
target-independent code.  But using subregs isn't correct for many
big-endian cases, where following subreg rules often requires actual
instructions.  The semantics are instead supposed to be those of
svreinterpret.
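
At the source level the intended semantics match the ACLE reinterpret
intrinsics, e.g. (a sketch):

  #include <arm_sve.h>
  svuint32_t cast (svuint64_t x)
  {
    /* Bitcast with svreinterpret semantics, not subreg semantics.  */
    return svreinterpret_u32_u64 (x);
  }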

Tested on aarch64-linux-gnu and aarch64_be-elf, pushed to trunk.

Richard


gcc/
PR target/112931
PR target/112933
* config/aarch64/aarch64-protos.h (aarch64_sve_reinterpret): Declare.
* config/aarch64/aarch64.cc (aarch64_sve_reinterpret): New function.
* config/aarch64/aarch64-sve-builtins-sme.cc (svread_za_impl::expand)
(svwrite_za_impl::expand): Use it to cast the SVE register to the
right mode.
---
 gcc/config/aarch64/aarch64-protos.h   |  1 +
 .../aarch64/aarch64-sve-builtins-sme.cc   |  5 +++--
 gcc/config/aarch64/aarch64.cc | 22 +++
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index d1af7f40891..eaf74a725e7 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -789,6 +789,7 @@ bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, 
rtx, rtx);
 bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, unsigned 
HOST_WIDE_INT,
unsigned HOST_WIDE_INT,
unsigned HOST_WIDE_INT);
+rtx aarch64_sve_reinterpret (machine_mode, rtx);
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (unsigned HOST_WIDE_INT, machine_mode);
 machine_mode aarch64_sve_int_mode (machine_mode);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-sme.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-sme.cc
index 8d06a72f384..047a333ef47 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-sme.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-sme.cc
@@ -365,7 +365,8 @@ public:
   expand (function_expander &e) const override
   {
 machine_mode mode = e.vectors_per_tuple () == 4 ? VNx8DImode : VNx4DImode;
-return e.use_exact_insn (code_for_aarch64_sme_read (mode));
+rtx res = e.use_exact_insn (code_for_aarch64_sme_read (mode));
+return aarch64_sve_reinterpret (e.result_mode (), res);
   }
 };
 
@@ -457,7 +458,7 @@ public:
   expand (function_expander &e) const override
   {
 machine_mode mode = e.vectors_per_tuple () == 4 ? VNx8DImode : VNx4DImode;
-e.args[1] = lowpart_subreg (mode, e.args[1], e.tuple_mode (1));
+e.args[1] = aarch64_sve_reinterpret (mode, e.args[1]);
 return e.use_exact_insn (code_for_aarch64_sme_write (mode));
   }
 };
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 2a64053f675..0889ceb7db1 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -3226,6 +3226,28 @@ aarch64_split_simd_move (rtx dst, rtx src)
 }
 }
 
+/* Return a register that contains SVE value X reinterpreted as SVE mode MODE.
+   The semantics are those of svreinterpret rather than those of subregs;
+   see the comment at the head of aarch64-sve.md for details about the
+   difference.  */
+
+rtx
+aarch64_sve_reinterpret (machine_mode mode, rtx x)
+{
+  if (GET_MODE (x) == mode)
+return x;
+
+  /* can_change_mode_class must only return true if subregs and svreinterprets
+ have the same semantics.  */
+  if (targetm.can_change_mode_class (GET_MODE (x), mode, FP_REGS))
+return lowpart_subreg (mode, x, GET_MODE (x));
+
+  rtx res = gen_reg_rtx (mode);
+  x = force_reg (GET_MODE (x), x);
+  emit_insn (gen_aarch64_sve_reinterpret (mode, res, x));
+  return res;
+}
+
 bool
 aarch64_zero_extend_const_eq (machine_mode xmode, rtx x,
  machine_mode ymode, rtx y)
-- 
2.25.1



[pushed] aarch64: Fix SMSTART/SMSTOP save/restore for BE

2023-12-10 Thread Richard Sandiford
VNx16QI (the SVE register byte mode) is the only SVE mode for which
LD1 and LDR result in the same register layout for big-endian.  It is
therefore the only mode for which we allow LDR and STR to be used for
big-endian SVE moves.

The SME support sometimes needs to use LDR and STR to save and restore
Z register contents around an SMSTART/SMSTOP SM.  It therefore needs
to use VNx16QI regardless of the type of value that is stored in the
Z registers.

Tested on aarch64-linux-gnu and aarch64_be-elf, pushed to trunk.

Richard


gcc/
PR target/112930
* config/aarch64/aarch64.cc (aarch64_sme_mode_switch_regs::add_reg):
Force specific SVE modes for single registers as well as structures.
---
 gcc/config/aarch64/aarch64.cc | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 5cffdabc62e..2a64053f675 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -4956,14 +4956,17 @@ aarch64_sme_mode_switch_regs::add_reg (machine_mode 
mode, unsigned int regno)
   gcc_assert ((vec_flags & VEC_STRUCT) || end_regno == regno + 1);
   for (; regno < end_regno; regno++)
 {
+  /* Force the mode of SVE saves and restores even for single registers.
+This is necessary because big-endian targets only allow LDR Z and
+STR Z to be used with byte modes.  */
   machine_mode submode = mode;
-  if (vec_flags & VEC_STRUCT)
+  if (vec_flags & VEC_SVE_PRED)
+   submode = VNx16BImode;
+  else if (vec_flags & VEC_SVE_DATA)
+   submode = SVE_BYTE_MODE;
+  else if (vec_flags & VEC_STRUCT)
{
- if (vec_flags & VEC_SVE_PRED)
-   submode = VNx16BImode;
- else if (vec_flags & VEC_SVE_DATA)
-   submode = SVE_BYTE_MODE;
- else if (vec_flags & VEC_PARTIAL)
+ if (vec_flags & VEC_PARTIAL)
submode = V8QImode;
  else
submode = V16QImode;
-- 
2.25.1



[pushed] aarch64: XFAIL some SME tests for BE

2023-12-10 Thread Richard Sandiford
The z0_z23 tests rely on being able to propagate:

  (1) set of double-register z0-z1
  (2) copy of z0 to z28
  (3) use of z28

to a use of z0.  On LE targets it's regcprop that does this.
But regcprop punts on (2) because of:

  https://gcc.gnu.org/pipermail/gcc-patches/2002-July/081990.html

This patch therefore XFAILs the affected tests.

Tested on aarch64-linux-gnu and aarch64_be-elf, pushed to trunk.

Richard


gcc/testsuite/
* gcc.target/aarch64/sme2/acle-asm/uzp_bf16_x2.c: XFAIL z0_z23 tests
for big-endian.
* gcc.target/aarch64/sme2/acle-asm/uzp_f16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzp_f32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzp_f64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzp_s16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzp_s32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzp_s64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzp_s8_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzp_u16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzp_u32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzp_u64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzp_u8_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_bf16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_f16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_f32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_f64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_s16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_s32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_s64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_s8_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_u16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_u32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_u64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/uzpq_u8_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_bf16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_f16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_f32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_f64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_s16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_s32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_s64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_s8_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_u16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_u32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_u64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zip_u8_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_bf16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_f16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_f32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_f64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_s16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_s32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_s64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_s8_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_u16_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_u32_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_u64_x2.c: Likewise.
* gcc.target/aarch64/sme2/acle-asm/zipq_u8_x2.c: Likewise.
---
 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzp_bf16_x2.c  | 2 +-
 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzp_f16_x2.c   | 2 +-
 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzp_f32_x2.c   | 2 +-
 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzp_f64_x2.c   | 2 +-
 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzp_s16_x2.c   | 2 +-
 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzp_s32_x2.c   | 2 +-
 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzp_s64_x2.c   | 2 +-
 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzp_s8_x2.c| 2 +-
 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzp_u16_x2.c   | 2 +-
 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzp_u32_x2.c   | 2 +-
 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzp_u64_x2.c   | 2 +-
 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzp_u8_x2.c| 2 +-
 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzpq_bf16_x2.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzpq_f16_x2.c  | 2 +-
 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzpq_f32_x2.c  | 2 +-
 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzpq_f64_x2.c  | 2 +-
 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzpq_s16_x2.c  | 2 +-
 gcc/testsuite/gcc.target/aarch64/sme2/acle-asm/uzpq_s32_x2.c  | 2 +-
 gcc/testsuite/gcc.target/aarch

[pushed] aarch64: Skip some SME register save tests on BE

2023-12-10 Thread Richard Sandiford
Big-endian targets need to save Z8-Z15 in the same order as
the registers would appear for D8-D15, because the layout is
mandated by the EH ABI.  BE targets therefore use ST1D instead
of the normal STR for those registers (but not for others).

That difference is already tested elsewhere and isn't important
for the SME tests.  This patch therefore restricts the affected
tests to LE.

Tested on aarch64-linux-gnu and aarch64_be-elf, pushed to trunk.

Richard


gcc/testsuite/
* gcc.target/aarch64/sme/call_sm_switch_5.c: Restrict tests that
contain Z8-Z23 saves to little-endian.
* gcc.target/aarch64/sme/call_sm_switch_8.c: Likewise.
* gcc.target/aarch64/sme/locally_streaming_1.c: Likewise.
---
 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c| 6 +++---
 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_8.c| 6 +++---
 gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_1.c | 2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c 
b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c
index e3d9bc274d8..6238ab80da2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c
@@ -14,7 +14,7 @@ struct callbacks {
 };
 
 /*
-** n_caller:   { target lp64 }
+** n_caller:   { target { lp64 && aarch64_little_endian } }
 ** stp x30, (x19|x2[0-8]), \[sp, #?-32\]!
** cntd x16
 ** str x16, \[sp, #?16\]
@@ -114,7 +114,7 @@ n_caller (struct callbacks *c)
 }
 
 /*
-** s_caller:   { target lp64 }
+** s_caller:   { target { lp64 && aarch64_little_endian } }
 ** stp x30, (x19|x2[0-8]), \[sp, #?-32\]!
** cntd x16
 ** str x16, \[sp, #?16\]
@@ -214,7 +214,7 @@ s_caller (struct callbacks *c) [[arm::streaming]]
 }
 
 /*
-** sc_caller:
+** sc_caller:  { target aarch64_little_endian }
 ** stp x29, x30, \[sp, #?-32\]!
 ** mov x29, sp
** cntd x16
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_8.c 
b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_8.c
index f44724df32f..c909b34ff5e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_8.c
@@ -7,7 +7,7 @@ svint8_t produce_z0 ();
 void consume_z0 (svint8_t);
 
 /*
-** test_z0:
+** test_z0:{ target aarch64_little_endian }
 ** ...
 ** smstop  sm
 ** bl  produce_z0
@@ -32,7 +32,7 @@ svint8x4_t produce_z3 ();
 void consume_z3 (svint8x4_t);
 
 /*
-** test_z3:
+** test_z3:{ target aarch64_little_endian }
 ** ...
 ** smstop  sm
 ** bl  produce_z3
@@ -61,7 +61,7 @@ svbool_t produce_p0 ();
 void consume_p0 (svbool_t);
 
 /*
-** test_p0:
+** test_p0:{ target aarch64_little_endian }
 ** ...
 ** smstop  sm
 ** bl  produce_p0
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_1.c 
b/gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_1.c
index 20ff4b87d94..4bb637f4781 100644
--- a/gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sme/locally_streaming_1.c
@@ -265,7 +265,7 @@ n_ls_vector_pcs ()
 }
 
 /*
-** n_ls_sve_pcs:
+** n_ls_sve_pcs:   { target aarch64_little_endian }
 ** sub sp, sp, #?16
** cntd x16
 ** str x16, \[sp\]
-- 
2.25.1



[pushed] aarch64: Add -funwind-tables to some tests

2023-12-10 Thread Richard Sandiford
The .cfi scans in these tests failed for *-elf targets because
those targets don't enable .eh_frame info by default.

Tested on aarch64-linux-gnu and aarch64_be-elf, pushed to trunk.

Richard


gcc/testsuite/
* gcc.target/aarch64/sme/call_sm_switch_1.c: Add -funwind-tables.
* gcc.target/aarch64/sme/call_sm_switch_3.c: Likewise.
* gcc.target/aarch64/sme/call_sm_switch_5.c: Likewise.
---
 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_1.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_3.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_1.c 
b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_1.c
index a2de55773af..98922aaeae0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_1.c
@@ -1,4 +1,4 @@
-// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" }
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls 
-funwind-tables" }
 // { dg-final { check-function-bodies "**" "" } }
 
 void ns_callee ();
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_3.c 
b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_3.c
index ed999d08560..4250fe7984c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_3.c
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_3.c
@@ -1,4 +1,4 @@
-// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" }
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls 
-funwind-tables" }
 // { dg-final { check-function-bodies "**" "" } }
 
 __attribute__((aarch64_vector_pcs)) void ns_callee ();
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c 
b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c
index be9b5cc0410..e3d9bc274d8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5.c
@@ -1,4 +1,4 @@
-// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls" }
+// { dg-options "-O -fomit-frame-pointer -fno-optimize-sibling-calls 
-funwind-tables" }
 // { dg-final { check-function-bodies "**" "" } }
 
 #include 
-- 
2.25.1



Re: [PATCH v3 3/3] c++: note other candidates when diagnosing deletedness

2023-12-10 Thread Jason Merrill

On 11/30/23 10:46, Patrick Palka wrote:

On Fri, 27 Oct 2023, Patrick Palka wrote:

On Fri, 27 Oct 2023, Jason Merrill wrote:

On 10/27/23 15:55, Patrick Palka wrote:

With the previous two patches in place, we can now extend our
deletedness diagnostic to note the other considered candidates, e.g.:

deleted16.C: In function 'int main()':
deleted16.C:10:4: error: use of deleted function 'void f(int)'
   10 |   f(0);
  |   ~^~~
deleted16.C:5:6: note: declared here
5 | void f(int) = delete;
  |  ^
deleted16.C:5:6: note: candidate: 'void f(int)' (deleted)
deleted16.C:6:6: note: candidate: 'void f(...)'
6 | void f(...);
  |  ^
deleted16.C:7:6: note: candidate: 'void f(int, int)'
7 | void f(int, int);
  |  ^
deleted16.C:7:6: note:   candidate expects 2 arguments, 1 provided
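
For reference, the testcase sketch behind that output (reconstructed
from the diagnostics above, so line numbers differ):

  void f(int) = delete;
  void f(...);
  void f(int, int);

  int main()
  {
    f(0);
  }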

These notes are controlled by a new command line flag -fnote-all-cands,
which also controls whether we note ignored candidates more generally.

gcc/ChangeLog:

* doc/invoke.texi (C++ Dialect Options): Document -fnote-all-cands.

gcc/c-family/ChangeLog:

* c.opt: Add -fnote-all-cands.

gcc/cp/ChangeLog:

* call.cc (print_z_candidates): Only print ignored candidates
when -fnote-all-cands is set.
(build_over_call): When diagnosing deletedness, call
print_z_candidates if -fnote-all-cands is set.


My suggestion was also to suggest using the flag in cases where it would make
a difference, e.g.

note: some candidates omitted, use '-fnote-all-cands' to display them


Ah thanks, fixed.  That'll help a lot with discoverability of the flag.



Maybe "-fdiagnostics-all-candidates"?


Nice, that's a better name indeed :)

How does the following look?  Full bootstrap/regtest in progress.

Here's the output of e.g. deleted16a.C.  I think I'd prefer to not print
the source line when emitting the suggestion, but I don't know how to do
that properly (aside from e.g. emitting the note at UNKNOWN_LOCATION).

In file included from gcc/testsuite/g++.dg/cpp0x/deleted16a.C:4:
gcc/testsuite/g++.dg/cpp0x/deleted16.C: In function ‘int main()’:
gcc/testsuite/g++.dg/cpp0x/deleted16.C:21:4: error: use of deleted function 
‘void f(int)’
21 |   f(0); // { dg-error "deleted" }
   |   ~^~~
gcc/testsuite/g++.dg/cpp0x/deleted16.C:6:6: note: declared here
 6 | void f(int) = delete; // { dg-message "declared here" }
   |  ^
gcc/testsuite/g++.dg/cpp0x/deleted16.C:21:4: note: use 
‘-fdiagnostics-all-candidates’ to display considered candidates
21 |   f(0); // { dg-error "deleted" }
   |   ~^~~
gcc/testsuite/g++.dg/cpp0x/deleted16.C:22:4: error: use of deleted function 
‘void g(int)’
22 |   g(0); // { dg-error "deleted" }
   |   ~^~~
gcc/testsuite/g++.dg/cpp0x/deleted16.C:12:6: note: declared here
12 | void g(int) = delete; // { dg-message "declared here" }
   |  ^
gcc/testsuite/g++.dg/cpp0x/deleted16.C:22:4: note: use 
‘-fdiagnostics-all-candidates’ to display considered candidates
22 |   g(0); // { dg-error "deleted" }
   |   ~^~~
gcc/testsuite/g++.dg/cpp0x/deleted16.C:23:4: error: use of deleted function 
‘void h(T, T) [with T = int]’
23 |   h(1, 1); // { dg-error "deleted" }
   |   ~^~
gcc/testsuite/g++.dg/cpp0x/deleted16.C:17:24: note: declared here
17 | template void h(T, T) = delete; // { dg-message "declared 
here|candidate" }
   |^
gcc/testsuite/g++.dg/cpp0x/deleted16.C:23:4: note: use 
‘-fdiagnostics-all-candidates’ to display considered candidates
23 |   h(1, 1); // { dg-error "deleted" }
   |   ~^~

-- >8 --


Subject: [PATCH 3/3] c++: note other candidates when diagnosing deletedness

With the previous two patches in place, we can now extend our
deletedness diagnostic to note the other considered candidates, e.g.:

   deleted16.C: In function 'int main()':
   deleted16.C:10:4: error: use of deleted function 'void f(int)'
  10 |   f(0);
 |   ~^~~
   deleted16.C:5:6: note: declared here
   5 | void f(int) = delete;
 |  ^
   deleted16.C:5:6: note: candidate: 'void f(int)' (deleted)
   deleted16.C:6:6: note: candidate: 'void f(...)'
   6 | void f(...);
 |  ^
   deleted16.C:7:6: note: candidate: 'void f(int, int)'
   7 | void f(int, int);
 |  ^
   deleted16.C:7:6: note:   candidate expects 2 arguments, 1 provided

These notes are controlled by a new command line flag
-fdiagnostics-all-candidates which also controls whether we note
ignored candidates more generally.

gcc/ChangeLog:

* doc/invoke.texi (C++ Dialect Options): Document
-fdiagnostics-all-candidates.

gcc/c-family/ChangeLog:

* c.opt: Add -fdiagnostics-all-candidates.

gcc/cp/ChangeLog:

* call.cc (print_z_candidates): Only print ignored candidates
when -fdiagnostics-all-candidates is set, otherwise suggest
the flag.
(build_over_call): When di

Re: [PATCH v3 10/11] c: Turn -Wincompatible-pointer-types into a permerror

2023-12-10 Thread Jason Merrill

On 11/30/23 16:36, Marek Polacek wrote:

On Thu, Nov 30, 2023 at 10:30:44PM +0100, Jakub Jelinek wrote:

On Thu, Nov 30, 2023 at 10:27:33PM +0100, Florian Weimer wrote:

* Jakub Jelinek:


On Thu, Nov 30, 2023 at 04:15:26PM -0500, Marek Polacek wrote:

On Thu, Nov 30, 2023 at 10:11:31PM +0100, Florian Weimer wrote:

* Marek Polacek:


On Mon, Nov 20, 2023 at 10:56:36AM +0100, Florian Weimer wrote:

--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -6183,6 +6183,7 @@ that have their own flag:
  @gccoptlist{
  -Wimplicit-function-declaration @r{(C)}
  -Wimplicit-int @r{(C)}
+-Wincompatible-pointer-types @r{(C)}
  -Wint-conversion @r{(C)}
  -Wnarrowing @r{(C++)}
  -Wreturn-mismatch @r{(C)}


BTW, should the C ones mention Objective-C as well?


Isn't there Objective-C++ as well?  I assumed it applied to both
dialects.


I think we usually spell both, if they apply.  But you can leave it as it is.


Seems we use (C and Objective-C only) (40 times) in preference to (C only)
(4 times), (C++ and Objective-C++ only) (61 times) in preference to (C++
only) (6 times), but (C and C++ only) (5 times) and never all 4 languages, even
when I think it is very likely some switch would be C only, C++ only or
C and C++ only.  And (C) is used just for Copyright ;)


So it should say “C and Objective-C only” for the C stuff,


Yes please.

and “C++ and
Objective-C++ only” for -Wnarrowing?


I'd say so, but let's wait for Jason.


FWIW, I just checked the code and I agree.


Yes.  Adjusted thus:


From fa99f7d12b87f36d3c38349fcdcfca074564858d Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Sun, 10 Dec 2023 14:20:32 -0500
Subject: [PATCH] doc: small tweak
To: gcc-patches@gcc.gnu.org

Mention Objective-C++ here to be consistent with the surrounding C/ObjC
lines.

gcc/ChangeLog:

	* doc/invoke.texi (-fpermissive): Mention ObjC++ for -Wnarrowing.
---
 gcc/doc/invoke.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d4e689b64c0..15f3a86e768 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -6246,7 +6246,7 @@ that have their own flag:
 -Wimplicit-int @r{(C and Objective-C only)}
 -Wincompatible-pointer-types @r{(C and Objective-C only)}
 -Wint-conversion @r{(C and Objective-C only)}
--Wnarrowing @r{(C++)}
+-Wnarrowing @r{(C++ and Objective-C++ only)}
 -Wreturn-mismatch @r{(C and Objective-C only)}
 }
 
-- 
2.39.3



Re: [PATCH v7 1/1] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-12-10 Thread Jason Merrill

On 12/10/23 10:22, waffl3x wrote:

On Friday, December 8th, 2023 at 12:25 PM, Jason Merrill  
wrote:



On 12/6/23 02:33, waffl3x wrote:


Here is the next version, it feels very close to finished. As before, I
haven't ran a bootstrap or the full testsuite yet but I did run the
explicit-obj tests which completed as expected.

There's a few test cases that still need to be written but more tests
can always be added. The behavior added by CWG2789 works in at least
one case, but I have not added tests for it yet. The test cases for
dependent lambda expressions need to be fleshed out more, but a few
temporary ones are included to demonstrate that they do work and that
the crash is fixed. Explicit object conversion functions work, but I
need to add fleshed out tests for them, explicit-obj-basic5.C has that
test.



@@ -6586,6 +6586,17 @@ add_candidates (tree fns, tree first_arg, const vec args,
+ /* FIXME: I believe this will be bugged for xobj member functions,
+ leaving this comment here to make sure we look into it
+ at some point.
+ Seeing this makes me want correspondence checking to be unified
+ in one place though, not sure if this one needs to be different
+ from other ones though.
+ This function is only used here, but maybe we can use it in add
+ method and move some of the logic out of there?



fns_correspond absolutely needs updating to handle xobj fns, and doing
that by unifying it with add_method's calculation would be good.


+ Side note: CWG2586 might be relevant for this area in
+ particular, perhaps we wait to see if it gets accepted first? */



2586 was accepted last year.


@@ -12574,17 +12601,25 @@ cand_parms_match (z_candidate *c1, z_candidate c2)
fn1 = DECL_TEMPLATE_RESULT (t1);
fn2 = DECL_TEMPLATE_RESULT (t2);
}
+ /* The changes I made here might be stuff I was told not to worry about?
+ I'm not really sure so I'm going to leave it in. */



Good choice, this comment can go.


tree parms1 = TYPE_ARG_TYPES (TREE_TYPE (fn1));
tree parms2 = TYPE_ARG_TYPES (TREE_TYPE (fn2));
if (DECL_FUNCTION_MEMBER_P (fn1)
&& DECL_FUNCTION_MEMBER_P (fn2)
- && (DECL_NONSTATIC_MEMBER_FUNCTION_P (fn1)
- != DECL_NONSTATIC_MEMBER_FUNCTION_P (fn2)))
+ && (DECL_STATIC_FUNCTION_P (fn1)
+ != DECL_STATIC_FUNCTION_P (fn2)))
{
/* Ignore 'this' when comparing the parameters of a static member
function with those of a non-static one. */
- parms1 = skip_artificial_parms_for (fn1, parms1);
- parms2 = skip_artificial_parms_for (fn2, parms2);
+ auto skip_parms = [](tree fn, tree parms){
+ if (DECL_XOBJ_MEMBER_FUNCTION_P (fn))
+ return TREE_CHAIN (parms);
+ else
+ return skip_artificial_parms_for (fn, parms);
+ };
+ parms1 = skip_parms (fn1, parms1);
+ parms2 = skip_parms (fn2, parms2);
}



https://cplusplus.github.io/CWG/issues/2789.html fixes the handling of
xobj fns here.


I have a minor concern here.
https://eel.is/c++draft/basic.scope.scope#3

Is it okay that CWG2789 has separate rules than the rules for
declarations? As far as I know there's nothing else that describes how
to handle the object parameters so it was my assumption that the rules
here are also used for that. I know at least one person has disagreed
with me about that so I'm not sure what is actually correct.

template 
struct S {
   constexpr void f() {}   // #f1
   constexpr void f(this S&&) requires true {} // #f2

   constexpr void g() requires true {} // #g1
   constexpr void g(this S&&) {}   // #g2
};

void test() {
   S<>{}.f(); // calls #?
   S<>{}.g(); // calls #?
}

But with the wording proposed by CWG2789, wouldn't this example would
be a problem? If we follow the standard to the letter, constraints
can't be applied here right?

I wouldn't be surprised if I'm missing something but I figured I ought
to raise it just in case. Obviously it should call #f2 and #g1 but I'm
pretty sure the current wording only allows these calls to be ambiguous.


I agree, I've suggested amending 2789 to use "corresponding object 
parameter".



@@ -8727,21 +8882,42 @@ resolve_address_of_overloaded_function (tree 
target_type,
/* Good, exactly one match. Now, convert it to the correct type. */
fn = TREE_PURPOSE (matches);

- if (DECL_NONSTATIC_MEMBER_FUNCTION_P (fn)
- && !(complain & tf_ptrmem_ok) && !flag_ms_extensions)
+ if (DECL_OBJECT_MEMBER_FUNCTION_P (fn)
+ && !(complain & tf_ptrmem_ok))
{
- static int explained;
-
- if (!(complain & tf_error))
+ /* For iobj member functions, if -fms-extensions was passed in, this
+ is not an error, so we do nothing. It is still an error regardless
+ for xobj member functions though, as it is a new feature we
+ (hopefully) don't need to support the behavior. */



Unfortunately, it seems that MSVC extended their weirdness to xobj fns,
so -fms-extensions should as well.
https://godbolt.org/z/nfvn64Kx5


Ugh, what a shame. We don't have to support it though do we? The
documentation for -fms-extensions seems to state that it's for
microsoft headers, if they aren't u

Re: [PATCH] c++: Clear uninstantiated friend flag when instantiating [PR104234]

2023-12-10 Thread Jason Merrill

On 11/23/23 08:40, Nathaniel Shead wrote:

Sorry, I just noticed I hadn't actually filled in the changelog. It
should say "Clear DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P."


OK with that change.


On Thu, Nov 23, 2023 at 11:54 PM Nathaniel Shead
 wrote:


Bootstrapped and regtested on x86_64-pc-linux-gnu. I don't have write
access.

-- >8 --

Otherwise attempting to get the originating module declaration ICEs
because the DECL_CHAIN of an instantiated friend template is no longer
its context.

 PR c++/104234
 PR c++/112580

gcc/cp/ChangeLog:

 * pt.cc (tsubst_template_decl):

gcc/testsuite/ChangeLog:

 * g++.dg/modules/pr104234.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/pt.cc|  2 ++
  gcc/testsuite/g++.dg/modules/pr104234.C | 16 
  2 files changed, 18 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/modules/pr104234.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index ed681afb5d4..5e10a523e1a 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -14789,6 +14789,8 @@ tsubst_template_decl (tree t, tree args, tsubst_flags_t 
complain,
if (PRIMARY_TEMPLATE_P (t))
  DECL_PRIMARY_TEMPLATE (r) = r;

+  DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P (r) = false;
+
if (!lambda_fntype && !class_p)
  {
/* Record this non-type partial instantiation.  */
diff --git a/gcc/testsuite/g++.dg/modules/pr104234.C 
b/gcc/testsuite/g++.dg/modules/pr104234.C
new file mode 100644
index 000..d81f0d435bc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr104234.C
@@ -0,0 +1,16 @@
+// PR c++/104234
+// { dg-additional-options "-fmodules-ts" }
+
+template  struct _Node_handle_common {
+  template  friend class _Rb_tree;
+};
+struct _Hashtable {
+  using node_type = _Node_handle_common;
+  node_type __trans_tmp_1;
+};
+template  class _Rb_tree {
+  struct _Rb_tree_impl {
+_Rb_tree_impl();
+  } _M_impl;
+};
+_Rb_tree _M_tmap_;
--
2.42.0







Re: [PATCH] c++: End lifetime of objects in constexpr after destructor call [PR71093]

2023-12-10 Thread Jason Merrill

On 12/10/23 05:22, Richard Biener wrote:

Am 09.12.2023 um 21:13 schrieb Jason Merrill :

On 11/2/23 21:18, Nathaniel Shead wrote:

Bootstrapped and regtested on x86-64_pc_linux_gnu.
I'm not entirely sure if the change I made to have destructors clobber with
CLOBBER_EOL instead of CLOBBER_UNDEF is appropriate, but nothing seemed to have
broken by doing this and I wasn't able to find anything else that really
depended on this distinction other than a warning pass. Otherwise I could
experiment with a new clobber kind for destructor calls.


It seems wrong to me: CLOBBER_EOL is documented to mean that the storage is 
expiring at that point as well, which a (pseudo-)destructor does not imply; 
it's perfectly valid to destroy an object and then create another in the same 
storage.
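
For example:

  #include <new>
  struct S { int i; };
  void reuse (S *p)
  {
    p->~S ();      // object lifetime ends; the storage lives on
    new (p) S {1}; // a new object in the same storage
  }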

We probably do want another clobber kind for end of object lifetime. And/or one 
for beginning of object lifetime.


There’s not much semantically different between UNDEF and end of object but not 
storage lifetime?  At least for what middle-end optimizations do.


That's fine for the middle-end, but Nathaniel's patch wants to 
distinguish between the clobbers at beginning and end of object lifetime 
in order to diagnose stores to an out-of-lifetime object in constexpr 
evaluation.


One option might be to remove the clobber at the beginning of the 
constructor; are there any useful optimizations enabled by that, or is 
it just pedantically breaking people's code?



EOL is used by stack slot sharing and that operates on the underlying storage, 
not individual objects live in it.


I wonder about changing the name to EOS (end of storage [duration]) to 
avoid similar confusion with object lifetime?


Jason



Re: [PATCH] treat argp-based mem as frame related in dse

2023-12-10 Thread Jeff Law




On 12/8/23 00:18, Jiufu Guo wrote:


Hi,

Jeff Law  writes:


On 12/6/23 02:27, Jiufu Guo wrote:

Hi,

The issue mentioned in PR112525 can be handled by updating dse.cc to
treat arg_pointer_rtx similarly to frame_pointer_rtx.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30271#c10 also mentioned
this idea.
One thing: the argp area may be used to pass arguments to the callee,
so we need to check whether call insns are using that mem.

Bootstrap & regtest pass on ppc64{,le} and x86_64.
Is this ok for trunk?

BR,
Jeff (Jiufu Guo)


PR rtl-optimization/112525

gcc/ChangeLog:

* dse.cc (get_group_info): Add arg_pointer_rtx as frame_related.
(check_mem_read_rtx): Add parameter to indicate if it is checking mem
for call insn.
(scan_insn): Add mem checking on call usage.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr112525.c: New test.

So conceptually the first chunk makes sense.  Though I do worry about
Andrew's comment about it causing a bootstrap failure.  Even thought
it was 15 years ago, it remains worrisome.


Yes, I understand your point.
At that time, it was a comparison failure.  It may be related to debug
info.  But I did not figure out the possible failures.




@@ -2368,7 +2370,8 @@ check_mem_read_rtx (rtx *loc, bb_info_t bb_info)
  /* If this read is just reading back something that we just
 stored, rewrite the read.  */
- if (store_info->rhs
+ if (!used_in_call
+ && store_info->rhs
  && store_info->group_id == -1
  && store_info->cse_base == base
  && known_subrange_p (offset, width, store_info->offset,
@@ -2650,6 +2653,12 @@ scan_insn (bb_info_t bb_info, rtx_insn *insn, int 
max_active_local_stores)
  that is not relative to the frame.  */
   add_non_frame_wild_read (bb_info);
   +  for (rtx link = CALL_INSN_FUNCTION_USAGE (insn);
+  link != NULL_RTX;
+  link = XEXP (link, 1))
+   if (GET_CODE (XEXP (link, 0)) == USE && MEM_P (XEXP (XEXP (link, 0),0)))
+ check_mem_read_rtx (&XEXP (XEXP (link, 0),0), bb_info, true);

I'm having a bit of a hard time convincing myself this is correct
though.  I can't see how rewriting the load to read the source of the
prior store is unsafe.  If that fixes a problem, then it would seem
like we've gone wrong before here -- perhaps failing to use the fusage
loads to "kill" any available stores to the same or aliased memory
locations.

As you said in the latter point, the call's fusage would kill the previous
stores.  It is a kind of case like:

   134: [argp:SI+0x8]=r134:SI
   135: [argp:SI+0x4]=0x1
   136: [argp:SI]=r132:SI
   137: ax:SI=call [`memset'] argc:0xc
   REG_CALL_DECL `memset'
   REG_EH_REGION 0

This call insn is:
(call_insn/j 137 136 147 27 (set (reg:SI 0 ax)
 (call (mem:QI (symbol_ref:SI ("memset") [flags 0x41]  ) [0 __builtin_memset S1 A8])
 (const_int 12 [0xc]))) "pr102798.c":23:22 1086 {*sibcall_value}
  (expr_list:REG_UNUSED (reg:SI 0 ax)
 (expr_list:REG_CALL_DECL (symbol_ref:SI ("memset") [flags 0x41]  
)
 (expr_list:REG_EH_REGION (const_int 0 [0])
 (nil
 (expr_list:SI (use (mem/f:SI (reg/f:SI 16 argp) [0  S4 A32]))
 (expr_list:SI (use (mem:SI (plus:SI (reg/f:SI 16 argp) (const_int 4 
[0x4])) [0  S4 A32]))
 (expr_list:SI (use (mem:SI (plus:SI (reg/f:SI 16 argp) (const_int 
8 [0x8])) [0  S4 A32]))
 (nil)

The stores in insns 134-136 are used by the call.  "check_mem_read_rtx"
would prevent them from being eliminated.
Right.  But unless I read something wrong, the patch wasn't changing 
store removal, it was changing whether or not we forwarded the source of 
the store into the destination of a subsequent load from the same address.


So I'm still a bit confused how this patch impacted store removal.

jeff


Re: [PATCH v26 00/23] Optimize type traits compilation performance

2023-12-10 Thread Jason Merrill

On 12/7/23 00:11, Ken Matsui wrote:

This patch series optimizes type traits compilation performance by
implementing built-in type traits and using them in libstdc++.

Changes in v26:

* Rebased on top of trunk.
* Moved is_function_v under is_const_v.
* Isolated patches for is_const, is_volatile, is_pointer, and
is_unbounded_array, which contain performance regression, from
this patch series since they are not ready for review yet.


I've applied all the compiler patches, with a few small tweaks, 
including this one as a separate commit.  One other was a formatting 
fix; the last was using TYPE_PTRDATAMEM_P for CPTK_IS_MEMBER_OBJECT_POINTER.


I'm leaving the library patches for library folks to apply.

Thanks!

Jason


From e410303f768fa7b020e46f3bd7d28381144e5340 Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Fri, 8 Dec 2023 15:55:49 -0500
Subject: [PATCH 01/11] c++: trait patch tweak
To: gcc-patches@gcc.gnu.org

As Patrick suggested elsewhere, let's move this into the default case.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_simple_type_specifier): Move trait
	handling to default label.
---
 gcc/cp/parser.cc | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 9e76426566b..b987324f669 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -20164,22 +20164,22 @@ cp_parser_simple_type_specifier (cp_parser* parser,
   return type;
 
 default:
+  /* If token is a type-yielding built-in traits, parse it.  */
+  const cp_trait* trait = cp_lexer_peek_trait_type (parser->lexer);
+  if (trait)
+	{
+	  type = cp_parser_trait (parser, trait);
+	  if (decl_specs)
+	cp_parser_set_decl_spec_type (decl_specs, type,
+	  token,
+	  /*type_definition_p=*/false);
+
+	  return type;
+	}
+
   break;
 }
 
-  /* If token is a type-yielding built-in traits, parse it.  */
-  const cp_trait* trait = cp_lexer_peek_trait_type (parser->lexer);
-  if (trait)
-{
-  type = cp_parser_trait (parser, trait);
-  if (decl_specs)
-	cp_parser_set_decl_spec_type (decl_specs, type,
-  token,
-  /*type_definition_p=*/false);
-
-  return type;
-}
-
   /* If token is an already-parsed decltype not followed by ::,
  it's a simple-type-specifier.  */
   if (token->type == CPP_DECLTYPE
-- 
2.39.3



Re: [PATCH] -finline-stringops: don't assume ptr_mode ptr in memset [PR112804]

2023-12-10 Thread Jeff Law




On 12/8/23 19:25, Alexandre Oliva wrote:


On aarch64 -milp32, and presumably on other such targets, ptr can be
in a different mode than ptr_mode in the testcase.  Cope with it.

Regstrapped on x86_64-linux-gnu, also tested the new test on
aarch64-elf.  Ok to install?


for  gcc/ChangeLog

PR target/112804
* builtins.cc (try_store_by_multiple_pieces): Use ptr's mode
 for the increment.

for  gcc/testsuite/ChangeLog

PR target/112804
* gcc.target/aarch64/inline-mem-set-pr112804.c: New.
Hmmm.  I'd like a little more information here.  I wouldn't expect those 
modes to differ even for ILP32 on a 64bit target.  Are we just papering 
over a problem elsewhere?


Jeff


Re: [PATCH] multiflags: fix doc warning

2023-12-10 Thread Jeff Law




On 12/8/23 19:48, Alexandre Oliva wrote:


Comply with dubious doc warning that after an @xref there must be a
comma or a period, not a close parenthesis.

Build-testing on x86_64-linux-gnu now.  Ok to install?


for  gcc/ChangeLog

* doc/invoke.texi (multiflags): Add period after @xref to
silence warning.

OK
jeff


Re: [PING] PR112380: Defend against CLOBBERs in RTX expressions in combine.cc

2023-12-10 Thread Jeff Law




On 12/10/23 07:56, Roger Sayle wrote:


I'd like to ping my patch for PR rtl-optimization/112380.
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636203.html
Sorry, I'd hoped Segher would chime in.  The first simple patch & 
testcase are OK and can go in immediately.  It'll take more time on the 
larger patch to work through it.


jeff


[committed] Support uaddv and usubv on the H8

2023-12-10 Thread Jeff Law
This patch adds uaddv/usubv support on the H8 port to speed up those 
pesky builtin-overflow tests.  It's a variant of something I'd been 
running for a while -- the major change between the old approach I'd 
been using and this patch is this version does not expose the CC 
register until after reload to be consistent with the rest of the H8 port.


The general approach is to first clear the GPR that's going to hold the 
overflow status, perform the arithmetic operation (add/sub), then use 
addx to move the overflow indicator (in the C bit) into the GPR holding 
the overflow status.
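
A minimal C trigger for these patterns (a sketch) is:

  int add_ov (unsigned a, unsigned b, unsigned *res)
  {
    return __builtin_add_overflow (a, b, res);
  }

where the returned flag becomes the sub/add/addx sequence described above.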


That's a significant improvement over the mess of logicals that's 
generated by the generic code.


Handling signed overflow is possible and something I'll probably port to 
this scheme at some point.  It's a bit more complex because we can't 
trivially move the bit from CCR into the right position in a GPR and 
other quirks of the H8.


This has been regression tested on the H8 without problems.  Pushing to 
the trunk.


Jeff

commit 7fb9454c748632d148a07c275ea1f77b290b0c2d
Author: Jeff Law 
Date:   Sun Dec 10 10:41:05 2023 -0700

[committed] Support uaddv and usubv on the H8

This patch adds uaddv/usubv support on the H8 port to speed up those pesky
builtin-overflow tests.  It's a variant of something I'd been running for a
while -- the major change between the old approach I'd been using and this
patch is this version does not expose the CC register until after reload to 
be
consistent with the rest of the H8 port.

The general approach is to first clear the GPR that's going to hold the
overflow status, perform the arithmetic operation (add/sub), then use addx 
to
move the overflow indicator (in the C bit) into the GPR holding the overflow
status.

That's a significant improvement over the mess of logicals that's generated 
by
the generic code.

Handling signed overflow is possible and something I'll probably port to 
this
scheme at some point.  It's a bit more complex because we can't trivially 
move
the bit from CCR into the right position in a GPR and other quirks of the 
H8.

This has been regression tested on the H8 without problems.  Pushing to the
trunk.

gcc/
* config/h8300/addsub.md (uaddv4, usubv4): New 
expanders.
(uaddv): New define_insn_and_split plus post-reload pattern.

diff --git a/gcc/config/h8300/addsub.md b/gcc/config/h8300/addsub.md
index b1eb0d20188..32eba9df67a 100644
--- a/gcc/config/h8300/addsub.md
+++ b/gcc/config/h8300/addsub.md
@@ -239,3 +239,80 @@ (define_insn "*negsf2_clobber_flags"
   "reload_completed"
   "xor.w\\t#32768,%e0"
   [(set_attr "length" "4")])
+
(define_expand "uaddv<mode>4"
+  [(set (match_operand:QHSI 0 "register_operand" "")
+   (plus:QHSI (match_operand:QHSI 1 "register_operand" "")
+  (match_operand:QHSI 2 "register_operand" "")))
+   (set (pc)
+   (if_then_else (ltu (match_dup 0) (match_dup 1))
+ (label_ref (match_operand 3 ""))
+ (pc)))]
+  "")
+
+(define_insn_and_split "*uaddv"
+  [(set (match_operand:QHSI2 3 "register_operand" "=&r")
+   (ltu:QHSI2 (plus:QHSI (match_operand:QHSI 1 "register_operand" "%0")
+  (match_operand:QHSI 2 "register_operand" "r"))
+   (match_dup 1)))
+   (set (match_operand:QHSI 0 "register_operand" "=r")
+   (plus:QHSI (match_dup 1) (match_dup 2)))]
+  ""
+  "#"
+  "&& reload_completed"
+  [(parallel [(set (match_dup 3) (ltu:QHSI2 (plus:QHSI (match_dup 1) 
(match_dup 2))
+ (match_dup 1)))
+ (set (match_dup 0) (plus:QHSI (match_dup 1) (match_dup 2)))
+ (clobber (reg:CC CC_REG))])])
+
+(define_insn "*uaddv"
+  [(set (match_operand:QHSI2 3 "register_operand" "=&r")
+   (ltu:QHSI2 (plus:QHSI (match_operand:QHSI 1 "register_operand" "%0")
+(match_operand:QHSI 2 "register_operand" "r"))
+   (match_dup 1)))
+   (set (match_operand:QHSI 0 "register_operand" "=r")
+   (plus (match_dup 1) (match_dup 2)))
+   (clobber (reg:CC CC_REG))]
+  ""
+{
+  if (E_<QHSI2:MODE>mode == E_QImode)
+{
+  if (E_<QHSI:MODE>mode == E_QImode)
+   return "sub.b\t%X3,%X3\;add.b\t%X2,%X0\;addx\t%X3,%X3";
+  else if (E_<QHSI:MODE>mode == E_HImode)
+   return "sub.b\t%X3,%X3\;add.w\t%T2,%T0\;addx\t%X3,%X3";
+  else if (E_<QHSI:MODE>mode == E_SImode)
+   return "sub.b\t%X3,%X3\;add.l\t%S2,%S0\;addx\t%X3,%X3";
+}
+  else if (E_<QHSI2:MODE>mode == E_HImode)
+{
+  if (E_<QHSI:MODE>mode == E_QImode)
+   return "sub.w\t%T3,%T3\;add.b\t%X2,%X0\;addx\t%X3,%X3";
+  else if (E_<QHSI:MODE>mode == E_HImode)
+   return "sub.w\t%T3,%T3\;add.w\t%T2,%T0\;addx\t%X3,%X3";
+  else if (E_<QHSI:MODE>mode == E_SImode)
+   return "sub.w\t%T3,%T3\;add.l\t%S2,%S0\;addx\t%X3,%X3";
+}
+  else if (E_<QHSI2:MODE>mode == E_SImode)
+{
+  if (E_<QHSI:MODE>mode == E_QImode)
+   return "sub.l\t%S3,%S3\;add.b\t%X2,%X0\;addx\t

[committed] Provide patterns for signed bitfield extractions on H8

2023-12-10 Thread Jeff Law
Inspired by Roger's work on the ARC port, this patch provides a 
define_and_split pattern to optimize sign extended bitfields starting at 
position 0 using an approach that doesn't require shifting.


It then builds on that to provide another define_and_split pattern to 
support arbitrary signed bitfield extractions -- it uses a right logical 
shift to move the bitfield into position 0, then the specialized pattern 
above to sign extend the MSB of the field through the rest of the register.


This is often, but certainly not always, better than a two shift 
approach.  The code uses the sizes of the sequences to select between 
the two shift approach and the single shift with extension from an arbitrary 
location approach.
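
In C terms the generalized sequence computes something like the following
sketch (names are mine, for illustration only; assumes 0 < width < 32):

  /* Extract a signed WIDTH-bit field at POS: one logical shift right,
     then the shift-free sign extension from bit 0.  */
  static int
  extract_signed (unsigned int x, int pos, int width)
  {
    unsigned int field = (x >> pos) & ((1u << width) - 1);
    unsigned int sign = 1u << (width - 1);
    return (int) ((field ^ sign) - sign);
  }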


There's certainly further improvements that could be made here, but I 
think we're getting the bulk of the improvements already.


Regression tested on the H8 port without errors.  Installing on the trunk.

Jeff
commit 73f6e1fe8835085ccc6de5c5f4428d47e853913b
Author: Jeff Law 
Date:   Sun Dec 10 10:29:23 2023 -0700

[committed] Provide patterns for signed bitfield extractions on H8

Inspired by Roger's work on the ARC port, this patch provides a
define_and_split pattern to optimize sign extended bitfields starting at
position 0 using an approach that doesn't require shifting.

It then builds on that to provide another define_and_split pattern to
support arbitrary signed bitfield extractions -- it uses a right logical
shift to move the bitfield into position 0, then the specialized pattern
above to sign extend the MSB of the field through the rest of the register.

This is often, but certainly not always, better than a two shift approach.
The code uses the sizes of the sequences to select between the two shift
approach and the single shift with extension from an arbitrary location
approach.

There's certainly further improvements that could be made here, but I think
we're getting the bulk of the improvements already.

Regression tested on the H8 port without errors.  Installing on the trunk.

gcc/
* config/h8300/h8300-protos.h (use_extvsi): Prototype.
* config/h8300/combiner.md: Two new define_insn_and_split patterns
to implement signed bitfield extractions.
* config/h8300/h8300.cc (use_extvsi): New function.

diff --git a/gcc/config/h8300/combiner.md b/gcc/config/h8300/combiner.md
index cce187805c7..d5f26b50983 100644
--- a/gcc/config/h8300/combiner.md
+++ b/gcc/config/h8300/combiner.md
@@ -1269,7 +1269,54 @@ (define_insn ""
 ;;   (pc)))]
 ;;   "")
 
-;; Various ways to extract a single bit bitfield and sign extend it
+;; This is a signed bitfield extraction starting at bit 0
+;; It's usually faster than using shifts, but not always,
+;; particularly on the H8/S and H8/SX variants.
+(define_insn_and_split "*extvsi_n_0"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (sign_extract:SI (match_operand:SI 1 "register_operand" "0")
+(match_operand 2 "const_int_operand")
+(const_int 0)))]
+  "INTVAL (operands[2]) > 1
+   && INTVAL (operands[2]) < (TARGET_H8300S ? 26 - TARGET_H8300SX : 29)
+   && (!TARGET_H8300SX || (INTVAL (operands[2]) != 24 && INTVAL (operands[2]) 
!= 17))"
+  "#"
+  "&& reload_completed"
+[(parallel [(set (match_dup 0) (and:SI (match_dup 0) (match_dup 3)))
+   (clobber (reg:CC CC_REG))])
+ (parallel [(set (match_dup 0) (xor:SI (match_dup 0) (match_dup 4)))
+   (clobber (reg:CC CC_REG))])
+ (parallel [(set (match_dup 0) (minus:SI (match_dup 0) (match_dup 4)))
+   (clobber (reg:CC CC_REG))])]
+{
+  int tmp = INTVAL (operands[2]);
+  operands[3] = GEN_INT (~(HOST_WIDE_INT_M1U << tmp));
+  operands[4] = GEN_INT (HOST_WIDE_INT_1U << (tmp - 1));
+})
+
+(define_insn_and_split "*extvsi_n_n"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+   (sign_extract:SI (match_operand:SI 1 "register_operand" "0")
+(match_operand 2 "const_int_operand")
+(match_operand 3 "const_int_operand")))]
+  "(!h8300_shift_needs_scratch_p (INTVAL (operands[3]), SImode, LSHIFTRT)
+&& use_extvsi (INTVAL (operands[2]), INTVAL (operands[3])))"
+  "#"
+  "&& reload_completed"
+[(parallel [(set (match_dup 0) (lshiftrt:SI (match_dup 0) (match_dup 3)))
+   (clobber (reg:CC CC_REG))])
+ (parallel [(set (match_dup 0) (and:SI (match_dup 0) (match_dup 4)))
+   (clobber (reg:CC CC_REG))])
+ (parallel [(set (match_dup 0) (xor:SI (match_dup 0) (match_dup 5)))
+   (clobber (reg:CC CC_REG))])
+ (parallel [(set (match_dup 0) (minus:SI (match_dup 0) (match_dup 5)))
+   (clobber (reg:CC CC_REG))])]
+{
+  int tmp = INTVAL (operands[2]);
+  operands[4] = gen_int_mode (~(HOST_WIDE_INT_M1U << tmp), SImode);
+  operands[5] = gen_int_mode (HOST_WIDE_INT_1U << (tmp - 1), SImode);
+})
+
 ;;
 ;; Testing showed this 

[committed] Fix length computation of single bit bitfield extraction on H8

2023-12-10 Thread Jeff Law


Various approaches are used to optimize extracting a sign extended 
single bit bitfield.  The length computation of 10 bytes was 
conservatively correct, but inaccurate.


In particular when the bit we want is in the low half word we don't need 
the move high half to low half instruction.  Account for that in the 
length computation.


This was spotted when looking at regressions in the generalized signed 
bitfield extraction pattern.


This has been regression tested on the H8 port.

Pushing to the trunk,
Jeff
commit 1f55c5cc698519094f751257db62ff274c015fdc
Author: Jeff Law 
Date:   Sun Dec 10 10:05:18 2023 -0700

[committed] Fix length computation of single bit bitfield extraction on H8

Various approaches are used to optimize extracting a sign extended single
bit bitfield.  The length computation of 10 bytes was conservatively
correct, but inaccurate.

In particular when the bit we want is in the low half word we don't need the
move high half to low half instruction.  Account for that in the length
computation.

This was spotted when looking at regressions in the generalized signed
bitfield extraction pattern.

This has been regression tested on the H8 port.

gcc/
* config/h8300/combiner.md (single bit signed bitfield extraction):
Fix length computation when the bit we want is in the low half word.

diff --git a/gcc/config/h8300/combiner.md b/gcc/config/h8300/combiner.md
index e1179b5fea6..cce187805c7 100644
--- a/gcc/config/h8300/combiner.md
+++ b/gcc/config/h8300/combiner.md
@@ -1358,7 +1358,7 @@ (define_insn ""
  to get that bit into the destination, properly extended.  */
   return "subx\t%s0,%s0\;exts.w %T0\;exts.l %0";
 }
-  [(set_attr "length" "10")])
+  [(set (attr "length") (symbol_ref "INTVAL (operands[2]) >= 16 ? 10 : 8"))])
 
 ;; For shift counts >= 16 we can always do better than the
 ;; generic sequences.  Other patterns handle smaller counts.


[committed] Fix length computation for logical shifts on H8

2023-12-10 Thread Jeff Law

This fixes the length computation for logical shifts on the H8/SX.

The H8/SX has a richer set of logical shifts compared to early parts in 
the H8 family.  It has special 2 byte instructions for shifts by power 
of two immediate values as well as a special 4 byte shift by other 
immediate values.


These were never accounted for (AFAICT) in the length computation for 
shifts.  Until now that's mostly just affected branch shortening.  But 
an upcoming patch uses instruction lengths to select between two 
potential sequences and getting these lengths wrong will cause it to 
miss optimization opportunities on the H8/SX.
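
In isolation the corrected H8/SX rule is simple (a sketch of the logic,
not the port's actual helper):

  /* Logical shift by immediate N on the H8/SX: 2 bytes when N is a
     power of two, otherwise the 4 byte form.  */
  static int
  h8sx_shift_length (unsigned int n)
  {
    return (n != 0 && (n & (n - 1)) == 0) ? 2 : 4;
  }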


A slightly different version of this patch has been tested on the H8 
without regressions.


Pushed to the trunk,
Jeff

commit 4ac358c619e364ad767242409765c178da9d83e0
Author: Jeff Law 
Date:   Sun Dec 10 09:32:55 2023 -0700

[committed] Fix length computation for logical shifts on H8

This fixes the length computation for logical shifts on the H8/SX.

The H8/SX has a richer set of logical shifts compared to early parts in
the H8 family.  It has special 2 byte instructions for shifts by power of
two immediate values as well as a special 4 byte shift by other immediate
values.

These were never accounted for (AFAICT) in the length computation for
shifts.  Until now that's mostly just affected branch shortening.  But an
upcoming patch uses instruction lengths to select between two potential
sequences and getting these lengths wrong will cause it to miss
optimization opportunities on the H8/SX.

gcc
* config/h8300/h8300.cc (compute_a_shift_length): Fix computation
of logical shifts on the H8/SX.

diff --git a/gcc/config/h8300/h8300.cc b/gcc/config/h8300/h8300.cc
index 5936cdca177..5f9bbc9793b 100644
--- a/gcc/config/h8300/h8300.cc
+++ b/gcc/config/h8300/h8300.cc
@@ -4299,6 +4299,11 @@ compute_a_shift_length (rtx operands[3], rtx_code code)
  /* Fall through.  */
 
case SHIFT_INLINE:
+ /* H8/SX has a richer set of logical shifts.  */
+ if (TARGET_H8300SX
+ && (code == ASHIFT || code == LSHIFTRT))
+   return (exact_log2 (n) >= 0) ? 2 : 4;
+
  n = info.remainder;
 
  if (info.shift2 != NULL)


Re: [PATCH] c++: End lifetime of objects in constexpr after destructor call [PR71093]

2023-12-10 Thread Richard Biener



> On 10.12.2023 at 12:21, Alexander Monakov wrote:
> 
> 
> On Sun, 10 Dec 2023, Richard Biener wrote:
> 
>>> It seems wrong to me: CLOBBER_EOL is documented to mean that the storage is
>>> expiring at that point as well, which a (pseudo-)destructor does not imply;
>>> it's perfectly valid to destroy an object and then create another in the
>>> same storage.
>>> 
>>> We probably do want another clobber kind for end of object lifetime. And/or
>>> one for beginning of object lifetime.
>> 
>> There’s not much semantically different between UNDEF and end of object but
>> not storage lifetime?  At least for what middle-end optimizations do.
>> 
>> EOL is used by stack slot sharing and that operates on the underlying 
>> storage,
>> not individual objects live in it.
> 
> I thought EOL implies that ASan may poison underlying memory. In the respin
> of the Valgrind interop patch we instrument CLOBBER_UNDEF, but not 
> CLOBBER_EOL.

EOL is like free (), while UNDEF is more like malloc ().

Richard 


> Alexander


Re: [ARC PATCH] Add *extvsi_n_0 define_insn_and_split for PR 110717.

2023-12-10 Thread Jeff Law




On 12/5/23 06:59, Roger Sayle wrote:


This patch improves the code generated for bitfield sign extensions on
ARC cpus without a barrel shifter.


Compiling the following test case:

int foo(int x) { return (x<<27)>>27; }

with -O2 -mcpu=em, generates two loops:

foo:mov lp_count,27
 lp  2f
 add r0,r0,r0
 nop
2:  # end single insn loop
 mov lp_count,27
 lp  2f
 asr r0,r0
 nop
2:  # end single insn loop
 j_s [blink]


and the closely related test case:

struct S { int a : 5; };
int bar (struct S *p) { return p->a; }

generates the slightly better:

bar:ldb_s   r0,[r0]
 mov_s   r2,0;3
 add3r0,r2,r0
 sexb_s  r0,r0
 asr_s   r0,r0
 asr_s   r0,r0
 j_s.d   [blink]
 asr_s   r0,r0

which uses 6 instructions to perform this particular sign extension.
It turns out that sign extensions can always be implemented using at
most three instructions on ARC (without a barrel shifter) using the
idiom ((x&mask)^msb)-msb [as described in section "2-5 Sign Extension"
of Henry Warren's book "Hacker's Delight"].  Using this, the sign
extensions above on ARC's EM both become:

 bmsk_s  r0,r0,4
 xor r0,r0,32
 sub r0,r0,32

which takes about 3 cycles, compared to the ~112 cycles for the loops
in foo.
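
The equivalence behind the idiom is easy to check in C; for the 5-bit
case in foo above (an illustrative sketch, not part of the patch):

  /* (x << 27) >> 27 sign extends the low 5 bits; the shift-free form
     masks to 5 bits, then flips and subtracts the sign bit.  */
  int foo2 (int x) { return ((x & 31) ^ 16) - 16; }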


Tested with a cross-compiler to arc-linux hosted on x86_64,
with no new (compile-only) regressions from make -k check.
Ok for mainline if this passes Claudiu's nightly testing?


2023-12-05  Roger Sayle  

gcc/ChangeLog
 * config/arc/arc.md (*extvsi_n_0): New define_insn_and_split to
 implement SImode sign extract using a AND, XOR and MINUS sequence.
Note with this sequence in place (assuming it moves forward on the ARC), 
you may be able to build a better generalized signed bitfield extraction.


Rather than a shift-left followed by an arithmetic shift right, you can 
instead do a logical shift right to get the field into the LSBs, then 
use the this pattern to implement the sign extension from the MSB of the 
field.


Given it saves a potentially very expensive shift, it may be worth 
exploring for the ARC.


I've done this for the H8.  Not every sequence is better, but many are. 
There's improvements that could be made, but we're probably capturing 
the vast majority of the benefit in the patch I'm currently testing.


Anyway just thought I'd point out the natural follow-on from this effort.

Jeff


[PATCH] tree-cfg: Fix misleading error message in verify_gimple_assign_single.

2023-12-10 Thread xndcn
gcc/ChangeLog:

* tree-cfg.cc (verify_gimple_assign_single): Fix misleading error, from 
"invalid LHS ..." to "invalid RHS ..."

Signed-off-by: xndcn 
---
 gcc/tree-cfg.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index d784b9115..f041786b3 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -4695,7 +4695,7 @@ verify_gimple_assign_single (gassign *stmt)
   if (!is_gimple_reg (lhs)
  && is_gimple_reg_type (TREE_TYPE (lhs)))
{
- error ("invalid RHS for gimple memory store: %qs", code_name);
+ error ("invalid LHS for gimple memory store: %qs", code_name);
  debug_generic_stmt (lhs);
  debug_generic_stmt (rhs1);
  return true;
@@ -4721,7 +4721,10 @@ verify_gimple_assign_single (gassign *stmt)
  && !is_gimple_reg (rhs1)
  && is_gimple_reg_type (TREE_TYPE (lhs)))
{
- error ("invalid RHS for gimple memory store: %qs", code_name);
+ if (!is_gimple_reg (rhs1))
+   error ("invalid RHS for gimple memory store: %qs", code_name);
+ else
+   error ("invalid LHS for gimple memory store: %qs", code_name);
  debug_generic_stmt (lhs);
  debug_generic_stmt (rhs1);
  return true;
-- 
2.25.1



[PATCH] tree-cfg: Fix misleading error message in verify_gimple_assign_single

2023-12-10 Thread xndcn
Sorry about the attachment, re-paste here




Re: [PATCH v7 1/1] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-12-10 Thread waffl3x
On Friday, December 8th, 2023 at 12:25 PM, Jason Merrill  
wrote:
> 
> 
> On 12/6/23 02:33, waffl3x wrote:
> 
> > Here is the next version, it feels very close to finished. As before, I
> > haven't ran a bootstrap or the full testsuite yet but I did run the
> > explicit-obj tests which completed as expected.
> > 
> > There's a few test cases that still need to be written but more tests
> > can always be added. The behavior added by CWG2789 works in at least
> > one case, but I have not added tests for it yet. The test cases for
> > dependent lambda expressions need to be fleshed out more, but a few
> > temporary ones are included to demonstrate that they do work and that
> > the crash is fixed. Explicit object conversion functions work, but I
> > need to add fleshed out tests for them, explicit-obj-basic5.C has that
> > test.
> 
> > @@ -6586,6 +6586,17 @@ add_candidates (tree fns, tree first_arg, const 
> > vec<tree, va_gc> *args,
> > + /* FIXME: I believe this will be bugged for xobj member functions,
> > + leaving this comment here to make sure we look into it
> > + at some point.
> > + Seeing this makes me want correspondence checking to be unified
> > + in one place though, not sure if this one needs to be different
> > + from other ones though.
> > + This function is only used here, but maybe we can use it in add
> > + method and move some of the logic out of there?
> 
> 
> fns_correspond absolutely needs updating to handle xob fns, and doing
> that by unifying it with add_method's calculation would be good.
> 
> > + Side note: CWG2586 might be relevant for this area in
> > + particular, perhaps we wait to see if it gets accepted first? */
> 
> 
> 2586 was accepted last year.
> 
> > @@ -12574,17 +12601,25 @@ cand_parms_match (z_candidate *c1, z_candidate *c2)
> > fn1 = DECL_TEMPLATE_RESULT (t1);
> > fn2 = DECL_TEMPLATE_RESULT (t2);
> > }
> > + /* The changes I made here might be stuff I was told not to worry about?
> > + I'm not really sure so I'm going to leave it in. */
> 
> 
> Good choice, this comment can go.
> 
> > tree parms1 = TYPE_ARG_TYPES (TREE_TYPE (fn1));
> > tree parms2 = TYPE_ARG_TYPES (TREE_TYPE (fn2));
> > if (DECL_FUNCTION_MEMBER_P (fn1)
> > && DECL_FUNCTION_MEMBER_P (fn2)
> > - && (DECL_NONSTATIC_MEMBER_FUNCTION_P (fn1)
> > - != DECL_NONSTATIC_MEMBER_FUNCTION_P (fn2)))
> > + && (DECL_STATIC_FUNCTION_P (fn1)
> > + != DECL_STATIC_FUNCTION_P (fn2)))
> > {
> > /* Ignore 'this' when comparing the parameters of a static member
> > function with those of a non-static one. */
> > - parms1 = skip_artificial_parms_for (fn1, parms1);
> > - parms2 = skip_artificial_parms_for (fn2, parms2);
> > + auto skip_parms = [](tree fn, tree parms){
> > + if (DECL_XOBJ_MEMBER_FUNCTION_P (fn))
> > + return TREE_CHAIN (parms);
> > + else
> > + return skip_artificial_parms_for (fn, parms);
> > + };
> > + parms1 = skip_parms (fn1, parms1);
> > + parms2 = skip_parms (fn2, parms2);
> > }
> 
> 
> https://cplusplus.github.io/CWG/issues/2789.html fixes the handling of
> xobj fns here.

I have a minor concern here.
https://eel.is/c++draft/basic.scope.scope#3

Is it okay that CWG2789 has rules separate from the rules for
declarations? As far as I know there's nothing else that describes how
to handle the object parameters so it was my assumption that the rules
here are also used for that. I know at least one person has disagreed
with me about that so I'm not sure what is actually correct.

template <typename = int>
struct S {
  constexpr void f() {}   // #f1
  constexpr void f(this S&&) requires true {} // #f2

  constexpr void g() requires true {} // #g1
  constexpr void g(this S&&) {}   // #g2
};

void test() {
  S<>{}.f(); // calls #?
  S<>{}.g(); // calls #?
}

But with the wording proposed by CWG2789, wouldn't this example
be a problem? If we follow the standard to the letter, constraints
can't be applied here right?

I wouldn't be surprised if I'm missing something but I figured I ought
to raise it just in case. Obviously it should call #f2 and #g1 but I'm
pretty sure the current wording only allows these calls to be ambiguous.

> Your change does the right thing for comparing static and xobj, but
> doesn't handle comparing iobj and xobj; I think we want to share
> parameter comparison code with fns_correspond/add_method. Maybe
> parms_correspond?

Yeah, I'll see what I can put together. The only issue being what I
noted above, I'm not sure the rules are actually the same. I think they
should be, but my reading of things seems like it's not right now.

For the time being I'm going to assume things should work the way I
want them to, because I don't think the example I presented above
should be ambiguous.

> > @@ -8727,21 +8882,42 @@ resolve_address_of_overloaded_function (tree 
> > target_type,
> > /* Good, exactly one match. Now, convert it to the correct type. */
> > fn = TREE_PURPOSE (matches);
> > 
> > - if (DECL_NONSTATIC_MEMBER_FUNCTION_P (fn)
> > - && !(complain & tf_p

[PATCH] tree-cfg: Fix misleading error message in verify_gimple_assign_single

2023-12-10 Thread xndcn
Hi, I am a newbie in GCC, and I do not have access to git repo.

I found some misleading error messages in the verify_gimple_assign_single
function of tree-cfg.cc.  It prompts the error "invalid RHS for gimple
memory store:", but it in fact checks the lhs.
From 72fc4abe635c3e35ac31457aeba92b528f0574fe Mon Sep 17 00:00:00 2001
From: xndcn 
Date: Sun, 10 Dec 2023 22:38:16 +0800
Subject: [PATCH] tree-cfg: Fix misleading error message in
 verify_gimple_assign_single.

gcc/ChangeLog:

	* tree-cfg.cc (verify_gimple_assign_single): Fix misleading error, from "invalid LHS ..." to "invalid RHS ..."

Signed-off-by: xndcn 
---
 gcc/tree-cfg.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index d784b9115..f041786b3 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -4695,7 +4695,7 @@ verify_gimple_assign_single (gassign *stmt)
   if (!is_gimple_reg (lhs)
 	  && is_gimple_reg_type (TREE_TYPE (lhs)))
 	{
-	  error ("invalid RHS for gimple memory store: %qs", code_name);
+	  error ("invalid LHS for gimple memory store: %qs", code_name);
 	  debug_generic_stmt (lhs);
 	  debug_generic_stmt (rhs1);
 	  return true;
@@ -4721,7 +4721,10 @@ verify_gimple_assign_single (gassign *stmt)
 	  && !is_gimple_reg (rhs1)
 	  && is_gimple_reg_type (TREE_TYPE (lhs)))
 	{
-	  error ("invalid RHS for gimple memory store: %qs", code_name);
+	  if (!is_gimple_reg (rhs1))
+	error ("invalid RHS for gimple memory store: %qs", code_name);
+	  else
+	error ("invalid LHS for gimple memory store: %qs", code_name);
 	  debug_generic_stmt (lhs);
 	  debug_generic_stmt (rhs1);
 	  return true;
-- 
2.25.1



[PING] PR112380: Defend against CLOBBERs in RTX expressions in combine.cc

2023-12-10 Thread Roger Sayle


I'd like to ping my patch for PR rtl-optimization/112380.
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636203.html

For those unfamiliar with the (clobber (const_int 0)) idiom used by
combine, I'll explain a little of the ancient history...

Back before time, in the prehistory of git/subversion/cvs or even
ChangeLogs,
in March 1987 to be precise, Richard Stallman's GCC version 0.9, had RTL
optimization passes similar to those in use today.  This far back, combine.c
contained the function gen_lowpart_for_combine, which was documented as
"Like gen_lowpart but for use in combine" where "it is not possible to
create any new pseudoregs." and "return zero if we don't see a way to
make a lowpart.".  And indeed, this function returned (rtx)0, and the
single caller of gen_lowpart_for_combine checked whether the return value
was non-zero.

Unfortunately, gcc 0.9's combine also contained bugs; At three places in
combine.c, it called gen_lowpart, the first of these looked like:
  return gen_rtx (AND, GET_MODE (x),
  gen_lowpart (GET_MODE (x), XEXP (to, 0)),
  XEXP (to, 1));

Time passes, and by version 1.21 in May 1988 (in fact before the
earliest ChangeLogs were introduced for version 1.17 in January 1988),
this issue had been identified, and a helpful reminder placed at the
top of the code:

/* It is not safe to use ordinary gen_lowpart in combine.
   Use gen_lowpart_for_combine instead.  See comments there.  */
#define gen_lowpart dont_use_gen_lowpart_you_dummy

However, to save a little effort, and avoid checking the return value
for validity at all of the callers of gen_lowpart_for_combine, RMS
invented the "(clobber (const_int 0))" idiom, which was returned
instead of zero.  The comment above gen_lowpart_for_combine was
modified to state:

/* If for some reason this cannot do its job, an rtx
   (clobber (const_int 0)) is returned.
   An insn containing that will not be recognized.  */

Aside: Around this time Bjarne Stroustrup was also trying to avoid
testing function return values for validity, so he introduced exceptions
into C++.

Thirty five years later this decision (short-cut) still haunts combine.
Using "(clobber (const_int 0))", like error_mark_node, that can appear
anywhere in a RTX expression makes it hard to impose strict typing (to
catch things like a CLOBBER of a CLOBBER) and as shown by bugzilla's
PR rtl-optimization/112380 these RTX occasionally escape from combine
to cause problems in generic RTL handling functions.

This patch doesn't eliminate combine.cc's use of (clobber (const_int 0));
we still allocate memory to indicate exceptional conditions, and require
the garbage collector to clean things up, but testing the values returned
from functions for errors/exceptions is good software engineering, and
hopefully a step in the right direction.  I'd hoped allowing combine to
continue exploring alternate simplifications would also lead to better
code generation, but I've not been able to find any examples on x86_64.
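
The checks themselves are small; a typical guard added by the patch has
this shape (a sketch, not a literal hunk from the patch):

  /* Treat a returned (clobber (const_int 0)) as "no valid
     simplification" instead of substituting it into the insn.  */
  rtx tem = gen_lowpart (mode, x);
  if (GET_CODE (tem) == CLOBBER)
    return x;  /* keep the original rtx */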

This patch has been retested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-11-12  Roger Sayle  

gcc/ChangeLog
PR rtl-optimization/112380
* combine.cc (find_split_point): Check if gen_lowpart returned
a CLOBBER.
(subst): Check if combine_simplify_rtx returned a CLOBBER.
(simplify_set): Check if force_to_mode returned a CLOBBER.
Check if gen_lowpart returned a CLOBBER.
(expand_field_assignment): Likewise.
(make_extraction): Check if force_to_mode returned a CLOBBER.
(force_int_to_mode): Likewise.
(simplify_and_const_int_1): Check if VAROP is a CLOBBER, after
call to force_to_mode (and before).
(simplify_comparison): Check if force_to_mode returned a CLOBBER.
Check if gen_lowpart returned a CLOBBER.

gcc/testsuite/ChangeLog
PR rtl-optimization/112380
* gcc.dg/pr112380.c: New test case.


Thanks in advance,
Roger
--




Re: [PATCH] libstdc++: add ARM SVE support to std::experimental::simd

2023-12-10 Thread Richard Sandiford
Thanks for the patch and sorry for the slow review.

I can only comment on the usage of SVE, rather than on the scaffolding
around it.  Hopefully Jonathan or others can comment on the rest.

The main thing that worries me is:

  #if _GLIBCXX_SIMD_HAVE_SVE
  constexpr inline int __sve_vectorized_size_bytes = __ARM_FEATURE_SVE_BITS / 8;
  #else
  constexpr inline int __sve_vectorized_size_bytes = 0;
  #endif

Although -msve-vector-bits is currently a per-TU setting, that isn't
necessarily going to be the case in future.  Ideally it would be
possible to define different implementations of a function with
different (fixed) vector lengths within the same TU.  The value at
the point that the header file is included is then somewhat arbitrary.

So rather than have:

>  using __neon128 = _Neon<16>;
>  using __neon64 = _Neon<8>;
> +using __sve = _Sve<>;

would it be possible instead to have:

  using __sve128 = _Sve<128>;
  using __sve256 = _Sve<256>;
  ...etc...

?  Code specialised for 128-bit vectors could then use __sve128 and
code specialised for 256-bit vectors could use __sve256.

Perhaps that's not possible as things stand, but it'd be interesting
to know the exact failure mode if so.  Either way, it would be good to
remove direct uses of __ARM_FEATURE_SVE_BITS from simd_sve.h if possible,
and instead rely on information derived from template parameters.

It should be possible to use SVE to optimise some of the __neon*
implementations, which has the advantage of working even for VLA SVE.
That's obviously a separate patch, though.  Just saying for the record.

On:

inline static __sve_bool_type
__sve_active_mask()
{ return svwhilelt_b8(int8_t(0), int8_t(_Np)); };

I think it'd be more canonical to use svptrue_b8().  (The inputs to
svwhilelt aren't naturally tied to the size of the output elements.
This particular call will use WHILELT with 32-bit registers.  So if
you do continue to use svwhile*, it'd be good to remove the casts on
the inputs.)
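
That is, something along these lines (a sketch; _Np comes from the patch
under review):

  // Full-vector case: a single all-true predicate, reusable across types.
  svbool_t __pg_all = svptrue_b8 ();
  // Partial case: 64-bit WHILE form, no narrowing casts on the inputs.
  svbool_t __pg_n = svwhilelt_b8 (uint64_t (0), uint64_t (_Np));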

It'd also be good to use svptrue_b8() for all element sizes where possible,
and only use element-specific masks where absolutely necessary.  This helps
to promote reuse of predicates.

Or does the API require that every mask value is internally "canonical",
i.e.  every element must be 0 or 1, regardless of the element size?
If so, that could introduce some inefficiency on SVE, where upper bits
are (by design) usually ignored.

On:

  template <size_t _Np>
    struct __sve_vector_type<char, _Np>
    : __sve_vector_type<__get_sve_value_type_t<char>, _Np>
    {};

  template <size_t _Np>
    struct __sve_vector_type<char16_t, _Np>
    : __sve_vector_type<__get_sve_value_type_t<char16_t>, _Np>
    {};

  template <size_t _Np>
    struct __sve_vector_type<char32_t, _Np>
    : __sve_vector_type<__get_sve_value_type_t<char32_t>, _Np>
    {};

  template <size_t _Np>
    struct __sve_vector_type<wchar_t, _Np>
    : __sve_vector_type<__get_sve_value_type_t<wchar_t>, _Np>
    {};

  template <size_t _Np>
    struct __sve_vector_type<long long int, _Np>
    : __sve_vector_type<__get_sve_value_type_t<long long int>, _Np>
    {};

  template <size_t _Np>
    struct __sve_vector_type<unsigned long long int, _Np>
    : __sve_vector_type<__get_sve_value_type_t<unsigned long long int>, _Np>
    {};

Does this ensure complete coverage without duplicate definitions?  It seems
to be hard-coding a specific choice of stdint.h type (e.g. by including
char but not unsigned char, and long long int but not long int).
Unfortunately the stdint.h definitions vary between newlib and glibc,
for example.

inline static type
__sve_mask_first_true()
{ return svptrue_pat_b8(SV_VL1); }

SV_VL1 produces the same predicate result for all element widths,
so I don't think this needs to be parameterised.

  template <>
struct __sve_mask_type<8>
{
  ...

  inline static void
  called()
  {}

What's the called() function for?

  template <typename _To, typename _From>
_GLIBCXX_SIMD_INTRINSIC constexpr auto
__sve_reinterpret_cast(_From __v)
{
  if constexpr (std::is_same_v<_To, int32_t>)
return svreinterpret_s32(__v);
  else if constexpr (std::is_same_v<_To, int64_t>)
return svreinterpret_s64(__v);
  else if constexpr (std::is_same_v<_To, float32_t>)
return svreinterpret_f32(__v);
  else if constexpr (std::is_same_v<_To, float64_t>)
return svreinterpret_f64(__v);
}

Why does this only need to handle these 4 types?  Think it'd be
worth a comment.

  template <size_t _Bits, size_t _Width>
struct _SveMaskWrapper
{
  ...

  _GLIBCXX_SIMD_INTRINSIC constexpr value_type
  operator[](size_t __i) const
  {
return _BuiltinSveMaskType::__sve_mask_active_count(
_BuiltinSveVectorType::__sve_active_mask(),
svand_z(_BuiltinSveVectorType::__sve_active_mask(),
svcmpeq(_BuiltinSveVectorType::__sve_active_mask(),
_BuiltinSveMaskType::__index0123,
typename 
_BuiltinSveMaskType::__sve_mask_uint_type(__i)),
_M_data));
  }

A simpler way to do this might be to use svdup_u_z (_M_data, 1, 0)
and then svorrv on the result.

  struct _CommonImplSve
  {

Re: [PATCH] aarch64: arm_neon.h - Fix -Wincompatible-pointer-types errors

2023-12-10 Thread Richard Sandiford
Victor Do Nascimento  writes:
> In the Linux kernel, u64/s64 are [un]signed long long, not [un]signed
> long.  This means that when the `arm_neon.h' header is used by the
> kernel, any use of the `uint64_t' / `int64_t' types needs to be
> correctly cast to the correct `__builtin_aarch64_simd_di' /
> `__builtin_aarch64_simd_df' types when calling the relevant ACLE
> builtins.
>
> This patch adds the necessary fixes to ensure that `vstl1_*' and
> `vldap1_*' intrinsics are correctly defined for use by the kernel.

The patch is OK, but I think it's only a workaround.  The compiler
has its own idea of what the stdint.h types are, with the choice
being guided by the runtime (so glibc for *-linux-gnu).  GCC provides
its own implementation of stdint.h that conforms to the internal
expectations.

If linux defines the types to something else, then other things are
likely to break.  E.g. the same sort of issue would be seen if linux
ever wants to use arm_sve.h, and there'll be no simple workaround
for that case.

The types that GCC expects are available as __INT8_TYPE__ etc.
I think linux ACLE code should try to use those (typedefed to
prettier names), and handle the difference from linux's types
at API boundaries.
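
Concretely, that amounts to something like (a sketch; the typedef names
are illustrative):

  /* Derive the ACLE-facing 64-bit types from the compiler's own
     stdint.h choices instead of the kernel's u64/s64.  */
  typedef __UINT64_TYPE__ acle_u64;
  typedef __INT64_TYPE__  acle_s64;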

But the patch is still OK.  Good catch on the stray "const" in
vstl1_lane_u64 btw.

Thanks,
Richard

>
> gcc/ChangeLog:
>
>   * config/aarch64/arm_neon.h (vldap1_lane_u64): Add
>   `const' to `__builtin_aarch64_simd_di *' cast.
>   (vldap1q_lane_u64): Likewise.
>   (vldap1_lane_s64): Cast __src to `const __builtin_aarch64_simd_di *'.
>   (vldap1q_lane_s64): Likewise.
>   (vldap1_lane_f64): Cast __src to `const __builtin_aarch64_simd_df *'.
>   (vldap1q_lane_f64): Cast __src to `const __builtin_aarch64_simd_df *'.
>   (vldap1_lane_p64): Add `const' to `__builtin_aarch64_simd_di *' cast.
>   (vldap1q_lane_p64): Add `const' to `__builtin_aarch64_simd_di *' cast.
>   (vstl1_lane_u64): Remove stray `const'.
>   (vstl1_lane_s64): Cast __src to `__builtin_aarch64_simd_di *'.
>   (vstl1q_lane_s64): Likewise.
>   (vstl1_lane_f64): Cast __src to `__builtin_aarch64_simd_df *'.
>   (vstl1q_lane_f64): Likewise.
> ---
>  gcc/config/aarch64/arm_neon.h | 34 +-
>  1 file changed, 21 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index ef0d75e07ce..f394de595f7 100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -13456,7 +13456,7 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vldap1_lane_u64 (const uint64_t *__src, uint64x1_t __vec, const int __lane)
>  {
>return __builtin_aarch64_vec_ldap1_lanev1di_usus (
> -   (__builtin_aarch64_simd_di *) __src, __vec, __lane);
> +   (const __builtin_aarch64_simd_di *) __src, __vec, __lane);
>  }
>  
>  __extension__ extern __inline uint64x2_t
> @@ -13464,35 +13464,39 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vldap1q_lane_u64 (const uint64_t *__src, uint64x2_t __vec, const int __lane)
>  {
>return __builtin_aarch64_vec_ldap1_lanev2di_usus (
> -   (__builtin_aarch64_simd_di *) __src, __vec, __lane);
> +   (const __builtin_aarch64_simd_di *) __src, __vec, __lane);
>  }
>  
>  __extension__ extern __inline int64x1_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vldap1_lane_s64 (const int64_t *__src, int64x1_t __vec, const int __lane)
>  {
> -  return __builtin_aarch64_vec_ldap1_lanev1di (__src, __vec, __lane);
> +  return __builtin_aarch64_vec_ldap1_lanev1di (
> +   (const __builtin_aarch64_simd_di *) __src, __vec, __lane);
>  }
>  
>  __extension__ extern __inline int64x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vldap1q_lane_s64 (const int64_t *__src, int64x2_t __vec, const int __lane)
>  {
> -  return __builtin_aarch64_vec_ldap1_lanev2di (__src, __vec, __lane);
> +  return __builtin_aarch64_vec_ldap1_lanev2di (
> +   (const __builtin_aarch64_simd_di *) __src, __vec, __lane);
>  }
>  
>  __extension__ extern __inline float64x1_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vldap1_lane_f64 (const float64_t *__src, float64x1_t __vec, const int __lane)
>  {
> -  return __builtin_aarch64_vec_ldap1_lanev1df (__src, __vec, __lane);
> +  return __builtin_aarch64_vec_ldap1_lanev1df (
> +   (const __builtin_aarch64_simd_df *) __src, __vec, __lane);
>  }
>  
>  __extension__ extern __inline float64x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vldap1q_lane_f64 (const float64_t *__src, float64x2_t __vec, const int __lane)
>  {
> -  return __builtin_aarch64_vec_ldap1_lanev2df (__src, __vec, __lane);
> +  return __builtin_aarch64_vec_ldap1_lanev2df (
> +   (const __builtin_aarch64_simd_df *) __src, __vec, __lane);
>  }
>  
>  __extension__ extern __inline poly64x1_t
> @@ -13500,7 +1350

Re: [PATCH v5] libstdc++: Remove UB from month and weekday additions and subtractions.

2023-12-10 Thread Jonathan Wakely
On Sat, 25 Nov 2023 at 13:52, Cassio Neri wrote:
>
> The following invoke signed integer overflow (UB) [1]:
>
>   month   + months{MAX} // where MAX is the maximum value of months::rep
>   month   + months{MIN} // where MIN is the minimum value of months::rep
>   month   - months{MIN} // where MIN is the minimum value of months::rep
>   weekday + days  {MAX} // where MAX is the maximum value of days::rep
>   weekday - days  {MIN} // where MIN is the minimum value of days::rep
>
> For the additions to MAX, the crux of the problem is that, in libstdc++,
> months::rep and days::rep are int64_t. Other implementations use int32_t, cast
> operands to int64_t and perform arithmetic operations without risk of
> overflowing.
>
> For month + months{MIN}, the implementation follows the Standard's "returns
> clause" and evaluates:
>
>    modulo(static_cast<long long>(unsigned{__x}) + (__y.count() - 1), 12);
>
> Overflow occurs when MIN - 1 is evaluated. Casting to a larger type could help
> but, unfortunately again, this is not possible for libstdc++.
>
> For the subtraction of MIN, the problem is that -MIN is not representable.
>
> It's fair to say that the intention is for these additions/subtractions to
> be performed in modulus (12 or 7) arithmetic so that no overflow is expected.
>
> To fix these UB, this patch implements:
>
>   template <unsigned __d, typename _T>
>   unsigned __add_modulo(unsigned __x, _T __y);
>
>   template <unsigned __d, typename _T>
>   unsigned __sub_modulo(unsigned __x, _T __y);
>
> which, respectively, return the remainder of the Euclidean division of __x + __y
> and __x - __y by __d without overflowing. These functions replace
>
>   constexpr unsigned __modulo(long long __n, unsigned __d);
>
> which also calculates the remainder of __n, where __n is the result of the
> addition or subtraction. Hence, these operations might invoke UB before 
> __modulo
> is called and thus, __modulo can't do anything to remediate the issue.
>
> In addition to solving the UB issues, __add_modulo and __sub_modulo allow better
> codegen (shorter and branchless) on x86-64 and ARM [2].
>
> [1] https://godbolt.org/z/a9YfWdn57
> [2] https://godbolt.org/z/Gh36cr7E4
>
> libstdc++-v3/ChangeLog:
>
> * include/std/chrono: Fix + and - for months and weekdays.
> * testsuite/std/time/month/1.cc: Add constexpr tests against overflow.
> * testsuite/std/time/month/2.cc: New test for extreme values.
> * testsuite/std/time/weekday/1.cc: Add constexpr tests against 
> overflow.
> * testsuite/std/time/weekday/2.cc: New test for extreme values.
> ---
> Good for trunk?
>
> Changes with respect to previous versions:
>   v5: Fix typos in commit message.
>   v4: Remove UB also from operator-.
>   v3: Fix screwed up email send with v2.
>   v2: Replace _T with _Tp and _U with _Up. Remove copyright+license from test.
>
>  libstdc++-v3/include/std/chrono  | 79 +---
>  libstdc++-v3/testsuite/std/time/month/1.cc   | 19 +
>  libstdc++-v3/testsuite/std/time/month/2.cc   | 32 
>  libstdc++-v3/testsuite/std/time/weekday/1.cc | 16 +++-
>  libstdc++-v3/testsuite/std/time/weekday/2.cc | 32 
>  5 files changed, 151 insertions(+), 27 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/std/time/month/2.cc
>  create mode 100644 libstdc++-v3/testsuite/std/time/weekday/2.cc
>
> diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
> index e4ba6eafceb..1b7cdb96e1c 100644
> --- a/libstdc++-v3/include/std/chrono
> +++ b/libstdc++-v3/include/std/chrono
> @@ -501,18 +501,47 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>  namespace __detail
>  {
> -  // Compute the remainder of the Euclidean division of __n divided by __d.
> -  // Euclidean division truncates toward negative infinity and always
> -  // produces a remainder in the range of [0,__d-1] (whereas standard
> -  // division truncates toward zero and yields a nonpositive remainder
> -  // for negative __n).
> +  // Helper to __add_modulo and __sub_modulo.
> +  template <unsigned __d, typename _Tp>
> +  constexpr auto

I think this can be consteval. We only use it in C++20 code.

Otherwise this looks good, so I'll make that change and push to trunk. Thanks.

P.S. for some targets we *could* do the arithmetic using 128-bit
integers, but we'd still need this code for targets that don't support
that type. So I think to keep things simpler we might as well just use
this code everywhere. If using 128-bit integers would produce
significantly better codegen we can consider that later.


> +  __modulo_offset()
> +  {
> +   using _Up = make_unsigned_t<_Tp>;
> +   auto constexpr __a = _Up(-1) - _Up(255 + __d - 2);
> +   auto constexpr __b = _Up(__d * (__a / __d) - 1);
> +   // Notice: b <= a - 1 <= _Up(-1) - (255 + d - 1) and b % d = d - 1.
> +   return _Up(-1) - __b; // >= 255 + d - 1
> +  }
> +
> +  // Compute the remainder of the Euclidean division of __x + __y divided by
> +  // __d without overflowing.  Ty

[PING, PATCH v5] libstdc++: Remove UB from month and weekday additions and subtractions.

2023-12-10 Thread Cassio Neri
The following invoke signed integer overflow (UB) [1]:

  month   + months{MAX} // where MAX is the maximum value of months::rep
  month   + months{MIN} // where MIN is the minimum value of months::rep
  month   - months{MIN} // where MIN is the minimum value of months::rep
  weekday + days  {MAX} // where MAX is the maximum value of days::rep
  weekday - days  {MIN} // where MIN is the minimum value of days::rep

For the additions to MAX, the crux of the problem is that, in libstdc++,
months::rep and days::rep are int64_t. Other implementations use int32_t, cast
operands to int64_t and perform arithmetic operations without risk of
overflowing.

For month + months{MIN}, the implementation follows the Standard's "returns
clause" and evaluates:

   modulo(static_cast<long long>(unsigned{__x}) + (__y.count() - 1), 12);

Overflow occurs when MIN - 1 is evaluated. Casting to a larger type could help
but, unfortunately again, this is not possible for libstdc++.

For the subtraction of MIN, the problem is that -MIN is not representable.

It's fair to say that the intention is for these additions/subtractions to
be performed in modulus (12 or 7) arithmetic so that no overflow is expected.

To fix these UB, this patch implements:

  template <unsigned __d, typename _T>
  unsigned __add_modulo(unsigned __x, _T __y);

  template <unsigned __d, typename _T>
  unsigned __sub_modulo(unsigned __x, _T __y);

which, respectively, return the remainder of the Euclidean division of __x + __y
and __x - __y by __d without overflowing. These functions replace

  constexpr unsigned __modulo(long long __n, unsigned __d);

which also calculates the remainder of __n, where __n is the result of the
addition or subtraction. Hence, these operations might invoke UB before __modulo
is called and thus, __modulo can't do anything to remediate the issue.
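
For reference, the intended mathematical behaviour is the Euclidean
remainder; a naive version (illustrative only) would be:

  // Reference semantics: always lands in [0, d-1], even for negative y.
  // Note this version can itself overflow for extreme y, which is
  // precisely what __add_modulo is written to avoid.
  unsigned
  add_modulo_ref(unsigned d, unsigned x, long long y)
  {
    long long r = ((long long)x + y) % (long long)d;
    return (unsigned)(r < 0 ? r + (long long)d : r);
  }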

In addition to solving the UB issues, __add_modulo and __sub_modulo allow better
codegen (shorter and branchless) on x86-64 and ARM [2].

[1] https://godbolt.org/z/a9YfWdn57
[2] https://godbolt.org/z/Gh36cr7E4

libstdc++-v3/ChangeLog:

* include/std/chrono: Fix + and - for months and weekdays.
* testsuite/std/time/month/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/month/2.cc: New test for extreme values.
* testsuite/std/time/weekday/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/weekday/2.cc: New test for extreme values.
---

Ping.

Good for trunk?

Changes with respect to previous versions:
  v5: Fix typos in commit message.
  v4: Remove UB also from operator-.
  v3: Fix screwed up email send with v2.
  v2: Replace _T with _Tp and _U with _Up. Remove copyright+license from test.

 libstdc++-v3/include/std/chrono  | 79 +---
 libstdc++-v3/testsuite/std/time/month/1.cc   | 19 +
 libstdc++-v3/testsuite/std/time/month/2.cc   | 32 
 libstdc++-v3/testsuite/std/time/weekday/1.cc | 16 +++-
 libstdc++-v3/testsuite/std/time/weekday/2.cc | 32 
 5 files changed, 151 insertions(+), 27 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/month/2.cc
 create mode 100644 libstdc++-v3/testsuite/std/time/weekday/2.cc

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index e4ba6eafceb..1b7cdb96e1c 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -501,18 +501,47 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

 namespace __detail
 {
-  // Compute the remainder of the Euclidean division of __n divided by __d.
-  // Euclidean division truncates toward negative infinity and always
-  // produces a remainder in the range of [0,__d-1] (whereas standard
-  // division truncates toward zero and yields a nonpositive remainder
-  // for negative __n).
+  // Helper to __add_modulo and __sub_modulo.
+  template <unsigned __d, typename _Tp>
+  constexpr auto
+  __modulo_offset()
+  {
+   using _Up = make_unsigned_t<_Tp>;
+   auto constexpr __a = _Up(-1) - _Up(255 + __d - 2);
+   auto constexpr __b = _Up(__d * (__a / __d) - 1);
+   // Notice: b <= a - 1 <= _Up(-1) - (255 + d - 1) and b % d = d - 1.
+   return _Up(-1) - __b; // >= 255 + d - 1
+  }
+
+  // Compute the remainder of the Euclidean division of __x + __y divided by
+  // __d without overflowing.  Typically, __x <= 255 + d - 1 is sum of
+  // weekday/month with a shift in [0, d - 1] and __y is a duration count.
+  template <unsigned __d, typename _Tp>
+  constexpr unsigned
+  __add_modulo(unsigned __x, _Tp __y)
+  {
+   using _Up = make_unsigned_t<_Tp>;
+   // For __y >= 0, _Up(__y) has the same mathematical value as __y and
+   // this function simply returns (__x + _Up(__y)) % d.  Typically, this
+   // doesn't overflow since the range of _Up contains many more positive
+   // values than _Tp's.  For __y < 0, _Up(__y) has a mathematical value in
+   // the upper-half range of _Up so that adding a positive value to it
+   // might overflow.  Moreover, most likely, _Up(__y

Re: [PATCH] c++: End lifetime of objects in constexpr after destructor call [PR71093]

2023-12-10 Thread Alexander Monakov

On Sun, 10 Dec 2023, Richard Biener wrote:

> > It seems wrong to me: CLOBBER_EOL is documented to mean that the storage is
> > expiring at that point as well, which a (pseudo-)destructor does not imply;
> > it's perfectly valid to destroy an object and then create another in the
> > same storage.
> > 
> > We probably do want another clobber kind for end of object lifetime. And/or
> > one for beginning of object lifetime.
> 
> There’s not much semantically different between UNDEF and end of object but
> not storage lifetime?  At least for what middle-end optimizations do.
> 
> EOL is used by stack slot sharing and that operates on the underlying storage,
> not individual objects live in it.

I thought EOL implies that ASan may poison underlying memory. In the respin
of the Valgrind interop patch we instrument CLOBBER_UNDEF, but not CLOBBER_EOL.

Alexander

Re: [PATCH] c++: End lifetime of objects in constexpr after destructor call [PR71093]

2023-12-10 Thread Richard Biener



> On 09.12.2023 at 21:13, Jason Merrill wrote:
> 
> On 11/2/23 21:18, Nathaniel Shead wrote:
>> Bootstrapped and regtested on x86-64_pc_linux_gnu.
>> I'm not entirely sure if the change I made to have destructors clobber with
>> CLOBBER_EOL instead of CLOBBER_UNDEF is appropriate, but nothing seemed to 
>> have
>> broken by doing this and I wasn't able to find anything else that really
>> depended on this distinction other than a warning pass. Otherwise I could
>> experiment with a new clobber kind for destructor calls.
> 
> It seems wrong to me: CLOBBER_EOL is documented to mean that the storage is 
> expiring at that point as well, which a (pseudo-)destructor does not imply; 
> it's perfectly valid to destroy an object and then create another in the same 
> storage.
> 
> We probably do want another clobber kind for end of object lifetime. And/or 
> one for beginning of object lifetime.

There’s not much semantically different between UNDEF and end of object but not 
storage lifetime?  At least for what middle-end optimizations do.

EOL is used by stack slot sharing and that operates on the underlying storage, 
not individual objects live in it.
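
For reference, the (C++20) pattern in question, where the object's
lifetime ends but its storage remains usable, is e.g.:

  // Valid: destroy the object, then construct a new one in place.
  std::destroy_at (p);
  std::construct_at (p);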

Richard 

> Jason
> 


Re: [PATCH] expr: catch more `a*bool` while expanding [PR 112935]

2023-12-10 Thread Xi Ruoyao
On Sun, 2023-12-10 at 01:21 -0800, Andrew Pinski wrote:
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 6da51f2aca2..4686cacd22f 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -10209,8 +10209,9 @@ expand_expr_real_2 (sepops ops, rtx target,
> machine_mode tmode,
>    /* Expand X*Y as X&-Y when Y must be zero or one.  */
>    if (SCALAR_INT_MODE_P (mode))
>   {
> -   bool bit0_p = tree_nonzero_bits (treeop0) == 1;
> -   bool bit1_p = tree_nonzero_bits (treeop1) == 1;
> +   bool gimple_zero_one_valued_p (tree, tree (*)(tree));

Should we declare this in the file scope instead?

> +   bool bit0_p = gimple_zero_one_valued_p (treeop0, nullptr);
> +   bool bit1_p = gimple_zero_one_valued_p (treeop1, nullptr);

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH] expr: catch more `a*bool` while expanding [PR 112935]

2023-12-10 Thread Andrew Pinski
After r14-1655-g52c92fb3f40050 (and the other commits
which touch zero_one_valued_p), we end up with
`bool * a` where the bool is an SSA name whose nonzero-bits
info might not be set (to 0x1) even though its nonzero bits
really are 0x1.
In the case of coremark, it is only phiopt4 which adds the new
ssa name, and nothing afterwards updates the nonzero bits on it.
This fixes the regression by using gimple_zero_one_valued_p
rather than tree_nonzero_bits to match the cases where the
SSA_NAME didn't have the non-zero bits set.
gimple_zero_one_valued_p handles one level of cast and also
an `&`.
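
A reduced example of the shape involved (illustrative, not the actual
coremark source):

  /* b is 0/1-valued, but the SSA name phiopt4 creates for it may lack
     nonzero-bits info; a * b should still expand as a & -b.  */
  int f (int a, int c) { int b = c != 0; return a * b; }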

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

PR middle-end/112935
* expr.cc (expand_expr_real_2): Use
gimple_zero_one_valued_p instead of tree_nonzero_bits
to find boolean defined expressions.

Signed-off-by: Andrew Pinski 
---
 gcc/expr.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 6da51f2aca2..4686cacd22f 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -10209,8 +10209,9 @@ expand_expr_real_2 (sepops ops, rtx target, 
machine_mode tmode,
   /* Expand X*Y as X&-Y when Y must be zero or one.  */
   if (SCALAR_INT_MODE_P (mode))
{
- bool bit0_p = tree_nonzero_bits (treeop0) == 1;
- bool bit1_p = tree_nonzero_bits (treeop1) == 1;
+ bool gimple_zero_one_valued_p (tree, tree (*)(tree));
+ bool bit0_p = gimple_zero_one_valued_p (treeop0, nullptr);
+ bool bit1_p = gimple_zero_one_valued_p (treeop1, nullptr);
 
  /* Expand X*Y as X&Y when both X and Y must be zero or one.  */
  if (bit0_p && bit1_p)
-- 
2.39.3



Re: [PATCH] strub: skip emutls after strubm errors

2023-12-10 Thread FX Coudert
> Yes, GCC/nvptx ICEs gone with that, thanks!

And on darwin as well, test results are back to the same state as before. 
Thanks!

FX

[PATCH] aarch64: Fix wrong code for bfloat when f16 is enabled [PR 111867]

2023-12-10 Thread Andrew Pinski
The problem here is when f16 is enabled, movbf_aarch64 accepts `Ufc`
as a constraint:
 [ w, Ufc ; fconsts , fp16  ] fmov\t%h0, %1
But that alternative is for fmov immediates, and the fmov #imm encoding
represents f16 values rather than bfloat16 values.
This means we would get the wrong value in the register.

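A reduced sketch of the kind of constant involved (the real reproducers
are the torture tests named below):

  /* A non-zero bfloat16 constant must not be emitted as an f16 fmov
     immediate; after the fix only +0.0 uses fmov for BFmode.  */
  __bf16 one (void) { return (__bf16) 1.0f; }
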
Built and tested for aarch64-linux-gnu with no regressions.  Also tested
with `-march=armv9-a+sve2`;
gcc.dg/torture/bfloat16-basic.c and gcc.dg/torture/bfloat16-builtin.c no
longer fail.

gcc/ChangeLog:

PR target/111867
* config/aarch64/aarch64.cc (aarch64_float_const_representable_p): For 
BFmode,
only accept +0.0.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64.cc | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 5cffdabc62e..d48f5a1ba4b 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -23904,6 +23904,7 @@ aarch64_float_const_representable_p (rtx x)
 
   r = *CONST_DOUBLE_REAL_VALUE (x);
 
+
   /* We cannot represent infinities, NaNs or +/-zero.  We won't
  know if we have +zero until we analyse the mantissa, but we
  can reject the other invalid values.  */
@@ -23911,6 +23912,10 @@ aarch64_float_const_representable_p (rtx x)
   || REAL_VALUE_MINUS_ZERO (r))
 return false;
 
+  /* For BFmode, only handle 0.0. */
+  if (GET_MODE (x) == BFmode)
+return real_iszero (&r, false);
+
   /* Extract exponent.  */
   r = real_value_abs (&r);
   exponent = REAL_EXP (&r);
-- 
2.39.3