Re: [PATCH] rs6000: Enable overlapped by-pieces operations

2024-05-08 Thread Kewen.Lin
Hi,

on 2024/5/8 14:47, HAO CHEN GUI wrote:
> Hi,
>   This patch enables overlapped by-piece operations. On rs6000, default
> move/set/clear ratio is 2. So the overlap is only enabled with compare
> by-pieces.

Thanks for enabling this.  Did you evaluate whether it helps any benchmarks?

> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Enable overlapped by-pieces operations
> 
> This patch enables overlapped by-piece operations by defining
> TARGET_OVERLAP_OP_BY_PIECES_P to true.  On rs6000, default move/set/clear
> ratio is 2.  So the overlap is only enabled with compare by-pieces.
> 
> gcc/
>   * config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/block-cmp-9.c: New.
> 
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 6b9a40fcc66..2b5f5cf1d86 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1774,6 +1774,9 @@ static const scoped_attribute_specs *const 
> rs6000_attribute_table[] =
>  #undef TARGET_CONST_ANCHOR
>  #define TARGET_CONST_ANCHOR 0x8000
> 
> +#undef TARGET_OVERLAP_OP_BY_PIECES_P
> +#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
> +
>  
> 
>  /* Processor table.  */
> diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c 
> b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
> new file mode 100644
> index 000..b5f51affbb7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */

Why does it need power8 forced here?

BR,
Kewen

> +/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */
> +
> +/* Test if by-piece overlap compare is enabled and following case is
> +   implemented by two overlap word loads and compares.  */
> +
> +int foo (const char* s1, const char* s2)
> +{
> +  return __builtin_memcmp (s1, s2, 7) == 0;
> +}
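
For readers unfamiliar with overlapped by-pieces compares, here is a minimal C++ sketch of the idea the test checks for: a 7-byte equality test done with two 4-byte loads per operand (bytes 0-3 and bytes 3-6, overlapping at byte 3) instead of word + halfword + byte loads.  The function name equal7_overlapped is hypothetical and is not part of the patch; the code GCC actually emits may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not from the patch: compares the first 7 bytes
// of s1 and s2 for equality using two overlapping 4-byte loads each.
bool equal7_overlapped (const char *s1, const char *s2)
{
  assert (s1 != nullptr && s2 != nullptr);

  std::uint32_t a_lo, b_lo, a_hi, b_hi;
  std::memcpy (&a_lo, s1, 4);      // bytes 0..3
  std::memcpy (&b_lo, s2, 4);
  std::memcpy (&a_hi, s1 + 3, 4);  // bytes 3..6, overlapping byte 3
  std::memcpy (&b_hi, s2 + 3, 4);

  // Equal only if both word pairs match; byte 3 is simply checked twice.
  return ((a_lo ^ b_lo) | (a_hi ^ b_hi)) == 0;
}
```

The overlap costs one redundantly compared byte but avoids the narrower lhz/lbz loads that the scan-assembler-not directive in the test rules out.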



Re: [PATCH 2/4] fortran: Teach get_real_kind_from_node for Power 128 fp modes [PR112993]

2024-05-08 Thread Kewen.Lin
Hi,

on 2024/5/9 06:01, Steve Kargl wrote:
> On Wed, May 08, 2024 at 01:27:53PM +0800, Kewen.Lin wrote:
>>
>> Previously the effective target fortran_real_c_float128 never
>> passed on Power, regardless of whether the default 128-bit long
>> double is ibmlongdouble or ieeelongdouble.  This is because TF
>> mode is always used for kind 16 real, which has precision
>> 127, while the node float128_type_node for c_float128 has
>> type precision 128, so get_real_kind_from_node can't find a
>> match as it only compares gfc_real_kinds[i].mode_precision
>> with the type precision.
>>
>> With TFmode/IFmode/KFmode changed to have the same mode
>> precision 128, fortran_real_c_float128 can now pass with
>> ieeelongdouble enabled by default, and test cases guarded
>> with it get tested accordingly.  But with ibmlongdouble
>> enabled by default, since TFmode has precision 128, which
>> is the same as the type precision 128 of float128_type_node,
>> get_real_kind_from_node considers the kind for TFmode to match
>> float128_type_node.  That is wrong, as at this point
>> TFmode has the IBM extended format.  So this patch
>> teaches get_real_kind_from_node to check one more field that
>> distinguishes the underlying real format, which
>> avoids the unexpected match when more than one
>> mode has the same precision.
>>
>> Bootstrapped and regress-tested on:
>>   - powerpc64-linux-gnu P8/P9 (with ibm128 by default)
>>   - powerpc64le-linux-gnu P9/P10 (with ibm128 by default)
>>   - powerpc64le-linux-gnu P9 (with ieee128 by default)
>>
>> BR,
>> Kewen
>> -
>>  PR target/112993
>>

> OK from the fortran point of view.
> Thanks.

> 
> First, I have no issue with Mikael's OK for committing the
> patch. 

Thanks to both!

> 
> That said, Fortran has the concept of model numbers, which
> are set in arith.c.  Does this change give the expected 
> value for ibm128?  For example, with "REAL(16) X", one
> has "DIGITS(X) = 113", which is the precision of the
> underlying IEEE754 binary128 type.
> 

With some local testing, I noticed that DIGITS is already correct
even without this change.  For "REAL(16) X", with
-mabi=ibmlongdouble it's long double with the ibm128 format and
DIGITS(X) is 106, while with -mabi=ieeelongdouble it's long
double with the ieee128 format and DIGITS(X) is 113.

BR,
Kewen
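
As a rough cross-check from the C/C++ side, the same two numbers show up as the significand width of long double (LDBL_MANT_DIG).  The sketch below is only an illustration; the helper names are hypothetical and the mapping covers just the common formats.

```cpp
#include <cassert>
#include <cfloat>
#include <string>

// Hypothetical helper: map a significand width in bits to the
// floating-point format it usually indicates.
std::string format_for_digits (int digits)
{
  switch (digits)
    {
    case 53:  return "IEEE binary64";
    case 64:  return "x87 extended";
    case 106: return "IBM extended (double-double)";
    case 113: return "IEEE binary128";
    default:  return "unknown";
    }
}

// Significand width of long double for the ABI this file is compiled
// with; the value depends on the target and on options such as
// -mabi=ibmlongdouble or -mabi=ieeelongdouble.
int long_double_digits ()
{
  return LDBL_MANT_DIG;
}
```

Calling format_for_digits (long_double_digits ()) on a given build shows which long double format is in effect there.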



Re: [PATCH] [ranger] Force buffer alignment in Value_Range [PR114912]

2024-05-08 Thread Aldy Hernandez
Pushed to trunk to unblock sparc.


On Fri, May 3, 2024 at 4:24 PM Aldy Hernandez  wrote:
>
> Ahh, that is indeed cleaner, and there's no longer a need to assert
> the sizeof of individual ranges.
>
> It looks like a default constructor is needed for the buffer now, but
> only for the default constructor of Value_Range.
>
> I have verified that the individual range constructors are not called
> on initialization to Value_Range, which was the original point of the
> patch.  I have also run our performance suite, and there are no
> changes to VRP or overall.
>
> I would appreciate a review from someone more C++ savvy than me :).
>
> OK for trunk?
>
> On Fri, May 3, 2024 at 11:32 AM Andrew Pinski  wrote:
> >
> > On Fri, May 3, 2024 at 2:24 AM Aldy Hernandez  wrote:
> > >
> > > Sparc requires strict alignment and is choking on the byte vector in
> > > Value_Range.  Is this the right approach, or is there a more canonical
> > > way of forcing alignment?
> >
> > I think the suggestion was to change over to use a union and use the
> > types directly in the union (anonymous unions and unions containing
> > non-PODs are part of C++11).
> > That is:
> > union {
> >   int_range_max int_range;
> >   frange float_range;
> >   unsupported_range un_range;
> > };
> > ...
> > m_vrange = new (&int_range) int_range_max ();
> > ...
> >
> > Also the canonical way of forcing alignment in C++ is to use alignas
> > as my patch in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114912
> > did.
> > Also I suspect the alignment is not word alignment but rather the
> > alignment of HOST_WIDE_INT which is not always the same as the
> > alignment of the pointer but bigger and that is why it is failing on
> > sparc (32bit rather than 64bit).
> >
> > Thanks,
> > Andrew Pinski
> >
> > >
> > > If this is correct, OK for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > > * value-range.h (class Value_Range): Use a union.
> > > ---
> > >  gcc/value-range.h | 24 +++-
> > >  1 file changed, 15 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/gcc/value-range.h b/gcc/value-range.h
> > > index 934eec9e386..31af7888018 100644
> > > --- a/gcc/value-range.h
> > > +++ b/gcc/value-range.h
> > > @@ -740,9 +740,14 @@ private:
> > >void init (const vrange &);
> > >
> > >vrange *m_vrange;
> > > -  // The buffer must be at least the size of the largest range.
> > > -  static_assert (sizeof (int_range_max) > sizeof (frange), "");
> > > -  char m_buffer[sizeof (int_range_max)];
> > > +  union {
> > > +// The buffer must be at least the size of the largest range, and
> > > +// be aligned on a word boundary for strict alignment targets
> > > +// such as sparc.
> > > +static_assert (sizeof (int_range_max) > sizeof (frange), "");
> > > +char m_buffer[sizeof (int_range_max)];
> > > +void *align;
> > > +  } u;
> > >  };
> > >
> > >  // The default constructor is uninitialized and must be initialized
> > > @@ -816,11 +821,11 @@ Value_Range::init (tree type)
> > >gcc_checking_assert (TYPE_P (type));
> > >
> > >if (irange::supports_p (type))
> > > > -m_vrange = new (&m_buffer) int_range_max ();
> > > > +m_vrange = new (&u.m_buffer) int_range_max ();
> > > >else if (frange::supports_p (type))
> > > > -m_vrange = new (&m_buffer) frange ();
> > > > +m_vrange = new (&u.m_buffer) frange ();
> > > >else
> > > > -m_vrange = new (&m_buffer) unsupported_range ();
> > > > +m_vrange = new (&u.m_buffer) unsupported_range ();
> > >  }
> > >
> > >  // Initialize object with a copy of R.
> > > @@ -829,11 +834,12 @@ inline void
> > > >  Value_Range::init (const vrange &r)
> > > >  {
> > > >if (is_a <irange> (r))
> > > > -m_vrange = new (&m_buffer) int_range_max (as_a <irange> (r));
> > > > +m_vrange = new (&u.m_buffer) int_range_max (as_a <irange> (r));
> > > >else if (is_a <frange> (r))
> > > > -m_vrange = new (&m_buffer) frange (as_a <frange> (r));
> > > > +m_vrange = new (&u.m_buffer) frange (as_a <frange> (r));
> > > >else
> > > > -m_vrange = new (&m_buffer) unsupported_range (as_a <unsupported_range> (r));
> > > > +m_vrange
> > > > +  = new (&u.m_buffer) unsupported_range (as_a <unsupported_range> (r));
> > >  }
> > >
> > >  // Assignment operator.  Copying incompatible types is allowed.  That
> > > --
> > > 2.44.0
> > >
> >



[COMMITTED] [prange] Reword dispatch error message [PR114985]

2024-05-08 Thread Aldy Hernandez
After reading the ICE for the PR, it's obvious the error message is
rather cryptic.  This makes it less so.

gcc/ChangeLog:

* range-op.cc (range_op_handler::discriminator_fail): Reword error
message.
---
 gcc/range-op.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 65f3843227d..a134af68141 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -207,7 +207,8 @@ range_op_handler::discriminator_fail (const vrange &r1,
   gcc_checking_assert (r1.m_discriminator < sizeof (name) - 1);
   gcc_checking_assert (r2.m_discriminator < sizeof (name) - 1);
   gcc_checking_assert (r3.m_discriminator < sizeof (name) - 1);
-  fprintf (stderr, "DISCRIMINATOR FAIL.  Dispatch > RO_%c%c%c <\n",
+  fprintf (stderr,
+  "Unsupported operand combination in dispatch: RO_%c%c%c\n",
   name[r1.m_discriminator],
   name[r2.m_discriminator],
   name[r3.m_discriminator]);
-- 
2.45.0



Re: [PATCH] i386: Fix some intrinsics without alignment requirements.

2024-05-08 Thread Hongtao Liu
On Wed, May 8, 2024 at 10:13 AM Hu, Lin1  wrote:
>
> Hi all,
>
> This patch aims to fix some intrinsics that have no alignment requirement
> but nevertheless raise runtime errors.
>
> Bootstrapped and tested on x86_64-linux-gnu, OK for trunk?
Ok.
>
> BRs,
> Lin
>
> gcc/ChangeLog:
>
> PR target/84508
> * config/i386/emmintrin.h
> (_mm_load_sd): Remove alignment requirement.
> (_mm_store_sd): Ditto.
> (_mm_loadh_pd): Ditto.
> (_mm_loadl_pd): Ditto.
> (_mm_storel_pd): Add alignment requirement.
> * config/i386/xmmintrin.h
> (_mm_loadh_pi): Remove alignment requirement.
> (_mm_loadl_pi): Ditto.
> (_mm_load_ss): Ditto.
> (_mm_store_ss): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> PR target/84508
> * gcc.target/i386/pr84508-1.c: New test.
> * gcc.target/i386/pr84508-2.c: Ditto.
> ---
>  gcc/config/i386/emmintrin.h   | 11 ++-
>  gcc/config/i386/xmmintrin.h   |  9 +
>  gcc/testsuite/gcc.target/i386/pr84508-1.c | 11 +++
>  gcc/testsuite/gcc.target/i386/pr84508-2.c | 11 +++
>  4 files changed, 33 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr84508-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr84508-2.c
>
> diff --git a/gcc/config/i386/emmintrin.h b/gcc/config/i386/emmintrin.h
> index 915a5234c38..d7fc1af9687 100644
> --- a/gcc/config/i386/emmintrin.h
> +++ b/gcc/config/i386/emmintrin.h
> @@ -56,6 +56,7 @@ typedef double __m128d __attribute__ ((__vector_size__ 
> (16), __may_alias__));
>  /* Unaligned version of the same types.  */
>  typedef long long __m128i_u __attribute__ ((__vector_size__ (16), 
> __may_alias__, __aligned__ (1)));
>  typedef double __m128d_u __attribute__ ((__vector_size__ (16), 
> __may_alias__, __aligned__ (1)));
> +typedef double double_u __attribute__ ((__may_alias__, __aligned__ (1)));
>
>  /* Create a selector for use with the SHUFPD instruction.  */
>  #define _MM_SHUFFLE2(fp1,fp0) \
> @@ -145,7 +146,7 @@ _mm_load1_pd (double const *__P)
>  extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
>  _mm_load_sd (double const *__P)
>  {
> -  return _mm_set_sd (*__P);
> +  return __extension__ (__m128d){ *(double_u *)__P, 0.0 };
>  }
>
>  extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> @@ -180,7 +181,7 @@ _mm_storeu_pd (double *__P, __m128d __A)
>  extern __inline void __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
>  _mm_store_sd (double *__P, __m128d __A)
>  {
> -  *__P = ((__v2df)__A)[0];
> +  *(double_u *)__P = ((__v2df)__A)[0] ;
>  }
>
>  extern __inline double __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> @@ -192,7 +193,7 @@ _mm_cvtsd_f64 (__m128d __A)
>  extern __inline void __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
>  _mm_storel_pd (double *__P, __m128d __A)
>  {
> -  _mm_store_sd (__P, __A);
> +  *__P = ((__v2df)__A)[0];
>  }
>
>  /* Stores the upper DPFP value.  */
> @@ -973,13 +974,13 @@ _mm_unpacklo_pd (__m128d __A, __m128d __B)
>  extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
>  _mm_loadh_pd (__m128d __A, double const *__B)
>  {
> -  return (__m128d)__builtin_ia32_loadhpd ((__v2df)__A, __B);
> +  return __extension__ (__m128d) { ((__v2df)__A)[0], *(double_u*)__B };
>  }
>
>  extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
>  _mm_loadl_pd (__m128d __A, double const *__B)
>  {
> -  return (__m128d)__builtin_ia32_loadlpd ((__v2df)__A, __B);
> +  return __extension__ (__m128d) { *(double_u*)__B, ((__v2df)__A)[1] };
>  }
>
>  extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> diff --git a/gcc/config/i386/xmmintrin.h b/gcc/config/i386/xmmintrin.h
> index 71b9955b843..9e20f262839 100644
> --- a/gcc/config/i386/xmmintrin.h
> +++ b/gcc/config/i386/xmmintrin.h
> @@ -73,6 +73,7 @@ typedef float __m128 __attribute__ ((__vector_size__ (16), 
> __may_alias__));
>
>  /* Unaligned version of the same type.  */
>  typedef float __m128_u __attribute__ ((__vector_size__ (16), __may_alias__, 
> __aligned__ (1)));
> +typedef float float_u __attribute__ ((__may_alias__, __aligned__ (1)));
>
>  /* Internal data types for implementing the intrinsics.  */
>  typedef float __v4sf __attribute__ ((__vector_size__ (16)));
> @@ -774,7 +775,7 @@ _mm_unpacklo_ps (__m128 __A, __m128 __B)
>  /* Sets the upper two SPFP values with 64-bits of data loaded from P;
> the lower two values are passed through from A.  */
>  extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
> -_mm_loadh_pi (__m128 __A, __m64 const *__P)
> +_mm_loadh_pi (__m128 __A, __m64_u const *__P)
>  {
>return (__m128) __builtin_ia32_loadhps ((__v4sf)__A, (const __v2sf *)__P);
>  }
> @@ -803,7 
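
The double_u and float_u typedefs in the patch rely on GCC's aligned and may_alias type attributes.  The following sketch shows the same technique outside the intrinsics headers; it assumes a GCC-compatible compiler, and the two function names are hypothetical helpers made up for this illustration.

```cpp
#include <cassert>
#include <cstring>

// Same shape as the typedef added by the patch: a double that may
// alias other types and is only assumed to be 1-byte aligned.
typedef double double_u __attribute__ ((__may_alias__, __aligned__ (1)));

// Hypothetical helper: load a double from a possibly misaligned address.
double load_unaligned_double (const void *p)
{
  return *static_cast<const double_u *> (p);
}

// Hypothetical helper: store VALUE at a byte OFFSET inside a buffer and
// read it back through the unaligned type.
double roundtrip_at_offset (double value, unsigned offset)
{
  assert (offset < 8);
  unsigned char buf[16];
  std::memcpy (buf + offset, &value, sizeof value);
  return load_unaligned_double (buf + offset);
}
```

Because the access goes through a type with alignment 1, the compiler must not assume natural alignment for it, which is what lets _mm_load_sd and friends accept arbitrary addresses.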

RE: [pushed][PR114810][LRA]: Recognize alternatives with lack of available registers for insn and demote them.

2024-05-08 Thread Li, Pan2
CC more RISC-V port people for awareness.

Pan

From: Li, Pan2 
Sent: Thursday, May 9, 2024 11:25 AM
To: Vladimir Makarov ; gcc-patches@gcc.gnu.org
Subject: RE: [pushed][PR114810][LRA]: Recognize alternatives with lack of 
available registers for insn and demote them.

Hi Vladimir,

It looks like this patch results in some ICEs in rvv.exp for the RISC-V
backend; feel free to ping me if more information is needed to reproduce them.

   = Summary of gcc testsuite =
                        | # of unexpected case / # of unique unexpected case
                        |       gcc |       g++ |  gfortran |
rv64gcv/  lp64d/ medlow | 1061 / 69 |     0 / 0 |         - |
make: *** [Makefile:1096: report-gcc-newlib] Error 1

Picking one, imm_loop_invariant-10.c, as an example below.

/home/pli/gcc/111/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-10.c:20:1:
 error: unrecognizable insn:
(insn 265 0 0 (parallel [
(set (reg:RVVMF8QI 309 [239])
(unspec:RVVMF8QI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF))
(clobber (scratch:SI))
]) -1
 (nil))
during RTL pass: reload
…. gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-10.c:20:1: 
internal compiler error: in extract_insn, at recog.cc:2812
0xa9d309 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
../.././gcc/gcc/rtl-error.cc:108
0xa9d32b _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
../.././gcc/gcc/rtl-error.cc:116
0xa9bc07 extract_insn(rtx_insn*)
../.././gcc/gcc/recog.cc:2812
0x10e5ad2 ira_remove_insn_scratches(rtx_insn*, bool, _IO_FILE*, rtx_def* 
(*)(rtx_def*))
../.././gcc/gcc/ira.cc:5381
0x112868f remove_insn_scratches
../.././gcc/gcc/lra.cc:2154
0x112868f lra_emit_move(rtx_def*, rtx_def*)
../.././gcc/gcc/lra.cc:513
0x1136883 match_reload
../.././gcc/gcc/lra-constraints.cc:1184
0x1142ae4 curr_insn_transform
../.././gcc/gcc/lra-constraints.cc:4778
0x11443cb lra_constraints(bool)
../.././gcc/gcc/lra-constraints.cc:5481
0x112b192 lra(_IO_FILE*, int)
../.././gcc/gcc/lra.cc:2442
0x10e0e7f do_reload
../.././gcc/gcc/ira.cc:5973
0x10e0e7f execute
../.././gcc/gcc/ira.cc:6161

Pan

From: Vladimir Makarov <vmaka...@redhat.com>
Sent: Thursday, May 9, 2024 12:40 AM
To: gcc-patches@gcc.gnu.org
Subject: [pushed][PR114810][LRA]: Recognize alternatives with lack of available 
registers for insn and demote them.


The following patch is a fix for PR114810 from LRA side.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114810

The patch was successfully bootstrapped and tested on x86_64, aarch64, ppc64le.




RE: [pushed][PR114810][LRA]: Recognize alternatives with lack of available registers for insn and demote them.

2024-05-08 Thread Li, Pan2
Hi Vladimir,

It looks like this patch results in some ICEs in rvv.exp for the RISC-V
backend; feel free to ping me if more information is needed to reproduce them.

   = Summary of gcc testsuite =
                        | # of unexpected case / # of unique unexpected case
                        |       gcc |       g++ |  gfortran |
rv64gcv/  lp64d/ medlow | 1061 / 69 |     0 / 0 |         - |
make: *** [Makefile:1096: report-gcc-newlib] Error 1

Picking one, imm_loop_invariant-10.c, as an example below.

/home/pli/gcc/111/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-10.c:20:1:
 error: unrecognizable insn:
(insn 265 0 0 (parallel [
(set (reg:RVVMF8QI 309 [239])
(unspec:RVVMF8QI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF))
(clobber (scratch:SI))
]) -1
 (nil))
during RTL pass: reload
…. gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_loop_invariant-10.c:20:1: 
internal compiler error: in extract_insn, at recog.cc:2812
0xa9d309 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
../.././gcc/gcc/rtl-error.cc:108
0xa9d32b _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
../.././gcc/gcc/rtl-error.cc:116
0xa9bc07 extract_insn(rtx_insn*)
../.././gcc/gcc/recog.cc:2812
0x10e5ad2 ira_remove_insn_scratches(rtx_insn*, bool, _IO_FILE*, rtx_def* 
(*)(rtx_def*))
../.././gcc/gcc/ira.cc:5381
0x112868f remove_insn_scratches
../.././gcc/gcc/lra.cc:2154
0x112868f lra_emit_move(rtx_def*, rtx_def*)
../.././gcc/gcc/lra.cc:513
0x1136883 match_reload
../.././gcc/gcc/lra-constraints.cc:1184
0x1142ae4 curr_insn_transform
../.././gcc/gcc/lra-constraints.cc:4778
0x11443cb lra_constraints(bool)
../.././gcc/gcc/lra-constraints.cc:5481
0x112b192 lra(_IO_FILE*, int)
../.././gcc/gcc/lra.cc:2442
0x10e0e7f do_reload
../.././gcc/gcc/ira.cc:5973
0x10e0e7f execute
../.././gcc/gcc/ira.cc:6161

Pan

From: Vladimir Makarov 
Sent: Thursday, May 9, 2024 12:40 AM
To: gcc-patches@gcc.gnu.org
Subject: [pushed][PR114810][LRA]: Recognize alternatives with lack of available 
registers for insn and demote them.


The following patch is a fix for PR114810 from LRA side.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114810

The patch was successfully bootstrapped and tested on x86_64, aarch64, ppc64le.




[PATCH v1] RISC-V: Make full-vec-move1.c test robust for optimization

2024-05-08 Thread pan2 . li
From: Pan Li 

While investigating support for early-break autovectorization, we noticed
that the test full-vec-move1.c is optimized to 'return 0;' in the body of
main.  Because the value of the V-typed variable is a compile-time
constant, the second loop is treated as assert (true).

Thus, the ccp4 pass eliminates these statements and just returns 0.

typedef int16_t V __attribute__((vector_size (128)));

int main ()
{
  V v;
  for (int i = 0; i < sizeof (v) / sizeof (v[0]); i++)
(v)[i] = i;

  V res = v;
  for (int i = 0; i < sizeof (v) / sizeof (v[0]); i++)
assert (res[i] == i); // will be optimized to assert (true)
}

This patch introduces an extern function that uses res[i], which
prevents the ccp4 optimization from removing the loop.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1.c:
Introduce extern func use to get rid of ccp4 optimization.

Signed-off-by: Pan Li 
---
 .../gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1.c
index d73bad4af6f..fae2ae91572 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1.c
@@ -2,11 +2,12 @@
 /* { dg-additional-options "-std=c99 -O3 -march=rv64gcv_zvl128b -mabi=lp64d 
-fno-vect-cost-model -mrvv-vector-bits=zvl" } */
 
 #include 
-#include 
 
 /* This would cause us to emit a vl1r.v for VNx4HImode even when
the hardware vector size vl > 64.  */
 
+extern int16_t test_element (int16_t);
+
 typedef int16_t V __attribute__((vector_size (128)));
 
 int main ()
@@ -14,9 +15,10 @@ int main ()
   V v;
   for (int i = 0; i < sizeof (v) / sizeof (v[0]); i++)
 (v)[i] = i;
+
   V res = v;
   for (int i = 0; i < sizeof (v) / sizeof (v[0]); i++)
-assert (res[i] == i);
+test_element (res[i]);
 }
 
 /* { dg-final { scan-assembler-not {vl[1248]r.v} } }  */
-- 
2.34.1
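
A minimal sketch of the idea behind the change, assuming a GCC-compatible compiler: routing each element through a call the optimizer cannot see into keeps the vector copy and the second loop alive.  In the patch test_element is an undefined extern, which is fine for a compile-only test; here it is given a hypothetical noinline body with a volatile sink so the sketch can be built and run.

```cpp
#include <cassert>
#include <cstdint>

typedef std::int16_t V __attribute__ ((vector_size (128)));

// Hypothetical stand-in for the extern test_element of the patch.
// noinline plus a volatile store keeps the call opaque to the optimizer.
__attribute__ ((noinline)) std::int16_t test_element (std::int16_t x)
{
  static volatile std::int16_t sink;
  sink = x;
  return x;
}

// Fill the vector, copy it, and pass every element of the copy to
// test_element; returns the sum of the values passed through.
int sum_elements ()
{
  V v;
  for (unsigned i = 0; i < sizeof (v) / sizeof (v[0]); i++)
    v[i] = static_cast<std::int16_t> (i);

  V res = v;
  int sum = 0;
  for (unsigned i = 0; i < sizeof (v) / sizeof (v[0]); i++)
    sum += test_element (res[i]);
  return sum;
}
```

With 64 elements holding 0..63, sum_elements returns 2016; the value itself is unimportant, only that the loads from res can no longer be folded away.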



Re: Re: [PATCH 2/4] df: Add DF_LIVE_SUBREG problem

2024-05-08 Thread 钟居哲
Thanks Dim.

We noticed there is a regression in the aarch64 CI.
We will address your comments and fix the aarch64 CI regression.



juzhe.zh...@rivai.ai
 
From: Dimitar Dimitrov
Date: 2024-05-08 23:57
To: 陈硕
CC: 丁乐华; gcc-patches; 钟居哲; 夏晋; vmakarov; richard.sandiford
Subject: Re: [PATCH 2/4] df: Add DF_LIVE_SUBREG problem
On Wed, May 08, 2024 at 11:34:48AM +0800, 陈硕 wrote:
> Hi Dimitar
> 
> 
> I sent a patch just now, modified accordingly.
> 
> 
> some comments:
> 
> 
> Nit: Should have two spaces after the dot, per GNU coding style. 
> I'd suggest
> to run the contrib/check_GNU_style.py script on your patches.
> Do you mean "star" by "dot", i.e. "/*" should be "/* "?
 
No, I was referring to the following paragraph from
https://www.gnu.org/prep/standards/standards.html :
   "Please put two spaces after the end of a sentence in your comments, ..."
 
To fix, simply add a second space after the dot, e.g.:
  -   Like DF_LR, but include tracking subreg liveness. Currently used to 
provide
  +   Like DF_LR, but include tracking subreg liveness.  Currently used to 
provide
 
 
For reference, here is the output from the style checker:
  $ git show | ./contrib/check_GNU_style.py -
  === ERROR type #4: dot, space, space, new sentence (24 error(s)) ===
  ...
  gcc/df-problems.cc:1350:52:   Like DF_LR, but include tracking subreg 
liveness.█Currently used to provide
 
> 
> 
> These names seem a bit too short for global variables. Perhaps tuck
> them in a namespace?
> 
> Also, since these must remain empty, shouldn't they be declared as const?
> 
> namespace df {
>  const bitmap_head empty_bitmap;
>  const subregs_live empty_live;
> }
> 
> 
> 
> Maybe it would be better if "namespace df" contained all DF-related code?
> As a minor modification, I added a prefix "df_" to the variables.
> Meanwhile, const seems inappropriate here, since it's returned as a normal
> pointer rather than a const pointer in some functions,
> 
> changing to const would break this return value type check, and a const_cast
> would make the const meaningless.
> 
> 
> see the patch for more details
 
Thanks for considering my suggestion.
 
Regards,
Dimitar
> 
> 
> regards
> Shuo
> 
> 
> 
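
To make the const trade-off above concrete, here is a small sketch with hypothetical names (df_sketch, bitmap_head_sketch and both functions are made up for this note and do not appear in the patch).  A const empty object works cleanly when callers take a const pointer; handing it out as a plain pointer needs a const_cast, which is the objection raised in the reply.

```cpp
#include <cassert>

namespace df_sketch
{
// Hypothetical stand-in for bitmap_head.
struct bitmap_head_sketch
{
  unsigned long bits;
};

// Shared empty object that must never be modified.
const bitmap_head_sketch empty_bitmap = { 0 };

// Const-correct accessor: no cast needed.
const bitmap_head_sketch *live_or_empty (const bitmap_head_sketch *live)
{
  return live ? live : &empty_bitmap;
}

// Accessor returning a plain pointer: this only compiles with a
// const_cast, and writing through the result would be undefined
// behaviour when it points at empty_bitmap.
bitmap_head_sketch *live_or_empty_mutable (bitmap_head_sketch *live)
{
  return live ? live : const_cast<bitmap_head_sketch *> (&empty_bitmap);
}
}
```

Which variant fits depends on whether the existing callers can be switched to const pointers.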
 


Re: [PATCH 1/2] RISC-V: Add tests for cpymemsi expansion

2024-05-08 Thread Jeff Law




On 5/7/24 11:52 PM, Christoph Müllner wrote:

cpymemsi expansion has been available for RISC-V since the initial port.
However, there are no tests to detect regressions.
This patch adds such tests.

Three of the tests target the expansion requirements (known length and
alignment). One test reuses an existing memcpy test from the by-pieces
framework (gcc/testsuite/gcc.dg/torture/inline-mem-cpy-1.c).

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymemsi-1.c: New test.
* gcc.target/riscv/cpymemsi-2.c: New test.
* gcc.target/riscv/cpymemsi-3.c: New test.
* gcc.target/riscv/cpymemsi.c: New test.

OK
jeff



Re: [PATCH gcc-13-backport] RISCV: Add -m(no)-omit-leaf-frame-pointer support.

2024-05-08 Thread Jeff Law




On 5/8/24 11:32 AM, Palmer Dabbelt wrote:

From: Yanzhang Wang 

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_save_reg_p): Save ra for leaf
when enabling -mno-omit-leaf-frame-pointer
(riscv_option_override): Override omit-frame-pointer.
(riscv_frame_pointer_required): Save s0 for non-leaf function
(TARGET_FRAME_POINTER_REQUIRED): Override definition
* config/riscv/riscv.opt: Add option support.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/omit-frame-pointer-1.c: New test.
* gcc.target/riscv/omit-frame-pointer-2.c: New test.
* gcc.target/riscv/omit-frame-pointer-3.c: New test.
* gcc.target/riscv/omit-frame-pointer-4.c: New test.
* gcc.target/riscv/omit-frame-pointer-test.c: New test.

Signed-off-by: Yanzhang Wang 
(cherry picked from commit 39663298b5934831a0125e12f113ebd83248c3be)
---
I haven't tested this (just an all-gcc build), but I figured I'd just
send it now as it's kind of a grey area for backports: the flag itself
is a new feature, but it also fixes a compatibility issue with the psABI
-- which itself is a grey area, as the psABI change was a retrofit and is
marked as optional.  I'd test it before pushing it, but this is one of
those things where I'm not really sure what the backporting rules
indicate we should do.

There's more discussion on this LKML thread:
https://lore.kernel.org/linux-riscv/527dd4d8-f1e5-4581-b1e3-aa315fea8...@sifive.com/T/#mf15ccc659b7b8b838b88959fbea460210875eb9c

That also has a much smaller fix, but having the whole argument seems
like a nicer user interface to me -- then users who really want
compatibility with the psABI's section on frame records can just ask for
it directly (via the odd spelling `-fno-omit-frame-pointer
-mno-omit-leaf-frame-pointer`, but too late to change that).

Thoughts on this for 13?

Given it's target-specific, I think we have a lot more leeway here.

I think there was a followup in the space.  defa8681d951




We'd probably also want it all the way back to 11, but I assume that's
going to be the same discussion.

Yea.

You might explicitly run it by Jakub.  But I'm certainly OK with this 
being backported.


jeff


Re: Re: [PATCH 4/4] lra: Apply DF_LIVE_SUBREG data

2024-05-08 Thread 钟居哲
Thanks Vlad.

I noticed there is devel/subreg-coalesce branch.

We are working on supporting subreg coalesce in IRA/LRA based on the latest 
version of the subreg DF patch.

And we will send the followup patches.

Thanks.


juzhe.zh...@rivai.ai
 
From: Vladimir Makarov
Date: 2024-05-09 00:29
To: Lehua Ding
CC: richard.sandiford; juzhe.zhong; gcc-patches
Subject: Re: [PATCH 4/4] lra: Apply DF_LIVE_SUBREG data
 
On 5/7/24 23:01, Lehua Ding wrote:
> Hi Vladimir,
>
> I'll send V3 patches based on these comments. Note that these four 
> patches only support subreg liveness tracking and apply to IRA and LRA 
> pass. Therefore, no performance changes are expected before we support 
> subreg coalesce. There will be new patches later to complete the 
> subreg coalesce functionality. Support for subreg coalesce requires 
> support for subreg copy i.e. modifying the logic for conflict detection.
>
>
Thank you for your clarification that the current batch of patches does 
not change the performance.  I hope the next batch of patches will be 
added to devel/subreg-coalesce branch too for their easier evaluation.
 
 
 


Re: [PATCH 2/4] fortran: Teach get_real_kind_from_node for Power 128 fp modes [PR112993]

2024-05-08 Thread Steve Kargl
On Wed, May 08, 2024 at 01:27:53PM +0800, Kewen.Lin wrote:
> 
> Previously the effective target fortran_real_c_float128 never
> passed on Power, regardless of whether the default 128-bit long
> double is ibmlongdouble or ieeelongdouble.  This is because TF
> mode is always used for kind 16 real, which has precision
> 127, while the node float128_type_node for c_float128 has
> type precision 128, so get_real_kind_from_node can't find a
> match as it only compares gfc_real_kinds[i].mode_precision
> with the type precision.
> 
> With TFmode/IFmode/KFmode changed to have the same mode
> precision 128, fortran_real_c_float128 can now pass with
> ieeelongdouble enabled by default, and test cases guarded
> with it get tested accordingly.  But with ibmlongdouble
> enabled by default, since TFmode has precision 128, which
> is the same as the type precision 128 of float128_type_node,
> get_real_kind_from_node considers the kind for TFmode to match
> float128_type_node.  That is wrong, as at this point
> TFmode has the IBM extended format.  So this patch
> teaches get_real_kind_from_node to check one more field that
> distinguishes the underlying real format, which
> avoids the unexpected match when more than one
> mode has the same precision.
> 
> Bootstrapped and regress-tested on:
>   - powerpc64-linux-gnu P8/P9 (with ibm128 by default)
>   - powerpc64le-linux-gnu P9/P10 (with ibm128 by default)
>   - powerpc64le-linux-gnu P9 (with ieee128 by default)
> 
> BR,
> Kewen
> -
>   PR target/112993
> 

First, I have no issue with Mikael's OK for committing the
patch. 

That said, Fortran has the concept of model numbers, which
are set in arith.c.  Does this change give the expected 
value for ibm128?  For example, with "REAL(16) X", one
has "DIGITS(X) = 113", which is the precision of the
underlying IEEE754 binary128 type.

-- 
Steve


Re: [Patch, fortran] PR84006 [11/12/13/14/15 Regression] ICE in storage_size() with CLASS entity

2024-05-08 Thread Harald Anlauf

Hi Paul,

this looks mostly good, but the new testcase transfer_class_4.f90
does exhibit a problem with your patch.  Run it with valgrind,
or with -fcheck=bounds, or with -fsanitize=address, or add the
following around the final transfer:

print *, storage_size (star_a), storage_size (chr_a), size (chr_a), len (chr_a)

  chr_a = transfer (star_a, chr_a)
print *, storage_size (star_a), storage_size (chr_a), size (chr_a), len (chr_a)

print *, ">", chr_a, "<"

This prints for me:

  40  40   2   5$
  40  40   4   5$
 >abcdefghij^@^@^@^@^@^@^@^@^@^@<$

So since the physical representation of chr_a is sufficient
to hold star_a (F2023:16.9.212), no reallocation with a wrong
calculated size should happen.  (Intel and NAG get this right.)

Can you check again?

Thanks,
Harald
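
For reference, the size rule for TRANSFER with an array MOLD and no SIZE argument can be written as a ceiling division: the result gets the smallest number of MOLD-sized elements whose physical representation is not shorter than that of SOURCE.  The C++ sketch below uses a hypothetical function name and assumes, based on the printed values (storage_size 40 bits, size 2), that star_a occupies 10 bytes; the element count of star_a itself is not shown in the output above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of MOLD-sized elements needed to hold
// SOURCE_BYTES bytes, i.e. ceil (source_bytes / mold_element_bytes).
std::size_t transfer_result_elements (std::size_t source_bytes,
                                      std::size_t mold_element_bytes)
{
  assert (mold_element_bytes > 0);
  return (source_bytes + mold_element_bytes - 1) / mold_element_bytes;
}
```

Under that assumption, transfer_result_elements (10, 5) gives 2, matching the size of chr_a before the assignment, so the size of 4 printed afterwards points at a wrongly computed length.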


Am 08.05.24 um 17:01 schrieb Paul Richard Thomas:

This fix is straightforward and described by the ChangeLog. Jose Rui
Faustino de Sousa posted the same fix for the ICE on the fortran list
slightly more than three years ago. Thinking that he had commit rights, I
deferred but, regrettably, the patch was never applied. The attached patch
also fixes storage_size and transfer for unlimited polymorphic arguments
with character payloads.

OK for mainline and backporting after a reasonable interval?

Paul

Fortran: Unlimited polymorphic intrinsic function arguments [PR84006]

2024-05-08  Paul Thomas  

gcc/fortran
PR fortran/84006
PR fortran/100027
PR fortran/98534
* trans-expr.cc (gfc_resize_class_size_with_len): Use the fold
even if a block is not available in which to fix the result.
(trans_class_assignment): Enable correct assignment of
character expressions to unlimited polymorphic variables using
lhs _len field and rse string_length.
* trans-intrinsic.cc (gfc_conv_intrinsic_storage_size): Extract
the class expression so that the unlimited polymorphic class
expression can be used in gfc_resize_class_size_with_len to
obtain the storage size for character payloads. Guard the use
of GFC_DECL_SAVED_DESCRIPTOR by testing for DECL_LANG_SPECIFIC
to prevent the ICE. Also, invert the order to use the class
expression extracted from the argument.
(gfc_conv_intrinsic_transfer): In the same way as 'storage_size',
use the _len field to obtain the correct length for arg 1.

gcc/testsuite/
PR fortran/84006
PR fortran/100027
* gfortran.dg/storage_size_7.f90: New test.

PR fortran/98534
* gfortran.dg/transfer_class_4.f90: New test.






Re: [PATCH] [RFC] Add function filtering to gcov

2024-05-08 Thread Jan Hubicka
> > 
> > For JSON output I suppose there's a way to "grep" without the line oriented
> > issue?  I suppose we could make the JSON more hierarchical by adding
> > an outer function object?
> 
> Absolutely, yes, this is much less useful for JSON. The filtering works,
> which may be occasionally handy for very large files, but jq and other query
> engines already filter quite well. For me, the problem is actually
> working with JSON and mapping it back to the source.
> 
> > 
> > That said, I think this is a useful feature and thus OK for trunk if there are
> > no other comments in about a week if you also update the gcov documentation.
> 
> Thanks! I was actually surprised at how much I liked this feature once I
> started playing with it, and it made the edit, build, run, gcov -t cycle
> quite effective at exploring the program. For path coverage, filtering also
> becomes mandatory - the number of paths grows *very* quickly, and different
> levels of verbosity are useful too. Being able to focus on the function(s)
> you care about makes a huge difference.
> 
> I won't merge the patch in its current state - I will make a pass or two
> over it, add some documentation, and resubmit the patch, but not make large
> design changes unless someone objects. I won't merge that patch until it has
> been reviewed again, of course.

I also think it is a useful feature to be able to restrict gcov to
selected functions.
Concerning path coverage profiling, this is something that can
potentially also be useful for optimization.  I.e. search for "Efficient
path profiling" by Ball and Larus on scholar.  One obvious thing to do
is to perform tail duplication on code paths that show correlated
branches, but there are quite a few other things to do with such
infrastructure (there are over 900 references to that at scholar and
some of them may be practically relevant).

Honza
> 
> Thanks,
> Jørgen
> 
> > 
> > Thanks,
> > Richard.
> > 
> > > ---
> > >   gcc/gcov.cc| 101 +++--
> > >   gcc/testsuite/g++.dg/gcov/gcov-19.C|  35 +
> > >   gcc/testsuite/g++.dg/gcov/gcov-20.C|  38 ++
> > >   gcc/testsuite/gcc.misc-tests/gcov-24.c |  20 +
> > >   gcc/testsuite/gcc.misc-tests/gcov-25.c |  23 ++
> > >   gcc/testsuite/gcc.misc-tests/gcov-26.c |  23 ++
> > >   gcc/testsuite/gcc.misc-tests/gcov-27.c |  22 ++
> > >   gcc/testsuite/lib/gcov.exp |  53 -
> > >   8 files changed, 306 insertions(+), 9 deletions(-)
> > >   create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-19.C
> > >   create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-20.C
> > >   create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-24.c
> > >   create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-25.c
> > >   create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-26.c
> > >   create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-27.c
> > > 
> > > diff --git a/gcc/gcov.cc b/gcc/gcov.cc
> > > index fad704eb7c9..e53765de00a 100644
> > > --- a/gcc/gcov.cc
> > > +++ b/gcc/gcov.cc
> > > @@ -46,6 +46,7 @@ along with Gcov; see the file COPYING3.  If not see
> > >   #include "color-macros.h"
> > >   #include "pretty-print.h"
> > >   #include "json.h"
> > > +#include "xregex.h"
> > > 
> > >   #include 
> > >   #include 
> > > @@ -643,6 +644,43 @@ static int flag_counts = 0;
> > >   /* Return code of the tool invocation.  */
> > >   static int return_code = 0;
> > > 
> > > +/* "Keep policy" when adding functions to the global function table.  This will
> > > +   be set to false when --include is used, otherwise every function should be
> > > +   added to the table.  Used for --include/exclude.  */
> > > +static bool default_keep = true;
> > > +
> > > +/* A 'function filter', a filter and action for determining if a function
> > > +   should be included in the output or not.  Used for --include/--exclude
> > > +   filtering.  */
> > > +struct fnfilter
> > > +{
> > > +  /* The (extended) compiled regex for this filter.  */
> > > +  regex_t regex;
> > > +
> > > +  /* The action when this filter (regex) matches - if true, the function should
> > > + be kept, otherwise discarded.  */
> > > +  bool keep;
> > > +
> > > +  /* Compile the regex EXPR, or exit if pattern is malformed.  */
> > > +  void compile (const char *expr)
> > > +  {
> > > +int err = regcomp (&regex, expr, REG_NOSUB | REG_EXTENDED);
> > > +if (err)
> > > +  {
> > > +   size_t len = regerror (err, &regex, nullptr, 0);
> > > +   char *msg = XNEWVEC (char, len);
> > > +   regerror (err, &regex, msg, len);
> > > +   fprintf (stderr, "Bad regular expression: %s\n", msg);
> > > +   free (msg);
> > > +   exit (EXIT_FAILURE);
> > > +}
> > > +  }
> > > +};
> > > +
> > > +/* A collection of filter functions for including/exclude functions in the
> > > +   output.  This is empty unless --include/--exclude is used.  */
> > > +static vector<fnfilter> filters;
> > > +
> > >   /* Forward 

[committed] [RISC-V] Provide splitting guidance to combine to facilitate shNadd.uw generation

2024-05-08 Thread Jeff Law
This fixes a minor code quality issue I found while comparing GCC and 
LLVM.  Essentially we want to do a bit of re-association to generate 
shNadd.uw instructions.


Combine does the right thing and finds all the necessary instructions, 
reassociates the operands, combines constants, etc.  Where it fails is 
finding a good split point.  The backend can trivially provide guidance 
on how to split via a define_split pattern.
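
To make the reassociation concrete, here is a small C model of what the
split is meant to achieve.  It is only a sketch: the function names are
made up for illustration, the shift amount is fixed at 3 (sh3add.uw), and
the mask is assumed to be 0xffffffff << 3, which I take to be the kind of
constant consecutive_bits32_operand is there to match.

```c
#include <stdint.h>

/* Model of the Zba sh3add.uw instruction: zero-extend the low 32 bits
   of RS1, shift left by 3, and add RS2.  */
static uint64_t
sh3add_uw (uint64_t rs1, uint64_t rs2)
{
  return ((rs1 & 0xffffffffULL) << 3) + rs2;
}

/* The shape combine produces before splitting:
   (((x << 3) & mask) + base) + offset, with MASK being 32 consecutive
   set bits starting at bit 3 (an assumption for this example).  */
static uint64_t
combined_form (uint64_t x, uint64_t base, uint64_t offset)
{
  return (((x << 3) & 0x7fffffff8ULL) + base) + offset;
}

/* The two-step sequence after the split: one sh3add.uw, then a
   separate add of the constant.  */
static uint64_t
split_form (uint64_t x, uint64_t base, uint64_t offset)
{
  uint64_t t = sh3add_uw (x, base);
  return t + offset;
}
```

Both forms compute the same value for any 64-bit x, because masking after
the shift keeps exactly the bits that came from the low 32 bits of x.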


This has survived both Ventana's internal CI system (rv64gcb) as well as 
my own (rv64gc, rv32gcv).


I'll wait for the external CI system to give the all-clear before pushing.



jeff

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index ad3ad758959..d76a72d30e0 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -184,6 +184,23 @@ (define_insn "*slliuw"
   [(set_attr "type" "bitmanip")
(set_attr "mode" "DI")])
 
+;; Combine will reassociate the operands in the most useful way here.  We
+;; just have to give it guidance on where to split the result to facilitate
+;; shNadd.uw generation.
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+   (plus:DI (plus:DI (and:DI (ashift:DI (match_operand:DI 1 "register_operand")
+(match_operand:QI 2 "imm123_operand"))
+ (match_operand 3 "consecutive_bits32_operand"))
+ (match_operand:DI 4 "register_operand"))
+(match_operand 5 "immediate_operand")))]
+  "TARGET_64BIT && TARGET_ZBA"
+  [(set (match_dup 0)
+   (plus:DI (and:DI (ashift:DI (match_dup 1) (match_dup 2))
+(match_dup 3))
+(match_dup 4)))
+   (set (match_dup 0) (plus:DI (match_dup 0) (match_dup 5)))])
+
 ;; ZBB extension.
 
 (define_expand "clzdi2"
diff --git a/gcc/testsuite/gcc.target/riscv/zba-shadduw.c b/gcc/testsuite/gcc.target/riscv/zba-shadduw.c
new file mode 100644
index 000..5b77447e681
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zba-shadduw.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64gc_zba -mabi=lp64" } */
+
+typedef struct simple_bitmap_def
+{
+  unsigned char *popcount;
+  unsigned int n_bits;
+  unsigned int size;
+  unsigned long elms[1];
+} *sbitmap;
+typedef const struct simple_bitmap_def *const_sbitmap;
+
+typedef unsigned long *sbitmap_ptr;
+typedef const unsigned long *const_sbitmap_ptr;
+static unsigned long sbitmap_elt_popcount (unsigned long);
+
+void
+sbitmap_a_or_b (sbitmap dst, const_sbitmap a, const_sbitmap b)
+{
+  unsigned int i, n = dst->size;
+  sbitmap_ptr dstp = dst->elms;
+  const_sbitmap_ptr ap = a->elms;
+  const_sbitmap_ptr bp = b->elms;
+  unsigned char has_popcount = dst->popcount != ((void *) 0);
+
+  for (i = 0; i < n; i++)
+{
+  const unsigned long tmp = *ap++ | *bp++;
+  *dstp++ = tmp;
+}
+}
+
+
+/* { dg-final { scan-assembler "sh3add.uw" } } */
+/* { dg-final { scan-assembler-not {\mslli.uw} } } */


Re: [PATCH v1 1/1] RISC-V: Nan-box the result of movbf on soft-bf16

2024-05-08 Thread Jeff Law




On 5/7/24 6:38 PM, Xiao Zeng wrote:

1 This patch implements the Nan-box of bf16.

2 Please refer to the Nan-box implementation of hf16 in:


3 The discussion about Nan-box can be found on the website:


4 The tests below pass with this patch:
 * The full riscv regression test.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_move): Expand movbf
with Nan-boxing value.
* config/riscv/riscv.md (*movbf_softfloat_boxing): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/_Bfloat16-nanboxing.c: New test.
---
  gcc/config/riscv/riscv.cc | 51 ++-
  gcc/config/riscv/riscv.md | 12 -
  .../gcc.target/riscv/_Bfloat16-nanboxing.c| 38 ++
  3 files changed, 76 insertions(+), 25 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/_Bfloat16-nanboxing.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 545e68566dc..be2cb245733 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3120,35 +3120,38 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx src)


  
- if (TARGET_HARD_FLOAT

- && !TARGET_ZFHMIN && mode == HFmode
- && REG_P (dest) && FP_REG_P (REGNO (dest))
- && REG_P (src) && !FP_REG_P (REGNO (src))
- && can_create_pseudo_p ())

[ ... ]


+  if (TARGET_HARD_FLOAT
+  && ((!TARGET_ZFHMIN && mode == HFmode)
+ || (!TARGET_ZFBFMIN && mode == BFmode))
+  && REG_P (dest) && FP_REG_P (REGNO (dest)) && REG_P (src)
+  && !FP_REG_P (REGNO (src)) && can_create_pseudo_p ())


So there's a bit of gratuitous rewriting going on here.  I realize you 
were fixing formatting problems (thanks!), but I don't see a need to 
rewrite the tests starting with REG_P.  I put those back in their 
original form with the whitespace fixes.


I'll push the fixed version momentarily.

Thanks again!

jeff




Re: [PATCH] [RFC] Add function filtering to gcov

2024-05-08 Thread Jørgen Kvalsvik

On 5/8/24 15:29, Richard Biener wrote:

On Fri, Mar 29, 2024 at 8:02 PM Jørgen Kvalsvik  wrote:


This is a prototype for --include/--exclude flags, and I would like a
review of both the approach and architecture, and the implementation,
plus feedback on the feature itself. I did not update the manuals or
carefully extend --help, in case the interface itself needs some
revision before it can be merged.

---

Add the --include and --exclude flags to gcov to control what functions
to report on. This is meant to make gcov more practical when
writing test suites or performing other coverage experiments, which
tend to focus on a few functions at a time. This really shines in
combination with the -t/--stdout flag. With support for more expansive
metrics in gcov like modified condition/decision coverage (MC/DC) and
path coverage, output quickly gets overwhelming without filtering.

The approach is quite simple: filters are egrep regexes and are
evaluated left-to-right, and the last filter "wins", that is, if a
function matches an --include and a subsequent --exclude, it should not
be included in the output. The output machinery is already interacting
with the function table, which makes the json output work as expected,
and only minor changes are needed to suppress the filtered-out
functions.
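
As a rough sketch of the "last filter wins" rule (this is not the code
in the patch; struct filter and should_keep are names made up for this
example, and the regex is recompiled on every call purely to keep the
example short):

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one --include/--exclude option.  */
struct filter
{
  const char *pattern;  /* Extended (egrep-style) regex.  */
  bool keep;            /* true for --include, false for --exclude.  */
};

/* Decide whether the function NAME is reported.  Filters are tried in
   command-line order and the last one that matches decides.  If none
   match, DEFAULT_KEEP applies (false once any --include was given).  */
static bool
should_keep (const char *name, const struct filter *filters, size_t n,
             bool default_keep)
{
  bool keep = default_keep;
  for (size_t i = 0; i < n; i++)
    {
      regex_t re;
      int err = regcomp (&re, filters[i].pattern, REG_NOSUB | REG_EXTENDED);
      assert (err == 0);  /* A real tool would report the bad pattern.  */
      (void) err;
      if (regexec (&re, name, 0, NULL, 0) == 0)
        keep = filters[i].keep;
      regfree (&re);
    }
  return keep;
}
```

With --include=su --exclude=sum, "sub" is kept (only the include
matches), "sum" is dropped (the later exclude overrides the include),
and "mul" is dropped because nothing matches and an --include was given.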

Demo: math.c

 int mul (int a, int b) {
 return a * b;
 }

 int sub (int a, int b) {
 return a - b;
 }

 int sum (int a, int b) {
 return a + b;
 }

Plain matches:

$ gcov -t math --include=sum
 -:0:Source:filter.c
 -:0:Graph:filter.gcno
 -:0:Data:-
 -:0:Runs:0
 #:9:int sum (int a, int b) {
 #:   10:return a + b;

$ gcov -t math --include=mul
 -:0:Source:filter.c
 -:0:Graph:filter.gcno
 -:0:Data:-
 -:0:Runs:0
 #:1:int mul (int a, int b) {
 #:2:return a * b;

Regex match:

$ gcov -t math --include=su
 -:0:Source:filter.c
 -:0:Graph:filter.gcno
 -:0:Data:-
 -:0:Runs:0
 #:5:int sub (int a, int b) {
 #:6:return a - b;
 -:7:}
 #:9:int sum (int a, int b) {
 #:   10:return a + b;

And similar for exclude:

$ gcov -t math --exclude=sum
 -:0:Source:filter.c
 -:0:Graph:filter.gcno
 -:0:Data:-
 -:0:Runs:0
 #:1:int mul (int a, int b) {
 #:2:return a * b;
 -:3:}
 #:5:int sub (int a, int b) {
 #:6:return a - b;

And json, for good measure:

$ gcov -t math --include=sum --json | jq ".files[].lines[]"
{
   "line_number": 9,
   "function_name": "sum",
   "count": 0,
   "unexecuted_block": true,
   "block_ids": [],
   "branches": [],
   "calls": []
}
{
   "line_number": 10,
   "function_name": "sum",
   "count": 0,
   "unexecuted_block": true,
   "block_ids": [
 2
   ],
   "branches": [],
   "calls": []
}

Note that the last function gets "clipped" when lines are associated to
functions, which means the closing brace is dropped from the report. I
hope this can be fixed, but considering it is not really a part of the
function body, the gcov report is "complete".

Matching generally works well for mangled names, as the mangled names
also have the base symbol name in them. A possible extension to the
filtering commands would be to mix it with demangling, to make it easier
to filter specific overloads without manually having to
mangle the interesting symbols. The g++.dg/gcov/gcov-20.C test tests the
matching of a mangled name.

The dejagnu testing function verify-calls is somewhat minimal, but does
the job well enough.

Why not just use grep? grep is not really sufficient, as grep is very
line oriented, and the reports that benefit the most from filtering
often span multiple lines, unpredictably.


For JSON output I suppose there's a way to "grep" without the line oriented
issue?  I suppose we could make the JSON more hierarchical by adding
an outer function object?


Absolutely, yes, this is much less useful for JSON. The filtering works, 
which may be occasionally handy for very large files, but jq and other 
query engines already filter quite well. For me, the problem is 
actually working with JSON and mapping it back to the source.




That said, I think this is a useful feature and thus OK for trunk if there are
no other comments in about a week if you also update the gcov documentation.


Thanks! I was actually surprised at how much I liked this feature once I 
started playing with it, and it made the edit, build, run, gcov -t cycle 
quite effective at exploring the program. For path coverage, filtering 
also becomes mandatory - the number of paths grows *very* quickly, and 
different levels of verbosity are useful too. Being able to focus on the 
function(s) you care about makes a huge difference.



Re: Ping [PATCH/RFC] target, hooks: Allow a target to trap on unreachable [PR109267].

2024-05-08 Thread Andrew Pinski
On Wed, May 8, 2024 at 12:37 PM Iain Sandoe  wrote:
>
> Hi Folks,
>
> I’d like to land a viable solution to this issue if possible, (it is a show-
> stopper for the aarch64-darwin development branch).
>
> > On 9 Apr 2024, at 14:55, Iain Sandoe  wrote:
> >
> > So far, tested lightly on aarch64-darwin; if this is acceptable then
> > it will be possible to back out of the ad hoc fixes used on x86 and
> > powerpc darwin.
> > Comments welcome, thanks,
>
> @Andrew - you were also (at one stage) talking about some ideas about
> how to handle this in the middle end.
> Is that something you are likely to have time to do?
> Would it still be reasonable to have a target hook to control the behaviour?
> (the implementation below allows one to make the effect per TU)

I won't be able to implement the idea until July at earliest though.

Thanks,
Andrew

>
>
> > Iain
> >
> > --- 8< ---
> >
> >
> > In the PR cited case a target linker cannot handle empty FDEs,
> > arguably this is a linker bug - but in some cases we might still
> > wish to work around it.
> >
> > In the case of Darwin, the ABI does not allow two global symbols
> > to have the same address, so that emitting empty functions has
> > potential (almost guarantee) to break ABI.
> >
> > This patch allows a target to ask that __builtin_unreachable is
> > expanded in the same way as __builtin_trap (either to a trap
> > instruction or to abort() if there is no such insn).
> >
> > This means that the middle end's use of unreachability for
> > optimisation should not be altered.
> >
> > __builtin_unreachable is currently expanded to a barrier and
> > __builtin_trap is expanded to a trap insn + a barrier so that it
> > seems we should not be unduly affecting RTL optimisations.
> >
> > For Darwin, we enable this by default, but allow it to be disabled
> > per TU using -mno-unreachable-traps.
> >
> >   PR middle-end/109267
> >
> > gcc/ChangeLog:
> >
> >   * builtins.cc (expand_builtin_unreachable): Allow for
> >   a target to expand this as a trap.
> >   * config/darwin-protos.h (darwin_unreachable_traps_p): New.
> >   * config/darwin.cc (darwin_unreachable_traps_p): New.
> >   * config/darwin.h (TARGET_UNREACHABLE_SHOULD_TRAP): New.
> >   * config/darwin.opt (munreachable-traps): New.
> >   * doc/invoke.texi: Document -munreachable-traps.
> >   * doc/tm.texi: Regenerate.
> >   * doc/tm.texi.in: Document TARGET_UNREACHABLE_SHOULD_TRAP.
> >   * target.def (TARGET_UNREACHABLE_SHOULD_TRAP): New hook.
> >
> > Signed-off-by: Iain Sandoe 
> > ---
> > gcc/builtins.cc|  7 +++
> > gcc/config/darwin-protos.h |  1 +
> > gcc/config/darwin.cc   |  7 +++
> > gcc/config/darwin.h|  4 
> > gcc/config/darwin.opt  |  4 
> > gcc/doc/invoke.texi|  7 ++-
> > gcc/doc/tm.texi|  5 +
> > gcc/doc/tm.texi.in |  2 ++
> > gcc/target.def | 10 ++
> > 9 files changed, 46 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> > index f8d94c4b435..13f321b6be6 100644
> > --- a/gcc/builtins.cc
> > +++ b/gcc/builtins.cc
> > @@ -5929,6 +5929,13 @@ expand_builtin_trap (void)
> > static void
> > expand_builtin_unreachable (void)
> > {
> > +  /* If the target wants a trap in place of the fall-through, use that.  */
> > +  if (targetm.unreachable_should_trap ())
> > +{
> > +  expand_builtin_trap ();
> > +  return;
> > +}
> > +
> >   /* Use gimple_build_builtin_unreachable or builtin_decl_unreachable
> >  to avoid this.  */
> >   gcc_checking_assert (!sanitize_flags_p (SANITIZE_UNREACHABLE));
> > diff --git a/gcc/config/darwin-protos.h b/gcc/config/darwin-protos.h
> > index b67e05264e1..48a32b2ccc2 100644
> > --- a/gcc/config/darwin-protos.h
> > +++ b/gcc/config/darwin-protos.h
> > @@ -124,6 +124,7 @@ extern void darwin_enter_string_into_cfstring_table 
> > (tree);
> > extern void darwin_asm_output_anchor (rtx symbol);
> > extern bool darwin_use_anchors_for_symbol_p (const_rtx symbol);
> > extern bool darwin_kextabi_p (void);
> > +extern bool darwin_unreachable_traps_p (void);
> > extern void darwin_override_options (void);
> > extern void darwin_patch_builtins (void);
> > extern void darwin_rename_builtins (void);
> > diff --git a/gcc/config/darwin.cc b/gcc/config/darwin.cc
> > index dcfccb4952a..018547d09c6 100644
> > --- a/gcc/config/darwin.cc
> > +++ b/gcc/config/darwin.cc
> > @@ -3339,6 +3339,13 @@ darwin_kextabi_p (void) {
> >   return flag_apple_kext;
> > }
> >
> > +/* True, iff we want to map __builtin_unreachable to a trap.  */
> > +
> > +bool
> > +darwin_unreachable_traps_p (void) {
> > +  return darwin_unreachable_traps;
> > +}
> > +
> > void
> > darwin_override_options (void)
> > {
> > diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
> > index d335ffe7345..17f41cf30ef 100644
> > --- a/gcc/config/darwin.h
> > +++ b/gcc/config/darwin.h
> > @@ -1225,6 +1225,10 @@ void add_framework_path (char *);
> > 

Ping [PATCH/RFC] target, hooks: Allow a target to trap on unreachable [PR109267].

2024-05-08 Thread Iain Sandoe
Hi Folks,

I’d like to land a viable solution to this issue if possible, (it is a show-
stopper for the aarch64-darwin development branch).

> On 9 Apr 2024, at 14:55, Iain Sandoe  wrote:
> 
> So far, tested lightly on aarch64-darwin; if this is acceptable then
> it will be possible to back out of the ad hoc fixes used on x86 and
> powerpc darwin.
> Comments welcome, thanks,

@Andrew - you were also (at one stage) talking about some ideas about
how to handle this in the middle end.
Is that something you are likely to have time to do?
Would it still be reasonable to have a target hook to control the behaviour?
(the implementation below allows one to make the effect per TU)
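
For anyone skimming the thread, a minimal sketch of the kind of source
involved.  The names are made up for illustration and this is not taken
from the PR; it only shows the two usual shapes of __builtin_unreachable.

```c
enum kind { KIND_A, KIND_B };

/* Every enumerator returns, so the code after the switch is dead and
   __builtin_unreachable tells the optimizers so.  */
static int
kind_code (enum kind k)
{
  switch (k)
    {
    case KIND_A:
      return 1;
    case KIND_B:
      return 2;
    }
  __builtin_unreachable ();
}

/* The problem case: the whole body is unreachable, so the function can
   end up with no instructions at all and then shares its address with
   whatever symbol follows it.  Never call this.  */
void
never_called (void)
{
  __builtin_unreachable ();
}
```

With the hook returning true, the second function would get a trap
instruction instead of an empty body, so it keeps a distinct address.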


> Iain
> 
> --- 8< ---
> 
> 
> In the PR cited case a target linker cannot handle empty FDEs,
> arguably this is a linker bug - but in some cases we might still
> wish to work around it.
> 
> In the case of Darwin, the ABI does not allow two global symbols
> to have the same address, so that emitting empty functions has
> potential (almost guarantee) to break ABI.
> 
> This patch allows a target to ask that __builtin_unreachable is
> expanded in the same way as __builtin_trap (either to a trap
> instruction or to abort() if there is no such insn).
> 
> This means that the middle end's use of unreachability for
> optimisation should not be altered.
> 
> __builtin_unreachable is currently expanded to a barrier and
> __builtin_trap is expanded to a trap insn + a barrier so that it
> seems we should not be unduly affecting RTL optimisations.
> 
> For Darwin, we enable this by default, but allow it to be disabled
> per TU using -mno-unreachable-traps.
> 
>   PR middle-end/109267
> 
> gcc/ChangeLog:
> 
>   * builtins.cc (expand_builtin_unreachable): Allow for
>   a target to expand this as a trap.
>   * config/darwin-protos.h (darwin_unreachable_traps_p): New.
>   * config/darwin.cc (darwin_unreachable_traps_p): New.
>   * config/darwin.h (TARGET_UNREACHABLE_SHOULD_TRAP): New.
>   * config/darwin.opt (munreachable-traps): New.
>   * doc/invoke.texi: Document -munreachable-traps.
>   * doc/tm.texi: Regenerate.
>   * doc/tm.texi.in: Document TARGET_UNREACHABLE_SHOULD_TRAP.
>   * target.def (TARGET_UNREACHABLE_SHOULD_TRAP): New hook.
> 
> Signed-off-by: Iain Sandoe 
> ---
> gcc/builtins.cc|  7 +++
> gcc/config/darwin-protos.h |  1 +
> gcc/config/darwin.cc   |  7 +++
> gcc/config/darwin.h|  4 
> gcc/config/darwin.opt  |  4 
> gcc/doc/invoke.texi|  7 ++-
> gcc/doc/tm.texi|  5 +
> gcc/doc/tm.texi.in |  2 ++
> gcc/target.def | 10 ++
> 9 files changed, 46 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index f8d94c4b435..13f321b6be6 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -5929,6 +5929,13 @@ expand_builtin_trap (void)
> static void
> expand_builtin_unreachable (void)
> {
> +  /* If the target wants a trap in place of the fall-through, use that.  */
> +  if (targetm.unreachable_should_trap ())
> +{
> +  expand_builtin_trap ();
> +  return;
> +}
> +
>   /* Use gimple_build_builtin_unreachable or builtin_decl_unreachable
>  to avoid this.  */
>   gcc_checking_assert (!sanitize_flags_p (SANITIZE_UNREACHABLE));
> diff --git a/gcc/config/darwin-protos.h b/gcc/config/darwin-protos.h
> index b67e05264e1..48a32b2ccc2 100644
> --- a/gcc/config/darwin-protos.h
> +++ b/gcc/config/darwin-protos.h
> @@ -124,6 +124,7 @@ extern void darwin_enter_string_into_cfstring_table 
> (tree);
> extern void darwin_asm_output_anchor (rtx symbol);
> extern bool darwin_use_anchors_for_symbol_p (const_rtx symbol);
> extern bool darwin_kextabi_p (void);
> +extern bool darwin_unreachable_traps_p (void);
> extern void darwin_override_options (void);
> extern void darwin_patch_builtins (void);
> extern void darwin_rename_builtins (void);
> diff --git a/gcc/config/darwin.cc b/gcc/config/darwin.cc
> index dcfccb4952a..018547d09c6 100644
> --- a/gcc/config/darwin.cc
> +++ b/gcc/config/darwin.cc
> @@ -3339,6 +3339,13 @@ darwin_kextabi_p (void) {
>   return flag_apple_kext;
> }
> 
> +/* True, iff we want to map __builtin_unreachable to a trap.  */
> +
> +bool
> +darwin_unreachable_traps_p (void) {
> +  return darwin_unreachable_traps;
> +}
> +
> void
> darwin_override_options (void)
> {
> diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
> index d335ffe7345..17f41cf30ef 100644
> --- a/gcc/config/darwin.h
> +++ b/gcc/config/darwin.h
> @@ -1225,6 +1225,10 @@ void add_framework_path (char *);
> #define TARGET_N_FORMAT_TYPES 1
> #define TARGET_FORMAT_TYPES darwin_additional_format_types
> 
> +/* We want __builtin_unreachable to be expanded as a trap instruction.  */
> +#undef TARGET_UNREACHABLE_SHOULD_TRAP
> +#define TARGET_UNREACHABLE_SHOULD_TRAP darwin_unreachable_traps_p
> +
> #ifndef USED_FOR_TARGET
> extern void darwin_driver_init (unsigned int *,struct 

Re: [V2][PATCH] gcc-14/changes.html: Deprecate a GCC C extension on flexible array members.

2024-05-08 Thread Qing Zhao

> On May 7, 2024, at 17:52, Kees Cook  wrote:
> 
> On Tue, May 07, 2024 at 06:34:19PM +, Qing Zhao wrote:
>> On May 7, 2024, at 13:57, Sebastian Huber 
>>  wrote:
>>> On 07.05.24 16:26, Qing Zhao wrote:
>>>> Hi, Sebastian,
>>>> Thanks for your explanation.
>>>> Our goal is to deprecate the GCC extension on structure
>>>> containing a flexible array member not at the end of another
>>>> structure. In order to achieve this goal, we provided the warning option
>>>> -Wflex-array-member-not-at-end for the users to locate all such
>>>> cases in their source code and update the source code to eliminate
>>>> such cases.
>>> 
>>> What is the benefit of deprecating this GCC extension? If GCC
>>> extensions are removed, then it would be nice to enable the associated
>>> warnings by default.
> 
> The goal of all of the recent array bounds and flexible array work is to
> make sizing information unambiguous (e.g. via __builtin_object_size(),
> __builtin_dynamic_object_size(), and the array-bounds sanitizer). For
> the compiler to be able to deterministically report size information
> on arrays, we needed to deprecate this case even though it had been
> supported in the past. (Though we also _added_ extensions to support
> for other things, like flexible arrays in unions, and the coming
> __counted_by attribute.)
> 
> For example:
> 
> struct flex { int length; char data[]; };
> struct mid_flex { int m; struct flex flex_data; int n; int o; };
> 
> #define SZ(p) __builtin_dynamic_object_size(p, 1)
> 
> void foo(struct flex *f, struct mid_flex *m)
> {
> printf("%zu\n", SZ(f));
> printf("%zu\n", SZ(&m->flex_data));
> }
> 
> int main(void)
> {
>struct mid_flex m = { .flex_data.length = 8 };
> foo(&m.flex_data, &m);
> return 0;
> }
> 
> This is printing the size of the same object. But the desired results
> are ambiguous. Does m->flex_data have an unknown size (i.e. SIZE_MAX)
> because it's a flex array, or does it contain 8 bytes, since it overlaps
> with the other structure's trailing 2 ints?
> 
> The answer from GCC 13 was neither:
> 
> 18446744073709551615
> 4
> 
> It considered flex_data to be only the size of its non-flex-array
> members, but only when there was semantic context that it was part of
> another structure. (Yet more ambiguity.)
> 
> In GCC 14, this is "resolved" to be unknown since it is a flex array
> which has no sizing info, and context doesn't matter:
> 
> 18446744073709551615
> 18446744073709551615
> 
> But this paves the way for the coming 'counted_by' attribute which will
> allow for struct flex above to be defined as:
> 
> struct flex { int length; char data[] __attribute__((counted_by(length))); };
> 
> At which point GCC can deterministically report the object size.
> 
> Hopefully I've captured this all correctly -- Qing can correct me. :)
> 
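
For contrast with mid_flex above, a minimal sketch of the layout that
stays supported.  The names tail_flex and three are made up for this
example; it relies on the GNU extensions that allow a structure with a
flexible array member to be nested as the last field, and a top-level
object's flexible array member to be statically initialized.

```c
#include <stddef.h>

struct flex
{
  int length;
  char data[];          /* Flexible array member, last in its struct.  */
};

/* The struct with the flexible array member is the last field, so no
   other member overlaps the trailing array.  */
struct tail_flex
{
  int m;
  int n;
  struct flex flex_data;
};

/* GNU extension: static initialization of a flexible array member in
   a top-level object.  */
static struct flex three = { 3, { 'a', 'b', 'c' } };
```
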
>>> 
>>>> We had a long discussion before deciding to deprecate this GCC
>>>> extension. Please see details here:
>>>> 
>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832
>>>> 
>>>> Yes, we do plan to enable this warning by default before final
>>>> deprecation.  (We might consider enabling this warning by default in
>>>> GCC15… and then deprecate it in the next release)
>>>> 
>>>> Right now, there is ongoing work in the Linux kernel to get rid of
>>>> all such cases. Kees might have more information on this.
>>>> 
>>>> 
>>>> The static initialization of structures with flexible array members
>>>> will still work as long as the flexible array members are at the end of
>>>> the structures.
>>> 
>>> Removing the support for flexible array members in the middle of
>>> compounds will make the static initialization practically infeasible.
>> 
>> If the flexible array member is moved to the end of the compounds,
>> the static initialization still works. What’s the issue here?
>> 
>>>> My question: is it possible to update your source code to move
>>>> the structure with flexible array member to the end of the containing
>>>> structure?
>>>> 
>>>> i.e, in your example, in the struct Thread_Configured_control,
>>>> move the field “Thread_Control Control” to the end of the structure?
>>> 
>>> If we move the Thread_Control to the end, how would I add a
>>> configuration defined number of elements at the end?
>> 
>> I don’t understand this: why would moving “Thread_Control Control” to
>> the end of the containing structure be a problem?
>> Could you please explain this with a simplified example?
> 
> I found your example at [2] and tried to trim/summarize it here:
> 
> 
> struct _Thread_Control {
>Objects_Control Object;
>...
>void*extensions[];
> };
> typedef struct _Thread_Control Thread_Control;
> 
> struct Thread_Configured_control {
>  Thread_Control Control;
> 
>  #if CONFIGURE_MAXIMUM_USER_EXTENSIONS > 0
>void *extensions[ CONFIGURE_MAXIMUM_USER_EXTENSIONS + 1 ];
>  #endif
>  Configuration_Scheduler_node Scheduler_nodes[ _CONFIGURE_SCHEDULER_COUNT ];
>  RTEMS_API_Control API_RTEMS;
>  #ifdef RTEMS_POSIX_API
>  

Re: [PATCH] c++: nested aggregate/alias CTAD fixes

2024-05-08 Thread Patrick Palka
On Wed, 8 May 2024, Patrick Palka wrote:

> Bootstrapped and regtested on x86-64-pc-linux-gnu, does this look
> OK for trunk and perhaps 14?
> 
> -- >8 --
> 
> During maybe_aggr_guide with a nested class template and paren init,
> like with list init we need to consider the generic template type rather
> than the partially instantiated type since only the former has
> TYPE_FIELDS.  And in turn we need to partially substitute PARMs in the
> paren init case as well.  As a drive-by improvement it seems better to
> use outer_template_args instead of DECL_TI_ARGS during this partial
> substitution so that we lower instead of substitute the innermost
> generic template arguments, which is generally more robust.

... lower instead of substitute the innermost _template parameters_, rather

> 
> And during alias_ctad_tweaks with a nested class template, even though
> the guides may be already partially instantiated we still need to
> use the full set of arguments, not just the innermost, when substituting
> its constraints.
> 
>   PR c++/114974
>   PR c++/114901
>   PR c++/114903
> 
> gcc/cp/ChangeLog:
> 
>   * pt.cc (maybe_aggr_guide): Fix obtaining TYPE_FIELDS in
>   the paren init case.  Hoist out partial substitution logic
>   to apply to the paren init case as well.
>   (alias_ctad_tweaks): Substitute constraints using the full
>   set of template arguments.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp2a/class-deduction-aggr14.C: New test.
>   * g++.dg/cpp2a/class-deduction-alias20.C: New test.
>   * g++.dg/cpp2a/class-deduction-alias21.C: New test.
> ---
>  gcc/cp/pt.cc  | 39 +++
>  .../g++.dg/cpp2a/class-deduction-aggr14.C | 11 ++
>  .../g++.dg/cpp2a/class-deduction-alias20.C| 22 +++
>  .../g++.dg/cpp2a/class-deduction-alias21.C| 38 ++
>  4 files changed, 93 insertions(+), 17 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr14.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias20.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias21.C
> 
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 1816bfd1f40..f3d52acaaac 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -30192,26 +30192,11 @@ maybe_aggr_guide (tree tmpl, tree init, 
> vec *args)
>if (init == error_mark_node)
>   return NULL_TREE;
>parms = collect_ctor_idx_types (init, parms);
> -  /* If we're creating a deduction guide for a member class template,
> -  we've used the original template pattern type for the reshape_init
> -  above; this is done because we want PARMS to be a template parameter
> -  type, something that can be deduced when used as a function template
> -  parameter.  At this point the outer class template has already been
> -  partially instantiated (we deferred the deduction until the enclosing
> -  scope is non-dependent).  Therefore we have to partially instantiate
> -  PARMS, so that its template level is properly reduced and we don't get
> -  mismatches when deducing types using the guide with PARMS.  */
> -  if (member_template_p)
> - {
> -   ++processing_template_decl;
> -   parms = tsubst (parms, DECL_TI_ARGS (tmpl), complain, init);
> -   --processing_template_decl;
> - }
>  }
>else if (TREE_CODE (init) == TREE_LIST)
>  {
>int len = list_length (init);
> -  for (tree field = TYPE_FIELDS (type);
> +  for (tree field = TYPE_FIELDS (template_type);
>  len;
>  --len, field = DECL_CHAIN (field))
>   {
> @@ -30226,6 +30211,22 @@ maybe_aggr_guide (tree tmpl, tree init, 
> vec *args)
>  /* Aggregate initialization doesn't apply to an initializer expression.  
> */
>  return NULL_TREE;
>  
> +  /* If we're creating a deduction guide for a member class template,
> + we've used the original template pattern type for the reshape_init
> + above; this is done because we want PARMS to be a template parameter
> + type, something that can be deduced when used as a function template
> + parameter.  At this point the outer class template has already been
> + partially instantiated (we deferred the deduction until the enclosing
> + scope is non-dependent).  Therefore we have to partially instantiate
> + PARMS, so that its template level is properly reduced and we don't get
> + mismatches when deducing types using the guide with PARMS.  */
> +  if (member_template_p)
> +{
> +  ++processing_template_decl;
> +  parms = tsubst (parms, outer_template_args (tmpl), complain, init);
> +  --processing_template_decl;
> +}
> +
>if (parms)
>  {
>tree last = parms;
> @@ -30417,7 +30418,11 @@ alias_ctad_tweaks (tree tmpl, tree uguides)
> /* Substitute the associated constraints.  */
> tree ci = get_constraints (f);
> if (ci)

[PATCH] c++: nested aggregate/alias CTAD fixes

2024-05-08 Thread Patrick Palka
Bootstrapped and regtested on x86-64-pc-linux-gnu, does this look
OK for trunk and perhaps 14?

-- >8 --

During maybe_aggr_guide with a nested class template and paren init,
like with list init we need to consider the generic template type rather
than the partially instantiated type since only the former has
TYPE_FIELDS.  And in turn we need to partially substitute PARMs in the
paren init case as well.  As a drive-by improvement it seems better to
use outer_template_args instead of DECL_TI_ARGS during this partial
substitution so that we lower instead of substitute the innermost
generic template arguments, which is generally more robust.

And during alias_ctad_tweaks with a nested class template, even though
the guides may be already partially instantiated we still need to
use the full set of arguments, not just the innermost, when substituting
its constraints.

PR c++/114974
PR c++/114901
PR c++/114903

gcc/cp/ChangeLog:

* pt.cc (maybe_aggr_guide): Fix obtaining TYPE_FIELDS in
the paren init case.  Hoist out partial substitution logic
to apply to the paren init case as well.
(alias_ctad_tweaks): Substitute constraints using the full
set of template arguments.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-aggr14.C: New test.
* g++.dg/cpp2a/class-deduction-alias20.C: New test.
* g++.dg/cpp2a/class-deduction-alias21.C: New test.
---
 gcc/cp/pt.cc  | 39 +++
 .../g++.dg/cpp2a/class-deduction-aggr14.C | 11 ++
 .../g++.dg/cpp2a/class-deduction-alias20.C| 22 +++
 .../g++.dg/cpp2a/class-deduction-alias21.C| 38 ++
 4 files changed, 93 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr14.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias20.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-alias21.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 1816bfd1f40..f3d52acaaac 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -30192,26 +30192,11 @@ maybe_aggr_guide (tree tmpl, tree init, 
vec *args)
   if (init == error_mark_node)
return NULL_TREE;
   parms = collect_ctor_idx_types (init, parms);
-  /* If we're creating a deduction guide for a member class template,
-we've used the original template pattern type for the reshape_init
-above; this is done because we want PARMS to be a template parameter
-type, something that can be deduced when used as a function template
-parameter.  At this point the outer class template has already been
-partially instantiated (we deferred the deduction until the enclosing
-scope is non-dependent).  Therefore we have to partially instantiate
-PARMS, so that its template level is properly reduced and we don't get
-mismatches when deducing types using the guide with PARMS.  */
-  if (member_template_p)
-   {
- ++processing_template_decl;
- parms = tsubst (parms, DECL_TI_ARGS (tmpl), complain, init);
- --processing_template_decl;
-   }
 }
   else if (TREE_CODE (init) == TREE_LIST)
 {
   int len = list_length (init);
-  for (tree field = TYPE_FIELDS (type);
+  for (tree field = TYPE_FIELDS (template_type);
   len;
   --len, field = DECL_CHAIN (field))
{
@@ -30226,6 +30211,22 @@ maybe_aggr_guide (tree tmpl, tree init, 
vec *args)
 /* Aggregate initialization doesn't apply to an initializer expression.  */
 return NULL_TREE;
 
+  /* If we're creating a deduction guide for a member class template,
+ we've used the original template pattern type for the reshape_init
+ above; this is done because we want PARMS to be a template parameter
+ type, something that can be deduced when used as a function template
+ parameter.  At this point the outer class template has already been
+ partially instantiated (we deferred the deduction until the enclosing
+ scope is non-dependent).  Therefore we have to partially instantiate
+ PARMS, so that its template level is properly reduced and we don't get
+ mismatches when deducing types using the guide with PARMS.  */
+  if (member_template_p)
+{
+  ++processing_template_decl;
+  parms = tsubst (parms, outer_template_args (tmpl), complain, init);
+  --processing_template_decl;
+}
+
   if (parms)
 {
   tree last = parms;
@@ -30417,7 +30418,11 @@ alias_ctad_tweaks (tree tmpl, tree uguides)
  /* Substitute the associated constraints.  */
  tree ci = get_constraints (f);
  if (ci)
-   ci = tsubst_constraint_info (ci, targs, complain, in_decl);
+   {
+ if (tree outer_targs = outer_template_args (f))
+   ci = tsubst_constraint_info (ci, outer_targs, complain, 
in_decl);
+ ci = tsubst_constraint_info 

[PATCH gcc-13-backport] RISCV: Add -m(no)-omit-leaf-frame-pointer support.

2024-05-08 Thread Palmer Dabbelt
From: Yanzhang Wang 

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_save_reg_p): Save ra for leaf
when enabling -mno-omit-leaf-frame-pointer
(riscv_option_override): Override omit-frame-pointer.
(riscv_frame_pointer_required): Save s0 for non-leaf function
(TARGET_FRAME_POINTER_REQUIRED): Override definition
* config/riscv/riscv.opt: Add option support.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/omit-frame-pointer-1.c: New test.
* gcc.target/riscv/omit-frame-pointer-2.c: New test.
* gcc.target/riscv/omit-frame-pointer-3.c: New test.
* gcc.target/riscv/omit-frame-pointer-4.c: New test.
* gcc.target/riscv/omit-frame-pointer-test.c: New test.

Signed-off-by: Yanzhang Wang 
(cherry picked from commit 39663298b5934831a0125e12f113ebd83248c3be)
---
I haven't tested this (just an all-gcc build), but I figured I'd just
send it now as it's kind of a grey area for backports: the flag itself
is a new feature, but it also fixes a compatibility issue with the psABI
-- which itself is a grey area, as the psABI change was a retrofit and is
marked as optional.  I'd test it before pushing it, but this is one of
those things where I'm not really sure what the backporting rules
indicate we should do.

There's more discussion on this LKML thread:
https://lore.kernel.org/linux-riscv/527dd4d8-f1e5-4581-b1e3-aa315fea8...@sifive.com/T/#mf15ccc659b7b8b838b88959fbea460210875eb9c

That also has a much smaller fix, but having the whole argument seems
like a nicer user interface to me -- then users who really want
compatibility with the psABI's section on frame records can just ask for
it directly (via the odd spelling `-fno-omit-frame-pointer
-mno-omit-leaf-frame-pointer`, but too late to change that).

Thoughts on this for 13?

We'd probably also want it all the way back to 11, but I assume that's
going to be the same discussion.
---
 gcc/config/riscv/riscv.cc | 34 ++-
 gcc/config/riscv/riscv.opt|  4 +++
 .../gcc.target/riscv/omit-frame-pointer-1.c   |  7 
 .../gcc.target/riscv/omit-frame-pointer-2.c   |  7 
 .../gcc.target/riscv/omit-frame-pointer-3.c   |  7 
 .../gcc.target/riscv/omit-frame-pointer-4.c   |  7 
 .../riscv/omit-frame-pointer-test.c   | 13 +++
 7 files changed, 78 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/omit-frame-pointer-test.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index cefd3b7b2b2..e8572f8739d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -408,6 +408,10 @@ static const struct riscv_tune_info 
riscv_tune_info_table[] = {
 #include "riscv-cores.def"
 };
 
+/* Global variable to distinguish whether we should save and restore s0/fp for
+   function.  */
+static bool riscv_save_frame_pointer;
+
 void riscv_frame_info::reset(void)
 {
   total_size = 0;
@@ -4786,7 +4790,11 @@ riscv_save_reg_p (unsigned int regno)
   if (regno == HARD_FRAME_POINTER_REGNUM && frame_pointer_needed)
 return true;
 
-  if (regno == RETURN_ADDR_REGNUM && crtl->calls_eh_return)
+  /* Need not to use ra for leaf when frame pointer is turned off by option
+ whatever the omit-leaf-frame's value.  */
+  bool keep_leaf_ra = frame_pointer_needed && crtl->is_leaf
+&& !TARGET_OMIT_LEAF_FRAME_POINTER;
+  if (regno == RETURN_ADDR_REGNUM && (crtl->calls_eh_return || keep_leaf_ra))
 return true;
 
   /* If this is an interrupt handler, then must save extra registers.  */
@@ -6316,6 +6324,21 @@ riscv_option_override (void)
   if (flag_pic)
 riscv_cmodel = CM_PIC;
 
+  /* We need to save the fp with ra for non-leaf functions with no fp and ra
+ for leaf functions while no-omit-frame-pointer with
+ omit-leaf-frame-pointer.  The x_flag_omit_frame_pointer has the first
+ priority to determine whether the frame pointer is needed.  If we do not
+ override it, the fp and ra will be stored for leaf functions, which is not
+ our wanted.  */
+  riscv_save_frame_pointer = false;
+  if (TARGET_OMIT_LEAF_FRAME_POINTER_P (global_options.x_target_flags))
+{
+  if (!global_options.x_flag_omit_frame_pointer)
+   riscv_save_frame_pointer = true;
+
+  global_options.x_flag_omit_frame_pointer = 1;
+}
+
   /* We get better code with explicit relocs for CM_MEDLOW, but
  worse code for the others (for now).  Pick the best default.  */
   if ((target_flags_explicit & MASK_EXPLICIT_RELOCS) == 0)
@@ -7235,6 +7258,12 @@ riscv_lshift_subword (machine_mode mode, rtx value, rtx 
shift,
  gen_lowpart (QImode, shift)));

Re: [COMMITTED] warn-access: Fix handling of unnamed types [PR109804]

2024-05-08 Thread Andrew Pinski
On Thu, Feb 22, 2024 at 9:28 AM Andrew Pinski  wrote:
>
> This looks like an oversight of handling DEMANGLE_COMPONENT_UNNAMED_TYPE.
> DEMANGLE_COMPONENT_UNNAMED_TYPE only has the u.s_number.number set while
> the code expected newc.u.s_binary.left would be valid.
> So this treats DEMANGLE_COMPONENT_UNNAMED_TYPE like we treat function 
> parameters
> (DEMANGLE_COMPONENT_FUNCTION_PARAM) and template parameters 
> (DEMANGLE_COMPONENT_TEMPLATE_PARAM).
>
> Note the code in the demangler does this when it sets 
> DEMANGLE_COMPONENT_UNNAMED_TYPE:
>   ret->type = DEMANGLE_COMPONENT_UNNAMED_TYPE;
>   ret->u.s_number.number = num;
>
> Committed as obvious after bootstrap/test on x86_64-linux-gnu
> Will commit to other branches in a few days.

Now committed (with the testcase fix backported too) to the GCC 12 branch.
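
For readers following the quoted explanation, here is a minimal sketch of why the unnamed-type case has to be compared through the number member rather than through the binary children.  It is not the libiberty or GCC code: Component, ComponentKind, make_unnamed and mismatch_p are hypothetical stand-ins for demangle_component and new_delete_mismatch_p, reduced to the two union members the message mentions.

```cpp
#include <cassert>

// Hypothetical, reduced model of a demangler component.
enum ComponentKind { FunctionParam, TemplateParam, UnnamedType, BinaryNode };

struct Component
{
  ComponentKind type;
  union
  {
    struct { long number; } s_number;                    // the number-carrying kinds
    struct { const Component *left, *right; } s_binary;  // kinds with two children
  } u;
};

// Build an unnamed-type component; only u.s_number.number is set.
Component make_unnamed (long n)
{
  Component c{};
  c.type = UnnamedType;
  c.u.s_number.number = n;
  return c;
}

// Return true if the two components differ.
bool mismatch_p (const Component &newc, const Component &delc)
{
  if (newc.type != delc.type)
    return true;

  switch (newc.type)
    {
    case FunctionParam:
    case TemplateParam:
    case UnnamedType:
      // Number-carrying kinds: reading s_binary here would be invalid.
      return newc.u.s_number.number != delc.u.s_number.number;

    case BinaryNode:
      {
        const Component *nl = newc.u.s_binary.left;
        const Component *dl = delc.u.s_binary.left;
        const Component *nr = newc.u.s_binary.right;
        const Component *dr = delc.u.s_binary.right;
        if (!nl != !dl || !nr != !dr)
          return true;
        return (nl && mismatch_p (*nl, *dl)) || (nr && mismatch_p (*nr, *dr));
      }
    }
  return false;
}

// Small self-check that doubles as a usage example.
void check_unnamed_types ()
{
  assert (!mismatch_p (make_unnamed (1), make_unnamed (1)));
  assert (mismatch_p (make_unnamed (1), make_unnamed (2)));
}
```

The point of the sketch is only the grouping of the three number-carrying cases; the numbers used in the self-check are arbitrary.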

Thanks,
Andrew Pinski

>
> PR tree-optimization/109804
>
> gcc/ChangeLog:
>
> * gimple-ssa-warn-access.cc (new_delete_mismatch_p): Handle
> DEMANGLE_COMPONENT_UNNAMED_TYPE.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/warn/Wmismatched-new-delete-8.C: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/gimple-ssa-warn-access.cc |  1 +
>  .../g++.dg/warn/Wmismatched-new-delete-8.C| 42 +++
>  2 files changed, 43 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-8.C
>
> diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
> index cd083ab2237..dedaae27b31 100644
> --- a/gcc/gimple-ssa-warn-access.cc
> +++ b/gcc/gimple-ssa-warn-access.cc
> @@ -1701,6 +1701,7 @@ new_delete_mismatch_p (const demangle_component &newc,
>
>  case DEMANGLE_COMPONENT_FUNCTION_PARAM:
>  case DEMANGLE_COMPONENT_TEMPLATE_PARAM:
> +case DEMANGLE_COMPONENT_UNNAMED_TYPE:
>return newc.u.s_number.number != delc.u.s_number.number;
>
>  case DEMANGLE_COMPONENT_CHARACTER:
> diff --git a/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-8.C 
> b/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-8.C
> new file mode 100644
> index 000..0ddc056c6df
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-8.C
> @@ -0,0 +1,42 @@
> +/* PR tree-optimization/109804 */
> +/* { dg-do compile { target c++11 } } */
> +/* { dg-options "-Wall" } */
> +
> +/* Here we used to ICE in new_delete_mismatch_p because
> +   we didn't handle unnamed types from the demangler 
> (DEMANGLE_COMPONENT_UNNAMED_TYPE). */
> +
> +template 
> +static inline T * construct_at(void *at, ARGS && args)
> +{
> + struct Placeable : T
> + {
> +  Placeable(ARGS && args) : T(args) { }
> +  void * operator new (long unsigned int, void *ptr) { return ptr; }
> +  void operator delete (void *, void *) { }
> + };
> + return new (at) Placeable(static_cast(args));
> +}
> +template 
> +struct Reconstructible
> +{
> +  char _space[sizeof(MT)];
> +  Reconstructible() { }
> +};
> +template 
> +struct Constructible : Reconstructible
> +{
> + Constructible(){}
> +};
> +struct A { };
> +struct B
> +{
> + Constructible a { };
> + B(int) { }
> +};
> +Constructible b { };
> +void f()
> +{
> +  enum { ENUM_A = 1 };
> +  enum { ENUM_B = 1 };
> +  construct_at(b._space, ENUM_B);
> +}
> --
> 2.43.0
>


[patch,avr,applied] PR114981: Implement __powidf2

2024-05-08 Thread Georg-Johann Lay

This adds __powidf2 as a wrapper in LibF7.

Johann

--

avr: target/114981 - Support __builtin_powi[l] / __powidf2.

This supports __powidf2 by means of a double wrapper for already
existing f7_powi (renamed to __f7_powi by f7-renames.h).
It tweaks the implementation so that it does not perform trivial
multiplications with 1.0 any more, but instead uses a move.
It also fixes the last statement of f7_powi, which was wrong.
Notice that f7_powi was unused until now.
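
As a rough illustration of the tweak described above, here is a sketch of exponentiation by squaring on plain doubles that replaces the first multiplication with 1.0 by a copy.  It is not the LibF7 code: powi_sketch and check_powi_sketch are hypothetical names, and double arithmetic stands in for the f7_t routines (f7_copy, f7_Imul, f7_Isquare, f7_div1).  The early exit on a zero result that f7_powi performs is left out.

```cpp
#include <cassert>

// Hypothetical double-based model of the algorithm.
double powi_sketch (double x, int n)
{
  // Take the magnitude in unsigned arithmetic so that INT_MIN is safe.
  unsigned u = n < 0 ? 0u - (unsigned) n : (unsigned) n;
  double result = 0.0;        // Only meaningful once result_is_one is false.
  bool result_is_one = true;  // Result is still the implicit 1.0.
  double square = x;          // Holds x^(2^k) for the current bit k.

  while (u != 0)
    {
      if (u & 1)
        {
          if (result_is_one)
            {
              // result *= square simplifies to result = square.
              result = square;
              result_is_one = false;
            }
          else
            result *= square;
        }

      u >>= 1;
      if (u != 0)
        square *= square;
    }

  if (result_is_one)
    return 1.0;               // Exponent was 0.

  // A negative exponent inverts the accumulated result.
  return n < 0 ? 1.0 / result : result;
}

// Self-check using the same base and exponent range as the new test.
void check_powi_sketch ()
{
  assert (powi_sketch (-2.0, -4) == 0.0625);
  assert (powi_sketch (-2.0, -3) == -0.125);
  assert (powi_sketch (-2.0, 0) == 1.0);
  assert (powi_sketch (-2.0, 4) == 16.0);
}
```

All values in the self-check are exact powers of two, so comparing with == is safe here.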

PR target/114981
libgcc/config/avr/libf7/
* libf7-common.mk (F7_ASM_PARTS): Add D_powi
* libf7-asm.sx (F7MOD_D_powi_, __powidf2): New module and function.
* libf7.c (f7_powi): Fix last (wrong) statement.
Tweak trivial multiplications with 1.0.
testsuite/
* gcc.target/avr/pr114981-powil.c: New test.

diff --git a/gcc/testsuite/gcc.target/avr/pr114981-powil.c b/gcc/testsuite/gcc.target/avr/pr114981-powil.c
new file mode 100644
index 000..70f8e796c65
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/pr114981-powil.c
@@ -0,0 +1,33 @@
+/* { dg-do run { target { ! avr_tiny } } } */
+/* { dg-additional-options "-Os" } */
+
+const long double vals[] =
+  {
+0.0625L, -0.125L, 0.25L, -0.5L,
+1.0L,
+-2.0L, 4.0L, -8.0L, 16.0L
+  };
+
+#define ARRAY_SIZE(X) ((int) (sizeof(X) / sizeof(*X)))
+
+__attribute__((noinline,noclone))
+void test1 (long double x)
+{
+  int i;
+
+  for (i = 0; i < ARRAY_SIZE (vals); ++i)
+{
+  long double val0 = vals[i];
+  long double val1 = __builtin_powil (x, i - 4);
+  __asm ("" : "+r" (val0));
+
+  if (val0 != val1)
+	__builtin_exit (__LINE__);
+}
+}
+
+int main (void)
+{
+  test1 (-2.0L);
+  return 0;
+}
diff --git a/libgcc/config/avr/libf7/libf7-asm.sx b/libgcc/config/avr/libf7/libf7-asm.sx
index 1ab91270cb2..1f8f60ab282 100644
--- a/libgcc/config/avr/libf7/libf7-asm.sx
+++ b/libgcc/config/avr/libf7/libf7-asm.sx
@@ -1877,4 +1877,16 @@ DEFUN call_ddd
 
 #include "f7-wraps.h"
 
+;;; Some additional, singular wraps that don't match any pattern.
+
+;; double __powidf2 (double, int)  ; __builtin_powi
+#ifdef F7MOD_D_powi_
+_DEFUN __powidf2
+.global F7_NAME(powi)
+ldi ZH, hi8(gs(F7_NAME(powi)))
+ldi ZL, lo8(gs(F7_NAME(powi)))
+F7jmp   call_ddx
+_ENDF __powidf2
+#endif /* F7MOD_D_powi_ */
+
 #endif /* !AVR_TINY */
diff --git a/libgcc/config/avr/libf7/libf7-common.mk b/libgcc/config/avr/libf7/libf7-common.mk
index d541b48ff3c..5d411071c8e 100644
--- a/libgcc/config/avr/libf7/libf7-common.mk
+++ b/libgcc/config/avr/libf7/libf7-common.mk
@@ -22,7 +22,7 @@ F7_ASM_PARTS += addsub_mant_scaled store load
 F7_ASM_PARTS += to_integer to_unsigned clz normalize_with_carry normalize
 F7_ASM_PARTS += store_expo sqrt16 sqrt_approx div
 
-F7_ASM_PARTS += D_class D_fma
+F7_ASM_PARTS += D_class D_fma D_powi
 F7_ASM_PARTS += D_isnan D_isinf D_isfinite D_signbit D_copysign D_neg D_fabs
 
 F7_ASM_PARTS += call_dd call_ddd
diff --git a/libgcc/config/avr/libf7/libf7.c b/libgcc/config/avr/libf7/libf7.c
index 369dbe24103..375becb854c 100644
--- a/libgcc/config/avr/libf7/libf7.c
+++ b/libgcc/config/avr/libf7/libf7.c
@@ -1752,20 +1752,33 @@ void f7_powi (f7_t *cc, const f7_t *aa, int ii)
 {
   uint16_t u16 = ii;
   f7_t xx27, *xx2 = &xx27;
+  bool cc_is_one = true;
+  bool expo_is_neg = false;
 
   if (ii < 0)
-u16 = -u16;
+{
+  u16 = -u16;
+  expo_is_neg = true;
+}
 
   f7_copy (xx2, aa);
 
-  f7_set_u16 (cc, 1);
-
   while (1)
 {
   if (u16 & 1)
-	f7_Imul (cc, xx2);
+	{
+	  if (cc_is_one)
+	{
+	  // C *= X2 simplifies to C = X2.
+	  f7_copy (cc, xx2);
+	  cc_is_one = false;
+	}
+	  else
+	f7_Imul (cc, xx2);
+	}
 
-  if (! f7_is_nonzero (cc))
+  if (! cc_is_one
+	  && ! f7_is_nonzero (cc))
 	break;
 
   u16 >>= 1;
@@ -1774,8 +1787,10 @@ void f7_powi (f7_t *cc, const f7_t *aa, int ii)
   f7_Isquare (xx2);
 }
 
-  if (ii < 0)
-f7_div1 (xx2, aa);
+  if (cc_is_one)
+f7_set_u16 (cc, 1);
+  else if (expo_is_neg)
+f7_div1 (cc, cc);
 }
 #endif // F7MOD_powi_
 


[pushed][PR114810][LRA]: Recognize alternatives with lack of available registers for insn and demote them.

2024-05-08 Thread Vladimir Makarov

The following patch is a fix for PR114810 from LRA side.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114810

The patch was successfully bootstrapped and tested on x86_64, aarch64, 
ppc64le.


commit dc859c1fcb6f3ad95022fb078c040907ef361e4c
Author: Vladimir N. Makarov 
Date:   Wed May 8 10:39:04 2024 -0400

[PR114810][LRA]: Recognize alternatives with lack of available registers for insn and demote them.

  PR114810 was fixed in a machine-dependent way.  This patch is a fix of
the PR on the LRA side.  LRA chose the alternative with constraints `,r,ro`
on i686 when all operands are of DImode and there are only 6 available
general regs.  The patch recognizes such a case and significantly
increases the alternative cost.  It does not reject the alternative
completely.  So the fix is safe, but it might not work for all
possible cases of register shortage, as register classes can
have any relations, including subsets and intersections.
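
The starvation test itself can be sketched outside of LRA as below.  This is only a model under stated assumptions: AltRegUsage, register_starvation_p and check_starvation are hypothetical names, the has_reg_class flag stands in for the "class != NO_REGS" test, and the register counts in the self-check are illustrative numbers, not figures taken from the PR.

```cpp
#include <cassert>

// Hypothetical summary of one insn alternative.
struct AltRegUsage
{
  bool has_reg_class;  // models all_this_alternative != NO_REGS
  int avail_regs;      // hard registers in the union class
  int used_nregs;      // registers needed by operands that already match
  int reload_nregs;    // registers needed by operands that require reloads
};

// True if the alternative should be made costly because matched operands
// plus reloads (plus one spare register) would not fit in the class.
bool register_starvation_p (const AltRegUsage &u)
{
  return u.has_reg_class
         && u.used_nregs != 0
         && u.reload_nregs != 0
         && u.used_nregs + u.reload_nregs + 1 >= u.avail_regs;
}

// Self-check with illustrative numbers.
void check_starvation ()
{
  // 6 registers available, 4 in use, 2 needed for reloads: 4 + 2 + 1 >= 6.
  assert (register_starvation_p ({true, 6, 4, 2}));
  // Same pressure in a 16-register class is not a problem.
  assert (!register_starvation_p ({true, 16, 4, 2}));
  // No reloads needed: the check does not fire.
  assert (!register_starvation_p ({true, 6, 4, 0}));
}
```

Note that in the patch a positive result only adds LRA_MAX_REJECT to the cost; the alternative stays selectable.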

gcc/ChangeLog:

PR target/114810
* lra-constraints.cc (process_alt_operands): Calculate union reg
class for the alternative, peak matched regs and required reload
regs.  Recognize alternatives with lack of available registers and
make them costly.  Add debug print about this case.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 10e3d4e4097..5b78fd0b7e5 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -2127,6 +2127,8 @@ process_alt_operands (int only_alternative)
   /* Numbers of operands which are early clobber registers.  */
   int early_clobbered_nops[MAX_RECOG_OPERANDS];
   enum reg_class curr_alt[MAX_RECOG_OPERANDS];
+  enum reg_class all_this_alternative;
+  int all_used_nregs, all_reload_nregs;
   HARD_REG_SET curr_alt_set[MAX_RECOG_OPERANDS];
   HARD_REG_SET curr_alt_exclude_start_hard_regs[MAX_RECOG_OPERANDS];
   bool curr_alt_match_win[MAX_RECOG_OPERANDS];
@@ -2229,7 +2231,8 @@ process_alt_operands (int only_alternative)
   curr_alt_out_sp_reload_p = false;
   curr_reuse_alt_p = true;
   curr_alt_class_change_p = false;
-  
+  all_this_alternative = NO_REGS;
+  all_used_nregs = all_reload_nregs = 0;
   for (nop = 0; nop < n_operands; nop++)
 	{
 	  const char *p;
@@ -2660,6 +2663,15 @@ process_alt_operands (int only_alternative)
 	  /* Record which operands fit this alternative.  */
 	  if (win)
 	{
+	  if (early_clobber_p
+		  || curr_static_id->operand[nop].type != OP_OUT)
+		{
+		  all_used_nregs
+		+= ira_reg_class_min_nregs[this_alternative][mode];
+		  all_this_alternative
+		= (reg_class_subunion
+		   [all_this_alternative][this_alternative]);
+		}
 	  this_alternative_win = true;
 	  if (class_change_p)
 		{
@@ -2781,7 +2793,19 @@ process_alt_operands (int only_alternative)
 		   & ~((ira_prohibited_class_mode_regs
 			[this_alternative][mode])
 			   | lra_no_alloc_regs));
-		  if (hard_reg_set_empty_p (available_regs))
+		  if (!hard_reg_set_empty_p (available_regs))
+		{
+		  if (early_clobber_p
+			  || curr_static_id->operand[nop].type != OP_OUT)
+			{
+			  all_reload_nregs
+			+= ira_reg_class_min_nregs[this_alternative][mode];
+			  all_this_alternative
+			= (reg_class_subunion
+			   [all_this_alternative][this_alternative]);
+			}
+		}
+		  else
 		{
 		  /* There are no hard regs holding a value of given
 			 mode.  */
@@ -3217,6 +3241,21 @@ process_alt_operands (int only_alternative)
 		 "Cycle danger: overall += LRA_MAX_REJECT\n");
 	  overall += LRA_MAX_REJECT;
 	}
+  if (all_this_alternative != NO_REGS
+	  && all_used_nregs != 0 && all_reload_nregs != 0
+	  && (all_used_nregs + all_reload_nregs + 1
+	  >= ira_class_hard_regs_num[all_this_alternative]))
+	{
+	  if (lra_dump_file != NULL)
+	fprintf
+	  (lra_dump_file,
+	   "Register starvation: overall += LRA_MAX_REJECT"
+	   "(class=%s,avail=%d,used=%d,reload=%d)\n",
+	   reg_class_names[all_this_alternative],
+	   ira_class_hard_regs_num[all_this_alternative],
+	   all_used_nregs, all_reload_nregs);
+	  overall += LRA_MAX_REJECT;
+	}
   ok_p = true;
   curr_alt_dont_inherit_ops_num = 0;
   for (nop = 0; nop < early_clobbered_regs_num; nop++)


Re: [PATCH 4/4] lra: Apply DF_LIVE_SUBREG data

2024-05-08 Thread Vladimir Makarov



On 5/7/24 23:01, Lehua Ding wrote:

Hi Vladimir,

I'll send V3 patches based on these comments. Note that these four 
patches only support subreg liveness tracking and apply to IRA and LRA 
pass. Therefore, no performance changes are expected before we support 
subreg coalesce. There will be new patches later to complete the 
subreg coalesce functionality. Support for subreg coalesce requires 
support for subreg copy i.e. modifying the logic for conflict detection.



Thank you for your clarification that the current batch of patches does 
not change the performance.  I hope the next batch of patches will be 
added to devel/subreg-coalesce branch too for their easier evaluation.





Re: [COMMITTED/13] Fix PR 111331: wrong code for `a > 28 ? MIN : 29`

2024-05-08 Thread Andrew Pinski
On Sun, Oct 1, 2023 at 1:23 PM Andrew Pinski  wrote:
>
> From: Andrew Pinski 
>
> The problem here is that after r6-7425-ga9fee7cdc3c62d0e51730,
> the comparison to see if the transformation could be done was using the
> wrong value. Instead of checking whether the inner value was LE (for MIN,
> and GE for MAX) the outer value, it was comparing the inner value to the
> value used in the comparison, which was wrong.
> Committed to GCC 13 branch after bootstrapped and tested on x86_64-linux-gnu.

Committed also to GCC 12 and 11 branches.
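
For anyone reading this without the PR open, the shape of the testcases can be sketched as follows.  This is a hand-written illustration, not what GCC generated: clamp_like mirrors the function in pr111331-1.c, while minmax_rewrite is a hypothetical naive rewrite that ignores the ordering of the two constants, included only to show why the bound has to be checked against the right value.

```cpp
#include <algorithm>
#include <cassert>

// Same control flow as c() in pr111331-1.c.
int clamp_like (int d, int lo, int hi)
{
  if (d < lo)
    return lo;
  if (d > hi)
    return hi;
  return d;
}

// Hypothetical MIN/MAX form; equivalent to clamp_like only when lo <= hi.
int minmax_rewrite (int d, int lo, int hi)
{
  return std::max (std::min (d, hi), lo);
}

// Self-check with the constants from the testcase (lo = 29, hi = 28).
void check_pr111331_shape ()
{
  // The source semantics give 28, which is what the test expects.
  assert (clamp_like (30, 29, 28) == 28);
  // The naive rewrite gives 29 because lo > hi.
  assert (minmax_rewrite (30, 29, 28) == 29);
  // With properly ordered bounds the two forms agree.
  assert (clamp_like (30, 10, 28) == minmax_rewrite (30, 10, 28));
}
```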

>
> gcc/ChangeLog:
>
> PR tree-optimization/111331
> * tree-ssa-phiopt.cc (minmax_replacement):
> Fix the LE/GE comparison for the
> `(a CMP CST1) ? max : a` optimization.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/111331
> * gcc.c-torture/execute/pr111331-1.c: New test.
> * gcc.c-torture/execute/pr111331-2.c: New test.
> * gcc.c-torture/execute/pr111331-3.c: New test.
>
> (cherry picked from commit 30e6ee074588bacefd2dfe745b188bb20c81fe5e)
> ---
>  .../gcc.c-torture/execute/pr111331-1.c| 17 +
>  .../gcc.c-torture/execute/pr111331-2.c| 19 +++
>  .../gcc.c-torture/execute/pr111331-3.c| 15 +++
>  gcc/tree-ssa-phiopt.cc|  8 
>  4 files changed, 55 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr111331-1.c
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr111331-2.c
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr111331-3.c
>
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr111331-1.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr111331-1.c
> new file mode 100644
> index 000..4c7f4fdbaa9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr111331-1.c
> @@ -0,0 +1,17 @@
> +int a;
> +int b;
> +int c(int d, int e, int f) {
> +  if (d < e)
> +return e;
> +  if (d > f)
> +return f;
> +  return d;
> +}
> +int main() {
> +  int g = -1;
> +  a = c(b + 30, 29, g + 29);
> +  volatile t = a;
> +  if (t != 28)
> +__builtin_abort();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr111331-2.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr111331-2.c
> new file mode 100644
> index 000..5c677f2caa9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr111331-2.c
> @@ -0,0 +1,19 @@
> +
> +int a;
> +int b;
> +
> +int main() {
> +  int d = b+30;
> +  {
> +int t;
> +if (d < 29)
> +  t =  29;
> +else
> +  t = (d > 28) ? 28 : d;
> +a = t;
> +  }
> +  volatile int t = a;
> +  if (a != 28)
> +__builtin_abort();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr111331-3.c 
> b/gcc/testsuite/gcc.c-torture/execute/pr111331-3.c
> new file mode 100644
> index 000..213d9bdd539
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr111331-3.c
> @@ -0,0 +1,15 @@
> +int a;
> +int b;
> +
> +int main() {
> +  int d = b+30;
> +  {
> +int t;
> +t = d < 29 ? 29 : ((d > 28) ? 28 : d);
> +a = t;
> +  }
> +  volatile int t = a;
> +  if (a != 28)
> +__builtin_abort();
> +  return 0;
> +}
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index a7ab6ce4ad9..c3d78d1400b 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -2270,7 +2270,7 @@ minmax_replacement (basic_block cond_bb, basic_block 
> middle_bb, basic_block alt_
>
>   /* We need BOUND <= LARGER.  */
>   if (!integer_nonzerop (fold_build2 (LE_EXPR, boolean_type_node,
> - bound, larger)))
> + bound, arg_false)))
> return false;
> }
>   else if (operand_equal_for_phi_arg_p (arg_false, smaller)
> @@ -2301,7 +2301,7 @@ minmax_replacement (basic_block cond_bb, basic_block 
> middle_bb, basic_block alt_
>
>   /* We need BOUND >= SMALLER.  */
>   if (!integer_nonzerop (fold_build2 (GE_EXPR, boolean_type_node,
> - bound, smaller)))
> + bound, arg_false)))
> return false;
> }
>   else
> @@ -2341,7 +2341,7 @@ minmax_replacement (basic_block cond_bb, basic_block 
> middle_bb, basic_block alt_
>
>   /* We need BOUND >= LARGER.  */
>   if (!integer_nonzerop (fold_build2 (GE_EXPR, boolean_type_node,
> - bound, larger)))
> + bound, arg_true)))
> return false;
> }
>   else if (operand_equal_for_phi_arg_p (arg_true, smaller)
> @@ -2368,7 +2368,7 @@ minmax_replacement (basic_block cond_bb, basic_block 
> middle_bb, basic_block alt_
>
>   /* We need BOUND <= SMALLER.  */
>   if 

Re: [COMMITTED] Fold: Fix up merge_truthop_with_opposite_arm for NaNs [PR95351]

2024-05-08 Thread Andrew Pinski
On Mon, Mar 11, 2024 at 11:41 PM Andrew Pinski (QUIC)
 wrote:
>
> > -Original Message-
> > From: Andrew Pinski (QUIC) 
> > Sent: Sunday, March 10, 2024 7:58 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Andrew Pinski (QUIC) 
> > Subject: [COMMITTED] Fold: Fix up merge_truthop_with_opposite_arm for
> > NaNs [PR95351]
> >
> > The problem here is that merge_truthop_with_opposite_arm would use the
> > type of the result of the comparison rather than the operands of the
> > comparison to figure out if we are honoring NaNs.
> > This fixes that oversight and now we get the correct results in this case.
> >
> > Committed as obvious after a bootstrap/test on x86_64-linux-gnu.
>
> Committed to the GCC 13 branch too.

And the GCC 12 and 11 branches too.
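
A small sketch of why the operand type matters here, using the expression from the new test.  The names original_expr, folded_ignoring_nans and check_nan_case are hypothetical, and the second function is just a hand-written model of what the expression turns into if `x > 2.0` is treated as the exact opposite of `x <= 2.0`; it is not a dump of what fold-const produced.

```cpp
#include <cassert>
#include <limits>

// The expression from float_opposite_arm-1.c.
bool original_expr (double x, double b, double c)
{
  return (x <= 2.0) || ((x > 2.0) && (b > c));
}

// Valid only if x can never be a NaN: drops the x > 2.0 test.
bool folded_ignoring_nans (double x, double b, double c)
{
  return (x <= 2.0) || (b > c);
}

// Self-check: the two forms differ exactly when x is a NaN and b > c.
void check_nan_case ()
{
  const double qnan = std::numeric_limits<double>::quiet_NaN ();

  // Both comparisons against a NaN are false, so the original is false.
  assert (!original_expr (qnan, 2.0, 1.0));
  // The folded form wrongly falls through to b > c.
  assert (folded_ignoring_nans (qnan, 2.0, 1.0));
  // For ordinary values the two agree.
  assert (original_expr (3.0, 2.0, 1.0) == folded_ignoring_nans (3.0, 2.0, 1.0));
  assert (original_expr (3.0, 1.0, 2.0) == folded_ignoring_nans (3.0, 1.0, 2.0));
}
```

The result of a comparison is an integer or boolean type, which never honors NaNs, so the decision has to be based on the type of the compared operands.  The sketch assumes default floating-point options (no -ffast-math).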


>
> Thanks,
> Andrew
>
> >
> >   PR middle-end/95351
> >
> > gcc/ChangeLog:
> >
> >   * fold-const.cc (merge_truthop_with_opposite_arm): Use
> >   the type of the operands of the comparison and not the type
> >   of the comparison.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/float_opposite_arm-1.c: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/fold-const.cc   |  3 ++-
> >  gcc/testsuite/gcc.dg/float_opposite_arm-1.c | 17 +
> >  2 files changed, 19 insertions(+), 1 deletion(-)  create mode 100644
> > gcc/testsuite/gcc.dg/float_opposite_arm-1.c
> >
> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index
> > 43105d20be3..299c22bf391 100644
> > --- a/gcc/fold-const.cc
> > +++ b/gcc/fold-const.cc
> > @@ -6420,7 +6420,6 @@ static tree
> >  merge_truthop_with_opposite_arm (location_t loc, tree op, tree cmpop,
> >bool rhs_only)
> >  {
> > -  tree type = TREE_TYPE (cmpop);
> >enum tree_code code = TREE_CODE (cmpop);
> >enum tree_code truthop_code = TREE_CODE (op);
> >tree lhs = TREE_OPERAND (op, 0);
> > @@ -6436,6 +6435,8 @@ merge_truthop_with_opposite_arm (location_t
> > loc, tree op, tree cmpop,
> >if (TREE_CODE_CLASS (code) != tcc_comparison)
> >  return NULL_TREE;
> >
> > +  tree type = TREE_TYPE (TREE_OPERAND (cmpop, 0));
> > +
> >if (rhs_code == truthop_code)
> >  {
> >tree newrhs = merge_truthop_with_opposite_arm (loc, rhs, cmpop,
> > rhs_only); diff --git a/gcc/testsuite/gcc.dg/float_opposite_arm-1.c
> > b/gcc/testsuite/gcc.dg/float_opposite_arm-1.c
> > new file mode 100644
> > index 000..d2dbff35066
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/float_opposite_arm-1.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O1 -fdump-tree-original -fdump-tree-optimized" } */
> > +/* { dg-add-options ieee } */
> > +/* PR middle-end/95351 */
> > +
> > +int Foo(double possiblyNAN, double b, double c) {
> > +return (possiblyNAN <= 2.0) || ((possiblyNAN  > 2.0) && (b > c)); }
> > +
> > +/* Make sure we don't remove either >/<=  */
> > +
> > +/* { dg-final { scan-tree-dump "possiblyNAN > 2.0e.0" "original" } } */
> > +/* { dg-final { scan-tree-dump "possiblyNAN_\[0-9\]+.D. > 2.0e.0"
> > +"optimized" } } */
> > +
> > +/* { dg-final { scan-tree-dump "possiblyNAN <= 2.0e.0" "original" } }
> > +*/
> > +/* { dg-final { scan-tree-dump "possiblyNAN_\[0-9\]+.D. <= 2.0e.0"
> > +"optimized" } } */
> > --
> > 2.43.0
>


Re: [PATCH 1/2] Fix PR 110066: crash with -pg -static on riscv

2024-05-08 Thread Andrew Pinski
On Sat, Jul 22, 2023 at 8:36 PM Kito Cheng via Gcc-patches
 wrote:
>
> OK for trunk, thanks:)

I have now backported it to 13 branch.

Thanks,
Andrew


>
> Andrew Pinski via Gcc-patches  wrote on Sun, Jul 23, 2023 at
> 09:07:
>
> > The problem is that -fasynchronous-unwind-tables is on by default for riscv linux.
> > We need to turn it off for crt*.o because it would make __EH_FRAME_BEGIN__
> > point
> > to .eh_frame data from crtbeginT.o instead of the user-defined object
> > during static linking.
> >
> > This turns it off.
> >
> > OK?
> >
> > libgcc/ChangeLog:
> >
> > * config.host (riscv*-*-linux*): Add t-crtstuff to tmake_file.
> > (riscv*-*-freebsd*): Likewise.
> > * config/riscv/t-crtstuff: New file.
> > ---
> >  libgcc/config.host | 4 ++--
> >  libgcc/config/riscv/t-crtstuff | 5 +
> >  2 files changed, 7 insertions(+), 2 deletions(-)
> >  create mode 100644 libgcc/config/riscv/t-crtstuff
> >
> > diff --git a/libgcc/config.host b/libgcc/config.host
> > index 9d7212028d0..c94d69d84b7 100644
> > --- a/libgcc/config.host
> > +++ b/libgcc/config.host
> > @@ -1304,12 +1304,12 @@ pru-*-*)
> > tm_file="$tm_file pru/pru-abi.h"
> > ;;
> >  riscv*-*-linux*)
> > -   tmake_file="${tmake_file} riscv/t-softfp${host_address} t-softfp
> > riscv/t-elf riscv/t-elf${host_address} t-slibgcc-libgcc"
> > +   tmake_file="${tmake_file} riscv/t-crtstuff
> > riscv/t-softfp${host_address} t-softfp riscv/t-elf
> > riscv/t-elf${host_address} t-slibgcc-libgcc"
> > extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o
> > crtendS.o crtbeginT.o"
> > md_unwind_header=riscv/linux-unwind.h
> > ;;
> >  riscv*-*-freebsd*)
> > -   tmake_file="${tmake_file} riscv/t-softfp${host_address} t-softfp
> > riscv/t-elf riscv/t-elf${host_address} t-slibgcc-libgcc"
> > +   tmake_file="${tmake_file} riscv/t-crtstuff
> > riscv/t-softfp${host_address} t-softfp riscv/t-elf
> > riscv/t-elf${host_address} t-slibgcc-libgcc"
> > extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o
> > crtendS.o crtbeginT.o"
> > ;;
> >  riscv*-*-*)
> > diff --git a/libgcc/config/riscv/t-crtstuff
> > b/libgcc/config/riscv/t-crtstuff
> > new file mode 100644
> > index 000..685d11b3e66
> > --- /dev/null
> > +++ b/libgcc/config/riscv/t-crtstuff
> > @@ -0,0 +1,5 @@
> > +# -fasynchronous-unwind-tables -funwind-tables is on by default for riscv
> > linux
> > +# We turn it off for crt*.o because it would make __EH_FRAME_BEGIN__ point
> > +# to .eh_frame data from crtbeginT.o instead of the user-defined object
> > +# during static linking.
> > +CRTSTUFF_T_CFLAGS += -fno-asynchronous-unwind-tables -fno-unwind-tables
> > --
> > 2.39.1
> >
> >


Re: [PATCH 2/4] df: Add DF_LIVE_SUBREG problem

2024-05-08 Thread Dimitar Dimitrov
On Wed, May 08, 2024 at 11:34:48AM +0800, 陈硕 wrote:
> Hi Dimitar
> 
> 
> I sent a patch just now, modified accordingly.
> 
> 
> some comments:
> 
> 
> Nit: Should have two spaces after the dot, per GNU coding style. 
> I'd suggest
> to run the contrib/check_GNU_style.py script on your patches.
> Do you mean "star" by "dot", i.e. "/*" should be "/* "?

No, I was referring to the following paragraph from
https://www.gnu.org/prep/standards/standards.html :
   "Please put two spaces after the end of a sentence in your comments, ..."

To fix, simply add a second space after the dot, e.g.:
  -   Like DF_LR, but include tracking subreg liveness. Currently used to 
provide
  +   Like DF_LR, but include tracking subreg liveness.  Currently used to 
provide


For reference, here is the output from the style checker:
  $ git show | ./contrib/check_GNU_style.py -
  === ERROR type #4: dot, space, space, new sentence (24 error(s)) ===
  ...
  gcc/df-problems.cc:1350:52:   Like DF_LR, but include tracking subreg 
liveness.█Currently used to provide

> 
> 
> These names seem a bit too short for global variables. Perhaps tuck
> them in a namespace?
> 
> Also, since these must remain empty, shouldn't they be declared as const?
> 
> namespace df {
>  const bitmap_head empty_bitmap;
>  const subregs_live empty_live;
> }
> 
> 
> 
> Might it be better if "namespace df" contained all DF-related code? As a minor 
> modification, I added a "df_" prefix to the variables.
> Meanwhile, const seems inappropriate here, since they are returned as normal 
> pointers rather than const pointers in some functions. 
> 
> Changing them to const would break the return type check, and a const_cast 
> would make the const meaningless.
> 
> 
> For more details, see the patch.

Thanks for considering my suggestion.

Regards,
Dimitar
> 
> 
> regards
> Shuo
> 
> 
> 


[Patch, fortran] PR84006 [11/12/13/14/15 Regression] ICE in storage_size() with CLASS entity

2024-05-08 Thread Paul Richard Thomas
This fix is straightforward and described by the ChangeLog. Jose Rui
Faustino de Sousa posted the same fix for the ICE on the fortran list
slightly more than three years ago. Thinking that he had commit rights, I
deferred but, regrettably, the patch was never applied. The attached patch
also fixes storage_size and transfer for unlimited polymorphic arguments
with character payloads.

OK for mainline and backporting after a reasonable interval?

Paul

Fortran: Unlimited polymorphic intrinsic function arguments [PR84006]

2024-05-08  Paul Thomas  

gcc/fortran
PR fortran/84006
PR fortran/100027
PR fortran/98534
* trans-expr.cc (gfc_resize_class_size_with_len): Use the fold
even if a block is not available in which to fix the result.
(trans_class_assignment): Enable correct assignment of
character expressions to unlimited polymorphic variables using
lhs _len field and rse string_length.
* trans-intrinsic.cc (gfc_conv_intrinsic_storage_size): Extract
the class expression so that the unlimited polymorphic class
expression can be used in gfc_resize_class_size_with_len to
obtain the storage size for character payloads. Guard the use
of GFC_DECL_SAVED_DESCRIPTOR by testing for DECL_LANG_SPECIFIC
to prevent the ICE. Also, invert the order to use the class
expression extracted from the argument.
(gfc_conv_intrinsic_transfer): In same way as 'storage_size',
use the _len field to obtaining the correct length for arg 1.

gcc/testsuite/
PR fortran/84006
PR fortran/100027
* gfortran.dg/storage_size_7.f90: New test.

PR fortran/98534
* gfortran.dg/transfer_class_4.f90: New test.
diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index bc8eb419cff..4590aa6edb4 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -317,6 +317,8 @@ gfc_resize_class_size_with_len (stmtblock_t * block, tree class_expr, tree size)
 	  size = gfc_evaluate_now (size, block);
 	  tmp = gfc_evaluate_now (fold_convert (type , tmp), block);
 	}
+  else
+	tmp = fold_convert (type , tmp);
   tmp2 = fold_build2_loc (input_location, MULT_EXPR,
 			  type, size, tmp);
   tmp = fold_build2_loc (input_location, GT_EXPR,
@@ -11994,15 +11996,24 @@ trans_class_assignment (stmtblock_t *block, gfc_expr *lhs, gfc_expr *rhs,
 
   /* Take into account _len of unlimited polymorphic entities.
 	 TODO: handle class(*) allocatable function results on rhs.  */
-  if (UNLIMITED_POLY (rhs) && rhs->expr_type == EXPR_VARIABLE)
+  if (UNLIMITED_POLY (rhs))
 	{
-	  tree len = trans_get_upoly_len (block, rhs);
+	  tree len;
+	  if (rhs->expr_type == EXPR_VARIABLE)
+	len = trans_get_upoly_len (block, rhs);
+	  else
+	len = gfc_class_len_get (tmp);
 	  len = fold_build2_loc (input_location, MAX_EXPR, size_type_node,
  fold_convert (size_type_node, len),
  size_one_node);
 	  size = fold_build2_loc (input_location, MULT_EXPR, TREE_TYPE (size),
   size, fold_convert (TREE_TYPE (size), len));
 	}
+  else if (rhs->ts.type == BT_CHARACTER && rse->string_length)
+	size = fold_build2_loc (input_location, MULT_EXPR,
+gfc_charlen_type_node, size,
+rse->string_length);
+
 
   tmp = lse->expr;
   class_han = GFC_CLASS_TYPE_P (TREE_TYPE (tmp))
diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index 83041183fcb..e18e4d1e183 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -8250,7 +8250,9 @@ gfc_conv_intrinsic_storage_size (gfc_se *se, gfc_expr *expr)
 {
   gfc_expr *arg;
   gfc_se argse;
-  tree type, result_type, tmp;
+  tree type, result_type, tmp, class_decl = NULL;
+  gfc_symbol *sym;
+  bool unlimited = false;
 
   arg = expr->value.function.actual->expr;
 
@@ -8261,10 +8263,12 @@ gfc_conv_intrinsic_storage_size (gfc_se *se, gfc_expr *expr)
 {
   if (arg->ts.type == BT_CLASS)
 	{
+	  unlimited = UNLIMITED_POLY (arg);
 	  gfc_add_vptr_component (arg);
 	  gfc_add_size_component (arg);
 	  gfc_conv_expr (&argse, arg);
 	  tmp = fold_convert (result_type, argse.expr);
+	  class_decl = gfc_get_class_from_expr (argse.expr);
 	  goto done;
 	}
 
@@ -8276,14 +8280,20 @@ gfc_conv_intrinsic_storage_size (gfc_se *se, gfc_expr *expr)
 {
   argse.want_pointer = 0;
   gfc_conv_expr_descriptor (&argse, arg);
+  sym = arg->expr_type == EXPR_VARIABLE ? arg->symtree->n.sym : NULL;
   if (arg->ts.type == BT_CLASS)
 	{
-	  if (arg->rank > 0)
+	  unlimited = UNLIMITED_POLY (arg);
+	  if (TREE_CODE (argse.expr) == COMPONENT_REF)
+	tmp = gfc_class_vtab_size_get (TREE_OPERAND (argse.expr, 0));
+	  else if (arg->rank > 0 && sym
+		   && DECL_LANG_SPECIFIC (sym->backend_decl))
 	tmp = gfc_class_vtab_size_get (
-		 GFC_DECL_SAVED_DESCRIPTOR (arg->symtree->n.sym->backend_decl));
+		 GFC_DECL_SAVED_DESCRIPTOR (sym->backend_decl));
 	  else
-	tmp = gfc_class_vtab_size_get (TREE_OPERAND (argse.expr, 0));
+	gcc_unreachable ();
 	  tmp = fold_convert (result_type, tmp);
+	  class_decl = gfc_get_class_from_expr 

Re: [PATCH] testsuite, rs6000: Remove some checks with aix[456]

2024-05-08 Thread David Edelsohn
On Wed, May 8, 2024 at 2:36 AM Kewen.Lin  wrote:

> Hi,
>
> Since aix6 support was dropped in r12-75-g0745b6fa66c69c,
> we don't need to check for aix[456].* when testing.  This
> patch removes such checks.
>
> Regtested on powerpc64-linux-gnu P8/P9 and
> powerpc64le-linux-gnu P9 and P10.
>

Okay


>
> I'm going to push this soon if no objections.
>
> BR,
> Kewen
> -
>
> gcc/testsuite/ChangeLog:
>
> * lib/target-supports.exp
> (check_effective_target_powerpc_altivec_ok): Remove checks for
> aix[456].*
> (check_effective_target_powerpc_p9modulo_ok): Likewise.
> (check_effective_target_powerpc_float128_sw_ok): Likewise.
> (check_effective_target_powerpc_float128_hw_ok): Likewise.
> (check_effective_target_powerpc_vsx_ok): Likewise.
> ---
>  gcc/testsuite/lib/target-supports.exp | 29 ---
>  1 file changed, 29 deletions(-)
>
> diff --git a/gcc/testsuite/lib/target-supports.exp
> b/gcc/testsuite/lib/target-supports.exp
> index 3a55b2a4159..16dc2766850 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -6963,11 +6963,6 @@ proc check_effective_target_powerpc_altivec_ok { } {
>  # Paired Single, then not ok
>  if { [istarget powerpc-*-linux*paired*] } { return 0 }
>
> -# AltiVec is not supported on AIX before 5.3.
> -if { [istarget powerpc*-*-aix4*]
> -|| [istarget powerpc*-*-aix5.1*]
> -|| [istarget powerpc*-*-aix5.2*] } { return 0 }
> -
>  # Return true iff compiling with -maltivec does not error.
>  return [check_no_compiler_messages powerpc_altivec_ok object {
> int dummy;
> @@ -6980,12 +6975,6 @@ proc check_effective_target_powerpc_p9modulo_ok { }
> {
>  if { ([istarget powerpc*-*-*]
>   && ![istarget powerpc-*-linux*paired*])
>  || [istarget rs6000-*-*] } {
> -   # AltiVec is not supported on AIX before 5.3.
> -   if { [istarget powerpc*-*-aix4*]
> -|| [istarget powerpc*-*-aix5.1*]
> -|| [istarget powerpc*-*-aix5.2*] } {
> -   return 0
> -   }
> return [check_no_compiler_messages powerpc_p9modulo_ok object {
> int main (void) {
> int i = 5, j = 3, r = -1;
> @@ -7116,12 +7105,6 @@ proc check_effective_target_powerpc_float128_sw_ok
> { } {
>  if { ([istarget powerpc*-*-*]
>   && ![istarget powerpc-*-linux*paired*])
>  || [istarget rs6000-*-*] } {
> -   # AltiVec is not supported on AIX before 5.3.
> -   if { [istarget powerpc*-*-aix4*]
> -|| [istarget powerpc*-*-aix5.1*]
> -|| [istarget powerpc*-*-aix5.2*] } {
> -   return 0
> -   }
> # Darwin doesn't have VSX, so no soft support for float128.
> if { [istarget *-*-darwin*] } {
> return 0
> @@ -7146,12 +7129,6 @@ proc check_effective_target_powerpc_float128_hw_ok
> { } {
>  if { ([istarget powerpc*-*-*]
>   && ![istarget powerpc-*-linux*paired*])
>  || [istarget rs6000-*-*] } {
> -   # AltiVec is not supported on AIX before 5.3.
> -   if { [istarget powerpc*-*-aix4*]
> -|| [istarget powerpc*-*-aix5.1*]
> -|| [istarget powerpc*-*-aix5.2*] } {
> -   return 0
> -   }
> # Darwin doesn't run on any machine with float128 h/w so far.
> if { [istarget *-*-darwin*] } {
> return 0
> @@ -7215,12 +7192,6 @@ proc check_effective_target_powerpc_vsx_ok { } {
>  if { ([istarget powerpc*-*-*]
>   && ![istarget powerpc-*-linux*paired*])
>  || [istarget rs6000-*-*] } {
> -   # VSX is not supported on AIX before 7.1.
> -   if { [istarget powerpc*-*-aix4*]
> -|| [istarget powerpc*-*-aix5*]
> -|| [istarget powerpc*-*-aix6*] } {
> -   return 0
> -   }
> # Darwin doesn't have VSX, even if it's used with an assembler
> # which recognises the insns.
> if { [istarget *-*-darwin*] } {
> --
> 2.39.1
>


Re: [PATCH 3/4] ranger: Revert the workaround introduced in PR112788 [PR112993]

2024-05-08 Thread Aldy Hernandez
I'll defer to the PPC maintainers, but LGTM. The less special casing, the
better.

Aldy

On Wed, May 8, 2024, 07:33 Kewen.Lin  wrote:

> Hi,
>
> This reverts commit r14-6478-gfda8e2f8292a90 "range:
> Workaround different type precision between _Float128 and
> long double [PR112788]" as the fixes for PR112993 make
> all 128 bits scalar floating point have the same 128 bit
> precision, this workaround isn't needed any more.
>
> Bootstrapped and regress-tested on:
>   - powerpc64-linux-gnu P8/P9 (with ibm128 by default)
>   - powerpc64le-linux-gnu P9/P10 (with ibm128 by default)
>   - powerpc64le-linux-gnu P9 (with ieee128 by default)
>
> Is it OK for trunk if {1,2}/4 in this series get landed?
>
> BR,
> Kewen
> -
>
> PR target/112993
>
> gcc/ChangeLog:
>
> * value-range.h (range_compatible_p): Remove the workaround on
> different type precision between _Float128 and long double.
> ---
>  gcc/value-range.h | 10 ++
>  1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/value-range.h b/gcc/value-range.h
> index 9531df56988..39de7daf3d9 100644
> --- a/gcc/value-range.h
> +++ b/gcc/value-range.h
> @@ -1558,13 +1558,7 @@ range_compatible_p (tree type1, tree type2)
>// types_compatible_p requires conversion in both directions to be
> useless.
>// GIMPLE only requires a cast one way in order to be compatible.
>// Ranges really only need the sign and precision to be the same.
> -  return TYPE_SIGN (type1) == TYPE_SIGN (type2)
> -&& (TYPE_PRECISION (type1) == TYPE_PRECISION (type2)
> -// FIXME: As PR112788 shows, for now on rs6000 _Float128 has
> -// type precision 128 while long double has type precision 127
> -// but both have the same mode so their precision is actually
> -// the same, workaround it temporarily.
> -|| (SCALAR_FLOAT_TYPE_P (type1)
> -&& TYPE_MODE (type1) == TYPE_MODE (type2)));
> +  return (TYPE_PRECISION (type1) == TYPE_PRECISION (type2)
> + && TYPE_SIGN (type1) == TYPE_SIGN (type2));
>  }
>  #endif // GCC_VALUE_RANGE_H
> --
> 2.39.1
>
>


Re: [PATCH v2 4/4] RISC-V: Cover sign-extensions in lshr3_zero_extend_4

2024-05-08 Thread Christoph Müllner
On Wed, May 8, 2024 at 3:48 PM Jeff Law  wrote:
>
>
>
> On 5/8/24 1:36 AM, Christoph Müllner wrote:
> > The lshr3_zero_extend_4 pattern targets bit extraction
> > with zero-extension. This pattern represents the canonical form
> > of zero-extensions of a logical right shift.
> >
> > The same optimization can be applied to sign-extensions.
> > Given the two optimizations are so similar, this patch converts
> > the existing one to also cover the sign-extension case as well.
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/iterators.md (ashiftrt): New code attribute
> >   'extract_shift' and adding extractions to optab.
> >   * config/riscv/riscv.md (*lshr3_zero_extend_4): Rename to...
> >   (*3):...this and add support for
> >   sign-extensions.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/extend-shift-helpers.h: Add helpers for
> >   sign-extension.
> >   * gcc.target/riscv/sign-extend-rshift-32.c: New test.
> >   * gcc.target/riscv/sign-extend-rshift-64.c: New test.
> >   * gcc.target/riscv/sign-extend-rshift.c: New test.
> Oh, I see, you handled the special case with this patch.  Ignore my
> comment on 3/4.  3/4 is fine, as is this patch.

Oh, yes, I forgot to add this to 3/4.

Thanks!

>
> Thanks!
>
> jeff


Re: [PATCH v2 4/4] RISC-V: Cover sign-extensions in lshr3_zero_extend_4

2024-05-08 Thread Jeff Law




On 5/8/24 1:36 AM, Christoph Müllner wrote:

The lshr3_zero_extend_4 pattern targets bit extraction
with zero-extension. This pattern represents the canonical form
of zero-extensions of a logical right shift.

The same optimization can be applied to sign-extensions.
Given the two optimizations are so similar, this patch converts
the existing one to also cover the sign-extension case as well.

gcc/ChangeLog:

* config/riscv/iterators.md (ashiftrt): New code attribute
'extract_shift' and adding extractions to optab.
* config/riscv/riscv.md (*lshr3_zero_extend_4): Rename to...
(*3):...this and add support for
sign-extensions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/extend-shift-helpers.h: Add helpers for
sign-extension.
* gcc.target/riscv/sign-extend-rshift-32.c: New test.
* gcc.target/riscv/sign-extend-rshift-64.c: New test.
* gcc.target/riscv/sign-extend-rshift.c: New test.
Oh, I see, you handled the special case with this patch.  Ignore my 
comment on 3/4.  3/4 is fine, as is this patch.


Thanks!

jeff


Re: [PATCH v2 3/4] RISC-V: Add zero_extract support for rv64gc

2024-05-08 Thread Jeff Law




On 5/8/24 1:36 AM, Christoph Müllner wrote:

The combiner attempts to optimize a zero-extension of a logical right shift
using zero_extract. We already utilize this optimization for those cases
that result in a single instruction.  Let's add an insn_and_split
pattern that also matches the generic case, where we can emit an
optimized sequence of a slli/srli.

Tested with SPEC CPU 2017 (rv64gc).

PR 111501

gcc/ChangeLog:

* config/riscv/riscv.md (*lshr3_zero_extend_4): New
pattern for zero-extraction.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/extend-shift-helpers.h: New test.
* gcc.target/riscv/pr111501.c: New test.
* gcc.target/riscv/zero-extend-rshift-32.c: New test.
* gcc.target/riscv/zero-extend-rshift-64.c: New test.
* gcc.target/riscv/zero-extend-rshift.c: New test.

Doesn't your new pattern still match this one:


;; Canonical form for a zero-extend of a logical right shift.
(define_insn "*lshrsi3_zero_extend_2"
  [(set (match_operand:DI                   0 "register_operand" "=r")
	(zero_extract:DI (match_operand:DI  1 "register_operand" " r")
			 (match_operand     2 "const_int_operand")
			 (match_operand     3 "const_int_operand")))]
  "(TARGET_64BIT && (INTVAL (operands[3]) > 0)
&& (INTVAL (operands[2]) + INTVAL (operands[3]) == 32))"
{
  return "srliw\t%0,%1,%3";
}
  [(set_attr "type" "shift")
   (set_attr "mode" "SI")]) 


Meaning that we'll start generating shift-pairs for this special case 
rather than using srliw directly.  I'm pretty sure Lyut and I stumbled 
over this exact problem when evaluating his effort in this space.


?

Jeff
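
To make the concern concrete, here is a small C sketch of the two code shapes under discussion. It is an illustration only, assuming the usual reading of zero_extract (a field of LEN bits starting at bit POS, zero-extended to 64 bits); the function names are hypothetical and not part of the patch.

```c
#include <stdint.h>

/* Generic case: extract LEN bits starting at bit POS and zero-extend.
   Corresponds to a slli by (64 - POS - LEN) followed by a srli by
   (64 - LEN).  Requires LEN > 0 and POS + LEN <= 64.  */
uint64_t extract_shift_pair (uint64_t x, unsigned pos, unsigned len)
{
  return (x << (64 - pos - len)) >> (64 - len);
}

/* Special case POS + LEN == 32 with POS > 0: a single 32-bit logical
   right shift (what srliw does) already gives the same value, because
   the shifted 32-bit result has its top bit clear.  */
uint64_t extract_srliw (uint64_t x, unsigned pos)
{
  return (uint64_t) ((uint32_t) x >> pos);
}
```

For example, with POS = 8 and LEN = 24 both functions return the same field, so emitting the shift pair in that case would cost an extra instruction for no benefit.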


Re: [PATCH v2 2/4] RISC-V: Cover sign-extensions in lshrsi3_zero_extend_2

2024-05-08 Thread Jeff Law




On 5/8/24 1:36 AM, Christoph Müllner wrote:

The pattern lshrsi3_zero_extend_2 extracts the MSB bits of the lower
32-bit word and zero-extends it back to DImode.
This is realized using srliw, which operates on 32-bit registers.

The same optimization can be applied to sign-extensions when emitting
a sraiw instead of the srliw.

Given these two optimizations are so similar, this patch simply
converts the existing one to also cover the sign-extension case as well.

gcc/ChangeLog:

* config/riscv/iterators.md (sraiw): New code iterator 'any_extract'.
New code attribute 'extract_sidi_shift'.
* config/riscv/riscv.md (*lshrsi3_zero_extend_2): Rename to...
(*lshrsi3_extend_2):...this and add support for sign-extensions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sign-extend-1.c: Test sraiw 24 and sraiw 16.

OK
jeff



Re: [PATCH v2 1/4] RISC-V: Add test for sraiw-31 special case

2024-05-08 Thread Jeff Law




On 5/8/24 1:36 AM, Christoph Müllner wrote:

We already optimize a sign-extension of a right-shift by 31 in
si3_extend.  Let's add a test for that (similar to
zero-extend-1.c).

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sign-extend-1.c: New test.

OK
jeff



Re: [PATCH] [RFC] Add function filtering to gcov

2024-05-08 Thread Richard Biener
On Fri, Mar 29, 2024 at 8:02 PM Jørgen Kvalsvik  wrote:
>
> This is a prototype for --include/--exclude flags, and I would like a
> review of both the approach and architecture, and the implementation,
> plus feedback on the feature itself. I did not update the manuals or
> carefully extend --help, in case the interface itself needs some
> revision before it can be merged.
>
> ---
>
> Add the --include and --exclude flags to gcov to control what functions
> to report on. This is meant to make gcov more practical when
> writing test suites or performing other coverage experiments, which
> tend to focus on a few functions at a time. This really shines in
> combination with the -t/--stdout flag. With support for more expansive
> metrics in gcov like modified condition/decision coverage (MC/DC) and
> path coverage, output quickly gets overwhelming without filtering.
>
> The approach is quite simple: filters are egrep regexes and are
> evaluated left-to-right, and the last filter "wins", that is, if a
> function matches an --include and a subsequent --exclude, it should not
> be included in the output. The output machinery is already interacting
> with the function table, which makes the json output work as expected,
> and only minor changes are needed to suppress the filtered-out
> functions.
>
> Demo: math.c
>
> int mul (int a, int b) {
> return a * b;
> }
>
> int sub (int a, int b) {
> return a - b;
> }
>
> int sum (int a, int b) {
> return a + b;
> }
>
> Plain matches:
>
> $ gcov -t math --include=sum
> -:0:Source:filter.c
> -:0:Graph:filter.gcno
> -:0:Data:-
> -:0:Runs:0
> #:9:int sum (int a, int b) {
> #:   10:return a + b;
>
> $ gcov -t math --include=mul
> -:0:Source:filter.c
> -:0:Graph:filter.gcno
> -:0:Data:-
> -:0:Runs:0
> #:1:int mul (int a, int b) {
> #:2:return a * b;
>
> Regex match:
>
> $ gcov -t math --include=su
> -:0:Source:filter.c
> -:0:Graph:filter.gcno
> -:0:Data:-
> -:0:Runs:0
> #:5:int sub (int a, int b) {
> #:6:return a - b;
> -:7:}
> #:9:int sum (int a, int b) {
> #:   10:return a + b;
>
> And similar for exclude:
>
> $ gcov -t math --exclude=sum
> -:0:Source:filter.c
> -:0:Graph:filter.gcno
> -:0:Data:-
> -:0:Runs:0
> #:1:int mul (int a, int b) {
> #:2:return a * b;
> -:3:}
> #:5:int sub (int a, int b) {
> #:6:return a - b;
>
> And json, for good measure:
>
> $ gcov -t math --include=sum --json | jq ".files[].lines[]"
> {
>   "line_number": 9,
>   "function_name": "sum",
>   "count": 0,
>   "unexecuted_block": true,
>   "block_ids": [],
>   "branches": [],
>   "calls": []
> }
> {
>   "line_number": 10,
>   "function_name": "sum",
>   "count": 0,
>   "unexecuted_block": true,
>   "block_ids": [
> 2
>   ],
>   "branches": [],
>   "calls": []
> }
>
> Note that the last function gets "clipped" when lines are associated to
> functions, which means the closing brace is dropped from the report. I
> hope this can be fixed, but considering it is not really a part of the
> function body, the gcov report is "complete".
>
> Matching generally works well for mangled names, as the mangled names
> also have the base symbol name in them. A possible extension to the
> filtering commands would be to mix it with demangling, to make it easier
> to filter specific overloads without manually having to
> mangle the interesting symbols. The g++.dg/gcov/gcov-20.C test tests the
> matching of a mangled name.
>
> The dejagnu testing function verify-calls is somewhat minimal, but does
> the job well enough.
>
> Why not just use grep? grep is not really sufficient, as grep is very
> line oriented, and the reports that benefit the most from filtering
> often span multiple lines, unpredictably.

For JSON output I suppose there's a way to "grep" without the line oriented
issue?  I suppose we could make the JSON more hierarchical by adding
an outer function object?

That said, I think this is a useful feature and thus OK for trunk if there are
no other comments in about a week if you also update the gcov documentation.

Thanks,
Richard.
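
For readers skimming the thread, the "last filter wins" rule described in the quoted proposal can be sketched roughly as follows. This is not the code from the patch: struct filter and function_selected are hypothetical names, POSIX regcomp/regexec stand in for whatever matcher gcov uses, and the default for names that match no filter (report everything unless the first filter is an include) is an assumption inferred from the demo output.

```c
#include <regex.h>
#include <stddef.h>

/* One --include or --exclude option, in command-line order.  */
struct filter
{
  const char *pattern;  /* extended (egrep-style) regex */
  int include;          /* 1 for --include, 0 for --exclude */
};

/* Return 1 if NAME should be reported.  Filters are evaluated
   left-to-right and the last matching filter decides.  */
int function_selected (const char *name, const struct filter *filters,
                       size_t n)
{
  /* Assumed default: keep everything unless the first filter is an
     include, in which case only explicitly included names are kept.  */
  int keep = (n == 0) || !filters[0].include;

  for (size_t i = 0; i < n; i++)
    {
      regex_t re;
      if (regcomp (&re, filters[i].pattern, REG_EXTENDED | REG_NOSUB) != 0)
        continue;  /* Skip patterns that fail to compile.  */
      if (regexec (&re, name, 0, NULL, 0) == 0)
        keep = filters[i].include;
      regfree (&re);
    }
  return keep;
}
```

With the filters --include=su --exclude=sum, this sketch would keep sub and drop both sum and mul.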

> ---
>  gcc/gcov.cc| 101 +++--
>  gcc/testsuite/g++.dg/gcov/gcov-19.C|  35 +
>  gcc/testsuite/g++.dg/gcov/gcov-20.C|  38 ++
>  gcc/testsuite/gcc.misc-tests/gcov-24.c |  20 +
>  gcc/testsuite/gcc.misc-tests/gcov-25.c |  23 ++
>  gcc/testsuite/gcc.misc-tests/gcov-26.c |  23 ++
>  gcc/testsuite/gcc.misc-tests/gcov-27.c |  22 ++
>  gcc/testsuite/lib/gcov.exp |  53 -
>  8 files changed, 306 insertions(+), 9 deletions(-)
>  create mode 100644 

Re: [PATCH] tree-ssa-sink: Improve code sinking pass

2024-05-08 Thread Richard Biener
On Wed, Mar 13, 2024 at 2:56 PM Ajit Agarwal  wrote:
>
> Hello Richard:
>
> Currently, code sinking will sink code at the use points with loops having the
> same nesting depth. The following patch improves code sinking by placing the sunk
> code at the beginning of the block, after the labels.
>
> For example :
>
> void bar();
> int j;
> void foo(int a, int b, int c, int d, int e, int f)
> {
>   int l;
>   l = a + b + c + d +e + f;
>   if (a != 5)
> {
>   bar();
>   j = l;
> }
> }
>
> Code Sinking does the following:
>
> void bar();
> int j;
> void foo(int a, int b, int c, int d, int e, int f)
> {
>   int l;
>
>   if (a != 5)
> {
>   l = a + b + c + d +e + f;
>   bar();
>   j = l;
> }
> }
>
> Bootstrapped regtested on powerpc64-linux-gnu.
>
> Thanks & Regards
>
> tree-ssa-sink: Improve code sinking pass
>
> Currently, code sinking will sink code at the use points with loops having the
> same nesting depth. The following patch improves code sinking by placing the sunk
> code at the beginning of the block, after the labels.
>
> 2024-03-13  Ajit Kumar Agarwal  
>
> gcc/ChangeLog:
>
> PR tree-optimization/81953
> * tree-ssa-sink.cc (statement_sink_location): Sink statements at
> the beginning of the basic block after labels.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/81953
> * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 +++
>  gcc/tree-ssa-sink.cc|  7 ++-
>  2 files changed, 17 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> new file mode 100644
> index 000..d3b79ca5803
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
> +void bar();
> +int j;
> +void foo(int a, int b, int c, int d, int e, int f)
> +{
> +  int l;
> +  l = a + b + c + d +e + f;
> +  if (a != 5)
> +{
> +  bar();
> +  j = l;
> +}
> +}
> +/* { dg-final { scan-tree-dump 
> {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
> index 880d6f70a80..1ec5c048fe7 100644
> --- a/gcc/tree-ssa-sink.cc
> +++ b/gcc/tree-ssa-sink.cc
> @@ -208,7 +208,6 @@ select_best_block (basic_block early_bb,
>  loop nest.  */
>temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
>  }
> -
>/* Placing a statement before a setjmp-like function would be invalid
>   (it cannot be reevaluated when execution follows an abnormal edge).
>   If we selected a block with abnormal predecessors, just punt.  */
> @@ -430,6 +429,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
> continue;
>   break;
> }
> +
>use = USE_STMT (one_use);
>
>if (gimple_code (use) != GIMPLE_PHI)

OK if you avoid the stray whitespace changes above.

Richard.

> @@ -439,10 +439,7 @@ statement_sink_location (gimple *stmt, basic_block 
> frombb,
>   if (sinkbb == frombb)
> return false;
>
> - if (sinkbb == gimple_bb (use))
> -   *togsi = gsi_for_stmt (use);
> - else
> -   *togsi = gsi_after_labels (sinkbb);
> + *togsi = gsi_after_labels (sinkbb);
>
>   return true;
> }
> --
> 2.39.3
>


[pushed] wwwdocs: gcc-12: 512-bit instead of 512 bit

2024-05-08 Thread Gerald Pfeifer
Same as for GCC 13, as I just noticed. Pushed.

Gerald

---
 htdocs/gcc-12/changes.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index b4e29d72..8a0347e3 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -1149,7 +1149,7 @@ are not listed here).
 
   GCC now supports AMD CPUs based on the znver4 core
 via -march=znver4.  The switch makes GCC consider
-using 512 bit vectors when auto-vectorizing.
+using 512-bit vectors when auto-vectorizing.
   
 
 
-- 
2.44.0


[pushed] wwwdocs: gcc-13: 512-bit instead of 512 bit

2024-05-08 Thread Gerald Pfeifer
A detail I missed last year. My bad. Fixed thusly and pushed.

Gerald
---
 htdocs/gcc-13/changes.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index 15a309d6..e324b782 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -609,7 +609,7 @@ You may also want to check out our
   
   GCC now supports AMD CPUs based on the znver4 core
 via -march=znver4.  The switch makes GCC consider
-using 512 bit vectors when auto-vectorizing.
+using 512-bit vectors when auto-vectorizing.
   
 
 
-- 
2.44.0


Re: [PATCH, aarch64] v2: Preparatory patch to place target independent and,dependent changed code in one file

2024-05-08 Thread Alex Coplan
Hi Ajit,

Sorry for the long delay in reviewing this.

This is really getting there now.  I've left a few more comments below.

Apart from minor style things, the main remaining issues are mostly
around comments.  It's important to have good clear comments for
functions with the parameters (and return value, if any) clearly
described.  See https://www.gnu.org/prep/standards/standards.html#Comments

Note that this now needs a little rebasing, too.

On 21/04/2024 13:22, Ajit Agarwal wrote:
> Hello Alex/Richard:
> 
> All review comments are addressed and changes are made to transform_for_base
> function as per consensus.
> 
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
> 
> Target independent code is the generic code with pure virtual functions
> to interface between target independent and dependent code.
> 
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
> 
> Bootstrapped on aarch64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> 
> 
> aarch64: Preparatory patch to place target independent and
> dependent changed code in one file
> 
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
> 
> Target independent code is the generic code with pure virtual functions
> to interface between target independent and dependent code.
> 
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
> 
> 2024-04-21  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-ldp-fusion.cc: Place target
>   independent and dependent changed code
> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 484 +++
>  1 file changed, 325 insertions(+), 159 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 365dcf48b22..83a917e1d20 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -138,6 +138,189 @@ struct alt_base
>poly_int64 offset;
>  };
>  
> +// Virtual base class for load/store walkers used in alias analysis.
> +struct alias_walker
> +{
> +  virtual bool conflict_p (int ) const = 0;
> +  virtual insn_info *insn () const = 0;
> +  virtual bool valid () const = 0;
> +  virtual void advance () = 0;
> +};
> +
> +// Forward declaration to be used inside the aarch64_pair_fusion class.
> +bool ldp_operand_mode_ok_p (machine_mode mode);
> +rtx aarch64_destructure_load_pair (rtx regs[2], rtx pattern);
> +rtx aarch64_destructure_store_pair (rtx regs[2], rtx pattern);
> +rtx aarch64_gen_writeback_pair (rtx wb_effect, rtx pair_mem, rtx regs[2],
> + bool load_p);

I don't think we want to change the linkage of these, they should be kept
static.

> +enum class writeback{

Nit: space before '{'

> +  WRITEBACK_PAIR_P,
> +  WRITEBACK
> +};

We're going to want some more descriptive names here.  How about
EXISTING and ALL?  Note that the WRITEBACK_ prefix isn't necessary as
you're using an enum class, so uses of the enumerators need to be
prefixed with writeback:: anyway.  A comment describing the usage of the
enum as well as comments above the enumerators describing their
interpretation would be good.

> +
> +struct pair_fusion {
> +

Nit: excess blank line.

> +  pair_fusion ()
> +  {
> +calculate_dominance_info (CDI_DOMINATORS);
> +df_analyze ();
> +crtl->ssa = new rtl_ssa::function_info (cfun);
> +  };

Can we have one blank line between the virtual functions, please?  I
think that would be more readable now that there are comments above each
of them.

> +  // Return true if GPR is FP or SIMD accesses, passed
> +  // with GPR reg_op rtx, machine mode and load_p.

It's slightly awkward trying to document this without the parameter
names, but I can see that you're omitting them to avoid unused parameter
warnings.  One option would be to introduce names in the comment as you
go.  How about this instead:

// Given:
// - an rtx REG_OP, the non-memory operand in a load/store insn,
// - a machine_mode MEM_MODE, the mode of the MEM in that insn, and
// - a boolean LOAD_P (true iff the insn is a load), then:
// return true if the access should be considered an FP/SIMD access.
// Such accesses are segregated from GPR accesses, since we only want to
// form pairs for accesses that use the same register file.

> +  virtual bool fpsimd_op_p (rtx, machine_mode, bool)
> +  {
> +return false;
> +  }
> +  // Return true if pair operand mode is ok. Passed with
> +  // machine mode.

Could you use something closer to the comment that is already above
ldp_operand_mode_ok_p?  The purpose of this predicate is really to test
the following: "is it a good idea (for optimization) to form paired
accesses with this operand mode at this stage in compilation?"

> + 

Re: [PATCH 3/4] gcc/c-family/c-opts: fix quoting for `-fdeps-format=` error message

2024-05-08 Thread Ben Boeckel
On Tue, May 07, 2024 at 21:15:09 +, Joseph Myers wrote:
> That can't be right.  The GCC %q is a modifier that needs to have an 
> actual format specifier it modifies (so %qs - which produces the same 
> output as %<%s%> - but not %q by itself).

Yes, CI reported the failure.  I had prepared the patches on my laptop,
but when I investigated I found that I had concurrently done additional
work on my desktop (it builds GCC in a…reasonable time comparatively)
that I had not pulled back.  That work did have the `%qs` change, but
I've not gotten around to running the test suite again (or reporting
back here). I have another patch revision in the works.

Thanks,

--Ben


[PATCH] MIPS: Support constraint 'w' for MSA instruction

2024-05-08 Thread YunQiang Su
Support syntax like:
asm volatile ("fmadd.d %w0, %w1, %w2" : "+w"(a): "w"(b), "w"(c));

gcc
* config/mips/constraints.md: Add new constraint 'w'.

gcc/testsuite
* gcc.target/mips/msa-inline-asm.c: New test.
---
 gcc/config/mips/constraints.md | 3 +++
 gcc/testsuite/gcc.target/mips/msa-inline-asm.c | 9 +
 2 files changed, 12 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/mips/msa-inline-asm.c

diff --git a/gcc/config/mips/constraints.md b/gcc/config/mips/constraints.md
index a96028dd746..f5c88179038 100644
--- a/gcc/config/mips/constraints.md
+++ b/gcc/config/mips/constraints.md
@@ -29,6 +29,9 @@ (define_register_constraint "t" "T_REG"
 (define_register_constraint "f" "TARGET_HARD_FLOAT ? FP_REGS : NO_REGS"
   "A floating-point register (if available).")
 
+(define_register_constraint "w" "ISA_HAS_MSA ? FP_REGS : NO_REGS"
+  "A MIPS SIMD register (if available).")
+
 (define_register_constraint "h" "NO_REGS"
   "Formerly the @code{hi} register.  This constraint is no longer supported.")
 
diff --git a/gcc/testsuite/gcc.target/mips/msa-inline-asm.c 
b/gcc/testsuite/gcc.target/mips/msa-inline-asm.c
new file mode 100644
index 000..bdf6816ab3b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/msa-inline-asm.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-mno-mips16 -mfp64 -mhard-float -mmsa" } */
+
+double
+f(double a, double b, double c) {
+  asm volatile ("fmadd.d %w0, %w1, %w2" : "+w"(a): "w"(b), "w"(c));
+  return a;
+}
+/* { dg-final { scan-assembler "fmadd.d \\\$w0, \\\$w\[0-9\]*, \\\$w\[0-9\]*" 
} }  */
-- 
2.39.2



Re: [PATCH][risc-v] libstdc++: Preserve signbit of nan when converting float to double [PR113578]

2024-05-08 Thread Jonathan Wakely
On Wed, 8 May 2024 at 11:33, Andrew Waterman  wrote:
>
> On Tue, May 7, 2024 at 9:46 AM Jonathan Wakely  wrote:
> >
> > On Tue, 7 May 2024 at 17:39, Jonathan Wakely  wrote:
> > >
> > > On Tue, 7 May 2024 at 17:33, Jeff Law wrote:
> > > >
> > > >
> > > >
> > > > On 5/7/24 9:36 AM, Andreas Schwab wrote:
> > > > > On Mai 07 2024, Jonathan Wakely wrote:
> > > > >
> > > > >> +#ifdef __riscv
> > > > >> +return _M_insert(__builtin_copysign((double)__f,
> > > > >> +
> > > > >> (double)-__builtin_signbit(__f));
> > > > >
> > > > > Should this use static_cast?
> > >
> > > Meh. It wouldn't fit in 80 columns any more with static_cast, and it
> > > means exactly the same thing.
> > >
> > > > And it's missing a close paren.
> > >
> > > Now that's more important! Thanks.
> >
> > Also, I've just realised that signbit might return a negative value if
> > the signbit is set. The spec only says it returns non-zero if the
> > signbit is set.
> >
> > So maybe we want:
> >
> > #ifdef __riscv
> > const int __neg = __builtin_signbit(__f) ? -1 : 0;
> > return _M_insert(__builtin_copysign(static_cast<double>(__f),
> >   static_cast<double>(__neg)));
> > #else
> > return _M_insert(static_cast<double>(__f));
> > #endif
>
> We can avoid the signbit call altogether by taking advantage of the
> fact that type-punning the float to an int, then converting that int
> to a double, will produce a double with the sign of the original
> value, with no exceptions raised in the process.  (I don't know
> whether we're allowed to use std::bit_cast in this context, but a
> type-punning memcpy would have the same effect.)

I'll check when Clang added support for __builtin_bit_cast, but I
think we can use that (we can't use std::bit_cast because this needs
to compile as C++98).


>
>   int __i = std::bit_cast<int>(__f);
>   return _M_insert(__builtin_copysign(static_cast<double>(__f),
> static_cast<double>(__i)));
>
> Empirically, this saves 3 instructions on RV64 or 1 instruction on
> RV32 (as measured on GCC 13.2.0).  Note, I'm not trying to drag-race
> on performance here.  Rather, I'm trying to minimize the extent to
> which this RISC-V idiosyncrasy results in static code-size bloat.

Yup, this is nice, thanks.

>
> BTW, I agree with Palmer that adding a __builtin with these semantics
> seems advisable if this pattern turns out to recur frequently.
>



Re: [PATCH][risc-v] libstdc++: Preserve signbit of nan when converting float to double [PR113578]

2024-05-08 Thread Andrew Waterman
On Tue, May 7, 2024 at 9:46 AM Jonathan Wakely  wrote:
>
> On Tue, 7 May 2024 at 17:39, Jonathan Wakely  wrote:
> >
> > On Tue, 7 May 2024 at 17:33, Jeff Law wrote:
> > >
> > >
> > >
> > > On 5/7/24 9:36 AM, Andreas Schwab wrote:
> > > > On Mai 07 2024, Jonathan Wakely wrote:
> > > >
> > > >> +#ifdef __riscv
> > > >> +return _M_insert(__builtin_copysign((double)__f,
> > > >> +
> > > >> (double)-__builtin_signbit(__f));
> > > >
> > > > Should this use static_cast?
> >
> > Meh. It wouldn't fit in 80 columns any more with static_cast, and it
> > means exactly the same thing.
> >
> > > And it's missing a close paren.
> >
> > Now that's more important! Thanks.
>
> Also, I've just realised that signbit might return a negative value if
> the signbit is set. The spec only says it returns non-zero if the
> signbit is set.
>
> So maybe we want:
>
> #ifdef __riscv
> const int __neg = __builtin_signbit(__f) ? -1 : 0;
> return _M_insert(__builtin_copysign(static_cast<double>(__f),
>   static_cast<double>(__neg)));
> #else
> return _M_insert(static_cast<double>(__f));
> #endif

We can avoid the signbit call altogether by taking advantage of the
fact that type-punning the float to an int, then converting that int
to a double, will produce a double with the sign of the original
value, with no exceptions raised in the process.  (I don't know
whether we're allowed to use std::bit_cast in this context, but a
type-punning memcpy would have the same effect.)

  int __i = std::bit_cast<int>(__f);
  return _M_insert(__builtin_copysign(static_cast<double>(__f),
static_cast<double>(__i)));

Empirically, this saves 3 instructions on RV64 or 1 instruction on
RV32 (as measured on GCC 13.2.0).  Note, I'm not trying to drag-race
on performance here.  Rather, I'm trying to minimize the extent to
which this RISC-V idiosyncrasy results in static code-size bloat.

BTW, I agree with Palmer that adding a __builtin with these semantics
seems advisable if this pattern turns out to recur frequently.
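
The type-punning idea above can be sketched in a C++98-compatible way
with memcpy.  This is only an illustration under stated assumptions:
widen_preserving_sign is a hypothetical name (the real code calls
_M_insert inside libstdc++), and int and float are assumed to have the
same size.

```cpp
#include <cstring>

// Hypothetical helper, not the libstdc++ code: widen a float to double
// while forcing the result's sign bit to match the input's, NaNs included.
// Assumes sizeof(int) == sizeof(float), as on RISC-V.
double widen_preserving_sign(float f)
{
    int bits;
    std::memcpy(&bits, &f, sizeof bits);   // type-pun without aliasing UB
    // The int is negative exactly when the float's sign bit is set, so
    // converting it to double yields a value that carries the wanted sign.
    return __builtin_copysign(static_cast<double>(f),
                              static_cast<double>(bits));
}
```

The int-to-double conversion is exact and never sees a NaN, which is
why no floating-point exception should be raised on that path.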


[patch,avr] PR114981: Implement __builtin_powif in assembly

2024-05-08 Thread Georg-Johann Lay

__builtin_powif is currently implemented in C,
and this patch implements it (__powisf2) in assembly.
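
For readers who prefer to see the algorithm before the assembly, the
square-and-multiply loop described by the comments in the patch can be
modelled roughly as follows.  This is a sketch, not the libgcc source:
powi_ref is a hypothetical name, and the model ignores the register
allocation and the Y_set flag trick that the assembly uses to skip a
multiplication by 1.0f.

```cpp
// Hypothetical reference model of __powisf2: compute x raised to the
// integer power i by repeated squaring.
float powi_ref(float x, int i)
{
    // n := abs (i), computed in unsigned arithmetic so INT_MIN is safe.
    unsigned n = i < 0 ? 0u - static_cast<unsigned>(i)
                       : static_cast<unsigned>(i);
    // y := (n % 2) ? x : 1.0f
    float y = (n & 1) ? x : 1.0f;
    while (n >>= 1)
    {
        x *= x;            // x holds the iterated squares
        if (n & 1)
            y *= x;        // fold in the square for each set bit
    }
    // A negative exponent means the reciprocal of the positive power.
    return i < 0 ? 1.0f / y : y;
}
```

With x = -2.0f and exponents -4 .. 4 this model reproduces the table
used by the new test case.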

Ok for master?

Johann

--

AVR: target/114981 - Tweak __powisf2

Implement __powisf2 in assembly.

PR target/114981
libgcc/
* config/avr/t-avr (LIB2FUNCS_EXCLUDE): Add _powisf2.
(LIB1ASMFUNCS) [!avrtiny]: Add _powif.
* config/avr/lib1funcs.S (mov4): New .macro.
(L_powif, __powisf2) [!avrtiny]: New module and function.

testsuite/
* gcc.target/avr/pr114981-powif.c: New test.

diff --git a/gcc/testsuite/gcc.target/avr/pr114981-powif.c b/gcc/testsuite/gcc.target/avr/pr114981-powif.c
new file mode 100644
index 000..191dcc61e6d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/pr114981-powif.c
@@ -0,0 +1,33 @@
+/* { dg-do run { target { ! avr_tiny } } } */
+/* { dg-additional-options "-Os" } */
+
+const float vals[] =
+  {
+0.0625f, -0.125f, 0.25f, -0.5f,
+1.0f,
+-2.0f, 4.0f, -8.0f, 16.0f
+  };
+
+#define ARRAY_SIZE(X) ((int) (sizeof(X) / sizeof(*X)))
+
+__attribute__((noinline,noclone))
+void test1 (float x)
+{
+  int i;
+
+  for (i = 0; i < ARRAY_SIZE (vals); ++i)
+{
+  float val0 = vals[i];
+  float val1 = __builtin_powif (x, i - 4);
+  __asm ("" : "+r" (val0));
+
+  if (val0 != val1)
+	__builtin_exit (__LINE__);
+}
+}
+
+int main (void)
+{
+  test1 (-2.0f);
+  return 0;
+}
diff --git a/libgcc/config/avr/lib1funcs.S b/libgcc/config/avr/lib1funcs.S
index 4ac31fa104e..04a4eb01ab4 100644
--- a/libgcc/config/avr/lib1funcs.S
+++ b/libgcc/config/avr/lib1funcs.S
@@ -80,6 +80,11 @@
 #endif
 .endm
 
+.macro	mov4  r_dest, r_src
+wmov \r_dest,   \r_src
+wmov \r_dest+2, \r_src+2
+.endm
+
 #if defined (__AVR_HAVE_JMP_CALL__)
 #define XCALL call
 #define XJMP  jmp
@@ -3312,4 +3317,153 @@ DEFUN __fmul
 #undef C0
 #undef C1
 
+
+
+/**
+ * Floating-Point
+ **/
+
+#if defined (L_powif)
+#ifndef __AVR_TINY__
+
+;; float output and arg #1
+#define A0  22
+#define A1  A0 + 1
+#define A2  A0 + 2
+#define A3  A0 + 3
+
+;; float arg #2
+#define B0  18
+#define B1  B0 + 1
+#define B2  B0 + 2
+#define B3  B0 + 3
+
+;; float X: input and iterated squares
+#define X0  10
+#define X1  X0 + 1
+#define X2  X0 + 2
+#define X3  X0 + 3
+
+;; float Y: expand result
+#define Y0  14
+#define Y1  Y0 + 1
+#define Y2  Y0 + 2
+#define Y3  Y0 + 3
+
+;; .7 = Sign of I.
+;; .0 == 0  =>  Y = 1.0f implicitly.
+#define Flags   R9
+#define Y_set   0
+
+;;;  Integer exponent input.
+#define I0  28
+#define I1  I0+1
+
+#define ONE 0x3f800000
+
+DEFUN __powisf2
+;; Save 11 Registers: R9...R17, R28, R29
+do_prologue_saves 11
+
+;; Fill local vars with input parameters.
+wmov    I0, 20
+mov4    X0, A0
+;; Save sign of exponent for later.
+mov Flags,  I1
+;; I := abs (I)
+tst I1
+brpl 1f
+NEG2    I0
+1:
+;; Y := (I % 2) ? X : 1.0f
+;; (When we come from below, this is like SET, i.e. Flags.Y_set := 1).
+bst I0, 0
+;; Flags.Y_set = false means that we have to assume Y = 1.0f below.
+bld Flags,  Y_set
+2:  ;; We have A == X when we come from above.
+mov4    Y0, A0
+
+.Loop:
+;; while (I >>= 1)
+lsr I1
+ror I0
+sbiw    I0, 0
+breq .Loop_done
+
+;; X := X * X
+mov4    A0, X0
+#ifdef __WITH_AVRLIBC__
+XCALL   squaref
+#else
+mov4    B0, X0
+XCALL   __mulsf3
+#endif /* Have AVR-LibC? */
+mov4    X0, A0
+
+;; if (I % 2 == 1)  Y := Y * X
+bst I0, 0
+brtc .Loop
+bst Flags, Y_set
+;; When Y is not set  =>  Y := Y * X = 1.0f * X (= A)
+;; Plus, we have to set Y_set = 1 (= I0.0)
+brtc 1b
+;; Y is already set: Y := X * Y (= A * Y)
+mov4    B0, Y0
+XCALL   __mulsf3
+rjmp 2b
+
+;; End while
+.Loop_done:
+
+;; A := 1.0f
+ldi A3, hhi8(ONE)
+ldi A2, hlo8(ONE)
+ldi A1, hi8(ONE)
+ldi A0, lo8(ONE)
+
+;; When Y is still not set, the result is 1.0f (= A).
+bst Flags, Y_set
+brtc .Lret
+
+;; if (I was < 0) Y = 1.0f / Y
+tst Flags
+brmi 1f
+;; A := Y
+mov4    A0, Y0
+rjmp .Lret
+1:  ;; A := 1 / Y = A / Y
+mov4    B0, Y0
+XCALL   __divsf3
+
+.Lret:
+do_epilogue_restores 11
+ENDF __powisf2
+
+#undef A0
+#undef A1
+#undef A2
+#undef A3
+
+#undef B0
+#undef B1
+#undef B2
+#undef B3
+
+#undef X0
+#undef X1
+#undef X2
+#undef X3
+
+#undef Y0
+#undef Y1
+#undef Y2
+#undef Y3
+
+#undef I0
+#undef I1
+#undef ONE
+
+#endif /* __AVR_TINY__ */
+#endif /* L_powif */
+
 #include "lib1funcs-fixed.S"
diff --git a/libgcc/config/avr/t-avr b/libgcc/config/avr/t-avr
index ed84b3f342e..971a092aceb 100644
--- a/libgcc/config/avr/t-avr
+++ b/libgcc/config/avr/t-avr
@@ -68,7 +68,8 @@ LIB1ASMFUNCS += \
 	_bswapdi2 \
 	_ashldi3 _ashrdi3 _lshrdi3 _rotldi3 \
 	_adddi3 _adddi3_s8 _subdi3 \
-	

[PATCH] Fix SLP reduction initial value for pointer reductions

2024-05-08 Thread Richard Biener
For pointer reductions we need to convert the initial value to
the vector component integer type.

Re-bootstrap and regtest running on x86_64-unknown-linux-gnu.

I've run into this latent bug on the force-slp branch.

Richard.

* tree-vect-loop.cc (get_initial_defs_for_reduction): Convert
initial value to the vector component type.
---
 gcc/tree-vect-loop.cc | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 29c03c246d4..704df7bdcc7 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5618,7 +5618,14 @@ get_initial_defs_for_reduction (loop_vec_info loop_vinfo,
   if (i >= initial_values.length () || (j > i && neutral_op))
op = neutral_op;
   else
-   op = initial_values[i];
+   {
+ if (!useless_type_conversion_p (TREE_TYPE (vector_type),
+ TREE_TYPE (initial_values[i])))
+   initial_values[i] = gimple_convert (_seq,
+   TREE_TYPE (vector_type),
+   initial_values[i]);
+ op = initial_values[i];
+   }
 
   /* Create 'vect_ = {op0,op1,...,opn}'.  */
   number_of_places_left_in_vector--;
-- 
2.35.3


[PATCH] Fix non-grouped SLP load/store accounting in alignment peeling

2024-05-08 Thread Richard Biener
When we have a non-grouped access we bogusly multiply by zero.
This shows most with single-lane SLP but also happens with
the multi-lane splat case.
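
As a rough model of the fixed accounting (a sketch only: the function
and parameter names are hypothetical stand-ins for the STMT_SLP_TYPE,
STMT_VINFO_GROUPED_ACCESS and DR_GROUP_SIZE queries, and the real
vectorization factor is a poly_uint64 rather than a plain unsigned):

```cpp
// Hypothetical model of the corrected nscalars computation.
unsigned scalars_per_iteration(unsigned vf, bool is_slp, bool is_grouped,
                               unsigned dr_group_size)
{
    unsigned group_size = 1;          // non-grouped accesses count as one lane
    if (is_slp && is_grouped)
        group_size = dr_group_size;   // only meaningful for grouped accesses
    return vf * group_size;
}
```

A non-grouped SLP access whose group size reads as zero now still
contributes vf scalars instead of none.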

Re-bootstrap & regtest running on x86_64-unknown-linux-gnu.

I've run into this latent bug on the force-slp branch.

Richard.

* tree-vect-data-refs.cc (vect_enhance_data_refs_alignment):
Properly guard DR_GROUP_SIZE access with STMT_VINFO_GROUPED_ACCESS.
---
 gcc/tree-vect-data-refs.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index c531079d3bb..ae237407672 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -2290,8 +2290,11 @@ vect_enhance_data_refs_alignment (loop_vec_info 
loop_vinfo)
   if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
{
  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
- nscalars = (STMT_SLP_TYPE (stmt_info)
- ? vf * DR_GROUP_SIZE (stmt_info) : vf);
+ unsigned group_size = 1;
+ if (STMT_SLP_TYPE (stmt_info)
+ && STMT_VINFO_GROUPED_ACCESS (stmt_info))
+   group_size = DR_GROUP_SIZE (stmt_info);
+ nscalars = vf * group_size;
}
 
  /* Save info about DR in the hash table.  Also include peeling
-- 
2.35.3


Re: [PATCH] ppc: testsuite: pr79004 needs -mlong-double-128

2024-05-08 Thread Kewen.Lin
on 2024/4/30 07:11, Alexandre Oliva wrote:
> On Apr 29, 2024, "Kewen.Lin"  wrote:
> 
>> Thanks for catching this and sorry
>> that I didn't check it before suggesting it, I think we can aggressively
>> drop this effective target instead to avoid any possible confusion.
> 
> The 128-bit ones, unfortunately, follow the same pattern but are
> probably used.  IMHO we should transition all 3 to an '_ok' suffix, but...
> 

Yeah, I noticed the 128-bit ones are used; I was just suggesting dropping
check_effective_target_long_double_64bit and add_options_for_long_double_64bit
as they have no users (since release 12, when they were introduced in
r12-3151).  IMHO there would not be any uses in future, ...

>> How about the generic one "longdouble64"?  I did a grep and found it has one
>> use, I'd expect it can work here. :)
> 
> ... since this and longdouble128 exist, maybe we can fix it and leave
> them all alone, despite the interface oddity.
> 
... personally I'm inclined to drop this 64 bit one. :)

BR,
Kewen



Re: [PATCH] rs6000: Adjust -fpatchable-function-entry* support for dual entry [PR112980]

2024-05-08 Thread Kewen.Lin
Hi Richi,

>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index c584664e168..58e48f7dc55 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -18363,11 +18363,11 @@ If @code{N=0}, no pad location is recorded.
>>  The NOP instructions are inserted at---and maybe before, depending on
>>  @var{M}---the function entry address, even before the prologue.  On
>>  PowerPC with the ELFv2 ABI, for a function with dual entry points,
>> -the local entry point is this function entry address.
>> +@var{M} NOP instructions are inserted before the global entry point and
>> +@var{N} - @var{M} NOP instructions are inserted after the local entry
>> +point, which means the NOP instructions may not be consecutive.
> 
> Isn't it @var{M-1} NOP instructions before the global entry?  I suppose

No, the existing documentation is a bit confusing, sigh ...

> the existing
> 
> "... with the function entry point before the @var{M}th NOP.
> If @var{M} is omitted, it defaults to @code{0} so the
> function entry points to the address just at the first NOP."
> 
> wording is self-contradicting in a way since before the 0th NOP (default)
> to me is the same as before the 1st NOP (M == 1).  So maybe that should
> be _after_ the @var{M}th NOP instead which would be consistent with your
> ELFv2 docs?  Maybe the sentence should be re-worded similar to your
> ELFv2 one, specifying the number of NOPs before and after the entry point.
> 

... the current "with the function entry point before the Mth NOP."
has the 0th NOP assumption, so the default (0th) NOP and 1st NOP (M == 1)
are actually different, such as:

-fpatchable-function-entry=3,0

foo:
nop
nop
nop

-fpatchable-function-entry=3,1

nop
foo:
nop
nop
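
Put as arithmetic, the two listings above place M NOPs before the entry
label and N - M after it.  A minimal sketch of that split (split_nops
and nop_layout are hypothetical names for illustration, not anything in
GCC, and M <= N is assumed):

```cpp
// Hypothetical helper: split the N NOPs of -fpatchable-function-entry=N,M
// into those placed before and after the function entry point.
struct nop_layout
{
    unsigned before;  // NOPs emitted before the entry label
    unsigned after;   // NOPs emitted after the entry label
};

nop_layout split_nops(unsigned n, unsigned m)
{
    nop_layout l;
    l.before = m;      // assumes m <= n
    l.after = n - m;
    return l;
}
```

So split_nops (3, 0) describes the first listing and split_nops (3, 1)
the second.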

Alan also had a similar concern about this wording before:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99888#c8

" Alan Modra 2022-08-12 03:00:29 UTC
"
"(In reply to Segher Boessenkool from comment #7)
"> '-fpatchable-function-entry=N[,M]'
">  Generate N NOPs right at the beginning of each function, with the
">  function entry point before the Mth NOP.
"
" Bad doco.  Should be "after the Mth NOP" I think.
" Or better written to avoid the concept of a 0th nop.
" Default for M is zero, placing all nops after the function entry and
" before normal function prologue code.

BR,
Kewen

>> -The maximum value of @var{N} and @var{M} is 65535.  On PowerPC with the
>> -ELFv2 ABI, for a function with dual entry points, the supported values
>> -for @var{M} are 0, 2, 6 and 14.
>> +The maximum value of @var{N} and @var{M} is 65535.
>>  @end table
>>





Re: [PATCH] tree-ssa-loop-prefetch.cc: Honour -fno-unroll-loops

2024-05-08 Thread Richard Biener
On Wed, May 8, 2024 at 9:56 AM Stefan Schulze Frielinghaus
 wrote:
>
> On s390 the following tests fail
>
> FAIL: gcc.dg/vect/pr109011-1.c -flto -ffat-lto-objects  scan-tree-dump-times 
> optimized " = .CLZ (vect" 1
> FAIL: gcc.dg/vect/pr109011-1.c -flto -ffat-lto-objects  scan-tree-dump-times 
> optimized " = .POPCOUNT (vect" 1
> FAIL: gcc.dg/vect/pr109011-1.c scan-tree-dump-times optimized " = .CLZ 
> (vect" 1
> FAIL: gcc.dg/vect/pr109011-1.c scan-tree-dump-times optimized " = .POPCOUNT 
> (vect" 1
> FAIL: gcc.dg/vect/pr109011-2.c -flto -ffat-lto-objects  scan-tree-dump-times 
> optimized " = .CTZ (vect" 2
> FAIL: gcc.dg/vect/pr109011-2.c -flto -ffat-lto-objects  scan-tree-dump-times 
> optimized " = .POPCOUNT (vect" 1
> FAIL: gcc.dg/vect/pr109011-2.c scan-tree-dump-times optimized " = .CTZ 
> (vect" 2
> FAIL: gcc.dg/vect/pr109011-2.c scan-tree-dump-times optimized " = .POPCOUNT 
> (vect" 1
> FAIL: gcc.dg/vect/pr109011-4.c -flto -ffat-lto-objects  scan-tree-dump-times 
> optimized " = .CTZ (vect" 2
> FAIL: gcc.dg/vect/pr109011-4.c -flto -ffat-lto-objects  scan-tree-dump-times 
> optimized " = .POPCOUNT (vect" 1
> FAIL: gcc.dg/vect/pr109011-4.c scan-tree-dump-times optimized " = .CTZ 
> (vect" 2
> FAIL: gcc.dg/vect/pr109011-4.c scan-tree-dump-times optimized " = .POPCOUNT 
> (vect" 1
>
> because aprefetch unrolls loops even if -fno-unroll-loops is used.
> Accordingly, the scan patterns match more than one time.
>
> Could also be fixed by using -fno-prefetch-loop-arrays for the tests.
> Though, I tend to prefer if aprefetch honours -fno-unroll-loops.  Any
> preferences?
>
> Bootstrapped and regtested on x86_64 and s390.  Ok for mainline?

OK.

Richard.

> gcc/ChangeLog:
>
> * tree-ssa-loop-prefetch.cc (determine_unroll_factor): Honour
> -fno-unroll-loops.
> ---
>  gcc/tree-ssa-loop-prefetch.cc | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/tree-ssa-loop-prefetch.cc b/gcc/tree-ssa-loop-prefetch.cc
> index 70073cc4fe4..bb5d5dec779 100644
> --- a/gcc/tree-ssa-loop-prefetch.cc
> +++ b/gcc/tree-ssa-loop-prefetch.cc
> @@ -1401,6 +1401,10 @@ determine_unroll_factor (class loop *loop, struct 
> mem_ref_group *refs,
>struct mem_ref_group *agp;
>struct mem_ref *ref;
>
> +  /* Bail out early in case we must not unroll loops.  */
> +  if (!flag_unroll_loops)
> +return 1;
> +
>/* First check whether the loop is not too large to unroll.  We ignore
>   PARAM_MAX_UNROLL_TIMES, because for small loops, it prevented us
>   from unrolling them enough to make exactly one cache line covered by 
> each
> --
> 2.44.0
>


[PATCH] Fix and speedup IDF pruning by dominator

2024-05-08 Thread Richard Biener
When insert_updated_phi_nodes_for tries to skip pruning the IDF to
blocks dominated by the nearest common dominator of the set of
definition blocks it compares against ENTRY_BLOCK but that's never
going to be the common dominator.  In fact if it ever were the code
fails to copy IDF to PRUNED_IDF, leading to wrong code.

The following fixes that by avoiding the copy and pruning from the
IDF in-place as well as using the more appropriate check against
the single successor of the ENTRY_BLOCK.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

I've tried to split the patch but that runs into the pre-existing
issue, apparently I had never tested the two patches separately
so now here's the squashed variant.  Pushed.
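
The in-place pruning pattern can be illustrated outside of GCC with a
small sketch.  Assumptions: std::set stands in for GCC's bitmap, keep is
a hypothetical predicate standing in for the dominance test, and ~0U is
assumed never to be a member.  As in the patch, each removal is deferred
until the iterator has moved past the element.

```cpp
#include <set>

// Illustrative only: remove every element for which keep() is false,
// deferring each removal until the iterator has advanced.
template <typename Pred>
void prune_in_place(std::set<unsigned>& idf, Pred keep)
{
    const unsigned none = ~0U;
    unsigned to_remove = none;
    for (std::set<unsigned>::iterator it = idf.begin();
         it != idf.end(); ++it)
    {
        if (to_remove != none)
        {
            idf.erase(to_remove);   // safe: 'it' refers to another element
            to_remove = none;
        }
        if (!keep(*it))
            to_remove = *it;
    }
    if (to_remove != none)
        idf.erase(to_remove);       // flush the last pending removal
}

// Example predicate for trying the sketch out.
inline bool is_even(unsigned v) { return v % 2 == 0; }
```

Pruning the set {1, 2, 3, 4, 5} with is_even leaves {2, 4} without
needing a second container.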

* tree-into-ssa.cc (insert_updated_phi_nodes_for): Skip
pruning when the nearest common dominator is the successor
of ENTRY_BLOCK.  Do not copy IDF but prune it directly.
---
 gcc/tree-into-ssa.cc | 47 +++-
 1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/gcc/tree-into-ssa.cc b/gcc/tree-into-ssa.cc
index 705e4119ba3..3732c269ca3 100644
--- a/gcc/tree-into-ssa.cc
+++ b/gcc/tree-into-ssa.cc
@@ -3233,7 +3233,7 @@ insert_updated_phi_nodes_for (tree var, bitmap_head *dfs,
 {
   basic_block entry;
   def_blocks *db;
-  bitmap idf, pruned_idf;
+  bitmap pruned_idf;
   bitmap_iterator bi;
   unsigned i;
 
@@ -3250,8 +3250,7 @@ insert_updated_phi_nodes_for (tree var, bitmap_head *dfs,
 return;
 
   /* Compute the initial iterated dominance frontier.  */
-  idf = compute_idf (db->def_blocks, dfs);
-  pruned_idf = BITMAP_ALLOC (NULL);
+  pruned_idf = compute_idf (db->def_blocks, dfs);
 
   if (TREE_CODE (var) == SSA_NAME)
 {
@@ -3262,27 +3261,32 @@ insert_updated_phi_nodes_for (tree var, bitmap_head 
*dfs,
 common dominator of all the definition blocks.  */
  entry = nearest_common_dominator_for_set (CDI_DOMINATORS,
db->def_blocks);
- if (entry != ENTRY_BLOCK_PTR_FOR_FN (cfun))
-   EXECUTE_IF_SET_IN_BITMAP (idf, 0, i, bi)
- if (BASIC_BLOCK_FOR_FN (cfun, i) != entry
- && dominated_by_p (CDI_DOMINATORS,
-BASIC_BLOCK_FOR_FN (cfun, i), entry))
-   bitmap_set_bit (pruned_idf, i);
+ if (entry != single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)))
+   {
+ unsigned to_remove = ~0U;
+ EXECUTE_IF_SET_IN_BITMAP (pruned_idf, 0, i, bi)
+   {
+ if (to_remove != ~0U)
+   {
+ bitmap_clear_bit (pruned_idf, to_remove);
+ to_remove = ~0U;
+   }
+ if (BASIC_BLOCK_FOR_FN (cfun, i) == entry
+ || !dominated_by_p (CDI_DOMINATORS,
+ BASIC_BLOCK_FOR_FN (cfun, i), entry))
+   to_remove = i;
+   }
+ if (to_remove != ~0U)
+   bitmap_clear_bit (pruned_idf, to_remove);
+   }
}
   else
-   {
- /* Otherwise, do not prune the IDF for VAR.  */
- gcc_checking_assert (update_flags == TODO_update_ssa_full_phi);
- bitmap_copy (pruned_idf, idf);
-   }
-}
-  else
-{
-  /* Otherwise, VAR is a symbol that needs to be put into SSA form
-for the first time, so we need to compute the full IDF for
-it.  */
-  bitmap_copy (pruned_idf, idf);
+   /* Otherwise, do not prune the IDF for VAR.  */
+   gcc_checking_assert (update_flags == TODO_update_ssa_full_phi);
 }
+  /* Otherwise, VAR is a symbol that needs to be put into SSA form
+ for the first time, so we need to compute the full IDF for
+ it.  */
 
   if (!bitmap_empty_p (pruned_idf))
 {
@@ -3309,7 +3313,6 @@ insert_updated_phi_nodes_for (tree var, bitmap_head *dfs,
 }
 
   BITMAP_FREE (pruned_idf);
-  BITMAP_FREE (idf);
 }
 
 /* Sort symbols_to_rename after their DECL_UID.  */
-- 
2.35.3


Re: [PATCH] reassoc: Fix up optimize_range_tests_to_bit_test [PR114965]

2024-05-08 Thread Richard Biener
On Wed, 8 May 2024, Jakub Jelinek wrote:

> Hi!
> 
> The optimize_range_tests_to_bit_test optimization normally emits a range
> test first:
>   if (entry_test_needed)
> {
>   tem = build_range_check (loc, optype, unshare_expr (exp),
>false, lowi, high);
>   if (tem == NULL_TREE || is_gimple_val (tem))
> continue;
> }
> so during the bit test we already know that exp is in the [lowi, high]
> range, but skips it if we have range info which tells us this isn't
> necessary.
> Also, normally it emits shifts by exp - lowi counter, but has an
> optimization to use just exp counter if the mask isn't a more expensive
> constant in that case and lowi is > 0 and high is smaller than prec.
> 
> The following testcase is miscompiled because the two abnormal cases
> are triggered.  The range of exp is [43, 43][48, 48][95, 95], so we on
> 64-bit arch decide we don't need the entry test, because 95 - 43 < 64.
> And we also decide to use just exp as counter, because the range test
> tests just for exp == 43 || exp == 48, so high is smaller than 64 too.
> Because 95 is in the exp range, we can't do that, we'd either need to
> do a range test first, i.e.
> if (exp - 43U <= 48U - 43U) if ((1UL << exp) & mask1))
> or need to subtract lowi from the shift counter, i.e.
> if ((1UL << (exp - 43)) & mask2)
> but can't do both unless r.upper_bound () is < prec.
> 
> The following patch ensures that.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.
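
For illustration, the two safe forms named in the quoted analysis can be
written out for the concrete test exp == 43 || exp == 48 on a 64-bit
mask.  This is a sketch with hypothetical function names, not the code
reassoc emits; the second form relies on exp being known to lie in
[43, 95], so that exp - 43 stays below 64.

```cpp
// Reference semantics of the range test from the analysis.
bool reference(unsigned exp) { return exp == 43 || exp == 48; }

// Form 1: entry range test first, then shift by exp itself.  The range
// test guarantees the shift count is below 64.
bool with_entry_test(unsigned exp)
{
    const unsigned long long mask1 = (1ULL << 43) | (1ULL << 48);
    return exp - 43U <= 48U - 43U && ((1ULL << exp) & mask1) != 0;
}

// Form 2: no entry test, but subtract the low bound from the shift
// count.  Only valid while exp - 43 < 64.
bool with_subtraction(unsigned exp)
{
    const unsigned long long mask2 = (1ULL << 0) | (1ULL << 5);
    return ((1ULL << (exp - 43U)) & mask2) != 0;
}
```

Dropping both the entry test and the subtraction would shift by 95 when
exp == 95, which is exactly the miscompiled combination.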

> 2024-05-08  Jakub Jelinek  
> 
>   PR tree-optimization/114965
>   * tree-ssa-reassoc.cc (optimize_range_tests_to_bit_test): Don't try to
>   optimize away exp - lowi subtraction from shift count unless entry
>   test is emitted or unless r.upper_bound () is smaller than prec.
> 
>   * gcc.c-torture/execute/pr114965.c: New test.
> 
> --- gcc/tree-ssa-reassoc.cc.jj2024-01-12 10:07:58.384848977 +0100
> +++ gcc/tree-ssa-reassoc.cc   2024-05-07 18:18:45.558814991 +0200
> @@ -3418,7 +3418,8 @@ optimize_range_tests_to_bit_test (enum t
>We can avoid then subtraction of the minimum value, but the
>mask constant could be perhaps more expensive.  */
> if (compare_tree_int (lowi, 0) > 0
> -   && compare_tree_int (high, prec) < 0)
> +   && compare_tree_int (high, prec) < 0
> +   && (entry_test_needed || wi::ltu_p (r.upper_bound (), prec)))
>   {
> int cost_diff;
> HOST_WIDE_INT m = tree_to_uhwi (lowi);
> --- gcc/testsuite/gcc.c-torture/execute/pr114965.c.jj 2024-05-07 
> 18:17:16.767031821 +0200
> +++ gcc/testsuite/gcc.c-torture/execute/pr114965.c2024-05-07 
> 18:15:52.332188943 +0200
> @@ -0,0 +1,30 @@
> +/* PR tree-optimization/114965 */
> +
> +static void
> +foo (const char *x)
> +{
> +
> +  char a = '0';
> +  while (1)
> +{
> +  switch (*x)
> + {
> + case '_':
> + case '+':
> +   a = *x;
> +   x++;
> +   continue;
> + default:
> +   break;
> + }
> +  break;
> +}
> +  if (a == '0' || a == '+')
> +__builtin_abort ();
> +}
> +
> +int
> +main ()
> +{
> +  foo ("_");
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] s390: Implement TARGET_NOCE_CONVERSION_PROFITABLE_P [PR109549]

2024-05-08 Thread Stefan Schulze Frielinghaus
Consider a NOCE conversion as profitable if there is at least one
conditional move.

gcc/ChangeLog:

* config/s390/s390.cc (TARGET_NOCE_CONVERSION_PROFITABLE_P):
Define.
(s390_noce_conversion_profitable_p): Implement.

gcc/testsuite/ChangeLog:

* gcc.target/s390/ccor.c: Order of loads is now reversed; as a
consequence the condition has to be reversed.
---
 Bootstrapped and regtested on s390.  Ok for mainline?

 gcc/config/s390/s390.cc  | 32 
 gcc/testsuite/gcc.target/s390/ccor.c |  4 ++--
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index bf46eab2d63..23b18b5c506 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -78,6 +78,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "context.h"
 #include "builtins.h"
+#include "ifcvt.h"
 #include "rtl-iter.h"
 #include "intl.h"
 #include "tm-constrs.h"
@@ -18037,6 +18038,37 @@ s390_vectorize_vec_perm_const (machine_mode vmode, 
machine_mode op_mode,
   return vectorize_vec_perm_const_1 (d);
 }
 
+/* Consider a NOCE conversion as profitable if there is at least one
+   conditional move.  */
+
+#undef TARGET_NOCE_CONVERSION_PROFITABLE_P
+#define TARGET_NOCE_CONVERSION_PROFITABLE_P s390_noce_conversion_profitable_p
+
+static bool
+s390_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info)
+{
+  if (if_info->speed_p)
+    {
+      for (rtx_insn *insn = seq; insn; insn = NEXT_INSN (insn))
+        {
+          rtx set = single_set (insn);
+          if (set == NULL)
+            continue;
+          if (GET_CODE (SET_SRC (set)) != IF_THEN_ELSE)
+            continue;
+          rtx src = SET_SRC (set);
+          machine_mode mode = GET_MODE (src);
+          if (GET_MODE_CLASS (mode) != MODE_INT
+              && GET_MODE_CLASS (mode) != MODE_FLOAT)
+            continue;
+          if (GET_MODE_SIZE (mode) > GET_MODE_SIZE (Pmode))
+            continue;
+          return true;
+        }
+    }
+  return default_noce_conversion_profitable_p (seq, if_info);
+}
+
 /* Initialize GCC target structure.  */
 
 #undef  TARGET_ASM_ALIGNED_HI_OP
diff --git a/gcc/testsuite/gcc.target/s390/ccor.c b/gcc/testsuite/gcc.target/s390/ccor.c
index 31f30f60314..36a3c3a999a 100644
--- a/gcc/testsuite/gcc.target/s390/ccor.c
+++ b/gcc/testsuite/gcc.target/s390/ccor.c
@@ -42,7 +42,7 @@ GENFUN1(2)
 
 GENFUN1(3)
 
-/* { dg-final { scan-assembler {locrno} } } */
+/* { dg-final { scan-assembler {locro} } } */
 
 GENFUN2(0,1)
 
@@ -58,7 +58,7 @@ GENFUN2(0,3)
 
 GENFUN2(1,2)
 
-/* { dg-final { scan-assembler {locrnlh} } } */
+/* { dg-final { scan-assembler {locrlh} } } */
 
 GENFUN2(1,3)
 
-- 
2.44.0



Re: [PATCH] rs6000: Adjust -fpatchable-function-entry* support for dual entry [PR112980]

2024-05-08 Thread Richard Biener
On Wed, May 8, 2024 at 7:50 AM Kewen.Lin  wrote:
>
> Hi,
>
> As discussed in PR112980, although the current
> implementation of -fpatchable-function-entry* conforms
> to the documentation (making the N NOPs consecutive),
> it's inefficient for both kernel and userspace livepatching
> (see the comments in the PR for details).
>
> So this patch changes the current implementation to
> emit the "before" NOPs before the global entry point and
> the "after" NOPs after the local entry point.  The new behavior
> no longer keeps the NOPs consecutive, so the documentation
> is updated to emphasize this.
>
> Bootstrapped and regress-tested on powerpc64-linux-gnu
> P8/P9 and powerpc64le-linux-gnu P9 and P10.
>
> Is it ok for trunk?  And backporting to active branches
> after burn-in time?  I guess we should also mention this
> change in changes.html?
>
> BR,
> Kewen
> -
> PR target/112980
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue):
> Adjust the handling on patch area emitting with dual entry, remove
> the restriction on "before" NOPs count, not emit "before" NOPs any
> more but only emit "after" NOPs.
> * config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry):
> Adjust by respecting cfun->machine->stop_patch_area_print.
> (rs6000_elf_declare_function_name): For ELFv2 with dual entry, set
> cfun->machine->stop_patch_area_print as true.
> * config/rs6000/rs6000.h (struct machine_function): Remove member
> global_entry_emitted, add new member stop_patch_area_print.
> * doc/invoke.texi (option -fpatchable-function-entry): Adjust the
> documentation for PowerPC ELFv2 dual entry.
>
> gcc/testsuite/ChangeLog:
>
> * c-c++-common/patchable_function_entry-default.c: Adjust.
> * gcc.target/powerpc/pr99888-4.c: Likewise.
> * gcc.target/powerpc/pr99888-5.c: Likewise.
> * gcc.target/powerpc/pr99888-6.c: Likewise.
> ---
>  gcc/config/rs6000/rs6000-logue.cc | 40 +--
>  gcc/config/rs6000/rs6000.cc   | 15 +--
>  gcc/config/rs6000/rs6000.h| 10 +++--
>  gcc/doc/invoke.texi   |  8 ++--
>  .../patchable_function_entry-default.c|  3 --
>  gcc/testsuite/gcc.target/powerpc/pr99888-4.c  |  4 +-
>  gcc/testsuite/gcc.target/powerpc/pr99888-5.c  |  4 +-
>  gcc/testsuite/gcc.target/powerpc/pr99888-6.c  |  4 +-
>  8 files changed, 33 insertions(+), 55 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000-logue.cc 
> b/gcc/config/rs6000/rs6000-logue.cc
> index 60ba15a8bc3..0eb019b44b3 100644
> --- a/gcc/config/rs6000/rs6000-logue.cc
> +++ b/gcc/config/rs6000/rs6000-logue.cc
> @@ -4006,43 +4006,21 @@ rs6000_output_function_prologue (FILE *file)
>   fprintf (file, "\tadd 2,2,12\n");
> }
>
> -  unsigned short patch_area_size = crtl->patch_area_size;
> -  unsigned short patch_area_entry = crtl->patch_area_entry;
> -  /* Need to emit the patching area.  */
> -  if (patch_area_size > 0)
> -   {
> - cfun->machine->global_entry_emitted = true;
> - /* As ELFv2 ABI shows, the allowable bytes between the global
> -and local entry points are 0, 4, 8, 16, 32 and 64 when
> -there is a local entry point.  Considering there are two
> -non-prefixed instructions for global entry point prologue
> -(8 bytes), the count for patchable nops before local entry
> -point would be 2, 6 and 14.  It's possible to support those
> -other counts of nops by not making a local entry point, but
> -we don't have clear use cases for them, so leave them
> -unsupported for now.  */
> - if (patch_area_entry > 0)
> -   {
> - if (patch_area_entry != 2
> - && patch_area_entry != 6
> - && patch_area_entry != 14)
> -   error ("unsupported number of nops before function entry 
> (%u)",
> -  patch_area_entry);
> - rs6000_print_patchable_function_entry (file, patch_area_entry,
> -true);
> - patch_area_size -= patch_area_entry;
> -   }
> -   }
> -
>fputs ("\t.localentry\t", file);
>assemble_name (file, name);
>fputs (",.-", file);
>assemble_name (file, name);
>fputs ("\n", file);
>/* Emit the nops after local entry.  */
> -  if (patch_area_size > 0)
> -   rs6000_print_patchable_function_entry (file, patch_area_size,
> -  patch_area_entry == 0);
> +  unsigned short patch_area_size = crtl->patch_area_size;
> +  unsigned short patch_area_entry = crtl->patch_area_entry;
> +  if (patch_area_size > patch_area_entry)
> +   {
> + cfun->machine->stop_patch_area_print 

[PATCH] reassoc: Fix up optimize_range_tests_to_bit_test [PR114965]

2024-05-08 Thread Jakub Jelinek
Hi!

The optimize_range_tests_to_bit_test optimization normally emits a range
test first:
  if (entry_test_needed)
    {
      tem = build_range_check (loc, optype, unshare_expr (exp),
                               false, lowi, high);
      if (tem == NULL_TREE || is_gimple_val (tem))
        continue;
    }
so during the bit test we already know that exp is in the [lowi, high]
range, but it skips that entry test if we have range info which tells us
it isn't necessary.
Also, normally it emits shifts by an exp - lowi counter, but has an
optimization to use just exp as the counter if the mask isn't a more
expensive constant in that case, lowi is > 0 and high is smaller than prec.

The following testcase is miscompiled because both of these special cases
are triggered.  The range of exp is [43, 43][48, 48][95, 95], so on a
64-bit arch we decide we don't need the entry test, because 95 - 43 < 64.
And we also decide to use just exp as the counter, because the range test
tests just for exp == 43 || exp == 48, so high is smaller than 64 too.
Because 95 is in the exp range, we can't do that; we'd either need to
do a range test first, i.e.
if (exp - 43U <= 48U - 43U) if ((1UL << exp) & mask1)
or need to subtract lowi from the shift counter, i.e.
if ((1UL << (exp - 43)) & mask2)
but can't apply both optimizations unless r.upper_bound () is < prec.
The following patch ensures that.
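
For illustration only, here is a minimal C sketch of the two safe forms
described above.  It is not taken from the patch: the function names and
the LOWI/HIGH macros are hypothetical, and the masks are written out by
hand for the values 43 and 48 on a 64-bit type.

```c
#include <stdint.h>

/* Values the bit test should accept: '+' (43) and '0' (48).
   LOWI and HIGH are names made up for this sketch.  */
#define LOWI 43u
#define HIGH 48u

/* Form 1: entry range test first, then shift by exp itself.
   The range test guarantees exp <= 48 < 64, so the shift is defined.  */
static int
match_with_entry_test (unsigned int exp)
{
  const uint64_t mask1 = (UINT64_C (1) << 43) | (UINT64_C (1) << 48);
  if (exp - LOWI <= HIGH - LOWI)
    return ((UINT64_C (1) << exp) & mask1) != 0;
  return 0;
}

/* Form 2: no entry test, but shift by exp - lowi.  This is only valid
   when exp is known to lie in [43, 95], so exp - 43 <= 52 < 64.  */
static int
match_with_subtraction (unsigned int exp)
{
  const uint64_t mask2 = (UINT64_C (1) << 0) | (UINT64_C (1) << 5);
  return ((UINT64_C (1) << (exp - LOWI)) & mask2) != 0;
}
```

Dropping both the entry test and the subtraction would shift a 64-bit
value by 95 for exp == '_', which is undefined; that is the combination
the extra condition is meant to rule out.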

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-05-08  Jakub Jelinek  

PR tree-optimization/114965
* tree-ssa-reassoc.cc (optimize_range_tests_to_bit_test): Don't try to
optimize away exp - lowi subtraction from shift count unless entry
test is emitted or unless r.upper_bound () is smaller than prec.

* gcc.c-torture/execute/pr114965.c: New test.

--- gcc/tree-ssa-reassoc.cc.jj  2024-01-12 10:07:58.384848977 +0100
+++ gcc/tree-ssa-reassoc.cc 2024-05-07 18:18:45.558814991 +0200
@@ -3418,7 +3418,8 @@ optimize_range_tests_to_bit_test (enum t
 We can avoid then subtraction of the minimum value, but the
 mask constant could be perhaps more expensive.  */
  if (compare_tree_int (lowi, 0) > 0
- && compare_tree_int (high, prec) < 0)
+ && compare_tree_int (high, prec) < 0
+ && (entry_test_needed || wi::ltu_p (r.upper_bound (), prec)))
{
  int cost_diff;
  HOST_WIDE_INT m = tree_to_uhwi (lowi);
--- gcc/testsuite/gcc.c-torture/execute/pr114965.c.jj   2024-05-07 18:17:16.767031821 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr114965.c  2024-05-07 18:15:52.332188943 +0200
@@ -0,0 +1,30 @@
+/* PR tree-optimization/114965 */
+
+static void
+foo (const char *x)
+{
+
+  char a = '0';
+  while (1)
+    {
+      switch (*x)
+        {
+        case '_':
+        case '+':
+          a = *x;
+          x++;
+          continue;
+        default:
+          break;
+        }
+      break;
+    }
+  if (a == '0' || a == '+')
+    __builtin_abort ();
+}
+
+int
+main ()
+{
+  foo ("_");
+}

Jakub



[PATCH] tree-ssa-loop-prefetch.cc: Honour -fno-unroll-loops

2024-05-08 Thread Stefan Schulze Frielinghaus
On s390 the following tests fail

FAIL: gcc.dg/vect/pr109011-1.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = .CLZ (vect" 1
FAIL: gcc.dg/vect/pr109011-1.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = .POPCOUNT (vect" 1
FAIL: gcc.dg/vect/pr109011-1.c scan-tree-dump-times optimized " = .CLZ (vect" 1
FAIL: gcc.dg/vect/pr109011-1.c scan-tree-dump-times optimized " = .POPCOUNT (vect" 1
FAIL: gcc.dg/vect/pr109011-2.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = .CTZ (vect" 2
FAIL: gcc.dg/vect/pr109011-2.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = .POPCOUNT (vect" 1
FAIL: gcc.dg/vect/pr109011-2.c scan-tree-dump-times optimized " = .CTZ (vect" 2
FAIL: gcc.dg/vect/pr109011-2.c scan-tree-dump-times optimized " = .POPCOUNT (vect" 1
FAIL: gcc.dg/vect/pr109011-4.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = .CTZ (vect" 2
FAIL: gcc.dg/vect/pr109011-4.c -flto -ffat-lto-objects  scan-tree-dump-times optimized " = .POPCOUNT (vect" 1
FAIL: gcc.dg/vect/pr109011-4.c scan-tree-dump-times optimized " = .CTZ (vect" 2
FAIL: gcc.dg/vect/pr109011-4.c scan-tree-dump-times optimized " = .POPCOUNT (vect" 1

because aprefetch unrolls loops even if -fno-unroll-loops is used.
As a result, the scan patterns match more than once.

This could also be fixed by using -fno-prefetch-loop-arrays for the tests,
though I would prefer that aprefetch honours -fno-unroll-loops.  Any
preferences?

Bootstrapped and regtested on x86_64 and s390.  Ok for mainline?

gcc/ChangeLog:

* tree-ssa-loop-prefetch.cc (determine_unroll_factor): Honour
-fno-unroll-loops.
---
 gcc/tree-ssa-loop-prefetch.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/tree-ssa-loop-prefetch.cc b/gcc/tree-ssa-loop-prefetch.cc
index 70073cc4fe4..bb5d5dec779 100644
--- a/gcc/tree-ssa-loop-prefetch.cc
+++ b/gcc/tree-ssa-loop-prefetch.cc
@@ -1401,6 +1401,10 @@ determine_unroll_factor (class loop *loop, struct mem_ref_group *refs,
   struct mem_ref_group *agp;
   struct mem_ref *ref;
 
+  /* Bail out early in case we must not unroll loops.  */
+  if (!flag_unroll_loops)
+return 1;
+
   /* First check whether the loop is not too large to unroll.  We ignore
  PARAM_MAX_UNROLL_TIMES, because for small loops, it prevented us
  from unrolling them enough to make exactly one cache line covered by each
-- 
2.44.0



Re: [PATCH] match: `a CMP nonnegative ? a : ABS` simplified to just `ABS` [PR112392]

2024-05-08 Thread Richard Biener
On Wed, May 8, 2024 at 5:25 AM Andrew Pinski  wrote:
>
> We can optimize `a == nonnegative ? a : ABS`, `a > nonnegative ? a : ABS`
> and `a >= nonnegative ? a : ABS` into `ABS`. This allows removal of
> some extra comparisons and extra conditional moves in some cases.
> I don't remember where I found this, but it is simple to add so
> let's add it.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> Note I have a secondary pattern for the equal case as either a or nonnegative
> could be used.

OK
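
For readers following along, the identity behind the quoted patterns can
be checked with a small C sketch.  This is only an illustration under
assumptions: the function names are hypothetical, the nonnegative operand
is modelled as an unsigned char widened to int (as in the quoted
testcase), and abs () from <stdlib.h> stands in for ABS_EXPR.

```c
#include <stdlib.h>

/* a == nonneg ? a : ABS (a)  */
static int
cond_eq (int a, unsigned char b)
{
  int nonneg = b;
  return a == nonneg ? a : abs (a);
}

/* a == nonneg ? nonneg : ABS (a), the secondary pattern.  */
static int
cond_eq_other (int a, unsigned char b)
{
  int nonneg = b;
  return a == nonneg ? nonneg : abs (a);
}

/* a > nonneg ? a : ABS (a)  */
static int
cond_gt (int a, unsigned char b)
{
  int nonneg = b;
  return a > nonneg ? a : abs (a);
}

/* a >= nonneg ? a : ABS (a)  */
static int
cond_ge (int a, unsigned char b)
{
  int nonneg = b;
  return a >= nonneg ? a : abs (a);
}
```

In each case the "then" arm is only taken when a is already known to be
nonnegative, so it equals abs (a) and the whole expression folds to the
"else" arm.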

> PR tree-optimization/112392
>
> gcc/ChangeLog:
>
> * match.pd (`x CMP nonnegative ? x : ABS`): New pattern;
> where CMP is ==, > and >=.
> (`x CMP nonnegative@y ? y : ABS`): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/phi-opt-41.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/match.pd   | 15 ++
>  gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c | 34 ++
>  2 files changed, 49 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 03a03c31233..07e743ae464 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5876,6 +5876,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (convert (absu:utype @0)))
>  @3
>
> +/* X >  Positive ? X : ABS(X) -> ABS(X) */
> +/* X >= Positive ? X : ABS(X) -> ABS(X) */
> +/* X == Positive ? X : ABS(X) -> ABS(X) */
> +(for cmp (eq gt ge)
> + (simplify
> +  (cond (cmp:c @0 tree_expr_nonnegative_p@1) @0 (abs@3 @0))
> +  (if (INTEGRAL_TYPE_P (type))
> +   @3)))
> +
> +/* X == Positive ? Positive : ABS(X) -> ABS(X) */
> +(simplify
> + (cond (eq:c @0 tree_expr_nonnegative_p@1) @1 (abs@3 @0))
> + (if (INTEGRAL_TYPE_P (type))
> +  @3))
> +
>  /* (X + 1) > Y ? -X : 1 simplifies to X >= Y ? -X : 1 when
> X is unsigned, as when X + 1 overflows, X is -1, so -X == 1.  */
>  (simplify
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c
> new file mode 100644
> index 000..9774e283a7b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c
> @@ -0,0 +1,34 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-tree-phiopt1" } */
> +/* PR tree-optimization/112392 */
> +
> +int feq_1(int a, unsigned char b)
> +{
> +  int absb = b;
> +  if (a == absb)  return absb;
> +  return a > 0 ? a : -a;
> +}
> +int feq_2(int a, unsigned char b)
> +{
> +  int absb = b;
> +  if (a == absb)  return a;
> +  return a > 0 ? a : -a;
> +}
> +
> +int fgt(int a, unsigned char b)
> +{
> +  int absb = b;
> +  if (a > absb)  return a;
> +  return a > 0 ? a : -a;
> +}
> +
> +int fge(int a, unsigned char b)
> +{
> +  int absb = b;
> +  if (a >= absb)  return a;
> +  return a > 0 ? a : -a;
> +}
> +
> +
> +/* { dg-final { scan-tree-dump-not "if " "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-times "ABS_EXPR <" 4 "phiopt1" } } */
> --
> 2.43.0
>


Re: [PATCH] RISC-V: Add zero_extract support for rv64gc

2024-05-08 Thread Christoph Müllner
On Mon, May 6, 2024 at 11:43 PM Vineet Gupta  wrote:
>
>
>
> On 5/6/24 13:40, Christoph Müllner wrote:
> > The combiner attempts to optimize a zero-extension of a logical right shift
> > using zero_extract. We already utilize this optimization for those cases
> > that result in a single instruction.  Let's add an insn_and_split
> > pattern that also matches the generic case, where we can emit an
> > optimized sequence of a slli/srli.
> >
> > ...
> >
> > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > index d4676507b45..80cbecb78e8 100644
> > --- a/gcc/config/riscv/riscv.md
> > +++ b/gcc/config/riscv/riscv.md
> > @@ -2792,6 +2792,36 @@ (define_insn "*lshrsi3_zero_extend_3"
> >[(set_attr "type" "shift")
> > (set_attr "mode" "SI")])
> >
> > +;; Canonical form for a zero-extend of a logical right shift.
> > +;; Special cases are handled above.
> > +;; Skip for single-bit extraction (Zbs/XTheadBs) and th.extu (XTheadBb)
>
> Dumb question: Why not for Zbs: Zb[abs] is going to be very common going
> fwd and will end up being unused.
>
> > +(define_insn_and_split "*lshr3_zero_extend_4"
> > +  [(set (match_operand:GPR 0 "register_operand" "=r")
> > +  (zero_extract:GPR
> > +   (match_operand:GPR 1 "register_operand" " r")
> > +   (match_operand 2 "const_int_operand")
> > +   (match_operand 3 "const_int_operand")))
> > +   (clobber (match_scratch:GPR  4 "="))]
> > +  "!((TARGET_ZBS || TARGET_XTHEADBS) && (INTVAL (operands[2]) == 1))
> > +   && !TARGET_XTHEADBB"
> > +  "#"
> > +  "&& reload_completed"
> > +  [(set (match_dup 4)
> > + (ashift:GPR (match_dup 1) (match_dup 2)))
> > +   (set (match_dup 0)
> > + (lshiftrt:GPR (match_dup 4) (match_dup 3)))]
> > +{
> > +  int regbits = GET_MODE_BITSIZE (GET_MODE (operands[0])).to_constant ();
> > +  int sizebits = INTVAL (operands[2]);
> > +  int startbits = INTVAL (operands[3]);
> > +  int lshamt = regbits - sizebits - startbits;
> > +  int rshamt = lshamt + startbits;
> > +  operands[2] = GEN_INT (lshamt);
> > +  operands[3] = GEN_INT (rshamt);
> > +}
> > +  [(set_attr "type" "shift")
> > +   (set_attr "mode" "")])
> > +
> >  ;; Handle AND with 2^N-1 for N from 12 to XLEN.  This can be split into
> >  ;; two logical shifts.  Otherwise it requires 3 instructions: lui,
> >  ;; xor/addi/srli, and.
> > diff --git a/gcc/testsuite/gcc.target/riscv/pr111501.c 
> > b/gcc/testsuite/gcc.target/riscv/pr111501.c
> > new file mode 100644
> > index 000..9355be242e7
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/pr111501.c
> > @@ -0,0 +1,32 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target rv64 } */
> > +/* { dg-options "-march=rv64gc" { target { rv64 } } } */
> > +/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
> > +/* { dg-final { check-function-bodies "**" "" } } */
>
> Is function body check really needed: isn't count of srli and slli each
> sufficient ?
> Last year we saw a lot of false failures due to unrelated scheduling
> changes as such tripping these up.

I've dropped the check-function-bodies in the v2.

Thanks!

>
> > +/* { dg-allow-blank-lines-in-output 1 } */
> > +
> > +/*
> > +**do_shift:
> > +**...
> > +**slli\ta[0-9],a[0-9],16
> > +**srli\ta[0-9],a[0-9],48
> > +**...
> > +*/
> > +unsigned int
> > +do_shift(unsigned long csum)
> > +{
> > +  return (unsigned short)(csum >> 32);
> > +}
> > +
> > +/*
> > +**do_shift2:
> > +**...
> > +**slli\ta[0-9],a[0-9],16
> > +**srli\ta[0-9],a[0-9],48
> > +**...
> > +*/
> > +unsigned int
> > +do_shift2(unsigned long csum)
> > +{
> > +  return (csum << 16) >> 48;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/riscv/zero-extend-rshift-32.c 
> > b/gcc/testsuite/gcc.target/riscv/zero-extend-rshift-32.c
> > new file mode 100644
> > index 000..2824d6fe074
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/zero-extend-rshift-32.c
> > @@ -0,0 +1,37 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target rv32 } */
> > +/* { dg-options "-march=rv32gc" } */
> > +/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
> > +/* { dg-final { check-function-bodies "**" "" } } */
>
> Same as above, counts where possible.
>
> -Vineet
>


Re: [PATCH] RISC-V: Add zero_extract support for rv64gc

2024-05-08 Thread Christoph Müllner
On Mon, May 6, 2024 at 11:24 PM Jeff Law  wrote:
>
>
>
> On 5/6/24 2:40 PM, Christoph Müllner wrote:
> > The combiner attempts to optimize a zero-extension of a logical right shift
> > using zero_extract. We already utilize this optimization for those cases
> > that result in a single instruction.  Let's add an insn_and_split
> > pattern that also matches the generic case, where we can emit an
> > optimized sequence of a slli/srli.
> >
> > Tested with SPEC CPU 2017 (rv64gc).
> >
> >   PR 111501
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv.md (*lshr3_zero_extend_4): New
> >   pattern for zero-extraction.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/pr111501.c: New test.
> >   * gcc.target/riscv/zero-extend-rshift-32.c: New test.
> >   * gcc.target/riscv/zero-extend-rshift-64.c: New test.
> >   * gcc.target/riscv/zero-extend-rshift.c: New test.
> So I had Lyut looking in this space as well.  Mostly because there's a
> desire to avoid the srl+and approach and instead represent this stuff as
> shifts (which are fusible in our uarch).  So I've already got some state...
>
>
> >
> > Signed-off-by: Christoph Müllner 
> > ---
> >   gcc/config/riscv/riscv.md |  30 +
> >   gcc/testsuite/gcc.target/riscv/pr111501.c |  32 +
> >   .../gcc.target/riscv/zero-extend-rshift-32.c  |  37 ++
> >   .../gcc.target/riscv/zero-extend-rshift-64.c  |  63 ++
> >   .../gcc.target/riscv/zero-extend-rshift.c | 119 ++
> >   5 files changed, 281 insertions(+)
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/pr111501.c
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/zero-extend-rshift-32.c
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/zero-extend-rshift-64.c
> >   create mode 100644 gcc/testsuite/gcc.target/riscv/zero-extend-rshift.c
> >
> > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > index d4676507b45..80cbecb78e8 100644
> > --- a/gcc/config/riscv/riscv.md
> > +++ b/gcc/config/riscv/riscv.md
> > @@ -2792,6 +2792,36 @@ (define_insn "*lshrsi3_zero_extend_3"
> > [(set_attr "type" "shift")
> >  (set_attr "mode" "SI")])
> >
> > +;; Canonical form for a zero-extend of a logical right shift.
> > +;; Special cases are handled above.
> > +;; Skip for single-bit extraction (Zbs/XTheadBs) and th.extu (XTheadBb)
> > +(define_insn_and_split "*lshr3_zero_extend_4"
> > +  [(set (match_operand:GPR 0 "register_operand" "=r")
> > +  (zero_extract:GPR
> > +   (match_operand:GPR 1 "register_operand" " r")
> > +   (match_operand 2 "const_int_operand")
> > +   (match_operand 3 "const_int_operand")))
> > +   (clobber (match_scratch:GPR  4 "="))]
> > +  "!((TARGET_ZBS || TARGET_XTHEADBS) && (INTVAL (operands[2]) == 1))
> > +   && !TARGET_XTHEADBB"
> > +  "#"
> > +  "&& reload_completed"
> > +  [(set (match_dup 4)
> > + (ashift:GPR (match_dup 1) (match_dup 2)))
> > +   (set (match_dup 0)
> > + (lshiftrt:GPR (match_dup 4) (match_dup 3)))]
> Consider adding support for signed extractions as well.  You just need
> an iterator across zero_extract/sign_extract and suitable selection of
> arithmetic vs logical right shift step.

The sign-extension/extraction code was worse than the
zero-extension/extraction code.
So, I ended up doing some initial work for addressing corner cases first, before
converting this pattern using an any_extract iterator for the v2
(already on the list).

>
> A nit on the condition.   Bring the && INTVAL (operands[2]) == 1 down to
> a new line like you've gone with !TARGET_XTHEADBB.
>
> You also want to make sure the condition rejects the cases handled by
> this pattern (or merge your pattern with this one):

I kept the pattern, but added sign_extract support.

>
> > ;; Canonical form for a zero-extend of a logical right shift.
> > (define_insn "*lshrsi3_zero_extend_2"
> >   [(set (match_operand:DI   0 "register_operand" "=r")
> > (zero_extract:DI (match_operand:DI  1 "register_operand" " r")
> >  (match_operand 2 "const_int_operand")
> >  (match_operand 3 "const_int_operand")))]
> >   "(TARGET_64BIT && (INTVAL (operands[3]) > 0)
> > && (INTVAL (operands[2]) + INTVAL (operands[3]) == 32))"
> > {
> >   return "srliw\t%0,%1,%3";
> > }
> >   [(set_attr "type" "shift")
> >(set_attr "mode" "SI")])
>
> So generally going the right direction.  But needs another iteration.

Thanks for the review!

>
> Jeff
>


Re: [PATCH] MATCH: Add some more value_replacement simplifications (a != 0 ? expr : 0) to match

2024-05-08 Thread Richard Biener
On Tue, May 7, 2024 at 10:56 PM Andrew Pinski  wrote:
>
> On Tue, May 7, 2024 at 1:45 PM Jeff Law  wrote:
> >
> >
> >
> > On 4/30/24 9:21 PM, Andrew Pinski wrote:
> > > This adds a few more of what is currently done in phiopt's 
> > > value_replacement
> > > to match. I noticed this when I was hooking up phiopt's value_replacement
> > > code to use match and disabling the old code. But this can be done
> > > independently from the hooking up phiopt's value_replacement as phiopt
> > > is already hooked up for simplified versions already.
> > >
> > > /* a != 0 ? a / b : 0  -> a / b iff b is nonzero. */
> > > /* a != 0 ? a * b : 0 -> a * b */
> > > /* a != 0 ? a & b : 0 -> a & b */
> > >
> > > We prefer the `cond ? a : 0` forms to allow optimization of `a * cond` 
> > > which
> > > uses that form.
> > >
> > > Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> > >
> > > >   PR tree-optimization/114894
> > >
> > > gcc/ChangeLog:
> > >
> > >   * match.pd (`a != 0 ? a / b : 0`): New pattern.
> > >   (`a != 0 ? a * b : 0`): New pattern.
> > >   (`a != 0 ? a & b : 0`): New pattern.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.dg/tree-ssa/phi-opt-value-5.c: New test.
> > Is there any need to also handle the reversed conditional with the arms
> > swapped?  If not, this is fine as-is.  If yes, then fine with the
> > obvious generalization.
>
> The answer is yes and no. The PHI-OPT pass will try both cases,
> but the other (all?) passes do not. This is something I have been
> thinking about trying to solve in a generic way instead of adding many
> more patterns here. I will start working on that in the middle of
> June.
> Most of the time, cond patterns in match are used inside phiopt, so
> handling the reversed conditional has not been high on my priority list, but
> with VRP and scev and match (itself) producing more cond_expr, we
> should fix this once and for all for GCC 15.

IMO this is a classical case for canonicalization.  IIRC in fold we
rely on tree_swap_operands_p for the COND_EXPR arms and if
we can invert the condition we do so.  So there's a conflict of interest
with respect to condition canonicalization and true/false canonicalization.
We do not canonicalize COND_EXPRs in gimple_resimplify3, but
the only natural thing there would be to do it based on the op2/op3
operands, looking at the conditional would dive down one level too deep.

Richard.

> Thanks,
> Andrew Pinski
>
> >
> > jeff
> >


Re: [PATCH 2/4] fortran: Teach get_real_kind_from_node for Power 128 fp modes [PR112993]

2024-05-08 Thread Mikael Morin

Hello,

On 08/05/2024 at 07:27, Kewen.Lin wrote:

Hi,

Previously the effective target fortran_real_c_float128 never
passed on Power, regardless of whether the default 128-bit long
double is ibmlongdouble or ieeelongdouble.  This is because TF
mode is always used for kind 16 real, which has precision
127, while the node float128_type_node for c_float128 has
type precision 128, so get_real_kind_from_node can't find a
match as it only checks gfc_real_kinds[i].mode_precision
against the type precision.

With TFmode/IFmode/KFmode changed to have the same mode
precision 128, fortran_real_c_float128 can now pass with
ieeelongdouble enabled by default, and test cases guarded
with it get tested accordingly.  But with ibmlongdouble
enabled by default, since TFmode has precision 128, which
is the same as the type precision 128 of float128_type_node,
get_real_kind_from_node considers the kind for TFmode to match
float128_type_node, which is wrong as at this point
TFmode uses the IBM extended format.  So this patch
teaches get_real_kind_from_node to check one more field that
can differentiate the underlying real formats, which
avoids the unexpected match when more than one
mode has the same precision.

Bootstrapped and regress-tested on:
   - powerpc64-linux-gnu P8/P9 (with ibm128 by default)
   - powerpc64le-linux-gnu P9/P10 (with ibm128 by default)
   - powerpc64le-linux-gnu P9 (with ieee128 by default)


OK from the fortran point of view.
Thanks.


BR,
Kewen




[PATCH v2 3/4] RISC-V: Add zero_extract support for rv64gc

2024-05-08 Thread Christoph Müllner
The combiner attempts to optimize a zero-extension of a logical right shift
using zero_extract. We already utilize this optimization for those cases
that result in a single instruction.  Let's add an insn_and_split
pattern that also matches the generic case, where we can emit an
optimized sequence of a slli/srli.
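
As a rough illustration of the shift amounts computed in the split below,
the following C sketch models a 64-bit zero_extract as a left shift
followed by a logical right shift.  The function name is hypothetical and
the sketch assumes 0 < size and size + start <= 64; it mirrors the
lshamt/rshamt arithmetic of the pattern, not the RTL itself.

```c
#include <stdint.h>

/* Extract SIZE bits starting at bit START of X, zero-extended,
   using one left shift (slli) and one logical right shift (srli).
   Assumes 0 < size and size + start <= 64.  */
static uint64_t
zero_extract64 (uint64_t x, unsigned int size, unsigned int start)
{
  unsigned int regbits = 64;
  unsigned int lshamt = regbits - size - start;  /* drop the bits above the field */
  unsigned int rshamt = lshamt + start;          /* move the field down to bit 0 */
  return (x << lshamt) >> rshamt;
}
```

For size 16 and start 32 this gives lshamt 16 and rshamt 48, which
corresponds to the slli by 16 and srli by 48 expected by the pr111501.c
test in v1.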

Tested with SPEC CPU 2017 (rv64gc).

PR 111501

gcc/ChangeLog:

* config/riscv/riscv.md (*lshr3_zero_extend_4): New
pattern for zero-extraction.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/extend-shift-helpers.h: New test.
* gcc.target/riscv/pr111501.c: New test.
* gcc.target/riscv/zero-extend-rshift-32.c: New test.
* gcc.target/riscv/zero-extend-rshift-64.c: New test.
* gcc.target/riscv/zero-extend-rshift.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/riscv.md |  30 +
 .../gcc.target/riscv/extend-shift-helpers.h   |  26 
 gcc/testsuite/gcc.target/riscv/pr111501.c |  21 
 .../gcc.target/riscv/zero-extend-rshift-32.c  |  13 ++
 .../gcc.target/riscv/zero-extend-rshift-64.c  |  17 +++
 .../gcc.target/riscv/zero-extend-rshift.c | 115 ++
 6 files changed, 222 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/extend-shift-helpers.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr111501.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zero-extend-rshift-32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zero-extend-rshift-64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zero-extend-rshift.c

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index b7fc13e4e61..58bf7712277 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2793,6 +2793,36 @@ (define_insn "*lshrsi3_zero_extend_3"
   [(set_attr "type" "shift")
(set_attr "mode" "SI")])
 
+;; Canonical form for a zero-extend of a logical right shift.
+;; Special cases are handled above.
+;; Skip for single-bit extraction (Zbs/XTheadBs) and th.extu (XTheadBb)
+(define_insn_and_split "*lshr3_zero_extend_4"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+(zero_extract:GPR
+   (match_operand:GPR 1 "register_operand" " r")
+   (match_operand 2 "const_int_operand")
+   (match_operand 3 "const_int_operand")))
+   (clobber (match_scratch:GPR  4 "="))]
+  "!((TARGET_ZBS || TARGET_XTHEADBS) && (INTVAL (operands[2]) == 1))
+   && !TARGET_XTHEADBB"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 4)
+ (ashift:GPR (match_dup 1) (match_dup 2)))
+   (set (match_dup 0)
+ (lshiftrt:GPR (match_dup 4) (match_dup 3)))]
+{
+  int regbits = GET_MODE_BITSIZE (GET_MODE (operands[0])).to_constant ();
+  int sizebits = INTVAL (operands[2]);
+  int startbits = INTVAL (operands[3]);
+  int lshamt = regbits - sizebits - startbits;
+  int rshamt = lshamt + startbits;
+  operands[2] = GEN_INT (lshamt);
+  operands[3] = GEN_INT (rshamt);
+}
+  [(set_attr "type" "shift")
+   (set_attr "mode" "")])
+
 ;; Handle AND with 2^N-1 for N from 12 to XLEN.  This can be split into
 ;; two logical shifts.  Otherwise it requires 3 instructions: lui,
 ;; xor/addi/srli, and.
diff --git a/gcc/testsuite/gcc.target/riscv/extend-shift-helpers.h b/gcc/testsuite/gcc.target/riscv/extend-shift-helpers.h
new file mode 100644
index 000..4853fe490d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/extend-shift-helpers.h
@@ -0,0 +1,26 @@
+#ifndef EXTEND_SHIFT_HELPERS_H
+#define EXTEND_SHIFT_HELPERS_H
+
+#define RT_EXT_CT_RSHIFT_N_AT(RTS,RT,CTS,CT,N,ATS,AT)  \
+RTS RT \
+RTS##_##RT##_ext_##CTS##_##CT##_rshift_##N##_##ATS##_##AT(ATS AT v)\
+{  \
+return (CTS CT)(v >> N);   \
+}
+
+#define ULONG_EXT_USHORT_RSHIFT_N_ULONG(N) \
+   RT_EXT_CT_RSHIFT_N_AT(unsigned,long,unsigned,short,N,unsigned,long)
+
+#define ULONG_EXT_UINT_RSHIFT_N_ULONG(N) \
+   RT_EXT_CT_RSHIFT_N_AT(unsigned,long,unsigned,int,N,unsigned,long)
+
+#define UINT_EXT_USHORT_RSHIFT_N_UINT(N) \
+   RT_EXT_CT_RSHIFT_N_AT(unsigned,int,unsigned,short,N,unsigned,int)
+
+#define UINT_EXT_USHORT_RSHIFT_N_ULONG(N) \
+   RT_EXT_CT_RSHIFT_N_AT(unsigned,int,unsigned,short,N,unsigned,long)
+
+#define ULONG_EXT_USHORT_RSHIFT_N_UINT(N) \
+   RT_EXT_CT_RSHIFT_N_AT(unsigned,long,unsigned,short,N,unsigned,int)
+
+#endif /* EXTEND_SHIFT_HELPERS_H */
diff --git a/gcc/testsuite/gcc.target/riscv/pr111501.c b/gcc/testsuite/gcc.target/riscv/pr111501.c
new file mode 100644
index 000..db48c34ce9a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr111501.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-options "-march=rv64gc" { target { rv64 } } } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
+
+unsigned int

[PATCH v2 4/4] RISC-V: Cover sign-extensions in lshr3_zero_extend_4

2024-05-08 Thread Christoph Müllner
The lshr3_zero_extend_4 pattern targets bit extraction
with zero-extension. This pattern represents the canonical form
of zero-extensions of a logical right shift.

The same optimization can be applied to sign-extensions.
Given that the two optimizations are so similar, this patch converts
the existing pattern to cover the sign-extension case as well.
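
A hedged C sketch of the sign-extending variant: the same shift amounts
are used, but the right shift is arithmetic.  The function name is
hypothetical, and the code assumes that converting to int64_t wraps and
that >> on a negative int64_t is an arithmetic shift (both
implementation-defined in ISO C, and both documented behaviour for GCC).

```c
#include <stdint.h>

/* Extract SIZE bits starting at bit START of X, sign-extended,
   using a left shift followed by an arithmetic right shift.
   Assumes 0 < size and size + start <= 64.  */
static int64_t
sign_extract64 (uint64_t x, unsigned int size, unsigned int start)
{
  unsigned int regbits = 64;
  unsigned int lshamt = regbits - size - start;  /* field's top bit becomes bit 63 */
  unsigned int rshamt = lshamt + start;          /* shift back, replicating the sign */
  return (int64_t) (x << lshamt) >> rshamt;
}
```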

gcc/ChangeLog:

* config/riscv/iterators.md (ashiftrt): New code attribute
'extract_shift' and adding extractions to optab.
* config/riscv/riscv.md (*lshr3_zero_extend_4): Rename to...
(*3):...this and add support for
sign-extensions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/extend-shift-helpers.h: Add helpers for
sign-extension.
* gcc.target/riscv/sign-extend-rshift-32.c: New test.
* gcc.target/riscv/sign-extend-rshift-64.c: New test.
* gcc.target/riscv/sign-extend-rshift.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/iterators.md |   4 +
 gcc/config/riscv/riscv.md |  25 ++--
 .../gcc.target/riscv/extend-shift-helpers.h   |  20 +++
 .../gcc.target/riscv/sign-extend-rshift-32.c  |  17 +++
 .../gcc.target/riscv/sign-extend-rshift-64.c  |  17 +++
 .../gcc.target/riscv/sign-extend-rshift.c | 123 ++
 6 files changed, 198 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sign-extend-rshift-32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sign-extend-rshift-64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sign-extend-rshift.c

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index c5ca01f382a..8a9d1986b4a 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -155,6 +155,8 @@ (define_code_iterator any_extend [sign_extend zero_extend])
 (define_code_iterator any_extract [sign_extract zero_extract])
 (define_code_attr extract_sidi_shift [(sign_extract "sraiw")
  (zero_extract "srliw")])
+(define_code_attr extract_shift [(sign_extract "ashiftrt")
+(zero_extract "lshiftrt")])
 
 ;; This code iterator allows the two right shift instructions to be
 ;; generated from the same template.
@@ -261,6 +263,8 @@ (define_code_attr optab [(ashift "ashl")
 (us_minus "ussub")
 (sign_extend "extend")
 (zero_extend "zero_extend")
+(sign_extract "extract")
+(zero_extract "zero_extract")
 (fix "fix_trunc")
 (unsigned_fix "fixuns_trunc")])
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 58bf7712277..620a1b3bd32 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2793,24 +2793,33 @@ (define_insn "*lshrsi3_zero_extend_3"
   [(set_attr "type" "shift")
(set_attr "mode" "SI")])
 
-;; Canonical form for a zero-extend of a logical right shift.
-;; Special cases are handled above.
-;; Skip for single-bit extraction (Zbs/XTheadBs) and th.extu (XTheadBb)
-(define_insn_and_split "*lshr3_zero_extend_4"
+;; Canonical form for a extend of a logical shift right (sign/zero extraction).
+;; Special cases, that are ignored (handled elsewhere):
+;; * Single-bit extraction (Zbs/XTheadBs)
+;; * Single-bit extraction (Zicondops/XVentanaCondops)
+;; * Single-bit extraction (SFB)
+;; * Extraction instruction th.ext(u) (XTheadBb)
+;; * lshrsi3_extend_2 (see above)
+(define_insn_and_split "*3"
   [(set (match_operand:GPR 0 "register_operand" "=r")
-(zero_extract:GPR
+(any_extract:GPR
(match_operand:GPR 1 "register_operand" " r")
(match_operand 2 "const_int_operand")
(match_operand 3 "const_int_operand")))
(clobber (match_scratch:GPR  4 "="))]
-  "!((TARGET_ZBS || TARGET_XTHEADBS) && (INTVAL (operands[2]) == 1))
-   && !TARGET_XTHEADBB"
+  "!((TARGET_ZBS || TARGET_XTHEADBS || TARGET_ZICOND
+  || TARGET_XVENTANACONDOPS || TARGET_SFB_ALU)
+ && (INTVAL (operands[2]) == 1))
+   && !TARGET_XTHEADBB
+   && !(TARGET_64BIT
+&& (INTVAL (operands[3]) > 0)
+&& (INTVAL (operands[2]) + INTVAL (operands[3]) == 32))"
   "#"
   "&& reload_completed"
   [(set (match_dup 4)
  (ashift:GPR (match_dup 1) (match_dup 2)))
(set (match_dup 0)
- (lshiftrt:GPR (match_dup 4) (match_dup 3)))]
+ (<extract_shift>:GPR (match_dup 4) (match_dup 3)))]
 {
   int regbits = GET_MODE_BITSIZE (GET_MODE (operands[0])).to_constant ();
   int sizebits = INTVAL (operands[2]);
diff --git a/gcc/testsuite/gcc.target/riscv/extend-shift-helpers.h 
b/gcc/testsuite/gcc.target/riscv/extend-shift-helpers.h
index 4853fe490d8..720672de242 100644
--- a/gcc/testsuite/gcc.target/riscv/extend-shift-helpers.h
+++ b/gcc/testsuite/gcc.target/riscv/extend-shift-helpers.h
@@ -8,6 +8,26 @@ RTS##_##RT##_ext_##CTS##_##CT##_rshift_##N##_##ATS##_##AT(ATS 
AT v)\
 

[PATCH v2 2/4] RISC-V: Cover sign-extensions in lshrsi3_zero_extend_2

2024-05-08 Thread Christoph Müllner
The pattern lshrsi3_zero_extend_2 extracts the MSB bits of the lower
32-bit word and zero-extends it back to DImode.
This is realized using srliw, which operates on 32-bit registers.

The same optimization can be applied to sign-extensions when emitting
a sraiw instead of the srliw.

Given these two optimizations are so similar, this patch simply
converts the existing one to cover the sign-extension case as well.
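
To make the srliw/sraiw distinction concrete, here is a small C model of
the two instructions as they are used here. The function names model_srliw
and model_sraiw are hypothetical. The sketch assumes GCC's
sign-propagating right shift on signed values and its wrapping
unsigned-to-signed conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of RV64 srliw: logical right shift of the low
   32 bits; the 32-bit result is then sign-extended to 64 bits.  For a
   non-zero shift the top bit is clear, so this acts as a zero-extend.  */
static inline int64_t
model_srliw (uint64_t x, unsigned sh)
{
  assert (sh < 32);
  return (int64_t) (int32_t) ((uint32_t) x >> sh);
}

/* Hypothetical model of RV64 sraiw: arithmetic right shift of the low
   32 bits, sign-extended to 64 bits.  */
static inline int64_t
model_sraiw (uint64_t x, unsigned sh)
{
  assert (sh < 32);
  return (int64_t) ((int32_t) (uint32_t) x >> sh);
}
```

Both models ignore the upper 32 bits of the input, which matches the
condition in the pattern that size plus position equals 32.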

gcc/ChangeLog:

* config/riscv/iterators.md (sraiw): New code iterator 'any_extract'.
New code attribute 'extract_sidi_shift'.
* config/riscv/riscv.md (*lshrsi3_zero_extend_2): Rename to...
(*lshrsi3_extend_2):...this and add support for sign-extensions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sign-extend-1.c: Test sraiw 24 and sraiw 16.

Signed-off-by: Christoph Müllner 
---
 gcc/config/riscv/iterators.md  |  6 ++
 gcc/config/riscv/riscv.md  |  9 +
 gcc/testsuite/gcc.target/riscv/sign-extend-1.c | 14 ++
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 32e1b140305..c5ca01f382a 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -150,6 +150,12 @@ (define_mode_attr slot12_offset [(SI "-52") (DI "-104")])
 ;; to use the same template.
 (define_code_iterator any_extend [sign_extend zero_extend])
 
+;; These code iterators allow unsigned and signed extraction to be generated
+;; from the same template.
+(define_code_iterator any_extract [sign_extract zero_extract])
+(define_code_attr extract_sidi_shift [(sign_extract "sraiw")
+ (zero_extract "srliw")])
+
 ;; This code iterator allows the two right shift instructions to be
 ;; generated from the same template.
 (define_code_iterator any_shiftrt [ashiftrt lshiftrt])
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 24558682eb8..b7fc13e4e61 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2765,16 +2765,17 @@ (define_insn "*lshrsi3_zero_extend_1"
   [(set_attr "type" "shift")
(set_attr "mode" "SI")])
 
-;; Canonical form for a zero-extend of a logical right shift.
-(define_insn "*lshrsi3_zero_extend_2"
+;; Canonical form for a sign/zero-extend of a logical right shift.
+;; Special case: extract MSB bits of lower 32-bit word
+(define_insn "*lshrsi3_extend_2"
   [(set (match_operand:DI   0 "register_operand" "=r")
-   (zero_extract:DI (match_operand:DI  1 "register_operand" " r")
+   (any_extract:DI (match_operand:DI  1 "register_operand" " r")
 (match_operand 2 "const_int_operand")
 (match_operand 3 "const_int_operand")))]
   "(TARGET_64BIT && (INTVAL (operands[3]) > 0)
 && (INTVAL (operands[2]) + INTVAL (operands[3]) == 32))"
 {
-  return "srliw\t%0,%1,%3";
+  return "<extract_sidi_shift>\t%0,%1,%3";
 }
   [(set_attr "type" "shift")
(set_attr "mode" "SI")])
diff --git a/gcc/testsuite/gcc.target/riscv/sign-extend-1.c 
b/gcc/testsuite/gcc.target/riscv/sign-extend-1.c
index e9056ec0d42..d8c18dd1aaa 100644
--- a/gcc/testsuite/gcc.target/riscv/sign-extend-1.c
+++ b/gcc/testsuite/gcc.target/riscv/sign-extend-1.c
@@ -9,6 +9,20 @@ foo1 (int i)
 }
 /* { dg-final { scan-assembler "sraiw\ta\[0-9\],a\[0-9\],31" } } */
 
+signed char
+sub2 (long i)
+{
+  return i >> 24;
+}
+/* { dg-final { scan-assembler "sraiw\ta\[0-9\],a\[0-9\],24" } } */
+
+signed short
+sub3 (long i)
+{
+  return i >> 16;
+}
+/* { dg-final { scan-assembler "sraiw\ta\[0-9\],a\[0-9\],16" } } */
+
 /* { dg-final { scan-assembler-not "srai\t" } } */
 /* { dg-final { scan-assembler-not "srli\t" } } */
 /* { dg-final { scan-assembler-not "srliw\t" } } */
-- 
2.44.0



[PATCH v2 1/4] RISC-V: Add test for sraiw-31 special case

2024-05-08 Thread Christoph Müllner
We already optimize a sign-extension of a right-shift by 31 in
si3_extend.  Let's add a test for that (similar to
zero-extend-1.c).

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sign-extend-1.c: New test.

Signed-off-by: Christoph Müllner 
---
 gcc/testsuite/gcc.target/riscv/sign-extend-1.c | 14 ++
 1 file changed, 14 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sign-extend-1.c

diff --git a/gcc/testsuite/gcc.target/riscv/sign-extend-1.c 
b/gcc/testsuite/gcc.target/riscv/sign-extend-1.c
new file mode 100644
index 000..e9056ec0d42
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sign-extend-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { riscv64*-*-* } } } */
+/* { dg-options "-march=rv64gc -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
+
+signed long
+foo1 (int i)
+{
+  return i >> 31;
+}
+/* { dg-final { scan-assembler "sraiw\ta\[0-9\],a\[0-9\],31" } } */
+
+/* { dg-final { scan-assembler-not "srai\t" } } */
+/* { dg-final { scan-assembler-not "srli\t" } } */
+/* { dg-final { scan-assembler-not "srliw\t" } } */
-- 
2.44.0



Re: [V2][PATCH] gcc-14/changes.html: Deprecate a GCC C extension on flexible array members.

2024-05-08 Thread Richard Biener
On Tue, 7 May 2024, Kees Cook wrote:

> On Tue, May 07, 2024 at 06:34:19PM +, Qing Zhao wrote:
> > On May 7, 2024, at 13:57, Sebastian Huber 
> >  wrote:
> > > On 07.05.24 16:26, Qing Zhao wrote:
> > > > Hi, Sebastian,
> > > > Thanks for your explanation.
> > > > Our goal is to deprecate the GCC extension on  structure
> > > > containing a flexible array member not at the end of another
> > > > structure. In order to achieve this goal, we provided the warning option
> > > > -Wflex-array-member-not-at-end for the users to locate all such
> > > > cases in their source code and update the source code to eliminate
> > > > such cases.
> > >
> > > What is the benefit of deprecating this GCC extension? If GCC
> > > extensions are removed, then it would be nice to enable the associated
> > > warnings by default.
> 
> The goal of all of the recent array bounds and flexible array work is to
> make sizing information unambiguous (e.g. via __builtin_object_size(),
> __builtin_dynamic_object_size(), and the array-bounds sanitizer). For
> the compiler to be able to deterministically report size information
> on arrays, we needed to deprecate this case even though it had been
> supported in the past. (Though we also _added_ extensions to support
> for other things, like flexible arrays in unions, and the coming
> __counted_by attribute.)
> 
> For example:
> 
> struct flex { int length; char data[]; };
> struct mid_flex { int m; struct flex flex_data; int n; int o; };

It might be reasonable to allow a tag-less "anonymous" struct that's
"completed" by means of the static initializer of a declared object, thus

struct { int m; struct { int length; char data[]; } flex_data;
         int n; int o; } object
 = { 3, { 2, { 1, 2 } }, 4, 5 };

The frontend would make the size of data[] static, determined by the
initializer.  I _think_ the C standard makes object.flex_data
inter-operate with struct flex from above but struct mid_flex would
be an invalid type declaration and thus there's no way to have
an API with a pointer to such a structure(?), which makes the extension
somewhat less useful.
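
For comparison, here is a minimal sketch of the layout that remains
supported: struct flex from the quoted example, with the flexible array
member last and its length recorded explicitly. The helper flex_alloc is
hypothetical and only illustrates the idiom; it is not taken from the
thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Same shape as struct flex in the quoted example.  */
struct flex { int length; char data[]; };

/* Hypothetical helper: allocate a struct flex with room for N bytes of
   data and store N in the length field, so the size of the trailing
   array is always recoverable from the object itself.  */
static struct flex *
flex_alloc (int n)
{
  assert (n >= 0);
  struct flex *f = malloc (sizeof *f + (size_t) n);
  if (f)
    {
      f->length = n;
      memset (f->data, 0, (size_t) n);
    }
  return f;
}
```

The length field carries exactly the information that the quoted
counted_by attribute is meant to expose to the compiler.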

> 
> #define SZ(p) __builtin_dynamic_object_size(p, 1)
> 
> void foo(struct flex *f, struct mid_flex *m)
> {
>   printf("%zu\n", SZ(f));
>   printf("%zu\n", SZ(&m->flex_data));
> }
> 
> int main(void)
> {
>   struct mid_flex m = { .flex_data.length = 8 };
>   foo(&m.flex_data, &m);
>   return 0;
> }
> 
> This is printing the size of the same object. But the desired results
> are ambiguous. Does m->flex_data have an unknown size (i.e. SIZE_MAX)
> because it's a flex array, or does it contain 8 bytes, since it overlaps
> with the other structure's trailing 2 ints?
> 
> The answer from GCC 13 was neither:
> 
> 18446744073709551615
> 4
> 
> It considered flex_data to be only the size of its non-flex-array
> members, but only when there was semantic context that it was part of
> another structure. (Yet more ambiguity.)
> 
> In GCC 14, this is "resolved" to be unknown since it is a flex array
> which has no sizing info, and context doesn't matter:
> 
> 18446744073709551615
> 18446744073709551615
> 
> But this paves the way for the coming 'counted_by' attribute which will
> allow for struct flex above to be defined as:
> 
> struct flex { int length; char data[] __attribute__((counted_by(length))); };
> 
> At which point GCC can deterministically report the object size.
> 
> Hopefully I've captured this all correctly -- Qing can correct me. :)
> 
> > >
> > > > We had a long discussion before deciding to deprecate this GCC
> > > > extension. Please see details here:
> > > >
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832
> > > >
> > > > Yes, we do plan to enable this warning by default before final
> > > > deprecation.  (We might consider enabling this warning by default in
> > > > GCC 15… and then deprecate it in the next release)
> > > >
> > > > Right now, there is ongoing work in the Linux kernel to get rid of
> > > > all such cases. Kees might have more information on this.
> > > >
> > > >
> > > > The static initialization of structures with flexible array members
> > > > will still work as long as the flexible array members are at the end of
> > > > the structures.
> > >
> > > Removing the support for flexible array members in the middle of
> > > compounds will make the static initialization practically infeasible.
> >
> > If the flexible array member is moved to the end of the compounds,
> > the static initialization still works. What’s the issue here?
> >
> > > > My question: is it possible to update your source code to move
> > > > the structure with flexible array member to the end of the containing
> > > > structure?
> > > >
> > > > i.e, in your example, in the struct Thread_Configured_control,
> > > > move the field “Thread_Control Control” to the end of the structure?
> > >
> > > If we move the Thread_Control to the end, how would I add a
> > > configuration defined number of elements at the end?
> >
> > Don’t 

Re: [PATCH] PR middle-end/111701: signbit(x*x) vs -fsignaling-nans

2024-05-08 Thread Richard Biener
On Tue, May 7, 2024 at 10:44 PM Joseph Myers  wrote:
>
> On Fri, 3 May 2024, Richard Biener wrote:
>
> > So what I do not necessarily agree with is that we need to preserve
> > the multiplication with -fsignaling-nans.  Do we consider a program doing
> >
> > handler() { exit(0); }
> >
> >  x = sNaN;
> > ...
> >  sigaction(SIGFPE, ... handler)
> >  x*x;
> >  format_hard_drive();
> >
> > and expecting the program to exit(0) rather than formatting the hard-disk
> > to be expecting something the C standard guarantees?  And is it enough
> > for the program to enable -fsignaling-nans for this?
> >
> > If so then the first and foremost bug is that 'x*x' doesn't have
> > TREE_SIDE_EFFECTS
> > set and thus we do not preserve it when optimizing __builtin_signbit () of 
> > it.
>
> Signaling NaNs don't seem relevant here.  "Signal" means "set the
> exception flag" - and 0 * Inf raises the same "invalid" exception flag as
> sNaN * sNaN.  Changing flow of control on an exception is outside the
> scope of standard C and requires nonstandard extensions such as
> feenableexcept.  (At present -ftrapping-math covers both kinds of
> exception handling - the default setting of a flag, and the nonstandard
> change of flow of control.)

So it's reasonable to require -fnon-call-exceptions (which now enables
-fexceptions) and -fno-delete-dead-exceptions to have GCC preserve
a change of control flow side-effect of x*x?  We do not preserve
FP exception bits set by otherwise unused operations, that is, we
do not consider that side-effect to be observable even with
-ftrapping-math.  In fact I think most uses of flag_trapping_math
are related to a possible control flow side-effect of FP math.
Exact preservation of FP exception flags will likely have to disable
all FP optimization if one considers FE_INEXACT and FE_UNDERFLOW.

Every time I try to make up my mind how to improve the situation for
the user I'm only confusing myself :/

Richard.

> --
> Joseph S. Myers
> josmy...@redhat.com
>


[PATCH] testsuite, rs6000: Remove powerpcspe test cases and checks

2024-05-08 Thread Kewen.Lin
Hi,

Since the powerpcspe support was removed in r9-4728, this
follow-up patch removes the remaining pieces in the testsuite.

Regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_vect_cmdline_needed): Remove
check_effective_target_powerpc_spe.
(check_effective_target_powerpc_spe_nocache): Remove.
(check_effective_target_powerpc_spe): Remove.
(check_ppc_cpu_supports_hw_available): Remove powerpc*-*-eabispe check.
(check_p8vector_hw_available): Likewise.
(check_p9vector_hw_available): Likewise.
(check_p9modulo_hw_available): Likewise.
(check_ppc_float128_sw_available): Likewise.
(check_ppc_float128_hw_available): Likewise.
(check_vsx_hw_available): Likewise.
(check_vmx_hw_available): Likewise.
(check_ppc_recip_hw_available): Likewise.
(check_dfp_hw_available): Likewise.
(check_htm_hw_available): Likewise.
* g++.dg/ext/spe1.C: Remove.
* g++.dg/other/opaque-1.C: Remove.
* g++.dg/other/opaque-2.C: Remove.
* g++.dg/other/opaque-3.C: Remove.
* g++.target/powerpc/simd-5.C: Remove.
---
 gcc/testsuite/g++.dg/ext/spe1.C   | 10 -
 gcc/testsuite/g++.dg/other/opaque-1.C | 31 --
 gcc/testsuite/g++.dg/other/opaque-2.C | 19 -
 gcc/testsuite/g++.dg/other/opaque-3.C | 12 --
 gcc/testsuite/g++.target/powerpc/simd-5.C | 44 ---
 gcc/testsuite/lib/target-supports.exp | 51 +++
 6 files changed, 5 insertions(+), 162 deletions(-)
 delete mode 100644 gcc/testsuite/g++.dg/ext/spe1.C
 delete mode 100644 gcc/testsuite/g++.dg/other/opaque-1.C
 delete mode 100644 gcc/testsuite/g++.dg/other/opaque-2.C
 delete mode 100644 gcc/testsuite/g++.dg/other/opaque-3.C
 delete mode 100644 gcc/testsuite/g++.target/powerpc/simd-5.C

diff --git a/gcc/testsuite/g++.dg/ext/spe1.C b/gcc/testsuite/g++.dg/ext/spe1.C
deleted file mode 100644
index b98d4b27b3d..000
--- a/gcc/testsuite/g++.dg/ext/spe1.C
+++ /dev/null
@@ -1,10 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-mcpu=8540 -mspe -mabi=spe -mfloat-gprs=single -O0" } */
-/* { dg-skip-if "not an SPE target" { ! powerpc_spe_nocache } } */
-
-typedef int v2si __attribute__ ((vector_size (8)));
-
-/* The two specializations must be considered different.  */
-template  class X { };
-template <>class X<__ev64_opaque__> { };
-template <>class X   { };
diff --git a/gcc/testsuite/g++.dg/other/opaque-1.C 
b/gcc/testsuite/g++.dg/other/opaque-1.C
deleted file mode 100644
index 669776b9f97..000
--- a/gcc/testsuite/g++.dg/other/opaque-1.C
+++ /dev/null
@@ -1,31 +0,0 @@
-/* { dg-do run } */
-/* { dg-options "-mcpu=8540 -mspe -mabi=spe -mfloat-gprs=single" } */
-/* { dg-skip-if "not an SPE target" { ! powerpc_spe_nocache } } */
-
-#define __vector __attribute__((vector_size(8)))
-typedef float __vector __ev64_fs__;
-
-__ev64_fs__ f;
-__ev64_opaque__ o;
-
-int here = 0;
-
-void bar (__ev64_opaque__ x)
-{
-  here = 0;
-}
-
-void bar (__ev64_fs__ x)
-{
-  here = 888;
-}
-
-int main ()
-{
-  f = o;
-  o = f;
-  bar (f);
-  if (here != 888)
-return 1;
-  return 0;
-}
diff --git a/gcc/testsuite/g++.dg/other/opaque-2.C 
b/gcc/testsuite/g++.dg/other/opaque-2.C
deleted file mode 100644
index 414f87e6c9a..000
--- a/gcc/testsuite/g++.dg/other/opaque-2.C
+++ /dev/null
@@ -1,19 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-mcpu=8540 -mspe -mabi=spe -mfloat-gprs=single" } */
-/* { dg-skip-if "not an SPE target" { ! powerpc_spe_nocache } } */
-
-#define __vector __attribute__((vector_size(8)))
-typedef float __vector __ev64_fs__;
-
-__ev64_fs__ f;
-__ev64_opaque__ o;
-
-extern void bar (__ev64_opaque__);
-
-int main ()
-{
-  f = o;
-  o = f;
-  bar (f);
-  return 0;
-}
diff --git a/gcc/testsuite/g++.dg/other/opaque-3.C 
b/gcc/testsuite/g++.dg/other/opaque-3.C
deleted file mode 100644
index f915f840510..000
--- a/gcc/testsuite/g++.dg/other/opaque-3.C
+++ /dev/null
@@ -1,12 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-mcpu=8540 -mspe -mabi=spe -mfloat-gprs=single" } */
-/* { dg-skip-if "not an SPE target" { ! powerpc_spe_nocache } } */
-
-__ev64_opaque__ o;
-#define v __attribute__((vector_size(8)))
-v unsigned int *p;
-
-void m()
-{
-  o = __builtin_spe_evldd(p, 5);
-}
diff --git a/gcc/testsuite/g++.target/powerpc/simd-5.C 
b/gcc/testsuite/g++.target/powerpc/simd-5.C
deleted file mode 100644
index 71e117ead2a..000
--- a/gcc/testsuite/g++.target/powerpc/simd-5.C
+++ /dev/null
@@ -1,44 +0,0 @@
-// Test EH with V2SI SIMD registers actually restores correct values.
-// Origin: Joseph Myers 
-// { dg-options "-O" }
-// { dg-do run { target { powerpc_spe && { ! *-*-vxworks* } } } }
-
-extern "C" void abort (void);
-extern 

[PATCH] rs6000: Enable overlapped by-pieces operations

2024-05-08 Thread HAO CHEN GUI
Hi,
  This patch enables overlapped by-piece operations. On rs6000, the default
move/set/clear ratio is 2, so the overlap is only enabled for compare
by-pieces.
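
To illustrate what an overlapped by-pieces compare means for the 7-byte
case in the new test, here is a rough C sketch. The helper
equal7_overlapped is hypothetical; it models the idea in portable C and
is not the code GCC generates.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: test 7 bytes for equality with two overlapping
   4-byte loads per buffer, at offsets 0 and 3.  Byte 3 is covered
   twice, which is harmless for an equality test.  Only the equal /
   not-equal answer is meaningful; an ordering result like memcmp's
   would need endian-aware handling.  */
static inline int
equal7_overlapped (const void *s1, const void *s2)
{
  uint32_t a0, a1, b0, b1;
  assert (s1 && s2);
  memcpy (&a0, s1, 4);
  memcpy (&a1, (const char *) s1 + 3, 4);
  memcpy (&b0, s2, 4);
  memcpy (&b1, (const char *) s2 + 3, 4);
  return ((a0 ^ b0) | (a1 ^ b1)) == 0;
}
```

Without overlap, 7 bytes would be split into 4 + 2 + 1, which is where the
halfword and byte loads that the test scans for would come from.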

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Enable overlapped by-pieces operations

This patch enables overlapped by-piece operations by defining
TARGET_OVERLAP_OP_BY_PIECES_P to true.  On rs6000, the default move/set/clear
ratio is 2, so the overlap is only enabled for compare by-pieces.

gcc/
* config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define.

gcc/testsuite/
* gcc.target/powerpc/block-cmp-9.c: New.


patch.diff
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 6b9a40fcc66..2b5f5cf1d86 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1774,6 +1774,9 @@ static const scoped_attribute_specs *const 
rs6000_attribute_table[] =
 #undef TARGET_CONST_ANCHOR
 #define TARGET_CONST_ANCHOR 0x8000

+#undef TARGET_OVERLAP_OP_BY_PIECES_P
+#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
+
 

 /* Processor table.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c 
b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
new file mode 100644
index 000..b5f51affbb7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
+/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */
+
+/* Test if by-piece overlap compare is enabled and following case is
+   implemented by two overlap word loads and compares.  */
+
+int foo (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 7) == 0;
+}


Re: [PATCH] x86:Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-08 Thread Uros Bizjak
On Wed, May 8, 2024 at 4:44 AM Levy Hsu  wrote:
>
> PR target/107563
>
> gcc/ChangeLog:
>
> * config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New
> subroutine.
> (ix86_expand_vec_perm_const_1): New Entry.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/i386/pr107563.C: New test.
> ---
>  gcc/config/i386/i386-expand.cc   | 64 
>  gcc/testsuite/g++.target/i386/pr107563.C | 23 +
>  2 files changed, 87 insertions(+)
>  create mode 100755 gcc/testsuite/g++.target/i386/pr107563.C
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index 2f27bfb484c..2718b0acb87 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -22362,6 +22362,67 @@ expand_vec_perm_2perm_pblendv (struct 
> expand_vec_perm_d *d, bool two_insn)
>return true;
>  }
>
> +/* A subroutine of ix86_expand_vec_perm_const_1.
> +   Implement a permutation with psrlw, psllw and por.
> +   It handles case:
> +   __builtin_shufflevector (v,v,1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14);
> +   __builtin_shufflevector (v,v,1,0,3,2,5,4,7,6); */
> +
> +static bool
> +expand_vec_perm_psrlw_psllw_por (struct expand_vec_perm_d *d)
> +{
> +  unsigned i;
> +  rtx (*gen_shr) (rtx, rtx, rtx);
> +  rtx (*gen_shl) (rtx, rtx, rtx);
> +  rtx (*gen_or) (rtx, rtx, rtx);
> +  machine_mode mode = VOIDmode;
> +
> +  if (!TARGET_SSE2 || !d->one_operand_p)
> +return false;
> +
> +  switch (d->vmode)
> +{
> +case E_V8QImode:
> +  if (!TARGET_MMX_WITH_SSE)
> +   return false;
> +  mode = V4HImode;
> +  gen_shr = gen_ashrv4hi3;
> +  gen_shl = gen_ashlv4hi3;
> +  gen_or = gen_iorv4hi3;
> +  break;
> +case E_V16QImode:
> +  mode = V8HImode;
> +  gen_shr = gen_vlshrv8hi3;
> +  gen_shl = gen_vashlv8hi3;
> +  gen_or = gen_iorv8hi3;
> +  break;
> +default: return false;
> +}
> +
> +  if (!rtx_equal_p (d->op0, d->op1))
> +return false;
> +
> +  for (i = 0; i < d->nelt; i += 2)
> +if (d->perm[i] != i + 1 || d->perm[i + 1] != i)
> +  return false;
> +
> +  if (d->testing_p)
> +return true;
> +
> +  rtx tmp1 = gen_reg_rtx (mode);
> +  rtx tmp2 = gen_reg_rtx (mode);
> +  rtx op0 = force_reg (d->vmode, d->op0);
> +
> +  emit_move_insn (tmp1, lowpart_subreg (mode, op0, d->vmode));
> +  emit_move_insn (tmp2, lowpart_subreg (mode, op0, d->vmode));
> +  emit_insn (gen_shr (tmp1, tmp1, GEN_INT (8)));
> +  emit_insn (gen_shl (tmp2, tmp2, GEN_INT (8)));
> +  emit_insn (gen_or (tmp1, tmp1, tmp2));
> +  emit_move_insn (d->target, lowpart_subreg (d->vmode, tmp1, mode));
> +
> +  return true;
> +}
> +
>  /* A subroutine of ix86_expand_vec_perm_const_1.  Implement a V4DF
> permutation using two vperm2f128, followed by a vshufpd insn blending
> the two vectors together.  */
> @@ -23781,6 +23842,9 @@ ix86_expand_vec_perm_const_1 (struct 
> expand_vec_perm_d *d)
>
>if (expand_vec_perm_2perm_pblendv (d, false))
>  return true;
> +
> +  if (expand_vec_perm_psrlw_psllw_por (d))
> +return true;
>
>/* Try sequences of four instructions.  */
>
> diff --git a/gcc/testsuite/g++.target/i386/pr107563.C 
> b/gcc/testsuite/g++.target/i386/pr107563.C
> new file mode 100755
> index 000..5b0c648e8f1
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr107563.C
> @@ -0,0 +1,23 @@
> +/* PR target/107563.C */
> +/* { dg-do compile { target { ! ia32 } } } */

Please split the testcase to two files, one (e.g. pr107563-a.C)
testing 8-byte vectors and the other (e.g. pr107563-b.C) using 16-byte
vectors. The latter can also be tested with 32-bit targets.
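
For reference, the permutation handled by the quoted subroutine swaps
adjacent bytes, which is the same as rotating each 16-bit lane by 8 bits.
A scalar C model of that shift-shift-or sequence, with a hypothetical
helper name and a little-endian lane layout assumed, could look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model: swap adjacent bytes of V (N bytes, N even)
   by treating each byte pair as one little-endian 16-bit lane and
   computing (lane >> 8) | (lane << 8).  */
static inline void
swap_adjacent_bytes (uint8_t *v, size_t n)
{
  assert (v && n % 2 == 0);
  for (size_t i = 0; i < n; i += 2)
    {
      uint16_t lane = (uint16_t) (v[i] | (v[i + 1] << 8));
      lane = (uint16_t) ((lane >> 8) | (lane << 8));
      v[i] = (uint8_t) (lane & 0xff);
      v[i + 1] = (uint8_t) (lane >> 8);
    }
}
```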

Uros.

> +/* { dg-options "-std=c++2b -O3 -msse2" } */
> +/* { dg-final { scan-assembler-not "movzbl" } } */
> +/* { dg-final { scan-assembler-not "salq" } } */
> +/* { dg-final { scan-assembler-not "orq" } } */
> +/* { dg-final { scan-assembler-not "punpcklqdq" } } */
> +/* { dg-final { scan-assembler-times "psllw" 2 } } */
> +/* { dg-final { scan-assembler-times "psrlw" 1 } } */
> +/* { dg-final { scan-assembler-times "psraw" 1 } } */
> +/* { dg-final { scan-assembler-times "por" 2 } } */
> +
> +using temp_vec_type [[__gnu__::__vector_size__ (16)]] = char;
> +void foo (temp_vec_type& v) noexcept
> +{
> +  v = __builtin_shufflevector(v,v,1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14);
> +}
> +
> +using temp_vec_type2 [[__gnu__::__vector_size__ (8)]] = char;
> +void foo2 (temp_vec_type2& v) noexcept
> +{
> +  v=__builtin_shufflevector(v,v,1,0,3,2,5,4,7,6);
> +}
> --
> 2.31.1
>


[PATCH] testsuite, rs6000: Remove powerpc_popcntb_ok

2024-05-08 Thread Kewen.Lin
Hi,

There are three uses of the effective target powerpc_popcntb_ok,
all in compile-only tests, but powerpc_popcntb_ok checks for
executable generation, which is too heavy.  This patch
removes powerpc_popcntb_ok and adjusts its three uses
accordingly.

Regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_powerpc_popcntb_ok):
Remove.
* gcc.target/powerpc/cmpb-2.c: Adjust with dg-skip-if as
powerpc_popcntb_ok gets removed.
* gcc.target/powerpc/cmpb-3.c: Likewise.
* gcc.target/powerpc/cmpb32-2.c: Likewise.
---
 gcc/testsuite/gcc.target/powerpc/cmpb-2.c   |  3 ++-
 gcc/testsuite/gcc.target/powerpc/cmpb-3.c   |  3 ++-
 gcc/testsuite/gcc.target/powerpc/cmpb32-2.c |  3 ++-
 gcc/testsuite/lib/target-supports.exp   | 20 
 4 files changed, 6 insertions(+), 23 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/cmpb-2.c 
b/gcc/testsuite/gcc.target/powerpc/cmpb-2.c
index 02b84d0731d..44a554bee4a 100644
--- a/gcc/testsuite/gcc.target/powerpc/cmpb-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/cmpb-2.c
@@ -1,6 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
+/* Skip powerpc*-*-darwin* powerpc-*-eabi as dropped popcntb_ok.  */
+/* { dg-skip-if "" { powerpc*-*-darwin* powerpc-*-eabi } } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-require-effective-target powerpc_popcntb_ok } */
 /* { dg-options "-mdejagnu-cpu=power5" } */

 void abort ();
diff --git a/gcc/testsuite/gcc.target/powerpc/cmpb-3.c 
b/gcc/testsuite/gcc.target/powerpc/cmpb-3.c
index 75641bdb22c..43de37a571d 100644
--- a/gcc/testsuite/gcc.target/powerpc/cmpb-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/cmpb-3.c
@@ -1,6 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
+/* Skip powerpc*-*-darwin* powerpc-*-eabi as dropped popcntb_ok.  */
+/* { dg-skip-if "" { powerpc*-*-darwin* powerpc-*-eabi } } */
 /* { dg-require-effective-target ilp32 } */
-/* { dg-require-effective-target powerpc_popcntb_ok } */
 /* { dg-options "-mdejagnu-cpu=power6" } */

 void abort ();
diff --git a/gcc/testsuite/gcc.target/powerpc/cmpb32-2.c 
b/gcc/testsuite/gcc.target/powerpc/cmpb32-2.c
index d4264ab6e7d..0713c44fcff 100644
--- a/gcc/testsuite/gcc.target/powerpc/cmpb32-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/cmpb32-2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
-/* { dg-require-effective-target powerpc_popcntb_ok } */
+/* Skip powerpc*-*-darwin* powerpc-*-eabi as dropped popcntb_ok.  */
+/* { dg-skip-if "" { powerpc*-*-darwin* powerpc-*-eabi } } */
 /* { dg-options "-mdejagnu-cpu=power5" } */

 void abort ();
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 16dc2766850..5f34f02c387 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3952,26 +3952,6 @@ proc check_effective_target_unsigned_char {} {
 }]
 }

-proc check_effective_target_powerpc_popcntb_ok { } {
-return [check_cached_effective_target powerpc_popcntb_ok {
-
-   # Disable on Darwin.
-   if { [istarget powerpc-*-eabi] || [istarget powerpc*-*-eabispe] || 
[istarget *-*-darwin*]} {
-   expr 0
-   } else {
-   check_runtime_nocache powerpc_popcntb_ok {
-   volatile int r;
-   volatile int a = 0x12345678;
-   int main()
-   {
-   asm volatile ("popcntb %0,%1" : "=r" (r) : "r" (a));
-   return 0;
-   }
-   } "-mcpu=power5"
-   }
-}]
-}
-
 # Return 1 if the target supports executing DFP hardware instructions,
 # 0 otherwise.  Cache the result.

--
2.39.1


[PATCH 1/2] testsuite, rs6000: Make powerpc_vsx consider current_compiler_flags [PR114842]

2024-05-08 Thread Kewen.Lin
Hi,

As noted in PR114842, most of the test cases that require the
effective target check powerpc_vsx_ok actually care about
whether the VSX feature is enabled, so they should adopt the
effective target powerpc_vsx instead.  Since we already have
a number of test cases with an explicit -mvsx in dg-options
etc., this patch makes powerpc_vsx consider
current_compiler_flags so that they are still tested as before
even when VSX is not enabled by default.

Regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

PR testsuite/114842

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_powerpc_vsx): Take
current_compiler_flags into account.
---
 gcc/testsuite/lib/target-supports.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 969456281c7..713898d5554 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -7144,7 +7144,7 @@ proc check_effective_target_powerpc_vsx { } {
  nope no vsx
#endif
}
-}]
+} [current_compiler_flags]]
 }

 # Return 1 if this is a PowerPC target supporting -mvsx
--
2.39.1


[PATCH] testsuite, rs6000: Remove effective target powerpc_405_nocache

2024-05-08 Thread Kewen.Lin
Hi,

With the introduction of -mdejagnu-cpu=, a test case that
specifies -mdejagnu-cpu=405 overrides any other -mcpu= that
may be given, so it is guaranteed to compile for PowerPC 405.
This patch removes the effective target powerpc_405_nocache
and updates all its uses.

Regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/405-dlmzb-strlen-1.c: Remove the line using
powerpc_405_nocache check.
* gcc.target/powerpc/405-macchw-1.c: Likewise.
* gcc.target/powerpc/405-macchw-2.c: Likewise.
* gcc.target/powerpc/405-macchwu-1.c: Likewise.
* gcc.target/powerpc/405-macchwu-2.c: Likewise.
* gcc.target/powerpc/405-machhw-1.c: Likewise.
* gcc.target/powerpc/405-machhw-2.c: Likewise.
* gcc.target/powerpc/405-machhwu-1.c: Likewise.
* gcc.target/powerpc/405-machhwu-2.c: Likewise.
* gcc.target/powerpc/405-maclhw-1.c: Likewise.
* gcc.target/powerpc/405-maclhw-2.c: Likewise.
* gcc.target/powerpc/405-maclhwu-1.c: Likewise.
* gcc.target/powerpc/405-maclhwu-2.c: Likewise.
* gcc.target/powerpc/405-mulchw-1.c: Likewise.
* gcc.target/powerpc/405-mulchw-2.c: Likewise.
* gcc.target/powerpc/405-mulchwu-1.c: Likewise.
* gcc.target/powerpc/405-mulchwu-2.c: Likewise.
* gcc.target/powerpc/405-mulhhw-1.c: Likewise.
* gcc.target/powerpc/405-mulhhw-2.c: Likewise.
* gcc.target/powerpc/405-mulhhwu-1.c: Likewise.
* gcc.target/powerpc/405-mulhhwu-2.c: Likewise.
* gcc.target/powerpc/405-mullhw-1.c: Likewise.
* gcc.target/powerpc/405-mullhw-2.c: Likewise.
* gcc.target/powerpc/405-mullhwu-1.c: Likewise.
* gcc.target/powerpc/405-mullhwu-2.c: Likewise.
* gcc.target/powerpc/405-nmacchw-1.c: Likewise.
* gcc.target/powerpc/405-nmacchw-2.c: Likewise.
* gcc.target/powerpc/405-nmachhw-1.c: Likewise.
* gcc.target/powerpc/405-nmachhw-2.c: Likewise.
* gcc.target/powerpc/405-nmaclhw-1.c: Likewise.
* gcc.target/powerpc/405-nmaclhw-2.c: Likewise.
* lib/target-supports.exp
(check_effective_target_powerpc_405_nocache): Remove.
---
 .../gcc.target/powerpc/405-dlmzb-strlen-1.c |  1 -
 gcc/testsuite/gcc.target/powerpc/405-macchw-1.c |  6 +-
 gcc/testsuite/gcc.target/powerpc/405-macchw-2.c |  1 -
 .../gcc.target/powerpc/405-macchwu-1.c  |  1 -
 .../gcc.target/powerpc/405-macchwu-2.c  |  1 -
 gcc/testsuite/gcc.target/powerpc/405-machhw-1.c |  1 -
 gcc/testsuite/gcc.target/powerpc/405-machhw-2.c |  1 -
 .../gcc.target/powerpc/405-machhwu-1.c  |  1 -
 .../gcc.target/powerpc/405-machhwu-2.c  |  1 -
 gcc/testsuite/gcc.target/powerpc/405-maclhw-1.c |  1 -
 gcc/testsuite/gcc.target/powerpc/405-maclhw-2.c |  1 -
 .../gcc.target/powerpc/405-maclhwu-1.c  |  1 -
 .../gcc.target/powerpc/405-maclhwu-2.c  |  1 -
 gcc/testsuite/gcc.target/powerpc/405-mulchw-1.c |  1 -
 gcc/testsuite/gcc.target/powerpc/405-mulchw-2.c |  1 -
 .../gcc.target/powerpc/405-mulchwu-1.c  |  1 -
 .../gcc.target/powerpc/405-mulchwu-2.c  |  1 -
 gcc/testsuite/gcc.target/powerpc/405-mulhhw-1.c |  1 -
 gcc/testsuite/gcc.target/powerpc/405-mulhhw-2.c |  1 -
 .../gcc.target/powerpc/405-mulhhwu-1.c  |  1 -
 .../gcc.target/powerpc/405-mulhhwu-2.c  |  1 -
 gcc/testsuite/gcc.target/powerpc/405-mullhw-1.c |  1 -
 gcc/testsuite/gcc.target/powerpc/405-mullhw-2.c |  1 -
 .../gcc.target/powerpc/405-mullhwu-1.c  |  1 -
 .../gcc.target/powerpc/405-mullhwu-2.c  |  1 -
 .../gcc.target/powerpc/405-nmacchw-1.c  |  1 -
 .../gcc.target/powerpc/405-nmacchw-2.c  |  1 -
 .../gcc.target/powerpc/405-nmachhw-1.c  |  1 -
 .../gcc.target/powerpc/405-nmachhw-2.c  |  1 -
 .../gcc.target/powerpc/405-nmaclhw-1.c  |  1 -
 .../gcc.target/powerpc/405-nmaclhw-2.c  |  1 -
 gcc/testsuite/lib/target-supports.exp   | 17 -
 32 files changed, 5 insertions(+), 48 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/405-dlmzb-strlen-1.c b/gcc/testsuite/gcc.target/powerpc/405-dlmzb-strlen-1.c
index 5ee427a3b4a..984ffe7144c 100644
--- a/gcc/testsuite/gcc.target/powerpc/405-dlmzb-strlen-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/405-dlmzb-strlen-1.c
@@ -4,7 +4,6 @@
 /* { dg-skip-if "" { powerpc*-*-aix* } } */
 /* { dg-require-effective-target ilp32 } */
 /* { dg-options "-O2 -mdejagnu-cpu=405" } */
-/* { dg-skip-if "other options override -mcpu=405" { ! powerpc_405_nocache } } */

 /* { dg-final { scan-assembler "dlmzb\\. " } } */

diff --git a/gcc/testsuite/gcc.target/powerpc/405-macchw-1.c b/gcc/testsuite/gcc.target/powerpc/405-macchw-1.c
index 2253a9c9deb..10ea9cc10f8 100644
--- a/gcc/testsuite/gcc.target/powerpc/405-macchw-1.c
+++ 

[PATCH] libgcc, rs6000: Remove powerpcspe related code

2024-05-08 Thread Kewen.Lin
Hi,

Since powerpcspe support was removed in r9-4728, this
follow-up patch removes the remaining pieces in libgcc.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

libgcc/ChangeLog:

* config.host: Remove powerpc-*-eabispe* support.
* config/rs6000/linux-unwind.h (ppc_fallback_frame_state): Remove
__SPE__ code.
* config/rs6000/t-savresfgpr (LIB2ADD_ST): Remove e500crtres32gpr.S,
e500crtsavg64gpr.S, e500crtsav64gpr.S, e500crtsav64gprctr.S,
e500crtres64gpr.S, e500crtsav32gpr.S, e500crtsavg32gpr.S,
e500crtres64gprctr.S, e500crtsavg64gprctr.S, e500crtresx32gpr.S,
e500crtrest32gpr.S, e500crtrest64gpr.S and e500crtresx64gpr.S.
* config/rs6000/e500crtres32gpr.S: Remove.
* config/rs6000/e500crtres64gpr.S: Remove.
* config/rs6000/e500crtres64gprctr.S: Remove.
* config/rs6000/e500crtrest32gpr.S: Remove.
* config/rs6000/e500crtrest64gpr.S: Remove.
* config/rs6000/e500crtresx32gpr.S: Remove.
* config/rs6000/e500crtresx64gpr.S: Remove.
* config/rs6000/e500crtsav32gpr.S: Remove.
* config/rs6000/e500crtsav64gpr.S: Remove.
* config/rs6000/e500crtsav64gprctr.S: Remove.
* config/rs6000/e500crtsavg32gpr.S: Remove.
* config/rs6000/e500crtsavg64gpr.S: Remove.
* config/rs6000/e500crtsavg64gprctr.S: Remove.
---
 libgcc/config.host |  4 -
 libgcc/config/rs6000/e500crtres32gpr.S | 73 -
 libgcc/config/rs6000/e500crtres64gpr.S | 73 -
 libgcc/config/rs6000/e500crtres64gprctr.S  | 90 -
 libgcc/config/rs6000/e500crtrest32gpr.S| 75 --
 libgcc/config/rs6000/e500crtrest64gpr.S| 74 --
 libgcc/config/rs6000/e500crtresx32gpr.S| 75 --
 libgcc/config/rs6000/e500crtresx64gpr.S| 75 --
 libgcc/config/rs6000/e500crtsav32gpr.S | 73 -
 libgcc/config/rs6000/e500crtsav64gpr.S | 72 -
 libgcc/config/rs6000/e500crtsav64gprctr.S  | 91 --
 libgcc/config/rs6000/e500crtsavg32gpr.S| 73 -
 libgcc/config/rs6000/e500crtsavg64gpr.S| 73 -
 libgcc/config/rs6000/e500crtsavg64gprctr.S | 90 -
 libgcc/config/rs6000/linux-unwind.h| 11 ---
 libgcc/config/rs6000/t-savresfgpr  | 15 +---
 16 files changed, 1 insertion(+), 1036 deletions(-)
 delete mode 100644 libgcc/config/rs6000/e500crtres32gpr.S
 delete mode 100644 libgcc/config/rs6000/e500crtres64gpr.S
 delete mode 100644 libgcc/config/rs6000/e500crtres64gprctr.S
 delete mode 100644 libgcc/config/rs6000/e500crtrest32gpr.S
 delete mode 100644 libgcc/config/rs6000/e500crtrest64gpr.S
 delete mode 100644 libgcc/config/rs6000/e500crtresx32gpr.S
 delete mode 100644 libgcc/config/rs6000/e500crtresx64gpr.S
 delete mode 100644 libgcc/config/rs6000/e500crtsav32gpr.S
 delete mode 100644 libgcc/config/rs6000/e500crtsav64gpr.S
 delete mode 100644 libgcc/config/rs6000/e500crtsav64gprctr.S
 delete mode 100644 libgcc/config/rs6000/e500crtsavg32gpr.S
 delete mode 100644 libgcc/config/rs6000/e500crtsavg64gpr.S
 delete mode 100644 libgcc/config/rs6000/e500crtsavg64gprctr.S

diff --git a/libgcc/config.host b/libgcc/config.host
index e75a7af647f..fe958caa040 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1236,10 +1236,6 @@ powerpc*-*-freebsd*)
 powerpc-*-netbsd*)
tmake_file="$tmake_file rs6000/t-netbsd rs6000/t-crtstuff"
;;
-powerpc-*-eabispe*)
-   tmake_file="${tmake_file} rs6000/t-ppccomm rs6000/t-savresfgpr rs6000/t-crtstuff t-crtstuff-pic t-fdpbit"
-   extra_parts="$extra_parts crtbegin.o crtend.o crtbeginS.o crtendS.o crtbeginT.o ecrti.o ecrtn.o ncrti.o ncrtn.o"
-   ;;
 powerpc-*-eabisimaltivec*)
tmake_file="${tmake_file} rs6000/t-ppccomm rs6000/t-crtstuff t-crtstuff-pic t-fdpbit"
extra_parts="$extra_parts crtbegin.o crtend.o crtbeginS.o crtendS.o crtbeginT.o ecrti.o ecrtn.o ncrti.o ncrtn.o"
diff --git a/libgcc/config/rs6000/e500crtres32gpr.S b/libgcc/config/rs6000/e500crtres32gpr.S
deleted file mode 100644
index b19703073ca..000
--- a/libgcc/config/rs6000/e500crtres32gpr.S
+++ /dev/null
@@ -1,73 +0,0 @@
-/*
- * Special support for e500 eabi and SVR4
- *
- *   Copyright (C) 2008-2024 Free Software Foundation, Inc.
- *   Written by Nathan Froyd
- *
- * This file is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the
- * Free Software Foundation; either version 3, or (at your option) any
- * later version.
- *
- * This file is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * 

[PATCH] rs6000: Add assert !TARGET_VSX if !TARGET_ALTIVEC and strip a useless check

2024-05-08 Thread Kewen.Lin
Hi,

In function rs6000_option_override_internal, we have the
checks and adjustments like:

  if (TARGET_P8_VECTOR && !TARGET_ALTIVEC)
rs6000_isa_flags &= ~OPTION_MASK_P8_VECTOR;

  if (TARGET_P8_VECTOR && !TARGET_VSX)
rs6000_isa_flags &= ~OPTION_MASK_P8_VECTOR;

But in fact earlier code already guarantees !TARGET_VSX if
!TARGET_ALTIVEC, so the former check and adjustment are
redundant.  This patch removes them and adds an explicit
assertion instead.
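
To make the reasoning concrete, here is a small stand-alone C sketch
(not GCC code) that models the three flags as bits and checks that the
old two-step adjustment and the new assert-plus-one-step adjustment
give the same result on every flag combination allowed by the
invariant.  The MASK_* values and the function names are hypothetical;
the real OPTION_MASK_* definitions differ.

```c
#include <assert.h>

/* Hypothetical flag bits; the real OPTION_MASK_* values differ.  */
enum
{
  MASK_ALTIVEC = 1 << 0,
  MASK_VSX = 1 << 1,
  MASK_P8_VECTOR = 1 << 2
};

/* Old sequence: clear P8_VECTOR if AltiVec is off, then if VSX is off.  */
unsigned
adjust_old (unsigned flags)
{
  if ((flags & MASK_P8_VECTOR) && !(flags & MASK_ALTIVEC))
    flags &= ~MASK_P8_VECTOR;
  if ((flags & MASK_P8_VECTOR) && !(flags & MASK_VSX))
    flags &= ~MASK_P8_VECTOR;
  return flags;
}

/* New sequence: assert the invariant, keep only the VSX-based check.  */
unsigned
adjust_new (unsigned flags)
{
  assert ((flags & MASK_ALTIVEC) || !(flags & MASK_VSX));
  if ((flags & MASK_P8_VECTOR) && !(flags & MASK_VSX))
    flags &= ~MASK_P8_VECTOR;
  return flags;
}

/* Return 1 if both sequences agree on every flag combination that
   satisfies the invariant (!ALTIVEC implies !VSX), 0 otherwise.  */
int
sequences_agree (void)
{
  for (unsigned flags = 0; flags < 8; flags++)
    {
      if (!(flags & MASK_ALTIVEC) && (flags & MASK_VSX))
	continue;  /* Excluded by the invariant.  */
      if (adjust_old (flags) != adjust_new (flags))
	return 0;
    }
  return 1;
}
```

Whenever AltiVec is off the invariant forces VSX off as well, so the
VSX-based check alone already clears the P8_VECTOR bit in that case.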

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_option_override_internal): Remove
useless check on TARGET_P8_VECTOR && !TARGET_ALTIVEC and add an
assertion on !TARGET_VSX if !TARGET_ALTIVEC.
---
 gcc/config/rs6000/rs6000.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 117999613d8..b5553e27aa3 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -3937,8 +3937,9 @@ rs6000_option_override_internal (bool global_init_p)
   rs6000_isa_flags &= ~OPTION_MASK_FPRND;
 }

-  if (TARGET_P8_VECTOR && !TARGET_ALTIVEC)
-rs6000_isa_flags &= ~OPTION_MASK_P8_VECTOR;
+  /* Assert !TARGET_VSX if !TARGET_ALTIVEC and make some adjustments
+ based on either !TARGET_VSX or !TARGET_ALTIVEC concise.  */
+  gcc_assert (TARGET_ALTIVEC || !TARGET_VSX);

   if (TARGET_P8_VECTOR && !TARGET_VSX)
 rs6000_isa_flags &= ~OPTION_MASK_P8_VECTOR;
--
2.39.1


[PATCH] testsuite, rs6000: Remove all linux*paired* checks and cases

2024-05-08 Thread Kewen.Lin
Hi,

Support for paired single was dropped in
r9-115-g559289370f76bf, but we still have some test checks
and cases for it; this patch gets rid of them.

Regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_vect_int): Remove
the check on powerpc-*-linux*paired*.
(check_effective_target_vect_intfloat_cvt): Likewise.
(check_effective_target_vect_uintfloat_cvt): Likewise.
(check_effective_target_vect_floatint_cvt): Likewise.
(check_effective_target_vect_floatuint_cvt): Likewise.
(check_effective_target_powerpc_altivec_ok): Likewise.
(check_effective_target_powerpc_p9modulo_ok): Likewise.
(check_effective_target_powerpc_float128_sw_ok): Likewise.
(check_effective_target_powerpc_float128_hw_ok): Likewise.
(check_effective_target_powerpc_vsx_ok): Likewise.
(check_effective_target_powerpc_htm_ok): Likewise.
(check_effective_target_vect_shift): Likewise.
(check_effective_target_vect_char_add): Likewise.
(check_effective_target_vect_shift_char): Likewise.
(check_effective_target_vect_long): Likewise.
(check_effective_target_ifn_copysign): Likewise.
(check_effective_target_vect_sdot_hi): Likewise.
(check_effective_target_vect_udot_hi): Likewise.
(check_effective_target_vect_pack_trunc): Likewise.
(check_effective_target_vect_int_mult): Likewise.
* gcc.target/powerpc/paired-1.c: Remove.
* gcc.target/powerpc/paired-10.c: Remove.
* gcc.target/powerpc/paired-2.c: Remove.
* gcc.target/powerpc/paired-3.c: Remove.
* gcc.target/powerpc/paired-4.c: Remove.
* gcc.target/powerpc/paired-5.c: Remove.
* gcc.target/powerpc/paired-6.c: Remove.
* gcc.target/powerpc/paired-7.c: Remove.
* gcc.target/powerpc/paired-8.c: Remove.
* gcc.target/powerpc/paired-9.c: Remove.
* gcc.target/powerpc/ppc-paired.c: Remove.
---
 gcc/testsuite/gcc.target/powerpc/paired-1.c   | 33 ---
 gcc/testsuite/gcc.target/powerpc/paired-10.c  | 25 
 gcc/testsuite/gcc.target/powerpc/paired-2.c   | 35 ---
 gcc/testsuite/gcc.target/powerpc/paired-3.c   | 34 ---
 gcc/testsuite/gcc.target/powerpc/paired-4.c   | 34 ---
 gcc/testsuite/gcc.target/powerpc/paired-5.c   | 34 ---
 gcc/testsuite/gcc.target/powerpc/paired-6.c   | 34 ---
 gcc/testsuite/gcc.target/powerpc/paired-7.c   | 34 ---
 gcc/testsuite/gcc.target/powerpc/paired-8.c   | 25 
 gcc/testsuite/gcc.target/powerpc/paired-9.c   | 25 
 gcc/testsuite/gcc.target/powerpc/ppc-paired.c | 45 --
 gcc/testsuite/lib/target-supports.exp | 59 +++
 12 files changed, 20 insertions(+), 397 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/paired-1.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/paired-10.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/paired-2.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/paired-3.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/paired-4.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/paired-5.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/paired-6.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/paired-7.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/paired-8.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/paired-9.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/ppc-paired.c

diff --git a/gcc/testsuite/gcc.target/powerpc/paired-1.c b/gcc/testsuite/gcc.target/powerpc/paired-1.c
deleted file mode 100644
index 19a66a15b30..000
--- a/gcc/testsuite/gcc.target/powerpc/paired-1.c
+++ /dev/null
@@ -1,33 +0,0 @@
-/* { dg-do compile { target { powerpc-*-linux*paired* && ilp32} } } */
-/* { dg-options "-mpaired -ffinite-math-only " } */
-
-/* Test PowerPC PAIRED extensions.  */
-
-#include 
-
-static float in1[2] __attribute__ ((aligned (8))) =
-{6.0, 7.0};
-static float in2[2] __attribute__ ((aligned (8))) =
-{4.0, 3.0};
-
-static float out[2] __attribute__ ((aligned (8)));
-
-vector float a, b, c, d;
-void
-test_api ()
-{
-  b = paired_lx (0, in1);
-  c = paired_lx (0, in2);
-
-  a = paired_sub (b, c);
-
-  paired_stx (a, 0, out);
-}
-
-int
-main ()
-{
-  test_api ();
-  return (0);
-}
-
diff --git a/gcc/testsuite/gcc.target/powerpc/paired-10.c b/gcc/testsuite/gcc.target/powerpc/paired-10.c
deleted file mode 100644
index 1f904c25841..000
--- a/gcc/testsuite/gcc.target/powerpc/paired-10.c
+++ /dev/null
@@ -1,25 +0,0 @@
-/* { dg-do compile { target { powerpc-*-linux*paired* && ilp32 } } } */
-/* { dg-options "-mpaired -ffinite-math-only " } */
-
-/* Test PowerPC PAIRED extensions.  */
-
-#include 
-
-static float out[2] __attribute__ ((aligned (8)));
-void

[PATCH] testsuite, rs6000: Remove some checks with aix[456]

2024-05-08 Thread Kewen.Lin
Hi,

Since aix6 support was dropped in r12-75-g0745b6fa66c69c,
we no longer need to check for aix[456].* when testing;
this patch removes such checks.

Regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_powerpc_altivec_ok): Remove checks for
aix[456].*
(check_effective_target_powerpc_p9modulo_ok): Likewise.
(check_effective_target_powerpc_float128_sw_ok): Likewise.
(check_effective_target_powerpc_float128_hw_ok): Likewise.
(check_effective_target_powerpc_vsx_ok): Likewise.
---
 gcc/testsuite/lib/target-supports.exp | 29 ---
 1 file changed, 29 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 3a55b2a4159..16dc2766850 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6963,11 +6963,6 @@ proc check_effective_target_powerpc_altivec_ok { } {
 # Paired Single, then not ok
 if { [istarget powerpc-*-linux*paired*] } { return 0 }

-# AltiVec is not supported on AIX before 5.3.
-if { [istarget powerpc*-*-aix4*]
-|| [istarget powerpc*-*-aix5.1*]
-|| [istarget powerpc*-*-aix5.2*] } { return 0 }
-
 # Return true iff compiling with -maltivec does not error.
 return [check_no_compiler_messages powerpc_altivec_ok object {
int dummy;
@@ -6980,12 +6975,6 @@ proc check_effective_target_powerpc_p9modulo_ok { } {
 if { ([istarget powerpc*-*-*]
  && ![istarget powerpc-*-linux*paired*])
 || [istarget rs6000-*-*] } {
-   # AltiVec is not supported on AIX before 5.3.
-   if { [istarget powerpc*-*-aix4*]
-|| [istarget powerpc*-*-aix5.1*]
-|| [istarget powerpc*-*-aix5.2*] } {
-   return 0
-   }
return [check_no_compiler_messages powerpc_p9modulo_ok object {
int main (void) {
int i = 5, j = 3, r = -1;
@@ -7116,12 +7105,6 @@ proc check_effective_target_powerpc_float128_sw_ok { } {
 if { ([istarget powerpc*-*-*]
  && ![istarget powerpc-*-linux*paired*])
 || [istarget rs6000-*-*] } {
-   # AltiVec is not supported on AIX before 5.3.
-   if { [istarget powerpc*-*-aix4*]
-|| [istarget powerpc*-*-aix5.1*]
-|| [istarget powerpc*-*-aix5.2*] } {
-   return 0
-   }
# Darwin doesn't have VSX, so no soft support for float128.
if { [istarget *-*-darwin*] } {
return 0
@@ -7146,12 +7129,6 @@ proc check_effective_target_powerpc_float128_hw_ok { } {
 if { ([istarget powerpc*-*-*]
  && ![istarget powerpc-*-linux*paired*])
 || [istarget rs6000-*-*] } {
-   # AltiVec is not supported on AIX before 5.3.
-   if { [istarget powerpc*-*-aix4*]
-|| [istarget powerpc*-*-aix5.1*]
-|| [istarget powerpc*-*-aix5.2*] } {
-   return 0
-   }
# Darwin doesn't run on any machine with float128 h/w so far.
if { [istarget *-*-darwin*] } {
return 0
@@ -7215,12 +7192,6 @@ proc check_effective_target_powerpc_vsx_ok { } {
 if { ([istarget powerpc*-*-*]
  && ![istarget powerpc-*-linux*paired*])
 || [istarget rs6000-*-*] } {
-   # VSX is not supported on AIX before 7.1.
-   if { [istarget powerpc*-*-aix4*]
-|| [istarget powerpc*-*-aix5*]
-|| [istarget powerpc*-*-aix6*] } {
-   return 0
-   }
# Darwin doesn't have VSX, even if it's used with an assembler
# which recognises the insns.
if { [istarget *-*-darwin*] } {
--
2.39.1


[PATCH] testsuite: Fix typo in torture/vector-{1,2}.c

2024-05-08 Thread Kewen.Lin
Hi,

While preparing some clean-up patches, I found that test
cases vector-{1,2}.c have the typo "powerpc64--*-*" in their
target selector, which should be powerpc64-*-*.  We did not
catch this before because all our testing machines support
VMX insns, so the tests always pass.  They would break on a
test machine without VMX support, so this patch fixes the
typo to ensure robustness.
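
As a quick illustration of why the typo matters, the C sketch below
matches a target triplet against both spellings of the pattern.  It
uses POSIX fnmatch as a stand-in for the glob matching DejaGnu does
on target selectors; that equivalence is an assumption, and
triplet_matches is a hypothetical helper, not testsuite code.

```c
#include <assert.h>
#include <fnmatch.h>

/* Hypothetical helper: return 1 if TRIPLET matches the glob PATTERN,
   0 otherwise.  fnmatch with flags 0 lets '*' match any run of
   characters, including '-'.  */
int
triplet_matches (const char *pattern, const char *triplet)
{
  return fnmatch (pattern, triplet, 0) == 0;
}
```

With the triplet powerpc64-unknown-linux-gnu, the intended pattern
powerpc64-*-* matches, while the mistyped powerpc64--*-* does not,
because it demands two consecutive dashes after "powerpc64".  The
vmx_hw requirement is therefore never applied on such a target.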

Regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

gcc/testsuite/ChangeLog:

* gcc.dg/torture/vector-1.c: Fix typo.
* gcc.dg/torture/vector-2.c: Likewise.
---
 gcc/testsuite/gcc.dg/torture/vector-1.c | 2 +-
 gcc/testsuite/gcc.dg/torture/vector-2.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/torture/vector-1.c b/gcc/testsuite/gcc.dg/torture/vector-1.c
index 205fee6d6de..1b98ee26ff3 100644
--- a/gcc/testsuite/gcc.dg/torture/vector-1.c
+++ b/gcc/testsuite/gcc.dg/torture/vector-1.c
@@ -4,7 +4,7 @@
 /* { dg-options "-msse" { target { i?86-*-* x86_64-*-* } } } */
 /* { dg-require-effective-target sse_runtime { target { i?86-*-* x86_64-*-* } } } */
 /* { dg-options "-mabi=altivec" { target { powerpc-*-* powerpc64-*-* } } } */
-/* { dg-require-effective-target vmx_hw { target { powerpc-*-* powerpc64--*-* } } } */
+/* { dg-require-effective-target vmx_hw { target { powerpc-*-* powerpc64-*-* } } } */

 #define vector __attribute__((vector_size(16) ))

diff --git a/gcc/testsuite/gcc.dg/torture/vector-2.c b/gcc/testsuite/gcc.dg/torture/vector-2.c
index b004d005775..c9a3a44d4df 100644
--- a/gcc/testsuite/gcc.dg/torture/vector-2.c
+++ b/gcc/testsuite/gcc.dg/torture/vector-2.c
@@ -4,7 +4,7 @@
 /* { dg-options "-msse" { target { i?86-*-* x86_64-*-* } } } */
 /* { dg-require-effective-target sse_runtime { target { i?86-*-* x86_64-*-* } } } */
 /* { dg-options "-mabi=altivec" { target { powerpc-*-* powerpc64-*-* } } } */
-/* { dg-require-effective-target vmx_hw { target { powerpc-*-* powerpc64--*-* } } } */
+/* { dg-require-effective-target vmx_hw { target { powerpc-*-* powerpc64-*-* } } } */

 #define vector __attribute__((vector_size(16) ))

--
2.39.1


[PATCH] rs6000: Drop useless vector_{load,store}_ defines

2024-05-08 Thread Kewen.Lin
Hi,

When I was working on a patch to get rid of TFmode, I
noticed that define_expands vector_load_ and
vector_store_ are useless.  This patch is to clean up
both.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

gcc/ChangeLog:

* config/rs6000/vector.md (define_expand vector_load_): Remove.
(vector_store_): Likewise.
---
 gcc/config/rs6000/vector.md | 14 --
 1 file changed, 14 deletions(-)

diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index f9796fb3781..59489e06839 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -163,20 +163,6 @@ (define_expand "mov"
 }
 })

-;; Generic vector floating point load/store instructions.  These will match
-;; insns defined in vsx.md or altivec.md depending on the switches.
-(define_expand "vector_load_"
-  [(set (match_operand:VEC_M 0 "vfloat_operand")
-   (match_operand:VEC_M 1 "memory_operand"))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)"
-  "")
-
-(define_expand "vector_store_"
-  [(set (match_operand:VEC_M 0 "memory_operand")
-   (match_operand:VEC_M 1 "vfloat_operand"))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)"
-  "")
-
 ;; Splits if a GPR register was chosen for the move
 (define_split
   [(set (match_operand:VEC_L 0 "nonimmediate_operand")
--
2.39.1


[PATCH] rs6000: Remove useless entries in rreg

2024-05-08 Thread Kewen.Lin
Hi,

When I was working on a trial patch to get rid of TFmode,
I noticed that mode attribute rreg is only used with mode
iterator SFDF, which means that only the SF and DF key-value
pairs are useful and the others are useless, so this patch
cleans them up.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

gcc/ChangeLog:

* config/rs6000/rs6000.md (mode attribute rreg): Remove useless
entries with modes TF, TD, V4SF and V2DF.
---
 gcc/config/rs6000/rs6000.md | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bc8bc6ab060..4b70b50edca 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -725,11 +725,7 @@ (define_mode_attr ptrm [(SI "m")
(DI "Y")])

 (define_mode_attr rreg [(SF   "f")
-   (DF   "wa")
-   (TF   "f")
-   (TD   "f")
-   (V4SF "wa")
-   (V2DF "wa")])
+   (DF   "wa")])

 (define_mode_attr rreg2 [(SF   "f")
 (DF   "d")])
--
2.39.1


[PATCH] rs6000: Remove useless operands[3]

2024-05-08 Thread Kewen.Lin
Hi,

As shown, three uses of operands[3] are totally useless, so
this patch is to remove them to avoid any confusion.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

gcc/ChangeLog:

* config/rs6000/rs6000.md (@ieee_128bit_vsx_neg2): Remove
the use of operands[3].
(@ieee_128bit_vsx_abs2): Likewise.
(*ieee_128bit_vsx_nabs2): Likewise.
---
 gcc/config/rs6000/rs6000.md | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 4b70b50edca..daae2f81061 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -9256,7 +9256,6 @@ (define_insn_and_split "@ieee_128bit_vsx_neg2"
   if (GET_CODE (operands[2]) == SCRATCH)
 operands[2] = gen_reg_rtx (V16QImode);

-  operands[3] = gen_reg_rtx (V16QImode);
   emit_insn (gen_ieee_128bit_negative_zero (operands[2]));
 }
   [(set_attr "length" "8")
@@ -9285,7 +9284,6 @@ (define_insn_and_split "@ieee_128bit_vsx_abs2"
   if (GET_CODE (operands[2]) == SCRATCH)
 operands[2] = gen_reg_rtx (V16QImode);

-  operands[3] = gen_reg_rtx (V16QImode);
   emit_insn (gen_ieee_128bit_negative_zero (operands[2]));
 }
   [(set_attr "length" "8")
@@ -9317,7 +9315,6 @@ (define_insn_and_split "*ieee_128bit_vsx_nabs2"
   if (GET_CODE (operands[2]) == SCRATCH)
 operands[2] = gen_reg_rtx (V16QImode);

-  operands[3] = gen_reg_rtx (V16QImode);
   emit_insn (gen_ieee_128bit_negative_zero (operands[2]));
 }
   [(set_attr "length" "8")
--
2.39.1


[PATCH] rs6000: Clean up TF and TD check with FLOAT128_2REG_P

2024-05-08 Thread Kewen.Lin
Hi,

Commit r6-2116-g2c83faf86827bf cleaned up the TFmode and
TDmode checks with FLOAT128_2REG_P, but it missed one
assertion; this patch updates it to match.

By the way, I noticed this while working on a patch to get
rid of TFmode.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

gcc/ChangeLog:

* config/rs6000/rs6000-call.cc (rs6000_darwin64_record_arg_recurse):
Clean up TFmode and TDmode check with FLOAT128_2REG_P.
---
 gcc/config/rs6000/rs6000-call.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000-call.cc b/gcc/config/rs6000/rs6000-call.cc
index 1f8f93a2ee7..a039ff75f3c 100644
--- a/gcc/config/rs6000/rs6000-call.cc
+++ b/gcc/config/rs6000/rs6000-call.cc
@@ -1391,7 +1391,7 @@ rs6000_darwin64_record_arg_recurse (CUMULATIVE_ARGS *cum, const_tree type,
if (cum->fregno + n_fpreg > FP_ARG_MAX_REG + 1)
  {
gcc_assert (cum->fregno == FP_ARG_MAX_REG
-   && (mode == TFmode || mode == TDmode));
+   && FLOAT128_2REG_P (mode));
/* Long double or _Decimal128 split over regs and memory.  */
mode = DECIMAL_FLOAT_MODE_P (mode) ? DDmode : DFmode;
cum->use_stack=1;
--
2.39.1


[COMMITTED] Enable prange support.

2024-05-08 Thread Aldy Hernandez
This throws the switch on prange.  After this patch, it is no longer
valid to store a pointer in an irange (or vice versa).  Instead, they
must go in prange, which is faster and more memory efficient.

I will push this now, so I have time to do any follow-up bugfixing
before going on paternity leave.

There are various cleanups we plan on doing after this patch (faster
intersect/union, remove range-op-mixed.h, remove value_range in favor
of int_range_max, reclaim the name for the Value_Range temporary,
clean up range-ops, etc etc).  But we will hold off on those for now
to make it easier to revert this patch, if for some reason we need to
do so while I'm away.

Tested on x86-64 Linux.

gcc/ChangeLog:

* gimple-range-cache.cc (sbr_sparse_bitmap::sbr_sparse_bitmap):
Change irange to prange.
* gimple-range-fold.cc (fold_using_range::fold_stmt): Same.
(fold_using_range::range_of_address): Same.
* gimple-range-fold.h (range_of_address): Same.
* gimple-range-infer.cc (gimple_infer_range::add_nonzero): Same.
* gimple-range-op.cc (class cfn_strlen): Same.
* gimple-range-path.cc
(path_range_query::adjust_for_non_null_uses): Same.
* gimple-ssa-warn-access.cc (pass_waccess::check_pointer_uses): Same.
* tree-ssa-structalias.cc (find_what_p_points_to): Same.
* range-op-ptr.cc (range_op_table::initialize_pointer_ops): Remove
hybrid entries in table.
* range-op.cc (range_op_table::range_op_table): Add pointer
entries for bitwise and/or and min/max.
* value-range.cc (irange::verify_range): Add assert.
* value-range.h (irange::varying_compatible_p): Remove check for
error_mark_node.
(irange::supports_p): Remove pointer support.
* ipa-cp.h (ipa_supports_p): Add prange support.
---
 gcc/gimple-range-cache.cc |  4 ++--
 gcc/gimple-range-fold.cc  |  4 ++--
 gcc/gimple-range-fold.h   |  2 +-
 gcc/gimple-range-infer.cc |  2 +-
 gcc/gimple-range-op.cc|  2 +-
 gcc/gimple-range-path.cc  |  2 +-
 gcc/gimple-ssa-warn-access.cc |  2 +-
 gcc/ipa-cp.h  |  2 +-
 gcc/range-op-ptr.cc   |  4 
 gcc/range-op.cc   | 18 --
 gcc/tree-ssa-structalias.cc   |  2 +-
 gcc/value-range.cc|  1 +
 gcc/value-range.h |  4 ++--
 13 files changed, 18 insertions(+), 31 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 72ac2552311..bdd2832873a 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -274,10 +274,10 @@ sbr_sparse_bitmap::sbr_sparse_bitmap (tree t, vrange_allocator *allocator,
   // Pre-cache zero and non-zero values for pointers.
   if (POINTER_TYPE_P (t))
 {
-  int_range<2> nonzero;
+  prange nonzero;
   nonzero.set_nonzero (t);
   m_range[1] = m_range_allocator->clone (nonzero);
-  int_range<2> zero;
+  prange zero;
   zero.set_zero (t);
   m_range[2] = m_range_allocator->clone (zero);
 }
diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 9c4ad1ee7b9..a9c8c4d03e6 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -597,7 +597,7 @@ fold_using_range::fold_stmt (vrange , gimple *s, fur_source , tree name)
   // Process addresses.
   if (gimple_code (s) == GIMPLE_ASSIGN
   && gimple_assign_rhs_code (s) == ADDR_EXPR)
-return range_of_address (as_a  (r), s, src);
+return range_of_address (as_a  (r), s, src);
 
   gimple_range_op_handler handler (s);
   if (handler)
@@ -757,7 +757,7 @@ fold_using_range::range_of_range_op (vrange ,
 // If a range cannot be calculated, set it to VARYING and return true.
 
 bool
-fold_using_range::range_of_address (irange , gimple *stmt, fur_source )
+fold_using_range::range_of_address (prange , gimple *stmt, fur_source )
 {
   gcc_checking_assert (gimple_code (stmt) == GIMPLE_ASSIGN);
   gcc_checking_assert (gimple_assign_rhs_code (stmt) == ADDR_EXPR);
diff --git a/gcc/gimple-range-fold.h b/gcc/gimple-range-fold.h
index 7cbe15d05e5..c7c599bfc93 100644
--- a/gcc/gimple-range-fold.h
+++ b/gcc/gimple-range-fold.h
@@ -157,7 +157,7 @@ protected:
  fur_source );
   bool range_of_call (vrange , gcall *call, fur_source );
   bool range_of_cond_expr (vrange , gassign* cond, fur_source );
-  bool range_of_address (irange , gimple *s, fur_source );
+  bool range_of_address (prange , gimple *s, fur_source );
   bool range_of_phi (vrange , gphi *phi, fur_source );
   void range_of_ssa_name_with_loop_info (vrange &, tree, class loop *, gphi *,
 fur_source );
diff --git a/gcc/gimple-range-infer.cc b/gcc/gimple-range-infer.cc
index c8e8b9b60ac..d5e1aa14275 100644
--- a/gcc/gimple-range-infer.cc
+++ b/gcc/gimple-range-infer.cc
@@ -123,7 +123,7 @@ gimple_infer_range::add_nonzero (tree name)
 {
   if (!gimple_range_ssa_p (name))
 return;
-  int_range<2> nz;
+