date:20240516

Re: [PATCH] s390: Implement TARGET_NOCE_CONVERSION_PROFITABLE_P [PR109549]

2024-05-16 Thread Andreas Krebbel

On 5/8/24 10:06, Stefan Schulze Frielinghaus wrote:
> Consider a NOCE conversion as profitable if there is at least one
> conditional move.
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390.cc (TARGET_NOCE_CONVERSION_PROFITABLE_P):
>   Define.
>   (s390_noce_conversion_profitable_p): Implement.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/s390/ccor.c: The order of the loads is now reversed; as a
>   consequence the condition has to be reversed.
> ---
>  Bootstrapped and regtested on s390.  Ok for mainline?
> 
>  gcc/config/s390/s390.cc  | 32 
>  gcc/testsuite/gcc.target/s390/ccor.c |  4 ++--
>  2 files changed, 34 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
> index bf46eab2d63..23b18b5c506 100644
> --- a/gcc/config/s390/s390.cc
> +++ b/gcc/config/s390/s390.cc
> @@ -78,6 +78,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-pass.h"
>  #include "context.h"
>  #include "builtins.h"
> +#include "ifcvt.h"
>  #include "rtl-iter.h"
>  #include "intl.h"
>  #include "tm-constrs.h"
> @@ -18037,6 +18038,37 @@ s390_vectorize_vec_perm_const (machine_mode vmode, 
> machine_mode op_mode,
>return vectorize_vec_perm_const_1 (d);
>  }
>  
> +/* Consider a NOCE conversion as profitable if there is at least one
> +   conditional move.  */
> +
> +#undef TARGET_NOCE_CONVERSION_PROFITABLE_P
> +#define TARGET_NOCE_CONVERSION_PROFITABLE_P s390_noce_conversion_profitable_p
We collect these definitions at the very end of s390.cc

> +
> +static bool
> +s390_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info 
> *if_info)
> +{
> +  if (if_info->speed_p)
> +{
> +  for (rtx_insn *insn = seq; insn; insn = NEXT_INSN (insn))
> + {
> +   rtx set = single_set (insn);
> +   if (set == NULL)
> + continue;
> +   if (GET_CODE (SET_SRC (set)) != IF_THEN_ELSE)
> + continue;
> +   rtx src = SET_SRC (set);
> +   machine_mode mode = GET_MODE (src);
> +   if (GET_MODE_CLASS (mode) != MODE_INT
> +   && GET_MODE_CLASS (mode) != MODE_FLOAT)
> + continue;
> +   if (GET_MODE_SIZE (mode) > GET_MODE_SIZE (Pmode))
I guess GET_MODE_SIZE (Pmode) should be UNITS_PER_WORD here, to enable the
conversion also for 64-bit modes with -m31 -mzarch.

Ok with these changes. Thanks!

Andreas
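
To illustrate the UNITS_PER_WORD remark above, here is a small standalone C
model. It is not GCC code: the struct and helper names are made up for the
example, and it assumes that Pmode is 4 bytes wide with -m31 while the word
size is 8 bytes whenever -mzarch is in effect.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the two target settings involved (illustrative names,
   not GCC's own data structures).  */
struct s390_cfg
{
  bool target_64bit;  /* -m64 vs. -m31 */
  bool target_zarch;  /* -mzarch vs. -mesa */
};

/* Assumption: pointers are 8 bytes only with -m64.  */
static unsigned
pmode_size (struct s390_cfg c)
{
  return c.target_64bit ? 8 : 4;
}

/* Assumption: a word is 8 bytes whenever z/Architecture mode is on.  */
static unsigned
units_per_word (struct s390_cfg c)
{
  return c.target_zarch ? 8 : 4;
}

/* Size check as posted: compare against the pointer mode.  */
static bool
accepted_with_pmode (struct s390_cfg c, unsigned mode_size)
{
  return mode_size <= pmode_size (c);
}

/* Size check as suggested in the review: compare against the word size.  */
static bool
accepted_with_word (struct s390_cfg c, unsigned mode_size)
{
  return mode_size <= units_per_word (c);
}
```

Under these assumptions an 8-byte mode with -m31 -mzarch is rejected by the
Pmode-based check but accepted by the word-based one, which is the case the
review comment is about.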

> + continue;
> +   return true;
> + }
> +}
> +  return default_noce_conversion_profitable_p (seq, if_info);
> +}
> +
>  /* Initialize GCC target structure.  */
>  
>  #undef  TARGET_ASM_ALIGNED_HI_OP
> diff --git a/gcc/testsuite/gcc.target/s390/ccor.c 
> b/gcc/testsuite/gcc.target/s390/ccor.c
> index 31f30f60314..36a3c3a999a 100644
> --- a/gcc/testsuite/gcc.target/s390/ccor.c
> +++ b/gcc/testsuite/gcc.target/s390/ccor.c
> @@ -42,7 +42,7 @@ GENFUN1(2)
>  
>  GENFUN1(3)
>  
> -/* { dg-final { scan-assembler {locrno} } } */
> +/* { dg-final { scan-assembler {locro} } } */
>  
>  GENFUN2(0,1)
>  
> @@ -58,7 +58,7 @@ GENFUN2(0,3)
>  
>  GENFUN2(1,2)
>  
> -/* { dg-final { scan-assembler {locrnlh} } } */
> +/* { dg-final { scan-assembler {locrlh} } } */
>  
>  GENFUN2(1,3)
>

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Richard Biener

On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
 wrote:
>
> From: Victor Do Nascimento 
>
> At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> optabs for dealing with vectorizable dot product code sequences.  The
> consequence of using a direct optab for this is that backend-pattern
> selection is only ever able to match against one datatype - Either
> that of the operands or of the accumulated value, never both.
>
> With the introduction of the 2-way (un)signed dot-product insn [1][2]
> in AArch64 SVE2, the existing direct opcode approach is no longer
> sufficient for full specification of all the possible dot product
> machine instructions to be matched to the code sequence; a dot product
> resulting in VNx4SI may result from either dot products on VNx16QI or
> VNx8HI values for the 4- and 2-way dot product operations, respectively.
>
> This means that the following example fails autovectorization:
>
> uint32_t foo(int n, uint16_t* data) {
>   uint32_t sum = 0;
>   for (int i=0; i<n; i++) {
>     sum += data[i] * data[i];
>   }
>   return sum;
> }
>
> To remedy the issue a new optab is added, tentatively named
> `udot_prod_twoway_optab', whose selection is dependent upon checking
> of both input and output types involved in the operation.

I don't like this too much.  I'll note we document dot_prod as

@cindex @code{sdot_prod@var{m}} instruction pattern
@item @samp{sdot_prod@var{m}}

Compute the sum of the products of two signed elements.
Operand 1 and operand 2 are of the same mode. Their
product, which is of a wider mode, is computed and added to operand 3.
Operand 3 is of a mode equal or wider than the mode of the product. The
result is placed in operand 0, which is of the same mode as operand 3.
@var{m} is the mode of operand 1 and operand 2.

with no restriction on the wider mode but we don't specify it which is
bad design.  This should have been a convert optab with two modes
from the start - adding a _twoway variant is just a hack.

Richard.
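
To make the two-mode point concrete, here is a scalar C sketch of what a 2-way
dot product computes for the loop quoted above: 16-bit inputs are multiplied
pairwise and accumulated into 32-bit lanes. The lane count and function names
are illustrative only and do not correspond to any GCC or SVE2 interface; the
products are computed in uint32_t so that the arithmetic wraps modulo 2^32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: the reduction loop from the patch description.  */
static uint32_t
dot_scalar (size_t n, const uint16_t *data)
{
  uint32_t sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += (uint32_t) data[i] * data[i];
  return sum;
}

/* Model of a 2-way dot product: each 32-bit accumulator lane receives
   the sum of two adjacent 16-bit products.  LANES is illustrative.  */
enum { LANES = 4 };

static uint32_t
dot_twoway_model (size_t n, const uint16_t *data)
{
  uint32_t acc[LANES] = { 0 };
  size_t i = 0;

  /* Main loop: consume 2 * LANES inputs per iteration.  */
  for (; i + 2 * LANES <= n; i += 2 * LANES)
    for (int l = 0; l < LANES; l++)
      {
        uint32_t a = data[i + 2 * l];
        uint32_t b = data[i + 2 * l + 1];
        acc[l] += a * a + b * b;
      }

  /* Reduce the lanes, then handle the scalar tail.  */
  uint32_t sum = 0;
  for (int l = 0; l < LANES; l++)
    sum += acc[l];
  for (; i < n; i++)
    sum += (uint32_t) data[i] * data[i];
  return sum;
}
```

A 4-way variant would differ only in the input width (8-bit) and in the number
of products folded into each lane, which is why the input mode alone, or the
accumulator mode alone, does not identify the operation.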

> In order to minimize changes to the existing codebase,
> `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> argument is added to its signature - `const_tree otype', allowing type
> information to be specified for both input and output types.  The
> existing interface is retained by defining a new `optab_for_tree_code',
> which serves as a shim to `optab_for_tree_code_1', passing old
> parameters as-is and setting the new `optype' argument to `NULL_TREE'.
>
> For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> directly, passing it both types, adding the internal logic to the
> function to distinguish between competing optabs.
>
> Finally, necessary changes are made to `expand_widen_pattern_expr' to
> ensure the new icode can be correctly selected, given the new optab.
>
> [1] 
> https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> [2] 
> https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-sve2.md (@aarch64_sve_dotvnx4sivnx8hi):
> renamed to `dot_prod_twoway_vnx8hi'.
> * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> update icodes used in line with above rename.
> * optabs-tree.cc (optab_for_tree_code_1): Renamed
> `optab_for_tree_code' and added new argument.
> (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> * optabs-tree.h (optab_for_tree_code_1): New.
> * optabs.cc (expand_widen_pattern_expr): Expand support for
> DOT_PROD_EXPR patterns.
> * optabs.def (udot_prod_twoway_optab): New.
> (sdot_prod_twoway_optab): Likewise.
> * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> support for misc optabs that use two modes.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-dotprod-twoway.c: New.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc  |  4 ++--
>  gcc/config/aarch64/aarch64-sve2.md|  2 +-
>  gcc/optabs-tree.cc| 23 --
>  gcc/optabs-tree.h |  2 ++
>  gcc/optabs.cc |  2 +-
>  gcc/optabs.def|  2 ++
>  .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++
>  gcc/tree-vect-patterns.cc |  2 +-
>  8 files changed, 54 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 0d2edf3f19e..e457db09f66 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -764,8 +764,8 @@ public:
>icode = (e.type_suffix (0).float_p
>?

Re: [PATCH] make -freg-struct-return visibly a negative alias of -fpcc-struct-return

2024-05-16 Thread Alexandre Oliva

On Apr 30, 2024, Richard Biener  wrote:

> OK.

Thanks, I'm back (from LibrePlanet), and I've just installed it in the
trunk.

>> for  gcc/ChangeLog
>> 
>> * common.opt (freg-struct-return): Make it explicitly
>> fpcc-struct-return's NegativeAlias.  Copy Optimization...
>> (freg-struct-return): ... here.

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive

RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-16 Thread Li, Pan2

Hi Tamar,

I am trying to add more shapes for SAT_ADD, such as the branch version below.
I suspect that widening_mul may not be the best place to handle this shape,
because after_dom_children mostly works within a single bb, whereas here we
need to find the def/use across bbs.

Thus, is there any suggestion for the branch shape? Adding a new simplify to
match.pd works well, but that was not recommended in the previous discussion.

Thanks a lot for help!

Pan

---Source code-

#define SAT_ADD_U_1(T) \
T sat_add_u_1_##T(T x, T y) \
{ \
  return (T)(x + y) >= x ? (x + y) : -1; \
}

SAT_ADD_U_1(uint16_t)
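
For comparison, here is a standalone C sketch that puts the branch form from
the macro above next to the branchless form used in the original patch
description, (x + y) | (-(T)((T)(x + y) < x)), for uint16_t. The function
names are chosen for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Branch form: keep the sum unless it wrapped around.  */
static uint16_t
sat_add_u16_branch (uint16_t x, uint16_t y)
{
  uint16_t sum = (uint16_t) (x + y);
  return sum >= x ? sum : (uint16_t) -1;
}

/* Branchless form: OR the sum with an all-ones mask on overflow.  */
static uint16_t
sat_add_u16_branchless (uint16_t x, uint16_t y)
{
  uint16_t sum = (uint16_t) (x + y);
  uint16_t mask = (uint16_t) -(uint16_t) (sum < x);
  return (uint16_t) (sum | mask);
}

/* Returns 1 when both forms agree for every x and a spread of y values
   that includes 0 and 65535.  */
static int
sat_add_forms_agree (void)
{
  for (uint32_t x = 0; x <= UINT16_MAX; x++)
    for (uint32_t y = 0; y <= UINT16_MAX; y += 257)
      if (sat_add_u16_branch ((uint16_t) x, (uint16_t) y)
          != sat_add_u16_branchless ((uint16_t) x, (uint16_t) y))
        return 0;
  return 1;
}
```

Both functions compute the same value; the difference that matters for the
question above is only the control flow the compiler sees.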

---Gimple-

uint16_t sat_add_u_1_uint16_t (uint16_t x, uint16_t y)
{
  short unsigned int _1;
  uint16_t _2;

  <bb 2> [local count: 1073741824]:
  _1 = x_3(D) + y_4(D);
  if (_1 >= x_3(D))
    goto <bb 3>; [65.00%]
  else
    goto <bb 4>; [35.00%]

  <bb 3> [local count: 697932184]:

  <bb 4> [local count: 1073741824]:
  # _2 = PHI <65535(2), _1(3)>
  return _2;
}

Pan

-Original Message-
From: Tamar Christina  
Sent: Wednesday, May 15, 2024 5:12 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
Liu, Hongtao 
Subject: RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
scalar int

Hi Pan,

Thanks!

> -Original Message-
> From: pan2...@intel.com 
> Sent: Wednesday, May 15, 2024 3:14 AM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> ; richard.guent...@gmail.com;
> hongtao@intel.com; Pan Li 
> Subject: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
> scalar
> int
> 
> From: Pan Li 
> 
> This patch would like to add the middle-end presentation for the
> saturation add.  Aka set the result of add to the max when overflow.
> It will take the pattern similar as below.
> 
> SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> 
> Take uint8_t as example, we will have:
> 
> * SAT_ADD (1, 254)   => 255.
> * SAT_ADD (1, 255)   => 255.
> * SAT_ADD (2, 255)   => 255.
> * SAT_ADD (255, 255) => 255.
> 
> Given below example for the unsigned scalar integer uint64_t:
> 
> uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> {
>   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> }
> 
> Before this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   _Bool _2;
>   long unsigned int _3;
>   long unsigned int _4;
>   uint64_t _7;
>   long unsigned int _10;
>   __complex__ long unsigned int _11;
> 
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
>   _1 = REALPART_EXPR <_11>;
>   _10 = IMAGPART_EXPR <_11>;
>   _2 = _10 != 0;
>   _3 = (long unsigned int) _2;
>   _4 = -_3;
>   _7 = _1 | _4;
>   return _7;
> ;;succ:   EXIT
> 
> }
> 
> After this patch:
> uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> {
>   uint64_t _7;
> 
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
>   return _7;
> ;;succ:   EXIT
> }
> 
> The below tests are passed for this patch:
> 1. The riscv fully regression tests.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
> 
>   PR target/51492
>   PR target/112600
> 
> gcc/ChangeLog:
> 
>   * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
>   to the return true switch case(s).
>   * internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
>   * match.pd: Add unsigned SAT_ADD match(es).
>   * optabs.def (OPTAB_NL): Remove fixed-point limitation for
>   us/ssadd.
>   * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
>   extern func decl generated in match.pd match.
>   (match_saturation_arith): New func impl to match the saturation arith.
>   (math_opts_dom_walker::after_dom_children): Try match saturation
>   arith when IOR expr.
> 

 LGTM but you'll need an OK from Richard,

Thanks for working on this!

Tamar

> Signed-off-by: Pan Li 
> ---
>  gcc/internal-fn.cc|  1 +
>  gcc/internal-fn.def   |  2 ++
>  gcc/match.pd  | 51 +++
>  gcc/optabs.def|  4 +--
>  gcc/tree-ssa-math-opts.cc | 32 
>  5 files changed, 88 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 0a7053c2286..73045ca8c8c 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
>  case IFN_UBSAN_CHECK_MUL:
>  case IFN_ADD_OVERFLOW:
>  case IFN_MUL_OVERFLOW:
> +case IFN_SAT_ADD:
>  case IFN_VEC_WIDEN_PLUS:
>  case IFN_VEC_WIDEN_PLUS_LO:
>  case IFN_VEC_WIDEN_PLUS_HI:
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 848bb9dbff3..25badbb86e5 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -275,6 +275,8 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHS, ECF_CONST
> |

Re: [PATCH] add rlwinm pattern for DImode for constant building

2024-05-16 Thread Jiufu Guo



Hi,

Gentle ping ...

BR,
Jeff(Jiufu) Guo

Jiufu Guo  writes:

> Hi,
>
> 'rlwinm' pattern is already well used for SImode.  As this instruction
> can touch the whole 64bit register, so some constants in 64bit(DImode)
> can be built via 'lis/li+rlwinm'.  To achieve this, a new pattern for
> 'rlwinm' is added, and 'rs6000_emit_set_long_const' is updated to check
> if a constant is able to be built by 'lis/li; rlwinm'.
>
> Bootstrap and regtest pass on ppc64{,le}.
>
> Is this patch ok for trunk (when stage1 is open)?
>
> Jeff (Jiufu Guo).
>
> gcc/ChangeLog:
>
>   * config/rs6000/rs6000-protos.h (can_be_rotated_to_lowbits): Add new
>   parameter.
>   * config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rlwinm): New 
> function.
>   (rs6000_emit_set_long_const): Generate 'lis/li+rlwinm'.
>   (can_be_rotated_to_lowbits): Add new parameter.
>   * config/rs6000/rs6000.md (rlwinm_di_mask): New pattern.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/powerpc/pr93012.c: Update to match 'rlwinm'.
>   * gcc.target/powerpc/rlwinm4di-1.c: New test.
>   * gcc.target/powerpc/rlwinm4di-2.c: New test.
>   * gcc.target/powerpc/rlwinm4di.c: New test.
>   * gcc.target/powerpc/rlwinm4di.h: New test.
>
> ---
>  gcc/config/rs6000/rs6000-protos.h |  2 +-
>  gcc/config/rs6000/rs6000.cc   | 65 ++-
>  gcc/config/rs6000/rs6000.md   | 18 +
>  gcc/testsuite/gcc.target/powerpc/pr93012.c|  2 +-
>  .../gcc.target/powerpc/rlwinm4di-1.c  | 25 +++
>  .../gcc.target/powerpc/rlwinm4di-2.c  | 19 ++
>  gcc/testsuite/gcc.target/powerpc/rlwinm4di.c  |  6 ++
>  gcc/testsuite/gcc.target/powerpc/rlwinm4di.h  | 25 +++
>  8 files changed, 158 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm4di-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm4di-2.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm4di.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/rlwinm4di.h
>
> diff --git a/gcc/config/rs6000/rs6000-protos.h 
> b/gcc/config/rs6000/rs6000-protos.h
> index 09a57a806fa..10505a8061a 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -36,7 +36,7 @@ extern bool vspltisw_vupkhsw_constant_p (rtx, machine_mode, 
> int * = nullptr);
>  extern int vspltis_shifted (rtx);
>  extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
>  extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
> -extern bool can_be_rotated_to_lowbits (unsigned HOST_WIDE_INT, int, int *);
> +extern bool can_be_rotated_to_lowbits (unsigned HOST_WIDE_INT, int, int *, 
> bool = false);
>  extern bool can_be_rotated_to_positive_16bits (HOST_WIDE_INT);
>  extern bool can_be_rotated_to_negative_15bits (HOST_WIDE_INT);
>  extern int num_insns_constant (rtx, machine_mode);
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 6ba9df4f02e..853eaede673 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10454,6 +10454,51 @@ can_be_built_by_li_and_rldic (HOST_WIDE_INT c, int 
> *shift, HOST_WIDE_INT *mask)
>return false;
>  }
>  
> +/* Check if value C can be generated by 2 instructions, one instruction
> +   is li/lis, another instruction is rlwinm.  */
> +
> +static bool
> +can_be_built_by_li_lis_and_rlwinm (HOST_WIDE_INT c, HOST_WIDE_INT *val,
> +int *shift, HOST_WIDE_INT *mask)
> +{
> +  unsigned HOST_WIDE_INT low = c & 0xULL;
> +  unsigned HOST_WIDE_INT high = (c >> 32) & 0xULL;
> +  unsigned HOST_WIDE_INT v;
> +
> +  /* diff of high and low (high ^ low) should be the mask position.  */
> +  unsigned HOST_WIDE_INT m = low ^ high;
> +  int tz = ctz_hwi (m);
> +  int lz = clz_hwi (m);
> +  if (m != 0)
> +m = ((HOST_WIDE_INT_M1U >> (lz + tz)) << tz);
> +  if (high != 0)
> +m = ~m;
> +  v = high != 0 ? high : ((low | ~m) & 0x);
> +
> +  if ((high != 0) && ((v & m) != low || lz < 33 || tz < 1))
> +return false;
> +
> +  /* rotl32 on positive/negative value of 'li' 15/16bits.  */
> +  int n;
>   if (!can_be_rotated_to_lowbits (v, 15, &n, true)
>   && !can_be_rotated_to_lowbits ((~v) & 0xULL, 15, &n, true))
> {
>   /* rotate32 from a negative value of 'lis'.  */
>   if (!can_be_rotated_to_lowbits (v & 0xULL, 16, &n, true))
> + return false;
> +  n += 16;
> +}
> +  n = 32 - (n % 32);
> +  n %= 32;
> +  v = ((v >> n) | (v << (32 - n))) & 0x;
> +  if (v & 0x8000ULL)
> +v |= HOST_WIDE_INT_M1U << 32;
> +  *mask = m;
> +  *val = v;
> +  *shift = n;
> +  return true;
> +}
> +
>  /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
> Output insns to set DEST equal to the constant C as a series of
> lis, ori and shl instructions.  If NUM_INSNS is not NULL, then
> @@ -10553,6 +10598,18 @@ rs6000_emit_set_long_const (rtx

Re: [PATCH-4, rs6000] Implement optab_isnormal for SFmode, DFmode and TFmode [PR97786]

2024-05-16 Thread HAO CHEN GUI

Hi Segher,
  Thanks for your review comments. I will modify it and resend. Just
one question on the insn condition.

On 2024/5/17 1:25, Segher Boessenkool wrote:
>> +(define_expand "isnormal2"
>> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
>> +(use (match_operand:SFDF 1 "gpc_reg_operand"))]
>> +  "TARGET_HARD_FLOAT
>> +   && TARGET_P9_VECTOR"
> Please put the condition on just one line if it is as simple and short
> as this.
> 
> Why is TARGET_P9_VECTOR the correct condition?

This expander calls gen_xststdcp, which is a P9 vector instruction and
relies on "TARGET_P9_VECTOR", so I used that as the condition.
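
As a reminder of the C-level semantics the expander has to deliver, here is a
small standalone sketch. It says nothing about how xststdcp itself works; it
only checks the exponent field of a double, assuming the IEEE 754 binary64
layout, and the function name is made up for the example.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if X is a normal number, i.e. neither zero, subnormal,
   infinity nor NaN.  Assumes IEEE 754 binary64: 11 exponent bits
   starting at bit 52.  */
static int
is_normal_bits (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  unsigned exponent = (unsigned) ((bits >> 52) & 0x7FF);
  /* All-zero exponent: zero or subnormal.  All-one: infinity or NaN.  */
  return exponent != 0 && exponent != 0x7FF;
}
```

On such a target the result should match the standard isnormal macro for
every input class.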

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Hongtao Liu

> >
> Sorry to chime in, for x86 backend, we defined usdot_prodv16hi, and
> 2-way dot_prod operations can be generated
>
Here is the link: https://godbolt.org/z/hcWr64vx3. x86 defines
udot_prodv16qi/udot_prod8hi, and both 2-way and 4-way dot_prod
instructions are generated.


-- 
BR,
Hongtao

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Hongtao Liu

On Thu, May 16, 2024 at 10:40 PM Victor Do Nascimento
 wrote:
>
> From: Victor Do Nascimento 
>
> At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> optabs for dealing with vectorizable dot product code sequences.  The
> consequence of using a direct optab for this is that backend-pattern
> selection is only ever able to match against one datatype - Either
> that of the operands or of the accumulated value, never both.
>
> With the introduction of the 2-way (un)signed dot-product insn [1][2]
> in AArch64 SVE2, the existing direct opcode approach is no longer
> sufficient for full specification of all the possible dot product
> machine instructions to be matched to the code sequence; a dot product
> resulting in VNx4SI may result from either dot products on VNx16QI or
> VNx8HI values for the 4- and 2-way dot product operations, respectively.
>
> This means that the following example fails autovectorization:
>
> uint32_t foo(int n, uint16_t* data) {
>   uint32_t sum = 0;
>   for (int i=0; i<n; i++) {
>     sum += data[i] * data[i];
>   }
>   return sum;
> }
>
Sorry to chime in: for the x86 backend we defined usdot_prodv16hi, and
2-way dot_prod operations can be generated.

> To remedy the issue a new optab is added, tentatively named
> `udot_prod_twoway_optab', whose selection is dependent upon checking
> of both input and output types involved in the operation.
>
> In order to minimize changes to the existing codebase,
> `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> argument is added to its signature - `const_tree otype', allowing type
> information to be specified for both input and output types.  The
> existing interface is retained by defining a new `optab_for_tree_code',
> which serves as a shim to `optab_for_tree_code_1', passing old
> parameters as-is and setting the new `optype' argument to `NULL_TREE'.
>
> For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> directly, passing it both types, adding the internal logic to the
> function to distinguish between competing optabs.
>
> Finally, necessary changes are made to `expand_widen_pattern_expr' to
> ensure the new icode can be correctly selected, given the new optab.
>
> [1] 
> https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> [2] 
> https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-sve2.md (@aarch64_sve_dotvnx4sivnx8hi):
> renamed to `dot_prod_twoway_vnx8hi'.
> * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> update icodes used in line with above rename.
> * optabs-tree.cc (optab_for_tree_code_1): Renamed
> `optab_for_tree_code' and added new argument.
> (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> * optabs-tree.h (optab_for_tree_code_1): New.
> * optabs.cc (expand_widen_pattern_expr): Expand support for
> DOT_PROD_EXPR patterns.
> * optabs.def (udot_prod_twoway_optab): New.
> (sdot_prod_twoway_optab): Likewise.
> * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> support for misc optabs that use two modes.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-dotprod-twoway.c: New.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc  |  4 ++--
>  gcc/config/aarch64/aarch64-sve2.md|  2 +-
>  gcc/optabs-tree.cc| 23 --
>  gcc/optabs-tree.h |  2 ++
>  gcc/optabs.cc |  2 +-
>  gcc/optabs.def|  2 ++
>  .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++
>  gcc/tree-vect-patterns.cc |  2 +-
>  8 files changed, 54 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 0d2edf3f19e..e457db09f66 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -764,8 +764,8 @@ public:
>icode = (e.type_suffix (0).float_p
>? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
>: e.type_suffix (0).unsigned_p
> -  ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
> -  : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
> +  ? CODE_FOR_udot_prod_twoway_vnx8hi
> +  : CODE_FOR_sdot_prod_twoway_vnx8hi);
>  return e.use_unpred_insn (icode);
>}
>  };
> diff --git a/gcc/config/aarch64/aarch64-sve2.md 
> b/gcc/config/aarch64/aarch64-sve2.md
> index 934e57055d3..5677de7108d 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> +++ b/gcc/config/aarch64/aarch64-sve2.md
> @@ -2021,7 +2021,7 @@

[PATCH] Use pblendw instead of pand to clear upper 16 bits.

2024-05-16 Thread liuhongt

For vec_pack_truncv8si/v4si w/o AVX512,
(const_vector:v4si (const_int 0x) x4) is used as mask to clear
upper 16 bits, but vpblendw with zero_vector can also be used, and
zero vector is cheaper than (const_vector:v4si (const_int 0x) x4).
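
The equivalence this relies on can be checked with a scalar C model. The
sketch below is not the i386 expander: it models four 32-bit lanes, an AND
with 0xFFFF per lane, and a vec_merge-style blend over the eight 16-bit words
in which a set immediate bit keeps the source word and a clear bit takes the
word from a zero vector. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Mask approach: AND every 32-bit lane with 0xFFFF.  */
static void
clear_upper_and (const uint32_t in[4], uint32_t out[4])
{
  for (int l = 0; l < 4; l++)
    out[l] = in[l] & 0xFFFFu;
}

/* Blend approach: 16-bit word 2*l is the low half of lane l and word
   2*l+1 is the high half.  Bit i of IMM set: keep word i of IN;
   clear: take the word from an all-zero vector.  */
static void
clear_upper_blend (const uint32_t in[4], uint32_t out[4], unsigned imm)
{
  for (int l = 0; l < 4; l++)
    {
      uint32_t lo = ((imm >> (2 * l)) & 1) ? (in[l] & 0xFFFFu) : 0;
      uint32_t hi = ((imm >> (2 * l + 1)) & 1) ? (in[l] & 0xFFFF0000u) : 0;
      out[l] = lo | hi;
    }
}
```

With an immediate of 0x55, the value the patch uses for the eight-word case,
only the even words survive, so both functions produce the same lanes.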

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

gcc/ChangeLog:
PR target/114427
* config/i386/i386-expand.cc (expand_vec_perm_even_odd_pack):
Use pblendw instead of pand to clear upper bits.

gcc/testsuite/ChangeLog:
* gcc.target/i386/pr114427.c: New test.
---
 gcc/config/i386/i386-expand.cc   | 34 +---
 gcc/testsuite/gcc.target/i386/pr114427.c | 18 +
 2 files changed, 48 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr114427.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 4e16aedc5c1..231e9321d81 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -22918,6 +22918,7 @@ expand_vec_perm_even_odd_pack (struct expand_vec_perm_d 
*d)
 {
   rtx op, dop0, dop1, t;
   unsigned i, odd, c, s, nelt = d->nelt;
+  int pblendw_i = 0;
   bool end_perm = false;
   machine_mode half_mode;
   rtx (*gen_and) (rtx, rtx, rtx);
@@ -22939,6 +22940,7 @@ expand_vec_perm_even_odd_pack (struct expand_vec_perm_d 
*d)
   gen_and = gen_andv2si3;
   gen_pack = gen_mmx_packusdw;
   gen_shift = gen_lshrv2si3;
+  pblendw_i = 0x5;
   break;
 case E_V8HImode:
   /* Required for "pack".  */
@@ -22950,6 +22952,7 @@ expand_vec_perm_even_odd_pack (struct expand_vec_perm_d 
*d)
   gen_and = gen_andv4si3;
   gen_pack = gen_sse4_1_packusdw;
   gen_shift = gen_lshrv4si3;
+  pblendw_i = 0x55;
   break;
 case E_V8QImode:
   /* No check as all instructions are SSE2.  */
@@ -22978,6 +22981,7 @@ expand_vec_perm_even_odd_pack (struct expand_vec_perm_d 
*d)
   gen_and = gen_andv8si3;
   gen_pack = gen_avx2_packusdw;
   gen_shift = gen_lshrv8si3;
+  pblendw_i = 0x;
   end_perm = true;
   break;
 case E_V32QImode:
@@ -23013,10 +23017,32 @@ expand_vec_perm_even_odd_pack (struct 
expand_vec_perm_d *d)
   dop1 = gen_reg_rtx (half_mode);
   if (odd == 0)
 {
-  t = gen_const_vec_duplicate (half_mode, GEN_INT (c));
-  t = force_reg (half_mode, t);
-  emit_insn (gen_and (dop0, t, gen_lowpart (half_mode, d->op0)));
-  emit_insn (gen_and (dop1, t, gen_lowpart (half_mode, d->op1)));
+  /* Use pblendw since const_vector 0 should be cheaper than
+const_vector 0x.  */
+  if (d->vmode == V4HImode
+ || d->vmode == E_V8HImode
+ || d->vmode == E_V16HImode)
+   {
+ rtx dop0_t = gen_reg_rtx (d->vmode);
+ rtx dop1_t = gen_reg_rtx (d->vmode);
+ t = gen_reg_rtx (d->vmode);
+ emit_move_insn (t, CONST0_RTX (d->vmode));
+
+ emit_move_insn (dop0_t, gen_rtx_VEC_MERGE (d->vmode, d->op0, t,
+GEN_INT (pblendw_i)));
+ emit_move_insn (dop1_t, gen_rtx_VEC_MERGE (d->vmode, d->op1, t,
+GEN_INT (pblendw_i)));
+
+ emit_move_insn (dop0, gen_lowpart (half_mode, dop0_t));
+ emit_move_insn (dop1, gen_lowpart (half_mode, dop1_t));
+   }
+  else
+   {
+ t = gen_const_vec_duplicate (half_mode, GEN_INT (c));
+ t = force_reg (half_mode, t);
+ emit_insn (gen_and (dop0, t, gen_lowpart (half_mode, d->op0)));
+ emit_insn (gen_and (dop1, t, gen_lowpart (half_mode, d->op1)));
+   }
 }
   else
 {
diff --git a/gcc/testsuite/gcc.target/i386/pr114427.c 
b/gcc/testsuite/gcc.target/i386/pr114427.c
new file mode 100644
index 000..58b66db7fff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr114427.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=x86-64-v3 -O2 -mno-avx512f" } */
+/* { dg-final { scan-assembler-not "vpand" } } */
+/* { dg-final { scan-assembler-not "65535" } } */
+
+void
+foo (int* a, short* __restrict b, int* c)
+{
+for (int i = 0; i != 16; i++)
+  b[i] = c[i] + a[i];
+}
+
+void
+foo1 (int* a, short* __restrict b, int* c)
+{
+for (int i = 0; i != 8; i++)
+  b[i] = c[i] + a[i];
+}
-- 
2.31.1

[PATCH] RISC-V: Fix testcases renamed test flag options

2024-05-16 Thread juzhe.zh...@rivai.ai

LGTM.



juzhe.zh...@rivai.ai

[PATCH] RISC-V: Fix testcases renamed test flag options

2024-05-16 Thread Edwin Lu

Some testcases still used --param=riscv-autovec-preference=_;
update them to use -mrvv-vector-bits=_. Also add the missing period
in riscv.opt, whose absence caused a compiler driver error.

gcc/ChangeLog:

* config/riscv/riscv.opt: Add missing period

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/no-segment.c: Update dejagnu flags
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-1.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-2.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-3.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-4.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-5.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-6.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-7.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-1.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-2.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-3.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-4.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-5.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-6.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-7.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-1.c:
  Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-10.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-11.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-12.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-13.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-14.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-15.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-16.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-17.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-18.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-2.c:
  Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-3.c:
  Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-4.c:
  Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-5.c:
  Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-6.c:
  Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-7.c:
  Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-8.c:
  Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-9.c:
  Ditto
---
 gcc/config/riscv/riscv.opt| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_load_noseg_run-1.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_load_noseg_run-2.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_load_noseg_run-3.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_load_noseg_run-4.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_load_noseg_run-5.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_load_noseg_run-6.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_load_noseg_run-7.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_store_noseg_run-1.c  | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_store_noseg_run-2.c  | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_store_noseg_run-3.c  | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_store_noseg_run-4.c  | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_store_noseg_run-5.c  | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_store_noseg_run-6.c  | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_store_noseg_run-7.c  | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-1.c| 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-10.c   | 4 ++--
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-11.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-12.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-13.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-14.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-15.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-16.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-17.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-18.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-2.c| 2 +-

RE: [PATCH] middle-end: Drop __builtin_pretech calls in autovectorization [PR114061]'

2024-05-16 Thread Tamar Christina

Hi,

> -Original Message-
> From: Victor Do Nascimento 
> Sent: Thursday, May 16, 2024 2:57 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Richard Earnshaw
> ; Victor Do Nascimento
> 
> Subject: [PATCH] middle-end: Drop __builtin_pretech calls in autovectorization
> [PR114061]'
> 
> At present the autovectorizer fails to vectorize simple loops
> involving calls to `__builtin_prefetch'.  A simple example of such
> loop is given below:
> 
> void foo(double * restrict a, double * restrict b, int n){
>   int i;
>   for(i=0; i<n; ++i){
>     a[i] = a[i] + b[i];
>     __builtin_prefetch(&(b[i+8]));
>   }
> }
> 
> The failure stems from two issues:
> 
> 1. Given that it is typically not possible to fully reason about a
>function call due to the possibility of side effects, the
>autovectorizer does not attempt to vectorize loops which make such
>calls.
> 
>Given the memory reference passed to `__builtin_prefetch', in the
>absence of assurances about its effect on the passed memory
>location the compiler deems the function unsafe to vectorize,
>marking it as clobbering memory in `vect_find_stmt_data_reference'.
>This leads to the failure in autovectorization.
> 
> 2. Notwithstanding the above issue, though the prefetch statement
>would be classed as `vect_unused_in_scope', the loop invariant that
>is used in the address of the prefetch is the scalar loop's and not
>the vector loop's IV. That is, it still uses `i' and not `vec_iv'
>because the instruction wasn't vectorized, causing DCE to think the
>value is live, such that we now have both the vector and scalar loop
>invariant actively used in the loop.
> 
> This patch addresses both of these:
> 
> 1. About the issue regarding the memory clobber, data prefetch does
>not generate faults if its address argument is invalid and does not
>write to memory.  Therefore, it does not alter the internal state
>of the program or its control flow under any circumstance.  As
>such, it is reasonable that the function be marked as not affecting
>memory contents.
> 
>To achieve this, we add the necessary logic to
>`get_references_in_stmt' to ensure that builtin functions are given
>the same treatment as internal functions.  If the gimple call
>is to a builtin function and its function code is
>`BUILT_IN_PREFETCH', we mark `clobbers_memory' as false.
> 
> 2. Finding precedent in the way clobber statements are handled,
>whereby the vectorizer drops these from both the scalar and
>vectorized versions of a given loop, we choose to drop prefetch
>hints in a similar fashion.  This seems appropriate given how
>software prefetch hints are typically ignored by processors across
>architectures, as they seldom lead to performance gain over their
>hardware counterparts.
> 
>PR target/114061
> 
> gcc/ChangeLog:
> 
>   * tree-data-ref.cc (get_references_in_stmt): set
>   `clobbers_memory' to false for __builtin_prefetch.
>   * tree-vect-loop.cc (vect_transform_loop): Drop all
>   __builtin_prefetch calls from loops.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-prefetch-drop.c: New test.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c | 14 ++
>  gcc/tree-data-ref.cc   |  9 +
>  gcc/tree-vect-loop.cc  |  7 ++-
>  3 files changed, 29 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> new file mode 100644
> index 000..57723a8c972
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-additional-options "-march=-O3 -march=armv9.2-a+sve -fdump-tree-vect-details" { target { aarch64*-*-* } } } */
> +

See the review about two-way dotprod for comments on this.
However this specific test does not need to check for any assembly instructions.

You're going from being unable to vectorize a function to being able to
vectorize it.

So the `vectorized 1 loops` check is sufficient, and then this will work for all
targets.
This requires a check on vect_double (see gcc/testsuite/lib/target-supports.exp).

I'd also change the loop to just use int, as more targets will support
vectorizing those (and of course add a vect_int check instead).
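
A minimal sketch of how the test might look with those changes.  It assumes
the file stays in gcc.dg/vect (where vect.exp is expected to supply the
vectorizer dump options) and that a plain compile test is enough; the exact
directives are a suggestion, not something taken from the patch:

```c
/* { dg-do compile } */
/* { dg-require-effective-target vect_int } */

/* Same loop shape as in the patch, but on int so that more targets
   can vectorize it.  */
void
foo (int *restrict a, int *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    {
      a[i] = a[i] + b[i];
      /* The hint that used to block vectorization.  */
      __builtin_prefetch (&b[i + 8]);
    }
}

/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
```

With only the tree-dump check left, nothing in the test depends on a
particular instruction set.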

> +void foo(double * restrict a, double * restrict b, int n){
> +  int i;
> +  for(i=0; i<n; ++i){
> +    a[i] = a[i] + b[i];
> +    __builtin_prefetch(&(b[i+8]));
> +  }
> +}
> +
> +/* { dg-final { scan-assembler-not "prfm" } } */
> +/* { dg-final { scan-assembler "fadd\tz\[0-9\]+.d, p\[0-9\]+/m, z\[0-9\]+.d, z\[0-9\]+.d" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
> diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
>

[r15-580 Regression] FAIL: experimental/functional/searchers.cc -std=gnu++17 execution test on Linux/x86_64

2024-05-16 Thread haochen.jiang

On Linux/x86_64,

f3e5f4c58591f5dacdd14a65ec47bbe310df02a0 is the first bad commit
commit f3e5f4c58591f5dacdd14a65ec47bbe310df02a0
Author: Richard Biener 
Date:   Mon Mar 11 11:17:32 2024 +0100

tree-optimization/13962 - handle ptr-ptr compares in ptrs_compare_unequal

caused

FAIL: experimental/functional/searchers.cc  -std=gnu++17 execution test

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-580/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=experimental/functional/searchers.cc 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=experimental/functional/searchers.cc 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=experimental/functional/searchers.cc 
--target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=experimental/functional/searchers.cc 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)

Re: [Patch, fortran] PR114874 - [14/15 Regression] ICE with select type, type is (character(*)), and substring

2024-05-16 Thread Harald Anlauf


Hi Paul!

On 15.05.24 at 19:07, Paul Richard Thomas wrote:

Hi All,

I have been around several circuits with a patch for this regression. I
posted one in Bugzilla but rejected it because it was not direct enough.
This one, however, is more to my liking and fixes another bug lurking in
the shadows.

The way in which select type has been implemented is a bit weird in that
the select type temporaries don't get their assoc set until resolution.
Therefore, if the selector is of inferred type, the namespace is tagged by
setting 'assoc_name_inferred'. This narrows down the range of select type
temporaries that are picked out by the chunk in primary.cc, thereby fixing
the problem.


I think that is a most reasonable approach.  I like it!

What I find hard to read is the logic in match.cc that sets
gfc_current_ns->assoc_name_inferred.  I wonder if reordering the
outer if-conditions and adding a comment might be a good thing:

@@ -6721,6 +6721,20 @@ gfc_match_select_type (void)
   goto cleanup;
 }

+  if (expr2 && expr2->expr_type == EXPR_VARIABLE
+  && expr2->symtree->n.sym->assoc)
+{
+  if (expr2->symtree->n.sym->assoc->inferred_type)
+   gfc_current_ns->assoc_name_inferred = 1;
+  else if (expr2->symtree->n.sym->assoc->target
+  && expr2->symtree->n.sym->assoc->target->ts.type == BT_UNKNOWN)
+   gfc_current_ns->assoc_name_inferred = 1;
+}
+  else if (!expr2
+  && expr1->symtree->n.sym->assoc
+  && expr1->symtree->n.sym->assoc->inferred_type)
+gfc_current_ns->assoc_name_inferred = 1;

As the second part refers to the case where there is only a selector
and no associate-name, i.e. the simple case, have it first?

Otherwise it looks very good.
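
To illustrate the reordering, here is a rough, self-contained sketch.  The
struct definitions are simplified stand-ins invented for this example (the
real gfc_expr and friends look quite different), and selector_is_inferred is
a hypothetical helper, not a function in match.cc:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the gfortran structures.  */
enum expr_kind { EXPR_VARIABLE, EXPR_OTHER };
enum basic_type { BT_UNKNOWN, BT_CHARACTER };

struct expr;
struct assoc { bool inferred_type; struct expr *target; };
struct symbol { struct assoc *assoc; };
struct expr
{
  enum expr_kind expr_type;
  enum basic_type ts_type;
  struct symbol *sym;
};

/* Return true if the select type namespace should be tagged as having a
   selector of inferred type.  */
bool
selector_is_inferred (const struct expr *expr1, const struct expr *expr2)
{
  if (!expr2)
    {
      /* Simple case first: only a selector, no associate-name.  */
      const struct assoc *a = expr1->sym->assoc;
      return a && a->inferred_type;
    }

  /* associate-name => selector: look at the selector's association.  */
  if (expr2->expr_type == EXPR_VARIABLE && expr2->sym->assoc)
    {
      const struct assoc *a = expr2->sym->assoc;
      return a->inferred_type
	     || (a->target && a->target->ts_type == BT_UNKNOWN);
    }

  return false;
}
```

The conditions are the same as in the hunk above; only the order and the
comments differ.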


The chunks in resolve.cc fix a problem found on the way, where invalid
array references either caused an ICE or were silently absorbed.

OK for mainline and 14-branch?


Yes.

Thanks for the patch!

Harald



Paul

Fortran: Fix select type regression due to r14-9489 [PR114874]

2024-05-15  Paul Thomas  

gcc/fortran
PR fortran/114874
* gfortran.h: Add 'assoc_name_inferred' to gfc_namespace.
* match.cc (gfc_match_select_type) : Set 'assoc_name_inferred'
in select type namespace if the selector has inferred type.
* primary.cc (gfc_match_varspec): If a select type temporary
is apparently scalar and '(' has been detected, check to see if
the current name space has 'assoc_name_inferred' set. If so,
set inferred_type.
* resolve.cc (resolve_variable): If the namespace of a select
type temporary is marked with 'assoc_name_inferred' call
gfc_fixup_inferred_type_refs to ensure references are OK.
(gfc_fixup_inferred_type_refs): Catch invalid array refs.

gcc/testsuite/
PR fortran/114874
* gfortran.dg/pr114874_1.f90: New test for valid code.
* gfortran.dg/pr114874_2.f90: New test for invalid code.

New Swedish PO file for 'gcc' (version 14.1.0)

2024-05-16 Thread Translation Project Robot

Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Swedish team of translators.  The file is available at:

https://translationproject.org/latest/gcc/sv.po

(This file, 'gcc-14.1.0.sv.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.

Re: [PATCH] AArch64: Use LDP/STP for large struct types

2024-05-16 Thread Richard Sandiford

Richard Sandiford  writes:
> Wilco Dijkstra  writes:
>> Use LDP/STP for large struct types as they have useful immediate offsets and 
>> are typically faster.
>> This removes differences between little and big endian and allows use of 
>> LDP/STP without UNSPEC.
>>
>> Passes regress and bootstrap, OK for commit?
>>
>> gcc:
>> * config/aarch64/aarch64.cc (aarch64_classify_address): Treat SIMD 
>> structs identically
>> in little and bigendian.
>> * config/aarch64/aarch64.md (aarch64_mov): Remove VSTRUCT 
>> instructions.
>> (aarch64_be_mov): Allow little-endian, rename to 
>> aarch64_mov.
>> (aarch64_be_movoi): Allow little-endian, rename to aarch64_movoi.
>> (aarch64_be_movci): Allow little-endian, rename to aarch64_movci.
>> (aarch64_be_movxi): Allow little-endian, rename to aarch64_movxi.
>> Remove big-endian special case in define_split variants.
>>
>> gcc/testsuite:
>> * gcc.target/aarch64/torture/simd-abi-8.c: Update to check for 
>> LDP/STP.
>
> [...]
> So another alternative would be to go with the patch as-is,
> but add a new mechanism for gimple to query the valid addresses
> for IFN_(MASK_)LOAD_LANES and IFN_(MASK_)STORE_LANES, rather than
> relying purely on the legitimate address mechanism.  Ideally, the new
> interface would be generic enough that we could use it for target (md)
> builtins as well, to better optimise ACLE code.

Gah, just realised after sending that there's another potential problem.
Currently inline asms can assume that "m" will only include the LD1/ST1
range for little-endian.  We might need to consider using
TARGET_MEM_CONSTRAINT, so that we continue to present the same
interface to asms, but can use the wider range internally.

Thanks,
Richard

Re: [PATCH] c++: paren aggr CTAD with base classes [PR115114]

2024-05-16 Thread Jason Merrill


On 5/16/24 11:32, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk and perhaps 14?


OK for both.


-- >8 --

We're accidentally ignoring base classes during parenthesized aggregate
CTAD because the TYPE_FIELDS of a template type doesn't contain bases,
so we need to consider them separately.

PR c++/115114

gcc/cp/ChangeLog:

* pt.cc (maybe_aggr_guide): Consider base classes in the paren
init case.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-aggr15.C: New test.
---
  gcc/cp/pt.cc  |  7 ++
  .../g++.dg/cpp2a/class-deduction-aggr15.C | 23 +++
  2 files changed, 30 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr15.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index d83f530ac8d..54d74989903 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -30202,6 +30202,13 @@ maybe_aggr_guide (tree tmpl, tree init, vec<tree,va_gc> *args)
else if (TREE_CODE (init) == TREE_LIST)
  {
int len = list_length (init);
+  for (tree binfo : BINFO_BASE_BINFOS (TYPE_BINFO (template_type)))
+   {
+ if (!len)
+   break;
+ parms = tree_cons (NULL_TREE, BINFO_TYPE (binfo), parms);
+ --len;
+   }
for (tree field = TYPE_FIELDS (template_type);
   len;
   --len, field = DECL_CHAIN (field))
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr15.C 
b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr15.C
new file mode 100644
index 000..16dc0f52b64
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr15.C
@@ -0,0 +1,23 @@
+// PR c++/115114
+// { dg-do compile { target c++20 } }
+
+struct X {} x;
+struct Y {} y;
+
+template<class T, class U>
+struct A : T {
+  U m;
+};
+
+using ty1 = decltype(A{x, 42}); // OK
+using ty1 = decltype(A(x, 42)); // OK, used to fail
+using ty1 = A<X, int>;
+
+template<class T, class U = int, class V = int>
+struct B : T, V {
+  U m = 42;
+};
+
+using ty2 = decltype(B{x, y}); // OK
+using ty2 = decltype(B(x, y)); // OK, used to fail
+using ty2 = B<X, int, Y>;

Re: [PATCH] AArch64: Use LDP/STP for large struct types

2024-05-16 Thread Richard Sandiford

Wilco Dijkstra  writes:
> Use LDP/STP for large struct types as they have useful immediate offsets and 
> are typically faster.
> This removes differences between little and big endian and allows use of 
> LDP/STP without UNSPEC.
>
> Passes regress and bootstrap, OK for commit?
>
> gcc:
> * config/aarch64/aarch64.cc (aarch64_classify_address): Treat SIMD 
> structs identically
> in little and bigendian.
> * config/aarch64/aarch64.md (aarch64_mov): Remove VSTRUCT 
> instructions.
> (aarch64_be_mov): Allow little-endian, rename to 
> aarch64_mov.
> (aarch64_be_movoi): Allow little-endian, rename to aarch64_movoi.
> (aarch64_be_movci): Allow little-endian, rename to aarch64_movci.
> (aarch64_be_movxi): Allow little-endian, rename to aarch64_movxi.
> Remove big-endian special case in define_split variants.
>
> gcc/testsuite:
> * gcc.target/aarch64/torture/simd-abi-8.c: Update to check for 
> LDP/STP.

I'm nervous about approving the removal of something that was deliberately
added by the initial commits. :)  But, even ignoring the extra offset range,
using LDP/STP makes strong intuitive sense for 2-register modes.  And for
3- and 4-register modes, it's not surprising if the split that the
patch performs is (at worst) equivalent to what the hardware would do
itself or (at best) something that the hardware handles slightly better.

It's also a significant clean-up.

My only concern is that the main uses of these modes are for LD[234] and
ST[234].  By imposing the LD1/ST1 restrictions, the current little-endian
definition of "m" also corresponds to what LD[234] and ST[234] expect.
This in turn means that ivopts will optimise induction variable selection
to account for the fact that LD[234] and ST[234] do not support offsets.

I think the effect of the patch will be to make ivopts optimise LD[234]
and ST[234] on the assumption that they have the same range as LDP/STP.
We could avoid that if we

(1) Keep:

> @@ -10482,14 +10481,6 @@ aarch64_classify_address (struct 
> aarch64_address_info *info,
>&& (code != REG && code != PLUS))
>  return false;
>  
> -  /* On LE, for AdvSIMD, don't support anything other than POST_INC or
> - REG addressing.  */
> -  if (advsimd_struct_p
> -  && TARGET_SIMD
> -  && !BYTES_BIG_ENDIAN
> -  && (code != POST_INC && code != REG))
> -return false;
> -
>gcc_checking_assert (GET_MODE (x) == VOIDmode
>  || SCALAR_INT_MODE_P (GET_MODE (x)));
>  

but drop the !BYTES_BIG_ENDIAN condition.

(2) Make Ump a define_relaxed_memory_constraint (so that it accepts
more than "m" does).

(3) Use Ump instead of "o" in the move patterns.

Of course, this might make pure gimple-level data-shuffling worse.
I suppose it could also make RTL passes handle your memcpy use case
more pessimistically, although I'm not sure whether that would be for
legitimate reasons.

So another alternative would be to go with the patch as-is,
but add a new mechanism for gimple to query the valid addresses
for IFN_(MASK_)LOAD_LANES and IFN_(MASK_)STORE_LANES, rather than
relying purely on the legitimate address mechanism.  Ideally, the new
interface would be generic enough that we could use it for target (md)
builtins as well, to better optimise ACLE code.

So the patch is OK as-is from my POV, but I think it's relatively
important that we try to fix the ivopts handling before GCC 15.

Thanks,
Richard

> ---
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 16b7445d9f72f77a98ab262e21fd24e6cc97eba0..bb8b6963fd5117be82afe6ccd7154ae5302c3691
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -7917,32 +7917,6 @@
>[(set_attr "type" "neon_store1_4reg")]
>  )
>  
> -(define_insn "*aarch64_mov"
> -  [(set (match_operand:VSTRUCT_QD 0 "aarch64_simd_nonimmediate_operand")
> - (match_operand:VSTRUCT_QD 1 "aarch64_simd_general_operand"))]
> -  "TARGET_SIMD && !BYTES_BIG_ENDIAN
> -   && (register_operand (operands[0], mode)
> -   || register_operand (operands[1], mode))"
> -  {@ [ cons: =0 , 1   ; attrs: type, length]
> - [ w, w   ; multiple   ,   ] #
> - [ Utv  , w   ; neon_store_reg_q , 4 ] 
> st1\t{%S1. - %1.}, %0
> - [ w, Utv ; neon_load_reg_q  , 4 ] 
> ld1\t{%S0. - %0.}, %1
> -  }
> -)
> -
> -(define_insn "*aarch64_mov"
> -  [(set (match_operand:VSTRUCT 0 "aarch64_simd_nonimmediate_operand")
> - (match_operand:VSTRUCT 1 "aarch64_simd_general_operand"))]
> -  "TARGET_SIMD && !BYTES_BIG_ENDIAN
> -   && (register_operand (operands[0], mode)
> -   || register_operand (operands[1], mode))"
> -  {@ [ cons: =0 , 1   ; attrs: type, length]
> - [ w, w   ; multiple   ,   ] #
> - [ Utv  , w   ; neon_store_reg_q , 4 ]

Re: [PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-05-16 Thread Robin Dapp

> Can eqne pattern removal patches be committed first?

Please first make sure you test with corner cases, NaNs in
particular.  I'm pretty sure we don't have any test cases for
those.

Regards
 Robin

Re: [PATCH gcc-13] Fix RISC-V missing stack tie

2024-05-16 Thread Jeff Law





On 5/16/24 12:24 PM, Palmer Dabbelt wrote:



gcc/
* config/riscv/riscv.cc (riscv_expand_prologue): Add missing stack
tie for scalable and final stack adjustment if needed.

Co-authored-by: Raphael Zinsly 

(cherry picked from commit c65046ff2ef0a9a46e59bc0b3369b2d226f6a239)
---
I've only build tested this one, but it's tripping up some of the Fedora
folks here https://bugzilla.redhat.com/show_bug.cgi?id=2242327 so I
figured it's worth backporting.
Yes, that's the original report from Florian that led Raphael and me
to dive in.  Definitely worth backporting.


jeff

[PATCH] libstdc++: detect DLLs on windows with <tlhelp32.h>

2024-05-16 Thread Björn Schäpers

From: Björn Schäpers 

libstdc++-v3/Changelog

* acinclude.m4 (GLIBCXX_ENABLE_BACKTRACE): Add check for
  tlhelp32.h, matching libbacktrace.
* configure: Regenerate.
* config.h.in: Regenerate.

Signed-off-by: Björn Schäpers 
---
 libstdc++-v3/acinclude.m4 |  4 
 libstdc++-v3/config.h.in  |  3 +++
 libstdc++-v3/configure| 15 +++
 3 files changed, 22 insertions(+)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 51a08bcc8b1..506ce98ae43 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -5481,6 +5481,10 @@ AC_DEFUN([GLIBCXX_ENABLE_BACKTRACE], [
 BACKTRACE_CPPFLAGS="$BACKTRACE_CPPFLAGS -DHAVE_DL_ITERATE_PHDR=1"
   fi
   AC_CHECK_HEADERS(windows.h)
+  AC_CHECK_HEADERS(tlhelp32.h, [], [],
+  [#ifdef HAVE_WINDOWS_H
+  #  include <windows.h>
+  #endif])
 
   # Check for the fcntl function.
   if test -n "${with_target_subdir}"; then
diff --git a/libstdc++-v3/config.h.in b/libstdc++-v3/config.h.in
index 906e0143099..486ba450749 100644
--- a/libstdc++-v3/config.h.in
+++ b/libstdc++-v3/config.h.in
@@ -490,6 +490,9 @@
 /* Define to 1 if you have the `timespec_get' function. */
 #undef HAVE_TIMESPEC_GET
 
+/* Define to 1 if you have the <tlhelp32.h> header file. */
+#undef HAVE_TLHELP32_H
+
 /* Define to 1 if the target supports thread-local storage. */
 #undef HAVE_TLS
 
diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 21abaeb0778..a2d59520146 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -53865,6 +53865,21 @@ _ACEOF
 
 fi
 
+done
+
+  for ac_header in tlhelp32.h
+do :
+  ac_fn_c_check_header_compile "$LINENO" "tlhelp32.h" 
"ac_cv_header_tlhelp32_h" "#ifdef HAVE_WINDOWS_H
+  #  include <windows.h>
+  #endif
+"
+if test "x$ac_cv_header_tlhelp32_h" = xyes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_TLHELP32_H 1
+_ACEOF
+
+fi
+
 done
 
 
-- 
2.44.0

Re: [PATCH] Fix overwriting files with fs::copy_file on windows

2024-05-16 Thread One of your IPs tried to hack me


On 25.04.2024 at 22:16, Björn Schäpers wrote:

On 24.03.2024 at 22:34, Björn Schäpers wrote:

From: Björn Schäpers 

This fixes e.g. https://github.com/msys2/MSYS2-packages/issues/1937
I don't know if I picked the right way to do it.

When acceptable I think the declaration should be moved into
ops-common.h, since then we could use stat_type and also use that in the
commonly used function.

Manually tested on i686-w64-mingw32.

-- >8 --
libstdc++: Fix overwriting files on windows

Inodes have no meaning on Windows, so all files have an inode of 0.  As a
result, std::filesystem::copy_file did not honor
copy_options::overwrite_existing.  Use a different approach to identify
equivalent files, factored out of std::filesystem::equivalent.
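
For contrast, the non-Windows branch of fs::equivalent (the #else in the diff
below) compares device and inode numbers.  A rough POSIX-only sketch of that
check in plain C, where same_file is a hypothetical name used only for this
illustration, shows why an inode that is always 0 cannot tell files apart:

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Treat two paths as the same file when both can be stat'ed and they
   resolve to the same device and inode.  If st_ino were always 0, any two
   files on one volume would compare equal here.  */
bool
same_file (const char *p1, const char *p2)
{
  struct stat st1, st2;

  if (stat (p1, &st1) != 0 || stat (p2, &st2) != 0)
    return false;

  return st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino;
}
```

The patch replaces this kind of test on Windows with the volume serial
number and file index obtained from GetFileInformationByHandle.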

libstdc++-v3/Changelog:

* include/bits/fs_ops.h: Add declaration of
  __detail::equivalent_win32.
* src/c++17/fs_ops.cc (__detail::equivalent_win32): Implement it
(fs::equivalent): Use __detail::equivalent_win32, factored the
old test out.
* src/filesystem/ops-common.h (_GLIBCXX_FILESYSTEM_IS_WINDOWS):
  Use the function.

Signed-off-by: Björn Schäpers 
---
  libstdc++-v3/include/bits/fs_ops.h   |  8 +++
  libstdc++-v3/src/c++17/fs_ops.cc | 79 +---
  libstdc++-v3/src/filesystem/ops-common.h | 10 ++-
  3 files changed, 60 insertions(+), 37 deletions(-)

diff --git a/libstdc++-v3/include/bits/fs_ops.h 
b/libstdc++-v3/include/bits/fs_ops.h

index 90650c47b46..d10b78a4bdd 100644
--- a/libstdc++-v3/include/bits/fs_ops.h
+++ b/libstdc++-v3/include/bits/fs_ops.h
@@ -40,6 +40,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  namespace filesystem
  {
+#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
+namespace __detail
+{
+  bool
+  equivalent_win32(const wchar_t* p1, const wchar_t* p2, error_code& ec);
+} // namespace __detail
+#endif //_GLIBCXX_FILESYSTEM_IS_WINDOWS
+
    /** @addtogroup filesystem
 *  @{
 */
diff --git a/libstdc++-v3/src/c++17/fs_ops.cc b/libstdc++-v3/src/c++17/fs_ops.cc
index 61df19753ef..3cc87d45237 100644
--- a/libstdc++-v3/src/c++17/fs_ops.cc
+++ b/libstdc++-v3/src/c++17/fs_ops.cc
@@ -67,6 +67,49 @@
  namespace fs = std::filesystem;
  namespace posix = std::filesystem::__gnu_posix;
+#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
+bool
+fs::__detail::equivalent_win32(const wchar_t* p1, const wchar_t* p2,
+   error_code& ec)
+{
+  struct auto_handle {
+    explicit auto_handle(const path& p_)
+    : handle(CreateFileW(p_.c_str(), 0,
+    FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE,
+    0, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, 0))
+    { }
+
+    ~auto_handle()
+    { if (*this) CloseHandle(handle); }
+
+    explicit operator bool() const
+    { return handle != INVALID_HANDLE_VALUE; }
+
+    bool get_info()
+    { return GetFileInformationByHandle(handle, &info); }
+
+    HANDLE handle;
+    BY_HANDLE_FILE_INFORMATION info;
+  };
+  auto_handle h1(p1);
+  auto_handle h2(p2);
+  if (!h1 || !h2)
+    {
+  if (!h1 && !h2)
+    ec = __last_system_error();
+  return false;
+    }
+  if (!h1.get_info() || !h2.get_info())
+    {
+  ec = __last_system_error();
+  return false;
+    }
+  return h1.info.dwVolumeSerialNumber == h2.info.dwVolumeSerialNumber
+    && h1.info.nFileIndexHigh == h2.info.nFileIndexHigh
+    && h1.info.nFileIndexLow == h2.info.nFileIndexLow;
+}
+#endif //_GLIBCXX_FILESYSTEM_IS_WINDOWS
+
  fs::path
  fs::absolute(const path& p)
  {
@@ -858,41 +901,7 @@ fs::equivalent(const path& p1, const path& p2, 
error_code& ec) noexcept

    if (st1.st_mode != st2.st_mode || st1.st_dev != st2.st_dev)
  return false;
-  struct auto_handle {
-    explicit auto_handle(const path& p_)
-    : handle(CreateFileW(p_.c_str(), 0,
-  FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE,
-  0, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, 0))
-    { }
-
-    ~auto_handle()
-    { if (*this) CloseHandle(handle); }
-
-    explicit operator bool() const
-    { return handle != INVALID_HANDLE_VALUE; }
-
-    bool get_info()
-    { return GetFileInformationByHandle(handle, &info); }
-
-    HANDLE handle;
-    BY_HANDLE_FILE_INFORMATION info;
-  };
-  auto_handle h1(p1);
-  auto_handle h2(p2);
-  if (!h1 || !h2)
-    {
-  if (!h1 && !h2)
-    ec = __last_system_error();
-  return false;
-    }
-  if (!h1.get_info() || !h2.get_info())
-    {
-  ec = __last_system_error();
-  return false;
-    }
-  return h1.info.dwVolumeSerialNumber == h2.info.dwVolumeSerialNumber
-    && h1.info.nFileIndexHigh == h2.info.nFileIndexHigh
-    && h1.info.nFileIndexLow == h2.info.nFileIndexLow;
+  return __detail::equivalent_win32(p1.c_str(), p2.c_str(), ec);
  #else
    return st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino;
  #endif
diff --git a/libstdc++-v3/src/filesystem/ops-common.h 
b/libstdc++-v3/src/filesystem/ops-common.h

index d917fddbeb1..7e67286bd01 100644
--- a/libstdc++-v3/src/filesystem/ops-common.h
+++

[PATCH gcc-13] Fix RISC-V missing stack tie

2024-05-16 Thread Palmer Dabbelt

From: Jeff Law 

As some of you know, Raphael has been working on stack-clash support for the
RISC-V port.  A little while ago Florian reached out to us with an issue where
glibc was failing its smoke test due to referencing an unallocated stack slot.

Without diving into the code in detail I (incorrectly) concluded it was a
problem with the fallback of using Ada's stack-check paths due to not having
stack-clash support.

Once enough stack-clash bits were ready I had Raphael review the code generated
for Florian's test and we concluded the original case from Florian was just
wrong irrespective of stack clash/stack check.  While Raphael's stack-clash
work will indirectly fix Florian's case, it really should also work without
stack-clash.

In particular this code was called out by valgrind:

> 0003cb5e :
> __GI___realpath():
>    3cb5e:   81010113    addi    sp,sp,-2032
>    3cb62:   7d313423    sd      s3,1992(sp)
>    3cb66:   79fd        lui     s3,0xf
>    3cb68:   7e813023    sd      s0,2016(sp)
>    3cb6c:   7c913c23    sd      s1,2008(sp)
>    3cb70:   7f010413    addi    s0,sp,2032
>    3cb74:   35098793    addi    a5,s3,848 # f350 <__libc_initial+0xffe8946a>
>    3cb78:   74fd        lui     s1,0xf
>    3cb7a:   008789b3    add     s3,a5,s0
>    3cb7e:   f9048793    addi    a5,s1,-112 # ef90 <__libc_initial+0xffe890aa>
>    3cb82:   008784b3    add     s1,a5,s0
>    3cb86:   77fd        lui     a5,0xf
>    3cb88:   7d413023    sd      s4,1984(sp)
>    3cb8c:   7b513c23    sd      s5,1976(sp)
>    3cb90:   7e113423    sd      ra,2024(sp)
>    3cb94:   7d213823    sd      s2,2000(sp)
>    3cb98:   7b613823    sd      s6,1968(sp)
>    3cb9c:   7b713423    sd      s7,1960(sp)
>    3cba0:   7b813023    sd      s8,1952(sp)
>    3cba4:   79913c23    sd      s9,1944(sp)
>    3cba8:   79a13823    sd      s10,1936(sp)
>    3cbac:   79b13423    sd      s11,1928(sp)
>    3cbb0:   34878793    addi    a5,a5,840 # f348 <__libc_initial+0xffe89462>
>    3cbb4:   4713        li      a4,1024
>    3cbb8:   00132a17    auipc   s4,0x132
>    3cbbc:   ae0a3a03    ld      s4,-1312(s4) # 16e698 <__stack_chk_guard>
>    3cbc0:   01098893    addi    a7,s3,16
>    3cbc4:   42098693    addi    a3,s3,1056
>    3cbc8:   b8040a93    addi    s5,s0,-1152
>    3cbcc:   97a2        add     a5,a5,s0
>    3cbce:   000a3603    ld      a2,0(s4)
>    3cbd2:   f8c43423    sd      a2,-120(s0)
>    3cbd6:   4601        li      a2,0
>    3cbd8:   3d14b023    sd      a7,960(s1)
>    3cbdc:   3ce4b423    sd      a4,968(s1)
>    3cbe0:   7cd4b823    sd      a3,2000(s1)
>    3cbe4:   7ce4bc23    sd      a4,2008(s1)
>    3cbe8:   b7543823    sd      s5,-1168(s0)
>    3cbec:   b6e43c23    sd      a4,-1160(s0)
>    3cbf0:   e38c        sd      a1,0(a5)
>    3cbf2:   b0010113    addi    sp,sp,-1280

In particular note the store at 0x3cbd8.  That's hitting (s1 + 960). If you
chase the values around, you'll find it's a bit more than 1k into unallocated
stack space.  It's also worth noting the final stack adjustment at 0x3cbf2.

While I haven't reproduced Florian's code exactly, I was able to get reasonably
close and verify my suspicion that everything was fine before sched2 and
incorrect after sched2.  It was also obvious at that point what had gone wrong
-- we were missing a stack tie after the final stack pointer adjustment.

This patch adds the missing stack tie.

While not technically a regression, I shudder at the thought of chasing one of
these issues down again in the wild.  Been there, done that.

Regression tested on rv64gc.  Verified the scheduler no longer mucked up
realpath by hand.  Pushing to the trunk.

gcc/
* config/riscv/riscv.cc (riscv_expand_prologue): Add missing stack
tie for scalable and final stack adjustment if needed.

Co-authored-by: Raphael Zinsly 

(cherry picked from commit c65046ff2ef0a9a46e59bc0b3369b2d226f6a239)
---
I've only build tested this one, but it's tripping up some of the Fedora
folks here https://bugzilla.redhat.com/show_bug.cgi?id=2242327 so I
figured it's worth backporting.
---
 gcc/config/riscv/riscv.cc | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Andrew Pinski

On Thu, May 16, 2024, 7:46 PM Tamar Christina 
wrote:

> Hi Victor,
>
> > -Original Message-
> > From: Victor Do Nascimento 
> > Sent: Thursday, May 16, 2024 3:39 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford ; Richard Earnshaw
> > ; Victor Do Nascimento
> > 
> > Subject: [PATCH] middle-end: Expand {u|s}dot product support in
> autovectorizer
> >
> > From: Victor Do Nascimento 
> >
> > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> > optabs for dealing with vectorizable dot product code sequences.  The
> > consequence of using a direct optab for this is that backend-pattern
> > selection is only ever able to match against one datatype - Either
> > that of the operands or of the accumulated value, never both.
> >
> > With the introduction of the 2-way (un)signed dot-product insn [1][2]
> > in AArch64 SVE2, the existing direct opcode approach is no longer
> > sufficient for full specification of all the possible dot product
> > machine instructions to be matched to the code sequence; a dot product
> > resulting in VNx4SI may result from either dot products on VNx16QI or
> > VNx8HI values for the 4- and 2-way dot product operations, respectively.
> >
> > This means that the following example fails autovectorization:
> >
> > uint32_t foo(int n, uint16_t* data) {
> >   uint32_t sum = 0;
> >   for (int i=0; i<n; i++) {
> >     sum += data[i] * data[i];
> >   }
> >   return sum;
> > }
> >
> > To remedy the issue a new optab is added, tentatively named
> > `udot_prod_twoway_optab', whose selection is dependent upon checking
> > of both input and output types involved in the operation.
> >
> > In order to minimize changes to the existing codebase,
> > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> > argument is added to its signature - `const_tree otype', allowing type
> > information to be specified for both input and output types.  The
> > existing interface is retained by defining a new `optab_for_tree_code',
> > which serves as a shim to `optab_for_tree_code_1', passing old
> > parameters as-is and setting the new `optype' argument to `NULL_TREE'.
> >
> > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> > directly, passing it both types, adding the internal logic to the
> > function to distinguish between competing optabs.
> >
> > Finally, necessary changes are made to `expand_widen_pattern_expr' to
> > ensure the new icode can be correctly selected, given the new optab.
> >
> > [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> > Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> > [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> > Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64-sve2.md
> > (@aarch64_sve_dotvnx4sivnx8hi):
> >   renamed to `dot_prod_twoway_vnx8hi'.
> >   * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> >   update icodes used in line with above rename.
>
> Please split the target specific bits from the target agnostic parts.
> I.e. this patch series should be split in two.
>
> >   * optabs-tree.cc (optab_for_tree_code_1): Renamed
> >   `optab_for_tree_code' and added new argument.
> >   (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> >   * optabs-tree.h (optab_for_tree_code_1): New.
> >   * optabs.cc (expand_widen_pattern_expr): Expand support for
> >   DOT_PROD_EXPR patterns.
> >   * optabs.def (udot_prod_twoway_optab): New.
> >   (sdot_prod_twoway_optab): Likewise.
> >   * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> >   support for misc optabs that use two modes.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/vect/vect-dotprod-twoway.c: New.
> > ---
> >  .../aarch64/aarch64-sve-builtins-base.cc  |  4 ++--
> >  gcc/config/aarch64/aarch64-sve2.md|  2 +-
> >  gcc/optabs-tree.cc| 23 --
> >  gcc/optabs-tree.h |  2 ++
> >  gcc/optabs.cc |  2 +-
> >  gcc/optabs.def|  2 ++
> >  .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++
> >  gcc/tree-vect-patterns.cc |  2 +-
> >  8 files changed, 54 insertions(+), 7 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > index 0d2edf3f19e..e457db09f66 100644
> > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > @@ -764,8 +764,8 @@ public:
> >icode = (e.type_suffix (0).float_p
> >  ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
> >  : e.type_suffix (0).unsigned_p
> > -?

RE: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Tamar Christina

Hi Victor,

> -Original Message-
> From: Victor Do Nascimento 
> Sent: Thursday, May 16, 2024 3:39 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford ; Richard Earnshaw
> ; Victor Do Nascimento
> 
> Subject: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
> 
> From: Victor Do Nascimento 
> 
> At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> optabs for dealing with vectorizable dot product code sequences.  The
> consequence of using a direct optab for this is that backend-pattern
> selection is only ever able to match against one datatype - Either
> that of the operands or of the accumulated value, never both.
> 
> With the introduction of the 2-way (un)signed dot-product insn [1][2]
> in AArch64 SVE2, the existing direct opcode approach is no longer
> sufficient for full specification of all the possible dot product
> machine instructions to be matched to the code sequence; a dot product
> resulting in VNx4SI may result from either dot products on VNx16QI or
> VNx8HI values for the 4- and 2-way dot product operations, respectively.
> 
> This means that the following example fails autovectorization:
> 
> uint32_t foo(int n, uint16_t* data) {
>   uint32_t sum = 0;
>   for (int i=0; i<n; i++) {
>     sum += data[i] * data[i];
>   }
>   return sum;
> }
> 
> To remedy the issue a new optab is added, tentatively named
> `udot_prod_twoway_optab', whose selection is dependent upon checking
> of both input and output types involved in the operation.
> 
> In order to minimize changes to the existing codebase,
> `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> argument is added to its signature - `const_tree otype', allowing type
> information to be specified for both input and output types.  The
> existing interface is retained by defining a new `optab_for_tree_code',
> which serves as a shim to `optab_for_tree_code_1', passing old
> parameters as-is and setting the new `optype' argument to `NULL_TREE'.
> 
> For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> directly, passing it both types, adding the internal logic to the
> function to distinguish between competing optabs.
> 
> Finally, necessary changes are made to `expand_widen_pattern_expr' to
> ensure the new icode can be correctly selected, given the new optab.
> 
> [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-sve2.md
> (@aarch64_sve_dotvnx4sivnx8hi):
>   renamed to `dot_prod_twoway_vnx8hi'.
>   * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
>   update icodes used in line with above rename.

Please split the target specific bits from the target agnostic parts.
I.e. this patch series should be split in two.

>   * optabs-tree.cc (optab_for_tree_code_1): Renamed
>   `optab_for_tree_code' and added new argument.
>   (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
>   * optabs-tree.h (optab_for_tree_code_1): New.
>   * optabs.cc (expand_widen_pattern_expr): Expand support for
>   DOT_PROD_EXPR patterns.
>   * optabs.def (udot_prod_twoway_optab): New.
>   (sdot_prod_twoway_optab): Likewise.
>   * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
>   support for misc optabs that use two modes.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-dotprod-twoway.c: New.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc  |  4 ++--
>  gcc/config/aarch64/aarch64-sve2.md|  2 +-
>  gcc/optabs-tree.cc| 23 --
>  gcc/optabs-tree.h |  2 ++
>  gcc/optabs.cc |  2 +-
>  gcc/optabs.def|  2 ++
>  .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++
>  gcc/tree-vect-patterns.cc |  2 +-
>  8 files changed, 54 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> 
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 0d2edf3f19e..e457db09f66 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -764,8 +764,8 @@ public:
>icode = (e.type_suffix (0).float_p
>  ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
>  : e.type_suffix (0).unsigned_p
> -? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
> -: CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
> +? CODE_FOR_udot_prod_twoway_vnx8hi
> +: CODE_FOR_sdot_prod_twoway_vnx8hi);
>  return e.use_unpred_insn (icode);
>}
>  };
> diff --git

Re: [PATCH-4, rs6000] Implement optab_isnormal for SFmode, DFmode and TFmode [PR97786]

2024-05-16 Thread Segher Boessenkool

Hi!

On Fri, Apr 12, 2024 at 04:24:23PM +0800, HAO CHEN GUI wrote:
>   This patch implemented optab_isnormal for SF/DF/TFmode by rs6000 test
> data class instructions.
> 
>   This patch relies on former patch which adds optab_isnormal.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649366.html

> gcc/
>   PR target/97786
>   * config/rs6000/vsx.md (isnormal2): New expand for SFmode and
>   DFmode.

* config/rs6000/vsx.md (isnormal2 for SFDF): New expand.
(isnormal2 for IEEE128): New expand.

> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5357,6 +5357,30 @@ (define_expand "isfinite2"
>DONE;
>  })
> 
> +(define_expand "isnormal2"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
> + (use (match_operand:SFDF 1 "gpc_reg_operand"))]
> +  "TARGET_HARD_FLOAT
> +   && TARGET_P9_VECTOR"

Please put the condition on just one line if it is as simple and short
as this.

Why is TARGET_P9_VECTOR the correct condition?

> +{
> +  rtx tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];

This is an expander.  can_create_pseudo_p always returns true.  Please
simplify the code, keeping that in mind :-)

> +(define_expand "isnormal2"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
> + (use (match_operand:IEEE128 1 "gpc_reg_operand"))]
> +  "TARGET_HARD_FLOAT
> +   && TARGET_P9_VECTOR"
> +{
> +  rtx tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
> +  emit_insn (gen_xststdcqp_ (tmp, operands[1], GEN_INT (0x7f)));
> +  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
> +  DONE;
> +})

Same issues here, of course.
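
To make the intent of the quoted expanders concrete, here is a small C++
model of the logic, offered as a sketch and not as the rs6000
implementation.  It assumes that the mask 0x7f selects the NaN, infinity,
zero and denormal classes of the test data class instruction; the DC_*
bit values and the names classify, test_data_class and isnormal_model are
hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Illustrative class bits; assumed layout, not taken from the patch.
enum data_class : unsigned {
  DC_NAN        = 0x40,
  DC_POS_INF    = 0x20,
  DC_NEG_INF    = 0x10,
  DC_POS_ZERO   = 0x08,
  DC_NEG_ZERO   = 0x04,
  DC_POS_DENORM = 0x02,
  DC_NEG_DENORM = 0x01
};

// Map a value to exactly one class bit, or 0 for a normal number.
unsigned classify (double x)
{
  bool neg = std::signbit (x);
  switch (std::fpclassify (x))
    {
    case FP_NAN:       return DC_NAN;
    case FP_INFINITE:  return neg ? DC_NEG_INF : DC_POS_INF;
    case FP_ZERO:      return neg ? DC_NEG_ZERO : DC_POS_ZERO;
    case FP_SUBNORMAL: return neg ? DC_NEG_DENORM : DC_POS_DENORM;
    default:           return 0;
    }
}

// Model of the class test: 1 if X falls in any class selected by MASK.
int test_data_class (double x, unsigned mask)
{
  return (classify (x) & mask) != 0;
}

// Model of the expander body: test against all "not normal" classes,
// then invert the 0/1 result (the xor with const1_rtx in the patch).
int isnormal_model (double x)
{
  return test_data_class (x, 0x7f) ^ 1;
}
```

With this model, isnormal_model (1.0) yields 1, while zero, infinities,
NaNs and denormals all yield 0, which matches what std::isnormal reports
for the same inputs.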

> +

Why add random white lines?  Please don't.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx" } */

If you use a -mcpu=, don't use vsx_ok.

If you use a -mcpu=, don't use -mvsx.

> +int test1 (double x)
> +{
> +  return __builtin_isnormal (x);
> +}
> +
> +int test2 (float x)
> +{
> +  return __builtin_isnormal (x);
> +}
> +
> +/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */

Just \mfcmp please (so that it also catches fcmpo, if we ever generate
that).

> +/* { dg-final { scan-assembler-times {\mxststdc[sd]p\M} 2 } } */

Maybe you should test for one each of the s and d version?  So just
/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target lp64 } } */

Why run this on 64-bit systems only?  If there is a reason, document
that here (but is there a reason?)

> +/* { dg-require-effective-target ppc_float128_sw } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ieeelongdouble 
> -Wno-psabi" } */

Same comments here: If you have a -mcpu you do not want vsx_ok or -mvsx.

Please fix these things and resend.  Thanks!


Segher

Re: [Patch, aarch64] v6: Preparatory patch to place target independent and,dependent changed code in one file

2024-05-16 Thread Alex Coplan

Hi Ajit,

Thanks a lot for working through the review feedback.

The patch LGTM with the two minor suggested changes below.  I can't
approve the patch, though, so you'll need an OK from Richard S.

Also, I'm not sure if it makes sense to apply the patch in isolation, it
might make more sense to only apply it in series with follow-up patches to:
 - Finish renaming any bits of the generic code that need renaming (I
   guess we'll want to rename at least ldp_bb_info to something else,
   probably there are other bits too).
 - Move the generic parts out of gcc/config/aarch64 to a .cc file in the
   middle-end.

I'll let Richard S make the final judgement on that.  I don't really
mind either way.

On 15/05/2024 15:06, Ajit Agarwal wrote:
> Hello Alex/Richard:
> 
> All review comments are addressed.
> 
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
> 
> Target independent code is the Generic code with pure virtual function
> to interface between target independent and dependent code.
> 
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
> 
> Bootstrapped and regtested on aarch64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> aarch64: Preparatory patch to place target independent and
> dependent changed code in one file
> 
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
> 
> Target independent code is the Generic code with pure virtual function
> to interface between target independent and dependent code.
> 
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
> 
> 2024-05-15  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-ldp-fusion.cc: Place target
>   independent and dependent changed code.
> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 533 +++
>  1 file changed, 357 insertions(+), 176 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 1d9caeab05d..429e532ea3b 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -138,6 +138,225 @@ struct alt_base
>poly_int64 offset;
>  };
>  
> +// Virtual base class for load/store walkers used in alias analysis.
> +struct alias_walker
> +{
> +  virtual bool conflict_p (int ) const = 0;
> +  virtual insn_info *insn () const = 0;
> +  virtual bool valid () const = 0;
> +  virtual void advance () = 0;
> +};
> +
> +// When querying handle_writeback_opportunities, this enum is used to
> +// qualify which opportunities we are asking about.
> +enum class writeback {
> +  // Only those writeback opportunities that arise from existing
> +  // auto-increment accesses.
> +  EXISTING,

Very minor nit: I think an extra blank line here would be nice for readability
now that the enumerators have comments above.

> +  // All writeback opportunities including those that involve folding
> +  // base register updates into a non-writeback pair.
> +  ALL
> +};
> +

Can we have a block comment here which describes the purpose of the
class and how it fits together with the target?  Something like the
following would do:

// This class can be overridden by targets to give a pass that fuses
// adjacent loads and stores into load/store pair instructions.
//
// The target can override the various virtual functions to customize
// the behaviour of the pass as appropriate for the target.
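
As a rough illustration of the structure that comment describes, here is
a self-contained C++ sketch of a generic base class with virtual hooks
and a target-specific subclass.  It is not the code from the patch: the
types toy_mode, toy_pair_fusion and toy_target_pair_fusion, the driver
count_candidates, and the choice of which modes are accepted are all
hypothetical.

```cpp
#include <cassert>

// Stand-in for machine_mode; hypothetical values for the sketch only.
enum class toy_mode { SI, DI, V16QI };

// Generic (target-independent) part: owns the driver logic and calls
// virtual hooks wherever a target decision is needed.
struct toy_pair_fusion
{
  virtual ~toy_pair_fusion () = default;

  // Hook with a default: targets may override it, but need not.
  virtual bool fpsimd_op_p (toy_mode) { return false; }

  // Pure virtual hook: every target must answer this question.
  virtual bool pair_operand_mode_ok_p (toy_mode mode) = 0;

  // Generic driver: counts the accesses the target is willing to pair.
  int count_candidates (const toy_mode *modes, int n)
  {
    int count = 0;
    for (int i = 0; i < n; ++i)
      if (pair_operand_mode_ok_p (modes[i]))
        ++count;
    return count;
  }
};

// Target-dependent part: only supplies the answers to the hooks.
struct toy_target_pair_fusion : toy_pair_fusion
{
  bool pair_operand_mode_ok_p (toy_mode mode) override
  {
    // Assumption for the sketch: only scalar modes are paired.
    return mode == toy_mode::SI || mode == toy_mode::DI;
  }
};
```

A caller would instantiate toy_target_pair_fusion and invoke
count_candidates on it; the generic driver never needs to know which
target supplied the hooks.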

> +struct pair_fusion {
> +  pair_fusion ()
> +  {
> +calculate_dominance_info (CDI_DOMINATORS);
> +df_analyze ();
> +crtl->ssa = new rtl_ssa::function_info (cfun);
> +  };
> +
> +  // Given:
> +  // - an rtx REG_OP, the non-memory operand in a load/store insn,
> +  // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and
> +  // - a boolean LOAD_P (true iff the insn is a load), then:
> +  // return true if the access should be considered an FP/SIMD access.
> +  // Such accesses are segregated from GPR accesses, since we only want
> +  // to form pairs for accesses that use the same register file.
> +  virtual bool fpsimd_op_p (rtx, machine_mode, bool)
> +  {
> +return false;
> +  }
> +
> +  // Return true if we should consider forming pairs from memory
> +  // accesses with operand mode MODE at this stage in compilation.
> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
> +
> +  // Return true iff REG_OP is a suitable register operand for a paired
> +  // memory access, where LOAD_P is true if we're asking about loads and
> +  // false for stores.  MODE gives the mode of the operand.
> +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
> +   machine_mode mode) = 0;
> +
> +  // Return alias check limit.
> +  // This is needed to avoid

[PATCH] attribs: Fix and refactor diag_attr_exclusions

2024-05-16 Thread Andrew Carlotti

The existing implementation of this function was convoluted, and had
multiple control flow errors that became apparent to me while reading
the code:

1. The initial early return only checked the properties of the first
exclusion in the list, when these properties could be different for
subsequent exclusions.

2. excl was not reset within the outer loop, so the inner loop body
would only execute during the first iteration of the outer loop.  This
effectively meant that the value of attrs[1] was ignored.

3. The function called itself recursively twice, with both last_decl and
TREE_TYPE (last_decl) as parameters. The second recursive call should
have been redundant, since attrs[1] = TREE_TYPE (last_decl) during the
first recursive call.
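
Issue 2 is easy to reproduce outside of GCC.  The following
self-contained C++ sketch only models the loop shape, under the
assumption that the exclusion list is terminated by an entry with a null
name; the exclusion struct and the functions visits_buggy and
visits_fixed are hypothetical and are not code from attribs.cc.

```cpp
#include <cassert>

// Minimal stand-in for an exclusion entry; a null name ends the list.
struct exclusion { const char *name; };

// Buggy shape: the cursor lives outside the outer loop and is never
// reset, so it is already at the terminator on the second iteration.
int visits_buggy (const exclusion *list, int n_nodes)
{
  int visits = 0;
  const exclusion *excl = list;
  for (int i = 0; i < n_nodes; ++i)
    for (; excl->name; ++excl)
      ++visits;
  return visits;
}

// Fixed shape: the cursor is initialised in the inner loop header, so
// every node is checked against every exclusion.
int visits_fixed (const exclusion *list, int n_nodes)
{
  int visits = 0;
  for (int i = 0; i < n_nodes; ++i)
    for (const exclusion *excl = list; excl->name; ++excl)
      ++visits;
  return visits;
}
```

With three exclusions and two nodes, the buggy shape runs the inner body
three times in total and the fixed shape runs it six times, which is the
difference between ignoring the second node and checking it.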

This patch eliminates the early return, and combines the checks with
those present within the inner loop.  It also fixes the inner loop
initialisation, and modifies the outer loop to iterate over nodes
instead of their attributes. This latter change allows the recursion to
be eliminated, by extending the new nodes array to include last_decl
(and its type) as well.

This patch provides an alternative fix for PR114634, although I wasn't
aware of that issue until rebasing on top of Jakub's fix.

I am not aware of any other compiler bugs resulting from these issues.
However, if the exclusions for target_clones were listed in the opposite
order, then it would have broken detection of the always_inline
exclusion on aarch64 (where TARGET_HAS_FMV_TARGET_ATTRIBUTE is false).

Is this ok for master?

gcc/ChangeLog:

* attribs.cc (diag_attr_exclusions): Fix and refactor.


diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index 
3ab0b0fd87a4404a593b2de365ea5226e31fe24a..431dd4255e68e92dd8d10bbb21ea079e50811faa
 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -433,84 +433,69 @@ get_attribute_namespace (const_tree attr)
or a TYPE.  */
 
 static bool
-diag_attr_exclusions (tree last_decl, tree node, tree attrname,
+diag_attr_exclusions (tree last_decl, tree base_node, tree attrname,
  const attribute_spec *spec)
 {
-  const attribute_spec::exclusions *excl = spec->exclude;
 
-  tree_code code = TREE_CODE (node);
+  /* BASE_NODE is either the current decl to which the attribute is being
+ applied, or its type.  For the former, consider the attributes on both the
+ decl and its type.  Check both LAST_DECL and its type as well.  */
 
-  if ((code == FUNCTION_DECL && !excl->function
-   && (!excl->type || !spec->affects_type_identity))
-  || (code == VAR_DECL && !excl->variable
- && (!excl->type || !spec->affects_type_identity))
-  || (((code == TYPE_DECL || RECORD_OR_UNION_TYPE_P (node)) && 
!excl->type)))
-return false;
+  tree nodes[4] = { NULL_TREE, NULL_TREE, NULL_TREE, NULL_TREE };
 
-  /* True if an attribute that's mutually exclusive with ATTRNAME
- has been found.  */
-  bool found = false;
+  nodes[0] = base_node;
+  if (DECL_P (base_node))
+  nodes[1] = (TREE_TYPE (base_node));
 
-  if (last_decl && last_decl != node && TREE_TYPE (last_decl) != node)
+  if (last_decl)
 {
-  /* Check both the last DECL and its type for conflicts with
-the attribute being added to the current decl or type.  */
-  found |= diag_attr_exclusions (last_decl, last_decl, attrname, spec);
-  tree decl_type = TREE_TYPE (last_decl);
-  found |= diag_attr_exclusions (last_decl, decl_type, attrname, spec);
+  nodes[2] = last_decl;
+  if (DECL_P (last_decl))
+ nodes[3] = TREE_TYPE (last_decl);
 }
 
-  /* NODE is either the current DECL to which the attribute is being
- applied or its TYPE.  For the former, consider the attributes on
- both the DECL and its type.  */
-  tree attrs[2];
-
-  if (DECL_P (node))
-{
-  attrs[0] = DECL_ATTRIBUTES (node);
-  if (TREE_TYPE (node))
-   attrs[1] = TYPE_ATTRIBUTES (TREE_TYPE (node));
-  else
-   /* TREE_TYPE can be NULL e.g. while processing attributes on
-  enumerators.  */
-   attrs[1] = NULL_TREE;
-}
-  else
-{
-  attrs[0] = TYPE_ATTRIBUTES (node);
-  attrs[1] = NULL_TREE;
-}
+  /* True if an attribute that's mutually exclusive with ATTRNAME
+ has been found.  */
+  bool found = false;
 
   /* Iterate over the mutually exclusive attribute names and verify
  that the symbol doesn't contain it.  */
-  for (unsigned i = 0; i != ARRAY_SIZE (attrs); ++i)
+  for (unsigned i = 0; i != ARRAY_SIZE (nodes); ++i)
 {
-  if (!attrs[i])
+  tree node = nodes[i];
+
+  if (!node)
continue;
 
-  for ( ; excl->name; ++excl)
+  tree attr;
+  if DECL_P (node)
+   attr = DECL_ATTRIBUTES (node);
+  else
+   attr = TYPE_ATTRIBUTES (node);
+
+  tree_code code = TREE_CODE (node);
+
+  for (auto excl = spec->exclude; excl->name; ++excl)
{
  /* Avoid checking the attribute against itself.  */
  if (is_attribute_p (excl->name,

[PATCH] c++: paren aggr CTAD with base classes [PR115114]

2024-05-16 Thread Patrick Palka

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk and perhaps 14?

-- >8 --

We're accidentally ignoring base classes during parenthesized aggregate
CTAD because the TYPE_FIELDS of a template type doesn't contain bases,
so we need to consider them separately.

PR c++/115114

gcc/cp/ChangeLog:

* pt.cc (maybe_aggr_guide): Consider base classes in the paren
init case.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/class-deduction-aggr15.C: New test.
---
 gcc/cp/pt.cc  |  7 ++
 .../g++.dg/cpp2a/class-deduction-aggr15.C | 23 +++
 2 files changed, 30 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr15.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index d83f530ac8d..54d74989903 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -30202,6 +30202,13 @@ maybe_aggr_guide (tree tmpl, tree init, 
vec *args)
   else if (TREE_CODE (init) == TREE_LIST)
 {
   int len = list_length (init);
+  for (tree binfo : BINFO_BASE_BINFOS (TYPE_BINFO (template_type)))
+   {
+ if (!len)
+   break;
+ parms = tree_cons (NULL_TREE, BINFO_TYPE (binfo), parms);
+ --len;
+   }
   for (tree field = TYPE_FIELDS (template_type);
   len;
   --len, field = DECL_CHAIN (field))
diff --git a/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr15.C 
b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr15.C
new file mode 100644
index 000..16dc0f52b64
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/class-deduction-aggr15.C
@@ -0,0 +1,23 @@
+// PR c++/115114
+// { dg-do compile { target c++20 } }
+
+struct X {} x;
+struct Y {} y;
+
+template
+struct A : T {
+  U m;
+};
+
+using ty1 = decltype(A{x, 42}); // OK
+using ty1 = decltype(A(x, 42)); // OK, used to fail
+using ty1 = A;
+
+template
+struct B : T, V {
+  U m = 42;
+};
+
+using ty2 = decltype(B{x, y}); // OK
+using ty2 = decltype(B(x, y)); // OK, used to fail
+using ty2 = B;
-- 
2.45.1.190.g19fe900cfc

Re: [PATCH] report message for operator %a on unaddressible exp

2024-05-16 Thread Segher Boessenkool

Hi!

On Thu, May 16, 2024 at 02:56:49PM +0800, Jiufu Guo wrote:
> Jiufu Guo  writes:
> > Segher Boessenkool  writes:
> >> On Tue, May 14, 2024 at 05:53:56PM +0800, Jiufu Guo wrote:
> >>> Thanks so much for your great review!
> >>> Reference other messages, I'm wondering "invalid %%a value" may be
> >>> acceptable, or "invalid %%a address expression in TOC" maybe better.
> >>
> >> "%%a requires a memory operand"?  Maybe even print out the actual
> >> operand given, too.
> >
> > Thanks! I updated the code using:
> > "%%a requires a memory reference operand", since the actual operand
> > is treated as the address.
> 
> I suspect one thing here: if "%%a requires memory" is accurate vs.
> "%%a requires a memory reference".
> 
> Reference the words from doc:
> https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Generic-Operand-Modifiers
> a: Substitute a memory reference, with the actual operand treated as the
> address.
> 
> And for below code:
> '("#%a0" : :"m"(x))' is not accepted.

Yeah, it always confuses me.  Sorry.  The operand is the actual address.

> While '("#%a0" : :"r"())' is ok.
> 
> So, it may be more accurate that: "%%a" as requirement of address of
> memory.

That sounds good yes.


Segher

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Andrew Pinski

On Thu, May 16, 2024, 4:40 PM Victor Do Nascimento <
victor.donascime...@arm.com> wrote:

> From: Victor Do Nascimento 
>
> At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> optabs for dealing with vectorizable dot product code sequences.  The
> consequence of using a direct optab for this is that backend-pattern
> selection is only ever able to match against one datatype - Either
> that of the operands or of the accumulated value, never both.
>
> With the introduction of the 2-way (un)signed dot-product insn [1][2]
> in AArch64 SVE2, the existing direct opcode approach is no longer
> sufficient for full specification of all the possible dot product
> machine instructions to be matched to the code sequence; a dot product
> resulting in VNx4SI may result from either dot products on VNx16QI or
> VNx8HI values for the 4- and 2-way dot product operations, respectively.
>
> This means that the following example fails autovectorization:
>
> uint32_t foo(int n, uint16_t* data) {
>   uint32_t sum = 0;
>   for (int i=0; i<n; i++) {
>     sum += data[i] * data[i];
>   }
>   return sum;
> }
>
> To remedy the issue a new optab is added, tentatively named
> `udot_prod_twoway_optab', whose selection is dependent upon checking
> of both input and output types involved in the operation.
>
> In order to minimize changes to the existing codebase,
> `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> argument is added to its signature - `const_tree otype', allowing type
> information to be specified for both input and output types.  The
> existing interface is retained by defining a new `optab_for_tree_code',
> which serves as a shim to `optab_for_tree_code_1', passing old
> parameters as-is and setting the new `optype' argument to `NULL_TREE'.
>
> For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> directly, passing it both types, adding the internal logic to the
> function to distinguish between competing optabs.
>
> Finally, necessary changes are made to `expand_widen_pattern_expr' to
> ensure the new icode can be correctly selected, given the new optab.
>

Since you are adding an optab, please update the internals manual with the
documentation of the optab (the standard pattern names section).

Thanks,
Andrew


> [1]
> https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> [2]
> https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-sve2.md
> (@aarch64_sve_dotvnx4sivnx8hi):
> renamed to `dot_prod_twoway_vnx8hi'.
> * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> update icodes used in line with above rename.
> * optabs-tree.cc (optab_for_tree_code_1): Renamed
> `optab_for_tree_code' and added new argument.
> (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> * optabs-tree.h (optab_for_tree_code_1): New.
> * optabs.cc (expand_widen_pattern_expr): Expand support for
> DOT_PROD_EXPR patterns.
> * optabs.def (udot_prod_twoway_optab): New.
> (sdot_prod_twoway_optab): Likewise.
> * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> support for misc optabs that use two modes.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-dotprod-twoway.c: New.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc  |  4 ++--
>  gcc/config/aarch64/aarch64-sve2.md|  2 +-
>  gcc/optabs-tree.cc| 23 --
>  gcc/optabs-tree.h |  2 ++
>  gcc/optabs.cc |  2 +-
>  gcc/optabs.def|  2 ++
>  .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++
>  gcc/tree-vect-patterns.cc |  2 +-
>  8 files changed, 54 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 0d2edf3f19e..e457db09f66 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -764,8 +764,8 @@ public:
>icode = (e.type_suffix (0).float_p
>? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
>: e.type_suffix (0).unsigned_p
> -  ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
> -  : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
> +  ? CODE_FOR_udot_prod_twoway_vnx8hi
> +  : CODE_FOR_sdot_prod_twoway_vnx8hi);
>  return e.use_unpred_insn (icode);
>}
>  };
> diff --git a/gcc/config/aarch64/aarch64-sve2.md
> b/gcc/config/aarch64/aarch64-sve2.md
> index 934e57055d3..5677de7108d 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
>

[PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Victor Do Nascimento

From: Victor Do Nascimento 

At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
optabs for dealing with vectorizable dot product code sequences.  The
consequence of using a direct optab for this is that backend-pattern
selection is only ever able to match against one datatype - Either
that of the operands or of the accumulated value, never both.

With the introduction of the 2-way (un)signed dot-product insn [1][2]
in AArch64 SVE2, the existing direct opcode approach is no longer
sufficient for full specification of all the possible dot product
machine instructions to be matched to the code sequence; a dot product
resulting in VNx4SI may result from either dot products on VNx16QI or
VNx8HI values for the 4- and 2-way dot product operations, respectively.

This means that the following example fails autovectorization:

uint32_t foo(int n, uint16_t* data) {
  uint32_t sum = 0;
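  for (int i=0; i<n; i++) {
    sum += data[i] * data[i];
  }
  return sum;
}

To remedy the issue a new optab is added, tentatively named
`udot_prod_twoway_optab', whose selection is dependent upon checking
of both input and output types involved in the operation.

In order to minimize changes to the existing codebase,
`optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
argument is added to its signature - `const_tree otype', allowing type
information to be specified for both input and output types.  The
existing interface is retained by defining a new `optab_for_tree_code',
which serves as a shim to `optab_for_tree_code_1', passing old
parameters as-is and setting the new `optype' argument to `NULL_TREE'.

For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
directly, passing it both types, adding the internal logic to the
function to distinguish between competing optabs.

Finally, necessary changes are made to `expand_widen_pattern_expr' to
ensure the new icode can be correctly selected, given the new optab.

[1] 
https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-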
[2] 
https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
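
For readers who want to experiment with the example loop, here is a
self-contained C++ sketch.  The first function is the scalar loop from
this description; the second is only a toy model of what "2-way" means
(each 32-bit accumulator lane absorbs the products of two adjacent
16-bit elements) and makes no claim about the code the vectorizer
actually emits.  The lane count of 4, the name foo_twoway_model and the
explicit widening to uint32_t (added here to keep the multiplication
unsigned) are assumptions of the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Scalar reference: 16-bit inputs, 32-bit accumulator.
uint32_t foo (int n, const uint16_t *data)
{
  uint32_t sum = 0;
  for (int i = 0; i < n; i++)
    sum += static_cast<uint32_t> (data[i]) * data[i];
  return sum;
}

// Toy model of a 2-way dot product with a fixed, hypothetical number
// of 32-bit lanes: lane L accumulates elements 2L and 2L+1 of each
// block of 2 * lanes inputs.
uint32_t foo_twoway_model (int n, const uint16_t *data)
{
  constexpr int lanes = 4;
  uint32_t acc[lanes] = { 0, 0, 0, 0 };
  int i = 0;
  for (; i + 2 * lanes <= n; i += 2 * lanes)
    for (int l = 0; l < lanes; l++)
      for (int k = 0; k < 2; k++)
        {
          uint32_t v = data[i + 2 * l + k];
          acc[l] += v * v;
        }

  // Final reduction across lanes, then a scalar tail for leftovers.
  uint32_t sum = 0;
  for (int l = 0; l < lanes; l++)
    sum += acc[l];
  for (; i < n; i++)
    {
      uint32_t v = data[i];
      sum += v * v;
    }
  return sum;
}
```

Because unsigned 32-bit addition wraps and is associative, the lane-wise
model and the scalar loop return the same value for any input, which is
what makes the reassociation legal in the first place.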

gcc/ChangeLog:

* config/aarch64/aarch64-sve2.md (@aarch64_sve_dotvnx4sivnx8hi):
renamed to `dot_prod_twoway_vnx8hi'.
* config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
update icodes used in line with above rename.
* optabs-tree.cc (optab_for_tree_code_1): Renamed
`optab_for_tree_code' and added new argument.
(optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
* optabs-tree.h (optab_for_tree_code_1): New.
* optabs.cc (expand_widen_pattern_expr): Expand support for
DOT_PROD_EXPR patterns.
* optabs.def (udot_prod_twoway_optab): New.
(sdot_prod_twoway_optab): Likewise.
* tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
support for misc optabs that use two modes.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-dotprod-twoway.c: New.
---
 .../aarch64/aarch64-sve-builtins-base.cc  |  4 ++--
 gcc/config/aarch64/aarch64-sve2.md|  2 +-
 gcc/optabs-tree.cc| 23 --
 gcc/optabs-tree.h |  2 ++
 gcc/optabs.cc |  2 +-
 gcc/optabs.def|  2 ++
 .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++
 gcc/tree-vect-patterns.cc |  2 +-
 8 files changed, 54 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 0d2edf3f19e..e457db09f66 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -764,8 +764,8 @@ public:
   icode = (e.type_suffix (0).float_p
   ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
   : e.type_suffix (0).unsigned_p
-  ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
-  : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
+  ? CODE_FOR_udot_prod_twoway_vnx8hi
+  : CODE_FOR_sdot_prod_twoway_vnx8hi);
 return e.use_unpred_insn (icode);
   }
 };
diff --git a/gcc/config/aarch64/aarch64-sve2.md 
b/gcc/config/aarch64/aarch64-sve2.md
index 934e57055d3..5677de7108d 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -2021,7 +2021,7 @@ (define_insn "@aarch64_sve_qsub__lane_"
 )
 
 ;; Two-way dot-product.
-(define_insn "@aarch64_sve_dotvnx4sivnx8hi"
+(define_insn "dot_prod_twoway_vnx8hi"
   [(set (match_operand:VNx4SI 0 "register_operand")
(plus:VNx4SI
  (unspec:VNx4SI
diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
index b69a5bc3676..e3c5a618ea2 100644
--- a/gcc/optabs-tree.cc
+++ b/gcc/optabs-tree.cc
@@ -35,8 +35,8 @@ along with GCC; see the file COPYING3.  If not see
cannot give complete results for multiplication or division) but probably
ought to be relied on more widely throughout the expander.  */
 optab
-optab_for_tree_code (enum tree_code code, const_tree type,
-enum optab_subtype subtype)
+optab_for_tree_code_1 (enum tree_code code, const_tree type,
+  const_tree otype, enum optab_subtype subtype)
 {
   bool trapv;
   switch (code)
@@ -149,6 +149,14 @@ optab_for_tree_code (enum tree_code code, const_tree type,
 
 case DOT_PROD_EXPR:
   {
+   if (otype && (TYPE_PRECISION (TREE_TYPE (type)) * 2
+ == TYPE_PRECISION (TREE_TYPE (otype))))
+ {
+   if (TYPE_UNSIGNED (type) && TYPE_UNSIGNED (otype))
+ return udot_prod_twoway_optab;
+   if (!TYPE_UNSIGNED (type) && !TYPE_UNSIGNED (otype))
+ return sdot_prod_twoway_optab;
+ }
if (subtype ==

Re: [PATCH] middle-end: Drop __builtin_pretech calls in autovectorization [PR114061]'

2024-05-16 Thread Victor Do Nascimento


On 5/16/24 15:16, Andrew Pinski wrote:



On Thu, May 16, 2024, 3:58 PM Victor Do Nascimento
<victor.donascime...@arm.com> wrote:


At present the autovectorizer fails to vectorize simple loops
involving calls to `__builtin_prefetch'.  A simple example of such
loop is given below:

void foo(double * restrict a, double * restrict b, int n){
   int i;
   for(i=0; i [...]

This should most likely be tree-optimization/114061, since it is a generic
vectorizer issue. Oh, maybe reference the bug # in the summary next time, just
for easier reference.


Thanks,
Andrew


My bad.

You're right, it's tree-optimization/114061.  Thanks for catching this.

Cheers,
Victor



gcc/ChangeLog:

         * tree-data-ref.cc (get_references_in_stmt): set
         `clobbers_memory' to false for __builtin_prefetch.
         * tree-vect-loop.cc (vect_transform_loop): Drop all
         __builtin_prefetch calls from loops.

gcc/testsuite/ChangeLog:

         * gcc.dg/vect/vect-prefetch-drop.c: New test.
---
  gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c | 14 ++
  gcc/tree-data-ref.cc                           |  9 +
  gcc/tree-vect-loop.cc                          |  7 ++-
  3 files changed, 29 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
new file mode 100644
index 000..57723a8c972
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-additional-options "-march=-O3 -march=armv9.2-a+sve
-fdump-tree-vect-details" { target { aarch64*-*-* } } } */
+
+void foo(double * restrict a, double * restrict b, int n){
+  int i;
+  for(i=0; i [...]
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */

diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
index f37734b5340..47bfec0f922 100644
--- a/gcc/tree-data-ref.cc
+++ b/gcc/tree-data-ref.cc
@@ -5843,6 +5843,15 @@ get_references_in_stmt (gimple *stmt,
vec *references)
             clobbers_memory = true;
             break;
           }
+
+      else if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
+       {
+         enum built_in_function fn_type = DECL_FUNCTION_CODE
(TREE_OPERAND (gimple_call_fn (stmt), 0));
+         if (fn_type == BUILT_IN_PREFETCH)
+           clobbers_memory = false;
+         else
+           clobbers_memory = true;
+       }
        else
         clobbers_memory = true;
      }
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 361aec06488..65e8b421d80 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -12069,13 +12069,18 @@ vect_transform_loop (loop_vec_info
loop_vinfo, gimple *loop_vectorized_call)
            !gsi_end_p (si);)
         {
           stmt = gsi_stmt (si);
-         /* During vectorization remove existing clobber stmts.  */
+         /* During vectorization remove existing clobber stmts and
+            prefetches.  */
           if (gimple_clobber_p (stmt))
             {
               unlink_stmt_vdef (stmt);
               gsi_remove (&si, true);
               release_defs (stmt);
             }
+         else if (gimple_call_builtin_p (stmt) &&
+                  DECL_FUNCTION_CODE (TREE_OPERAND (gimple_call_fn
(stmt),
+                                                    0)) ==
BUILT_IN_PREFETCH)
+               gsi_remove (&si, true);
           else
             {
               /* Ignore vector stmts created in the outer loop.  */
-- 
2.34.1

Re: Re: [PATCH] RISC-V: Do not allow v0 as dest when merging [PR115068].

2024-05-16 Thread 钟居哲

LGTM this patch (fix for vfwadd.wf).

And here is a simple case to reproduce the same bug for vwadd.wx:

https://compiler-explorer.com/z/4rP9Yvdq1

#include 
#include 

vint64m8_t test_vwadd_wx_i64m8_m(vbool8_t vm, vint64m8_t vs2, int rs1, size_t vl) {
  return __riscv_vwadd_wx_i64m8_m(vm, vs2, rs1, vl);
}

char global_memory[1024];
void *fake_memory = (void *)global_memory;

int main ()
{
  asm volatile("fence":::"memory");
  long x;
  asm volatile("":"=r"(x)::"memory");
  vint64m8_t vwadd_wx_i64m8_m_vd = test_vwadd_wx_i64m8_m(
__riscv_vreinterpret_v_i8m1_b8(__riscv_vundefined_i8m1()), 
__riscv_vundefined_i64m8(), x, __riscv_vsetvlmax_e64m8());
  asm volatile(""::"vr"(vwadd_wx_i64m8_m_vd):"memory");

  return 0;
}

main:
fence
vsetvli a4,zero,e32,m4,ta,ma
vwadd.wx v0,v8,a5,v0.t   --> vd and vm are both v0, which is wrong.
li  a0,0
ret


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-05-16 03:31
To: 钟居哲; gcc-patches
CC: rdapp.gcc; palmer; kito.cheng; Jeff Law
Subject: Re: [PATCH] RISC-V: Do not allow v0 as dest when merging [PR115068].
> I saw vwadd/vwsub.wx have same issue. Could you change them and add test too ?
 
Yes, will do.  At first I didn't manage to reproduce it because we
seem to be lacking a combine-opt pattern for it.  I'm going to post
it separately.
 
Regards
Robin

Re: [PATCH] middle-end: Drop __builtin_pretech calls in autovectorization [PR114061]'

2024-05-16 Thread Andrew Pinski

On Thu, May 16, 2024, 3:58 PM Victor Do Nascimento <
victor.donascime...@arm.com> wrote:

> At present the autovectorizer fails to vectorize simple loops
> involving calls to `__builtin_prefetch'.  A simple example of such
> loop is given below:
>
> void foo(double * restrict a, double * restrict b, int n){
>   int i;
>   for(i=0; i [...]
> a[i] = a[i] + b[i];
> __builtin_prefetch(&(b[i+8]));
>   }
> }
>
> The failure stems from two issues:
>
> 1. Given that it is typically not possible to fully reason about a
>    function call due to the possibility of side effects, the
>    autovectorizer does not attempt to vectorize loops which make such
>    calls.
>
>    Given the memory reference passed to `__builtin_prefetch', in the
>    absence of assurances about its effect on the passed memory
>    location the compiler deems the function unsafe to vectorize,
>    marking it as clobbering memory in `vect_find_stmt_data_reference'.
>    This leads to the failure in autovectorization.
>
> 2. Notwithstanding the above issue, though the prefetch statement
>    would be classed as `vect_unused_in_scope', the induction variable
>    that is used in the address of the prefetch is the scalar loop's and
>    not the vector loop's IV. That is, it still uses `i' and not `vec_iv'
>    because the instruction wasn't vectorized, causing DCE to think the
>    value is live, such that we now have both the vector and scalar
>    induction variables actively used in the loop.
>
> This patch addresses both of these:
>
> 1. About the issue regarding the memory clobber, data prefetch does
>    not generate faults if its address argument is invalid and does not
>    write to memory.  Therefore, it does not alter the internal state
>    of the program or its control flow under any circumstance.  As
>    such, it is reasonable that the function be marked as not affecting
>    memory contents.
>
>    To achieve this, we add the necessary logic to
>    `get_references_in_stmt' to ensure that builtin functions are
>    given the same treatment as internal functions.  If the gimple call
>    is to a builtin function and its function code is
>    `BUILT_IN_PREFETCH', we mark `clobbers_memory' as false.
>
> 2. Finding precedent in the way clobber statements are handled,
>    whereby the vectorizer drops these from both the scalar and
>    vectorized versions of a given loop, we choose to drop prefetch
>    hints in a similar fashion.  This seems appropriate given how
>    software prefetch hints are typically ignored by processors across
>    architectures, as they seldom lead to performance gain over their
>    hardware counterparts.
>
>        PR target/114061
>

This should most likely be tree-optimization/114061, since it is a generic
vectorizer issue. Oh, maybe reference the bug # in the summary next time, just
for easier reference.

Thanks,
Andrew


> gcc/ChangeLog:
>
> * tree-data-ref.cc (get_references_in_stmt): set
> `clobbers_memory' to false for __builtin_prefetch.
> * tree-vect-loop.cc (vect_transform_loop): Drop all
> __builtin_prefetch calls from loops.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-prefetch-drop.c: New test.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c | 14 ++
>  gcc/tree-data-ref.cc   |  9 +
>  gcc/tree-vect-loop.cc  |  7 ++-
>  3 files changed, 29 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> new file mode 100644
> index 000..57723a8c972
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-additional-options "-march=-O3 -march=armv9.2-a+sve
> -fdump-tree-vect-details" { target { aarch64*-*-* } } } */
> +
> +void foo(double * restrict a, double * restrict b, int n){
> +  int i;
> +  for(i=0; i [...]
> +a[i] = a[i] + b[i];
> +__builtin_prefetch(&(b[i+8]));
> +  }
> +}
> +
> +/* { dg-final { scan-assembler-not "prfm" } } */
> +/* { dg-final { scan-assembler "fadd\tz\[0-9\]+.d, p\[0-9\]+/m,
> z\[0-9\]+.d, z\[0-9\]+.d" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
> diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
> index f37734b5340..47bfec0f922 100644
> --- a/gcc/tree-data-ref.cc
> +++ b/gcc/tree-data-ref.cc
> @@ -5843,6 +5843,15 @@ get_references_in_stmt (gimple *stmt,
> vec *references)
> clobbers_memory = true;
> break;
>   }
> +
> +  else if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
> +   {
> + enum built_in_function fn_type = DECL_FUNCTION_CODE
> (TREE_OPERAND (gimple_call_fn (stmt), 0));
> + if (fn_type == BUILT_IN_PREFETCH)
> +   clobbers_memory = false;
> + else
> +   clobbers_memory =

[PATCH] middle-end: Drop __builtin_pretech calls in autovectorization [PR114061]'

2024-05-16 Thread Victor Do Nascimento

At present the autovectorizer fails to vectorize simple loops
involving calls to `__builtin_prefetch'.  A simple example of such
loop is given below:

void foo(double * restrict a, double * restrict b, int n){
  int i;
  for(i=0; i [...]

 *references)
clobbers_memory = true;
break;
  }
+
+  else if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
+   {
+ enum built_in_function fn_type = DECL_FUNCTION_CODE (TREE_OPERAND 
(gimple_call_fn (stmt), 0));
+ if (fn_type == BUILT_IN_PREFETCH)
+   clobbers_memory = false;
+ else
+   clobbers_memory = true;
+   }
   else
clobbers_memory = true;
 }
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 361aec06488..65e8b421d80 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -12069,13 +12069,18 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
*loop_vectorized_call)
   !gsi_end_p (si);)
{
  stmt = gsi_stmt (si);
- /* During vectorization remove existing clobber stmts.  */
+ /* During vectorization remove existing clobber stmts and
+prefetches.  */
  if (gimple_clobber_p (stmt))
{
  unlink_stmt_vdef (stmt);
  gsi_remove (&si, true);
  release_defs (stmt);
}
+ else if (gimple_call_builtin_p (stmt) &&
+  DECL_FUNCTION_CODE (TREE_OPERAND (gimple_call_fn (stmt),
+0)) == BUILT_IN_PREFETCH)
+   gsi_remove (&si, true);
  else
{
  /* Ignore vector stmts created in the outer loop.  */
-- 
2.34.1
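
As an aside on the claim that a prefetch is only a hint, the following is
a minimal sketch (the function names `add_with_prefetch' and `add_plain'
are assumptions, not part of the patch) showing that a loop with the
`__builtin_prefetch' call and the same loop without it compute identical
results.  That equivalence is what makes dropping the call during
vectorization safe from a correctness point of view.

```c
/* Hypothetical helper: element-wise add with a software prefetch hint,
   modelled on the loop discussed in this thread.  The caller is assumed
   to provide at least n + 8 elements in b so that &b[i + 8] stays within
   the array.  */
void
add_with_prefetch (double *restrict a, const double *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    {
      a[i] = a[i] + b[i];
      /* A hint only: nothing is read or written through this address.  */
      __builtin_prefetch (&b[i + 8]);
    }
}

/* Hypothetical helper: the same loop with the prefetch dropped.  */
void
add_plain (double *restrict a, const double *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = a[i] + b[i];
}
```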

[PATCH] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2024-05-16 Thread Victor Do Nascimento

The introduction of the optional RCPC3 architectural extension for
Armv8.2-A upwards provides additional support for the release
consistency model, introducing the Load-Acquire RCpc Pair Ordered, and
Store-Release Pair Ordered operations in the form of LDIAPP and STILP.

These operations are single-copy atomic on cores which also implement
LSE2 and, as such, support for these operations is added to Libatomic
and employed accordingly when the LSE2 and RCPC3 features are detected
in a given core at runtime.
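
The memory-order dispatch performed by the new load_16 and store_16
variants in the diff below can be sketched in C as follows.  This is only
an illustration under stated assumptions: the enum and function names are
hypothetical, and the real implementation is the assembly in atomic_16.S.

```c
#include <stdatomic.h>

/* Hypothetical labels for the instruction sequences; not from the patch.  */
enum seq16
{
  SEQ_PLAIN_PAIR,            /* ldp / stp */
  SEQ_RCPC3_PAIR,            /* ldiapp / stilp */
  SEQ_LDAR_THEN_RCPC3_PAIR   /* ldar followed by ldiapp */
};

/* Sequence the LRCPC3 load_16 variant uses for a given memory order.  */
enum seq16
load_16_lrcpc3_plan (memory_order mo)
{
  if (mo == memory_order_relaxed)
    return SEQ_PLAIN_PAIR;
  if (mo == memory_order_seq_cst)
    return SEQ_LDAR_THEN_RCPC3_PAIR;
  return SEQ_RCPC3_PAIR;     /* acquire/consume */
}

/* Sequence the LRCPC3 store_16 variant uses for a given memory order.  */
enum seq16
store_16_lrcpc3_plan (memory_order mo)
{
  if (mo == memory_order_relaxed)
    return SEQ_PLAIN_PAIR;
  return SEQ_RCPC3_PAIR;     /* release/seq_cst */
}
```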

libatomic/ChangeLog:

* configure.ac: Add call to LIBAT_TEST_FEAT_LRCPC3() test.
* configure: Regenerate.
* config/linux/aarch64/host-config.h (has_rcpc3): New.
(HWCAP2_LRCPC3): Likewise.
(LSE2_LRCPC3_ATOP): Likewise.
* libatomic/config/linux/aarch64/atomic_16.S: New +rcpc3 .arch
directives.
* config/linux/aarch64/atomic_16.S (libat_load_16): Add LRCPC3
variant.
(libat_store_16): Likewise.
* acinclude.m4 (LIBAT_TEST_FEAT_AARCH64_LRCPC3): New.
(HAVE_FEAT_LRCPC3): Likewise.
(ARCH_AARCH64_HAVE_LRCPC3): Likewise.
* auto-config.h.in (HAVE_FEAT_LRCPC3): New.
---
 libatomic/acinclude.m4   | 18 +++
 libatomic/auto-config.h.in   |  3 ++
 libatomic/config/linux/aarch64/atomic_16.S   | 55 +++-
 libatomic/config/linux/aarch64/host-config.h | 39 --
 libatomic/configure  | 41 +++
 libatomic/configure.ac   |  1 +
 6 files changed, 152 insertions(+), 5 deletions(-)

diff --git a/libatomic/acinclude.m4 b/libatomic/acinclude.m4
index 6d2e0b1c355..628275b9945 100644
--- a/libatomic/acinclude.m4
+++ b/libatomic/acinclude.m4
@@ -101,6 +101,24 @@ AC_DEFUN([LIBAT_TEST_FEAT_AARCH64_LSE128],[
[Have LSE128 support for 16 byte integers.])
 ])
 
+dnl
+dnl Test if the host assembler supports armv8.2-a RCPC3 insns.
+dnl
+AC_DEFUN([LIBAT_TEST_FEAT_AARCH64_LRCPC3],[
+  AC_CACHE_CHECK([for armv8.2-a LRCPC3 insn support],
+[libat_cv_have_feat_lrcpc3],[
+AC_LANG_CONFTEST([AC_LANG_PROGRAM([],[asm(".arch armv8.2-a+rcpc3")])])
+if AC_TRY_EVAL(ac_link); then
+  eval libat_cv_have_feat_lrcpc3=yes
+else
+  eval libat_cv_have_feat_lrcpc3=no
+fi
+rm -f conftest*
+  ])
+  LIBAT_DEFINE_YESNO([HAVE_FEAT_LRCPC3], [$libat_cv_have_feat_lrcpc3],
+   [Have LRCPC3 support for 16 byte integers.])
+])
+
 dnl
 dnl Test if we have __atomic_load and __atomic_store for mode $1, size $2
 dnl
diff --git a/libatomic/auto-config.h.in b/libatomic/auto-config.h.in
index 7c78933b07d..a925686effa 100644
--- a/libatomic/auto-config.h.in
+++ b/libatomic/auto-config.h.in
@@ -108,6 +108,9 @@
 /* Have LSE128 support for 16 byte integers. */
 #undef HAVE_FEAT_LSE128
 
+/* Have LRCPC3 support for 16 byte integers. */
+#undef HAVE_FEAT_LRCPC3
+
 /* Define to 1 if you have the <fenv.h> header file. */
 #undef HAVE_FENV_H
 
diff --git a/libatomic/config/linux/aarch64/atomic_16.S 
b/libatomic/config/linux/aarch64/atomic_16.S
index 27363f82b75..47ceb7301c9 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -42,7 +42,13 @@
 
 #if HAVE_IFUNC
 # if HAVE_FEAT_LSE128
+#  if HAVE_FEAT_LRCPC3
+   .arch   armv9-a+lse128+rcpc3
+#  else
.arch   armv9-a+lse128
+#  endif
+# elif HAVE_FEAT_LRCPC3
+   .arch   armv8-a+lse+rcpc3
 # else
.arch   armv8-a+lse
 # endif
@@ -50,9 +56,20 @@
.arch   armv8-a+lse
 #endif
 
+/* There is overlap in some atomic instructions being implemented in both RCPC3
+   and LSE2 extensions, so both _i1 and _i2 suffixes are needed in such
+   situations.  Otherwise, all extension-specific implementations are mapped
+   to _i1.  */
+
+#if HAVE_FEAT_LRCPC3
+# define LRCPC3(NAME)  libat_##NAME##_i1
+# define LSE2(NAME)    libat_##NAME##_i2
+#else
+# define LSE2(NAME)    libat_##NAME##_i1
+#endif
+
 #define LSE128(NAME)   libat_##NAME##_i1
 #define LSE(NAME)  libat_##NAME##_i1
-#define LSE2(NAME) libat_##NAME##_i1
 #define CORE(NAME) libat_##NAME
 #define ATOMIC(NAME)   __atomic_##NAME
 
@@ -722,6 +739,42 @@ ENTRY_FEAT (and_fetch_16, LSE128)
ret
 END_FEAT (and_fetch_16, LSE128)
 #endif /* HAVE_FEAT_LSE128 */
+
+
+#if HAVE_FEAT_LRCPC3
+ENTRY_FEAT (load_16, LRCPC3)
+   cbnz    w1, 1f
+
+   /* RELAXED.  */
+   ldp res0, res1, [x0]
+   ret
+1:
+   cmp w1, SEQ_CST
+   b.eq    2f
+
+   /* ACQUIRE/CONSUME (Load-AcquirePC semantics).  */
+   ldiapp  res0, res1, [x0]
+   ret
+
+   /* SEQ_CST.  */
+2: ldar    tmp0, [x0]  /* Block reordering with Store-Release instr.  */
+   ldiapp  res0, res1, [x0]
+   ret
+END_FEAT (load_16, LRCPC3)
+
+
+ENTRY_FEAT (store_16, LRCPC3)
+   cbnz    w4, 1f
+
+   /* RELAXED.  */
+   stp in0, in1, [x0]
+   ret
+
+   /* RELEASE/SEQ_CST.  */
+1: stilp   in0, in1, [x0]
+   ret
+END_FEAT

RE: [PATCH v2 3/3] RISC-V: Enable vectorizable early exit testsuite

2024-05-16 Thread Li, Pan2

Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Thursday, May 16, 2024 8:19 PM
To: Li, Pan2 ; gcc-patches 
Cc: kito.cheng ; tamar.christina 
; Richard Biener ; 
richard.sandiford ; Li, Pan2 
Subject: Re: [PATCH v2 3/3] RISC-V: Enable vectorizable early exit testsuite

RISC-V part LGTM.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2024-05-16 12:05
To: gcc-patches
CC: juzhe.zhong; kito.cheng; tamar.christina; richard.guenther; Richard.Sandiford; Pan Li
Subject: [PATCH v2 3/3] RISC-V: Enable vectorizable early exit testsuite
From: Pan Li <pan2...@intel.com>

Now that vectorizable early exit is supported on RISC-V, we would like to
enable the GCC vect tests for vectorizable early exit.

vect-early-break_124-pr114403.c fails to vectorize for now, because the
__builtin_memcpy of 8 bytes is not folded into an int64 assignment
during ccp1.  We will improve that first and mark this as xfail for
RISC-V.

The following tests pass with this patch:
1. The full riscv regression tests.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-mask-store-1.c: Add pragma novector as it will
have 2 times LOOP VECTORIZED in RISC-V.
* gcc.dg/vect/vect-early-break_124-pr114403.c: Xfail for the
riscv backend.
* lib/target-supports.exp: Add RISC-V backend.

Signed-off-by: Pan Li <pan2...@intel.com>
---
gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c  | 2 ++
gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c | 2 +-
gcc/testsuite/lib/target-supports.exp | 2 ++
3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c 
b/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
index fdd9032da98..2f80bf89e5e 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
@@ -28,6 +28,8 @@ main ()
   if (__builtin_memcmp (x, res, sizeof (x)) != 0)
 abort ();
+
+#pragma GCC novector
   for (int i = 0; i < 32; ++i)
 if (flag[i] != 0 && flag[i] != 1)
   abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
index 51abf245ccb..101ae1e0eaa 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
@@ -2,7 +2,7 @@
/* { dg-require-effective-target vect_early_break_hw } */
/* { dg-require-effective-target vect_long_long } */
-/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { xfail riscv*-*-* } } } */
#include "tree-vect.h"
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 6f5d477b128..ec9baa4f32a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4099,6 +4099,7 @@ proc check_effective_target_vect_early_break { } {
|| [check_effective_target_arm_v8_neon_ok]
|| [check_effective_target_sse4]
|| [istarget amdgcn-*-*]
+ || [check_effective_target_riscv_v]
}}]
}
@@ -4114,6 +4115,7 @@ proc check_effective_target_vect_early_break_hw { } {
|| [check_effective_target_arm_v8_neon_hw]
|| [check_sse4_hw_available]
|| [istarget amdgcn-*-*]
+ || [check_effective_target_riscv_v_ok]
}}]
}
--
2.34.1

RE: [PATCH v2 2/3] RISC-V: Implement vectorizable early exit with vcond_mask_len

2024-05-16 Thread Li, Pan2

Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Thursday, May 16, 2024 8:19 PM
To: Li, Pan2 ; gcc-patches 
Cc: kito.cheng ; tamar.christina 
; Richard Biener ; 
richard.sandiford ; Li, Pan2 
Subject: Re: [PATCH v2 2/3] RISC-V: Implement vectorizable early exit with 
vcond_mask_len

RISC-V part LGTM.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2024-05-16 12:05
To: gcc-patches
CC: juzhe.zhong; kito.cheng; tamar.christina; richard.guenther; Richard.Sandiford; Pan Li
Subject: [PATCH v2 2/3] RISC-V: Implement vectorizable early exit with 
vcond_mask_len
From: Pan Li <pan2...@intel.com>

Now that loop lens are supported for vectorizable early exit, we would
like to implement the feature for the RISC-V target.  Given the example below:

unsigned vect_a[1923];
unsigned vect_b[1923];

unsigned test (unsigned limit, int n)
{
  unsigned ret = 0;

  for (int i = 0; i < n; i++)
    {
      vect_b[i] = limit + i;

      if (vect_a[i] > limit)
        {
          ret = vect_b[i];
          return ret;
        }

      vect_a[i] = limit;
    }

  return ret;
}

Before this patch:
  ...
.L8:
  sw    a3,0(a5)
  addiw a0,a0,1
  addi  a4,a4,4
  addi  a5,a5,4
  beq   a1,a0,.L2
.L4:
  sw    a0,0(a4)
  lw    a2,0(a5)
  bleu  a2,a3,.L8
  ret

After this patch:
  ...
.L5:
  vsetvli   a5,a3,e8,mf4,ta,ma
  vmv1r.v   v4,v2
  vsetvli   t4,zero,e32,m1,ta,ma
  vmv.v.x   v1,a5
  vadd.vv   v2,v2,v1
  vsetvli   zero,a5,e32,m1,ta,ma
  vadd.vv   v5,v4,v3
  slli  a6,a5,2
  vle32.v   v1,0(t1)
  vmsltu.vv v1,v3,v1
  vcpop.m   t4,v1
  beq   t4,zero,.L4
  vmv.x.s   a4,v4
.L3:
  ...
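
The shape of the vectorized loop (build a mask, count its set bits, branch
if the count is non-zero) can be sketched in scalar C as below.  This is a
rough model under stated assumptions, not the code GCC generates: the
fixed chunk size `VL', the function names and the array parameters are all
hypothetical, and the inner loops stand in for single vector instructions.

```c
/* Hypothetical chunk size standing in for the runtime vector length.  */
#define VL 4

/* Scalar reference, following the example above.  */
unsigned
early_break_scalar (unsigned *a, unsigned *b, unsigned limit, int n)
{
  for (int i = 0; i < n; i++)
    {
      b[i] = limit + i;
      if (a[i] > limit)
        return b[i];
      a[i] = limit;
    }
  return 0;
}

/* Chunked model of the early-break loop.  */
unsigned
early_break_chunked (unsigned *a, unsigned *b, unsigned limit, int n)
{
  for (int i = 0; i < n; i += VL)
    {
      int len = n - i < VL ? n - i : VL;

      /* Compare the active lanes and count the hits; this plays the role
         of the vmsltu.vv and vcpop.m pair.  */
      int hits = 0;
      for (int j = 0; j < len; j++)
        hits += a[i + j] > limit;

      if (hits != 0)
        {
          /* Some lane exits: replay this chunk element by element.  The
             loop always returns, because at least one lane matches.  */
          for (int j = 0; j < len; j++)
            {
              b[i + j] = limit + i + j;
              if (a[i + j] > limit)
                return b[i + j];
              a[i + j] = limit;
            }
        }

      /* No lane exits: the whole chunk is stored unconditionally.  */
      for (int j = 0; j < len; j++)
        {
          b[i + j] = limit + i + j;
          a[i + j] = limit;
        }
    }
  return 0;
}
```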

The following tests pass with this patch:
1. The full riscv regression tests.

gcc/ChangeLog:

* config/riscv/autovec-opt.md
  (*vcond_mask_len_popcount_):
New pattern of vcond_mask_len_popcount for vector bool mode.
* config/riscv/autovec.md (vcond_mask_len_): New pattern
of vcond_mask_len for vector bool mode.
(cbranch4): New pattern for vector bool mode.
* config/riscv/vector-iterators.md: Add new unspec
  UNSPEC_SELECT_MASK.
* config/riscv/vector.md (@pred_popcount): Add
VLS mode to popcount pattern.
(@pred_popcount): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/early-break-1.c: New test.
* gcc.target/riscv/rvv/autovec/early-break-2.c: New test.

Signed-off-by: Pan Li <pan2...@intel.com>
---
gcc/config/riscv/autovec-opt.md   | 33 ++
gcc/config/riscv/autovec.md   | 61 +++
gcc/config/riscv/vector-iterators.md  |  1 +
gcc/config/riscv/vector.md| 18 +++---
.../riscv/rvv/autovec/early-break-1.c | 34 +++
.../riscv/rvv/autovec/early-break-2.c | 37 +++
6 files changed, 175 insertions(+), 9 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/early-break-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/early-break-2.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 645dc53d868..04f85d8e455 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1436,3 +1436,36 @@ (define_insn_and_split "*n"
 DONE;
   }
   [(set_attr "type" "vmalu")])
+
+;; Optimization pattern for early break auto-vectorization
+;; vcond_mask_len (mask, ones, zeros, len, bias) + vlmax popcount
+;; -> non vlmax popcount (mask, len)
+(define_insn_and_split "*vcond_mask_len_popcount_"
+  [(set (match_operand:P 0 "register_operand")
+(popcount:P
+ (unspec:VB_VLS [
+  (unspec:VB_VLS [
+   (match_operand:VB_VLS 1 "register_operand")
+   (match_operand:VB_VLS 2 "const_1_operand")
+   (match_operand:VB_VLS 3 "const_0_operand")
+   (match_operand 4 "autovec_length_operand")
+   (match_operand 5 "const_0_operand")] UNSPEC_SELECT_MASK)
+  (match_operand 6 "autovec_length_operand")
+  (const_int 1)
+  (reg:SI VL_REGNUM)
+  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)))]
+  "TARGET_VECTOR
+   && can_create_pseudo_p ()
+   && riscv_vector::get_vector_mode (Pmode, GET_MODE_NUNITS 
(mode)).exists ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+riscv_vector::emit_nonvlmax_insn (
+ code_for_pred_popcount (mode, Pmode),
+ riscv_vector::CPOP_OP,
+ operands, operands[4]);
+DONE;
+  }
+  [(set_attr "type" "vector")]
+)
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index aa1ae0fe075..1ee3c8052fb 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2612,3 +2612,64 @@ (define_expand "rawmemchr"
 DONE;
   }
)
+
+;; =
+;; == Early break auto-vectorization patterns
+;;

[PATCH 1/4] Libatomic: Define per-file identifier macros

2024-05-16 Thread Victor Do Nascimento

In order to facilitate the fine-tuning of how `libatomic_i.h' and
`host-config.h' headers are used by different atomic functions, we
define distinct identifier macros for each file which, in implementing
atomic operations, imports these headers.

The idea is that different parts of these headers could then be
conditionally defined depending on the macros set by the file that
`#include'd them.

Given how it is possible that some file names are generic enough that
using them as-is for macro names (e.g. flag.c -> FLAG) may potentially
lead to name clashes with other macros, all file names first have LAT_
prepended to them such that, for example, flag.c is assigned the
LAT_FLAG macro.
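
The mechanism can be sketched in a single translation unit as follows.
This is a minimal sketch, not code from the series: `LAT_HEADER_VARIANT'
and `lat_header_variant' are hypothetical names, and the `#ifdef' block
stands in for a section of a shared header such as `libatomic_i.h'.

```c
/* Stands in for a source file such as flag.c: identify the includer
   before the shared header is pulled in.  */
#define LAT_FLAG

/* Stands in for the shared header: this part is selected only for files
   that defined LAT_FLAG before the include point.  */
#ifdef LAT_FLAG
# define LAT_HEADER_VARIANT 1
#else
# define LAT_HEADER_VARIANT 0
#endif

/* Hypothetical helper reporting which variant the header selected.  */
int
lat_header_variant (void)
{
  return LAT_HEADER_VARIANT;
}

/* As in the patch, the identifier is dropped at the end of the file.  */
#undef LAT_FLAG
```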

Libatomic/ChangeLog:

* cas_n.c (LAT_CAS_N): New.
* exch_n.c (LAT_EXCH_N): Likewise.
* fadd_n.c (LAT_FADD_N): Likewise.
* fand_n.c (LAT_FAND_N): Likewise.
* fence.c (LAT_FENCE): Likewise.
* fenv.c (LAT_FENV): Likewise.
* fior_n.c (LAT_FIOR_N): Likewise.
* flag.c (LAT_FLAG): Likewise.
* fnand_n.c (LAT_FNAND_N): Likewise.
* fop_n.c (LAT_FOP_N): Likewise.
* fsub_n.c (LAT_FSUB_N): Likewise.
* fxor_n.c (LAT_FXOR_N): Likewise.
* gcas.c (LAT_GCAS): Likewise.
* gexch.c (LAT_GEXCH): Likewise.
* glfree.c (LAT_GLFREE): Likewise.
* gload.c (LAT_GLOAD): Likewise.
* gstore.c (LAT_GSTORE): Likewise.
* load_n.c (LAT_LOAD_N): Likewise.
* store_n.c (LAT_STORE_N): Likewise.
* tas_n.c (LAT_TAS_N): Likewise.
---
 libatomic/cas_n.c   | 2 ++
 libatomic/exch_n.c  | 2 ++
 libatomic/fadd_n.c  | 2 ++
 libatomic/fand_n.c  | 2 ++
 libatomic/fence.c   | 2 ++
 libatomic/fenv.c| 2 ++
 libatomic/fior_n.c  | 2 ++
 libatomic/flag.c| 2 ++
 libatomic/fnand_n.c | 2 ++
 libatomic/fop_n.c   | 2 ++
 libatomic/fsub_n.c  | 2 ++
 libatomic/fxor_n.c  | 2 ++
 libatomic/gcas.c| 2 ++
 libatomic/gexch.c   | 2 ++
 libatomic/glfree.c  | 2 ++
 libatomic/gload.c   | 2 ++
 libatomic/gstore.c  | 2 ++
 libatomic/load_n.c  | 2 ++
 libatomic/store_n.c | 2 ++
 libatomic/tas_n.c   | 2 ++
 20 files changed, 40 insertions(+)

diff --git a/libatomic/cas_n.c b/libatomic/cas_n.c
index a080b990371..2a6357e48db 100644
--- a/libatomic/cas_n.c
+++ b/libatomic/cas_n.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
+#define LAT_CAS_N
 #include "libatomic_i.h"
 
 
@@ -122,3 +123,4 @@ SIZE(libat_compare_exchange) (UTYPE *mptr, UTYPE *eptr, 
UTYPE newval,
 #endif
 
 EXPORT_ALIAS (SIZE(compare_exchange));
+#undef LAT_CAS_N
diff --git a/libatomic/exch_n.c b/libatomic/exch_n.c
index e5ff80769b9..184d3de1009 100644
--- a/libatomic/exch_n.c
+++ b/libatomic/exch_n.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
+#define LAT_EXCH_N
 #include "libatomic_i.h"
 
 
@@ -126,3 +127,4 @@ SIZE(libat_exchange) (UTYPE *mptr, UTYPE newval, int smodel 
UNUSED)
 #endif
 
 EXPORT_ALIAS (SIZE(exchange));
+#undef LAT_EXCH_N
diff --git a/libatomic/fadd_n.c b/libatomic/fadd_n.c
index bc15b8bc0e6..32b75cec654 100644
--- a/libatomic/fadd_n.c
+++ b/libatomic/fadd_n.c
@@ -22,6 +22,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
+#define LAT_FADD_N
 #include 
 
 #define NAME   add
@@ -43,3 +44,4 @@
 #endif
 
 #include "fop_n.c"
+#undef LAT_FADD_N
diff --git a/libatomic/fand_n.c b/libatomic/fand_n.c
index ffe9ed8700f..9eab55bcd72 100644
--- a/libatomic/fand_n.c
+++ b/libatomic/fand_n.c
@@ -1,3 +1,5 @@
+#define LAT_FAND_N
 #define NAME   and
 #define OP(X,Y)((X) & (Y))
 #include "fop_n.c"
+#undef LAT_FAND_N
diff --git a/libatomic/fence.c b/libatomic/fence.c
index a9b1e280c5a..4022194a57a 100644
--- a/libatomic/fence.c
+++ b/libatomic/fence.c
@@ -21,6 +21,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
+#define LAT_FENCE
 #include "libatomic_i.h"
 
 #include 
@@ -43,3 +44,4 @@ void
 {
   atomic_signal_fence (order);
 }
+#undef LAT_FENCE
diff --git a/libatomic/fenv.c b/libatomic/fenv.c
index 41f187c1f85..dccad356a31 100644
--- a/libatomic/fenv.c
+++ b/libatomic/fenv.c
@@ -21,6 +21,7 @@
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.  */
 
+#define LAT_FENV
 #include "libatomic_i.h"
 
 #ifdef HAVE_FENV_H
@@ -70,3 +71,4 @@ __atomic_feraiseexcept (int excepts __attribute__ ((unused)))
 }
 #endif
 }
+#undef LAT_FENV
diff --git a/libatomic/fior_n.c b/libatomic/fior_n.c
index 55d0d66b469..2b58d4805d6 100644
--- a/libatomic/fior_n.c
+++ b/libatomic/fior_n.c
@@ -1,3 +1,5 @@
+#define LAT_FIOR_N
 #define NAME   or
 #define OP(X,Y)((X) | (Y))
 #include "fop_n.c"
+#undef LAT_FIOR_N
diff --git a/libatomic/flag.c b/libatomic/flag.c
index e4a5a27819a..8afd80c9130 100644
---

[PATCH 4/4] Libatomic: Clean up AArch64 `atomic_16.S' implementation file

2024-05-16 Thread Victor Do Nascimento

At present, `atomic_16.S' groups different implementations of the
same functions together in the file.  Therefore, as an example,
the LSE128 implementation of `exchange_16' follows on immediately
from its core implementation, as does the `fetch_or_16' LSE128
implementation.

Such architectural extension-dependent implementations are dependent
both on ifunc and assembler support.  They may therefore conceivably
be guarded by 2 preprocessor macros, e.g. `#if HAVE_IFUNC' and `#if
HAVE_FEAT_LSE128'.

Having to apply these guards on a per-function basis adds unnecessary
clutter to the file and makes its maintenance more error-prone.

We therefore reorganize the layout of the file in such a way that all
core implementations needing no `#ifdef's are placed first, followed
by all ifunc-dependent implementations, which can all be guarded by a
single `#if HAVE_IFUNC'.  Within the guard, these are then subdivided
and organized according to architectural extension requirements such
that in the case of LSE128-specific functions, for example, they can
all be guarded by a single `#if HAVE_FEAT_LSE128', greatly reducing
the overall number of required `#ifdef' macros.

libatomic/ChangeLog:

* config/linux/aarch64/atomic_16.S: reshuffle functions.
---
 libatomic/config/linux/aarch64/atomic_16.S | 583 ++---
 1 file changed, 288 insertions(+), 295 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index 16ff03057ab..27363f82b75 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -40,15 +40,12 @@
 
 #include "auto-config.h"
 
-#if !HAVE_IFUNC
-# undef HAVE_FEAT_LSE128
-# define HAVE_FEAT_LSE128 0
-#endif
-
-#define HAVE_FEAT_LSE2 HAVE_IFUNC
-
-#if HAVE_FEAT_LSE128
+#if HAVE_IFUNC
+# if HAVE_FEAT_LSE128
.arch   armv9-a+lse128
+# else
+   .arch   armv8-a+lse
+# endif
 #else
.arch   armv8-a+lse
 #endif
@@ -124,6 +121,8 @@ NAME:   \
 #define ACQ_REL 4
 #define SEQ_CST 5
 
+/* Core atomic operation implementations.  These are available irrespective of
+   ifunc support or the presence of additional architectural extensions.  */
 
 ENTRY (load_16)
mov x5, x0
@@ -143,31 +142,6 @@ ENTRY (load_16)
 END (load_16)
 
 
-#if HAVE_FEAT_LSE2
-ENTRY_FEAT (load_16, LSE2)
-   cbnzw1, 1f
-
-   /* RELAXED.  */
-   ldp res0, res1, [x0]
-   ret
-1:
-   cmp w1, SEQ_CST
-   b.eq2f
-
-   /* ACQUIRE/CONSUME (Load-AcquirePC semantics).  */
-   ldp res0, res1, [x0]
-   dmb ishld
-   ret
-
-   /* SEQ_CST.  */
-2: ldartmp0, [x0]  /* Block reordering with Store-Release instr.  */
-   ldp res0, res1, [x0]
-   dmb ishld
-   ret
-END_FEAT (load_16, LSE2)
-#endif
-
-
 ENTRY (store_16)
cbnzw4, 2f
 
@@ -185,23 +159,6 @@ ENTRY (store_16)
 END (store_16)
 
 
-#if HAVE_FEAT_LSE2
-ENTRY_FEAT (store_16, LSE2)
-   cbnzw4, 1f
-
-   /* RELAXED.  */
-   stp in0, in1, [x0]
-   ret
-
-   /* RELEASE/SEQ_CST.  */
-1: ldxpxzr, tmp0, [x0]
-   stlxp   w4, in0, in1, [x0]
-   cbnzw4, 1b
-   ret
-END_FEAT (store_16, LSE2)
-#endif
-
-
 ENTRY (exchange_16)
mov x5, x0
cbnzw4, 2f
@@ -229,31 +186,6 @@ ENTRY (exchange_16)
 END (exchange_16)
 
 
-#if HAVE_FEAT_LSE128
-ENTRY_FEAT (exchange_16, LSE128)
-   mov tmp0, x0
-   mov res0, in0
-   mov res1, in1
-   cbnzw4, 1f
-
-   /* RELAXED.  */
-   swppres0, res1, [tmp0]
-   ret
-1:
-   cmp w4, ACQUIRE
-   b.hi2f
-
-   /* ACQUIRE/CONSUME.  */
-   swppa   res0, res1, [tmp0]
-   ret
-
-   /* RELEASE/ACQ_REL/SEQ_CST.  */
-2: swppal  res0, res1, [tmp0]
-   ret
-END_FEAT (exchange_16, LSE128)
-#endif
-
-
 ENTRY (compare_exchange_16)
ldp exp0, exp1, [x1]
cbz w4, 3f
@@ -301,43 +233,97 @@ ENTRY (compare_exchange_16)
 END (compare_exchange_16)
 
 
-#if HAVE_FEAT_LSE2
-ENTRY_FEAT (compare_exchange_16, LSE)
-   ldp exp0, exp1, [x1]
-   mov tmp0, exp0
-   mov tmp1, exp1
-   cbz w4, 2f
-   cmp w4, RELEASE
-   b.hs3f
+ENTRY (fetch_or_16)
+   mov x5, x0
+   cbnzw4, 2f
 
-   /* ACQUIRE/CONSUME.  */
-   caspa   exp0, exp1, in0, in1, [x0]
-0:
-   cmp exp0, tmp0
-   ccmpexp1, tmp1, 0, eq
-   bne 1f
-   mov x0, 1
+   /* RELAXED.  */
+1: ldxpres0, res1, [x5]
+   orr tmp0, res0, in0
+   orr tmp1, res1, in1
+   stxpw4, tmp0, tmp1, [x5]
+   cbnzw4, 1b
ret
-1:
-   stp exp0, exp1, [x1]
-   mov x0, 0
+
+   /* ACQUIRE/CONSUME/RELEASE/ACQ_REL/SEQ_CST.  */
+2: ldaxp   res0, res1, [x5]
+   orr tmp0, res0, in0
+   orr tmp1, res1, in1
+   stlxp   w4, tmp0, tmp1, [x5]
+   cbnz

[PATCH 2/4] Libatomic: Make ifunc selector behavior contingent on importing file

2024-05-16 Thread Victor Do Nascimento

By querying previously-defined file-identifier macros, `host-config.h'
is able to get information about its environment and, based on this
information, select more appropriate function-specific ifunc
selectors.  This reduces the number of unnecessary feature tests that
need to be carried out in order to find the best atomic implementation
for a function at run-time.

An immediate benefit of this is that we can further fine-tune the
architectural requirements for each atomic function without risk of
incurring the maintenance and runtime-performance penalties of having
to maintain an ifunc selector with a huge number of alternatives, most
of which are irrelevant for any particular function.  Consequently,
for AArch64 targets, we relax the architectural requirements of
`compare_exchange_16', which now requires only LSE as opposed to the
newer LSE2.

The new flexibility provided by this approach also means that certain
functions can now be called directly, doing away with ifunc selectors
altogether when only a single implementation is available for them on a
given target.  As per the macro expansion framework laid out in
`libatomic_i.h', such functions should have their names prefixed with
`__atomic_' as opposed to `libat_'.  This is the same prefix applied
to function names when Libatomic is configured with
`--disable-gnu-indirect-function'.

To achieve this, these functions unconditionally apply the aliasing
rule that at present is conditionally applied only when libatomic is
built without ifunc support, which ensures that the default
`libat_##NAME' is accessible via the equivalent `__atomic_##NAME' too.
This is ensured by using the new `ENTRY_ALIASED' macro.
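
As an illustration only, the aliasing idea can be sketched in plain C
rather than assembly.  The `demo_libat_' and `demo_atomic_' prefixes,
the DEMO_* helper macros and the non-atomic function body are all
assumptions made for this sketch; they are not taken from the patch,
and the real `__atomic_' prefix is avoided because GCC reserves those
names for built-ins.  The sketch relies on the GNU C `alias' attribute
and therefore on a target that supports it (ELF, for instance).

```c
#include <assert.h>

/* Hypothetical prefixes standing in for libat_ and __atomic_.  */
#define CORE(NAME)   demo_libat_##NAME
#define ATOMIC(NAME) demo_atomic_##NAME

/* Two-step stringification so that CORE (NAME) is expanded first.  */
#define DEMO_STR_(X) #X
#define DEMO_STR(X)  DEMO_STR_ (X)

/* Declare ATOMIC (NAME) as a second symbol for the code of CORE (NAME).
   Uses the GNU C alias attribute; the target must support aliases.  */
#define DEMO_ALIASED(NAME) \
  extern __typeof__ (CORE (NAME)) ATOMIC (NAME) \
    __attribute__ ((alias (DEMO_STR (CORE (NAME)))))

/* The single implementation.  Not atomic; purely illustrative.  */
unsigned
CORE (fetch_add) (unsigned *ptr, unsigned val)
{
  unsigned old = *ptr;
  *ptr = old + val;
  return old;
}

DEMO_ALIASED (fetch_add);

/* Calls through either name reach the same code.  */
unsigned
demo_call_both (void)
{
  unsigned x = 1;
  unsigned a = CORE (fetch_add) (&x, 2);    /* old value 1, x becomes 3 */
  unsigned b = ATOMIC (fetch_add) (&x, 4);  /* old value 3, x becomes 7 */
  assert (a == 1 && b == 3);
  return x;
}
```

With only one implementation per function, exporting the second name
directly in this way means no selector has to run to resolve the call.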

libatomic/ChangeLog:

* config/linux/aarch64/atomic_16.S (LSE): New.
(ENTRY_ALIASED): Likewise.
* config/linux/aarch64/host-config.h (LSE_ATOP): New.
(LSE2_ATOP): Likewise.
(LSE128_ATOP): Likewise.
(IFUNC_COND_1): Make its definition conditional on above 3
macros.
(IFUNC_NCOND): Likewise.
---
 libatomic/config/linux/aarch64/atomic_16.S   | 31 +
 libatomic/config/linux/aarch64/host-config.h | 35 
 2 files changed, 45 insertions(+), 21 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index b63e97ac5a2..1517e9e78df 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -54,17 +54,20 @@
 #endif
 
 #define LSE128(NAME)   libat_##NAME##_i1
-#define LSE2(NAME) libat_##NAME##_i2
+#define LSE(NAME)  libat_##NAME##_i1
+#define LSE2(NAME) libat_##NAME##_i1
 #define CORE(NAME) libat_##NAME
 #define ATOMIC(NAME)   __atomic_##NAME
 
+/* Emit __atomic_* entrypoints if no ifuncs.  */
+#define ENTRY_ALIASED(NAME)ENTRY2 (CORE (NAME), ALIAS (NAME, ATOMIC, CORE))
+
 #if HAVE_IFUNC
 # define ENTRY(NAME)   ENTRY2 (CORE (NAME), )
 # define ENTRY_FEAT(NAME, FEAT) ENTRY2 (FEAT (NAME), )
 # define END_FEAT(NAME, FEAT)  END2 (FEAT (NAME))
 #else
-/* Emit __atomic_* entrypoints if no ifuncs.  */
-# define ENTRY(NAME)   ENTRY2 (CORE (NAME), ALIAS (NAME, ATOMIC, CORE))
+# define ENTRY(NAME)   ENTRY_ALIASED (NAME)
 #endif
 
 #define END(NAME)  END2 (CORE (NAME))
@@ -299,7 +302,7 @@ END (compare_exchange_16)
 
 
 #if HAVE_FEAT_LSE2
-ENTRY_FEAT (compare_exchange_16, LSE2)
+ENTRY_FEAT (compare_exchange_16, LSE)
ldp exp0, exp1, [x1]
mov tmp0, exp0
mov tmp1, exp1
@@ -332,11 +335,11 @@ ENTRY_FEAT (compare_exchange_16, LSE2)
/* ACQ_REL/SEQ_CST.  */
 4: caspal  exp0, exp1, in0, in1, [x0]
b   0b
-END_FEAT (compare_exchange_16, LSE2)
+END_FEAT (compare_exchange_16, LSE)
 #endif
 
 
-ENTRY (fetch_add_16)
+ENTRY_ALIASED (fetch_add_16)
mov x5, x0
cbnzw4, 2f
 
@@ -358,7 +361,7 @@ ENTRY (fetch_add_16)
 END (fetch_add_16)
 
 
-ENTRY (add_fetch_16)
+ENTRY_ALIASED (add_fetch_16)
mov x5, x0
cbnzw4, 2f
 
@@ -380,7 +383,7 @@ ENTRY (add_fetch_16)
 END (add_fetch_16)
 
 
-ENTRY (fetch_sub_16)
+ENTRY_ALIASED (fetch_sub_16)
mov x5, x0
cbnzw4, 2f
 
@@ -402,7 +405,7 @@ ENTRY (fetch_sub_16)
 END (fetch_sub_16)
 
 
-ENTRY (sub_fetch_16)
+ENTRY_ALIASED (sub_fetch_16)
mov x5, x0
cbnzw4, 2f
 
@@ -624,7 +627,7 @@ END_FEAT (and_fetch_16, LSE128)
 #endif
 
 
-ENTRY (fetch_xor_16)
+ENTRY_ALIASED (fetch_xor_16)
mov x5, x0
cbnzw4, 2f
 
@@ -646,7 +649,7 @@ ENTRY (fetch_xor_16)
 END (fetch_xor_16)
 
 
-ENTRY (xor_fetch_16)
+ENTRY_ALIASED (xor_fetch_16)
mov x5, x0
cbnzw4, 2f
 
@@ -668,7 +671,7 @@ ENTRY (xor_fetch_16)
 END (xor_fetch_16)
 
 
-ENTRY (fetch_nand_16)
+ENTRY_ALIASED (fetch_nand_16)
mov x5, x0
mvn in0, in0
mvn in1, in1
@@ -692,7 +695,7 @@ ENTRY (fetch_nand_16)
 END (fetch_nand_16)
 
 
-ENTRY (nand_fetch_16)
+ENTRY_ALIASED

[PATCH 3/4] Libatomic: Clean up AArch64 ifunc aliasing

2024-05-16 Thread Victor Do Nascimento

Following improvements to the way ifuncs are selected based on
detected architectural features, we are able to do away with many of
the aliases that were previously needed for subsets of atomic
functions that were not implemented in a given extension.

An example may clarify this.  Previously, LSE128 functions carried the
suffix _i1 and LSE2 functions the suffix _i2.

Using a single ifunc selector for all atomic functions meant that if
LSE128 was detected, the _i1 function variant would be used
indiscriminately, irrespective of whether or not a function had an
LSE128-specific implementation.  Aliasing was thus needed to redirect
calls to these missing functions to their _i2 LSE2 alternatives.

The more architectural extensions for which support was added, the
more complex the aliasing chain.

With the per-file configuration of ifuncs, we do away with the need
for such aliasing.

libatomic/ChangeLog:

* config/linux/aarch64/atomic_16.S: Remove unnecessary
aliasing.
---
 libatomic/config/linux/aarch64/atomic_16.S | 41 --
 1 file changed, 41 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index 1517e9e78df..16ff03057ab 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -732,47 +732,6 @@ ENTRY_ALIASED (test_and_set_16)
 END (test_and_set_16)
 
 
-/* Alias entry points which are the same in LSE2 and LSE128.  */
-
-#if HAVE_IFUNC
-# if !HAVE_FEAT_LSE128
-ALIAS (exchange_16, LSE128, LSE2)
-ALIAS (fetch_or_16, LSE128, LSE2)
-ALIAS (fetch_and_16, LSE128, LSE2)
-ALIAS (or_fetch_16, LSE128, LSE2)
-ALIAS (and_fetch_16, LSE128, LSE2)
-# endif
-ALIAS (load_16, LSE128, LSE2)
-ALIAS (store_16, LSE128, LSE2)
-ALIAS (compare_exchange_16, LSE128, LSE2)
-ALIAS (fetch_add_16, LSE128, LSE2)
-ALIAS (add_fetch_16, LSE128, LSE2)
-ALIAS (fetch_sub_16, LSE128, LSE2)
-ALIAS (sub_fetch_16, LSE128, LSE2)
-ALIAS (fetch_xor_16, LSE128, LSE2)
-ALIAS (xor_fetch_16, LSE128, LSE2)
-ALIAS (fetch_nand_16, LSE128, LSE2)
-ALIAS (nand_fetch_16, LSE128, LSE2)
-ALIAS (test_and_set_16, LSE128, LSE2)
-
-/* Alias entry points which are the same in baseline and LSE2.  */
-
-ALIAS (exchange_16, LSE2, CORE)
-ALIAS (fetch_add_16, LSE2, CORE)
-ALIAS (add_fetch_16, LSE2, CORE)
-ALIAS (fetch_sub_16, LSE2, CORE)
-ALIAS (sub_fetch_16, LSE2, CORE)
-ALIAS (fetch_or_16, LSE2, CORE)
-ALIAS (or_fetch_16, LSE2, CORE)
-ALIAS (fetch_and_16, LSE2, CORE)
-ALIAS (and_fetch_16, LSE2, CORE)
-ALIAS (fetch_xor_16, LSE2, CORE)
-ALIAS (xor_fetch_16, LSE2, CORE)
-ALIAS (fetch_nand_16, LSE2, CORE)
-ALIAS (nand_fetch_16, LSE2, CORE)
-ALIAS (test_and_set_16, LSE2, CORE)
-#endif
-
 /* GNU_PROPERTY_AARCH64_* macros from elf.h for use in asm code.  */
 #define FEATURE_1_AND 0xc000
 #define FEATURE_1_BTI 1
-- 
2.34.1

[PATCH 0/4] Libatomic: Cleanup ifunc selector and aliasing

2024-05-16 Thread Victor Do Nascimento

The recent introduction of the optional LSE128 and RCPC3 architectural
extensions to AArch64 has further led to the increased flexibility of
atomic support in the architecture, with many extensions providing
support for distinct atomic operations, each with different potential
applications in mind.

This has led to maintenance difficulties in Libatomic, in particular
regarding the way the ifunc selector is generated via a series of
macro expansions at compile-time.

Until now, irrespective of the atomic operation in question, all atomic
functions for a particular operand size were expected to have the same
number of ifunc alternatives, meaning that a one-size-fits-all
approach could reasonably be taken for the selector.

This meant that if, hypothetically, for a particular architecture and
operand size one particular atomic operation was to have 3 different
implementations associated with different extensions, libatomic would
likewise be required to present three ifunc alternatives for all other
atomic functions.

The consequence of this design choice was the unnecessary use of
function aliasing and the unwieldy code that resulted from it.

This patch series attempts to remediate this issue by making the
preprocessor macros defining the number of ifunc alternatives and
their respective selection functions dependent on the file importing
the ifunc selector-generating framework.

All files are given `LAT_' macros, defined at the beginning
and undef'd at the end of the file.  It is these macros that are
subsequently used to fine-tune the behaviors of `libatomic_i.h' and
`host-config.h'.

In particular, the definition of the `IFUNC_NCOND(N)' and
`IFUNC_COND_' macros in host-config.h can now be guarded behind
these new file-specific macros, which ultimately control what the
`GEN_SELECTOR(X)' macro in `libatomic_i.h' expands to.  As both of
these headers are imported once per file implementing some atomic
operation, fine-tuned control is now possible.
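
A minimal sketch of the mechanism, assuming made-up names throughout:
`DEMO_LAT_FIOR_N' stands in for a per-file identifier macro, and
`DEMO_IFUNC_NCOND', `DEMO_IFUNC_COND_1' and `DEMO_FEAT_EXT' are
hypothetical counterparts of the selector macros and a feature bit.
None of these are the definitions used by the series; the sketch only
shows how a macro defined by the including file can decide how many
alternatives a selector considers.

```c
#include <assert.h>

typedef unsigned (*demo_fn) (unsigned, unsigned);

/* Baseline and extension-specific variants of one operation.  */
unsigned demo_or_core (unsigned a, unsigned b) { return a | b; }
unsigned demo_or_ext (unsigned a, unsigned b) { return b | a; }

#define DEMO_FEAT_EXT 0x1u   /* hypothetical feature bit */

/* What the implementation file would do: define its identifier at
   the top of the file.  */
#define DEMO_LAT_FIOR_N

/* What a host-config.h-like header could then do: pick the number of
   alternatives and their conditions based on the including file.  */
#ifdef DEMO_LAT_FIOR_N
# define DEMO_IFUNC_NCOND 1
# define DEMO_IFUNC_COND_1(FEATURES) (((FEATURES) & DEMO_FEAT_EXT) != 0)
#else
# define DEMO_IFUNC_NCOND 0
#endif

/* Selector generated for this file only.  */
demo_fn
demo_select_or (unsigned features)
{
#if DEMO_IFUNC_NCOND >= 1
  if (DEMO_IFUNC_COND_1 (features))
    return demo_or_ext;
#endif
  (void) features;
  return demo_or_core;
}

/* ... and the identifier is undefined again at the end of the file.  */
#undef DEMO_LAT_FIOR_N

/* Small self-check of the selection logic.  */
void
demo_selector_check (void)
{
  assert (demo_select_or (DEMO_FEAT_EXT) == demo_or_ext);
  assert (demo_select_or (0) == demo_or_core);
}
```

A file that defines no such identifier would get `DEMO_IFUNC_NCOND'
equal to 0 in this sketch, so its selector collapses to the baseline
function without testing any feature bits.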

Regtested with both `--enable-gnu-indirect-function' and
`--disable-gnu-indirect-function' configurations on an armv9.4-a target,
both with and without LRCPC3 and LSE128 support.

Victor Do Nascimento (4):
  Libatomic: Define per-file identifier macros
  Libatomic: Make ifunc selector behavior contingent on importing file
  Libatomic: Clean up AArch64 ifunc aliasing
  Libatomic: Clean up AArch64 `atomic_16.S' implementation file

 libatomic/cas_n.c|   2 +
 libatomic/config/linux/aarch64/atomic_16.S   | 623 +--
 libatomic/config/linux/aarch64/host-config.h |  35 +-
 libatomic/exch_n.c   |   2 +
 libatomic/fadd_n.c   |   2 +
 libatomic/fand_n.c   |   2 +
 libatomic/fence.c|   2 +
 libatomic/fenv.c |   2 +
 libatomic/fior_n.c   |   2 +
 libatomic/flag.c |   2 +
 libatomic/fnand_n.c  |   2 +
 libatomic/fop_n.c|   2 +
 libatomic/fsub_n.c   |   2 +
 libatomic/fxor_n.c   |   2 +
 libatomic/gcas.c |   2 +
 libatomic/gexch.c|   2 +
 libatomic/glfree.c   |   2 +
 libatomic/gload.c|   2 +
 libatomic/gstore.c   |   2 +
 libatomic/load_n.c   |   2 +
 libatomic/store_n.c  |   2 +
 libatomic/tas_n.c|   2 +
 22 files changed, 357 insertions(+), 341 deletions(-)

-- 
2.34.1

Re: [PATCH] tree-optimization/13962 - handle ptr-ptr compares in ptrs_compare_unequal

2024-05-16 Thread Richard Biener

On Thu, 16 May 2024, Jeff Law wrote:

> 
> 
> On 5/16/24 6:03 AM, Richard Biener wrote:
> > Now that we handle pt.null conservatively we can implement the missing
> > tracking of constant pool entries (aka STRING_CST) and handle
> > ptr-ptr compares using points-to info in ptrs_compare_unequal.
> > 
> > Bootstrapped on x86_64-unknown-linux-gnu, (re-)testing in progress.
> > 
> > Richard.
> > 
> >  PR tree-optimization/13962
> >  PR tree-optimization/96564
> >  * tree-ssa-alias.h (pt_solution::const_pool): New flag.
> >  * tree-ssa-alias.cc (ptrs_compare_unequal): Handle pointer-pointer
> >  compares.
> >  (dump_points_to_solution): Dump the const_pool flag, fix guard
> >  of flag dumping.
> >  * gimple-pretty-print.cc (pp_points_to_solution): Likewise.
> >  * tree-ssa-structalias.cc (find_what_var_points_to): Set
> >  the const_pool flag for STRING.
> >  (pt_solution_ior_into): Handle the const_pool flag.
> >  (ipa_escaped_pt): Initialize it.
> > 
> >  * gcc.dg/tree-ssa/alias-39.c: New testcase.
> >  * g++.dg/vect/pr68145.cc: Use -fno-tree-pta to avoid UB
> >  to manifest in transforms no longer vectorizing this testcase
> >  for an ICE.
> You might want to test this against 92539 as well.  There's a nonzero chance
> it'll resolve that one.

Unfortunately it doesn't.

Richard.

Fix points_to_local_or_readonly_memory_p wrt TARGET_MEM_REF

2024-05-16 Thread Jan Hubicka

Hi,
TARGET_MEM_REF can be used to offset constant base into a memory object (to
produce lea instruction).  This confuses points_to_local_or_readonly_memory_p
which treats the constant address as a base of the access.

Bootstrapped/regtested x86_64-linux, committed.
Honza

gcc/ChangeLog:

PR ipa/113787
* ipa-fnsummary.cc (points_to_local_or_readonly_memory_p): Do not
look into TARGET_MEM_REFs with constant operand 0.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr113787.c: New test.

diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
index 07a853f78e3..2faf2389297 100644
--- a/gcc/ipa-fnsummary.cc
+++ b/gcc/ipa-fnsummary.cc
@@ -2648,7 +2648,9 @@ points_to_local_or_readonly_memory_p (tree t)
return true;
   return !ptr_deref_may_alias_global_p (t, false);
 }
-  if (TREE_CODE (t) == ADDR_EXPR)
+  if (TREE_CODE (t) == ADDR_EXPR
+  && (TREE_CODE (TREE_OPERAND (t, 0)) != TARGET_MEM_REF
+ || TREE_CODE (TREE_OPERAND (TREE_OPERAND (t, 0), 0)) != INTEGER_CST))
 return refs_local_or_readonly_memory_p (TREE_OPERAND (t, 0));
   return false;
 }
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr113787.c b/gcc/testsuite/gcc.c-torture/execute/pr113787.c
new file mode 100644
index 000..702b6c35fc6
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr113787.c
@@ -0,0 +1,38 @@
+void foo(int x, int y, int z, int d, int *buf)
+{
+  for(int i = z; i < y-z; ++i)
+for(int j = 0; j < d; ++j)
+  /* buf[x(i+1) + j] = buf[x(i+1)-j-1] */
+  buf[i*x+(x-z+j)] = buf[i*x+(x-z-1-j)];
+}
+
+void bar(int x, int y, int z, int d, int *buf)
+{
+  for(int i = 0; i < d; ++i)
+for(int j = z; j < x-z; ++j)
+  /* buf[j+(y+i)*x] = buf[j+(y-1-i)*x] */
+  buf[j+(y-z+i)*x] = buf[j+(y-z-1-i)*x];
+}
+
+__attribute__((noipa))
+void baz(int x, int y, int d, int *buf)
+{
+  foo(x, y, 0, d, buf);
+  bar(x, y, 0, d, buf);
+}
+
+int main(void)
+{
+  int a[] = { 1, 2, 3 };
+  baz (1, 2, 1, a);
+  /* foo does:
+ buf[1] = buf[0];
+ buf[2] = buf[1];
+
+ bar does:
+ buf[2] = buf[1]; (no-op)
+ so we should have { 1, 1, 1 }.  */
+  for (int i = 0; i < 3; i++)
+if (a[i] != 1)
+  __builtin_abort ();
+}

Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-16 Thread Jeff Law

On 5/16/24 5:58 AM, Richard Biener wrote:

On Thu, May 16, 2024 at 11:35 AM Li, Pan2  wrote:



OK.


Thanks Richard for the help and coaching. To double confirm, are you OK with
this patch only, or with the whole series of SAT middle-end patches?
Thanks again for the review and suggestions.


For the series, the riscv specific part of course needs riscv approval.
Yea, we'll take a look at it.  Tons of stuff to go through, but this is 
definitely on the list.


jeff

Re: [PATCH] tree-optimization/13962 - handle ptr-ptr compares in ptrs_compare_unequal

2024-05-16 Thread Jeff Law

On 5/16/24 6:03 AM, Richard Biener wrote:

Now that we handle pt.null conservatively we can implement the missing
tracking of constant pool entries (aka STRING_CST) and handle
ptr-ptr compares using points-to info in ptrs_compare_unequal.

Bootstrapped on x86_64-unknown-linux-gnu, (re-)testing in progress.

Richard.

PR tree-optimization/13962
PR tree-optimization/96564
* tree-ssa-alias.h (pt_solution::const_pool): New flag.
* tree-ssa-alias.cc (ptrs_compare_unequal): Handle pointer-pointer
compares.
(dump_points_to_solution): Dump the const_pool flag, fix guard
of flag dumping.
* gimple-pretty-print.cc (pp_points_to_solution): Likewise.
* tree-ssa-structalias.cc (find_what_var_points_to): Set
the const_pool flag for STRING.
(pt_solution_ior_into): Handle the const_pool flag.
(ipa_escaped_pt): Initialize it.

* gcc.dg/tree-ssa/alias-39.c: New testcase.
* g++.dg/vect/pr68145.cc: Use -fno-tree-pta to avoid UB
to manifest in transforms no longer vectorizing this testcase
for an ICE.
You might want to test this against 92539 as well.  There's a nonzero 
chance it'll resolve that one.


jeff

C++ Patch ping - Re: [PATCH] c++: Fix parsing of abstract-declarator starting with ... followed by [ or ( [PR115012]

2024-05-16 Thread Jakub Jelinek

Hi!

I'd like to ping the 
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651199.html
patch.

Thanks.

On Thu, May 09, 2024 at 08:12:30PM +0200, Jakub Jelinek wrote:
> The C++26 P2662R3 Pack indexing paper mentions that both GCC
> and MSVC don't handle T...[10] parameter declaration when T
> is a pack.  While that will change meaning in C++26, in C++11 .. C++23
> this ought to be valid.  Also, T...(args) as well.
> 
> The following patch handles those in cp_parser_direct_declarator.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2024-05-09  Jakub Jelinek  
> 
>   PR c++/115012
>   * parser.cc (cp_parser_direct_declarator): Handle
>   abstract declarator starting with ... followed by [
>   or (.
> 
>   * g++.dg/cpp0x/variadic185.C: New test.
>   * g++.dg/cpp0x/variadic186.C: New test.

Jakub

Re: [PATCH] c++: represent all class non-dep assignments as CALL_EXPR

2024-05-16 Thread Jason Merrill


On 5/15/24 13:55, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk?


OK.


-- >8 --

Non-dependent compound assignment expressions are currently represented
as CALL_EXPR to the selected operator@= overload.  Non-dependent simple
assignments on the other hand are still represented as MODOP_EXPR, which
doesn't hold on to the selected overload.

That we need to remember the selected operator@= overload ahead of time
is a correctness thing, because they can be declared at namespace scope
and we don't want to consider later-declared namespace scope overloads
at instantiation time.  This doesn't apply to simple operator= because
it can only be declared at class scope, so it's fine to repeat the name
lookup and overload resolution at instantiation time.  But it still
seems desirable for the sake of QoI to also avoid this repeated name lookup
and overload resolution for simple assignments along the lines of
r12-6075-g2decd2cabe5a4f.

To that end, this patch makes us represent non-dependent simple
assignments as CALL_EXPR to the selected operator= overload rather than
as MODOP_EXPR.  In order for is_assignment_op_expr_p to recognize such
CALL_EXPR as an assignment expression, cp_get_fndecl_from_callee needs
to look through templated COMPONENT_REF callee corresponding to a member
function call, otherwise ahead of time -Wparentheses warnings stop
working (e.g. g++.dg/warn/Wparentheses-{32,33}.C).

gcc/cp/ChangeLog:

* call.cc (build_new_op): Pass 'overload' to
cp_build_modify_expr.
* cp-tree.h (cp_build_modify_expr): New overload that
takes a tree* out-parameter.
* pt.cc (tsubst_expr) : Propagate
OPT_Wparentheses warning suppression to the result.
* cvt.cc (cp_get_fndecl_from_callee): Use maybe_get_fns
to extract the FUNCTION_DECL from a callee.
* semantics.cc (is_assignment_op_expr_p): Also recognize
templated operator expressions represented as a CALL_EXPR
to operator=.
* typeck.cc (cp_build_modify_expr): Add 'overload'
out-parameter and pass it to build_new_op.
(build_x_modify_expr): Pass 'overload' to cp_build_modify_expr.
---
  gcc/cp/call.cc   |  2 +-
  gcc/cp/cp-tree.h |  3 +++
  gcc/cp/cvt.cc|  5 +++--
  gcc/cp/pt.cc |  2 ++
  gcc/cp/typeck.cc | 18 ++
  5 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index e058da7735f..e3d4cf8949d 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -7473,7 +7473,7 @@ build_new_op (const op_location_t , enum tree_code 
code, int flags,
switch (code)
  {
  case MODIFY_EXPR:
-  return cp_build_modify_expr (loc, arg1, code2, arg2, complain);
+  return cp_build_modify_expr (loc, arg1, code2, arg2, overload, complain);
  
  case INDIRECT_REF:

return cp_build_indirect_ref (loc, arg1, RO_UNARY_STAR, complain);
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 9a8c8659157..1e565086e80 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -8267,6 +8267,9 @@ extern tree cp_build_c_cast   
(location_t, tree, tree,
  extern cp_expr build_x_modify_expr(location_t, tree,
 enum tree_code, tree,
 tree, tsubst_flags_t);
+extern tree cp_build_modify_expr   (location_t, tree,
+enum tree_code, tree,
+tree *, tsubst_flags_t);
  extern tree cp_build_modify_expr  (location_t, tree,
 enum tree_code, tree,
 tsubst_flags_t);
diff --git a/gcc/cp/cvt.cc b/gcc/cp/cvt.cc
index db086c017e8..2f4c0f88694 100644
--- a/gcc/cp/cvt.cc
+++ b/gcc/cp/cvt.cc
@@ -1015,8 +1015,9 @@ cp_get_fndecl_from_callee (tree fn, bool fold /* = true 
*/)
return f;
  };
  
-  if (TREE_CODE (fn) == FUNCTION_DECL)

-return fn_or_local_alias (fn);
+  if (tree f = maybe_get_fns (fn))
+if (TREE_CODE (f) == FUNCTION_DECL)
+  return fn_or_local_alias (f);
tree type = TREE_TYPE (fn);
if (type == NULL_TREE || !INDIRECT_TYPE_P (type))
  return NULL_TREE;
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 32640f8e946..d83f530ac8d 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -21093,6 +21093,8 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
if (warning_suppressed_p (t, OPT_Wpessimizing_move))
  /* This also suppresses -Wredundant-move.  */
  suppress_warning (ret, OPT_Wpessimizing_move);
+   if (warning_suppressed_p (t, OPT_Wparentheses))
+ suppress_warning (STRIP_REFERENCE_REF (ret), OPT_Wparentheses);
  }
  
  	RETURN (ret);

diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
index 5f16994300f..75b696e32e0 100644

Re: [PATCH, OpenACC 2.7] Connect readonly modifier to points-to analysis

2024-05-16 Thread Richard Biener

On Wed, 3 Apr 2024, Chung-Lin Tang wrote:

> Hi Richard, Thomas,
> 
> On 2023/10/30 8:46 PM, Richard Biener wrote:
> >>
> >> What Chung-Lin's first patch does is mark the OMP clause for 'x' (not the
> >> 'x' decl itself!) as 'readonly', via a new 'OMP_CLAUSE_MAP_READONLY'
> >> flag.
> >>
> >> The actual optimization then is done in this second patch.  Chung-Lin
> >> found that he could use 'SSA_NAME_POINTS_TO_READONLY_MEMORY' for that.
> >> I don't have much experience with most of the following generic code, so
> >> would appreciate a helping hand, whether that conceptually makes sense as
> >> well as from the implementation point of view:
> 
> First of all, I have removed all of the gimplify-stage scanning and setting of
> DECL_POINTS_TO_READONLY and SSA_NAME_POINTS_TO_READONLY_MEMORY (so no changes
> to gimplify.cc now).
> 
> I remember this code was an artifact of earlier attempts to allow
> struct-member pointer mappings to also work (e.g.
> map(readonly:rec.ptr[:N])), but these failed anyway.  I think the omp_data_*
> member accesses when building child function side receiver_refs are blocking
> points-to analysis from working (I didn't try digging deeper).
> 
> Also, during gimplify, VAR_DECLs appeared to be reused (at least in some
> cases) for map clause decl reference building, so hoping that the variables
> "happen to be" single-use and relaying DECL_POINTS_TO_READONLY into
> SSA_NAME_POINTS_TO_READONLY_MEMORY does appear to be a little risky.
> 
> However, for firstprivate pointers processed during omp-low, it appears to be
> somewhat different (see the description below).
> 
> > No, I don't think you can use that flag on non-default-defs, nor
> > preserve it on copying.  So it also doesn't nicely extend to DECLs as
> > done by the patch.  We currently _only_ use it for incoming parameters.
> > When used on arbitrary code you can get to, for example,
> > 
> > ptr1(points-to-readonly-memory) = &x;
> > ... access via ptr1 ...
> > ptr2 = &x;
> > ... access via ptr2 ...
> > 
> > where both are your OMP regions differently constrained (the constraint
> > is on the code in the region, _not_ on the actual protections of the
> > pointed-to data, much like for the Fortran case).  But now CSE comes
> > along and happily replaces all ptr2 with ptr2 in the second region
> > and ... oops!
> 
> Richard, I assume what you meant was "happily replaces all ptr2 with ptr1 in
> the second region"?
> 
> That doesn't happen, because during omp-lower/expand, OMP target regions
> (which are all that this currently applies to) are separated into different
> individual child functions.
> 
> (Currently, the only "effective" use of DECL_POINTS_TO_READONLY is during
> omp-lower, when for firstprivate pointers (i.e. 'a' here) we set this bit
> when constructing the first load of this pointer.)
> 
>   #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
>   {
> foo (a, a[8]);
> r = a[8];
>   }
>   #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
>   {
> foo (a, a[12]);
> r = a[12];
>   }
> 
> After omp-expand (before SSA):
> 
> __attribute__((oacc parallel, omp target entrypoint, noclone))
> void main._omp_fn.1 (const struct .omp_data_t.3 & restrict .omp_data_i)
> {
>  ...
>:
>   D.2962 = .omp_data_i->D.2947;
>   a.8 = D.2962;

So 'readonly: a[:32]' is put in .omp_data_i->D.2947 in the caller
and extracted here.  And you arrange for 'a.8' to have
DECL_POINTS_TO_READONLY set by "magic"?  Looking at this I wonder
if it would be more useful to "const qualify" (but "really", not
in the C sense) .omp_data_i->D.2947 instead?  Thus have a
FIELD_POINTS_TO_READONLY_MEMORY flag on the FIELD_DECL.

Points-to analysis should then be able to handle this similar to how
it handles loads of restrict qualified pointers.  Well, of course not
as simple since it now adds "qualifiers" to storage since I presume
the same object can be both readonly and not readonly like via

 #pragma acc parallel copyin(readonly: a[:32], a[33:64]) copyout(r)

?  That is, currently there's only one "readonly" object kind in
points-to, that's STRING_CSTs which get all globbed to string_id
and "ignored" for alias purposes since you can't change them.

So possibly you want to combine this with restrict qualifying the
pointer so we know there's no other (read-write) access to the memory
possible.  But then you might get all the good stuff already by
_just_ doing that restrict qualification and ignoring the readonly-ness?

>   r.1 = (*a.8)[12];
>   foo (a.8, r.1);
>   r.1 = (*a.8)[12];
>   D.2965 = .omp_data_i->r;
>   *D.2965 = r.1;
>   return;
> }
> 
> __attribute__((oacc parallel, omp target entrypoint, noclone))
> void main._omp_fn.0 (const struct .omp_data_t.2 & restrict .omp_data_i)
> {
>   ...
>:
>   D.2968 = .omp_data_i->D.2939;
>   a.4 = D.2968;
>   r.0 = (*a.4)[8];
>   foo (a.4, r.0);
>   r.0 = (*a.4)[8];
>   D.2971 = .omp_data_i->r;
>   *D.2971 = r.0;
>   return;
> }
> 
> So actually, the

RE: [PATCH v2 1/3] Vect: Support loop len in vectorizable early exit

2024-05-16 Thread Li, Pan2

Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Thursday, May 16, 2024 8:13 PM
To: Tamar Christina 
Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Richard Sandiford 

Subject: Re: [PATCH v2 1/3] Vect: Support loop len in vectorizable early exit

On Thu, May 16, 2024 at 8:50 AM Tamar Christina  wrote:
>
> > -Original Message-
> > From: pan2...@intel.com 
> > Sent: Thursday, May 16, 2024 5:06 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> > ; richard.guent...@gmail.com; Richard Sandiford
> > ; Pan Li 
> > Subject: [PATCH v2 1/3] Vect: Support loop len in vectorizable early exit
> >
> > From: Pan Li 
> >
> > This patch adds early break auto-vectorization support for targets which
> > use length-based partial vectorization.  Consider the following example:
> >
> > unsigned vect_a[802];
> > unsigned vect_b[802];
> >
> > void test (unsigned x, int n)
> > {
> >   for (int i = 0; i < n; i++)
> >   {
> > vect_b[i] = x + i;
> >
> > if (vect_a[i] > x)
> >   break;
> >
> > vect_a[i] = x;
> >   }
> > }
> >
> > We use VCOND_MASK_LEN to generate the equivalent of (mask && i < len + bias).
> > The resulting IR for RVV then looks like this:
> >
> >   ...
> >   _87 = .SELECT_VL (ivtmp_85, POLY_INT_CST [32, 32]);
> >   _55 = (int) _87;
> >   ...
> >   mask_patt_6.13_69 = vect_cst__62 < vect__3.12_67;
> >   vec_len_mask_72 = .VCOND_MASK_LEN (mask_patt_6.13_69, { -1, ... }, \
> > {0, ... }, _87, 0);
> >   if (vec_len_mask_72 != { 0, ... })
> > goto ; [5.50%]
> >   else
> > goto ; [94.50%]
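
As an aside for readers, here is a minimal scalar sketch of what the .VCOND_MASK_LEN call above computes on boolean lanes, assuming the semantics stated in the patch description (a lane stays set only if the compare mask is set and i < len + bias).  NLANES and all function names are hypothetical and exist only for this illustration; this is not the vectorizer's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed lane count; real RVV vectors are length-agnostic.  */
#define NLANES 8

/* Scalar model of .VCOND_MASK_LEN (mask, ones, zeros, len, bias) on
   boolean lanes: lane i is set only if mask[i] is set and
   i < len + bias.  */
void
vcond_mask_len_model (const bool mask[NLANES], size_t len, int bias,
                      bool out[NLANES])
{
  for (size_t i = 0; i < NLANES; i++)
    out[i] = mask[i] && (long) i < (long) len + bias;
}

/* Model of the "if (vec_len_mask != { 0, ... })" early-exit test.  */
bool
any_lane_set (const bool v[NLANES])
{
  for (size_t i = 0; i < NLANES; i++)
    if (v[i])
      return true;
  return false;
}

/* True if the break condition fires in one of the active lanes.  */
bool
early_exit_taken (const bool mask[NLANES], size_t len, int bias)
{
  bool len_mask[NLANES];
  vcond_mask_len_model (mask, len, bias, len_mask);
  return any_lane_set (len_mask);
}
```

With the break condition true only in lane 5, early_exit_taken reports no exit for len = 5 and an exit for len = 6, so lanes at or beyond the active length never trigger the break.
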
> >
> > The following tests pass with this patch:
> > 1. The full riscv regression tests.
> > 2. The x86 bootstrap tests.
> > 3. The full x86 regression tests.
> >
> > gcc/ChangeLog:
> >
> >   * tree-vect-stmts.cc (vectorizable_early_exit): Add loop len
> >   handling for one or multiple stmt.
> >
> > gcc/ChangeLog:
> >
> >   * tree-vect-loop.cc (vect_gen_loop_len_mask): New func to gen
> >   the loop len mask.
> >   * tree-vect-stmts.cc (vectorizable_early_exit): Invoke the
> >   vect_gen_loop_len_mask for 1 or more stmt(s).
> >   * tree-vectorizer.h (vect_gen_loop_len_mask): New func decl
> >   for vect_gen_loop_len_mask.
> >
>
> Thanks, this version looks good to me!
>
> You'll need Richi's review still.

OK.

Thanks,
Richard.

> Cheers,
> Tamar
>
> > Signed-off-by: Pan Li 
> > ---
> >  gcc/tree-vect-loop.cc  | 27 +++
> >  gcc/tree-vect-stmts.cc | 17 +++--
> >  gcc/tree-vectorizer.h  |  4 
> >  3 files changed, 46 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 361aec06488..83c0544b6aa 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -11416,6 +11416,33 @@ vect_get_loop_len (loop_vec_info loop_vinfo,
> > gimple_stmt_iterator *gsi,
> >return loop_len;
> >  }
> >
> > +/* Generate the tree for the loop len mask and return it.  Given the lens,
> > +   nvectors, vectype, index and factor to gen the len mask as below.
> > +
> > +   tree len_mask = VCOND_MASK_LEN (compare_mask, ones, zero, len, bias)
> > +*/
> > +tree
> > +vect_gen_loop_len_mask (loop_vec_info loop_vinfo, gimple_stmt_iterator 
> > *gsi,
> > + gimple_stmt_iterator *cond_gsi, vec_loop_lens *lens,
> > + unsigned int nvectors, tree vectype, tree stmt,
> > + unsigned int index, unsigned int factor)
> > +{
> > +  tree all_one_mask = build_all_ones_cst (vectype);
> > +  tree all_zero_mask = build_zero_cst (vectype);
> > +  tree len = vect_get_loop_len (loop_vinfo, gsi, lens, nvectors, vectype, 
> > index,
> > + factor);
> > +  tree bias = build_int_cst (intQI_type_node,
> > +  LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS
> > (loop_vinfo));
> > +  tree len_mask = make_temp_ssa_name (TREE_TYPE (stmt), NULL,
> > "vec_len_mask");
> > +  gcall *call = gimple_build_call_internal (IFN_VCOND_MASK_LEN, 5, stmt,
> > + all_one_mask, all_zero_mask, len,
> > + bias);
> > +  gimple_call_set_lhs (call, len_mask);
> > +  gsi_insert_before (cond_gsi, call, GSI_SAME_STMT);
> > +
> > +  return len_mask;
> > +}
> > +
> >  /* Scale profiling counters by estimation for LOOP which is vectorized
> > by factor VF.
> > If FLAT is true, the loop we started with had unrealistically flat
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index b8a71605f1b..672959501bb 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -12895,7 +12895,9 @@ vectorizable_early_exit (vec_info *vinfo,
> > stmt_vec_info stmt_info,
> >  ncopies = vect_get_num_copies (loop_vinfo, vectype);
> >
> >vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > +  vec_loop_lens *lens =

Re: [PATCH v2 3/3] RISC-V: Enable vectorizable early exit testsuite

2024-05-16 Thread juzhe.zh...@rivai.ai

RISC-V part LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-05-16 12:05
To: gcc-patches
CC: juzhe.zhong; kito.cheng; tamar.christina; richard.guenther; 
Richard.Sandiford; Pan Li
Subject: [PATCH v2 3/3] RISC-V: Enable vectorizable early exit testsuite
From: Pan Li 
 
Now that RISC-V supports vectorizable early exit, we would like to
enable the gcc vect tests for vectorizable early exit.
 
vect-early-break_124-pr114403.c fails to vectorize for now, because
the 8-byte __builtin_memcpy is not folded into an int64 assignment
during ccp1.  We will improve that first; until then, mark this test
as xfail for RISC-V.
 
The following tests pass with this patch:
1. The full riscv regression tests.
 
gcc/testsuite/ChangeLog:
 
* gcc.dg/vect/slp-mask-store-1.c: Add pragma novector as it will
have 2 times LOOP VECTORIZED in RISC-V.
* gcc.dg/vect/vect-early-break_124-pr114403.c: Xfail for the
riscv backend.
* lib/target-supports.exp: Add RISC-V backend.
 
Signed-off-by: Pan Li 
---
gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c  | 2 ++
gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c | 2 +-
gcc/testsuite/lib/target-supports.exp | 2 ++
3 files changed, 5 insertions(+), 1 deletion(-)
 
diff --git a/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c 
b/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
index fdd9032da98..2f80bf89e5e 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
@@ -28,6 +28,8 @@ main ()
   if (__builtin_memcmp (x, res, sizeof (x)) != 0)
 abort ();
+
+#pragma GCC novector
   for (int i = 0; i < 32; ++i)
 if (flag[i] != 0 && flag[i] != 1)
   abort ();
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
index 51abf245ccb..101ae1e0eaa 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
@@ -2,7 +2,7 @@
/* { dg-require-effective-target vect_early_break_hw } */
/* { dg-require-effective-target vect_long_long } */
-/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { xfail riscv*-*-* } } 
} */
#include "tree-vect.h"
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 6f5d477b128..ec9baa4f32a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4099,6 +4099,7 @@ proc check_effective_target_vect_early_break { } {
|| [check_effective_target_arm_v8_neon_ok]
|| [check_effective_target_sse4]
|| [istarget amdgcn-*-*]
+ || [check_effective_target_riscv_v]
}}]
}
@@ -4114,6 +4115,7 @@ proc check_effective_target_vect_early_break_hw { } {
|| [check_effective_target_arm_v8_neon_hw]
|| [check_sse4_hw_available]
|| [istarget amdgcn-*-*]
+ || [check_effective_target_riscv_v_ok]
}}]
}
-- 
2.34.1

Re: [PATCH v2 2/3] RISC-V: Implement vectorizable early exit with vcond_mask_len

2024-05-16 Thread juzhe.zh...@rivai.ai

RISC-V part LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-05-16 12:05
To: gcc-patches
CC: juzhe.zhong; kito.cheng; tamar.christina; richard.guenther; 
Richard.Sandiford; Pan Li
Subject: [PATCH v2 2/3] RISC-V: Implement vectorizable early exit with 
vcond_mask_len
From: Pan Li 
 
Now that the vectorizer supports loop lens for vectorizable early exit,
we would like to implement the feature for the RISC-V target.  Given the
example below:
 
unsigned vect_a[1923];
unsigned vect_b[1923];
 
void test (unsigned limit, int n)
{
  for (int i = 0; i < n; i++)
{
  vect_b[i] = limit + i;
 
  if (vect_a[i] > limit)
{
  ret = vect_b[i];
  return ret;
}
 
  vect_a[i] = limit;
}
}
 
Before this patch:
  ...
.L8:
  sw    a3,0(a5)
  addiw a0,a0,1
  addi  a4,a4,4
  addi  a5,a5,4
  beq   a1,a0,.L2
.L4:
  sw    a0,0(a4)
  lw    a2,0(a5)
  bleu  a2,a3,.L8
  ret
 
After this patch:
  ...
.L5:
  vsetvli   a5,a3,e8,mf4,ta,ma
  vmv1r.v   v4,v2
  vsetvli   t4,zero,e32,m1,ta,ma
  vmv.v.x   v1,a5
  vadd.vv   v2,v2,v1
  vsetvli   zero,a5,e32,m1,ta,ma
  vadd.vv   v5,v4,v3
  slli  a6,a5,2
  vle32.v   v1,0(t1)
  vmsltu.vv v1,v3,v1
  vcpop.m   t4,v1
  beq   t4,zero,.L4
  vmv.x.s   a4,v4
.L3:
  ...
 
The following tests pass with this patch:
1. The full riscv regression tests.
 
gcc/ChangeLog:
 
* config/riscv/autovec-opt.md
  (*vcond_mask_len_popcount_):
New pattern of vcond_mask_len_popcount for vector bool mode.
* config/riscv/autovec.md (vcond_mask_len_): New pattern
of vcond_mask_len for vector bool mode.
(cbranch4): New pattern for vector bool mode.
* config/riscv/vector-iterators.md: Add new unspec
  UNSPEC_SELECT_MASK.
* config/riscv/vector.md (@pred_popcount): Add
VLS mode to popcount pattern.
(@pred_popcount): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/early-break-1.c: New test.
* gcc.target/riscv/rvv/autovec/early-break-2.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec-opt.md   | 33 ++
gcc/config/riscv/autovec.md   | 61 +++
gcc/config/riscv/vector-iterators.md  |  1 +
gcc/config/riscv/vector.md| 18 +++---
.../riscv/rvv/autovec/early-break-1.c | 34 +++
.../riscv/rvv/autovec/early-break-2.c | 37 +++
6 files changed, 175 insertions(+), 9 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/early-break-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/early-break-2.c
 
diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 645dc53d868..04f85d8e455 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1436,3 +1436,36 @@ (define_insn_and_split "*n"
 DONE;
   }
   [(set_attr "type" "vmalu")])
+
+;; Optimization pattern for early break auto-vectorization
+;; vcond_mask_len (mask, ones, zeros, len, bias) + vlmax popcount
+;; -> non vlmax popcount (mask, len)
+(define_insn_and_split "*vcond_mask_len_popcount_"
+  [(set (match_operand:P 0 "register_operand")
+(popcount:P
+ (unspec:VB_VLS [
+  (unspec:VB_VLS [
+   (match_operand:VB_VLS 1 "register_operand")
+   (match_operand:VB_VLS 2 "const_1_operand")
+   (match_operand:VB_VLS 3 "const_0_operand")
+   (match_operand 4 "autovec_length_operand")
+   (match_operand 5 "const_0_operand")] UNSPEC_SELECT_MASK)
+  (match_operand 6 "autovec_length_operand")
+  (const_int 1)
+  (reg:SI VL_REGNUM)
+  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)))]
+  "TARGET_VECTOR
+   && can_create_pseudo_p ()
+   && riscv_vector::get_vector_mode (Pmode, GET_MODE_NUNITS 
(mode)).exists ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+riscv_vector::emit_nonvlmax_insn (
+ code_for_pred_popcount (mode, Pmode),
+ riscv_vector::CPOP_OP,
+ operands, operands[4]);
+DONE;
+  }
+  [(set_attr "type" "vector")]
+)
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index aa1ae0fe075..1ee3c8052fb 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2612,3 +2612,64 @@ (define_expand "rawmemchr"
 DONE;
   }
)
+
+;; =
+;; == Early break auto-vectorization patterns
+;; =
+
+;; vcond_mask_len (mask, 1s, 0s, len, bias)
+;; => mask[i] = mask[i] && i < len ? 1 : 0
+(define_insn_and_split "vcond_mask_len_"
+  [(set (match_operand:VB 0 "register_operand")
+(unspec: VB [
+ (match_operand:VB 1 "register_operand")
+ (match_operand:VB 2 "const_1_operand")
+ (match_operand:VB 3 "const_0_operand")
+ (match_operand 4 "autovec_length_operand")
+ (match_operand 5 "const_0_operand")] UNSPEC_SELECT_MASK))]
+  "TARGET_VECTOR
+   && can_create_pseudo_p ()
+   && riscv_vector::get_vector_mode (Pmode, GET_MODE_NUNITS 
(mode)).exists ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+

Re: [PATCH] Optab: add isfinite_optab for __builtin_isfinite

2024-05-16 Thread Richard Biener

On Fri, Apr 12, 2024 at 5:07 AM HAO CHEN GUI  wrote:
>
> Hi,
>   This patch adds an optab for __builtin_isfinite. The finite check can be
> implemented on rs6000 with a single instruction. An optab is needed so that
> the builtin can be expanded to that instruction sequence.
>
>   The subsequent patches will implement the expand on rs6000.
>
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for next stage-1?

OK if the rs6000 part is approved.

> Thanks
> Gui Haochen
>
> ChangeLog
> optab: Add isfinite_optab for isfinite builtin
>
> gcc/
> * builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
> for isfinite builtin.
> * optabs.def (isfinite_optab): New.
>
> patch.diff
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index d2786f207b8..5262aa01660 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>errno_set = true; builtin_optab = ilogb_optab; break;
>  CASE_FLT_FN (BUILT_IN_ISINF):
>builtin_optab = isinf_optab; break;
> -case BUILT_IN_ISNORMAL:
>  case BUILT_IN_ISFINITE:
> +  builtin_optab = isfinite_optab; break;
> +case BUILT_IN_ISNORMAL:
>  CASE_FLT_FN (BUILT_IN_FINITE):
>  case BUILT_IN_FINITED32:
>  case BUILT_IN_FINITED64:
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..dcd77315c2a 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
>  OPTAB_D (hypot_optab, "hypot$a3")
>  OPTAB_D (ilogb_optab, "ilogb$a2")
>  OPTAB_D (isinf_optab, "isinf$a2")
> +OPTAB_D (isfinite_optab, "isfinite$a2")
>  OPTAB_D (issignaling_optab, "issignaling$a2")
>  OPTAB_D (ldexp_optab, "ldexp$a3")
>  OPTAB_D (log10_optab, "log10$a2")

Re: [PATCH] Optab: add isnormal_optab for __builtin_isnormal

2024-05-16 Thread Richard Biener

On Fri, Apr 12, 2024 at 10:10 AM HAO CHEN GUI  wrote:
>
> Hi,
>   This patch adds an optab for __builtin_isnormal. The normal check can be
> implemented on rs6000 with a single instruction. An optab is needed so that
> the builtin can be expanded to that instruction sequence.
>
>   The subsequent patches will implement the expand on rs6000.
>
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for next stage-1?

Looks good, if the rs6000 part is approved.

> Thanks
> Gui Haochen
> ChangeLog
> optab: Add isnormal_optab for isnormal builtin
>
> gcc/
> * builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab
> for isnormal builtin.
> * optabs.def (isnormal_optab): New.
>
> patch.diff
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 3174f52ebe8..defb39de95f 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2462,6 +2462,7 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>  case BUILT_IN_ISFINITE:
>builtin_optab = isfinite_optab; break;
>  case BUILT_IN_ISNORMAL:
> +  builtin_optab = isnormal_optab; break;
>  CASE_FLT_FN (BUILT_IN_FINITE):
>  case BUILT_IN_FINITED32:
>  case BUILT_IN_FINITED64:
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index dcd77315c2a..3c401fc0b4c 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3")
>  OPTAB_D (ilogb_optab, "ilogb$a2")
>  OPTAB_D (isinf_optab, "isinf$a2")
>  OPTAB_D (isfinite_optab, "isfinite$a2")
> +OPTAB_D (isnormal_optab, "isnormal$a2")
>  OPTAB_D (issignaling_optab, "issignaling$a2")
>  OPTAB_D (ldexp_optab, "ldexp$a3")
>  OPTAB_D (log10_optab, "log10$a2")
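
For reference, a small sketch of what the two builtins discussed in these patches report for the usual classes of doubles.  It only exercises the source-level behaviour of __builtin_isfinite and __builtin_isnormal, not the optab expansion; the enum and the function name are hypothetical.

```c
#include <float.h>   /* DBL_MIN, DBL_MAX for callers building test inputs */
#include <math.h>    /* INFINITY, NAN for callers building test inputs */

/* Hypothetical classification built only on the two builtins.  */
enum fp_kind
{
  KIND_NORMAL,            /* isfinite and isnormal */
  KIND_ZERO_OR_SUBNORMAL, /* isfinite but not isnormal */
  KIND_INF_OR_NAN         /* not isfinite */
};

enum fp_kind
classify_double (double x)
{
  /* Infinities and NaNs are the only values that are not finite.  */
  if (!__builtin_isfinite (x))
    return KIND_INF_OR_NAN;
  /* Among finite values, zero and subnormals are not normal.  */
  return __builtin_isnormal (x) ? KIND_NORMAL : KIND_ZERO_OR_SUBNORMAL;
}
```

For example, classify_double (DBL_MIN / 2) is expected to land in KIND_ZERO_OR_SUBNORMAL on a target with IEEE double and no flush-to-zero, while INFINITY and NAN both land in KIND_INF_OR_NAN.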

Re: [PATCH v2 1/3] Vect: Support loop len in vectorizable early exit

2024-05-16 Thread Richard Biener

On Thu, May 16, 2024 at 8:50 AM Tamar Christina  wrote:
>
> > -----Original Message-----
> > From: pan2...@intel.com 
> > Sent: Thursday, May 16, 2024 5:06 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> > ; richard.guent...@gmail.com; Richard Sandiford
> > ; Pan Li 
> > Subject: [PATCH v2 1/3] Vect: Support loop len in vectorizable early exit
> >
> > From: Pan Li 
> >
> > This patch adds early break auto-vectorization support for targets which
> > use length for partial vectorization.  Consider the following example:
> >
> > unsigned vect_a[802];
> > unsigned vect_b[802];
> >
> > void test (unsigned x, int n)
> > {
> >   for (int i = 0; i < n; i++)
> >   {
> > vect_b[i] = x + i;
> >
> > if (vect_a[i] > x)
> >   break;
> >
> > vect_a[i] = x;
> >   }
> > }
> >
> > We use VCOND_MASK_LEN to generate the equivalent of (mask && i < len + bias).
> > The IR for RVV then looks like below:
> >
> >   ...
> >   _87 = .SELECT_VL (ivtmp_85, POLY_INT_CST [32, 32]);
> >   _55 = (int) _87;
> >   ...
> >   mask_patt_6.13_69 = vect_cst__62 < vect__3.12_67;
> >   vec_len_mask_72 = .VCOND_MASK_LEN (mask_patt_6.13_69, { -1, ... }, \
> > {0, ... }, _87, 0);
> >   if (vec_len_mask_72 != { 0, ... })
> > goto ; [5.50%]
> >   else
> > goto ; [94.50%]
> >
> > The following tests pass with this patch:
> > 1. The full riscv regression tests.
> > 2. The x86 bootstrap tests.
> > 3. The full x86 regression tests.
> >
> > gcc/ChangeLog:
> >
> >   * tree-vect-stmts.cc (vectorizable_early_exit): Add loop len
> >   handling for one or multiple stmt.
> >
> > gcc/ChangeLog:
> >
> >   * tree-vect-loop.cc (vect_gen_loop_len_mask): New func to gen
> >   the loop len mask.
> >   * tree-vect-stmts.cc (vectorizable_early_exit): Invoke the
> >   vect_gen_loop_len_mask for 1 or more stmt(s).
> >   * tree-vectorizer.h (vect_gen_loop_len_mask): New func decl
> >   for vect_gen_loop_len_mask.
> >
>
> Thanks, this version looks good to me!
>
> You'll need Richi's review still.

OK.

Thanks,
Richard.

> Cheers,
> Tamar
>
> > Signed-off-by: Pan Li 
> > ---
> >  gcc/tree-vect-loop.cc  | 27 +++
> >  gcc/tree-vect-stmts.cc | 17 +++--
> >  gcc/tree-vectorizer.h  |  4 
> >  3 files changed, 46 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 361aec06488..83c0544b6aa 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -11416,6 +11416,33 @@ vect_get_loop_len (loop_vec_info loop_vinfo,
> > gimple_stmt_iterator *gsi,
> >return loop_len;
> >  }
> >
> > +/* Generate the tree for the loop len mask and return it.  Given the lens,
> > +   nvectors, vectype, index and factor to gen the len mask as below.
> > +
> > +   tree len_mask = VCOND_MASK_LEN (compare_mask, ones, zero, len, bias)
> > +*/
> > +tree
> > +vect_gen_loop_len_mask (loop_vec_info loop_vinfo, gimple_stmt_iterator 
> > *gsi,
> > + gimple_stmt_iterator *cond_gsi, vec_loop_lens *lens,
> > + unsigned int nvectors, tree vectype, tree stmt,
> > + unsigned int index, unsigned int factor)
> > +{
> > +  tree all_one_mask = build_all_ones_cst (vectype);
> > +  tree all_zero_mask = build_zero_cst (vectype);
> > +  tree len = vect_get_loop_len (loop_vinfo, gsi, lens, nvectors, vectype, 
> > index,
> > + factor);
> > +  tree bias = build_int_cst (intQI_type_node,
> > +  LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS
> > (loop_vinfo));
> > +  tree len_mask = make_temp_ssa_name (TREE_TYPE (stmt), NULL,
> > "vec_len_mask");
> > +  gcall *call = gimple_build_call_internal (IFN_VCOND_MASK_LEN, 5, stmt,
> > + all_one_mask, all_zero_mask, len,
> > + bias);
> > +  gimple_call_set_lhs (call, len_mask);
> > +  gsi_insert_before (cond_gsi, call, GSI_SAME_STMT);
> > +
> > +  return len_mask;
> > +}
> > +
> >  /* Scale profiling counters by estimation for LOOP which is vectorized
> > by factor VF.
> > If FLAT is true, the loop we started with had unrealistically flat
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index b8a71605f1b..672959501bb 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -12895,7 +12895,9 @@ vectorizable_early_exit (vec_info *vinfo,
> > stmt_vec_info stmt_info,
> >  ncopies = vect_get_num_copies (loop_vinfo, vectype);
> >
> >vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > +  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
> >bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > +  bool len_loop_p = LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo);
> >
> >/* Now build the new conditional.  Pattern gimple_conds get dropped 
> > during
> >   codegen so we must replace the original insn.  */
> > @@

[PATCH] tree-optimization/13962 - handle ptr-ptr compares in ptrs_compare_unequal

2024-05-16 Thread Richard Biener

Now that we handle pt.null conservatively we can implement the missing
tracking of constant pool entries (aka STRING_CST) and handle
ptr-ptr compares using points-to info in ptrs_compare_unequal.
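
To illustrate the rule the patch adds for two SSA pointers, here is a toy model, not GCC code: struct toy_pt is a hypothetical stand-in for pt_solution with the variable set reduced to a bitmask, and the restrict/interposable bail-outs from the real function are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a points-to solution; this is NOT GCC's pt_solution.
   VARS is a bitmask of hypothetical variable ids.  */
struct toy_pt
{
  uint32_t vars;   /* objects the pointer may point to */
  bool null;       /* pointer may be NULL */
  bool const_pool; /* pointer may point to a constant pool entry */
};

/* Return true only when the two pointers provably differ.  */
bool
toy_ptrs_compare_unequal (const struct toy_pt *p1, const struct toy_pt *p2)
{
  /* If both may be NULL they could compare equal.  */
  if (p1->null && p2->null)
    return false;
  /* Constant pool entries are tracked only by a flag in this model,
     so two such pointers may refer to the same entry.  */
  if (p1->const_pool && p2->const_pool)
    return false;
  /* Otherwise the pointers differ iff the variable sets are disjoint.  */
  return (p1->vars & p2->vars) == 0;
}
```

In terms of the new alias-39.c testcase, p (the malloc result, possibly NULL) and q (the address of one of two static variables, never NULL) have disjoint variable sets and at most one of them may be NULL, so the comparison can be folded to 0.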

Bootstrapped on x86_64-unknown-linux-gnu, (re-)testing in progress.

Richard.

PR tree-optimization/13962
PR tree-optimization/96564
* tree-ssa-alias.h (pt_solution::const_pool): New flag.
* tree-ssa-alias.cc (ptrs_compare_unequal): Handle pointer-pointer
compares.
(dump_points_to_solution): Dump the const_pool flag, fix guard
of flag dumping.
* gimple-pretty-print.cc (pp_points_to_solution): Likewise.
* tree-ssa-structalias.cc (find_what_var_points_to): Set
the const_pool flag for STRING.
(pt_solution_ior_into): Handle the const_pool flag.
(ipa_escaped_pt): Initialize it.

* gcc.dg/tree-ssa/alias-39.c: New testcase.
* g++.dg/vect/pr68145.cc: Use -fno-tree-pta to avoid UB
to manifest in transforms no longer vectorizing this testcase
for an ICE.
---
 gcc/gimple-pretty-print.cc   |  5 +++-
 gcc/testsuite/gcc.dg/tree-ssa/alias-39.c | 12 ++
 gcc/tree-ssa-alias.cc| 30 
 gcc/tree-ssa-alias.h |  5 
 gcc/tree-ssa-structalias.cc  |  6 ++---
 5 files changed, 50 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/alias-39.c

diff --git a/gcc/gimple-pretty-print.cc b/gcc/gimple-pretty-print.cc
index abda8871f97..a71e1e0efc7 100644
--- a/gcc/gimple-pretty-print.cc
+++ b/gcc/gimple-pretty-print.cc
@@ -822,6 +822,8 @@ pp_points_to_solution (pretty_printer *buffer, const 
pt_solution *pt)
 pp_string (buffer, "unit-escaped ");
   if (pt->null)
 pp_string (buffer, "null ");
+  if (pt->const_pool)
+pp_string (buffer, "const-pool ");
   if (pt->vars
   && !bitmap_empty_p (pt->vars))
 {
@@ -838,7 +840,8 @@ pp_points_to_solution (pretty_printer *buffer, const 
pt_solution *pt)
   if (pt->vars_contains_nonlocal
  || pt->vars_contains_escaped
  || pt->vars_contains_escaped_heap
- || pt->vars_contains_restrict)
+ || pt->vars_contains_restrict
+ || pt->vars_contains_interposable)
{
  const char *comma = "";
  pp_string (buffer, " (");
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/alias-39.c 
b/gcc/testsuite/gcc.dg/tree-ssa/alias-39.c
new file mode 100644
index 000..3b452893f6b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/alias-39.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-forwprop3" } */
+
+static int a, b;
+int foo (int n, int which)
+{
+  void *p = __builtin_malloc (n);
+  void *q = which ? &a : &b;
+  return p == q;
+}
+
+/* { dg-final { scan-tree-dump "return 0;" "forwprop3" } } */
diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
index 96301bbde7f..6d31fc83691 100644
--- a/gcc/tree-ssa-alias.cc
+++ b/gcc/tree-ssa-alias.cc
@@ -484,9 +484,27 @@ ptrs_compare_unequal (tree ptr1, tree ptr2)
}
   return !pt_solution_includes (&pi->pt, obj1);
 }
-
-  /* ???  We'd like to handle ptr1 != NULL and ptr1 != ptr2
- but those require pt.null to be conservatively correct.  */
+  else if (TREE_CODE (ptr1) == SSA_NAME)
+{
+  struct ptr_info_def *pi1 = SSA_NAME_PTR_INFO (ptr1);
+  if (!pi1
+ || pi1->pt.vars_contains_restrict
+ || pi1->pt.vars_contains_interposable)
+   return false;
+  if (integer_zerop (ptr2) && !pi1->pt.null)
+   return true;
+  if (TREE_CODE (ptr2) == SSA_NAME)
+   {
+ struct ptr_info_def *pi2 = SSA_NAME_PTR_INFO (ptr2);
+ if (!pi2
+ || pi2->pt.vars_contains_restrict
+ || pi2->pt.vars_contains_interposable)
+   return false;
+ if ((!pi1->pt.null || !pi2->pt.null)
+ && (!pi1->pt.const_pool || !pi2->pt.const_pool))
+   return !pt_solutions_intersect (&pi1->pt, &pi2->pt);
+   }
+}
 
   return false;
 }
@@ -636,6 +654,9 @@ dump_points_to_solution (FILE *file, struct pt_solution *pt)
   if (pt->null)
 fprintf (file, ", points-to NULL");
 
+  if (pt->const_pool)
+fprintf (file, ", points-to const-pool");
+
   if (pt->vars)
 {
   fprintf (file, ", points-to vars: ");
@@ -643,7 +664,8 @@ dump_points_to_solution (FILE *file, struct pt_solution *pt)
   if (pt->vars_contains_nonlocal
  || pt->vars_contains_escaped
  || pt->vars_contains_escaped_heap
- || pt->vars_contains_restrict)
+ || pt->vars_contains_restrict
+ || pt->vars_contains_interposable)
{
  const char *comma = "";
  fprintf (file, " (");
diff --git a/gcc/tree-ssa-alias.h b/gcc/tree-ssa-alias.h
index b26fffeeb2d..e29dff58375 100644
--- a/gcc/tree-ssa-alias.h
+++ b/gcc/tree-ssa-alias.h
@@ -47,6 +47,11 @@ struct GTY(()) pt_solution
  includes memory at address NULL.  */

RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-16 Thread Li, Pan2

> For the series, the riscv specific part of course needs riscv approval.

Thanks a lot, have a nice day!

Pan

-----Original Message-----
From: Richard Biener  
Sent: Thursday, May 16, 2024 7:59 PM
To: Li, Pan2 
Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Liu, Hongtao 
Subject: Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
scalar int

On Thu, May 16, 2024 at 11:35 AM Li, Pan2  wrote:
>
> > OK.
>
> Thanks Richard for the help and coaching. To double-check, is the OK for this
> patch only, or for the whole series of SAT middle-end patches?
> Thanks again for the review and suggestions.

For the series, the riscv specific part of course needs riscv approval.

> Pan
>
> -----Original Message-----
> From: Richard Biener 
> Sent: Thursday, May 16, 2024 4:10 PM
> To: Li, Pan2 
> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; 
> juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Liu, Hongtao 
> 
> Subject: Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
> scalar int
>
> On Wed, May 15, 2024 at 1:36 PM Li, Pan2  wrote:
> >
> > > LGTM but you'll need an OK from Richard,
> > > Thanks for working on this!
> >
> > Thanks Tamar for the help and coaching, let's wait for Richard for a while!
>
> OK.
>
> Thanks for the patience,
> Richard.
>
> > Pan
> >
> > -----Original Message-----
> > From: Tamar Christina 
> > Sent: Wednesday, May 15, 2024 5:12 PM
> > To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
> > Liu, Hongtao 
> > Subject: RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for 
> > unsigned scalar int
> >
> > Hi Pan,
> >
> > Thanks!
> >
> > > -----Original Message-----
> > > From: pan2...@intel.com 
> > > Sent: Wednesday, May 15, 2024 3:14 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> > > ; richard.guent...@gmail.com;
> > > hongtao@intel.com; Pan Li 
> > > Subject: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
> > > scalar
> > > int
> > >
> > > From: Pan Li 
> > >
> > > This patch adds the middle-end representation for saturating add,
> > > i.e. the result of the add is set to the maximum value on overflow.
> > > It matches a pattern similar to the one below.
> > >
> > > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> > >
> > > Take uint8_t as example, we will have:
> > >
> > > * SAT_ADD (1, 254)   => 255.
> > > * SAT_ADD (1, 255)   => 255.
> > > * SAT_ADD (2, 255)   => 255.
> > > * SAT_ADD (255, 255) => 255.
> > >
> > > Given below example for the unsigned scalar integer uint64_t:
> > >
> > > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > > {
> > >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > > }
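
For illustration, the same pattern instantiated for uint8_t, so that the example values listed above can be checked by hand.  This is only a sketch: sat_add_u8 is a hypothetical name, while sat_add_u64 repeats the function quoted above.

```c
#include <stdint.h>

/* Branchless unsigned saturating add following the quoted pattern
   SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x)).  */
uint8_t
sat_add_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);        /* wraps modulo 256 */
  uint8_t overflow = (uint8_t) (sum < x); /* 1 if the add wrapped, else 0 */
  /* -overflow truncated to 8 bits is 0xff on overflow and 0 otherwise,
     so the OR either saturates the result or leaves the sum unchanged.  */
  return (uint8_t) (sum | (uint8_t) -overflow);
}

/* The 64-bit form, as in the patch description.  */
uint64_t
sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t) ((uint64_t) (x + y) < x));
}
```

Taking sat_add_u8 (2, 255) as an example: the sum wraps to 1, 1 < 2 signals the overflow, and OR-ing with 0xff yields 255.
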
> > >
> > > Before this patch:
> > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > > {
> > >   long unsigned int _1;
> > >   _Bool _2;
> > >   long unsigned int _3;
> > >   long unsigned int _4;
> > >   uint64_t _7;
> > >   long unsigned int _10;
> > >   __complex__ long unsigned int _11;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;pred:   ENTRY
> > >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> > >   _1 = REALPART_EXPR <_11>;
> > >   _10 = IMAGPART_EXPR <_11>;
> > >   _2 = _10 != 0;
> > >   _3 = (long unsigned int) _2;
> > >   _4 = -_3;
> > >   _7 = _1 | _4;
> > >   return _7;
> > > ;;succ:   EXIT
> > >
> > > }
> > >
> > > After this patch:
> > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > > {
> > >   uint64_t _7;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;pred:   ENTRY
> > >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> > >   return _7;
> > > ;;succ:   EXIT
> > > }
> > >
> > > The following tests pass with this patch:
> > > 1. The full riscv regression tests.
> > > 2. The x86 bootstrap tests.
> > > 3. The full x86 regression tests.
> > >
> > >   PR target/51492
> > >   PR target/112600
> > >
> > > gcc/ChangeLog:
> > >
> > >   * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> > >   to the return true switch case(s).
> > >   * internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> > >   * match.pd: Add unsigned SAT_ADD match(es).
> > >   * optabs.def (OPTAB_NL): Remove fixed-point limitation for
> > >   us/ssadd.
> > >   * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
> > >   extern func decl generated in match.pd match.
> > >   (match_saturation_arith): New func impl to match the saturation 
> > > arith.
> > >   (math_opts_dom_walker::after_dom_children): Try match saturation
> > >   arith when IOR expr.
> > >
> >
> >  LGTM but you'll need an OK from Richard,
> >
> > Thanks for working on this!
> >
> > Tamar
> >
> > > Signed-off-by: Pan Li 
> > > ---
> > >  gcc/internal-fn.cc|  1 +
> > >  gcc/internal-fn.def   |  2 ++
> > >  gcc/match.pd  | 51

Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-16 Thread Richard Biener

On Thu, May 16, 2024 at 11:35 AM Li, Pan2  wrote:
>
> > OK.
>
> Thanks Richard for the help and coaching. To double-check, is the OK for this
> patch only, or for the whole series of SAT middle-end patches?
> Thanks again for the review and suggestions.

For the series, the riscv specific part of course needs riscv approval.

> Pan
>
> -----Original Message-----
> From: Richard Biener 
> Sent: Thursday, May 16, 2024 4:10 PM
> To: Li, Pan2 
> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; 
> juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Liu, Hongtao 
> 
> Subject: Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
> scalar int
>
> On Wed, May 15, 2024 at 1:36 PM Li, Pan2  wrote:
> >
> > > LGTM but you'll need an OK from Richard,
> > > Thanks for working on this!
> >
> > Thanks Tamar for the help and coaching, let's wait for Richard for a while!
>
> OK.
>
> Thanks for the patience,
> Richard.
>
> > Pan
> >
> > -----Original Message-----
> > From: Tamar Christina 
> > Sent: Wednesday, May 15, 2024 5:12 PM
> > To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
> > Liu, Hongtao 
> > Subject: RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for 
> > unsigned scalar int
> >
> > Hi Pan,
> >
> > Thanks!
> >
> > > -----Original Message-----
> > > From: pan2...@intel.com 
> > > Sent: Wednesday, May 15, 2024 3:14 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> > > ; richard.guent...@gmail.com;
> > > hongtao@intel.com; Pan Li 
> > > Subject: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
> > > scalar
> > > int
> > >
> > > From: Pan Li 
> > >
> > > This patch adds the middle-end representation for saturating add,
> > > i.e. the result of the add is set to the maximum value on overflow.
> > > It matches a pattern similar to the one below.
> > >
> > > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> > >
> > > Take uint8_t as example, we will have:
> > >
> > > * SAT_ADD (1, 254)   => 255.
> > > * SAT_ADD (1, 255)   => 255.
> > > * SAT_ADD (2, 255)   => 255.
> > > * SAT_ADD (255, 255) => 255.
> > >
> > > Given below example for the unsigned scalar integer uint64_t:
> > >
> > > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > > {
> > >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > > }
> > >
> > > Before this patch:
> > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > > {
> > >   long unsigned int _1;
> > >   _Bool _2;
> > >   long unsigned int _3;
> > >   long unsigned int _4;
> > >   uint64_t _7;
> > >   long unsigned int _10;
> > >   __complex__ long unsigned int _11;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;pred:   ENTRY
> > >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> > >   _1 = REALPART_EXPR <_11>;
> > >   _10 = IMAGPART_EXPR <_11>;
> > >   _2 = _10 != 0;
> > >   _3 = (long unsigned int) _2;
> > >   _4 = -_3;
> > >   _7 = _1 | _4;
> > >   return _7;
> > > ;;succ:   EXIT
> > >
> > > }
> > >
> > > After this patch:
> > > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > > {
> > >   uint64_t _7;
> > >
> > > ;;   basic block 2, loop depth 0
> > > ;;pred:   ENTRY
> > >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> > >   return _7;
> > > ;;succ:   EXIT
> > > }
> > >
> > > The below tests are passed for this patch:
> > > 1. The riscv fully regression tests.
> > > 3. The x86 bootstrap tests.
> > > 4. The x86 fully regression tests.
> > >
> > >   PR target/51492
> > >   PR target/112600
> > >
> > > gcc/ChangeLog:
> > >
> > >   * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> > >   to the return true switch case(s).
> > >   * internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> > >   * match.pd: Add unsigned SAT_ADD match(es).
> > >   * optabs.def (OPTAB_NL): Remove fixed-point limitation for
> > >   us/ssadd.
> > >   * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
> > >   extern func decl generated in match.pd match.
> > >   (match_saturation_arith): New func impl to match the saturation 
> > > arith.
> > >   (math_opts_dom_walker::after_dom_children): Try match saturation
> > >   arith when IOR expr.
> > >
> >
> >  LGTM but you'll need an OK from Richard,
> >
> > Thanks for working on this!
> >
> > Tamar
> >
> > > Signed-off-by: Pan Li 
> > > ---
> > >  gcc/internal-fn.cc|  1 +
> > >  gcc/internal-fn.def   |  2 ++
> > >  gcc/match.pd  | 51 +++
> > >  gcc/optabs.def|  4 +--
> > >  gcc/tree-ssa-math-opts.cc | 32 
> > >  5 files changed, 88 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > > index 0a7053c2286..73045ca8c8c 100644
> > > --- a/gcc/internal-fn.cc
> > > +++ b/gcc/internal-fn.cc
> > > @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn

[PATCH] wrong code with points-to and volatile

2024-05-16 Thread Richard Biener

The following fixes points-to analysis which ignores the fact that
volatile qualified refs can result in any pointer.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

Btw, I noticed this while working on ptr-vs-ptr compare simplification
using points-to info and running into gcc.c-torture/execute/pr64242.c

* tree-ssa-structalias.cc (get_constraint_for_1): For
volatile references or decls use ANYTHING.

* gcc.dg/tree-ssa/alias-38.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/alias-38.c | 14 ++
 gcc/tree-ssa-structalias.cc  |  7 +++
 2 files changed, 21 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/alias-38.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/alias-38.c 
b/gcc/testsuite/gcc.dg/tree-ssa/alias-38.c
new file mode 100644
index 000..a5c41493473
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/alias-38.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int x;
+int y;
+
+int main ()
+{
+  int *volatile p = &x;
+  return (p != &y);
+}
+
+/* { dg-final { scan-tree-dump " != &y" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "return 1;" "optimized" } } */
diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
index 9c63305063c..f0454bea2ea 100644
--- a/gcc/tree-ssa-structalias.cc
+++ b/gcc/tree-ssa-structalias.cc
@@ -3575,6 +3575,10 @@ get_constraint_for_1 (tree t, vec *results, bool 
address_p,
   }
 case tcc_reference:
   {
+   if (TREE_THIS_VOLATILE (t))
+ /* Fall back to anything.  */
+ break;
+
switch (TREE_CODE (t))
  {
  case MEM_REF:
@@ -3676,6 +3680,9 @@ get_constraint_for_1 (tree t, vec *results, bool 
address_p,
   }
 case tcc_declaration:
   {
+   if (VAR_P (t) && TREE_THIS_VOLATILE (t))
+ /* Fall back to anything.  */
+ break;
get_constraint_for_ssa_var (t, results, address_p);
return;
   }
-- 
2.35.3

Re: [PATCH] Add extra copy of the ifcombine pass after pre [PR102793]

2024-05-16 Thread Andrew Pinski

On Thu, May 16, 2024, 12:55 PM Oleg Endo  wrote:

>
> On Thu, 2024-05-16 at 10:35 +0200, Richard Biener wrote:
> > On Fri, Apr 5, 2024 at 8:14 PM Andrew Pinski  wrote:
> > >
> > > On Fri, Apr 5, 2024 at 5:28 AM Manolis Tsamis wrote:
> > > >
> > > > If we consider code like:
> > > >
> > > > if (bar1 == x)
> > > >   return foo();
> > > > if (bar2 != y)
> > > >   return foo();
> > > > return 0;
> > > >
> > > > We would like the ifcombine pass to convert this to:
> > > >
> > > > if (bar1 == x || bar2 != y)
> > > >   return foo();
> > > > return 0;
> > > >
> > > > The ifcombine pass can handle this transformation but it is run very
> > > > early and it misses the opportunity because there are two separate
> > > > blocks for foo().  The pre pass is good at removing duplicate code
> > > > and blocks, and because of that running ifcombine again after it can
> > > > increase the number of successful conversions.
> > >
> > > I do think we should have something similar to re-running
> > > ssa-ifcombine but I think it should be much later, like after the loop
> > > optimizations are done.
> > > Maybe just a simplified version of it (that does the combining and not
> > > the optimizations part) included in isel or pass_optimize_widening_mul
> > > (which itself should most likely become part of isel or renamed since
> > > it handles more than just widening multiply these days).
> >
> > I've long wished we had a (late?) pass that can also undo if-conversion
> > (basically do what RTL expansion would later do).  Maybe
> > gimple-predicate-analysis.cc (what's used by uninit analysis) can
> > represent mixed CFG + if-converted conditions so we can optimize
> > it and code-gen the condition in a more optimal manner much like
> > we have if-to-switch, switch-conversion and switch-expansion.
> >
> > That said, I agree that re-running ifcombine should be later.  And
> > there's still the old task of splitting tail-merging from PRE (and
> > possibly making it more effective).
>
> Sorry to butt in, but it might be a little bit relevant and caught my
> attention.
>
> I've got this SH patch sitting around
> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55543
>
> The idea is basically to run an additional loop pass after combine and
> split1.  The main purpose is to hoist constant loads out of loops. Such
> constant loads might be formed (in this particular case) during combine
> transformations.
>
> The patch adds a new file gcc/config/sh/sh_loop.cc, which has some
> boilerplate code copy-pasted from other places to get the loop pass set
> up and going.
>
> Any thoughts on this way of doing it?
>

I have been looking at a similar issue on aarch64 for a few cases, csinc
and nand. What I decided to do for nand was not depend on combine in the
end and create a new infrastructure to expand better to rtl from gimple and
maybe even have target specific pattern matching on the gimple level. So
the constant is not part of the other instruction.

I should have a write up/first draft of an implementation by August time
frame or so. The write up will most likely be earlier.

Thanks,
Andrew



>
> Best regards,
> Oleg Endo
>

Re: [PATCH v3] driver: Output to a temp file; rename upon success [PR80182]

2024-05-16 Thread Peter0x44


On 2024-05-16 01:29, Richard Biener wrote:
> On Sun, May 12, 2024 at 3:40 PM Peter Damianov  wrote:
> >
> > Currently, commands like:
> > gcc -o file.c -lm
> > will delete the user's code.
> >
> > This patch makes the linker write executables to a temp file, and then
> > renames the temp file if successful. This fixes the case above, but has
> > limitations. The source file will still get overwritten if the link
> > "succeeds", such as the case of: gcc -o file.c -lm -r
> >
> > It's not perfect, but it should hopefully stop some people from ruining
> > their day.
>
> Hmm.  When suggesting this I was originally hoping for this to be
> implemented in the linker so that it delays opening (and truncating) of
> the output file as much as possible.

Ah, okay, I assumed you wanted it in the driver since then it could
still fix the problem for older linker versions, but it could be a
problem to sort the linker too.

> If we want to do something in the compiler driver then I like the
> filename based heuristics more.  v3 seems to only address the case of -o
> specifying the linker output file but of course

The filename heuristics feel like too much hacks for my liking, but
maybe I don't have a rational reason to feel that way.
I had some trouble figuring exactly which suffixes to reject, obviously
-S should not reject .s as an output file, but I still don't think I got
it all correct. I'm also a little worried, perhaps there is some weird
makefiles or configure scripts out there that do depend on this
behavior.

> gcc -c t.c -o t2.c
>
> or
>
> gcc -S t.c -o t2.c
>
> happily overwrite a source file as well.  For these cases heuristically
> rejecting source file patterns would be better.  As we've shown the
> rename trick when the link was successful doesn't fully solve the issue.
> And I bet some people will claim it isn't an issue at all ...

I don't think there is any easy or nice way to "fully solve the issue",
especially if you want to consider -c, -E, -S, etc.

One other idea for -c could be refusing to write out the object file if
there is no elf/coff/pe/macho header, but I don't like it, sounds too
complex.

> That is, I do think the linker itself, as a quality of implementation
> issue, should avoid truncating the output early.  In fact the BFD linker
> seems to unlink the output very early:

Agreed. I decided to try some other linkers, lld and mold both don't
have this issue.
BFD and gold do. I suppose I should open a bug for that, or investigate
myself.

> 24937 stat("t.c", {st_mode=S_IFREG|0644, st_size=13, ...}) = 0
> 24937 lstat("t.c", {st_mode=S_IFREG|0644, st_size=13, ...}) = 0
> 24937 unlink("t.c") = 0
> 24937 openat(AT_FDCWD, "t.c", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3
>
> before even opening other inputs or the default linker script.
>
> Richard.


gcc/ChangeLog:
PR driver/80182
* gcc.cc (output_file_temp): New global variable
(driver_handle_option): Create temp file for executable output
(driver::maybe_run_linker): Rename output_file_temp to 
output_file if

the linker ran successfully

Signed-off-by: Peter Damianov 
---

v3: don't attempt to create temp files -> rename for -o /dev/null

 gcc/gcc.cc | 53 +
 1 file changed, 37 insertions(+), 16 deletions(-)

diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index 830a4700a87..5e38c6e578a 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -2138,6 +2138,11 @@ static int have_E = 0;
 /* Pointer to output file name passed in with -o. */
 static const char *output_file = 0;

+/* We write the output file to a temp file, and rename it if linking
+   is successful. This is to prevent mistakes like: gcc -o file.c -lm 
from

+   deleting the user's code.  */
+static const char *output_file_temp = 0;
+
 /* Pointer to input file name passed in with -truncate.
This file should be truncated after linking. */
 static const char *totruncate_file = 0;
@@ -4610,10 +4615,18 @@ driver_handle_option (struct gcc_options 
*opts,
 #if defined(HAVE_TARGET_EXECUTABLE_SUFFIX) || 
defined(HAVE_TARGET_OBJECT_SUFFIX)

   arg = convert_filename (arg, ! have_c, 0);
 #endif
-  output_file = arg;
+  output_file_temp = output_file = arg;
+  /* If creating an executable, create a temp file for the 
output, unless
+ -o /dev/null was requested. This will later get renamed, if 
the linker

+ succeeds.  */
+  if (!have_c && strcmp (output_file, HOST_BIT_BUCKET) != 0)
+{
+  output_file_temp = make_temp_file ("");
+  record_temp_file (output_file_temp, false, true);
+}
   /* On some systems, ld cannot handle "-o" without a space.  So
 split the option from its argument.  */
-  save_switch ("-o", 1, &arg, validated, true);
+  save_switch ("-o", 1, &output_file_temp, validated, true);
   return true;

 case OPT_pie:
@@ -9266,22 +9279,30 @@ driver::maybe_run_linker (const char *argv0) 
const

   linker_was_run = (tmp != execution_count);
 }

-  /* If options said don't
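
The write-to-a-temp-file-then-rename idea discussed in this thread can be
sketched outside the driver.  This is only an illustration under stated
assumptions, not the code from the patch: write_then_rename and its
step_ok flag are hypothetical, it uses POSIX mkstemp/rename rather than
the driver's make_temp_file, and it creates the temp file next to the
destination so that rename stays within one filesystem.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write DATA to a temporary file next to FINAL_PATH
   and rename it over FINAL_PATH only if everything succeeded.  STEP_OK
   stands in for "the linker ran successfully".  Returns 0 on success and
   -1 otherwise; on failure FINAL_PATH is left untouched.  */
static int
write_then_rename (const char *final_path, const char *data, int step_ok)
{
  char tmp[4096];
  int n = snprintf (tmp, sizeof tmp, "%s.XXXXXX", final_path);
  if (n < 0 || (size_t) n >= sizeof tmp)
    return -1;

  /* mkstemp replaces the XXXXXX suffix and creates the file.  */
  int fd = mkstemp (tmp);
  if (fd < 0)
    return -1;

  size_t len = strlen (data);
  ssize_t written = write (fd, data, len);
  int close_failed = close (fd) != 0;
  int ok = step_ok && !close_failed && written == (ssize_t) len;

  /* Only now is the destination replaced.  */
  if (ok && rename (tmp, final_path) == 0)
    return 0;

  /* Failure: drop the temp file, keep whatever FINAL_PATH contained.  */
  unlink (tmp);
  return -1;
}
```

With step_ok == 0 an existing file at final_path keeps its contents.  As
pointed out in the thread, this does nothing for a step that "succeeds",
so it only covers part of the problem.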

Re: [PATCH] Add extra copy of the ifcombine pass after pre [PR102793]

2024-05-16 Thread Oleg Endo



On Thu, 2024-05-16 at 10:35 +0200, Richard Biener wrote:
> On Fri, Apr 5, 2024 at 8:14 PM Andrew Pinski  wrote:
> > 
> > On Fri, Apr 5, 2024 at 5:28 AM Manolis Tsamis  
> > wrote:
> > > 
> > > If we consider code like:
> > > 
> > > if (bar1 == x)
> > >   return foo();
> > > if (bar2 != y)
> > >   return foo();
> > > return 0;
> > > 
> > > We would like the ifcombine pass to convert this to:
> > > 
> > > if (bar1 == x || bar2 != y)
> > >   return foo();
> > > return 0;
> > > 
> > > The ifcombine pass can handle this transformation but it is run very
> > > early and it misses the opportunity because there are two separate
> > > blocks for foo().  The pre pass is good at removing duplicate code and
> > > blocks, and because of that running ifcombine again after it can
> > > increase the number of successful conversions.
> > 
> > I do think we should have something similar to re-running
> > ssa-ifcombine but I think it should be much later, like after the loop
> > optimizations are done.
> > Maybe just a simplified version of it (that does the combining and not
> > the optimizations part) included in isel or pass_optimize_widening_mul
> > (which itself should most likely become part of isel or renamed since
> > it handles more than just widening multiply these days).
> 
> I've long wished we had a (late?) pass that can also undo if-conversion
> (basically do what RTL expansion would later do).  Maybe
> gimple-predicate-analysis.cc (what's used by uninit analysis) can
> represent mixed CFG + if-converted conditions so we can optimize
> it and code-gen the condition in a more optimal manner much like
> we have if-to-switch, switch-conversion and switch-expansion.
> 
> That said, I agree that re-running ifcombine should be later.  And there's
> still the old task of splitting tail-merging from PRE (and possibly making
> it more effective).

Sorry to butt in, but it might be a little bit relevant and caught my
attention.

I've got this SH patch sitting around
https://gcc.gnu.org/bugzilla/attachment.cgi?id=55543

The idea is basically to run an additional loop pass after combine and
split1.  The main purpose is to hoist constant loads out of loops. Such
constant loads might be formed (in this particular case) during combine
transformations.

The patch adds a new file gcc/config/sh/sh_loop.cc, which has some
boilerplate code copy-pasted from other places to get the loop pass set up
and going.

Any thoughts on this way of doing it?


Best regards,
Oleg Endo
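
For reference, the source-level equivalence behind the ifcombine example
quoted at the top of this thread can be written out as two C functions.
This is only a sketch of what the transformation means at the source
level, not of how the pass works on GIMPLE; foo and the counter foo_calls
are stand-ins made up here.

```c
#include <assert.h>

/* Hypothetical callee; the counter lets a caller observe that both
   forms call foo the same number of times.  */
static int foo_calls;

static int
foo (void)
{
  foo_calls++;
  return 42;
}

/* Shape before the transformation: two separate blocks calling foo.  */
static int
before_ifcombine (int bar1, int bar2, int x, int y)
{
  if (bar1 == x)
    return foo ();
  if (bar2 != y)
    return foo ();
  return 0;
}

/* Shape after the transformation: one combined condition.  The
   short-circuit || keeps the evaluation order of the original.  */
static int
after_ifcombine (int bar1, int bar2, int x, int y)
{
  if (bar1 == x || bar2 != y)
    return foo ();
  return 0;
}
```

Both functions return the same value for every input and call foo at most
once, which is why the combined form is a valid target once the duplicate
foo() blocks have been merged.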

[wwwdocs] Document reimplementation of GNU threads library on Windows

2024-05-16 Thread Eric Botcazou

... which happened in GCC 13.

Validated with W3C's Validator and applied.

-- 
Eric Botcazou
diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index e324b782..3ab4a101 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -770,8 +770,17 @@ You may also want to check out our
 
 
 
-
-
+Windows
+
+  The GNU threads library used by the win32 thread model has
+  been reimplemented using direct Win32 API calls, except for the Objective-C
+  specific subset.  It requires Windows XP/Server 2003 or later.  The new
+  implementation also adds the support needed for the C++11 threads, using
+  again direct Win32 API calls; this additional layer requires Windows
+  Vista/Server 2008 or later.  It is recommended to use a recent version of
+  MinGW-W64 in conjunction with the win32 thread model.
+  
+

Re: [COMMITTED] Revert "Revert: "Enable prange support.""

2024-05-16 Thread Jakub Jelinek

On Thu, May 16, 2024 at 12:14:09PM +0200, Aldy Hernandez wrote:
> Wait, what's the preferred way of reverting a patch?  I followed what I saw 
> in:

To revert a patch (that isn't itself a reversion), just push what git
revert creates.  The important part is not to modify the
"This reverts commit" line from what git revert created.

> commit 04ee1f788ceaa4c7f777ff3b9441ae076191439c
> Author: Jeff Law 
> Date:   Mon May 13 21:42:38 2024 -0600
> 
> Revert "[PATCH v2 1/3] RISC-V: movmem for RISCV with V extension"
> 
> This reverts commit df15eb15b5f820321c81efc75f0af13ff8c0dd5b.

So, this is just fine.

> and here:
> 
> commit 0c6dd4b0973738ce43e76b468a002ab5eb58aaf4
> Author: YunQiang Su 
> Date:   Mon May 13 14:15:38 2024 +0800
> 
> Revert "MIPS: Support constraint 'w' for MSA instruction"
> 
> This reverts commit 9ba01240864ac446052d97692e2199539b7c76d8.

And this too.

What is not fine is hand-editing the message, e.g.:
This reverts commit 9ba01240864ac446052d97692e2199539b7c76d8 because
foo and bar.
If you want to give a reason, do that separately, so
This reverts commit 9ba01240864ac446052d97692e2199539b7c76d8.
The reversion is because of foo and bar.
Also not fine is being further creative:
This reverts commit r13-8390-g9de6ff5ec9a46951d2.
etc.

> commit f6ce85502eb2e4e7bbd9b3c6c1c065a004f8f531
> Author: Hans-Peter Nilsson 
> Date:   Wed May 8 04:11:20 2024 +0200
> 
> Revert "Revert "testsuite/gcc.target/cris/pr93372-2.c: Handle
> xpass from combine improvement""
> 
> This reverts commit 39f81924d88e3cc197fc3df74204c9b5e01e12f7.

This one is not fine.  Our current infrastructure for ChangeLog
generation can't deal with that, and there is no agreement on what to
write in the ChangeLog for it anyway: do 2 reversions turn into a
Reapply commit: line or into 2 Revert: lines?  What happens on a 3rd
reversion?
So, one needs to manually remove the
This reverts commit 39f81924d88e3cc197fc3df74204c9b5e01e12f7.
line and supply a ChangeLog entry.

For cases like this or the amended lines (or, say, if the This reverts
commit or (cherry picked from commit ) lines refer to an invalid commit),
the daily DATESTAMP update then fails; I need to manually add
all problematic commits to IGNORED_COMMITS, rerun it by hand and
then write the ChangeLog entries by hand.
See
https://gcc.gnu.org/r13-8764
https://gcc.gnu.org/r15-426
https://gcc.gnu.org/r15-345
https://gcc.gnu.org/r15-344
https://gcc.gnu.org/r15-341
https://gcc.gnu.org/r14-9832
https://gcc.gnu.org/r14-9830
for what I had to do only in April/May for this.

Jakub
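
The rule stated above (leave the "This reverts commit" line exactly as
git revert wrote it) can be illustrated with a tiny checker.  This is a
hypothetical sketch written for this note, not the check the GCC
ChangeLog tooling performs; the function name is made up, and it assumes
git revert emits the full 40-digit lowercase hexadecimal hash followed by
a period and nothing else on the line.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical check: return 1 if LINE is exactly
   "This reverts commit <40 lowercase hex digits>." and 0 otherwise.  */
static int
is_unmodified_revert_line (const char *line)
{
  static const char prefix[] = "This reverts commit ";
  size_t plen = sizeof prefix - 1;

  if (strncmp (line, prefix, plen) != 0)
    return 0;

  const char *p = line + plen;
  for (int i = 0; i < 40; i++)
    {
      char c = p[i];
      /* A terminating NUL fails this test too, so the loop never reads
	 past the end of a short string.  */
      if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
	return 0;
    }

  /* Nothing may follow the hash except the final period.  */
  return p[40] == '.' && p[41] == '\0';
}
```

Under these assumptions the untouched line from the MIPS revert passes,
while both the "... because" variant and the r13-8390-g... variant are
rejected.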

Re: [COMMITTED] Revert "Revert: "Enable prange support.""

2024-05-16 Thread Xi Ruoyao

On Thu, 2024-05-16 at 12:14 +0200, Aldy Hernandez wrote:
> Wait, what's the preferred way of reverting a patch?  I followed what
> I saw in:
> 
> commit 04ee1f788ceaa4c7f777ff3b9441ae076191439c
> Author: Jeff Law 
> Date:   Mon May 13 21:42:38 2024 -0600
> 
>     Revert "[PATCH v2 1/3] RISC-V: movmem for RISCV with V extension"
> 
>     This reverts commit df15eb15b5f820321c81efc75f0af13ff8c0dd5b.

A revert is OK, but a revert of a revert is not.

https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651144.html

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [COMMITTED] Revert "Revert: "Enable prange support.""

2024-05-16 Thread Aldy Hernandez

Wait, what's the preferred way of reverting a patch?  I followed what I saw in:

commit 04ee1f788ceaa4c7f777ff3b9441ae076191439c
Author: Jeff Law 
Date:   Mon May 13 21:42:38 2024 -0600

Revert "[PATCH v2 1/3] RISC-V: movmem for RISCV with V extension"

This reverts commit df15eb15b5f820321c81efc75f0af13ff8c0dd5b.

and here:

commit 0c6dd4b0973738ce43e76b468a002ab5eb58aaf4
Author: YunQiang Su 
Date:   Mon May 13 14:15:38 2024 +0800

Revert "MIPS: Support constraint 'w' for MSA instruction"

This reverts commit 9ba01240864ac446052d97692e2199539b7c76d8.

and here:

commit f6ce85502eb2e4e7bbd9b3c6c1c065a004f8f531
Author: Hans-Peter Nilsson 
Date:   Wed May 8 04:11:20 2024 +0200

Revert "Revert "testsuite/gcc.target/cris/pr93372-2.c: Handle
xpass from combine improvement""

This reverts commit 39f81924d88e3cc197fc3df74204c9b5e01e12f7.

etc etc.

Next time, would you like me to add manual changelog entries?

My apologies, I thought what I did was the blessed way of doing things.
Aldy

On Thu, May 16, 2024 at 12:08 PM Jakub Jelinek  wrote:
>
> On Thu, May 16, 2024 at 12:01:01PM +0200, Aldy Hernandez wrote:
> > This reverts commit d7bb8eaade3cd3aa70715c8567b4d7b08098e699 and enables 
> > prange
> > support again.
>
> Please don't do this.
> This breaks ChangeLog generation, will need to handle it tomorrow by hand 
> again.
> Both the amendments to the git (cherry-pick -x or revert) added message
> lines
> This reverts commit COMMITHASH.
> and
> (cherry picked from commit COMMITHASH)
> and revert of revert.
>
> Jakub
>

Re: [COMMITTED] Revert "Revert: "Enable prange support.""

2024-05-16 Thread Jakub Jelinek

On Thu, May 16, 2024 at 12:01:01PM +0200, Aldy Hernandez wrote:
> This reverts commit d7bb8eaade3cd3aa70715c8567b4d7b08098e699 and enables 
> prange
> support again.

Please don't do this.
This breaks ChangeLog generation; I will need to handle it by hand again
tomorrow.  That applies both to amendments of the message lines added by
git (cherry-pick -x or revert), i.e.
This reverts commit COMMITHASH.
and
(cherry picked from commit COMMITHASH)
and to a revert of a revert.

Jakub

[COMMITTED] Revert "Revert: "Enable prange support.""

2024-05-16 Thread Aldy Hernandez

This reverts commit d7bb8eaade3cd3aa70715c8567b4d7b08098e699 and enables prange
support again.
---
 gcc/gimple-range-cache.cc |  4 ++--
 gcc/gimple-range-fold.cc  |  4 ++--
 gcc/gimple-range-fold.h   |  2 +-
 gcc/gimple-range-infer.cc |  2 +-
 gcc/gimple-range-op.cc|  2 +-
 gcc/gimple-range-path.cc  |  2 +-
 gcc/gimple-ssa-warn-access.cc |  2 +-
 gcc/ipa-cp.h  |  2 +-
 gcc/range-op-ptr.cc   |  4 
 gcc/range-op.cc   | 18 --
 gcc/tree-ssa-structalias.cc   |  2 +-
 gcc/value-range.cc|  1 +
 gcc/value-range.h |  4 ++--
 13 files changed, 18 insertions(+), 31 deletions(-)

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 72ac2552311..bdd2832873a 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -274,10 +274,10 @@ sbr_sparse_bitmap::sbr_sparse_bitmap (tree t, 
vrange_allocator *allocator,
   // Pre-cache zero and non-zero values for pointers.
   if (POINTER_TYPE_P (t))
 {
-  int_range<2> nonzero;
+  prange nonzero;
   nonzero.set_nonzero (t);
   m_range[1] = m_range_allocator->clone (nonzero);
-  int_range<2> zero;
+  prange zero;
   zero.set_zero (t);
   m_range[2] = m_range_allocator->clone (zero);
 }
diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 9c4ad1ee7b9..a9c8c4d03e6 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -597,7 +597,7 @@ fold_using_range::fold_stmt (vrange , gimple *s, 
fur_source , tree name)
   // Process addresses.
   if (gimple_code (s) == GIMPLE_ASSIGN
   && gimple_assign_rhs_code (s) == ADDR_EXPR)
-    return range_of_address (as_a <irange> (r), s, src);
+    return range_of_address (as_a <prange> (r), s, src);
 
   gimple_range_op_handler handler (s);
   if (handler)
@@ -757,7 +757,7 @@ fold_using_range::range_of_range_op (vrange ,
 // If a range cannot be calculated, set it to VARYING and return true.
 
 bool
-fold_using_range::range_of_address (irange , gimple *stmt, fur_source )
+fold_using_range::range_of_address (prange , gimple *stmt, fur_source )
 {
   gcc_checking_assert (gimple_code (stmt) == GIMPLE_ASSIGN);
   gcc_checking_assert (gimple_assign_rhs_code (stmt) == ADDR_EXPR);
diff --git a/gcc/gimple-range-fold.h b/gcc/gimple-range-fold.h
index 7cbe15d05e5..c7c599bfc93 100644
--- a/gcc/gimple-range-fold.h
+++ b/gcc/gimple-range-fold.h
@@ -157,7 +157,7 @@ protected:
  fur_source );
   bool range_of_call (vrange , gcall *call, fur_source );
   bool range_of_cond_expr (vrange , gassign* cond, fur_source );
-  bool range_of_address (irange , gimple *s, fur_source );
+  bool range_of_address (prange , gimple *s, fur_source );
   bool range_of_phi (vrange , gphi *phi, fur_source );
   void range_of_ssa_name_with_loop_info (vrange &, tree, class loop *, gphi *,
 fur_source );
diff --git a/gcc/gimple-range-infer.cc b/gcc/gimple-range-infer.cc
index c8e8b9b60ac..d5e1aa14275 100644
--- a/gcc/gimple-range-infer.cc
+++ b/gcc/gimple-range-infer.cc
@@ -123,7 +123,7 @@ gimple_infer_range::add_nonzero (tree name)
 {
   if (!gimple_range_ssa_p (name))
 return;
-  int_range<2> nz;
+  prange nz;
   nz.set_nonzero (TREE_TYPE (name));
   add_range (name, nz);
 }
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 7321342b00d..aec3f39ec0e 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1107,7 +1107,7 @@ class cfn_strlen : public range_operator
 {
 public:
   using range_operator::fold_range;
-  virtual bool fold_range (irange , tree type, const irange &,
+  virtual bool fold_range (irange , tree type, const prange &,
   const irange &, relation_trio) const
   {
 wide_int max = irange_val_max (ptrdiff_type_node);
diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 96c6ac6b6a5..f1a12f76144 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -443,7 +443,7 @@ path_range_query::compute_ranges_in_block (basic_block bb)
 void
 path_range_query::adjust_for_non_null_uses (basic_block bb)
 {
-  int_range_max r;
+  prange r;
   bitmap_iterator bi;
   unsigned i;
 
diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 2c10d19e7f3..0cd5b6d6ef4 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -4213,7 +4213,7 @@ pass_waccess::check_pointer_uses (gimple *stmt, tree ptr,
 where the realloc call is known to have failed are valid.
 Ignore pointers that nothing is known about.  Those could
 have escaped along with their nullness.  */
- value_range vr;
+ prange vr;
  if (m_ptr_qry.rvals->range_of_expr (vr, realloc_lhs, use_stmt))
{
  if (vr.zero_p ())
diff --git a/gcc/ipa-cp.h b/gcc/ipa-cp.h
index abeaaa4053e..e62a09f38af 100644
--- a/gcc/ipa-cp.h
+++

[COMMITTED] Use a boolean type when folding conditionals in simplify_using_ranges.

2024-05-16 Thread Aldy Hernandez

In adding some traps for PR114985 I noticed that the conditional
folding code in simplify_using_ranges was using the wrong type.  This
cleans up the oversight.

gcc/ChangeLog:

PR tree-optimization/114985
* vr-values.cc (simplify_using_ranges::fold_cond_with_ops): Use
boolean type when folding conditionals.
---
 gcc/vr-values.cc | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
index 0572bf6c8c7..e6ea9592574 100644
--- a/gcc/vr-values.cc
+++ b/gcc/vr-values.cc
@@ -316,10 +316,9 @@ simplify_using_ranges::fold_cond_with_ops (enum tree_code 
code,
   || !query->range_of_expr (r1, op1, s))
 return NULL_TREE;
 
-  tree type = TREE_TYPE (op0);
   int_range<1> res;
   range_op_handler handler (code);
-  if (handler && handler.fold_range (res, type, r0, r1))
+  if (handler && handler.fold_range (res, boolean_type_node, r0, r1))
 {
   if (res == range_true ())
return boolean_true_node;
-- 
2.45.0

[COMMITTED] Cleanup prange sanity checks.

2024-05-16 Thread Aldy Hernandez

The pointers_handled_p() code was a temporary sanity check, and not
even a good one, since we have a cleaner way of checking type
mismatches with operand_check_p.  This patch removes all the code, and
adds an explicit type check for relational operators, which are the
main problem in PR114985.

Adding this check makes it clear where the type mismatch is happening
in IPA, even without prange.  I've added code to skip the range
folding if the types don't match what the operator expects.  In order
to reproduce the latent bug, just remove the operand_check_p calls.

Tested on x86-64 and ppc64le with and without prange support.

gcc/ChangeLog:

PR tree-optimization/114985
* gimple-range-op.cc: Remove pointers_handled_p.
* ipa-cp.cc (ipa_value_range_from_jfunc): Skip range folding if
operands don't match.
(propagate_vr_across_jump_function): Same.
* range-op-mixed.h: Remove pointers_handled_p and tweak
operand_check_p.
* range-op-ptr.cc (range_operator::pointers_handled_p): Remove.
(pointer_plus_operator::pointers_handled_p): Remove.
(class operator_pointer_diff): Remove pointers_handled_p.
(operator_pointer_diff::pointers_handled_p): Remove.
(operator_identity::pointers_handled_p): Remove.
(operator_cst::pointers_handled_p): Remove.
(operator_cast::pointers_handled_p): Remove.
(operator_min::pointers_handled_p): Remove.
(operator_max::pointers_handled_p): Remove.
(operator_addr_expr::pointers_handled_p): Remove.
(operator_bitwise_and::pointers_handled_p): Remove.
(operator_bitwise_or::pointers_handled_p): Remove.
(operator_equal::pointers_handled_p): Remove.
(operator_not_equal::pointers_handled_p): Remove.
(operator_lt::pointers_handled_p): Remove.
(operator_le::pointers_handled_p): Remove.
(operator_gt::pointers_handled_p): Remove.
(operator_ge::pointers_handled_p): Remove.
* range-op.cc (TRAP_ON_UNHANDLED_POINTER_OPERATORS): Remove.
(range_op_handler::lhs_op1_relation): Remove pointers_handled_p checks.
(range_op_handler::lhs_op2_relation): Same.
(range_op_handler::op1_op2_relation): Same.
* range-op.h: Remove RO_* declarations.
---
 gcc/gimple-range-op.cc |  24 
 gcc/ipa-cp.cc  |  12 ++
 gcc/range-op-mixed.h   |  38 ++
 gcc/range-op-ptr.cc| 259 -
 gcc/range-op.cc|  43 +--
 gcc/range-op.h |  17 ---
 6 files changed, 25 insertions(+), 368 deletions(-)

diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 55dfbb23ce2..7321342b00d 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -329,19 +329,6 @@ public:
 r = lhs;
 return true;
   }
-  virtual bool pointers_handled_p (range_op_dispatch_type type,
-  unsigned dispatch) const
-  {
-switch (type)
-  {
-  case DISPATCH_FOLD_RANGE:
-   return dispatch == RO_PPP;
-  case DISPATCH_OP1_RANGE:
-   return dispatch == RO_PPP;
-  default:
-   return true;
-  }
-  }
 } op_cfn_pass_through_arg1;
 
 // Implement range operator for CFN_BUILT_IN_SIGNBIT.
@@ -1132,17 +1119,6 @@ public:
 r.set (type, wi::zero (TYPE_PRECISION (type)), max - 2);
 return true;
   }
-  virtual bool pointers_handled_p (range_op_dispatch_type type,
-  unsigned dispatch) const
-  {
-switch (type)
-  {
-  case DISPATCH_FOLD_RANGE:
-   return dispatch == RO_IPI;
-  default:
-   return true;
-  }
-  }
 } op_cfn_strlen;
 
 
diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 5781f50c854..09cab761822 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -1740,6 +1740,11 @@ ipa_value_range_from_jfunc (vrange ,
 
  if (!handler
  || !op_res.supports_type_p (vr_type)
+ /* Sometimes we try to fold comparison operators using a
+pointer type to hold the result instead of a boolean
+type.  Avoid trapping in the sanity check in
+fold_range until this is fixed.  */
+ || !handler.operand_check_p (vr_type, srcvr.type (), op_vr.type 
())
  || !handler.fold_range (op_res, vr_type, srcvr, op_vr))
op_res.set_varying (vr_type);
 
@@ -2547,6 +2552,13 @@ propagate_vr_across_jump_function (cgraph_edge *cs, 
ipa_jump_func *jfunc,
 
  if (!handler
  || !ipa_supports_p (operand_type)
+ /* Sometimes we try to fold comparison operators using a
+pointer type to hold the result instead of a boolean
+type.  Avoid trapping in the sanity check in
+fold_range until this is fixed.  */
+ || !handler.operand_check_p (operand_type,
+  src_lats->m_value_range.m_vr.type (),
+

Re: [PATCH] rs6000: Don't clobber return value when eh_return called [PR114846]

2024-05-16 Thread Kewen.Lin

Hi,

on 2024/5/16 12:08, Andrew Pinski wrote:
> 
> On Thu, May 16, 2024, 4:09 AM Kewen.Lin wrote:
> 
> Hi,
> 
> As the associated test case in PR114846 shows, currently
> with eh_return involved some register restoring for EH
> RETURN DATA in epilogue can clobber the one holding
> the return value.  Following the existing handling in
> some other targets, this patch makes the eh_return expander
> call a new define_insn_and_split eh_return_internal which
> directly calls rs6000_emit_epilogue with epilogue_type
> EPILOGUE_TYPE_EH_RETURN, instead of the previous approach of
> treating a normal return with crtl->calls_eh_return specially.
> 
> Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
> powerpc64le-linux-gnu P9 and P10.
> 
> I'm going to push this next week if no objections.
> 
> 
> 
> Thanks for fixing this for powerpc. I hope my patch for aarch64 gets reviewed 
> soon and it will contain many more testcases. Hopefully someone will fix the 
> arm target too.
> 

Looking forward to that!  Thanks for contributing those new eh-return c-torture
test cases; I just tested all of them on LE, and they all passed. :)

BR,
Kewen

> Thanks,
> Andrew
> 
> 
> 
> BR,
> Kewen
> -
>         PR target/114846
> 
> gcc/ChangeLog:
> 
>         * config/rs6000/rs6000-logue.cc (rs6000_emit_epilogue): As
>         EPILOGUE_TYPE_EH_RETURN would be passed as epilogue_type directly
>         now, adjust the relevant handlings on it.
>         * config/rs6000/rs6000.md (eh_return expander): Append by calling
>         gen_eh_return_internal and emit_barrier.
>         (eh_return_internal): New define_insn_and_split, call function
>         rs6000_emit_epilogue with epilogue type EPILOGUE_TYPE_EH_RETURN.
> 
> gcc/testsuite/ChangeLog:
> 
>         * gcc.target/powerpc/pr114846.c: New test.
> ---
>  gcc/config/rs6000/rs6000-logue.cc           |  7 +++
>  gcc/config/rs6000/rs6000.md                 | 15 +++
>  gcc/testsuite/gcc.target/powerpc/pr114846.c | 20 
>  3 files changed, 38 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr114846.c
> 
> diff --git a/gcc/config/rs6000/rs6000-logue.cc b/gcc/config/rs6000/rs6000-logue.cc
> index 60ba15a8bc3..bd5d56ba002 100644
> --- a/gcc/config/rs6000/rs6000-logue.cc
> +++ b/gcc/config/rs6000/rs6000-logue.cc
> @@ -4308,9 +4308,6 @@ rs6000_emit_epilogue (enum epilogue_type epilogue_type)
> 
>    rs6000_stack_t *info = rs6000_stack_info ();
> 
> -  if (epilogue_type == EPILOGUE_TYPE_NORMAL && crtl->calls_eh_return)
> -    epilogue_type = EPILOGUE_TYPE_EH_RETURN;
> -
>    int strategy = info->savres_strategy;
>    bool using_load_multiple = !!(strategy & REST_MULTIPLE);
>    bool restoring_GPRs_inline = !!(strategy & REST_INLINE_GPRS);
> @@ -4788,7 +4785,9 @@ rs6000_emit_epilogue (enum epilogue_type epilogue_type)
> 
>    /* In the ELFv2 ABI we need to restore all call-saved CR fields from
>       *separate* slots if the routine calls __builtin_eh_return, so
> -     that they can be independently restored by the unwinder.  */
> +     that they can be independently restored by the unwinder.  Since
> +     it is for CR fields restoring, it should be done for any epilogue
> +     types (not EPILOGUE_TYPE_EH_RETURN specific).  */
>    if (DEFAULT_ABI == ABI_ELFv2 && crtl->calls_eh_return)
>      {
>        int i, cr_off = info->ehcr_offset;
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index ac5651d7420..d4120c3b9ce 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -14281,6 +14281,8 @@ (define_expand "eh_return"
>    ""
>  {
>    emit_insn (gen_eh_set_lr (Pmode, operands[0]));
> +  emit_jump_insn (gen_eh_return_internal ());
> +  emit_barrier ();
>    DONE;
>  })
> 
> @@ -14297,6 +14299,19 @@ (define_insn_and_split "@eh_set_lr_"
>    DONE;
>  })
> 
> +(define_insn_and_split "eh_return_internal"
> +  [(eh_return)]
> +  ""
> +  "#"
> +  "epilogue_completed"
> +  [(const_int 0)]
> +{
> +  if (!TARGET_SCHED_PROLOG)
> +    emit_insn (gen_blockage ());
> +  rs6000_emit_epilogue (EPILOGUE_TYPE_EH_RETURN);
> +  DONE;
> +})
> +
>  (define_insn "prefetch"
>    [(prefetch (match_operand 0 "indexed_or_indirect_address" "a")
>              (match_operand:SI 1 "const_int_operand" "n")
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr114846.c b/gcc/testsuite/gcc.target/powerpc/pr114846.c
> new file mode 100644
> index 000..efe2300b73a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr114846.c
> @@ -0,0 +1,20 @@
> +/* { dg-do run } */
> +/* {

RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-16 Thread Li, Pan2

> OK.

Thanks, Richard, for the help and coaching. To double-check: is the OK for this
patch only, or for the whole SAT middle-end patch series?
Thanks again for the review and suggestions.

Pan

-----Original Message-----
From: Richard Biener  
Sent: Thursday, May 16, 2024 4:10 PM
To: Li, Pan2 
Cc: Tamar Christina ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Liu, Hongtao 
Subject: Re: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
scalar int

On Wed, May 15, 2024 at 1:36 PM Li, Pan2  wrote:
>
> > LGTM but you'll need an OK from Richard,
> > Thanks for working on this!
>
> Thanks, Tamar, for the help and coaching; let's wait for Richard.

OK.

Thanks for the patience,
Richard.

> Pan
>
> -----Original Message-----
> From: Tamar Christina 
> Sent: Wednesday, May 15, 2024 5:12 PM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
> Liu, Hongtao 
> Subject: RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
> scalar int
>
> Hi Pan,
>
> Thanks!
>
> > -----Original Message-----
> > From: pan2...@intel.com 
> > Sent: Wednesday, May 15, 2024 3:14 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> > ; richard.guent...@gmail.com;
> > hongtao@intel.com; Pan Li 
> > Subject: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
> > scalar
> > int
> >
> > From: Pan Li 
> >
> > This patch adds the middle-end representation for saturating
> > addition, i.e. the result of the add is set to the maximum value of
> > the type on overflow.  It matches patterns similar to the one below.
> >
> > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> >
> > Taking uint8_t as an example, we have:
> >
> > * SAT_ADD (1, 254)   => 255.
> > * SAT_ADD (1, 255)   => 255.
> > * SAT_ADD (2, 255)   => 255.
> > * SAT_ADD (255, 255) => 255.
> >
> > Given the example below for the unsigned scalar integer type uint64_t:
> >
> > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > }
> >
> > Before this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   _Bool _2;
> >   long unsigned int _3;
> >   long unsigned int _4;
> >   uint64_t _7;
> >   long unsigned int _10;
> >   __complex__ long unsigned int _11;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> >   _1 = REALPART_EXPR <_11>;
> >   _10 = IMAGPART_EXPR <_11>;
> >   _2 = _10 != 0;
> >   _3 = (long unsigned int) _2;
> >   _4 = -_3;
> >   _7 = _1 | _4;
> >   return _7;
> > ;;succ:   EXIT
> >
> > }
> >
> > After this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> >   return _7;
> > ;;succ:   EXIT
> > }
> >
> > The tests below pass with this patch:
> > 1. The riscv full regression tests.
> > 2. The x86 bootstrap tests.
> > 3. The x86 full regression tests.
> >
> >   PR target/51492
> >   PR target/112600
> >
> > gcc/ChangeLog:
> >
> >   * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> >   to the return true switch case(s).
> >   * internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> >   * match.pd: Add unsigned SAT_ADD match(es).
> >   * optabs.def (OPTAB_NL): Remove fixed-point limitation for
> >   us/ssadd.
> >   * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
> >   extern func decl generated in match.pd match.
> >   (match_saturation_arith): New func impl to match the saturation arith.
> >   (math_opts_dom_walker::after_dom_children): Try match saturation
> >   arith when IOR expr.
> >
>
>  LGTM but you'll need an OK from Richard,
>
> Thanks for working on this!
>
> Tamar
>
> > Signed-off-by: Pan Li 
> > ---
> >  gcc/internal-fn.cc|  1 +
> >  gcc/internal-fn.def   |  2 ++
> >  gcc/match.pd  | 51 +++
> >  gcc/optabs.def|  4 +--
> >  gcc/tree-ssa-math-opts.cc | 32 
> >  5 files changed, 88 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index 0a7053c2286..73045ca8c8c 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -4202,6 +4202,7 @@ commutative_binary_fn_p (internal_fn fn)
> >  case IFN_UBSAN_CHECK_MUL:
> >  case IFN_ADD_OVERFLOW:
> >  case IFN_MUL_OVERFLOW:
> > +case IFN_SAT_ADD:
> >  case IFN_VEC_WIDEN_PLUS:
> >  case IFN_VEC_WIDEN_PLUS_LO:
> >  case IFN_VEC_WIDEN_PLUS_HI:
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 848bb9dbff3..25badbb86e5 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -275,6 +275,8 @@
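
For readers who want to try the branchless form from the patch description outside the compiler, here is a minimal sketch in plain C. It only restates the quoted pattern `(x + y) | (-(TYPE)((TYPE)(x + y) < x))`; the helper names `sat_add_u8` and `sat_add_selftest` are made up for illustration and are not part of the patch, and whether GCC turns this into `.SAT_ADD` depends on the target providing the corresponding optab.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned saturating add for uint8_t.  The sum wraps modulo 256, so an
   overflow happened exactly when the wrapped sum is smaller than X.
   Negating that 0/1 flag gives an all-zeros or all-ones mask.
   (Hypothetical helper name, not taken from the patch.)  */
static inline uint8_t
sat_add_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  uint8_t mask = (uint8_t) -(uint8_t) (sum < x);
  return (uint8_t) (sum | mask);
}

/* Same pattern for uint64_t, written as in the patch description.  */
static inline uint64_t
sat_add_u64 (uint64_t x, uint64_t y)
{
  return (x + y) | (- (uint64_t) ((uint64_t) (x + y) < x));
}

/* Hypothetical self-check reproducing the uint8_t cases listed in the
   patch description.  */
void
sat_add_selftest (void)
{
  assert (sat_add_u8 (1, 254) == 255);
  assert (sat_add_u8 (1, 255) == 255);
  assert (sat_add_u8 (2, 255) == 255);
  assert (sat_add_u8 (255, 255) == 255);
}
```

The mask form avoids a branch: when no overflow occurs the mask is zero and the OR leaves the sum unchanged; on overflow the mask is all ones and the OR forces the maximum value.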

[COMMITTED 35/35] ada: Remove obsolete reference in comment

2024-05-16 Thread Marc Poulhiès

From: Eric Botcazou 

gcc/ada/

* exp_ch7.adb (Attach_Object_To_Master_Node): Remove reference to a
transient object in comment.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch7.adb | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
index f9738e115f9..993c13c7318 100644
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -798,10 +798,10 @@ package body Exp_Ch7 is
  return;
   end if;
 
-  --  When the transient object is initialized by an aggregate, the
-  --  attachment must occur after the last aggregate assignment takes
-  --  place. Only then is the object considered initialized. Likewise
-  --  if we have a build-in-place call: we must attach only after it.
+  --  When the object is initialized by an aggregate, the attachment must
+  --  occur after the last aggregate assignment takes place; only then is
+  --  the object considered initialized. Likewise if it is initialized by
+  --  a build-in-place call: we must attach only after the call.
 
   if Ekind (Obj_Id) in E_Constant | E_Variable then
  if Present (Last_Aggregate_Assignment (Obj_Id)) then
-- 
2.43.2

[COMMITTED 34/35] ada: Reset scope of top level object declaration during unnesting

2024-05-16 Thread Marc Poulhiès

When unnesting, the compiler gathers elaboration code and wraps it with
a new dedicated procedure. While doing so, it resets the scopes of
entities that are wrapped to point to this new procedure. This change
also resets the scopes of N_Object_Declaration and
N_Object_Renaming_Declaration nodes only if an elaboration procedure
is needed.
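
The deferral described above can be illustrated with a small, language-neutral sketch, written here in C rather than Ada. Everything in it is a toy stand-in under stated assumptions: `struct entity`, `struct unnest_state`, `visit_object_decl` and `finish_traversal` are hypothetical names, not GNAT data structures. It only shows the idea of remembering object declarations until it is known whether a wrapper procedure exists.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 16

/* Toy stand-in for an entity with a Scope link (hypothetical).  */
struct entity
{
  const char *name;
  struct entity *scope;
};

/* Traversal state: the wrapper procedure, if one has been created so
   far, and the declarations seen before it existed.  */
struct unnest_state
{
  struct entity *elab_proc;   /* NULL until a wrapper is needed */
  struct entity *pending[MAX_PENDING];
  size_t n_pending;
};

/* Called for each top-level object declaration.  If the wrapper already
   exists, reset the scope right away; otherwise remember the entity.  */
static inline void
visit_object_decl (struct unnest_state *st, struct entity *obj)
{
  if (st->elab_proc != NULL)
    obj->scope = st->elab_proc;
  else if (st->n_pending < MAX_PENDING)
    st->pending[st->n_pending++] = obj;
}

/* Called once the traversal is done.  Only if a wrapper was created are
   the remembered declarations moved into its scope.  */
static inline void
finish_traversal (struct unnest_state *st)
{
  if (st->elab_proc == NULL)
    return;
  while (st->n_pending > 0)
    st->pending[--st->n_pending]->scope = st->elab_proc;
}
```

The point of the deferral is that an object declaration met before any subprogram body cannot know yet whether a wrapper will be created, so its scope is left alone unless the wrapper turns out to be needed.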

gcc/ada/

* exp_ch7.adb (Reset_Scopes_To_Block_Elab_Proc): Also reset scope
for object declarations.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch7.adb | 31 +++
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
index 6d76572f405..f9738e115f9 100644
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -3646,9 +3646,10 @@ package body Exp_Ch7 is
   --  unnesting actions, which depend on proper setting of the Scope links
   --  to determine the nesting level of each subprogram.
 
-  ---
-  --  Find_Local_Scope --
-  ---
+  --
+  --  Reset_Scopes_To_Block_Elab_Proc --
+  --
+  Maybe_Reset_Scopes_For_Decl : constant Elist_Id := New_Elmt_List;
 
   procedure Reset_Scopes_To_Block_Elab_Proc (L : List_Id) is
  Id   : Entity_Id;
@@ -3707,7 +3708,8 @@ package body Exp_Ch7 is
  Next (Node);
   end loop;
 
-   --  Reset the Scope of a subprogram occurring at the top level
+   --  Reset the Scope of a subprogram and object declaration
+   --  occurring at the top level
 
when N_Subprogram_Body =>
   Id := Defining_Entity (Stat);
@@ -3715,12 +3717,33 @@ package body Exp_Ch7 is
   Set_Block_Elab_Proc;
   Set_Scope (Id, Block_Elab_Proc);
 
+   when N_Object_Declaration
+ | N_Object_Renaming_Declaration =>
+  Id := Defining_Entity (Stat);
+  if No (Block_Elab_Proc) then
+ Append_Elmt (Id, Maybe_Reset_Scopes_For_Decl);
+  else
+ Set_Scope (Id, Block_Elab_Proc);
+  end if;
+
when others =>
   null;
 end case;
 
 Next (Stat);
  end loop;
+
+ --  If we are creating an Elab procedure, move all the gathered
+ --  declarations in its scope.
+
+ if Present (Block_Elab_Proc) then
+while not Is_Empty_Elmt_List (Maybe_Reset_Scopes_For_Decl) loop
+   Set_Scope
+ (Elists.Node
+   (Last_Elmt (Maybe_Reset_Scopes_For_Decl)), Block_Elab_Proc);
+   Remove_Last_Elmt (Maybe_Reset_Scopes_For_Decl);
+end loop;
+ end if;
   end Reset_Scopes_To_Block_Elab_Proc;
 
   --  Local variables
-- 
2.43.2

[COMMITTED 33/35] ada: Redundant validity checks

2024-05-16 Thread Marc Poulhiès

From: Steve Baird 

In some cases with validity checking enabled via the -gnatVa option,
the compiler generates validity checks that can (obviously) never fail.
These include validity checks for (some) static expressions, and consecutive
identical checks generated for a single read of an object.

gcc/ada/

* checks.adb (Expr_Known_Valid): Return True for a static expression.
* exp_util.adb (Adjust_Condition): No validity check needed for a
condition if it is an expression for which a validity check has
already been generated.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/checks.adb   | 3 +++
 gcc/ada/exp_util.adb | 5 +
 2 files changed, 8 insertions(+)

diff --git a/gcc/ada/checks.adb b/gcc/ada/checks.adb
index 6af392eeda8..bada3dffcbf 100644
--- a/gcc/ada/checks.adb
+++ b/gcc/ada/checks.adb
@@ -6839,6 +6839,9 @@ package body Checks is
   then
  return True;
 
+  elsif Is_Static_Expression (Expr) then
+ return True;
+
   --  If the expression is the value of an object that is known to be
   --  valid, then clearly the expression value itself is valid.
 
diff --git a/gcc/ada/exp_util.adb b/gcc/ada/exp_util.adb
index 057cf3ebc48..b71f7739481 100644
--- a/gcc/ada/exp_util.adb
+++ b/gcc/ada/exp_util.adb
@@ -416,6 +416,11 @@ package body Exp_Util is
  if Validity_Checks_On
and then
  (Validity_Check_Tests or else Is_Hardbool_Type (T))
+
+   --  no check needed here if validity has already been checked
+   and then not
+ (Validity_Check_Operands and then
+   (Nkind (N) in N_Op or else Nkind (Parent (N)) in N_Op))
  then
 Ensure_Valid (N);
  end if;
-- 
2.43.2

[COMMITTED 32/35] ada: Exception on Indefinite_Vector aggregate with loop_parameter_specification

2024-05-16 Thread Marc Poulhiès

From: Gary Dismukes 

Constraint_Error is raised on evaluation of a container aggregate with
a loop_parameter_specification for the type Indefinite_Vector. This
happens due to the Aggregate aspect for type Indefinite_Vector specifying
the Empty_Vector constant for the type's Empty operation rather than
using the type's primitive Empty function. This problem shows up as
a recent regression relative to earlier compilers, evidently due to
recent fixes in the container aggregate area, which uncovered this
issue of the wrong specification in Ada.Containers.Indefinite_Vectors.
The compiler incorrectly initializes the aggregate object using the
Empty_Vector constant rather than invoking the New_Vector function
to allocate the vector object with the appropriate number of elements,
and subsequent calls to Replace_Element fail because the vector object
is empty.

In addition to correcting the Indefinite_Vectors generic package,
checking is added to give an error for an attempt to specify the
Empty operation as a constant rather than a function. (Also note
that another AdaCore package that needs a similar correction is
the VSS.Vector_Strings package.)

gcc/ada/

* libgnat/a-coinve.ads (type Vector): In the Aggregate aspect for
this type, the Empty operation is changed to denote the Empty
function, rather than the Empty_Vector constant.
* exp_aggr.adb (Expand_Container_Aggregate): Remove code for
handling the case where the Empty_Subp denotes a constant object,
which should never happen (and add an assertion that Empty_Subp
must denote a function).
* sem_ch13.adb (Valid_Empty): No longer allow the entity to be an
E_Constant, and require the (optional) parameter of an Empty
function to be of a signed integer type (rather than any integer
type).

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb | 24 +---
 gcc/ada/libgnat/a-coinve.ads |  2 +-
 gcc/ada/sem_ch13.adb |  5 +
 3 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index f04dba719d9..5d2b334722a 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -7119,10 +7119,12 @@ package body Exp_Aggr is
  Append (Init_Stat, Aggr_Code);
 
   --  The container will grow dynamically. Create a declaration for
-  --  the object, and initialize it either from a call to the Empty
-  --  function, or from the Empty constant.
+  --  the object, and initialize it from a call to the parameterless
+  --  Empty function.
 
   else
+ pragma Assert (Ekind (Entity (Empty_Subp)) = E_Function);
+
  Decl :=
Make_Object_Declaration (Loc,
  Defining_Identifier => Temp,
@@ -7130,20 +7132,12 @@ package body Exp_Aggr is
 
  Insert_Action (N, Decl);
 
- --  The Empty entity is either a parameterless function, or
- --  a constant.
-
- if Ekind (Entity (Empty_Subp)) = E_Function then
-Init_Stat := Make_Assignment_Statement (Loc,
-  Name => New_Occurrence_Of (Temp, Loc),
-  Expression => Make_Function_Call (Loc,
-Name => New_Occurrence_Of (Entity (Empty_Subp), Loc)));
+ --  The Empty entity is a parameterless function
 
- else
-Init_Stat := Make_Assignment_Statement (Loc,
-  Name => New_Occurrence_Of (Temp, Loc),
-  Expression => New_Occurrence_Of (Entity (Empty_Subp), Loc));
- end if;
+ Init_Stat := Make_Assignment_Statement (Loc,
+   Name => New_Occurrence_Of (Temp, Loc),
+   Expression => Make_Function_Call (Loc,
+ Name => New_Occurrence_Of (Entity (Empty_Subp), Loc)));
 
  Append (Init_Stat, Aggr_Code);
   end if;
diff --git a/gcc/ada/libgnat/a-coinve.ads b/gcc/ada/libgnat/a-coinve.ads
index 138ec3641c3..c51ec8aa06d 100644
--- a/gcc/ada/libgnat/a-coinve.ads
+++ b/gcc/ada/libgnat/a-coinve.ads
@@ -63,7 +63,7 @@ is
  Variable_Indexing => Reference,
  Default_Iterator  => Iterate,
  Iterator_Element  => Element_Type,
- Aggregate => (Empty  => Empty_Vector,
+ Aggregate => (Empty  => Empty,
Add_Unnamed=> Append,
New_Indexed=> New_Vector,
Assign_Indexed => Replace_Element);
diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
index 00392ae88eb..13bf93ca548 100644
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -16527,13 +16527,10 @@ package body Sem_Ch13 is
  if Etype (E) /= Typ or else Scope (E) /= Scope (Typ) then
 return False;
 
- elsif Ekind (E) = E_Constant then
-return True;
-
  elsif Ekind (E) = E_Function then
 return No (First_Formal (E))
   or else
-(Is_Integer_Type

[COMMITTED 29/35] ada: Fix missing length checks with case expressions

2024-05-16 Thread Marc Poulhiès

From: Ronan Desplanques 

This fixes an issue where length checks were not generated when the
right-hand side of an assignment involved a case expression.

gcc/ada/

* sem_res.adb (Resolve_Case_Expression): Add length check
insertion.
* exp_ch4.adb (Expand_N_Case_Expression): Add handling of nodes
known to raise Constraint_Error.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch4.adb | 18 ++
 gcc/ada/sem_res.adb |  3 +++
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb
index 7a2003691ec..448cd5c82b6 100644
--- a/gcc/ada/exp_ch4.adb
+++ b/gcc/ada/exp_ch4.adb
@@ -5098,10 +5098,20 @@ package body Exp_Ch4 is
 
 else
if not Is_Copy_Type (Typ) then
-  Alt_Expr :=
-Make_Attribute_Reference (Alt_Loc,
-  Prefix => Relocate_Node (Alt_Expr),
-  Attribute_Name => Name_Unrestricted_Access);
+  --  It's possible that a call to Apply_Length_Check in
+  --  Resolve_Case_Expression rewrote the dependent expression
+  --  into a N_Raise_Constraint_Error. If that's the case, we
+  --  don't create a reference to Unrestricted_Access, but we
+  --  update the type of the N_Raise_Constraint_Error node.
+
+  if Nkind (Alt_Expr) in N_Raise_Constraint_Error then
+ Set_Etype (Alt_Expr, Target_Typ);
+  else
+ Alt_Expr :=
+   Make_Attribute_Reference (Alt_Loc,
+ Prefix => Relocate_Node (Alt_Expr),
+ Attribute_Name => Name_Unrestricted_Access);
+  end if;
end if;
 
LHS := New_Occurrence_Of (Target, Loc);
diff --git a/gcc/ada/sem_res.adb b/gcc/ada/sem_res.adb
index 85795ba3a05..d2eca7c5459 100644
--- a/gcc/ada/sem_res.adb
+++ b/gcc/ada/sem_res.adb
@@ -7438,6 +7438,9 @@ package body Sem_Res is
  if Is_Scalar_Type (Alt_Typ) and then Alt_Typ /= Typ then
 Rewrite (Alt_Expr, Convert_To (Typ, Alt_Expr));
 Analyze_And_Resolve (Alt_Expr, Typ);
+
+ elsif Is_Array_Type (Typ) then
+Apply_Length_Check (Alt_Expr, Typ);
  end if;
 
  Next (Alt);
-- 
2.43.2

[COMMITTED 30/35] ada: Fix reference to RM clause in comment

2024-05-16 Thread Marc Poulhiès

From: Ronan Desplanques 

gcc/ada/

* sem_util.ads (Check_Function_Writable_Actuals): Fix comment.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_util.ads | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/sem_util.ads b/gcc/ada/sem_util.ads
index 527b1075c3f..99c60ddf708 100644
--- a/gcc/ada/sem_util.ads
+++ b/gcc/ada/sem_util.ads
@@ -373,7 +373,7 @@ package Sem_Util is
--  call C2 (not including the construct N itself), there is no other name
--  anywhere within a direct constituent of the construct C other than
--  the one containing C2, that is known to refer to the same object (RM
-   --  6.4.1(6.17/3)).
+   --  6.4.1(6.18-6.19)).
 
procedure Check_Implicit_Dereference (N : Node_Id; Typ : Entity_Id);
--  AI05-139-2: Accessors and iterators for containers. This procedure
-- 
2.43.2

[COMMITTED 22/35] ada: No need to follow New_Occurrence_Of with Set_Etype

2024-05-16 Thread Marc Poulhiès

From: Piotr Trojanek 

Routine New_Occurrence_Of itself sets the Etype of its result; there is
no need to set it explicitly afterwards.

Code cleanup related to fix for attribute 'Old; semantics is unaffected.

gcc/ada/

* exp_ch13.adb (Expand_N_Free_Statement): After analysis, the
new temporary has the type of its Object_Definition and the new
occurrence of this temporary has this type as well; simplify.
* sem_util.adb
(Indirect_Temp_Value): Remove redundant call to Set_Etype;
simplify.
(Is_Access_Type_For_Indirect_Temp): Add missing body header.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch13.adb |  9 ++---
 gcc/ada/sem_util.adb | 11 +++
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/gcc/ada/exp_ch13.adb b/gcc/ada/exp_ch13.adb
index 2d5ee9b6e80..af8c925586c 100644
--- a/gcc/ada/exp_ch13.adb
+++ b/gcc/ada/exp_ch13.adb
@@ -358,21 +358,16 @@ package body Exp_Ch13 is
  declare
 Expr_Typ : constant Entity_Id  := Etype (Expr);
 Loc  : constant Source_Ptr := Sloc (N);
-New_Expr : Node_Id;
-Temp_Id  : Entity_Id;
+Temp_Id  : constant Entity_Id  := Make_Temporary (Loc, 'T');
 
  begin
-Temp_Id := Make_Temporary (Loc, 'T');
 Insert_Action (N,
   Make_Object_Declaration (Loc,
 Defining_Identifier => Temp_Id,
 Object_Definition   => New_Occurrence_Of (Expr_Typ, Loc),
 Expression  => Relocate_Node (Expr)));
 
-New_Expr := New_Occurrence_Of (Temp_Id, Loc);
-Set_Etype (New_Expr, Expr_Typ);
-
-Set_Expression (N, New_Expr);
+Set_Expression (N, New_Occurrence_Of (Temp_Id, Loc));
  end;
   end if;
 
diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index 766cabfc109..5ebb1319de7 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -31081,8 +31081,7 @@ package body Sem_Util is
  begin
 if Is_Anonymous_Access_Type (Typ) then
--  No indirection in this case; just evaluate the temp.
-   Result := New_Occurrence_Of (Temp, Loc);
-   Set_Etype (Result, Etype (Temp));
+   return New_Occurrence_Of (Temp, Loc);
 
 else
Result := Make_Explicit_Dereference (Loc,
@@ -31101,11 +31100,15 @@ package body Sem_Util is
 
   Set_Etype (Result, Typ);
end if;
-end if;
 
-return Result;
+   return Result;
+end if;
  end Indirect_Temp_Value;
 
+ --
+ -- Is_Access_Type_For_Indirect_Temp --
+ --
+
  function Is_Access_Type_For_Indirect_Temp
(T : Entity_Id) return Boolean is
  begin
-- 
2.43.2

[COMMITTED 28/35] ada: Fix standalone Windows builds of adaint.c

2024-05-16 Thread Marc Poulhiès

From: Sebastian Poeplau 

Define PATH_SEPARATOR and HOST_EXECUTABLE_SUFFIX in standalone MinGW
builds; the definitions normally come from GCC, and the defaults don't
work for native Windows.
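
As a rough illustration of why the two macros matter, the sketch below shows the kind of code that depends on them. It is not taken from adaint.c: the fall-back definitions are an assumption for building the snippet on its own, and `count_path_entries` and `add_exe_suffix` are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed fall-backs so the sketch compiles outside the GCC tree; in a
   real build these come from the host configuration or, after this
   patch, from the STANDALONE block in adaint.c.  */
#ifndef PATH_SEPARATOR
# ifdef _WIN32
#  define PATH_SEPARATOR ';'
# else
#  define PATH_SEPARATOR ':'
# endif
#endif

#ifndef HOST_EXECUTABLE_SUFFIX
# ifdef _WIN32
#  define HOST_EXECUTABLE_SUFFIX ".exe"
# else
#  define HOST_EXECUTABLE_SUFFIX ""
# endif
#endif

/* Hypothetical helper: number of entries in a search path such as the
   value of PATH.  An empty string has no entries.  */
static inline size_t
count_path_entries (const char *path)
{
  size_t n;

  if (*path == '\0')
    return 0;
  for (n = 1; *path != '\0'; path++)
    if (*path == PATH_SEPARATOR)
      n++;
  return n;
}

/* Hypothetical helper: write NAME followed by the host executable
   suffix into BUF.  Returns 0 on success, -1 if BUF is too small.  */
static inline int
add_exe_suffix (char *buf, size_t size, const char *name)
{
  int len = snprintf (buf, size, "%s%s", name, HOST_EXECUTABLE_SUFFIX);
  return (len >= 0 && (size_t) len < size) ? 0 : -1;
}
```

With a ':' separator, a Windows path list such as `C:\bin;D:\tools` would be split at the drive-letter colons, which is the sort of breakage the new defines are meant to avoid.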

gcc/ada/

* adaint.c: New defines for STANDALONE mode.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/adaint.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/ada/adaint.c b/gcc/ada/adaint.c
index 74aa3c4128e..f26d69a1a2a 100644
--- a/gcc/ada/adaint.c
+++ b/gcc/ada/adaint.c
@@ -242,6 +242,13 @@ UINT __gnat_current_ccs_encoding;
 #undef DIR_SEPARATOR
 #define DIR_SEPARATOR '\\'
 
+#ifdef STANDALONE
+#undef PATH_SEPARATOR
+#define PATH_SEPARATOR ';'
+#undef HOST_EXECUTABLE_SUFFIX
+#define HOST_EXECUTABLE_SUFFIX ".exe"
+#endif
+
 #else
 #include 
 #include 
-- 
2.43.2

[COMMITTED 24/35] ada: Propagate Program_Error from failed finalization of collection

2024-05-16 Thread Marc Poulhiès

From: Eric Botcazou 

This aligns finalization collections with finalization masters when it comes
to propagating an exception raised by the finalization of a specific object,
by always propagating Program_Error instead of the aforementioned exception.

gcc/ada/

* libgnat/s-finpri.adb (Raise_From_Controlled_Operation): New
declaration of imported procedure moved from...
(Finalize_Master): ...there.
(Finalize): Call Raise_From_Controlled_Operation instead of
Reraise_Occurrence to propagate the exception, if any.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/s-finpri.adb | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/ada/libgnat/s-finpri.adb b/gcc/ada/libgnat/s-finpri.adb
index bd70e582de3..89f5f2952e4 100644
--- a/gcc/ada/libgnat/s-finpri.adb
+++ b/gcc/ada/libgnat/s-finpri.adb
@@ -37,6 +37,10 @@ with System.Soft_Links;use System.Soft_Links;
 
 package body System.Finalization_Primitives is
 
+   procedure Raise_From_Controlled_Operation (X : Exception_Occurrence);
+   pragma Import (Ada, Raise_From_Controlled_Operation,
+  "__gnat_raise_from_controlled_operation");
+
function To_Collection_Node_Ptr is
  new Ada.Unchecked_Conversion (Address, Collection_Node_Ptr);
 
@@ -297,7 +301,7 @@ package body System.Finalization_Primitives is
   --  If one of the finalization actions raised an exception, reraise it
 
   if Finalization_Exception_Raised then
- Reraise_Occurrence (Exc_Occur);
+ Raise_From_Controlled_Operation (Exc_Occur);
   end if;
end Finalize;
 
@@ -306,12 +310,8 @@ package body System.Finalization_Primitives is
-
 
procedure Finalize_Master (Master : in out Finalization_Master) is
-  procedure Raise_From_Controlled_Operation (X : Exception_Occurrence);
-  pragma Import (Ada, Raise_From_Controlled_Operation,
- "__gnat_raise_from_controlled_operation");
-
-  Finalization_Exception_Raised : Boolean := False;
   Exc_Occur : Exception_Occurrence;
+  Finalization_Exception_Raised : Boolean := False;
   Node  : Master_Node_Ptr;
 
begin
-- 
2.43.2

[COMMITTED 18/35] ada: Fixup one more pattern of broken scope information

2024-05-16 Thread Marc Poulhiès

When an array's initialization contains an `others =>` clause with an
expression that involves finalization, the resulting scope information
is incorrect and can cause crashes with backends such as gnat-llvm that
also use unnesting. The observable symptom is a nested object
declaration (created by the compiler) within a loop wrapped in a
procedure created by the unnester that has incoherent scope information:
its Scope field points to the scope of the procedure (1 level too high),
while the object is contained in the entity chain of some entity nested
in the procedure (correct).

The correct solution would be to fix the scope information when it is
created, but this proved too large a task, with many interactions
with existing code.

This change adds another pattern to the Fixup_Inner_Scopes procedure to
detect the problematic case and fix the scope after the fact.

gcc/ada/

* exp_ch7.adb (Unnest_Loop::Fixup_Inner_Scopes): Detect a new
problematic pattern and fix up the scope accordingly.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch7.adb | 66 ++---
 1 file changed, 56 insertions(+), 10 deletions(-)

diff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
index 25a7c0b2b46..6d76572f405 100644
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -8809,8 +8809,11 @@ package body Exp_Ch7 is
 
procedure Unnest_Loop (Loop_Stmt : Node_Id) is
 
-  procedure Fixup_Inner_Scopes (Loop_Stmt : Node_Id);
-  --  The loops created by the compiler for array aggregates can have
+  procedure Fixup_Inner_Scopes (Loop_Or_Block : Node_Id);
+  --  This procedure fixes the scope for 2 identified cases of incorrect
+  --  scope information.
+  --
+  --  1) The loops created by the compiler for array aggregates can have
   --  nested finalization procedure when the type of the array components
   --  needs finalization. It has the following form:
 
@@ -8825,7 +8828,7 @@ package body Exp_Ch7 is
   --obj (J4b) := ...;
 
   --  When the compiler creates the N_Block_Statement, it sets its scope to
-  --  the upper scope (the one containing the loop).
+  --  the outer scope (the one containing the loop).
 
   --  The Unnest_Loop procedure moves the N_Loop_Statement inside a new
   --  procedure and correctly sets the scopes for both the new procedure
@@ -8833,25 +8836,68 @@ package body Exp_Ch7 is
   --  leaves the Tree in an incoherent state (i.e. the inner procedure must
   --  have its enclosing procedure in its scope ancestries).
 
-  --  This procedure fixes the scope links.
+  --  2) The second case happens when an object declaration is created
+  --  within a loop used to initialize the 'others' components of an
+  --  aggregate that is nested within a transient scope. When the transient
+  --  scope is removed, the object scope is set to the outer scope. For
+  --  example:
+
+  --  package pack
+  --   ...
+  -- L98s : for J90s in 2 .. 19 loop
+  --B101s : declare
+  --   R92s : aliased some_type;
+  --   ...
+
+  --  The loop L98s was initially wrapped in a transient scope B72s and
+  --  R92s was nested within it. Then the transient scope is removed and
+  --  the scope of R92s is set to 'pack'. And finally, when the unnester
+  --  moves the loop body in a new procedure, R92s's scope is still left
+  --  unchanged.
+
+  --  This procedure finds the two previous patterns and fixes the scope
+  --  information.
 
   --  Another (better) fix would be to have the block scope set to be the
   --  loop entity earlier (when the block is created or when the loop gets
   --  an actual entity set). But unfortunately this proved harder to
   --  implement ???
 
-  procedure Fixup_Inner_Scopes (Loop_Stmt : Node_Id) is
- Stmt  : Node_Id:= First (Statements (Loop_Stmt));
- Loop_Stmt_Ent : constant Entity_Id := Entity (Identifier (Loop_Stmt));
- Ent_To_Fix: Entity_Id;
+  procedure Fixup_Inner_Scopes (Loop_Or_Block : Node_Id) is
+ Stmt  : Node_Id;
+ Loop_Or_Block_Ent : Entity_Id;
+ Ent_To_Fix: Entity_Id;
+ Decl  : Node_Id := Empty;
   begin
+ pragma Assert (Nkind (Loop_Or_Block) in
+   N_Loop_Statement | N_Block_Statement);
+
+ Loop_Or_Block_Ent := Entity (Identifier (Loop_Or_Block));
+ if Nkind (Loop_Or_Block) = N_Loop_Statement then
+Stmt := First (Statements (Loop_Or_Block));
+ else -- N_Block_Statement
+Stmt := First
+  (Statements (Handled_Statement_Sequence (Loop_Or_Block)));
+Decl := First (Declarations (Loop_Or_Block));
+ end if;
+
+ --  Fix scopes for any object declaration found in the block
+ while Present (Decl) loop
+

[COMMITTED 27/35] ada: Avoid checking parameters of protected procedures

2024-05-16 Thread Marc Poulhiès

From: Viljar Indus 

The compiler triggers warnings on generated protected procedures
if the procedure does not have an explicit spec. Instead, when the
spec is not present, check whether the body was created for a
protected procedure.

gcc/ada/

* sem_ch6.adb (Analyze_Subprogram_Body_Helper):
If the spec is not present for a subprogram body then
check if the body definition was created for a protected
procedure.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch6.adb | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/sem_ch6.adb b/gcc/ada/sem_ch6.adb
index 0a8030cb923..ca40b5479e0 100644
--- a/gcc/ada/sem_ch6.adb
+++ b/gcc/ada/sem_ch6.adb
@@ -4971,8 +4971,11 @@ package body Sem_Ch6 is
  --  Skip the check for subprograms generated for protected subprograms
  --  because it is also done for the protected subprograms themselves.
 
- elsif Present (Spec_Id)
-   and then Present (Protected_Subprogram (Spec_Id))
+ elsif (Present (Spec_Id)
+ and then Present (Protected_Subprogram (Spec_Id)))
+   or else
+ (Acts_As_Spec (N)
+   and then Present (Protected_Subprogram (Body_Id)))
  then
 null;
 
-- 
2.43.2

[COMMITTED 26/35] ada: Ignore ghost nodes in call graph information for dispatching calls

2024-05-16 Thread Marc Poulhiès

From: Piotr Trojanek 

When emitting call graph information, we already skipped calls to
ignored ghost entities, but this code was causing crashes (in production
builds) and assertion failures (in development builds), because the
ignored ghost entities are not fully decorated, e.g. when they come from
instances of generic units with default subprograms.

With this patch we skip call graph information for ignored ghost
entities when they are registered, both as explicit calls and as
tagged types that will come with internally generated dispatching
subprograms.

gcc/ada/

* exp_cg.adb (Generate_CG_Output): Remove code for ignored ghost
entities that applied to subprogram calls.
(Register_CG_Node): Skip ignored ghost entities, both calls
and tagged types, when they are registered.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_cg.adb | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/gcc/ada/exp_cg.adb b/gcc/ada/exp_cg.adb
index addf1cae32a..91a6d40a6fa 100644
--- a/gcc/ada/exp_cg.adb
+++ b/gcc/ada/exp_cg.adb
@@ -125,14 +125,7 @@ package body Exp_CG is
   for J in Call_Graph_Nodes.First .. Call_Graph_Nodes.Last loop
  N := Call_Graph_Nodes.Table (J);
 
- --  No action needed for subprogram calls removed by the expander
- --  (for example, calls to ignored ghost entities).
-
- if Nkind (N) = N_Null_Statement then
-pragma Assert (Nkind (Original_Node (N)) in N_Subprogram_Call);
-null;
-
- elsif Nkind (N) in N_Subprogram_Call then
+ if Nkind (N) in N_Subprogram_Call then
 Write_Call_Info (N);
 
  else pragma Assert (Nkind (N) = N_Defining_Identifier);
@@ -358,7 +351,13 @@ package body Exp_CG is
 
procedure Register_CG_Node (N : Node_Id) is
begin
-  if Nkind (N) in N_Subprogram_Call then
+  --  Skip ignored ghost calls that will be removed by the expander
+
+  if Is_Ignored_Ghost_Node (N) then
+ null;
+
+  elsif Nkind (N) in N_Subprogram_Call then
+
  if Current_Scope = Main_Unit_Entity
or else Entity_Is_In_Main_Unit (Current_Scope)
  then
-- 
2.43.2

[COMMITTED 19/35] ada: Minor performance improvement for dynamically-allocated controlled objects

2024-05-16 Thread Marc Poulhiès

From: Eric Botcazou 

The values returned by Header_Alignment and Header_Size are known at compile
time and powers of two on almost all platforms, so inlining them by means of
an expression function improves the object code generated for alignment and
size calculations involving them.
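
As a rough illustration of the idea, the sketch below declares two
expression functions in a package spec, so that their bodies are visible
at every call site and the attribute values can be folded into constants.
Header_Sketch and Node are hypothetical stand-ins; the real functions
operate on Collection_Node in System.Finalization_Primitives.

```ada
with System.Storage_Elements; use System.Storage_Elements;

package Header_Sketch is

   --  Hypothetical stand-in for Collection_Node (assumption: the real
   --  type is declared in the private part of s-finpri.ads).
   type Node;
   type Node_Ptr is access all Node;
   type Node is record
      Prev : Node_Ptr := null;
      Next : Node_Ptr := null;
   end record;

   --  Expression functions: no separate body is needed, and the compiler
   --  sees the returned expression wherever the function is called.
   function Header_Alignment return Storage_Count is (Node'Alignment);

   --  'Object_Size is a GNAT-specific attribute expressed in bits, hence
   --  the division by System.Storage_Unit.
   function Header_Size return Storage_Count is
     (Node'Object_Size / System.Storage_Unit);

end Header_Sketch;
```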

gcc/ada/

* libgnat/s-finpri.ads: Add use type clause for Storage_Offset.
(Header_Alignment): Turn into an expression function.
(Header_Size): Likewise.
* libgnat/s-finpri.adb: Remove use type clause for Storage_Offset.
(Header_Alignment): Delete.
(Header_Size): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/s-finpri.adb | 20 
 gcc/ada/libgnat/s-finpri.ads |  8 ++--
 2 files changed, 6 insertions(+), 22 deletions(-)

diff --git a/gcc/ada/libgnat/s-finpri.adb b/gcc/ada/libgnat/s-finpri.adb
index 5bd8eeaea22..bd70e582de3 100644
--- a/gcc/ada/libgnat/s-finpri.adb
+++ b/gcc/ada/libgnat/s-finpri.adb
@@ -37,8 +37,6 @@ with System.Soft_Links;use System.Soft_Links;
 
 package body System.Finalization_Primitives is
 
-   use type System.Storage_Elements.Storage_Offset;
-
function To_Collection_Node_Ptr is
  new Ada.Unchecked_Conversion (Address, Collection_Node_Ptr);
 
@@ -389,24 +387,6 @@ package body System.Finalization_Primitives is
   end if;
end Finalize_Object;
 
-   ----------------------
-   -- Header_Alignment --
-   ----------------------
-
-   function Header_Alignment return System.Storage_Elements.Storage_Count is
-   begin
-  return Collection_Node'Alignment;
-   end Header_Alignment;
-
-   -----------------
-   -- Header_Size --
-   -----------------
-
-   function Header_Size return System.Storage_Elements.Storage_Count is
-   begin
-  return Collection_Node'Object_Size / Storage_Unit;
-   end Header_Size;
-

-- Initialize --

diff --git a/gcc/ada/libgnat/s-finpri.ads b/gcc/ada/libgnat/s-finpri.ads
index 468aa584958..b0b662ca39c 100644
--- a/gcc/ada/libgnat/s-finpri.ads
+++ b/gcc/ada/libgnat/s-finpri.ads
@@ -39,6 +39,8 @@ with System.Storage_Elements;
 
 package System.Finalization_Primitives with Preelaborate is
 
+   use type System.Storage_Elements.Storage_Offset;
+
type Finalize_Address_Ptr is access procedure (Obj : System.Address);
--  Values of this type denote finalization procedures associated with
--  objects that have controlled parts. For convenience, such objects
@@ -168,10 +170,12 @@ package System.Finalization_Primitives with Preelaborate is
--  Calls to the procedure with an object that has already been detached
--  have no effects.
 
-   function Header_Alignment return System.Storage_Elements.Storage_Count;
+   function Header_Alignment return System.Storage_Elements.Storage_Count is
+ (Collection_Node'Alignment);
--  Return the alignment of type Collection_Node as Storage_Count
 
-   function Header_Size return System.Storage_Elements.Storage_Count;
+   function Header_Size return System.Storage_Elements.Storage_Count is
+ (Collection_Node'Object_Size / Storage_Unit);
--  Return the object size of type Collection_Node as Storage_Count
 
 private
-- 
2.43.2

[COMMITTED 25/35] ada: Fix reason code for length check

2024-05-16 Thread Marc Poulhiès

From: Ronan Desplanques 

This patch fixes the reason code used by Apply_Selected_Length_Checks,
which was wrong in some cases when the check could be determined to
always fail at compile time.

gcc/ada/

* checks.adb (Apply_Selected_Length_Checks): Fix reason code.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/checks.adb | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/gcc/ada/checks.adb b/gcc/ada/checks.adb
index 4e3eb502706..6af392eeda8 100644
--- a/gcc/ada/checks.adb
+++ b/gcc/ada/checks.adb
@@ -322,7 +322,8 @@ package body Checks is
--  that the access value is non-null, since the checks do not
--  not apply to null access values.
 
-   procedure Install_Static_Check (R_Cno : Node_Id; Loc : Source_Ptr);
+   procedure Install_Static_Check
+ (R_Cno : Node_Id; Loc : Source_Ptr; Reason : RT_Exception_Code);
--  Called by Apply_{Length,Range}_Checks to rewrite the tree with the
--  Constraint_Error node.
 
@@ -3001,7 +3002,7 @@ package body Checks is
 Insert_Action (Insert_Node, R_Cno);
 
  else
-Install_Static_Check (R_Cno, Loc);
+Install_Static_Check (R_Cno, Loc, CE_Range_Check_Failed);
  end if;
   end loop;
end Apply_Range_Check;
@@ -3469,7 +3470,7 @@ package body Checks is
 end if;
 
  else
-Install_Static_Check (R_Cno, Loc);
+Install_Static_Check (R_Cno, Loc, CE_Length_Check_Failed);
  end if;
   end loop;
end Apply_Selected_Length_Checks;
@@ -8692,14 +8693,16 @@ package body Checks is
-- Install_Static_Check --
--
 
-   procedure Install_Static_Check (R_Cno : Node_Id; Loc : Source_Ptr) is
+   procedure Install_Static_Check
+ (R_Cno : Node_Id; Loc : Source_Ptr; Reason : RT_Exception_Code)
+   is
   Stat : constant Boolean   := Is_OK_Static_Expression (R_Cno);
   Typ  : constant Entity_Id := Etype (R_Cno);
 
begin
   Rewrite (R_Cno,
 Make_Raise_Constraint_Error (Loc,
-  Reason => CE_Range_Check_Failed));
+  Reason => Reason));
   Set_Analyzed (R_Cno);
   Set_Etype (R_Cno, Typ);
   Set_Raises_Constraint_Error (R_Cno);
-- 
2.43.2

[COMMITTED 11/35] ada: Follow up fixes for Put_Image/streaming regressions

2024-05-16 Thread Marc Poulhiès

From: Steve Baird 

A recent change to reduce duplication of compiler-generated Put_Image and
streaming subprograms introduced some regressions. The fix for one of them
was incomplete.

gcc/ada/

* exp_attr.adb (Build_And_Insert_Type_Attr_Subp): Further tweaking
of the point where a compiler-generated Put_Image or streaming
subprogram is to be inserted in the tree. If one such subprogram
calls another (as is often the case with, for example, Put_Image
procedures for composite type and for a component type thereof),
then we want to avoid use-before-definition problems that can
result from inserting the caller ahead of the callee.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_attr.adb | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/gcc/ada/exp_attr.adb b/gcc/ada/exp_attr.adb
index e12e8b4a439..03bf4cf329c 100644
--- a/gcc/ada/exp_attr.adb
+++ b/gcc/ada/exp_attr.adb
@@ -1954,6 +1954,44 @@ package body Exp_Attr is
 while Present (Ancestor) loop
if Is_List_Member (Ancestor) then
   Insertion_Point := First (List_Containing (Ancestor));
+
+  --  A hazard to avoid here is use-before-definition
+  --  errors that can result when we have two of these
+  --  subprograms where one calls the other (e.g., given
+  --  Put_Image procedures for a composite type and
+  --  for a component type, the former will often call
+  --  the latter). At the time a subprogram is inserted,
+  --  we know that the one and only call to it is
+  --  somewhere in the subtree rooted at Ancestor.
+  --  So that placement constraint is easy to satisfy.
+  --  But if we construct another subprogram later and
+  --  if that second subprogram calls the first one,
+  --  then we need to be careful not to place the
+  --  second one ahead of the first one. That is the goal
+  --  of this loop. This may need to be revised if it turns
+  --  out that other stuff is being inserted on the list,
+  --  so that the loop terminates too early.
+
+  --  On the other hand, it seems like inserting things
+  --  earlier offers more opportunities for sharing.
+  --  If Ancestor occurs in the statement list of a
+  --  subprogram body (ignore the HSS node for now),
+  --  then perhaps we should look for an insertion site
+  --  in the decl list of the subprogram body and only
+  --  look in the statement list if the decl list is empty.
+  --  Similarly if Ancestor occors in the private decls list
+  --  for a package spec that has a non-empty visible
+  --  decls list. No examples where this would result in more
+  --  sharing and less duplication have been observed, so this
+  --  is just speculation.
+
+  while Insertion_Point /= Ancestor
+and then Nkind (Insertion_Point) = N_Subprogram_Body
+and then not Comes_From_Source (Insertion_Point)
+  loop
+ Next (Insertion_Point);
+  end loop;
+
   pragma Assert (Present (Insertion_Point));
end if;
Ancestor := Parent (Ancestor);
-- 
2.43.2

[COMMITTED 14/35] ada: Fix bogus error on function returning noncontrolling result in private part

2024-05-16 Thread Marc Poulhiès

From: Eric Botcazou 

This occurs in the additional case of RM 3.9.3(10) in Ada 2012, that is to
say the access controlling result, because the implementation does not use
the same (correct) conditions as in the original case.

This factors out these conditions and uses them in both cases, as well as
adjusts the wording of the message in the first case.
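
For readers unfamiliar with the rule, here is a minimal sketch of what
RM 3.9.3(10) forbids. The package and function names are hypothetical;
the two commented-out declarations are the ones that are expected to be
rejected, the second being the access-result case added in Ada 2012.

```ada
package Overriding_Sketch is

   type T is tagged private;

   --  Legal: a function with a controlling result declared in the
   --  visible part.
   function Make return T;

private

   type T is tagged null record;

   function Make return T is (T'(null record));

   --  Illegal per RM 3.9.3(10): a primitive function with a controlling
   --  result declared in the private part without overriding a
   --  visible-part function.
   --
   --     function Hidden return T;
   --
   --  Illegal as well in Ada 2012 (AI05-0073): same situation with a
   --  controlling access result.
   --
   --     function Hidden_Access return access T;

end Overriding_Sketch;
```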

gcc/ada/

* sem_ch6.adb (Check_Private_Overriding): Implement the second part
of RM 3.9.3(10) consistently in both cases.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch6.adb | 23 +--
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/gcc/ada/sem_ch6.adb b/gcc/ada/sem_ch6.adb
index c0bfe873111..0a8030cb923 100644
--- a/gcc/ada/sem_ch6.adb
+++ b/gcc/ada/sem_ch6.adb
@@ -11555,35 +11555,30 @@ package body Sem_Ch6 is
   Incomplete_Or_Partial_View (T);
 
   begin
- if not Overrides_Visible_Function (Partial_View) then
+ if not Overrides_Visible_Function (Partial_View)
+   and then
+ Is_Tagged_Type
+   (if Present (Partial_View) then Partial_View else T)
+ then
 
 --  Here, S is "function ... return T;" declared in
 --  the private part, not overriding some visible
 --  operation. That's illegal in the tagged case
 --  (but not if the private type is untagged).
 
-if ((Present (Partial_View)
-  and then Is_Tagged_Type (Partial_View))
-  or else (No (Partial_View)
-and then Is_Tagged_Type (T)))
-  and then T = Base_Type (Etype (S))
-then
+if T = Base_Type (Etype (S)) then
Error_Msg_N
- ("private function with tagged result must"
+ ("private function with controlling result must"
   & " override visible-part function", S);
Error_Msg_N
  ("\move subprogram to the visible part"
   & " (RM 3.9.3(10))", S);
 
 --  Ada 2012 (AI05-0073): Extend this check to the case
---  of a function whose result subtype is defined by an
---  access_definition designating specific tagged type.
+--  of a function with access result type.
 
 elsif Ekind (Etype (S)) = E_Anonymous_Access_Type
-  and then Is_Tagged_Type (Designated_Type (Etype (S)))
-  and then
-not Is_Class_Wide_Type
-  (Designated_Type (Etype (S)))
+  and then T = Base_Type (Designated_Type (Etype (S)))
   and then Ada_Version >= Ada_2012
 then
Error_Msg_N
-- 
2.43.2

[COMMITTED 16/35] ada: Fix latent alignment issue for dynamically-allocated controlled objects

2024-05-16 Thread Marc Poulhiès

From: Eric Botcazou 

Dynamically-allocated controlled objects are attached to a finalization
collection by means of a hidden header placed right before the object,
which means that the size effectively allocated must naturally account
for the size of this header.  But the allocation must also account for
the alignment of this header in order to have it properly aligned.
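
The arithmetic involved can be sketched as follows. This is only an
illustration under stated assumptions: Alloc_Sketch and
Effective_Alignment are hypothetical names, both functions assume a
strictly positive alignment, and the actual code in s-stposu.adb may be
organized differently.

```ada
with System.Storage_Elements; use System.Storage_Elements;

package Alloc_Sketch is

   --  The allocated block must satisfy both the alignment requested for
   --  the object and the alignment of the hidden header, so the larger
   --  of the two is passed to the allocator.
   function Effective_Alignment
     (Object_Alignment : Storage_Count;
      Header_Alignment : Storage_Count) return Storage_Count
   is (Storage_Count'Max (Object_Alignment, Header_Alignment));

   --  Round the header size up to a multiple of Alignment, so that the
   --  object placed right after the header remains properly aligned.
   --  Assumes Alignment > 0.
   function Header_Size_With_Padding
     (Header_Size : Storage_Count;
      Alignment   : Storage_Count) return Storage_Count
   is ((Header_Size + Alignment - 1) / Alignment * Alignment);

end Alloc_Sketch;
```

For instance, a 24-unit header with an effective alignment of 16 would
occupy 32 storage units once padded.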

gcc/ada/

* libgnat/s-finpri.ads (Header_Alignment): New function.
(Header_Size): Adjust description.
(Master_Node): Put Finalize_Address as first component.
(Collection_Node): Likewise.
* libgnat/s-finpri.adb (Header_Alignment): New function.
(Header_Size): Return the object size in storage units.
* libgnat/s-stposu.ads (Adjust_Controlled_Dereference): Replace
collection node with header in description.
* libgnat/s-stposu.adb (Adjust_Controlled_Dereference): Likewise.
(Allocate_Any_Controlled): Likewise.  Pass the maximum of the
specified alignment and that of the header to the allocator.
(Deallocate_Any_Controlled): Likewise to the deallocator.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/s-finpri.adb | 11 +-
 gcc/ada/libgnat/s-finpri.ads | 21 +++
 gcc/ada/libgnat/s-stposu.adb | 69 +---
 gcc/ada/libgnat/s-stposu.ads |  2 +-
 4 files changed, 66 insertions(+), 37 deletions(-)

diff --git a/gcc/ada/libgnat/s-finpri.adb b/gcc/ada/libgnat/s-finpri.adb
index 09f2761a5b9..5bd8eeaea22 100644
--- a/gcc/ada/libgnat/s-finpri.adb
+++ b/gcc/ada/libgnat/s-finpri.adb
@@ -389,13 +389,22 @@ package body System.Finalization_Primitives is
   end if;
end Finalize_Object;
 
+   ----------------------
+   -- Header_Alignment --
+   ----------------------
+
+   function Header_Alignment return System.Storage_Elements.Storage_Count is
+   begin
+  return Collection_Node'Alignment;
+   end Header_Alignment;
+
    -----------------
    -- Header_Size --
    -----------------
 
function Header_Size return System.Storage_Elements.Storage_Count is
begin
-  return Collection_Node'Size / Storage_Unit;
+  return Collection_Node'Object_Size / Storage_Unit;
end Header_Size;
 

diff --git a/gcc/ada/libgnat/s-finpri.ads b/gcc/ada/libgnat/s-finpri.ads
index 4ba13dadec0..468aa584958 100644
--- a/gcc/ada/libgnat/s-finpri.ads
+++ b/gcc/ada/libgnat/s-finpri.ads
@@ -168,8 +168,11 @@ package System.Finalization_Primitives with Preelaborate is
--  Calls to the procedure with an object that has already been detached
--  have no effects.
 
+   function Header_Alignment return System.Storage_Elements.Storage_Count;
+   --  Return the alignment of type Collection_Node as Storage_Count
+
function Header_Size return System.Storage_Elements.Storage_Count;
-   --  Return the size of type Collection_Node as Storage_Count
+   --  Return the object size of type Collection_Node as Storage_Count
 
 private
 
@@ -182,11 +185,13 @@ private
 
--  Finalization masters:
 
-   --  Master node type structure
+   --  Master node type structure. Finalize_Address comes first because it is
+   --  an access-to-subprogram and, therefore, might be twice as large and as
+   --  aligned as an access-to-object on some platforms.
 
type Master_Node is record
-  Object_Address   : System.Address   := System.Null_Address;
   Finalize_Address : Finalize_Address_Ptr := null;
+  Object_Address   : System.Address   := System.Null_Address;
   Next : Master_Node_Ptr  := null;
end record;
 
@@ -211,15 +216,17 @@ private
 
--  Finalization collections:
 
-   --  Collection node type structure
+   --  Collection node type structure. Finalize_Address comes first because it
+   --  is an access-to-subprogram and, therefore, might be twice as large and
+   --  as aligned as an access-to-object on some platforms.
 
type Collection_Node is record
-  Enclosing_Collection : Finalization_Collection_Ptr := null;
-  --  A pointer to the collection to which the node is attached
-
   Finalize_Address : Finalize_Address_Ptr := null;
   --  A pointer to the Finalize_Address procedure of the object
 
+  Enclosing_Collection : Finalization_Collection_Ptr := null;
+  --  A pointer to the collection to which the node is attached
+
   Prev : Collection_Node_Ptr := null;
   Next : Collection_Node_Ptr := null;
   --  Collection nodes are managed as a circular doubly-linked list
diff --git a/gcc/ada/libgnat/s-stposu.adb b/gcc/ada/libgnat/s-stposu.adb
index 38dc69f976a..84535d2a506 100644
--- a/gcc/ada/libgnat/s-stposu.adb
+++ b/gcc/ada/libgnat/s-stposu.adb
@@ -56,12 +56,12 @@ package body System.Storage_Pools.Subpools is
   Header_And_Padding : constant Storage_Offset :=
  Header_Size_With_Padding (Alignment);
begin
-  --  Expose the collection node and its padding by shifting

[COMMITTED 21/35] ada: Fix detection of if_expressions that are known on entry

2024-05-16 Thread Marc Poulhiès

From: Piotr Trojanek 

Fix a small glitch in routine Is_Known_On_Entry, which returned False
for all if_expressions, regardless of whether their conditions or dependent
expressions are known on entry.

gcc/ada/

* sem_util.adb (Is_Known_On_Entry): Check whether condition and
dependent expressions of an if_expression are known on entry.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_util.adb | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index 68e131db606..766cabfc109 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -30784,9 +30784,7 @@ package body Sem_Util is
   return Is_Known_On_Entry (Expression (Expr));
 
when N_If_Expression =>
-  if not All_Exps_Known_On_Entry (Expressions (Expr)) then
- return False;
-  end if;
+  return All_Exps_Known_On_Entry (Expressions (Expr));
 
when N_Case_Expression =>
   if not Is_Known_On_Entry (Expression (Expr)) then
-- 
2.43.2

[COMMITTED 07/35] ada: Remove Aspect_Specifications field from N_Procedure_Specification

2024-05-16 Thread Marc Poulhiès

From: Piotr Trojanek 

Sync Has_Aspect_Specifications_Flag with the actual flags in the AST.
Code cleanup; behavior is unaffected.

gcc/ada/

* gen_il-gen-gen_nodes.adb (N_Procedure_Specification): Remove
Aspect_Specifications field.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gen_il-gen-gen_nodes.adb | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/ada/gen_il-gen-gen_nodes.adb b/gcc/ada/gen_il-gen-gen_nodes.adb
index f3dc215673a..a7021dc49bb 100644
--- a/gcc/ada/gen_il-gen-gen_nodes.adb
+++ b/gcc/ada/gen_il-gen-gen_nodes.adb
@@ -736,7 +736,6 @@ begin -- Gen_IL.Gen.Gen_Nodes
 Sy (Null_Present, Flag),
 Sy (Must_Override, Flag),
 Sy (Must_Not_Override, Flag),
-Sy (Aspect_Specifications, List_Id, Default_No_List),
 Sm (Null_Statement, Node_Id)));
 
Ab (N_Access_To_Subprogram_Definition, Node_Kind);
-- 
2.43.2

[COMMITTED 12/35] ada: Fix crash with -gnatdJ and -gnatw_q

2024-05-16 Thread Marc Poulhiès

From: Ronan Desplanques 

This commit makes the emission of -gnatw_q warnings pass node information
so as to handle the enclosing subprogram display of -gnatdJ instead of
crashing.

gcc/ada/

* exp_ch4.adb (Expand_Composite_Equality): Call Error_Msg_N
instead of Error_Msg.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch4.adb | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb
index 762e75616a7..7a2003691ec 100644
--- a/gcc/ada/exp_ch4.adb
+++ b/gcc/ada/exp_ch4.adb
@@ -2340,12 +2340,12 @@ package body Exp_Ch4 is
pragma Assert
  (Is_First_Subtype (Outer_Type)
or else Is_Generic_Actual_Type (Outer_Type));
-   Error_Msg_Node_1 := Outer_Type;
Error_Msg_Node_2 := Comp_Type;
-   Error_Msg
- ("?_q?""="" for type & uses predefined ""="" for }", Loc);
+   Error_Msg_N
+ ("?_q?""="" for type & uses predefined ""="" for }",
+  Outer_Type);
Error_Msg_Sloc := Sloc (Op);
-   Error_Msg ("\?_q?""="" # is ignored here", Loc);
+   Error_Msg_N ("\?_q?""="" # is ignored here", Outer_Type);
 end if;
  end;
 
-- 
2.43.2

[COMMITTED 20/35] ada: Fix comments about Get_Ranged_Checks

2024-05-16 Thread Marc Poulhiès

From: Ronan Desplanques 

Checks.Get_Range_Checks was once named Range_Check, and a few
comments referred to it by that name before this commit. To avoid
confusion with Types.Range_Check, this commit fixes those comments.

gcc/ada/

* checks.ads: Fix comments.
* checks.adb: Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/checks.adb | 4 ++--
 gcc/ada/checks.ads | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/checks.adb b/gcc/ada/checks.adb
index c81482a7b05..4e3eb502706 100644
--- a/gcc/ada/checks.adb
+++ b/gcc/ada/checks.adb
@@ -346,7 +346,7 @@ package body Checks is
   Warn_Node  : Node_Id) return Check_Result;
--  Like Apply_Selected_Length_Checks, except it doesn't modify
--  anything, just returns a list of nodes as described in the spec of
-   --  this package for the Range_Check function.
+   --  this package for the Get_Range_Checks function.
--  ??? In fact it does construct the test and insert it into the tree,
--  and insert actions in various ways (calling Insert_Action directly
--  in particular) so we do not call it in GNATprove mode, contrary to
@@ -359,7 +359,7 @@ package body Checks is
   Warn_Node  : Node_Id) return Check_Result;
--  Like Apply_Range_Check, except it does not modify anything, just
--  returns a list of nodes as described in the spec of this package
-   --  for the Range_Check function.
+   --  for the Get_Range_Checks function.
 
--
-- Access_Checks_Suppressed --
diff --git a/gcc/ada/checks.ads b/gcc/ada/checks.ads
index 36b5fa490fe..010627c3b03 100644
--- a/gcc/ada/checks.ads
+++ b/gcc/ada/checks.ads
@@ -980,7 +980,7 @@ package Checks is
 private
 
type Check_Result is array (Positive range 1 .. 2) of Node_Id;
-   --  There are two cases for the result returned by Range_Check:
+   --  There are two cases for the result returned by Get_Range_Checks:
--
--For the static case the result is one or two nodes that should cause
--a Constraint_Error. Typically these will include Expr itself or the
-- 
2.43.2

[COMMITTED 23/35] ada: Improve recovery from illegal occurrence of 'Old in if_expression

2024-05-16 Thread Marc Poulhiès

From: Piotr Trojanek 

Fix assertion failure in developer builds which happened when the THEN
expression contains an illegal occurrence of 'Old and the type of the
THEN expression is left as Any_Type, but there is no ELSE expression.

gcc/ada/

* sem_ch4.adb (Analyze_If_Expression): Add guard for
if_expression without an ELSE part.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch4.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/sem_ch4.adb b/gcc/ada/sem_ch4.adb
index b4414a3f7ff..03364dade9f 100644
--- a/gcc/ada/sem_ch4.adb
+++ b/gcc/ada/sem_ch4.adb
@@ -2645,7 +2645,7 @@ package body Sem_Ch4 is
  ("\ELSE expression has}!", Else_Expr, Etype (Else_Expr));
 end if;
 
- else
+ elsif Present (Else_Expr) then
 if Is_Overloaded (Else_Expr) then
Error_Msg_N
  ("no interpretation compatible with type of THEN expression",
-- 
2.43.2

[COMMITTED 15/35] ada: Fix resolving tagged operations in array aggregates

2024-05-16 Thread Marc Poulhiès

From: Viljar Indus 

In the Two_Pass_Aggregate_Expansion we were removing
all of the entity links in the Iterator_Specification
to avoid reusing the same Iterator_Definition in both
loops.

However this approach was also breaking the links to
calls with dot notation that had been transformed to
the regular call notation.

In order to circumvent this, explicitly create new
identifier definitions when copying the
Iterator_Specifications for both of the loops.
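
To give an idea of the shape involved, the hand-written sketch below
mimics a two-pass construction at the source level: one loop computes
the size, a second loop fills the array, and each loop has its own loop
parameter. It is not the expander's actual output, and all names in it
are hypothetical.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Two_Pass_Sketch is

   type Int_Array is array (Positive range <>) of Integer;

   Source : constant Int_Array := (3, 1, 4, 1, 5);

   --  Pass 1: iterate once just to count the elements
   function Count return Natural is
      Size : Natural := 0;
   begin
      for E of Source loop
         Size := Size + 1;
      end loop;
      return Size;
   end Count;

   Result : Int_Array (1 .. Count);
   Index  : Positive := 1;

begin
   --  Pass 2: a separate loop, with its own loop parameter E, fills the
   --  array that was sized by the first pass.
   for E of Source loop
      Result (Index) := E * 2;
      Index := Index + 1;
   end loop;

   for R of Result loop
      Put_Line (Integer'Image (R));
   end loop;
end Two_Pass_Sketch;
```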

gcc/ada/

* exp_aggr.adb (Two_Pass_Aggregate_Expansion):
Explicitly create new Defining_Iterators for both
of the loops.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index bdaca4aab58..f04dba719d9 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -5714,6 +5714,7 @@ package body Exp_Aggr is
  Iter : Node_Id;
  New_Comp : Node_Id;
  One_Loop : Node_Id;
+ Iter_Id  : Entity_Id;
 
  Size_Expr_Code : List_Id;
  Insertion_Code : List_Id := New_List;
@@ -5730,6 +5731,7 @@ package body Exp_Aggr is
 
  while Present (Assoc) loop
 Iter := Iterator_Specification (Assoc);
+Iter_Id := Defining_Identifier (Iter);
 Incr := Make_Assignment_Statement (Loc,
   Name => New_Occurrence_Of (Size_Id, Loc),
   Expression =>
@@ -5737,10 +5739,16 @@ package body Exp_Aggr is
  Left_Opnd  => New_Occurrence_Of (Size_Id, Loc),
  Right_Opnd => Make_Integer_Literal (Loc, 1)));
 
+--  Avoid using the same iterator definition in both loops by
+--  creating a new iterator for each loop and mapping it over the
+--  original iterator references.
+
 One_Loop := Make_Implicit_Loop_Statement (N,
   Iteration_Scheme =>
 Make_Iteration_Scheme (Loc,
-  Iterator_Specification => New_Copy_Tree (Iter)),
+  Iterator_Specification =>
+ New_Copy_Tree (Iter,
+Map => New_Elmt_List (Iter_Id, New_Copy (Iter_Id)))),
 Statements => New_List (Incr));
 
 Append (One_Loop, Size_Expr_Code);
@@ -5837,6 +5845,7 @@ package body Exp_Aggr is
 
  while Present (Assoc) loop
 Iter := Iterator_Specification (Assoc);
+Iter_Id := Defining_Identifier (Iter);
 New_Comp := Make_Assignment_Statement (Loc,
Name =>
  Make_Indexed_Component (Loc,
@@ -5869,10 +5878,16 @@ package body Exp_Aggr is
   Attribute_Name => Name_Last)),
Then_Statements => New_List (Incr));
 
+--  Avoid using the same iterator definition in both loops by
+--  creating a new iterator for each loop and mapping it over the
+--  original iterator references.
+
 One_Loop := Make_Implicit_Loop_Statement (N,
   Iteration_Scheme =>
 Make_Iteration_Scheme (Loc,
-  Iterator_Specification => Copy_Separate_Tree (Iter)),
+  Iterator_Specification =>
+ New_Copy_Tree (Iter,
+Map => New_Elmt_List (Iter_Id, New_Copy (Iter_Id)))),
 Statements => New_List (New_Comp, Incr));
 
 Append (One_Loop, Insertion_Code);
-- 
2.43.2

[COMMITTED 06/35] ada: Reuse existing expression when rewriting aspects to pragmas

2024-05-16 Thread Marc Poulhiès

From: Piotr Trojanek 

Code cleanup; semantics is unaffected.

gcc/ada/

* sem_ch13.adb (Analyze_Aspect_Specification): Consistently
reuse existing constant where possible.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch13.adb | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
index ce9f15c1491..00392ae88eb 100644
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -1838,7 +1838,7 @@ package body Sem_Ch13 is
Make_Pragma_Argument_Association (Loc,
  Expression => Conv),
Make_Pragma_Argument_Association (Loc,
- Expression => New_Occurrence_Of (E, Loc))));
+ Expression => Ent)));
 
   Decorate (Aspect, Aitem);
   Insert_Pragma (Aitem);
@@ -3099,7 +3099,7 @@ package body Sem_Ch13 is
   Aitem := Make_Aitem_Pragma
 (Pragma_Argument_Associations => New_List (
Make_Pragma_Argument_Association (Loc,
- Expression => New_Occurrence_Of (E, Loc)),
+ Expression => Ent),
Make_Pragma_Argument_Association (Sloc (Expr),
  Expression => Relocate_Node (Expr))),
  Pragma_Name  => Name_Linker_Section);
@@ -3120,7 +3120,7 @@ package body Sem_Ch13 is
   Aitem := Make_Aitem_Pragma
 (Pragma_Argument_Associations => New_List (
Make_Pragma_Argument_Association (Loc,
- Expression => New_Occurrence_Of (E, Loc)),
+ Expression => Ent),
Make_Pragma_Argument_Association (Sloc (Expr),
  Expression => Relocate_Node (Expr))),
  Pragma_Name  => Name_Implemented);
@@ -3439,7 +3439,7 @@ package body Sem_Ch13 is
Make_Pragma_Argument_Association (Loc,
  Expression => Relocate_Node (Expr)),
Make_Pragma_Argument_Association (Sloc (Expr),
- Expression => New_Occurrence_Of (E, Loc))),
+ Expression => Ent)),
  Pragma_Name  => Nam);
 
   Delay_Required := False;
@@ -3452,7 +3452,7 @@ package body Sem_Ch13 is
Make_Pragma_Argument_Association (Sloc (Expr),
  Expression => Relocate_Node (Expr)),
Make_Pragma_Argument_Association (Loc,
- Expression => New_Occurrence_Of (E, Loc))),
+ Expression => Ent)),
  Pragma_Name  => Name_Warnings);
 
   Decorate (Aspect, Aitem);
-- 
2.43.2

[COMMITTED 09/35] ada: Formal_Derived_Type'Size is not static

2024-05-16 Thread Marc Poulhiès

From: Steve Baird 

In deciding whether a Size attribute reference is static, the compiler could
get confused about whether an implicitly-declared subtype of a generic formal
type is itself a generic formal type, possibly resulting in an assertion
failure and then a bugbox.

gcc/ada/

* sem_attr.adb (Eval_Attribute): Expand existing checks for
generic formal types for which Is_Generic_Type returns False. In
that case, mark the attribute reference as nonstatic.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_attr.adb | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
index 65442d45a85..2fa7d7d25d2 100644
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -8685,10 +8685,20 @@ package body Sem_Attr is
   --  If the root type or base type is generic, then we cannot fold. This
   --  test is needed because subtypes of generic types are not always
   --  marked as being generic themselves (which seems odd???)
+  --
+  --  Should this situation be addressed instead by either
+  -- a) setting Is_Generic_Type in more cases
+  --  or b) replacing preceding calls to Is_Generic_Type with calls to
+  --Sem_Util.Some_New_Function
+  --  so that we wouldn't have to deal with these cases here ???
 
   if Is_Generic_Type (P_Root_Type)
 or else Is_Generic_Type (P_Base_Type)
+or else (Present (Associated_Node_For_Itype (P_Base_Type))
+  and then Is_Generic_Type (Defining_Identifier
+ (Associated_Node_For_Itype (P_Base_Type))))
   then
+ Set_Is_Static_Expression (N, False);
  return;
   end if;
 
-- 
2.43.2

[COMMITTED 13/35] ada: Fix casing of CUDA in error messages

2024-05-16 Thread Marc Poulhiès

From: Piotr Trojanek 

Error messages now capitalize CUDA.
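
The effect can be pictured with the simplified sketch below, which is an
assumption-laden stand-in rather than the code in erroutc.adb: acronyms
such as RM, CUDA and SPARK are emitted verbatim, while any other
reserved word is lower-cased and quoted (the real routine follows the
keyword casing of the source file instead of always lower-casing).
Casing_Sketch and Display are hypothetical names.

```ada
with Ada.Characters.Handling; use Ada.Characters.Handling;
with Ada.Text_IO;             use Ada.Text_IO;

procedure Casing_Sketch is

   --  Return the text to insert in a message for the given word
   function Display (Word : String) return String is
   begin
      if Word = "RM" or else Word = "CUDA" or else Word = "SPARK" then
         --  Acronyms keep their upper-case spelling and get no quotes
         return Word;
      else
         --  Everything else is cased as a keyword and quoted
         return '"' & To_Lower (Word) & '"';
      end if;
   end Display;

begin
   Put_Line (Display ("CUDA"));   --  prints CUDA
   Put_Line (Display ("BEGIN"));  --  prints "begin"
end Casing_Sketch;
```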

gcc/ada/

* erroutc.adb (Set_Msg_Insertion_Reserved_Word): Fix casing for
CUDA appearing in error message strings.
(Set_Msg_Str): Likewise for CUDA being a part of a Name_Id.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/erroutc.adb | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/erroutc.adb b/gcc/ada/erroutc.adb
index be200e0016e..cef04d5daf2 100644
--- a/gcc/ada/erroutc.adb
+++ b/gcc/ada/erroutc.adb
@@ -1475,12 +1475,17 @@ package body Erroutc is
   if Name_Len = 2 and then Name_Buffer (1 .. 2) = "RM" then
  Set_Msg_Name_Buffer;
 
+  --  We make a similar exception for CUDA
+
+  elsif Name_Len = 4 and then Name_Buffer (1 .. 4) = "CUDA" then
+ Set_Msg_Name_Buffer;
+
   --  We make a similar exception for SPARK
 
   elsif Name_Len = 5 and then Name_Buffer (1 .. 5) = "SPARK" then
  Set_Msg_Name_Buffer;
 
-  --  Neither RM nor SPARK: case appropriately and add surrounding quotes
+  --  Otherwise, case appropriately and add surrounding quotes
 
   else
  Set_Casing (Keyword_Casing (Flag_Source), All_Lower_Case);
@@ -1608,6 +1613,12 @@ package body Erroutc is
   elsif Text = "Cpp_Vtable" then
  Set_Msg_Str ("CPP_Vtable");
 
+  elsif Text = "Cuda_Device" then
+ Set_Msg_Str ("CUDA_Device");
+
+  elsif Text = "Cuda_Global" then
+ Set_Msg_Str ("CUDA_Global");
+
   elsif Text = "Persistent_Bss" then
  Set_Msg_Str ("Persistent_BSS");
 
-- 
2.43.2
