RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-17 Thread Li, Pan2
Thanks Tamar for the enlightening suggestion, I will give this ingenious idea a try!

Pan

-----Original Message-----
From: Tamar Christina  
Sent: Friday, May 17, 2024 10:46 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com; 
Liu, Hongtao 
Subject: RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned 
scalar int

Hi Pan,

> 
> Hi Tamar,
> 
> I am trying to add more shapes for SAT_ADD, such as the branch version below.
> I suspect that widening_mul may not be the best place to handle this shape,
> because after_dom_children works almost entirely within a single bb, while we
> actually need to find the def/use across bbs.

It actually already does this, see for example optimize_spaceship which
optimizes across basic blocks. However...

> 
> Thus, is there any suggestion for the branch shape? Adding a new simplify to
> match.pd works well, but it was not recommended in the previous discussion.

The previous objection was to introducing the IFNs in match.pd; it doesn't
mean we can't use match.pd to turn the versions with branches into branchless
code so the existing patterns can deal with them as is.

...in this case something like this:

#if GIMPLE
(simplify
 (cond (ge (plus:c@3 @0 @1) @0) @3 integer_minus_onep)
  (if (direct_internal_fn_supported_p (...))
   (bit_ior @3 (negate (...)
#endif

Works better I think.

That is, for targets where we know we can optimize it later on, or do
something with it in the vectorizer, we canonicalize it.  The reason I have it
guarded with the IFN is that some target maintainers objected to replacing the
branch code with branchless code, as their targets can deal with branches more
optimally.
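
To make the canonicalization concrete, here is a minimal C++ sketch (not taken from the patch) of the two source-level shapes discussed in this thread: the branch form from the SAT_ADD_U_1 macro and the branchless form from the commit message. The function names and the exhaustive 8-bit check are illustrative assumptions only.

```cpp
#include <cassert>
#include <cstdint>

// Branch form, modelled on the SAT_ADD_U_1 macro quoted in this thread.
template <typename T>
T sat_add_branch(T x, T y)
{
  T sum = static_cast<T>(x + y);               // wraps modulo 2^N
  return sum >= x ? sum : static_cast<T>(-1);  // all ones on overflow
}

// Branchless form: (x + y) | -(T)((T)(x + y) < x).
template <typename T>
T sat_add_branchless(T x, T y)
{
  T sum = static_cast<T>(x + y);
  T mask = static_cast<T>(-static_cast<T>(sum < x));  // 0 or all ones
  return static_cast<T>(sum | mask);
}

// Hypothetical self-check: both forms agree for every pair of 8-bit inputs.
void check_forms_agree_u8()
{
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      {
        std::uint8_t a = static_cast<std::uint8_t>(x);
        std::uint8_t b = static_cast<std::uint8_t>(y);
        assert(sat_add_branch<std::uint8_t>(a, b)
               == sat_add_branchless<std::uint8_t>(a, b));
      }
}
```

Since both functions compute the same value for unsigned types, rewriting one into the other changes only the shape of the code, which is why the rewrite can be guarded purely on whether the target supports the IFN.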

Cheers,
Tamar
> 
> Thanks a lot for help!
> 
> Pan
> 
> ---Source code-
> 
> #define SAT_ADD_U_1(T) \
> T sat_add_u_1_##T(T x, T y) \
> { \
>   return (T)(x + y) >= x ? (x + y) : -1; \
> }
> 
> SAT_ADD_U_1(uint16_t)
> 
> ---Gimple-
> 
> uint16_t sat_add_u_1_uint16_t (uint16_t x, uint16_t y)
> {
>   short unsigned int _1;
>   uint16_t _2;
> 
>   <bb 2> [local count: 1073741824]:
>   _1 = x_3(D) + y_4(D);
>   if (_1 >= x_3(D))
>     goto <bb 3>; [65.00%]
>   else
>     goto <bb 4>; [35.00%]
> 
>   <bb 3> [local count: 697932184]:
> 
>   <bb 4> [local count: 1073741824]:
>   # _2 = PHI <65535(2), _1(3)>
>   return _2;
> }
> 
> Pan
> 
> -----Original Message-----
> From: Tamar Christina 
> Sent: Wednesday, May 15, 2024 5:12 PM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> Liu, Hongtao 
> Subject: RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned
> scalar int
> 
> Hi Pan,
> 
> Thanks!
> 
> > -----Original Message-----
> > From: pan2...@intel.com 
> > Sent: Wednesday, May 15, 2024 3:14 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> > ; richard.guent...@gmail.com;
> > hongtao@intel.com; Pan Li 
> > Subject: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned
> scalar
> > int
> >
> > From: Pan Li 
> >
> > This patch adds the middle-end representation for saturating add,
> > i.e. the result of the add is set to the type's maximum on overflow.
> > It matches the pattern shown below.
> >
> > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> >
> > Take uint8_t as example, we will have:
> >
> > * SAT_ADD (1, 254)   => 255.
> > * SAT_ADD (1, 255)   => 255.
> > * SAT_ADD (2, 255)   => 255.
> > * SAT_ADD (255, 255) => 255.
> >
> > Given below example for the unsigned scalar integer uint64_t:
> >
> > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > }
> >
> > Before this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   _Bool _2;
> >   long unsigned int _3;
> >   long unsigned int _4;
> >   uint64_t _7;
> >   long unsigned int _10;
> >   __complex__ long unsigned int _11;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> >   _1 = REALPART_EXPR <_11>;
> >   _10 = IMAGPART_EXPR <_11>;
> >   _2 = _10 != 0;
> >   _3 = (long unsigned int) _2;
> >   _4 = -_3;
> >   _7 = _1 | _4;
> >   return _7;
> > ;;succ:   EXIT
> >
> > }
> >
> > After this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> >   return _7;
> > ;;succ:   EXIT
> > }
> >
> > The following tests pass with this patch:
> > 1. The riscv full regression tests.
> > 3. The x86 bootstrap tests.
> > 4. The x86 full regression tests.
> >
> > PR target/51492
> > PR target/112600
> >
> > gcc/ChangeLog:
> >
> > * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> > to the return true switch case(s).
> > * internal-fn.def (SAT_ADD):  Add new signed optab 

Re: [PATCH] tree-optimization/114589 - remove profile based sink heuristics

2024-05-17 Thread Hans-Peter Nilsson
> Date: Wed, 15 May 2024 11:38:58 +0200 (CEST)
> From: Richard Biener 

> The following removes the profile based heuristic limiting sinking
> and instead uses post-dominators to avoid sinking to places that
> are executed under the same conditions as the earlier location which
> the profile based heuristic should have guaranteed as well.
> 
> To avoid regressing this moves the empty-latch check to cover all
> sink cases.
> 
> It also stream-lines the resulting select_best_block a bit but avoids
> adjusting heuristics more with this change.  gfortran.dg/streamio_9.f90
> starts execute failing with this on x86_64 with -m32 because the
> (float)i * 9....e-7 compute is sunk across a STOP causing it
> to be no longer spilled and thus the compare failing due to excess
> precision.  The patch adds -ffloat-store to avoid this, following
> other similar testcases.
> 
> This change doesn't fix the testcase in the PR on itself.

It may come as no surprise that this patch (commit
r15-518-g99b1daae18c095) caused a regression for some codes,
for some targets.

I entered PR115144 for the one that came to my attention.
TL;DR: not sure what to do about it, if anything; it
corresponds to the random_bitstring function in
gcc.c-torture/execute/arith-rand-ll.c compiled for cris-elf
with -O2 -march=v10 but there's no regression for coremark.
The random_bitstring code does intense operations on "long
long", i.e. lots of int64_t two-register operations and
arithmetic library calls.

I'd be grateful if you had a quick glance at
random_bitstring in gcc.c-torture/execute/arith-rand-ll.c in
case that code rings a bell related to the r15-518 change;
maybe there's a trivial improvement you see.

brgds, H-P


[COMMITTED] RISC-V: Fix "Nan-box the result of movbf on soft-bf16"

2024-05-17 Thread Xiao Zeng
2024-05-18 09:57  Jeff Law  wrote:
>
>
>
>On 5/15/24 7:55 PM, Xiao Zeng wrote:
>> 1 According to unpriv-isa spec:
>> 
>>    1.1 "FMV.H.X moves the half-precision value encoded in IEEE 754-2008
>>    standard encoding from the lower 16 bits of integer register rs1
>>    to the floating-point register rd, NaN-boxing the result."
>>    1.2 "FMV.W.X moves the single-precision value encoded in IEEE 754-2008
>>    standard encoding from the lower 32 bits of integer register rs1
>>    to the floating-point register rd. The bits are not modified in the
>>    transfer, and in particular, the payloads of non-canonical NaNs are 
>>preserved."
>>
>> 2 When (!TARGET_ZFHMIN == true && TARGET_HARD_FLOAT == true), instruction 
>> needs
>> to be added to complete the Nan-box, as done in
>> "RISC-V: Nan-box the result of movhf on soft-fp16":
>> 
>>
>> 3 Consider the "RISC-V: Nan-box the result of movbf on soft-bf16" in:
>> 
>> It ignores that both hf16 and bf16 are 16bits floating-point.
>>
>> 4 zfbfmin -> zfhmin in:
>> 
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/riscv.cc (riscv_legitimize_move): Optimize movbf
>> with Nan-boxing value.
>> * config/riscv/riscv.md (*movhf_softfloat_boxing): Expand movbf
>> with Nan-boxing value.
>> (*mov_softfloat_boxing): Ditto.
>> with Nan-boxing value.
>> (*movbf_softfloat_boxing): Delete abandon pattern.
>> ---
>>   gcc/config/riscv/riscv.cc | 15 +--
>>   gcc/config/riscv/riscv.md | 19 +--
>>   2 files changed, 10 insertions(+), 24 deletions(-)
>>
>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> index 4067505270e..04513537aad 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -3178,13 +3178,10 @@ riscv_legitimize_move (machine_mode mode, rtx dest, 
>> rtx src)
>>    (set (reg:SI/DI mask) (const_int -65536)
>>    (set (reg:SI/DI temp) (zero_extend:SI/DI (subreg:HI (reg:HF/BF src) 
>>0)))
>>    (set (reg:SI/DI temp) (ior:SI/DI (reg:SI/DI mask) (reg:SI/DI temp)))
>> - (set (reg:HF/BF dest) (unspec:HF/BF[ (reg:SI/DI temp) ]
>> -    UNSPEC_FMV_SFP16_X/UNSPEC_FMV_SBF16_X))
>> - */
>> + (set (reg:HF/BF dest) (unspec:HF/BF[ (reg:SI/DI temp) ] 
>> UNSPEC_FMV_FP16_X))
>> +  */
>>  
>> -  if (TARGET_HARD_FLOAT
>> -  && ((!TARGET_ZFHMIN && mode == HFmode)
>> -  || (!TARGET_ZFBFMIN && mode == BFmode))
>> +  if (TARGET_HARD_FLOAT && !TARGET_ZFHMIN && (mode == HFmode || mode == 
>> BFmode)
>We generally prefer not to mix && and || operators on the same line.
>I'd suggest
>
>if (TARGET_HARD_FLOAT
> && !TARGET_ZFHMIN
> && (mode == HFmode || mode == BFmode)
>[ ... ] 
Fixed.

>
>
>> @@ -1959,23 +1958,15 @@
>>  (set_attr "type" "fmove,move,load,store,mtc,mfc")
>>  (set_attr "mode" "")])
>>  
>> -(define_insn "*movhf_softfloat_boxing"
>> -  [(set (match_operand:HF 0 "register_operand"    "=f")
>> -    (unspec:HF [(match_operand:X 1 "register_operand" " r")] 
>> UNSPEC_FMV_SFP16_X))]
>> +(define_insn "*mov_softfloat_boxing"
>> +  [(set (match_operand:HFBF 0 "register_operand"    "=f")
>> +    (unspec:HFBF [(match_operand:X 1 "register_operand" " r")]
>> +UNSPEC_FMV_FP16_X))]
>> "!TARGET_ZFHMIN"
>I think the linter complained about having 8 spaces instead of a tab in
>one of the lines above. 
Fixed.

>
>With those fixes, this is fine for the trunk.
>
>jeff
Thanks
Xiao Zeng



Re: Re: [PATCH] RISC-V: Modify _Bfloat16 to __bf16

2024-05-17 Thread Xiao Zeng
2024-05-18 08:36  Jeff Law  wrote:
>
>
>
>On 5/17/24 2:19 AM, Kito Cheng wrote:
>> LGTM, thanks for fixing this :)
>And just to be clear for Xiao, you can go ahead and commit this patch to
>the trunk. 
> An ACK from Kito, Juzhe, Palmer, Robin or myself
good.

>is all you need for a change that is isolated to RISC-V code.
>
>jeff
Thanks
Xiao Zeng



[COMMITTED] RISC-V: Modify _Bfloat16 to __bf16

2024-05-17 Thread Xiao Zeng
2024-05-17 16:19  Kito Cheng  wrote:
>
>LGTM, thanks for fixing this :) 
1 Passed CI testing:


2 pushed to trunk.
>
>On Fri, May 17, 2024 at 4:05 PM Xiao Zeng  wrote:
>>
>> According to the description in:
>> ,
>> the type representation symbol of BF16 has been corrected.
>>
>> Kito Cheng pointed out relevant information in the email:
>> 
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/riscv-builtins.cc (riscv_init_builtin_types):
>> Modify _Bfloat16 to __bf16.
>> * config/riscv/riscv.cc (riscv_mangle_type): Ditto.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/_Bfloat16-nanboxing.c: Move to...
>> * gcc.target/riscv/__bf16-nanboxing.c: ...here.
>> * gcc.target/riscv/bf16_arithmetic.c: Modify _Bfloat16 to __bf16.
>> * gcc.target/riscv/bf16_call.c: Ditto.
>> * gcc.target/riscv/bf16_comparison.c: Ditto.
>> * gcc.target/riscv/bf16_float_libcall_convert.c: Ditto.
>> * gcc.target/riscv/bf16_integer_libcall_convert.c: Ditto.
>> ---
>>  gcc/config/riscv/riscv-builtins.cc   |  6 +++---
>>  gcc/config/riscv/riscv.cc    |  2 +-
>>  .../{_Bfloat16-nanboxing.c => __bf16-nanboxing.c}    | 12 ++--
>>  gcc/testsuite/gcc.target/riscv/bf16_arithmetic.c |  6 +++---
>>  gcc/testsuite/gcc.target/riscv/bf16_call.c   |  4 ++--
>>  gcc/testsuite/gcc.target/riscv/bf16_comparison.c |  6 +++---
>>  .../gcc.target/riscv/bf16_float_libcall_convert.c    |  2 +-
>>  .../gcc.target/riscv/bf16_integer_libcall_convert.c  |  2 +-
>>  8 files changed, 20 insertions(+), 20 deletions(-)
>>  rename gcc/testsuite/gcc.target/riscv/{_Bfloat16-nanboxing.c => 
>>__bf16-nanboxing.c} (83%)
>>
>> diff --git a/gcc/config/riscv/riscv-builtins.cc 
>> b/gcc/config/riscv/riscv-builtins.cc
>> index 4c08834288a..dc54e1a59b5 100644
>> --- a/gcc/config/riscv/riscv-builtins.cc
>> +++ b/gcc/config/riscv/riscv-builtins.cc
>> @@ -275,7 +275,7 @@ riscv_init_builtin_types (void)
>>  lang_hooks.types.register_builtin_type (riscv_float16_type_node,
>> "_Float16");
>>
>> -  /* Provide the _Bfloat16 type and bfloat16_type_node if needed.  */
>> +  /* Provide the __bf16 type and bfloat16_type_node if needed.  */
>>    if (!bfloat16_type_node)
>>  {
>>    riscv_bfloat16_type_node = make_node (REAL_TYPE);
>> @@ -286,9 +286,9 @@ riscv_init_builtin_types (void)
>>    else
>>  riscv_bfloat16_type_node = bfloat16_type_node;
>>
>> -  if (!maybe_get_identifier ("_Bfloat16"))
>> +  if (!maybe_get_identifier ("__bf16"))
>>  lang_hooks.types.register_builtin_type (riscv_bfloat16_type_node,
>> -   "_Bfloat16");
>> +   "__bf16");
>>  }
>>
>>  /* Implement TARGET_INIT_BUILTINS.  */
>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> index 4067505270e..cf15a12de3a 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -10262,7 +10262,7 @@ riscv_asan_shadow_offset (void)
>>  static const char *
>>  riscv_mangle_type (const_tree type)
>>  {
>> -  /* Half-precision float, _Float16 is "DF16_" and _Bfloat16 is "DF16b".  */
>> +  /* Half-precision float, _Float16 is "DF16_" and __bf16 is "DF16b".  */
>>    if (SCALAR_FLOAT_TYPE_P (type) && TYPE_PRECISION (type) == 16)
>>  {
>>    if (TYPE_MODE (type) == HFmode)
>> diff --git a/gcc/testsuite/gcc.target/riscv/_Bfloat16-nanboxing.c 
>> b/gcc/testsuite/gcc.target/riscv/__bf16-nanboxing.c
>> similarity index 83%
>> rename from gcc/testsuite/gcc.target/riscv/_Bfloat16-nanboxing.c
>> rename to gcc/testsuite/gcc.target/riscv/__bf16-nanboxing.c
>> index 11a73d22234..a9a586c98b9 100644
>> --- a/gcc/testsuite/gcc.target/riscv/_Bfloat16-nanboxing.c
>> +++ b/gcc/testsuite/gcc.target/riscv/__bf16-nanboxing.c
>> @@ -1,14 +1,14 @@
>>  /* { dg-do compile } */
>>  /* { dg-options "-march=rv64ifd -mabi=lp64d -mcmodel=medlow -O" } */
>>
>> -_Bfloat16 gvar = 9.87654;
>> +__bf16 gvar = 9.87654;
>>  union U
>>  {
>>    unsigned short i16;
>> -  _Bfloat16 f16;
>> +  __bf16 f16;
>>  };
>>
>> -_Bfloat16
>> +__bf16
>>  test1 (unsigned short input)
>>  {
>>    union U tmp;
>> @@ -16,19 +16,19 @@ test1 (unsigned short input)
>>    return tmp.f16;
>>  }
>>
>> -_Bfloat16
>> +__bf16
>>  test2 ()
>>  {
>>    return 1.234f;
>>  }
>>
>> -_Bfloat16
>> +__bf16
>>  test3 ()
>>  {
>>    return gvar;
>>  }
>>
>> -_Bfloat16
>> +__bf16
>>  test ()
>>  {
>>    return 0.0f;
>> diff --git a/gcc/testsuite/gcc.target/riscv/bf16_arithmetic.c 
>> b/gcc/testsuite/gcc.target/riscv/bf16_arithmetic.c
>> index 9e485051260..190cc1d574a 100644
>> --- a/gcc/testsuite/gcc.target/riscv/bf16_arithmetic.c
>> +++ 

RE: [PATCH v6] RISC-V: Implement IFN SAT_ADD for both the scalar and vector

2024-05-17 Thread Li, Pan2
Committed with more comments, thanks Robin.

Pan

-----Original Message-----
From: Robin Dapp  
Sent: Saturday, May 18, 2024 3:32 AM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com
Subject: Re: [PATCH v6] RISC-V: Implement IFN SAT_ADD for both the scalar and 
vector

Hi Pan,

all in all LGTM.  Just insignificant nits.

> +void
> +expand_vec_usadd (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
> +{
> +  emit_vec_saddu (op_0, op_1, op_2, BINARY_OP, vec_mode);
> +}
> +

Do we really need this function?  Or do you want it to be a dispatcher
for later?  If it should do more than just a call, please document.

> +  /* Step-1: sum = x + y  */
> +  if (mode == SImode && mode != Xmode)
> +{ /* Take addw to avoid the sum truncate.  */
> +  rtx simode_sum = gen_reg_rtx (SImode);
> +  riscv_emit_binary (PLUS, simode_sum, x, y);
> +  emit_move_insn (xmode_sum, gen_lowpart (Xmode, simode_sum));
> +}
> +  else
> +riscv_emit_binary (PLUS, xmode_sum, xmode_x, xmode_y);

I would add a top-level comment that the emulation is just
sum = x + y;
if (sum < x)
  sum = TYPE_MAX;
and we can implement the if/then by sltu and or.
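
The emulation described above can be sketched in C++ as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the element type is fixed at 32 bits, and the negate step between the comparison and the or is my reading of how a 0/1 flag becomes an all-ones mask, not something quoted from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Reference semantics: sum = x + y; if (sum < x) sum = TYPE_MAX.
std::uint32_t usadd_reference(std::uint32_t x, std::uint32_t y)
{
  std::uint32_t sum = x + y;  // unsigned add wraps on overflow
  if (sum < x)
    sum = std::numeric_limits<std::uint32_t>::max();
  return sum;
}

// Branch-free version that mirrors an sltu-style compare followed by an or.
std::uint32_t usadd_emulated(std::uint32_t x, std::uint32_t y)
{
  std::uint32_t sum = x + y;              // step 1: wrapping add
  std::uint32_t lt = sum < x ? 1u : 0u;   // step 2: unsigned less-than flag
  std::uint32_t mask = 0u - lt;           // step 3: 0 or all ones (assumed negate)
  return sum | mask;                      // step 4: force TYPE_MAX on overflow
}

// Hypothetical spot check over a few boundary values.
void check_usadd()
{
  const std::uint32_t samples[] = {0u, 1u, 2u, 0x7FFFFFFFu, 0x80000000u,
                                   0xFFFFFFFEu, 0xFFFFFFFFu};
  for (std::uint32_t x : samples)
    for (std::uint32_t y : samples)
      assert(usadd_emulated(x, y) == usadd_reference(x, y));
}
```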

No need for another revision, though.

Regards
 Robin


Re: [PATCH] RISC-V: Fix "Nan-box the result of movbf on soft-bf16"

2024-05-17 Thread Jeff Law




On 5/15/24 7:55 PM, Xiao Zeng wrote:

1 According to unpriv-isa spec:

   1.1 "FMV.H.X moves the half-precision value encoded in IEEE 754-2008
   standard encoding from the lower 16 bits of integer register rs1
   to the floating-point register rd, NaN-boxing the result."
   1.2 "FMV.W.X moves the single-precision value encoded in IEEE 754-2008
   standard encoding from the lower 32 bits of integer register rs1
   to the floating-point register rd. The bits are not modified in the
   transfer, and in particular, the payloads of non-canonical NaNs are 
preserved."

2 When (!TARGET_ZFHMIN == true && TARGET_HARD_FLOAT == true), instruction needs
to be added to complete the Nan-box, as done in
"RISC-V: Nan-box the result of movhf on soft-fp16":


3 Consider the "RISC-V: Nan-box the result of movbf on soft-bf16" in:

It ignores that both hf16 and bf16 are 16bits floating-point.

4 zfbfmin -> zfhmin in:
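
As a rough illustration of the NaN-boxing step, the following C++ sketch models on plain integers what the RTL comment in the patch describes: a mask of -65536 is or-ed with the zero-extended 16-bit payload. The helper names are hypothetical, and the sketch says nothing about how the value is then moved into a floating-point register.

```cpp
#include <cassert>
#include <cstdint>

// NaN-box a 16-bit FP bit pattern (HF or BF) in a 32-bit integer value.
std::uint32_t nan_box_16_in_32(std::uint16_t bits)
{
  const std::uint32_t mask = static_cast<std::uint32_t>(-65536);  // 0xFFFF0000
  const std::uint32_t temp = bits;                                // zero-extend
  return mask | temp;
}

// Same idea for a 64-bit integer register: all upper 48 bits are set.
std::uint64_t nan_box_16_in_64(std::uint16_t bits)
{
  const std::uint64_t mask =
    static_cast<std::uint64_t>(static_cast<std::int64_t>(-65536));
  const std::uint64_t temp = bits;                                // zero-extend
  return mask | temp;
}

// Hypothetical self-check: the low 16 bits are preserved for both formats,
// because boxing does not depend on how the payload is interpreted.
void check_nan_box()
{
  const std::uint16_t one_fp16 = 0x3C00;  // 1.0 in IEEE half precision
  const std::uint16_t one_bf16 = 0x3F80;  // 1.0 in bfloat16
  assert((nan_box_16_in_32(one_fp16) & 0xFFFFu) == one_fp16);
  assert((nan_box_16_in_32(one_bf16) & 0xFFFFu) == one_bf16);
}
```

The integer sequence is identical for both 16-bit formats, which is the observation behind sharing one pattern for HF and BF.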


gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_move): Optimize movbf
with Nan-boxing value.
* config/riscv/riscv.md (*movhf_softfloat_boxing): Expand movbf
with Nan-boxing value.
(*mov_softfloat_boxing): Ditto.
with Nan-boxing value.
(*movbf_softfloat_boxing): Delete abandon pattern.
---
  gcc/config/riscv/riscv.cc | 15 +--
  gcc/config/riscv/riscv.md | 19 +--
  2 files changed, 10 insertions(+), 24 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 4067505270e..04513537aad 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3178,13 +3178,10 @@ riscv_legitimize_move (machine_mode mode, rtx dest, rtx 
src)
   (set (reg:SI/DI mask) (const_int -65536)
   (set (reg:SI/DI temp) (zero_extend:SI/DI (subreg:HI (reg:HF/BF src) 0)))
   (set (reg:SI/DI temp) (ior:SI/DI (reg:SI/DI mask) (reg:SI/DI temp)))
- (set (reg:HF/BF dest) (unspec:HF/BF[ (reg:SI/DI temp) ]
-   UNSPEC_FMV_SFP16_X/UNSPEC_FMV_SBF16_X))
- */
+ (set (reg:HF/BF dest) (unspec:HF/BF[ (reg:SI/DI temp) ] 
UNSPEC_FMV_FP16_X))
+  */
  
-  if (TARGET_HARD_FLOAT
-  && ((!TARGET_ZFHMIN && mode == HFmode)
- || (!TARGET_ZFBFMIN && mode == BFmode))
+  if (TARGET_HARD_FLOAT && !TARGET_ZFHMIN && (mode == HFmode || mode == BFmode)

We generally prefer not to mix && and || operators on the same line.
I'd suggest

if (TARGET_HARD_FLOAT
&& !TARGET_ZFHMIN
&& (mode == HFmode || mode == BFmode)
[ ... ]



@@ -1959,23 +1958,15 @@
 (set_attr "type" "fmove,move,load,store,mtc,mfc")
 (set_attr "mode" "")])
  
-(define_insn "*movhf_softfloat_boxing"
-  [(set (match_operand:HF 0 "register_operand""=f")
-(unspec:HF [(match_operand:X 1 "register_operand" " r")] UNSPEC_FMV_SFP16_X))]
+(define_insn "*mov_softfloat_boxing"
+  [(set (match_operand:HFBF 0 "register_operand" "=f")
+(unspec:HFBF [(match_operand:X 1 "register_operand" " r")]
+UNSPEC_FMV_FP16_X))]
"!TARGET_ZFHMIN"

I think the linter complained about having 8 spaces instead of a tab in
one of the lines above.


With those fixes, this is fine for the trunk.

jeff


Re: [PATCH v1] libstdc++: Optimize removal from unique assoc containers [PR112934]

2024-05-17 Thread Barnabás Pőcze
Hi


On Wednesday, 13 March 2024 at 12:43, Jonathan Wakely wrote:

> On Mon, 11 Mar 2024 at 23:36, Barnabás Pőcze  wrote:
> >
> > Previously, calling erase(key) on both std::map and std::set
> > would execute the same code that std::multi{map,set} would.
> > However, doing that is unnecessary because std::{map,set}
> > guarantee that all elements are unique.
> >
> > It is reasonable to expect that erase(key) is equivalent
> > or better than:
> >
> >   auto it = m.find(key);
> >   if (it != m.end())
> > m.erase(it);
> >
> > However, this was not the case. Fix that by adding a new
> > function _Rb_tree<>::_M_erase_unique() that is essentially
> > equivalent to the above snippet, and use this from both
> > std::map and std::set.
> 
> Hi, this change looks reasonable, thanks for the patch. Please note
> that GCC is currently in "stage 3" of its dev process so this change
> would have to wait until after GCC 14 branches from trunk, due in a
> few weeks.

As far as I can see GCC 14 has been released, so I pulled and ran the test suite
again:

  make check-target-libstdc++-v3 RUNTESTFLAGS="conformance.exp=23_containers/*"

and it reported no failures. Is there anything else I should do?
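
For readers following along, the behaviour the patch aims for can be sketched outside the library as a free function. This is only an illustration of the find-then-erase idea from the commit message; `erase_unique` is a hypothetical name and is not the `_Rb_tree<>::_M_erase_unique()` member itself.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>

// Erase at most one element with the given key from a container whose
// keys are unique; returns the number of elements removed (0 or 1).
template <typename Container>
std::size_t erase_unique(Container& c, const typename Container::key_type& key)
{
  auto it = c.find(key);
  if (it == c.end())
    return 0;
  c.erase(it);
  return 1;
}

// Hypothetical self-check: the result matches what erase(key) reports.
void check_erase_unique()
{
  std::map<int, int> a{{1, 10}, {2, 20}};
  std::map<int, int> b = a;
  assert(erase_unique(a, 1) == b.erase(1));
  assert(erase_unique(a, 7) == b.erase(7));
  assert(a == b);
}
```

Because the keys are unique, a single lookup is enough and no range of equal elements has to be computed.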


Regards,
Barnabás Pőcze

> 
> I assume you ran the testsuite with no regressions. Do you have
> benchmarks to show this making a difference?
> 
> 
> >
> > libstdc++-v3/ChangeLog:
> >
> > PR libstdc++/112934
> > * include/bits/stl_tree.h (_Rb_tree<>::_M_erase_unique): Add.
> > * include/bits/stl_map.h (map<>::erase): Use _M_erase_unique.
> > * include/bits/stl_set.h (set<>::erase): Likewise.
> > ---
> >  libstdc++-v3/include/bits/stl_map.h  |  2 +-
> >  libstdc++-v3/include/bits/stl_set.h  |  2 +-
> >  libstdc++-v3/include/bits/stl_tree.h | 17 +
> >  3 files changed, 19 insertions(+), 2 deletions(-)
> >
> > diff --git a/libstdc++-v3/include/bits/stl_map.h 
> > b/libstdc++-v3/include/bits/stl_map.h
> > index ad58a631af5..229643b77fd 100644
> > --- a/libstdc++-v3/include/bits/stl_map.h
> > +++ b/libstdc++-v3/include/bits/stl_map.h
> > @@ -1115,7 +1115,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> > */
> >size_type
> >erase(const key_type& __x)
> > -  { return _M_t.erase(__x); }
> > +  { return _M_t._M_erase_unique(__x); }
> >
> >  #if __cplusplus >= 201103L
> >// _GLIBCXX_RESOLVE_LIB_DEFECTS
> > diff --git a/libstdc++-v3/include/bits/stl_set.h 
> > b/libstdc++-v3/include/bits/stl_set.h
> > index c0eb4dbf65f..51a1717ec62 100644
> > --- a/libstdc++-v3/include/bits/stl_set.h
> > +++ b/libstdc++-v3/include/bits/stl_set.h
> > @@ -684,7 +684,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> > */
> >size_type
> >erase(const key_type& __x)
> > -  { return _M_t.erase(__x); }
> > +  { return _M_t._M_erase_unique(__x); }
> >
> >  #if __cplusplus >= 201103L
> >// _GLIBCXX_RESOLVE_LIB_DEFECTS
> > diff --git a/libstdc++-v3/include/bits/stl_tree.h 
> > b/libstdc++-v3/include/bits/stl_tree.h
> > index 6f470f04f6a..9e80d449c7e 100644
> > --- a/libstdc++-v3/include/bits/stl_tree.h
> > +++ b/libstdc++-v3/include/bits/stl_tree.h
> > @@ -1225,6 +1225,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >size_type
> >erase(const key_type& __x);
> >
> > +  size_type
> > +  _M_erase_unique(const key_type& __x);
> > +
> >  #if __cplusplus >= 201103L
> >// _GLIBCXX_RESOLVE_LIB_DEFECTS
> >// DR 130. Associative erase should return an iterator.
> > @@ -2518,6 +2521,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >return __old_size - size();
> >  }
> >
> > +  template<typename _Key, typename _Val, typename _KeyOfValue,
> > +           typename _Compare, typename _Alloc>
> > +typename _Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::size_type
> > +_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::
> > +_M_erase_unique(const _Key& __x)
> > +{
> > +  iterator __it = find(__x);
> > +  if (__it == end())
> > +   return 0;
> > +
> > +  _M_erase_aux(__it);
> > +  return 1;
> > +}
> > +
> >    template<typename _Key, typename _Val, typename _KeyOfValue,
> >             typename _Compare, typename _Alloc>
> >  typename _Rb_tree<_Key, _Val, _KeyOfValue,
> > --
> > 2.44.0
> >
> >
>


Re: [PATCH] RISC-V: Modify _Bfloat16 to __bf16

2024-05-17 Thread Jeff Law




On 5/17/24 2:19 AM, Kito Cheng wrote:

LGTM, thanks for fixing this :)
And just to be clear for Xiao, you can go ahead and commit this patch to 
the trunk.  An ACK from Kito, Juzhe, Palmer, Robin or myself is all you 
need for a change that is isolated to RISC-V code.


jeff



Re: [PATCH] RISC-V: Remove dead perm series code and document.

2024-05-17 Thread Jeff Law




On 5/17/24 9:27 AM, Robin Dapp wrote:

Hi,

with the introduction of shuffle_series_patterns the explicit handler
code for a perm series is dead.  This patch removes it and also adds
a function-level comment to shuffle_series_patterns.

Regtested on rv64gcv_zvfh_zvbb.

Regards
  Robin

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Document.
(shuffle_extract_and_slide1up_patterns): Remove.

OK.

Jeff



Re: [PATCH] RISC-V: Split vwadd.wx and vwsub.wx and add helpers.

2024-05-17 Thread Palmer Dabbelt

On Fri, 17 May 2024 15:37:43 PDT (-0700), juzhe.zh...@rivai.ai wrote:

I think it should be backported to GCC 14, since it is a bug fix.


Seems reasonable to me -- I guess in theory those extended scalar 
patterns aren't bug fixes and we should split them out, but I don't 
think it's all that big of a deal.  We'd likely just backport them to 
the performance branch anyway, so it's essentially the same on my end.






juzhe.zh...@rivai.ai
 
From: Robin Dapp

Date: 2024-05-17 23:24
To: gcc-patches
CC: palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw; rdapp.gcc
Subject: [PATCH] RISC-V: Split vwadd.wx and vwsub.wx and add helpers.
Hi,
 
vwadd.wx and vwsub.wx have the same problem vfwadd.wf had.  This patch
splits the insn pattern in the same way vfwadd.wf was split.
 
It also adds two patterns to recognize extended scalars.  In practice
those do not provide a lot of improvement over what we already have but
in some instances we can get rid of redundant extensions.  If somebody
considers the patterns excessive, I'd be open to not add them.
 
Regtested on rv64gcv_zvfh_zvbb.
 
Regards

Robin
 
gcc/ChangeLog:
 
* config/riscv/vector.md: Split vwadd.wx/vwsub.wx pattern and

add extended_scalar patterns.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr115068.c: Add vwadd.wx/vwsub.wx

tests.
* gcc.target/riscv/rvv/base/pr115068-run.c: Include pr115068.c.
* gcc.target/riscv/rvv/base/vwaddsub-1.c: New test.
---
gcc/config/riscv/vector.md| 62 ---
.../gcc.target/riscv/rvv/base/pr115068-run.c  | 24 +--
.../gcc.target/riscv/rvv/base/pr115068.c  | 26 
.../gcc.target/riscv/rvv/base/vwaddsub-1.c| 47 ++
4 files changed, 127 insertions(+), 32 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vwaddsub-1.c
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md

index 107914afa3a..248461302dd 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -3900,27 +3900,71 @@ (define_insn 
"@pred_single_widen_add"
(set_attr "mode" "")])
(define_insn "@pred_single_widen__scalar"
-  [(set (match_operand:VWEXTI 0 "register_operand"   "=vr,   
vr")
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, vr, 
vr")
(if_then_else:VWEXTI
  (unspec:
- [(match_operand: 1 "vector_mask_operand"   "vmWc1,vmWc1")
-  (match_operand 5 "vector_length_operand"  "   rK,   rK")
-  (match_operand 6 "const_int_operand"  "i,i")
-  (match_operand 7 "const_int_operand"  "i,i")
-  (match_operand 8 "const_int_operand"  "i,i")
+ [(match_operand: 1 "vector_mask_operand"" vm,vm,Wc1,Wc1")
+  (match_operand 5 "vector_length_operand"  " rK,rK, rK, rK")
+  (match_operand 6 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 7 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 8 "const_int_operand"  "  i, i,  i,  i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (plus_minus:VWEXTI
- (match_operand:VWEXTI 3 "register_operand" "   vr,   vr")
+ (match_operand:VWEXTI 3 "register_operand" " vr,vr, vr, vr")
(any_extend:VWEXTI
  (vec_duplicate:
- (match_operand: 4 "reg_or_0_operand"   "   rJ,   rJ"
-   (match_operand:VWEXTI 2 "vector_merge_operand"   "   vu,0")))]
+ (match_operand: 4 "reg_or_0_operand"   " rJ,rJ, rJ, rJ"
+   (match_operand:VWEXTI 2 "vector_merge_operand"   " vu, 0, vu,  
0")))]
   "TARGET_VECTOR"
   "vw.wx\t%0,%3,%z4%p1"
   [(set_attr "type" "vi")
(set_attr "mode" "")])
+(define_insn "@pred_single_widen_add_extended_scalar"
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, vr, 
vr")
+ (if_then_else:VWEXTI
+   (unspec:
+ [(match_operand: 1 "vector_mask_operand"" vm,vm,Wc1,Wc1")
+  (match_operand 5 "vector_length_operand"  " rK,rK, rK, rK")
+  (match_operand 6 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 7 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 8 "const_int_operand"  "  i, i,  i,  i")
+  (reg:SI VL_REGNUM)
+  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+   (plus:VWEXTI
+ (vec_duplicate:VWEXTI
+   (any_extend:
+ (match_operand: 4 "reg_or_0_operand"   " rJ,rJ, rJ, rJ")))
+ (match_operand:VWEXTI 3 "register_operand" " vr,vr, vr, vr"))
+   (match_operand:VWEXTI 2 "vector_merge_operand"   " vu, 0, vu,  
0")))]
+  "TARGET_VECTOR"
+  "vwadd.wx\t%0,%3,%z4%p1"
+  [(set_attr "type" "viwalu")
+   (set_attr "mode" "")])
+
+(define_insn "@pred_single_widen_sub_extended_scalar"
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, vr, 
vr")
+ (if_then_else:VWEXTI
+   (unspec:
+ 

Re: [PATCH] RISC-V: Split vwadd.wx and vwsub.wx and add helpers.

2024-05-17 Thread 钟居哲
I think it should be backported to GCC 14, since it is a bug fix.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-05-17 23:24
To: gcc-patches
CC: palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw; rdapp.gcc
Subject: [PATCH] RISC-V: Split vwadd.wx and vwsub.wx and add helpers.
Hi,
 
vwadd.wx and vwsub.wx have the same problem vfwadd.wf had.  This patch
splits the insn pattern in the same way vfwadd.wf was split.
 
It also adds two patterns to recognize extended scalars.  In practice
those do not provide a lot of improvement over what we already have but
in some instances we can get rid of redundant extensions.  If somebody
considers the patterns excessive, I'd be open to not add them.
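
As a hedged reference model of what the extended-scalar patterns recognize, the C++ sketch below adds an extended 16-bit scalar to every 32-bit lane. Everything here is an assumption made for illustration: the lane widths, the helper names, and the omission of masking, vector length and tail policy, which the real patterns do carry as operands.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using WideVec = std::vector<std::uint32_t>;

// Signed flavour: sign-extend the narrow scalar, then add it to each lane.
WideVec widen_add_wx_signed(const WideVec& wide, std::int16_t scalar)
{
  const std::uint32_t ext =
    static_cast<std::uint32_t>(static_cast<std::int32_t>(scalar));
  WideVec out(wide.size());
  for (std::size_t i = 0; i < wide.size(); ++i)
    out[i] = wide[i] + ext;  // unsigned arithmetic wraps like the hardware add
  return out;
}

// Unsigned flavour: zero-extend the narrow scalar instead.
WideVec widen_add_wx_unsigned(const WideVec& wide, std::uint16_t scalar)
{
  const std::uint32_t ext = scalar;
  WideVec out(wide.size());
  for (std::size_t i = 0; i < wide.size(); ++i)
    out[i] = wide[i] + ext;
  return out;
}

// Hypothetical self-check: the same 16-bit pattern gives different results
// depending on which extension is applied.
void check_extension_matters()
{
  const WideVec v{10u};
  assert(widen_add_wx_signed(v, -1)[0] == 9u);
  assert(widen_add_wx_unsigned(v, 0xFFFF)[0] == 65545u);
}
```

When the scalar has already been extended by a separate instruction, folding that extension into the widening add is where a redundant extension can disappear.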
 
Regtested on rv64gcv_zvfh_zvbb.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/vector.md: Split vwadd.wx/vwsub.wx pattern and
add extended_scalar patterns.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr115068.c: Add vwadd.wx/vwsub.wx
tests.
* gcc.target/riscv/rvv/base/pr115068-run.c: Include pr115068.c.
* gcc.target/riscv/rvv/base/vwaddsub-1.c: New test.
---
gcc/config/riscv/vector.md| 62 ---
.../gcc.target/riscv/rvv/base/pr115068-run.c  | 24 +--
.../gcc.target/riscv/rvv/base/pr115068.c  | 26 
.../gcc.target/riscv/rvv/base/vwaddsub-1.c| 47 ++
4 files changed, 127 insertions(+), 32 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vwaddsub-1.c
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 107914afa3a..248461302dd 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -3900,27 +3900,71 @@ (define_insn 
"@pred_single_widen_add"
(set_attr "mode" "")])
(define_insn "@pred_single_widen__scalar"
-  [(set (match_operand:VWEXTI 0 "register_operand"   "=vr,   
vr")
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, 
vr, vr")
(if_then_else:VWEXTI
  (unspec:
- [(match_operand: 1 "vector_mask_operand"   "vmWc1,vmWc1")
-  (match_operand 5 "vector_length_operand"  "   rK,   rK")
-  (match_operand 6 "const_int_operand"  "i,i")
-  (match_operand 7 "const_int_operand"  "i,i")
-  (match_operand 8 "const_int_operand"  "i,i")
+ [(match_operand: 1 "vector_mask_operand"" vm,vm,Wc1,Wc1")
+  (match_operand 5 "vector_length_operand"  " rK,rK, rK, rK")
+  (match_operand 6 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 7 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 8 "const_int_operand"  "  i, i,  i,  i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (plus_minus:VWEXTI
- (match_operand:VWEXTI 3 "register_operand" "   vr,   vr")
+ (match_operand:VWEXTI 3 "register_operand" " vr,vr, vr, vr")
(any_extend:VWEXTI
  (vec_duplicate:
- (match_operand: 4 "reg_or_0_operand"   "   rJ,   rJ"
-   (match_operand:VWEXTI 2 "vector_merge_operand"   "   vu,0")))]
+ (match_operand: 4 "reg_or_0_operand"   " rJ,rJ, rJ, rJ"
+   (match_operand:VWEXTI 2 "vector_merge_operand"   " vu, 0, vu,  0")))]
   "TARGET_VECTOR"
   "vw.wx\t%0,%3,%z4%p1"
   [(set_attr "type" "vi")
(set_attr "mode" "")])
+(define_insn "@pred_single_widen_add_extended_scalar"
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, vr, vr")
+ (if_then_else:VWEXTI
+   (unspec:
+ [(match_operand: 1 "vector_mask_operand"" vm,vm,Wc1,Wc1")
+  (match_operand 5 "vector_length_operand"  " rK,rK, rK, rK")
+  (match_operand 6 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 7 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 8 "const_int_operand"  "  i, i,  i,  i")
+  (reg:SI VL_REGNUM)
+  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+   (plus:VWEXTI
+ (vec_duplicate:VWEXTI
+   (any_extend:
+ (match_operand: 4 "reg_or_0_operand"   " rJ,rJ, rJ, rJ")))
+ (match_operand:VWEXTI 3 "register_operand" " vr,vr, vr, vr"))
+   (match_operand:VWEXTI 2 "vector_merge_operand"   " vu, 0, vu,  0")))]
+  "TARGET_VECTOR"
+  "vwadd.wx\t%0,%3,%z4%p1"
+  [(set_attr "type" "viwalu")
+   (set_attr "mode" "")])
+
+(define_insn "@pred_single_widen_sub_extended_scalar"
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, vr, vr")
+ (if_then_else:VWEXTI
+   (unspec:
+ [(match_operand: 1 "vector_mask_operand"" vm,vm,Wc1,Wc1")
+  (match_operand 5 "vector_length_operand"  " rK,rK, rK, rK")
+  (match_operand 6 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 7 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 8 "const_int_operand"  "  i, i,  

Re: [PATCH] RISC-V: Add vector popcount, clz, ctz.

2024-05-17 Thread 钟居哲
LGTM



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-05-17 23:26
To: gcc-patches
CC: rdapp.gcc; palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw
Subject: [PATCH] RISC-V: Add vector popcount, clz, ctz.
Hi,
 
this patch adds the zvbb vcpop, vclz and vctz to the autovec machinery
as well as tests for them.  It also changes several non-VLS iterators
to V_VLS iterators for consistency.
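
As an illustration only (this is not code from the patch), loops of the following shape are the kind of input the new expanders are meant to serve: with zvbb enabled, the vectorizer may be able to use vcpop, vclz and vctz for them. The function names are hypothetical, a 32-bit unsigned int is assumed, and whether a given loop is actually vectorized depends on the options and the cost model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernels, not taken from the patch's testsuite.
   A 32-bit unsigned int is assumed throughout.  */

void
popcount_u32 (uint32_t *restrict dst, const uint32_t *restrict src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (uint32_t) __builtin_popcount (src[i]);
}

/* __builtin_clz and __builtin_ctz are undefined for a zero argument,
   so callers of the next two functions must pass nonzero elements.  */

void
clz_u32 (uint32_t *restrict dst, const uint32_t *restrict src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (uint32_t) __builtin_clz (src[i]);
}

void
ctz_u32 (uint32_t *restrict dst, const uint32_t *restrict src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (uint32_t) __builtin_ctz (src[i]);
}

/* Scalar self-check of the expected per-element results.  */
void
check_bitcount (void)
{
  const uint32_t in[4] = { 1u, 0x80000000u, 0xF0u, 0xFFFFFFFFu };
  uint32_t out[4];

  popcount_u32 (out, in, 4);
  assert (out[0] == 1 && out[1] == 1 && out[2] == 4 && out[3] == 32);
  clz_u32 (out, in, 4);
  assert (out[0] == 31 && out[1] == 0 && out[2] == 24 && out[3] == 0);
  ctz_u32 (out, in, 4);
  assert (out[0] == 0 && out[1] == 31 && out[2] == 4 && out[3] == 0);
}
```

The zero-argument caveat applies to the C builtins used in this sketch; it says nothing about how the vector instructions themselves treat zero elements.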
 
Regtested on rv64gcv_zvfh_zvbb.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (ctz2): New expander.
(clz2): Ditto.
* config/riscv/generic-vector-ooo.md: Add bitmanip ops to insn
reservation.
* config/riscv/vector-crypto.md: Add VLS modes to insns.
* config/riscv/vector.md: Add bitmanip ops to mode_idx and other
attributes.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/popcount-1.c: Adjust check
for zvbb.
* gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/popcount-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/popcount-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/popcount-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/clz-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/clz-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/clz-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/ctz-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/ctz-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/ctz-template.h: New test.
---
gcc/config/riscv/autovec.md   | 30 +-
gcc/config/riscv/generic-vector-ooo.md|  2 +-
gcc/config/riscv/vector-crypto.md | 93 ++-
gcc/config/riscv/vector.md| 14 +--
.../gcc.target/riscv/rvv/autovec/unop/clz-1.c |  8 ++
.../riscv/rvv/autovec/unop/clz-run.c  | 36 +++
.../riscv/rvv/autovec/unop/clz-template.h | 21 +
.../gcc.target/riscv/rvv/autovec/unop/ctz-1.c |  8 ++
.../riscv/rvv/autovec/unop/ctz-run.c  | 36 +++
.../riscv/rvv/autovec/unop/ctz-template.h | 21 +
.../riscv/rvv/autovec/unop/popcount-1.c   |  4 +-
.../riscv/rvv/autovec/unop/popcount-2.c   |  4 +-
.../riscv/rvv/autovec/unop/popcount-3.c   |  8 ++
.../riscv/rvv/autovec/unop/popcount-run-1.c   |  3 +-
.../rvv/autovec/unop/popcount-template.h  | 21 +
15 files changed, 250 insertions(+), 59 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/clz-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/clz-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/clz-template.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/ctz-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/ctz-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/ctz-template.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-template.h
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index aa1ae0fe075..a9391ed146c 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1566,7 +1566,7 @@ (define_expand "xorsign3"
})
;; 
---
-;; - [INT] POPCOUNT.
+;; - [INT] POPCOUNT, CTZ and CLZ.
;; 
---
(define_expand "popcount2"
@@ -1574,10 +1574,36 @@ (define_expand "popcount2"
(match_operand:V_VLSI 1 "register_operand")]
   "TARGET_VECTOR"
{
-  riscv_vector::expand_popcount (operands);
+  if (!TARGET_ZVBB)
+riscv_vector::expand_popcount (operands);
+  else
+{
+  riscv_vector::emit_vlmax_insn (code_for_pred_v (POPCOUNT, mode),
+  riscv_vector::CPOP_OP, operands);
+}
   DONE;
})
+(define_expand "ctz2"
+  [(match_operand:V_VLSI 0 "register_operand")
+   (match_operand:V_VLSI 1 "register_operand")]
+  "TARGET_ZVBB"
+  {
+riscv_vector::emit_vlmax_insn (code_for_pred_v (CTZ, mode),
+riscv_vector::CPOP_OP, operands);
+DONE;
+})
+
+(define_expand "clz2"
+  [(match_operand:V_VLSI 0 "register_operand")
+   (match_operand:V_VLSI 1 "register_operand")]
+  "TARGET_ZVBB"
+  {
+riscv_vector::emit_vlmax_insn (code_for_pred_v (CLZ, mode),
+riscv_vector::CPOP_OP, operands);
+DONE;
+})
+
;; -
;;  [INT] Highpart multiplication
diff --git a/gcc/config/riscv/generic-vector-ooo.md b/gcc/config/riscv/generic-vector-ooo.md
index 96cb1a0be29..5e933c83841 100644
--- a/gcc/config/riscv/generic-vector-ooo.md
+++ b/gcc/config/riscv/generic-vector-ooo.md
@@ -74,7 +74,7 @@ (define_insn_reservation "vec_fmul" 6
;; Vector crypto, assumed to be a generic operation for now.
(define_insn_reservation "vec_crypto" 4
-  (eq_attr "type" "crypto")
+  (eq_attr "type" "crypto,vclz,vctz,vcpop")
   "vxu_ooo_issue,vxu_ooo_alu")
;; Vector crypto, AES
diff 

Re: [PATCH] RISC-V: Add vandn combine helper.

2024-05-17 Thread 钟居哲
LGTM



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-05-17 23:26
To: gcc-patches
CC: rdapp.gcc; palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw
Subject: [PATCH] RISC-V: Add vandn combine helper.
Hi,
 
this patch adds a combine pattern for vandn as well as tests for it.
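
For illustration, the source shape that leads to a vnot followed by a vand, which the new pattern is meant to combine into a single vandn, is an and-with-complement loop. This is a minimal sketch with a hypothetical function name, not the contents of vandn-template.h; the in-place call mirrors the `vandn_##TYPE (a, a, b, SZ)` call in the run test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: dst[i] = a[i] & ~b[i].
   No restrict qualifiers, so dst may alias a as in the run test.  */
void
andn_u32 (uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] & ~b[i];
}

/* Scalar self-check; 123 is 0x7B.  */
void
check_andn (void)
{
  uint32_t a[4] = { 123u, 123u, 123u, 123u };
  const uint32_t b[4] = { 0x0Fu, 0u, 0xFFFFFFFFu, 0x70u };

  andn_u32 (a, a, b, 4);
  assert (a[0] == 0x70u);	/* low nibble cleared */
  assert (a[1] == 123u);	/* nothing cleared */
  assert (a[2] == 0u);		/* everything cleared */
  assert (a[3] == 0x0Bu);	/* bits 4..6 cleared */
}
```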
 
Regtested on rv64gcv_zvfh_zvbb.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/autovec-opt.md (*vandn_): New pattern.
* config/riscv/vector.md: Add vandn to mode_idx.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/binop/vandn-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vandn-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vandn-template.h: New test.
---
gcc/config/riscv/autovec-opt.md   | 18 +++
gcc/config/riscv/vector.md|  2 +-
.../riscv/rvv/autovec/binop/vandn-1.c |  8 +++
.../riscv/rvv/autovec/binop/vandn-run.c   | 54 +++
.../riscv/rvv/autovec/binop/vandn-template.h  | 38 +
5 files changed, 119 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-template.h
 
diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 06438f9e2f7..07372d965b0 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1559,3 +1559,21 @@ (define_insn_and_split "*vwsll_zext1_trunc_scalar_"
 DONE;
   }
   [(set_attr "type" "vwsll")])
+
+;; vnot + vand = vandn.
+(define_insn_and_split "*vandn_"
+ [(set (match_operand:V_VLSI 0 "register_operand" "=vr")
+   (and:V_VLSI
+(not:V_VLSI
+  (match_operand:V_VLSI  2 "register_operand"  "vr"))
+(match_operand:V_VLSI1 "register_operand"  "vr")))]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vandn (mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vandn")])
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index c6a3845dc13..dafcd7d9bf9 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -743,7 +743,7 @@ (define_attr "mode_idx" ""
vfcmp,vfminmax,vfsgnj,vfclass,vfmerge,vfmov,\
vfcvtitof,vfncvtitof,vfncvtftoi,vfncvtftof,vmalu,vmiota,vmidx,\
vimovxv,vfmovfv,vslideup,vslidedown,vislide1up,vislide1down,vfslide1up,vfslide1down,\
- vgather,vcompress,vmov,vnclip,vnshift")
+ vgather,vcompress,vmov,vnclip,vnshift,vandn")
   (const_int 0)
   (eq_attr "type" "vimovvx,vfmovvf")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-1.c
new file mode 100644
index 000..3bb5bf8dd5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-1.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-add-options "riscv_v" } */
+/* { dg-add-options "riscv_zvbb" } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model" } */
+
+#include "vandn-template.h"
+
+/* { dg-final { scan-assembler-times {\tvandn\.vv} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-run.c
new file mode 100644
index 000..243c5975068
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-run.c
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-require-effective-target "riscv_zvbb_ok" } */
+/* { dg-add-options "riscv_v" } */
+/* { dg-add-options "riscv_zvbb" } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model" } */
+
+#include "vandn-template.h"
+
+#include <assert.h>
+
+#define SZ 512
+
+#define RUN(TYPE, VAL)                          \
+  TYPE a##TYPE[SZ];                             \
+  TYPE b##TYPE[SZ];                             \
+  for (int i = 0; i < SZ; i++)                  \
+{                                               \
+  a##TYPE[i] = 123;                             \
+  b##TYPE[i] = VAL;                             \
+}                                               \
+  vandn_##TYPE (a##TYPE, a##TYPE, b##TYPE, SZ); \
+  for (int i = 0; i < SZ; i++)                  \
+assert (a##TYPE[i] == (TYPE) (123 & ~VAL));
+
+#define RUN2(TYPE, VAL)                         \
+  TYPE as##TYPE[SZ];                            \
+  for (int i = 0; i < SZ; i++)                  \
+as##TYPE[i] = 123; 

Re: [PATCH] RISC-V: Use widening shift for scatter/gather if applicable.

2024-05-17 Thread 钟居哲
LGTM



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-05-17 23:25
To: gcc-patches
CC: rdapp.gcc; palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw
Subject: [PATCH] RISC-V: Use widening shift for scatter/gather if applicable.
Hi,
 
with the zvbb extension we can emit a widening shift for scatter/gather
index preparation in case we need to multiply by 2 and zero extend.
 
The patch also adds vwsll to the mode_idx attribute and removes the
mode from shift-count operand of the insn pattern.
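
As a sketch of the case described above (hypothetical code, not the new gather_load_64-12-zvbb.c test): on rv64 a gather of 2-byte elements through unsigned 32-bit indices needs each index zero-extended to 64 bits and multiplied by 2 to form the byte offset. That is the `zero_extend_p && shift == 1` situation in which the patch uses vwsll instead of a separate extend and shift. The assumptions are rv64, so that a 32-bit index is narrower than XLEN, and a loop that the vectorizer turns into a gather load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical gather kernel.  The byte offset of each load is
   (uint64_t) idx[i] * sizeof (uint16_t), i.e. a zero extension
   followed by a left shift by 1.  */
void
gather_u16 (uint16_t *restrict dst, const uint16_t *restrict src,
	    const uint32_t *restrict idx, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[idx[i]];
}

/* Scalar self-check of the gather semantics.  */
void
check_gather (void)
{
  const uint16_t src[4] = { 10, 20, 30, 40 };
  const uint32_t idx[4] = { 3, 0, 2, 1 };
  uint16_t dst[4];

  gather_u16 (dst, src, idx, 4);
  assert (dst[0] == 40 && dst[1] == 10 && dst[2] == 30 && dst[3] == 20);
}
```

With a signed index type or a different element size the condition in the patch does not hold, and the existing extend-then-shift sequence is kept.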
 
Regtested on rv64gcv_zvfh_zvbb.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (expand_gather_scatter): Use vwsll if
applicable.
* config/riscv/vector-crypto.md: Remove mode from vwsll shift
count operator.
* config/riscv/vector.md: Add vwsll to mode iterator.
 
gcc/testsuite/ChangeLog:
 
* lib/target-supports.exp: Add zvbb.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c: New 
test.
---
gcc/config/riscv/riscv-v.cc   |  42 +--
gcc/config/riscv/vector-crypto.md |   4 +-
gcc/config/riscv/vector.md|   4 +-
.../gather-scatter/gather_load_64-12-zvbb.c   | 113 ++
gcc/testsuite/lib/target-supports.exp |  48 +++-
5 files changed, 193 insertions(+), 18 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 814c5febabe..8b41b9c7774 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4016,7 +4016,7 @@ expand_gather_scatter (rtx *ops, bool is_load)
{
   rtx ptr, vec_offset, vec_reg;
   bool zero_extend_p;
-  int scale_log2;
+  int shift;
   rtx mask = ops[5];
   rtx len = ops[6];
   if (is_load)
@@ -4025,7 +4025,7 @@ expand_gather_scatter (rtx *ops, bool is_load)
   ptr = ops[1];
   vec_offset = ops[2];
   zero_extend_p = INTVAL (ops[3]);
-  scale_log2 = exact_log2 (INTVAL (ops[4]));
+  shift = exact_log2 (INTVAL (ops[4]));
 }
   else
 {
@@ -4033,7 +4033,7 @@ expand_gather_scatter (rtx *ops, bool is_load)
   ptr = ops[0];
   vec_offset = ops[1];
   zero_extend_p = INTVAL (ops[2]);
-  scale_log2 = exact_log2 (INTVAL (ops[3]));
+  shift = exact_log2 (INTVAL (ops[3]));
 }
   machine_mode vec_mode = GET_MODE (vec_reg);
@@ -4043,9 +4043,12 @@ expand_gather_scatter (rtx *ops, bool is_load)
   poly_int64 nunits = GET_MODE_NUNITS (vec_mode);
   bool is_vlmax = is_vlmax_len_p (vec_mode, len);
+  bool use_widening_shift = false;
+
   /* Extend the offset element to address width.  */
   if (inner_offsize < BITS_PER_WORD)
 {
+  use_widening_shift = TARGET_ZVBB && zero_extend_p && shift == 1;
   /* 7.2. Vector Load/Store Addressing Modes.
If the vector offset elements are narrower than XLEN, they are
zero-extended to XLEN before adding to the ptr effective address. If
@@ -4054,8 +4057,8 @@ expand_gather_scatter (rtx *ops, bool is_load)
raise an illegal instruction exception if the EEW is not supported for
offset elements.
- RVV spec only refers to the scale_log == 0 case.  */
-  if (!zero_extend_p || scale_log2 != 0)
+ RVV spec only refers to the shift == 0 case.  */
+  if (!zero_extend_p || shift)
{
  if (zero_extend_p)
inner_idx_mode
@@ -4064,19 +4067,32 @@ expand_gather_scatter (rtx *ops, bool is_load)
inner_idx_mode = int_mode_for_size (BITS_PER_WORD, 0).require ();
  machine_mode new_idx_mode
= get_vector_mode (inner_idx_mode, nunits).require ();
-   rtx tmp = gen_reg_rtx (new_idx_mode);
-   emit_insn (gen_extend_insn (tmp, vec_offset, new_idx_mode, idx_mode,
-   zero_extend_p ? true : false));
-   vec_offset = tmp;
+   if (!use_widening_shift)
+ {
+   rtx tmp = gen_reg_rtx (new_idx_mode);
+   emit_insn (gen_extend_insn (tmp, vec_offset, new_idx_mode, idx_mode,
+   zero_extend_p ? true : false));
+   vec_offset = tmp;
+ }
  idx_mode = new_idx_mode;
}
 }
-  if (scale_log2 != 0)
+  if (shift)
 {
-  rtx tmp = expand_binop (idx_mode, ashl_optab, vec_offset,
-   gen_int_mode (scale_log2, Pmode), NULL_RTX, 0,
-   OPTAB_DIRECT);
+  rtx tmp;
+  if (!use_widening_shift)
+ tmp = expand_binop (idx_mode, ashl_optab, vec_offset,
+ gen_int_mode (shift, Pmode), NULL_RTX, 0,
+ OPTAB_DIRECT);
+  else
+ {
+   tmp = gen_reg_rtx (idx_mode);
+   insn_code icode = code_for_pred_vwsll_scalar (idx_mode);
+   rtx ops[] = {tmp, vec_offset, const1_rtx};
+   emit_vlmax_insn (icode, BINARY_OP, ops);
+ }
+
   vec_offset = tmp;
 }
diff --git a/gcc/config/riscv/vector-crypto.md b/gcc/config/riscv/vector-crypto.md
index 24822e2712c..0ddc2f3f3c6 100755
--- a/gcc/config/riscv/vector-crypto.md
+++ b/gcc/config/riscv/vector-crypto.md
@@ -295,7 +295,7 @@ (define_insn "@pred_vwsll"
(ashift:VWEXTI
  (zero_extend:VWEXTI
(match_operand: 3 "register_operand" "vr"))
- (match_operand: 4 "register_operand"  

Re: [PATCH] RISC-V: Add vwsll combine helpers.

2024-05-17 Thread 钟居哲
LGTM.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-05-17 23:25
To: gcc-patches
CC: rdapp.gcc; palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw
Subject: [PATCH] RISC-V: Add vwsll combine helpers.
Hi,
 
this patch enables the usage of vwsll in autovec context by adding the
necessary combine patterns and tests.
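
To make the targeted idiom concrete, here is a minimal sketch of a widening shift loop (hypothetical function name, not the contents of vwsll-template.h). Because of C's integer promotions both 8-bit operands are first widened to int and the result is truncated back to 16 bits, which is the extend/shift/truncate sequence the `*vwsll_zext*_trunc_*` patterns in the diff are meant to fold into one vwsll.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: widen an 8-bit value to 16 bits and shift it left.
   Callers should keep b[i] below 16 so the promoted int shift cannot
   overflow and the result is meaningful in 16 bits.  */
void
wsll_u8_u16 (uint16_t *restrict dst, const uint8_t *restrict a,
	     const uint8_t *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (uint16_t) (a[i] << b[i]);
}

/* Scalar self-check of the expected widened results.  */
void
check_wsll (void)
{
  const uint8_t a[4] = { 3, 0xFF, 0x80, 1 };
  const uint8_t b[4] = { 4, 8, 1, 15 };
  uint16_t dst[4];

  wsll_u8_u16 (dst, a, b, 4);
  assert (dst[0] == 48);
  assert (dst[1] == 0xFF00);
  assert (dst[2] == 0x0100);
  assert (dst[3] == 0x8000);
}
```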
 
Regtested on rv64gcv_zvfh_zvbb.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/autovec-opt.md (*vwsll_zext1_): New
pattern.
(*vwsll_zext2_): Ditto.
(*vwsll_zext1_scalar_): Ditto.
(*vwsll_zext1_trunc_): Ditto.
(*vwsll_zext2_trunc_): Ditto.
(*vwsll_zext1_trunc_scalar_): Ditto.
* config/riscv/vector-crypto.md: Make pattern similar to other
narrowing/widening patterns.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/binop/vwsll-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vwsll-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vwsll-template.h: New test.
---
gcc/config/riscv/autovec-opt.md   | 123 ++
gcc/config/riscv/vector-crypto.md |   2 +-
.../riscv/rvv/autovec/binop/vwsll-1.c |  10 ++
.../riscv/rvv/autovec/binop/vwsll-run.c   |  67 ++
.../riscv/rvv/autovec/binop/vwsll-template.h  |  49 +++
5 files changed, 250 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vwsll-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vwsll-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vwsll-template.h
 
diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 645dc53d868..06438f9e2f7 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1436,3 +1436,126 @@ (define_insn_and_split "*n"
 DONE;
   }
   [(set_attr "type" "vmalu")])
+
+;; vzext.vf2 + vsll = vwsll.
+(define_insn_and_split "*vwsll_zext1_"
+  [(set (match_operand:VWEXTI 0 "register_operand""=vr ")
+  (ashift:VWEXTI
+ (zero_extend:VWEXTI
+   (match_operand: 1 "register_operand"" vr "))
+   (match_operand: 2 "vector_shift_operand" "vrvk")))]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vwsll (mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vwsll")])
+
+(define_insn_and_split "*vwsll_zext2_"
+  [(set (match_operand:VWEXTI 0 "register_operand""=vr ")
+  (ashift:VWEXTI
+ (zero_extend:VWEXTI
+   (match_operand: 1 "register_operand"" vr "))
+ (zero_extend:VWEXTI
+   (match_operand: 2 "vector_shift_operand" "vrvk"]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vwsll (mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vwsll")])
+
+
+(define_insn_and_split "*vwsll_zext1_scalar_"
+  [(set (match_operand:VWEXTI 0 "register_operand"   "=vr")
+  (ashift:VWEXTI
+ (zero_extend:VWEXTI
+   (match_operand: 1 "register_operand"   " vr"))
+   (match_operand:   2 "vector_scalar_shift_operand" " rK")))]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+if (GET_CODE (operands[2]) == SUBREG)
+  operands[2] = SUBREG_REG (operands[2]);
+insn_code icode = code_for_pred_vwsll_scalar (mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vwsll")])
+
+;; For
+;;   uint16_t dst;
+;;   uint8_t a, b;
+;;   dst = vwsll (a, b)
+;; we seem to create
+;;   aa = (int) a;
+;;   bb = (int) b;
+;;   dst = (short) vwsll (aa, bb);
+;; The following patterns help to combine this idiom into one vwsll.
+
+(define_insn_and_split "*vwsll_zext1_trunc_"
+  [(set (match_operand: 0   "register_operand""=vr ")
+(truncate:
+  (ashift:VQEXTI
+ (zero_extend:VQEXTI
+   (match_operand: 1   "register_operand"" vr "))
+ (match_operand:VQEXTI 2   "vector_shift_operand" "vrvk"]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vwsll (mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vwsll")])
+
+(define_insn_and_split "*vwsll_zext2_trunc_"
+  [(set (match_operand: 0   "register_operand""=vr ")
+(truncate:
+  (ashift:VQEXTI
+ (zero_extend:VQEXTI
+   (match_operand: 1   "register_operand"" vr "))
+ (zero_extend:VQEXTI
+   (match_operand: 2   "vector_shift_operand" "vrvk")]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vwsll (mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vwsll")])
+
+(define_insn_and_split "*vwsll_zext1_trunc_scalar_"
+  [(set (match_operand: 0   

Re: [PATCH] RISC-V: Split vwadd.wx and vwsub.wx and add helpers.

2024-05-17 Thread 钟居哲
LGTM.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-05-17 23:24
To: gcc-patches
CC: palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw; rdapp.gcc
Subject: [PATCH] RISC-V: Split vwadd.wx and vwsub.wx and add helpers.
Hi,
 
vwadd.wx and vwsub.wx have the same problem vfwadd.wf had.  This patch
splits the insn pattern in the same way vfwadd.wf was split.
 
It also adds two patterns to recognize extended scalars.  In practice
those do not provide a lot of improvement over what we already have but
in some instances we can get rid of redundant extensions.  If somebody
considers the patterns excessive, I'd be open to not add them.
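
For readers unfamiliar with the .wx forms, the following sketch shows the scalar semantics involved: a wide vector element is combined with a narrower scalar that is extended to the wide type. The function names are hypothetical and the code is not from vwaddsub-1.c. Whether the new extended_scalar patterns actually match such a loop depends on how the vectorizer and combine shape the RTL, so treat this as an assumption rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernels: 32-bit elements plus or minus a sign-extended
   16-bit scalar.  Inputs must be small enough that the signed
   arithmetic does not overflow.  */
void
wadd_wx (int32_t *restrict dst, const int32_t *restrict a, int16_t x,
	 size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] + (int32_t) x;
}

void
wsub_wx (int32_t *restrict dst, const int32_t *restrict a, int16_t x,
	 size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] - (int32_t) x;
}

/* Scalar self-check with a negative scalar to exercise sign extension.  */
void
check_wadd_wsub (void)
{
  const int32_t a[4] = { 1, -2, 100, 0 };
  int32_t dst[4];

  wadd_wx (dst, a, -5, 4);
  assert (dst[0] == -4 && dst[1] == -7 && dst[2] == 95 && dst[3] == -5);
  wsub_wx (dst, a, -5, 4);
  assert (dst[0] == 6 && dst[1] == 3 && dst[2] == 105 && dst[3] == 5);
}
```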
 
Regtested on rv64gcv_zvfh_zvbb.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/vector.md: Split vwadd.wx/vwsub.wx pattern and
add extended_scalar patterns.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/pr115068.c: Add vwadd.wx/vwsub.wx
tests.
* gcc.target/riscv/rvv/base/pr115068-run.c: Include pr115068.c.
* gcc.target/riscv/rvv/base/vwaddsub-1.c: New test.
---
gcc/config/riscv/vector.md| 62 ---
.../gcc.target/riscv/rvv/base/pr115068-run.c  | 24 +--
.../gcc.target/riscv/rvv/base/pr115068.c  | 26 
.../gcc.target/riscv/rvv/base/vwaddsub-1.c| 47 ++
4 files changed, 127 insertions(+), 32 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vwaddsub-1.c
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 107914afa3a..248461302dd 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -3900,27 +3900,71 @@ (define_insn "@pred_single_widen_add"
(set_attr "mode" "")])
(define_insn "@pred_single_widen__scalar"
-  [(set (match_operand:VWEXTI 0 "register_operand"   "=vr,   vr")
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, vr, vr")
(if_then_else:VWEXTI
  (unspec:
- [(match_operand: 1 "vector_mask_operand"   "vmWc1,vmWc1")
-  (match_operand 5 "vector_length_operand"  "   rK,   rK")
-  (match_operand 6 "const_int_operand"  "i,i")
-  (match_operand 7 "const_int_operand"  "i,i")
-  (match_operand 8 "const_int_operand"  "i,i")
+ [(match_operand: 1 "vector_mask_operand"" vm,vm,Wc1,Wc1")
+  (match_operand 5 "vector_length_operand"  " rK,rK, rK, rK")
+  (match_operand 6 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 7 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 8 "const_int_operand"  "  i, i,  i,  i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (plus_minus:VWEXTI
- (match_operand:VWEXTI 3 "register_operand" "   vr,   vr")
+ (match_operand:VWEXTI 3 "register_operand" " vr,vr, vr, vr")
(any_extend:VWEXTI
  (vec_duplicate:
- (match_operand: 4 "reg_or_0_operand"   "   rJ,   rJ"
-   (match_operand:VWEXTI 2 "vector_merge_operand"   "   vu,0")))]
+ (match_operand: 4 "reg_or_0_operand"   " rJ,rJ, rJ, rJ"
+   (match_operand:VWEXTI 2 "vector_merge_operand"   " vu, 0, vu,  0")))]
   "TARGET_VECTOR"
   "vw.wx\t%0,%3,%z4%p1"
   [(set_attr "type" "vi")
(set_attr "mode" "")])
+(define_insn "@pred_single_widen_add_extended_scalar"
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, vr, vr")
+ (if_then_else:VWEXTI
+   (unspec:
+ [(match_operand: 1 "vector_mask_operand"" vm,vm,Wc1,Wc1")
+  (match_operand 5 "vector_length_operand"  " rK,rK, rK, rK")
+  (match_operand 6 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 7 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 8 "const_int_operand"  "  i, i,  i,  i")
+  (reg:SI VL_REGNUM)
+  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+   (plus:VWEXTI
+ (vec_duplicate:VWEXTI
+   (any_extend:
+ (match_operand: 4 "reg_or_0_operand"   " rJ,rJ, rJ, rJ")))
+ (match_operand:VWEXTI 3 "register_operand" " vr,vr, vr, vr"))
+   (match_operand:VWEXTI 2 "vector_merge_operand"   " vu, 0, vu,  0")))]
+  "TARGET_VECTOR"
+  "vwadd.wx\t%0,%3,%z4%p1"
+  [(set_attr "type" "viwalu")
+   (set_attr "mode" "")])
+
+(define_insn "@pred_single_widen_sub_extended_scalar"
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, vr, vr")
+ (if_then_else:VWEXTI
+   (unspec:
+ [(match_operand: 1 "vector_mask_operand"" vm,vm,Wc1,Wc1")
+  (match_operand 5 "vector_length_operand"  " rK,rK, rK, rK")
+  (match_operand 6 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 7 "const_int_operand"  "  i, i,  i,  i")
+  (match_operand 8 "const_int_operand"  "  i, i,  i,  i")
+  (reg:SI VL_REGNUM)
+  (reg:SI 

[Ada] Fix PR ada/115133

2024-05-17 Thread Eric Botcazou
The recent changes made to the runtime library broke its build on Solaris 
because it uses Solaris threads instead of POSIX threads on this platform.

Tested by Rainer, applied on the mainline.


2024-05-17  Eric Botcazou  
Rainer Orth  

PR ada/115133
* libgnarl/s-osinte__solaris.ads (mutex_t): Fix typo.
* libgnarl/s-taprop__solaris.adb (Record_Lock): Add conversion.
(Check_Sleep): Likewise.
(Record_Wakeup): Likewise.
(Check_Unlock): Likewise.
* libgnarl/s-tasini.adb (Initialize_RTS_Lock): Add pragma Import
on the overlaid variable.
(Finalize_RTS_Lock): Likewise.
(Acquire_RTS_Lock): Likewise.
(Release_RTS_Lock): Likewise.
* libgnarl/s-taspri__solaris.ads (To_RTS_Lock_Ptr): New instance
of Ada.Unchecked_Conversion.
* libgnat/s-oslock__solaris.ads: Add with clause for
Ada.Unchecked_Conversion.
(array_type_9): Add missing name qualification.
(record_type_3): Likewise.
(mutex_t): Fix formatting.

-- 
Eric Botcazou
diff --git a/gcc/ada/libgnarl/s-osinte__solaris.ads b/gcc/ada/libgnarl/s-osinte__solaris.ads
index 12ad52bb48e..3703697ef44 100644
--- a/gcc/ada/libgnarl/s-osinte__solaris.ads
+++ b/gcc/ada/libgnarl/s-osinte__solaris.ads
@@ -298,7 +298,7 @@ package System.OS_Interface is
 
function To_thread_t is new Ada.Unchecked_Conversion (Integer, thread_t);
 
-   subtype mutex_t is System.OS_Lock.mutex_t;
+   subtype mutex_t is System.OS_Locks.mutex_t;
 
type cond_t is limited private;
 
diff --git a/gcc/ada/libgnarl/s-taprop__solaris.adb b/gcc/ada/libgnarl/s-taprop__solaris.adb
index 88b77b09820..82e51b8d25c 100644
--- a/gcc/ada/libgnarl/s-taprop__solaris.adb
+++ b/gcc/ada/libgnarl/s-taprop__solaris.adb
@@ -1399,7 +1399,7 @@ package body System.Task_Primitives.Operations is
   P := Self_ID.Common.LL.Locks;
 
   if P /= null then
- L.Next := P;
+ L.Next := To_RTS_Lock_Ptr (P);
   end if;
 
   Self_ID.Common.LL.Locking := null;
@@ -1440,7 +1440,7 @@ package body System.Task_Primitives.Operations is
 
   Self_ID.Common.LL.L.Owner := null;
   P := Self_ID.Common.LL.Locks;
-  Self_ID.Common.LL.Locks := Self_ID.Common.LL.Locks.Next;
+  Self_ID.Common.LL.Locks := To_Lock_Ptr (Self_ID.Common.LL.Locks.Next);
   P.Next := null;
   return True;
end Check_Sleep;
@@ -1468,7 +1468,7 @@ package body System.Task_Primitives.Operations is
   P := Self_ID.Common.LL.Locks;
 
   if P /= null then
- L.Next := P;
+ L.Next := To_RTS_Lock_Ptr (P);
   end if;
 
   Self_ID.Common.LL.Locking := null;
@@ -1549,7 +1549,7 @@ package body System.Task_Primitives.Operations is
 
   L.Owner := null;
   P := Self_ID.Common.LL.Locks;
-  Self_ID.Common.LL.Locks := Self_ID.Common.LL.Locks.Next;
+  Self_ID.Common.LL.Locks := To_Lock_Ptr (Self_ID.Common.LL.Locks.Next);
   P.Next := null;
   return True;
end Check_Unlock;
diff --git a/gcc/ada/libgnarl/s-tasini.adb b/gcc/ada/libgnarl/s-tasini.adb
index 794183f5356..d42d2881df4 100644
--- a/gcc/ada/libgnarl/s-tasini.adb
+++ b/gcc/ada/libgnarl/s-tasini.adb
@@ -246,6 +246,7 @@ package body System.Tasking.Initialization is
procedure Initialize_RTS_Lock (Addr : Address) is
   Lock : aliased SOL.RTS_Lock;
   for Lock'Address use Addr;
+  pragma Import (Ada, Lock);
 
begin
   Initialize_Lock (Lock'Unchecked_Access, PO_Level);
@@ -258,6 +259,7 @@ package body System.Tasking.Initialization is
procedure Finalize_RTS_Lock (Addr : Address) is
   Lock : aliased SOL.RTS_Lock;
   for Lock'Address use Addr;
+  pragma Import (Ada, Lock);
 
begin
   Finalize_Lock (Lock'Unchecked_Access);
@@ -270,6 +272,7 @@ package body System.Tasking.Initialization is
procedure Acquire_RTS_Lock (Addr : Address) is
   Lock : aliased SOL.RTS_Lock;
   for Lock'Address use Addr;
+  pragma Import (Ada, Lock);
 
begin
   Write_Lock (Lock'Unchecked_Access);
@@ -282,6 +285,7 @@ package body System.Tasking.Initialization is
procedure Release_RTS_Lock (Addr : Address) is
   Lock : aliased SOL.RTS_Lock;
   for Lock'Address use Addr;
+  pragma Import (Ada, Lock);
 
begin
   Unlock (Lock'Unchecked_Access);
diff --git a/gcc/ada/libgnarl/s-taspri__solaris.ads b/gcc/ada/libgnarl/s-taspri__solaris.ads
index ca40229993b..16fc4196b00 100644
--- a/gcc/ada/libgnarl/s-taspri__solaris.ads
+++ b/gcc/ada/libgnarl/s-taspri__solaris.ads
@@ -47,6 +47,8 @@ package System.Task_Primitives is
 
function To_Lock_Ptr is
  new Ada.Unchecked_Conversion (OS_Locks.RTS_Lock_Ptr, Lock_Ptr);
+   function To_RTS_Lock_Ptr is
+ new Ada.Unchecked_Conversion (Lock_Ptr, OS_Locks.RTS_Lock_Ptr);
 
type Suspension_Object is limited private;
--  Should be used for the implementation of Ada.Synchronous_Task_Control
diff --git a/gcc/ada/libgnat/s-oslock__solaris.ads 

[pushed] Regenerate common.opt.urls

2024-05-17 Thread David Malcolm
I forgot to do this for r15-636-g770657d02c986c.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-640-g4e3bb431bbf280.

gcc/ChangeLog:
* common.opt.urls: Regenerate to add
fdiagnostics-show-event-links.

Signed-off-by: David Malcolm 
---
 gcc/common.opt.urls | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
index f71ed80a34b4..10462e408744 100644
--- a/gcc/common.opt.urls
+++ b/gcc/common.opt.urls
@@ -534,6 +534,9 @@ UrlSuffix(gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-show
 fdiagnostics-show-caret
 UrlSuffix(gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-show-caret)
 
+fdiagnostics-show-event-links
+UrlSuffix(gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-show-event-links)
+
 fdiagnostics-show-labels
 UrlSuffix(gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-show-labels)
 
-- 
2.26.3



Re: [committed][wwwdocs] gcc-12/changes.html: Document RISC-V changes

2024-05-17 Thread Palmer Dabbelt

On Fri, 17 May 2024 14:30:49 PDT (-0700), ger...@pfeifer.com wrote:
> On Thu, 28 Apr 2022, Kito Cheng wrote:
>> ---
>>  htdocs/gcc-12/changes.html | 13 -
> :
>> +New ISA extension support for vector and scalar crypto was added, only
>> +   support architecture testing marco and -march= parsing.
>
> I realized I'm not sure I understand what the second part ("only
> support...") means.
>
> That for the time being (back then) only the macros and -march parsing
> were supported?

Ya, I guess it's kind of an odd phrasing.  Maybe it should be something
like

   The vector and scalar crypto extensions are now accepted in ISA
   strings via the -march argument.  Note that enabling these
   extensions will only set the corresponding feature test macros and
   enable assembler support; they don't yet generate binaries with the
   instructions added in these extensions.

> Gerald


[pushed] wwwdocs: gcc-4.6: Use 64-bit instead of 64 bit

2024-05-17 Thread Gerald Pfeifer
---
 htdocs/gcc-4.6/changes.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/htdocs/gcc-4.6/changes.html b/htdocs/gcc-4.6/changes.html
index c96d347f..d1e15af3 100644
--- a/htdocs/gcc-4.6/changes.html
+++ b/htdocs/gcc-4.6/changes.html
@@ -791,7 +791,7 @@
 Several enhancements were committed to improve SIMD code
   generation for NEON by adding support for widening instructions,
   misaligned loads and stores, vector conditionals and
-  support for 64 bit arithmetic.
+  support for 64-bit arithmetic.
 
 Support was added for the Faraday cores fa526, fa606te,
   fa626te, fmp626te, fmp626 and fa726te and can be used with the
@@ -927,7 +927,7 @@
   which always generates the VSX memory instructions.
 The GCC compiler on AIX now defaults to a process layout with a
   larger data space allowing larger programs to be compiled.
-The GCC long double type on AIX 6.1 and above has reverted to 64 bit
+The GCC long double type on AIX 6.1 and above has reverted to 64-bit
   double precision, matching the AIX XL compiler default, because of
   missing C99 symbols required by the GCC runtime.
 The default processor scheduling model and tuning for PowerPC64
-- 
2.45.0


Re: [committed][wwwdocs] gcc-12/changes.html: Document RISC-V changes

2024-05-17 Thread Gerald Pfeifer
On Thu, 28 Apr 2022, Kito Cheng wrote:
> ---
>  htdocs/gcc-12/changes.html | 13 -
:
> +New ISA extension support for vector and scalar crypto was added, 
> only
> + support architecture testing marco and -march= 
> parsing.

I realized I'm not sure I understand what the second part ("only 
support...") means.

That for the time being (back then) only the macros and -march parsing 
were supported?

Gerald


[pushed] wwwdocs: gcc-12: Fix typo

2024-05-17 Thread Gerald Pfeifer


---
 htdocs/gcc-12/changes.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index 8a0347e3..0cfa12eb 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -903,7 +903,7 @@ function Multiply (S1, S2 : Sign) return Sign is
 announcement
 New ISA extension support for zba, zbb, zbc, zbs was added.
 New ISA extension support for vector and scalar crypto was added, only
-   support architecture testing marco and -march= 
parsing.
+   support architecture testing macro and -march= 
parsing.
 The option -mtune=thead-c906 is added to tune for T-HEAD
c906 cores.
 libstdc++ no longer attempts to detect built-in atomics.
-- 
2.45.0


Re: [PATCH v1] RISC-V: Cleanup some temporally files [NFC]

2024-05-17 Thread Jeff Law




On 5/16/24 6:12 PM, Li, Pan2 wrote:

Committed, thanks Juzhe.

Thanks for cleaning up my little mess!  Sorry about that.

jeff



Re: [PATCH 4/13] rs6000, extend the current vec_{un,}signed{e,o} built-ins

2024-05-17 Thread Carl Love
Kewen:

I am working thru the patches.  I made the changes as requested for this patch
but have a question about one of your comments.

On 5/14/24 00:53, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:17, Carl Love wrote:
>> rs6000, extend the current vec_{un,}signed{e,o} built-ins
>>
>> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
>> convert a vector of floats to signed/unsigned long long ints.  Extend the
>> existing vec_{un,}signed{e,o} built-ins to handle the argument
>> vector of floats to return the even/odd signed/unsigned integers.
>>
>> Add testcases and update documentation.
>>
>> gcc/ChangeLog:
>> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxds_low,
>> __builtin_vsx_xvcvspuxds_low): New built-in definitions.
>> * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo):
>> Add new overloaded specifications.
>> * config/rs6000/vsx.md (vsx_xvcvspxds_low): New define_expand.
>> * doc/extend.texi (vec_signedo, vec_signede): Add documentation.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.target/powerpc/builtins-3-runnable: New tests for the added



> 
> As the existing instances for vec_signed and vec_unsigned are with
> names like VEC_V{UN,}SIGNED{O,E}_V2DF, I prefer these are updated
> with similar style, maybe something like:
> 
> VEC_V{UN,}SIGNED{E,O}_V4SF v{un,}signed{e,o}_v4sf

Yes, sounds reasonable.  Changed:

  XVCVSPUXDS  -> VEC_VUNSIGNEDE_V4SF
  XVCVSPUXDSO -> VEC_VUNSIGNEDO_V4SF
  XVCVSPSXDS  -> VEC_VSIGNEDE_V4SF
  XVCVSPSXDSO -> VEC_VSIGNEDO_V4SF

QUESTION:
I am not sure what you want changed to v{un,}signed{e,o}_v4sf.  The overloaded
instance entry names for vd, vf have to match the first line of the definition.
The name can't be type specific, i.e. v4sf.  So I am not sure where you want
the v{un,}signed{e,o}_v4sf name used.

For example, file rs6000-overload.def now looks like:

[VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede]
   vsi __builtin_vec_vsignede (vd);
 VEC_VSIGNEDE_V2DF
+  vsll __builtin_vec_vsignede (vf);
+VEC_VSIGNEDE_V4SF
 
 [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo]
   vsi __builtin_vec_vsignedo (vd);
 VEC_VSIGNEDO_V2DF
+  vsll __builtin_vec_vsignedo (vf);
+VEC_VSIGNEDO_V4SF
 




 Carl 


[x86 SSE] Improve handling of ternlog instructions in i386/sse.md (v2)

2024-05-17 Thread Roger Sayle

Hi Hongtao,
Many thanks for the review, bug fixes and suggestions for improvements.
This revised version of the patch implements all of your corrections.  In
theory the "ternlog idx" should guarantee that some operands are non-null,
but I agree that it's better defensive programming to check invariants not
easily proved.  Instead of calling ix86_expand_vector_move, I use
ix86_broadcast_from_constant to achieve the same effect of using a broadcast
when possible, with the benefit of still using a memory operand (instead of
a vector load) when broadcasting isn't possible.  There are other places
that could benefit from the same trick, but I can address these in a
follow-up patch (it may even be preferable to keep these as CONST_VECTOR
during early RTL passes and lower to broadcast or constant pool using
splitters).

This revised patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2024-05-17  Roger Sayle  
Hongtao Liu  

gcc/ChangeLog
PR target/115021
* config/i386/i386-expand.cc (ix86_expand_args_builtin): Call
fixup_modeless_constant before testing predicates.  Only call
copy_to_mode_reg on memory operands (after the first one).
(ix86_gen_bcst_mem): Helper function to convert a CONST_VECTOR
into a VEC_DUPLICATE if possible.
(ix86_ternlog_idx):  Convert an RTX expression into a ternlog
index between 0 and 255, recording the operands in ARGS, if
possible or return -1 if this is not possible/valid.
(ix86_ternlog_leaf_p): Helper function to identify "leaves"
of a ternlog expression, e.g. REG_P, MEM_P, CONST_VECTOR, etc.
(ix86_ternlog_operand_p): Test whether an expression is suitable
for and preferred as an UNSPEC_TERNLOG.
(ix86_expand_ternlog_binop): Helper function to construct the
binary operation corresponding to a sufficiently simple ternlog.
(ix86_expand_ternlog_andnot): Helper function to construct an
ANDN operation corresponding to a sufficiently simple ternlog.
(ix86_expand_ternlog): Expand a 3-operand ternary logic
expression, constructing either an UNSPEC_TERNLOG or simpler
rtx expression.  Called from builtin expanders and pre-reload
splitters.
* config/i386/i386-protos.h (ix86_ternlog_idx): Prototype here.
(ix86_ternlog_operand_p): Likewise.
(ix86_expand_ternlog): Likewise.
* config/i386/predicates.md (ternlog_operand): New predicate
that calls ix86_ternlog_operand_p.
* config/i386/sse.md (_vpternlog_0): New
define_insn_and_split that recognizes a SET_SRC of ternlog_operand
and expands it via ix86_expand_ternlog pre-reload.
(_vternlog_mask): Convert from define_insn to
define_expand.  Use ix86_expand_ternlog if the mask operand is
~0 (or 255 or -1).
(*_vternlog_mask): define_insn renamed from above.

gcc/testsuite/ChangeLog
* gcc.target/i386/avx512f-andn-di-zmm-2.c: Update test case.
* gcc.target/i386/avx512f-andn-si-zmm-2.c: Likewise.
* gcc.target/i386/avx512f-orn-si-zmm-1.c: Likewise.
* gcc.target/i386/avx512f-orn-si-zmm-2.c: Likewise.
* gcc.target/i386/avx512f-vpternlogd-1.c: Likewise.
* gcc.target/i386/avx512f-vpternlogq-1.c: Likewise.
* gcc.target/i386/avx512vl-vpternlogd-1.c: Likewise.
* gcc.target/i386/avx512vl-vpternlogq-1.c: Likewise.
* gcc.target/i386/pr100711-3.c: Likewise.
* gcc.target/i386/pr100711-4.c: Likewise.
* gcc.target/i386/pr100711-5.c: Likewise.
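
For readers unfamiliar with how a "ternlog index between 0 and 255" comes
about, here is a minimal standalone sketch.  It is not code from the patch.
It assumes the usual vpternlog convention in which the three operands stand
for the truth-table constants 0xF0, 0xCC and 0xAA (0xaa also appears in the
quoted hunk further down); all names in it are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed truth-table constants for the three vpternlog operands.
constexpr std::uint8_t TERNLOG_A = 0xF0;
constexpr std::uint8_t TERNLOG_B = 0xCC;
constexpr std::uint8_t TERNLOG_C = 0xAA;

// Hypothetical helpers: the index of a compound expression is obtained by
// applying the same bitwise operation to the indices of its operands.
constexpr std::uint8_t idx_and (std::uint8_t x, std::uint8_t y)
{ return static_cast<std::uint8_t> (x & y); }
constexpr std::uint8_t idx_or (std::uint8_t x, std::uint8_t y)
{ return static_cast<std::uint8_t> (x | y); }
constexpr std::uint8_t idx_xor (std::uint8_t x, std::uint8_t y)
{ return static_cast<std::uint8_t> (x ^ y); }
constexpr std::uint8_t idx_not (std::uint8_t x)
{ return static_cast<std::uint8_t> (~x); }

// Look up one result bit: the inputs (a, b, c) select bit (a<<2 | b<<1 | c)
// of the 8-bit immediate.
constexpr bool ternlog_eval (std::uint8_t imm8, bool a, bool b, bool c)
{ return (imm8 >> ((a << 2) | (b << 1) | c)) & 1; }

// a & ~b (an "andn" shape), a ^ b ^ c, and (a & b) | c.
static_assert (idx_and (TERNLOG_A, idx_not (TERNLOG_B)) == 0x30, "andn");
static_assert (idx_xor (idx_xor (TERNLOG_A, TERNLOG_B), TERNLOG_C) == 0x96,
               "xor3");
static_assert (idx_or (idx_and (TERNLOG_A, TERNLOG_B), TERNLOG_C) == 0xEA,
               "and-or");

// Runtime cross-check of the xor3 immediate against plain boolean xor.
inline void ternlog_selfcheck ()
{
  for (int a = 0; a < 2; ++a)
    for (int b = 0; b < 2; ++b)
      for (int c = 0; c < 2; ++c)
        assert (ternlog_eval (0x96, a, b, c) == ((a ^ b ^ c) != 0));
}
```

The point of the sketch is only that any expression built from and/or/xor/not
over at most three distinct leaves collapses to a single 8-bit immediate.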


Thanks again,
Roger
--

> From: Hongtao Liu 
> Sent: 14 May 2024 09:46
> On Mon, May 13, 2024 at 5:57 AM Roger Sayle 
> wrote:
> >
> > This patch improves the way that the x86 backend recognizes and
> > expands AVX512's bitwise ternary logic (vpternlog) instructions.
> I like the patch.
> 
> 1 file changed, 25 insertions(+), 1 deletion(-) 
> gcc/config/i386/i386-expand.cc | 26
> +-
> 
> modified   gcc/config/i386/i386-expand.cc
> @@ -25601,6 +25601,7 @@ ix86_gen_bcst_mem (machine_mode mode, rtx x)
> int  ix86_ternlog_idx (rtx op, rtx *args)  {
> +  /* Nice dynamic programming:)  */
>int idx0, idx1;
> 
>if (!op)
> @@ -25651,6 +25652,7 @@ ix86_ternlog_idx (rtx op, rtx *args)
> return 0xaa;
>   }
>/* Maximum of one volatile memory reference per expression.  */
> +  /* According to comments, it should be && ?  */
>if (side_effects_p (op) || side_effects_p (args[2]))
>   return -1;
>if (rtx_equal_p (op, args[2]))
> @@ -25666,6 +25668,8 @@ ix86_ternlog_idx (rtx op, rtx *args)
> 
>  case SUBREG:
>if (!VECTOR_MODE_P (GET_MODE (SUBREG_REG (op)))
> +   /* It could be TI/OI/XImode since it's just bit operations,
> +  So no need for VECTOR_MODE_P?  */
> || GET_MODE_SIZE (GET_MODE 

[PATCH] selftest: invoke "diff" when ASSERT_STREQ fails

2024-05-17 Thread David Malcolm
Currently when ASSERT_STREQ or ASSERT_STREQ_AT fail we print
both strings to stderr.  However it can be hard to figure out
the problem (e.g. for 1-character differences in long strings).

Extend the output by writing out the strings to tempfiles and
invoking "diff -up" on them when we have such a selftest failure,
to (I hope) simplify debugging.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
* selftest.cc (selftest::print_diff): New function.
(selftest::assert_streq): Call it when we have non-equal
non-null strings.

Signed-off-by: David Malcolm 
---
 gcc/selftest.cc | 28 ++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/gcc/selftest.cc b/gcc/selftest.cc
index 6438d86a6aa0..f58c0631908e 100644
--- a/gcc/selftest.cc
+++ b/gcc/selftest.cc
@@ -63,6 +63,26 @@ fail_formatted (const location &loc, const char *fmt, ...)
   abort ();
 }
 
+/* Invoke "diff" to print the difference between VAL1 and VAL2
+   on stdout.  */
+
+static void
+print_diff (const location &loc, const char *val1, const char *val2)
+{
+  temp_source_file tmpfile1 (loc, ".txt", val1);
+  temp_source_file tmpfile2 (loc, ".txt", val2);
+  const char *args[] = {"diff",
+   "-up",
+   tmpfile1.get_filename (),
+   tmpfile2.get_filename (),
+   NULL};
+  int exit_status = 0;
+  int err = 0;
+  pex_one (PEX_SEARCH | PEX_LAST,
+  args[0], CONST_CAST (char **, args),
+  NULL, NULL, NULL, &exit_status, &err);
+}
+
 /* Implementation detail of ASSERT_STREQ.
Compare val1 and val2 with strcmp.  They ought
to be non-NULL; fail gracefully if either or both are NULL.  */
@@ -89,8 +109,12 @@ assert_streq (const location &loc,
if (strcmp (val1, val2) == 0)
  pass (loc, "ASSERT_STREQ");
else
- fail_formatted (loc, "ASSERT_STREQ (%s, %s)\n val1=\"%s\"\n val2=\"%s\"\n",
- desc_val1, desc_val2, val1, val2);
+ {
+   print_diff (loc, val1, val2);
+   fail_formatted
+ (loc, "ASSERT_STREQ (%s, %s)\n val1=\"%s\"\n val2=\"%s\"\n",
+  desc_val1, desc_val2, val1, val2);
+ }
   }
 }
 
-- 
2.26.3



Re: [PATCH v6] RISC-V: Implement IFN SAT_ADD for both the scalar and vector

2024-05-17 Thread Robin Dapp
Hi Pan,

all in all LGTM.  Just insignificant nits.

> +void
> +expand_vec_usadd (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
> +{
> +  emit_vec_saddu (op_0, op_1, op_2, BINARY_OP, vec_mode);
> +}
> +

Do we really need this function?  Or do you want it to be a dispatcher
for later?  If it should do more than just a call, please document.

> +  /* Step-1: sum = x + y  */
> +  if (mode == SImode && mode != Xmode)
> +{ /* Take addw to avoid the sum truncate.  */
> +  rtx simode_sum = gen_reg_rtx (SImode);
> +  riscv_emit_binary (PLUS, simode_sum, x, y);
> +  emit_move_insn (xmode_sum, gen_lowpart (Xmode, simode_sum));
> +}
> +  else
> +riscv_emit_binary (PLUS, xmode_sum, xmode_x, xmode_y);

I would add a top-level comment that the emulation is just
sum = x + y;
if (sum < x)
  sum = TYPE_MAX;
and we can implement the if/then by sltu and or.
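
As a standalone illustration of that emulation (not the RISC-V expander
itself), a minimal sketch in plain C++ might look as follows.  The function
name is hypothetical; the comparison plays the role of sltu and the final
bitwise or replaces the branch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Branchless unsigned saturating add: sum = x + y; if (sum < x) sum = MAX.
template <typename T>
constexpr T sat_add_u (T x, T y)
{
  static_assert (std::is_unsigned<T>::value, "unsigned types only");
  // Unsigned addition wraps modulo 2^N.
  T sum = static_cast<T> (x + y);
  // (sum < x) is 1 exactly when the addition wrapped; negating it gives an
  // all-ones mask in that case and zero otherwise.
  T mask = static_cast<T> (-static_cast<T> (sum < x));
  return static_cast<T> (sum | mask);
}

static_assert (sat_add_u<std::uint8_t> (200, 100) == 255, "saturates");
static_assert (sat_add_u<std::uint32_t> (1, 2) == 3, "no overflow");

// Exhaustive check for 8-bit operands against a widened reference.
inline void sat_add_selfcheck ()
{
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      {
        unsigned ref = x + y > 255 ? 255 : x + y;
        assert (sat_add_u<std::uint8_t> (static_cast<std::uint8_t> (x),
                                         static_cast<std::uint8_t> (y))
                == ref);
      }
}
```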

No need for another revision, though.

Regards
 Robin


[pushed] diagnostics, analyzer: add CFG edge visualization to path-printing

2024-05-17 Thread David Malcolm
This patch adds some ability for links between labelled ranges when
quoting the user's source code, and uses this to add links between
events when printing diagnostic_paths, chopping them up further into
event ranges that can be printed together.
It adds links to the various "from..." - "...to" events in the
analyzer.

For example, previously we emitted this for
c-c++-common/analyzer/infinite-loop-linked-list.c's
'while_loop_missing_next':

infinite-loop-linked-list.c:30:10: warning: infinite loop [CWE-835] [-Wanalyzer-infinite-loop]
   30 |   while (n)
  |  ^
  'while_loop_missing_next': events 1-5
   30 |   while (n)
  |  ^
  |  |
  |  (1) infinite loop here
  |  (2) when 'n' is non-NULL: always following 'true' branch...
  |  (5) ...to here
   31 | {
   32 |   sum += n->val;
  |   ~
  |   |   |
  |   |   (3) ...to here
  |   (4) looping back...

whereas with the patch we now emit:

infinite-loop-linked-list.c:30:10: warning: infinite loop [CWE-835] [-Wanalyzer-infinite-loop]
   30 |   while (n)
  |  ^
  'while_loop_missing_next': events 1-3
   30 |   while (n)
  |  ^
  |  |
  |  (1) infinite loop here
  |  (2) when 'n' is non-NULL: always following 'true' branch... ->-+
  |                                                                 |
  |                                                                 |
  |+----------------------------------------------------------------+
   31 ||{
   32 ||  sum += n->val;
  || ~~
  ||  |
  |+->(3) ...to here
  'while_loop_missing_next': event 4
   32 |   sum += n->val;
  |   ^
  |   |
  |   (4) looping back... ->-+
  |  |
  'while_loop_missing_next': event 5
  |  |
  |+-+
   30 ||  while (n)
  || ^
  || |
  |+>(5) ...to here

which I believe is easier to understand.

The patch also implements the use of unicode characters and colorization
for the lines (not shown in the above example).

There is a new option -fno-diagnostics-show-event-links for getting
back the old behavior (added to -fdiagnostics-plain-output).

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r15-636-g770657d02c986c.

gcc/analyzer/ChangeLog:
* checker-event.h (checker_event::connect_to_next_event_p):
Implement new diagnostic_event::connect_to_next_event_p vfunc.
(start_cfg_edge_event::connect_to_next_event_p): Likewise.
(start_consolidated_cfg_edges_event::connect_to_next_event_p):
Likewise.
* infinite-loop.cc (class looping_back_event): New subclass.
(infinite_loop_diagnostic::add_final_event): Use it.

gcc/ChangeLog:
* common.opt (fdiagnostics-show-event-links): New option.
* diagnostic-label-effects.h: New file.
* diagnostic-path.h (diagnostic_event::connect_to_next_event_p):
New pure virtual function.
(simple_diagnostic_event::connect_to_next_event_p): Implement it.
(simple_diagnostic_event::connect_to_next_event): New.
(simple_diagnostic_event::m_connected_to_next_event): New field.
(simple_diagnostic_path::connect_to_next_event): New decl.
* diagnostic-show-locus.cc: Include "text-art/theme.h" and
"diagnostic-label-effects.h".
(colorizer::set_cfg_edge): New.
(layout::m_fallback_theme): New field.
(layout::m_theme): New field.
(layout::m_effect_info): New field.
(layout::m_link_lhs_state): New enum and field.
(layout::m_link_rhs_column): New field.
(layout_range::has_in_edge): New.
(layout_range::has_out_edge): New.
(layout::layout): Add "effect_info" optional param.  Initialize
m_theme, m_link_lhs_state, and m_link_rhs_column.
(layout::maybe_add_location_range): Remove stray "FIXME" from
leading comment.
(layout::print_source_line): Replace space after margin with a
call to print_leftmost_column.
(layout::print_leftmost_column): New.
(layout::start_annotation_line): Make non-const.  Gain
responsibility for printing the leftmost column after the margin.
(layout::print_annotation_line): Drop pp_space, as this is now
added by start_annotation_line.
(line_label::line_label): Add "has_in_edge" and "has_out_edge"
params and initialize...
(line_label::m_has_in_edge): New field.
(line_label::m_has_out_edge): New field.
(layout::print_any_labels): Pass edge information to line_label
ctor. 

Re: [Patch, aarch64] v6: Preparatory patch to place target independent and, dependent changed code in one file

2024-05-17 Thread Richard Sandiford
Ajit Agarwal  writes:
> Hello Alex/Richard:
>
> All review comments are addressed.
>
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
>
> Target independent code is the Generic code with pure virtual function
> to interface between target independent and dependent code.
>
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
>
> Bootstrapped and regtested on aarch64-linux-gnu.
>
> Thanks & Regards
> Ajit

Thanks for the patch and thanks to Alex for the reviews.  The patch
looks good to me apart from the minor nits below and the comments that
Alex had.  Please post the updated patch for a final ok though.

> aarch64: Preparatory patch to place target independent and
> dependent changed code in one file
>
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
>
> Target independent code is the Generic code with pure virtual function
> to interface between target independent and dependent code.
>
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
>
> 2024-05-15  Ajit Kumar Agarwal  
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-ldp-fusion.cc: Place target
>   independent and dependent changed code.

Not sure this is a complete sentence.  Maybe:

* config/aarch64/aarch64-ldp-fusion.cc: Factor out a
target-independent interface and move it to the head of the file.

That technically isn't detailed enough for a changelog entry,
but IMO we should use it anyway.  It's pointless to write the usual
amount of detail when the code is going to move soon.

> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 533 +++
>  1 file changed, 357 insertions(+), 176 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 1d9caeab05d..429e532ea3b 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -138,6 +138,225 @@ struct alt_base
>poly_int64 offset;
>  };
>  
> +// Virtual base class for load/store walkers used in alias analysis.
> +struct alias_walker
> +{
> +  virtual bool conflict_p (int ) const = 0;
> +  virtual insn_info *insn () const = 0;
> +  virtual bool valid () const = 0;
> +  virtual void advance () = 0;
> +};
> +
> +// When querying handle_writeback_opportunities, this enum is used to
> +// qualify which opportunities we are asking about.
> +enum class writeback {
> +  // Only those writeback opportunities that arise from existing
> +  // auto-increment accesses.
> +  EXISTING,
> +  // All writeback opportunities including those that involve folding

There should be a comma after "opportunities"

> +  // base register updates into a non-writeback pair.
> +  ALL
> +};
> +
> +struct pair_fusion {
> +  pair_fusion ()
> +  {
> +calculate_dominance_info (CDI_DOMINATORS);
> +df_analyze ();
> +crtl->ssa = new rtl_ssa::function_info (cfun);
> +  };

Unnecessary trailing ";".  I think it'd be better to define this and
the destructor out-of-line though.  For one thing, it'll reduce the number
of header file dependencies, once the code is moved to its own header file.

> +
> +  // Given:
> +  // - an rtx REG_OP, the non-memory operand in a load/store insn,
> +  // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and
> +  // - a boolean LOAD_P (true iff the insn is a load), then:
> +  // return true if the access should be considered an FP/SIMD access.
> +  // Such accesses are segregated from GPR accesses, since we only want
> +  // to form pairs for accesses that use the same register file.
> +  virtual bool fpsimd_op_p (rtx, machine_mode, bool)
> +  {
> +return false;
> +  }
> +
> +  // Return true if we should consider forming pairs from memory
> +  // accesses with operand mode MODE at this stage in compilation.
> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
> +
> +  // Return true iff REG_OP is a suitable register operand for a paired
> +  // memory access, where LOAD_P is true if we're asking about loads and
> +  // false for stores.  MODE gives the mode of the operand.
> +  virtual bool pair_reg_operand_ok_p (bool load_p, rtx reg_op,
> +   machine_mode mode) = 0;
> +
> +  // Return alias check limit.
> +  // This is needed to avoid unbounded quadratic behaviour when
> +  // performing alias analysis.
> +  virtual int pair_mem_alias_check_limit () = 0;

I think the end result should be to make this a target-independent
--param, but this is ok/good as an intermediate step.

> +
> +  // Returns true if we should try to handle writeback opportunities.

s/Returns/Return/

> +  // WHICH determines the kinds of writeback opportunities the caller
> +  // is asking about.
> +  

[PATCH v2] libstdc++: Fix std::ranges::iota not included in numeric [PR108760]

2024-05-17 Thread Michael Levine (BLOOMBERG/ 731 LEX)
This is the revised version of my patch incorporating the provided feedback
from Patrick Palka and Jonathan Wakely.

This patch fixes GCC Bug 108760:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108760

I moved out_value_result to , moved std::ranges::iota
into , removed my new test, and moved and renamed the existing test.

I built my local version of gcc using the following configuration:
$ ../gcc/configure --disable-bootstrap --prefix="$(pwd)/_pfx/" --enable-languages=c,c++,lto
I then ran $ make -jN
and $ make -jN install

Using the locally installed version, the following code compiled:  
https://godbolt.org/z/33EPeqd1b

I tested my changes by running:  $ make check-c++ -jN -k
I personally found it difficult to understand the results of running the tests.

I ran this on the following OS:

Virtualization: wsl
Operating System: Ubuntu 20.04.6 LTS
Kernel: Linux 5.15.146.1-microsoft-standard-WSL2
Architecture: x86-64


From: Michael Levine (BLOOMBERG/ 731 LEX) At: 04/17/24 14:24:24 UTC-4:00
To: libstd...@gcc.gnu.org, gcc-patches@gcc.gnu.org
Subject: [PATCH] libstdc++: Fix std::ranges::iota is not included in numeric [PR108760]
This patch fixes GCC Bug 108760:  
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108760

Before this patch, using std::ranges::iota required including  when 
it should have been sufficient to only include .

When the patch is applied, the following code will compile:  
https://godbolt.org/z/33EPeqd1b

I added a test case for this change as well.

I built my local version of gcc using the following configuration:
$ ../gcc/configure --disable-bootstrap --prefix="$(pwd)/_pfx/" --enable-languages=c,c++,lto

and I tested my changes by running:  $ make check-c++ -jN -k

I ran this on the following OS:

Virtualization: wsl
Operating System: Ubuntu 20.04.6 LTS
Kernel: Linux 5.15.146.1-microsoft-standard-WSL2
Architecture: x86-64




108760v2.patch
Description: Binary data


Re: [PATCH] c++: folding non-dep enumerator from current inst [PR115139]

2024-05-17 Thread Marek Polacek
On Fri, May 17, 2024 at 12:05:15PM -0400, Patrick Palka wrote:
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk/14?

This patch looks good to me, thanks.
 
> -- >8 --
> 
> After the tsubst_copy removal r14-4796-g3e3d73ed5e85e7 GCC 14 ICEs during
> fold_non_dependent_expr for 'e1 | e2' ultimately because we no longer exit
> early when substituting the CONST_DECLs for e1 and e2 with args=NULL_TREE,
> during which we try substituting the class context A (also with
> args=NULL_TREE) which ends up ICEing from tsubst_pack_expansion (due to
> processing_template_decl being cleared).
> 
> Incidentally, the ICE went away on trunk ever since the tsubst_aggr_type
> removal r15-123-gf04dc89a991ddc since it made the CONST_DECL case of
> tsubst_expr use tsubst to substitute the context, which does short circuit
> for empty args and so avoids the ICE.
> 
> This patch fixes this ICE for GCC 14 by narrowly restoring the early exit
> for empty args that was present in tsubst_copy when substituting an
> enumerator CONST_DECL.  We might as well apply this to trunk too, as a
> very minor optimization.
> 
>   PR c++/115139
> 
> gcc/cp/ChangeLog:
> 
>   * pt.cc (tsubst_expr) : Exit early if args
>   is empty.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/template/non-dependent33.C: New test.
> ---
>  gcc/cp/pt.cc|  2 +-
>  gcc/testsuite/g++.dg/template/non-dependent33.C | 11 +++
>  2 files changed, 12 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/template/non-dependent33.C
> 
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 32640f8e946..e185e3d8941 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -21519,7 +21519,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
>  
>   if (DECL_TEMPLATE_PARM_P (t))
> RETURN (RECUR (DECL_INITIAL (t)));
> - if (!uses_template_parms (DECL_CONTEXT (t)))
> + if (!args || !uses_template_parms (DECL_CONTEXT (t)))
> RETURN (t);
>  
>   /* Unfortunately, we cannot just call lookup_name here.
> diff --git a/gcc/testsuite/g++.dg/template/non-dependent33.C b/gcc/testsuite/g++.dg/template/non-dependent33.C
> new file mode 100644
> index 000..2f1dd8a214c
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/template/non-dependent33.C
> @@ -0,0 +1,11 @@
> +// PR c++/115139
> +// { dg-do compile { target c++11 } }
> +
> +template
> +class A {
> +  enum E {
> +e1 = 1,
> +e2 = 2,
> +e3 = e1 | e2,
> +  };
> +};
> -- 
> 2.45.1.204.gd8ab1d464d
> 

Marek



[PATCH] c++: folding non-dep enumerator from current inst [PR115139]

2024-05-17 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/14?

-- >8 --

After the tsubst_copy removal r14-4796-g3e3d73ed5e85e7 GCC 14 ICEs during
fold_non_dependent_expr for 'e1 | e2' ultimately because we no longer exit
early when substituting the CONST_DECLs for e1 and e2 with args=NULL_TREE,
during which we try substituting the class context A (also with
args=NULL_TREE) which ends up ICEing from tsubst_pack_expansion (due to
processing_template_decl being cleared).

Incidentally, the ICE went away on trunk ever since the tsubst_aggr_type
removal r15-123-gf04dc89a991ddc since it made the CONST_DECL case of
tsubst_expr use tsubst to substitute the context, which does short circuit
for empty args and so avoids the ICE.

This patch fixes this ICE for GCC 14 by narrowly restoring the early exit
for empty args that was present in tsubst_copy when substituting an
enumerator CONST_DECL.  We might as well apply this to trunk too, as a
very minor optimization.

PR c++/115139

gcc/cp/ChangeLog:

* pt.cc (tsubst_expr) : Exit early if args
is empty.

gcc/testsuite/ChangeLog:

* g++.dg/template/non-dependent33.C: New test.
---
 gcc/cp/pt.cc|  2 +-
 gcc/testsuite/g++.dg/template/non-dependent33.C | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/template/non-dependent33.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 32640f8e946..e185e3d8941 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -21519,7 +21519,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
 
if (DECL_TEMPLATE_PARM_P (t))
  RETURN (RECUR (DECL_INITIAL (t)));
-   if (!uses_template_parms (DECL_CONTEXT (t)))
+   if (!args || !uses_template_parms (DECL_CONTEXT (t)))
  RETURN (t);
 
/* Unfortunately, we cannot just call lookup_name here.
diff --git a/gcc/testsuite/g++.dg/template/non-dependent33.C b/gcc/testsuite/g++.dg/template/non-dependent33.C
new file mode 100644
index 000..2f1dd8a214c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/non-dependent33.C
@@ -0,0 +1,11 @@
+// PR c++/115139
+// { dg-do compile { target c++11 } }
+
+template
+class A {
+  enum E {
+e1 = 1,
+e2 = 2,
+e3 = e1 | e2,
+  };
+};
-- 
2.45.1.204.gd8ab1d464d



Re: [PATCH 00/12] aarch64: Extend aarch64_feature_flags to 128 bits

2024-05-17 Thread Richard Sandiford
Andrew Carlotti  writes:
> The end goal of the series is to change the definition of aarch64_feature_flags
> from a uint64_t typedef to a class with 128 bits of storage.  This class uses
> operator overloading to mimic the existing integer interface as much as
> possible, but with added restrictions to facilitate type checking and
> extensibility.
>
> Patches 01-10 are preliminary enablement work, and have passed regression
> testing.  Are these ok for master?
>
> Patch 11 is an RFC, and the only patch that touches the middle end.  I am
> seeking clarity on which part(s) of the compiler should be expected to handle
> or prevent non-bool types in instruction pattern conditions.  The actual patch
> does not compile by itself (though it does in combination with 12/12), but that
> is not important to the questions I'm asking.
>
> Patch 12 is then a small patch that actually replaces the uint64_t typedef with
> a class.  I think this patch is fine in its current form, but it depends on a
> resolution to the issues in patch 11/12 first.

Thanks for doing this.

Rather than disallowing flags == 0, etc., I think we should allow
aarch64_feature_flags to be constructed from a single uint64_t.
It's a lossless conversion.  The important thing is that we don't
allow conversions the other way (and the patch doesn't allow them).

Also, I think we should make the new class in 12/12 be a templated
 type that provides an N-bit bitmask.  It should arguably
also be target-independent code.  aarch64_feature_flags would then be
an alias with the appropriate number of bits.

For the RFC in 11/12, how about, as another prepatch before 12/12,
removing all the mechanical:

#define AARCH64_ISA_LS64   (aarch64_isa_flags & AARCH64_FL_LS64)

style macros and replacing uses with something like:

  AARCH64_HAVE_ISA (LS64)

Uses outside aarch64.h should arguably be changed to TARGET_* instead,
since the convention seems to be that TARGET_* checks the underlying
ISA flag and also any other relevant conditions (where applicable).
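
To make the shape of these suggestions concrete, here is a rough standalone
sketch.  It is only an illustration under stated assumptions: the class name,
the DEMO_* macros, the bit positions and the global variable are all
hypothetical stand-ins, not the names or layout the series would use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical N-bit bitmask; this sketch only handles multiples of 64.
template <std::size_t N>
class demo_bitmask
{
  static_assert (N > 0 && N % 64 == 0, "sketch assumes a multiple of 64 bits");
  static constexpr std::size_t num_words = N / 64;
  std::uint64_t m_words[num_words] = {};

public:
  constexpr demo_bitmask () = default;

  // Constructing from a single uint64_t is lossless, so it is allowed.
  constexpr demo_bitmask (std::uint64_t low) { m_words[0] = low; }

  // There is deliberately no conversion back to an integer type; only an
  // explicit "is any bit set" test is provided.
  constexpr explicit operator bool () const
  {
    for (std::size_t i = 0; i < num_words; ++i)
      if (m_words[i] != 0)
        return true;
    return false;
  }

  // Mask with only bit number n set (n must be less than N).
  static constexpr demo_bitmask bit (std::size_t n)
  {
    demo_bitmask r;
    r.m_words[n / 64] = std::uint64_t (1) << (n % 64);
    return r;
  }

  friend constexpr demo_bitmask operator& (demo_bitmask a,
                                           const demo_bitmask &b)
  {
    for (std::size_t i = 0; i < num_words; ++i)
      a.m_words[i] &= b.m_words[i];
    return a;
  }

  friend constexpr demo_bitmask operator| (demo_bitmask a,
                                           const demo_bitmask &b)
  {
    for (std::size_t i = 0; i < num_words; ++i)
      a.m_words[i] |= b.m_words[i];
    return a;
  }

  friend constexpr bool operator== (const demo_bitmask &a,
                                    const demo_bitmask &b)
  {
    for (std::size_t i = 0; i < num_words; ++i)
      if (a.m_words[i] != b.m_words[i])
        return false;
    return true;
  }
};

// The target would pick the width through an alias.
using demo_feature_flags = demo_bitmask<128>;

// Hypothetical stand-ins for AARCH64_FL_* and aarch64_isa_flags; the bit
// numbers are made up for the example.
const demo_feature_flags DEMO_FL_LS64 = demo_feature_flags::bit (70);
const demo_feature_flags DEMO_FL_SVE = demo_feature_flags::bit (3);
demo_feature_flags demo_isa_flags;

// One generic query macro instead of one macro per feature.
#define DEMO_HAVE_ISA(X) (static_cast<bool> (demo_isa_flags & DEMO_FL_##X))

inline void demo_bitmask_selfcheck ()
{
  demo_isa_flags = DEMO_FL_LS64 | demo_feature_flags (1);
  assert (DEMO_HAVE_ISA (LS64));
  assert (!DEMO_HAVE_ISA (SVE));
  // Comparison against 0 works through the uint64_t constructor.
  assert (!(demo_isa_flags == 0));
}
```

Because the bool conversion is explicit, a stray use of such a value where an
integer is expected would fail to compile rather than silently truncate.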

Thanks,
Richard


[PATCH] RISC-V: Remove dead perm series code and document.

2024-05-17 Thread Robin Dapp
Hi,

with the introduction of shuffle_series_patterns the explicit handler
code for a perm series is dead.  This patch removes it and also adds
a function-level comment to shuffle_series_patterns.

Regtested on rv64gcv_zvfh_zvbb.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_const_vector): Document.
(shuffle_extract_and_slide1up_patterns): Remove.
---
 gcc/config/riscv/riscv-v.cc | 26 --
 1 file changed, 4 insertions(+), 22 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 8b41b9c7774..93c2dcd04e4 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1485,28 +1485,6 @@ expand_const_vector (rtx target, rtx src)
  emit_vlmax_insn (code_for_pred_merge (mode), MERGE_OP, ops);
}
}
-  else if (npatterns == 1 && nelts_per_pattern == 3)
-   {
- /* Generate the following CONST_VECTOR:
-{ base0, base1, base1 + step, base1 + step * 2, ... }  */
- rtx base0 = builder.elt (0);
- rtx base1 = builder.elt (1);
- rtx base2 = builder.elt (2);
-
- rtx step = simplify_binary_operation (MINUS, builder.inner_mode (),
-   base2, base1);
-
- /* Step 1 - { base1, base1 + step, base1 + step * 2, ... }  */
- rtx tmp = gen_reg_rtx (mode);
- expand_vec_series (tmp, base1, step);
- /* Step 2 - { base0, base1, base1 + step, base1 + step * 2, ... }  */
- if (!rtx_equal_p (base0, const0_rtx))
-   base0 = force_reg (builder.inner_mode (), base0);
-
- insn_code icode = optab_handler (vec_shl_insert_optab, mode);
- gcc_assert (icode != CODE_FOR_nothing);
- emit_insn (GEN_FCN (icode) (target, tmp, base0));
-   }
   else
/* TODO: We will enable more variable-length vector in the future.  */
gcc_unreachable ();
@@ -3580,6 +3558,10 @@ shuffle_extract_and_slide1up_patterns (struct expand_vec_perm_d *d)
   return true;
 }
 
+/* This looks for a series pattern in the provided vector permute structure D.
+   If successful it emits a series insn as well as a gather to implement it.
+   Return true if successful, false otherwise.  */
+
 static bool
 shuffle_series_patterns (struct expand_vec_perm_d *d)
 {
-- 
2.45.0


[PATCH] RISC-V: Add vector popcount, clz, ctz.

2024-05-17 Thread Robin Dapp
Hi,

this patch adds the zvbb vcpop, vclz and vctz to the autovec machinery
as well as tests for them.  It also changes several non-VLS iterators
to V_VLS iterators for consistency.
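
As an illustration of the kind of source loop these expanders are aimed at, here is a minimal sketch using the GCC builtins.  The function names are invented for this example, it assumes a 32-bit unsigned int, and whether a given loop is actually vectorized to vcpop.v/vclz.v/vctz.v depends on the options and the cost model.

```c
#include <stdint.h>

/* Element-wise population count.  */
void
popcount_u32 (uint32_t *restrict dst, const uint32_t *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = (uint32_t) __builtin_popcount (src[i]);
}

/* Element-wise count of leading zeros.  __builtin_clz is undefined
   for a zero argument, so callers must pass nonzero elements.  */
void
clz_u32 (uint32_t *restrict dst, const uint32_t *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = (uint32_t) __builtin_clz (src[i]);
}

/* Element-wise count of trailing zeros, same nonzero restriction.  */
void
ctz_u32 (uint32_t *restrict dst, const uint32_t *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = (uint32_t) __builtin_ctz (src[i]);
}
```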

Regtested on rv64gcv_zvfh_zvbb.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/autovec.md (ctz2): New expander.
(clz2): Ditto.
* config/riscv/generic-vector-ooo.md: Add bitmanip ops to insn
reservation.
* config/riscv/vector-crypto.md: Add VLS modes to insns.
* config/riscv/vector.md: Add bitmanip ops to mode_idx and other
attributes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/popcount-1.c: Adjust check
for zvbb.
* gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/popcount-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/popcount-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/popcount-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/clz-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/clz-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/clz-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/ctz-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/ctz-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/ctz-template.h: New test.
---
 gcc/config/riscv/autovec.md   | 30 +-
 gcc/config/riscv/generic-vector-ooo.md|  2 +-
 gcc/config/riscv/vector-crypto.md | 93 ++-
 gcc/config/riscv/vector.md| 14 +--
 .../gcc.target/riscv/rvv/autovec/unop/clz-1.c |  8 ++
 .../riscv/rvv/autovec/unop/clz-run.c  | 36 +++
 .../riscv/rvv/autovec/unop/clz-template.h | 21 +
 .../gcc.target/riscv/rvv/autovec/unop/ctz-1.c |  8 ++
 .../riscv/rvv/autovec/unop/ctz-run.c  | 36 +++
 .../riscv/rvv/autovec/unop/ctz-template.h | 21 +
 .../riscv/rvv/autovec/unop/popcount-1.c   |  4 +-
 .../riscv/rvv/autovec/unop/popcount-2.c   |  4 +-
 .../riscv/rvv/autovec/unop/popcount-3.c   |  8 ++
 .../riscv/rvv/autovec/unop/popcount-run-1.c   |  3 +-
 .../rvv/autovec/unop/popcount-template.h  | 21 +
 15 files changed, 250 insertions(+), 59 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/clz-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/clz-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/clz-template.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/ctz-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/ctz-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/ctz-template.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount-template.h

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index aa1ae0fe075..a9391ed146c 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1566,7 +1566,7 @@ (define_expand "xorsign3"
 })
 
 ;; 
---
-;; - [INT] POPCOUNT.
+;; - [INT] POPCOUNT, CTZ and CLZ.
 ;; 
---
 
 (define_expand "popcount2"
@@ -1574,10 +1574,36 @@ (define_expand "popcount2"
(match_operand:V_VLSI 1 "register_operand")]
   "TARGET_VECTOR"
 {
-  riscv_vector::expand_popcount (operands);
+  if (!TARGET_ZVBB)
+riscv_vector::expand_popcount (operands);
+  else
+{
+  riscv_vector::emit_vlmax_insn (code_for_pred_v (POPCOUNT, mode),
+riscv_vector::CPOP_OP, operands);
+}
   DONE;
 })
 
+(define_expand "ctz2"
+  [(match_operand:V_VLSI 0 "register_operand")
+   (match_operand:V_VLSI 1 "register_operand")]
+  "TARGET_ZVBB"
+  {
+riscv_vector::emit_vlmax_insn (code_for_pred_v (CTZ, mode),
+  riscv_vector::CPOP_OP, operands);
+DONE;
+})
+
+(define_expand "clz2"
+  [(match_operand:V_VLSI 0 "register_operand")
+   (match_operand:V_VLSI 1 "register_operand")]
+  "TARGET_ZVBB"
+  {
+riscv_vector::emit_vlmax_insn (code_for_pred_v (CLZ, mode),
+  riscv_vector::CPOP_OP, operands);
+DONE;
+})
+
 
 ;; -
 ;;  [INT] Highpart multiplication
diff --git a/gcc/config/riscv/generic-vector-ooo.md b/gcc/config/riscv/generic-vector-ooo.md
index 96cb1a0be29..5e933c83841 100644
--- a/gcc/config/riscv/generic-vector-ooo.md
+++ b/gcc/config/riscv/generic-vector-ooo.md
@@ -74,7 +74,7 @@ (define_insn_reservation "vec_fmul" 6
 
 ;; Vector crypto, assumed to be a generic operation for now.
 (define_insn_reservation "vec_crypto" 4
-  (eq_attr "type" "crypto")
+  (eq_attr "type" 

[PATCH] RISC-V: Add vandn combine helper.

2024-05-17 Thread Robin Dapp
Hi,

this patch adds a combine pattern for vandn as well as tests for it.
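
For context, the not + and idiom the combine pattern targets looks roughly like the loop below.  This is a sketch modelled on how the test calls vandn_##TYPE and on its expected result; the name andn_u32 is made up here.  The pointers are deliberately not restrict-qualified because the run test passes the same array as destination and first source.

```c
#include <stdint.h>

/* dst[i] = a[i] & ~b[i]: a bitwise NOT feeding a bitwise AND, which
   is the shape a single and-not instruction can implement.  */
void
andn_u32 (uint32_t *dst, const uint32_t *a, const uint32_t *b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = a[i] & ~b[i];
}
```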

Regtested on rv64gcv_zvfh_zvbb.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*vandn_): New pattern.
* config/riscv/vector.md: Add vandn to mode_idx.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vandn-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vandn-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vandn-template.h: New test.
---
 gcc/config/riscv/autovec-opt.md   | 18 +++
 gcc/config/riscv/vector.md|  2 +-
 .../riscv/rvv/autovec/binop/vandn-1.c |  8 +++
 .../riscv/rvv/autovec/binop/vandn-run.c   | 54 +++
 .../riscv/rvv/autovec/binop/vandn-template.h  | 38 +
 5 files changed, 119 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-template.h

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 06438f9e2f7..07372d965b0 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1559,3 +1559,21 @@ (define_insn_and_split "*vwsll_zext1_trunc_scalar_"
 DONE;
   }
   [(set_attr "type" "vwsll")])
+
+;; vnot + vand = vandn.
+(define_insn_and_split "*vandn_"
+ [(set (match_operand:V_VLSI 0 "register_operand" "=vr")
+   (and:V_VLSI
+(not:V_VLSI
+  (match_operand:V_VLSI  2 "register_operand"  "vr"))
+(match_operand:V_VLSI1 "register_operand"  "vr")))]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vandn (mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vandn")])
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index c6a3845dc13..dafcd7d9bf9 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -743,7 +743,7 @@ (define_attr "mode_idx" ""
vfcmp,vfminmax,vfsgnj,vfclass,vfmerge,vfmov,\

vfcvtitof,vfncvtitof,vfncvtftoi,vfncvtftof,vmalu,vmiota,vmidx,\

vimovxv,vfmovfv,vslideup,vslidedown,vislide1up,vislide1down,vfslide1up,vfslide1down,\
-   vgather,vcompress,vmov,vnclip,vnshift")
+   vgather,vcompress,vmov,vnclip,vnshift,vandn")
   (const_int 0)
 
   (eq_attr "type" "vimovvx,vfmovvf")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-1.c
new file mode 100644
index 000..3bb5bf8dd5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-1.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-add-options "riscv_v" } */
+/* { dg-add-options "riscv_zvbb" } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model" } */
+
+#include "vandn-template.h"
+
+/* { dg-final { scan-assembler-times {\tvandn\.vv} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-run.c
new file mode 100644
index 000..243c5975068
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vandn-run.c
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-require-effective-target "riscv_zvbb_ok" } */
+/* { dg-add-options "riscv_v" } */
+/* { dg-add-options "riscv_zvbb" } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model" } */
+
+#include "vandn-template.h"
+
+#include <assert.h>
+
+#define SZ 512
+
+#define RUN(TYPE, VAL) \
+  TYPE a##TYPE[SZ]; \
+  TYPE b##TYPE[SZ]; \
+  for (int i = 0; i < SZ; i++) \
+    { \
+      a##TYPE[i] = 123; \
+      b##TYPE[i] = VAL; \
+    } \
+  vandn_##TYPE (a##TYPE, a##TYPE, b##TYPE, SZ); \
+  for (int i = 0; i < SZ; i++) \
+    assert (a##TYPE[i] == (TYPE) (123 & ~VAL));
+
+#define RUN2(TYPE, VAL) \
+  TYPE as##TYPE[SZ]; \
+  for (int i = 0; i < SZ; i++) \
+as##TYPE[i] = 123;  

[PATCH] RISC-V: Use widening shift for scatter/gather if applicable.

2024-05-17 Thread Robin Dapp
Hi,

with the zvbb extension we can emit a widening shift for scatter/gather
index preparation in case we need to multiply by 2 and zero extend.

The patch also adds vwsll to the mode_idx attribute and removes the
mode from shift-count operand of the insn pattern.
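
To make the index preparation concrete, here is a scalar sketch of what "zero extend and multiply by 2" means for a gather of 2-byte elements with 32-bit indices.  The helper names are hypothetical and this only models the arithmetic, not the RTL the patch emits.  The point is that the zero extension and the shift by one can be done in one widening step.

```c
#include <stdint.h>

/* Zero-extend a 32-bit index to 64 bits and shift left by one,
   i.e. turn an element index into a byte offset for 2-byte elements.
   Doing the shift in the wide type keeps the top index bit.  */
uint64_t
index_to_byte_offset (uint32_t idx)
{
  return (uint64_t) idx << 1;
}

/* Scalar model of a gather load of uint16_t elements.  */
void
gather_u16 (uint16_t *dst, const uint16_t *base, const uint32_t *idx, int n)
{
  for (int i = 0; i < n; i++)
    {
      uint64_t off = index_to_byte_offset (idx[i]);
      dst[i] = *(const uint16_t *) ((const char *) base + off);
    }
}
```

Note that a shift performed in the narrow 32-bit type would drop the top bit of a large index, which is why the extension has to happen before (or together with) the shift.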

Regtested on rv64gcv_zvfh_zvbb.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_gather_scatter): Use vwsll if
applicable.
* config/riscv/vector-crypto.md: Remove mode from vwsll shift
count operator.
* config/riscv/vector.md: Add vwsll to mode iterator.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add zvbb.
* gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c: New test.
---
 gcc/config/riscv/riscv-v.cc   |  42 +--
 gcc/config/riscv/vector-crypto.md |   4 +-
 gcc/config/riscv/vector.md|   4 +-
 .../gather-scatter/gather_load_64-12-zvbb.c   | 113 ++
 gcc/testsuite/lib/target-supports.exp |  48 +++-
 5 files changed, 193 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/gather_load_64-12-zvbb.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 814c5febabe..8b41b9c7774 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -4016,7 +4016,7 @@ expand_gather_scatter (rtx *ops, bool is_load)
 {
   rtx ptr, vec_offset, vec_reg;
   bool zero_extend_p;
-  int scale_log2;
+  int shift;
   rtx mask = ops[5];
   rtx len = ops[6];
   if (is_load)
@@ -4025,7 +4025,7 @@ expand_gather_scatter (rtx *ops, bool is_load)
   ptr = ops[1];
   vec_offset = ops[2];
   zero_extend_p = INTVAL (ops[3]);
-  scale_log2 = exact_log2 (INTVAL (ops[4]));
+  shift = exact_log2 (INTVAL (ops[4]));
 }
   else
 {
@@ -4033,7 +4033,7 @@ expand_gather_scatter (rtx *ops, bool is_load)
   ptr = ops[0];
   vec_offset = ops[1];
   zero_extend_p = INTVAL (ops[2]);
-  scale_log2 = exact_log2 (INTVAL (ops[3]));
+  shift = exact_log2 (INTVAL (ops[3]));
 }
 
   machine_mode vec_mode = GET_MODE (vec_reg);
@@ -4043,9 +4043,12 @@ expand_gather_scatter (rtx *ops, bool is_load)
   poly_int64 nunits = GET_MODE_NUNITS (vec_mode);
   bool is_vlmax = is_vlmax_len_p (vec_mode, len);
 
+  bool use_widening_shift = false;
+
   /* Extend the offset element to address width.  */
   if (inner_offsize < BITS_PER_WORD)
 {
+  use_widening_shift = TARGET_ZVBB && zero_extend_p && shift == 1;
   /* 7.2. Vector Load/Store Addressing Modes.
 If the vector offset elements are narrower than XLEN, they are
 zero-extended to XLEN before adding to the ptr effective address. If
@@ -4054,8 +4057,8 @@ expand_gather_scatter (rtx *ops, bool is_load)
 raise an illegal instruction exception if the EEW is not supported for
 offset elements.
 
-RVV spec only refers to the scale_log == 0 case.  */
-  if (!zero_extend_p || scale_log2 != 0)
+RVV spec only refers to the shift == 0 case.  */
+  if (!zero_extend_p || shift)
{
  if (zero_extend_p)
inner_idx_mode
@@ -4064,19 +4067,32 @@ expand_gather_scatter (rtx *ops, bool is_load)
inner_idx_mode = int_mode_for_size (BITS_PER_WORD, 0).require ();
  machine_mode new_idx_mode
= get_vector_mode (inner_idx_mode, nunits).require ();
- rtx tmp = gen_reg_rtx (new_idx_mode);
- emit_insn (gen_extend_insn (tmp, vec_offset, new_idx_mode, idx_mode,
- zero_extend_p ? true : false));
- vec_offset = tmp;
+ if (!use_widening_shift)
+   {
+ rtx tmp = gen_reg_rtx (new_idx_mode);
+ emit_insn (gen_extend_insn (tmp, vec_offset, new_idx_mode, idx_mode,
+ zero_extend_p ? true : false));
+ vec_offset = tmp;
+   }
  idx_mode = new_idx_mode;
}
 }
 
-  if (scale_log2 != 0)
+  if (shift)
 {
-  rtx tmp = expand_binop (idx_mode, ashl_optab, vec_offset,
- gen_int_mode (scale_log2, Pmode), NULL_RTX, 0,
- OPTAB_DIRECT);
+  rtx tmp;
+  if (!use_widening_shift)
+   tmp = expand_binop (idx_mode, ashl_optab, vec_offset,
+   gen_int_mode (shift, Pmode), NULL_RTX, 0,
+   OPTAB_DIRECT);
+  else
+   {
+ tmp = gen_reg_rtx (idx_mode);
+ insn_code icode = code_for_pred_vwsll_scalar (idx_mode);
+ rtx ops[] = {tmp, vec_offset, const1_rtx};
+ emit_vlmax_insn (icode, BINARY_OP, ops);
+   }
+
   vec_offset = tmp;
 }
 
diff --git a/gcc/config/riscv/vector-crypto.md b/gcc/config/riscv/vector-crypto.md
index 24822e2712c..0ddc2f3f3c6 100755
--- a/gcc/config/riscv/vector-crypto.md
+++ b/gcc/config/riscv/vector-crypto.md
@@ 

[PATCH] RISC-V: Add vwsll combine helpers.

2024-05-17 Thread Robin Dapp
Hi,

this patch enables the usage of vwsll in autovec context by adding the
necessary combine patterns and tests.
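
A loop of roughly the following shape is what the combine patterns are meant to catch: a zero extension to the double-width type followed by a left shift.  This is only a sketch with an invented function name; it assumes every shift count is below 16 so the C expression is well defined and the result fits the destination.

```c
#include <stdint.h>

/* dst[i] = zext (a[i]) << b[i], with 8-bit sources and a 16-bit
   destination.  Callers must keep b[i] < 16.  */
void
wsll_u8_to_u16 (uint16_t *restrict dst, const uint8_t *restrict a,
                const uint8_t *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = (uint16_t) ((uint16_t) a[i] << b[i]);
}
```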

Regtested on rv64gcv_zvfh_zvbb.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*vwsll_zext1_): New
pattern.
(*vwsll_zext2_): Ditto.
(*vwsll_zext1_scalar_): Ditto.
(*vwsll_zext1_trunc_): Ditto.
(*vwsll_zext2_trunc_): Ditto.
(*vwsll_zext1_trunc_scalar_): Ditto.
* config/riscv/vector-crypto.md: Make pattern similar to other
narrowing/widening patterns.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vwsll-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vwsll-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vwsll-template.h: New test.
---
 gcc/config/riscv/autovec-opt.md   | 123 ++
 gcc/config/riscv/vector-crypto.md |   2 +-
 .../riscv/rvv/autovec/binop/vwsll-1.c |  10 ++
 .../riscv/rvv/autovec/binop/vwsll-run.c   |  67 ++
 .../riscv/rvv/autovec/binop/vwsll-template.h  |  49 +++
 5 files changed, 250 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vwsll-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vwsll-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vwsll-template.h

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 645dc53d868..06438f9e2f7 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1436,3 +1436,126 @@ (define_insn_and_split "*n"
 DONE;
   }
   [(set_attr "type" "vmalu")])
+
+;; vzext.vf2 + vsll = vwsll.
+(define_insn_and_split "*vwsll_zext1_"
+  [(set (match_operand:VWEXTI 0"register_operand" "=vr 
")
+  (ashift:VWEXTI
+   (zero_extend:VWEXTI
+ (match_operand: 1 "register_operand" " vr "))
+ (match_operand: 2 "vector_shift_operand" "vrvk")))]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vwsll (mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vwsll")])
+
+(define_insn_and_split "*vwsll_zext2_"
+  [(set (match_operand:VWEXTI 0"register_operand" "=vr 
")
+  (ashift:VWEXTI
+   (zero_extend:VWEXTI
+ (match_operand: 1 "register_operand" " vr "))
+   (zero_extend:VWEXTI
+ (match_operand: 2 "vector_shift_operand" "vrvk"]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vwsll (mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vwsll")])
+
+
+(define_insn_and_split "*vwsll_zext1_scalar_"
+  [(set (match_operand:VWEXTI 0"register_operand"  
  "=vr")
+  (ashift:VWEXTI
+   (zero_extend:VWEXTI
+ (match_operand: 1 "register_operand"" 
vr"))
+ (match_operand:2 "vector_scalar_shift_operand" " 
rK")))]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+if (GET_CODE (operands[2]) == SUBREG)
+  operands[2] = SUBREG_REG (operands[2]);
+insn_code icode = code_for_pred_vwsll_scalar (mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vwsll")])
+
+;; For
+;;   uint16_t dst;
+;;   uint8_t a, b;
+;;   dst = vwsll (a, b)
+;; we seem to create
+;;   aa = (int) a;
+;;   bb = (int) b;
+;;   dst = (short) vwsll (aa, bb);
+;; The following patterns help to combine this idiom into one vwsll.
+
+(define_insn_and_split "*vwsll_zext1_trunc_"
+  [(set (match_operand: 0   "register_operand""=vr ")
+(truncate:
+  (ashift:VQEXTI
+   (zero_extend:VQEXTI
+ (match_operand: 1   "register_operand" " vr "))
+   (match_operand:VQEXTI   2   "vector_shift_operand" "vrvk"]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vwsll (mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vwsll")])
+
+(define_insn_and_split "*vwsll_zext2_trunc_"
+  [(set (match_operand: 0   "register_operand""=vr ")
+(truncate:
+  (ashift:VQEXTI
+   (zero_extend:VQEXTI
+ (match_operand: 1   "register_operand" " vr "))
+   (zero_extend:VQEXTI
+ (match_operand: 2   "vector_shift_operand" "vrvk")]
+  "TARGET_ZVBB && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+insn_code icode = code_for_pred_vwsll (mode);
+riscv_vector::emit_vlmax_insn (icode, riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+  [(set_attr "type" "vwsll")])
+

[PATCH] RISC-V: Split vwadd.wx and vwsub.wx and add helpers.

2024-05-17 Thread Robin Dapp
Hi,

vwadd.wx and vwsub.wx have the same problem vfwadd.wf had.  This patch
splits the insn pattern in the same way vfwadd.wf was split.

It also adds two patterns to recognize extended scalars.  In practice
those do not provide a lot of improvement over what we already have but
in some instances we can get rid of redundant extensions.  If somebody
considers the patterns excessive, I'd be open to not adding them.
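
As a rough scalar picture of the operation involved (not of the patterns themselves): vwadd.wx and vwsub.wx take a vector that is already wide and a scalar of the narrow element width, extend the scalar, and add or subtract it per element.  The function names below are made up for this sketch, with int32_t as the narrow and int64_t as the wide type.

```c
#include <stdint.h>

/* wide[i] + sext (x): the scalar has the narrow element width and
   is sign-extended before the element-wise add.  */
void
wadd_wx_i32 (int64_t *restrict dst, const int64_t *restrict wide,
             int32_t x, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = wide[i] + (int64_t) x;
}

/* wide[i] - sext (x), the matching subtract.  */
void
wsub_wx_i32 (int64_t *restrict dst, const int64_t *restrict wide,
             int32_t x, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = wide[i] - (int64_t) x;
}
```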

Regtested on rv64gcv_zvfh_zvbb.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/vector.md: Split vwadd.wx/vwsub.wx pattern and
add extended_scalar patterns.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr115068.c: Add vwadd.wx/vwsub.wx
tests.
* gcc.target/riscv/rvv/base/pr115068-run.c: Include pr115068.c.
* gcc.target/riscv/rvv/base/vwaddsub-1.c: New test.
---
 gcc/config/riscv/vector.md| 62 ---
 .../gcc.target/riscv/rvv/base/pr115068-run.c  | 24 +--
 .../gcc.target/riscv/rvv/base/pr115068.c  | 26 
 .../gcc.target/riscv/rvv/base/vwaddsub-1.c| 47 ++
 4 files changed, 127 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vwaddsub-1.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 107914afa3a..248461302dd 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -3900,27 +3900,71 @@ (define_insn 
"@pred_single_widen_add"
(set_attr "mode" "")])
 
 (define_insn 
"@pred_single_widen__scalar"
-  [(set (match_operand:VWEXTI 0 "register_operand"   "=vr,   
vr")
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, 
vr, vr")
(if_then_else:VWEXTI
  (unspec:
-   [(match_operand: 1 "vector_mask_operand"   
"vmWc1,vmWc1")
-(match_operand 5 "vector_length_operand"  "   rK,   
rK")
-(match_operand 6 "const_int_operand"  "i,
i")
-(match_operand 7 "const_int_operand"  "i,
i")
-(match_operand 8 "const_int_operand"  "i,
i")
+   [(match_operand: 1 "vector_mask_operand"   " 
vm,vm,Wc1,Wc1")
+(match_operand 5 "vector_length_operand"  " rK,rK, rK, 
rK")
+(match_operand 6 "const_int_operand"  "  i, i,  i, 
 i")
+(match_operand 7 "const_int_operand"  "  i, i,  i, 
 i")
+(match_operand 8 "const_int_operand"  "  i, i,  i, 
 i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (plus_minus:VWEXTI
-   (match_operand:VWEXTI 3 "register_operand" "   vr,   
vr")
+   (match_operand:VWEXTI 3 "register_operand" " vr,vr, vr, 
vr")
(any_extend:VWEXTI
  (vec_duplicate:
-   (match_operand: 4 "reg_or_0_operand"   "   rJ,   
rJ"
- (match_operand:VWEXTI 2 "vector_merge_operand"   "   vu,
0")))]
+   (match_operand: 4 "reg_or_0_operand"   " rJ,rJ, rJ, 
rJ"
+ (match_operand:VWEXTI 2 "vector_merge_operand"   " vu, 0, vu, 
 0")))]
   "TARGET_VECTOR"
   "vw.wx\t%0,%3,%z4%p1"
   [(set_attr "type" "vi")
(set_attr "mode" "")])
 
+(define_insn "@pred_single_widen_add_extended_scalar"
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, 
vr, vr")
+   (if_then_else:VWEXTI
+ (unspec:
+   [(match_operand: 1 "vector_mask_operand"   " 
vm,vm,Wc1,Wc1")
+(match_operand 5 "vector_length_operand"  " rK,rK, rK, 
rK")
+(match_operand 6 "const_int_operand"  "  i, i,  i, 
 i")
+(match_operand 7 "const_int_operand"  "  i, i,  i, 
 i")
+(match_operand 8 "const_int_operand"  "  i, i,  i, 
 i")
+(reg:SI VL_REGNUM)
+(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+ (plus:VWEXTI
+   (vec_duplicate:VWEXTI
+ (any_extend:
+   (match_operand: 4 "reg_or_0_operand"   " rJ,rJ, rJ, 
rJ")))
+   (match_operand:VWEXTI 3 "register_operand" " vr,vr, vr, 
vr"))
+ (match_operand:VWEXTI 2 "vector_merge_operand"   " vu, 0, vu, 
 0")))]
+  "TARGET_VECTOR"
+  "vwadd.wx\t%0,%3,%z4%p1"
+  [(set_attr "type" "viwalu")
+   (set_attr "mode" "")])
+
+(define_insn "@pred_single_widen_sub_extended_scalar"
+  [(set (match_operand:VWEXTI 0 "register_operand" "=vd,vd, 
vr, vr")
+   (if_then_else:VWEXTI
+ (unspec:
+   [(match_operand: 1 "vector_mask_operand"   " 
vm,vm,Wc1,Wc1")
+(match_operand 5 "vector_length_operand"  " rK,rK, rK, 
rK")
+(match_operand 6 "const_int_operand"  "  i, i,  i, 
 i")
+(match_operand 7 

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-17 Thread Victor Do Nascimento

Dear Richard and Tamar,

Thanks to both of you for the various bits of feedback.
I've implemented all of the more straightforward suggestions,
leaving "only" the merging of the two- and four-way dot product optabs
into one, together with the necessary changes to the various backends
which, though a little time-consuming, should be rather mechanical.


I had originally implemented the new two-way dotprod optab as a convert 
optab anyway, so going back to the work on that local branch will give 
me a good starting point from which to do this.


And Tamar, thanks very much for the feedback regarding the unit-tests. 
I knew my testing as it currently is was rather anaemic and was eager to 
get the relevant feedback on it.  Rest assured it's all been taken on board.


Cheers,
Victor


On 5/17/24 11:13, Richard Biener wrote:

On Fri, May 17, 2024 at 11:56 AM Tamar Christina
 wrote:



-Original Message-
From: Richard Biener 
Sent: Friday, May 17, 2024 10:46 AM
To: Tamar Christina 
Cc: Victor Do Nascimento ; gcc-
patc...@gcc.gnu.org; Richard Sandiford ; Richard
Earnshaw ; Victor Do Nascimento

Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
autovectorizer

On Fri, May 17, 2024 at 11:05 AM Tamar Christina
 wrote:



-Original Message-
From: Richard Biener 
Sent: Friday, May 17, 2024 6:51 AM
To: Victor Do Nascimento 
Cc: gcc-patches@gcc.gnu.org; Richard Sandiford

;

Richard Earnshaw ; Victor Do Nascimento

Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
autovectorizer

On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
 wrote:


From: Victor Do Nascimento 

At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
optabs for dealing with vectorizable dot product code sequences.  The
consequence of using a direct optab for this is that backend-pattern
selection is only ever able to match against one datatype - Either
that of the operands or of the accumulated value, never both.

With the introduction of the 2-way (un)signed dot-product insn [1][2]
in AArch64 SVE2, the existing direct opcode approach is no longer
sufficient for full specification of all the possible dot product
machine instructions to be matched to the code sequence; a dot product
resulting in VNx4SI may result from either dot products on VNx16QI or
VNx8HI values for the 4- and 2-way dot product operations, respectively.

This means that the following example fails autovectorization:

uint32_t foo(int n, uint16_t* data) {
   uint32_t sum = 0;
   for (int i=0; i

I don't like this too much.  I'll note we document dot_prod as

@cindex @code{sdot_prod@var{m}} instruction pattern
@item @samp{sdot_prod@var{m}}

Compute the sum of the products of two signed elements.
Operand 1 and operand 2 are of the same mode. Their
product, which is of a wider mode, is computed and added to operand 3.
Operand 3 is of a mode equal or wider than the mode of the product. The
result is placed in operand 0, which is of the same mode as operand 3.
@var{m} is the mode of operand 1 and operand 2.

with no restriction on the wider mode but we don't specify it which is
bad design.  This should have been a convert optab with two modes
from the start - adding a _twoway variant is just a hack.
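
To spell out why a single mode is not enough to identify the operation, here is a scalar sketch of one accumulator lane for the two variants discussed in this thread.  The function names are invented for the example.  Both accumulate into the same 32-bit type, but one consumes four 8-bit pairs and the other two 16-bit pairs, so the accumulator mode alone cannot tell them apart.

```c
#include <stdint.h>

/* Four-way signed dot product for one lane: four 8-bit products
   added to a 32-bit accumulator.  */
int32_t
sdot4_i8 (int32_t acc, const int8_t *a, const int8_t *b)
{
  for (int i = 0; i < 4; i++)
    acc += (int32_t) a[i] * (int32_t) b[i];
  return acc;
}

/* Two-way signed dot product for one lane: two 16-bit products
   added to a 32-bit accumulator.  */
int32_t
sdot2_i16 (int32_t acc, const int16_t *a, const int16_t *b)
{
  for (int i = 0; i < 2; i++)
    acc += (int32_t) a[i] * (int32_t) b[i];
  return acc;
}
```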


We did discuss this at the time we started implementing it.  There were two
options, one was indeed to change it to a convert dot_prod optab, but doing
this means we have to update every target that uses it.

Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and
altivec.

Which sure could be possible, but there's also every use in the backends that
needs to be updated, and tested, which for some targets we don't even know how
to begin.

So it seems very hard to correct dotprod to a convert optab now.


It's still the correct way to go.  At _least_ your new pattern should
have been this,
otherwise what do you do when you have two-way, four-way and eight-way
variants?
Add yet another optab?


I guess that's fair, but having the new optab only be convert resulted in messy
code as everywhere you must check for both variants.

Additionally that optab would then overlap with the existing optabs as, as you
say, the documentation only says it's of a wider type and doesn't indicate
precision.

So to avoid issues down the line, if the new optab isn't acceptable then
we'll have to do a wholesale conversion.


Yep.  It shouldn't be difficult though.



Another thing is that when you do it your way you should fix the existing optab
to be two-way by documenting how the second mode derives from the first.

And sure, it's not the only optab suffering from this issue.


Sure, all the zero and sign extending optabs for instance 


But for example the scalar ones are correct:

OPTAB_CL(sext_optab, "extend$b$a2", SIGN_EXTEND, "extend",
gen_extend_conv_libfunc)

Richard.


Tamar



Richard.


Tamar



Richard.


In order to minimize changes to the existing codebase,
`optab_for_tree_code' is 

RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-17 Thread Tamar Christina
Hi Pan,

> 
> Hi Tamar,
> 
> I am trying to add more shape(s) like the branch version below for SAT_ADD. I
> suspect that widening_mul may not be the best place to take care of this shape,
> because after_dom_children mostly works on a single bb but we actually need to
> find the def/use across bbs.

It actually already does this, see for example optimize_spaceship which
optimizes across basic blocks. However...

> 
> Thus, is there any suggestion for branch shape? Add new simplify to match.pd
> works well but it is not recommended per previous discussion.

The previous objection was to introducing the IFNs in match.pd; it doesn't
mean we can't use match.pd to force the versions with branches into branchless
code so the existing patterns can deal with them as is.

...in this case something like this:

#if GIMPLE
(simplify
 (cond (ge (plus:c@3 @0 @1) @0) @3 integer_minus_onep)
  (if (direct_internal_fn_supported_p (...))
   (bit_ior @3 (negate (...)
#endif

Works better I think.

That is, for targets where we know we can optimize it later on, or do something
with it in the vectorizer, we canonicalize it.  The reason I have it guarded
with the IFN is that some target maintainers objected to replacing the branch
code with branchless code as their targets can deal with branches more
optimally.
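
For reference, the two source-level forms being related here can be written out in C as below.  This is just an illustration for uint16_t with made-up function names: the first is the branch shape from Pan's example, the second is the branchless shape from the patch description.

```c
#include <stdint.h>

/* Branch form: keep the sum unless it wrapped, else return the max.  */
uint16_t
sat_add_branch_u16 (uint16_t x, uint16_t y)
{
  uint16_t sum = (uint16_t) (x + y);
  return sum >= x ? sum : (uint16_t) -1;
}

/* Branchless form: (x + y) | -(TYPE) ((TYPE) (x + y) < x).
   The comparison yields 0 or 1; negating it gives an all-zeros or
   all-ones mask that is ORed into the sum.  */
uint16_t
sat_add_branchless_u16 (uint16_t x, uint16_t y)
{
  uint16_t sum = (uint16_t) (x + y);
  return (uint16_t) (sum | -(uint16_t) (sum < x));
}
```

Both functions return the same value for every input pair, which is what makes the canonicalization safe on targets that want it.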

Cheers,
Tamar
> 
> Thanks a lot for help!
> 
> Pan
> 
> ---Source code-
> 
> #define SAT_ADD_U_1(T) \
> T sat_add_u_1_##T(T x, T y) \
> { \
>   return (T)(x + y) >= x ? (x + y) : -1; \
> }
> 
> SAT_ADD_U_1(uint16_t)
> 
> ---Gimple-
> 
> uint16_t sat_add_u_1_uint16_t (uint16_t x, uint16_t y)
> {
>   short unsigned int _1;
>   uint16_t _2;
> 
>[local count: 1073741824]:
>   _1 = x_3(D) + y_4(D);
>   if (_1 >= x_3(D))
> goto ; [65.00%]
>   else
> goto ; [35.00%]
> 
>[local count: 697932184]:
> 
>[local count: 1073741824]:
>   # _2 = PHI <65535(2), _1(3)>
>   return _2;
> }
> 
> Pan
> 
> -Original Message-
> From: Tamar Christina 
> Sent: Wednesday, May 15, 2024 5:12 PM
> To: Li, Pan2 ; gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com;
> Liu, Hongtao 
> Subject: RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned
> scalar int
> 
> Hi Pan,
> 
> Thanks!
> 
> > -Original Message-
> > From: pan2...@intel.com 
> > Sent: Wednesday, May 15, 2024 3:14 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> > ; richard.guent...@gmail.com;
> > hongtao@intel.com; Pan Li 
> > Subject: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned
> scalar
> > int
> >
> > From: Pan Li 
> >
> > This patch adds the middle-end representation for the
> > saturating add, i.e. the result of the add is set to the max on overflow.
> > It matches a pattern similar to the one below.
> >
> > SAT_ADD (x, y) => (x + y) | (-(TYPE)((TYPE)(x + y) < x))
> >
> > Take uint8_t as example, we will have:
> >
> > * SAT_ADD (1, 254)   => 255.
> > * SAT_ADD (1, 255)   => 255.
> > * SAT_ADD (2, 255)   => 255.
> > * SAT_ADD (255, 255) => 255.
> >
> > Given below example for the unsigned scalar integer uint64_t:
> >
> > uint64_t sat_add_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) | (- (uint64_t)((uint64_t)(x + y) < x));
> > }
> >
> > Before this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   long unsigned int _1;
> >   _Bool _2;
> >   long unsigned int _3;
> >   long unsigned int _4;
> >   uint64_t _7;
> >   long unsigned int _10;
> >   __complex__ long unsigned int _11;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _11 = .ADD_OVERFLOW (x_5(D), y_6(D));
> >   _1 = REALPART_EXPR <_11>;
> >   _10 = IMAGPART_EXPR <_11>;
> >   _2 = _10 != 0;
> >   _3 = (long unsigned int) _2;
> >   _4 = -_3;
> >   _7 = _1 | _4;
> >   return _7;
> > ;;succ:   EXIT
> >
> > }
> >
> > After this patch:
> > uint64_t sat_add_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _7;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _7 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
> >   return _7;
> > ;;succ:   EXIT
> > }
> >
> > The tests below pass with this patch:
> > 1. The riscv full regression tests.
> > 2. The x86 bootstrap tests.
> > 3. The x86 full regression tests.
> >
> > PR target/51492
> > PR target/112600
> >
> > gcc/ChangeLog:
> >
> > * internal-fn.cc (commutative_binary_fn_p): Add type IFN_SAT_ADD
> > to the return true switch case(s).
> > * internal-fn.def (SAT_ADD):  Add new signed optab SAT_ADD.
> > * match.pd: Add unsigned SAT_ADD match(es).
> > * optabs.def (OPTAB_NL): Remove fixed-point limitation for
> > us/ssadd.
> > * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_add): New
> > extern func decl generated in match.pd match.
> > (match_saturation_arith): New func impl to match the saturation arith.
> > 

Re: [PATCH] Fix overwriting files with fs::copy_file on windows

2024-05-17 Thread Jonathan Wakely
On Sun, 24 Mar 2024 at 21:34, Björn Schäpers  wrote:
>
> From: Björn Schäpers 
>
> This fixes i.e. https://github.com/msys2/MSYS2-packages/issues/1937
> I don't know if I picked the right way to do it.
>
> When acceptable I think the declaration should be moved into
> ops-common.h, since then we could use stat_type and also use that in the
> commonly used function.
>
> Manually tested on i686-w64-mingw32.
>
> -- >8 --
> libstdc++: Fix overwriting files on windows
>
> The inodes have no meaning on windows, thus all files have an inode of
> 0. Use a different approach to identify equivalent files. As a result
> std::filesystem::copy_file did not honor
> copy_options::overwrite_existing. Factored the method out of
> std::filesystem::equivalent.
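
The behaviour the patch is meant to restore can be sketched portably as below. This is only an illustration of the expected semantics, not code from the patch; write_text, read_text, overwrite_existing_demo and the scratch directory name are hypothetical.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

namespace fs = std::filesystem;

// Replace the contents of PATH with TEXT.
inline void write_text(const fs::path& path, const std::string& text)
{
  std::ofstream out(path, std::ios::trunc);
  out << text;
}

// Return the whole contents of PATH.
inline std::string read_text(const fs::path& path)
{
  std::ifstream in(path);
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}

// Copy one file over a different, already existing file and return the
// destination's contents afterwards.  DIR_NAME is a scratch directory
// created under the system temporary directory and removed at the end.
inline std::string overwrite_existing_demo(const std::string& dir_name)
{
  const fs::path dir = fs::temp_directory_path() / dir_name;
  fs::create_directories(dir);
  const fs::path from = dir / "from.txt";
  const fs::path to = dir / "to.txt";
  write_text(from, "new");
  write_text(to, "old");

  std::string result;
  // Two distinct files must not compare equivalent, so copy_file with
  // overwrite_existing is expected to replace the destination.
  if (!fs::equivalent(from, to)
      && fs::copy_file(from, to, fs::copy_options::overwrite_existing))
    result = read_text(to);

  fs::remove_all(dir);
  return result;
}
```

If equivalent() wrongly reports two distinct files as the same file, copy_file has to treat the copy as an error, which matches the symptom described above.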
>
> libstdc++-v3/Changelog:
>
> * include/bits/fs_ops.h: Add declaration of
>   __detail::equivalent_win32.
> * src/c++17/fs_ops.cc (__detail::equivalent_win32): Implement it
> (fs::equivalent): Use __detail::equivalent_win32, factored the
> old test out.
> * src/filesystem/ops-common.h (_GLIBCXX_FILESYSTEM_IS_WINDOWS):
>   Use the function.
>
> Signed-off-by: Björn Schäpers 
> ---
>  libstdc++-v3/include/bits/fs_ops.h   |  8 +++
>  libstdc++-v3/src/c++17/fs_ops.cc | 79 +---
>  libstdc++-v3/src/filesystem/ops-common.h | 10 ++-
>  3 files changed, 60 insertions(+), 37 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/fs_ops.h 
> b/libstdc++-v3/include/bits/fs_ops.h
> index 90650c47b46..d10b78a4bdd 100644
> --- a/libstdc++-v3/include/bits/fs_ops.h
> +++ b/libstdc++-v3/include/bits/fs_ops.h
> @@ -40,6 +40,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
>  namespace filesystem
>  {
> +#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
> +namespace __detail
> +{
> +  bool
> +  equivalent_win32(const wchar_t* p1, const wchar_t* p2, error_code& ec);

I don't think we want this declared in the public header, it should be
internal to the library.

Can it just be declared in ops-common.h instead?


> +} // namespace __detail
> +#endif //_GLIBCXX_FILESYSTEM_IS_WINDOWS
> +
>/** @addtogroup filesystem
> *  @{
> */
> diff --git a/libstdc++-v3/src/c++17/fs_ops.cc 
> b/libstdc++-v3/src/c++17/fs_ops.cc
> index 61df19753ef..3cc87d45237 100644
> --- a/libstdc++-v3/src/c++17/fs_ops.cc
> +++ b/libstdc++-v3/src/c++17/fs_ops.cc
> @@ -67,6 +67,49 @@
>  namespace fs = std::filesystem;
>  namespace posix = std::filesystem::__gnu_posix;
>
> +#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
> +bool
> +fs::__detail::equivalent_win32(const wchar_t* p1, const wchar_t* p2,
> +  error_code& ec)
> +{
> +  struct auto_handle {
> +explicit auto_handle(const path& p_)
> +: handle(CreateFileW(p_.c_str(), 0,
> +   FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE,
> +   0, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, 0))
> +{ }
> +
> +~auto_handle()
> +{ if (*this) CloseHandle(handle); }
> +
> +explicit operator bool() const
> +{ return handle != INVALID_HANDLE_VALUE; }
> +
> +bool get_info()
> +{ return GetFileInformationByHandle(handle, &info); }
> +
> +HANDLE handle;
> +BY_HANDLE_FILE_INFORMATION info;
> +  };
> +  auto_handle h1(p1);

This creates a new filesystem::path, just to call c_str() on it
immediately. The auto_handle ctor should be changed to just take const
wchar_t* so we don't need to allocate and parse new path objects.

> +  auto_handle h2(p2);
> +  if (!h1 || !h2)
> +{
> +  if (!h1 && !h2)
> +   ec = __last_system_error();
> +  return false;
> +}
> +  if (!h1.get_info() || !h2.get_info())
> +{
> +  ec = __last_system_error();
> +  return false;
> +}
> +  return h1.info.dwVolumeSerialNumber == h2.info.dwVolumeSerialNumber
> +&& h1.info.nFileIndexHigh == h2.info.nFileIndexHigh
> +&& h1.info.nFileIndexLow == h2.info.nFileIndexLow;
> +}
> +#endif //_GLIBCXX_FILESYSTEM_IS_WINDOWS
> +
>  fs::path
>  fs::absolute(const path& p)
>  {
> @@ -858,41 +901,7 @@ fs::equivalent(const path& p1, const path& p2, 
> error_code& ec) noexcept
>if (st1.st_mode != st2.st_mode || st1.st_dev != st2.st_dev)
> return false;
>
> -  struct auto_handle {
> -   explicit auto_handle(const path& p_)
> -   : handle(CreateFileW(p_.c_str(), 0,
> - FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE,
> - 0, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, 0))
> -   { }
> -
> -   ~auto_handle()
> -   { if (*this) CloseHandle(handle); }
> -
> -   explicit operator bool() const
> -   { return handle != INVALID_HANDLE_VALUE; }
> -
> -   bool get_info()
> -   { return GetFileInformationByHandle(handle, &info); }
> -
> -   HANDLE handle;
> -   BY_HANDLE_FILE_INFORMATION info;
> -  };
> -  auto_handle h1(p1);
> -  auto_handle h2(p2);
> -  if (!h1 || !h2)
> -   {
> - if (!h1 && !h2)
> -   ec = __last_system_error();
> -   

Re: [Patch, aarch64] v6: Preparatory patch to place target independent and,dependent changed code in one file

2024-05-17 Thread Ajit Agarwal
Hello Alex:

On 17/05/24 6:22 pm, Alex Coplan wrote:
> Hi Ajit,
> 
> On 17/05/2024 18:05, Ajit Agarwal wrote:
>> Hello Alex:
>>
>> On 16/05/24 10:21 pm, Alex Coplan wrote:
>>> Hi Ajit,
>>>
>>> Thanks a lot for working through the review feedback.
>>>
>>
>> Thanks a lot for reviewing the code and approving the patch.
> 
> To be clear, I didn't approve the patch because I can't, I just said
> that it looks good to me.  You need an AArch64 maintainer (probably
> Richard S) to approve it.
> 

That's what I meant. Sorry for the confusion.
>>
>>> The patch LGTM with the two minor suggested changes below.  I can't
>>> approve the patch, though, so you'll need an OK from Richard S.
>>>
>>> Also, I'm not sure if it makes sense to apply the patch in isolation, it
>>> might make more sense to only apply it in series with follow-up patches to:
>>>  - Finish renaming any bits of the generic code that need renaming (I
>>>guess we'll want to rename at least ldp_bb_info to something else,
>>>probably there are other bits too).
>>>  - Move the generic parts out of gcc/config/aarch64 to a .cc file in the
>>>middle-end.
>>>
>>
>> Addressed in separate patch sent.
> 
> Hmm, that doesn't look right.  You sent a single patch here:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652028.html
> which looks to squash the work you've done in this patch together with
> the move.
> 
> What I expect to see is a patch series, as follows:
> 
> [PATCH 1/3] aarch64: Split generic code from aarch64 code in ldp fusion
> [PATCH 2/3] aarch64: Further renaming of generic code
> [PATCH 3/3] aarch64, middle-end: Move pair_fusion pass from aarch64 to 
> middle-end
> 
> where 1/3 is exactly the patch that I reviewed above with the two
> (minor) requested changes (plus any changes requested by Richard), 2/3
> (optionally) does further renaming to use generic terminology in the
> generic code where needed/desired, and 3/3 does a straight cut/paste
> move of code into pair-fusion.h and pair-fusion.cc, with no other
> changes (save for perhaps a Makefile change and adding an include in
> aarch64-ldp-fusion.cc).
> 
> Arguably you could split this even further and do the move of the
> pair_fusion class to the new header in a separate patch prior to the
> final move.
> 
> N.B. (IMO) the patches should be presented like this both for review and
> (if approved) when committing.
> 
> Richard S may have further suggestions on how to split the patches /
> make them more tractable to review, I think this is the bare minimum
> that is needed though.
> 

Sure, I will make patches as per above.

> Hope that makes sense.
> 
> Thanks,
> Alex
>

Thanks & Regards
Ajit
 
>>  
>>> I'll let Richard S make the final judgement on that.  I don't really
>>> mind either way.
>>
>> Sure.
>>
>> Thanks & Regards
>> Ajit
>>>
>>> On 15/05/2024 15:06, Ajit Agarwal wrote:
 Hello Alex/Richard:

 All review comments are addressed.

 Common infrastructure of load store pair fusion is divided into target
 independent and target dependent changed code.

 Target independent code is the Generic code with pure virtual function
 to interface between target independent and dependent code.

 Target dependent code is the implementation of pure virtual function for
 aarch64 target and the call to target independent code.

 Bootstrapped and regtested on aarch64-linux-gnu.

 Thanks & Regards
 Ajit

 aarch64: Preparatory patch to place target independent and
 dependent changed code in one file

 Common infrastructure of load store pair fusion is divided into target
 independent and target dependent changed code.

 Target independent code is the Generic code with pure virtual function
 to interface between target independent and dependent code.

 Target dependent code is the implementation of pure virtual function for
 aarch64 target and the call to target independent code.

 2024-05-15  Ajit Kumar Agarwal  

 gcc/ChangeLog:

* config/aarch64/aarch64-ldp-fusion.cc: Place target
independent and dependent changed code.
 ---
  gcc/config/aarch64/aarch64-ldp-fusion.cc | 533 +++
  1 file changed, 357 insertions(+), 176 deletions(-)

 diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
 b/gcc/config/aarch64/aarch64-ldp-fusion.cc
 index 1d9caeab05d..429e532ea3b 100644
 --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
 +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
 @@ -138,6 +138,225 @@ struct alt_base
poly_int64 offset;
  };
  
 +// Virtual base class for load/store walkers used in alias analysis.
 +struct alias_walker
 +{
 +  virtual bool conflict_p (int &budget) const = 0;
 +  virtual insn_info *insn () const = 0;
 +  virtual bool valid () const = 0;
 +  virtual void advance () = 0;
 +};
 +
 +// When querying 

Re: [PATCH] Use DW_TAG_module for Ada

2024-05-17 Thread Jakub Jelinek
On Fri, May 03, 2024 at 11:08:04AM -0600, Tom Tromey wrote:
> DWARF is not especially clear on the distinction between
> DW_TAG_namespace and DW_TAG_module, but I think that DW_TAG_module is
> more appropriate for Ada.  This patch changes the compiler to do this.
> Note that the Ada compiler does not yet create NAMESPACE_DECLs.
> 
> gcc
> 
>   * dwarf2out.cc (gen_namespace_die): Use DW_TAG_module for Ada.

Ok, thanks.

> diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
> index 1b0e8b5a5b2..1e46c27cdf7 100644
> --- a/gcc/dwarf2out.cc
> +++ b/gcc/dwarf2out.cc
> @@ -26992,7 +26992,7 @@ gen_namespace_die (tree decl, dw_die_ref context_die)
>  {
>/* Output a real namespace or module.  */
>context_die = setup_namespace_context (decl, comp_unit_die ());
> -  namespace_die = new_die (is_fortran () || is_dlang ()
> +  namespace_die = new_die (is_fortran () || is_dlang () || is_ada ()
>  ? DW_TAG_module : DW_TAG_namespace,
>  context_die, decl);
>/* For Fortran modules defined in different CU don't add src coords.  
> */
> -- 
> 2.44.0

Jakub
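
As a rough illustration of the selection logic in the hunk above, the choice of tag can be modelled as a small pure function. The tag values are the ones assigned by the DWARF standard; the source_language enumeration and the function name namespace_like_tag are hypothetical and exist only for this sketch.

```cpp
#include <cstdint>

// Tag values as assigned by the DWARF standard.
enum dwarf_tag : std::uint16_t
{
  DW_TAG_module = 0x1e,
  DW_TAG_namespace = 0x39
};

// Hypothetical language enumeration, used only for this sketch.
enum class source_language { c_plus_plus, fortran, d, ada };

// Languages with a module concept get DW_TAG_module; everything else
// keeps DW_TAG_namespace.
constexpr dwarf_tag namespace_like_tag(source_language lang)
{
  return (lang == source_language::fortran
          || lang == source_language::d
          || lang == source_language::ada)
           ? DW_TAG_module
           : DW_TAG_namespace;
}
```

With the patch applied, Ada joins Fortran and D on the DW_TAG_module side of that conditional.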



[COMMITTED] [prange] Drop range to VARYING if the bitmask intersection made it so [PR115131]

2024-05-17 Thread Aldy Hernandez
If the intersection of the bitmasks made the range span the entire
domain, normalize the range to VARYING.

gcc/ChangeLog:

PR middle-end/115131
* value-range.cc (prange::intersect): Set VARYING if intersection
of bitmasks made the range span the entire domain.
(range_tests_misc): New test.
---
 gcc/value-range.cc | 21 +
 1 file changed, 21 insertions(+)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 334ffb70fbc..b38d6159a85 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -589,6 +589,11 @@ prange::intersect (const vrange &v)
   irange_bitmask new_bitmask = get_bitmask_from_range (m_type, m_min, m_max);
   m_bitmask.intersect (new_bitmask);
   m_bitmask.intersect (r.m_bitmask);
+  if (varying_compatible_p ())
+{
+  set_varying (type ());
+  return true;
+}
 
   if (flag_checking)
 verify_range ();
@@ -2889,6 +2894,22 @@ range_tests_misc ()
   p0.invert ();
   ASSERT_TRUE (p0 == p1);
 
+  // The intersection of:
+  //[0, +INF] MASK 0xff..00 VALUE 0xf8
+  //[0, +INF] MASK 0xff..00 VALUE 0x00
+  // is [0, +INF] MASK 0xff..ff VALUE 0x00, which is VARYING.
+  // Test that we normalized to VARYING.
+  unsigned prec = TYPE_PRECISION (voidp);
+  p0.set_varying (voidp);
+  wide_int mask = wi::mask (8, true, prec);
+  wide_int value = wi::uhwi (0xf8, prec);
+  irange_bitmask bm (wi::uhwi (0xf8, prec), mask);
+  p0.update_bitmask (bm);
+  p1.set_varying (voidp);
+  bm = irange_bitmask (wi::zero (prec), mask);
+  p1.update_bitmask (bm);
+  p0.intersect (p1);
+
   // [10,20] U [15, 30] => [10, 30].
   r0 = range_int (10, 20);
   r1 = range_int (15, 30);
-- 
2.45.0
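
The situation described in the test comment can be modelled with plain 64-bit integers. The sketch below is a simplified stand-in, not the irange_bitmask implementation: known_bits and intersect are hypothetical names, and the policy of dropping to "all unknown" on conflicting known bits is an assumption taken from the comment in the new test.

```cpp
#include <cstdint>

// Simplified 64-bit model of a known-bits pair: a set bit in MASK means
// "unknown"; where a MASK bit is clear, VALUE holds the known bit.
struct known_bits
{
  std::uint64_t value;
  std::uint64_t mask;

  // True when no bit is known at all.
  bool unknown_p() const { return mask == ~std::uint64_t{0}; }
};

// Combine two known-bits pairs that describe the same quantity.
inline known_bits intersect(known_bits a, known_bits b)
{
  const std::uint64_t known_in_both = ~(a.mask | b.mask);
  // If a bit is known on both sides but with different values, give up
  // and report everything as unknown.
  if (known_in_both & (a.value ^ b.value))
    return {0, ~std::uint64_t{0}};
  // Otherwise a bit is known if either side knows it.
  return {a.value | b.value, a.mask & b.mask};
}
```

In this model the case from the test (low byte known to be 0xf8 on one side and 0x00 on the other) ends up with every bit unknown, and combined with a [0, +INF] range that carries no information, which is why normalizing to VARYING is the natural result.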



Re: [PATCH-4, rs6000] Implement optab_isnormal for SFmode, DFmode and TFmode [PR97786]

2024-05-17 Thread Segher Boessenkool
On Fri, May 17, 2024 at 10:38:54AM +0800, HAO CHEN GUI wrote:
> This expand calls gen_xststdcp which is a P9 vector instruction and
> relies on "TARGET_P9_VECTOR". So I set the condition.

Why?  It needs P9, sure, and MSR[VSX] set, but the operands being VSX
registers takes care of that, heh.

But it's fine, the insn patterns it uses use the same conditions
already.


Segher


[PATCH] libstdc++: Implement std::formatter<std::thread::id> without <sstream> [PR115099]

2024-05-17 Thread Jonathan Wakely
Does anybody see any issue with the drive-by fixes to constrain
std::formatter<std::thread::id> to only work for pointers and integers
(since we don't know how to format pthread_t if it's an arbitrary struct,
for example), and to cast pointers to const void* for output (because if
pthread_t is char* then writing it to a stream would be bad! and we
don't want to allow users to overload operator<< for pointers to opaque
structs, for example)? I don't think this will change anything in
practice for common targets, where pthread_t is either an integer or
void*.

Tested x86_64-linux.

-- >8 --

The std::thread::id formatter uses std::basic_ostringstream without
including <sstream>, which went unnoticed because the test for it uses
a stringstream to check the output is correct.

The fix implemented here is to stop using basic_ostringstream for
formatting thread::id and just use std::format instead.

As a drive-by fix, the formatter specialization is constrained to
require that the thread::id::native_handle_type can be formatted, to
avoid making the formatter ill-formed if the pthread_t type is not a
pointer or integer. Since non-void pointers can't be formatted, ensure
that we convert pointers to const void* for formatting. Make a similar
change to the existing operator<< overload so that in the unlikely case
that pthread_t is a typedef for char* we don't treat it as a
null-terminated string when inserting into a stream.
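
The pointer-to-const-void* conversion described above can be sketched in isolation as follows. This is an illustration of the idea only, not the libstdc++ code; format_handle is a hypothetical helper, and it deliberately uses a string stream just to make the stream-insertion hazard visible.

```cpp
#include <sstream>
#include <string>
#include <type_traits>

// Format a native-handle-like value.  Pointers are first converted to
// const void* so that a char* is never printed as a C string.
template <typename Handle>
std::string format_handle(Handle h)
{
  using output_type =
      typename std::conditional<std::is_pointer<Handle>::value,
                                const void*, Handle>::type;
  std::ostringstream out;
  out << static_cast<output_type>(h);
  return out.str();
}
```

With an integer handle this prints the number; with a const char* handle it prints an implementation-defined address representation rather than the pointed-to characters.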

libstdc++-v3/ChangeLog:

PR libstdc++/115099
* include/bits/std_thread.h: Declare formatter as friend of
thread::id.
* include/std/thread (operator<<): Convert non-void pointers to
void pointers for output.
(formatter): Add constraint that thread::native_handle_type is a
pointer or integer.
(formatter::format): Reimplement without basic_ostringstream.
* testsuite/30_threads/thread/id/output.cc: Check output
compiles before <sstream> has been included.
---
 libstdc++-v3/include/bits/std_thread.h| 11 -
 libstdc++-v3/include/std/thread   | 43 ++-
 .../testsuite/30_threads/thread/id/output.cc  | 21 -
 3 files changed, 63 insertions(+), 12 deletions(-)

diff --git a/libstdc++-v3/include/bits/std_thread.h 
b/libstdc++-v3/include/bits/std_thread.h
index 2d7df12650d..5817bfb29dd 100644
--- a/libstdc++-v3/include/bits/std_thread.h
+++ b/libstdc++-v3/include/bits/std_thread.h
@@ -53,6 +53,10 @@ namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
+#if __glibcxx_formatters
+  template class formatter;
+#endif
+
   /** @addtogroup threads
*  @{
*/
@@ -117,13 +121,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
friend basic_ostream<_CharT, _Traits>&
operator<<(basic_ostream<_CharT, _Traits>& __out, id __id);
+
+#if __glibcxx_formatters
+  friend formatter;
+  friend formatter;
+#endif
 };
 
   private:
 id _M_id;
 
 // _GLIBCXX_RESOLVE_LIB_DEFECTS
-// 2097.  packaged_task constructors should be constrained
+// 2097. packaged_task constructors should be constrained
 // 3039. Unnecessary decay in thread and packaged_task
 template
   using __not_same = __not_, thread>>;
diff --git a/libstdc++-v3/include/std/thread b/libstdc++-v3/include/std/thread
index 09ca3116e7f..e994d683bff 100644
--- a/libstdc++-v3/include/std/thread
+++ b/libstdc++-v3/include/std/thread
@@ -42,10 +42,6 @@
 # include  // std::stop_source, std::stop_token, std::nostopstate
 #endif
 
-#if __cplusplus > 202002L
-# include 
-#endif
-
 #include  // std::thread, get_id, yield
 #include  // std::this_thread::sleep_for, sleep_until
 
@@ -53,6 +49,10 @@
 #define __glibcxx_want_formatters
 #include 
 
+#if __cpp_lib_formatters
+# include 
+#endif
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -104,10 +104,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 inline basic_ostream<_CharT, _Traits>&
 operator<<(basic_ostream<_CharT, _Traits>& __out, thread::id __id)
 {
+  // Convert non-void pointers to const void* for formatted output.
+  using __output_type
+   = __conditional_t::value,
+ const void*,
+ thread::native_handle_type>;
+
   if (__id == thread::id())
return __out << "thread::id of a non-executing thread";
   else
-   return __out << __id._M_thread;
+   return __out << static_cast<__output_type>(__id._M_thread);
 }
   /// @}
 
@@ -287,8 +293,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif // __cpp_lib_jthread
 
 #ifdef __cpp_lib_formatters // C++ >= 23
-
   template
+requires is_pointer_v
+  || is_integral_v
 class formatter
 {
 public:
@@ -331,10 +338,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
typename basic_format_context<_Out, _CharT>::iterator
format(thread::id __id, basic_format_context<_Out, _CharT>& __fc) const
{
- std::basic_ostringstream<_CharT> 

Re: [PATCH] Use DW_TAG_module for Ada

2024-05-17 Thread Tom Tromey
> "Tom" == Tom Tromey  writes:

Tom> DWARF is not especially clear on the distinction between
Tom> DW_TAG_namespace and DW_TAG_module, but I think that DW_TAG_module is
Tom> more appropriate for Ada.  This patch changes the compiler to do this.
Tom> Note that the Ada compiler does not yet create NAMESPACE_DECLs.

Ping.

Tom


Re: [Patch, aarch64] v6: Preparatory patch to place target independent and,dependent changed code in one file

2024-05-17 Thread Alex Coplan
Hi Ajit,

On 17/05/2024 18:05, Ajit Agarwal wrote:
> Hello Alex:
> 
> On 16/05/24 10:21 pm, Alex Coplan wrote:
> > Hi Ajit,
> > 
> > Thanks a lot for working through the review feedback.
> > 
> 
> Thanks a lot for reviewing the code and approving the patch.

To be clear, I didn't approve the patch because I can't, I just said
that it looks good to me.  You need an AArch64 maintainer (probably
Richard S) to approve it.

> 
> > The patch LGTM with the two minor suggested changes below.  I can't
> > approve the patch, though, so you'll need an OK from Richard S.
> > 
> > Also, I'm not sure if it makes sense to apply the patch in isolation, it
> > might make more sense to only apply it in series with follow-up patches to:
> >  - Finish renaming any bits of the generic code that need renaming (I
> >guess we'll want to rename at least ldp_bb_info to something else,
> >probably there are other bits too).
> >  - Move the generic parts out of gcc/config/aarch64 to a .cc file in the
> >middle-end.
> >
> 
> Addressed in separate patch sent.

Hmm, that doesn't look right.  You sent a single patch here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652028.html
which looks to squash the work you've done in this patch together with
the move.

What I expect to see is a patch series, as follows:

[PATCH 1/3] aarch64: Split generic code from aarch64 code in ldp fusion
[PATCH 2/3] aarch64: Further renaming of generic code
[PATCH 3/3] aarch64, middle-end: Move pair_fusion pass from aarch64 to 
middle-end

where 1/3 is exactly the patch that I reviewed above with the two
(minor) requested changes (plus any changes requested by Richard), 2/3
(optionally) does further renaming to use generic terminology in the
generic code where needed/desired, and 3/3 does a straight cut/paste
move of code into pair-fusion.h and pair-fusion.cc, with no other
changes (save for perhaps a Makefile change and adding an include in
aarch64-ldp-fusion.cc).

Arguably you could split this even further and do the move of the
pair_fusion class to the new header in a separate patch prior to the
final move.

N.B. (IMO) the patches should be presented like this both for review and
(if approved) when committing.

Richard S may have further suggestions on how to split the patches /
make them more tractable to review, I think this is the bare minimum
that is needed though.

Hope that makes sense.

Thanks,
Alex

>  
> > I'll let Richard S make the final judgement on that.  I don't really
> > mind either way.
> 
> Sure.
> 
> Thanks & Regards
> Ajit
> > 
> > On 15/05/2024 15:06, Ajit Agarwal wrote:
> >> Hello Alex/Richard:
> >>
> >> All review comments are addressed.
> >>
> >> Common infrastructure of load store pair fusion is divided into target
> >> independent and target dependent changed code.
> >>
> >> Target independent code is the Generic code with pure virtual function
> >> to interface between target independent and dependent code.
> >>
> >> Target dependent code is the implementation of pure virtual function for
> >> aarch64 target and the call to target independent code.
> >>
> >> Bootstrapped and regtested on aarch64-linux-gnu.
> >>
> >> Thanks & Regards
> >> Ajit
> >>
> >> aarch64: Preparatory patch to place target independent and
> >> dependent changed code in one file
> >>
> >> Common infrastructure of load store pair fusion is divided into target
> >> independent and target dependent changed code.
> >>
> >> Target independent code is the Generic code with pure virtual function
> >> to interface between target independent and dependent code.
> >>
> >> Target dependent code is the implementation of pure virtual function for
> >> aarch64 target and the call to target independent code.
> >>
> >> 2024-05-15  Ajit Kumar Agarwal  
> >>
> >> gcc/ChangeLog:
> >>
> >>* config/aarch64/aarch64-ldp-fusion.cc: Place target
> >>independent and dependent changed code.
> >> ---
> >>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 533 +++
> >>  1 file changed, 357 insertions(+), 176 deletions(-)
> >>
> >> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> >> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >> index 1d9caeab05d..429e532ea3b 100644
> >> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> >> @@ -138,6 +138,225 @@ struct alt_base
> >>poly_int64 offset;
> >>  };
> >>  
> >> +// Virtual base class for load/store walkers used in alias analysis.
> >> +struct alias_walker
> >> +{
> >> +  virtual bool conflict_p (int &budget) const = 0;
> >> +  virtual insn_info *insn () const = 0;
> >> +  virtual bool valid () const = 0;
> >> +  virtual void advance () = 0;
> >> +};
> >> +
> >> +// When querying handle_writeback_opportunities, this enum is used to
> >> +// qualify which opportunities we are asking about.
> >> +enum class writeback {
> >> +  // Only those writeback opportunities that arise from existing
> >> +  // auto-increment accesses.
> >> +  EXISTING,
> > 
> > Very 

[committed] libstdc++: Fix typo in _Grapheme_cluster_view::_Iterator [PR115119]

2024-05-17 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk, gcc-14 backport to follow.

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/115119
* include/bits/unicode.h (_Iterator::operator++(int)): Fix typo
in increment expression.
* testsuite/ext/unicode/grapheme_view.cc: Check post-increment
on view's iterator.
---
 libstdc++-v3/include/bits/unicode.h |  6 --
 libstdc++-v3/testsuite/ext/unicode/grapheme_view.cc | 11 +++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/unicode.h 
b/libstdc++-v3/include/bits/unicode.h
index 46238143fb6..a14a17c5dfc 100644
--- a/libstdc++-v3/include/bits/unicode.h
+++ b/libstdc++-v3/include/bits/unicode.h
@@ -34,10 +34,12 @@
 #include 
 #include   // bit_width
 #include  // __detail::__from_chars_alnum_to_val_table
+#include 
 #include 
 #include 
 #include 
-#include 
+#include  // iterator_t, sentinel_t, input_range, etc.
+#include  // view_interface
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -802,7 +804,7 @@ inline namespace __v15_1_0
operator++(int)
{
  auto __tmp = *this;
- ++this;
+ ++*this;
  return __tmp;
}
 
diff --git a/libstdc++-v3/testsuite/ext/unicode/grapheme_view.cc 
b/libstdc++-v3/testsuite/ext/unicode/grapheme_view.cc
index ac1e8c50b05..a3bb36e14b8 100644
--- a/libstdc++-v3/testsuite/ext/unicode/grapheme_view.cc
+++ b/libstdc++-v3/testsuite/ext/unicode/grapheme_view.cc
@@ -83,10 +83,21 @@ test_breaks()
   VERIFY( iter == gv.end() );
 }
 
+constexpr void
+test_pr115119()
+{
+  // PR 115119 Typo in _Grapheme_cluster_view::_Iterator::operator++(int)
+  uc::_Grapheme_cluster_view gv(" "sv);
+  auto it = std::ranges::begin(gv);
+  it++;
+  VERIFY( it == std::ranges::end(gv) );
+}
+
 int main()
 {
   auto run_tests = []{
 test_breaks();
+test_pr115119();
 return true;
   };
 
-- 
2.45.0



Re: [Patch, aarch64] v6: Preparatory patch to place target independent and,dependent changed code in one file

2024-05-17 Thread Ajit Agarwal
Hello Alex:

On 16/05/24 10:21 pm, Alex Coplan wrote:
> Hi Ajit,
> 
> Thanks a lot for working through the review feedback.
> 

Thanks a lot for reviewing the code and approving the patch.

> The patch LGTM with the two minor suggested changes below.  I can't
> approve the patch, though, so you'll need an OK from Richard S.
> 
> Also, I'm not sure if it makes sense to apply the patch in isolation, it
> might make more sense to only apply it in series with follow-up patches to:
>  - Finish renaming any bits of the generic code that need renaming (I
>guess we'll want to rename at least ldp_bb_info to something else,
>probably there are other bits too).
>  - Move the generic parts out of gcc/config/aarch64 to a .cc file in the
>middle-end.
>

Addressed in separate patch sent.
 
> I'll let Richard S make the final judgement on that.  I don't really
> mind either way.

Sure.

Thanks & Regards
Ajit
> 
> On 15/05/2024 15:06, Ajit Agarwal wrote:
>> Hello Alex/Richard:
>>
>> All review comments are addressed.
>>
>> Common infrastructure of load store pair fusion is divided into target
>> independent and target dependent changed code.
>>
>> Target independent code is the Generic code with pure virtual function
>> to interface between target independent and dependent code.
>>
>> Target dependent code is the implementation of pure virtual function for
>> aarch64 target and the call to target independent code.
>>
>> Bootstrapped and regtested on aarch64-linux-gnu.
>>
>> Thanks & Regards
>> Ajit
>>
>> aarch64: Preparatory patch to place target independent and
>> dependent changed code in one file
>>
>> Common infrastructure of load store pair fusion is divided into target
>> independent and target dependent changed code.
>>
>> Target independent code is the Generic code with pure virtual function
>> to interface between target independent and dependent code.
>>
>> Target dependent code is the implementation of pure virtual function for
>> aarch64 target and the call to target independent code.
>>
>> 2024-05-15  Ajit Kumar Agarwal  
>>
>> gcc/ChangeLog:
>>
>>  * config/aarch64/aarch64-ldp-fusion.cc: Place target
>>  independent and dependent changed code.
>> ---
>>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 533 +++
>>  1 file changed, 357 insertions(+), 176 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
>> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> index 1d9caeab05d..429e532ea3b 100644
>> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> @@ -138,6 +138,225 @@ struct alt_base
>>poly_int64 offset;
>>  };
>>  
>> +// Virtual base class for load/store walkers used in alias analysis.
>> +struct alias_walker
>> +{
>> +  virtual bool conflict_p (int &budget) const = 0;
>> +  virtual insn_info *insn () const = 0;
>> +  virtual bool valid () const = 0;
>> +  virtual void advance () = 0;
>> +};
>> +
>> +// When querying handle_writeback_opportunities, this enum is used to
>> +// qualify which opportunities we are asking about.
>> +enum class writeback {
>> +  // Only those writeback opportunities that arise from existing
>> +  // auto-increment accesses.
>> +  EXISTING,
> 
> Very minor nit: I think an extra blank line here would be nice for readability
> now that the enumerators have comments above.
> 
>> +  // All writeback opportunities including those that involve folding
>> +  // base register updates into a non-writeback pair.
>> +  ALL
>> +};
>> +
> 
> Can we have a block comment here which describes the purpose of the
> class and how it fits together with the target?  Something like the
> following would do:
> 
> // This class can be overridden by targets to give a pass that fuses
> // adjacent loads and stores into load/store pair instructions.
> //
> // The target can override the various virtual functions to customize
> // the behaviour of the pass as appropriate for the target.
> 
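
The split between generic and target code described in that comment can be sketched with a toy pass. Everything here is hypothetical (pair_pass, demo_target_pass, the hook names and the size-based pairing rule); it only shows the shape of a generic driver that consults virtual hooks, not the real pair_fusion interface.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the generic pass: it owns the pairing loop
// and asks the target questions through virtual hooks.
struct pair_pass
{
  virtual ~pair_pass() = default;

  // Hook with a default answer: targets override it only if needed.
  virtual bool fp_access_p(int /*reg_class*/) const { return false; }

  // Pure virtual hook: every target must answer this.
  virtual bool access_size_ok_p(int bytes) const = 0;

  // Target-independent driver: count adjacent accesses that have the
  // same size and that the target is willing to pair.
  std::size_t count_pairs(const std::vector<int>& sizes) const
  {
    std::size_t pairs = 0;
    for (std::size_t i = 0; i + 1 < sizes.size(); )
      {
        if (sizes[i] == sizes[i + 1] && access_size_ok_p(sizes[i]))
          {
            ++pairs;
            i += 2;  // both accesses are consumed by the pair
          }
        else
          ++i;
      }
    return pairs;
  }
};

// Hypothetical target that can only pair 4-, 8- and 16-byte accesses.
struct demo_target_pass final : pair_pass
{
  bool access_size_ok_p(int bytes) const override
  { return bytes == 4 || bytes == 8 || bytes == 16; }
};
```

A second target would add another small derived class and reuse count_pairs unchanged, which is the property that makes moving the generic part to the middle-end worthwhile.
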
>> +struct pair_fusion {
>> +  pair_fusion ()
>> +  {
>> +calculate_dominance_info (CDI_DOMINATORS);
>> +df_analyze ();
>> +crtl->ssa = new rtl_ssa::function_info (cfun);
>> +  };
>> +
>> +  // Given:
>> +  // - an rtx REG_OP, the non-memory operand in a load/store insn,
>> +  // - a machine_mode MEM_MODE, the mode of the MEM in that insn, and
>> +  // - a boolean LOAD_P (true iff the insn is a load), then:
>> +  // return true if the access should be considered an FP/SIMD access.
>> +  // Such accesses are segregated from GPR accesses, since we only want
>> +  // to form pairs for accesses that use the same register file.
>> +  virtual bool fpsimd_op_p (rtx, machine_mode, bool)
>> +  {
>> +return false;
>> +  }
>> +
>> +  // Return true if we should consider forming pairs from memory
>> +  // accesses with operand mode MODE at this stage in compilation.
>> +  virtual bool pair_operand_mode_ok_p (machine_mode mode) = 0;
>> +
>> +  // Return true iff REG_OP is a suitable register operand for a paired
>> +  // 

Re: [PATCH v2] c++/modules: Remember that header units have CMIs

2024-05-17 Thread Nathaniel Shead
On Fri, May 17, 2024 at 04:14:31PM +1000, Nathaniel Shead wrote:
> On Tue, May 14, 2024 at 06:21:48PM -0400, Jason Merrill wrote:
> > On 5/12/24 22:58, Nathaniel Shead wrote:
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> > 
> > OK.
> > 
> 
> I realised as I was looking over this again that I might have spoken too
> soon with the header unit example being supported. Doing the following:
> 
>   // a.H
>   struct { int y; } s;
>   decltype(s) f(decltype(s));  // { dg-error "used but never defined" }
>   inline auto x = f({ 123 });
>   
>   // b.C 
>   struct {} unrelated;
>   import "a.H";
>   decltype(s) f(decltype(s) x) {
> return { 456 + x.y };
>   }
> 
>   // c.C
>   import "linkage-3_a.H";
>   int main() { auto a = x.y; }
> 
> Actually does fail to link, because in 'c.C' we call 'f(.anon_0)' but
> the definition 'b.C' is f(.anon_1).
> 
> I don't think this is fixable, so I don't think this direction is
> workable.
> 
> That said, I think that it might still be worth making header modules
> satisfy 'module_has_cmi_p', since that is true to the name, and will be
> useful in other places we currently use 'module_p ()': in which case we
> could instead make all the callers in 'no_linkage_check' do
> 'module_maybe_has_cmi_p () && !header_module_p ()'; something like the
> following, perhaps?
> 
> But I'm not too fussed about this overall if you think this will just
> make things more complicated. Otherwise bootstrapped and regtested (so
> far just modules.exp) on x86_64-pc-linux-gnu, OK for trunk if full
> regtest passes?
> 
> -- >8 --
> 
> This appears to be an oversight in the definition of module_has_cmi_p.
> This change will allow us to use the function directly in more places
> that need to do additional work only if generating a module CMI in the
> future.
> 
> However, we do need to change callers of 'module_maybe_has_cmi_p'; in
> particular header units, though having a CMI, do not provide a TU to
> emit names into, and thus each importer will emit their own definitions
> which may not match for no-linkage types.
> 
> gcc/cp/ChangeLog:
> 
>   * cp-tree.h (module_has_cmi_p): Also true for header units.
>   * decl.cc (grokfndecl): Disallow no-linkage names in header
>   units.
>   * tree.cc (no_linkage_check): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/modules/linkage-3.H: New test.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/cp-tree.h |  2 +-
>  gcc/cp/decl.cc   |  2 +-
>  gcc/cp/tree.cc   | 13 +++-
>  gcc/testsuite/g++.dg/modules/linkage-3.H | 25 
>  4 files changed, 35 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/modules/linkage-3.H
> 
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index ba9e848c177..ac55b5579a1 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -7381,7 +7381,7 @@ inline bool module_interface_p ()
>  inline bool module_partition_p ()
>  { return module_kind & MK_PARTITION; }
>  inline bool module_has_cmi_p ()
> -{ return module_kind & (MK_INTERFACE | MK_PARTITION); }
> +{ return module_kind & (MK_INTERFACE | MK_PARTITION | MK_HEADER); }
>  
>  inline bool module_purview_p ()
>  { return module_kind & MK_PURVIEW; }
> diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> index 6fcab615d55..f89a7df30b7 100644
> --- a/gcc/cp/decl.cc
> +++ b/gcc/cp/decl.cc
> @@ -10802,7 +10802,7 @@ grokfndecl (tree ctype,
> used by an importer.  We don't just use module_has_cmi_p here
> because for entities in the GMF we don't yet know whether this
> module will have a CMI, so we'll conservatively assume it might.  */
> -publicp = module_maybe_has_cmi_p ();
> +publicp = module_maybe_has_cmi_p () && !header_module_p ();
>  
>if (publicp && cxx_dialect == cxx98)
>  {
> diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
> index 9d37d255d8d..00c50e3130d 100644
> --- a/gcc/cp/tree.cc
> +++ b/gcc/cp/tree.cc
> @@ -2974,9 +2974,9 @@ verify_stmt_tree (tree t)
>  
>  /* Check if the type T depends on a type with no linkage and if so,
> return it.  If RELAXED_P then do not consider a class type declared
> -   within a vague-linkage function or in a module CMI to have no linkage,
> -   since it can still be accessed within a different TU.  Remember:
> -   no-linkage is not the same as internal-linkage.  */
> +   within a vague-linkage function or in a non-header module CMI to
> +   have no linkage, since it can still be accessed within a different TU.
> +   Remember: no-linkage is not the same as internal-linkage.  */
>  
>  tree
>  no_linkage_check (tree t, bool relaxed_p)
> @@ -3019,7 +3019,8 @@ no_linkage_check (tree t, bool relaxed_p)
>   {
> if (relaxed_p
> && TREE_PUBLIC (CP_TYPE_CONTEXT (t))
> -   && module_maybe_has_cmi_p ())
> +   && module_maybe_has_cmi_p ()
> +   && !header_module_p ())
>   /* This type 

Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393]

2024-05-17 Thread Richard Biener
On Fri, 17 May 2024, Manolis Tsamis wrote:

> On Fri, May 17, 2024 at 12:22 PM Richard Biener  wrote:
> >
> > On Fri, 17 May 2024, Manolis Tsamis wrote:
> >
> > > Hi Richard,
> > >
> > > While I was re-testing the latest version of this patch I noticed that
> > > it FAILs an AArch64 test, gcc.target/aarch64/subsp.c. With the patch
> > > we generate one instruction more:
> > >
> > > sbfiz   x1, x1, 4, 32
> > > stp x29, x30, [sp, -16]!
> > > add x1, x1, 16
> > > mov x29, sp
> > > sub sp, sp, x1
> > > mov x0, sp
> > > bl  foo
> > >
> > > Instead of:
> > >
> > > stp x29, x30, [sp, -16]!
> > > add w1, w1, 1
> > > mov x29, sp
> > > sub sp, sp, w1, sxtw 4
> > > mov x0, sp
> > > bl  foo
> > >
> > > I've looked at it but can't really find a way to solve the regression.
> > > Any thoughts on this?
> >
> > Can you explain what goes wrong?  As I said rewriting parts of
> > address calculation is tricky, there's always the chance that some
> > cases regress (see your observation in comment#4 of the PR).
> >
> 
> In this case the int -> sizetype cast ends up happening earlier. Instead of
> 
>   _7 = y_6(D) + 1;
>   _1 = (sizetype) _7;
>   _2 = _1 * 16;
> 
> We get
> 
>   _13 = (sizetype) y_6(D);
>   _15 = _13 + 1;
>   _2 = _15 * 16;
> 
> and then in RTL we have
> 
> x1 = ((sizetype) x1) << 4
> sp = sp - (x1 + 16)
> 
> instead of
> 
> x1 = x1 + 1
> sp = sp - ((sizetype) x1) << 4
> 
> which doesn't form sub sp, sp, w1, sxtw 4.
> 
> But more importantly, I realized that (in this case among others) the
> pattern is undone by (A * C) +- (B * C) -> (A+-B) * C and (A * C) +- A
> -> A * (C+-1). AFAIK having one pattern and its reverse is a bad thing
> so something needs to be changed.

Yes, we have that issue.  And we've guarded GIMPLE vs. non-GIMPLE and
have recursion limits in match to deal with this.  But yes, having
both is bad.  I'd say that clearly patterns reducing the number of
operations are good at least for canonicalization.

> One idea could be to only keep the larger one ((T)(A + CST1)) * CST2 +
> CST3 -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3). It's not enough to
> deal with the testcases of the ticket but it does help in other cases.

The issue with such larger patterns is that they hint at the fact
the transform should happen with an eye on more than just the
small expression.  Thus not in match.pd but in a pass like reassoc
or SLSR or IVOPTs or even CSE itself.  We also have to avoid
doing changes that cannot be undone when canonicalizing.

Richard.

> Manolis
> 
> > Note that I still believe that avoiding the early and premature
> > promotion of the addition to unsigned is a good thing.
> >
> > Note the testcase in the PR is fixed with -fwrapv because then
> > we do _not_ perform this premature optimization.  Without -fwrapv
> > the optimization is valid but as you note we do not perform it
> > consistently - otherwise we wouldn't regress.
> >
> > Richard.
> >
> >
> >
> > > Thanks,
> > > Manolis
> > >
> > >
> > >
> > > On Thu, May 16, 2024 at 11:15 AM Richard Biener
> > >  wrote:
> > > >
> > > > On Tue, May 14, 2024 at 10:58 AM Manolis Tsamis 
> > > >  wrote:
> > > > >
> > > > > New patch with the requested changes can be found below.
> > > > >
> > > > > I don't know how much this affects SCEV, but I do believe that we
> > > > > should incorporate this change somehow. I've seen various cases of
> > > > > suboptimal address calculation codegen that boil down to this.
> > > >
> > > > This misses the ChangeLog (I assume it's unchanged) and indent
> > > > of the match.pd part is now off.
> > > >
> > > > Please fix that, the patch is OK with that change.
> > > >
> > > > Thanks,
> > > > Richard.
> > > >
> > > > > gcc/match.pd | 31 +++
> > > > > gcc/testsuite/gcc.dg/pr109393.c | 16 
> > > > > 2 files changed, 47 insertions(+)
> > > > > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> > > > >
> > > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > > index 07e743ae464..1d642c205f0 100644
> > > > > --- a/gcc/match.pd
> > > > > +++ b/gcc/match.pd
> > > > > @@ -3650,6 +3650,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > > > (plus (convert @0) (op @2 (convert @1))
> > > > > #endif
> > > > > +/* ((T)(A + CST1)) * CST2 + CST3
> > > > > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > > > > + Where (A + CST1) doesn't need to have a single use. */
> > > > > +#if GIMPLE
> > > > > + (for op (plus minus)
> > > > > + (simplify
> > > > > + (plus (mult:s (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > > > + INTEGER_CST@3)
> > > > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > > > > + && INTEGRAL_TYPE_P (type)
> > > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > > + && 

Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393]

2024-05-17 Thread Manolis Tsamis
On Fri, May 17, 2024 at 12:22 PM Richard Biener  wrote:
>
> On Fri, 17 May 2024, Manolis Tsamis wrote:
>
> > Hi Richard,
> >
> > While I was re-testing the latest version of this patch I noticed that
> > it FAILs an AArch64 test, gcc.target/aarch64/subsp.c. With the patch
> > we generate one instruction more:
> >
> > sbfiz   x1, x1, 4, 32
> > stp x29, x30, [sp, -16]!
> > add x1, x1, 16
> > mov x29, sp
> > sub sp, sp, x1
> > mov x0, sp
> > bl  foo
> >
> > Instead of:
> >
> > stp x29, x30, [sp, -16]!
> > add w1, w1, 1
> > mov x29, sp
> > sub sp, sp, w1, sxtw 4
> > mov x0, sp
> > bl  foo
> >
> > I've looked at it but can't really find a way to solve the regression.
> > Any thoughts on this?
>
> Can you explain what goes wrong?  As I said rewriting parts of
> address calculation is tricky, there's always the chance that some
> cases regress (see your observation in comment#4 of the PR).
>

In this case the int -> sizetype cast ends up happening earlier. Instead of

  _7 = y_6(D) + 1;
  _1 = (sizetype) _7;
  _2 = _1 * 16;

We get

  _13 = (sizetype) y_6(D);
  _15 = _13 + 1;
  _2 = _15 * 16;

and then in RTL we have

x1 = ((sizetype) x1) << 4
sp = sp - (x1 + 16)

instead of

x1 = x1 + 1
sp = sp - ((sizetype) x1) << 4

which doesn't form sub sp, sp, w1, sxtw 4.

But more importantly, I realized that (in this case among others) the
pattern is undone by (A * C) +- (B * C) -> (A+-B) * C and (A * C) +- A
-> A * (C+-1). AFAIK having one pattern and its reverse is a bad thing
so something needs to be changed.
One idea could be to only keep the larger one ((T)(A + CST1)) * CST2 +
CST3 -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3). It's not enough to
deal with the testcases of the ticket but it does help in other cases.
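
To make the two shapes concrete, here is a minimal C sketch (not taken from
the patch) of the subsp.c case with CST1 = 1 and CST2 = 16.  The function
names are made up for illustration, and int32_t/uint64_t stand in for
int/sizetype.  The two forms agree whenever a + 1 does not overflow, which is
what TYPE_OVERFLOW_UNDEFINED on the narrow type allows the pattern to assume.

```c
#include <assert.h>
#include <stdint.h>

/* Original shape: add in the narrow signed type, then widen and scale.
   Mirrors _7 = y + 1; _1 = (sizetype) _7; _2 = _1 * 16.  */
static uint64_t add_then_widen(int32_t a)
{
    return (uint64_t)(a + 1) * 16;
}

/* Expanded shape: widen first, then scale and add the folded constant
   (T)CST1 * CST2 = 16.  uint64_t arithmetic wraps, so this is well defined.  */
static uint64_t widen_then_add(int32_t a)
{
    return (uint64_t)a * 16 + 16;
}

/* Returns 1 when both shapes give the same value for A.
   A must be below INT32_MAX so that a + 1 does not overflow.  */
static int shapes_agree(int32_t a)
{
    assert(a < INT32_MAX);
    return add_then_widen(a) == widen_then_add(a);
}
```

Both functions return the same value, but only the first keeps the addition
in the narrow type, which is the form that later combines into
sub sp, sp, w1, sxtw 4.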

Manolis

> Note that I still believe that avoiding the early and premature
> promotion of the addition to unsigned is a good thing.
>
> Note the testcase in the PR is fixed with -fwrapv because then
> we do _not_ perform this premature optimization.  Without -fwrapv
> the optimization is valid but as you note we do not perform it
> consistently - otherwise we wouldn't regress.
>
> Richard.
>
>
>
> > Thanks,
> > Manolis
> >
> >
> >
> > On Thu, May 16, 2024 at 11:15 AM Richard Biener
> >  wrote:
> > >
> > > On Tue, May 14, 2024 at 10:58 AM Manolis Tsamis  
> > > wrote:
> > > >
> > > > New patch with the requested changes can be found below.
> > > >
> > > > I don't know how much this affects SCEV, but I do believe that we
> > > > should incorporate this change somehow. I've seen various cases of
> > > > suboptimal address calculation codegen that boil down to this.
> > >
> > > This misses the ChangeLog (I assume it's unchanged) and indent
> > > of the match.pd part is now off.
> > >
> > > Please fix that, the patch is OK with that change.
> > >
> > > Thanks,
> > > Richard.
> > >
> > > > gcc/match.pd | 31 +++
> > > > gcc/testsuite/gcc.dg/pr109393.c | 16 
> > > > 2 files changed, 47 insertions(+)
> > > > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> > > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > index 07e743ae464..1d642c205f0 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -3650,6 +3650,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > > (plus (convert @0) (op @2 (convert @1))
> > > > #endif
> > > > +/* ((T)(A + CST1)) * CST2 + CST3
> > > > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > > > + Where (A + CST1) doesn't need to have a single use. */
> > > > +#if GIMPLE
> > > > + (for op (plus minus)
> > > > + (simplify
> > > > + (plus (mult:s (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > > + INTEGER_CST@3)
> > > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > > > + && INTEGRAL_TYPE_P (type)
> > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > + && TYPE_OVERFLOW_WRAPS (type))
> > > > + (op (mult (convert @0) @2) (plus (mult (convert @1) @2) @3)
> > > > +#endif
> > > > +
> > > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */
> > > > +#if GIMPLE
> > > > + (for op (plus minus)
> > > > + (simplify
> > > > + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > > > + && INTEGRAL_TYPE_P (type)
> > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > + && TYPE_OVERFLOW_WRAPS (type))
> > > > + (op (mult (convert @0) @2) (mult (convert @1) @2)
> > > > +#endif
> > > > +
> > > > /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be 
> > > > simplified
> > > > to a simple value. */
> > > > (for op (plus minus)
> > > > diff 

[COMMITTED] [prange] Avoid looking at type() for undefined ranges

2024-05-17 Thread Aldy Hernandez
Undefined ranges have no type.  This patch fixes the thinko.

gcc/ChangeLog:

PR middle-end/115128
* ipa-cp.cc (ipa_value_range_from_jfunc): Check for undefined_p
before looking at type.
(propagate_vr_across_jump_function): Same.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr115128.c: New test.
---
 gcc/ipa-cp.cc|  4 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr115128.c | 31 
 2 files changed, 35 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr115128.c

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 09cab761822..408166b8044 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -1744,6 +1744,8 @@ ipa_value_range_from_jfunc (vrange ,
 pointer type to hold the result instead of a boolean
 type.  Avoid trapping in the sanity check in
 fold_range until this is fixed.  */
+ || srcvr.undefined_p ()
+ || op_vr.undefined_p ()
  || !handler.operand_check_p (vr_type, srcvr.type (), op_vr.type ())
  || !handler.fold_range (op_res, vr_type, srcvr, op_vr))
op_res.set_varying (vr_type);
@@ -2556,6 +2558,8 @@ propagate_vr_across_jump_function (cgraph_edge *cs, ipa_jump_func *jfunc,
 pointer type to hold the result instead of a boolean
 type.  Avoid trapping in the sanity check in
 fold_range until this is fixed.  */
+ || src_lats->m_value_range.m_vr.undefined_p ()
+ || op_vr.undefined_p ()
  || !handler.operand_check_p (operand_type,
   src_lats->m_value_range.m_vr.type (),
   op_vr.type ())
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr115128.c b/gcc/testsuite/gcc.dg/tree-ssa/pr115128.c
new file mode 100644
index 000..14bd4dbd6e5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr115128.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -w" } */
+
+long XXH3_len_4to8_64b_len, XXH3_len_0to16_64b___trans_tmp_3, XXH3_mix2Accs_acc,
+XXH3_64bits_internal___trans_tmp_8;
+typedef unsigned long XXH3_hashLong64_f();
+void *XXH3_64bits_internal_input;
+int XXH3_64bits_internal___trans_tmp_1;
+void XXH3_mul128_fold64();
+static void XXH3_mergeAccs(unsigned long) {
+  for (;;)
+XXH3_mul128_fold64(XXH3_mix2Accs_acc);
+}
+static __attribute__((noinline)) unsigned long
+XXH3_hashLong_64b_default(void *, unsigned long len) {
+  XXH3_mergeAccs(len * 7);
+}
+__attribute__((always_inline)) long
+XXH3_64bits_internal(unsigned long len, XXH3_hashLong64_f f_hashLong) {
+  if (len <= 16) {
+long keyed =
+XXH3_64bits_internal___trans_tmp_1 ^ XXH3_len_0to16_64b___trans_tmp_3;
+XXH3_mul128_fold64(keyed, XXH3_len_4to8_64b_len);
+return XXH3_64bits_internal___trans_tmp_8;
+  }
+  f_hashLong(XXH3_64bits_internal_input, len);
+}
+static void XXH_INLINE_XXH3_64bits(unsigned long len) {
+  XXH3_64bits_internal(len, XXH3_hashLong_64b_default);
+}
+void __cmplog_rtn_hook() { XXH_INLINE_XXH3_64bits(sizeof(long)); }
-- 
2.45.0



[PATCH v6] RISC-V: Implement IFN SAT_ADD for both the scalar and vector

2024-05-17 Thread pan2 . li
From: Pan Li 

Update in v6:

* Rebase upstream for conflict.

Log for v5:

This patch implements SAT_ADD in the riscv backend as a sample
for both the scalar and vector cases.  Take the vector code below
as an example:

void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  unsigned i;

  for (i = 0; i < n; i++)
    out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
}

Before this patch:
vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v0,0(a1)
  vle64.v v1,0(a2)
  slli a4,a5,3
  sub a3,a3,a5
  add a1,a1,a4
  add a2,a2,a4
  vadd.vv v1,v0,v1
  vmsgtu.vv   v0,v0,v1
  vmerge.vim  v1,v1,-1,v0
  vse64.v v1,0(a0)
  ...

After this patch:
vec_sat_add_u64:
  ...
  vsetvli a5,a3,e64,m1,ta,ma
  vle64.v v1,0(a1)
  vle64.v v2,0(a2)
  slli a4,a5,3
  sub a3,a3,a5
  add a1,a1,a4
  add a2,a2,a4
  vsaddu.vv   v1,v1,v2  <=  Vector Single-Width Saturating Add
  vse64.v v1,0(a0)
  ...
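
For reference, the per-element operation computed by the loop above can be
sketched in scalar C as follows.  This is an illustrative sketch rather than
code from the patch; the names sat_add_u64 and vec_sat_add_u64_ref are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless unsigned saturating add.  If x + y wraps, the sum is smaller
   than x, the comparison yields 1, and negating it gives an all-ones mask
   that forces the result to UINT64_MAX.  Otherwise the mask is 0.  */
static uint64_t sat_add_u64(uint64_t x, uint64_t y)
{
    uint64_t sum = x + y;
    return sum | -(uint64_t)(sum < x);
}

/* Loop form with the same shape as vec_sat_add_u64 in the description.  */
static void vec_sat_add_u64_ref(uint64_t *out, const uint64_t *x,
                                const uint64_t *y, unsigned n)
{
    assert(n == 0 || (out && x && y));
    for (unsigned i = 0; i < n; i++)
        out[i] = sat_add_u64(x[i], y[i]);
}
```

For instance, sat_add_u64 (UINT64_MAX, 1) saturates to UINT64_MAX, while
sat_add_u64 (1, 2) is simply 3.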

The following test suites passed with this patch:
* The riscv full regression tests.
* The aarch64 full regression tests.
* The x86 bootstrap tests.
* The x86 full regression tests.

PR target/51492
PR target/112600

gcc/ChangeLog:

* config/riscv/autovec.md (usadd<mode>3): New pattern expand for
the unsigned SAT_ADD in vector mode.
* config/riscv/riscv-protos.h (riscv_expand_usadd): New func decl
to expand usadd<mode>3 pattern.
(expand_vec_usadd): Ditto but for vector.
* config/riscv/riscv-v.cc (emit_vec_saddu): New func impl to emit
the vsadd insn.
(expand_vec_usadd): New func impl to expand usadd<mode>3 for vector.
* config/riscv/riscv.cc (riscv_expand_usadd): New func impl to
expand usadd<mode>3 for scalar.
* config/riscv/riscv.md (usadd<mode>3): New pattern expand for
the unsigned SAT_ADD in scalar mode.
* config/riscv/vector.md: Allow VLS mode for vsaddu.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-4.c: New test.
* gcc.target/riscv/sat_arith.h: New test.
* gcc.target/riscv/sat_u_add-1.c: New test.
* gcc.target/riscv/sat_u_add-2.c: New test.
* gcc.target/riscv/sat_u_add-3.c: New test.
* gcc.target/riscv/sat_u_add-4.c: New test.
* gcc.target/riscv/sat_u_add-run-1.c: New test.
* gcc.target/riscv/sat_u_add-run-2.c: New test.
* gcc.target/riscv/sat_u_add-run-3.c: New test.
* gcc.target/riscv/sat_u_add-run-4.c: New test.
* gcc.target/riscv/scalar_sat_binary.h: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md   | 17 +
 gcc/config/riscv/riscv-protos.h   |  2 +
 gcc/config/riscv/riscv-v.cc   | 16 
 gcc/config/riscv/riscv.cc | 47 
 gcc/config/riscv/riscv.md | 11 +++
 gcc/config/riscv/vector.md| 12 +--
 .../riscv/rvv/autovec/binop/vec_sat_binary.h  | 33 
 .../riscv/rvv/autovec/binop/vec_sat_u_add-1.c | 19 +
 .../riscv/rvv/autovec/binop/vec_sat_u_add-2.c | 20 +
 .../riscv/rvv/autovec/binop/vec_sat_u_add-3.c | 20 +
 .../riscv/rvv/autovec/binop/vec_sat_u_add-4.c | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-run-1.c   | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-2.c   | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-3.c   | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-4.c   | 75 +++
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 31 
 gcc/testsuite/gcc.target/riscv/sat_u_add-1.c  | 19 +
 gcc/testsuite/gcc.target/riscv/sat_u_add-2.c  | 21 ++
 gcc/testsuite/gcc.target/riscv/sat_u_add-3.c  | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_add-4.c  | 17 +
 .../gcc.target/riscv/sat_u_add-run-1.c| 25 +++
 .../gcc.target/riscv/sat_u_add-run-2.c| 25 +++
 .../gcc.target/riscv/sat_u_add-run-3.c| 25 +++
 .../gcc.target/riscv/sat_u_add-run-4.c| 25 +++
 .../gcc.target/riscv/scalar_sat_binary.h  | 27 +++
 25 files changed, 744 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c
 create mode 100644 

Re: [PATCH] libstdc++: detect DLLs on windows with

2024-05-17 Thread Jonathan Wakely
On Thu, 16 May 2024 at 19:52, Björn Schäpers  wrote:
>
> From: Björn Schäpers 
>
> libstdc++-v3/Changelog
>
> * acinclude.m4 (GLIBCXX_ENABLE_BACKTRACE): Add check for
>   tlhelp32.h, matching libbacktrace.
> * configure: Regenerate.
> * config.h.in: Regenerate.

This looks good, thanks. I'll apply to trunk and probably backport it too.


>
> Signed-off-by: Björn Schäpers 
> ---
>  libstdc++-v3/acinclude.m4 |  4 
>  libstdc++-v3/config.h.in  |  3 +++
>  libstdc++-v3/configure| 15 +++
>  3 files changed, 22 insertions(+)
>
> diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
> index 51a08bcc8b1..506ce98ae43 100644
> --- a/libstdc++-v3/acinclude.m4
> +++ b/libstdc++-v3/acinclude.m4
> @@ -5481,6 +5481,10 @@ AC_DEFUN([GLIBCXX_ENABLE_BACKTRACE], [
>  BACKTRACE_CPPFLAGS="$BACKTRACE_CPPFLAGS -DHAVE_DL_ITERATE_PHDR=1"
>fi
>AC_CHECK_HEADERS(windows.h)
> +  AC_CHECK_HEADERS(tlhelp32.h, [], [],
> +  [#ifdef HAVE_WINDOWS_H
> +  #  include <windows.h>
> +  #endif])
>
># Check for the fcntl function.
>if test -n "${with_target_subdir}"; then
> diff --git a/libstdc++-v3/config.h.in b/libstdc++-v3/config.h.in
> index 906e0143099..486ba450749 100644
> --- a/libstdc++-v3/config.h.in
> +++ b/libstdc++-v3/config.h.in
> @@ -490,6 +490,9 @@
>  /* Define to 1 if you have the `timespec_get' function. */
>  #undef HAVE_TIMESPEC_GET
>
> +/* Define to 1 if you have the <tlhelp32.h> header file. */
> +#undef HAVE_TLHELP32_H
> +
>  /* Define to 1 if the target supports thread-local storage. */
>  #undef HAVE_TLS
>
> diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
> index 21abaeb0778..a2d59520146 100755
> --- a/libstdc++-v3/configure
> +++ b/libstdc++-v3/configure
> @@ -53865,6 +53865,21 @@ _ACEOF
>
>  fi
>
> +done
> +
> +  for ac_header in tlhelp32.h
> +do :
> +  ac_fn_c_check_header_compile "$LINENO" "tlhelp32.h" 
> "ac_cv_header_tlhelp32_h" "#ifdef HAVE_WINDOWS_H
> +  #  include <windows.h>
> +  #endif
> +"
> +if test "x$ac_cv_header_tlhelp32_h" = xyes; then :
> +  cat >>confdefs.h <<_ACEOF
> +#define HAVE_TLHELP32_H 1
> +_ACEOF
> +
> +fi
> +
>  done
>
>
> --
> 2.44.0
>



Re: [PATCH] internal-fn: Do not force vcond operand to reg.

2024-05-17 Thread Robin Dapp
> OK if that pre-commit CI works out.

The CI didn't pick it up, guess it needs to be a bit more explicit.
In the meantime, however, I managed to catch a short window with more than
10G free on gcc185 => bootstrap and regtest successful on aarch64.
Going to push the patch later today.

Regards
 Robin


Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-17 Thread Richard Biener
On Fri, May 17, 2024 at 11:56 AM Tamar Christina
 wrote:
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Friday, May 17, 2024 10:46 AM
> > To: Tamar Christina 
> > Cc: Victor Do Nascimento ; gcc-
> > patc...@gcc.gnu.org; Richard Sandiford ; Richard
> > Earnshaw ; Victor Do Nascimento
> > 
> > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> > autovectorizer
> >
> > On Fri, May 17, 2024 at 11:05 AM Tamar Christina
> >  wrote:
> > >
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Friday, May 17, 2024 6:51 AM
> > > > To: Victor Do Nascimento 
> > > > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> > ;
> > > > Richard Earnshaw ; Victor Do Nascimento
> > > > 
> > > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> > > > autovectorizer
> > > >
> > > > On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
> > > >  wrote:
> > > > >
> > > > > From: Victor Do Nascimento 
> > > > >
> > > > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> > > > > optabs for dealing with vectorizable dot product code sequences.  The
> > > > > consequence of using a direct optab for this is that backend-pattern
> > > > > selection is only ever able to match against one datatype - Either
> > > > > that of the operands or of the accumulated value, never both.
> > > > >
> > > > > With the introduction of the 2-way (un)signed dot-product insn [1][2]
> > > > > in AArch64 SVE2, the existing direct opcode approach is no longer
> > > > > sufficient for full specification of all the possible dot product
> > > > > machine instructions to be matched to the code sequence; a dot product
> > > > > resulting in VNx4SI may result from either dot products on VNx16QI or
> > > > > > VNx8HI values for the 4- and 2-way dot product operations, respectively.
> > > > >
> > > > > This means that the following example fails autovectorization:
> > > > >
> > > > > uint32_t foo(int n, uint16_t* data) {
> > > > >   uint32_t sum = 0;
> > > > > >   for (int i=0; i<n; i++) {
> > > > > >     sum += data[i] * data[i];
> > > > >   }
> > > > >   return sum;
> > > > > }
> > > > >
> > > > > To remedy the issue a new optab is added, tentatively named
> > > > > `udot_prod_twoway_optab', whose selection is dependent upon checking
> > > > > of both input and output types involved in the operation.
> > > >
> > > > I don't like this too much.  I'll note we document dot_prod as
> > > >
> > > > @cindex @code{sdot_prod@var{m}} instruction pattern
> > > > @item @samp{sdot_prod@var{m}}
> > > >
> > > > Compute the sum of the products of two signed elements.
> > > > Operand 1 and operand 2 are of the same mode. Their
> > > > product, which is of a wider mode, is computed and added to operand 3.
> > > > Operand 3 is of a mode equal or wider than the mode of the product. The
> > > > result is placed in operand 0, which is of the same mode as operand 3.
> > > > @var{m} is the mode of operand 1 and operand 2.
> > > >
> > > > with no restriction on the wider mode but we don't specify it which is
> > > > bad design.  This should have been a convert optab with two modes
> > > > from the start - adding a _twoway variant is just a hack.
> > >
> > > We did discuss this at the time we started implementing it.  There were two
> > > options, one was indeed to change it to a convert dot_prod optab, but doing
> > > this means we have to update every target that uses it.
> > >
> > > Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and
> > > altivec.
> > >
> > > Which sure could be possible, but there's also every use in the backends that
> > > need to be updated, and tested, which for some targets we don't even know how
> > > to begin.
> > >
> > > So it seems very hard to correct dotprod to a convert optab now.
> >
> > It's still the correct way to go.  At _least_ your new pattern should
> > have been this,
> > otherwise what do you do when you have two-way, four-way and eight-way
> > variants?
> > Add yet another optab?
>
> I guess that's fair, but having the new optab only be convert resulted in messy
> code as everywhere you must check for both variants.
>
> Additionally that optab would then overlap with the existing optabs as, as you
> say, the documentation only says it's of a wider type and doesn't indicate
> precision.
>
> So to avoid issues down the line, if the new optab isn't acceptable then
> we'll have to do a wholesale conversion.

Yep.  It shouldn't be difficult though.

> >
> > Another thing is that when you do it your way you should fix the existing optab
> > to be two-way by documenting how the second mode derives from the first.
> >
> > And sure, it's not the only optab suffering from this issue.
>
> Sure, all the zero and sign extending optabs for instance 

But for example the scalar ones are correct:

OPTAB_CL(sext_optab, "extend$b$a2", SIGN_EXTEND, "extend",
gen_extend_conv_libfunc)

Richard.

> Tamar
>
> >
> > 

RE: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-17 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Friday, May 17, 2024 10:46 AM
> To: Tamar Christina 
> Cc: Victor Do Nascimento ; gcc-
> patc...@gcc.gnu.org; Richard Sandiford ; Richard
> Earnshaw ; Victor Do Nascimento
> 
> Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> autovectorizer
> 
> On Fri, May 17, 2024 at 11:05 AM Tamar Christina
>  wrote:
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Friday, May 17, 2024 6:51 AM
> > > To: Victor Do Nascimento 
> > > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> ;
> > > Richard Earnshaw ; Victor Do Nascimento
> > > 
> > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> > > autovectorizer
> > >
> > > On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
> > >  wrote:
> > > >
> > > > From: Victor Do Nascimento 
> > > >
> > > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> > > > optabs for dealing with vectorizable dot product code sequences.  The
> > > > consequence of using a direct optab for this is that backend-pattern
> > > > selection is only ever able to match against one datatype - Either
> > > > that of the operands or of the accumulated value, never both.
> > > >
> > > > With the introduction of the 2-way (un)signed dot-product insn [1][2]
> > > > in AArch64 SVE2, the existing direct opcode approach is no longer
> > > > sufficient for full specification of all the possible dot product
> > > > machine instructions to be matched to the code sequence; a dot product
> > > > resulting in VNx4SI may result from either dot products on VNx16QI or
> > > > VNx8HI values for the 4- and 2-way dot product operations, respectively.
> > > >
> > > > This means that the following example fails autovectorization:
> > > >
> > > > uint32_t foo(int n, uint16_t* data) {
> > > >   uint32_t sum = 0;
> > > > >   for (int i=0; i<n; i++) {
> > > > >     sum += data[i] * data[i];
> > > >   }
> > > >   return sum;
> > > > }
> > > >
> > > > To remedy the issue a new optab is added, tentatively named
> > > > `udot_prod_twoway_optab', whose selection is dependent upon checking
> > > > of both input and output types involved in the operation.
> > >
> > > I don't like this too much.  I'll note we document dot_prod as
> > >
> > > @cindex @code{sdot_prod@var{m}} instruction pattern
> > > @item @samp{sdot_prod@var{m}}
> > >
> > > Compute the sum of the products of two signed elements.
> > > Operand 1 and operand 2 are of the same mode. Their
> > > product, which is of a wider mode, is computed and added to operand 3.
> > > Operand 3 is of a mode equal or wider than the mode of the product. The
> > > result is placed in operand 0, which is of the same mode as operand 3.
> > > @var{m} is the mode of operand 1 and operand 2.
> > >
> > > with no restriction on the wider mode but we don't specify it which is
> > > bad design.  This should have been a convert optab with two modes
> > > from the start - adding a _twoway variant is just a hack.
> >
> > We did discuss this at the time we started implementing it.  There were two
> > options, one was indeed to change it to a convert dot_prod optab, but doing
> > this means we have to update every target that uses it.
> >
> > Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and
> > altivec.
> >
> > Which sure could be possible, but there's also every use in the backends that
> > need to be updated, and tested, which for some targets we don't even know how
> > to begin.
> >
> > So it seems very hard to correct dotprod to a convert optab now.
> 
> It's still the correct way to go.  At _least_ your new pattern should
> have been this,
> otherwise what do you do when you have two-way, four-way and eight-way
> variants?
> Add yet another optab?

I guess that's fair, but having the new optab only be convert resulted in messy
code as everywhere you must check for both variants.

Additionally that optab would then overlap with the existing optabs as, as you
say, the documentation only says it's of a wider type and doesn't indicate
precision.

So to avoid issues down the line, if the new optab isn't acceptable then
we'll have to do a wholesale conversion.
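
To make the two-way versus four-way distinction in this thread concrete, here
is a rough scalar model in C of one accumulator lane for each form.  It is only
a sketch: the names udot2_u16, udot4_u8 and dot_self_u16 are hypothetical and
do not correspond to GCC optabs or AArch64 intrinsics.  Both forms accumulate
into a 32-bit value, which is why the output mode alone cannot tell them apart.

```c
#include <assert.h>
#include <stdint.h>

/* Two-way form: one 32-bit accumulator absorbs two 16-bit products.  */
static uint32_t udot2_u16(uint32_t acc, const uint16_t a[2], const uint16_t b[2])
{
    for (int i = 0; i < 2; i++)
        acc += (uint32_t)a[i] * b[i];
    return acc;
}

/* Four-way form: one 32-bit accumulator absorbs four 8-bit products.  */
static uint32_t udot4_u8(uint32_t acc, const uint8_t a[4], const uint8_t b[4])
{
    for (int i = 0; i < 4; i++)
        acc += (uint32_t)a[i] * b[i];
    return acc;
}

/* Scalar reference for the foo () loop quoted in the thread, consuming two
   elements per step.  N must be even in this simplified sketch.  */
static uint32_t dot_self_u16(const uint16_t *data, int n)
{
    assert(n % 2 == 0);
    uint32_t sum = 0;
    for (int i = 0; i < n; i += 2)
        sum = udot2_u16(sum, data + i, data + i);
    return sum;
}
```

Selecting between the two therefore needs both the input and the output mode,
which is the information a convert optab carries.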

> 
> Another thing is that when you do it your way you should fix the existing optab
> to be two-way by documenting how the second mode derives from the first.
> 
> And sure, it's not the only optab suffering from this issue.

Sure, all the zero- and sign-extending optabs, for instance.

Tamar

> 
> Richard.
> 
> > Tamar
> >
> > >
> > > Richard.
> > >
> > > > In order to minimize changes to the existing codebase,
> > > > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> > > > argument is added to its signature - `const_tree otype', allowing type
> > > > information to be specified for both input and output types.  The
> > > > > existing interface is retained by defining a new `optab_for_tree_code',
> > > > which serves as a shim to `optab_for_tree_code_1', passing 

[c-family] Small fix to implementation of -fdump-ada-spec

2024-05-17 Thread Eric Botcazou
This avoids declaring anonymous array types as having an aliased component 
when the layout is packed, as is already done for named array types.

Tested on x86-64/Linux, applied on the mainline.


2024-05-17  Eric Botcazou  

* c-ada-spec.cc (bitfield_used): Move around.
(packed_layout): Likewise.
(dump_ada_array_type): Do not put "aliased" for a packed layout.

-- 
Eric Botcazou

diff --git a/gcc/c-family/c-ada-spec.cc b/gcc/c-family/c-ada-spec.cc
index e56ef10f443..46fee30b6b9 100644
--- a/gcc/c-family/c-ada-spec.cc
+++ b/gcc/c-family/c-ada-spec.cc
@@ -699,6 +699,8 @@ compare_comment (const void *lp, const void *rp)
 
 static tree *to_dump = NULL;
 static int to_dump_count = 0;
+static bool bitfield_used = false;
+static bool packed_layout = false;
 
 /* Collect a list of declarations from T relevant to SOURCE_FILE to be dumped
by a subsequent call to dump_ada_nodes.  */
@@ -1825,7 +1827,7 @@ dump_ada_array_type (pretty_printer *buffer, tree node, int spc)
 
   pp_string (buffer, " of ");
 
-  if (TREE_CODE (tmp) != POINTER_TYPE)
+  if (TREE_CODE (tmp) != POINTER_TYPE && !packed_layout)
 	pp_string (buffer, "aliased ");
 
   if (TYPE_NAME (tmp)
@@ -2083,9 +2085,6 @@ is_float128 (tree node)
 	 || id_equal (name, "_Float128x");
 }
 
-static bool bitfield_used = false;
-static bool packed_layout = false;
-
 /* Recursively dump in BUFFER Ada declarations corresponding to NODE of type
TYPE.  SPC is the indentation level.  LIMITED_ACCESS indicates whether NODE
can be referenced via a "limited with" clause.  NAME_ONLY indicates whether


Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-17 Thread Richard Biener
On Fri, May 17, 2024 at 11:05 AM Tamar Christina
 wrote:
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Friday, May 17, 2024 6:51 AM
> > To: Victor Do Nascimento 
> > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford ;
> > Richard Earnshaw ; Victor Do Nascimento
> > 
> > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> > autovectorizer
> >
> > On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
> >  wrote:
> > >
> > > From: Victor Do Nascimento 
> > >
> > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> > > optabs for dealing with vectorizable dot product code sequences.  The
> > > consequence of using a direct optab for this is that backend-pattern
> > > selection is only ever able to match against one datatype - Either
> > > that of the operands or of the accumulated value, never both.
> > >
> > > With the introduction of the 2-way (un)signed dot-product insn [1][2]
> > > in AArch64 SVE2, the existing direct opcode approach is no longer
> > > sufficient for full specification of all the possible dot product
> > > machine instructions to be matched to the code sequence; a dot product
> > > resulting in VNx4SI may result from either dot products on VNx16QI or
> > > VNx8HI values for the 4- and 2-way dot product operations, respectively.
> > >
> > > This means that the following example fails autovectorization:
> > >
> > > uint32_t foo(int n, uint16_t* data) {
> > >   uint32_t sum = 0;
> > > >   for (int i=0; i<n; i++) {
> > > >     sum += data[i] * data[i];
> > >   }
> > >   return sum;
> > > }
> > >
> > > To remedy the issue a new optab is added, tentatively named
> > > `udot_prod_twoway_optab', whose selection is dependent upon checking
> > > of both input and output types involved in the operation.
> >
> > I don't like this too much.  I'll note we document dot_prod as
> >
> > @cindex @code{sdot_prod@var{m}} instruction pattern
> > @item @samp{sdot_prod@var{m}}
> >
> > Compute the sum of the products of two signed elements.
> > Operand 1 and operand 2 are of the same mode. Their
> > product, which is of a wider mode, is computed and added to operand 3.
> > Operand 3 is of a mode equal or wider than the mode of the product. The
> > result is placed in operand 0, which is of the same mode as operand 3.
> > @var{m} is the mode of operand 1 and operand 2.
> >
> > with no restriction on the wider mode but we don't specify it which is
> > bad design.  This should have been a convert optab with two modes
> > from the start - adding a _twoway variant is just a hack.
>
> We did discuss this at the time we started implementing it.  There were two
> options, one was indeed to change it to a convert dot_prod optab, but doing
> this means we have to update every target that uses it.
>
> Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and
> altivec.
>
> Which sure could be possible, but there's also every use in the backends that
> needs to be updated, and tested, which for some targets we don't even know how
> to begin.
>
> So it seems very hard to correct dotprod to a convert optab now.

It's still the correct way to go.  At _least_ your new pattern should
have been this,
otherwise what do you do when you have two-way, four-way and eight-way variants?
Add yet another optab?

Another thing is that when you do it your way you should fix the existing optab
to be two-way by documenting how the second mode derives from the first.

And sure, it's not the only optab suffering from this issue.

Richard.

> Tamar
>
> >
> > Richard.
> >
> > > In order to minimize changes to the existing codebase,
> > > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> > > argument is added to its signature - `const_tree otype', allowing type
> > > information to be specified for both input and output types.  The
> > > > existing interface is retained by defining a new `optab_for_tree_code',
> > > which serves as a shim to `optab_for_tree_code_1', passing old
> > > > parameters as-is and setting the new `otype' argument to `NULL_TREE'.
> > >
> > > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> > > directly, passing it both types, adding the internal logic to the
> > > function to distinguish between competing optabs.
> > >
> > > Finally, necessary changes are made to `expand_widen_pattern_expr' to
> > > ensure the new icode can be correctly selected, given the new optab.
> > >
> > > > [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> > > > [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
> > >
> > > gcc/ChangeLog:
> > >
> > > > * config/aarch64/aarch64-sve2.md (@aarch64_sve_dotvnx4sivnx8hi):
> > > renamed to `dot_prod_twoway_vnx8hi'.
> > > * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> > > update icodes used in 

Re: [PATCH] MATCH: Maybe expand (T)(A + C1) * C2 and (T)(A + C1) * C2 + C3 [PR109393]

2024-05-17 Thread Richard Biener
On Fri, 17 May 2024, Manolis Tsamis wrote:

> Hi Richard,
> 
> While I was re-testing the latest version of this patch I noticed that
> it FAILs an AArch64 test, gcc.target/aarch64/subsp.c. With the patch
> we generate one instruction more:
> 
> sbfiz   x1, x1, 4, 32
> stp x29, x30, [sp, -16]!
> add x1, x1, 16
> mov x29, sp
> sub sp, sp, x1
> mov x0, sp
> bl  foo
> 
> Instead of:
> 
> stp x29, x30, [sp, -16]!
> add w1, w1, 1
> mov x29, sp
> sub sp, sp, w1, sxtw 4
> mov x0, sp
> bl  foo
> 
> I've looked at it but can't really find a way to solve the regression.
> Any thoughts on this?

Can you explain what goes wrong?  As I said rewriting parts of
address calculation is tricky, there's always the chance that some
cases regress (see your observation in comment#4 of the PR).

Note that I still believe that avoiding the early and premature
promotion of the addition to unsigned is a good thing.

Note the testcase in the PR is fixed with -fwrapv because then
we do _not_ perform this premature optimization.  Without -fwrapv
the optimization is valid but as you note we do not perform it
consistently - otherwise we wouldn't regress.
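
For readers following along, here is a minimal C sketch of the identity the
proposed match.pd pattern relies on, ((T)(A + CST1)) * CST2 ->
((T)(A) * CST2) + ((T)CST1 * CST2).  It assumes A + CST1 does not overflow in
the narrow signed type (the case where overflow is undefined without -fwrapv)
and that T is a wider wrapping type.  The function names and the constants 1
and 4 are illustrative only and do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Original shape: add in the narrow signed type, then widen and scale.
   Only meaningful when a + 1 does not overflow int32_t.  */
static uint64_t
scaled_before (int32_t a)
{
  return (uint64_t) (a + 1) * 4;
}

/* Expanded shape: widen first, scale, then add the pre-multiplied
   constant.  All arithmetic here wraps in uint64_t.  */
static uint64_t
scaled_after (int32_t a)
{
  return (uint64_t) a * 4 + (uint64_t) 1 * 4;
}
```

Both functions agree for every a for which a + 1 is representable in int32_t,
including negative values, because the sign-extended conversion and the
wrapping multiply and add commute in that case.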

Richard.



> Thanks,
> Manolis
> 
> 
> 
> On Thu, May 16, 2024 at 11:15 AM Richard Biener
>  wrote:
> >
> > On Tue, May 14, 2024 at 10:58 AM Manolis Tsamis wrote:
> > >
> > > New patch with the requested changes can be found below.
> > >
> > > I don't know how much this affects SCEV, but I do believe that we
> > > should incorporate this change somehow. I've seen various cases of
> > > suboptimal address calculation codegen that boil down to this.
> >
> > This misses the ChangeLog (I assume it's unchanged) and indent
> > of the match.pd part is now off.
> >
> > Please fix that, the patch is OK with that change.
> >
> > Thanks,
> > Richard.
> >
> > > gcc/match.pd | 31 +++
> > > gcc/testsuite/gcc.dg/pr109393.c | 16 
> > > 2 files changed, 47 insertions(+)
> > > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index 07e743ae464..1d642c205f0 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -3650,6 +3650,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > (plus (convert @0) (op @2 (convert @1))
> > > #endif
> > > +/* ((T)(A + CST1)) * CST2 + CST3
> > > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > > + Where (A + CST1) doesn't need to have a single use. */
> > > +#if GIMPLE
> > > + (for op (plus minus)
> > > + (simplify
> > > + (plus (mult:s (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > + INTEGER_CST@3)
> > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > > + && INTEGRAL_TYPE_P (type)
> > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > + && TYPE_OVERFLOW_WRAPS (type))
> > > > + (op (mult (convert @0) @2) (plus (mult (convert @1) @2) @3)))))
> > > +#endif
> > > +
> > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */
> > > +#if GIMPLE
> > > + (for op (plus minus)
> > > + (simplify
> > > + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > > + && INTEGRAL_TYPE_P (type)
> > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > + && TYPE_OVERFLOW_WRAPS (type))
> > > > + (op (mult (convert @0) @2) (mult (convert @1) @2)))))
> > > +#endif
> > > +
> > > /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be simplified
> > > to a simple value. */
> > > (for op (plus minus)
> > > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c b/gcc/testsuite/gcc.dg/pr109393.c
> > > new file mode 100644
> > > > index 00000000000..e9051273672
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/pr109393.c
> > > @@ -0,0 +1,16 @@
> > > +/* PR tree-optimization/109393 */
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> > > +
> > > +int foo(int *a, int j)
> > > +{
> > > + int k = j - 1;
> > > + return a[j - 1] == a[k];
> > > +}
> > > +
> > > +int bar(int *a, int j)
> > > +{
> > > + int k = j - 1;
> > > > + return (&a[j + 1] - 2) == &a[k];
> > > +}
> > > --
> > > 2.44.0
> > >
> > >
> > > > On Tue, Apr 23, 2024 at 1:33 PM Manolis Tsamis wrote:
> > > >
> > > > > The original motivation for this pattern was that the following function
> > > > > does not fold to 'return 1':
> > > >
> > > > int foo(int *a, int j)
> > > > {
> > > >   int k = j - 1;
> > > >   return a[j - 1] == a[k];
> > > > }
> > > >
> > > > The expression ((unsigned long) (X +- C1) * C2) appears frequently as 
> > > > part of
> > > 

RE: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-17 Thread Tamar Christina
> -Original Message-
> From: Hongtao Liu 
> Sent: Friday, May 17, 2024 3:14 AM
> To: Victor Do Nascimento 
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford ;
> Richard Earnshaw ; Victor Do Nascimento
> 
> Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> autovectorizer
> 
> > >
> > Sorry to chime in, for x86 backend, we defined usdot_prodv16hi, and
> > 2-way dot_prod operations can be generated
> >
> This is the link https://godbolt.org/z/hcWr64vx3, x86 define
> udot_prodv16qi/udot_prod8hi and both 2-way and 4-way dot_prod
> instructions are generated
> 

That's not the same.  The 2-way vs 4-way distinction here is that, e.g.,
udot_prod8hi can reduce to either DImode or SImode.  The name udot_prod8hi does
not carry enough information to distinguish the two, and in RTL you can't
overload the names.  So this is about the ISA having instructions that overlap
on the source mode of the instruction.
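
To make the overlap concrete, here is a small C sketch of two reductions that
share the same 16-bit source element type but accumulate into different
widths.  The function names are hypothetical; the point is only that a pattern
name derived from the source mode alone cannot say which accumulator width is
wanted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 2-way shape: 16-bit inputs accumulated into a 32-bit sum, so two
   products fit per accumulator lane.  The sum wraps modulo 2^32.  */
static uint32_t
dot_u16_to_u32 (const uint16_t *a, const uint16_t *b, size_t n)
{
  uint32_t sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += (uint32_t) a[i] * b[i];
  return sum;
}

/* 4-way shape: the same 16-bit inputs accumulated into a 64-bit sum,
   so four products fit per accumulator lane.  */
static uint64_t
dot_u16_to_u64 (const uint16_t *a, const uint16_t *b, size_t n)
{
  uint64_t sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += (uint64_t) a[i] * b[i];
  return sum;
}
```

Both loops read the same source elements, so distinguishing them needs the
accumulator mode as well as the source mode.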

Tamar

> 
> --
> BR,
> Hongtao


[PATCH] middle-end/115110 - Fix view_converted_memref_p

2024-05-17 Thread Richard Biener
view_converted_memref_p was checking the reference type against the
pointer type of the offset operand rather than its pointed-to type.
This led to all refs being subject to view-convert treatment in
get_alias_set, causing numerous testsuite failures; with its new uses
from r15-512-g9b7cad5884f21c it is also a wrong-code issue.

Bootstrap & regtest on x86_64-unknown-linux-gnu in progress.

PR middle-end/115110
* tree-ssa-alias.cc (view_converted_memref_p): Fix.
---
 gcc/tree-ssa-alias.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
index 9f5f69bcfad..d64d6d02f4a 100644
--- a/gcc/tree-ssa-alias.cc
+++ b/gcc/tree-ssa-alias.cc
@@ -2077,8 +2077,9 @@ view_converted_memref_p (tree base)
 {
   if (TREE_CODE (base) != MEM_REF && TREE_CODE (base) != TARGET_MEM_REF)
 return false;
-  return same_type_for_tbaa (TREE_TYPE (base),
-TREE_TYPE (TREE_OPERAND (base, 1))) != 1;
+  return (same_type_for_tbaa (TREE_TYPE (base),
+ TREE_TYPE (TREE_TYPE (TREE_OPERAND (base, 1))))
+ != 1);
 }
 
 /* Return true if an indirect reference based on *PTR1 constrained
-- 
2.35.3


RE: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-17 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Friday, May 17, 2024 6:51 AM
> To: Victor Do Nascimento 
> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford ;
> Richard Earnshaw ; Victor Do Nascimento
> 
> Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
> autovectorizer
> 
> On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
>  wrote:
> >
> > From: Victor Do Nascimento 
> >
> > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> > optabs for dealing with vectorizable dot product code sequences.  The
> > consequence of using a direct optab for this is that backend-pattern
> > selection is only ever able to match against one datatype - Either
> > that of the operands or of the accumulated value, never both.
> >
> > With the introduction of the 2-way (un)signed dot-product insn [1][2]
> > in AArch64 SVE2, the existing direct opcode approach is no longer
> > sufficient for full specification of all the possible dot product
> > machine instructions to be matched to the code sequence; a dot product
> > resulting in VNx4SI may result from either dot products on VNx16QI or
> > VNx8HI values for the 4- and 2-way dot product operations, respectively.
> >
> > This means that the following example fails autovectorization:
> >
> > uint32_t foo(int n, uint16_t* data) {
> >   uint32_t sum = 0;
> > >   for (int i=0; i<n; i++) {
> > >     sum += data[i] * data[i];
> >   }
> >   return sum;
> > }
> >
> > To remedy the issue a new optab is added, tentatively named
> > `udot_prod_twoway_optab', whose selection is dependent upon checking
> > of both input and output types involved in the operation.
> 
> I don't like this too much.  I'll note we document dot_prod as
> 
> @cindex @code{sdot_prod@var{m}} instruction pattern
> @item @samp{sdot_prod@var{m}}
> 
> Compute the sum of the products of two signed elements.
> Operand 1 and operand 2 are of the same mode. Their
> product, which is of a wider mode, is computed and added to operand 3.
> Operand 3 is of a mode equal or wider than the mode of the product. The
> result is placed in operand 0, which is of the same mode as operand 3.
> @var{m} is the mode of operand 1 and operand 2.
> 
> with no restriction on the wider mode but we don't specify it which is
> bad design.  This should have been a convert optab with two modes
> from the start - adding a _twoway variant is just a hack.

We did discuss this at the time we started implementing it.  There were two
options, one was indeed to change it to a convert dot_prod optab, but doing
this means we have to update every target that uses it.

Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and
altivec.

Which sure could be possible, but there's also every use in the backends that
needs to be updated, and tested, which for some targets we don't even know how
to begin.

So it seems very hard to correct dotprod to a convert optab now.

Tamar

> 
> Richard.
> 
> > In order to minimize changes to the existing codebase,
> > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> > argument is added to its signature - `const_tree otype', allowing type
> > information to be specified for both input and output types.  The
> > existing interface is retained by defining a new `optab_for_tree_code',
> > which serves as a shim to `optab_for_tree_code_1', passing old
> > parameters as-is and setting the new `otype' argument to `NULL_TREE'.
> >
> > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> > directly, passing it both types, adding the internal logic to the
> > function to distinguish between competing optabs.
> >
> > Finally, necessary changes are made to `expand_widen_pattern_expr' to
> > ensure the new icode can be correctly selected, given the new optab.
> >
> > [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> > [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-sve2.md (@aarch64_sve_dotvnx4sivnx8hi):
> > renamed to `dot_prod_twoway_vnx8hi'.
> > * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> > update icodes used in line with above rename.
> > * optabs-tree.cc (optab_for_tree_code_1): Renamed
> > `optab_for_tree_code' and added new argument.
> > (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> > * optabs-tree.h (optab_for_tree_code_1): New.
> > * optabs.cc (expand_widen_pattern_expr): Expand support for
> > DOT_PROD_EXPR patterns.
> > * optabs.def (udot_prod_twoway_optab): New.
> > (sdot_prod_twoway_optab): Likewise.
> > * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> > support for misc optabs that use two modes.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * 

[COMMITTED 32/35] ada: Improve test for unprocessed preprocessor directives

2024-05-17 Thread Marc Poulhiès
From: Steve Baird 

Preprocessor directives are case insensitive and may have spaces or tabs
between the '#' and the keyword. When checking for the error case of
unprocessed preprocessor directives, take these rules into account.
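
As a rough illustration of the matching rule described above (written in C
rather than Ada, and not taken from the patch), the check can be sketched as
follows.  The function name mirrors the Ada helper in the diff below, but the
signature and the NUL-terminated line argument are assumptions made for this
sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Return true if LINE starts with '#', optionally followed by spaces or
   tabs, followed by KEYWORD compared case-insensitively.  KEYWORD is
   expected to be given in lower case.  */
static bool
matches_after_skipping_white_space (const char *line, const char *keyword)
{
  if (line[0] != '#')
    return false;

  /* Skip blanks and horizontal tabs between '#' and the keyword.  */
  const char *p = line + 1;
  while (*p == ' ' || *p == '\t')
    p++;

  /* Compare character by character; hitting the end of LINE early
     fails the comparison because NUL never equals a keyword char.  */
  for (; *keyword != '\0'; p++, keyword++)
    if (tolower ((unsigned char) *p) != *keyword)
      return false;
  return true;
}
```

With this rule, "#if", "#  ELSIF" and "#<tab>End" are all recognized, whereas
an exact comparison against the text immediately after '#' would only catch
the first form.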

gcc/ada/

* scng.adb (scan): When checking for an unprocessed preprocessor
directive, take into account the preprocessor's rules about case
insensitivity and about white space between the '#' and the
keyword.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/scng.adb | 183 +++
 1 file changed, 122 insertions(+), 61 deletions(-)

diff --git a/gcc/ada/scng.adb b/gcc/ada/scng.adb
index 9b1d00e3452..8b2829ffbbf 100644
--- a/gcc/ada/scng.adb
+++ b/gcc/ada/scng.adb
@@ -40,6 +40,7 @@ with Widechar; use Widechar;
 
 pragma Warnings (Off);
 --  This package is used also by gnatcoll
+with System.Case_Util;
 with System.CRC32;
 with System.UTF_32;  use System.UTF_32;
 with System.WCh_Con; use System.WCh_Con;
@@ -2250,86 +2251,146 @@ package body Scng is
 
  when Special_Preprocessor_Character =>
 
---  If Set_Special_Character has been called for this character,
---  set Scans.Special_Character and return a Special token.
+declare
+   function Matches_After_Skipping_White_Space
+ (S : String) return Boolean;
+
+   --  Return True iff after skipping past white space the
+   --  next Source characters match the given string.
+
+   
+   -- Matches_After_Skipping_White_Space --
+   
+
+   function Matches_After_Skipping_White_Space
+ (S : String) return Boolean
+   is
+  function To_Lower_Case_String (Buff : Text_Buffer)
+return String;
+  --  Convert a text buffer to a lower-case string.
+
+  --
+  -- To_Lower_Case_String --
+  --
+
+  function To_Lower_Case_String (Buff : Text_Buffer)
+return String
+  is
+ subtype One_Based is Text_Buffer (1 .. Buff'Length);
+ Result : String := String (One_Based (Buff));
+  begin
+ --  The System.Case_Util.To_Lower function (the overload
+ --  that takes a string parameter) cannot be called
+ --  here due to bootstrapping problems. That function
+ --  was added too recently.
+
+ System.Case_Util.To_Lower (Result);
+ return Result;
+  end To_Lower_Case_String;
+
+  pragma Assert (Source (Scan_Ptr) = '#');
+  Local_Scan_Ptr : Source_Ptr := Scan_Ptr + 1;
+
+   --  Start of processing for Matches_After_Skipping_White_Space
 
-if Special_Characters (Source (Scan_Ptr)) then
-   Token_Ptr := Scan_Ptr;
-   Token := Tok_Special;
-   Special_Character := Source (Scan_Ptr);
-   Scan_Ptr := Scan_Ptr + 1;
-   return;
+   begin
+  while Local_Scan_Ptr in Source'Range
+and then Source (Local_Scan_Ptr) in ' ' | HT
+  loop
+ Local_Scan_Ptr := Local_Scan_Ptr + 1;
+  end loop;
 
---  Check for something looking like a preprocessor directive
+  return Local_Scan_Ptr in Source'Range
+and then Local_Scan_Ptr + (S'Length - 1) in Source'Range
+and then S = To_Lower_Case_String (
+   Source (Local_Scan_Ptr ..
+   Local_Scan_Ptr + (S'Length - 1)));
+   end Matches_After_Skipping_White_Space;
 
-elsif Source (Scan_Ptr) = '#'
-  and then (Source (Scan_Ptr + 1 .. Scan_Ptr + 2) = "if"
-  or else
-Source (Scan_Ptr + 1 .. Scan_Ptr + 5) = "elsif"
-  or else
-Source (Scan_Ptr + 1 .. Scan_Ptr + 4) = "else"
-  or else
-Source (Scan_Ptr + 1 .. Scan_Ptr + 3) = "end")
-then
-   Error_Msg_S
- ("preprocessor directive ignored, preprocessor not active");
+begin
+   --  If Set_Special_Character has been called for this character,
+   --  set Scans.Special_Character and return a Special token.
 
-   --  Skip to end of line
+   if Special_Characters (Source (Scan_Ptr)) then
+  Token_Ptr := Scan_Ptr;
+  Token := 

[COMMITTED 24/35] ada: Do not query the modification time of a special file.

2024-05-17 Thread Marc Poulhiès
From: Steve Baird 

In Ada.Directories, the function Modification_Time raises Name_Error if it is
called for a special file. So don't do that in Start_Search_Internal.

gcc/ada/

* libgnat/a-direct.adb (Start_Search_Internal): Do not call
Modification_Time for a special file; declare a Calendar.Time
constant No_Time and use that instead.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/a-direct.adb | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/libgnat/a-direct.adb b/gcc/ada/libgnat/a-direct.adb
index 32e020c48c3..adff12277e8 100644
--- a/gcc/ada/libgnat/a-direct.adb
+++ b/gcc/ada/libgnat/a-direct.adb
@@ -29,7 +29,7 @@
 --  --
 --
 
-with Ada.Calendar;   use Ada.Calendar;
+with Ada.Calendar.Formatting;use Ada.Calendar;
 with Ada.Characters.Handling;use Ada.Characters.Handling;
 with Ada.Containers.Vectors;
 with Ada.Directories.Validity;   use Ada.Directories.Validity;
@@ -1392,6 +1392,17 @@ package body Ada.Directories is
   end record;
 
   Res : Result := (Found => False);
+
+  --  This declaration of No_Time copied from GNAT.Calendar
+  --  because adding a "with GNAT.Calendar;" to this unit
+  --  results in problems.
+
+  No_Time : constant Ada.Calendar.Time :=
+Ada.Calendar.Formatting.Time_Of
+  (Ada.Calendar.Year_Number'First,
+   Ada.Calendar.Month_Number'First,
+   Ada.Calendar.Day_Number'First,
+   Time_Zone => 0);
begin
   --  Get the file attributes for the directory item
 
@@ -1452,7 +1463,10 @@ package body Ada.Directories is
   Full_Name => To_Unbounded_String (Path),
   Attr_Error_Code   => 0,
   Kind  => Res.Kind,
-  Modification_Time => Modification_Time (Path),
+  Modification_Time =>
+   (if Res.Kind = Special_File
+  then No_Time
+  else Modification_Time (Path)),
   Size  => Res.Size));
  end if;
   end if;
-- 
2.43.2



[COMMITTED 33/35] ada: Start the initialization of the tasking runtime earlier

2024-05-17 Thread Marc Poulhiès
From: Eric Botcazou 

This installs the tasking versions of the RTS_Lock manipulation routines
very early, before the elaboration of all the Ada units of the program,
including those of the runtime, because this elaboration may require the
initialization of RTS_Lock objects.

gcc/ada/

* bindgen.adb (Gen_Adainit): Generate declaration and call to the
imported procedure __gnat_tasking_runtime_initialize if need be.
* libgnat/s-soflin.ads (Locking Soft-Links): Add commentary.
* libgnarl/s-tasini.adb (Tasking_Runtime_Initialize): New procedure
exported as __gnat_tasking_runtime_initialize.  Initialize RTS_Lock
manipulation routines here instead of...
(Init_RTS): ...here.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/bindgen.adb   | 18 --
 gcc/ada/libgnarl/s-tasini.adb | 30 +-
 gcc/ada/libgnat/s-soflin.ads  |  4 +++-
 3 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/gcc/ada/bindgen.adb b/gcc/ada/bindgen.adb
index fc834e3a9b6..f15f96495df 100644
--- a/gcc/ada/bindgen.adb
+++ b/gcc/ada/bindgen.adb
@@ -819,8 +819,7 @@ package body Bindgen is
 WBI ("  pragma Import (C, XDR_Stream, ""__gl_xdr_stream"");");
  end if;
 
- --  Import entry point for elaboration time signal handler
- --  installation, and indication of if it's been called previously.
+ --  Import entry point for initialization of the runtime
 
  WBI ("");
  WBI ("  procedure Runtime_Initialize " &
@@ -828,6 +827,15 @@ package body Bindgen is
  WBI ("  pragma Import (C, Runtime_Initialize, " &
   """__gnat_runtime_initialize"");");
 
+ --  Import entry point for initialization of the tasking runtime
+
+ if With_GNARL then
+WBI ("");
+WBI ("  procedure Tasking_Runtime_Initialize;");
+WBI ("  pragma Import (C, Tasking_Runtime_Initialize, " &
+ """__gnat_tasking_runtime_initialize"");");
+ end if;
+
  --  Import handlers attach procedure for sequential elaboration policy
 
  if System_Interrupts_Used
@@ -1090,6 +1098,12 @@ package body Bindgen is
  --  Generate call to Runtime_Initialize
 
  WBI ("  Runtime_Initialize (1);");
+
+ --  Generate call to Tasking_Runtime_Initialize
+
+ if With_GNARL then
+WBI ("  Tasking_Runtime_Initialize;");
+ end if;
   end if;
 
   --  Generate call to set Initialize_Scalar values if active
diff --git a/gcc/ada/libgnarl/s-tasini.adb b/gcc/ada/libgnarl/s-tasini.adb
index 22294145bed..794183f5356 100644
--- a/gcc/ada/libgnarl/s-tasini.adb
+++ b/gcc/ada/libgnarl/s-tasini.adb
@@ -102,10 +102,6 @@ package body System.Tasking.Initialization is
procedure Release_RTS_Lock (Addr : Address);
--  Release the RTS lock at Addr
 
-   
-   --  Local Subprograms --
-   
-

-- Tasking Initialization --

@@ -116,6 +112,15 @@ package body System.Tasking.Initialization is
--  of initializing global locks, and installing tasking versions of certain
--  operations used by the compiler. Init_RTS is called during elaboration.
 
+   procedure Tasking_Runtime_Initialize;
+   pragma Export (Ada, Tasking_Runtime_Initialize,
+  "__gnat_tasking_runtime_initialize");
+   --  This procedure starts the initialization of the GNARL. It installs the
+   --  tasking versions of the RTS_Lock manipulation routines. It is called
+   --  very early before the elaboration of all the Ada units of the program,
+   --  including those of the runtime, because this elaboration may require
+   --  the initialization of RTS_Lock objects.
+
--
-- Change_Base_Priority --
--
@@ -414,11 +419,6 @@ package body System.Tasking.Initialization is
   SSL.Task_Name  := Task_Name'Access;
   SSL.Get_Current_Excep  := Get_Current_Excep'Access;
 
-  SSL.Initialize_RTS_Lock := Initialize_RTS_Lock'Access;
-  SSL.Finalize_RTS_Lock   := Finalize_RTS_Lock'Access;
-  SSL.Acquire_RTS_Lock:= Acquire_RTS_Lock'Access;
-  SSL.Release_RTS_Lock:= Release_RTS_Lock'Access;
-
   --  Initialize the tasking soft links (if not done yet) that are common
   --  to the full and the restricted run times.
 
@@ -430,6 +430,18 @@ package body System.Tasking.Initialization is
   Undefer_Abort (Environment_Task);
end Init_RTS;
 
+   
+   -- Tasking_Runtime_Initialize --
+   
+
+   procedure Tasking_Runtime_Initialize is
+   begin
+  SSL.Initialize_RTS_Lock := Initialize_RTS_Lock'Access;
+  SSL.Finalize_RTS_Lock   := Finalize_RTS_Lock'Access;
+  SSL.Acquire_RTS_Lock:= 

[COMMITTED 26/35] ada: Factor out duplicated code in bodies of System.Task_Primitives.Operations

2024-05-17 Thread Marc Poulhiès
From: Eric Botcazou 

The duplication is present in some POSIX-like implementations (POSIX
and RTEMS) while it has already been eliminated in others (Linux, QNX).  The
latter implementations are also slightly modified for consistency's sake.

No functional changes.

gcc/ada/

* libgnarl/s-taprop__dummy.adb (Initialize_Lock): Fix formatting.
* libgnarl/s-taprop__linux.adb (RTS_Lock_Ptr): Delete.
(Init_Mutex): Rename into...
(Initialize_Lock): ...this.
(Initialize_Lock [Lock]): Call above procedure.
(Initialize_Lock [RTS_Lock]): Likewise.
(Initialize_TCB): Likewise.
* libgnarl/s-taprop__posix.adb (Initialize_Lock): New procedure
factored out from the other two homonyms.
(Initialize_Lock [Lock]): Call above procedure.
(Initialize_Lock [RTS_Lock]): Likewise.
* libgnarl/s-taprop__qnx.adb (RTS_Lock_Ptr): Delete.
(Init_Mutex): Rename into...
(Initialize_Lock): ...this.
(Initialize_Lock [Lock]): Call above procedure.
(Initialize_Lock [RTS_Lock]): Likewise.
(Initialize_TCB): Likewise.
* libgnarl/s-taprop__rtems.adb (Initialize_Lock): New procedure
factored out from the other two homonyms.
(Initialize_Lock [Lock]): Call above procedure.
(Initialize_Lock [RTS_Lock]): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnarl/s-taprop__dummy.adb |  4 +-
 gcc/ada/libgnarl/s-taprop__linux.adb | 47 ++---
 gcc/ada/libgnarl/s-taprop__posix.adb | 61 +---
 gcc/ada/libgnarl/s-taprop__qnx.adb   | 46 ++---
 gcc/ada/libgnarl/s-taprop__rtems.adb | 61 +---
 5 files changed, 90 insertions(+), 129 deletions(-)

diff --git a/gcc/ada/libgnarl/s-taprop__dummy.adb b/gcc/ada/libgnarl/s-taprop__dummy.adb
index 90c4cd4cf72..829d595694c 100644
--- a/gcc/ada/libgnarl/s-taprop__dummy.adb
+++ b/gcc/ada/libgnarl/s-taprop__dummy.adb
@@ -239,7 +239,9 @@ package body System.Task_Primitives.Operations is
end Initialize_Lock;
 
procedure Initialize_Lock
- (L : not null access RTS_Lock; Level : Lock_Level) is
+ (L : not null access RTS_Lock;
+  Level : Lock_Level)
+   is
begin
   null;
end Initialize_Lock;
diff --git a/gcc/ada/libgnarl/s-taprop__linux.adb b/gcc/ada/libgnarl/s-taprop__linux.adb
index d6a29b5e158..74717cb2d2b 100644
--- a/gcc/ada/libgnarl/s-taprop__linux.adb
+++ b/gcc/ada/libgnarl/s-taprop__linux.adb
@@ -248,10 +248,10 @@ package body System.Task_Primitives.Operations is
--  as in "sudo /sbin/setcap cap_sys_nice=ep exe_file". If it doesn't have
--  permission, then a request for Ceiling_Locking is ignored.
 
-   type RTS_Lock_Ptr is not null access all RTS_Lock;
-
-   function Init_Mutex (L : RTS_Lock_Ptr; Prio : Any_Priority) return C.int;
-   --  Initialize the mutex L. If Ceiling_Support is True, then set the ceiling
+   function Initialize_Lock
+ (L: not null access RTS_Lock;
+  Prio : Any_Priority) return C.int;
+   --  Initialize the lock L. If Ceiling_Support is True, then set the ceiling
--  to Prio. Returns 0 for success, or ENOMEM for out-of-memory.
 
---
@@ -340,11 +340,20 @@ package body System.Task_Primitives.Operations is
 
function Self return Task_Id renames Specific.Self;
 
-   ----------------
-   -- Init_Mutex --
-   ----------------
+   ---------------------
+   -- Initialize_Lock --
+   ---------------------
 
-   function Init_Mutex (L : RTS_Lock_Ptr; Prio : Any_Priority) return C.int is
+   --  Note: mutexes and cond_variables needed per-task basis are initialized
+   --  in Initialize_TCB and the Storage_Error is handled. Other mutexes (such
+   --  as RTS_Lock, Memory_Lock...) used in RTS is initialized before any
+   --  status change of RTS. Therefore raising Storage_Error in the following
+   --  routines should be able to be handled safely.
+
+   function Initialize_Lock
+ (L: not null access RTS_Lock;
+  Prio : Any_Priority) return C.int
+   is
   Mutex_Attr : aliased pthread_mutexattr_t;
   Result, Result_2 : C.int;
 
@@ -377,17 +386,7 @@ package body System.Task_Primitives.Operations is
   Result_2 := pthread_mutexattr_destroy (Mutex_Attr'Access);
   pragma Assert (Result_2 = 0);
   return Result; -- of pthread_mutex_init, not pthread_mutexattr_destroy
-   end Init_Mutex;
-
-   ---------------------
-   -- Initialize_Lock --
-   ---------------------
-
-   --  Note: mutexes and cond_variables needed per-task basis are initialized
-   --  in Initialize_TCB and the Storage_Error is handled. Other mutexes (such
-   --  as RTS_Lock, Memory_Lock...) used in RTS is initialized before any
-   --  status change of RTS. Therefore raising Storage_Error in the following
-   --  routines should be able to be handled safely.
+   end Initialize_Lock;
 
procedure Initialize_Lock
  (Prio : Any_Priority;
@@ -420,18 +419,19 @@ 

[COMMITTED 31/35] ada: Restore dependency on System.OS_Interface in System.Task_Primitives

2024-05-17 Thread Marc Poulhiès
From: Eric Botcazou 

The dependency is relied upon by the binder to drag in the tasking runtime.

gcc/ada/

* libgnarl/s-taspri__mingw.ads: Add clause for System.OS_Interface.
(Private_Data): Change type of Thread component.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnarl/s-taspri__mingw.ads | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/libgnarl/s-taspri__mingw.ads 
b/gcc/ada/libgnarl/s-taspri__mingw.ads
index a51f752d805..6eae97d4af6 100644
--- a/gcc/ada/libgnarl/s-taspri__mingw.ads
+++ b/gcc/ada/libgnarl/s-taspri__mingw.ads
@@ -31,6 +31,7 @@
 
 --  This is a NT (native) version of this package
 
+with System.OS_Interface;
 with System.OS_Locks;
 with System.Win32;
 
@@ -87,7 +88,7 @@ private
end record;
 
type Private_Data is limited record
-  Thread : aliased Win32.HANDLE;
+  Thread : aliased System.OS_Interface.Thread_Id;
   pragma Atomic (Thread);
   --  Thread field may be updated by two different threads of control.
   --  (See, Enter_Task and Create_Task in s-taprop.adb).
-- 
2.43.2



[COMMITTED 30/35] ada: Further adjustments coming from aliasing considerations

2024-05-17 Thread Marc Poulhiès
From: Eric Botcazou 

They are needed on 32-bit platforms because of different calling conventions
and again in the units implementing AltiVec and Streams support.

gcc/ada/

* libgnat/g-alvevi.ads: Add pragma Universal_Aliasing for all the
view types.
* libgnat/s-stratt.ads: Likewise for Fat_Pointer type.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/g-alvevi.ads | 11 +++
 gcc/ada/libgnat/s-stratt.ads |  3 +++
 2 files changed, 14 insertions(+)

diff --git a/gcc/ada/libgnat/g-alvevi.ads b/gcc/ada/libgnat/g-alvevi.ads
index b2beac7284c..b0f58790adf 100644
--- a/gcc/ada/libgnat/g-alvevi.ads
+++ b/gcc/ada/libgnat/g-alvevi.ads
@@ -58,6 +58,7 @@ package GNAT.Altivec.Vector_Views is
type VUC_View is record
   Values : Varray_unsigned_char;
end record;
+   pragma Universal_Aliasing (VUC_View);
 
type Varray_signed_char is array (Vchar_Range) of signed_char;
for Varray_signed_char'Alignment use VECTOR_ALIGNMENT;
@@ -65,6 +66,7 @@ package GNAT.Altivec.Vector_Views is
type VSC_View is record
   Values : Varray_signed_char;
end record;
+   pragma Universal_Aliasing (VSC_View);
 
type Varray_bool_char is array (Vchar_Range) of bool_char;
for Varray_bool_char'Alignment use VECTOR_ALIGNMENT;
@@ -72,6 +74,7 @@ package GNAT.Altivec.Vector_Views is
type VBC_View is record
   Values : Varray_bool_char;
end record;
+   pragma Universal_Aliasing (VBC_View);
 
--
-- short components --
@@ -85,6 +88,7 @@ package GNAT.Altivec.Vector_Views is
type VUS_View is record
   Values : Varray_unsigned_short;
end record;
+   pragma Universal_Aliasing (VUS_View);
 
type Varray_signed_short is array (Vshort_Range) of signed_short;
for Varray_signed_short'Alignment use VECTOR_ALIGNMENT;
@@ -92,6 +96,7 @@ package GNAT.Altivec.Vector_Views is
type VSS_View is record
   Values : Varray_signed_short;
end record;
+   pragma Universal_Aliasing (VSS_View);
 
type Varray_bool_short is array (Vshort_Range) of bool_short;
for Varray_bool_short'Alignment use VECTOR_ALIGNMENT;
@@ -99,6 +104,7 @@ package GNAT.Altivec.Vector_Views is
type VBS_View is record
   Values : Varray_bool_short;
end record;
+   pragma Universal_Aliasing (VBS_View);
 

-- int components --
@@ -112,6 +118,7 @@ package GNAT.Altivec.Vector_Views is
type VUI_View is record
   Values : Varray_unsigned_int;
end record;
+   pragma Universal_Aliasing (VUI_View);
 
type Varray_signed_int is array (Vint_Range) of signed_int;
for Varray_signed_int'Alignment use VECTOR_ALIGNMENT;
@@ -119,6 +126,7 @@ package GNAT.Altivec.Vector_Views is
type VSI_View is record
   Values : Varray_signed_int;
end record;
+   pragma Universal_Aliasing (VSI_View);
 
type Varray_bool_int is array (Vint_Range) of bool_int;
for Varray_bool_int'Alignment use VECTOR_ALIGNMENT;
@@ -126,6 +134,7 @@ package GNAT.Altivec.Vector_Views is
type VBI_View is record
   Values : Varray_bool_int;
end record;
+   pragma Universal_Aliasing (VBI_View);
 
--
-- float components --
@@ -139,6 +148,7 @@ package GNAT.Altivec.Vector_Views is
type VF_View is record
   Values : Varray_float;
end record;
+   pragma Universal_Aliasing (VF_View);
 
--
-- pixel components --
@@ -152,5 +162,6 @@ package GNAT.Altivec.Vector_Views is
type VP_View is record
   Values : Varray_pixel;
end record;
+   pragma Universal_Aliasing (VP_View);
 
 end GNAT.Altivec.Vector_Views;
diff --git a/gcc/ada/libgnat/s-stratt.ads b/gcc/ada/libgnat/s-stratt.ads
index 1d4c82d17ab..eee19f4bdce 100644
--- a/gcc/ada/libgnat/s-stratt.ads
+++ b/gcc/ada/libgnat/s-stratt.ads
@@ -74,6 +74,9 @@ package System.Stream_Attributes is
   P2 : System.Address;
end record;
 
+   pragma Universal_Aliasing (Fat_Pointer);
+   --  This avoids a copy for the aforementioned unchecked conversions
+

-- Treatment of enumeration types --

-- 
2.43.2



[COMMITTED 21/35] ada: Fix others error message location

2024-05-17 Thread Marc Poulhiès
From: Ronan Desplanques 

Before this patch, the compiler pointed at the wrong component
association when reporting an illegal occurrence of "others" in an
aggregate. This patch fixes this by keeping track of which choice
contains the occurrence of "others" when resolving array aggregates.
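
To make the situation concrete, here is a small sketch of the kind of aggregate involved. The type and subprogram names are made up for illustration. The rejected call is left as a comment so that the unit still compiles; in it, the "others" choice sits in the second association, which is the node the diagnostic should point at.

```ada
procedure Others_Demo is
   type Int_Array is array (Positive range <>) of Integer;
   subtype Int_Array_3 is Int_Array (1 .. 3);

   procedure Consume (A : Int_Array) is null;
begin
   --  Illegal: the formal is unconstrained, so nothing provides bounds
   --  for OTHERS, and the compiler rejects the choice:
   --
   --     Consume ((1 => 0, others => 1));

   --  Legal: the qualification supplies the bounds, as the second line
   --  of the error message suggests.
   Consume (Int_Array_3'(1 => 0, others => 1));
end Others_Demo;
```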

gcc/ada/

* sem_aggr.adb (Resolve_Array_Aggregate): Fix location of error
message.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_aggr.adb | 43 +++
 1 file changed, 19 insertions(+), 24 deletions(-)

diff --git a/gcc/ada/sem_aggr.adb b/gcc/ada/sem_aggr.adb
index 64e7db79ecc..ee9beb04c9a 100644
--- a/gcc/ada/sem_aggr.adb
+++ b/gcc/ada/sem_aggr.adb
@@ -1335,7 +1335,7 @@ package body Sem_Aggr is
   Index_Base_High : constant Node_Id   := Type_High_Bound (Index_Base);
   --  Ditto for the base type
 
-  Others_Present : Boolean := False;
+  Others_N : Node_Id := Empty;
 
   Nb_Choices : Nat := 0;
   --  Contains the overall number of named choices in this sub-aggregate
@@ -1870,7 +1870,7 @@ package body Sem_Aggr is
 
 while Present (Choice) loop
if Nkind (Choice) = N_Others_Choice then
-  Others_Present := True;
+  Others_N := Choice;
 
else
   Analyze (Choice);
@@ -2189,7 +2189,7 @@ package body Sem_Aggr is
 Delete_Choice := False;
 while Present (Choice) loop
if Nkind (Choice) = N_Others_Choice then
-  Others_Present := True;
+  Others_N := Choice;
 
   if Choice /= First (Choice_List (Assoc))
 or else Present (Next (Choice))
@@ -2289,7 +2289,7 @@ package body Sem_Aggr is
 
   if Present (Expressions (N))
 and then (Nb_Choices > 1
-   or else (Nb_Choices = 1 and then not Others_Present))
+   or else (Nb_Choices = 1 and then No (Others_N)))
   then
  Error_Msg_N
("cannot mix named and positional associations in array aggregate",
@@ -2299,16 +2299,11 @@ package body Sem_Aggr is
 
   --  Test for the validity of an others choice if present
 
-  if Others_Present and then not Others_Allowed then
- declare
-Others_N : constant Node_Id :=
-  First (Choice_List (First (Component_Associations (N;
- begin
-Error_Msg_N ("OTHERS choice not allowed here", Others_N);
-Error_Msg_N ("\qualify the aggregate with a constrained subtype "
- & "to provide bounds for it", Others_N);
-return Failure;
- end;
+  if Present (Others_N) and then not Others_Allowed then
+ Error_Msg_N ("OTHERS choice not allowed here", Others_N);
+ Error_Msg_N ("\qualify the aggregate with a constrained subtype "
+  & "to provide bounds for it", Others_N);
+ return Failure;
   end if;
 
   --  Protect against cascaded errors
@@ -2320,7 +2315,7 @@ package body Sem_Aggr is
   --  STEP 2: Process named components
 
   if No (Expressions (N)) then
- if Others_Present then
+ if Present (Others_N) then
 Case_Table_Size := Nb_Choices - 1;
  else
 Case_Table_Size := Nb_Choices;
@@ -2709,7 +2704,7 @@ package body Sem_Aggr is
 
  if Lo_Val <= Hi_Val
or else (Lo_Val > Hi_Val + 1
- and then not Others_Present)
+ and then No (Others_N))
  then
 Missing_Or_Duplicates := True;
 exit;
@@ -2796,7 +2791,7 @@ package body Sem_Aggr is
  --  Loop through entries in table to find missing indexes.
  --  Not needed if others, since missing impossible.
 
- if not Others_Present then
+ if No (Others_N) then
 for J in 2 .. Nb_Discrete_Choices loop
Lo_Val := Expr_Value (Table (J).Lo);
Hi_Val := Table (J - 1).Highest;
@@ -2862,7 +2857,7 @@ package body Sem_Aggr is
 --  If Others is present, then bounds of aggregate come from the
 --  index constraint (not the choices in the aggregate itself).
 
-if Others_Present then
+if Present (Others_N) then
Get_Index_Bounds (Index_Constr, Aggr_Low, Aggr_High);
 
--  Abandon processing if either bound is already signalled as
@@ -3043,7 +3038,7 @@ package body Sem_Aggr is
 Next (Expr);
  end loop;
 
- if Others_Present then
+ if Present (Others_N) then
 Assoc := Last (Component_Associations (N));
 
 --  Ada 2005 (AI-231)
@@ -3102,7 +3097,7 @@ package body Sem_Aggr is
 
  --  STEP 3 (B): 

[COMMITTED 35/35] ada: Improve deriving initial sizes for container aggregates

2024-05-17 Thread Marc Poulhiès
From: Viljar Indus 

Deriving the initial size of container aggregates is necessary
for deriving the correct capacity for bounded containers.

Add support for deriving the correct initial size
when the container aggregate is iterating over an array
object.
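
As an illustration of the source construct concerned, the sketch below builds a container aggregate by iterating over a one-dimensional array object. It assumes a compiler that accepts pragma Ada_2022, and it uses the unbounded Ada.Containers.Vectors for simplicity, whereas the capacity question above matters for bounded containers. All names are hypothetical.

```ada
pragma Ada_2022;

with Ada.Containers.Vectors;
with Ada.Text_IO; use Ada.Text_IO;

procedure Container_Aggregate_Demo is
   package Int_Vectors is new Ada.Containers.Vectors (Positive, Integer);

   type Int_Array is array (Positive range <>) of Integer;

   Source : constant Int_Array (1 .. 4) := (10, 20, 30, 40);

   --  Container aggregate iterating over an array object; the number
   --  of elements is Source'Length, which is known statically here.
   V : constant Int_Vectors.Vector := [for E of Source => E * 2];
begin
   Put_Line (Ada.Containers.Count_Type'Image (V.Length));
end Container_Aggregate_Demo;
```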

gcc/ada/

* exp_aggr.adb (Expand_Container_Aggregate):
Derive the size for iterable aggregates in the case of
one-dimensional array objects.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb | 83 +---
 1 file changed, 55 insertions(+), 28 deletions(-)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index 892f47ceb05..2476675604c 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -6693,9 +6693,9 @@ package body Exp_Aggr is
 
 --  If one or more of the associations is one of the iterated
 --  forms, and is either an association with nonstatic bounds
---  or is an iterator over an iterable object, then treat the
---  whole container aggregate as having a nonstatic number of
---  elements.
+--  or is an iterator over an iterable object where the size
+--  cannot be derived, then treat the whole container aggregate as
+--  having a nonstatic number of elements.
 
 declare
Has_Nonstatic_Length : Boolean := False;
@@ -6725,37 +6725,43 @@ package body Exp_Aggr is
 Comp := First (Component_Associations (N));
 
 while Present (Comp) loop
-   Choice := First (Choice_List (Comp));
+   if Present (Choice_List (Comp)) then
+  Choice := First (Choice_List (Comp));
 
-   while Present (Choice) loop
-  Analyze (Choice);
+  while Present (Choice) loop
+ Analyze (Choice);
 
-  if Nkind (Choice) = N_Range then
- Lo := Low_Bound (Choice);
- Hi := High_Bound (Choice);
- Add_Range_Size;
+ if Nkind (Choice) = N_Range then
+Lo := Low_Bound (Choice);
+Hi := High_Bound (Choice);
+Add_Range_Size;
 
-  elsif Is_Entity_Name (Choice)
-and then Is_Type (Entity (Choice))
-  then
- Lo := Type_Low_Bound (Entity (Choice));
- Hi := Type_High_Bound (Entity (Choice));
- Add_Range_Size;
+ elsif Is_Entity_Name (Choice)
+   and then Is_Type (Entity (Choice))
+ then
+Lo := Type_Low_Bound (Entity (Choice));
+Hi := Type_High_Bound (Entity (Choice));
+Add_Range_Size;
 
- Rewrite (Choice,
-   Make_Range (Loc,
- New_Copy_Tree (Lo),
- New_Copy_Tree (Hi)));
+Rewrite (Choice,
+  Make_Range (Loc,
+New_Copy_Tree (Lo),
+New_Copy_Tree (Hi)));
 
-  else
- --  Single choice (syntax excludes a subtype
- --  indication).
+ else
+--  Single choice (syntax excludes a subtype
+--  indication).
 
- Siz := Siz + 1;
-  end if;
+Siz := Siz + 1;
+ end if;
 
-  Next (Choice);
-   end loop;
+ Next (Choice);
+  end loop;
+
+   elsif Nkind (Comp) = N_Iterated_Component_Association then
+
+  Siz := Siz + Build_Siz_Exp (Comp);
+   end if;
Next (Comp);
 end loop;
  end if;
@@ -6770,6 +6776,7 @@ package body Exp_Aggr is
   function Build_Siz_Exp (Comp : Node_Id) return Int is
  Lo, Hi   : Node_Id;
  Temp_Siz_Exp : Node_Id;
+ It   : Node_Id;
 
   begin
  if Nkind (Comp) = N_Range then
@@ -6835,8 +6842,28 @@ package body Exp_Aggr is
 end if;
 
  elsif Nkind (Comp) = N_Iterated_Component_Association then
-return Build_Siz_Exp (First (Discrete_Choices (Comp)));
+if Present (Iterator_Specification (Comp)) then
+
+   --  If the static size of the iterable object is known,
+   --  attempt to return it.
+
+   It := Name (Iterator_Specification (Comp));
+   Preanalyze (It);
 
+   --  Handle the simplest cases for now where It denotes a
+   --  top-level one-dimensional array objects".
+
+   if Nkind (It) in N_Identifier
+ and then Ekind (Etype (It)) = 

[COMMITTED 25/35] ada: Fix for validity checking and conditional evaluation of 'Old

2024-05-17 Thread Marc Poulhiès
From: Piotr Trojanek 

Detection of expressions that are "known on entry" (as defined in Ada
2022 RM 6.1.1(20/5)) was confused by validity checks when used from
within expansion of attribute 'Old.

gcc/ada/

* sem_util.adb (Is_Known_On_Entry): Handle constants introduced
by validity checks.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_util.adb | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index be777d26e46..d512d462b44 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -30791,6 +30791,14 @@ package body Sem_Util is
   return False;
end if;
 
+   --  Handle constants introduced by side-effect
+   --  removal, e.g. by validity checks.
+
+   if not Comes_From_Source (Obj) then
+  return
+Is_Known_On_Entry (Expression (Parent (Obj)));
+   end if;
+
--  return False if not "all views are constant".
if Is_Immutably_Limited_Type (Obj_Typ)
  or Needs_Finalization (Obj_Typ)
-- 
2.43.2



[COMMITTED 22/35] ada: Clarify code for aggregate warnings

2024-05-17 Thread Marc Poulhiès
From: Ronan Desplanques 

This patch improves comments in code that emits warnings about
particular situations involving aggregates. It also removes a
conjunct from a condition that is useless because it is always true in
the context of the test.

gcc/ada/

* sem_aggr.adb (Resolve_Array_Aggregate): Improve comments
and condition.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_aggr.adb | 52 +---
 1 file changed, 25 insertions(+), 27 deletions(-)

diff --git a/gcc/ada/sem_aggr.adb b/gcc/ada/sem_aggr.adb
index ee9beb04c9a..14c68b5eaf3 100644
--- a/gcc/ada/sem_aggr.adb
+++ b/gcc/ada/sem_aggr.adb
@@ -2873,9 +2873,9 @@ package body Sem_Aggr is
 --  No others clause present
 
 else
-   --  Special processing if others allowed and not present. This
-   --  means that the bounds of the aggregate come from the index
-   --  constraint (and the length must match).
+   --  Special processing if others allowed and not present. In
+   --  this case, the bounds of the aggregate come from the
+   --  choices (RM 4.3.3 (27)).
 
if Others_Allowed then
   Get_Index_Bounds (Index_Constr, Aggr_Low, Aggr_High);
@@ -2890,30 +2890,28 @@ package body Sem_Aggr is
  return False;
   end if;
 
-  --  If others allowed, and no others present, then the array
-  --  should cover all index values. If it does not, we will
-  --  get a length check warning, but there is two cases where
-  --  an additional warning is useful:
-
-  --  If we have no positional components, and the length is
-  --  wrong (which we can tell by others being allowed with
-  --  missing components), and the index type is an enumeration
-  --  type, then issue appropriate warnings about these missing
-  --  components. They are only warnings, since the aggregate
-  --  is fine, it's just the wrong length. We skip this check
-  --  for standard character types (since there are no literals
-  --  and it is too much trouble to concoct them), and also if
-  --  any of the bounds have values that are not known at
-  --  compile time.
-
-  --  Another case warranting a warning is when the length
-  --  is right, but as above we have an index type that is
-  --  an enumeration, and the bounds do not match. This is a
-  --  case where dubious sliding is allowed and we generate a
-  --  warning that the bounds do not match.
-
-  if No (Expressions (N))
-and then Nkind (Index) = N_Range
+  --  If there is an applicable index constraint and others is
+  --  not present, then sliding is allowed and only a length
+  --  check will be performed. However, additional warnings are
+  --  useful if the index type is an enumeration type, as
+  --  sliding is dubious in this case. We emit two kinds of
+  --  warnings:
+  --
+  --1. If the length is wrong then there are missing
+  --   components; we issue appropriate warnings about
+  --   these missing components. They are only warnings,
+  --   since the aggregate is fine, it's just the wrong
+  --   length. We skip this check for standard character
+  --   types (since there are no literals and it is too
+  --   much trouble to concoct them), and also if any of
+  --   the bounds have values that are not known at compile
+  --   time.
+  --
+  --2. If the length is right but the bounds do not match,
+  --   we issue a warning, as we consider sliding dubious
+  --   when the index type is an enumeration type.
+
+  if Nkind (Index) = N_Range
 and then Is_Enumeration_Type (Etype (Index))
 and then not Is_Standard_Character_Type (Etype (Index))
 and then Compile_Time_Known_Value (Aggr_Low)
-- 
2.43.2



[COMMITTED 23/35] ada: Disable Equivalent_Array_Aggregate optimization if predicates involved

2024-05-17 Thread Marc Poulhiès
From: Steve Baird 

In most paths, the function Build_Equivalent_Record_Aggregate was already
testing Has_Predicates for a given component type and conditionally returning
an Empty result. This is also needed in the case of a scalar component type.
Without it, we can build corrupt trees that fail use-before-definition
detection checks in gigi.

gcc/ada/

* exp_ch3.adb (Build_Equivalent_Record_Aggregate): Add
Has_Predicates test for a scalar component to match what is
already done for other kinds of components.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch3.adb | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
index 5764b22b800..f6314dff285 100644
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -1950,6 +1950,7 @@ package body Exp_Ch3 is
   or else not Compile_Time_Known_Value (Type_Low_Bound (Comp_Type))
   or else not
 Compile_Time_Known_Value (Type_High_Bound (Comp_Type))
+  or else Has_Predicates (Etype (Comp))
 then
Initialization_Warning (T);
return Empty;
-- 
2.43.2



[COMMITTED 20/35] ada: Expose utility routine for processing of Depends contracts in SPARK

2024-05-17 Thread Marc Poulhiès
From: Piotr Trojanek 

Routine Is_Unconstrained_Or_Tagged_Item is now used both in the GNAT
frontend (for checking legality of Depends clauses) and in the GNATprove
backend (for representing implicit inputs in flow graphs).

gcc/ada/

* sem_prag.adb (Is_Unconstrained_Or_Tagged_Item): Move to
Sem_Util, so it can be used from GNATprove.
* sem_util.ads (Is_Unconstrained_Or_Tagged_Item): Move from
Sem_Prag; spec.
* sem_util.adb (Is_Unconstrained_Or_Tagged_Item): Move from
Sem_Prag; body.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_prag.adb | 29 -
 gcc/ada/sem_util.adb | 23 +++
 gcc/ada/sem_util.ads |  5 +
 3 files changed, 28 insertions(+), 29 deletions(-)

diff --git a/gcc/ada/sem_prag.adb b/gcc/ada/sem_prag.adb
index 02aad4d1caa..f27e40edcbb 100644
--- a/gcc/ada/sem_prag.adb
+++ b/gcc/ada/sem_prag.adb
@@ -280,12 +280,6 @@ package body Sem_Prag is
--  Determine whether dependency clause Clause is surrounded by extra
--  parentheses. If this is the case, issue an error message.
 
-   function Is_Unconstrained_Or_Tagged_Item (Item : Entity_Id) return Boolean;
-   --  Subsidiary to Collect_Subprogram_Inputs_Outputs and the analysis of
-   --  pragma Depends. Determine whether the type of dependency item Item is
-   --  tagged, unconstrained array, unconstrained private or unconstrained
-   --  record.
-
procedure Record_Possible_Body_Reference
  (State_Id : Entity_Id;
   Ref  : Node_Id);
@@ -32959,29 +32953,6 @@ package body Sem_Prag is
   and then List_Containing (N) = Private_Declarations (Parent (N));
end Is_Private_SPARK_Mode;
 
-   -------------------------------------
-   -- Is_Unconstrained_Or_Tagged_Item --
-   -------------------------------------
-
-   function Is_Unconstrained_Or_Tagged_Item
- (Item : Entity_Id) return Boolean
-   is
-  Typ : constant Entity_Id := Etype (Item);
-   begin
-  if Is_Tagged_Type (Typ) then
- return True;
-
-  elsif Is_Array_Type (Typ)
-or else Is_Record_Type (Typ)
-or else Is_Private_Type (Typ)
-  then
- return not Is_Constrained (Typ);
-
-  else
- return False;
-  end if;
-   end Is_Unconstrained_Or_Tagged_Item;
-
-
-- Is_Valid_Assertion_Kind --
-
diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index dd9f868b696..be777d26e46 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -20709,6 +20709,29 @@ package body Sem_Util is
   return T = Universal_Integer or else T = Universal_Real;
end Is_Universal_Numeric_Type;
 
+   -------------------------------------
+   -- Is_Unconstrained_Or_Tagged_Item --
+   -------------------------------------
+
+   function Is_Unconstrained_Or_Tagged_Item
+ (Item : Entity_Id) return Boolean
+   is
+  Typ : constant Entity_Id := Etype (Item);
+   begin
+  if Is_Tagged_Type (Typ) then
+ return True;
+
+  elsif Is_Array_Type (Typ)
+or else Is_Record_Type (Typ)
+or else Is_Private_Type (Typ)
+  then
+ return not Is_Constrained (Typ);
+
+  else
+ return False;
+  end if;
+   end Is_Unconstrained_Or_Tagged_Item;
+
--
-- Is_User_Defined_Equality --
--
diff --git a/gcc/ada/sem_util.ads b/gcc/ada/sem_util.ads
index 99c60ddf708..4fef8966380 100644
--- a/gcc/ada/sem_util.ads
+++ b/gcc/ada/sem_util.ads
@@ -2397,6 +2397,11 @@ package Sem_Util is
pragma Inline (Is_Universal_Numeric_Type);
--  True if T is Universal_Integer or Universal_Real
 
+   function Is_Unconstrained_Or_Tagged_Item (Item : Entity_Id) return Boolean;
+   --  Subsidiary to Collect_Subprogram_Inputs_Outputs and the analysis of
+   --  pragma Depends. Determine whether the type of dependency item Item is
+   --  tagged, unconstrained array or unconstrained record.
+
function Is_User_Defined_Equality (Id : Entity_Id) return Boolean;
--  Determine whether an entity denotes a user-defined equality
 
-- 
2.43.2



[COMMITTED 14/35] ada: gnatbind-related cleanups

2024-05-17 Thread Marc Poulhiès
From: Bob Duff 

This patch cleans up some things noticed while working on gnatbind.
No change in behavior yet.

gcc/ada/

* ali-util.adb (Read_Withed_ALIs): Minor reformatting.
* bindo-units.adb (Corresponding_Body): Add assert.
(Corresponding_Spec): Likewise.
* uname.adb: Clean up assertions, use available functions.
Get_Spec_Name/Get_Body_Name can assert that N obeys the
conventions for Unit_Name_Type (end in "%s" or "%b").
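
The "%s"/"%b" convention mentioned above can be sketched on plain strings as follows. This is only an illustration: the helpers below are hypothetical String-based analogues, whereas the compiler's Uname routines operate on Unit_Name_Type values held in the name table.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Unit_Name_Demo is

   --  Hypothetical String-based helper: True if Name ends in '%'
   --  followed by Kind.
   function Has_Unit_Suffix
     (Name : String;
      Kind : Character) return Boolean
   is
     (Name'Length > 2
        and then Name (Name'Last - 1) = '%'
        and then Name (Name'Last) = Kind);

   function Is_Spec_Name (Name : String) return Boolean is
     (Has_Unit_Suffix (Name, 's'));

   function Is_Body_Name (Name : String) return Boolean is
     (Has_Unit_Suffix (Name, 'b'));

   --  Turn a spec name into the corresponding body name
   function Get_Body_Name (Name : String) return String is
   begin
      pragma Assert (Is_Spec_Name (Name));
      return Name (Name'First .. Name'Last - 1) & 'b';
   end Get_Body_Name;

begin
   Put_Line (Get_Body_Name ("ada.text_io%s"));
   Put_Line (Boolean'Image (Is_Body_Name ("ada.text_io%b")));
end Unit_Name_Demo;
```

Asserting through Is_Spec_Name rather than repeating the character tests keeps the convention in one place, which is the spirit of the cleanup.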

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/ali-util.adb|  4 +--
 gcc/ada/bindo-units.adb |  8 --
 gcc/ada/uname.adb   | 61 ++---
 3 files changed, 28 insertions(+), 45 deletions(-)

diff --git a/gcc/ada/ali-util.adb b/gcc/ada/ali-util.adb
index fe0af74086c..61dddb94e85 100644
--- a/gcc/ada/ali-util.adb
+++ b/gcc/ada/ali-util.adb
@@ -161,9 +161,7 @@ package body ALI.Util is
   --  Process all dependent units
 
   for U in ALIs.Table (Id).First_Unit .. ALIs.Table (Id).Last_Unit loop
- for
-   W in Units.Table (U).First_With .. Units.Table (U).Last_With
- loop
+ for W in Units.Table (U).First_With .. Units.Table (U).Last_With loop
 Afile := Withs.Table (W).Afile;
 
 --  Only process if not a generic (Afile /= No_File) and if
diff --git a/gcc/ada/bindo-units.adb b/gcc/ada/bindo-units.adb
index 0fbe8e9d381..0acc6612270 100644
--- a/gcc/ada/bindo-units.adb
+++ b/gcc/ada/bindo-units.adb
@@ -103,7 +103,9 @@ package body Bindo.Units is
 
begin
   pragma Assert (U_Rec.Utype = Is_Spec);
-  return U_Id - 1;
+  return Result : constant Unit_Id := U_Id - 1 do
+ pragma Assert (ALI.Units.Table (Result).Utype = Is_Body);
+  end return;
end Corresponding_Body;
 

@@ -117,7 +119,9 @@ package body Bindo.Units is
 
begin
   pragma Assert (U_Rec.Utype = Is_Body);
-  return U_Id + 1;
+  return Result : constant Unit_Id := U_Id + 1 do
+ pragma Assert (ALI.Units.Table (Result).Utype = Is_Spec);
+  end return;
end Corresponding_Spec;
 

diff --git a/gcc/ada/uname.adb b/gcc/ada/uname.adb
index 08574784173..dbb08b88cfd 100644
--- a/gcc/ada/uname.adb
+++ b/gcc/ada/uname.adb
@@ -50,14 +50,8 @@ package body Uname is
   Buffer : Bounded_String;
begin
   Append (Buffer, N);
-
-  pragma Assert
-(Buffer.Length > 2
- and then Buffer.Chars (Buffer.Length - 1) = '%'
- and then Buffer.Chars (Buffer.Length) = 's');
-
+  pragma Assert (Is_Spec_Name (N));
   Buffer.Chars (Buffer.Length) := 'b';
-
   return Name_Find (Buffer);
end Get_Body_Name;
 
@@ -160,14 +154,8 @@ package body Uname is
   Buffer : Bounded_String;
begin
   Append (Buffer, N);
-
-  pragma Assert
-(Buffer.Length > 2
- and then Buffer.Chars (Buffer.Length - 1) = '%'
- and then Buffer.Chars (Buffer.Length) = 'b');
-
+  pragma Assert (Is_Body_Name (N));
   Buffer.Chars (Buffer.Length) := 's';
-
   return Name_Find (Buffer);
end Get_Spec_Name;
 
@@ -416,6 +404,9 @@ package body Uname is
   Suffix : Boolean := True)
is
begin
+  pragma Assert (Buf.Chars (1) /= '"');
+  pragma Assert (Is_Body_Name (N) or else Is_Spec_Name (N));
+
   Buf.Length := 0;
   Append_Decoded (Buf, N);
 
@@ -424,17 +415,11 @@ package body Uname is
   --  (lower case) 's'/'b', and before appending (lower case) "spec" or
   --  "body".
 
-  pragma Assert (Buf.Length >= 3);
-  pragma Assert (Buf.Chars (1) /= '"');
-  pragma Assert (Buf.Chars (Buf.Length) in 's' | 'b');
-
   declare
  S : constant String :=
(if Buf.Chars (Buf.Length) = 's' then " (spec)" else " (body)");
   begin
- Buf.Length := Buf.Length - 1; -- remove 's' or 'b'
- pragma Assert (Buf.Chars (Buf.Length) = '%');
- Buf.Length := Buf.Length - 1; -- remove '%'
+ Buf.Length := Buf.Length - 2; -- remove "%s" or "%b"
  Set_Casing (Buf, Identifier_Casing (Source_Index (Main_Unit)));
 
  if Suffix then
@@ -474,9 +459,9 @@ package body Uname is
   Buffer : Bounded_String;
begin
   Append (Buffer, N);
-  return Buffer.Length > 2
-and then Buffer.Chars (Buffer.Length - 1) = '%'
-and then Buffer.Chars (Buffer.Length) = 'b';
+  pragma Assert
+(Buffer.Length > 2 and then Buffer.Chars (Buffer.Length - 1) = '%');
+  return Buffer.Chars (Buffer.Length) = 'b';
end Is_Body_Name;
 
---
@@ -535,10 +520,7 @@ package body Uname is
   System : constant String := "system";
 
begin
-  if Name = Ada
-or else Name = Interfaces
-or else Name = System
-  then
+  if Name in Ada | Interfaces | System then
  return True;
   end if;
 
@@ -555,15 +537,14 @@ package body Uname is
 
   --  The following are 

[COMMITTED 18/35] ada: gnatbind: subprogram spec no longer exists

2024-05-17 Thread Marc Poulhiès
From: Bob Duff 

If a subprogram spec S is present while compiling something that
says "with S;", but the spec is absent while compiling the body
of S, then gnatbind fails to detect the mismatch.  The spec and
body of S might have different parameter and result types.
This patch fixes gnatbind to detect this case and give an error.

gcc/ada/

* bcheck.adb (Check_Consistency_Of_Sdep): Split out new procedure.
Add check for special case of subprogram spec that no longer
exists.
(Check_Consistency): Call Check_Consistency_Of_Sdep, except when
Reified_Child_Spec is True. No need for "goto Continue" or "exit
Sdep_Loop".
* ali.ads (Subunit_Name, Unit_Name): Change the type to
Unit_Name_Type. Add a comment pointing to the ALI file
documentation, because it's in a somewhat-surprising place.
* ali.adb (Scan_ALI): Subunit_Name and Unit_Name are now
Unit_Name_Type. Remove comment explaining why Name_Find is used;
Name_Find is the usual case. Do not remove the "%s" or "%b" from
the Unit_Name. We need to be able to distinguish specs and bodies.
This is also necessary to obey the invariant of Unit_Name_Type.
* binde.adb (Write_Closure): Subunit_Name is now Unit_Name_Type.
* clean.adb (Clean_Executables): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/ali.adb|   9 +-
 gcc/ada/ali.ads|  10 +--
 gcc/ada/bcheck.adb | 216 +++--
 gcc/ada/binde.adb  |   2 +-
 gcc/ada/clean.adb  |   2 +-
 5 files changed, 141 insertions(+), 98 deletions(-)

diff --git a/gcc/ada/ali.adb b/gcc/ada/ali.adb
index 6bf48c04afe..69a91bce5ab 100644
--- a/gcc/ada/ali.adb
+++ b/gcc/ada/ali.adb
@@ -3287,8 +3287,8 @@ package body ALI is
 
 --  Acquire (sub)unit and reference file name entries
 
-Sdep.Table (Sdep.Last).Subunit_Name := No_Name;
-Sdep.Table (Sdep.Last).Unit_Name:= No_Name;
+Sdep.Table (Sdep.Last).Subunit_Name := No_Unit_Name;
+Sdep.Table (Sdep.Last).Unit_Name:= No_Unit_Name;
 Sdep.Table (Sdep.Last).Rfile:=
   Sdep.Table (Sdep.Last).Sfile;
 Sdep.Table (Sdep.Last).Start_Line   := 1;
@@ -3304,16 +3304,13 @@ package body ALI is
  Add_Char_To_Name_Buffer (Getc);
   end loop;
 
-  --  Set the (sub)unit name. Note that we use Name_Find rather
-  --  than Name_Enter here as the subunit name may already
-  --  have been put in the name table by the Project Manager.
+  --  Set the (sub)unit name.
 
   if Name_Len <= 2
 or else Name_Buffer (Name_Len - 1) /= '%'
   then
  Sdep.Table (Sdep.Last).Subunit_Name := Name_Find;
   else
- Name_Len := Name_Len - 2;
  Sdep.Table (Sdep.Last).Unit_Name := Name_Find;
   end if;
 
diff --git a/gcc/ada/ali.ads b/gcc/ada/ali.ads
index 67b8fcd1b80..1f452268681 100644
--- a/gcc/ada/ali.ads
+++ b/gcc/ada/ali.ads
@@ -25,7 +25,7 @@
 
 --  This package defines the internal data structures used for representation
 --  of Ada Library Information (ALI) acquired from the ALI files generated by
---  the front end.
+--  the front end. The format of the ALI files is documented in Lib.Writ.
 
 with Casing;  use Casing;
 with Gnatvsn; use Gnatvsn;
@@ -882,11 +882,11 @@ package ALI is
   --  Set True for dummy entries that correspond to missing files or files
   --  where no dependency relationship exists.
 
-  Subunit_Name : Name_Id;
-  --  Name_Id for subunit name if present, else No_Name
+  Subunit_Name : Unit_Name_Type;
+  --  Subunit name if present, else No_Unit_Name
 
-  Unit_Name : Name_Id;
-  --  Name_Id for the unit name if not a subunit (No_Name for a subunit)
+  Unit_Name : Unit_Name_Type;
+  --  Unit name if not a subunit (No_Unit_Name for a subunit)
 
   Rfile : File_Name_Type;
   --  Reference file name. Same as Sfile unless a Source_Reference pragma
diff --git a/gcc/ada/bcheck.adb b/gcc/ada/bcheck.adb
index dd2ece80d01..56a417cc517 100644
--- a/gcc/ada/bcheck.adb
+++ b/gcc/ada/bcheck.adb
@@ -36,6 +36,7 @@ with Osint;
 with Output;   use Output;
 with Rident;   use Rident;
 with Types;use Types;
+with Uname;
 
 package body Bcheck is
 
@@ -68,6 +69,12 @@ package body Bcheck is
--  Used to compare two unit names for No_Dependence checks. U1 is in
--  standard unit name format, and U2 is in literal form with periods.
 
+   procedure Check_Consistency_Of_Sdep
+ (A : ALIs_Record; D : Sdep_Record; Src : Source_Record);
+   --  Called by Check_Consistency to check the consistency of one Sdep record,
+   --  where A is the ALI, and D represents the unit it depends on, and Src is
+   --  the source file 

[COMMITTED 12/35] ada: Fix crash caused by missing New_Copy_Tree

2024-05-17 Thread Marc Poulhiès
Since a recent refactoring ("Factor common processing in expansion of
aggregates") in which Initialize_Array_Component and
Initialize_Record_Component were merged, the behavior has changed
slightly. When expanding an aggregate initialization where the number
of 'others' components is <= 3, the initialization expression is no
longer duplicated, which causes an incorrect multiple definition when
that expression is later transformed into an Expression_With_Actions
that declares an object. The simple fix is to add the now-missing
New_Copy_Tree where the assignments are created.
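
At the source level, the reason each unrolled assignment needs its own
copy of the expression is that the expression of an 'others' association
is evaluated once per component. The sketch below is not a reproducer
for the crash; it only shows that language rule using a function with a
visible side effect. All names are made up for the example.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Others_Evaluation is

   Counter : Natural := 0;

   --  Each call bumps the counter, so the number of evaluations of the
   --  aggregate's expression can be observed afterwards.
   function Next return Natural is
   begin
      Counter := Counter + 1;
      return Counter;
   end Next;

   type Small_Array is array (1 .. 3) of Natural;

   --  Three components, so the expression is evaluated three times.
   A : constant Small_Array := (others => Next);

begin
   Put_Line ("Evaluations:" & Natural'Image (Counter));

   for V of A loop
      Put_Line (Natural'Image (V));
   end loop;
end Others_Evaluation;
```

The counter ends at 3. The order in which the components receive their
values is not fixed by the language, so the example makes no claim
about it.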

gcc/ada/

* exp_aggr.adb (Build_Array_Aggr_Code) : Copy the
initialization expression when unrolling the loop.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index cff04fc1b79..9c5944a917d 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -1649,11 +1649,14 @@ package body Exp_Aggr is
and then Local_Expr_Value (H) - Local_Expr_Value (L) <= 2
and then not Is_Iterated_Component
  then
-Append_List_To (S, Gen_Assign (New_Copy_Tree (L), Expr));
-Append_List_To (S, Gen_Assign (Add (1, To => L), Expr));
+Append_List_To
+  (S, Gen_Assign (New_Copy_Tree (L), New_Copy_Tree (Expr)));
+Append_List_To
+  (S, Gen_Assign (Add (1, To => L), New_Copy_Tree (Expr)));
 
 if Local_Expr_Value (H) - Local_Expr_Value (L) = 2 then
-   Append_List_To (S, Gen_Assign (Add (2, To => L), Expr));
+   Append_List_To
+ (S, Gen_Assign (Add (2, To => L), New_Copy_Tree (Expr)));
 end if;
 
 return S;
-- 
2.43.2



[COMMITTED 15/35] ada: correction to gnatbind-related cleanups

2024-05-17 Thread Marc Poulhiès
From: Bob Duff 

Correction to previous change; Asserts had been moved to
before Buf was initialized.

gcc/ada/

* uname.adb (Get_Unit_Name_String): Move Asserts after
Buf is initialized.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/uname.adb | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/uname.adb b/gcc/ada/uname.adb
index dbb08b88cfd..5a7dac53b3d 100644
--- a/gcc/ada/uname.adb
+++ b/gcc/ada/uname.adb
@@ -404,11 +404,10 @@ package body Uname is
   Suffix : Boolean := True)
is
begin
-  pragma Assert (Buf.Chars (1) /= '"');
-  pragma Assert (Is_Body_Name (N) or else Is_Spec_Name (N));
-
   Buf.Length := 0;
   Append_Decoded (Buf, N);
+  pragma Assert (Buf.Chars (1) /= '"');
+  pragma Assert (Is_Body_Name (N) or else Is_Spec_Name (N));
 
   --  Buf always ends with "%s" or "%b", which we either remove, or replace
   --  with " (spec)" or " (body)". Set_Casing of Buf after checking for
-- 
2.43.2



[COMMITTED 29/35] ada: Replace spinlocks with fully-fledged locks in finalization collections

2024-05-17 Thread Marc Poulhiès
From: Eric Botcazou 

This replaces spinlocks with fully-fledged locks in finalization collections
because the former are deemed problematic with tasks that can be preempted.

Because of the requirement to avoid dragging the tasking runtime when it is
not necessary, the implementation goes through the usual soft links, with an
additional hurdle that space must be reserved for the lock in any case since
it is part of the ABI.  This entails the introduction of the System.OS_Locks
unit in the non-tasking runtime and the modification of the tasking runtime
to also use this unit.

This in turn requires a small adjustment: because of the presence of pre-
and post-conditions in Interfaces.C and of the limitations of the RTSfind
mechanism, the System.Finalization_Primitives unit must be preloaded, as
is done for the Ada.Strings.Text_Buffers unit.

This effectively reverts the implementation to using the global task lock on
bare board platforms.
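
For readers unfamiliar with the soft-link technique mentioned above, here
is a minimal standalone sketch of the general pattern: the non-tasking
runtime calls through access-to-subprogram variables that default to
no-ops, and the tasking runtime overwrites them at initialization. Every
name below is hypothetical and chosen for the example; the real
declarations live in the runtime and look different.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Soft_Links_Sketch is

   type Lock_Handler is access procedure;

   --  Non-tasking defaults: there is nothing to lock against.
   procedure Lock_NT is null;
   procedure Unlock_NT is null;

   --  Stand-ins for the routines a tasking runtime would install.
   procedure Lock_Tasking is
   begin
      Put_Line ("lock acquired");
   end Lock_Tasking;

   procedure Unlock_Tasking is
   begin
      Put_Line ("lock released");
   end Unlock_Tasking;

   --  The soft links themselves.
   Lock_Soft   : Lock_Handler := Lock_NT'Access;
   Unlock_Soft : Lock_Handler := Unlock_NT'Access;

   --  Client code only ever goes through the links.
   procedure Update_Collection is
   begin
      Lock_Soft.all;
      Put_Line ("updating collection");
      Unlock_Soft.all;
   end Update_Collection;

begin
   Update_Collection;   --  no locking: tasking is not present

   --  What a tasking runtime would do during its initialization
   Lock_Soft   := Lock_Tasking'Access;
   Unlock_Soft := Unlock_Tasking'Access;

   Update_Collection;   --  now bracketed by the lock
end Soft_Links_Sketch;
```

The client never names the tasking routines directly, which is what
keeps the tasking runtime out of programs that do not need it.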

gcc/ada/

* Makefile.rtl (GNATRTL_NONTASKING_OBJS): Add s-oslock$(objext).
(LIBGNAT_TARGET_PAIRS): Use s-oslock__dummy.ads by default.
Set specific s-oslock.ads source file for all the platforms.
* exp_ch7.ads (Preload_Finalization_Collection): New procedure.
* exp_ch7.adb (Allows_Finalization_Collection): Return False if
System.Finalization_Primitives has not been preloaded.
(Preload_Finalization_Collection): New procedure.
* opt.ads (Interface_Seen): New boolean variable.
* s-oscons-tmplt.c: Use "N" string for pragma Style_Checks.
* scng.adb (Scan): Set Interface_Seen upon seeing "interface".
* sem_ch10.adb: Add clause for Exp_Ch7.
(Analyze_Compilation_Unit): Call Preload_Finalization_Collection
after the context of the unit is analyzed.
* libgnarl/a-rttiev.adb: Add with clause for System.OS_Locks and
alphabetize others.
(Event_Queue_Lock): Adjust qualified name of subtype.
* libgnarl/s-osinte__aix.ads: Add with clause for System.OS_Locks
and change pthread_mutex_t into a local subtype.
* libgnarl/s-osinte__android.ads: Likewise.
* libgnarl/s-osinte__darwin.ads: Likewise.
* libgnarl/s-osinte__dragonfly.ads: Likewise.
* libgnarl/s-osinte__freebsd.ads: Likewise.
* libgnarl/s-osinte__gnu.ads: Likewise.
* libgnarl/s-osinte__hpux-dce.ads: Likewise.
* libgnarl/s-osinte__hpux.ads: Likewise.
* libgnarl/s-osinte__kfreebsd-gnu.ads: Likewise.
* libgnarl/s-osinte__linux.ads: Likewise.
* libgnarl/s-osinte__lynxos178e.ads: Likewise.
* libgnarl/s-osinte__qnx.ads: Likewise.
* libgnarl/s-osinte__rtems.ads: Likewise.
* libgnarl/s-osinte__mingw.ads: Add with clause for System.OS_Locks
and change CRITICAL_SECTION into a local subtype.  Add declarations
for imported procedures dealing with CRITICAL_SECTION.
* libgnarl/s-osinte__solaris.ads: Add with clause for System.OS_Locks
and change mutex_t into a local subtype.
* libgnarl/s-osinte__vxworks.ads: Add missing blank line.
* libgnarl/s-taprop.ads: Alphabetize clauses and package renamings.
Use qualified name for RTS_Lock throughout.
* libgnarl/s-taprop__dummy.adb: Add use clause for System.OS_Locks
and alphabetize others.
* libgnarl/s-taprop__hpux-dce.adb: Likewise.
* libgnarl/s-taprop__linux.adb: Likewise.
* libgnarl/s-taprop__posix.adb: Likewise.
* libgnarl/s-taprop__qnx.adb: Likewise.
* libgnarl/s-taprop__rtems.adb: Likewise.
* libgnarl/s-taprop__solaris.adb: Likewise.
* libgnarl/s-taprop__vxworks.adb: Likewise.
* libgnarl/s-taprop__mingw.adb: Likewise.  Remove declarations for
imported procedures dealing with CRITICAL_SECTION.
* libgnarl/s-tarest.adb: Add with clause for System.OS_Locks and
alphabetize others.
(Global_Task_Lock): Adjust qualified name of subtype.
* libgnarl/s-tasini.adb: Add clause for System.OS_Locks.
(Initialize_RTS_Lock): New procedure.
(Finalize_RTS_Lock): Likewise.
(Acquire_RTS_Lock): Likewise.
(Release_RTS_Lock): Likewise.
(Init_RTS): Add compile-time assertions for RTS_Lock types.
Set the soft links for the RTS lock manipulation routines.
* libgnarl/s-taspri__dummy.ads: Add with clause for System.OS_Locks.
(RTS_Lock): Delete and adjust throughout accordingly.
* libgnarl/s-taspri__hpux-dce.ads: Likewise.
* libgnarl/s-taspri__lynxos.ads: Likewise.
* libgnarl/s-taspri__mingw.ads: Likewise.
* libgnarl/s-taspri__posix-noaltstack.ads: Likewise.
* libgnarl/s-taspri__posix.ads: Likewise.
* libgnarl/s-taspri__solaris.ads: Likewise.
* libgnarl/s-taspri__vxworks.ads: Likewise.
* libgnat/s-finpri.ads: Add clause for System.OS_Locks.
(Finalization_Collection): Change 

[COMMITTED 17/35] ada: Update docs for Resolve_Null_Array_Aggregate

2024-05-17 Thread Marc Poulhiès
From: Ronan Desplanques 

The documentation comments for Sem_Aggr.Resolve_Null_Array_Aggregate
suggested that this subprogram created a subtype, which it didn't.
This patch replaces those comments with ones that better match the
behavior.

gcc/ada/

* sem_aggr.adb (Resolve_Null_Array_Aggregate): Update
documentation comments.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_aggr.adb | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/ada/sem_aggr.adb b/gcc/ada/sem_aggr.adb
index 508c86bc5de..64e7db79ecc 100644
--- a/gcc/ada/sem_aggr.adb
+++ b/gcc/ada/sem_aggr.adb
@@ -409,11 +409,10 @@ package body Sem_Aggr is
--  string as an aggregate, prior to resolution.
 
function Resolve_Null_Array_Aggregate (N : Node_Id) return Boolean;
-   --  For the Ada 2022 construct, build a subtype with a null range for each
-   --  dimension, using the bounds from the context subtype (if the subtype
-   --  is constrained). If the subtype is unconstrained, then the bounds
-   --  are determined in much the same way as the bounds for a null string
-   --  literal with no applicable index constraint.
+   --  The recursive method used to construct an aggregate's bounds in
+   --  Resolve_Array_Aggregate cannot work for null array aggregates. This
+   --  function constructs an appropriate list of ranges and stores its first
+   --  element in Aggregate_Bounds (N).
 
-
--  Delta aggregate processing --
@@ -4540,7 +4539,8 @@ package body Sem_Aggr is
 
   Set_Parent (Constr, N);
 
-  --  Create a constrained subtype with null dimensions
+  --  Populate the list with null ranges. The relevant RM clauses are
+  --  RM 4.3.3 (26.1) and RM 4.3.3 (26).
 
   Index := First_Index (Typ);
   while Present (Index) loop
-- 
2.43.2



[COMMITTED 13/35] ada: Make raise-gcc.c compatible with Clang

2024-05-17 Thread Marc Poulhiès
From: Sebastian Poeplau 

The Morello variant of Clang doesn't have
__builtin_code_address_from_pointer; work around it where necessary.

gcc/ada/

* raise-gcc.c: Work around __builtin_code_address_from_pointer
if it is unavailable.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/raise-gcc.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/ada/raise-gcc.c b/gcc/ada/raise-gcc.c
index 01cf4b6236d..7179f62529e 100644
--- a/gcc/ada/raise-gcc.c
+++ b/gcc/ada/raise-gcc.c
@@ -596,7 +596,15 @@ get_ip_from_context (_Unwind_Context *uw_context)
 #endif
 
 #if !defined(__USING_SJLJ_EXCEPTIONS__) && defined(__CHERI__)
+#if __has_builtin (__builtin_code_address_from_pointer)
   ip = __builtin_code_address_from_pointer ((void *)ip);
+#elif defined(__aarch64__)
+  /* Clang doesn't have __builtin_code_address_from_pointer to abstract over
+ target-specific differences. On AArch64, we need to drop the LSB of the
+ instruction pointer because it's not part of the address; it indicates the
+ CPU mode. */
+  ip &= ~1UL;
+#endif
 #endif
 
   /* Subtract 1 if necessary because GetIPInfo yields a call return address
-- 
2.43.2



[COMMITTED 09/35] ada: Simplify code for private types with unknown discriminants

2024-05-17 Thread Marc Poulhiès
From: Piotr Trojanek 

Private type entities have Is_Constrained set when they have neither
known nor unknown discriminants; the flag is now set slightly later,
but in a simpler way (the only code this could affect is
Process_Discriminants, which should not need the flag).

Also, we now reuse this flag to detect private types with discriminants.

Code cleanup; behavior is unaffected.

gcc/ada/

* sem_ch7.adb (New_Private_Type): Simplify setting of
Is_Constrained flag.
* sem_prag.adb (Is_Unconstrained_Or_Tagged_Item): Simplify
detection of private types with no discriminant.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch7.adb  | 7 +++
 gcc/ada/sem_prag.adb | 3 +--
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/gcc/ada/sem_ch7.adb b/gcc/ada/sem_ch7.adb
index 74646224452..a70d72c94c1 100644
--- a/gcc/ada/sem_ch7.adb
+++ b/gcc/ada/sem_ch7.adb
@@ -2746,10 +2746,6 @@ package body Sem_Ch7 is
   Set_Is_First_Subtype (Id);
   Reinit_Size_Align (Id);
 
-  Set_Is_Constrained (Id,
-No (Discriminant_Specifications (N))
-  and then not Unknown_Discriminants_Present (N));
-
   --  Set tagged flag before processing discriminants, to catch illegal
   --  usage.
 
@@ -2765,6 +2761,9 @@ package body Sem_Ch7 is
 
   elsif Unknown_Discriminants_Present (N) then
  Set_Has_Unknown_Discriminants (Id);
+
+  else
+ Set_Is_Constrained (Id);
   end if;
 
   Set_Private_Dependents (Id, New_Elmt_List);
diff --git a/gcc/ada/sem_prag.adb b/gcc/ada/sem_prag.adb
index 0302cdb00ba..e57f42d9a54 100644
--- a/gcc/ada/sem_prag.adb
+++ b/gcc/ada/sem_prag.adb
@@ -32978,8 +32978,7 @@ package body Sem_Prag is
  return Has_Discriminants (Typ) and then not Is_Constrained (Typ);
 
   elsif Is_Private_Type (Typ) then
- return Has_Discriminants (Typ)
-   or else Has_Unknown_Discriminants (Typ);
+ return not Is_Constrained (Typ);
 
   else
  return False;
-- 
2.43.2



[COMMITTED 34/35] ada: Remove outdated workaround in aggregate expansion

2024-05-17 Thread Marc Poulhiès
From: Ronan Desplanques 

Before this patch, the compiler refrained from rewriting aggregates
into purely positional form in some cases of one-component aggregates.
As explained in comments, this was because the back end could not
handle positional aggregates in those situations.

As the back end seems to have grown more capable, this patch removes
the workaround. It also extends the comments describing a warning that
is emitted in the same configuration with aggregates.
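
To illustrate the kind of confusion the warning is about, the sketch
below declares two arrays whose aggregates differ only in whether the
single choice names a subtype or an object. It shows the language
semantics only, and whether GNAT warns on this exact code is not
claimed here. The names are made up for the example.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure One_Component_Aggregate is

   type Vector is array (Integer range <>) of Integer;

   subtype Index is Integer range 1 .. 10;

   N : Integer := 10;   --  an object, hence a nonstatic choice below

   --  The choice is a subtype: one component per value of Index
   By_Subtype : constant Vector := (Index => 0);

   --  The choice is an object: a single component at index N
   By_Object  : constant Vector := (N => 0);

begin
   Put_Line ("By_Subtype'Length =" & Integer'Image (By_Subtype'Length));
   Put_Line ("By_Object'Length  =" & Integer'Image (By_Object'Length));
end One_Component_Aggregate;
```

The first array has ten components and the second only one, which is
easy to get wrong when a subtype and an object have similar names.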

gcc/ada/

* exp_aggr.adb (Aggr_Size_OK): Remove workaround and extend
comment.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index 9c5944a917d..892f47ceb05 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -711,9 +711,10 @@ package body Exp_Aggr is
 return True;
  end if;
 
- --  One-component aggregates are suspicious, and if the context type
- --  is an object declaration with nonstatic bounds it will trip gcc;
- --  such an aggregate must be expanded into a single assignment.
+ --  One-component named aggregates where the index constraint is not
+ --  known at compile time are suspicious as the user might have
+ --  intended to write a subtype name but wrote the name of an object
+ --  instead. We emit a warning if we're in such a case.
 
  if Hiv = Lov and then Nkind (Parent (N)) = N_Object_Declaration then
 declare
@@ -741,8 +742,6 @@ package body Exp_Aggr is
 Error_Msg_N ("\maybe subtype name was meant??", Indx);
  end if;
   end if;
-
-  return False;
end if;
 end;
  end if;
-- 
2.43.2



[COMMITTED 19/35] ada: Couple of adjustments coming from aliasing considerations

2024-05-17 Thread Marc Poulhiès
From: Eric Botcazou 

The first adjustment is to the expansion of implementation types for array
types with peculiar index types, for which the aliased property set on the
component of the original type must be copied; the implicit base type also
needs to be properly marked if the implementation type is constrained.

The second adjustment is to selected types in the runtime, which need to
be marked with pragma Universal_Aliasing because of their special usage.
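
As a small illustration of the second point, the sketch below views a
Float object through an access to a different type obtained by
Unchecked_Conversion, which is the sort of usage that calls for pragma
Universal_Aliasing on the designated type. The types are invented for
the example and are unrelated to the runtime types touched by the patch.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Unchecked_Conversion;
with Interfaces;

procedure Universal_Aliasing_Sketch is

   type Word is new Interfaces.Unsigned_32;
   pragma Universal_Aliasing (Word);
   --  Accesses through Word may alias objects of any other type

   type Word_Access  is access all Word;
   type Float_Access is access all Float;

   function To_Word_Access is
     new Ada.Unchecked_Conversion (Float_Access, Word_Access);

   F : aliased Float := 1.0;
   W : constant Word_Access := To_Word_Access (F'Access);

begin
   --  Read the bits of F through the Word view
   Put_Line (Word'Image (W.all));
end Universal_Aliasing_Sketch;
```

On a target where Float is a 32-bit IEEE single, this prints
1065353216, that is 16#3F80_0000#. Without the pragma, an optimizer
relying on type-based aliasing would be entitled to assume that W.all
and F do not overlap.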

gcc/ada/

* exp_pakd.adb (Create_Packed_Array_Impl_Type): For non-bit-packed
array types, propagate the aliased property of the component.
(Install_PAT): Set fields on the implicit base type of an array.
* libgnat/a-stream.ads (private part): Add pragma Universal_Aliasing
for Stream_Element.
* libgnat/g-alleve.ads: Add pragma Universal_Aliasing for all the
vector types.
* libgnat/g-alleve__hard.ads: Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_pakd.adb   | 12 +--
 gcc/ada/libgnat/a-stream.ads   |  3 ++
 gcc/ada/libgnat/g-alleve.ads   | 54 ++
 gcc/ada/libgnat/g-alleve__hard.ads | 11 ++
 4 files changed, 71 insertions(+), 9 deletions(-)

diff --git a/gcc/ada/exp_pakd.adb b/gcc/ada/exp_pakd.adb
index 3f26c3527fa..59dfe5df8df 100644
--- a/gcc/ada/exp_pakd.adb
+++ b/gcc/ada/exp_pakd.adb
@@ -598,6 +598,14 @@ package body Exp_Pakd is
  Set_Associated_Node_For_Itype (PAT, Typ);
  Set_Original_Array_Type   (PAT, Typ);
 
+ --  In the case of a constrained array type, also set fields on the
+ --  implicit base type built during the analysis of its declaration.
+
+ if Ekind (PAT) = E_Array_Subtype then
+Set_Is_Packed_Array_Impl_Type (Etype (PAT), True);
+Set_Original_Array_Type   (Etype (PAT), Base_Type (Typ));
+ end if;
+
  --  Propagate representation aspects
 
  Set_Is_Atomic   (PAT, Is_Atomic(Typ));
@@ -818,7 +826,7 @@ package body Exp_Pakd is
Subtype_Marks => Indexes,
Component_Definition =>
  Make_Component_Definition (Loc,
-   Aliased_Present=> False,
+   Aliased_Present=> Has_Aliased_Components (Typ),
Subtype_Indication =>
   New_Occurrence_Of (Ctyp, Loc)));
 
@@ -828,7 +836,7 @@ package body Exp_Pakd is
 Discrete_Subtype_Definitions => Indexes,
 Component_Definition =>
   Make_Component_Definition (Loc,
-Aliased_Present=> False,
+Aliased_Present=> Has_Aliased_Components (Typ),
 Subtype_Indication =>
   New_Occurrence_Of (Ctyp, Loc)));
 end if;
diff --git a/gcc/ada/libgnat/a-stream.ads b/gcc/ada/libgnat/a-stream.ads
index 0a0cabce3f2..dcb5a9aa81c 100644
--- a/gcc/ada/libgnat/a-stream.ads
+++ b/gcc/ada/libgnat/a-stream.ads
@@ -84,4 +84,7 @@ private
for Stream_Element_Array'Read use Read_SEA;
for Stream_Element_Array'Write use Write_SEA;
 
+   pragma Universal_Aliasing (Stream_Element);
+   --  This type is used to stream any other type
+
 end Ada.Streams;
diff --git a/gcc/ada/libgnat/g-alleve.ads b/gcc/ada/libgnat/g-alleve.ads
index 0f3ec36d0f1..4e22a3e6387 100644
--- a/gcc/ada/libgnat/g-alleve.ads
+++ b/gcc/ada/libgnat/g-alleve.ads
@@ -313,22 +313,62 @@ private
---
 
--  We simply use the natural array definitions corresponding to each
-   --  user-level vector type.
+   --  user-level vector type. We need to put pragma Universal_Aliasing
+   --  on these types because the common operations are implemented by
+   --  means of Unchecked_Conversion between different representations.
 
-   type LL_VUI is new VUI_View;
-   type LL_VSI is new VSI_View;
-   type LL_VBI is new VBI_View;
+   --
+   -- char Core Components --
+   --
+
+   type LL_VUC is new VUC_View;
+   pragma Universal_Aliasing (LL_VUC);
+
+   type LL_VSC is new VSC_View;
+   pragma Universal_Aliasing (LL_VSC);
+
+   type LL_VBC is new VBC_View;
+   pragma Universal_Aliasing (LL_VBC);
+
+   ---
+   -- short Core Components --
+   ---
 
type LL_VUS is new VUS_View;
+   pragma Universal_Aliasing (LL_VUS);
+
type LL_VSS is new VSS_View;
+   pragma Universal_Aliasing (LL_VSS);
+
type LL_VBS is new VBS_View;
+   pragma Universal_Aliasing (LL_VBS);
 
-   type LL_VUC is new VUC_View;
-   type LL_VSC is new VSC_View;
-   type LL_VBC is new VBC_View;
+   -
+   -- int Core Components --
+   -
+
+   type LL_VUI is new VUI_View;
+   pragma Universal_Aliasing (LL_VUI);
+
+   type LL_VSI is new 

[COMMITTED 10/35] ada: Only record types with discriminants can be unconstrained

2024-05-17 Thread Marc Poulhiès
From: Piotr Trojanek 

Remove redundant condition for detecting unconstrained record types.

Code cleanup; behavior is unaffected.

gcc/ada/

* sem_prag.adb (Is_Unconstrained_Or_Tagged_Item): Remove call
to Has_Discriminants; combine ELSIF branches.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_prag.adb | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/gcc/ada/sem_prag.adb b/gcc/ada/sem_prag.adb
index e57f42d9a54..02aad4d1caa 100644
--- a/gcc/ada/sem_prag.adb
+++ b/gcc/ada/sem_prag.adb
@@ -32971,13 +32971,10 @@ package body Sem_Prag is
   if Is_Tagged_Type (Typ) then
  return True;
 
-  elsif Is_Array_Type (Typ) then
- return not Is_Constrained (Typ);
-
-  elsif Is_Record_Type (Typ) then
- return Has_Discriminants (Typ) and then not Is_Constrained (Typ);
-
-  elsif Is_Private_Type (Typ) then
+  elsif Is_Array_Type (Typ)
+or else Is_Record_Type (Typ)
+or else Is_Private_Type (Typ)
+  then
  return not Is_Constrained (Typ);
 
   else
-- 
2.43.2



[COMMITTED 16/35] ada: Fix containers' Reference_Preserving_Key functions' memory leaks

2024-05-17 Thread Marc Poulhiès
From: Steve Baird 

Fix memory leaks in containers' Reference_Preserving_Key functions

Make the same change in each of 3 Ada.Containers child units: Ordered_Sets,
Indefinite_Ordered_Sets, and Bounded_Ordered_Sets. The function
Reference_Preserving_Key evaluates an allocator of type Key_Access whose
storage was not being reclaimed. Update the Finalize procedure for
type Reference_Control_Type to free that storage. But this change introduces
a possible erroneous double-free situation if an object is copied (because
the original and the copy will each be finalized at some point). So also
introduce an Adjust procedure which allocates a copy of the allocated object.
Another possible solution to this problem (which is not being implemented
yet) is based on implementing AI22-0082. Also include a fix for a bug in
Sem_Util.Has_Some_Controlled_Component that was discovered while working
on this.
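
The Finalize/Adjust pairing described above can be sketched on a toy
controlled type as follows. This is a simplified model under assumed
names (Holders, Holder, Free), with Integer standing in for Key_Type; it
is not the container code itself.

```ada
with Ada.Finalization;
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Unchecked_Deallocation;

procedure Deep_Copy_Sketch is

   package Holders is
      type Key_Access is access Integer;

      type Holder is new Ada.Finalization.Controlled with record
         Old_Key : Key_Access;
      end record;

      overriding procedure Adjust   (H : in out Holder);
      overriding procedure Finalize (H : in out Holder);
   end Holders;

   package body Holders is

      procedure Free is
        new Ada.Unchecked_Deallocation (Integer, Key_Access);

      --  After a copy, give the new object its own allocation so that
      --  the original and the copy never share a pointer.
      overriding procedure Adjust (H : in out Holder) is
      begin
         if H.Old_Key /= null then
            H.Old_Key := new Integer'(H.Old_Key.all);
         end if;
      end Adjust;

      --  Reclaim the allocation. Free sets the pointer to null, so a
      --  repeated Finalize is harmless.
      overriding procedure Finalize (H : in out Holder) is
      begin
         Free (H.Old_Key);
      end Finalize;

   end Holders;

   use Holders;

   A : Holder :=
     (Ada.Finalization.Controlled with Old_Key => new Integer'(42));
   B : Holder := A;   --  Adjust runs on B here

begin
   B.Old_Key.all := 7;

   Put_Line ("A:" & Integer'Image (A.Old_Key.all));
   Put_Line ("B:" & Integer'Image (B.Old_Key.all));
end Deep_Copy_Sketch;
```

A keeps the value 42 while B holds 7, and each object frees only its
own allocation when it goes out of scope. Without Adjust, both would
free the same pointer.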

gcc/ada/

* sem_util.adb (Has_Some_Controlled_Component): Fix a bug which
causes (in some cases involving a Disable_Controlled aspect
specification) Needs_Finalization to return different answers for
one type depending on whether the function is called before or
after the type is frozen.
* libgnat/a-coorse.ads: Type Reference_Control_Type gets an Adjust
procedure.
* libgnat/a-cborse.ads: Likewise.
* libgnat/a-ciorse.ads: Likewise.
* libgnat/a-coorse.adb:
(Finalize): Reclaim allocated Key_Type object.
(Adjust): New procedure; prevent sharing of non-null Key_Access
values by allocating a copy.
* libgnat/a-cborse.adb: Likewise.
* libgnat/a-ciorse.adb: Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/a-cborse.adb | 17 +
 gcc/ada/libgnat/a-cborse.ads |  3 +++
 gcc/ada/libgnat/a-ciorse.adb | 16 +++-
 gcc/ada/libgnat/a-ciorse.ads |  3 +++
 gcc/ada/libgnat/a-coorse.adb | 16 +++-
 gcc/ada/libgnat/a-coorse.ads |  3 +++
 gcc/ada/sem_util.adb |  6 +-
 7 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/libgnat/a-cborse.adb b/gcc/ada/libgnat/a-cborse.adb
index b649c5eb6e7..9d2a0216342 100644
--- a/gcc/ada/libgnat/a-cborse.adb
+++ b/gcc/ada/libgnat/a-cborse.adb
@@ -40,6 +40,8 @@ with Ada.Containers.Red_Black_Trees.Generic_Bounded_Set_Operations;
 pragma Elaborate_All
   (Ada.Containers.Red_Black_Trees.Generic_Bounded_Set_Operations);
 
+with Ada.Unchecked_Deallocation;
+
 with System; use type System.Address;
 with System.Put_Images;
 
@@ -775,6 +777,18 @@ is
Is_Less_Key_Node=> Is_Less_Key_Node,
Is_Greater_Key_Node => Is_Greater_Key_Node);
 
+  
+  -- Adjust --
+  
+
+  procedure Adjust (Control : in out Reference_Control_Type) is
+  begin
+ Impl.Reference_Control_Type (Control).Adjust;
+ if Control.Old_Key /= null then
+Control.Old_Key := new Key_Type'(Control.Old_Key.all);
+ end if;
+  end Adjust;
+
   -
   -- Ceiling --
   -
@@ -872,6 +886,8 @@ is
   --
 
   procedure Finalize (Control : in out Reference_Control_Type) is
+ procedure Deallocate is
+   new Ada.Unchecked_Deallocation (Key_Type, Key_Access);
   begin
  if Control.Container /= null then
 Impl.Reference_Control_Type (Control).Finalize;
@@ -883,6 +899,7 @@ is
 end if;
 
 Control.Container := null;
+Deallocate (Control.Old_Key);
  end if;
   end Finalize;
 
diff --git a/gcc/ada/libgnat/a-cborse.ads b/gcc/ada/libgnat/a-cborse.ads
index 2366d1adcc2..650f4a40384 100644
--- a/gcc/ada/libgnat/a-cborse.ads
+++ b/gcc/ada/libgnat/a-cborse.ads
@@ -324,6 +324,9 @@ is
  Old_Key   : Key_Access;
   end record;
 
+  overriding procedure Adjust (Control : in out Reference_Control_Type);
+  pragma Inline (Adjust);
+
   overriding procedure Finalize (Control : in out Reference_Control_Type);
   pragma Inline (Finalize);
 
diff --git a/gcc/ada/libgnat/a-ciorse.adb b/gcc/ada/libgnat/a-ciorse.adb
index d90fb882b43..fe91345cdd4 100644
--- a/gcc/ada/libgnat/a-ciorse.adb
+++ b/gcc/ada/libgnat/a-ciorse.adb
@@ -807,6 +807,18 @@ is
Is_Less_Key_Node=> Is_Less_Key_Node,
Is_Greater_Key_Node => Is_Greater_Key_Node);
 
+  
+  -- Adjust --
+  
+
+  procedure Adjust (Control : in out Reference_Control_Type) is
+  begin
+ Impl.Reference_Control_Type (Control).Adjust;
+ if Control.Old_Key /= null then
+Control.Old_Key := new Key_Type'(Control.Old_Key.all);
+ end if;
+  end Adjust;
+
   -
   -- Ceiling --
   -
@@ -906,6 +918,8 @@ is
   --
 
   procedure Finalize (Control : in out Reference_Control_Type) is
+ procedure Deallocate is
+  

[COMMITTED 02/35] ada: Small cleanup in aggregate expansion code

2024-05-17 Thread Marc Poulhiès
From: Ronan Desplanques 

This patch moves a statement outside of a loop because it didn't
need to be inside that loop. The behavior of the program is not
affected.
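
For context, the statement being moved is the check for quantified
expressions among the aggregate's component expressions. The sketch
below shows what such an aggregate looks like in source form, with a
quantified expression both in a named association and in the 'others'
association. It is an illustration with invented names, not a test case
from the patch.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Quantified_Component is

   type Int_Array is array (Positive range <>) of Integer;
   type Flags is array (1 .. 3) of Boolean;

   Data : constant Int_Array := (1, 2, 3);

   --  Each component expression is a quantified expression, that is, a
   --  construct that expands into a loop.
   F : constant Flags :=
     (1      => (for all X of Data => X > 0),
      others => (for some X of Data => X > 5));

begin
   for B of F loop
      Put_Line (Boolean'Image (B));
   end loop;
end Quantified_Component;
```

The first component is True because every element of Data is positive,
and the remaining two are False because no element exceeds 5.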

gcc/ada/

* exp_aggr.adb (Flatten): Small cleanup.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index 5d2b334722a..cff04fc1b79 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -4626,6 +4626,14 @@ package body Exp_Aggr is
 Component_Loop : while Present (Elmt) loop
Expr := Expression (Elmt);
 
+   --  If the expression involves a construct that generates a
+   --  loop, we must generate individual assignments and no
+   --  flattening is possible.
+
+   if Nkind (Expr) = N_Quantified_Expression then
+  return False;
+   end if;
+
--  In the case of a multidimensional array, check that the
--  aggregate can be recursively flattened.
 
@@ -4642,14 +4650,6 @@ package body Exp_Aggr is
   if Nkind (Choice) = N_Others_Choice then
  Rep_Count := 0;
 
- --  If the expression involves a construct that generates
- --  a loop, we must generate individual assignments and
- --  no flattening is possible.
-
- if Nkind (Expr) = N_Quantified_Expression then
-return False;
- end if;
-
  for J in Vals'Range loop
 if No (Vals (J)) then
Vals (J)  := New_Copy_Tree (Expr);
-- 
2.43.2



[COMMITTED 28/35] ada: Document secondary usage of Materialize_Entity flag

2024-05-17 Thread Marc Poulhiès
From: Eric Botcazou 

The flag is also used by the semantic analyzer.

gcc/ada/

* einfo.ads (Materialize_Entity): Document secondary usage.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/einfo.ads | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/einfo.ads b/gcc/ada/einfo.ads
index 71c560d5272..e5110f51670 100644
--- a/gcc/ada/einfo.ads
+++ b/gcc/ada/einfo.ads
@@ -3584,10 +3584,11 @@ package Einfo is
 --   tasks implementing such interface.
 
 --Materialize_Entity
---   Defined in all entities. Set only for renamed obects which should be
+--   Defined in all entities. Set mostly for renamed objects that should be
 --   materialized for debugging purposes. This means that a memory location
 --   containing the renamed address should be allocated. This is needed so
---   that the debugger can find the entity.
+--   that the debugger can find the entity. Also set on types built in the
+--   case of unanalyzed packages referenced through a limited_with clause.
 
 --May_Inherit_Delayed_Rep_Aspects
 --   Defined in all entities for types and subtypes. Set if the type is
-- 
2.43.2



[COMMITTED 08/35] ada: Allow private items with unknown discriminants as Depends inputs

2024-05-17 Thread Marc Poulhiès
From: Piotr Trojanek 

Objects of private types with unknown discriminants are now allowed as
inputs in the Depends contracts.

gcc/ada/

* sem_prag.adb (Is_Unconstrained_Or_Tagged_Item): Allow objects
of private types with unknown discriminants.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_prag.adb | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/sem_prag.adb b/gcc/ada/sem_prag.adb
index 9dc22e3edc1..0302cdb00ba 100644
--- a/gcc/ada/sem_prag.adb
+++ b/gcc/ada/sem_prag.adb
@@ -283,7 +283,8 @@ package body Sem_Prag is
function Is_Unconstrained_Or_Tagged_Item (Item : Entity_Id) return Boolean;
--  Subsidiary to Collect_Subprogram_Inputs_Outputs and the analysis of
--  pragma Depends. Determine whether the type of dependency item Item is
-   --  tagged, unconstrained array or unconstrained record.
+   --  tagged, unconstrained array, unconstrained private or unconstrained
+   --  record.
 
procedure Record_Possible_Body_Reference
  (State_Id : Entity_Id;
@@ -32977,7 +32978,8 @@ package body Sem_Prag is
  return Has_Discriminants (Typ) and then not Is_Constrained (Typ);
 
   elsif Is_Private_Type (Typ) then
- return Has_Discriminants (Typ);
+ return Has_Discriminants (Typ)
+   or else Has_Unknown_Discriminants (Typ);
 
   else
  return False;
-- 
2.43.2



[COMMITTED 11/35] ada: Fix Constraint_Error on mutable assignment

2024-05-17 Thread Marc Poulhiès
From: Bob Duff 

For an assignment statement "X := Y;", where X is a formal parameter
of a "late overriding" subprogram (i.e. it has no spec, and the body
is overriding), and the subtype of X is an unconstrained record with
defaulted discriminants, if the actual parameter passed to X is
unconstrained, then X is unconstrained. This patch fixes a bug
where X was incorrectly considered constrained, so that
Constraint_Error was raised if Y's discriminants differed from X's.

The bug was caused by the fact that an extra "constrained" formal
parameter was missing in both caller and callee.
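
The language rule at stake can be seen on an ordinary, non-dispatching
subprogram. The sketch below is not the late-overriding scenario from
the bug report; it only shows that whether "X := Y" may change the
discriminant depends on the actual parameter, which is the information
the extra "constrained" formal carries. All names are invented for the
example.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Mutable_Assignment is

   type Shape (Sides : Natural := 0) is record
      null;
   end record;

   procedure Assign (X : in out Shape; Y : Shape) is
   begin
      --  Changing the discriminant is allowed only if the actual
      --  parameter passed for X is unconstrained.
      X := Y;
   end Assign;

   Mutable : Shape;                --  unconstrained, defaults to Sides = 0
   Fixed   : Shape (Sides => 3);   --  constrained for its whole lifetime

begin
   Assign (Mutable, Shape'(Sides => 4));
   Put_Line ("Mutable.Sides =" & Natural'Image (Mutable.Sides));

   begin
      Assign (Fixed, Shape'(Sides => 4));
      Put_Line ("unexpected: no exception");
   exception
      when Constraint_Error =>
         Put_Line ("Constraint_Error for the constrained actual");
   end;
end Mutable_Assignment;
```

The first call succeeds and leaves Mutable with four sides. The second
raises Constraint_Error, which is correct there but was also happening
for unconstrained actuals in the case fixed by this patch.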

gcc/ada/

* sem_disp.adb (Check_Dispatching_Operation): Call
Create_Extra_Formals, so that the caller will have an extra
"constrained" parameter, which will be checked on assignment in
the callee, and will be passed in by the caller.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_disp.adb | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/sem_disp.adb b/gcc/ada/sem_disp.adb
index 525a9f7f0a1..fd521a09bc0 100644
--- a/gcc/ada/sem_disp.adb
+++ b/gcc/ada/sem_disp.adb
@@ -1514,10 +1514,10 @@ package body Sem_Disp is
 Subp);
 
   else
-
  --  The subprogram body declares a primitive operation.
  --  We must update its dispatching information here. The
  --  information is taken from the overridden subprogram.
+ --  Such a late-overriding body also needs extra formals.
  --  We must also generate a cross-reference entry because
  --  references to other primitives were already created
  --  when type was frozen.
@@ -1527,6 +1527,7 @@ package body Sem_Disp is
  if Present (DTC_Entity (Old_Subp)) then
 Set_DTC_Entity (Subp, DTC_Entity (Old_Subp));
 Set_DT_Position_Value (Subp, DT_Position (Old_Subp));
+Create_Extra_Formals (Subp);
 
 if not Restriction_Active (No_Dispatching_Calls) then
if Building_Static_DT (Tagged_Type) then
-- 
2.43.2



[COMMITTED 03/35] ada: Remove superfluous Relocate_Node calls

2024-05-17 Thread Marc Poulhiès
From: Ronan Desplanques 

This patch removes two calls to Relocate_Node that were not needed.
This does not affect the behavior of the compiler.

gcc/ada/

* exp_ch4.adb (Expand_N_Case_Expression): Remove call to
Relocate_Node.
* sem_attr.adb (Analyze_Attribute): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch4.adb  | 2 +-
 gcc/ada/sem_attr.adb | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb
index 448cd5c82b6..42d18f1 100644
--- a/gcc/ada/exp_ch4.adb
+++ b/gcc/ada/exp_ch4.adb
@@ -5109,7 +5109,7 @@ package body Exp_Ch4 is
   else
  Alt_Expr :=
Make_Attribute_Reference (Alt_Loc,
- Prefix => Relocate_Node (Alt_Expr),
+ Prefix => Alt_Expr,
  Attribute_Name => Name_Unrestricted_Access);
   end if;
end if;
diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
index 629033ca5ac..a921909685a 100644
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -3425,7 +3425,7 @@ package body Sem_Attr is
   --  perform legality checks on the original tree.
 
   if Nkind (P) in N_Raise_xxx_Error then
- Rewrite (N, Relocate_Node (P));
+ Rewrite (N, P);
  P := Original_Node (P_Old);
   end if;
 
-- 
2.43.2



[COMMITTED 06/35] ada: Fix probable copy/paste error

2024-05-17 Thread Marc Poulhiès
gcc/ada/

* doc/gnat_rm/implementation_defined_attributes.rst: Fix
copy/paste.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/doc/gnat_rm/implementation_defined_attributes.rst | 7 +++
 gcc/ada/gnat_rm.texi  | 7 +++
 gcc/ada/gnat_ugn.texi | 4 ++--
 3 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/gcc/ada/doc/gnat_rm/implementation_defined_attributes.rst b/gcc/ada/doc/gnat_rm/implementation_defined_attributes.rst
index f8700b1be4e..728d63a8e92 100644
--- a/gcc/ada/doc/gnat_rm/implementation_defined_attributes.rst
+++ b/gcc/ada/doc/gnat_rm/implementation_defined_attributes.rst
@@ -81,10 +81,9 @@ Attribute Atomic_Always_Lock_Free
 =
 .. index:: Atomic_Always_Lock_Free
 
-The prefix of the ``Atomic_Always_Lock_Free`` attribute is a type.
-The result is a Boolean value which is True if the type has discriminants,
-and False otherwise.  The result indicate whether atomic operations are
-supported by the target for the given type.
+The prefix of the ``Atomic_Always_Lock_Free`` attribute is a type. The
+result indicates whether atomic operations are supported by the target
+for the given type.
 
 Attribute Bit
 =
diff --git a/gcc/ada/gnat_rm.texi b/gcc/ada/gnat_rm.texi
index 6da3f3131d5..8dcdd6ca14c 100644
--- a/gcc/ada/gnat_rm.texi
+++ b/gcc/ada/gnat_rm.texi
@@ -10373,10 +10373,9 @@ either be omitted, or explicitly given as @code{No_Output_Operands}.
 
 @geindex Atomic_Always_Lock_Free
 
-The prefix of the @code{Atomic_Always_Lock_Free} attribute is a type.
-The result is a Boolean value which is True if the type has discriminants,
-and False otherwise.  The result indicate whether atomic operations are
-supported by the target for the given type.
+The prefix of the @code{Atomic_Always_Lock_Free} attribute is a type. The
+result indicates whether atomic operations are supported by the target
+for the given type.
 
 @node Attribute Bit,Attribute Bit_Position,Attribute Atomic_Always_Lock_Free,Implementation Defined Attributes
 @anchor{gnat_rm/implementation_defined_attributes attribute-bit}@anchor{178}
diff --git a/gcc/ada/gnat_ugn.texi b/gcc/ada/gnat_ugn.texi
index 997086c67bd..7bad8b4e161 100644
--- a/gcc/ada/gnat_ugn.texi
+++ b/gcc/ada/gnat_ugn.texi
@@ -19,7 +19,7 @@
 
 @copying
 @quotation
-GNAT User's Guide for Native Platforms , Apr 15, 2024
+GNAT User's Guide for Native Platforms , Apr 16, 2024
 
 AdaCore
 
@@ -29580,8 +29580,8 @@ to permit their use in free software.
 
 @printindex ge
 
-@anchor{gnat_ugn/gnat_utility_programs switches-related-to-project-files}@w{   
   }
 @anchor{d1}@w{  }
+@anchor{gnat_ugn/gnat_utility_programs switches-related-to-project-files}@w{   
   }
 
 @c %**end of body
 @bye
-- 
2.43.2



[COMMITTED 27/35] ada: Bug in computing local restrictions inherited from enclosing scopes.

2024-05-17 Thread Marc Poulhiès
From: Steve Baird 

In the function Local_Restrict.Active_Restriction, we traverse enclosing
scopes looking for a relevant Local_Restrictions aspect specification.
Fix a bug in this traversal.

gcc/ada/

* local_restrict.adb (Active_Restriction): When traversing scopes,
do not skip over a subprogram body.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/local_restrict.adb | 32 +++-
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/gcc/ada/local_restrict.adb b/gcc/ada/local_restrict.adb
index 6e91c8a2e2a..3be94049928 100644
--- a/gcc/ada/local_restrict.adb
+++ b/gcc/ada/local_restrict.adb
@@ -90,22 +90,28 @@ package body Local_Restrict is
 return Result;
  end if;
 
- Scop := Enclosing_Declaration (Scop);
- if Present (Scop) then
-Scop := Parent (Scop);
+ declare
+Saved_Scope : constant Node_Id := Scop;
+ begin
+Scop := Enclosing_Declaration (Scop);
 if Present (Scop) then
-   --  For a subprogram associated with a type, we don't care
-   --  where the type was frozen; continue from the type.
-
-   if Nkind (Scop) = N_Freeze_Entity then
-  Scop := Scope (Entity (Scop));
-   elsif Nkind (Parent (Scop)) = N_Freeze_Entity then
-  Scop := Scope (Entity (Parent (Scop)));
-   else
-  Scop := Find_Enclosing_Scope (Scop);
+   Scop := Parent (Scop);
+   if Present (Scop) then
+  --  For a subprogram associated with a type, we don't care
+  --  where the type was frozen; continue from the type.
+
+  if Nkind (Scop) = N_Freeze_Entity then
+ Scop := Scope (Entity (Scop));
+  elsif Nkind (Parent (Scop)) = N_Freeze_Entity then
+ Scop := Scope (Entity (Parent (Scop)));
+  elsif Present (Scope (Saved_Scope)) then
+ Scop := Scope (Saved_Scope);
+  else
+ Scop := Find_Enclosing_Scope (Scop);
+  end if;
end if;
 end if;
- end if;
+ end;
   end loop;
 
   return Empty;
-- 
2.43.2



[COMMITTED 07/35] ada: Tune detection of unconstrained and tagged items in Depends contract

2024-05-17 Thread Marc Poulhiès
From: Piotr Trojanek 

The Tagged/Array/Record/Private types are mutually exclusive, so they
can be examined as in a case statement (except for records with
private extensions, but their handling is not affected by this change).
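
As a rough illustration of why the rewrite preserves behavior, the sketch
below models the type kinds with a single hypothetical enumeration (so
they are mutually exclusive by construction) and compares the old and new
shapes of the test over every combination of inputs. Kind, Typ_Info,
Old_Form and New_Form are made-up names, not compiler entities, and the
private-extension case mentioned above is not modeled.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Kinds_Demo is

   --  Hypothetical stand-in for the compiler's type-kind predicates.
   type Kind is (Tagged_K, Array_K, Record_K, Private_K, Other_K);

   type Typ_Info is record
      K                 : Kind;
      Constrained       : Boolean;
      Has_Discriminants : Boolean;
   end record;

   --  Shape of the test before the patch.
   function Old_Form (T : Typ_Info) return Boolean is
   begin
      if T.K = Tagged_K then
         return True;
      elsif T.K = Array_K and then not T.Constrained then
         return True;
      elsif T.K = Record_K then
         return T.Has_Discriminants and then not T.Constrained;
      elsif T.K = Private_K and then T.Has_Discriminants then
         return True;
      else
         return False;
      end if;
   end Old_Form;

   --  Shape of the test after the patch: each kind is tested once.
   function New_Form (T : Typ_Info) return Boolean is
   begin
      case T.K is
         when Tagged_K  => return True;
         when Array_K   => return not T.Constrained;
         when Record_K  =>
            return T.Has_Discriminants and then not T.Constrained;
         when Private_K => return T.Has_Discriminants;
         when Other_K   => return False;
      end case;
   end New_Form;

begin
   --  Exhaustively compare both forms; raise if they ever disagree.
   for K in Kind loop
      for C in Boolean loop
         for D in Boolean loop
            if Old_Form (Typ_Info'(K, C, D))
              /= New_Form (Typ_Info'(K, C, D))
            then
               raise Program_Error with "forms disagree";
            end if;
         end loop;
      end loop;
   end loop;
   Put_Line ("old and new forms agree");
end Kinds_Demo;
```

The equivalence relies on no later branch being able to match once an
earlier kind test has succeeded, which is what mutual exclusivity gives.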

gcc/ada/

* sem_prag.adb (Is_Unconstrained_Or_Tagged_Item): Tune repeated
testing of type kinds.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_prag.adb | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/ada/sem_prag.adb b/gcc/ada/sem_prag.adb
index 2fc46ab0cd2..9dc22e3edc1 100644
--- a/gcc/ada/sem_prag.adb
+++ b/gcc/ada/sem_prag.adb
@@ -32970,14 +32970,14 @@ package body Sem_Prag is
   if Is_Tagged_Type (Typ) then
  return True;
 
-  elsif Is_Array_Type (Typ) and then not Is_Constrained (Typ) then
- return True;
+  elsif Is_Array_Type (Typ) then
+ return not Is_Constrained (Typ);
 
   elsif Is_Record_Type (Typ) then
  return Has_Discriminants (Typ) and then not Is_Constrained (Typ);
 
-  elsif Is_Private_Type (Typ) and then Has_Discriminants (Typ) then
- return True;
+  elsif Is_Private_Type (Typ) then
+ return Has_Discriminants (Typ);
 
   else
  return False;
-- 
2.43.2



[COMMITTED 05/35] ada: Check subtype to avoid a precondition failure

2024-05-17 Thread Marc Poulhiès
From: Viljar Indus 

gcc/ada/

* sem_ch3.adb (Analyze_Component_Declaration):
Apply range checks only for Scalar_Types to
ensure that they have the Scalar_Range attribute.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch3.adb | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb
index 7ee4ca299d9..263be607ec1 100644
--- a/gcc/ada/sem_ch3.adb
+++ b/gcc/ada/sem_ch3.adb
@@ -2029,8 +2029,9 @@ package body Sem_Ch3 is
 
   while Present (Target_Index) loop
  if Nkind (Subt_Index) in N_Expanded_Name | N_Identifier
- and then Nkind
-(Scalar_Range (Entity (Subt_Index))) = N_Range
+   and then Is_Scalar_Type (Entity (Subt_Index))
+   and then
+ Nkind (Scalar_Range (Entity (Subt_Index))) = N_Range
  then
 Apply_Range_Check
(Expr=> Scalar_Range (Entity (Subt_Index)),
-- 
2.43.2


