Re: [PATCH v2] add explicit ABI and align options to pr88233.c

2024-05-30 Thread Kewen.Lin
on 2024/5/29 14:32, Alexandre Oliva wrote:
> On May 26, 2024, "Kewen.Lin"  wrote:
> 
>> Hi,
>> on 2024/4/22 17:38, Alexandre Oliva wrote:
>>> Ping?
>>> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566530.html
>>> (modified version follows)
> 
>> Segher originated this test case, so I was expecting he could chime in on this. :)
> 
> Me too ;-)
> 
>>> We've observed failures of this test on powerpc configurations that
>>> default to different calling conventions and alignment requirements.
> 
>> It seems that it was using the original "BE" and "LE" guards to shadow
>> ABIs; could you share more about how you found this failure?  Does your
>> test environment have -mstrict-align turned on by default, and also an
>> ABI which passes small struct return values in registers?
> 
> Exactly, AdaCore's ppc64-vx7r2 are configured so as to enable
> -mstrict-align and -freg-struct-return by default.

OK, thanks for the information!

> 
> But since these settings may change depending on the target variant, I
> figured it would be useful to record what the assumptions are that the
> test makes.  That one of these settings changed depending on endianness
> and affected codegen was, to me, further evidence that this would be
> useful, so, with the explicit settings, I could restore the original
> test's expectations.

Got it, but it also means we would probably be testing it without the default
ABI of the test environment, and someone may argue such testing is of less
value.  Revisiting the original PR, maybe we can drop the scan for the load
insns and just keep the scan-not for mtvsr; then the test is no longer
sensitive to the alignment or to the way struct results are passed.  Looking
forward to Segher's opinion on this patch. :)

BR,
Kewen



Re: [PATCH 01/11] OpenMP/PolyInt: Pass poly-int structures by address to OMP libs.

2024-05-30 Thread Tejas Belagod

On 5/30/24 6:28 PM, Richard Sandiford wrote:

Tejas Belagod  writes:

Currently poly-int type structures are passed by value to OpenMP runtime
functions for shared clauses etc.  This patch improves on this by passing
around poly-int structures by address to avoid copy-overhead.

gcc/ChangeLog
* omp-low.cc (use_pointer_for_field): Use pointer if the OMP data
structure's field type is a poly-int.
---
  gcc/omp-low.cc | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 1a65229cc37..b15607f4ef5 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -466,7 +466,8 @@ static bool
  use_pointer_for_field (tree decl, omp_context *shared_ctx)
  {
if (AGGREGATE_TYPE_P (TREE_TYPE (decl))
-  || TYPE_ATOMIC (TREE_TYPE (decl)))
+  || TYPE_ATOMIC (TREE_TYPE (decl))
+  || POLY_INT_CST_P (DECL_SIZE (decl)))
  return true;
  
/* We can only use copy-in/copy-out semantics for shared variables




Thanks for the reviews.


Realise this is also true of my original patch, but:

I suppose a question here is whether this function is only ever used for
local interfaces between code generated by the same source code function,
or whether it's ABI in a more general sense.  


I'm not 100% sure, but AFAICS 'use_pointer_for_field' seems to be used 
only for the local interface between the source and the generated functions. 
I don't see this function hooking into any backend, or any backend hooking 
into it, for general ABI purposes.  Of course, I'm not an expert on OMP 
lowering, so it would be great to get an expert opinion on this.



If the latter, I suppose
we should make sure to handle ACLE types the same way regardless of
whether the SVE vector size is known.



When you say the same way, do you mean the way the SVE ABI defines the rules 
for SVE types?


Thanks,
Tejas.


(At the moment, the vector size is fixed for a TU, not just a function,
but we should probably plan for relaxing that in future.)

Thanks,
Richard




Re: [PATCH 2/2] xtensa: Use epilogue_completed rather than cfun->machine->epilogue_done

2024-05-30 Thread Max Filippov
On Thu, May 30, 2024 at 6:33 AM Takayuki 'January June' Suwa
 wrote:
>
> In commit ad89d820bf, an "epilogue_done" member was added to the
> machine_function structure, but it is sufficient to use the existing
> "epilogue_completed" global variable.
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa-protos.h
> (xtensa_use_return_instruction_p): Remove.
> * config/xtensa/xtensa.cc
> (machine_function): Remove "epilogue_done" field.
> (xtensa_expand_epilogue): Remove "cfun->machine->epilogue_done" usage.
> (xtensa_use_return_instruction_p): Remove.
> * config/xtensa/xtensa.md ("return"):
> Replace calling "xtensa_use_return_instruction_p()" with inline code.
> ---
>   gcc/config/xtensa/xtensa-protos.h |  1 -
>   gcc/config/xtensa/xtensa.cc   | 14 --
>   gcc/config/xtensa/xtensa.md   |  5 -
>   3 files changed, 4 insertions(+), 16 deletions(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


Re: [PATCH 1/2] xtensa: Use REG_P(), MEM_P(), etc. instead of comparing GET_CODE()

2024-05-30 Thread Max Filippov
On Thu, May 30, 2024 at 6:33 AM Takayuki 'January June' Suwa
 wrote:
>
> Instead of comparing GET_CODE () results directly, this patch replaces as
> many comparisons as possible with the RTX code predicate macros such as
> REG_P(), SUBREG_P(), MEM_P(), etc.
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa.cc (xtensa_valid_move, constantpool_address_p,
> xtensa_tls_symbol_p, gen_int_relational, xtensa_emit_move_sequence,
> xtensa_copy_incoming_a7, xtensa_expand_block_move,
> xtensa_expand_nonlocal_goto, xtensa_emit_call,
> xtensa_legitimate_address_p, xtensa_legitimize_address,
> xtensa_tls_referenced_p, print_operand, print_operand_address,
> xtensa_output_literal):
> Replace RTX code comparisons with their predicate macros such as
> REG_P().
> * config/xtensa/xtensa.h (CONSTANT_ADDRESS_P,
> LEGITIMATE_PIC_OPERAND_P): Ditto.
> * config/xtensa/xtensa.md (reload_literal, indirect_jump):
> Ditto.
> ---
>   gcc/config/xtensa/xtensa.cc | 90 ++---
>   gcc/config/xtensa/xtensa.h  | 10 ++---
>   gcc/config/xtensa/xtensa.md |  4 +-
>   3 files changed, 51 insertions(+), 53 deletions(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.
For some reason neither git am nor patch -p1 could apply this patch,
so I did that manually.

-- 
Thanks.
-- Max


[PATCH v6] Match: Support more form for scalar unsigned SAT_ADD

2024-05-30 Thread pan2 . li
From: Pan Li 

Update in v6
* Fix more doc build error.

Update in v5
* Fix some doc build error.

Log in v4:
After we supported one gassign form of the unsigned .SAT_ADD,  we
would like to support more forms, including both branch and
branchless variants.  There are 5 other forms of .SAT_ADD,  listed below:

Form 1:
  #define SAT_ADD_U_1(T) \
  T sat_add_u_1_##T(T x, T y) \
  { \
return (T)(x + y) >= x ? (x + y) : -1; \
  }

Form 2:
  #define SAT_ADD_U_2(T) \
  T sat_add_u_2_##T(T x, T y) \
  { \
T ret; \
T overflow = __builtin_add_overflow (x, y, &ret); \
return (T)(-overflow) | ret; \
  }

Form 3:
  #define SAT_ADD_U_3(T) \
  T sat_add_u_3_##T (T x, T y) \
  { \
T ret; \
return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
  }

Form 4:
  #define SAT_ADD_U_4(T) \
  T sat_add_u_4_##T (T x, T y) \
  { \
T ret; \
return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
  }

Form 5:
  #define SAT_ADD_U_5(T) \
  T sat_add_u_5_##T(T x, T y) \
  { \
return (T)(x + y) < x ? -1 : (x + y); \
  }

Take form 3 above as an example:

uint64_t
sat_add (uint64_t x, uint64_t y)
{
  uint64_t ret;
  return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
}

Before this patch:
uint64_t sat_add (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  long unsigned int _2;
  uint64_t _3;
  __complex__ long unsigned int _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
  _2 = IMAGPART_EXPR <_6>;
  if (_2 != 0)
goto ; [35.00%]
  else
goto ; [65.00%]
;;succ:   4
;;3

;;   basic block 3, loop depth 0
;;pred:   2
  _1 = REALPART_EXPR <_6>;
;;succ:   4

;;   basic block 4, loop depth 0
;;pred:   3
;;2
  # _3 = PHI <_1(3), 18446744073709551615(2)>
  return _3;
;;succ:   EXIT
}

After this patch:
uint64_t sat_add (uint64_t x, uint64_t y)
{
  long unsigned int _12;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
  return _12;
;;succ:   EXIT
}

The flag '^' acting on a cond_expr will generate matching code similar to
the below:

else if (gphi *_a1 = dyn_cast  (_d1))
  {
basic_block _b1 = gimple_bb (_a1);
if (gimple_phi_num_args (_a1) == 2)
  {
basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
basic_block _db_1 = safe_dyn_cast  (*gsi_last_bb (_pb_0_1)) ? 
_pb_0_1 : ...
basic_block _other_db_1 = safe_dyn_cast  (*gsi_last_bb 
(_pb_0_1)) ? _pb_1_1 : ...
gcond *_ct_1 = safe_dyn_cast  (*gsi_last_bb (_db_1));
if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
  && EDGE_COUNT (_other_db_1->succs) == 1
  && EDGE_PRED (_other_db_1, 0)->src == _db_1)
  {
tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node, 
_cond_lhs_1, ...);
bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags  & 
EDGE_TRUE_VALUE;
tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);
switch (TREE_CODE (_p0))
  ...

The below test suites are still running; I will update the status later.
* The x86 bootstrap test.
* The x86 fully regression test.
* The riscv fully regression test.

gcc/ChangeLog:

* doc/match-and-simplify.texi: Add doc for the matching flag '^'.
* genmatch.cc (enum expr_flag): Add new enum for expr flag.
(dt_node::gen_kids_1): Add cond_expr and flag handling.
(dt_operand::gen_phi_on_cond): Add new func to gen phi matching
on cond_expr.
(parser::parse_expr): Add handling for the expr flag '^'.
* match.pd: Add more form for unsigned .SAT_ADD.
* tree-ssa-math-opts.cc (match_saturation_arith): Rename from.
(match_assign_saturation_arith): Rename to.
(match_phi_saturation_arith): Add new func impl to match the
.SAT_ADD when phi.
(math_opts_dom_walker::after_dom_children): Add phi matching
try for all gimple phi stmt.

Signed-off-by: Pan Li 
---
 gcc/doc/match-and-simplify.texi |  16 
 gcc/genmatch.cc | 126 +++-
 gcc/match.pd|  43 ++-
 gcc/tree-ssa-math-opts.cc   |  51 -
 4 files changed, 231 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/match-and-simplify.texi b/gcc/doc/match-and-simplify.texi
index 01f19e2f62c..63d5af159f5 100644
--- a/gcc/doc/match-and-simplify.texi
+++ b/gcc/doc/match-and-simplify.texi
@@ -361,6 +361,22 @@ Usually the types of the generated result expressions are
 determined from the context, but sometimes like in the above case
 it is required that you specify them explicitly.
 
+Another modifier for generated expressions is @code{^} which
+tell

Re: [PATCH 3/3] [APX CCMP] Support ccmp for float compare

2024-05-30 Thread Hongtao Liu
On Wed, May 15, 2024 at 4:21 PM Hongyu Wang  wrote:
>
> The ccmp insn itself doesn't support fp compare, but x86 has the fp comi
> insn that changes EFLAGS, which can be the scc input to ccmp.  Allow
> scalar fp compare in ix86_gen_ccmp_first, except for ORDERED/UNORDERED
> compares which cannot be identified in ccmp.
Ok if the second patch (middle-end part) is approved.
>
> gcc/ChangeLog:
>
> * config/i386/i386-expand.cc (ix86_gen_ccmp_first): Add fp
> compare and check the allowed fp compare type.
> (ix86_gen_ccmp_next): Adjust compare_code input to ccmp for
> fp compare.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/apx-ccmp-1.c: Add test for fp compare.
> * gcc.target/i386/apx-ccmp-2.c: Likewise.
> ---
>  gcc/config/i386/i386-expand.cc | 53 --
>  gcc/testsuite/gcc.target/i386/apx-ccmp-1.c | 45 +-
>  gcc/testsuite/gcc.target/i386/apx-ccmp-2.c | 47 +++
>  3 files changed, 138 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index f00525e449f..7507034dc91 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -25571,18 +25571,58 @@ ix86_gen_ccmp_first (rtx_insn **prep_seq, rtx_insn 
> **gen_seq,
>if (op_mode == VOIDmode)
>  op_mode = GET_MODE (op1);
>
> +  /* We only support the following scalar comparisons that use just 1
> + instruction: DI/SI/QI/HI/DF/SF/HF.
> + Unordered/Ordered compares cannot be correctly identified by
> + ccmp so they are not supported.  */
>if (!(op_mode == DImode || op_mode == SImode || op_mode == HImode
> -   || op_mode == QImode))
> +   || op_mode == QImode || op_mode == DFmode || op_mode == SFmode
> +   || op_mode == HFmode)
> +  || code == ORDERED
> +  || code == UNORDERED)
>  {
>end_sequence ();
>return NULL_RTX;
>  }
>
>/* Canonicalize the operands according to mode.  */
> -  if (!nonimmediate_operand (op0, op_mode))
> -op0 = force_reg (op_mode, op0);
> -  if (!x86_64_general_operand (op1, op_mode))
> -op1 = force_reg (op_mode, op1);
> +  if (SCALAR_INT_MODE_P (op_mode))
> +{
> +  if (!nonimmediate_operand (op0, op_mode))
> +   op0 = force_reg (op_mode, op0);
> +  if (!x86_64_general_operand (op1, op_mode))
> +   op1 = force_reg (op_mode, op1);
> +}
> +  else
> +{
> +  /* op0/op1 can be canonicalized from expand_fp_compare, so
> +just adjust the code to make it generate supported fp
> +condition.  */
> +  if (ix86_fp_compare_code_to_integer (code) == UNKNOWN)
> +   {
> + /* First try to split condition if we don't need to honor
> +NaNs, as the ORDERED/UNORDERED check always fall
> +through.  */
> + if (!HONOR_NANS (op_mode))
> +   {
> + rtx_code first_code;
> + split_comparison (code, op_mode, &first_code, &code);
> +   }
> + /* Otherwise try to swap the operand order and check if
> +the comparison is supported.  */
> + else
> +   {
> + code = swap_condition (code);
> + std::swap (op0, op1);
> +   }
> +
> + if (ix86_fp_compare_code_to_integer (code) == UNKNOWN)
> +   {
> + end_sequence ();
> + return NULL_RTX;
> +   }
> +   }
> +}
>
>*prep_seq = get_insns ();
>end_sequence ();
> @@ -25647,6 +25687,9 @@ ix86_gen_ccmp_next (rtx_insn **prep_seq, rtx_insn 
> **gen_seq, rtx prev,
>dfv = ix86_get_flags_cc ((rtx_code) cmp_code);
>
>prev_code = GET_CODE (prev);
> +  /* Fixup FP compare code here.  */
> +  if (GET_MODE (XEXP (prev, 0)) == CCFPmode)
> +prev_code = ix86_fp_compare_code_to_integer (prev_code);
>
>if (bit_code != AND)
>  prev_code = reverse_condition (prev_code);
> diff --git a/gcc/testsuite/gcc.target/i386/apx-ccmp-1.c 
> b/gcc/testsuite/gcc.target/i386/apx-ccmp-1.c
> index 5a2dad89f1f..e4e112f07e0 100644
> --- a/gcc/testsuite/gcc.target/i386/apx-ccmp-1.c
> +++ b/gcc/testsuite/gcc.target/i386/apx-ccmp-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target { ! ia32 } } } */
> -/* { dg-options "-O2 -mapx-features=ccmp" } */
> +/* { dg-options "-O2 -ffast-math -mapx-features=ccmp" } */
>
>  int
>  f1 (int a)
> @@ -56,8 +56,49 @@ f9 (int a, int b)
>return a == 3 || a == 0;
>  }
>
> +int
> +f10 (float a, int b, float c)
> +{
> +  return a > c || b < 19;
> +}
> +
> +int
> +f11 (float a, int b)
> +{
> +  return a == 0.0 && b > 21;
> +}
> +
> +int
> +f12 (double a, int b)
> +{
> +  return a < 3.0 && b != 23;
> +}
> +
> +int
> +f13 (double a, double b, int c, int d)
> +{
> +  a += b;
> +  c += d;
> +  return a != b || c == d;
> +}
> +
> +int
> +f14 (double a, int b)
> +{
> +  return b != 0 && a < 1.5;
> +}
> +
> +int
> +f15 (double a, double b, int c, int d)
> +{
> +  return c != d || a <= b;
> +}
> +
>  /* { dg-final {

Re: [PATCH 1/3] [APX CCMP] Support APX CCMP

2024-05-30 Thread Hongtao Liu
On Wed, May 15, 2024 at 4:24 PM Hongyu Wang  wrote:
>
> The APX CCMP feature implements a conditional compare which executes the
> compare when EFLAGS matches a certain condition.
>
> CCMP introduces a default flags value (dfv): when the conditional compare does
> not execute, it will directly set the flags according to dfv.
>
> The instruction goes like
>
> ccmpeq {dfv=sf,of,cf,zf}  %rax, %r16
>
> For this instruction, it tests whether EFLAGS matches the condition
> code EQ; if yes, it compares %rax and %r16 like a legacy cmp.  If no, the
> EFLAGS will be updated according to dfv, which means SF, OF, CF and ZF are
> set.  PF will be set according to CF in dfv, and AF will always be
> cleared.
>
> The dfv part can be a combination of sf,of,cf,zf, like {dfv=cf,zf} which
> sets CF and ZF only and clear others, or {dfv=} which clears all EFLAGS.
>
> To enable CCMP, we implemented the target hook TARGET_GEN_CCMP_FIRST and
> TARGET_GEN_CCMP_NEXT to reuse the current ccmp infrastructure. Also we
> extended the cstorem4 optab to support storing different CCmodes to fit the
> current ccmp infrastructure.
Ok.
>
> gcc/ChangeLog:
>
> * config/i386/i386-expand.cc (ix86_gen_ccmp_first): New function
> that test if the first compare can be generated.
> (ix86_gen_ccmp_next): New function to emit a single compare and ccmp
> sequence.
> * config/i386/i386-opts.h (enum apx_features): Add apx_ccmp.
> * config/i386/i386-protos.h (ix86_gen_ccmp_first): New proto
> declare.
> (ix86_gen_ccmp_next): Likewise.
> (ix86_get_flags_cc): Likewise.
> * config/i386/i386.cc (ix86_flags_cc): New enum.
> (ix86_ccmp_dfv_mapping): New string array to map conditional
> code to dfv.
> (ix86_print_operand): Handle special dfv flag for CCMP.
> (ix86_get_flags_cc): New function to return x86 CC enum.
> (TARGET_GEN_CCMP_FIRST): Define.
> (TARGET_GEN_CCMP_NEXT): Likewise.
> * config/i386/i386.h (TARGET_APX_CCMP): Define.
> * config/i386/i386.md (@ccmp): New define_insn to support
> ccmp.
> (UNSPEC_APX_DFV): New unspec for ccmp dfv.
> (ALL_CC): New mode iterator.
> (cstorecc4): Change to ...
> (cstore4) ... this, use ALL_CC to loop through all
> available CCmodes.
> * config/i386/i386.opt (apx_ccmp): Add enum value for ccmp.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/apx-ccmp-1.c: New compile test.
> * gcc.target/i386/apx-ccmp-2.c: New runtime test.
> ---
>  gcc/config/i386/i386-expand.cc | 121 +
>  gcc/config/i386/i386-opts.h|   6 +-
>  gcc/config/i386/i386-protos.h  |   5 +
>  gcc/config/i386/i386.cc|  50 +
>  gcc/config/i386/i386.h |   1 +
>  gcc/config/i386/i386.md|  35 +-
>  gcc/config/i386/i386.opt   |   3 +
>  gcc/testsuite/gcc.target/i386/apx-ccmp-1.c |  63 +++
>  gcc/testsuite/gcc.target/i386/apx-ccmp-2.c |  57 ++
>  9 files changed, 337 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-ccmp-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-ccmp-2.c
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index 1ab22fe7973..f00525e449f 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -25554,4 +25554,125 @@ ix86_expand_fast_convert_bf_to_sf (rtx val)
>return ret;
>  }
>
> +rtx
> +ix86_gen_ccmp_first (rtx_insn **prep_seq, rtx_insn **gen_seq,
> +   rtx_code code, tree treeop0, tree treeop1)
> +{
> +  if (!TARGET_APX_CCMP)
> +return NULL_RTX;
> +
> +  rtx op0, op1, res;
> +  machine_mode op_mode;
> +
> +  start_sequence ();
> +  expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, EXPAND_NORMAL);
> +
> +  op_mode = GET_MODE (op0);
> +  if (op_mode == VOIDmode)
> +op_mode = GET_MODE (op1);
> +
> +  if (!(op_mode == DImode || op_mode == SImode || op_mode == HImode
> +   || op_mode == QImode))
> +{
> +  end_sequence ();
> +  return NULL_RTX;
> +}
> +
> +  /* Canonicalize the operands according to mode.  */
> +  if (!nonimmediate_operand (op0, op_mode))
> +op0 = force_reg (op_mode, op0);
> +  if (!x86_64_general_operand (op1, op_mode))
> +op1 = force_reg (op_mode, op1);
> +
> +  *prep_seq = get_insns ();
> +  end_sequence ();
> +
> +  start_sequence ();
> +
> +  res = ix86_expand_compare (code, op0, op1);
> +
> +  if (!res)
> +{
> +  end_sequence ();
> +  return NULL_RTX;
> +}
> +  *gen_seq = get_insns ();
> +  end_sequence ();
> +
> +  return res;
> +}
> +
> +rtx
> +ix86_gen_ccmp_next (rtx_insn **prep_seq, rtx_insn **gen_seq, rtx prev,
> +  rtx_code cmp_code, tree treeop0, tree treeop1,
> +  rtx_code bit_code)
> +{
> +  if (!TARGET_APX_CCMP)
> +return NULL_RTX;
> +
>

Re: [PATCH] i386: Optimize EQ/NE comparison between avx512 kmask and -1.

2024-05-30 Thread Hongtao Liu
On Tue, May 28, 2024 at 4:00 PM Hu, Lin1  wrote:
>
> Hi all,
>
> This patch aims to achieve EQ/NE comparison between an avx512 kmask and -1
> by using kortest and checking CF.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,-m64}. Ok for trunk?
Ok.
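
A minimal sketch of the kind of source this targets (a hypothetical
illustration, not taken from the included testcases):

#include <immintrin.h>

int
all_equal (__m512i a, __m512i b)
{
  /* EQ/NE of a kmask against -1 can now be done via kortest plus a
     flags-based setcc/jcc instead of kmov + cmp.  */
  __mmask16 k = _mm512_cmpeq_epi32_mask (a, b);
  return k == (__mmask16) -1;
}
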
>
> BRs,
> Lin
>
> gcc/ChangeLog:
>
> PR target/113609
> * config/i386/sse.md
> (*kortest_cmp_setcc): New define_insn_and_split.
> (*kortest_cmp_jcc): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> PR target/113609
> * gcc.target/i386/pr113609-1.c: New test.
> * gcc.target/i386/pr113609-2.c: Ditto.
> ---
>  gcc/config/i386/sse.md |  67 +++
>  gcc/testsuite/gcc.target/i386/pr113609-1.c | 194 +
>  gcc/testsuite/gcc.target/i386/pr113609-2.c | 161 +
>  3 files changed, 422 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr113609-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr113609-2.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index b59c988fc31..34fd2e4afac 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -2201,6 +2201,73 @@ (define_expand "kortest"
>   UNSPEC_KORTEST))]
>"TARGET_AVX512F")
>
> +;; Optimize cmp + setcc with mask register by kortest + setcc.
> +(define_insn_and_split "*kortest_cmp_setcc"
> +   [(set (match_operand:QI 0 "nonimmediate_operand" "=qm, qm")
> +(match_operator:QI 1 "bt_comparison_operator"
> +   [(match_operand:SWI1248_AVX512BWDQ_64 2 "register_operand" "?k, 
> ")
> +(const_int -1)]))
> +  (clobber (reg:CC FLAGS_REG))]
> +  "TARGET_AVX512BW"
> +  "#"
> +  "&& reload_completed"
> +  [(const_int 0)]
> +{
> +  if (MASK_REGNO_P (REGNO (operands[2])))
> +{
> +  emit_insn (gen_kortest_ccc (operands[2], operands[2]));
> +  operands[4] = gen_rtx_REG (CCCmode, FLAGS_REG);
> +}
> +  else
> +{
> +  operands[4] = gen_rtx_REG (CCZmode, FLAGS_REG);
> +  emit_insn (gen_rtx_SET (operands[4],
> + gen_rtx_COMPARE (CCZmode,
> +  operands[2],
> +  constm1_rtx)));
> +}
> +  ix86_expand_setcc (operands[0],
> +GET_CODE (operands[1]),
> +operands[4],
> +const0_rtx);
> +  DONE;
> +})
> +
> +;; Optimize cmp + jcc with mask register by kortest + jcc.
> +(define_insn_and_split "*kortest_cmp_jcc"
> +   [(set (pc)
> +  (if_then_else
> +   (match_operator 0 "bt_comparison_operator"
> + [(match_operand:SWI1248_AVX512BWDQ_64 1 "register_operand" "?k, 
> ")
> +  (const_int -1)])
> + (label_ref (match_operand 2))
> +  (pc)))
> +  (clobber (reg:CC FLAGS_REG))]
> +  "TARGET_AVX512BW"
> +  "#"
> +  "&& reload_completed"
> +  [(const_int 0)]
> +{
> +  if (MASK_REGNO_P (REGNO (operands[1])))
> +{
> +  emit_insn (gen_kortest_ccc (operands[1], operands[1]));
> +  operands[4] = gen_rtx_REG (CCCmode, FLAGS_REG);
> +}
> +  else
> +{
> +  operands[4] = gen_rtx_REG (CCZmode, FLAGS_REG);
> +  emit_insn (gen_rtx_SET (operands[4],
> + gen_rtx_COMPARE (CCZmode,
> +  operands[1],
> +  constm1_rtx)));
> +}
> +  ix86_expand_branch (GET_CODE (operands[0]),
> + operands[4],
> + const0_rtx,
> + operands[2]);
> +  DONE;
> +})
> +
>  (define_insn "kunpckhi"
>[(set (match_operand:HI 0 "register_operand" "=k")
> (ior:HI
> diff --git a/gcc/testsuite/gcc.target/i386/pr113609-1.c 
> b/gcc/testsuite/gcc.target/i386/pr113609-1.c
> new file mode 100644
> index 000..f0639b8500a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr113609-1.c
> @@ -0,0 +1,194 @@
> +/* PR target/113609 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=x86-64-v4" } */
> +/* { dg-final { scan-assembler-not "^cmp" } } */
> +/* { dg-final { scan-assembler-not "\[ \\t\]+sete" { target { ! ia32 } } } } 
> */
> +/* { dg-final { scan-assembler-not "\[ \\t\]+setne" { target { ! ia32 } } } 
> } */
> +/* { dg-final { scan-assembler-not "\[ \\t\]+je" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "\[ \\t\]+jne" { target { ! ia32 } } } } 
> */
> +/* { dg-final { scan-assembler-times "\[ \\t\]+sete" 1 { target { ia32 } } } 
> } */
> +/* { dg-final { scan-assembler-times "\[ \\t\]+setne" 1 { target { ia32 } } 
> } } */
> +/* { dg-final { scan-assembler-times "\[ \\t\]+je" 1 { target { ia32 } } } } 
> */
> +/* { dg-final { scan-assembler-times "\[ \\t\]+jne" 2 { target { ia32 } } } 
> } */
> +/* { dg-final { scan-assembler-times "kortest" 12 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "kortest" 17 { target { ! ia32 } } } } 
> */
> +
> +#include 
> +
> +unsigned int
> +cmp_vector_sete_mask8(

Re: Reverted recent patches to resource.cc

2024-05-30 Thread Jeff Law




On 5/30/24 8:09 PM, Hans-Peter Nilsson wrote:

Date: Wed, 29 May 2024 21:23:58 -0600
Cc: gcc-patches@gcc.gnu.org



I don't bother with qemu.exp at all.  I've set up binfmt handlers so
that I can execute foreign binaries.

So given a root filesystem, I can chroot into it and do whatever I need.
   As far as dejagnu is concerned it looks like the native system.


Interesting.  In a Docker setup (or similar container)?
I've got docker containers for some of these: m68k, alpha, s390 for 
example.  They're based off Debian base images.  I don't have anything 
for sparc and I don't see a base image to build from.  If there were a 
base image, then a Dockerfile like this would get you started:



FROM sparc64/debian:latest

# Basic environment variables so we can apt-get without interactive prompts
ENV LC_ALL C
ENV DEBIAN_FRONTEND noninteractive
ENV TZ=Etc/UTC

RUN apt-get update && apt-get -y install gcc dejagnu binutils 
default-jre-headless git build-essential autoconf bison flex gawk make 
texinfo help2man libncurses5-dev python3-dev python3-distutils libtool
libtool-bin unzip wget curl rsync texinfo g++ libmpc-dev libgmp-dev 
libmpfr-dev libgmp-dev python3 libisl-dev rsync vim automake autoconf 
autotools-dev unzip help2man libtool libtool-bin sudo curl wget 
python3-dev bzip2 xz-utils gdb bc libssl-dev libelf-dev


With the lack of an existing docker image, you can probably use 
debootstrap to construct an initial chroot, then import that into docker, 
adding whatever bits you need to do the build (i.e., compilers, dejagnu, 
make, texinfo, blah blah blah) via apt until it works.  It's been a 
while since I've done it, but I'm pretty sure that's how I got things 
going on the sh4 and sh4eb platforms.



The JRE bits are only needed because these get launched as a docker 
swarm service and thus need to connect to the Jenkins server using JNLP. 
 Some of the packages are only needed for kernel builds or glibc builds.



Jeff




[to-be-committed] [RISC-V] Use Zbkb for general 64 bit constants when profitable

2024-05-30 Thread Jeff Law
Basically this adds the ability to generate two independent constants 
during  synthesis, then bring them together with a pack instruction. 
Thus we never need to go out to the constant pool when zbkb is enabled. 
The worst sequence we ever generate is


lui+addi+lui+addi+pack
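
For instance (constant chosen here purely for illustration, not taken
from the patch), a value whose 32-bit halves each need a lui+addi pair:

unsigned long foo_worst_case(void) {
  return 0x7654321089abcdefUL;
}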

Obviously if either half can be synthesized with just a lui or just an 
addi, then we'll DTRT automagically.   So for example:


unsigned long foo_0x1425000028000000(void) {
return 0x1425000028000000;
}

The high and low halves are just a lui.  So the final synthesis is:



li  a5,671088640# 15[c=4 l=4]  *movdi_64bit/1
li  a0,337969152# 16[c=4 l=4]  *movdi_64bit/1
packa0,a5,a0# 17[c=12 l=4]  riscv_xpack_di_si_2


On the implementation side, I think the bits I've put in here likely can 
be used to handle the repeating constant case for !zbkb.  I think it 
likely could be used to help capture cases where the upper half can be 
derived from the lower half (say by turning a bit on or off, shifting or 
something similar).  The key in both of these cases is we need a 
temporary register holding an intermediate value.


Ventana's internal tester enables zbkb, but I don't think any of the 
other testers currently exercise zbkb.  We'll probably want to change 
that at some point, but I don't think it's super-critical yet.


While I can envision a few more cases where we could improve constant 
synthesis, I have no immediate plans to work in this space; but if someone is 
interested, some thoughts are recorded here:




https://wiki.riseproject.dev/display/HOME/CT_00_031+--+Additional+Constant+Synthesis+Improvements




Jeff

gcc/
* config/riscv/riscv.cc (riscv_integer_op): Add new field.
(riscv_build_integer_1): Initialize the new field.
(riscv_build_integer): Recognize more cases where Zbkb's
pack instruction is profitable.
(riscv_move_integer): Loop over all the codes.  If requested,
save the current constant into a temporary.  Generate pack
for more cases using the saved constant.


diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 91fefacee80..10af38a5a81 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -250,6 +250,7 @@ struct riscv_arg_info {
and each VALUE[i] is a constant integer.  CODE[0] is undefined.  */
 struct riscv_integer_op {
   bool use_uw;
+  bool save_temporary;
   enum rtx_code code;
   unsigned HOST_WIDE_INT value;
 };
@@ -759,6 +760,7 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
   codes[0].code = UNKNOWN;
   codes[0].value = value;
   codes[0].use_uw = false;
+  codes[0].save_temporary = false;
   return 1;
 }
   if (TARGET_ZBS && SINGLE_BIT_MASK_OPERAND (value))
@@ -767,6 +769,7 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
   codes[0].code = UNKNOWN;
   codes[0].value = value;
   codes[0].use_uw = false;
+  codes[0].save_temporary = false;
 
   /* RISC-V sign-extends all 32bit values that live in a 32bit
 register.  To avoid paradoxes, we thus need to use the
@@ -796,6 +799,7 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
  alt_codes[alt_cost-1].code = PLUS;
  alt_codes[alt_cost-1].value = low_part;
  alt_codes[alt_cost-1].use_uw = false;
+ alt_codes[alt_cost-1].save_temporary = false;
  memcpy (codes, alt_codes, sizeof (alt_codes));
  cost = alt_cost;
}
@@ -810,6 +814,7 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
  alt_codes[alt_cost-1].code = XOR;
  alt_codes[alt_cost-1].value = low_part;
  alt_codes[alt_cost-1].use_uw = false;
+ alt_codes[alt_cost-1].save_temporary = false;
  memcpy (codes, alt_codes, sizeof (alt_codes));
  cost = alt_cost;
}
@@ -852,6 +857,7 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
  alt_codes[alt_cost-1].code = ASHIFT;
  alt_codes[alt_cost-1].value = shift;
  alt_codes[alt_cost-1].use_uw = use_uw;
+ alt_codes[alt_cost-1].save_temporary = false;
  memcpy (codes, alt_codes, sizeof (alt_codes));
  cost = alt_cost;
}
@@ -873,9 +879,11 @@ riscv_build_integer_1 (struct riscv_integer_op 
codes[RISCV_MAX_INTEGER_OPS],
  codes[0].value = (((unsigned HOST_WIDE_INT) value >> trailing_ones)
| (value << (64 - trailing_ones)));
  codes[0].use_uw = false;
+ codes[0].save_temporary = false;
  codes[1].code = ROTATERT;
  codes[1].value = 64 - trailing_ones;
  codes[1].use_uw = false;
+ codes[1].save_temporary = false;
  cost = 2;
}
   /* Handle the case where the 11 bit range of zero bits wraps around.  */
@@ -888,9 +896,11 @@ 

[PATCHv2, rs6000] Optimize vector construction with two vector doubleword loads [PR103568]

2024-05-30 Thread HAO CHEN GUI
Hi,
  This patch optimizes vector construction from two doubleword loads.
It generates an optimal insn sequence, as "xxlor" has lower latency than
"mtvsrdd" on Power10.

  Compared with the previous version, the main change is to use the "isa"
attribute to guard "lxsd" and "lxsdx".
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653103.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Optimize vector construction with two vector doubleword loads

When constructing a vector from two doublewords in memory, it originally
does
ld 10,0(3)
ld 9,0(4)
mtvsrdd 34,9,10

An optimal sequence on Power10 should be
lxsd 0,0(4)
lxvrdx 1,0,3
xxlor 34,1,32

This patch does this optimization by insn combine and split.

gcc/
PR target/103568
* config/rs6000/vsx.md (vsx_ld_lowpart_zero_): New insn
pattern.
(vsx_ld_highpart_zero_): New insn pattern.
(vsx_concat_mem_): New insn_and_split pattern.

gcc/testsuite/
PR target/103568
* gcc.target/powerpc/pr103568.c: New test.

patch.diff
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f135fa079bd..f9a2a260e89 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1395,6 +1395,27 @@ (define_insn "vsx_ld_elemrev_v2di"
   "lxvd2x %x0,%y1"
   [(set_attr "type" "vecload")])

+(define_insn "vsx_ld_lowpart_zero_"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=v,wa")
+   (vec_concat:VSX_D
+ (match_operand: 1 "memory_operand" "wY,Z")
+ (match_operand: 2 "zero_constant" "j,j")))]
+  ""
+  "@
+   lxsd %0,%1
+   lxsdx %x0,%y1"
+  [(set_attr "type" "vecload,vecload")
+   (set_attr "isa" "p9v,p7v")])
+
+(define_insn "vsx_ld_highpart_zero_"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
+   (vec_concat:VSX_D
+ (match_operand: 1 "zero_constant" "j")
+ (match_operand: 2 "memory_operand" "Z")))]
+  "TARGET_POWER10"
+  "lxvrdx %x0,%y2"
+  [(set_attr "type" "vecload")])
+
 (define_insn "vsx_ld_elemrev_v1ti"
   [(set (match_operand:V1TI 0 "vsx_register_operand" "=wa")
 (vec_select:V1TI
@@ -3063,6 +3084,26 @@ (define_insn "vsx_concat_"
 }
   [(set_attr "type" "vecperm,vecmove")])

+(define_insn_and_split "vsx_concat_mem_"
+  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=v,wa")
+   (vec_concat:VSX_D
+ (match_operand: 1 "memory_operand" "wY,Z")
+ (match_operand: 2 "memory_operand" "Z,Z")))]
+  "TARGET_POWER10 && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  rtx tmp1 = gen_reg_rtx (mode);
+  rtx tmp2 = gen_reg_rtx (mode);
+  emit_insn (gen_vsx_ld_highpart_zero_ (tmp1, CONST0_RTX 
(mode),
+ operands[1]));
+  emit_insn (gen_vsx_ld_lowpart_zero_ (tmp2, operands[2],
+CONST0_RTX (mode)));
+  emit_insn (gen_ior3 (operands[0], tmp1, tmp2));
+  DONE;
+})
+
 ;; Combiner patterns to allow creating XXPERMDI's to access either double
 ;; word element in a vector register.
 (define_insn "*vsx_concat__1"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr103568.c 
b/gcc/testsuite/gcc.target/powerpc/pr103568.c
new file mode 100644
index 000..b2a06fb2162
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr103568.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+vector double test (double *a, double *b)
+{
+  return (vector double) {*a, *b};
+}
+
+vector long long test1 (long long *a, long long *b)
+{
+  return (vector long long) {*a, *b};
+}
+
+/* { dg-final { scan-assembler-times {\mlxsd} 2 } } */
+/* { dg-final { scan-assembler-times {\mlxvrdx\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M} 2 } } */
+


Re: Reverted recent patches to resource.cc

2024-05-30 Thread Hans-Peter Nilsson
> Date: Wed, 29 May 2024 21:23:58 -0600
> Cc: gcc-patches@gcc.gnu.org

> I don't bother with qemu.exp at all.  I've set up binfmt handlers so 
> that I can execute foreign binaries.
> 
> So given a root filesystem, I can chroot into it and do whatever I need. 
>   As far as dejagnu is concerned it looks like the native system.

Interesting.  In a Docker setup (or similar container)?
If so, care to share the Dockerfile (or equivalent)?

A quick web search shows similar attempts, but I found
nothing "turn-key ready" that enables bootstrapping.

I'll try to cook up something, likely Debian-based...later
this decade, aiming for suitability in contrib/.

brgds, H-P
ps. my hyperbolic time guesstimates somehow still expire too soon!


RE: [PATCH] aarch64: Add vector floating point extend patterns [PR113880, PR113869]

2024-05-30 Thread Pengxuan Zheng (QUIC)
> Pengxuan Zheng  writes:
> > This patch improves vectorization of certain floating point widening
> > operations for the aarch64 target by adding vector floating point
> > extend patterns for
> > V2SF->V2DF and V4HF->V4SF conversions.
> >
> > PR target/113880
> > PR target/113869
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-simd.md (extend2): New
> expand.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/extend-vec.c: New test.
> >
> > Signed-off-by: Pengxuan Zheng 
> 
> Thanks for doing this.  Could we instead rename
> aarch64_float_extend_lo_ to extend2 and use
> something similar to:
> 
> ---
> /* The builtins below should be expanded through the standard optabs
>CODE_FOR_[u]avg3_[floor,ceil].  However the mapping scheme in
>aarch64-simd-builtins.def does not easily allow us to have a pre-mode
>("uavg") and post-mode string ("_ceil") in the CODE_FOR_* construction.
>So the builtins use a name that is natural for AArch64 instructions
>e.g. "aarch64_srhadd" and we re-map these to the optab-related
>CODE_FOR_ here.  */
> #undef VAR1
> #define VAR1(F,T1,T2,I,M) \
> constexpr insn_code CODE_FOR_aarch64_##F##M =
> CODE_FOR_##T1##M##3##T2;
> 
> BUILTIN_VDQ_BHSI (srhadd, avg, _ceil, 0) BUILTIN_VDQ_BHSI (urhadd, uavg,
> _ceil, 0) BUILTIN_VDQ_BHSI (shadd, avg, _floor, 0) BUILTIN_VDQ_BHSI
> (uhadd, uavg, _floor, 0)
> 
> #undef VAR1
> ---
> 
> (from aarch64-builtins.cc) to handle the intrinsics?  The idea is to try to 
> avoid
> adding new patterns just to satisfy the internal naming convention.

Sure, Richard.

Here's the updated patch 
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653177.html.

Please let me know if I missed anything.

Thanks,
Pengxuan
> 
> Richard
> 
> > ---
> >  gcc/config/aarch64/aarch64-simd.md|  7 +++
> >  gcc/testsuite/gcc.target/aarch64/extend-vec.c | 21
> > +++
> >  2 files changed, 28 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/extend-vec.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md
> > b/gcc/config/aarch64/aarch64-simd.md
> > index 868f4486218..8febb411d06 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -3141,6 +3141,13 @@ (define_insn
> "aarch64_float_extend_lo_"
> >[(set_attr "type" "neon_fp_cvt_widen_s")]
> >  )
> >
> > +(define_expand "extend2"
> > +  [(set (match_operand: 0 "register_operand" "=w")
> > +(float_extend:
> > +  (match_operand:VDF 1 "register_operand" "w")))]
> > +  "TARGET_SIMD"
> > +)
> > +
> >  ;; Float narrowing operations.
> >
> >  (define_insn "aarch64_float_trunc_rodd_df"
> > diff --git a/gcc/testsuite/gcc.target/aarch64/extend-vec.c
> > b/gcc/testsuite/gcc.target/aarch64/extend-vec.c
> > new file mode 100644
> > index 000..f6241d5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/extend-vec.c
> > @@ -0,0 +1,21 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +
> > +/* { dg-final { scan-assembler-times {fcvtl\tv[0-9]+.2d, v[0-9]+.2s}
> > +1 } } */ void f (float *__restrict a, double *__restrict b) {
> > +  b[0] = a[0];
> > +  b[1] = a[1];
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {fcvtl\tv[0-9]+.4s, v[0-9]+.4h}
> > +1 } } */ void
> > +f1 (_Float16 *__restrict a, float *__restrict b) {
> > +
> > +  b[0] = a[0];
> > +  b[1] = a[1];
> > +  b[2] = a[2];
> > +  b[3] = a[3];
> > +}


[PATCH v2] aarch64: Add vector floating point extend pattern [PR113880, PR113869]

2024-05-30 Thread Pengxuan Zheng
This patch adds vector floating point extend pattern for V2SF->V2DF and
V4HF->V4SF conversions by renaming the existing aarch64_float_extend_lo_
pattern to the standard optab one, i.e., extend2. This allows the
vectorizer to vectorize certain floating point widening operations for the
aarch64 target.

PR target/113880
PR target/113869

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (VAR1): Remap float_extend_lo_
builtin codes to standard optab ones.
* config/aarch64/aarch64-simd.md (aarch64_float_extend_lo_): 
Rename
to...
(extend2): ... This.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/extend-vec.c: New test.

Signed-off-by: Pengxuan Zheng 
---
 gcc/config/aarch64/aarch64-builtins.cc|  9 
 gcc/config/aarch64/aarch64-simd.md|  2 +-
 gcc/testsuite/gcc.target/aarch64/extend-vec.c | 21 +++
 3 files changed, 31 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/extend-vec.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index f8eeccb554d..25189888d17 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -534,6 +534,15 @@ BUILTIN_VDQ_BHSI (urhadd, uavg, _ceil, 0)
 BUILTIN_VDQ_BHSI (shadd, avg, _floor, 0)
 BUILTIN_VDQ_BHSI (uhadd, uavg, _floor, 0)
 
+/* The builtins below should be expanded through the standard optabs
+   CODE_FOR_extend2. */
+#undef VAR1
+#define VAR1(F,T,N,M) \
+  constexpr insn_code CODE_FOR_aarch64_##F##M = CODE_FOR_##T##N##M##2;
+
+VAR1 (float_extend_lo_, extend, v2sf, v2df)
+VAR1 (float_extend_lo_, extend, v4hf, v4sf)
+
 #undef VAR1
 #define VAR1(T, N, MAP, FLAG, A) \
   {#N #A, UP (A), CF##MAP (N, A), 0, TYPES_##T, FLAG_##FLAG},
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 868f4486218..c5e2c9f00d0 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3132,7 +3132,7 @@
 DONE;
   }
 )
-(define_insn "aarch64_float_extend_lo_"
+(define_insn "extend2"
   [(set (match_operand: 0 "register_operand" "=w")
(float_extend:
  (match_operand:VDF 1 "register_operand" "w")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/extend-vec.c 
b/gcc/testsuite/gcc.target/aarch64/extend-vec.c
new file mode 100644
index 000..f6241d5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/extend-vec.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* { dg-final { scan-assembler-times {fcvtl\tv[0-9]+.2d, v[0-9]+.2s} 1 } } */
+void
+f (float *__restrict a, double *__restrict b)
+{
+  b[0] = a[0];
+  b[1] = a[1];
+}
+
+/* { dg-final { scan-assembler-times {fcvtl\tv[0-9]+.4s, v[0-9]+.4h} 1 } } */
+void
+f1 (_Float16 *__restrict a, float *__restrict b)
+{
+
+  b[0] = a[0];
+  b[1] = a[1];
+  b[2] = a[2];
+  b[3] = a[3];
+}
-- 
2.17.1



[committed] [x86] Rename double_u to __double_u to avoid polluting the namespace.

2024-05-30 Thread liuhongt
Committed as an obvious patch.
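
As a hypothetical illustration (not from an actual bug report), user code
like the following is perfectly valid and used to clash with the old
internal typedef; with the reserved __double_u name it now compiles:

#include <emmintrin.h>

/* This typedef previously conflicted with the header's own "double_u".  */
typedef union { double d; unsigned long long u; } double_u;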

gcc/ChangeLog:

* config/i386/emmintrin.h (__double_u): Rename from double_u.
(_mm_load_sd): Replace double_u with __double_u.
(_mm_store_sd): Ditto.
(_mm_loadh_pd): Ditto.
(_mm_loadl_pd): Ditto.
* config/i386/xmmintrin.h (__float_u): Rename from float_u.
(_mm_load_ss): Ditto.
(_mm_store_ss): Ditto.
---
 gcc/config/i386/emmintrin.h | 10 +-
 gcc/config/i386/xmmintrin.h |  6 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/emmintrin.h b/gcc/config/i386/emmintrin.h
index fa301103daf..356ca218fcb 100644
--- a/gcc/config/i386/emmintrin.h
+++ b/gcc/config/i386/emmintrin.h
@@ -56,7 +56,7 @@ typedef double __m128d __attribute__ ((__vector_size__ (16), 
__may_alias__));
 /* Unaligned version of the same types.  */
 typedef long long __m128i_u __attribute__ ((__vector_size__ (16), 
__may_alias__, __aligned__ (1)));
 typedef double __m128d_u __attribute__ ((__vector_size__ (16), __may_alias__, 
__aligned__ (1)));
-typedef double double_u __attribute__ ((__may_alias__, __aligned__ (1)));
+typedef double __double_u __attribute__ ((__may_alias__, __aligned__ (1)));
 
 /* Create a selector for use with the SHUFPD instruction.  */
 #define _MM_SHUFFLE2(fp1,fp0) \
@@ -146,7 +146,7 @@ _mm_load1_pd (double const *__P)
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_load_sd (double const *__P)
 {
-  return __extension__ (__m128d) { *(double_u *)__P, 0.0 };
+  return __extension__ (__m128d) { *(__double_u *)__P, 0.0 };
 }
 
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
@@ -181,7 +181,7 @@ _mm_storeu_pd (double *__P, __m128d __A)
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_store_sd (double *__P, __m128d __A)
 {
-  *(double_u *)__P = ((__v2df)__A)[0] ;
+  *(__double_u *)__P = ((__v2df)__A)[0] ;
 }
 
 extern __inline double __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
@@ -974,13 +974,13 @@ _mm_unpacklo_pd (__m128d __A, __m128d __B)
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_loadh_pd (__m128d __A, double const *__B)
 {
-  return __extension__ (__m128d) { ((__v2df)__A)[0], *(double_u*)__B };
+  return __extension__ (__m128d) { ((__v2df)__A)[0], *(__double_u*)__B };
 }
 
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_loadl_pd (__m128d __A, double const *__B)
 {
-  return __extension__ (__m128d) { *(double_u*)__B, ((__v2df)__A)[1] };
+  return __extension__ (__m128d) { *(__double_u*)__B, ((__v2df)__A)[1] };
 }
 
 extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
diff --git a/gcc/config/i386/xmmintrin.h b/gcc/config/i386/xmmintrin.h
index 87515ecb218..c90fc71331a 100644
--- a/gcc/config/i386/xmmintrin.h
+++ b/gcc/config/i386/xmmintrin.h
@@ -72,7 +72,7 @@ typedef float __m128 __attribute__ ((__vector_size__ (16), 
__may_alias__));
 
 /* Unaligned version of the same type.  */
 typedef float __m128_u __attribute__ ((__vector_size__ (16), __may_alias__, 
__aligned__ (1)));
-typedef float float_u __attribute__ ((__may_alias__, __aligned__ (1)));
+typedef float __float_u __attribute__ ((__may_alias__, __aligned__ (1)));
 
 /* Internal data types for implementing the intrinsics.  */
 typedef float __v4sf __attribute__ ((__vector_size__ (16)));
@@ -910,7 +910,7 @@ _mm_set_ps1 (float __F)
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_load_ss (float const *__P)
 {
-  return __extension__ (__m128) (__v4sf){ *(float_u *)__P, 0.0f, 0.0f, 0.0f };
+  return __extension__ (__m128) (__v4sf){ *(__float_u *)__P, 0.0f, 0.0f, 0.0f 
};
 }
 
 /* Create a vector with all four elements equal to *P.  */
@@ -966,7 +966,7 @@ _mm_setr_ps (float __Z, float __Y, float __X, float __W)
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_store_ss (float *__P, __m128 __A)
 {
-  *(float_u *)__P = ((__v4sf)__A)[0];
+  *(__float_u *)__P = ((__v4sf)__A)[0];
 }
 
 extern __inline float __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
-- 
2.31.1



[PATCH v5] Match: Support more form for scalar unsigned SAT_ADD

2024-05-30 Thread pan2 . li
From: Pan Li 

Update in v5
* Fix some doc build error.

Log in v4:
After we supported one gassign form of the unsigned .SAT_ADD,  we
would like to support more forms, including both branch and
branchless variants.  There are 5 other forms of .SAT_ADD,  listed below:

Form 1:
  #define SAT_ADD_U_1(T) \
  T sat_add_u_1_##T(T x, T y) \
  { \
return (T)(x + y) >= x ? (x + y) : -1; \
  }

Form 2:
  #define SAT_ADD_U_2(T) \
  T sat_add_u_2_##T(T x, T y) \
  { \
T ret; \
T overflow = __builtin_add_overflow (x, y, &ret); \
return (T)(-overflow) | ret; \
  }

Form 3:
  #define SAT_ADD_U_3(T) \
  T sat_add_u_3_##T (T x, T y) \
  { \
T ret; \
return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
  }

Form 4:
  #define SAT_ADD_U_4(T) \
  T sat_add_u_4_##T (T x, T y) \
  { \
T ret; \
return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
  }

Form 5:
  #define SAT_ADD_U_5(T) \
  T sat_add_u_5_##T(T x, T y) \
  { \
return (T)(x + y) < x ? -1 : (x + y); \
  }

Take form 3 above as an example:

uint64_t
sat_add (uint64_t x, uint64_t y)
{
  uint64_t ret;
  return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
}

Before this patch:
uint64_t sat_add (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  long unsigned int _2;
  uint64_t _3;
  __complex__ long unsigned int _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
  _2 = IMAGPART_EXPR <_6>;
  if (_2 != 0)
goto ; [35.00%]
  else
goto ; [65.00%]
;;succ:   4
;;3

;;   basic block 3, loop depth 0
;;pred:   2
  _1 = REALPART_EXPR <_6>;
;;succ:   4

;;   basic block 4, loop depth 0
;;pred:   3
;;2
  # _3 = PHI <_1(3), 18446744073709551615(2)>
  return _3;
;;succ:   EXIT
}

After this patch:
uint64_t sat_add (uint64_t x, uint64_t y)
{
  long unsigned int _12;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
  return _12;
;;succ:   EXIT
}

The flag '^' acting on a cond_expr will generate matching code similar to
the below:

else if (gphi *_a1 = dyn_cast  (_d1))
  {
basic_block _b1 = gimple_bb (_a1);
if (gimple_phi_num_args (_a1) == 2)
  {
basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
basic_block _db_1 = safe_dyn_cast  (*gsi_last_bb (_pb_0_1)) ? 
_pb_0_1 : ...
basic_block _other_db_1 = safe_dyn_cast  (*gsi_last_bb 
(_pb_0_1)) ? _pb_1_1 : ...
gcond *_ct_1 = safe_dyn_cast  (*gsi_last_bb (_db_1));
if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
  && EDGE_COUNT (_other_db_1->succs) == 1
  && EDGE_PRED (_other_db_1, 0)->src == _db_1)
  {
tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node, 
_cond_lhs_1, ...);
bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags  & 
EDGE_TRUE_VALUE;
tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);
switch (TREE_CODE (_p0))
  ...

The below test suites are still running; I will update the status later.
* The x86 bootstrap test.
* The x86 fully regression test.
* The riscv fully regression test.

gcc/ChangeLog:

* doc/match-and-simplify.texi: Add doc for the matching flag '^'.
* genmatch.cc (enum expr_flag): Add new enum for expr flag.
(dt_node::gen_kids_1): Add cond_expr and flag handling.
(dt_operand::gen_phi_on_cond): Add new func to gen phi matching
on cond_expr.
(parser::parse_expr): Add handling for the expr flag '^'.
* match.pd: Add more form for unsigned .SAT_ADD.
* tree-ssa-math-opts.cc (match_saturation_arith): Rename from.
(match_assign_saturation_arith): Rename to.
(match_phi_saturation_arith): Add new func impl to match the
.SAT_ADD when phi.
(math_opts_dom_walker::after_dom_children): Add phi matching
try for all gimple phi stmt.

Signed-off-by: Pan Li 
---
 gcc/doc/match-and-simplify.texi |  16 
 gcc/genmatch.cc | 126 +++-
 gcc/match.pd|  43 ++-
 gcc/tree-ssa-math-opts.cc   |  51 -
 4 files changed, 231 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/match-and-simplify.texi b/gcc/doc/match-and-simplify.texi
index 01f19e2f62c..9c7316755d4 100644
--- a/gcc/doc/match-and-simplify.texi
+++ b/gcc/doc/match-and-simplify.texi
@@ -361,6 +361,22 @@ Usually the types of the generated result expressions are
 determined from the context, but sometimes like in the above case
 it is required that you specify them explicitly.
 
+Another modifier for generated expressions is @code{^} which
+tells the machinery to try more matches for so

RE: [PATCH] aarch64: testsuite: Explicitly add -mlittle-endian to vget_low_2.c

2024-05-30 Thread Pengxuan Zheng (QUIC)
> Pengxuan Zheng  writes:
> > vget_low_2.c is a test case for little-endian, but we missed the
> > -mlittle-endian flag in r15-697-ga2e4fe5a53cf75.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/vget_low_2.c: Add -mlittle-endian.
> 
> Ok, thanks.
> 
> If you'd like write access, please follow the instructions on
> https://gcc.gnu.org/gitwrite.html (I'll sponsor).
> 
> Richard

Thanks a lot, Richard! I really appreciate it!

I have submitted a request for write access naming you as sponsor.

Thanks,
Pengxuan
> 
> > Signed-off-by: Pengxuan Zheng 
> > ---
> >  gcc/testsuite/gcc.target/aarch64/vget_low_2.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/vget_low_2.c
> > b/gcc/testsuite/gcc.target/aarch64/vget_low_2.c
> > index 44414e1c043..93e9e664ee9 100644
> > --- a/gcc/testsuite/gcc.target/aarch64/vget_low_2.c
> > +++ b/gcc/testsuite/gcc.target/aarch64/vget_low_2.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O3 -fdump-tree-optimized" } */
> > +/* { dg-options "-O3 -fdump-tree-optimized -mlittle-endian" } */
> >
> >  #include 


[committed] i386: Rewrite bswaphi2 handling [PR115102]

2024-05-30 Thread Uros Bizjak
Introduce *bswaphi2 instruction pattern and enable bswaphi2 expander
also for non-movbe targets.  The testcase:

unsigned short bswap8 (unsigned short val)
{
  return ((val & 0xff00) >> 8) | ((val & 0xff) << 8);
}

now expands through bswaphi2 named expander.

Rewrite bswaphi_lowpart insn pattern as bswaphisi2_lowpart in the RTX form
that combine pass can use to simplify:

Trying 6, 9, 8 -> 10:
6: r99:SI=bswap(r103:SI)
9: {r107:SI=r103:SI&0xffff0000;clobber flags:CC;}
  REG_DEAD r103:SI
  REG_UNUSED flags:CC
8: {r106:SI=r99:SI 0>>0x10;clobber flags:CC;}
  REG_DEAD r99:SI
  REG_UNUSED flags:CC
   10: {r104:SI=r106:SI|r107:SI;clobber flags:CC;}
  REG_DEAD r107:SI
  REG_DEAD r106:SI
  REG_UNUSED flags:CC

Successfully matched this instruction:
(set (reg:SI 104 [ _8 ])
(ior:SI (and:SI (reg/v:SI 103 [ val ])
(const_int -65536 [0xffffffffffff0000]))
(lshiftrt:SI (bswap:SI (reg/v:SI 103 [ val ]))
(const_int 16 [0x10]
allowing combination of insns 6, 8, 9 and 10

when compiling the following testcase:

unsigned int bswap8 (unsigned int val)
{
  return (val & 0xffff0000) | ((val & 0xff00) >> 8) | ((val & 0xff) << 8);
}

to produce:

movl%edi, %eax
xchgb   %ah, %al
ret

The expansion now always goes through a clobberless form of the bswaphi
instruction.  The instruction is conditionally converted to a rotate at
peephole2 pass.  This significantly simplifies bswaphisi2_lowpart
insn pattern attributes.

PR target/115102

gcc/ChangeLog:

* config/i386/i386.md (bswaphi2): Also enable for !TARGET_MOVBE.
(*bswaphi2): New insn pattern.
(bswaphisi2_lowpart): Rename from bswaphi_lowpart.  Rewrite
insn RTX to match the expected form of the combine pass.
Remove rol{w} alternative and corresponding attributes.
(bswaphisi2_lowpart peephole2): New peephole2 pattern to
conditionally convert bswaphisi2_lowpart to rotlhi3_1_slp.
(bswapsi2): Update expander for rename.
(rotlhi3_1_slp splitter): Conditionally split to bswaphi2.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr115102.c: New test.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index c162cd42386..375654cf74e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -17210,9 +17210,7 @@ (define_split
   (clobber (reg:CC FLAGS_REG))]
  "reload_completed
   && (TARGET_USE_XCHGB || optimize_function_for_size_p (cfun))"
- [(parallel [(set (strict_low_part (match_dup 0))
- (bswap:HI (match_dup 0)))
-(clobber (reg:CC FLAGS_REG))])])
+ [(set (match_dup 0) (bswap:HI (match_dup 0)))])
 
 ;; Rotations through carry flag
 (define_insn "rcrsi2"
@@ -20730,12 +20728,11 @@ (define_expand "bswapsi2"
 operands[1] = force_reg (SImode, operands[1]);
   else
 {
-  rtx x = operands[0];
+  rtx x = gen_reg_rtx (SImode);
 
-  emit_move_insn (x, operands[1]);
-  emit_insn (gen_bswaphi_lowpart (gen_lowpart (HImode, x)));
+  emit_insn (gen_bswaphisi2_lowpart (x, operands[1]));
   emit_insn (gen_rotlsi3 (x, x, GEN_INT (16)));
-  emit_insn (gen_bswaphi_lowpart (gen_lowpart (HImode, x)));
+  emit_insn (gen_bswaphisi2_lowpart (operands[0], x));
   DONE;
 }
 })
@@ -20767,7 +20764,11 @@ (define_insn "*bswap2"
 (define_expand "bswaphi2"
   [(set (match_operand:HI 0 "register_operand")
(bswap:HI (match_operand:HI 1 "nonimmediate_operand")))]
-  "TARGET_MOVBE")
+  ""
+{
+  if (!TARGET_MOVBE)
+operands[1] = force_reg (HImode, operands[1]);
+})
 
 (define_insn "*bswaphi2_movbe"
   [(set (match_operand:HI 0 "nonimmediate_operand" "=Q,r,m")
@@ -20788,33 +20789,55 @@ (define_insn "*bswaphi2_movbe"
(set_attr "bdver1_decode" "double,*,*")
(set_attr "mode" "QI,HI,HI")])
 
+(define_insn "*bswaphi2"
+  [(set (match_operand:HI 0 "register_operand" "=Q")
+   (bswap:HI (match_operand:HI 1 "register_operand" "0")))]
+  "!TARGET_MOVBE"
+  "xchg{b}\t{%h0, %b0|%b0, %h0}"
+  [(set_attr "type" "imov")
+   (set_attr "pent_pair" "np")
+   (set_attr "athlon_decode" "vector")
+   (set_attr "amdfam10_decode" "double")
+   (set_attr "bdver1_decode" "double")
+   (set_attr "mode" "QI")])
+
 (define_peephole2
   [(set (match_operand:HI 0 "general_reg_operand")
(bswap:HI (match_dup 0)))]
-  "TARGET_MOVBE
-   && !(TARGET_USE_XCHGB || optimize_function_for_size_p (cfun))
+  "!(TARGET_USE_XCHGB ||
+ TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
&& peep2_regno_dead_p (0, FLAGS_REG)"
   [(parallel [(set (match_dup 0) (rotate:HI (match_dup 0) (const_int 8)))
  (clobber (reg:CC FLAGS_REG))])])
 
-(define_insn "bswaphi_lowpart"
-  [(set (strict_low_part (match_operand:HI 0 "register_operand" "+Q,r"))
-   (bswap:HI (match_dup 0)))
-   (clobber (reg:CC FLAGS_REG))]
+(define_insn "bswaphisi2_lowpart"
+  [(set (match_operand:SI 0 "register_operand" "=Q"

[PATCH v3 3/6] btf: refactor and simplify implementation

2024-05-30 Thread David Faust
This patch heavily refactors btfout.cc to take advantage of the
structural changes in the prior commits.

Now that inter-type references are internally stored as simply pointers,
all the painful, brittle, confusing infrastructure that was used in the
process of converting CTF type IDs to BTF type IDs can be thrown out.
This greatly simplifies the entire process of converting from CTF to
BTF, making the code cleaner, easier to read, and easier to maintain.

In addition, we no longer need to worry about destructive changes in
internal data structures used commonly by CTF and BTF, which allows
deleting several ancillary data structures previously used in btfout.cc.

This is nearly transparent, but a few improvements have also been made:

 1) BTF_KIND_FUNC records are now _always_ constructed at early_finish,
allowing us to construct records even for functions which are later
inlined by optimizations. DATASEC entries for functions are only
constructed at late_finish, to avoid incorrectly generating entries
for functions which get inlined.

 2) BTF_KIND_VAR records and DATASEC entries for them are now always
constructed at (late) finish, which avoids cases where we could
incorrectly create records for variables which were completely
optimized away.  This fixes PR debug/113566 for non-LTO builds.
In LTO builds, BTF must be emitted at early_finish, so some VAR
records may be emitted for variables which are later optimized away.

 3) Some additional assembler comments have been added with more
information for debugging.
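
As a rough illustration of items 1 and 2 above (assuming optimization is
enabled, so that 'helper' is inlined and 'unused' is eliminated):

static int unused;
static inline int helper (int x) { return x * 2; }
int f (int x) { return helper (x); }

Here 'helper' still gets a BTF_KIND_FUNC record (constructed at
early_finish) even though it is inlined into 'f', and since it does not
survive to the final object no DATASEC entry is created for it, while
'unused' is optimized away and therefore gets no BTF_KIND_VAR record.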

gcc/
* btfout.cc (struct btf_datasec_entry): New.
(struct btf_datasec): Add `id' member.  Change `entries' to use
new struct btf_datasec_entry.
(func_map): New hash_map.
(max_translated_id): New.
(btf_var_ids, btf_id_map, holes, voids, num_vars_added)
(num_types_added, num_types_created): Delete.
(btf_absolute_var_id, btf_relative_var_id, btf_absolute_func_id)
(btf_relative_func_id, btf_absolute_datasec_id, init_btf_id_map)
(get_btf_id, set_btf_id, btf_emit_id_p): Delete.
(btf_removed_type_p): Delete.
(btf_dtd_kind, btf_emit_type_p): New helpers.
(btf_fwd_to_enum_p, btf_calc_num_vbytes): Use them.
(btf_collect_datasec): Delete.
(btf_dtd_postprocess_cb, btf_dvd_emit_preprocess_cb)
(btf_dtd_emit_preprocess_cb, btf_emit_preprocess): Delete.
(btf_dmd_representable_bitfield_p): Adapt to type reference changes
and delete now-unused ctfc argument.
(btf_asm_datasec_type_ref): Delete.
(btf_asm_type_ref): Adapt to type reference changes, simplify.
(btf_asm_type): Likewise. Mark struct/union types with bitfield
members.
(btf_asm_array): Adapt to data structure changes.
(btf_asm_varent): Likewise.
(btf_asm_sou_member): Likewise. Ensure non-bitfield members are
correctly re-encoded if struct or union contains any bitfield.
(btf_asm_func_arg, btf_asm_func_type, btf_asm_datasec_entry)
(btf_asm_datasec_type): Adapt to data structure changes.
(output_btf_header): Adapt to other changes, simplify type
length calculation, add info to assembler comments.
(output_btf_vars): Adapt to other changes.
(output_btf_strs): Fix overlong lines.
(output_asm_btf_sou_fields, output_asm_btf_enum_list)
(output_asm_btf_func_args_list, output_asm_btf_vlen_bytes)
(output_asm_btf_type, output_btf_types, output_btf_func_types)
(output_btf_datasec_types): Adapt to other changes.
(btf_init_postprocess): Delete.
(btf_output): Change to only perform output.
(btf_early_add_const_void, btf_early_add_func_records): New.
(btf_early_finish): Use them here. New.
(btf_datasec_push_entry): Adapt to data structure changes.
(btf_datasec_add_func, btf_datasec_add_var): New.
(btf_late_add_func_datasec_entries): New.
(btf_emit_variable_p): New helper.
(btf_late_add_vars): Use it here. New.
(btf_type_list_cb, btf_late_collect_translated_types): New.
(btf_late_assign_func_ids, btf_late_assign_var_ids)
(btf_late_assign_datasec_ids): New.
(btf_finish): Remove unused argument. Call new btf_late*
functions and btf_output.
(btf_finalize): Adapt to data structure changes.
* ctfc.h (struct ctf_dtdef): Convert existing boolean flags to
BOOL_BITFIELD and reorder.
(struct ctf_dvdef): Add dvd_id member.
(btf_finish): Remove argument from prototype.
(get_btf_id): Delete prototype.
(funcs_traverse_callback, traverse_btf_func_types): Add an
explanatory comment.
* dwarf2ctf.cc (ctf_debug_finish): Remove unused argument.
* dwarf2ctf.h: Analogous change.
* dwarf2out.cc: Likewise.
---
 gcc/btfout.cc| 1254 +++---

Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-05-30 Thread Segher Boessenkool
Hi!

On Fri, May 31, 2024 at 01:21:44AM +0530, Ajit Agarwal wrote:
> Code is implemented with pure virtual functions to interface with target
> code.

It's not a pure function.  A pure function -- by definition -- has no
side effects.  These things have side effects.

What you mean is this is *an implementation* for C++ functions without
a generic implementation.  An obfuscation some people (like me) would
say.  But please call things what they are!  So not "pure function".
That has a meaning, and this isn't it.

>   * config/aarch64/aarch64-ldp-fusion.cc: Add target specific
>   implementation of additional virtual functions added in pair_fusion
>   struct.

This does not belong in this patch.  Do not send "rs6000" patches that
touch anything outside of config/rs6000/ and similar, certainly not in
config/something-else/!

This would be WAY easier to review (read: AT ALL POSSIBLE) if you
included some detailed rationale and design document.


Segher


[PATCH v3 5/6] bpf,btf: enable BTF pruning by default for BPF

2024-05-30 Thread David Faust
This patch enables -fprune-btf by default in the BPF backend when
generating BTF information, and fixes BPF CO-RE generation when using
-fprune-btf.

When generating BPF CO-RE information, we must ensure that types used
in CO-RE relocations always have sufficient BTF information emited so
that the CO-RE relocations can be processed by a BPF loader.  The BTF
pruning algorithm on its own does not have sufficient information to
determine which types are used in a BPF CO-RE relocation, so this
information must be supplied by the BPF backend, using a new
btf_mark_type_used function.

Co-authored-by: Cupertino Miranda 

gcc/
* btfout.cc (btf_mark_type_used): New.
* ctfc.h (btf_mark_type_used): Declare it here.
* config/bpf/bpf.cc (bpf_option_override): Enable -fprune-btf
by default if -gbtf is enabled.
* config/bpf/core-builtins.cc (extra_fn): New typedef.
(compute_field_expr): Add callback parameter, and call it if supplied.
Fix computation for MEM_REF.
(mark_component_type_as_used): New.
(bpf_mark_types_as_used): Likewise.
(bpf_expand_core_builtin): Call here.
* doc/invoke.texi (Debugging Options): Note that -fprune-btf is
enabled by default for BPF target when generating BTF.

gcc/testsuite/
* gcc.dg/debug/btf/btf-variables-5.c: Add -fno-prune-btf to dg-options.
---
 gcc/btfout.cc | 22 ++
 gcc/config/bpf/bpf.cc |  5 ++
 gcc/config/bpf/core-builtins.cc   | 71 +--
 gcc/ctfc.h|  1 +
 gcc/doc/invoke.texi   |  3 +
 .../gcc.dg/debug/btf/btf-variables-5.c|  6 +-
 6 files changed, 100 insertions(+), 8 deletions(-)

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index a7da164f6b31..35d2875e3f61 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -1501,6 +1501,28 @@ btf_late_assign_datasec_ids (ctf_container_ref ctfc)
 }
 }
 
+
+/* Manually mark that type T is used to ensure it will not be pruned.
+   Used by the BPF backend when generating BPF CO-RE to mark types used
+   in CO-RE relocations.  */
+
+void
+btf_mark_type_used (tree t)
+{
+  /* If we are not going to prune anyway, this is a no-op.  */
+  if (!flag_prune_btf)
+return;
+
+  gcc_assert (TYPE_P (t));
+  ctf_container_ref ctfc = ctf_get_tu_ctfc ();
+  ctf_dtdef_ref dtd = ctf_lookup_tree_type (ctfc, t);
+
+  if (!dtd)
+return;
+
+  btf_add_used_type (ctfc, dtd, false, false, true);
+}
+
 /* Callback used for assembling the only-used-types list.  Note that this is
the same as btf_type_list_cb above, but the hash_set traverse requires a
different function signature.  */
diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index dd1bfe38d29b..775730700eba 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -221,6 +221,11 @@ bpf_option_override (void)
   && !(target_flags_explicit & MASK_BPF_CORE))
 target_flags |= MASK_BPF_CORE;
 
+  /* -gbtf implies -fprune-btf for BPF target.  */
+  if (btf_debuginfo_p ())
+SET_OPTION_IF_UNSET (&global_options, &global_options_set,
+flag_prune_btf, true);
+
   /* Determine available features from ISA setting (-mcpu=).  */
   if (bpf_has_jmpext == -1)
 bpf_has_jmpext = (bpf_isa >= ISA_V2);
diff --git a/gcc/config/bpf/core-builtins.cc b/gcc/config/bpf/core-builtins.cc
index 232bebcadbd5..86e2e9d6e39f 100644
--- a/gcc/config/bpf/core-builtins.cc
+++ b/gcc/config/bpf/core-builtins.cc
@@ -624,13 +624,20 @@ bpf_core_get_index (const tree node, bool *valid)
 
ALLOW_ENTRY_CAST is an input arguments and specifies if the function should
consider as valid expressions in which NODE entry is a cast expression (or
-   tree code nop_expr).  */
+   tree code nop_expr).
+
+   EXTRA_FN is a callback function to allow extra functionality with this
+   function traversal.  Currently used for marking used type during expand
+   pass.  */
+
+typedef void (*extra_fn) (tree);
 
 static unsigned char
 compute_field_expr (tree node, unsigned int *accessors,
bool *valid,
tree *access_node,
-   bool allow_entry_cast = true)
+   bool allow_entry_cast = true,
+   extra_fn callback = NULL)
 {
   unsigned char n = 0;
   unsigned int fake_accessors[MAX_NR_ACCESSORS];
@@ -647,6 +654,9 @@ compute_field_expr (tree node, unsigned int *accessors,
 
   *access_node = node;
 
+  if (callback != NULL)
+callback (node);
+
   switch (TREE_CODE (node))
 {
 case INDIRECT_REF:
@@ -664,17 +674,19 @@ compute_field_expr (tree node, unsigned int *accessors,
 case COMPONENT_REF:
   n = compute_field_expr (TREE_OPERAND (node, 0), accessors,
  valid,
- access_node, false);
+ access_node, false, callback);
   accessors[n] = bpf_core_get_index (TREE_OPE

[PATCH v3 6/6] opts: allow any combination of DWARF, CTF, BTF

2024-05-30 Thread David Faust
Previously it was not supported to generate both CTF and BTF debug info
in the same compiler run, as both formats made incompatible changes to
the same internal data structures.

With the structural change in the prior patches, in particular the
guarantee that CTF will always be fully emitted before any BTF
translation occurs, there is no longer anything preventing generation
of both CTF and BTF at the same time.  This patch changes option parsing
to accept any combination of -gdwarf, -gctf, and -gbtf.
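For example, after this change a single invocation along the lines of
'gcc -c -gdwarf -gctf -gbtf foo.c' should be accepted and emit all three
formats.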

gcc/
* opts.cc (set_debug_level): Allow any combination of -gdwarf,
-gctf and -gbtf at the same time.
---
 gcc/opts.cc | 20 +++-
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/gcc/opts.cc b/gcc/opts.cc
index f80d5d4ba8f9..d58bea096a5f 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -3505,21 +3505,15 @@ set_debug_level (uint32_t dinfo, int extended, const 
char *arg,
 }
   else
 {
-  /* Make and retain the choice if both CTF and DWARF debug info are to
-be generated.  */
-  if (((dinfo == DWARF2_DEBUG) || (dinfo == CTF_DEBUG))
+  /* Any combination of DWARF, CTF and BTF is allowed together.  */
+  if (((dinfo == DWARF2_DEBUG) || (dinfo == CTF_DEBUG)
+  || (dinfo == BTF_DEBUG))
  && ((opts->x_write_symbols == (DWARF2_DEBUG|CTF_DEBUG))
+ || (opts->x_write_symbols == (DWARF2_DEBUG|BTF_DEBUG))
+ || (opts->x_write_symbols == (CTF_DEBUG|BTF_DEBUG))
  || (opts->x_write_symbols == DWARF2_DEBUG)
- || (opts->x_write_symbols == CTF_DEBUG)))
-   {
- opts->x_write_symbols |= dinfo;
- opts_set->x_write_symbols |= dinfo;
-   }
-  /* However, CTF and BTF are not allowed together at this time.  */
-  else if (((dinfo == DWARF2_DEBUG) || (dinfo == BTF_DEBUG))
-  && ((opts->x_write_symbols == (DWARF2_DEBUG|BTF_DEBUG))
-  || (opts->x_write_symbols == DWARF2_DEBUG)
-  || (opts->x_write_symbols == BTF_DEBUG)))
+ || (opts->x_write_symbols == CTF_DEBUG)
+ || (opts->x_write_symbols == BTF_DEBUG)))
{
  opts->x_write_symbols |= dinfo;
  opts_set->x_write_symbols |= dinfo;
-- 
2.43.0



[PATCH v3 4/6] btf: add -fprune-btf option

2024-05-30 Thread David Faust
This patch adds a new option, -fprune-btf, to control BTF debug info
generation.

As the name implies, this option enables a kind of "pruning" of the BTF
information before it is emitted.  When enabled, rather than emitting
all type information translated from DWARF, only information for types
directly used in the source program is emitted.

The primary purpose of this pruning is to reduce the amount of
unnecessary BTF information emitted, especially for BPF programs.  It is
very common for BPF programs to include Linux kernel internal headers in
order to have access to kernel data structures.  However, doing so often
has the side effect of also adding type definitions for a large number
of types which are not actually used by nor relevant to the program.
In these cases, -fprune-btf commonly reduces the size of the resulting
BTF information by 10x or more, as seen on average when compiling Linux
kernel BPF selftests.  This both slims down the size of the resulting
object and reduces the time required by the BPF loader to verify the
program and its BTF information.

Note that the pruning implemented in this patch follows the same rules
as the BTF pruning performed unconditionally by LLVM's BPF backend when
generating BTF.  In particular, the main sources of pruning are:

  1) Only generate BTF for types used by variables and functions at
 the file scope.  Note that with or without pruning, BTF_KIND_VAR
 entries are only generated for variables present in the final
 object - unused static variables or variables completely optimized
 away must not have VAR entries in BTF.

  2) Avoid emitting full BTF for struct and union types which are only
 pointed-to by members of other struct/union types.  In these cases,
 the full BTF_KIND_STRUCT or BTF_KIND_UNION which would normally
 be emitted is replaced with a BTF_KIND_FWD, as though the
 underlying type was a forward-declared struct or union type.
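
As a small illustration of both rules (a sketch, not a testsuite case):

struct inner { int a; long b; };
struct outer { struct inner *p; int n; };

static int unused_var;   /* unused static: no BTF_KIND_VAR entry */
struct outer g;          /* used at file scope: full BTF emitted */

int f (void) { return g.n; }

With -gbtf -fprune-btf, 'struct outer' is expected to be emitted in full,
while 'struct inner' is reduced to a BTF_KIND_FWD entry because it is only
referenced through the pointer member 'p'.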

gcc/
* btfout.cc (btf_used_types): New hash set.
(struct btf_fixup): New.
(fixups, forwards): New vecs.
(btf_output): Calculate num_types depending on flag_prune_btf.
(btf_early_finish): New initialization for flag_prune_btf.
(btf_add_used_type): New function.
(btf_used_type_list_cb): Likewise.
(btf_late_collect_pruned_types): Likewise.
(btf_late_add_vars): Handle special case for variables in ".maps"
section when generating BTF for BPF CO-RE target.
(btf_late_finish): Use btf_late_collect_pruned_types when
flag_prune_btf in effect.  Move some initialization to btf_early_finish.
(btf_finalize): Additional deallocation for flag_prune_btf.
* common.opt (fprune-btf): New flag.
* ctfc.cc (init_ctf_strtable): Make non-static.
* ctfc.h (struct ctf_dtdef): Add visited_children_p boolean flag.
(init_ctf_strtable, ctfc_delete_strtab): Make extern.
* doc/invoke.texi (Debugging Options): Document -fprune-btf.

gcc/testsuite/
* gcc.dg/debug/btf/btf-prune-1.c: New test.
* gcc.dg/debug/btf/btf-prune-2.c: Likewise.
* gcc.dg/debug/btf/btf-prune-3.c: Likewise.
* gcc.dg/debug/btf/btf-prune-maps.c: Likewise.
---
 gcc/btfout.cc | 359 +-
 gcc/common.opt|   4 +
 gcc/ctfc.cc   |   2 +-
 gcc/ctfc.h|   3 +
 gcc/doc/invoke.texi   |  20 +
 gcc/testsuite/gcc.dg/debug/btf/btf-prune-1.c  |  25 ++
 gcc/testsuite/gcc.dg/debug/btf/btf-prune-2.c  |  33 ++
 gcc/testsuite/gcc.dg/debug/btf/btf-prune-3.c  |  35 ++
 .../gcc.dg/debug/btf/btf-prune-maps.c |  20 +
 9 files changed, 494 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-maps.c

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index 32fda14f704b..a7da164f6b31 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -828,7 +828,10 @@ output_btf_types (ctf_container_ref ctfc)
 {
   size_t i;
   size_t num_types;
-  num_types = ctfc->ctfc_types->elements ();
+  if (flag_prune_btf)
+num_types = max_translated_id;
+  else
+num_types = ctfc->ctfc_types->elements ();
 
   if (num_types)
 {
@@ -957,6 +960,212 @@ btf_early_add_func_records (ctf_container_ref ctfc)
 }
 }
 
+/* The set of types used directly in the source program, and any types manually
+   marked as used.  This is the set of types which will be emitted when
+   flag_prune_btf is set.  */
+static GTY (()) hash_set *btf_used_types;
+
+/* Fixup used to avoid unnecessary pointer chasing for types.  A fixup is
+   created when a structure or union member is a pointer to another struct
+   or union type.  In such cases, avoid emitt

[PATCH v3 2/6] ctf: use pointers instead of IDs internally

2024-05-30 Thread David Faust
This patch replaces all inter-type references in the ctfc internal data
structures with pointers, rather than the references-by-ID which were
used previously.

A couple of small updates in the BPF backend are included to make it
compatible with the change.

This change is only to the in-memory representation of various CTF
structures to make them easier to work with in various cases.  It is
outwardly transparent; there is no change in emitted CTF.
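
Schematically, a member that used to hold a numeric type ID now holds a
pointer to the referenced type's ctf_dtdef (simplified, not the exact
declarations):

/* before */  ctf_id_t       dmd_type;  /* ID, looked up in ctfc_types_list */
/* after  */  ctf_dtdef_ref  dmd_type;  /* direct pointer to the type */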

gcc/
* btfout.cc (BTF_VOID_TYPEID, BTF_INIT_TYPEID): Move defines to
include/btf.h.
(btf_dvd_emit_preprocess_cb, btf_emit_preprocess)
(btf_dmd_representable_bitfield_p, btf_asm_array, btf_asm_varent)
(btf_asm_sou_member, btf_asm_func_arg, btf_init_postprocess):
Adapt to structural changes in ctf_* structs.
* ctfc.h (struct ctf_dtdef): Add forward declaration.
(ctf_dtdef_t, ctf_dtdef_ref): Move typedefs earlier.
(struct ctf_arinfo, struct ctf_funcinfo, struct ctf_sliceinfo)
(struct ctf_itype, struct ctf_dmdef, struct ctf_func_arg)
(struct ctf_dvdef): Use pointers instead of type IDs for
references to other types and use typedefs where appropriate.
(struct ctf_dtdef): Add ref_type member.
(ctf_type_exists): Use pointer instead of type ID.
(ctf_add_reftype, ctf_add_enum, ctf_add_slice, ctf_add_float)
(ctf_add_integer, ctf_add_unknown, ctf_add_pointer)
(ctf_add_array, ctf_add_forward, ctf_add_typedef)
(ctf_add_function, ctf_add_sou, ctf_add_enumerator)
(ctf_add_variable): Likewise. Return pointer instead of ID.
(ctf_lookup_tree_type): Return pointer to type instead of ID.
* ctfc.cc: Analogous changes.
* ctfout.cc (ctf_asm_type, ctf_asm_slice, ctf_asm_varent)
(ctf_asm_sou_lmember, ctf_asm_sou_member, ctf_asm_func_arg)
(output_ctf_objt_info): Adapt to changes.
* dwarf2ctf.cc (gen_ctf_type, gen_ctf_void_type)
(gen_ctf_unknown_type, gen_ctf_base_type, gen_ctf_pointer_type)
(gen_ctf_subrange_type, gen_ctf_array_type, gen_ctf_typedef)
(gen_ctf_modifier_type, gen_ctf_sou_type, gen_ctf_function_type)
(gen_ctf_enumeration_type, gen_ctf_variable, gen_ctf_function)
(gen_ctf_type, ctf_do_die): Likewise.
* config/bpf/btfext-out.cc (struct btf_ext_core_reloc): Use
pointer instead of type ID.
(bpf_core_reloc_add, bpf_core_get_sou_member_index)
(output_btfext_core_sections): Adapt to above changes.
* config/bpf/core-builtins.cc (process_type): Likewise.

include/
* btf.h (BTF_VOID_TYPEID, BTF_INIT_TYPEID): Move defines here,
from gcc/btfout.cc.
---
 gcc/btfout.cc   |  40 +++---
 gcc/config/bpf/btfext-out.cc|  14 +-
 gcc/config/bpf/core-builtins.cc |   3 +-
 gcc/ctfc.cc | 139 +-
 gcc/ctfc.h  |  90 ++--
 gcc/ctfout.cc   |  22 +--
 gcc/dwarf2ctf.cc| 244 +++-
 include/btf.h   |   5 +
 8 files changed, 278 insertions(+), 279 deletions(-)

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index 1b6a9ed811f0..40e8d8c5c01b 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -61,11 +61,6 @@ static char btf_info_section_label[MAX_BTF_LABEL_BYTES];
 #define BTF_INFO_SECTION_LABEL  "Lbtf"
 #endif
 
-/* BTF encodes void as type id 0.  */
-
-#define BTF_VOID_TYPEID 0
-#define BTF_INIT_TYPEID 1
-
 #define BTF_INVALID_TYPEID 0x
 
 /* Mapping of CTF variables to the IDs they will be assigned when they are
@@ -626,7 +621,8 @@ btf_dvd_emit_preprocess_cb (ctf_dvdef_ref *slot, 
ctf_container_ref arg_ctfc)
 return 1;
 
   /* Do not add variables which refer to unsupported types.  */
-  if (!voids.contains (var->dvd_type) && btf_removed_type_p (var->dvd_type))
+  if (!voids.contains (var->dvd_type->dtd_type)
+  && btf_removed_type_p (var->dvd_type->dtd_type))
 return 1;
 
   arg_ctfc->ctfc_vars_list[num_vars_added] = var;
@@ -716,7 +712,7 @@ btf_emit_preprocess (ctf_container_ref ctfc)
 static bool
 btf_dmd_representable_bitfield_p (ctf_container_ref ctfc, ctf_dmdef_t *dmd)
 {
-  ctf_dtdef_ref ref_type = ctfc->ctfc_types_list[dmd->dmd_type];
+  ctf_dtdef_ref ref_type = ctfc->ctfc_types_list[dmd->dmd_type->dtd_type];
 
   if (CTF_V2_INFO_KIND (ref_type->dtd_data.ctti_info) == CTF_K_SLICE)
 {
@@ -913,8 +909,8 @@ btf_asm_type (ctf_container_ref ctfc, ctf_dtdef_ref dtd)
 static void
 btf_asm_array (ctf_container_ref ctfc, ctf_arinfo_t arr)
 {
-  btf_asm_type_ref ("bta_elem_type", ctfc, arr.ctr_contents);
-  btf_asm_type_ref ("bta_index_type", ctfc, arr.ctr_index);
+  btf_asm_type_ref ("bta_elem_type", ctfc, arr.ctr_contents->dtd_type);
+  btf_asm_type_ref ("bta_index_type", ctfc, arr.ctr_index->dtd_type);
   dw2_asm_output_data (4, arr.ctr_nelems, "bta_nelems");
 }
 
@@ -927,7 +923,7 @@ btf_asm_varent (ctf_container_ref ctfc, ctf_dvdef_ref var)
 

[PATCH v3 1/6] ctf, btf: restructure CTF/BTF emission

2024-05-30 Thread David Faust
This commit makes some structural changes to the CTF/BTF debug info
emission.  In particular:

 a) CTF is now always fully generated and emitted before any
BTF-related procedures are run.  This means that BTF-related
functions can change, even irreversibly, the shared in-memory
representation used by the two formats without issue.

 b) BTF generation has fewer entry points, and is cleanly divided
into early_finish and finish.

 c) BTF is now always emitted at finish (called from dwarf2out_finish),
for all targets in non-LTO builds, rather than being emitted at
early_finish for targets other than BPF CO-RE.  In LTO builds,
BTF is emitted at early_finish as before.

Note that this change alone does not alter the contents of BTF at
all, regardless of whether it would have previously been emitted at
early_finish or finish, because the calculation of the BTF to be
emitted is not moved by this patch, only the write-out.

The changes are transparent to both CTF and BTF emission.

gcc/
* btfout.cc (btf_init_postprocess): Rename to...
(btf_early_finish): ...this.
(btf_output): Rename to...
(btf_finish): ...this.
* ctfc.h: Analogous changes.
* dwarf2ctf.cc (ctf_debug_early_finish): Conditionally call
btf_early_finish, or ctf_finalize as appropriate.  Emit BTF
here for LTO builds.
(ctf_debug_finish): Always call btf_finish here if generating
BTF info in non-LTO builds.
(ctf_debug_finalize, ctf_debug_init_postprocess): Delete.
* dwarf2out.cc (dwarf2out_early_finish): Remove call to
ctf_debug_init_postprocess.
---
 gcc/btfout.cc| 28 +
 gcc/ctfc.h   |  4 +--
 gcc/dwarf2ctf.cc | 65 +++-
 gcc/dwarf2out.cc |  2 --
 4 files changed, 50 insertions(+), 49 deletions(-)

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index 07f066a47068..1b6a9ed811f0 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -1491,6 +1491,34 @@ btf_finalize (void)
   tu_ctfc = NULL;
 }
 
+/* Initial entry point of BTF generation, called at early_finish () after
+   CTF information has possibly been output.  Translate all CTF information
+   to BTF, and do any processing that must be done early, such as creating
+   BTF_KIND_FUNC records.  */
+
+void
+btf_early_finish (void)
+{
+  btf_init_postprocess ();
+}
+
+/* Late entry point for BTF generation, called from dwarf2out_finish ().
+   Complete and emit BTF information.  */
+
+void
+btf_finish (const char * filename)
+{
+  btf_output (filename);
+
+  /* If compiling for BPF with CO-RE info, we cannot deallocate until after
+ CO-RE information is created, which happens very late in BPF backend.
+ Therefore, the deallocation (i.e. btf_finalize ()) is delayed until
+ TARGET_ASM_FILE_END for BPF CO-RE.  */
+  if (!btf_with_core_debuginfo_p ())
+btf_finalize ();
+}
+
+
 /* Traversal function for all BTF_KIND_FUNC type records.  */
 
 bool
diff --git a/gcc/ctfc.h b/gcc/ctfc.h
index fa188bf2f5a4..e7bd93901cfa 100644
--- a/gcc/ctfc.h
+++ b/gcc/ctfc.h
@@ -384,8 +384,8 @@ extern void ctf_init (void);
 extern void ctf_output (const char * filename);
 extern void ctf_finalize (void);
 
-extern void btf_output (const char * filename);
-extern void btf_init_postprocess (void);
+extern void btf_early_finish (void);
+extern void btf_finish (const char * filename);
 extern void btf_finalize (void);
 
 extern ctf_container_ref ctf_get_tu_ctfc (void);
diff --git a/gcc/dwarf2ctf.cc b/gcc/dwarf2ctf.cc
index dc59569fe560..8f9e2fada9e3 100644
--- a/gcc/dwarf2ctf.cc
+++ b/gcc/dwarf2ctf.cc
@@ -933,30 +933,6 @@ gen_ctf_type (ctf_container_ref ctfc, dw_die_ref die)
   return type_id;
 }
 
-/* Prepare for output and write out the CTF debug information.  */
-
-static void
-ctf_debug_finalize (const char *filename, bool btf)
-{
-  if (btf)
-{
-  btf_output (filename);
-  /* btf_finalize when compiling BPF applciations gets deallocated by the
-BPF target in bpf_file_end.  */
-  if (btf_debuginfo_p () && !btf_with_core_debuginfo_p ())
-   btf_finalize ();
-}
-
-  else
-{
-  /* Emit the collected CTF information.  */
-  ctf_output (filename);
-
-  /* Reset the CTF state.  */
-  ctf_finalize ();
-}
-}
-
 bool
 ctf_do_die (dw_die_ref die)
 {
@@ -996,27 +972,27 @@ ctf_debug_init (void)
   add_name_attribute (ctf_unknown_die, "unknown");
 }
 
-/* Preprocess the CTF debug information after initialization.  */
-
-void
-ctf_debug_init_postprocess (bool btf)
-{
-  /* Only BTF requires postprocessing right after init.  */
-  if (btf)
-btf_init_postprocess ();
-}
-
 /* Early finish CTF/BTF debug info.  */
 
 void
 ctf_debug_early_finish (const char * filename)
 {
-  /* Emit CTF debug info early always.  */
-  if (ctf_debug_info_level > CTFINFO_LEVEL_NONE
-  /* Emit BTF debug info early if CO-RE relocations are not
-required.  */
-  || (btf_debu

[PATCH v3 0/6] btf: refactor and add pruning option

2024-05-30 Thread David Faust
[v2: https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650482.html
 Changes from v2:
 - Handle -flto when generating BTF (in patch 1).  For LTO builds, BTF
   is emitted at early_finish as before.  For non-LTO builds, BTF will
   be emitted at (late) finish for all targets.
 - Move the option parsing change to allow -gbtf and -gctf together to
   the end of the series.  This patch is relatively separate from the
   other changes, and only depends upon patch 1.
 - Include Indu's various comments and suggestions throughout the
   series.
 - Fix a few GNU style issues with indentation and long lines.
 - Fix the special handling for .maps section variables in BTF in
   patch 4.  Previously the type marking was too aggressive, and undid
   nearly all the pruning in some kernel selftests.  In one test this
   fix reduced the emitted BTF from over 6000 types to about 200 types,
   very similar to clang's output for the same test.  ]

This patch series significantly refactors the BTF generation in GCC,
making it simpler and easier to understand, extend and maintain.

It also introduces an optional algorithm to "prune" BTF information
before emission.  This pruning is meant to be used for BPF programs, to
alleviate the massive bloating of BTF information caused by including
Linux kernel internal headers.  The pruning is designed to be compatible
with the unconditional pruning performed  by the LLVM BPF backend when
generating BTF information.

While the changes are fairly significant, there is no actual change in
emitted BTF information (unless pruning is enabled), other than bug
fixes and small additions to the assembler debug comments.

Patch 1 restructures the emission of CTF and BTF information, with the
result that CTF is always completely generated and emitted before any
BTF-related procedures are run.  BTF emission is moved to late finish
for all targets, except when building with -flto.

Patch 2 changes the data structures shared by CTF and BTF to use
pointers rather than type IDs for all inter-type references.  This
change is completely transparent to both CTF and BTF.

Patch 3 heavily refactors btfout.cc to take advantage of the prior
changes and significantly simplify the BTF implementation.  The changes
are nearly transparent, however some small but important improvements
are also made possible by the refactor, such as fixing PR113566 for
non-LTO builds.

Patch 4 adds a new option to perform pruning of the BTF information
before emission.  This is intended to be used for BPF programs which
often include kernel headers, and in many cases reduces the size of
the resulting BTF information by a factor of 10.

Patch 5 makes BTF pruning work with BPF CO-RE, and enables the pruning
by default in the BPF backend.

Patch 6 takes advantage of the prior changes, and removes the
restriction on generating both CTF and BTF in the same compiler run,
allowing for any combination of -gdwarf, -gctf and -gbtf.

Tested on x86_64-linux-gnu, and on x86_64-linux-gnu host for
bpf-unknown-none target.

Also tested by compiling and running Linux kernel BPF selftests.
No known regressions.

David Faust (6):
  ctf, btf: restructure CTF/BTF emission
  ctf: use pointers instead of IDs internally
  btf: refactor and simplify implementation
  btf: add -fprune-btf option
  bpf,btf: enable BTF pruning by default for BPF
  opts: allow any combination of DWARF, CTF, BTF

 gcc/btfout.cc | 1613 +
 gcc/common.opt|4 +
 gcc/config/bpf/bpf.cc |5 +
 gcc/config/bpf/btfext-out.cc  |   14 +-
 gcc/config/bpf/core-builtins.cc   |   74 +-
 gcc/ctfc.cc   |  141 +-
 gcc/ctfc.h|  113 +-
 gcc/ctfout.cc |   22 +-
 gcc/doc/invoke.texi   |   23 +
 gcc/dwarf2ctf.cc  |  311 ++--
 gcc/dwarf2ctf.h   |2 +-
 gcc/dwarf2out.cc  |4 +-
 gcc/opts.cc   |   20 +-
 gcc/testsuite/gcc.dg/debug/btf/btf-prune-1.c  |   25 +
 gcc/testsuite/gcc.dg/debug/btf/btf-prune-2.c  |   33 +
 gcc/testsuite/gcc.dg/debug/btf/btf-prune-3.c  |   35 +
 .../gcc.dg/debug/btf/btf-prune-maps.c |   20 +
 .../gcc.dg/debug/btf/btf-variables-5.c|6 +-
 include/btf.h |5 +
 19 files changed, 1420 insertions(+), 1050 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-1.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-2.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-3.c
 create mode 100644 gcc/testsuite/gcc.dg/debug/btf/btf-prune-maps.c

-- 
2.43.0



Re: [COMMITTED] ggc: Reduce GGC_QUIRE_SIZE on Solaris/SPARC [PR115031]

2024-05-30 Thread Eric Botcazou
> It turns out that this exhaustion of the 32-bit address space happens
> due to a combination of three issues:
> 
> * the SPARC pagesize of 8 kB,
> 
> * ggc-page.cc's chunk size of 512 * pagesize, i.e. 4 MB, and
> 
> * mmap adding two 8 kB unmapped red-zone pages to each mapping
> 
> which result in the 4 MB mappings to actually consume 4.5 MB of address
> space.
> 
> To avoid this, this patch reduces the chunk size so it remains at 4 MB
> even when combined with the red-zone pages, as recommended by mmap(2).

Nice investigation!  This size is a host parameter rather than a target one 
though, so config/sparc/sol2.h is probably not the most appropriate place to 
override it, but I personally do not mind.

-- 
Eric Botcazou




Re: [PATCH 4/4]AArch64: enable new predicate tuning for Neoverse cores.

2024-05-30 Thread Richard Sandiford
Tamar Christina  writes:
> Hi All,
>
> This enables the new tuning flag for Neoverse V1, Neoverse V2 and Neoverse N2.
> It is kept off for generic codegen.
>
> Note the reason for the +sve even though they are in aarch64-sve.exp is if the
> testsuite is ran with a forced SVE off option, e.g. -march=armv8-a+nosve then
> the intrinsics end up being disabled because the -march is preferred over the
> -mcpu even though the -mcpu comes later.
>
> This prevents the tests from failing in such runs.

IMO we should just skip aarch64-sve.exp if the options explicitly disable
SVE.  But that's separate work.  I'll try it once this patch is in.

> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/tuning_models/neoversen2.h (neoversen2_tunings): Add
>   AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.
>   * config/aarch64/tuning_models/neoversev1.h (neoversev1_tunings): Add
>   AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.
>   * config/aarch64/tuning_models/neoversev2.h (neoversev2_tunings): Add
>   AARCH64_EXTRA_TUNE_AVOID_PRED_RMW.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/sve/pred_clobber_1.c: New test.
>   * gcc.target/aarch64/sve/pred_clobber_2.c: New test.
>   * gcc.target/aarch64/sve/pred_clobber_3.c: New test.
>   * gcc.target/aarch64/sve/pred_clobber_4.c: New test.
>
> ---
> diff --git a/gcc/config/aarch64/tuning_models/neoversen2.h 
> b/gcc/config/aarch64/tuning_models/neoversen2.h
> index 
> 7e799bbe762fe862e31befed50e54040a7fd1f2f..be9a48ac3adc097f967c217fe09dcac194d7d14f
>  100644
> --- a/gcc/config/aarch64/tuning_models/neoversen2.h
> +++ b/gcc/config/aarch64/tuning_models/neoversen2.h
> @@ -236,7 +236,8 @@ static const struct tune_params neoversen2_tunings =
>(AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
> | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> -   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),  /* tune_flags.  */
> +   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
> +   | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW), /* tune_flags.  */
>&generic_prefetch_tune,
>AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
>AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model.  */
> diff --git a/gcc/config/aarch64/tuning_models/neoversev1.h 
> b/gcc/config/aarch64/tuning_models/neoversev1.h
> index 
> 9363f2ad98a5279cc99f2f9b1509ba921d582e84..0fc41ce6a41b3135fa06d2bda1f517fdf4f8dbcf
>  100644
> --- a/gcc/config/aarch64/tuning_models/neoversev1.h
> +++ b/gcc/config/aarch64/tuning_models/neoversev1.h
> @@ -227,7 +227,8 @@ static const struct tune_params neoversev1_tunings =
>(AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
> -   | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags.  */
> +   | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
> +   | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW), /* tune_flags.  */
>&generic_prefetch_tune,
>AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
>AARCH64_LDP_STP_POLICY_ALWAYS/* stp_policy_model.  */
> diff --git a/gcc/config/aarch64/tuning_models/neoversev2.h 
> b/gcc/config/aarch64/tuning_models/neoversev2.h
> index 
> bc01ed767c9b690504eb98456402df5d9d64eee3..f76e4ef358f7dfb9c7d7b470ea7240eaa2120f8e
>  100644
> --- a/gcc/config/aarch64/tuning_models/neoversev2.h
> +++ b/gcc/config/aarch64/tuning_models/neoversev2.h
> @@ -236,7 +236,8 @@ static const struct tune_params neoversev2_tunings =
>(AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND
> | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
> | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
> -   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),  /* tune_flags.  */
> +   | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
> +   | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW), /* tune_flags.  */
>&generic_prefetch_tune,
>AARCH64_LDP_STP_POLICY_ALWAYS,   /* ldp_policy_model.  */
>AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model.  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_1.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_1.c
> new file mode 100644
> index 
> ..934a00a38531c5fd4139d99ff33414904b2c104f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pred_clobber_1.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mcpu=neoverse-n2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#pragma GCC target "+sve"
> +
> +#include 
> +
> +extern void use(svbool_t);
> +
> +/*
> +** foo:
> +**   ...
> +**   ptrue   p([1-9][0-9]?).b, all

Might be better to make this p([1-3]), so that we disallow any registers
that would cause a spill.

OK with that change, thanks.

Richard

> +**   cmplo   p0.h, p\1/z, z0.h, z[0-9]+.h
> +**   ...
> +*/
> +void foo (svuint16_t a, uint16_t b)
> +{
> +svbool_t p0 = svcmplt_n_u16 (svptru

Re: [PATCH 3/4]AArch64: add new alternative with early clobber to patterns

2024-05-30 Thread Richard Sandiford
Tamar Christina  writes:
> [...]
> @@ -6651,8 +6661,10 @@ (define_insn "and3"
>   (and:PRED_ALL (match_operand:PRED_ALL 1 "register_operand")
> (match_operand:PRED_ALL 2 "register_operand")))]
>"TARGET_SVE"
> -  {@ [ cons: =0, 1  , 2   ]
> - [ Upa , Upa, Upa ] and\t%0.b, %1/z, %2.b, %2.b
> +  {@ [ cons: =0, 1  , 2  ; attrs: pred_clobber ]
> + [ &Upa, Upa, Upa; yes ] and\t%0.b, %1/z, %2.b, %2.b
> + [ ?Upa, 0  , Upa; yes ] ^
> + [ Upa , Upa, Upa; no  ] ^

I think this ought to be:

> +  {@ [ cons: =0, 1  ,  2   ; attrs: pred_clobber ]
> + [ &Upa, Upa,  Upa ; yes ] and\t%0.b, %1/z, %2.b, 
> %2.b
> + [ ?Upa, 0Upa, 0Upa; yes ] ^
> + [ Upa , Upa,  Upa ; no  ] ^

so that operand 2 can be tied to operand 0 in the worst case.  Similarly:

>}
>  )
>  
> @@ -6679,8 +6691,10 @@ (define_insn "@aarch64_pred__z"
>   (match_operand:PRED_ALL 3 "register_operand"))
> (match_operand:PRED_ALL 1 "register_operand")))]
>"TARGET_SVE"
> -  {@ [ cons: =0, 1  , 2  , 3   ]
> - [ Upa , Upa, Upa, Upa ] \t%0.b, %1/z, %2.b, %3.b
> +  {@ [ cons: =0, 1  , 2  , 3  ; attrs: pred_clobber ]
> + [ &Upa, Upa, Upa, Upa; yes ] \t%0.b, %1/z, 
> %2.b, %3.b
> + [ ?Upa, 0  , Upa, Upa; yes ] ^
> + [ Upa , Upa, Upa, Upa; no  ] ^
>}
>  )

this would be:

  {@ [ cons: =0, 1   , 2   , 3   ; attrs: pred_clobber ]
 [ &Upa, Upa , Upa , Upa ; yes ] \t%0.b, %1/z, 
%2.b, %3.b
 [ ?Upa, 0Upa, 0Upa, 0Upa; yes ] ^
 [ Upa , Upa , Upa,  Upa ; no  ] ^
  }

Same idea for the rest.

I tried this on:

--
#include 

void use (svbool_t, svbool_t, svbool_t);

void
f1 (svbool_t p0, svbool_t p1, svbool_t p2, int n, svbool_t *ptr)
{
  while (n--)
p2 = svand_z (p0, p1, p2);
  *ptr = p2;
}

void
f2 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t *ptr)
{
  *ptr = svand_z (p0, p1, p2);
}

void
f3 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t *ptr)
{
  use (svand_z (p0, p1, p2), p1, p2);
}

void
f4 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t *ptr)
{
  use (p0, svand_z (p0, p1, p2), p2);
}

void
f5 (svbool_t p0, svbool_t p1, svbool_t p2, svbool_t *ptr)
{
  use (p0, p1, svand_z (p0, p1, p2));
}
--

and it seemed to produce the right output:

--
f1:
cbz w0, .L2
sub w0, w0, #1
.p2align 5,,15
.L3:
and p2.b, p0/z, p1.b, p2.b
sub w0, w0, #1
cmn w0, #1
bne .L3
.L2:
str p2, [x1]
ret

f2:
and p3.b, p0/z, p1.b, p2.b
str p3, [x0]
ret

f3:
and p0.b, p0/z, p1.b, p2.b
b   use

f4:
and p1.b, p0/z, p1.b, p2.b
b   use

f5:
and p2.b, p0/z, p1.b, p2.b
b   use
--

(with that coming directly from RA, rather than being cleaned
up later)

> [...]
> @@ -10046,8 +10104,10 @@ (define_insn_and_rewrite "*aarch64_brkn_cc"
>  (match_dup 3)]
> UNSPEC_BRKN))]
>"TARGET_SVE"
> -  {@ [ cons: =0, 1  , 2  , 3 ]
> - [ Upa , Upa, Upa, 0 ] brkns\t%0.b, %1/z, %2.b, %0.b
> +  {@ [ cons: =0, 1  , 2  , 3; attrs: pred_clobber ]
> + [ &Upa, Upa, Upa, 0; yes ] brkns\t%0.b, %1/z, %2.b, 
> %0.b
> + [ ?Upa, 0  , Upa, 0; yes ] ^
> + [ Upa , Upa, Upa, 0; no  ] ^
>}
>"&& (operands[4] != CONST0_RTX (VNx16BImode)
> || operands[5] != CONST0_RTX (VNx16BImode))"

Probably best to leave this out.  All alternatives require operand 3
to match operand 0.  So operands 1 and 2 will only match operand 0
if they're the same as operand 3.  In that case it'd be better to
allow the sharing rather than force the same value to be stored
in two registers.

That is, if op1 != op3 && op2 != op3 then we get what we want
naturally, regardless of tuning.

The same thing would apply to the BRKN instances of :

> @@ -10020,8 +10076,10 @@ (define_insn "@aarch64_brk"
>  (match_operand:VNx16BI 3 "register_operand")]
> SVE_BRK_BINARY))]
>"TARGET_SVE"
> -  {@ [ cons: =0, 1  , 2  , 3 ]
> - [ Upa , Upa, Upa,  ] brk\t%0.b, %1/z, %2.b, 
> %.b
> +  {@ [ cons: =0,  1 , 2  , 3; attrs: pred_clobber ]
> + [ &Upa, Upa, Upa, ; yes ] 
> brk\t%0.b, %1/z, %2.b, %.b
> + [ ?Upa, 0  , Upa, ; yes ] ^
> + [ Upa , Upa, Upa, ; no  ] ^
>}
>  )

but I think we should keep this factoring/abstraction and just add
the 

Re: [PATCH v10 2/5] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.

2024-05-30 Thread Qing Zhao



> On May 30, 2024, at 15:43, Joseph Myers  wrote:
> 
> On Thu, 30 May 2024, Qing Zhao wrote:
> 
>>  In order to make this working, the routine digest_init in c-typeck.cc
>>  is updated to fold calls to .ACCESS_WITH_SIZE to its first argument
>>  when require_constant is TRUE.
> 
> The new changes here are OK.
Thanks.

Qing
> 
> -- 
> Joseph S. Myers
> josmy...@redhat.com
> 



[Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-05-30 Thread Ajit Agarwal
Hello All:

Common infrastructure using generic code for pair mem fusion across different
targets.

rs6000 target-specific code implements the virtual functions defined
by the generic code.

Code is implemented with pure virtual functions to interface with target
code.

Target-specific code is added in rs6000-mem-fusion.cc, and the additional
virtual function implementations required for rs6000 are added in
aarch64-ldp-fusion.cc.

Bootstrapped and regtested for aarch64-linux-gnu and powerpc64-linux-gnu.

Thanks & Regards
Ajit


aarch64, rs6000, middle-end: Add implementation for different targets for pair 
mem fusion

Common infrastructure using generic code for pair mem fusion across different
targets.

rs6000 target-specific code implements the virtual functions defined
by the generic code.

Code is implemented with pure virtual functions to interface with target
code.

Target-specific code is added in rs6000-mem-fusion.cc, and the additional
virtual function implementations required for rs6000 are added in
aarch64-ldp-fusion.cc.

2024-05-31  Ajit Kumar Agarwal  

gcc/ChangeLog:

* config/aarch64/aarch64-ldp-fusion.cc: Add target specific
implementation of additional virtual functions added in pair_fusion
struct.
* config/rs6000/rs6000-passes.def: New mem fusion pass
before pass_early_remat.
* config/rs6000/rs6000-mem-fusion.cc: Add new pass.
Add target specific implementation using pure virtual
functions.
* config.gcc: Add new object file.
* config/rs6000/rs6000-protos.h: Add new prototype for mem
fusion pass.
* config/rs6000/t-rs6000: Add new rule.
* rtl-ssa/accesses.h: Move set_is_live_out_use from private to public.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/mem-fusion.C: New test.
* g++.target/powerpc/mem-fusion-1.C: New test.
* gcc.target/powerpc/mma-builtin-1.c: Modify test.
---
 gcc/config.gcc|   2 +
 gcc/config/aarch64/aarch64-ldp-fusion.cc  |  23 +
 gcc/config/rs6000/rs6000-mem-fusion.cc| 629 ++
 gcc/config/rs6000/rs6000-passes.def   |   4 +-
 gcc/config/rs6000/rs6000-protos.h |   1 +
 gcc/config/rs6000/t-rs6000|   5 +
 gcc/pair-fusion.cc|  18 +-
 gcc/pair-fusion.h |  20 +
 gcc/rtl-ssa/accesses.h|   2 +-
 .../g++.target/powerpc/mem-fusion-1.C |  22 +
 gcc/testsuite/g++.target/powerpc/mem-fusion.C |  15 +
 .../gcc.target/powerpc/mma-builtin-1.c|   4 +-
 12 files changed, 740 insertions(+), 5 deletions(-)
 create mode 100644 gcc/config/rs6000/rs6000-mem-fusion.cc
 create mode 100644 gcc/testsuite/g++.target/powerpc/mem-fusion-1.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/mem-fusion.C

diff --git a/gcc/config.gcc b/gcc/config.gcc
index a37113bd00a..1beabc35d52 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -524,6 +524,7 @@ powerpc*-*-*)
extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
+   extra_objs="${extra_objs} rs6000-mem-fusion.o"
extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
@@ -560,6 +561,7 @@ rs6000*-*-*)
extra_options="${extra_options} g.opt fused-madd.opt 
rs6000/rs6000-tables.opt"
extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
+   extra_objs="${extra_objs} rs6000-mem-fusion.o"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/rs6000/rs6000-logue.cc 
\$(srcdir)/config/rs6000/rs6000-call.cc"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
;;
diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 0af927231d3..784cdc3937c 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -104,6 +104,29 @@ struct aarch64_pair_fusion : public pair_fusion
  bool load_p) override final;
 
   rtx destructure_pair (rtx regs[2], rtx pattern, bool load_p) override final;
+
+  bool should_handle_unordered_insns (rtl_ssa::insn_info *,
+ rtl_ssa::insn_info *) override final
+  {
+return true;
+  }
+
+  bool fuseable_store_p (rtl_ssa::insn_info *,
+rtl_ssa::insn_info *) override final
+  {
+return true;
+  }
+
+  bool fuseable_load_p (rtl_ssa::insn_info *) override final
+  {
+return true;
+  }
+
+  void set_multiword_subreg (rtl_ssa::insn_info *, rtl_ssa::insn_info *,
+bool) overrid

Re: [PATCH v10 2/5] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.

2024-05-30 Thread Joseph Myers
On Thu, 30 May 2024, Qing Zhao wrote:

>   In order to make this working, the routine digest_init in c-typeck.cc
>   is updated to fold calls to .ACCESS_WITH_SIZE to its first argument
>   when require_constant is TRUE.

The new changes here are OK.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH v3 #2/2] [rs6000] adjust return_pc debug attrs

2024-05-30 Thread Segher Boessenkool
Hi Alex,

On Thu, May 30, 2024 at 01:40:27PM -0300, Alexandre Oliva wrote:
> Sorry, I misnumbered this patch as #1/2 when first posting v3.

I see at least five completely different patches in this email thread,
with different subjects and all.

> On May 28, 2024, Segher Boessenkool  wrote:
> 
> > Please don't (incorrectly!) line-wrap changelogs.  Lines are 80
> > characters wide, not 60 or 72 or whatever.  80.  Indents are tabs that
> > take 8 columns.
> 
> ACK.  When was it bumped up to 80, BTW?  It wasn't always like that, was
> it?

It always was like that.  Some people say 79, that is fine too.

It mostly irks me because lines that end in : and then a lot of empty
space look like people used one of those awful "write the changelog for
me" helper things, that *at best* *slow you down*, and always (always!)
cause worse results.

> I've noticed that something seems to change my line width settings
> in Emacs, but I must have missed that memo.

Line length in source code is 79 or 80.  In email it is 72 or so.  This
has not changed since the dawn of time :-)

> >> +/* Return the offset to be added to the label output after CALL_INSN
> >> +   to compute the address to be placed in DW_AT_call_return_pc.  */
> >> +
> >> +static int
> >> +rs6000_call_offset_return_label (rtx_insn *call_insn)
> >> +{
> >> +  /* All rs6000 CALL_INSN output patterns start with a b or bl, always
> 
> > This isn't true.  There is a crlogical insn before the bcl for sysv for
> > example.
> 
> Hmm, if that's so, this function will have to look at the insn and
> recognize those cases and use a different way to compute the offset.
> 
> However, I don't see any relevant uses of bcl in output patterns for
> insns containing a call rtx.

bl, bcl, what's the difference (bit 4 is, heh, 16 vs. 18).  Read bl if
you want -- the point is that there are crlogical insns before the
branch-and-link.

> >> +we compute the offset
> >> + back to the address of the call opcode proper, then add the
> >> + constant 4 bytes, to get the address after that opcode.  */
> >> +  return 4 - get_attr_length (call_insn);
> 
> > Please explain this magic, too -- in code preferably (so with a ? :
> > maybe, but don't try to "optimise" that expression, let the compiler
> > do that, it is much better at it anyway :-) )
> 
> There's neither optimization nor magic, it's literally what's written in
> the comment quoted above: starting from the label at the end of the call
> insn (that's what the caller offsets from, as in the documentation in
> the actual #1/2), subtract the length (to get to the address of the call
> opcode), and add 4 (to get past the call opcode).  Ok, I've reordered
> the two addends for an IMHO more readable expression, but that was all.

4 - length does not make any sense /an sich/, it *is* magic.

It is not clear it is correct at all, either.

> > Is that correct for all ABIs we support?  

(Context missing!  What did I ask?)

> Back when I wrote this patchset, I went through all call opcodes I could
> find in the md and .c files within rs6000/, and I was satisfied that it
> covered what we had then, but I won't pretend to know all about all of
> the ppc ABIs.  I may have missed disguised call insns, too.  I was
> hoping some ppc maintainer, more familiar with the port than I am, would
> catch any trouble on review and let me know about pitfalls and surprises
> to watch out for.

Yeah, things don't work that way.  If you need help, *ask* for that.
Don't pretend a patch is good if you have doubts yourself!

> > Even if so, it needs a lot more documentation than this.
> 
> I can write more documentation, but I'm at a loss as to what you're
> hoping for.  If you set clearer expectations, I'll be glad to oblige.

I want a patch submission that is a) understandable and b) a good thing
to have.  If a patch leaves me wondering what is going on at all, that
is not ideal ;-)


Segher


Re: [PATCH v3 #1/2] [rs6000] adjust return_pc debug attrs

2024-05-30 Thread Segher Boessenkool
On Wed, May 29, 2024 at 03:52:15AM -0300, Alexandre Oliva wrote:
> On May 27, 2024, "Kewen.Lin"  wrote:
> 
> > I wonder if it's possible to have a test case for this?
> 
> gcc.dg/guality/pr54519-[34].c at -O[1g] are fixed by this patch on
> ppc64le-linux-gnu.  Are these the sort of test case you're interested
> in, or are you looking for something that tests the offsets in debug
> info, rather than the end-to-end debugging feature?

A testcase specifically for this would be good, yes.  Where you can say
at the top of the file "This tests that ... [X and Y]" :-)


Segher


Re: [RFC/PATCH] Replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE with new hook

2024-05-30 Thread Joseph Myers
On Wed, 29 May 2024, Kewen.Lin wrote:

> > Note that when removing a target macro, it's a good idea to add it to the 
> > "Old target macros that have moved to the target hooks structure." list 
> > (of #pragma GCC poison) in system.h to ensure any new target that was 
> > originally written before the change doesn't accidentally get into GCC 
> > while still using the old macros.
> > 
> 
> Thanks for the comments on target macro removal!  I found it means
> that we can't use such macros any more even if they have become port
> specific.  For some targets such as pa, they redefine these macros in

Yes, that's intentional.  If you need subtarget headers to define 
something for use in the new target hook for those targets, make them 
define e.g. SUBTARGET_LONG_DOUBLE_MODE or similar instead, rather than 
using the old poisoned macro names.
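
A minimal sketch of that arrangement (names purely illustrative, not taken
from an actual port):

/* In the subtarget header:  */
#define SUBTARGET_LONG_DOUBLE_MODE TFmode

/* ...and the port's implementation of the replacement hook returns
   SUBTARGET_LONG_DOUBLE_MODE for long double, instead of relying on the
   old (now poisoned) LONG_DOUBLE_TYPE_SIZE macro.  */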

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH v3] libstdc++: Fix std::ranges::iota not in numeric [PR108760]

2024-05-30 Thread Michael Levine (BLOOMBERG/ 731 LEX)
When I remove  for importing __memcmp (my apologies for 
writing __memcpy) from libstdc++-v3/include/bits/ranges_algobase.h and try to 
rerun the code, I get the following error:

In file included from 
$HOME/projects/objdirforgcc/_pfx/include/c++/15.0.0/numeric:69,
 from ranges-iota-fix.cpp:1:
$HOME/projects/objdirforgcc/_pfx/include/c++/15.0.0/bits/ranges_algobase.h: In 
member function ‘constexpr bool std::ranges::__equal_fn::operator()(_Iter1, 
_Sent1, _Iter2, _Sent2, _Pred, _Proj1, _Proj2) const’:
$HOME/projects/objdirforgcc/_pfx/include/c++/15.0.0/bits/ranges_algobase.h:143:32:
 error: ‘__memcmp’ is not a member of ‘std’; did you mean ‘__memcmpable’?
  143 |   return !std::__memcmp(__first1, __first2, __len);
  |^~~~
  |__memcmpable

From: jwak...@redhat.com At: 05/24/24 10:12:57 UTC-4:00To:  Michael Levine 
(BLOOMBERG/ 731 LEX ) 
Cc:  ppa...@redhat.com,  gcc-patches@gcc.gnu.org,  libstd...@gcc.gnu.org
Subject: Re: [PATCH v3] libstdc++: Fix std::ranges::iota not in numeric 
[PR108760]

On 24/05/24 13:56 -, Michael Levine (BLOOMBERG/ 731 LEX) wrote:
>I've attached the v3 version of the patch as a single, squashed patch 
containing all of the changes.  I manually prepended my sign off to the patch.


>Signed-off-by: Michael Levine 
>---
>diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
>index 62faff173bd..d258be0b93f 100644
>--- a/libstdc++-v3/include/bits/ranges_algo.h
>+++ b/libstdc++-v3/include/bits/ranges_algo.h
>@@ -3521,58 +3521,6 @@ namespace ranges
> 
> #endif // __glibcxx_ranges_contains
> 
>-#if __glibcxx_ranges_iota >= 202202L // C++ >= 23
>-
>-  template
>-struct out_value_result
>-{
>-  [[no_unique_address]] _Out out;
>-  [[no_unique_address]] _Tp value;
>-
>-  template
>-requires convertible_to
>-&& convertible_to
>-constexpr
>-  operator out_value_result<_Out2, _Tp2>() const &
>- { return {out, value}; }
>-
>-  template
>-  requires convertible_to<_Out, _Out2>
>-   && convertible_to<_Tp, _Tp2>
>-   constexpr
>-  operator out_value_result<_Out2, _Tp2>() &&
>-  { return {std::move(out), std::move(value)}; }
>-};
>-
>-  template
>-using iota_result = out_value_result<_Out, _Tp>;
>-
>-  struct __iota_fn
>-  {
>-template _Sent, 
weakly_incrementable _Tp>
>-  requires indirectly_writable<_Out, const _Tp&>
>-  constexpr iota_result<_Out, _Tp>
>-  operator()(_Out __first, _Sent __last, _Tp __value) const
>-  {
>-while (__first != __last)
>-{
>-*__first = static_cast(__value);
>- ++__first;
>- ++__value;
>-   }
>-return {std::move(__first), std::move(__value)};
>-  }
>-
>-template _Range>
>-  constexpr iota_result, _Tp>
>-  operator()(_Range&& __r, _Tp __value) const
>-  { return (*this)(ranges::begin(__r), ranges::end(__r), 
std::move(__value)); }
>-  };
>-
>-  inline constexpr __iota_fn iota{};
>-
>-#endif // __glibcxx_ranges_iota
>-
> #if __glibcxx_ranges_find_last >= 202207L // C++ >= 23
> 
>   struct __find_last_fn
>diff --git a/libstdc++-v3/include/bits/ranges_algobase.h 
b/libstdc++-v3/include/bits/ranges_algobase.h
>index e26a73a27d6..965b36aed35 100644
>--- a/libstdc++-v3/include/bits/ranges_algobase.h
>+++ b/libstdc++-v3/include/bits/ranges_algobase.h
>@@ -35,6 +35,7 @@
> #include 
> #include 
> #include 
>+#include  // __memcpy

Why is this being added here? What is __memcpy?

I don't think out_value_result requires any new headers to be included
here, does it?

> #include  // ranges::begin, ranges::range etc.
> #include   // __invoke
> #include  // __is_byte
>@@ -70,6 +71,32 @@ namespace ranges
>__is_move_iterator> = true;
>   } // namespace __detail
> 
>+#if __glibcxx_ranges_iota >= 202202L // C++ >= 23
>+
>+template
>+struct out_value_result
>+{
>+[[no_unique_address]] _Out out;
>+[[no_unique_address]] _Tp value;
>+
>+template
>+requires convertible_to
>+&& convertible_to
>+constexpr
>+  operator out_value_result<_Out2, _Tp2>() const &
>+ { return {out, value}; }
>+
>+template
>+requires convertible_to<_Out, _Out2>
>+&& convertible_to<_Tp, _Tp2>
>+  constexpr
>+  operator out_value_result<_Out2, _Tp2>() &&
>+  { return {std::move(out), std::move(value)}; }
>+};
>+
>+#endif // __glibcxx_ranges_iota
>+
>+
>   struct __equal_fn
>   {
> template _Sent1,
>diff --git a/libstdc++-v3/include/std/numeric 
b/libstdc++-v3/include/std/numeric
>index c912db4a519..d88f7f02137 100644
>--- a/libstdc++-v3/include/std/numeric
>+++ b/libstdc++-v3/include/std/numeric
>@@ -65,6 +65,10 @@
> # include 
> #endif
> 
>+#if __glibcxx_ranges_iota >= 202202L // C++ >= 23
>+#include  // for out_value_result as used by 
std::ranges::iota.  It transitively also brings in , from 
which _Range is used by 

Re: [PATCH] Fix LTO type mismatch warning on transparent union

2024-05-30 Thread Richard Biener



> On 30.05.2024 at 13:46, Eric Botcazou  wrote:
> 
> 
>> 
>> Do function pointers inter-operate TBAA wise for this case and would this
> >> possibly be an issue?
> 
> Do you mean in LTO mode?  I must say I'm not sure of the way LTO performs TBAA
> for function pointers: does it require (strict) matching of the type for all
> the parameters of the pointed-to function types?  

Yes, I think so, in terms of how we compute canonical types.  These pointers to 
derived types prove difficult for C23 as well (even without considering cross-
language interoperability and LTO).

> If so, then I guess it could
> theoretically assign different alias sets to compatible function pointers when
> one of them happens to point to the function type of a function imported with
> the transparent union gap, with some problematic fallout when objects of these
> function pointers happen to be indirectly modified in the program...
> 
> Note that there is an equivalent bypass based on common_or_extern a few lines
> below in the function (although I'm not sure if it's problematic TBAA-wise).

I’d have to check.  I think the diagnostic at hand tries to diagnose possible 
call ABI issues, so not sure why it mentions strict aliasing.  Or maybe I 
misremember.

A patch like yours would be OK if this is really just about the ABI issue.
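
For concreteness, a minimal sketch of the kind of cross-TU mismatch being
discussed (not taken from Eric's patch; the names int_ptr_arg and peek are
invented for illustration, and the attribute is GCC's transparent_union):

  /* TU 1: the declaration callers see uses a transparent union.  */
  typedef union
  {
    int *ptr;
    const int *cptr;
  } int_ptr_arg __attribute__ ((__transparent_union__));

  extern int peek (int_ptr_arg arg);

  /* TU 2: the definition uses the first member's type directly.  With -flto
     this is the sort of declaration/definition pairing on which the warning
     at issue can fire, even though the call ABI is the same by construction.  */
  int
  peek (int *p)
  {
    return *p;
  }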

Richard 

> --
> Eric Botcazou
> 
> 


Re: [C PATCH, v2]: allow aliasing of compatible types derived from enumeral types [PR115157]

2024-05-30 Thread Ian Lance Taylor
On Thu, May 30, 2024 at 12:48 AM Martin Uecker  wrote:
>
>
> Hi Ian,
>
> can you give me a green light for the Go changes?  The C FE
> changes were approved.
>
> The only change with respect to the last version are
> the removal of the unneeded null check for the
> main variant (as discussed) and that I also removed the
>
> container->decls_seen.add (TREE_TYPE (decl));
>
> and the corresponding check, because I think it is
> redundant if we also test for the main variant.
> (it wasn't with TYPE_CANONICAL because this was
> only added conditionally).
>
> The other code in the file only checks for added
> declarations not types, so should not depend on this.

Apologies.  I thought that I had already said that the Go changes are
fine if the libgo tests still pass.  Anyhow, that is the case: if the
tests pass, the change is fine.  Thanks.

Ian


Re: [PATCH v3 #2/2] [rs6000] adjust return_pc debug attrs

2024-05-30 Thread Alexandre Oliva
Sorry, I misnumbered this patch as #1/2 when first posting v3.

On May 28, 2024, Segher Boessenkool  wrote:

> Please don't (incorrectly!) line-wrap changelogs.  Lines are 80
> characters wide, not 60 or 72 or whatever.  80.  Indents are tabs that
> take 8 columns.

ACK.  When was it bumped up to 80, BTW?  It wasn't always like that, was
it?  I've noticed that something seems to change my line width settings
in Emacs, but I must have missed that memo.

>> +/* Return the offset to be added to the label output after CALL_INSN
>> +   to compute the address to be placed in DW_AT_call_return_pc.  */
>> +
>> +static int
>> +rs6000_call_offset_return_label (rtx_insn *call_insn)
>> +{
>> +  /* All rs6000 CALL_INSN output patterns start with a b or bl, always

> This isn't true.  There is a crlogical insn before the bcl for sysv for
> example.

Hmm, if that's so, this function will have to look at the insn and
recognize those cases and use a different way to compute the offset.

However, I don't see any relevant uses of bcl in output patterns for
insns containing a call rtx.  There are other uses in profiling and pic
loading and whatnot, but those don't get mentioned in the call graph in
debug info, and so they don't get this target hook called on them.

>> +we compute the offset
>> + back to the address of the call opcode proper, then add the
>> + constant 4 bytes, to get the address after that opcode.  */
>> +  return 4 - get_attr_length (call_insn);

> Please explain this magic, too -- in code preferably (so with a ? :
> maybe, but don't try to "optimise" that expression, let the compiler
> do that, it is much better at it anyway :-) )

There's neither optimization nor magic, it's literally what's written in
the comment quoted above: starting from the label at the end of the call
insn (that's what the caller offsets from, as in the documentation in
the actual #1/2), subtract the length (to get to the address of the call
opcode), and add 4 (to get past the call opcode).  Ok, I've reordered
the two addends for an IMHO more readable expression, but that was all.
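
For concreteness, a worked example (the 8-byte call sequence is hypothetical,
just to illustrate the arithmetic):

  length of the CALL_INSN output (e.g. a bl plus one more 4-byte insn): 8
  offset returned by the hook: 4 - 8 = -4
  DW_AT_call_return_pc: (label after the insn) - 4, i.e. the byte just past
  the bl

and for a plain 4-byte bl the hook returns 0, so the label itself is used.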

> Is that correct for all ABIs we support?  

Back when I wrote this patchset, I went through all call opcodes I could
find in the md and .c files within rs6000/, and I was satisfied that it
covered what we had then, but I won't pretend to know all about all of
the ppc ABIs.  I may have missed disguised call insns, too.  I was
hoping some ppc maintainer, more familiar with the port than I am, would
catch any trouble on review and let me know about pitfalls and surprises
to watch out for.

> Even if so, it needs a lot more documentation than this.

I can write more documentation, but I'm at a loss as to what you're
hoping for.  If you set clearer expectations, I'll be glad to oblige.

Thanks,

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


Re: [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber

2024-05-30 Thread Richard Sandiford
Tamar Christina  writes:
>> -Original Message-
>> From: Tamar Christina 
>> Sent: Wednesday, May 22, 2024 10:29 AM
>> To: Richard Sandiford 
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; ktkac...@gcc.gnu.org
>> Subject: RE: [PATCH 2/4]AArch64: add new tuning param and attribute for
>> enabling conditional early clobber
>>
>> >
>> > Sorry for the bike-shedding, but how about something like "avoid_pred_rmw"?
>> > (I'm open to other suggestions.)  Just looking for something that describes
>> > either the architecture or the end result that we want to achieve.
>> > And preferable something fairly short :)
>> >
>> > avoid_* would be consistent with the existing "avoid_cross_loop_fma".
>> >
>> > > +
>> > >  #undef AARCH64_EXTRA_TUNING_OPTION
>> > > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
>> > > index
>> >
>> bbf11faaf4b4340956094a983f8b0dc2649b2d27..76a18dd511f40ebb58ed12d5
>> > 6b46c74084ba7c3c 100644
>> > > --- a/gcc/config/aarch64/aarch64.h
>> > > +++ b/gcc/config/aarch64/aarch64.h
>> > > @@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE =
>> > AARCH64_FL_SM_OFF;
>> > >  enabled through +gcs.  */
>> > >  #define TARGET_GCS (AARCH64_ISA_GCS)
>> > >
>> > > +/*  Prefer different predicate registers for the output of a predicated 
>> > > operation
>> > over
>> > > +re-using an existing input predicate.  */
>> > > +#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \
>> > > +  && (aarch64_tune_params.extra_tuning_flags \
>> > > +  &
>> > AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST))
>> > >
>> > >  /* Standard register usage.  */
>> > >
>> > > diff --git a/gcc/config/aarch64/aarch64.md 
>> > > b/gcc/config/aarch64/aarch64.md
>> > > index
>> >
>> dbde066f7478bec51a8703b017ea553aa98be309..1ecd1a2812969504bd5114a
>> > 53473b478c5ddba82 100644
>> > > --- a/gcc/config/aarch64/aarch64.md
>> > > +++ b/gcc/config/aarch64/aarch64.md
>> > > @@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string
>> > "any"))
>> > >  ;; target-independent code.
>> > >  (define_attr "is_call" "no,yes" (const_string "no"))
>> > >
>> > > +;; Indicates whether we want to enable the pattern with an optional 
>> > > early
>> > > +;; clobber for SVE predicates.
>> > > +(define_attr "pred_clobber" "no,yes" (const_string "no"))
>> > > +
>> > >  ;; [For compatibility with Arm in pipeline models]
>> > >  ;; Attribute that specifies whether or not the instruction touches fp
>> > >  ;; registers.
>> > > @@ -461,7 +465,8 @@ (define_attr "fp" "no,yes"
>> > >  (define_attr "arch_enabled" "no,yes"
>> > >(if_then_else
>> > >  (ior
>> > > - (eq_attr "arch" "any")
>> > > + (and (eq_attr "arch" "any")
>> > > +  (eq_attr "pred_clobber" "no"))
>> > >
>> > >   (and (eq_attr "arch" "rcpc8_4")
>> > >(match_test "AARCH64_ISA_RCPC8_4"))
>> > > @@ -488,7 +493,10 @@ (define_attr "arch_enabled" "no,yes"
>> > >(match_test "TARGET_SVE"))
>> > >
>> > >   (and (eq_attr "arch" "sme")
>> > > -  (match_test "TARGET_SME")))
>> > > +  (match_test "TARGET_SME"))
>> > > +
>> > > + (and (eq_attr "pred_clobber" "yes")
>> > > +  (match_test "TARGET_SVE_PRED_CLOBBER")))
>> >
>> > IMO it'd be bettero handle pred_clobber separately from arch, as a new
>> > top-level AND:
>> >
>> >   (and
>> > (ior
>> >   (eq_attr "pred_clobber" "no")
>> >   (match_test "!TARGET_..."))
>> > (ior
>> >   ...existing arch tests...))
>> >
>>
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-tuning-flags.def
> (AVOID_PRED_RMW): New.
> * config/aarch64/aarch64.h (TARGET_SVE_PRED_CLOBBER): New.
> * config/aarch64/aarch64.md (pred_clobber): New.
> (arch_enabled): Use it.
>
> -- inline copy of patch --
>
> diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
> b/gcc/config/aarch64/aarch64-tuning-flags.def
> index 
> d5bcaebce770f0b217aac783063d39135f754c77..a9f48f5d3d4ea32fbf53086ba21eab4bc65b6dcb
>  100644
> --- a/gcc/config/aarch64/aarch64-tuning-flags.def
> +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
> @@ -48,4 +48,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", 
> AVOID_CROSS_LOOP_FMA)
>
>  AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", FULLY_PIPELINED_FMA)
>
> +/* Enable if the target prefers to use a fresh register for predicate outputs
> +   rather than re-use an input predicate register.  */
> +AARCH64_EXTRA_TUNING_OPTION ("avoid_pred_rmw", AVOID_PRED_RMW)
> +
>  #undef AARCH64_EXTRA_TUNING_OPTION
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 
> bbf11faaf4b4340956094a983f8b0dc2649b2d27..e7669e65d7dae5df2ba42c265079b1856a5c382b
>  100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = 
> AARCH64_FL

[PATCH 6/6] vect: Optimize order of lane-reducing statements in loop def-use cycles [PR114440]

2024-05-30 Thread Feng Xue OS
When transforming multiple lane-reducing operations in a loop reduction chain,
the corresponding vectorized statements are currently generated into def-use
cycles starting from 0.  The def-use cycle with the smaller index therefore
contains more statements, which means more instruction dependencies. For example:

   int sum = 0;
   for (i)
 {
   sum += d0[i] * d1[i];  // dot-prod 
   sum += w[i];   // widen-sum 
   sum += abs(s0[i] - s1[i]); // sad 
 }

Original transformation result:

   for (i / 16)
 {
   sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
   sum_v1 = sum_v1;  // copy
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy

   sum_v0 = WIDEN_SUM (w_v0[i: 0 ~ 15], sum_v0);
   sum_v1 = sum_v1;  // copy
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy

   sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
   sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy
 }

For higher instruction parallelism in the final vectorized loop, it is better
to distribute the effective vectorized lane-reducing statements evenly among
all def-use cycles. With the transformation below, DOT_PROD, WIDEN_SUM and
the SADs are generated into separate cycles, so the instruction dependencies
can be eliminated.

   for (i / 16)
 {
   sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
   sum_v1 = sum_v1;  // copy
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy

   sum_v0 = sum_v0;  // copy
   sum_v1 = WIDEN_SUM (w_v1[i: 0 ~ 15], sum_v1);
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy

   sum_v0 = sum_v0;  // copy
   sum_v1 = sum_v1;  // copy
   sum_v2 = SAD (s0_v2[i: 0 ~ 7 ], s1_v2[i: 0 ~ 7 ], sum_v2);
   sum_v3 = SAD (s0_v3[i: 8 ~ 15], s1_v3[i: 8 ~ 15], sum_v3);
 }

Thanks,
Feng
---
gcc/
PR tree-optimization/114440
* tree-vectorizer.h (struct _stmt_vec_info): Add a new field
reduc_result_pos.
* tree-vect-loop.cc (vect_transform_reduction): Generate lane-reducing
statements in an optimized order.
---
 gcc/tree-vect-loop.cc | 51 ++-
 gcc/tree-vectorizer.h |  6 +
 2 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b5849dbb08a..4807f529506 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8703,7 +8703,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 }
 
   bool single_defuse_cycle = STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info);
-  gcc_assert (single_defuse_cycle || lane_reducing_op_p (code));
+  bool lane_reducing = lane_reducing_op_p (code);
+  gcc_assert (single_defuse_cycle || lane_reducing);
 
   /* Create the destination vector  */
   tree scalar_dest = gimple_get_lhs (stmt_info->stmt);
@@ -8720,6 +8721,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 }
   else
 {
+  int result_pos = 0;
+
   /* The input vectype of the reduction PHI determines copies of
 vectorized def-use cycles, which might be more than effective copies
 of vectorized lane-reducing reduction statements.  This could be
@@ -8749,9 +8752,9 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy
 
-  sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
-  sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
-  sum_v2 = sum_v2;  // copy
+  sum_v0 = sum_v0;  // copy
+  sum_v1 = SAD (s0_v1[i: 0 ~ 7 ], s1_v1[i: 0 ~ 7 ], sum_v1);
+  sum_v2 = SAD (s0_v2[i: 8 ~ 15], s1_v2[i: 8 ~ 15], sum_v2);
   sum_v3 = sum_v3;  // copy
 
   sum_v0 += n_v0[i: 0  ~ 3 ];
@@ -8759,7 +8762,20 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
   sum_v2 += n_v2[i: 8  ~ 11];
   sum_v3 += n_v3[i: 12 ~ 15];
 }
-   */
+
+Moreover, for a higher instruction parallelism in final vectorized
+loop, it is considered to make those effective vectorized
+lane-reducing statements be distributed evenly among all def-use
+cycles. In the above example, SADs are generated into other cycles
+rather than that of DOT_PROD.  */
+
+  if (stmt_ncopies < ncopies)
+   {
+ gcc_assert (lane_reducing);
+ result_pos = reduc_info->reduc_result_pos;
+ reduc_info->reduc_result_pos = (result_pos + stmt_ncopies) % ncopies;
+ gcc_assert (result_pos >= 0 && result_pos < ncopies);
+   }
 
   for (i = 0; i < MIN (3, (int) op.num_ops); i++)
{
@@ -8792,7 +8808,30 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
 op.ops[i], &vec_oprnds[i], vectype);
 
  if (used_ncopies <

[PATCH 5/6] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-05-30 Thread Feng Xue OS
For a lane-reducing operation (dot-prod/widen-sum/sad) in a loop reduction, the
current vectorizer can only handle the pattern if the reduction chain contains
no other operation, whether normal or lane-reducing.

To allow multiple arbitrary lane-reducing operations, we need to support
vectorization of a loop reduction chain with mixed input vectypes. Since the
number of lanes of a vectype may vary with the operation, the effective ncopies
of the vectorized statements may also differ between operations, which causes a
mismatch between the vectorized def-use cycles. A simple way to handle this is
to align all operations with the one that has the most ncopies; the gap can be
filled by generating extra trivial pass-through copies. For example:

   int sum = 0;
   for (i)
 {
   sum += d0[i] * d1[i];  // dot-prod 
   sum += w[i];   // widen-sum 
   sum += abs(s0[i] - s1[i]); // sad 
   sum += n[i];   // normal 
 }

The vector size is 128-bit and the vectorization factor is 16. The reduction
statements would be transformed as:

   vector<4> int sum_v0 = { 0, 0, 0, 0 };
   vector<4> int sum_v1 = { 0, 0, 0, 0 };
   vector<4> int sum_v2 = { 0, 0, 0, 0 };
   vector<4> int sum_v3 = { 0, 0, 0, 0 };

   for (i / 16)
 {
   sum_v0 = DOT_PROD (d0_v0[i: 0 ~ 15], d1_v0[i: 0 ~ 15], sum_v0);
   sum_v1 = sum_v1;  // copy
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy

   sum_v0 = WIDEN_SUM (w_v0[i: 0 ~ 15], sum_v0);
   sum_v1 = sum_v1;  // copy
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy

   sum_v0 = SAD (s0_v0[i: 0 ~ 7 ], s1_v0[i: 0 ~ 7 ], sum_v0);
   sum_v1 = SAD (s0_v1[i: 8 ~ 15], s1_v1[i: 8 ~ 15], sum_v1);
   sum_v2 = sum_v2;  // copy
   sum_v3 = sum_v3;  // copy

   sum_v0 += n_v0[i: 0  ~ 3 ];
   sum_v1 += n_v1[i: 4  ~ 7 ];
   sum_v2 += n_v2[i: 8  ~ 11];
   sum_v3 += n_v3[i: 12 ~ 15];
 }

Thanks,
Feng
---
gcc/
PR tree-optimization/114440
* tree-vectorizer.h (vectorizable_lane_reducing): New function
declaration.
* tree-vect-stmts.cc (vect_analyze_stmt): Call new function
vectorizable_lane_reducing to analyze lane-reducing operation.
* tree-vect-loop.cc (vect_model_reduction_cost): Remove cost computation
code related to emulated_mixed_dot_prod.
(vectorizable_lane_reducing): New function.
(vectorizable_reduction): Allow multiple lane-reducing operations in
loop reduction. Move some original lane-reducing related code to
vectorizable_lane_reducing.
(vect_transform_reduction): Extend transformation to support reduction
statements with mixed input vectypes.

gcc/testsuite/
PR tree-optimization/114440
* gcc.dg/vect/vect-reduc-chain-1.c
* gcc.dg/vect/vect-reduc-chain-2.c
* gcc.dg/vect/vect-reduc-chain-3.c
* gcc.dg/vect/vect-reduc-chain-dot-slp-1.c
* gcc.dg/vect/vect-reduc-chain-dot-slp-2.c
* gcc.dg/vect/vect-reduc-dot-slp-1.c
---
 .../gcc.dg/vect/vect-reduc-chain-1.c  |  62 +++
 .../gcc.dg/vect/vect-reduc-chain-2.c  |  77 +++
 .../gcc.dg/vect/vect-reduc-chain-3.c  |  66 +++
 .../gcc.dg/vect/vect-reduc-chain-dot-slp-1.c  |  97 
 .../gcc.dg/vect/vect-reduc-chain-dot-slp-2.c  |  81 +++
 .../gcc.dg/vect/vect-reduc-dot-slp-1.c|  35 ++
 gcc/tree-vect-loop.cc | 478 --
 gcc/tree-vect-stmts.cc|   2 +
 gcc/tree-vectorizer.h |   2 +
 9 files changed, 755 insertions(+), 145 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-dot-slp-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-slp-1.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
new file mode 100644
index 000..04bfc419dbd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
@@ -0,0 +1,62 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { 
aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_dotprod_neon }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#ifndef SIGNEDNESS_1
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 signed
+#endif
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+f (SIGNEDNESS_1 int res,
+   SIGNEDNESS_2 char *restrict a,
+   SIGNEDNESS_2 char *restrict b,
+   SIGNEDNESS_2 char *restrict c,
+   SIGNEDNESS_2 char *restrict d,
+   SIGNEDNESS_1 int *

[PATCH 4/6] vect: Bind input vectype to lane-reducing operation

2024-05-30 Thread Feng Xue OS
The input vectype is an attribute of the lane-reducing operation itself, not of
the reduction PHI it is associated with, since there might be more than one
lane-reducing operation, each with a different type, in a loop reduction chain.
So bind each lane-reducing operation to its own input vectype.

Thanks,
Feng
---
gcc/
* tree-vect-loop.cc (vect_is_emulated_mixed_dot_prod): Remove parameter
loop_vinfo. Get input vectype from stmt_info instead of reduction PHI.
(vect_model_reduction_cost): Remove loop_vinfo argument of call to
vect_is_emulated_mixed_dot_prod.
(vect_transform_reduction): Likewise.
(vectorizable_reduction): Likewise, and bind input vectype to
lane-reducing operation.
---
 gcc/tree-vect-loop.cc | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 51627c27f8a..20c99f11e9a 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5270,8 +5270,7 @@ have_whole_vector_shift (machine_mode mode)
See vect_emulate_mixed_dot_prod for the actual sequence used.  */
 
 static bool
-vect_is_emulated_mixed_dot_prod (loop_vec_info loop_vinfo,
-stmt_vec_info stmt_info)
+vect_is_emulated_mixed_dot_prod (stmt_vec_info stmt_info)
 {
   gassign *assign = dyn_cast (stmt_info->stmt);
   if (!assign || gimple_assign_rhs_code (assign) != DOT_PROD_EXPR)
@@ -5282,10 +5281,9 @@ vect_is_emulated_mixed_dot_prod (loop_vec_info 
loop_vinfo,
   if (TYPE_SIGN (TREE_TYPE (rhs1)) == TYPE_SIGN (TREE_TYPE (rhs2)))
 return false;
 
-  stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
-  gcc_assert (reduc_info->is_reduc_info);
+  gcc_assert (STMT_VINFO_REDUC_VECTYPE_IN (stmt_info));
   return !directly_supported_p (DOT_PROD_EXPR,
-   STMT_VINFO_REDUC_VECTYPE_IN (reduc_info),
+   STMT_VINFO_REDUC_VECTYPE_IN (stmt_info),
optab_vector_mixed_sign);
 }
 
@@ -5324,8 +5322,8 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
   if (!gimple_extract_op (orig_stmt_info->stmt, &op))
 gcc_unreachable ();
 
-  bool emulated_mixed_dot_prod
-= vect_is_emulated_mixed_dot_prod (loop_vinfo, stmt_info);
+  bool emulated_mixed_dot_prod = vect_is_emulated_mixed_dot_prod (stmt_info);
+
   if (reduction_type == EXTRACT_LAST_REDUCTION)
 /* No extra instructions are needed in the prologue.  The loop body
operations are costed in vectorizable_condition.  */
@@ -7840,6 +7838,11 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 vectype_in = STMT_VINFO_VECTYPE (phi_info);
   STMT_VINFO_REDUC_VECTYPE_IN (reduc_info) = vectype_in;
 
+  /* Each lane-reducing operation has its own input vectype, while reduction
+ PHI records the input vectype with least lanes.  */
+  if (lane_reducing)
+STMT_VINFO_REDUC_VECTYPE_IN (stmt_info) = vectype_in;
+
   enum vect_reduction_type v_reduc_type = STMT_VINFO_REDUC_TYPE (phi_info);
   STMT_VINFO_REDUC_TYPE (reduc_info) = v_reduc_type;
   /* If we have a condition reduction, see if we can simplify it further.  */
@@ -8366,7 +8369,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   if (single_defuse_cycle || lane_reducing)
 {
   int factor = 1;
-  if (vect_is_emulated_mixed_dot_prod (loop_vinfo, stmt_info))
+  if (vect_is_emulated_mixed_dot_prod (stmt_info))
/* Three dot-products and a subtraction.  */
factor = 4;
   record_stmt_cost (cost_vec, ncopies * factor, vector_stmt,
@@ -8617,8 +8620,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
: &vec_oprnds2));
 }
 
-  bool emulated_mixed_dot_prod
-= vect_is_emulated_mixed_dot_prod (loop_vinfo, stmt_info);
+  bool emulated_mixed_dot_prod = vect_is_emulated_mixed_dot_prod (stmt_info);
+
   FOR_EACH_VEC_ELT (vec_oprnds0, i, def0)
 {
   gimple *new_stmt;
-- 
2.17.1
From b885de76ad7e9f5accceff18cb6c11de73a36225 Mon Sep 17 00:00:00 2001
From: Feng Xue 
Date: Wed, 29 May 2024 16:41:57 +0800
Subject: [PATCH 4/6] vect: Bind input vectype to lane-reducing operation

The input vectype is an attribute of lane-reducing operation, instead of
reduction PHI that it is associated to, since there might be more than one
lane-reducing operations with different type in a loop reduction chain. So
bind each lane-reducing operation with its own input type.

2024-05-29 Feng Xue 

gcc/
	* tree-vect-loop.cc (vect_is_emulated_mixed_dot_prod): Remove parameter
	loop_vinfo. Get input vectype from stmt_info instead of reduction PHI.
	(vect_model_reduction_cost): Remove loop_vinfo argument of call to
	vect_is_emulated_mixed_dot_prod.
	(vect_transform_reduction): Likewise.
	(vectorizable_reduction): Likewise, and bind input vectype to
	lane-reducing operation.
---
 gcc/tree-vect-loop.cc | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/gcc/tr

[PATCH 3/6] vect: Set STMT_VINFO_REDUC_DEF for non-live stmt in loop reduction

2024-05-30 Thread Feng Xue OS
Normally, vectorizability checking on a statement in a loop reduction chain
does not use the reduction PHI information. But some special statements might
need it during analysis, especially for the multiple lane-reducing operations
support added later.

Thanks,
Feng
---
gcc/
* tree-vect-loop.cc (vectorizable_reduction): Set STMT_VINFO_REDUC_DEF
for non-live stmt.
* tree-vect-stmts.cc (vectorizable_condition): Treat the condition
statement that is pointed by stmt_vec_info of reduction PHI as the
real "for_reduction" statement.
---
 gcc/tree-vect-loop.cc  |  5 +++--
 gcc/tree-vect-stmts.cc | 11 ++-
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index aa5f21ccd1a..51627c27f8a 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7632,14 +7632,15 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 all lanes here - even though we only will vectorize from
 the SLP node with live lane zero the other live lanes also
 need to be identified as part of a reduction to be able
-to skip code generation for them.  */
+to skip code generation for them.  For lane-reducing operation
+vectorizable analysis needs the reduction PHI information.  */
   if (slp_for_stmt_info)
{
  for (auto s : SLP_TREE_SCALAR_STMTS (slp_for_stmt_info))
if (STMT_VINFO_LIVE_P (s))
  STMT_VINFO_REDUC_DEF (vect_orig_stmt (s)) = phi_info;
}
-  else if (STMT_VINFO_LIVE_P (vdef))
+  else
STMT_VINFO_REDUC_DEF (def) = phi_info;
   gimple_match_op op;
   if (!gimple_extract_op (vdef->stmt, &op))
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 935d80f0e1b..2e0be763abb 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12094,11 +12094,20 @@ vectorizable_condition (vec_info *vinfo,
   vect_reduction_type reduction_type = TREE_CODE_REDUCTION;
   bool for_reduction
 = STMT_VINFO_REDUC_DEF (vect_orig_stmt (stmt_info)) != NULL;
+  if (for_reduction)
+{
+  reduc_info = info_for_reduction (vinfo, stmt_info);
+  if (STMT_VINFO_REDUC_DEF (reduc_info) != vect_orig_stmt (stmt_info))
+   {
+ for_reduction = false;
+ reduc_info = NULL;
+   }
+}
+
   if (for_reduction)
 {
   if (slp_node)
return false;
-  reduc_info = info_for_reduction (vinfo, stmt_info);
   reduction_type = STMT_VINFO_REDUC_TYPE (reduc_info);
   reduc_index = STMT_VINFO_REDUC_IDX (stmt_info);
   gcc_assert (reduction_type != EXTRACT_LAST_REDUCTION
-- 
2.17.1

[PATCH 2/6] vect: Split out partial vect checking for reduction into a function

2024-05-30 Thread Feng Xue OS
This is a patch that is split out from 
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652626.html.

Partial vectorization checking for vectorizable_reduction is a piece of
relatively isolated code that may be reused elsewhere. Move the code into a
new function so it can be shared.

Thanks,
Feng
---
gcc/
* tree-vect-loop.cc (vect_reduction_use_partial_vector): New function.
(vectorizable_reduction): Move partial vectorization checking code to
vect_reduction_use_partial_vector.
---
 gcc/tree-vect-loop.cc | 138 --
 1 file changed, 78 insertions(+), 60 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a42d79c7cbf..aa5f21ccd1a 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7391,6 +7391,81 @@ build_vect_cond_expr (code_helper code, tree vop[3], 
tree mask,
 }
 }
 
+/* Given an operation with CODE in loop reduction path whose reduction PHI is
+   specified by REDUC_INFO, the operation has TYPE of scalar result, and its
+   input vectype is represented by VECTYPE_IN. The vectype of vectorized result
+   may be different from VECTYPE_IN, either in base type or vectype lanes,
+   lane-reducing operation is the case.  This function check if it is possible,
+   and how to perform partial vectorization on the operation in the context
+   of LOOP_VINFO.  */
+
+static void
+vect_reduction_use_partial_vector (loop_vec_info loop_vinfo,
+  stmt_vec_info reduc_info,
+  slp_tree slp_node, code_helper code,
+  tree type, tree vectype_in)
+{
+  if (!LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
+return;
+
+  enum vect_reduction_type reduc_type = STMT_VINFO_REDUC_TYPE (reduc_info);
+  internal_fn reduc_fn = STMT_VINFO_REDUC_FN (reduc_info);
+  internal_fn cond_fn = get_conditional_internal_fn (code, type);
+
+  if (reduc_type != FOLD_LEFT_REDUCTION
+  && !use_mask_by_cond_expr_p (code, cond_fn, vectype_in)
+  && (cond_fn == IFN_LAST
+ || !direct_internal_fn_supported_p (cond_fn, vectype_in,
+ OPTIMIZE_FOR_SPEED)))
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"can't operate on partial vectors because"
+" no conditional operation is available.\n");
+  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
+}
+  else if (reduc_type == FOLD_LEFT_REDUCTION
+  && reduc_fn == IFN_LAST
+  && !expand_vec_cond_expr_p (vectype_in, truth_type_for (vectype_in),
+  SSA_NAME))
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+   "can't operate on partial vectors because"
+   " no conditional operation is available.\n");
+  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
+}
+  else if (reduc_type == FOLD_LEFT_REDUCTION
+  && internal_fn_mask_index (reduc_fn) == -1
+  && FLOAT_TYPE_P (vectype_in)
+  && HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in))
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"can't operate on partial vectors because"
+" signed zeros cannot be preserved.\n");
+  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
+}
+  else
+{
+  internal_fn mask_reduc_fn
+   = get_masked_reduction_fn (reduc_fn, vectype_in);
+  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
+  unsigned nvectors;
+
+  if (slp_node)
+   nvectors = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
+  else
+   nvectors = vect_get_num_copies (loop_vinfo, vectype_in);
+
+  if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
+   vect_record_loop_len (loop_vinfo, lens, nvectors, vectype_in, 1);
+  else
+   vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype_in, NULL);
+}
+}
+
 /* Function vectorizable_reduction.
 
Check if STMT_INFO performs a reduction operation that can be vectorized.
@@ -7456,7 +7531,6 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   bool single_defuse_cycle = false;
   bool nested_cycle = false;
   bool double_reduc = false;
-  int vec_num;
   tree cr_index_scalar_type = NULL_TREE, cr_index_vector_type = NULL_TREE;
   tree cond_reduc_val = NULL_TREE;
 
@@ -8283,11 +8357,6 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
  return false;
}
 
-  if (slp_node)
-vec_num = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
-  else
-vec_num = 1;
-
   vect_model_reduction_cost (loop_vinfo, stmt_info, reduc_fn,
 reduction_type, ncopies, cost_vec);
   /* Cost the reducti

[PATCH 1/6] vect: Add a function to check lane-reducing code [PR114440]

2024-05-30 Thread Feng Xue OS
This is a patch that is split out from 
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652626.html.

Checking whether an operation is lane-reducing requires comparing its code
against three kinds (DOT_PROD_EXPR/WIDEN_SUM_EXPR/SAD_EXPR).  Add a utility
function to make the check handy and concise.
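
For reference, given the description above, the helper amounts to a check
along these lines (a sketch only; the actual definition added to
tree-vectorizer.h by the patch may differ in placement and wording):

  /* Return true if CODE is a lane-reducing operation code.  */
  inline bool
  lane_reducing_op_p (code_helper code)
  {
    return code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code == SAD_EXPR;
  }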

Feng
--
gcc/
* tree-vectorizer.h (lane_reducing_op_p): New function.
* tree-vect-slp.cc (vect_analyze_slp): Use new function
lane_reducing_op_p to check statement code.
* tree-vect-loop.cc (vect_transform_reduction): Likewise.
(vectorizable_reduction): Likewise, and change name of a local
variable that holds the result flag.
---
 gcc/tree-vect-loop.cc | 29 -
 gcc/tree-vect-slp.cc  |  4 +---
 gcc/tree-vectorizer.h |  6 ++
 3 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 04a9ac64df7..a42d79c7cbf 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7650,9 +7650,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   gimple_match_op op;
   if (!gimple_extract_op (stmt_info->stmt, &op))
 gcc_unreachable ();
-  bool lane_reduc_code_p = (op.code == DOT_PROD_EXPR
-   || op.code == WIDEN_SUM_EXPR
-   || op.code == SAD_EXPR);
+  bool lane_reducing = lane_reducing_op_p (op.code);
 
   if (!POINTER_TYPE_P (op.type) && !INTEGRAL_TYPE_P (op.type)
   && !SCALAR_FLOAT_TYPE_P (op.type))
@@ -7664,7 +7662,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 
   /* For lane-reducing ops we're reducing the number of reduction PHIs
  which means the only use of that may be in the lane-reducing operation.  
*/
-  if (lane_reduc_code_p
+  if (lane_reducing
   && reduc_chain_length != 1
   && !only_slp_reduc_chain)
 {
@@ -7678,7 +7676,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
  since we'll mix lanes belonging to different reductions.  But it's
  OK to use them in a reduction chain or when the reduction group
  has just one element.  */
-  if (lane_reduc_code_p
+  if (lane_reducing
   && slp_node
   && !REDUC_GROUP_FIRST_ELEMENT (stmt_info)
   && SLP_TREE_LANES (slp_node) > 1)
@@ -7738,7 +7736,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   /* To properly compute ncopies we are interested in the widest
 non-reduction input type in case we're looking at a widening
 accumulation that we later handle in vect_transform_reduction.  */
-  if (lane_reduc_code_p
+  if (lane_reducing
  && vectype_op[i]
  && (!vectype_in
  || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
@@ -8211,7 +8209,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   && loop_vinfo->suggested_unroll_factor == 1)
 single_defuse_cycle = true;
 
-  if (single_defuse_cycle || lane_reduc_code_p)
+  if (single_defuse_cycle || lane_reducing)
 {
   gcc_assert (op.code != COND_EXPR);
 
@@ -8227,7 +8225,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 mixed-sign dot-products can be implemented using signed
 dot-products.  */
   machine_mode vec_mode = TYPE_MODE (vectype_in);
-  if (!lane_reduc_code_p
+  if (!lane_reducing
  && !directly_supported_p (op.code, vectype_in, optab_vector))
 {
   if (dump_enabled_p ())
@@ -8252,7 +8250,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
  For the other cases try without the single cycle optimization.  */
   if (!ok)
{
- if (lane_reduc_code_p)
+ if (lane_reducing)
return false;
  else
single_defuse_cycle = false;
@@ -8263,7 +8261,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   /* If the reduction stmt is one of the patterns that have lane
  reduction embedded we cannot handle the case of ! single_defuse_cycle.  */
   if ((ncopies > 1 && ! single_defuse_cycle)
-  && lane_reduc_code_p)
+  && lane_reducing)
 {
   if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -8274,7 +8272,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 
   if (slp_node
   && !(!single_defuse_cycle
-  && !lane_reduc_code_p
+  && !lane_reducing
   && reduction_type != FOLD_LEFT_REDUCTION))
 for (i = 0; i < (int) op.num_ops; i++)
   if (!vect_maybe_update_slp_op_vectype (slp_op[i], vectype_op[i]))
@@ -8295,7 +8293,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   /* Cost the reduction op inside the loop if transformed via
  vect_transform_reduction.  Otherwise this is costed by the
  separate vectorizable_* routines.  */
-  if (single_defuse_cycle || lane_reduc_code_p)
+  if (single_defuse_cycle || lane_reducing)
 {
   int factor = 1;
   if (vect_is_emulated_mixed_dot_prod (loop_vinfo, stmt_info))
@@ -8313,7 +8311,7 @@ vectori

Re: [PATCH] ira: Fix go_through_subreg offset calculation [PR115281]

2024-05-30 Thread Vladimir Makarov



On 5/30/24 03:59, Richard Sandiford wrote:


Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Yes.  Thank you, Richard.


gcc/
PR rtl-optimization/115281
* ira-conflicts.cc (go_through_subreg): Use the natural size of
the inner mode rather than the outer mode.

gcc/testsuite/
PR rtl-optimization/115281
* gfortran.dg/pr115281.f90: New test.




Re: [PATCH] [RFC] Target-independent store forwarding avoidance. [PR48696] Target-independent store forwarding avoidance.

2024-05-30 Thread Manolis Tsamis
On Fri, May 24, 2024 at 9:27 AM Richard Biener  wrote:
>
> On Thu, 23 May 2024, Manolis Tsamis wrote:
>
> > This pass detects cases of expensive store forwarding and tries to avoid 
> > them
> > by reordering the stores and using suitable bit insertion sequences.
> > For example it can transform this:
> >
> >  strbw2, [x1, 1]
> >  ldr x0, [x1]  # Expensive store forwarding to larger load.
> >
> > To:
> >
> >  ldr x0, [x1]
> >  strbw2, [x1]
> >  bfi x0, x2, 0, 8
>
> How do we represent atomics?  If the latter is a load-acquire or release
> the transform would be invalid.
>
As you noted, this transformation cannot work with acquire/release, so
when the pass finds a volatile_refs_p instruction it drops all
candidates; there is no correctness issue.

> > Assembly like this can appear with bitfields or type punning / unions.
> > On stress-ng when running the cpu-union microbenchmark the following 
> > speedups
> > have been observed.
> >
> >   Neoverse-N1:  +29.4%
> >   Intel Coffeelake: +13.1%
> >   AMD 5950X:+17.5%
> >
> >   PR rtl-optimization/48696
> >
> > gcc/ChangeLog:
> >
> >   * Makefile.in: Add avoid-store-forwarding.o.
> >   * common.opt: New option -favoid-store-forwarding.
> >   * params.opt: New param store-forwarding-max-distance.
> >   * passes.def: Schedule a new pass.
> >   * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
> >   * avoid-store-forwarding.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/avoid-store-forwarding-1.c: New test.
> >   * gcc.dg/avoid-store-forwarding-2.c: New test.
> >   * gcc.dg/avoid-store-forwarding-3.c: New test.
> >
> > Signed-off-by: Manolis Tsamis 
> > ---
> >
> >  gcc/Makefile.in   |   1 +
> >  gcc/avoid-store-forwarding.cc | 554 ++
> >  gcc/common.opt|   4 +
> >  gcc/params.opt|   4 +
> >  gcc/passes.def|   1 +
> >  .../gcc.dg/avoid-store-forwarding-1.c |  46 ++
> >  .../gcc.dg/avoid-store-forwarding-2.c |  39 ++
> >  .../gcc.dg/avoid-store-forwarding-3.c |  31 +
> >  gcc/tree-pass.h   |   1 +
> >  9 files changed, 681 insertions(+)
> >  create mode 100644 gcc/avoid-store-forwarding.cc
> >  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-3.c
> >
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > index a7f15694c34..be969b1ca1d 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -1681,6 +1681,7 @@ OBJS = \
> >   statistics.o \
> >   stmt.o \
> >   stor-layout.o \
> > + avoid-store-forwarding.o \
> >   store-motion.o \
> >   streamer-hooks.o \
> >   stringpool.o \
> > diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
> > new file mode 100644
> > index 000..d90627c4872
> > --- /dev/null
> > +++ b/gcc/avoid-store-forwarding.cc
> > @@ -0,0 +1,554 @@
> > +/* Avoid store forwarding optimization pass.
> > +   Copyright (C) 2024 Free Software Foundation, Inc.
> > +   Contributed by VRULL GmbH.
> > +
> > +   This file is part of GCC.
> > +
> > +   GCC is free software; you can redistribute it and/or modify it
> > +   under the terms of the GNU General Public License as published by
> > +   the Free Software Foundation; either version 3, or (at your option)
> > +   any later version.
> > +
> > +   GCC is distributed in the hope that it will be useful, but
> > +   WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   General Public License for more details.
> > +
> > +   You should have received a copy of the GNU General Public License
> > +   along with GCC; see the file COPYING3.  If not see
> > +   .  */
> > +
> > +#include "config.h"
> > +#include "system.h"
> > +#include "coretypes.h"
> > +#include "backend.h"
> > +#include "rtl.h"
> > +#include "alias.h"
> > +#include "rtlanal.h"
> > +#include "tree-pass.h"
> > +#include "cselib.h"
> > +#include "predict.h"
> > +#include "insn-config.h"
> > +#include "expmed.h"
> > +#include "recog.h"
> > +#include "regset.h"
> > +#include "df.h"
> > +#include "expr.h"
> > +#include "memmodel.h"
> > +#include "emit-rtl.h"
> > +#include "vec.h"
> > +
> > +/* This pass tries to detect and avoid cases of store forwarding.
> > +   On many processors there is a large penalty when smaller stores are
> > +   forwarded to larger loads.  The idea used to avoid the stall is to move
> > +   the store after the load and in addition emit a bit insert sequence so
> > +   the load register has the correct value.  For example the following:
> > +
> > + strbw2, [x1

[PATCH v4] Match: Support more form for scalar unsigned SAT_ADD

2024-05-30 Thread pan2 . li
From: Pan Li 

After supporting one gassign form of the unsigned .SAT_ADD, we
would like to support more forms, including both branch and
branchless ones.  There are 5 other forms of .SAT_ADD, listed below:

Form 1:
  #define SAT_ADD_U_1(T) \
  T sat_add_u_1_##T(T x, T y) \
  { \
return (T)(x + y) >= x ? (x + y) : -1; \
  }

Form 2:
  #define SAT_ADD_U_2(T) \
  T sat_add_u_2_##T(T x, T y) \
  { \
T ret; \
T overflow = __builtin_add_overflow (x, y, &ret); \
return (T)(-overflow) | ret; \
  }

Form 3:
  #define SAT_ADD_U_3(T) \
  T sat_add_u_3_##T (T x, T y) \
  { \
T ret; \
return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
  }

Form 4:
  #define SAT_ADD_U_4(T) \
  T sat_add_u_4_##T (T x, T y) \
  { \
T ret; \
return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
  }

Form 5:
  #define SAT_ADD_U_5(T) \
  T sat_add_u_5_##T(T x, T y) \
  { \
return (T)(x + y) < x ? -1 : (x + y); \
  }

Take the forms 3 of above as example:

uint64_t
sat_add (uint64_t x, uint64_t y)
{
  uint64_t ret;
  return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
}

Before this patch:
uint64_t sat_add (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  long unsigned int _2;
  uint64_t _3;
  __complex__ long unsigned int _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
  _2 = IMAGPART_EXPR <_6>;
  if (_2 != 0)
goto ; [35.00%]
  else
goto ; [65.00%]
;;succ:   4
;;3

;;   basic block 3, loop depth 0
;;pred:   2
  _1 = REALPART_EXPR <_6>;
;;succ:   4

;;   basic block 4, loop depth 0
;;pred:   3
;;2
  # _3 = PHI <_1(3), 18446744073709551615(2)>
  return _3;
;;succ:   EXIT
}

After this patch:
uint64_t sat_add (uint64_t x, uint64_t y)
{
  long unsigned int _12;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
  return _12;
;;succ:   EXIT
}

The flag '^' acting on cond_expr will generate matching code similar to the below:

else if (gphi *_a1 = dyn_cast  (_d1))
  {
basic_block _b1 = gimple_bb (_a1);
if (gimple_phi_num_args (_a1) == 2)
  {
basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
basic_block _db_1 = safe_dyn_cast  (*gsi_last_bb (_pb_0_1)) ? 
_pb_0_1 : ...
basic_block _other_db_1 = safe_dyn_cast  (*gsi_last_bb 
(_pb_0_1)) ? _pb_1_1 : ...
gcond *_ct_1 = safe_dyn_cast  (*gsi_last_bb (_db_1));
if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
  && EDGE_COUNT (_other_db_1->succs) == 1
  && EDGE_PRED (_other_db_1, 0)->src == _db_1)
  {
tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node, 
_cond_lhs_1, ...);
bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags  & 
EDGE_TRUE_VALUE;
tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);
switch (TREE_CODE (_p0))
  ...

The below test suites are still running; I will update the results later.
* The x86 bootstrap test.
* The x86 full regression test.
* The riscv full regression test.

gcc/ChangeLog:

* doc/match-and-simplify.texi: Add doc for the matching flag '^'.
* genmatch.cc (enum expr_flag): Add new enum for expr flag.
(dt_node::gen_kids_1): Add cond_expr and flag handling.
(dt_operand::gen_phi_on_cond): Add new func to gen phi matching
on cond_expr.
(parser::parse_expr): Add handling for the expr flag '^'.
* match.pd: Add more form for unsigned .SAT_ADD.
* tree-ssa-math-opts.cc (match_saturation_arith): Rename from.
(match_assign_saturation_arith): Rename to.
(match_phi_saturation_arith): Add new func impl to match the
.SAT_ADD when phi.
(math_opts_dom_walker::after_dom_children): Add phi matching
try for all gimple phi stmt.

Signed-off-by: Pan Li 
---
 gcc/doc/match-and-simplify.texi |  14 
 gcc/genmatch.cc | 126 +++-
 gcc/match.pd|  43 ++-
 gcc/tree-ssa-math-opts.cc   |  51 -
 4 files changed, 229 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/match-and-simplify.texi b/gcc/doc/match-and-simplify.texi
index 01f19e2f62c..fc0cf6d7552 100644
--- a/gcc/doc/match-and-simplify.texi
+++ b/gcc/doc/match-and-simplify.texi
@@ -361,6 +361,20 @@ Usually the types of the generated result expressions are
 determined from the context, but sometimes like in the above case
 it is required that you specify them explicitly.
 
+Another modifier for generated expressions is @code{^} which
+tells the machinery to try more matches for some special cases.
+Normally the @code{cond} only allo

Re: [COMMITTED 01/12] - Move all relation queries into relation_oracle.

2024-05-30 Thread Mikael Morin

Hello...

Le 23/05/2024 à 22:52, Andrew MacLeod a écrit :
A range-query currently provides a couple of relation query routines, 
plus it also provides direct access to an oracle.   This patch moves 
those queries into the oracle where they should be, and adds the ability 
to create and destroy the basic dominance oracle ranger uses.  This is 
the usual oracle most passes would want, and this provides full access 
to it if ranger has been enabled.  It also allows passes which do not 
use ranger to turn on an oracle and work with it.


Full documentation  for relations and the oracle can be found at: 
https://gcc.gnu.org/wiki/AndrewMacLeod/Relations


Moving the queries into the oracle removes the need to check for a NULL 
pointer on every query, and results in speeding up VRP by about 0.7%


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.



(...)


diff --git a/gcc/value-query.cc b/gcc/value-query.cc
index c2ab745a466..b275a43b679 100644
--- a/gcc/value-query.cc
+++ b/gcc/value-query.cc
@@ -437,53 +439,3 @@ global_range_query::range_of_expr (vrange &r, tree expr, 
gimple *stmt)
 
   return true;

 }
-
-// Return any known relation between SSA1 and SSA2 before stmt S is executed.
-// If GET_RANGE is true, query the range of both operands first to ensure
-// the definitions have been processed and any relations have be created.
-
-relation_kind
-range_query::query_relation (gimple *s, tree ssa1, tree ssa2, bool get_range) 
> -{
-  if (!m_oracle || TREE_CODE (ssa1) != SSA_NAME || TREE_CODE (ssa2) != 
SSA_NAME)
-return VREL_VARYING;
-
-  // Ensure ssa1 and ssa2 have both been evaluated.
-  if (get_range)
-{
-  Value_Range tmp1 (TREE_TYPE (ssa1));
-  Value_Range tmp2 (TREE_TYPE (ssa2));
-  range_of_expr (tmp1, ssa1, s);
-  range_of_expr (tmp2, ssa2, s);
-}
-  return m_oracle->query_relation (gimple_bb (s), ssa1, ssa2);
-}
-
-// Return any known relation between SSA1 and SSA2 on edge E.
-// If GET_RANGE is true, query the range of both operands first to ensure
-// the definitions have been processed and any relations have be created.
-
-relation_kind
-range_query::query_relation (edge e, tree ssa1, tree ssa2, bool get_range)
-{
-  basic_block bb;
-  if (!m_oracle || TREE_CODE (ssa1) != SSA_NAME || TREE_CODE (ssa2) != 
SSA_NAME)
-return VREL_VARYING;
-
-  // Use destination block if it has a single predecessor, and this picks
-  // up any relation on the edge.
-  // Otherwise choose the src edge and the result is the same as on-exit.
-  if (!single_pred_p (e->dest))
-bb = e->src;
-  else
-bb = e->dest;
-
-  // Ensure ssa1 and ssa2 have both been evaluated.
-  if (get_range)
-{
-  Value_Range tmp (TREE_TYPE (ssa1));
-  range_on_edge (tmp, e, ssa1);
-  range_on_edge (tmp, e, ssa2);
-}
-  return m_oracle->query_relation (bb, ssa1, ssa2);
-}


(...)


diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index 619ee5f0867..d1081b3b3f5 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -288,6 +288,39 @@ relation_oracle::valid_equivs (bitmap b, const_bitmap 
equivs, basic_block bb)
 }
 }
 
+// Return any known relation between SSA1 and SSA2 before stmt S is executed.

+// If GET_RANGE is true, query the range of both operands first to ensure
+// the definitions have been processed and any relations have be created.


The method lost its argument GET_RANGE in the move from value-query.cc, 
so the documentation for GET_RANGE can be removed as well.



+
+relation_kind
+relation_oracle::query_relation (gimple *s, tree ssa1, tree ssa2)
+{
+  if (TREE_CODE (ssa1) != SSA_NAME || TREE_CODE (ssa2) != SSA_NAME)
+return VREL_VARYING;
+  return query_relation (gimple_bb (s), ssa1, ssa2);
+}
+
+// Return any known relation between SSA1 and SSA2 on edge E.
+// If GET_RANGE is true, query the range of both operands first to ensure
+// the definitions have been processed and any relations have be created.


Same here.


+
+relation_kind
+relation_oracle::query_relation (edge e, tree ssa1, tree ssa2)
+{
+  basic_block bb;
+  if (TREE_CODE (ssa1) != SSA_NAME || TREE_CODE (ssa2) != SSA_NAME)
+return VREL_VARYING;
+
+  // Use destination block if it has a single predecessor, and this picks
+  // up any relation on the edge.
+  // Otherwise choose the src edge and the result is the same as on-exit.
+  if (!single_pred_p (e->dest))
+bb = e->src;
+  else
+bb = e->dest;
+
+  return query_relation (bb, ssa1, ssa2);
+}
 // -
 
 // The very first element in the m_equiv chain is actually just a summary


Re: [PATCH] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-05-30 Thread Feng Xue OS
>> Hi,
>>
>> The patch was updated with the newest trunk, and also contained some minor 
>> changes.
>>
>> I am working on another new feature which is meant to support pattern 
>> recognition
>> of lane-reducing operations in affine closure originated from loop reduction 
>> variable,
>> like:
>>
>>   sum += cst1 * dot_prod_1 + cst2 * sad_2 + ... + cstN * lane_reducing_op_N
>>
>> The feature WIP depends on the patch.  It has been quite a long time
>> since it was posted;
>> would you please take some time to review this one?  Thanks.

> This seems to do multiple things so I wonder if you can split up the
> patch a bit?

OK. Will send out split patches in new mails.

> For example adding lane_reducing_op_p can be split out, it also seems like
> the vect_transform_reduction change to better distribute work can be done
> separately?  Likewise refactoring like splitting out
> vect_reduction_use_partial_vector.
> 
> When we have
> 
>sum += d0[i] * d1[i];  // dot-prod 
>sum += w[i];   // widen-sum 
>sum += abs(s0[i] - s1[i]); // sad 
>sum += n[i];   // normal 
> 
> the vector DOT_PROD and friend ops can end up mixing different lanes
> since it is not specified which lanes are reduced into which output lane.
> So, DOT_PROD might combine 0-3, 4-7, ... but SAD might combine
> 0,4,8,12; 1,5,9,13; ... I think this isn't worse than what one op itself
> is doing, but it's worth pointing out (it's probably unlikely a target
> mixes different reduction strategies anyway).

Yes. But even if, on a peculiar target, DOT_PROD and SAD use different
reduction strategies, that does not impact result correctness, at least for
integer operations, since the final value is the sum over all lanes and
integer addition is associative and commutative regardless of how the lanes
are grouped.
Is there anything special that we need to consider?

> 
> Can you make sure to add at least one SLP reduction example to show
> this works for SLP as well?
OK. The patches contain the cases for an SLP reduction chain. Will add one for
SLP reduction; this should be a negative case.

Thanks,
Feng


[COMMITTED] ggc: Reduce GGC_QUIRE_SIZE on Solaris/SPARC [PR115031]

2024-05-30 Thread Rainer Orth
g++.dg/modules/pr99023_b.X currently FAILs on 32-bit Solaris/SPARC:

FAIL: g++.dg/modules/pr99023_b.X -std=c++2a  1 blank line(s) in output
FAIL: g++.dg/modules/pr99023_b.X -std=c++2a (test for excess errors)

Excess errors:
cc1plus: out of memory allocating 1048344 bytes after a total of 7913472 bytes

It turns out that this exhaustion of the 32-bit address space happens
due to a combination of three issues:

* the SPARC pagesize of 8 kB,

* ggc-page.cc's chunk size of 512 * pagesize, i.e. 4 MB, and

* mmap adding two 8 kB unmapped red-zone pages to each mapping

which result in the 4 MB mappings actually consuming 4.5 MB of address
space.

To avoid this, this patch reduces the chunk size so it remains at 4 MB
even when combined with the red-zone pages, as recommended by mmap(2).
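
For concreteness, with the 8 kB pagesize the new value works out as:

  510 pages * 8 kB            = 4080 kB
  + 2 red-zone pages * 8 kB   =   16 kB
                                -------
                                4096 kB = 4 MB

so each chunk plus its red-zone pages again fits in exactly 4 MB of address
space.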

Tested on sparc-sun-solaris2.11 and sparcv9-sun-solaris2.11, committed
to trunk.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-05-29  Rainer Orth  

gcc:
PR c++/115031
* config/sparc/sol2.h (GGC_QUIRE_SIZE): Define as 510.

# HG changeset patch
# Parent  5c140588dce73c5358387df3bb9597de45fa5524
ggc: Reduce GGC_QUIRE_SIZE on 32-bit Solaris/SPARC [PR115031]

diff --git a/gcc/config/sparc/sol2.h b/gcc/config/sparc/sol2.h
--- a/gcc/config/sparc/sol2.h
+++ b/gcc/config/sparc/sol2.h
@@ -38,6 +38,9 @@ along with GCC; see the file COPYING3.  
 #undef SPARC_DEFAULT_CMODEL
 #define SPARC_DEFAULT_CMODEL CM_MEDMID
 
+/* Reduce ggc-page.cc's chunk size to account for mmap red-zone pages.  */
+#define GGC_QUIRE_SIZE 510
+
 /* Select a format to encode pointers in exception handling data.  CODE
is 0 for data, 1 for code labels, 2 for function pointers.  GLOBAL is
true if the symbol may be affected by dynamic relocations.


Re: [PATCH 00/11] AArch64/OpenMP: Test SVE ACLE types with various OpenMP constructs.

2024-05-30 Thread Richard Sandiford
Tejas Belagod  writes:
> Note: This patch series is based on Richard's initial patch
>   https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606741.html
> and Jakub's suggestion
>   https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611892.html
>
> The following patch series handles various scenarios with OpenMP and SVE 
> types.
> The starting point for the series follows a suggestion from Jakub to cover 
> all 
> the possible scenarios that could arise when OMP constructs/clauses etc are 
> used with SVE ACLE types. Here are a few instances that this patch series 
> tests
> and in some cases fixes the expected output.  This patch series does not 
> follow
> a formal definition or a spec of how OMP interacts with SVE ACLE types, so 
> it's 
> more of a proposed behaviour.  Comments and discussion welcome.

Thanks for doing this.  I've left some comments on individual patches,
but generally the series looks good from my limited ability to evaluate it.
Hopefully Jakub can say whether this catches all the cases that matter.

Richard

> This list is not exhaustive, but covers most scenarios of how SVE ACLE types
> ought to interact with OMP constructs/clauses.
>
> 1. Poly-int structures that represent variable-sized objects and OMP runtime.
>
> Currently poly-int type structures are passed by value to OpenMP runtime
> functions for shared clauses etc.  This patch improves on this by passing
> around poly-int structures by address to avoid copy-overhead.
>
> 2. SVE ACLE types in OMP Shared clauses.
>
> We test the behaviour where SVE ACLE type objects are shared in the following
> methods into an OMP region:
>   a. Explicit Shared clause on SVE ACLE type objects.
>   b. Implicit shared clause.
>   c. Implicit shared with default clause.
>   d. SVE ACLE types in the presence of predetermined (static) shared objects.
>
> The associated tests ensure that all such shared objects are passed by address
> into the OMP runtime.  There are runtime tests to verify the functional
> correctness of the change.
>
> 3. Offloading and SVE ACLE types.
>
> The target clause in OpenMP is used to offload loop kernels to accelerator
> peripherals.  target's 'map' clause is used to move data from and to the 
> accelerator.  When the data is an SVE type, it may not be suitable for
> various reasons, e.g. the two SVE targets may not agree on the vector size,
> or some targets don't support a variable vector size.  This makes SVE
> unsuitable for use in OMP's 'map' clause.  We diagnose all such cases and
> issue errors where appropriate.  The cases we cover in this patch are:
>
>   a. Implicitly-mapped SVE ACLE types in OMP target regions are diagnosed.
>   b. Explicitly-mapped SVE ACLE types in OMP target regions using map clause
>  are diagnosed.
>   c. Explicitly-mapped SVE ACLE types of various directions - to, from, 
> tofrom
>  in the map clause are diagnosed.
>   d. target enter and exit data clauses with map on SVE ACLE types are 
>  diagnosed.
>   e. target data map with alloc on SVE ACLE types are diagnosed.
>   f. target update from clause on SVE ACLE types are diagnosed.
>   g. target private firstprivate with SVE ACLE types are diagnosed.
>   h. All combinations of target with work-sharing constructs like parallel,
>  loop, simd, teams, distribute etc are also diagnosed when SVE ACLE types
>  are involved.
>
> 3. Lastprivate and SVE ACLE types.
>
> Various OpenMP lastprivate clause scenarios with SVE object types are 
> diagnosed.  Worksharing constructs like sections, for, distribute bind to an
> implicit outer parallel region in whose scope SVE ACLE types are declared and 
> are therefore default private.  The lastprivate clause list with SVE ACLE type
> object items are diagnosed in this scenario.
>
> 4. Threadprivate on SVE ACLE type objects.
>
> We ensure threadprivate SVE ACLE type objects are supported. We also ensure
> the copyin clause is supported.
>
> 5. User-Defined Reductions on SVE ACLE types.
>
> We define a reduction using OMP declare reduction using SVE ACLE intrinsics 
> and
> ensure its functional correctness with various work-sharing constructs like
> for, simd, parallel, task, taskloop.
>
> 6. Uniform and Aligned Clause with SVE ACLE
>
> We ensure the uniform clause's functional correctness with simd construct and
> associated SVE ACLE intrinsics in the simd region.  There is no direct
> interaction between uniform and SVE ACLE type objects, but we ensure the 
> uniform
> clause applies correctly to a region where SVE ACLE intrinsics are present.
> Similarly for the aligned clause.
>
> 7. Linear clause and SVE ACLE type.
>
> We diagnose if a linear clause list item has SVE ACLE type objects present.
> It doesn't mean much if the linear clause is applied to SVE ACLE types.
>
> 8. Depend clause and SVE ACLE objects.
>
> We test the functional correctness of many combinations of dependencies of shared
> SVE ACLE type objects in parallel regions.  We test if in, out dependencies 
> 

Re: [PATCH 01/11] OpenMP/PolyInt: Pass poly-int structures by address to OMP libs.

2024-05-30 Thread Richard Sandiford
Tejas Belagod  writes:
> Currently poly-int type structures are passed by value to OpenMP runtime
> functions for shared clauses etc.  This patch improves on this by passing
> around poly-int structures by address to avoid copy-overhead.
>
> gcc/ChangeLog
>   * omp-low.c (use_pointer_for_field): Use pointer if the OMP data
>   structure's field type is a poly-int.
> ---
>  gcc/omp-low.cc | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
> index 1a65229cc37..b15607f4ef5 100644
> --- a/gcc/omp-low.cc
> +++ b/gcc/omp-low.cc
> @@ -466,7 +466,8 @@ static bool
>  use_pointer_for_field (tree decl, omp_context *shared_ctx)
>  {
>if (AGGREGATE_TYPE_P (TREE_TYPE (decl))
> -  || TYPE_ATOMIC (TREE_TYPE (decl)))
> +  || TYPE_ATOMIC (TREE_TYPE (decl))
> +  || POLY_INT_CST_P (DECL_SIZE (decl)))
>  return true;
>  
>/* We can only use copy-in/copy-out semantics for shared variables

Realise this is also true of my original patch, but:

I suppose a question here is whether this function is only ever used for
local interfaces between code generated by the same source code function,
or whether it's ABI in a more general sense.  If the latter, I suppose
we should make sure to handle ACLE types the same way regardless of
whether the SVE vector size is known.

(At the moment, the vector size is fixed for a TU, not just a function,
but we should probably plan for relaxing that in future.)

Thanks,
Richard


Re: [PATCH 03/11] AArch64: Diagnose OpenMP offloading when SVE types involved.

2024-05-30 Thread Richard Sandiford
Tejas Belagod  writes:
> The target clause in OpenMP is used to offload loop kernels to accelerator
> peripherals.  target's 'map' clause is used to move data from and to the
> accelerator.  When the data is an SVE type, it may not be suitable for
> various reasons, e.g. the two SVE targets may not agree on vector size or
> some targets don't support variable vector size.  This makes SVE unsuitable
> for use in OMP's 'map' clause.  This patch diagnoses all such cases and issues
> an error where SVE types are not suitable.
>
> Co-authored-by: Andrea Corallo 
>
> gcc/ChangeLog:
>
>   * target.h (type_context_kind): Add new context kinds for target 
> clauses.
>   * config/aarch64/aarch64-sve-builtins.cc (verify_type_context): Diagnose
>   SVE types for a given OpenMP context.
>   * gimplify.cc (omp_notice_variable):  Diagnose implicitly-mapped SVE
>   objects in OpenMP regions.
>   (gimplify_scan_omp_clauses): Diagnose SVE types for various target
>   clauses.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/sve/omp/offload-1.c: New test.
>   * gcc.target/aarch64/sve/omp/offload-2.c: Likewise.
>   * gcc.target/aarch64/sve/omp/offload-parallel-loop.c: Likewise.
>   * gcc.target/aarch64/sve/omp/offload-parallel.c: Likewise.
>   * gcc.target/aarch64/sve/omp/offload-simd.c: Likewise.
>   * gcc.target/aarch64/sve/omp/offload-teams-distribute-simd.c: Likewise.
>   * gcc.target/aarch64/sve/omp/offload-teams-distribute.c: Likewise.
>   * gcc.target/aarch64/sve/omp/offload-teams-loop.c: Likewise.
>   * gcc.target/aarch64/sve/omp/offload-teams.c: Likewise.
>   * gcc.target/aarch64/sve/omp/target-device.c: Likewise.
>   * gcc.target/aarch64/sve/omp/target-link.c: Likewise.
> ---
>  gcc/config/aarch64/aarch64-sve-builtins.cc|  31 +++
>  gcc/gimplify.cc   |  34 ++-
>  gcc/target.h  |  19 +-
>  .../gcc.target/aarch64/sve/omp/offload-1.c| 237 ++
>  .../gcc.target/aarch64/sve/omp/offload-2.c| 198 +++
>  .../aarch64/sve/omp/offload-parallel-loop.c   | 236 +
>  .../aarch64/sve/omp/offload-parallel.c| 195 ++
>  .../gcc.target/aarch64/sve/omp/offload-simd.c | 236 +
>  .../sve/omp/offload-teams-distribute-simd.c   | 237 ++
>  .../sve/omp/offload-teams-distribute.c| 236 +
>  .../aarch64/sve/omp/offload-teams-loop.c  | 237 ++
>  .../aarch64/sve/omp/offload-teams.c   | 195 ++
>  .../aarch64/sve/omp/target-device.c   |  97 +++
>  .../gcc.target/aarch64/sve/omp/target-link.c  |  48 
>  14 files changed, 2234 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/offload-1.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/offload-2.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve/omp/offload-parallel-loop.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve/omp/offload-parallel.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/offload-simd.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve/omp/offload-teams-distribute-simd.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve/omp/offload-teams-distribute.c
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve/omp/offload-teams-loop.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/offload-teams.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/target-device.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/omp/target-link.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index f3983a123e3..ee1064c3bb7 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -5000,6 +5000,29 @@ bool
>  verify_type_context (location_t loc, type_context_kind context,
>const_tree type, bool silent_p)
>  {
> +  if (aarch64_sve::builtin_type_p (type)
> +  || (POINTER_TYPE_P (type)
> +   && aarch64_sve::builtin_type_p (TREE_TYPE (type

Could you say in more detail why we check for zero or one level
of pointer indirection but not for more?

Also, was there a reason for checking builtin_type_p rather than
sizeless_type_p?  Things like svbool_t remain sizeless even for
-msve-vector-bits=128 etc., so sizeless_type_p would still cover
that case.  But arm_sve_vector_bits makes it possible to define
fixed-length vector types that are treated for ABI & ACLE purposes
like SVE types.  I don't think those should be treated differently
from normal vectors by omp, since the size is fixed by the attribute
(and types with different attributes are distinct).

Thanks,
Richard

> +switch (context)
> +{
> +  case TCTX_OMP_MAP:
> + error_at (loc, "SVE type %qT not allowed in 

[pushed] Add new text_art::tree_widget and use it in analyzer

2024-05-30 Thread David Malcolm
This patch adds a new text_art::tree_widget, which makes it easy
to generate hierarchical visualizations using either ASCII:

  +- Child 0
  |  +- Grandchild 0 0
  |  +- Grandchild 0 1
  |  `- Grandchild 0 2
  +- Child 1
  |  +- Grandchild 1 0
  |  +- Grandchild 1 1
  |  `- Grandchild 1 2
  `- Child 2
 +- Grandchild 2 0
 +- Grandchild 2 1
 `- Grandchild 2 2

or Unicode:

  Root
  ├─ Child 0
  │  ├─ Grandchild 0 0
  │  ├─ Grandchild 0 1
  │  ╰─ Grandchild 0 2
  ├─ Child 1
  │  ├─ Grandchild 1 0
  │  ├─ Grandchild 1 1
  │  ╰─ Grandchild 1 2
  ╰─ Child 2
 ├─ Grandchild 2 0
 ├─ Grandchild 2 1
 ╰─ Grandchild 2 2

potentially with colorization of the connecting lines.

It adds a new template for typename T:

  void text_art::dump (const T&);

for using this to dump any object to stderr that supports a
make_dump_widget method, with similar templates for dumping to
a pretty_printer * and a FILE *.

It uses this within the analyzer to add two new families of dumping
methods: one for program states, e.g.:

(gdb) call state->dump()
State
├─ Region Model
│  ├─ Current Frame: frame: ‘calls_malloc’@2
│  ├─ Store
│  │  ├─ m_called_unknown_fn: false
│  │  ├─ frame: ‘test’@1
│  │  │  ╰─ _1: (INIT_VAL(n_2(D))*(size_t)4)
│  │  ╰─ frame: ‘calls_malloc’@2
│  │ ├─ result_4: &HEAP_ALLOCATED_REGION(27)
│  │ ╰─ _5: &HEAP_ALLOCATED_REGION(27)
│  ╰─ Dynamic Extents
│ ╰─ HEAP_ALLOCATED_REGION(27): (INIT_VAL(n_2(D))*(size_t)4)
╰─ ‘malloc’ state machine
   ╰─ 0x468cb40: &HEAP_ALLOCATED_REGION(27): unchecked ({free}) (‘result_4’)

and the other for showing the detail of the recursive makeup of svalues
and regions, e.g. the (INIT_VAL(n_2(D))*(size_t)4) from above:

(gdb) call size_in_bytes->dump()
(17): ‘long unsigned int’: binop_svalue(mult_expr: ‘*’)
├─ (15): ‘size_t’: initial_svalue
│  ╰─ m_reg: (12): ‘size_t’: decl_region(‘n_2(D)’)
│ ╰─ parent: (9): frame_region(‘test’, index: 0, depth: 1)
│╰─ parent: (1): stack region
│   ╰─ parent: (0): root region
╰─ (16): ‘size_t’: constant_svalue (‘4’)

I've already found both of these useful when debugging analyzer issues.

The patch uses the former to update the output of
-fdump-analyzer-exploded-nodes-2 and
-fdump-analyzer-exploded-nodes-3.

The older dumping functions within the analyzer are retained in case
they turn out to still be useful for debugging.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-925-g97238e4217d5cd.

gcc/ChangeLog:
* Makefile.in (OBJS-libcommon): Add text-art/tree-widget.o.
* doc/analyzer.texi: Rewrite discussion of dumping state to
cover the text_art::tree_widget-based dumps, with a more
interesting example.
* text-art/dump-widget-info.h: New file.
* text-art/dump.h: New file.
* text-art/selftests.cc (selftest::text_art_tests): Call
text_art_tree_widget_cc_tests.
* text-art/selftests.h (selftest::text_art_tree_widget_cc_tests):
New decl.
* text-art/theme.cc (ascii_theme::get_cppchar): Handle the various
cell_kind::TREE_*.
(unicode_theme::get_cppchar): Likewise.
* text-art/theme.h (enum class theme::cell_kind): Add
TREE_CHILD_NON_FINAL, TREE_CHILD_FINAL, TREE_X_CONNECTOR, and
TREE_Y_CONNECTOR.
* text-art/tree-widget.cc: New file.

gcc/analyzer/ChangeLog:
* call-details.cc: Define INCLUDE_VECTOR.
* call-info.cc: Likewise.
* call-summary.cc: Likewise.
* checker-event.cc: Likewise.
* checker-path.cc: Likewise.
* complexity.cc: Likewise.
* constraint-manager.cc: Likewise.
(bounded_range::make_dump_widget): New.
(bounded_ranges::add_to_dump_widget): New.
(equiv_class::make_dump_widget): New.
(constraint::make_dump_widget): New.
(bounded_ranges_constraint::make_dump_widget): New.
(constraint_manager::make_dump_widget): New.
* constraint-manager.h (bounded_range::make_dump_widget): New
decl.
(bounded_ranges::add_to_dump_widget): New decl.
(equiv_class::make_dump_widget): New decl.
(constraint::make_dump_widget): New decl.
(bounded_ranges_constraint::make_dump_widget): New decl.
(constraint_manager::make_dump_widget): New decl.
* diagnostic-manager.cc: Define INCLUDE_VECTOR.
* engine.cc: Likewise.  Include "text-art/dump.h".
(setjmp_svalue::print_dump_widget_label): New.
(setjmp_svalue::add_dump_widget_children): New.
(exploded_graph::dump_exploded_nodes): Use text_art::dump_to_file
for -fdump-analyzer-exploded-nodes-2 and
-fdump-analyzer-exploded-nodes-3.  Fix overlong line.
* feasible-graph.cc: Define INCLUDE_VECTOR.
* infinite-recursion.cc: Likewise.
* kf-analyzer.cc: Likewise.
* kf-lang-cp.cc: Likewise.
* kf.cc: Likewise.
* known-function-manager.cc: Likewise.
* pending-diagnost

[pushed] analyzer: fix a -Wunused-parameter

2024-05-30 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-926-g0b3a3a66eb816b.

gcc/analyzer/ChangeLog:
* infinite-loop.cc (looping_back_event::get_desc): Fix unused
parameter warning introduced by me in r15-636-g770657d02c986c.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/infinite-loop.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/analyzer/infinite-loop.cc b/gcc/analyzer/infinite-loop.cc
index 04346cdfdc3c..a83b8130e43c 100644
--- a/gcc/analyzer/infinite-loop.cc
+++ b/gcc/analyzer/infinite-loop.cc
@@ -171,7 +171,7 @@ public:
   {
   }
 
-  label_text get_desc (bool can_colorize) const final override
+  label_text get_desc (bool) const final override
   {
 return label_text::borrow ("looping back...");
   }
-- 
2.26.3



Re: [PATCH 02/11] AArch64: Add test cases for SVE types in OpenMP shared clause.

2024-05-30 Thread Richard Sandiford
Tejas Belagod  writes:
> This patch tests various shared clauses with SVE types.  It also adds a test
> scaffold to run OpenMP tests under the gcc.target testsuite.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/sve/omp/aarch64-sve-omp.exp: New scaffold.

Hopefully Jakub can comment on whether we should test this in the
GCC testsuite or libgomp testsuite.

On the test:

> [...]
> +int
> +main ()
> +{
> +  svint32_t x = svindex_s32 (0 ,1);
> +  svint32_t y = svindex_s32 (8, 1);
> +  svint32_t a, b;
> +  svbool_t p;
> +
> +  /* Implicit shared.  */
> +  a = foo (x, y, p);
> +  b = implicit_shared_default (x, y, p);

It looks like p is used uninitialised here.  Can you check locally
that using svptrue_b8 () (or whatever) as an initialiser allows the
test to pass while svpfalse_b () causes it to fail?

Thanks,
Richard

> +  compare_vec (a, b);
> +
> +  /* Explicit shared.  */
> +  a = foo (x ,y, p);
> +  b = explicit_shared (x, y, p);
> +  compare_vec (a, b);
> +
> +  /* Implicit shared with no default clause.  */
> +  a = foo (x ,y, p);
> +  b = implicit_shared_no_default (x, y, p);
> +  compare_vec (a, b);
> +
> +  /* Mix shared.  */
> +  a = foo (x ,y, p);
> +  b = mix_shared (y, p);
> +  compare_vec (a, b);
> +
> +  /* Predetermined shared.  */
> +  predetermined_shared_static (true);
> +  predetermined_shared_static (false);
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "value-expr: \*.omp_data_i->a" 10 
> "ompexp" } } */


Re: [Patch, aarch64, middle-end\ v4: Move pair_fusion pass from aarch64 to middle-end

2024-05-30 Thread Ajit Agarwal
Hello Richard:

On 30/05/24 4:44 pm, Richard Sandiford wrote:
> Thanks for the update.  Some comments below, but looks very close
> to ready.
> 

Thanks a lot.

> Ajit Agarwal  writes:
>> diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
>> new file mode 100644
>> index 000..060fd95
>> --- /dev/null
>> +++ b/gcc/pair-fusion.cc
>> @@ -0,0 +1,3012 @@
>> +// Pass to fuse adjacent loads/stores into paired memory accesses.
>> +// Copyright (C) 2024 Free Software Foundation, Inc.
> 
> This should probably be 2023-2024, since it's based on code
> contributed in 2023.
> 

Addressed in v5 of the patch.

>> +//
>> +// This file is part of GCC.
>> +//
>> +// GCC is free software; you can redistribute it and/or modify it
>> +// under the terms of the GNU General Public License as published by
>> +// the Free Software Foundation; either version 3, or (at your option)
>> +// any later version.
>> +//
>> +// GCC is distributed in the hope that it will be useful, but
>> +// WITHOUT ANY WARRANTY; without even the implied warranty of
>> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +// General Public License for more details.
>> +//
>> +// You should have received a copy of the GNU General Public License
>> +// along with GCC; see the file COPYING3.  If not see
>> +// .
>> +
>> +#define INCLUDE_ALGORITHM
>> +#define INCLUDE_FUNCTIONAL
>> +#define INCLUDE_LIST
>> +#define INCLUDE_TYPE_TRAITS
>> +#include "config.h"
>> +#include "system.h"
>> +#include "coretypes.h"
>> +#include "backend.h"
>> +#include "rtl.h"
>> +#include "df.h"
>> +#include "rtl-iter.h"
>> +#include "rtl-ssa.h"
>> +#include "cfgcleanup.h"
>> +#include "tree-pass.h"
>> +#include "ordered-hash-map.h"
>> +#include "tree-dfa.h"
>> +#include "fold-const.h"
>> +#include "tree-hash-traits.h"
>> +#include "print-tree.h"
>> +#include "pair-fusion.h"
>> +
>> +using namespace rtl_ssa;
>> +
>> +// We pack these fields (load_p, fpsimd_p, and size) into an integer
>> +// (LFS) which we use as part of the key into the main hash tables.
>> +//
>> +// The idea is that we group candidates together only if they agree on
>> +// the fields below.  Candidates that disagree on any of these
>> +// properties shouldn't be merged together.
>> +struct lfs_fields
>> +{
>> +  bool load_p;
>> +  bool fpsimd_p;
>> +  unsigned size;
>> +};
>> +
>> +using insn_list_t = std::list;
>> +
>> +// Information about the accesses at a given offset from a particular
>> +// base.  Stored in an access_group, see below.
>> +struct access_record
>> +{
>> +  poly_int64 offset;
>> +  std::list cand_insns;
>> +  std::list::iterator place;
>> +
>> +  access_record (poly_int64 off) : offset (off) {}
>> +};
>> +
>> +// A group of accesses where adjacent accesses could be ldp/stp
>> +// candidates.  The splay tree supports efficient insertion,
>> +// while the list supports efficient iteration.
>> +struct access_group
>> +{
>> +  splay_tree tree;
>> +  std::list list;
>> +
>> +  template
>> +  inline void track (Alloc node_alloc, poly_int64 offset, insn_info *insn);
>> +};
>> +
>> +// Test if this base candidate is viable according to HAZARDS.
>> +bool base_cand::viable () const
> 
> Formating nit, should be:
> 
> bool
> base_cand::viable () const
>

Addressed in v5 of the patch.

 
>> +{
>> +  return !hazards[0] || !hazards[1] || (*hazards[0] > *hazards[1]);
>> +}
>> [...]
>> +void
>> +pair_fusion_bb_info::transform ()
>> +{
>> +  traverse_base_map (expr_map);
>> +  traverse_base_map (def_map);
>> +}
>> +
>> +// the base register which we can fold in to make this pair use
>> +// a writeback addressing mode.
> 
> The first line of this comment is missing.  It should be:
> 
> // Given an existing pair insn INSN, look for a trailing update of
> 

Addressed in v5 of the patch.

>> [...]
>> diff --git a/gcc/pair-fusion.h b/gcc/pair-fusion.h
>> new file mode 100644
>> index 000..f295fdbdb8f
>> --- /dev/null
>> +++ b/gcc/pair-fusion.h
>> @@ -0,0 +1,195 @@
>> +// Pass to fuse adjacent loads/stores into paired memory accesses.
>> +//
>> +// This file contains the definition of the virtual base class which is
>> +// overriden by targets that make use of the pass.
>> +//
>> +// Copyright (C) 2024 Free Software Foundation, Inc.
> 
> 2023-2024 here too
>

Addressed in v5 of the patch.
 
>> +//
>> +// This file is part of GCC.
>> +//
>> +// GCC is free software; you can redistribute it and/or modify it
>> +// under the terms of the GNU General Public License as published by
>> +// the Free Software Foundation; either version 3, or (at your option)
>> +// any later version.
>> +//
>> +// GCC is distributed in the hope that it will be useful, but
>> +// WITHOUT ANY WARRANTY; without even the implied warranty of
>> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +// General Public License for more details.
>> +//
>> +// You should have received a copy of the GNU General Public License
>> +// along with GCC; see the file 

[PATCH v10 3/5] Use the .ACCESS_WITH_SIZE in builtin object size.

2024-05-30 Thread Qing Zhao
gcc/ChangeLog:

* tree-object-size.cc (access_with_size_object_size): New function.
(call_object_size): Call the new function.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-object-size-common.h: Add a new macro EXPECT.
* gcc.dg/flex-array-counted-by-3.c: New test.
* gcc.dg/flex-array-counted-by-4.c: New test.
* gcc.dg/flex-array-counted-by-5.c: New test.
---
 .../gcc.dg/builtin-object-size-common.h   |  11 ++
 .../gcc.dg/flex-array-counted-by-3.c  |  63 +++
 .../gcc.dg/flex-array-counted-by-4.c  | 178 ++
 .../gcc.dg/flex-array-counted-by-5.c  |  48 +
 gcc/tree-object-size.cc   |  60 ++
 5 files changed, 360 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-5.c

diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-common.h 
b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
index 66ff7cdd953a..b677067c6e6b 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-common.h
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
@@ -30,3 +30,14 @@ unsigned nfails = 0;
   __builtin_abort ();\
 return 0;\
   } while (0)
+
+#define EXPECT(p, _v) do {   \
+  size_t v = _v; \
+  if (p == v)\
+__builtin_printf ("ok:  %s == %zd\n", #p, p);\
+  else   \
+{\
+  __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v);\
+  FAIL ();   \
+}\
+} while (0);
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
new file mode 100644
index ..78f50230e891
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
@@ -0,0 +1,63 @@
+/* Test the attribute counted_by and its usage in
+ * __builtin_dynamic_object_size.  */ 
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+
+struct flex {
+  int b;
+  int c[];
+} *array_flex;
+
+struct annotated {
+  int b;
+  int c[] __attribute__ ((counted_by (b)));
+} *array_annotated;
+
+struct nested_annotated {
+  struct {
+union {
+  int b;
+  float f; 
+};
+int n;
+  };
+  int c[] __attribute__ ((counted_by (b)));
+} *array_nested_annotated;
+
+void __attribute__((__noinline__)) setup (int normal_count, int attr_count)
+{
+  array_flex
+= (struct flex *)malloc (sizeof (struct flex)
++ normal_count *  sizeof (int));
+  array_flex->b = normal_count;
+
+  array_annotated
+= (struct annotated *)malloc (sizeof (struct annotated)
+ + attr_count *  sizeof (int));
+  array_annotated->b = attr_count;
+
+  array_nested_annotated
+= (struct nested_annotated *)malloc (sizeof (struct nested_annotated)
++ attr_count *  sizeof (int));
+  array_nested_annotated->b = attr_count;
+
+  return;
+}
+
+void __attribute__((__noinline__)) test ()
+{
+EXPECT(__builtin_dynamic_object_size(array_flex->c, 1), -1);
+EXPECT(__builtin_dynamic_object_size(array_annotated->c, 1),
+  array_annotated->b * sizeof (int));
+EXPECT(__builtin_dynamic_object_size(array_nested_annotated->c, 1),
+  array_nested_annotated->b * sizeof (int));
+}
+
+int main(int argc, char *argv[])
+{
+  setup (10,10);   
+  test ();
+  DONE ();
+}
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
new file mode 100644
index ..20103d58ef51
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
@@ -0,0 +1,178 @@
+/* Test the attribute counted_by and its usage in
+__builtin_dynamic_object_size: what's the correct behavior when the
+allocation size mismatched with the value of counted_by attribute?
+We should always use the latest value that is hold by the counted_by
+field.  */
+/* { dg-do run } */
+/* { dg-options "-O -fstrict-flex-arrays=3" } */
+
+#include "builtin-object-size-common.h"
+
+struct annotated {
+  size_t foo;
+  char others;
+  char array[] __attribute__((counted_by (foo)));
+};
+
+#define noinline __attribute__((__noinline__))
+#define SIZE_BUMP 10 
+#define MAX(a, b) ((a) > (b) ? (a) : (b))
+
+/* In general, Due to type casting, the type for the pointee of a pointer
+   does not say anyt

[PATCH v10 5/5] Add the 6th argument to .ACCESS_WITH_SIZE

2024-05-30 Thread Qing Zhao
to carry the TYPE of the flexible array.

Such information is needed by tree-object-size.cc.

We cannot use the result type or the type of the 1st argument
of the routine .ACCESS_WITH_SIZE to decide the element type
of the original array due to possible type casting in the
source code.
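
For instance (a trimmed sketch of the problem; the full test is
flex-array-counted-by-6.c below, whose struct and field names are reused
here for illustration):

  #include <stddef.h>

  struct info {
    unsigned short data_len;
    char data[] __attribute__ ((counted_by (data_len)));
  };

  struct foo { int a; int b; };

  size_t
  query (struct info *p)
  {
    /* The cast means later code sees 'struct foo *' rather than the
       original 'char[]' flexible array, so the element type of the
       counted_by array has to be carried explicitly (the new 6th
       argument of .ACCESS_WITH_SIZE).  */
    struct foo *bar = (struct foo *) p->data;
    return __builtin_dynamic_object_size (bar + 1, 1);
  }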

gcc/c/ChangeLog:

* c-typeck.cc (build_access_with_size_for_counted_by): Add the 6th
argument to .ACCESS_WITH_SIZE.

gcc/ChangeLog:

* tree-object-size.cc (access_with_size_object_size): Use the type
of the 6th argument for the type of the element.

gcc/testsuite/ChangeLog:

* gcc.dg/flex-array-counted-by-6.c: New test.
---
 gcc/c/c-typeck.cc | 11 +++--
 gcc/internal-fn.cc|  2 +
 .../gcc.dg/flex-array-counted-by-6.c  | 46 +++
 gcc/tree-object-size.cc   | 16 ---
 4 files changed, 66 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-6.c

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 0d9c7a34a0df..efd111305b5a 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -2646,7 +2646,8 @@ build_counted_by_ref (tree datum, tree subdatum, tree 
*counted_by_type)
 
to:
 
-   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1))
+   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1,
+   (TYPE_OF_ARRAY *)0))
 
NOTE: The return type of this function is the POINTER type pointing
to the original flexible array type.
@@ -2658,6 +2659,9 @@ build_counted_by_ref (tree datum, tree subdatum, tree 
*counted_by_type)
The 4th argument of the call is a constant 0 with the TYPE of the
object pointed by COUNTED_BY_REF.
 
+   The 6th argument of the call is a constant 0 with the pointer TYPE
+   to the original flexible array type.
+
   */
 static tree
 build_access_with_size_for_counted_by (location_t loc, tree ref,
@@ -2670,12 +2674,13 @@ build_access_with_size_for_counted_by (location_t loc, 
tree ref,
 
   tree call
 = build_call_expr_internal_loc (loc, IFN_ACCESS_WITH_SIZE,
-   result_type, 5,
+   result_type, 6,
array_to_pointer_conversion (loc, ref),
counted_by_ref,
build_int_cst (integer_type_node, 1),
build_int_cst (counted_by_type, 0),
-   build_int_cst (integer_type_node, -1));
+   build_int_cst (integer_type_node, -1),
+   build_int_cst (result_type, 0));
   /* Wrap the call with an INDIRECT_REF with the flexible array type.  */
   call = build1 (INDIRECT_REF, TREE_TYPE (ref), call);
   SET_EXPR_LOCATION (call, loc);
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index eb2c4cd59048..0d27f17b2834 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3456,6 +3456,8 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
  1: read_only
  2: write_only
  3: read_write
+   6th argument: A constant 0 with the pointer TYPE to the original flexible
+ array type.
 
Both the return type and the type of the first argument of this
function have been converted from the incomplete array type to
diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c 
b/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c
new file mode 100644
index ..65fa01443d95
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c
@@ -0,0 +1,46 @@
+/* Test the attribute counted_by and its usage in
+ * __builtin_dynamic_object_size: when the type of the flexible array member
+ * is casting to another type.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "builtin-object-size-common.h"
+
+typedef unsigned short u16;
+
+struct info {
+   u16 data_len;
+   char data[] __attribute__((counted_by(data_len)));
+};
+
+struct foo {
+   int a;
+   int b;
+};
+
+static __attribute__((__noinline__))
+struct info *setup ()
+{
+ struct info *p;
+ size_t bytes = 3 * sizeof(struct foo);
+
+ p = (struct info *)malloc (sizeof (struct info) + bytes);
+ p->data_len = bytes;
+
+ return p;
+}
+
+static void
+__attribute__((__noinline__)) report (struct info *p)
+{
+ struct foo *bar = (struct foo *)p->data;
+ EXPECT(__builtin_dynamic_object_size((char *)(bar + 1), 1), 16);
+ EXPECT(__builtin_dynamic_object_size((char *)(bar + 2), 1), 8);
+}
+
+int main(int argc, char *argv[])
+{
+ struct info *p = setup();
+ report(p);
+ return 0;
+}
diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index 8de264d1dee2..4c1fa9b555fa 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -762,9 +762,11 @@ addr_object_size (struct object_size_info *osi, const_tree 
ptr,
  1: the number of the elements of the object type;
4th argument TYPE_O

[PATCH v10 4/5] Use the .ACCESS_WITH_SIZE in bound sanitizer.

2024-05-30 Thread Qing Zhao
gcc/c-family/ChangeLog:

* c-ubsan.cc (get_bound_from_access_with_size): New function.
(ubsan_instrument_bounds): Handle call to .ACCESS_WITH_SIZE.

gcc/testsuite/ChangeLog:

* gcc.dg/ubsan/flex-array-counted-by-bounds-2.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds-3.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds-4.c: New test.
* gcc.dg/ubsan/flex-array-counted-by-bounds.c: New test.
---
 gcc/c-family/c-ubsan.cc   | 42 +
 .../ubsan/flex-array-counted-by-bounds-2.c| 45 ++
 .../ubsan/flex-array-counted-by-bounds-3.c| 34 ++
 .../ubsan/flex-array-counted-by-bounds-4.c| 34 ++
 .../ubsan/flex-array-counted-by-bounds.c  | 46 +++
 5 files changed, 201 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-3.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-4.c
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds.c

diff --git a/gcc/c-family/c-ubsan.cc b/gcc/c-family/c-ubsan.cc
index 940982819ddf..7cd3c6aa5b88 100644
--- a/gcc/c-family/c-ubsan.cc
+++ b/gcc/c-family/c-ubsan.cc
@@ -376,6 +376,40 @@ ubsan_instrument_return (location_t loc)
   return build_call_expr_loc (loc, t, 1, build_fold_addr_expr_loc (loc, data));
 }
 
+/* Get the tree that represented the number of counted_by, i.e, the maximum
+   number of the elements of the object that the call to .ACCESS_WITH_SIZE
+   points to, this number will be the bound of the corresponding array.  */
+static tree
+get_bound_from_access_with_size (tree call)
+{
+  if (!is_access_with_size_p (call))
+return NULL_TREE;
+
+  tree ref_to_size = CALL_EXPR_ARG (call, 1);
+  unsigned int class_of_size = TREE_INT_CST_LOW (CALL_EXPR_ARG (call, 2));
+  tree type = TREE_TYPE (CALL_EXPR_ARG (call, 3));
+  tree size = fold_build2 (MEM_REF, type, unshare_expr (ref_to_size),
+  build_int_cst (ptr_type_node, 0));
+  /* If size is negative value, treat it as zero.  */
+  if (!TYPE_UNSIGNED (type))
+  {
+tree cond = fold_build2 (LT_EXPR, boolean_type_node,
+unshare_expr (size), build_zero_cst (type));
+size = fold_build3 (COND_EXPR, type, cond,
+   build_zero_cst (type), size);
+  }
+
+  /* Only when class_of_size is 1, i.e, the number of the elements of
+ the object type, return the size.  */
+  if (class_of_size != 1)
+return NULL_TREE;
+  else
+size = fold_convert (sizetype, size);
+
+  return size;
+}
+
+
 /* Instrument array bounds for ARRAY_REFs.  We create special builtin,
that gets expanded in the sanopt pass, and make an array dimension
of it.  ARRAY is the array, *INDEX is an index to the array.
@@ -401,6 +435,14 @@ ubsan_instrument_bounds (location_t loc, tree array, tree 
*index,
  && COMPLETE_TYPE_P (type)
  && integer_zerop (TYPE_SIZE (type)))
bound = build_int_cst (TREE_TYPE (TYPE_MIN_VALUE (domain)), -1);
+  else if (INDIRECT_REF_P (array)
+  && is_access_with_size_p ((TREE_OPERAND (array, 0
+   {
+ bound = get_bound_from_access_with_size ((TREE_OPERAND (array, 0)));
+ bound = fold_build2 (MINUS_EXPR, TREE_TYPE (bound),
+  bound,
+  build_int_cst (TREE_TYPE (bound), 1));
+   }
   else
return NULL_TREE;
 }
diff --git a/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c 
b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
new file mode 100644
index ..b503320628d2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ubsan/flex-array-counted-by-bounds-2.c
@@ -0,0 +1,45 @@
+/* Test the attribute counted_by and its usage in
+   bounds sanitizer combined with VLA.  */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+/* { dg-output "index 11 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 20 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 11 out of bounds for type 'int 
\\\[\\\*\\\]\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+/* { dg-output "\[^\n\r]*index 10 out of bounds for type 'int 
\\\[\\\*\\\]'\[^\n\r]*(\n|\r\n|\r)" } */
+
+
+#include 
+
+void __attribute__((__noinline__)) setup_and_test_vla (int n, int m)
+{
+   struct foo {
+   int n;
+   int p[][n] __attribute__((counted_by(n)));
+   } *f;
+
+   f = (struct foo *) malloc (sizeof(struct foo) + m*sizeof(int[n]));
+   f->n = m;
+   f->p[m][n-1]=1;
+   return;
+}
+
+void __attribute__((__noinline__)) setup_and_test_vla_1 (int n1, int n2, int m)
+{
+  struct foo {
+int n;
+int p[][n2][n1] __attribute__((counted_by(n)));
+  } *f;
+
+  f = (struct foo *) malloc (

[PATCH v10 2/5] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.

2024-05-30 Thread Qing Zhao
Including the following changes:
* The definition of the new internal function .ACCESS_WITH_SIZE
  in internal-fn.def.
* C FE converts every reference to a FAM with a "counted_by" attribute
  to a call to the internal function .ACCESS_WITH_SIZE.
  (build_component_ref in c_typeck.cc)

  This includes the case when the object is statically allocated and
  initialized.
  In order to make this work, the routine digest_init in c-typeck.cc
  is updated to fold calls to .ACCESS_WITH_SIZE to its first argument
  when require_constant is TRUE.

  However, for the reference inside "offsetof", the "counted_by" attribute is
  ignored since it's not useful at all.
  (c_parser_postfix_expression in c/c-parser.cc)

  In addition to "offsetof", for references inside the operators "typeof" and
  "alignof", we ignore the counted_by attribute too.

  When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE,
  replace the call with its first argument.

* Convert every call to .ACCESS_WITH_SIZE to its first argument.
  (expand_ACCESS_WITH_SIZE in internal-fn.cc)
* Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
  get the reference from the call to .ACCESS_WITH_SIZE.
  (is_access_with_size_p and get_ref_from_access_with_size in tree.cc)
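
As a user-visible sketch of which references get rewritten (an illustrative
example, not taken from the testsuite):

  #include <stddef.h>

  struct annotated {
    size_t count;
    char array[] __attribute__ ((counted_by (count)));
  };

  size_t
  f (struct annotated *p)
  {
    p->array[0] = 1;     /* reference wrapped in a call to .ACCESS_WITH_SIZE  */
    char *q = p->array;  /* ADDR_EXPR case: the call is replaced by its
                            first argument  */
    (void) q;
    return offsetof (struct annotated, array);  /* counted_by ignored here  */
  }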

gcc/c/ChangeLog:

* c-parser.cc (c_parser_postfix_expression): Ignore the counted-by
attribute when build_component_ref inside offsetof operator.
* c-tree.h (build_component_ref): Add one more parameter.
* c-typeck.cc (build_counted_by_ref): New function.
(build_access_with_size_for_counted_by): New function.
(build_component_ref): Check the counted-by attribute and build
call to .ACCESS_WITH_SIZE.
(build_unary_op): When building ADDR_EXPR for
.ACCESS_WITH_SIZE, use its first argument.
(lvalue_p): Accept call to .ACCESS_WITH_SIZE.
(digest_init): Fold call to .ACCESS_WITH_SIZE to its first
argument when require_constant is TRUE.

gcc/ChangeLog:

* internal-fn.cc (expand_ACCESS_WITH_SIZE): New function.
* internal-fn.def (ACCESS_WITH_SIZE): New internal function.
* tree.cc (is_access_with_size_p): New function.
(get_ref_from_access_with_size): New function.
* tree.h (is_access_with_size_p): New prototype.
(get_ref_from_access_with_size): New prototype.

gcc/testsuite/ChangeLog:

* gcc.dg/flex-array-counted-by-2.c: New test.
---
 gcc/c/c-parser.cc |  10 +-
 gcc/c/c-tree.h|   2 +-
 gcc/c/c-typeck.cc | 142 +-
 gcc/internal-fn.cc|  34 +
 gcc/internal-fn.def   |   5 +
 .../gcc.dg/flex-array-counted-by-2.c  | 112 ++
 gcc/tree.cc   |  22 +++
 gcc/tree.h|   8 +
 8 files changed, 328 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 00f8bf4376e5..2d9e9c0969f0 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -10848,9 +10848,12 @@ c_parser_postfix_expression (c_parser *parser)
if (c_parser_next_token_is (parser, CPP_NAME))
  {
c_token *comp_tok = c_parser_peek_token (parser);
+   /* Ignore the counted_by attribute for reference inside
+  offsetof since the information is not useful at all.  */
offsetof_ref
  = build_component_ref (loc, offsetof_ref, comp_tok->value,
-comp_tok->location, UNKNOWN_LOCATION);
+comp_tok->location, UNKNOWN_LOCATION,
+false);
c_parser_consume_token (parser);
while (c_parser_next_token_is (parser, CPP_DOT)
   || c_parser_next_token_is (parser,
@@ -10877,11 +10880,14 @@ c_parser_postfix_expression (c_parser *parser)
break;
  }
c_token *comp_tok = c_parser_peek_token (parser);
+   /* Ignore the counted_by attribute for reference inside
+  offsetof since the information is not useful.  */
offsetof_ref
  = build_component_ref (loc, offsetof_ref,
 comp_tok->value,
 comp_tok->location,
-UNKNOWN_LOCATION);
+UNKNOWN_LOCATION,
+false);
c_parser_consume_token (parser);
  }
else
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index 531a7e87

[PATCH v10 1/5] Provide counted_by attribute to flexible array member field (PR108896)

2024-05-30 Thread Qing Zhao
'counted_by (COUNT)'
 The 'counted_by' attribute may be attached to the C99 flexible
 array member of a structure.  It indicates that the number of the
 elements of the array is given by the field "COUNT" in the
 same structure as the flexible array member.
 GCC may use this information to improve detection of object size 
information
 for such structures and provide better results in compile-time diagnostics
 and runtime features like the array bound sanitizer and
 the '__builtin_dynamic_object_size'.

 For instance, the following code:

  struct P {
size_t count;
char other;
char array[] __attribute__ ((counted_by (count)));
  } *p;

 specifies that the 'array' is a flexible array member whose number
 of elements is given by the field 'count' in the same structure.

 The field that represents the number of the elements should have an
 integer type.  Otherwise, the compiler reports an error and
 ignores the attribute.
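
 For example, the following is rejected, since the field holding the
 number of elements does not have an integer type (illustrative only):

      struct bad {
        float n;
        int a[] __attribute__ ((counted_by (n)));  /* error; the attribute
                                                       is ignored */
      };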

 When the field that represents the number of the elements is assigned a
 negative integer value, the compiler treats the value as zero.

 An explicit 'counted_by' annotation defines a relationship between
 two objects, 'p->array' and 'p->count', and there are the following
 requirements on the relationship between this pair:

* 'p->count' must be initialized before the first reference to
  'p->array';

* 'p->array' has _at least_ 'p->count' number of elements
  available all the time.  This relationship must hold even
  after any of these related objects are updated during the
  program.

  It's the user's responsibility to make sure the above requirements
  are kept all the time.  Otherwise the compiler reports
  warnings, and at the same time the results of the array bound
  sanitizer and of '__builtin_dynamic_object_size' are undefined.

  One important feature of the attribute is that a reference to the
 flexible array member field uses the latest value assigned to
 the field that represents the number of the elements before that
 reference.  For example,

p->count = val1;
p->array[20] = 0;  // ref1 to p->array
p->count = val2;
p->array[30] = 0;  // ref2 to p->array

 in the above, 'ref1' uses 'val1' as the number of the elements
 in 'p->array', and 'ref2' uses 'val2' as the number of elements
 in 'p->array'.

gcc/c-family/ChangeLog:

PR C/108896
* c-attribs.cc (handle_counted_by_attribute): New function.
(attribute_takes_identifier_p): Add counted_by attribute to the list.
* c-common.cc (c_flexible_array_member_type_p): ...To this.
* c-common.h (c_flexible_array_member_type_p): New prototype.

gcc/c/ChangeLog:

PR C/108896
* c-decl.cc (flexible_array_member_type_p): Renamed and moved to...
(add_flexible_array_elts_to_size): Use renamed function.
(is_flexible_array_member_p): Use renamed function.
(verify_counted_by_attribute): New function.
(finish_struct): Use renamed function and verify counted_by
attribute.
* c-tree.h (lookup_field): New prototype.
* c-typeck.cc (lookup_field): Expose as extern function.
(tagged_types_tu_compatible_p): Check counted_by attribute for
structure type.

gcc/ChangeLog:

PR C/108896
* doc/extend.texi: Document attribute counted_by.

gcc/testsuite/ChangeLog:

PR C/108896
* gcc.dg/flex-array-counted-by.c: New test.
* gcc.dg/flex-array-counted-by-7.c: New test.
* gcc.dg/flex-array-counted-by-8.c: New test.
---
 gcc/c-family/c-attribs.cc |  68 +-
 gcc/c-family/c-common.cc  |  13 ++
 gcc/c-family/c-common.h   |   1 +
 gcc/c/c-decl.cc   |  80 ---
 gcc/c/c-tree.h|   1 +
 gcc/c/c-typeck.cc |  37 -
 gcc/doc/extend.texi   |  68 ++
 .../gcc.dg/flex-array-counted-by-7.c  |   8 ++
 .../gcc.dg/flex-array-counted-by-8.c  | 127 ++
 gcc/testsuite/gcc.dg/flex-array-counted-by.c  |  62 +
 10 files changed, 444 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-7.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-8.c
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by.c

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 04e39b41bdf3..9d562ea8548e 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -105,6 +105,8 @@ static tree handle_warn_if_not_aligned_attribute (tree *, 
tree, tree,
  int, bool *);
 static tree handle_strict_flex_array_attribute (tree

[PATCH v10 0/5] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-05-30 Thread Qing Zhao
Hi,

This is the 10th version of the patch.
Which is rebased on the latest trunk.

Compared with the 9th version, all the differences are in patch #2, including
a small C FE change in the routine "digest_init"; all others are middle-end
changes.

please review the changes for the patch #2:

updates per Richard's comments:

1. In the C FE routine "digest_init" of c-typeck.cc,
   when "require_constant" is TRUE, fold the call to .ACCESS_WITH_SIZE
   to its first argument.

   At the same time, delete the special handling of call to .ACCESS_WITH_SIZE
   in the middle end routines "initializer_constant_valid_p_1" and
   "output_constant" in varasm.cc

2. Add ECF_PURE to the new internal-function .ACCESS_WITH_SIZE in 
internal-fn.def.
   As a result, delete all special handling of calls to .ACCESS_WITH_SIZE in
   the files "tree-ssa-alias.cc" and "tree-ssa-dce.cc" and the routine
   "process_call_operands" of the file "tree.cc".

3. Delete the unnecessary lines from the routine "expand_DEFERRED_INIT" 
   per Richard's suggestion.

Approval status:
   Patch #1, #3, #4, #5 are all approved;
   Patch #2: all C FE changes, except the change to the routine "digest_init"
in c-typeck.cc, are approved.

Review needed:

   Patch #2: Middle end change;
 the change for the routine "digest_init" in C FE. 

The 9th version is here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649389.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649390.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649391.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649392.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649393.html

It based on the following original proposal:

https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635884.html
Represent the missing dependence for the "counted_by" attribute and its 
consumers

Bootstrapped and regression tested on both X86 and Aarch64, no issue.

Okay for trunk?

thanks.

Qing


Re: [PATCH] gimple ssa: Teach switch conversion to optimize powers of 2 switches

2024-05-30 Thread Andrew Pinski
On Thu, May 30, 2024 at 5:09 AM Filip Kastl  wrote:
>
> Hi,
>
> This patch adds a transformation into the switch conversion pass --
> the "exponential index transform".  This transformation can help switch
> conversion convert switches it otherwise could not.  The transformation is
> intended for switches whose cases are all powers of 2.  Here is a more 
> detailed
> description of what and how this patch tries to address:
>
>
> The problem
> ---
>
> Consider this piece of C code
>
> switch (i)
>   {
> case (1 << 0): return 0;
> case (1 << 1): return 1;
> case (1 << 2): return 2;
> ...
> case (1 << 30): return 30;
> default: return 31;
>   }
>
> If i is a power of 2 (2^k), the switch just returns the exponent (k).  This 
> can
> be seen as taking the base 2 logarithm of i or as returning the position of 
> the
> singular 1 bit in i.
>
> Currently, GCC fails to see this and doesn't optimize the switch in any way.
>
> Switch conversion is able to optimize similar switches to an array lookup.
> This is not possible here because the range of cases is too wide.  Another
> optimization that switch conversion is able to do is the "linear
> transformation" -- switch conversion is able to notice a linear relationship
> between the index variable (variable i in the case) and the result value and
> rewrite switch to just an assignment (or multiple assignments in case of
> multiple result values). Sadly, linear transformation isn't applicable here
> because the linear relationship is based on the exponent of i, not on i 
> itself.
>
>
> The solution
> 
>
> The exponential index transformation does the following.  When it recognises
> that a switch only has case numbers that are powers of 2 it replaces them with
> their exponents.  It also replaces the index variable by its exponent.  This 
> is
> done by inserting a statement that takes the logarithm of i and using the
> result as the new index variable.  Actually we use the FFS operation for this
> -- since we expect a power of two, we may just ask for the position of the
> first 1 bit.
>
> We also need to insert a conditional that checks at runtime that the index
> variable is a power of two.  If it isn't, the resulting value should just be
> the default case value (31 in the example above).
>
> With exponential index transform, switch conversion is able to simplify the
> above example into something like this
>
> if (i is power of 2)
>   return log2(i); // actually implemented as ffs(i) - 1
> else
>   return 31;
>
> Switch conversion bails if the range of case numbers is too big.  Exponential
> index transform shrinks this range (exponentially).  So even if there is no
> linear relationship in the switch, exponential index transform can still help
> convert the switch at least to an array lookup.
>
>
> Limitations
> ---
>
> Currently we only run the exponential index transform if the target has the
> POPCOUNT (for checking a number is a power of 2) and FFS (for taking the
> logarithm) instructions -- we check direct_internal_fn_supported_p () for
> POPCOUNT and FFS internal functions.  Otherwise maybe computing FFS could be
> less efficient than just using a jump table.  We try to avoid transforming a
> switch into a less efficient form.  Maybe this is too conservative and could 
> be
> tweaked in the future.
>
>
> Bootstrapped and regtested on x86_64 linux.  I have additionally run bootstrap
> and regtest on a version where I removed the check that the target has the
> POPCOUNT and FFS instructions so that the transformation would be triggered
> more often.  That testing also went well.
>
> Are there any things I should tweak?  Or is the patch ready to be applied?
>
> Cheers,
> Filip Kastl
>
>
> -- 8< --
>
>
> Sometimes a switch has case numbers that are powers of 2.  Switch
> conversion usually isn't able to optimize switches.  This patch adds
> "exponential index transformation" to switch conversion.  After switch
> conversion applies this transformation on the switch the index variable
> of the switch becomes the exponent instead of the whole value.  For
> example:
>
> switch (i)
>   {
> case (1 << 0): return 0;
> case (1 << 1): return 1;
> case (1 << 2): return 2;
> ...
> case (1 << 30): return 30;
> default: return 31;
>   }
>
> gets transformed roughly into
>
> switch (log2(i))
>   {
> case 0: return 0;
> case 1: return 1;
> case 2: return 2;
> ...
> case 30: return 30;
> default: return 31;
>   }
>
> This enables switch conversion to further optimize the switch.
>
> This patch only enables this transformation if there are optabs for
> POPCOUNT and FFS so that the base 2 logarithm can be computed
> efficiently at runtime.
>
> gcc/ChangeLog:
>
> * tree-switch-conversion.cc (switch_conversion::switch_conversion):
> Track if the transformation happened.
> (switch_conversion::is_exp_index_transform_viable): New function
> to decide 

[PATCH] gimple ssa: Teach switch conversion to optimize powers of 2 switches

2024-05-30 Thread Filip Kastl
Hi,

This patch adds a transformation into the switch conversion pass --
the "exponential index transform".  This transformation can help switch
conversion convert switches it otherwise could not.  The transformation is
intended for switches whose cases are all powers of 2.  Here is a more detailed
description of what and how this patch tries to address:


The problem
---

Consider this piece of C code

switch (i)
  {
case (1 << 0): return 0;
case (1 << 1): return 1;
case (1 << 2): return 2;
...
case (1 << 30): return 30;
default: return 31;
  }

If i is a power of 2 (2^k), the switch just returns the exponent (k).  This can
be seen as taking the base 2 logarithm of i or as returning the position of the
singular 1 bit in i.

Currently, GCC fails to see this and doesn't optimize the switch in any way.

Switch conversion is able to optimize similar switches to an array lookup.
This is not possible here because the range of cases is too wide.  Another
optimization that switch conversion is able to do is the "linear
transformation" -- switch conversion is able to notice a linear relationship
between the index variable (variable i in this case) and the result value and
rewrite the switch to just an assignment (or multiple assignments in case of
multiple result values).  Sadly, the linear transformation isn't applicable here
because the linear relationship is based on the exponent of i, not on i itself.


The solution


The exponential index transformation does the following.  When it recognises
that a switch only has case numbers that are powers of 2 it replaces them with
their exponents.  It also replaces the index variable by its exponent.  This is
done by inserting a statement that takes the logarithm of i and using the
result as the new index variable.  Actually we use the FFS operation for this
-- since we expect a power of two, we may just ask for the position of the
first 1 bit.

We also need to insert a conditional that checks at runtime that the index
variable is a power of two.  If it isn't, the resulting value should just be
the default case value (31 in the example above).

With exponential index transform, switch conversion is able to simplify the
above example into something like this

if (i is power of 2)
  return log2(i); // actually implemented as ffs(i) - 1
else
  return 31;
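
Roughly, in plain C terms (a sketch of the intended result, not the exact
GIMPLE the pass emits):

  int lookup (unsigned int i)
  {
    if (__builtin_popcount (i) == 1)  /* i is a power of two */
      return __builtin_ffs (i) - 1;   /* position of the single set bit */
    return 31;                        /* default case value */
  }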

Switch conversion bails if the range of case numbers is too big.  Exponential
index transform shrinks this range (exponentially).  So even if there is no
linear relationship in the switch, exponential index transform can still help
convert the switch at least to an array lookup.


Limitations
---

Currently we only run the exponential index transform if the target has the
POPCOUNT (for checking a number is a power of 2) and FFS (for taking the
logarithm) instructions -- we check direct_internal_fn_supported_p () for
POPCOUNT and FFS internal functions.  Otherwise maybe computing FFS could be
less efficient than just using a jump table.  We try to avoid transforming a
switch into a less efficient form.  Maybe this is too conservative and could be
tweaked in the future.
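
The gate is conceptually something like this (an assumed sketch -- the name
index_type is illustrative and the exact form in the patch may differ):

  if (!direct_internal_fn_supported_p (IFN_POPCOUNT, index_type,
                                       OPTIMIZE_FOR_SPEED)
      || !direct_internal_fn_supported_p (IFN_FFS, index_type,
                                          OPTIMIZE_FOR_SPEED))
    /* Don't attempt the exponential index transform.  */
    return false;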


Bootstrapped and regtested on x86_64 linux.  I have additionally run bootstrap
and regtest on a version where I removed the check that the target has the
POPCOUNT and FFS instructions so that the transformation would be triggered
more often.  That testing also went well.

Are there any things I should tweak?  Or is the patch ready to be applied?

Cheers,
Filip Kastl


-- 8< --


Sometimes a switch has case numbers that are powers of 2.  Switch
conversion usually isn't able to optimize such switches.  This patch adds
"exponential index transformation" to switch conversion.  After switch
conversion applies this transformation on the switch the index variable
of the switch becomes the exponent instead of the whole value.  For
example:

switch (i)
  {
case (1 << 0): return 0;
case (1 << 1): return 1;
case (1 << 2): return 2;
...
case (1 << 30): return 30;
default: return 31;
  }

gets transformed roughly into

switch (log2(i))
  {
case 0: return 0;
case 1: return 1;
case 2: return 2;
...
case 30: return 30;
default: return 31;
  }

This enables switch conversion to further optimize the switch.

This patch only enables this transformation if there are optabs for
POPCOUNT and FFS so that the base 2 logarithm can be computed
efficiently at runtime.

gcc/ChangeLog:

* tree-switch-conversion.cc (switch_conversion::switch_conversion):
Track if the transformation happened.
(switch_conversion::is_exp_index_transform_viable): New function
to decide whether the transformation should be applied.
(switch_conversion::exp_index_transform): New function to
execute the transformation.
(switch_conversion::gen_inbound_check): Don't remove the default
BB if the transformation happened.
(switch_conversion::expand): Execute th

Re: [PATCH] Fix LTO type mismatch warning on transparent union

2024-05-30 Thread Eric Botcazou
> Do function pointers inter-operate TBAA wise for this case and would this
> possibly An issue?

Do you mean in LTO mode?  I must say I'm not sure of the way LTO performs TBAA 
for function pointers: does it require (strict) matching of the type for all 
the parameters of the pointed-to function types?  If so, then I guess it could 
theoretically assign different alias sets to compatible function pointers when 
one of them happens to point to the function type of a function imported with 
the transparent union gap, with some problematic fallout when objects of these 
function pointers happen to be indirectly modified in the program...

Note that there is an equivalent bypass based on common_or_extern a few lines 
below in the function (although I'm not sure if it's problematic TBAA-wise).

-- 
Eric Botcazou




[PING] [PATCH] RISC-V: Add Zfbfmin extension

2024-05-30 Thread Xiao Zeng
1 In the previous patch, the libcall for BF16 was implemented:


2 RISC-V provides the Zfbfmin extension, which covers the "Scalar BF16 Converts":


3 Implemented replacing the libcall with the corresponding Zfbfmin instruction (see the sketch below).

4 Reused previous testcases in:


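As a rough illustration of item 3 (a hypothetical example, not one of the
reused testcases): with Zfbfmin enabled, the conversions below should become
fcvt.bf16.s / fcvt.s.bf16 instead of calls to __truncsfbf2 / __extendbfsf2.

  __bf16 to_bf16 (float x)    { return (__bf16) x; }
  float  from_bf16 (__bf16 x) { return (float) x; }
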
gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_output_move): Handle BFmode move
for zfbfmin.
* config/riscv/riscv.md (truncsfbf2): New pattern for BFmode.
(trunchfbf2): Dotto.
(truncdfbf2): Dotto.
(trunctfbf2): Dotto.
(extendbfsf2): Dotto.
(*movhf_hardfloat): Add BFmode.
(*mov_hardfloat): Dotto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfbfmin-bf16_arithmetic.c: New test.
* gcc.target/riscv/zfbfmin-bf16_comparison.c: New test.
* gcc.target/riscv/zfbfmin-bf16_float_libcall_convert.c: New test.
* gcc.target/riscv/zfbfmin-bf16_integer_libcall_convert.c: New test.
---
 gcc/config/riscv/riscv.cc |  4 +-
 gcc/config/riscv/riscv.md | 75 +--
 .../riscv/zfbfmin-bf16_arithmetic.c   | 35 +
 .../riscv/zfbfmin-bf16_comparison.c   | 33 
 .../zfbfmin-bf16_float_libcall_convert.c  | 45 +++
 .../zfbfmin-bf16_integer_libcall_convert.c| 66 
 6 files changed, 249 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfbfmin-bf16_arithmetic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zfbfmin-bf16_comparison.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zfbfmin-bf16_float_libcall_convert.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zfbfmin-bf16_integer_libcall_convert.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d0c22058b8c..7c6bafedda3 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4106,7 +4106,7 @@ riscv_output_move (rtx dest, rtx src)
switch (width)
  {
  case 2:
-   if (TARGET_ZFHMIN)
+   if (TARGET_ZFHMIN || TARGET_ZFBFMIN)
  return "fmv.x.h\t%0,%1";
/* Using fmv.x.s + sign-extend to emulate fmv.x.h.  */
return "fmv.x.s\t%0,%1;slli\t%0,%0,16;srai\t%0,%0,16";
@@ -4162,7 +4162,7 @@ riscv_output_move (rtx dest, rtx src)
switch (width)
  {
  case 2:
-   if (TARGET_ZFHMIN)
+   if (TARGET_ZFHMIN || TARGET_ZFBFMIN)
  return "fmv.h.x\t%0,%z1";
/* High 16 bits should be all-1, otherwise HW will treated
   as a n-bit canonical NaN, but isn't matter for softfloat.  */
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 78c16adee98..7fd2e3aa23e 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1763,6 +1763,57 @@
   [(set_attr "type" "fcvt")
(set_attr "mode" "HF")])
 
+(define_insn "truncsfbf2"
+  [(set (match_operand:BF 0 "register_operand" "=f")
+   (float_truncate:BF
+  (match_operand:SF 1 "register_operand" " f")))]
+  "TARGET_ZFBFMIN"
+  "fcvt.bf16.s\t%0,%1"
+  [(set_attr "type" "fcvt")
+   (set_attr "mode" "BF")])
+
+;; The conversion of HF/DF/TF to BF needs to be done with SF if there is a
+;; chance to generate at least one instruction, otherwise just using
+;; libfunc __trunc[h|d|t]fbf2.
+(define_expand "trunchfbf2"
+  [(set (match_operand:BF 0 "register_operand" "=f")
+   (float_truncate:BF
+  (match_operand:HF 1 "register_operand" " f")))]
+  "TARGET_ZFBFMIN"
+  {
+convert_move (operands[0],
+ convert_modes (SFmode, HFmode, operands[1], 0), 0);
+DONE;
+  }
+  [(set_attr "type" "fcvt")
+   (set_attr "mode" "BF")])
+
+(define_expand "truncdfbf2"
+  [(set (match_operand:BF 0 "register_operand" "=f")
+   (float_truncate:BF
+  (match_operand:DF 1 "register_operand" " f")))]
+  "TARGET_ZFBFMIN"
+  {
+convert_move (operands[0],
+ convert_modes (SFmode, DFmode, operands[1], 0), 0);
+DONE;
+  }
+  [(set_attr "type" "fcvt")
+   (set_attr "mode" "BF")])
+
+(define_expand "trunctfbf2"
+  [(set (match_operand:BF 0 "register_operand" "=f")
+   (float_truncate:BF
+  (match_operand:TF 1 "register_operand" " f")))]
+  "TARGET_ZFBFMIN"
+  {
+convert_move (operands[0],
+ convert_modes (SFmode, TFmode, operands[1], 0), 0);
+DONE;
+  }
+  [(set_attr "type" "fcvt")
+   (set_attr "mode" "BF")])
+
 ;;
 ;;  
 ;;
@@ -1907,6 +1958,15 @@
   [(set_attr "type" "fcvt")
(set_attr "mode" "SF")])
 
+(define_insn "extendbfsf2"
+  [(set (match_operand:SF 0 "register_operand" "=f")
+   (float_extend:SF
+  (match_operand:BF 1 "regi

[PATCH] Fix PR c++/111106: missing ; causes internal compiler error

2024-05-30 Thread Simon Martin
We currently fail upon the following because an assert in dependent_type_p
fails for f's parameter:

=== cut here ===
consteval int id (int i) { return i; }
constexpr int
f (auto i) requires requires { id (i) } { return i; }
void g () { f (42); }
=== cut here ===

This patch fixes this by handling synthesized parameters for abbreviated
function templates in that assert.
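
For context, a hedged reminder of where the synthesized parameter comes
from: an abbreviated function template such as

constexpr int f (auto i);

is roughly equivalent to

template <typename T> constexpr int f (T i);

so the type of i is an invented (implicit) template type parameter,
which is what the relaxed assert now has to accept.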

Successfully tested on x86_64-pc-linux-gnu.

PR c++/111106

gcc/cp/ChangeLog:

* pt.cc (dependent_type_p): Relax assert to handle synthesized template
parameters when !processing_template_decl.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/consteval37.C: New test.

---
 gcc/cp/pt.cc |  6 +-
 gcc/testsuite/g++.dg/cpp2a/consteval37.C | 19 +++
 2 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/consteval37.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index dfce1b3c359..a50d5cfd5a2 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -28019,7 +28019,11 @@ dependent_type_p (tree type)
   /* If we are not processing a template, then nobody should be
 providing us with a dependent type.  */
   gcc_assert (type);
-  gcc_assert (TREE_CODE (type) != TEMPLATE_TYPE_PARM || is_auto (type));
+  gcc_assert (TREE_CODE (type) != TEMPLATE_TYPE_PARM || is_auto (type)
+ || (/* Synthesized template parameter */
+ DECL_TEMPLATE_PARM_P (TEMPLATE_TYPE_DECL (type)) &&
+ (DECL_IMPLICIT_TEMPLATE_PARM_P
+  (TEMPLATE_TYPE_DECL (type)))));
   return false;
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp2a/consteval37.C 
b/gcc/testsuite/g++.dg/cpp2a/consteval37.C
new file mode 100644
index 000..ea2641fc204
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/consteval37.C
@@ -0,0 +1,19 @@
+// PR c++/111106
+// { dg-do compile { target c++20 } }
+
+consteval int id (int i) { return i; }
+
+constexpr int f (auto i) // { dg-line line_1 }
+  requires requires { id (i) } // { dg-error "expected|invalid use" }
+{
+  return i;
+}
+
+void g () {
+  f (42); // { dg-error "parameter 1" }
+}
+
+// { dg-error "constraints on a non-templated" {} { target *-*-* } line_1 }
+// { dg-error "has incomplete type" {} { target *-*-* } line_1 }
+// { dg-error "invalid type for" {} { target *-*-* } line_1 }
+// { dg-note "declared here" {} { target *-*-* } line_1 }
-- 
2.44.0




Re: [Patch, aarch64, middle-end] v4: Move pair_fusion pass from aarch64 to middle-end

2024-05-30 Thread Richard Sandiford
Thanks for the update.  Some comments below, but looks very close
to ready.

Ajit Agarwal  writes:
> diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
> new file mode 100644
> index 000..060fd95
> --- /dev/null
> +++ b/gcc/pair-fusion.cc
> @@ -0,0 +1,3012 @@
> +// Pass to fuse adjacent loads/stores into paired memory accesses.
> +// Copyright (C) 2024 Free Software Foundation, Inc.

This should probably be 2023-2024, since it's based on code
contributed in 2023.

> +//
> +// This file is part of GCC.
> +//
> +// GCC is free software; you can redistribute it and/or modify it
> +// under the terms of the GNU General Public License as published by
> +// the Free Software Foundation; either version 3, or (at your option)
> +// any later version.
> +//
> +// GCC is distributed in the hope that it will be useful, but
> +// WITHOUT ANY WARRANTY; without even the implied warranty of
> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +// General Public License for more details.
> +//
> +// You should have received a copy of the GNU General Public License
> +// along with GCC; see the file COPYING3.  If not see
> +// .
> +
> +#define INCLUDE_ALGORITHM
> +#define INCLUDE_FUNCTIONAL
> +#define INCLUDE_LIST
> +#define INCLUDE_TYPE_TRAITS
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "rtl.h"
> +#include "df.h"
> +#include "rtl-iter.h"
> +#include "rtl-ssa.h"
> +#include "cfgcleanup.h"
> +#include "tree-pass.h"
> +#include "ordered-hash-map.h"
> +#include "tree-dfa.h"
> +#include "fold-const.h"
> +#include "tree-hash-traits.h"
> +#include "print-tree.h"
> +#include "pair-fusion.h"
> +
> +using namespace rtl_ssa;
> +
> +// We pack these fields (load_p, fpsimd_p, and size) into an integer
> +// (LFS) which we use as part of the key into the main hash tables.
> +//
> +// The idea is that we group candidates together only if they agree on
> +// the fields below.  Candidates that disagree on any of these
> +// properties shouldn't be merged together.
> +struct lfs_fields
> +{
> +  bool load_p;
> +  bool fpsimd_p;
> +  unsigned size;
> +};
> +
> +using insn_list_t = std::list;
> +
> +// Information about the accesses at a given offset from a particular
> +// base.  Stored in an access_group, see below.
> +struct access_record
> +{
> +  poly_int64 offset;
> +  std::list cand_insns;
> +  std::list::iterator place;
> +
> +  access_record (poly_int64 off) : offset (off) {}
> +};
> +
> +// A group of accesses where adjacent accesses could be ldp/stp
> +// candidates.  The splay tree supports efficient insertion,
> +// while the list supports efficient iteration.
> +struct access_group
> +{
> +  splay_tree tree;
> +  std::list list;
> +
> +  template
> +  inline void track (Alloc node_alloc, poly_int64 offset, insn_info *insn);
> +};
> +
> +// Test if this base candidate is viable according to HAZARDS.
> +bool base_cand::viable () const

Formating nit, should be:

bool
base_cand::viable () const

> +{
> +  return !hazards[0] || !hazards[1] || (*hazards[0] > *hazards[1]);
> +}
> [...]
> +void
> +pair_fusion_bb_info::transform ()
> +{
> +  traverse_base_map (expr_map);
> +  traverse_base_map (def_map);
> +}
> +
> +// the base register which we can fold in to make this pair use
> +// a writeback addressing mode.

The first line of this comment is missing.  It should be:

// Given an existing pair insn INSN, look for a trailing update of

> [...]
> diff --git a/gcc/pair-fusion.h b/gcc/pair-fusion.h
> new file mode 100644
> index 000..f295fdbdb8f
> --- /dev/null
> +++ b/gcc/pair-fusion.h
> @@ -0,0 +1,195 @@
> +// Pass to fuse adjacent loads/stores into paired memory accesses.
> +//
> +// This file contains the definition of the virtual base class which is
> +// overriden by targets that make use of the pass.
> +//
> +// Copyright (C) 2024 Free Software Foundation, Inc.

2023-2024 here too

> +//
> +// This file is part of GCC.
> +//
> +// GCC is free software; you can redistribute it and/or modify it
> +// under the terms of the GNU General Public License as published by
> +// the Free Software Foundation; either version 3, or (at your option)
> +// any later version.
> +//
> +// GCC is distributed in the hope that it will be useful, but
> +// WITHOUT ANY WARRANTY; without even the implied warranty of
> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +// General Public License for more details.
> +//
> +// You should have received a copy of the GNU General Public License
> +// along with GCC; see the file COPYING3.  If not see
> +// .
> +
> +namespace rtl_ssa {
> +  class def_info;
> +  class insn_info;
> +  class insn_range_info;
> +  class bb_info;
> +}
> +
> +// Information about a potential base candidate, used in try_fuse_pair.
> +// There may be zero, one, or two viable RTL bases for a given pair.
> +struct base_cand
> +{
> +  // DEF is the de

Re: [PATCH] Fix -Wstringop-overflow warning in 23_containers/vector/types/1.cc

2024-05-30 Thread Jonathan Wakely
On Thu, 30 May 2024 at 06:11, François Dumont  wrote:
>
> Looks like this new version works the same to fix the warning without
> the issues reported here.
>
> All 23_containers/vector tests run in C++98/14/20 so far.
>
> Ok to commit once I've complete the testsuite (or some bot did it for me
> !) ?

There seem to be two unrelated changes here. One is to make the local
variables usable in both branches, but I don't understand why that
matters because the first branch doesn't reallocate, so there's no
call to operator new that would confuse the compiler.

The second change is to add the __builtin_unreachable() to tell the
compiler that __len is correct and did not wrap around. Which should
not be needed because

Are both changes necessary to fix the FAIL for c++98 mode?

I've just done a quick check, and it seems that this smaller change
fixes the FAIL:

--- a/libstdc++-v3/include/bits/vector.tcc
+++ b/libstdc++-v3/include/bits/vector.tcc
@@ -933,6 +933,9 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER

   const size_type __len =
 _M_check_len(__n, "vector::_M_range_insert");
+   if (__len < (__n + (__old_finish - __old_start)))
+ __builtin_unreachable();
+
   pointer __new_start(this->_M_allocate(__len));
   pointer __new_finish(__new_start);
   __try

We don't need the rest of the change, which makes sense because the
first branch doesn't reallocate so the compiler doesn't think the
pointers can be invalidated.
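
For reference, a hedged sketch of the idiom under discussion, outside
of the library sources (the function name is illustrative):

#include <cstddef>

// Tell the optimizer that len is large enough for the copy about to
// happen, so it stops exploring the impossible-overflow path that
// triggers -Wstringop-overflow.
inline void
assume_len_fits (std::size_t len, std::size_t need)
{
  if (len < need)
    __builtin_unreachable ();
}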



> I'll look for a PR to associate, if you have one in mind do not hesitate
> to tell me.

It's discussed in PR 109849.

>
> François
>
>
> On 28/05/2024 12:28, Jonathan Wakely wrote:
> > On 27/05/24 22:07 +0200, François Dumont wrote:
> >> In C++98 this test fails with:
> >>
> >> Excess errors:
> >> /home/fdumont/dev/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:452:
> >> warning: 'void* __builtin_memcpy(void*, const void*, long unsigned
> >> int)' writing between 2 and 9223372036854775806 bytes into a region
> >> of size 0 overflows the destination [-Wstringop-overflow=]
> >>
> >> The attached patch avoids this warning.
> >>
> >> libstdc++: Fix -Wstringop-overflow warning coming from std::vector
> >>
> >> Make vector<>::_M_range_insert implementation more transparent to
> >> the compiler checks.
> >>
> >> Extend local copies of members to the whole method scope so that
> >> all branches benefit
> >> from those.
> >>
> >> libstdc++-v3/ChangeLog:
> >>
> >> * include/bits/vector.tcc
> >> (std::vector<>::_M_range_insert(iterator, _FwdIt, _FwdIt,
> >> forward_iterator_tag)):
> >> Use local copies of members to call the different
> >> algorithms.
> >>
> >> Ok to commit if all tests passes ?
> >>
> >> François
> >
> >> diff --git a/libstdc++-v3/include/bits/vector.tcc
> >> b/libstdc++-v3/include/bits/vector.tcc
> >> index 36b27dce7b9..671929dee55 100644
> >> --- a/libstdc++-v3/include/bits/vector.tcc
> >> +++ b/libstdc++-v3/include/bits/vector.tcc
> >> @@ -885,83 +885,80 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> >>   {
> >> if (__first != __last)
> >>   {
> >> +// Make local copies of these members because the compiler
> >> +// thinks the allocator can alter them if 'this' is globally
> >> +// reachable.
> >> +pointer __start = this->_M_impl._M_start;
> >> +pointer __end = this->_M_impl._M_end_of_storage;
> >> +pointer __finish = this->_M_impl._M_finish;
> >> +pointer __pos = __position.base();
> >> +_Tp_alloc_type& __allocator = _M_get_Tp_allocator();
> >> +
> >> +if (__pos < __start || __finish < __pos)
> >> +  __builtin_unreachable();
> >
> > I don't think we should use __builtin_unreachable for something which
> > is not an invariant of the class. The __position argument is supplied
> > by the user, so we should not make promises about it being valid,
> > because we can't know that.
> >
> > We can promise that __start <= __finish, and that __finish <= end,
> > because we control those. We can't promise the user won't pass in a
> > bad __position. Although it's undefined for the user to do that, using
> > __builtin_unreachable() here makes the effects worse, and makes it
> > harder to debug.
> >
> > Also, (__pos < __start) might already trigger undefined behaviour for
> > fancy pointers, if they don't point to the same memory region.
> >
> > So this change is not OK.
> >
> >
> >> +
> >> const size_type __n = std::distance(__first, __last);
> >> -if (size_type(this->_M_impl._M_end_of_storage
> >> -  - this->_M_impl._M_finish) >= __n)
> >> +if (size_type(__end - __finish) >= __n)
> >>   {
> >> -const size_type __elems_after = end() - __position;
> >> -pointer __old_finish(this->_M_impl._M_finish);
> >> +const size_type __elems_after = __end - __pos;
> >> +pointer __old_finish(

Re: [PATCH] aarch64: testsuite: Explicitly add -mlittle-endian to vget_low_2.c

2024-05-30 Thread Richard Sandiford
Pengxuan Zheng  writes:
> vget_low_2.c is a test case for little-endian, but we missed the 
> -mlittle-endian
> flag in r15-697-ga2e4fe5a53cf75.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/vget_low_2.c: Add -mlittle-endian.

Ok, thanks.

If you'd like write access, please follow the instructions on
https://gcc.gnu.org/gitwrite.html (I'll sponsor).

Richard

> Signed-off-by: Pengxuan Zheng 
> ---
>  gcc/testsuite/gcc.target/aarch64/vget_low_2.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/vget_low_2.c 
> b/gcc/testsuite/gcc.target/aarch64/vget_low_2.c
> index 44414e1c043..93e9e664ee9 100644
> --- a/gcc/testsuite/gcc.target/aarch64/vget_low_2.c
> +++ b/gcc/testsuite/gcc.target/aarch64/vget_low_2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -fdump-tree-optimized" } */
> +/* { dg-options "-O3 -fdump-tree-optimized -mlittle-endian" } */
>  
>  #include 


Re: [PATCH] aarch64: Add vector floating point extend patterns [PR113880, PR113869]

2024-05-30 Thread Richard Sandiford
Pengxuan Zheng  writes:
> This patch improves vectorization of certain floating point widening 
> operations
> for the aarch64 target by adding vector floating point extend patterns for
> V2SF->V2DF and V4HF->V4SF conversions.
>
>   PR target/113880
>   PR target/113869
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-simd.md (extend2): New expand.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/extend-vec.c: New test.
>
> Signed-off-by: Pengxuan Zheng 

Thanks for doing this.  Could we instead rename
aarch64_float_extend_lo_ to extend2 and
use something similar to:

---
/* The builtins below should be expanded through the standard optabs
   CODE_FOR_[u]avg3_[floor,ceil].  However the mapping scheme in
   aarch64-simd-builtins.def does not easily allow us to have a pre-mode
   ("uavg") and post-mode string ("_ceil") in the CODE_FOR_* construction.
   So the builtins use a name that is natural for AArch64 instructions
   e.g. "aarch64_srhadd" and we re-map these to the optab-related
   CODE_FOR_ here.  */
#undef VAR1
#define VAR1(F,T1,T2,I,M) \
constexpr insn_code CODE_FOR_aarch64_##F##M = CODE_FOR_##T1##M##3##T2;

BUILTIN_VDQ_BHSI (srhadd, avg, _ceil, 0)
BUILTIN_VDQ_BHSI (urhadd, uavg, _ceil, 0)
BUILTIN_VDQ_BHSI (shadd, avg, _floor, 0)
BUILTIN_VDQ_BHSI (uhadd, uavg, _floor, 0)

#undef VAR1
---

(from aarch64-builtins.cc) to handle the intrinsics?  The idea is
to try to avoid adding new patterns just to satisfy the internal
naming convention.

Richard

> ---
>  gcc/config/aarch64/aarch64-simd.md|  7 +++
>  gcc/testsuite/gcc.target/aarch64/extend-vec.c | 21 +++
>  2 files changed, 28 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/extend-vec.c
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 868f4486218..8febb411d06 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -3141,6 +3141,13 @@ (define_insn "aarch64_float_extend_lo_"
>[(set_attr "type" "neon_fp_cvt_widen_s")]
>  )
>  
> +(define_expand "extend2"
> +  [(set (match_operand: 0 "register_operand" "=w")
> +(float_extend:
> +  (match_operand:VDF 1 "register_operand" "w")))]
> +  "TARGET_SIMD"
> +)
> +
>  ;; Float narrowing operations.
>  
>  (define_insn "aarch64_float_trunc_rodd_df"
> diff --git a/gcc/testsuite/gcc.target/aarch64/extend-vec.c 
> b/gcc/testsuite/gcc.target/aarch64/extend-vec.c
> new file mode 100644
> index 000..f6241d5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/extend-vec.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +/* { dg-final { scan-assembler-times {fcvtl\tv[0-9]+.2d, v[0-9]+.2s} 1 } } */
> +void
> +f (float *__restrict a, double *__restrict b)
> +{
> +  b[0] = a[0];
> +  b[1] = a[1];
> +}
> +
> +/* { dg-final { scan-assembler-times {fcvtl\tv[0-9]+.4s, v[0-9]+.4h} 1 } } */
> +void
> +f1 (_Float16 *__restrict a, float *__restrict b)
> +{
> +
> +  b[0] = a[0];
> +  b[1] = a[1];
> +  b[2] = a[2];
> +  b[3] = a[3];
> +}


Re: [PATCH] [libstdc++-v3] [rtems] enable filesystem support

2024-05-30 Thread Jonathan Wakely
Thanks.

I got bounces from oarcorp.com and rtems.org; are the details in
MAINTAINERS out of date for Joel and Ralf?

On Thu, 30 May 2024 at 09:19, Alexandre Oliva  wrote:
>
> On May 30, 2024, Jonathan Wakely  wrote:
>
> > Typo here, "rtemps"?
>
> Ugh, thanks, fixed.
>
> > OK with that fixed (and configure regenerated).
>
> I also untabified the new lines, to match the surrounding context.
> Here's what I installed:
>
> [libstdc++-v3] [rtems] enable filesystem support
>
> mkdir, chdir and chmod functions are defined in librtemscpu, that
> doesn't get linked in during libstdc++-v3 configure, but applications
> use -qrtems for linking, which brings those symbols in, so it makes
> sense to mark them as available so that the C++ filesystem APIs are
> enabled.
>
>
> for  libstdc++-v3/ChangeLog
>
> * configure.ac [*-*-rtems*]: Set chdir, chmod and mkdir as
> available.
> * configure: Rebuilt.
> ---
>  libstdc++-v3/configure|7 +++
>  libstdc++-v3/configure.ac |7 +++
>  2 files changed, 14 insertions(+)
>
> diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
> index 5179cc507f129..5645e991af7ab 100755
> --- a/libstdc++-v3/configure
> +++ b/libstdc++-v3/configure
> @@ -28610,6 +28610,13 @@ _ACEOF
>
>  $as_echo "#define HAVE_USLEEP 1" >>confdefs.h
>
> +
> +# These functions are defined in librtemscpu.  We don't use
> +# -qrtems during configure, so we don't link that in, and fail
> +# to find them.
> +glibcxx_cv_chdir=yes
> +glibcxx_cv_chmod=yes
> +glibcxx_cv_mkdir=yes
>  ;;
>  esac
>elif test "x$with_headers" != "xno"; then
> diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
> index 37396bd6ebbe6..ccb24a82be799 100644
> --- a/libstdc++-v3/configure.ac
> +++ b/libstdc++-v3/configure.ac
> @@ -400,6 +400,13 @@ dnl # rather than hardcoding that information.
>  AC_DEFINE(HAVE_SYMLINK)
>  AC_DEFINE(HAVE_TRUNCATE)
>  AC_DEFINE(HAVE_USLEEP)
> +
> +# These functions are defined in librtemscpu.  We don't use
> +# -qrtems during configure, so we don't link that in, and fail
> +# to find them.
> +glibcxx_cv_chdir=yes
> +glibcxx_cv_chmod=yes
> +glibcxx_cv_mkdir=yes
>  ;;
>  esac
>elif test "x$with_headers" != "xno"; then
>
>
> --
> Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive
>



Re: [PATCH] [libstdc++-v3] [rtems] enable filesystem support

2024-05-30 Thread Alexandre Oliva
On May 30, 2024, Jonathan Wakely  wrote:

> Typo here, "rtemps"?

Ugh, thanks, fixed.

> OK with that fixed (and configure regenerated).

I also untabified the new lines, to match the surrounding context.
Here's what I installed:

[libstdc++-v3] [rtems] enable filesystem support

mkdir, chdir and chmod functions are defined in librtemscpu, that
doesn't get linked in during libstdc++-v3 configure, but applications
use -qrtems for linking, which brings those symbols in, so it makes
sense to mark them as available so that the C++ filesystem APIs are
enabled.


for  libstdc++-v3/ChangeLog

* configure.ac [*-*-rtems*]: Set chdir, chmod and mkdir as
available.
* configure: Rebuilt.
---
 libstdc++-v3/configure|7 +++
 libstdc++-v3/configure.ac |7 +++
 2 files changed, 14 insertions(+)

diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 5179cc507f129..5645e991af7ab 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -28610,6 +28610,13 @@ _ACEOF
 
 $as_echo "#define HAVE_USLEEP 1" >>confdefs.h
 
+
+# These functions are defined in librtemscpu.  We don't use
+# -qrtems during configure, so we don't link that in, and fail
+# to find them.
+glibcxx_cv_chdir=yes
+glibcxx_cv_chmod=yes
+glibcxx_cv_mkdir=yes
 ;;
 esac
   elif test "x$with_headers" != "xno"; then
diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
index 37396bd6ebbe6..ccb24a82be799 100644
--- a/libstdc++-v3/configure.ac
+++ b/libstdc++-v3/configure.ac
@@ -400,6 +400,13 @@ dnl # rather than hardcoding that information.
 AC_DEFINE(HAVE_SYMLINK)
 AC_DEFINE(HAVE_TRUNCATE)
 AC_DEFINE(HAVE_USLEEP)
+
+# These functions are defined in librtemscpu.  We don't use
+# -qrtems during configure, so we don't link that in, and fail
+# to find them.
+glibcxx_cv_chdir=yes
+glibcxx_cv_chmod=yes
+glibcxx_cv_mkdir=yes
 ;;
 esac
   elif test "x$with_headers" != "xno"; then


-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


[PATCH] ira: Fix go_through_subreg offset calculation [PR115281]

2024-05-30 Thread Richard Sandiford
go_through_subreg used:

  else if (!can_div_trunc_p (SUBREG_BYTE (x),
 REGMODE_NATURAL_SIZE (GET_MODE (x)), offset))

to calculate the register offset for a pseudo subreg x.  In the blessed
days before poly-int, this was:

*offset = (SUBREG_BYTE (x) / REGMODE_NATURAL_SIZE (GET_MODE (x)));

But I think this is testing the wrong natural size.  If we exclude
paradoxical subregs (which will get an offset of zero regardless),
it's the inner register that is being split, so it should be the
inner register's natural size that we use.

This matters in the testcase because we have an SFmode lowpart
subreg into the last of three variable-sized vectors.  The
SUBREG_BYTE is therefore equal to the size of two variable-sized
vectors.  Dividing by the vector size gives a register offset of 2,
as expected, but dividing by the size of a scalar FPR would give
a variable offset.
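
As a worked instance with illustrative sizes (poly_ints written in x,
and the scalar natural size assumed to be 8 bytes):

  SUBREG_BYTE             = 2 * (16 + 16x) = 32 + 32x
  (32 + 32x) / (16 + 16x) = 2          <- inner (vector) natural size
  (32 + 32x) / 8          = 4 + 4x     <- outer (SFmode) natural size

so dividing by the outer mode's natural size has no constant quotient,
can_div_trunc_p fails, and the gcc_unreachable () after it fires.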

I think something similar could happen for fixed-size targets if
REGMODE_NATURAL_SIZE is different for vectors and integers (say).

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard


gcc/
PR rtl-optimization/115281
* ira-conflicts.cc (go_through_subreg): Use the natural size of
the inner mode rather than the outer mode.

gcc/testsuite/
PR rtl-optimization/115281
* gfortran.dg/pr115281.f90: New test.
---
 gcc/ira-conflicts.cc   |  3 +-
 gcc/testsuite/gfortran.dg/pr115281.f90 | 39 ++
 2 files changed, 41 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr115281.f90

diff --git a/gcc/ira-conflicts.cc b/gcc/ira-conflicts.cc
index 83274c53330..15ac42d8848 100644
--- a/gcc/ira-conflicts.cc
+++ b/gcc/ira-conflicts.cc
@@ -227,8 +227,9 @@ go_through_subreg (rtx x, int *offset)
   if (REGNO (reg) < FIRST_PSEUDO_REGISTER)
 *offset = subreg_regno_offset (REGNO (reg), GET_MODE (reg),
   SUBREG_BYTE (x), GET_MODE (x));
+  /* The offset is always 0 for paradoxical subregs.  */
   else if (!can_div_trunc_p (SUBREG_BYTE (x),
-REGMODE_NATURAL_SIZE (GET_MODE (x)), offset))
+REGMODE_NATURAL_SIZE (GET_MODE (reg)), offset))
 /* Checked by validate_subreg.  We must know at compile time which
inner hard registers are being accessed.  */
 gcc_unreachable ();
diff --git a/gcc/testsuite/gfortran.dg/pr115281.f90 
b/gcc/testsuite/gfortran.dg/pr115281.f90
new file mode 100644
index 000..80aa822e745
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr115281.f90
@@ -0,0 +1,39 @@
+! { dg-options "-O3" }
+! { dg-additional-options "-mcpu=neoverse-v1" { target aarch64*-*-* } }
+
+SUBROUTINE fn0(ma, mb, nt)
+  CHARACTER ca
+  REAL r0(ma)
+  INTEGER i0(mb)
+  REAL r1(3,mb)
+  REAL r2(3,mb)
+  REAL r3(3,3)
+  zero=0.0
+  do na = 1, nt
+ nt = i0(na)
+ do l = 1, 3
+r1 (l, na) =   r0 (nt)
+r2(l, na) = zero
+ enddo
+  enddo
+  if (ca  .ne.'z') then
+ do j = 1, 3
+do i = 1, 3
+   r4  = zero
+enddo
+ enddo
+ do na = 1, nt
+do k =  1, 3
+   do l = 1, 3
+  do m = 1, 3
+ r3 = r4 * v
+  enddo
+   enddo
+enddo
+ do i = 1, 3
+   do k = 1, ifn (r3)
+   enddo
+enddo
+ enddo
+ endif
+END
-- 
2.25.1



Re: [C PATCH, v2]: allow aliasing of compatible types derived from enumeral types [PR115157]

2024-05-30 Thread Martin Uecker


Hi Ian,

Can you give me a green light for the Go changes?  The C FE
changes were approved.

The only changes with respect to the last version are
the removal of the unneeded null check for the
main variant (as discussed) and that I also removed the

container->decls_seen.add (TREE_TYPE (decl));

and the corresponding check, because I think it is
redundant if we also test for the main variant.
(it wasn't with TYPE_CANONICAL because this was
only added conditionally).

The other code in the file only checks for added
declarations not types, so should not depend on this.

Martin


On Friday, 2024-05-24 at 17:39 +0200, Martin Uecker wrote:
> This is another version of this patch with two changes:
> 
> - I added a fix (with test) for PR 115177 which is just the same
> issue for hardbools which are internally implemented as enums.
> 
> - I fixed the golang issue. Since the addition of the main variant
> to the seen decls is unconditional I removed also the addition
> of the type itself which now seems unnecessary.
> 
> Bootstrapped and regression tested on x86_64.
> 
> Martin
> 
> 
> 
> C: allow aliasing of compatible types derived from enumeral types 
> [PR115157]
> 
> Aliasing of enumeral types with the underlying integer is now allowed
> by setting the aliasing set to zero.  But this does not allow aliasing
> of derived types which are compatible as required by ISO C.  Instead,
> initially set structural equality.  Then set TYPE_CANONICAL and update
> pointers and main variants when the type is completed (as done for
> structures and unions in C23).
> 
> PR 115157
> PR 115177
> 
> gcc/c/
> * c-decl.cc (shadow_tag_warned, parser_xref_tag, start_enum,
> finish_enum): Set SET_TYPE_STRUCTURAL_EQUALITY / TYPE_CANONICAL.
> * c-objc-common.cc (get_alias_set): Remove special case.
> (get_aka_type): Add special case.
> 
> gcc/c-family/
> * c-attribs.cc (handle_hardbool_attribute): Set TYPE_CANONICAL
> for hardbools.
> 
> gcc/
> * godump.cc (go_output_typedef): use TYPE_MAIN_VARIANT instead
> of TYPE_CANONICAL.
> 
> gcc/testsuite/
> * gcc.dg/enum-alias-1.c: New test.
> * gcc.dg/enum-alias-2.c: New test.
> * gcc.dg/enum-alias-3.c: New test.
> * gcc.dg/enum-alias-4.c: New test.
> 
> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
> index 04e39b41bdf..033395093b6 100644
> --- a/gcc/c-family/c-attribs.cc
> +++ b/gcc/c-family/c-attribs.cc
> @@ -1074,6 +1074,7 @@ handle_hardbool_attribute (tree *node, tree name, tree 
> args,
>  
>TREE_SET_CODE (*node, ENUMERAL_TYPE);
>ENUM_UNDERLYING_TYPE (*node) = orig;
> +  TYPE_CANONICAL (*node) = TYPE_CANONICAL (orig);
>  
>tree false_value;
>if (args)
> diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
> index b691b91b3db..6e6606c9570 100644
> --- a/gcc/c/c-decl.cc
> +++ b/gcc/c/c-decl.cc
> @@ -5051,7 +5051,7 @@ shadow_tag_warned (const struct c_declspecs *declspecs, 
> int warned)
> if (t == NULL_TREE)
>   {
> t = make_node (code);
> -   if (flag_isoc23 && code != ENUMERAL_TYPE)
> +   if (flag_isoc23 || code == ENUMERAL_TYPE)
>   SET_TYPE_STRUCTURAL_EQUALITY (t);
> pushtag (input_location, name, t);
>   }
> @@ -8828,7 +8828,7 @@ parser_xref_tag (location_t loc, enum tree_code code, 
> tree name,
>   the forward-reference will be altered into a real type.  */
>  
>ref = make_node (code);
> -  if (flag_isoc23 && code != ENUMERAL_TYPE)
> +  if (flag_isoc23 || code == ENUMERAL_TYPE)
>  SET_TYPE_STRUCTURAL_EQUALITY (ref);
>if (code == ENUMERAL_TYPE)
>  {
> @@ -9919,6 +9919,7 @@ start_enum (location_t loc, struct c_enum_contents 
> *the_enum, tree name,
>  {
>enumtype = make_node (ENUMERAL_TYPE);
>TYPE_SIZE (enumtype) = NULL_TREE;
> +  SET_TYPE_STRUCTURAL_EQUALITY (enumtype);
>pushtag (loc, name, enumtype);
>if (fixed_underlying_type != NULL_TREE)
>   {
> @@ -9935,6 +9936,8 @@ start_enum (location_t loc, struct c_enum_contents 
> *the_enum, tree name,
> TYPE_SIZE (enumtype) = NULL_TREE;
> TYPE_PRECISION (enumtype) = TYPE_PRECISION (fixed_underlying_type);
> ENUM_UNDERLYING_TYPE (enumtype) = fixed_underlying_type;
> +   TYPE_CANONICAL (enumtype) = TYPE_CANONICAL (fixed_underlying_type);
> +   c_update_type_canonical (enumtype);
> layout_type (enumtype);
>   }
>  }
> @@ -10094,6 +10097,10 @@ finish_enum (tree enumtype, tree values, tree 
> attributes)
>ENUM_UNDERLYING_TYPE (enumtype) =
>   c_common_type_for_size (TYPE_PRECISION (tem), TYPE_UNSIGNED (tem));
>  
> +  TYPE_CANONICAL (enumtype) =
> + TYPE_CANONICAL (ENUM_UNDERLYING_TYPE (enumtype));
> +  c_update_type_canonical (enumtype);
> +
>layo

Re: [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection

2024-05-30 Thread Richard Biener



> On 30.05.2024 at 00:31, Jeff Law wrote:
> 
> 
> 
>> On 5/28/24 1:01 AM, Richard Biener wrote:
>>> On Fri, May 24, 2024 at 10:46 AM Mariam Arutunian
>>>  wrote:
>>> 
>>> This patch adds a new compiler pass aimed at identifying naive CRC 
>>> implementations,
>>> characterized by the presence of a loop calculating a CRC (polynomial long 
>>> division).
>>> Upon detection of a potential CRC, the pass prints an informational message.
>>> 
>>> Performs CRC optimization if the optimization level is >= 2, except
>>> when optimizing for size or when fno_gimple_crc_optimization is given.
>>> 
>>> This pass is added for the detection and optimization of naive CRC 
>>> implementations,
>>> improving the efficiency of CRC-related computations.
>>> 
>>> This patch includes only the initial fast checks for filtering out non-CRCs;
>>> the verification of detected possible CRCs and the optimization parts will
>>> be provided in subsequent patches.
>> Just a few quick questions - I'm waiting for a revision with Jeff's comments 
>> cleared before having a closer look.  The patch does
>> nothing but analyze right now, correct?  I assume a later patch will
>> fill in stuff in ::execute and use the return value of
>> loop_may_calculate_crc (it's a bit odd to review such a "split"
>> thing).
> We split it up on functional chunks.  I think if it gets approved it probably 
> should go in atomically since it makes no sense to commit the first pass 
> recognition filter without the validation step or the validation step without 
> the codegen step.
> 
> So consider the break down strictly for review convenience.
> 
> 
>> I think what this does fits final value replacement which lives in 
>> tree-scalar-evolution.cc and works from the loop-closed PHIs, trying
>> to replace those.  I'm not sure we want to have a separate pass for
>> this.  Consider a loop calculating two or four CRCs in parallel, replacing 
>> LC PHIs one-by-one should be able to handle this.
> I suspect that'll be quite hard for both the "does this generally look like a 
> CRC loop" code as well as the "validate this is a CRC loop" code.
> 
> Mariam, your thoughts on whether or not those two phases could handle a loop 
> with two CRC calculations inside, essentially creating two calls to our new 
> builtins?

The key would be to only simulate the use-def cycle from the loop-closed PHI 
(plus the loop control of course, but niter/SCEV should be enough there) and 
just replace that LC PHI, leaving loop DCE to DCE.
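
For concreteness, a hedged sketch of the kind of bit-at-a-time loop the
pass is meant to recognize; the final value of crc is the loop-closed
PHI that would be replaced (the polynomial and widths are illustrative):

unsigned char
crc8_update (unsigned char crc, unsigned char data)
{
  crc ^= data;
  for (int i = 0; i < 8; i++)
    {
      if (crc & 0x80)
        crc = (crc << 1) ^ 0x31;   /* illustrative polynomial */
      else
        crc <<= 1;
    }
  return crc;
}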

If we really want a separate pass (or utility to work on a single loop) then we 
might consider moving some of the final value replacement code that doesn’t 
work with only SCEV there as well.  There’s also special code in loop 
distribution for strlen recognition now, not exactly fitting in.

Note I had patches to do final value replacement on demand from CD-DCE when it 
figures a loop has no side effects besides of its reduction outputs (still want 
to pick this up at some point again).

Richard 

> Jeff
> 
> 


[pushed] wwwdocs: *: Use https to access gcc.gnu.org/onlinedocs

2024-05-30 Thread Gerald Pfeifer
Not sure we really need to keep all those docs for all our point 
releases. It might make more sense to use the latest for each branch. But 
that's another discussion.

Gerald

---
 htdocs/gcc-10/index.html  | 12 ++--
 htdocs/gcc-11/index.html  | 10 +-
 htdocs/gcc-12/index.html  |  8 
 htdocs/gcc-13/index.html  |  8 
 htdocs/gcc-14/index.html  |  4 ++--
 htdocs/gcc-4.9/index.html |  8 
 htdocs/gcc-5/index.html   | 12 ++--
 htdocs/gcc-6/index.html   | 12 ++--
 htdocs/gcc-7/index.html   | 12 ++--
 htdocs/gcc-8/index.html   | 12 ++--
 htdocs/gcc-9/index.html   | 12 ++--
 11 files changed, 55 insertions(+), 55 deletions(-)

diff --git a/htdocs/gcc-10/index.html b/htdocs/gcc-10/index.html
index 5fb1e02e..fe1e4c07 100644
--- a/htdocs/gcc-10/index.html
+++ b/htdocs/gcc-10/index.html
@@ -28,31 +28,31 @@ GCC 10.4 relative to previous releases of GCC.
 GCC 10.5
 July 7, 2023
 (changes,
- <a href="http://gcc.gnu.org/onlinedocs/10.5.0/">documentation</a>)
+ <a href="https://gcc.gnu.org/onlinedocs/10.5.0/">documentation</a>)
 
 
 GCC 10.4
 June 28, 2022
 (changes,
- <a href="http://gcc.gnu.org/onlinedocs/10.4.0/">documentation</a>)
+ <a href="https://gcc.gnu.org/onlinedocs/10.4.0/">documentation</a>)
 
 
 GCC 10.3
 April 8, 2021
 (changes,
- <a href="http://gcc.gnu.org/onlinedocs/10.3.0/">documentation</a>)
+ <a href="https://gcc.gnu.org/onlinedocs/10.3.0/">documentation</a>)
 
 
 GCC 10.2
 July 23, 2020
 (changes,
- <a href="http://gcc.gnu.org/onlinedocs/10.2.0/">documentation</a>)
+ <a href="https://gcc.gnu.org/onlinedocs/10.2.0/">documentation</a>)
 
 
 GCC 10.1
 May 7, 2020
 (changes,
- <a href="http://gcc.gnu.org/onlinedocs/10.1.0/">documentation</a>)
+ <a href="https://gcc.gnu.org/onlinedocs/10.1.0/">documentation</a>)
 
 
 
@@ -66,7 +66,7 @@ GNU Compiler Collection.
 The GCC developers would like to thank the numerous people that have
 contributed new features, improvements, bug fixes, and other changes as
 well as test results to GCC.
-This <a href="http://gcc.gnu.org/onlinedocs/gcc-10.1.0/gcc/Contributors.html">amazing
+This <a href="https://gcc.gnu.org/onlinedocs/gcc-10.1.0/gcc/Contributors.html">amazing
 group of volunteers is what makes GCC successful.
 
 For additional information about GCC please refer to the
diff --git a/htdocs/gcc-11/index.html b/htdocs/gcc-11/index.html
index bb41c492..681da6a1 100644
--- a/htdocs/gcc-11/index.html
+++ b/htdocs/gcc-11/index.html
@@ -25,25 +25,25 @@ GCC 11.3 relative to previous releases of GCC.
 GCC 11.4
 May 29, 2023
 (changes,
- <a href="http://gcc.gnu.org/onlinedocs/11.4.0/">documentation</a>)
+ <a href="https://gcc.gnu.org/onlinedocs/11.4.0/">documentation</a>)
 
 
 GCC 11.3
 April 21, 2022
 (changes,
- <a href="http://gcc.gnu.org/onlinedocs/11.3.0/">documentation</a>)
+ <a href="https://gcc.gnu.org/onlinedocs/11.3.0/">documentation</a>)
 
 
 GCC 11.2
 July 28, 2021
 (changes,
- <a href="http://gcc.gnu.org/onlinedocs/11.2.0/">documentation</a>)
+ <a href="https://gcc.gnu.org/onlinedocs/11.2.0/">documentation</a>)
 
 
 GCC 11.1
 April 27, 2021
 (changes,
- <a href="http://gcc.gnu.org/onlinedocs/11.1.0/">documentation</a>)
+ <a href="https://gcc.gnu.org/onlinedocs/11.1.0/">documentation</a>)
 
 
 
@@ -57,7 +57,7 @@ GNU Compiler Collection.
 The GCC developers would like to thank the numerous people that have
 contributed new features, improvements, bug fixes, and other changes as
 well as test results to GCC.
-This <a href="http://gcc.gnu.org/onlinedocs/gcc-11.1.0/gcc/Contributors.html">amazing
+This <a href="https://gcc.gnu.org/onlinedocs/gcc-11.1.0/gcc/Contributors.html">amazing
 group of volunteers is what makes GCC successful.
 
 For additional information about GCC please refer to the
diff --git a/htdocs/gcc-12/index.html b/htdocs/gcc-12/index.html
index a76ef1dc..3ba4df2b 100644
--- a/htdocs/gcc-12/index.html
+++ b/htdocs/gcc-12/index.html
@@ -25,19 +25,19 @@ GCC 12.2 relative to previous releases of GCC.
 GCC 12.3
 May 8, 2023
 (changes,
- <a href="http://gcc.gnu.org/onlinedocs/12.3.0/">documentation</a>)
+ <a href="https://gcc.gnu.org/onlinedocs/12.3.0/">documentation</a>)
 
 
 GCC 12.2
 Aug 19, 2022
 (changes,
- <a href="http://gcc.gnu.org/onlinedocs/12.2.0/">documentation</a>)
+ <a href="https://gcc.gnu.org/onlinedocs/12.2.0/">documentation</a>)
 
 
 GCC 12.1
 May 6, 2022
 (changes,
- <a href="http://gcc.gnu.org/onlinedocs/12.1.0/">documentation</a>)
+ <a href="https://gcc.gnu.org/onlinedocs/12.1.0/">documentation</a>)
 
 
 
@@ -51,7 +51,7 @@ GNU Compiler Collection.
 The GCC developers would like to thank the numerous people that have
 contributed new features, improvements, bug fixes, and other changes as
 well as test results to GCC.
-This <a href="http://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Contributors.html">amazing
+This <a href="https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Contributors.html">amazing
 group of volunteers is what makes GCC successful.
 
 For additional information about GCC please refer to the
diff --git a/htdocs/gcc-13/index.html b/htdocs/gcc-13/index.html
index b04838db..ad1d5f1a 100644
--- a/htdocs/gcc-13/index.html
+++ b/htdocs/gcc-13/index.html