Re: [PATCH] Fortran: passing of allocatable/pointer arguments to OPTIONAL+VALUE [PR92887]

2023-11-03 Thread Paul Richard Thomas
Hi Harald,

This looks good to me. OK for mainline.

Thanks for the patch.

Paul


On Wed, 1 Nov 2023 at 22:10, Harald Anlauf  wrote:

> Dear all,
>
> I've dusted off and cleaned up a previous attempt to fix the handling
> of allocatable or pointer actual arguments to OPTIONAL+VALUE dummies.
> The standard says that a non-allocated / non-associated actual argument
> in that case shall be treated as non-present.
>
> However, gfortran's calling conventions demand that the presence status
> for OPTIONAL+VALUE is passed as a hidden argument, while we need to
> pass something on the stack which has the right type.  The solution
> is to conditionally create a temporary when needed.
>
> Testcase checked with NAG.
>
> Regtested on x86_64-pc-linux-gnu.  OK for mainline?
>
> Thanks,
> Harald
>
>
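Harald's description of the problem can be pictured with a rough C model of the
calling convention (all names here are invented for illustration; this is not
gfortran's actual ABI): the callee receives the VALUE dummy by copy plus a
hidden presence flag, so for a disassociated pointer actual the caller must
still put a correctly typed temporary on the stack while clearing the flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callee with an OPTIONAL+VALUE integer dummy: gfortran's
   convention passes the value itself plus a hidden presence flag.  */
static int
callee (int val, bool val_present)
{
  return val_present ? val : -1;  /* -1 stands in for "not present".  */
}

/* Caller side for a pointer actual: a disassociated pointer must be
   treated as a non-present argument, yet something of the right type
   still has to be passed by value -- hence the conditional temporary.  */
static int
call_with_pointer (const int *ptr)
{
  int temp = ptr != NULL ? *ptr : 0;  /* Value only meaningful if associated.  */
  return callee (temp, ptr != NULL);
}
```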


Re: Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-11-03 Thread juzhe.zh...@rivai.ai
Hi, Richi.

The following is strided load/store doc:

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index fab2513105a..4f0821a291d 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5094,6 +5094,22 @@ Bit @var{i} of the mask is set if element @var{i} of the 
result should
 be loaded from memory and clear if element @var{i} of the result should be 
undefined.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.

+@cindex @code{mask_len_strided_load@var{m}} instruction pattern
+@item @samp{mask_len_strided_load@var{m}}
+Load several separate memory locations into a destination vector of mode 
@var{m}.
+Operand 0 is a destination vector of mode @var{m}.
+Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
+operand 3 indicates stride operand is signed or unsigned number and operand 4 
is mask operand.
+operand 5 is length operand and operand 6 is bias operand.
+The instruction can be seen as a special case of 
@code{mask_len_gather_load@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and 
operand 2 as step.
+For each element index i load address is operand 1 + @var{i} * operand 2.
+operand 2 can be negative stride if operand 3 is 0.
+Similar to mask_len_load, the instruction loads at most (operand 5 + operand 
6) elements from memory.
+Bit @var{i} of the mask (operand 4) is set if element @var{i} of the result 
should
+be loaded from memory and clear if element @var{i} of the result should be 
undefined.
+Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5131,6 +5147,21 @@ at most (operand 6 + operand 7) elements of (operand 4) 
to memory.
 Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
stored.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.

+@cindex @code{mask_len_strided_store@var{m}} instruction pattern
+@item @samp{mask_len_strided_store@var{m}}
+Store a vector of mode m into several distinct memory locations.
+Operand 0 is a scalar base address and operand 1 is scalar stride of Pmode.
+operand 2 indicates stride operand is signed or unsigned number.
+Operand 3 is the vector of values that should be stored, which is of mode 
@var{m}.
+operand 4 is mask operand, operand 5 is length operand and operand 6 is bias 
operand.
+The instruction can be seen as a special case of 
@code{mask_len_scatter_store@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and 
operand 1 as step.
+For each element index i store address is operand 0 + @var{i} * operand 1.
+operand 1 can be negative stride if operand 2 is 0.
+Similar to mask_len_store, the instruction stores at most (operand 5 + operand 
6) elements of mask (operand 4) to memory.
+Bit @var{i} of the mask is set if element @var{i} of (operand 3) should be 
stored.
+Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,

Does it look reasonable? Thanks.
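For readers following the operand numbering above, here is one possible scalar
reference model of the proposed semantics (a sketch only -- the function names
and the byte-stride convention are choices made here, not part of the optab
interface; inactive elements are simply left untouched):

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative scalar model of mask_len_strided_load: for each element
   index i < len + bias with its mask bit set, load a 32-bit element
   from base + i * stride (stride is a signed byte offset).  */
static void
strided_load (int32_t *dest, const char *base, ptrdiff_t stride,
              const uint8_t *mask, size_t len, size_t bias)
{
  for (size_t i = 0; i < len + bias; i++)
    if (mask[i])
      dest[i] = *(const int32_t *) (base + (ptrdiff_t) i * stride);
}

/* Illustrative scalar model of mask_len_strided_store: the mirror
   operation, storing src[i] to base + i * stride for active elements.  */
static void
strided_store (char *base, ptrdiff_t stride, const int32_t *src,
               const uint8_t *mask, size_t len, size_t bias)
{
  for (size_t i = 0; i < len + bias; i++)
    if (mask[i])
      *(int32_t *) (base + (ptrdiff_t) i * stride) = src[i];
}
```

Because the stride is a signed byte offset, a negative stride simply walks
backwards from the base, which is the observation behind dropping a separate
signed/unsigned operand once the expander extends the stride to the right mode.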



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-11-02 22:34
To: 钟居哲
CC: gcc-patches; Jeff Law; richard.sandiford; rdapp.gcc
Subject: Re: Re: [PATCH V2] OPTABS/IFN: Add 
mask_len_strided_load/mask_len_strided_store OPTABS/IFN
On Thu, 2 Nov 2023, 钟居哲 wrote:
 
> Ok. So drop 'scale' and keep signed/unsigned argument, is that right?
 
I don't think we need signed/unsigned.  RTL expansion has the signedness
of the offset argument there and can just extend to the appropriate mode
to offset a pointer.
 
> And I wonder I should create the stride_type using size_type_node or 
> ptrdiff_type_node ?
> Which is preferrable ?
 
'sizetype' - that's the type we require to be used for 
the POINTER_PLUS_EXPR offset operand.
 
 
> Thanks.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-02 22:27
> To: 钟居哲
> CC: gcc-patches; Jeff Law; richard.sandiford; rdapp.gcc
> Subject: Re: Re: [PATCH V2] OPTABS/IFN: Add 
> mask_len_strided_load/mask_len_strided_store OPTABS/IFN
> On Thu, 2 Nov 2023, 钟居哲 wrote:
>  
> > Hi, Richi.
> > 
> > >> Do we really need to have two modes for the optab though or could we
> > >> simply require the target to support arbitrary offset modes (give it
> > >> is implicitly constrained to ptr_mode for the base already)?  Or
> > >> properly extend/truncate the offset at expansion time, say to ptr_mode
> > >> or to the mode of sizetype.
> > 
> > For RVV, it's ok by default set stride type as ptr_mode/size_type by 
> > default.
> > Is it ok that I define strided load/store as single mode optab and default 
> > Pmode as stride operand?
> > How about scale and signed/unsigned operand ?
> > It seems scale operand can be removed ?

[PATCH] [doc] middle-end/112296 - __builtin_constant_p and side-effects

2023-11-03 Thread Richard Biener
The following tries to clarify the __builtin_constant_p documentation,
stating that the argument expression is not evaluated and side-effects
are discarded.  I'm struggling to find the correct terms matching
what the C language standard would call things so I'd appreciate
some help here.

OK for trunk?

Shall we diagnose arguments with side-effects?  It seems to me
such use is usually unintended?  I think rather than dropping
side-effects as a side-effect of folding the frontend should
discard them at parsing time instead, no?

Thanks,
Richard.

PR middle-end/112296
* doc/extend.texi (__builtin_constant_p): Clarify that
side-effects are discarded.
---
 gcc/doc/extend.texi | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index fa7402813e7..c8fc4e391b5 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -14296,14 +14296,16 @@ an error if there is no such function.
 
 @defbuiltin{int __builtin_constant_p (@var{exp})}
 You can use the built-in function @code{__builtin_constant_p} to
-determine if a value is known to be constant at compile time and hence
-that GCC can perform constant-folding on expressions involving that
-value.  The argument of the function is the value to test.  The function
+determine if the expression @var{exp} is known to be constant at
+compile time and hence that GCC can perform constant-folding on expressions
+involving that value.  The argument of the function is the expression to test.
+The expression is not evaluated, side-effects are discarded.  The function
 returns the integer 1 if the argument is known to be a compile-time
-constant and 0 if it is not known to be a compile-time constant.  A
-return of 0 does not indicate that the value is @emph{not} a constant,
-but merely that GCC cannot prove it is a constant with the specified
-value of the @option{-O} option.
+constant and 0 if it is not known to be a compile-time constant.
+Any expression that has side-effects makes the function return 0.
+A return of 0 does not indicate that the expression is @emph{not} a constant,
+but merely that GCC cannot prove it is a constant within the constraints
+of the active set of optimization options.
 
 You typically use this function in an embedded application where
 memory is a critical resource.  If you have some complex calculation,
-- 
2.35.3
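The behavior the patch documents can be exercised directly (a minimal check
using the GCC-specific built-in; only the cases the new wording guarantees are
asserted, since folding of other arguments varies with optimization level):

```c
/* A literal constant expression is always known at compile time.  */
static int
cst_literal (void)
{
  return __builtin_constant_p (42 + 1);
}

/* Per the clarified documentation, an argument with side-effects makes
   __builtin_constant_p return 0; the side-effect is discarded, not
   evaluated.  */
static int
cst_side_effect (void)
{
  int x = 0;
  return __builtin_constant_p (x++);
}
```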


Re: [PATCH] tree-optimization: Add register pressure heuristics

2023-11-03 Thread Richard Biener
On Thu, Nov 2, 2023 at 9:50 PM Ajit Agarwal  wrote:
>
> Hello All:
>
> Currently code sinking heuristics are based on profile data like
> basic block count and sink frequency threshold. We have removed
> such heuristics and added register pressure heuristics based on
> live-in and live-out of early blocks and immediate dominator of
> use blocks of the same loop nesting depth.
>
> Such heuristics reduces register pressure when code sinking is
> done with same loop nesting depth.
>
> High register pressure region is the region where there are live-in of
> early blocks that has been modified by the early block. If there are
> modification of the variables in best block that are live-in in early
> block that are live-out of best block.

?!  Parse error.

> Bootstrapped and regtested on powerpc64-linux-gnu.

What's the effect on code generation?

Note that live is a quadratic problem while sinking was not.  You
are effectively making the pass unfit for -O1.

You are computing "liveness" on GIMPLE where within EBBs there
isn't really any particular order of stmts, so it's kind of a garbage
heuristic.  Likewise you are not computing the effect that moving
a stmt has on liveness as far as I can see but you are just identifying
some odd metrics (I don't really understand them) to rank blocks,
not even taking the register file size into account.

You are replacing the hot/cold heuristic.

IMHO the sinking pass is the totally wrong place to do anything
about register pressure.  You are trying to solve a scheduling
problem by just looking at a single stmt.

Richard.

> Thanks & Regards
> Ajit
>
> tree-optimization: Add register pressure heuristics
>
> Currently code sinking heuristics are based on profile data like
> basic block count and sink frequency threshold. We have removed
> such heuristics to add register pressure heuristics based on
> live-in and live-out of early blocks and immediate dominator of
> use blocks.
>
> High register pressure region is the region where there are live-in of
> early blocks that has been modified by the early block. If there are
> modification of the variables in best block that are live-in in early
> block that are live-out of best block.
>
> 2023-11-03  Ajit Kumar Agarwal  
>
> gcc/ChangeLog:
>
> * tree-ssa-sink.cc (statement_sink_location): Add tree_live_info_p
> as paramters.
> (sink_code_in_bb): Ditto.
> (select_best_block): Add register pressure heuristics to select
> the best blocks in the immediate dominator for same loop nest depth.
> (execute): Add live range analysis.
> (additional_var_map): New function.
> * tree-ssa-live.cc (set_var_live_on_entry): Add virtual operand
> tests on ssa_names.
> (verify_live_on_entry): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
> * gcc.dg/tree-ssa/ssa-sink-22.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 +
>  gcc/tree-ssa-live.cc| 11 ++-
>  gcc/tree-ssa-sink.cc| 93 ++---
>  4 files changed, 104 insertions(+), 34 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> new file mode 100644
> index 000..d3b79ca5803
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
> +void bar();
> +int j;
> +void foo(int a, int b, int c, int d, int e, int f)
> +{
> +  int l;
> +  l = a + b + c + d +e + f;
> +  if (a != 5)
> +{
> +  bar();
> +  j = l;
> +}
> +}
> +/* { dg-final { scan-tree-dump 
> {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
> new file mode 100644
> index 000..84e7938c54f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
> +void bar();
> +int j, x;
> +void foo(int a, int b, int c, int d, int e, int f)
> +{
> +  int l;
> +  l = a + b + c + d +e + f;
> +  if (a != 5)
> +{
> +  bar();
> +  if (b != 3)
> +x = 3;
> +  else
> +x = 5;
> +  j = l;
> +}
> +}
> +/* { dg-final { scan-tree-dump 
> {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
> diff --git a/gcc/tree-ssa-live.cc b/gcc/tree-ssa-live.cc
> index f06daf23035..998fe588278 100644
> --- a/gcc/tree-ssa-live.cc
> +++ b/gcc/tree-ssa-live.cc
> @@ -1141,7 +1141,8 @@ set_var_live_on_entry (tree ssa_name, tree_live_info_p 
> live)
>  def_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
>
>/* An

Re: [PATCH] vect: allow using inbranch simdclones for masked loops

2023-11-03 Thread Richard Biener
On Thu, 2 Nov 2023, Andre Vieira (lists) wrote:

> Hi,
> 
> In a previous patch I did most of the work for this, but forgot to change the
> check for number of arguments matching between call and simdclone.  This check
> should accept calls without a mask to be matched against simdclones with mask
> arguments.  I also added tests to verify this feature actually works.
> 
> 
> For the simd-builtins tests I decided to remove the sin (double) simdclone
> which would now be used, because it was inbranch and we enable their use for
> not inbranch.  Given the nature of the test, removing it made more sense, but
> thats not a strong opinion, happy to change.
> 
> Bootstrapped and regression tested on aarch64-unknown-linux-gnu and
> x86_64-pc-linux-gnu.
> 
> OK for trunk?

OK.

I do wonder about the gfortran testsuite adjustments though.

!GCC$ builtin (sin) attributes simd (inbranch)

  ! this should not be using simd clone
  y4 = sin(x8)

previously we wouldn't vectorize this as no notinbranch simd function
is available but now we do since we use the inbranch function for the
notinbranch call.  If that's desired then a better modification of
the test would be to expect vectorization, no?

Richard.

> PS: I'll be away for two weeks from tomorrow, it would be really nice if this
> can go in for gcc-14, otherwise the previous work I did for this won't have
> any actual visible effect :(
> 
> 
> gcc/ChangeLog:
> 
>   * tree-vect-stmts.cc (vectorizable_simd_clone_call): Allow unmasked
>   calls to use masked simdclones.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-simd-clone-20.c: New file.
>   * gfortran.dg/simd-builtins-1.h: Adapt.
>   * gfortran.dg/simd-builtins-6.f90: Adapt.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [tree-optimization/111721 V2] VECT: Support SLP for MASK_LEN_GATHER_LOAD with dummy mask

2023-11-03 Thread Richard Biener
On Fri, 3 Nov 2023, Juzhe-Zhong wrote:

> This patch fixes following FAILs for RVV:
> FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP 
> stmts"
> 
> Bootstrap on X86 and regtest passed.
> 
> Ok for trunk ?

OK.  We can walk back if problems with SVE appear.

Thanks,
Richard.

> PR tree-optimization/111721
> 
> gcc/ChangeLog:
> 
> * tree-vect-slp.cc (vect_get_and_check_slp_defs): Support SLP for 
> dummy mask -1.
> * tree-vect-stmts.cc (vectorizable_load): Ditto.
> 
> ---
>  gcc/tree-vect-slp.cc   | 5 ++---
>  gcc/tree-vect-stmts.cc | 5 +++--
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 43d742e3c92..6b8a7b628b6 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -759,9 +759,8 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
> char swap,
> if ((dt == vect_constant_def
>  || dt == vect_external_def)
> && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> -   && (TREE_CODE (type) == BOOLEAN_TYPE
> -   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
> -   type)))
> +   && TREE_CODE (type) != BOOLEAN_TYPE
> +   && !can_duplicate_and_interleave_p (vinfo, stmts.length (), type))
>   {
> if (dump_enabled_p ())
>   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 6ce4868d3e1..8c92bd5d931 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -9825,6 +9825,7 @@ vectorizable_load (vec_info *vinfo,
>  
>tree mask = NULL_TREE, mask_vectype = NULL_TREE;
>int mask_index = -1;
> +  slp_tree slp_op = NULL;
>if (gassign *assign = dyn_cast  (stmt_info->stmt))
>  {
>scalar_dest = gimple_assign_lhs (assign);
> @@ -9861,7 +9862,7 @@ vectorizable_load (vec_info *vinfo,
>   mask_index = vect_slp_child_index_for_operand (call, mask_index);
>if (mask_index >= 0
> && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
> -   &mask, NULL, &mask_dt, &mask_vectype))
> +   &mask, &slp_op, &mask_dt, &mask_vectype))
>   return false;
>  }
>  
> @@ -10046,7 +10047,7 @@ vectorizable_load (vec_info *vinfo,
>  {
>if (slp_node
> && mask
> -   && !vect_maybe_update_slp_op_vectype (SLP_TREE_CHILDREN (slp_node)[0],
> +   && !vect_maybe_update_slp_op_vectype (slp_op,
>   mask_vectype))
>   {
> if (dump_enabled_p ())
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-11-03 Thread Richard Biener
On Fri, 3 Nov 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.
> 
> The following is strided load/store doc:
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index fab2513105a..4f0821a291d 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5094,6 +5094,22 @@ Bit @var{i} of the mask is set if element @var{i} of 
> the result should
>  be loaded from memory and clear if element @var{i} of the result should be 
> undefined.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> 
> +@cindex @code{mask_len_strided_load@var{m}} instruction pattern
> +@item @samp{mask_len_strided_load@var{m}}
> +Load several separate memory locations into a destination vector of mode 
> @var{m}.
> +Operand 0 is a destination vector of mode @var{m}.
> +Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
> +operand 3 indicates stride operand is signed or unsigned number and operand 
> 4 is mask operand.

we don't need operand 3

> +operand 5 is length operand and operand 6 is bias operand.
> +The instruction can be seen as a special case of 
> @code{mask_len_gather_load@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and 
> operand 2 as step.
> +For each element index i load address is operand 1 + @var{i} * operand 2.
> +operand 2 can be negative stride if operand 3 is 0.
> +Similar to mask_len_load, the instruction loads at most (operand 5 + operand 
> 6) elements from memory.
> +Bit @var{i} of the mask (operand 4) is set if element @var{i} of the result 
> should

Element @var{i}.  I think we don't want to document 'undefined' but
rather match mask_gather_load and say the result should be zero.

Similar adjustments below.

> +be loaded from memory and clear if element @var{i} of the result should be 
> undefined.
> +Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored.
> +
>  @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
>  @item @samp{scatter_store@var{m}@var{n}}
>  Store a vector of mode @var{m} into several distinct memory locations.
> @@ -5131,6 +5147,21 @@ at most (operand 6 + operand 7) elements of (operand 
> 4) to memory.
>  Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
> stored.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> 
> +@cindex @code{mask_len_strided_store@var{m}} instruction pattern
> +@item @samp{mask_len_strided_store@var{m}}
> +Store a vector of mode m into several distinct memory locations.
> +Operand 0 is a scalar base address and operand 1 is scalar stride of Pmode.
> +operand 2 indicates stride operand is signed or unsigned number.
> +Operand 3 is the vector of values that should be stored, which is of mode 
> @var{m}.
> +operand 4 is mask operand, operand 5 is length operand and operand 6 is bias 
> operand.
> +The instruction can be seen as a special case of 
> @code{mask_len_scatter_store@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and 
> operand 1 as step.
> +For each element index i store address is operand 0 + @var{i} * operand 1.
> +operand 1 can be negative stride if operand 2 is 0.
> +Similar to mask_len_store, the instruction stores at most (operand 5 + 
> operand 6) elements of mask (operand 4) to memory.
> +Bit @var{i} of the mask is set if element @var{i} of (operand 3) should be 
> stored.
> +Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored.
> +
>  @cindex @code{vec_set@var{m}} instruction pattern
>  @item @samp{vec_set@var{m}}
>  Set given field in the vector value.  Operand 0 is the vector to modify,
> 
> Does it look reasonable ? Thanks.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-02 22:34
> To: 钟居哲
> CC: gcc-patches; Jeff Law; richard.sandiford; rdapp.gcc
> Subject: Re: Re: [PATCH V2] OPTABS/IFN: Add 
> mask_len_strided_load/mask_len_strided_store OPTABS/IFN
> On Thu, 2 Nov 2023, 钟居哲 wrote:
>  
> > Ok. So drop 'scale' and keep signed/unsigned argument, is that right?
>  
> I don't think we need signed/unsigned.  RTL expansion has the signedness
> of the offset argument there and can just extend to the appropriate mode
> to offset a pointer.
>  
> > And I wonder I should create the stride_type using size_type_node or 
> > ptrdiff_type_node ?
> > Which is preferrable ?
>  
> 'sizetype' - that's the type we require to be used for 
> the POINTER_PLUS_EXPR offset operand.
>  
>  
> > Thanks.
> > 
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-11-02 22:27
> > To: 钟居哲
> > CC: gcc-patches; Jeff Law; richard.sandiford; rdapp.gcc
> > Subject: Re: Re: [PATCH V2] OPTABS/IFN: Add 
> > mask_len_strided_load/mask_len_strided_store OPTABS/IFN
> > On Thu, 2 Nov 2023, 钟居哲 wrote:
> >  
> > > Hi, Richi.
> > > 
> > > >> Do we really need to have two modes for the optab though or could we
> > > >> simply require the target to support arbitrary offset modes (give it
> > > >

[PATCH v3] rs6000/p8swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-11-03 Thread Surya Kumari Jangala
Hi Segher,
I have incorporated the changes you suggested in your review of version 2 of
the patch. Please review.

Regards,
Surya


rs6000/p8swap: Fix incorrect lane extraction by vec_extract() [PR106770]

In the routine rs6000_analyze_swaps(), special handling of swappable
instructions is done even if the webs that contain the swappable instructions
are not optimized, i.e., the webs do not contain any permuting load/store
instructions along with the associated register swap instructions. Doing special
handling in such webs will result in the extracted lane being adjusted
unnecessarily for vec_extract.

Another issue is that existing code treats non-permuting loads/stores as special
swappables. Non-permuting loads/stores (that have not yet been split into a
permuting load/store and a swap) are handled by converting them into a permuting
load/store (which effectively removes the swap). As a result, if special
swappables are handled only in webs containing permuting loads/stores, then
non-optimal code is generated for non-permuting loads/stores.

Hence, in this patch, all webs containing either permuting loads/ stores or
non-permuting loads/stores are marked as requiring special handling of
swappables. Swaps associated with permuting loads/stores are marked for removal,
and non-permuting loads/stores are converted to permuting loads/stores. Then the
special swappables in the webs are fixed up.

This patch also ensures that swappable instructions are not modified in the
following webs as it is incorrect to do so:
 - webs containing permuting load/store instructions and associated swap
   instructions that are transformed by converting the permuting memory
   instructions into non-permuting instructions and removing the swap
   instructions.
 - webs where swap(load(vector constant)) instructions are replaced with
   load(swapped vector constant).

2023-09-10  Surya Kumari Jangala  

gcc/
PR rtl-optimization/106770
* config/rs6000/rs6000-p8swap.cc (non_permuting_mem_insn): New function.
(handle_non_permuting_mem_insn): New function.
(rs6000_analyze_swaps): Handle swappable instructions only in certain
webs.
(web_requires_special_handling): New instance variable.
(handle_special_swappables): Remove handling of non-permuting load/store
instructions.

gcc/testsuite/
PR rtl-optimization/106770
* gcc.target/powerpc/pr106770.c: New test.
---

diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
b/gcc/config/rs6000/rs6000-p8swap.cc
index 0388b9bd736..02ea299bc3d 100644
--- a/gcc/config/rs6000/rs6000-p8swap.cc
+++ b/gcc/config/rs6000/rs6000-p8swap.cc
@@ -179,6 +179,13 @@ class swap_web_entry : public web_entry_base
   unsigned int special_handling : 4;
   /* Set if the web represented by this entry cannot be optimized.  */
   unsigned int web_not_optimizable : 1;
+  /* Set if the swappable insns in the web represented by this entry
+ have to be fixed. Swappable insns have to be fixed in:
+   - webs containing permuting loads/stores and the swap insns
+in such webs have been marked for removal
+   - webs where non-permuting loads/stores have been converted
+to permuting loads/stores  */
+  unsigned int web_requires_special_handling : 1;
   /* Set if this insn should be deleted.  */
   unsigned int will_delete : 1;
 };
@@ -1468,14 +1475,6 @@ handle_special_swappables (swap_web_entry *insn_entry, 
unsigned i)
   if (dump_file)
fprintf (dump_file, "Adjusting subreg in insn %d\n", i);
   break;
-case SH_NOSWAP_LD:
-  /* Convert a non-permuting load to a permuting one.  */
-  permute_load (insn);
-  break;
-case SH_NOSWAP_ST:
-  /* Convert a non-permuting store to a permuting one.  */
-  permute_store (insn);
-  break;
 case SH_EXTRACT:
   /* Change the lane on an extract operation.  */
   adjust_extract (insn);
@@ -2401,6 +2400,25 @@ recombine_lvx_stvx_patterns (function *fun)
   free (to_delete);
 }
 
+/* Return true if insn is a non-permuting load/store.  */
+static bool
+non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
+{
+  return insn_entry[i].special_handling == SH_NOSWAP_LD
+|| insn_entry[i].special_handling == SH_NOSWAP_ST;
+}
+
+/* Convert a non-permuting load/store insn to a permuting one.  */
+static void
+convert_mem_insn (swap_web_entry *insn_entry, unsigned int i)
+{
+  rtx_insn *insn = insn_entry[i].insn;
+  if (insn_entry[i].special_handling == SH_NOSWAP_LD)
+permute_load (insn);
+  if (insn_entry[i].special_handling == SH_NOSWAP_ST)
+permute_store (insn);
+}
+
 /* Main entry point for this pass.  */
 unsigned int
 rs6000_analyze_swaps (function *fun)
@@ -2624,25 +2642,55 @@ rs6000_analyze_swaps (function *fun)
   dump_swap_insn_table (insn_entry);
 }
 
-  /* For each load and store in an optimizable web (which implies
- the loads and stores are permuting), find the associ

Ping #2: [PATCH] Power10: Add options to disable load and store vector pair.

2023-11-03 Thread Michael Meissner
Ping #2

| Date: Fri, 13 Oct 2023 19:41:13 -0400
| From: Michael Meissner 
| Subject: [PATCH] Power10: Add options to disable load and store vector pair.
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632987.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping #2: [PATCH 1/6] PowerPC: Add -mcpu=future option

2023-11-03 Thread Michael Meissner
Ping #2

| Date: Wed, 18 Oct 2023 19:58:56 -0400
| From: Michael Meissner 
| Subject: Re: [PATCH 1/6] PowerPC: Add -mcpu=future option
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633511.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-11-03 Thread juzhe.zh...@rivai.ai
Hi, Richi.

>> Element @var{i}.  I think we don't want to document 'undefined' but
>> rather match mask_gather_load and say the result should be zero.

I am confused here: will the vectorizer depend on a zero inactive value?
Since RVV loads/stores don't touch the inactive (tail or unmasked) elements, I
am wondering whether this will cause issues.
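The difference in question can be made concrete with two scalar models of the
inactive-element policy (illustrative names and an element-indexed stride, for
brevity):

```c
#include <stdint.h>
#include <stddef.h>

/* "Zeroing" policy, as the mask_gather_load documentation words it:
   inactive elements of the result are set to zero.  */
static void
load_zeroing (int32_t *dest, const int32_t *base, ptrdiff_t stride,
              const uint8_t *mask, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dest[i] = mask[i] ? base[(ptrdiff_t) i * stride] : 0;
}

/* "Undisturbed" policy: inactive elements keep their previous contents,
   which is what an RVV masked load does by default.  Honoring zeroing
   semantics on such a target would need an extra select/merge against a
   zero vector at expansion time.  */
static void
load_undisturbed (int32_t *dest, const int32_t *base, ptrdiff_t stride,
                  const uint8_t *mask, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (mask[i])
      dest[i] = base[(ptrdiff_t) i * stride];
}
```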


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-11-03 15:40
To: juzhe.zh...@rivai.ai
CC: gcc-patches; jeffreyalaw; richard.sandiford; Robin Dapp
Subject: Re: Re: [PATCH V2] OPTABS/IFN: Add 
mask_len_strided_load/mask_len_strided_store OPTABS/IFN
On Fri, 3 Nov 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi, Richi.
> 
> The following is strided load/store doc:
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index fab2513105a..4f0821a291d 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5094,6 +5094,22 @@ Bit @var{i} of the mask is set if element @var{i} of 
> the result should
>  be loaded from memory and clear if element @var{i} of the result should be 
> undefined.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> 
> +@cindex @code{mask_len_strided_load@var{m}} instruction pattern
> +@item @samp{mask_len_strided_load@var{m}}
> +Load several separate memory locations into a destination vector of mode 
> @var{m}.
> +Operand 0 is a destination vector of mode @var{m}.
> +Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
> +operand 3 indicates stride operand is signed or unsigned number and operand 
> 4 is mask operand.
 
we don't need operand 3
 
> +operand 5 is length operand and operand 6 is bias operand.
> +The instruction can be seen as a special case of 
> @code{mask_len_gather_load@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and 
> operand 2 as step.
> +For each element index i load address is operand 1 + @var{i} * operand 2.
> +operand 2 can be negative stride if operand 3 is 0.
> +Similar to mask_len_load, the instruction loads at most (operand 5 + operand 
> 6) elements from memory.
> +Bit @var{i} of the mask (operand 4) is set if element @var{i} of the result 
> should
 
Element @var{i}.  I think we don't want to document 'undefined' but
rather match mask_gather_load and say the result should be zero.
 
Similar adjustments below.
 
> +be loaded from memory and clear if element @var{i} of the result should be 
> undefined.
> +Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored.
> +
>  @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
>  @item @samp{scatter_store@var{m}@var{n}}
>  Store a vector of mode @var{m} into several distinct memory locations.
> @@ -5131,6 +5147,21 @@ at most (operand 6 + operand 7) elements of (operand 
> 4) to memory.
>  Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
> stored.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> 
> +@cindex @code{mask_len_strided_store@var{m}} instruction pattern
> +@item @samp{mask_len_strided_store@var{m}}
> +Store a vector of mode m into several distinct memory locations.
> +Operand 0 is a scalar base address and operand 1 is scalar stride of Pmode.
> +operand 2 indicates stride operand is signed or unsigned number.
> +Operand 3 is the vector of values that should be stored, which is of mode 
> @var{m}.
> +operand 4 is mask operand, operand 5 is length operand and operand 6 is bias 
> operand.
> +The instruction can be seen as a special case of 
> @code{mask_len_scatter_store@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and 
> operand 1 as step.
> +For each element index i store address is operand 0 + @var{i} * operand 1.
> +operand 1 can be negative stride if operand 2 is 0.
> +Similar to mask_len_store, the instruction stores at most (operand 5 + 
> operand 6) elements of mask (operand 4) to memory.
> +Bit @var{i} of the mask is set if element @var{i} of (operand 3) should be 
> stored.
> +Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored.
> +
>  @cindex @code{vec_set@var{m}} instruction pattern
>  @item @samp{vec_set@var{m}}
>  Set given field in the vector value.  Operand 0 is the vector to modify,
> 
> Does it look reasonable ? Thanks.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-02 22:34
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; Jeff Law; richard.sandiford; rdapp.gcc
> Subject: Re: Re: [PATCH V2] OPTABS/IFN: Add 
> mask_len_strided_load/mask_len_strided_store OPTABS/IFN
> On Thu, 2 Nov 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Ok. So drop 'scale' and keep signed/unsigned argument, is that right?
>  
> I don't think we need signed/unsigned.  RTL expansion has the signedness
> of the offset argument there and can just extend to the appropriate mode
> to offset a pointer.
>  
> > And I wonder I should create the stride_type using size_type_node or 
> > ptrdiff_type_node ?
> > Which is preferrable ?
>  
> 'sizetype' - that's 

Ping #2: [PATCH 2/6] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair.

2023-11-03 Thread Michael Meissner
Ping #2

| Date: Wed, 18 Oct 2023 20:00:18 -0400
| From: Michael Meissner 
| Subject: [PATCH 2/6] PowerPC: Make -mcpu=future enable 
-mblock-ops-vector-pair.
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633512.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping #2: [PATCH 3/6] PowerPC: Add support for accumulators in DMR registers.

2023-11-03 Thread Michael Meissner
Ping #2

| Date: Wed, 18 Oct 2023 20:01:54 -0400
| From: Michael Meissner 
| Subject: [PATCH 3/6] PowerPC: Add support for accumulators in DMR registers.
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633514.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping #2: [PATCH 5/6] PowerPC: Switch to dense math names for all MMA operations.

2023-11-03 Thread Michael Meissner
Ping #2

| Date: Wed, 18 Oct 2023 20:04:44 -0400
| From: Michael Meissner 
| Subject: [PATCH 5/6] PowerPC: Switch to dense math names for all MMA 
operations.
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633515.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Ping #2: [PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers.

2023-11-03 Thread Michael Meissner
Ping #2

| Date: Wed, 18 Oct 2023 20:06:20 -0400
| From: Michael Meissner 
| Subject: [PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers.
| Message-ID: 

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633516.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-11-03 Thread juzhe.zh...@rivai.ai
Hi, Richi.

Is the following reasonable?

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index fab2513105a..8d1cdad1607 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5094,6 +5094,20 @@ Bit @var{i} of the mask is set if element @var{i} of the 
result should
 be loaded from memory and clear if element @var{i} of the result should be 
undefined.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.

+@cindex @code{mask_len_strided_load@var{m}} instruction pattern
+@item @samp{mask_len_strided_load@var{m}}
+Load several separate memory locations into a destination vector of mode 
@var{m}.
+Operand 0 is a destination vector of mode @var{m}.
+Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
+operand 3 is mask operand, operand 4 is length operand and operand 5 is bias 
operand.
+The instruction can be seen as a special case of 
@code{mask_len_gather_load@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and 
operand 2 as step.
+For each element index i load address is operand 1 + @var{i} * operand 2.
+Similar to mask_len_load, the instruction loads at most (operand 4 + operand 
5) elements from memory.
+Bit @var{i} of the mask (operand 3) is set if element @var{i} of the result 
should
+be loaded from memory and clear if element @var{i} of the result should be 
zero.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5131,6 +5145,19 @@ at most (operand 6 + operand 7) elements of (operand 4) 
to memory.
 Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
stored.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.

+@cindex @code{mask_len_strided_store@var{m}} instruction pattern
+@item @samp{mask_len_strided_store@var{m}}
+Store a vector of mode m into several distinct memory locations.
+Operand 0 is a scalar base address and operand 1 is scalar stride of Pmode.
+Operand 2 is the vector of values that should be stored, which is of mode 
@var{m}.
+operand 3 is mask operand, operand 4 is length operand and operand 5 is bias 
operand.
+The instruction can be seen as a special case of 
@code{mask_len_scatter_store@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and 
operand 1 as step.
+For each element index i store address is operand 0 + @var{i} * operand 1.
+Similar to mask_len_store, the instruction stores at most (operand 4 + operand 
5) elements of mask (operand 3) to memory.
+Bit @var{i} of the mask is set if element @var{i} of (operand 3) should be 
stored.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,

Thanks.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-11-03 15:40
To: juzhe.zh...@rivai.ai
CC: gcc-patches; jeffreyalaw; richard.sandiford; Robin Dapp
Subject: Re: Re: [PATCH V2] OPTABS/IFN: Add 
mask_len_strided_load/mask_len_strided_store OPTABS/IFN
On Fri, 3 Nov 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi, Richi.
> 
> The following is strided load/store doc:
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index fab2513105a..4f0821a291d 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5094,6 +5094,22 @@ Bit @var{i} of the mask is set if element @var{i} of 
> the result should
>  be loaded from memory and clear if element @var{i} of the result should be 
> undefined.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> 
> +@cindex @code{mask_len_strided_load@var{m}} instruction pattern
> +@item @samp{mask_len_strided_load@var{m}}
> +Load several separate memory locations into a destination vector of mode 
> @var{m}.
> +Operand 0 is a destination vector of mode @var{m}.
> +Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
> +operand 3 indicates stride operand is signed or unsigned number and operand 
> 4 is mask operand.
 
we don't need operand 3
 
> +operand 5 is length operand and operand 6 is bias operand.
> +The instruction can be seen as a special case of 
> @code{mask_len_gather_load@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and 
> operand 2 as step.
> +For each element index i load address is operand 1 + @var{i} * operand 2.
> +operand 2 can be negative stride if operand 3 is 0.
> +Similar to mask_len_load, the instruction loads at most (operand 5 + operand 
> 6) elements from memory.
> +Bit @var{i} of the mask (operand 4) is set if element @var{i} of the result 
> should
 
Element @var{i}.  I think we don't want to document 'undefined' but
rather match mask_gather_load and say the result should be zero.
 

Re: Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-11-03 Thread Richard Biener
On Fri, 3 Nov 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.
> 
> >> Element @var{i}.  I think we don't want to document 'undefined' but
> >> rather match mask_gather_load and say the result should be zero.
> 
> I am confused here: will the vectorizer depend on the inactive values being zero?

I don't think it does, but why be inconsistent with gather and at
the same time document it behaves like gather?

> Since RVV load/store doesn't touch the inactive (tail or unmasked) elements,
> I am wondering whether it will cause issues.

For loads, not zeroing will cause a false dependence on the previous
value of the register, so I doubt this is optimal behavior.  For stores
the inactive elements are of course not stored.

> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-03 15:40
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; jeffreyalaw; richard.sandiford; Robin Dapp
> Subject: Re: Re: [PATCH V2] OPTABS/IFN: Add 
> mask_len_strided_load/mask_len_strided_store OPTABS/IFN
> On Fri, 3 Nov 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi, Richi.
> > 
> > The following is strided load/store doc:
> > 
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index fab2513105a..4f0821a291d 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5094,6 +5094,22 @@ Bit @var{i} of the mask is set if element @var{i} of 
> > the result should
> >  be loaded from memory and clear if element @var{i} of the result should be 
> > undefined.
> >  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> > 
> > +@cindex @code{mask_len_strided_load@var{m}} instruction pattern
> > +@item @samp{mask_len_strided_load@var{m}}
> > +Load several separate memory locations into a destination vector of mode 
> > @var{m}.
> > +Operand 0 is a destination vector of mode @var{m}.
> > +Operand 1 is a scalar base address and operand 2 is a scalar stride of 
> > Pmode.
> > +operand 3 indicates stride operand is signed or unsigned number and 
> > operand 4 is mask operand.
>  
> we don't need operand 3
>  
> > +operand 5 is length operand and operand 6 is bias operand.
> > +The instruction can be seen as a special case of 
> > @code{mask_len_gather_load@var{m}@var{n}}
> > +with an offset vector that is a @code{vec_series} with operand 1 as base 
> > and operand 2 as step.
> > +For each element index i load address is operand 1 + @var{i} * operand 2.
> > +operand 2 can be negative stride if operand 3 is 0.
> > +Similar to mask_len_load, the instruction loads at most (operand 5 + 
> > operand 6) elements from memory.
> > +Bit @var{i} of the mask (operand 4) is set if element @var{i} of the 
> > result should
>  
> Element @var{i}.  I think we don't want to document 'undefined' but
> rather match mask_gather_load and say the result should be zero.
>  
> Similar adjustments below.
>  
> > +be loaded from memory and clear if element @var{i} of the result should be 
> > undefined.
> > +Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored.
> > +
> >  @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
> >  @item @samp{scatter_store@var{m}@var{n}}
> >  Store a vector of mode @var{m} into several distinct memory locations.
> > @@ -5131,6 +5147,21 @@ at most (operand 6 + operand 7) elements of (operand 
> > 4) to memory.
> >  Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
> > stored.
> >  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> > 
> > +@cindex @code{mask_len_strided_store@var{m}} instruction pattern
> > +@item @samp{mask_len_strided_store@var{m}}
> > +Store a vector of mode m into several distinct memory locations.
> > +Operand 0 is a scalar base address and operand 1 is scalar stride of Pmode.
> > +operand 2 indicates stride operand is signed or unsigned number.
> > +Operand 3 is the vector of values that should be stored, which is of mode 
> > @var{m}.
> > +operand 4 is mask operand, operand 5 is length operand and operand 6 is 
> > bias operand.
> > +The instruction can be seen as a special case of 
> > @code{mask_len_scatter_store@var{m}@var{n}}
> > +with an offset vector that is a @code{vec_series} with operand 1 as base 
> > and operand 1 as step.
> > +For each element index i store address is operand 0 + @var{i} * operand 1.
> > +operand 1 can be negative stride if operand 2 is 0.
> > +Similar to mask_len_store, the instruction stores at most (operand 5 + 
> > operand 6) elements of mask (operand 4) to memory.
> > +Bit @var{i} of the mask is set if element @var{i} of (operand 3) should be 
> > stored.
> > +Mask elements @var{i} with @var{i} > (operand 5 + operand 6) are ignored.
> > +
> >  @cindex @code{vec_set@var{m}} instruction pattern
> >  @item @samp{vec_set@var{m}}
> >  Set given field in the vector value.  Operand 0 is the vector to modify,
> > 
> > Does it look reasonable ? Thanks.
> > 
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-11-02 22:34
> > To: ???
> > CC: gcc-patches; Jeff Law; ri

Re: Re: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN

2023-11-03 Thread Richard Biener
On Fri, 3 Nov 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.
> 
> Is this following reasonable ?
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index fab2513105a..8d1cdad1607 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5094,6 +5094,20 @@ Bit @var{i} of the mask is set if element @var{i} of 
> the result should
>  be loaded from memory and clear if element @var{i} of the result should be 
> undefined.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> 
> +@cindex @code{mask_len_strided_load@var{m}} instruction pattern
> +@item @samp{mask_len_strided_load@var{m}}
> +Load several separate memory locations into a destination vector of mode 
> @var{m}.
> +Operand 0 is a destination vector of mode @var{m}.
> +Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
> +operand 3 is mask operand, operand 4 is length operand and operand 5 is bias 
> operand.
> +The instruction can be seen as a special case of 
> @code{mask_len_gather_load@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and 
> operand 2 as step.
> +For each element index i load address is operand 1 + @var{i} * operand 2.
> +Similar to mask_len_load, the instruction loads at most (operand 4 + operand 
> 5) elements from memory.
> +Bit @var{i} of the mask (operand 3) is set if element @var{i} of the result 
> should

Element @var{i} of the mask

> +be loaded from memory and clear if element @var{i} of the result should be 
> zero.
> +Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
> +
>  @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
>  @item @samp{scatter_store@var{m}@var{n}}
>  Store a vector of mode @var{m} into several distinct memory locations.
> @@ -5131,6 +5145,19 @@ at most (operand 6 + operand 7) elements of (operand 
> 4) to memory.
>  Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
> stored.
>  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> 
> +@cindex @code{mask_len_strided_store@var{m}} instruction pattern
> +@item @samp{mask_len_strided_store@var{m}}
> +Store a vector of mode m into several distinct memory locations.
> +Operand 0 is a scalar base address and operand 1 is scalar stride of Pmode.
> +Operand 2 is the vector of values that should be stored, which is of mode 
> @var{m}.
> +operand 3 is mask operand, operand 4 is length operand and operand 5 is bias 
> operand.
> +The instruction can be seen as a special case of 
> @code{mask_len_scatter_store@var{m}@var{n}}
> +with an offset vector that is a @code{vec_series} with operand 1 as base and 
> operand 1 as step.
> +For each element index i store address is operand 0 + @var{i} * operand 1.
> +Similar to mask_len_store, the instruction stores at most (operand 4 + 
> operand 5) elements of mask (operand 3) to memory.
> +Bit @var{i} of the mask is set if element @var{i} of (operand 3) should be 
> stored.

Element @var{i}

Otherwise looks OK

> +Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
> +
>  @cindex @code{vec_set@var{m}} instruction pattern
>  @item @samp{vec_set@var{m}}
>  Set given field in the vector value.  Operand 0 is the vector to modify,
> 
> Thanks.
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-11-03 15:40
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; jeffreyalaw; richard.sandiford; Robin Dapp
> Subject: Re: Re: [PATCH V2] OPTABS/IFN: Add 
> mask_len_strided_load/mask_len_strided_store OPTABS/IFN
> On Fri, 3 Nov 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi, Richi.
> > 
> > The following is strided load/store doc:
> > 
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index fab2513105a..4f0821a291d 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5094,6 +5094,22 @@ Bit @var{i} of the mask is set if element @var{i} of 
> > the result should
> >  be loaded from memory and clear if element @var{i} of the result should be 
> > undefined.
> >  Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
> > 
> > +@cindex @code{mask_len_strided_load@var{m}} instruction pattern
> > +@item @samp{mask_len_strided_load@var{m}}
> > +Load several separate memory locations into a destination vector of mode 
> > @var{m}.
> > +Operand 0 is a destination vector of mode @var{m}.
> > +Operand 1 is a scalar base address and operand 2 is a scalar stride of 
> > Pmode.
> > +operand 3 indicates stride operand is signed or unsigned number and 
> > operand 4 is mask operand.
>  
> we don't need operand 3
>  
> > +operand 5 is length operand and operand 6 is bias operand.
> > +The instruction can be seen as a special case of 
> > @code{mask_len_gather_load@var{m}@var{n}}
> > +with an offset vector that is a @code{vec_series} with operand 1 as base 
> > and operand 2 as step.
> > +For each element index i load address is operand 1 + @var{i} * operand 2.
> > +operand 2 can be negative stride if op

Re: [PATCH] Fix PR ada/111909 On Darwin, determine filesystem case sensitivity at runtime

2023-11-03 Thread Arnaud Charlet
Hi Simon,

In addition to the non portable issues already mentioned, this change isn't OK 
also
for other reasons.

Basically this function is global and decides once and for all on the case
sensitivity, while case sensitivity is determined on a per-filesystem basis,
as you noted.

So without fundamentally changing the model, you can't decide dynamically for
the whole system. Making the choice based on the current directory is fairly
arbitrary, since the current directory isn't well defined at program start-up
and could be on pretty much any filesystem.

Note that the current setting on arm is actually for iOS, which we did support 
at AdaCore
at some point (and could revive in the future, who knows).

So it would be fine to refine the test to differentiate between macOS and
embedded iOS and co.; that would be a better change here.

> This change affects only Ada.
> 
> In gcc/ada/adaint.c(__gnat_get_file_names_case_sensitive), the
> assumption for __APPLE__ is that file names are case-insensitive
> unless __arm__ or __arm64__ are defined, in which case file names
> are declared case-sensitive.
> 
> The associated comment is
>   "By default, we suppose filesystems aren't case sensitive on
>   Windows and Darwin (but they are on arm-darwin)."
> 
> This means that on aarch64-apple-darwin, file names are declared
> case-sensitive, which is not normally the case (but users can set
> up case-sensitive volumes).
> 
> It's understood that GCC does not currently support iOS/tvOS/watchOS,
> so we assume macOS.
> 
> Bootstrapped on x86_64-apple-darwin with languages c,c++,ada and regression 
> tested (check-gnat).
> Also, tested with the example from PR ada/81114, extracted into 4 volumes 
> (APFS, APFS-case-sensitive,
> HFS, HFS-case-sensitive); the example code built successfully on the 
> case-sensitive volumes.
> Setting GNAT_FILE_NAME_CASE_SENSITIVE successfully overrode the choices made 
> by the
> new code.
> 
>  gcc/ada/Changelog:
> 
>  2023-10-29 Simon Wright 
> 
>  PR ada/111909
> 
>  * gcc/ada/adaint.c
>   (__gnat_get_file_names_case_sensitive): Remove the checks for
>   __arm__, __arm64__.
>   Split out the check for __APPLE__; remove the checks for __arm__,
>   __arm64__, and use getattrlist(2) to determine whether the current
>   working directory is on a case-sensitive filesystem.
> 
> Signed-off-by: Simon Wright 


[PATCH v1] LoongArch: Fix instruction name typo in lsx_vreplgr2vr_ template

2023-11-03 Thread Chenghui Pan
gcc/ChangeLog:

* config/loongarch/lsx.md: Fix instruction name typo in
lsx_vreplgr2vr_ template.
---
 gcc/config/loongarch/lsx.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index 4af32c8dfe1..55c7d79a030 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -1523,7 +1523,7 @@ (define_insn "lsx_vreplgr2vr_"
   "ISA_HAS_LSX"
 {
   if (which_alternative == 1)
-return "ldi.\t%w0,0";
+return "vldi.\t%w0,0";
 
   if (!TARGET_64BIT && (mode == V2DImode || mode == V2DFmode))
 return "#";
-- 
2.36.0



Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-11-03 Thread Robin Dapp
> Could you explain why a special expansion is needed?  (Sorry if you already
> have and I missed it, bit overloaded ATM.)  What does it do that is
> different from what expand_fn_using_insn would do?

All it does beyond that is shuffle the arguments: vcond_mask_len has the
mask as its third operand, similar to vcond_mask, while vec_cond has the
mask first.  I can swap them in the IFN already, but when not swapping we
will either be inconsistent with vec_cond or with vcond_mask.

Regards
 Robin


Re: [ARC PATCH] Improve DImode left shift by a single bit.

2023-11-03 Thread Claudiu Zissulescu Ianculescu
Missed this one.

Ok, please proceed with the commit.

Thank you for your contribution,
Claudiu

On Sat, Oct 28, 2023 at 4:05 PM Roger Sayle  wrote:
>
>
> This patch improves the code generated for X << 1 (and for X + X) when
> X is 64-bit DImode, using the same two instruction code sequence used
> for DImode addition.
>
> For the test case:
>
> long long foo(long long x) { return x << 1; }
>
> GCC -O2 currently generates the following code:
>
> foo:lsr r2,r0,31
> asl_s   r1,r1,1
> asl_s   r0,r0,1
> j_s.d   [blink]
> or_sr1,r1,r2
>
> and on CPU without a barrel shifter, i.e. -mcpu=em
>
> foo:add.f   0,r0,r0
> asl_s   r1,r1
> rlc r2,0
> asl_s   r0,r0
> j_s.d   [blink]
> or_sr1,r1,r2
>
> with this patch (both with and without a barrel shifter):
>
> foo:add.f   r0,r0,r0
> j_s.d   [blink]
> adc r1,r1,r1
>
> [For Jeff Law's benefit a similar optimization is also applicable to
> H8300H, that could also use a two instruction sequence (plus rts) but
> currently GCC generates 16 instructions (plus an rts) for foo above.]
>
> Tested with a cross-compiler to arc-linux hosted on x86_64,
> with no new (compile-only) regressions from make -k check.
> Ok for mainline if this passes Claudiu's nightly testing?
>
> 2023-10-28  Roger Sayle  
>
> gcc/ChangeLog
> * config/arc/arc.md (addsi3): Fix GNU-style code formatting.
> (adddi3): Change define_expand to generate an *adddi3.
> (*adddi3): New define_insn_and_split to lower DImode additions
> during the split1 pass (after combine and before reload).
> (ashldi3): New define_expand to (only) generate *ashldi3_cnt1
> for DImode left shifts by a single bit.
> (*ashldi3_cnt1): New define_insn_and_split to lower DImode
> left shifts by one bit to an *adddi3.
>
> gcc/testsuite/ChangeLog
> * gcc.target/arc/adddi3-1.c: New test case.
> * gcc.target/arc/ashldi3-1.c: Likewise.
>
>
> Thanks in advance,
> Roger
> --
>


Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-11-03 Thread Richard Sandiford
Robin Dapp  writes:
>> Could you explain why a special expansion is needed?  (Sorry if you already
>> have and I missed it, bit overloaded ATM.)  What does it do that is
>> different from what expand_fn_using_insn would do?
>
> All it does (in excess) is shuffle the arguments - vcond_mask_len has the
> mask as third operand similar to vcond_mask while vec_cond has the mask
> first.  I can swap them in the IFN already but when not swapping we will
> either be inconsistent with vec_cond or with vcond_mask.

Ah, OK.  IMO it's better to keep the optab operands the same as the IFN
operands, even if that makes things inconsistent with vcond_mask.
vcond_mask isn't really a good example to follow, since the operand
order is not only inconsistent with the IFN, it's also inconsistent
with the natural if_then_else order.

Thanks,
Richard



Re: libstdc++ patch RFA: Fix dl_iterate_phdr configury for libbacktrace

2023-11-03 Thread Jonathan Wakely
On Fri, 3 Nov 2023 at 03:04, Ian Lance Taylor  wrote:
>
> The libbacktrace sources, as used by libstdc++-v3, fail to correctly
> determine whether the system supports dl_iterate_phdr.  The issue is
> that the libbacktrace configure assumes that _GNU_SOURCE is defined
> during compilation, but the libstdc++-v3 configure does not do that.
> This configury failure is the cause of PR 112263.
>
> This patch fixes the problem.  OK for mainline?

Thanks for figuring this one out; I often forget about _GNU_SOURCE
because g++ defines it automatically.

OK for mainline and gcc-13 and gcc-12, thanks!

>
> Ian
>
> PR libbacktrace/112263
> * acinclude.m4: Set -D_GNU_SOURCE in BACKTRACE_CPPFLAGS and when
> grepping link.h for dl_iterate_phdr.
> * configure: Regenerate.



[PATCH] OPTAB: Add mask_len_strided_load/mask_len_strided_store optab

2023-11-03 Thread Juzhe-Zhong
gcc/ChangeLog:

* doc/md.texi: Add mask_len_strided_load/mask_len_strided_store optab.
* optabs.def (OPTAB_D): Ditto.

---
 gcc/doc/md.texi | 27 +++
 gcc/optabs.def  |  2 ++
 2 files changed, 29 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index fab2513105a..eee4fe156e4 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5094,6 +5094,20 @@ Bit @var{i} of the mask is set if element @var{i} of the 
result should
 be loaded from memory and clear if element @var{i} of the result should be 
undefined.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
 
+@cindex @code{mask_len_strided_load@var{m}} instruction pattern
+@item @samp{mask_len_strided_load@var{m}}
+Load several separate memory locations into a destination vector of mode 
@var{m}.
+Operand 0 is a destination vector of mode @var{m}.
+Operand 1 is a scalar base address and operand 2 is a scalar stride of Pmode.
+operand 3 is mask operand, operand 4 is length operand and operand 5 is bias 
operand.
+The instruction can be seen as a special case of 
@code{mask_len_gather_load@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and 
operand 2 as step.
+For each element index i load address is operand 1 + @var{i} * operand 2.
+Similar to mask_len_load, the instruction loads at most (operand 4 + operand 
5) elements from memory.
+Element @var{i} of the mask (operand 3) is set if element @var{i} of the 
result should
+be loaded from memory and clear if element @var{i} of the result should be 
zero.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5131,6 +5145,19 @@ at most (operand 6 + operand 7) elements of (operand 4) 
to memory.
 Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be 
stored.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
 
+@cindex @code{mask_len_strided_store@var{m}} instruction pattern
+@item @samp{mask_len_strided_store@var{m}}
+Store a vector of mode m into several distinct memory locations.
+Operand 0 is a scalar base address and operand 1 is scalar stride of Pmode.
+Operand 2 is the vector of values that should be stored, which is of mode 
@var{m}.
+operand 3 is mask operand, operand 4 is length operand and operand 5 is bias 
operand.
+The instruction can be seen as a special case of 
@code{mask_len_scatter_store@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and 
operand 1 as step.
+For each element index i store address is operand 0 + @var{i} * operand 1.
+Similar to mask_len_store, the instruction stores at most (operand 4 + operand 
5) elements of mask (operand 3) to memory.
+Element @var{i} of the mask is set if element @var{i} of (operand 3) should be 
stored.
+Mask elements @var{i} with @var{i} > (operand 4 + operand 5) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 2ccbe4197b7..9ae677f8f27 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -536,4 +536,6 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
 OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
 OPTAB_D (len_load_optab, "len_load_$a")
 OPTAB_D (len_store_optab, "len_store_$a")
+OPTAB_D (mask_len_strided_load_optab, "mask_len_strided_load_$a")
+OPTAB_D (mask_len_strided_store_optab, "mask_len_strided_store_$a")
 OPTAB_D (select_vl_optab, "select_vl$a")
-- 
2.36.3



Re: [PATCH] vect: allow using inbranch simdclones for masked loops

2023-11-03 Thread Andre Vieira (lists)




On 03/11/2023 07:31, Richard Biener wrote:



OK.

I do wonder about the gfortran testsuite adjustments though.

!GCC$ builtin (sin) attributes simd (inbranch)

   ! this should not be using simd clone
   y4 = sin(x8)

previously we wouldn't vectorize this as no notinbranch simd function
is available but now we do since we use the inbranch function for the
notinbranch call.  If that's desired then a better modification of
the test would be to expect vectorization, no?



I was in two minds about this. I interpreted the test to be about the 
fact that sin is overloaded in Fortran, given the name of the program 
'program test_overloaded_intrinsic', and thus I thought it was testing 
that it calls sinf when a real(4) is passed and sin for a real(8), and 
that simd clones aren't used for the wrong overload. That doesn't quite 
explain why the pragma for sin(double) was added in the first place, 
since it wouldn't have been necessary, but then again neither are the 
cos and cosf ones.


Happy to put it back in and also check with a regexp that the 'masked' 
simd clone is used.


Re: [PATCH] vect: allow using inbranch simdclones for masked loops

2023-11-03 Thread Richard Biener
On Fri, 3 Nov 2023, Andre Vieira (lists) wrote:

> 
> 
> On 03/11/2023 07:31, Richard Biener wrote:
> 
> > 
> > OK.
> > 
> > I do wonder about the gfortran testsuite adjustments though.
> > 
> > !GCC$ builtin (sin) attributes simd (inbranch)
> > 
> >! this should not be using simd clone
> >y4 = sin(x8)
> > 
> > previously we wouldn't vectorize this as no notinbranch simd function
> > is available but now we do since we use the inbranch function for the
> > notinbranch call.  If that's desired then a better modification of
> > the test would be to expect vectorization, no?
> > 
> 
> I was in two minds about this. I interpreted the test to be about the fact
> that sin is overloaded in fortran, given the name of the program 'program
> test_overloaded_intrinsic', and thus I thought it was testing that it calls
> sinf when a real(4) is passed and sin for a real(8) and that simdclones aren't
> used for the wrong overload. That doesn't quite explain why the pragma for
> sin(double) was added in the first place, that wouldn't have been necessary,
> but then again neither are the cos and cosf.
> 
> Happy to put it back in and test that the 'masked' simdclone is used using
> some regexp too.

Looking at when the test was added it was added when supporting
-fpre-include.  So it hardly was a test for our SIMD capabilities
but for having those OMP simd declarations.

Your original change is OK.

Richard.


Re: [PATCH] tree-optimization: Add register pressure heuristics

2023-11-03 Thread Ajit Agarwal
Hello Richard:

On 03/11/23 12:51 pm, Richard Biener wrote:
> On Thu, Nov 2, 2023 at 9:50 PM Ajit Agarwal  wrote:
>>
>> Hello All:
>>
>> Currently code sinking heuristics are based on profile data like
>> basic block count and sink frequency threshold. We have removed
>> such heuristics and added register pressure heuristics based on
>> live-in and live-out of early blocks and immediate dominator of
>> use blocks of the same loop nesting depth.
>>
>> Such heuristics reduces register pressure when code sinking is
>> done with same loop nesting depth.
>>
>> High register pressure region is the region where there are live-in of
>> early blocks that has been modified by the early block. If there are
>> modification of the variables in best block that are live-in in early
>> block that are live-out of best block.
> 
> ?!  Parse error.
> 

I didn't understand what you meant here. Please suggest.

>> Bootstrapped and regtested on powerpc64-linux-gnu.
> 
> What's the effect on code generation?
> 
> Note that live is a quadratic problem while sinking was not.  You
> are effectively making the pass unfit for -O1.
> 
> You are computing "liveness" on GIMPLE where within EBBs there
> isn't really any particular order of stmts, so it's kind of a garbage
> heuristic.  Likewise you are not computing the effect that moving
> a stmt has on liveness as far as I can see but you are just identifying
> some odd metrics (I don't really understand them) to rank blocks,
> not even taking the register file size into account.


If the live-out of best_bb is <= the live-out of early_bb, that shows
there are modifications in best_bb, so it is safer to move statements
into best_bb because there are fewer interfering live variables there.

With fewer live-outs in best_bb there is less chance of interfering
live ranges, so moving statements into best_bb will not increase
register pressure.

If the live-out of best_bb is greater than the live-out of early_bb,
moving statements into best_bb increases the chance of interfering
live ranges and hence increases register pressure.

This is how the heuristic is defined.


> 
> You are replacing the hot/cold heuristic.

> 
> IMHO the sinking pass is the totally wrong place to do anything
> about register pressure.  You are trying to solve a scheduling
> problem by just looking at a single stmt.
> 

bb->count values from profile.cc are prone to errors, as you have
mentioned in previous mails. The main bottleneck with code motion is
increased register pressure, as that leads to spills in later phases
of the compiler backend.

Calculation of best_bb based on the immediate dominator should
consider register pressure instead of hot/cold regions, as that
affects code generation.

If code motion increases register pressure even while moving into
colder regions, it won't improve code generation.

Hot/cold should be one criterion, but not the decisive criterion for
code motion.

We should consider register pressure with code motion rather than
hot/cold regions.

Thanks & Regards
Ajit

> Richard.
> 
>> Thanks & Regards


>> Ajit
>>
>> tree-optimization: Add register pressure heuristics
>>
>> Currently code sinking heuristics are based on profile data like
>> basic block count and sink frequency threshold. We have removed
>> such heuristics to add register pressure heuristics based on
>> live-in and live-out of early blocks and immediate dominator of
>> use blocks.
>>
>> High register pressure region is the region where there are live-in of
>> early blocks that has been modified by the early block. If there are
>> modification of the variables in best block that are live-in in early
>> block that are live-out of best block.
>>
>> 2023-11-03  Ajit Kumar Agarwal  
>>
>> gcc/ChangeLog:
>>
>> * tree-ssa-sink.cc (statement_sink_location): Add tree_live_info_p
>> as parameters.
>> (sink_code_in_bb): Ditto.
>> (select_best_block): Add register pressure heuristics to select
>> the best blocks in the immediate dominator for same loop nest depth.
>> (execute): Add live range analysis.
>> (additional_var_map): New function.
>> * tree-ssa-live.cc (set_var_live_on_entry): Add virtual operand
>> tests on ssa_names.
>> (verify_live_on_entry): Ditto.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
>> * gcc.dg/tree-ssa/ssa-sink-22.c: New test.
>> ---
>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 
>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 +
>>  gcc/tree-ssa-live.cc| 11 ++-
>>  gcc/tree-ssa-sink.cc| 93 ++---
>>  4 files changed, 104 insertions(+), 34 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>>
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c 
>> b/gcc/testsuite/gcc.dg/tree-ss

[RFC, RFA PATCH] i386: Handle multiple address register classes

2023-11-03 Thread Uros Bizjak
The patch generalizes address register class handling to allow multiple
address register classes.  For APX EGPR targets, some instructions can't be
encoded with REX2 prefix, so it is necessary to limit address register
class to avoid REX2 registers.  The same situation happens for instructions
with high registers, where the REX register can not be used in the address,
so the existing infrastructure can be adapted to also handle this case.

The patch is mostly a mechanical rename of "gpr32" attribute to "addr" and
introduces no functional changes, although it fixes a couple of inconsistent
attribute values in passing.

A follow-up patch will use the above infrastructure to limit address register
class to legacy registers for instructions with high registers.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_memory_address_use_extended_reg_class_p):
Rename to ...
(ix86_memory_address_reg_class): ... this.  Generalize address
register class handling to allow multiple address register classes.
Return maximal class for unrecognized instructions.  Improve comments.
(ix86_insn_base_reg_class): Rewrite to handle
multiple address register classes.
(ix86_regno_ok_for_insn_base_p): Ditto.
(ix86_insn_index_reg_class): Ditto.
* config/i386/i386.md: Rename "gpr32" attribute to "addr"
and substitute its values with "0" -> "rex", "1" -> "*".
(addr): New attribute to limit allowed address register set.
(gpr32): Remove.
* config/i386/mmx.md: Rename "gpr32" attribute to "addr"
and substitute its values with "0" -> "rex", "1" -> "*".
* config/i386/sse.md: Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Comments welcome.

Uros.
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 0f17f7d0258..e934b14145f 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -11357,93 +11357,110 @@ ix86_validate_address_register (rtx op)
   return NULL_RTX;
 }
 
-/* Return true if insn memory address can use any available reg
-   in BASE_REG_CLASS or INDEX_REG_CLASS, otherwise false.
-   For APX, some instruction can't be encoded with gpr32
-   which is BASE_REG_CLASS or INDEX_REG_CLASS, for that case
-   returns false.  */
-static bool
-ix86_memory_address_use_extended_reg_class_p (rtx_insn* insn)
+/* Classify which memory address registers insn can use.  */
+
+static enum attr_addr
+ix86_memory_address_reg_class (rtx_insn* insn)
 {
-  /* LRA will do some initialization with insn == NULL,
- return the maximum reg class for that.
- For other cases, real insn will be passed and checked.  */
-  bool ret = true;
+  /* LRA can do some initialization with NULL insn,
+ return maximum register class in this case.  */
+  enum attr_addr addr_rclass = ADDR_REX2;
+
   if (TARGET_APX_EGPR && insn)
 {
   if (asm_noperands (PATTERN (insn)) >= 0
  || GET_CODE (PATTERN (insn)) == ASM_INPUT)
-   return ix86_apx_inline_asm_use_gpr32;
+   return ix86_apx_inline_asm_use_gpr32 ? ADDR_REX2 : ADDR_REX;
 
+  /* Return maximum register class for unrecognized instructions.  */
   if (INSN_CODE (insn) < 0)
-   return false;
+   return addr_rclass;
 
-  /* Try recog the insn before calling get_attr_gpr32. Save
-the current recog_data first.  */
-  /* Also save which_alternative for current recog.  */
+  /* Try to recognize the insn before calling get_attr_addr.
+Save current recog_data and current alternative.  */
+  struct recog_data_d old_recog_data = recog_data;
+  int old_alternative = which_alternative;
 
-  struct recog_data_d recog_data_save = recog_data;
-  int which_alternative_saved = which_alternative;
-
-  /* Update the recog_data for alternative check. */
+  /* Update recog_data for processing of alternatives.  */
   if (recog_data.insn != insn)
extract_insn_cached (insn);
 
-  /* If alternative is not set, loop throught each alternative
-of insn and get gpr32 attr for all enabled alternatives.
-If any enabled alternatives has 0 value for gpr32, disallow
-gpr32 for addressing.  */
-  if (which_alternative_saved == -1)
+  /* If current alternative is not set, loop through enabled
+alternatives and get the most limited register class.  */
+  if (old_alternative == -1)
{
  alternative_mask enabled = get_enabled_alternatives (insn);
- bool curr_insn_gpr32 = false;
+
  for (int i = 0; i < recog_data.n_alternatives; i++)
{
  if (!TEST_BIT (enabled, i))
continue;
+
  which_alternative = i;
- curr_insn_gpr32 = get_attr_gpr32 (insn);
- if (!curr_insn_gpr32)
-   ret = false;
+ addr_rclass = MIN (addr_rclass, get_attr_addr (insn));
}
}
   else
{
- which_alternative = which_alternative_saved;
- ret = get_attr_gpr32 (in

[PATCH] tree-optimization/112310 - code hoisting undefined behavior

2023-11-03 Thread Richard Biener
The following avoids hoisting expressions that may invoke undefined
behavior and are not computed on all paths.  This is realized by
noting that we have to avoid materializing expressions as part
of hoisting that are not part of the set of expressions we have
found eligible for hoisting.  Instead of picking the expression
corresponding to the hoistable values from the first successor
we now keep a union of the expressions so that hoisting can pick
the expression that has its dependences fully hoistable.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/112310
* tree-ssa-pre.cc (do_hoist_insertion): Keep the union
of expressions, validate dependences are contained within
the hoistable set before hoisting.

* gcc.dg/torture/pr112310.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr112310.c | 36 +
 gcc/tree-ssa-pre.cc | 26 --
 2 files changed, 54 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr112310.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr112310.c b/gcc/testsuite/gcc.dg/torture/pr112310.c
new file mode 100644
index 000..daf2390734c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr112310.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-require-effective-target int32plus } */
+
+extern void abort (void);
+short a, b;
+static int c, d, e, f = -7;
+char g;
+int *h = &d;
+int **i = &h;
+int j;
+short(k)(int l) { return l >= 2 ? 0 : l; }
+int(m)(int l, int n) { return l < -2147483647 / n ? l : l * n; }
+void o() { &c; }
+int main()
+{
+  int p;
+  for (; g <= 3; g++)
+{
+  for (; c; c++)
+   ;
+  a = 2;
+  for (; a <= 7; a++) {
+ short *r = &b;
+ p = m(*h, 2022160547);
+ unsigned q = 2022160547;
+ e = p * q;
+ *r ^= e;
+ j = k(c + 3);
+ **i = 0;
+  }
+  *i = &f;
+}
+  if (b != -3189)
+abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc
index 07fb165b2a8..361806ac2c9 100644
--- a/gcc/tree-ssa-pre.cc
+++ b/gcc/tree-ssa-pre.cc
@@ -3635,10 +3635,9 @@ do_hoist_insertion (basic_block block)
 return false;
 
   /* We have multiple successors, compute ANTIC_OUT by taking the intersection
- of all of ANTIC_IN translating through PHI nodes.  Note we do not have to
- worry about iteration stability here so just use the expression set
- from the first set and prune that by sorted_array_from_bitmap_set.
- This is a simplification of what we do in compute_antic_aux.  */
+ of all of ANTIC_IN translating through PHI nodes.  Track the union
+ of the expression sets so we can pick a representative that is
+ fully generatable out of hoistable expressions.  */
   bitmap_set_t ANTIC_OUT = bitmap_set_new ();
   bool first = true;
   FOR_EACH_EDGE (e, ei, block->succs)
@@ -3653,10 +3652,15 @@ do_hoist_insertion (basic_block block)
  bitmap_set_t tmp = bitmap_set_new ();
  phi_translate_set (tmp, ANTIC_IN (e->dest), e);
  bitmap_and_into (&ANTIC_OUT->values, &tmp->values);
+ bitmap_ior_into (&ANTIC_OUT->expressions, &tmp->expressions);
  bitmap_set_free (tmp);
}
   else
-   bitmap_and_into (&ANTIC_OUT->values, &ANTIC_IN (e->dest)->values);
+   {
+ bitmap_and_into (&ANTIC_OUT->values, &ANTIC_IN (e->dest)->values);
+ bitmap_ior_into (&ANTIC_OUT->expressions,
+  &ANTIC_IN (e->dest)->expressions);
+   }
 }
 
   /* Compute the set of hoistable expressions from ANTIC_OUT.  First compute
@@ -3697,15 +3701,13 @@ do_hoist_insertion (basic_block block)
   return false;
 }
 
-  /* Hack hoitable_set in-place so we can use sorted_array_from_bitmap_set.  */
+  /* Hack hoistable_set in-place so we can use sorted_array_from_bitmap_set.  */
   bitmap_move (&hoistable_set.values, &availout_in_some);
   hoistable_set.expressions = ANTIC_OUT->expressions;
 
   /* Now finally construct the topological-ordered expression set.  */
   vec exprs = sorted_array_from_bitmap_set (&hoistable_set);
 
-  bitmap_clear (&hoistable_set.values);
-
   /* If there are candidate values for hoisting, insert expressions
  strategically to make the hoistable expressions fully redundant.  */
   pre_expr expr;
@@ -3737,6 +3739,13 @@ do_hoist_insertion (basic_block block)
  && FLOAT_TYPE_P (get_expr_type (expr)))
continue;
 
+  /* Only hoist if the full expression is available for hoisting.
+This avoids hoisting values that are not common and for
+example evaluate an expression that's not valid to evaluate
+unconditionally (PR112310).  */
+  if (!valid_in_sets (&hoistable_set, AVAIL_OUT (block), expr))
+   continue;
+
   /* OK, we should hoist this value.  Perform the transformation.  */
   pre_stats.hoist_insert++;
   if (dump_file && (dump_flags & TDF_DETAILS))

Re: [PATCH v4] c-family: Implement __has_feature and __has_extension [PR60512]

2023-11-03 Thread Alex Coplan
Hi Jason,

On 26/10/2023 17:06, Jason Merrill wrote:
> On 10/25/23 06:28, Alex Coplan wrote:
> > On 11/10/2023 14:31, Alex Coplan wrote:
> > > On 27/09/2023 15:27, Alex Coplan wrote:
> > > > Hi,
> > > > 
> > > > This is a v4 patch to address Jason's feedback here:
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630911.html
> > > > 
> > > > w.r.t. v3 it just removes a comment now that some uncertainty around
> > > > cxx_binary_literals has been resolved, and updates the documentation as
> > > > suggested to point to the Clang docs.
> > > > 
> > > > --
> 
> Incidentally, putting a 8< or >8 in the line of dashes lets git am
> --scissors prune the text above the line.
> 
> > > > This patch implements clang's __has_feature and __has_extension in GCC.
> > > > Currently the patch aims to implement all documented features (and some
> > > > undocumented ones) following the documentation at
> > > > https://clang.llvm.org/docs/LanguageExtensions.html with the exception
> > > > of the legacy features for C++ type traits.  These are omitted, since as
> > > > the clang documentation notes, __has_builtin is the correct "modern" way
> > > > to query for these (which GCC already implements).
> > > 
> > > Gentle ping on this:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631525.html
> > 
> > Ping^2
> 
> > +static const hf_feature_info has_feature_table[] =
> 
> You might use constexpr for these tables?

I'll give that a go, thanks.

> 
> OK either way, thanks!

Thanks a lot for the reviews.  Just to clarify, does this OK count for the C
front-end parts as well as the C++ and generic parts, or do we need a C
front-end maintainer to take a look at those bits?

Thanks,
Alex

> 
> Jason
> 


[PATCH] tree-optimization/112366 - remove assert for failed live lane code gen

2023-11-03 Thread Richard Biener
The following removes a bogus assert constraining the uses that
could appear when an SLP node built from scalar defs constrains
code generation in a way such that earlier uses of the vector CTOR
components fail to get vectorized.  We can't really constrain the
operation such a use appears in.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/112366
* tree-vect-loop.cc (vectorizable_live_operation): Remove
assert.
---
 gcc/tree-vect-loop.cc | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 3b28c826b3b..2a43176bcfd 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10778,7 +10778,7 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
|| !PURE_SLP_STMT (vect_stmt_to_vectorize (use_stmt_info
  {
/* ???  This can happen when the live lane ends up being
-  used in a vector construction code-generated by an
+  rooted in a vector construction code-generated by an
   external SLP node (and code-generation for that already
   happened).  See gcc.dg/vect/bb-slp-47.c.
   Doing this is what would happen if that vector CTOR
@@ -10791,11 +10791,6 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
&& !vect_stmt_dominates_stmt_p (SSA_NAME_DEF_STMT (new_tree),
use_stmt))
  {
-   enum tree_code code = gimple_assign_rhs_code (use_stmt);
-   gcc_checking_assert (code == SSA_NAME
-|| code == CONSTRUCTOR
-|| code == VIEW_CONVERT_EXPR
-|| CONVERT_EXPR_CODE_P (code));
if (dump_enabled_p ())
  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
   "Using original scalar computation for "
-- 
2.35.3


Skip a number of C++ test cases for '-fno-exceptions' testing (was: Support in the GCC(/C++) test suites for '-fno-exceptions')

2023-11-03 Thread Thomas Schwinge
Hi!

On 2023-09-08T15:30:50+0200, I wrote:
> GCC/C++ front end maintainers, please provide input on the overall
> approach here:

Jason verbally ACKed this at the GNU Tools Cauldron 2023.

> On 2023-06-15T17:15:54+0200, I wrote:
>> On 2023-06-06T20:31:21+0100, Jonathan Wakely  wrote:
>>> On Tue, 6 Jun 2023 at 20:14, Thomas Schwinge  
>>> wrote:
 This issue comes up in context of me working on C++ support for GCN and
 nvptx target.  Those targets shall default to '-fno-exceptions' -- or,
 "in other words", '-fexceptions' is not supported.  (Details omitted
 here.)

 It did seem clear to me that with such a configuration it'll be hard to
 get clean test results.  Then I found code in
 'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-prune':

 # If exceptions are disabled, mark tests expecting exceptions to be enabled
 # as unsupported.
 if { ![check_effective_target_exceptions_enabled] } {
 if [regexp "(^|\n)\[^\n\]*: error: exception handling disabled" $text] {
 return "::unsupported::exception handling disabled"
 }

 ..., which, in a way, sounds as if the test suite generally is meant to
 produce useful results for '-fno-exceptions', nice surprise!

 Running x86_64-pc-linux-gnu (not yet GCN, nvptx) 'make check' with:

 RUNTESTFLAGS='--target_board=unix/-fno-exceptions\{,-m32\}'

 ..., I find that indeed this does work for a lot of test cases, where we
 then get (random example):

  PASS: g++.dg/coroutines/pr99710.C  (test for errors, line 23)
 -PASS: g++.dg/coroutines/pr99710.C (test for excess errors)
 +UNSUPPORTED: g++.dg/coroutines/pr99710.C: exception handling disabled

 ..., due to:

  [...]/g++.dg/coroutines/pr99710.C: In function 'task my_coro()':
 +[...]/g++.dg/coroutines/pr99710.C:18:10: error: exception handling disabled, use '-fexceptions' to enable
 [...]/g++.dg/coroutines/pr99710.C:23:7: error: await expressions are not permitted in handlers
  compiler exited with status 1

 But, we're nowhere near clean test results: PASS -> FAIL as well as
 XFAIL -> XPASS regressions, due to 'error: exception handling disabled'
 precluding other diagnostics seems to be one major issue.

 Is there interest in me producing the obvious (?) changes to those test
 cases, such that compiler g++ as well as target library libstdc++ test
 results are reasonably clean?  (If you think that's all "wasted effort",
 then I suppose I'll just locally ignore any FAILs/XPASSes/UNRESOLVEDs
 that appear in combination with
 'UNSUPPORTED: [...]: exception handling disabled'.)
>>>
>>> I would welcome that for libstdc++. [...]
>
>> Per your and my changes a few days ago, we've already got libstdc++
>> covered, [...]
>
>> Not having heard anything contrary regarding the compiler side of things,
>> I've now been working on that, see below.
>>
>>> We already have a handful of tests that use #if __cpp_exceptions to make
>>> those parts conditional on exception support.
>>
>> Yes, that's an option not for all but certainly for some test cases.
>> (I'm not looking into that now -- but this may in fact be a good
>> beginner-level task, will add to ).

Done: 
"Improve test suite coverage for '-fno-exceptions' configurations".

 Otherwise, a number of test cases need DejaGnu directives
 conditionalized on 'target exceptions_enabled'.
>>
>> Before I get to such things, even simpler: OK to push the attached
>> "Skip a number of C++ test cases for '-fno-exceptions' testing"?
>
> I've re-attached my patch from a few months ago:
> "Skip a number of C++ test cases for '-fno-exceptions' testing".
> (I'd obviously re-check for current master branch before 'git push'.)

Pushed to master branch commit fe65f4a2a39b389a4240dc59d856b082c0b5ad96
"Skip a number of C++ test cases for '-fno-exceptions' testing", see
attached.


Grüße
 Thomas


> If there is interest in this at all, I'd then later complete and submit
> my more or less WIP patches for the slightly more involved test case
> scenarios.
>
>
> Grüße
>  Thomas
>
>
 (Or,
 'error: exception handling disabled' made a "really late" diagnostic, so
 that it doesn't preclude other diagnostics?  I'll have a look.  Well,
 maybe something like: in fact do not default to '-fno-exceptions', but
 instead emit 'error: exception handling disabled' only if in a "really
 late" pass we run into exceptions-related constructs that we cannot
 support.  That'd also avoid PASS -> UNSUPPORTED "regressions" when
 exception handling in fact gets optimized away, for example.  I like that
 idea, conceptually -- but is it feasible to implement..?)
>>>
>>> IMHO just [...] using [an effective target keyword] in test
>>> selectors 

Skip a number of 'g++.dg/compat/' test cases for '-fno-exceptions' testing (was: Skip a number of C++ "split files" test cases for '-fno-exceptions' testing (was: Skip a number of C++ test cases for '

2023-11-03 Thread Thomas Schwinge
Hi!

On 2023-06-15T17:47:57+0200, I wrote:
> [...], OK to push the attached
> "Skip a number of C++ "split files" test cases for '-fno-exceptions' testing"?

The 'g++.dg/compat/' parts of this pushed to master branch in
commit e5919951b8cb0dc8af5b80dc747416fb60a9835b
"Skip a number of 'g++.dg/compat/' test cases for '-fno-exceptions' testing",
see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From e5919951b8cb0dc8af5b80dc747416fb60a9835b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 7 Jun 2023 16:11:11 +0200
Subject: [PATCH] Skip a number of 'g++.dg/compat/' test cases for
 '-fno-exceptions' testing

Running 'make check' with: 'RUNTESTFLAGS=--target_board=unix/-fno-exceptions',
'error: exception handling disabled' is triggered for C++ 'throw' etc. usage,
and per 'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-prune':

# If exceptions are disabled, mark tests expecting exceptions to be enabled
# as unsupported.
if { ![check_effective_target_exceptions_enabled] } {
if [regexp "(^|\n)\[^\n\]*: error: exception handling disabled" $text] {
  return "::unsupported::exception handling disabled"
}

..., which generally means:

-PASS: [...] (test for excess errors)
+UNSUPPORTED: [...]: exception handling disabled

However, this doesn't work for "split files" test cases.  For example:

PASS: g++.dg/compat/eh/ctor1 cp_compat_main_tst.o compile
[-PASS:-]{+UNSUPPORTED:+} g++.dg/compat/eh/ctor1 cp_compat_x_tst.o [-compile-]{+compile: exception handling disabled+}
[-PASS:-]{+UNSUPPORTED:+} g++.dg/compat/eh/ctor1 cp_compat_y_tst.o [-compile-]{+compile: exception handling disabled+}
[-PASS:-]{+UNRESOLVED:+} g++.dg/compat/eh/ctor1 cp_compat_x_tst.o-cp_compat_y_tst.o link
[-PASS:-]{+UNRESOLVED:+} g++.dg/compat/eh/ctor1 cp_compat_x_tst.o-cp_compat_y_tst.o execute

The "compile"/"assemble" tests (either continue to work, or) result in the
expected 'UNSUPPORTED: [...] compile: exception handling disabled', but
dependent "link" and "execute" tests then turn UNRESOLVED.

Specify 'dg-require-effective-target exceptions_enabled' for those test cases.

	gcc/testsuite/
	* g++.dg/compat/eh/ctor1_main.C: Specify
	'dg-require-effective-target exceptions_enabled'.
	* g++.dg/compat/eh/ctor2_main.C: Likewise.
	* g++.dg/compat/eh/dtor1_main.C: Likewise.
	* g++.dg/compat/eh/filter1_main.C: Likewise.
	* g++.dg/compat/eh/filter2_main.C: Likewise.
	* g++.dg/compat/eh/new1_main.C: Likewise.
	* g++.dg/compat/eh/nrv1_main.C: Likewise.
	* g++.dg/compat/eh/spec3_main.C: Likewise.
	* g++.dg/compat/eh/template1_main.C: Likewise.
	* g++.dg/compat/eh/unexpected1_main.C: Likewise.
	* g++.dg/compat/init/array5_main.C: Likewise.
---
 gcc/testsuite/g++.dg/compat/eh/ctor1_main.C   | 2 ++
 gcc/testsuite/g++.dg/compat/eh/ctor2_main.C   | 2 ++
 gcc/testsuite/g++.dg/compat/eh/dtor1_main.C   | 2 ++
 gcc/testsuite/g++.dg/compat/eh/filter1_main.C | 2 ++
 gcc/testsuite/g++.dg/compat/eh/filter2_main.C | 2 ++
 gcc/testsuite/g++.dg/compat/eh/new1_main.C| 2 ++
 gcc/testsuite/g++.dg/compat/eh/nrv1_main.C| 2 ++
 gcc/testsuite/g++.dg/compat/eh/spec3_main.C   | 2 ++
 gcc/testsuite/g++.dg/compat/eh/template1_main.C   | 2 ++
 gcc/testsuite/g++.dg/compat/eh/unexpected1_main.C | 2 ++
 gcc/testsuite/g++.dg/compat/init/array5_main.C| 2 ++
 11 files changed, 22 insertions(+)

diff --git a/gcc/testsuite/g++.dg/compat/eh/ctor1_main.C b/gcc/testsuite/g++.dg/compat/eh/ctor1_main.C
index a188b46da86..1598d9db0f8 100644
--- a/gcc/testsuite/g++.dg/compat/eh/ctor1_main.C
+++ b/gcc/testsuite/g++.dg/compat/eh/ctor1_main.C
@@ -4,6 +4,8 @@
 
 // Split into pieces for binary compatibility testing October 2002
 
+// Explicit { dg-require-effective-target exceptions_enabled } so that dependent tests don't turn UNRESOLVED for '-fno-exceptions'.
+
 extern void ctor1_x (void);
 
 int
diff --git a/gcc/testsuite/g++.dg/compat/eh/ctor2_main.C b/gcc/testsuite/g++.dg/compat/eh/ctor2_main.C
index 58836e26eba..f79c8a2e756 100644
--- a/gcc/testsuite/g++.dg/compat/eh/ctor2_main.C
+++ b/gcc/testsuite/g++.dg/compat/eh/ctor2_main.C
@@ -4,6 +4,8 @@
 
 // Split into pieces for binary compatibility testing October 2002
 
+// Explicit { dg-require-effective-target exceptions_enabled } so that dependent tests don't turn UNRESOLVED for '-fno-exceptions'.
+
 extern void ctor2_x (void);
 
 int main ()
diff --git a/gcc/testsuite/g++.dg/compat/eh/dtor1_main.C b/gcc/testsuite/g++.dg/compat/eh/dtor1_main.C
index 962fa64274b..1550d7403c6 100644
--- a/gcc/testsuite/g++.dg/compat/eh/dtor1_main.C
+++ b/gcc/testsuite/g++.dg/compat/eh/dtor1_main.C
@@ -5,6 +5,8 @@
 
 // Split into pieces for binary compatibility testing October 2002
 
+// Explicit { dg-require-effective-target ex

Skip a number of 'g++.dg/lto/' test cases for '-fno-exceptions' testing (was: Skip a number of C++ "split files" test cases for '-fno-exceptions' testing (was: Skip a number of C++ test cases for '-fn

2023-11-03 Thread Thomas Schwinge
Hi!

On 2023-06-15T17:47:57+0200, I wrote:
> [...], OK to push the attached
> "Skip a number of C++ "split files" test cases for '-fno-exceptions' testing"?

The 'g++.dg/lto/' parts of this pushed to master branch in
commit 94782ed70796427e6f4b15b1c2df91cd7bef28e8
"Skip a number of 'g++.dg/lto/' test cases for '-fno-exceptions' testing",
see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 94782ed70796427e6f4b15b1c2df91cd7bef28e8 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 7 Jun 2023 16:11:11 +0200
Subject: [PATCH] Skip a number of 'g++.dg/lto/' test cases for
 '-fno-exceptions' testing

Running 'make check' with: 'RUNTESTFLAGS=--target_board=unix/-fno-exceptions',
'error: exception handling disabled' is triggered for C++ 'throw' etc. usage,
and per 'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-prune':

# If exceptions are disabled, mark tests expecting exceptions to be enabled
# as unsupported.
if { ![check_effective_target_exceptions_enabled] } {
if [regexp "(^|\n)\[^\n\]*: error: exception handling disabled" $text] {
  return "::unsupported::exception handling disabled"
}

..., which generally means:

-PASS: [...] (test for excess errors)
+UNSUPPORTED: [...]: exception handling disabled

However, this doesn't work for "split files" test cases.  For example:

[-PASS:-]{+UNSUPPORTED:+} g++.dg/lto/20081109-1 cp_lto_20081109-1_0.o [-assemble, -fPIC -flto -flto-partition=1to1-]{+assemble: exception handling disabled+}
[-PASS:-]{+UNRESOLVED:+} g++.dg/lto/20081109-1 cp_lto_20081109-1_0.o-cp_lto_20081109-1_0.o [-link,-]{+link+} -fPIC -flto -flto-partition=1to1
{+UNRESOLVED: g++.dg/lto/20081109-1 cp_lto_20081109-1_0.o-cp_lto_20081109-1_0.o execute -fPIC -flto -flto-partition=1to1+}

The "compile"/"assemble" tests (either continue to work, or) result in the
expected 'UNSUPPORTED: [...] compile: exception handling disabled', but
dependent "link" and "execute" tests then turn UNRESOLVED.

Specify 'dg-require-effective-target exceptions_enabled' for those test cases.

	gcc/testsuite/
	* g++.dg/lto/20081109-1_0.C: Specify
	'dg-require-effective-target exceptions_enabled'.
	* g++.dg/lto/20081109_0.C: Likewise.
	* g++.dg/lto/20091026-1_0.C: Likewise.
	* g++.dg/lto/pr87906_0.C: Likewise.
	* g++.dg/lto/pr88046_0.C: Likewise.
---
 gcc/testsuite/g++.dg/lto/20081109-1_0.C | 1 +
 gcc/testsuite/g++.dg/lto/20081109_0.C   | 2 ++
 gcc/testsuite/g++.dg/lto/20091026-1_0.C | 1 +
 gcc/testsuite/g++.dg/lto/pr87906_0.C| 1 +
 gcc/testsuite/g++.dg/lto/pr88046_0.C| 1 +
 5 files changed, 6 insertions(+)

diff --git a/gcc/testsuite/g++.dg/lto/20081109-1_0.C b/gcc/testsuite/g++.dg/lto/20081109-1_0.C
index db0ba367fe8..7dc315b39ed 100644
--- a/gcc/testsuite/g++.dg/lto/20081109-1_0.C
+++ b/gcc/testsuite/g++.dg/lto/20081109-1_0.C
@@ -1,4 +1,5 @@
 // { dg-lto-do link }
+// Explicit { dg-require-effective-target exceptions_enabled } so that dependent tests don't turn UNRESOLVED for '-fno-exceptions'.
 // { dg-require-effective-target fpic }
 // { dg-lto-options {{-fPIC -flto -flto-partition=1to1}} }
 // { dg-extra-ld-options "-fPIC -flto -flto-partition=1to1 -r -fno-exceptions -flinker-output=nolto-rel" }
diff --git a/gcc/testsuite/g++.dg/lto/20081109_0.C b/gcc/testsuite/g++.dg/lto/20081109_0.C
index 93cfc67fff2..4746e1c7c46 100644
--- a/gcc/testsuite/g++.dg/lto/20081109_0.C
+++ b/gcc/testsuite/g++.dg/lto/20081109_0.C
@@ -1,3 +1,5 @@
+// Explicit { dg-require-effective-target exceptions_enabled } so that dependent tests don't turn UNRESOLVED for '-fno-exceptions'.
+
 extern "C" { void abort (void);}
 int foo (int);
 
diff --git a/gcc/testsuite/g++.dg/lto/20091026-1_0.C b/gcc/testsuite/g++.dg/lto/20091026-1_0.C
index 06eff292cb6..ca0729c52f5 100644
--- a/gcc/testsuite/g++.dg/lto/20091026-1_0.C
+++ b/gcc/testsuite/g++.dg/lto/20091026-1_0.C
@@ -1,4 +1,5 @@
 // { dg-lto-do link }
+// Explicit { dg-require-effective-target exceptions_enabled } so that dependent tests don't turn UNRESOLVED for '-fno-exceptions'.
 // { dg-extra-ld-options "-r -nostdlib -flinker-output=nolto-rel" }
 
 #include "20091026-1_a.h"
diff --git a/gcc/testsuite/g++.dg/lto/pr87906_0.C b/gcc/testsuite/g++.dg/lto/pr87906_0.C
index 6a04cd5c6f0..623c29ca007 100644
--- a/gcc/testsuite/g++.dg/lto/pr87906_0.C
+++ b/gcc/testsuite/g++.dg/lto/pr87906_0.C
@@ -1,4 +1,5 @@
 // { dg-lto-do link }
+// Explicit { dg-require-effective-target exceptions_enabled } so that dependent tests don't turn UNRESOLVED for '-fno-exceptions'.
 // { dg-require-effective-target fpic }
 // { dg-require-effective-target shared }
 // { dg-lto-options { { -O -fPIC -flto } } }
diff --git a/gcc/testsuite/g++.dg/lto/pr88046_0.C b/gcc/testsuite/g++.dg/lto/pr88046_0.C
index 734ce86e9b8..99224b5011f 100644
-

Skip a number of 'g++.dg/tree-prof/' test cases for '-fno-exceptions' testing (was: Skip a number of C++ test cases for '-fno-exceptions' testing (was: Support in the GCC(/C++) test suites for '-fno-e

2023-11-03 Thread Thomas Schwinge
Hi!

On 2023-06-15T18:04:04+0200, I wrote:
> [...], OK to push the attached
> "Skip a number of C++ 'g++.dg/tree-prof/' test cases for '-fno-exceptions' 
> testing"?

Pushed to master branch commit 3881d010dca9b5db5301f28e4a1e3a8e4bc40faa
"Skip a number of 'g++.dg/tree-prof/' test cases for '-fno-exceptions' testing",
see attached.


Grüße
 Thomas


From 3881d010dca9b5db5301f28e4a1e3a8e4bc40faa Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 14 Jun 2023 22:39:01 +0200
Subject: [PATCH] Skip a number of 'g++.dg/tree-prof/' test cases for
 '-fno-exceptions' testing

Running 'make check' with: 'RUNTESTFLAGS=--target_board=unix/-fno-exceptions',
'error: exception handling disabled' is triggered for C++ 'throw' etc. usage,
and per 'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-prune':

# If exceptions are disabled, mark tests expecting exceptions to be enabled
# as unsupported.
if { ![check_effective_target_exceptions_enabled] } {
if [regexp "(^|\n)\[^\n\]*: error: exception handling disabled" $text] {
  return "::unsupported::exception handling disabled"
}

..., which generally means:

-PASS: [...] (test for excess errors)
+UNSUPPORTED: [...]: exception handling disabled

However, this doesn't work for 'g++.dg/tree-prof/' test cases.  For example:

[-PASS:-]{+UNSUPPORTED:+} g++.dg/tree-prof/indir-call-prof-2.C [-compilation,  -fprofile-generate -D_PROFILE_GENERATE-]{+compilation: exception handling disabled+}
[-PASS:-]{+UNRESOLVED:+} g++.dg/tree-prof/indir-call-prof-2.C execution,-fprofile-generate -D_PROFILE_GENERATE
[-PASS:-]{+UNRESOLVED:+} g++.dg/tree-prof/indir-call-prof-2.C compilation,  -fprofile-use -D_PROFILE_USE
[-PASS:-]{+UNRESOLVED:+} g++.dg/tree-prof/indir-call-prof-2.C execution,-fprofile-use -D_PROFILE_USE

Dependent tests turn UNRESOLVED if the first "compilation" runs into the
expected 'UNSUPPORTED: [...] compile: exception handling disabled'.

Specify 'dg-require-effective-target exceptions_enabled' for those test cases.

	gcc/testsuite/
	* g++.dg/tree-prof/indir-call-prof-2.C: Specify
	'dg-require-effective-target exceptions_enabled'.
	* g++.dg/tree-prof/partition1.C: Likewise.
	* g++.dg/tree-prof/partition2.C: Likewise.
	* g++.dg/tree-prof/partition3.C: Likewise.
	* g++.dg/tree-prof/pr51719.C: Likewise.
	* g++.dg/tree-prof/pr57451.C: Likewise.
	* g++.dg/tree-prof/pr59255.C: Likewise.
---
 gcc/testsuite/g++.dg/tree-prof/indir-call-prof-2.C | 1 +
 gcc/testsuite/g++.dg/tree-prof/partition1.C| 1 +
 gcc/testsuite/g++.dg/tree-prof/partition2.C| 1 +
 gcc/testsuite/g++.dg/tree-prof/partition3.C| 1 +
 gcc/testsuite/g++.dg/tree-prof/pr51719.C   | 1 +
 gcc/testsuite/g++.dg/tree-prof/pr57451.C   | 1 +
 gcc/testsuite/g++.dg/tree-prof/pr59255.C   | 1 +
 7 files changed, 7 insertions(+)

diff --git a/gcc/testsuite/g++.dg/tree-prof/indir-call-prof-2.C b/gcc/testsuite/g++.dg/tree-prof/indir-call-prof-2.C
index e20cc64d373..5b6f172b025 100644
--- a/gcc/testsuite/g++.dg/tree-prof/indir-call-prof-2.C
+++ b/gcc/testsuite/g++.dg/tree-prof/indir-call-prof-2.C
@@ -1,3 +1,4 @@
+/* Explicit { dg-require-effective-target exceptions_enabled } so that dependent tests don't turn UNRESOLVED for '-fno-exceptions'.  */
 /* { dg-options "-O" } */
 
 int foo1(void) { return 0; }
diff --git a/gcc/testsuite/g++.dg/tree-prof/partition1.C b/gcc/testsuite/g++.dg/tree-prof/partition1.C
index d0dcbc4524b..8dd64aa27a5 100644
--- a/gcc/testsuite/g++.dg/tree-prof/partition1.C
+++ b/gcc/testsuite/g++.dg/tree-prof/partition1.C
@@ -1,3 +1,4 @@
+/* Explicit { dg-require-effective-target exceptions_enabled } so that dependent tests don't turn UNRESOLVED for '-fno-exceptions'.  */
 /* { dg-require-effective-target freorder } */
 /* { dg-options "-O2 -freorder-blocks-and-partition" } */
 
diff --git a/gcc/testsuite/g++.dg/tree-prof/partition2.C b/gcc/testsuite/g++.dg/tree-prof/partition2.C
index 0bc50fae98a..580d0e06c00 100644
--- a/gcc/testsuite/g++.dg/tree-prof/partition2.C
+++ b/gcc/testsuite/g++.dg/tree-prof/partition2.C
@@ -1,4 +1,5 @@
 // PR middle-end/45458
+// Explicit { dg-require-effective-target exceptions_enabled } so that dependent tests don't turn UNRESOLVED for '-fno-exceptions'.
 // { dg-require-effective-target freorder }
 // { dg-options "-O2 -fnon-call-exceptions -freorder-blocks-and-partition" }
 
diff --git a/gcc/testsuite/g++.dg/tree-prof/partition3.C b/gcc/testsuite/g++.dg/tree-prof/partition3.C
index c62174aa4d3..6cd51cc157e 100644
--- a/gcc/testsuite/g++.dg/tree-prof/partition3.C
+++ b/gcc/testsuite/g++.dg/tree-prof/partition3.C
@@ -1,4 +1,5 @@
 // PR middle-end/45566
+// Explicit { dg-require-effective-target exceptions_enabled } so that depend

Re: Skip a number of C++ test cases for '-fno-exceptions' testing (was: Support in the GCC(/C++) test suites for '-fno-exceptions')

2023-11-03 Thread Jakub Jelinek
On Fri, Nov 03, 2023 at 12:03:06PM +0100, Thomas Schwinge wrote:
> --- a/gcc/testsuite/g++.dg/cpp0x/catch1.C
> +++ b/gcc/testsuite/g++.dg/cpp0x/catch1.C
> @@ -1,5 +1,6 @@
>  // PR c++/53371
>  // { dg-do compile { target c++11 } }
> +// Explicit { dg-require-effective-target exceptions_enabled } to avoid 
> verify compiler messages FAILs for '-fno-exceptions'.

Ugh, this is just too ugly.
Please don't mix explanation comment with a directive, one line should be
just a directive and you can add explanation in a line above it.

Jakub



RE: [tree-optimization/111721 V2] VECT: Support SLP for MASK_LEN_GATHER_LOAD with dummy mask

2023-11-03 Thread Li, Pan2
Committed as passed the regression test of aarch64, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Friday, November 3, 2023 3:36 PM
To: Juzhe-Zhong 
Cc: gcc-patches@gcc.gnu.org; richard.sandif...@arm.com
Subject: Re: [tree-optimization/111721 V2] VECT: Support SLP for 
MASK_LEN_GATHER_LOAD with dummy mask

On Fri, 3 Nov 2023, Juzhe-Zhong wrote:

> This patch fixes following FAILs for RVV:
> FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP 
> stmts"
> 
> Bootstrap on X86 and regtest passed.
> 
> Ok for trunk ?

OK.  We can walk back if problems with SVE appear.

Thanks,
Richard.

> PR tree-optimization/111721
> 
> gcc/ChangeLog:
> 
> * tree-vect-slp.cc (vect_get_and_check_slp_defs): Support SLP for 
> dummy mask -1.
> * tree-vect-stmts.cc (vectorizable_load): Ditto.
> 
> ---
>  gcc/tree-vect-slp.cc   | 5 ++---
>  gcc/tree-vect-stmts.cc | 5 +++--
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 43d742e3c92..6b8a7b628b6 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -759,9 +759,8 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned 
> char swap,
> if ((dt == vect_constant_def
>  || dt == vect_external_def)
> && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
> -   && (TREE_CODE (type) == BOOLEAN_TYPE
> -   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
> -   type)))
> +   && TREE_CODE (type) != BOOLEAN_TYPE
> +   && !can_duplicate_and_interleave_p (vinfo, stmts.length (), type))
>   {
> if (dump_enabled_p ())
>   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 6ce4868d3e1..8c92bd5d931 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -9825,6 +9825,7 @@ vectorizable_load (vec_info *vinfo,
>  
>tree mask = NULL_TREE, mask_vectype = NULL_TREE;
>int mask_index = -1;
> +  slp_tree slp_op = NULL;
>if (gassign *assign = dyn_cast  (stmt_info->stmt))
>  {
>scalar_dest = gimple_assign_lhs (assign);
> @@ -9861,7 +9862,7 @@ vectorizable_load (vec_info *vinfo,
>   mask_index = vect_slp_child_index_for_operand (call, mask_index);
>if (mask_index >= 0
> && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
> -   &mask, NULL, &mask_dt, &mask_vectype))
> +   &mask, &slp_op, &mask_dt, &mask_vectype))
>   return false;
>  }
>  
> @@ -10046,7 +10047,7 @@ vectorizable_load (vec_info *vinfo,
>  {
>if (slp_node
> && mask
> -   && !vect_maybe_update_slp_op_vectype (SLP_TREE_CHILDREN (slp_node)[0],
> +   && !vect_maybe_update_slp_op_vectype (slp_op,
>   mask_vectype))
>   {
> if (dump_enabled_p ())
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[pushed] aarch64: Remove unnecessary can_create_pseudo_p condition

2023-11-03 Thread Richard Sandiford
This patch removes a can_create_pseudo_p condition from
*cmov_uxtw_insn_insv, bringing it in line with *cmov_insn_insv.
The constraints correctly describe the requirements.

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
* config/aarch64/aarch64.md (*cmov_uxtw_insn_insv): Remove
can_create_pseudo_p condition.
---
 gcc/config/aarch64/aarch64.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5bb8c772be8..bcf4bc8ccd4 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4447,7 +4447,7 @@ (define_insn_and_split "*cmov_uxtw_insn_insv"
   (match_operator:SI 1 "aarch64_comparison_operator"
[(match_operand 2 "cc_register" "") (const_int 0)]))
  (match_operand:SI 3 "general_operand" "r"]
-  "can_create_pseudo_p ()"
+  ""
   "#"
   "&& true"
   [(set (match_dup 0)
-- 
2.25.1



Re: [1/3] Add support for target_version attribute

2023-11-03 Thread Andrew Carlotti
On Thu, Oct 26, 2023 at 07:41:09PM +0100, Richard Sandiford wrote:
> Andrew Carlotti  writes:
> > This patch adds support for the "target_version" attribute to the middle
> > end and the C++ frontend, which will be used to implement function
> > multiversioning in the aarch64 backend.
> >
> > Note that C++ is currently the only frontend which supports
> > multiversioning using the "target" attribute, whereas the
> > "target_clones" attribute is additionally supported in C, D and Ada.
> > Support for the target_version attribute will be extended to C at a
> > later date.
> >
> > Targets that currently use the "target" attribute for function
> > multiversioning (i.e. i386 and rs6000) are not affected by this patch.
> >
> >
> > I could have implemented the target hooks slightly differently, by reusing 
> > the
> > valid_attribute_p hook and adding attribute name checks to each backend
> > implementation (c.f. the aarch64 implementation in patch 2/3).  Would this 
> > be
> > preferable?
> 
> Having as much as possible in target-independent code seems better
> to me FWIW.  On that basis:
> 
> >
> > Otherwise, is this ok for master?
> >
> >
> > gcc/c-family/ChangeLog:
> >
> > * c-attribs.cc (handle_target_version_attribute): New.
> > (c_common_attribute_table): Add target_version.
> > (handle_target_clones_attribute): Add conflict with
> > target_version attribute.
> >
> > gcc/ChangeLog:
> >
> > * attribs.cc (is_function_default_version): Update comment to
> > specify incompatibility with target_version attributes.
> > * cgraphclones.cc (cgraph_node::create_version_clone_with_body):
> > Call valid_version_attribute_p for target_version attributes.
> > * target.def (valid_version_attribute_p): New hook.
> > (expanded_clones_attribute): New hook.
> > * doc/tm.texi.in: Add new hooks.
> > * doc/tm.texi: Regenerate.
> > * multiple_target.cc (create_dispatcher_calls): Remove redundant
> > is_function_default_version check.
> > (expand_target_clones): Use target hook for attribute name.
> > * targhooks.cc (default_target_option_valid_version_attribute_p):
> > New.
> > * targhooks.h (default_target_option_valid_version_attribute_p):
> > New.
> > * tree.h (DECL_FUNCTION_VERSIONED): Update comment to include
> > target_version attributes.
> >
> > gcc/cp/ChangeLog:
> >
> > * decl2.cc (check_classfn): Update comment to include
> > target_version attributes.
> >
> >
> > diff --git a/gcc/attribs.cc b/gcc/attribs.cc
> > index 
> > b1300018d1e8ed8e02ded1ea721dc192a6d32a49..a3c4a81e8582ea4fd06b9518bf51fad7c998ddd6
> >  100644
> > --- a/gcc/attribs.cc
> > +++ b/gcc/attribs.cc
> > @@ -1233,8 +1233,9 @@ make_dispatcher_decl (const tree decl)
> >return func_decl;  
> >  }
> >  
> > -/* Returns true if decl is multi-versioned and DECL is the default 
> > function,
> > -   that is it is not tagged with target specific optimization.  */
> > +/* Returns true if DECL is multi-versioned using the target attribute, and 
> > this
> > +   is the default version.  This function can only be used for targets 
> > that do
> > +   not support the "target_version" attribute.  */
> >  
> >  bool
> >  is_function_default_version (const tree decl)
> > diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
> > index 
> > 072cfb69147bd6b314459c0bd48a0c1fb92d3e4d..1a224c036277d51ab4dc0d33a403177bd226e48a
> >  100644
> > --- a/gcc/c-family/c-attribs.cc
> > +++ b/gcc/c-family/c-attribs.cc
> > @@ -148,6 +148,7 @@ static tree handle_alloc_align_attribute (tree *, tree, 
> > tree, int, bool *);
> >  static tree handle_assume_aligned_attribute (tree *, tree, tree, int, bool 
> > *);
> >  static tree handle_assume_attribute (tree *, tree, tree, int, bool *);
> >  static tree handle_target_attribute (tree *, tree, tree, int, bool *);
> > +static tree handle_target_version_attribute (tree *, tree, tree, int, bool 
> > *);
> >  static tree handle_target_clones_attribute (tree *, tree, tree, int, bool 
> > *);
> >  static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
> >  static tree ignore_attribute (tree *, tree, tree, int, bool *);
> > @@ -480,6 +481,8 @@ const struct attribute_spec c_common_attribute_table[] =
> >   handle_error_attribute, NULL },
> >{ "target", 1, -1, true, false, false, false,
> >   handle_target_attribute, NULL },
> > +  { "target_version", 1, -1, true, false, false, false,
> > + handle_target_version_attribute, NULL },
> >{ "target_clones",  1, -1, true, false, false, false,
> >   handle_target_clones_attribute, NULL },
> >{ "optimize",   1, -1, true, false, false, false,
> > @@ -5569,6 +5572,45 @@ handle_target_attribute (tree *node, tree name, tree 
> > args, int flags,
> >return NULL_TREE;
> >  }
> >  
> > +/* Handle a "target_version" attribute.  */
> > +
>

[PATCH] aarch64: Rework aarch64_modes_tieable_p [PR112105]

2023-11-03 Thread Richard Sandiford
On AArch64, can_change_mode_class and modes_tieable_p are
mostly answering the same questions:

(a) Do two modes have the same layout for the bytes that are
common to both modes?

(b) Do all valid subregs involving the two modes behave as
GCC would expect?

(c) Is there at least one register that can hold both modes?

These questions involve no class-dependent tests, and the relationship
is symmetrical.  This means we can do most of the checks in a common
subroutine.

can_change_mode_class is the hook that matters for correctness,
while modes_tieable_p is more for optimisation.  It was therefore
can_change_mode_class that had the more accurate tests.
modes_tieable_p was looser in some ways (e.g. it missed some
big-endian tests) and overly strict in others (it didn't allow
ties between a vector structure mode and the mode of a single lane).
The overly strict part caused a missed combination in the testcase.

I think the can_change_mode_class logic also needed some tweaks,
as described in the changelog.

Tested on aarch64-linux-gnu.  I'll leave it a day or so for comments,
and to give the CI testers a chance to try it.

Richard


gcc/
PR target/112105
* config/aarch64/aarch64.cc (aarch64_modes_compatible_p): New
function, with the core logic extracted from...
(aarch64_can_change_mode_class): ...here.  Extend the previous rules
to allow changes between partial SVE modes and other modes if
the other mode is no bigger than an element, and if no other rule
prevents it.  Use the aarch64_modes_tieable_p handling of
partial Advanced SIMD structure modes.
(aarch64_modes_tieable_p): Use aarch64_modes_compatible_p.
Allow all vector mode ties that it allows.

gcc/testsuite/
PR target/112105
* gcc.target/aarch64/pr112105.c: New test.
* gcc.target/aarch64/sve/pcs/struct_3_128.c: Expect a 32-bit spill
rather than a 16-bit spill.
---
 gcc/config/aarch64/aarch64.cc | 223 +-
 gcc/testsuite/gcc.target/aarch64/pr112105.c   |  31 +++
 .../gcc.target/aarch64/sve/pcs/struct_3_128.c |   4 +-
 3 files changed, 147 insertions(+), 111 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr112105.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 5fd7063663c..cb65ccc8465 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -25215,53 +25215,131 @@ aarch64_expand_sve_vcond (machine_mode data_mode, 
machine_mode cmp_mode,
   emit_set_insn (ops[0], gen_rtx_UNSPEC (data_mode, vec, UNSPEC_SEL));
 }
 
-/* Implement TARGET_MODES_TIEABLE_P.  In principle we should always return
-   true.  However due to issues with register allocation it is preferable
-   to avoid tieing integer scalar and FP scalar modes.  Executing integer
-   operations in general registers is better than treating them as scalar
-   vector operations.  This reduces latency and avoids redundant int<->FP
-   moves.  So tie modes if they are either the same class, or vector modes
-   with other vector modes, vector structs or any scalar mode.  */
+/* Return true if:
+
+   (a) MODE1 and MODE2 use the same layout for bytes that are common
+   to both modes;
+
+   (b) subregs involving the two modes behave as the target-independent
+   subreg rules require; and
+
+   (c) there is at least one register that can hold both modes.
+
+   Return false otherwise.  */
 
 static bool
-aarch64_modes_tieable_p (machine_mode mode1, machine_mode mode2)
+aarch64_modes_compatible_p (machine_mode mode1, machine_mode mode2)
 {
-  if ((aarch64_advsimd_partial_struct_mode_p (mode1)
-   != aarch64_advsimd_partial_struct_mode_p (mode2))
+  unsigned int flags1 = aarch64_classify_vector_mode (mode1);
+  unsigned int flags2 = aarch64_classify_vector_mode (mode2);
+
+  bool sve1_p = (flags1 & VEC_ANY_SVE);
+  bool sve2_p = (flags2 & VEC_ANY_SVE);
+
+  bool partial_sve1_p = sve1_p && (flags1 & VEC_PARTIAL);
+  bool partial_sve2_p = sve2_p && (flags2 & VEC_PARTIAL);
+
+  bool pred1_p = (flags1 & VEC_SVE_PRED);
+  bool pred2_p = (flags2 & VEC_SVE_PRED);
+
+  bool partial_advsimd_struct1_p = (flags1 == (VEC_ADVSIMD | VEC_STRUCT
+  | VEC_PARTIAL));
+  bool partial_advsimd_struct2_p = (flags2 == (VEC_ADVSIMD | VEC_STRUCT
+  | VEC_PARTIAL));
+
+  /* Don't allow changes between predicate modes and other modes.
+ Only predicate registers can hold predicate modes and only
+ non-predicate registers can hold non-predicate modes, so any
+ attempt to mix them would require a round trip through memory.  */
+  if (pred1_p != pred2_p)
+return false;
+
+  /* The contents of partial SVE modes are distributed evenly across
+ the register, whereas GCC expects them to be clustered together.
+ We therefore need to be careful about mode changes involving them.  */
+  if (partial_sve1

Re: [RFC, RFA PATCH] i386: Handle multiple address register classes

2023-11-03 Thread Hongtao Liu
On Fri, Nov 3, 2023 at 6:34 PM Uros Bizjak  wrote:
>
> The patch generalizes address register class handling to allow multiple
> address register classes.  For APX EGPR targets, some instructions can't be
> encoded with REX2 prefix, so it is necessary to limit address register
> class to avoid REX2 registers.  The same situation happens for instructions
> with high registers, where the REX register can not be used in the address,
> so the existing infrastructure can be adapted to also handle this case.
>
> The patch is mostly a mechanical rename of "gpr32" attribute to "addr" and
> introduces no functional changes, although it fixes a couple of inconsistent
> attribute values in passing.

@@ -22569,9 +22578,8 @@ (define_insn "_mpsadbw"
mpsadbw\t{%3, %2, %0|%0, %2, %3}
vmpsadbw\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "isa" "noavx,noavx,avx")
-   (set_attr "gpr32" "0,0,1")
+   (set_attr "addr" "rex")
(set_attr "type" "sselog1")
-   (set_attr "gpr32" "0")
(set_attr "length_immediate" "1")
(set_attr "prefix_extra" "1")
(set_attr "prefix" "orig,orig,vex")

I believe your fix is correct.

>
> A follow-up patch will use the above infrastructure to limit address register
> class to legacy registers for instructions with high registers.

The patch looks good to me, but please leave some time for Hongyu in
case he has any comments.

>
> gcc/ChangeLog:
>
> * config/i386/i386.cc (ix86_memory_address_use_extended_reg_class_p):
> Rename to ...
> (ix86_memory_address_reg_class): ... this.  Generalize address
> register class handling to allow multiple address register classes.
> Return maximal class for unrecognized instructions.  Improve comments.
> (ix86_insn_base_reg_class): Rewrite to handle
> multiple address register classes.
> (ix86_regno_ok_for_insn_base_p): Ditto.
> (ix86_insn_index_reg_class): Ditto.
> * config/i386/i386.md: Rename "gpr32" attribute to "addr"
> and substitute its values with "0" -> "rex", "1" -> "*".
> (addr): New attribute to limit allowed address register set.
> (gpr32): Remove.
> * config/i386/mmx.md: Rename "gpr32" attribute to "addr"
> and substitute its values with "0" -> "rex", "1" -> "*".
> * config/i386/sse.md: Ditto.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> Comments welcome.
>
> Uros.



-- 
BR,
Hongtao


[PATCH] Fortran: Fix generate_error library function fnspec

2023-11-03 Thread Martin Jambor
Hi,

when developing an otherwise unrelated patch I've discovered that the
fnspec for the Fortran library function generate_error is wrong. It is
currently ". R . R " where the first R describes the first parameter
and means that it "is only read and does not escape."  The function
itself, however, with signature:

  bool
  generate_error_common (st_parameter_common *cmp, int family, const char 
*message)

contains the following:

  /* Report status back to the compiler.  */
  cmp->flags &= ~IOPARM_LIBRETURN_MASK;

which does not correspond to the fnspec and breaks testcase
gfortran.dg/large_unit_2.f90 when my patch is applied, since it tries
to re-use the flags from before the call.

This patch replaces the "R" with "W" which stands for "specifies that
the memory pointed to by the parameter does not escape."

Bootstrapped and tested on x86_64-linux.  OK for master?


2023-11-02  Martin Jambor  

* trans-decl.cc (gfc_build_builtin_function_decls): Fix fnspec of
generate_error.

---
 gcc/fortran/trans-decl.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index a3f037bd07b..b86cfec7d49 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -3821,7 +3821,7 @@ gfc_build_builtin_function_decls (void)
void_type_node, -2, pchar_type_node, pchar_type_node);
 
   gfor_fndecl_generate_error = gfc_build_library_function_decl_with_spec (
-   get_identifier (PREFIX("generate_error")), ". R . R ",
+   get_identifier (PREFIX("generate_error")), ". W . R ",
void_type_node, 3, pvoid_type_node, integer_type_node,
pchar_type_node);
 
-- 
2.42.0



[PATCH v2] Format gotools.sum closer to what DejaGnu does

2023-11-03 Thread Maxim Kuvyrkov
The only difference compared to v1 is using vanilla automake 1.15.1
to regenerate Makefile.in.

I'll merge this as obvious if no-one objects in a day.

===
... to restore compatibility with validate_failures.py.
The testsuite script validate_failures.py expects
"Running <testname> ..." to extract <testname> values,
and gotools.sum provided "Running <testname>".

Note that libgo.sum, which also uses Makefile logic to generate
DejaGnu-like output, already has "..." suffix.

gotools/ChangeLog:

	* Makefile.am: Update "Running <testname> ..." output.
* Makefile.in: Regenerate.

Signed-off-by: Maxim Kuvyrkov 
---
 gotools/Makefile.am | 4 ++--
 gotools/Makefile.in | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gotools/Makefile.am b/gotools/Makefile.am
index 7b5302990f8..d2376b9c25b 100644
--- a/gotools/Makefile.am
+++ b/gotools/Makefile.am
@@ -332,8 +332,8 @@ check: check-head check-go-tool check-runtime 
check-cgo-test check-carchive-test
@cp gotools.sum gotools.log
@for file in cmd_go-testlog runtime-testlog cgo-testlog 
carchive-testlog cmd_vet-testlog embed-testlog; do \
  testname=`echo $${file} | sed -e 's/-testlog//' -e 's|_|/|'`; \
- echo "Running $${testname}" >> gotools.sum; \
- echo "Running $${testname}" >> gotools.log; \
+ echo "Running $${testname} ..." >> gotools.sum; \
+ echo "Running $${testname} ..." >> gotools.log; \
  sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' < $${file} >> gotools.log; \
  grep '^--- ' $${file} | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' -e 
's/SKIP/UNTESTED/' | sort -k 2 >> gotools.sum; \
done
diff --git a/gotools/Makefile.in b/gotools/Makefile.in
index 2783b91ef4b..36c2ec2abd3 100644
--- a/gotools/Makefile.in
+++ b/gotools/Makefile.in
@@ -1003,8 +1003,8 @@ mostlyclean-local:
 @NATIVE_TRUE@  @cp gotools.sum gotools.log
 @NATIVE_TRUE@  @for file in cmd_go-testlog runtime-testlog cgo-testlog 
carchive-testlog cmd_vet-testlog embed-testlog; do \
 @NATIVE_TRUE@testname=`echo $${file} | sed -e 's/-testlog//' -e 's|_|/|'`; 
\
-@NATIVE_TRUE@echo "Running $${testname}" >> gotools.sum; \
-@NATIVE_TRUE@echo "Running $${testname}" >> gotools.log; \
+@NATIVE_TRUE@echo "Running $${testname} ..." >> gotools.sum; \
+@NATIVE_TRUE@echo "Running $${testname} ..." >> gotools.log; \
 @NATIVE_TRUE@sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' < $${file} >> 
gotools.log; \
 @NATIVE_TRUE@grep '^--- ' $${file} | sed -e 's/^--- \(.*\) ([^)]*)$$/\1/' 
-e 's/SKIP/UNTESTED/' | sort -k 2 >> gotools.sum; \
 @NATIVE_TRUE@  done
-- 
2.34.1



Re: [RFC, RFA PATCH] i386: Handle multiple address register classes

2023-11-03 Thread Hongyu Wang
Thanks for the fix and refinement!

I think the addr attr looks more reasonable, just one small issue that
EGPR was not only encoded with REX2 prefix, there are several
instructions that encode EGPR using evex prefix. So I think
addr_rex2/addr_rex may be a misleading note. I'd prefer still using
gpr16/gpr32 as the name which clearly shows which type of gpr was
adopted to an address.

Hongtao Liu  wrote on Fri, Nov 3, 2023 at 20:50:
>
> On Fri, Nov 3, 2023 at 6:34 PM Uros Bizjak  wrote:
> >
> > The patch generalizes address register class handling to allow multiple
> > address register classes.  For APX EGPR targets, some instructions can't be
> > encoded with REX2 prefix, so it is necessary to limit address register
> > class to avoid REX2 registers.  The same situation happens for instructions
> > with high registers, where the REX register can not be used in the address,
> > so the existing infrastructure can be adapted to also handle this case.
> >
> > The patch is mostly a mechanical rename of "gpr32" attribute to "addr" and
> > introduces no functional changes, although it fixes a couple of inconsistent
> > attribute values in passing.
>
> @@ -22569,9 +22578,8 @@ (define_insn "_mpsadbw"
> mpsadbw\t{%3, %2, %0|%0, %2, %3}
> vmpsadbw\t{%3, %2, %1, %0|%0, %1, %2, %3}"
>[(set_attr "isa" "noavx,noavx,avx")
> -   (set_attr "gpr32" "0,0,1")
> +   (set_attr "addr" "rex")
> (set_attr "type" "sselog1")
> -   (set_attr "gpr32" "0")
> (set_attr "length_immediate" "1")
> (set_attr "prefix_extra" "1")
> (set_attr "prefix" "orig,orig,vex")
>
> I believe your fix is correct.
>
> >
> > A follow-up patch will use the above infrastructure to limit address 
> > register
> > class to legacy registers for instructions with high registers.
>
> The patch looks good to me, but please leave some time for Hongyu in
> case he has any comments.
>
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386.cc (ix86_memory_address_use_extended_reg_class_p):
> > Rename to ...
> > (ix86_memory_address_reg_class): ... this.  Generalize address
> > register class handling to allow multiple address register classes.
> > Return maximal class for unrecognized instructions.  Improve comments.
> > (ix86_insn_base_reg_class): Rewrite to handle
> > multiple address register classes.
> > (ix86_regno_ok_for_insn_base_p): Ditto.
> > (ix86_insn_index_reg_class): Ditto.
> > * config/i386/i386.md: Rename "gpr32" attribute to "addr"
> > and substitute its values with "0" -> "rex", "1" -> "*".
> > (addr): New attribute to limit allowed address register set.
> > (gpr32): Remove.
> > * config/i386/mmx.md: Rename "gpr32" attribute to "addr"
> > and substitute its values with "0" -> "rex", "1" -> "*".
> > * config/i386/sse.md: Ditto.
> >
> > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> >
> > Comments welcome.
> >
> > Uros.
>
>
>
> --
> BR,
> Hongtao


Re: [PATCH] Fortran: Fix generate_error library function fnspec

2023-11-03 Thread Tobias Burnus

On 03.11.23 13:54, Martin Jambor wrote:

when developing an otherwise unrelated patch I've discovered that the
fnspec for the Fortran library function generate_error is wrong. It is
currently ". R . R " where the first R describes the first parameter
and means that it "is only read and does not escape."  The function
itself, however, with signature:

   bool
   generate_error_common (st_parameter_common *cmp, int family, const char 
*message)

contains the following:

   /* Report status back to the compiler.  */
   cmp->flags &= ~IOPARM_LIBRETURN_MASK;

which does not correspond to the fnspec and breaks testcase
gfortran.dg/large_unit_2.f90 when my patch is applied, since it tries
to re-use the flags from before the call.

This patch replaces the "R" with "W" which stands for "specifies that
the memory pointed to by the parameter does not escape."

Bootstrapped and tested on x86_64-linux.  OK for master?


LGTM - thanks for the fix!

Tobias


2023-11-02  Martin Jambor  

 * trans-decl.cc (gfc_build_builtin_function_decls): Fix fnspec of
 generate_error.

---
  gcc/fortran/trans-decl.cc | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index a3f037bd07b..b86cfec7d49 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -3821,7 +3821,7 @@ gfc_build_builtin_function_decls (void)
  void_type_node, -2, pchar_type_node, pchar_type_node);

gfor_fndecl_generate_error = gfc_build_library_function_decl_with_spec (
- get_identifier (PREFIX("generate_error")), ". R . R ",
+ get_identifier (PREFIX("generate_error")), ". W . R ",
  void_type_node, 3, pvoid_type_node, integer_type_node,
  pchar_type_node);




Re: [RFC, RFA PATCH] i386: Handle multiple address register classes

2023-11-03 Thread Uros Bizjak
On Fri, Nov 3, 2023 at 2:21 PM Hongyu Wang  wrote:
>
> Thanks for the fix and refinement!
>
> I think the addr attr looks more reasonable, just one small issue that
> EGPR was not only encoded with REX2 prefix, there are several
> instructions that encode EGPR using evex prefix. So I think
> addr_rex2/addr_rex may be a misleading note. I'd prefer still using
> gpr16/gpr32 as the name which clearly shows which type of gpr was
> adopted to an address.

No problem, I will keep gpr8/gpr16/gpr32 as "addr" values.

Thanks,
Uros.

>
> Hongtao Liu  于2023年11月3日周五 20:50写道:
> >
> > On Fri, Nov 3, 2023 at 6:34 PM Uros Bizjak  wrote:
> > >
> > > The patch generalizes address register class handling to allow multiple
> > > address register classes.  For APX EGPR targets, some instructions can't 
> > > be
> > > encoded with REX2 prefix, so it is necessary to limit address register
> > > class to avoid REX2 registers.  The same situation happens for 
> > > instructions
> > > with high registers, where the REX register can not be used in the 
> > > address,
> > > so the existing infrastructure can be adapted to also handle this case.
> > >
> > > The patch is mostly a mechanical rename of "gpr32" attribute to "addr" and
> > > introduces no functional changes, although it fixes a couple of 
> > > inconsistent
> > > attribute values in passing.
> >
> > @@ -22569,9 +22578,8 @@ (define_insn "_mpsadbw"
> > mpsadbw\t{%3, %2, %0|%0, %2, %3}
> > vmpsadbw\t{%3, %2, %1, %0|%0, %1, %2, %3}"
> >[(set_attr "isa" "noavx,noavx,avx")
> > -   (set_attr "gpr32" "0,0,1")
> > +   (set_attr "addr" "rex")
> > (set_attr "type" "sselog1")
> > -   (set_attr "gpr32" "0")
> > (set_attr "length_immediate" "1")
> > (set_attr "prefix_extra" "1")
> > (set_attr "prefix" "orig,orig,vex")
> >
> > I believe your fix is correct.
> >
> > >
> > > A follow-up patch will use the above infrastructure to limit address 
> > > register
> > > class to legacy registers for instructions with high registers.
> >
> > The patch looks good to me, but please leave some time for Hongyu in
> > case he has any comments.
> >
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/i386/i386.cc (ix86_memory_address_use_extended_reg_class_p):
> > > Rename to ...
> > > (ix86_memory_address_reg_class): ... this.  Generalize address
> > > register class handling to allow multiple address register classes.
> > > Return maximal class for unrecognized instructions.  Improve comments.
> > > (ix86_insn_base_reg_class): Rewrite to handle
> > > multiple address register classes.
> > > (ix86_regno_ok_for_insn_base_p): Ditto.
> > > (ix86_insn_index_reg_class): Ditto.
> > > * config/i386/i386.md: Rename "gpr32" attribute to "addr"
> > > and substitute its values with "0" -> "rex", "1" -> "*".
> > > (addr): New attribute to limit allowed address register set.
> > > (gpr32): Remove.
> > > * config/i386/mmx.md: Rename "gpr32" attribute to "addr"
> > > and substitute its values with "0" -> "rex", "1" -> "*".
> > > * config/i386/sse.md: Ditto.
> > >
> > > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> > >
> > > Comments welcome.
> > >
> > > Uros.
> >
> >
> >
> > --
> > BR,
> > Hongtao


Re: [PATCH] tree-optimization: Add register pressure heuristics

2023-11-03 Thread Richard Biener
On Fri, Nov 3, 2023 at 11:20 AM Ajit Agarwal  wrote:
>
> Hello Richard:
>
> On 03/11/23 12:51 pm, Richard Biener wrote:
> > On Thu, Nov 2, 2023 at 9:50 PM Ajit Agarwal  wrote:
> >>
> >> Hello All:
> >>
[...]
> >>
> >> High register pressure region is the region where there are live-in of
> >> early blocks that has been modified by the early block. If there are
> >> modification of the variables in best block that are live-in in early
> >> block that are live-out of best block.
> >
> > ?!  Parse error.
> >
>
> I didnt understand what you meant here. Please suggest.

I can't even guess what that paragraph means.  It fails at a
parsing level already, I can't even start to reason about what
the sentences mean.

> >> Bootstrapped and regtested on powerpc64-linux-gnu.
> >
> > What's the effect on code generation?
> >
> > Note that live is a quadratic problem while sinking was not.  You
> > are effectively making the pass unfit for -O1.
> >
> > You are computing "liveness" on GIMPLE where within EBBs there
> > isn't really any particular order of stmts, so it's kind of a garbage
> > heuristic.  Likewise you are not computing the effect that moving
> > a stmt has on liveness as far as I can see but you are just identifying
> > some odd metrics (I don't really understand them) to rank blocks,
> > not even taking the register file size into account.
>
>
> if the live out of best_bb  <= live out of early_bb, that shows
> that there are modification in best_bb.

Hm?  Do you maybe want to say that if live_out (bb) < live_in (bb)
then some variables die during the execution of bb?  Otherwise,
if live_out (early) > live_out (best) then somewhere on the path
from early to best some variables die.

> Then it's
> safer to move statements in best_bb as there are lesser interfering
> live variables in best_bb.

consider a stmt

 a = b + c;

where b and c die at the definition of a.  Then moving the stmt
down from early_bb means you increase live_out (early_bb) by
one.  So why's that "safer" then?  Of course live_out (best_bb)
also increases by two then.

> if there are lesser live out in best_bb, there is lesser chance
> of interfering live ranges and hence moving statements in best_bb
> will not increase register pressure.
>
> If the liveout of best_bb is greater than live-out of early_bb,
> moving statements in best_bb will increase chances of more interfering
> live ranges and hence increase in register pressure.
>
> This is how the heuristics is defined.

I don't think they will work.  Do you have any numbers?

>
> >
> > You are replacing the hot/cold heuristic.
>
> >
> > IMHO the sinking pass is the totally wrong place to do anything
> > about register pressure.  You are trying to solve a scheduling
> > problem by just looking at a single stmt.
> >
>
> bb->count from profile.cc are prone to errors as you have
> mentioned in previous mails. Main bottlenecks with code
> motion is increase in register pressure as that counts to
> spills in later phases of the compiler backend.
>
> Calculation of best_bb based on the immediate dominator should
> consider register pressure instead of hot/cold regions, as that
> would affect code generation.
>
> If there is an increase in register pressure with code motion and
> we are moving into colder regions, that won't improve code generation.
>
> Hot/cold should be a criterion, but not the only criterion for
> code motion.
>
> We should consider register pressure with code motion rather than
> hot/cold regions.
>
> Thanks & Regards
> Ajit
>
> > Richard.
> >
> >> Thanks & Regards
>
>
> >> Ajit
> >>
> >> tree-optimization: Add register pressure heuristics
> >>
> >> Currently code sinking heuristics are based on profile data like
> >> basic block count and sink frequency threshold. We have removed
> >> such heuristics to add register pressure heuristics based on
> >> live-in and live-out of early blocks and immediate dominator of
> >> use blocks.
> >>
> >> High register pressure region is the region where there are live-in of
> >> early blocks that has been modified by the early block. If there are
> >> modification of the variables in best block that are live-in in early
> >> block that are live-out of best block.
> >>
> >> 2023-11-03  Ajit Kumar Agarwal  
> >>
> >> gcc/ChangeLog:
> >>
> >> * tree-ssa-sink.cc (statement_sink_location): Add tree_live_info_p
> >> as paramters.
> >> (sink_code_in_bb): Ditto.
> >> (select_best_block): Add register pressure heuristics to select
> >> the best blocks in the immediate dominator for same loop nest 
> >> depth.
> >> (execute): Add live range analysis.
> >> (additional_var_map): New function.
> >> * tree-ssa-live.cc (set_var_live_on_entry): Add virtual operand
> >> tests on ssa_names.
> >> (verify_live_on_entry): Ditto.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
> >> * gcc.dg/tree-ssa/ssa-sink-22.c: New test.
> >> ---
> >

Re: [PATCH v2] Format gotools.sum closer to what DejaGnu does

2023-11-03 Thread Jeff Law




On 11/3/23 06:54, Maxim Kuvyrkov wrote:

The only difference compared to v1 is using vanilla automake 1.15.1
to regenerate Makefile.in.

I'll merge this as obvious if no-one objects in a day.

===
... to restore compatibility with validate_failures.py.
The testsuite script validate_failures.py expects
"Running  ..." to extract  values,
and gotools.sum provided "Running ".

Note that libgo.sum, which also uses Makefile logic to generate
DejaGnu-like output, already has "..." suffix.

gotools/ChangeLog:

* Makefile.am: Update "Running  ..." output
* Makefile.in: Regenerate.

OK.  Thanks for running down the differences in the autogenerated  bits.

Jeff




Re: [PATCH] libstdc++: avoid uninitialized read in basic_string constructor

2023-11-03 Thread Ben Sherman
> This was https://gcc.gnu.org/PR109703 (and several duplicates) and
> should already be fixed in all affected branches. Where are you seeing
> this?

I saw this on 13.1.0 and could not find the bug report or fix for it, so I
didn't realize it had already been fixed.  Sorry about that.









Re: [PATCH] libstdc++: avoid uninitialized read in basic_string constructor

2023-11-03 Thread Sam James


Jonathan Wakely  writes:

> On Thu, 2 Nov 2023 at 19:58, Ben Sherman  
> wrote:
>>
>> Tested on x86_64-pc-linux-gnu, please let me know if there's anything
>> else needed. I haven't contributed before and don't have write access, so
>> apologies if I've missed anything.
>
> This was https://gcc.gnu.org/PR109703 (and several duplicates) and
> should already be fixed in all affected branches. Where are you seeing
> this?

It would help to know where they got their GCC from too. Ben?


[PATCH] Cleanup vectorizable_live_operation

2023-11-03 Thread Richard Biener
During analyzing PR111950 I found the loop live operation code-gen
odd, in particular only replacing a single PHI but then adjusting
possibly remaining PHIs afterwards where there shouldn't really
be any out-of-loop uses of the scalar in-loop def left.

Bootstrapped and tested together with another patch, quickly
re-testing after splitting out now.

Richard.

* tree-vect-loop.cc (vectorizable_live_operation): Simplify
LC PHI replacement.
---
 gcc/tree-vect-loop.cc | 53 ++-
 1 file changed, 17 insertions(+), 36 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 2a43176bcfd..362856a6507 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10698,49 +10698,30 @@ vectorizable_live_operation (vec_info *vinfo, 
stmt_vec_info stmt_info,
   &stmts, true, NULL_TREE);
}
 
+  gimple_stmt_iterator exit_gsi = gsi_after_labels (exit_bb);
   if (stmts)
-   {
- gimple_stmt_iterator exit_gsi = gsi_after_labels (exit_bb);
- gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
+   gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
 
- /* Remove existing phi from lhs and create one copy from new_tree.  */
- tree lhs_phi = NULL_TREE;
- gimple_stmt_iterator gsi;
- for (gsi = gsi_start_phis (exit_bb);
-  !gsi_end_p (gsi); gsi_next (&gsi))
+  /* Remove existing phis that copy from lhs and create copies
+from new_tree.  */
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_start_phis (exit_bb); !gsi_end_p (gsi);)
+   {
+ gimple *phi = gsi_stmt (gsi);
+ if ((gimple_phi_arg_def (phi, 0) == lhs))
{
- gimple *phi = gsi_stmt (gsi);
- if ((gimple_phi_arg_def (phi, 0) == lhs))
-   {
- remove_phi_node (&gsi, false);
- lhs_phi = gimple_phi_result (phi);
- gimple *copy = gimple_build_assign (lhs_phi, new_tree);
- gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
- break;
-   }
+ remove_phi_node (&gsi, false);
+ tree lhs_phi = gimple_phi_result (phi);
+ gimple *copy = gimple_build_assign (lhs_phi, new_tree);
+ gsi_insert_before (&exit_gsi, copy, GSI_SAME_STMT);
}
+ else
+   gsi_next (&gsi);
}
 
-  /* Replace use of lhs with newly computed result.  If the use stmt is a
-single arg PHI, just replace all uses of PHI result.  It's necessary
-because lcssa PHI defining lhs may be before newly inserted stmt.  */
-  use_operand_p use_p;
+  /* There a no further out-of-loop uses of lhs by LC-SSA construction.  */
   FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
-   if (!flow_bb_inside_loop_p (loop, gimple_bb (use_stmt))
-   && !is_gimple_debug (use_stmt))
- {
-   if (gimple_code (use_stmt) == GIMPLE_PHI
-   && gimple_phi_num_args (use_stmt) == 1)
- {
-   replace_uses_by (gimple_phi_result (use_stmt), new_tree);
- }
-   else
- {
-   FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
-   SET_USE (use_p, new_tree);
- }
-   update_stmt (use_stmt);
- }
+   gcc_assert (flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)));
 }
   else
 {
-- 
2.35.3


Re: [PATCH] libstdc++: avoid uninitialized read in basic_string constructor

2023-11-03 Thread Jonathan Wakely
On Fri, 3 Nov 2023 at 13:51, Ben Sherman  wrote:
>
> > This was https://gcc.gnu.org/PR109703 (and several duplicates) and
> > should already be fixed in all affected branches. Where are you seeing
> > this?
>
> I saw this on 13.1.0 and could not find the bug report or fix for it, so I
> didn't realize it had already been fixed.  Sorry about that.

No problem, sorry you spent your time analyzing the bug and writing the fix.

By default the bugzilla search results don't include closed bugs. That
means that if you search for a bug that was closed recently but is
still present in the release you're using, it won't show up.



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Qing Zhao


> On Nov 3, 2023, at 2:22 AM, Jakub Jelinek  wrote:
> 
> On Fri, Nov 03, 2023 at 07:07:36AM +0100, Martin Uecker wrote:
>> Am Donnerstag, dem 02.11.2023 um 17:28 -0700 schrieb Bill Wendling:
>>> On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao  wrote:
 
 Thanks a lot for raising these issues.
 
 If I understand correctly,  the major question we need to answer is:
 
 For the following example: (Jakub mentioned this  in an early message)
 
  1 struct S { int a; char b __attribute__((counted_by (a))) []; };
  2 struct S s;
  3 s.a = 5;
  4 char *p = &s.b[2];
  5 int i1 = __builtin_dynamic_object_size (p, 0);
  6 s.a = 3;
  7 int i2 = __builtin_dynamic_object_size (p, 0);
 
 Should the 2nd __bdos call (line 7) get
A. the latest value of s.a (line 6) for it’s size?
 Or  B. the value when the s.b was referenced (line 3, line 4)?
 
>>> I personally think it should be (A). The user is specifically
>>> indicating that the size has somehow changed, and the compiler should
>>> behave accordingly.
>> 
>> 
>> One potential problem for A apart from the potential impact on
>> optimization is that the information may get lost more
>> easily. Consider:
>> 
>> char *p = &s.b[2];
>> f(&s);
>> int i = __bdos(p, 0);
>> 
>> If the compiler can not see into 'f', the information is lost
>> because f may have changed the size.
> 
> Why?  It doesn't really matter.  The options are
> A. p is at &s.b[2] associated with &s.a and int type (or size of int
>   or whatever); .ACCESS_WITH_SIZE can't be pure,

.ACCESS_WITH_SIZE will only load the size from its address; it makes no writes
to memory.
It still can be PURE, right? (It will not be CONST anymore).

> but sure, for aliasing
>   POV we can describe it with more detail that it doesn't modify anything
>   in the pointed structure, just escapes the pointer;

If we need to do this, where in the gcc code we need to add these details?

> __bdos can stay
>   leaf I believe;

That’s good!  (I thought now _bdos will call .ACCESS_WITH_SIZE?)

Qing

> and when expanding __bdos later on, it would just
>   dereference the associated pointer at that point (note, __bdos is
>   pure, so it has vuse but not vdef and can load from memory); if
>   f changes s.a, no problem, __bdos will load the changed value in there
> B. if .ACCESS_WITH_SIZE associates the pointer with the s.a value from that
>   point, .ACCESS_WITH_SIZE can be const, but obviously if f changes s.a,
>   __bdos later will use s.a value from the &s.b[2] spot
> 
>   Jakub
> 



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Jakub Jelinek
On Fri, Nov 03, 2023 at 02:32:04PM +, Qing Zhao wrote:
> > Why?  It doesn't really matter.  The options are
> > A. p is at &s.b[2] associated with &s.a and int type (or size of int
> >   or whatever); .ACCESS_WITH_SIZE can't be pure,
> 
> .ACCESS_WITH_SIZE will only load the size from its address; it makes no
> writes to memory.
> It still can be PURE, right? (It will not be CONST anymore).

No, it can't be pure.  Because for the IL purposes, it needs to be treated
as if it saves that address of the counter into some unnamed global variable
somewhere.
> 
> > but sure, for aliasing
> >   POV we can describe it with more detail that it doesn't modify anything
> >   in the pointed structure, just escapes the pointer;
> 
> If we need to do this, where in the gcc code we need to add these details?

I think ref_maybe_used_by_call_p_1/call_may_clobber_ref_p_1, but Richi is
expert here.

> > __bdos can stay
> >   leaf I believe;
> 
> That’s good!  (I thought now _bdos will call .ACCESS_WITH_SIZE?)

No, it shouldn't call it obviously.  If tree-object-size.cc discovery tracks
something to a pointer initialized by .ACCESS_WITH_SIZE call, then it should
I believe recurse on the first argument of that call (say if one has
  ptr_3 = malloc (sz_1);
  ptr_2 = .ACCESS_WITH_SIZE (ptr_3, &ptr_3[4], ...);
then supposedly __bdos later on should e.g. for 0/1 modes take the minimum of
ptr_3 (the size actually allocated)) and the counter.

Jakub



[PATCH] Testcases for vectorizer peeling

2023-11-03 Thread Richard Biener
The following exercise otherwise not exercised paths in the
vectorizer peeling code, resulting in CPU 2017 build ICEs
when patching but no fallout in the testsuite.

tested on x86_64-unknown-linux-gnu, pushed.

* gfortran.dg/20231103-1.f90: New testcase.
* gfortran.dg/20231103-2.f90: Likewise.
---
 gcc/testsuite/gfortran.dg/20231103-1.f90 | 22 ++
 gcc/testsuite/gfortran.dg/20231103-2.f90 | 22 ++
 2 files changed, 44 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/20231103-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/20231103-2.f90

diff --git a/gcc/testsuite/gfortran.dg/20231103-1.f90 
b/gcc/testsuite/gfortran.dg/20231103-1.f90
new file mode 100644
index 000..61ccf5c5e9d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/20231103-1.f90
@@ -0,0 +1,22 @@
+! { dg-do compile }
+! { dg-options "-Ofast" }
+SUBROUTINE sedi_1D(QX1d,  DZ1d,kdir,BX1d,kbot,ktop)
+  real, dimension(:) :: QX1d,DZ1d
+  real, dimension(size(QX1d))  :: VVQ
+  logical BX_present
+  do k= kbot,ktop,kdir
+ VVQ= VV_Q0
+  enddo
+  Vxmaxx= min0
+  if (kdir==1) then
+ dzMIN = minval(DZ1d)
+  endif
+  npassx=   Vxmaxx/dzMIN
+  DO nnn= 1,npassx
+ if (BX_present) then
+   do k= ktop,kdir
+ BX1d= iDZ1d0
+   enddo
+ endif
+  ENDDO
+END
diff --git a/gcc/testsuite/gfortran.dg/20231103-2.f90 
b/gcc/testsuite/gfortran.dg/20231103-2.f90
new file mode 100644
index 000..c510505d5ad
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/20231103-2.f90
@@ -0,0 +1,22 @@
+! { dg-do compile }
+! { dg-options "-Ofast" }
+subroutine 
shr_map_checkFldStrshr_map_mapSet_dest(ndst,max0,eps,sum0,maxval0,min0,nidnjd,renorm)
+  allocatable  sum(:)
+  logical renorm
+  allocate(sum(ndst))
+  do n=1,ndst 
+if (sum0 > eps) then
+  rmax = max0
+endif
+  enddo
+  if (renorm) then
+rmin = maxval0
+rmax = minval(sum)
+do n=1,nidnjd
+  if (sum0 > eps) then
+rmin = min0
+  endif
+enddo
+write(*,*) rmin,rmax
+  endif
+end
-- 
2.35.3


Re: [PATCH] tree-optimization: Add register pressure heuristics

2023-11-03 Thread Ajit Agarwal
Hello Richard:


On 03/11/23 7:06 pm, Richard Biener wrote:
> On Fri, Nov 3, 2023 at 11:20 AM Ajit Agarwal  wrote:
>>
>> Hello Richard:
>>
>> On 03/11/23 12:51 pm, Richard Biener wrote:
>>> On Thu, Nov 2, 2023 at 9:50 PM Ajit Agarwal  wrote:

 Hello All:

> [...]

 High register pressure region is the region where there are live-in of
 early blocks that has been modified by the early block. If there are
 modification of the variables in best block that are live-in in early
 block that are live-out of best block.
>>>
>>> ?!  Parse error.
>>>
>>
>> I didnt understand what you meant here. Please suggest.
> 
> I can't even guess what that paragraph means.  It fails at a
> parsing level already, I can't even start to reason about what
> the sentences mean.

Sorry for that I will modify.

> 
 Bootstrapped and regtested on powerpc64-linux-gnu.
>>>
>>> What's the effect on code generation?
>>>
>>> Note that live is a quadratic problem while sinking was not.  You
>>> are effectively making the pass unfit for -O1.
>>>
>>> You are computing "liveness" on GIMPLE where within EBBs there
>>> isn't really any particular order of stmts, so it's kind of a garbage
>>> heuristic.  Likewise you are not computing the effect that moving
>>> a stmt has on liveness as far as I can see but you are just identifying
>>> some odd metrics (I don't really understand them) to rank blocks,
>>> not even taking the register file size into account.
>>
>>
>> if the live out of best_bb  <= live out of early_bb, that shows
>> that there are modification in best_bb.
> 
> Hm?  Do you maybe want to say that if live_out (bb) < live_in (bb)
> then some variables die during the execution of bb?

live_out (bb) < live_in(bb) means in bb there may be KILL (Variables)
and there are more GEN (Variables).

>   Otherwise,
> if live_out (early) > live_out (best) then somewhere on the path
> from early to best some variables die.
> 

If live_out (early) > live_out (best) means there are more GEN (Variables)
between path from early to best.


>> Then it's
>> safer to move statements in best_bb as there are lesser interfering
>> live variables in best_bb.
> 
> consider a stmt
> 
>  a = b + c;
> 
> where b and c die at the definition of a.  Then moving the stmt
> down from early_bb means you increase live_out (early_bb) by
> one.  So why's that "safer" then?  Of course live_out (best_bb)
> also increases by two then.
> 

If b and c die at the definition of a, then live_in (early_bb)
would be live_out (early_bb) - 2 + 1.

Then moving the stmt from early_bb down to best_bb increases live_out (early_bb)
by one, while live_out (best_bb) depends on the LIVEIN of all successors of
best_bb, which may stay the same even if we move the stmt down.

live_out (best_bb) may be greater if all successors of best_bb have
more GEN (variables).  If live_out (best_bb) is less, it means there are
more KILL (variables) in the successors of best_bb.

With my heuristics, if live_out (best_bb) > live_out (early_bb) then we don't
do code motion, as there are chances of more interfering live ranges.  If
live_out (best_bb) <= live_out (early_bb) then we do code motion, as there are
more KILL (variables) in all successors of best_bb and there is less chance of
interfering live ranges.

Moving the above stmt down from early_bb to best_bb increases
live_out (early_bb) by one, but live_out (best_bb) may remain the same.  If
live_out (early_bb) increases by 1 and becomes > live_out (best_bb), then we
don't do code motion if we have more GEN (variables) in best_bb; otherwise it
is safer to do code motion.

For the above statement a = b + c, b and c die and a is generated in early_bb,
so live_out (early_bb) increases by 1.  If before moving live_out (best_bb) is
10 and live_out (early_bb) becomes > 10, then we don't do code motion;
otherwise we do.





>> if there are lesser live out in best_bb, there is lesser chance
>> of interfering live ranges and hence moving statements in best_bb
>> will not increase register pressure.
>>
>> If the liveout of best_bb is greater than live-out of early_bb,
>> moving statements in best_bb will increase chances of more interfering
>> live ranges and hence increase in register pressure.
>>
>> This is how the heuristics is defined.
> 
> I don't think they will work.  Do you have any numbers?
>

My heuristics will work as mentioned above.  I will run the SPEC benchmarks
and will be able to give performance numbers.

Thanks & Regards
Ajit
 
>>
>>>
>>> You are replacing the hot/cold heuristic.
>>
>>>
>>> IMHO the sinking pass is the totally wrong place to do anything
>>> about register pressure.  You are trying to solve a scheduling
>>> problem by just looking at a single stmt.
>>>
>>
>> bb->count from profile.cc are prone to errors as you have
>> mentioned in previous mails. Main bottlenecks with code
>> motion is increase in register pressure as that counts to
>> spills in later phases of the compiler backend.
>>
>> Calculation of bes

GCN: Address undeclared 'NULL' usage in 'libgcc/config/gcn/gthr-gcn.h:__gthread_getspecific' (was: [PATCH 1/3] Create GCN-specific gthreads)

2023-11-03 Thread Thomas Schwinge
Hi!

On 2019-06-07T15:39:36+0100, Andrew Stubbs  wrote:
> This patch creates a new gthread model for AMD GCN devices.
>
> For now, there's just enough support for libgfortran to use mutexes in
> its I/O routines. The rest can be added at a later time, if at all.

Hmm, interestingly we don't have that for nvptx -- and I didn't run into
the need when early this year I was working on Fortran I/O support for
nvptx corresponding to that of GCN.  (That's the pending
"nvptx, libgfortran: Switch out of "minimal" mode" and prerequisite
patches.)

Anyway, not resolving that mystery today, but just a simple technicality:
pushed to master branch commit 5926f30a8dcee9142360fdae445ebfdee4a528f9
"GCN: Address undeclared 'NULL' usage in 
'libgcc/config/gcn/gthr-gcn.h:__gthread_getspecific'",
see attached.


Grüße
 Thomas


> Notes:
>
>   * GCN GPUs do not support dynamic creation and deletion of threads, so
> there can be no implementation for those functions. (There may be
> many threads, of course, but they are hardware managed and must be
> launched all at once.)
>
>   * It would be possible to implement support for EMUTLS, but I have no
> wish to do so at this time, and it isn't likely to be needed by
> OpenMP or OpenACC offload kernels, so those functions are also stub
> implementations.
>
> OK to commit?
>
> --
> Andrew Stubbs
> Mentor Graphics / CodeSourcery
> Create GCN-specific gthreads
>
> 2019-06-05  Kwok Cheung Yeung  
> Andrew Stubbs  
>
> gcc/
>   * config.gcc (thread_file): Set to gcn for AMD GCN.
>   * config/gcn/gcn.c (gcn_emutls_var_init): New function.
>   (TARGET_EMUTLS_VAR_INIT): New hook.
>
>   config/
>   * gthr.m4 (GCC_AC_THREAD_HEADER): Add case for gcn.
>
>   libgcc/
>   * configure: Regenerate.
>   * config/gcn/gthr-gcn.h: New.
>
> diff --git a/config/gthr.m4 b/config/gthr.m4
> index 7b29f1f3327..4b937306ad0 100644
> --- a/config/gthr.m4
> +++ b/config/gthr.m4
> @@ -13,6 +13,7 @@ AC_DEFUN([GCC_AC_THREAD_HEADER],
>  case $1 in
>  aix) thread_header=config/rs6000/gthr-aix.h ;;
>  dce) thread_header=config/pa/gthr-dce.h ;;
> +gcn) thread_header=config/gcn/gthr-gcn.h ;;
>  lynx)thread_header=config/gthr-lynx.h ;;
>  mipssde) thread_header=config/mips/gthr-mipssde.h ;;
>  posix)   thread_header=gthr-posix.h ;;
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 6b00c387247..b450098aa09 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -1428,6 +1428,7 @@ amdgcn-*-amdhsa)
>   fi
>   # Force .init_array support.
>   gcc_cv_initfini_array=yes
> + thread_file=gcn
>   ;;
>  moxie-*-elf)
>   gas=yes
> diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
> index 71f4b4ce35a..e528b649cce 100644
> --- a/gcc/config/gcn/gcn.c
> +++ b/gcc/config/gcn/gcn.c
> @@ -3163,6 +3163,16 @@ gcn_valid_cvt_p (machine_mode from, machine_mode to, 
> enum gcn_cvt_t op)
> || (to == DFmode && (from == SImode || from == SFmode)));
>  }
>
> +/* Implement TARGET_EMUTLS_VAR_INIT.
> +
> +   Disable emutls (gthr-gcn.h does not support it, yet).  */
> +
> +tree
> +gcn_emutls_var_init (tree, tree decl, tree)
> +{
> +  sorry_at (DECL_SOURCE_LOCATION (decl), "TLS is not implemented for GCN.");
> +}
> +
>  /* }}}  */
>  /* {{{ Costs.  */
>
> @@ -6007,6 +6017,8 @@ print_operand (FILE *file, rtx x, int code)
>  #define TARGET_CONSTANT_ALIGNMENT gcn_constant_alignment
>  #undef  TARGET_DEBUG_UNWIND_INFO
>  #define TARGET_DEBUG_UNWIND_INFO gcn_debug_unwind_info
> +#undef  TARGET_EMUTLS_VAR_INIT
> +#define TARGET_EMUTLS_VAR_INIT gcn_emutls_var_init
>  #undef  TARGET_EXPAND_BUILTIN
>  #define TARGET_EXPAND_BUILTIN gcn_expand_builtin
>  #undef  TARGET_FUNCTION_ARG
> diff --git a/libgcc/config/gcn/gthr-gcn.h b/libgcc/config/gcn/gthr-gcn.h
> new file mode 100644
> index 000..4227b515f01
> --- /dev/null
> +++ b/libgcc/config/gcn/gthr-gcn.h
> @@ -0,0 +1,163 @@
> +/* Threads compatibility routines for libgcc2 and libobjc.  */
> +/* Compile this one with gcc.  */
> +/* Copyright (C) 2019 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +Under Section 7 of GPL version 3, you are granted additional
> +permissions described in the GCC Runtime Library Exception, version
> +3.1, as published by the Free Software Foundation.
> +
> +You should have received a copy of the GNU General Public License and
> +a copy of the GCC Runtime Library Exception along with this program;
> +see the files COPYING3 and COPYING

Re: [PATCH] c++: partial spec constraint checking context [PR105220]

2023-11-03 Thread Patrick Palka
On Tue, 3 May 2022, Jason Merrill wrote:

> On 5/2/22 14:50, Patrick Palka wrote:
> > Currently when checking the constraints of a class template, we do so in
> > the context of the template, not the specialized type.  This is the best
> > we can do for a primary template since the specialized type is valid
> > only if the primary template's constraints are satisfied.
> 
> Hmm, that's unfortunate.  It ought to be possible, if awkward, to form the
> type long enough to check its constraints.

(Sorry, lost track of this patch...)

Seems doable, but I'm not sure if it would make any difference in practice?

If the access context during satisfaction of a primary class template's
constraints is the specialization rather than the primary template,
then that should only make a difference if there's some friend declaration
naming the specialization.  But that'd mean the specialization's
constraints had to have been satisfied at that point, before the friend
declaration went into effect.  So either the constraints don't depend on
the access granted by the friend declaration anyway, or they do and the
program is ill-formed (due to either satisfaction failure or instability) IIUC.

For example, I don't think an adapted version of the testcase without a
partial specialization is valid, regardless of whether the access context
during satisfaction of A<B> is A<B> or just A:

template<class T>
concept fooable = requires(T t) { t.foo(); };

template<fooable T>
struct A { };

struct B {
private:
  friend struct A<B>; // satisfaction failure at this point
  void foo();
};

template struct A<B>;


> 
> > But for a
> > partial specialization, we can assume the specialized type is valid (as
> > a consequence of constraints being checked only when necessary), so we
> > arguably should check the constraints on a partial specialization more
> > specifically in the context of the specialized type, not the template.
> > 
> > This patch implements this by substituting and setting the access
> > context appropriately in satisfy_declaration_constraints.  Note that
> > setting the access context in this case is somewhat redundant since the
> > relevant caller most_specialized_partial_spec will already have set the
> > access context to the specialization, but this redundancy should be harmless.
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk and perhaps 12.2 (after the branch is thawed)?
> > 
> > PR c++/105220
> > 
> > gcc/cp/ChangeLog:
> > 
> > * constraint.cc (satisfy_declaration_constraints): When checking
> > the constraints of a partial template specialization, do so in
> > the context of the specialized type not the template.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp2a/concepts-partial-spec12.C: New test.
> > ---
> >   gcc/cp/constraint.cc  | 17 ++---
> >   .../g++.dg/cpp2a/concepts-partial-spec12.C| 19 +++
> >   2 files changed, 33 insertions(+), 3 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec12.C
> > 
> > diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> > index 94f6222b436..772f8532b47 100644
> > --- a/gcc/cp/constraint.cc
> > +++ b/gcc/cp/constraint.cc
> > @@ -3253,11 +3253,22 @@ satisfy_declaration_constraints (tree t, tree args, sat_info info)
> >   {
> > if (!push_tinst_level (t, args))
> > return result;
> > -  tree pattern = DECL_TEMPLATE_RESULT (t);
> > +  tree ascope = DECL_TEMPLATE_RESULT (t);
> > +  if (CLASS_TYPE_P (TREE_TYPE (t))
> > + && CLASSTYPE_TEMPLATE_SPECIALIZATION (TREE_TYPE (t)))
> > +   {
> > + gcc_checking_assert (t == most_general_template (t));
> > + /* When checking the constraints on a partial specialization,
> > +do so in the context of the specialized type, not the template.
> > +This substitution should always succeed since we shouldn't
> > +be checking constraints thereof unless the specialized type
> > +is valid.  */
> > + ascope = tsubst (ascope, args, tf_none, info.in_decl);
> > +   }
> > push_to_top_level ();
> > -  push_access_scope (pattern);
> > +  push_access_scope (ascope);
> > result = satisfy_normalized_constraints (norm, args, info);
> > -  pop_access_scope (pattern);
> > +  pop_access_scope (ascope);
> > pop_from_top_level ();
> > pop_tinst_level ();
> >   }
> > diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec12.C
> > b/gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec12.C
> > new file mode 100644
> > index 000..641d456722d
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-partial-spec12.C
> > @@ -0,0 +1,19 @@
> > +// PR c++/105220
> > +// { dg-do compile { target c++20 } }
> > +
> > +template<class T>
> > +concept fooable = requires(T t) { t.foo(); };
> > +
> > +template
> > +struct A;// #1, incomplete
> > +
> > +template
> > +struct A {

Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Qing Zhao


> On Nov 3, 2023, at 10:46 AM, Jakub Jelinek  wrote:
> 
> On Fri, Nov 03, 2023 at 02:32:04PM +, Qing Zhao wrote:
>>> Why?  It doesn't really matter.  The options are
>>> A. p is at &s.b[2] associated with &s.a and int type (or size of int
>>>  or whatever); .ACCESS_WITH_SIZE can't be pure,
>> 
>> .ACCESS_WITH_SIZE will only load the size from its address, no any write to 
>> memory.
>> It still can be PURE, right? (It will not be CONST anymore).
> 
> No, it can't be pure.  Because for the IL purposes, it needs to be treated
> as if it saves that address of the counter into some unnamed global variable
> somewhere.

Okay. I see.

>> 
>>> but sure, for aliasing
>>>  POV we can describe it with more detail that it doesn't modify anything
>>>  in the pointed structure, just escapes the pointer;
>> 
>> If we need to do this, where in the gcc code we need to add these details?
> 
> I think ref_maybe_used_by_call_p_1/call_may_clobber_ref_p_1, but Richi is
> expert here.

Just checked these routines; it looks like some other non-pure internal
functions are handled there too.
For example, 
  case IFN_UBSAN_BOUNDS:
  case IFN_UBSAN_VPTR:
  case IFN_UBSAN_OBJECT_SIZE:
  case IFN_UBSAN_PTR:
  case IFN_ASAN_CHECK:

Looks like the correct place to adjust the new .ACCESS_WITH_SIZE. 
> 
>>> __bdos can stay
>>>  leaf I believe;
>> 
>> That’s good!  (I thought now _bdos will call .ACCESS_WITH_SIZE?)
> 
> No, it shouldn't call it obviously.  If tree-object-size.cc discovery tracks
> something to a pointer initialized by .ACCESS_WITH_SIZE call, then it should
> I believe recurse on the first argument of that call (say if one has
>  ptr_3 = malloc (sz_1);
>  ptr_2 = .ACCESS_WITH_SIZE (ptr_3, &ptr_3[4], ...);
> then supposedly __bdos later on should e.g. for 0/1 modes take the minimum of
> ptr_3 (the size actually allocated) and the counter.

Yes, this is the situation in my mind too. 
I thought this might eliminate the LEAF feature from __bdos. :-)  If not, that's
good.

Qing
> 
>   Jakub
> 



[PATCH v2 0/7] aarch64 GCS preliminary patches

2023-11-03 Thread Szabolcs Nagy
I'm working on Guarded Control Stack support for aarch64 and have a
set of patches that are needed for GCS but seem useful without it, so it
makes sense to review them separately from the rest of the GCS work.

previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628123.html

Szabolcs Nagy (7):
  aarch64: Use br instead of ret for eh_return
  aarch64: Do not force a stack frame for EH returns
  aarch64: Add eh_return compile tests
  aarch64: Disable branch-protection for pcs tests
  aarch64,arm: Remove accepted_branch_protection_string
  aarch64,arm: Fix branch-protection= parsing
  aarch64,arm: Move branch-protection data to targets

 gcc/config/aarch64/aarch64-opts.h |   6 +-
 gcc/config/aarch64/aarch64-protos.h   |   1 -
 gcc/config/aarch64/aarch64.cc | 193 +++
 gcc/config/aarch64/aarch64.h  |   9 +-
 gcc/config/arm/aarch-common-protos.h  |   5 +-
 gcc/config/arm/aarch-common.cc| 229 +-
 gcc/config/arm/aarch-common.h |  25 +-
 gcc/config/arm/arm-c.cc   |   2 -
 gcc/config/arm/arm.cc |  57 -
 gcc/config/arm/arm.opt|   3 -
 gcc/df-scan.cc|  10 +
 gcc/doc/tm.texi   |  12 +
 gcc/doc/tm.texi.in|  12 +
 gcc/except.cc |  20 ++
 .../gcc.target/aarch64/aapcs64/func-ret-1.c   |   1 +
 .../gcc.target/aarch64/aapcs64/func-ret-2.c   |   1 +
 .../gcc.target/aarch64/aapcs64/func-ret-3.c   |   1 +
 .../gcc.target/aarch64/aapcs64/func-ret-4.c   |   1 +
 .../aarch64/aapcs64/func-ret-64x1_1.c |   1 +
 .../aarch64/branch-protection-attr.c  |   6 +-
 .../aarch64/branch-protection-option.c|   2 +-
 .../gcc.target/aarch64/eh_return-2.c  |   9 +
 .../gcc.target/aarch64/eh_return-3.c  |  30 +++
 .../aarch64/return_address_sign_1.c   |  13 +-
 .../aarch64/return_address_sign_2.c   |  17 +-
 .../aarch64/return_address_sign_b_1.c |  11 -
 .../aarch64/return_address_sign_b_2.c |  17 +-
 27 files changed, 356 insertions(+), 338 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/eh_return-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/eh_return-3.c

-- 
2.25.1



[PATCH v2 1/7] aarch64: Use br instead of ret for eh_return

2023-11-03 Thread Szabolcs Nagy
The expected way to handle eh_return is to pass the stack adjustment
offset and landing pad address via

  EH_RETURN_STACKADJ_RTX
  EH_RETURN_HANDLER_RTX

to the epilogue that is shared between normal return paths and the
eh_return paths.  EH_RETURN_HANDLER_RTX is the stack slot of the
return address that is overwritten with the landing pad in the
eh_return case and EH_RETURN_STACKADJ_RTX is a register added to sp
right before return and it is set to 0 in the normal return case.

The issue with this design is that eh_return and normal return may
require different return sequences but there is no way to distinguish
the two cases in the epilogue (the stack adjustment may be 0 in the
eh_return case too).

The reason eh_return and normal return require different return
sequences is that control flow integrity hardening may need to treat
eh_return as a forward-edge transfer (it is not returning to the
previous stack frame) and normal return as a backward-edge one.
In case of AArch64 forward-edge is protected by BTI and requires br
instruction and backward-edge is protected by PAUTH or GCS and
requires ret (or authenticated ret) instruction.

This patch resolves the issue by introducing EH_RETURN_TAKEN_RTX that
is a flag set to 1 in the eh_return path and 0 in normal return paths.
Branching on the EH_RETURN_TAKEN_RTX flag, the right return sequence
can be used in the epilogue.

The handler could be passed the old way via clobbering the return
address, but since now the eh_return case can be distinguished, the
handler can be in a different register than x30 and no stack frame
is needed for eh_return.

This patch fixes a return to anywhere gadget in the unwinder with
existing standard branch protection as well as makes EH return
compatible with the Guarded Control Stack (GCS) extension.

Some tests are adjusted because eh_return no longer prevents pac-ret
in the normal return path.

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_eh_return_handler_rtx):
Remove.
* config/aarch64/aarch64.cc (aarch64_return_address_signing_enabled):
Sign return address even in functions with eh_return.
(aarch64_expand_epilogue): Conditionally return with br or ret.
(aarch64_eh_return_handler_rtx): Remove.
* config/aarch64/aarch64.h (EH_RETURN_TAKEN_RTX): Define.
(EH_RETURN_STACKADJ_RTX): Change to R5.
(EH_RETURN_HANDLER_RTX): Change to R6.
* df-scan.cc: Handle EH_RETURN_TAKEN_RTX.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Document EH_RETURN_TAKEN_RTX.
* except.cc (expand_eh_return): Handle EH_RETURN_TAKEN_RTX.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/return_address_sign_1.c: Move func4 to ...
* gcc.target/aarch64/return_address_sign_2.c: ... here and fix the
scan asm check.
* gcc.target/aarch64/return_address_sign_b_1.c: Move func4 to ...
* gcc.target/aarch64/return_address_sign_b_2.c: ... here and fix the
scan asm check.

---
v2:
- Introduce EH_RETURN_TAKEN_RTX instead of abusing EH_RETURN_STACKADJ_RTX.
- Merge test fixes.
---
 gcc/config/aarch64/aarch64-protos.h   |  1 -
 gcc/config/aarch64/aarch64.cc | 88 ++-
 gcc/config/aarch64/aarch64.h  |  9 +-
 gcc/df-scan.cc| 10 +++
 gcc/doc/tm.texi   | 12 +++
 gcc/doc/tm.texi.in| 12 +++
 gcc/except.cc | 20 +
 .../aarch64/return_address_sign_1.c   | 13 +--
 .../aarch64/return_address_sign_2.c   | 17 +++-
 .../aarch64/return_address_sign_b_1.c | 11 ---
 .../aarch64/return_address_sign_b_2.c | 17 +++-
 11 files changed, 116 insertions(+), 94 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 60a55f4bc19..80296024f04 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -859,7 +859,6 @@ machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
   machine_mode);
 int aarch64_uxt_size (int, HOST_WIDE_INT);
 int aarch64_vec_fpconst_pow_of_2 (rtx);
-rtx aarch64_eh_return_handler_rtx (void);
 rtx aarch64_mask_from_zextract_ops (rtx, rtx);
 const char *aarch64_output_move_struct (rtx *operands);
 rtx aarch64_return_addr_rtx (void);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a28b66acf6a..5cdb33dd3dc 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -9113,17 +9113,6 @@ aarch64_return_address_signing_enabled (void)
   /* This function should only be called after frame laid out.   */
   gcc_assert (cfun->machine->frame.laid_out);
 
-  /* Turn return address signing off in any function that uses
- __builtin_eh_return.  The address passed to __builtin_eh_return
- is not signed so either it has t

[PATCH v2 5/7] aarch64,arm: Remove accepted_branch_protection_string

2023-11-03 Thread Szabolcs Nagy
On aarch64 this caused an ICE with pragma push_options since

  commit ae54c1b09963779c5c3914782324ff48af32e2f1
  Author: Wilco Dijkstra 
  CommitDate: 2022-06-01 18:13:57 +0100

  AArch64: Cleanup option processing code

The failure is at pop_options:

internal compiler error: ‘global_options’ are modified in local context

On arm the variable was unused.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_override_options_after_change_1):
Do not override branch_protection options.
(aarch64_override_options): Remove accepted_branch_protection_string.
* config/arm/aarch-common.cc (BRANCH_PROTECT_STR_MAX): Remove.
(aarch_parse_branch_protection): Remove
accepted_branch_protection_string.
* config/arm/arm.cc: Likewise.
---
unchanged from v1
---
 gcc/config/aarch64/aarch64.cc  | 10 +-
 gcc/config/arm/aarch-common.cc | 16 
 gcc/config/arm/arm.cc  |  2 --
 3 files changed, 1 insertion(+), 27 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 88594bed8ce..f8e8fefc8d8 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -323,8 +323,6 @@ bool aarch64_pcrelative_literal_loads;
 /* Global flag for whether frame pointer is enabled.  */
 bool aarch64_use_frame_pointer;
 
-char *accepted_branch_protection_string = NULL;
-
 /* Support for command line parsing of boolean flags in the tuning
structures.  */
 struct aarch64_flag_desc
@@ -18101,12 +18099,6 @@ aarch64_adjust_generic_arch_tuning (struct tune_params &current_tune)
 static void
 aarch64_override_options_after_change_1 (struct gcc_options *opts)
 {
-  if (accepted_branch_protection_string)
-{
-  opts->x_aarch64_branch_protection_string
-   = xstrdup (accepted_branch_protection_string);
-}
-
   /* PR 70044: We have to be careful about being called multiple times for the
  same function.  This means all changes should be repeatable.  */
 
@@ -18715,7 +18707,7 @@ aarch64_override_options (void)
   /* Return address signing is currently not supported for ILP32 targets.  For
  LP64 targets use the configured option in the absence of a command-line
  option for -mbranch-protection.  */
-  if (!TARGET_ILP32 && accepted_branch_protection_string == NULL)
+  if (!TARGET_ILP32 && aarch64_branch_protection_string == NULL)
 {
 #ifdef TARGET_ENABLE_PAC_RET
   aarch_ra_sign_scope = AARCH_FUNCTION_NON_LEAF;
diff --git a/gcc/config/arm/aarch-common.cc b/gcc/config/arm/aarch-common.cc
index 5b96ff4c2e8..cbc7f68a8bf 100644
--- a/gcc/config/arm/aarch-common.cc
+++ b/gcc/config/arm/aarch-common.cc
@@ -659,9 +659,6 @@ arm_md_asm_adjust (vec &outputs, vec & /*inputs*/,
   return saw_asm_flag ? seq : NULL;
 }
 
-#define BRANCH_PROTECT_STR_MAX 255
-extern char *accepted_branch_protection_string;
-
 static enum aarch_parse_opt_result
 aarch_handle_no_branch_protection (char* str, char* rest)
 {
@@ -812,19 +809,6 @@ aarch_parse_branch_protection (const char *const_str, char** last_str)
   else
*last_str = NULL;
 }
-
-  if (res == AARCH_PARSE_OK)
-{
-  /* If needed, alloc the accepted string then copy in const_str.
-   Used by override_option_after_change_1.  */
-  if (!accepted_branch_protection_string)
-   accepted_branch_protection_string
- = (char *) xmalloc (BRANCH_PROTECT_STR_MAX + 1);
-  strncpy (accepted_branch_protection_string, const_str,
-  BRANCH_PROTECT_STR_MAX + 1);
-  /* Forcibly null-terminate.  */
-  accepted_branch_protection_string[BRANCH_PROTECT_STR_MAX] = '\0';
-}
   return res;
 }
 
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 6e933c80183..f49312cace0 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -2424,8 +2424,6 @@ const struct tune_params arm_fa726te_tune =
   tune_params::SCHED_AUTOPREF_OFF
 };
 
-char *accepted_branch_protection_string = NULL;
-
 /* Auto-generated CPU, FPU and architecture tables.  */
 #include "arm-cpu-data.h"
 
-- 
2.25.1



[PATCH v2 3/7] aarch64: Add eh_return compile tests

2023-11-03 Thread Szabolcs Nagy
gcc/testsuite/ChangeLog:

* gcc.target/aarch64/eh_return-2.c: New test.
* gcc.target/aarch64/eh_return-3.c: New test.

---
v2: check-function-bodies in eh_return-3.c
(this is not very robust, but easier to read)
---
 .../gcc.target/aarch64/eh_return-2.c  |  9 ++
 .../gcc.target/aarch64/eh_return-3.c  | 30 +++
 2 files changed, 39 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/eh_return-2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/eh_return-3.c

diff --git a/gcc/testsuite/gcc.target/aarch64/eh_return-2.c 
b/gcc/testsuite/gcc.target/aarch64/eh_return-2.c
new file mode 100644
index 000..4a9d124e891
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/eh_return-2.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-final { scan-assembler "add\tsp, sp, x5" } } */
+/* { dg-final { scan-assembler "br\tx6" } } */
+
+void
+foo (unsigned long off, void *handler)
+{
+  __builtin_eh_return (off, handler);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/eh_return-3.c 
b/gcc/testsuite/gcc.target/aarch64/eh_return-3.c
new file mode 100644
index 000..bfbe92af427
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/eh_return-3.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mbranch-protection=pac-ret+leaf" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+**foo:
+** hint25 // paciasp
+** stp x0, x1, .*
+** stp x2, x3, .*
+** cbz w2, .*
+** mov x4, 0
+** ldp x2, x3, .*
+** ldp x0, x1, .*
+** cbz x4, .*
+** add sp, sp, x5
+** br  x6
+** hint29 // autiasp
+** ret
+** mov x5, x0
+** mov x6, x1
+** mov x4, 1
+** b   .*
+*/
+void
+foo (unsigned long off, void *handler, int c)
+{
+  if (c)
+return;
+  __builtin_eh_return (off, handler);
+}
-- 
2.25.1



[PATCH v2 2/7] aarch64: Do not force a stack frame for EH returns

2023-11-03 Thread Szabolcs Nagy
EH returns no longer rely on clobbering the return address on the stack
so forcing a stack frame is not necessary.

This does not actually change the code gen for the unwinder since there
are calls before the EH return.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_needs_frame_chain): Do not
force frame chain for eh_return.
---
unchanged compared to v1
already approved at:
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629346.html
---
 gcc/config/aarch64/aarch64.cc | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 5cdb33dd3dc..88594bed8ce 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8492,8 +8492,7 @@ aarch64_output_probe_sve_stack_clash (rtx base, rtx adjustment,
 static bool
 aarch64_needs_frame_chain (void)
 {
-  /* Force a frame chain for EH returns so the return address is at FP+8.  */
-  if (frame_pointer_needed || crtl->calls_eh_return)
+  if (frame_pointer_needed)
 return true;
 
   /* A leaf function cannot have calls or write LR.  */
-- 
2.25.1



[PATCH v2 6/7] aarch64,arm: Fix branch-protection= parsing

2023-11-03 Thread Szabolcs Nagy
Refactor the parsing to have a single API and fix a few parsing issues:

- Different handling of "bti+none" and "none+bti": these should be
  rejected because "none" can only appear alone.

- Accepted empty strings such as "bti++pac-ret" or "bti+"; this bug
  was caused by using strtok_r.

- Memory got leaked (str_root was never freed). And two buffers got
  allocated when one is enough.

The callbacks now have no failure mode, only parsing can fail and
all failures are handled locally.  The "-mbranch-protection=" vs
"target("branch-protection=")" difference in the error message is
handled by a separate argument to aarch_validate_mbranch_protection.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_override_options): Update.
(aarch64_handle_attr_branch_protection): Update.
* config/arm/aarch-common-protos.h (aarch_parse_branch_protection):
Remove.
(aarch_validate_mbranch_protection): Add new argument.
* config/arm/aarch-common.cc (aarch_handle_no_branch_protection):
Update.
(aarch_handle_standard_branch_protection): Update.
(aarch_handle_pac_ret_protection): Update.
(aarch_handle_pac_ret_leaf): Update.
(aarch_handle_pac_ret_b_key): Update.
(aarch_handle_bti_protection): Update.
(aarch_parse_branch_protection): Remove.
(next_tok): New.
(aarch_validate_mbranch_protection): Rewrite.
* config/arm/aarch-common.h (struct aarch_branch_protect_type):
Add field "alone".
* config/arm/arm.cc (arm_configure_build_target): Update.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/branch-protection-attr.c: Update.
* gcc.target/aarch64/branch-protection-option.c: Update.
---
v2: merge tests updates into the patch
error message is not changed, see previous discussion:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633945.html
---
 gcc/config/aarch64/aarch64.cc |  37 +--
 gcc/config/arm/aarch-common-protos.h  |   5 +-
 gcc/config/arm/aarch-common.cc| 214 --
 gcc/config/arm/aarch-common.h |  14 +-
 gcc/config/arm/arm.cc |   3 +-
 .../aarch64/branch-protection-attr.c  |   6 +-
 .../aarch64/branch-protection-option.c|   2 +-
 7 files changed, 113 insertions(+), 168 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f8e8fefc8d8..4f7f707b675 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18642,7 +18642,8 @@ aarch64_override_options (void)
 aarch64_validate_sls_mitigation (aarch64_harden_sls_string);
 
   if (aarch64_branch_protection_string)
-aarch_validate_mbranch_protection (aarch64_branch_protection_string);
+aarch_validate_mbranch_protection (aarch64_branch_protection_string,
+  "-mbranch-protection=");
 
   /* -mcpu=CPU is shorthand for -march=ARCH_FOR_CPU, -mtune=CPU.
  If either of -march or -mtune is given, they override their
@@ -19016,34 +19017,12 @@ aarch64_handle_attr_cpu (const char *str)
 
 /* Handle the argument STR to the branch-protection= attribute.  */
 
- static bool
- aarch64_handle_attr_branch_protection (const char* str)
- {
-  char *err_str = (char *) xmalloc (strlen (str) + 1);
-  enum aarch_parse_opt_result res = aarch_parse_branch_protection (str,
-  &err_str);
-  bool success = false;
-  switch (res)
-{
- case AARCH_PARSE_MISSING_ARG:
-   error ("missing argument to %<target(\"branch-protection=\")%> pragma or"
- " attribute");
-   break;
- case AARCH_PARSE_INVALID_ARG:
-   error ("invalid protection type %qs in %<target(\"branch-protection=\")%> pragma or attribute", err_str);
-   break;
- case AARCH_PARSE_OK:
-   success = true;
-  /* Fall through.  */
- case AARCH_PARSE_INVALID_FEATURE:
-   break;
- default:
-   gcc_unreachable ();
-}
-  free (err_str);
-  return success;
- }
+static bool
+aarch64_handle_attr_branch_protection (const char* str)
+{
+  return aarch_validate_mbranch_protection (str,
+   "target(\"branch-protection=\")");
+}
 
 /* Handle the argument STR to the tune= target attribute.  */
 
diff --git a/gcc/config/arm/aarch-common-protos.h 
b/gcc/config/arm/aarch-common-protos.h
index f8cb6562096..75ffdfbb050 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -159,10 +159,7 @@ rtx_insn *arm_md_asm_adjust (vec &outputs, vec & /*inputs*/,
 vec &clobbers, HARD_REG_SET &clobbered_regs,
 location_t loc);
 
-/* Parsing routine for branch-protection common to AArch64 and Arm.  */
-enum aarch_parse_opt_result aarch_parse_branch_protection (const char*, char**);
-
 /* Validation routine for branch-protection common to AArch64 and Arm.  */
-bool aarch_validate_mbranch_protection

[PATCH v2 4/7] aarch64: Disable branch-protection for pcs tests

2023-11-03 Thread Szabolcs Nagy
The tests manipulate the return address in abitest-2.h and thus are not
compatible with -mbranch-protection=pac-ret+leaf or
-mbranch-protection=gcs.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/aapcs64/func-ret-1.c: Disable branch-protection.
* gcc.target/aarch64/aapcs64/func-ret-2.c: Likewise.
* gcc.target/aarch64/aapcs64/func-ret-3.c: Likewise.
* gcc.target/aarch64/aapcs64/func-ret-4.c: Likewise.
* gcc.target/aarch64/aapcs64/func-ret-64x1_1.c: Likewise.
---
unchanged compared to v1
already approved at
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629353.html
---
 gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-1.c  | 1 +
 gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-2.c  | 1 +
 gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.c  | 1 +
 gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.c  | 1 +
 gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-64x1_1.c | 1 +
 5 files changed, 5 insertions(+)

diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-1.c 
b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-1.c
index 5405e1e4920..7bd7757efe6 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-1.c
@@ -4,6 +4,7 @@
AAPCS64 \S 4.1.  */
 
 /* { dg-do run { target aarch64*-*-* } } */
+/* { dg-additional-options "-mbranch-protection=none" } */
 /* { dg-additional-sources "abitest.S" } */
 
 #ifndef IN_FRAMEWORK
diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-2.c 
b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-2.c
index 6b171c46fbb..85a822ace4a 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-2.c
@@ -4,6 +4,7 @@
Homogeneous floating-point aggregate types are covered in func-ret-3.c.  */
 
 /* { dg-do run { target aarch64*-*-* } } */
+/* { dg-additional-options "-mbranch-protection=none" } */
 /* { dg-additional-sources "abitest.S" } */
 
 #ifndef IN_FRAMEWORK
diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.c 
b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.c
index ad312b675b9..1d35ebf14b4 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.c
@@ -4,6 +4,7 @@
in AAPCS64 \S 4.3.5.  */
 
 /* { dg-do run { target aarch64-*-* } } */
+/* { dg-additional-options "-mbranch-protection=none" } */
 /* { dg-additional-sources "abitest.S" } */
 /* { dg-require-effective-target aarch64_big_endian } */
 
diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.c 
b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.c
index af05fbe9fdf..15e1408c62d 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.c
@@ -5,6 +5,7 @@
are treated as general composite types.  */
 
 /* { dg-do run { target aarch64*-*-* } } */
+/* { dg-additional-options "-mbranch-protection=none" } */
 /* { dg-additional-sources "abitest.S" } */
 /* { dg-require-effective-target aarch64_big_endian } */
 
diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-64x1_1.c 
b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-64x1_1.c
index 05957e2dcae..fe7bbb6a835 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-64x1_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-64x1_1.c
@@ -3,6 +3,7 @@
   Test 64-bit singleton vector types which should be in FP/SIMD registers.  */
 
 /* { dg-do run { target aarch64*-*-* } } */
+/* { dg-additional-options "-mbranch-protection=none" } */
 /* { dg-additional-sources "abitest.S" } */
 
 #ifndef IN_FRAMEWORK
-- 
2.25.1



Re: Remove redundant partial specialization in _Nth_type

2023-11-03 Thread Patrick Palka
On Sat, 28 Oct 2023, Feng Jisen wrote:

> This patch removes a redundant partial specialization in class _Nth_type.
> For the original metafunction _Nth_type code,
>   # 0
>   template<typename _Tp0, typename... _Rest>
>     struct _Nth_type<0, _Tp0, _Rest...>
>     { using type = _Tp0; };
>   # 1
>   template<typename _Tp0, typename _Tp1, typename... _Rest>
>     struct _Nth_type<0, _Tp0, _Tp1, _Rest...>
>     { using type = _Tp0; };
>   # 2
>   template<typename _Tp0, typename _Tp1, typename _Tp2, typename... _Rest>
>     struct _Nth_type<0, _Tp0, _Tp1, _Tp2, _Rest...>
>     { using type = _Tp0; };
>   # 3
>   template<size_t _Np, typename _Tp0, typename _Tp1, typename _Tp2,
>            typename... _Rest>
> #if __cpp_concepts
>     requires (_Np >= 3)
> #endif
>     struct _Nth_type<_Np, _Tp0, _Tp1, _Tp2, _Rest...>
>     : _Nth_type<_Np - 3, _Rest...>
>     { };
> 
> we need partial specialization # 2 to deal with template argument <0, Tp0, 
> Tp1, Tp2, ...>.
> Because without concepts, both # 0 and # 3 are legal and there is no partial 
> ordering relationship between them. 
> However, # 1 is redundant. For template argument <0, Tp0, Tp1>, #0 is 
> instantiated and that's enough.

Thanks for the patch!  This looks good to me.

> 
> libstdc++-v3/ChangeLog:
> 
>   * include/bits/utility.h (_Nth_type): Remove redundant partial 
> specialization.
> 
> 
> diff --git a/libstdc++-v3/include/bits/utility.h 
> b/libstdc++-v3/include/bits/utility.h
> index bed94525642..8766dfbc15f 100644
> --- a/libstdc++-v3/include/bits/utility.h
> +++ b/libstdc++-v3/include/bits/utility.h
> @@ -258,10 +258,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>      { };
>  
>  #if ! __cpp_concepts // Need additional specializations to avoid ambiguities.
> -  template<typename _Tp0, typename _Tp1, typename... _Rest>
> -    struct _Nth_type<0, _Tp0, _Tp1, _Rest...>
> -    { using type = _Tp0; };
> -
>    template<typename _Tp0, typename _Tp1, typename _Tp2, typename... _Rest>
>      struct _Nth_type<0, _Tp0, _Tp1, _Tp2, _Rest...>
>      { using type = _Tp0; };
> -- 
> 
> 
> 

[PATCH v2 7/7] aarch64,arm: Move branch-protection data to targets

2023-11-03 Thread Szabolcs Nagy
The branch-protection types are target specific, not the same on arm
and aarch64.  This currently affects pac-ret+b-key, but there will be
a new type on aarch64 that is not relevant for arm.

gcc/ChangeLog:

* config/aarch64/aarch64-opts.h (enum aarch64_key_type): Rename to ...
(enum aarch_key_type): ... this.
* config/aarch64/aarch64.cc (aarch_handle_no_branch_protection): Copy.
(aarch_handle_standard_branch_protection): Copy.
(aarch_handle_pac_ret_protection): Copy.
(aarch_handle_pac_ret_leaf): Copy.
(aarch_handle_pac_ret_b_key): Copy.
(aarch_handle_bti_protection): Copy.
* config/arm/aarch-common.cc (aarch_handle_no_branch_protection):
Remove.
(aarch_handle_standard_branch_protection): Remove.
(aarch_handle_pac_ret_protection): Remove.
(aarch_handle_pac_ret_leaf): Remove.
(aarch_handle_pac_ret_b_key): Remove.
(aarch_handle_bti_protection): Remove.
* config/arm/aarch-common.h (enum aarch_key_type): Remove.
(struct aarch_branch_protect_type): Declare.
* config/arm/arm-c.cc (arm_cpu_builtins): Remove aarch_ra_sign_key.
* config/arm/arm.cc (aarch_handle_no_branch_protection): Copy.
(aarch_handle_standard_branch_protection): Copy.
(aarch_handle_pac_ret_protection): Copy.
(aarch_handle_pac_ret_leaf): Copy.
(aarch_handle_bti_protection): Copy.
(arm_configure_build_target): Copy.
* config/arm/arm.opt: Remove aarch_ra_sign_key.
---
unchanged compared to v1.
---
 gcc/config/aarch64/aarch64-opts.h |  6 ++--
 gcc/config/aarch64/aarch64.cc | 55 +++
 gcc/config/arm/aarch-common.cc| 55 ---
 gcc/config/arm/aarch-common.h | 11 +++
 gcc/config/arm/arm-c.cc   |  2 --
 gcc/config/arm/arm.cc | 52 +
 gcc/config/arm/arm.opt|  3 --
 7 files changed, 109 insertions(+), 75 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-opts.h 
b/gcc/config/aarch64/aarch64-opts.h
index 831e28ab52a..1abae1442b5 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -103,9 +103,9 @@ enum stack_protector_guard {
 };
 
 /* The key type that -msign-return-address should use.  */
-enum aarch64_key_type {
-  AARCH64_KEY_A,
-  AARCH64_KEY_B
+enum aarch_key_type {
+  AARCH_KEY_A,
+  AARCH_KEY_B
 };
 
 /* An enum specifying how to handle load and store pairs using
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 4f7f707b675..9739223831f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -18620,6 +18620,61 @@ aarch64_set_asm_isa_flags (aarch64_feature_flags flags)
   aarch64_set_asm_isa_flags (&global_options, flags);
 }
 
+static void
+aarch_handle_no_branch_protection (void)
+{
+  aarch_ra_sign_scope = AARCH_FUNCTION_NONE;
+  aarch_enable_bti = 0;
+}
+
+static void
+aarch_handle_standard_branch_protection (void)
+{
+  aarch_ra_sign_scope = AARCH_FUNCTION_NON_LEAF;
+  aarch_ra_sign_key = AARCH_KEY_A;
+  aarch_enable_bti = 1;
+}
+
+static void
+aarch_handle_pac_ret_protection (void)
+{
+  aarch_ra_sign_scope = AARCH_FUNCTION_NON_LEAF;
+  aarch_ra_sign_key = AARCH_KEY_A;
+}
+
+static void
+aarch_handle_pac_ret_leaf (void)
+{
+  aarch_ra_sign_scope = AARCH_FUNCTION_ALL;
+}
+
+static void
+aarch_handle_pac_ret_b_key (void)
+{
+  aarch_ra_sign_key = AARCH_KEY_B;
+}
+
+static void
+aarch_handle_bti_protection (void)
+{
+  aarch_enable_bti = 1;
+}
+
+static const struct aarch_branch_protect_type aarch_pac_ret_subtypes[] = {
+  { "leaf", false, aarch_handle_pac_ret_leaf, NULL, 0 },
+  { "b-key", false, aarch_handle_pac_ret_b_key, NULL, 0 },
+  { NULL, false, NULL, NULL, 0 }
+};
+
+const struct aarch_branch_protect_type aarch_branch_protect_types[] = {
+  { "none", true, aarch_handle_no_branch_protection, NULL, 0 },
+  { "standard", true, aarch_handle_standard_branch_protection, NULL, 0 },
+  { "pac-ret", false, aarch_handle_pac_ret_protection, aarch_pac_ret_subtypes,
+ARRAY_SIZE (aarch_pac_ret_subtypes) },
+  { "bti", false, aarch_handle_bti_protection, NULL, 0 },
+  { NULL, false, NULL, NULL, 0 }
+};
+
 /* Implement TARGET_OPTION_OVERRIDE.  This is called once in the beginning
and is used to parse the -m{cpu,tune,arch} strings and setup the initial
tuning structs.  In particular it must set selected_tune and
diff --git a/gcc/config/arm/aarch-common.cc b/gcc/config/arm/aarch-common.cc
index 159c61b786c..92e1248f83f 100644
--- a/gcc/config/arm/aarch-common.cc
+++ b/gcc/config/arm/aarch-common.cc
@@ -659,61 +659,6 @@ arm_md_asm_adjust (vec &outputs, vec & 
/*inputs*/,
   return saw_asm_flag ? seq : NULL;
 }
 
-static void
-aarch_handle_no_branch_protection (void)
-{
-  aarch_ra_sign_scope = AARCH_FUNCTION_NONE;
-  aarch_enable_bti = 0;
-}
-
-static void
-aarch_handle_standard_branch_protection (void)
-{
-  aarch_ra_sign_scope

[COMMITTED]: i386: Handle multiple address register classes

2023-11-03 Thread Uros Bizjak
The patch generalizes address register class handling to allow multiple
register classes.  For APX EGPR targets, some instructions do not support
GPR32 registers, so it is necessary to limit address register set to
avoid them.  The same situation happens for instructions with high registers,
where REX registers can not be used in the address, so the existing
infrastructure can be adapted to also handle this case.

The patch is mostly a mechanical rename of "gpr32" attribute to "addr" and
introduces no functional changes, although it fixes a couple of inconsistent
attribute values in passing.

A follow-up patch will use the above infrastructure to limit address register
class to legacy registers for instructions with high registers.

gcc/ChangeLog:

* config/i386/i386.cc (ix86_memory_address_use_extended_reg_class_p):
Rename to ...
(ix86_memory_address_reg_class): ... this.  Generalize address
register class handling to allow multiple address register classes.
Return maximal class for unrecognized instructions.  Improve comments.
(ix86_insn_base_reg_class): Rewrite to handle
multiple address register classes.
(ix86_regno_ok_for_insn_base_p): Ditto.
(ix86_insn_index_reg_class): Ditto.
* config/i386/i386.md: Rename "gpr32" attribute to "addr"
and substitute its values with "0" -> "gpr16", "1" -> "*".
(addr): New attribute to limit allowed address register set.
(gpr32): Remove.
* config/i386/mmx.md: Rename "gpr32" attribute to "addr"
and substitute its values with "0" -> "gpr16", "1" -> "*".
* config/i386/sse.md: Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 0f17f7d0258..fdc9362cf5b 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -11357,93 +11357,116 @@ ix86_validate_address_register (rtx op)
   return NULL_RTX;
 }
 
-/* Return true if insn memory address can use any available reg
-   in BASE_REG_CLASS or INDEX_REG_CLASS, otherwise false.
-   For APX, some instruction can't be encoded with gpr32
-   which is BASE_REG_CLASS or INDEX_REG_CLASS, for that case
-   returns false.  */
-static bool
-ix86_memory_address_use_extended_reg_class_p (rtx_insn* insn)
+/* Determine which memory address register set insn can use.  */
+
+static enum attr_addr
+ix86_memory_address_reg_class (rtx_insn* insn)
 {
-  /* LRA will do some initialization with insn == NULL,
- return the maximum reg class for that.
- For other cases, real insn will be passed and checked.  */
-  bool ret = true;
+  /* LRA can do some initialization with NULL insn,
+ return maximum register class in this case.  */
+  enum attr_addr addr_rclass = ADDR_GPR32;
+
   if (TARGET_APX_EGPR && insn)
 {
   if (asm_noperands (PATTERN (insn)) >= 0
  || GET_CODE (PATTERN (insn)) == ASM_INPUT)
-   return ix86_apx_inline_asm_use_gpr32;
+   return ix86_apx_inline_asm_use_gpr32 ? ADDR_GPR32 : ADDR_GPR16;
 
+  /* Return maximum register class for unrecognized instructions.  */
   if (INSN_CODE (insn) < 0)
-   return false;
+   return addr_rclass;
 
-  /* Try recog the insn before calling get_attr_gpr32. Save
-the current recog_data first.  */
-  /* Also save which_alternative for current recog.  */
+  /* Try to recognize the insn before calling get_attr_addr.
+Save current recog_data and current alternative.  */
+  struct recog_data_d saved_recog_data = recog_data;
+  int saved_alternative = which_alternative;
 
-  struct recog_data_d recog_data_save = recog_data;
-  int which_alternative_saved = which_alternative;
-
-  /* Update the recog_data for alternative check. */
+  /* Update recog_data for processing of alternatives.  */
   if (recog_data.insn != insn)
extract_insn_cached (insn);
 
-  /* If alternative is not set, loop throught each alternative
-of insn and get gpr32 attr for all enabled alternatives.
-If any enabled alternatives has 0 value for gpr32, disallow
-gpr32 for addressing.  */
-  if (which_alternative_saved == -1)
+  /* If current alternative is not set, loop through enabled
+alternatives and get the most limited register class.  */
+  if (saved_alternative == -1)
{
  alternative_mask enabled = get_enabled_alternatives (insn);
- bool curr_insn_gpr32 = false;
+
  for (int i = 0; i < recog_data.n_alternatives; i++)
{
  if (!TEST_BIT (enabled, i))
continue;
+
  which_alternative = i;
- curr_insn_gpr32 = get_attr_gpr32 (insn);
- if (!curr_insn_gpr32)
-   ret = false;
+ addr_rclass = MIN (addr_rclass, get_attr_addr (insn));
}
}
   else
{
- which_alternative = which_alternative_saved;
- ret = get_attr_gpr32 (insn);
+ which_altern

Re: [PATCH v4] c-family: Implement __has_feature and __has_extension [PR60512]

2023-11-03 Thread Marek Polacek
On Wed, Sep 27, 2023 at 03:27:30PM +0100, Alex Coplan wrote:
> Hi,
> 
> This is a v4 patch to address Jason's feedback here:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630911.html
> 
> w.r.t. v3 it just removes a comment now that some uncertainty around
> cxx_binary_literals has been resolved, and updates the documentation as
> suggested to point to the Clang docs.
> 
> --
> 
> This patch implements clang's __has_feature and __has_extension in GCC.
> Currently the patch aims to implement all documented features (and some
> undocumented ones) following the documentation at
> https://clang.llvm.org/docs/LanguageExtensions.html with the exception
> of the legacy features for C++ type traits.  These are omitted, since as
> the clang documentation notes, __has_builtin is the correct "modern" way
> to query for these (which GCC already implements).
> 
> Bootstrapped/regtested on aarch64-linux-gnu, bootstrapped on
> x86_64-apple-darwin, darwin regtest in progress.  OK for trunk if
> testing passes?

Thanks for the patch.  I only have a few minor comments.

> diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
> index aae57260097..1210953d33a 100644
> --- a/gcc/c-family/c-common.cc
> +++ b/gcc/c-family/c-common.cc
> @@ -311,6 +311,43 @@ const struct fname_var_t fname_vars[] =
>{NULL, 0, 0},
>  };
>  
> +/* Flags to restrict availability of generic features that
> +   are known to __has_{feature,extension}.  */
> +
> +enum
> +{
> +  HF_FLAG_EXT = 1,   /* Available only as an extension.  */
> +  HF_FLAG_SANITIZE = 2, /* Availability depends on sanitizer flags.  */
> +};

Why not have a new HF_FLAG_ = 0 here and use it below...

> +/* Info for generic features which can be queried through
> +   __has_{feature,extension}.  */
> +
> +struct hf_feature_info
> +{
> +  const char *ident;
> +  unsigned flags;
> +  unsigned mask;

Not enum sanitize_code for mask?

> +};
> +
> +/* Table of generic features which can be queried through
> +   __has_{feature,extension}.  */
> +
> +static const hf_feature_info has_feature_table[] =
> +{
> +  { "address_sanitizer", HF_FLAG_SANITIZE, SANITIZE_ADDRESS },
> +  { "thread_sanitizer",  HF_FLAG_SANITIZE, SANITIZE_THREAD },
> +  { "leak_sanitizer",HF_FLAG_SANITIZE, SANITIZE_LEAK },
> +  { "hwaddress_sanitizer",   HF_FLAG_SANITIZE, SANITIZE_HWADDRESS },
> +  { "undefined_behavior_sanitizer", HF_FLAG_SANITIZE, SANITIZE_UNDEFINED },
> +  { "attribute_deprecated_with_message",  0, 0 },
> +  { "attribute_unavailable_with_message", 0, 0 },
> +  { "enumerator_attributes",   0, 0 },
> +  { "tls", 0, 0 },

...here?  Might be more obvious what it means then.

> +  { "gnu_asm_goto_with_outputs",   HF_FLAG_EXT, 0 },
> +  { "gnu_asm_goto_with_outputs_full",  HF_FLAG_EXT, 0 }
> +};
> +
>  /* Global visibility options.  */
>  struct visibility_flags visibility_options;
>  
> @@ -9808,4 +9845,63 @@ c_strict_flex_array_level_of (tree array_field)
>return strict_flex_array_level;
>  }
>  
> +/* Map from identifiers to booleans.  Value is true for features, and
> +   false for extensions.  Used to implement __has_{feature,extension}.  */
> +
> +using feature_map_t = hash_map<tree, bool>;
> +static feature_map_t *feature_map = nullptr;

You don't need " = nullptr" here.

> +/* Register a feature for __has_{feature,extension}.  FEATURE_P is true
> +   if the feature identified by NAME is a feature (as opposed to an
> +   extension).  */
> +
> +void
> +c_common_register_feature (const char *name, bool feature_p)
> +{
> +  bool dup = feature_map->put (get_identifier (name), feature_p);
> +  gcc_checking_assert (!dup);
> +}
> +
> +/* Lazily initialize hash table for __has_{feature,extension},
> +   dispatching to the appropriate frontend to register language-specific

"front end"

> +   features.  */
> +
> +static void
> +init_has_feature ()
> +{
> +  gcc_assert (!feature_map);
> +  feature_map = new feature_map_t;
> +  gcc_assert (feature_map);

I don't see asserts like that around "new hash_map" elsewhere in the code base.
If you want them, maybe use gcc_checking_assert instead?

> +  for (unsigned i = 0; i < ARRAY_SIZE (has_feature_table); i++)
> +{
> +  const hf_feature_info *info = has_feature_table + i;
> +
> +  if ((info->flags & HF_FLAG_SANITIZE) && !(flag_sanitize & info->mask))
> + continue;
> +
> +  const bool feature_p = !(info->flags & HF_FLAG_EXT);
> +  c_common_register_feature (info->ident, feature_p);
> +}
> +
> +  /* Register language-specific features.  */
> +  c_family_register_lang_features ();
> +}
> +
> +/* If STRICT_P is true, evaluate __has_feature (IDENT).
> +   Otherwise, evaluate __has_extension (IDENT).  */
> +
> +bool
> +has_feature_p (const char *ident, bool strict_p)

strict_p sounds a little bit odd here, maybe use bool feature_p/ext_p?
Or not, it's fine either way.

> +{
> +  if (!feature_ma

Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Qing Zhao
So, based on the discussion so far, we will define the .ACCESS_WITH_SIZE as 
follows:

 .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, ACCESS_MODE)

INTERNAL_FN (ACCESS_WITH_SIZE,  ECF_LEAF | ECF_NOTHROW, NULL)

which returns “REF_TO_OBJ”, the same as the 1st argument;

1st argument “REF_TO_OBJ": Reference to the object;
2nd argument “REF_TO_SIZE”:  Reference to the size of the object referenced by 
the 1st argument; 
 if the object that “REF_TO_OBJ” refers to has a
   * real type, the SIZE that “REF_TO_SIZE” refers to is the number of 
elements of the type;
   * void type, the SIZE that “REF_TO_SIZE” refers to is the number of bytes; 
3rd argument "ACCESS_MODE": 
 -1: Unknown access semantics
  0: none
  1: read_only
  2: write_only
  3: read_write

NOTEs, 
 A. This new internal function is intended for more general use by all 
3 attributes, "access", "alloc_size", and the new "counted_by", to encode the 
"size" and "access_mode" information for the corresponding pointer (in order to 
resolve PR96503, etc. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96503)
 B. For the "counted_by" and "alloc_size" attributes, the 3rd argument will be -1.  
 
 C. In this writeup, we focus on the implementation details for the "counted_by" 
attribute. However, this function should be ready to be used by "access" and 
"alloc_size" without issue. 

Although .ACCESS_WITH_SIZE is not PURE anymore, it only reads from the 2nd 
argument and does not modify anything in the pointed-to objects. So, we can 
adjust the IPA alias analysis phase with these details 
(ref_maybe_used_by_call_p_1/call_may_clobber_ref_p_1).

One more note: only integer types are allowed for the SIZE, and in 
tree_object_size.cc, all the SIZEs
 (in attributes “access”, “alloc_size”, etc.) are converted to “sizetype”.  So, 
we don’t need to specify
the type of the size for “REF_TO_SIZE”, since it is always an integer type and 
always converted to “sizetype” internally. 
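As an illustrative sketch only (GIMPLE-like pseudocode, not actual gimplifier output), the thread's running example might be lowered along these lines under the three-argument form above:

```
  s.a = 5;
  _1 = &s.b;
  _2 = .ACCESS_WITH_SIZE (_1, &s.a, -1);     ;; returns _1, ties it to &s.a
  p  = &(*_2)[2];
  i1 = __builtin_dynamic_object_size (p, 0); ;; __bdos is pure: it keeps a
  s.a = 3;                                   ;; vuse and reloads *(&s.a) at
  i2 = __builtin_dynamic_object_size (p, 0); ;; expansion, so the store to
                                             ;; s.a above is visible to i2
```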

Let me know if you have any more comments or suggestions. 

Qing


On Nov 3, 2023, at 2:32 AM, Martin Uecker  wrote:
> 
> 
> Am Freitag, dem 03.11.2023 um 07:22 +0100 schrieb Jakub Jelinek:
>> On Fri, Nov 03, 2023 at 07:07:36AM +0100, Martin Uecker wrote:
>>> Am Donnerstag, dem 02.11.2023 um 17:28 -0700 schrieb Bill Wendling:
 On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao  wrote:
> 
> Thanks a lot for raising these issues.
> 
> If I understand correctly,  the major question we need to answer is:
> 
> For the following example: (Jakub mentioned this  in an early message)
> 
>  1 struct S { int a; char b __attribute__((counted_by (a))) []; };
>  2 struct S s;
>  3 s.a = 5;
>  4 char *p = &s.b[2];
>  5 int i1 = __builtin_dynamic_object_size (p, 0);
>  6 s.a = 3;
>  7 int i2 = __builtin_dynamic_object_size (p, 0);
> 
> Should the 2nd __bdos call (line 7) get
>A. the latest value of s.a (line 6) for it’s size?
> Or  B. the value when the s.b was referenced (line 3, line 4)?
> 
 I personally think it should be (A). The user is specifically
 indicating that the size has somehow changed, and the compiler should
 behave accordingly.
>>> 
>>> 
>>> One potential problem for A apart from the potential impact on
>>> optimization is that the information may get lost more
>>> easily. Consider:
>>> 
>>> char *p = &s.b[2];
>>> f(&s);
>>> int i = __bdos(p, 0);
>>> 
>>> If the compiler can not see into 'f', the information is lost
>>> because f may have changed the size.
>> 
>> Why?  It doesn't really matter.  The options are
>> A. p is at &s.b[2] associated with &s.a and int type (or size of int
>>   or whatever); .ACCESS_WITH_SIZE can't be pure, but sure, for aliasing
>>   POV we can describe it with more detail that it doesn't modify anything
>>   in the pointed structure, just escapes the pointer; __bdos can stay
>>   leaf I believe; and when expanding __bdos later on, it would just
>>   dereference the associated pointer at that point (note, __bdos is
>>   pure, so it has vuse but not vdef and can load from memory); if
>>   f changes s.a, no problem, __bdos will load the changed value in there
> 
> Ah, I right. Because of the reload it doesn't matter. 
> Thank you for the explanation!
> 
> Martin
> 
>> B. if .ACCESS_WITH_SIZE associates the pointer with the s.a value from that
>>   point, .ACCESS_WITH_SIZE can be const, but obviously if f changes s.a,
>>   __bdos later will use s.a value from the &s.b[2] spot



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Jakub Jelinek
On Fri, Nov 03, 2023 at 04:20:57PM +, Qing Zhao wrote:
> So, based on the discussion so far, We will define the .ACCESS_WITH_SIZE as 
> following:
> 
>  .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, ACCESS_MODE)
> 
> INTERNAL_FN (ACCESS_WITH_SIZE,  ECF_LEAF | ECF_NOTHROW, NULL)
> 
> which returns the “REF_TO_OBJ" same as the 1st argument;
> 
> 1st argument “REF_TO_OBJ": Reference to the object;
> 2nd argument “REF_TO_SIZE”:  Reference to size of the object referenced by 
> the 1st argument, 
>  if the object that the “REF_TO_OBJ” refered has a
>* real type, the SIZE that the “REF_TO_SIZE” referred is the number of the 
> elements of the type;
>* void type, the SIZE that the “REF_TO_SIZE” referred is number of bytes; 

No, you can't do this.  Conversions between pointers are mostly useless in
GIMPLE, so you can't make decisions based on TREE_TYPE (TREE_TYPE (fnarg))
as it could have some random completely unrelated type.
So, the multiplication factor needs to be encoded in the arguments rather
than derived from REF_TO_OBJ's type, and similarly the size of what
REF_TO_SIZE points to needs to be encoded somewhere.

> 3rd argument "ACCESS_MODE": 
>  -1: Unknown access semantics
>   0: none
>   1: read_only
>   2: write_only
>   3: read_write

Jakub



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Qing Zhao


> On Nov 3, 2023, at 12:30 PM, Jakub Jelinek  wrote:
> 
> On Fri, Nov 03, 2023 at 04:20:57PM +, Qing Zhao wrote:
>> So, based on the discussion so far, We will define the .ACCESS_WITH_SIZE as 
>> following:
>> 
>> .ACCESS_WITH_SIZE (REF_TO_OBJ, REF_TO_SIZE, ACCESS_MODE)
>> 
>> INTERNAL_FN (ACCESS_WITH_SIZE,  ECF_LEAF | ECF_NOTHROW, NULL)
>> 
>> which returns the “REF_TO_OBJ" same as the 1st argument;
>> 
>> 1st argument “REF_TO_OBJ": Reference to the object;
>> 2nd argument “REF_TO_SIZE”:  Reference to size of the object referenced by 
>> the 1st argument, 
>> if the object that the “REF_TO_OBJ” refered has a
>>   * real type, the SIZE that the “REF_TO_SIZE” referred is the number of the 
>> elements of the type;
>>   * void type, the SIZE that the “REF_TO_SIZE” referred is number of bytes; 
> 
> No, you can't do this.  Conversions between pointers are mostly useless in
> GIMPLE, , so you can't make decisions based on TREE_TYPE (TREE_TYPE (fnarg))
> as it could have some random completely unrelated type.
> So, the multiplication factor needs to be encoded in the arguments rather
> than derived from REF_TO_OBJ's type, and similarly the size of what
> REF_TO_SIZE points to needs to be encoded somewhere.

Okay, I see, so 2 more arguments to the new function.

Qing
> 
>> 3rd argument "ACCESS_MODE": 
>> -1: Unknown access semantics
>>  0: none
>>  1: read_only
>>  2: write_only
>>  3: read_write
> 
>   Jakub
> 



[COMMITTED 1/2] Remove simple ranges from trailing zero bitmasks.

2023-11-03 Thread Andrew MacLeod
When we set bitmasks indicating known zero or one bits, we see some 
"obvious" things once in a while that are easy to prevent, e.g.


unsigned int [2, +INF] MASK 0xfffe VALUE 0x1

the range [2, 2] is obviously impossible since the final bit must be a 
one.  This doesn't usually cause us too much trouble, but the 
subsequent patch triggers some more interesting situations in which it 
helps to remove the obvious ranges when we have a mask with trailing 
zeros.


It's too much of a performance impact to constantly be checking the range 
every time we set the bitmask, but it turns out that if we simply try to 
take care of it during intersection operations (which happen at most key 
times, like changing an existing value), the impact is pretty minimal, 
around 0.6% of VRP.


This patch looks for trailing zeros in the mask, and replaces the low 
end range covered by those bits with those bits from the value field.


Bootstraps on build-x86_64-pc-linux-gnu with no regressions. Pushed.

Andrew



From b20f1dce46fb8bb1b142e9087530e546a40edec8 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Tue, 31 Oct 2023 11:51:34 -0400
Subject: [PATCH 1/2] Remove simple ranges from trailing zero bitmasks.

During the intersection operation, it can be helpful to remove any
low-end ranges when the bitmask has trailing zeros.  This prevents
obviously incorrect ranges from appearing without requiring a bitmask
check.

	* value-range.cc (irange_bitmask::adjust_range): New.
	(irange::intersect_bitmask): Call adjust_range.
	* value-range.h (irange_bitmask::adjust_range): New prototype.
---
 gcc/value-range.cc | 30 ++
 gcc/value-range.h  |  2 ++
 2 files changed, 32 insertions(+)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index fcf53efa1dd..a1e72c78f8b 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -1857,6 +1857,35 @@ irange::get_bitmask_from_range () const
   return irange_bitmask (wi::zero (prec), min | xorv);
 }
 
+// Remove trailing ranges that this bitmask indicates can't exist.
+
+void
+irange_bitmask::adjust_range (irange &r) const
+{
+  if (unknown_p () || r.undefined_p ())
+return;
+
+  int_range_max range;
+  tree type = r.type ();
+  int prec = TYPE_PRECISION (type);
+  // If there are trailing zeros, create a range representing those bits.
+  gcc_checking_assert (m_mask != 0);
+  int z = wi::ctz (m_mask);
+  if (z)
+{
+  wide_int ub = (wi::one (prec) << z) - 1;
+  range = int_range<5> (type, wi::zero (prec), ub);
+  // Then remove the specific value these bits contain from the range.
+  wide_int value = m_value & ub;
+  range.intersect (int_range<2> (type, value, value, VR_ANTI_RANGE));
+  // Inverting produces a list of ranges which can be valid.
+  range.invert ();
+  // And finally select R from only those valid values.
+  r.intersect (range);
+  return;
+}
+}
+
 // If the the mask can be trivially converted to a range, do so and
 // return TRUE.
 
@@ -2002,6 +2031,7 @@ irange::intersect_bitmask (const irange &r)
 
   if (!set_range_from_bitmask ())
 normalize_kind ();
+  m_bitmask.adjust_range (*this);
   if (flag_checking)
 verify_range ();
   return true;
diff --git a/gcc/value-range.h b/gcc/value-range.h
index e9d81d22cd0..84f65ffb591 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -139,6 +139,8 @@ public:
   void verify_mask () const;
   void dump (FILE *) const;
 
+  void adjust_range (irange &r) const;
+
   // Convenience functions for nonzero bitmask compatibility.
   wide_int get_nonzero_bits () const;
   void set_nonzero_bits (const wide_int &bits);
-- 
2.41.0



[COMMITTED 2/2] PR tree-optimization/111766 - Adjust operators equal and not_equal to check bitmasks against constants

2023-11-03 Thread Andrew MacLeod
When we compare a range against a constant for equality or inequality, 
there is currently no attempt made to utilize the known bits.


This patch adds a method to the irange_bitmask class to ask if a 
specific value satisfies the known bit pattern.  Operators equal and 
not_equal then utilize it when comparing to a constant, eliminating a 
class of cases we don't currently get, e.g.


if (x & 1) return;
if (x == 97657) foo()

will eliminate the call to foo, even though we do not remove all the odd 
numbers from the range.  The bit pattern comparison for
  [irange] unsigned int [0, 0] [2, +INF] MASK 0xfffe VALUE 0x1  
will indicate that any even constants will be false.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

From eb899fee35b8326b2105c04f58fd58bbdeca9d3b Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 25 Oct 2023 09:46:50 -0400
Subject: [PATCH 2/2] Adjust operators equal and not_equal to check bitmasks
 against constants

Check to see if a comparison to a constant can be determined to always
be not-equal based on the bitmask.

	PR tree-optimization/111766
	gcc/
	* range-op.cc (operator_equal::fold_range): Check constants
	against the bitmask.
	(operator_not_equal::fold_range): Ditto.
	* value-range.h (irange_bitmask::member_p): New.

	gcc/testsuite/
	* gcc.dg/pr111766.c: New.
---
 gcc/range-op.cc | 20 
 gcc/testsuite/gcc.dg/pr111766.c | 13 +
 gcc/value-range.h   | 14 ++
 3 files changed, 43 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr111766.c

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 33b193be7d0..6137f2aeed3 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -931,8 +931,9 @@ operator_equal::fold_range (irange &r, tree type,
 
   // We can be sure the values are always equal or not if both ranges
   // consist of a single value, and then compare them.
-  if (wi::eq_p (op1.lower_bound (), op1.upper_bound ())
-  && wi::eq_p (op2.lower_bound (), op2.upper_bound ()))
+  bool op1_const = wi::eq_p (op1.lower_bound (), op1.upper_bound ());
+  bool op2_const = wi::eq_p (op2.lower_bound (), op2.upper_bound ());
+  if (op1_const && op2_const)
 {
   if (wi::eq_p (op1.lower_bound (), op2.upper_bound()))
 	r = range_true (type);
@@ -947,6 +948,11 @@ operator_equal::fold_range (irange &r, tree type,
   tmp.intersect (op2);
   if (tmp.undefined_p ())
 	r = range_false (type);
+  // Check if a constant cannot satisfy the bitmask requirements.
+  else if (op2_const && !op1.get_bitmask ().member_p (op2.lower_bound ()))
+	 r = range_false (type);
+  else if (op1_const && !op2.get_bitmask ().member_p (op1.lower_bound ()))
+	 r = range_false (type);
   else
 	r = range_true_and_false (type);
 }
@@ -1033,8 +1039,9 @@ operator_not_equal::fold_range (irange &r, tree type,
 
   // We can be sure the values are always equal or not if both ranges
   // consist of a single value, and then compare them.
-  if (wi::eq_p (op1.lower_bound (), op1.upper_bound ())
-  && wi::eq_p (op2.lower_bound (), op2.upper_bound ()))
+  bool op1_const = wi::eq_p (op1.lower_bound (), op1.upper_bound ());
+  bool op2_const = wi::eq_p (op2.lower_bound (), op2.upper_bound ());
+  if (op1_const && op2_const)
 {
   if (wi::ne_p (op1.lower_bound (), op2.upper_bound()))
 	r = range_true (type);
@@ -1049,6 +1056,11 @@ operator_not_equal::fold_range (irange &r, tree type,
   tmp.intersect (op2);
   if (tmp.undefined_p ())
 	r = range_true (type);
+  // Check if a constant cannot satisfy the bitmask requirements.
+  else if (op2_const && !op1.get_bitmask ().member_p (op2.lower_bound ()))
+	 r = range_true (type);
+  else if (op1_const && !op2.get_bitmask ().member_p (op1.lower_bound ()))
+	 r = range_true (type);
   else
 	r = range_true_and_false (type);
 }
diff --git a/gcc/testsuite/gcc.dg/pr111766.c b/gcc/testsuite/gcc.dg/pr111766.c
new file mode 100644
index 000..c27a029c772
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr111766.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+int
+foo3n(int c, int bb)
+{
+  if ((bb & ~3)!=0) __builtin_unreachable(); // bb = [0,3]
+  if ((bb & 1)==0) __builtin_unreachable(); // bb&1 == 0 // [0],[3]
+  if(bb == 2) __builtin_trap();
+  return bb;
+}
+
+/* { dg-final { scan-tree-dump-not "trap" "evrp" } } */
diff --git a/gcc/value-range.h b/gcc/value-range.h
index 84f65ffb591..330e6f70c6b 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -139,6 +139,7 @@ public:
   void verify_mask () const;
   void dump (FILE *) const;
 
+  bool member_p (const wide_int &val) const;
   void adjust_range (irange &r) const;
 
   // Convenience functions for nonzero bitmask compatibility.
@@ -202,6 +203,19 @@ irange_bitmask::set_nonzero_bits (const wide_int &bits)
 verify_mask ();
 }
 
+// Return TRUE if val could be a valid value with this bitmask.
+
+

Re: [PATCH] [doc] middle-end/112296 - __builtin_constant_p and side-effects

2023-11-03 Thread Joseph Myers
On Fri, 3 Nov 2023, Richard Biener wrote:

> The following tries to clarify the __builtin_constant_p documentation,
> stating that the argument expression is not evaluated and side-effects
> are discarded.  I'm struggling to find the correct terms matching
> what the C language standard would call things so I'd appreciate
> some help here.
> 
> OK for trunk?

OK.

> Shall we diagnose arguments with side-effects?  It seems to me
> such use is usually unintended?  I think rather than dropping

The traditional use is definitely in macros to choose between code for 
constant arguments (evaluating them more than once) and maybe-out-of-line 
code for non-constant arguments (evaluating them exactly once), in which 
case having a side effect is definitely OK.

> side-effects as a side-effect of folding the frontend should
> discard them at parsing time instead, no?

I suppose the original expression needs to remain around in some form 
until the latest point at which optimizers might decide to evaluate 
__builtin_constant_p to true.  Although cases with possible side effects 
might well be optimized to false earlier; the interesting cases for 
deciding later are e.g. __builtin_constant_p called on an argument to an 
inline function (no side effects for __builtin_constant_p to discard, 
whether or not there are side effects in the caller from evaluating the 
expression passed to the function).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] libstdc++/complex: Remove implicit type casts in complex

2023-11-03 Thread Weslley da Silva Pereira
Hi Jonathan,

I am sorry for the delay. The mailing lists libstd...@gcc.gnu.org and
gcc-patches@gcc.gnu.org have just too many emails, so your email got lost.
I hope my changes still make sense to be included in GCC. Please, find my
comments below.

On Thu, May 11, 2023 at 3:57 PM Jonathan Wakely  wrote:

>
>
> On Mon, 27 Mar 2023 at 22:25, Weslley da Silva Pereira via Libstdc++ <
> libstd...@gcc.gnu.org> wrote:
>
>> Dear all,
>>
>> Here follows a patch that removes implicit type casts in std::complex.
>>
>> *Description:* The current implementation of `complex<_Tp>` assumes that
>> `int, double, long double` are explicitly convertible to `_Tp`. Moreover,
>> it also assumes that:
>>
>> 1. `int` is implicitly convertible to `_Tp`, e.g., when using
>> `complex<_Tp>(1)`.
>> 2. `long double` can be attributed to a `_Tp` variable, e.g., when using
>> `const _Tp __pi_2 = 1.5707963267948966192313216916397514L`.
>>
>> This patch transforms the implicit casts (1) and (2) into explicit type
>> casts. As a result, `std::complex` is now able to support more types. One
>> example is the type `Eigen::Half` from
>> https://eigen.tuxfamily.org/dox-devel/Half_8h_source.html which does not
>> implement implicit type conversions.
>>
>> *ChangeLog:*
>> libstdc++-v3/ChangeLog:
>>
>> * include/std/complex:
>>
>
> Thank you for the patch. Now that we're in development stage 1 for GCC
> 14, it's time to consider it.
>
> You're missing a proper changelog entry, I suggest:
>
>* include/std/complex (polar, __complex_sqrt)
>(__complex_pow_unsigned, pow, __complex_acos): Replace implicit
>conversions from int and long double to value_type.
>

I agree with your proposal for the changelog.


> You're also missing either a copyright assignment on file with the FSF
> (unless you've completed that paperwork?), or a DCO sign-off. Please see
> https://gcc.gnu.org/contribute.html#legal and https://gcc.gnu.org/dco.html
> for more details.
>

Here is my DCO sign-off:

*Copyright:*
Signed-off-by: Weslley da Silva Pereira 


>
>
>>
>> *Patch:* fix_complex.diff. (Also at
>> https://github.com/gcc-mirror/gcc/pull/84)
>>
>> *OBS:* I didn't find a good reason for adding new tests or test results
>> here since this is really a small upgrade (in my view) to std::complex.
>>
>
> I don't agree. The purpose of this is to support std::complex for a
> type Foo without implicit conversions (which isn't required by the standard
> btw, only the floating-point types are required to work, but we can support
> others as an extension). Without tests, we don't know if that goal has been
> met, and we don't know if the goal continues to be met in future versions.
> A test would ensure that we don't accidentally re-introduce code requiring
> implicit conversions.
>
> With a suitable test, I think this patch will be OK for GCC 14.
>
> Thanks again for contributing.
>
>
>
*Tests:*
See the attached file `test_complex_eigenhalf.cpp`

*Test results:*
- When using x86-64 GCC (trunk), I obtained compilation errors as shown in
the attached text file. Live example at: https://godbolt.org/z/oa9M34h8P.
- I observed no errors after applying the suggested patch on my machine.
- I tried it with the flag `-Wall`. No warnings were shown.
- My machine configuration and my GCC build information are displayed in
the file `config.log` generated by the configuration step of GCC.

Let me know if I need to do anything else.

Thanks,
  Weslley

-- 
Weslley S. Pereira
: In function 'std::complex<_Tp> std::polar(const _Tp&, const _Tp&) 
[with _Tp = Eigen::half]':
:13:20: error: could not convert '0' from 'int' to 'const Eigen::half&'
   13 | z1 = std::polar(one);
  |  ~~^
  ||
  |int
:13:20: note:   when instantiating default argument for call to 
'std::complex<_Tp> std::polar(const _Tp&, const _Tp&) [with _Tp = Eigen::half]'
: In function 'int main()':
:13:20: error: invalid initialization of reference of type 'const 
Eigen::half&' from expression of type 'int'
In file included from :1:
/opt/compiler-explorer/gcc-trunk-20231103/include/c++/14.0.0/complex:967:40: 
note: in passing argument 2 of 'std::complex<_Tp> std::polar(const _Tp&, const 
_Tp&) [with _Tp = Eigen::half]'
  967 | polar(const _Tp& __rho, const _Tp& __theta)
  | ~~~^~~
/opt/compiler-explorer/gcc-trunk-20231103/include/c++/14.0.0/complex: In 
instantiation of

Re: [PATCH v3] c++: implement P2564, consteval needs to propagate up [PR107687]

2023-11-03 Thread Jason Merrill

On 11/2/23 11:28, Marek Polacek wrote:

On Sat, Oct 14, 2023 at 12:56:11AM -0400, Jason Merrill wrote:

On 10/10/23 13:20, Marek Polacek wrote:

On Tue, Aug 29, 2023 at 03:26:46PM -0400, Jason Merrill wrote:

On 8/23/23 15:49, Marek Polacek wrote:

+struct A {
+  int x;
+  int y = id(x);
+};
+
+template
+constexpr int k(int) {  // k is not an immediate function because 
A(42) is a
+  return A(42).y;   // constant expression and thus not 
immediate-escalating
+}


Needs use(s) of k to test the comment.


True, and that revealed what I think is a bug in the standard.
In the test I'm saying:

// ??? [expr.const]#example-9 says:
//   k is not an immediate function because A(42) is a
//   constant expression and thus not immediate-escalating
// But I think the call to id(x) is *not* a constant expression and thus
// it is an immediate-escalating expression.  Therefore k *is*
// an immediate function.  So we get the error below.  clang++ agrees.
id(x) is not a constant expression because x isn't constant.


Not when considering id(x) by itself, but in the evaluation of A(42), the
member x has just been initialized to constant 42.  And A(42) is
constant-evaluated because "An aggregate initialization is an immediate
invocation if it evaluates a default member initializer that has a
subexpression that is an immediate-escalating expression."

I assume clang doesn't handle this passage properly yet because it was added
during core review of the paper, for parity between aggregate initialization
and constructor escalation.

This can be a follow-up patch.


I see.  So the fix will be to, for the aggregate initialization case, pass
the whole A(42).y thing to cxx_constant_eval, not just id(x).


Well, A(42) is the immediate invocation, the .y is not part of it.


I suppose some
functions cannot possibly be promoted because they don't contain
any CALL_EXPRs.  So we may be able to rule them out while doing
cp_fold_r early.


Yes.  Or, the only immediate-escalating functions referenced have already
been checked.


It looks like you haven't pursued this yet?  One implementation thought: 
maybe_store_cfun... could stop skipping immediate_escalating_function_p 
(current_function_decl), and after we're done folding if the current 
function isn't in the hash_set we can go ahead and set 
DECL_ESCALATION_CHECKED_P?



We can also do some escalation during constexpr evaluation: all the
functions involved need to be instantiated for the evaluation, and if we
encounter an immediate-escalating expression while evaluating a call to an
immediate-escalating function, we can promote it then.  Though we can't
necessarily mark it as not needing promotion, as there might be i-e exprs in
branches that the particular evaluation doesn't take.


I've tried but I didn't get anywhere.  The patch was basically

--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -2983,7 +2983,13 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
} fb (new_call.bindings);

if (*non_constant_p)
-return t;
+{
+  if (cxx_dialect >= cxx20
+ && ctx->manifestly_const_eval == mce_false
+ && DECL_IMMEDIATE_FUNCTION_P (fun))
+   maybe_promote_function_to_consteval (current_function_decl);
+  return t;
+}

/* We can't defer instantiating the function any longer.  */
if (!DECL_INITIAL (fun)

but since I have to check mce_false, it didn't do anything useful
in practice (that is, it wouldn't escalate anything in my tests).


Makes sense.


@@ -55,6 +64,8 @@ enum fold_flags {
 ff_mce_false = 1 << 1,
 /* Whether we're being called from cp_fold_immediate.  */
 ff_fold_immediate = 1 << 2,
+  /* Whether we're escalating immediate-escalating functions.  */
+  ff_escalating = 1 << 3,


If we always escalate when we see a known immediate invocation, this means
recurse.  And maybe we could use at_eof for that?


Yes.  Though, at_eof == 1 doesn't imply that all templates have been
instantiated, only that we got to c_parse_final_cleanups.  Rather than
keeping the ff_ flag, I've updated at_eof to mean that 2 signals that
all templates have been instantiated, and 3 that we're into cgraph
territory.  (Might should use an enum rather than magic numbers.)


Sounds good.


   };
   using fold_flags_t = int;
@@ -428,6 +439,176 @@ lvalue_has_side_effects (tree e)
   return TREE_SIDE_EFFECTS (e);
   }
+/* Return true if FN is an immediate-escalating function.  */
+
+static bool
+immediate_escalating_function_p (tree fn)
+{
+  if (!fn || !flag_immediate_escalation)
+return false;
+
+  gcc_checking_assert (TREE_CODE (fn) == FUNCTION_DECL);


Maybe check DECL_IMMEDIATE_FUNCTION_P early, rather than multiple times
below...


+  /* An immediate-escalating function is
+  -- the call operator of a lambda that is not declared with the consteval
+specifier  */
+  if (LAMBDA_FUNCTION_P (fn) && !DECL_IMMEDIATE_FUNCTION_P (fn))
+return true;
+  /* -- a defaulted special member functi

[pushed] diagnostics: consolidate group-handling fields in diagnostic_context

2023-11-03 Thread David Malcolm
No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-5112-gae8abcb81ed814.

gcc/ChangeLog:
* diagnostic.cc (diagnostic_initialize): Update for consolidation
of group-based fields.
(diagnostic_report_diagnostic): Likewise.
(diagnostic_context::begin_group): New, based on body of
auto_diagnostic_group's ctor.
(diagnostic_context::end_group): New, based on body of
auto_diagnostic_group's dtor.
(auto_diagnostic_group::auto_diagnostic_group): Convert to a call
to begin_group.
(auto_diagnostic_group::~auto_diagnostic_group): Convert to a call
to end_group.
* diagnostic.h (diagnostic_context::begin_group): New decl.
(diagnostic_context::end_group): New decl.
(diagnostic_context::diagnostic_group_nesting_depth): Rename to...
(diagnostic_context::m_diagnostic_groups.m_nesting_depth):
...this.
(diagnostic_context::diagnostic_group_emission_count): Rename
to...
(diagnostic_context::m_diagnostic_groups::m_emission_count):
...this.
---
 gcc/diagnostic.cc | 42 --
 gcc/diagnostic.h  | 19 ++-
 2 files changed, 42 insertions(+), 19 deletions(-)

diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc
index 0f392358aef..0759fae916e 100644
--- a/gcc/diagnostic.cc
+++ b/gcc/diagnostic.cc
@@ -218,8 +218,8 @@ diagnostic_initialize (diagnostic_context *context, int 
n_opts)
   context->tabstop = 8;
   context->escape_format = DIAGNOSTICS_ESCAPE_FORMAT_UNICODE;
   context->edit_context_ptr = NULL;
-  context->diagnostic_group_nesting_depth = 0;
-  context->diagnostic_group_emission_count = 0;
+  context->m_diagnostic_groups.m_nesting_depth = 0;
+  context->m_diagnostic_groups.m_emission_count = 0;
   context->m_output_format = new diagnostic_text_output_format (*context);
   context->set_locations_cb = nullptr;
   context->ice_handler_cb = NULL;
@@ -1570,9 +1570,9 @@ diagnostic_report_diagnostic (diagnostic_context *context,
 ++diagnostic_kind_count (context, diagnostic->kind);
 
   /* Is this the initial diagnostic within the stack of groups?  */
-  if (context->diagnostic_group_emission_count == 0)
+  if (context->m_diagnostic_groups.m_emission_count == 0)
 context->m_output_format->on_begin_group ();
-  context->diagnostic_group_emission_count++;
+  context->m_diagnostic_groups.m_emission_count++;
 
   pp_format (context->printer, &diagnostic->message);
   context->m_output_format->on_begin_diagnostic (diagnostic);
@@ -2296,28 +2296,42 @@ fancy_abort (const char *file, int line, const char 
*function)
   internal_error ("in %s, at %s:%d", function, trim_filename (file), line);
 }
 
+/* struct diagnostic_context.  */
+
+void
+diagnostic_context::begin_group ()
+{
+  m_diagnostic_groups.m_nesting_depth++;
+}
+
+void
+diagnostic_context::end_group ()
+{
+  if (--m_diagnostic_groups.m_nesting_depth == 0)
+{
+  /* Handle the case where we've popped the final diagnostic group.
+If any diagnostics were emitted, give the context a chance
+to do something.  */
+  if (m_diagnostic_groups.m_emission_count > 0)
+   m_output_format->on_end_group ();
+  m_diagnostic_groups.m_emission_count = 0;
+}
+}
+
 /* class auto_diagnostic_group.  */
 
 /* Constructor: "push" this group into global_dc.  */
 
 auto_diagnostic_group::auto_diagnostic_group ()
 {
-  global_dc->diagnostic_group_nesting_depth++;
+  global_dc->begin_group ();
 }
 
 /* Destructor: "pop" this group from global_dc.  */
 
 auto_diagnostic_group::~auto_diagnostic_group ()
 {
-  if (--global_dc->diagnostic_group_nesting_depth == 0)
-{
-  /* Handle the case where we've popped the final diagnostic group.
-If any diagnostics were emitted, give the context a chance
-to do something.  */
-  if (global_dc->diagnostic_group_emission_count > 0)
-   global_dc->m_output_format->on_end_group ();
-  global_dc->diagnostic_group_emission_count = 0;
-}
+  global_dc->end_group ();
 }
 
 /* class diagnostic_text_output_format : public diagnostic_output_format.  */
diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h
index a2c8740cbd0..ed1b6c0c7b1 100644
--- a/gcc/diagnostic.h
+++ b/gcc/diagnostic.h
@@ -226,6 +226,12 @@ public:
the context of a diagnostic message.  */
 struct diagnostic_context
 {
+public:
+  void begin_group ();
+  void end_group ();
+
+public:
+
   /* Where most of the diagnostic formatting work is done.  */
   pretty_printer *printer;
 
@@ -443,12 +449,15 @@ struct diagnostic_context
  applied, for generating patches.  */
   edit_context *edit_context_ptr;
 
-  /* How many diagnostic_group instances are currently alive.  */
-  int diagnostic_group_nesting_depth;
+  /* Fields relating to diagnostic groups.  */
+  struct {
+/* How many diagnostic_group instances are currently alive.  */
+int m_nesting_depth;
 
- 

Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-11-03 Thread Jason Merrill

On 11/3/23 00:44, waffl3x wrote:

That leaves 2, 4, and 5.

2. I am pretty sure xobj functions should have the struct they are a
part of recorded as the method basetype member. I have already checked
that function_type and method_type are the same node type under the
hood and it does appear to be, so it should be trivial to set it.
However I do have to wonder why static member functions don't set it,
it seems to be that it would be convenient to use the same field. Can
you provide any insight into that?



method basetype is only for METHOD_TYPE; if you want the containing type
of the function, that's DECL_CONTEXT.


-- gcc/tree.h:530
#define FUNC_OR_METHOD_CHECK(T) TREE_CHECK2 (T, FUNCTION_TYPE, METHOD_TYPE)
-- gcc/tree.h:2518
#define TYPE_METHOD_BASETYPE(NODE)  \
   (FUNC_OR_METHOD_CHECK (NODE)->type_non_common.maxval)

The code doesn't seem to reflect that, perhaps since the underlying
node type is the same (as far as I can tell, they both inherit from
tree_type_non_common) it wasn't believed to be necessary.


Curious.  It might also be a remnant of how older code dealt with how 
sometimes a member function changes between FUNCTION_TYPE and 
METHOD_TYPE during parsing.



Upon looking at DECL_CONTEXT though I see it seems you were thinking of
FUNCTION_DECL. I haven't observed DECL_CONTEXT being set for
FUNCTION_DECL nodes though, perhaps it should be? Although it's more
likely that it is being set and I just haven't noticed, but if that's
the case I'll have to make a note to make sure it is being set for xobj
member functions.


I would certainly expect it to be getting set already.


I was going to say that this seems like a redundancy, but I realized
that the type of a member function pointer is tied to the struct, so it
actually ends up relevant for METHOD_TYPE nodes. I would hazard a guess
that when forming member function pointers the FUNCTION_DECL node was
not as easily accessed. If I remember correctly that is not the case
right now so it might be worthwhile to refactor away from
TYPE_METHOD_BASETYPE and replace uses of it with DECL_CONTEXT.


For a pointer-to-member, the class type is part of the type, so trying 
to remove it from the type doesn't sound like an improvement to me. 
Specifically, TYPE_PTRMEM_CLASS_TYPE refers to TYPE_METHOD_BASETYPE for 
a pointer to member function.



I'm getting ahead of myself though, I'll stop here and avoid going on
too much of a refactoring tangent. I do want this patch to make it into
GCC14 after all.


Good plan.  :)


4. I have no comment here, but it does concern me since I don't
understand it at all.


In the list near the top of cp-tree.h, DECL_LANG_FLAG_6 for a
FUNCTION_DECL is documented to be DECL_THIS_STATIC, which should only be
set on the static member.


Right, I'll try to remember to check this area in the future, but yeah
that tracks because I did remove that flag. Removing that flag just so
happened to be the start of this saga of bug fixes but alas, it had to
be done.


5. I am pretty sure this is fine for now, but if xobj member functions
ever were to support virtual/override capabilities, then it would be a
problem. Is my understanding correct, or is there some other reason
that iobj member functions have a different value here?


It is surprising that an iobj memfn would have a different DECL_ALIGN,
but it shouldn't be a problem; the vtables only rely on alignment being
at least 2.


I'll put a note for myself to look into it in the future, it's an
oddity at minimum and oddities interest me :^).


There are also some differences in the arg param in
cp_build_addr_expr_1 that concerns me, but most of them are reflected
in the differences I have already noted. I had wanted to include these
differences as well but I have been spending too much time staring at
it, it's no longer productive. In short, the indirect_ref node for xobj
member functions has reference_to_this set, while iobj member functions
do not.


That's a result of difference 3.


Okay, makes sense, I'm mildly concerned about any possible side effects
this might have since we have a function_type node suddenly going
through execution paths that only method_type went through before. The
whole "reference_to_this" "pointer_to_this" thing is a little confusing
because I'm pretty sure that doesn't refer to the actual `this` object
argument or parameter since I've seen it all over the place. Hopefully
it's benign.


Yes, "pointer_to_this" is just a cache of the type that is a pointer to 
the type you're looking at.  You are correct that it has nothing to do 
with the C++ 'this'.



The baselink binfo field has the private flag set for xobj
member functions, iobj member functions does not.


TREE_PRIVATE on a binfo is part of BINFO_ACCESS, which isn't a fixed
value, but gets updated during member search. Perhaps the differences
in consideration of conversion to a base led to it being set or cleared
differently? I wouldn't worry too much about it unless you see
di

Re: [AVR PATCH] Improvements to SImode and PSImode shifts by constants.

2023-11-03 Thread Georg-Johann Lay




On 02.11.23 at 12:54, Roger Sayle wrote:


This patch provides non-looping implementations for more SImode (32-bit)
and PSImode (24-bit) shifts on AVR.  For most cases, these are shorter
and faster than using a loop, but for a few (controlled by optimize_size)


Maybe this should also adjust the insn costs, like in avr_rtx_costs_1?

Depending on what you are outputting, avr_asm_len() might be more
convenient.

What I am not sure about are the test cases that expect exact sequences,
which might be annoying in the future?

Johann



they are a little larger but significantly faster.  The approach is to
perform byte-based shifts by 1, 2 or 3 bytes, followed by bit-based shifts
(effectively in a narrower type) for the remaining bits, beyond 8, 16 or 24.

For example, the simple test case below (inspired by PR 112268):

unsigned long foo(unsigned long x)
{
   return x >> 26;
}

gcc -O2 currently generates:

foo:    ldi r18,26
1:      lsr r25
        ror r24
        ror r23
        ror r22
        dec r18
        brne 1b
        ret

which is 8 instructions, and takes ~158 cycles.
With this patch, we now generate:

foo:    mov r22,r25
        clr r23
        clr r24
        clr r25
        lsr r22
        lsr r22
        ret

which is 7 instructions, and takes ~7 cycles.

One complication is that the modified functions sometimes use spaces instead
of TABs, with occasional mistakes in GNU-style formatting, so I've fixed
these indentation/whitespace issues.  There's no change in the code for the
cases previously handled/special-cased, with the exception of ashrqi3 reg,5
where with -Os a (4-instruction) loop is shorter than the five single-bit
shifts of a fully unrolled implementation.

This patch has been (partially) tested with a cross-compiler to avr-elf
hosted on x86_64, without a simulator, where the compile-only tests in
the gcc testsuite show no regressions.  If someone could test this more
thoroughly that would be great.


2023-11-02  Roger Sayle  

gcc/ChangeLog
 * config/avr/avr.cc (ashlqi3_out): Fix indentation whitespace.
 (ashlhi3_out): Likewise.
 (avr_out_ashlpsi3): Likewise.  Handle shifts by 9 and 17-22.
 (ashlsi3_out): Fix formatting.  Handle shifts by 9 and 25-30.
 (ashrqi3_out): Use loop for shifts by 5 when optimizing for size.
 Fix indentation whitespace.
 (ashrhi3_out): Likewise.
 (avr_out_ashrpsi3): Likewise.  Handle shifts by 17.
 (ashrsi3_out): Fix indentation.  Handle shifts by 17 and 25.
 (lshrqi3_out): Fix whitespace.
 (lshrhi3_out): Likewise.
 (avr_out_lshrpsi3): Likewise.  Handle shifts by 9 and 17-22.
 (lshrsi3_out): Fix indentation.  Handle shifts by 9,17,18 and 25-30.

gcc/testsuite/ChangeLog
 * gcc.target/avr/ashlsi-1.c: New test case.
 * gcc.target/avr/ashlsi-2.c: Likewise.
 * gcc.target/avr/ashrsi-1.c: Likewise.
 * gcc.target/avr/ashrsi-2.c: Likewise.
 * gcc.target/avr/lshrsi-1.c: Likewise.
 * gcc.target/avr/lshrsi-2.c: Likewise.


Thanks in advance,
Roger
--



[PATCH] Fortran: fix issue with multiple references of a procedure pointer [PR97245]

2023-11-03 Thread Harald Anlauf
Dear all,

this is a rather weird bug with a very simple fix.  If a procedure pointer
is referenced in a CALL, a symbol was created shadowing the original
declaration if it was host-associated.  Funnily, this affected only
references of the procedure pointer after the first CALL.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Would it be OK to backport this fix to 13-branch?

Thanks,
Harald

From 1ca323b8d58846d0890a8595ba9fc7bc7eee8fdd Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Fri, 3 Nov 2023 19:41:54 +0100
Subject: [PATCH] Fortran: fix issue with multiple references of a procedure
 pointer [PR97245]

gcc/fortran/ChangeLog:

	PR fortran/97245
	* match.cc (gfc_match_call): If a procedure pointer has already been
	resolved, do not create a new symbol in a procedure reference of
	the same name shadowing the first one if it is host-associated.

gcc/testsuite/ChangeLog:

	PR fortran/97245
	* gfortran.dg/proc_ptr_53.f90: New test.
---
 gcc/fortran/match.cc  |  1 +
 gcc/testsuite/gfortran.dg/proc_ptr_53.f90 | 35 +++
 2 files changed, 36 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/proc_ptr_53.f90

diff --git a/gcc/fortran/match.cc b/gcc/fortran/match.cc
index f848e52be4c..9e3571d3dbe 100644
--- a/gcc/fortran/match.cc
+++ b/gcc/fortran/match.cc
@@ -5064,6 +5064,7 @@ gfc_match_call (void)
  right association is made.  They are thrown out in resolution.)
  ...  */
   if (!sym->attr.generic
+	&& !sym->attr.proc_pointer
 	&& !sym->attr.subroutine
 	&& !sym->attr.function)
 {
diff --git a/gcc/testsuite/gfortran.dg/proc_ptr_53.f90 b/gcc/testsuite/gfortran.dg/proc_ptr_53.f90
new file mode 100644
index 000..29dd08d9f75
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/proc_ptr_53.f90
@@ -0,0 +1,35 @@
+! { dg-do compile }
+! PR fortran/97245 - ASSOCIATED intrinsic did not recognize a
+!pointer variable the second time it is used
+
+MODULE formulaciones
+  IMPLICIT NONE
+
+  ABSTRACT INTERFACE
+ SUBROUTINE proc_void()
+ END SUBROUTINE proc_void
+  end INTERFACE
+
+  PROCEDURE(proc_void), POINTER :: pADJSensib => NULL()
+
+CONTAINS
+
+  subroutine calculo()
+PROCEDURE(proc_void), POINTER :: otherprocptr => NULL()
+
+IF (associated(pADJSensib)) THEN
+   CALL pADJSensib ()
+ENDIF
+IF (associated(pADJSensib)) THEN! this was erroneously rejected
+   CALL pADJSensib ()
+END IF
+
+IF (associated(otherprocptr)) THEN
+   CALL otherprocptr ()
+ENDIF
+IF (associated(otherprocptr)) THEN
+   CALL otherprocptr ()
+END IF
+  end subroutine calculo
+
+END MODULE formulaciones
--
2.35.3



[PATCH] c++, v2: Implement C++26 P1854R4 - Making non-encodable string literals ill-formed [PR110341]

2023-11-03 Thread Jakub Jelinek
On Thu, Nov 02, 2023 at 10:23:27PM -0400, Jason Merrill wrote:
> Under the existing diagnostic I'd like to distinguish the different cases
> more, e.g.
> 
> "multicharacter literal with %d characters exceeds 'int' size of %d bytes"
> "multicharacter literal cannot have an encoding prefix"

Ok.  The following updated patch does all of that.  I've also integrated
the follow-up patch into it (to count the source character set characters
correctly).

I've just used multi-character rather than multicharacter because the
existing diagnostics spell it that way (I know the C++ standard doesn't
use the hyphen).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-11-03  Jakub Jelinek  

PR c++/110341
libcpp/
* charset.cc: Implement C++ 26 P1854R4 - Making non-encodable string
literals ill-formed.
(one_count_chars, convert_count_chars, count_source_chars): New
functions.
(narrow_str_to_charconst): Change last arg type from cpp_ttype to
const cpp_token *.  For C++ if pedantic and i > 1 in CPP_CHAR
interpret token also as CPP_STRING32 and if number of characters
in the CPP_STRING32 is larger than number of bytes in CPP_CHAR,
pedwarn on it.  Make the diagnostics more detailed.
(wide_str_to_charconst): Change last arg type from cpp_ttype to
const cpp_token *.  Make the diagnostics more detailed.
(cpp_interpret_charconst): Adjust narrow_str_to_charconst and
wide_str_to_charconst callers.
gcc/testsuite/
* g++.dg/cpp26/literals1.C: New test.
* g++.dg/cpp26/literals2.C: New test.
* g++.dg/cpp23/wchar-multi1.C: Adjust expected diagnostic wordings.
* g++.dg/cpp23/wchar-multi2.C: Likewise.
* gcc.dg/c2x-utf8char-3.c: Likewise.
* gcc.dg/cpp/charconst-4.c: Likewise.
* gcc.dg/cpp/charconst.c: Likewise.
* gcc.dg/cpp/if-2.c: Likewise.
* gcc.dg/utf16-4.c: Likewise.
* gcc.dg/utf32-4.c: Likewise.
* g++.dg/cpp1z/utf8-neg.C: Likewise.
* g++.dg/cpp2a/ucn2.C: Likewise.
* g++.dg/ext/utf16-4.C: Likewise.
* g++.dg/ext/utf32-4.C: Likewise.

--- libcpp/charset.cc.jj2023-11-02 07:49:20.975811244 +0100
+++ libcpp/charset.cc   2023-11-03 11:57:56.738701066 +0100
@@ -446,6 +446,73 @@ one_utf16_to_utf8 (iconv_t bigend, const
   return 0;
 }
 
+
+/* Special routine which just counts number of characters in the
+   string, what exactly is stored into the output doesn't matter
+   as long as it is one uchar per character.  */
+
+static inline int
+one_count_chars (iconv_t, const uchar **inbufp, size_t *inbytesleftp,
+uchar **outbufp, size_t *outbytesleftp)
+{
+  cppchar_t s = 0;
+  int rval;
+
+  /* Check for space first, since we know exactly how much we need.  */
+  if (*outbytesleftp < 1)
+return E2BIG;
+
+#if HOST_CHARSET == HOST_CHARSET_ASCII
+  rval = one_utf8_to_cppchar (inbufp, inbytesleftp, &s);
+  if (rval)
+return rval;
+#else
+  if (*inbytesleftp < 1)
+return EINVAL;
+  static const uchar utf_ebcdic_map[256] = {
+/* See table 4 in http://unicode.org/reports/tr16/tr16-7.2.html  */
+0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+1, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 1, 1, 1, 1, 1,
+1, 9, 9, 9, 9, 9, 9, 9, 9, 9, 1, 1, 1, 1, 1, 1,
+1, 1, 9, 9, 9, 9, 9, 9, 9, 9, 9, 1, 1, 1, 1, 1,
+9, 9, 9, 9, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1,
+2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
+2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
+2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 2,
+2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 1, 3, 3,
+1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3,
+1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 4, 4, 4, 4,
+1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 4, 4, 4, 5, 5, 5,
+1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 6, 6, 7, 7, 0
+  };
+  rval = utf_ebcdic_map[**inbufp];
+  if (rval == 9)
+return EILSEQ;
+  if (rval == 0)
+rval = 1;
+  if (rval >= 2)
+{
+  if (*inbytesleftp < rval)
+   return EINVAL;
+  for (int i = 1; i < rval; ++i)
+   if (utf_ebcdic_map[(*inbufp)[i]] != 9)
+ return EILSEQ;
+}
+  *inbytesleftp -= rval;
+  *inbufp += rval;
+#endif
+
+  **outbufp = ' ';
+
+  *outbufp += 1;
+  *outbytesleftp -= 1;
+  return 0;
+}
+
+
 /* Helper routine for the next few functions.  The 'const' on
one_conversion means that we promise not to modify what function is
pointed to, which lets the inliner see through it.  */
@@ -529,6 +596,15 @@ convert_utf32_utf8 (iconv_t cd, const uc
   return conversion_loop (one_utf32_to_utf8, cd, from, flen, to);
 }
 
+/* Magic conversion which just counts characters from input, so
+   only to->len is significant.  */
+static bool
+convert_count_chars (iconv_t cd, const uchar *from,
+   

[RFC] vect: disable multiple calls of poly simdclones

2023-11-03 Thread Andre Vieira (lists)

Hi,

The current codegen code to support VFs that are multiples of a
simdclone's simdlen relies on BIT_FIELD_REF to create multiple input
vectors.  This does not work for non-constant simdclones, so we should
disable using such clones when the VF is a multiple of the non-constant
simdlen until we change the codegen to support those.


Enabling SVE simdclone support will cause ICEs if the vectorizer decides
to use an SVE simdclone with a VF that is larger than the simdlen.  I'll
be away for the next two weeks, so I can't really discuss this further.
I initially tried to solve the problem, but the way
vectorizable_simd_clone_call is structured doesn't make it easy to
replace BIT_FIELD_REF right now with the poly-suitable solution of using
unpack_{hi,lo}.  Unfortunately I only found this now as I was adding
further tests for SVE :(


gcc/ChangeLog:

	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Reject simdclones
	with non-constant simdlen when VF is not exactly the same.

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 5f262cae2aae784e3ef4fd07455b7aa742797b51..dc3e0716161838aef66cf37342499006673336d6 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4165,7 +4165,10 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
if (!constant_multiple_p (vf * group_size, n->simdclone->simdlen,
  &num_calls)
|| (!n->simdclone->inbranch && (masked_call_offset > 0))
-   || (nargs != simd_nargs))
+   || (nargs != simd_nargs)
+   /* Currently we do not support multiple calls of non-constant
+  simdlen as poly vectors can not be accessed by BIT_FIELD_REF.  */
+   || (!n->simdclone->simdlen.is_constant () && num_calls != 1))
  continue;
if (num_calls != 1)
  this_badness += exact_log2 (num_calls) * 4096;


[PATCH] attribs: Fix ICE with -Wno-attributes= [PR112339]

2023-11-03 Thread Jakub Jelinek
Hi!

The following testcase ICEs, because with -Wno-attributes=foo::no_sanitize
(but generally any other non-gnu namespace and some gnu well known attribute
name within that other namespace) the FEs don't really parse attribute
arguments of such attribute, but lookup_attribute_spec is non-NULL with
NULL handler and such attributes are added to DECL_ATTRIBUTES or
TYPE_ATTRIBUTES and then when e.g. middle-end does lookup_attribute
on a particular attribute and expects the attribute to mean something
and/or have a particular verified arguments, it can crash when seeing
the foreign attribute in there instead.

The following patch fixes that by never adding ignored attributes
to DECL_ATTRIBUTES/TYPE_ATTRIBUTES, previously that was the case just
for attributes in ignored namespace (where lookup_attribute_space
returned NULL).  We don't really know anything about those attributes,
so shouldn't pretend we know something about them, especially when
the arguments are error_mark_node or NULL instead of something that
would have been parsed.  And it would be really weird if we normally
ignore say [[clang::unused]] attribute, but when people use
-Wno-attributes=clang::unused we actually treated it as gnu::unused.
All the user asked for is suppress warnings about that attribute being
unknown.

The first hunk is just playing safe: I'm worried people could use
-Wno-attributes=gnu::
and get various crashes with known GNU attributes not being actually
parsed and recorded (or worse e.g. when we tweak standard attributes
into GNU attributes and we wouldn't add those).
The -Wno-attributes= documentation says that it suppresses warning about
unknown attributes, so I think -Wno-attributes=gnu:: should prevent
warning about say [[gnu::foobarbaz]] attribute, but not about
[[gnu::unused]] because the latter is a known attribute.
The routine would return true for any scoped attribute in the ignored
namespace, with the change it ignores only unknown attributes in ignored
namespace, known ones in there will be ignored only if they have
max_length of -2 (e.g. with
-Wno-attributes=gnu:: -Wno-attributes=gnu::foobarbaz).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-11-03  Jakub Jelinek  

PR c/112339
* attribs.cc (attribute_ignored_p): Only return true for
attr_namespace_ignored_p if as is NULL.
(decl_attributes): Never add ignored attributes.

* c-c++-common/ubsan/Wno-attributes-1.c: New test.

--- gcc/attribs.cc.jj   2023-11-02 20:22:02.017016371 +0100
+++ gcc/attribs.cc  2023-11-03 08:45:32.688726738 +0100
@@ -579,9 +579,9 @@ attribute_ignored_p (tree attr)
 return false;
   if (tree ns = get_attribute_namespace (attr))
 {
-  if (attr_namespace_ignored_p (ns))
-   return true;
   const attribute_spec *as = lookup_attribute_spec (TREE_PURPOSE (attr));
+  if (as == NULL && attr_namespace_ignored_p (ns))
+   return true;
   if (as && as->max_length == -2)
return true;
 }
@@ -862,7 +862,10 @@ decl_attributes (tree *node, tree attrib
}
}
 
-  if (no_add_attrs)
+  if (no_add_attrs
+ /* Don't add attributes registered just for -Wno-attributes=foo::bar
+purposes.  */
+ || attribute_ignored_p (attr))
continue;
 
   if (spec->handler != NULL)
--- gcc/testsuite/c-c++-common/ubsan/Wno-attributes-1.c.jj  2023-11-03 
08:44:13.752837201 +0100
+++ gcc/testsuite/c-c++-common/ubsan/Wno-attributes-1.c 2023-11-03 
08:44:13.751837215 +0100
@@ -0,0 +1,9 @@
+/* PR c/112339 */
+/* { dg-do compile { target { c++11 || c } } } */
+/* { dg-options "-Wno-attributes=foo::no_sanitize -fsanitize=undefined" } */
+/* { dg-additional-options "-std=c2x" { target c } } */
+
+[[foo::no_sanitize("bounds")]] void
+foo (void)
+{
+}

Jakub



Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Qing Zhao


> On Nov 2, 2023, at 8:13 PM, Bill Wendling  wrote:
> 
> On Thu, Nov 2, 2023 at 1:00 AM Richard Biener
>  wrote:
>> 
>> On Wed, Nov 1, 2023 at 3:47 PM Qing Zhao  wrote:
>>> 
>>> 
>>> 
 On Oct 31, 2023, at 6:14 PM, Joseph Myers  wrote:
 
 On Tue, 31 Oct 2023, Qing Zhao wrote:
 
> 2.3 A new semantic requirement in the user documentation of "counted_by"
> 
> For the following structure including a FAM with a counted_by attribute:
> 
> struct A
> {
>  size_t size;
>  char buf[] __attribute__((counted_by(size)));
> };
> 
> for any object with such type:
> 
> struct A *obj = __builtin_malloc (sizeof(struct A) + sz * sizeof(char));
> 
> The setting to the size field should be done before the first reference
> to the FAM field.
> 
> Such requirement to the user will guarantee that the first reference to
> the FAM knows the size of the FAM.
> 
> We need to add this additional requirement to the user document.
 
 Make sure the manual is very specific about exactly when size is
 considered to be an accurate representation of the space available for buf
 (given that, after malloc or realloc, it's going to be temporarily
 inaccurate).  If the intent is that inaccurate size at such a time means
 undefined behavior, say so explicitly.
>>> 
>>> Yes, good point. We need to define this clearly in the beginning.
>>> We need to explicit say that
>>> 
>>> the size of the FAM is defined by the latest “counted_by” value. And it’s 
>>> an undefined behavior when the size field is not defined when the FAM is 
>>> referenced.
>>> 
>>> Is the above good enough?
>>> 
>>> 
 
> 2.4 Replace FAM field accesses with the new function ACCESS_WITH_SIZE
> 
> In C FE:
> 
> for every reference to a FAM, for example, "obj->buf" in the small 
> example,
> check whether the corresponding FIELD_DECL has a "counted_by" attribute?
> if YES, replace the reference to "obj->buf" with a call to
> .ACCESS_WITH_SIZE (obj->buf, obj->size, -1);
 
 This seems plausible - but you should also consider the case of static
 initializers - remember the GNU extension for statically allocated objects
 with flexible array members (unless you're not allowing it with
 counted_by).
 
 static struct A x = { sizeof "hello", "hello" };
 static char *y = &x.buf;
 
 I'd expect that to be valid - and unless you say such a usage is invalid,
>>> 
>>> At this moment, I think that this should be valid.
>>> 
>>> I,e, the following:
>>> 
>>> struct A
>>> {
>>> size_t size;
>>> char buf[] __attribute__((counted_by(size)));
>>> };
>>> 
>>> static struct A x = {sizeof "hello", "hello”};
>>> 
>>> Should be valid, and x.size represents the number of elements of x.buf.
>>> Both x.size and x.buf are initialized statically.
>>> 
 you should avoid the replacement in such a static initializer context when
 the FAM reference is to an object with a constant address (if
 .ACCESS_WITH_SIZE would not act as an lvalue whose address is a constant
 expression; if it works fine as a constant-address lvalue, then the
 replacement would be OK).
>>> 
>>> Then if such usage for the “counted_by” is valid, we need to replace the FAM
>>> reference by a call to  .ACCESS_WITH_SIZE as well.
>>> Otherwise the “counted_by” relationship will be lost to the Middle end.
>>> 
>>> With the current definition of .ACCESS_WITH_SIZE
>>> 
>>> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>>> 
>>> Isn’t the PTR (return value of the call) a LVALUE?
>> 
>> You probably want to specify that when a pointer to the array is taken the
>> pointer has to be to the first array element (or do we want to mangle the
>> 'size' accordingly for the instrumentation?).  You also want to specify that
>> the 'size' associated with such pointer is assumed to be unchanging and
>> after changing the size such pointer has to be re-obtained.  Plus that
>> changes to the allocated object/size have to be performed through an
>> lvalue where the containing type and thus the 'counted_by' attribute is
>> visible.  That is,
>> 
>> size_t *s = &a.size;
>> *s = 1;
>> 
>> is invoking undefined behavior, likewise modifying 'buf' (makes it a bit
>> awkward since for example that wouldn't support using posix_memalign
>> for allocation, though aligned_alloc would be fine).
>> 
> I believe Qing's original documentation for counted_by makes the
> relationship between 'size' and the FAM very clear and that without
> agreement it'll result in undefined behavior. Though it might be best
> to state that in a strong way.

I will update the counted-by documentation with the following additions:

1. The initialization of the size field should be done before the first 
reference to the FAM; and
2. A later reference to the FAM will use the latest value assigned to the size 
field before that reference.

I think adding these two on top of my 

Re: RFC: the proposal to resolve the missing dependency issue for counted_by attribute

2023-11-03 Thread Qing Zhao
Yes, after today’s discussion, I think we agreed on 

1. Passing the size field by reference to .ACCESS_WITH_SIZE as jakub suggested.
2. Then the compiler should be able to always use the latest value of size 
field for the reference to FAM.

As a result, no need to add code for pointer re-obtaining purpose in the source 
code. 

I will update the proposal one more time.

thanks.

Qing

> On Nov 2, 2023, at 8:28 PM, Bill Wendling  wrote:
> 
> On Thu, Nov 2, 2023 at 1:36 PM Qing Zhao  wrote:
>> 
>> Thanks a lot for raising these issues.
>> 
>> If I understand correctly,  the major question we need to answer is:
>> 
>> For the following example: (Jakub mentioned this  in an early message)
>> 
>>  1 struct S { int a; char b __attribute__((counted_by (a))) []; };
>>  2 struct S s;
>>  3 s.a = 5;
>>  4 char *p = &s.b[2];
>>  5 int i1 = __builtin_dynamic_object_size (p, 0);
>>  6 s.a = 3;
>>  7 int i2 = __builtin_dynamic_object_size (p, 0);
>> 
>> Should the 2nd __bdos call (line 7) get
>>A. the latest value of s.a (line 6) for it’s size?
>> Or  B. the value when the s.b was referenced (line 3, line 4)?
>> 
> I personally think it should be (A). The user is specifically
> indicating that the size has somehow changed, and the compiler should
> behave accordingly.
> 
>> A should be more convenient for the user to use the dynamic array feature.
>> With B, the user has to modify the source code (to add code to “re-obtain”
>> the pointer after the size was adjusted at line 6) as mentioned by Richard.
>> 
>> This depends on how we design the new internal function .ACCESS_WITH_SIZE
>> 
>> 1. Size is passed by value to .ACCESS_WITH_SIZE as we currently designed.
>> 
>> PTR = .ACCESS_WITH_SIZE (PTR, SIZE, ACCESS_MODE)
>> 
>> 2. Size is passed by reference to .ACCESS_WITH_SIZE as Jakub suggested.
>> 
>> PTR = .ACCESS_WITH_SIZE(PTR, &SIZE, TYPEOFSIZE, ACCESS_MODE)
>> 
>> With 1, We can only provide B, the user needs to modify the source code to 
>> get the full feature of dynamic array;
>> With 2, We can provide  A, the user will get full support to the dynamic 
>> array without restrictions in the source code.
>> 
> My understanding of ACCESS_WITH_SIZE is that it's there to add an
> explicit reference to SIZE so that the optimizers won't reorder the
> code incorrectly. If that's the case, then it should act as if
> ACCESS_WITH_SIZE wasn't even there (i.e. it's just a pointer
> dereference into the FAM). We get that with (2) it appears. It would
> be a major headache to make the user go throughout their code base to
> ensure that SIZE was either unmodified, or if it was that extra code
> must be added to ensure the expected behavior.
> 
>> However, We have to pay additional cost for supporting A by using 2, which 
>> includes:
>> 
>> 1. .ACCESS_WITH_SIZE will become an escape point, which will further impact 
>> the IPA optimizations, more runtime overhead.
>>    Then .ACCESS_WITH_SIZE will not be CONST, right? But it will still be PURE?
>> 
>> 2. __builtin_dynamic_object_size will NOT be LEAF anymore.  This will also 
>> impact some IPA optimizations, more runtime overhead.
>> 
>> I think the following are the factors that make the decision:
>> 
>> 1. How big the performance impact?
>> 2. How important the dynamic array feature? Is adding some user restrictions 
>> as Richard mentioned feasible to support this feature?
>> 
>> Maybe we can implement 1 first, if the full support to the dynamic array is 
>> needed, we can add 2 then?
>> Or, we can implement both, and compare the performance difference, then 
>> decide?
>> 
>> Qing
>> 



[ARC PATCH] Provide a TARGET_FOLD_BUILTIN target hook.

2023-11-03 Thread Roger Sayle

This patch implements an arc_fold_builtin target hook to allow ARC
builtins to be folded at the tree level.  Currently this function
converts __builtin_arc_swap into a LROTATE_EXPR at the tree level,
and evaluates __builtin_arc_norm and __builtin_arc_normw of integer
constant arguments at compile time.  Because ARC_BUILTIN_SWAP is
now handled at the tree level, UNSPEC_ARC_SWAP is no longer used,
allowing it and the "swap" define_insn to be removed.

An example benefit of folding things at compile-time is that
calling __builtin_arc_swap on the result of __builtin_arc_swap
now eliminates both and generates no code, and likewise calling
__builtin_arc_swap of a constant integer argument is evaluated
at compile-time.

Tested with a cross-compiler to arc-linux hosted on x86_64,
with no new (compile-only) regressions from make -k check.
Ok for mainline if this passes Claudiu's nightly testing?


2023-11-03  Roger Sayle  

gcc/ChangeLog
* config/arc/arc.cc (TARGET_FOLD_BUILTIN): Define to
arc_fold_builtin.
(arc_fold_builtin): New function.  Convert ARC_BUILTIN_SWAP
into a rotate.  Evaluate ARC_BUILTIN_NORM and
ARC_BUILTIN_NORMW of constant arguments.
* config/arc/arc.md (UNSPEC_ARC_SWAP): Delete.
(normw): Make output template/assembler whitespace consistent.
(swap): Remove define_insn, only use of SWAP UNSPEC.
* config/arc/builtins.def: Tweak indentation.
(SWAP): Expand using rotlsi2_cnt16 instead of using swap.

gcc/testsuite/ChangeLog
* gcc.target/arc/builtin_norm-1.c: New test case.
* gcc.target/arc/builtin_norm-2.c: Likewise.
* gcc.target/arc/builtin_normw-1.c: Likewise.
* gcc.target/arc/builtin_normw-2.c: Likewise.
* gcc.target/arc/builtin_swap-1.c: Likewise.
* gcc.target/arc/builtin_swap-2.c: Likewise.
* gcc.target/arc/builtin_swap-3.c: Likewise.


Thanks in advance,
Roger
--

diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
index e209ad2..70ee410 100644
--- a/gcc/config/arc/arc.cc
+++ b/gcc/config/arc/arc.cc
@@ -643,6 +643,9 @@ static rtx arc_legitimize_address_0 (rtx, rtx, machine_mode 
mode);
 #undef  TARGET_EXPAND_BUILTIN
 #define TARGET_EXPAND_BUILTIN arc_expand_builtin
 
+#undef  TARGET_FOLD_BUILTIN
+#define TARGET_FOLD_BUILTIN arc_fold_builtin
+
 #undef  TARGET_BUILTIN_DECL
 #define TARGET_BUILTIN_DECL arc_builtin_decl
 
@@ -7048,6 +7051,46 @@ arc_expand_builtin (tree exp,
 return const0_rtx;
 }
 
+/* Implement TARGET_FOLD_BUILTIN.  */
+
+static tree
+arc_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED, tree *arg,
+  bool ignore ATTRIBUTE_UNUSED)
+{
+  unsigned int fcode = DECL_MD_FUNCTION_CODE (fndecl);
+
+  switch (fcode)
+{
+default:
+  break;
+
+case ARC_BUILTIN_SWAP:
+  return fold_build2 (LROTATE_EXPR, integer_type_node, arg[0],
+  build_int_cst (integer_type_node, 16));
+
+case ARC_BUILTIN_NORM:
+  if (TREE_CODE (arg[0]) == INTEGER_CST
+ && !TREE_OVERFLOW (arg[0]))
+   {
+ wide_int arg0 = wi::to_wide (arg[0], 32);
+ wide_int result = wi::shwi (wi::clrsb (arg0), 32);
+ return wide_int_to_tree (integer_type_node, result);
+   }
+  break;
+
+case ARC_BUILTIN_NORMW:
+  if (TREE_CODE (arg[0]) == INTEGER_CST
+ && !TREE_OVERFLOW (arg[0]))
+   {
+ wide_int arg0 = wi::to_wide (arg[0], 16);
+ wide_int result = wi::shwi (wi::clrsb (arg0), 32);
+ return wide_int_to_tree (integer_type_node, result);
+   }
+  break;
+}
+  return NULL_TREE;
+}
+
 /* Returns true if the operands[opno] is a valid compile-time constant to be
used as register number in the code for builtins.  Else it flags an error
and returns false.  */
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 96ff62d..9e81d13 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -116,7 +116,6 @@
   UNSPEC_TLS_OFF
   UNSPEC_ARC_NORM
   UNSPEC_ARC_NORMW
-  UNSPEC_ARC_SWAP
   UNSPEC_ARC_DIVAW
   UNSPEC_ARC_DIRECT
   UNSPEC_ARC_LP
@@ -4355,8 +4354,8 @@ archs4x, archs4xd"
  (clrsb:HI (match_operand:HI 1 "general_operand" "cL,Cal"]
   "TARGET_NORM"
   "@
-   norm%_ \t%0, %1
-   norm%_ \t%0, %1"
+   norm%_\\t%0,%1
+   norm%_\\t%0,%1"
   [(set_attr "length" "4,8")
(set_attr "type" "two_cycle_core,two_cycle_core")])
 
@@ -4453,18 +4452,6 @@ archs4x, archs4xd"
 [(set_attr "type" "unary")
  (set_attr "length" "20")])
 
-(define_insn "swap"
-  [(set (match_operand:SI  0 "dest_reg_operand" "=w,w,w")
-   (unspec:SI [(match_operand:SI 1 "general_operand" "L,Cal,c")]
-   UNSPEC_ARC_SWAP))]
-  "TARGET_SWAP"
-  "@
-   swap \t%0, %1
-   swap \t%0, %1
-   swap \t%0, %1"
-  [(set_attr "length" "4,8,4")
-   (set_attr "type" "two_cycle_core,two_cycle_core,two_cycle_core")])
-
 (define_insn "divaw"
   [(set (match_operand:SI 0 "dest_reg_operand" "=&w,&w,&w")
   

Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++

2023-11-03 Thread Kwok Cheung Yeung

On 17/10/2023 2:12 pm, Tobias Burnus wrote:

C++11 (and C23) attribute do not seem to be properly handled:

[[omp::decl (declare target,indirect(1))]]
int foo(void) { return 5; }
[[omp::decl (declare target indirect)]]
int bar(void) { return 8; }
[[omp::directive (begin declare target,indirect)]];
int baz(void) { return 11; }
[[omp::directive (end declare target)]];

While I get for the last one ("baz"):

__attribute__((omp declare target, omp declare target block, omp declare 
target indirect))


the first two (foo and bar) do not have any attribute; if I remove the
"indirect", I do get "__attribute__((omp declare target))".  Hence, the
omp::decl support seems to partially work.



If we replace the 'indirect' with 'device_type', we get a 'directive 
with only ‘device_type’ clause' error as the affected function has not 
been specified. I have updated the parser so that a similar message is 
emitted if only 'device_type' or 'indirect' clauses are supplied.





The following works - but there is not a testcase for either syntax:

int bar(void) { return 8; }
[[omp::directive(declare target to(bar) , indirect(1))]];
int baz(void) { return 11; }
[[omp::directive ( declare target indirect enter(baz))]];

int bar(void) { return 8; }
#pragma omp declare target to(bar) , indirect(1)
int baz(void) { return 11; }
#pragma omp declare target indirect enter(baz)

(There is one for #pragma + 'to' in gomp/declare-target-indirect-2.c, 
however.)


Side remark: OpenMP 5.2 renamed 'to' to 'enter' (with deprecated alias 'to');
hence, I also use 'enter' above. The current testcases for indirect use
'enter'.

(Not that it should make a difference as the to/enter do work.)



Added to g++.dg/gomp/declare-target-indirect-1.C.


The following seems to work fine, but I think we do not have
a testcase for it ('bar' has no indirect, foo and baz have it):

#pragma omp begin declare target indirect(1)
int foo(void) { return 5; }
#pragma omp begin declare target indirect(0)
int bar(void) { return 8; }
int baz(void) { return 11; }
#pragma omp declare target indirect enter(baz)
#pragma omp end declare target
#pragma omp end declare target



Added to c-c++-common/gomp/declare-target-indirect-2.c.


Possibly affecting other logical flags as well, but I do notice that
gcc but not g++ accepts the following:

#pragma omp begin declare target indirect("abs")
#pragma omp begin declare target indirect(5.5)

g++ shows: error: expected constant integer expression

OpenMP requires 'constant boolean' expr (OpenMP 5.1) or
'expression of logical type','constant' (OpenMP 5.2), where for the 
latter it has:


"The OpenMP *logical type* supports logical variables and expressions in 
any base language.
"[C / C++] Any OpenMP logical expression is a scalar expression. This 
document uses true as
a generic term for a non-zero integer value and false as a generic term 
for an integer value

of zero."

I am not quite sure what to expect here; in terms of C++, [conv.bool]
surely permits those as "Boolean conversions" on prvalues.  For C, I don't
find the wording in the standard, but 'if("abc")' and 'if (5.5)' are
accepted.


I've changed the C++ parser to accept these 'unusual' logical values, 
and modified the wording of the error to require a 'logical' expression.


I notice that the {__builtin_,}GOMP_target_map_indirect_ptr call is
inserted quite late, i.e. in omp-offload.cc.  A dump and also looking at
the *.s files shows that
   __builtin_GOMP_target_map_indirect_ptr / call GOMP_target_map_indirect_ptr
show up not only for the device but also in the host-fallback code.

I think the latter is not required, as a host pointer can be executed
directly on the host - and device -> host pointers, as in
   omp target device(ancestor:1)
do not need to be supported.

Namely the current glossary (here the git version, but OpenMP 5.2 is very
similar); note the "other than the host device":

"indirect device invocation - An indirect call to the device version of a
procedure on a device other than the host device, through a function 
pointer
(C/C++), a pointer to a member function (C++) or a procedure pointer 
(Fortran)

that refers to the host version of the procedure.

Can't we use  #ifdef ACCEL_COMPILER  to optimize the host fallback?

That way, we can also avoid generating the splay-tree on the host
cf. LIBGOMP_OFFLOADED_ONLY.


The GOMP_target_map_indirect_ptr call is now only generated by the accel 
compilers.


FWIW, the splay-tree was not actually being built on the host-side, and 
the host implementation of GOMP_target_map_indirect_ptr just returned 
the pointer unchanged. It is now changed to __builtin_unreachable as the 
calls should no longer be generated in host code.



#pragma omp begin declare target indirect(1) device_type(host)

is accepted but it violates:

OpenMP 5.1: "Restrictions to the declare target directive are as follows:"
"If an indirect clause is present and invoked-by-fptr evaluates to true 
then

the only permitted d

Re: [PATCH v2] Format gotools.sum closer to what DejaGnu does

2023-11-03 Thread rep . dot . nop
On 3 November 2023 14:38:06 CET, Jeff Law  wrote:
>
>
>On 11/3/23 06:54, Maxim Kuvyrkov wrote:

>> gotools/ChangeLog:
>> 
>>  * Makefile.am: Update "Running  ..." output
>>  * Makefile.in: Regenerate.
>OK.  Thanks for running down the differences in the autogenerated  bits.

Many thanks for reinstating regression checking even for the Go stuff, and
hence going the extra 2 or 3 kilometres to fix unnoticed regressions imposed
by different DejaGnu check files/suites and the wart of using pristine
autotools infrastructure.

cheers


Re: [PATCH] Fortran: fix issue with multiple references of a procedure pointer [PR97245]

2023-11-03 Thread Steve Kargl
On Fri, Nov 03, 2023 at 07:56:20PM +0100, Harald Anlauf wrote:
> Dear all,
> 
> this is a rather weird bug with a very simple fix.  If a procedure pointer
> is referenced in a CALL, a symbol was created shadowing the original
> declaration if it was host-associated.  Funnily, this affected only
> references of the procedure pointer after the first CALL.
> 
> Regtested on x86_64-pc-linux-gnu.  OK for mainline?
> 
> Would it be OK to backport this fix to 13-branch?
> 


OK, and yes, you can backport if you want.

-- 
Steve


Re: [PATCH] internal-fn: Add VCOND_MASK_LEN.

2023-11-03 Thread Robin Dapp
> Ah, OK.  IMO it's better to keep the optab operands the same as the IFN
> operands, even if that makes things inconsistent with vcond_mask.
> vcond_mask isn't really a good example to follow, since the operand
> order is not only inconsistent with the IFN, it's also inconsistent
> with the natural if_then_else order.

v4 attached with that changed,  match.pd patterns interleaved as well
as scratch-handling added and VLS modes removed.  Lehua has since pushed
another patch that extends gimple_match_op to 6/7 operands already so
that could be removed as well making the patch even smaller now.

Testsuite on riscv looks good (apart from the mentioned cond_widen...),
still running on aarch64 and x86.  OK if those pass?

Regards
 Robin

Subject: [PATCH v4] internal-fn: Add VCOND_MASK_LEN.

In order to prevent simplification of a COND_OP with degenerate mask
(CONSTM1_RTX) into just an OP in the presence of length masking, this
patch introduces a length-masked analog to VEC_COND_EXPR:
IFN_VCOND_MASK_LEN.

It also adds new match patterns that allow the combination of
unconditional unary, binary and ternary operations with the
VCOND_MASK_LEN into a conditional operation if the target supports it.

gcc/ChangeLog:

PR tree-optimization/111760

* config/riscv/autovec.md (vcond_mask_len_): Add
expander.
* config/riscv/riscv-protos.h (enum insn_type): Add.
* config/riscv/riscv-v.cc (needs_fp_rounding): Add !pred_mov.
* doc/md.texi: Add vcond_mask_len.
* gimple-match-exports.cc (maybe_resimplify_conditional_op):
Create VCOND_MASK_LEN when length masking.
* gimple-match.h (gimple_match_op::gimple_match_op): Always
initialize len and bias.
* internal-fn.cc (vec_cond_mask_len_direct): Add.
(direct_vec_cond_mask_len_optab_supported_p): Add.
(internal_fn_len_index): Add VCOND_MASK_LEN.
(internal_fn_mask_index): Ditto.
* internal-fn.def (VCOND_MASK_LEN): New internal function.
* match.pd: Combine unconditional unary, binary and ternary
operations into the respective COND_LEN operations.
* optabs.def (OPTAB_D): Add vcond_mask_len optab.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-arith-2.c: No vect cost model for
riscv_v.
---
 gcc/config/riscv/autovec.md   | 26 ++
 gcc/config/riscv/riscv-protos.h   |  3 ++
 gcc/config/riscv/riscv-v.cc   |  3 +-
 gcc/doc/md.texi   |  9 
 gcc/gimple-match-exports.cc   | 13 +++--
 gcc/gimple-match.h|  6 ++-
 gcc/internal-fn.cc|  5 ++
 gcc/internal-fn.def   |  2 +
 gcc/match.pd  | 51 +++
 gcc/optabs.def|  1 +
 gcc/testsuite/gcc.dg/vect/vect-cond-arith-2.c |  1 +
 11 files changed, 114 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index cc4c9596bbf..0a5e4ccb54e 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -565,6 +565,32 @@ (define_insn_and_split "vcond_mask_<mode><vm>"
   [(set_attr "type" "vector")]
 )
 
+(define_expand "vcond_mask_len_<mode>"
+  [(match_operand:V 0 "register_operand")
+   (match_operand:<VM> 1 "nonmemory_operand")
+(match_operand:V 2 "nonmemory_operand")
+(match_operand:V 3 "autovec_else_operand")
+(match_operand 4 "autovec_length_operand")
+(match_operand 5 "const_0_operand")]
+  "TARGET_VECTOR"
+  {
+if (satisfies_constraint_Wc1 (operands[1]))
+  riscv_vector::expand_cond_len_unop (code_for_pred_mov (<MODE>mode),
+ operands);
+else
+  {
+   /* The order of then and else is opposite to pred_merge.  */
+   rtx ops[] = {operands[0], operands[3], operands[3], operands[2],
+operands[1]};
+   riscv_vector::emit_nonvlmax_insn (code_for_pred_merge (<MODE>mode),
+ riscv_vector::MERGE_OP_TU,
+ ops, operands[4]);
+  }
+DONE;
+  }
+  [(set_attr "type" "vector")]
+)
+
 ;; -
 ;;  [BOOL] Select based on masks
 ;; -
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index a1be731c28e..0d0ee5effea 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -359,6 +359,9 @@ enum insn_type : unsigned int
   /* For vmerge, no mask operand, no mask policy operand.  */
   MERGE_OP = __NORMAL_OP_TA2 | TERNARY_OP_P,
 
+  /* For vmerge with TU policy.  */
+  MERGE_OP_TU = HAS_DEST_P | HAS_MERGE_P | TERNARY_OP_P | TU_POLICY_P,
+
   /* For vm, no tail policy operand.  */
   COMPARE_OP = __NORMAL_OP_MA | TERNARY_OP_P,
   COMPARE_OP_MU = __MASK_O

[PATCH v4] gcc: Introduce -fhardened

2023-11-03 Thread Marek Polacek
On Thu, Oct 26, 2023 at 05:55:56PM +0200, Richard Biener wrote:
> 
> 
> > Am 24.10.2023 um 21:09 schrieb Marek Polacek :
> > 
> > On Tue, Oct 24, 2023 at 09:22:25AM +0200, Richard Biener wrote:
> >>> On Mon, Oct 23, 2023 at 9:26 PM Marek Polacek  wrote:
> >>> 
> >>> On Thu, Oct 19, 2023 at 02:24:11PM +0200, Richard Biener wrote:
>  Can you see how our
>  primary and secondary targets (+ host OS) behave here?
> >>> 
> >>> That's very reasonable.  I tried to build gcc on Compile Farm 119 (AIX) 
> >>> but
> >>> that fails with:
> >>> 
> >>> ar  -X64 x ../ppc64/libgcc/libgcc_s.a shr.o
> >>> ar: 0707-100 ../ppc64/libgcc/libgcc_s.a does not exist.
> >>> make[2]: *** [/home/polacek/gcc/libgcc/config/rs6000/t-slibgcc-aix:98: 
> >>> all] Error 1
> >>> make[2]: Leaving directory 
> >>> '/home/polacek/x/trunk/powerpc-ibm-aix7.3.1.0/libgcc'
> >>> 
> >>> and I tried Darwin (104) and that fails with
> >>> 
> >>> *** Configuration aarch64-apple-darwin21.6.0 not supported
> >>> 
> >>> Is anyone else able to build gcc on those machines, or test the attached
> >>> patch?
> >>> 
>  I think the
>  documentation should elaborate a bit on expectations for non-Linux/GNU
>  targets, specifically I think the default configuration for a target 
>  should
>  with -fhardened _not_ have any -Whardened diagnostics.  Maybe we can
>  have a testcase for this?
> >>> 
> >>> Sorry, I'm not sure how to test that.  I suppose if -fhardened enables
> >>> something not supported on those systems, and it's something for which
> >>> we have a configure test, then we shouldn't warn.  This is already the
> >>> case for -pie, -z relro, and -z now.
> >> 
> >> I was thinking of
> >> 
> >> /* { dg-do compile } */
> >> /* { dg-additional-options "-fhardened -Whardened" } */
> >> 
> >> int main () {}
> >> 
> >> and excess errors should catch "misconfigurations"?
> > 
> > I see.  fhardened-3.c is basically just like this (-Whardened is on by 
> > default).
> > 
> >>> Should the docs say something like the following for features without
> >>> configure checks?
> >>> 
> >>> @option{-fhardened} can, on certain systems, attempt to enable features
> >>> not supported on that particular system.  In that case, it's possible to
> >>> prevent the warning using the @option{-Wno-hardened} option.
> >> 
> >> Yeah, but ideally
> >> 
> >> @option{-fhardened} can, on certain systems, not enable features not
> >> available on those systems and @option{-Whardened} will not diagnose
> >> those as missing.
> >> 
> >> But I understand it doesn't work like that?
> > 
> > Right.  It will not diagnose missing features if they have a configure
> > check, otherwise it will.  And I don't know if we want a configure check
> > for every feature.  Maybe we can add them in the future if the current
> > patch turns out to be problematical in practice?
> 
> Maybe we can have a switch on known target triples and statically configure 
> based
> on that, eventually even not support -fhardened for targets not listed.  
> That’s certainly easier than detecting the target system features (think of 
> cross compilers)

You mean like the following?  The only difference is the addition of
HAVE_FHARDENED_SUPPORT and updating the tests to only run on gnu/linux
targets.  If other OSs want to use -fhardened, they need to update the
configure test.  Thanks,

Bootstrapped/regtested on x86_64-pc-linux-gnu and
powerpc64le-unknown-linux-gnu.

-- >8 --
In 
I proposed -fhardened, a new umbrella option that enables a reasonable set
of hardening flags.  The read of the room seems to be that the option
would be useful.  So here's a patch implementing that option.

Currently, -fhardened enables:

  -D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
  -D_GLIBCXX_ASSERTIONS
  -ftrivial-auto-var-init=zero
  -fPIE  -pie  -Wl,-z,relro,-z,now
  -fstack-protector-strong
  -fstack-clash-protection
  -fcf-protection=full (x86 GNU/Linux only)

-fhardened will not override options that were specified on the command line
(before or after -fhardened).  For example,

 -D_FORTIFY_SOURCE=1 -fhardened

means that _FORTIFY_SOURCE=1 will be used.  Similarly,

  -fhardened -fstack-protector

will not enable -fstack-protector-strong.

Currently, -fhardened is only supported on GNU/Linux.

In DW_AT_producer it is reflected only as -fhardened; it doesn't expand
to anything.  This patch provides -Whardened, enabled by default, which
warns when -fhardened couldn't enable a particular option.  I think most
often it will say that _FORTIFY_SOURCE wasn't enabled because optimization
were not enabled.

gcc/c-family/ChangeLog:

* c-opts.cc (c_finish_options): Maybe cpp_define _FORTIFY_SOURCE
and _GLIBCXX_ASSERTIONS.

gcc/ChangeLog:

* common.opt (Whardened, fhardened): New options.
* config.in: Regenerate.
* config/bpf/bpf.cc: Include "opts.h".
(bpf_option_override): If flag_stack_protector_set

Re: [PATCH] attribs: Fix ICE with -Wno-attributes= [PR112339]

2023-11-03 Thread Jason Merrill
LGTM but I'd like Marek to approve it.

On Fri, Nov 3, 2023, 3:12 PM Jakub Jelinek  wrote:

> Hi!
>
> The following testcase ICEs, because with -Wno-attributes=foo::no_sanitize
> (but generally any other non-gnu namespace and some gnu well known
> attribute
> name within that other namespace) the FEs don't really parse attribute
> arguments of such attribute, but lookup_attribute_spec is non-NULL with
> NULL handler and such attributes are added to DECL_ATTRIBUTES or
> TYPE_ATTRIBUTES and then when e.g. middle-end does lookup_attribute
> on a particular attribute and expects the attribute to mean something
> and/or have a particular verified arguments, it can crash when seeing
> the foreign attribute in there instead.
>
> The following patch fixes that by never adding ignored attributes
> to DECL_ATTRIBUTES/TYPE_ATTRIBUTES, previously that was the case just
> for attributes in ignored namespace (where lookup_attribute_spec
> returned NULL).  We don't really know anything about those attributes,
> so shouldn't pretend we know something about them, especially when
> the arguments are error_mark_node or NULL instead of something that
> would have been parsed.  And it would be really weird if we normally
> ignore say [[clang::unused]] attribute, but when people use
> -Wno-attributes=clang::unused we actually treated it as gnu::unused.
> All the user asked for is suppress warnings about that attribute being
> unknown.
>
> The first hunk is just playing safe, I'm worried people could
> -Wno-attributes=gnu::
> and get various crashes with known GNU attributes not being actually
> parsed and recorded (or worse e.g. when we tweak standard attributes
> into GNU attributes and we wouldn't add those).
> The -Wno-attributes= documentation says that it suppresses warning about
> unknown attributes, so I think -Wno-attributes=gnu:: should prevent
> warning about say [[gnu::foobarbaz]] attribute, but not about
> [[gnu::unused]] because the latter is a known attribute.
> The routine would return true for any scoped attribute in the ignored
> namespace, with the change it ignores only unknown attributes in ignored
> namespace, known ones in there will be ignored only if they have
> max_length of -2 (e.g. with
> -Wno-attributes=gnu:: -Wno-attributes=gnu::foobarbaz).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2023-11-03  Jakub Jelinek  
>
> PR c/112339
> * attribs.cc (attribute_ignored_p): Only return true for
> attr_namespace_ignored_p if as is NULL.
> (decl_attributes): Never add ignored attributes.
>
> * c-c++-common/ubsan/Wno-attributes-1.c: New test.
>
> --- gcc/attribs.cc.jj   2023-11-02 20:22:02.017016371 +0100
> +++ gcc/attribs.cc  2023-11-03 08:45:32.688726738 +0100
> @@ -579,9 +579,9 @@ attribute_ignored_p (tree attr)
>  return false;
>if (tree ns = get_attribute_namespace (attr))
>  {
> -  if (attr_namespace_ignored_p (ns))
> -   return true;
>const attribute_spec *as = lookup_attribute_spec (TREE_PURPOSE
> (attr));
> +  if (as == NULL && attr_namespace_ignored_p (ns))
> +   return true;
>if (as && as->max_length == -2)
> return true;
>  }
> @@ -862,7 +862,10 @@ decl_attributes (tree *node, tree attrib
> }
> }
>
> -  if (no_add_attrs)
> +  if (no_add_attrs
> + /* Don't add attributes registered just for
> -Wno-attributes=foo::bar
> +purposes.  */
> + || attribute_ignored_p (attr))
> continue;
>
>if (spec->handler != NULL)
> --- gcc/testsuite/c-c++-common/ubsan/Wno-attributes-1.c.jj  2023-11-03
> 08:44:13.752837201 +0100
> +++ gcc/testsuite/c-c++-common/ubsan/Wno-attributes-1.c 2023-11-03
> 08:44:13.751837215 +0100
> @@ -0,0 +1,9 @@
> +/* PR c/112339 */
> +/* { dg-do compile { target { c++11 || c } } } */
> +/* { dg-options "-Wno-attributes=foo::no_sanitize -fsanitize=undefined" }
> */
> +/* { dg-additional-options "-std=c2x" { target c } } */
> +
> +[[foo::no_sanitize("bounds")]] void
> +foo (void)
> +{
> +}
>
> Jakub
>
>
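
The ignore logic described in the patch above can be sketched as a small model;
the function and parameter names here are hypothetical, not GCC's actual
attribs.cc code:

```python
def attribute_ignored(namespace_ignored, spec_known, spec_max_length=None):
    """Model of the patched attribute_ignored_p: an attribute is ignored
    when its namespace is on the -Wno-attributes= list AND no spec is
    registered for it (it is unknown), or when its registered spec marks
    it as ignore-only (max_length == -2)."""
    if not spec_known:
        return namespace_ignored
    return spec_max_length == -2
```

With -Wno-attributes=gnu::, an unknown [[gnu::foobarbaz]] is silently ignored,
while a known [[gnu::unused]] is still parsed normally; only an explicitly
registered ignore-only entry suppresses a known attribute.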


[PATCH v1] RISC-V: Remove HF modes of FP to INT rounding autovec

2023-11-03 Thread pan2 . li
From: Pan Li 

The [i|l|ll][rint|round|ceil|floor] internal functions are
defined as DEF_INTERNAL_FLT_FN instead of DEF_INTERNAL_FLT_FLOATN_FN.
Then the *f16 (N=16 of FLOATN) variants of these functions are not
available when trying to get the ifn from the given cfn in
vectorizable_call. Aka:

BUILT_IN_LRINTF16 => IFN_LAST (should be IFN_LRINT here)
BUILT_IN_RINTF16 => IFN_RINT

It is better to remove the FP16-related modes until the additional
middle-end support is ready. This patch cleans up the FP16 modes and
leaves comments explaining why.
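
The lookup failure described above can be modelled roughly as follows; the
suffix tables and the ifn_for helper are illustrative assumptions, not the
actual gcc/internal-fn.def machinery:

```python
# DEF_INTERNAL_FLT_FN only covers the classic f/(none)/l builtin
# variants, while DEF_INTERNAL_FLT_FLOATN_FN additionally covers the
# _FloatN variants such as f16.  A builtin whose suffix is not covered
# maps to IFN_LAST, i.e. "no internal function available".
FLT_SUFFIXES = ("f", "", "l")
FLT_FLOATN_SUFFIXES = FLT_SUFFIXES + ("f16", "f32", "f64", "f128")

def ifn_for(builtin_base, suffix, floatn_capable):
    suffixes = FLT_FLOATN_SUFFIXES if floatn_capable else FLT_SUFFIXES
    return "IFN_" + builtin_base if suffix in suffixes else "IFN_LAST"
```

This is why BUILT_IN_LRINTF16 resolves to IFN_LAST while BUILT_IN_RINTF16
(whose internal function is declared FLOATN-capable) resolves to IFN_RINT.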

gcc/ChangeLog:

* config/riscv/vector-iterators.md: Remove HF modes.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/vector-iterators.md | 59 +---
 1 file changed, 2 insertions(+), 57 deletions(-)

diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index f2d9f60b631..e80eaedc4b3 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -3221,20 +3221,15 @@ (define_mode_attr vnnconvert [
 ;; V_F2SI_CONVERT: (HF, SF, DF) => SI
 ;; V_F2DI_CONVERT: (HF, SF, DF) => DI
 ;;
+;; HF requires additional support from internal function, aka
+;; gcc/internal-fn.def, remove HF shortly until the middle-end is ready.
 (define_mode_attr V_F2SI_CONVERT [
-  (RVVM4HF "RVVM8SI") (RVVM2HF "RVVM4SI") (RVVM1HF "RVVM2SI")
-  (RVVMF2HF "RVVM1SI") (RVVMF4HF "RVVMF2SI")
-
   (RVVM8SF "RVVM8SI") (RVVM4SF "RVVM4SI") (RVVM2SF "RVVM2SI")
   (RVVM1SF "RVVM1SI") (RVVMF2SF "RVVMF2SI")
 
   (RVVM8DF "RVVM4SI") (RVVM4DF "RVVM2SI") (RVVM2DF "RVVM1SI")
   (RVVM1DF "RVVMF2SI")
 
-  (V1HF "V1SI") (V2HF "V2SI") (V4HF "V4SI") (V8HF "V8SI") (V16HF "V16SI")
-  (V32HF "V32SI") (V64HF "V64SI") (V128HF "V128SI") (V256HF "V256SI")
-  (V512HF "V512SI") (V1024HF "V1024SI")
-
   (V1SF "V1SI") (V2SF "V2SI") (V4SF "V4SI") (V8SF "V8SI") (V16SF "V16SI")
   (V32SF "V32SI") (V64SF "V64SI") (V128SF "V128SI") (V256SF "V256SI")
   (V512SF "V512SI") (V1024SF "V1024SI")
@@ -3245,19 +3240,12 @@ (define_mode_attr V_F2SI_CONVERT [
 ])
 
 (define_mode_attr v_f2si_convert [
-  (RVVM4HF "rvvm8si") (RVVM2HF "rvvm4si") (RVVM1HF "rvvm2si")
-  (RVVMF2HF "rvvm1si") (RVVMF4HF "rvvmf2si")
-
   (RVVM8SF "rvvm8si") (RVVM4SF "rvvm4si") (RVVM2SF "rvvm2si")
   (RVVM1SF "rvvm1si") (RVVMF2SF "rvvmf2si")
 
   (RVVM8DF "rvvm4si") (RVVM4DF "rvvm2si") (RVVM2DF "rvvm1si")
   (RVVM1DF "rvvmf2si")
 
-  (V1HF "v1si") (V2HF "v2si") (V4HF "v4si") (V8HF "v8si") (V16HF "v16si")
-  (V32HF "v32si") (V64HF "v64si") (V128HF "v128si") (V256HF "v256si")
-  (V512HF "v512si") (V1024HF "v1024si")
-
   (V1SF "v1si") (V2SF "v2si") (V4SF "v4si") (V8SF "v8si") (V16SF "v16si")
   (V32SF "v32si") (V64SF "v64si") (V128SF "v128si") (V256SF "v256si")
   (V512SF "v512si") (V1024SF "v1024si")
@@ -3268,9 +3256,6 @@ (define_mode_attr v_f2si_convert [
 ])
 
 (define_mode_iterator V_VLS_F_CONVERT_SI [
-  (RVVM4HF "TARGET_ZVFH") (RVVM2HF "TARGET_ZVFH") (RVVM1HF "TARGET_ZVFH")
-  (RVVMF2HF "TARGET_ZVFH") (RVVMF4HF "TARGET_ZVFH && TARGET_MIN_VLEN > 32")
-
   (RVVM8SF "TARGET_VECTOR_ELEN_FP_32") (RVVM4SF "TARGET_VECTOR_ELEN_FP_32")
   (RVVM2SF "TARGET_VECTOR_ELEN_FP_32") (RVVM1SF "TARGET_VECTOR_ELEN_FP_32")
   (RVVMF2SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32")
@@ -3280,18 +3265,6 @@ (define_mode_iterator V_VLS_F_CONVERT_SI [
   (RVVM2DF "TARGET_VECTOR_ELEN_FP_64")
   (RVVM1DF "TARGET_VECTOR_ELEN_FP_64")
 
-  (V1HF "riscv_vector::vls_mode_valid_p (V1HFmode) && TARGET_ZVFH")
-  (V2HF "riscv_vector::vls_mode_valid_p (V2HFmode) && TARGET_ZVFH")
-  (V4HF "riscv_vector::vls_mode_valid_p (V4HFmode) && TARGET_ZVFH")
-  (V8HF "riscv_vector::vls_mode_valid_p (V8HFmode) && TARGET_ZVFH")
-  (V16HF "riscv_vector::vls_mode_valid_p (V16HFmode) && TARGET_ZVFH")
-  (V32HF "riscv_vector::vls_mode_valid_p (V32HFmode) && TARGET_ZVFH && 
TARGET_MIN_VLEN >= 64")
-  (V64HF "riscv_vector::vls_mode_valid_p (V64HFmode) && TARGET_ZVFH && 
TARGET_MIN_VLEN >= 128")
-  (V128HF "riscv_vector::vls_mode_valid_p (V128HFmode) && TARGET_ZVFH && 
TARGET_MIN_VLEN >= 256")
-  (V256HF "riscv_vector::vls_mode_valid_p (V256HFmode) && TARGET_ZVFH && 
TARGET_MIN_VLEN >= 512")
-  (V512HF "riscv_vector::vls_mode_valid_p (V512HFmode) && TARGET_ZVFH && 
TARGET_MIN_VLEN >= 1024")
-  (V1024HF "riscv_vector::vls_mode_valid_p (V1024HFmode) && TARGET_ZVFH && 
TARGET_MIN_VLEN >= 2048")
-
   (V1SF "riscv_vector::vls_mode_valid_p (V1SFmode) && 
TARGET_VECTOR_ELEN_FP_32")
   (V2SF "riscv_vector::vls_mode_valid_p (V2SFmode) && 
TARGET_VECTOR_ELEN_FP_32")
   (V4SF "riscv_vector::vls_mode_valid_p (V4SFmode) && 
TARGET_VECTOR_ELEN_FP_32")
@@ -3317,19 +3290,12 @@ (define_mode_iterator V_VLS_F_CONVERT_SI [
 ])
 
 (define_mode_attr V_F2DI_CONVERT [
-  (RVVM2HF "RVVM8DI") (RVVM1HF "RVVM4DI") (RVVMF2HF "RVVM2DI")
-  (RVVMF4HF "RVVM1DI")
-
   (RVVM4SF "RVVM8DI") (RVVM2SF "RVVM4DI") (RVVM1SF "RVVM2DI")
   (RVVMF2SF "RVVM1DI")
 
   (RVVM8DF "RVVM8DI") (RVVM4DF "RVVM4DI") (RVVM2DF "RVVM2DI")
   (RVVM1DF "RVVM1DI")
 
-  (V1HF "V1DI") (V2HF "V2DI") (V

Re: [PATCH v1] RISC-V: Remove HF modes of FP to INT rounding autovec

2023-11-03 Thread 钟居哲
LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-11-04 09:41
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Remove HF modes of FP to INT rounding autovec

[pushed] diagnostics: convert diagnostic_context to a class

2023-11-03 Thread David Malcolm
This patch:
- converts "struct diagnostic_context" to "class diagnostic_context".
- ensures all data members have an "m_" prefix, except for "printer",
  which has so many uses that renaming would be painful.
- makes most of the data members private
- converts much of the diagnostic_* functions to member functions of
  diagnostic_context, adding compatibility wrappers for users such as
  the Fortran frontend, and making as many as possible private.

No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-5117-g8200cd97c9c57d.
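
The compatibility-wrapper approach mentioned above (behavior moved into member
functions, with free-function wrappers kept for existing callers such as the
Fortran front end) can be sketched as a toy model; the names below are
illustrative, not the actual diagnostic.h API:

```python
class DiagnosticContext:
    """Toy model of the refactor: state becomes m_-prefixed private-ish
    members, behavior becomes member functions."""
    def __init__(self):
        self.m_diagnostic_count = 0

    def report_diagnostic(self, message):
        self.m_diagnostic_count += 1
        return "diagnostic %d: %s" % (self.m_diagnostic_count, message)

def diagnostic_report_diagnostic(context, message):
    """Compatibility wrapper kept for callers that still use the old
    free-function API; it simply delegates to the member function."""
    return context.report_diagnostic(message)
```

The wrappers let the conversion land with no functional change: old call sites
keep compiling while new code can use the member-function interface directly.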

gcc/ChangeLog:
* common.opt (fdiagnostics-text-art-charset=): Remove reference
to diagnostic-text-art.h.
* coretypes.h (struct diagnostic_context): Replace forward decl
with...
(class diagnostic_context): ...this.
* diagnostic-format-json.cc: Update for changes to
diagnostic_context.
* diagnostic-format-sarif.cc: Likewise.
* diagnostic-show-locus.cc: Likewise.
* diagnostic-text-art.h: Deleted file, moving content...
(enum diagnostic_text_art_charset): ...to diagnostic.h,
(DIAGNOSTICS_TEXT_ART_CHARSET_DEFAULT): ...deleting,
(diagnostics_text_art_charset_init): ...deleting in favor of
diagnostic_context::set_text_art_charset.
* diagnostic.cc: Remove include of "diagnostic-text-art.h".
(pedantic_warning_kind): Update for field renaming.
(permissive_error_kind): Likewise.
(permissive_error_option): Likewise.
(diagnostic_initialize): Convert to...
(diagnostic_context::initialize): ...this, updating for field
renamings.
(diagnostic_color_init): Convert to...
(diagnostic_context::color_init): ...this.
(diagnostic_urls_init): Convert to...
(diagnostic_context::urls_init): ...this.
(diagnostic_initialize_input_context): Convert to...
(diagnostic_context::initialize_input_context): ...this.
(diagnostic_finish): Convert to...
(diagnostic_context::finish): ...this, updating for field
renamings.
(diagnostic_context::set_output_format): New.
(diagnostic_context::set_client_data_hooks): New.
(diagnostic_context::create_edit_context): New.
(diagnostic_converted_column): Convert to...
(diagnostic_context::converted_column): ...this.
(diagnostic_get_location_text): Update for field renaming.
(diagnostic_check_max_errors): Convert to...
(diagnostic_context::check_max_errors): ...this, updating for
field renamings.
(diagnostic_action_after_output): Convert to...
(diagnostic_context::action_after_output): ...this, updating for
field renamings.
(last_module_changed_p): Delete.
(set_last_module): Delete.
(includes_seen): Convert to...
(diagnostic_context::includes_seen_p): ...this, updating for field
renamings.
(diagnostic_report_current_module): Convert to...
(diagnostic_context::report_current_module): ...this, updating for
field renamings, and replacing uses of last_module_changed_p and
set_last_module to simple field accesses.
(diagnostic_show_any_path): Convert to...
(diagnostic_context::show_any_path): ...this.
(diagnostic_classify_diagnostic): Convert to...
(diagnostic_context::classify_diagnostic): ...this, updating for
field renamings.
(diagnostic_push_diagnostics): Convert to...
(diagnostic_context::push_diagnostics): ...this, updating for field
renamings.
(diagnostic_pop_diagnostics): Convert to...
(diagnostic_context::pop_diagnostics): ...this, updating for field
renamings.
(get_any_inlining_info): Convert to...
(diagnostic_context::get_any_inlining_info): ...this, updating for
field renamings.
(update_effective_level_from_pragmas): Convert to...
(diagnostic_context::update_effective_level_from_pragmas):
...this, updating for field renamings.
(print_any_cwe): Convert to...
(diagnostic_context::print_any_cwe): ...this.
(print_any_rules): Convert to...
(diagnostic_context::print_any_rules): ...this.
(print_option_information): Convert to...
(diagnostic_context::print_option_information): ...this, updating
for field renamings.
(diagnostic_enabled): Convert to...
(diagnostic_context::diagnostic_enabled): ...this, updating for
field renamings.
(warning_enabled_at): Convert to...
(diagnostic_context::warning_enabled_at): ...this.
(diagnostic_report_diagnostic): Convert to...
(diagnostic_context::report_diagnostic): ...this, updating for
field renamings and conversions to member functions.
(diagnostic_append_note): Update for field renaming.
(diagnostic_impl): Use diagnos

RE: [PATCH v1] RISC-V: Remove HF modes of FP to INT rounding autovec

2023-11-03 Thread Li, Pan2
Committed, thanks Juzhe.

Pan

From: 钟居哲 
Sent: Saturday, November 4, 2023 9:43 AM
To: Li, Pan2 ; gcc-patches 
Cc: Li, Pan2 ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Remove HF modes of FP to INT rounding autovec

LGTM.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-11-04 09:41
To: gcc-patches
CC: juzhe.zhong; 
pan2.li; 
yanzhang.wang; 
kito.cheng
Subject: [PATCH v1] RISC-V: Remove HF modes of FP to INT rounding autovec