Re: [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection

2024-05-28 Thread Richard Biener
On Fri, May 24, 2024 at 10:46 AM Mariam Arutunian
 wrote:
>
> This patch adds a new compiler pass aimed at identifying naive CRC 
> implementations,
> characterized by the presence of a loop calculating a CRC (polynomial long 
> division).
> Upon detection of a potential CRC, the pass prints an informational message.
>
> The CRC optimization is performed when the optimization level is >= 2,
> except when optimizing for size or when -fno-gimple-crc-optimization is given.
>
> This pass is added for the detection and optimization of naive CRC 
> implementations,
> improving the efficiency of CRC-related computations.
>
> This patch includes only the initial fast checks for filtering out non-CRCs;
> the verification of detected candidate CRCs and the optimization itself will
> be provided in subsequent patches.

Just a few quick questions - I'm waiting for a revision with Jeff's comments
cleared before having a closer look.  The patch does nothing but analyze
right now, correct?  I assume a later patch will fill in stuff in ::execute
and use the return value of loop_may_calculate_crc (it's a bit odd to review
such a "split" thing).

I think what this does fits final value replacement, which lives in
tree-scalar-evolution.cc and works from the loop-closed PHIs, trying to
replace those.  I'm not sure we want to have a separate pass for this.
Consider a loop calculating two or four CRCs in parallel, replacing LC PHIs
one-by-one should be able to handle this.

Richard.
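
For readers unfamiliar with the pattern under discussion, the following is a
minimal sketch of the kind of naive bitwise CRC loop such a pass would have to
recognize, plus a loop that advances two CRCs in parallel (the case mentioned
above).  The function names, the CRC-8 polynomial 0x07 and the zero initial
value are illustrative assumptions and are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Naive MSB-first CRC-8: one polynomial long-division step per bit.
   Polynomial 0x07 and initial value 0 are illustrative choices.  */
uint8_t
crc8_naive (const uint8_t *data, size_t len)
{
  uint8_t crc = 0;
  for (size_t i = 0; i < len; i++)
    {
      crc ^= data[i];
      for (int bit = 0; bit < 8; bit++)
        crc = (crc & 0x80) ? (uint8_t) ((crc << 1) ^ 0x07)
                           : (uint8_t) (crc << 1);
    }
  return crc;
}

/* Two independent CRCs advanced by the same loop nest.  In loop-closed
   SSA form each result would be expected to leave the loop through its
   own PHI, which is why handling them one by one seems plausible.  */
void
crc8_naive_pair (const uint8_t *a, const uint8_t *b, size_t len,
                 uint8_t *crc_a, uint8_t *crc_b)
{
  uint8_t ca = 0, cb = 0;
  for (size_t i = 0; i < len; i++)
    {
      ca ^= a[i];
      cb ^= b[i];
      for (int bit = 0; bit < 8; bit++)
        {
          ca = (ca & 0x80) ? (uint8_t) ((ca << 1) ^ 0x07)
                           : (uint8_t) (ca << 1);
          cb = (cb & 0x80) ? (uint8_t) ((cb << 1) ^ 0x07)
                           : (uint8_t) (cb << 1);
        }
    }
  *crc_a = ca;
  *crc_b = cb;
}
```

Whether the proposed pass or final value replacement would actually match
either form is exactly the open question in this thread; the sketch only
shows the source-level shape.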

>   gcc/
>
> * Makefile.in (OBJS): Add gimple-crc-optimization.o.
> * common.opt (fgimple-crc-optimization): New option.
> * doc/invoke.texi (-fgimple-crc-optimization): Add documentation.
> * gimple-crc-optimization.cc: New file.
> * gimple.cc (set_phi_stmts_not_visited): New function.
> (set_gimple_stmts_not_visited): Likewise.
> (set_bbs_stmts_not_visited): Likewise.
> * gimple.h (set_gimple_stmts_not_visited): New extern function 
> declaration.
> (set_phi_stmts_not_visited): New extern function declaration.
> (set_bbs_stmts_not_visited): New extern function declaration.
> * opts.cc (default_options_table): Add OPT_fgimple_crc_optimization.
> (enable_fdo_optimizations): Enable gimple-crc-optimization.
> * passes.def (pass_crc_optimization): Add new pass.
> * timevar.def (TV_GIMPLE_CRC_OPTIMIZATION): New timevar.
> * tree-pass.h (make_pass_crc_optimization): New extern function 
> declaration.
>
> Signed-off-by: Mariam Arutunian 


Re: [PATCHv4] Optab: add isfinite_optab for __builtin_isfinite

2024-05-28 Thread Richard Biener
On Tue, May 28, 2024 at 8:36 AM HAO CHEN GUI  wrote:
>
> Hi,
>   This patch adds an optab for __builtin_isfinite. The finite check can be
> implemented on rs6000 by a single instruction. It needs an optab to be
> expanded to a specific sequence of instructions.
>
>   The subsequent patches will implement the expand on rs6000.
>
>   Compared to previous version, the main change is to specify acceptable
> input and output modes for the optab.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652813.html
>
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?

OK.

> Thanks
> Gui Haochen
>
> ChangeLog
> optab: Add isfinite_optab for isfinite builtin
>
> gcc/
> * builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
> for isfinite builtin.
> * optabs.def (isfinite_optab): New.
> * doc/md.texi (isfinite): Document.
>
>
> patch.diff
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index f8d94c4b435..b8432f84020 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>errno_set = true; builtin_optab = ilogb_optab; break;
>  CASE_FLT_FN (BUILT_IN_ISINF):
>builtin_optab = isinf_optab; break;
> -case BUILT_IN_ISNORMAL:
>  case BUILT_IN_ISFINITE:
> +  builtin_optab = isfinite_optab; break;
> +case BUILT_IN_ISNORMAL:
>  CASE_FLT_FN (BUILT_IN_FINITE):
>  case BUILT_IN_FINITED32:
>  case BUILT_IN_FINITED64:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 5730bda80dc..7be0c75baf9 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -8557,6 +8557,12 @@ operand 2, greater than operand 2 or is unordered with 
> operand 2.
>
>  This pattern is not allowed to @code{FAIL}.
>
> +@cindex @code{isfinite@var{m}2} instruction pattern
> +@item @samp{isfinite@var{m}2}
> +Set operand 0 to nonzero if operand 1 is a finite floating point
> +number and to 0 otherwise.  Input mode should be a scalar floating
> +point mode and output mode should be @code{SImode}.
> +
>  @end table
>
>  @end ifset
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..dcd77315c2a 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
>  OPTAB_D (hypot_optab, "hypot$a3")
>  OPTAB_D (ilogb_optab, "ilogb$a2")
>  OPTAB_D (isinf_optab, "isinf$a2")
> +OPTAB_D (isfinite_optab, "isfinite$a2")
>  OPTAB_D (issignaling_optab, "issignaling$a2")
>  OPTAB_D (ldexp_optab, "ldexp$a3")
>  OPTAB_D (log10_optab, "log10$a2")
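
As a small illustration of what the new optab serves, here is a sketch of a
source-level use of __builtin_isfinite.  The wrapper name is hypothetical;
which instructions the call expands to depends on the target and on whether
it provides an isfinite pattern.

```c
#include <stdbool.h>

/* Hypothetical wrapper around the GCC builtin.  The builtin yields a
   nonzero value for finite inputs and 0 for infinities and NaNs.  */
bool
is_finite_double (double x)
{
  return __builtin_isfinite (x) != 0;
}
```

The sketch assumes default floating-point semantics; options such as
-ffinite-math-only allow the compiler to assume the check is always true.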


Re: [PATCHv4] Optab: add isnormal_optab for __builtin_isnormal

2024-05-28 Thread Richard Biener
On Tue, May 28, 2024 at 8:37 AM HAO CHEN GUI  wrote:
>
> Hi,
>   This patch adds an optab for __builtin_isnormal. The normal check can be
> implemented on rs6000 by a single instruction. It needs an optab to be
> expanded to a specific sequence of instructions.
>
>   The subsequent patches will implement the expand on rs6000.
>
>   Compared to previous version, the main change is to specify acceptable
> input and output modes for the optab.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652814.html
>
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?

OK

> Thanks
> Gui Haochen
>
> ChangeLog
> optab: Add isnormal_optab for isnormal builtin
>
> gcc/
> * builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab
> for isnormal builtin.
> * optabs.def (isnormal_optab): New.
> * doc/md.texi (isnormal): Document.
>
>
> patch.diff
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index b8432f84020..ccd57fce522 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2462,6 +2462,7 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>  case BUILT_IN_ISFINITE:
>builtin_optab = isfinite_optab; break;
>  case BUILT_IN_ISNORMAL:
> +  builtin_optab = isnormal_optab; break;
>  CASE_FLT_FN (BUILT_IN_FINITE):
>  case BUILT_IN_FINITED32:
>  case BUILT_IN_FINITED64:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 7be0c75baf9..491cd09c620 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -8563,6 +8563,12 @@ Set operand 0 to nonzero if operand 1 is a finite 
> floating point
>  number and to 0 otherwise.  Input mode should be a scalar floating
>  point mode and output mode should be @code{SImode}.
>
> +@cindex @code{isnormal@var{m}2} instruction pattern
> +@item @samp{isnormal@var{m}2}
> +Set operand 0 to nonzero if operand 1 is a normal floating point
> +number and to 0 otherwise.  Input mode should be a scalar floating
> +point mode and return mode should be @code{SImode}.
> +
>  @end table
>
>  @end ifset
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index dcd77315c2a..3c401fc0b4c 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3")
>  OPTAB_D (ilogb_optab, "ilogb$a2")
>  OPTAB_D (isinf_optab, "isinf$a2")
>  OPTAB_D (isfinite_optab, "isfinite$a2")
> +OPTAB_D (isnormal_optab, "isnormal$a2")
>  OPTAB_D (issignaling_optab, "issignaling$a2")
>  OPTAB_D (ldexp_optab, "ldexp$a3")
>  OPTAB_D (log10_optab, "log10$a2")
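
For comparison with the isfinite case, a sketch of a source-level use of
__builtin_isnormal follows.  The wrapper name is hypothetical, and the
sketch assumes default floating-point semantics (no -ffast-math).

```c
#include <float.h>
#include <stdbool.h>

/* Hypothetical wrapper around the GCC builtin.  "Normal" excludes
   zero, subnormal numbers, infinities and NaNs.  */
bool
is_normal_double (double x)
{
  return __builtin_isnormal (x) != 0;
}
```

DBL_MIN from <float.h> is the smallest positive normal double, so it is a
convenient boundary value when trying this out.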


Re: [PATCH v3] tree-ssa-pre.c/115214(ICE in find_or_generate_expression, at tree-ssa-pre.c:2780): Return NULL_TREE when deal special cases.

2024-05-28 Thread Richard Biener
On Mon, May 27, 2024 at 9:48 AM Jiawei  wrote:
>
> Return NULL_TREE when genop3 is an EXACT_DIV_EXPR.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652641.html
>
> version log v3: remove additional POLY_INT_CST check.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652795.html

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * tree-ssa-pre.cc (create_component_ref_by_pieces_1): New conditions.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/vsetvl/pr115214.c: New test.
>
> ---
>  .../gcc.target/riscv/rvv/vsetvl/pr115214.c| 52 +++
>  gcc/tree-ssa-pre.cc   | 10 ++--
>  2 files changed, 59 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c
> new file mode 100644
> index 000..fce2e9da766
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c
> @@ -0,0 +1,52 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mrvv-vector-bits=scalable -march=rv64gcv -mabi=lp64d -O3 
> -w" } */
> +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
> +
> +#include 
> +
> +static inline __attribute__(()) int vaddq_f32();
> +static inline __attribute__(()) int vload_tillz_f32(int nlane) {
> +  vint32m1_t __trans_tmp_9;
> +  {
> +int __trans_tmp_0 = nlane;
> +{
> +  vint64m1_t __trans_tmp_1;
> +  vint64m1_t __trans_tmp_2;
> +  vint64m1_t __trans_tmp_3;
> +  vint64m1_t __trans_tmp_4;
> +  if (__trans_tmp_0 == 1) {
> +{
> +  __trans_tmp_3 =
> +  __riscv_vslideup_vx_i64m1(__trans_tmp_1, __trans_tmp_2, 1, 2);
> +}
> +__trans_tmp_4 = __trans_tmp_2;
> +  }
> +  __trans_tmp_4 = __trans_tmp_3;
> +  __trans_tmp_9 = __riscv_vreinterpret_v_i64m1_i32m1(__trans_tmp_3);
> +}
> +  }
> +  return vaddq_f32(__trans_tmp_9); /* { dg-error {RVV type 'vint32m1_t' 
> cannot be passed to an unprototyped function} } */
> +}
> +
> +char CFLOAT_add_args[3];
> +const int *CFLOAT_add_steps;
> +const int CFLOAT_steps;
> +
> +__attribute__(()) void CFLOAT_add() {
> +  char *b_src0 = &CFLOAT_add_args[0], *b_src1 = &CFLOAT_add_args[1],
> +   *b_dst = &CFLOAT_add_args[2];
> +  const float *src1 = (float *)b_src1;
> +  float *dst = (float *)b_dst;
> +  const int ssrc1 = CFLOAT_add_steps[1] / sizeof(float);
> +  const int sdst = CFLOAT_add_steps[2] / sizeof(float);
> +  const int hstep = 4 / 2;
> +  vfloat32m1x2_t a;
> +  int len = 255;
> +  for (; len > 0; len -= hstep, src1 += 4, dst += 4) {
> +int b = vload_tillz_f32(len);
> +int r = vaddq_f32(a.__val[0], b); /* { dg-error {RVV type 
> '__rvv_float32m1_t' cannot be passed to an unprototyped function} } */
> +  }
> +  for (; len > 0; --len, b_src0 += CFLOAT_steps,
> +  b_src1 += CFLOAT_add_steps[1], b_dst += 
> CFLOAT_add_steps[2])
> +;
> +}
> diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc
> index 75217f5cde1..5cf1968bc26 100644
> --- a/gcc/tree-ssa-pre.cc
> +++ b/gcc/tree-ssa-pre.cc
> @@ -2685,11 +2685,15 @@ create_component_ref_by_pieces_1 (basic_block block, 
> vn_reference_t ref,
>here as the element alignment may be not visible.  See
>PR43783.  Simply drop the element size for constant
>sizes.  */
> -   if (TREE_CODE (genop3) == INTEGER_CST
> +   if ((TREE_CODE (genop3) == INTEGER_CST
> && TREE_CODE (TYPE_SIZE_UNIT (elmt_type)) == INTEGER_CST
> && wi::eq_p (wi::to_offset (TYPE_SIZE_UNIT (elmt_type)),
> -(wi::to_offset (genop3)
> - * vn_ref_op_align_unit (currop
> +(wi::to_offset (genop3) * vn_ref_op_align_unit 
> (currop
> + || (TREE_CODE (genop3) == EXACT_DIV_EXPR
> +   && TREE_CODE (TREE_OPERAND (genop3, 1)) == INTEGER_CST
> +   && operand_equal_p (TREE_OPERAND (genop3, 0), TYPE_SIZE_UNIT 
> (elmt_type))
> +   && wi::eq_p (wi::to_offset (TREE_OPERAND (genop3, 1)),
> +vn_ref_op_align_unit (currop
>   genop3 = NULL_TREE;
> else
>   {
> --
> 2.25.1
>


Re: [PATCHv3] Optab: add isfinite_optab for __builtin_isfinite

2024-05-28 Thread Kewen.Lin
Hi,

on 2024/5/27 20:54, Richard Biener wrote:
> On Mon, May 27, 2024 at 11:37 AM HAO CHEN GUI  wrote:
>>
>> Hi,
>>   This patch adds an optab for __builtin_isfinite. The finite check can be
>> implemented on rs6000 by a single instruction. It needs an optab to be
>> expanded to a specific sequence of instructions.
>>
>>   The subsequent patches will implement the expand on rs6000.
>>
>>   Compared to previous version, the main change is to specify acceptable
>> modes for the optab.
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652170.html
>>
>>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>> regressions. Is this OK for trunk?
>>
>> Thanks
>> Gui Haochen
>>
>> ChangeLog
>> optab: Add isfinite_optab for isfinite builtin
>>
>> gcc/
>> * builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
>> for isfinite builtin.
>> * optabs.def (isfinite_optab): New.
>> * doc/md.texi (isfinite): Document.
>>
>>
>> patch.diff
>> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
>> index f8d94c4b435..b8432f84020 100644
>> --- a/gcc/builtins.cc
>> +++ b/gcc/builtins.cc
>> @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>>errno_set = true; builtin_optab = ilogb_optab; break;
>>  CASE_FLT_FN (BUILT_IN_ISINF):
>>builtin_optab = isinf_optab; break;
>> -case BUILT_IN_ISNORMAL:
>>  case BUILT_IN_ISFINITE:
>> +  builtin_optab = isfinite_optab; break;
>> +case BUILT_IN_ISNORMAL:
>>  CASE_FLT_FN (BUILT_IN_FINITE):
>>  case BUILT_IN_FINITED32:
>>  case BUILT_IN_FINITED64:
>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
>> index 5730bda80dc..67407fad37d 100644
>> --- a/gcc/doc/md.texi
>> +++ b/gcc/doc/md.texi
>> @@ -8557,6 +8557,15 @@ operand 2, greater than operand 2 or is unordered 
>> with operand 2.
>>
>>  This pattern is not allowed to @code{FAIL}.
>>
>> +@cindex @code{isfinite@var{m}2} instruction pattern
>> +@item @samp{isfinite@var{m}2}
>> +Set operand 0 to nonzero if operand 1 is a finite @code{SFmode},
>> +@code{DFmode}, or @code{TFmode} floating point number and to 0
> 
> It should probably say scalar floating-point mode?  But what about the result?
> Is any integer mode OK?  That's esp. important if this might be used on
> vector modes.
> 
>> +otherwise.
>> +
>> +If this pattern @code{FAIL}, a call to the library function
>> +@code{isfinite} is used.
> 
> Or it's otherwise inline expanded?  Or does this imply targets
> have to make sure to implement the pattern when isfinite is
> not available in libc/libm?  I suggest to leave this sentence out,
> we usually only say when a pattern may _not_ FAIL (and usually
> FAILing isn't different from not providing a pattern).

As in Haochen's previous reply, I think there are three cases:
  1) no optab defined, fold in a generic way;
  2) optab defined, SUCC, expand as what it defines;
  3) optab defined, FAIL, generate a library call;

From the above, my concern was that ports may assume FAILing can
fall back to the generic folding, but that is not actually the case.
Does your comment imply ports usually don't make such an assumption
(or that they just check what happens for FAIL)?

BR,
Kewen
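
The three cases listed above can be written down as a toy decision function.
This is only a model of the behaviour as described in this mail, not GCC
code; every name in it is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical labels for the three outcomes described in the mail.  */
enum expansion_kind
{
  EXPAND_GENERIC_FOLD,  /* 1) no optab defined */
  EXPAND_PATTERN,       /* 2) optab defined and the expander succeeds */
  EXPAND_LIBCALL        /* 3) optab defined but the expander FAILs */
};

/* Toy model: note that a FAIL does not fall back to generic folding.  */
enum expansion_kind
choose_expansion (bool optab_defined, bool pattern_succeeds)
{
  if (!optab_defined)
    return EXPAND_GENERIC_FOLD;
  return pattern_succeeds ? EXPAND_PATTERN : EXPAND_LIBCALL;
}
```

The point of the model is the last line: once a pattern exists, a FAIL leads
to a library call rather than back to case 1.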

> 
>>  @end table
>>
>>  @end ifset
>> diff --git a/gcc/optabs.def b/gcc/optabs.def
>> index ad14f9328b9..dcd77315c2a 100644
>> --- a/gcc/optabs.def
>> +++ b/gcc/optabs.def
>> @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
>>  OPTAB_D (hypot_optab, "hypot$a3")
>>  OPTAB_D (ilogb_optab, "ilogb$a2")
>>  OPTAB_D (isinf_optab, "isinf$a2")
>> +OPTAB_D (isfinite_optab, "isfinite$a2")
>>  OPTAB_D (issignaling_optab, "issignaling$a2")
>>  OPTAB_D (ldexp_optab, "ldexp$a3")
>>  OPTAB_D (log10_optab, "log10$a2")





Re: [PATCH 6/7] OpenMP: Fortran front-end support for dispatch + adjust_args

2024-05-28 Thread Tobias Burnus

Hi PA, hi all,

two remarks while quickly browsing the code:

Paul-Antoine Arras:

+ if (n->sym->ts.type != BT_DERIVED
+ || !n->sym->ts.u.derived->ts.is_iso_c)
+   {
+ gfc_error ("argument list item %qs in "
+"% at %L must be of "
+"TYPE(C_PTR)",
+n->sym->name, &n->where);


I think you need to rule out 'c_funptr' as well, e.g. via:

|| (n->sym->ts.u.derived->intmod_sym_id
!= ISOCBINDING_PTR)))

I do note that in openmp.cc, we have one check which checks explicitly 
for c_ptr and one existing one which only checks for (c_ptr or 
c_funptr); can you fix that one as well?


* * *

But I mainly miss an update to 'module.cc' for the 'declare variant'
change; the 'adjust_args' (for 'need_device_ptr', only) list items have
to be saved in the .mod file - otherwise the following will not work:

-aux.f90
! { dg-do compile { target skip-all-targets } }
module my_mod
  ...
  !$omp declare variant ... adjust_args(need_device_ptr: ...)
  ...
end module

.f90
{ dg-do ...
! { dg-additional-sources -aux.f90 }
  ...
  call 
  ...
  !$omp dispatch
   call 
end


For C++ modules, it should be fine, as for those the tree is dumped.

Tobias


Re: [PATCH v2] object lifetime instrumentation for Valgrind [PR66487]

2024-05-28 Thread Richard Biener
On Wed, May 15, 2024 at 12:59 PM Alexander Monakov  wrote:
>
>
> Hello,
>
> I'd like to ask if anyone has any new thoughts on this patch.
>
> Let me also point out that valgrind/memcheck.h is permissively
> licensed (BSD-style, rest of Valgrind is GPLv2), with the intention
> to allow importing into projects that are interested in using
> client requests without build-time dependency on installed headers.
> So maybe we have that as an option too.

Inlining the VALGRIND_DO_CLIENT_REQUEST_EXPR would be a lot
cheaper and would not add to the libgcc ABI.  I would guess the
valgrind "ABI" for these is practically fixed but of course architecture
dependent.

I do like the feature in general.
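
To make the mechanism concrete, here is a hand-written sketch of what the
instrumentation amounts to: marking an object's storage as undefined at the
start and at the end of its lifetime via a Memcheck client request.
VALGRIND_MAKE_MEM_UNDEFINED is the real macro from valgrind/memcheck.h; the
no-op fallback, the struct and the function names are assumptions made so
that the sketch builds without Valgrind installed.

```c
#include <stddef.h>

/* Use the real client-request macro when the Valgrind header is
   available; otherwise fall back to a no-op (an assumption of this
   sketch, so it still compiles without Valgrind).  */
#if defined(__has_include)
# if __has_include(<valgrind/memcheck.h>)
#  include <valgrind/memcheck.h>
# endif
#endif
#ifndef VALGRIND_MAKE_MEM_UNDEFINED
# define VALGRIND_MAKE_MEM_UNDEFINED(addr, len) \
  ((void) (addr), (void) (len), 0)
#endif

/* Hypothetical object type standing in for a C++ class.  */
struct widget
{
  int id;
};

/* Plays the role of a constructor: whatever was stored before the
   lifetime begins is marked undefined, then the fields are set.  */
void
widget_begin_lifetime (struct widget *w, int id)
{
  (void) VALGRIND_MAKE_MEM_UNDEFINED (w, sizeof *w);
  w->id = id;
}

/* Plays the role of a destructor: Memcheck treats the bytes as
   undefined afterwards, so later uses of values read from them can
   be reported.  */
void
widget_end_lifetime (struct widget *w)
{
  (void) VALGRIND_MAKE_MEM_UNDEFINED (w, sizeof *w);
}
```

Outside Valgrind the client request is designed to be cheap and to have no
visible effect, which is what makes emitting it inline attractive.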

> Alexander
>
> On Fri, 22 Dec 2023, Alexander Monakov wrote:
>
> > From: Daniil Frolov 
> >
> > PR 66487 is asking to provide sanitizer-like detection for C++ object
> > lifetime violations that are worked around with -fno-lifetime-dse or
> > -flifetime-dse=1 in Firefox, LLVM (PR 106943), OpenJade (PR 69534).
> >
> > The discussion in the PR was centered around extending MSan, but MSan
> > was not ported to GCC (and requires rebuilding everything with
> > instrumentation).
> >
> > Instead, allow Valgrind to see lifetime boundaries by emitting client
> > requests along *this = { CLOBBER }.  The client request marks the
> > "clobbered" memory as undefined for Valgrind; clobbering assignments
> > mark the beginning of ctor and end of dtor execution for C++ objects.
> > Hence, attempts to read object storage after the destructor, or
> > "pre-initialize" its fields prior to the constructor will be caught.
> >
> > Valgrind client requests are offered as macros that emit inline asm.
> > For use in code generation, let's wrap them as libgcc builtins.
> >
> > gcc/ChangeLog:
> >
> >   * Makefile.in (OBJS): Add gimple-valgrind-interop.o.
> >   * builtins.def (BUILT_IN_VALGRIND_MAKE_UNDEFINED): New.
> >   * common.opt (-fvalgrind-annotations): New option.
> >   * doc/install.texi (--enable-valgrind-interop): Document.
> >   * doc/invoke.texi (-fvalgrind-annotations): Document.
> >   * passes.def (pass_instrument_valgrind): Add.
> >   * tree-pass.h (make_pass_instrument_valgrind): Declare.
> >   * gimple-valgrind-interop.cc: New file.
> >
> > libgcc/ChangeLog:
> >
> >   * Makefile.in (LIB2ADD_ST): Add valgrind-interop.c.
> >   * config.in: Regenerate.
> >   * configure: Regenerate.
> >   * configure.ac (--enable-valgrind-interop): New flag.
> >   * libgcc2.h (__gcc_vgmc_make_mem_undefined): Declare.
> >   * valgrind-interop.c: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * g++.dg/valgrind-annotations-1.C: New test.
> >   * g++.dg/valgrind-annotations-2.C: New test.
> >
> > Co-authored-by: Alexander Monakov 
> > ---
> > Changes in v2:
> >
> > * Take new clobber kinds into account.
> > * Do not link valgrind-interop.o into libgcc_s.so.
> >
> >  gcc/Makefile.in   |   1 +
> >  gcc/builtins.def  |   3 +
> >  gcc/common.opt|   4 +
> >  gcc/doc/install.texi  |   5 +
> >  gcc/doc/invoke.texi   |  27 
> >  gcc/gimple-valgrind-interop.cc| 125 ++
> >  gcc/passes.def|   1 +
> >  gcc/testsuite/g++.dg/valgrind-annotations-1.C |  22 +++
> >  gcc/testsuite/g++.dg/valgrind-annotations-2.C |  12 ++
> >  gcc/tree-pass.h   |   1 +
> >  libgcc/Makefile.in|   3 +
> >  libgcc/config.in  |   6 +
> >  libgcc/configure  |  22 ++-
> >  libgcc/configure.ac   |  15 ++-
> >  libgcc/libgcc2.h  |   2 +
> >  libgcc/valgrind-interop.c |  40 ++
> >  16 files changed, 287 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/gimple-valgrind-interop.cc
> >  create mode 100644 gcc/testsuite/g++.dg/valgrind-annotations-1.C
> >  create mode 100644 gcc/testsuite/g++.dg/valgrind-annotations-2.C
> >  create mode 100644 libgcc/valgrind-interop.c
> >
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > index 9373800018..d027548203 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -1507,6 +1507,7 @@ OBJS = \
> >   gimple-ssa-warn-restrict.o \
> >   gimple-streamer-in.o \
> >   gimple-streamer-out.o \
> > + gimple-valgrind-interop.o \
> >   gimple-walk.o \
> >   gimple-warn-recursion.o \
> >   gimplify.o \
> > diff --git a/gcc/builtins.def b/gcc/builtins.def
> > index f03df32f98..b05e20e062 100644
> > --- a/gcc/builtins.def
> > +++ b/gcc/builtins.def
> > @@ -1194,6 +1194,9 @@ DEF_GCC_BUILTIN (BUILT_IN_LINE, "LINE", BT_FN_INT, 
> > ATTR_NOTHROW_LEAF_LIST)
> >  /* Control Flow Redundancy hardening out-of-line checker.  */
> >  DEF_BUILTIN_STUB (BUILT_IN___HARDCFR_CHECK, "__builti

Re: [PATCH v9 2/5] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.

2024-05-28 Thread Richard Biener
On Fri, Apr 12, 2024 at 3:54 PM Qing Zhao  wrote:
>
> Including the following changes:
> * The definition of the new internal function .ACCESS_WITH_SIZE
>   in internal-fn.def.
> * C FE converts every reference to a FAM with a "counted_by" attribute
>   to a call to the internal function .ACCESS_WITH_SIZE.
>   (build_component_ref in c-typeck.cc)
>
>   This includes the case when the object is statically allocated and
>   initialized.
>   In order to make this work, the routines initializer_constant_valid_p_1
>   and output_constant in varasm.cc are updated to handle calls to
>   .ACCESS_WITH_SIZE.
>   (initializer_constant_valid_p_1 and output_constant in varasm.cc)
>
>   However, for the reference inside "offsetof", the "counted_by" attribute is
>   ignored since it's not useful at all.
>   (c_parser_postfix_expression in c/c-parser.cc)
>
>   In addition to "offsetof", for references inside the operators "typeof" and
>   "alignof", we ignore the counted_by attribute too.
>
>   When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE,
>   replace the call with its first argument.
>
> * Convert every call to .ACCESS_WITH_SIZE to its first argument.
>   (expand_ACCESS_WITH_SIZE in internal-fn.cc)
> * Adjust alias analysis to exclude the new internal from clobbering anything.
>   (ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in 
> tree-ssa-alias.cc)
> * Adjust dead code elimination to eliminate the call to .ACCESS_WITH_SIZE when
>   its LHS is eliminated as dead code.
>   (eliminate_unnecessary_stmts in tree-ssa-dce.cc)
> * Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
>   get the reference from the call to .ACCESS_WITH_SIZE.
>   (is_access_with_size_p and get_ref_from_access_with_size in tree.cc)
>
> gcc/c/ChangeLog:
>
> * c-parser.cc (c_parser_postfix_expression): Ignore the counted-by
> attribute when build_component_ref inside offsetof operator.
> * c-tree.h (build_component_ref): Add one more parameter.
> * c-typeck.cc (build_counted_by_ref): New function.
> (build_access_with_size_for_counted_by): New function.
> (build_component_ref): Check the counted-by attribute and build
> call to .ACCESS_WITH_SIZE.
> (build_unary_op): When building ADDR_EXPR for
> .ACCESS_WITH_SIZE, use its first argument.
> (lvalue_p): Accept call to .ACCESS_WITH_SIZE.
>
> gcc/ChangeLog:
>
> * internal-fn.cc (expand_ACCESS_WITH_SIZE): New function.
> * internal-fn.def (ACCESS_WITH_SIZE): New internal function.
> * tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Special case
> IFN_ACCESS_WITH_SIZE.
> (call_may_clobber_ref_p_1): Special case IFN_ACCESS_WITH_SIZE.
> * tree-ssa-dce.cc (eliminate_unnecessary_stmts): Eliminate the call
> to .ACCESS_WITH_SIZE when its LHS is dead.
> * tree.cc (process_call_operands): Adjust side effect for function
> .ACCESS_WITH_SIZE.
> (is_access_with_size_p): New function.
> (get_ref_from_access_with_size): New function.
> * tree.h (is_access_with_size_p): New prototype.
> (get_ref_from_access_with_size): New prototype.
> * varasm.cc (initializer_constant_valid_p_1): Handle call to
> .ACCESS_WITH_SIZE.
> (output_constant): Handle call to .ACCESS_WITH_SIZE.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/flex-array-counted-by-2.c: New test.
> ---
>  gcc/c/c-parser.cc |  10 +-
>  gcc/c/c-tree.h|   2 +-
>  gcc/c/c-typeck.cc | 128 +-
>  gcc/internal-fn.cc|  35 +
>  gcc/internal-fn.def   |   4 +
>  .../gcc.dg/flex-array-counted-by-2.c  | 112 +++
>  gcc/tree-ssa-alias.cc |   2 +
>  gcc/tree-ssa-dce.cc   |   5 +-
>  gcc/tree.cc   |  25 +++-
>  gcc/tree.h|   8 ++
>  gcc/varasm.cc |  10 ++
>  11 files changed, 331 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
>
> diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
> index c31349dae2ff..a6ed5ac43bb1 100644
> --- a/gcc/c/c-parser.cc
> +++ b/gcc/c/c-parser.cc
> @@ -10850,9 +10850,12 @@ c_parser_postfix_expression (c_parser *parser)
> if (c_parser_next_token_is (parser, CPP_NAME))
>   {
> c_token *comp_tok = c_parser_peek_token (parser);
> +   /* Ignore the counted_by attribute for reference inside
> +  offsetof since the information is not useful at all.  */
> offsetof_ref
>   = build_component_ref (loc, offsetof_ref, comp_tok->value,
> -comp_tok->location, 
> UNKNOWN_LOCATION);
> + 

Re: [PATCH v9 3/5] Use the .ACCESS_WITH_SIZE in builtin object size.

2024-05-28 Thread Richard Biener
On Fri, Apr 12, 2024 at 3:54 PM Qing Zhao  wrote:
>

I have no comments here; if Siddhesh is OK with this, I approve.

> gcc/ChangeLog:
>
> * tree-object-size.cc (access_with_size_object_size): New function.
> (call_object_size): Call the new function.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/builtin-object-size-common.h: Add a new macro EXPECT.
> * gcc.dg/flex-array-counted-by-3.c: New test.
> * gcc.dg/flex-array-counted-by-4.c: New test.
> * gcc.dg/flex-array-counted-by-5.c: New test.
> ---
>  .../gcc.dg/builtin-object-size-common.h   |  11 ++
>  .../gcc.dg/flex-array-counted-by-3.c  |  63 +++
>  .../gcc.dg/flex-array-counted-by-4.c  | 178 ++
>  .../gcc.dg/flex-array-counted-by-5.c  |  48 +
>  gcc/tree-object-size.cc   |  60 ++
>  5 files changed, 360 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-5.c
>
> diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-common.h 
> b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
> index 66ff7cdd953a..b677067c6e6b 100644
> --- a/gcc/testsuite/gcc.dg/builtin-object-size-common.h
> +++ b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
> @@ -30,3 +30,14 @@ unsigned nfails = 0;
>__builtin_abort ();
> \
>  return 0;
> \
>} while (0)
> +
> +#define EXPECT(p, _v) do {   
> \
> +  size_t v = _v; 
> \
> +  if (p == v)
> \
> +__builtin_printf ("ok:  %s == %zd\n", #p, p);
> \
> +  else   
> \
> +{
> \
> +  __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v);
> \
> +  FAIL ();   
> \
> +}
> \
> +} while (0);
> diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c 
> b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
> new file mode 100644
> index ..78f50230e891
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
> @@ -0,0 +1,63 @@
> +/* Test the attribute counted_by and its usage in
> + * __builtin_dynamic_object_size.  */
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +#include "builtin-object-size-common.h"
> +
> +struct flex {
> +  int b;
> +  int c[];
> +} *array_flex;
> +
> +struct annotated {
> +  int b;
> +  int c[] __attribute__ ((counted_by (b)));
> +} *array_annotated;
> +
> +struct nested_annotated {
> +  struct {
> +union {
> +  int b;
> +  float f;
> +};
> +int n;
> +  };
> +  int c[] __attribute__ ((counted_by (b)));
> +} *array_nested_annotated;
> +
> +void __attribute__((__noinline__)) setup (int normal_count, int attr_count)
> +{
> +  array_flex
> += (struct flex *)malloc (sizeof (struct flex)
> ++ normal_count *  sizeof (int));
> +  array_flex->b = normal_count;
> +
> +  array_annotated
> += (struct annotated *)malloc (sizeof (struct annotated)
> + + attr_count *  sizeof (int));
> +  array_annotated->b = attr_count;
> +
> +  array_nested_annotated
> += (struct nested_annotated *)malloc (sizeof (struct nested_annotated)
> ++ attr_count *  sizeof (int));
> +  array_nested_annotated->b = attr_count;
> +
> +  return;
> +}
> +
> +void __attribute__((__noinline__)) test ()
> +{
> +EXPECT(__builtin_dynamic_object_size(array_flex->c, 1), -1);
> +EXPECT(__builtin_dynamic_object_size(array_annotated->c, 1),
> +  array_annotated->b * sizeof (int));
> +EXPECT(__builtin_dynamic_object_size(array_nested_annotated->c, 1),
> +  array_nested_annotated->b * sizeof (int));
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +  setup (10,10);
> +  test ();
> +  DONE ();
> +}
> diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c 
> b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
> new file mode 100644
> index ..20103d58ef51
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
> @@ -0,0 +1,178 @@
> +/* Test the attribute counted_by and its usage in
> +__builtin_dynamic_object_size: what's the correct behavior when the
> +allocation size mismatched with the value of counted_by attribute?
> +We should always use the latest value that is hold by the counted_by
> +field.  */
> +/* { dg-do run } */
> +/* { dg-options "-O -

Re: [PATCH v9 5/5] Add the 6th argument to .ACCESS_WITH_SIZE

2024-05-28 Thread Richard Biener
On Fri, Apr 12, 2024 at 3:55 PM Qing Zhao  wrote:
>
> to carry the TYPE of the flexible array.
>
> Such information is needed during tree-object-size.cc.
>
> We cannot use the result type or the type of the 1st argument
> of the routine .ACCESS_WITH_SIZE to decide the element type
> of the original array due to possible type casting in the
> source code.

OK.  I guess technically an empty CONSTRUCTOR of the array type
would work as well (as an aggregate, it's fine to have it in the call), but a
constant zero pointer might be cheaper to have as it's shared across
multiple calls.

Richard.

> gcc/c/ChangeLog:
>
> * c-typeck.cc (build_access_with_size_for_counted_by): Add the 6th
> argument to .ACCESS_WITH_SIZE.
>
> gcc/ChangeLog:
>
> * tree-object-size.cc (access_with_size_object_size): Use the type
> of the 6th argument for the type of the element.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/flex-array-counted-by-6.c: New test.
> ---
>  gcc/c/c-typeck.cc | 11 +++--
>  gcc/internal-fn.cc|  2 +
>  .../gcc.dg/flex-array-counted-by-6.c  | 46 +++
>  gcc/tree-object-size.cc   | 16 ---
>  4 files changed, 66 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-6.c
>
> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> index ff6685c6c4ba..0ea3b75355a4 100644
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -2640,7 +2640,8 @@ build_counted_by_ref (tree datum, tree subdatum, tree 
> *counted_by_type)
>
> to:
>
> -   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1))
> +   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1,
> +   (TYPE_OF_ARRAY *)0))
>
> NOTE: The return type of this function is the POINTER type pointing
> to the original flexible array type.
> @@ -2652,6 +2653,9 @@ build_counted_by_ref (tree datum, tree subdatum, tree 
> *counted_by_type)
> The 4th argument of the call is a constant 0 with the TYPE of the
> object pointed by COUNTED_BY_REF.
>
> +   The 6th argument of the call is a constant 0 with the pointer TYPE
> +   to the original flexible array type.
> +
>*/
>  static tree
>  build_access_with_size_for_counted_by (location_t loc, tree ref,
> @@ -2664,12 +2668,13 @@ build_access_with_size_for_counted_by (location_t 
> loc, tree ref,
>
>tree call
>  = build_call_expr_internal_loc (loc, IFN_ACCESS_WITH_SIZE,
> -   result_type, 5,
> +   result_type, 6,
> array_to_pointer_conversion (loc, ref),
> counted_by_ref,
> build_int_cst (integer_type_node, 1),
> build_int_cst (counted_by_type, 0),
> -   build_int_cst (integer_type_node, -1));
> +   build_int_cst (integer_type_node, -1),
> +   build_int_cst (result_type, 0));
>/* Wrap the call with an INDIRECT_REF with the flexible array type.  */
>call = build1 (INDIRECT_REF, TREE_TYPE (ref), call);
>SET_EXPR_LOCATION (call, loc);
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index e744080ee670..34e4a4aea534 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -3411,6 +3411,8 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>   1: read_only
>   2: write_only
>   3: read_write
> +   6th argument: A constant 0 with the pointer TYPE to the original flexible
> + array type.
>
> Both the return type and the type of the first argument of this
> function have been converted from the incomplete array type to
> diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c 
> b/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c
> new file mode 100644
> index ..65fa01443d95
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c
> @@ -0,0 +1,46 @@
> +/* Test the attribute counted_by and its usage in
> + * __builtin_dynamic_object_size: when the type of the flexible array member
> + * is casting to another type.  */
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +#include "builtin-object-size-common.h"
> +
> +typedef unsigned short u16;
> +
> +struct info {
> +   u16 data_len;
> +   char data[] __attribute__((counted_by(data_len)));
> +};
> +
> +struct foo {
> +   int a;
> +   int b;
> +};
> +
> +static __attribute__((__noinline__))
> +struct info *setup ()
> +{
> + struct info *p;
> + size_t bytes = 3 * sizeof(struct foo);
> +
> + p = (struct info *)malloc (sizeof (struct info) + bytes);
> + p->data_len = bytes;
> +
> + return p;
> +}
> +
> +static void
> +__attribute__((__noinline__)) report (struct info *p)
> +{
> + struct foo *bar = (struct foo *)p->data;
> + EXPECT(__builtin_dynamic_object_si

[PATCH] i386: Optimize EQ/NE comparison between avx512 kmask and -1.

2024-05-28 Thread Hu, Lin1
Hi all,

This patch aims to achieve EQ/NE comparison between avx512 kmask and -1
by using kortest and checking CF.
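
As a rough scalar model of the idea (a sketch, not the patch itself): KORTEST ORs two mask registers and sets ZF when the result is all zeros and CF when it is all ones, so testing a mask against itself answers "k == -1" from CF alone. The names `kortest_flags`, `kortest16`, `mask_is_all_ones` and `check_kortest_model` below are made up for illustration, and only the 16-bit mask width is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flags of interest after a 16-bit KORTEST, modelled in scalar C.  */
struct kortest_flags
{
  bool zf;	/* Set when (a | b) is all zeros.  */
  bool cf;	/* Set when (a | b) is all ones.  */
};

static struct kortest_flags
kortest16 (uint16_t a, uint16_t b)
{
  uint16_t t = (uint16_t) (a | b);
  struct kortest_flags f = { t == 0, t == UINT16_MAX };
  return f;
}

/* "k == -1" read off the carry flag of "kortest k, k".  */
static bool
mask_is_all_ones (uint16_t k)
{
  return kortest16 (k, k).cf;
}

/* Exhaustively compare the model against a plain comparison with -1.  */
static void
check_kortest_model (void)
{
  for (uint32_t k = 0; k <= UINT16_MAX; k++)
    assert (mask_is_all_ones ((uint16_t) k)
	    == ((uint16_t) k == (uint16_t) -1));
}
```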

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,-m64}. Ok for trunk?

BRs,
Lin

gcc/ChangeLog:

PR target/113609
* config/i386/sse.md
(*kortest_cmp_setcc): New define_insn_and_split.
(*kortest_cmp_jcc): Ditto.

gcc/testsuite/ChangeLog:

PR target/113609
* gcc.target/i386/pr113609-1.c: New test.
* gcc.target/i386/pr113609-2.c: Ditto.
---
 gcc/config/i386/sse.md |  67 +++
 gcc/testsuite/gcc.target/i386/pr113609-1.c | 194 +
 gcc/testsuite/gcc.target/i386/pr113609-2.c | 161 +
 3 files changed, 422 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr113609-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr113609-2.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b59c988fc31..34fd2e4afac 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2201,6 +2201,73 @@ (define_expand "kortest"
  UNSPEC_KORTEST))]
   "TARGET_AVX512F")
 
+;; Optimize cmp + setcc with mask register by kortest + setcc.
+(define_insn_and_split "*kortest_cmp_setcc"
+   [(set (match_operand:QI 0 "nonimmediate_operand" "=qm, qm")
+(match_operator:QI 1 "bt_comparison_operator"
+   [(match_operand:SWI1248_AVX512BWDQ_64 2 "register_operand" "?k, 
")
+(const_int -1)]))
+  (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512BW"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  if (MASK_REGNO_P (REGNO (operands[2])))
+{
+  emit_insn (gen_kortest_ccc (operands[2], operands[2]));
+  operands[4] = gen_rtx_REG (CCCmode, FLAGS_REG);
+}
+  else
+{
+  operands[4] = gen_rtx_REG (CCZmode, FLAGS_REG);
+  emit_insn (gen_rtx_SET (operands[4],
+ gen_rtx_COMPARE (CCZmode,
+  operands[2],
+  constm1_rtx)));
+}
+  ix86_expand_setcc (operands[0],
+GET_CODE (operands[1]),
+operands[4],
+const0_rtx);
+  DONE;
+})
+
+;; Optimize cmp + jcc with mask register by kortest + jcc.
+(define_insn_and_split "*kortest_cmp_jcc"
+   [(set (pc)
+  (if_then_else
+   (match_operator 0 "bt_comparison_operator"
+ [(match_operand:SWI1248_AVX512BWDQ_64 1 "register_operand" "?k, ")
+  (const_int -1)])
+ (label_ref (match_operand 2))
+  (pc)))
+  (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512BW"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  if (MASK_REGNO_P (REGNO (operands[1])))
+{
+  emit_insn (gen_kortest_ccc (operands[1], operands[1]));
+  operands[4] = gen_rtx_REG (CCCmode, FLAGS_REG);
+}
+  else
+{
+  operands[4] = gen_rtx_REG (CCZmode, FLAGS_REG);
+  emit_insn (gen_rtx_SET (operands[4],
+ gen_rtx_COMPARE (CCZmode,
+  operands[1],
+  constm1_rtx)));
+}
+  ix86_expand_branch (GET_CODE (operands[0]),
+ operands[4],
+ const0_rtx,
+ operands[2]);
+  DONE;
+})
+
 (define_insn "kunpckhi"
   [(set (match_operand:HI 0 "register_operand" "=k")
(ior:HI
diff --git a/gcc/testsuite/gcc.target/i386/pr113609-1.c 
b/gcc/testsuite/gcc.target/i386/pr113609-1.c
new file mode 100644
index 000..f0639b8500a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr113609-1.c
@@ -0,0 +1,194 @@
+/* PR target/113609 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64-v4" } */
+/* { dg-final { scan-assembler-not "^cmp" } } */
+/* { dg-final { scan-assembler-not "\[ \\t\]+sete" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "\[ \\t\]+setne" { target { ! ia32 } } } } 
*/
+/* { dg-final { scan-assembler-not "\[ \\t\]+je" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "\[ \\t\]+jne" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "\[ \\t\]+sete" 1 { target { ia32 } } } } 
*/
+/* { dg-final { scan-assembler-times "\[ \\t\]+setne" 1 { target { ia32 } } } 
} */
+/* { dg-final { scan-assembler-times "\[ \\t\]+je" 1 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "\[ \\t\]+jne" 2 { target { ia32 } } } } 
*/
+/* { dg-final { scan-assembler-times "kortest" 12 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "kortest" 17 { target { ! ia32 } } } } */
+
+#include 
+
+unsigned int
+cmp_vector_sete_mask8(__m128i a, __m128i b)
+{
+__mmask8 k = _mm_cmpeq_epi16_mask (a, b);
+if (k == (__mmask8) -1)
+  return 1;
+else
+  return 0;
+}
+
+unsigned int
+cmp_vector_sete_mask16(__m128i a, __m128i b)
+{
+__mmask16 k = _mm_cmpeq_epi8_mask (a, b);
+if (k == (__mmask16) -1)
+  return 1;
+else
+  return 0;
+}
+

[PATCH] Fix SLP reduction neutral op value for pointer reductions

2024-05-28 Thread Richard Biener
When the neutral op is the initial value we might need to convert
it from pointer to integer.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

This shows with the SLP single-lane reduction discovery.

* tree-vect-loop.cc (get_initial_defs_for_reduction): Convert
neutral op to the vector component type.
---
 gcc/tree-vect-loop.cc | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 31abfe047a4..24a1239f016 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5616,7 +5616,14 @@ get_initial_defs_for_reduction (loop_vec_info loop_vinfo,
   /* Get the def before the loop.  In reduction chain we have only
 one initial value.  Else we have as many as PHIs in the group.  */
   if (i >= initial_values.length () || (j > i && neutral_op))
-   op = neutral_op;
+   {
+ if (!useless_type_conversion_p (TREE_TYPE (vector_type),
+ TREE_TYPE (neutral_op)))
+   neutral_op = gimple_convert (&ctor_seq,
+TREE_TYPE (vector_type),
+neutral_op);
+ op = neutral_op;
+   }
   else
{
  if (!useless_type_conversion_p (TREE_TYPE (vector_type),
-- 
2.35.3


[PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-05-28 Thread pan2 . li
From: Pan Li 

This patch adds the middle-end representation for saturating
subtraction, i.e. the result of the sub is set to the minimum on
underflow.  It matches patterns similar to the one below.

SAT_SUB (x, y) => (x - y) & (-(TYPE)(x >= y));

For example for uint8_t, we have

* SAT_SUB (255, 0)   => 255
* SAT_SUB (1, 2) => 0
* SAT_SUB (254, 255) => 0
* SAT_SUB (0, 255)   => 0
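
The uint8_t cases above can be checked with a small standalone sketch. It is only an illustration of the source-level pattern, not of the IFN; `sat_sub_u8`, `sat_sub_u8_branch` and `check_sat_sub_u8` are names invented here.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless form: (x - y) & (-(TYPE)(x >= y)).
   The mask is all-ones when x >= y and zero otherwise.  */
static uint8_t
sat_sub_u8 (uint8_t x, uint8_t y)
{
  return (uint8_t) ((x - y) & (-(uint8_t) (x >= y)));
}

/* Branch form: x >= y ? x - y : 0.  */
static uint8_t
sat_sub_u8_branch (uint8_t x, uint8_t y)
{
  return x >= y ? (uint8_t) (x - y) : 0;
}

/* Check the listed examples, then compare both forms exhaustively.  */
static void
check_sat_sub_u8 (void)
{
  assert (sat_sub_u8 (255, 0) == 255);
  assert (sat_sub_u8 (1, 2) == 0);
  assert (sat_sub_u8 (254, 255) == 0);
  assert (sat_sub_u8 (0, 255) == 0);

  for (unsigned x = 0; x < 256; x++)
    for (unsigned y = 0; y < 256; y++)
      assert (sat_sub_u8 ((uint8_t) x, (uint8_t) y)
	      == sat_sub_u8_branch ((uint8_t) x, (uint8_t) y));
}
```

Both forms compute the same value for every input pair.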

Given below SAT_SUB for uint64

uint64_t sat_sub_u64 (uint64_t x, uint64_t y)
{
  return (x - y) & (- (uint64_t)((x >= y)));
}

Before this patch:
uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
{
  _Bool _1;
  long unsigned int _3;
  uint64_t _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _1 = x_4(D) >= y_5(D);
  _3 = x_4(D) - y_5(D);
  _6 = _1 ? _3 : 0;
  return _6;
;;succ:   EXIT
}

After this patch:
uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
{
  uint64_t _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _6 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
  return _6;
;;succ:   EXIT
}

The following tests were run for this patch:
*. The riscv full regression tests.
*. The x86 bootstrap tests.
*. The x86 full regression tests.

PR target/51492
PR target/112600

gcc/ChangeLog:

* internal-fn.def (SAT_SUB): Add new IFN define for SAT_SUB.
* match.pd: Add new match for SAT_SUB.
* optabs.def (OPTAB_NL): Remove fixed-point for ussub/ssub.
* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_sub): Add
new decl for generated in match.pd.
(build_saturation_binary_arith_call): Add new helper function
to build the gimple call to binary SAT alu.
(match_saturation_arith): Rename from.
(match_unsigned_saturation_add): Rename to.
(match_unsigned_saturation_sub): Add new func to match the
unsigned sat sub.
(math_opts_dom_walker::after_dom_children): Add SAT_SUB matching
try when COND_EXPR.

Signed-off-by: Pan Li 
---
 gcc/internal-fn.def   |  1 +
 gcc/match.pd  | 14 
 gcc/optabs.def|  4 +--
 gcc/tree-ssa-math-opts.cc | 67 +++
 4 files changed, 64 insertions(+), 22 deletions(-)

diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 25badbb86e5..24539716e5b 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -276,6 +276,7 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | 
ECF_NOTHROW, first,
  smulhrs, umulhrs, binary)
 
 DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd, binary)
+DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_SUB, ECF_CONST, first, sssub, ussub, binary)
 
 DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
 DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
diff --git a/gcc/match.pd b/gcc/match.pd
index 024e3350465..3e334533ff8 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3086,6 +3086,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (match (unsigned_integer_sat_add @0 @1)
  (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
 
+/* Unsigned saturation sub, case 1 (branch with gt):
+   SAT_U_SUB = X > Y ? X - Y : 0  */
+(match (unsigned_integer_sat_sub @0 @1)
+ (cond (gt @0 @1) (minus @0 @1) integer_zerop)
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+  && types_match (type, @0, @1))))
+
+/* Unsigned saturation sub, case 2 (branch with ge):
+   SAT_U_SUB = X >= Y ? X - Y : 0.  */
+(match (unsigned_integer_sat_sub @0 @1)
+ (cond (ge @0 @1) (minus @0 @1) integer_zerop)
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+  && types_match (type, @0, @1))))
+
 /* x >  y  &&  x != XXX_MIN  -->  x > y
x >  y  &&  x == XXX_MIN  -->  false . */
 (for eqne (eq ne)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 3f2cb46aff8..bc2611abdc2 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -118,8 +118,8 @@ OPTAB_NX(sub_optab, "sub$F$a3")
 OPTAB_NX(sub_optab, "sub$Q$a3")
 OPTAB_VL(subv_optab, "subv$I$a3", MINUS, "sub", '3', gen_intv_fp_libfunc)
 OPTAB_VX(subv_optab, "sub$F$a3")
-OPTAB_NL(sssub_optab, "sssub$Q$a3", SS_MINUS, "sssub", '3', 
gen_signed_fixed_libfunc)
-OPTAB_NL(ussub_optab, "ussub$Q$a3", US_MINUS, "ussub", '3', 
gen_unsigned_fixed_libfunc)
+OPTAB_NL(sssub_optab, "sssub$a3", SS_MINUS, "sssub", '3', 
gen_signed_fixed_libfunc)
+OPTAB_NL(ussub_optab, "ussub$a3", US_MINUS, "ussub", '3', 
gen_unsigned_fixed_libfunc)
 OPTAB_NL(smul_optab, "mul$Q$a3", MULT, "mul", '3', gen_int_fp_fixed_libfunc)
 OPTAB_NX(smul_optab, "mul$P$a3")
 OPTAB_NX(smul_optab, "mul$F$a3")
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index 62da1c5ee08..4717302b728 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -4087,33 +4087,56 @@ arith_overflow_check_p (gimple *stmt, gimple 
*cast_stmt, gimple *&use_stmt,
 }
 
 extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
+extern bool gimple_unsigned_integer_sat_sub (tree, tree*, tree (*)(tree));
+
+static void
+buil

Re: [PATCH] Match: optimize `a == CST & unary(a)` [PR111487]

2024-05-28 Thread Richard Biener
On Mon, May 13, 2024 at 5:25 PM Andrew Pinski  wrote:
>
> This is an expansion of the optimize `a == CST & a`
> to handle more than just casts. It adds optimization
> for unary.
> The patch for binary operators will come later.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> PR tree-optimization/111487
> gcc/ChangeLog:
>
> * match.pd (tcc_int_unary): New operator list.
> (`a == CST & unary(a)`): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/and-unary-1.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/match.pd| 12 
>  gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c | 61 +
>  2 files changed, 73 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 07e743ae464..3ee28a3d8fc 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -57,6 +57,10 @@ along with GCC; see the file COPYING3.  If not see
>
>  #include "cfn-operators.pd"
>
> +/* integer unary operators that return the same type. */
> +(define_operator_list tcc_int_unary
> + abs absu negate bit_not BSWAP POPCOUNT CTZ CLZ PARITY)
> +

FFS and CLRSB (what's that...) missing at least.  CTZ and friends do
not return the same type as the argument (nor does absu).  So the
comment needs some work and the tcc_int_unary is a bad name unless
the comment means "Unary operators with integer operand and result type"?

>  /* Define operand lists for math rounding functions {,i,l,ll}FN,
> where the versions prefixed with "i" return an int, those prefixed with
> "l" return a long and those prefixed with "ll" return a long long.
> @@ -5451,6 +5455,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>@2
>{ build_zero_cst (type); }))
>
> +/* `(a == CST) & unary(a)` can be simplified to `(a == CST) & unary(CST)`. */
> +(simplify
> + (bit_and:c (convert@2 (eq @0 INTEGER_CST@1))
> +(convert? (tcc_int_unary @3)))
> + (if (bitwise_equal_p (@0, @3))

I know you like it - but the testcase doesn't seem to exercise
bitwise_equal_p here?
(I don't like it too much - it makes matching expensive)

OK with matching @0 to @3 or some arguing from your side (and maybe
testsuite coverage).

> +  (with { tree  inner_type = TREE_TYPE (@3); }
> +   (bit_and @2 (convert (tcc_int_unary (convert:inner_type @1)))

So I suppose the bitwise_equal_p "hides" the check against
truncation/bogus extension of @1?

For (a == INT_MIN) & -a we end up constant folding -INT_MIN which
invokes UB and thus might not actually fold.  As & is not short-circuiting
it's probably "safe" in some sense, but should we worry here?  I was
thinking of doing (tcc_int_unary! (convert:inner_type! @1)) as we expect
to constant fold.
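
For reference, a small C sketch of what the simplification does at the source level, using CST = 1 so that no overflow is involved (the INT_MIN case discussed above is deliberately avoided). The function names are made up for this illustration; the "after" versions are what I would expect once unary(CST) has been constant folded.

```c
#include <assert.h>
#include <stdlib.h>

/* Original form: (a == 1) & abs (a).  */
static int
and_abs_before (int a)
{
  return (a == 1) & abs (a);
}

/* After substituting a = 1 into the unary op: abs (1) folds to 1,
   so only the comparison remains.  */
static int
and_abs_after (int a)
{
  return (a == 1) & 1;
}

/* Original form: (a == 1) & ~a.  */
static int
and_not_before (int a)
{
  return (a == 1) & ~a;
}

/* ~1 is -2, whose low bit is clear, so the whole expression is 0.  */
static int
and_not_after (int a)
{
  (void) a;
  return 0;
}

/* Compare both forms over a small range that stays clear of INT_MIN.  */
static void
check_and_unary (void)
{
  for (int a = -1000; a <= 1000; a++)
    {
      assert (and_abs_before (a) == and_abs_after (a));
      assert (and_not_before (a) == and_not_after (a));
    }
}
```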

> +
>  /* Optimize
> # x_5 in range [cst1, cst2] where cst2 = cst1 + 1
> x_5 == cstN ? cst4 : cst3
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c
> new file mode 100644
> index 000..c157bc11b00
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c
> @@ -0,0 +1,61 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-tree-forwprop1-raw -fdump-tree-optimized-raw" } 
> */
> +/* unary part of PR tree-optimization/111487 */
> +
> +int abs1(int a)
> +{
> +  int b = __builtin_abs(a);
> +  return (a == 1) & b;
> +}
> +int absu1(int a)
> +{
> +  int b;
> +  b = a > 0 ? -a:a;
> +  b = -b;
> +return (a == 1) & b;
> +}
> +
> +int bswap1(int a)
> +{
> +  int b = __builtin_bswap32(a);
> +  return (a == 1) & b;
> +}
> +
> +int ctz1(int a)
> +{
> +  int b = __builtin_ctz(a);
> +  return (a == 1) & b;
> +}
> +int pop1(int a)
> +{
> +  int b = __builtin_popcount(a);
> +  return (a == 1) & b;
> +}
> +int neg1(int a)
> +{
> +  int b = -(a);
> +  return (a == 1) & b;
> +}
> +int not1(int a)
> +{
> +  int b = ~(a);
> +  return (a == 1) & b;
> +}
> +int partity1(int a)
> +{
> +  int b = __builtin_parity(a);
> +  return (a == 1) & b;
> +}
> +
> +
> +/* We should optimize out the unary operator for each.
> +   For ctz we can optimize directly to `return 0`.
> +   For bswap1 and not1, we can do the same but not until after forwprop1.  */
> +/* { dg-final { scan-tree-dump-times "eq_expr, " 7 "forwprop1" } } */
> +/* { dg-final { scan-tree-dump-times "eq_expr, " 5 "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "abs_expr, "  "forwprop1" } } */
> +/* { dg-final { scan-tree-dump-not "absu_expr, "  "forwprop1" } } */
> +/* { dg-final { scan-tree-dump-not "bit_not_expr, "  "forwprop1" } } */
> +/* { dg-final { scan-tree-dump-not "negate_expr, "  "forwprop1" } } */
> +/* { dg-final { scan-tree-dump-not "gimple_call <"  "forwprop1" } } */
> +/* { dg-final { scan-tree-dump-not "bit_and_expr,  "  "forwprop1" } } */
> --
> 2.34.1
>


Re: [PATCH] libstdc++: Fix up 19_diagnostics/stacktrace/hash.cc on 13 branch

2024-05-28 Thread Jonathan Wakely
On Mon, 27 May 2024 at 09:26, Jakub Jelinek  wrote:
>
> Hi!
>
> The r13-8207-g17acf9fbeb10d7adad commit changed some tests to use
> -lstdc++exp instead of -lstdc++_libbacktrace, but it didn't change
> the 19_diagnostics/stacktrace/hash.cc test, presumably because
> when it was added on the trunk, it already had -lstdc++exp and
> it was changed to -lstdc++_libbacktrace only in the
> r13-8067-g16635b89f36c07b9e0 cherry-pick.
>
> The test fails with
> /usr/bin/ld: cannot find -lstdc++_libbacktrace
> collect2: error: ld returned 1 exit status
> compiler exited with status 1
> FAIL: 19_diagnostics/stacktrace/hash.cc (test for excess errors)
> without this (while the library is still built, it isn't added in
> -L options).

Ah yes, because r13-8207-g17acf9fbeb10d7 changed the -L flags used for testing.

I wonder why I didn't see this failure though. It must have found the
lib in an already-installed path.

> Ok for 13 branch?

OK, thanks.


>
> I think the r13-8067 cherry-pick hasn't been applied to 12 branch,
> so we don't need it there.
>
> 2024-05-27  Jakub Jelinek  
>
> * testsuite/19_diagnostics/stacktrace/hash.cc: Adjust
> dg-options to use -lstdc++exp.
>
> --- libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc.jj 2023-11-22 
> 11:03:28.812657550 +0100
> +++ libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc2024-05-27 
> 10:18:44.900058884 +0200
> @@ -1,4 +1,4 @@
> -// { dg-options "-std=gnu++23 -lstdc++_libbacktrace" }
> +// { dg-options "-std=gnu++23 -lstdc++exp" }
>  // { dg-do run { target c++23 } }
>  // { dg-require-effective-target stacktrace }
>
>
>
> Jakub
>



[pushed] wwwdocs: news: Move www.velox-project.eu to https

2024-05-28 Thread Gerald Pfeifer
Not sure why these all are popping up recently...

Anyway: pushed.

Gerald
---
 htdocs/news.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/news.html b/htdocs/news.html
index ca3e7dc5..b7a6f479 100644
--- a/htdocs/news.html
+++ b/htdocs/news.html
@@ -512,7 +512,7 @@ ongoing https://gcc.gnu.org/wiki/TransactionalMemory";>transactional
 memory standard has been added.  Code was contributed by Richard
 Henderson, Aldy Hernandez, and Torvald Riegel, all of Red Hat, Inc.
 The project was partially funded by
-the <a href="http://www.velox-project.eu/">Velox project</a>.  This
+the <a href="https://www.velox-project.eu">Velox project</a>.  This
 feature is experimental and is available for C and C++ on selected
 platforms.
 
-- 
2.45.1


Arm branding (was: [PATCH] doc: Document arm_v8_1m_main_cde_mve_fp)

2024-05-28 Thread Gerald Pfeifer
On Mon, 10 Jul 2023, Kyrylo Tkachov via Gcc-patches wrote:
> I know the GCC source is inconsistent on this but the proper branding 
> these days is "ARM" -> "Arm" and "ARMv8.1-M" -> "Armv8.1-M".

Arm, Red Hat, and SUSE - those three are spelt incorrectly by third 
parties more often than not, it seems. :-(

Is it always Arm now in every context and meaning (outside target 
triplets)?

If so, I'll add a line to our table in codingconventions.html?
If not, can you explain the specifics?

Gerald


Re: [PATCH] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-05-28 Thread Richard Biener
On Fri, May 24, 2024 at 11:27 AM Feng Xue OS
 wrote:
>
> Hi,
>
> The patch was updated with the newest trunk, and also contained some minor 
> changes.
>
> I am working on another new feature which is meant to support pattern 
> recognition
> of lane-reducing operations in affine closure originated from loop reduction 
> variable,
> like:
>
>   sum += cst1 * dot_prod_1 + cst2 * sad_2 + ... + cstN * lane_reducing_op_N
>
> The WIP feature depends on this patch. It has been a while since it was
> posted;
> would you please take the time to review this one? Thanks.

This seems to do multiple things so I wonder if you can split up the
patch a bit?
For example adding lane_reducing_op_p can be split out, it also seems like
the vect_transform_reduction change to better distribute work can be done
separately?  Likewise refactoring like splitting out
vect_reduction_use_partial_vector.

When we have

   sum += d0[i] * d1[i];  // dot-prod 
   sum += w[i];   // widen-sum 
   sum += abs(s0[i] - s1[i]); // sad 
   sum += n[i];   // normal 

the vector DOT_PROD and friend ops can end up mixing different lanes
since it is not specified which lanes are reduced into which output lane.
So, DOT_PROD might combine 0-3, 4-7, ... but SAD might combine
0,4,8,12; 1,5,9,13; ... I think this isn't worse than what one op itself
is doing, but it's worth pointing out (it's probably unlikely a target
mixes different reduction strategies anyway).
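
A scalar sketch of why the mixing is harmless for the final result, assuming 16 input elements reduced into 4 accumulator lanes: one op groups inputs contiguously (0-3, 4-7, ...), the other strided (0,4,8,12; ...). The individual lanes differ, but the sum over all lanes does not. All names here are invented for the illustration and do not correspond to vectorizer code.

```c
#include <assert.h>
#include <stdint.h>

#define N_IN 16
#define N_OUT 4

/* Contiguous grouping: output lane j accumulates inputs 4j .. 4j+3.  */
static void
reduce_contiguous (const int32_t in[N_IN], int32_t acc[N_OUT])
{
  for (int i = 0; i < N_IN; i++)
    acc[i / (N_IN / N_OUT)] += in[i];
}

/* Strided grouping: output lane j accumulates inputs j, j+4, j+8, j+12.  */
static void
reduce_strided (const int32_t in[N_IN], int32_t acc[N_OUT])
{
  for (int i = 0; i < N_IN; i++)
    acc[i % N_OUT] += in[i];
}

/* Final reduction of the accumulator lanes to a scalar.  */
static int32_t
final_sum (const int32_t acc[N_OUT])
{
  int32_t s = 0;
  for (int j = 0; j < N_OUT; j++)
    s += acc[j];
  return s;
}

static void
check_lane_mixing (void)
{
  int32_t a[N_IN], b[N_IN];
  for (int i = 0; i < N_IN; i++)
    {
      a[i] = i + 1;	/* 1 .. 16, sum 136.  */
      b[i] = 2 * i;	/* 0 .. 30, sum 240.  */
    }

  /* Both ops use the same grouping.  */
  int32_t same[N_OUT] = { 0 };
  reduce_contiguous (a, same);
  reduce_contiguous (b, same);

  /* The two ops use different groupings on one accumulator.  */
  int32_t mixed[N_OUT] = { 0 };
  reduce_contiguous (a, mixed);
  reduce_strided (b, mixed);

  /* Per-lane contents differ, the final sum does not.  */
  assert (same[0] != mixed[0]);
  assert (final_sum (same) == final_sum (mixed));
  assert (final_sum (mixed) == 136 + 240);
}
```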

Can you make sure to add at least one SLP reduction example to show
this works for SLP as well?

Richard.

> Feng
> 
>
> gcc/
> PR tree-optimization/114440
> * tree-vectorizer.h (struct _stmt_vec_info): Add a new field
> reduc_result_pos.
> (lane_reducing_op_p): New function.
> (vectorizable_lane_reducing): New function declaration.
> * tree-vect-stmts.cc (vectorizable_condition): Treat the condition
> statement that is pointed by stmt_vec_info of reduction PHI as the
> real "for_reduction" statement.
> (vect_analyze_stmt): Call new function vectorizable_lane_reducing
> to analyze lane-reducing operation.
> * tree-vect-loop.cc (vect_is_emulated_mixed_dot_prod): Remove 
> parameter
> loop_vinfo. Get input vectype from stmt_info instead of reduction PHI.
> (vect_model_reduction_cost): Remove cost computation code related to
> emulated_mixed_dot_prod.
> (vect_reduction_use_partial_vector): New function.
> (vectorizable_lane_reducing): New function.
> (vectorizable_reduction): Allow multiple lane-reducing operations in
> loop reduction. Move some original lane-reducing related code to
> vectorizable_lane_reducing, and move partial vectorization checking
> code to vect_reduction_use_partial_vector.
> (vect_transform_reduction): Extend transformation to support reduction
> statements with mixed input vectypes.
> * tree-vect-slp.cc (vect_analyze_slp): Use new function
> lane_reducing_op_p to check statement code.
>
> gcc/testsuite/
> PR tree-optimization/114440
> * gcc.dg/vect/vect-reduc-chain-1.c
> * gcc.dg/vect/vect-reduc-chain-2.c
> * gcc.dg/vect/vect-reduc-chain-3.c
> * gcc.dg/vect/vect-reduc-dot-slp-1.c
> * gcc.dg/vect/vect-reduc-dot-slp-2.c
> ---
>  .../gcc.dg/vect/vect-reduc-chain-1.c  |  62 ++
>  .../gcc.dg/vect/vect-reduc-chain-2.c  |  77 ++
>  .../gcc.dg/vect/vect-reduc-chain-3.c  |  66 ++
>  .../gcc.dg/vect/vect-reduc-dot-slp-1.c|  97 +++
>  .../gcc.dg/vect/vect-reduc-dot-slp-2.c|  81 +++
>  gcc/tree-vect-loop.cc | 680 --
>  gcc/tree-vect-slp.cc  |   4 +-
>  gcc/tree-vect-stmts.cc|  13 +-
>  gcc/tree-vectorizer.h |  14 +
>  9 files changed, 873 insertions(+), 221 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-slp-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-slp-2.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c 
> b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
> new file mode 100644
> index 000..04bfc419dbd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
> @@ -0,0 +1,62 @@
> +/* Disabling epilogues until we find a better way to deal with scans.  */
> +/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { 
> aarch64*-*-* || arm*-*-* } } } */
> +/* { dg-add-options arm_v8_2a_dotprod_neon }  */
> +
> +#include "tree-vect

Re: [PATCH] vect: Use vect representative statement instead of original in patch recog [PR115060]

2024-05-28 Thread Richard Biener
On Sat, May 25, 2024 at 4:45 PM Feng Xue OS  wrote:
>
> Some utility functions (such as vect_look_through_possible_promotion) that are
> to find out certain kind of direct or indirect definition SSA for a value, may
> return the original one of the SSA, not its pattern representative SSA, even
> pattern is involved. For example,
>
>a = (T1) patt_b;
>patt_b = (T2) c;// b = ...
>patt_c = not-a-cast;// c = ...
>
> Given 'a', the mentioned function will return 'c', instead of 'patt_c'. This
> subtlety would make some pattern recog code that is unaware of it mis-use the
> original instead of the new pattern statement, which is inconsistent with
> processing logic of the pattern formation pass. This patch corrects the issue
> by forcing another utility function (vect_get_internal_def) return the pattern
> statement information to caller by default.
>
> Regression test on x86-64 and aarch64.
>
> Feng
> --
> gcc/
> PR tree-optimization/115060
> * tree-vect-patterns.h (vect_get_internal_def): Add a new parameter
> for_vectorize.
> (vect_widened_op_tree): Call vect_get_internal_def instead of look_def
> to get statement information.
> (vect_recog_widen_abd_pattern): No need to call 
> vect_stmt_to_vectorize.
> ---
>  gcc/tree-vect-patterns.cc | 16 +++-
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index a313dc64643..fa35bf26372 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -258,15 +258,21 @@ vect_element_precision (unsigned int precision)
>  }
>
>  /* If OP is defined by a statement that's being considered for vectorization,
> -   return information about that statement, otherwise return NULL.  */
> +   return information about that statement, otherwise return NULL.
> +   FOR_VECTORIZE is used to specify whether original or vectorization
> +   representative (if have) statement information is returned.  */
>
>  static stmt_vec_info
> -vect_get_internal_def (vec_info *vinfo, tree op)
> +vect_get_internal_def (vec_info *vinfo, tree op, bool for_vectorize = true)

I'm probably blind - but you nowhere pass 'false' and I think returning the
pattern stmt is the correct behavior always.

OK with omitting the new parameter.

>  {
>stmt_vec_info def_stmt_info = vinfo->lookup_def (op);
>if (def_stmt_info
>&& STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_internal_def)
> -return def_stmt_info;
> +{
> +  if (for_vectorize)
> +   def_stmt_info = vect_stmt_to_vectorize (def_stmt_info);
> +  return def_stmt_info;
> +}
>return NULL;
>  }
>
> @@ -655,7 +661,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info 
> stmt_info, tree_code code,
>
>   /* Recursively process the definition of the operand.  */
>   stmt_vec_info def_stmt_info
> -   = vinfo->lookup_def (this_unprom->op);
> +   = vect_get_internal_def (vinfo, this_unprom->op);
> +
>   nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
>widened_code, shift_p, max_nops,
>this_unprom, common_type,
> @@ -1739,7 +1746,6 @@ vect_recog_widen_abd_pattern (vec_info *vinfo, 
> stmt_vec_info stmt_vinfo,
>if (!abd_pattern_vinfo)
>  return NULL;
>
> -  abd_pattern_vinfo = vect_stmt_to_vectorize (abd_pattern_vinfo);
>gcall *abd_stmt = dyn_cast  (STMT_VINFO_STMT (abd_pattern_vinfo));
>if (!abd_stmt


RE: [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber

2024-05-28 Thread Tamar Christina
> -Original Message-
> From: Tamar Christina 
> Sent: Wednesday, May 22, 2024 10:29 AM
> To: Richard Sandiford 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; ktkac...@gcc.gnu.org
> Subject: RE: [PATCH 2/4]AArch64: add new tuning param and attribute for
> enabling conditional early clobber
> 
> >
> > Sorry for the bike-shedding, but how about something like "avoid_pred_rmw"?
> > (I'm open to other suggestions.)  Just looking for something that describes
> > either the architecture or the end result that we want to achieve.
> > And preferable something fairly short :)
> >
> > avoid_* would be consistent with the existing "avoid_cross_loop_fma".
> >
> > > +
> > >  #undef AARCH64_EXTRA_TUNING_OPTION
> > > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> > > index
> >
> bbf11faaf4b4340956094a983f8b0dc2649b2d27..76a18dd511f40ebb58ed12d5
> > 6b46c74084ba7c3c 100644
> > > --- a/gcc/config/aarch64/aarch64.h
> > > +++ b/gcc/config/aarch64/aarch64.h
> > > @@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE =
> > AARCH64_FL_SM_OFF;
> > >  enabled through +gcs.  */
> > >  #define TARGET_GCS (AARCH64_ISA_GCS)
> > >
> > > +/*  Prefer different predicate registers for the output of a predicated 
> > > operation
> > over
> > > +re-using an existing input predicate.  */
> > > +#define TARGET_SVE_PRED_CLOBBER (TARGET_SVE \
> > > +  && (aarch64_tune_params.extra_tuning_flags \
> > > +  &
> > AARCH64_EXTRA_TUNE_EARLY_CLOBBER_SVE_PRED_DEST))
> > >
> > >  /* Standard register usage.  */
> > >
> > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > > index
> >
> dbde066f7478bec51a8703b017ea553aa98be309..1ecd1a2812969504bd5114a
> > 53473b478c5ddba82 100644
> > > --- a/gcc/config/aarch64/aarch64.md
> > > +++ b/gcc/config/aarch64/aarch64.md
> > > @@ -445,6 +445,10 @@ (define_enum_attr "arch" "arches" (const_string
> > "any"))
> > >  ;; target-independent code.
> > >  (define_attr "is_call" "no,yes" (const_string "no"))
> > >
> > > +;; Indicates whether we want to enable the pattern with an optional early
> > > +;; clobber for SVE predicates.
> > > +(define_attr "pred_clobber" "no,yes" (const_string "no"))
> > > +
> > >  ;; [For compatibility with Arm in pipeline models]
> > >  ;; Attribute that specifies whether or not the instruction touches fp
> > >  ;; registers.
> > > @@ -461,7 +465,8 @@ (define_attr "fp" "no,yes"
> > >  (define_attr "arch_enabled" "no,yes"
> > >(if_then_else
> > >  (ior
> > > - (eq_attr "arch" "any")
> > > + (and (eq_attr "arch" "any")
> > > +  (eq_attr "pred_clobber" "no"))
> > >
> > >   (and (eq_attr "arch" "rcpc8_4")
> > >(match_test "AARCH64_ISA_RCPC8_4"))
> > > @@ -488,7 +493,10 @@ (define_attr "arch_enabled" "no,yes"
> > >(match_test "TARGET_SVE"))
> > >
> > >   (and (eq_attr "arch" "sme")
> > > -  (match_test "TARGET_SME")))
> > > +  (match_test "TARGET_SME"))
> > > +
> > > + (and (eq_attr "pred_clobber" "yes")
> > > +  (match_test "TARGET_SVE_PRED_CLOBBER")))
> >
> > IMO it'd be better to handle pred_clobber separately from arch, as a new
> > top-level AND:
> >
> >   (and
> > (ior
> >   (eq_attr "pred_clobber" "no")
> >   (match_test "!TARGET_..."))
> > (ior
> >   ...existing arch tests...))
> >
> 

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-tuning-flags.def
(AVOID_PRED_RMW): New.
* config/aarch64/aarch64.h (TARGET_SVE_PRED_CLOBBER): New.
* config/aarch64/aarch64.md (pred_clobber): New.
(arch_enabled): Use it.

-- inline copy of patch --

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
b/gcc/config/aarch64/aarch64-tuning-flags.def
index 
d5bcaebce770f0b217aac783063d39135f754c77..a9f48f5d3d4ea32fbf53086ba21eab4bc65b6dcb
 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -48,4 +48,8 @@ AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", 
AVOID_CROSS_LOOP_FMA)
 
 AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_fma", FULLY_PIPELINED_FMA)
 
+/* Enable is the target prefers to use a fresh register for predicate outputs
+   rather than re-use an input predicate register.  */
+AARCH64_EXTRA_TUNING_OPTION ("avoid_pred_rmw", AVOID_PRED_RMW)
+
 #undef AARCH64_EXTRA_TUNING_OPTION
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
bbf11faaf4b4340956094a983f8b0dc2649b2d27..e7669e65d7dae5df2ba42c265079b1856a5c382b
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -495,6 +495,11 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = 
AARCH64_FL_SM_OFF;
 enabled through +gcs.  */
 #define TARGET_GCS (AARCH64_ISA_GCS)
 
+/*  Prefer different predicate registers for the output of a predicated 
operation over
+re-using an existing input predic

RE: [PATCH 3/4]AArch64: add new alternative with early clobber to patterns

2024-05-28 Thread Tamar Christina


> -Original Message-
> From: Richard Sandiford 
> Sent: Wednesday, May 22, 2024 12:24 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; ktkac...@gcc.gnu.org
> Subject: Re: [PATCH 3/4]AArch64: add new alternative with early clobber to
> patterns
> 
> Tamar Christina  writes:
> >> -Original Message-
> >> From: Richard Sandiford 
> >> Sent: Wednesday, May 22, 2024 10:48 AM
> >> To: Tamar Christina 
> >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> >> ; Marcus Shawcroft
> >> ; ktkac...@gcc.gnu.org
> >> Subject: Re: [PATCH 3/4]AArch64: add new alternative with early clobber to
> >> patterns
> >>
> >> Tamar Christina  writes:
> >> > Hi All,
> >> >
> >> > This patch adds new alternatives to the patterns which are affected.  
> >> > The new
> >> > alternatives with the conditional early clobbers are added before the 
> >> > normal
> >> > ones in order for LRA to prefer them in the event that we have enough 
> >> > free
> >> > registers to accommodate them.
> >> >
> >> > In case register pressure is too high the normal alternatives will be 
> >> > preferred
> >> > before a reload is considered as we rather have the tie than a spill.
> >> >
> >> > Tests are in the next patch.
> >> >
> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >> >
> >> > Ok for master?
> >> >
> >> > Thanks,
> >> > Tamar
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >  * config/aarch64/aarch64-sve.md (and3,
> >> >  @aarch64_pred__z, *3_cc,
> >> >  *3_ptest, aarch64_pred__z,
> >> >  *3_cc, *3_ptest,
> >> >  aarch64_pred__z, *3_cc,
> >> >  *3_ptest, @aarch64_pred_cmp,
> >> >  *cmp_cc, *cmp_ptest,
> >> >  @aarch64_pred_cmp_wide,
> >> >  *aarch64_pred_cmp_wide_cc,
> >> >  *aarch64_pred_cmp_wide_ptest,
> >> @aarch64_brk,
> >> >  *aarch64_brk_cc, *aarch64_brk_ptest,
> >> >  @aarch64_brk, *aarch64_brkn_cc, *aarch64_brkn_ptest,
> >> >  *aarch64_brk_cc, *aarch64_brk_ptest,
> >> >  aarch64_rdffr_z, *aarch64_rdffr_z_ptest, *aarch64_rdffr_ptest,
> >> >  *aarch64_rdffr_z_cc, *aarch64_rdffr_cc): Add new early clobber
> >> >  alternative.
> >> >  * config/aarch64/aarch64-sve2.md
> >> >  (@aarch64_pred_): Likewise.
> >> >
> >> > ---
> >> > diff --git a/gcc/config/aarch64/aarch64-sve.md
> b/gcc/config/aarch64/aarch64-
> >> sve.md
> >> > index
> >>
> e3085c0c636f1317409bbf3b5fbaf5342a2df1f6..8fdc1bc3cd43acfcd675a18350c
> >> 297428c85fe46 100644
> >> > --- a/gcc/config/aarch64/aarch64-sve.md
> >> > +++ b/gcc/config/aarch64/aarch64-sve.md
> >> > @@ -1161,8 +1161,10 @@ (define_insn "aarch64_rdffr_z"
> >> >(reg:VNx16BI FFRT_REGNUM)
> >> >(match_operand:VNx16BI 1 "register_operand")))]
> >> >"TARGET_SVE && TARGET_NON_STREAMING"
> >> > -  {@ [ cons: =0, 1   ]
> >> > - [ Upa , Upa ] rdffr\t%0.b, %1/z
> >> > +  {@ [ cons: =0, 1  ; attrs: pred_clobber ]
> >> > + [ &Upa, Upa; yes ] rdffr\t%0.b, %1/z
> >> > + [ ?Upa, Upa; yes ] ^
> >> > + [ Upa , Upa; *   ] ^
> >> >}
> >> >  )
> >>
> >> Sorry for not explaining it very well, but in the previous review I 
> >> suggested:
> >>
> >> > The gather-like approach would be something like:
> >> >
> >> >  [ &Upa , Upl , w , ; yes ]
> >> cmp\t%0., %1/z, %3., #%4
> >> >  [ ?Upl , 0   , w , ; yes ] ^
> >> >  [ Upa  , Upl , w , ; no  ] ^
> >> >  [ &Upa , Upl , w , w; yes ] 
> >> > cmp\t%0.,
> %1/z,
> >> %3., %4.
> >> >  [ ?Upl , 0   , w , w; yes ] ^
> >> >  [ Upa  , Upl , w , w; no  ] ^
> >> >
> >> > with:
> >> >
> >> >   (define_attr "pred_clobber" "any,no,yes" (const_string "any"))
> >>
> >> (with emphasis on the last line).  What I didn't say explicitly is
> >> that "no" should require !TARGET_SVE_PRED_CLOBBER.
> >>
> >> The premise of that review was that we shouldn't enable things like:
> >>
> >>  [ Upa  , Upl , w , w; no  ] ^
> >>
> >> for TARGET_SVE_PRED_CLOBBER since it contradicts the earlyclobber
> >> alternative.  So we should enable either the pred_clobber=yes
> >> alternatives or the pred_clobber=no alternatives, but not both.
> >>
> >> The default "any" is then for other non-predicate instructions that
> >> don't care about TARGET_SVE_PRED_CLOBBER either way.
> >>
> >> In contrast, this patch makes pred_clobber=yes enable the alternatives
> >> that correctly describe the restriction (good!) but then also enables
> >> the normal alternatives too, which IMO makes the semantics unclear.
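
The enabling rule described in the review above can be modelled as a small truth function. This is only an illustrative C++ sketch, not machine-description syntax: the names `PredClobber` and `alternative_enabled` are hypothetical, and the boolean parameter stands in for the TARGET_SVE_PRED_CLOBBER test.

```cpp
#include <cassert>

// Hypothetical model of the three proposed "pred_clobber" attribute values.
enum class PredClobber { Any, No, Yes };

// Returns true if an alternative carrying the given attribute value should
// be enabled.  target_sve_pred_clobber stands in for the
// TARGET_SVE_PRED_CLOBBER test.
constexpr bool
alternative_enabled (PredClobber attr, bool target_sve_pred_clobber)
{
  switch (attr)
    {
    case PredClobber::Any:
      // Instructions that do not care either way.
      return true;
    case PredClobber::Yes:
      // Earlyclobber (and tied) alternatives: only when the tuning asks.
      return target_sve_pred_clobber;
    case PredClobber::No:
      // Plain alternatives: only when the tuning does not ask.
      return !target_sve_pred_clobber;
    }
  return false;
}
```

Under this model exactly one of the "yes" and "no" groups is live for any given tuning, which is the "either ... or ..., but not both" property asked for in the review.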
> >
> > Sure, the reason I still had that is because this ICEs under high register
> > pressure:
> >
> >   {@ [ cons: =0 , 1   , 3 , 4; attrs: pred_clobber ]
> >  [ &Upa , Upl , w , ; yes ]
> cmp\t%0., %1/z, %3., #%4
> >  [ ?Upa , 0   , w , ; yes ] ^
> >  [ Upa  

Re: [PATCH] vect: Unify bbs in loop_vec_info and bb_vec_info

2024-05-28 Thread Richard Biener
On Sat, May 25, 2024 at 4:54 PM Feng Xue OS  wrote:
>
> Both derived classes ( loop_vec_info/bb_vec_info) have their own "bbs"
> field, which have exactly same purpose of recording all basic blocks
> inside the corresponding vect region, while the fields are composed by
> different data type, one is normal array, the other is auto_vec. This
> difference causes some duplicated code even handling the same stuff,
> almost in tree-vect-patterns. One refinement is lifting this field into the
> base class "vec_info", and reset its value to the continuous memory area
> pointed by two old "bbs" in each constructor of derived classes.

Nice.  But.  bbs_as_vector - why is that necessary?  Why is vinfo->bbs
not a vec?  Having bbs and nbbs feels like a step back.

Note the code duplications can probably be removed by "indirecting"
through an array_slice.
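
To illustrate the "indirecting" idea, here is a minimal sketch of a non-owning (pointer, length) view that lets one routine walk either kind of storage. The `slice` type and the helper names are hypothetical stand-ins and are not GCC's array_slice; plain `int` replaces basic_block to keep the sketch self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a non-owning (pointer, length) view.
template <typename T>
struct slice
{
  T *ptr;
  std::size_t len;

  T *begin () const { return ptr; }
  T *end () const { return ptr + len; }
  std::size_t size () const { return len; }
  T &operator[] (std::size_t i) const { return ptr[i]; }
};

// View over a raw array whose length is tracked separately.
template <typename T>
slice<T> make_slice (T *base, std::size_t n) { return slice<T>{base, n}; }

// View over the storage owned by a vector.
template <typename T>
slice<T> make_slice (std::vector<T> &v)
{
  return slice<T>{v.data (), v.size ()};
}

// One walker serves both kinds of storage, so no code is duplicated.
inline int sum_indices (slice<int> blocks)
{
  int total = 0;
  for (int index : blocks)
    total += index;
  return total;
}
```

With such a view, each owner keeps its preferred container and only the consumers are written once.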

I'm a bit torn to approve this as-is given the above.  Can you explain what
made you not choose vec<> for bbs?  I bet you tried.

Richard.

> Regression test on x86-64 and aarch64.
>
> Feng
> --
> gcc/
> * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Move
> initialization of bbs to explicit construction code.  Adjust the
> definition of nbbs.
> * tree-vect-pattern.cc (vect_determine_precisions): Make
> loop_vec_info and bb_vec_info share same code.
> (vect_pattern_recog): Remove duplicated vect_pattern_recog_1 loop.
> * tree-vect-slp.cc (vect_get_and_check_slp_defs): Access to bbs[0]
> via base vec_info class.
> (_bb_vec_info::_bb_vec_info): Initialize bbs and nbbs using data
> fields of input auto_vec<> bbs.
> (_bb_vec_info::_bb_vec_info): Add assertions on bbs and nbbs to ensure
> they are not changed externally.
> (vect_slp_region): Use access to nbbs to replace original
> bbs.length().
> (vect_schedule_slp_node): Access to bbs[0] via base vec_info class.
> * tree-vectorizer.cc (vec_info::vec_info): Add initialization of
> bbs and nbbs.
> (vec_info::insert_seq_on_entry): Access to bbs[0] via base vec_info
> class.
> * tree-vectorizer.h (vec_info): Add new fields bbs and nbbs.
> (_loop_vec_info): Remove field bbs.
> (_bb_vec_info): Rename old bbs field to bbs_as_vector, and make it
> be private.
> ---
>  gcc/tree-vect-loop.cc |   6 +-
>  gcc/tree-vect-patterns.cc | 142 +++---
>  gcc/tree-vect-slp.cc  |  24 ---
>  gcc/tree-vectorizer.cc|   7 +-
>  gcc/tree-vectorizer.h |  19 ++---
>  5 files changed, 72 insertions(+), 126 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 83c0544b6aa..aef17420a5f 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1028,7 +1028,6 @@ bb_in_loop_p (const_basic_block bb, const void *data)
>  _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
>: vec_info (vec_info::loop, shared),
>  loop (loop_in),
> -bbs (XCNEWVEC (basic_block, loop->num_nodes)),
>  num_itersm1 (NULL_TREE),
>  num_iters (NULL_TREE),
>  num_iters_unchanged (NULL_TREE),
> @@ -1079,8 +1078,9 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
> vec_info_shared *shared)
>   case of the loop forms we allow, a dfs order of the BBs would the same
>   as reversed postorder traversal, so we are safe.  */
>
> -  unsigned int nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p,
> - bbs, loop->num_nodes, loop);
> +  bbs = XCNEWVEC (basic_block, loop->num_nodes);
> +  nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p, bbs,
> +loop->num_nodes, loop);
>gcc_assert (nbbs == loop->num_nodes);
>
>for (unsigned int i = 0; i < nbbs; i++)
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index a313dc64643..848a3195a93 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -6925,81 +6925,41 @@ vect_determine_stmt_precisions (vec_info *vinfo, 
> stmt_vec_info stmt_info)
>  void
>  vect_determine_precisions (vec_info *vinfo)
>  {
> +  basic_block *bbs = vinfo->bbs;
> +  unsigned int nbbs = vinfo->nbbs;
> +
>DUMP_VECT_SCOPE ("vect_determine_precisions");
>
> -  if (loop_vec_info loop_vinfo = dyn_cast  (vinfo))
> +  for (unsigned int i = 0; i < nbbs; i++)
>  {
> -  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> -  basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
> -  unsigned int nbbs = loop->num_nodes;
> -
> -  for (unsigned int i = 0; i < nbbs; i++)
> +  basic_block bb = bbs[i];
> +  for (auto gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> {
> - basic_block bb = bbs[i];
> - for (auto gsi = gsi_start_phis (bb);
> -  !gsi_end_p (gsi); gsi_next (&gsi))
> -   {
> - stmt_vec_info stmt_info = vinfo->lookup_st

Re: [PATCH v2] object lifetime instrumentation for Valgrind [PR66487]

2024-05-28 Thread Alexander Monakov

On Tue, 28 May 2024, Richard Biener wrote:

> On Wed, May 15, 2024 at 12:59 PM Alexander Monakov  wrote:
> >
> >
> > Hello,
> >
> > I'd like to ask if anyone has any new thoughts on this patch.
> >
> > Let me also point out that valgrind/memcheck.h is permissively
> > licensed (BSD-style, rest of Valgrind is GPLv2), with the intention
> > to allow importing into projects that are interested in using
> > client requests without build-time dependency on installed headers.
> > So maybe we have that as an option too.
> 
> Inlining the VALGRIND_DO_CLIENT_REQUEST_EXPR would be a lot
> cheaper

I am a bit confused what you mean by "cheaper". Could it be that we are not
on the same page regarding the machine code behind client requests?

Here's how the proposed libgcc helper compiles on amd64:

        ; Prepare the descriptor structure on the stack
        movq    $1296236545, -56(%rsp)
        leaq    -56(%rsp), %rax
        xorl    %edx, %edx
        movq    %rdi, -48(%rsp)
        movq    %rsi, -40(%rsp)
        movq    $0, -32(%rsp)
        movq    $0, -24(%rsp)
        movq    $0, -16(%rsp)
#APP
        ; Request preamble: Valgrind detects the useless
        ; rotate sequence. Other architectures use different
        ; preambles.
        rolq $3,  %rdi ; rolq $13, %rdi
        rolq $61, %rdi ; rolq $51, %rdi
        ; Request trigger (also varies by target).
        xchgq %rbx,%rbx
#NO_APP
        ; The (ignored) result is a volatile automatic variable
        movq    %rdx, -64(%rsp)
        movq    -64(%rsp), %rax

I think this is not well-suited for inlining, and to expand it you need to know
the layout of the descriptor struct and machine-specific preamble+trigger.
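
As a side note on the first store in the listing, the constant 1296236545 decomposes into two ASCII bytes plus a small index. The sketch below only does that arithmetic; `request_code` and `decode_request` are hypothetical names, and the field layout is an assumption inferred from how the constant splits, so which request index 1 denotes should be checked against valgrind/memcheck.h.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view of a client request code: two tool-identifying bytes
// in the top half and a request index in the low 16 bits.  The layout is
// an assumption inferred from the constant in the listing.
struct request_code
{
  char tool_hi;
  char tool_lo;
  std::uint32_t index;
};

constexpr request_code decode_request (std::uint32_t code)
{
  return request_code{ static_cast<char> ((code >> 24) & 0xff),
                       static_cast<char> ((code >> 16) & 0xff),
                       code & 0xffffu };
}
```

Decoding 1296236545 (0x4D430001) gives the bytes 'M' and 'C' with index 1, which is consistent with a Memcheck-specific request; the other five 8-byte slots of the descriptor hold the two arguments and three zeros, as the listing shows.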

Also I am not sure where that would leave us with target support. What if
folks will be eventually interested in using this to hunt down mysterious
miscompilations on, say, PowerPC?
 
> and would not add to the libgcc ABI.

What about linking a new library with that helper?

> I would guess the valgrind "ABI" for these is practically fixed but of course
> architecture dependent.
> 
> I do like the feature in general.

Thank you.
Alexander

[PATCH] tree-optimization/115254 - don't account single-lane SLP against discovery limit

2024-05-28 Thread Richard Biener
The following avoids accounting single-lane SLP against the discovery
limit.  As the two testcases show, such accounting makes discovery fail,
and unfortunately not even consistently across targets.  The following
should fix two FAILs for GCN as a side-effect.
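
Stripped of the vectorizer details, the accounting change amounts to the following. This is a standalone sketch with a hypothetical helper name (`account_discovery`); the real logic lives inline in vect_build_slp_tree as shown in the patch.

```cpp
#include <cassert>

// Hypothetical helper mirroring the patched logic: returns false when the
// discovery budget is exhausted.  Single-lane groups cannot run away, so
// they are never charged against the budget.
inline bool account_discovery (unsigned lanes, unsigned *limit)
{
  if (lanes > 1)
    {
      if (*limit == 0)
        return false;   // "SLP discovery limit exceeded"
      --*limit;
    }
  return true;
}
```

With a remaining budget of zero, a single-lane node is still accepted while a multi-lane node is rejected.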

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/115254
* tree-vect-slp.cc (vect_build_slp_tree): Only account
multi-lane SLP to limit.

* gcc.dg/vect/slp-cond-2-big-array.c: Expect 4 times SLP.
* gcc.dg/vect/slp-cond-2.c: Likewise.
---
 .../gcc.dg/vect/slp-cond-2-big-array.c|  2 +-
 gcc/testsuite/gcc.dg/vect/slp-cond-2.c|  2 +-
 gcc/tree-vect-slp.cc  | 31 +++
 3 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-2-big-array.c 
b/gcc/testsuite/gcc.dg/vect/slp-cond-2-big-array.c
index cb7eb94b3a3..9a9f63c0b8d 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-cond-2-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-cond-2-big-array.c
@@ -128,4 +128,4 @@ main ()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" } 
} */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" } 
} */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-2.c 
b/gcc/testsuite/gcc.dg/vect/slp-cond-2.c
index 1dcee46cd95..08bbb3dbec6 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-cond-2.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-cond-2.c
@@ -128,4 +128,4 @@ main ()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" } 
} */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" } 
} */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 0dd9a4daf6a..bbfde8849c1 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1725,21 +1725,26 @@ vect_build_slp_tree (vec_info *vinfo,
   SLP_TREE_SCALAR_STMTS (res) = stmts;
   bst_map->put (stmts.copy (), res);
 
-  if (*limit == 0)
+  /* Single-lane SLP doesn't have the chance of run-away, do not account
+ it to the limit.  */
+  if (stmts.length () > 1)
 {
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_NOTE, vect_location,
-"SLP discovery limit exceeded\n");
-  /* Mark the node invalid so we can detect those when still in use
-as backedge destinations.  */
-  SLP_TREE_SCALAR_STMTS (res) = vNULL;
-  SLP_TREE_DEF_TYPE (res) = vect_uninitialized_def;
-  res->failed = XNEWVEC (bool, group_size);
-  memset (res->failed, 0, sizeof (bool) * group_size);
-  memset (matches, 0, sizeof (bool) * group_size);
-  return NULL;
+  if (*limit == 0)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"SLP discovery limit exceeded\n");
+ /* Mark the node invalid so we can detect those when still in use
+as backedge destinations.  */
+ SLP_TREE_SCALAR_STMTS (res) = vNULL;
+ SLP_TREE_DEF_TYPE (res) = vect_uninitialized_def;
+ res->failed = XNEWVEC (bool, group_size);
+ memset (res->failed, 0, sizeof (bool) * group_size);
+ memset (matches, 0, sizeof (bool) * group_size);
+ return NULL;
+   }
+  --*limit;
 }
-  --*limit;
 
   if (dump_enabled_p ())
 dump_printf_loc (MSG_NOTE, vect_location,
-- 
2.35.3


Re: [PATCH] Fix -Wstringop-overflow warning in 23_containers/vector/types/1.cc

2024-05-28 Thread Jonathan Wakely

On 27/05/24 22:07 +0200, François Dumont wrote:

In C++98 this test fails with:

Excess errors:
/home/fdumont/dev/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/stl_algobase.h:452: 
warning: 'void* __builtin_memcpy(void*, const void*, long unsigned 
int)' writing between 2 and 9223372036854775806 bytes into a region of 
size 0 overflows the destination [-Wstringop-overflow=]


The attached patch avoids this warning.

    libstdc++: Fix -Wstringop-overflow warning coming from std::vector

    Make vector<>::_M_range_insert implementation more transparent to
    the compiler checks.

    Extend local copies of members to the whole method scope so that
    all branches benefit from those.

    libstdc++-v3/ChangeLog:

    * include/bits/vector.tcc
    (std::vector<>::_M_range_insert(iterator, _FwdIt, _FwdIt,
    forward_iterator_tag)): Use local copies of members to call the
    different algorithms.

Ok to commit if all tests pass?

François



diff --git a/libstdc++-v3/include/bits/vector.tcc 
b/libstdc++-v3/include/bits/vector.tcc
index 36b27dce7b9..671929dee55 100644
--- a/libstdc++-v3/include/bits/vector.tcc
+++ b/libstdc++-v3/include/bits/vector.tcc
@@ -885,83 +885,80 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
  {
if (__first != __last)
  {
+   // Make local copies of these members because the compiler
+   // thinks the allocator can alter them if 'this' is globally
+   // reachable.
+   pointer __start = this->_M_impl._M_start;
+   pointer __end = this->_M_impl._M_end_of_storage;
+   pointer __finish = this->_M_impl._M_finish;
+   pointer __pos = __position.base();
+   _Tp_alloc_type& __allocator = _M_get_Tp_allocator();
+
+   if (__pos < __start || __finish < __pos)
+ __builtin_unreachable();


I don't think we should use __builtin_unreachable for something which
is not an invariant of the class. The __position argument is supplied
by the user, so we should not make promises about it being valid,
because we can't know that.

We can promise that __start <= __finish, and that __finish <= end,
because we control those. We can't promise the user won't pass in a
bad __position. Although it's undefined for the user to do that, using
__builtin_unreachable() here makes the effects worse, and makes it
harder to debug.

Also, (__pos < __start) might already trigger undefined behaviour for
fancy pointers, if they don't point to the same memory region.

So this change is not OK.
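
To make the distinction concrete, here is a small sketch, not libstdc++ code. The `tiny_buffer` type is hypothetical, and returning false for a bad position is only a device for the example (std::vector does not do that). The point is which facts may be promised to the optimizer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical miniature container used only for illustration.
struct tiny_buffer
{
  int data[8] = {};
  std::size_t count = 0;   // class invariant: count <= 8

  // Inserts value before position pos; returns false if pos is invalid
  // or the buffer is full.
  bool insert_at (std::size_t pos, int value)
  {
    // The container maintains this invariant itself, so promising it to
    // the optimizer is sound.
    if (count > 8)
      __builtin_unreachable ();

    // pos comes from the caller: check it, do not assume it.
    if (pos > count || count == 8)
      return false;

    // Shift the tail up by one and store the new element.
    for (std::size_t i = count; i > pos; --i)
      data[i] = data[i - 1];
    data[pos] = value;
    ++count;
    return true;
  }
};
```

Had the position check been written as __builtin_unreachable () instead, a caller's mistake would let the compiler drop the surrounding code paths entirely, which is the "harder to debug" outcome described above.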



+
const size_type __n = std::distance(__first, __last);
-   if (size_type(this->_M_impl._M_end_of_storage
- - this->_M_impl._M_finish) >= __n)
+   if (size_type(__end - __finish) >= __n)
  {
-   const size_type __elems_after = end() - __position;
-   pointer __old_finish(this->_M_impl._M_finish);
+   const size_type __elems_after = __end - __pos;
+   pointer __old_finish(__finish);
if (__elems_after > __n)
  {
_GLIBCXX_ASAN_ANNOTATE_GROW(__n);
-   std::__uninitialized_move_a(this->_M_impl._M_finish - __n,
-   this->_M_impl._M_finish,
-   this->_M_impl._M_finish,
-   _M_get_Tp_allocator());
-   this->_M_impl._M_finish += __n;
+   __finish = std::__uninitialized_move_a
+ (__finish - __n, __finish, __finish, __allocator);
_GLIBCXX_ASAN_ANNOTATE_GREW(__n);
-   _GLIBCXX_MOVE_BACKWARD3(__position.base(),
-   __old_finish - __n, __old_finish);
-   std::copy(__first, __last, __position);
+   _GLIBCXX_MOVE_BACKWARD3
+ (__pos, __old_finish - __n, __old_finish);
+   std::copy(__first, __last, __pos);
  }
else
  {
_ForwardIterator __mid = __first;
std::advance(__mid, __elems_after);
_GLIBCXX_ASAN_ANNOTATE_GROW(__n);
-   std::__uninitialized_copy_a(__mid, __last,
-   this->_M_impl._M_finish,
-   _M_get_Tp_allocator());
-   this->_M_impl._M_finish += __n - __elems_after;
+   __finish = std::__uninitialized_copy_a
+ (__mid, __last, __finish, __allocator);
_GLIBCXX_ASAN_ANNOTATE_GREW(__n - __elems_after);
-   std::__uninitialized_move_a(__position.base(),
-   __old_finish,
-   this->_M_impl._M_finish,
-   

Re: [PATCH] Avoid vector -Wfree-nonheap-object warnings

2024-05-28 Thread Jonathan Wakely
On Mon, 27 May 2024 at 05:37, François Dumont  wrote:
>
> Here is a new version working also in C++98.

Can we use a different solution that doesn't involve an explicit
template argument list for that __uninitialized_fill_n_a call?

-+this->_M_impl._M_finish = std::__uninitialized_fill_n_a
++this->_M_impl._M_finish =
++  std::__uninitialized_fill_n_a
+  (__start, __n, __value, _M_get_Tp_allocator());

Using _M_fill_initialize solves the problem :-)



>
> Note that I have this failure:
>
> FAIL: 23_containers/vector/types/1.cc  -std=gnu++98 (test for excess errors)
>
> but it's already failing on master, my patch does not change anything.

Yes, that's been failing for ages.

>
> Tested under Linux x64,
>
> still ok to commit ?
>
> François
>
> On 24/05/2024 16:17, Jonathan Wakely wrote:
> > On Thu, 23 May 2024 at 18:38, François Dumont  wrote:
> >>
> >> On 23/05/2024 15:31, Jonathan Wakely wrote:
> >>> On 23/05/24 06:55 +0200, François Dumont wrote:
>  As explained in this email:
> 
>  https://gcc.gnu.org/pipermail/libstdc++/2024-April/058552.html
> 
>  I experimented -Wfree-nonheap-object because of my enhancements on
>  algos.
> 
>  So here is a patch to extend the usage of the _Guard type to other
>  parts of vector.
> >>> Nice, that fixes the warning you were seeing?
> >> Yes ! I indeed forgot to say so :-)
> >>
> >>
> >>> We recently got a bug report about -Wfree-nonheap-object in
> >>> std::vector, but that is coming from _M_realloc_append which already
> >>> uses the RAII guard :-(
> >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115016
> >> Note that I also had to move call to __uninitialized_copy_a before
> >> assigning this->_M_impl._M_start so get rid of the -Wfree-nonheap-object
> >> warn. But _M_realloc_append is already doing potentially throwing
> >> operations before assigning this->_M_impl so it must be something else.
> >>
> >> Though it made me notice another occurence of _Guard in this method. Now
> >> replaced too in this new patch.
> >>
> >>   libstdc++: Use RAII to replace try/catch blocks
> >>
> >>   Move _Guard into std::vector declaration and use it to guard all
> >> calls to
> >>   vector _M_allocate.
> >>
> >>   Doing so the compiler has more visibility on what is done with the
> >> pointers
> >>   and do not raise anymore the -Wfree-nonheap-object warning.
> >>
> >>   libstdc++-v3/ChangeLog:
> >>
> >>   * include/bits/vector.tcc (_Guard): Move all the nested
> >> duplicated class...
> >>   * include/bits/stl_vector.h (_Guard_alloc): ...here.
> >>   (_M_allocate_and_copy): Use latter.
> >>   (_M_initialize_dispatch): Likewise and set _M_finish first
> >> from the result
> >>   of __uninitialize_fill_n_a that can throw.
> >>   (_M_range_initialize): Likewise.
> >>
>  diff --git a/libstdc++-v3/include/bits/stl_vector.h
>  b/libstdc++-v3/include/bits/stl_vector.h
>  index 31169711a48..4ea74e3339a 100644
>  --- a/libstdc++-v3/include/bits/stl_vector.h
>  +++ b/libstdc++-v3/include/bits/stl_vector.h
>  @@ -1607,6 +1607,39 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> clear() _GLIBCXX_NOEXCEPT
> { _M_erase_at_end(this->_M_impl._M_start); }
> 
>  +private:
>  +  // RAII guard for allocated storage.
>  +  struct _Guard
> >>> If it's being defined at class scope instead of locally in a member
> >>> function, I think a better name would be good. Maybe _Ptr_guard or
> >>> _Dealloc_guard or something.
> >> _Guard_alloc chosen.
>  +  {
>  +pointer _M_storage;// Storage to deallocate
>  +size_type _M_len;
>  +_Base& _M_vect;
>  +
>  +_GLIBCXX20_CONSTEXPR
>  +_Guard(pointer __s, size_type __l, _Base& __vect)
>  +: _M_storage(__s), _M_len(__l), _M_vect(__vect)
>  +{ }
>  +
>  +_GLIBCXX20_CONSTEXPR
>  +~_Guard()
>  +{
>  +  if (_M_storage)
>  +_M_vect._M_deallocate(_M_storage, _M_len);
>  +}
>  +
>  +_GLIBCXX20_CONSTEXPR
>  +pointer
>  +_M_release()
>  +{
>  +  pointer __res = _M_storage;
>  +  _M_storage = 0;
> >>> I don't think the NullablePointer requirements include assigning 0,
> >>> only from nullptr, which isn't valid in C++98.
> >>>
> >>> https://en.cppreference.com/w/cpp/named_req/NullablePointer
> >>>
> >>> Please use _M_storage = pointer() instead.
> >> I forgot about user fancy pointer, fixed.
> >>
> >>
>  +  return __res;
>  +}
>  +
>  +  private:
>  +_Guard(const _Guard&);
>  +  };
>  +
>   protected:
> /**
>  *  Memory expansion handler.  Uses the member allocation
>  function to
>  @@ -1618,18 +1651,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>   _M_allocate_and_copy(size_type _

[PATCH V2] Reduce cost of MEM (A + imm).

2024-05-28 Thread liuhongt
> IMO, there is no need for CONST_INT_P condition, we should also allow
> symbol_ref, label_ref and const (all allowed by
> x86_64_immediate_operand predicate), these all decay to an immediate
> value.

Changed.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

For MEM, rtx_cost iterates over each subrtx and adds up the costs,
so MEM (reg) costs 5 while MEM (reg + 4) costs 9, which is not
accurate for x86.  Ideally address_cost should be used, but it reduces
the cost too much.  So the current solution is to make a constant
displacement as cheap as possible.
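
The arithmetic can be reproduced with a toy model. Everything here is an assumption made for illustration: the `expr` tree, the node kinds, and the per-node costs (4 for a simple instruction, plus 1 for a memory access) were chosen so that the naive scheme yields the 5 and 9 quoted above. It is not the code in ix86_rtx_costs.

```cpp
#include <cassert>

// Toy expression tree; node kinds and costs are assumptions chosen to
// reproduce the figures quoted in the description.
enum class kind { reg, imm, plus, mem };

struct expr
{
  kind code;
  const expr *op0;
  const expr *op1;
};

constexpr int insn_cost = 4;   // assumed cost of one simple instruction

// Naive scheme: every node adds its own cost plus its operands' costs.
inline int naive_cost (const expr &x)
{
  int total = 0;
  switch (x.code)
    {
    case kind::reg:
    case kind::imm:
      break;
    case kind::plus:
      total = insn_cost;
      break;
    case kind::mem:
      total = insn_cost + 1;   // memory access is slightly more expensive
      break;
    }
  if (x.op0)
    total += naive_cost (*x.op0);
  if (x.op1)
    total += naive_cost (*x.op1);
  return total;
}

// Adjusted scheme: MEM (A + imm) costs "MEM (A)" + 1; anything else falls
// back to the naive scheme.
inline int adjusted_cost (const expr &x)
{
  if (x.code == kind::mem && x.op0 && x.op0->code == kind::plus
      && x.op0->op0 && x.op0->op1 && x.op0->op1->code == kind::imm)
    return insn_cost + 1 + 1 + adjusted_cost (*x.op0->op0);
  return naive_cost (x);
}
```

In this model MEM (reg + 4) drops from 9 to 6, one more than the 5 for MEM (reg), matching the ChangeLog wording.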

gcc/ChangeLog:

PR target/67325
* config/i386/i386.cc (ix86_rtx_costs): Reduce cost of MEM (A
+ imm) to "cost of MEM (A)" + 1.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr67325.c: New test.
---
 gcc/config/i386/i386.cc | 18 +-
 gcc/testsuite/gcc.target/i386/pr67325.c |  7 +++
 2 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67325.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 3e2a3a194f1..85d87b9f778 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -22194,7 +22194,23 @@ ix86_rtx_costs (rtx x, machine_mode mode, int 
outer_code_i, int opno,
   /* An insn that accesses memory is slightly more expensive
  than one that does not.  */
   if (speed)
-*total += 1;
+   {
+ *total += 1;
+ rtx addr = XEXP (x, 0);
+ /* For MEM, rtx_cost iterates each subrtx, and adds up the costs,
+so for MEM (reg) and MEM (reg + 4), the former costs 5,
+the latter costs 9, it is not accurate for x86. Ideally
+address_cost should be used, but it reduce cost too much.
+So current solution is make constant disp as cheap as possible.  */
+ if (GET_CODE (addr) == PLUS
+ && x86_64_immediate_operand (XEXP (addr, 1), Pmode))
+   {
+ *total += 1;
+ *total += rtx_cost (XEXP (addr, 0), Pmode, PLUS, 0, speed);
+ return true;
+   }
+   }
+
   return false;
 
 case ZERO_EXTRACT:
diff --git a/gcc/testsuite/gcc.target/i386/pr67325.c 
b/gcc/testsuite/gcc.target/i386/pr67325.c
new file mode 100644
index 000..c3c1e4c5b4d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr67325.c
@@ -0,0 +1,7 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "(?:sar|shr)" } } */
+
+int f(long*l){
+  return *l>>32;
+}
-- 
2.31.1



[wwwdocs][patch] gcc-15/changes.html: Fortran - mention F2023 logical-kind additions

2024-05-28 Thread Tobias Burnus
Let's make https://gcc.gnu.org/gcc-15/changes.html a bit more useful … 
While there were several useful Fortran commits already, only one seems 
to be about a new feature.


Thus, document selected_logical_kind and the ISO_FORTRAN_ENV additions.

Comments or suggestions before I commit it?

Tobias
Title: GCC 15 Release Series — Changes, New Features, and Fixes

GCC 15 Release Series
Changes, New Features, and Fixes

This page is a "brief" summary of some of the huge number of improvements
in GCC 15.

Note: GCC 15 has not been released yet, so this document is
a work-in-progress.

Caveats

  ...

General Improvements

New Languages and Language specific improvements

Fortran

  Fortran 2023: The selected_logical_kind intrinsic function
  and, in the ISO_FORTRAN_ENV module, the named constants
  logical{8,16,32,64} and real16 were added.

New Targets and Target Specific Improvements

Operating Systems

Other significant improvements

Re: [PATCH v2] object lifetime instrumentation for Valgrind [PR66487]

2024-05-28 Thread Richard Biener
On Tue, May 28, 2024 at 11:46 AM Alexander Monakov  wrote:
>
>
> On Tue, 28 May 2024, Richard Biener wrote:
>
> > On Wed, May 15, 2024 at 12:59 PM Alexander Monakov  
> > wrote:
> > >
> > >
> > > Hello,
> > >
> > > I'd like to ask if anyone has any new thoughts on this patch.
> > >
> > > Let me also point out that valgrind/memcheck.h is permissively
> > > licensed (BSD-style, rest of Valgrind is GPLv2), with the intention
> > > to allow importing into projects that are interested in using
> > > client requests without build-time dependency on installed headers.
> > > So maybe we have that as an option too.
> >
> > Inlining the VALGRIND_DO_CLIENT_REQUEST_EXPR would be a lot
> > cheaper
>
> I am a bit confused what you mean by "cheaper". Could it be that we are not
> on the same page regarding the machine code behind client requests?

Probably "cheaper" in register usage.  I also wondered if valgrind is happy
with these when applied to stack space allocated in the caller?  Is there
means to verify valgrind picks them up appropriately (as opposed to
simply ignore them)?

> Here's how the proposed libgcc helper compiles on amd64:
>
>         ; Prepare the descriptor structure on the stack
>         movq    $1296236545, -56(%rsp)
>         leaq    -56(%rsp), %rax
>         xorl    %edx, %edx
>         movq    %rdi, -48(%rsp)
>         movq    %rsi, -40(%rsp)
>         movq    $0, -32(%rsp)
>         movq    $0, -24(%rsp)
>         movq    $0, -16(%rsp)
> #APP
>         ; Request preamble: Valgrind detects the useless
>         ; rotate sequence. Other architectures use different
>         ; preambles.
>         rolq $3,  %rdi ; rolq $13, %rdi
>         rolq $61, %rdi ; rolq $51, %rdi
>         ; Request trigger (also varies by target).
>         xchgq %rbx,%rbx
> #NO_APP
>         ; The (ignored) result is a volatile automatic variable
>         movq    %rdx, -64(%rsp)
>         movq    -64(%rsp), %rax
>
> I think this is not well-suited for inlining, and to expand it you need to 
> know
> the layout of the descriptor struct and machine-specific preamble+trigger.
>
> Also I am not sure where that would leave us with target support. What if
> folks will be eventually interested in using this to hunt down mysterious
> miscompilations on, say, PowerPC?

No idea ;)  But the same argument applies when libgcc from newer compilers
suddenly change that "ABI" because the valgrind version built against changes?

> > and would not add to the libgcc ABI.
>
> What about linking a new library with that helper?

I guess that would work for me (a static library, that is).  Ideally
valgrind itself would provide it so it's clear it's tied to the valgrind
version rather than to a GCC version.

> > I would guess the valgrind "ABI" for these is practically fixed but of 
> > course
> > architecture dependent.
> >
> > I do like the feature in general.
>
> Thank you.
> Alexander


Re: [PATCH v2] object lifetime instrumentation for Valgrind [PR66487]

2024-05-28 Thread Alexander Monakov


On Tue, 28 May 2024, Richard Biener wrote:

> > I am a bit confused what you mean by "cheaper". Could it be that we are not
> > on the same page regarding the machine code behind client requests?
> 
> Probably "cheaper" in register usage.

But it doesn't matter considering that execution under Valgrind is about 40x
slower than native. The intended use is that the project is rebuilt with
this instrumentation, run under Valgrind, then discarded.

Here's an argument against inlining: it makes breakpointing on the helper
possible. And it may be actually necessary.

> I also wondered if valgrind is happy with these when applied to stack space
> allocated in the caller?  Is there means to verify valgrind picks them up
> appropriately (as opposed to simply ignore them)?

Yes, it works. Exercising this scenario under gcc.dg does not seem easy, though.

> No idea ;)  But the same argument applies when libgcc from newer compilers
> suddenly change that "ABI" because the valgrind version built against changes?

This was raised previously with Jakub. I find it implausible that Valgrind
folks will make incompatible changes to the client request ABI (they know to
keep old requests working when enhancing the interface).

> > What about linking a new library with that helper?
> 
> I guess that would work for me (a static library, that is).  Ideally valgrind
> itself would provide it so it's clear it's tied to the valgrind version rather
> than to a GCC version.

How about packaging all of this separately as a plugin?

Thanks.
Alexander


Re: [PATCH V2] Reduce cost of MEM (A + imm).

2024-05-28 Thread Uros Bizjak
On Tue, May 28, 2024 at 12:48 PM liuhongt  wrote:
>
> > IMO, there is no need for CONST_INT_P condition, we should also allow
> > symbol_ref, label_ref and const (all allowed by
> > x86_64_immediate_operand predicate), these all decay to an immediate
> > value.
>
> Changed.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk.
>
> For MEM, rtx_cost iterates each subrtx, and adds up the costs,
> so for MEM (reg) and MEM (reg + 4), the former costs 5 and
> the latter costs 9, which is not accurate for x86. Ideally
> address_cost should be used, but it reduces the cost too much.
> So the current solution is to make a constant disp as cheap as possible.
>
> gcc/ChangeLog:
>
> PR target/67325
> * config/i386/i386.cc (ix86_rtx_costs): Reduce cost of MEM (A
> + imm) to "cost of MEM (A)" + 1.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr67325.c: New test.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.cc | 18 +-
>  gcc/testsuite/gcc.target/i386/pr67325.c |  7 +++
>  2 files changed, 24 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67325.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 3e2a3a194f1..85d87b9f778 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22194,7 +22194,23 @@ ix86_rtx_costs (rtx x, machine_mode mode, int 
> outer_code_i, int opno,
>/* An insn that accesses memory is slightly more expensive
>   than one that does not.  */
>if (speed)
> -*total += 1;
> +   {
> + *total += 1;
> + rtx addr = XEXP (x, 0);
> + /* For MEM, rtx_cost iterates each subrtx, and adds up the costs,
> +so for MEM (reg) and MEM (reg + 4), the former costs 5,
> +the latter costs 9, it is not accurate for x86. Ideally
> +address_cost should be used, but it reduce cost too much.
> +So current solution is make constant disp as cheap as possible.  
> */
> + if (GET_CODE (addr) == PLUS
> + && x86_64_immediate_operand (XEXP (addr, 1), Pmode))
> +   {
> + *total += 1;
> + *total += rtx_cost (XEXP (addr, 0), Pmode, PLUS, 0, speed);
> + return true;
> +   }
> +   }
> +
>return false;
>
>  case ZERO_EXTRACT:
> diff --git a/gcc/testsuite/gcc.target/i386/pr67325.c 
> b/gcc/testsuite/gcc.target/i386/pr67325.c
> new file mode 100644
> index 000..c3c1e4c5b4d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr67325.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not "(?:sar|shr)" } } */
> +
> +int f(long*l){
> +  return *l>>32;
> +}
> --
> 2.31.1
>
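
To make the before/after numbers from the commit message concrete, here is a toy cost model in C. It is only a sketch: the enum and function names are hypothetical, the value 4 for the PLUS cost is inferred from the quoted figures (9 minus 5), and none of this is GCC's actual rtx_cost machinery.

```c
/* Toy address forms; this is not GCC's RTL representation.  */
enum addr_kind { ADDR_REG, ADDR_REG_PLUS_IMM };

/* Figures taken from the commit message: MEM (reg) costs 5.  The PLUS
   cost of 4 is an inference from "the latter costs 9".  */
enum { MEM_REG_COST = 5, PLUS_COST = 4 };

/* Old behaviour: the PLUS inside the address is costed like an
   ordinary addition and simply summed up.  */
static int old_mem_cost(enum addr_kind k)
{
    return k == ADDR_REG ? MEM_REG_COST : MEM_REG_COST + PLUS_COST;
}

/* New behaviour per the ChangeLog: MEM (A + imm) costs
   "cost of MEM (A)" + 1.  */
static int new_mem_cost(enum addr_kind k)
{
    return k == ADDR_REG ? MEM_REG_COST : MEM_REG_COST + 1;
}
```

Under this model a constant displacement raises the cost from 5 to 6 rather than to 9, which is the effect the ChangeLog entry describes.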


Re: [PATCH v2] object lifetime instrumentation for Valgrind [PR66487]

2024-05-28 Thread Richard Biener
On Tue, May 28, 2024 at 1:38 PM Alexander Monakov  wrote:
>
>
> On Tue, 28 May 2024, Richard Biener wrote:
>
> > > I am a bit confused what you mean by "cheaper". Could it be that we are
> > > not on the same page regarding the machine code behind client requests?
> >
> > Probably "cheaper" in register usage.
>
> But it doesn't matter considering that execution under Valgrind is about 40x
> slower than native. The intended use is that the project is rebuilt with
> this instrumentation, run under Valgrind, then discarded.
>
> Here's an argument against inlining: it makes breakpointing on the helper
> possible. And it may be actually necessary.
>
> > I also wondered if valgrind is happy with these when applied to stack space
> > allocated in the caller?  Is there means to verify valgrind picks them up
> > appropriately (as opposed to simply ignore them)?
>
> Yes, it works. Exercising this scenario under gcc.dg does not seem easy, though.
>
> > No idea ;)  But the same argument applies when libgcc from newer compilers
> > suddenly change that "ABI" because the valgrind version built against changes?
>
> This was raised previously with Jakub. I find it implausible that Valgrind
> folks will make incompatible changes to the client request ABI (they know to
> keep old requests working when enhancing the interface).

OK, I see.

> > > What about linking a new library with that helper?
> >
> > I guess that would work for me (a static library, that is).  Ideally valgrind
> > itself would provide it so it's clear it's tied to the valgrind version rather
> > than to a GCC version.
>
> How about packaging all of this separately as a plugin?

Well, sure - but of course I think our plugin API is broken and I'd rather have
such a feature in-tree.  It possibly makes sense for _valgrind_ to host such
a plugin, not so much for GCC itself (because then, just build it in-tree).

As said, I'm nervous about libgcc, everything else is OK I think (didn't look
into the pass in detail yet, but I trust you here).

Richard.

> Thanks.
> Alexander


Re: [PATCHv3] Optab: add isfinite_optab for __builtin_isfinite

2024-05-28 Thread Richard Biener
On Tue, May 28, 2024 at 9:09 AM Kewen.Lin  wrote:
>
> Hi,
>
> on 2024/5/27 20:54, Richard Biener wrote:
> > On Mon, May 27, 2024 at 11:37 AM HAO CHEN GUI  wrote:
> >>
> >> Hi,
> >>   This patch adds an optab for __builtin_isfinite. The finite check can be
> >> implemented on rs6000 by a single instruction. It needs an optab to be
> >> expanded to the certain sequence of instructions.
> >>
> >>   The subsequent patches will implement the expand on rs6000.
> >>
> >>   Compared to previous version, the main change is to specify acceptable
> >> modes for the optab.
> >> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652170.html
> >>
> >>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> >> regressions. Is this OK for trunk?
> >>
> >> Thanks
> >> Gui Haochen
> >>
> >> ChangeLog
> >> optab: Add isfinite_optab for isfinite builtin
> >>
> >> gcc/
> >> * builtins.cc (interclass_mathfn_icode): Set optab to 
> >> isfinite_optab
> >> for isfinite builtin.
> >> * optabs.def (isfinite_optab): New.
> >> * doc/md.texi (isfinite): Document.
> >>
> >>
> >> patch.diff
> >> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> >> index f8d94c4b435..b8432f84020 100644
> >> --- a/gcc/builtins.cc
> >> +++ b/gcc/builtins.cc
> >> @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
> >>errno_set = true; builtin_optab = ilogb_optab; break;
> >>  CASE_FLT_FN (BUILT_IN_ISINF):
> >>builtin_optab = isinf_optab; break;
> >> -case BUILT_IN_ISNORMAL:
> >>  case BUILT_IN_ISFINITE:
> >> +  builtin_optab = isfinite_optab; break;
> >> +case BUILT_IN_ISNORMAL:
> >>  CASE_FLT_FN (BUILT_IN_FINITE):
> >>  case BUILT_IN_FINITED32:
> >>  case BUILT_IN_FINITED64:
> >> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> >> index 5730bda80dc..67407fad37d 100644
> >> --- a/gcc/doc/md.texi
> >> +++ b/gcc/doc/md.texi
> >> @@ -8557,6 +8557,15 @@ operand 2, greater than operand 2 or is unordered 
> >> with operand 2.
> >>
> >>  This pattern is not allowed to @code{FAIL}.
> >>
> >> +@cindex @code{isfinite@var{m}2} instruction pattern
> >> +@item @samp{isfinite@var{m}2}
> >> +Set operand 0 to nonzero if operand 1 is a finite @code{SFmode},
> >> +@code{DFmode}, or @code{TFmode} floating point number and to 0
> >
> > It should probably say scalar floating-point mode?  But what about the
> > result?  Is any integer mode OK?  That's esp. important if this might be
> > used on vector modes.
> >
> >> +otherwise.
> >> +
> >> +If this pattern @code{FAIL}, a call to the library function
> >> +@code{isfinite} is used.
> >
> > Or it's otherwise inline expanded?  Or does this imply targets
> > have to make sure to implement the pattern when isfinite is
> > not available in libc/libm?  I suggest to leave this sentence out,
> > we usually only say when a pattern may _not_ FAIL (and usually
> > FAILing isn't different from not providing a pattern).
>
> As Haochen's previous reply, I think there are three cases:
>   1) no optab defined, fold in a generic way;
>   2) optab defined, SUCC, expand as what it defines;
>   3) optab defined, FAIL, generate a library call;
>
> From above, I had the concern that ports may assume FAILing can
> fall back with the generic folding, but it's not actually.

Hmm, but it should.  Can you make that work?

> Does your comment imply ports usually don't make such assumption
> (or they just check what happens for FAIL)?
>
> BR,
> Kewen
>
> >
> >>  @end table
> >>
> >>  @end ifset
> >> diff --git a/gcc/optabs.def b/gcc/optabs.def
> >> index ad14f9328b9..dcd77315c2a 100644
> >> --- a/gcc/optabs.def
> >> +++ b/gcc/optabs.def
> >> @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
> >>  OPTAB_D (hypot_optab, "hypot$a3")
> >>  OPTAB_D (ilogb_optab, "ilogb$a2")
> >>  OPTAB_D (isinf_optab, "isinf$a2")
> >> +OPTAB_D (isfinite_optab, "isfinite$a2")
> >>  OPTAB_D (issignaling_optab, "issignaling$a2")
> >>  OPTAB_D (ldexp_optab, "ldexp$a3")
> >>  OPTAB_D (log10_optab, "log10$a2")
>
>
>


[Patch, PR Fortran/90069] Polymorphic Return Type Memory Leak Without Intermediate Variable

2024-05-28 Thread Andre Vehreschild
Hi all,

the attached patch fixes a memory leak with unlimited polymorphic return types.
The leak occurred, because an expression with side-effects was evaluated twice.
I have substituted the check for non-variable expressions followed by creating a
SAVE_EXPR with checking for trees with side effects and creating temp. variable
and freeing the memory.
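
The failure mode can be sketched outside of Fortran. The C fragment below is only an analogy under stated assumptions: a small static pool stands in for the allocator, and all names (`make_value`, `drop_value`, `leaky_copy`, `fixed_copy`) are hypothetical. It shows how evaluating an allocating expression twice leaves objects behind, and how evaluating it once into a temporary and releasing that temporary avoids this.

```c
#include <stddef.h>

enum { POOL_SIZE = 8 };
static int pool[POOL_SIZE];
static int in_use[POOL_SIZE];

/* Stand-in for a function returning a freshly allocated object.  */
static int *make_value(void)
{
    for (int i = 0; i < POOL_SIZE; i++)
        if (!in_use[i]) {
            in_use[i] = 1;
            pool[i] = 42;
            return &pool[i];
        }
    return NULL;
}

/* Release an object obtained from make_value.  */
static void drop_value(int *p)
{
    if (p)
        in_use[p - pool] = 0;
}

/* Number of objects currently allocated.  */
static int live_count(void)
{
    int n = 0;
    for (int i = 0; i < POOL_SIZE; i++)
        n += in_use[i];
    return n;
}

/* Buggy: the side-effecting call appears, and is evaluated, twice,
   and neither result is released.  */
static void leaky_copy(int *dst)
{
    if (make_value() != NULL)
        *dst = *make_value();
}

/* Fixed: evaluate once into a temporary, copy, then release it.  */
static void fixed_copy(int *dst)
{
    int *tmp = make_value();
    if (tmp != NULL) {
        *dst = *tmp;
        drop_value(tmp);
    }
}
```

After `leaky_copy` two pool slots stay occupied, while `fixed_copy` leaves the live count unchanged.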

Btw, I do not get the SAVE_EXPR in the old code. Is there something missing to
manifest it or is a SAVE_EXPR not meant to be evaluated twice?

Anyway, regtested ok on Linux-x86_64-Fedora_39. Ok for master?

This work is funded by the Sovereign Tech Fund. Yes, the funding has been
granted and Nicolas, Mikael and I will be working on some Fortran topics in
the next 12-18 months.

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
From edd6c94b802732b0dd742ef9eca4d74aaaf6d91b Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Wed, 12 Jul 2023 16:52:15 +0200
Subject: [PATCH] Fix memory leak.

Prevent double call of function return class object
and free the object after copy.

gcc/fortran/ChangeLog:

	PR fortran/90069
	* trans-expr.cc (gfc_conv_procedure_call): Evaluate
	expressions with side-effects only once and ensure
	old is freed.

gcc/testsuite/ChangeLog:

	PR fortran/90069
	* gfortran.dg/class_76.f90: New test.
---
 gcc/fortran/trans-expr.cc  | 29 +--
 gcc/testsuite/gfortran.dg/class_76.f90 | 66 ++
 2 files changed, 92 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/class_76.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index dfc5b8e9b4a..38ba278f725 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6725,9 +6725,32 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 			{
 			  tree efield;

-			  /* Evaluate arguments just once.  */
-			  if (e->expr_type != EXPR_VARIABLE)
-parmse.expr = save_expr (parmse.expr);
+			  /* Evaluate arguments just once, when they have
+			 side effects.  */
+			  if (TREE_SIDE_EFFECTS (parmse.expr))
+{
+  tree cldata, zero;
+
+  parmse.expr = gfc_evaluate_now (parmse.expr,
+  &parmse.pre);
+
+  /* Prevent memory leak, when old component
+ was allocated already.  */
+  cldata = gfc_class_data_get (parmse.expr);
+  zero = build_int_cst (TREE_TYPE (cldata),
+			0);
+  tmp = fold_build2_loc (input_location, NE_EXPR,
+			 logical_type_node,
+			 cldata, zero);
+  tmp = build3_v (COND_EXPR, tmp,
+		  gfc_call_free (cldata),
+		  build_empty_stmt (
+		input_location));
+  gfc_add_expr_to_block (&parmse.finalblock,
+			 tmp);
+  gfc_add_modify (&parmse.finalblock,
+		  cldata, zero);
+}

 			  /* Set the _data field.  */
 			  tmp = gfc_class_data_get (var);
diff --git a/gcc/testsuite/gfortran.dg/class_76.f90 b/gcc/testsuite/gfortran.dg/class_76.f90
new file mode 100644
index 000..1ee1e1fc25f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/class_76.f90
@@ -0,0 +1,66 @@
+! { dg-do compile }
+! { dg-additional-options "-fdump-tree-original" }
+!
+! PR fortran/90069
+!
+! Contributed by Brad Richardson  
+!
+
+program returned_memory_leak
+implicit none
+
+type, abstract :: base
+end type base
+
+type, extends(base) :: extended
+end type extended
+
+type :: container
+class(*), allocatable :: thing
+end type
+
+call run()
+contains
+subroutine run()
+type(container) :: a_container
+
+a_container = theRightWay()
+a_container = theWrongWay()
+end subroutine
+
+function theRightWay()
+type(container) :: theRightWay
+
+class(base), allocatable :: thing
+
+allocate(thing, source = newAbstract())
+theRightWay = newContainer(thing)
+end function theRightWay
+
+function theWrongWay()
+type(container) :: theWrongWay
+
+theWrongWay = newContainer(newAbstract())
+end function theWrongWay
+
+function  newAbstract()
+class(base), allocatable :: newAbstract
+
+allocate(newAbstract, source = newExtended())
+end function newAbstract
+
+function newExtended()
+type(extended) :: newExtended
+end function newExtended
+
+function newContainer(thing)
+class(*), intent(in) :: thing
+type(container) :: newContainer
+
+allocate(newContainer%thing, source = thing)
+end function newContainer
+end program returned_memory_leak
+
+! { dg-final { scan-tree-dump-times "newabstract" 14 "original" } }
+! { dg-final { scan-tree-dump-times "__builtin_free" 8 "original" } }
+
--
2.45.1



[PATCH] Avoid pessimistic constraints for asm memory constraints

2024-05-28 Thread Richard Biener
We process asm memory input/outputs with constraints to ESCAPED
but for this temporarily build an ADDR_EXPR.  The issue is that
the used build_fold_addr_expr ends up wrapping the ADDR_EXPR in
a conversion which ends up producing &ANYTHING constraints which
is quite bad.  The following uses get_constraint_for_address_of
instead, avoiding the temporary tree and the unhandled conversion.
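
For context, the operands in question are those whose constraint allows memory but not a register. A minimal sketch using the GNU C extended asm extension (an assumption: this needs GCC or a compatible compiler) looks as follows; the function names are hypothetical and the asm templates are intentionally empty.

```c
/* "+m": a read-write memory-only operand.  The asm is given the
   location of x, so the address of x must be treated as escaping.  */
static int touch_output(int x)
{
    __asm__ volatile ("" : "+m" (x));
    return x;
}

/* "m": a memory-only input operand.  */
static int touch_input(int y)
{
    __asm__ volatile ("" : : "m" (y));
    return y;
}
```

Because the templates are empty, both functions return their argument unchanged; the point is only the shape of the constraints that find_func_aliases has to model.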

This avoids a gcc.dg/tree-ssa/restrict-9.c FAIL with the fix
for PR115236.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

* tree-ssa-structalias.cc (find_func_aliases): Use
get_constraint_for_address_of to build escape constraints
for asm inputs and outputs.
---
 gcc/tree-ssa-structalias.cc | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
index 53552b63532..330e64e65da 100644
--- a/gcc/tree-ssa-structalias.cc
+++ b/gcc/tree-ssa-structalias.cc
@@ -5277,7 +5277,11 @@ find_func_aliases (struct function *fn, gimple *origt)
 
  /* A memory constraint makes the address of the operand escape.  */
  if (!allows_reg && allows_mem)
-   make_escape_constraint (build_fold_addr_expr (op));
+   {
+ auto_vec tmpc;
+ get_constraint_for_address_of (op, &tmpc);
+ make_constraints_to (escaped_id, tmpc);
+   }
 
  /* The asm may read global memory, so outputs may point to
 any global memory.  */
@@ -5306,7 +5310,11 @@ find_func_aliases (struct function *fn, gimple *origt)
 
  /* A memory constraint makes the address of the operand escape.  */
  if (!allows_reg && allows_mem)
-   make_escape_constraint (build_fold_addr_expr (op));
+   {
+ auto_vec tmpc;
+ get_constraint_for_address_of (op, &tmpc);
+ make_constraints_to (escaped_id, tmpc);
+   }
  /* Strictly we'd only need the constraint to ESCAPED if
 the asm clobbers memory, otherwise using something
 along the lines of per-call clobbers/uses would be enough.  */
-- 
2.35.3


[PATCH] tree-optimization/115252 - enhance peeling for gaps avoidance

2024-05-28 Thread Richard Biener
Code generation for contiguous load vectorization can already deal
with generalized avoidance of loading from a gap.  The following
extends detection of peeling for gaps requirement with that,
gets rid of the old special casing of a half load and makes sure
when we do access the gap we have peeling for gaps enabled.
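
The divisibility test that the patch adds to get_group_load_store_type can be modelled with plain integers. The sketch below is a simplification under stated assumptions: all quantities are compile-time constants rather than poly_ints, the alignment and masking conditions are omitted, the vector composition type is assumed to exist, and `gap_access_avoidable` is a hypothetical name.

```c
#include <stdbool.h>

/* Simplified scalar model of the new condition.  */
static bool gap_access_avoidable(unsigned group_size, unsigned vf,
                                 unsigned gap, unsigned nunits)
{
    /* Elements actually needed before the trailing gap.  */
    unsigned accessed = group_size * vf - gap;
    /* Elements that land in the last, partially used vector.  */
    unsigned remain = accessed % nunits;

    if (remain == 0)
        return true;   /* every vector load is fully used */

    /* Otherwise the last vector must be composable from
       nunits / remain equally sized pieces, so that only the
       first piece needs to be loaded.  */
    return nunits % remain == 0;
}
```

For example, with a group of 16 elements, a gap of 12 and 16-element vectors, 4 elements remain and 16 is a multiple of 4, so the check passes in this model; with 3 elements remaining in an 8-element vector it does not.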

Bootstrapped and tested on x86_64-unknown-linux-gnu.

This is the first patch in a series to improve peeling for gaps,
it turned into an improvement for code rather than just doing
the (delayed from stage3) removal of the "old" half-vector codepath.

I'll wait for the pre-CI testing for pushing so you also have time
for some comments.

Richard.

PR tree-optimization/115252
* tree-vect-stmts.cc (get_group_load_store_type): Enhance
detecting the number of cases where we can avoid accessing a gap
during code generation.
(vectorizable_load): Remove old half-vector peeling for gap
avoidance which is now redundant.  Add gap-aligned case where
it's OK to access the gap.  Add assert that we have peeling for
gaps enabled when we access a gap.

* gcc.dg/vect/slp-gap-1.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/slp-gap-1.c | 18 +
 gcc/tree-vect-stmts.cc| 58 +--
 2 files changed, 46 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/slp-gap-1.c

diff --git a/gcc/testsuite/gcc.dg/vect/slp-gap-1.c 
b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
new file mode 100644
index 000..36463ca22c5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+
+typedef unsigned char uint8_t;
+typedef short int16_t;
+void pixel_sub_wxh(int16_t * __restrict diff, uint8_t *pix1, uint8_t *pix2) {
+  for (int y = 0; y < 4; y++) {
+for (int x = 0; x < 4; x++)
+  diff[x + y * 4] = pix1[x] - pix2[x];
+pix1 += 16;
+pix2 += 32;
+  }
+}
+
+/* We can vectorize this without peeling for gaps and thus without epilogue,
+   but the only thing we can reliably scan is the zero-padding trick for the
+   partial loads.  */
+/* { dg-final { scan-tree-dump-times "\{_\[0-9\]\+, 0" 6 "vect" { target 
vect64 } } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a01099d3456..b26cc74f417 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2072,16 +2072,22 @@ get_group_load_store_type (vec_info *vinfo, 
stmt_vec_info stmt_info,
  dr_alignment_support alss;
  int misalign = dr_misalignment (first_dr_info, vectype);
  tree half_vtype;
+ poly_uint64 remain;
+ unsigned HOST_WIDE_INT tem, num;
  if (overrun_p
  && !masked_p
  && (((alss = vect_supportable_dr_alignment (vinfo, first_dr_info,
  vectype, misalign)))
   == dr_aligned
  || alss == dr_unaligned_supported)
- && known_eq (nunits, (group_size - gap) * 2)
- && known_eq (nunits, group_size)
- && (vector_vector_composition_type (vectype, 2, &half_vtype)
- != NULL_TREE))
+ && can_div_trunc_p (group_size
+ * LOOP_VINFO_VECT_FACTOR (loop_vinfo) - gap,
+ nunits, &tem, &remain)
+ && (known_eq (remain, 0u)
+ || (constant_multiple_p (nunits, remain, &num)
+ && (vector_vector_composition_type (vectype, num,
+ &half_vtype)
+ != NULL_TREE
overrun_p = false;
 
  if (overrun_p && !can_overrun_p)
@@ -11533,33 +11539,14 @@ vectorizable_load (vec_info *vinfo,
unsigned HOST_WIDE_INT gap = DR_GROUP_GAP (first_stmt_info);
unsigned int vect_align
  = vect_known_alignment_in_bytes (first_dr_info, vectype);
-   unsigned int scalar_dr_size
- = vect_get_scalar_dr_size (first_dr_info);
-   /* If there's no peeling for gaps but we have a gap
-  with slp loads then load the lower half of the
-  vector only.  See get_group_load_store_type for
-  when we apply this optimization.  */
-   if (slp
-   && loop_vinfo
-   && !LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) && gap != 0
-   && known_eq (nunits, (group_size - gap) * 2)
-   && known_eq (nunits, group_size)
-   && gap >= (vect_align / scalar_dr_size))
- {
-   tree half_vtype;
-   new_vtype
- = vector_vector_composition_type (vectype, 2,
- 

Re: [wwwdocs][patch] gcc-15/changes.html: Fortran - mention F2023 logical-kind additions

2024-05-28 Thread FX Coudert
Seems good, thanks Tobias!

FX


[PATCH] tree-optimization/115236 - more points-to *ANYTHING = x fixes

2024-05-28 Thread Richard Biener
The stored-to ANYTHING handling has more holes, uncovered by treating
volatile accesses as ANYTHING.  We fail to properly build the
pred and succ graphs, in particular we may not elide direct nodes
from receiving from STOREDANYTHING.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/115236
* tree-ssa-structalias.cc (build_pred_graph): Properly
handle *ANYTHING = X.
(build_succ_graph): Likewise.  Do not elide direct nodes
from receiving from STOREDANYTHING.

* gcc.dg/pr115236.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr115236.c | 12 
 gcc/tree-ssa-structalias.cc | 20 ++--
 2 files changed, 26 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr115236.c

diff --git a/gcc/testsuite/gcc.dg/pr115236.c b/gcc/testsuite/gcc.dg/pr115236.c
new file mode 100644
index 000..91edfab957a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr115236.c
@@ -0,0 +1,12 @@
+/* { dg-do run } */
+/* { dg-options "-O -fno-tree-fre" } */
+
+int a, *b = &a;
+int main()
+{
+  int *c, *volatile *d = &c;
+  *d = b;
+  if (c != &a)
+__builtin_abort();
+  return 0;
+}
diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
index 9cec2c6cfd9..330e64e65da 100644
--- a/gcc/tree-ssa-structalias.cc
+++ b/gcc/tree-ssa-structalias.cc
@@ -1312,7 +1312,12 @@ build_pred_graph (void)
{
  /* *x = y.  */
  if (rhs.offset == 0 && lhs.offset == 0 && rhs.type == SCALAR)
-   add_pred_graph_edge (graph, FIRST_REF_NODE + lhsvar, rhsvar);
+   {
+ if (lhs.var == anything_id)
+   add_pred_graph_edge (graph, storedanything_id, rhsvar);
+ else
+   add_pred_graph_edge (graph, FIRST_REF_NODE + lhsvar, rhsvar);
+   }
}
   else if (rhs.type == DEREF)
{
@@ -1398,7 +1403,12 @@ build_succ_graph (void)
   if (lhs.type == DEREF)
{
  if (rhs.offset == 0 && lhs.offset == 0 && rhs.type == SCALAR)
-   add_graph_edge (graph, FIRST_REF_NODE + lhsvar, rhsvar);
+   {
+ if (lhs.var == anything_id)
+   add_graph_edge (graph, storedanything_id, rhsvar);
+ else
+   add_graph_edge (graph, FIRST_REF_NODE + lhsvar, rhsvar);
+   }
}
   else if (rhs.type == DEREF)
{
@@ -1418,13 +1428,11 @@ build_succ_graph (void)
}
 }
 
-  /* Add edges from STOREDANYTHING to all non-direct nodes that can
- receive pointers.  */
+  /* Add edges from STOREDANYTHING to all nodes that can receive pointers.  */
   t = find (storedanything_id);
   for (i = integer_id + 1; i < FIRST_REF_NODE; ++i)
 {
-  if (!bitmap_bit_p (graph->direct_nodes, i)
- && get_varinfo (i)->may_have_pointers)
+  if (get_varinfo (i)->may_have_pointers)
add_graph_edge (graph, find (i), t);
 }
 
-- 
2.35.3


Re: [committed v2 0/2] VAX: Fix issues with FP format option documentation

2024-05-28 Thread Mark Wielaard
Hi Maciej (Hi David, added to CC),

On Mon, 2024-05-27 at 05:19 +0100, Maciej W. Rozycki wrote:
>  As reported in PR target/79646 and fixed by a change proposed by Abe we 
> have a couple of issues with the descriptions of the VAX floating-point 
> format options in the option definition file.  Additionally most of these 
> options are not documented in the manual.
> 
>  This mini patch series addresses these issues, including Abe's change, 
> slightly updated, and my new change.  See individual change descriptions 
> for details.
> 
>  Verified by inspecting output produced by `vax-netbsdelf-gcc -v --help' 
> and by eyeballing `gcc.info' and `gcc.pdf' files produced.  Committed.

This broke the gcc-autoregen checker because the
gcc/config/vax/vax.opt.urls file wasn't regenerated:
https://builder.sourceware.org/buildbot/#/builders/269/builds/5347

Producing the following diff:

diff --git a/gcc/config/vax/vax.opt.urls b/gcc/config/vax/vax.opt.urls
index c6b1c418b61..ca78b31dd4c 100644
--- a/gcc/config/vax/vax.opt.urls
+++ b/gcc/config/vax/vax.opt.urls
@@ -1,7 +1,13 @@
 ; Autogenerated by regenerate-opt-urls.py from gcc/config/vax/vax.opt and 
generated HTML
 
+; skipping UrlSuffix for 'md' due to finding no URLs
+
+; skipping UrlSuffix for 'md-float' due to finding no URLs
+
 ; skipping UrlSuffix for 'mg' due to finding no URLs
 
+; skipping UrlSuffix for 'mg-float' due to finding no URLs
+
 ; skipping UrlSuffix for 'mgnu' due to finding no URLs
 
 ; skipping UrlSuffix for 'munix' due to finding no URLs

I am not completely clear on why though. Since it seems you actually
did add documentation for exactly these options.

David, should the above diff just be checked in, or do we need to
investigate why the URLs weren't found?

Cheers,

Mark


Re: [PATCH] libstdc++: Avoid MMX return types from __builtin_shufflevector

2024-05-28 Thread Jonathan Wakely
On Wed, 15 May 2024 at 20:50, Matthias Kretz  wrote:
>
> Tested on aarch64-linux-gnu, arm-linux-gnueabihf, powerpc64le-linux-gnu,
> x86_64-linux-gnu (-m64, -m32, -mx32), and arm-linux-gnueabi
>
> OK for trunk?

OK

> And when backporting, should I squash it with the commit that
> introduced the regression?

I don't mind about that. If you cherry-pick them next to each other
and push them at the same time, nobody's going to end up using the
broken commit before the fix. It's fine to squash it if you prefer to
though.

OK for backports either way.

>
>  8< ---
>
> This resolves a regression on i686 that was introduced with
> r15-429-gfb1649f8b4ad50.
>
> Signed-off-by: Matthias Kretz 
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/114958
> * include/experimental/bits/simd.h (__as_vector): Don't use
> vector_size(8) on __i386__.
> (__vec_shuffle): Never return MMX vectors, widen to 16 bytes
> instead.
> (concat): Fix padding calculation to pick up widening logic from
> __as_vector.
> ---
>  libstdc++-v3/include/experimental/bits/simd.h | 39 +--
>  1 file changed, 28 insertions(+), 11 deletions(-)
>
>
> --
> ──
>  Dr. Matthias Kretz   https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
>  stdₓ::simd
> ──


[PATCH] target/115254 - fix gcc.dg/vect/vect-gather-4.c dump scanning

2024-05-28 Thread Richard Biener
The dump scanning is supposed to check that we do not merge two
slightly different gathers into one SLP node but since we now
SLP the store scanning for "ectorizing stmts using SLP" is no
longer good.  Instead the following makes us look for
"stmt 1 .* = .MASK" which would be how the second lane of an SLP
node looks like.  We have to handle both .MASK_GATHER_LOAD (for
targets with ifun mask gathers) and .MASK_LOAD (for ones without).

Tested on x86_64-linux with and without native gather and on GCN
where this now avoids a FAIL.

Pushed.

PR target/115254
* gcc.dg/vect/vect-gather-4.c: Adjust dump scan.
---
 gcc/testsuite/gcc.dg/vect/vect-gather-4.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-4.c 
b/gcc/testsuite/gcc.dg/vect/vect-gather-4.c
index d18094d6982..edd9a6783c2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-gather-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-4.c
@@ -45,4 +45,7 @@ f3 (int *restrict y, int *restrict x, int *restrict indices)
 }
 }
 
-/* { dg-final { scan-tree-dump-not "vectorizing stmts using SLP" vect } } */
+/* We do not want to see a two-lane .MASK_LOAD or .MASK_GATHER_LOAD since
+   the gathers are different on each lane.  This is a bit fragile and
+   should possibly be turned into a runtime test.  */
+/* { dg-final { scan-tree-dump-not "stmt 1 \[^\r\n\]* = .MASK" vect } } */
-- 
2.35.3


Re: [PATCH] vect: Use vect representative statement instead of original in patch recog [PR115060]

2024-05-28 Thread Feng Xue OS
Changed as the comments.

Thanks,
Feng


From: Richard Biener 
Sent: Tuesday, May 28, 2024 5:34 PM
To: Feng Xue OS
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] vect: Use vect representative statement instead of original in patch recog [PR115060]

On Sat, May 25, 2024 at 4:45 PM Feng Xue OS  wrote:
>
> Some utility functions (such as vect_look_through_possible_promotion) that are
> to find out certain kind of direct or indirect definition SSA for a value, may
> return the original one of the SSA, not its pattern representative SSA, even
> pattern is involved. For example,
>
>a = (T1) patt_b;
>patt_b = (T2) c;// b = ...
>patt_c = not-a-cast;// c = ...
>
> Given 'a', the mentioned function will return 'c', instead of 'patt_c'. This
> subtlety would make some pattern recog code that is unaware of it mis-use the
> original instead of the new pattern statement, which is inconsistent with
> processing logic of the pattern formation pass. This patch corrects the issue
> by forcing another utility function (vect_get_internal_def) return the pattern
> statement information to caller by default.
>
> Regression test on x86-64 and aarch64.
>
> Feng
> --
> gcc/
> PR tree-optimization/115060
> * tree-vect-patterns.h (vect_get_internal_def): Add a new parameter
> for_vectorize.
> (vect_widened_op_tree): Call vect_get_internal_def instead of look_def
> to get statement information.
> (vect_recog_widen_abd_pattern): No need to call 
> vect_stmt_to_vectorize.
> ---
>  gcc/tree-vect-patterns.cc | 16 +++-
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index a313dc64643..fa35bf26372 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -258,15 +258,21 @@ vect_element_precision (unsigned int precision)
>  }
>
>  /* If OP is defined by a statement that's being considered for vectorization,
> -   return information about that statement, otherwise return NULL.  */
> +   return information about that statement, otherwise return NULL.
> +   FOR_VECTORIZE is used to specify whether original or vectorization
> +   representative (if have) statement information is returned.  */
>
>  static stmt_vec_info
> -vect_get_internal_def (vec_info *vinfo, tree op)
> +vect_get_internal_def (vec_info *vinfo, tree op, bool for_vectorize = true)

I'm probably blind - but you nowhere pass 'false' and I think returning the
pattern stmt is the correct behavior always.

OK with omitting the new parameter.

>  {
>stmt_vec_info def_stmt_info = vinfo->lookup_def (op);
>if (def_stmt_info
>&& STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_internal_def)
> -return def_stmt_info;
> +{
> +  if (for_vectorize)
> +   def_stmt_info = vect_stmt_to_vectorize (def_stmt_info);
> +  return def_stmt_info;
> +}
>return NULL;
>  }
>
> @@ -655,7 +661,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info 
> stmt_info, tree_code code,
>
>   /* Recursively process the definition of the operand.  */
>   stmt_vec_info def_stmt_info
> -   = vinfo->lookup_def (this_unprom->op);
> +   = vect_get_internal_def (vinfo, this_unprom->op);
> +
>   nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
>widened_code, shift_p, max_nops,
>this_unprom, common_type,
> @@ -1739,7 +1746,6 @@ vect_recog_widen_abd_pattern (vec_info *vinfo, 
> stmt_vec_info stmt_vinfo,
>if (!abd_pattern_vinfo)
>  return NULL;
>
> -  abd_pattern_vinfo = vect_stmt_to_vectorize (abd_pattern_vinfo);
>gcall *abd_stmt = dyn_cast  (STMT_VINFO_STMT (abd_pattern_vinfo));
>if (!abd_stmt
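
The behaviour agreed on above, always returning the pattern statement, can be illustrated with a toy model. This is a sketch only: `toy_stmt` and its fields are hypothetical stand-ins for stmt_vec_info, and the functions merely mimic the shape of vect_stmt_to_vectorize and vect_get_internal_def without the extra parameter.

```c
#include <stddef.h>

/* Toy stand-in for stmt_vec_info; field names are hypothetical.  */
struct toy_stmt {
    const char *name;
    int is_internal_def;            /* defined inside the vectorized region */
    struct toy_stmt *pattern_stmt;  /* replacement from pattern recog, or NULL */
};

/* Return the statement that will actually be vectorized.  */
static struct toy_stmt *toy_stmt_to_vectorize(struct toy_stmt *s)
{
    return s->pattern_stmt ? s->pattern_stmt : s;
}

/* Look up a definition and always hand back its pattern
   representative when one exists.  */
static struct toy_stmt *toy_get_internal_def(struct toy_stmt *def)
{
    if (def && def->is_internal_def)
        return toy_stmt_to_vectorize(def);
    return NULL;
}
```

In terms of the example from the patch description, looking up `c` in this model yields `patt_c`, so callers no longer need their own vect_stmt_to_vectorize step.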


[PATCH] libstdc++: Build libbacktrace and 19_diagnostics/stacktrace with -funwind-tables [PR111641]

2024-05-28 Thread Rainer Orth
Several of the 19_diagnostics/stacktrace tests FAIL on Solaris/SPARC (32
and 64-bit), Solaris/x86 (32-bit only), and several other targets:

FAIL: 19_diagnostics/stacktrace/current.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/current.cc  -std=gnu++26 execution test
FAIL: 19_diagnostics/stacktrace/entry.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/entry.cc  -std=gnu++26 execution test
FAIL: 19_diagnostics/stacktrace/output.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/output.cc  -std=gnu++26 execution test
FAIL: 19_diagnostics/stacktrace/stacktrace.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/stacktrace.cc  -std=gnu++26 execution test

As it turns out, both the copy of libbacktrace in libstdc++ and the
testcases proper need to be compiled with -funwind-tables, as is done for
libbacktrace itself.

This isn't an issue on Linux/x86_64 and Solaris/amd64 since 64-bit x86
always defaults to -funwind-tables.  32-bit x86 does, too, when
-fomit-frame-pointer is enabled as on Linux/i686, but unlike
Solaris/i386.

So this patch always enables the option both for the libbacktrace copy
and the testcases.
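
To illustrate why the option matters: as far as I can tell, libbacktrace
collects frames through the unwinder's _Unwind_Backtrace, which can only
walk through code that carries unwind information.  A minimal sketch of
such a walk follows; count_frame and count_visible_frames are hypothetical
names used only for this example and are not part of the patch.

```cpp
#include <cassert>
#include <unwind.h>

// Hypothetical callback: bump the counter once per frame the unwinder reports.
static _Unwind_Reason_Code
count_frame (struct _Unwind_Context *, void *data)
{
  assert (data != nullptr);
  ++*static_cast<int *> (data);
  return _URC_NO_REASON;
}

// Hypothetical helper: number of frames the unwinder can see from here.
int
count_visible_frames ()
{
  int frames = 0;
  _Unwind_Backtrace (count_frame, &frames);
  return frames;
}
```

On a target where -funwind-tables is not the default, building the callers
without it can make such a walk stop early, which matches the failures
listed above.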

Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and
x86_64-pc-linux-gnu.

Ok for trunk?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-05-23  Rainer Orth  

libstdc++-v3:
PR libstdc++/111641
* src/libbacktrace/Makefile.am (AM_CFLAGS): Add -funwind-tables.
* src/libbacktrace/Makefile.in: Regenerate.

* testsuite/19_diagnostics/stacktrace/current.cc (dg-options): Add
-funwind-tables.
* testsuite/19_diagnostics/stacktrace/entry.cc: Likewise.
* testsuite/19_diagnostics/stacktrace/hash.cc: Likewise.
* testsuite/19_diagnostics/stacktrace/output.cc: Likewise.
* testsuite/19_diagnostics/stacktrace/stacktrace.cc: Likewise.

# HG changeset patch
# Parent  a0526be1377da6b48eacbdd53f1d0e0b02ddb731
libstdc++: Build libbacktrace and 19_diagnostics/stacktrace with -funwind-tables [PR111641]

diff --git a/libstdc++-v3/src/libbacktrace/Makefile.am b/libstdc++-v3/src/libbacktrace/Makefile.am
--- a/libstdc++-v3/src/libbacktrace/Makefile.am
+++ b/libstdc++-v3/src/libbacktrace/Makefile.am
@@ -51,7 +51,7 @@ C_WARN_FLAGS = $(WARN_FLAGS) -Wstrict-pr
 CXX_WARN_FLAGS = $(WARN_FLAGS) -Wno-unused-parameter
 AM_CFLAGS = \
 	$(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \
-	$(C_WARN_FLAGS)
+	$(C_WARN_FLAGS) -funwind-tables
 AM_CFLAGS += $(EXTRA_CFLAGS)
 AM_CXXFLAGS = \
 	$(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \
diff --git a/libstdc++-v3/src/libbacktrace/Makefile.in b/libstdc++-v3/src/libbacktrace/Makefile.in
--- a/libstdc++-v3/src/libbacktrace/Makefile.in
+++ b/libstdc++-v3/src/libbacktrace/Makefile.in
@@ -473,7 +473,7 @@ libstdc___libbacktrace_la_CPPFLAGS = \
 C_WARN_FLAGS = $(WARN_FLAGS) -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -Wno-unused-but-set-variable
 CXX_WARN_FLAGS = $(WARN_FLAGS) -Wno-unused-parameter
 AM_CFLAGS = $(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \
-	$(C_WARN_FLAGS) $(EXTRA_CFLAGS)
+	$(C_WARN_FLAGS) -funwind-tables $(EXTRA_CFLAGS)
 AM_CXXFLAGS = $(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \
 	$(CXX_WARN_FLAGS) -fno-rtti -fno-exceptions $(EXTRA_CXXFLAGS)
 obj_prefix = std_stacktrace
diff --git a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/current.cc b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/current.cc
--- a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/current.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/current.cc
@@ -1,4 +1,4 @@
-// { dg-options "-lstdc++exp" }
+// { dg-options "-funwind-tables -lstdc++exp" }
 // { dg-do run { target c++23 } }
 // { dg-require-cpp-feature-test __cpp_lib_stacktrace }
 
diff --git a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/entry.cc b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/entry.cc
--- a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/entry.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/entry.cc
@@ -1,4 +1,4 @@
-// { dg-options "-lstdc++exp" }
+// { dg-options "-funwind-tables -lstdc++exp" }
 // { dg-do run { target c++23 } }
 // { dg-require-cpp-feature-test __cpp_lib_stacktrace }
 
diff --git a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc
--- a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc
@@ -1,4 +1,4 @@
-// { dg-options "-lstdc++exp" }
+// { dg-options "-funwind-tables -lstdc++exp" }
 // { dg-do run { target c++23 } }
 // { dg-require-cpp-feature-test __cpp_lib_stacktrace }
 
diff --git a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/output.cc b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/output.cc
--- a/libstdc++-v3/testsuite/19_diagnostics/s

Re: [PATCH] libstdc++: Build libbacktrace and 19_diagnostics/stacktrace with -funwind-tables [PR111641]

2024-05-28 Thread Jonathan Wakely
On Tue, 28 May 2024 at 15:25, Rainer Orth  wrote:
>
> Several of the 19_diagnostics/stacktrace tests FAIL on Solaris/SPARC (32
> and 64-bit), Solaris/x86 (32-bit only), and several other targets:
>
> FAIL: 19_diagnostics/stacktrace/current.cc  -std=gnu++23 execution test
> FAIL: 19_diagnostics/stacktrace/current.cc  -std=gnu++26 execution test
> FAIL: 19_diagnostics/stacktrace/entry.cc  -std=gnu++23 execution test
> FAIL: 19_diagnostics/stacktrace/entry.cc  -std=gnu++26 execution test
> FAIL: 19_diagnostics/stacktrace/output.cc  -std=gnu++23 execution test
> FAIL: 19_diagnostics/stacktrace/output.cc  -std=gnu++26 execution test
> FAIL: 19_diagnostics/stacktrace/stacktrace.cc  -std=gnu++23 execution test
> FAIL: 19_diagnostics/stacktrace/stacktrace.cc  -std=gnu++26 execution test
>
> As it turns out, both the copy of libbacktrace in libstdc++ and the
> testcases proper need to be compiled with -funwind-tables, as is done for
> libbacktrace itself.
>
> This isn't an issue on Linux/x86_64 and Solaris/amd64 since 64-bit x86
> always defaults to -funwind-tables.  32-bit x86 does, too, when
> -fomit-frame-pointer is enabled as on Linux/i686, but unlike
> Solaris/i386.
>
> So this patch always enables the option both for the libbacktrace copy
> and the testcases.
>
> Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and
> x86_64-pc-linux-gnu.
>
> Ok for trunk?

OK for trunk and gcc-14. Thanks for figuring out the problem here!


>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
>
>
> 2024-05-23  Rainer Orth  
>
> libstdc++-v3:
> PR libstdc++/111641
> * src/libbacktrace/Makefile.am (AM_CFLAGS): Add -funwind-tables.
> * src/libbacktrace/Makefile.in: Regenerate.
>
> * testsuite/19_diagnostics/stacktrace/current.cc (dg-options): Add
> -funwind-tables.
> * testsuite/19_diagnostics/stacktrace/entry.cc: Likewise.
> * testsuite/19_diagnostics/stacktrace/hash.cc: Likewise.
> * testsuite/19_diagnostics/stacktrace/output.cc: Likewise.
> * testsuite/19_diagnostics/stacktrace/stacktrace.cc: Likewise.
>



[PATCH] regenerate-opt-urls.py: fix transposed values for "vax" and "v850"

2024-05-28 Thread David Malcolm
On Tue, 2024-05-28 at 15:03 +0200, Mark Wielaard wrote:
> Hi Maciej (Hi David, added to CC),
> 
> On Mon, 2024-05-27 at 05:19 +0100, Maciej W. Rozycki wrote:
> >  As reported in PR target/79646 and fixed by a change proposed by Abe we
> > have a couple of issues with the descriptions of the VAX floating-point
> > format options in the option definition file.  Additionally most of these
> > options are not documented in the manual.
> > 
> >  This mini patch series addresses these issues, including Abe's change,
> > slightly updated, and my new change.  See individual change descriptions
> > for details.
> > 
> >  Verified by inspecting output produced by `vax-netbsdelf-gcc -v --help'
> > and by eyeballing `gcc.info' and `gcc.pdf' files produced.  Committed.
>
> This broke the gcc-autoregen checker because the
> gcc/config/vax/vax.opt.urls file wasn't regenerated:
> https://builder.sourceware.org/buildbot/#/builders/269/builds/5347
> 
> Producing the following diff:
> 
> diff --git a/gcc/config/vax/vax.opt.urls
> b/gcc/config/vax/vax.opt.urls
> index c6b1c418b61..ca78b31dd4c 100644
> --- a/gcc/config/vax/vax.opt.urls
> +++ b/gcc/config/vax/vax.opt.urls
> @@ -1,7 +1,13 @@
>  ; Autogenerated by regenerate-opt-urls.py from
> gcc/config/vax/vax.opt and generated HTML
>  
> +; skipping UrlSuffix for 'md' due to finding no URLs
> +
> +; skipping UrlSuffix for 'md-float' due to finding no URLs
> +
>  ; skipping UrlSuffix for 'mg' due to finding no URLs
>  
> +; skipping UrlSuffix for 'mg-float' due to finding no URLs
> +
>  ; skipping UrlSuffix for 'mgnu' due to finding no URLs
>  
>  ; skipping UrlSuffix for 'munix' due to finding no URLs
> 
> I am not completely clear on why, though, since it seems you actually
> did add documentation for exactly these options.
> 
> David, should the above diff just be checked in, or do we need to
> investigate why the URLs weren't found?

[adding Nick, re the v850 target]

I found the problem - I messed up when I was populating
TARGET_SPECIFIC_PAGES in regenerate-opt-urls.py, accidentally
transposing the entries for v850 and vax by writing:

'gcc/V850-Options.html' : 'gcc/config/vax/',
'gcc/VAX-Options.html' : 'gcc/config/v850/',

leading to both gcc/config/v850/v850.opt.urls and
gcc/config/vax/vax.opt.urls being full of such comments.

Sorry.

Fixing that leads to the files for both targets being populated with
correct-looking URL entries.
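
A transposition like this could be caught mechanically.  The following is
only a sketch, in C++ rather than in the script's own Python, of a sanity
check that compares the target named in the page with the one named in the
directory.  page_target, dir_target and mismatched_pages are hypothetical
helpers, and the heuristic assumes the page is called <TARGET>-Options.html,
which does not hold for every target.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: "gcc/V850-Options.html" -> "v850".
std::string
page_target (const std::string &page)
{
  std::string::size_type slash = page.rfind ('/');
  std::string::size_type start = (slash == std::string::npos) ? 0 : slash + 1;
  std::string::size_type dash = page.find ("-Options", start);
  std::string name
    = page.substr (start, dash == std::string::npos ? std::string::npos
						    : dash - start);
  std::transform (name.begin (), name.end (), name.begin (),
		  [] (unsigned char c)
		  { return static_cast<char> (std::tolower (c)); });
  return name;
}

// Hypothetical helper: "gcc/config/vax/" -> "vax".
std::string
dir_target (std::string dir)
{
  assert (!dir.empty ());
  if (dir.back () == '/')
    dir.pop_back ();
  std::string::size_type slash = dir.rfind ('/');
  return slash == std::string::npos ? dir : dir.substr (slash + 1);
}

// Return the pages whose name disagrees with the directory they map to.
std::vector<std::string>
mismatched_pages (const std::map<std::string, std::string> &pages)
{
  std::vector<std::string> bad;
  for (const auto &entry : pages)
    if (page_target (entry.first) != dir_target (entry.second))
      bad.push_back (entry.first);
  return bad;
}
```

With the two transposed entries quoted above, both pages are reported;
with the values swapped back, none are.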

I'll push this to trunk (and backport to gcc 14) after suitable testing.

Dave

gcc/ChangeLog:
* config/v850/v850.opt.urls: Regenerate, with fix.
* config/vax/vax.opt.urls: Likewise.
* regenerate-opt-urls.py (TARGET_SPECIFIC_PAGES): Fix transposed
values for "vax" and "v850".

Signed-off-by: David Malcolm 
---
 gcc/config/v850/v850.opt.urls | 81 +++
 gcc/config/vax/vax.opt.urls   | 21 +++--
 gcc/regenerate-opt-urls.py|  4 +-
 3 files changed, 73 insertions(+), 33 deletions(-)

diff --git a/gcc/config/v850/v850.opt.urls b/gcc/config/v850/v850.opt.urls
index dc5a83107b3..a06f4833f47 100644
--- a/gcc/config/v850/v850.opt.urls
+++ b/gcc/config/v850/v850.opt.urls
@@ -1,60 +1,87 @@
 ; Autogenerated by regenerate-opt-urls.py from gcc/config/v850/v850.opt and 
generated HTML
 
-; skipping UrlSuffix for 'mapp-regs' due to finding no URLs
+mapp-regs
+UrlSuffix(gcc/V850-Options.html#index-mapp-regs-1)
 
-; skipping UrlSuffix for 'mbig-switch' due to finding no URLs
+mbig-switch
+UrlSuffix(gcc/V850-Options.html#index-mbig-switch-1)
 
 ; skipping UrlSuffix for 'mdebug' due to finding no URLs
 
-; skipping UrlSuffix for 'mdisable-callt' due to finding no URLs
+mdisable-callt
+UrlSuffix(gcc/V850-Options.html#index-mdisable-callt)
 
-; skipping UrlSuffix for 'mep' due to finding no URLs
+mep
+UrlSuffix(gcc/V850-Options.html#index-mep)
 
-; skipping UrlSuffix for 'mghs' due to finding no URLs
+mghs
+UrlSuffix(gcc/V850-Options.html#index-mghs)
 
-; skipping UrlSuffix for 'mlong-calls' due to finding no URLs
+mlong-calls
+UrlSuffix(gcc/V850-Options.html#index-mlong-calls-7)
 
-; skipping UrlSuffix for 'mprolog-function' due to finding no URLs
+mprolog-function
+UrlSuffix(gcc/V850-Options.html#index-mprolog-function)
 
-; skipping UrlSuffix for 'msda=' due to finding no URLs
+msda=
+UrlSuffix(gcc/V850-Options.html#index-msda)
 
-; skipping UrlSuffix for 'mspace' due to finding no URLs
+mspace
+UrlSuffix(gcc/V850-Options.html#index-mspace)
 
-; skipping UrlSuffix for 'mtda=' due to finding no URLs
+mtda=
+UrlSuffix(gcc/V850-Options.html#index-mtda)
 
 ; skipping UrlSuffix for 'mno-strict-align' due to finding no URLs
 
-; skipping UrlSuffix for 'mv850' due to finding no URLs
+mv850
+UrlSuffix(gcc/V850-Options.html#index-mv850)
 
-; skipping UrlSuffix for 'mv850e' due to finding no URLs
+mv850e
+UrlSuffix(gcc/V850-Options.html#index-mv850e)
 
-; skipping UrlSuffix for 'mv850e1' due to finding no URLs
+mv850e1
+UrlSuffix(gcc/V850-Options.html#index-mv850

[PATCH] MIPS/testsuite: Fix bseli.b fail in msa-builtins.c

2024-05-28 Thread YunQiang Su
commit 05daf617ea22e1d818295ed2d037456937e23530
Author: Jeff Law 
Date:   Sat May 25 12:39:05 2024 -0600

[committed] [v2] More logical op simplifications in simplify-rtx.cc

does some simplifications, after which `bseli.b $w1,$w0,255` is found to be
the same as `or.v $w1,$w0,$w1`, so no bseli.b instruction is generated.

Let's use 254 instead of 255 to test the generation of `bseli.b`.
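
The equivalence can be checked exhaustively per byte.  The sketch below
assumes the usual reading of bseli.b, namely that each result bit comes from
the immediate where the destination bit is 1 and from the source where it
is 0.  bseli_b, or_v and count_differences are hypothetical helper names
and model a single byte lane only.

```cpp
#include <cassert>
#include <cstdint>

// Assumed per-byte semantics of `bseli.b wd,ws,imm`:
// take bits from IMM where WD has a 1, and from WS where WD has a 0.
constexpr std::uint8_t
bseli_b (std::uint8_t wd, std::uint8_t ws, std::uint8_t imm)
{
  return static_cast<std::uint8_t> ((ws & ~wd) | (imm & wd));
}

// Per-byte semantics of `or.v wd,ws,wd`.
constexpr std::uint8_t
or_v (std::uint8_t wd, std::uint8_t ws)
{
  return static_cast<std::uint8_t> (wd | ws);
}

// Count the (wd, ws) byte pairs for which bseli.b with IMM differs from or.v.
int
count_differences (int imm)
{
  assert (imm >= 0 && imm <= 255);
  int differences = 0;
  for (int wd = 0; wd < 256; ++wd)
    for (int ws = 0; ws < 256; ++ws)
      if (bseli_b (wd, ws, imm) != or_v (wd, ws))
	++differences;
  return differences;
}
```

Under that assumption count_differences (255) is 0, while
count_differences (254) is nonzero, because bit 0 of the result is cleared
whenever bit 0 of the destination is set.  That is why 254 cannot be folded
into a plain OR.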

gcc/testsuite

* gcc.target/mips/msa-builtins.c: Use 254 instead of 255 for
bseli.b, as `bseli.b $w0,$w1,255` is the same as `or.v $w0,$w0,$w1`.
---
 gcc/testsuite/gcc.target/mips/msa-builtins.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/mips/msa-builtins.c 
b/gcc/testsuite/gcc.target/mips/msa-builtins.c
index a679f065f34..6a146b3e6ae 100644
--- a/gcc/testsuite/gcc.target/mips/msa-builtins.c
+++ b/gcc/testsuite/gcc.target/mips/msa-builtins.c
@@ -705,7 +705,7 @@
 #define BNEG(T) NOMIPS16 T FN (bneg, T ## _DF) (T i, T j) { return BUILTIN 
(bneg, T ## _DF) (i, j); }
 #define BNEGI(T) NOMIPS16 T FN (bnegi, T ## _DF) (T i) { return BUILTIN 
(bnegi, T ## _DF) (i, 0); }
 #define BSEL(T) NOMIPS16 T FN (bsel, v) (T i, T j, T k) { return BUILTIN 
(bsel, v) (i, j, k); }
-#define BSELI(T) NOMIPS16 T FN (bseli, T ## _DF) (T i, T j) { return BUILTIN 
(bseli, T ## _DF) (i, j, U8MAX); }
+#define BSELI(T) NOMIPS16 T FN (bseli, T ## _DF) (T i, T j) { return BUILTIN 
(bseli, T ## _DF) (i, j, U8MAX-1); }
 #define BSET(T) NOMIPS16 T FN (bset, T ## _DF) (T i, T j) { return BUILTIN 
(bset, T ## _DF) (i, j); }
 #define BSETI(T) NOMIPS16 T FN (bseti, T ## _DF) (T i) { return BUILTIN 
(bseti, T ## _DF) (i, 0); }
 #define NLOC(T) NOMIPS16 T FN (nloc, T ## _DF) (T i) { return BUILTIN (nloc, T 
## _DF) (i); }
-- 
2.39.2



Re: [PATCH] vect: Unify bbs in loop_vec_info and bb_vec_info

2024-05-28 Thread Feng Xue OS
The bbs of loop_vec_info have to be allocated via old-fashioned
XCNEWVEC so that they can receive the result of dfs_enumerate_from(),
so bb_vec_info has to align with loop_vec_info and use
basic_block * instead of vec.  Another reason is that
some loop vectorization code assumes that bbs is a pointer, for
example using LOOP_VINFO_BBS() to directly free the bbs area.

Encapsulating bbs in an array_slice might make the changed code
more wordy, so I still chose basic_block * as its type.  Updated the
patch by removing bbs_as_vector.
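
As a rough, self-contained sketch of that layout (not the GCC code itself):
the base class carries only a pointer and a count, one derived class fills
a heap array through a C-style enumerator, and the other points into a
vector it owns.  All names here (enumerate_into, info_base, loop_like_info,
region_like_info, sum_items) are hypothetical stand-ins, with int in place
of basic_block.

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>
#include <vector>

// Stand-in for a C-style enumerator that fills a caller-provided buffer
// and returns how many entries it wrote.
unsigned
enumerate_into (int *out, unsigned capacity)
{
  unsigned n = 0;
  for (; n < capacity; ++n)
    out[n] = static_cast<int> (n) * 10;
  return n;
}

// Shared base: code that only needs to walk the items uses these fields.
struct info_base
{
  int *items = nullptr;
  unsigned nitems = 0;
};

// Owns a heap array because the enumerator needs a raw buffer.
struct loop_like_info : info_base
{
  explicit loop_like_info (unsigned capacity)
  {
    items = static_cast<int *> (std::calloc (capacity, sizeof (int)));
    assert (items != nullptr);
    nitems = enumerate_into (items, capacity);
  }
  ~loop_like_info () { std::free (items); }
  loop_like_info (const loop_like_info &) = delete;
  loop_like_info &operator= (const loop_like_info &) = delete;
};

// Keeps its own vector and exposes its data through the base fields.
struct region_like_info : info_base
{
  explicit region_like_info (std::vector<int> v) : storage (std::move (v))
  {
    items = storage.data ();
    nitems = static_cast<unsigned> (storage.size ());
  }
  region_like_info (const region_like_info &) = delete;
  region_like_info &operator= (const region_like_info &) = delete;
  std::vector<int> storage;
};

// One loop serves both kinds of info.
int
sum_items (const info_base &info)
{
  int total = 0;
  for (unsigned i = 0; i < info.nitems; ++i)
    total += info.items[i];
  return total;
}
```

The point of the shape is the last function: callers such as a shared
precision-determination loop no longer need to know which kind of info
they were given.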

Feng.

gcc/
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Move
initialization of bbs to explicit construction code.  Adjust the
definition of nbbs.
(update_epilogue_loop_vinfo): Update nbbs for epilog vinfo.
* tree-vect-patterns.cc (vect_determine_precisions): Make
loop_vec_info and bb_vec_info share the same code.
(vect_pattern_recog): Remove duplicated vect_pattern_recog_1 loop.
* tree-vect-slp.cc (vect_get_and_check_slp_defs): Access to bbs[0]
via base vec_info class.
(_bb_vec_info::_bb_vec_info): Initialize bbs and nbbs using data
fields of input auto_vec<> bbs.
(vect_slp_region): Use access to nbbs to replace original
bbs.length().
(vect_schedule_slp_node): Access to bbs[0] via base vec_info class.
* tree-vectorizer.cc (vec_info::vec_info): Add initialization of
bbs and nbbs.
(vec_info::insert_seq_on_entry): Access to bbs[0] via base vec_info
class.
* tree-vectorizer.h (vec_info): Add new fields bbs and nbbs.
(LOOP_VINFO_NBBS): New macro.
(BB_VINFO_BBS): Rename BB_VINFO_BB to BB_VINFO_BBS.
(BB_VINFO_NBBS): New macro.
(_loop_vec_info): Remove field bbs.
(_bb_vec_info): Rename field bbs.
---
 gcc/tree-vect-loop.cc |   7 +-
 gcc/tree-vect-patterns.cc | 142 +++---
 gcc/tree-vect-slp.cc  |  23 +++---
 gcc/tree-vectorizer.cc|   7 +-
 gcc/tree-vectorizer.h |  19 +++--
 5 files changed, 70 insertions(+), 128 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 3b94bb13a8b..04a9ac64df7 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1028,7 +1028,6 @@ bb_in_loop_p (const_basic_block bb, const void *data)
 _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
   : vec_info (vec_info::loop, shared),
 loop (loop_in),
-bbs (XCNEWVEC (basic_block, loop->num_nodes)),
 num_itersm1 (NULL_TREE),
 num_iters (NULL_TREE),
 num_iters_unchanged (NULL_TREE),
@@ -1079,8 +1078,9 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
vec_info_shared *shared)
  case of the loop forms we allow, a dfs order of the BBs would the same
  as reversed postorder traversal, so we are safe.  */

-  unsigned int nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p,
- bbs, loop->num_nodes, loop);
+  bbs = XCNEWVEC (basic_block, loop->num_nodes);
+  nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p, bbs,
+loop->num_nodes, loop);
   gcc_assert (nbbs == loop->num_nodes);

   for (unsigned int i = 0; i < nbbs; i++)
@@ -11667,6 +11667,7 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree 
advance)

   free (LOOP_VINFO_BBS (epilogue_vinfo));
   LOOP_VINFO_BBS (epilogue_vinfo) = epilogue_bbs;
+  LOOP_VINFO_NBBS (epilogue_vinfo) = epilogue->num_nodes;

   /* Advance data_reference's with the number of iterations of the previous
  loop and its prologue.  */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 8929e5aa7f3..88e7e34d78d 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -6925,81 +6925,41 @@ vect_determine_stmt_precisions (vec_info *vinfo, 
stmt_vec_info stmt_info)
 void
 vect_determine_precisions (vec_info *vinfo)
 {
+  basic_block *bbs = vinfo->bbs;
+  unsigned int nbbs = vinfo->nbbs;
+
   DUMP_VECT_SCOPE ("vect_determine_precisions");

-  if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
+  for (unsigned int i = 0; i < nbbs; i++)
 {
-  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-  basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
-  unsigned int nbbs = loop->num_nodes;
-
-  for (unsigned int i = 0; i < nbbs; i++)
+  basic_block bb = bbs[i];
+  for (auto gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
{
- basic_block bb = bbs[i];
- for (auto gsi = gsi_start_phis (bb);
-  !gsi_end_p (gsi); gsi_next (&gsi))
-   {
- stmt_vec_info stmt_info = vinfo->lookup_stmt (gsi.phi ());
- if (stmt_info)
-   vect_determine_mask_precision (vinfo, stmt_info);
-   }
- for (auto si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
-   if (!is_gimple_debug (gsi_stmt (si)))
- vect_determine_mask_precision
- 

Re: [PATCH] attribs: Fix and refactor diag_attr_exclusions

2024-05-28 Thread Richard Sandiford
Andrew Carlotti  writes:
> The existing implementation of this function was convoluted, and had
> multiple control flow errors that became apparent to me while reading
> the code:
>
> 1. The initial early return only checked the properties of the first
> exclusion in the list, when these properties could be different for
> subsequent exclusions.
>
> 2. excl was not reset within the outer loop, so the inner loop body
> would only execute during the first iteration of the outer loop.  This
> effectively meant that the value of attrs[1] was ignored.
>
> 3. The function called itself recursively twice, with both last_decl and
> TREE_TYPE (last_decl) as parameters. The second recursive call should
> have been redundant, since attrs[1] = TREE_TYPE (last_decl) during the
> first recursive call.

Thanks for doing this.  Agree with the above.
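
Point 2 above is easy to reproduce in isolation.  The following sketch uses
a hypothetical exclusion struct and a placeholder list terminated by a null
name; it only shows the loop shape and is not the attribs.cc code.

```cpp
#include <cassert>

// Hypothetical stand-in for an exclusion entry; a null name ends the list.
struct exclusion
{
  const char *name;
};

// Placeholder list used only for this illustration.
const exclusion sample_exclusions[]
  = { { "noinline" }, { "always_inline" }, { "cold" }, { nullptr } };

// Buggy shape: the cursor is initialised once, outside the outer loop,
// so it is already at the terminator on the second outer iteration.
int
inner_iterations_shared_cursor (const exclusion *list, int outer_count)
{
  assert (list != nullptr);
  int iterations = 0;
  const exclusion *excl = list;
  for (int i = 0; i < outer_count; ++i)
    for (; excl->name; ++excl)
      ++iterations;
  return iterations;
}

// Fixed shape: the cursor is reset for every outer iteration.
int
inner_iterations_reset_cursor (const exclusion *list, int outer_count)
{
  assert (list != nullptr);
  int iterations = 0;
  for (int i = 0; i < outer_count; ++i)
    for (const exclusion *excl = list; excl->name; ++excl)
      ++iterations;
  return iterations;
}
```

With three entries and two outer iterations the first function runs the
inner body 3 times and the second 6 times, i.e. in the buggy shape the
second outer iteration never looks at any exclusion.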

> This patch eliminates the early return, and combines the checks with
> those present within the inner loop.  It also fixes the inner loop
> initialisation, and modifies the outer loop to iterate over nodes
> instead of their attributes. This latter change allows the recursion to
> be eliminated, by extending the new nodes array to include last_decl
> (and its type) as well.
>
> This patch provides an alternative fix for PR114634, although I wasn't
> aware of that issue until rebasing on top of Jakub's fix.
>
> I am not aware of any other compiler bugs resulting from these issues.
> However, if the exclusions for target_clones were listed in the opposite
> order, then it would have broken detection of the always_inline
> exclusion on aarch64 (where TARGET_HAS_FMV_TARGET_ATTRIBUTE is false).
>
> Is this ok for master?
>
> gcc/ChangeLog:
>
>   * attribs.cc (diag_attr_exclusions): Fix and refactor.
>
>
> diff --git a/gcc/attribs.cc b/gcc/attribs.cc
> index 
> 3ab0b0fd87a4404a593b2de365ea5226e31fe24a..431dd4255e68e92dd8d10bbb21ea079e50811faa
>  100644
> --- a/gcc/attribs.cc
> +++ b/gcc/attribs.cc
> @@ -433,84 +433,69 @@ get_attribute_namespace (const_tree attr)
> or a TYPE.  */
>  
>  static bool
> -diag_attr_exclusions (tree last_decl, tree node, tree attrname,
> +diag_attr_exclusions (tree last_decl, tree base_node, tree attrname,
> const attribute_spec *spec)
>  {
> -  const attribute_spec::exclusions *excl = spec->exclude;
>  
> -  tree_code code = TREE_CODE (node);
> +  /* BASE_NODE is either the current decl to which the attribute is being
> + applied, or its type.  For the former, consider the attributes on both 
> the
> + decl and its type.  Check both LAST_DECL and its type as well.  */
>  
> -  if ((code == FUNCTION_DECL && !excl->function
> -   && (!excl->type || !spec->affects_type_identity))
> -  || (code == VAR_DECL && !excl->variable
> -   && (!excl->type || !spec->affects_type_identity))
> -  || (((code == TYPE_DECL || RECORD_OR_UNION_TYPE_P (node)) && 
> !excl->type)))
> -return false;
> +  tree nodes[4] = { NULL_TREE, NULL_TREE, NULL_TREE, NULL_TREE };
>  
> -  /* True if an attribute that's mutually exclusive with ATTRNAME
> - has been found.  */
> -  bool found = false;
> +  nodes[0] = base_node;
> +  if (DECL_P (base_node))
> +  nodes[1] = (TREE_TYPE (base_node));

Nit: too much indentation.

> -  if (last_decl && last_decl != node && TREE_TYPE (last_decl) != node)
> +  if (last_decl)
>  {
> -  /* Check both the last DECL and its type for conflicts with
> -  the attribute being added to the current decl or type.  */
> -  found |= diag_attr_exclusions (last_decl, last_decl, attrname, spec);
> -  tree decl_type = TREE_TYPE (last_decl);
> -  found |= diag_attr_exclusions (last_decl, decl_type, attrname, spec);
> +  nodes[2] = last_decl;
> +  if (DECL_P (last_decl))
> +   nodes[3] = TREE_TYPE (last_decl);
>  }
>  
> -  /* NODE is either the current DECL to which the attribute is being
> - applied or its TYPE.  For the former, consider the attributes on
> - both the DECL and its type.  */
> -  tree attrs[2];
> -
> -  if (DECL_P (node))
> -{
> -  attrs[0] = DECL_ATTRIBUTES (node);
> -  if (TREE_TYPE (node))
> - attrs[1] = TYPE_ATTRIBUTES (TREE_TYPE (node));
> -  else
> - /* TREE_TYPE can be NULL e.g. while processing attributes on
> -enumerators.  */
> - attrs[1] = NULL_TREE;
> -}
> -  else
> -{
> -  attrs[0] = TYPE_ATTRIBUTES (node);
> -  attrs[1] = NULL_TREE;
> -}
> +  /* True if an attribute that's mutually exclusive with ATTRNAME
> + has been found.  */
> +  bool found = false;
>  
>/* Iterate over the mutually exclusive attribute names and verify
>   that the symbol doesn't contain it.  */
> -  for (unsigned i = 0; i != ARRAY_SIZE (attrs); ++i)
> +  for (unsigned i = 0; i != ARRAY_SIZE (nodes); ++i)
>  {
> -  if (!attrs[i])
> +  tree node = nodes[i];
> +
> +  if (!node)
>   continue;
>  
> -  for ( ; excl->name; ++excl)
> +  tree attr;
> +  if DECL_P 

[pushed] Fix bootstrap on AIX by adding c-family/c-type-mismatch.cc [PR115167]

2024-05-28 Thread David Malcolm
PR bootstrap/115167 reports a bootstrap failure on AIX triggered by
r15-636-g770657d02c986c whilst building f951 in stage 2, due to
the linker not being able to find symbols for:

  vtable for range_label_for_type_mismatch
  range_label_for_type_mismatch::get_text(unsigned int) const

The only users of the class range_label_for_type_mismatch are in the
C/C++ frontends, each of which supplies its own implementation of:

  range_label_for_type_mismatch::get_text(unsigned int) const

i.e. we had a cluster of symbols that was disconnected from any
users on f951.

The above patch added a new range_label::get_effects vfunc to the
base class.  My hunch is that we were getting away with not defining
the symbol for Fortran with AIX's linker before (since none of the
users are used), but adding the get_effects vfunc has somehow broken
things (possibly because there's an empty implementation in the base
class in the *header*).

The following patch moves all of the code in
gcc/gcc-rich-location.{cc,h,o} defining and using
range_label_for_type_mismatch to a new
gcc/c-family/c-type-mismatch.{cc,h,o}, to help the linker ignore this
cluster of symbols when it's disconnected from users.

I was able to reproduce the failure without the patch, and then
successfully bootstrap with this patch on powerpc-ibm-aix7.3.1.0
(cfarm119).

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-865-gb544ff88560e10.

gcc/ChangeLog:
PR bootstrap/115167
* Makefile.in (C_COMMON_OBJS): Add c-family/c-type-mismatch.o.
* gcc-rich-location.cc
(maybe_range_label_for_tree_type_mismatch::get_text): Move to
c-family/c-type-mismatch.cc.
(binary_op_rich_location::binary_op_rich_location): Likewise.
(binary_op_rich_location::use_operator_loc_p): Likewise.
* gcc-rich-location.h (class range_label_for_type_mismatch):
Likewise.
(class maybe_range_label_for_tree_type_mismatch): Likewise.
(class op_location_t): Likewise for forward decl.
(class binary_op_rich_location): Likewise.

gcc/c-family/ChangeLog:
PR bootstrap/115167
* c-format.cc: Replace include of "gcc-rich-location.h" with
"c-family/c-type-mismatch.h".
* c-type-mismatch.cc: New file, taking material from
gcc-rich-location.cc.
* c-type-mismatch.h: New file, taking material from
gcc-rich-location.h.
* c-warn.cc: Replace include of "gcc-rich-location.h" with
"c-family/c-type-mismatch.h".

gcc/c/ChangeLog:
PR bootstrap/115167
* c-objc-common.cc: Replace include of "gcc-rich-location.h" with
"c-family/c-type-mismatch.h".
* c-typeck.cc: Likewise.

gcc/cp/ChangeLog:
PR bootstrap/115167
* call.cc: Replace include of "gcc-rich-location.h" with
"c-family/c-type-mismatch.h".
* error.cc: Likewise.
* typeck.cc: Likewise.

Signed-off-by: David Malcolm 
---
 gcc/Makefile.in |   3 +-
 gcc/c-family/c-format.cc|   2 +-
 gcc/c-family/c-type-mismatch.cc | 127 
 gcc/c-family/c-type-mismatch.h  | 126 +++
 gcc/c-family/c-warn.cc  |   2 +-
 gcc/c/c-objc-common.cc  |   2 +-
 gcc/c/c-typeck.cc   |   2 +-
 gcc/cp/call.cc  |   2 +-
 gcc/cp/error.cc |   2 +-
 gcc/cp/typeck.cc|   2 +-
 gcc/gcc-rich-location.cc|  89 --
 gcc/gcc-rich-location.h | 101 -
 12 files changed, 262 insertions(+), 198 deletions(-)
 create mode 100644 gcc/c-family/c-type-mismatch.cc
 create mode 100644 gcc/c-family/c-type-mismatch.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a7f15694c34b..66d42cc41f84 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1301,7 +1301,8 @@ C_COMMON_OBJS = c-family/c-common.o 
c-family/c-cppbuiltin.o c-family/c-dump.o \
   c-family/c-ppoutput.o c-family/c-pragma.o c-family/c-pretty-print.o \
   c-family/c-semantics.o c-family/c-ada-spec.o \
   c-family/c-ubsan.o c-family/known-headers.o \
-  c-family/c-attribs.o c-family/c-warn.o c-family/c-spellcheck.o
+  c-family/c-attribs.o c-family/c-warn.o c-family/c-spellcheck.o \
+  c-family/c-type-mismatch.o
 
 # Analyzer object files
 ANALYZER_OBJS = \
diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index 9c4deabc1095..7a5ffc25602c 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -35,7 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "selftest-diagnostic.h"
 #include "builtins.h"
 #include "attribs.h"
-#include "gcc-rich-location.h"
+#include "c-family/c-type-mismatch.h"
 
 /* Handle attributes associated with format checking.  */
 
diff --git a/gcc/c-family/c-type-mismatch.cc b/gcc/c-family/c-type-mismatch.cc
new file mode 100644
index ..fae31261d544
--- /dev/null
+++ b/gcc/c-family/c

[pushed] diagnostics: disable localization of events in selftest paths [PR115203]

2024-05-28 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-866-g2dbb1c124c1e58.

gcc/ChangeLog:
PR analyzer/115203
* diagnostic-path.h
(simple_diagnostic_path::disable_event_localization): New.
(simple_diagnostic_path::m_localize_events): New field.
* diagnostic.cc
(simple_diagnostic_path::simple_diagnostic_path): Initialize
m_localize_events.
(simple_diagnostic_path::add_event): Only localize fmt if
m_localize_events is true.
* tree-diagnostic-path.cc
(test_diagnostic_path::test_diagnostic_path): Call
disable_event_localization.

Signed-off-by: David Malcolm 
---
 gcc/diagnostic-path.h   | 3 +++
 gcc/diagnostic.cc   | 8 +---
 gcc/tree-diagnostic-path.cc | 3 ++-
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/gcc/diagnostic-path.h b/gcc/diagnostic-path.h
index 982d68b872ea..938bd583a3da 100644
--- a/gcc/diagnostic-path.h
+++ b/gcc/diagnostic-path.h
@@ -293,12 +293,15 @@ class simple_diagnostic_path : public diagnostic_path
 
   void connect_to_next_event ();
 
+  void disable_event_localization () { m_localize_events = false; }
+
  private:
   auto_delete_vec m_threads;
   auto_delete_vec m_events;
 
   /* (for use by add_event).  */
   pretty_printer *m_event_pp;
+  bool m_localize_events;
 };
 
 extern void debug (diagnostic_path *path);
diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc
index 1f30d1d7cdac..f27b2f1a492c 100644
--- a/gcc/diagnostic.cc
+++ b/gcc/diagnostic.cc
@@ -2517,7 +2517,8 @@ set_text_art_charset (enum diagnostic_text_art_charset 
charset)
 /* class simple_diagnostic_path : public diagnostic_path.  */
 
 simple_diagnostic_path::simple_diagnostic_path (pretty_printer *event_pp)
-  : m_event_pp (event_pp)
+: m_event_pp (event_pp),
+  m_localize_events (true)
 {
   add_thread ("main");
 }
@@ -2563,7 +2564,7 @@ simple_diagnostic_path::add_thread (const char *name)
stack depth DEPTH.
 
Use m_context's printer to format FMT, as the text of the new
-   event.
+   event.  Localize FMT iff m_localize_events is set.
 
Return the id of the new event.  */
 
@@ -2580,7 +2581,8 @@ simple_diagnostic_path::add_event (location_t loc, tree 
fndecl, int depth,
 
   va_start (ap, fmt);
 
-  text_info ti (_(fmt), &ap, 0, nullptr, &rich_loc);
+  text_info ti (m_localize_events ? _(fmt) : fmt,
+   &ap, 0, nullptr, &rich_loc);
   pp_format (pp, &ti);
   pp_output_formatted_text (pp);
 
diff --git a/gcc/tree-diagnostic-path.cc b/gcc/tree-diagnostic-path.cc
index 743a8c2a1d29..0ad6c5beb81c 100644
--- a/gcc/tree-diagnostic-path.cc
+++ b/gcc/tree-diagnostic-path.cc
@@ -1016,7 +1016,7 @@ path_events_have_column_data_p (const diagnostic_path 
&path)
 }
 
 /* A subclass of simple_diagnostic_path that adds member functions
-   for adding test events.  */
+   for adding test events and suppresses translation of these events.  */
 
 class test_diagnostic_path : public simple_diagnostic_path
 {
@@ -1024,6 +1024,7 @@ class test_diagnostic_path : public simple_diagnostic_path
   test_diagnostic_path (pretty_printer *event_pp)
   : simple_diagnostic_path (event_pp)
   {
+disable_event_localization ();
   }
 
   void add_entry (tree fndecl, int stack_depth)
-- 
2.26.3



PING: Re: [PATCH] selftest: invoke "diff" when ASSERT_STREQ fails

2024-05-28 Thread David Malcolm
Ping.

This patch has actually been *very* helpful to me when debugging
selftest failures involving ASSERT_STREQ.

Thanks
Dave

On Fri, 2024-05-17 at 15:51 -0400, David Malcolm wrote:
> Currently when ASSERT_STREQ or ASSERT_STREQ_AT fail we print
> both strings to stderr.  However it can be hard to figure out
> the problem (e.g. for 1-character differences in long strings).
> 
> Extend the output by writing out the strings to tempfiles and
> invoking "diff -up" on them when we have such a selftest failure,
> to (I hope) simplify debugging.
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> 
> OK for trunk?
> 
> gcc/ChangeLog:
> * selftest.cc (selftest::print_diff): New function.
> (selftest::assert_streq): Call it when we have non-equal
> non-null strings.
> 
> Signed-off-by: David Malcolm 
> ---
>  gcc/selftest.cc | 28 ++--
>  1 file changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/selftest.cc b/gcc/selftest.cc
> index 6438d86a6aa0..f58c0631908e 100644
> --- a/gcc/selftest.cc
> +++ b/gcc/selftest.cc
> @@ -63,6 +63,26 @@ fail_formatted (const location &loc, const char
> *fmt, ...)
>    abort ();
>  }
>  
> +/* Invoke "diff" to print the difference between VAL1 and VAL2
> +   on stdout.  */
> +
> +static void
> +print_diff (const location &loc, const char *val1, const char *val2)
> +{
> +  temp_source_file tmpfile1 (loc, ".txt", val1);
> +  temp_source_file tmpfile2 (loc, ".txt", val2);
> +  const char *args[] = {"diff",
> +   "-up",
> +   tmpfile1.get_filename (),
> +   tmpfile2.get_filename (),
> +   NULL};
> +  int exit_status = 0;
> +  int err = 0;
> +  pex_one (PEX_SEARCH | PEX_LAST,
> +  args[0], CONST_CAST (char **, args),
> +  NULL, NULL, NULL, &exit_status, &err);
> +}
> +
>  /* Implementation detail of ASSERT_STREQ.
>     Compare val1 and val2 with strcmp.  They ought
>     to be non-NULL; fail gracefully if either or both are NULL.  */
> @@ -89,8 +109,12 @@ assert_streq (const location &loc,
> if (strcmp (val1, val2) == 0)
>   pass (loc, "ASSERT_STREQ");
> else
> - fail_formatted (loc, "ASSERT_STREQ (%s, %s)\n val1=\"%s\"\n
> val2=\"%s\"\n",
> - desc_val1, desc_val2, val1, val2);
> + {
> +   print_diff (loc, val1, val2);
> +   fail_formatted
> + (loc, "ASSERT_STREQ (%s, %s)\n val1=\"%s\"\n
> val2=\"%s\"\n",
> +  desc_val1, desc_val2, val1, val2);
> + }
>    }
>  }
>  
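
For anyone who wants to try the idea outside of GCC, the approach described in the patch (write both strings to temporary files, then run "diff -up" on them) can be sketched in standalone C++ roughly as follows.  This is only an illustration under stated assumptions: it uses POSIX mkstemp and system() rather than GCC's temp_source_file and libiberty's pex_one, and the helper names write_temp_file and diff_strings are hypothetical.

```cpp
// Standalone sketch, not GCC's selftest code.  write_temp_file and
// diff_strings are hypothetical helpers; POSIX is assumed.
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <sys/wait.h>
#include <unistd.h>

// Write TEXT to a fresh temporary file and return its path
// (an empty string on failure).
static std::string
write_temp_file (const std::string &text)
{
  // fputs stops at the first NUL, so require that there is none.
  assert (text.find ('\0') == std::string::npos);

  char path[] = "/tmp/streq-XXXXXX";
  int fd = mkstemp (path);
  if (fd < 0)
    return "";
  FILE *f = fdopen (fd, "w");
  if (!f)
    {
      close (fd);
      unlink (path);
      return "";
    }
  fputs (text.c_str (), f);
  fclose (f);
  return path;
}

// Run "diff -up" on the two strings, letting diff print to stdout.
// Return diff's exit status (0: identical, 1: different), or -1 if
// the files could not be written or diff could not be run.
static int
diff_strings (const std::string &val1, const std::string &val2)
{
  std::string file1 = write_temp_file (val1);
  std::string file2 = write_temp_file (val2);
  int result = -1;
  if (!file1.empty () && !file2.empty ())
    {
      // The paths come from the fixed template above, so they need
      // no shell quoting.
      std::string cmd = "diff -up " + file1 + " " + file2;
      int status = system (cmd.c_str ());
      if (status != -1 && WIFEXITED (status))
	result = WEXITSTATUS (status);
    }
  if (!file1.empty ())
    unlink (file1.c_str ());
  if (!file2.empty ())
    unlink (file2.c_str ());
  return result;
}
```

Calling diff_strings ("abc\n", "abd\n") would print a unified diff of the two one-line files if a "diff" program is on PATH.  The patch itself ignores the exit status, since it only wants the diff output as a debugging aid before failing the assertion.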



[Patch] testsuite/*/gomp: Remove 'dg-prune-output "not supported yet"'

2024-05-28 Thread Tobias Burnus
Improve test coverage by removing 'dg-prune-output', given that the features 
have been implemented in the meantime.


Comments, suggestions? Otherwise I will commit the patch as obvious.

Tobias
testsuite/*/gomp: Remove 'dg-prune-output "not supported yet"'

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/lastprivate-conditional-1.c: Remove
	'{ dg-prune-output "not supported yet" }'.
	* c-c++-common/gomp/requires-1.c: Likewise.
	* c-c++-common/gomp/requires-2.c: Likewise.
	* c-c++-common/gomp/reverse-offload-1.c: Likewise.
	* g++.dg/gomp/requires-1.C: Likewise.
	* gfortran.dg/gomp/requires-1.f90: Likewise.
	* gfortran.dg/gomp/requires-2.f90: Likewise.
	* gfortran.dg/gomp/requires-4.f90: Likewise.
	* gfortran.dg/gomp/requires-5.f90: Likewise.
	* gfortran.dg/gomp/requires-6.f90: Likewise.
	* gfortran.dg/gomp/requires-7.f90: Likewise.

 gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c | 2 --
 gcc/testsuite/c-c++-common/gomp/requires-1.c| 2 --
 gcc/testsuite/c-c++-common/gomp/requires-2.c| 2 --
 gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c | 2 --
 gcc/testsuite/g++.dg/gomp/requires-1.C  | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-1.f90   | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-2.f90   | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-4.f90   | 1 -
 gcc/testsuite/gfortran.dg/gomp/requires-5.f90   | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-6.f90   | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-7.f90   | 1 -
 11 files changed, 20 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c b/gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c
index 722aba79a52..d4ef49690e8 100644
--- a/gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c
@@ -63,2 +62,0 @@ bar (int *p)
-
-/* { dg-prune-output "not supported yet" } */
diff --git a/gcc/testsuite/c-c++-common/gomp/requires-1.c b/gcc/testsuite/c-c++-common/gomp/requires-1.c
index e1f2e3a503f..a47ec659566 100644
--- a/gcc/testsuite/c-c++-common/gomp/requires-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/requires-1.c
@@ -13,2 +12,0 @@ foo ()
-
-/* { dg-prune-output "not supported yet" } */
diff --git a/gcc/testsuite/c-c++-common/gomp/requires-2.c b/gcc/testsuite/c-c++-common/gomp/requires-2.c
index 717b65caeea..d7430b1b1a4 100644
--- a/gcc/testsuite/c-c++-common/gomp/requires-2.c
+++ b/gcc/testsuite/c-c++-common/gomp/requires-2.c
@@ -9,2 +8,0 @@
-
-/* { dg-prune-output "not supported yet" } */
diff --git a/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c b/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c
index 9a3fa5230f8..ddc3c2c6be1 100644
--- a/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c
@@ -9,2 +8,0 @@
-/* { dg-prune-output "'reverse_offload' clause on 'requires' directive not supported yet" } */
-
diff --git a/gcc/testsuite/g++.dg/gomp/requires-1.C b/gcc/testsuite/g++.dg/gomp/requires-1.C
index aefeb288dad..5ca5e006da1 100644
--- a/gcc/testsuite/g++.dg/gomp/requires-1.C
+++ b/gcc/testsuite/g++.dg/gomp/requires-1.C
@@ -11,2 +10,0 @@ namespace M {
-
-/* { dg-prune-output "not supported yet" } */
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-1.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-1.f90
index b115a654e71..19007834c45 100644
--- a/gcc/testsuite/gfortran.dg/gomp/requires-1.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/requires-1.f90
@@ -12,2 +11,0 @@ end
-
-! { dg-prune-output "not yet supported" }
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-2.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-2.f90
index 5f11a7bfb2a..f144d391034 100644
--- a/gcc/testsuite/gfortran.dg/gomp/requires-2.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/requires-2.f90
@@ -13,2 +12,0 @@ end
-
-! { dg-prune-output "not yet supported" }
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-4.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-4.f90
index c870a2840d3..9d936197f8f 100644
--- a/gcc/testsuite/gfortran.dg/gomp/requires-4.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/requires-4.f90
@@ -36 +35,0 @@ end
-! { dg-prune-output "not yet supported" }
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-5.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-5.f90
index e719e929294..87be933ba49 100644
--- a/gcc/testsuite/gfortran.dg/gomp/requires-5.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/requires-5.f90
@@ -15,2 +14,0 @@ end
-
-! { dg-prune-output "not yet supported" }
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-6.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-6.f90
index cabd3d94a90..b20c218dd6b 100644
--- a/gcc/testsuite/gfortran.dg/gomp/requires-6.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/requires-6.f90
@@ -15,2 +14,0 @@ end
-
-! { dg-prune-output "not yet supported" }
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-7.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-7

Re: [Patch] testsuite/*/gomp: Remove 'dg-prune-output "not supported yet"'

2024-05-28 Thread Jakub Jelinek
On Tue, May 28, 2024 at 07:43:00PM +0200, Tobias Burnus wrote:
> Improve test coverage by removing 'prune-output' given that the features are
> implemented in the meanwhile.
> 
> Comments, suggestions? Otherwise I will commit the patch as obvious.
> 
> Tobias

> testsuite/*/gomp: Remove 'dg-prune-output "not supported yet"'
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/gomp/lastprivate-conditional-1.c: Remove
>   '{ dg-prune-output "not supported yet" }'.
>   * c-c++-common/gomp/requires-1.c: Likewise.
>   * c-c++-common/gomp/requires-2.c: Likewise.
>   * c-c++-common/gomp/reverse-offload-1.c: Likewise.
>   * g++.dg/gomp/requires-1.C: Likewise.
>   * gfortran.dg/gomp/requires-1.f90: Likewise.
>   * gfortran.dg/gomp/requires-2.f90: Likewise.
>   * gfortran.dg/gomp/requires-4.f90: Likewise.
>   * gfortran.dg/gomp/requires-5.f90: Likewise.
>   * gfortran.dg/gomp/requires-6.f90: Likewise.
>   * gfortran.dg/gomp/requires-7.f90: Likewise.

LGTM.

Jakub



[PATCH] MIPS16: Mark $2/$3 as clobbered if GP is used

2024-05-28 Thread YunQiang Su
PR target/84790.
The gp init sequence
li      $2,%hi(_gp_disp)
addiu   $3,$pc,%lo(_gp_disp)
sll     $2,16
addu    $2,$3
is generated directly in `mips_output_function_prologue`, and does
not appear in the RTL.

As a result, the IRA/IPA passes are not aware that $2/$3 have been
clobbered, so they may use these registers across (local) function calls.

Let's mark $2/$3 as clobbered both:
  - Just after the UNSPEC_GP RTL of a function;
  - Just after a function call.

Reported-by: Matthias Schiffer 
Origin-Patch-by: Felix Fietkau .

gcc
* config/mips/mips.cc (mips16_gp_pseudo_reg): Mark
MIPS16_PIC_TEMP and MIPS_PROLOGUE_TEMP clobbered.
(mips_emit_call_insn): Mark MIPS16_PIC_TEMP and
MIPS_PROLOGUE_TEMP clobbered if MIPS16 and CALL_CLOBBERED_GP.
---
 gcc/config/mips/mips.cc | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index b63d40a357b..b478cddc8ad 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -3233,6 +3233,9 @@ mips_emit_call_insn (rtx pattern, rtx orig_addr, rtx 
addr, bool lazy_p)
 {
   rtx post_call_tmp_reg = gen_rtx_REG (word_mode, POST_CALL_TMP_REG);
   clobber_reg (&CALL_INSN_FUNCTION_USAGE (insn), post_call_tmp_reg);
+  clobber_reg (&CALL_INSN_FUNCTION_USAGE (insn), MIPS16_PIC_TEMP);
+  clobber_reg (&CALL_INSN_FUNCTION_USAGE (insn),
+   MIPS_PROLOGUE_TEMP (word_mode));
 }
 
   return insn;
@@ -3329,7 +3332,13 @@ mips16_gp_pseudo_reg (void)
   rtx set = gen_load_const_gp (cfun->machine->mips16_gp_pseudo_rtx);
   rtx_insn *insn = emit_insn_after (set, scan);
   INSN_LOCATION (insn) = 0;
-
+  /* NewABI support hasn't been implement.  NewABI should generate RTL
+sequence instead of ASM sequence directly.  */
+  if (mips_current_loadgp_style () == LOADGP_OLDABI)
+   {
+ emit_clobber (MIPS16_PIC_TEMP);
+ emit_clobber (MIPS_PROLOGUE_TEMP (Pmode));
+   }
   pop_topmost_sequence ();
 }
 
-- 
2.39.2



Re: [PATCH v2 1/2] driver: Use -as/ld/objcopy as final fallback instead of native ones for cross

2024-05-28 Thread YunQiang Su
On Wed, May 22, 2024 at 17:54, YunQiang Su wrote:
>
> If `find_a_program` cannot find `as/ld/objcopy` and we are a cross toolchain,
> the final fallback is `as/ld` of system.  In fact, we can have a try with
> -as/ld/objcopy before fallback to native as/ld/objcopy.
>
> This patch is derivatived from Debian's patch:
>   gcc-search-prefixed-as-ld.diff
>
> gcc
> * gcc.cc(execute): Looks for -as/ld/objcopy before fallback
> to native as/ld/objcopy.

ping. OK for the trunk?
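
To make the intended lookup order easier to review, here is a minimal standalone sketch of the decision the patch adds.  It is a model only: choose_tool and the two callbacks are hypothetical names, standing in for find_a_program and for the "run <prefixed-name> --version via pex_one" probe in the actual patch, and the target name is passed in as a plain string.

```cpp
// Standalone model of the fallback order; not the driver's real code.
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for find_a_program: returns the full path of
// PROG in GCC's own search directories, or "" if it is not there.
using lookup_fn = std::function<std::string (const std::string &)>;
// Hypothetical stand-in for the "--version" probe: returns true if
// NAME can be found on PATH and runs successfully.
using probe_fn = std::function<bool (const std::string &)>;

static bool
is_binutils_tool (const std::string &prog)
{
  return prog == "as" || prog == "ld" || prog == "objcopy";
}

// Pick the name the driver should execute for PROG.
static std::string
choose_tool (const std::string &prog, const std::string &target,
	     bool is_cross, const lookup_fn &find_in_gcc_dirs,
	     const probe_fn &runs_from_path)
{
  assert (!prog.empty ());

  // 1. Prefer whatever GCC's own search directories provide.
  std::string found = find_in_gcc_dirs (prog);
  if (!found.empty ())
    return found;

  // 2. For a cross toolchain, try the target-prefixed tool next.
  if (is_cross && is_binutils_tool (prog))
    {
      std::string prefixed = target + "-" + prog;
      if (runs_from_path (prefixed))
	return prefixed;
    }

  // 3. Final fallback: the unprefixed (native) name.
  return prog;
}
```

With a lookup that finds nothing and a probe that only accepts "mips-linux-gnu-as", choose_tool ("as", "mips-linux-gnu", true, ...) picks the prefixed assembler, while a native build or a program other than as/ld/objcopy still falls through to the bare name.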

> ---
>  gcc/gcc.cc | 20 
>  1 file changed, 20 insertions(+)
>
> diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> index 830a4700a87..3dc6348d761 100644
> --- a/gcc/gcc.cc
> +++ b/gcc/gcc.cc
> @@ -3293,6 +3293,26 @@ execute (void)
>string = find_a_program(commands[0].prog);
>if (string)
> commands[0].argv[0] = string;
> +  else if (*cross_compile != '0'
> +   && !strcmp (commands[0].argv[0], commands[0].prog)
> +   && (!strcmp (commands[0].prog, "as")
> +   || !strcmp (commands[0].prog, "ld")
> +   || !strcmp (commands[0].prog, "objcopy")))
> +   {
> + string = concat (DEFAULT_REAL_TARGET_MACHINE, "-",
> +   commands[0].prog, NULL);
> + const char *string_args[] = {string, "--version", NULL};
> + int exit_status = 0;
> + int err = 0;
> + const char *errmsg = pex_one (PEX_SEARCH, string,
> + CONST_CAST (char **, string_args), string,
> + NULL, NULL, &exit_status, &err);
> + if (errmsg == NULL && exit_status == 0 && err == 0)
> +   {
> + commands[0].argv[0] = string;
> + commands[0].prog = string;
> +   }
> +   }
>  }
>
>for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)
> --
> 2.39.2
>


[COMMITTED] Strlen pass should set current range query.

2024-05-28 Thread Andrew MacLeod

Thanks.

Committed with the change to the testcase.

Bootstraps on x86_64-pc-linux-gnu with no regressions.

Andrew



On 5/28/24 02:49, Richard Biener wrote:

On Tue, May 28, 2024 at 1:24 AM Andrew MacLeod  wrote:

The strlen pass currently has a local ranger instance, but when it
invokes SCEV or any other shared component, SCEV will not be able to
access this ranger as it uses get_range_query().  They will be stuck
with global ranges.

Enabling/disabling ranger should be used instead of a local instance;
this allows other components to use the current range_query.

Bootstraps on x86_64-pc-linux-gnu, but there is one regression. The
regression is from gcc.dg/Wstringop-overflow-10.c.  The function in
question:

void
baz (char *a)
{
char b[16] = "abcdefg";
__builtin_strncpy (a, b, __builtin_strnlen (b, 7));/* { dg-bogus
"specified bound depends on the length of the source argument" } */
}

when compiled with  -O2 -Wstringop-overflow -Wstringop-truncation

it now spits out:

b2.c: In function ‘baz’:
b2.c:24:3: warning: ‘__builtin_strncpy’ output 2 truncated before
terminating nul copying  bytes from a string of the same length
[-Wstringop-truncation]
 24 |   __builtin_strncpy (a, b, __builtin_strnlen (b, 7));   /* {
dg-bogus "specified bound depends on the length of the source argument" } */

It seems like maybe something got smarter by setting the current range
query and this is a legitimate warning for this line of code?  There
will indeed not be a NUL copied as there are 7 characters in the string...

Is this a testcase issue where this warning should have been issued
before, or am I misunderstanding the warning?

I think the warning makes sense in this case.  But I'm not sure why the
dg-bogus is there, that looks like a valid complaint as well?!

I think the patch is OK.

Richard.


Andrew

PS I'm afraid of adjusting the status quo in this pass... :-P  Not
allowing SCEV to access the current ranger is causing me other issues
with the fix for 115221.  This *should* have been a harmless change,
sigh. :-(  The whole mechanism should just use the current range-query
instead of passing a ranger pointer around. But that's a much bigger
issue.  One thing at a time.
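
The difference between a pass-local query object and the "current" one can be shown with a small standalone model.  This is a sketch only: query_model, active_query, scoped_query and the two pass functions are hypothetical names and do not reproduce GCC's range_query API, and the RAII wrapper is a convenience of the sketch (the patch calls enable_ranger explicitly).

```cpp
// Toy model of "local instance" versus "current query"; not GCC code.
#include <cassert>

// Stand-in for a range query; PRECISION is 0 for the global fallback
// and 1 for a context-sensitive one.
struct query_model
{
  int precision;
};

static query_model global_query = { 0 };
static query_model *active_query = nullptr;

// What a shared component would call: the active query if one is
// installed, else the global fallback.
static query_model *
get_active_query ()
{
  return active_query ? active_query : &global_query;
}

// Install a better query for the lifetime of this object.
class scoped_query
{
public:
  scoped_query () : m_query { 1 }, m_saved (active_query)
  {
    active_query = &m_query;
  }
  ~scoped_query ()
  {
    // Scopes are expected to be properly nested.
    assert (active_query == &m_query);
    active_query = m_saved;
  }
  scoped_query (const scoped_query &) = delete;
  scoped_query &operator= (const scoped_query &) = delete;

private:
  query_model m_query;
  query_model *m_saved;
};

// A shared component (SCEV in the discussion above) that only ever
// sees whatever get_active_query returns.
static int
shared_component_precision ()
{
  return get_active_query ()->precision;
}

// The old shape: the pass owns a private query that the shared
// component never learns about.
static int
pass_with_private_query ()
{
  query_model local = { 1 };
  (void) local;
  return shared_component_precision ();
}

// The new shape: the pass installs its query as the active one.
static int
pass_with_enabled_query ()
{
  scoped_query enabled;
  return shared_component_precision ();
}
```

In this model pass_with_private_query sees only the global fallback, while pass_with_enabled_query lets the shared component benefit from the better query, which would be consistent with the extra warning showing up in the testcase.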


From c43236cb59e11cadda2654edc117d9270dff75c6 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Mon, 27 May 2024 13:20:13 -0400
Subject: [PATCH 1/5] Strlen pass should set current range query.

The strlen pass currently has a local ranger instance, but when it
invokes SCEV, SCEV will not be able to access this ranger.

Enable/disable ranger should be used, allowing other components to use
the current range_query.

	gcc/
	* tree-ssa-strlen.cc (strlen_pass::strlen_pass): Add function
	pointer and initialize ptr_qry with current range_query.
	(strlen_pass::m_ranger): Remove.
	(printf_strlen_execute): Enable and disable ranger.
	gcc/testsuite/
	* gcc.dg/Wstringop-overflow-10.c: Add truncating warning.
---
 gcc/testsuite/gcc.dg/Wstringop-overflow-10.c |  2 +-
 gcc/tree-ssa-strlen.cc   | 10 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow-10.c b/gcc/testsuite/gcc.dg/Wstringop-overflow-10.c
index bace08ad5d3..ddc27fc0580 100644
--- a/gcc/testsuite/gcc.dg/Wstringop-overflow-10.c
+++ b/gcc/testsuite/gcc.dg/Wstringop-overflow-10.c
@@ -21,7 +21,7 @@ void
 baz (char *a)
 {
   char b[16] = "abcdefg";
-  __builtin_strncpy (a, b, __builtin_strnlen (b, 7));	/* { dg-bogus "specified bound depends on the length of the source argument" } */
+  __builtin_strncpy (a, b, __builtin_strnlen (b, 7));	/* { dg-warning "output truncated before terminating nul" } */
 }
 
 void fill (char *);
diff --git a/gcc/tree-ssa-strlen.cc b/gcc/tree-ssa-strlen.cc
index 7596dd80942..c43a2da2836 100644
--- a/gcc/tree-ssa-strlen.cc
+++ b/gcc/tree-ssa-strlen.cc
@@ -235,9 +235,9 @@ get_range (tree val, gimple *stmt, wide_int minmax[2],
 class strlen_pass : public dom_walker
 {
 public:
-  strlen_pass (cdi_direction direction)
+  strlen_pass (function *fun, cdi_direction direction)
 : dom_walker (direction),
-  ptr_qry (&m_ranger),
+  ptr_qry (get_range_query (fun)),
   m_cleanup_cfg (false)
   {
   }
@@ -299,8 +299,6 @@ public:
 			unsigned HOST_WIDE_INT lenrng[2],
 			unsigned HOST_WIDE_INT *size, bool *nulterm);
 
-  gimple_ranger m_ranger;
-
   /* A pointer_query object to store information about pointers and
  their targets in.  */
   pointer_query ptr_qry;
@@ -5912,9 +5910,10 @@ printf_strlen_execute (function *fun, bool warn_only)
   ssa_ver_to_stridx.safe_grow_cleared (num_ssa_names, true);
   max_stridx = 1;
 
+  enable_ranger (fun);
   /* String length optimization is implemented as a walk of the dominator
  tree and a forward walk of statements within each block.  */
-  strlen_pass walker (CDI_DOMINATORS);
+  strlen_pass walker (fun, CDI_DOMINATORS);
   walker.walk (ENTRY_BLOCK_PTR_FOR_FN (fun));
 
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -5939,6 +5

[COMMITTED] tree-optimization/115221 - Do not invoke SCEV if it will use a different range query.

2024-05-28 Thread Andrew MacLeod
The original patch causing the PR made ranger's cache re-entrant to 
enable SCEV to use the current range_query when called from within ranger.


SCEV uses the currently active range query (via get_range_query()) for 
picking up values.  fold_using_range is the general purpose stmt folder 
many components use, and it takes a range_query to use for folding.
When propagating values in the cache, we need to ensure no new queries 
are invoked, and when the cache is propagating and calculating outgoing 
edges, it switches to a read-only range_query which uses what it knows 
about global values to come up with the best result using the current state.


SCEV is unaware of what the caller is using for a range_query, so when 
attempting to fold a PHI node, it is re-invoking the current query 
during propagation which is undesired behavior.   This patch tells 
fold_using_range to not use SCEV if the range_query being used is not 
the same as the one SCEV is going to use.
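
The guard can be pictured with a small standalone model.  This is a sketch under stated assumptions, not the real code: query_model, analyze_loop_variable and range_with_loop_info are hypothetical stand-ins for range_query, the SCEV-based analysis, and the fold_using_range routine respectively, and the "analysis result" is a made-up constant range.

```cpp
// Toy model of "skip the shared analysis unless it will use the same
// query as the caller"; not GCC code.
#include <cassert>
#include <climits>

// Stand-in for a range query; CALLS counts how often it is consulted.
struct query_model
{
  int calls = 0;
};

// The query that the shared analysis always consults.
static query_model *active_query = nullptr;

struct int_range
{
  int lo;
  int hi;
};

static const int_range varying = { INT_MIN, INT_MAX };

// Stand-in for the loop analysis: it cannot be told which query to
// use, so it always goes through ACTIVE_QUERY.
static bool
analyze_loop_variable (int_range &r)
{
  if (!active_query)
    return false;
  ++active_query->calls;
  r = { 0, 9 };	/* Pretend the analysis found [0, 9].  */
  return true;
}

// Compute a range on behalf of a caller folding with CALLER_QUERY.
// If the analysis would consult a different query than the caller's,
// do not run it and fall back to "varying".
static int_range
range_with_loop_info (query_model *caller_query)
{
  assert (caller_query != nullptr);
  int_range r;
  if (caller_query != active_query || !analyze_loop_variable (r))
    r = varying;
  return r;
}
```

When the caller folds with the active query the analysis runs as before; when the caller is using some other (for example read-only) query, the result is conservatively varying and the active query is never re-entered.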


Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed.

Andrew
From b814e390e7c87c14ce8d9cdea6c6cd127a4e6261 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Mon, 27 May 2024 11:00:57 -0400
Subject: [PATCH] Do not invoke SCEV if it will use a different range query.

SCEV always uses the current range_query object.
Ranger's cache uses a global value_query when propagating cache values to
avoid re-invoking ranger during simple cache propagations.
When folding a PHI value, SCEV can be invoked, and since it always uses
the current range_query object, when ranger is active this causes the
undesired re-invoking of ranger during cache propagation.

This patch checks to see if the fold_using_range specified range_query
object is the same as the one SCEV uses, and does not invoke SCEV if
they do not match.

	PR tree-optimization/115221
	gcc/
	* gimple-range-fold.cc (range_of_ssa_name_with_loop_info): Do
	not invoke SCEV if the range_queries do not match.
	gcc/testsuite/
	* gcc.dg/pr115221.c: New.
---
 gcc/gimple-range-fold.cc|  6 +-
 gcc/testsuite/gcc.dg/pr115221.c | 29 +
 2 files changed, 34 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr115221.c

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index b3965b5ee50..98a4877ba18 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -1264,7 +1264,11 @@ fold_using_range::range_of_ssa_name_with_loop_info (vrange &r, tree name,
 		fur_source &src)
 {
   gcc_checking_assert (TREE_CODE (name) == SSA_NAME);
-  if (!range_of_var_in_loop (r, name, l, phi, src.query ()))
+  // SCEV currently invokes get_range_query () for values.  If the query
+  // being passed in is not the same SCEV will use, do not invoke SCEV.
+  // This can be remove if/when SCEV uses a passed in range-query.
+  if (src.query () != get_range_query (cfun)
+  || !range_of_var_in_loop (r, name, l, phi, src.query ()))
 r.set_varying (TREE_TYPE (name));
 }
 
diff --git a/gcc/testsuite/gcc.dg/pr115221.c b/gcc/testsuite/gcc.dg/pr115221.c
new file mode 100644
index 000..f139394e5c0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr115221.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef unsigned uint32_t;
+int cde40_t;
+int offset;
+void aal_test_bit();
+uint32_t cde40_key_pol();
+long cde40_offset_check(uint32_t pos) {
+  cde40_key_pol();
+  if (cde40_t)
+return (offset - 2) % (((pos == 3) ? 18 : 26)) != 0;
+  return 0;
+}
+void cde40_check_struct() {
+  uint32_t i, j, to_compare;
+  for (;; i++) {
+cde40_offset_check(i);
+if (to_compare == 0) {
+  if (i && cde40_key_pol())
+	;
+  to_compare = i;
+  continue;
+}
+j = to_compare;
+for (; j < i; j++)
+  aal_test_bit();
+  }
+}
-- 
2.41.0



Re: [PATCH] c++/modules: Prevent revealing a using-decl affecting cached overloads [PR114867]

2024-05-28 Thread Jason Merrill

On 5/26/24 09:01, Nathaniel Shead wrote:

Is this approach OK?  Alternatively I suppose we could do a deep copy of
the overload list when this occurs to ensure we don't affect existing
referents, would that be preferable?


This strategy makes sense, but I have other concerns:


Bootstrapped and regtested (so far just modules.exp) on
x86_64-pc-linux-gnu, OK for trunk if full regtest succeeds?

-- >8 --

Doing 'remove_node' here is not safe, because it not only mutates the
OVERLOAD we're walking over but potentially any other references to this
OVERLOAD that are cached from phase-1 template lookup.  This causes the
attached testcase to fail because the overload set in X::test no longer
contains the 'ns::foo' template once instantiated at the end of the


It looks like ns::foo has been renamed to just f in the testcase.


file.

This patch works around this by simply not removing the old declaration.
This does make the overload list potentially longer than it otherwise
would have been, but only when re-exporting the same set of functions in
a using-decl.  Additionally, because 'ovl_insert' always prepends these
newly inserted overloads, repeated exported using-decls won't continue
to add declarations, as the first exported using-decl will be found
before the original (unexported) declaration.
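
The reasoning above can be illustrated with a small standalone model of a shared, singly linked overload list.  This is only a sketch: ovl_node, prepend, names and first_named are hypothetical and much simpler than GCC's OVERLOAD trees, but they show why prepending leaves previously cached references intact and why a lookup finds the exported entry first.

```cpp
// Toy model of prepending to a shared overload list; not GCC code.
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct ovl_node
{
  std::string name;
  bool exported;
  std::shared_ptr<const ovl_node> next;
};

using ovl_list = std::shared_ptr<const ovl_node>;

// Return a new list whose head is (NAME, EXPORTED) and whose tail is
// TAIL.  TAIL itself is not modified, so anything that cached it
// still sees exactly the same entries.
static ovl_list
prepend (ovl_list tail, std::string name, bool exported)
{
  assert (!name.empty ());
  return std::make_shared<const ovl_node>
    (ovl_node { std::move (name), exported, std::move (tail) });
}

// Flatten the list into printable names, for inspection.
static std::vector<std::string>
names (const ovl_list &head)
{
  std::vector<std::string> out;
  for (const ovl_node *n = head.get (); n; n = n->next.get ())
    out.push_back (n->name + (n->exported ? " (exported)" : ""));
  return out;
}

// Return the first entry called NAME, or null if there is none.
static const ovl_node *
first_named (const ovl_list &head, const std::string &name)
{
  for (const ovl_node *n = head.get (); n; n = n->next.get ())
    if (n->name == name)
      return n;
  return nullptr;
}
```

If a template cached the list containing the unexported "f" and an exported using-declaration later prepends a second "f", the cached list still has its single entry, while the updated list is one entry longer and first_named on it returns the exported node.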

PR c++/114867

gcc/cp/ChangeLog:

* name-lookup.cc (do_nonmember_using_decl): Don't remove the
existing overload.

gcc/testsuite/ChangeLog:

* g++.dg/modules/using-17_a.C: New test.
* g++.dg/modules/using-17_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/name-lookup.cc | 24 +++---
  gcc/testsuite/g++.dg/modules/using-17_a.C | 31 +++
  gcc/testsuite/g++.dg/modules/using-17_b.C | 13 ++
  3 files changed, 53 insertions(+), 15 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/using-17_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/using-17_b.C

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index f1f8c19feb1..130a0e6b5db 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -5231,25 +5231,19 @@ do_nonmember_using_decl (name_lookup &lookup, bool 
fn_scope_p,
  
  	  if (new_fn == old_fn)

{
- /* The function already exists in the current
-namespace.  We will still want to insert it if
-it is revealing a not-revealed thing.  */
+ /* The function already exists in the current namespace.  */
  found = true;
- if (!revealing_p)
-   ;
- else if (old.using_p ())
+ if (exporting)
{
- if (exporting)
+ if (old.using_p ())
/* Update in place.  'tis ok.  */
OVL_EXPORT_P (old.get_using ()) = true;
- ;
-   }
- else if (DECL_MODULE_EXPORT_P (new_fn))
-   ;
- else
-   {
- value = old.remove_node (value);
- found = false;
+ else if (!DECL_MODULE_EXPORT_P (new_fn))
+   /* We need to re-insert this function as an exported
+  declaration.  We can't remove the existing decl
+  because that will change any overloads cached in
+  template functions.  */
+   found = false;


What if we're revealing without exporting?  That is, a using-declaration 
in module purview that isn't exported?  Such a declaration should still 
prevent discarding, which is my understanding of the use of "revealing" 
here.


It seems like the current code already gets that wrong for e.g.

M_1.C:
module;
 struct A {};
 inline int f() { return 42; }
export module M;
 using ::A;
 using ::f;

M_2.C:
import M;
 inline int f();
 struct A a; // { dg-bogus "incomplete" }
int main() {
  return f(); // { dg-bogus "undefined" }
}

It looks like part of the problem is that add_binding_entity is only 
interested in exported usings, but I think it should also handle 
revealing ones.


Jason



Re: [PATCH] c++: canonicity of fn types w/ instantiated eh specs [PR115223]

2024-05-28 Thread Jason Merrill

On 5/25/24 19:18, Patrick Palka wrote:

Bootstrap and regtest on x86_64-pc-linux-gnu in progress,
does this look OK for trunk if successful?


OK.


-- >8 --

When propagating structural equality in build_cp_fntype_variant, we
should consider structural equality of the exception-less variant, not
of the given type which might use structural equality only because of
the (complex) noexcept-spec we're intending to replace, as in
maybe_instantiate_noexcept which calls build_exception_variant using
the function type with a deferred noexcept-spec.  Otherwise we might
pessimistically use structural equality for a function type with a simple
instantiated noexcept-spec, leading to a failed LTO-specific sanity
check if we later use that (structural-equality) type as the canonical
version of some other variant.

PR c++/115223

gcc/cp/ChangeLog:

* tree.cc (build_cp_fntype_variant): Propagate structural
equality of the exception-less variant.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept87.C: New test.
---
  gcc/cp/tree.cc  |  4 
  gcc/testsuite/g++.dg/cpp0x/noexcept87.C | 11 +++
  2 files changed, 15 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept87.C

diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index 4d87661b4ad..f810b8cd777 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -2796,6 +2796,10 @@ build_cp_fntype_variant (tree type, cp_ref_qualifier 
rqual,
bool complex_eh_spec_p = (cr && cr != noexcept_true_spec
&& !UNPARSED_NOEXCEPT_SPEC_P (cr));
  
+  if (!complex_eh_spec_p && TYPE_RAISES_EXCEPTIONS (type))

+/* We want to consider structural equality of the exception-less
+   variant since we'll be replacing the exception specification.  */
+type = build_cp_fntype_variant (type, rqual, /*raises=*/NULL_TREE, late);
if (TYPE_STRUCTURAL_EQUALITY_P (type) || complex_eh_spec_p)
  /* Propagate structural equality.  And always use structural equality
 for function types with a complex noexcept-spec since their identity
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept87.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept87.C
new file mode 100644
index 000..60b1497472b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept87.C
@@ -0,0 +1,11 @@
+// PR c++/115223
+// { dg-do compile { target c++11 } }
+// { dg-additional-options -flto }
+
+template
+void f() noexcept(bool(T() || true));
+
+void g(int n) { f(); }
+
+using type = void;
+type callDestructorIfNecessary() noexcept {}




Re: [PATCH] c++: extend -Wself-move for mem-init-list [PR109396]

2024-05-28 Thread Marek Polacek
On Fri, May 24, 2024 at 10:15:56AM -0400, Jason Merrill wrote:
> On 5/23/24 19:57, Marek Polacek wrote:
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > -- >8 --
> > We already warn for:
> > 
> >x = std::move (x);
> > 
> > which triggers:
> > 
> >warning: moving 'x' of type 'int' to itself [-Wself-move]
> > 
> > but bug 109396 reports that this doesn't work for a member-initializer-list:
> > 
> >X() : x(std::move (x))
> > 
> > so this patch amends that.
> > 
> > PR c++/109396
> > 
> > gcc/cp/ChangeLog:
> > 
> > * cp-tree.h (maybe_warn_self_move): Declare.
> > * init.cc (perform_member_init): Call maybe_warn_self_move.
> > * typeck.cc (maybe_warn_self_move): No longer static.  Change the
> > return type to bool.  Also warn when called from
> > a member-initializer-list.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/warn/Wself-move2.C: New test.
> > ---
> >   gcc/cp/cp-tree.h|  1 +
> >   gcc/cp/init.cc  |  5 ++--
> >   gcc/cp/typeck.cc| 28 +--
> >   gcc/testsuite/g++.dg/warn/Wself-move2.C | 37 +
> >   4 files changed, 60 insertions(+), 11 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/warn/Wself-move2.C
> > 
> > diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> > index ba9e848c177..ea3fa6f4aac 100644
> > --- a/gcc/cp/cp-tree.h
> > +++ b/gcc/cp/cp-tree.h
> > @@ -8263,6 +8263,7 @@ extern cp_expr build_c_cast   
> > (location_t loc, tree type,
> >  cp_expr expr);
> >   extern tree cp_build_c_cast   (location_t, tree, tree,
> >  tsubst_flags_t);
> > +extern bool maybe_warn_self_move   (location_t, tree, tree);
> >   extern cp_expr build_x_modify_expr(location_t, tree,
> >  enum tree_code, tree,
> >  tree, tsubst_flags_t);
> > diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
> > index 52396d87a8c..4a7ed7f5302 100644
> > --- a/gcc/cp/init.cc
> > +++ b/gcc/cp/init.cc
> > @@ -999,7 +999,7 @@ perform_member_init (tree member, tree init, 
> > hash_set &uninitialized)
> > if (decl == error_mark_node)
> >   return;
> > -  if ((warn_init_self || warn_uninitialized)
> > +  if ((warn_init_self || warn_uninitialized || warn_self_move)
> > && init
> > && TREE_CODE (init) == TREE_LIST
> > && TREE_CHAIN (init) == NULL_TREE)
> > @@ -1013,7 +1013,8 @@ perform_member_init (tree member, tree init, 
> > hash_set &uninitialized)
> > warning_at (DECL_SOURCE_LOCATION (current_function_decl),
> > OPT_Winit_self, "%qD is initialized with itself",
> > member);
> > -  else
> > +  else if (!maybe_warn_self_move (input_location, member,
> > + TREE_VALUE (init)))
> > find_uninit_fields (&val, &uninitialized, decl);
> >   }
> > diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
> > index d7fa6e0dd96..e058ce18276 100644
> > --- a/gcc/cp/typeck.cc
> > +++ b/gcc/cp/typeck.cc
> > @@ -9355,27 +9355,27 @@ cp_build_c_cast (location_t loc, tree type, tree 
> > expr,
> >   /* Warn when a value is moved to itself with std::move.  LHS is the 
> > target,
> >  RHS may be the std::move call, and LOC is the location of the whole
> > -   assignment.  */
> > +   assignment.  Return true if we warned.  */
> > -static void
> > +bool
> >   maybe_warn_self_move (location_t loc, tree lhs, tree rhs)
> >   {
> > if (!warn_self_move)
> > -return;
> > +return false;
> > /* C++98 doesn't know move.  */
> > if (cxx_dialect < cxx11)
> > -return;
> > +return false;
> > if (processing_template_decl)
> > -return;
> > +return false;
> > if (!REFERENCE_REF_P (rhs)
> > || TREE_CODE (TREE_OPERAND (rhs, 0)) != CALL_EXPR)
> > -return;
> > +return false;
> > tree fn = TREE_OPERAND (rhs, 0);
> > if (!is_std_move_p (fn))
> > -return;
> > +return false;
> > /* Just a little helper to strip * and various NOPs.  */
> > auto extract_op = [] (tree &op) {
> > @@ -9393,13 +9393,23 @@ maybe_warn_self_move (location_t loc, tree lhs, 
> > tree rhs)
> > tree type = TREE_TYPE (lhs);
> > tree orig_lhs = lhs;
> > extract_op (lhs);
> > -  if (cp_tree_equal (lhs, arg))
> > +  if (cp_tree_equal (lhs, arg)
> > +  /* Also warn in a member-initializer-list, as in : i(std::move(i)).  
> > */
> > +  || (TREE_CODE (lhs) == FIELD_DECL
> > + && TREE_CODE (arg) == COMPONENT_REF
> > + && cp_tree_equal (TREE_OPERAND (arg, 0), current_class_ref)
> > + && TREE_OPERAND (arg, 1) == lhs))
> >   {
> > auto_diagnostic_group d;
> > if (warning_at (loc, OPT_Wself_move,
> >   "moving %qE of type %qT to itself", orig_lhs, type))
> > - 

Re: [PATCH v3 #1/2] enable adjustment of return_pc debug attrs

2024-05-28 Thread Jason Merrill

On 5/25/24 08:12, Alexandre Oliva wrote:

On Apr 27, 2023, Alexandre Oliva  wrote:

On Apr 14, 2023, Alexandre Oliva  wrote:

On Mar 23, 2023, Alexandre Oliva  wrote:

This patch introduces infrastructure for targets to add an offset to
the label issued after the call_insn to set the call_return_pc
attribute.  This will be used on rs6000, that sometimes issues another
instruction after the call proper as part of a call insn.



Ping?
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614452.html


Ping?
Refreshed, retested on ppc64le-linux-gnu.  Ok to install?


I wonder about adding this information to REG_CALL_ARG_LOCATION, but 
doing it this way also seems reasonable.  I'm interested in Jakub's 
input, but the patch is OK in a week if he doesn't get to it.

This patch introduces infrastructure for targets to add an offset to
the label issued after the call_insn to set the call_return_pc
attribute.  This will be used on rs6000, that sometimes issues another
instruction after the call proper as part of a call insn.
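
A minimal sketch of the label-plus-offset idea, assuming the offset is simply appended to the label text with an explicit sign, as the "%s%+i" format in the dwarf2out.cc hunk of this patch suggests.  The helper name label_with_offset is hypothetical and not part of GCC.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of GCC): build the text of a label
// reference with an optional signed offset, mirroring the "%s%+i"
// format used in the patch.
std::string
label_with_offset (const std::string &label, int offset)
{
  if (offset == 0)
    return label;               // Plain label, as before the patch.

  char buf[64];
  // %+i always prints a sign, so the result is "label+4" or "label-4".
  std::snprintf (buf, sizeof buf, "%+i", offset);
  return label + buf;
}
```

A zero offset leaves the label untouched, so targets that do not define the hook would see no change in the emitted attribute under this reading.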


for  gcc/ChangeLog

* target.def (call_offset_return_label): New hook.
* gcc/doc/tm.texi.in (TARGET_CALL_OFFSET_RETURN_LABEL): Add
placeholder.
* gcc/doc/tm.texi: Rebuild.
* dwarf2out.cc (struct call_arg_loc_node): Record call_insn
instead of call_arg_loc_note.
(add_AT_lbl_id): Add optional offset argument.
(gen_call_site_die): Compute and pass on a return pc offset.
(gen_subprogram_die): Move call_arg_loc_note computation...
(dwarf2out_var_location): ... from here.  Set call_insn.
---
  gcc/doc/tm.texi|7 +++
  gcc/doc/tm.texi.in |2 ++
  gcc/dwarf2out.cc   |   26 +-
  gcc/target.def |9 +
  4 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index cd50078227d98..8a7aa70d605ba 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5557,6 +5557,13 @@ except the last are treated as named.
  You need not define this hook if it always returns @code{false}.
  @end deftypefn
  
+@deftypefn {Target Hook} int TARGET_CALL_OFFSET_RETURN_LABEL (rtx_insn *@var{call_insn})

+While generating call-site debug info for a CALL insn, or a SEQUENCE
+insn starting with a CALL, this target hook is invoked to compute the
+offset to be added to the debug label emitted after the call to obtain
+the return address that should be recorded as the return PC.
+@end deftypefn
+
  @deftypefn {Target Hook} void TARGET_START_CALL_ARGS (cumulative_args_t @var{complete_args})
  This target hook is invoked while generating RTL for a function call,
  after the argument values have been computed, and after stack arguments
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 058bd56487a9a..9e0830758aeea 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3887,6 +3887,8 @@ These machine description macros help implement varargs:
  
  @hook TARGET_STRICT_ARGUMENT_NAMING
  
+@hook TARGET_CALL_OFFSET_RETURN_LABEL

+
  @hook TARGET_START_CALL_ARGS
  
  @hook TARGET_CALL_ARGS

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 5b064ffd78ad1..1092880738df4 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -3593,7 +3593,7 @@ typedef struct var_loc_list_def var_loc_list;
  
  /* Call argument location list.  */

  struct GTY ((chain_next ("%h.next"))) call_arg_loc_node {
-  rtx GTY (()) call_arg_loc_note;
+  rtx_insn * GTY (()) call_insn;
const char * GTY (()) label;
tree GTY (()) block;
bool tail_call_p;
@@ -3777,7 +3777,8 @@ static void remove_addr_table_entry (addr_table_entry *);
  static void add_AT_addr (dw_die_ref, enum dwarf_attribute, rtx, bool);
  static inline rtx AT_addr (dw_attr_node *);
  static void add_AT_symview (dw_die_ref, enum dwarf_attribute, const char *);
-static void add_AT_lbl_id (dw_die_ref, enum dwarf_attribute, const char *);
+static void add_AT_lbl_id (dw_die_ref, enum dwarf_attribute, const char *,
+  int = 0);
  static void add_AT_lineptr (dw_die_ref, enum dwarf_attribute, const char *);
  static void add_AT_macptr (dw_die_ref, enum dwarf_attribute, const char *);
  static void add_AT_range_list (dw_die_ref, enum dwarf_attribute,
@@ -5353,14 +5354,17 @@ add_AT_symview (dw_die_ref die, enum dwarf_attribute attr_kind,
  
  static inline void

  add_AT_lbl_id (dw_die_ref die, enum dwarf_attribute attr_kind,
-   const char *lbl_id)
+  const char *lbl_id, int offset)
  {
dw_attr_node attr;
  
attr.dw_attr = attr_kind;

attr.dw_attr_val.val_class = dw_val_class_lbl_id;
attr.dw_attr_val.val_entry = NULL;
-  attr.dw_attr_val.v.val_lbl_id = xstrdup (lbl_id);
+  if (!offset)
+attr.dw_attr_val.v.val_lbl_id = xstrdup (lbl_id);
+  else
+attr.dw_attr_val.v.val_lbl_id = xasprintf ("%s%+i", lbl_id, offset);
if (dwarf_split_debug_info)
  attr.dw_attr_val.val_entry
  = add_addr_table_entry (attr.dw_attr_val.v.val_lbl_id,
@@

[PATCH v4] RISC-V: Introduce -mvector-strict-align.

2024-05-28 Thread Robin Dapp
Hi,

this patch disables movmisalign by default and introduces
the -mno-vector-strict-align option to override it and re-enable
movmisalign.  For now, generic-ooo is the only uarch that supports
misaligned vector access.

The patch also adds a check_effective_target_riscv_v_misalign_ok to
the testsuite which enables or disables the vector misalignment tests
depending on whether the target under test can execute a misaligned
vle32.

Changes from v3:
 - Addressed Kito's comments.
 - Made -mscalar-strict-align a real alias.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
Move from here...
* config/riscv/riscv.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
...to here and map to riscv_vector_unaligned_access_p.
* config/riscv/riscv.opt: Add -mvector-strict-align.
* config/riscv/riscv.cc (struct riscv_tune_param): Add
vector_unaligned_access.
(riscv_override_options_internal): Set
riscv_vector_unaligned_access_p.
* doc/invoke.texi: Document -mvector-strict-align.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add
check_effective_target_riscv_v_misalign_ok.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Add
-mno-vector-strict-align.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-8.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/misalign-1.c: Ditto.
---
 gcc/config/riscv/riscv-opts.h |  3 --
 gcc/config/riscv/riscv.cc | 19 +++
 gcc/config/riscv/riscv.h  |  5 +++
 gcc/config/riscv/riscv.opt|  8 +
 gcc/doc/invoke.texi   | 22 
 .../costmodel/riscv/rvv/dynamic-lmul2-7.c |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-10.c  |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-11.c  |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-12.c  |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-8.c   |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-9.c   |  2 +-
 .../riscv/rvv/autovec/vls/misalign-1.c|  2 +-
 gcc/testsuite/lib/target-supports.exp | 34 +--
 13 files changed, 93 insertions(+), 12 deletions(-)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 1b2dd5757a8..f58a07abffc 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -147,9 +147,6 @@ enum rvv_vector_bits_enum {
  ? 0   
\
  : 32 << (__builtin_popcount (opts->x_riscv_zvl_flags) - 1))
 
-/* TODO: Enable RVV movmisalign by default for now.  */
-#define TARGET_VECTOR_MISALIGN_SUPPORTED 1
-
 /* The maximmum LMUL according to user configuration.  */
 #define TARGET_MAX_LMUL
\
   (int) (rvv_max_lmul == RVV_DYNAMIC ? RVV_M8 : rvv_max_lmul)
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a99211d56b1..13cd61a4a22 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -287,6 +287,7 @@ struct riscv_tune_param
   unsigned short memory_cost;
   unsigned short fmv_cost;
   bool slow_unaligned_access;
+  bool vector_unaligned_access;
   bool use_divmod_expansion;
   bool overlap_op_by_pieces;
   unsigned int fusible_ops;
@@ -299,6 +300,10 @@ struct riscv_tune_param
 /* Whether unaligned accesses execute very slowly.  */
 bool riscv_slow_unaligned_access_p;
 
+/* Whether misaligned vector accesses are supported (i.e. do not
+   throw an exception).  */
+bool riscv_vector_unaligned_access_p;
+
 /* Whether user explicitly passed -mstrict-align.  */
 bool riscv_user_wants_strict_align;
 
@@ -441,6 +446,7 @@ static const struct riscv_tune_param rocket_tune_info = {
   5,   /* memory_cost */
   8,   /* fmv_cost */
   true,   /* slow_unaligned_access */
+  false,   /* vector_unaligned_access */
   false,   /* use_divmod_expansion */
   false,   /* overlap_op_by_pieces */
   RISCV_FUSE_NOTHING,   /* fusible_ops */
@@ -459,6 +465,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
   3,   /* memory_cost */
   8,   /* fmv_cost */
   true,   /* slow_unaligned_access */
+  false,   /* vector_unaligned_access */
   false,   

[patch] OpenMP: Add -fopenmp-force-usm mode

2024-05-28 Thread Tobias Burnus
-fopenmp-force-usm can be useful for some badly written code. Explicitly 
using 'omp requires' makes more sense, but still. It might also make 
sense for testing purposes.


Unfortunately, I did not see a simple way of testing it. When trying it 
manually, I looked at the 'a.xamdgcn-amdhsa.c' -save-temps file, where 
gcn_data has the omp_requires_mask as second argument and testing showed 
that an explicit pragma and the -f... argument have the same result.


An alternative would be to move this code later, e.g. to lto-cgraph.cc's 
omp_requires_mask, which might be safer (as it avoids changing as many 
locations). On the other hand, it might require more special cases 
elsewhere.*


Comments, suggestions?

Tobias

*I am especially thinking about a global variable and "#pragma omp 
declare target". At least with 'omp requires self_maps' of OpenMP 6, it 
seems as if 'declare target enter(global_var)' should become 
'link(global_var)' where the global_var pointer is updated to point to 
the host version.


At least I don't see how otherwise the "all corresponding list items 
created by the 'enter' clauses specified by declare target directives in 
the compilation unit share storage with the original list items." could 
be fulfilled.


This will require generating different code for 'self_maps' (and, 
potentially / [RFC] 'unified_shared_memory') than normal code, which 
would be the first compiler code-gen change due to USM (→ 
GOMP_OFFLOAD_CAP_SHARED_MEM) for non-host devices.

OpenMP: Add -fopenmp-force-usm mode

Add an implicit 'omp requires unified_shared_memory' to all files that
use target constructs ("OMP_REQUIRES_TARGET_USED").  As constructed, the
diagnostic "'unified_shared_memory' clause used lexically after first target
construct or offloading API" is not inhibited.

The option has no effect without -fopenmp and does not affect OpenACC code,
matching what the directive would do.  The name of the command-line option
matches Clang's, added in LLVM 18.
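
A minimal sketch of the mask update described above, assuming the requirement flags are plain bits in an unsigned mask.  The enumerator names and values here are hypothetical stand-ins, not GCC's actual OMP_REQUIRES_* definitions.

```cpp
// Hypothetical bit values; GCC's real enumerators live in its own
// headers and may differ.
enum requires_bits : unsigned
{
  REQ_UNIFIED_SHARED_MEMORY = 1u << 0,
  REQ_TARGET_USED = 1u << 1
};

// Record that a target construct was seen.  With the force-USM flag
// set, behave as if 'omp requires unified_shared_memory' had been
// written in the source file.
unsigned
note_target_construct (unsigned mask, bool force_usm)
{
  mask |= REQ_TARGET_USED;
  if (force_usm)
    mask |= REQ_UNIFIED_SHARED_MEMORY;
  return mask;
}
```

Because the bit is only set where a target construct is noted, a file without target constructs would keep an empty mask in this sketch, which matches the "all files that use target constructs" wording.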

gcc/c-family/ChangeLog:

	* c.opt (fopenmp-force-usm): New.
	* c.opt.urls: Regenerate.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_target_data, c_parser_omp_target_update,
	c_parser_omp_target_enter_data, c_parser_omp_target_exit_data,
	c_parser_omp_target): When setting OMP_REQUIRES_TARGET_USED, also
	set OMP_REQUIRES_UNIFIED_SHARED_MEMORY if -fopenmp-force-usm is
	in force.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_target_data,
	cp_parser_omp_target_enter_data, cp_parser_omp_target_exit_data,
	cp_parser_omp_target_update, cp_parser_omp_target): When setting
	OMP_REQUIRES_TARGET_USED, also set OMP_REQUIRES_UNIFIED_SHARED_MEMORY
	if -fopenmp-force-usm is in force.


gcc/ChangeLog:

	* doc/invoke.texi (-fopenmp-force-usm): Document new option.

gcc/fortran/ChangeLog:

	* invoke.texi (-fopenmp-force-usm): Document new option.
	* lang.opt (fopenmp-force-usm): New.
	* lang.opt.urls: Regenerate.
	* parse.cc (gfc_parse_file): When setting
	OMP_REQUIRES_TARGET_USED, also set OMP_REQUIRES_UNIFIED_SHARED_MEMORY
	if -fopenmp-force-usm is in force.

 gcc/c-family/c.opt|  4 
 gcc/c-family/c.opt.urls   |  3 +++
 gcc/c/c-parser.cc | 50 +--
 gcc/cp/parser.cc  | 50 +--
 gcc/doc/invoke.texi   | 11 +--
 gcc/fortran/invoke.texi   |  7 +++
 gcc/fortran/lang.opt  |  4 
 gcc/fortran/lang.opt.urls |  3 +++
 gcc/fortran/parse.cc  | 10 --
 9 files changed, 118 insertions(+), 24 deletions(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index fb34c3b7031..4985cd61c48 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -2136,6 +2136,10 @@ fopenmp
 C ObjC C++ ObjC++ LTO Var(flag_openmp)
 Enable OpenMP (implies -frecursive in Fortran).
 
+fopenmp-force-usm
+C ObjC C++ ObjC++ Var(flag_openmp_force_usm)
+Behave as if the source file contained OpenMP's 'requires unified_shared_memory'.
+
 fopenmp-simd
 C ObjC C++ ObjC++ Var(flag_openmp_simd)
 Enable OpenMP's SIMD directives.
diff --git a/gcc/c-family/c.opt.urls b/gcc/c-family/c.opt.urls
index dd455d7c0dc..34b3a395e84 100644
--- a/gcc/c-family/c.opt.urls
+++ b/gcc/c-family/c.opt.urls
@@ -1222,6 +1222,9 @@ UrlSuffix(gcc/C-Dialect-Options.html#index-fopenacc-dim)
 fopenmp
 UrlSuffix(gcc/C-Dialect-Options.html#index-fopenmp) LangUrlSuffix_Fortran(gfortran/Fortran-Dialect-Options.html#index-fopenmp)
 
+fopenmp-force-usm
+UrlSuffix(gcc/C-Dialect-Options.html#index-fopenmp-force-usm) LangUrlSuffix_Fortran(gfortran/Fortran-Dialect-Options.html#index-fopenmp-force-usm)
+
 fopenmp-simd
 UrlSuffix(gcc/C-Dialect-Options.html#index-fopenmp-simd) LangUrlSuffix_Fortran(gfortran/Fortran-Dialect-Options.html#index-fopenmp-simd)
 
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 00f8bf4376e..93c9cd1c9d0 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -23849,8 +23849,14 @@ static tree
 c_parser_omp_target_data (location_t loc, c_parser *parser, bool *if_p)
 {
   if (f

Re: [Patch, PR Fortran/90069] Polymorphic Return Type Memory Leak Without Intermediate Variable

2024-05-28 Thread Harald Anlauf

Hi Andre,

On 5/28/24 14:10, Andre Vehreschild wrote:

Hi all,

the attached patch fixes a memory leak with unlimited polymorphic return types.
The leak occurred because an expression with side effects was evaluated twice.
I have replaced the check for non-variable expressions followed by creating a
SAVE_EXPR with a check for trees with side effects, creating a temporary
variable and freeing the memory.


this looks good to me.  It also solves the runtime memory leak in
testcase pr114012.f90 .  Nice!


Btw, I do not get the SAVE_EXPR in the old code. Is there something missing to
manifest it or is a SAVE_EXPR not meant to be evaluated twice?


I was assuming that the comment in gcc/tree.h applies here:

/* save_expr (EXP) returns an expression equivalent to EXP
   but it can be used multiple times within context CTX
   and only evaluate EXP once.  */

I do not know what the practical difference between a SAVE_EXPR
and a temporary explicitly evaluated once (which you have now)
is, except that you can free the temporary cleanly.
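
To illustrate the kind of leak under discussion, here is a small C++ analogy (not Fortran, and not the front-end code): an expression with an allocating side effect is evaluated twice in one variant and once into an explicit temporary in the other.  All names are hypothetical.

```cpp
// Counts outstanding allocations made by make_value ().
static int live_allocations = 0;

// Stand-in for a function call with a side effect: every call allocates.
int *
make_value ()
{
  ++live_allocations;
  return new int (42);
}

void
release (int *p)
{
  delete p;
  --live_allocations;
}

// Leaky shape: the expression is evaluated twice, so the first result
// is never freed.
int
use_twice_leaky ()
{
  int v = *make_value ();       // First evaluation, pointer is lost.
  int *p = make_value ();       // Second evaluation.
  v += *p;
  release (p);
  return v;
}

// Fixed shape: evaluate once into a temporary, reuse it, then free it.
int
use_twice_with_temp ()
{
  int *tmp = make_value ();
  int v = *tmp + *tmp;
  release (tmp);
  return v;
}
```

Both variants compute the same value; only the variant with the explicit temporary leaves no allocation behind, which is the property the explicit temporary makes easy to guarantee.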


Anyway, regtested ok on Linux-x86_64-Fedora_39. Ok for master?


Yes, this is fine from my side.  If you are inclined to backport
to e.g. 14-branch after a grace period, that would be great.


This work is funded by the Sovereign Tech Fund. Yes, the funding has been
granted and Nicolas, Mikael and I will be working on some Fortran topics in
the next 12-18 months.


This is really great news!


Regards,
Andre


Thanks for the patch!

Harald


--
Andre Vehreschild * Email: vehre ad gmx dot de




Re: [PATCH] regenerate-opt-urls.py: fix transposed values for "vax" and "v850"

2024-05-28 Thread David Malcolm
On Tue, 2024-05-28 at 11:41 -0400, David Malcolm wrote:
> > On Tue, 2024-05-28 at 15:03 +0200, Mark Wielaard wrote:
> > Hi Maciej (Hi David, added to CC),
> 
> > On Mon, 2024-05-27 at 05:19 +0100, Maciej W. Rozycki wrote:
> > >  As reported in PR target/79646 and fixed by a change proposed by Abe we
> > > have a couple of issues with the descriptions of the VAX floating-point
> > > format options in the option definition file.  Additionally most of these
> > > options are not documented in the manual.
> > > 
> > >  This mini patch series addresses these issues, including Abe's change,
> > > slightly updated, and my new change.  See individual change descriptions
> > > for details.
> > > 
> > >  Verified by inspecting output produced by `vax-netbsdelf-gcc -v --help'
> > > and by eyeballing `gcc.info' and `gcc.pdf' files produced.  Committed.
> > 
> > This broke the gcc-autoregen checker because the
> > gcc/config/vax/vax.opt.urls file wasn't regenerated:
> > https://builder.sourceware.org/buildbot/#/builders/269/builds/5347
> > 
> > Producing the following diff:
> > 
> > diff --git a/gcc/config/vax/vax.opt.urls b/gcc/config/vax/vax.opt.urls
> > index c6b1c418b61..ca78b31dd4c 100644
> > --- a/gcc/config/vax/vax.opt.urls
> > +++ b/gcc/config/vax/vax.opt.urls
> > @@ -1,7 +1,13 @@
> >  ; Autogenerated by regenerate-opt-urls.py from gcc/config/vax/vax.opt and generated HTML
> >  
> > +; skipping UrlSuffix for 'md' due to finding no URLs
> > +
> > +; skipping UrlSuffix for 'md-float' due to finding no URLs
> > +
> >  ; skipping UrlSuffix for 'mg' due to finding no URLs
> >  
> > +; skipping UrlSuffix for 'mg-float' due to finding no URLs
> > +
> >  ; skipping UrlSuffix for 'mgnu' due to finding no URLs
> >  
> >  ; skipping UrlSuffix for 'munix' due to finding no URLs
> > 
> > I am not completely clear on why though. Since it seems you actually
> > did add documentation for exactly these options.
> > 
> > David, should the above diff just be checked in, or do we need to
> > investigate why the URLs weren't found?
> 
> [adding Nick, re the v850 target]
> 
> I found the problem - I messed up when I was populating
> TARGET_SPECIFIC_PAGES in regenerate-opt-urls.py, accidentally
> transposing the entries for v850 and vax by writing:
> 
>     'gcc/V850-Options.html' : 'gcc/config/vax/',
>     'gcc/VAX-Options.html' : 'gcc/config/v850/',
> 
> leading to both gcc/config/v850/v850.opt.urls and
> gcc/config/vax/vax.opt.urls being full of such comments.
> 
> Sorry.
> 
> Fixing that leads to the files for both targets being populated with
> correct-looking URL entries.
> 
> I'll push this to trunk (and backport to gcc 14) after suitable
> testing.

I've pushed this to gcc trunk as r15-872-g7cc529fe514cc6 (having
bootstrapped and lightly tested it on x86_64-pc-linux-gnu)

Dave



[pushed 2/3] libcpp: move label_text to its own header

2024-05-28 Thread David Malcolm
No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-874-g9bda2c4c81b668.

libcpp/ChangeLog:
* Makefile.in (TAGS_SOURCES): Add include/label-text.h.
* include/label-text.h: New file.
* include/rich-location.h: Include "label-text.h".
(class label_text): Move to label-text.h.
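
For readers unfamiliar with the class being moved, here is a cut-down, standalone sketch of the same borrow/take ownership idiom.  The names text_ref and dup_cstr are hypothetical; the sketch assumes C++11 and that owned buffers come from malloc.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <utility>

// Hypothetical, cut-down analogue of label_text: a buffer plus a flag
// saying whether this object must free it.
class text_ref
{
public:
  text_ref () : m_buffer (nullptr), m_owned (false) {}
  ~text_ref () { if (m_owned) std::free (m_buffer); }

  // Moving transfers the buffer and the ownership flag.
  text_ref (text_ref &&other)
  : m_buffer (other.m_buffer), m_owned (other.m_owned)
  {
    other.release ();
  }

  text_ref (const text_ref &) = delete;
  text_ref &operator= (const text_ref &) = delete;

  // Borrow: the caller keeps ownership; nothing is freed here.
  static text_ref borrow (const char *buffer)
  {
    return text_ref (const_cast<char *> (buffer), false);
  }

  // Take: this object frees BUFFER, which must come from malloc.
  static text_ref take (char *buffer)
  {
    return text_ref (buffer, true);
  }

  void release () { m_buffer = nullptr; m_owned = false; }
  const char *get () const { return m_buffer; }
  bool is_owner () const { return m_owned; }

private:
  text_ref (char *buffer, bool owned) : m_buffer (buffer), m_owned (owned) {}

  char *m_buffer;
  bool m_owned;
};

// Helper for the example: duplicate S into malloc'd storage.
char *
dup_cstr (const char *s)
{
  std::size_t n = std::strlen (s) + 1;
  char *p = static_cast<char *> (std::malloc (n));
  if (p)
    std::memcpy (p, s, n);
  return p;
}
```

Deleting the copy operations keeps a buffer from being freed twice; ownership can only travel through a move.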

Signed-off-by: David Malcolm 
---
 libcpp/Makefile.in |   2 +-
 libcpp/include/label-text.h| 102 +
 libcpp/include/rich-location.h |  79 +
 3 files changed, 105 insertions(+), 78 deletions(-)
 create mode 100644 libcpp/include/label-text.h

diff --git a/libcpp/Makefile.in b/libcpp/Makefile.in
index ebbca3fb..7e47153264c0 100644
--- a/libcpp/Makefile.in
+++ b/libcpp/Makefile.in
@@ -271,7 +271,7 @@ ETAGS = @ETAGS@
 
 TAGS_SOURCES = $(libcpp_a_SOURCES) internal.h system.h ucnid.h \
 include/cpplib.h include/line-map.h include/mkdeps.h include/symtab.h \
-include/rich-location.h
+include/rich-location.h include/label-text.h
 
 
 TAGS: $(TAGS_SOURCES)
diff --git a/libcpp/include/label-text.h b/libcpp/include/label-text.h
new file mode 100644
index ..13562cda41f9
--- /dev/null
+++ b/libcpp/include/label-text.h
@@ -0,0 +1,102 @@
+/* A very simple string class.
+   Copyright (C) 2015-2024 Free Software Foundation, Inc.
+
+This program is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.
+
+ In other words, you are welcome to use, share and improve this program.
+ You are forbidden to forbid anyone else to use, share and improve
+ what you give them.   Help stamp out software-hoarding!  */
+
+#ifndef LIBCPP_LABEL_TEXT_H
+#define LIBCPP_LABEL_TEXT_H
+
+/* A struct for the result of range_label::get_text: a NUL-terminated buffer
+   of localized text, and a flag to determine if the caller should "free" the
+   buffer.  */
+
+class label_text
+{
+public:
+  label_text ()
+  : m_buffer (NULL), m_owned (false)
+  {}
+
+  ~label_text ()
+  {
+if (m_owned)
+  free (m_buffer);
+  }
+
+  /* Move ctor.  */
+  label_text (label_text &&other)
+  : m_buffer (other.m_buffer), m_owned (other.m_owned)
+  {
+other.release ();
+  }
+
+  /* Move assignment.  */
+  label_text & operator= (label_text &&other)
+  {
+if (m_owned)
+  free (m_buffer);
+m_buffer = other.m_buffer;
+m_owned = other.m_owned;
+other.release ();
+return *this;
+  }
+
+  /* Delete the copy ctor and copy-assignment operator.  */
+  label_text (const label_text &) = delete;
+  label_text & operator= (const label_text &) = delete;
+
+  /* Create a label_text instance that borrows BUFFER from a
+ longer-lived owner.  */
+  static label_text borrow (const char *buffer)
+  {
+return label_text (const_cast <char *> (buffer), false);
+  }
+
+  /* Create a label_text instance that takes ownership of BUFFER.  */
+  static label_text take (char *buffer)
+  {
+return label_text (buffer, true);
+  }
+
+  void release ()
+  {
+m_buffer = NULL;
+m_owned = false;
+  }
+
+  const char *get () const
+  {
+return m_buffer;
+  }
+
+  bool is_owner () const
+  {
+return m_owned;
+  }
+
+private:
+  char *m_buffer;
+  bool m_owned;
+
+  label_text (char *buffer, bool owned)
+  : m_buffer (buffer), m_owned (owned)
+  {}
+};
+
+#endif /* !LIBCPP_LABEL_TEXT_H  */
diff --git a/libcpp/include/rich-location.h b/libcpp/include/rich-location.h
index a2ece8b033c0..be424cb4b65f 100644
--- a/libcpp/include/rich-location.h
+++ b/libcpp/include/rich-location.h
@@ -22,6 +22,8 @@ along with this program; see the file COPYING3.  If not see
 #ifndef LIBCPP_RICH_LOCATION_H
 #define LIBCPP_RICH_LOCATION_H
 
+#include "label-text.h"
+
 class range_label;
 class label_effects;
 
@@ -541,83 +543,6 @@ protected:
   const diagnostic_path *m_path;
 };
 
-/* A struct for the result of range_label::get_text: a NUL-terminated buffer
-   of localized text, and a flag to determine if the caller should "free" the
-   buffer.  */
-
-class label_text
-{
-public:
-  label_text ()
-  : m_buffer (NULL), m_owned (false)
-  {}
-
-  ~label_text ()
-  {
-if (m_owned)
-  free (m_buffer);
-  }
-
-  /* Move ctor.  */
-  label_text (label_text &&other)
-  : m_buffer (other.m_buffer), m_owned (other.m_owned)
-  {
-other.release ();
-  }
-
-  /* Move assignment.  */
-  label_text & operator= (label_text &&other)
-  {
-if (m_owned)
-  fr

[pushed 3/3] diagnostics: consolidate global state in diagnostic-color.cc

2024-05-28 Thread David Malcolm
Simplify the table of default colors, avoiding the need to manually
add the strlen of each entry.
Consolidate the global state in diagnostic-color.cc into a
g_color_dict, adding selftests for the new class diagnostic_color_dict.

No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Tested with "make selftest-valgrind" and manually with various
values for GCC_COLORS.
Pushed to trunk as r15-875-g21fc89bac61983.

gcc/ChangeLog:
* diagnostic-color.cc: Define INCLUDE_VECTOR.
Include "label-text.h" and "selftest.h".
(struct color_cap): Replace with...
(struct color_default): ...this, adding "m_" prefixes to fields
and dropping "name_len" and "free_val" field.
(color_dict): Convert to...
(gcc_color_defaults): ...this, making const, dropping the trailing
strlen and "false" from each entry.
(class diagnostic_color_dict): New.
(g_color_dict): New.
(colorize_start): Reimplement in terms of g_color_dict.
(diagnostic_color_dict::get_entry_by_name): New, based on
colorize_start.
(diagnostic_color_dict::get_start_by_name): Likewise.
(diagnostic_color_dict::diagnostic_color_dict): New.
(parse_gcc_colors): Reimplement, moving body...
(diagnostic_color_dict::parse_envvar_value): ...here.
(colorize_init): Lazily create g_color_dict.
(selftest::test_empty_color_dict): New.
(selftest::test_default_color_dict): New.
(selftest::test_color_dict_envvar_parsing): New.
(selftest::diagnostic_color_cc_tests): New.
* selftest-run-tests.cc (selftest::run_tests): Call
selftest::diagnostic_color_cc_tests.
* selftest.h (selftest::diagnostic_color_cc_tests): New decl.

Signed-off-by: David Malcolm 
---
 gcc/diagnostic-color.cc   | 277 +-
 gcc/selftest-run-tests.cc |   1 +
 gcc/selftest.h|   1 +
 3 files changed, 216 insertions(+), 63 deletions(-)

diff --git a/gcc/diagnostic-color.cc b/gcc/diagnostic-color.cc
index f01a0fc2e377..cbe57ce763f2 100644
--- a/gcc/diagnostic-color.cc
+++ b/gcc/diagnostic-color.cc
@@ -17,9 +17,11 @@
02110-1301, USA.  */
 
 #include "config.h"
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "diagnostic-color.h"
 #include "diagnostic-url.h"
+#include "label-text.h"
 
 #ifdef __MINGW32__
 #  define WIN32_LEAN_AND_MEAN
@@ -27,6 +29,7 @@
 #endif
 
 #include "color-macros.h"
+#include "selftest.h"
 
 /* The context and logic for choosing default --color screen attributes
(foreground and background colors, etc.) are the following.
@@ -72,56 +75,124 @@
 counterparts) and possibly bold blue.  */
 /* Default colors. The user can overwrite them using environment
variable GCC_COLORS.  */
-struct color_cap
+struct color_default
 {
-  const char *name;
-  const char *val;
-  unsigned char name_len;
-  bool free_val;
+  const char *m_name;
+  const char *m_val;
 };
 
 /* For GCC_COLORS.  */
-static struct color_cap color_dict[] =
+static const color_default gcc_color_defaults[] =
 {
-  { "error", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_RED), 5, false },
-  { "warning", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_MAGENTA),
-  7, false },
-  { "note", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_CYAN), 4, false },
-  { "range1", SGR_SEQ (COLOR_FG_GREEN), 6, false },
-  { "range2", SGR_SEQ (COLOR_FG_BLUE), 6, false },
-  { "locus", SGR_SEQ (COLOR_BOLD), 5, false },
-  { "quote", SGR_SEQ (COLOR_BOLD), 5, false },
-  { "path", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_CYAN), 4, false },
-  { "fnname", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_GREEN), 6, false },
-  { "targs", SGR_SEQ (COLOR_FG_MAGENTA), 5, false },
-  { "fixit-insert", SGR_SEQ (COLOR_FG_GREEN), 12, false },
-  { "fixit-delete", SGR_SEQ (COLOR_FG_RED), 12, false },
-  { "diff-filename", SGR_SEQ (COLOR_BOLD), 13, false },
-  { "diff-hunk", SGR_SEQ (COLOR_FG_CYAN), 9, false },
-  { "diff-delete", SGR_SEQ (COLOR_FG_RED), 11, false },
-  { "diff-insert", SGR_SEQ (COLOR_FG_GREEN), 11, false },
-  { "type-diff", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_GREEN), 9, false },
-  { "valid", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_GREEN), 5, false },
-  { "invalid", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_RED), 7, false },
-  { NULL, NULL, 0, false }
+  { "error", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_RED) },
+  { "warning", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_MAGENTA) },
+  { "note", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_CYAN) },
+  { "range1", SGR_SEQ (COLOR_FG_GREEN) },
+  { "range2", SGR_SEQ (COLOR_FG_BLUE) },
+  { "locus", SGR_SEQ (COLOR_BOLD) },
+  { "quote", SGR_SEQ (COLOR_BOLD) },
+  { "path", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_CYAN) },
+  { "fnname", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_GREEN) },
+  { "targs", SGR_SEQ (COLOR_FG_MAGENTA) },
+  { "fixit-insert", SGR_SEQ (COLOR_FG_GRE

[pushed 1/3] selftests: split out make_fndecl from selftest.h to its own header

2024-05-28 Thread David Malcolm
Avoid selftest.h requiring the "tree" type.
No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-873-gfb7a943ead689e.

gcc/analyzer/ChangeLog:
* region-model.cc: Include "selftest-tree.h".

gcc/ChangeLog:
* function-tests.cc: Include "selftest-tree.h".
* selftest-tree.h: New file.
* selftest.h (make_fndecl): Move to selftest-tree.h.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model.cc |  1 +
 gcc/function-tests.cc|  1 +
 gcc/selftest-tree.h  | 41 
 gcc/selftest.h   |  7 --
 4 files changed, 43 insertions(+), 7 deletions(-)
 create mode 100644 gcc/selftest-tree.h

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index bebe2ed3cd69..0dd5671db1be 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -42,6 +42,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-color.h"
 #include "bitmap.h"
 #include "selftest.h"
+#include "selftest-tree.h"
 #include "analyzer/analyzer.h"
 #include "analyzer/analyzer-logging.h"
 #include "ordered-hash-map.h"
diff --git a/gcc/function-tests.cc b/gcc/function-tests.cc
index 827734422d88..ea3d722d4b69 100644
--- a/gcc/function-tests.cc
+++ b/gcc/function-tests.cc
@@ -76,6 +76,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-ref.h"
 #include "cgraph.h"
 #include "selftest.h"
+#include "selftest-tree.h"
 #include "print-rtl.h"
 
 #if CHECKING_P
diff --git a/gcc/selftest-tree.h b/gcc/selftest-tree.h
new file mode 100644
index ..9922af3340f2
--- /dev/null
+++ b/gcc/selftest-tree.h
@@ -0,0 +1,41 @@
+/* A self-testing framework, for use by -fself-test.
+   Copyright (C) 2015-2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_SELFTEST_TREE_H
+#define GCC_SELFTEST_TREE_H
+
+/* The selftest code should entirely disappear in a production
+   configuration, hence we guard all of it with #if CHECKING_P.  */
+
+#if CHECKING_P
+
+namespace selftest {
+
+/* Helper function for selftests that need a function decl.  */
+
+extern tree make_fndecl (tree return_type,
+const char *name,
+vec <tree> &param_types,
+bool is_variadic = false);
+
+} /* end of namespace selftest.  */
+
+#endif /* #if CHECKING_P */
+
+#endif /* GCC_SELFTEST_TREE_H */
diff --git a/gcc/selftest.h b/gcc/selftest.h
index 3bddaf1c3228..808d432ec480 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -178,13 +178,6 @@ class line_table_test
   ~line_table_test ();
 };
 
-/* Helper function for selftests that need a function decl.  */
-
-extern tree make_fndecl (tree return_type,
-const char *name,
-vec <tree> &param_types,
-bool is_variadic = false);
-
 /* Run TESTCASE multiple times, once for each case in our test matrix.  */
 
 extern void
-- 
2.26.3



Re: [PATCH 2/4] resource.cc: Replace calls to find_basic_block with cfgrtl BLOCK_FOR_INSN

2024-05-28 Thread Hans-Peter Nilsson
> Date: Mon, 27 May 2024 12:57:53 -0600
> From: Jeff Law 

> > * resource.cc: Include cfgrtl.h.  Use BLOCK_FOR_INSN (insn)->index
> > instead of calling find_basic_block (insn).  Assert for not -1.
> > (find_basic_block): Remove function.
> > (init_resource_info): Call compute_bb_for_insn.
> > (free_resource_info): Call free_bb_for_insn.
> I'm pretty sure that code was part of the overall problem -- namely that 
> we didn't have good basic block info so we resorted to insn scanning.
> 
> Presumably we set BLOCK_FOR_INSN when we generate a wrapper SEQUENCE 
> insn for a filled delay slot?

Yes - one way or the other: most insn chain changes from
reorg are through calls to add_insn_after, which always sets
the bb of the added insn according to the reference insn
(except when either insn is a barrier, then it never sets a
bb); see for example emit_delay_sequence.  Others by
emit_insn_before and emit_copy_of_insn_after.

(Not-so-)fun fact: add_insn_after takes a bb parameter which
reorg.cc always passes as NULL.  But - the argument is
*always ignored* and the bb in the "after" insn is used.
I traced that ignored parameter as far as
r0-81421-g6fb5fa3cbc0d78 "Merge dataflow branch into
mainline" when it was added.  I *guess* it's an artifact
left over from some idea explored on that branch.  Ripe for
obvious cleanup by removal everywhere.

>  Assuming we do create the right mapping 
> for those new insns, then this is OK.

Thanks for the quick review of the whole set!

brgds, H-P


Re: [PATCH v3 #1/2] enable adjustment of return_pc debug attrs

2024-05-28 Thread Segher Boessenkool
On Sat, May 25, 2024 at 09:12:05AM -0300, Alexandre Oliva wrote:


You sent multiple patch series in one thread, and multiple versions of
the same series even.

This is very hard to even *read*, let alone work with.  Please don't.


Segher


Re: [PATCH] Avoid vector -Wfree-nonheap-object warnings

2024-05-28 Thread François Dumont
I can indeed restore _M_initialize_dispatch as it was before. It was not 
fixing my initial problem. I simply kept the code simplification.

    libstdc++: Use RAII to replace try/catch blocks

    Move _Guard into std::vector declaration and use it to guard all calls to
    vector _M_allocate.

    Doing so the compiler has more visibility on what is done with the pointers
    and no longer raises the -Wfree-nonheap-object warning.

    libstdc++-v3/ChangeLog:

    * include/bits/vector.tcc (_Guard): Move all the nested duplicated class...
    * include/bits/stl_vector.h (_Guard_alloc): ...here and rename.
    (_M_allocate_and_copy): Use latter.
    (_M_initialize_dispatch): Small code simplification.
    (_M_range_initialize): Likewise and set _M_finish first from the result
    of __uninitialized_fill_n_a that can throw.

Tested under Linux x86_64.

Ok to commit ?

François

On 28/05/2024 12:30, Jonathan Wakely wrote:

On Mon, 27 May 2024 at 05:37, François Dumont  wrote:

Here is a new version working also in C++98.

Can we use a different solution that doesn't involve an explicit
template argument list for that __uninitialized_fill_n_a call?

-+this->_M_impl._M_finish = std::__uninitialized_fill_n_a
++this->_M_impl._M_finish =
++  std::__uninitialized_fill_n_a
+  (__start, __n, __value, _M_get_Tp_allocator());

Using _M_fill_initialize solves the problem :-)




Note that I have this failure:

FAIL: 23_containers/vector/types/1.cc  -std=gnu++98 (test for excess errors)

but it's already failing on master, my patch does not change anything.

Yes, that's been failing for ages.


Tested under Linux x64,

still ok to commit ?

François

On 24/05/2024 16:17, Jonathan Wakely wrote:

On Thu, 23 May 2024 at 18:38, François Dumont  wrote:

On 23/05/2024 15:31, Jonathan Wakely wrote:

On 23/05/24 06:55 +0200, François Dumont wrote:

As explained in this email:

https://gcc.gnu.org/pipermail/libstdc++/2024-April/058552.html

I ran into -Wfree-nonheap-object warnings because of my enhancements on
algos.

So here is a patch to extend the usage of the _Guard type to other
parts of vector.

Nice, that fixes the warning you were seeing?

Yes ! I indeed forgot to say so :-)



We recently got a bug report about -Wfree-nonheap-object in
std::vector, but that is coming from _M_realloc_append which already
uses the RAII guard :-(
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115016

Note that I also had to move the call to __uninitialized_copy_a before
assigning this->_M_impl._M_start to get rid of the -Wfree-nonheap-object
warning. But _M_realloc_append is already doing potentially throwing
operations before assigning this->_M_impl so it must be something else.

Though it made me notice another occurrence of _Guard in this method. Now
replaced too in this new patch.

   libstdc++: Use RAII to replace try/catch blocks

   Move _Guard into std::vector declaration and use it to guard all calls to
   vector _M_allocate.

   Doing so the compiler has more visibility on what is done with the pointers
   and no longer raises the -Wfree-nonheap-object warning.

   libstdc++-v3/ChangeLog:

   * include/bits/vector.tcc (_Guard): Move all the nested duplicated class...
   * include/bits/stl_vector.h (_Guard_alloc): ...here.
   (_M_allocate_and_copy): Use latter.
   (_M_initialize_dispatch): Likewise and set _M_finish first from the result
   of __uninitialized_fill_n_a that can throw.
   (_M_range_initialize): Likewise.


diff --git a/libstdc++-v3/include/bits/stl_vector.h
b/libstdc++-v3/include/bits/stl_vector.h
index 31169711a48..4ea74e3339a 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -1607,6 +1607,39 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
clear() _GLIBCXX_NOEXCEPT
{ _M_erase_at_end(this->_M_impl._M_start); }

+private:
+  // RAII guard for allocated storage.
+  struct _Guard

If it's being defined at class scope instead of locally in a member
function, I think a better name would be good. Maybe _Ptr_guard or
_Dealloc_guard or something.

_Guard_alloc chosen.

+  {
+pointer _M_storage;// Storage to deallocate
+size_type _M_len;
+_Base& _M_vect;
+
+_GLIBCXX20_CONSTEXPR
+_Guard(pointer __s, size_type __l, _Base& __vect)
+: _M_storage(__s), _M_len(__l), _M_vect(__vect)
+{ }
+
+_GLIBCXX20_CONSTEXPR
+~_Guard()
+{
+  if (_M_storage)
+_M_vect._M_deallocate(_M_storage, _M_len);
+}
+
+_GLIBCXX20_CONSTEXPR
+pointer
+_M_release()
+{
+  pointer __res = _M_storage;
+  _M_storage = 0;

I don't think the NullablePointer requirements include assigning 0,
only from nullptr, which isn't valid in C++98.

https://en.cppreference.com/w/cpp/named_req/NullablePointer

Please use _M_storage = pointer() instea
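
For readers following the review, the guard under discussion can be sketched
roughly as below.  This is a minimal stand-alone illustration, not the
libstdc++ code: alloc_guard, counting_alloc and the two demo functions are
hypothetical names, and the sketch resets the stored pointer with pointer()
as suggested above rather than assigning 0.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical allocator that counts outstanding allocations, so the
// effect of the guard can be observed.
template<typename T>
struct counting_alloc
{
  typedef T value_type;
  int* live;   // number of allocations not yet deallocated

  explicit counting_alloc(int* l) : live(l) { }

  template<typename U>
  counting_alloc(const counting_alloc<U>& o) : live(o.live) { }

  T* allocate(std::size_t n)
  {
    ++*live;
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }

  void deallocate(T* p, std::size_t)
  {
    --*live;
    ::operator delete(p);
  }
};

// Hypothetical stand-in for the class-scope RAII guard: it owns the
// storage until release() is called.
template<typename Alloc>
struct alloc_guard
{
  typedef std::allocator_traits<Alloc> traits;
  typedef typename traits::pointer pointer;

  pointer storage;    // storage to deallocate, or a null pointer value
  std::size_t len;
  Alloc& alloc;

  alloc_guard(pointer s, std::size_t l, Alloc& a)
  : storage(s), len(l), alloc(a) { }

  alloc_guard(const alloc_guard&) = delete;

  ~alloc_guard()
  {
    if (storage)
      traits::deallocate(alloc, storage, len);
  }

  pointer release()
  {
    pointer res = storage;
    storage = pointer();   // value-initialize instead of assigning 0
    return res;
  }
};

// An exception thrown while the guard is alive must not leak the storage.
int live_after_throw()
{
  int live = 0;
  counting_alloc<int> a(&live);
  try
    {
      alloc_guard<counting_alloc<int> > g(a.allocate(4), 4, a);
      throw 1;   // simulates a throwing element constructor
    }
  catch (int)
    { }
  return live;
}

// After release() the guard no longer frees the storage.
int live_after_release()
{
  int live = 0;
  counting_alloc<int> a(&live);
  int* p = 0;
  {
    alloc_guard<counting_alloc<int> > g(a.allocate(4), 4, a);
    p = g.release();
  }
  assert(p != 0);
  int held = live;      // allocation is still outstanding here
  a.deallocate(p, 4);
  return held;
}
```

The point of pointer() is that it works for any allocator pointer type that
meets the NullablePointer requirements, including fancy pointers, without
relying on a conversion from the literal 0.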

[committed] i386: Improve access to _Atomic DImode location via XMM regs for SSE4.1 x86_32 targets

2024-05-28 Thread Uros Bizjak
Use MOVD/PEXTRD and MOVD/PINSRD insn sequences to move DImode value
between XMM and GPR register sets for SSE4.1 x86_32 targets in order
to avoid spilling the value to stack.

The load from _Atomic location a improves from:

    movq    a, %xmm0
    movq    %xmm0, (%esp)
    movl    (%esp), %eax
    movl    4(%esp), %edx

to:
    movq    a, %xmm0
    movd    %xmm0, %eax
    pextrd  $1, %xmm0, %edx

The store to _Atomic location b improves from:

    movl    %eax, (%esp)
    movl    %edx, 4(%esp)
    movq    (%esp), %xmm0
    movq    %xmm0, b

to:
    movd    %eax, %xmm0
    pinsrd  $1, %edx, %xmm0
    movq    %xmm0, b
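
Source of roughly the following shape is the kind of code that would be
expected to exercise the two patterns when built for an x86_32 target with
SSE4.1 enabled.  This is only a sketch: it uses C++ std::atomic rather than
C _Atomic, the function names are illustrative, and the exact options needed
(something like -m32 -msse4.1 -O2) are an assumption, not taken from the
testsuite.

```cpp
#include <atomic>
#include <cassert>

// 64-bit atomic locations, named after the ones in the description.
std::atomic<long long> a;
std::atomic<long long> b;

// 64-bit atomic load: the value ends up in a GPR pair on x86_32.
long long load_a()
{
  return a.load();
}

// 64-bit atomic store: the value starts out in a GPR pair on x86_32.
void store_b(long long v)
{
  b.store(v);
}
```

On other targets the same source simply compiles to whatever 64-bit atomic
access the target provides; the instruction sequences shown above are
specific to the configuration described in the patch.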

gcc/ChangeLog:

* config/i386/sync.md (atomic_loaddi_fpu): Use movd/pextrd
to move DImode value from XMM to GPR for TARGET_SSE4_1.
(atomic_storedi_fpu): Use movd/pinsrd to move DImode value
from GPR to XMM for TARGET_SSE4_1.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index 8317581ebe2..f2b3ba0aa7a 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -215,8 +215,18 @@ (define_insn_and_split "atomic_loaddi_fpu"
}
   else
{
+ rtx tmpdi = gen_lowpart (DImode, tmp);
+
  emit_insn (gen_loaddi_via_sse (tmp, src));
- emit_insn (gen_storedi_via_sse (mem, tmp));
+
+ if (GENERAL_REG_P (dst)
+ && TARGET_SSE4_1 && TARGET_INTER_UNIT_MOVES_FROM_VEC)
+   {
+ emit_move_insn (dst, tmpdi);
+ DONE;
+   }
+ else
+   emit_move_insn (mem, tmpdi);
}
 
   if (mem != dst)
@@ -294,20 +304,30 @@ (define_insn_and_split "atomic_storedi_fpu"
 emit_move_insn (dst, src);
   else
 {
-  if (REG_P (src))
-   {
- emit_move_insn (mem, src);
- src = mem;
-   }
-
   if (STACK_REG_P (tmp))
{
+ if (GENERAL_REG_P (src))
+   {
+ emit_move_insn (mem, src);
+ src = mem;
+   }
+
  emit_insn (gen_loaddi_via_fpu (tmp, src));
  emit_insn (gen_storedi_via_fpu (dst, tmp));
}
   else
{
- emit_insn (gen_loaddi_via_sse (tmp, src));
+ rtx tmpdi = gen_lowpart (DImode, tmp);
+
+ if (GENERAL_REG_P (src)
+ && !(TARGET_SSE4_1 && TARGET_INTER_UNIT_MOVES_TO_VEC))
+   {
+ emit_move_insn (mem, src);
+ src = mem;
+   }
+
+ emit_move_insn (tmpdi, src);
+
  emit_insn (gen_storedi_via_sse (dst, tmp));
}
 }


Re: [RFC/PATCH] Replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE with new hook

2024-05-28 Thread Joseph Myers
On Fri, 24 May 2024, Kewen.Lin wrote:

> Following your suggestion and comments, I made this patch
> for mode_for_floating_type first, considering this touches
> a few FE and port specific code, I think I have to split
> it into a patch series.  Before making that, I'd like to
> ensure this meets what you expected, and also seek for the

The general idea seems reasonable (I haven't reviewed it in detail).  
Note that when removing a target macro, it's a good idea to add it to the 
"Old target macros that have moved to the target hooks structure." list 
(of #pragma GCC poison) in system.h to ensure any new target that was 
originally written before the change doesn't accidentally get into GCC 
while still using the old macros.
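
As a small illustration of the poisoning mechanism mentioned above (a sketch
only: the float_kind enum and float_kind_bits function are hypothetical
stand-ins, not the proposed hook), once an identifier has been poisoned any
later use of it is rejected at compile time:

```cpp
#include <cassert>
#include <climits>

// Poison the old macro names.  Any later appearance of these identifiers
// in this translation unit is a hard error.
#pragma GCC poison FLOAT_TYPE_SIZE DOUBLE_TYPE_SIZE LONG_DOUBLE_TYPE_SIZE

// Hypothetical replacement query, standing in for a target hook.
enum float_kind { FK_FLOAT, FK_DOUBLE, FK_LONG_DOUBLE };

int float_kind_bits(float_kind k)
{
  switch (k)
    {
    case FK_FLOAT:
      return static_cast<int>(sizeof(float) * CHAR_BIT);
    case FK_DOUBLE:
      return static_cast<int>(sizeof(double) * CHAR_BIT);
    default:
      return static_cast<int>(sizeof(long double) * CHAR_BIT);
    }
}
```

A port written before the change that still mentions one of the poisoned
names after this point would fail to build instead of silently using a stale
definition, which is the safety net being asked for.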

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH v9 2/5] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.

2024-05-28 Thread Qing Zhao
Thank you for the comments. See my answers below:

Joseph, please see the last question, I need your help on it. Thanks a lot for 
the help.

Qing

> On May 28, 2024, at 03:38, Richard Biener  wrote:
> 
> On Fri, Apr 12, 2024 at 3:54 PM Qing Zhao  wrote:
>> 
>> Including the following changes:
>> * The definition of the new internal function .ACCESS_WITH_SIZE
>>  in internal-fn.def.
>> * C FE converts every reference to a FAM with a "counted_by" attribute
>>  to a call to the internal function .ACCESS_WITH_SIZE.
>>  (build_component_ref in c_typeck.cc)
>> 
>>  This includes the case when the object is statically allocated and
>>  initialized.
>>  In order to make this work, the routines initializer_constant_valid_p_1
>>  and output_constant in varasm.cc are updated to handle calls to
>>  .ACCESS_WITH_SIZE.
>>  (initializer_constant_valid_p_1 and output_constant in varasm.cc)
>> 
>>  However, for the reference inside "offsetof", the "counted_by" attribute is
>>  ignored since it's not useful at all.
>>  (c_parser_postfix_expression in c/c-parser.cc)
>> 
>>  In addition to "offsetof", for the reference inside operator "typeof" and
>>  "alignof", we ignore counted_by attribute too.
>> 
>>  When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE,
>>  replace the call with its first argument.
>> 
>> * Convert every call to .ACCESS_WITH_SIZE to its first argument.
>>  (expand_ACCESS_WITH_SIZE in internal-fn.cc)
>> * Adjust alias analysis to exclude the new internal from clobbering anything.
>>  (ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in tree-ssa-alias.cc)
>> * Adjust dead code elimination to eliminate the call to .ACCESS_WITH_SIZE when
>>  its LHS is eliminated as dead code.
>>  (eliminate_unnecessary_stmts in tree-ssa-dce.cc)
>> * Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
>>  get the reference from the call to .ACCESS_WITH_SIZE.
>>  (is_access_with_size_p and get_ref_from_access_with_size in tree.cc)
>> 
>> gcc/c/ChangeLog:
>> 
>>* c-parser.cc (c_parser_postfix_expression): Ignore the counted-by
>>attribute when build_component_ref inside offsetof operator.
>>* c-tree.h (build_component_ref): Add one more parameter.
>>* c-typeck.cc (build_counted_by_ref): New function.
>>(build_access_with_size_for_counted_by): New function.
>>(build_component_ref): Check the counted-by attribute and build
>>call to .ACCESS_WITH_SIZE.
>>(build_unary_op): When building ADDR_EXPR for
>>.ACCESS_WITH_SIZE, use its first argument.
>>(lvalue_p): Accept call to .ACCESS_WITH_SIZE.
>> 
>> gcc/ChangeLog:
>> 
>>* internal-fn.cc (expand_ACCESS_WITH_SIZE): New function.
>>* internal-fn.def (ACCESS_WITH_SIZE): New internal function.
>>* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Special case
>>IFN_ACCESS_WITH_SIZE.
>>(call_may_clobber_ref_p_1): Special case IFN_ACCESS_WITH_SIZE.
>>* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Eliminate the call
>>to .ACCESS_WITH_SIZE when its LHS is dead.
>>* tree.cc (process_call_operands): Adjust side effect for function
>>.ACCESS_WITH_SIZE.
>>(is_access_with_size_p): New function.
>>(get_ref_from_access_with_size): New function.
>>* tree.h (is_access_with_size_p): New prototype.
>>(get_ref_from_access_with_size): New prototype.
>>* varasm.cc (initializer_constant_valid_p_1): Handle call to
>>.ACCESS_WITH_SIZE.
>>(output_constant): Handle call to .ACCESS_WITH_SIZE.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>* gcc.dg/flex-array-counted-by-2.c: New test.
>> ---
>> gcc/c/c-parser.cc |  10 +-
>> gcc/c/c-tree.h|   2 +-
>> gcc/c/c-typeck.cc | 128 +-
>> gcc/internal-fn.cc|  35 +
>> gcc/internal-fn.def   |   4 +
>> .../gcc.dg/flex-array-counted-by-2.c  | 112 +++
>> gcc/tree-ssa-alias.cc |   2 +
>> gcc/tree-ssa-dce.cc   |   5 +-
>> gcc/tree.cc   |  25 +++-
>> gcc/tree.h|   8 ++
>> gcc/varasm.cc |  10 ++
>> 11 files changed, 331 insertions(+), 10 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
>> 
>> diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
>> index c31349dae2ff..a6ed5ac43bb1 100644
>> --- a/gcc/c/c-parser.cc
>> +++ b/gcc/c/c-parser.cc
>> @@ -10850,9 +10850,12 @@ c_parser_postfix_expression (c_parser *parser)
>>if (c_parser_next_token_is (parser, CPP_NAME))
>>  {
>>c_token *comp_tok = c_parser_peek_token (parser);
>> +   /* Ignore the counted_by attribute for reference inside
>> +  offsetof since the information

Re: [PATCH v9 3/5] Use the .ACCESS_WITH_SIZE in builtin object size.

2024-05-28 Thread Qing Zhao


> On May 28, 2024, at 03:39, Richard Biener  wrote:
> 
> On Fri, Apr 12, 2024 at 3:54 PM Qing Zhao  wrote:
>> 
> 
> I have no comments here, if Siddhesh is OK with this I approve.

thanks.

Qing
> 
>> gcc/ChangeLog:
>> 
>>* tree-object-size.cc (access_with_size_object_size): New function.
>>(call_object_size): Call the new function.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>* gcc.dg/builtin-object-size-common.h: Add a new macro EXPECT.
>>* gcc.dg/flex-array-counted-by-3.c: New test.
>>* gcc.dg/flex-array-counted-by-4.c: New test.
>>* gcc.dg/flex-array-counted-by-5.c: New test.
>> ---
>> .../gcc.dg/builtin-object-size-common.h   |  11 ++
>> .../gcc.dg/flex-array-counted-by-3.c  |  63 +++
>> .../gcc.dg/flex-array-counted-by-4.c  | 178 ++
>> .../gcc.dg/flex-array-counted-by-5.c  |  48 +
>> gcc/tree-object-size.cc   |  60 ++
>> 5 files changed, 360 insertions(+)
>> create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
>> create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
>> create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-5.c
>> 
>> diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-common.h 
>> b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
>> index 66ff7cdd953a..b677067c6e6b 100644
>> --- a/gcc/testsuite/gcc.dg/builtin-object-size-common.h
>> +++ b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
>> @@ -30,3 +30,14 @@ unsigned nfails = 0;
>>   __builtin_abort (); \
>> return 0; \
>>   } while (0)
>> +
>> +#define EXPECT(p, _v) do { \
>> +  size_t v = _v; \
>> +  if (p == v) \
>> +    __builtin_printf ("ok:  %s == %zd\n", #p, p); \
>> +  else \
>> +    { \
>> +      __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v); \
>> +      FAIL (); \
>> +    } \
>> +} while (0);
>> diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c 
>> b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
>> new file mode 100644
>> index ..78f50230e891
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
>> @@ -0,0 +1,63 @@
>> +/* Test the attribute counted_by and its usage in
>> + * __builtin_dynamic_object_size.  */
>> +/* { dg-do run } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "builtin-object-size-common.h"
>> +
>> +struct flex {
>> +  int b;
>> +  int c[];
>> +} *array_flex;
>> +
>> +struct annotated {
>> +  int b;
>> +  int c[] __attribute__ ((counted_by (b)));
>> +} *array_annotated;
>> +
>> +struct nested_annotated {
>> +  struct {
>> +union {
>> +  int b;
>> +  float f;
>> +};
>> +int n;
>> +  };
>> +  int c[] __attribute__ ((counted_by (b)));
>> +} *array_nested_annotated;
>> +
>> +void __attribute__((__noinline__)) setup (int normal_count, int attr_count)
>> +{
>> +  array_flex
>> += (struct flex *)malloc (sizeof (struct flex)
>> ++ normal_count *  sizeof (int));
>> +  array_flex->b = normal_count;
>> +
>> +  array_annotated
>> += (struct annotated *)malloc (sizeof (struct annotated)
>> + + attr_count *  sizeof (int));
>> +  array_annotated->b = attr_count;
>> +
>> +  array_nested_annotated
>> += (struct nested_annotated *)malloc (sizeof (struct nested_annotated)
>> ++ attr_count *  sizeof (int));
>> +  array_nested_annotated->b = attr_count;
>> +
>> +  return;
>> +}
>> +
>> +void __attribute__((__noinline__)) test ()
>> +{
>> +EXPECT(__builtin_dynamic_object_size(array_flex->c, 1), -1);
>> +EXPECT(__builtin_dynamic_object_size(array_annotated->c, 1),
>> +  array_annotated->b * sizeof (int));
>> +EXPECT(__builtin_dynamic_object_size(array_nested_annotated->c, 1),
>> +  array_nested_annotated->b * sizeof (int));
>> +}
>> +
>> +int main(int argc, char *argv[])
>> +{
>> +  setup (10,10);
>> +  test ();
>> +  DONE ();
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c 
>> b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
>> new file mode 100644
>> index ..20103d58ef51
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
>> @@ -0,0 +1,178 @@
>> +/* Test the attribute counted_by and its usage in
>> +__builtin_dynamic_object_size: what's the correct behavior when the
>> 

Re: [PATCH v9 5/5] Add the 6th argument to .ACCESS_WITH_SIZE

2024-05-28 Thread Qing Zhao


> On May 28, 2024, at 03:43, Richard Biener  wrote:
> 
> On Fri, Apr 12, 2024 at 3:55 PM Qing Zhao  wrote:
>> 
>> to carry the TYPE of the flexible array.
>> 
>> Such information is needed during tree-object-size.cc.
>> 
>> We cannot use the result type or the type of the 1st argument
>> of the routine .ACCESS_WITH_SIZE to decide the element type
>> of the original array due to possible type casting in the
>> source code.
> 
> OK.  I guess technically an empty CONSTRUCTOR of the array type
> would work as well (as aggregate it's fine to have it in the call) but a
> constant zero pointer might be cheaper to have as it's shared across
> multiple calls.

So, I consider this as an approval? -:)

thanks.

Qing
> 
> Richard.
> 
>> gcc/c/ChangeLog:
>> 
>>* c-typeck.cc (build_access_with_size_for_counted_by): Add the 6th
>>argument to .ACCESS_WITH_SIZE.
>> 
>> gcc/ChangeLog:
>> 
>>* tree-object-size.cc (access_with_size_object_size): Use the type
>>of the 6th argument for the type of the element.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>* gcc.dg/flex-array-counted-by-6.c: New test.
>> ---
>> gcc/c/c-typeck.cc | 11 +++--
>> gcc/internal-fn.cc|  2 +
>> .../gcc.dg/flex-array-counted-by-6.c  | 46 +++
>> gcc/tree-object-size.cc   | 16 ---
>> 4 files changed, 66 insertions(+), 9 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-6.c
>> 
>> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
>> index ff6685c6c4ba..0ea3b75355a4 100644
>> --- a/gcc/c/c-typeck.cc
>> +++ b/gcc/c/c-typeck.cc
>> @@ -2640,7 +2640,8 @@ build_counted_by_ref (tree datum, tree subdatum, tree 
>> *counted_by_type)
>> 
>>to:
>> 
>> -   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1))
>> +   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1,
>> +   (TYPE_OF_ARRAY *)0))
>> 
>>NOTE: The return type of this function is the POINTER type pointing
>>to the original flexible array type.
>> @@ -2652,6 +2653,9 @@ build_counted_by_ref (tree datum, tree subdatum, tree 
>> *counted_by_type)
>>The 4th argument of the call is a constant 0 with the TYPE of the
>>object pointed by COUNTED_BY_REF.
>> 
>> +   The 6th argument of the call is a constant 0 with the pointer TYPE
>> +   to the original flexible array type.
>> +
>>   */
>> static tree
>> build_access_with_size_for_counted_by (location_t loc, tree ref,
>> @@ -2664,12 +2668,13 @@ build_access_with_size_for_counted_by (location_t 
>> loc, tree ref,
>> 
>>   tree call
>> = build_call_expr_internal_loc (loc, IFN_ACCESS_WITH_SIZE,
>> -   result_type, 5,
>> +   result_type, 6,
>>array_to_pointer_conversion (loc, ref),
>>counted_by_ref,
>>build_int_cst (integer_type_node, 1),
>>build_int_cst (counted_by_type, 0),
>> -   build_int_cst (integer_type_node, -1));
>> +   build_int_cst (integer_type_node, -1),
>> +   build_int_cst (result_type, 0));
>>   /* Wrap the call with an INDIRECT_REF with the flexible array type.  */
>>   call = build1 (INDIRECT_REF, TREE_TYPE (ref), call);
>>   SET_EXPR_LOCATION (call, loc);
>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
>> index e744080ee670..34e4a4aea534 100644
>> --- a/gcc/internal-fn.cc
>> +++ b/gcc/internal-fn.cc
>> @@ -3411,6 +3411,8 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>>  1: read_only
>>  2: write_only
>>  3: read_write
>> +   6th argument: A constant 0 with the pointer TYPE to the original flexible
>> + array type.
>> 
>>Both the return type and the type of the first argument of this
>>function have been converted from the incomplete array type to
>> diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c 
>> b/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c
>> new file mode 100644
>> index ..65fa01443d95
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c
>> @@ -0,0 +1,46 @@
>> +/* Test the attribute counted_by and its usage in
>> + * __builtin_dynamic_object_size: when the type of the flexible array member
>> + * is casting to another type.  */
>> +/* { dg-do run } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "builtin-object-size-common.h"
>> +
>> +typedef unsigned short u16;
>> +
>> +struct info {
>> +   u16 data_len;
>> +   char data[] __attribute__((counted_by(data_len)));
>> +};
>> +
>> +struct foo {
>> +   int a;
>> +   int b;
>> +};
>> +
>> +static __attribute__((__noinline__))
>> +struct info *setup ()
>> +{
>> + struct info *p;
>> + size_t bytes = 3 * sizeof(struct foo);
>> +
>> + p = (struct info *)malloc (sizeof (st

Re: [PATCH v3 #1/2] [rs6000] adjust return_pc debug attrs

2024-05-28 Thread Segher Boessenkool
On Sat, May 25, 2024 at 09:13:12AM -0300, Alexandre Oliva wrote:
> Some of the rs6000 call patterns, on some ABIs, issue multiple opcodes
> out of a single call insn, but the call (bl) or jump (b) is not always
> the last opcode in the sequence.

> This does not seem to be a problem for exception handling tables, but
> the return_pc attribute in the call graph output in dwarf2+ debug
> information, that takes the address of a label output right after the
> call, does not match the value of the link register even for non-tail
> calls.  E.g., with ABI_AIX or ABI_ELFv2, such code as:
> 
>   foo ();
> 
> outputs:
> 
>   bl foo
>   nop
>  LVL#:
> [...]
>   .8byte .LVL#  # DW_AT_call_return_pc
> 
> but debug info consumers may rely on the return_pc address, and draw
> incorrect conclusions from its off-by-4 value.
> 
> This patch uses the infrastructure for targets to add an offset to the
> label issued after the call_insn to set the call_return_pc attribute,
> on rs6000, to account for opcodes issued after actual call opcode as
> part of call insns output patterns.

> for  gcc/ChangeLog
> 
>   * config/rs6000/rs6000.cc (TARGET_CALL_OFFSET_RETURN_LABEL):
>   Override.

Please don't (incorrectly!) line-wrap changelogs.  Lines are 80
characters wide, not 60 or 72 or whatever.  80.  Indents are tabs that
take 8 columns.

> +/* Return the offset to be added to the label output after CALL_INSN
> +   to compute the address to be placed in DW_AT_call_return_pc.  */
> +
> +static int
> +rs6000_call_offset_return_label (rtx_insn *call_insn)
> +{
> +  /* All rs6000 CALL_INSN output patterns start with a b or bl, always

This isn't true.  There is a crlogical insn before the bcl for sysv for
example.

> + a 4-byte instruction, but some output patterns issue other
> + opcodes afterwards.  The return label is issued after the entire
> + call insn, including any such post-call opcodes.  Instead of
> + figuring out which cases need adjustments, we compute the offset
> + back to the address of the call opcode proper, then add the
> + constant 4 bytes, to get the address after that opcode.  */
> +  return 4 - get_attr_length (call_insn);

Please explain this magic, too -- in code preferably (so with a ? :
maybe, but don't try to "optimise" that expression, let the compiler
do that, it is much better at it anyway :-) )

> +}

Is that correct for all ABIs we support?  Even if so, it needs a lot
more documentation than this.
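
One possible way to spell the computation out, as a sketch only:
call_return_offset is a hypothetical free function taking the insn length in
bytes, and it keeps the patch's assumption that the call opcode is the first
4-byte instruction of the pattern, which, as noted above, does not hold for
every pattern.

```cpp
#include <cassert>

// Offset to add to the label emitted after the whole call insn so that it
// points just past the call opcode itself.  Assumes the call opcode comes
// first and is 4 bytes long.
int call_return_offset(int insn_length)
{
  const int call_opcode_bytes = 4;
  assert(insn_length >= call_opcode_bytes && insn_length % 4 == 0);

  // Bytes emitted after the call opcode (for example a trailing nop).
  int trailing_bytes = insn_length - call_opcode_bytes;

  return trailing_bytes > 0 ? -trailing_bytes : 0;
}
```

For the "bl foo; nop" sequence quoted earlier the insn length is 8, so the
label has to be moved back by 4 bytes; for a bare 4-byte call no adjustment
is made.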


Segher


Re: [PATCH v2 1/2] driver: Use <triple>-as/ld/objcopy as final fallback instead of native ones for cross

2024-05-28 Thread Richard Sandiford
YunQiang Su  writes:
> If `find_a_program` cannot find `as/ld/objcopy` and we are a cross toolchain,
> the final fallback is `as/ld` of system.  In fact, we can have a try with
> <triple>-as/ld/objcopy before fallback to native as/ld/objcopy.
>
> This patch is derived from Debian's patch:
>   gcc-search-prefixed-as-ld.diff

I'm probably making you repeat a previous discussion, sorry, but could
you describe the use case in more detail?  The current approach to
handling cross toolchains has been used for many years.  Presumably
this patch is supporting a different way of organising things,
but I wasn't sure from the description what it was.

AIUI, we currently assume that cross as, ld and objcopy will be
installed under those names in $prefix/$target_alias/bin (aka $tooldir/bin).
E.g.:

   bin/aarch64-elf-as = aarch64-elf/bin/as

GCC should then find as in aarch64-elf/bin.

Is that not true in your case?

To be clear, I'm not saying the patch is wrong.  I'm just trying to
understand why the patch is needed.

Thanks,
Richard

>
> gcc
>   * gcc.cc(execute): Looks for <triple>-as/ld/objcopy before fallback
>   to native as/ld/objcopy.
> ---
>  gcc/gcc.cc | 20 
>  1 file changed, 20 insertions(+)
>
> diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> index 830a4700a87..3dc6348d761 100644
> --- a/gcc/gcc.cc
> +++ b/gcc/gcc.cc
> @@ -3293,6 +3293,26 @@ execute (void)
>string = find_a_program(commands[0].prog);
>if (string)
>   commands[0].argv[0] = string;
> +  else if (*cross_compile != '0'
> + && !strcmp (commands[0].argv[0], commands[0].prog)
> + && (!strcmp (commands[0].prog, "as")
> + || !strcmp (commands[0].prog, "ld")
> + || !strcmp (commands[0].prog, "objcopy")))
> + {
> +   string = concat (DEFAULT_REAL_TARGET_MACHINE, "-",
> + commands[0].prog, NULL);
> +   const char *string_args[] = {string, "--version", NULL};
> +   int exit_status = 0;
> +   int err = 0;
> +   const char *errmsg = pex_one (PEX_SEARCH, string,
> +   CONST_CAST (char **, string_args), string,
> +   NULL, NULL, &exit_status, &err);
> +   if (errmsg == NULL && exit_status == 0 && err == 0)
> + {
> +   commands[0].argv[0] = string;
> +   commands[0].prog = string;
> + }
> + }
>  }
>  
>for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)


Re: [C PATCH, v2]: allow aliasing of compatible types derived from enumeral types [PR115157]

2024-05-28 Thread Joseph Myers
On Fri, 24 May 2024, Martin Uecker wrote:

> This is another version of this patch with two changes:
> 
> - I added a fix (with test) for PR 115177 which is just the same
> issue for hardbools which are internally implemented as enums.
> 
> - I fixed the golang issue. Since the addition of the main variant
> to the seen decls is unconditional I removed also the addition
> of the type itself which now seems unnecessary.
> 
> Bootstrapped and regression tested on x86_64.

The front-end changes and the testcases are OK.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [C23 PATCH, v2] fix aliasing for structures/unions with incomplete types

2024-05-28 Thread Joseph Myers
On Sun, 26 May 2024, Martin Uecker wrote:

> This is the patch I sent previously, but I tried to improve the
> description and added a long comment.  This patch is needed so
> that we do not have to update TYPE_CANONICAL of structures / unions
> when a tagged type is completed that is (recursively) pointed to 
> by a member of the structure / union.
> 
> Bootstrapped and regression tested on x86_64.
> 
> 
> C23: fix aliasing for structures/unions with incomplete types
> 
> When incomplete structure/union types are completed later, compatibility
> of struct types that contain pointers to such types changes.  When forming
> equivalence classes for TYPE_CANONICAL, we therefore need to be conservative
> and treat all structs with the same tag which are pointer targets as
> equivalent for purposes of determining equivalency of structure/union
> types which contain such types as member. This avoids having to update
> TYPE_CANONICAL of such structure/unions recursively. The pointer types
> themselves are updated in c_update_type_canonical.
> 
> gcc/c/
> * c-typeck.cc (comptypes_internal): Add flag to track
> whether a struct is the target of a pointer.
> (tagged_types_tu_compatible): When forming equivalence
> classes, treat nested pointed-to structs as equivalent.
> 
> gcc/testsuite/
> * gcc.dg/c23-tag-incomplete-alias-1.c: New test.

This patch is OK.

-- 
Joseph S. Myers
josmy...@redhat.com

Re: [C23 PATCH]: allow aliasing for types derived from structs with variable size

2024-05-28 Thread Joseph Myers
On Sun, 26 May 2024, Martin Uecker wrote:

> +/* Helper function for comptypes.  For two compatible types, return 1
> +   if they pass consistency checks.  In particular we test that
> +   TYPE_CANONICAL ist set correctly, i.e. the two types can alias.  */

s/ist/is/.  OK with that fix.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH v3] tree-ssa-pre.c/115214(ICE in find_or_generate_expression, at tree-ssa-pre.c:2780): Return NULL_TREE when deal special cases.

2024-05-28 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, May 27, 2024 at 9:48 AM Jiawei  wrote:
>>
>> Return NULL_TREE when genop3 equal EXACT_DIV_EXPR.
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652641.html
>>
>> version log v3: remove additional POLY_INT_CST check.
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652795.html
>
> OK.
>
> Thanks,
> Richard.
>
>> gcc/ChangeLog:
>>
>> * tree-ssa-pre.cc (create_component_ref_by_pieces_1): New conditions.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/rvv/vsetvl/pr115214.c: New test.
>>
>> ---
>>  .../gcc.target/riscv/rvv/vsetvl/pr115214.c| 52 +++
>>  gcc/tree-ssa-pre.cc   | 10 ++--
>>  2 files changed, 59 insertions(+), 3 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c
>>
>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c 
>> b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c
>> new file mode 100644
>> index 000..fce2e9da766
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c
>> @@ -0,0 +1,52 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-mrvv-vector-bits=scalable -march=rv64gcv -mabi=lp64d -O3 
>> -w" } */
>> +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
>> +
>> +#include <riscv_vector.h>
>> +
>> +static inline __attribute__(()) int vaddq_f32();
>> +static inline __attribute__(()) int vload_tillz_f32(int nlane) {
>> +  vint32m1_t __trans_tmp_9;
>> +  {
>> +int __trans_tmp_0 = nlane;
>> +{
>> +  vint64m1_t __trans_tmp_1;
>> +  vint64m1_t __trans_tmp_2;
>> +  vint64m1_t __trans_tmp_3;
>> +  vint64m1_t __trans_tmp_4;
>> +  if (__trans_tmp_0 == 1) {
>> +{
>> +  __trans_tmp_3 =
>> +  __riscv_vslideup_vx_i64m1(__trans_tmp_1, __trans_tmp_2, 1, 2);
>> +}
>> +__trans_tmp_4 = __trans_tmp_2;
>> +  }
>> +  __trans_tmp_4 = __trans_tmp_3;
>> +  __trans_tmp_9 = __riscv_vreinterpret_v_i64m1_i32m1(__trans_tmp_3);
>> +}
>> +  }
>> +  return vaddq_f32(__trans_tmp_9); /* { dg-error {RVV type 'vint32m1_t' 
>> cannot be passed to an unprototyped function} } */
>> +}
>> +
>> +char CFLOAT_add_args[3];
>> +const int *CFLOAT_add_steps;
>> +const int CFLOAT_steps;
>> +
>> +__attribute__(()) void CFLOAT_add() {
>> +  char *b_src0 = &CFLOAT_add_args[0], *b_src1 = &CFLOAT_add_args[1],
>> +   *b_dst = &CFLOAT_add_args[2];
>> +  const float *src1 = (float *)b_src1;
>> +  float *dst = (float *)b_dst;
>> +  const int ssrc1 = CFLOAT_add_steps[1] / sizeof(float);
>> +  const int sdst = CFLOAT_add_steps[2] / sizeof(float);
>> +  const int hstep = 4 / 2;
>> +  vfloat32m1x2_t a;
>> +  int len = 255;
>> +  for (; len > 0; len -= hstep, src1 += 4, dst += 4) {
>> +int b = vload_tillz_f32(len);
>> +int r = vaddq_f32(a.__val[0], b); /* { dg-error {RVV type 
>> '__rvv_float32m1_t' cannot be passed to an unprototyped function} } */
>> +  }
>> +  for (; len > 0; --len, b_src0 += CFLOAT_steps,
>> +  b_src1 += CFLOAT_add_steps[1], b_dst += 
>> CFLOAT_add_steps[2])
>> +;
>> +}
>> diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc
>> index 75217f5cde1..5cf1968bc26 100644
>> --- a/gcc/tree-ssa-pre.cc
>> +++ b/gcc/tree-ssa-pre.cc
>> @@ -2685,11 +2685,15 @@ create_component_ref_by_pieces_1 (basic_block block, 
>> vn_reference_t ref,
>>here as the element alignment may be not visible.  See
>>PR43783.  Simply drop the element size for constant
>>sizes.  */
>> -   if (TREE_CODE (genop3) == INTEGER_CST
>> +   if ((TREE_CODE (genop3) == INTEGER_CST
>> && TREE_CODE (TYPE_SIZE_UNIT (elmt_type)) == INTEGER_CST
>> && wi::eq_p (wi::to_offset (TYPE_SIZE_UNIT (elmt_type)),
>> -(wi::to_offset (genop3)
>> - * vn_ref_op_align_unit (currop
>> +(wi::to_offset (genop3) * vn_ref_op_align_unit 
>> (currop

Sorry for the nits, but the original formatting was correct here.
The new one instead goes over 80 columns.

>> + || (TREE_CODE (genop3) == EXACT_DIV_EXPR
>> +   && TREE_CODE (TREE_OPERAND (genop3, 1)) == INTEGER_CST
>> +   && operand_equal_p (TREE_OPERAND (genop3, 0), TYPE_SIZE_UNIT 
>> (elmt_type))

Similarly this line is too long.

Thanks for fixing this.

Richard

>> +   && wi::eq_p (wi::to_offset (TREE_OPERAND (genop3, 1)),
>> +vn_ref_op_align_unit (currop
>>   genop3 = NULL_TREE;
>> else
>>   {
>> --
>> 2.25.1
>>


Re: [PATCHv3] Optab: add isfinite_optab for __builtin_isfinite

2024-05-28 Thread Richard Sandiford
HAO CHEN GUI  writes:
> Hi,
>   This patch adds an optab for __builtin_isfinite. The finite check can be
> implemented on rs6000 by a single instruction. It needs an optab to be
> expanded to the certain sequence of instructions.
>
>   The subsequent patches will implement the expand on rs6000.
>
>   Compared to previous version, the main change is to specify acceptable
> modes for the optab.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652170.html
>
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?
>
> Thanks
> Gui Haochen
>
> ChangeLog
> optab: Add isfinite_optab for isfinite builtin
>
> gcc/
>   * builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
>   for isfinite builtin.
>   * optabs.def (isfinite_optab): New.
>   * doc/md.texi (isfinite): Document.
>
>
> patch.diff
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index f8d94c4b435..b8432f84020 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>errno_set = true; builtin_optab = ilogb_optab; break;
>  CASE_FLT_FN (BUILT_IN_ISINF):
>builtin_optab = isinf_optab; break;
> -case BUILT_IN_ISNORMAL:
>  case BUILT_IN_ISFINITE:
> +  builtin_optab = isfinite_optab; break;
> +case BUILT_IN_ISNORMAL:
>  CASE_FLT_FN (BUILT_IN_FINITE):
>  case BUILT_IN_FINITED32:
>  case BUILT_IN_FINITED64:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 5730bda80dc..67407fad37d 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -8557,6 +8557,15 @@ operand 2, greater than operand 2 or is unordered with 
> operand 2.
>
>  This pattern is not allowed to @code{FAIL}.
>
> +@cindex @code{isfinite@var{m}2} instruction pattern
> +@item @samp{isfinite@var{m}2}
> +Set operand 0 to nonzero if operand 1 is a finite @code{SFmode},
> +@code{DFmode}, or @code{TFmode} floating point number and to 0
> +otherwise.

This has probably already been discussed, sorry, but how about defining
the optab to return a strict 0/1 result, rather than just zero/nonzero?
I realise that's stricter than the underlying math.h routines, but it
would in principle avoid the need to expand extra instructions in
a setcc-like operation.

Richard

> +
> +If this pattern @code{FAIL}, a call to the library function
> +@code{isfinite} is used.
> +
>  @end table
>
>  @end ifset
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..dcd77315c2a 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
>  OPTAB_D (hypot_optab, "hypot$a3")
>  OPTAB_D (ilogb_optab, "ilogb$a2")
>  OPTAB_D (isinf_optab, "isinf$a2")
> +OPTAB_D (isfinite_optab, "isfinite$a2")
>  OPTAB_D (issignaling_optab, "issignaling$a2")
>  OPTAB_D (ldexp_optab, "ldexp$a3")
>  OPTAB_D (log10_optab, "log10$a2")


[COMMITTED] More tweaks from gimple_outgoing_range changes.

2024-05-28 Thread Andrew MacLeod
The dom_ranger class used for fast vrp no longer needs its own local 
gimple_outgoing_range object as it is now always available from the 
range_query parent class.


The builtin_unreachable code for adjusting globals and removing the 
builtin calls during the final VRP pass can now function with just a 
range_query object rather than a specific ranger.   This adjusts it to 
use the extra methods in the base range_query API. This will now allow 
removal of builtin_unreachable calls even if there is no active ranger 
with dependency info available.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From 05ff069ba937dc3970f2a757e426935fcf4c15fb Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 22 May 2024 19:27:01 -0400
Subject: [PATCH 3/5] More tweaks from gimple_outgoing_range changes.

The dom_ranger used for fast vrp no longer needs a local
gimple_outgoing_range object as it is now always available from the
range_query parent class.

The builtin_unreachable code for adjusting globals and removing the
builtin calls during the final VRP pass can now function with just
a range_query object rather than a specific ranger.   This adjusts it to
use the extra methods in the range_query API.
This will now allow removal of builtin_unreachable calls even if there is no
active ranger with dependency info available.

	* gimple-range.cc (dom_ranger::dom_ranger): Do not initialize m_out.
	(dom_ranger::maybe_push_edge): Use gori () rather than m_out.
	* gimple-range.h (dom_ranger::m_out): Remove.
	* tree-vrp.cc (remove_unreachable::remove_unreachable): Use a
	range_query rather than a gimple_ranger.
	(remove_unreachable::remove): New.
	(remove_unreachable::m_ranger): Change to a range_query.
	(remove_unreachable::handle_early): If there is no dependency
	information, do nothing.
	(remove_unreachable::remove_and_update_globals): Do not update
	globals if there is no dependency info to use.
---
 gcc/gimple-range.cc |  4 ++--
 gcc/gimple-range.h  |  1 -
 gcc/tree-vrp.cc | 47 +++--
 3 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 0749c9fa215..711646abb67 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -922,7 +922,7 @@ assume_query::dump (FILE *f)
 
 // Create a DOM based ranger for use by a DOM walk pass.
 
-dom_ranger::dom_ranger () : m_global (), m_out ()
+dom_ranger::dom_ranger () : m_global ()
 {
   m_freelist.create (0);
   m_freelist.truncate (0);
@@ -1156,7 +1156,7 @@ dom_ranger::maybe_push_edge (edge e, bool edge_0)
 e_cache = m_freelist.pop ();
   else
 e_cache = new ssa_lazy_cache;
-  gori_on_edge (*e_cache, e, this, &m_out);
+  gori_on_edge (*e_cache, e, this, &gori ());
   if (e_cache->empty_p ())
 m_freelist.safe_push (e_cache);
   else
diff --git a/gcc/gimple-range.h b/gcc/gimple-range.h
index 1532951a449..180090bed15 100644
--- a/gcc/gimple-range.h
+++ b/gcc/gimple-range.h
@@ -121,7 +121,6 @@ protected:
   DISABLE_COPY_AND_ASSIGN (dom_ranger);
   void maybe_push_edge (edge e, bool edge_0);
   ssa_cache m_global;
-  gimple_outgoing_range m_out;
   vec m_freelist;
   vec m_e0;
   vec m_e1;
diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc
index 7d7f9fe2932..1c7b451d8fb 100644
--- a/gcc/tree-vrp.cc
+++ b/gcc/tree-vrp.cc
@@ -85,14 +85,15 @@ along with GCC; see the file COPYING3.  If not see
 
 class remove_unreachable {
 public:
-  remove_unreachable (gimple_ranger &r, bool all) : m_ranger (r), final_p (all)
+  remove_unreachable (range_query &r, bool all) : m_ranger (r), final_p (all)
 { m_list.create (30); }
   ~remove_unreachable () { m_list.release (); }
   void handle_early (gimple *s, edge e);
   void maybe_register (gimple *s);
+  bool remove ();
   bool remove_and_update_globals ();
   vec > m_list;
-  gimple_ranger &m_ranger;
+  range_query &m_ranger;
   bool final_p;
 };
 
@@ -195,6 +196,9 @@ fully_replaceable (tree name, basic_block bb)
 void
 remove_unreachable::handle_early (gimple *s, edge e)
 {
+  // If there is no gori_ssa, there is no early processsing.
+  if (!m_ranger.gori_ssa ())
+return ;
   bool lhs_p = TREE_CODE (gimple_cond_lhs (s)) == SSA_NAME;
   bool rhs_p = TREE_CODE (gimple_cond_rhs (s)) == SSA_NAME;
   // Do not remove __builtin_unreachable if it confers a relation, or
@@ -253,6 +257,41 @@ remove_unreachable::handle_early (gimple *s, edge e)
 }
 }
 
+// Process the edges in the list, change the conditions and removing any
+// dead code feeding those conditions.   This removes the unreachables, but
+// makes no attempt to set globals values.
+
+bool
+remove_unreachable::remove ()
+{
+  if (!final_p || m_list.length () == 0)
+return false;
+
+  bool change = false;
+  unsigned i;
+  for (i = 0; i < m_list.length (); i++)
+{
+  auto eb = m_list[i];
+  basic_block src = BASIC_BLOCK_FOR_FN (cfun, eb.first);
+  basic_block dest = BASIC_BLOCK_FOR_FN (cfun, eb.second);
+  if (!src || !dest)
+	con

[PATCH 0/2] RISC-V: Add -m(no-)autovec-segment option

2024-05-28 Thread Patrick O'Neill
Rebased and combined these two patches into a series for precommit-CI to
properly test.

Edwin Lu (1):
  RISC-V: Fix testcases renamed test flag options

Greg McGary (1):
  RISC-V: add option -m(no-)autovec-segment

 gcc/config/riscv/autovec.md   |  4 +-
 gcc/config/riscv/riscv-opts.h |  5 ++
 gcc/config/riscv/riscv.opt|  4 ++
 .../gcc.target/riscv/rvv/autovec/no-segment.c | 61 +++
 .../autovec/struct/mask_struct_load_noseg-1.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-2.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-3.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-4.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-5.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-6.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-7.c |  6 ++
 .../struct/mask_struct_load_noseg_run-1.c |  4 ++
 .../struct/mask_struct_load_noseg_run-2.c |  4 ++
 .../struct/mask_struct_load_noseg_run-3.c |  4 ++
 .../struct/mask_struct_load_noseg_run-4.c |  4 ++
 .../struct/mask_struct_load_noseg_run-5.c |  4 ++
 .../struct/mask_struct_load_noseg_run-6.c |  4 ++
 .../struct/mask_struct_load_noseg_run-7.c |  4 ++
 .../struct/mask_struct_store_noseg-1.c|  6 ++
 .../struct/mask_struct_store_noseg-2.c|  6 ++
 .../struct/mask_struct_store_noseg-3.c|  6 ++
 .../struct/mask_struct_store_noseg-4.c|  6 ++
 .../struct/mask_struct_store_noseg-5.c|  6 ++
 .../struct/mask_struct_store_noseg-6.c|  6 ++
 .../struct/mask_struct_store_noseg-7.c|  6 ++
 .../struct/mask_struct_store_noseg_run-1.c|  4 ++
 .../struct/mask_struct_store_noseg_run-2.c|  4 ++
 .../struct/mask_struct_store_noseg_run-3.c|  4 ++
 .../struct/mask_struct_store_noseg_run-4.c|  4 ++
 .../struct/mask_struct_store_noseg_run-5.c|  4 ++
 .../struct/mask_struct_store_noseg_run-6.c|  4 ++
 .../struct/mask_struct_store_noseg_run-7.c|  4 ++
 .../rvv/autovec/struct/struct_vect_noseg-1.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-10.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-11.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-12.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-13.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-14.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-15.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-16.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-17.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-18.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-2.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-3.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-4.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-5.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-6.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-7.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-8.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-9.c  |  7 +++
 .../autovec/struct/struct_vect_noseg_run-1.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-10.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-11.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-12.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-13.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-14.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-15.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-16.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-17.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-18.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-2.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-3.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-4.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-5.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-6.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-7.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-8.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-9.c  |  4 ++
 gcc/tree-vect-stmts.cc|  3 +-
 69 files changed, 411 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-7.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-1.c
 create mode 100644 
gcc/testsuite/gcc.tar

[PATCH 1/2] RISC-V: add option -m(no-)autovec-segment

2024-05-28 Thread Patrick O'Neill
From: Greg McGary 

Add option -m(no-)autovec-segment to enable/disable autovectorizer
from emitting vector segment load/store instructions. This is useful for
performance experiments.
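
As a rough illustration only: the loop below reads interleaved struct
fields, which is the kind of access an autovectorizer may implement
with segment loads.  Whether GCC actually does so for this exact loop
is an assumption, not something taken from the patch; struct point and
deinterleave are hypothetical names.

```c
/* Hypothetical two-field record; x and y are interleaved in memory.  */
struct point
{
  int x;
  int y;
};

/* Split an array of points into separate x and y arrays.  Each
   iteration reads two adjacent fields of one element, i.e. a
   stride-2 interleaved access pattern.  */
static void
deinterleave (const struct point *p, int *xs, int *ys, int n)
{
  for (int i = 0; i < n; i++)
    {
      xs[i] = p[i].x;
      ys[i] = p[i].y;
    }
}
```

Comparing the code generated for such a loop with and without
-mno-autovec-segment is the sort of performance experiment the option
is meant for.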

gcc/ChangeLog:
* config/riscv/autovec.md (vec_mask_len_load_lanes, 
vec_mask_len_store_lanes):
  Predicate with TARGET_VECTOR_AUTOVEC_SEGMENT
* gcc/config/riscv/riscv-opts.h (TARGET_VECTOR_AUTOVEC_SEGMENT): New 
macro.
* gcc/config/riscv/riscv.opt (-m(no-)autovec-segment): New option.
* gcc/tree-vect-stmts.cc (gcc/tree-vect-stmts.cc): Prevent 
divide-by-zero.
* testsuite/gcc.target/riscv/rvv/autovec/struct/*_noseg*.c,
testsuite/gcc.target/riscv/rvv/autovec/no-segment.c: New tests.
---
 gcc/config/riscv/autovec.md   |  4 +-
 gcc/config/riscv/riscv-opts.h |  5 ++
 gcc/config/riscv/riscv.opt|  4 ++
 .../gcc.target/riscv/rvv/autovec/no-segment.c | 61 +++
 .../autovec/struct/mask_struct_load_noseg-1.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-2.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-3.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-4.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-5.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-6.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-7.c |  6 ++
 .../struct/mask_struct_load_noseg_run-1.c |  4 ++
 .../struct/mask_struct_load_noseg_run-2.c |  4 ++
 .../struct/mask_struct_load_noseg_run-3.c |  4 ++
 .../struct/mask_struct_load_noseg_run-4.c |  4 ++
 .../struct/mask_struct_load_noseg_run-5.c |  4 ++
 .../struct/mask_struct_load_noseg_run-6.c |  4 ++
 .../struct/mask_struct_load_noseg_run-7.c |  4 ++
 .../struct/mask_struct_store_noseg-1.c|  6 ++
 .../struct/mask_struct_store_noseg-2.c|  6 ++
 .../struct/mask_struct_store_noseg-3.c|  6 ++
 .../struct/mask_struct_store_noseg-4.c|  6 ++
 .../struct/mask_struct_store_noseg-5.c|  6 ++
 .../struct/mask_struct_store_noseg-6.c|  6 ++
 .../struct/mask_struct_store_noseg-7.c|  6 ++
 .../struct/mask_struct_store_noseg_run-1.c|  4 ++
 .../struct/mask_struct_store_noseg_run-2.c|  4 ++
 .../struct/mask_struct_store_noseg_run-3.c|  4 ++
 .../struct/mask_struct_store_noseg_run-4.c|  4 ++
 .../struct/mask_struct_store_noseg_run-5.c|  4 ++
 .../struct/mask_struct_store_noseg_run-6.c|  4 ++
 .../struct/mask_struct_store_noseg_run-7.c|  4 ++
 .../rvv/autovec/struct/struct_vect_noseg-1.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-10.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-11.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-12.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-13.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-14.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-15.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-16.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-17.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-18.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-2.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-3.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-4.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-5.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-6.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-7.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-8.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-9.c  |  7 +++
 .../autovec/struct/struct_vect_noseg_run-1.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-10.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-11.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-12.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-13.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-14.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-15.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-16.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-17.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-18.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-2.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-3.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-4.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-5.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-6.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-7.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-8.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-9.c  |  4 ++
 gcc/tree-vect-stmts.cc|  3 +-
 69 files changed, 411 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-3.c
 create mode 100644 

[PATCH 2/2] RISC-V: Fix testcases renamed test flag options

2024-05-28 Thread Patrick O'Neill
From: Edwin Lu 

Some testcases still had --param=riscv-autovec-preference=_,
update to use -mrvv-vector-bits=_.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/no-segment.c: Update dejagnu flags
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-1.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-2.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-3.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-4.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-5.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-6.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-7.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-1.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-2.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-3.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-4.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-5.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-6.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-7.c: 
Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-1.c:
  Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-10.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-11.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-12.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-13.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-14.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-15.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-16.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-17.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-18.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-2.c:
  Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-3.c:
  Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-4.c:
  Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-5.c:
  Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-6.c:
  Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-7.c:
  Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-8.c:
  Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-9.c:
  Ditto
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_load_noseg_run-1.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_load_noseg_run-2.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_load_noseg_run-3.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_load_noseg_run-4.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_load_noseg_run-5.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_load_noseg_run-6.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_load_noseg_run-7.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_store_noseg_run-1.c  | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_store_noseg_run-2.c  | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_store_noseg_run-3.c  | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_store_noseg_run-4.c  | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_store_noseg_run-5.c  | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_store_noseg_run-6.c  | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_store_noseg_run-7.c  | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-1.c| 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-10.c   | 4 ++--
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-11.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-12.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-13.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-14.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-15.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-16.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-17.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-18.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-2.c| 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-3.c| 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-4.c| 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-5.c| 2 +-
 .../riscv/rvv/autovec/struct/s

Re: [PATCH 1/2] RISC-V: add option -m(no-)autovec-segment

2024-05-28 Thread Vineet Gupta



On 5/28/24 14:55, Patrick O'Neill wrote:
> From: Greg McGary 
>
> Add option -m(no-)autovec-segment to enable/disable autovectorizer
> from emitting vector segment load/store instructions. This is useful for
> performance experiments.
>
> gcc/ChangeLog:
>   * config/riscv/autovec.md (vec_mask_len_load_lanes, 
> vec_mask_len_store_lanes):
> Predicate with TARGET_VECTOR_AUTOVEC_SEGMENT
>   * gcc/config/riscv/riscv-opts.h (TARGET_VECTOR_AUTOVEC_SEGMENT): New 
> macro.
>   * gcc/config/riscv/riscv.opt (-m(no-)autovec-segment): New option.
>   * gcc/tree-vect-stmts.cc (gcc/tree-vect-stmts.cc): Prevent 
> divide-by-zero.

I think this middle-end change needs to be broken out, even if
eventually merged when committing, with its own test case.

-Vineet


Re: [PATCH 2/2] RISC-V: Fix testcases renamed test flag options

2024-05-28 Thread Vineet Gupta
On 5/28/24 14:55, Patrick O'Neill wrote:
> From: Edwin Lu 
>
> Some testcases still had --param=riscv-autovec-preference=_,
> update to use -mrvv-vector-bits=_.

And this can be squashed with the previous one, maybe with an added Tested-by for Edwin.

Thx,
-Vineet


Re: [PATCHv3] Optab: add isfinite_optab for __builtin_isfinite

2024-05-28 Thread Segher Boessenkool
On Tue, May 28, 2024 at 02:09:50PM +0200, Richard Biener wrote:
> On Tue, May 28, 2024 at 9:09 AM Kewen.Lin  wrote:
> > As Haochen's previous reply, I think there are three cases:
> >   1) no optab defined, fold in a generic way;
> >   2) optab defined, SUCC, expand as what it defines;
> >   3) optab defined, FAIL, generate a library call;
> >
> > From above, I had the concern that ports may assume FAILing can
> > fall back with the generic folding, but it's not actually.
> 
> Hmm, but it should.  Can you make that work?

That certainly would be the least surprising!


Segher
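
The three cases quoted above, and the fallback being asked for, can be
modelled roughly as follows.  This is only a toy decision table; the
enum and function names are invented for the example and do not
correspond to anything in builtins.cc.

```c
/* Hypothetical outcomes of expanding the builtin.  */
enum expand_result
{
  GENERIC_FOLD,   /* fold in the generic way */
  TARGET_INSNS,   /* expand as the optab defines */
  LIBRARY_CALL    /* emit a call to the library function */
};

/* Behaviour as described in the quoted mail: a FAILing pattern ends
   up as a library call.  */
static enum expand_result
described_behaviour (int optab_defined, int pattern_succeeds)
{
  if (!optab_defined)
    return GENERIC_FOLD;
  return pattern_succeeds ? TARGET_INSNS : LIBRARY_CALL;
}

/* Behaviour suggested in the reply: a FAILing pattern falls back to
   the generic folding, the same as if no optab were defined.  */
static enum expand_result
suggested_behaviour (int optab_defined, int pattern_succeeds)
{
  if (optab_defined && pattern_succeeds)
    return TARGET_INSNS;
  return GENERIC_FOLD;
}
```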


Re: [PATCHv3] Optab: add isfinite_optab for __builtin_isfinite

2024-05-28 Thread Segher Boessenkool
Hi!

On Mon, May 27, 2024 at 05:37:23PM +0800, HAO CHEN GUI wrote:
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>errno_set = true; builtin_optab = ilogb_optab; break;
>  CASE_FLT_FN (BUILT_IN_ISINF):
>builtin_optab = isinf_optab; break;
> -case BUILT_IN_ISNORMAL:
>  case BUILT_IN_ISFINITE:
> +  builtin_optab = isfinite_optab; break;

This needs a line break after the first ; (like after *any* semicolon
in C).  It is rather important that every "break;" stands out :-)
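
A small standalone sketch of the layout being asked for, with the
assignment and the "break;" on separate lines.  The enums and
pick_optab are stand-ins invented for the example, not the real
builtin codes or optabs.

```c
/* Hypothetical stand-ins for builtin codes and optabs.  */
enum demo_builtin { DEMO_ISINF, DEMO_ISFINITE, DEMO_ISNORMAL };
enum demo_optab { DEMO_UNKNOWN_OPTAB, DEMO_ISINF_OPTAB, DEMO_ISFINITE_OPTAB };

static enum demo_optab
pick_optab (enum demo_builtin code)
{
  enum demo_optab builtin_optab = DEMO_UNKNOWN_OPTAB;

  switch (code)
    {
    case DEMO_ISINF:
      builtin_optab = DEMO_ISINF_OPTAB;
      break;
    case DEMO_ISFINITE:
      builtin_optab = DEMO_ISFINITE_OPTAB;
      break;
    case DEMO_ISNORMAL:
      /* No dedicated optab in this sketch.  */
      break;
    }

  return builtin_optab;
}
```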

> +@cindex @code{isfinite@var{m}2} instruction pattern
> +@item @samp{isfinite@var{m}2}
> +Set operand 0 to nonzero if operand 1 is a finite @code{SFmode},
> +@code{DFmode}, or @code{TFmode} floating point number and to 0
> +otherwise.

operand 0 is the output of the builtin, right?  So write that instead?
"Return 1 if the operand (a scalar floating point number) is finite",
or such?


Segher


[patch] libgomp: Enable USM for some nvptx devices

2024-05-28 Thread Tobias Burnus
While most of the nvptx systems I have access to don't have the support 
for CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES, 
one has:


Tesla V100-SXM2-16GB (as installed, e.g., on ORNL's Summit) does support 
this feature. And with that feature, unified-shared memory support does 
work, presumably by handling automatic page migration when a page fault 
occurs.


Hence: Enable USM support for those. When doing so, all 'requires 
unified_shared_memory' tests of sollve_vv pass :-)


I am not quite sure whether there are unintended side effects, hence, I 
have not enabled support for it in general. In particular, 'declare 
target enter(global_var)' seems to be mishandled (I think it should be 
link + pointer updated to point to the host; cf. description for 
'self_maps'). Thus, it is not enabled by default but only when USM has 
been requested.


OK for mainline?
Comments? Remarks? Suggestions?

Tobias

PS: I guess some more USM tests should be added…

libgomp: Enable USM for some nvptx devices

A few high-end nvptx devices support the attribute
CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES;
for those, unified shared memory is supported in hardware. This
patch enables support for those - if all installed nvptx devices
have this feature (as the capabilities are per device type).
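
The "all devices or none" rule can be sketched in isolation like this.
It is a simplified model, not the plugin code: attr_query_fn and the
two demo callbacks are hypothetical and stand in for the real CUDA
attribute query, and the -1 return only mirrors the value used in the
hunk further below.

```c
/* Hypothetical query: returns nonzero on success and stores the
   attribute value for device DEV in *VALUE.  */
typedef int (*attr_query_fn) (int dev, int *value);

/* Demo callback: every device reports the attribute as supported.  */
static int
query_all_supported (int dev, int *value)
{
  (void) dev;
  *value = 1;
  return 1;
}

/* Demo callback: only device 0 reports the attribute as supported.  */
static int
query_only_first (int dev, int *value)
{
  *value = (dev == 0);
  return 1;
}

/* Return NUM_DEVICES, or -1 if USM was requested and at least one
   device fails the query or lacks the attribute.  */
static int
usable_device_count (int num_devices, int usm_requested, attr_query_fn query)
{
  if (num_devices > 0 && usm_requested)
    for (int dev = 0; dev < num_devices; dev++)
      {
	int pi = 0;
	if (!query (dev, &pi) || pi == 0)
	  return -1;
      }
  return num_devices;
}
```

Without a USM request the check is skipped entirely, which matches the
intent of enabling the feature only when it has been asked for.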

This exposes a bug in gomp_copy_back_icvs, which previously used
omp_get_mapped_ptr to find mapped variables; that function returns
the unchanged pointer in case of shared memory. But in this case,
we have a few actually mapped pointers, like the ICV variables.
Additionally, there was a mismatch with regard to '-1' for the
device number, as gomp_copy_back_icvs and omp_get_mapped_ptr count
differently. Hence, do the lookup manually.

include/ChangeLog:

	* cuda/cuda.h
	(CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES):
	Add.

libgomp/ChangeLog:

	* libgomp.texi (nvptx): Update USM description.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices):
	Claim support when requesting USM and all devices support 
	CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES.
	* target.c (gomp_copy_back_icvs): Fix device ptr lookup.
	(gomp_target_init): Set GOMP_OFFLOAD_CAP_SHARED_MEM if the
	devices support USM.

 include/cuda/cuda.h   |  3 ++-
 libgomp/libgomp.texi  |  5 -
 libgomp/plugin/plugin-nvptx.c | 15 +++
 libgomp/target.c  | 24 +++-
 4 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index 0dca4b3a5c0..db640d20366 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -83,7 +83,8 @@ typedef enum {
   CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR = 39,
   CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT = 40,
   CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING = 41,
-  CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82
+  CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82,
+  CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES = 100
 } CUdevice_attribute;
 
 enum {
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 71d62105a20..e0d37f67983 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -6435,7 +6435,10 @@ The implementation remark:
   the next reverse offload region is only executed after the previous
   one returned.
 @item OpenMP code that has a @code{requires} directive with
-  @code{unified_shared_memory} will remove any nvptx device from the
+  @code{unified_shared_memory} will run on nvptx devices if and only if
+  all of those support the
+  @code{CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES}
+  attribute; otherwise, all nvptx device are removed from the
   list of available devices (``host fallback'').
 @item The default per-warp stack size is 128 kiB; see also @code{-msoft-stack}
   in the GCC manual.
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 5aad3448a8d..c4b0f5dd4bf 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1201,8 +1201,23 @@ GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
   if (num_devices > 0
   && ((omp_requires_mask
 	   & ~(GOMP_REQUIRES_UNIFIED_ADDRESS
+	   | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY
 	   | GOMP_REQUIRES_REVERSE_OFFLOAD)) != 0))
 return -1;
+  /* Check whether automatic page migration is supported; if so, enable USM.
+ Currently, capabilities is per device type, hence, check all devices.  */
+  if (num_devices > 0
+  && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY))
+for (int dev = 0; dev < num_devices; dev++)
+  {
+	int pi;
+	CUresult r;
+	r = CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &pi,
+	  CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES,
+	  dev);
+	if (r != CUDA_SUCCESS || pi == 0)
+	  return -1;
+  }
   return num_devices;
 }
 
diff --git a/libgomp/target.c b/lib

[PATCH v2 0/2] RISC-V: add option -m(no-)autovec-segment

2024-05-28 Thread Patrick O'Neill
Rebased to squash Edwin's fixup into Greg's patch. Split out the middle-end
change and xfailed the associated testcase so the second patch can land
separately.

Relying on pre-commit CI for full testing.

Greg McGary (2):
  RISC-V: add option -m(no-)autovec-segment
  Prevent divide-by-zero

 gcc/config/riscv/autovec.md   |  4 +-
 gcc/config/riscv/riscv-opts.h |  5 ++
 gcc/config/riscv/riscv.opt|  4 ++
 .../gcc.target/riscv/rvv/autovec/no-segment.c | 61 +++
 .../autovec/struct/mask_struct_load_noseg-1.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-2.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-3.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-4.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-5.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-6.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-7.c |  6 ++
 .../struct/mask_struct_load_noseg_run-1.c |  4 ++
 .../struct/mask_struct_load_noseg_run-2.c |  4 ++
 .../struct/mask_struct_load_noseg_run-3.c |  4 ++
 .../struct/mask_struct_load_noseg_run-4.c |  4 ++
 .../struct/mask_struct_load_noseg_run-5.c |  4 ++
 .../struct/mask_struct_load_noseg_run-6.c |  4 ++
 .../struct/mask_struct_load_noseg_run-7.c |  4 ++
 .../struct/mask_struct_store_noseg-1.c|  6 ++
 .../struct/mask_struct_store_noseg-2.c|  6 ++
 .../struct/mask_struct_store_noseg-3.c|  6 ++
 .../struct/mask_struct_store_noseg-4.c|  6 ++
 .../struct/mask_struct_store_noseg-5.c|  6 ++
 .../struct/mask_struct_store_noseg-6.c|  6 ++
 .../struct/mask_struct_store_noseg-7.c|  6 ++
 .../struct/mask_struct_store_noseg_run-1.c|  4 ++
 .../struct/mask_struct_store_noseg_run-2.c|  4 ++
 .../struct/mask_struct_store_noseg_run-3.c|  4 ++
 .../struct/mask_struct_store_noseg_run-4.c|  4 ++
 .../struct/mask_struct_store_noseg_run-5.c|  4 ++
 .../struct/mask_struct_store_noseg_run-6.c|  4 ++
 .../struct/mask_struct_store_noseg_run-7.c|  4 ++
 .../rvv/autovec/struct/struct_vect_noseg-1.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-10.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-11.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-12.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-13.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-14.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-15.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-16.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-17.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-18.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-2.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-3.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-4.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-5.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-6.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-7.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-8.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-9.c  |  7 +++
 .../autovec/struct/struct_vect_noseg_run-1.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-10.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-11.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-12.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-13.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-14.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-15.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-16.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-17.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-18.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-2.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-3.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-4.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-5.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-6.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-7.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-8.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-9.c  |  4 ++
 gcc/tree-vect-stmts.cc|  3 +-
 69 files changed, 411 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-7.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/stru

[PATCH v2 2/2] Prevent divide-by-zero

2024-05-28 Thread Patrick O'Neill
From: Greg McGary 

gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_load): Prevent divide-by-zero.
* testsuite/gcc.target/riscv/rvv/autovec/no-segment.c: Remove xfail.
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c | 1 -
 gcc/tree-vect-stmts.cc  | 3 ++-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
index fd996a27501..79d03612a22 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
@@ -1,6 +1,5 @@
 /* { dg-do compile } */
 /* { dg-options "-march=rv64gcv -mabi=lp64d -mrvv-vector-bits=scalable -O3 
-mno-autovec-segment" } */
-/* { xfail *-*-* } */

 enum e { c, d };
 enum g { f };
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 4219ad832db..34f5736ba00 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -11558,7 +11558,8 @@ vectorizable_load (vec_info *vinfo,
 - (vec_num * j + i) * nunits);
/* remain should now be > 0 and < nunits.  */
unsigned num;
-   if (constant_multiple_p (nunits, remain, &num))
+   if (known_gt (remain, 0)
+   && constant_multiple_p (nunits, remain, &num))
  {
tree ptype;
new_vtype
--
2.43.2
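
To show why the extra known_gt (remain, 0) test matters, here is a small
scalar sketch. It is only an analogue: the real constant_multiple_p works on
poly_int values, and the patch places the guard at the call site rather than
inside the helper. The name scalar_multiple_p is hypothetical.

```c
#include <assert.h>

/* Hypothetical scalar analogue of constant_multiple_p: if A is a multiple
   of B, store the quotient in *MULT and return 1; otherwise return 0.  */
int
scalar_multiple_p (unsigned a, unsigned b, unsigned *mult)
{
  assert (mult != 0);
  if (b == 0)       /* Guard corresponding to known_gt (remain, 0).  */
    return 0;
  if (a % b != 0)   /* Without the guard, b == 0 would divide by zero.  */
    return 0;
  *mult = a / b;
  return 1;
}
```

Calling scalar_multiple_p (8, 0, &n) simply reports "not a multiple" and
leaves n untouched instead of trapping.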



[PATCH v2 1/2] RISC-V: add option -m(no-)autovec-segment

2024-05-28 Thread Patrick O'Neill
From: Greg McGary 

Add option -m(no-)autovec-segment to control whether the autovectorizer
may emit vector segment load/store instructions. This is useful for
performance experiments.

gcc/ChangeLog:
* config/riscv/autovec.md (vec_mask_len_load_lanes, 
vec_mask_len_store_lanes):
  Predicate with TARGET_VECTOR_AUTOVEC_SEGMENT
* gcc/config/riscv/riscv-opts.h (TARGET_VECTOR_AUTOVEC_SEGMENT): New 
macro.
* gcc/config/riscv/riscv.opt (-m(no-)autovec-segment): New option.
* testsuite/gcc.target/riscv/rvv/autovec/struct/*_noseg*.c,
testsuite/gcc.target/riscv/rvv/autovec/no-segment.c: New tests.

Tested-by: Edwin Lu 
---
Added tested-by on Vineet's recommendation. Please wait for riscv precommit to
finish before committing.
---
 gcc/config/riscv/autovec.md   |  4 +-
 gcc/config/riscv/riscv-opts.h |  5 ++
 gcc/config/riscv/riscv.opt|  4 ++
 .../gcc.target/riscv/rvv/autovec/no-segment.c | 62 +++
 .../autovec/struct/mask_struct_load_noseg-1.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-2.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-3.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-4.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-5.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-6.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-7.c |  6 ++
 .../struct/mask_struct_load_noseg_run-1.c |  4 ++
 .../struct/mask_struct_load_noseg_run-2.c |  4 ++
 .../struct/mask_struct_load_noseg_run-3.c |  4 ++
 .../struct/mask_struct_load_noseg_run-4.c |  4 ++
 .../struct/mask_struct_load_noseg_run-5.c |  4 ++
 .../struct/mask_struct_load_noseg_run-6.c |  4 ++
 .../struct/mask_struct_load_noseg_run-7.c |  4 ++
 .../struct/mask_struct_store_noseg-1.c|  6 ++
 .../struct/mask_struct_store_noseg-2.c|  6 ++
 .../struct/mask_struct_store_noseg-3.c|  6 ++
 .../struct/mask_struct_store_noseg-4.c|  6 ++
 .../struct/mask_struct_store_noseg-5.c|  6 ++
 .../struct/mask_struct_store_noseg-6.c|  6 ++
 .../struct/mask_struct_store_noseg-7.c|  6 ++
 .../struct/mask_struct_store_noseg_run-1.c|  4 ++
 .../struct/mask_struct_store_noseg_run-2.c|  4 ++
 .../struct/mask_struct_store_noseg_run-3.c|  4 ++
 .../struct/mask_struct_store_noseg_run-4.c|  4 ++
 .../struct/mask_struct_store_noseg_run-5.c|  4 ++
 .../struct/mask_struct_store_noseg_run-6.c|  4 ++
 .../struct/mask_struct_store_noseg_run-7.c|  4 ++
 .../rvv/autovec/struct/struct_vect_noseg-1.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-10.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-11.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-12.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-13.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-14.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-15.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-16.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-17.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-18.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-2.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-3.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-4.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-5.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-6.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-7.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-8.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-9.c  |  7 +++
 .../autovec/struct/struct_vect_noseg_run-1.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-10.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-11.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-12.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-13.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-14.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-15.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-16.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-17.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-18.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-2.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-3.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-4.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-5.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-6.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-7.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-8.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-9.c  |  4 ++
 68 files changed, 410 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-3.c
 create mode 100644 
gcc/tes

Re: ping: [PATCH] libcpp: Support extended characters for #pragma {push,pop}_macro [PR109704]

2024-05-28 Thread Lewis Hyatt
Hello-

May I please ping this one (now for GCC 15)? Thanks!
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642926.html

-Lewis

On Sat, Feb 10, 2024 at 9:02 AM Lewis Hyatt  wrote:
>
> Hello-
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642926.html
>
> May I please ping this one? Thanks!
>
> On Sat, Jan 13, 2024 at 5:12 PM Lewis Hyatt  wrote:
> >
> > Hello-
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109704
> >
> > The below patch fixes the issue noted in the PR that extended characters
> > cannot appear in the identifier passed to a #pragma push_macro or #pragma
> > pop_macro. Bootstrap + regtest all languages on x86-64 Linux. Is it OK for
> > GCC 13 please?
> >
> > I know we just entered stage 4, however I feel this is kinda like an old
> > regression, given that the issue was not apparent until support for UCNs and
> > UTF-8 in identifiers got added. FWIW, it would be nice if it makes it into
> > GCC 13, because AFAIK all other UTF-8-related bugs are fixed in this
> > release. (The other major one was for extended characters in a user-defined
> > literal, that was fixed by r14-2629).
> >
> > Speaking of just entering stage 4. I do have 4 really short patches sent
> > over the past several months that never got any response. Is there any
> > chance someone may have a few minutes to look at them please? They are
> > really just like 1-3 line fixes for PRs.
> >
> > libcpp (pinged once recently):
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641247.html
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640386.html
> >
> > diagnostics (pinged for 3rd time last week):
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638692.html
>
> > -- >8 --
> >
> > The implementation of #pragma push_macro and #pragma pop_macro has to date
> > made use of an ad-hoc function, _cpp_lex_identifier(), which lexes an
> > identifier out of a string. When support was added for extended characters
> > in identifiers ($, UCNs, or UTF-8), that support was added only for the
> > "normal" way of lexing identifiers out of a cpp_buffer (_cpp_lex_direct) and
> > not for the ad-hoc way. Consequently, extended identifiers are not usable
> > with these pragmas.
> >
> > The logic for lexing identifiers has become more complicated than it was
> > when _cpp_lex_identifier() was written -- it now handles things like \N{}
> > escapes in C++, for instance -- and it no longer seems practical to maintain
> > a redundant code path for lexing identifiers. Address the issue by changing
> > the implementation of #pragma {push,pop}_macro to lex identifiers in the
> > expected way, i.e. by pushing a cpp_buffer and lexing the identifier from
> > there.
> >
> > The existing implementation has some quirks because of the ad-hoc parsing
> > logic. For example:
> >
> >  #pragma push_macro("X ")
> >  ...
> >  #pragma pop_macro("X")
> >
> > will not restore macro X (note the extra space in the first string). 
> > However:
> >
> >  #pragma push_macro("X ")
> >  ...
> >  #pragma pop_macro("X ")
> >
> > > actually does successfully restore "X". This is because the key for looking
> > up the saved macro on the push stack is the original string passed, so the
> > string passed to pop_macro needs to match it exactly. It is not that easy to
> > reproduce this logic in the world of extended characters, given that for
> > example it should be valid to pass a UCN to push_macro, and the
> > corresponding UTF-8 to pop_macro. Given that this aspect of the existing
> > behavior seems unintentional and has no tests (and does not match other
> > implementations), I opted to make the new logic more straightforward. The
> > string passed needs to lex to one token, which must be a valid identifier,
> > or else no action is taken and no error is generated. Any diagnostics
> > encountered during lexing (e.g., due to a UTF-8 character not permitted to
> > appear in an identifier) are also suppressed.
> >
> > It could be nice (for GCC 15) to also add a warning if a pop_macro does not
> > match a previous push_macro.
> >
> > libcpp/ChangeLog:
> >
> > PR preprocessor/109704
> > * include/cpplib.h (class cpp_auto_suppress_diagnostics): New class.
> > * errors.cc
> > (cpp_auto_suppress_diagnostics::cpp_auto_suppress_diagnostics): New
> > function.
> > (cpp_auto_suppress_diagnostics::~cpp_auto_suppress_diagnostics): New
> > function.
> > * charset.cc (noop_diagnostic_cb): Remove.
> > (cpp_interpret_string_ranges): Refactor diagnostic suppression logic
> > into new class cpp_auto_suppress_diagnostics.
> > (count_source_chars): Likewise.
> > * directives.cc (cpp_pop_definition): Add cpp_hashnode argument.
> > (lex_identifier_from_string): New static helper function.
> > (push_pop_macro_common): Refactor common logic from
> > do_pragma_push_macro and do_pragma_pop_macro; use
> > lex_identifier_from_string i
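
For readers unfamiliar with the pragmas discussed in the quoted description,
the sketch below shows the basic save/restore behavior for a plain ASCII
identifier. It does not exercise the extended-character case that the patch
fixes; the function names are made up for the example.

```c
#include <assert.h>

#define X 1
#pragma push_macro("X")   /* Save the current definition of X.  */
#undef X
#define X 2

static_assert (X == 2, "X is temporarily redefined");
int value_while_redefined (void) { return X; }

#pragma pop_macro("X")    /* Restore the saved definition.  */

static_assert (X == 1, "pop_macro restored the original X");
int value_after_pop (void) { return X; }
```

The two functions capture the value of X at the point where each is defined,
so the first one returns the temporary value and the second the restored one.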

Re: [PATCH v4] RISC-V: Introduce -mvector-strict-align.

2024-05-28 Thread Kito Cheng
I just created two PRs for adding those new options into
riscv-toolchain-conventions, so that we could make sure it aligned
with clang/LLVM community.

https://github.com/riscv-non-isa/riscv-toolchain-conventions/pull/49
https://github.com/riscv-non-isa/riscv-toolchain-conventions/pull/50

On Wed, May 29, 2024 at 3:20 AM Robin Dapp  wrote:
>
> Hi,
>
> this patch disables movmisalign by default and introduces
> the -mno-vector-strict-align option to override it and re-enable
> movmisalign.  For now, generic-ooo is the only uarch that supports
> misaligned vector access.
>
> The patch also adds a check_effective_target_riscv_v_misalign_ok to
> the testsuite which enables or disables the vector misalignment tests
> depending on whether the target under test can execute a misaligned
> vle32.
>
> Changes from v3:
>  - Addressed Kito's comments.
>  - Made -mscalar-strict-align a real alias.
>
> Regards
>  Robin
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
> Move from here...
> * config/riscv/riscv.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
> ...to here and map to riscv_vector_unaligned_access_p.
> * config/riscv/riscv.opt: Add -mvector-strict-align.
> * config/riscv/riscv.cc (struct riscv_tune_param): Add
> vector_unaligned_access.
> (riscv_override_options_internal): Set
> riscv_vector_unaligned_access_p.
> * doc/invoke.texi: Document -mvector-strict-align.
>
> gcc/testsuite/ChangeLog:
>
> * lib/target-supports.exp: Add
> check_effective_target_riscv_v_misalign_ok.
> * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Add
> -mno-vector-strict-align.
> * gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: Ditto.
> * gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: Ditto.
> * gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: Ditto.
> * gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-8.c: Ditto.
> * gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-9.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls/misalign-1.c: Ditto.
> ---
>  gcc/config/riscv/riscv-opts.h |  3 --
>  gcc/config/riscv/riscv.cc | 19 +++
>  gcc/config/riscv/riscv.h  |  5 +++
>  gcc/config/riscv/riscv.opt|  8 +
>  gcc/doc/invoke.texi   | 22 
>  .../costmodel/riscv/rvv/dynamic-lmul2-7.c |  2 +-
>  .../vect/costmodel/riscv/rvv/vla_vs_vls-10.c  |  2 +-
>  .../vect/costmodel/riscv/rvv/vla_vs_vls-11.c  |  2 +-
>  .../vect/costmodel/riscv/rvv/vla_vs_vls-12.c  |  2 +-
>  .../vect/costmodel/riscv/rvv/vla_vs_vls-8.c   |  2 +-
>  .../vect/costmodel/riscv/rvv/vla_vs_vls-9.c   |  2 +-
>  .../riscv/rvv/autovec/vls/misalign-1.c|  2 +-
>  gcc/testsuite/lib/target-supports.exp | 34 +--
>  13 files changed, 93 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
> index 1b2dd5757a8..f58a07abffc 100644
> --- a/gcc/config/riscv/riscv-opts.h
> +++ b/gcc/config/riscv/riscv-opts.h
> @@ -147,9 +147,6 @@ enum rvv_vector_bits_enum {
>   ? 0 
>   \
>   : 32 << (__builtin_popcount (opts->x_riscv_zvl_flags) - 1))
>
> -/* TODO: Enable RVV movmisalign by default for now.  */
> -#define TARGET_VECTOR_MISALIGN_SUPPORTED 1
> -
>  /* The maximmum LMUL according to user configuration.  */
>  #define TARGET_MAX_LMUL  
>   \
>(int) (rvv_max_lmul == RVV_DYNAMIC ? RVV_M8 : rvv_max_lmul)
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index a99211d56b1..13cd61a4a22 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -287,6 +287,7 @@ struct riscv_tune_param
>unsigned short memory_cost;
>unsigned short fmv_cost;
>bool slow_unaligned_access;
> +  bool vector_unaligned_access;
>bool use_divmod_expansion;
>bool overlap_op_by_pieces;
>unsigned int fusible_ops;
> @@ -299,6 +300,10 @@ struct riscv_tune_param
>  /* Whether unaligned accesses execute very slowly.  */
>  bool riscv_slow_unaligned_access_p;
>
> +/* Whether misaligned vector accesses are supported (i.e. do not
> +   throw an exception).  */
> +bool riscv_vector_unaligned_access_p;
> +
>  /* Whether user explicitly passed -mstrict-align.  */
>  bool riscv_user_wants_strict_align;
>
> @@ -441,6 +446,7 @@ static const struct riscv_tune_param rocket_tune_info = {
>5,   /* memory_cost */
>8,   /* fmv_cost */
>true,/* 
> slow_unaligned_access */
> +  false,   /* vector_unaligned_access */
>false,   /* use_divmod_expansion */
>false,  

Re: [PATCHv3] Optab: add isfinite_optab for __builtin_isfinite

2024-05-28 Thread Kewen.Lin
on 2024/5/28 20:09, Richard Biener wrote:
> On Tue, May 28, 2024 at 9:09 AM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> on 2024/5/27 20:54, Richard Biener wrote:
>>> On Mon, May 27, 2024 at 11:37 AM HAO CHEN GUI  wrote:

 Hi,
   This patch adds an optab for __builtin_isfinite. The finite check can be
 implemented on rs6000 by a single instruction. It needs an optab to be
 expanded to the certain sequence of instructions.

   The subsequent patches will implement the expand on rs6000.

   Compared to previous version, the main change is to specify acceptable
 modes for the optab.
 https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652170.html

   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
 regressions. Is this OK for trunk?

 Thanks
 Gui Haochen

 ChangeLog
 optab: Add isfinite_optab for isfinite builtin

 gcc/
 * builtins.cc (interclass_mathfn_icode): Set optab to 
 isfinite_optab
 for isfinite builtin.
 * optabs.def (isfinite_optab): New.
 * doc/md.texi (isfinite): Document.


 patch.diff
 diff --git a/gcc/builtins.cc b/gcc/builtins.cc
 index f8d94c4b435..b8432f84020 100644
 --- a/gcc/builtins.cc
 +++ b/gcc/builtins.cc
 @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
errno_set = true; builtin_optab = ilogb_optab; break;
  CASE_FLT_FN (BUILT_IN_ISINF):
builtin_optab = isinf_optab; break;
 -case BUILT_IN_ISNORMAL:
  case BUILT_IN_ISFINITE:
 +  builtin_optab = isfinite_optab; break;
 +case BUILT_IN_ISNORMAL:
  CASE_FLT_FN (BUILT_IN_FINITE):
  case BUILT_IN_FINITED32:
  case BUILT_IN_FINITED64:
 diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
 index 5730bda80dc..67407fad37d 100644
 --- a/gcc/doc/md.texi
 +++ b/gcc/doc/md.texi
 @@ -8557,6 +8557,15 @@ operand 2, greater than operand 2 or is unordered 
 with operand 2.

  This pattern is not allowed to @code{FAIL}.

 +@cindex @code{isfinite@var{m}2} instruction pattern
 +@item @samp{isfinite@var{m}2}
 +Set operand 0 to nonzero if operand 1 is a finite @code{SFmode},
 +@code{DFmode}, or @code{TFmode} floating point number and to 0
>>>
>>> It should probably say scalar floating-point mode?  But what about the 
>>> result?
>>> Is any integer mode OK?  That's esp. important if this might be used on
>>> vector modes.
>>>
 +otherwise.
 +
 +If this pattern @code{FAIL}, a call to the library function
 +@code{isfinite} is used.
>>>
>>> Or it's otherwise inline expanded?  Or does this imply targets
>>> have to make sure to implement the pattern when isfinite is
>>> not available in libc/libm?  I suggest to leave this sentence out,
>>> we usually only say when a pattern may _not_ FAIL (and usually
>>> FAILing isn't different from not providing a pattern).
>>
>> As Haochen's previous reply, I think there are three cases:
>>   1) no optab defined, fold in a generic way;
>>   2) optab defined, SUCC, expand as what it defines;
>>   3) optab defined, FAIL, generate a library call;
>>
>> From above, I had the concern that ports may assume FAILing can
>> fall back with the generic folding, but it's not actually.
> 
> Hmm, but it should.  Can you make that work?

Good point, sure, I'll follow up this.

BR,
Kewen

> 
>> Does your comment imply ports usually don't make such assumption
>> (or they just check what happens for FAIL)?
>>
>> BR,
>> Kewen
>>
>>>
  @end table

  @end ifset
 diff --git a/gcc/optabs.def b/gcc/optabs.def
 index ad14f9328b9..dcd77315c2a 100644
 --- a/gcc/optabs.def
 +++ b/gcc/optabs.def
 @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
  OPTAB_D (hypot_optab, "hypot$a3")
  OPTAB_D (ilogb_optab, "ilogb$a2")
  OPTAB_D (isinf_optab, "isinf$a2")
 +OPTAB_D (isfinite_optab, "isfinite$a2")
  OPTAB_D (issignaling_optab, "issignaling$a2")
  OPTAB_D (ldexp_optab, "ldexp$a3")
  OPTAB_D (log10_optab, "log10$a2")
>>
>>
>>
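
Whichever of the three cases above applies, the result has to answer the same
question. As a minimal sketch of that question, assuming IEEE 754 binary64
doubles, a finite check can be written directly on the bit pattern. This is
not GCC's generic folding and not the rs6000 expansion; the function names
are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if X is finite, assuming IEEE 754 binary64 doubles: a value is
   Inf or NaN exactly when all 11 exponent bits are set.  */
int
isfinite_bits (double x)
{
  uint64_t bits;
  static_assert (sizeof bits == sizeof x, "double must be 64 bits wide");
  memcpy (&bits, &x, sizeof bits);
  return ((bits >> 52) & 0x7ff) != 0x7ff;
}

/* Cross-check against the C library's isfinite macro.  */
int
agrees_with_libm (double x)
{
  return isfinite_bits (x) == (isfinite (x) != 0);
}
```

A target pattern such as the proposed isfinite optab would be expected to
produce the same truth value, only with fewer instructions.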



Re: [PATCH v2 1/2] driver: Use -as/ld/objcopy as final fallback instead of native ones for cross

2024-05-28 Thread YunQiang Su
Richard Sandiford  于2024年5月29日周三 05:28写道:
>
> YunQiang Su  writes:
> > If `find_a_program` cannot find `as/ld/objcopy` and we are a cross 
> > toolchain,
> > the final fallback is `as/ld` of system.  In fact, we can have a try with
> > -as/ld/objcopy before fallback to native as/ld/objcopy.
> >
> > This patch is derived from Debian's patch:
> >   gcc-search-prefixed-as-ld.diff
>
> I'm probably making you repeat a previous discussion, sorry, but could
> you describe the use case in more detail?  The current approach to
> handling cross toolchains has been used for many years.  Presumably
> this patch is supporting a different way of organising things,
> but I wasn't sure from the description what it was.
>
> AIUI, we currently assume that cross as, ld and objcopy will be
> installed under those names in $prefix/$target_alias/bin (aka $tooldir/bin).
> E.g.:
>
>bin/aarch64-elf-as = aarch64-elf/bin/as
>
> GCC should then find as in aarch64-elf/bin.
>
> Is that not true in your case?
>

Yes. This patch is only about the final fallback. I mean aarch64-elf/bin/as
still has higher priority than bin/aarch64-elf-as.

In the current code, we find gas with:
/prefix/aarch64-elf/bin/as > $PATH/as

And this patch adds a new one between them:
/prefix/aarch64-elf/bin/as > $PATH/aarch64-elf-as > $PATH/as

> To be clear, I'm not saying the patch is wrong.  I'm just trying to
> understand why the patch is needed.
>

Yes. If GCC is configured correctly, this is not very useful.
In some cases it may still help, for example when binutils is
installed into a different prefix from libc etc.

For example, binutils is installed into /usr/aarch64-elf/bin, while
libc is installed into /usr/local/aarch64-elf/.

> Thanks,
> Richard
>
> >
> > gcc
> >   * gcc.cc(execute): Looks for -as/ld/objcopy before fallback
> >   to native as/ld/objcopy.
> > ---
> >  gcc/gcc.cc | 20 
> >  1 file changed, 20 insertions(+)
> >
> > diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> > index 830a4700a87..3dc6348d761 100644
> > --- a/gcc/gcc.cc
> > +++ b/gcc/gcc.cc
> > @@ -3293,6 +3293,26 @@ execute (void)
> >string = find_a_program(commands[0].prog);
> >if (string)
> >   commands[0].argv[0] = string;
> > +  else if (*cross_compile != '0'
> > + && !strcmp (commands[0].argv[0], commands[0].prog)
> > + && (!strcmp (commands[0].prog, "as")
> > + || !strcmp (commands[0].prog, "ld")
> > + || !strcmp (commands[0].prog, "objcopy")))
> > + {
> > +   string = concat (DEFAULT_REAL_TARGET_MACHINE, "-",
> > + commands[0].prog, NULL);
> > +   const char *string_args[] = {string, "--version", NULL};
> > +   int exit_status = 0;
> > +   int err = 0;
> > +   const char *errmsg = pex_one (PEX_SEARCH, string,
> > +   CONST_CAST (char **, string_args), string,
> > +   NULL, NULL, &exit_status, &err);
> > +   if (errmsg == NULL && exit_status == 0 && err == 0)
> > + {
> > +   commands[0].argv[0] = string;
> > +   commands[0].prog = string;
> > + }
> > + }
> >  }
> >
> >for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)
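
The name-building step of the quoted patch can be sketched in isolation as
follows. The real code uses concat with DEFAULT_REAL_TARGET_MACHINE and then
probes the result with pex_one and --version; the sketch below only forms the
candidate name and performs no probing. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if PROG is one of the tools the patch treats specially.  */
static int
is_fallback_tool (const char *prog)
{
  return !strcmp (prog, "as") || !strcmp (prog, "ld")
         || !strcmp (prog, "objcopy");
}

/* Write "<target>-<prog>" into OUT (of size N).  Return 1 on success,
   0 if PROG is not eligible or the name does not fit.  */
int
prefixed_tool_name (const char *target, const char *prog, char *out, size_t n)
{
  assert (target && prog && out);
  if (!is_fallback_tool (prog))
    return 0;
  int len = snprintf (out, n, "%s-%s", target, prog);
  return len > 0 && (size_t) len < n;
}
```

For target "aarch64-elf" and program "as" this yields "aarch64-elf-as", the
middle entry in the search order given earlier in this message.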


[COMMITTED] Gori_on_edge tweaks.

2024-05-28 Thread Andrew MacLeod


FAST_VRP uses a non-ranger gori_on_edge routine to calculate the full 
set of SSA ranges that can be calculated on an edge.  It allows an 
optional  outgoing_edge_range object if one wanted to use switches.  
This is now integrated with the gori () method of a range_query, and is 
no longer needed.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

commit a19f588d0b71a4cbc48b064177de87d3ca46b39f
Author: Andrew MacLeod 
Date:   Wed May 22 19:51:16 2024 -0400

Gori_on_edge tweaks.

FAST_VRP uses a non-ranger gori_on_edge routine which allows an optional
outgoing_edge_range object if one wanted to use switches.  This is now
integrated with the gori () method of a range_query, and is no longer
needed.

* gimple-range-gori.cc (gori_on_edge): Always use static ranges
from the specified range_query.
* gimple-range-gori.h (gori_on_edge): Change prototype.
* gimple-range.cc (dom_ranger::maybe_push_edge): Change arguments
to call.

diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc
index 0d471b46903..d489aef312c 100644
--- a/gcc/gimple-range-gori.cc
+++ b/gcc/gimple-range-gori.cc
@@ -1625,28 +1625,20 @@ gori_calc_operands (vrange &lhs, gimple *stmt, 
ssa_cache &r, range_query *q)
 }
 
 // Use ssa_cache R as a repository for all outgoing ranges on edge E that
-// can be calculated.  Use OGR if present to establish starting edge ranges,
-// and Q to resolve operand values.  If Q is NULL use the current range
+// can be calculated.  Use Q to establish starting edge ranges and to resolve
+// operand values.  If Q is NULL use the current range
 // query available to the system.
 
 bool
-gori_on_edge (ssa_cache &r, edge e, range_query *q, gimple_outgoing_range *ogr)
+gori_on_edge (ssa_cache &r, edge e, range_query *q)
 {
+  if (!q)
+q = get_range_query (cfun);
   // Start with an empty vector
   r.clear ();
   int_range_max lhs;
   // Determine if there is an outgoing edge.
-  gimple *stmt;
-  if (ogr)
-stmt = ogr->edge_range_p (lhs, e);
-  else
-{
-  stmt = gimple_outgoing_range_stmt_p (e->src);
-  if (stmt && is_a<gcond *> (stmt))
-   gcond_edge_range (lhs, e);
-  else
-   stmt = NULL;
-}
+  gimple *stmt = q->gori ().edge_range_p (lhs, e);
   if (!stmt)
 return false;
   gori_calc_operands (lhs, stmt, r, q);
diff --git a/gcc/gimple-range-gori.h b/gcc/gimple-range-gori.h
index 9b4bcd919f5..11019e38471 100644
--- a/gcc/gimple-range-gori.h
+++ b/gcc/gimple-range-gori.h
@@ -213,10 +213,8 @@ private:
 // ssa_cache structure).
 // GORI_NAME_ON_EDGE  is used to simply ask if NAME has a range on edge E
 
-// Fill ssa-cache R with any outgoing ranges on edge E, using OGR and QUERY.
-bool gori_on_edge (class ssa_cache &r, edge e,
-  range_query *query = NULL,
-  gimple_outgoing_range *ogr = NULL);
+// Fill ssa-cache R with any outgoing ranges on edge E, using QUERY.
+bool gori_on_edge (class ssa_cache &r, edge e, range_query *query = NULL);
 
 // Query if NAME has an outgoing range on edge E, and return it in R if so.
 // Note this doesnt use ranger, its a static GORI analysis of the range in
diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 711646abb67..be22bb4aa18 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -1156,7 +1156,7 @@ dom_ranger::maybe_push_edge (edge e, bool edge_0)
 e_cache = m_freelist.pop ();
   else
 e_cache = new ssa_lazy_cache;
-  gori_on_edge (*e_cache, e, this, &gori ());
+  gori_on_edge (*e_cache, e, this);
   if (e_cache->empty_p ())
 m_freelist.safe_push (e_cache);
   else

