Re: [PATCH v9 2/5] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.

2024-05-28 Thread Richard Biener
On Tue, May 28, 2024 at 11:09 PM Qing Zhao  wrote:
>
> Thank you for the comments. See my answers below:
>
> Joseph, please see the last question; I need your help on it. Thanks a lot
> for the help.
>
> Qing
>
> > On May 28, 2024, at 03:38, Richard Biener  wrote:
> >
> > On Fri, Apr 12, 2024 at 3:54 PM Qing Zhao  wrote:
> >>
> >> Including the following changes:
> >> * The definition of the new internal function .ACCESS_WITH_SIZE
> >>  in internal-fn.def.
> >> * C FE converts every reference to a FAM with a "counted_by" attribute
> >>  to a call to the internal function .ACCESS_WITH_SIZE.
> >>  (build_component_ref in c_typeck.cc)
> >>
> >>  This includes the case when the object is statically allocated and
> >>  initialized.
> >>  In order to make this work, the routines initializer_constant_valid_p_1
> >>  and output_constant in varasm.cc are updated to handle calls to
> >>  .ACCESS_WITH_SIZE.
> >>  (initializer_constant_valid_p_1 and output_constant in varasm.cc)
> >>
> >>  However, for the reference inside "offsetof", the "counted_by" attribute is
> >>  ignored since it's not useful at all.
> >>  (c_parser_postfix_expression in c/c-parser.cc)
> >>
> >>  In addition to "offsetof", for the reference inside operator "typeof" and
> >>  "alignof", we ignore the counted_by attribute too.
> >>
> >>  When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE,
> >>  replace the call with its first argument.
> >>
> >> * Convert every call to .ACCESS_WITH_SIZE to its first argument.
> >>  (expand_ACCESS_WITH_SIZE in internal-fn.cc)
> >> * Adjust alias analysis to exclude the new internal function from
> >>  clobbering anything.
> >>  (ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in tree-ssa-alias.cc)
> >> * Adjust dead code elimination to eliminate the call to .ACCESS_WITH_SIZE
> >>  when its LHS is eliminated as dead code.
> >>  (eliminate_unnecessary_stmts in tree-ssa-dce.cc)
> >> * Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
> >>  get the reference from the call to .ACCESS_WITH_SIZE.
> >>  (is_access_with_size_p and get_ref_from_access_with_size in tree.cc)
> >>
> >> gcc/c/ChangeLog:
> >>
> >>* c-parser.cc (c_parser_postfix_expression): Ignore the counted-by
> >>attribute when build_component_ref inside offsetof operator.
> >>* c-tree.h (build_component_ref): Add one more parameter.
> >>* c-typeck.cc (build_counted_by_ref): New function.
> >>(build_access_with_size_for_counted_by): New function.
> >>(build_component_ref): Check the counted-by attribute and build
> >>call to .ACCESS_WITH_SIZE.
> >>(build_unary_op): When building ADDR_EXPR for
> >>.ACCESS_WITH_SIZE, use its first argument.
> >>(lvalue_p): Accept call to .ACCESS_WITH_SIZE.
> >>
> >> gcc/ChangeLog:
> >>
> >>* internal-fn.cc (expand_ACCESS_WITH_SIZE): New function.
> >>* internal-fn.def (ACCESS_WITH_SIZE): New internal function.
> >>* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Special case
> >>IFN_ACCESS_WITH_SIZE.
> >>(call_may_clobber_ref_p_1): Special case IFN_ACCESS_WITH_SIZE.
> >>* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Eliminate the call
> >>to .ACCESS_WITH_SIZE when its LHS is dead.
> >>* tree.cc (process_call_operands): Adjust side effect for function
> >>.ACCESS_WITH_SIZE.
> >>(is_access_with_size_p): New function.
> >>(get_ref_from_access_with_size): New function.
> >>* tree.h (is_access_with_size_p): New prototype.
> >>(get_ref_from_access_with_size): New prototype.
> >>* varasm.cc (initializer_constant_valid_p_1): Handle call to
> >>.ACCESS_WITH_SIZE.
> >>(output_constant): Handle call to .ACCESS_WITH_SIZE.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>* gcc.dg/flex-array-counted-by-2.c: New test.
> >> ---
> >> gcc/c/c-parser.cc                     |  10 +-
> >> gcc/c/c-tree.h                        |   2 +-
> >> gcc/c/c-typeck.cc                     | 128 +-
> >> gcc/internal-fn.cc                    |  35 +
> >> gcc/internal-fn.def                   |   4 +
> >> .../gcc.dg/flex-array-counted-by-2.c  | 112 +++
> >> gcc/tree-ssa-alias.cc                 |   2 +
> >> gcc/tree-ssa-dce.cc                   |   5 +-
> >> gcc/tree.cc                           |  25 +++-
> >> gcc/tree.h                            |   8 ++
> >> gcc/varasm.cc                         |  10 ++
> >> 11 files changed, 331 insertions(+), 10 deletions(-)
> >> create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
> >>
> >> diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
> >> index c31349dae2ff..a6ed5ac43bb1 100644
> >> --- a/gcc/c/c-parser.cc
> >> +++ b/gcc/c/c-parser.cc
> >> @@ -10850,9 +10850,12 @@ c_parser_postfix_expression (c_parser *parser)
> >>   
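As a hedged illustration of the source code this patch series acts on, here is a minimal struct with a flexible array member annotated with counted_by. It assumes a compiler that understands the attribute (e.g. GCC 15 or a recent Clang); on older compilers the COUNTED_BY macro expands to nothing. The names COUNTED_BY, struct buf and buf_new are hypothetical. With the patch, each b->data reference becomes a call to .ACCESS_WITH_SIZE in the C front end, while the offsetof use is one of the contexts where the attribute is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Use counted_by only where the compiler understands it.  */
#if defined(__has_attribute)
#  if __has_attribute(counted_by)
#    define COUNTED_BY(m) __attribute__((counted_by(m)))
#  endif
#endif
#ifndef COUNTED_BY
#  define COUNTED_BY(m)
#endif

struct buf
{
  size_t count;                 /* number of elements in data[] */
  int data[] COUNTED_BY(count); /* FAM whose bound is carried by count */
};

/* Hypothetical helper: allocate a buf holding N ints, set count, fill data.
   offsetof on the FAM is a context where counted_by is ignored, so the
   size computation is unaffected by the attribute.  */
static struct buf *buf_new(size_t n)
{
  struct buf *b = malloc(offsetof(struct buf, data) + n * sizeof(int));
  if (b == NULL)
    return NULL;
  b->count = n;                 /* set the bound before touching data[] */
  for (size_t i = 0; i < n; i++)
    b->data[i] = (int) i;       /* references like this get the bound */
  return b;
}
```

Setting count before the first access matters: any bounds information derived from the attribute is read from count at the point of the access.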

Re: [patch] OpenMP: Add -fopenmp-force-usm mode

2024-05-28 Thread Jakub Jelinek
On Wed, May 29, 2024 at 08:49:01AM +0200, Tobias Burnus wrote:
> Jakub Jelinek wrote:
> > I mean, if we want to add something, maybe better would be an -include-like
> > option that instead of including a file includes it directly.
> > gcc --include-inline '#pragma omp requires unified_shared_memory' ...
> 
> Likewise for Fortran, but there the question is whether it should be in the
> use-stmt, import-stmt, implicit-part or declaration-part; I guess having one
> --include-inline-use-stmt and --include-inline-declaration would make sense

Maybe name it slightly differently for Fortran and have where it should
be added as one argument, so --whatever=where=what

> And, I guess, multiple flags should be permitted, which can then be
> processed as separate lines.

Obviously.  That was the intent with --include-inline= for C as well,
after all, -include works that way too.
-include a.h -include b.h -include c.h

Jakub



Re: [PATCH v3 #1/2] [rs6000] adjust return_pc debug attrs

2024-05-28 Thread Alexandre Oliva
On May 27, 2024, "Kewen.Lin"  wrote:

> I wonder if it's possible to have a test case for this?

gcc.dg/guality/pr54519-[34].c at -O[1g] are fixed by this patch on
ppc64le-linux-gnu.  Are these the sort of test case you're interested
in, or are you looking for something that tests the offsets in debug
info, rather than the end-to-end debugging feature?

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


Re: [patch] OpenMP: Add -fopenmp-force-usm mode

2024-05-28 Thread Jakub Jelinek
On Wed, May 29, 2024 at 08:41:04AM +0200, Tobias Burnus wrote:
> Jakub Jelinek wrote:
> > How is that option different from
> > echo '#pragma omp requires unified_shared_memory' > omp-usm.h
> > gcc -include omp-usm.h
> > ?
> > I mean with -include you can add anything you want, not just one particular
> > directive, and adding a separate option for each is just weird.
> 
> For C/C++, -include seems to be indeed sufficient (albeit not widely known).
> For Fortran, there are two issues: One placement/semantic issue: it has to be
> added per "compilation unit", i.e. to the specification part of a module,
> subprogram or main program. And a practical issue, gfortran shows:
> 
> error: command-line option '-include !$omp requires' is valid for
> C/C++/ObjC/ObjC++ but not for Fortran
> 
> Thus, for Fortran it is still intrinsically useful – even if one can argue
> whether that feature is needed at all / whether it should be added as
> command-line argument.

But then shouldn't we have an option that adds something at the start of
the declaration part of each ?
I mean, option to add 'implicit none' everywhere, or this
'!$omp requires unified_shared_memory' etc.?

I could live with a one-off option for Clang compatibility; I just fear
that in 2 years we'll need another one etc., and that solving it in some more
versatile way would be better.

Jakub



Re: [patch] OpenMP: Add -fopenmp-force-usm mode

2024-05-28 Thread Tobias Burnus

Jakub Jelinek wrote:

> I mean, if we want to add something, maybe better would be an -include-like
> option that instead of including a file includes it directly.
> gcc --include-inline '#pragma omp requires unified_shared_memory' ...


Likewise for Fortran, but there the question is whether it should be in 
the use-stmt, import-stmt, implicit-part or declaration-part; I guess 
having one --include-inline-use-stmt and --include-inline-declaration 
would make sense …


And, I guess, multiple flags should be permitted, which can then be 
processed as separate lines.


Tobias


Re: [Patch, PR Fortran/90069] Polymorphic Return Type Memory Leak Without Intermediate Variable

2024-05-28 Thread Richard Biener
On Tue, May 28, 2024 at 9:46 PM Harald Anlauf  wrote:
>
> Hi Andre,
>
> On 5/28/24 14:10, Andre Vehreschild wrote:
> > Hi all,
> >
> > the attached patch fixes a memory leak with unlimited polymorphic return types.
> > The leak occurred, because an expression with side-effects was evaluated twice.
> > I have substituted the check for non-variable expressions followed by creating a
> > SAVE_EXPR with checking for trees with side effects and creating temp. variable
> > and freeing the memory.
>
> this looks good to me.  It also solves the runtime memory leak in
> testcase pr114012.f90 .  Nice!
>
> > Btw, I do not get the SAVE_EXPR in the old code. Is there something missing to
> > manifest it or is a SAVE_EXPR not meant to be evaluated twice?
>
> I was assuming that the comment in gcc/tree.h applies here:
>
> /* save_expr (EXP) returns an expression equivalent to EXP
> but it can be used multiple times within context CTX
> and only evaluate EXP once.  */
>
> I do not know what the practical difference between a SAVE_EXPR
> and a temporary explicitly evaluated once (which you have now)
> is, except that you can free the temporary cleanly.

A SAVE_EXPR is turned into the latter - a temporary plus once-evaluated
side-effects - by the gimplifier.  It's sometimes more convenient to use
a SAVE_EXPR, but IMO an explicit temporary is preferred.  Note that
SAVE_EXPRs have to be used from the same "evaluation context":
when you use the SAVE_EXPR from two different statements, it's only
by chance that the temporary is initialized by the earlier one rather than
by the later one, in which case the earlier one would use it uninitialized.
That means SAVE_EXPRs are to be used from within a single (large) expression.
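A plain C analogue of the point above (not GCC's internal tree API): the leak in PR90069 came from evaluating an expression with side effects twice, and the fix evaluates it once into an explicit temporary that can then be freed. The functions make_name and name_len_twice are hypothetical stand-ins chosen for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int allocations; /* how many times make_name() has run */

/* Hypothetical stand-in for an expression with side effects: every call
   allocates a fresh buffer that the caller owns.  */
static char *make_name(void)
{
  allocations++;
  char *p = malloc(8);
  if (p != NULL)
    strcpy(p, "result");
  return p;
}

/* Needs the value twice, but evaluates make_name() exactly once by storing
   it in an explicit temporary, which can then be freed cleanly.  Writing
   strlen(make_name()) + strlen(make_name()) instead would allocate twice
   and leak both buffers.  */
static size_t name_len_twice(void)
{
  char *tmp = make_name();
  size_t n = tmp != NULL ? strlen(tmp) + strlen(tmp) : 0;
  free(tmp);
  return n;
}
```

The explicit temporary also makes the evaluation point unambiguous, which is the property a SAVE_EXPR only guarantees within a single expression.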

> > Anyway, regtested ok on Linux-x86_64-Fedora_39. Ok for master?
>
> Yes, this is fine from my side.  If you are inclined to backport
> to e.g. 14-branch after a grace period, that would be great.
>
> > This work is funded by the Sovereign Tech Fund. Yes, the funding has been
> > granted and Nicolas, Mikael and I will be working on some Fortran topics in
> > the next 12-18 months.
>
> This is really great news!
>
> > Regards,
> >   Andre
>
> Thanks for the patch!
>
> Harald
>
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de
>


Re: [patch] OpenMP: Add -fopenmp-force-usm mode

2024-05-28 Thread Tobias Burnus

Jakub Jelinek wrote:

> How is that option different from
> echo '#pragma omp requires unified_shared_memory' > omp-usm.h
> gcc -include omp-usm.h
> ?
> I mean with -include you can add anything you want, not just one particular
> directive, and adding a separate option for each is just weird.


For C/C++, -include seems to be indeed sufficient (albeit not widely 
known). For Fortran, there are two issues: One placement/semantic issue: 
it has to be added per "compilation unit", i.e. to the specification 
part of a module, subprogram or main program. And a practical issue, 
gfortran shows:


error: command-line option '-include !$omp requires' is valid for 
C/C++/ObjC/ObjC++ but not for Fortran


Thus, for Fortran it is still intrinsically useful – even if one can 
argue whether that feature is needed at all / whether it should be added 
as command-line argument.


Tobias


Ping^2: [PATCH 0/2] Fix two test failures with --enable-default-pie [PR70150]

2024-05-28 Thread Xi Ruoyao
Ping again.

On Mon, 2024-05-06 at 12:45 +0800, Xi Ruoyao wrote:
> In GCC 14.1-rc1, there are two new (compared to GCC 13) failures if
> the build is configured with --enable-default-pie.  Let's fix them.
> 
> Tested on x86_64-linux-gnu.  Ok for trunk and releases/gcc-14?
> 
> Xi Ruoyao (2):
>   i386: testsuite: Add -no-pie for pr113689-1.c [PR70150]
>   i386: testsuite: Adapt fentryname3.c for r14-811 change [PR70150]
> 
>  gcc/testsuite/gcc.target/i386/fentryname3.c | 3 +--
>  gcc/testsuite/gcc.target/i386/pr113689-1.c  | 2 +-
>  2 files changed, 2 insertions(+), 3 deletions(-)


-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCHv5] Optab: add isfinite_optab for __builtin_isfinite

2024-05-28 Thread HAO CHEN GUI
Hi,
  This patch adds an optab for __builtin_isfinite. The finite check can be
implemented on rs6000 by a single instruction, so it needs an optab through
which it can be expanded to a target-specific sequence of instructions.

  The subsequent patches will implement the expand on rs6000.

  Compared to the previous version, the main change is to specify that the
return value of the optab should be either 0 or 1.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652864.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
optab: Add isfinite_optab for isfinite builtin

gcc/
* builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
for isfinite builtin.
* optabs.def (isfinite_optab): New.
* doc/md.texi (isfinite): Document.


patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index f8d94c4b435..53e9d210541 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2459,8 +2459,10 @@ interclass_mathfn_icode (tree arg, tree fndecl)
   errno_set = true; builtin_optab = ilogb_optab; break;
 CASE_FLT_FN (BUILT_IN_ISINF):
   builtin_optab = isinf_optab; break;
-case BUILT_IN_ISNORMAL:
 case BUILT_IN_ISFINITE:
+  builtin_optab = isfinite_optab;
+  break;
+case BUILT_IN_ISNORMAL:
 CASE_FLT_FN (BUILT_IN_FINITE):
 case BUILT_IN_FINITED32:
 case BUILT_IN_FINITED64:
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5730bda80dc..3eb4216141e 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8557,6 +8557,12 @@ operand 2, greater than operand 2 or is unordered with 
operand 2.

 This pattern is not allowed to @code{FAIL}.

+@cindex @code{isfinite@var{m}2} instruction pattern
+@item @samp{isfinite@var{m}2}
+Return 1 if operand 1 is a finite floating point number and 0
+otherwise.  @var{m} is a scalar floating point mode.  Operand 0
+has mode @code{SImode}, and operand 1 has mode @var{m}.
+
 @end table

 @end ifset
diff --git a/gcc/optabs.def b/gcc/optabs.def
index ad14f9328b9..dcd77315c2a 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
 OPTAB_D (hypot_optab, "hypot$a3")
 OPTAB_D (ilogb_optab, "ilogb$a2")
 OPTAB_D (isinf_optab, "isinf$a2")
+OPTAB_D (isfinite_optab, "isfinite$a2")
 OPTAB_D (issignaling_optab, "issignaling$a2")
 OPTAB_D (ldexp_optab, "ldexp$a3")
 OPTAB_D (log10_optab, "log10$a2")
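The documented contract of isfinite<m>2 (return exactly 1 for a finite value and 0 otherwise) is stricter than the C99 isfinite macro, which only promises a nonzero value. A minimal sketch of those semantics for double, assuming IEEE 754 binary64; isfinite_bits and check_isfinite are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Returns exactly 1 for a finite double and 0 for +-Inf or NaN.  Assumes
   IEEE 754 binary64: a value is non-finite exactly when all 11 exponent
   bits are set.  */
static int isfinite_bits(double x)
{
  uint64_t u;
  memcpy(&u, &x, sizeof u);            /* well-defined type pun */
  return ((u >> 52) & 0x7ff) != 0x7ff; /* a comparison yields 0 or 1 */
}

/* The C99 isfinite macro only promises "nonzero" for finite values, so
   normalize it to 0/1 before comparing.  */
static void check_isfinite(double x)
{
  assert(isfinite_bits(x) == (isfinite(x) ? 1 : 0));
}
```

Normalizing to 0/1 in the pattern itself lets the expander use the result directly without an extra comparison against zero.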


Re: [PATCH v2] add explicit ABI and align options to pr88233.c

2024-05-28 Thread Alexandre Oliva
On May 26, 2024, "Kewen.Lin"  wrote:

> Hi,
> on 2024/4/22 17:38, Alexandre Oliva wrote:
>> Ping?
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566530.html
>> (modified version follows)

> Segher originated this test case, I was expecting he can chime in this. :)

Me too ;-)

>> We've observed failures of this test on powerpc configurations that
>> default to different calling conventions and alignment requirements.

> It seems that it was using the original "BE" and "LE" guards as proxies for
> ABIs; could you share some more on how you found this failure?  It seems
> that your test environment has -mstrict-align turned on by default, and
> also has an ABI which passes small struct return values in registers?

Exactly, AdaCore's ppc64-vx7r2 are configured so as to enable
-mstrict-align and -freg-struct-return by default.

But since these settings may change depending on the target variant, I
figured it would be useful to record what the assumptions are that the
test makes.  That one of these settings changed depending on endianness
and affected codegen was, to me, further evidence that this would be
useful, so, with the explicit settings, I could restore the original
test's expectations.



Re: [patch] OpenMP: Add -fopenmp-force-usm mode

2024-05-28 Thread Jakub Jelinek
On Wed, May 29, 2024 at 08:26:04AM +0200, Jakub Jelinek wrote:
> > *I am especially thinking about a global variable and "#pragma omp declare
> > target". At least with 'omp requires self_maps' of OpenMP 6, it seems as if
> > 'declare target enter(global_var)' should become 'link(global_var)' where
> > the global_var pointer is updated to point to the host version.
> 
> How is that option different from
> echo '#pragma omp requires unified_shared_memory' > omp-usm.h
> gcc -include omp-usm.h
> ?
> I mean with -include you can add anything you want, not just one particular
> directive, and adding a separate option for each is just weird.

I mean, if we want to add something, maybe better would be an -include-like
option that instead of including a file includes it directly.
gcc --include-inline '#pragma omp requires unified_shared_memory' ...

Jakub



Re: [patch] OpenMP: Add -fopenmp-force-usm mode

2024-05-28 Thread Jakub Jelinek
On Tue, May 28, 2024 at 09:23:41PM +0200, Tobias Burnus wrote:
> -fopenmp-force-usm can be useful for some badly written code. Explicitly
> using 'omp requires' makes more sense, but still. It might also make sense
> for testing purposes.
> 
> Unfortunately, I did not see a simple way of testing it. When trying it
> manually, I looked at the 'a.xamdgcn-amdhsa.c' -save-temps file, where
> gcn_data has the omp_requires_mask as second argument and testing showed
> that an explicit pragma and the -f... argument have the same result.
> 
> An alternative would be to move this code later, e.g. to lto-cgraph.cc's
> omp_requires_mask, which might be safer (as it avoids changing as many
> locations). On the other hand, it might require more special cases
> elsewhere.*
> 
> Comment, suggestions?
> 
> Tobias
> 
> *I am especially thinking about a global variable and "#pragma omp declare
> target". At least with 'omp requires self_maps' of OpenMP 6, it seems as if
> 'declare target enter(global_var)' should become 'link(global_var)' where
> the global_var pointer is updated to point to the host version.

How is that option different from
echo '#pragma omp requires unified_shared_memory' > omp-usm.h
gcc -include omp-usm.h
?
I mean with -include you can add anything you want, not just one particular
directive, and adding a separate option for each is just weird.

Jakub
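For reference, a minimal translation unit containing the directive that `-include omp-usm.h` would inject; scale_on_device is a hypothetical name. Without -fopenmp the pragmas are ignored and the loop simply runs on the host; with -fopenmp (assuming a GCC recent enough to accept the directive) and no device that supports unified shared memory, the target region falls back to the host. The absence of a map clause for the pointed-to data relies on unified shared memory.

```c
#include <assert.h>
#include <stdlib.h>

/* What -include omp-usm.h would add in front of the translation unit.  */
#pragma omp requires unified_shared_memory

/* Doubles every element of a host array inside a target region.  No map
   clause is given for a[]: under unified_shared_memory the device is
   assumed to be able to dereference host pointers directly.  */
static void scale_on_device(int *a, int n)
{
#pragma omp target teams distribute parallel for
  for (int i = 0; i < n; i++)
    a[i] *= 2;
}
```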



Re: [patch] libgomp: Enable USM for some nvptx devices

2024-05-28 Thread Tobias Burnus

Tobias Burnus wrote:
> While most of the nvptx systems I have access to don't have the support for
> CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES, one has:


Actually, CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS is sufficient. And 
I finally also found the proper webpage for this feature; I couldn't 
find it as Nvidia's documentation uses pageableMemoryAccess and not 
CU_... for that feature. The updated patch is attached.


For details: 
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements


In principle, this proper USM is supported by Grace Hopper, PowerPC9 + 
Volta (sm_70) – but for some reason, our PPC/Volta system does not 
support it. It is also said to work with Turing (sm_75) and newer when 
using the Linux kernel's HMM and the Open Kernel Modules (newer CUDA 
versions ship them but don't use them by default). See link above.


I am not quite sure whether there are unintended side effects, hence, 
I have not enabled support for it in general. In particular, 'declare 
target enter(global_var)' seems to be mishandled (I think it should be 
link + pointer updated to point to the host; cf. description for 
'self_maps'). Thus, it is not enabled by default but only when USM has 
been requested.

OK for mainline?
Comments? Remarks? Suggestions?

Tobias

PS: I guess some more USM tests should be added…
libgomp: Enable USM for some nvptx devices

A few high-end nvptx devices support the attribute
CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS; for those, unified shared
memory is supported in hardware. This patch enables support for those -
if all installed nvptx devices have this feature (as the capabilities
are per device type).

This exposes a bug in gomp_copy_back_icvs: it previously used
omp_get_mapped_ptr to find mapped variables, but that returns
the unchanged pointer in case of shared memory. However, in this case
we have a few actually mapped pointers - like the ICV variables.
Additionally, there was a mismatch with regard to '-1' for the
device number, as gomp_copy_back_icvs and omp_get_mapped_ptr count
differently. Hence, do the lookup manually.

include/ChangeLog:

	* cuda/cuda.h (CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS): Add.

libgomp/ChangeLog:

	* libgomp.texi (nvptx): Update USM description.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices):
	Claim support when requesting USM and all devices support 
	CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS.
	* target.c (gomp_copy_back_icvs): Fix device ptr lookup.
	(gomp_target_init): Set GOMP_OFFLOAD_CAP_SHARED_MEM if the
	devices support USM.

 include/cuda/cuda.h   |  3 ++-
 libgomp/libgomp.texi  |  7 +--
 libgomp/plugin/plugin-nvptx.c | 16 
 libgomp/target.c  | 24 +++-
 4 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index 0dca4b3a5c0..804d08ca57e 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -83,7 +83,8 @@ typedef enum {
   CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR = 39,
   CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT = 40,
   CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING = 41,
-  CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82
+  CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82,
+  CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS = 88
 } CUdevice_attribute;
 
 enum {
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 71d62105a20..ba534b6b3c4 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -6435,8 +6435,11 @@ The implementation remark:
   the next reverse offload region is only executed after the previous
   one returned.
 @item OpenMP code that has a @code{requires} directive with
-  @code{unified_shared_memory} will remove any nvptx device from the
-  list of available devices (``host fallback'').
+  @code{unified_shared_memory} will run on nvptx devices if and only if
+  all of those support the @code{pageableMemoryAccess} property;@footnote{
+  @uref{https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements}}
+  otherwise, all nvptx device are removed from the list of available
+  devices (``host fallback'').
 @item The default per-warp stack size is 128 kiB; see also @code{-msoft-stack}
   in the GCC manual.
 @item The OpenMP routines @code{omp_target_memcpy_rect} and
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 5aad3448a8d..d3764185d4b 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1201,8 +1201,24 @@ GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
   if (num_devices > 0
   && ((omp_requires_mask
 	   & ~(GOMP_REQUIRES_UNIFIED_ADDRESS
+	   | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY
 	   | GOMP_REQUIRES_REVERSE_OFFLOAD)) != 0))
 return -1;
+  /* Check whether host page access (direct or via migration) is supported;
+ if so, enable USM.  Currently, capabilities is per dev

Re: [PATCH-1, rs6000] Add a new type of CC mode - CCBCD for bcd insns [PR100736, PR114732]

2024-05-28 Thread Kewen.Lin
Hi,

on 2024/4/30 15:18, HAO CHEN GUI wrote:
> Hi,
>   It's the first patch of a series of patches optimizing CC modes on
> rs6000.
> 
>   BCD insns set all four bits of a CR field, but they have a different
> single-bit reverse behavior than CCFP. The fourth bit of a BCD CR field is
> used to indicate overflow or an invalid number; it's not a bit for an
> unordered test. So the "le" test should be reversed to "gt", not "ungt",
> and the "ge" test should be reversed to "lt", not "unlt". That's the root cause of PR100736
> and PR114732.
> 
>   This patch fixes the issue by adding a new type of CC mode - CCBCD for
> all bcd insns. Here a new setcc_rev pattern is added for ccbcd. It will
> be merged into a uniform pattern for all CC modes in a subsequent
> patch.

Thanks for doing this; adding one more CCmode specific to BCD looks
reasonable and makes the code clearer.

> 
>   The rtl code "unordered" is still used for testing overflow or
> invalid number. IMHO, the "unordered" on a CC mode can be considered as
> testing whether the fourth bit of a CR field is set. The "eq" on a CC mode
> can be considered as testing whether the third bit is set. Thus we avoid
> creating lots of unspecs for the CR bit testing.

I can understand that re-using "unordered" and "eq" saves some effort compared
to using unspecs, but they are actually RTL codes rather than bits of the
specific hardware CR.  A downside is that people who aren't aware of this
design point can misunderstand it when reading/checking the code or the
dumps; from this perspective, unspecs (with reasonable names) can be
more meaningful.  Normally adopting RTL codes is better since they have the
chance to be considered (optimized) in generic passes/code, but that isn't the
case here, as we just use the code itself but not with the same semantics
(meaning).  Looking forward to others' opinions on this; if we want to adopt
"unordered" and "eq" like this patch does, I think we should at least
emphasize such points in rs6000-modes.def.

> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?

Some minor comments are inlined, Segher did a lot of work on CC, I'm looking
forward to his review on this patch series. :)

> 
> Thanks
> Gui Haochen
> 
> 
> ChangeLog
> rs6000: Add a new type of CC mode - CCBCD for bcd insns
> 
> gcc/
>   PR target/100736
>   PR target/114732
>   * config/rs6000/altivec.md (bcd_): Replace CCFP
>   with CCBCD.
>   (*bcd_test_): Likewise.
>   (*bcd_test2_): Likewise.
>   (bcd__): Likewise.
>   (*bcdinvalid_): Likewise.
>   (bcdinvalid_): Likewise.
>   (bcdshift_v16qi): Likewise.
>   (bcdmul10_v16qi): Likewise.
>   (bcddiv10_v16qi): Likewise.
>   (peephole for bcd_add/sub): Likewise.
>   * config/rs6000/predicates.md (branch_comparison_operator): Add CCBCD
>   and its supported comparison codes.
>   * config/rs6000/rs6000-modes.def (CC_MODE): Add CCBCD.
>   * config/rs6000/rs6000.cc (validate_condition_mode): Add CCBCD
>   assertion.
>   * config/rs6000/rs6000.md (CC_any): Add CCBCD.
>   (ccbcd_rev): New code iterator.
>   (*_cc): New insn and split pattern for CCBCD reverse
>   compare.
> 
> gcc/testsuite/
>   PR target/100736
>   PR target/114732
>   * gcc.target/powerpc/pr100736.c: New.
>   * gcc.target/powerpc/pr114732.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index bb20441c096..9fa8cf89f61 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -4443,7 +4443,7 @@ (define_insn "bcd_"
> (match_operand:VBCD 2 "register_operand" "v")
> (match_operand:QI 3 "const_0_to_1_operand" "n")]
>UNSPEC_BCD_ADD_SUB))
> -   (clobber (reg:CCFP CR6_REGNO))]
> +   (clobber (reg:CCBCD CR6_REGNO))]
>"TARGET_P8_VECTOR"
>"bcd. %0,%1,%2,%3"
>[(set_attr "type" "vecsimple")])
> @@ -4454,8 +4454,8 @@ (define_insn "bcd_"
>  ;; probably should be one that can go in the VMX (Altivec) registers, so we
>  ;; can't use DDmode or DFmode.

Here is a paragraph of comments above:

;; Use a floating point type (V2DFmode) for the compare to set CR6 so that we
;; can use the unordered test for BCD nans and add/subtracts that overflow.  An
;; UNORDERED test on an integer type (like V1TImode) is not defined.  The type
;; probably should be one that can go in the VMX (Altivec) registers, so we
;; can't use DDmode or DFmode.

Does it still hold?  It's not obvious where the code checks that an unordered
test must be on an FP type (mode); if it still takes effect, "unspec" would help
to get rid of this restriction.  Otherwise, this comment should be updated
and we can drop this workaround with V2DF here.

>  (define_insn "*bcd_test_"
> -  [(set (reg:CCFP CR6_REGNO)
> - (compare:CCFP
> +  [(set (reg:CCBCD CR6_REGNO)
> + (compare:CCBCD
>(unspec:V2DF [(match_operand:VBCD 1 "regi
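To make the reversal argument for PR100736/PR114732 concrete, here is a toy C model of one CR field. It is a sketch under stated assumptions, not rs6000 code: it assumes a BCD insn sets one of LT/GT/EQ from the sign of the result and sets bit 3 independently for overflow or an invalid operand, and that "le" on such a field is tested as the single condition "GT clear". The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one 4-bit CR field as a BCD insn might set it.  */
struct cr_field
{
  bool lt, gt, eq, bit3;
};

/* "le" evaluated as a single-bit test: the GT bit is clear.  */
static bool test_le(struct cr_field c)
{
  return !c.gt;
}

/* CCFP-style reversal: "ungt" treats bit 3 as "unordered".  */
static bool reverse_as_ungt(struct cr_field c)
{
  return c.gt || c.bit3;
}

/* CCBCD-style reversal as proposed by the patch: plain "gt".  */
static bool reverse_as_gt(struct cr_field c)
{
  return c.gt;
}
```

For a negative result that also overflowed (LT and bit 3 set), "le" is true, so its reverse must be false: "gt" gives false, while "ungt" gives true, which is the kind of wrong branch the PRs describe.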

[PATCH v3 6/8] [APX NF] Support APX NF for shld/shrd

2024-05-28 Thread Kong, Lingling
gcc/ChangeLog:

* config/i386/i386.md (x86_64_shld_nf): New define_insn.
(x86_64_shld_ndd_nf): Ditto.
(x86_64_shld_1_nf): Ditto.
(x86_64_shld_ndd_1_nf): Ditto.
(*x86_64_shld_shrd_1_nozext_nf): Ditto.
(x86_shld_nf): Ditto.
(x86_shld_ndd_nf): Ditto.
(x86_shld_1_nf): Ditto.
(x86_shld_ndd_1_nf): Ditto.
(*x86_shld_shrd_1_nozext_nf): Ditto.
(3_doubleword_lowpart_nf): Ditto.
(x86_64_shrd_nf): Ditto.
(x86_64_shrd_ndd_nf): Ditto.
(x86_64_shrd_1_nf): Ditto.
(x86_64_shrd_ndd_1_nf): Ditto.
(*x86_64_shrd_shld_1_nozext_nf): Ditto.
(x86_shrd_nf): Ditto.
(x86_shrd_ndd_nf): Ditto.
(x86_shrd_1_nf): Ditto.
(x86_shrd_ndd_1_nf): Ditto.
(*x86_shrd_shld_1_nozext_nf): Ditto.
---
 gcc/config/i386/i386.md | 377 +++-
 1 file changed, 296 insertions(+), 81 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9d518e90d07..719cce7d3ef 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14551,7 +14551,7 @@
   DONE;
 })
 
-(define_insn "x86_64_shld"
+(define_insn "x86_64_shld"
   [(set (match_operand:DI 0 "nonimmediate_operand" "+r*m")
 (ior:DI (ashift:DI (match_dup 0)
  (and:QI (match_operand:QI 2 "nonmemory_operand" "Jc")
@@ -14561,10 +14561,9 @@
(zero_extend:TI
  (match_operand:DI 1 "register_operand" "r"))
(minus:QI (const_int 64)
- (and:QI (match_dup 2) (const_int 63 0)))
-   (clobber (reg:CC FLAGS_REG))]
-  "TARGET_64BIT"
-  "shld{q}\t{%2, %1, %0|%0, %1, %2}"
+ (and:QI (match_dup 2) (const_int 63 0)))]
+  "TARGET_64BIT && "
+  "shld{q}\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "ishift")
(set_attr "prefix_0f" "1")
(set_attr "mode" "DI")
@@ -14572,7 +14571,7 @@
(set_attr "amdfam10_decode" "vector")
(set_attr "bdver1_decode" "vector")])
 
-(define_insn "x86_64_shld_ndd"
+(define_insn "x86_64_shld_ndd"
   [(set (match_operand:DI 0 "register_operand" "=r")
 (ior:DI (ashift:DI (match_operand:DI 1 "nonimmediate_operand" "rm")
  (and:QI (match_operand:QI 3 "nonmemory_operand" "Jc")
@@ -14582,14 +14581,13 @@
(zero_extend:TI
  (match_operand:DI 2 "register_operand" "r"))
(minus:QI (const_int 64)
- (and:QI (match_dup 3) (const_int 63 0)))
-   (clobber (reg:CC FLAGS_REG))]
-  "TARGET_APX_NDD"
-  "shld{q}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+ (and:QI (match_dup 3) (const_int 63 0)))]
+  "TARGET_APX_NDD && "
+  "shld{q}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ishift")
(set_attr "mode" "DI")])
 
-(define_insn "x86_64_shld_1"
+(define_insn "x86_64_shld_1"
   [(set (match_operand:DI 0 "nonimmediate_operand" "+r*m")
 (ior:DI (ashift:DI (match_dup 0)
   (match_operand:QI 2 "const_0_to_63_operand"))
@@ -14597,11 +14595,11 @@
  (lshiftrt:TI
(zero_extend:TI
  (match_operand:DI 1 "register_operand" "r"))
-   (match_operand:QI 3 "const_0_to_255_operand")) 0)))
-   (clobber (reg:CC FLAGS_REG))]
+   (match_operand:QI 3 "const_0_to_255_operand")) 0)))]
   "TARGET_64BIT
-   && INTVAL (operands[3]) == 64 - INTVAL (operands[2])"
-  "shld{q}\t{%2, %1, %0|%0, %1, %2}"
+   && INTVAL (operands[3]) == 64 - INTVAL (operands[2])
+   && "
+  "shld{q}\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "ishift")
(set_attr "prefix_0f" "1")
(set_attr "mode" "DI")
@@ -14610,7 +14608,7 @@
(set_attr "amdfam10_decode" "vector")
(set_attr "bdver1_decode" "vector")])
 
-(define_insn "x86_64_shld_ndd_1"
+(define_insn "x86_64_shld_ndd_1"
   [(set (match_operand:DI 0 "register_operand" "=r")
 (ior:DI (ashift:DI (match_operand:DI 1 "nonimmediate_operand" "rm")
   (match_operand:QI 3 "const_0_to_63_operand"))
@@ -14618,15 +14616,66 @@
  (lshiftrt:TI
(zero_extend:TI
  (match_operand:DI 2 "register_operand" "r"))
-   (match_operand:QI 4 "const_0_to_255_operand")) 0)))
-   (clobber (reg:CC FLAGS_REG))]
+   (match_operand:QI 4 "const_0_to_255_operand")) 0)))]
   "TARGET_APX_NDD
-   && INTVAL (operands[4]) == 64 - INTVAL (operands[3])"
-  "shld{q}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+   && INTVAL (operands[4]) == 64 - INTVAL (operands[3])
+   && "
+  "shld{q}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ishift")
(set_attr "mode" "DI")
(set_attr "length_immediate" "1")])
 
+(define_insn_and_split "*x86_64_shld_shrd_1_nozext_nf"
+  [(set (match_operand:DI 0 "nonimmediate_operand")
+   (ior:DI (ashift:DI (match_operand:DI 4 "nonimmediate_operand")
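The `x86_64_shld*` patterns above match a specific RTL shape. The following is a small C++ model of that shape. It is an illustrative sketch, not code from the patch: `shld64` is a hypothetical name, and `unsigned __int128` is the GCC/Clang extension standing in for TImode. Zero-extending `src` to 128 bits, as `zero_extend:TI` does, keeps the right shift by `64 - count` well defined even when the count is 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper modelling the RTL of the x86_64_shld patterns:
//   dst = (dst << (n & 63)) | lowpart ((zero_extend:TI src) >> (64 - (n & 63)))
std::uint64_t shld64(std::uint64_t dst, std::uint64_t src, unsigned n) {
  unsigned count = n & 63;           // the (and:QI ... (const_int 63)) masking
  unsigned __int128 wide = src;      // zero_extend:TI
  // A 128-bit value may be shifted by 64, so count == 0 simply yields 0 here.
  std::uint64_t low = static_cast<std::uint64_t>(wide >> (64 - count));  // subreg:DI ... 0
  return (dst << count) | low;
}
```

For example, `shld64(1, 0x8000000000000000, 1)` shifts the top bit of `src` into bit 0 of the result, which gives 3.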

[PATCH v3 7/8] [APX NF] Support APX NF for mul/div

2024-05-28 Thread Kong, Lingling
gcc/ChangeLog:

* config/i386/i386.md (*mul3_1_nf): New define_insn.
(*mulqi3_1_nf): Ditto.
(*divmod4_noext_nf): Ditto.
(divmodhiqi3_nf): Ditto.
---
 gcc/config/i386/i386.md | 47 ++---
 1 file changed, 30 insertions(+), 17 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 719cce7d3ef..e688e92785e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9898,17 +9898,17 @@
 ;;
 ;; On BDVER1, all HI MULs use DoublePath
 
-(define_insn "*mul3_1"
+(define_insn "*mul3_1"
   [(set (match_operand:SWIM248 0 "register_operand" "=r,r,r")
(mult:SWIM248
  (match_operand:SWIM248 1 "nonimmediate_operand" "%rm,rm,0")
- (match_operand:SWIM248 2 "" "K,,r")))
-   (clobber (reg:CC FLAGS_REG))]
-  "!(MEM_P (operands[1]) && MEM_P (operands[2]))"
+ (match_operand:SWIM248 2 "" "K,,r")))]
+  "!(MEM_P (operands[1]) && MEM_P (operands[2]))
+   && "
   "@
-   imul{}\t{%2, %1, %0|%0, %1, %2}
-   imul{}\t{%2, %1, %0|%0, %1, %2}
-   imul{}\t{%2, %0|%0, %2}"
+   imul{}\t{%2, %1, %0|%0, %1, %2}
+   imul{}\t{%2, %1, %0|%0, %1, %2}
+   imul{}\t{%2, %0|%0, %2}"
   [(set_attr "type" "imul")
(set_attr "prefix_0f" "0,0,1")
(set (attr "athlon_decode")
@@ -9969,14 +9969,14 @@
 ;; MUL reg8Direct
 ;; MUL mem8Direct
 
-(define_insn "*mulqi3_1"
+(define_insn "*mulqi3_1"
   [(set (match_operand:QI 0 "register_operand" "=a")
(mult:QI (match_operand:QI 1 "nonimmediate_operand" "%0")
-(match_operand:QI 2 "nonimmediate_operand" "qm")))
-   (clobber (reg:CC FLAGS_REG))]
+(match_operand:QI 2 "nonimmediate_operand" "qm")))]
   "TARGET_QIMODE_MATH
-   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
-  "mul{b}\t%2"
+   && !(MEM_P (operands[1]) && MEM_P (operands[2]))
+   && "
+  "mul{b}\t%2"
   [(set_attr "type" "imul")
(set_attr "length_immediate" "0")
(set (attr "athlon_decode")
@@ -9,6 +9,19 @@
   [(set_attr "type" "multi")
(set_attr "mode" "SI")])
 
+(define_insn "*divmod4_noext_nf"
+  [(set (match_operand:SWIM248 0 "register_operand" "=a")
+   (any_div:SWIM248
+ (match_operand:SWIM248 2 "register_operand" "0")
+ (match_operand:SWIM248 3 "nonimmediate_operand" "rm")))
+   (set (match_operand:SWIM248 1 "register_operand" "=d")
+   (:SWIM248 (match_dup 2) (match_dup 3)))
+   (use (match_operand:SWIM248 4 "register_operand" "1"))]
+  "TARGET_APX_NF"
+  "%{nf%} div{}\t%3"
+  [(set_attr "type" "idiv")
+   (set_attr "mode" "")])
+
 (define_insn "*divmod4_noext"
   [(set (match_operand:SWIM248 0 "register_operand" "=a")
(any_div:SWIM248
@@ -11266,7 +11279,7 @@
 ;; Change div/mod to HImode and extend the second argument to HImode
 ;; so that mode of div/mod matches with mode of arguments.  Otherwise
 ;; combine may fail.
-(define_insn "divmodhiqi3"
+(define_insn "divmodhiqi3"
   [(set (match_operand:HI 0 "register_operand" "=a")
(ior:HI
  (ashift:HI
@@ -11278,10 +11291,10 @@
(const_int 8))
  (zero_extend:HI
(truncate:QI
- (div:HI (match_dup 1) (any_extend:HI (match_dup 2)))
-   (clobber (reg:CC FLAGS_REG))]
-  "TARGET_QIMODE_MATH"
-  "div{b}\t%2"
+ (div:HI (match_dup 1) (any_extend:HI (match_dup 2)))]
+  "TARGET_QIMODE_MATH
+   && "
+  "div{b}\t%2"
   [(set_attr "type" "idiv")
(set_attr "mode" "QI")])
 
-- 
2.31.1
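The `divmodhiqi3` pattern above packs both results of an 8-bit division into a single HImode register. Below is a minimal C++ sketch of that layout for the unsigned case. The helper name is hypothetical. The sketch assumes a non-zero divisor and a quotient that fits in 8 bits, because the hardware raises #DE otherwise.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the unsigned form of divmodhiqi3: an 8-bit DIV divides
// the 16-bit AX by an 8-bit operand, leaving the quotient in AL (low byte) and
// the remainder in AH (high byte).
std::uint16_t divmod_hi_qi(std::uint16_t ax, std::uint8_t divisor) {
  std::uint16_t d = divisor;                        // any_extend:HI (zero_extend here)
  auto quot = static_cast<std::uint8_t>(ax / d);    // truncate:QI (div:HI ...)
  auto rem = static_cast<std::uint8_t>(ax % d);     // truncate:QI (mod:HI ...)
  return static_cast<std::uint16_t>((rem << 8) | quot);  // (ior (ashift rem 8) quot)
}
```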



[PATCH v3 4/8] [APX NF] Support APX NF for right shift insns

2024-05-28 Thread Kong, Lingling
gcc/ChangeLog:

* config/i386/i386.md (*ashr3_1_nf): New.
(*lshr3_1_nf): Ditto.
(*lshrqi3_1_nf): Ditto.
(*lshrhi3_1_nf): Ditto.
---
 gcc/config/i386/i386.md | 82 +++--
 1 file changed, 46 insertions(+), 36 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 4c06c243cc3..d10caf04fcc 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16323,13 +16323,13 @@
   [(set_attr "type" "ishiftx")
(set_attr "mode" "")])
 
-(define_insn "*ashr3_1"
+(define_insn "*ashr3_1"
   [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
(ashiftrt:SWI48
  (match_operand:SWI48 1 "nonimmediate_operand" "0,rm,rm")
- (match_operand:QI 2 "nonmemory_operand" "c,r,c")))
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFTRT, mode, operands, TARGET_APX_NDD)"
+ (match_operand:QI 2 "nonmemory_operand" "c,r,c")))]
+  "ix86_binary_operator_ok (ASHIFTRT, mode, operands, TARGET_APX_NDD)
+   && "
 {
   bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   switch (get_attr_type (insn))
@@ -16340,11 +16340,11 @@
 default:
   if (operands[2] == const1_rtx
  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
- && !use_ndd)
+ && !use_ndd && !)
return "sar{}\t%0";
   else
-   return use_ndd ? "sar{}\t{%2, %1, %0|%0, %1, %2}"
-  : "sar{}\t{%2, %0|%0, %2}";
+   return use_ndd ? "sar{}\t{%2, %1, %0|%0, %1, %2}"
+  : "sar{}\t{%2, %0|%0, %2}";
 }
 }
   [(set_attr "isa" "*,bmi2,apx_ndd")
@@ -16384,14 +16384,13 @@
 }
 [(set_attr "isa" "*,*,*,apx_ndd")])
 
-
-(define_insn "*lshr3_1"
+(define_insn "*lshr3_1"
   [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,?k,r")
(lshiftrt:SWI48
  (match_operand:SWI48 1 "nonimmediate_operand" "0,rm,k,rm")
- (match_operand:QI 2 "nonmemory_operand" "c,r,,c")))
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (LSHIFTRT, mode, operands, TARGET_APX_NDD)"
+ (match_operand:QI 2 "nonmemory_operand" "c,r,,c")))]
+  "ix86_binary_operator_ok (LSHIFTRT, mode, operands, TARGET_APX_NDD)
+   && "
 {
   bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   switch (get_attr_type (insn))
@@ -16403,11 +16402,11 @@
 default:
   if (operands[2] == const1_rtx
  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
- && !use_ndd)
+ && !use_ndd && !)
return "shr{}\t%0";
   else
-   return use_ndd ? "shr{}\t{%2, %1, %0|%0, %1, %2}"
-  : "shr{}\t{%2, %0|%0, %2}";
+   return use_ndd ? "shr{}\t{%2, %1, %0|%0, %1, %2}"
+  : "shr{}\t{%2, %0|%0, %2}";
 }
 }
   [(set_attr "isa" "*,bmi2,avx512bw,apx_ndd")
@@ -16423,6 +16422,17 @@
(set_attr "mode" "")])
 
 ;; Convert shift to the shiftx pattern to avoid flags dependency.
+;; NF/NDD forms only accept the shift count in CL (constraint c), not in
+;; an arbitrary register (r), and they do not set flags.
+(define_split
+  [(set (match_operand:SWI48 0 "register_operand")
+   (any_shiftrt:SWI48 (match_operand:SWI48 1 "nonimmediate_operand")
+  (match_operand:QI 2 "register_operand")))]
+  "TARGET_BMI2 && reload_completed"
+  [(set (match_dup 0)
+   (any_shiftrt:SWI48 (match_dup 1) (match_dup 2)))]
+  "operands[2] = gen_lowpart (mode, operands[2]);")
+
 (define_split
   [(set (match_operand:SWI48 0 "register_operand")
(any_shiftrt:SWI48 (match_operand:SWI48 1 "nonimmediate_operand")
@@ -16491,22 +16501,22 @@
(zero_extend:DI (any_shiftrt:SI (match_dup 1) (match_dup 2]
   "operands[2] = gen_lowpart (SImode, operands[2]);")
 
-(define_insn "*ashr3_1"
+(define_insn "*ashr3_1"
   [(set (match_operand:SWI12 0 "nonimmediate_operand" "=m, r")
(ashiftrt:SWI12
  (match_operand:SWI12 1 "nonimmediate_operand" "0, rm")
- (match_operand:QI 2 "nonmemory_operand" "c, c")))
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFTRT, mode, operands, TARGET_APX_NDD)"
+ (match_operand:QI 2 "nonmemory_operand" "c, c")))]
+  "ix86_binary_operator_ok (ASHIFTRT, mode, operands, TARGET_APX_NDD)
+   && "
 {
   bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   if (operands[2] == const1_rtx
   && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
-  && !use_ndd)
+  && !use_ndd && !)
 return "sar{}\t%0";
   else
-return use_ndd ? "sar{}\t{%2, %1, %0|%0, %1, %2}"
-  : "sar{}\t{%2, %0|%0, %2}";
+return use_ndd ? "sar{}\t{%2, %1, %0|%0, %1, %2}"
+  : "sar{}\t{%2, %0|%0, %2}";
 }
   [(set_attr "isa" "*, apx_ndd")
(set_attr "type" "ishift")
@@ -16519,13 +16529,13 @@
(const_string "*")))
(set_attr "mode" "")])
 
-(define_insn "*lshrqi3_1"
+(define_insn "*lshrqi3_1"
   [(set (match_operand:QI 0 "nonimmediate_operand"  "=qm,?k,r")
(lshiftrt:QI
  (match_operand:QI

[PATCH v3 8/8] [APX NF] Support APX NF for lzcnt/tzcnt/popcnt

2024-05-28 Thread Kong, Lingling
gcc/ChangeLog:

* config/i386/i386.md (clz2_lzcnt_nf): New define_insn.
(*clz2_lzcnt_falsedep_nf): Ditto.
(__nf): Ditto.
(*__falsedep_nf): Ditto.
(_hi_nf): Ditto.
(popcount2_nf): Ditto.
(*popcount2_falsedep_nf): Ditto.
(popcounthi2_nf): Ditto.
---
 gcc/config/i386/i386.md | 124 
 1 file changed, 113 insertions(+), 11 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e688e92785e..b0eb497cd23 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -20269,6 +20269,24 @@
   operands[3] = gen_reg_rtx (mode);
 })
 
+(define_insn_and_split "clz2_lzcnt_nf"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+   (clz:SWI48
+ (match_operand:SWI48 1 "nonimmediate_operand" "rm")))]
+  "TARGET_APX_NF && TARGET_LZCNT"
+  "%{nf%} lzcnt{}\t{%1, %0|%0, %1}"
+  "&& TARGET_AVOID_FALSE_DEP_FOR_BMI && epilogue_completed
+   && optimize_function_for_speed_p (cfun)
+   && !reg_mentioned_p (operands[0], operands[1])"
+  [(parallel
+[(set (match_dup 0)
+ (clz:SWI48 (match_dup 1)))
+ (unspec [(match_dup 0)] UNSPEC_INSN_FALSE_DEP)])]
+  "ix86_expand_clear (operands[0]);"
+  [(set_attr "prefix_rep" "1")
+   (set_attr "type" "bitmanip")
+   (set_attr "mode" "")])
+
 (define_insn_and_split "clz2_lzcnt"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
(clz:SWI48
@@ -20292,6 +20310,18 @@
 ; False dependency happens when destination is only updated by tzcnt,
 ; lzcnt or popcnt.  There is no false dependency when destination is
 ; also used in source.
+(define_insn "*clz2_lzcnt_falsedep_nf"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+   (clz:SWI48
+ (match_operand:SWI48 1 "nonimmediate_operand" "rm")))
+   (unspec [(match_operand:SWI48 2 "register_operand" "0")]
+  UNSPEC_INSN_FALSE_DEP)]
+  "TARGET_APX_NF && TARGET_LZCNT"
+  "%{nf%} lzcnt{}\t{%1, %0|%0, %1}"
+  [(set_attr "prefix_rep" "1")
+   (set_attr "type" "bitmanip")
+   (set_attr "mode" "")])
+
 (define_insn "*clz2_lzcnt_falsedep"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
(clz:SWI48
@@ -20398,6 +20428,25 @@
 ;; Version of lzcnt/tzcnt that is expanded from intrinsics.  This version
 ;; provides operand size as output when source operand is zero. 
 
+(define_insn_and_split "__nf"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+   (unspec:SWI48
+ [(match_operand:SWI48 1 "nonimmediate_operand" "rm")] LT_ZCNT))]
+  "TARGET_APX_NF"
+  "%{nf%} {}\t{%1, %0|%0, %1}"
+  "&& TARGET_AVOID_FALSE_DEP_FOR_BMI && epilogue_completed
+   && optimize_function_for_speed_p (cfun)
+   && !reg_mentioned_p (operands[0], operands[1])"
+  [(parallel
+[(set (match_dup 0)
+ (unspec:SWI48 [(match_dup 1)] LT_ZCNT))
+ (unspec [(match_dup 0)] UNSPEC_INSN_FALSE_DEP)])]
+  "ix86_expand_clear (operands[0]);"
+  [(set_attr "type" "")
+   (set_attr "prefix_0f" "1")
+   (set_attr "prefix_rep" "1")
+   (set_attr "mode" "")])
+
 (define_insn_and_split "_"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
(unspec:SWI48
@@ -20422,6 +20471,19 @@
 ; False dependency happens when destination is only updated by tzcnt,
 ; lzcnt or popcnt.  There is no false dependency when destination is
 ; also used in source.
+(define_insn "*__falsedep_nf"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+   (unspec:SWI48
+ [(match_operand:SWI48 1 "nonimmediate_operand" "rm")] LT_ZCNT))
+   (unspec [(match_operand:SWI48 2 "register_operand" "0")]
+  UNSPEC_INSN_FALSE_DEP)]
+  "TARGET_APX_NF"
+  "%{nf%} {}\t{%1, %0|%0, %1}"
+  [(set_attr "type" "")
+   (set_attr "prefix_0f" "1")
+   (set_attr "prefix_rep" "1")
+   (set_attr "mode" "")])
+
 (define_insn "*__falsedep"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
(unspec:SWI48
@@ -20436,13 +20498,12 @@
(set_attr "prefix_rep" "1")
(set_attr "mode" "")])
 
-(define_insn "_hi"
+(define_insn "_hi"
   [(set (match_operand:HI 0 "register_operand" "=r")
(unspec:HI
- [(match_operand:HI 1 "nonimmediate_operand" "rm")] LT_ZCNT))
-   (clobber (reg:CC FLAGS_REG))]
-  ""
-  "{w}\t{%1, %0|%0, %1}"
+ [(match_operand:HI 1 "nonimmediate_operand" "rm")] LT_ZCNT))]
+  ""
+  "{w}\t{%1, %0|%0, %1}"
   [(set_attr "type" "")
(set_attr "prefix_0f" "1")
(set_attr "prefix_rep" "1")
@@ -20860,6 +20921,30 @@
   [(set_attr "type" "bitmanip")
(set_attr "mode" "")])
 
+(define_insn_and_split "popcount2_nf"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+   (popcount:SWI48
+ (match_operand:SWI48 1 "nonimmediate_operand" "rm")))]
+  "TARGET_APX_NF && TARGET_POPCNT"
+{
+#if TARGET_MACHO
+  return "%{nf%} popcnt\t{%1, %0|%0, %1}";
+#else
+  return "%{nf%} popcnt{}\t{%1, %0|%0, %1}";
+#endif
+}
+  "&& TARGET_AVOID_FALSE_DEP_FOR_BMI && epilogue_completed
+   && optimize_function_for_speed_p (cfun)
+   && !reg_mentione
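A comment in this patch notes that the intrinsic form of lzcnt/tzcnt returns the operand size when the source is zero. The C++ reference model below makes that contract concrete. The helper names are hypothetical. The model describes only the result, which the `{nf}` forms leave unchanged, because NF suppresses only the flag updates.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference model of 64-bit LZCNT: the number of leading zero
// bits, or 64 (the operand size) when the input is zero.
unsigned lzcnt64_model(std::uint64_t x) {
  unsigned n = 0;
  for (std::uint64_t bit = std::uint64_t{1} << 63; bit != 0 && !(x & bit); bit >>= 1)
    ++n;
  return n;
}

// Same idea for TZCNT: the number of trailing zero bits, or 64 for zero.
unsigned tzcnt64_model(std::uint64_t x) {
  unsigned n = 0;
  for (std::uint64_t bit = 1; bit != 0 && !(x & bit); bit <<= 1)
    ++n;
  return n;
}
```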

[PATCH v3 5/8] [APX NF] Support APX NF for rotate insns

2024-05-28 Thread Kong, Lingling
gcc/ChangeLog:

* config/i386/i386.md (ashr3_cvt_nf): New define_insn.
(*3_1_nf): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-nf.c: Add NF test for rotate insns.
---
 gcc/config/i386/i386.md| 59 +-
 gcc/testsuite/gcc.target/i386/apx-nf.c |  5 +++
 2 files changed, 43 insertions(+), 21 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d10caf04fcc..9d518e90d07 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16245,19 +16245,19 @@
 (define_mode_attr cvt_mnemonic
   [(SI "{cltd|cdq}") (DI "{cqto|cqo}")])
 
-(define_insn "ashr3_cvt"
+(define_insn "ashr3_cvt"
   [(set (match_operand:SWI48 0 "nonimmediate_operand" "=*d,rm,r")
(ashiftrt:SWI48
  (match_operand:SWI48 1 "nonimmediate_operand" "*a,0,rm")
- (match_operand:QI 2 "const_int_operand")))
-   (clobber (reg:CC FLAGS_REG))]
+ (match_operand:QI 2 "const_int_operand")))]
   "INTVAL (operands[2]) == GET_MODE_BITSIZE (mode)-1
&& (TARGET_USE_CLTD || optimize_function_for_size_p (cfun))
-   && ix86_binary_operator_ok (ASHIFTRT, mode, operands, TARGET_APX_NDD)"
+   && ix86_binary_operator_ok (ASHIFTRT, mode, operands, TARGET_APX_NDD)
+   && "
   "@

-   sar{}\t{%2, %0|%0, %2}
-   sar{}\t{%2, %1, %0|%0, %1, %2}"
+   sar{}\t{%2, %0|%0, %2}
+   sar{}\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "*,*,apx_ndd")
(set_attr "type" "imovx,ishift,ishift")
(set_attr "prefix_0f" "0,*,*")
@@ -17109,28 +17109,31 @@
   [(set_attr "type" "rotatex")
(set_attr "mode" "")])
 
-(define_insn "*3_1"
+(define_insn "*3_1"
   [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
(any_rotate:SWI48
  (match_operand:SWI48 1 "nonimmediate_operand" "0,rm,rm")
- (match_operand:QI 2 "nonmemory_operand" "c,,c")))
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (, mode, operands, TARGET_APX_NDD)"
+ (match_operand:QI 2 "nonmemory_operand" "c,,c")))]
+  "ix86_binary_operator_ok (, mode, operands, TARGET_APX_NDD)
+   && "
 {
   bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   switch (get_attr_type (insn))
 {
 case TYPE_ROTATEX:
-  return "#";
+  if (TARGET_APX_NDD && )
+   return "%{nf%} {}\t{%2, %1, %0|%0, %1, %2}";
+  else
+   return "#";
 
 default:
   if (operands[2] == const1_rtx
  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
- && !use_ndd)
+ && !use_ndd && !)
return "{}\t%0";
   else
-   return use_ndd ? "{}\t{%2, %1, %0|%0, %1, %2}"
-  : "{}\t{%2, %0|%0, %2}";
+   return use_ndd ? "{}\t{%2, %1, %0|%0, %1, %2}"
+  : "{}\t{%2, %0|%0, %2}";
 }
 }
   [(set_attr "isa" "*,bmi2,apx_ndd")
@@ -17164,6 +17167,20 @@
   operands[2] = GEN_INT ((bitsize - INTVAL (operands[2])) % bitsize);
 })
 
+(define_split
+  [(set (match_operand:SWI48 0 "register_operand")
+   (rotate:SWI48 (match_operand:SWI48 1 "nonimmediate_operand")
+ (match_operand:QI 2 "const_int_operand")))]
+  "TARGET_BMI2 && reload_completed && !optimize_function_for_size_p (cfun)
+   && !TARGET_APX_NDD"
+  [(set (match_dup 0)
+   (rotatert:SWI48 (match_dup 1) (match_dup 2)))]
+{
+  int bitsize = GET_MODE_BITSIZE (mode);
+
+  operands[2] = GEN_INT ((bitsize - INTVAL (operands[2])) % bitsize);
+})
+
 (define_split
   [(set (match_operand:SWI48 0 "register_operand")
(rotatert:SWI48 (match_operand:SWI48 1 "nonimmediate_operand")
@@ -17251,22 +17268,22 @@
   [(set (match_dup 0)
(zero_extend:DI (rotatert:SI (match_dup 1) (match_dup 2])
 
-(define_insn "*3_1"
+(define_insn "*3_1"
   [(set (match_operand:SWI12 0 "nonimmediate_operand" "=m,r")
(any_rotate:SWI12 (match_operand:SWI12 1 "nonimmediate_operand" "0,rm")
- (match_operand:QI 2 "nonmemory_operand" "c,c")))
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (, mode, operands, TARGET_APX_NDD)"
+ (match_operand:QI 2 "nonmemory_operand" "c,c")))]
+  "ix86_binary_operator_ok (, mode, operands, TARGET_APX_NDD)
+   && "
 {
   bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   if (operands[2] == const1_rtx
   && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
-  && !use_ndd)
+  && !use_ndd && !)
 return "{}\t%0";
   else
 return use_ndd
-  ? "{}\t{%2, %1, %0|%0, %1, %2}"
-  : "{}\t{%2, %0|%0, %2}";
+  ? "{}\t{%2, %1, %0|%0, %1, %2}"
+  : "{}\t{%2, %0|%0, %2}";
 }
   [(set_attr "isa" "*,apx_ndd")
(set_attr "type" "rotate")
diff --git a/gcc/testsuite/gcc.target/i386/apx-nf.c b/gcc/testsuite/gcc.target/i386/apx-nf.c
index f33a994f0b7..ed859b399b8 100644
--- a/gcc/testsuite/gcc.target/i386/apx-nf.c
+++ b/gcc/testsuite/gcc.target/i386/apx-nf.c
@@ -2,6 +2,7 @@
 /* { dg-options "-mapx-features=egpr,push2pop2,ndd,ppx,nf -march=x86-64 -O2" } */
 /* { dg-final
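The new `define_split` in this patch rewrites a rotate left by a constant `n` as a rotate right by `(bitsize - n) % bitsize`. The likely purpose is to let the flag-free BMI2 RORX instruction be used, since BMI2 provides only a rotate-right-by-immediate form. The C++ sketch below checks the identity. The helper names are hypothetical, only the 64-bit case is covered, and constant counts are assumed to lie in `[0, 64)`.

```cpp
#include <cassert>
#include <cstdint>

// Rotate helpers. The count is reduced modulo the width, and a count of 0 is
// handled separately so that no shift is ever by 64.
std::uint64_t rotl64(std::uint64_t x, unsigned n) {
  n %= 64;
  return n == 0 ? x : (x << n) | (x >> (64 - n));
}

std::uint64_t rotr64(std::uint64_t x, unsigned n) {
  n %= 64;
  return n == 0 ? x : (x >> n) | (x << (64 - n));
}

// The split's rewrite: rotate left by n == rotate right by (bitsize - n) % bitsize.
std::uint64_t rotl_via_rotr(std::uint64_t x, unsigned n) {
  const unsigned bitsize = 64;
  return rotr64(x, (bitsize - n) % bitsize);
}
```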

[PATCH v3 2/8] [APX NF] Support APX NF for {sub/and/or/xor/neg}

2024-05-28 Thread Kong, Lingling
gcc/ChangeLog:

* config/i386/i386.md (nf_nonf_attr): New subst_attr.
(nf_nonf_x64_attr): Ditto.
(*sub_1_nf): New define_insn.
(*anddi_1_nf): Ditto.
(*and_1_nf): Ditto.
(*qi_1_nf): Ditto.
(*_1_nf): Ditto.
(*neg_1_nf): Ditto.
* config/i386/sse.md: New define_split.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-nf.c: Add test.
---
 gcc/config/i386/i386.md| 173 +
 gcc/config/i386/sse.md |  11 ++
 gcc/testsuite/gcc.target/i386/apx-nf.c |  12 ++
 3 files changed, 114 insertions(+), 82 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-nf.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 1eeadaddeba..d3cb224abad 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -575,7 +575,7 @@
noavx512dq,fma_or_avx512vl,avx512vl,noavx512vl,avxvnni,
avx512vnnivl,avx512fp16,avxifma,avx512ifmavl,avxneconvert,
avx512bf16vl,vpclmulqdqvl,avx_noavx512f,avx_noavx512vl,
-   vaes_avx512vl"
+   vaes_avx512vl,noapx_nf"
   (const_string "base"))
 
 ;; The (bounding maximum) length of an instruction immediate.
@@ -981,6 +981,7 @@
   (symbol_ref "TARGET_MMX_WITH_SSE && !TARGET_AVX")
 (eq_attr "mmx_isa" "avx")
   (symbol_ref "TARGET_MMX_WITH_SSE && TARGET_AVX")
+(eq_attr "isa" "noapx_nf") (symbol_ref "!TARGET_APX_NF")
]
(const_int 1)))
 
@@ -6449,6 +6450,8 @@
 (define_subst_attr "nf_condition" "nf_subst" "TARGET_APX_NF" "true")
 (define_subst_attr "nf_mem_constraint" "nf_subst" "je" "m")
 (define_subst_attr "nf_applied" "nf_subst" "true" "false")
+(define_subst_attr "nf_nonf_attr" "nf_subst"  "noapx_nf" "*")
+(define_subst_attr "nf_nonf_x64_attr" "nf_subst" "noapx_nf" "x64")
 
 (define_subst "nf_subst"
   [(set (match_operand:SWI 0)
@@ -7893,20 +7896,21 @@
   "split_double_mode (mode, &operands[0], 2, &operands[0], &operands[3]);"
 [(set_attr "isa" "*,*,apx_ndd,apx_ndd")])
 
-(define_insn "*sub_1"
-  [(set (match_operand:SWI 0 "nonimmediate_operand" "=m,,r,r,r")
+(define_insn "*sub_1"
+  [(set (match_operand:SWI 0 "nonimmediate_operand" "=m,r,,r,r,r")
(minus:SWI
- (match_operand:SWI 1 "nonimmediate_operand" "0,0,rm,rjM,r")
- (match_operand:SWI 2 "" ",,r,,")))
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (MINUS, mode, operands, TARGET_APX_NDD)"
+ (match_operand:SWI 1 "nonimmediate_operand" "0,0,0,rm,rjM,r")
+ (match_operand:SWI 2 "" ",,,r,,")))]
+  "ix86_binary_operator_ok (MINUS, mode, operands, TARGET_APX_NDD)
+  && "
   "@
-  sub{}\t{%2, %0|%0, %2}
-  sub{}\t{%2, %0|%0, %2}
-  sub{}\t{%2, %1, %0|%0, %1, %2}
-  sub{}\t{%2, %1, %0|%0, %1, %2}
-  sub{}\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "isa" "*,*,apx_ndd,apx_ndd,apx_ndd")
+  sub{}\t{%2, %0|%0, %2}
+  sub{}\t{%2, %0|%0, %2}
+  sub{}\t{%2, %0|%0, %2}
+  sub{}\t{%2, %1, %0|%0, %1, %2}
+  sub{}\t{%2, %1, %0|%0, %1, %2}
+  sub{}\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,*,*,apx_ndd,apx_ndd,apx_ndd")
(set_attr "type" "alu")
(set_attr "mode" "")])
 
@@ -11795,27 +11799,28 @@
 }
 [(set_attr "isa" "*,*,apx_ndd,apx_ndd,apx_ndd,apx_ndd_64,apx_ndd")])
 
-(define_insn "*anddi_1"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,rm,r,r,r,r,r,?k")
+(define_insn "*anddi_1"
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,r,r,r,r,r,r,r,?k")
(and:DI
-(match_operand:DI 1 "nonimmediate_operand" "%0,r,0,0,rm,rjM,r,qm,k")
-(match_operand:DI 2 "x86_64_szext_general_operand" "Z,Z,re,m,r,e,m,L,k")))
-   (clobber (reg:CC FLAGS_REG))]
+(match_operand:DI 1 "nonimmediate_operand" "%0,r,0,0,0,rm,rjM,r,qm,k")
+(match_operand:DI 2 "x86_64_szext_general_operand" "Z,Z,r,e,m,r,e,m,L,k")))]
   "TARGET_64BIT
-   && ix86_binary_operator_ok (AND, DImode, operands, TARGET_APX_NDD)"
+   && ix86_binary_operator_ok (AND, DImode, operands, TARGET_APX_NDD)
+   && "
   "@
-   and{l}\t{%k2, %k0|%k0, %k2}
-   and{l}\t{%k2, %k1, %k0|%k0, %k1, %k2}
-   and{q}\t{%2, %0|%0, %2}
-   and{q}\t{%2, %0|%0, %2}
-   and{q}\t{%2, %1, %0|%0, %1, %2}
-   and{q}\t{%2, %1, %0|%0, %1, %2}
-   and{q}\t{%2, %1, %0|%0, %1, %2}
+   and{l}\t{%k2, %k0|%k0, %k2}
+   and{l}\t{%k2, %k1, %k0|%k0, %k1, %k2}
+   and{q}\t{%2, %0|%0, %2}
+   and{q}\t{%2, %0|%0, %2}
+   and{q}\t{%2, %0|%0, %2}
+   and{q}\t{%2, %1, %0|%0, %1, %2}
+   and{q}\t{%2, %1, %0|%0, %1, %2}
+   and{q}\t{%2, %1, %0|%0, %1, %2}
#
#"
-  [(set_attr "isa" "x64,apx_ndd,x64,x64,apx_ndd,apx_ndd,apx_ndd,x64,avx512bw")
-   (set_attr "type" "alu,alu,alu,alu,alu,alu,alu,imovx,msklog")
-   (set_attr "length_immediate" "*,*,*,*,*,*,*,0,*")
+  [(set_attr "isa" "x64,apx_ndd,x64,x64,x64,apx_ndd,apx_ndd,apx_ndd,,avx512bw")
+   (set_attr "type" "alu,alu,alu,alu,alu,alu,alu,alu,imovx,msklog")
+   (set_attr "length_immediate" "*,*,*,*,*,*,*,*

[PATCH v3 3/8] [APX NF] Support APX NF for left shift insns

2024-05-28 Thread Kong, Lingling
gcc/ChangeLog:

* config/i386/i386.md (*ashl3_1_nf): New.
(*ashlhi3_1_nf): Ditto.
(*ashlqi3_1_nf): Ditto.
* config/i386/sse.md: New define_split.
---
 gcc/config/i386/i386.md | 96 ++---
 gcc/config/i386/sse.md  | 13 ++
 2 files changed, 83 insertions(+), 26 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d3cb224abad..4c06c243cc3 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -15011,17 +15011,22 @@
   [(set_attr "type" "ishiftx")
(set_attr "mode" "")])
 
-(define_insn "*ashl3_1"
+(define_insn "*ashl3_1"
   [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,?k,r")
(ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "0,l,rm,k,rm")
- (match_operand:QI 2 "nonmemory_operand" "c,M,r,,c")))
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFT, mode, operands, TARGET_APX_NDD)"
+ (match_operand:QI 2 "nonmemory_operand" "c,M,r,,c")))]
+  "ix86_binary_operator_ok (ASHIFT, mode, operands, TARGET_APX_NDD)
+   && "
 {
   bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   switch (get_attr_type (insn))
 {
 case TYPE_LEA:
+  if (TARGET_APX_NDD && )
+   return "%{nf%} sal{}\t{%2, %1, %0|%0, %1, %2}";
+  else
+   return "#";
+
 case TYPE_ISHIFTX:
 case TYPE_MSKLOG:
   return "#";
@@ -15029,7 +15034,7 @@
 case TYPE_ALU:
   gcc_assert (operands[2] == const1_rtx);
   gcc_assert (rtx_equal_p (operands[0], operands[1]));
-  return "add{}\t%0, %0";
+  return "add{}\t%0, %0";
 
 default:
   if (operands[2] == const1_rtx
@@ -15037,11 +15042,11 @@
  /* For NDD form instructions related to TARGET_SHIFT1, the $1
 immediate do not need to be omitted as assembler will map it
 to use shorter encoding. */
- && !use_ndd)
+ && !use_ndd && !)
return "sal{}\t%0";
   else
-   return use_ndd ? "sal{}\t{%2, %1, %0|%0, %1, %2}"
-  : "sal{}\t{%2, %0|%0, %2}";
+   return use_ndd ? "sal{}\t{%2, %1, %0|%0, %1, %2}"
+  : "sal{}\t{%2, %0|%0, %2}";
 }
 }
   [(set_attr "isa" "*,*,bmi2,avx512bw,apx_ndd")
@@ -15072,6 +15077,17 @@
(set_attr "mode" "")])
 
 ;; Convert shift to the shiftx pattern to avoid flags dependency.
+;; NF/NDD forms only accept the shift count in CL (constraint c), not in
+;; an arbitrary register (r), and they do not set flags.
+(define_split
+  [(set (match_operand:SWI48 0 "register_operand")
+   (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand")
+ (match_operand:QI 2 "register_operand")))]
+  "TARGET_BMI2 && reload_completed"
+  [(set (match_dup 0)
+   (ashift:SWI48 (match_dup 1) (match_dup 2)))]
+  "operands[2] = gen_lowpart (mode, operands[2]);")
+
 (define_split
   [(set (match_operand:SWI48 0 "register_operand")
(ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand")
@@ -15158,32 +15174,37 @@
(zero_extend:DI (ashift:SI (match_dup 1) (match_dup 2]
   "operands[2] = gen_lowpart (SImode, operands[2]);")
 
-(define_insn "*ashlhi3_1"
+(define_insn "*ashlhi3_1"
   [(set (match_operand:HI 0 "nonimmediate_operand" "=rm,Yp,?k,r")
(ashift:HI (match_operand:HI 1 "nonimmediate_operand" "0,l,k,rm")
-  (match_operand:QI 2 "nonmemory_operand" "cI,M,Ww,cI")))
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (ASHIFT, HImode, operands, TARGET_APX_NDD)"
+  (match_operand:QI 2 "nonmemory_operand" "cI,M,Ww,cI")))]
+  "ix86_binary_operator_ok (ASHIFT, HImode, operands, TARGET_APX_NDD)
+   && "
 {
   bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   switch (get_attr_type (insn))
 {
 case TYPE_LEA:
+  if (TARGET_APX_NDD && )
+   return "%{nf%} sal{w}\t{%2, %1, %0|%0, %1, %2}";
+  else
+   return "#";
+
 case TYPE_MSKLOG:
   return "#";
 
 case TYPE_ALU:
   gcc_assert (operands[2] == const1_rtx);
-  return "add{w}\t%0, %0";
+  return "add{w}\t%0, %0";
 
 default:
   if (operands[2] == const1_rtx
  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
- && !use_ndd)
+ && !use_ndd && !)
return "sal{w}\t%0";
   else
-   return use_ndd ? "sal{w}\t{%2, %1, %0|%0, %1, %2}"
-  : "sal{w}\t{%2, %0|%0, %2}";
+   return use_ndd ? "sal{w}\t{%2, %1, %0|%0, %1, %2}"
+  : "sal{w}\t{%2, %0|%0, %2}";
 }
 }
   [(set_attr "isa" "*,*,avx512f,apx_ndd")
@@ -15211,31 +15232,36 @@
(const_string "*")))
(set_attr "mode" "HI,SI,HI,HI")])
 
-(define_insn "*ashlqi3_1"
+(define_insn "*ashlqi3_1"
   [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,Yp,?k,r")
(ashift:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,l,k,rm")
-  (match_operand:QI 2 "nonmemory_operand" "cI,cI,M,Wb,cI")))
-   (clobber (reg:CC FLAGS_REG))]
-  "i

[PATCH v3 1/8] [APX NF]: Support APX NF add

2024-05-28 Thread Kong, Lingling
Hi, compared with v2, these patches restore the original lea pattern position
and address Hongtao's comment.

The APX NF (no flags) feature suppresses the update of status flags for
arithmetic operations.
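The following toy C++ model contrasts a legacy ADD, which overwrites the flags, with an `{nf}` ADD, which computes the same result but leaves earlier flag values intact. The model is not hardware-accurate: it tracks only ZF, SF and CF, and all names in it are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a subset of the arithmetic status flags.
struct Flags {
  bool zf = false;  // zero flag
  bool sf = false;  // sign flag
  bool cf = false;  // carry flag
};

// Legacy ADD: writes the (wrapping) result and updates the flags.
std::uint64_t add_legacy(std::uint64_t a, std::uint64_t b, Flags& f) {
  std::uint64_t r = a + b;
  f.zf = (r == 0);
  f.sf = (r >> 63) != 0;
  f.cf = r < a;  // unsigned carry out
  return r;
}

// {nf} ADD: the same result, with the flags left untouched, so a flag value
// produced by an earlier compare stays valid across the addition.
std::uint64_t add_nf(std::uint64_t a, std::uint64_t b, const Flags&) {
  return a + b;
}
```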

For NF add, it is not yet clear whether NF add is faster than lea. If lea
turns out to be faster, the pattern will need to be adjusted to prefer lea
generation.

gcc/ChangeLog:

* config/i386/i386-opts.h (enum apx_features): Add nf
enumeration.
* config/i386/i386.h (TARGET_APX_NF): New.
* config/i386/i386.md (*add_1_nf): New define_insn.
* config/i386/i386.opt: Add apx_nf enumeration.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-ndd.c: Fixed test.

Co-authored-by: Lingling Kong 
---
 gcc/config/i386/i386-opts.h |   3 +-
 gcc/config/i386/i386.h  |   1 +
 gcc/config/i386/i386.md | 135 
 gcc/config/i386/i386.opt|   3 +
 gcc/testsuite/gcc.target/i386/apx-ndd.c |   2 +-
 5 files changed, 98 insertions(+), 46 deletions(-)

diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
index ef2825803b3..60176ce609f 100644
--- a/gcc/config/i386/i386-opts.h
+++ b/gcc/config/i386/i386-opts.h
@@ -140,7 +140,8 @@ enum apx_features {
   apx_push2pop2 = 1 << 1,
   apx_ndd = 1 << 2,
   apx_ppx = 1 << 3,
-  apx_all = apx_egpr | apx_push2pop2 | apx_ndd | apx_ppx,
+  apx_nf = 1 << 4,
+  apx_all = apx_egpr | apx_push2pop2 | apx_ndd | apx_ppx | apx_nf,
 };
 
 #endif
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 359a8408263..969391d3013 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -55,6 +55,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define TARGET_APX_PUSH2POP2 (ix86_apx_features & apx_push2pop2)
 #define TARGET_APX_NDD (ix86_apx_features & apx_ndd)
 #define TARGET_APX_PPX (ix86_apx_features & apx_ppx)
+#define TARGET_APX_NF (ix86_apx_features & apx_nf)
 
 #include "config/vxworks-dummy.h"
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e8073f5a200..1eeadaddeba 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6290,6 +6290,13 @@
   [(parallel [(set (match_dup 0) (ashift:SWI48 (match_dup 0) (match_dup 1)))
   (clobber (reg:CC FLAGS_REG))])]
   "operands[1] = GEN_INT (exact_log2 (INTVAL (operands[1])));")
+
+(define_split
+  [(set (match_operand:SWI48 0 "general_reg_operand")
+   (mult:SWI48 (match_dup 0) (match_operand:SWI48 1 "const1248_operand")))]
+  "TARGET_APX_NF && reload_completed"
+  [(set (match_dup 0) (ashift:SWI48 (match_dup 0) (match_dup 1)))]
+  "operands[1] = GEN_INT (exact_log2 (INTVAL (operands[1])));")
 

 ;; Add instructions
 
@@ -6437,48 +6444,65 @@
  (clobber (reg:CC FLAGS_REG))])]
  "split_double_mode (mode, &operands[0], 1, &operands[0], &operands[5]);")
 
-(define_insn "*add_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r,r,r,r")
+(define_subst_attr "nf_name" "nf_subst" "_nf" "")
+(define_subst_attr "nf_prefix" "nf_subst" "%{nf%} " "")
+(define_subst_attr "nf_condition" "nf_subst" "TARGET_APX_NF" "true")
+(define_subst_attr "nf_mem_constraint" "nf_subst" "je" "m")
+(define_subst_attr "nf_applied" "nf_subst" "true" "false")
+
+(define_subst "nf_subst"
+  [(set (match_operand:SWI 0)
+(match_operand:SWI 1))]
+  ""
+  [(set (match_dup 0)
+   (match_dup 1))
+   (clobber (reg:CC FLAGS_REG))])
+
+(define_insn "*add_1"
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r,r,r,r,r")
(plus:SWI48
- (match_operand:SWI48 1 "nonimmediate_operand" "%0,0,r,r,rje,jM,r")
- (match_operand:SWI48 2 "x86_64_general_operand" "re,BM,0,le,r,e,BM")))
-   (clobber (reg:CC FLAGS_REG))]
-  "ix86_binary_operator_ok (PLUS, mode, operands, TARGET_APX_NDD)"
+ (match_operand:SWI48 1 "nonimmediate_operand" "%0,0,0,r,r,rje,jM,r")
+ (match_operand:SWI48 2 "x86_64_general_operand" "r,e,BM,0,le,r,e,BM")))]
+  "ix86_binary_operator_ok (PLUS, mode, operands, TARGET_APX_NDD)
+  && "
 {
   bool use_ndd = get_attr_isa (insn) == ISA_APX_NDD;
   switch (get_attr_type (insn))
 {
 case TYPE_LEA:
-  return "#";
+  if (TARGET_APX_NDD && )
+   return "%{nf%} add{}\t{%2, %1, %0|%0, %1, %2}";
+  else
+   return "#";
 
 case TYPE_INCDEC:
   if (operands[2] == const1_rtx)
-return use_ndd ? "inc{}\t{%1, %0|%0, %1}"
- : "inc{}\t%0";
+return use_ndd ? "inc{}\t{%1, %0|%0, %1}"
+ : "inc{}\t%0";
   else
 {
  gcc_assert (operands[2] == constm1_rtx);
- return use_ndd ? "dec{}\t{%1, %0|%0, %1}"
-   : "dec{}\t%0";
+ return use_ndd ? "dec{}\t{%1, %0|%0, %1}"
+   : "dec{}\t%0";
}
 
 default:
   /* For most processors, ADD is faster than LEA.  This alternative
 was added to use ADD as much as possible.  */
-  if (whic

[PATCH] Implement -fassume-sane-operator-new [PR110137]

2024-05-28 Thread user202729
This patch implements the flag -fassume-sane-operator-new as suggested in 
PR110137. When the flag is enabled, it is assumed that operator new does not 
modify global memory.
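To make the assumption concrete, here is a minimal sketch, modeled on the sane-operator-new tests in this patch, of a program for which the flag would NOT be safe. The replacement `operator new` below is hypothetical: besides allocating, it writes to the global `a`, so `a` has to be reloaded after `new float`. Built without -fassume-sane-operator-new, `m ()` must return twice the old value of `a` plus one.

```cpp
#include <cstdlib>
#include <new>

int a = 5;    // global read before and after the allocation
float *b;

// Hypothetical replacement global operator new that is not "sane" in the
// sense of -fassume-sane-operator-new: it also modifies global memory.
void *operator new (std::size_t size)
{
  a += 1;  // the side effect the flag tells the compiler to ignore
  if (void *p = std::malloc (size ? size : 1))
    return p;
  throw std::bad_alloc ();
}

void operator delete (void *p) noexcept
{
  std::free (p);
}

int m ()
{
  int x = a;      // first load of a
  b = new float;  // calls the replacement above, which bumps a
  return x + a;   // without the flag, a must be reloaded here
}
```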

While this patch is not powerful enough to handle the original issue in 
PR110035, it allows the optimizer to handle some simpler cases (e.g. a load 
from global memory at a fixed address), as demonstrated in the test 
sane-operator-new-1.C.

To handle the original issue in PR110035, further improvements to the 
optimizer are needed; they will be sent as follow-up patches.

Bootstrapped and regression tested on x86_64-pc-linux-gnu.

From 14a8604907c89838577ff8560df9a3f9dc2d8afb Mon Sep 17 00:00:00 2001
From: user202729 
Date: Fri, 24 May 2024 17:40:55 +0800
Subject: [PATCH] Implement -fassume-sane-operator-new [PR110137]

	PR c++/110137

gcc/c-family/ChangeLog:

	* c.opt: New option.

gcc/ChangeLog:

	* ira.cc (is_call_operator_new_p): New function.
	(may_modify_memory_p): Likewise.
	(validate_equiv_mem): Modify to use may_modify_memory_p.

gcc/testsuite/ChangeLog:

	* g++.dg/sane-operator-new-1.C: New test.
	* g++.dg/sane-operator-new-2.C: New test.
	* g++.dg/sane-operator-new-3.C: New test.
---
 gcc/c-family/c.opt |  4 
 gcc/ira.cc | 23 -
 gcc/testsuite/g++.dg/sane-operator-new-1.C | 12 +++
 gcc/testsuite/g++.dg/sane-operator-new-2.C | 12 +++
 gcc/testsuite/g++.dg/sane-operator-new-3.C | 24 ++
 5 files changed, 74 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/sane-operator-new-1.C
 create mode 100644 gcc/testsuite/g++.dg/sane-operator-new-2.C
 create mode 100644 gcc/testsuite/g++.dg/sane-operator-new-3.C

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index fb34c3b7031..20c3ff77ee8 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1605,6 +1605,10 @@ fasm
 C ObjC C++ ObjC++ Var(flag_no_asm, 0)
 Recognize the \"asm\" keyword.
 
+fassume-sane-operator-new
+C++ Optimization Var(flag_assume_sane_operator_new)
+Assume operator new does not have any side effect other than the allocation.
+
 ; Define extra predefined macros for use in libgcc.
 fbuilding-libgcc
 C ObjC C++ ObjC++ Undocumented Var(flag_building_libgcc)
diff --git a/gcc/ira.cc b/gcc/ira.cc
index 5642aea3caa..2902853a2bc 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -3080,6 +3080,27 @@ validate_equiv_mem_from_store (rtx dest, const_rtx set ATTRIBUTE_UNUSED,
 
 static bool equiv_init_varies_p (rtx x);
 
+static bool is_call_operator_new_p (rtx_insn *insn)
+{
+  if (!CALL_P (insn))
+return false;
+  tree fn = get_call_fndecl (insn);
+  if (fn == NULL_TREE)
+return false;
+  return DECL_IS_OPERATOR_NEW_P (fn);
+}
+
+/* Returns true if there is a possibility that INSN may modify memory.
+   If false is returned, the compiler proved INSN never modifies memory.  */
+static bool may_modify_memory_p (rtx_insn *insn)
+{
+  if (RTL_CONST_OR_PURE_CALL_P (insn))
+return false;
+  if (flag_assume_sane_operator_new && is_call_operator_new_p (insn))
+return false;
+  return true;
+}
+
 enum valid_equiv { valid_none, valid_combine, valid_reload };
 
 /* Verify that no store between START and the death of REG invalidates
@@ -3123,7 +3144,7 @@ validate_equiv_mem (rtx_insn *start, rtx reg, rtx memref)
 	 been changed and all hell breaks loose.  */
 	  ret = valid_combine;
 	  if (!MEM_READONLY_P (memref)
-	  && (!RTL_CONST_OR_PURE_CALL_P (insn)
+	  && (may_modify_memory_p (insn)
 		  || equiv_init_varies_p (XEXP (memref, 0
 	return valid_none;
 	}
diff --git a/gcc/testsuite/g++.dg/sane-operator-new-1.C b/gcc/testsuite/g++.dg/sane-operator-new-1.C
new file mode 100644
index 000..de81e1d92b9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/sane-operator-new-1.C
@@ -0,0 +1,12 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -fassume-sane-operator-new" } */
+int a;
+float *b;
+int
+m ()
+{
+  int x = a;
+  b = new float;
+  return x + a;
+}
+/* { dg-final { scan-assembler-times {a\(%} 1 } } */
diff --git a/gcc/testsuite/g++.dg/sane-operator-new-2.C b/gcc/testsuite/g++.dg/sane-operator-new-2.C
new file mode 100644
index 000..28fe880810e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/sane-operator-new-2.C
@@ -0,0 +1,12 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2" } */
+int a;
+float *b;
+int
+m ()
+{
+  int x = a;
+  b = new float;
+  return x + a;
+}
+/* { dg-final { scan-assembler-times {a\(%} 2 } } */
diff --git a/gcc/testsuite/g++.dg/sane-operator-new-3.C b/gcc/testsuite/g++.dg/sane-operator-new-3.C
new file mode 100644
index 000..17e9f0640e3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/sane-operator-new-3.C
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+typedef __SIZE_TYPE__ size_t;
+extern "C" {
+  void* malloc (size_t);
+  void free (void*);
+  void abort (void);
+}
+int a = 5;
+float *b;
+void *
+__attribute__ ((noin

RE: [PATCH 0/2] Align tight loops to solve cross cacheline issue

2024-05-28 Thread Jiang, Haochen
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu.
> Ok for this if there's no objection in 48 hours.
> > > >
> > > > After we committed into trunk for a month, if there isn't any
> > > > unexpected happen. We planned to backport it to GCC14.2.

I accidentally backported it to GCC 14.2 already, since I did not realize
that my local branch was on GCC 14, not trunk.

If anything unexpected shows up on trunk, I will revert the patches on
GCC 14.

Thx,
Haochen

> > > >
> > > > Thx,
> > > > Haochen
> > > >
> > > > Haochen Jiang (1):
> > > >   Adjust generic loop alignment from 16:11:8 to 16 for Intel
> > > > processors
> > For this one, current znver{1,2,3,4,5}_cost already set loop align as
> > 16, so I think it should be fine set it to generic_cost.
> > > >
> > > > liuhongt (1):
> > > >   Align tight&hot loop without considering max skipping bytes.
> > For this one, although we have seen similar growth on AMD's
> > processors, it's still nice to have someone from AMD to look at this
> > to see if it's what they need.
> > > >
> > > >  gcc/config/i386/i386.cc  | 148 ++-
> > > >  gcc/config/i386/i386.md  |  10 ++-
> > > >  gcc/config/i386/x86-tune-costs.h |   2 +-
> > > >  3 files changed, 154 insertions(+), 6 deletions(-)
> > > >
> > > > --
> > > > 2.31.1


[PATCH] ASAN: call initialize_sanitizer_builtins for hwasan [PR115205]

2024-05-28 Thread Andrew Pinski
Sometimes initialize_sanitizer_builtins is not called before the asan
builtins are emitted for hwasan.  In the case of the bug report, the
Fortran front end took a path where it was never called.
So call it in asan_instrument before calling transform_statements.

Built and tested for aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

PR sanitizer/115205
* asan.cc (asan_instrument): Call initialize_sanitizer_builtins
for hwasan.

Signed-off-by: Andrew Pinski 
---
 gcc/asan.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/asan.cc b/gcc/asan.cc
index 9e0f51b1477..c684ca6d366 100644
--- a/gcc/asan.cc
+++ b/gcc/asan.cc
@@ -4276,6 +4276,7 @@ asan_instrument (void)
 {
   if (hwasan_sanitize_p ())
 {
+  initialize_sanitizer_builtins ();
   transform_statements ();
   return 0;
 }
-- 
2.43.0



Re: [RFC/PATCH] Replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE with new hook

2024-05-28 Thread Kewen.Lin
Hi Richi and Joseph,

on 2024/5/24 20:23, Richard Biener wrote:
> On Fri, May 24, 2024 at 12:20 PM Kewen.Lin  wrote:
>> btw, the attached patch is bootstrapped and regtested on
>> powerpc64-linux-gnu and powerpc64le-linux-gnu with all
>> languages on, cross cc1 built well for affected ports.
> 
> Looks reasonable to me - I'd split language changes out but
> keep target and middle-end together.  The middle-end parts
> look good to me - I'm always a bit nervous when using
> size and precision exchangably, esp. for FP, but it seems
> this has been done before.

Thanks for the suggestion!  I'll soon split this into a patch series
by component and follow your suggestion when committing (the
preparatory language changes go in first, and the remaining parts are
squashed together).

on 2024/5/29 05:06, Joseph Myers wrote:
> On Fri, 24 May 2024, Kewen.Lin wrote:
> 
>> Following your suggestion and comments, I made this patch
>> for mode_for_floating_type first, considering this touches
>> a few FE and port specific code, I think I have to split
>> it into a patch series.  Before making that, I'd like to
>> ensure this meets what you expected, and also seek for the
> 
> The general idea seems reasonable (I haven't reviewed it in detail).  
> Note that when removing a target macro, it's a good idea to add it to the 
> "Old target macros that have moved to the target hooks structure." list 
> (of #pragma GCC poison) in system.h to ensure any new target that was 
> originally written before the change doesn't accidentally get into GCC 
> while still using the old macros.
> 

Thanks for the comments on target macro removal!  I found that this
means we can no longer use such macros, even where they have become
port specific.  Some targets, such as pa, redefine these macros in
subtarget headers, or use them in other macro definitions.  Since
leaving them as they are keeps those ports more readable, I didn't
change them in this RFC/PATCH; I'll rename them with a target prefix
in the following patch series.

BR,
Kewen



[PATCH] [x86] Support vcond_mask_qiqi and friends.

2024-05-28 Thread liuhongt
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

gcc/ChangeLog:

* config/i386/sse.md (vcond_mask_): New expander.

gcc/testsuite/ChangeLog:
* gcc.target/i386/pr114125.c: New test.
---
 gcc/config/i386/sse.md   | 20 
 gcc/testsuite/gcc.target/i386/pr114125.c | 10 ++
 2 files changed, 30 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr114125.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 0f4fbcb2c5d..7cd912eeeb1 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4807,6 +4807,26 @@ (define_expand "vcond_mask_"
   DONE;
 })
 
+(define_expand "vcond_mask_"
+  [(match_operand:SWI1248_AVX512BW 0 "register_operand")
+   (match_operand:SWI1248_AVX512BW 1 "register_operand")
+   (match_operand:SWI1248_AVX512BW 2 "register_operand")
+   (match_operand:SWI1248_AVX512BW 3 "register_operand")]
+  "TARGET_AVX512F"
+{
+  /* (operand[1] & operand[3]) | (operand[2] & ~operand[3])  */
+  rtx op1 = gen_reg_rtx (mode);
+  rtx op2 = gen_reg_rtx (mode);
+  rtx op3 = gen_reg_rtx (mode);
+
+  emit_insn (gen_and3 (op1, operands[1], operands[3]));
+  emit_insn (gen_one_cmpl2 (op3, operands[3]));
+  emit_insn (gen_and3 (op2, operands[2], op3));
+  emit_insn (gen_ior3 (operands[0], op1, op2));
+
+  DONE;
+})
+
 ;
 ;;
 ;; Parallel floating point logical operations
diff --git a/gcc/testsuite/gcc.target/i386/pr114125.c b/gcc/testsuite/gcc.target/i386/pr114125.c
new file mode 100644
index 000..e63fbffe965
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr114125.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64-v4 -fdump-tree-forwprop3-raw " } */
+
+typedef long vec __attribute__((vector_size(16)));
+vec f(vec x){
+  vec y = x < 10;
+  return y & (y == 0);
+}
+
+/* { dg-final { scan-tree-dump-not "_expr" "forwprop3" } } */
-- 
2.31.1
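The new expander builds a bitwise select out of AND, NOT and IOR, as its comment says: (operand[1] & operand[3]) | (operand[2] & ~operand[3]). As a sketch only, treating each AVX-512 mask as a plain unsigned integer with one bit per lane, the same four steps look like this in scalar C++ (`mask_select` is a hypothetical name, not a GCC function):

```cpp
#include <cstdint>

// Scalar model of the expander: for each bit, take OP1 where MASK is set
// and OP2 where MASK is clear.  Each step mirrors one emitted insn.
template <typename T>
constexpr T mask_select (T op1, T op2, T mask)
{
  T t1 = static_cast<T> (op1 & mask);      // and: op1 & operands[3]
  T not_mask = static_cast<T> (~mask);     // one_cmpl: ~operands[3]
  T t2 = static_cast<T> (op2 & not_mask);  // and: op2 & ~operands[3]
  return static_cast<T> (t1 | t2);         // ior: combine both halves
}

// 0xF0 & 0xCC = 0xC0, 0x0F & 0x33 = 0x03, so the result is 0xC3.
static_assert (mask_select<std::uint8_t> (0xF0, 0x0F, 0xCC) == 0xC3, "");
```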



[to-be-committed] [RISC-V] Use pack to handle repeating constants

2024-05-28 Thread Jeff Law
This patch utilizes zbkb to improve the code we generate for 64bit 
constants when the high half is a duplicate of the low half.


Basically we generate the low half and use a pack instruction with that 
same register repeated, i.e.:


pack dest,src,src

That gives us a maximum sequence of 3 instructions and sometimes it will 
be just 2 instructions (say if the low 32bits can be constructed with a 
single addi or lui).


As with shadd, I'm abusing an RTL opcode.  This time it's CONCAT.  It's 
reasonably close to what we're doing.  It is only used to mark, in the 
array of opcodes, that we want to generate a pack; we never actually 
emit a CONCAT.


Note that we don't care about the potential sign extension from bit 31. 
pack will only look at bits 0..31 of each input (for rv64).  So we go 
ahead and sign extend before synthesizing the low part as that allows us 
to handle more cases trivially.
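The arithmetic in the paragraphs above can be checked with a small scalar model. This is a sketch, not GCC code, and all helper names are hypothetical: `pack64` models rv64 `pack rd, rs1, rs2` (low 32 bits of rs1 into the low half, low 32 bits of rs2 into the high half), and `repeating_halves_p` is an equivalent restatement of the patch's `hival >> 32 == loval` test.

```cpp
#include <cstdint>

// Model of rv64 "pack rd, rs1, rs2": rd[31:0] = rs1[31:0],
// rd[63:32] = rs2[31:0].  The upper halves of both inputs are ignored.
constexpr std::uint64_t pack64 (std::uint64_t rs1, std::uint64_t rs2)
{
  return (rs1 & 0xffffffffu) | (rs2 << 32);
}

// True when the high 32 bits of VALUE equal its low 32 bits.
constexpr bool repeating_halves_p (std::uint64_t value)
{
  return (value >> 32) == (value & 0xffffffffu);
}

// Sign-extend the low 32 bits, as the patch does before synthesizing them.
constexpr std::uint64_t sext32 (std::uint64_t v)
{
  return (std::uint64_t) (std::int64_t) (std::int32_t) (std::uint32_t) v;
}

// Synthesize the (sign-extended) low half, then "pack dest, src, src".
constexpr std::uint64_t synthesize_repeating (std::uint64_t value)
{
  std::uint64_t lo = sext32 (value);
  return pack64 (lo, lo);
}
```

Because pack only reads bits 0..31 of each input, the sign extension does not leak into the result: for 0x8000000180000001 the register holds 0xffffffff80000001 before the pack, and the pack still yields the original constant.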


I had my testsuite generator chew on random cases of a repeating 
constant without any surprises.  I don't see much point in including all 
of those in the testcase (after all, there are 2**32 of them).  I've got a 
set of 10 I'm including.  Nothing particularly interesting in them.


An enterprising developer that needs this improved without zbkb could 
probably do so with a bit of work.  First increase the cost by 1 unit. 
Second avoid cases where bit 31 is set and restrict it to cases when we 
can still create pseudos.   On the codegen side, when encountering the 
CONCAT, generate the appropriate shift of "X" into a temporary register, 
then IOR the temporary with "X" into the new destination.
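The non-zbkb variant suggested above can be sketched the same way. This is a hypothetical scalar model, not part of the patch, and it ignores the cost and pseudo-creation checks: it shifts the synthesized low part X left by 32 and IORs it back in, and it rejects constants with bit 31 set, because there X is sign-extended and its all-ones upper half would be ORed into the result.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model of "slli t, x, 32; or dest, t, x" for a repeating
// 64-bit constant.  Returns nullopt for constants this sequence cannot
// produce.  With bit 31 set it would go wrong: for 0x8000000180000001,
// x = 0xffffffff80000001 and t | x = 0xffffffff80000001.
std::optional<std::uint64_t> synthesize_repeating_no_zbkb (std::uint64_t value)
{
  std::uint32_t lo = (std::uint32_t) value;
  if ((value >> 32) != lo || (lo & 0x80000000u))
    return std::nullopt;

  std::int64_t x = (std::int32_t) lo;           // register after li: sign-extended
  std::uint64_t t = (std::uint64_t) x << 32;    // slli t, x, 32
  return t | (std::uint64_t) x;                 // or dest, t, x
}
```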


Anyway, I've tested this in my tester (though it doesn't turn on zbkb, 
yet).  I'll let the CI system chew on it overnight, but like mine, I 
don't think it lights up zbkb.  So it's unlikely to spit out anything 
interesting.


Jeff

diff --git a/gcc/config/riscv/crypto.md b/gcc/config/riscv/crypto.md
index b632312ade2..b9cac78fce1 100644
--- a/gcc/config/riscv/crypto.md
+++ b/gcc/config/riscv/crypto.md
@@ -107,7 +107,7 @@ (define_insn "riscv_pack_"
 ;; This is slightly more complex than the other pack patterns
 ;; that fully expose the RTL as it needs to self-adjust to
 ;; rv32 and rv64.  But it's not that hard.
-(define_insn "*riscv_xpack__2"
+(define_insn "riscv_xpack___2"
   [(set (match_operand:X 0 "register_operand" "=r")
(ior:X (ashift:X (match_operand:X 1 "register_operand" "r")
 (match_operand 2 "immediate_operand" "n"))
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a99211d56b1..91fefacee80 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1123,6 +1123,22 @@ riscv_build_integer (struct riscv_integer_op *codes, HOST_WIDE_INT value,
}
 }
 
+  /* With pack we can generate a 64 bit constant with the same high
+ and low 32 bits trivially.  */
+  if (cost > 3 && TARGET_64BIT && TARGET_ZBKB)
+{
+  unsigned HOST_WIDE_INT loval = value & 0x;
+  unsigned HOST_WIDE_INT hival = value & ~loval;
+  if (hival >> 32 == loval)
+   {
+ cost = 1 + riscv_build_integer_1 (codes, sext_hwi (loval, 32), mode);
+ codes[cost - 1].code = CONCAT;
+ codes[cost - 1].value = 0;
+ codes[cost - 1].use_uw = false;
+   }
+
+}
+
   return cost;
 }
 
@@ -2679,6 +2695,13 @@ riscv_move_integer (rtx temp, rtx dest, HOST_WIDE_INT value,
  rtx t = can_create_pseudo_p () ? gen_reg_rtx (mode) : temp;
  x = riscv_emit_set (t, x);
}
+ else if (codes[i].code == CONCAT)
+   {
+ rtx t = can_create_pseudo_p () ? gen_reg_rtx (mode) : temp;
+ rtx t2 = gen_lowpart (SImode, x);
+ emit_insn (gen_riscv_xpack_di_si_2 (t, x, GEN_INT (32), t2));
+ x = t;
+   }
  else
x = gen_rtx_fmt_ee (codes[i].code, mode,
x, GEN_INT (codes[i].value));
diff --git a/gcc/testsuite/gcc.target/riscv/synthesis-9.c b/gcc/testsuite/gcc.target/riscv/synthesis-9.c
new file mode 100644
index 000..cc622188abc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/synthesis-9.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* We aggressively skip as we really just need to test the basic synthesis
+   which shouldn't vary based on the optimization level.  -O1 seems to work
+   and eliminates the usual sources of extraneous dead code that would throw
+   off the counts.  */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O2" "-O3" "-Os" "-Oz" "-flto" } } */
+/* { dg-options "-march=rv64gc_zba_zbb_zbkb_zbs" } */
+
+/* Rather than test for a specific synthesis of all these constants or
+   having thousands of tests each testing one variant, we just test the
+   total number of instructions.
+
+   This isn't expected to change much and any change is worthy of a look.  */
+/* { dg-final { scan-assembler-times 
"\\t(add|addi|bseti|li|pack

[PATCH] Add AVX10.1 target_clones support

2024-05-28 Thread Haochen Jiang
Hi all,

Since AVX10 is the first major ISA introduced after AVX-512, we propose
to add target_clones support for it.

Although AVX10.1-256 does not cover the 512-bit part of AVX512F, this is
not an issue, since the ordering only expresses priority, not implication.

Bootstrapped and regtested on x86_64-pc-linux-gnu. Ok for trunk and backport
to GCC14?

Thx,
Haochen

gcc/ChangeLog:

* common/config/i386/i386-common.cc: Change Granite Rapids
series CPU type to P_PROC_AVX10_1_512.
* common/config/i386/i386-cpuinfo.h (enum feature_priority):
Revise comment part. Add P_AVX10_1_256, P_AVX10_1_512,
P_PROC_AVX10_1_512.
* common/config/i386/i386-isas.h: Link to avx10.1-256, avx10.1-512.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_1-25.c: New test.
* gcc.target/i386/avx10_1-26.c: Ditto.
---
 gcc/common/config/i386/i386-common.cc  | 4 ++--
 gcc/common/config/i386/i386-cpuinfo.h  | 5 -
 gcc/common/config/i386/i386-isas.h | 4 ++--
 gcc/testsuite/gcc.target/i386/avx10_1-25.c | 9 +
 gcc/testsuite/gcc.target/i386/avx10_1-26.c | 9 +
 5 files changed, 26 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-25.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_1-26.c

diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index 77b154663bc..d578918dfb7 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -2273,10 +2273,10 @@ const pta processor_alias_table[] =
   {"meteorlake", PROCESSOR_ALDERLAKE, CPU_HASWELL, PTA_ALDERLAKE,
 M_CPU_SUBTYPE (INTEL_COREI7_ALDERLAKE), P_PROC_AVX2},
   {"graniterapids", PROCESSOR_GRANITERAPIDS, CPU_HASWELL, PTA_GRANITERAPIDS,
-M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS), P_PROC_AVX512F},
+M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS), P_PROC_AVX10_1_512},
   {"graniterapids-d", PROCESSOR_GRANITERAPIDS_D, CPU_HASWELL,
 PTA_GRANITERAPIDS_D, M_CPU_SUBTYPE (INTEL_COREI7_GRANITERAPIDS_D),
-P_PROC_AVX512F},
+P_PROC_AVX10_1_512},
   {"arrowlake", PROCESSOR_ARROWLAKE, CPU_HASWELL, PTA_ARROWLAKE,
 M_CPU_SUBTYPE (INTEL_COREI7_ARROWLAKE), P_PROC_AVX2},
   {"arrowlake-s", PROCESSOR_ARROWLAKE_S, CPU_HASWELL, PTA_ARROWLAKE_S,
diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
index 73131657eab..be52ad2c60d 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -112,7 +112,7 @@ enum processor_subtypes
 /* Priority of i386 features, greater value is higher priority.   This is
used to decide the order in which function dispatch must happen.  For
instance, a version specialized for SSE4.2 should be checked for dispatch
-   before a version for SSE3, as SSE4.2 implies SSE3.  */
+   before a version for SSE3.  */
 enum feature_priority
 {
   P_NONE = 0,
@@ -148,6 +148,9 @@ enum feature_priority
   P_AVX512F,
   P_PROC_AVX512F,
   P_X86_64_V4,
+  P_AVX10_1_256,
+  P_AVX10_1_512,
+  P_PROC_AVX10_1_512,
   P_PROC_DYNAMIC
 };
 
diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h
index d6deb9a1522..9c2179a3dd8 100644
--- a/gcc/common/config/i386/i386-isas.h
+++ b/gcc/common/config/i386/i386-isas.h
@@ -194,6 +194,6 @@ ISA_NAMES_TABLE_START
   ISA_NAMES_TABLE_ENTRY("apxf", FEATURE_APX_F, P_NONE, "-mapxf")
   ISA_NAMES_TABLE_ENTRY("usermsr", FEATURE_USER_MSR, P_NONE, "-musermsr")
   ISA_NAMES_TABLE_ENTRY("avx10.1", FEATURE_AVX10_1_256, P_NONE, "-mavx10.1")
-  ISA_NAMES_TABLE_ENTRY("avx10.1-256", FEATURE_AVX10_1_256, P_NONE, "-mavx10.1-256")
-  ISA_NAMES_TABLE_ENTRY("avx10.1-512", FEATURE_AVX10_1_512, P_NONE, "-mavx10.1-512")
+  ISA_NAMES_TABLE_ENTRY("avx10.1-256", FEATURE_AVX10_1_256, P_AVX10_1_256, "-mavx10.1-256")
+  ISA_NAMES_TABLE_ENTRY("avx10.1-512", FEATURE_AVX10_1_512, P_AVX10_1_512, "-mavx10.1-512")
 ISA_NAMES_TABLE_END
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-25.c b/gcc/testsuite/gcc.target/i386/avx10_1-25.c
new file mode 100644
index 000..73f1b724560
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-25.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx" } */
+
+#include 
+__attribute__((target_clones ("default","avx10.1-256")))
+__m256d foo(__m256d a, __m256d b)
+{
+  return a + b;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_1-26.c b/gcc/testsuite/gcc.target/i386/avx10_1-26.c
new file mode 100644
index 000..514ab57a406
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_1-26.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f" } */
+
+#include 
+__attribute__((target_clones ("default","avx10.1-512")))
+__m512d foo(__m512d a, __m512d b)
+{
+  return a + b;
+}
-- 
2.31.1



[COMMITTED] Gori_on_edge tweaks.

2024-05-28 Thread Andrew MacLeod


FAST_VRP uses a non-ranger gori_on_edge routine to calculate the full 
set of SSA ranges that can be calculated on an edge.  It accepted an 
optional outgoing_edge_range object for callers that wanted to handle 
switches.  That functionality is now integrated with the gori () method 
of a range_query, so the parameter is no longer needed.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

commit a19f588d0b71a4cbc48b064177de87d3ca46b39f
Author: Andrew MacLeod 
Date:   Wed May 22 19:51:16 2024 -0400

Gori_on_edge tweaks.

FAST_VRP uses a non-ranger gori_on_edge routine which allows an optional
outgoing_edge_range object if one wanted to use switches.  This is now
integrated with the gori () method of a range_query, and is no longer
needed.

* gimple-range-gori.cc (gori_on_edge): Always use static ranges
from the specified range_query.
* gimple-range-gori.h (gori_on_edge): Change prototype.
* gimple-range.cc (dom_ranger::maybe_push_edge): Change arguments
to call.

diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc
index 0d471b46903..d489aef312c 100644
--- a/gcc/gimple-range-gori.cc
+++ b/gcc/gimple-range-gori.cc
@@ -1625,28 +1625,20 @@ gori_calc_operands (vrange &lhs, gimple *stmt, ssa_cache &r, range_query *q)
 }
 
 // Use ssa_cache R as a repository for all outgoing ranges on edge E that
-// can be calculated.  Use OGR if present to establish starting edge ranges,
-// and Q to resolve operand values.  If Q is NULL use the current range
+// can be calculated.  Use Q to establish starting edge ranges and to resolve
+// operand values.  If Q is NULL use the current range
 // query available to the system.
 
 bool
-gori_on_edge (ssa_cache &r, edge e, range_query *q, gimple_outgoing_range *ogr)
+gori_on_edge (ssa_cache &r, edge e, range_query *q)
 {
+  if (!q)
+q = get_range_query (cfun);
   // Start with an empty vector
   r.clear ();
   int_range_max lhs;
   // Determine if there is an outgoing edge.
-  gimple *stmt;
-  if (ogr)
-stmt = ogr->edge_range_p (lhs, e);
-  else
-{
-  stmt = gimple_outgoing_range_stmt_p (e->src);
-  if (stmt && is_a (stmt))
-   gcond_edge_range (lhs, e);
-  else
-   stmt = NULL;
-}
+  gimple *stmt = q->gori ().edge_range_p (lhs, e);
   if (!stmt)
 return false;
   gori_calc_operands (lhs, stmt, r, q);
diff --git a/gcc/gimple-range-gori.h b/gcc/gimple-range-gori.h
index 9b4bcd919f5..11019e38471 100644
--- a/gcc/gimple-range-gori.h
+++ b/gcc/gimple-range-gori.h
@@ -213,10 +213,8 @@ private:
 // ssa_cache structure).
 // GORI_NAME_ON_EDGE  is used to simply ask if NAME has a range on edge E
 
-// Fill ssa-cache R with any outgoing ranges on edge E, using OGR and QUERY.
-bool gori_on_edge (class ssa_cache &r, edge e,
-  range_query *query = NULL,
-  gimple_outgoing_range *ogr = NULL);
+// Fill ssa-cache R with any outgoing ranges on edge E, using QUERY.
+bool gori_on_edge (class ssa_cache &r, edge e, range_query *query = NULL);
 
 // Query if NAME has an outgoing range on edge E, and return it in R if so.
 // Note this doesnt use ranger, its a static GORI analysis of the range in
diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 711646abb67..be22bb4aa18 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -1156,7 +1156,7 @@ dom_ranger::maybe_push_edge (edge e, bool edge_0)
 e_cache = m_freelist.pop ();
   else
 e_cache = new ssa_lazy_cache;
-  gori_on_edge (*e_cache, e, this, &gori ());
+  gori_on_edge (*e_cache, e, this);
   if (e_cache->empty_p ())
 m_freelist.safe_push (e_cache);
   else


Re: [PATCH v2 1/2] driver: Use -as/ld/objcopy as final fallback instead of native ones for cross

2024-05-28 Thread YunQiang Su
Richard Sandiford wrote on Wed, May 29, 2024 at 05:28:
>
> YunQiang Su  writes:
> > If `find_a_program` cannot find `as/ld/objcopy` and we are a cross toolchain,
> > the final fallback is `as/ld` of system.  In fact, we can have a try with
> > -as/ld/objcopy before fallback to native as/ld/objcopy.
> >
> > This patch is derivatived from Debian's patch:
> >   gcc-search-prefixed-as-ld.diff
>
> I'm probably making you repeat a previous discussion, sorry, but could
> you describe the use case in more detail?  The current approach to
> handling cross toolchains has been used for many years.  Presumably
> this patch is supporting a different way of organising things,
> but I wasn't sure from the description what it was.
>
> AIUI, we currently assume that cross as, ld and objcopy will be
> installed under those names in $prefix/$target_alias/bin (aka $tooldir/bin).
> E.g.:
>
>bin/aarch64-elf-as = aarch64-elf/bin/as
>
> GCC should then find as in aarch64-elf/bin.
>
> Is that not true in your case?
>

Yes. This patch is only about the final fallback. I mean aarch64-elf/bin/as
still has higher priority than bin/aarch64-elf-as.

In the current code, we find gas with:
/prefix/aarch64-elf/bin/as > $PATH/as

This patch adds a new candidate between them:
/prefix/aarch64-elf/bin/as > $PATH/aarch64-elf-as > $PATH/as
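The resulting lookup order can be summarized with a small sketch. `cross_tool_candidates` is a hypothetical helper, not driver code: in the patch itself the target name comes from DEFAULT_REAL_TARGET_MACHINE, the tooldir lookup is done by find_a_program, and the prefixed tool is only used if running it with `--version` succeeds.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of the search order for a cross toolchain after
// this patch, from highest to lowest priority.
std::vector<std::string>
cross_tool_candidates (const std::string &prefix, const std::string &target,
                       const std::string &prog)
{
  return {
    prefix + "/" + target + "/bin/" + prog,  // tooldir (unchanged behavior)
    target + "-" + prog,                     // new: prefixed tool on $PATH
    prog                                     // final fallback: native tool on $PATH
  };
}
```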

> To be clear, I'm not saying the patch is wrong.  I'm just trying to
> understand why the patch is needed.
>

Right.  If GCC is configured correctly, it is not very useful.
It may help some less careful users, for example when binutils
is installed into a different prefix than libc and the rest.

For example, binutils is installed into /usr/aarch64-elf/bin, while
libc is installed into /usr/local/aarch64-elf/.

> Thanks,
> Richard
>
> >
> > gcc
> >   * gcc.cc(execute): Looks for -as/ld/objcopy before fallback
> >   to native as/ld/objcopy.
> > ---
> >  gcc/gcc.cc | 20 
> >  1 file changed, 20 insertions(+)
> >
> > diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> > index 830a4700a87..3dc6348d761 100644
> > --- a/gcc/gcc.cc
> > +++ b/gcc/gcc.cc
> > @@ -3293,6 +3293,26 @@ execute (void)
> >string = find_a_program(commands[0].prog);
> >if (string)
> >   commands[0].argv[0] = string;
> > +  else if (*cross_compile != '0'
> > + && !strcmp (commands[0].argv[0], commands[0].prog)
> > + && (!strcmp (commands[0].prog, "as")
> > + || !strcmp (commands[0].prog, "ld")
> > + || !strcmp (commands[0].prog, "objcopy")))
> > + {
> > +   string = concat (DEFAULT_REAL_TARGET_MACHINE, "-",
> > + commands[0].prog, NULL);
> > +   const char *string_args[] = {string, "--version", NULL};
> > +   int exit_status = 0;
> > +   int err = 0;
> > +   const char *errmsg = pex_one (PEX_SEARCH, string,
> > +   CONST_CAST (char **, string_args), string,
> > +   NULL, NULL, &exit_status, &err);
> > +   if (errmsg == NULL && exit_status == 0 && err == 0)
> > + {
> > +   commands[0].argv[0] = string;
> > +   commands[0].prog = string;
> > + }
> > + }
> >  }
> >
> >for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)


Re: [PATCHv3] Optab: add isfinite_optab for __builtin_isfinite

2024-05-28 Thread Kewen.Lin
on 2024/5/28 20:09, Richard Biener wrote:
> On Tue, May 28, 2024 at 9:09 AM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> on 2024/5/27 20:54, Richard Biener wrote:
>>> On Mon, May 27, 2024 at 11:37 AM HAO CHEN GUI  wrote:

 Hi,
   This patch adds an optab for __builtin_isfinite. The finite check can be
 implemented on rs6000 by a single instruction. It needs an optab to be
 expanded to the certain sequence of instructions.

   The subsequent patches will implement the expand on rs6000.

   Compared to previous version, the main change is to specify acceptable
 modes for the optab.
 https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652170.html

   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
 regressions. Is this OK for trunk?

 Thanks
 Gui Haochen

 ChangeLog
 optab: Add isfinite_optab for isfinite builtin

 gcc/
 * builtins.cc (interclass_mathfn_icode): Set optab to 
 isfinite_optab
 for isfinite builtin.
 * optabs.def (isfinite_optab): New.
 * doc/md.texi (isfinite): Document.


 patch.diff
 diff --git a/gcc/builtins.cc b/gcc/builtins.cc
 index f8d94c4b435..b8432f84020 100644
 --- a/gcc/builtins.cc
 +++ b/gcc/builtins.cc
 @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
errno_set = true; builtin_optab = ilogb_optab; break;
  CASE_FLT_FN (BUILT_IN_ISINF):
builtin_optab = isinf_optab; break;
 -case BUILT_IN_ISNORMAL:
  case BUILT_IN_ISFINITE:
 +  builtin_optab = isfinite_optab; break;
 +case BUILT_IN_ISNORMAL:
  CASE_FLT_FN (BUILT_IN_FINITE):
  case BUILT_IN_FINITED32:
  case BUILT_IN_FINITED64:
 diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
 index 5730bda80dc..67407fad37d 100644
 --- a/gcc/doc/md.texi
 +++ b/gcc/doc/md.texi
 @@ -8557,6 +8557,15 @@ operand 2, greater than operand 2 or is unordered with operand 2.

  This pattern is not allowed to @code{FAIL}.

 +@cindex @code{isfinite@var{m}2} instruction pattern
 +@item @samp{isfinite@var{m}2}
 +Set operand 0 to nonzero if operand 1 is a finite @code{SFmode},
 +@code{DFmode}, or @code{TFmode} floating point number and to 0
>>>
>>> It should probably say scalar floating-point mode?  But what about the 
>>> result?
>>> Is any integer mode OK?  That's esp. important if this might be used on
>>> vector modes.
>>>
 +otherwise.
 +
 +If this pattern @code{FAIL}, a call to the library function
 +@code{isfinite} is used.
>>>
>>> Or it's otherwise inline expanded?  Or does this imply targets
>>> have to make sure to implement the pattern when isfinite is
>>> not available in libc/libm?  I suggest to leave this sentence out,
>>> we usually only say when a pattern may _not_ FAIL (and usually
>>> FAILing isn't different from not providing a pattern).
>>
>> As Haochen's previous reply, I think there are three cases:
>>   1) no optab defined, fold in a generic way;
>>   2) optab defined, SUCC, expand as what it defines;
>>   3) optab defined, FAIL, generate a library call;
>>
>> From above, I had the concern that ports may assume FAILing can
>> fall back with the generic folding, but it's not actually.
> 
> Hmm, but it should.  Can you make that work?

Good point, sure, I'll follow up on this.

BR,
Kewen
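For case 1 above (no optab, generic folding), a finite test needs no dedicated instruction. The sketch below assumes the formulation islessequal (fabs (x), DBL_MAX) for double, which GCC's generic fold is believed to use; treat that as an assumption rather than a description of builtins.cc. NaN compares unordered, so the quiet comparison returns false, and infinities exceed DBL_MAX.

```cpp
#include <cfloat>
#include <cmath>

// Generic, optab-free finite check for double (a sketch): true for every
// finite value, including subnormals, -0.0 and DBL_MAX; false for +/-inf
// and NaN.  std::islessequal does not raise FE_INVALID on NaN.
bool generic_isfinite (double x)
{
  return std::islessequal (std::fabs (x), DBL_MAX);
}
```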

> 
>> Does your comment imply ports usually don't make such assumption
>> (or they just check what happens for FAIL)?
>>
>> BR,
>> Kewen
>>
>>>
  @end table

  @end ifset
 diff --git a/gcc/optabs.def b/gcc/optabs.def
 index ad14f9328b9..dcd77315c2a 100644
 --- a/gcc/optabs.def
 +++ b/gcc/optabs.def
 @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
  OPTAB_D (hypot_optab, "hypot$a3")
  OPTAB_D (ilogb_optab, "ilogb$a2")
  OPTAB_D (isinf_optab, "isinf$a2")
 +OPTAB_D (isfinite_optab, "isfinite$a2")
  OPTAB_D (issignaling_optab, "issignaling$a2")
  OPTAB_D (ldexp_optab, "ldexp$a3")
  OPTAB_D (log10_optab, "log10$a2")
>>
>>
>>



Re: [PATCH v4] RISC-V: Introduce -mvector-strict-align.

2024-05-28 Thread Kito Cheng
I just created two PRs to add those new options to
riscv-toolchain-conventions, so that we can make sure they are aligned
with the clang/LLVM community.

https://github.com/riscv-non-isa/riscv-toolchain-conventions/pull/49
https://github.com/riscv-non-isa/riscv-toolchain-conventions/pull/50

On Wed, May 29, 2024 at 3:20 AM Robin Dapp  wrote:
>
> Hi,
>
> this patch disables movmisalign by default and introduces
> the -mno-vector-strict-align option to override it and re-enable
> movmisalign.  For now, generic-ooo is the only uarch that supports
> misaligned vector access.
>
> The patch also adds a check_effective_target_riscv_v_misalign_ok to
> the testsuite which enables or disables the vector misalignment tests
> depending on whether the target under test can execute a misaligned
> vle32.
>
> Changes from v3:
>  - Addressed Kito's comments.
>  - Made -mscalar-strict-align a real alias.
>
> Regards
>  Robin
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
> Move from here...
> * config/riscv/riscv.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
> ...to here and map to riscv_vector_unaligned_access_p.
> * config/riscv/riscv.opt: Add -mvector-strict-align.
> * config/riscv/riscv.cc (struct riscv_tune_param): Add
> vector_unaligned_access.
> (riscv_override_options_internal): Set
> riscv_vector_unaligned_access_p.
> * doc/invoke.texi: Document -mvector-strict-align.
>
> gcc/testsuite/ChangeLog:
>
> * lib/target-supports.exp: Add
> check_effective_target_riscv_v_misalign_ok.
> * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Add
> -mno-vector-strict-align.
> * gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: Ditto.
> * gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: Ditto.
> * gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: Ditto.
> * gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-8.c: Ditto.
> * gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-9.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls/misalign-1.c: Ditto.
> ---
>  gcc/config/riscv/riscv-opts.h |  3 --
>  gcc/config/riscv/riscv.cc | 19 +++
>  gcc/config/riscv/riscv.h  |  5 +++
>  gcc/config/riscv/riscv.opt|  8 +
>  gcc/doc/invoke.texi   | 22 
>  .../costmodel/riscv/rvv/dynamic-lmul2-7.c |  2 +-
>  .../vect/costmodel/riscv/rvv/vla_vs_vls-10.c  |  2 +-
>  .../vect/costmodel/riscv/rvv/vla_vs_vls-11.c  |  2 +-
>  .../vect/costmodel/riscv/rvv/vla_vs_vls-12.c  |  2 +-
>  .../vect/costmodel/riscv/rvv/vla_vs_vls-8.c   |  2 +-
>  .../vect/costmodel/riscv/rvv/vla_vs_vls-9.c   |  2 +-
>  .../riscv/rvv/autovec/vls/misalign-1.c|  2 +-
>  gcc/testsuite/lib/target-supports.exp | 34 +--
>  13 files changed, 93 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
> index 1b2dd5757a8..f58a07abffc 100644
> --- a/gcc/config/riscv/riscv-opts.h
> +++ b/gcc/config/riscv/riscv-opts.h
> @@ -147,9 +147,6 @@ enum rvv_vector_bits_enum {
>      ? 0                                                                      \
>      : 32 << (__builtin_popcount (opts->x_riscv_zvl_flags) - 1))
>
> -/* TODO: Enable RVV movmisalign by default for now.  */
> -#define TARGET_VECTOR_MISALIGN_SUPPORTED 1
> -
>  /* The maximmum LMUL according to user configuration.  */
>  #define TARGET_MAX_LMUL                                                        \
>    (int) (rvv_max_lmul == RVV_DYNAMIC ? RVV_M8 : rvv_max_lmul)
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index a99211d56b1..13cd61a4a22 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -287,6 +287,7 @@ struct riscv_tune_param
>unsigned short memory_cost;
>unsigned short fmv_cost;
>bool slow_unaligned_access;
> +  bool vector_unaligned_access;
>bool use_divmod_expansion;
>bool overlap_op_by_pieces;
>unsigned int fusible_ops;
> @@ -299,6 +300,10 @@ struct riscv_tune_param
>  /* Whether unaligned accesses execute very slowly.  */
>  bool riscv_slow_unaligned_access_p;
>
> +/* Whether misaligned vector accesses are supported (i.e. do not
> +   throw an exception).  */
> +bool riscv_vector_unaligned_access_p;
> +
>  /* Whether user explicitly passed -mstrict-align.  */
>  bool riscv_user_wants_strict_align;
>
> @@ -441,6 +446,7 @@ static const struct riscv_tune_param rocket_tune_info = {
>5,   /* memory_cost */
>8,   /* fmv_cost */
>true,   /* slow_unaligned_access */
> +  false,   /* vector_unaligned_access */
>false,   /* use_divmod_expansion */
>false,  

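Robin's patch describes a tune-dependent default in which `-mno-vector-strict-align` re-enables `movmisalign`. That interplay can be sketched as a small resolution function. This is a hypothetical model, not the code in `riscv_override_options_internal`. The enum, struct and function names are invented. The precedence given to an explicit `-mvector-strict-align` over the tune default is an assumption. The two tune entries only reflect the message's statement that generic-ooo is currently the only uarch allowing misaligned vector access, plus the `false` shown for rocket in the diff.

```c
#include <stdbool.h>

/* Hypothetical three-state view of the user's command line.  */
enum user_choice { USER_UNSET, USER_STRICT, USER_NO_STRICT };

/* Hypothetical mirror of the new riscv_tune_param field.  */
struct tune_model
{
  const char *name;
  bool vector_unaligned_access;
};

static const struct tune_model rocket_model = { "rocket", false };
static const struct tune_model generic_ooo_model = { "generic-ooo", true };

/* Returns whether misaligned vector accesses (movmisalign) are allowed.
   ASSUMPTION: an explicit option always wins over the tune default.  */
bool
resolve_vector_unaligned_access (const struct tune_model *tune,
                                 enum user_choice choice)
{
  switch (choice)
    {
    case USER_STRICT:
      return false;   /* -mvector-strict-align */
    case USER_NO_STRICT:
      return true;    /* -mno-vector-strict-align */
    default:
      return tune->vector_unaligned_access;   /* no option given */
    }
}
```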
Re: ping: [PATCH] libcpp: Support extended characters for #pragma {push,pop}_macro [PR109704]

2024-05-28 Thread Lewis Hyatt
Hello-

May I please ping this one (now for GCC 15)? Thanks!
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642926.html

-Lewis

On Sat, Feb 10, 2024 at 9:02 AM Lewis Hyatt  wrote:
>
> Hello-
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642926.html
>
> May I please ping this one? Thanks!
>
> On Sat, Jan 13, 2024 at 5:12 PM Lewis Hyatt  wrote:
> >
> > Hello-
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109704
> >
> > The below patch fixes the issue noted in the PR that extended characters
> > cannot appear in the identifier passed to a #pragma push_macro or #pragma
> > pop_macro. Bootstrap + regtest all languages on x86-64 Linux. Is it OK for
> > GCC 13 please?
> >
> > I know we just entered stage 4, however I feel this is kinda like an old
> > regression, given that the issue was not apparent until support for UCNs and
> > UTF-8 in identifiers got added. FWIW, it would be nice if it makes it into
> > GCC 13, because AFAIK all other UTF-8-related bugs are fixed in this
> > release. (The other major one was for extended characters in a user-defined
> > literal, that was fixed by r14-2629).
> >
> > Speaking of just entering stage 4. I do have 4 really short patches sent
> > over the past several months that never got any response. Is there any
> > chance someone may have a few minutes to look at them please? They are
> > really just like 1-3 line fixes for PRs.
> >
> > libcpp (pinged once recently):
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641247.html
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640386.html
> >
> > diagnostics (pinged for 3rd time last week):
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638692.html
>
> > -- >8 --
> >
> > The implementation of #pragma push_macro and #pragma pop_macro has to date
> > made use of an ad-hoc function, _cpp_lex_identifier(), which lexes an
> > identifier out of a string. When support was added for extended characters
> > in identifiers ($, UCNs, or UTF-8), that support was added only for the
> > "normal" way of lexing identifiers out of a cpp_buffer (_cpp_lex_direct) and
> > not for the ad-hoc way. Consequently, extended identifiers are not usable
> > with these pragmas.
> >
> > The logic for lexing identifiers has become more complicated than it was
> > when _cpp_lex_identifier() was written -- it now handles things like \N{}
> > escapes in C++, for instance -- and it no longer seems practical to maintain
> > a redundant code path for lexing identifiers. Address the issue by changing
> > the implementation of #pragma {push,pop}_macro to lex identifiers in the
> > expected way, i.e. by pushing a cpp_buffer and lexing the identifier from
> > there.
> >
> > The existing implementation has some quirks because of the ad-hoc parsing
> > logic. For example:
> >
> >  #pragma push_macro("X ")
> >  ...
> >  #pragma pop_macro("X")
> >
> > will not restore macro X (note the extra space in the first string). 
> > However:
> >
> >  #pragma push_macro("X ")
> >  ...
> >  #pragma pop_macro("X ")
> >
> > actually does successfully restore "X". This is because the key for looking
> > up the saved macro on the push stack is the original string passed, so the
> > string passed to pop_macro needs to match it exactly. It is not that easy to
> > reproduce this logic in the world of extended characters, given that for
> > example it should be valid to pass a UCN to push_macro, and the
> > corresponding UTF-8 to pop_macro. Given that this aspect of the existing
> > behavior seems unintentional and has no tests (and does not match other
> > implementations), I opted to make the new logic more straightforward. The
> > string passed needs to lex to one token, which must be a valid identifier,
> > or else no action is taken and no error is generated. Any diagnostics
> > encountered during lexing (e.g., due to a UTF-8 character not permitted to
> > appear in an identifier) are also suppressed.
> >
> > It could be nice (for GCC 15) to also add a warning if a pop_macro does not
> > match a previous push_macro.
> >
> > libcpp/ChangeLog:
> >
> > PR preprocessor/109704
> > * include/cpplib.h (class cpp_auto_suppress_diagnostics): New class.
> > * errors.cc
> > (cpp_auto_suppress_diagnostics::cpp_auto_suppress_diagnostics): New
> > function.
> > (cpp_auto_suppress_diagnostics::~cpp_auto_suppress_diagnostics): New
> > function.
> > * charset.cc (noop_diagnostic_cb): Remove.
> > (cpp_interpret_string_ranges): Refactor diagnostic suppression logic
> > into new class cpp_auto_suppress_diagnostics.
> > (count_source_chars): Likewise.
> > * directives.cc (cpp_pop_definition): Add cpp_hashnode argument.
> > (lex_identifier_from_string): New static helper function.
> > (push_pop_macro_common): Refactor common logic from
> > do_pragma_push_macro and do_pragma_pop_macro; use
> > lex_identifier_from_string i

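Lewis's patch introduces a new rule: the pragma argument must lex to exactly one identifier, or the pragma is silently ignored. A simplified, ASCII-only C model illustrates it. `pragma_macro_name` is a hypothetical helper, not libcpp's `lex_identifier_from_string`. It ignores UCNs, UTF-8 and `$`, which the real implementation handles by lexing through a `cpp_buffer`.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, ASCII-only model: accept ARG only if, after skipping
   surrounding whitespace, it is exactly one identifier.  On success copy
   it into OUT (of size OUTSZ) and return true; otherwise return false,
   meaning "take no action and emit no diagnostic".  */
bool
pragma_macro_name (const char *arg, char *out, size_t outsz)
{
  const char *p = arg;
  while (isspace ((unsigned char) *p))
    p++;
  if (!(isalpha ((unsigned char) *p) || *p == '_'))
    return false;                 /* not an identifier start */
  const char *start = p;
  while (isalnum ((unsigned char) *p) || *p == '_')
    p++;
  size_t len = (size_t) (p - start);
  while (isspace ((unsigned char) *p))
    p++;
  if (*p != '\0' || len + 1 > outsz)
    return false;                 /* a second token follows, or too long */
  memcpy (out, start, len);
  out[len] = '\0';
  return true;
}
```

Under this rule `push_macro("X ")` and `pop_macro("X")` name the same macro, because both arguments reduce to the identifier `X`.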
[PATCH v2 1/2] RISC-V: add option -m(no-)autovec-segment

2024-05-28 Thread Patrick O'Neill
From: Greg McGary 

Add option -m(no-)autovec-segment to allow or prevent the autovectorizer
from emitting vector segment load/store instructions. This is useful for
performance experiments.

gcc/ChangeLog:
* config/riscv/autovec.md (vec_mask_len_load_lanes, vec_mask_len_store_lanes):
  Predicate with TARGET_VECTOR_AUTOVEC_SEGMENT
* gcc/config/riscv/riscv-opts.h (TARGET_VECTOR_AUTOVEC_SEGMENT): New macro.
* gcc/config/riscv/riscv.opt (-m(no-)autovec-segment): New option.
* testsuite/gcc.target/riscv/rvv/autovec/struct/*_noseg*.c,
testsuite/gcc.target/riscv/rvv/autovec/no-segment.c: New tests.

Tested-by: Edwin Lu 
---
Added tested-by on Vineet's recommendation. Please wait for riscv precommit to
finish before committing.
---
 gcc/config/riscv/autovec.md   |  4 +-
 gcc/config/riscv/riscv-opts.h |  5 ++
 gcc/config/riscv/riscv.opt|  4 ++
 .../gcc.target/riscv/rvv/autovec/no-segment.c | 62 +++
 .../autovec/struct/mask_struct_load_noseg-1.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-2.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-3.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-4.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-5.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-6.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-7.c |  6 ++
 .../struct/mask_struct_load_noseg_run-1.c |  4 ++
 .../struct/mask_struct_load_noseg_run-2.c |  4 ++
 .../struct/mask_struct_load_noseg_run-3.c |  4 ++
 .../struct/mask_struct_load_noseg_run-4.c |  4 ++
 .../struct/mask_struct_load_noseg_run-5.c |  4 ++
 .../struct/mask_struct_load_noseg_run-6.c |  4 ++
 .../struct/mask_struct_load_noseg_run-7.c |  4 ++
 .../struct/mask_struct_store_noseg-1.c|  6 ++
 .../struct/mask_struct_store_noseg-2.c|  6 ++
 .../struct/mask_struct_store_noseg-3.c|  6 ++
 .../struct/mask_struct_store_noseg-4.c|  6 ++
 .../struct/mask_struct_store_noseg-5.c|  6 ++
 .../struct/mask_struct_store_noseg-6.c|  6 ++
 .../struct/mask_struct_store_noseg-7.c|  6 ++
 .../struct/mask_struct_store_noseg_run-1.c|  4 ++
 .../struct/mask_struct_store_noseg_run-2.c|  4 ++
 .../struct/mask_struct_store_noseg_run-3.c|  4 ++
 .../struct/mask_struct_store_noseg_run-4.c|  4 ++
 .../struct/mask_struct_store_noseg_run-5.c|  4 ++
 .../struct/mask_struct_store_noseg_run-6.c|  4 ++
 .../struct/mask_struct_store_noseg_run-7.c|  4 ++
 .../rvv/autovec/struct/struct_vect_noseg-1.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-10.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-11.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-12.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-13.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-14.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-15.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-16.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-17.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-18.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-2.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-3.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-4.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-5.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-6.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-7.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-8.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-9.c  |  7 +++
 .../autovec/struct/struct_vect_noseg_run-1.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-10.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-11.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-12.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-13.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-14.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-15.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-16.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-17.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-18.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-2.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-3.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-4.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-5.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-6.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-7.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-8.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-9.c  |  4 ++
 68 files changed, 410 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-3.c
 create mode 100644 gcc/tes

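The loop below gives a rough idea of what `-mno-autovec-segment` affects. It reads an array of two-field structs. To my understanding, this access pattern is a typical candidate for RVV segment loads (such as `vlseg2e32.v`), which de-interleave the fields into separate vector registers. With the option, the vectorizer has to pick another strategy or leave the loop scalar. The struct and function names are hypothetical.

```c
#include <stddef.h>

/* Interleaved layout: x0 y0 x1 y1 ... in memory.  */
struct point
{
  int x;
  int y;
};

/* Each iteration consumes one x and one y from the interleaved array.  */
void
sum_fields (const struct point *restrict in, int *restrict out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = in[i].x + in[i].y;
}
```

Compare the assembly generated with `-march=rv64gcv -mabi=lp64d -O3`, with and without `-mno-autovec-segment`. That shows whether segment loads are used. The exact output depends on the compiler version and the cost model.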
[PATCH v2 2/2] Prevent divide-by-zero

2024-05-28 Thread Patrick O'Neill
From: Greg McGary 

gcc/ChangeLog:
* gcc/tree-vect-stmts.cc (gcc/tree-vect-stmts.cc): Prevent divide-by-zero.
* testsuite/gcc.target/riscv/rvv/autovec/no-segment.c: Remove xfail.
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c | 1 -
 gcc/tree-vect-stmts.cc  | 3 ++-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
index fd996a27501..79d03612a22 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
@@ -1,6 +1,5 @@
 /* { dg-do compile } */
 /* { dg-options "-march=rv64gcv -mabi=lp64d -mrvv-vector-bits=scalable -O3 -mno-autovec-segment" } */
-/* { xfail *-*-* } */

 enum e { c, d };
 enum g { f };
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 4219ad832db..34f5736ba00 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -11558,7 +11558,8 @@ vectorizable_load (vec_info *vinfo,
 - (vec_num * j + i) * nunits);
/* remain should now be > 0 and < nunits.  */
unsigned num;
-   if (constant_multiple_p (nunits, remain, &num))
+   if (known_gt (remain, 0)
+   && constant_multiple_p (nunits, remain, &num))
  {
tree ptype;
new_vtype
--
2.43.2

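The divide-by-zero fix guards `constant_multiple_p (nunits, remain, &num)` with `known_gt (remain, 0)`. A scalar model shows why the guard is needed, assuming the predicate divides by its second argument. `scalar_constant_multiple_p` and `guarded_multiple` are hypothetical stand-ins. GCC's real functions operate on `poly_int` values.

```c
#include <stdbool.h>

/* Scalar stand-in for constant_multiple_p (a, b, &m): true iff A is an
   exact multiple of B.  With B == 0 the modulus below would be undefined
   behavior, i.e. the division by zero the patch avoids.  */
bool
scalar_constant_multiple_p (unsigned a, unsigned b, unsigned *multiple)
{
  if (a % b != 0)
    return false;
  *multiple = a / b;
  return true;
}

/* Mirrors the patched condition: check remain > 0 before dividing.  */
bool
guarded_multiple (unsigned nunits, unsigned remain, unsigned *num)
{
  return remain > 0 && scalar_constant_multiple_p (nunits, remain, num);
}
```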


[PATCH v2 0/2] RISC-V: add option -m(no-)autovec-segment

2024-05-28 Thread Patrick O'Neill
Rebased to squash Edwin's fixup into Greg's patch. Split out the middle-end
change and xfailed the associated testcase so the second patch can land
separately.

Relying on pre-commit CI for full testing.

Greg McGary (2):
  RISC-V: add option -m(no-)autovec-segment
  Prevent divide-by-zero

 gcc/config/riscv/autovec.md   |  4 +-
 gcc/config/riscv/riscv-opts.h |  5 ++
 gcc/config/riscv/riscv.opt|  4 ++
 .../gcc.target/riscv/rvv/autovec/no-segment.c | 61 +++
 .../autovec/struct/mask_struct_load_noseg-1.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-2.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-3.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-4.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-5.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-6.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-7.c |  6 ++
 .../struct/mask_struct_load_noseg_run-1.c |  4 ++
 .../struct/mask_struct_load_noseg_run-2.c |  4 ++
 .../struct/mask_struct_load_noseg_run-3.c |  4 ++
 .../struct/mask_struct_load_noseg_run-4.c |  4 ++
 .../struct/mask_struct_load_noseg_run-5.c |  4 ++
 .../struct/mask_struct_load_noseg_run-6.c |  4 ++
 .../struct/mask_struct_load_noseg_run-7.c |  4 ++
 .../struct/mask_struct_store_noseg-1.c|  6 ++
 .../struct/mask_struct_store_noseg-2.c|  6 ++
 .../struct/mask_struct_store_noseg-3.c|  6 ++
 .../struct/mask_struct_store_noseg-4.c|  6 ++
 .../struct/mask_struct_store_noseg-5.c|  6 ++
 .../struct/mask_struct_store_noseg-6.c|  6 ++
 .../struct/mask_struct_store_noseg-7.c|  6 ++
 .../struct/mask_struct_store_noseg_run-1.c|  4 ++
 .../struct/mask_struct_store_noseg_run-2.c|  4 ++
 .../struct/mask_struct_store_noseg_run-3.c|  4 ++
 .../struct/mask_struct_store_noseg_run-4.c|  4 ++
 .../struct/mask_struct_store_noseg_run-5.c|  4 ++
 .../struct/mask_struct_store_noseg_run-6.c|  4 ++
 .../struct/mask_struct_store_noseg_run-7.c|  4 ++
 .../rvv/autovec/struct/struct_vect_noseg-1.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-10.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-11.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-12.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-13.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-14.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-15.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-16.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-17.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-18.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-2.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-3.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-4.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-5.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-6.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-7.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-8.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-9.c  |  7 +++
 .../autovec/struct/struct_vect_noseg_run-1.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-10.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-11.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-12.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-13.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-14.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-15.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-16.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-17.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-18.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-2.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-3.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-4.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-5.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-6.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-7.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-8.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-9.c  |  4 ++
 gcc/tree-vect-stmts.cc|  3 +-
 69 files changed, 411 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/stru

[patch] libgomp: Enable USM for some nvptx devices

2024-05-28 Thread Tobias Burnus
While most of the nvptx systems I have access to don't have the support 
for CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES, 
one has:


Tesla V100-SXM2-16GB (as installed, e.g., on ORNL's Summit) does support 
this feature. And with that feature, unified-shared memory support does 
work, presumably by handling automatic page migration when a page fault 
occurs.


Hence: Enable USM support for those. When doing so, all 'requires 
unified_shared_memory' tests of sollve_vv pass :-)


I am not quite sure whether there are unintended side effects, hence, I 
have not enabled support for it in general. In particular, 'declare 
target enter(global_var)' seems to be mishandled (I think it should be 
link + pointer updated to point to the host; cf. description for 
'self_maps'). Thus, it is not enabled by default but only when USM has 
been requested.


OK for mainline?
Comments? Remarks? Suggestions?

Tobias

PS: I guess some more USM tests should be added…

libgomp: Enable USM for some nvptx devices

A few high-end nvptx devices support the attribute
CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES;
for those, unified shared memory is supported in hardware. This
patch enables support for those - if all installed nvptx devices
have this feature (as the capabilities are per device type).

This exposes a bug in gomp_copy_back_icvs: it previously used
omp_get_mapped_ptr to find mapped variables, but that returns the
unchanged pointer in case of shared memory. In this case, however,
we have a few actually mapped pointers, such as the ICV variables.
Additionally, there was a mismatch with regard to '-1' for the
device number, as gomp_copy_back_icvs and omp_get_mapped_ptr count
differently. Hence, do the lookup manually.

include/ChangeLog:

	* cuda/cuda.h
	(CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES):
	Add.

libgomp/ChangeLog:

	* libgomp.texi (nvptx): Update USM description.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices):
	Claim support when requesting USM and all devices support 
	CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES.
	* target.c (gomp_copy_back_icvs): Fix device ptr lookup.
	(gomp_target_init): Set GOMP_OFFLOAD_CAP_SHARED_MEM if the
	devices support USM.

 include/cuda/cuda.h   |  3 ++-
 libgomp/libgomp.texi  |  5 -
 libgomp/plugin/plugin-nvptx.c | 15 +++
 libgomp/target.c  | 24 +++-
 4 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index 0dca4b3a5c0..db640d20366 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -83,7 +83,8 @@ typedef enum {
   CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR = 39,
   CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT = 40,
   CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING = 41,
-  CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82
+  CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82,
+  CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES = 100
 } CUdevice_attribute;
 
 enum {
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 71d62105a20..e0d37f67983 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -6435,7 +6435,10 @@ The implementation remark:
   the next reverse offload region is only executed after the previous
   one returned.
 @item OpenMP code that has a @code{requires} directive with
-  @code{unified_shared_memory} will remove any nvptx device from the
+  @code{unified_shared_memory} will run on nvptx devices if and only if
+  all of those support the
+  @code{CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES}
+  attribute; otherwise, all nvptx devices are removed from the
   list of available devices (``host fallback'').
 @item The default per-warp stack size is 128 kiB; see also @code{-msoft-stack}
   in the GCC manual.
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 5aad3448a8d..c4b0f5dd4bf 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1201,8 +1201,23 @@ GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
   if (num_devices > 0
   && ((omp_requires_mask
 	   & ~(GOMP_REQUIRES_UNIFIED_ADDRESS
+	   | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY
 	   | GOMP_REQUIRES_REVERSE_OFFLOAD)) != 0))
 return -1;
+  /* Check whether automatic page migration is supported; if so, enable USM.
+ Currently, capabilities is per device type, hence, check all devices.  */
+  if (num_devices > 0
+  && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY))
+for (int dev = 0; dev < num_devices; dev++)
+  {
+	int pi;
+	CUresult r;
+	r = CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &pi,
+	  CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES,
+	  dev);
+	if (r != CUDA_SUCCESS || pi == 0)
+	  return -1;
+  }
   return num_devices;
 }
 
diff --git a/libgomp/target.c b/lib

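The check added to `GOMP_OFFLOAD_get_num_devices` claims USM only if every device reports the page-table attribute. Below is a minimal model in which a callback replaces the CUDA query, so it runs without a GPU. `usm_num_devices`, `attr_query` and the two stub queries are hypothetical names. The real code calls `cuDeviceGetAttribute` with `CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES`.

```c
#include <stdbool.h>

/* Stand-in for the attribute query: returns false on error, otherwise
   stores the attribute value for device DEV in *VALUE.  */
typedef bool (*attr_query) (int dev, int *value);

/* Returns NUM_DEVICES, or -1 if USM was requested and some device lacks
   the attribute (mirroring the patch's early "return -1").  */
int
usm_num_devices (int num_devices, bool usm_requested, attr_query query_attr)
{
  if (num_devices > 0 && usm_requested)
    for (int dev = 0; dev < num_devices; dev++)
      {
        int value = 0;
        if (!query_attr (dev, &value) || value == 0)
          return -1;
      }
  return num_devices;
}

/* Hypothetical stubs: every device supports the feature ...  */
bool
all_support (int dev, int *value)
{
  (void) dev;
  *value = 1;
  return true;
}

/* ... or device 1 does not.  */
bool
second_lacks (int dev, int *value)
{
  *value = (dev != 1);
  return true;
}
```

Requiring the attribute on all devices matches the commit message's remark that the capability is per device type.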
Re: [PATCHv3] Optab: add isfinite_optab for __builtin_isfinite

2024-05-28 Thread Segher Boessenkool
Hi!

On Mon, May 27, 2024 at 05:37:23PM +0800, HAO CHEN GUI wrote:
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>errno_set = true; builtin_optab = ilogb_optab; break;
>  CASE_FLT_FN (BUILT_IN_ISINF):
>builtin_optab = isinf_optab; break;
> -case BUILT_IN_ISNORMAL:
>  case BUILT_IN_ISFINITE:
> +  builtin_optab = isfinite_optab; break;

This needs a line break after the first ; (like after *any* semicolon
in C).  It is rather important that every "break;" stands out :-)

> +@cindex @code{isfinite@var{m}2} instruction pattern
> +@item @samp{isfinite@var{m}2}
> +Set operand 0 to nonzero if operand 1 is a finite @code{SFmode},
> +@code{DFmode}, or @code{TFmode} floating point number and to 0
> +otherwise.

operand 0 is the output of the builtin, right?  So write that instead?
"Return 1 if the operand (a scalar floating point number) is finite",
or such?


Segher

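Segher asks the documentation to state the semantics in terms of the builtin's result. For `DFmode`, those semantics can be written down as a bit test. This sketch assumes `double` is IEEE 754 binary64. `isfinite_df_ref` is a hypothetical name. The `SFmode` and `TFmode` cases would test their own exponent fields.

```c
#include <stdint.h>
#include <string.h>

/* Reference semantics for a DFmode isfinite pattern: the result is 1
   unless the 11-bit biased exponent is all ones, which encodes an
   infinity or a NaN.  */
int
isfinite_df_ref (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);   /* type-pun without aliasing UB */
  uint64_t exponent = (bits >> 52) & 0x7ff;
  return exponent != 0x7ff;
}
```

Zero, subnormals and the largest finite value give 1; infinities and NaNs give 0.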

Re: [PATCHv3] Optab: add isfinite_optab for __builtin_isfinite

2024-05-28 Thread Segher Boessenkool
On Tue, May 28, 2024 at 02:09:50PM +0200, Richard Biener wrote:
> On Tue, May 28, 2024 at 9:09 AM Kewen.Lin  wrote:
> > As Haochen's previous reply, I think there are three cases:
> >   1) no optab defined, fold in a generic way;
> >   2) optab defined, SUCC, expand as what it defines;
> >   3) optab defined, FAIL, generate a library call;
> >
> > From above, I had the concern that ports may assume FAILing can
> > fall back with the generic folding, but it's not actually.
> 
> Hmm, but it should.  Can you make that work?

That certainly would be the least surprising!


Segher


Re: [PATCH 2/2] RISC-V: Fix testcases renamed test flag options

2024-05-28 Thread Vineet Gupta
On 5/28/24 14:55, Patrick O'Neill wrote:
> From: Edwin Lu 
>
> Some testcases still had --param=riscv-autovec-preference=_,
> update to use -mrvv-vector-bits=_.

And this can be squashed with the previous one, maybe adding Tested-by: Edwin.

Thx,
-Vineet


Re: [PATCH 1/2] RISC-V: add option -m(no-)autovec-segment

2024-05-28 Thread Vineet Gupta



On 5/28/24 14:55, Patrick O'Neill wrote:
> From: Greg McGary 
>
> Add option -m(no-)autovec-segment to allow or prevent the autovectorizer
> from emitting vector segment load/store instructions. This is useful for
> performance experiments.
>
> gcc/ChangeLog:
>   * config/riscv/autovec.md (vec_mask_len_load_lanes, vec_mask_len_store_lanes):
> Predicate with TARGET_VECTOR_AUTOVEC_SEGMENT
>   * gcc/config/riscv/riscv-opts.h (TARGET_VECTOR_AUTOVEC_SEGMENT): New macro.
>   * gcc/config/riscv/riscv.opt (-m(no-)autovec-segment): New option.
>   * gcc/tree-vect-stmts.cc (gcc/tree-vect-stmts.cc): Prevent divide-by-zero.

I think this middle-end change needs to be broken out, even if
eventually merged when committing, with its own test case.

-Vineet


[PATCH 2/2] RISC-V: Fix testcases renamed test flag options

2024-05-28 Thread Patrick O'Neill
From: Edwin Lu 

Some testcases still had --param=riscv-autovec-preference=_,
update to use -mrvv-vector-bits=_.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/no-segment.c: Update dejagnu flags
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-1.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-2.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-3.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-4.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-5.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-6.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-7.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-1.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-2.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-3.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-4.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-5.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-6.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_noseg_run-7.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-1.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-10.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-11.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-12.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-13.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-14.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-15.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-16.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-17.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-18.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-2.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-3.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-4.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-5.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-6.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-7.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-8.c: Ditto
* gcc.target/riscv/rvv/autovec/struct/struct_vect_noseg_run-9.c: Ditto
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_load_noseg_run-1.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_load_noseg_run-2.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_load_noseg_run-3.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_load_noseg_run-4.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_load_noseg_run-5.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_load_noseg_run-6.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_load_noseg_run-7.c   | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_store_noseg_run-1.c  | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_store_noseg_run-2.c  | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_store_noseg_run-3.c  | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_store_noseg_run-4.c  | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_store_noseg_run-5.c  | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_store_noseg_run-6.c  | 2 +-
 .../riscv/rvv/autovec/struct/mask_struct_store_noseg_run-7.c  | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-1.c| 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-10.c   | 4 ++--
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-11.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-12.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-13.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-14.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-15.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-16.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-17.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-18.c   | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-2.c    | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-3.c    | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-4.c    | 2 +-
 .../riscv/rvv/autovec/struct/struct_vect_noseg_run-5.c    | 2 +-
 .../riscv/rvv/autovec/struct/s

[PATCH 1/2] RISC-V: add option -m(no-)autovec-segment

2024-05-28 Thread Patrick O'Neill
From: Greg McGary 

Add option -m(no-)autovec-segment to allow or prevent the autovectorizer
from emitting vector segment load/store instructions.  This is useful for
performance experiments.

gcc/ChangeLog:
* config/riscv/autovec.md (vec_mask_len_load_lanes)
  (vec_mask_len_store_lanes): Predicate with
  TARGET_VECTOR_AUTOVEC_SEGMENT.
* config/riscv/riscv-opts.h (TARGET_VECTOR_AUTOVEC_SEGMENT): New macro.
* config/riscv/riscv.opt (-m(no-)autovec-segment): New option.
* tree-vect-stmts.cc: Prevent divide-by-zero.
* testsuite/gcc.target/riscv/rvv/autovec/struct/*_noseg*.c,
  testsuite/gcc.target/riscv/rvv/autovec/no-segment.c: New tests.
---
 gcc/config/riscv/autovec.md   |  4 +-
 gcc/config/riscv/riscv-opts.h |  5 ++
 gcc/config/riscv/riscv.opt    |  4 ++
 .../gcc.target/riscv/rvv/autovec/no-segment.c | 61 +++
 .../autovec/struct/mask_struct_load_noseg-1.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-2.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-3.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-4.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-5.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-6.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-7.c |  6 ++
 .../struct/mask_struct_load_noseg_run-1.c |  4 ++
 .../struct/mask_struct_load_noseg_run-2.c |  4 ++
 .../struct/mask_struct_load_noseg_run-3.c |  4 ++
 .../struct/mask_struct_load_noseg_run-4.c |  4 ++
 .../struct/mask_struct_load_noseg_run-5.c |  4 ++
 .../struct/mask_struct_load_noseg_run-6.c |  4 ++
 .../struct/mask_struct_load_noseg_run-7.c |  4 ++
 .../struct/mask_struct_store_noseg-1.c    |  6 ++
 .../struct/mask_struct_store_noseg-2.c    |  6 ++
 .../struct/mask_struct_store_noseg-3.c    |  6 ++
 .../struct/mask_struct_store_noseg-4.c    |  6 ++
 .../struct/mask_struct_store_noseg-5.c    |  6 ++
 .../struct/mask_struct_store_noseg-6.c    |  6 ++
 .../struct/mask_struct_store_noseg-7.c    |  6 ++
 .../struct/mask_struct_store_noseg_run-1.c |  4 ++
 .../struct/mask_struct_store_noseg_run-2.c |  4 ++
 .../struct/mask_struct_store_noseg_run-3.c |  4 ++
 .../struct/mask_struct_store_noseg_run-4.c |  4 ++
 .../struct/mask_struct_store_noseg_run-5.c |  4 ++
 .../struct/mask_struct_store_noseg_run-6.c |  4 ++
 .../struct/mask_struct_store_noseg_run-7.c |  4 ++
 .../rvv/autovec/struct/struct_vect_noseg-1.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-10.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-11.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-12.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-13.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-14.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-15.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-16.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-17.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-18.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-2.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-3.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-4.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-5.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-6.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-7.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-8.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-9.c  |  7 +++
 .../autovec/struct/struct_vect_noseg_run-1.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-10.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-11.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-12.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-13.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-14.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-15.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-16.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-17.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-18.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-2.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-3.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-4.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-5.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-6.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-7.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-8.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-9.c  |  4 ++
 gcc/tree-vect-stmts.cc        |  3 +-
 69 files changed, 411 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-3.c
 create mode 100644 

[PATCH 0/2] RISC-V: Add -m(no-)autovec-segment option

2024-05-28 Thread Patrick O'Neill
Rebased and combined these two patches into a series so that precommit CI
can test them properly.

Edwin Lu (1):
  RISC-V: Fix testcases renamed test flag options

Greg McGary (1):
  RISC-V: add option -m(no-)autovec-segment

 gcc/config/riscv/autovec.md   |  4 +-
 gcc/config/riscv/riscv-opts.h |  5 ++
 gcc/config/riscv/riscv.opt    |  4 ++
 .../gcc.target/riscv/rvv/autovec/no-segment.c | 61 +++
 .../autovec/struct/mask_struct_load_noseg-1.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-2.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-3.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-4.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-5.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-6.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-7.c |  6 ++
 .../struct/mask_struct_load_noseg_run-1.c |  4 ++
 .../struct/mask_struct_load_noseg_run-2.c |  4 ++
 .../struct/mask_struct_load_noseg_run-3.c |  4 ++
 .../struct/mask_struct_load_noseg_run-4.c |  4 ++
 .../struct/mask_struct_load_noseg_run-5.c |  4 ++
 .../struct/mask_struct_load_noseg_run-6.c |  4 ++
 .../struct/mask_struct_load_noseg_run-7.c |  4 ++
 .../struct/mask_struct_store_noseg-1.c    |  6 ++
 .../struct/mask_struct_store_noseg-2.c    |  6 ++
 .../struct/mask_struct_store_noseg-3.c    |  6 ++
 .../struct/mask_struct_store_noseg-4.c    |  6 ++
 .../struct/mask_struct_store_noseg-5.c    |  6 ++
 .../struct/mask_struct_store_noseg-6.c    |  6 ++
 .../struct/mask_struct_store_noseg-7.c    |  6 ++
 .../struct/mask_struct_store_noseg_run-1.c |  4 ++
 .../struct/mask_struct_store_noseg_run-2.c |  4 ++
 .../struct/mask_struct_store_noseg_run-3.c |  4 ++
 .../struct/mask_struct_store_noseg_run-4.c |  4 ++
 .../struct/mask_struct_store_noseg_run-5.c |  4 ++
 .../struct/mask_struct_store_noseg_run-6.c |  4 ++
 .../struct/mask_struct_store_noseg_run-7.c |  4 ++
 .../rvv/autovec/struct/struct_vect_noseg-1.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-10.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-11.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-12.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-13.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-14.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-15.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-16.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-17.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-18.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-2.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-3.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-4.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-5.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-6.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-7.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-8.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-9.c  |  7 +++
 .../autovec/struct/struct_vect_noseg_run-1.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-10.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-11.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-12.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-13.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-14.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-15.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-16.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-17.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-18.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-2.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-3.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-4.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-5.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-6.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-7.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-8.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-9.c  |  4 ++
 gcc/tree-vect-stmts.cc        |  3 +-
 69 files changed, 411 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg_run-1.c
 create mode 100644 gcc/testsuite/gcc.tar

[COMMITTED] More tweaks from gimple_outgoing_range changes.

2024-05-28 Thread Andrew MacLeod
The dom_ranger class used for fast VRP no longer needs its own local
gimple_outgoing_range object, as it is now always available from the
range_query parent class.


The builtin_unreachable code for adjusting globals and removing the 
builtin calls during the final VRP pass can now function with just a 
range_query object rather than a specific ranger.   This adjusts it to 
use the extra methods in the base range_query API. This will now allow 
removal of builtin_unreachable calls even if there is no active ranger 
with dependency info available.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew
From 05ff069ba937dc3970f2a757e426935fcf4c15fb Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 22 May 2024 19:27:01 -0400
Subject: [PATCH 3/5] More tweaks from gimple_outgoing_range changes.

The dom_ranger used for fast VRP no longer needs a local
gimple_outgoing_range object, as it is now always available from the
range_query parent class.

The builtin_unreachable code for adjusting globals and removing the
builtin calls during the final VRP pass can now function with just
a range_query object rather than a specific ranger.   This adjusts it to
use the extra methods in the range_query API.
This will now allow removal of builtin_unreachable calls even if there is no
active ranger with dependency info available.

	* gimple-range.cc (dom_ranger::dom_ranger): Do not initialize m_out.
	(dom_ranger::maybe_push_edge): Use gori () rather than m_out.
	* gimple-range.h (dom_ranger::m_out): Remove.
	* tree-vrp.cc (remove_unreachable::remove_unreachable): Use a
	range_query rather than a gimple_ranger.
	(remove_unreachable::remove): New.
	(remove_unreachable::m_ranger): Change to a range_query.
	(remove_unreachable::handle_early): If there is no dependency
	information, do nothing.
	(remove_unreachable::remove_and_update_globals): Do not update
	globals if there is no dependency info to use.
---
 gcc/gimple-range.cc |  4 ++--
 gcc/gimple-range.h  |  1 -
 gcc/tree-vrp.cc     | 47 +++--
 3 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 0749c9fa215..711646abb67 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -922,7 +922,7 @@ assume_query::dump (FILE *f)
 
 // Create a DOM based ranger for use by a DOM walk pass.
 
-dom_ranger::dom_ranger () : m_global (), m_out ()
+dom_ranger::dom_ranger () : m_global ()
 {
   m_freelist.create (0);
   m_freelist.truncate (0);
@@ -1156,7 +1156,7 @@ dom_ranger::maybe_push_edge (edge e, bool edge_0)
 e_cache = m_freelist.pop ();
   else
 e_cache = new ssa_lazy_cache;
-  gori_on_edge (*e_cache, e, this, &m_out);
+  gori_on_edge (*e_cache, e, this, &gori ());
   if (e_cache->empty_p ())
 m_freelist.safe_push (e_cache);
   else
diff --git a/gcc/gimple-range.h b/gcc/gimple-range.h
index 1532951a449..180090bed15 100644
--- a/gcc/gimple-range.h
+++ b/gcc/gimple-range.h
@@ -121,7 +121,6 @@ protected:
   DISABLE_COPY_AND_ASSIGN (dom_ranger);
   void maybe_push_edge (edge e, bool edge_0);
   ssa_cache m_global;
-  gimple_outgoing_range m_out;
   vec m_freelist;
   vec m_e0;
   vec m_e1;
diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc
index 7d7f9fe2932..1c7b451d8fb 100644
--- a/gcc/tree-vrp.cc
+++ b/gcc/tree-vrp.cc
@@ -85,14 +85,15 @@ along with GCC; see the file COPYING3.  If not see
 
 class remove_unreachable {
 public:
-  remove_unreachable (gimple_ranger &r, bool all) : m_ranger (r), final_p (all)
+  remove_unreachable (range_query &r, bool all) : m_ranger (r), final_p (all)
 { m_list.create (30); }
   ~remove_unreachable () { m_list.release (); }
   void handle_early (gimple *s, edge e);
   void maybe_register (gimple *s);
+  bool remove ();
   bool remove_and_update_globals ();
   vec > m_list;
-  gimple_ranger &m_ranger;
+  range_query &m_ranger;
   bool final_p;
 };
 
@@ -195,6 +196,9 @@ fully_replaceable (tree name, basic_block bb)
 void
 remove_unreachable::handle_early (gimple *s, edge e)
 {
+  // If there is no gori_ssa, there is no early processsing.
+  if (!m_ranger.gori_ssa ())
+return ;
   bool lhs_p = TREE_CODE (gimple_cond_lhs (s)) == SSA_NAME;
   bool rhs_p = TREE_CODE (gimple_cond_rhs (s)) == SSA_NAME;
   // Do not remove __builtin_unreachable if it confers a relation, or
@@ -253,6 +257,41 @@ remove_unreachable::handle_early (gimple *s, edge e)
 }
 }
 
+// Process the edges in the list, change the conditions and removing any
+// dead code feeding those conditions.   This removes the unreachables, but
+// makes no attempt to set globals values.
+
+bool
+remove_unreachable::remove ()
+{
+  if (!final_p || m_list.length () == 0)
+return false;
+
+  bool change = false;
+  unsigned i;
+  for (i = 0; i < m_list.length (); i++)
+{
+  auto eb = m_list[i];
+  basic_block src = BASIC_BLOCK_FOR_FN (cfun, eb.first);
+  basic_block dest = BASIC_BLOCK_FOR_FN (cfun, eb.second);
+  if (!src || !dest)
+	con

Re: [PATCHv3] Optab: add isfinite_optab for __builtin_isfinite

2024-05-28 Thread Richard Sandiford
HAO CHEN GUI  writes:
> Hi,
>   This patch adds an optab for __builtin_isfinite. On rs6000 the finite
> check can be implemented with a single instruction, but an optab is
> needed so that the builtin can be expanded to that instruction sequence.
>
>   The subsequent patches will implement the expand on rs6000.
>
>   Compared to the previous version, the main change is to specify the
> acceptable modes for the optab.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652170.html
>
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?
>
> Thanks
> Gui Haochen
>
> ChangeLog
> optab: Add isfinite_optab for isfinite builtin
>
> gcc/
>   * builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
>   for isfinite builtin.
>   * optabs.def (isfinite_optab): New.
>   * doc/md.texi (isfinite): Document.
>
>
> patch.diff
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index f8d94c4b435..b8432f84020 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>errno_set = true; builtin_optab = ilogb_optab; break;
>  CASE_FLT_FN (BUILT_IN_ISINF):
>builtin_optab = isinf_optab; break;
> -case BUILT_IN_ISNORMAL:
>  case BUILT_IN_ISFINITE:
> +  builtin_optab = isfinite_optab; break;
> +case BUILT_IN_ISNORMAL:
>  CASE_FLT_FN (BUILT_IN_FINITE):
>  case BUILT_IN_FINITED32:
>  case BUILT_IN_FINITED64:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 5730bda80dc..67407fad37d 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -8557,6 +8557,15 @@ operand 2, greater than operand 2 or is unordered with operand 2.
>
>  This pattern is not allowed to @code{FAIL}.
>
> +@cindex @code{isfinite@var{m}2} instruction pattern
> +@item @samp{isfinite@var{m}2}
> +Set operand 0 to nonzero if operand 1 is a finite @code{SFmode},
> +@code{DFmode}, or @code{TFmode} floating point number and to 0
> +otherwise.

This has probably already been discussed, sorry, but how about defining
the optab to return a strict 0/1 result, rather than just zero/nonzero?
I realise that's stricter than the underlying math.h routines, but it
would in principle avoid the need to expand extra instructions in
a setcc-like operation.

Richard

> +
> +If this pattern @code{FAIL}, a call to the library function
> +@code{isfinite} is used.
> +
>  @end table
>
>  @end ifset
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..dcd77315c2a 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
>  OPTAB_D (hypot_optab, "hypot$a3")
>  OPTAB_D (ilogb_optab, "ilogb$a2")
>  OPTAB_D (isinf_optab, "isinf$a2")
> +OPTAB_D (isfinite_optab, "isfinite$a2")
>  OPTAB_D (issignaling_optab, "issignaling$a2")
>  OPTAB_D (ldexp_optab, "ldexp$a3")
>  OPTAB_D (log10_optab, "log10$a2")


Re: [PATCH v3] tree-ssa-pre.c/115214(ICE in find_or_generate_expression, at tree-ssa-pre.c:2780): Return NULL_TREE when deal special cases.

2024-05-28 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, May 27, 2024 at 9:48 AM Jiawei  wrote:
>>
>> Return NULL_TREE when genop3 equals EXACT_DIV_EXPR.
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652641.html
>>
>> version log v3: remove additional POLY_INT_CST check.
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652795.html
>
> OK.
>
> Thanks,
> Richard.
>
>> gcc/ChangeLog:
>>
>> * tree-ssa-pre.cc (create_component_ref_by_pieces_1): New conditions.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/rvv/vsetvl/pr115214.c: New test.
>>
>> ---
>>  .../gcc.target/riscv/rvv/vsetvl/pr115214.c | 52 +++
>>  gcc/tree-ssa-pre.cc   | 10 ++--
>>  2 files changed, 59 insertions(+), 3 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c
>>
>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c
>> new file mode 100644
>> index 000..fce2e9da766
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c
>> @@ -0,0 +1,52 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-mrvv-vector-bits=scalable -march=rv64gcv -mabi=lp64d -O3 -w" } */
>> +/* { dg-skip-if "" { *-*-* } { "-flto" } } */
>> +
>> +#include 
>> +
>> +static inline __attribute__(()) int vaddq_f32();
>> +static inline __attribute__(()) int vload_tillz_f32(int nlane) {
>> +  vint32m1_t __trans_tmp_9;
>> +  {
>> +int __trans_tmp_0 = nlane;
>> +{
>> +  vint64m1_t __trans_tmp_1;
>> +  vint64m1_t __trans_tmp_2;
>> +  vint64m1_t __trans_tmp_3;
>> +  vint64m1_t __trans_tmp_4;
>> +  if (__trans_tmp_0 == 1) {
>> +{
>> +  __trans_tmp_3 =
>> +  __riscv_vslideup_vx_i64m1(__trans_tmp_1, __trans_tmp_2, 1, 2);
>> +}
>> +__trans_tmp_4 = __trans_tmp_2;
>> +  }
>> +  __trans_tmp_4 = __trans_tmp_3;
>> +  __trans_tmp_9 = __riscv_vreinterpret_v_i64m1_i32m1(__trans_tmp_3);
>> +}
>> +  }
>> +  return vaddq_f32(__trans_tmp_9); /* { dg-error {RVV type 'vint32m1_t' cannot be passed to an unprototyped function} } */
>> +}
>> +
>> +char CFLOAT_add_args[3];
>> +const int *CFLOAT_add_steps;
>> +const int CFLOAT_steps;
>> +
>> +__attribute__(()) void CFLOAT_add() {
>> +  char *b_src0 = &CFLOAT_add_args[0], *b_src1 = &CFLOAT_add_args[1],
>> +   *b_dst = &CFLOAT_add_args[2];
>> +  const float *src1 = (float *)b_src1;
>> +  float *dst = (float *)b_dst;
>> +  const int ssrc1 = CFLOAT_add_steps[1] / sizeof(float);
>> +  const int sdst = CFLOAT_add_steps[2] / sizeof(float);
>> +  const int hstep = 4 / 2;
>> +  vfloat32m1x2_t a;
>> +  int len = 255;
>> +  for (; len > 0; len -= hstep, src1 += 4, dst += 4) {
>> +int b = vload_tillz_f32(len);
>> +int r = vaddq_f32(a.__val[0], b); /* { dg-error {RVV type '__rvv_float32m1_t' cannot be passed to an unprototyped function} } */
>> +  }
>> +  for (; len > 0; --len, b_src0 += CFLOAT_steps,
>> +  b_src1 += CFLOAT_add_steps[1], b_dst += CFLOAT_add_steps[2])
>> +;
>> +}
>> diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc
>> index 75217f5cde1..5cf1968bc26 100644
>> --- a/gcc/tree-ssa-pre.cc
>> +++ b/gcc/tree-ssa-pre.cc
>> @@ -2685,11 +2685,15 @@ create_component_ref_by_pieces_1 (basic_block block, vn_reference_t ref,
>>here as the element alignment may be not visible.  See
>>PR43783.  Simply drop the element size for constant
>>sizes.  */
>> -   if (TREE_CODE (genop3) == INTEGER_CST
>> +   if ((TREE_CODE (genop3) == INTEGER_CST
>> && TREE_CODE (TYPE_SIZE_UNIT (elmt_type)) == INTEGER_CST
>> && wi::eq_p (wi::to_offset (TYPE_SIZE_UNIT (elmt_type)),
>> -(wi::to_offset (genop3)
>> - * vn_ref_op_align_unit (currop
>> +(wi::to_offset (genop3) * vn_ref_op_align_unit (currop

Sorry for the nits, but the original formatting was correct here.
The new one instead goes over 80 columns.

>> + || (TREE_CODE (genop3) == EXACT_DIV_EXPR
>> +   && TREE_CODE (TREE_OPERAND (genop3, 1)) == INTEGER_CST
>> +   && operand_equal_p (TREE_OPERAND (genop3, 0), TYPE_SIZE_UNIT (elmt_type))

Similarly this line is too long.

Thanks for fixing this.

Richard

>> +   && wi::eq_p (wi::to_offset (TREE_OPERAND (genop3, 1)),
>> +vn_ref_op_align_unit (currop
>>   genop3 = NULL_TREE;
>> else
>>   {
>> --
>> 2.25.1
>>


Re: [C23 PATCH]: allow aliasing for types derived from structs with variable size

2024-05-28 Thread Joseph Myers
On Sun, 26 May 2024, Martin Uecker wrote:

> +/* Helper function for comptypes.  For two compatible types, return 1
> +   if they pass consistency checks.  In particular we test that
> +   TYPE_CANONICAL ist set correctly, i.e. the two types can alias.  */

s/ist/is/.  OK with that fix.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [C23 PATCH, v2] fix aliasing for structures/unions with incomplete types

2024-05-28 Thread Joseph Myers
On Sun, 26 May 2024, Martin Uecker wrote:

> This is the patch I sent previously, but I tried to improve the
> description and added a long comment.  This patch is needed so
> that we do not have to update TYPE_CANONICAL of structures / unions
> when a tagged type is completed that is (recursively) pointed to 
> by a member of the structure / union.
> 
> Bootstrapped and regression tested on x86_64.
> 
> 
> C23: fix aliasing for structures/unions with incomplete types
> 
> When incomplete structure/union types are completed later, compatibility
> of struct types that contain pointers to such types changes.  When forming
> equivalence classes for TYPE_CANONICAL, we therefore need to be conservative
> and treat all structs with the same tag which are pointer targets as
> equivalent for purposes of determining equivalency of structure/union
> types which contain such types as members. This avoids having to update
> TYPE_CANONICAL of such structures/unions recursively. The pointer types
> themselves are updated in c_update_type_canonical.
> 
> gcc/c/
> * c-typeck.cc (comptypes_internal): Add flag to track
> whether a struct is the target of a pointer.
> (tagged_types_tu_compatible): When forming equivalence
> classes, treat nested pointed-to structs as equivalent.
> 
> gcc/testsuite/
> * gcc.dg/c23-tag-incomplete-alias-1.c: New test.

This patch is OK.

-- 
Joseph S. Myers
josmy...@redhat.com

Re: [C PATCH, v2]: allow aliasing of compatible types derived from enumeral types [PR115157]

2024-05-28 Thread Joseph Myers
On Fri, 24 May 2024, Martin Uecker wrote:

> This is another version of this patch with two changes:
> 
> - I added a fix (with test) for PR 115177 which is just the same
> issue for hardbools which are internally implemented as enums.
> 
> - I fixed the golang issue. Since the main variant is now added to the
> seen decls unconditionally, I also removed the addition of the type
> itself, which now seems unnecessary.
> 
> Bootstrapped and regression tested on x86_64.

The front-end changes and the testcases are OK.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH v2 1/2] driver: Use -as/ld/objcopy as final fallback instead of native ones for cross

2024-05-28 Thread Richard Sandiford
YunQiang Su  writes:
> If `find_a_program` cannot find `as/ld/objcopy` and we are a cross toolchain,
> the final fallback is the system's `as/ld`.  We could instead try
> -as/ld/objcopy before falling back to the native as/ld/objcopy.
>
> This patch is derived from Debian's patch:
>   gcc-search-prefixed-as-ld.diff

I'm probably making you repeat a previous discussion, sorry, but could
you describe the use case in more detail?  The current approach to
handling cross toolchains has been used for many years.  Presumably
this patch is supporting a different way of organising things,
but I wasn't sure from the description what it was.

AIUI, we currently assume that cross as, ld and objcopy will be
installed under those names in $prefix/$target_alias/bin (aka $tooldir/bin).
E.g.:

   bin/aarch64-elf-as = aarch64-elf/bin/as

GCC should then find as in aarch64-elf/bin.

Is that not true in your case?

To be clear, I'm not saying the patch is wrong.  I'm just trying to
understand why the patch is needed.

Thanks,
Richard

>
> gcc
>   * gcc.cc (execute): Look for -as/ld/objcopy before falling back
>   to native as/ld/objcopy.
> ---
>  gcc/gcc.cc | 20 
>  1 file changed, 20 insertions(+)
>
> diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> index 830a4700a87..3dc6348d761 100644
> --- a/gcc/gcc.cc
> +++ b/gcc/gcc.cc
> @@ -3293,6 +3293,26 @@ execute (void)
>string = find_a_program(commands[0].prog);
>if (string)
>   commands[0].argv[0] = string;
> +  else if (*cross_compile != '0'
> + && !strcmp (commands[0].argv[0], commands[0].prog)
> + && (!strcmp (commands[0].prog, "as")
> + || !strcmp (commands[0].prog, "ld")
> + || !strcmp (commands[0].prog, "objcopy")))
> + {
> +   string = concat (DEFAULT_REAL_TARGET_MACHINE, "-",
> + commands[0].prog, NULL);
> +   const char *string_args[] = {string, "--version", NULL};
> +   int exit_status = 0;
> +   int err = 0;
> +   const char *errmsg = pex_one (PEX_SEARCH, string,
> +   CONST_CAST (char **, string_args), string,
> +   NULL, NULL, &exit_status, &err);
> +   if (errmsg == NULL && exit_status == 0 && err == 0)
> + {
> +   commands[0].argv[0] = string;
> +   commands[0].prog = string;
> + }
> + }
>  }
>  
>for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)


Re: [PATCH v3 #1/2] [rs6000] adjust return_pc debug attrs

2024-05-28 Thread Segher Boessenkool
On Sat, May 25, 2024 at 09:13:12AM -0300, Alexandre Oliva wrote:
> Some of the rs6000 call patterns, on some ABIs, issue multiple opcodes
> out of a single call insn, but the call (bl) or jump (b) is not always
> the last opcode in the sequence.

> This does not seem to be a problem for exception handling tables, but
> the return_pc attribute in the call graph output in dwarf2+ debug
> information, that takes the address of a label output right after the
> call, does not match the value of the link register even for non-tail
> calls.  E.g., with ABI_AIX or ABI_ELFv2, such code as:
> 
>   foo ();
> 
> outputs:
> 
>   bl foo
>   nop
>  LVL#:
> [...]
>   .8byte .LVL#  # DW_AT_call_return_pc
> 
> but debug info consumers may rely on the return_pc address, and draw
> incorrect conclusions from its off-by-4 value.
> 
> This patch uses the infrastructure for targets to add an offset to the
> label issued after the call_insn to set the call_return_pc attribute,
> on rs6000, to account for opcodes issued after the actual call opcode as
> part of call insn output patterns.

> for  gcc/ChangeLog
> 
>   * config/rs6000/rs6000.cc (TARGET_CALL_OFFSET_RETURN_LABEL):
>   Override.

Please don't (incorrectly!) line-wrap changelogs.  Lines are 80
characters wide, not 60 or 72 or whatever.  80.  Indents are tabs that
take 8 columns.

> +/* Return the offset to be added to the label output after CALL_INSN
> +   to compute the address to be placed in DW_AT_call_return_pc.  */
> +
> +static int
> +rs6000_call_offset_return_label (rtx_insn *call_insn)
> +{
> +  /* All rs6000 CALL_INSN output patterns start with a b or bl, always

This isn't true.  There is a crlogical insn before the bcl for sysv for
example.

> + a 4-byte instruction, but some output patterns issue other
> + opcodes afterwards.  The return label is issued after the entire
> + call insn, including any such post-call opcodes.  Instead of
> + figuring out which cases need adjustments, we compute the offset
> + back to the address of the call opcode proper, then add the
> + constant 4 bytes, to get the address after that opcode.  */
> +  return 4 - get_attr_length (call_insn);

Please explain this magic, too -- in code preferably (so with a ? :
maybe, but don't try to "optimise" that expression, let the compiler
do that, it is much better at it anyway :-) )

> +}

Is that correct for all ABIs we support?  Even if so, it needs a lot
more documentation than this.


Segher


Re: [PATCH v9 5/5] Add the 6th argument to .ACCESS_WITH_SIZE

2024-05-28 Thread Qing Zhao


> On May 28, 2024, at 03:43, Richard Biener  wrote:
> 
> On Fri, Apr 12, 2024 at 3:55 PM Qing Zhao  wrote:
>> 
>> to carry the TYPE of the flexible array.
>> 
>> Such information is needed in tree-object-size.cc.
>> 
>> We cannot use the result type or the type of the 1st argument
>> of the routine .ACCESS_WITH_SIZE to decide the element type
>> of the original array due to possible type casting in the
>> source code.
> 
> OK.  I guess technically an empty CONSTRUCTOR of the array type
> would work as well (as aggregate it's fine to have it in the call) but a
> constant zero pointer might be cheaper to have as it's shared across
> multiple calls.

So, can I consider this an approval? :-)

thanks.

Qing
> 
> Richard.
> 
>> gcc/c/ChangeLog:
>> 
>>* c-typeck.cc (build_access_with_size_for_counted_by): Add the 6th
>>argument to .ACCESS_WITH_SIZE.
>> 
>> gcc/ChangeLog:
>> 
>>* tree-object-size.cc (access_with_size_object_size): Use the type
>>of the 6th argument for the type of the element.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>* gcc.dg/flex-array-counted-by-6.c: New test.
>> ---
>> gcc/c/c-typeck.cc | 11 +++--
>> gcc/internal-fn.cc |  2 +
>> .../gcc.dg/flex-array-counted-by-6.c  | 46 +++
>> gcc/tree-object-size.cc   | 16 ---
>> 4 files changed, 66 insertions(+), 9 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-6.c
>> 
>> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
>> index ff6685c6c4ba..0ea3b75355a4 100644
>> --- a/gcc/c/c-typeck.cc
>> +++ b/gcc/c/c-typeck.cc
>> @@ -2640,7 +2640,8 @@ build_counted_by_ref (tree datum, tree subdatum, tree *counted_by_type)
>> 
>>to:
>> 
>> -   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1))
>> +   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1,
>> +   (TYPE_OF_ARRAY *)0))
>> 
>>NOTE: The return type of this function is the POINTER type pointing
>>to the original flexible array type.
>> @@ -2652,6 +2653,9 @@ build_counted_by_ref (tree datum, tree subdatum, tree *counted_by_type)
>>The 4th argument of the call is a constant 0 with the TYPE of the
>>object pointed by COUNTED_BY_REF.
>> 
>> +   The 6th argument of the call is a constant 0 with the pointer TYPE
>> +   to the original flexible array type.
>> +
>>   */
>> static tree
>> build_access_with_size_for_counted_by (location_t loc, tree ref,
>> @@ -2664,12 +2668,13 @@ build_access_with_size_for_counted_by (location_t loc, tree ref,
>> 
>>   tree call
>> = build_call_expr_internal_loc (loc, IFN_ACCESS_WITH_SIZE,
>> -   result_type, 5,
>> +   result_type, 6,
>>array_to_pointer_conversion (loc, ref),
>>counted_by_ref,
>>build_int_cst (integer_type_node, 1),
>>build_int_cst (counted_by_type, 0),
>> -   build_int_cst (integer_type_node, -1));
>> +   build_int_cst (integer_type_node, -1),
>> +   build_int_cst (result_type, 0));
>>   /* Wrap the call with an INDIRECT_REF with the flexible array type.  */
>>   call = build1 (INDIRECT_REF, TREE_TYPE (ref), call);
>>   SET_EXPR_LOCATION (call, loc);
>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
>> index e744080ee670..34e4a4aea534 100644
>> --- a/gcc/internal-fn.cc
>> +++ b/gcc/internal-fn.cc
>> @@ -3411,6 +3411,8 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
>>  1: read_only
>>  2: write_only
>>  3: read_write
>> +   6th argument: A constant 0 with the pointer TYPE to the original flexible
>> + array type.
>> 
>>Both the return type and the type of the first argument of this
>>function have been converted from the incomplete array type to
>> diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c b/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c
>> new file mode 100644
>> index ..65fa01443d95
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c
>> @@ -0,0 +1,46 @@
>> +/* Test the attribute counted_by and its usage in
>> + * __builtin_dynamic_object_size: when the type of the flexible array member
>> + * is cast to another type.  */
>> +/* { dg-do run } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "builtin-object-size-common.h"
>> +
>> +typedef unsigned short u16;
>> +
>> +struct info {
>> +   u16 data_len;
>> +   char data[] __attribute__((counted_by(data_len)));
>> +};
>> +
>> +struct foo {
>> +   int a;
>> +   int b;
>> +};
>> +
>> +static __attribute__((__noinline__))
>> +struct info *setup ()
>> +{
>> + struct info *p;
>> + size_t bytes = 3 * sizeof(struct foo);
>> +
>> + p = (struct info *)malloc (sizeof (st

Re: [PATCH v9 3/5] Use the .ACCESS_WITH_SIZE in builtin object size.

2024-05-28 Thread Qing Zhao


> On May 28, 2024, at 03:39, Richard Biener  wrote:
> 
> On Fri, Apr 12, 2024 at 3:54 PM Qing Zhao  wrote:
>> 
> 
> I have no comments here, if Siddesh is OK with this I approve.

thanks.

Qing
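The EXPECT macro's behaviour can be sketched as a function plus a thin macro in C++. The names expect_eq, EXPECT_SKETCH and nfails_sketch are hypothetical, modelled on the header's nfails counter. One design note: the quoted macro ends in "} while (0);". That trailing semicolon defeats the usual do/while (0) idiom, because "if (c) EXPECT (a, b); else ..." then leaves the else without a matching if. The sketch leaves the semicolon to the caller instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Failure counter, modelled on the header's nfails.
static unsigned nfails_sketch = 0;

// Function form of the check: report "ok" or "WAT" and count failures.
static bool expect_eq(const char *expr, std::size_t got, std::size_t want)
{
  if (got == want)
    {
      std::printf("ok:  %s == %zu\n", expr, got);
      return true;
    }
  std::printf("WAT: %s == %zu (expected %zu)\n", expr, got, want);
  ++nfails_sketch;
  return false;
}

// Statement-like wrapper: the caller supplies the trailing semicolon,
// so the macro is safe inside an unbraced if/else.
#define EXPECT_SKETCH(p, v) \
  do { expect_eq(#p, (p), (v)); } while (0)
```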
> 
>> gcc/ChangeLog:
>> 
>>* tree-object-size.cc (access_with_size_object_size): New function.
>>(call_object_size): Call the new function.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>* gcc.dg/builtin-object-size-common.h: Add a new macro EXPECT.
>>* gcc.dg/flex-array-counted-by-3.c: New test.
>>* gcc.dg/flex-array-counted-by-4.c: New test.
>>* gcc.dg/flex-array-counted-by-5.c: New test.
>> ---
>> .../gcc.dg/builtin-object-size-common.h   |  11 ++
>> .../gcc.dg/flex-array-counted-by-3.c  |  63 +++
>> .../gcc.dg/flex-array-counted-by-4.c  | 178 ++
>> .../gcc.dg/flex-array-counted-by-5.c  |  48 +
>> gcc/tree-object-size.cc   |  60 ++
>> 5 files changed, 360 insertions(+)
>> create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
>> create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
>> create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-5.c
>> 
>> diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-common.h b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
>> index 66ff7cdd953a..b677067c6e6b 100644
>> --- a/gcc/testsuite/gcc.dg/builtin-object-size-common.h
>> +++ b/gcc/testsuite/gcc.dg/builtin-object-size-common.h
>> @@ -30,3 +30,14 @@ unsigned nfails = 0;
>>   __builtin_abort ();  \
>> return 0;  \
>>   } while (0)
>> +
>> +#define EXPECT(p, _v) do {  \
>> +  size_t v = _v;  \
>> +  if (p == v)  \
>> +    __builtin_printf ("ok:  %s == %zd\n", #p, p);  \
>> +  else  \
>> +    {  \
>> +      __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v);  \
>> +      FAIL ();  \
>> +    }  \
>> +} while (0);
>> diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
>> new file mode 100644
>> index ..78f50230e891
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-3.c
>> @@ -0,0 +1,63 @@
>> +/* Test the attribute counted_by and its usage in
>> + * __builtin_dynamic_object_size.  */
>> +/* { dg-do run } */
>> +/* { dg-options "-O2" } */
>> +
>> +#include "builtin-object-size-common.h"
>> +
>> +struct flex {
>> +  int b;
>> +  int c[];
>> +} *array_flex;
>> +
>> +struct annotated {
>> +  int b;
>> +  int c[] __attribute__ ((counted_by (b)));
>> +} *array_annotated;
>> +
>> +struct nested_annotated {
>> +  struct {
>> +union {
>> +  int b;
>> +  float f;
>> +};
>> +int n;
>> +  };
>> +  int c[] __attribute__ ((counted_by (b)));
>> +} *array_nested_annotated;
>> +
>> +void __attribute__((__noinline__)) setup (int normal_count, int attr_count)
>> +{
>> +  array_flex
>> += (struct flex *)malloc (sizeof (struct flex)
>> ++ normal_count *  sizeof (int));
>> +  array_flex->b = normal_count;
>> +
>> +  array_annotated
>> += (struct annotated *)malloc (sizeof (struct annotated)
>> + + attr_count *  sizeof (int));
>> +  array_annotated->b = attr_count;
>> +
>> +  array_nested_annotated
>> += (struct nested_annotated *)malloc (sizeof (struct nested_annotated)
>> ++ attr_count *  sizeof (int));
>> +  array_nested_annotated->b = attr_count;
>> +
>> +  return;
>> +}
>> +
>> +void __attribute__((__noinline__)) test ()
>> +{
>> +EXPECT(__builtin_dynamic_object_size(array_flex->c, 1), -1);
>> +EXPECT(__builtin_dynamic_object_size(array_annotated->c, 1),
>> +  array_annotated->b * sizeof (int));
>> +EXPECT(__builtin_dynamic_object_size(array_nested_annotated->c, 1),
>> +  array_nested_annotated->b * sizeof (int));
>> +}
>> +
>> +int main(int argc, char *argv[])
>> +{
>> +  setup (10,10);
>> +  test ();
>> +  DONE ();
>> +}
>> diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
>> new file mode 100644
>> index ..20103d58ef51
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-4.c
>> @@ -0,0 +1,178 @@
>> +/* Test the attribute counted_by and its usage in
>> +__builtin_dynamic_object_size: what's the correct behavior when the
>> 

Re: [PATCH v9 2/5] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.

2024-05-28 Thread Qing Zhao
Thank you for the comments. See my answers below:

Joseph, please see the last question; I need your help on it. Thanks a lot for the help.

Qing

> On May 28, 2024, at 03:38, Richard Biener  wrote:
> 
> On Fri, Apr 12, 2024 at 3:54 PM Qing Zhao  wrote:
>> 
>> Including the following changes:
>> * The definition of the new internal function .ACCESS_WITH_SIZE
>>  in internal-fn.def.
>> * C FE converts every reference to a FAM with a "counted_by" attribute
>>  to a call to the internal function .ACCESS_WITH_SIZE.
>>  (build_component_ref in c-typeck.cc)
>> 
>>  This includes the case when the object is statically allocated and
>>  initialized.
>>  In order to make this work, the routines initializer_constant_valid_p_1
>>  and output_constant in varasm.cc are updated to handle calls to
>>  .ACCESS_WITH_SIZE.
>>  (initializer_constant_valid_p_1 and output_constant in varasm.cc)
>> 
>>  However, for the reference inside "offsetof", the "counted_by" attribute is
>>  ignored since it's not useful at all.
>>  (c_parser_postfix_expression in c/c-parser.cc)
>> 
>>  In addition to "offsetof", for the reference inside operator "typeof" and
>>  "alignof", we ignore counted_by attribute too.
>> 
>>  When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE,
>>  replace the call with its first argument.
>> 
>> * Convert every call to .ACCESS_WITH_SIZE to its first argument.
>>  (expand_ACCESS_WITH_SIZE in internal-fn.cc)
>> * Adjust alias analysis to exclude the new internal function from clobbering anything.
>>  (ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in tree-ssa-alias.cc)
>> * Adjust dead code elimination to eliminate the call to .ACCESS_WITH_SIZE when
>>  its LHS is eliminated as dead code.
>>  (eliminate_unnecessary_stmts in tree-ssa-dce.cc)
>> * Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
>>  get the reference from the call to .ACCESS_WITH_SIZE.
>>  (is_access_with_size_p and get_ref_from_access_with_size in tree.cc)
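As a rough mental model (a hypothetical C++ function, not GCC's implementation), the call described in the list above behaves as an identity on its first argument. Expansion reduces it to REF, and taking its address in the front end also uses REF. The remaining arguments carry the counted_by reference, the constant 1, a typed zero for the counter's type, and -1, for later passes such as object-size computation. The parameter names below are illustrative assumptions.

```cpp
#include <cassert>

// Toy model (not GCC code) of the call shape
//   .ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1)
// Only REF flows to the result; the other arguments are carried for
// later passes and are ignored here.
template <typename T, typename CountT>
T *access_with_size_model(T *ref, const CountT *counted_by_ref,
                          int size_kind,        // 1 in the calls built here
                          CountT type_of_size,  // typed zero: carries CountT
                          int access_mode)      // -1 in the calls built here
{
  (void) counted_by_ref;
  (void) size_kind;
  (void) type_of_size;
  (void) access_mode;
  return ref;  // what expanding the call reduces it to
}
```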
>> 
>> gcc/c/ChangeLog:
>> 
>>* c-parser.cc (c_parser_postfix_expression): Ignore the counted-by
>>attribute when build_component_ref inside offsetof operator.
>>* c-tree.h (build_component_ref): Add one more parameter.
>>* c-typeck.cc (build_counted_by_ref): New function.
>>(build_access_with_size_for_counted_by): New function.
>>(build_component_ref): Check the counted-by attribute and build
>>call to .ACCESS_WITH_SIZE.
>>(build_unary_op): When building ADDR_EXPR for
>>.ACCESS_WITH_SIZE, use its first argument.
>>(lvalue_p): Accept call to .ACCESS_WITH_SIZE.
>> 
>> gcc/ChangeLog:
>> 
>>* internal-fn.cc (expand_ACCESS_WITH_SIZE): New function.
>>* internal-fn.def (ACCESS_WITH_SIZE): New internal function.
>>* tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Special case
>>IFN_ACCESS_WITH_SIZE.
>>(call_may_clobber_ref_p_1): Special case IFN_ACCESS_WITH_SIZE.
>>* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Eliminate the call
>>to .ACCESS_WITH_SIZE when its LHS is dead.
>>* tree.cc (process_call_operands): Adjust side effect for function
>>.ACCESS_WITH_SIZE.
>>(is_access_with_size_p): New function.
>>(get_ref_from_access_with_size): New function.
>>* tree.h (is_access_with_size_p): New prototype.
>>(get_ref_from_access_with_size): New prototype.
>>* varasm.cc (initializer_constant_valid_p_1): Handle call to
>>.ACCESS_WITH_SIZE.
>>(output_constant): Handle call to .ACCESS_WITH_SIZE.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>* gcc.dg/flex-array-counted-by-2.c: New test.
>> ---
>> gcc/c/c-parser.cc |  10 +-
>> gcc/c/c-tree.h|   2 +-
>> gcc/c/c-typeck.cc | 128 +-
>> gcc/internal-fn.cc|  35 +
>> gcc/internal-fn.def   |   4 +
>> .../gcc.dg/flex-array-counted-by-2.c  | 112 +++
>> gcc/tree-ssa-alias.cc |   2 +
>> gcc/tree-ssa-dce.cc   |   5 +-
>> gcc/tree.cc   |  25 +++-
>> gcc/tree.h|   8 ++
>> gcc/varasm.cc |  10 ++
>> 11 files changed, 331 insertions(+), 10 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
>> 
>> diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
>> index c31349dae2ff..a6ed5ac43bb1 100644
>> --- a/gcc/c/c-parser.cc
>> +++ b/gcc/c/c-parser.cc
>> @@ -10850,9 +10850,12 @@ c_parser_postfix_expression (c_parser *parser)
>>if (c_parser_next_token_is (parser, CPP_NAME))
>>  {
>>c_token *comp_tok = c_parser_peek_token (parser);
>> +   /* Ignore the counted_by attribute for reference inside
>> +  offsetof since the information

Re: [RFC/PATCH] Replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE with new hook

2024-05-28 Thread Joseph Myers
On Fri, 24 May 2024, Kewen.Lin wrote:

> Following your suggestion and comments, I made this patch
> for mode_for_floating_type first, considering this touches
> a few FE and port specific code, I think I have to split
> it into a patch series.  Before making that, I'd like to
> ensure this meets what you expected, and also seek for the

The general idea seems reasonable (I haven't reviewed it in detail).  
Note that when removing a target macro, it's a good idea to add it to the 
"Old target macros that have moved to the target hooks structure." list 
(of #pragma GCC poison) in system.h to ensure any new target that was 
originally written before the change doesn't accidentally get into GCC 
while still using the old macros.
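To illustrate the suggestion: "#pragma GCC poison" makes any later appearance of the listed identifiers a hard error, which is how the system.h list catches ports that still rely on removed target macros. The sketch poisons the three macros named in the subject line. The function float_type_bits is a hypothetical stand-in for code that no longer needs the old macro.

```cpp
#include <cassert>
#include <climits>

// Poison the removed target macros (names taken from the subject line).
// Any later appearance of these identifiers in the source is now an error.
#pragma GCC poison FLOAT_TYPE_SIZE DOUBLE_TYPE_SIZE LONG_DOUBLE_TYPE_SIZE

// A stale port doing this would now fail to build:
//   #define FLOAT_TYPE_SIZE 32

// Hypothetical replacement: derive the size without the old macro.
static int float_type_bits()
{
  return static_cast<int>(sizeof(float) * CHAR_BIT);
}
```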

-- 
Joseph S. Myers
josmy...@redhat.com



[committed] i386: Improve access to _Atomic DImode location via XMM regs for SSE4.1 x86_32 targets

2024-05-28 Thread Uros Bizjak
Use MOVD/PEXTRD and MOVD/PINSRD insn sequences to move DImode value
between XMM and GPR register sets for SSE4.1 x86_32 targets in order
to avoid spilling the value to stack.

The load from _Atomic location a improves from:

        movq    a, %xmm0
        movq    %xmm0, (%esp)
        movl    (%esp), %eax
        movl    4(%esp), %edx

to:
        movq    a, %xmm0
        movd    %xmm0, %eax
        pextrd  $1, %xmm0, %edx

The store to _Atomic location b improves from:

        movl    %eax, (%esp)
        movl    %edx, 4(%esp)
        movq    (%esp), %xmm0
        movq    %xmm0, b

to:
        movd    %eax, %xmm0
        pinsrd  $1, %edx, %xmm0
        movq    %xmm0, b

gcc/ChangeLog:

* config/i386/sync.md (atomic_loaddi_fpu): Use movd/pextrd
to move DImode value from XMM to GPR for TARGET_SSE4_1.
(atomic_storedi_fpu): Use movd/pinsrd to move DImode value
from GPR to XMM for TARGET_SSE4_1.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

Uros.
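The MOVD/PEXTRD and MOVD/PINSRD pairing from the commit message can be sketched with SSE4.1 intrinsics in C++. This only illustrates the instruction pattern and is not the sync.md expansion. The names xmm_to_gprs and gprs_to_xmm are hypothetical. The x86 path assumes a CPU with SSE4.1 at run time, since the target attribute only enables code generation, and a plain shift fallback keeps the sketch buildable elsewhere. Lane 0 of an XMM register holds the low 32 bits, so MOVD yields the low half and index 1 of PEXTRD/PINSRD handles the high half.

```cpp
#include <cassert>
#include <cstdint>

struct split64 { std::uint32_t lo, hi; };

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

// XMM -> two GPRs: MOVD reads lane 0, PEXTRD $1 reads lane 1.
__attribute__((target("sse4.1")))
static split64 xmm_to_gprs(std::uint64_t v)
{
  __m128i x = _mm_loadl_epi64(reinterpret_cast<const __m128i *>(&v)); // movq
  split64 r;
  r.lo = static_cast<std::uint32_t>(_mm_cvtsi128_si32(x));    // movd
  r.hi = static_cast<std::uint32_t>(_mm_extract_epi32(x, 1)); // pextrd $1
  return r;
}

// Two GPRs -> XMM: MOVD writes lane 0, PINSRD $1 writes lane 1.
__attribute__((target("sse4.1")))
static std::uint64_t gprs_to_xmm(std::uint32_t lo, std::uint32_t hi)
{
  __m128i x = _mm_cvtsi32_si128(static_cast<int>(lo));   // movd
  x = _mm_insert_epi32(x, static_cast<int>(hi), 1);       // pinsrd $1
  std::uint64_t out;
  _mm_storel_epi64(reinterpret_cast<__m128i *>(&out), x); // movq
  return out;
}
#else
// Portable fallback so the sketch still builds on non-x86 hosts.
static split64 xmm_to_gprs(std::uint64_t v)
{
  return split64{static_cast<std::uint32_t>(v),
                 static_cast<std::uint32_t>(v >> 32)};
}
static std::uint64_t gprs_to_xmm(std::uint32_t lo, std::uint32_t hi)
{
  return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
#endif
```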
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index 8317581ebe2..f2b3ba0aa7a 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -215,8 +215,18 @@ (define_insn_and_split "atomic_loaddi_fpu"
}
   else
{
+ rtx tmpdi = gen_lowpart (DImode, tmp);
+
  emit_insn (gen_loaddi_via_sse (tmp, src));
- emit_insn (gen_storedi_via_sse (mem, tmp));
+
+ if (GENERAL_REG_P (dst)
+ && TARGET_SSE4_1 && TARGET_INTER_UNIT_MOVES_FROM_VEC)
+   {
+ emit_move_insn (dst, tmpdi);
+ DONE;
+   }
+ else
+   emit_move_insn (mem, tmpdi);
}
 
   if (mem != dst)
@@ -294,20 +304,30 @@ (define_insn_and_split "atomic_storedi_fpu"
 emit_move_insn (dst, src);
   else
 {
-  if (REG_P (src))
-   {
- emit_move_insn (mem, src);
- src = mem;
-   }
-
   if (STACK_REG_P (tmp))
{
+ if (GENERAL_REG_P (src))
+   {
+ emit_move_insn (mem, src);
+ src = mem;
+   }
+
  emit_insn (gen_loaddi_via_fpu (tmp, src));
  emit_insn (gen_storedi_via_fpu (dst, tmp));
}
   else
{
- emit_insn (gen_loaddi_via_sse (tmp, src));
+ rtx tmpdi = gen_lowpart (DImode, tmp);
+
+ if (GENERAL_REG_P (src)
+ && !(TARGET_SSE4_1 && TARGET_INTER_UNIT_MOVES_TO_VEC))
+   {
+ emit_move_insn (mem, src);
+ src = mem;
+   }
+
+ emit_move_insn (tmpdi, src);
+
  emit_insn (gen_storedi_via_sse (dst, tmp));
}
 }


Re: [PATCH] Avoid vector -Wfree-nonheap-object warnings

2024-05-28 Thread François Dumont
I can indeed restore _M_initialize_dispatch as it was before. It was not 
fixing my initial problem. I simply kept the code simplification.


    libstdc++: Use RAII to replace try/catch blocks

    Move _Guard into the std::vector declaration and use it to guard all
    calls to vector _M_allocate.

    Doing so gives the compiler more visibility into what is done with the
    pointers, so it no longer raises the -Wfree-nonheap-object warning.

    libstdc++-v3/ChangeLog:

    * include/bits/vector.tcc (_Guard): Move all the nested
    duplicated class...
    * include/bits/stl_vector.h (_Guard_alloc): ...here and rename.
    (_M_allocate_and_copy): Use latter.
    (_M_initialize_dispatch): Small code simplification.
    (_M_range_initialize): Likewise and set _M_finish first from the
    result of __uninitialized_fill_n_a that can throw.

Tested under Linux x86_64.

Ok to commit ?

François
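The RAII idea in the commit message can be sketched outside libstdc++ as follows. The names dealloc_guard, counting_alloc and make_filled are hypothetical. The guard frees the storage in its destructor unless ownership has been released, which replaces an explicit try/catch that deallocates and rethrows. Following the review comment about NullablePointer further down this thread, release() resets the pointer with pointer() rather than assigning 0.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>

// Hypothetical counterpart of vector's _Guard_alloc: frees storage on
// scope exit unless release() transferred ownership to the caller.
template <typename Alloc>
struct dealloc_guard
{
  using traits = std::allocator_traits<Alloc>;
  using pointer = typename traits::pointer;

  pointer storage;
  std::size_t len;
  Alloc &alloc;

  dealloc_guard(pointer s, std::size_t n, Alloc &a)
    : storage(s), len(n), alloc(a) { }

  ~dealloc_guard()
  {
    if (storage)
      traits::deallocate(alloc, storage, len);
  }

  pointer release()
  {
    pointer p = storage;
    storage = pointer();  // value-initialised null, not the literal 0
    return p;
  }

  dealloc_guard(const dealloc_guard &) = delete;
  dealloc_guard &operator=(const dealloc_guard &) = delete;
};

// Allocator that counts live allocations, so a test can observe leaks.
static int live_allocs = 0;

template <typename T>
struct counting_alloc
{
  using value_type = T;
  counting_alloc() = default;
  template <typename U> counting_alloc(const counting_alloc<U> &) { }
  T *allocate(std::size_t n)
  { ++live_allocs; return std::allocator<T>().allocate(n); }
  void deallocate(T *p, std::size_t n)
  { --live_allocs; std::allocator<T>().deallocate(p, n); }
};

template <typename T, typename U>
bool operator==(const counting_alloc<T> &, const counting_alloc<U> &) { return true; }
template <typename T, typename U>
bool operator!=(const counting_alloc<T> &, const counting_alloc<U> &) { return false; }

// Fill freshly allocated storage; if filling throws, the guard frees it.
int *make_filled(counting_alloc<int> &a, std::size_t n, bool fail)
{
  int *p = a.allocate(n);
  dealloc_guard<counting_alloc<int>> guard(p, n, a);
  for (std::size_t i = 0; i < n; ++i)
    {
      if (fail && i == n / 2)
        throw std::runtime_error("fill failed");
      p[i] = static_cast<int>(i);
    }
  return guard.release();  // success: the caller now owns the storage
}
```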

On 28/05/2024 12:30, Jonathan Wakely wrote:

On Mon, 27 May 2024 at 05:37, François Dumont  wrote:

Here is a new version that also works in C++98.

Can we use a different solution that doesn't involve an explicit
template argument list for that __uninitialized_fill_n_a call?

-+this->_M_impl._M_finish = std::__uninitialized_fill_n_a
++this->_M_impl._M_finish =
++  std::__uninitialized_fill_n_a
+  (__start, __n, __value, _M_get_Tp_allocator());

Using _M_fill_initialize solves the problem :-)




Note that I have this failure:

FAIL: 23_containers/vector/types/1.cc  -std=gnu++98 (test for excess errors)

but it's already failing on master; my patch does not change anything.

Yes, that's been failing for ages.


Tested under Linux x64,

still ok to commit ?

François

On 24/05/2024 16:17, Jonathan Wakely wrote:

On Thu, 23 May 2024 at 18:38, François Dumont  wrote:

On 23/05/2024 15:31, Jonathan Wakely wrote:

On 23/05/24 06:55 +0200, François Dumont wrote:

As explained in this email:

https://gcc.gnu.org/pipermail/libstdc++/2024-April/058552.html

I experienced -Wfree-nonheap-object warnings because of my enhancements
to the algorithms.

So here is a patch to extend the usage of the _Guard type to other
parts of vector.

Nice, that fixes the warning you were seeing?

Yes ! I indeed forgot to say so :-)



We recently got a bug report about -Wfree-nonheap-object in
std::vector, but that is coming from _M_realloc_append which already
uses the RAII guard :-(
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115016

Note that I also had to move the call to __uninitialized_copy_a before
assigning this->_M_impl._M_start to get rid of the -Wfree-nonheap-object
warning. But _M_realloc_append already performs potentially throwing
operations before assigning this->_M_impl, so the cause must be something else.

That said, it made me notice another occurrence of _Guard in this method,
which this new patch now replaces too.

   libstdc++: Use RAII to replace try/catch blocks

   Move _Guard into the std::vector declaration and use it to guard all
   calls to vector _M_allocate.

   Doing so gives the compiler more visibility into what is done with the
   pointers, so it no longer raises the -Wfree-nonheap-object warning.

   libstdc++-v3/ChangeLog:

   * include/bits/vector.tcc (_Guard): Move all the nested
   duplicated class...
   * include/bits/stl_vector.h (_Guard_alloc): ...here.
   (_M_allocate_and_copy): Use latter.
   (_M_initialize_dispatch): Likewise and set _M_finish first from the
   result of __uninitialized_fill_n_a that can throw.
   (_M_range_initialize): Likewise.


diff --git a/libstdc++-v3/include/bits/stl_vector.h b/libstdc++-v3/include/bits/stl_vector.h
index 31169711a48..4ea74e3339a 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -1607,6 +1607,39 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
clear() _GLIBCXX_NOEXCEPT
{ _M_erase_at_end(this->_M_impl._M_start); }

+private:
+  // RAII guard for allocated storage.
+  struct _Guard

If it's being defined at class scope instead of locally in a member
function, I think a better name would be good. Maybe _Ptr_guard or
_Dealloc_guard or something.

_Guard_alloc chosen.

+  {
+pointer _M_storage;// Storage to deallocate
+size_type _M_len;
+_Base& _M_vect;
+
+_GLIBCXX20_CONSTEXPR
+_Guard(pointer __s, size_type __l, _Base& __vect)
+: _M_storage(__s), _M_len(__l), _M_vect(__vect)
+{ }
+
+_GLIBCXX20_CONSTEXPR
+~_Guard()
+{
+  if (_M_storage)
+_M_vect._M_deallocate(_M_storage, _M_len);
+}
+
+_GLIBCXX20_CONSTEXPR
+pointer
+_M_release()
+{
+  pointer __res = _M_storage;
+  _M_storage = 0;

I don't think the NullablePointer requirements include assigning 0,
only from nullptr, which isn't valid in C++98.

https://en.cppreference.com/w/cpp/named_req/NullablePointer

Please use _M_storage = pointer() instea

Re: [PATCH v3 #1/2] enable adjustment of return_pc debug attrs

2024-05-28 Thread Segher Boessenkool
On Sat, May 25, 2024 at 09:12:05AM -0300, Alexandre Oliva wrote:


You sent multiple patch series in one thread, and multiple versions of
the same series even.

This is very hard to even *read*, let alone work with.  Please don't.


Segher


Re: [PATCH 2/4] resource.cc: Replace calls to find_basic_block with cfgrtl BLOCK_FOR_INSN

2024-05-28 Thread Hans-Peter Nilsson
> Date: Mon, 27 May 2024 12:57:53 -0600
> From: Jeff Law 

> > * resource.cc: Include cfgrtl.h.  Use BLOCK_FOR_INSN (insn)->index
> > instead of calling find_basic_block (insn).  Assert for not -1.
> > (find_basic_block): Remove function.
> > (init_resource_info): Call compute_bb_for_insn.
> > (free_resource_info): Call free_bb_for_insn.
> I'm pretty sure that code was part of the overall problem -- namely that 
> we didn't have good basic block info so we resorted to insn scanning.
> 
> Presumably we set BLOCK_FOR_INSN when we generate a wrapper SEQUENCE 
> insns for a filled delay slot?

Yes - one way or the other: most insn chain changes from
reorg are through calls to add_insn_after, which always sets
the bb of the added insn according to the reference insn
(except when either insn is a barrier, then it never sets a
bb); see for example emit_delay_sequence.  Others by
emit_insn_before and emit_copy_of_insn_after.

(Not-so-)fun fact: add_insn_after takes a bb parameter which
reorg.cc always passes as NULL.  But - the argument is
*always ignored* and the bb in the "after" insn is used.
I traced that ignored parameter as far as
r0-81421-g6fb5fa3cbc0d78 "Merge dataflow branch into
mainline" when it was added.  I *guess* it's an artifact
left over from some idea explored on that branch.  Ripe for
obvious cleanup by removal everywhere.
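A toy C++ model of the behaviour described above. This is not GCC's RTL API; insn and add_insn_after_model are hypothetical. The new insn inherits the basic block of the insn it is placed after, the bb argument is never consulted, and nothing is set when either insn is a barrier.

```cpp
#include <cassert>

// Toy model, not GCC's RTL: each insn records a basic-block index.
struct insn
{
  bool is_barrier = false;
  int bb = -1;            // -1: no block recorded
  insn *next = nullptr;
};

// Mirrors the described behaviour: BB_IGNORED is never read; the new
// insn takes AFTER's block unless either insn is a barrier.
void add_insn_after_model(insn *x, insn *after, int bb_ignored)
{
  (void) bb_ignored;
  x->next = after->next;
  after->next = x;
  if (!x->is_barrier && !after->is_barrier)
    x->bb = after->bb;
}
```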

>  Assuming we do create the right mapping 
> for those new insns, then this is OK.

Thanks for the quick review of the whole set!

brgds, H-P


[pushed 1/3] selftests: split out make_fndecl from selftest.h to its own header

2024-05-28 Thread David Malcolm
Avoid selftest.h requiring the "tree" type.
No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-873-gfb7a943ead689e.

gcc/analyzer/ChangeLog:
* region-model.cc: Include "selftest-tree.h".

gcc/ChangeLog:
* function-tests.cc: Include "selftest-tree.h".
* selftest-tree.h: New file.
* selftest.h (make_fndecl): Move to selftest-tree.h.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/region-model.cc |  1 +
 gcc/function-tests.cc|  1 +
 gcc/selftest-tree.h  | 41 
 gcc/selftest.h   |  7 --
 4 files changed, 43 insertions(+), 7 deletions(-)
 create mode 100644 gcc/selftest-tree.h

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index bebe2ed3cd69..0dd5671db1be 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -42,6 +42,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-color.h"
 #include "bitmap.h"
 #include "selftest.h"
+#include "selftest-tree.h"
 #include "analyzer/analyzer.h"
 #include "analyzer/analyzer-logging.h"
 #include "ordered-hash-map.h"
diff --git a/gcc/function-tests.cc b/gcc/function-tests.cc
index 827734422d88..ea3d722d4b69 100644
--- a/gcc/function-tests.cc
+++ b/gcc/function-tests.cc
@@ -76,6 +76,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-ref.h"
 #include "cgraph.h"
 #include "selftest.h"
+#include "selftest-tree.h"
 #include "print-rtl.h"
 
 #if CHECKING_P
diff --git a/gcc/selftest-tree.h b/gcc/selftest-tree.h
new file mode 100644
index ..9922af3340f2
--- /dev/null
+++ b/gcc/selftest-tree.h
@@ -0,0 +1,41 @@
+/* A self-testing framework, for use by -fself-test.
+   Copyright (C) 2015-2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_SELFTEST_TREE_H
+#define GCC_SELFTEST_TREE_H
+
+/* The selftest code should entirely disappear in a production
+   configuration, hence we guard all of it with #if CHECKING_P.  */
+
+#if CHECKING_P
+
+namespace selftest {
+
+/* Helper function for selftests that need a function decl.  */
+
+extern tree make_fndecl (tree return_type,
+const char *name,
+vec <tree> &param_types,
+bool is_variadic = false);
+
+} /* end of namespace selftest.  */
+
+#endif /* #if CHECKING_P */
+
+#endif /* GCC_SELFTEST_TREE_H */
diff --git a/gcc/selftest.h b/gcc/selftest.h
index 3bddaf1c3228..808d432ec480 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -178,13 +178,6 @@ class line_table_test
   ~line_table_test ();
 };
 
-/* Helper function for selftests that need a function decl.  */
-
-extern tree make_fndecl (tree return_type,
-const char *name,
-vec <tree> &param_types,
-bool is_variadic = false);
-
 /* Run TESTCASE multiple times, once for each case in our test matrix.  */
 
 extern void
-- 
2.26.3



[pushed 3/3] diagnostics: consolidate global state in diagnostic-color.cc

2024-05-28 Thread David Malcolm
Simplify the table of default colors, avoiding the need to manually
add the strlen of each entry.
Consolidate the global state in diagnostic-color.cc into a
g_color_dict, adding selftests for the new class diagnostic_color_dict.
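A rough C++ sketch of the consolidated design, assuming nothing beyond the ChangeLog. It uses a const defaults table with no hand-maintained lengths, copied into a dictionary that a GCC_COLORS-style "name=sgr:name=sgr" string can override. color_dict_sketch and its methods are hypothetical. The sketch stores SGR parameter strings such as 01;31 rather than full escape sequences. Skipping unknown names is an assumption of the sketch, not a description of parse_envvar_value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>

// Defaults in the new style: name and value only, no stored name lengths.
struct color_default { const char *m_name; const char *m_val; };

static const color_default defaults_sketch[] = {
  { "error",   "01;31" },   // bold red
  { "warning", "01;35" },   // bold magenta
  { "note",    "01;36" },   // bold cyan
};

// Hypothetical dictionary seeded from the defaults and overridable by a
// GCC_COLORS-style value "name=sgr:name=sgr".
class color_dict_sketch
{
public:
  color_dict_sketch ()
  {
    for (const color_default &d : defaults_sketch)
      m_map[d.m_name] = d.m_val;
  }

  void parse_value (const std::string &value)
  {
    std::size_t pos = 0;
    while (pos <= value.size ())
      {
        std::size_t end = value.find (':', pos);
        if (end == std::string::npos)
          end = value.size ();
        std::string item = value.substr (pos, end - pos);
        std::size_t eq = item.find ('=');
        if (eq != std::string::npos)
          {
            auto it = m_map.find (item.substr (0, eq));
            if (it != m_map.end ())       // assumption: unknown names skipped
              it->second = item.substr (eq + 1);
          }
        pos = end + 1;
      }
  }

  const char *lookup (const std::string &name) const
  {
    auto it = m_map.find (name);
    return it == m_map.end () ? nullptr : it->second.c_str ();
  }

private:
  std::map<std::string, std::string> m_map;
};
```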

No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Tested with "make selftest-valgrind" and manually with various
values for GCC_COLORS.
Pushed to trunk as r15-875-g21fc89bac61983.

gcc/ChangeLog:
* diagnostic-color.cc: Define INCLUDE_VECTOR.
Include "label-text.h" and "selftest.h".
(struct color_cap): Replace with...
(struct color_default): ...this, adding "m_" prefixes to fields
and dropping "name_len" and "free_val" field.
(color_dict): Convert to...
(gcc_color_defaults): ...this, making const, dropping the trailing
strlen and "false" from each entry.
(class diagnostic_color_dict): New.
(g_color_dict): New.
(colorize_start): Reimplement in terms of g_color_dict.
(diagnostic_color_dict::get_entry_by_name): New, based on
colorize_start.
(diagnostic_color_dict::get_start_by_name): Likewise.
(diagnostic_color_dict::diagnostic_color_dict): New.
(parse_gcc_colors): Reimplement, moving body...
(diagnostic_color_dict::parse_envvar_value): ...here.
(colorize_init): Lazily create g_color_dict.
(selftest::test_empty_color_dict): New.
(selftest::test_default_color_dict): New.
(selftest::test_color_dict_envvar_parsing): New.
(selftest::diagnostic_color_cc_tests): New.
* selftest-run-tests.cc (selftest::run_tests): Call
selftest::diagnostic_color_cc_tests.
* selftest.h (selftest::diagnostic_color_cc_tests): New decl.

Signed-off-by: David Malcolm 
---
 gcc/diagnostic-color.cc   | 277 +-
 gcc/selftest-run-tests.cc |   1 +
 gcc/selftest.h|   1 +
 3 files changed, 216 insertions(+), 63 deletions(-)

diff --git a/gcc/diagnostic-color.cc b/gcc/diagnostic-color.cc
index f01a0fc2e377..cbe57ce763f2 100644
--- a/gcc/diagnostic-color.cc
+++ b/gcc/diagnostic-color.cc
@@ -17,9 +17,11 @@
02110-1301, USA.  */
 
 #include "config.h"
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "diagnostic-color.h"
 #include "diagnostic-url.h"
+#include "label-text.h"
 
 #ifdef __MINGW32__
 #  define WIN32_LEAN_AND_MEAN
@@ -27,6 +29,7 @@
 #endif
 
 #include "color-macros.h"
+#include "selftest.h"
 
 /* The context and logic for choosing default --color screen attributes
(foreground and background colors, etc.) are the following.
@@ -72,56 +75,124 @@
 counterparts) and possibly bold blue.  */
 /* Default colors. The user can overwrite them using environment
variable GCC_COLORS.  */
-struct color_cap
+struct color_default
 {
-  const char *name;
-  const char *val;
-  unsigned char name_len;
-  bool free_val;
+  const char *m_name;
+  const char *m_val;
 };
 
 /* For GCC_COLORS.  */
-static struct color_cap color_dict[] =
+static const color_default gcc_color_defaults[] =
 {
-  { "error", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_RED), 5, false },
-  { "warning", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_MAGENTA),
-  7, false },
-  { "note", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_CYAN), 4, false },
-  { "range1", SGR_SEQ (COLOR_FG_GREEN), 6, false },
-  { "range2", SGR_SEQ (COLOR_FG_BLUE), 6, false },
-  { "locus", SGR_SEQ (COLOR_BOLD), 5, false },
-  { "quote", SGR_SEQ (COLOR_BOLD), 5, false },
-  { "path", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_CYAN), 4, false },
-  { "fnname", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_GREEN), 6, false },
-  { "targs", SGR_SEQ (COLOR_FG_MAGENTA), 5, false },
-  { "fixit-insert", SGR_SEQ (COLOR_FG_GREEN), 12, false },
-  { "fixit-delete", SGR_SEQ (COLOR_FG_RED), 12, false },
-  { "diff-filename", SGR_SEQ (COLOR_BOLD), 13, false },
-  { "diff-hunk", SGR_SEQ (COLOR_FG_CYAN), 9, false },
-  { "diff-delete", SGR_SEQ (COLOR_FG_RED), 11, false },
-  { "diff-insert", SGR_SEQ (COLOR_FG_GREEN), 11, false },
-  { "type-diff", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_GREEN), 9, false },
-  { "valid", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_GREEN), 5, false },
-  { "invalid", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_RED), 7, false },
-  { NULL, NULL, 0, false }
+  { "error", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_RED) },
+  { "warning", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_MAGENTA) },
+  { "note", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_CYAN) },
+  { "range1", SGR_SEQ (COLOR_FG_GREEN) },
+  { "range2", SGR_SEQ (COLOR_FG_BLUE) },
+  { "locus", SGR_SEQ (COLOR_BOLD) },
+  { "quote", SGR_SEQ (COLOR_BOLD) },
+  { "path", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_CYAN) },
+  { "fnname", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_GREEN) },
+  { "targs", SGR_SEQ (COLOR_FG_MAGENTA) },
+  { "fixit-insert", SGR_SEQ (COLOR_FG_GRE

[pushed 2/3] libcpp: move label_text to its own header

2024-05-28 Thread David Malcolm
No functional change intended.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-874-g9bda2c4c81b668.

libcpp/ChangeLog:
* Makefile.in (TAGS_SOURCES): Add include/label-text.h.
* include/label-text.h: New file.
* include/rich-location.h: Include "label-text.h".
(class label_text): Move to label-text.h.

Signed-off-by: David Malcolm 
---
 libcpp/Makefile.in |   2 +-
 libcpp/include/label-text.h| 102 +
 libcpp/include/rich-location.h |  79 +
 3 files changed, 105 insertions(+), 78 deletions(-)
 create mode 100644 libcpp/include/label-text.h

diff --git a/libcpp/Makefile.in b/libcpp/Makefile.in
index ebbca3fb..7e47153264c0 100644
--- a/libcpp/Makefile.in
+++ b/libcpp/Makefile.in
@@ -271,7 +271,7 @@ ETAGS = @ETAGS@
 
 TAGS_SOURCES = $(libcpp_a_SOURCES) internal.h system.h ucnid.h \
 include/cpplib.h include/line-map.h include/mkdeps.h include/symtab.h \
-include/rich-location.h
+include/rich-location.h include/label-text.h
 
 
 TAGS: $(TAGS_SOURCES)
diff --git a/libcpp/include/label-text.h b/libcpp/include/label-text.h
new file mode 100644
index ..13562cda41f9
--- /dev/null
+++ b/libcpp/include/label-text.h
@@ -0,0 +1,102 @@
+/* A very simple string class.
+   Copyright (C) 2015-2024 Free Software Foundation, Inc.
+
+This program is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.
+
+ In other words, you are welcome to use, share and improve this program.
+ You are forbidden to forbid anyone else to use, share and improve
+ what you give them.   Help stamp out software-hoarding!  */
+
+#ifndef LIBCPP_LABEL_TEXT_H
+#define LIBCPP_LABEL_TEXT_H
+
+/* A struct for the result of range_label::get_text: a NUL-terminated buffer
+   of localized text, and a flag to determine if the caller should "free" the
+   buffer.  */
+
+class label_text
+{
+public:
+  label_text ()
+  : m_buffer (NULL), m_owned (false)
+  {}
+
+  ~label_text ()
+  {
+if (m_owned)
+  free (m_buffer);
+  }
+
+  /* Move ctor.  */
+  label_text (label_text &&other)
+  : m_buffer (other.m_buffer), m_owned (other.m_owned)
+  {
+other.release ();
+  }
+
+  /* Move assignment.  */
+  label_text & operator= (label_text &&other)
+  {
+if (m_owned)
+  free (m_buffer);
+m_buffer = other.m_buffer;
+m_owned = other.m_owned;
+other.release ();
+return *this;
+  }
+
+  /* Delete the copy ctor and copy-assignment operator.  */
+  label_text (const label_text &) = delete;
+  label_text & operator= (const label_text &) = delete;
+
+  /* Create a label_text instance that borrows BUFFER from a
+ longer-lived owner.  */
+  static label_text borrow (const char *buffer)
+  {
+return label_text (const_cast <char *> (buffer), false);
+  }
+
+  /* Create a label_text instance that takes ownership of BUFFER.  */
+  static label_text take (char *buffer)
+  {
+return label_text (buffer, true);
+  }
+
+  void release ()
+  {
+m_buffer = NULL;
+m_owned = false;
+  }
+
+  const char *get () const
+  {
+return m_buffer;
+  }
+
+  bool is_owner () const
+  {
+return m_owned;
+  }
+
+private:
+  char *m_buffer;
+  bool m_owned;
+
+  label_text (char *buffer, bool owned)
+  : m_buffer (buffer), m_owned (owned)
+  {}
+};
+
+#endif /* !LIBCPP_LABEL_TEXT_H  */
diff --git a/libcpp/include/rich-location.h b/libcpp/include/rich-location.h
index a2ece8b033c0..be424cb4b65f 100644
--- a/libcpp/include/rich-location.h
+++ b/libcpp/include/rich-location.h
@@ -22,6 +22,8 @@ along with this program; see the file COPYING3.  If not see
 #ifndef LIBCPP_RICH_LOCATION_H
 #define LIBCPP_RICH_LOCATION_H
 
+#include "label-text.h"
+
 class range_label;
 class label_effects;
 
@@ -541,83 +543,6 @@ protected:
   const diagnostic_path *m_path;
 };
 
-/* A struct for the result of range_label::get_text: a NUL-terminated buffer
-   of localized text, and a flag to determine if the caller should "free" the
-   buffer.  */
-
-class label_text
-{
-public:
-  label_text ()
-  : m_buffer (NULL), m_owned (false)
-  {}
-
-  ~label_text ()
-  {
-if (m_owned)
-  free (m_buffer);
-  }
-
-  /* Move ctor.  */
-  label_text (label_text &&other)
-  : m_buffer (other.m_buffer), m_owned (other.m_owned)
-  {
-other.release ();
-  }
-
-  /* Move assignment.  */
-  label_text & operator= (label_text &&other)
-  {
-if (m_owned)
-  fr
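The ownership rules of label_text can be exercised outside libcpp with a minimal standalone sketch. It copies the class from the patch (with the `const_cast` target type spelled out as `char *`). The two helpers `static_label` and `dynamic_label` are hypothetical producers for illustration, not part of libcpp:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <utility>

// Standalone copy of label_text from the patch, so that its ownership
// rules can be exercised outside libcpp.
class label_text
{
public:
  label_text () : m_buffer (nullptr), m_owned (false) {}

  ~label_text ()
  {
    if (m_owned)
      std::free (m_buffer);
  }

  // Move ctor: steal the buffer and leave OTHER empty.
  label_text (label_text &&other)
  : m_buffer (other.m_buffer), m_owned (other.m_owned)
  {
    other.release ();
  }

  // Move assignment: free any owned buffer first, then steal OTHER's.
  label_text &operator= (label_text &&other)
  {
    if (m_owned)
      std::free (m_buffer);
    m_buffer = other.m_buffer;
    m_owned = other.m_owned;
    other.release ();
    return *this;
  }

  // Copying is disallowed so that a buffer is never freed twice.
  label_text (const label_text &) = delete;
  label_text &operator= (const label_text &) = delete;

  static label_text borrow (const char *buffer)
  {
    return label_text (const_cast<char *> (buffer), false);
  }

  static label_text take (char *buffer)
  {
    return label_text (buffer, true);
  }

  void release ()
  {
    m_buffer = nullptr;
    m_owned = false;
  }

  const char *get () const { return m_buffer; }
  bool is_owner () const { return m_owned; }

private:
  label_text (char *buffer, bool owned) : m_buffer (buffer), m_owned (owned) {}

  char *m_buffer;
  bool m_owned;
};

// Hypothetical producer that lends out a string literal (never freed).
static label_text static_label ()
{
  return label_text::borrow ("static text");
}

// Hypothetical producer that hands over a heap copy the label must free.
static label_text dynamic_label ()
{
  const char *src = "heap text";
  std::size_t len = std::strlen (src) + 1;
  char *copy = static_cast<char *> (std::malloc (len));
  std::memcpy (copy, src, len);
  return label_text::take (copy);
}
```

Because both move operations call release () on the source, exactly one label_text ends up responsible for freeing a taken buffer, while a borrowed buffer is never freed.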

Re: [PATCH] regenerate-opt-urls.py: fix transposed values for "vax" and "v850"

2024-05-28 Thread David Malcolm
On Tue, 2024-05-28 at 11:41 -0400, David Malcolm wrote:
> > On Tue, 2024-05-28 at 15:03 +0200, Mark Wielaard wrote:
> > Hi Maciej (Hi David, added to CC),
> 
> > On Mon, 2024-05-27 at 05:19 +0100, Maciej W. Rozycki wrote:
> > >  As reported in PR target/79646 and fixed by a change proposed by Abe we
> > > have a couple of issues with the descriptions of the VAX floating-point
> > > format options in the option definition file.  Additionally most of these
> > > options are not documented in the manual.
> > > 
> > >  This mini patch series addresses these issues, including Abe's change,
> > > slightly updated, and my new change.  See individual change descriptions
> > > for details.
> > > 
> > >  Verified by inspecting output produced by `vax-netbsdelf-gcc -v --help'
> > > and by eyeballing `gcc.info' and `gcc.pdf' files produced.  Committed.
> > 
> > This broke the gcc-autoregen checker because the
> > gcc/config/vax/vax.opt.urls file wasn't regenerated:
> > https://builder.sourceware.org/buildbot/#/builders/269/builds/5347
> > 
> > Producing the following diff:
> > 
> > diff --git a/gcc/config/vax/vax.opt.urls b/gcc/config/vax/vax.opt.urls
> > index c6b1c418b61..ca78b31dd4c 100644
> > --- a/gcc/config/vax/vax.opt.urls
> > +++ b/gcc/config/vax/vax.opt.urls
> > @@ -1,7 +1,13 @@
> >  ; Autogenerated by regenerate-opt-urls.py from gcc/config/vax/vax.opt and generated HTML
> >  
> > +; skipping UrlSuffix for 'md' due to finding no URLs
> > +
> > +; skipping UrlSuffix for 'md-float' due to finding no URLs
> > +
> >  ; skipping UrlSuffix for 'mg' due to finding no URLs
> >  
> > +; skipping UrlSuffix for 'mg-float' due to finding no URLs
> > +
> >  ; skipping UrlSuffix for 'mgnu' due to finding no URLs
> >  
> >  ; skipping UrlSuffix for 'munix' due to finding no URLs
> > 
> > I am not completely clear on why, though, since it seems you actually
> > did add documentation for exactly these options.
> > 
> > David, should the above diff just be checked in, or do we need to
> > investigate why the URLs weren't found?
> 
> [adding Nick, re the v850 target]
> 
> I found the problem - I messed up when I was populating
> TARGET_SPECIFIC_PAGES in regenerate-opt-urls.py, accidentally
> transposing the entries for v850 and vax by writing:
> 
>     'gcc/V850-Options.html' : 'gcc/config/vax/',
>     'gcc/VAX-Options.html' : 'gcc/config/v850/',
> 
> leading to both gcc/config/v850/v850.opt.urls and
> gcc/config/vax/vax.opt.urls being full of such comments.
> 
> Sorry.
> 
> Fixing that leads to the files for both targets being populated with
> correct-looking URL entries.
> 
> I'll push this to trunk (and backport to gcc 14) after suitable
> testing.

I've pushed this to gcc trunk as r15-872-g7cc529fe514cc6 (having
bootstrapped and lightly tested it on x86_64-pc-linux-gnu)

Dave
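A transposition like the one above could be caught by a consistency check that derives the expected config directory from the page name. The following C++ sketch is purely illustrative and is not code from regenerate-opt-urls.py; the function names are hypothetical, and the lower-casing heuristic only holds for pages named `gcc/<Target>-Options.html` whose directory matches the target name (it would not cover, for example, the x86 options page, which corresponds to gcc/config/i386/):

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Derive "gcc/config/<target>/" from "gcc/<Target>-Options.html" by
// lower-casing the page stem.  Returns "" when PAGE does not follow that
// naming pattern, so such entries are skipped rather than flagged.
std::string expected_config_dir (const std::string &page)
{
  const std::string prefix = "gcc/";
  const std::string suffix = "-Options.html";
  if (page.size () <= prefix.size () + suffix.size ()
      || page.compare (0, prefix.size (), prefix) != 0
      || page.compare (page.size () - suffix.size (), suffix.size (),
                       suffix) != 0)
    return "";
  std::string stem = page.substr (prefix.size (),
                                  page.size () - prefix.size ()
                                  - suffix.size ());
  for (char &c : stem)
    c = static_cast<char> (std::tolower (static_cast<unsigned char> (c)));
  return "gcc/config/" + stem + "/";
}

// Count the page -> directory entries that disagree with the heuristic.
int count_mismatches (const std::map<std::string, std::string> &pages)
{
  int bad = 0;
  for (const auto &entry : pages)
    {
      std::string expected = expected_config_dir (entry.first);
      if (!expected.empty () && expected != entry.second)
        ++bad;
    }
  return bad;
}
```

Applied to the two transposed entries quoted above, such a check would report both as mismatches.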



Re: [Patch, PR Fortran/90069] Polymorphic Return Type Memory Leak Without Intermediate Variable

2024-05-28 Thread Harald Anlauf

Hi Andre,

On 5/28/24 14:10, Andre Vehreschild wrote:

Hi all,

the attached patch fixes a memory leak with unlimited polymorphic return types.
The leak occurred, because an expression with side-effects was evaluated twice.
I have substituted the check for non-variable expressions followed by creating a
SAVE_EXPR with checking for trees with side effects and creating temp. variable
and freeing the memory.


this looks good to me.  It also solves the runtime memory leak in
testcase pr114012.f90 .  Nice!


Btw, I do not get the SAVE_EXPR in the old code. Is there something missing to
manifest it or is a SAVE_EXPR not meant to be evaluated twice?


I was assuming that the comment in gcc/tree.h applies here:

/* save_expr (EXP) returns an expression equivalent to EXP
   but it can be used multiple times within context CTX
   and only evaluate EXP once.  */

I do not know what the practical difference between a SAVE_EXPR
and a temporary explicitly evaluated once (which you have now)
is, except that you can free the temporary cleanly.
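To make the leak concrete, here is a hedged C++ model of the two code shapes. It is not gfortran's generated code; `make_object`, `free_object` and the live-object counter are hypothetical. An allocating expression evaluated twice leaks one object, while evaluating it once into a temporary lets the same object be both used and freed:

```cpp
#include <cassert>

// Number of live heap objects, used to detect leaks in the model.
static int live_objects = 0;

struct object
{
  int value;
};

// Stand-in for a call returning a freshly allocated result: every
// evaluation has a side effect (an allocation).
static object *make_object (int v)
{
  ++live_objects;
  return new object{v};
}

static void free_object (object *p)
{
  --live_objects;
  delete p;
}

// Buggy shape: the side-effecting expression is evaluated twice, once to
// read the value and once to get the pointer to free; the first result leaks.
static int use_twice_evaluated (int v)
{
  int result = make_object (v)->value;
  free_object (make_object (v));
  return result;
}

// Fixed shape: evaluate once into a temporary, use it, then free it.
static int use_temporary (int v)
{
  object *tmp = make_object (v);
  int result = tmp->value;
  free_object (tmp);
  return result;
}
```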


Anyway, regtested ok on Linux-x86_64-Fedora_39. Ok for master?


Yes, this is fine from my side.  If you are inclined to backport
to e.g. 14-branch after a grace period, that would be great.


This work is funded by the Sovereign Tech Fund. Yes, the funding has been
granted and Nicolas, Mikael and I will be working on some Fortran topics in
the next 12-18 months.


This is really great news!


Regards,
Andre


Thanks for the patch!

Harald


--
Andre Vehreschild * Email: vehre ad gmx dot de




[patch] OpenMP: Add -fopenmp-force-usm mode

2024-05-28 Thread Tobias Burnus
-fopenmp-force-usm can be useful for some badly written code. Explicitly 
using 'omp requires' makes more sense, but still. It might also make 
sense for testing purposes.


Unfortunately, I did not see a simple way of testing it. When trying it 
manually, I looked at the 'a.xamdgcn-amdhsa.c' -save-temps file, where 
gcn_data has the omp_requires_mask as second argument and testing showed 
that an explicit pragma and the -f... argument have the same result.


An alternative would be to move this code later, e.g. to lto-cgraph.cc's 
omp_requires_mask, which might be safer (as it avoids changing as many 
locations). On the other hand, it might require more special cases 
elsewhere.*


Comments, suggestions?

Tobias

*I am especially thinking about a global variable and "#pragma omp 
declare target". At least with 'omp requires self_maps' of OpenMP 6, it 
seems as if 'declare target enter(global_var)' should become 
'link(global_var)' where the global_var pointer is updated to point to 
the host version.


At least I don't see how otherwise the "all corresponding list items 
created by the 'enter' clauses specified by declare target directives in 
the compilation unit share storage with the original list items." could 
be fulfilled.


This will require generating different code for 'self_maps' (and, 
potentially / [RFC] 'unified_shared_memory') than normal code, which 
would be the first compiler code-gen change due to USM (→ 
GOMP_OFFLOAD_CAP_SHARED_MEM) for non-host devices.
OpenMP: Add -fopenmp-force-usm mode

Add an implicit 'omp requires unified_shared_memory' to all files that
use target constructs ("OMP_REQUIRES_TARGET_USED").  As constructed, the
diagnostic "'unified_shared_memory' clause used lexically after first target
construct or offloading API" is not inhibited.

The option has no effect without -fopenmp and does not affect OpenACC code,
matching what the directive would do.  The name of the command-line option
matches Clang's, added in LLVM 18.
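The mask update described in the ChangeLog below can be sketched as follows. The enumerator names and bit values are hypothetical stand-ins for GCC's `OMP_REQUIRES_*` flags, and the function is an illustration of the described behavior rather than the parser code itself:

```cpp
#include <cassert>

// Hypothetical bit values; GCC's real omp_requires enum may differ.
enum omp_requires_bits : unsigned
{
  REQ_UNIFIED_SHARED_MEMORY = 1u << 0,
  REQ_TARGET_USED = 1u << 1
};

// When a target construct marks the translation unit as using target
// regions, -fopenmp-force-usm additionally records the
// unified_shared_memory requirement, as if the directive had been written.
unsigned note_target_construct (unsigned mask, bool force_usm)
{
  mask |= REQ_TARGET_USED;
  if (force_usm)
    mask |= REQ_UNIFIED_SHARED_MEMORY;
  return mask;
}
```

Since both updates are bit-ORs, noting further target constructs leaves the mask unchanged.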

gcc/c-family/ChangeLog:

	* c.opt (fopenmp-force-usm): New.
	* c.opt.urls: Regenerate.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_target_data, c_parser_omp_target_update,
	c_parser_omp_target_enter_data, c_parser_omp_target_exit_data,
	c_parser_omp_target): When setting OMP_REQUIRES_TARGET_USED, also
	set OMP_REQUIRES_UNIFIED_SHARED_MEMORY if -fopenmp-force-usm is
	in force.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_target_data,
	cp_parser_omp_target_enter_data, cp_parser_omp_target_exit_data,
	cp_parser_omp_target_update, cp_parser_omp_target): When setting
	OMP_REQUIRES_TARGET_USED, also set OMP_REQUIRES_UNIFIED_SHARED_MEMORY
	if -fopenmp-force-usm is in force.


gcc/ChangeLog:

	* doc/invoke.texi (-fopenmp-force-usm): Document new option.

gcc/fortran/ChangeLog:

	* invoke.texi (-fopenmp-force-usm): Document new option.
	* lang.opt (fopenmp-force-usm): New.
	* lang.opt.urls: Regenerate.
	* parse.cc (gfc_parse_file): When setting
	OMP_REQUIRES_TARGET_USED, also set OMP_REQUIRES_UNIFIED_SHARED_MEMORY
	if -fopenmp-force-usm is in force.

 gcc/c-family/c.opt|  4 
 gcc/c-family/c.opt.urls   |  3 +++
 gcc/c/c-parser.cc | 50 +--
 gcc/cp/parser.cc  | 50 +--
 gcc/doc/invoke.texi   | 11 +--
 gcc/fortran/invoke.texi   |  7 +++
 gcc/fortran/lang.opt  |  4 
 gcc/fortran/lang.opt.urls |  3 +++
 gcc/fortran/parse.cc  | 10 --
 9 files changed, 118 insertions(+), 24 deletions(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index fb34c3b7031..4985cd61c48 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -2136,6 +2136,10 @@ fopenmp
 C ObjC C++ ObjC++ LTO Var(flag_openmp)
 Enable OpenMP (implies -frecursive in Fortran).
 
+fopenmp-force-usm
+C ObjC C++ ObjC++ Var(flag_openmp_force_usm)
+Behave as if the source file contained OpenMP's 'requires unified_shared_memory'.
+
 fopenmp-simd
 C ObjC C++ ObjC++ Var(flag_openmp_simd)
 Enable OpenMP's SIMD directives.
diff --git a/gcc/c-family/c.opt.urls b/gcc/c-family/c.opt.urls
index dd455d7c0dc..34b3a395e84 100644
--- a/gcc/c-family/c.opt.urls
+++ b/gcc/c-family/c.opt.urls
@@ -1222,6 +1222,9 @@ UrlSuffix(gcc/C-Dialect-Options.html#index-fopenacc-dim)
 fopenmp
 UrlSuffix(gcc/C-Dialect-Options.html#index-fopenmp) LangUrlSuffix_Fortran(gfortran/Fortran-Dialect-Options.html#index-fopenmp)
 
+fopenmp-force-usm
+UrlSuffix(gcc/C-Dialect-Options.html#index-fopenmp-force-usm) LangUrlSuffix_Fortran(gfortran/Fortran-Dialect-Options.html#index-fopenmp-force-usm)
+
 fopenmp-simd
 UrlSuffix(gcc/C-Dialect-Options.html#index-fopenmp-simd) LangUrlSuffix_Fortran(gfortran/Fortran-Dialect-Options.html#index-fopenmp-simd)
 
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 00f8bf4376e..93c9cd1c9d0 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -23849,8 +23849,14 @@ static tree
 c_parser_omp_target_data (location_t loc, c_parser *parser, bool *if_p)
 {
   if (f

[PATCH v4] RISC-V: Introduce -mvector-strict-align.

2024-05-28 Thread Robin Dapp
Hi,

this patch disables movmisalign by default and introduces
the -mno-vector-strict-align option to override it and re-enable
movmisalign.  For now, generic-ooo is the only uarch that supports
misaligned vector access.

The patch also adds a check_effective_target_riscv_v_misalign_ok to
the testsuite which enables or disables the vector misalignment tests
depending on whether the target under test can execute a misaligned
vle32.
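Going by the description above, the effective setting can be modelled as follows: an explicit -mvector-strict-align or -mno-vector-strict-align wins, otherwise the per-uarch tuning flag decides. This precedence is an assumption, not the code from riscv.cc; the struct and the tri-state encoding of the user choice are hypothetical:

```cpp
#include <cassert>

// Hypothetical subset of riscv_tune_param.
struct tune_param
{
  const char *name;
  bool vector_unaligned_access;
};

// Illustrative entries: per the description, generic-ooo is currently the
// only uarch that supports misaligned vector access.
const tune_param rocket_tune = {"rocket", false};
const tune_param generic_ooo_tune = {"generic-ooo", true};

// USER_STRICT: -1 if neither option was given, 1 for -mvector-strict-align,
// 0 for -mno-vector-strict-align.  Returns whether movmisalign is enabled.
bool vector_unaligned_access_p (const tune_param &tune, int user_strict)
{
  if (user_strict >= 0)
    return user_strict == 0;
  return tune.vector_unaligned_access;
}
```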

Changes from v3:
 - Addressed Kito's comments.
 - Made -mscalar-strict-align a real alias.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
Move from here...
* config/riscv/riscv.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
...to here and map to riscv_vector_unaligned_access_p.
* config/riscv/riscv.opt: Add -mvector-strict-align.
* config/riscv/riscv.cc (struct riscv_tune_param): Add
vector_unaligned_access.
(riscv_override_options_internal): Set
riscv_vector_unaligned_access_p.
* doc/invoke.texi: Document -mvector-strict-align.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add
check_effective_target_riscv_v_misalign_ok.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Add
-mno-vector-strict-align.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-8.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/misalign-1.c: Ditto.
---
 gcc/config/riscv/riscv-opts.h |  3 --
 gcc/config/riscv/riscv.cc | 19 +++
 gcc/config/riscv/riscv.h  |  5 +++
 gcc/config/riscv/riscv.opt|  8 +
 gcc/doc/invoke.texi   | 22 
 .../costmodel/riscv/rvv/dynamic-lmul2-7.c |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-10.c  |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-11.c  |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-12.c  |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-8.c   |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-9.c   |  2 +-
 .../riscv/rvv/autovec/vls/misalign-1.c|  2 +-
 gcc/testsuite/lib/target-supports.exp | 34 +--
 13 files changed, 93 insertions(+), 12 deletions(-)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 1b2dd5757a8..f58a07abffc 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -147,9 +147,6 @@ enum rvv_vector_bits_enum {
  ? 0   \
  : 32 << (__builtin_popcount (opts->x_riscv_zvl_flags) - 1))
 
-/* TODO: Enable RVV movmisalign by default for now.  */
-#define TARGET_VECTOR_MISALIGN_SUPPORTED 1
-
 /* The maximmum LMUL according to user configuration.  */
 #define TARGET_MAX_LMUL   \
   (int) (rvv_max_lmul == RVV_DYNAMIC ? RVV_M8 : rvv_max_lmul)
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a99211d56b1..13cd61a4a22 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -287,6 +287,7 @@ struct riscv_tune_param
   unsigned short memory_cost;
   unsigned short fmv_cost;
   bool slow_unaligned_access;
+  bool vector_unaligned_access;
   bool use_divmod_expansion;
   bool overlap_op_by_pieces;
   unsigned int fusible_ops;
@@ -299,6 +300,10 @@ struct riscv_tune_param
 /* Whether unaligned accesses execute very slowly.  */
 bool riscv_slow_unaligned_access_p;
 
+/* Whether misaligned vector accesses are supported (i.e. do not
+   throw an exception).  */
+bool riscv_vector_unaligned_access_p;
+
 /* Whether user explicitly passed -mstrict-align.  */
 bool riscv_user_wants_strict_align;
 
@@ -441,6 +446,7 @@ static const struct riscv_tune_param rocket_tune_info = {
   5,   /* memory_cost */
   8,   /* fmv_cost */
   true,    /* slow_unaligned_access */
+  false,   /* vector_unaligned_access */
   false,   /* use_divmod_expansion */
   false,   /* overlap_op_by_pieces */
   RISCV_FUSE_NOTHING,   /* fusible_ops */
@@ -459,6 +465,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
   3,   /* memory_cost */
   8,   /* fmv_cost */
   true,    /* slow_unaligned_access */
+  false,   /* vector_unaligned_access */
   false,   

Re: [PATCH v3 #1/2] enable adjustment of return_pc debug attrs

2024-05-28 Thread Jason Merrill

On 5/25/24 08:12, Alexandre Oliva wrote:

On Apr 27, 2023, Alexandre Oliva  wrote:

On Apr 14, 2023, Alexandre Oliva  wrote:

On Mar 23, 2023, Alexandre Oliva  wrote:

This patch introduces infrastructure for targets to add an offset to
the label issued after the call_insn to set the call_return_pc
attribute.  This will be used on rs6000, which sometimes issues another
instruction after the call proper as part of a call insn.



Ping?
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614452.html


Ping?
Refreshed, retested on ppc64le-linux-gnu.  Ok to install?


I wonder about adding this information to REG_CALL_ARG_LOCATION, but 
doing it this way also seems reasonable.  I'm interested in Jakub's 
input, but the patch is OK in a week if he doesn't get to it.

This patch introduces infrastructure for targets to add an offset to
the label issued after the call_insn to set the call_return_pc
attribute.  This will be used on rs6000, which sometimes issues another
instruction after the call proper as part of a call insn.
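The new optional offset argument of add_AT_lbl_id formats the label with `"%s%+i"`, so the debug info refers to an assembler expression such as `label+4` instead of the bare label. A small C++ sketch of that formatting, using snprintf in place of libiberty's xasprintf (the label names in the test are made up):

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Mirror the patch's formatting: no suffix for a zero offset, otherwise
// the offset is appended with an explicit sign ("%+i").
std::string label_with_offset (const std::string &lbl_id, int offset)
{
  if (offset == 0)
    return lbl_id;
  char buf[32];
  std::snprintf (buf, sizeof buf, "%+i", offset);
  return lbl_id + buf;
}
```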


for  gcc/ChangeLog

* target.def (call_offset_return_label): New hook.
* gcc/doc/tm.texi.in (TARGET_CALL_OFFSET_RETURN_LABEL): Add
placeholder.
* gcc/doc/tm.texi: Rebuild.
* dwarf2out.cc (struct call_arg_loc_node): Record call_insn
instead of call_arg_loc_note.
(add_AT_lbl_id): Add optional offset argument.
(gen_call_site_die): Compute and pass on a return pc offset.
(gen_subprogram_die): Move call_arg_loc_note computation...
(dwarf2out_var_location): ... from here.  Set call_insn.
---
  gcc/doc/tm.texi|7 +++
  gcc/doc/tm.texi.in |2 ++
  gcc/dwarf2out.cc   |   26 +-
  gcc/target.def |9 +
  4 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index cd50078227d98..8a7aa70d605ba 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5557,6 +5557,13 @@ except the last are treated as named.
  You need not define this hook if it always returns @code{false}.
  @end deftypefn
  
+@deftypefn {Target Hook} int TARGET_CALL_OFFSET_RETURN_LABEL (rtx_insn *@var{call_insn})
+While generating call-site debug info for a CALL insn, or a SEQUENCE
+insn starting with a CALL, this target hook is invoked to compute the
+offset to be added to the debug label emitted after the call to obtain
+the return address that should be recorded as the return PC.
+@end deftypefn
+
  @deftypefn {Target Hook} void TARGET_START_CALL_ARGS (cumulative_args_t @var{complete_args})
  This target hook is invoked while generating RTL for a function call,
  after the argument values have been computed, and after stack arguments
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 058bd56487a9a..9e0830758aeea 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3887,6 +3887,8 @@ These machine description macros help implement varargs:
  
  @hook TARGET_STRICT_ARGUMENT_NAMING
  
+@hook TARGET_CALL_OFFSET_RETURN_LABEL
+
  @hook TARGET_START_CALL_ARGS
  
  @hook TARGET_CALL_ARGS

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 5b064ffd78ad1..1092880738df4 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -3593,7 +3593,7 @@ typedef struct var_loc_list_def var_loc_list;
  
  /* Call argument location list.  */
  struct GTY ((chain_next ("%h.next"))) call_arg_loc_node {
-  rtx GTY (()) call_arg_loc_note;
+  rtx_insn * GTY (()) call_insn;
const char * GTY (()) label;
tree GTY (()) block;
bool tail_call_p;
@@ -3777,7 +3777,8 @@ static void remove_addr_table_entry (addr_table_entry *);
  static void add_AT_addr (dw_die_ref, enum dwarf_attribute, rtx, bool);
  static inline rtx AT_addr (dw_attr_node *);
  static void add_AT_symview (dw_die_ref, enum dwarf_attribute, const char *);
-static void add_AT_lbl_id (dw_die_ref, enum dwarf_attribute, const char *);
+static void add_AT_lbl_id (dw_die_ref, enum dwarf_attribute, const char *,
+  int = 0);
  static void add_AT_lineptr (dw_die_ref, enum dwarf_attribute, const char *);
  static void add_AT_macptr (dw_die_ref, enum dwarf_attribute, const char *);
  static void add_AT_range_list (dw_die_ref, enum dwarf_attribute,
@@ -5353,14 +5354,17 @@ add_AT_symview (dw_die_ref die, enum dwarf_attribute attr_kind,
  
  static inline void
  add_AT_lbl_id (dw_die_ref die, enum dwarf_attribute attr_kind,
-   const char *lbl_id)
+  const char *lbl_id, int offset)
  {
dw_attr_node attr;
  
attr.dw_attr = attr_kind;
attr.dw_attr_val.val_class = dw_val_class_lbl_id;
attr.dw_attr_val.val_entry = NULL;
-  attr.dw_attr_val.v.val_lbl_id = xstrdup (lbl_id);
+  if (!offset)
+attr.dw_attr_val.v.val_lbl_id = xstrdup (lbl_id);
+  else
+attr.dw_attr_val.v.val_lbl_id = xasprintf ("%s%+i", lbl_id, offset);
if (dwarf_split_debug_info)
  attr.dw_attr_val.val_entry
  = add_addr_table_entry (attr.dw_attr_val.v.val_lbl_id,
@@

Re: [PATCH] c++: extend -Wself-move for mem-init-list [PR109396]

2024-05-28 Thread Marek Polacek
On Fri, May 24, 2024 at 10:15:56AM -0400, Jason Merrill wrote:
> On 5/23/24 19:57, Marek Polacek wrote:
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > -- >8 --
> > We already warn for:
> > 
> >x = std::move (x);
> > 
> > which triggers:
> > 
> >warning: moving 'x' of type 'int' to itself [-Wself-move]
> > 
> > but bug 109396 reports that this doesn't work for a member-initializer-list:
> > 
> >X() : x(std::move (x))
> > 
> > so this patch amends that.
> > 
> > PR c++/109396
> > 
> > gcc/cp/ChangeLog:
> > 
> > * cp-tree.h (maybe_warn_self_move): Declare.
> > * init.cc (perform_member_init): Call maybe_warn_self_move.
> > * typeck.cc (maybe_warn_self_move): No longer static.  Change the
> > return type to bool.  Also warn when called from
> > a member-initializer-list.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/warn/Wself-move2.C: New test.
> > ---
> >   gcc/cp/cp-tree.h|  1 +
> >   gcc/cp/init.cc  |  5 ++--
> >   gcc/cp/typeck.cc| 28 +--
> >   gcc/testsuite/g++.dg/warn/Wself-move2.C | 37 +
> >   4 files changed, 60 insertions(+), 11 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/warn/Wself-move2.C
> > 
> > diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> > index ba9e848c177..ea3fa6f4aac 100644
> > --- a/gcc/cp/cp-tree.h
> > +++ b/gcc/cp/cp-tree.h
> > @@ -8263,6 +8263,7 @@ extern cp_expr build_c_cast   (location_t loc, tree type,
> >  cp_expr expr);
> >   extern tree cp_build_c_cast   (location_t, tree, tree,
> >  tsubst_flags_t);
> > +extern bool maybe_warn_self_move   (location_t, tree, tree);
> >   extern cp_expr build_x_modify_expr(location_t, tree,
> >  enum tree_code, tree,
> >  tree, tsubst_flags_t);
> > diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
> > index 52396d87a8c..4a7ed7f5302 100644
> > --- a/gcc/cp/init.cc
> > +++ b/gcc/cp/init.cc
> > @@ -999,7 +999,7 @@ perform_member_init (tree member, tree init, hash_set &uninitialized)
> > if (decl == error_mark_node)
> >   return;
> > -  if ((warn_init_self || warn_uninitialized)
> > +  if ((warn_init_self || warn_uninitialized || warn_self_move)
> > && init
> > && TREE_CODE (init) == TREE_LIST
> > && TREE_CHAIN (init) == NULL_TREE)
> > @@ -1013,7 +1013,8 @@ perform_member_init (tree member, tree init, hash_set &uninitialized)
> > warning_at (DECL_SOURCE_LOCATION (current_function_decl),
> > OPT_Winit_self, "%qD is initialized with itself",
> > member);
> > -  else
> > +  else if (!maybe_warn_self_move (input_location, member,
> > + TREE_VALUE (init)))
> > find_uninit_fields (&val, &uninitialized, decl);
> >   }
> > diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc
> > index d7fa6e0dd96..e058ce18276 100644
> > --- a/gcc/cp/typeck.cc
> > +++ b/gcc/cp/typeck.cc
> > @@ -9355,27 +9355,27 @@ cp_build_c_cast (location_t loc, tree type, tree expr,
> >   /* Warn when a value is moved to itself with std::move.  LHS is the target,
> >  RHS may be the std::move call, and LOC is the location of the whole
> > -   assignment.  */
> > +   assignment.  Return true if we warned.  */
> > -static void
> > +bool
> >   maybe_warn_self_move (location_t loc, tree lhs, tree rhs)
> >   {
> > if (!warn_self_move)
> > -return;
> > +return false;
> > /* C++98 doesn't know move.  */
> > if (cxx_dialect < cxx11)
> > -return;
> > +return false;
> > if (processing_template_decl)
> > -return;
> > +return false;
> > if (!REFERENCE_REF_P (rhs)
> > || TREE_CODE (TREE_OPERAND (rhs, 0)) != CALL_EXPR)
> > -return;
> > +return false;
> > tree fn = TREE_OPERAND (rhs, 0);
> > if (!is_std_move_p (fn))
> > -return;
> > +return false;
> > /* Just a little helper to strip * and various NOPs.  */
> > auto extract_op = [] (tree &op) {
> > @@ -9393,13 +9393,23 @@ maybe_warn_self_move (location_t loc, tree lhs, tree rhs)
> > tree type = TREE_TYPE (lhs);
> > tree orig_lhs = lhs;
> > extract_op (lhs);
> > -  if (cp_tree_equal (lhs, arg))
> > +  if (cp_tree_equal (lhs, arg)
> > +  /* Also warn in a member-initializer-list, as in : i(std::move(i)).  */
> > +  || (TREE_CODE (lhs) == FIELD_DECL
> > + && TREE_CODE (arg) == COMPONENT_REF
> > + && cp_tree_equal (TREE_OPERAND (arg, 0), current_class_ref)
> > + && TREE_OPERAND (arg, 1) == lhs))
> >   {
> > auto_diagnostic_group d;
> > if (warning_at (loc, OPT_Wself_move,
> >   "moving %qE of type %qT to itself", orig_lhs, type))
> > - 
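For reference, a minimal C++ illustration of the pattern from PR c++/109396 next to the intended form. The self-move is left in a comment because executing it would read a member before it has been constructed; the struct is a made-up example, not the patch's Wself-move2.C test:

```cpp
#include <cassert>
#include <string>
#include <utility>

struct X
{
  std::string s;

  // The PR c++/109396 pattern, which the patch makes -Wself-move diagnose:
  //   X () : s (std::move (s)) {}
  // It moves the member into itself before the member has been constructed.

  // Intended form: move from another object's member.
  X (X &&other) : s (std::move (other.s)) {}
  explicit X (std::string v) : s (std::move (v)) {}
};
```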

Re: [PATCH] c++: canonicity of fn types w/ instantiated eh specs [PR115223]

2024-05-28 Thread Jason Merrill

On 5/25/24 19:18, Patrick Palka wrote:

Bootstrap and regtest on x86_64-pc-linux-gnu in progress,
does this look OK for trunk if successful?


OK.


-- >8 --

When propagating structural equality in build_cp_fntype_variant, we
should consider structural equality of the exception-less variant, not
of the given type which might use structural equality only because of
the (complex) noexcept-spec we're intending to replace, as in
maybe_instantiate_noexcept which calls build_exception_variant using
the function type with a deferred noexcept-spec.  Otherwise we might
pessimisticly use structural equality for a function type with a simple
instantiated noexcept-spec, leading to a failed LTO-specific sanity
check if we later use that (structural-equality) type as the canonical
version of some other variant.

PR c++/115223

gcc/cp/ChangeLog:

* tree.cc (build_cp_fntype_variant): Propagate structural
equality of the exception-less variant.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept87.C: New test.
---
  gcc/cp/tree.cc  |  4 
  gcc/testsuite/g++.dg/cpp0x/noexcept87.C | 11 +++
  2 files changed, 15 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept87.C

diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index 4d87661b4ad..f810b8cd777 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -2796,6 +2796,10 @@ build_cp_fntype_variant (tree type, cp_ref_qualifier rqual,
bool complex_eh_spec_p = (cr && cr != noexcept_true_spec
&& !UNPARSED_NOEXCEPT_SPEC_P (cr));
  
+  if (!complex_eh_spec_p && TYPE_RAISES_EXCEPTIONS (type))
+/* We want to consider structural equality of the exception-less
+   variant since we'll be replacing the exception specification.  */
+type = build_cp_fntype_variant (type, rqual, /*raises=*/NULL_TREE, late);
if (TYPE_STRUCTURAL_EQUALITY_P (type) || complex_eh_spec_p)
  /* Propagate structural equality.  And always use structural equality
 for function types with a complex noexcept-spec since their identity
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept87.C b/gcc/testsuite/g++.dg/cpp0x/noexcept87.C
new file mode 100644
index 000..60b1497472b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept87.C
@@ -0,0 +1,11 @@
+// PR c++/115223
+// { dg-do compile { target c++11 } }
+// { dg-additional-options -flto }
+
+template
+void f() noexcept(bool(T() || true));
+
+void g(int n) { f(); }
+
+using type = void;
+type callDestructorIfNecessary() noexcept {}




Re: [PATCH] c++/modules: Prevent revealing a using-decl affecting cached overloads [PR114867]

2024-05-28 Thread Jason Merrill

On 5/26/24 09:01, Nathaniel Shead wrote:

Is this approach OK?  Alternatively I suppose we could do a deep copy of
the overload list when this occurs to ensure we don't affect existing
referents, would that be preferable?


This strategy makes sense, but I have other concerns:


Bootstrapped and regtested (so far just modules.exp) on
x86_64-pc-linux-gnu, OK for trunk if full regtest succeeds?

-- >8 --

Doing 'remove_node' here is not safe, because it not only mutates the
OVERLOAD we're walking over but potentially any other references to this
OVERLOAD that are cached from phase-1 template lookup.  This causes the
attached testcase to fail because the overload set in X::test no longer
contains the 'ns::foo' template once instantiated at the end of the


It looks like ns::foo has been renamed to just f in the testcase.


file.

This patch works around this by simply not removing the old declaration.
This does make the overload list potentially longer than it otherwise
would have been, but only when re-exporting the same set of functions in
a using-decl.  Additionally, because 'ovl_insert' always prepends these
newly inserted overloads, repeated exported using-decls won't continue
to add declarations, as the first exported using-decl will be found
before the original (unexported) declaration.
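Why in-place removal is unsafe when other code holds references to the same chain can be shown with a persistent singly linked list. This is only an analogy for GCC's OVERLOAD nodes: `node`, `insert`, `remove_in_place` and the other helpers are hypothetical, with `insert` playing the role the description gives to ovl_insert (prepending without touching existing nodes):

```cpp
#include <cassert>
#include <memory>
#include <string>

// A shared singly linked list standing in for an OVERLOAD chain.
struct node
{
  std::string decl;
  bool exported;
  std::shared_ptr<node> next;
};
using ovl = std::shared_ptr<node>;

// Prepend a declaration; existing nodes are left untouched.
ovl insert (const ovl &head, const std::string &decl, bool exported)
{
  return std::make_shared<node> (node{decl, exported, head});
}

// Destructively unlink the first node matching DECL.  Anyone else holding
// a pointer into the same chain observes the change.
void remove_in_place (ovl &head, const std::string &decl)
{
  for (ovl *link = &head; *link; link = &(*link)->next)
    if ((*link)->decl == decl)
      {
        ovl rest = (*link)->next;
        *link = rest;
        return;
      }
}

// Return the first node matching DECL, as a lookup would find it.
const node *find_first (const ovl &head, const std::string &decl)
{
  for (const node *n = head.get (); n; n = n->next.get ())
    if (n->decl == decl)
      return n;
  return nullptr;
}

bool contains (const ovl &head, const std::string &decl)
{
  return find_first (head, decl) != nullptr;
}
```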

PR c++/114867

gcc/cp/ChangeLog:

* name-lookup.cc (do_nonmember_using_decl): Don't remove the
existing overload.

gcc/testsuite/ChangeLog:

* g++.dg/modules/using-17_a.C: New test.
* g++.dg/modules/using-17_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/name-lookup.cc | 24 +++---
  gcc/testsuite/g++.dg/modules/using-17_a.C | 31 +++
  gcc/testsuite/g++.dg/modules/using-17_b.C | 13 ++
  3 files changed, 53 insertions(+), 15 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/using-17_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/using-17_b.C

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index f1f8c19feb1..130a0e6b5db 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -5231,25 +5231,19 @@ do_nonmember_using_decl (name_lookup &lookup, bool fn_scope_p,
  
  	  if (new_fn == old_fn)
{
- /* The function already exists in the current
-namespace.  We will still want to insert it if
-it is revealing a not-revealed thing.  */
+ /* The function already exists in the current namespace.  */
  found = true;
- if (!revealing_p)
-   ;
- else if (old.using_p ())
+ if (exporting)
{
- if (exporting)
+ if (old.using_p ())
/* Update in place.  'tis ok.  */
OVL_EXPORT_P (old.get_using ()) = true;
- ;
-   }
- else if (DECL_MODULE_EXPORT_P (new_fn))
-   ;
- else
-   {
- value = old.remove_node (value);
- found = false;
+ else if (!DECL_MODULE_EXPORT_P (new_fn))
+   /* We need to re-insert this function as an exported
+  declaration.  We can't remove the existing decl
+  because that will change any overloads cached in
+  template functions.  */
+   found = false;


What if we're revealing without exporting?  That is, a using-declaration 
in module purview that isn't exported?  Such a declaration should still 
prevent discarding, which is my understanding of the use of "revealing" 
here.


It seems like the current code already gets that wrong for e.g.

M_1.C:
module;
 struct A {};
 inline int f() { return 42; }
export module M;
 using ::A;
 using ::f;

M_2.C:
import M;
 inline int f();
 struct A a; // { dg-bogus "incomplete" }
int main() {
  return f(); // { dg-bogus "undefined" }
}

It looks like part of the problem is that add_binding_entity is only 
interested in exported usings, but I think it should also handle 
revealing ones.


Jason



[COMMITTED] tree-optimization/115221 - Do not invoke SCEV if it will use a different range query.

2024-05-28 Thread Andrew MacLeod
The original patch causing the PR made ranger's cache re-entrant to 
enable SCEV to use the current range_query when called from within ranger.


SCEV uses the currently active range query (via get_range_query()) for 
picking up values.  fold_using_range is the general-purpose stmt folder 
many components use, and it takes a range_query to use for folding.  
When propagating values in the cache, we need to ensure no new queries 
are invoked, and when the cache is propagating and calculating outgoing 
edges, it switches to a read-only range_query which uses what it knows 
about global values to come up with the best result using the current state.


SCEV is unaware of what the caller is using for a range_query, so when 
attempting to fold a PHI node, it re-invokes the current query 
during propagation, which is undesired behavior.  This patch tells 
fold_using_range not to use SCEV if the range_query being used is not 
the same as the one SCEV is going to use.
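The guard can be sketched as a small C model, assuming a range query can be reduced to an opaque object compared by identity. current_query stands in for get_range_query (cfun), scev_range for range_of_var_in_loop, and all names are hypothetical. The call counter shows that SCEV is never reached when the cache's read-only query is the one passed in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Opaque stand-in for a range_query object.  */
struct range_query { const char *name; };

/* Stand-in for get_range_query (cfun): the query SCEV always consults.  */
static struct range_query *current_query;

/* Number of times the SCEV stand-in has been entered.  */
static int scev_calls;

/* Stand-in for range_of_var_in_loop: ignores the caller's query and
   uses CURRENT_QUERY, which is the behaviour the patch guards against.  */
static bool
scev_range (struct range_query *q)
{
  (void) q;
  scev_calls++;
  return current_query != NULL;
}

/* Mirrors the patched condition: skip SCEV unless the caller's query is
   the one SCEV will use; returning false models r.set_varying ().  */
static bool
range_with_loop_info (struct range_query *src_query)
{
  if (src_query != current_query || !scev_range (src_query))
    return false;
  return true;
}
```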


Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed.

Andrew
From b814e390e7c87c14ce8d9cdea6c6cd127a4e6261 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Mon, 27 May 2024 11:00:57 -0400
Subject: [PATCH] Do not invoke SCEV if it will use a different range query.

SCEV always uses the current range_query object.
Ranger's cache uses a global value_query when propagating cache values to
avoid re-invoking ranger during simple value propagations.
When folding a PHI value, SCEV can be invoked, and since it always uses
the current range_query object, when ranger is active this causes the
undesired re-invoking of ranger during cache propagation.

This patch checks to see if the fold_using_range specified range_query
object is the same as the one SCEV uses, and does not invoke SCEV if
they do not match.

	PR tree-optimization/115221
	gcc/
	* gimple-range-fold.cc (range_of_ssa_name_with_loop_info): Do
	not invoke SCEV if the range_queries do not match.
	gcc/testsuite/
	* gcc.dg/pr115221.c: New.
---
 gcc/gimple-range-fold.cc|  6 +-
 gcc/testsuite/gcc.dg/pr115221.c | 29 +
 2 files changed, 34 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr115221.c

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index b3965b5ee50..98a4877ba18 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -1264,7 +1264,11 @@ fold_using_range::range_of_ssa_name_with_loop_info (vrange &r, tree name,
 		fur_source &src)
 {
   gcc_checking_assert (TREE_CODE (name) == SSA_NAME);
-  if (!range_of_var_in_loop (r, name, l, phi, src.query ()))
+  // SCEV currently invokes get_range_query () for values.  If the query
+  // being passed in is not the same one SCEV will use, do not invoke SCEV.
+  // This can be removed if/when SCEV uses a passed-in range-query.
+  if (src.query () != get_range_query (cfun)
+  || !range_of_var_in_loop (r, name, l, phi, src.query ()))
 r.set_varying (TREE_TYPE (name));
 }
 
diff --git a/gcc/testsuite/gcc.dg/pr115221.c b/gcc/testsuite/gcc.dg/pr115221.c
new file mode 100644
index 000..f139394e5c0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr115221.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef unsigned uint32_t;
+int cde40_t;
+int offset;
+void aal_test_bit();
+uint32_t cde40_key_pol();
+long cde40_offset_check(uint32_t pos) {
+  cde40_key_pol();
+  if (cde40_t)
+return (offset - 2) % (((pos == 3) ? 18 : 26)) != 0;
+  return 0;
+}
+void cde40_check_struct() {
+  uint32_t i, j, to_compare;
+  for (;; i++) {
+cde40_offset_check(i);
+if (to_compare == 0) {
+  if (i && cde40_key_pol())
+	;
+  to_compare = i;
+  continue;
+}
+j = to_compare;
+for (; j < i; j++)
+  aal_test_bit();
+  }
+}
-- 
2.41.0



[COMMITTED] Strlen pass should set current range query.

2024-05-28 Thread Andrew MacLeod

Thanks.

Committed with the change to the testcase.

Bootstraps on x86_64-pc-linux-gnu with no regressions.

Andrew



On 5/28/24 02:49, Richard Biener wrote:

On Tue, May 28, 2024 at 1:24 AM Andrew MacLeod  wrote:

The strlen pass currently has a local ranger instance, but when it
invokes SCEV or any other shared component, SCEV will not be able to
access this ranger, as it uses get_range_query().  They will be stuck
with global ranges.

Enable/disable ranger should be used instead of a local instance, which
allows other components to use the current range_query.
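The difference between a pass-local ranger and enable/disable ranger can be sketched with a hypothetical "active query" slot in C. enable_query, disable_query and get_query are invented stand-ins for enable_ranger, disable_ranger and get_range_query; the point is only that a shared component sees whatever the slot holds, never a pass's private object.

```c
#include <assert.h>
#include <stdlib.h>

/* precise = 1 models a ranger, 0 models plain global ranges.  */
struct range_query { int precise; };

static struct range_query global_ranges = { 0 };
static struct range_query *active_query = &global_ranges;

/* Stand-in for get_range_query (fun).  */
static struct range_query *
get_query (void)
{
  return active_query;
}

/* Stand-in for enable_ranger: install a ranger as the active query.  */
static struct range_query *
enable_query (void)
{
  struct range_query *r = malloc (sizeof *r);
  assert (r != NULL);
  r->precise = 1;
  active_query = r;
  return r;
}

/* Stand-in for disable_ranger: drop it and fall back to global ranges.  */
static void
disable_query (void)
{
  if (active_query != &global_ranges)
    free (active_query);
  active_query = &global_ranges;
}

/* A shared component (think SCEV) that only consults the active query.  */
static int
shared_component_is_precise (void)
{
  return get_query ()->precise;
}
```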

Bootstraps on x86_64-pc-linux-gnu, but there is one regression. The
regression is from gcc.dg/Wstringop-overflow-10.c.  The function in
question:

void
baz (char *a)
{
  char b[16] = "abcdefg";
  __builtin_strncpy (a, b, __builtin_strnlen (b, 7));	/* { dg-bogus "specified bound depends on the length of the source argument" } */
}

When compiled with -O2 -Wstringop-overflow -Wstringop-truncation,

it now spits out:

b2.c: In function ‘baz’:
b2.c:24:3: warning: ‘__builtin_strncpy’ output 2 truncated before
terminating nul copying  bytes from a string of the same length
[-Wstringop-truncation]
 24 |   __builtin_strncpy (a, b, __builtin_strnlen (b, 7));   /* {
dg-bogus "specified bound depends on the length of the source argument" } */

It seems like maybe something got smarter by setting the current range
query, and this is a legitimate warning for this line of code?  Indeed,
no NUL will be copied, as there are 7 characters in the string...

Is this a testcase issue where this warning should have been issued
before, or am I misunderstanding the warning?
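A runnable C sketch of what baz () does at run time supports reading the warning as legitimate: strnlen (b, 7) is 7, so strncpy copies exactly the seven characters and never writes the terminating NUL. The helper name baz_copies_nul is hypothetical; it fills the destination with sentinel bytes so the missing NUL is observable.

```c
#define _POSIX_C_SOURCE 200809L /* strnlen is POSIX, not ISO C.  */
#include <assert.h>
#include <string.h>

/* Copy like baz () into OUT and report whether a NUL landed within the
   first n + 1 bytes.  OUT is pre-filled with 'X' sentinels.  */
static int
baz_copies_nul (char out[16])
{
  char b[16] = "abcdefg";
  memset (out, 'X', 16);
  size_t n = strnlen (b, 7);  /* 7: b holds exactly 7 characters.  */
  strncpy (out, b, n);        /* Copies n bytes, no terminator.  */
  return memchr (out, '\0', n + 1) != NULL;
}
```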

I think the warning makes sense in this case.  But I'm not sure why the
dg-bogus is there; that looks like a valid complaint as well?!

I think the patch is OK.

Richard.


Andrew

PS: I'm afraid of adjusting the status quo in this pass... :-P  Not
allowing SCEV to access the current ranger is causing me other issues
with the fix for 115221.  This *should* have been a harmless change,
sigh. :-(  The whole mechanism should just use the current range-query
instead of passing a ranger pointer around. But that's a much bigger
issue.  One thing at a time.


From c43236cb59e11cadda2654edc117d9270dff75c6 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Mon, 27 May 2024 13:20:13 -0400
Subject: [PATCH 1/5] Strlen pass should set current range query.

The strlen pass currently has a local ranger instance, but when it
invokes SCEV, SCEV will not be able to access this ranger.

Enable/disable ranger should be used, allowing other components to use
the current range_query.

	gcc/
	* tree-ssa-strlen.cc (strlen_pass::strlen_pass): Add function
	pointer and initialize ptr_qry with current range_query.
	(strlen_pass::m_ranger): Remove.
	(printf_strlen_execute): Enable and disable ranger.
	gcc/testsuite/
	* gcc.dg/Wstringop-overflow-10.c: Add truncating warning.
---
 gcc/testsuite/gcc.dg/Wstringop-overflow-10.c |  2 +-
 gcc/tree-ssa-strlen.cc   | 10 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow-10.c b/gcc/testsuite/gcc.dg/Wstringop-overflow-10.c
index bace08ad5d3..ddc27fc0580 100644
--- a/gcc/testsuite/gcc.dg/Wstringop-overflow-10.c
+++ b/gcc/testsuite/gcc.dg/Wstringop-overflow-10.c
@@ -21,7 +21,7 @@ void
 baz (char *a)
 {
   char b[16] = "abcdefg";
-  __builtin_strncpy (a, b, __builtin_strnlen (b, 7));	/* { dg-bogus "specified bound depends on the length of the source argument" } */
+  __builtin_strncpy (a, b, __builtin_strnlen (b, 7));	/* { dg-warning "output truncated before terminating nul" } */
 }
 
 void fill (char *);
diff --git a/gcc/tree-ssa-strlen.cc b/gcc/tree-ssa-strlen.cc
index 7596dd80942..c43a2da2836 100644
--- a/gcc/tree-ssa-strlen.cc
+++ b/gcc/tree-ssa-strlen.cc
@@ -235,9 +235,9 @@ get_range (tree val, gimple *stmt, wide_int minmax[2],
 class strlen_pass : public dom_walker
 {
 public:
-  strlen_pass (cdi_direction direction)
+  strlen_pass (function *fun, cdi_direction direction)
 : dom_walker (direction),
-  ptr_qry (&m_ranger),
+  ptr_qry (get_range_query (fun)),
   m_cleanup_cfg (false)
   {
   }
@@ -299,8 +299,6 @@ public:
 			unsigned HOST_WIDE_INT lenrng[2],
 			unsigned HOST_WIDE_INT *size, bool *nulterm);
 
-  gimple_ranger m_ranger;
-
   /* A pointer_query object to store information about pointers and
  their targets in.  */
   pointer_query ptr_qry;
@@ -5912,9 +5910,10 @@ printf_strlen_execute (function *fun, bool warn_only)
   ssa_ver_to_stridx.safe_grow_cleared (num_ssa_names, true);
   max_stridx = 1;
 
+  enable_ranger (fun);
   /* String length optimization is implemented as a walk of the dominator
  tree and a forward walk of statements within each block.  */
-  strlen_pass walker (CDI_DOMINATORS);
+  strlen_pass walker (fun, CDI_DOMINATORS);
   walker.walk (ENTRY_BLOCK_PTR_FOR_FN (fun));
 
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -5939,6 +5

Re: [PATCH v2 1/2] driver: Use -as/ld/objcopy as final fallback instead of native ones for cross

2024-05-28 Thread YunQiang Su
YunQiang Su  wrote on Wed, 22 May 2024 at 17:54:
>
> If `find_a_program` cannot find `as/ld/objcopy` and we are a cross toolchain,
> the final fallback is the system's `as/ld`.  In fact, we can try
> -as/ld/objcopy before falling back to the native as/ld/objcopy.
>
> This patch is derived from Debian's patch:
>   gcc-search-prefixed-as-ld.diff
>
> gcc
> * gcc.cc (execute): Look for -as/ld/objcopy before falling back
> to native as/ld/objcopy.

ping. OK for the trunk?

> ---
>  gcc/gcc.cc | 20 
>  1 file changed, 20 insertions(+)
>
> diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> index 830a4700a87..3dc6348d761 100644
> --- a/gcc/gcc.cc
> +++ b/gcc/gcc.cc
> @@ -3293,6 +3293,26 @@ execute (void)
>string = find_a_program(commands[0].prog);
>if (string)
> commands[0].argv[0] = string;
> +  else if (*cross_compile != '0'
> +   && !strcmp (commands[0].argv[0], commands[0].prog)
> +   && (!strcmp (commands[0].prog, "as")
> +   || !strcmp (commands[0].prog, "ld")
> +   || !strcmp (commands[0].prog, "objcopy")))
> +   {
> + string = concat (DEFAULT_REAL_TARGET_MACHINE, "-",
> +   commands[0].prog, NULL);
> + const char *string_args[] = {string, "--version", NULL};
> + int exit_status = 0;
> + int err = 0;
> + const char *errmsg = pex_one (PEX_SEARCH, string,
> + CONST_CAST (char **, string_args), string,
> + NULL, NULL, &exit_status, &err);
> + if (errmsg == NULL && exit_status == 0 && err == 0)
> +   {
> + commands[0].argv[0] = string;
> + commands[0].prog = string;
> +   }
> +   }
>  }
>
>for (n_commands = 1, i = 0; argbuf.iterate (i, &arg); i++)
> --
> 2.39.2
>
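The fallback order can be sketched in plain C as a simplified model. choose_tool, probe_fn and the two probes are hypothetical; the probe stands in for the patch's pex_one (PEX_SEARCH, ...) call that runs the candidate with --version, and the patch's extra check that argv[0] still equals prog is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns true if PROG can be run (the patch probes with "--version").  */
typedef bool (*probe_fn) (const char *prog);

/* Trivial probes for trying the logic out.  */
static bool probe_found (const char *prog) { (void) prog; return true; }
static bool probe_missing (const char *prog) { (void) prog; return false; }

/* Only these tools get the triple-prefixed fallback.  */
static bool
is_fallback_tool (const char *prog)
{
  return !strcmp (prog, "as") || !strcmp (prog, "ld")
         || !strcmp (prog, "objcopy");
}

/* Pick the program to run once find_a_program has found nothing.
   The choice is written into BUF, which is returned.  */
static const char *
choose_tool (const char *prog, const char *triple, bool cross,
             probe_fn probe, char *buf, size_t bufsize)
{
  if (cross && is_fallback_tool (prog))
    {
      snprintf (buf, bufsize, "%s-%s", triple, prog);
      if (probe (buf))
        return buf;                     /* Triple-prefixed tool works.  */
    }
  snprintf (buf, bufsize, "%s", prog);  /* Native fallback.  */
  return buf;
}
```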


[PATCH] MIPS16: Mark $2/$3 as clobbered if GP is used

2024-05-28 Thread YunQiang Su
PR target/84790.
The gp init sequence
    li      $2,%hi(_gp_disp)
    addiu   $3,$pc,%lo(_gp_disp)
    sll     $2,16
    addu    $2,$3
is generated directly in `mips_output_function_prologue`, and does
not appear in the RTL.
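The arithmetic of that sequence can be checked with a small C sketch. It assumes the usual MIPS convention that %hi is rounded so that adding the sign-extended %lo (as addiu does) rebuilds the full 32-bit value; the pc is modelled as a plain base value and the function names are hypothetical. What matters for this patch is that the computation needs two scratch registers, $2 and $3, that never appear in the RTL.

```c
#include <assert.h>
#include <stdint.h>

/* %hi: upper half, rounded to compensate for the sign-extended %lo.  */
static uint16_t
hi16 (uint32_t x)
{
  return (uint16_t) ((x + 0x8000u) >> 16);
}

/* %lo: low 16 bits, interpreted as signed (addiu sign-extends).  */
static int32_t
lo16 (uint32_t x)
{
  int32_t lo = (int32_t) (x & 0xffffu);
  return lo >= 0x8000 ? lo - 0x10000 : lo;
}

/* Replay the four instructions with BASE standing in for $pc.  */
static uint32_t
rebuild (uint32_t base, uint32_t x)
{
  uint32_t r2 = hi16 (x);                     /* li    $2,%hi(x)     */
  uint32_t r3 = base + (uint32_t) lo16 (x);   /* addiu $3,$pc,%lo(x) */
  r2 <<= 16;                                  /* sll   $2,16         */
  return r2 + r3;                             /* addu  $2,$3         */
}
```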

As a result, the IRA/IPA passes are not aware that $2/$3 have been
clobbered, and may keep values in them across (local) function calls.

Let's mark $2/$3 as clobbered in both places:
  - Just after the UNSPEC_GP RTL of a function;
  - Just after a function call.

Reported-by: Matthias Schiffer 
Origin-Patch-by: Felix Fietkau .

gcc
* config/mips/mips.cc (mips16_gp_pseudo_reg): Mark
MIPS16_PIC_TEMP and MIPS_PROLOGUE_TEMP clobbered.
(mips_emit_call_insn): Mark MIPS16_PIC_TEMP and
MIPS_PROLOGUE_TEMP clobbered if MIPS16 and CALL_CLOBBERED_GP.
---
 gcc/config/mips/mips.cc | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index b63d40a357b..b478cddc8ad 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -3233,6 +3233,9 @@ mips_emit_call_insn (rtx pattern, rtx orig_addr, rtx 
addr, bool lazy_p)
 {
   rtx post_call_tmp_reg = gen_rtx_REG (word_mode, POST_CALL_TMP_REG);
   clobber_reg (&CALL_INSN_FUNCTION_USAGE (insn), post_call_tmp_reg);
+  clobber_reg (&CALL_INSN_FUNCTION_USAGE (insn), MIPS16_PIC_TEMP);
+  clobber_reg (&CALL_INSN_FUNCTION_USAGE (insn),
+   MIPS_PROLOGUE_TEMP (word_mode));
 }
 
   return insn;
@@ -3329,7 +3332,13 @@ mips16_gp_pseudo_reg (void)
   rtx set = gen_load_const_gp (cfun->machine->mips16_gp_pseudo_rtx);
   rtx_insn *insn = emit_insn_after (set, scan);
   INSN_LOCATION (insn) = 0;
-
+  /* NewABI support hasn't been implemented.  NewABI should generate RTL
+sequence instead of ASM sequence directly.  */
+  if (mips_current_loadgp_style () == LOADGP_OLDABI)
+   {
+ emit_clobber (MIPS16_PIC_TEMP);
+ emit_clobber (MIPS_PROLOGUE_TEMP (Pmode));
+   }
   pop_topmost_sequence ();
 }
 
-- 
2.39.2



Re: [Patch] testsuite/*/gomp: Remove 'dg-prune-output "not supported yet"'

2024-05-28 Thread Jakub Jelinek
On Tue, May 28, 2024 at 07:43:00PM +0200, Tobias Burnus wrote:
> Improve test coverage by removing 'prune-output' now that the features
> have been implemented.
> 
> Comments, suggestions? Otherwise I will commit the patch as obvious.
> 
> Tobias

> testsuite/*/gomp: Remove 'dg-prune-output "not supported yet"'
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/gomp/lastprivate-conditional-1.c: Remove
>   '{ dg-prune-output "not supported yet" }'.
>   * c-c++-common/gomp/requires-1.c: Likewise.
>   * c-c++-common/gomp/requires-2.c: Likewise.
>   * c-c++-common/gomp/reverse-offload-1.c: Likewise.
>   * g++.dg/gomp/requires-1.C: Likewise.
>   * gfortran.dg/gomp/requires-1.f90: Likewise.
>   * gfortran.dg/gomp/requires-2.f90: Likewise.
>   * gfortran.dg/gomp/requires-4.f90: Likewise.
>   * gfortran.dg/gomp/requires-5.f90: Likewise.
>   * gfortran.dg/gomp/requires-6.f90: Likewise.
>   * gfortran.dg/gomp/requires-7.f90: Likewise.

LGTM.

Jakub



[Patch] testsuite/*/gomp: Remove 'dg-prune-output "not supported yet"'

2024-05-28 Thread Tobias Burnus
Improve test coverage by removing 'prune-output' now that the features 
have been implemented.


Comments, suggestions? Otherwise I will commit the patch as obvious.

Tobias
testsuite/*/gomp: Remove 'dg-prune-output "not supported yet"'

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/lastprivate-conditional-1.c: Remove
	'{ dg-prune-output "not supported yet" }'.
	* c-c++-common/gomp/requires-1.c: Likewise.
	* c-c++-common/gomp/requires-2.c: Likewise.
	* c-c++-common/gomp/reverse-offload-1.c: Likewise.
	* g++.dg/gomp/requires-1.C: Likewise.
	* gfortran.dg/gomp/requires-1.f90: Likewise.
	* gfortran.dg/gomp/requires-2.f90: Likewise.
	* gfortran.dg/gomp/requires-4.f90: Likewise.
	* gfortran.dg/gomp/requires-5.f90: Likewise.
	* gfortran.dg/gomp/requires-6.f90: Likewise.
	* gfortran.dg/gomp/requires-7.f90: Likewise.

 gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c | 2 --
 gcc/testsuite/c-c++-common/gomp/requires-1.c| 2 --
 gcc/testsuite/c-c++-common/gomp/requires-2.c| 2 --
 gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c | 2 --
 gcc/testsuite/g++.dg/gomp/requires-1.C  | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-1.f90   | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-2.f90   | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-4.f90   | 1 -
 gcc/testsuite/gfortran.dg/gomp/requires-5.f90   | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-6.f90   | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-7.f90   | 1 -
 11 files changed, 20 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c b/gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c
index 722aba79a52..d4ef49690e8 100644
--- a/gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c
@@ -63,2 +62,0 @@ bar (int *p)
-
-/* { dg-prune-output "not supported yet" } */
diff --git a/gcc/testsuite/c-c++-common/gomp/requires-1.c b/gcc/testsuite/c-c++-common/gomp/requires-1.c
index e1f2e3a503f..a47ec659566 100644
--- a/gcc/testsuite/c-c++-common/gomp/requires-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/requires-1.c
@@ -13,2 +12,0 @@ foo ()
-
-/* { dg-prune-output "not supported yet" } */
diff --git a/gcc/testsuite/c-c++-common/gomp/requires-2.c b/gcc/testsuite/c-c++-common/gomp/requires-2.c
index 717b65caeea..d7430b1b1a4 100644
--- a/gcc/testsuite/c-c++-common/gomp/requires-2.c
+++ b/gcc/testsuite/c-c++-common/gomp/requires-2.c
@@ -9,2 +8,0 @@
-
-/* { dg-prune-output "not supported yet" } */
diff --git a/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c b/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c
index 9a3fa5230f8..ddc3c2c6be1 100644
--- a/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c
@@ -9,2 +8,0 @@
-/* { dg-prune-output "'reverse_offload' clause on 'requires' directive not supported yet" } */
-
diff --git a/gcc/testsuite/g++.dg/gomp/requires-1.C b/gcc/testsuite/g++.dg/gomp/requires-1.C
index aefeb288dad..5ca5e006da1 100644
--- a/gcc/testsuite/g++.dg/gomp/requires-1.C
+++ b/gcc/testsuite/g++.dg/gomp/requires-1.C
@@ -11,2 +10,0 @@ namespace M {
-
-/* { dg-prune-output "not supported yet" } */
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-1.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-1.f90
index b115a654e71..19007834c45 100644
--- a/gcc/testsuite/gfortran.dg/gomp/requires-1.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/requires-1.f90
@@ -12,2 +11,0 @@ end
-
-! { dg-prune-output "not yet supported" }
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-2.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-2.f90
index 5f11a7bfb2a..f144d391034 100644
--- a/gcc/testsuite/gfortran.dg/gomp/requires-2.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/requires-2.f90
@@ -13,2 +12,0 @@ end
-
-! { dg-prune-output "not yet supported" }
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-4.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-4.f90
index c870a2840d3..9d936197f8f 100644
--- a/gcc/testsuite/gfortran.dg/gomp/requires-4.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/requires-4.f90
@@ -36 +35,0 @@ end
-! { dg-prune-output "not yet supported" }
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-5.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-5.f90
index e719e929294..87be933ba49 100644
--- a/gcc/testsuite/gfortran.dg/gomp/requires-5.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/requires-5.f90
@@ -15,2 +14,0 @@ end
-
-! { dg-prune-output "not yet supported" }
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-6.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-6.f90
index cabd3d94a90..b20c218dd6b 100644
--- a/gcc/testsuite/gfortran.dg/gomp/requires-6.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/requires-6.f90
@@ -15,2 +14,0 @@ end
-
-! { dg-prune-output "not yet supported" }
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-7.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-7

PING: Re: [PATCH] selftest: invoke "diff" when ASSERT_STREQ fails

2024-05-28 Thread David Malcolm
Ping.

This patch has actually been *very* helpful to me when debugging
selftest failures involving ASSERT_STREQ.

Thanks
Dave

On Fri, 2024-05-17 at 15:51 -0400, David Malcolm wrote:
> Currently when ASSERT_STREQ or ASSERT_STREQ_AT fail we print
> both strings to stderr.  However it can be hard to figure out
> the problem (e.g. for 1-character differences in long strings).
> 
> Extend the output by writing out the strings to tempfiles and
> invoking "diff -up" on them when we have such a selftest failure,
> to (I hope) simplify debugging.
> 
> Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
> 
> OK for trunk?
> 
> gcc/ChangeLog:
> * selftest.cc (selftest::print_diff): New function.
> (selftest::assert_streq): Call it when we have non-equal
> non-null strings.
> 
> Signed-off-by: David Malcolm 
> ---
>  gcc/selftest.cc | 28 ++--
>  1 file changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/selftest.cc b/gcc/selftest.cc
> index 6438d86a6aa0..f58c0631908e 100644
> --- a/gcc/selftest.cc
> +++ b/gcc/selftest.cc
> @@ -63,6 +63,26 @@ fail_formatted (const location &loc, const char *fmt, ...)
>    abort ();
>  }
>  
> +/* Invoke "diff" to print the difference between VAL1 and VAL2
> +   on stdout.  */
> +
> +static void
> +print_diff (const location &loc, const char *val1, const char *val2)
> +{
> +  temp_source_file tmpfile1 (loc, ".txt", val1);
> +  temp_source_file tmpfile2 (loc, ".txt", val2);
> +  const char *args[] = {"diff",
> +   "-up",
> +   tmpfile1.get_filename (),
> +   tmpfile2.get_filename (),
> +   NULL};
> +  int exit_status = 0;
> +  int err = 0;
> +  pex_one (PEX_SEARCH | PEX_LAST,
> +  args[0], CONST_CAST (char **, args),
> +  NULL, NULL, NULL, &exit_status, &err);
> +}
> +
>  /* Implementation detail of ASSERT_STREQ.
>     Compare val1 and val2 with strcmp.  They ought
>     to be non-NULL; fail gracefully if either or both are NULL.  */
> @@ -89,8 +109,12 @@ assert_streq (const location &loc,
> if (strcmp (val1, val2) == 0)
>   pass (loc, "ASSERT_STREQ");
> else
> - fail_formatted (loc, "ASSERT_STREQ (%s, %s)\n val1=\"%s\"\n val2=\"%s\"\n",
> - desc_val1, desc_val2, val1, val2);
> + {
> +   print_diff (loc, val1, val2);
> +   fail_formatted
> + (loc, "ASSERT_STREQ (%s, %s)\n val1=\"%s\"\n val2=\"%s\"\n",
> +  desc_val1, desc_val2, val1, val2);
> + }
>    }
>  }
>  
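As a lighter-weight complement to running diff (not what the patch does), a failing string comparison could also report the offset of the first differing byte, which pinpoints one-character differences in long strings. A minimal sketch; first_difference is a hypothetical helper, not part of selftest.cc.

```c
#include <assert.h>
#include <stddef.h>

/* Return the offset of the first byte at which A and B differ, or
   (size_t) -1 if the strings are equal.  A shorter string differs from
   a longer one at the shorter string's terminating NUL.  */
static size_t
first_difference (const char *a, const char *b)
{
  size_t i = 0;
  while (a[i] == b[i])
    {
      if (a[i] == '\0')
        return (size_t) -1;
      i++;
    }
  return i;
}
```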



[pushed] diagnostics: disable localization of events in selftest paths [PR115203]

2024-05-28 Thread David Malcolm
Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-866-g2dbb1c124c1e58.

gcc/ChangeLog:
PR analyzer/115203
* diagnostic-path.h
(simple_diagnostic_path::disable_event_localization): New.
(simple_diagnostic_path::m_localize_events): New field.
* diagnostic.cc
(simple_diagnostic_path::simple_diagnostic_path): Initialize
m_localize_events.
(simple_diagnostic_path::add_event): Only localize fmt if
m_localize_events is true.
* tree-diagnostic-path.cc
(test_diagnostic_path::test_diagnostic_path): Call
disable_event_localization.

Signed-off-by: David Malcolm 
---
 gcc/diagnostic-path.h   | 3 +++
 gcc/diagnostic.cc   | 8 +---
 gcc/tree-diagnostic-path.cc | 3 ++-
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/gcc/diagnostic-path.h b/gcc/diagnostic-path.h
index 982d68b872ea..938bd583a3da 100644
--- a/gcc/diagnostic-path.h
+++ b/gcc/diagnostic-path.h
@@ -293,12 +293,15 @@ class simple_diagnostic_path : public diagnostic_path
 
   void connect_to_next_event ();
 
+  void disable_event_localization () { m_localize_events = false; }
+
  private:
   auto_delete_vec m_threads;
   auto_delete_vec m_events;
 
   /* (for use by add_event).  */
   pretty_printer *m_event_pp;
+  bool m_localize_events;
 };
 
 extern void debug (diagnostic_path *path);
diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc
index 1f30d1d7cdac..f27b2f1a492c 100644
--- a/gcc/diagnostic.cc
+++ b/gcc/diagnostic.cc
@@ -2517,7 +2517,8 @@ set_text_art_charset (enum diagnostic_text_art_charset charset)
 /* class simple_diagnostic_path : public diagnostic_path.  */
 
 simple_diagnostic_path::simple_diagnostic_path (pretty_printer *event_pp)
-  : m_event_pp (event_pp)
+: m_event_pp (event_pp),
+  m_localize_events (true)
 {
   add_thread ("main");
 }
@@ -2563,7 +2564,7 @@ simple_diagnostic_path::add_thread (const char *name)
stack depth DEPTH.
 
Use m_context's printer to format FMT, as the text of the new
-   event.
+   event.  Localize FMT iff m_localize_events is set.
 
Return the id of the new event.  */
 
@@ -2580,7 +2581,8 @@ simple_diagnostic_path::add_event (location_t loc, tree fndecl, int depth,
 
   va_start (ap, fmt);
 
-  text_info ti (_(fmt), &ap, 0, nullptr, &rich_loc);
+  text_info ti (m_localize_events ? _(fmt) : fmt,
+   &ap, 0, nullptr, &rich_loc);
   pp_format (pp, &ti);
   pp_output_formatted_text (pp);
 
diff --git a/gcc/tree-diagnostic-path.cc b/gcc/tree-diagnostic-path.cc
index 743a8c2a1d29..0ad6c5beb81c 100644
--- a/gcc/tree-diagnostic-path.cc
+++ b/gcc/tree-diagnostic-path.cc
@@ -1016,7 +1016,7 @@ path_events_have_column_data_p (const diagnostic_path &path)
 }
 
 /* A subclass of simple_diagnostic_path that adds member functions
-   for adding test events.  */
+   for adding test events and suppresses translation of these events.  */
 
 class test_diagnostic_path : public simple_diagnostic_path
 {
@@ -1024,6 +1024,7 @@ class test_diagnostic_path : public simple_diagnostic_path
   test_diagnostic_path (pretty_printer *event_pp)
   : simple_diagnostic_path (event_pp)
   {
+disable_event_localization ();
   }
 
   void add_entry (tree fndecl, int stack_depth)
-- 
2.26.3
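The effect of the new flag can be sketched as a hypothetical C model: a path object carries a "localize events" boolean that defaults to true, and the format string handed to the printer is translated only while it is set. translate stands in for _() with a fake one-entry catalog, and all names are invented. Presumably the motivation is that selftests compare against the untranslated English text.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Fake catalog standing in for _(): one known message is "translated".  */
static const char *
translate (const char *msgid)
{
  if (strcmp (msgid, "entry to %qs") == 0)
    return "[xx] entry to %qs";
  return msgid;
}

struct event_path
{
  bool localize_events;   /* Models m_localize_events.  */
};

static void
path_init (struct event_path *p)
{
  p->localize_events = true;   /* Default: translate, as before.  */
}

static void
path_disable_localization (struct event_path *p)
{
  p->localize_events = false;
}

/* The format string the event would be printed with.  */
static const char *
event_format (const struct event_path *p, const char *fmt)
{
  return p->localize_events ? translate (fmt) : fmt;
}
```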



[pushed] Fix bootstrap on AIX by adding c-family/c-type-mismatch.cc [PR115167]

2024-05-28 Thread David Malcolm
PR bootstrap/115167 reports a bootstrap failure on AIX triggered by
r15-636-g770657d02c986c whilst building f951 in stage 2, due to
the linker not being able to find symbols for:

  vtable for range_label_for_type_mismatch
  range_label_for_type_mismatch::get_text(unsigned int) const

The only users of the class range_label_for_type_mismatch are in the
C/C++ frontends, each of which supply their own implementation of:

  range_label_for_type_mismatch::get_text(unsigned int) const

i.e. we had a cluster of symbols that was disconnected from any
users in f951.

The above patch added a new range_label::get_effects vfunc to the
base class.  My hunch is that we were getting away with not defining
the symbol for Fortran with AIX's linker before (since none of the
users are used), but adding the get_effects vfunc has somehow broken
things (possibly because there's an empty implementation in the base
class in the *header*).

The following patch moves all of the code in
gcc/gcc-rich-location.{cc,h,o} defining and using
range_label_for_type_mismatch to a new
gcc/c-family/c-type-mismatch.{cc,h,o}, to help the linker ignore this
cluster of symbols when it's disconnected from users.

I was able to reproduce the failure without the patch, and then
successfully bootstrap with this patch on powerpc-ibm-aix7.3.1.0
(cfarm119).

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Pushed to trunk as r15-865-gb544ff88560e10.

gcc/ChangeLog:
PR bootstrap/115167
* Makefile.in (C_COMMON_OBJS): Add c-family/c-type-mismatch.o.
* gcc-rich-location.cc
(maybe_range_label_for_tree_type_mismatch::get_text): Move to
c-family/c-type-mismatch.cc.
(binary_op_rich_location::binary_op_rich_location): Likewise.
(binary_op_rich_location::use_operator_loc_p): Likewise.
* gcc-rich-location.h (class range_label_for_type_mismatch):
Likewise.
(class maybe_range_label_for_tree_type_mismatch): Likewise.
(class op_location_t): Likewise for forward decl.
(class binary_op_rich_location): Likewise.

gcc/c-family/ChangeLog:
PR bootstrap/115167
* c-format.cc: Replace include of "gcc-rich-location.h" with
"c-family/c-type-mismatch.h".
* c-type-mismatch.cc: New file, taking material from
gcc-rich-location.cc.
* c-type-mismatch.h: New file, taking material from
gcc-rich-location.h.
* c-warn.cc: Replace include of "gcc-rich-location.h" with
"c-family/c-type-mismatch.h".

gcc/c/ChangeLog:
PR bootstrap/115167
* c-objc-common.cc: Replace include of "gcc-rich-location.h" with
"c-family/c-type-mismatch.h".
* c-typeck.cc: Likewise.

gcc/cp/ChangeLog:
PR bootstrap/115167
* call.cc: Replace include of "gcc-rich-location.h" with
"c-family/c-type-mismatch.h".
* error.cc: Likewise.
* typeck.cc: Likewise.

Signed-off-by: David Malcolm 
---
 gcc/Makefile.in |   3 +-
 gcc/c-family/c-format.cc|   2 +-
 gcc/c-family/c-type-mismatch.cc | 127 
 gcc/c-family/c-type-mismatch.h  | 126 +++
 gcc/c-family/c-warn.cc  |   2 +-
 gcc/c/c-objc-common.cc  |   2 +-
 gcc/c/c-typeck.cc   |   2 +-
 gcc/cp/call.cc  |   2 +-
 gcc/cp/error.cc |   2 +-
 gcc/cp/typeck.cc|   2 +-
 gcc/gcc-rich-location.cc|  89 --
 gcc/gcc-rich-location.h | 101 -
 12 files changed, 262 insertions(+), 198 deletions(-)
 create mode 100644 gcc/c-family/c-type-mismatch.cc
 create mode 100644 gcc/c-family/c-type-mismatch.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a7f15694c34b..66d42cc41f84 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1301,7 +1301,8 @@ C_COMMON_OBJS = c-family/c-common.o c-family/c-cppbuiltin.o c-family/c-dump.o \
   c-family/c-ppoutput.o c-family/c-pragma.o c-family/c-pretty-print.o \
   c-family/c-semantics.o c-family/c-ada-spec.o \
   c-family/c-ubsan.o c-family/known-headers.o \
-  c-family/c-attribs.o c-family/c-warn.o c-family/c-spellcheck.o
+  c-family/c-attribs.o c-family/c-warn.o c-family/c-spellcheck.o \
+  c-family/c-type-mismatch.o
 
 # Analyzer object files
 ANALYZER_OBJS = \
diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index 9c4deabc1095..7a5ffc25602c 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -35,7 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "selftest-diagnostic.h"
 #include "builtins.h"
 #include "attribs.h"
-#include "gcc-rich-location.h"
+#include "c-family/c-type-mismatch.h"
 
 /* Handle attributes associated with format checking.  */
 
diff --git a/gcc/c-family/c-type-mismatch.cc b/gcc/c-family/c-type-mismatch.cc
new file mode 100644
index ..fae31261d544
--- /dev/null
+++ b/gcc/c-family/c

Re: [PATCH] attribs: Fix and refactor diag_attr_exclusions

2024-05-28 Thread Richard Sandiford
Andrew Carlotti  writes:
> The existing implementation of this function was convoluted, and had
> multiple control flow errors that became apparent to me while reading
> the code:
>
> 1. The initial early return only checked the properties of the first
> exclusion in the list, when these properties could be different for
> subsequent exclusions.
>
> 2. excl was not reset within the outer loop, so the inner loop body
> would only execute during the first iteration of the outer loop.  This
> effectively meant that the value of attrs[1] was ignored.
>
> 3. The function called itself recursively twice, with both last_decl and
> TREE_TYPE (last_decl) as parameters. The second recursive call should
> have been redundant, since attrs[1] = TREE_TYPE (last_decl) during the
> first recursive call.

Thanks for doing this.  Agree with the above.

> This patch eliminates the early return, and combines the checks with
> those present within the inner loop.  It also fixes the inner loop
> initialisation, and modifies the outer loop to iterate over nodes
> instead of their attributes. This latter change allows the recursion to
> be eliminated, by extending the new nodes array to include last_decl
> (and its type) as well.
>
> This patch provides an alternative fix for PR114634, although I wasn't
> aware of that issue until rebasing on top of Jakub's fix.
>
> I am not aware of any other compiler bugs resulting from these issues.
> However, if the exclusions for target_clones were listed in the opposite
> order, then it would have broken detection of the always_inline
> exclusion on aarch64 (where TARGET_HAS_FMV_TARGET_ATTRIBUTE is false).
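The refactored loop shape can be sketched as a hypothetical C miniature: up to four nodes (the base node, its type, last_decl and its type) are gathered into one array, NULL slots are skipped, and the exclusion list is walked from the start for every node, which is exactly what point 2 says the old code failed to do. struct node and count_exclusions are invented; the real code uses trees, lookup_attribute, and the per-kind function/variable/type flags that this model ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Miniature "tree": up to 3 attribute names (NULL-terminated), an
   optional type, and whether the node is a decl.  */
struct node
{
  const char *attrs[4];
  struct node *type;
  int is_decl;
};

/* Count (node, excluded attribute) matches over BASE, its type,
   LAST_DECL and its type.  EXCLUDED is a NULL-terminated name list.  */
static int
count_exclusions (struct node *last_decl, struct node *base,
                  const char *const *excluded)
{
  struct node *nodes[4] = { NULL, NULL, NULL, NULL };
  nodes[0] = base;
  if (base->is_decl)
    nodes[1] = base->type;
  if (last_decl)
    {
      nodes[2] = last_decl;
      if (last_decl->is_decl)
        nodes[3] = last_decl->type;
    }

  int found = 0;
  for (size_t i = 0; i < 4; i++)
    {
      if (!nodes[i])
        continue;
      /* Restart the exclusion list for each node (the point 2 fix).  */
      for (const char *const *excl = excluded; *excl; excl++)
        for (size_t j = 0; j < 4 && nodes[i]->attrs[j]; j++)
          if (strcmp (nodes[i]->attrs[j], *excl) == 0)
            found++;
    }
  return found;
}
```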
>
> Is this ok for master?
>
> gcc/ChangeLog:
>
>   * attribs.cc (diag_attr_exclusions): Fix and refactor.
>
>
> diff --git a/gcc/attribs.cc b/gcc/attribs.cc
> index 3ab0b0fd87a4404a593b2de365ea5226e31fe24a..431dd4255e68e92dd8d10bbb21ea079e50811faa 100644
> --- a/gcc/attribs.cc
> +++ b/gcc/attribs.cc
> @@ -433,84 +433,69 @@ get_attribute_namespace (const_tree attr)
> or a TYPE.  */
>  
>  static bool
> -diag_attr_exclusions (tree last_decl, tree node, tree attrname,
> +diag_attr_exclusions (tree last_decl, tree base_node, tree attrname,
> const attribute_spec *spec)
>  {
> -  const attribute_spec::exclusions *excl = spec->exclude;
>  
> -  tree_code code = TREE_CODE (node);
> +  /* BASE_NODE is either the current decl to which the attribute is being
> + applied, or its type.  For the former, consider the attributes on both the
> + decl and its type.  Check both LAST_DECL and its type as well.  */
>  
> -  if ((code == FUNCTION_DECL && !excl->function
> -   && (!excl->type || !spec->affects_type_identity))
> -  || (code == VAR_DECL && !excl->variable
> -   && (!excl->type || !spec->affects_type_identity))
> -  || (((code == TYPE_DECL || RECORD_OR_UNION_TYPE_P (node)) && !excl->type)))
> -return false;
> +  tree nodes[4] = { NULL_TREE, NULL_TREE, NULL_TREE, NULL_TREE };
>  
> -  /* True if an attribute that's mutually exclusive with ATTRNAME
> - has been found.  */
> -  bool found = false;
> +  nodes[0] = base_node;
> +  if (DECL_P (base_node))
> +  nodes[1] = (TREE_TYPE (base_node));

Nit: too much indentation.

> -  if (last_decl && last_decl != node && TREE_TYPE (last_decl) != node)
> +  if (last_decl)
>  {
> -  /* Check both the last DECL and its type for conflicts with
> -  the attribute being added to the current decl or type.  */
> -  found |= diag_attr_exclusions (last_decl, last_decl, attrname, spec);
> -  tree decl_type = TREE_TYPE (last_decl);
> -  found |= diag_attr_exclusions (last_decl, decl_type, attrname, spec);
> +  nodes[2] = last_decl;
> +  if (DECL_P (last_decl))
> +   nodes[3] = TREE_TYPE (last_decl);
>  }
>  
> -  /* NODE is either the current DECL to which the attribute is being
> - applied or its TYPE.  For the former, consider the attributes on
> - both the DECL and its type.  */
> -  tree attrs[2];
> -
> -  if (DECL_P (node))
> -{
> -  attrs[0] = DECL_ATTRIBUTES (node);
> -  if (TREE_TYPE (node))
> - attrs[1] = TYPE_ATTRIBUTES (TREE_TYPE (node));
> -  else
> - /* TREE_TYPE can be NULL e.g. while processing attributes on
> -enumerators.  */
> - attrs[1] = NULL_TREE;
> -}
> -  else
> -{
> -  attrs[0] = TYPE_ATTRIBUTES (node);
> -  attrs[1] = NULL_TREE;
> -}
> +  /* True if an attribute that's mutually exclusive with ATTRNAME
> + has been found.  */
> +  bool found = false;
>  
>/* Iterate over the mutually exclusive attribute names and verify
>   that the symbol doesn't contain it.  */
> -  for (unsigned i = 0; i != ARRAY_SIZE (attrs); ++i)
> +  for (unsigned i = 0; i != ARRAY_SIZE (nodes); ++i)
>  {
> -  if (!attrs[i])
> +  tree node = nodes[i];
> +
> +  if (!node)
>   continue;
>  
> -  for ( ; excl->name; ++excl)
> +  tree attr;
> +  if DECL_P 
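
A minimal sketch of the flattened lookup that the refactored diag_attr_exclusions performs, assuming a hypothetical, heavily simplified node type (`struct node`, `has_attr` and `excluded_attr_present` are illustrative names, not GCC APIs). It only shows how a fixed four-entry array (base node, its type, LAST_DECL, its type) replaces the recursion; the real function also consults the function/variable/type exclusion flags and emits diagnostics, which are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a tree node: a NULL-terminated list of
   attribute names plus, for a declaration, a pointer to its type.  */
struct node
{
  const char *const *attrs;   /* NULL-terminated attribute names.  */
  struct node *type;          /* Non-NULL only for declarations.  */
};

static bool
has_attr (const struct node *n, const char *name)
{
  for (const char *const *a = n->attrs; a && *a; ++a)
    if (strcmp (*a, name) == 0)
      return true;
  return false;
}

/* Return true if EXCLUDED appears on BASE, BASE's type, LAST_DECL or
   LAST_DECL's type, checking a flat array instead of recursing.  */
static bool
excluded_attr_present (struct node *base, struct node *last_decl,
                       const char *excluded)
{
  struct node *nodes[4] = { NULL, NULL, NULL, NULL };

  nodes[0] = base;
  nodes[1] = base ? base->type : NULL;
  nodes[2] = last_decl;
  nodes[3] = last_decl ? last_decl->type : NULL;

  for (size_t i = 0; i != sizeof nodes / sizeof nodes[0]; ++i)
    if (nodes[i] && has_attr (nodes[i], excluded))
      return true;
  return false;
}
```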

Re: [PATCH] vect: Unify bbs in loop_vec_info and bb_vec_info

2024-05-28 Thread Feng Xue OS
Because the bbs of loop_vec_info need to be allocated via the old-fashioned
XCNEWVEC in order to receive the result from dfs_enumerate_from(),
bb_vec_info has to be aligned with loop_vec_info and use
basic_block * instead of vec.  Another reason is that
some loop vectorization code assumes that bbs is a pointer, for
example using LOOP_VINFO_BBS() to directly free the bbs area.

Encapsulating bbs in an array_slice would make the changed code
more verbose, so basic_block * is still used as its type.  Updated the
patch by removing bbs_as_vector.
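
A simplified model of the layout described above, assuming hypothetical stand-ins (`struct bb_def`, `enumerate_blocks`, `init_loop_style`, `sum_block_indices`) rather than the real vectorizer types: both kinds of vec_info expose a raw `bbs` array plus `nbbs`, the loop-style setup allocates a zeroed array first (calloc playing the role of XCNEWVEC) and lets an enumerator fill it, and consumers simply walk `bbs[0..nbbs)`. Because `bbs` is a plain pointer, it can still be released with `free`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a basic block: just an index.  */
typedef struct bb_def { int index; } *basic_block;

/* Simplified model of the shared vec_info fields.  */
struct vec_info
{
  basic_block *bbs;   /* Raw array, so it can be released with free().  */
  unsigned nbbs;      /* Number of valid entries in BBS.  */
};

/* Hypothetical stand-in for dfs_enumerate_from: writes at most MAX
   block pointers into OUT and returns how many were written.  */
static unsigned
enumerate_blocks (basic_block all, unsigned n, basic_block *out,
                  unsigned max)
{
  unsigned count = 0;
  for (unsigned i = 0; i < n && count < max; i++)
    out[count++] = &all[i];
  return count;
}

/* Loop-style setup: allocate first, then enumerate into the array and
   record the count.  */
static void
init_loop_style (struct vec_info *vinfo, basic_block all, unsigned n)
{
  vinfo->bbs = calloc (n, sizeof *vinfo->bbs);
  vinfo->nbbs = vinfo->bbs ? enumerate_blocks (all, n, vinfo->bbs, n) : 0;
}

/* A consumer that does not care which kind of vec_info it got.  */
static int
sum_block_indices (const struct vec_info *vinfo)
{
  int sum = 0;
  for (unsigned i = 0; i < vinfo->nbbs; i++)
    sum += vinfo->bbs[i]->index;
  return sum;
}
```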

Feng.

gcc/
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Move
initialization of bbs to explicit construction code.  Adjust the
definition of nbbs.
(update_epilogue_loop_vinfo): Update nbbs for epilog vinfo.
* tree-vect-patterns.cc (vect_determine_precisions): Make
loop_vec_info and bb_vec_info share same code.
(vect_pattern_recog): Remove duplicated vect_pattern_recog_1 loop.
* tree-vect-slp.cc (vect_get_and_check_slp_defs): Access to bbs[0]
via base vec_info class.
(_bb_vec_info::_bb_vec_info): Initialize bbs and nbbs using data
fields of input auto_vec<> bbs.
(vect_slp_region): Use access to nbbs to replace original
bbs.length().
(vect_schedule_slp_node): Access to bbs[0] via base vec_info class.
* tree-vectorizer.cc (vec_info::vec_info): Add initialization of
bbs and nbbs.
(vec_info::insert_seq_on_entry): Access to bbs[0] via base vec_info
class.
* tree-vectorizer.h (vec_info): Add new fields bbs and nbbs.
(LOOP_VINFO_NBBS): New macro.
(BB_VINFO_BBS): Rename BB_VINFO_BB to BB_VINFO_BBS.
(BB_VINFO_NBBS): New macro.
(_loop_vec_info): Remove field bbs.
(_bb_vec_info): Rename field bbs.
---
 gcc/tree-vect-loop.cc |   7 +-
 gcc/tree-vect-patterns.cc | 142 +++---
 gcc/tree-vect-slp.cc  |  23 +++---
 gcc/tree-vectorizer.cc|   7 +-
 gcc/tree-vectorizer.h |  19 +++--
 5 files changed, 70 insertions(+), 128 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 3b94bb13a8b..04a9ac64df7 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1028,7 +1028,6 @@ bb_in_loop_p (const_basic_block bb, const void *data)
 _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
   : vec_info (vec_info::loop, shared),
 loop (loop_in),
-bbs (XCNEWVEC (basic_block, loop->num_nodes)),
 num_itersm1 (NULL_TREE),
 num_iters (NULL_TREE),
 num_iters_unchanged (NULL_TREE),
@@ -1079,8 +1078,9 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
  case of the loop forms we allow, a dfs order of the BBs would the same
  as reversed postorder traversal, so we are safe.  */

-  unsigned int nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p,
- bbs, loop->num_nodes, loop);
+  bbs = XCNEWVEC (basic_block, loop->num_nodes);
+  nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p, bbs,
+loop->num_nodes, loop);
   gcc_assert (nbbs == loop->num_nodes);

   for (unsigned int i = 0; i < nbbs; i++)
@@ -11667,6 +11667,7 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)

   free (LOOP_VINFO_BBS (epilogue_vinfo));
   LOOP_VINFO_BBS (epilogue_vinfo) = epilogue_bbs;
+  LOOP_VINFO_NBBS (epilogue_vinfo) = epilogue->num_nodes;

   /* Advance data_reference's with the number of iterations of the previous
  loop and its prologue.  */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 8929e5aa7f3..88e7e34d78d 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -6925,81 +6925,41 @@ vect_determine_stmt_precisions (vec_info *vinfo, stmt_vec_info stmt_info)
 void
 vect_determine_precisions (vec_info *vinfo)
 {
+  basic_block *bbs = vinfo->bbs;
+  unsigned int nbbs = vinfo->nbbs;
+
   DUMP_VECT_SCOPE ("vect_determine_precisions");

-  if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
+  for (unsigned int i = 0; i < nbbs; i++)
 {
-  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
-  basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
-  unsigned int nbbs = loop->num_nodes;
-
-  for (unsigned int i = 0; i < nbbs; i++)
+  basic_block bb = bbs[i];
+  for (auto gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
{
- basic_block bb = bbs[i];
- for (auto gsi = gsi_start_phis (bb);
-  !gsi_end_p (gsi); gsi_next (&gsi))
-   {
- stmt_vec_info stmt_info = vinfo->lookup_stmt (gsi.phi ());
- if (stmt_info)
-   vect_determine_mask_precision (vinfo, stmt_info);
-   }
- for (auto si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
-   if (!is_gimple_debug (gsi_stmt (si)))
- vect_determine_mask_precision
- 

[PATCH] MIPS/testsuite: Fix bseli.b fail in msa-builtins.c

2024-05-28 Thread YunQiang Su
commit 05daf617ea22e1d818295ed2d037456937e23530
Author: Jeff Law 
Date:   Sat May 25 12:39:05 2024 -0600

[committed] [v2] More logical op simplifications in simplify-rtx.cc

does some simplifications, after which `bseli.b $w1,$w0,255` is recognized
as equivalent to `or.v $w1,$w0,$w1`, so no bseli.b instruction is
generated any more.

Let's use 254 instead of 255 to test the generation of `bseli.b`.
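
A byte-level model of why 255 no longer exercises bseli.b, assuming the MSA definition of BSELI.B in which each result bit comes from the immediate where the corresponding bit of `wd` is set and from `ws` where it is clear (`bseli_b` and `or_v` are hypothetical helpers). With an immediate of 255 the expression reduces to `ws | wd`, which is exactly OR.V; with 254, any bit 0 set in `wd` selects a 0 from the immediate, so the result is no longer an OR.

```c
#include <assert.h>
#include <stdint.h>

/* One byte of BSELI.B wd, ws, i8: take bits from I8 where WD is 1 and
   from WS where WD is 0.  */
static uint8_t
bseli_b (uint8_t wd, uint8_t ws, uint8_t i8)
{
  return (uint8_t) ((ws & ~wd) | (i8 & wd));
}

/* One byte of OR.V wd, ws, wt.  */
static uint8_t
or_v (uint8_t ws, uint8_t wt)
{
  return (uint8_t) (ws | wt);
}
```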

gcc/testsuite

* gcc.target/mips/msa-builtins.c: Use 254 instead of 255 for
bseli.b, as `bseli.b $w0,$w1,255` is the same as `or.v $w0,$w0,$w1`.
---
 gcc/testsuite/gcc.target/mips/msa-builtins.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/mips/msa-builtins.c b/gcc/testsuite/gcc.target/mips/msa-builtins.c
index a679f065f34..6a146b3e6ae 100644
--- a/gcc/testsuite/gcc.target/mips/msa-builtins.c
+++ b/gcc/testsuite/gcc.target/mips/msa-builtins.c
@@ -705,7 +705,7 @@
 #define BNEG(T) NOMIPS16 T FN (bneg, T ## _DF) (T i, T j) { return BUILTIN (bneg, T ## _DF) (i, j); }
 #define BNEGI(T) NOMIPS16 T FN (bnegi, T ## _DF) (T i) { return BUILTIN (bnegi, T ## _DF) (i, 0); }
 #define BSEL(T) NOMIPS16 T FN (bsel, v) (T i, T j, T k) { return BUILTIN (bsel, v) (i, j, k); }
-#define BSELI(T) NOMIPS16 T FN (bseli, T ## _DF) (T i, T j) { return BUILTIN (bseli, T ## _DF) (i, j, U8MAX); }
+#define BSELI(T) NOMIPS16 T FN (bseli, T ## _DF) (T i, T j) { return BUILTIN (bseli, T ## _DF) (i, j, U8MAX-1); }
 #define BSET(T) NOMIPS16 T FN (bset, T ## _DF) (T i, T j) { return BUILTIN (bset, T ## _DF) (i, j); }
 #define BSETI(T) NOMIPS16 T FN (bseti, T ## _DF) (T i) { return BUILTIN (bseti, T ## _DF) (i, 0); }
 #define NLOC(T) NOMIPS16 T FN (nloc, T ## _DF) (T i) { return BUILTIN (nloc, T ## _DF) (i); }
-- 
2.39.2



[PATCH] regenerate-opt-urls.py: fix transposed values for "vax" and "v850"

2024-05-28 Thread David Malcolm
On Tue, 2024-05-28 at 15:03 +0200, Mark Wielaard wrote:
> Hi Maciej (Hi David, added to CC),
>
> On Mon, 2024-05-27 at 05:19 +0100, Maciej W. Rozycki wrote:
> >  As reported in PR target/79646 and fixed by a change proposed by Abe we
> > have a couple of issues with the descriptions of the VAX floating-point
> > format options in the option definition file.  Additionally most of these
> > options are not documented in the manual.
> >
> >  This mini patch series addresses these issues, including Abe's change,
> > slightly updated, and my new change.  See individual change descriptions
> > for details.
> >
> >  Verified by inspecting output produced by `vax-netbsdelf-gcc -v --help'
> > and by eyeballing `gcc.info' and `gcc.pdf' files produced.  Committed.
>
> This broke the gcc-autoregen checker because the
> gcc/config/vax/vax.opt.urls file wasn't regenerated:
> https://builder.sourceware.org/buildbot/#/builders/269/builds/5347
> 
> Producing the following diff:
> 
> diff --git a/gcc/config/vax/vax.opt.urls b/gcc/config/vax/vax.opt.urls
> index c6b1c418b61..ca78b31dd4c 100644
> --- a/gcc/config/vax/vax.opt.urls
> +++ b/gcc/config/vax/vax.opt.urls
> @@ -1,7 +1,13 @@
>  ; Autogenerated by regenerate-opt-urls.py from gcc/config/vax/vax.opt and generated HTML
>  
> +; skipping UrlSuffix for 'md' due to finding no URLs
> +
> +; skipping UrlSuffix for 'md-float' due to finding no URLs
> +
>  ; skipping UrlSuffix for 'mg' due to finding no URLs
>  
> +; skipping UrlSuffix for 'mg-float' due to finding no URLs
> +
>  ; skipping UrlSuffix for 'mgnu' due to finding no URLs
>  
>  ; skipping UrlSuffix for 'munix' due to finding no URLs
> 
> I am not completely clear on why though. Since it seems you actually
> did add documentation for exactly these options.
> 
> David, should the above diff just be checked in, or do we need to
> investigate why the URLs weren't found?

[adding Nick, re the v850 target]

I found the problem - I messed up when I was populating
TARGET_SPECIFIC_PAGES in regenerate-opt-urls.py, accidentally
transposing the entries for v850 and vax by writing:

'gcc/V850-Options.html' : 'gcc/config/vax/',
'gcc/VAX-Options.html' : 'gcc/config/v850/',

leading to both gcc/config/v850/v850.opt.urls and
gcc/config/vax/vax.opt.urls being full of such comments.

Sorry.

Fixing that leads to the files for both targets being populated with
correct-looking URL entries.
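
The following hypothetical consistency check (not part of regenerate-opt-urls.py; `page_matches_dir` is an illustrative name) shows one way such a transposition could be caught mechanically: for an entry like `'gcc/VAX-Options.html' : 'gcc/config/vax/'`, the lowercased target name in the page should equal the last directory component. It is only a heuristic; entries whose page and directory names legitimately differ (for instance the x86 options page, whose target directory is `gcc/config/i386/`) would need an allow-list.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Return true if the target name in PAGE ("gcc/<NAME>-Options.html"),
   lowercased, equals the last path component of DIR.  */
static bool
page_matches_dir (const char *page, const char *dir)
{
  const char *name = strrchr (page, '/');
  const char *suffix = strstr (page, "-Options.html");
  name = name ? name + 1 : page;
  if (!suffix || suffix < name)
    return false;
  size_t name_len = (size_t) (suffix - name);

  /* Locate the last component of DIR, ignoring trailing slashes.  */
  size_t dir_len = strlen (dir);
  while (dir_len > 0 && dir[dir_len - 1] == '/')
    dir_len--;
  size_t start = dir_len;
  while (start > 0 && dir[start - 1] != '/')
    start--;

  if (dir_len - start != name_len)
    return false;
  for (size_t i = 0; i < name_len; i++)
    if (tolower ((unsigned char) name[i]) != dir[start + i])
      return false;
  return true;
}
```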

I'll push this to trunk (and backport to gcc 14) after suitable testing.

Dave

gcc/ChangeLog:
* config/v850/v850.opt.urls: Regenerate, with fix.
* config/vax/vax.opt.urls: Likewise.
* regenerate-opt-urls.py (TARGET_SPECIFIC_PAGES): Fix transposed
values for "vax" and "v850".

Signed-off-by: David Malcolm 
---
 gcc/config/v850/v850.opt.urls | 81 +++
 gcc/config/vax/vax.opt.urls   | 21 +++--
 gcc/regenerate-opt-urls.py|  4 +-
 3 files changed, 73 insertions(+), 33 deletions(-)

diff --git a/gcc/config/v850/v850.opt.urls b/gcc/config/v850/v850.opt.urls
index dc5a83107b3..a06f4833f47 100644
--- a/gcc/config/v850/v850.opt.urls
+++ b/gcc/config/v850/v850.opt.urls
@@ -1,60 +1,87 @@
 ; Autogenerated by regenerate-opt-urls.py from gcc/config/v850/v850.opt and generated HTML
 
-; skipping UrlSuffix for 'mapp-regs' due to finding no URLs
+mapp-regs
+UrlSuffix(gcc/V850-Options.html#index-mapp-regs-1)
 
-; skipping UrlSuffix for 'mbig-switch' due to finding no URLs
+mbig-switch
+UrlSuffix(gcc/V850-Options.html#index-mbig-switch-1)
 
 ; skipping UrlSuffix for 'mdebug' due to finding no URLs
 
-; skipping UrlSuffix for 'mdisable-callt' due to finding no URLs
+mdisable-callt
+UrlSuffix(gcc/V850-Options.html#index-mdisable-callt)
 
-; skipping UrlSuffix for 'mep' due to finding no URLs
+mep
+UrlSuffix(gcc/V850-Options.html#index-mep)
 
-; skipping UrlSuffix for 'mghs' due to finding no URLs
+mghs
+UrlSuffix(gcc/V850-Options.html#index-mghs)
 
-; skipping UrlSuffix for 'mlong-calls' due to finding no URLs
+mlong-calls
+UrlSuffix(gcc/V850-Options.html#index-mlong-calls-7)
 
-; skipping UrlSuffix for 'mprolog-function' due to finding no URLs
+mprolog-function
+UrlSuffix(gcc/V850-Options.html#index-mprolog-function)
 
-; skipping UrlSuffix for 'msda=' due to finding no URLs
+msda=
+UrlSuffix(gcc/V850-Options.html#index-msda)
 
-; skipping UrlSuffix for 'mspace' due to finding no URLs
+mspace
+UrlSuffix(gcc/V850-Options.html#index-mspace)
 
-; skipping UrlSuffix for 'mtda=' due to finding no URLs
+mtda=
+UrlSuffix(gcc/V850-Options.html#index-mtda)
 
 ; skipping UrlSuffix for 'mno-strict-align' due to finding no URLs
 
-; skipping UrlSuffix for 'mv850' due to finding no URLs
+mv850
+UrlSuffix(gcc/V850-Options.html#index-mv850)
 
-; skipping UrlSuffix for 'mv850e' due to finding no URLs
+mv850e
+UrlSuffix(gcc/V850-Options.html#index-mv850e)
 
-; skipping UrlSuffix for 'mv850e1' due to finding no URLs
+mv850e1
+UrlSuffix(gcc/V850-Options.html#index-mv850

Re: [PATCH] libstdc++: Build libbacktrace and 19_diagnostics/stacktrace with -funwind-tables [PR111641]

2024-05-28 Thread Jonathan Wakely
On Tue, 28 May 2024 at 15:25, Rainer Orth  wrote:
>
> Several of the 19_diagnostics/stacktrace tests FAIL on Solaris/SPARC (32
> and 64-bit), Solaris/x86 (32-bit only), and several other targets:
>
> FAIL: 19_diagnostics/stacktrace/current.cc  -std=gnu++23 execution test
> FAIL: 19_diagnostics/stacktrace/current.cc  -std=gnu++26 execution test
> FAIL: 19_diagnostics/stacktrace/entry.cc  -std=gnu++23 execution test
> FAIL: 19_diagnostics/stacktrace/entry.cc  -std=gnu++26 execution test
> FAIL: 19_diagnostics/stacktrace/output.cc  -std=gnu++23 execution test
> FAIL: 19_diagnostics/stacktrace/output.cc  -std=gnu++26 execution test
> FAIL: 19_diagnostics/stacktrace/stacktrace.cc  -std=gnu++23 execution test
> FAIL: 19_diagnostics/stacktrace/stacktrace.cc  -std=gnu++26 execution test
>
> As it turns out, both the copy of libbacktrace in libstdc++ and the
> testcases proper need to be compiled with -funwind-tables, as is done for
> libbacktrace itself.
>
> This isn't an issue on Linux/x86_64 and Solaris/amd64 since 64-bit x86
> always defaults to -funwind-tables.  32-bit x86 does too when
> -fomit-frame-pointer is enabled, as on Linux/i686, but not on
> Solaris/i386.
>
> So this patch always enables the option both for the libbacktrace copy
> and the testcases.
>
> Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and
> x86_64-pc-linux-gnu.
>
> Ok for trunk?

OK for trunk and gcc-14. Thanks for figuring out the problem here!


>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
>
>
> 2024-05-23  Rainer Orth  
>
> libstdc++-v3:
> PR libstdc++/111641
> * src/libbacktrace/Makefile.am (AM_CFLAGS): Add -funwind-tables.
> * src/libbacktrace/Makefile.in: Regenerate.
>
> * testsuite/19_diagnostics/stacktrace/current.cc (dg-options): Add
> -funwind-tables.
> * testsuite/19_diagnostics/stacktrace/entry.cc: Likewise.
> * testsuite/19_diagnostics/stacktrace/hash.cc: Likewise.
> * testsuite/19_diagnostics/stacktrace/output.cc: Likewise.
> * testsuite/19_diagnostics/stacktrace/stacktrace.cc: Likewise.
>



[PATCH] libstdc++: Build libbacktrace and 19_diagnostics/stacktrace with -funwind-tables [PR111641]

2024-05-28 Thread Rainer Orth
Several of the 19_diagnostics/stacktrace tests FAIL on Solaris/SPARC (32
and 64-bit), Solaris/x86 (32-bit only), and several other targets:

FAIL: 19_diagnostics/stacktrace/current.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/current.cc  -std=gnu++26 execution test
FAIL: 19_diagnostics/stacktrace/entry.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/entry.cc  -std=gnu++26 execution test
FAIL: 19_diagnostics/stacktrace/output.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/output.cc  -std=gnu++26 execution test
FAIL: 19_diagnostics/stacktrace/stacktrace.cc  -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/stacktrace.cc  -std=gnu++26 execution test

As it turns out, both the copy of libbacktrace in libstdc++ and the
testcases proper need to be compiled with -funwind-tables, as is done for
libbacktrace itself.

This isn't an issue on Linux/x86_64 and Solaris/amd64 since 64-bit x86
always defaults to -funwind-tables.  32-bit x86 does too when
-fomit-frame-pointer is enabled, as on Linux/i686, but not on
Solaris/i386.

So this patch always enables the option both for the libbacktrace copy
and the testcases.
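
To see why missing unwind tables matter, here is a minimal sketch (not the libbacktrace or std::stacktrace code; `count_one_frame` and `count_frames` are illustrative names) that walks the stack with libgcc's `_Unwind_Backtrace` from `<unwind.h>` and counts frames. The unwinder stops at the first frame it has no unwind information for, so on a target that does not default to unwind tables the walk can be cut short unless the code is built with, e.g., `gcc -funwind-tables`.

```c
#include <assert.h>
#include <unwind.h>

/* Called once per frame; ARG points to the running count.  */
static _Unwind_Reason_Code
count_one_frame (struct _Unwind_Context *ctx, void *arg)
{
  (void) ctx;
  ++*(int *) arg;
  return _URC_NO_REASON;   /* Keep walking.  */
}

/* Return how many frames the unwinder could walk from here.  */
static __attribute__ ((noinline)) int
count_frames (void)
{
  int frames = 0;
  _Unwind_Backtrace (count_one_frame, &frames);
  return frames;
}
```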

Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and
x86_64-pc-linux-gnu.

Ok for trunk?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-05-23  Rainer Orth  

libstdc++-v3:
PR libstdc++/111641
* src/libbacktrace/Makefile.am (AM_CFLAGS): Add -funwind-tables.
* src/libbacktrace/Makefile.in: Regenerate.

* testsuite/19_diagnostics/stacktrace/current.cc (dg-options): Add
-funwind-tables.
* testsuite/19_diagnostics/stacktrace/entry.cc: Likewise.
* testsuite/19_diagnostics/stacktrace/hash.cc: Likewise.
* testsuite/19_diagnostics/stacktrace/output.cc: Likewise.
* testsuite/19_diagnostics/stacktrace/stacktrace.cc: Likewise.

# HG changeset patch
# Parent  a0526be1377da6b48eacbdd53f1d0e0b02ddb731
libstdc++: Build libbacktrace and 19_diagnostics/stacktrace with -funwind-tables [PR111641]

diff --git a/libstdc++-v3/src/libbacktrace/Makefile.am b/libstdc++-v3/src/libbacktrace/Makefile.am
--- a/libstdc++-v3/src/libbacktrace/Makefile.am
+++ b/libstdc++-v3/src/libbacktrace/Makefile.am
@@ -51,7 +51,7 @@ C_WARN_FLAGS = $(WARN_FLAGS) -Wstrict-pr
 CXX_WARN_FLAGS = $(WARN_FLAGS) -Wno-unused-parameter
 AM_CFLAGS = \
 	$(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \
-	$(C_WARN_FLAGS)
+	$(C_WARN_FLAGS) -funwind-tables
 AM_CFLAGS += $(EXTRA_CFLAGS)
 AM_CXXFLAGS = \
 	$(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \
diff --git a/libstdc++-v3/src/libbacktrace/Makefile.in b/libstdc++-v3/src/libbacktrace/Makefile.in
--- a/libstdc++-v3/src/libbacktrace/Makefile.in
+++ b/libstdc++-v3/src/libbacktrace/Makefile.in
@@ -473,7 +473,7 @@ libstdc___libbacktrace_la_CPPFLAGS = \
 C_WARN_FLAGS = $(WARN_FLAGS) -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -Wno-unused-but-set-variable
 CXX_WARN_FLAGS = $(WARN_FLAGS) -Wno-unused-parameter
 AM_CFLAGS = $(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \
-	$(C_WARN_FLAGS) $(EXTRA_CFLAGS)
+	$(C_WARN_FLAGS) -funwind-tables $(EXTRA_CFLAGS)
 AM_CXXFLAGS = $(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \
 	$(CXX_WARN_FLAGS) -fno-rtti -fno-exceptions $(EXTRA_CXXFLAGS)
 obj_prefix = std_stacktrace
diff --git a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/current.cc b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/current.cc
--- a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/current.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/current.cc
@@ -1,4 +1,4 @@
-// { dg-options "-lstdc++exp" }
+// { dg-options "-funwind-tables -lstdc++exp" }
 // { dg-do run { target c++23 } }
 // { dg-require-cpp-feature-test __cpp_lib_stacktrace }
 
diff --git a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/entry.cc b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/entry.cc
--- a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/entry.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/entry.cc
@@ -1,4 +1,4 @@
-// { dg-options "-lstdc++exp" }
+// { dg-options "-funwind-tables -lstdc++exp" }
 // { dg-do run { target c++23 } }
 // { dg-require-cpp-feature-test __cpp_lib_stacktrace }
 
diff --git a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc
--- a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc
@@ -1,4 +1,4 @@
-// { dg-options "-lstdc++exp" }
+// { dg-options "-funwind-tables -lstdc++exp" }
 // { dg-do run { target c++23 } }
 // { dg-require-cpp-feature-test __cpp_lib_stacktrace }
 
diff --git a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/output.cc b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/output.cc
--- a/libstdc++-v3/testsuite/19_diagnostics/s

Re: [PATCH] vect: Use vect representative statement instead of original in patch recog [PR115060]

2024-05-28 Thread Feng Xue OS
Changed as per the comments.

Thanks,
Feng


From: Richard Biener 
Sent: Tuesday, May 28, 2024 5:34 PM
To: Feng Xue OS
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] vect: Use vect representative statement instead of original in patch recog [PR115060]

On Sat, May 25, 2024 at 4:45 PM Feng Xue OS  wrote:
>
> Some utility functions (such as vect_look_through_possible_promotion) that are
> to find out certain kind of direct or indirect definition SSA for a value, may
> return the original one of the SSA, not its pattern representative SSA, even
> pattern is involved. For example,
>
>a = (T1) patt_b;
>patt_b = (T2) c;// b = ...
>patt_c = not-a-cast;// c = ...
>
> Given 'a', the mentioned function will return 'c', instead of 'patt_c'. This
> subtlety would make some pattern recog code that is unaware of it mis-use the
> original instead of the new pattern statement, which is inconsistent wth
> processing logic of the pattern formation pass. This patch corrects the issue
> by forcing another utility function (vect_get_internal_def) return the pattern
> statement information to caller by default.
>
> Regression test on x86-64 and aarch64.
>
> Feng
> --
> gcc/
> PR tree-optimization/115060
> * tree-vect-patterns.cc (vect_get_internal_def): Add a new parameter
> for_vectorize.
> (vect_widened_op_tree): Call vect_get_internal_def instead of look_def
> to get statement information.
> (vect_recog_widen_abd_pattern): No need to call vect_stmt_to_vectorize.
> ---
>  gcc/tree-vect-patterns.cc | 16 +++-
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index a313dc64643..fa35bf26372 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -258,15 +258,21 @@ vect_element_precision (unsigned int precision)
>  }
>
>  /* If OP is defined by a statement that's being considered for vectorization,
> -   return information about that statement, otherwise return NULL.  */
> +   return information about that statement, otherwise return NULL.
> +   FOR_VECTORIZE is used to specify whether original or vectorization
> +   representative (if have) statement information is returned.  */
>
>  static stmt_vec_info
> -vect_get_internal_def (vec_info *vinfo, tree op)
> +vect_get_internal_def (vec_info *vinfo, tree op, bool for_vectorize = true)

I'm probably blind - but you nowhere pass 'false' and I think returning the
pattern stmt is the correct behavior always.

OK with omitting the new parameter.

>  {
>stmt_vec_info def_stmt_info = vinfo->lookup_def (op);
>if (def_stmt_info
>&& STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_internal_def)
> -return def_stmt_info;
> +{
> +  if (for_vectorize)
> +   def_stmt_info = vect_stmt_to_vectorize (def_stmt_info);
> +  return def_stmt_info;
> +}
>return NULL;
>  }
>
> @@ -655,7 +661,8 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code,
>
>   /* Recursively process the definition of the operand.  */
>   stmt_vec_info def_stmt_info
> -   = vinfo->lookup_def (this_unprom->op);
> +   = vect_get_internal_def (vinfo, this_unprom->op);
> +
>   nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
>widened_code, shift_p, max_nops,
>this_unprom, common_type,
> @@ -1739,7 +1746,6 @@ vect_recog_widen_abd_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
>if (!abd_pattern_vinfo)
>  return NULL;
>
> -  abd_pattern_vinfo = vect_stmt_to_vectorize (abd_pattern_vinfo);
>gcall *abd_stmt = dyn_cast <gcall *> (STMT_VINFO_STMT (abd_pattern_vinfo));
>if (!abd_stmt
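
A simplified model of what the patch changes, assuming a hypothetical `struct stmt_info` in place of GCC's `stmt_vec_info`: a statement replaced by a pattern keeps a link to its pattern statement, and the lookup helper now always hands back that representative. `stmt_to_vectorize` is modelled on `vect_stmt_to_vectorize`; the real `vect_get_internal_def` additionally requires the definition to be a `vect_internal_def`, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified statement info.  */
struct stmt_info
{
  const char *lhs;
  bool in_pattern_p;            /* Replaced by a pattern statement?  */
  struct stmt_info *related;    /* The pattern statement, if any.  */
};

/* Prefer the pattern statement when there is one.  */
static struct stmt_info *
stmt_to_vectorize (struct stmt_info *info)
{
  return info->in_pattern_p ? info->related : info;
}

/* After finding the original definition, return its vectorization
   representative rather than the original statement.  */
static struct stmt_info *
get_internal_def (struct stmt_info *def)
{
  return def ? stmt_to_vectorize (def) : NULL;
}
```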


[PATCH] target/115254 - fix gcc.dg/vect/vect-gather-4.c dump scanning

2024-05-28 Thread Richard Biener
The dump scanning is supposed to check that we do not merge two
slightly different gathers into one SLP node, but since we now
SLP the store, scanning for "ectorizing stmts using SLP" is no
longer good.  Instead the following makes us look for
"stmt 1 .* = .MASK", which is what the second lane of an SLP
node looks like.  We have to handle both .MASK_GATHER_LOAD (for
targets with ifun mask gathers) and .MASK_LOAD (for ones without).

Tested on x86_64-linux with and without native gather and on GCN
where this now avoids a FAIL.

Pushed.

PR target/115254
* gcc.dg/vect/vect-gather-4.c: Adjust dump scan.
---
 gcc/testsuite/gcc.dg/vect/vect-gather-4.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-4.c b/gcc/testsuite/gcc.dg/vect/vect-gather-4.c
index d18094d6982..edd9a6783c2 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-gather-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-4.c
@@ -45,4 +45,7 @@ f3 (int *restrict y, int *restrict x, int *restrict indices)
 }
 }
 
-/* { dg-final { scan-tree-dump-not "vectorizing stmts using SLP" vect } } */
+/* We do not want to see a two-lane .MASK_LOAD or .MASK_GATHER_LOAD since
+   the gathers are different on each lane.  This is a bit fragile and
+   should possibly be turned into a runtime test.  */
+/* { dg-final { scan-tree-dump-not "stmt 1 \[^\r\n\]* = .MASK" vect } } */
-- 
2.35.3
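
The effect of the new scan-tree-dump-not pattern can be checked outside DejaGnu with POSIX `<regex.h>`. In this sketch `matches_second_lane_mask` is a hypothetical helper, and the dump lines used with it are made-up examples of the shape described above, not real compiler output. In the C string literal, `\r` and `\n` become literal CR and LF characters inside the bracket expression, so the match cannot cross a line break.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if LINE matches the pattern used in vect-gather-4.c.  */
static bool
matches_second_lane_mask (const char *line)
{
  regex_t re;
  if (regcomp (&re, "stmt 1 [^\r\n]* = .MASK", REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool hit = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return hit;
}
```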


Re: [PATCH] libstdc++: Avoid MMX return types from __builtin_shufflevector

2024-05-28 Thread Jonathan Wakely
On Wed, 15 May 2024 at 20:50, Matthias Kretz  wrote:
>
> Tested on aarch64-linux-gnu, arm-linux-gnueabihf, powerpc64le-linux-gnu,
> x86_64-linux-gnu (-m64, -m32, -mx32), and arm-linux-gnueabi
>
> OK for trunk?

OK

> And when backporting, should I squash it with the commit that
> introduced the regression?

I don't mind about that. If you cherry-pick them next to each other
and push them at the same time, nobody's going to end up using the
broken commit before the fix. It's fine to squash it if you prefer to
though.

OK for backports either way.

>
>  8< ---
>
> This resolves a regression on i686 that was introduced with
> r15-429-gfb1649f8b4ad50.
>
> Signed-off-by: Matthias Kretz 
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/114958
> * include/experimental/bits/simd.h (__as_vector): Don't use
> vector_size(8) on __i386__.
> (__vec_shuffle): Never return MMX vectors, widen to 16 bytes
> instead.
> (concat): Fix padding calculation to pick up widening logic from
> __as_vector.
> ---
>  libstdc++-v3/include/experimental/bits/simd.h | 39 +--
>  1 file changed, 28 insertions(+), 11 deletions(-)
>
>
> --
> ──
>  Dr. Matthias Kretz   https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
>  stdₓ::simd
> ──


Re: [committed v2 0/2] VAX: Fix issues with FP format option documentation

2024-05-28 Thread Mark Wielaard
Hi Maciej (Hi David, added to CC),

On Mon, 2024-05-27 at 05:19 +0100, Maciej W. Rozycki wrote:
>  As reported in PR target/79646 and fixed by a change proposed by Abe we 
> have a couple of issues with the descriptions of the VAX floating-point 
> format options in the option definition file.  Additionally most of these 
> options are not documented in the manual.
> 
>  This mini patch series addresses these issues, including Abe's change, 
> slightly updated, and my new change.  See individual change descriptions 
> for details.
> 
>  Verified by inspecting output produced by `vax-netbsdelf-gcc -v --help' 
> and by eyeballing `gcc.info' and `gcc.pdf' files produced.  Committed.

This broke the gcc-autoregen checker because the
gcc/config/vax/vax.opt.urls file wasn't regenerated:
https://builder.sourceware.org/buildbot/#/builders/269/builds/5347

Producing the following diff:

diff --git a/gcc/config/vax/vax.opt.urls b/gcc/config/vax/vax.opt.urls
index c6b1c418b61..ca78b31dd4c 100644
--- a/gcc/config/vax/vax.opt.urls
+++ b/gcc/config/vax/vax.opt.urls
@@ -1,7 +1,13 @@
 ; Autogenerated by regenerate-opt-urls.py from gcc/config/vax/vax.opt and generated HTML
 
+; skipping UrlSuffix for 'md' due to finding no URLs
+
+; skipping UrlSuffix for 'md-float' due to finding no URLs
+
 ; skipping UrlSuffix for 'mg' due to finding no URLs
 
+; skipping UrlSuffix for 'mg-float' due to finding no URLs
+
 ; skipping UrlSuffix for 'mgnu' due to finding no URLs
 
 ; skipping UrlSuffix for 'munix' due to finding no URLs

I am not completely clear on why though. Since it seems you actually
did add documentation for exactly these options.

David, should the above diff just be checked in, or do we need to
investigate why the URLs weren't found?

Cheers,

Mark


[PATCH] tree-optimization/115236 - more points-to *ANYTHING = x fixes

2024-05-28 Thread Richard Biener
The stored-to ANYTHING handling has more holes, uncovered by treating
volatile accesses as ANYTHING.  We fail to properly build the
pred and succ graphs; in particular, we must not elide direct nodes
from receiving from STOREDANYTHING.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/115236
* tree-ssa-structalias.cc (build_pred_graph): Properly
handle *ANYTHING = X.
(build_succ_graph): Likewise.  Do not elide direct nodes
from receiving from STOREDANYTHING.

* gcc.dg/pr115236.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr115236.c | 12 
 gcc/tree-ssa-structalias.cc | 20 ++--
 2 files changed, 26 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr115236.c

diff --git a/gcc/testsuite/gcc.dg/pr115236.c b/gcc/testsuite/gcc.dg/pr115236.c
new file mode 100644
index 000..91edfab957a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr115236.c
@@ -0,0 +1,12 @@
+/* { dg-do run } */
+/* { dg-options "-O -fno-tree-fre" } */
+
+int a, *b = &a;
+int main()
+{
+  int *c, *volatile *d = &c;
+  *d = b;
+  if (c != &a)
+__builtin_abort();
+  return 0;
+}
diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
index 9cec2c6cfd9..330e64e65da 100644
--- a/gcc/tree-ssa-structalias.cc
+++ b/gcc/tree-ssa-structalias.cc
@@ -1312,7 +1312,12 @@ build_pred_graph (void)
{
  /* *x = y.  */
  if (rhs.offset == 0 && lhs.offset == 0 && rhs.type == SCALAR)
-   add_pred_graph_edge (graph, FIRST_REF_NODE + lhsvar, rhsvar);
+   {
+ if (lhs.var == anything_id)
+   add_pred_graph_edge (graph, storedanything_id, rhsvar);
+ else
+   add_pred_graph_edge (graph, FIRST_REF_NODE + lhsvar, rhsvar);
+   }
}
   else if (rhs.type == DEREF)
{
@@ -1398,7 +1403,12 @@ build_succ_graph (void)
   if (lhs.type == DEREF)
{
  if (rhs.offset == 0 && lhs.offset == 0 && rhs.type == SCALAR)
-   add_graph_edge (graph, FIRST_REF_NODE + lhsvar, rhsvar);
+   {
+ if (lhs.var == anything_id)
+   add_graph_edge (graph, storedanything_id, rhsvar);
+ else
+   add_graph_edge (graph, FIRST_REF_NODE + lhsvar, rhsvar);
+   }
}
   else if (rhs.type == DEREF)
{
@@ -1418,13 +1428,11 @@ build_succ_graph (void)
}
 }
 
-  /* Add edges from STOREDANYTHING to all non-direct nodes that can
- receive pointers.  */
+  /* Add edges from STOREDANYTHING to all nodes that can receive pointers.  */
   t = find (storedanything_id);
   for (i = integer_id + 1; i < FIRST_REF_NODE; ++i)
 {
-  if (!bitmap_bit_p (graph->direct_nodes, i)
- && get_varinfo (i)->may_have_pointers)
+  if (get_varinfo (i)->may_have_pointers)
add_graph_edge (graph, find (i), t);
 }
 
-- 
2.35.3


Re: [wwwdocs][patch] gcc-15/changes.html: Fortran - mention F2023 logical-kind additions

2024-05-28 Thread FX Coudert
Seems good, thanks Tobias!

FX


[PATCH] tree-optimization/115252 - enhance peeling for gaps avoidance

2024-05-28 Thread Richard Biener
Code generation for contiguous load vectorization can already deal
with generalized avoidance of loading from a gap.  The following
extends detection of peeling for gaps requirement with that,
gets rid of the old special casing of a half load and makes sure
when we do access the gap we have peeling for gaps enabled.
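
At the source level, the zero-padding trick mentioned in the new slp-gap-1.c test can be pictured as loading only the bytes a group actually uses into a zero-initialized vector, instead of a full-width load that would touch the gap. This is a minimal sketch with GCC/Clang vector extensions; `v8qu` and `load_partial_zero_padded` are hypothetical names, and the vectorizer presumably does the equivalent on GIMPLE (the `{_N, 0}` constructors the test scans for) rather than via memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 8 unsigned bytes via the vector_size extension.  */
typedef unsigned char v8qu __attribute__ ((vector_size (8)));

/* Copy the N bytes at P into the low lanes and leave the rest zero,
   so nothing past P[N-1] (the gap) is read.  N is clamped to 8.  */
static v8qu
load_partial_zero_padded (const unsigned char *p, size_t n)
{
  v8qu v = { 0 };
  if (n > sizeof v)
    n = sizeof v;
  memcpy (&v, p, n);
  return v;
}
```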

Bootstrapped and tested on x86_64-unknown-linux-gnu.

This is the first patch in a series to improve peeling for gaps;
it turned into an improvement of the generated code rather than just
the (delayed from stage3) removal of the "old" half-vector codepath.

I'll wait for the pre-CI testing before pushing, so you also have time
for some comments.

Richard.

PR tree-optimization/115252
* tree-vect-stmts.cc (get_group_load_store_type): Enhance
detection to cover more cases where we can avoid accessing a gap
during code generation.
(vectorizable_load): Remove old half-vector peeling for gap
avoidance which is now redundant.  Add gap-aligned case where
it's OK to access the gap.  Add assert that we have peeling for
gaps enabled when we access a gap.

* gcc.dg/vect/slp-gap-1.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/slp-gap-1.c | 18 +
 gcc/tree-vect-stmts.cc| 58 +--
 2 files changed, 46 insertions(+), 30 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/slp-gap-1.c

diff --git a/gcc/testsuite/gcc.dg/vect/slp-gap-1.c b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
new file mode 100644
index 000..36463ca22c5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+
+typedef unsigned char uint8_t;
+typedef short int16_t;
+void pixel_sub_wxh(int16_t * __restrict diff, uint8_t *pix1, uint8_t *pix2) {
+  for (int y = 0; y < 4; y++) {
+for (int x = 0; x < 4; x++)
+  diff[x + y * 4] = pix1[x] - pix2[x];
+pix1 += 16;
+pix2 += 32;
+  }
+}
+
+/* We can vectorize this without peeling for gaps and thus without epilogue,
+   but the only thing we can reliably scan is the zero-padding trick for the
+   partial loads.  */
+/* { dg-final { scan-tree-dump-times "\{_\[0-9\]\+, 0" 6 "vect" { target vect64 } } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a01099d3456..b26cc74f417 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2072,16 +2072,22 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
  dr_alignment_support alss;
  int misalign = dr_misalignment (first_dr_info, vectype);
  tree half_vtype;
+ poly_uint64 remain;
+ unsigned HOST_WIDE_INT tem, num;
  if (overrun_p
  && !masked_p
  && (((alss = vect_supportable_dr_alignment (vinfo, first_dr_info,
  vectype, misalign)))
   == dr_aligned
  || alss == dr_unaligned_supported)
- && known_eq (nunits, (group_size - gap) * 2)
- && known_eq (nunits, group_size)
- && (vector_vector_composition_type (vectype, 2, &half_vtype)
- != NULL_TREE))
+ && can_div_trunc_p (group_size
+ * LOOP_VINFO_VECT_FACTOR (loop_vinfo) - gap,
+ nunits, &tem, &remain)
+ && (known_eq (remain, 0u)
+ || (constant_multiple_p (nunits, remain, &num)
+ && (vector_vector_composition_type (vectype, num,
+ &half_vtype)
+ != NULL_TREE))))
overrun_p = false;
 
  if (overrun_p && !can_overrun_p)
@@ -11533,33 +11539,14 @@ vectorizable_load (vec_info *vinfo,
unsigned HOST_WIDE_INT gap = DR_GROUP_GAP (first_stmt_info);
unsigned int vect_align
  = vect_known_alignment_in_bytes (first_dr_info, vectype);
-   unsigned int scalar_dr_size
- = vect_get_scalar_dr_size (first_dr_info);
-   /* If there's no peeling for gaps but we have a gap
-  with slp loads then load the lower half of the
-  vector only.  See get_group_load_store_type for
-  when we apply this optimization.  */
-   if (slp
-   && loop_vinfo
-   && !LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) && gap != 0
-   && known_eq (nunits, (group_size - gap) * 2)
-   && known_eq (nunits, group_size)
-   && gap >= (vect_align / scalar_dr_size))
- {
-   tree half_vtype;
-   new_vtype
- = vector_vector_composition_type (vectype, 2,
- 

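The slp-gap-1.c testcase above scans the vectorizer dump for `{_N, 0` constructors, the zero-padding trick for partial loads. Below is a minimal sketch of that idea using GCC vector extensions, assuming 64-bit vectors as in the vect64 selector: load only the 4 used bytes and build a vector whose upper half is zero, so nothing past the used elements is read. The function name is hypothetical and this is an illustration, not the vectorizer's actual output.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A 64-bit vector viewed as two 32-bit lanes (GCC vector extension).  */
typedef uint32_t v2u32 __attribute__ ((vector_size (8)));

/* Hypothetical helper: load the 4 used bytes at PIX and zero-pad the
   remaining 4 lanes of an 8-byte vector, so the gap is never read.  */
static void
load_low_half_zero_padded (const uint8_t *pix, uint8_t out[8])
{
  uint32_t word;
  v2u32 vec;

  assert (pix != NULL && out != NULL);
  memcpy (&word, pix, sizeof word);  /* reads pix[0..3] only */
  vec = (v2u32) { word, 0 };         /* {_N, 0}: upper half zeroed */
  memcpy (out, &vec, sizeof vec);
}
```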
[PATCH] Avoid pessimistic constraints for asm memory constraints

2024-05-28 Thread Richard Biener
We process asm memory inputs and outputs with constraints to ESCAPED,
but for this we temporarily build an ADDR_EXPR.  The issue is that
the build_fold_addr_expr used for this ends up wrapping the ADDR_EXPR
in a conversion, which in turn produces &ANYTHING constraints; that
is quite bad.  The following uses get_constraint_for_address_of
instead, avoiding both the temporary tree and the unhandled conversion.

This avoids a gcc.dg/tree-ssa/restrict-9.c FAIL with the fix
for PR115236.
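A toy model of the problem in C follows. The node kinds and both functions are hypothetical and only mimic the shape of the old and new code paths; real constraint generation in tree-ssa-structalias.cc is far more involved. The model shows why a conversion wrapped around the temporary ADDR_EXPR degrades to &ANYTHING, while asking for the address of the operand directly keeps a precise &VAR constraint.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression nodes.  These names are hypothetical; they only mimic
   the shape of a VAR_DECL, an ADDR_EXPR and a conversion tree.  */
enum toy_code { TOY_VAR, TOY_ADDR, TOY_CONVERT };

struct toy_expr
{
  enum toy_code code;
  const char *name;            /* set for TOY_VAR */
  const struct toy_expr *op;   /* operand of TOY_ADDR / TOY_CONVERT */
};

/* Toy points-to constraint: "&VAR", or the pessimistic "&ANYTHING".  */
struct toy_constraint
{
  int is_anything;
  const char *var;
};

/* Old path, modelled: derive a constraint from a temporary address
   expression.  Only a bare address-of-variable is understood; a
   conversion wrapped around it is not, so the result is &ANYTHING.  */
static struct toy_constraint
constraint_for_expr (const struct toy_expr *e)
{
  struct toy_constraint c = { 1, "ANYTHING" };
  assert (e != NULL);
  if (e->code == TOY_ADDR && e->op->code == TOY_VAR)
    {
      c.is_anything = 0;
      c.var = e->op->name;
    }
  return c;
}

/* New path, modelled: ask directly for the address of the operand, so
   no temporary tree (and no conversion) is involved.  */
static struct toy_constraint
constraint_for_address_of (const struct toy_expr *op)
{
  struct toy_constraint c = { 1, "ANYTHING" };
  assert (op != NULL);
  if (op->code == TOY_VAR)
    {
      c.is_anything = 0;
      c.var = op->name;
    }
  return c;
}
```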

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

* tree-ssa-structalias.cc (find_func_aliases): Use
get_constraint_for_address_of to build escape constraints
for asm inputs and outputs.
---
 gcc/tree-ssa-structalias.cc | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
index 53552b63532..330e64e65da 100644
--- a/gcc/tree-ssa-structalias.cc
+++ b/gcc/tree-ssa-structalias.cc
@@ -5277,7 +5277,11 @@ find_func_aliases (struct function *fn, gimple *origt)
 
  /* A memory constraint makes the address of the operand escape.  */
  if (!allows_reg && allows_mem)
-   make_escape_constraint (build_fold_addr_expr (op));
+   {
+ auto_vec tmpc;
+ get_constraint_for_address_of (op, &tmpc);
+ make_constraints_to (escaped_id, tmpc);
+   }
 
  /* The asm may read global memory, so outputs may point to
 any global memory.  */
@@ -5306,7 +5310,11 @@ find_func_aliases (struct function *fn, gimple *origt)
 
  /* A memory constraint makes the address of the operand escape.  */
  if (!allows_reg && allows_mem)
-   make_escape_constraint (build_fold_addr_expr (op));
+   {
+ auto_vec tmpc;
+ get_constraint_for_address_of (op, &tmpc);
+ make_constraints_to (escaped_id, tmpc);
+   }
  /* Strictly we'd only need the constraint to ESCAPED if
 the asm clobbers memory, otherwise using something
 along the lines of per-call clobbers/uses would be enough.  */
-- 
2.35.3


[Patch, PR Fortran/90069] Polymorphic Return Type Memory Leak Without Intermediate Variable

2024-05-28 Thread Andre Vehreschild
Hi all,

the attached patch fixes a memory leak with unlimited polymorphic return types.
The leak occurred because an expression with side effects was evaluated twice.
I have replaced the check for non-variable expressions (which was followed by
creating a SAVE_EXPR) with a check for trees with side effects, which now creates
a temporary variable and frees the memory.
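To make the evaluation pattern concrete, here is a C model of it, as a sketch under stated assumptions: the struct, the counter and all functions are hypothetical, and the real change operates on gfortran trees (gfc_evaluate_now plus a conditional gfc_call_free added to parmse.finalblock). The leaky version re-runs the side-effecting call at each use; the fixed version evaluates once into a temporary, copies it, then frees the temporary's data and resets the pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of a class(*) value: a data pointer plus a type tag.
   All names are hypothetical; only the evaluation pattern matters.  */
struct toy_class
{
  void *data;
  int vptr;
};

static int live_payloads;  /* payloads currently allocated */

static void *
alloc_payload (void)
{
  void *p = calloc (1, 16);
  assert (p != NULL);
  live_payloads++;
  return p;
}

static void
free_payload (void *p)
{
  if (p != NULL)
    {
      free (p);
      live_payloads--;
    }
}

/* A function result with side effects: every call allocates, like
   newAbstract () in the class_76.f90 testcase.  */
static struct toy_class
new_abstract (void)
{
  struct toy_class c = { alloc_payload (), 1 };
  return c;
}

/* Old pattern, modelled: the argument expression is expanded at each
   use, so the call runs twice and neither result's payload is freed.  */
static struct toy_class
pass_leaky (void)
{
  struct toy_class dst;
  dst.data = alloc_payload ();
  memcpy (dst.data, new_abstract ().data, 16);  /* 1st call, leaked */
  dst.vptr = new_abstract ().vptr;             /* 2nd call, leaked */
  return dst;
}

/* Fixed pattern, modelled: evaluate once into a temporary, copy, then
   free the temporary's data if set and reset it to NULL.  */
static struct toy_class
pass_fixed (void)
{
  struct toy_class tmp = new_abstract ();  /* single evaluation */
  struct toy_class dst;
  dst.data = alloc_payload ();
  memcpy (dst.data, tmp.data, 16);
  dst.vptr = tmp.vptr;
  if (tmp.data != NULL)
    free_payload (tmp.data);
  tmp.data = NULL;
  return dst;
}
```

After freeing the returned copy, the leaky version leaves two payloads allocated, while the fixed version leaves none.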

By the way, I do not understand the SAVE_EXPR in the old code. Is something missing
to make it take effect, or is a SAVE_EXPR not meant to avoid evaluating its operand twice?

Anyway, regtested ok on Linux-x86_64-Fedora_39. Ok for master?

This work is funded by the Sovereign Tech Fund. Yes, the funding has been
granted and Nicolas, Mikael and I will be working on some Fortran topics in
the next 12-18 months.

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
From edd6c94b802732b0dd742ef9eca4d74aaaf6d91b Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Wed, 12 Jul 2023 16:52:15 +0200
Subject: [PATCH] Fix memory leak.

Prevent double call of function return class object
and free the object after copy.

gcc/fortran/ChangeLog:

	PR fortran/90069
	* trans-expr.cc (gfc_conv_procedure_call): Evaluate
	expressions with side effects only once and ensure
	the old data is freed.

gcc/testsuite/ChangeLog:

	PR fortran/90069
	* gfortran.dg/class_76.f90: New test.
---
 gcc/fortran/trans-expr.cc  | 29 +--
 gcc/testsuite/gfortran.dg/class_76.f90 | 66 ++
 2 files changed, 92 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/class_76.f90

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index dfc5b8e9b4a..38ba278f725 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6725,9 +6725,32 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 			{
 			  tree efield;

-			  /* Evaluate arguments just once.  */
-			  if (e->expr_type != EXPR_VARIABLE)
-parmse.expr = save_expr (parmse.expr);
+			  /* Evaluate arguments just once, when they have
+			 side effects.  */
+			  if (TREE_SIDE_EFFECTS (parmse.expr))
+{
+  tree cldata, zero;
+
+  parmse.expr = gfc_evaluate_now (parmse.expr,
+  &parmse.pre);
+
+  /* Prevent a memory leak when the old component
+ was already allocated.  */
+  cldata = gfc_class_data_get (parmse.expr);
+  zero = build_int_cst (TREE_TYPE (cldata),
+			0);
+  tmp = fold_build2_loc (input_location, NE_EXPR,
+			 logical_type_node,
+			 cldata, zero);
+  tmp = build3_v (COND_EXPR, tmp,
+		  gfc_call_free (cldata),
+		  build_empty_stmt (
+		input_location));
+  gfc_add_expr_to_block (&parmse.finalblock,
+			 tmp);
+  gfc_add_modify (&parmse.finalblock,
+		  cldata, zero);
+}

 			  /* Set the _data field.  */
 			  tmp = gfc_class_data_get (var);
diff --git a/gcc/testsuite/gfortran.dg/class_76.f90 b/gcc/testsuite/gfortran.dg/class_76.f90
new file mode 100644
index 000..1ee1e1fc25f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/class_76.f90
@@ -0,0 +1,66 @@
+! { dg-do compile }
+! { dg-additional-options "-fdump-tree-original" }
+!
+! PR fortran/90069
+!
+! Contributed by Brad Richardson  
+!
+
+program returned_memory_leak
+implicit none
+
+type, abstract :: base
+end type base
+
+type, extends(base) :: extended
+end type extended
+
+type :: container
+class(*), allocatable :: thing
+end type
+
+call run()
+contains
+subroutine run()
+type(container) :: a_container
+
+a_container = theRightWay()
+a_container = theWrongWay()
+end subroutine
+
+function theRightWay()
+type(container) :: theRightWay
+
+class(base), allocatable :: thing
+
+allocate(thing, source = newAbstract())
+theRightWay = newContainer(thing)
+end function theRightWay
+
+function theWrongWay()
+type(container) :: theWrongWay
+
+theWrongWay = newContainer(newAbstract())
+end function theWrongWay
+
+function  newAbstract()
+class(base), allocatable :: newAbstract
+
+allocate(newAbstract, source = newExtended())
+end function newAbstract
+
+function newExtended()
+type(extended) :: newExtended
+end function newExtended
+
+function newContainer(thing)
+class(*), intent(in) :: thing
+type(container) :: newContainer
+
+allocate(newContainer%thing, source = thing)
+end function newContainer
+end program returned_memory_leak
+
+! { dg-final { scan-tree-dump-times "newabstract" 14 "original" } }
+! { dg-final { scan-tree-dump-times "__builtin_free" 8 "original" } }
+
--
2.45.1



Re: [PATCHv3] Optab: add isfinite_optab for __builtin_isfinite

2024-05-28 Thread Richard Biener
On Tue, May 28, 2024 at 9:09 AM Kewen.Lin  wrote:
>
> Hi,
>
> on 2024/5/27 20:54, Richard Biener wrote:
> > On Mon, May 27, 2024 at 11:37 AM HAO CHEN GUI  wrote:
> >>
> >> Hi,
> >>   This patch adds an optab for __builtin_isfinite. The finite check can be
> >> implemented on rs6000 by a single instruction. It needs an optab to be
> >> expanded to that specific sequence of instructions.
> >>
> >>   The subsequent patches will implement the expand on rs6000.
> >>
> >>   Compared to previous version, the main change is to specify acceptable
> >> modes for the optab.
> >> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652170.html
> >>
> >>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> >> regressions. Is this OK for trunk?
> >>
> >> Thanks
> >> Gui Haochen
> >>
> >> ChangeLog
> >> optab: Add isfinite_optab for isfinite builtin
> >>
> >> gcc/
> >> * builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
> >> for isfinite builtin.
> >> * optabs.def (isfinite_optab): New.
> >> * doc/md.texi (isfinite): Document.
> >>
> >>
> >> patch.diff
> >> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> >> index f8d94c4b435..b8432f84020 100644
> >> --- a/gcc/builtins.cc
> >> +++ b/gcc/builtins.cc
> >> @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
> >>errno_set = true; builtin_optab = ilogb_optab; break;
> >>  CASE_FLT_FN (BUILT_IN_ISINF):
> >>builtin_optab = isinf_optab; break;
> >> -case BUILT_IN_ISNORMAL:
> >>  case BUILT_IN_ISFINITE:
> >> +  builtin_optab = isfinite_optab; break;
> >> +case BUILT_IN_ISNORMAL:
> >>  CASE_FLT_FN (BUILT_IN_FINITE):
> >>  case BUILT_IN_FINITED32:
> >>  case BUILT_IN_FINITED64:
> >> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> >> index 5730bda80dc..67407fad37d 100644
> >> --- a/gcc/doc/md.texi
> >> +++ b/gcc/doc/md.texi
> >> @@ -8557,6 +8557,15 @@ operand 2, greater than operand 2 or is unordered with operand 2.
> >>
> >>  This pattern is not allowed to @code{FAIL}.
> >>
> >> +@cindex @code{isfinite@var{m}2} instruction pattern
> >> +@item @samp{isfinite@var{m}2}
> >> +Set operand 0 to nonzero if operand 1 is a finite @code{SFmode},
> >> +@code{DFmode}, or @code{TFmode} floating point number and to 0
> >
> > It should probably say scalar floating-point mode?  But what about the result?
> > Is any integer mode OK?  That's esp. important if this might be used on
> > vector modes.
> >
> >> +otherwise.
> >> +
> >> +If this pattern @code{FAIL}, a call to the library function
> >> +@code{isfinite} is used.
> >
> > Or it's otherwise inline expanded?  Or does this imply targets
> > have to make sure to implement the pattern when isfinite is
> > not available in libc/libm?  I suggest to leave this sentence out,
> > we usually only say when a pattern may _not_ FAIL (and usually
> > FAILing isn't different from not providing a pattern).
>
> As in Haochen's previous reply, I think there are three cases:
>   1) no optab defined: fold in a generic way;
>   2) optab defined and it succeeds: expand as it defines;
>   3) optab defined but it FAILs: generate a library call.
>
> Given the above, I was concerned that ports may assume FAILing
> falls back to the generic folding, but it actually doesn't.

Hmm, but it should.  Can you make that work?

> Does your comment imply that ports usually don't make such an assumption
> (or that they just check what happens on FAIL)?
>
> BR,
> Kewen
>
> >
> >>  @end table
> >>
> >>  @end ifset
> >> diff --git a/gcc/optabs.def b/gcc/optabs.def
> >> index ad14f9328b9..dcd77315c2a 100644
> >> --- a/gcc/optabs.def
> >> +++ b/gcc/optabs.def
> >> @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
> >>  OPTAB_D (hypot_optab, "hypot$a3")
> >>  OPTAB_D (ilogb_optab, "ilogb$a2")
> >>  OPTAB_D (isinf_optab, "isinf$a2")
> >> +OPTAB_D (isfinite_optab, "isfinite$a2")
> >>  OPTAB_D (issignaling_optab, "issignaling$a2")
> >>  OPTAB_D (ldexp_optab, "ldexp$a3")
> >>  OPTAB_D (log10_optab, "log10$a2")
>
>
>

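The three cases above, with the fallback Richard asks for (a FAILing pattern behaves like a missing one), can be sketched in C as follows. This is an assumption-laden model, not GCC's expander: `isfinite_expander` and `expand_isfinite` are hypothetical names, and the generic path is written as a quiet range check against DBL_MAX rather than being presented as GCC's exact folding.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Generic inline path: x is finite iff -DBL_MAX <= x <= DBL_MAX.
   NaN compares unordered (false) and infinities fall outside the range.
   The quiet comparison macros avoid raising FE_INVALID for NaN.  */
static bool
generic_isfinite (double x)
{
  return isgreaterequal (x, -DBL_MAX) && islessequal (x, DBL_MAX);
}

/* Hypothetical model of a target expander: returns false to signal FAIL,
   otherwise stores its answer in *RESULT.  NULL models "no pattern".  */
typedef bool (*isfinite_expander) (double x, bool *result);

/* Dispatch as suggested in the review: a FAILing pattern falls back to
   the generic path, exactly as if no pattern were provided.  */
static bool
expand_isfinite (isfinite_expander pattern, double x)
{
  bool result;

  if (pattern != NULL && pattern (x, &result))
    return result;              /* pattern defined and succeeded */
  return generic_isfinite (x);  /* no pattern, or pattern FAILed */
}
```

With this shape, a port never has to worry about what FAIL means for isfinite: it only decides whether its single-instruction sequence applies.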

Re: [PATCH v2] object lifetime instrumentation for Valgrind [PR66487]

2024-05-28 Thread Richard Biener
On Tue, May 28, 2024 at 1:38 PM Alexander Monakov  wrote:
>
>
> On Tue, 28 May 2024, Richard Biener wrote:
>
> > > I am a bit confused what you mean by "cheaper". Could it be that we are not
> > > on the same page regarding the machine code behind client requests?
> > > on the same page regarding the machine code behind client requests?
> >
> > Probably "cheaper" in register usage.
>
> But it doesn't matter considering that execution under Valgrind is about 40x
> slower than native. The intended use is that the project is rebuilt with
> this instrumentation, run under Valgrind, then discarded.
>
> Here's an argument against inlining: it makes breakpointing on the helper
> possible. And it may be actually necessary.
>
> > I also wondered if valgrind is happy with these when applied to stack space
> > allocated in the caller?  Is there means to verify valgrind picks them up
> > appropriately (as opposed to simply ignore them)?
>
> Yes, it works. Exercising this scenario under gcc.dg does not seem easy, though.
>
> > No idea ;)  But the same argument applies when libgcc from newer compilers
> > suddenly change that "ABI" because the valgrind version built against changes?
>
> This was raised previously with Jakub. I find it implausible that Valgrind
> folks will make incompatible changes to the client request ABI (they know to
> keep old requests working when enhancing the interface).

OK, I see.

> > > What about linking a new library with that helper?
> >
> > I guess that would work for me (a static library, that is).  Ideally valgrind
> > itself would provide it so it's clear it's tied to the valgrind version rather
> > than to a GCC version.
>
> How about packaging all of this separately as a plugin?

Well, sure - but of course I think our plugin API is broken and I'd rather have
such a feature in-tree.  It possibly makes sense for _valgrind_ to host such
a plugin, not so much for GCC itself (because then, just build it in-tree).

As said, I'm nervous about libgcc, everything else is OK I think (didn't look
into the pass in detail yet, but I trust you here).

Richard.

> Thanks.
> Alexander


Re: [PATCH V2] Reduce cost of MEM (A + imm).

2024-05-28 Thread Uros Bizjak
On Tue, May 28, 2024 at 12:48 PM liuhongt  wrote:
>
> > IMO, there is no need for CONST_INT_P condition, we should also allow
> > symbol_ref, label_ref and const (all allowed by
> > x86_64_immediate_operand predicate), these all decay to an immediate
> > value.
>
> Changed.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk.
>
> For MEM, rtx_cost iterates over each subrtx and adds up the costs,
> so for MEM (reg) and MEM (reg + 4), the former costs 5 and
> the latter costs 9, which is not accurate for x86.  Ideally
> address_cost should be used, but it reduces the cost too much.
> So the current solution is to make a constant displacement as cheap as possible.
>
> gcc/ChangeLog:
>
> PR target/67325
> * config/i386/i386.cc (ix86_rtx_costs): Reduce cost of MEM (A
> + imm) to "cost of MEM (A)" + 1.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr67325.c: New test.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.cc | 18 +-
>  gcc/testsuite/gcc.target/i386/pr67325.c |  7 +++
>  2 files changed, 24 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67325.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 3e2a3a194f1..85d87b9f778 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22194,7 +22194,23 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
>/* An insn that accesses memory is slightly more expensive
>   than one that does not.  */
>if (speed)
> -*total += 1;
> +   {
> + *total += 1;
> + rtx addr = XEXP (x, 0);
> + /* For MEM, rtx_cost iterates over each subrtx and adds up the costs,
> +so for MEM (reg) and MEM (reg + 4), the former costs 5 and
> +the latter costs 9, which is not accurate for x86.  Ideally
> +address_cost should be used, but it reduces the cost too much.
> +So the current solution is to make a constant disp as cheap as possible.  */
> + if (GET_CODE (addr) == PLUS
> + && x86_64_immediate_operand (XEXP (addr, 1), Pmode))
> +   {
> + *total += 1;
> + *total += rtx_cost (XEXP (addr, 0), Pmode, PLUS, 0, speed);
> + return true;
> +   }
> +   }
> +
>return false;
>
>  case ZERO_EXTRACT:
> diff --git a/gcc/testsuite/gcc.target/i386/pr67325.c b/gcc/testsuite/gcc.target/i386/pr67325.c
> new file mode 100644
> index 000..c3c1e4c5b4d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr67325.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not "(?:sar|shr)" } } */
> +
> +int f(long*l){
> +  return *l>>32;
> +}
> --
> 2.31.1
>

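The cost arithmetic in the commit message can be modelled with a toy RTL walk. The per-node costs below are hypothetical, chosen only so that the generic walk reproduces the quoted figures (MEM (reg) = 5, MEM (reg + 4) = 9); the patched rule then charges MEM (A + imm) as the MEM itself plus 1 plus the cost of A, matching the ChangeLog's "cost of MEM (A)" + 1. Only CONST_INT displacements are modelled, whereas the real check accepts anything satisfying x86_64_immediate_operand.

```c
#include <assert.h>
#include <stddef.h>

/* Toy RTL: only the node kinds needed for MEM (reg) and MEM (reg + imm).  */
enum toy_rcode { TOY_REG, TOY_CONST_INT, TOY_PLUS, TOY_MEM };

struct toy_rtx
{
  enum toy_rcode code;
  const struct toy_rtx *op0, *op1;
};

/* Hypothetical per-node costs, picked only so the generic walk gives
   MEM (reg) = 5 and MEM (reg + 4) = 9 as in the commit message.  */
static int
toy_node_cost (enum toy_rcode code)
{
  switch (code)
    {
    case TOY_MEM: return 1;
    case TOY_REG: return 4;
    case TOY_PLUS: return 4;
    default: return 0;
    }
}

/* Generic cost: the node's own cost plus the cost of every sub-rtx.  */
static int
toy_generic_cost (const struct toy_rtx *x)
{
  int total;

  assert (x != NULL);
  total = toy_node_cost (x->code);
  if (x->op0)
    total += toy_generic_cost (x->op0);
  if (x->op1)
    total += toy_generic_cost (x->op1);
  return total;
}

/* Patched cost, modelled: for MEM (A + imm), charge the MEM itself,
   one extra unit for the displacement, and the cost of A only.  */
static int
toy_patched_cost (const struct toy_rtx *x)
{
  assert (x != NULL);
  if (x->code == TOY_MEM && x->op0->code == TOY_PLUS
      && x->op0->op1->code == TOY_CONST_INT)
    return toy_node_cost (TOY_MEM) + 1 + toy_generic_cost (x->op0->op0);
  return toy_generic_cost (x);
}
```

With these numbers the patched cost of MEM (reg + 4) is 6, one more than MEM (reg).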

Re: [PATCH v2] object lifetime instrumentation for Valgrind [PR66487]

2024-05-28 Thread Alexander Monakov


On Tue, 28 May 2024, Richard Biener wrote:

> > I am a bit confused what you mean by "cheaper". Could it be that we are not
> > on the same page regarding the machine code behind client requests?
> 
> Probably "cheaper" in register usage.

But it doesn't matter considering that execution under Valgrind is about 40x
slower than native. The intended use is that the project is rebuilt with
this instrumentation, run under Valgrind, then discarded.

Here's an argument against inlining: it makes breakpointing on the helper
possible. And it may be actually necessary.

> I also wondered if valgrind is happy with these when applied to stack space
> allocated in the caller?  Is there means to verify valgrind picks them up
> appropriately (as opposed to simply ignore them)?

Yes, it works. Exercising this scenario under gcc.dg does not seem easy, though.

> No idea ;)  But the same argument applies when libgcc from newer compilers
> suddenly change that "ABI" because the valgrind version built against changes?

This was raised previously with Jakub. I find it implausible that Valgrind
folks will make incompatible changes to the client request ABI (they know to
keep old requests working when enhancing the interface).

> > What about linking a new library with that helper?
> 
> I guess that would work for me (a static library, that is).  Ideally valgrind
> itself would provide it so it's clear it's tied to the valgrind version rather
> than to a GCC version.

How about packaging all of this separately as a plugin?

Thanks.
Alexander

