Re: [PATCH v8] Introduce attribute sym_alias

2024-05-24 Thread Alexandre Oliva
On Dec  5, 2023, Alexandre Oliva  wrote:

> Here's an improved version that fixes some cases of making static local
> names visible through sym_alias, detection of symbol name clashes when
> sym_alias is registered before a clashing definition ("sym name"
> attributes are now introduced to enable sym_alias-created declarations
> to be identified), and aliases to typeinfo sym_alias names, with tests
> adjusted and extended to match.

> I've retained comments for the desired create_alias calls that didn't
> work, instead of create_same_body_alias for functions, and for the
> create_extra_name_alias calls I used to issue for variables, mainly to
> draw attention to the fact that some of these calls, found undesirable
> in earlier iterations, are still there, hoping that we can keep them
> this way rather than work out some other way to introduce this feature.

> Regstrapped on x86_64-linux-gnu, also tested on arm-eabi.  Ok to
> install?

Ping?

Refreshed (minor conflict resolution) for GCC 15, retested.


Introduce attribute sym_alias

This patch introduces an attribute to add extra asm names (aliases)
for a decl when its definition is output.  The main goal is to ease
interfacing C++ with Ada, as C++ mangled names have to be named, and
in some cases (e.g. when using stdint.h typedefs in function
arguments) the symbol names may vary across platforms.

The attribute is usable in C and C++, presumably in all C-family
languages.  It can be attached to global variables and functions, and
also to local static variables.  In C++, it can also be attached to
class types, namespace-scoped variables and functions, static data
members, member functions, explicit instantiations and specializations
of template functions, members and classes.

When applied to constructors or destructors, additional sym aliases
with _Base and _Del suffixes are defined for variants other than
complete-object ones.  This changes the assumption that clones always
carry the same attributes as their abstract declarations, so there is
now a function to adjust them.

C++ also had a bug in which attributes from local extern declarations
failed to be propagated to a preexisting corresponding
namespace-scoped decl.  I've fixed that, and adjusted acc tests that
distinguished between C and C++ in this regard.

Applying the attribute to class types is only valid in C++, and the
effect is to attach the alias to the RTTI object associated with the
class type.
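
As a rough illustration of the intended use on a plain function, here
is a minimal C sketch.  It assumes a compiler with this patch applied;
the SYM_ALIAS macro, the function add_u32 and the alias name
demo_add_u32 are made up for the example.  The __has_attribute guard
only lets the snippet build unchanged on compilers that do not know
the attribute, in which case no extra symbol is emitted.

```c
#include <stdint.h>

/* Use sym_alias only where the compiler recognizes it (assumes this
   patch); elsewhere the macro expands to nothing.  */
#if defined(__has_attribute)
# if __has_attribute(sym_alias)
#  define SYM_ALIAS(name) __attribute__((sym_alias(name)))
# endif
#endif
#ifndef SYM_ALIAS
# define SYM_ALIAS(name)
#endif

/* Hypothetical function.  In C++ its mangled name would depend on
   what uint32_t is a typedef for on the target; the alias gives other
   languages a fixed symbol name to import instead.  */
SYM_ALIAS("demo_add_u32")
uint32_t add_u32(uint32_t a, uint32_t b)
{
    return a + b;
}
```

With the attribute available, the object file would be expected to
define demo_add_u32 alongside the function's usual symbol.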


for  gcc/ChangeLog

* attribs.cc: Include cgraph.h.
(decl_attributes): Allow late introduction of sym_alias in
types.
(create_sym_alias_decl, create_sym_alias_decls): New.
* attribs.h: Declare them.
(FOR_EACH_SYM_ALIAS): New macro.
* cgraph.cc (cgraph_node::create): Create sym_alias decls.
* varpool.cc (varpool_node::get_create): Create sym_alias
decls.
* cgraph.h (symtab_node::remap_sym_alias_target): New.
* symtab.cc (symtab_node::remap_sym_alias_target): Define.
(symbol_table::insert_to_assembler_name_hash): Check for
symbol name clashes.
(symtab_node::noninterposable_alias): Drop sym_alias
attributes.
* cgraphunit.cc (cgraph_node::analyze): Create alias_target
node if needed.
(analyze_functions): Fixup visibility of implicit alias only
after its node is analyzed.
* doc/extend.texi (sym_alias): Document for variables,
functions and types.

for  gcc/ada/ChangeLog

* doc/gnat_rm/interfacing_to_other_languages.rst: Mention
attribute sym_alias to give RTTI symbols mnemonic names.
* doc/gnat_ugn/the_gnat_compilation_model.rst: Mention
aliases.  Fix incorrect ref to C1 ctor variant.

for  gcc/c-family/ChangeLog

* c-ada-spec.cc (pp_asm_name): Use first sym_alias if
available.
* c-attribs.cc (handle_sym_alias_attribute): New.
(c_common_attribute_table): Add sym_alias.
(handle_copy_attribute): Do not copy sym_alias attribute.

for  gcc/c/ChangeLog

* c-decl.cc (duplicate_decls): Remap sym_alias target.
(finish_decl): Create varpool_node for local static
variables.

for  gcc/cp/ChangeLog

* class.cc (adjust_clone_attributes): New.
(copy_fndecl_with_name, build_clone): Call it.
* cp-tree.h (adjust_clone_attributes): Declare.
(update_sym_alias_interface): Declare.
(update_tinfo_sym_alias): Declare.
* decl.cc (duplicate_decls): Remap sym_alias target.
Adjust clone attributes.
(grokfndecl): Tentatively create sym_alias decls after
adding attributes in e.g. a template member function explicit
instantiation.
* decl2.cc (cplus_decl_attributes): Update tinfo sym_alias.
(copy_interface, update_sym_alias_interface): New.
(determine_visibility): Update sym_alias interface.
(tentative_decl_linkage, import_export_decl): Likewise.
* 

Re: [RFC/RFA][PATCH 00/12] CRC optimization

2024-05-24 Thread Jeff Law




On 5/24/24 2:41 AM, Mariam Arutunian wrote:

Hello!

This patch set detects bitwise CRC implementation loops (with branches) 
in the GIMPLE optimizers and replaces them with more optimal CRC 
implementations in RTL. These patches introduce new internal functions, 
built-in functions, and expanders for CRC generation, leveraging 
hardware instructions where available. Additionally, various tests are 
included to check CRC detection and generation.

Thanks so much for getting this process started.  It's a bit quicker
than I was ready for, but no worries.





 2. Architecture-Specific Expanders:

  * Expanders are added for RISC-V, aarch64, and i386 architectures.
  * These expanders generate CRCs using either carry-less
    multiplication instructions or direct CRC instructions, based on
    the target architecture's capabilities.

Also note for the wider audience, this work can also generate table 
lookup based CRC implementations.  This has proven exceedingly helpful 
during the testing phase as we were able to run this code on a wide 
variety of the embedded targets to shake out target dependencies.


On Ventana's V1 design the clmul variant was a small, but clear winner 
over the table lookup.  Obviously the bitwise implementation found in 
coremark was the worst performing.


On our V2 design clmul outperforms the table lookup by a wide margin, 
largely due to reduced latency of clmul.
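
For readers unfamiliar with the variants being compared, the sketch
below shows a bitwise CRC loop with a branch per bit, and a
table-lookup equivalent.  It is only an illustration: the CRC-8
polynomial 0x07 and all function names are assumptions chosen for
brevity, not the coremark code or the code the patch set generates.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-8 (polynomial 0x07, MSB first, initial value 0).
   The inner loop with its per-bit branch is the general shape of a
   bitwise CRC implementation.  */
uint8_t crc8_bitwise(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x80)
                crc = (uint8_t)((crc << 1) ^ 0x07);
            else
                crc = (uint8_t)(crc << 1);
        }
    }
    return crc;
}

static uint8_t crc8_table[256];

/* Precompute the CRC of every possible byte value.  */
void crc8_init_table(void)
{
    for (int v = 0; v < 256; v++) {
        uint8_t b = (uint8_t)v;
        crc8_table[v] = crc8_bitwise(&b, 1);
    }
}

/* Table-lookup variant: one load per input byte instead of eight
   shift/xor steps.  Requires crc8_init_table() to have run.  */
uint8_t crc8_table_lookup(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++)
        crc = crc8_table[crc ^ data[i]];
    return crc;
}
```

Both functions compute the same value; the table version trades 256
bytes of data for the per-bit loop.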



Jeff


Re: [PATCH v2] RISC-V: Introduce -mrvv-allow-misalign.

2024-05-24 Thread Palmer Dabbelt

On Fri, 24 May 2024 16:50:52 PDT (-0700), jeffreya...@gmail.com wrote:



On 5/24/24 5:43 PM, Palmer Dabbelt wrote:



I'm only reading Zicclsm as saying both scalar and vector misaligned
accesses are supported, but nothing about the performance.

I think it was in the vector docs.  It didn't say anything about
performance, just a note that scalar & vector behavior could differ.


Either way, the split naming scheme seems clearer to me.  It also avoids
getting mixed up by the no-scalar-misaligned, yes-vector-misaligned
systems if they ever show up.

So if Robin's OK with re-spinning things, let's just go that way?

Works for me.  Hopefully he's offline until Monday as it's rather late
for him :-)  So we'll pick it back up in the Tuesday meeting.


Cool, no rush on my end.



jeff


Re: [PATCH v2] RISC-V: Introduce -mrvv-allow-misalign.

2024-05-24 Thread Jeff Law




On 5/24/24 5:43 PM, Palmer Dabbelt wrote:



I'm only reading Zicclsm as saying both scalar and vector misaligned
accesses are supported, but nothing about the performance.

I think it was in the vector docs.  It didn't say anything about
performance, just a note that scalar & vector behavior could differ.


Either way, the split naming scheme seems clearer to me.  It also avoids 
getting mixed up by the no-scalar-misaligned, yes-vector-misaligned 
systems if they ever show up.


So if Robin's OK with re-spinning things, let's just go that way?
Works for me.  Hopefully he's offline until Monday as it's rather late 
for him :-)  So we'll pick it back up in the Tuesday meeting.


jeff



Re: [PATCH v2] RISC-V: Introduce -mrvv-allow-misalign.

2024-05-24 Thread Palmer Dabbelt

On Fri, 24 May 2024 16:41:39 PDT (-0700), jeffreya...@gmail.com wrote:



On 5/24/24 5:39 PM, Palmer Dabbelt wrote:

On Fri, 24 May 2024 16:31:48 PDT (-0700), jeffreya...@gmail.com wrote:



On 5/24/24 11:14 AM, Palmer Dabbelt wrote:

On Fri, 24 May 2024 09:19:09 PDT (-0700), Robin Dapp wrote:

We should have something in doc/invoke too, this one is going to be
tricky for users.  We'll also have to define how this interacts with
the existing -mstrict-align.


Addressed the rest in the attached v2 which also fixes tests.
I'm really not sure about -mstrict-align.  I would have hoped that
using
-mstrict-align we'd never run into any movmisalign situation but that
might be wishful thinking.  Do we need to specify an
interaction, though?  For now the new option disables movmisalign so
if we hit that despite -mstrict-align we'd still not vectorize it.


I think we just need to write it down.  I think there's two ways to
encode this: either we treat scalar and vector as independent, or we
couple them.  If we treat them independently then we end up with four
cases, it's not clear if they're all interesting.  IIUC with this patch
we'd be able to encode

Given the ISA documents them as independent, I think we should follow
suit and allow them to vary independently.


I'm only reading Zicclsm as saying both scalar and vector misaligned
accesses are supported, but nothing about the performance.

I think it was in the vector docs.  It didn't say anything about
performance, just a note that scalar & vector behavior could differ.


Either way, the split naming scheme seems clearer to me.  It also avoids 
getting mixed up by the no-scalar-misaligned, yes-vector-misaligned 
systems if they ever show up.


So if Robin's OK with re-spinning things, let's just go that way?


Seems reasonable to me.  Just having a regular naming scheme for the
scalar/vector makes it clear what we're doing, and it's not like having
the extra name for -mscalar-strict-align really costs anything.

That was my thinking -- getting the names right should help avoid confusion.

Jeff


Re: [PATCH v2] RISC-V: Introduce -mrvv-allow-misalign.

2024-05-24 Thread Jeff Law




On 5/24/24 5:39 PM, Palmer Dabbelt wrote:

On Fri, 24 May 2024 16:31:48 PDT (-0700), jeffreya...@gmail.com wrote:



On 5/24/24 11:14 AM, Palmer Dabbelt wrote:

On Fri, 24 May 2024 09:19:09 PDT (-0700), Robin Dapp wrote:

We should have something in doc/invoke too, this one is going to be
tricky for users.  We'll also have to define how this interacts with
the existing -mstrict-align.


Addressed the rest in the attached v2 which also fixes tests.
I'm really not sure about -mstrict-align.  I would have hoped that 
using

-mstrict-align we'd never run into any movmisalign situation but that
might be wishful thinking.  Do we need to specify an
interaction, though?  For now the new option disables movmisalign so
if we hit that despite -mstrict-align we'd still not vectorize it.


I think we just need to write it down.  I think there's two ways to
encode this: either we treat scalar and vector as independent, or we
couple them.  If we treat them independently then we end up with four
cases, it's not clear if they're all interesting.  IIUC with this patch
we'd be able to encode

Given the ISA documents them as independent, I think we should follow
suit and allow them to vary independently.


I'm only reading Zicclsm as saying both scalar and vector misaligned 
accesses are supported, but nothing about the performance.
I think it was in the vector docs.  It didn't say anything about 
performance, just a note that scalar & vector behavior could differ.






Seems reasonable to me.  Just having a regular naming scheme for the 
scalar/vector makes it clear what we're doing, and it's not like having 
the extra name for -mscalar-strict-align really costs anything.

That was my thinking -- getting the names right should help avoid confusion.

Jeff


Re: [PATCH v2] RISC-V: Introduce -mrvv-allow-misalign.

2024-05-24 Thread Palmer Dabbelt

On Fri, 24 May 2024 16:31:48 PDT (-0700), jeffreya...@gmail.com wrote:



On 5/24/24 11:14 AM, Palmer Dabbelt wrote:

On Fri, 24 May 2024 09:19:09 PDT (-0700), Robin Dapp wrote:

We should have something in doc/invoke too, this one is going to be
tricky for users.  We'll also have to define how this interacts with
the existing -mstrict-align.


Addressed the rest in the attached v2 which also fixes tests.
I'm really not sure about -mstrict-align.  I would have hoped that using
-mstrict-align we'd never run into any movmisalign situation but that
might be wishful thinking.  Do we need to specify an
interaction, though?  For now the new option disables movmisalign so
if we hit that despite -mstrict-align we'd still not vectorize it.


I think we just need to write it down.  I think there's two ways to
encode this: either we treat scalar and vector as independent, or we
couple them.  If we treat them independently then we end up with four
cases, it's not clear if they're all interesting.  IIUC with this patch
we'd be able to encode

Given the ISA documents them as independent, I think we should follow
suit and allow them to vary independently.


I'm only reading Zicclsm as saying both scalar and vector misaligned 
accesses are supported, but nothing about the performance.



* -mstrict-align: Both scalar and vector misaligned accesses are
  unsupported (-mrvv-allow-misalign doesn't matter).  I'm not sure if
  there's hardware there, but given we have systems that don't support
  scalar misaligned accesses it seems reasonable to assume they'll also
  not support vector misaligned accesses.
* -mno-strict-align -mno-rvv-allow-misalign: Scalar misaligned are
  supported, vector misaligned aren't supported.  This matches our best
  theory of how the k230 and k1 behave, so it also seems reasonable to
  support.
* -mno-strict-align -mrvv-allow-misalign: Both scalar and vector
  misaligned accesses are supported.  This seems reasonable to support
  as it's how I'd hope big cores end up being designed, though again
  there's no hardware.

I'd almost lean towards -m[no-]scalar-strict-align and
-m[no-]vector-strict-align and deprecate -mstrict-align (aliasing it to
the scalar alignment option).  But I'll go with consensus here.


Seems reasonable to me.  Just having a regular naming scheme for the 
scalar/vector makes it clear what we're doing, and it's not like having 
the extra name for -mscalar-strict-align really costs anything.



The fourth case is kind of wacky: scalar misaligned is unsupported,
vector misaligned is supported.  I'm not really sure why we'd end up
with a system like that, but HW vendors do wacky things so it's kind of
hard to predict.

I've worked on one of these :-)  The thinking from the designers was
unaligned scalar access just wasn't that important, particularly with
mem* and str* using the vector rather than scalar ops.


OK then ;)
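
The four combinations discussed in this message can be sketched as a
small C model.  This is only a way to tabulate the reasoning: the
struct and function names are invented here, and the "coupled"
function encodes the reading given above, in which -mstrict-align
makes -mrvv-allow-misalign irrelevant.

```c
#include <stdbool.h>

/* Which kinds of misaligned access the compiler may assume work.  */
struct misalign_caps {
    bool scalar_ok;
    bool vector_ok;
};

/* Coupled scheme: -mstrict-align disables both kinds, and
   -mrvv-allow-misalign only matters under -mno-strict-align.  */
struct misalign_caps coupled_scheme(bool strict_align,
                                    bool rvv_allow_misalign)
{
    struct misalign_caps c;
    c.scalar_ok = !strict_align;
    c.vector_ok = !strict_align && rvv_allow_misalign;
    return c;
}

/* Split scheme: -m[no-]scalar-strict-align and
   -m[no-]vector-strict-align vary independently.  */
struct misalign_caps split_scheme(bool scalar_strict_align,
                                  bool vector_strict_align)
{
    struct misalign_caps c;
    c.scalar_ok = !scalar_strict_align;
    c.vector_ok = !vector_strict_align;
    return c;
}

/* True if some flag combination of the coupled scheme yields the
   requested pair of capabilities.  */
bool coupled_can_express(bool scalar_ok, bool vector_ok)
{
    for (int s = 0; s < 2; s++)
        for (int r = 0; r < 2; r++) {
            struct misalign_caps c = coupled_scheme(s != 0, r != 0);
            if (c.scalar_ok == scalar_ok && c.vector_ok == vector_ok)
                return true;
        }
    return false;
}
```

Under this model the fourth case (scalar unsupported, vector
supported) is the only pair the coupled scheme cannot express, while
the split scheme covers all four.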


Re: [PATCH] RISC-V: Avoid splitting store dataref groups during SLP discovery

2024-05-24 Thread Jeff Law




On 5/23/24 11:52 PM, Richard Biener wrote:



This worked out so I pushed the change.  The gcc.dg/vect/pr97428.c
test is FAILing on RISC-V (it still gets 0 SLP), because of missed
load permutations.  I hope the followup reorg for the load side will
fix this.  It also FAILs gcc.target/riscv/rvv/autovec/struct/struct_vect-4.c
which does excessive assembly scanning on many functions - I'll leave
this for target maintainers to update - there's one or two functions
which we now expect to SLP.
Yea, folks got a bit carried away with the scan body capability. 
Someone will have to follow-up behind you and clean this up a bit.


Thanks for checking it against the CI system.  While it's a bit on the
slow side, we are finding it's helping catch real issues and keeping the
testsuite cleaner WRT FAILs.


jeff



Re: [PATCH v2] RISC-V: Introduce -mrvv-allow-misalign.

2024-05-24 Thread Jeff Law




On 5/24/24 11:14 AM, Palmer Dabbelt wrote:

On Fri, 24 May 2024 09:19:09 PDT (-0700), Robin Dapp wrote:

We should have something in doc/invoke too, this one is going to be
tricky for users.  We'll also have to define how this interacts with
the existing -mstrict-align.


Addressed the rest in the attached v2 which also fixes tests.
I'm really not sure about -mstrict-align.  I would have hoped that using
-mstrict-align we'd never run into any movmisalign situation but that
might be wishful thinking.  Do we need to specify an
interaction, though?  For now the new option disables movmisalign so
if we hit that despite -mstrict-align we'd still not vectorize it.


I think we just need to write it down.  I think there's two ways to 
encode this: either we treat scalar and vector as independent, or we 
couple them.  If we treat them independently then we end up with four 
cases, it's not clear if they're all interesting.  IIUC with this patch 
we'd be able to encode
Given the ISA documents them as independent, I think we should follow 
suit and allow them to vary independently.




* -mstrict-align: Both scalar and vector misaligned accesses are 
  unsupported (-mrvv-allow-misalign doesn't matter).  I'm not sure if 
  there's hardware there, but given we have systems that don't support 
  scalar misaligned accesses it seems reasonable to assume they'll also 
  not support vector misaligned accesses.
* -mno-strict-align -mno-rvv-allow-misalign: Scalar misaligned are 
  supported, vector misaligned aren't supported.  This matches our best 
  theory of how the k230 and k1 behave, so it also seems reasonable to 
  support.
* -mno-strict-align -mrvv-allow-misalign: Both scalar and vector 
  misaligned accesses are supported.  This seems reasonable to support 
  as it's how I'd hope big cores end up being designed, though again 
  there's no hardware.
I'd almost lean towards -m[no-]scalar-strict-align and 
-m[no-]vector-strict-align and deprecate -mstrict-align (aliasing it to 
the scalar alignment option).  But I'll go with consensus here.




The fourth case is kind of wacky: scalar misaligned is unsupported, 
vector misaligned is supported.  I'm not really sure why we'd end up 
with a system like that, but HW vendors do wacky things so it's kind of 
hard to predict.
I've worked on one of these :-)  The thinking from the designers was 
unaligned scalar access just wasn't that important, particularly with 
mem* and str* using the vector rather than scalar ops.


jeff





[pushed] wwwdocs: gcc-13: Run time instead of runtime

2024-05-24 Thread Gerald Pfeifer
Per our codingconventions.html 

Gerald
---
 htdocs/gcc-13/changes.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index d431c768..2702170d 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -62,9 +62,9 @@ You may also want to check out our
   includes the iostream header. This change
   improves the start-up performance of C++ programs, but it means that
   code compiled with GCC 13.1 will crash if the correct version of
-  libstdc++.so is not used at runtime. See the
+  libstdc++.so is not used at run time. See the
   https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dynamic_or_shared.html#manual.intro.using.linkage.dynamic;>documentation
-  about using the right libstdc++.so at runtime.
+  about using the right libstdc++.so at run time.
   Future GCC releases will mitigate the problem so that the program
   cannot be run at all with an older libstdc++.so.
 
-- 
2.45.0


Re: [PATCH 10/13] rs6000, extend vec_xxpermdi built-in for __int128 args

2024-05-24 Thread Carl Love



On 5/13/24 22:14, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:18, Carl Love wrote:
>> rs6000, extend vec_xxpermdi built-in for __int128 args
>>
>> Add a new overloaded instance for vec_xxpermdi
>>
>>__int128 vec_xxpermdi (__int128, __int128, const int);
>>
>> Update the documentation to include a reference to the new built-in
>> instance.
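
For context, a portable C sketch of the doubleword selection that the
xxpermdi instruction performs is given below.  It is a model, not the
built-in: the struct and function names are invented, doublewords are
numbered as in the ISA (big-endian), and the element order seen through
vec_xxpermdi on little-endian targets may differ.  For a vector
__int128 argument, the two doublewords are the halves of the single
128-bit element.

```c
#include <stdint.h>

/* A 128-bit vector register viewed as two 64-bit doublewords;
   dw[0] is doubleword 0 in the ISA's numbering.  */
struct v2dw {
    uint64_t dw[2];
};

/* Model of xxpermdi: the high bit of the 2-bit selector picks a
   doubleword of A for result doubleword 0, the low bit picks a
   doubleword of B for result doubleword 1.  */
struct v2dw xxpermdi_model(struct v2dw a, struct v2dw b, int dm)
{
    struct v2dw r;
    r.dw[0] = a.dw[(dm >> 1) & 1];
    r.dw[1] = b.dw[dm & 1];
    return r;
}
```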
>>
>> gcc/ChangeLog:
>> * config/rs6000/rs6000-builtins.def (vec_xxpermdi): Add new
>>  overloaded built-in instance.
>> ---
>>  gcc/config/rs6000/rs6000-overload.def | 2 ++
>>  gcc/doc/extend.texi   | 1 +
>>  2 files changed, 3 insertions(+)
>>
>> diff --git a/gcc/config/rs6000/rs6000-overload.def 
>> b/gcc/config/rs6000/rs6000-overload.def
>> index 5912c9452f4..49962e2f2a2 100644
>> --- a/gcc/config/rs6000/rs6000-overload.def
>> +++ b/gcc/config/rs6000/rs6000-overload.def
>> @@ -4932,6 +4932,8 @@
>>  XXPERMDI_4SF  XXPERMDI_VF
>>vd __builtin_vsx_xxpermdi (vd, vd, const int);
>>  XXPERMDI_2DF  XXPERMDI_VD
>> +  vsq __builtin_vsx_xxpermdi (vsq, vsq, const int);
>> +XXPERMDI_1TI  XXPERMDI_1TI
> 
> This actually introduces the signed __int128, considering the other
> existing ones, I think we want both signed and unsigned.

Added unsigned as well.

> 
>>  
>>  [VEC_XXSLDWI, vec_xxsldwi, __builtin_vsx_xxsldwi]
>>vsc __builtin_vsx_xxsldwi (vsc, vsc, const int);
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index 86b8e536dbe..47cf2f3bc8b 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -22505,6 +22505,7 @@ void vec_vsx_st (vector bool char, int, vector bool 
>> char *);
>>  void vec_vsx_st (vector bool char, int, unsigned char *);
>>  void vec_vsx_st (vector bool char, int, signed char *);
>>  
>> +vector __int128 vec_xxpermdi (vector __int128, vector __int128, const int);
>>  vector double vec_xxpermdi (vector double, vector double, const int);
>>  vector float vec_xxpermdi (vector float, vector float, const int);
> 
> Nit: Considering the existing ones sorted by element size descending, I guess
> it's better to move the above here (and with the explicit signed and 
> unsigned).

OK, moved the new prototype down below the float prototype and added the 
unsigned prototype.
> 
> And we need a test case for it as well?
Yes, we need a test case for both.  Added a new runnable test file.

   Carl 


Re: [PATCH 8/13] rs6000, remove __builtin_vsx_vperm_* built-ins

2024-05-24 Thread Carl Love
Kewen:

On 5/13/24 19:59, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:18, Carl Love wrote:



>> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
>> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>> index 01f35dad713..35ea31b2616 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>> @@ -2,7 +2,6 @@
>>  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
>>  /* { dg-require-effective-target powerpc_vsx_ok } */
>>  /* { dg-options "-O2 -mdejagnu-cpu=power7" } */
>> -/* { dg-final { scan-assembler "vperm" } } */
>>  /* { dg-final { scan-assembler "xvrdpi" } } */
>>  /* { dg-final { scan-assembler "xvrdpic" } } */
>>  /* { dg-final { scan-assembler "xvrdpim" } } */
>> @@ -56,25 +55,6 @@ extern __vector unsigned long long ull[][4];
>>  extern __vector __bool long bl[][4];
>>  #endif
>>  
>> -int do_perm(void)
>> -{
>> -  int i = 0;
>> -
>> -  si[i][0] = __builtin_vsx_vperm_4si (si[i][1], si[i][2], uc[i][3]); i++;
>> -  ss[i][0] = __builtin_vsx_vperm_8hi (ss[i][1], ss[i][2], uc[i][3]); i++;
>> -  sc[i][0] = __builtin_vsx_vperm_16qi (sc[i][1], sc[i][2], uc[i][3]); i++;
>> -  f[i][0] = __builtin_vsx_vperm_4sf (f[i][1], f[i][2], uc[i][3]); i++;
>> -  d[i][0] = __builtin_vsx_vperm_2df (d[i][1], d[i][2], uc[i][3]); i++;
>> -
>> -  si[i][0] = __builtin_vsx_vperm (si[i][1], si[i][2], uc[i][3]); i++;
>> -  ss[i][0] = __builtin_vsx_vperm (ss[i][1], ss[i][2], uc[i][3]); i++;
>> -  sc[i][0] = __builtin_vsx_vperm (sc[i][1], sc[i][2], uc[i][3]); i++;
>> -  f[i][0] = __builtin_vsx_vperm (f[i][1], f[i][2], uc[i][3]); i++;
>> -  d[i][0] = __builtin_vsx_vperm (d[i][1], d[i][2], uc[i][3]); i++;
>> -
>> -  return i;
>> -}
>> -
> 
> I prefer to just replace these __builtin_vsx_vperm with vec_perm,
> OK with this tweaked (also keep the above removed vperm scan), thanks!

OK, sounds good.  Updated the patch to change built-in calls to vec_perm.  
Updated ChangeLog message to match change.
   
 Carl 


Re: [PATCH 6/13] rs6000, add overloaded vec_sel with int128 arguments

2024-05-24 Thread Carl Love
Kewen:

On 5/21/24 20:05, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2024/5/22 08:13, Carl Love wrote:
>> Kewen:



>>> Why did you place this in a section for ISA 3.1 (Power10)?  It doesn't 
>>> really
>>> require this support.  The used instance VSEL_1TI and VSEL_1TI_UNS are 
>>> placed
>>> in altivec stanza, so it looks that we should put it under the section
>>> "PowerPC AltiVec Built-in Functions on ISA 2.05".  And since it's an 
>>> extension
>>> of @code{vec_sel} documented in the PVIPR, I prefer to just mention it's "an
>>> extension of the @code{vec_sel} built-in documented in the PVIPR" and 
>>> omitting
>>> the description to avoid possible slightly different wording.
>>
>> Honestly, at this point in time I don't remember why I put it there.  It has 
>> been too long since I created the patch.  That said, the test case requires 
>> Power 10 due to the comparison check using built-in vec_all_eq but that is
>> another issue.  
>> The built-in generates the xxsel instruction that is an ISA 2.06 
>> instruction.  So, I would say it should go into the ISA 2.06 section.  I
>> moved it to the ISA 2.06 section.
> 
> But the underlying implementation is:
> 
>   const vsq __builtin_altivec_vsel_1ti (vsq, vsq, vuq);
> VSEL_1TI vector_select_v1ti {}
> 
>   const vuq __builtin_altivec_vsel_1ti_uns (vuq, vuq, vuq);
> VSEL_1TI_UNS vector_select_v1ti_uns {}
> 
> , it's under altivec stanza and can result with insn vsel (so not xxsel),
> vsel is ISA 2.03, so I think ISA 2.05 better matches the implementation.

OK, moved to ISA 2.05

> 



>>
>> Sounds like there was some issue that you noticed on 
>> r14-10011-g6e62ede7aaccc6.  The new version of
>> print_i128 should be functionally equivalent but perhaps is "safer"?
> 
> Thanks for checking!  Looking into this more closely, I realized you didn't 
> apply the previously
> adopted way for printing (the way used in 
> gcc.target/powerpc/builtins-6-p9-runnable.c), sorry for
> the false alarm!  So your supposed print_i128 is fine to me.

OK, no problem.  Will go with the original print_i128 function.

Carl 


Re: [PATCH 11/13] rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in

2024-05-24 Thread Carl Love



On 5/13/24 22:26, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:18, Carl Love wrote:
>> rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in
>>
>> The built-in __builtin_vsx_xvcmpeqsp_p is a duplicate of the overloaded
>> __builtin_altivec_vcmpeqfp_p built-in.  The built-in is undocumented and
>> there are no test cases for it.  The patch removes built-in
>> __builtin_vsx_xvcmpeqsp_p.
> As the previous review comments in the v1 (this is actually v2):
> https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646728.html
> , both __builtin_vsx_xvcmpeqsp_p and __builtin_vsx_xvcmpeqsp can be
> dropped, so please consider __builtin_vsx_xvcmpeqsp as well.

Yes, as you noted, __builtin_vsx_xvcmpeqsp is removed in the next patch.
> 
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-builtin.cc (case RS6000_BIF_RSQRT):
>>  Remove case statement.
> 
> It seems you mixed this with some other patch, this line doesn't
> belong to this patch, ...

Took that out of this patch.  Didn't get the changes separated cleanly.

> 
>> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp_p):
>>  Remove built-in definition.
>> ---
>>  gcc/config/rs6000/rs6000-builtin.cc   | 6 --
>>  gcc/config/rs6000/rs6000-builtins.def | 6 --
>>  2 files changed, 12 deletions(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
>> b/gcc/config/rs6000/rs6000-builtin.cc
>> index f83d65b06ef..74ed8fc1805 100644
>> --- a/gcc/config/rs6000/rs6000-builtin.cc
>> +++ b/gcc/config/rs6000/rs6000-builtin.cc
>> @@ -269,12 +269,6 @@ rs6000_builtin_md_vectorized_function (tree fndecl, 
>> tree type_out,
>>  = (enum rs6000_gen_builtins) DECL_MD_FUNCTION_CODE (fndecl);
>>switch (fn)
>>  {
>> -case RS6000_BIF_RSQRTF:
>> -  if (VECTOR_UNIT_ALTIVEC_OR_VSX_P (V4SFmode)
>> -  && out_mode == SFmode && out_n == 4
>> -  && in_mode == SFmode && in_n == 4)
>> -return rs6000_builtin_decls[RS6000_BIF_VRSQRTFP];
>> -  break;
> 
> ... and this ...

Ditto

> 
>>  case RS6000_BIF_RSQRT:
>>if (VECTOR_UNIT_VSX_P (V2DFmode)
>>&& out_mode == DFmode && out_n == 2
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index d65c858ac0c..2f6149edd5f 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -917,9 +917,6 @@
>>fpmath vf __builtin_altivec_vrsqrtefp (vf);
>>  VRSQRTEFP rsqrtev4sf2 {}
>>  
>> -  fpmath vf __builtin_altivec_vrsqrtfp (vf);
>> -VRSQRTFP rsqrtv4sf2 {}
>> -
> 
> ..., also this.

Ditto

> 
> BR,
> Kewen
> 
>>const vsc __builtin_altivec_vsel_16qi (vsc, vsc, vuc);
>>  VSEL_16QI vector_select_v16qi {}
>>  
>> @@ -1619,9 +1616,6 @@
>>const vf __builtin_vsx_xvcmpeqsp (vf, vf);
>>  XVCMPEQSP vector_eqv4sf {}
>>  
>> -  const signed int __builtin_vsx_xvcmpeqsp_p (signed int, vf, vf);
>> -XVCMPEQSP_P vector_eq_v4sf_p {pred}
>> -
>>const vd __builtin_vsx_xvcmpgedp (vd, vd);
>>  XVCMPGEDP vector_gev2df {}
>>  


Re: [PATCH 7/13] rs6000, remove the vec_xxsel built-ins, they are duplicates

2024-05-24 Thread Carl Love
Kewen:

On 5/13/24 19:55, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:18, Carl Love wrote:
>> rs6000, remove the vec_xxsel built-ins, they are duplicates


>> -int do_sel(void)
>> -{
>> -  int i = 0;
>> -
>> -  si[i][0] = __builtin_vsx_xxsel_4si (si[i][1], si[i][2], si[i][3]); i++;
  ^ changed to ui
>> -  ss[i][0] = __builtin_vsx_xxsel_8hi (ss[i][1], ss[i][2], ss[i][3]); i++;
  ^ changed to ui
>> -  sc[i][0] = __builtin_vsx_xxsel_16qi (sc[i][1], sc[i][2], sc[i][3]); i++;
   ^ changed to uc
>> -  f[i][0] = __builtin_vsx_xxsel_4sf (f[i][1], f[i][2], f[i][3]); i++;
>> -  d[i][0] = __builtin_vsx_xxsel_2df (d[i][1], d[i][2], d[i][3]); i++;
>> -
>> -  si[i][0] = __builtin_vsx_xxsel (si[i][1], si[i][2], bi[i][3]); i++;
>> -  ss[i][0] = __builtin_vsx_xxsel (ss[i][1], ss[i][2], bs[i][3]); i++;
>> -  sc[i][0] = __builtin_vsx_xxsel (sc[i][1], sc[i][2], bc[i][3]); i++;
>> -  f[i][0] = __builtin_vsx_xxsel (f[i][1], f[i][2], bi[i][3]); i++;
>> -  d[i][0] = __builtin_vsx_xxsel (d[i][1], d[i][2], bl[i][3]); i++;
>> -
>> -  si[i][0] = __builtin_vsx_xxsel (si[i][1], si[i][2], ui[i][3]); i++;
>> -  ss[i][0] = __builtin_vsx_xxsel (ss[i][1], ss[i][2], us[i][3]); i++;
>> -  sc[i][0] = __builtin_vsx_xxsel (sc[i][1], sc[i][2], uc[i][3]); i++;
>> -  f[i][0] = __builtin_vsx_xxsel (f[i][1], f[i][2], ui[i][3]); i++;
>> -  d[i][0] = __builtin_vsx_xxsel (d[i][1], d[i][2], ul[i][3]); i++;
>> -
>> -  return i;
>> -}
>> -
> 
> I prefer to keep them but just replacing the call with vec_sel.
> 
> OK with the above nits tweaked, thanks.

OK, changed __builtin_vsx_xxsel_4si_* to vec_sel, changed __builtin_vsx_xxsel to
vec_sel.
Had to add #include .

Finally, changed the third argument for the first three calls, as noted above, 
to be compatible with the vec_sel built-in specification.
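
For readers following along, the select operation amounts to a bitwise
choice between two inputs under a mask, which is why the third
argument is an unsigned (or bool) vector.  A portable scalar sketch,
with an invented function name and a single 32-bit lane standing in
for a vector:

```c
#include <stdint.h>

/* Bitwise select on one 32-bit lane: where a mask bit is 1 take the
   bit from b, where it is 0 take the bit from a.  */
uint32_t sel_model(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & ~mask) | (b & mask);
}
```

An all-zero mask returns a unchanged and an all-ones mask returns b.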

   Carl

> 
> BR,
> Kewen
> 
>>  int do_perm(void)
>>  {
>>int i = 0;
> 


Re: [PATCH 3/13] rs6000, fix error in unsigned vector float to unsigned int built-in definitions

2024-05-24 Thread Carl Love
Kewen:

On 5/14/24 00:00, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:17, Carl Love wrote:
>> rs6000, fix error in unsigned vector float to unsigned int built-in
>> definitions
>>
>> The built-ins __builtin_vsx_vunsigned_v2df and __builtin_vsx_vunsigned_v4sf
>> are supposed to take a vector of floats and return a vector of unsigned
>> long long ints.  The definitions are using the signed version of the
> 
> Sorry for nitpicking, here __builtin_vsx_vunsigned_v2df takes vector of 
> doubles
> and returns vector of unsigned long long ints while 
> __builtin_vsx_vunsigned_v4sf
> takes vector of floats and returns vector of unsigned ints.

That is not nitpicking, the description is wrong.  Changed float to double.
> 
>> instructions not the unsigned version of the instruction.  The results
>> should also be unsigned.  The builtins are used by the overloaded
>> vec_unsigned builtin which has an unsigned result.
>>
>> Similarly the built-ins __builtin_vsx_vunsignede_v2df and
>> __builtin_vsx_vunsignedo_v2df are supposed to retun an unsigned result.
> 
> Nit: s/retun/return/

Fixed.

> 
>> If the floating point argument is negative, the unsigned result is zero.
>> The built-ins are used in the overloaded built-in vec_unsignede and
>> vec_unsignedo respectively.
>>
>> Add a test cases for a negative floating point arguments for each of the
>> above built-ins.
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-builtins.def (__builtin_vsx_vunsigned_v2df,
>>  __builtin_vsx_vunsigned_v4sf, __builtin_vsx_vunsignede_v2df,
>>  __builtin_vsx_vunsignedo_v2df): Change the result type to unsigned.
>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/builtins-3-runnable.c: Add tests for
>>  vec_unsignede and vec_unsignedo with negative arguments.
>> ---
>>  gcc/config/rs6000/rs6000-builtins.def | 12 +-
>>  .../gcc.target/powerpc/builtins-3-runnable.c  | 23 ---
>>  2 files changed, 26 insertions(+), 9 deletions(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index c6d2ea1bc39..bf9a0ae22fc 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -1580,16 +1580,16 @@
>>const vsi __builtin_vsx_vsignedo_v2df (vd);
>>  VEC_VSIGNEDO_V2DF vsignedo_v2df {}
>>  
>> -  const vsll __builtin_vsx_vunsigned_v2df (vd);
>> -VEC_VUNSIGNED_V2DF vsx_xvcvdpsxds {}
>> +  const vull __builtin_vsx_vunsigned_v2df (vd);
>> +VEC_VUNSIGNED_V2DF vsx_xvcvdpuxds {}
>>  
>> -  const vsi __builtin_vsx_vunsigned_v4sf (vf);
>> -VEC_VUNSIGNED_V4SF vsx_xvcvspsxws {}
>> +  const vui __builtin_vsx_vunsigned_v4sf (vf);
>> +VEC_VUNSIGNED_V4SF vsx_xvcvspuxws {}
>>  
>> -  const vsi __builtin_vsx_vunsignede_v2df (vd);
>> +  const vui __builtin_vsx_vunsignede_v2df (vd);
>>  VEC_VUNSIGNEDE_V2DF vunsignede_v2df {}
>>  
>> -  const vsi __builtin_vsx_vunsignedo_v2df (vd);
>> +  const vui __builtin_vsx_vunsignedo_v2df (vd);
>>  VEC_VUNSIGNEDO_V2DF vunsignedo_v2df {}
>>  
>>const vf __builtin_vsx_xscvdpsp (double);
>> diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c 
>> b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
>> index 0231a1fd086..6d4fe84c8a1 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
>> @@ -313,6 +313,15 @@ int main()
>>  test_unsigned_int_result (ALL, vec_uns_int_result,
>>vec_uns_int_expected);
>>  
>> +/* Convert single precision float to  unsigned int.  Negative
>> +   arguments
>> + */
>> +vec_flt0 = (vector float){-14.930, -834.49, -3.3, -5.4};
>> +vec_uns_int_expected = (vector unsigned int){0, 0, 0, 0};
>> +vec_uns_int_result = vec_unsigned (vec_flt0);
>> +test_unsigned_int_result (ALL, vec_uns_int_result,
>> +  vec_uns_int_expected);
>> +
>>  /* Convert double precision float to long long unsigned int */
>>  vec_dble0 = (vector double){124.930, 8134.49};
>>  vec_ll_uns_int_expected = (vector long long unsigned int){124, 8134};
>> @@ -321,9 +330,9 @@ int main()
>>   vec_ll_uns_int_expected);
> 
> Nit: Similar coverage on negative for vector double can be added here.

Added.

  Carl
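
For readers following the semantics being tested above, here is a minimal scalar sketch in portable C of what each lane is expected to produce once the unsigned conversion is used.  model_float_to_u32 is a hypothetical helper for illustration only, not something from the patch.  It models a saturating conversion (negative and NaN inputs give 0), whereas a plain C cast of a negative float to an unsigned type is undefined.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical scalar model of one lane of an unsigned float-to-int
   conversion that saturates, as the test above expects from vec_unsigned.
   Not taken from the patch.  */
static uint32_t
model_float_to_u32 (float x)
{
  if (isnan (x) || x <= 0.0f)
    return 0;                   /* NaN and negative inputs give zero.  */
  if (x >= 4294967296.0f)
    return UINT32_MAX;          /* Too large: clamp to the maximum.  */
  return (uint32_t) x;          /* In range: truncate toward zero.  */
}
```

With this model the negative inputs used in the new test case (-14.930, -834.49, -3.3, -5.4) all map to 0, matching the expected vector {0, 0, 0, 0}.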


Re: [PATCH 4/13] rs6000, extend the current vec_{un,}signed{e,o} built-ins

2024-05-24 Thread Carl Love
Kewen:

On 5/14/24 00:53, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:17, Carl Love wrote:
>> rs6000, extend the current vec_{un,}signed{e,o} built-ins
>>
>> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
>> convert a vector of floats to signed/unsigned long long ints.  Extend the
>> existing vec_{un,}signed{e,o} built-ins to handle the argument
>> vector of floats to return the even/odd signed/unsigned integers.
>>
>> Add testcases and update documentation.
>>
>> gcc/ChangeLog:
>> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxds_low,
>> __builtin_vsx_xvcvspuxds_low): New built-in definitions.
>> * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo):
>> Add new overloaded specifications.
>> * config/rs6000/vsx.md (vsx_xvcvspxds_low): New define_expand.
>> * doc/extend.texi (vec_signedo, vec_signede): Add documentation.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.target/powerpc/builtins-3-runnable: New tests for the added
>> overloaded built-ins.
> 
> This part is missing, there are no test case changes in this patch.

Yes, the new tests are missing.  Not sure what happened to them.  Fixed.
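
As a rough illustration of what the extended built-ins are meant to compute, here is a scalar sketch in portable C.  model_signed_even_odd is a hypothetical name, and the sketch assumes "even" means array indices 0 and 2 and "odd" means indices 1 and 3; how that numbering maps onto register words on a given endianness is not modelled here.

```c
#include <stdint.h>

/* Hypothetical model: convert the even-indexed (0, 2) or odd-indexed
   (1, 3) elements of a 4 x float vector to 2 x signed 64-bit integers.
   Inputs are assumed to be finite and in range for int64_t.  */
static void
model_signed_even_odd (const float in[4], int odd, int64_t out[2])
{
  int first = odd ? 1 : 0;
  out[0] = (int64_t) in[first];      /* Truncates toward zero.  */
  out[1] = (int64_t) in[first + 2];
}
```

For the input {1.5, -2.5, 3.75, -4.25} the even result is {1, 3} and the odd result is {-2, -4}.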

> 
>> ---
>>  gcc/config/rs6000/rs6000-builtins.def |  6 ++
>>  gcc/config/rs6000/rs6000-overload.def |  8 
>>  gcc/config/rs6000/vsx.md  | 23 +++
>>  gcc/doc/extend.texi   | 13 +
>>  4 files changed, 50 insertions(+)
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index bf9a0ae22fc..5b7237a2327 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -1709,9 +1709,15 @@
>>const vsll __builtin_vsx_xvcvspsxds (vf);
>>  XVCVSPSXDS vsx_xvcvspsxds {}
>>  
>> +  const vsll __builtin_vsx_xvcvspsxds_low (vf);
>> +XVCVSPSXDSO vsx_xvcvspsxds_low {}
>> +
>>const vsll __builtin_vsx_xvcvspuxds (vf);
>>  XVCVSPUXDS vsx_xvcvspuxds {}
> 
> This existing should return with type vull, ...

Fixed.

> 
>>  
>> +  const vsll __builtin_vsx_xvcvspuxds_low (vf);
>> +XVCVSPUXDSO vsx_xvcvspuxds_low {}
> 
> ... so this copied one should be vull too.

Fixed.

> 
> As the existing instances for vec_signed and vec_unsigned are with
> names like VEC_V{UN,}SIGNED{O,E}_V2DF, I prefer these are updated
> with similar style, maybe something like:
> 
> VEC_V{UN,}SIGNED{E,O}_V4SF v{un,}signed{e,o}_v4sf

Yes, sounds reasonable.  Changed XVCVSPUXDS -> VEC_VUNSIGNEDE_V4SF
 XVCVSPUXDSO -> VEC_VUNSIGNEDO_V4SF
 XVCVSPSXDS  -> VEC_VSIGNEDE_V4SF
 XVCVSPSXDSO  -> VEC_VSIGNEDO_V4SF

NEED TO ADDRESS RESPONSE TO QUESTION I ASKED.

> 
>>const vsi __builtin_vsx_xvcvspuxws (vf);
>>  XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {}
>>  
>> diff --git a/gcc/config/rs6000/rs6000-overload.def 
>> b/gcc/config/rs6000/rs6000-overload.def
>> index 84bd9ae6554..68501c05289 100644
>> --- a/gcc/config/rs6000/rs6000-overload.def
>> +++ b/gcc/config/rs6000/rs6000-overload.def
>> @@ -3307,10 +3307,14 @@
>>  [VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede]
>>vsi __builtin_vec_vsignede (vd);
>>  VEC_VSIGNEDE_V2DF
>> +  vsll __builtin_vec_vsignede (vf);
>> +XVCVSPSXDS
>>  
>>  [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo]
>>vsi __builtin_vec_vsignedo (vd);
>>  VEC_VSIGNEDO_V2DF
>> +  vsll __builtin_vec_vsignedo (vf);
>> +XVCVSPSXDSO
>>  
>>  [VEC_SIGNEXTI, vec_signexti, __builtin_vec_signexti]
>>vsi __builtin_vec_signexti (vsc);
>> @@ -4433,10 +4437,14 @@
>>  [VEC_UNSIGNEDE, vec_unsignede, __builtin_vec_vunsignede]
>>vui __builtin_vec_vunsignede (vd);
>>  VEC_VUNSIGNEDE_V2DF
>> +  vull __builtin_vec_vunsignede (vf);
>> +XVCVSPUXDS
>>  
>>  [VEC_UNSIGNEDO, vec_unsignedo, __builtin_vec_vunsignedo]
>>vui __builtin_vec_vunsignedo (vd);
>>  VEC_VUNSIGNEDO_V2DF
>> +  vull __builtin_vec_vunsignedo (vf);
>> +XVCVSPUXDSO
>>  
> As above, the name can be tweaked.

Fixed.

> 
>>  [VEC_VEE, vec_extract_exp, __builtin_vec_extract_exp]
>>vui __builtin_vec_extract_exp (vf);
>> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
>> index f135fa079bd..3d39ae7995f 100644
>> --- a/gcc/config/rs6000/vsx.md
>> +++ b/gcc/config/rs6000/vsx.md
>> @@ -2704,6 +2704,29 @@
>>DONE;
>>  })
>>  
>> +;; Convert low vector elements of 32-bit floating point numbers to vector of
>> +;; 64-bit signed/unsigned integers.
>> +(define_expand "vsx_xvcvspxds_low"
>> +  [(match_operand:V2DI 0 "vsx_register_operand")
>> +   (match_operand:V4SF 1 "vsx_register_operand")
>> +   (any_fix (pc))]
>> +  "VECTOR_UNIT_VSX_P (V2DFmode)"
>> +{
>> +  /* Shift left one word to put even word in correct location */
>> +  rtx rtx_tmp;
>> +  rtx rtx_val = GEN_INT (4);
>> +  rtx_tmp = gen_reg_rtx (V4SFmode);
>> +  emit_insn (gen_altivec_vsldoi_v4sf (rtx_tmp, operands[1], 

Re: [PATCH 2/13] rs6000, Remove __builtin_vsx_xvcvspsxws built-in

2024-05-24 Thread Carl Love
Kewen:

On 5/14/24 01:43, Kewen.Lin wrote:
> Hi,
> 
> on 2024/4/20 05:17, Carl Love wrote:
>> rs6000, Remove __builtin_vsx_xvcvspsxws built-in
>>
>> The built-in __builtin_vsx_xvcvspsxws is a duplicate of the vec_signed
>> built-in that is documented in the PVIPR.  The __builtin_vsx_xvcvspsxws
>> built-in is not documented and there are no test cases for it.
>>
>> This patch removes the redundant built-in.
> 
> By revisiting the comments on the previous version:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646723.html

The comments from the previous version:
-
   I think we should recommend users to adopt the recommended built-ins in
   PVIPR, by checking the corresponding mnemonic in PVIPR, I got:

   __builtin_vsx_xvcvspsxws -> vec_signed
   __builtin_vsx_xvcvspsxds -> N/A
   __builtin_vsx_xvcvspuxds -> N/A
   __builtin_vsx_xvcvdpsxws -> vec_signed{e,o}
   __builtin_vsx_xvcvdpuxws -> vec_unsigned{e,o}
   __builtin_vsx_xvcvdpuxds_uns -> vec_unsigned
   __builtin_vsx_xvcvspdp   -> vec_double{e,o}
   __builtin_vsx_xvcvdpsp   -> vec_float{e,o}
   __builtin_vsx_xvcvspuxws -> vec_unsigned
   __builtin_vsx_xvcvsxwdp  -> vec_double{e,o}
   __builtin_vsx_xvcvuxddp_uns -> vec_double

   For __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds which don't have
   the according PVIPR built-ins, we can extend the current vec_{un,}signed{e,o}
   to cover them and document them following the section mentioning PVIPR.

are handled by multiple patches in the new series.  The main comment on the 
previous patch series was to remove most of the built-ins as they were 
redundant.  So most of the patches in the previous series were thrown out and 
replaced by a new series that removes the built-ins.


That all said, I distinctly remember addressing each of the above built-ins.  
The work on the series got
interrupted a couple of times and it looks like some of the patches to address 
the above got lost.  My bad.
The following is a list of which patch takes care of removing the duplicate 
built-ins.

__builtin_vsx_xvcvspsxws          patch 2 removes this built-in.
__builtin_vsx_xvcvspsxds -> N/A   patch 4 extends vec_{un,}signede to cover this
                                  built-in.  Built-in used in rs6000-overload.def.
                                  Built-in now for internal use only.
__builtin_vsx_xvcvspuxds -> N/A   patch 4 extends vec_{un,}signedo to cover this
                                  built-in.  Built-in used in rs6000-overload.def.
                                  Built-in now for internal use only.


__builtin_vsx_xvcvdpsxws -> vec_signed{e,o}   removed in patch 4
__builtin_vsx_xvcvdpuxws -> vec_unsigned{e,o} removed in patch 4

__builtin_vsx_xvcvdpuxds_uns -> vec_unsigned  remove in patch 4
__builtin_vsx_xvcvspuxws -> vec_unsigned  remove in patch 4

The following changes will be put into a new patch when the series is 
reposted.  It appears they got lost in the current series.  My bad.

__builtin_vsx_xvcvspdp   -> vec_double{e,o}   remove in new patch number 5
__builtin_vsx_xvcvdpsp   -> vec_float{e,o}remove in new patch number 5

__builtin_vsx_xvcvsxwdp  -> vec_double{e,o}   remove in new patch number 5
__builtin_vsx_xvcvuxddp_uns -> vec_double   remove in new patch number 5

> 
> I wonder if it's intentional to keep the others, at least bifs
> __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws and
> __builtin_vsx_xvcvuxddp_uns looks removable, users can just uses the
> equivalent ones in PVIPR.  And for the others, users can still use
> the PVIPR ones by considering endianness (controlling with endianness
> macros).
> 

Hopefully that makes it clearer where the various changes are.   

The next series will add a new patch 5 in the series.  The remaining patches in 
this series, patches 5, 6, ... will get moved to patch 6, 7, ... in the next 
posting of the built-in cleanup patch series.

Carl 


Re: [PATCH v2] RISC-V: Introduce -mrvv-allow-misalign.

2024-05-24 Thread Robin Dapp
> * -mstrict-align: Both scalar and vector misaligned accesses are
> unsupported (-mrvv-allow-misalign doesn't matter).  I'm not sure if
> there's hardware there, but given we have systems that don't support
> scalar misaligned accesses it seems reasonable to assume they'll also
> not support vector misaligned accesses.

As a data point, and contrary to what I said/hoped before:  There are
examples where combining -mstrict-align and -mrvv-allow-misalign still
vectorizes code and produces unaligned vector accesses.  I haven't looked
into that area of the vectorizer for a while but it doesn't appear as
if we regard STRICT_ALIGNMENT there at all.
We keep track of the known misalignments (via peeling etc.) and either
handle them via movmisalign or give up.  Same for unknown misalignment
but all unaffected by -mstrict-align.

We could have -mrvv-allow-misalign have an "| STRICT_ALIGNMENT" to
get to the behavior you described but right now it's not like that.
And AFAICT -mstrict-align behaves the same way for other targets,
regardless of whether they support unaligned vector accesses or not.

So, right now, I'd tend towards describing that both flags are
independent and affect either only scalar or only vector code.
Maybe we should rename the whole thing to -mrvv-strict-align?
Might make it even more confusing, though. 
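
To make the "unknown misalignment" case concrete, here is the kind of loop in question, as a sketch only.  scale_add is a made-up name.  The pointers are properly aligned for float, but nothing tells the compiler how they sit relative to the vector width, so the vectorizer has to choose between peeling, a misaligned vector access (movmisalign), or not vectorizing.

```c
#include <stddef.h>

/* Illustrative loop only.  DST and SRC are aligned for float, but their
   alignment relative to the vector width is unknown at compile time, so a
   vectorizer has to peel, use misaligned vector accesses, or give up.  */
void
scale_add (float *dst, const float *src, float k, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] += k * src[i];
}
```

Calling it on pointers offset by one element from the start of an array, e.g. scale_add (a + 1, b + 1, 2.0f, 8), is a simple way to get addresses that are not vector-aligned at runtime.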

Regards
 Robin


[PATCH 3/3] openmp: Add support for iterators in to/from clauses (C/C++)

2024-05-24 Thread Kwok Cheung Yeung
This patch extends the previous patch to cover to/from clauses in 
'target update'.

From 99addc124535307b50fbdeb66c4f90bb0cbeb041 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Mon, 15 Apr 2024 13:50:22 +0100
Subject: [PATCH 3/3] openmp: Add support for iterators in to/from clauses
 (C/C++)

This adds support for iterators in 'to' and 'from' clauses in the
'target update' OpenMP directive.

2024-05-24  Kwok Cheung Yeung  

gcc/c/
* c-parser.cc (c_parser_omp_clause_from_to): Parse 'iterator' modifier.

gcc/cp/
* parser.cc (cp_parser_omp_clause_from_to): Parse 'iterator' modifier.

gcc/
* gimplify.cc (gimplify_omp_map_iterators): Gimplify iterators in
to/from clauses.
(gimplify_scan_omp_clauses): Call gimplify_omp_map_iterators once to
handle clauses with iterators, then skip subsequent iterator clauses.
* omp-low.cc (scan_sharing_clauses): Skip firstprivate handling for
to/from clauses with iterators.
(lower_omp_target): Handle kinds for to/from clauses with iterators.
* tree-pretty-print.cc (dump_omp_clause): Call dump_omp_map_iterators
for to/from clauses with iterators.

gcc/testsuite/
* c-c++-common/gomp/target-update-iterator-1.c: New.
* c-c++-common/gomp/target-update-iterator-2.c: New.
* c-c++-common/gomp/target-update-iterator-3.c: New.

libgomp/
* target.c (gomp_update): Call gomp_merge_iterator_maps.  Free
allocated variables.
* testsuite/libgomp.c-c++-common/target-update-iterators-1.c: New.
* testsuite/libgomp.c-c++-common/target-update-iterators-2.c: New.
* testsuite/libgomp.c-c++-common/target-update-iterators-3.c: New.
---
 gcc/c/c-parser.cc | 105 ++--
 gcc/cp/parser.cc  | 116 --
 gcc/gimplify.cc   |  17 ++-
 gcc/omp-low.cc|  24 +++-
 .../gomp/target-update-iterator-1.c   |  20 +++
 .../gomp/target-update-iterator-2.c   |  17 +++
 .../gomp/target-update-iterator-3.c   |  17 +++
 gcc/tree-pretty-print.cc  |  20 ++-
 libgomp/target.c  |  12 ++
 .../target-update-iterators-1.c   |  65 ++
 .../target-update-iterators-2.c   |  57 +
 .../target-update-iterators-3.c   |  66 ++
 12 files changed, 509 insertions(+), 27 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/target-update-iterator-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/target-update-iterator-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/target-update-iterator-3.c
 create mode 100644 
libgomp/testsuite/libgomp.c-c++-common/target-update-iterators-1.c
 create mode 100644 
libgomp/testsuite/libgomp.c-c++-common/target-update-iterators-2.c
 create mode 100644 
libgomp/testsuite/libgomp.c-c++-common/target-update-iterators-3.c

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 2281148561c..6353b15d64f 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -19185,8 +19185,11 @@ c_parser_omp_clause_device_type (c_parser *parser, 
tree list)
to ( variable-list )
 
OpenMP 5.1:
-   from ( [present :] variable-list )
-   to ( [present :] variable-list ) */
+   from ( [motion-modifier[,] [motion-modifier[,]...]:] variable-list )
+   to ( [motion-modifier[,] [motion-modifier[,]...]:] variable-list )
+
+   motion-modifier:
+ present | iterator (iterators-definition)  */
 
 static tree
 c_parser_omp_clause_from_to (c_parser *parser, enum omp_clause_code kind,
@@ -19197,15 +19200,88 @@ c_parser_omp_clause_from_to (c_parser *parser, enum 
omp_clause_code kind,
   if (!parens.require_open (parser))
 return list;
 
+  int pos = 1, colon_pos = 0;
+  int iterator_length = 0;
+  while (c_parser_peek_nth_token_raw (parser, pos)->type == CPP_NAME)
+{
+  if (c_parser_peek_nth_token_raw (parser, pos + 1)->type
+ == CPP_OPEN_PAREN)
+   {
+ unsigned int n = pos + 2;
+ if (c_parser_check_balanced_raw_token_sequence (parser, &n)
+&& (c_parser_peek_nth_token_raw (parser, n)->type
+== CPP_CLOSE_PAREN))
+   {
+ iterator_length = n - pos + 1;
+ pos = n;
+   }
+   }
+  if (c_parser_peek_nth_token_raw (parser, pos + 1)->type == CPP_COMMA)
+   pos += 2;
+  else
+   pos++;
+  if (c_parser_peek_nth_token_raw (parser, pos)->type == CPP_COLON)
+   {
+ colon_pos = pos;
+ break;
+   }
+}
+
   bool present = false;
-  c_token *token = c_parser_peek_token (parser);
+  tree iterators = NULL_TREE;
 
-  if (token->type == CPP_NAME
-  && strcmp (IDENTIFIER_POINTER (token->value), "present") == 0
-  && c_parser_peek_2nd_token (parser)->type == CPP_COLON)
+  for (pos = 1; pos < colon_pos; pos++)
   

[PATCH 2/3] openmp: Add support for iterators in map clauses (C/C++)

2024-05-24 Thread Kwok Cheung Yeung
This patch modifies the C and C++ parsers to accept an iterator as a map 
type modifier, encoded in the same way as the depend and affinity 
clauses. When finishing the clauses, clauses with iterators are treated 
separately from ones without to avoid clashes (e.g. iterating over x[i] 
will likely generate clauses to map x).


During gimplification, gimplify_omp_map_iterators is called during 
scanning if a map clause encountered has any iterators. This scans all 
the remaining clauses in one go, as iterators may be shared between 
clauses. Later clauses with iterators are simply skipped over.


For each map clause with an iterator, gimplify_omp_map_iterators 
generates a loop (or multiple loops, if the iterator is 
multidimensional) to iterate over the iterator expression, storing the 
result in a new array (constant-sized for now, we could dynamically 
allocate the array for non-constant iteration bounds). The data array 
stores the total number of iterations in the first element, then the 
address generated by the iterator expression and the OMP_CLAUSE_SIZE 
(since the iteration variables may occur within the size tree) for each 
iteration. The clause is then rewritten to point to the new array. The 
original clause decl is no longer directly relevant, but is kept around 
for informational purposes and to help with clause sorting. The original 
OMP_CLAUSE_SIZE is set to NULL_TREE.
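
The layout described above can be pictured with a small model in plain C.  This is only a sketch of the data layout as I read the description: model_expand_iterator and the use of uintptr_t elements are assumptions made for illustration, not what the gimplifier actually emits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the expanded iterator array for
   map (iterator (i = 0:n), to: rows[i][:cols]):
     data[0]          = number of iterations
     data[1 + 2 * i]  = address for iteration i
     data[2 + 2 * i]  = size in bytes for iteration i
   DATA must have room for 1 + 2 * n elements.  */
static void
model_expand_iterator (int **rows, size_t n, size_t cols, uintptr_t *data)
{
  data[0] = n;
  for (size_t i = 0; i < n; i++)
    {
      data[1 + 2 * i] = (uintptr_t) rows[i];
      data[2 + 2 * i] = cols * sizeof (int);
    }
}
```

For two rows of four ints this fills five elements: the count 2, then an (address, size) pair per row.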


When OMP lowering clauses with iterators, the data array holding the 
expanded iterator info is allocated to a field in the omp_data, and the 
size is set to SIZE_MAX to mark the entry as coming from an expanded 
iterator.


Libgomp has a new function gomp_merge_iterator_maps which identifies 
data coming from an iterator, and effectively creates new maps 
on-the-fly from the iterator info array, inserting them into the list of 
mappings at the point where iterator data occurred.

From b2e8ff46929d5a2781781486ec942b344056d78b Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Tue, 12 Mar 2024 22:51:06 +
Subject: [PATCH 2/3] openmp: Add support for iterators in map clauses (C/C++)

This adds preliminary support for iterators in map clauses within OpenMP
'target' constructs (which includes constructs such as 'target enter data').

Iterators with non-constant loop bounds are not currently supported.

2024-05-24  Kwok Cheung Yeung  

gcc/c/
* c-parser.cc (c_parser_omp_clause_map): Parse 'iterator' modifier.
* c-typeck.cc (c_finish_omp_clauses): Call recursively on iterator
clauses.

gcc/cp/
* parser.cc (cp_parser_omp_clause_map): Parse 'iterator' modifier.
* semantics.cc (finish_omp_clauses): Call recursively on iterator
clauses.

gcc/
* gimplify.cc (find_var_decl): New.
(check_iterator_var_usage): New.
(gimplify_omp_map_iterators): New.
(omp_group_iterator): New.
(omp_get_attachment): Replace OMP_CLAUSE_DECL with
OMP_ITERATOR_CLAUSE_DECL.
(omp_group_last): Keep decls with and without iterators in separate
groups.
(omp_index_mapping_groups_1): Replace OMP_CLAUSE_DECL with
OMP_ITERATOR_CLAUSE_DECL.
(omp_tsort_mapping_groups_1): Likewise.
(omp_resolve_clause_dependencies): Likewise.  Prevent removal of
mapping if groups do not use the same iterators.
(omp_build_struct_sibling_lists): Replace OMP_CLAUSE_DECL with
OMP_ITERATOR_CLAUSE_DECL.
(gimplify_scan_omp_clauses): Call gimplify_omp_map_iterators once to
handle clauses with iterators, then skip subsequent iterator clauses.
* omp-low.cc (scan_sharing_clauses): Add field for iterator clauses.
(lower_omp_target): Add map entries for iterator clauses.
* tree-pretty-print.cc (dump_omp_map_iterators): New.
(dump_omp_clause): Call dump_omp_map_iterators for iterators in map
clauses.
* tree.h (OMP_ITERATOR_CLAUSE_DECL): New.

gcc/testsuite/
* c-c++-common/gomp/map-6.c (foo): Amend expected error message.
* c-c++-common/gomp/target-iterator-1.c: New.
* c-c++-common/gomp/target-iterator-2.c: New.
* c-c++-common/gomp/target-iterator-3.c: New.

libgomp/
* target.c (gomp_merge_iterator_maps): New.
(gomp_map_vars_internal): Call gomp_merge_iterator_maps.  Free
allocated variables.
* testsuite/libgomp.c-c++-common/target-map-iterators-1.c: New.
* testsuite/libgomp.c-c++-common/target-map-iterators-2.c: New.
* testsuite/libgomp.c-c++-common/target-map-iterators-3.c: New.
---
 gcc/c/c-parser.cc |  60 -
 gcc/c/c-typeck.cc |  68 ++
 gcc/cp/parser.cc  |  64 -
 gcc/cp/semantics.cc   |  65 ++
 gcc/gimplify.cc   | 220 +-
 gcc/omp-low.cc|  52 -
 

[PATCH 1/3] openmp: Refactor handling of iterators

2024-05-24 Thread Kwok Cheung Yeung
This patch factors out the code to calculate the number of iterations 
required and to generate the iteration loop into separate functions from 
gimplify_omp_depend for reuse later.
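
As an illustration of the first of those two pieces, the number of iterations for an iterator begin:end:step can be computed as in the sketch below.  The function names are hypothetical and this is ordinary C arithmetic, not the tree-building code in the patch.  It assumes END is exclusive and STEP is non-zero.

```c
/* Hypothetical helper: number of values taken by an iterator
   begin:end:step, where END is exclusive and STEP is non-zero.  */
static long
model_iterator_count (long begin, long end, long step)
{
  if (step > 0)
    return begin < end ? (end - begin + step - 1) / step : 0;
  return begin > end ? (begin - end - step - 1) / -step : 0;
}

/* For a multidimensional iterator the total is the product of the
   per-dimension counts.  Each row of DIMS is { begin, end, step }.  */
static long
model_total_count (long dims[][3], int ndims)
{
  long total = 1;
  for (int i = 0; i < ndims; i++)
    total *= model_iterator_count (dims[i][0], dims[i][1], dims[i][2]);
  return total;
}
```

For example 0:10:3 visits 0, 3, 6, 9 (four values) and 10:0:-2 visits 10, 8, 6, 4, 2 (five values).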


I have also replaced the 'TREE_CODE (*tp) == TREE_LIST && ...' checks 
used for detecting an iterator clause with a macro OMP_ITERATOR_DECL_P, 
as it needs to be done frequently.

From 0439fce03c2b5fb2802eaf65831e28f548ca074b Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Tue, 12 Mar 2024 20:51:38 +
Subject: [PATCH 1/3] openmp: Refactor handling of iterators

Move code to calculate the iteration size and to generate the iterator
expansion loop into separate functions.

Use OMP_ITERATOR_DECL_P to check for iterators in clause declarations.

2024-05-24  Kwok Cheung Yeung  

gcc/c-family/
* c-omp.cc (c_finish_omp_depobj): Use OMP_ITERATOR_DECL_P.

gcc/c/
* c-typeck.cc (handle_omp_array_sections): Use OMP_ITERATOR_DECL_P.
(c_finish_omp_clauses): Likewise.

gcc/cp/
* pt.cc (tsubst_omp_clause_decl): Use OMP_ITERATOR_DECL_P.
* semantics.cc (handle_omp_array_sections): Likewise.
(finish_omp_clauses): Likewise.

gcc/
* gimplify.cc (gimplify_omp_affinity): Use OMP_ITERATOR_DECL_P.
(compute_iterator_count): New.
(build_iterator_loop): New.
(gimplify_omp_depend): Use OMP_ITERATOR_DECL_P, compute_iterator_count
and build_iterator_loop.
* tree-inline.cc (copy_tree_body_r): Use OMP_ITERATOR_DECL_P.
* tree-pretty-print.cc (dump_omp_clause): Likewise.
* tree.h (OMP_ITERATOR_DECL_P): New macro.
---
 gcc/c-family/c-omp.cc|   4 +-
 gcc/c/c-typeck.cc|  13 +-
 gcc/cp/pt.cc |   4 +-
 gcc/cp/semantics.cc  |   8 +-
 gcc/gimplify.cc  | 326 +++
 gcc/tree-inline.cc   |   5 +-
 gcc/tree-pretty-print.cc |   8 +-
 gcc/tree.h   |   6 +
 8 files changed, 175 insertions(+), 199 deletions(-)

diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index c0e02aa422f..b56e49da62c 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -744,9 +744,7 @@ c_finish_omp_depobj (location_t loc, tree depobj,
  kind = OMP_CLAUSE_DEPEND_KIND (clause);
  t = OMP_CLAUSE_DECL (clause);
  gcc_assert (t);
- if (TREE_CODE (t) == TREE_LIST
- && TREE_PURPOSE (t)
- && TREE_CODE (TREE_PURPOSE (t)) == TREE_VEC)
+ if (OMP_ITERATOR_DECL_P (t))
{
  error_at (OMP_CLAUSE_LOCATION (clause),
"% modifier may not be specified on "
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 7ecca9f58c6..b0fe80cf224 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -14218,9 +14218,7 @@ handle_omp_array_sections (tree &c, enum 
c_omp_region_type ort)
   tree *tp = &OMP_CLAUSE_DECL (c);
   if ((OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEPEND
|| OMP_CLAUSE_CODE (c) == OMP_CLAUSE_AFFINITY)
-  && TREE_CODE (*tp) == TREE_LIST
-  && TREE_PURPOSE (*tp)
-  && TREE_CODE (TREE_PURPOSE (*tp)) == TREE_VEC)
+  && OMP_ITERATOR_DECL_P (*tp))
 tp = &TREE_VALUE (*tp);
   tree first = handle_omp_array_sections_1 (c, *tp, types,
maybe_zero_len, first_non_one,
@@ -15409,9 +15407,7 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
case OMP_CLAUSE_DEPEND:
case OMP_CLAUSE_AFFINITY:
  t = OMP_CLAUSE_DECL (c);
- if (TREE_CODE (t) == TREE_LIST
- && TREE_PURPOSE (t)
- && TREE_CODE (TREE_PURPOSE (t)) == TREE_VEC)
+ if (OMP_ITERATOR_DECL_P (t))
{
  if (TREE_PURPOSE (t) != last_iterators)
last_iterators_remove
@@ -15511,10 +15507,7 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
  break;
}
}
- if (TREE_CODE (OMP_CLAUSE_DECL (c)) == TREE_LIST
- && TREE_PURPOSE (OMP_CLAUSE_DECL (c))
- && (TREE_CODE (TREE_PURPOSE (OMP_CLAUSE_DECL (c)))
- == TREE_VEC))
+ if (OMP_ITERATOR_DECL_P (OMP_CLAUSE_DECL (c)))
TREE_VALUE (OMP_CLAUSE_DECL (c)) = t;
  else
OMP_CLAUSE_DECL (c) = t;
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index e77c48e463e..26db4f6e0cf 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -17520,9 +17520,7 @@ tsubst_omp_clause_decl (tree decl, tree args, 
tsubst_flags_t complain,
 return decl;
 
   /* Handle OpenMP iterators.  */
-  if (TREE_CODE (decl) == TREE_LIST
-  && TREE_PURPOSE (decl)
-  && TREE_CODE (TREE_PURPOSE (decl)) == TREE_VEC)
+  if (OMP_ITERATOR_DECL_P (decl))
 {
   tree ret;
   if (iterator_cache[0] == TREE_PURPOSE (decl))
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index f90c304a65b..a48b3d2fcc5 100644
--- a/gcc/cp/semantics.cc
+++ 

[PATCH 0/3] openmp: Add support for iterators in OpenMP mapping clauses (C/C++)

2024-05-24 Thread Kwok Cheung Yeung
This series of patches adds support for OpenMP iterators in the 'map' 
clause of the 'target' construct (and its derivatives such as 'target 
enter data'), and the 'to' and 'from' clauses of the 'target update' 
construct, currently for C and C++ only.
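
For anyone who wants to see what the new syntax looks like in user code, here is a small sketch.  The function and array names are made up.  The rows are separate objects so that the iterator instances do not share a containing array.  Without -fopenmp the pragmas are ignored and the function is plain host code; with OpenMP enabled it needs a compiler that accepts iterators in these clauses.

```c
#define ROWS 3
#define COLS 4

/* Separate objects, one per row, reached through an array of pointers.  */
static int row0[COLS], row1[COLS], row2[COLS];

/* Illustrative only: map all rows with one clause, change them on the
   host, then push and pull them with 'target update'.  Returns the sum
   of all elements as seen on the host afterwards.  */
int
sync_rows_example (void)
{
  int *rows[ROWS] = { row0, row1, row2 };
  int sum = 0;

  /* One clause maps rows[0], rows[1] and rows[2].  */
  #pragma omp target enter data map(iterator(i = 0:ROWS), to: rows[i][:COLS])

  for (int r = 0; r < ROWS; r++)
    for (int c = 0; c < COLS; c++)
      rows[r][c] = r * COLS + c;

  /* Copy the new host values to the device, then fetch them back.  */
  #pragma omp target update to(iterator(i = 0:ROWS): rows[i][:COLS])
  #pragma omp target update from(iterator(i = 0:ROWS): rows[i][:COLS])

  for (int r = 0; r < ROWS; r++)
    for (int c = 0; c < COLS; c++)
      sum += rows[r][c];
  return sum;
}
```

The example leaves the rows mapped at the end; a real program would release them with a matching 'target exit data'.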


The approach in this patch differs from Tobias' WFC patch 
(https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586237.html) in 
that it does not rely on generating a callback function - instead, 
during Gimplification it generates loop(s) to evaluate every iteration 
of the iterator expression, and the results (i.e. addresses, as the 
expression should be an lvalue) are placed into a new array. This array 
is then used as the 'hostaddrs' entry for that particular map. Libgomp 
detects this (the corresponding size entry is set to SIZE_MAX, which 
shouldn't normally occur) and inserts the contents of the array into the 
map information before continuing on as normal.
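
The runtime side of this can be modelled in a few lines of C.  This is a hypothetical model written from the description above, not the libgomp code: model_merge_iterator_maps is an invented name, and the array is assumed to hold a count followed by (address, size) pairs stored as uintptr_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not the libgomp implementation.  Copies N map
   entries to OUT_ADDRS/OUT_SIZES, replacing each entry whose size is
   SIZE_MAX by the (address, size) pairs stored in the array it points to.
   That array holds the pair count first, then the pairs.  Returns the
   number of entries written; the caller provides enough room.  */
static size_t
model_merge_iterator_maps (void **addrs, const size_t *sizes, size_t n,
                           void **out_addrs, size_t *out_sizes)
{
  size_t out = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (sizes[i] != SIZE_MAX)
        {
          /* Ordinary entry: copy it through unchanged.  */
          out_addrs[out] = addrs[i];
          out_sizes[out++] = sizes[i];
          continue;
        }
      /* Iterator entry: splice in one map per stored pair.  */
      const uintptr_t *data = addrs[i];
      for (size_t j = 0; j < data[0]; j++)
        {
          out_addrs[out] = (void *) data[1 + 2 * j];
          out_sizes[out++] = (size_t) data[2 + 2 * j];
        }
    }
  return out;
}
```

An input of one ordinary entry followed by one iterator entry describing two rows therefore comes out as three ordinary entries, in the original order.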


Caveats:

- In section 2.21.7.1 of the OpenMP 5.1 standard, it states that 'If an 
expression that is used to form a list item in a map clause contains an 
iterator identifier, the list item instances that would result from 
different values of the iterator must not have the same containing array 
and must not have base pointers that share original storage' - this is 
currently not enforced (it would prohibit something like map 
(iterator(i=0:10), to: x[i]) while x is an int[]). As the expression in 
the iterator is more-or-less unbound, it would be very difficult to 
determine this at compile time. At runtime in libgomp, I suppose we 
could check every iterator-derived mapping to ensure that they all 
access unique entries in mem_map?


- The clause finishing currently generates spurious firstprivate maps - 
the patch currently just ignores them when in iterator clauses, but is 
there a better way of doing this?


- Clause reordering does not work too well with iterators. I believe the 
current approach to reordering on trunk is a bit buggy in the first 
place, so I just added enough to get the clauses through the pass 
without ICEing.


The GCC gomp tests and all the libgomp tests have been run without 
regressions on an x86-64 host with NVPTX offloading. Testing on AMD GCN 
to follow.


Kwok


Re: [PATCH] c++: Fix parsing of abstract-declarator starting with ... followed by [ or ( [PR115012]

2024-05-24 Thread Jason Merrill

On 5/9/24 14:12, Jakub Jelinek wrote:
> The C++26 P2662R3 Pack indexing paper mentions that both GCC
> and MSVC don't handle T...[10] parameter declaration when T
> is a pack.  While that will change meaning in C++26, in C++11 .. C++23
> this ought to be valid.

Sure, but I don't think it does anyone any favors to start accepting a 
pattern that we know is going to break before long.  Us not accepting it 
was part of the rationale for the paper.

> Also, T...(args) as well.

This part of the patch is OK.

> The following patch handles those in cp_parser_direct_declarator.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-05-09  Jakub Jelinek  

PR c++/115012
* parser.cc (cp_parser_direct_declarator): Handle
abstract declarator starting with ... followed by [
or (.

* g++.dg/cpp0x/variadic185.C: New test.
* g++.dg/cpp0x/variadic186.C: New test.

--- gcc/cp/parser.cc.jj 2024-05-09 10:30:58.0 +0200
+++ gcc/cp/parser.cc2024-05-09 16:44:01.250777325 +0200
@@ -23916,7 +23916,12 @@ cp_parser_direct_declarator (cp_parser*
  {
/* Peek at the next token.  */
token = cp_lexer_peek_token (parser->lexer);
-  if (token->type == CPP_OPEN_PAREN)
+  if (token->type == CPP_OPEN_PAREN
+ || (first
+ && dcl_kind != CP_PARSER_DECLARATOR_NAMED
+ && token->type == CPP_ELLIPSIS
+ && cxx_dialect > cxx98
+ && cp_lexer_nth_token_is (parser->lexer, 2, CPP_OPEN_PAREN)))
{
  /* This is either a parameter-declaration-clause, or a
 parenthesized declarator. When we know we are parsing a
@@ -23955,6 +23960,11 @@ cp_parser_direct_declarator (cp_parser*
  
 Thus again, we try a parameter-declaration-clause, and if
 that fails, we back out and return.  */
+ bool pack_expansion_p = token->type == CPP_ELLIPSIS;
+
+ if (pack_expansion_p)
+   /* Consume the `...' */
+   cp_lexer_consume_token (parser->lexer);
  
  if (!first || dcl_kind != CP_PARSER_DECLARATOR_NAMED)
{
@@ -24098,6 +24108,7 @@ cp_parser_direct_declarator (cp_parser*
 attrs,
 parens_loc);
  declarator->attributes = gnu_attrs;
+ declarator->parameter_pack_p |= pack_expansion_p;
  /* Any subsequent parameter lists are to do with
 return type, so are not those of the declared
 function.  */
@@ -24121,7 +24132,7 @@ cp_parser_direct_declarator (cp_parser*
  
  /* If this is the first, we can try a parenthesized
 declarator.  */
- if (first)
+ if (first && !pack_expansion_p)
{
  bool saved_in_type_id_in_expr_p;
  
@@ -24156,16 +24167,27 @@ cp_parser_direct_declarator (cp_parser*

  else
break;
}
-  else if ((!first || dcl_kind != CP_PARSER_DECLARATOR_NAMED)
-  && token->type == CPP_OPEN_SQUARE
-  && !cp_next_tokens_can_be_attribute_p (parser))
+  else if (((!first || dcl_kind != CP_PARSER_DECLARATOR_NAMED)
+   && token->type == CPP_OPEN_SQUARE
+   && !cp_next_tokens_can_be_attribute_p (parser))
+  || (first
+  && dcl_kind != CP_PARSER_DECLARATOR_NAMED
+  && token->type == CPP_ELLIPSIS
+  && cp_lexer_nth_token_is (parser->lexer, 2, CPP_OPEN_SQUARE)
+  && cxx_dialect > cxx98
+  && !cp_nth_tokens_can_be_std_attribute_p (parser, 2)))
{
  /* Parse an array-declarator.  */
  tree bounds, attrs;
+ bool pack_expansion_p = token->type == CPP_ELLIPSIS;
  
  	  if (ctor_dtor_or_conv_p)
*ctor_dtor_or_conv_p = 0;
  
+	  if (pack_expansion_p)
+   /* Consume the `...' */
+   cp_lexer_consume_token (parser->lexer);
+
  open_paren = NULL;
  first = false;
  parser->default_arg_ok_p = false;
@@ -24220,6 +24242,7 @@ cp_parser_direct_declarator (cp_parser*
  attrs = cp_parser_std_attribute_spec_seq (parser);
  declarator = make_array_declarator (declarator, bounds);
  declarator->std_attributes = attrs;
+ declarator->parameter_pack_p |= pack_expansion_p;
}
else if (first && dcl_kind != CP_PARSER_DECLARATOR_ABSTRACT)
{
--- gcc/testsuite/g++.dg/cpp0x/variadic185.C.jj	2024-05-09 15:08:49.070651189 +0200
+++ gcc/testsuite/g++.dg/cpp0x/variadic185.C	2024-05-09 15:07:40.045583153 +0200
@@ -0,0 +1,39 @@
+// PR c++/115012
+// { dg-do compile { target { c++11 && c++23_down } } }
+// { dg-final { scan-assembler "_Z3fooILi10EJidEEvDpAT__T0_" } }
+// { dg-final { scan-assembler "_Z3barILi10EiEvPT0_" } }
+// { dg-final { scan-assembler "_Z3bazILi10EJidEEvDpAT__T0_" } }
+// { dg-final { scan-assembler 

Re: [PATCH, v2] Fortran: improve attribute conflict checking [PR93635]

2024-05-24 Thread Harald Anlauf

Hi Mikael,

On 5/24/24 20:17, Mikael Morin wrote:

On 23/05/2024 at 21:15, Harald Anlauf wrote:

Hi Mikael,

On 5/23/24 09:49, Mikael Morin wrote:

On 13/05/2024 at 09:25, Mikael Morin wrote:

On 10/05/2024 at 21:56, Harald Anlauf wrote:

On 10.05.24 at 21:48, Harald Anlauf wrote:

Hi Mikael,

On 10.05.24 at 11:45, Mikael Morin wrote:

On 09/05/2024 at 22:30, Harald Anlauf wrote:

I'll stop here...


Thanks. Go figure, I have no problem reproducing today.
It's PR99798 (and there is even a patch for it).


this patch has bit-rotted a little: the type of gfc_release_symbol
has changed to bool, but this can be fixed.

Unfortunately, applying the patch does not remove the ICEs here...


Oops, I take that back!  There was an error on my side applying the
patch; and now it does fix the ICEs after correcting that hiccup.


Now the PR99798 patch is ready to be pushed, but I won't be available
for a few days.  We can finish our discussion on this topic afterwards.


Hello,

I'm coming back to this.
I think either Steve's patch or your variant in the PR is a
better fix for the ICE as a first step; they seem less fragile at least.
Then we can look at a possible reordering of conflict checks as with the
patch you originally submitted in this thread.


like the attached variant?


Yes.  The churn in the testsuite is actually not that bad.
OK for master, thanks for the patch.


thanks, will do.

I wouldn't push for backporting, but if you feel like doing it, it seems 
safe enough (depending on my own backport for PR99798 of course).


There's no pressing need.  I'll mark the patch as backportable
with dependency in my own list, in case the question comes up.

Regarding the conflict check reordering, I'm tempted to just drop it at 
this point, or do you think it remains worth it?


I don't really have a showcase where this would bring a benefit now,
so I'm dropping this idea.

There are issues where specifying a standard version changes
the error recovery path (or rather leads to an ICE), but as some
of these are due to emitting an error during parsing instead of
during resolution, my suggestion does not help there.

If you look for an example: this one is taken from pr101281

subroutine a3pr (xn) bind(C)
  character(len=n), pointer :: xn(..)
end

vs.

subroutine a3pr (xn) bind(C)
  character(len=n), pointer :: xn
  dimension :: xn(..)
end

The first one gives lots of invalid reads in valgrind with -std=f2008,
or ICEs, while the second does not.

Thanks,
Harald




Re: [PATCH] Add testcase for PR c++/105229: ICE in lookup_template_class_1

2024-05-24 Thread Jason Merrill

On 5/24/24 08:16, Simon Martin wrote:

The test case in PR c++/105229 has been fixed since 11.4 (via PR
c++/106024) - the attached patch simply adds the case to the test suite.

Successfully tested on x86_64-pc-linux-gnu. OK for trunk?


OK, thanks.

BTW, the patch was corrupted by your mail client wrapping long lines; 
can you use git send-email instead in the future?



Thanks! Simon


  PR c++/105229

gcc/testsuite/ChangeLog:

  * g++.dg/parse/crash72.C: New test.

---
   gcc/testsuite/g++.dg/parse/crash72.C | 12 
   1 file changed, 12 insertions(+)
   create mode 100644 gcc/testsuite/g++.dg/parse/crash72.C

diff --git a/gcc/testsuite/g++.dg/parse/crash72.C
b/gcc/testsuite/g++.dg/parse/crash72.C
new file mode 100644
index 000..df469e20f28
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/crash72.C
@@ -0,0 +1,12 @@
+// PR c++/105229
+// { dg-do compile { target c++20 } }
+// { dg-additional-options "-Wno-missing-template-keyword" }
+
+template  void bar ()
+{
+  []  {}.operator () <> (); // { dg-error "expected
primary-expression" }
+}
+void foo ()
+{
+  bar ();
+}
--
2.44.0





Re: [PATCH, v2] Fortran: improve attribute conflict checking [PR93635]

2024-05-24 Thread Mikael Morin

On 23/05/2024 at 21:15, Harald Anlauf wrote:

Hi Mikael,

On 5/23/24 09:49, Mikael Morin wrote:

On 13/05/2024 at 09:25, Mikael Morin wrote:

On 10/05/2024 at 21:56, Harald Anlauf wrote:

On 10.05.24 at 21:48, Harald Anlauf wrote:

Hi Mikael,

On 10.05.24 at 11:45, Mikael Morin wrote:

On 09/05/2024 at 22:30, Harald Anlauf wrote:

I'll stop here...


Thanks. Go figure, I have no problem reproducing today.
It's PR99798 (and there is even a patch for it).


this patch has bit-rotted a little: the type of gfc_release_symbol
has changed to bool, but this can be fixed.

Unfortunately, applying the patch does not remove the ICEs here...


Oops, I take that back!  There was an error on my side applying the
patch; and now it does fix the ICEs after correcting that hiccup.


Now the PR99798 patch is ready to be pushed, but I won't be available
for a few days.  We can finish our discussion on this topic afterwards.


Hello,

I'm coming back to this.
I think either Steve's patch or your variant in the PR is a
better fix for the ICE as a first step; they seem less fragile at least.
Then we can look at a possible reordering of conflict checks as with the
patch you originally submitted in this thread.


like the attached variant?


Yes.  The churn in the testsuite is actually not that bad.
OK for master, thanks for the patch.
I wouldn't push for backporting, but if you feel like doing it, it seems 
safe enough (depending on my own backport for PR99798 of course).


Regarding the conflict check reordering, I'm tempted to just drop it at 
this point, or do you think it remains worth it?





[c-family] Small enhancement to implementation of -fdump-ada-spec

2024-05-24 Thread Eric Botcazou
This lets it recognize more preprocessing floating constants.

Tested on x86-64/Linux, applied on the mainline.


2024-05-24  Eric Botcazou  

* c-ada-spec.cc (is_cpp_float): New predicate.
(dump_number): Deal with more preprocessing floating constants.
(dump_ada_macros) : Use is_cpp_float.

-- 
Eric Botcazou

diff --git a/gcc/c-family/c-ada-spec.cc b/gcc/c-family/c-ada-spec.cc
index 8f0849bd427..0bea923220b 100644
--- a/gcc/c-family/c-ada-spec.cc
+++ b/gcc/c-family/c-ada-spec.cc
@@ -113,6 +113,26 @@ macro_length (const cpp_macro *macro, int *supported, int *buffer_len,
   (*buffer_len)++;
 }
 
+/* Return true if NUMBER is a preprocessing floating-point number.  */
+
+static bool
+is_cpp_float (unsigned char *number)
+{
+  /* In C, a floating constant need not have a point.  */
+  while (*number != '\0')
+{
+  if (*number == '.')
+	return true;
+  else if ((*number == 'e' || *number == 'E')
+	   && (*(number + 1) == '+' || *(number + 1) == '-'))
+	return true;
+  else
+	number++;
+}
+
+  return false;
+}
+
 /* Dump all digits/hex chars from NUMBER to BUFFER and return a pointer
to the character after the last character written.  If FLOAT_P is true,
this is a floating-point number.  */
@@ -120,12 +140,45 @@ macro_length (const cpp_macro *macro, int *supported, int *buffer_len,
 static unsigned char *
 dump_number (unsigned char *number, unsigned char *buffer, bool float_p)
 {
-  while (*number != '\0'
-	 && *number != (float_p ? 'F' : 'U')
-	 && *number != (float_p ? 'f' : 'u')
-	 && *number != 'l'
-	 && *number != 'L')
-*buffer++ = *number++;
+  /* In Ada, a real literal is a numeric literal that includes a point.  */
+  if (float_p)
+{
+  bool point_seen = false;
+
+  while (*number != '\0')
+	{
+	  if (ISDIGIT (*number))
+	*buffer++ = *number++;
+	  else if (*number == '.')
+	{
+	  *buffer++ = *number++;
+	  point_seen = true;
+	}
+	  else if ((*number == 'e' || *number == 'E')
+		   && (*(number + 1) == '+' || *(number + 1) == '-'))
+	{
+	  if (!point_seen)
+		{
+		  *buffer++ = '.';
+		  *buffer++ = '0';
+		  point_seen = true;
+		}
+	   *buffer++ = *number++;
+	   *buffer++ = *number++;
+	}
+	  else
+	break;
+	}
+}
+
+  /* An integer literal is a numeric literal without a point.  */
+  else
+while (*number != '\0'
+	   && *number != 'U'
+	   && *number != 'u'
+	   && *number != 'l'
+	   && *number != 'L')
+  *buffer++ = *number++;
 
   return buffer;
 }
@@ -450,7 +503,7 @@ dump_ada_macros (pretty_printer *pp, const char* file)
 
 			  default:
 /* Dump floating-point constant unmodified.  */
-if (strchr ((const char *)tmp, '.'))
+if (is_cpp_float (tmp))
   buffer = dump_number (tmp, buffer, true);
 else
   {
@@ -480,8 +533,7 @@ dump_ada_macros (pretty_printer *pp, const char* file)
 
 			default:
 			  buffer
-			= dump_number (tmp, buffer,
-	   strchr ((const char *)tmp, '.'));
+			= dump_number (tmp, buffer, is_cpp_float (tmp));
 			  break;
 		  }
 		break;
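
For readers who want to try the conversion outside of GCC, the float handling in the patch above can be approximated by the following standalone sketch. The names looks_like_float and to_ada_real are hypothetical stand-ins for is_cpp_float and the float path of dump_number; they are not GCC functions, and the caller-supplied buffer size is an assumption of this sketch. Note that, as in the patch, an exponent only counts when it carries an explicit sign, which presumably keeps hexadecimal integers such as 0x1e5 from being classified as floats.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for is_cpp_float: true if S contains a point,
   or an exponent marker followed by an explicit sign.  */
static bool
looks_like_float (const char *s)
{
  assert (s != NULL);
  if (strchr (s, '.') != NULL)
    return true;
  for (; *s != '\0'; s++)
    if ((*s == 'e' || *s == 'E') && (s[1] == '+' || s[1] == '-'))
      return true;
  return false;
}

/* Hypothetical stand-in for the float path of dump_number: copy the
   number in SRC to DST as an Ada real literal, inserting ".0" before
   the exponent when no point was seen, and stop at the first character
   that is not part of the number (for example an 'f' suffix).
   DST must have room for strlen (SRC) + 3 bytes.  Returns DST.  */
static char *
to_ada_real (const char *src, char *dst)
{
  char *out = dst;
  bool point_seen = false;

  assert (src != NULL && dst != NULL);
  while (*src != '\0')
    {
      if (isdigit ((unsigned char) *src))
        *out++ = *src++;
      else if (*src == '.')
        {
          *out++ = *src++;
          point_seen = true;
        }
      else if ((*src == 'e' || *src == 'E')
               && (src[1] == '+' || src[1] == '-'))
        {
          if (!point_seen)
            {
              /* Ada requires a point in a real literal.  */
              *out++ = '.';
              *out++ = '0';
              point_seen = true;
            }
          *out++ = *src++;  /* the 'e' or 'E' */
          *out++ = *src++;  /* the sign */
        }
      else
        break;
    }
  *out = '\0';
  return dst;
}
```

With these definitions, a C constant written as 1e+5 would come out as 1.0e+5, while 3.14f would be cut at the suffix and come out as 3.14.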


Re: [PATCH v2] RISC-V: Introduce -mrvv-allow-misalign.

2024-05-24 Thread Palmer Dabbelt

On Fri, 24 May 2024 09:19:09 PDT (-0700), Robin Dapp wrote:

We should have something in doc/invoke too, this one is going to be
tricky for users.  We'll also have to define how this interacts with
the existing -mstrict-align.


Addressed the rest in the attached v2 which also fixes tests.
I'm really not sure about -mstrict-align.  I would have hoped that using
-mstrict-align we'd never run into any movmisalign situation but that
might be wishful thinking.  Do we need to specify an
interaction, though?  For now the new option disables movmisalign so
if we hit that despite -mstrict-align we'd still not vectorize it.


I think we just need to write it down.  I think there's two ways to 
encode this: either we treat scalar and vector as independent, or we 
couple them.  If we treat them independently then we end up with four 
cases, it's not clear if they're all interesting.  IIUC with this patch 
we'd be able to encode


* -mstrict-align: Both scalar and vector misaligned accesses are 
 unsupported (-mrvv-allow-misalign doesn't matter).  I'm not sure if 
 there's hardware there, but given we have systems that don't support 
 scalar misaligned accesses it seems reasonable to assume they'll also 
 not support vector misaligned accesses.
* -mno-strict-align -mno-rvv-allow-misalign: Scalar misaligned accesses
 are supported, vector misaligned accesses aren't.  This matches our best 
 theory of how the k230 and k1 behave, so it also seems reasonable to 
 support.
* -mno-strict-align -mrvv-allow-misalign: Both scalar and vector 
 misaligned accesses are supported.  This seems reasonable to support 
 as it's how I'd hope big cores end up being designed, though again 
 there's no hardware.


The fourth case is kind of wacky: scalar misaligned is unsupported, 
vector misaligned is supported.  I'm not really sure why we'd end up 
with a system like that, but HW vendors do wacky things so it's kind of 
hard to predict.


IMO it's fine if we define that as an unencodable case; we can always
add something later.  We should just write it down so 
nobody's confused.
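
To make the three encodable cases concrete, here is a minimal sketch of the coupled reading described above, where -mstrict-align overrides -mrvv-allow-misalign. The struct and the helper decode_misalign_flags are hypothetical and only illustrate the proposed semantics; they are not what the patch implements in riscv.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Which misaligned accesses the compiler may assume are supported.  */
struct misalign_support
{
  bool scalar;
  bool vector;
};

/* Hypothetical helper, not part of the patch: derive the support
   matrix from the two flags under the coupled reading.  */
static struct misalign_support
decode_misalign_flags (bool strict_align, bool rvv_allow_misalign)
{
  struct misalign_support s;

  s.scalar = !strict_align;
  s.vector = !strict_align && rvv_allow_misalign;

  /* The fourth case (vector misaligned supported, scalar not) is
     never produced by this encoding.  */
  assert (s.scalar || !s.vector);
  return s;
}
```

Under this reading, passing -mstrict-align together with -mrvv-allow-misalign would yield the same result as -mstrict-align alone.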



Regtested on rv64gcv_zvfh_zvbb.

Regards
 Robin

This patch changes the default from always enabling movmisalign to
not enabling it.  It adds an option to override the default and adds
generic-ooo to the uarchs that support misaligned vector access.

It also adds a check_effective_target_riscv_v_misalign_ok to the
testsuite which enables or disables the vector misalignment tests
depending on whether the target under test can execute a misaligned
vle32.

gcc/ChangeLog:

* config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
Move from here...
* config/riscv/riscv.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
...to here and make dependent on uarch and rvv_allow_misalign.
* config/riscv/riscv.opt: Add -mrvv-allow-misalign.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add
check_effective_target_riscv_v_misalign_ok.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Add
-mrvv-allow-misalign.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-8.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/misalign-1.c:
---
 gcc/config/riscv/riscv-opts.h |  3 --
 gcc/config/riscv/riscv.cc | 18 ++
 gcc/config/riscv/riscv.h  |  6 
 gcc/config/riscv/riscv.opt|  5 +++
 gcc/doc/invoke.texi   |  5 +++
 .../costmodel/riscv/rvv/dynamic-lmul2-7.c |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-10.c  |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-11.c  |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-12.c  |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-8.c   |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-9.c   |  2 +-
 .../riscv/rvv/autovec/vls/misalign-1.c|  2 +-
 gcc/testsuite/lib/target-supports.exp | 34 +--
 13 files changed, 73 insertions(+), 12 deletions(-)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 1b2dd5757a8..f58a07abffc 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -147,9 +147,6 @@ enum rvv_vector_bits_enum {
  ? 0 \
  : 32 << (__builtin_popcount (opts->x_riscv_zvl_flags) - 1))

-/* TODO: Enable RVV movmisalign by default for now.  */
-#define TARGET_VECTOR_MISALIGN_SUPPORTED 1
-
 /* The maximmum LMUL according to user configuration.  */
 #define TARGET_MAX_LMUL \
   (int) (rvv_max_lmul == RVV_DYNAMIC ? RVV_M8 : rvv_max_lmul)
diff --git 

[PATCH v2] RISC-V: Introduce -mrvv-allow-misalign.

2024-05-24 Thread Robin Dapp
> We should have something in doc/invoke too, this one is going to be
> tricky for users.  We'll also have to define how this interacts with
> the existing -mstrict-align.

Addressed the rest in the attached v2 which also fixes tests.
I'm really not sure about -mstrict-align.  I would have hoped that using
-mstrict-align we'd never run into any movmisalign situation but that
might be wishful thinking.  Do we need to specify an
interaction, though?  For now the new option disables movmisalign so
if we hit that despite -mstrict-align we'd still not vectorize it.

Regtested on rv64gcv_zvfh_zvbb.

Regards
 Robin

This patch changes the default from always enabling movmisalign to
not enabling it.  It adds an option to override the default and adds
generic-ooo to the uarchs that support misaligned vector access.

It also adds a check_effective_target_riscv_v_misalign_ok to the
testsuite which enables or disables the vector misalignment tests
depending on whether the target under test can execute a misaligned
vle32.

gcc/ChangeLog:

* config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
Move from here...
* config/riscv/riscv.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
...to here and make dependent on uarch and rvv_allow_misalign.
* config/riscv/riscv.opt: Add -mrvv-allow-misalign.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add
check_effective_target_riscv_v_misalign_ok.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Add
-mrvv-allow-misalign.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-8.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/misalign-1.c:
---
 gcc/config/riscv/riscv-opts.h |  3 --
 gcc/config/riscv/riscv.cc | 18 ++
 gcc/config/riscv/riscv.h  |  6 
 gcc/config/riscv/riscv.opt|  5 +++
 gcc/doc/invoke.texi   |  5 +++
 .../costmodel/riscv/rvv/dynamic-lmul2-7.c |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-10.c  |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-11.c  |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-12.c  |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-8.c   |  2 +-
 .../vect/costmodel/riscv/rvv/vla_vs_vls-9.c   |  2 +-
 .../riscv/rvv/autovec/vls/misalign-1.c|  2 +-
 gcc/testsuite/lib/target-supports.exp | 34 +--
 13 files changed, 73 insertions(+), 12 deletions(-)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 1b2dd5757a8..f58a07abffc 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -147,9 +147,6 @@ enum rvv_vector_bits_enum {
  ? 0 \
  : 32 << (__builtin_popcount (opts->x_riscv_zvl_flags) - 1))
 
-/* TODO: Enable RVV movmisalign by default for now.  */
-#define TARGET_VECTOR_MISALIGN_SUPPORTED 1
-
 /* The maximmum LMUL according to user configuration.  */
 #define TARGET_MAX_LMUL \
   (int) (rvv_max_lmul == RVV_DYNAMIC ? RVV_M8 : rvv_max_lmul)
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 85df5b7ab49..cfdeb56559f 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -287,6 +287,7 @@ struct riscv_tune_param
   unsigned short memory_cost;
   unsigned short fmv_cost;
   bool slow_unaligned_access;
+  bool rvv_unaligned_access;
   bool use_divmod_expansion;
   bool overlap_op_by_pieces;
   unsigned int fusible_ops;
@@ -299,6 +300,10 @@ struct riscv_tune_param
 /* Whether unaligned accesses execute very slowly.  */
 bool riscv_slow_unaligned_access_p;
 
+/* Whether misaligned vector accesses are supported (i.e. do not
+   throw an exception).  */
+bool riscv_rvv_unaligned_access_p;
+
 /* Whether user explicitly passed -mstrict-align.  */
 bool riscv_user_wants_strict_align;
 
@@ -441,6 +446,7 @@ static const struct riscv_tune_param rocket_tune_info = {
   5,   /* memory_cost */
   8,   /* fmv_cost */
   true,   /* slow_unaligned_access */
+  false,   /* rvv_unaligned_access */
   false,   /* use_divmod_expansion */
   false,   /* overlap_op_by_pieces */
   RISCV_FUSE_NOTHING,   /* fusible_ops */
@@ -459,6 +465,7 @@ static const struct riscv_tune_param sifive_7_tune_info = {
   3,   /* memory_cost */
   8,   

Re: [C PATCH, v2]: allow aliasing of compatible types derived from enumeral types [PR115157]

2024-05-24 Thread Jakub Jelinek
On Fri, May 24, 2024 at 05:39:45PM +0200, Martin Uecker wrote:
> PR 115157
> PR 115177
> 
> gcc/c/
> * c-decl.cc (shadow_tag_warned, parser_xref_tag, start_enum,
> finish_enum): Set SET_TYPE_STRUCTURAL_EQUALITY / TYPE_CANONICAL.
> * c-objc-common.cc (c_get_alias_set): Remove special case.
> (get_aka_type): Add special case.
> 
> gcc/c-family/
> * c-attribs.cc (handle_hardbool_attribute): Set TYPE_CANONICAL
> for hardbools.
> 
> gcc/
> * godump.cc (go_output_typedef): use TYPE_MAIN_VARIANT instead
> of TYPE_CANONICAL.

Just a nit:
s/use/Use/

Jakub



Re: [PATCH] .gitattributes: disable crlf translation

2024-05-24 Thread Peter0x44

On 2024-05-23 05:01, Richard Biener wrote:
On Thu, May 23, 2024 at 5:50 AM Peter Damianov wrote:

By default, git has the "autocrlf" """feature""" enabled. This causes the
files to have CRLF line endings when checked out on Windows, which in the
case of configure, causes confusing errors like:

./gcc/configure: line 14: $'\r': command not found
./gcc/configure: line 29: syntax error near unexpected token `newline'
'/gcc/configure: line 29: ` ;;

when it is invoked.

Any files damaged in this way can be fixed with:
$ git config core.autocrlf false
$ git reset
$ git checkout .

But, it's better to simply avoid this problem in the first place.
This behavior is never helpful or desired for gcc.


For files added/edited on Windows does this then also strip the \r
(upon which action?)?  Otherwise I think this looks good but I'm not
a git expert.

From what I can tell, the \r doesn't get stripped from the files, but the
commit itself acts as if it isn't there.  In the working directory, if an
editor introduces a CRLF it remains, but any commits created won't include
it.

I am finding the git documentation a bit confusing on this point though,
so I'm not certain.

I'm far from a git expert as well.

I checked and I couldn't find any CRLFs in gcc right now.
I tried the commands here:
https://git-scm.com/docs/gitattributes

$ git add --renormalize .
$ git status        # Show files that will be normalized

And git status showed no changes.

As far as I can tell, this change is okay. I would still feel more 
confident if others looked at it, though.
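
Independently of git, a checked-out file can be scanned for CRLF line endings with a few lines of C. This is only a rough sketch: count_crlf is a hypothetical helper that works on a buffer already read into memory, and it says nothing about how git itself normalizes files.

```c
#include <assert.h>
#include <stddef.h>

/* Count the "\r\n" pairs in the LEN bytes starting at BUF.  */
static size_t
count_crlf (const char *buf, size_t len)
{
  size_t n = 0;

  assert (buf != NULL || len == 0);
  for (size_t i = 0; i + 1 < len; i++)
    if (buf[i] == '\r' && buf[i + 1] == '\n')
      n++;
  return n;
}
```

A non-zero result for a file such as configure would indicate the kind of damage that produces the $'\r' errors quoted above.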


Thanks,
Peter D.


Richard.


Signed-off-by: Peter Damianov 
---
 .gitattributes | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/.gitattributes b/.gitattributes
index e75bfc595bf..1e116987c98 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -8,3 +8,6 @@ ChangeLog whitespace=indent-with-non-tab,space-before-tab,trailing-space

 # Use together with git config diff.md.xfuncname '^\(define.*$'
 # which is run by contrib/gcc-git-customization.sh too.
 *.md diff=md
+
+# Disable lf -> crlf translation on windows.
+* -crlf
--
2.39.2



[C PATCH, v2]: allow aliasing of compatible types derived from enumeral types [PR115157]

2024-05-24 Thread Martin Uecker


This is another version of this patch with two changes:

- I added a fix (with test) for PR 115177 which is just the same
issue for hardbools which are internally implemented as enums.

- I fixed the golang issue. Since the addition of the main variant
to the seen decls is unconditional, I also removed the addition
of the type itself, which now seems unnecessary.

Bootstrapped and regression tested on x86_64.

Martin



C: allow aliasing of compatible types derived from enumeral types [PR115157]

Aliasing of enumeral types with the underlying integer is now allowed
by setting the aliasing set to zero.  But this does not allow aliasing
of derived types which are compatible as required by ISO C.  Instead,
initially set structural equality.  Then set TYPE_CANONICAL and update
pointers and main variants when the type is completed (as done for
structures and unions in C23).
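
As an illustration of what "derived types which are compatible" means here, consider pointers to an enum and to its underlying integer type. The sketch below is not one of the tests added by the patch; the names are hypothetical, and it assumes that the implementation picks int as the type compatible with enum E (as GCC does for an enum with a negative enumerator), which the _Static_assert checks at compile time.

```c
#include <assert.h>
#include <stddef.h>

enum E { E1 = -1, E2 = 0, E3 = 1 };

typedef int A;      /* assumed to be the type compatible with enum E */
typedef enum E B;

/* Fails to compile if the assumption above does not hold.  */
_Static_assert (_Generic ((B) 0, A: 1, default: 0),
                "enum E is assumed to be compatible with int");

/* Store C through an 'A **' view of PA and D through a 'B **' view of
   PB, then read back through 'A **'.  When PA and PB point to the same
   object, the second store must be visible to the load; this is only
   guaranteed if the compiler treats 'A *' and 'B *' as compatible.  */
static void *
store_both (void *pa, void *pb, A *c, B *d)
{
  assert (pa != NULL && pb != NULL);
  *(A **) pa = c;
  *(B **) pb = d;
  return *(A **) pa;
}
```

Calling store_both with the same slot for both pointer arguments is expected to return the second stored pointer; an optimizer that placed 'A *' and 'B *' in unrelated alias sets could instead return the first.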

PR 115157
PR 115177

gcc/c/
* c-decl.cc (shadow_tag_warned, parser_xref_tag, start_enum,
finish_enum): Set SET_TYPE_STRUCTURAL_EQUALITY / TYPE_CANONICAL.
* c-objc-common.cc (c_get_alias_set): Remove special case.
(get_aka_type): Add special case.

gcc/c-family/
* c-attribs.cc (handle_hardbool_attribute): Set TYPE_CANONICAL
for hardbools.

gcc/
* godump.cc (go_output_typedef): use TYPE_MAIN_VARIANT instead
of TYPE_CANONICAL.

gcc/testsuite/
* gcc.dg/enum-alias-1.c: New test.
* gcc.dg/enum-alias-2.c: New test.
* gcc.dg/enum-alias-3.c: New test.
* gcc.dg/enum-alias-4.c: New test.

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 04e39b41bdf..033395093b6 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -1074,6 +1074,7 @@ handle_hardbool_attribute (tree *node, tree name, tree args,
 
   TREE_SET_CODE (*node, ENUMERAL_TYPE);
   ENUM_UNDERLYING_TYPE (*node) = orig;
+  TYPE_CANONICAL (*node) = TYPE_CANONICAL (orig);
 
   tree false_value;
   if (args)
diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index b691b91b3db..6e6606c9570 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -5051,7 +5051,7 @@ shadow_tag_warned (const struct c_declspecs *declspecs, int warned)
  if (t == NULL_TREE)
{
  t = make_node (code);
- if (flag_isoc23 && code != ENUMERAL_TYPE)
+ if (flag_isoc23 || code == ENUMERAL_TYPE)
SET_TYPE_STRUCTURAL_EQUALITY (t);
  pushtag (input_location, name, t);
}
@@ -8828,7 +8828,7 @@ parser_xref_tag (location_t loc, enum tree_code code, tree name,
  the forward-reference will be altered into a real type.  */
 
   ref = make_node (code);
-  if (flag_isoc23 && code != ENUMERAL_TYPE)
+  if (flag_isoc23 || code == ENUMERAL_TYPE)
 SET_TYPE_STRUCTURAL_EQUALITY (ref);
   if (code == ENUMERAL_TYPE)
 {
@@ -9919,6 +9919,7 @@ start_enum (location_t loc, struct c_enum_contents *the_enum, tree name,
 {
   enumtype = make_node (ENUMERAL_TYPE);
   TYPE_SIZE (enumtype) = NULL_TREE;
+  SET_TYPE_STRUCTURAL_EQUALITY (enumtype);
   pushtag (loc, name, enumtype);
   if (fixed_underlying_type != NULL_TREE)
{
@@ -9935,6 +9936,8 @@ start_enum (location_t loc, struct c_enum_contents *the_enum, tree name,
  TYPE_SIZE (enumtype) = NULL_TREE;
  TYPE_PRECISION (enumtype) = TYPE_PRECISION (fixed_underlying_type);
  ENUM_UNDERLYING_TYPE (enumtype) = fixed_underlying_type;
+ TYPE_CANONICAL (enumtype) = TYPE_CANONICAL (fixed_underlying_type);
+ c_update_type_canonical (enumtype);
  layout_type (enumtype);
}
 }
@@ -10094,6 +10097,10 @@ finish_enum (tree enumtype, tree values, tree attributes)
   ENUM_UNDERLYING_TYPE (enumtype) =
c_common_type_for_size (TYPE_PRECISION (tem), TYPE_UNSIGNED (tem));
 
+  TYPE_CANONICAL (enumtype) =
+   TYPE_CANONICAL (ENUM_UNDERLYING_TYPE (enumtype));
+  c_update_type_canonical (enumtype);
+
   layout_type (enumtype);
 }
 
diff --git a/gcc/c/c-objc-common.cc b/gcc/c/c-objc-common.cc
index b7c72d2609c..551ec6f4b65 100644
--- a/gcc/c/c-objc-common.cc
+++ b/gcc/c/c-objc-common.cc
@@ -130,6 +130,8 @@ get_aka_type (tree type)
 
   result = get_aka_type (orig_type);
 }
+  else if (TREE_CODE (type) == ENUMERAL_TYPE)
+return type;
   else
 {
   tree canonical = TYPE_CANONICAL (type);
@@ -418,11 +420,6 @@ c_var_mod_p (tree x, tree fn ATTRIBUTE_UNUSED)
 alias_set_type
 c_get_alias_set (tree t)
 {
-  /* Allow aliasing between enumeral types and the underlying
- integer type.  This is required since those are compatible types.  */
-  if (TREE_CODE (t) == ENUMERAL_TYPE)
-return get_alias_set (ENUM_UNDERLYING_TYPE (t));
-
   /* Structs with variable size can alias different incompatible
  structs.  Let 

Re: [PATCH] c++/modules: Improve diagnostic when redeclaring builtin in module [PR102345]

2024-05-24 Thread Jason Merrill

On 5/24/24 11:32, Nathaniel Shead wrote:

On Fri, May 24, 2024 at 11:24:38AM -0400, Jason Merrill wrote:

On 5/24/24 11:20, Nathaniel Shead wrote:

This is just a small improvement to a diagnostic.  I thought about also
adding an inform to suggest something like "standard library headers
should be included in the GMF" or somesuch, but I'm not sure that'll be
especially valuable and may be confusing if this particular error is
caused some other way.

Bootstrapped and regtested (so far just modules.exp) on
x86_64-pc-linux-gnu, OK for trunk if full regtest succeeds?

-- >8 --

If a user mistakenly includes a standard library header within the
module purview, they currently get a confusing "declaration conflicts
with builtin" error.  This patch updates the message to include "in
module", to help guide the user towards the likely cause.

gcc/cp/ChangeLog:

* module.cc (module_may_redeclare): Update error message.

gcc/testsuite/ChangeLog:

* g++.dg/modules/enum-12.C: Test for updated error.

Signed-off-by: Nathaniel Shead 
---
   gcc/cp/module.cc   | 9 -
   gcc/testsuite/g++.dg/modules/enum-12.C | 2 +-
   2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 6cd7d9e0b93..60cf3ebc468 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -19140,7 +19140,14 @@ module_may_redeclare (tree olddecl, tree newdecl)
 decl = newdecl ? newdecl : olddecl;
 location_t loc = newdecl ? DECL_SOURCE_LOCATION (newdecl) : input_location;
 if (DECL_IS_UNDECLARED_BUILTIN (olddecl))
-error_at (loc, "declaration %qD conflicts with builtin", decl);
+{
+  if (newdecl_attached_p)
+   error_at (loc, "declaring %qD in module %qs conflicts with builtin",


Maybe "with builtin in global module"?


Yup, that's even clearer, thanks.




+ decl, new_mod->get_flatname ());
+  else
+   error_at (loc, "declaring %qD in global module conflicts with builtin",


I'm not sure mentioning the global module adds anything in this case?



I'd done it for consistency with the other cases in case we ever somehow
reached this code path, but happy to remove since this should always be
a problem regardless.

So maybe something like this?


OK.


-- >8 --

If a user mistakenly includes a standard library header within the
module purview, they currently get a confusing "declaration conflicts
with builtin" error.  This patch updates the message to include "in
module", to help guide the user towards the likely cause.

gcc/cp/ChangeLog:

* module.cc (module_may_redeclare): Update error message.

gcc/testsuite/ChangeLog:

* g++.dg/modules/enum-12.C: Test for updated error.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc   | 8 +++-
  gcc/testsuite/g++.dg/modules/enum-12.C | 2 +-
  2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 6cd7d9e0b93..3f8f84bb9fd 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -19140,7 +19140,13 @@ module_may_redeclare (tree olddecl, tree newdecl)
decl = newdecl ? newdecl : olddecl;
location_t loc = newdecl ? DECL_SOURCE_LOCATION (newdecl) : input_location;
if (DECL_IS_UNDECLARED_BUILTIN (olddecl))
-error_at (loc, "declaration %qD conflicts with builtin", decl);
+{
+  if (newdecl_attached_p)
+   error_at (loc, "declaring %qD in module %qs conflicts with builtin "
+ "in global module", decl, new_mod->get_flatname ());
+  else
+   error_at (loc, "declaration %qD conflicts with builtin", decl);
+}
else if (DECL_LANG_SPECIFIC (old_inner) && DECL_MODULE_IMPORT_P (old_inner))
  {
auto_diagnostic_group d;
diff --git a/gcc/testsuite/g++.dg/modules/enum-12.C b/gcc/testsuite/g++.dg/modules/enum-12.C
index 064f220dedf..cf8f445e076 100644
--- a/gcc/testsuite/g++.dg/modules/enum-12.C
+++ b/gcc/testsuite/g++.dg/modules/enum-12.C
@@ -4,7 +4,7 @@
  
  export module foo;

  namespace std {
-  enum class align_val_t : decltype(sizeof(int)) {};  // { dg-error "conflicts with builtin" }
+  enum class align_val_t : decltype(sizeof(int)) {};  // { dg-error "in module .foo. conflicts with builtin" }
  }
  
  // { dg-prune-output "not writing module" }




[PATCH v2] tree-ssa-pre.c/115214(ICE in find_or_generate_expression, at tree-ssa-pre.c:2780): Return NULL_TREE when deal special cases.

2024-05-24 Thread Jiawei
Return NULL_TREE when matching the POLY_INT case.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652641.html

gcc/ChangeLog:

* tree-ssa-pre.cc (create_component_ref_by_pieces_1): New
conditions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr115214.c: New test.

---
 .../gcc.target/riscv/rvv/vsetvl/pr115214.c| 51 +++
 gcc/tree-ssa-pre.cc   | 13 +++--
 2 files changed, 61 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c
new file mode 100644
index 000..9d19641196f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-mrvv-vector-bits=scalable -march=rv64gcv -mabi=lp64d -O3 -w" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+
+#include 
+
+static inline __attribute__(()) int vaddq_f32();
+static inline __attribute__(()) int vload_tillz_f32(int nlane) {
+  vint32m1_t __trans_tmp_9;
+  {
+int __trans_tmp_0 = nlane;
+{
+  vint64m1_t __trans_tmp_1;
+  vint64m1_t __trans_tmp_2;
+  vint64m1_t __trans_tmp_3;
+  vint64m1_t __trans_tmp_4;
+  if (__trans_tmp_0 == 1) {
+{
+  __trans_tmp_3 =
+  __riscv_vslideup_vx_i64m1(__trans_tmp_1, __trans_tmp_2, 1, 2);
+}
+__trans_tmp_4 = __trans_tmp_2;
+  }
+  __trans_tmp_4 = __trans_tmp_3;
+  __trans_tmp_9 = __riscv_vreinterpret_v_i64m1_i32m1(__trans_tmp_3);
+}
+  }
+  return vaddq_f32(__trans_tmp_9); /* { dg-error {RVV type 'vint32m1_t' cannot be passed to an unprototyped function} } */
+}
+
+char CFLOAT_add_args[3];
+const int *CFLOAT_add_steps;
+const int CFLOAT_steps;
+
+__attribute__(()) void CFLOAT_add() {
+  char *b_src0 = _add_args[0], *b_src1 = _add_args[1],
+   *b_dst = _add_args[2];
+  const float *src1 = (float *)b_src1;
+  float *dst = (float *)b_dst;
+  const int ssrc1 = CFLOAT_add_steps[1] / sizeof(float);
+  const int sdst = CFLOAT_add_steps[2] / sizeof(float);
+  const int hstep = 4 / 2;
+  vfloat32m1x2_t a;
+  int len = 255;
+  for (; len > 0; len -= hstep, src1 += 4, dst += 4) {
+int b = vload_tillz_f32(len);
+int r = vaddq_f32(a.__val[0], b); /* { dg-error {RVV type '__rvv_float32m1_t' cannot be passed to an unprototyped function} } */
+  }
+  for (; len > 0; --len, b_src0 += CFLOAT_steps,
+  b_src1 += CFLOAT_add_steps[1], b_dst += CFLOAT_add_steps[2])
+;
+}
\ No newline at end of file
diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc
index 75217f5cde1..b185f858c7f 100644
--- a/gcc/tree-ssa-pre.cc
+++ b/gcc/tree-ssa-pre.cc
@@ -2685,11 +2685,18 @@ create_component_ref_by_pieces_1 (basic_block block, vn_reference_t ref,
   here as the element alignment may be not visible.  See
   PR43783.  Simply drop the element size for constant
   sizes.  */
-   if (TREE_CODE (genop3) == INTEGER_CST
+   if ((TREE_CODE (genop3) == INTEGER_CST
&& TREE_CODE (TYPE_SIZE_UNIT (elmt_type)) == INTEGER_CST
&& wi::eq_p (wi::to_offset (TYPE_SIZE_UNIT (elmt_type)),
-(wi::to_offset (genop3)
- * vn_ref_op_align_unit (currop
+ (wi::to_offset (genop3) * vn_ref_op_align_unit (currop
+   || (TREE_CODE (genop3) == POLY_INT_CST
+ && TREE_CODE (TYPE_SIZE_UNIT (elmt_type)) == POLY_INT_CST
+ && wi::eq_p (wi::to_offset (TYPE_SIZE_UNIT (elmt_type)),
+   (wi::to_offset (genop3) * vn_ref_op_align_unit (currop
+   || (TREE_CODE (genop3) == EXACT_DIV_EXPR
+ && TREE_CODE (TREE_OPERAND (genop3, 1)) == INTEGER_CST
+ && wi::eq_p (wi::to_offset (TREE_OPERAND (genop3, 1)),
+   vn_ref_op_align_unit (currop
  genop3 = NULL_TREE;
else
  {
-- 
2.25.1



Re: [PATCH] c++/modules: Improve diagnostic when redeclaring builtin in module [PR102345]

2024-05-24 Thread Nathaniel Shead
On Fri, May 24, 2024 at 11:24:38AM -0400, Jason Merrill wrote:
> On 5/24/24 11:20, Nathaniel Shead wrote:
> > This is just a small improvement to a diagnostic.  I thought about also
> > adding an inform to suggest something like "standard library headers
> > should be included in the GMF" or somesuch, but I'm not sure that'll be
> > especially valuable and may be confusing if this particular error is
> > caused some other way.
> > 
> > Bootstrapped and regtested (so far just modules.exp) on
> > x86_64-pc-linux-gnu, OK for trunk if full regtest succeeds?
> > 
> > -- >8 --
> > 
> > If a user mistakenly includes a standard library header within the
> > module purview, they currently get a confusing "declaration conflicts
> > with builtin" error.  This patch updates the message to include "in
> > module", to help guide the user towards the likely cause.
> > 
> > gcc/cp/ChangeLog:
> > 
> > * module.cc (module_may_redeclare): Update error message.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/modules/enum-12.C: Test for updated error.
> > 
> > Signed-off-by: Nathaniel Shead 
> > ---
> >   gcc/cp/module.cc   | 9 -
> >   gcc/testsuite/g++.dg/modules/enum-12.C | 2 +-
> >   2 files changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> > index 6cd7d9e0b93..60cf3ebc468 100644
> > --- a/gcc/cp/module.cc
> > +++ b/gcc/cp/module.cc
> > @@ -19140,7 +19140,14 @@ module_may_redeclare (tree olddecl, tree newdecl)
> > decl = newdecl ? newdecl : olddecl;
> > location_t loc = newdecl ? DECL_SOURCE_LOCATION (newdecl) : input_location;
> > if (DECL_IS_UNDECLARED_BUILTIN (olddecl))
> > -error_at (loc, "declaration %qD conflicts with builtin", decl);
> > +{
> > +  if (newdecl_attached_p)
> > +   error_at (loc, "declaring %qD in module %qs conflicts with builtin",
> 
> Maybe "with builtin in global module"?

Yup, that's even clearer, thanks.

> 
> > + decl, new_mod->get_flatname ());
> > +  else
> > +   error_at (loc, "declaring %qD in global module conflicts with builtin",
> 
> I'm not sure mentioning the global module adds anything in this case?
> 

I'd done it for consistency with the other cases in case we ever somehow
reached this code path, but happy to remove since this should always be
a problem regardless.

So maybe something like this?

-- >8 --

If a user mistakenly includes a standard library header within the
module purview, they currently get a confusing "declaration conflicts
with builtin" error.  This patch updates the message to include "in
module", to help guide the user towards the likely cause.

gcc/cp/ChangeLog:

* module.cc (module_may_redeclare): Update error message.

gcc/testsuite/ChangeLog:

* g++.dg/modules/enum-12.C: Test for updated error.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc   | 8 +++-
 gcc/testsuite/g++.dg/modules/enum-12.C | 2 +-
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 6cd7d9e0b93..3f8f84bb9fd 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -19140,7 +19140,13 @@ module_may_redeclare (tree olddecl, tree newdecl)
   decl = newdecl ? newdecl : olddecl;
   location_t loc = newdecl ? DECL_SOURCE_LOCATION (newdecl) : input_location;
   if (DECL_IS_UNDECLARED_BUILTIN (olddecl))
-error_at (loc, "declaration %qD conflicts with builtin", decl);
+{
+  if (newdecl_attached_p)
+   error_at (loc, "declaring %qD in module %qs conflicts with builtin "
+ "in global module", decl, new_mod->get_flatname ());
+  else
+   error_at (loc, "declaration %qD conflicts with builtin", decl);
+}
   else if (DECL_LANG_SPECIFIC (old_inner) && DECL_MODULE_IMPORT_P (old_inner))
 {
   auto_diagnostic_group d;
diff --git a/gcc/testsuite/g++.dg/modules/enum-12.C b/gcc/testsuite/g++.dg/modules/enum-12.C
index 064f220dedf..cf8f445e076 100644
--- a/gcc/testsuite/g++.dg/modules/enum-12.C
+++ b/gcc/testsuite/g++.dg/modules/enum-12.C
@@ -4,7 +4,7 @@
 
 export module foo;
 namespace std {
-  enum class align_val_t : decltype(sizeof(int)) {};  // { dg-error "conflicts with builtin" }
+  enum class align_val_t : decltype(sizeof(int)) {};  // { dg-error "in module .foo. conflicts with builtin" }
 }
 
 // { dg-prune-output "not writing module" }
-- 
2.43.2
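
To make the effect of the revised hunk concrete, here is a minimal standalone sketch (not GCC code) of the message selection, and of why the updated dg-error pattern still matches once "in global module" is appended.  The helper names builtin_conflict_message and dg_error_matches are hypothetical, %qD and %qs are rendered with plain ASCII quotes, and std::regex stands in for the regexp matching that DejaGnu performs; all of that is assumed for illustration only.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical stand-in for the branch added to module_may_redeclare:
// builds the text that error_at would print, with %qD and %qs rendered
// as plain ASCII-quoted strings.
std::string builtin_conflict_message (const std::string &decl,
                                      bool newdecl_attached_p,
                                      const std::string &module_name)
{
  assert (!decl.empty ());
  if (newdecl_attached_p)
    return "declaring '" + decl + "' in module '" + module_name
           + "' conflicts with builtin in global module";
  return "declaration '" + decl + "' conflicts with builtin";
}

// True if the pattern matches anywhere in the message.  This only
// approximates how a dg-error pattern is applied to compiler output.
bool dg_error_matches (const std::string &pattern, const std::string &message)
{
  return std::regex_search (message, std::regex (pattern));
}
```

With these helpers, the pattern "in module .foo. conflicts with builtin" from enum-12.C matches the attached-declaration message and does not match the fallback message, which is presumably why the test did not need a further update after the wording change.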



Re: [PATCH] c++/modules: Improve diagnostic when redeclaring builtin in module [PR102345]

2024-05-24 Thread Jason Merrill

On 5/24/24 11:20, Nathaniel Shead wrote:

> This is just a small improvement to a diagnostic.  I thought about also
> adding an inform to suggest something like "standard library headers
> should be included in the GMF" or somesuch, but I'm not sure that'll be
> especially valuable and may be confusing if this particular error is
> caused some other way.
> 
> Bootstrapped and regtested (so far just modules.exp) on
> x86_64-pc-linux-gnu, OK for trunk if full regtest succeeds?
> 
> -- >8 --
> 
> If a user mistakenly includes a standard library header within the
> module purview, they currently get a confusing "declaration conflicts
> with builtin" error.  This patch updates the message to include "in
> module", to help guide the user towards the likely cause.
> 
> gcc/cp/ChangeLog:
> 
> * module.cc (module_may_redeclare): Update error message.
> 
> gcc/testsuite/ChangeLog:
> 
> * g++.dg/modules/enum-12.C: Test for updated error.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>   gcc/cp/module.cc   | 9 -
>   gcc/testsuite/g++.dg/modules/enum-12.C | 2 +-
>   2 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> index 6cd7d9e0b93..60cf3ebc468 100644
> --- a/gcc/cp/module.cc
> +++ b/gcc/cp/module.cc
> @@ -19140,7 +19140,14 @@ module_may_redeclare (tree olddecl, tree newdecl)
> decl = newdecl ? newdecl : olddecl;
> location_t loc = newdecl ? DECL_SOURCE_LOCATION (newdecl) : input_location;
> if (DECL_IS_UNDECLARED_BUILTIN (olddecl))
> -error_at (loc, "declaration %qD conflicts with builtin", decl);
> +{
> +  if (newdecl_attached_p)
> +   error_at (loc, "declaring %qD in module %qs conflicts with builtin",

Maybe "with builtin in global module"?

> + decl, new_mod->get_flatname ());
> +  else
> +   error_at (loc, "declaring %qD in global module conflicts with builtin",

I'm not sure mentioning the global module adds anything in this case?

Jason



[PATCH] c++/modules: Improve diagnostic when redeclaring builtin in module [PR102345]

2024-05-24 Thread Nathaniel Shead
This is just a small improvement to a diagnostic.  I thought about also
adding an inform to suggest something like "standard library headers
should be included in the GMF" or somesuch, but I'm not sure that'll be
especially valuable and may be confusing if this particular error is
caused some other way.

Bootstrapped and regtested (so far just modules.exp) on
x86_64-pc-linux-gnu, OK for trunk if full regtest succeeds?

-- >8 --

If a user mistakenly includes a standard library header within the
module purview, they currently get a confusing "declaration conflicts
with builtin" error.  This patch updates the message to include "in
module", to help guide the user towards the likely cause.

gcc/cp/ChangeLog:

* module.cc (module_may_redeclare): Update error message.

gcc/testsuite/ChangeLog:

* g++.dg/modules/enum-12.C: Test for updated error.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc   | 9 -
 gcc/testsuite/g++.dg/modules/enum-12.C | 2 +-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 6cd7d9e0b93..60cf3ebc468 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -19140,7 +19140,14 @@ module_may_redeclare (tree olddecl, tree newdecl)
   decl = newdecl ? newdecl : olddecl;
   location_t loc = newdecl ? DECL_SOURCE_LOCATION (newdecl) : input_location;
   if (DECL_IS_UNDECLARED_BUILTIN (olddecl))
-error_at (loc, "declaration %qD conflicts with builtin", decl);
+{
+  if (newdecl_attached_p)
+   error_at (loc, "declaring %qD in module %qs conflicts with builtin",
+ decl, new_mod->get_flatname ());
+  else
+   error_at (loc, "declaring %qD in global module conflicts with builtin",
+ decl);
+}
   else if (DECL_LANG_SPECIFIC (old_inner) && DECL_MODULE_IMPORT_P (old_inner))
 {
   auto_diagnostic_group d;
diff --git a/gcc/testsuite/g++.dg/modules/enum-12.C b/gcc/testsuite/g++.dg/modules/enum-12.C
index 064f220dedf..cf8f445e076 100644
--- a/gcc/testsuite/g++.dg/modules/enum-12.C
+++ b/gcc/testsuite/g++.dg/modules/enum-12.C
@@ -4,7 +4,7 @@
 
 export module foo;
 namespace std {
-  enum class align_val_t : decltype(sizeof(int)) {};  // { dg-error "conflicts with builtin" }
+  enum class align_val_t : decltype(sizeof(int)) {};  // { dg-error "in module .foo. conflicts with builtin" }
 }
 
 // { dg-prune-output "not writing module" }
-- 
2.43.2



Re: [PATCH 12/13] rs6000, remove __builtin_vsx_xvcmpeqsp built-in

2024-05-24 Thread Carl Love
Kewen:

On 5/24/24 03:43, Kewen.Lin wrote:
> Hi,
> 
> on 2024/5/24 02:21, Carl Love wrote:
>>
>>
>> On 5/13/24 22:37, Kewen.Lin wrote:
>>> Hi,
>>>
>>> on 2024/4/20 05:18, Carl Love wrote:
>>>> rs6000, remove __builtin_vsx_xvcmpeqsp built-in
>>>>
>>>> The built-in __builtin_vsx_xvcmpeqsp is a duplicate of the overloaded
>>>> vec_cmpeq built-in.  The built-in is undocumented.  The built-in and
>>>> the test cases are removed.
>>>>
>>>> gcc/ChangeLog:
>>>> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp):
>>>> Remove built-in definition.
>>>>
>>>
>>> Ah, you separated this __builtin_vsx_xvcmpeqsp from the one for
>>> __builtin_vsx_xvcmpeqsp_p, it's fine, please ignore the comments for
>>> considering this __builtin_vsx_xvcmpeqsp in my previous reply to 11/13.
>>>
>>>
>>>> gcc/testsuite/ChangeLog:
>>>> * vsx-builtin-3.c (do_cmp): Remove test case for
>>>> __builtin_vsx_xvcmpeqsp.
>>>> ---
>>>>  gcc/config/rs6000/rs6000-builtins.def| 3 ---
>>>>  gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c | 2 --
>>>>  2 files changed, 5 deletions(-)
>>>>
>>>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>>>> b/gcc/config/rs6000/rs6000-builtins.def
>>>> index 2f6149edd5f..19d05b8043a 100644
>>>> --- a/gcc/config/rs6000/rs6000-builtins.def
>>>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>>>> @@ -1613,9 +1613,6 @@
>>>>    const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd);
>>>>  XVCMPEQDP_P vector_eq_v2df_p {pred}
>>>>
>>>> -  const vf __builtin_vsx_xvcmpeqsp (vf, vf);
>>>> -XVCMPEQSP vector_eqv4sf {}
>>>> -
>>>>    const vd __builtin_vsx_xvcmpgedp (vd, vd);
>>>>  XVCMPGEDP vector_gev2df {}
>>>>
>>>> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
>>>> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>>>> index 35ea31b2616..245893dc0e3 100644
>>>> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>>>> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>>>> @@ -27,7 +27,6 @@
>>>>  /* { dg-final { scan-assembler "xvcmpeqdp" } } */
>>>>  /* { dg-final { scan-assembler "xvcmpgtdp" } } */
>>>>  /* { dg-final { scan-assembler "xvcmpgedp" } } */
>>>> -/* { dg-final { scan-assembler "xvcmpeqsp" } } */
>>>>  /* { dg-final { scan-assembler "xvcmpgtsp" } } */
>>>>  /* { dg-final { scan-assembler "xvcmpgesp" } } */
>>>>  /* { dg-final { scan-assembler "xxsldwi" } } */
>>>> @@ -112,7 +111,6 @@ int do_cmp (void)
>>>>    d[i][0] = __builtin_vsx_xvcmpgtdp (d[i][1], d[i][2]); i++;
>>>>    d[i][0] = __builtin_vsx_xvcmpgedp (d[i][1], d[i][2]); i++;
>>>>
>>>> -  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
>>>>    f[i][0] = __builtin_vsx_xvcmpgtsp (f[i][1], f[i][2]); i++;
>>>>    f[i][0] = __builtin_vsx_xvcmpgesp (f[i][1], f[i][2]); i++;
>>>>    return i;
>>>
>>> As the other in this patch series, I prefer to change it with
>>> vec_cmpeq here, OK for trunk with this tweaked (also keep the
>>> scan there), thanks!
>>
>> When I went to change the test case I noticed that __builtin_vsx_xvcmpeqsp 
>> and vec_cmpeq both return a vector where the element is all ones if the 
>> comparison is True and zeros if False.  However, the return type for 
>> __builtin_vsx_xvcmpeqsp is vector floats but vec_cmpeq returns vector bool.
>>
> 
> Ah, so they are not equivalent from prototype perspective.
> 
>> The PVIPR says the vec_cmpeq built-in returns a value where each bit in the 
>> vector element is a 1 if the comparison is equal and 0 otherwise.  However, 
>> the documented result is a vector bool int for the floating point 
>> comparison.  The return value for __builtin_vsx_xvcmpeqsp was vector float.
> 
> IMHO PVIPR prototype (returning vector bool) makes more sense,
> it does match better with what the result holds.

Yes, I tend to agree.  I think the user would likely be using the test so 
they could create a mask to selectively replace vector elements.  A bool type 
makes more sense in that case.

> 
>>
>> So, the "bit values" returned are the same but not of the same type. So 
>> technically vec_cmpeq is not a drop in replacement for 
>> __builtin_vsx_xvcmpeqsp.  Given that, perhaps we should not be removing 
>> __builtin_vsx_xvcmpeqsp?
>>
>> The testcase has to be changed from:
>>  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
>>  bi[i][0] = vec_cmpeq (f[i][1], f[i][2]); i++;
> 
> For the test case change, I'd expect that it can work with:
> 
> -  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
> +  f[i][0] = (vector float) vec_cmpeq (f[i][1], f[i][2]); i++;

Yes, that does work.
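
For readers without a Power machine at hand, the point about "same bits, different type" can be sketched in portable C++.  This is only an illustration under stated assumptions: cmpeq_mask, mask_as_floats and blend are hypothetical helpers that emulate a four-lane compare with plain arrays, they are not the AltiVec/VSX built-ins, and float is assumed to be IEEE-754 binary32.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Floats = std::array<float, 4>;
using Mask = std::array<std::uint32_t, 4>;

// Lane-wise equality: all-ones in a lane where the inputs compare equal,
// zero otherwise.  This emulates the bit pattern a vector compare yields.
Mask cmpeq_mask (const Floats &a, const Floats &b)
{
  Mask m{};
  for (std::size_t i = 0; i < a.size (); ++i)
    m[i] = (a[i] == b[i]) ? 0xFFFFFFFFu : 0u;
  return m;
}

// Reinterpret the mask bits as floats, which is roughly what a
// float-returning compare (or a cast of the bool result) hands back.
Floats mask_as_floats (const Mask &m)
{
  static_assert (sizeof (Floats) == sizeof (Mask), "lane sizes must match");
  Floats f{};
  std::memcpy (f.data (), m.data (), sizeof (f));
  return f;
}

// Typical use of the mask: pick if_set where the lane is all-ones,
// if_clear where it is zero.
Floats blend (const Mask &m, const Floats &if_set, const Floats &if_clear)
{
  Floats r{};
  for (std::size_t i = 0; i < r.size (); ++i)
    r[i] = m[i] ? if_set[i] : if_clear[i];
  return r;
}
```

An all-ones lane reinterpreted as a binary32 float is a NaN, so the float-typed result is only useful as a bit mask; that is one way to see why a bool-like element type describes the value better, and why an explicit cast is all that is needed where the old float type is wanted.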

> 
>>
>> I am thinking we should drop this patch from the series, i.e. don't remove 
>> __builtin_vsx_xvcmpeqsp.  Thoughts?
>>
> 
> Since __builtin_vsx_xvcmpeqsp is an undocumented built-in, I don't
> expect users to use it, even there is someone, IMHO vector bool is
> a better fit.  In case someone actually wants the vector non-bool
> type, he/she can just add an explicit conversion.  So I'm inclined
> to remove the 

Re: [PATCH][14 backport] c++: Fix instantiation of imported temploid friends [PR114275]

2024-05-24 Thread Jason Merrill

On 5/24/24 10:40, Iain Sandoe wrote:




On 24 May 2024, at 14:54, Jason Merrill  wrote:

On 5/24/24 04:06, Nathaniel Shead wrote:

On Thu, May 23, 2024 at 06:41:06PM -0400, Jason Merrill wrote:

On 5/13/24 07:56, Nathaniel Shead wrote:

@@ -11751,9 +11767,16 @@ tsubst_friend_class (tree friend_tmpl, tree args)
  if (tmpl != error_mark_node)
{
  /* The new TMPL is not an instantiation of anything, so we
-forget its origins.  We don't reset CLASSTYPE_TI_TEMPLATE
+forget its origins.  It is also not a specialization of
+anything.  We don't reset CLASSTYPE_TI_TEMPLATE
 for the new type because that is supposed to be the
 corresponding template decl, i.e., TMPL.  */
+ spec_entry elt;
+ elt.tmpl = friend_tmpl;
+ elt.args = CLASSTYPE_TI_ARGS (TREE_TYPE (tmpl));
+ elt.spec = TREE_TYPE (tmpl);
+ type_specializations->remove_elt ();


For GCC 14.2 let's guard this with if (modules_p ()); for GCC 15 it can be
unconditional.  OK.


I'm looking to backport this patch to GCC 14 now that it's been on trunk
some time.  Here's the patch I'm aiming to add (squashed with the
changes from r15-220-gec2365e07537e8) after cherrypicking the
prerequisite commit r15-58-g2faf040335f9b4; is this OK?

Or should I keep it as two separate commits to make the cherrypicking
more obvious? Not entirely sure on the etiquette around this.


It's OK to squash them, but it's typical to use -x (directly or via git
gcc-backport) to mention where a branch change was cherry-picked from, and
in this case it would make sense to edit in the second commit so it's clear
the backport includes both.  OK that way.

Sorry, still a bit confused :)  Do you mean to merge the two commits
together such that there are two "cherry picked from commit ..."s in the
commit message?  Or just list second commit, and mention that it
includes both in the commit message?


I was thinking

(cherry picked from commit a and b)


For the record, I do not think the git hooks will allow that exactly (or,
at least, if they do I did not find the right syntax); what I’ve done in
similar cases is to keep the main “cherry picked from” line for the main
patch and then add text to the intro section
saying that  and  are also included.


Looks like the git_commit.py regex
"cherry picked from commit (?P\w+)"
will ignore everything after the first commit-id, so it should be fine.
You just can't pluralize "commit" to "commits".  :)
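
The behaviour described above can be checked with a small sketch.  It is an approximation, not the hook itself: the pattern is rewritten with an unnamed capture group because std::regex has no named groups, it is applied as a search rather than an anchored match (an assumption about how git_commit.py uses it), and first_cherry_pick_id and the commit ids in the usage note are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the first commit id following "cherry picked from commit ",
// or an empty string if the phrase does not occur in the message.
std::string first_cherry_pick_id (const std::string &message)
{
  static const std::regex re (R"re(cherry picked from commit (\w+))re");
  std::smatch m;
  if (!std::regex_search (message, m, re))
    return "";
  // Full match plus one capture group.
  assert (m.size () == 2);
  return m[1].str ();
}
```

Given "(cherry picked from commit aaaa111 and bbbb222)" this returns "aaaa111" and ignores the rest of the line, while "(cherry picked from commits aaaa111 and bbbb222)" yields an empty string because the literal "commit " no longer matches.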


Jason



Re: [PATCH] RISC-V: Introduce -mrvv-allow-misalign.

2024-05-24 Thread Palmer Dabbelt

On Fri, 24 May 2024 07:30:20 PDT (-0700), Robin Dapp wrote:

Hi,

this patch changes the default from always enabling movmisalign to
disabling it.  It adds an option to override the default and adds
generic-ooo to the uarchs that support misaligned vector access.

It also adds a check_effective_target_riscv_v_misalign_ok to the
testsuite which enables or disables the vector misalignment tests
depending on whether the target under test can execute a misaligned
vle32.  I haven't actually tested it on a target that does not
support misaligned vector loads, though.

Regtested on rv64gcv_zvfh_zvbb.  There are a few additional
failures in the rvv testsuite.  They are caused by us overwriting
the default vectorizer flags rather than appending.  I'm going to
fix them in a subsequent patch but for now I'd rather get things
rolling.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
Move from here...
* config/riscv/riscv.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
...to here and make dependent on uarch and rvv_allow_misalign.
* config/riscv/riscv.opt: Add -mrvv-allow-misalign.


We should have something in doc/invoke too, this one is going to be 
tricky for users.  We'll also have to define how this interacts with the 
existing -mstrict-align.



gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add
check_effective_target_riscv_v_misalign_ok.
---
 gcc/config/riscv/riscv-opts.h |  3 ---
 gcc/config/riscv/riscv.h  |  5 
 gcc/config/riscv/riscv.opt|  5 
 gcc/testsuite/lib/target-supports.exp | 34 +--
 4 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 1b2dd5757a8..f58a07abffc 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -147,9 +147,6 @@ enum rvv_vector_bits_enum {
  ? 0 \
  : 32 << (__builtin_popcount (opts->x_riscv_zvl_flags) - 1))

-/* TODO: Enable RVV movmisalign by default for now.  */
-#define TARGET_VECTOR_MISALIGN_SUPPORTED 1
-
 /* The maximmum LMUL according to user configuration.  */
 #define TARGET_MAX_LMUL \
   (int) (rvv_max_lmul == RVV_DYNAMIC ? RVV_M8 : rvv_max_lmul)
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index d6b14c4d620..8434e5677b6 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -934,6 +934,11 @@ extern enum riscv_cc get_riscv_cc (const rtx use);
   || (riscv_microarchitecture == sifive_p400) \
   || (riscv_microarchitecture == sifive_p600))

+/* True if the target supports misaligned vector loads and stores.  */
+#define TARGET_VECTOR_MISALIGN_SUPPORTED \
+  (rvv_allow_misalign == 1 \
+   || riscv_microarchitecture == generic_ooo)


We should probably just stick it in a tune struct instead?  That seems 
cleaner than matching on the exact uarch.



+
 #define LOGICAL_OP_NON_SHORT_CIRCUIT 0

 /* Control the assembler format that we output.  */
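
The tune-struct suggestion made above for TARGET_VECTOR_MISALIGN_SUPPORTED could look roughly like the following sketch.  Everything here is hypothetical: tune_sketch, its vector_misalign_ok field and the two example records are invented for illustration and do not reflect the layout of GCC's real RISC-V tuning structures.

```cpp
#include <cassert>

// Hypothetical, cut-down per-uarch tuning record.  GCC's real tuning
// structure has different (and many more) fields.
struct tune_sketch
{
  const char *name;
  bool vector_misalign_ok;  // Hypothetical: uarch handles misaligned vector accesses.
};

constexpr tune_sketch generic_tune = { "generic", false };
constexpr tune_sketch generic_ooo_tune = { "generic-ooo", true };

// Same decision as the macro in the patch, but the per-uarch part is read
// from the tuning record instead of comparing against a specific uarch.
constexpr bool vector_misalign_supported (bool rvv_allow_misalign,
                                          const tune_sketch &tune)
{
  return rvv_allow_misalign || tune.vector_misalign_ok;
}

static_assert (vector_misalign_supported (false, generic_ooo_tune),
               "a tune that sets the field enables support without the flag");
```

The design point is that adding another uarch with fast misaligned vector accesses would then only mean setting one field in its tuning record, rather than extending a chain of riscv_microarchitecture comparisons.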
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 87f58332016..cff34eee8c9 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -628,3 +628,8 @@ Specify TLS dialect.
 mfence-tso
 Target Var(TARGET_FENCE_TSO) Init(1)
 Specifies whether the fence.tso instruction should be used.
+
+mrvv-allow-misalign
+Target Var(rvv_allow_misalign) Init(0)
+Allow the creation of misaligned vector loads and stores irrespective of the
+current uarch. The default is off.


IMO we should be explicit here about these being element-misaligned 
accesses, not register-misaligned accesses.  I don't want to get roped 
into handling register-misaligned accesses under the same flag, that 
would be a whole different flavor of codegen.



diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index f0f6da52275..ebb908f5c8f 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2034,7 +2034,7 @@ proc check_effective_target_riscv_zvfh_ok { } {
 # check if we can execute vector insns with the given hardware or
 # simulator
 set gcc_march [regsub {[[:alnum:]]*} [riscv_get_arch] ]
-if { [check_runtime ${gcc_march}_exec {
+if { [check_runtime ${gcc_march}_zvfh_exec {
int main()
{
asm ("vsetivli zero,8,e16,m1,ta,ma");
@@ -2047,6 +2047,29 @@ proc check_effective_target_riscv_zvfh_ok { } {
 return 0
 }

+# Return 1 if we can load a vector from a 1-byte aligned address.
+
+proc check_effective_target_riscv_v_misalign_ok { } {
+
+if { ![check_effective_target_riscv_v_ok] } {
+   return 0
+}
+
+set gcc_march [riscv_get_arch]
+if { [check_runtime ${gcc_march}_misalign_exec {
+ int main() {
+ unsigned char a[16]
+   = 

Re: [PATCH][14 backport] c++: Fix instantiation of imported temploid friends [PR114275]

2024-05-24 Thread Iain Sandoe



> On 24 May 2024, at 14:54, Jason Merrill  wrote:
> 
> On 5/24/24 04:06, Nathaniel Shead wrote:
>> On Thu, May 23, 2024 at 06:41:06PM -0400, Jason Merrill wrote:
>>> On 5/13/24 07:56, Nathaniel Shead wrote:
>> @@ -11751,9 +11767,16 @@ tsubst_friend_class (tree friend_tmpl, tree 
>> args)
>>  if (tmpl != error_mark_node)
>>  {
>>/* The new TMPL is not an instantiation of anything, so we
>> - forget its origins.  We don't reset CLASSTYPE_TI_TEMPLATE
>> + forget its origins.  It is also not a specialization of
>> + anything.  We don't reset CLASSTYPE_TI_TEMPLATE
>>   for the new type because that is supposed to be the
>>   corresponding template decl, i.e., TMPL.  */
>> +  spec_entry elt;
>> +  elt.tmpl = friend_tmpl;
>> +  elt.args = CLASSTYPE_TI_ARGS (TREE_TYPE (tmpl));
>> +  elt.spec = TREE_TYPE (tmpl);
>> +  type_specializations->remove_elt ();
> 
> For GCC 14.2 let's guard this with if (modules_p ()); for GCC 15 it can be
> unconditional.  OK.
 
 I'm looking to backport this patch to GCC 14 now that it's been on trunk
 some time.  Here's the patch I'm aiming to add (squashed with the
 changes from r15-220-gec2365e07537e8) after cherrypicking the
 prerequisite commit r15-58-g2faf040335f9b4; is this OK?
 
 Or should I keep it as two separate commits to make the cherrypicking
 more obvious? Not entirely sure on the etiquette around this.
>>> 
>>> It's OK to squash them, but it's typical to use -x (directly or via git
>>> gcc-backport) to mention where a branch change was cherry-picked from, and
>>> in this case it would make sense to edit in the second commit so it's clear
>>> the backport includes both.  OK that way.
>> Sorry, still a bit confused :)  Do you mean to merge the two commits
>> together such that there are two "cherry picked from commit ..."s in the
>> commit message?  Or just list second commit, and mention that it
>> includes both in the commit message?
> 
> I was thinking
> 
> (cherry picked from commit a and b)

For the record, I do not think the git hooks will allow that exactly (or,
at least, if they do I did not find the right syntax); what I’ve done in
similar cases is to keep the main “cherry picked from” line for the main
patch and then add text to the intro section
saying that  and  are also included.

Iain

> 
> but the exact format doesn't matter, just looking for a mention of both 
> commits.
> 
> Jason



[PATCH] RISC-V: Introduce -mrvv-allow-misalign.

2024-05-24 Thread Robin Dapp
Hi,

this patch changes the default from always enabling movmisalign to
disabling it.  It adds an option to override the default and adds
generic-ooo to the uarchs that support misaligned vector access.

It also adds a check_effective_target_riscv_v_misalign_ok to the
testsuite which enables or disables the vector misalignment tests
depending on whether the target under test can execute a misaligned
vle32.  I haven't actually tested it on a target that does not
support misaligned vector loads, though.

Regtested on rv64gcv_zvfh_zvbb.  There are a few additional
failures in the rvv testsuite.  They are caused by us overwriting
the default vectorizer flags rather than appending.  I'm going to
fix them in a subsequent patch but for now I'd rather get things
rolling.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
Move from here...
* config/riscv/riscv.h (TARGET_VECTOR_MISALIGN_SUPPORTED):
...to here and make dependent on uarch and rvv_allow_misalign.
* config/riscv/riscv.opt: Add -mrvv-allow-misalign.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add
check_effective_target_riscv_v_misalign_ok.
---
 gcc/config/riscv/riscv-opts.h |  3 ---
 gcc/config/riscv/riscv.h  |  5 
 gcc/config/riscv/riscv.opt|  5 
 gcc/testsuite/lib/target-supports.exp | 34 +--
 4 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 1b2dd5757a8..f58a07abffc 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -147,9 +147,6 @@ enum rvv_vector_bits_enum {
  ? 0 \
  : 32 << (__builtin_popcount (opts->x_riscv_zvl_flags) - 1))
 
-/* TODO: Enable RVV movmisalign by default for now.  */
-#define TARGET_VECTOR_MISALIGN_SUPPORTED 1
-
 /* The maximmum LMUL according to user configuration.  */
 #define TARGET_MAX_LMUL \
   (int) (rvv_max_lmul == RVV_DYNAMIC ? RVV_M8 : rvv_max_lmul)
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index d6b14c4d620..8434e5677b6 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -934,6 +934,11 @@ extern enum riscv_cc get_riscv_cc (const rtx use);
   || (riscv_microarchitecture == sifive_p400) \
   || (riscv_microarchitecture == sifive_p600))
 
+/* True if the target supports misaligned vector loads and stores.  */
+#define TARGET_VECTOR_MISALIGN_SUPPORTED \
+  (rvv_allow_misalign == 1 \
+   || riscv_microarchitecture == generic_ooo)
+
 #define LOGICAL_OP_NON_SHORT_CIRCUIT 0
 
 /* Control the assembler format that we output.  */
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 87f58332016..cff34eee8c9 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -628,3 +628,8 @@ Specify TLS dialect.
 mfence-tso
 Target Var(TARGET_FENCE_TSO) Init(1)
 Specifies whether the fence.tso instruction should be used.
+
+mrvv-allow-misalign
+Target Var(rvv_allow_misalign) Init(0)
+Allow the creation of misaligned vector loads and stores irrespective of the
+current uarch. The default is off.
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index f0f6da52275..ebb908f5c8f 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2034,7 +2034,7 @@ proc check_effective_target_riscv_zvfh_ok { } {
 # check if we can execute vector insns with the given hardware or
 # simulator
 set gcc_march [regsub {[[:alnum:]]*} [riscv_get_arch] ]
-if { [check_runtime ${gcc_march}_exec {
+if { [check_runtime ${gcc_march}_zvfh_exec {
int main()
{
asm ("vsetivli zero,8,e16,m1,ta,ma");
@@ -2047,6 +2047,29 @@ proc check_effective_target_riscv_zvfh_ok { } {
 return 0
 }
 
+# Return 1 if we can load a vector from a 1-byte aligned address.
+
+proc check_effective_target_riscv_v_misalign_ok { } {
+
+if { ![check_effective_target_riscv_v_ok] } {
+   return 0
+}
+
+set gcc_march [riscv_get_arch]
+if { [check_runtime ${gcc_march}_misalign_exec {
+ int main() {
+ unsigned char a[16]
+   = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
+ asm ("vsetivli zero,7,e8,m1,ta,ma");
+ asm ("addi a7,%0,1" : : "r" (a) : "a7" );
+ asm ("vle8.v v8,0(a7)" : : : "v8");
+ return 0; } } "-march=${gcc_march}"] } {
+   return 1
+}
+
+return 0
+}
+
 proc riscv_get_arch { } {
 set gcc_march ""
 # ??? do we neeed to add more extensions to the list below?
@@ -8139,7 +8162,6 @@ proc check_effective_target_vect_hw_misalign { } {
 || ([istarget mips*-*-*] && [et-is-effective-target mips_msa])
 || ([istarget s390*-*-*]
   

Re: [PATCH][14 backport] c++: Fix instantiation of imported temploid friends [PR114275]

2024-05-24 Thread Nathaniel Shead
On Fri, May 24, 2024 at 09:54:31AM -0400, Jason Merrill wrote:
> On 5/24/24 04:06, Nathaniel Shead wrote:
> > On Thu, May 23, 2024 at 06:41:06PM -0400, Jason Merrill wrote:
> > > On 5/13/24 07:56, Nathaniel Shead wrote:
> > > > > > @@ -11751,9 +11767,16 @@ tsubst_friend_class (tree friend_tmpl, 
> > > > > > tree args)
> > > > > >   if (tmpl != error_mark_node)
> > > > > > {
> > > > > >   /* The new TMPL is not an instantiation of anything, 
> > > > > > so we
> > > > > > -forget its origins.  We don't reset CLASSTYPE_TI_TEMPLATE
> > > > > > +forget its origins.  It is also not a specialization of
> > > > > > +anything.  We don't reset CLASSTYPE_TI_TEMPLATE
> > > > > >  for the new type because that is supposed to be the
> > > > > >  corresponding template decl, i.e., TMPL.  */
> > > > > > + spec_entry elt;
> > > > > > + elt.tmpl = friend_tmpl;
> > > > > > + elt.args = CLASSTYPE_TI_ARGS (TREE_TYPE (tmpl));
> > > > > > + elt.spec = TREE_TYPE (tmpl);
> > > > > > + type_specializations->remove_elt ();
> > > > > 
> > > > > For GCC 14.2 let's guard this with if (modules_p ()); for GCC 15 it 
> > > > > can be
> > > > > unconditional.  OK.
> > > > 
> > > > I'm looking to backport this patch to GCC 14 now that it's been on trunk
> > > > some time.  Here's the patch I'm aiming to add (squashed with the
> > > > changes from r15-220-gec2365e07537e8) after cherrypicking the
> > > > prerequisite commit r15-58-g2faf040335f9b4; is this OK?
> > > > 
> > > > Or should I keep it as two separate commits to make the cherrypicking
> > > > more obvious? Not entirely sure on the etiquette around this.
> > > 
> > > > It's OK to squash them, but it's typical to use -x (directly or via git
> > > > gcc-backport) to mention where a branch change was cherry-picked from, and
> > > > in this case it would make sense to edit in the second commit so it's
> > > > clear the backport includes both.  OK that way.
> > 
> > Sorry, still a bit confused :)  Do you mean to merge the two commits
> > together such that there are two "cherry picked from commit ..."s in the
> > commit message?  Or just list second commit, and mention that it
> > includes both in the commit message?
> 
> I was thinking
> 
> (cherry picked from commit a and b)
> 
> but the exact format doesn't matter, just looking for a mention of both
> commits.
> 
> Jason
> 

Thanks, pushed as r14-10240-gfd6fd88b1a93f4fb38f095688255ab5c00122810.


Re: [PATCH] Avoid vector -Wfree-nonheap-object warnings

2024-05-24 Thread Jonathan Wakely
On Thu, 23 May 2024 at 18:38, François Dumont  wrote:
>
>
> On 23/05/2024 15:31, Jonathan Wakely wrote:
> > On 23/05/24 06:55 +0200, François Dumont wrote:
> >> As explained in this email:
> >>
> >> https://gcc.gnu.org/pipermail/libstdc++/2024-April/058552.html
> >>
> >> I ran into -Wfree-nonheap-object warnings because of my enhancements to
> >> the algos.
> >>
> >> So here is a patch to extend the usage of the _Guard type to other
> >> parts of vector.
> >
> > Nice, that fixes the warning you were seeing?
>
> Yes ! I indeed forgot to say so :-)
>
>
> >
> > We recently got a bug report about -Wfree-nonheap-object in
> > std::vector, but that is coming from _M_realloc_append which already
> > uses the RAII guard :-(
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115016
>
> Note that I also had to move the call to __uninitialized_copy_a before
> assigning this->_M_impl._M_start to get rid of the -Wfree-nonheap-object
> warning. But _M_realloc_append is already doing potentially throwing
> operations before assigning this->_M_impl so it must be something else.
>
> Though it made me notice another occurrence of _Guard in this method. Now
> replaced too in this new patch.
>
>  libstdc++: Use RAII to replace try/catch blocks
>
>  Move _Guard into std::vector declaration and use it to guard all
>  calls to vector _M_allocate.
>
>  Doing so gives the compiler more visibility into what is done with
>  the pointers, so it no longer raises the -Wfree-nonheap-object warning.
>
>  libstdc++-v3/ChangeLog:
>
>  * include/bits/vector.tcc (_Guard): Move all the nested
>  duplicated class...
>  * include/bits/stl_vector.h (_Guard_alloc): ...here.
>  (_M_allocate_and_copy): Use latter.
>  (_M_initialize_dispatch): Likewise and set _M_finish first
>  from the result of __uninitialized_fill_n_a that can throw.
>  (_M_range_initialize): Likewise.
>
> >> diff --git a/libstdc++-v3/include/bits/stl_vector.h
> >> b/libstdc++-v3/include/bits/stl_vector.h
> >> index 31169711a48..4ea74e3339a 100644
> >> --- a/libstdc++-v3/include/bits/stl_vector.h
> >> +++ b/libstdc++-v3/include/bits/stl_vector.h
> >> @@ -1607,6 +1607,39 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> >>   clear() _GLIBCXX_NOEXCEPT
> >>   { _M_erase_at_end(this->_M_impl._M_start); }
> >>
> >> +private:
> >> +  // RAII guard for allocated storage.
> >> +  struct _Guard
> >
> > If it's being defined at class scope instead of locally in a member
> > function, I think a better name would be good. Maybe _Ptr_guard or
> > _Dealloc_guard or something.
> _Guard_alloc chosen.
> >
> >> +  {
> >> +pointer _M_storage;// Storage to deallocate
> >> +size_type _M_len;
> >> +_Base& _M_vect;
> >> +
> >> +_GLIBCXX20_CONSTEXPR
> >> +_Guard(pointer __s, size_type __l, _Base& __vect)
> >> +: _M_storage(__s), _M_len(__l), _M_vect(__vect)
> >> +{ }
> >> +
> >> +_GLIBCXX20_CONSTEXPR
> >> +~_Guard()
> >> +{
> >> +  if (_M_storage)
> >> +_M_vect._M_deallocate(_M_storage, _M_len);
> >> +}
> >> +
> >> +_GLIBCXX20_CONSTEXPR
> >> +pointer
> >> +_M_release()
> >> +{
> >> +  pointer __res = _M_storage;
> >> +  _M_storage = 0;
> >
> > I don't think the NullablePointer requirements include assigning 0,
> > only from nullptr, which isn't valid in C++98.
> >
> > https://en.cppreference.com/w/cpp/named_req/NullablePointer
> >
> > Please use _M_storage = pointer() instead.
>
> I forgot about user fancy pointer, fixed.
>
>
> >
> >> +  return __res;
> >> +}
> >> +
> >> +  private:
> >> +_Guard(const _Guard&);
> >> +  };
> >> +
> >> protected:
> >>   /**
> >>*  Memory expansion handler.  Uses the member allocation
> >> function to
> >> @@ -1618,18 +1651,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> >> _M_allocate_and_copy(size_type __n,
> >>  _ForwardIterator __first, _ForwardIterator __last)
> >> {
> >> -  pointer __result = this->_M_allocate(__n);
> >> -  __try
> >> -{
> >> -  std::__uninitialized_copy_a(__first, __last, __result,
> >> -  _M_get_Tp_allocator());
> >> -  return __result;
> >> -}
> >> -  __catch(...)
> >> -{
> >> -  _M_deallocate(__result, __n);
> >> -  __throw_exception_again;
> >> -}
> >> +  _Guard __guard(this->_M_allocate(__n), __n, *this);
> >> +  std::__uninitialized_copy_a
> >> +(__first, __last, __guard._M_storage, _M_get_Tp_allocator());
> >> +  return __guard._M_release();
> >> }
> >>
> >>
> >> @@ -1642,13 +1667,15 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> >>   // 438. Ambiguity in the "do the right thing" clause
> >>   template
> >> void
> >> -_M_initialize_dispatch(_Integer __n, _Integer __value, __true_type)
> >> +_M_initialize_dispatch(_Integer __int_n, _Integer __value,
> >> __true_type)
> >> {
> 
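
For readers following the thread, here is a minimal standalone sketch of the guard pattern under review, written against std::allocator_traits rather than the vector internals.  The names dealloc_guard and allocate_and_fill are made up for this illustration and are not the patch's _Guard_alloc; the release step resets the pointer with pointer() as requested in the review, so it would also work with a fancy pointer type.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>

// Hypothetical guard, not the libstdc++ class: owns storage obtained from
// an allocator and gives it back unless release() was called.
template<typename Alloc>
class dealloc_guard
{
  using traits = std::allocator_traits<Alloc>;

public:
  using pointer = typename traits::pointer;

  dealloc_guard(Alloc& a, pointer p, std::size_t n)
  : alloc_(a), storage_(p), len_(n)
  { }

  ~dealloc_guard()
  {
    if (storage_)
      traits::deallocate(alloc_, storage_, len_);
  }

  dealloc_guard(const dealloc_guard&) = delete;
  dealloc_guard& operator=(const dealloc_guard&) = delete;

  pointer get() const { return storage_; }

  // Hand ownership to the caller.  A value-initialized pointer is the
  // null value for any NullablePointer, so no literal 0 is needed.
  pointer release()
  {
    pointer res = storage_;
    storage_ = pointer();
    return res;
  }

private:
  Alloc& alloc_;
  pointer storage_;
  std::size_t len_;
};

// Hypothetical helper: allocate n ints and fill them.  If anything throws
// after the allocation, the guard's destructor frees the storage, so no
// try/catch block is needed.
inline int*
allocate_and_fill(std::allocator<int>& a, std::size_t n, int value, bool fail)
{
  dealloc_guard<std::allocator<int>> guard(a, a.allocate(n), n);
  if (fail)
    throw std::runtime_error("simulated failure after allocation");
  std::uninitialized_fill_n(guard.get(), n, value);
  return guard.release();
}
```

A caller that gets a pointer back from allocate_and_fill owns the storage and must deallocate it with the same allocator and length; on the throwing path nothing leaks.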

Re: [PATCH] c++: extend -Wself-move for mem-init-list [PR109396]

2024-05-24 Thread Jason Merrill

On 5/23/24 19:57, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
We already warn for:

   x = std::move (x);

which triggers:

   warning: moving 'x' of type 'int' to itself [-Wself-move]

but bug 109396 reports that this doesn't work for a member-initializer-list:

   X() : x(std::move (x))

so this patch amends that.

PR c++/109396

gcc/cp/ChangeLog:

* cp-tree.h (maybe_warn_self_move): Declare.
* init.cc (perform_member_init): Call maybe_warn_self_move.
* typeck.cc (maybe_warn_self_move): No longer static.  Change the
return type to bool.  Also warn when called from
a member-initializer-list.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wself-move2.C: New test.
---
  gcc/cp/cp-tree.h|  1 +
  gcc/cp/init.cc  |  5 ++--
  gcc/cp/typeck.cc| 28 +--
  gcc/testsuite/g++.dg/warn/Wself-move2.C | 37 +
  4 files changed, 60 insertions(+), 11 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wself-move2.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index ba9e848c177..ea3fa6f4aac 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -8263,6 +8263,7 @@ extern cp_expr build_c_cast   
(location_t loc, tree type,
 cp_expr expr);
  extern tree cp_build_c_cast   (location_t, tree, tree,
 tsubst_flags_t);
+extern bool maybe_warn_self_move   (location_t, tree, tree);
  extern cp_expr build_x_modify_expr(location_t, tree,
 enum tree_code, tree,
 tree, tsubst_flags_t);
diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index 52396d87a8c..4a7ed7f5302 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -999,7 +999,7 @@ perform_member_init (tree member, tree init, hash_set 
)
if (decl == error_mark_node)
  return;
  
-  if ((warn_init_self || warn_uninitialized)

+  if ((warn_init_self || warn_uninitialized || warn_self_move)
&& init
&& TREE_CODE (init) == TREE_LIST
&& TREE_CHAIN (init) == NULL_TREE)
@@ -1013,7 +1013,8 @@ perform_member_init (tree member, tree init, hash_set 
)
warning_at (DECL_SOURCE_LOCATION (current_function_decl),
OPT_Winit_self, "%qD is initialized with itself",
member);
-  else
+  else if (!maybe_warn_self_move (input_location, member,
+ TREE_VALUE (init)))
find_uninit_fields (, , decl);
  }
  
diff --git a/gcc/cp/typeck.cc b/gcc/cp/typeck.cc

index d7fa6e0dd96..e058ce18276 100644
--- a/gcc/cp/typeck.cc
+++ b/gcc/cp/typeck.cc
@@ -9355,27 +9355,27 @@ cp_build_c_cast (location_t loc, tree type, tree expr,
  
  /* Warn when a value is moved to itself with std::move.  LHS is the target,

 RHS may be the std::move call, and LOC is the location of the whole
-   assignment.  */
+   assignment.  Return true if we warned.  */
  
-static void

+bool
  maybe_warn_self_move (location_t loc, tree lhs, tree rhs)
  {
if (!warn_self_move)
-return;
+return false;
  
/* C++98 doesn't know move.  */

if (cxx_dialect < cxx11)
-return;
+return false;
  
if (processing_template_decl)

-return;
+return false;
  
if (!REFERENCE_REF_P (rhs)

|| TREE_CODE (TREE_OPERAND (rhs, 0)) != CALL_EXPR)
-return;
+return false;
tree fn = TREE_OPERAND (rhs, 0);
if (!is_std_move_p (fn))
-return;
+return false;
  
/* Just a little helper to strip * and various NOPs.  */

auto extract_op = [] (tree ) {
@@ -9393,13 +9393,23 @@ maybe_warn_self_move (location_t loc, tree lhs, tree 
rhs)
tree type = TREE_TYPE (lhs);
tree orig_lhs = lhs;
extract_op (lhs);
-  if (cp_tree_equal (lhs, arg))
+  if (cp_tree_equal (lhs, arg)
+  /* Also warn in a member-initializer-list, as in : i(std::move(i)).  */
+  || (TREE_CODE (lhs) == FIELD_DECL
+ && TREE_CODE (arg) == COMPONENT_REF
+ && cp_tree_equal (TREE_OPERAND (arg, 0), current_class_ref)
+ && TREE_OPERAND (arg, 1) == lhs))
  {
auto_diagnostic_group d;
if (warning_at (loc, OPT_Wself_move,
  "moving %qE of type %qT to itself", orig_lhs, type))
-   inform (loc, "remove %<std::move%> call");
+   {
+ inform (loc, "remove %<std::move%> call");


The patch is OK, but why do we suggest removing std::move?  That just 
changes the warning to self-init.  Maybe drop the inform?


Jason
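
For reference, the two forms discussed in this thread can be put into one small translation unit.  This is only a sketch; the type names are made up and it is not the Wself-move2.C testcase from the patch.

```cpp
#include <utility>

// Hypothetical types for illustration only.
struct assigns_self
{
  int x;
  explicit assigns_self(int v) : x(v) { }

  // The form that was already diagnosed: moving 'x' to itself.
  void self_assign() { x = std::move(x); }
};

struct inits_self
{
  int x;

  // The form from PR109396: the initializer reads x before it has been
  // initialized, so this type should never be constructed in real code.
  inits_self() : x(std::move(x)) { }
};
```

Dropping the std::move call from the second constructor leaves x(x), which the existing code in perform_member_init reports as "is initialized with itself" when -Winit-self is enabled; that is the concern raised above about the inform note.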



Re: [PATCH v3] libstdc++: Fix std::ranges::iota not in numeric [PR108760]

2024-05-24 Thread Jonathan Wakely

On 24/05/24 13:56 -, Michael Levine (BLOOMBERG/ 731 LEX) wrote:

I've attached the v3 version of the patch as a single, squashed patch 
containing all of the changes.  I manually prepended my sign off to the patch.




Signed-off-by: Michael Levine 
---
diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
index 62faff173bd..d258be0b93f 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -3521,58 +3521,6 @@ namespace ranges

#endif // __glibcxx_ranges_contains

-#if __glibcxx_ranges_iota >= 202202L // C++ >= 23
-
-  template
-struct out_value_result
-{
-  [[no_unique_address]] _Out out;
-  [[no_unique_address]] _Tp value;
-
-  template
-   requires convertible_to
- && convertible_to
-   constexpr
-   operator out_value_result<_Out2, _Tp2>() const &
-   { return {out, value}; }
-
-  template
-   requires convertible_to<_Out, _Out2>
- && convertible_to<_Tp, _Tp2>
-   constexpr
-   operator out_value_result<_Out2, _Tp2>() &&
-   { return {std::move(out), std::move(value)}; }
-};
-
-  template
-using iota_result = out_value_result<_Out, _Tp>;
-
-  struct __iota_fn
-  {
-template _Sent, 
weakly_incrementable _Tp>
-  requires indirectly_writable<_Out, const _Tp&>
-  constexpr iota_result<_Out, _Tp>
-  operator()(_Out __first, _Sent __last, _Tp __value) const
-  {
-   while (__first != __last)
- {
-   *__first = static_cast(__value);
-   ++__first;
-   ++__value;
- }
-   return {std::move(__first), std::move(__value)};
-  }
-
-template _Range>
-  constexpr iota_result, _Tp>
-  operator()(_Range&& __r, _Tp __value) const
-  { return (*this)(ranges::begin(__r), ranges::end(__r), 
std::move(__value)); }
-  };
-
-  inline constexpr __iota_fn iota{};
-
-#endif // __glibcxx_ranges_iota
-
#if __glibcxx_ranges_find_last >= 202207L // C++ >= 23

  struct __find_last_fn
diff --git a/libstdc++-v3/include/bits/ranges_algobase.h 
b/libstdc++-v3/include/bits/ranges_algobase.h
index e26a73a27d6..965b36aed35 100644
--- a/libstdc++-v3/include/bits/ranges_algobase.h
+++ b/libstdc++-v3/include/bits/ranges_algobase.h
@@ -35,6 +35,7 @@
#include 
#include 
#include 
+#include  // __memcpy


Why is this being added here? What is __memcpy?

I don't think out_value_result requires any new headers to be included
here, does it?


#include  // ranges::begin, ranges::range etc.
#include   // __invoke
#include  // __is_byte
@@ -70,6 +71,32 @@ namespace ranges
__is_move_iterator> = true;
  } // namespace __detail

+#if __glibcxx_ranges_iota >= 202202L // C++ >= 23
+
+template
+struct out_value_result
+{
+[[no_unique_address]] _Out out;
+[[no_unique_address]] _Tp value;
+
+template
+   requires convertible_to
+&& convertible_to
+   constexpr
+   operator out_value_result<_Out2, _Tp2>() const &
+   { return {out, value}; }
+
+template
+   requires convertible_to<_Out, _Out2>
+&& convertible_to<_Tp, _Tp2>
+   constexpr
+   operator out_value_result<_Out2, _Tp2>() &&
+   { return {std::move(out), std::move(value)}; }
+};
+
+#endif // __glibcxx_ranges_iota
+
+
  struct __equal_fn
  {
template _Sent1,
diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric
index c912db4a519..d88f7f02137 100644
--- a/libstdc++-v3/include/std/numeric
+++ b/libstdc++-v3/include/std/numeric
@@ -65,6 +65,10 @@
# include 
#endif

+#if __glibcxx_ranges_iota >= 202202L // C++ >= 23
+#include  // for out_value_result as used by 
std::ranges::iota.  It transitively also brings in , from which 
_Range is used by std::ranges::iota


We generally try to keep lines below 80 columns, or 120 at a push.
This is unnecessarily long, and I don't know what _Range is meant to
be (that's just a template parameter, not something defined in
. Please just use:

#include  // for ranges::out_value_result

Otherwise this looks ready to go in, thanks.


+#endif // __glibcxx_ranges_iota
+
#if __cplusplus >= 201402L
# include 
# include 
@@ -726,6 +730,40 @@ namespace __detail
  /// @} group numeric_ops
#endif // C++17

+namespace ranges
+{
+#if __glibcxx_ranges_iota >= 202202L // C++ >= 23
+
+  template
+  using iota_result = out_value_result<_Out, _Tp>;
+
+  struct __iota_fn
+  {
+template _Sent, 
weakly_incrementable _Tp>
+  requires indirectly_writable<_Out, const _Tp&>
+  constexpr iota_result<_Out, _Tp>
+  operator()(_Out __first, _Sent __last, _Tp __value) const
+  {
+   while (__first != __last)
+ {
+   *__first = static_cast(__value);
+   ++__first;
+   ++__value;
+ }
+   return {std::move(__first), std::move(__value)};
+  }
+
+template _Range>
+  constexpr iota_result, _Tp>
+  operator()(_Range&& __r, _Tp 
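
As a reading aid for the patch above, here is a simplified sketch of the shape of the algorithm being moved: a result type pairing the output position with the next value, and a loop that writes and increments.  It deliberately omits the concepts, the conversion operators and the range overload, and the names iota_sketch and out_value_result_sketch are made up so they cannot be confused with the library's own.

```cpp
#include <utility>

// Hypothetical simplified result type: where the output stopped and the
// value that would have been written next.
template<typename Out, typename T>
struct out_value_result_sketch
{
  Out out;
  T value;
};

// Hypothetical simplified algorithm: assign value, value + 1, ... to
// [first, last) and return the end position together with the next value.
template<typename Out, typename Sent, typename T>
out_value_result_sketch<Out, T>
iota_sketch(Out first, Sent last, T value)
{
  while (first != last)
    {
      *first = value;
      ++first;
      ++value;
    }
  return {std::move(first), std::move(value)};
}
```

Filling a four-element int array starting from 10 with this sketch writes 10, 11, 12, 13 and returns the past-the-end pointer together with 14.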

Re: [PATCH v2] c++/modules: Improve errors for bad module-directives [PR115200]

2024-05-24 Thread Jason Merrill

On 5/23/24 19:58, Nathaniel Shead wrote:

On Thu, May 23, 2024 at 05:11:39PM -0400, Jason Merrill wrote:

On 5/23/24 10:54, Nathaniel Shead wrote:

Bootstrapped and regtested (so far just modules.exp and dg.exp) on
x86_64-pc-linux-gnu, OK for trunk if full regtest succeeds?

-- >8 --

This fixes an ICE when a module directive is not given at global scope.
Although not explicitly mentioned, it seems implied from [basic.link] p1
and [module.global.frag] that a module-declaration must appear at the
global scope after preprocessing.  Apart from this the patch also
slightly improves the errors given when accidentally using a module
control-line in other situations where it is not expected.


This could also come up with something like

int module;
int i =
   module; // error, unexpected module directive

Adding a line break seems like confusing advice for this problem; rather,
they need to remove the line break before 'module'.  And possibly add it in
somewhere else, but the problem is that 'module' is the first token on the
line.  And if I put that in a namespace,

namespace A {
   int module;
   int i =
 module; // error, unexpected module directive
}

the problem is the same, but we get a different diagnostic.



True.

FWIW I just used the same wording as in 'cp_parser_import_declaration';
my understanding is it's because you can disambiguate by adding a
newline after the 'module' itself:

   int module;
   int i =
 module
 ;

is OK.  But I'll update this message to be clearer.


I think I'd leave the "must be at global scope" diagnostic to
cp_parser_module_declaration, and assume that if we see a module keyword at
function scope it wasn't intended to be a module directive.



How about this then? Bootstrapped and regtested on x86_64-pc-linux-gnu.


OK, thanks.


-- >8 --

This fixes an ICE when a module directive is not given at global scope.
Although not explicitly mentioned, it seems implied from [basic.link] p1
and [module.global.frag] that a module-declaration must appear at the
global scope after preprocessing.  Apart from this the patch also
slightly improves the errors given when accidentally using a module
control-line in other situations where it is not expected.

PR c++/115200

gcc/cp/ChangeLog:

* parser.cc (cp_parser_error_1): Special-case unexpected module
directives for better diagnostics.
(cp_parser_module_declaration): Check that the module
declaration is at global scope.
(cp_parser_import_declaration): Sync error message with that in
cp_parser_error_1.

gcc/testsuite/ChangeLog:

* g++.dg/modules/mod-decl-1.C: Update error messages.
* g++.dg/modules/mod-decl-6.C: New test.
* g++.dg/modules/mod-decl-7.C: New test.
* g++.dg/modules/mod-decl-8.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/parser.cc  | 26 ---
  gcc/testsuite/g++.dg/modules/mod-decl-1.C |  8 ---
  gcc/testsuite/g++.dg/modules/mod-decl-6.C | 11 ++
  gcc/testsuite/g++.dg/modules/mod-decl-7.C | 11 ++
  gcc/testsuite/g++.dg/modules/mod-decl-8.C | 14 
  5 files changed, 64 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/mod-decl-6.C
  create mode 100644 gcc/testsuite/g++.dg/modules/mod-decl-7.C
  create mode 100644 gcc/testsuite/g++.dg/modules/mod-decl-8.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 476ddc0d63a..779625144db 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -3230,6 +3230,19 @@ cp_parser_error_1 (cp_parser* parser, const char* gmsgid,
return;
  }
  
+  if (cp_token_is_module_directive (token))

+{
+  auto_diagnostic_group d;
+  error_at (token->location, "unexpected module directive");
+  if (token->keyword != RID__EXPORT)
+   inform (token->location, "perhaps insert a line break after"
+   " %qs, or other disambiguation, to prevent this being"
+   " considered a module control-line",
+   (token->keyword == RID__MODULE) ? "module" : "import");
+  cp_parser_skip_to_pragma_eol (parser, token);
+  return;
+}
+
/* If this is actually a conflict marker, report it as such.  */
if (token->type == CPP_LSHIFT
|| token->type == CPP_RSHIFT
@@ -15135,12 +15148,19 @@ cp_parser_module_declaration (cp_parser *parser, 
module_parse mp_state,
parser->lexer->in_pragma = true;
cp_token *token = cp_lexer_consume_token (parser->lexer);
  
+  tree scope = current_scope ();

if (flag_header_unit)
  {
error_at (token->location,
"module-declaration not permitted in header-unit");
goto skip_eol;
  }
+  else if (scope != global_namespace)
+{
+  error_at (token->location, "module-declaration must be at global scope");
+  inform (DECL_SOURCE_LOCATION (scope), "scope opened here");
+  goto skip_eol;
+}
else if (mp_state == MP_FIRST && 

Re: [PATCH v2] c++: mark TARGET_EXPRs for function arguments eliding [PR114707]

2024-05-24 Thread Jason Merrill

On 5/23/24 20:32, Marek Polacek wrote:

On Thu, May 23, 2024 at 04:04:13PM -0400, Jason Merrill wrote:

On 5/23/24 10:41, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Coming back to our discussion in
:
TARGET_EXPRs that initialize a function argument are not marked
TARGET_EXPR_ELIDING_P even though gimplify_arg drops such TARGET_EXPRs
on the floor.


But only if TREE_TYPE (TARGET_EXPR_INITIAL) is non-void; I think we should
check that here too to be parallel.


Ah yes, definitely.
  

Perhaps most/all affected TARGET_EXPRs will have been handled earlier in the
function under the TREE_ADDRESSABLE check, but I wouldn't rely on that
without an assert.


So like this or you want an assert somewhere too?  dg.exp passed.

-- >8 --
Coming back to our discussion in
:
TARGET_EXPRs that initialize a function argument are not marked
TARGET_EXPR_ELIDING_P even though gimplify_arg drops such TARGET_EXPRs
on the floor.  To work around it, I added a pset to
replace_placeholders_for_class_temp_r, but it would be best to just rely
on TARGET_EXPR_ELIDING_P.

PR c++/114707

gcc/cp/ChangeLog:

* call.cc (convert_for_arg_passing): Call set_target_expr_eliding.
* typeck2.cc (replace_placeholders_for_class_temp_r): Don't use pset.
(digest_nsdmi_init): Call cp_walk_tree_without_duplicates instead of
cp_walk_tree.
---
  gcc/cp/call.cc|  6 ++
  gcc/cp/typeck2.cc | 20 
  2 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index ed68eb3c568..35c024f2c7c 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -9437,6 +9437,12 @@ convert_for_arg_passing (tree type, tree val, 
tsubst_flags_t complain)
if (complain & tf_warning)
  warn_for_address_of_packed_member (type, val);
  
+  /* gimplify_arg elides TARGET_EXPRs that initialize a function argument.  */

+  if (TREE_CODE (val) == TARGET_EXPR)
+if (tree init = TARGET_EXPR_INITIAL (val))
+  if (!VOID_TYPE_P (TREE_TYPE (init)))


You can simplify this test to 'if (SIMPLE_TARGET_EXPR_P ...'.  OK with 
that change.


Jason



Re: [PATCH 1/2] c++/modules: Fix treatment of unnamed types

2024-05-24 Thread Jason Merrill

On 5/23/24 21:27, Nathaniel Shead wrote:

On Thu, May 23, 2024 at 03:36:48PM -0400, Jason Merrill wrote:

On 5/23/24 09:27, Nathaniel Shead wrote:

On Mon, May 20, 2024 at 06:00:09PM -0400, Jason Merrill wrote:

On 5/17/24 02:14, Nathaniel Shead wrote:

On Tue, May 14, 2024 at 06:21:48PM -0400, Jason Merrill wrote:

On 5/12/24 22:58, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?


OK.



I realised as I was looking over this again that I might have spoken too
soon with the header unit example being supported. Doing the following:

 // a.H
 struct { int y; } s;
 decltype(s) f(decltype(s));  // { dg-error "used but never defined" }
 inline auto x = f({ 123 });
 // b.C
 struct {} unrelated;
 import "a.H";
 decltype(s) f(decltype(s) x) {
   return { 456 + x.y };
 }

 // c.C
 import "linkage-3_a.H";
 int main() { auto a = x.y; }

This actually does fail to link, because in 'c.C' we call 'f(.anon_0)' but
the definition in 'b.C' is f(.anon_1).

I don't think this is fixable, so I don't think this direction is
workable.


Since namespace-scope anonymous types are TU-local, we don't need to support
that for proper modules, but it's not clear to me that we don't need to
support it for header units.

OTOH, https://eel.is/c++draft/module#import-5.3 allows c.C to import a
different header unit than b.C, in which case the type is different and x
violates the ODR.



Right; I think at this stage I don't know how to support this for header
units (and even for module interface units it doesn't actually work;
more on this below), so I think saying that this is actually an ODR
violation is OK.


That said, I think that it might still be worth making header modules
satisfy 'module_has_cmi_p', since that is true to the name, and will be
useful in other places we currently use 'module_p ()': in which case we
could instead make all the callers in 'no_linkage_check' do
'module_maybe_has_cmi_p () && !header_module_p ()'; something like the
following, perhaps?


If we need that condition, it should be its own predicate rather than
expecting callers to do that combined check.

But it's not clear to me how this is different from a type in the GMF of a
named module, which is exactly the maybe_has_cmi case; there we could again
see a different version of the type if another TU includes the header.

Jason



This made me go back and double-check for named modules and it actually
does fail as well; the following sample ICEs, even:

export module M;
struct {} s;
int h(decltype(s));
int x = h(s);  // ICE in write_unnamed_type_name, cp/mangle.cc:1806

So I think maybe the way to go here is to just not treat unnamed types
as something that could possibly be accessed from a different TU, like
the below.  Then we don't need to do the special handling for header
units, since as you say, they're not materially different anyway.
Thoughts?


Sounds good.


(And I've moved the original change to 'module_has_cmi_p' to a separate
patch given it's somewhat unrelated now.)

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk (and
maybe 14.2)?

-- >8 --

In r14-9530 we relaxed "depending on type with no-linkage" errors for
declarations that could actually be accessed from different TUs anyway.
However, this also enabled it for unnamed types, which never work.

In a normal module interface, an unnamed type is TU-local by
[basic.link] p15.2, and so cannot be exposed or the program is
ill-formed.  We don't yet implement this checking but we should assume
that we will later; currently supporting this actually causes ICEs when
attempting to create the mangled name in some situations.

For a header unit, by [module.import] p5.3 it is unspecified whether two
TUs importing a header unit providing such a declaration are importing
the same header unit.  In this case, we would require name mangling
changes to somehow allow the (anonymous) type exported by such a header
unit to correspond across different TUs in the presence of other
anonymous declarations, so for this patch just assume that this case
would be an ODR violation instead.

diff --git a/gcc/testsuite/g++.dg/modules/linkage-2.C 
b/gcc/testsuite/g++.dg/modules/linkage-2.C
index eb4d7b051af..f69bd7ff728 100644
--- a/gcc/testsuite/g++.dg/modules/linkage-2.C
+++ b/gcc/testsuite/g++.dg/modules/linkage-2.C
@@ -13,14 +13,15 @@ namespace {
   return A{};
 }
 decltype(f()) g();  // { dg-error "used but never defined" }
-
-  struct {} s;
-  decltype(s) h();  // { dg-error "used but never defined" }
   }
   export void use() {
 g();
-  h();


The above changes seem undesirable; we should still give that error.



Oh whoops, I don't know why I got rid of those test cases; added back in
and they still correctly error.


+// Additionally, unnamed types have no linkage but are also TU-local,
+// and thus cannot be in a module interface unit.
+// We don't yet implement this checking 

Re: [PATCH v3] libstdc++: Fix std::ranges::iota not in numeric [PR108760]

2024-05-24 Thread Michael Levine (BLOOMBERG/ 731 LEX)
I've attached the v3 version of the patch as a single, squashed patch 
containing all of the changes.  I manually prepended my sign off to the patch.

From: ppa...@redhat.com At: 05/23/24 18:41:14 UTC-4:00
To: Michael Levine (BLOOMBERG/ 731 LEX)
Cc: gcc-patches@gcc.gnu.org, libstd...@gcc.gnu.org
Subject: Re: [PATCH v2] libstdc++: Fix std::ranges::iota not included in numeric [PR108760]

On Fri, 17 May 2024, Michael Levine (BLOOMBERG/ 731 LEX) wrote:

> This is the revised version of my patch incorporating the provided feedback
> from Patrick Palka and Jonathan Wakely.
> This patch fixes GCC Bug 108760:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108760
> I moved out_value_result to , moved std::ranges::iota
> into , removed my new test, and moved and renamed the existing test.

Nice, thanks!  The incremental changes seem good, but could you send a
single squashed patch containing all the changes?  That's what we'll end
up pushing after all.

> 
> I built my local version of gcc using the following configuration:
> $ ../gcc/configure --disable-bootstrap --prefix="$(pwd)/_pfx/" --enable-languages=c,c++,lto
> I then ran $ make -jN
> and $ make -jN install
>
> Using the locally installed version, the following code compiled:
> https://godbolt.org/z/33EPeqd1b
>
> I tested my changes by running: $ make check-c++ -jN -k
> I personally found it difficult to understand the results of running the
> tests.
> 
> I ran this on the following OS:
> 
> Virtualization: wsl
> Operating System: Ubuntu 20.04.6 LTS
> Kernel: Linux 5.15.146.1-microsoft-standard-WSL2
> Architecture: x86-64
> 
> 
> 
> From: Michael Levine (BLOOMBERG/ 731 LEX) At: 04/17/24 14:24:24 UTC-4:00
> To: libstd...@gcc.gnu.org, gcc-patches@gcc.gnu.org
> Subject: [PATCH] libstdc++: Fix std::ranges::iota is not included in numeric [PR108760]
>
> This patch fixes GCC Bug 108760:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108760
> Before this patch, using std::ranges::iota required including 
> when it should have been sufficient to only include .
>
> When the patch is applied, the following code will compile:
> https://godbolt.org/z/33EPeqd1b
>
> I added a test case for this change as well.
>
> I built my local version of gcc using the following configuration:
> $ ../gcc/configure --disable-bootstrap --prefix="$(pwd)/_pfx/" --enable-languages=c,c++,lto
>
> and I tested my changes by running: $ make check-c++ -jN -k
> 
> I ran this on the following OS:
> 
> Virtualization: wsl
> Operating System: Ubuntu 20.04.6 LTS
> Kernel: Linux 5.15.146.1-microsoft-standard-WSL2
> Architecture: x86-64
> 
> 
> 
> 
> 




108760v3.patch
Description: Binary data


Re: [PATCH][14 backport] c++: Fix instantiation of imported temploid friends [PR114275]

2024-05-24 Thread Jason Merrill

On 5/24/24 04:06, Nathaniel Shead wrote:

On Thu, May 23, 2024 at 06:41:06PM -0400, Jason Merrill wrote:

On 5/13/24 07:56, Nathaniel Shead wrote:

@@ -11751,9 +11767,16 @@ tsubst_friend_class (tree friend_tmpl, tree args)
  if (tmpl != error_mark_node)
{
  /* The new TMPL is not an instantiation of anything, so we
-forget its origins.  We don't reset CLASSTYPE_TI_TEMPLATE
+forget its origins.  It is also not a specialization of
+anything.  We don't reset CLASSTYPE_TI_TEMPLATE
 for the new type because that is supposed to be the
 corresponding template decl, i.e., TMPL.  */
+ spec_entry elt;
+ elt.tmpl = friend_tmpl;
+ elt.args = CLASSTYPE_TI_ARGS (TREE_TYPE (tmpl));
+ elt.spec = TREE_TYPE (tmpl);
+ type_specializations->remove_elt ();


For GCC 14.2 let's guard this with if (modules_p ()); for GCC 15 it can be
unconditional.  OK.


I'm looking to backport this patch to GCC 14 now that it's been on trunk
some time.  Here's the patch I'm aiming to add (squashed with the
changes from r15-220-gec2365e07537e8) after cherrypicking the
prerequisite commit r15-58-g2faf040335f9b4; is this OK?

Or should I keep it as two separate commits to make the cherrypicking
more obvious? Not entirely sure on the etiquette around this.


It's OK to squash them, but it's typical to use -x (directly or via git
gcc-backport) to mention where a branch change was cherry-picked from, and
in this case it would make sense to edit in the second commit so it's clear
the backport includes both.  OK that way.


Sorry, still a bit confused :)  Do you mean to merge the two commits
together such that there are two "cherry picked from commit ..."s in the
commit message?  Or just list second commit, and mention that it
includes both in the commit message?


I was thinking

(cherry picked from commit a and b)

but the exact format doesn't matter, just looking for a mention of both 
commits.


Jason



[PATCH 5/5] RISC-V: tree-optimization/65518 - extend fix to SLP

2024-05-24 Thread Richard Biener
This extends the PR65518 workaround to also apply for single-lane SLP.

* tree-vect-stmts.cc (get_group_load_store_type): For SLP also
check for the PR65518 single-element interleaving case as done in
vect_grouped_load_supported.
---
 gcc/tree-vect-stmts.cc | 17 +
 1 file changed, 17 insertions(+)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 1c30a0388ca..a01099d3456 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2154,6 +2154,23 @@ get_group_load_store_type (vec_info *vinfo, 
stmt_vec_info stmt_info,
}
  overrun_p = true;
}
+
+ /* If this is single-element interleaving with an element
+distance that leaves unused vector loads around punt - we
+at least create very sub-optimal code in that case (and
+blow up memory, see PR65518).  */
+ if (loop_vinfo
+ && *memory_access_type == VMAT_CONTIGUOUS
+ && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ()
+ && single_element_p
+ && maybe_gt (group_size, TYPE_VECTOR_SUBPARTS (vectype)))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"single-element interleaving not supported "
+"for not adjacent vector loads\n");
+ return false;
+   }
}
 }
   else
-- 
2.35.3


[PATCH 4/5] Allow optimized SLP reduction epilog with single-lane reductions

2024-05-24 Thread Richard Biener
This extends optimized reduction epilog handling to cover the
trivial single-lane SLP reduction case.
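
As a rough illustration of what a shift-based reduction epilog computes,
here is a scalar model, assuming an 8-lane vector.  LANES and
reduce_with_shifts are hypothetical names for this sketch only and do not
reflect the code GCC emits.

```c
#include <stddef.h>

#define LANES 8 /* Hypothetical vector width, a power of two.  */

/* Model of a log2(LANES)-step reduction: in each step the vector is
   shifted down by half the remaining width (zeros shifted in) and added
   to itself, so lane 0 ends up holding the sum of all lanes.  */
int
reduce_with_shifts (const int in[LANES])
{
  int v[LANES];
  for (size_t i = 0; i < LANES; ++i)
    v[i] = in[i];

  for (size_t shift = LANES / 2; shift >= 1; shift /= 2)
    for (size_t i = 0; i < LANES; ++i)
      {
        /* Lanes above i have not been updated yet in this step, so this
           reads the pre-step value, as a whole-vector shift would.  */
        int shifted = (i + shift < LANES) ? v[i + shift] : 0;
        v[i] += shifted;
      }

  return v[0];
}
```

With a single lane per reduction there is only one scalar result to
extract, which is why this scheme fits the single-lane SLP case.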

* tree-vect-loop.cc (vect_create_epilog_for_reduction): Allow
direct opcode and shift reduction also for SLP reductions
with a single lane.
---
 gcc/tree-vect-loop.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 83c0544b6aa..31abfe047a4 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6500,7 +6500,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
   /* 2.3 Create the reduction code, using one of the three schemes described
  above. In SLP we simply need to extract all the elements from the 
  vector (without reducing them), so we use scalar shifts.  */
-  else if (reduc_fn != IFN_LAST && !slp_reduc)
+  else if (reduc_fn != IFN_LAST && (!slp_reduc || group_size == 1))
 {
   tree tmp;
   tree vec_elem_type;
@@ -6670,7 +6670,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
   gsi_insert_seq_before (_gsi, stmts, GSI_SAME_STMT);
   reduc_inputs[0] = new_temp;
 
-  if (reduce_with_shift && !slp_reduc)
+  if (reduce_with_shift && (!slp_reduc || group_size == 1))
{
  int element_bitsize = tree_to_uhwi (bitsize);
  /* Enforced by vectorizable_reduction, which disallows SLP reductions
-- 
2.35.3



[PATCH 3/5] Reduce single-lane SLP testresult noise

2024-05-24 Thread Richard Biener
The following avoids dumping 'vectorizing stmts using SLP' for
single-lane instances since that causes extra testsuite fallout.

* tree-vect-slp.cc (vect_schedule_slp): Gate dumping
'vectorizing stmts using SLP' on > 1 lanes.
---
 gcc/tree-vect-slp.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 73cc69d85ce..ebb71c209eb 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -10098,7 +10098,8 @@ vect_schedule_slp (vec_info *vinfo, const 
vec _instances)
   if (!SLP_INSTANCE_ROOT_STMTS (instance).is_empty ())
vectorize_slp_instance_root_stmt (node, instance);
 
-  if (dump_enabled_p ())
+  /* ???  Reduce some testsuite noise because of "more SLP".  */
+  if (SLP_TREE_LANES (node) > 1 && dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
  "vectorizing stmts using SLP.\n");
 }
-- 
2.35.3



[PATCH 2/5] Avoid bogus SLP outer loop vectorization

2024-05-24 Thread Richard Biener
This fixes the check for multiple types, which I think goes wrong
because of bogus pointer IV increments when there are multiple
copies of vector stmts in the inner loop.

* tree-vect-stmts.cc (vectorizable_load): Avoid outer loop
SLP vectorization with multi-copy vector stmts in the inner
loop.
(vectorizable_store): Likewise.
---
 gcc/tree-vect-stmts.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 4219ad832db..1c30a0388ca 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -8196,7 +8196,9 @@ vectorizable_store (vec_info *vinfo,
   gcc_assert (ncopies >= 1);
 
   /* FORNOW.  This restriction should be relaxed.  */
-  if (loop && nested_in_vect_loop_p (loop, stmt_info) && ncopies > 1)
+  if (loop
+  && nested_in_vect_loop_p (loop, stmt_info)
+  && (ncopies > 1 || (slp && SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) > 1)))
 {
   if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -9939,7 +9941,8 @@ vectorizable_load (vec_info *vinfo,
   gcc_assert (ncopies >= 1);
 
   /* FORNOW. This restriction should be relaxed.  */
-  if (nested_in_vect_loop && ncopies > 1)
+  if (nested_in_vect_loop
+  && (ncopies > 1 || (slp && SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) > 1)))
 {
   if (dump_enabled_p ())
 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-- 
2.35.3



[PATCH 1/5] Do single-lane SLP discovery for reductions

2024-05-24 Thread Richard Biener
This is the second merge proposed from the SLP vectorizer branch.
I have again managed without adding and using --param vect-single-lane-slp;
instead this provides always-enabled functionality.

This makes us use SLP reductions (a group of reductions) for the
case where the group size is one.  This basically means we try
to use SLP for all reductions.
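
To make the distinction concrete, here is a small sketch of the two loop
shapes involved.  The function names are made up for illustration; the
first loop has a single reduction (group size one), the second has two
independent reductions that can form a group.

```c
/* One reduction in the loop: a reduction "group" of size one, i.e. the
   single-lane case.  */
int
sum_one (const int *a, int n)
{
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += a[i];
  return sum;
}

/* Two independent reductions in the same loop: a group of size two.  */
void
sum_two (const int *a, const int *b, int n, int *sum_a, int *sum_b)
{
  int sa = 0, sb = 0;
  for (int i = 0; i < n; ++i)
    {
      sa += a[i];
      sb += b[i];
    }
  *sum_a = sa;
  *sum_b = sb;
}
```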

I've kept the series close to how the changes are on the branch,
but in the end I'll squash it; having separate commits for review
may help identify common issues we will run into.  In
particular we lack full SLP support for several reduction kinds
and the branch has more enabling patches than this series.
For example, 4/5 makes sure we use shifts and direct opcode
reductions in the reduction epilog for SLP reductions, but it does
not try to cover the general case and enables this only
for the single-element group case to avoid regressions
in the gcc.dg/vect/reduc-{mul,or}_[12].c testcases.

Bootstrapped and tested on x86_64-unknown-linux-gnu, I've also
successfully built SPEC CPU 2017.  This posting should trigger
arm & riscv pre-checkin CI.

There's one ICE in gcc.target/i386/pr51235.c I discovered late
that I will investigate and address after the weekend.

This change should be more straightforward than the previous one;
comments are of course still welcome.  Once it is pushed I will follow up
with changes to enable single-lane SLP reductions for various
COND_EXPR reductions as well as double-reduction support and
in-order reduction support (also all restricted to single-lane
for the moment).

Thanks,
Richard.

--

The following performs single-lane SLP discovery for reductions.
This exposes a latent issue with reduction SLP in outer loop
vectorization and makes gcc.dg/vect/vect-outer-4[fgkl].c FAIL
execution.

* tree-vect-slp.cc (vect_build_slp_tree_2): Only multi-lane
discoveries are reduction chains and need special backedge
treatment.
(vect_analyze_slp): Fall back to single-lane SLP discovery
for reductions.  Make sure to try single-lane SLP reduction
for all reductions as fallback.
---
 gcc/tree-vect-slp.cc | 71 +---
 1 file changed, 54 insertions(+), 17 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index c7ed520b629..73cc69d85ce 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1907,7 +1907,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
/* Reduction chain backedge defs are filled manually.
   ???  Need a better way to identify a SLP reduction chain PHI.
   Or a better overall way to SLP match those.  */
-   if (all_same && def_type == vect_reduction_def)
+   if (stmts.length () > 1
+   && all_same && def_type == vect_reduction_def)
  skip_args[loop_latch_edge (loop)->dest_idx] = true;
  }
else if (def_type != vect_internal_def)
@@ -3905,9 +3906,10 @@ vect_analyze_slp (vec_info *vinfo, unsigned 
max_tree_size)
  }
 
   /* Find SLP sequences starting from groups of reductions.  */
-  if (loop_vinfo->reductions.length () > 1)
+  if (loop_vinfo->reductions.length () > 0)
{
- /* Collect reduction statements.  */
+ /* Collect reduction statements we can combine into
+a SLP reduction.  */
  vec scalar_stmts;
  scalar_stmts.create (loop_vinfo->reductions.length ());
  for (auto next_info : loop_vinfo->reductions)
@@ -3920,25 +3922,60 @@ vect_analyze_slp (vec_info *vinfo, unsigned 
max_tree_size)
 reduction path.  In that case we'd have to reverse
 engineer that conversion stmt following the chain using
 reduc_idx and from the PHI using reduc_def.  */
- && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def
- /* Do not discover SLP reductions for lane-reducing ops, that
-will fail later.  */
- && (!(g = dyn_cast  (STMT_VINFO_STMT (next_info)))
+ && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def)
+   {
+ /* Do not discover SLP reductions combining lane-reducing
+ops, that will fail later.  */
+ if (!(g = dyn_cast  (STMT_VINFO_STMT (next_info)))
  || (gimple_assign_rhs_code (g) != DOT_PROD_EXPR
  && gimple_assign_rhs_code (g) != WIDEN_SUM_EXPR
- && gimple_assign_rhs_code (g) != SAD_EXPR)))
-   scalar_stmts.quick_push (next_info);
+ && gimple_assign_rhs_code (g) != SAD_EXPR))
+   scalar_stmts.quick_push (next_info);
+ else
+   {
+ /* Do SLP discovery for single-lane reductions.  */
+ vec stmts;
+ vec roots = vNULL;
+  

Re: [PATCH] tree-ssa-pre.c/1071140(ICE in find_or_generate_expression, at tree-ssa-pre.c:2780): Return NULL_TREE if no equal.

2024-05-24 Thread Jiawei



On 2024/5/24 20:33, Richard Biener wrote:

On Fri, May 24, 2024 at 1:49 PM Jiawei  wrote:

An ICE was reported in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071140.
https://godbolt.org/z/WE9aGYvoo

Return NULL_TREE when TREE_CODE (op) is not equal to SSA_NAME.

The assert is on purpose.  Can you open a GCC bug for this please?  It looks
like we have unfolded POLY_INT_CST [16, 16] /[ex] 16 here.

It seems that

 /* We can't always put a size in units of the element alignment
here as the element alignment may be not visible.  See
PR43783.  Simply drop the element size for constant
sizes.  */
 if (TREE_CODE (genop3) == INTEGER_CST
 && TREE_CODE (TYPE_SIZE_UNIT (elmt_type)) == INTEGER_CST
 && wi::eq_p (wi::to_offset (TYPE_SIZE_UNIT (elmt_type)),
  (wi::to_offset (genop3)
   * vn_ref_op_align_unit (currop
   genop3 = NULL_TREE;

fails to match the POLY_INT case - the unit alignment is 16 here.  One
possibility would be to match the EXACT_DIV_EXPR case and the
INTEGER_CST divisor to vn_ref_op_align_unit and the other half
separately.  But maybe this can be written in a "proper" way?

The EXACT_DIV_EXPR is built by copy_reference_ops_from_ref,
I suppose SVE could be similarly affected.

Richard.


Thanks for your quick reply. I have reported it on Bugzilla:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115214


BR,

Jiawei


gcc/ChangeLog:

 * tree-ssa-pre.cc (find_or_generate_expression): Remove assert.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/vsetvl/pr1071140.c: New test.

---
  .../gcc.target/riscv/rvv/vsetvl/pr1071140.c   | 52 +++
  gcc/tree-ssa-pre.cc   |  4 +-
  2 files changed, 55 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr1071140.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr1071140.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr1071140.c
new file mode 100644
index 000..4f0815e099f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr1071140.c
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-mrvv-vector-bits=scalable -march=rv64gcv -mabi=lp64d -O3 -w" 
} */
+
+#include 
+
+static inline __attribute__(()) int vaddq_f32();
+static inline __attribute__(()) int vload_tillz_f32(int nlane) {
+  vint32m1_t __trans_tmp_9;
+  {
+int __trans_tmp_0 = nlane;
+{
+  vint64m1_t __trans_tmp_1;
+  vint64m1_t __trans_tmp_2;
+  vint64m1_t __trans_tmp_3;
+  vint64m1_t __trans_tmp_4;
+  if (__trans_tmp_0 == 1) {
+{
+  __trans_tmp_3 =
+  __riscv_vslideup_vx_i64m1(__trans_tmp_1, __trans_tmp_2, 1, 2);
+}
+__trans_tmp_4 = __trans_tmp_2;
+  }
+  __trans_tmp_4 = __trans_tmp_3;
+  __trans_tmp_9 = __riscv_vreinterpret_v_i64m1_i32m1(__trans_tmp_3);
+}
+  }
+  return vaddq_f32(__trans_tmp_9); /* { dg-error {RVV type 'vint32m1_t' cannot 
be passed to an unprototyped function} } */
+}
+
+char CFLOAT_add_args[3];
+const int *CFLOAT_add_steps;
+const int CFLOAT_steps;
+
+__attribute__(()) void CFLOAT_add() {
+  char *b_src0 = _add_args[0], *b_src1 = _add_args[1],
+   *b_dst = _add_args[2];
+  const float *src1 = (float *)b_src1;
+  float *dst = (float *)b_dst;
+  const int ssrc1 = CFLOAT_add_steps[1] / sizeof(float);
+  const int sdst = CFLOAT_add_steps[2] / sizeof(float);
+  const int hstep = 4 / 2;
+  vfloat32m1x2_t a;
+  int len = 255;
+  for (; len > 0; len -= hstep, src1 += 4, dst += 4) {
+int b = vload_tillz_f32(len);
+int r = vaddq_f32(a.__val[0], b); /* { dg-error {RVV type 
'__rvv_float32m1_t' cannot be passed to an unprototyped function} } */
+  }
+  for (; len > 0; --len, b_src0 += CFLOAT_steps,
+  b_src1 += CFLOAT_add_steps[1], b_dst += CFLOAT_add_steps[2])
+;
+}
+
diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc
index 75217f5cde1..e3d9c47f96b 100644
--- a/gcc/tree-ssa-pre.cc
+++ b/gcc/tree-ssa-pre.cc
@@ -2777,7 +2777,9 @@ find_or_generate_expression (basic_block block, tree op, 
gimple_seq *stmts)
if (is_gimple_min_invariant (op))
  return op;

-  gcc_assert (TREE_CODE (op) == SSA_NAME);
+  if (TREE_CODE (op) != SSA_NAME)
+return NULL_TREE;
+
vn_ssa_aux_t info = VN_INFO (op);
unsigned int lookfor = info->value_id;
if (value_id_constant_p (lookfor))
--
2.25.1





Re: [PATCH] vect: Fix access size alignment assumption [PR115192]

2024-05-24 Thread Richard Biener
On Fri, May 24, 2024 at 2:35 PM Richard Sandiford
 wrote:
>
> create_intersect_range_checks checks whether two access ranges
> a and b are alias-free using something equivalent to:
>
>   end_a <= start_b || end_b <= start_a
>
> It has two ways of doing this: a "vanilla" way that calculates
> the exact exclusive end pointers, and another way that uses the
> last inclusive aligned pointers (and changes the comparisons
> accordingly).  The comment for the latter is:
>
>   /* Calculate the minimum alignment shared by all four pointers,
>  then arrange for this alignment to be subtracted from the
>  exclusive maximum values to get inclusive maximum values.
>  This "- min_align" is cumulative with a "+ access_size"
>  in the calculation of the maximum values.  In the best
>  (and common) case, the two cancel each other out, leaving
>  us with an inclusive bound based only on seg_len.  In the
>  worst case we're simply adding a smaller number than before.
>
> The problem is that the associated code implicitly assumed that the
> access size was a multiple of the pointer alignment, and so the
> alignment could be carried over to the exclusive end pointer.
>
> The testcase started failing after g:9fa5b473b5b8e289b6542
> because that commit improved the alignment information for
> the accesses.
>
> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK for trunk
> and backports?

OK.

Thanks,
Richard.

> Richard
>
>
> gcc/
> PR tree-optimization/115192
> * tree-data-ref.cc (create_intersect_range_checks): Take the
> alignment of the access sizes into account.
>
> gcc/testsuite/
> PR tree-optimization/115192
> * gcc.dg/vect/pr115192.c: New test.
> ---
>  gcc/testsuite/gcc.dg/vect/pr115192.c | 28 
>  gcc/tree-data-ref.cc |  5 -
>  2 files changed, 32 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr115192.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr115192.c 
> b/gcc/testsuite/gcc.dg/vect/pr115192.c
> new file mode 100644
> index 000..923d377c1bb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr115192.c
> @@ -0,0 +1,28 @@
> +#include "tree-vect.h"
> +
> +int data[4 * 16 * 16] __attribute__((aligned(16)));
> +
> +__attribute__((noipa)) void
> +foo (__SIZE_TYPE__ n)
> +{
> +  for (__SIZE_TYPE__ i = 1; i < n; ++i)
> +{
> +  data[i * n * 4] = data[(i - 1) * n * 4] + 1;
> +  data[i * n * 4 + 1] = data[(i - 1) * n * 4 + 1] + 2;
> +}
> +}
> +
> +int
> +main ()
> +{
> +  check_vect ();
> +
> +  data[0] = 10;
> +  data[1] = 20;
> +
> +  foo (3);
> +
> +  if (data[24] != 12 || data[25] != 24)
> +__builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
> index db15ddb43de..7c4049faf34 100644
> --- a/gcc/tree-data-ref.cc
> +++ b/gcc/tree-data-ref.cc
> @@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.  If not see
>
>  */
>
> +#define INCLUDE_ALGORITHM
>  #include "config.h"
>  #include "system.h"
>  #include "coretypes.h"
> @@ -2640,7 +2641,9 @@ create_intersect_range_checks (class loop *loop, tree 
> *cond_expr,
>  Because the maximum values are inclusive, there is an alias
>  if the maximum value of one segment is equal to the minimum
>  value of the other.  */
> -  min_align = MIN (dr_a.align, dr_b.align);
> +  min_align = std::min (dr_a.align, dr_b.align);
> +  min_align = std::min (min_align, known_alignment (dr_a.access_size));
> +  min_align = std::min (min_align, known_alignment (dr_b.access_size));
>cmp_code = LT_EXPR;
>  }
>
> --
> 2.25.1
>


[PATCH] vect: Fix access size alignment assumption [PR115192]

2024-05-24 Thread Richard Sandiford
create_intersect_range_checks checks whether two access ranges
a and b are alias-free using something equivalent to:

  end_a <= start_b || end_b <= start_a

It has two ways of doing this: a "vanilla" way that calculates
the exact exclusive end pointers, and another way that uses the
last inclusive aligned pointers (and changes the comparisons
accordingly).  The comment for the latter is:

  /* Calculate the minimum alignment shared by all four pointers,
 then arrange for this alignment to be subtracted from the
 exclusive maximum values to get inclusive maximum values.
 This "- min_align" is cumulative with a "+ access_size"
 in the calculation of the maximum values.  In the best
 (and common) case, the two cancel each other out, leaving
 us with an inclusive bound based only on seg_len.  In the
 worst case we're simply adding a smaller number than before.

The problem is that the associated code implicitly assumed that the
access size was a multiple of the pointer alignment, and so the
alignment could be carried over to the exclusive end pointer.
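
A small model may help show why the assumption matters.  This is only a
sketch: struct segment and the helper names are hypothetical, not GCC's
data structures, and pow2_factor stands in for what known_alignment is
assumed to yield for a constant size (the largest power of two dividing
it).  The inclusive form is equivalent to the exclusive one only when all
four pointers are multiples of min_align.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one access segment.  */
struct segment
{
  int64_t start;       /* Address of the first access.  */
  int64_t seg_len;     /* Distance from first to last access start.  */
  int64_t access_size; /* Bytes touched by each access.  */
  int64_t align;       /* Known alignment of START, a power of two.  */
};

static int64_t
min64 (int64_t a, int64_t b)
{
  return a < b ? a : b;
}

/* Largest power of two dividing X, for X > 0.  */
static int64_t
pow2_factor (int64_t x)
{
  return x & -x;
}

/* "Vanilla" check using exact exclusive end pointers.  */
bool
may_alias_exclusive (const struct segment *a, const struct segment *b)
{
  int64_t end_a = a->start + a->seg_len + a->access_size;
  int64_t end_b = b->start + b->seg_len + b->access_size;
  return !(end_a <= b->start || end_b <= a->start);
}

/* Check using inclusive aligned maxima.  When USE_ACCESS_SIZE is false
   this models the old behaviour, which trusts the pointer alignment
   even if the access size is not a multiple of it.  */
bool
may_alias_inclusive (const struct segment *a, const struct segment *b,
                     bool use_access_size)
{
  int64_t min_align = min64 (a->align, b->align);
  if (use_access_size)
    {
      min_align = min64 (min_align, pow2_factor (a->access_size));
      min_align = min64 (min_align, pow2_factor (b->access_size));
    }
  int64_t max_a = a->start + a->seg_len + a->access_size - min_align;
  int64_t max_b = b->start + b->seg_len + b->access_size - min_align;
  return !(max_a < b->start || max_b < a->start);
}
```

For two 8-byte accesses at the same 16-byte-aligned address, the old form
subtracts 16 from an end pointer that is only 8-aligned and so reports no
alias; clamping min_align to 8 restores agreement with the exclusive
check.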

The testcase started failing after g:9fa5b473b5b8e289b6542
because that commit improved the alignment information for
the accesses.

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK for trunk
and backports?

Richard


gcc/
PR tree-optimization/115192
* tree-data-ref.cc (create_intersect_range_checks): Take the
alignment of the access sizes into account.

gcc/testsuite/
PR tree-optimization/115192
* gcc.dg/vect/pr115192.c: New test.
---
 gcc/testsuite/gcc.dg/vect/pr115192.c | 28 
 gcc/tree-data-ref.cc |  5 -
 2 files changed, 32 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr115192.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr115192.c 
b/gcc/testsuite/gcc.dg/vect/pr115192.c
new file mode 100644
index 000..923d377c1bb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115192.c
@@ -0,0 +1,28 @@
+#include "tree-vect.h"
+
+int data[4 * 16 * 16] __attribute__((aligned(16)));
+
+__attribute__((noipa)) void
+foo (__SIZE_TYPE__ n)
+{
+  for (__SIZE_TYPE__ i = 1; i < n; ++i)
+{
+  data[i * n * 4] = data[(i - 1) * n * 4] + 1;
+  data[i * n * 4 + 1] = data[(i - 1) * n * 4 + 1] + 2;
+}
+}
+
+int
+main ()
+{
+  check_vect ();
+
+  data[0] = 10;
+  data[1] = 20;
+
+  foo (3);
+
+  if (data[24] != 12 || data[25] != 24)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
index db15ddb43de..7c4049faf34 100644
--- a/gcc/tree-data-ref.cc
+++ b/gcc/tree-data-ref.cc
@@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.  If not see
 
 */
 
+#define INCLUDE_ALGORITHM
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -2640,7 +2641,9 @@ create_intersect_range_checks (class loop *loop, tree 
*cond_expr,
 Because the maximum values are inclusive, there is an alias
 if the maximum value of one segment is equal to the minimum
 value of the other.  */
-  min_align = MIN (dr_a.align, dr_b.align);
+  min_align = std::min (dr_a.align, dr_b.align);
+  min_align = std::min (min_align, known_alignment (dr_a.access_size));
+  min_align = std::min (min_align, known_alignment (dr_b.access_size));
   cmp_code = LT_EXPR;
 }
 
-- 
2.25.1



Re: [PATCH] tree-ssa-pre.c/1071140(ICE in find_or_generate_expression, at tree-ssa-pre.c:2780): Return NULL_TREE if no equal.

2024-05-24 Thread Richard Biener
On Fri, May 24, 2024 at 1:49 PM Jiawei  wrote:
>
> An ICE was reported in
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071140.
> https://godbolt.org/z/WE9aGYvoo
>
> Return NULL_TREE when TREE_CODE (op) is not equal to SSA_NAME.

The assert is on purpose.  Can you open a GCC bug for this please?  It looks
like we have unfolded POLY_INT_CST [16, 16] /[ex] 16 here.

It seems that

/* We can't always put a size in units of the element alignment
   here as the element alignment may be not visible.  See
   PR43783.  Simply drop the element size for constant
   sizes.  */
if (TREE_CODE (genop3) == INTEGER_CST
&& TREE_CODE (TYPE_SIZE_UNIT (elmt_type)) == INTEGER_CST
&& wi::eq_p (wi::to_offset (TYPE_SIZE_UNIT (elmt_type)),
 (wi::to_offset (genop3)
  * vn_ref_op_align_unit (currop
  genop3 = NULL_TREE;

fails to match the POLY_INT case - the unit alignment is 16 here.  One
possibility would be to match the EXACT_DIV_EXPR case and the
INTEGER_CST divisor to vn_ref_op_align_unit and the other half
separately.  But maybe this can be written in a "proper" way?

The EXACT_DIV_EXPR is built by copy_reference_ops_from_ref,
I suppose SVE could be similarly affected.

Richard.

> gcc/ChangeLog:
>
> * tree-ssa-pre.cc (find_or_generate_expression): Remove assert.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/vsetvl/pr1071140.c: New test.
>
> ---
>  .../gcc.target/riscv/rvv/vsetvl/pr1071140.c   | 52 +++
>  gcc/tree-ssa-pre.cc   |  4 +-
>  2 files changed, 55 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr1071140.c
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr1071140.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr1071140.c
> new file mode 100644
> index 000..4f0815e099f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr1071140.c
> @@ -0,0 +1,52 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mrvv-vector-bits=scalable -march=rv64gcv -mabi=lp64d -O3 
> -w" } */
> +
> +#include 
> +
> +static inline __attribute__(()) int vaddq_f32();
> +static inline __attribute__(()) int vload_tillz_f32(int nlane) {
> +  vint32m1_t __trans_tmp_9;
> +  {
> +int __trans_tmp_0 = nlane;
> +{
> +  vint64m1_t __trans_tmp_1;
> +  vint64m1_t __trans_tmp_2;
> +  vint64m1_t __trans_tmp_3;
> +  vint64m1_t __trans_tmp_4;
> +  if (__trans_tmp_0 == 1) {
> +{
> +  __trans_tmp_3 =
> +  __riscv_vslideup_vx_i64m1(__trans_tmp_1, __trans_tmp_2, 1, 2);
> +}
> +__trans_tmp_4 = __trans_tmp_2;
> +  }
> +  __trans_tmp_4 = __trans_tmp_3;
> +  __trans_tmp_9 = __riscv_vreinterpret_v_i64m1_i32m1(__trans_tmp_3);
> +}
> +  }
> +  return vaddq_f32(__trans_tmp_9); /* { dg-error {RVV type 'vint32m1_t' 
> cannot be passed to an unprototyped function} } */
> +}
> +
> +char CFLOAT_add_args[3];
> +const int *CFLOAT_add_steps;
> +const int CFLOAT_steps;
> +
> +__attribute__(()) void CFLOAT_add() {
> +  char *b_src0 = _add_args[0], *b_src1 = _add_args[1],
> +   *b_dst = _add_args[2];
> +  const float *src1 = (float *)b_src1;
> +  float *dst = (float *)b_dst;
> +  const int ssrc1 = CFLOAT_add_steps[1] / sizeof(float);
> +  const int sdst = CFLOAT_add_steps[2] / sizeof(float);
> +  const int hstep = 4 / 2;
> +  vfloat32m1x2_t a;
> +  int len = 255;
> +  for (; len > 0; len -= hstep, src1 += 4, dst += 4) {
> +int b = vload_tillz_f32(len);
> +int r = vaddq_f32(a.__val[0], b); /* { dg-error {RVV type 
> '__rvv_float32m1_t' cannot be passed to an unprototyped function} } */
> +  }
> +  for (; len > 0; --len, b_src0 += CFLOAT_steps,
> +  b_src1 += CFLOAT_add_steps[1], b_dst += 
> CFLOAT_add_steps[2])
> +;
> +}
> +
> diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc
> index 75217f5cde1..e3d9c47f96b 100644
> --- a/gcc/tree-ssa-pre.cc
> +++ b/gcc/tree-ssa-pre.cc
> @@ -2777,7 +2777,9 @@ find_or_generate_expression (basic_block block, tree 
> op, gimple_seq *stmts)
>if (is_gimple_min_invariant (op))
>  return op;
>
> -  gcc_assert (TREE_CODE (op) == SSA_NAME);
> +  if (TREE_CODE (op) != SSA_NAME)
> +return NULL_TREE;
> +
>vn_ssa_aux_t info = VN_INFO (op);
>unsigned int lookfor = info->value_id;
>if (value_id_constant_p (lookfor))
> --
> 2.25.1
>


Re: [RFC/PATCH] Replace {FLOAT, {, LONG_}DOUBLE}_TYPE_SIZE with new hook

2024-05-24 Thread Richard Biener
On Fri, May 24, 2024 at 12:20 PM Kewen.Lin  wrote:
>
> Hi Joseph and Richi,
>
> on 2024/5/13 21:18, Joseph Myers wrote:
> > On Mon, 13 May 2024, Kewen.Lin wrote:
> >
> >>> In fact replacing all of X_TYPE_SIZE with a single hook might be worthwhile
> >>> though this removes the "convenient" defaulting, requiring each target to
> >>> enumerate all standard C ABI type modes.  But that might be also a good thing.
> >>>
> >>>
> >>
> >> I guess the main value of extending from floating point types to all is to
> >> unify them?  (Assuming that, except for floating types, the others would
> >> not have multiple possible representations like what we face on 128-bit
> >> fp.)
> >
> > For integer types, giving the number of bits makes sense as an interface -
> > there isn't an issue with different modes.
> >
> > So I think it's appropriate for floating and integer types to have
> > separate hooks - with the one for floating types returning a mode, and the
> > one for integer types returning a number of bits.  (And also keep the
> > existing separate hook for _FloatN / _FloatNx modes.)
> >
> > That may also make for more convenient defaults (whether a target has long
> > double wider than double is largely independent of what sizes it uses for
> > integer types).
> >
>
> Following your suggestion and comments, I made this patch
> for mode_for_floating_type first.  Since it touches
> a few FE- and port-specific pieces of code, I think I have to split
> it into a patch series.  Before doing that, I'd like to
> ensure this meets what you expected, and also ask for
> suggestions on how to organize the sub-patches.  There seem to be
> two ways to split them:
>   1) split this into pieces according to FEs and ports, and
>  squash all of them and commit one patch.
>   2) extract all hook implementation as 1st series (split
>  as ports);
>  extract the hook enablement as 2nd part (split as
>  generic and FEs);
>  the remaining is to remove useless macros (split it
>  as generic and ports);
>
> The 1) is straightforward, while the 2) is fine-grained and
> easy for isolation, but not sure if it's worth doing.
>
> btw, the attached patch is bootstrapped and regtested on
> powerpc64-linux-gnu and powerpc64le-linux-gnu with all
> languages on, cross cc1 built well for affected ports.

Looks reasonable to me - I'd split language changes out but
keep target and middle-end together.  The middle-end parts
look good to me - I'm always a bit nervous when using
size and precision exchangably, esp. for FP, but it seems
this has been done before.

I hope Joseph will eye that part as well.

Thanks for doing this,
Richard.

> BR,
> Kewen
>
> -
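
To illustrate the split Joseph describes (a mode for floating types, a
bit count for integer types), here is a toy sketch.  Every enum and
function name below is hypothetical and chosen for this example only; it
is not the hook interface from the attached patch.

```c
/* Hypothetical stand-ins for type indices and machine modes.  */
enum fp_type { FP_FLOAT, FP_DOUBLE, FP_LONG_DOUBLE };
enum fp_mode { MODE_SINGLE, MODE_DOUBLE, MODE_IEEE128, MODE_IBM128 };
enum int_type { INT_SHORT, INT_INT, INT_LONG, INT_LONG_LONG };

/* Floating types: return a mode.  Two 128-bit formats have the same
   size, so a bit count alone could not tell them apart.  */
enum fp_mode
example_mode_for_floating_type (enum fp_type t, int ieee128_long_double)
{
  switch (t)
    {
    case FP_FLOAT:
      return MODE_SINGLE;
    case FP_DOUBLE:
      return MODE_DOUBLE;
    default:
      return ieee128_long_double ? MODE_IEEE128 : MODE_IBM128;
    }
}

/* Integer types: a number of bits is enough, since there is only one
   representation per size.  The sizes model an LP64-style target.  */
unsigned
example_bits_for_integer_type (enum int_type t)
{
  switch (t)
    {
    case INT_SHORT:
      return 16;
    case INT_INT:
      return 32;
    default:
      return 64;
    }
}
```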


[PATCH] Add testcase for PR c++/105229: ICE in lookup_template_class_1

2024-05-24 Thread Simon Martin
Hi,

The test case in PR c++/105229 has been fixed since 11.4 (via PR 
c++/106024) - the attached patch simply adds the case to the test suite.

Successfully tested on x86_64-pc-linux-gnu. OK for trunk?

Thanks! Simon


 PR c++/105229

gcc/testsuite/ChangeLog:

 * g++.dg/parse/crash72.C: New test.

---
  gcc/testsuite/g++.dg/parse/crash72.C | 12 
  1 file changed, 12 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/parse/crash72.C

diff --git a/gcc/testsuite/g++.dg/parse/crash72.C 
b/gcc/testsuite/g++.dg/parse/crash72.C
new file mode 100644
index 000..df469e20f28
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/crash72.C
@@ -0,0 +1,12 @@
+// PR c++/105229
+// { dg-do compile { target c++20 } }
+// { dg-additional-options "-Wno-missing-template-keyword" }
+
+template  void bar ()
+{
+  []  {}.operator () <> (); // { dg-error "expected 
primary-expression" }
+}
+void foo ()
+{
+  bar ();
+}
--
2.44.0


Re: [PATCH v2] MATCH: Look through VIEW_CONVERT when folding VEC_PERM_EXPRs.

2024-05-24 Thread Philipp Tomsich
On Fri, 24 May 2024 at 13:02, Richard Biener  wrote:
>
> On Fri, 24 May 2024, Manolis Tsamis wrote:
>
> > The match.pd patterns to merge two vector permutes into one fail when a
> > potentially no-op view convert expressions is between the two permutes.
> > This change lifts this restriction.
>
> OK.

Applied to master, thanks!
--Philipp.
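
For context, merging two permutes amounts to composing their selectors.
The sketch below models the single-input case on plain arrays; N and the
function names are made up for illustration and this is not the match.pd
pattern itself.  A no-op view convert between the two permutes (for
example, between signed and unsigned vectors with the same lane count)
leaves the lane order untouched, so the same composition applies.

```c
#define N 4 /* Hypothetical lane count.  */

/* out[i] = v[sel[i]], with selector indices taken modulo N.  */
void
permute (const int v[N], const unsigned sel[N], int out[N])
{
  for (int i = 0; i < N; ++i)
    out[i] = v[sel[i] % N];
}

/* Selector of the single permute equivalent to applying INNER first and
   OUTER second: merged[i] = inner[outer[i]].  */
void
compose (const unsigned inner[N], const unsigned outer[N],
         unsigned merged[N])
{
  for (int i = 0; i < N; ++i)
    merged[i] = inner[outer[i] % N] % N;
}
```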


Re: [PATCH V2] RISC-V: Fix missing boolean_expression in zmmul extension

2024-05-24 Thread Kito Cheng
LGTM

On Fri, May 24, 2024 at 13:05, Liao Shihua wrote:

> Update v1->v2
> Add testcase for this patch.
>
> The missing TARGET_ZMMUL check in the boolean expression in
> riscv_rtx_costs() causes different instructions to be generated when
> multiplying an integer by a constant. (
> https://github.com/riscv-collab/riscv-gnu-toolchain/issues/1482 )
>
> int foo(int *ib) {
> *ib = *ib * 33938;
> return 0;
> }
>
> rv64im:
> lw      a4,0(a1)
> li      a5,32768
> addiw   a5,a5,1170
> mulw    a5,a5,a4
> sw      a5,0(a1)
> ret
>
> rv64i_zmmul:
> lw      a4,0(a1)
> slliw   a5,a4,5
> addw    a5,a5,a4
> slliw   a5,a5,3
> addw    a5,a5,a4
> slliw   a5,a5,3
> addw    a5,a5,a4
> slliw   a5,a5,3
> addw    a5,a5,a4
> slliw   a5,a5,1
> sw      a5,0(a1)
> ret
>
> Fixed.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_rtx_costs): Add TARGET_ZMMUL.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/zmmul-3.c: New test.
>
> ---
>  gcc/config/riscv/riscv.cc| 2 +-
>  gcc/testsuite/gcc.target/riscv/zmmul-3.c | 8 
>  2 files changed, 9 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zmmul-3.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 85df5b7ab49..580ae007181 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -3753,7 +3753,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int
> outer_code, int opno ATTRIBUTE_UN
>  case MULT:
>if (float_mode_p)
> *total = tune_param->fp_mul[mode == DFmode];
> -  else if (!TARGET_MUL)
> +  else if (!(TARGET_MUL || TARGET_ZMMUL))
> /* Estimate the cost of a library call.  */
> *total = COSTS_N_INSNS (speed ? 32 : 6);
>else if (GET_MODE_SIZE (mode).to_constant () > UNITS_PER_WORD)
> diff --git a/gcc/testsuite/gcc.target/riscv/zmmul-3.c
> b/gcc/testsuite/gcc.target/riscv/zmmul-3.c
> new file mode 100644
> index 000..ae9752462e4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zmmul-3.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64iafdc_zmmul -mabi=lp64d" } */
> +int foo1(int a)
> +{
> +return a * 99;
> +}
> +
> +/* { dg-final { scan-assembler-times "mulw\t" 1 } } */
> \ No newline at end of file
> --
> 2.34.1
>
>
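
For reference, the rv64i_zmmul sequence quoted above is a shift-and-add
expansion of the multiplication by 33938.  The sketch below mirrors its
steps in C; the function names are made up for illustration, and unsigned
arithmetic is used so that 32-bit wraparound is well defined.

```c
#include <stdint.h>

/* Mirrors the slliw/addw chain: 33, 265, 2121, 16969, then 33938.  */
int32_t
mul_33938_shift_add (int32_t x)
{
  uint32_t v = (uint32_t) x;
  uint32_t t = (v << 5) + v; /* 33 * x */
  t = (t << 3) + v;          /* 265 * x */
  t = (t << 3) + v;          /* 2121 * x */
  t = (t << 3) + v;          /* 16969 * x */
  t <<= 1;                   /* 33938 * x */
  return (int32_t) t;
}

/* What a single mulw computes, modulo 2^32.  */
int32_t
mul_33938_direct (int32_t x)
{
  return (int32_t) ((uint32_t) x * 33938u);
}
```

Nine arithmetic instructions replace one multiply here, which is the
sequence chosen when the cost model treats multiplication as a library
call.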


[PATCH] tree-ssa-pre.c/1071140(ICE in find_or_generate_expression, at tree-ssa-pre.c:2780): Return NULL_TREE if no equal.

2024-05-24 Thread Jiawei
An ICE was reported in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071140.
https://godbolt.org/z/WE9aGYvoo

Return NULL_TREE when TREE_CODE (op) is not equal to SSA_NAME.

gcc/ChangeLog:

* tree-ssa-pre.cc (find_or_generate_expression): Remove assert.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr1071140.c: New test.

---
 .../gcc.target/riscv/rvv/vsetvl/pr1071140.c   | 52 +++
 gcc/tree-ssa-pre.cc   |  4 +-
 2 files changed, 55 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr1071140.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr1071140.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr1071140.c
new file mode 100644
index 000..4f0815e099f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr1071140.c
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-mrvv-vector-bits=scalable -march=rv64gcv -mabi=lp64d -O3 -w" 
} */
+
+#include 
+
+static inline __attribute__(()) int vaddq_f32();
+static inline __attribute__(()) int vload_tillz_f32(int nlane) {
+  vint32m1_t __trans_tmp_9;
+  {
+int __trans_tmp_0 = nlane;
+{
+  vint64m1_t __trans_tmp_1;
+  vint64m1_t __trans_tmp_2;
+  vint64m1_t __trans_tmp_3;
+  vint64m1_t __trans_tmp_4;
+  if (__trans_tmp_0 == 1) {
+{
+  __trans_tmp_3 =
+  __riscv_vslideup_vx_i64m1(__trans_tmp_1, __trans_tmp_2, 1, 2);
+}
+__trans_tmp_4 = __trans_tmp_2;
+  }
+  __trans_tmp_4 = __trans_tmp_3;
+  __trans_tmp_9 = __riscv_vreinterpret_v_i64m1_i32m1(__trans_tmp_3);
+}
+  }
+  return vaddq_f32(__trans_tmp_9); /* { dg-error {RVV type 'vint32m1_t' cannot 
be passed to an unprototyped function} } */
+}
+
+char CFLOAT_add_args[3];
+const int *CFLOAT_add_steps;
+const int CFLOAT_steps;
+
+__attribute__(()) void CFLOAT_add() {
+  char *b_src0 = &CFLOAT_add_args[0], *b_src1 = &CFLOAT_add_args[1],
+   *b_dst = &CFLOAT_add_args[2];
+  const float *src1 = (float *)b_src1;
+  float *dst = (float *)b_dst;
+  const int ssrc1 = CFLOAT_add_steps[1] / sizeof(float);
+  const int sdst = CFLOAT_add_steps[2] / sizeof(float);
+  const int hstep = 4 / 2;
+  vfloat32m1x2_t a;
+  int len = 255;
+  for (; len > 0; len -= hstep, src1 += 4, dst += 4) {
+int b = vload_tillz_f32(len);
+int r = vaddq_f32(a.__val[0], b); /* { dg-error {RVV type 
'__rvv_float32m1_t' cannot be passed to an unprototyped function} } */
+  }
+  for (; len > 0; --len, b_src0 += CFLOAT_steps,
+  b_src1 += CFLOAT_add_steps[1], b_dst += CFLOAT_add_steps[2])
+;
+}
+
diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc
index 75217f5cde1..e3d9c47f96b 100644
--- a/gcc/tree-ssa-pre.cc
+++ b/gcc/tree-ssa-pre.cc
@@ -2777,7 +2777,9 @@ find_or_generate_expression (basic_block block, tree op, 
gimple_seq *stmts)
   if (is_gimple_min_invariant (op))
 return op;
 
-  gcc_assert (TREE_CODE (op) == SSA_NAME);
+  if (TREE_CODE (op) != SSA_NAME)
+return NULL_TREE;
+
   vn_ssa_aux_t info = VN_INFO (op);
   unsigned int lookfor = info->value_id;
   if (value_id_constant_p (lookfor))
-- 
2.25.1



Re: [PATCH] adjust vectorization expectations for ppc costmodel 76b

2024-05-24 Thread Alexandre Oliva
On May 23, 2024, Alexandre Oliva  wrote:

> On Apr 29, 2024, "Kewen.Lin"  wrote:
>> I think you can still push the patch as the testing just exposes
>> another issue.

> ACK, thanks, I've just confirmed that the problem I reported on
> ppc64el-linux-gnu didn't come up when testing on ppc64-vx7r2 with a
> non-power8 emulated cpu, so I'm going to install it.

I see I hadn't posted the latest version of the patch, with the updated
attribution and commit message.  Here it is.  I'm checking it in.

testsuite: adjust iteration count for ppc costmodel 76b

From: Alexandre Oliva 

On hardware that doesn't support unaligned vector memory access,
test case costmodel-vect-76b.c expects the cost model to decide that
peeling is not profitable, according to the commit history, the test
case comments and the way the result is checked.

The existing loop bound of 14 works well for Power7, but not for
targets on which the cost of the vec_perm operation differs from
Power7, such as Power6, where it is 3 vs. 1.  This difference in turn
changes the minimum iteration count for profitability (10 vs. 12) and
causes the failure.  To keep the original test point, this patch
tweaks the loop bound to ensure that vectorizing with peeling is not
profitable for !vect_no_align.


Co-Authored-By: Kewen Lin 

for  gcc/testsuite/ChangeLog

* gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c (N): Tweak.
---
 .../gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c 
b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
index cbbfbb24658f8..e48b0ab759e75 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
@@ -6,7 +6,7 @@
 
 /* On Power7 without misalign vector support, this case is to check it's not
profitable to perform vectorization by peeling to align the store.  */
-#define N 14
+#define N 13
 #define OFF 4
 
 /* Check handling of accesses for which the "initial condition" -


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


[PATCH] Fix gcc.dg/vect/vect-gather-4.c for cascadelake

2024-05-24 Thread Richard Biener
There's not really a good way to test what the testcase wants to
test; the following exchanges one dump scan for another (imperfect)
one.

Pushed.

* gcc.dg/vect/vect-gather-4.c: Scan for not vectorizing using
SLP.
---
 gcc/testsuite/gcc.dg/vect/vect-gather-4.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-4.c 
b/gcc/testsuite/gcc.dg/vect/vect-gather-4.c
index 1ce63e69199..d18094d6982 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-gather-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-4.c
@@ -45,4 +45,4 @@ f3 (int *restrict y, int *restrict x, int *restrict indices)
 }
 }
 
-/* { dg-final { scan-tree-dump-not "Loop contains only SLP stmts" vect } } */
+/* { dg-final { scan-tree-dump-not "vectorizing stmts using SLP" vect } } */
-- 
2.35.3


Re: [PATCH v2] MATCH: Look through VIEW_CONVERT when folding VEC_PERM_EXPRs.

2024-05-24 Thread Richard Biener
On Fri, 24 May 2024, Manolis Tsamis wrote:

> The match.pd patterns to merge two vector permutes into one fail when a
> potentially no-op view convert expression sits between the two permutes.
> This change lifts this restriction.

OK.

> gcc/ChangeLog:
> 
>   * match.pd: Allow no-op view_convert between permutes.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/fold-perm-2.c: New test.
> 
> Signed-off-by: Manolis Tsamis 
> ---
> 
> Changes in v2:
> Use TYPE_SIZE (TREE_TYPE (TREE_TYPE (@))) instead of element_precision (@).
> 
>  gcc/match.pd   | 14 --
>  gcc/testsuite/gcc.dg/fold-perm-2.c | 16 
>  2 files changed, 24 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/fold-perm-2.c
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 07e743ae464..1f91b9857c8 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -10039,19 +10039,21 @@ and,
>   d = VEC_PERM_EXPR ;  */
>  
>  (simplify
> - (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
> + (vec_perm (view_convert?@0 (vec_perm@1 @2 @3 VECTOR_CST@4)) @0 VECTOR_CST@5)
>   (if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
>(with
> {
>   machine_mode result_mode = TYPE_MODE (type);
> - machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
> + machine_mode op_mode = TYPE_MODE (TREE_TYPE (@2));
>   int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
>   vec_perm_builder builder0;
>   vec_perm_builder builder1;
>   vec_perm_builder builder2 (nelts, nelts, 1);
> }
> -   (if (tree_to_vec_perm_builder (&builder0, @3)
> - && tree_to_vec_perm_builder (&builder1, @4))
> +   (if (tree_to_vec_perm_builder (&builder0, @4)
> + && tree_to_vec_perm_builder (&builder1, @5)
> + && TYPE_SIZE (TREE_TYPE (TREE_TYPE (@0)))
> +== TYPE_SIZE (TREE_TYPE (TREE_TYPE (@1))))
>  (with
>   {
> vec_perm_indices sel0 (builder0, 2, nelts);
> @@ -10073,10 +10075,10 @@ and,
>  ? (!can_vec_perm_const_p (result_mode, op_mode, sel0, false)
> || !can_vec_perm_const_p (result_mode, op_mode, sel1, false))
>  : !can_vec_perm_const_p (result_mode, op_mode, sel1, false)))
> -  op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
> +  op0 = vec_perm_indices_to_tree (TREE_TYPE (@5), sel2);
>   }
>   (if (op0)
> -  (vec_perm @1 @2 { op0; })))))))
> +  (view_convert (vec_perm @2 @3 { op0; }))))))))
>  
>  /* Merge
>   c = VEC_PERM_EXPR ;
> diff --git a/gcc/testsuite/gcc.dg/fold-perm-2.c 
> b/gcc/testsuite/gcc.dg/fold-perm-2.c
> new file mode 100644
> index 000..1a4ab4065de
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/fold-perm-2.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-fre1" } */
> +
> +typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
> +typedef unsigned int vecu __attribute__ ((vector_size (4 * sizeof (unsigned int))));
> +
> +void fun (veci *a, veci *b, veci *c)
> +{
> +  veci r1 = __builtin_shufflevector (*a, *b, 0, 5, 2, 7);
> +  vecu r2 = __builtin_convertvector (r1, vecu);
> +  vecu r3 = __builtin_shufflevector (r2, r2, 2, 3, 1, 0);
> +  *c = __builtin_convertvector (r3, veci);
> +}
> +
> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 7, 5, 0 }" "fre1" } } */
> +/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "fre1" } } */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH][v2] tree-optimization/115144 - improve sinking destination choice

2024-05-24 Thread Richard Biener
When sinking code closer to its uses we already try to minimize the
distance we move by inserting at the start of the basic-block.  The
following makes sure to sink closest to the control dependence
check of the region we want to sink to as well as make sure to
ignore control dependences that are only guarding exceptional code.
This restores somewhat the old profile check but without requiring
nearly even probabilities.  The patch also makes sure to not give
up completely when the best sink location is one we do not want to
sink to but possibly then choose the next best one.
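
The block selection described above can be sketched on a toy dominator
chain.  This is only an illustration of the part of the walk visible in
the diff below, not the code in tree-ssa-sink.cc: struct toy_bb, its
fields and toy_select_best_block are hypothetical stand-ins, and the
disqualified flag stands in for do_not_sink () returning true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a basic block: only what the sketch needs.  */
struct toy_bb
{
  struct toy_bb *idom;   /* immediate dominator, NULL at the root */
  int loop_depth;        /* loop nesting depth */
  bool disqualified;     /* stand-in for do_not_sink () returning true */
};

/* Walk from LATE_BB up the dominator chain to EARLY_BB and return the
   block to sink to.  EARLY_BB must dominate LATE_BB.  */
static struct toy_bb *
toy_select_best_block (struct toy_bb *early_bb, struct toy_bb *late_bb)
{
  /* First pick a block we do not disqualify.  */
  while (late_bb != early_bb && late_bb->disqualified)
    {
      assert (late_bb->idom != NULL);
      late_bb = late_bb->idom;
    }

  struct toy_bb *best_bb = late_bb;
  struct toy_bb *temp_bb = late_bb;
  while (temp_bb != early_bb)
    {
      assert (temp_bb->idom != NULL);
      temp_bb = temp_bb->idom;

      /* Do not consider blocks we do not want to sink to.  */
      if (temp_bb != early_bb && temp_bb->disqualified)
        continue;

      /* Otherwise a shallower loop nest becomes the best block.  */
      if (temp_bb->loop_depth < best_bb->loop_depth)
        best_bb = temp_bb;
    }
  return best_bb;
}
```

The point of the two-step walk is that a disqualified LATE_BB no longer
makes the whole search give up; the search simply restarts from the
nearest dominator that is acceptable.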

This addresses fallout observed in building libgo.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/115144
* tree-ssa-sink.cc (do_not_sink): New function, split out
from ...
(select_best_block): Here.  First pick valid block to
sink to.  From that search for the best valid block,
avoiding sinking across conditions to exceptional code.
(sink_code_in_bb): When updating vuses of stores in
paths we do not sink a store to make sure we didn't
pick a dominating sink location.

* gcc.dg/tree-ssa/ssa-sink-22.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c |  14 +++
 gcc/tree-ssa-sink.cc| 106 +---
 2 files changed, 86 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
new file mode 100644
index 000..e35626d4070
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink1-details" } */
+
+extern void abort (void);
+
+int foo (int x, int y, int f)
+{
+  int tem = x / y;
+  if (f)
+abort ();
+  return tem;
+}
+
+/* { dg-final { scan-tree-dump-not "Sinking" "sink1" } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index 2188b7523c7..b0fe871cf1e 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -172,6 +172,39 @@ nearest_common_dominator_of_uses (def_operand_p def_p, 
bool *debug_stmts)
   return commondom;
 }
 
+/* Return whether sinking STMT from EARLY_BB to BEST_BB should be avoided.  */
+
+static bool
+do_not_sink (gimple *stmt, basic_block early_bb, basic_block best_bb)
+{
+  /* Placing a statement before a setjmp-like function would be invalid
+ (it cannot be reevaluated when execution follows an abnormal edge).
+ If we selected a block with abnormal predecessors, just punt.  */
+  if (bb_has_abnormal_pred (best_bb))
+return true;
+
+  /* If the latch block is empty, don't make it non-empty by sinking
+ something into it.  */
+  if (best_bb == early_bb->loop_father->latch
+  && empty_block_p (best_bb))
+return true;
+
+  /* Avoid turning an unconditional read into a conditional one when we
+ still might want to perform vectorization.  */
+  if (best_bb->loop_father == early_bb->loop_father
+  && loop_outer (best_bb->loop_father)
+  && !best_bb->loop_father->inner
+  && gimple_vuse (stmt)
+  && !gimple_vdef (stmt)
+  && flag_tree_loop_vectorize
+  && !(cfun->curr_properties & PROP_loop_opts_done)
+  && dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, early_bb)
+  && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, 
best_bb))
+return true;
+
+  return false;
+}
+
 /* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator
tree, return the best basic block between them (inclusive) to place
statements.
@@ -185,54 +218,57 @@ select_best_block (basic_block early_bb,
   basic_block late_bb,
   gimple *stmt)
 {
+  /* First pick a block we do not disqualify.  */
+  while (late_bb != early_bb
+&& do_not_sink (stmt, early_bb, late_bb))
+late_bb = get_immediate_dominator (CDI_DOMINATORS, late_bb);
+
   basic_block best_bb = late_bb;
   basic_block temp_bb = late_bb;
-
   while (temp_bb != early_bb)
 {
   /* Walk up the dominator tree, hopefully we'll find a shallower
 loop nest.  */
   temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
 
+  /* Do not consider blocks we do not want to sink to.  */
+  if (temp_bb != early_bb && do_not_sink (stmt, early_bb, temp_bb))
+   ;
+
   /* If we've moved into a lower loop nest, then that becomes
 our best block.  */
-  if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
+  else if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
best_bb = temp_bb;
-}
 
-  /* Placing a statement before a setjmp-like function would be invalid
- (it cannot be reevaluated when execution follows an abnormal edge).
- If we selected a block with abnormal predecessors, just punt.  */
-  if (bb_has_abnormal_pred (best_bb))
-return early_bb;
-
-  /* If we found 

Re: [PATCH 12/13] rs6000, remove __builtin_vsx_xvcmpeqsp built-in

2024-05-24 Thread Kewen.Lin
Hi,

on 2024/5/24 02:21, Carl Love wrote:
> 
> 
> On 5/13/24 22:37, Kewen.Lin wrote:
>> Hi,
>>
>> on 2024/4/20 05:18, Carl Love wrote:
>>> rs6000, remove __builtin_vsx_xvcmpeqsp built-in
>>>
>>> The built-in __builtin_vsx_xvcmpeqsp is a duplicate of the overloaded
>>> vec_cmpeq built-in.  The built-in is undocumented.  The built-in and
>>> the test cases are removed.
>>>
>>> gcc/ChangeLog:
>>> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp):
>>> Remove built-in definition.
>>>
>>
>> Ah, you separated this __builtin_vsx_xvcmpeqsp from the one for
>> __builtin_vsx_xvcmpeqsp_p, it's fine, please ignore the comments for
>> considering this __builtin_vsx_xvcmpeqsp in my previous reply to 11/13.
>>
>>
>>> gcc/testsuite/ChangeLog:
>>> * vsx-builtin-3.c (do_cmp): Remove test case for
>>> __builtin_vsx_xvcmpeqsp.
>>> ---
>>>  gcc/config/rs6000/rs6000-builtins.def| 3 ---
>>>  gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c | 2 --
>>>  2 files changed, 5 deletions(-)
>>>
>>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>>> b/gcc/config/rs6000/rs6000-builtins.def
>>> index 2f6149edd5f..19d05b8043a 100644
>>> --- a/gcc/config/rs6000/rs6000-builtins.def
>>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>>> @@ -1613,9 +1613,6 @@
>>>const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd);
>>>  XVCMPEQDP_P vector_eq_v2df_p {pred}
>>>  
>>> -  const vf __builtin_vsx_xvcmpeqsp (vf, vf);
>>> -XVCMPEQSP vector_eqv4sf {}
>>> -
>>>const vd __builtin_vsx_xvcmpgedp (vd, vd);
>>>  XVCMPGEDP vector_gev2df {}
>>>  
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
>>> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>>> index 35ea31b2616..245893dc0e3 100644
>>> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>>> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>>> @@ -27,7 +27,6 @@
>>>  /* { dg-final { scan-assembler "xvcmpeqdp" } } */
>>>  /* { dg-final { scan-assembler "xvcmpgtdp" } } */
>>>  /* { dg-final { scan-assembler "xvcmpgedp" } } */
>>> -/* { dg-final { scan-assembler "xvcmpeqsp" } } */
>>>  /* { dg-final { scan-assembler "xvcmpgtsp" } } */
>>>  /* { dg-final { scan-assembler "xvcmpgesp" } } */
>>>  /* { dg-final { scan-assembler "xxsldwi" } } */
>>> @@ -112,7 +111,6 @@ int do_cmp (void)
>>>d[i][0] = __builtin_vsx_xvcmpgtdp (d[i][1], d[i][2]); i++;
>>>d[i][0] = __builtin_vsx_xvcmpgedp (d[i][1], d[i][2]); i++;
>>>  
>>> -  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
>>>f[i][0] = __builtin_vsx_xvcmpgtsp (f[i][1], f[i][2]); i++;
>>>f[i][0] = __builtin_vsx_xvcmpgesp (f[i][1], f[i][2]); i++;
>>>return i;
>>
>> As the other in this patch series, I prefer to change it with
>> vec_cmpeq here, OK for trunk with this tweaked (also keep the
>> scan there), thanks!
> 
> When I went to change the test case I noticed that __builtin_vsx_xvcmpeqsp 
> and vec_cmpeq both return a vector where the element is all ones if the 
> comparison is True and zeros if False.  However, the return type for 
> __builtin_vsx_xvcmpeqsp is vector floats but vec_cmpeq returns vector bool.
> 

Ah, so they are not equivalent from a prototype perspective.

> The PVIPR says the vec_cmpeq built-in returns a value where each bit in the 
> vector element is a 1 if the comparison is equal and 0 otherwise.  However, 
> the documented result is a vector bool int for the floating point comparison. 
>  The return value for __builtin_vsx_xvcmpeqsp was vector float.

IMHO the PVIPR prototype (returning vector bool) makes more sense;
it better matches what the result holds.

> 
> So, the "bit values" returned are the same but not of the same type. So 
> technically vec_cmpeq is not a drop in replacement for 
> __builtin_vsx_xvcmpeqsp.  Given that, perhaps we should not be removing 
> __builtin_vsx_xvcmpeqsp?
> 
> The testcase has to be changed from:
>  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
> to:
>  bi[i][0] = vec_cmpeq (f[i][1], f[i][2]); i++;

For the test case change, I'd expect that it can work with:

-  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
+  f[i][0] = (vector float) vec_cmpeq (f[i][1], f[i][2]); i++;

> 
> I am thinking we should drop this patch from the series, i.e. don't remove 
> __builtin_vsx_xvcmpeqsp.  Thoughts?
> 

Since __builtin_vsx_xvcmpeqsp is an undocumented built-in, I don't
expect users to rely on it; even if someone does, IMHO vector bool is
a better fit.  Anyone who actually wants the non-bool vector type can
just add an explicit conversion.  So I'm inclined to remove
vsx_xvcmpeqsp; users should use the PVIPR built-ins wherever they
can.  But I'm also fine with holding off on this, as there are some
other related cmp* built-ins (cmpge, cmpgt, ...) that we can
re-visit and handle together later.
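
To illustrate the "same bits, different type" point without Altivec
headers, here is a minimal sketch using GCC's generic vector extensions
rather than vec_cmpeq itself.  The typedefs v4sf/v4si and the helper
names cmpeq_mask and cmpeq_as_float are made up for this example; the
assumption is only that a lane-wise float comparison yields an integer
vector of -1/0 lanes and that a cast between same-sized vector types
reinterprets the bits.

```c
#include <assert.h>

typedef float v4sf __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));

/* The cast below is only meaningful because both types have the same size.  */
static_assert (sizeof (v4sf) == sizeof (v4si), "vector sizes must match");

/* Lane-wise equality: each result lane is -1 (all bits set) where the
   inputs compare equal, and 0 otherwise.  */
static v4si
cmpeq_mask (v4sf a, v4sf b)
{
  return a == b;
}

/* The same bits viewed as float lanes.  The cast reinterprets the
   vector; it does not convert the lane values.  */
static v4sf
cmpeq_as_float (v4sf a, v4sf b)
{
  return (v4sf) cmpeq_mask (a, b);
}
```

A caller that wants the old float-typed result only needs the explicit
cast, which is the shape of the test case change suggested above.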

BR,
Kewen


[RFC/PATCH] Replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE with new hook

2024-05-24 Thread Kewen.Lin
Hi Joseph and Richi,

on 2024/5/13 21:18, Joseph Myers wrote:
> On Mon, 13 May 2024, Kewen.Lin wrote:
> 
>>> In fact replacing all of X_TYPE_SIZE with a single hook might be worthwhile
>>> though this removes the "convenient" defaulting, requiring each target to
>>> enumerate all standard C ABI type modes.  But that might be also a good 
>>> thing.
>>>
>>
>> I guess the main value by extending from floating point types to all is to
>> unify them?  (Assuming that excepting for floating types the others would
>> not have multiple possible representations like what we faces on 128bit fp).
> 
> For integer types, giving the number of bits makes sense as an interface - 
> there isn't an issue with different modes.
> 
> So I think it's appropriate for floating and integer types to have 
> separate hooks - with the one for floating types returning a mode, and the 
> one for integer types returning a number of bits.  (And also keep the 
> existing separate hook for _FloatN / _FloatNx modes.)
> 
> That may also make for more convenient defaults (whether a target has long 
> double wider than double is largely independent of what sizes it uses for 
> integer types).
> 

Following your suggestions and comments, I made this patch
for mode_for_floating_type first.  Since it touches some FE
and port-specific code, I think I have to split it into a
patch series.  Before doing that, I'd like to make sure it
meets your expectations, and also to ask for suggestions
on how to organize the sub-patches.  There seem to be two
ways to split it:
  1) split this into pieces according to FEs and ports, then
     squash all of them and commit one patch.
  2) extract all hook implementations as the 1st part (split
     by port);
     extract the hook enablement as the 2nd part (split into
     generic and FEs);
     the remainder removes the now-useless macros (split
     into generic and ports).

Option 1) is straightforward, while 2) is fine-grained and
easy to isolate, but I'm not sure it's worth doing.

btw, the attached patch is bootstrapped and regtested on
powerpc64-linux-gnu and powerpc64le-linux-gnu with all
languages on, and cross cc1 built fine for the affected ports.

BR,
Kewen

-
From 2935750160f4eaf72eb7fba5832c99d6bf552862 Mon Sep 17 00:00:00 2001
From: Kewen Lin 
Date: Fri, 24 May 2024 00:10:22 -0500
Subject: [PATCH] Replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE with new hook
 mode_for_floating_type

Currently, to determine which mode is used for a floating
point type, we call mode_for_size with the type precision
(size) to get the first mode of that size in the specified
class.  On Powerpc, we have three modes (TF/KF/IF) with the
same mode precision 128 (see [1]), so this forces us to
place TF first; it would require more adjustments in
generic code to avoid unexpected mode conversions, and it
would be even worse if we eventually get rid of TF.  As
Joseph pointed out in [2], "floating types should have
their mode, not a poorly defined precision value".  As
Joseph and Richi suggested, this patch introduces a hook,
mode_for_floating_type, which returns the corresponding
mode for type float, double or long double.  The default
implementation returns SFmode for float and DFmode for
double or long double, and ports which need special
treatment have their own port-specific implementation
(referring to {,LONG_}DOUBLE_TYPE_SIZE).  For all generic
uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE, depending on the
context, the patch replaces them with TYPE_PRECISION of the
corresponding type node, or GET_MODE_PRECISION on the mode
from mode_for_floating_type.  It also removes some useless
uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE in target-specific
code, but leaves those still in use (such as in definitions
of other macros) untouched.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651017.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
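
As a rough, self-contained sketch of the idea (not the hook as
implemented in the patch), the following models a default and a port
override.  All names here are hypothetical stand-ins: enum toy_fp_type
and enum toy_mode replace GCC's tree indexes and machine modes, and
the port's choice of a KF-like mode for long double is just an example.

```c
#include <assert.h>

/* Hypothetical stand-ins for GCC's type indexes and machine modes.  */
enum toy_fp_type { TOY_FLOAT, TOY_DOUBLE, TOY_LONG_DOUBLE };
enum toy_mode { TOY_SFmode, TOY_DFmode, TOY_TFmode, TOY_KFmode, TOY_IFmode };

/* Default behaviour as described in the commit message: SFmode for
   float, DFmode for double and long double.  */
static enum toy_mode
toy_default_mode_for_floating_type (enum toy_fp_type t)
{
  assert (t <= TOY_LONG_DOUBLE);
  return t == TOY_FLOAT ? TOY_SFmode : TOY_DFmode;
}

/* A port with several 128-bit modes names the one it wants directly,
   so the order in which same-precision modes are listed no longer
   decides the result.  The KF choice is only an example.  */
static enum toy_mode
toy_port_mode_for_floating_type (enum toy_fp_type t)
{
  if (t == TOY_LONG_DOUBLE)
    return TOY_KFmode;
  return toy_default_mode_for_floating_type (t);
}
```

Returning a mode instead of a bit count is what removes the ambiguity
between TF, KF and IF, which all report the same precision.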

gcc/ada/ChangeLog:

* gcc-interface/decl.cc (gnat_to_gnu_entity): Use TYPE_PRECISION of
long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_c_mode_for_floating_type):
New function.
(TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
* config/aarch64/aarch64.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/alpha/alpha.cc (alpha_c_mode_for_floating_type): New
function.
(TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
* config/alpha/alpha.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/arc/arc.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/avr/avr.cc (avr_c_mode_for_floating_type): New
function.
(TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
* 

[PATCH v2] MATCH: Look through VIEW_CONVERT when folding VEC_PERM_EXPRs.

2024-05-24 Thread Manolis Tsamis
The match.pd patterns to merge two vector permutes into one fail when a
potentially no-op view convert expression sits between the two permutes.
This change lifts this restriction.
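
For readers unfamiliar with the fold, the index arithmetic behind
merging the two permutes can be sketched as follows.  This is a
stand-alone illustration, not the match.pd code: compose_perms and
NELTS are made-up names, and it assumes the outer permute uses the
inner result for both of its inputs, as the pattern requires.

```c
#include <assert.h>

#define NELTS 4

/* SEL0 indexes the concatenation of the two original inputs, so its
   entries lie in 0 .. 2*NELTS-1.  SEL1 is applied to a permute whose
   two inputs are both the inner result, so its entries are taken
   modulo NELTS.  SEL2 receives the single equivalent selector.  */
static void
compose_perms (const unsigned sel0[NELTS], const unsigned sel1[NELTS],
               unsigned sel2[NELTS])
{
  for (int i = 0; i < NELTS; i++)
    {
      assert (sel0[i] < 2 * NELTS && sel1[i] < 2 * NELTS);
      sel2[i] = sel0[sel1[i] % NELTS];
    }
}
```

With sel0 = { 0, 5, 2, 7 } and sel1 = { 2, 3, 1, 0 }, as in the new
test, this gives { 2, 7, 5, 0 }, which is the selector the test scans
for in the fre1 dump.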

gcc/ChangeLog:

* match.pd: Allow no-op view_convert between permutes.

gcc/testsuite/ChangeLog:

* gcc.dg/fold-perm-2.c: New test.

Signed-off-by: Manolis Tsamis 
---

Changes in v2:
Use TYPE_SIZE (TREE_TYPE (TREE_TYPE (@))) instead of element_precision (@).

 gcc/match.pd   | 14 --
 gcc/testsuite/gcc.dg/fold-perm-2.c | 16 
 2 files changed, 24 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/fold-perm-2.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 07e743ae464..1f91b9857c8 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -10039,19 +10039,21 @@ and,
  d = VEC_PERM_EXPR ;  */
 
 (simplify
- (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
+ (vec_perm (view_convert?@0 (vec_perm@1 @2 @3 VECTOR_CST@4)) @0 VECTOR_CST@5)
  (if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
   (with
{
  machine_mode result_mode = TYPE_MODE (type);
- machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
+ machine_mode op_mode = TYPE_MODE (TREE_TYPE (@2));
  int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
  vec_perm_builder builder0;
  vec_perm_builder builder1;
  vec_perm_builder builder2 (nelts, nelts, 1);
}
-   (if (tree_to_vec_perm_builder (&builder0, @3)
-   && tree_to_vec_perm_builder (&builder1, @4))
+   (if (tree_to_vec_perm_builder (&builder0, @4)
+   && tree_to_vec_perm_builder (&builder1, @5)
+   && TYPE_SIZE (TREE_TYPE (TREE_TYPE (@0)))
+  == TYPE_SIZE (TREE_TYPE (TREE_TYPE (@1))))
 (with
  {
vec_perm_indices sel0 (builder0, 2, nelts);
@@ -10073,10 +10075,10 @@ and,
   ? (!can_vec_perm_const_p (result_mode, op_mode, sel0, false)
  || !can_vec_perm_const_p (result_mode, op_mode, sel1, false))
   : !can_vec_perm_const_p (result_mode, op_mode, sel1, false)))
-op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
+op0 = vec_perm_indices_to_tree (TREE_TYPE (@5), sel2);
  }
  (if (op0)
-  (vec_perm @1 @2 { op0; })))))))
+  (view_convert (vec_perm @2 @3 { op0; }))))))))
 
 /* Merge
  c = VEC_PERM_EXPR ;
diff --git a/gcc/testsuite/gcc.dg/fold-perm-2.c 
b/gcc/testsuite/gcc.dg/fold-perm-2.c
new file mode 100644
index 000..1a4ab4065de
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-perm-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-fre1" } */
+
+typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
+typedef unsigned int vecu __attribute__ ((vector_size (4 * sizeof (unsigned int))));
+
+void fun (veci *a, veci *b, veci *c)
+{
+  veci r1 = __builtin_shufflevector (*a, *b, 0, 5, 2, 7);
+  vecu r2 = __builtin_convertvector (r1, vecu);
+  vecu r3 = __builtin_shufflevector (r2, r2, 2, 3, 1, 0);
+  *c = __builtin_convertvector (r3, veci);
+}
+
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 7, 5, 0 }" "fre1" } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "fre1" } } */
-- 
2.44.0



[r15-812 Regression] FAIL: gcc.dg/vect/vect-gather-4.c scan-tree-dump-not vect "Loop contains only SLP stmts" on Linux/x86_64

2024-05-24 Thread haochen.jiang
On Linux/x86_64,

c71886f2ca2e46ce1449c7064d6f1b447d02fcba is the first bad commit
commit c71886f2ca2e46ce1449c7064d6f1b447d02fcba
Author: Richard Biener 
Date:   Fri Sep 29 13:13:16 2023 +0200

Avoid splitting store dataref groups during SLP discovery

caused

FAIL: gcc.dg/vect/vect-gather-4.c -flto -ffat-lto-objects  scan-tree-dump-not 
vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-4.c scan-tree-dump-not vect "Loop contains only 
SLP stmts"

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-812/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-gather-4.c --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-gather-4.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


Re: [Patch, aarch64, middle-end] v3: Move pair_fusion pass from aarch64 to middle-end

2024-05-24 Thread Ajit Agarwal
Hello Alex:

On 22/05/24 3:30 pm, Alex Coplan wrote:
> Hi Ajit,
> 
> You need to remove the header dependencies that are no longer required
> for aarch64-ldp-fusion.o in t-aarch64 (not forgetting to update the
> ChangeLog).  A few other minor nits below.
> 
> LGTM with those changes, but you'll need Richard S to approve.
> 
> Thanks a lot for doing this.
> 
> On 22/05/2024 00:16, Ajit Agarwal wrote:
>> Hello Alex/Richard:
>>
>> All comments are addressed.
>>
>> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end
>> to support multiple targets.
>>
>> Common infrastructure of load store pair fusion is divided into
>> target independent and target dependent code.
>>
>> Target independent code is structured in the following files.
>> gcc/pair-fusion.h
>> gcc/pair-fusion.cc
>>
>> Target independent code is the generic code, with pure virtual
>> functions to interface between target independent and dependent
>> code.
>>
>> Bootstrapped and regtested on aarch64-linux-gnu.
>>
>> Thanks & Regards
>> Ajit
>>
>>
>>
>> aarch64, middle-end: Move pair_fusion pass from aarch64 to middle-end
>>
>> Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end
>> to support multiple targets.
>>
>> Common infrastructure of load store pair fusion is divided into
>> target independent and target dependent code.
>>
>> Target independent code is structured in the following files.
>> gcc/pair-fusion.h
>> gcc/pair-fusion.cc
>>
>> Target independent code is the generic code, with pure virtual
>> functions to interface between target independent and dependent
>> code.
>>
>> 2024-05-22  Ajit Kumar Agarwal  
>>
>> gcc/ChangeLog:
>>
>>  * pair-fusion.h: Generic header code for load store pair fusion
>>  that can be shared across different architectures.
>>  * pair-fusion.cc: Generic source code implementation for
>>  load store pair fusion that can be shared across different 
>> architectures.
>>  * Makefile.in: Add new object file pair-fusion.o.
>>  * config/aarch64/aarch64-ldp-fusion.cc: Delete generic code and move it
>>  to pair-fusion.cc in the middle-end.
>>  * config/aarch64/t-aarch64: Add header file dependency on pair-fusion.h.
>> ---
>>  gcc/Makefile.in  |1 +
>>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 3298 +-
>>  gcc/config/aarch64/t-aarch64 |2 +-
>>  gcc/pair-fusion.cc   | 3013 
>>  gcc/pair-fusion.h|  193 ++
>>  5 files changed, 3286 insertions(+), 3221 deletions(-)
>>  create mode 100644 gcc/pair-fusion.cc
>>  create mode 100644 gcc/pair-fusion.h
>>
>> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
>> index a7f15694c34..643342f623d 100644
>> --- a/gcc/Makefile.in
>> +++ b/gcc/Makefile.in
>> @@ -1563,6 +1563,7 @@ OBJS = \
>>  ipa-strub.o \
>>  ipa.o \
>>  ira.o \
>> +pair-fusion.o \
>>  ira-build.o \
>>  ira-costs.o \
>>  ira-conflicts.o \
>> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
>> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> index 085366cdf68..0af927231d3 100644
>> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
>> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> 
>> diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
>> index 78713558e7d..bdada08be70 100644
>> --- a/gcc/config/aarch64/t-aarch64
>> +++ b/gcc/config/aarch64/t-aarch64
>> @@ -203,7 +203,7 @@ aarch64-early-ra.o: 
>> $(srcdir)/config/aarch64/aarch64-early-ra.cc \
>>  aarch64-ldp-fusion.o: $(srcdir)/config/aarch64/aarch64-ldp-fusion.cc \
>>  $(CONFIG_H) $(SYSTEM_H) $(CORETYPES_H) $(BACKEND_H) $(RTL_H) $(DF_H) \
>>  $(RTL_SSA_H) cfgcleanup.h tree-pass.h ordered-hash-map.h tree-dfa.h \
>> -fold-const.h tree-hash-traits.h print-tree.h
>> +fold-const.h tree-hash-traits.h print-tree.h pair-fusion.h
> 
> So now you also need to remove the deps on the includes removed in the latest
> version of the patch.
>

Addressed in v4 of the patch.
 
>>  $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
>>  $(srcdir)/config/aarch64/aarch64-ldp-fusion.cc
>>  
>> diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
>> new file mode 100644
>> index 000..827b88cf2fc
>> --- /dev/null
>> +++ b/gcc/pair-fusion.cc
>> @@ -0,0 +1,3013 @@
>> +// Pass to fuse adjacent loads/stores into paired memory accesses.
>> +// Copyright (C) 2024 Free Software Foundation, Inc.
>> +//
>> +// This file is part of GCC.
>> +//
>> +// GCC is free software; you can redistribute it and/or modify it
>> +// under the terms of the GNU General Public License as published by
>> +// the Free Software Foundation; either version 3, or (at your option)
>> +// any later version.
>> +//
>> +// GCC is distributed in the hope that it will be useful, but
>> +// WITHOUT ANY WARRANTY; without even the implied warranty of
>> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +// General Public License for more 

Re: [PATCH] vect: Support multiple lane-reducing operations for loop reduction [PR114440]

2024-05-24 Thread Feng Xue OS
Hi,

The patch has been updated to the newest trunk and also contains some minor
changes.

I am working on another new feature that is meant to support pattern
recognition of lane-reducing operations in an affine closure originating from
a loop reduction variable, like:

  sum += cst1 * dot_prod_1 + cst2 * sad_2 + ... + cstN * lane_reducing_op_N

The work-in-progress feature depends on this patch. It has been quite a while
since it was posted; would you please take the time to review it? Thanks.

Feng


gcc/
PR tree-optimization/114440
* tree-vectorizer.h (struct _stmt_vec_info): Add a new field
reduc_result_pos.
(lane_reducing_op_p): New function.
(vectorizable_lane_reducing): New function declaration.
* tree-vect-stmts.cc (vectorizable_condition): Treat the condition
statement that is pointed by stmt_vec_info of reduction PHI as the
real "for_reduction" statement.
(vect_analyze_stmt): Call new function vectorizable_lane_reducing
to analyze lane-reducing operation.
* tree-vect-loop.cc (vect_is_emulated_mixed_dot_prod): Remove parameter
loop_vinfo. Get input vectype from stmt_info instead of reduction PHI.
(vect_model_reduction_cost): Remove cost computation code related to
emulated_mixed_dot_prod.
(vect_reduction_use_partial_vector): New function.
(vectorizable_lane_reducing): New function.
(vectorizable_reduction): Allow multiple lane-reducing operations in
loop reduction. Move some original lane-reducing related code to
vectorizable_lane_reducing, and move partial vectorization checking
code to vect_reduction_use_partial_vector.
(vect_transform_reduction): Extend transformation to support reduction
statements with mixed input vectypes.
* tree-vect-slp.cc (vect_analyze_slp): Use new function
lane_reducing_op_p to check statement code.

gcc/testsuite/
PR tree-optimization/114440
* gcc.dg/vect/vect-reduc-chain-1.c
* gcc.dg/vect/vect-reduc-chain-2.c
* gcc.dg/vect/vect-reduc-chain-3.c
* gcc.dg/vect/vect-reduc-dot-slp-1.c
* gcc.dg/vect/vect-reduc-dot-slp-2.c
---
 .../gcc.dg/vect/vect-reduc-chain-1.c  |  62 ++
 .../gcc.dg/vect/vect-reduc-chain-2.c  |  77 ++
 .../gcc.dg/vect/vect-reduc-chain-3.c  |  66 ++
 .../gcc.dg/vect/vect-reduc-dot-slp-1.c|  97 +++
 .../gcc.dg/vect/vect-reduc-dot-slp-2.c|  81 +++
 gcc/tree-vect-loop.cc | 680 --
 gcc/tree-vect-slp.cc  |   4 +-
 gcc/tree-vect-stmts.cc|  13 +-
 gcc/tree-vectorizer.h |  14 +
 9 files changed, 873 insertions(+), 221 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-chain-3.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-slp-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-slp-2.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
new file mode 100644
index 000..04bfc419dbd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-chain-1.c
@@ -0,0 +1,62 @@
+/* Disabling epilogues until we find a better way to deal with scans.  */
+/* { dg-additional-options "--param vect-epilogues-nomask=0" } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw { target { 
aarch64*-*-* || arm*-*-* } } } */
+/* { dg-add-options arm_v8_2a_dotprod_neon }  */
+
+#include "tree-vect.h"
+
+#define N 50
+
+#ifndef SIGNEDNESS_1
+#define SIGNEDNESS_1 signed
+#define SIGNEDNESS_2 signed
+#endif
+
+SIGNEDNESS_1 int __attribute__ ((noipa))
+f (SIGNEDNESS_1 int res,
+   SIGNEDNESS_2 char *restrict a,
+   SIGNEDNESS_2 char *restrict b,
+   SIGNEDNESS_2 char *restrict c,
+   SIGNEDNESS_2 char *restrict d,
+   SIGNEDNESS_1 int *restrict e)
+{
+  for (int i = 0; i < N; ++i)
+{
+  res += a[i] * b[i];
+  res += c[i] * d[i];
+  res += e[i];
+}
+  return res;
+}
+
+#define BASE ((SIGNEDNESS_2 int) -1 < 0 ? -126 : 4)
+#define OFFSET 20
+
+int
+main (void)
+{
+  check_vect ();
+
+  SIGNEDNESS_2 char a[N], b[N];
+  SIGNEDNESS_2 char c[N], d[N];
+  SIGNEDNESS_1 int e[N];
+  int expected = 0x12345;
+  for (int i = 0; i < N; ++i)
+{
+  a[i] = BASE + i * 5;
+  b[i] = BASE + OFFSET + i * 4;
+  c[i] = BASE + i * 2;
+  d[i] = BASE + OFFSET + i * 3;
+  e[i] = i;
+  asm volatile ("" ::: "memory");
+  expected += a[i] * b[i];
+  expected += c[i] * d[i];
+  expected += e[i];
+}
+  if (f (0x12345, a, b, c, d, e) != expected)
+__builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump "vect_recog_dot_prod_pattern: detected" "vect" 
} } */
+/* { dg-final { 

[PATCH] Hard register asm constraint

2024-05-24 Thread Stefan Schulze Frielinghaus
This implements hard register constraints for inline asm.  A hard register
constraint is of the form {regname} where regname is any valid register.  This
basically renders register asm superfluous.  For example, the snippet

int test (int x, int y)
{
  register int r4 asm ("r4") = x;
  register int r5 asm ("r5") = y;
  unsigned int copy = y;
  asm ("foo %0,%1,%2" : "+d" (r4) : "d" (r5), "d" (copy));
  return r4;
}

could be rewritten into

int test (int x, int y)
{
  asm ("foo %0,%1,%2" : "+{r4}" (x) : "{r5}" (y), "d" (y));
  return x;
}

As a side-effect this also solves the problem of call-clobbered registers.
That being said, I was wondering whether we could utilize this feature in order
to get rid of local register asm automatically?  For example, converting

// Result will be in r2 on s390
extern int bar (void);

void test (void)
{
  register int x asm ("r2") = 42;
  bar ();
  asm ("foo %0\n" :: "r" (x));
}

into

void test (void)
{
  int x = 42;
  bar ();
  asm ("foo %0\n" :: "{r2}" (x));
}

in order to get rid of the limitation of call-clobbered registers which may
lead to subtle bugs---especially if you think of non-obvious calls e.g.
introduced by sanitizer/tracer/whatever.  Since such a transformation has the
potential to break existing code do you see any edge cases where this might be
problematic or even show stoppers?  Currently, even

int test (void)
{
  register int x asm ("r2") = 42;
  register int y asm ("r2") = 24;
  asm ("foo %0,%1\n" :: "r" (x), "r" (y));
}

is allowed, which seems error-prone to me.  Thus, if 100% backwards
compatibility were required, then automatically converting every register
asm to the new mechanism would not be viable.  Still, quite a lot could be
transformed.
Any thoughts?

Currently I allow multiple alternatives as demonstrated by
gcc/testsuite/gcc.target/s390/asm-hard-reg-2.c.  However, since a hard register
constraint is pretty specific I could also think of erroring out in case of
alternatives.  Are there any real use cases out there for multiple
alternatives where one would like to use hard register constraints?
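
For context on the question above: alternatives in a constraint string are
separated by commas, and every operand of an asm has to list the same number
of them.  A minimal sketch of that consistency check follows, modelled on the
n_occurrences helper visible in the cfgexpand.cc hunk of this patch.  The name
same_number_of_alternatives and the plain C array interface are hypothetical,
and the sketch deliberately ignores how {regname} constraints are parsed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return the number of times character C occurs in string S.  */
static int
n_occurrences (int c, const char *s)
{
  int n = 0;
  while (*s)
    n += (*s++ == c);
  return n;
}

/* Hypothetical helper: return true if every constraint string in
   CONSTRAINTS contains the same number of comma-separated alternatives.  */
static bool
same_number_of_alternatives (const char *const *constraints, size_t len)
{
  if (len == 0)
    return true;

  int nalternatives = n_occurrences (',', constraints[0]);
  for (size_t i = 1; i < len; ++i)
    if (n_occurrences (',', constraints[i]) != nalternatives)
      return false;
  return true;
}
```

With this sketch, the operand set { "=r,m", "r,r" } is accepted, while
{ "=r,m", "r" } is rejected because the second operand has only one
alternative.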

With the current implementation we have a "user visible change" in the sense
that for

void test (void)
{
  register int x asm ("r2") = 42;
  register int y asm ("r2") = 24;
  asm ("foo %0,%1\n" : "=r" (x), "=r" (y));
}

we do not get the error

  "invalid hard register usage between output operands"

anymore but rather

  "multiple outputs to hard register: %r2"

This is due to the error handling in gimplify_asm_expr ().  Speaking of errors,
I also error out earlier than before, which means that e.g. in pr87600-2.c only
the first error is reported and processing stops afterwards, so the subsequent
tests fail.

I've been skimming through all targets and it looks to me as if none is using
curly brackets for their constraints.  Of course, I may have missed something.

Cheers,
Stefan

PS: Current state for Clang: https://reviews.llvm.org/D105142

---
 gcc/cfgexpand.cc  |  42 ---
 gcc/genpreds.cc   |   4 +-
 gcc/gimplify.cc   | 115 +-
 gcc/lra-constraints.cc|  17 +++
 gcc/recog.cc  |  14 ++-
 gcc/stmt.cc   | 102 +++-
 gcc/stmt.h|  10 +-
 .../gcc.target/s390/asm-hard-reg-1.c  | 103 
 .../gcc.target/s390/asm-hard-reg-2.c  |  29 +
 .../gcc.target/s390/asm-hard-reg-3.c  |  24 
 gcc/testsuite/lib/scanasm.exp |   4 +
 11 files changed, 407 insertions(+), 57 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-2.c
 create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-3.c

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 557cb28733b..47f71a2e803 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -2955,44 +2955,6 @@ expand_asm_loc (tree string, int vol, location_t locus)
   emit_insn (body);
 }
 
-/* Return the number of times character C occurs in string S.  */
-static int
-n_occurrences (int c, const char *s)
-{
-  int n = 0;
-  while (*s)
-n += (*s++ == c);
-  return n;
-}
-
-/* A subroutine of expand_asm_operands.  Check that all operands have
-   the same number of alternatives.  Return true if so.  */
-
-static bool
-check_operand_nalternatives (const vec<const char *> &constraints)
-{
-  unsigned len = constraints.length();
-  if (len > 0)
-{
-  int nalternatives = n_occurrences (',', constraints[0]);
-
-  if (nalternatives + 1 > MAX_RECOG_ALTERNATIVES)
-   {
- error ("too many alternatives in %<asm%>");
- return false;
-   }
-
-  for (unsigned i = 1; i < len; ++i)
-   if (n_occurrences (',', constraints[i]) != nalternatives)
- {
-   error ("operand constraints for %<asm%> differ "
-  "in number 

Re: [PATCH] MATCH: Look through VIEW_CONVERT when folding VEC_PERM_EXPRs.

2024-05-24 Thread Manolis Tsamis
The match.pd patterns to merge two vector permutes into one fail when a
potentially no-op view convert expression is between the two permutes.
This change lifts this restriction.

gcc/ChangeLog:

* match.pd: Allow no-op view_convert between permutes.

gcc/testsuite/ChangeLog:

* gcc.dg/fold-perm-2.c: New test.
---

gcc/match.pd | 14 --
gcc/testsuite/gcc.dg/fold-perm-2.c | 16 
2 files changed, 24 insertions(+), 6 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/fold-perm-2.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 07e743ae464..1f91b9857c8 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -10039,19 +10039,21 @@ and,
d = VEC_PERM_EXPR ; */
(simplify
- (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
+ (vec_perm (view_convert?@0 (vec_perm@1 @2 @3 VECTOR_CST@4)) @0 VECTOR_CST@5)
(if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
(with
{
machine_mode result_mode = TYPE_MODE (type);
- machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
+ machine_mode op_mode = TYPE_MODE (TREE_TYPE (@2));
int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
vec_perm_builder builder0;
vec_perm_builder builder1;
vec_perm_builder builder2 (nelts, nelts, 1);
}
- (if (tree_to_vec_perm_builder (&builder0, @3)
- && tree_to_vec_perm_builder (&builder1, @4))
+ (if (tree_to_vec_perm_builder (&builder0, @4)
+ && tree_to_vec_perm_builder (&builder1, @5)
+ && TYPE_SIZE (TREE_TYPE (TREE_TYPE (@0)))
+ == TYPE_SIZE (TREE_TYPE (TREE_TYPE (@1))))
(with
{
vec_perm_indices sel0 (builder0, 2, nelts);
@@ -10073,10 +10075,10 @@ and,
? (!can_vec_perm_const_p (result_mode, op_mode, sel0, false)
|| !can_vec_perm_const_p (result_mode, op_mode, sel1, false))
: !can_vec_perm_const_p (result_mode, op_mode, sel1, false)))
- op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
+ op0 = vec_perm_indices_to_tree (TREE_TYPE (@5), sel2);
}
(if (op0)
- (vec_perm @1 @2 { op0; })))
+ (view_convert (vec_perm @2 @3 { op0; }
/* Merge
c = VEC_PERM_EXPR ;
diff --git a/gcc/testsuite/gcc.dg/fold-perm-2.c
b/gcc/testsuite/gcc.dg/fold-perm-2.c
new file mode 100644
index 000..1a4ab4065de
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-perm-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-fre1" } */
+
+typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
+typedef unsigned int vecu __attribute__ ((vector_size (4 * sizeof (unsigned int))));
+
+void fun (veci *a, veci *b, veci *c)
+{
+ veci r1 = __builtin_shufflevector (*a, *b, 0, 5, 2, 7);
+ vecu r2 = __builtin_convertvector (r1, vecu);
+ vecu r3 = __builtin_shufflevector (r2, r2, 2, 3, 1, 0);
+ *c = __builtin_convertvector (r3, veci);
+}
+
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 7, 5, 0 }" "fre1" } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "fre1" } } */
--
2.44.0

On Fri, May 24, 2024 at 11:30 AM Richard Biener  wrote:
>
> On Fri, 24 May 2024, Manolis Tsamis wrote:
>
> > On Fri, May 24, 2024 at 10:46 AM Richard Biener  wrote:
> > >
> > > On Fri, 24 May 2024, Manolis Tsamis wrote:
> > >
> > > > On Fri, May 24, 2024 at 9:31 AM Richard Biener  
> > > > wrote:
> > > > >
> > > > > On Wed, 22 May 2024, Manolis Tsamis wrote:
> > > > >
> > > > > > The match.pd patterns to merge two vector permutes into one fail 
> > > > > > when a
> > > > > > potentially no-op view convert expressions is between the two 
> > > > > > permutes.
> > > > > > This change lifts this restriction.
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > >   * match.pd: Allow no-op view_convert between permutes.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > >   * gcc.dg/fold-perm-2.c: New test.
> > > > > >
> > > > > > Signed-off-by: Manolis Tsamis 
> > > > > > ---
> > > > > >
> > > > > >  gcc/match.pd   | 14 --
> > > > > >  gcc/testsuite/gcc.dg/fold-perm-2.c | 16 
> > > > > >  2 files changed, 24 insertions(+), 6 deletions(-)
> > > > > >  create mode 100644 gcc/testsuite/gcc.dg/fold-perm-2.c
> > > > > >
> > > > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > > > index 07e743ae464..cbb3c5d86e0 100644
> > > > > > --- a/gcc/match.pd
> > > > > > +++ b/gcc/match.pd
> > > > > > @@ -10039,19 +10039,21 @@ and,
> > > > > >   d = VEC_PERM_EXPR ;  */
> > > > > >
> > > > > >  (simplify
> > > > > > - (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
> > > > > > + (vec_perm (view_convert?@0 (vec_perm@1 @2 @3 VECTOR_CST@4)) @0 
> > > > > > VECTOR_CST@5)
> > > > > >   (if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
> > > > > >(with
> > > > > > {
> > > > > >   machine_mode result_mode = TYPE_MODE (type);
> > > > > > - machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
> > > > > > + machine_mode op_mode = TYPE_MODE (TREE_TYPE (@2));
> > > > > >   int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
> > > > > >   vec_perm_builder builder0;
> > > > > >   vec_perm_builder builder1;
> > > > > >   vec_perm_builder builder2 (nelts, nelts, 1);

[RFC/RFA] [PATCH 10/12] Verify detected CRC loop with symbolic execution and LFSR matching

2024-05-24 Thread Mariam Arutunian
Symbolically execute potential CRC loops and check whether the loop
actually calculates CRC (uses LFSR matching).
Calculated CRC and created LFSR are compared on each iteration of the
potential CRC loop.
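
To illustrate the idea of comparing the calculated CRC against an LFSR after
each iteration, here is a minimal sketch in plain C.  It is not the code from
this patch: it uses concrete 8-bit values instead of symbolic execution, and
the names crc_iteration, lfsr_step and crc_matches_lfsr are hypothetical.  It
only shows that one iteration of a bit-forward CRC loop is the same operation
as one step of a Galois LFSR whose taps are given by the polynomial.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WIDTH 8

/* One iteration of a bit-forward CRC loop on an 8-bit value.  POLY is the
   polynomial without the leading 1.  */
static uint8_t
crc_iteration (uint8_t crc, uint8_t poly)
{
  return (crc & 0x80) ? (uint8_t) ((crc << 1) ^ poly) : (uint8_t) (crc << 1);
}

/* One step of a Galois LFSR kept as an array of bits, bits[0] being the
   least significant one.  The bit shifted out is the feedback; it is
   XOR-ed into every position where POLY has a 1.  */
static void
lfsr_step (bool bits[WIDTH], uint8_t poly)
{
  bool feedback = bits[WIDTH - 1];
  for (int i = WIDTH - 1; i > 0; i--)
    bits[i] = bits[i - 1] ^ (feedback && ((poly >> i) & 1));
  bits[0] = feedback && (poly & 1);
}

/* Return true if the integer CRC and the bit-array LFSR agree after every
   one of the WIDTH iterations, starting from the same state INIT.  */
static bool
crc_matches_lfsr (uint8_t init, uint8_t poly)
{
  bool bits[WIDTH];
  for (int i = 0; i < WIDTH; i++)
    bits[i] = (init >> i) & 1;

  uint8_t crc = init;
  for (int iter = 0; iter < WIDTH; iter++)
    {
      crc = crc_iteration (crc, poly);
      lfsr_step (bits, poly);
      for (int i = 0; i < WIDTH; i++)
	if (bits[i] != (bool) ((crc >> i) & 1))
	  return false;
    }
  return true;
}
```

A loop that does something other than the shift-and-conditional-xor step would
diverge from the LFSR state at some iteration, which is the property a
per-iteration comparison relies on.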

  gcc/

* Makefile.in (OBJS): Add crc-verification.o.
* crc-verification.cc: New file.
* crc-verification.h: New file.
* gimple-crc-optimization.cc (loop_calculates_crc): New function.
(is_output_crc): Likewise.
(swap_crc_and_data_if_needed): Likewise.
(get_output_phi): Likewise.
(execute): Add check whether potential CRC loop calculates CRC.

  gcc/sym-exec/

* state.cc (create_reversed_lfsr): New function.
(create_forward_lfsr): Likewise.
(last_set_bit): Likewise.
(create_lfsr): Likewise.
* state.h (is_bit_vector): Reorder, make the function public and static.
(create_reversed_lfsr): New static function declaration.
(create_forward_lfsr): New static function declaration.

Signed-off-by: Mariam Arutunian 
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index aab909c3510..1996a60078c 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1716,6 +1716,7 @@ OBJS = \
 	tree-iterator.o \
 	tree-logical-location.o \
 	tree-loop-distribution.o \
+	crc-verification.o \
 	gimple-crc-optimization.o \
 	sym-exec/expression.o \
 	sym-exec/state.o \
diff --git a/gcc/crc-verification.cc b/gcc/crc-verification.cc
new file mode 100644
index 000..0922199d377
--- /dev/null
+++ b/gcc/crc-verification.cc
@@ -0,0 +1,1331 @@
+/* Execute symbolically all paths of the loop.
+   Calculate the value of the polynomial by executing loop with real values to
+   create LFSR state.
+   After each iteration check that final states of calculated CRC values match
+   determined LFSR.
+   Copyright (C) 2022-2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.   */
+
+#include "crc-verification.h"
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+#include "gimple.h"
+#include "ssa.h"
+#include "gimple-iterator.h"
+#include "tree-cfg.h"
+#include "cfganal.h"
+#include "tree-ssa-loop.h"
+
+/* Check whether defined variable is used outside the loop, only
+   CRC's definition is allowed to be used outside the loop.  */
+
+bool
+crc_symbolic_execution::is_used_outside_the_loop (tree def)
+{
+  imm_use_iterator imm_iter;
+  gimple *use_stmt;
+  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, def)
+{
+  if (!flow_bb_inside_loop_p (m_crc_loop, use_stmt->bb))
+	{
+	  if (is_a<gphi *> (use_stmt)
+	  && as_a<gphi *> (use_stmt) == m_output_crc)
+	return false;
+	  if (dump_file)
+	fprintf (dump_file, "Defined variable is used outside the loop.\n");
+	  return true;
+	}
+}
+  return false;
+}
+
+/* Calculate value of the rhs operation of GS assignment statement
+   and assign it to lhs variable.  */
+
+bool
+crc_symbolic_execution::execute_assign_statement (const gassign *gs)
+{
+  enum tree_code rhs_code = gimple_assign_rhs_code (gs);
+  tree lhs = gimple_assign_lhs (gs);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file, "lhs type : %s \n",
+	 get_tree_code_name (TREE_CODE (lhs)));
+
+  /* This will filter some normal cases too.  Ex.  usage of array.  */
+  if (TREE_CODE (lhs) != SSA_NAME)
+return false;
+
+  /* Check uses only when m_output_crc is known.  */
+  if (m_output_crc)
+if (is_used_outside_the_loop (lhs))
+  return false;
+
+  state *current_state = m_states.last ();
+
+  if (gimple_num_ops (gs) == 2)
+{
+  tree op1 = gimple_assign_rhs1 (gs);
+  switch (rhs_code)
+	{
+	  case BIT_NOT_EXPR:
+	return current_state->do_complement (op1, lhs);
+	  case NOP_EXPR:
+	  case SSA_NAME:
+	  case VAR_DECL:
+	  case INTEGER_CST:
+	return current_state->do_assign (op1, lhs);
+	  default:
+	{
+	  if (dump_file)
+		fprintf (dump_file,
+			 "Warning, encountered unsupported unary operation "
+			 "with %s code while executing assign statement!\n",
+			 get_tree_code_name (rhs_code));
+	  return false;
+	}
+	}
+}
+  else if (gimple_num_ops (gs) == 3)
+{
+  tree op1 = gimple_assign_rhs1 (gs);
+  tree op2 = gimple_assign_rhs2 (gs);
+  switch (rhs_code)
+	{
+	  case LSHIFT_EXPR:
+	return current_state->do_shift_left (op1, op2, lhs);
+	  case RSHIFT_EXPR:
+	return current_state->do_shift_right (op1, op2, lhs);
+	  case 

[RFC/RFA] [PATCH 11/12] Replace the original CRC loops with a faster CRC calculation.

2024-05-24 Thread Mariam Arutunian
 Specifically, use the following alternatives: If the target
 supports the crc32 instruction, use it directly. If the target supports the
 carry-less multiplication instruction, use it to calculate the CRC. If the
 target does not support either of the above, use a table-based CRC
 calculation.

During initial checks, the loop's output CRC (i.e., the variable where the
calculated CRC is stored after the loop execution) is determined.
The entire loop is removed and replaced with an internal function call
(CRC, CRC_REV), and the result is assigned to the output CRC variable.
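
As an illustration of the table-based fallback mentioned above, the following
sketch shows a bit-by-bit CRC-8 and an equivalent table-driven update.  It is
only a sketch under stated assumptions, not the code this patch generates: the
polynomial 0x07, the 8-bit width and the names crc8_bitwise, crc8_init_table
and crc8_lookup are chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Example polynomial x^8 + x^2 + x + 1, written without the leading 1.  */
#define POLY 0x07u

/* Bit-by-bit reference: the kind of loop the pass is meant to replace.  */
static uint8_t
crc8_bitwise (uint8_t crc, uint8_t data)
{
  crc ^= data;
  for (int i = 0; i < 8; i++)
    crc = (crc & 0x80) ? (uint8_t) ((crc << 1) ^ POLY) : (uint8_t) (crc << 1);
  return crc;
}

static uint8_t crc8_table[256];

/* Fill the table with the CRC of every possible byte value.  */
static void
crc8_init_table (void)
{
  for (int i = 0; i < 256; i++)
    crc8_table[i] = crc8_bitwise (0, (uint8_t) i);
}

/* Table-based update: one lookup per byte instead of eight iterations.
   crc8_init_table must have been called first.  */
static uint8_t
crc8_lookup (uint8_t crc, uint8_t data)
{
  return crc8_table[crc ^ data];
}
```

Because the loop starts by xor-ing the data into the CRC, the pair
(crc ^ data, 0) produces the same result as (crc, data), which is why a single
256-entry table indexed by crc ^ data is enough for the 8-bit case.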

  gcc/

* gimple-crc-optimization.cc (get_data): New function.
(faster_crc_code_generation): Likewise.
(build_polynomial_without_1): Likewise.
(execute): Add faster_crc_code_generation function call.
* tree-loop-distribution.cc (destroy_loop): Remove, move function to
tree-ssa-loop-manip.cc.
* tree-ssa-loop-manip.cc (destroy_loop): Add, move function from
tree-loop-distribution.cc.
* tree-ssa-loop-manip.h (destroy_loop): Add extern function declaration.

Signed-off-by: Mariam Arutunian 
diff --git a/gcc/gimple-crc-optimization.cc b/gcc/gimple-crc-optimization.cc
index 039506c1059..c23bbf9f44c 100644
--- a/gcc/gimple-crc-optimization.cc
+++ b/gcc/gimple-crc-optimization.cc
@@ -209,6 +209,24 @@ class crc_optimization {
   /* Returns phi statement which may hold the calculated CRC.  */
   gphi *get_output_phi ();
 
+  /* Returns data argument to pass to the CRC IFN.
+ If there is data from the code - use it (this is the case,
+ when data isn't xor-ed with CRC before the loop).
+ Otherwise, generate a new variable for the data with 0 value
+ (the case, when data is xor-ed with CRC before the loop).
+ For the CRC calculation, it doesn't matter whether CRC is calculated for the
+ (CRC^data, 0) or (CRC, data).  */
+  tree get_data ();
+
+  /* Replaces CRC calculation loop with CRC_IFN call.
+ Returns true if the replacement succeeded, otherwise false.  */
+  bool faster_crc_code_generation (function *fun, value *polynomial,
+   gphi *output_crc);
+
+  /* Build tree for the POLYNOMIAL (from its binary representation)
+   without the leading 1.  */
+  tree build_polynomial_without_1 (tree crc_arg, value *polynomial);
+
  public:
   unsigned int execute (function *fun);
 };
@@ -1065,6 +1083,178 @@ crc_optimization::get_output_phi ()
   return nullptr;
 }
 
+/* Build tree for the POLYNOMIAL (from its binary representation)
+   without the leading 1.  */
+
+tree
+crc_optimization::build_polynomial_without_1 (tree crc_arg, value *polynomial)
+{
+  unsigned HOST_WIDE_INT cst_polynomial = 0;
+  for (size_t i = 0; i < (*polynomial).length (); i++)
+{
+  value_bit *const_bit;
+  if (m_is_bit_forward)
+	const_bit = (*polynomial)[(*polynomial).length () - 1 - i];
+  else
+	const_bit = (*polynomial)[i];
+  cst_polynomial <<= 1;
+  cst_polynomial ^= (as_a (const_bit))->get_val () ? 1 : 0;
+}
+  return build_int_cstu (TREE_TYPE (crc_arg), cst_polynomial);
+}
+
+/* Returns data argument to pass to the CRC IFN.
+   If there is data from the code - use it (this is the case,
+   when data isn't xor-ed with CRC before the loop).
+   Otherwise, generate a new variable for the data with 0 value
+   (the case, when data is xor-ed with CRC before the loop).
+   For the CRC calculation, it doesn't matter whether CRC is calculated for the
+   (CRC^data, 0) or (CRC, data).  */
+
+tree
+crc_optimization::get_data ()
+{
+  unsigned HOST_WIDE_INT
+  data_size = tree_to_uhwi (m_crc_loop->nb_iterations) + 1;
+
+  /* If we have the data, use it.  */
+  if (m_phi_for_data)
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "Data and CRC are xor-ed in the for loop.  Initializing data "
+		 "with its value.\n");
+  tree data_arg = PHI_ARG_DEF (m_phi_for_data, 1);
+  if (TYPE_PRECISION (TREE_TYPE (data_arg)) == data_size)
+	return data_arg;
+  else
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "Loop iteration number and data's size differ.\n");
+	  return nullptr;
+	}
+}
+
+  /* Create a new variable for the data.  */
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file,
+	 "Data and CRC are xor-ed before for loop.  Initializing data "
+	 "with 0.\n");
+  tree type = nullptr;
+  /* Determine the data's size with the loop iteration count.
+ We assume that loop iteration count depends on the data's size.  */
+  if (data_size == TYPE_PRECISION (intQI_type_node))
+type = intQI_type_node;
+  else if (data_size == TYPE_PRECISION (intHI_type_node))
+type = intHI_type_node;
+  else if (data_size == TYPE_PRECISION (intSI_type_node))
+type = intSI_type_node;
+  else if (data_size == TYPE_PRECISION (intDI_type_node))
+type = intDI_type_node;
+  else if (data_size == TYPE_PRECISION (intTI_type_node))
+type = intTI_type_node;
+  else
+{
+  if (dump_file && (dump_flags & 

[RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection

2024-05-24 Thread Mariam Arutunian
This patch adds a new compiler pass aimed at identifying naive CRC
implementations, characterized by the presence of a loop calculating a CRC
(polynomial long division).
Upon detection of a potential CRC, the pass prints an informational message.

The CRC optimization is performed at optimization level 2 and higher,
except when optimizing for size or when -fno-gimple-crc-optimization is given.

This pass is added for the detection and optimization of naive CRC
implementations, improving the efficiency of CRC-related computations.

This patch includes only the initial fast checks for filtering out non-CRCs;
the verification of detected candidate CRCs and the optimization parts will be
provided in subsequent patches.
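
For reference, a naive CRC implementation of the kind described above could
look like the following sketch.  The function name crc16_naive, the 16-bit CRC
with 8-bit data, and the CCITT polynomial 0x1021 are assumptions made for the
example; the pass itself is not tied to this particular shape.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive bit-forward CRC-16 update for one byte of data.
   0x1021 is the CCITT polynomial without the leading 1 (example choice).  */
static uint16_t
crc16_naive (uint16_t crc, uint8_t data)
{
  /* Xor the data into the top byte of the CRC.  */
  crc ^= (uint16_t) ((uint16_t) data << 8);

  /* Polynomial long division, one bit per iteration.  */
  for (int i = 0; i < 8; i++)
    {
      if (crc & 0x8000)
	crc = (uint16_t) ((crc << 1) ^ 0x1021);
      else
	crc = (uint16_t) (crc << 1);
    }
  return crc;
}
```

Feeding the nine ASCII bytes of "123456789" through this function, starting
from a CRC of 0, yields 0x31C3, the usual check value for this polynomial with
a zero initial value.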

  gcc/

* Makefile.in (OBJS): Add gimple-crc-optimization.o.
* common.opt (fgimple-crc-optimization): New option.
* doc/invoke.texi (-fgimple-crc-optimization): Add documentation.
* gimple-crc-optimization.cc: New file.
* gimple.cc (set_phi_stmts_not_visited): New function.
(set_gimple_stmts_not_visited): Likewise.
(set_bbs_stmts_not_visited): Likewise.
* gimple.h (set_gimple_stmts_not_visited): New extern function
declaration.
(set_phi_stmts_not_visited): New extern function declaration.
(set_bbs_stmts_not_visited): New extern function declaration.
* opts.cc (default_options_table): Add OPT_fgimple_crc_optimization.
(enable_fdo_optimizations): Enable gimple-crc-optimization.
* passes.def (pass_crc_optimization): Add new pass.
* timevar.def (TV_GIMPLE_CRC_OPTIMIZATION): New timevar.
* tree-pass.h (make_pass_crc_optimization): New extern function
declaration.

Signed-off-by: Mariam Arutunian 
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a7f15694c34..e9e2ecc3a17 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1716,6 +1716,7 @@ OBJS = \
 	tree-iterator.o \
 	tree-logical-location.o \
 	tree-loop-distribution.o \
+	gimple-crc-optimization.o \
 	tree-nested.o \
 	tree-nrv.o \
 	tree-object-size.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 2c078fdd1f8..53f7ab255dd 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1757,6 +1757,12 @@ Common Var(flag_gcse_after_reload) Optimization
 Perform global common subexpression elimination after register allocation has
 finished.
 
+fgimple-crc-optimization
+Common Var(flag_gimple_crc_optimization) Optimization
+Detect loops calculating CRC and replace with faster implementation.
+If the target supports carry-less-multiplication instruction, generate CRC using
+it; otherwise generate table-based CRC.
+
 Enum
 Name(dwarf_gnat_encodings) Type(int)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0625a5ede6f..fcf6e4e4e36 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -565,8 +565,8 @@ Objective-C and Objective-C++ Dialects}.
 -ffast-math  -ffinite-math-only  -ffloat-store  -fexcess-precision=@var{style}
 -ffinite-loops
 -fforward-propagate  -ffp-contract=@var{style}  -ffunction-sections
--fgcse  -fgcse-after-reload  -fgcse-las  -fgcse-lm  -fgraphite-identity
--fgcse-sm  -fhoist-adjacent-loads  -fif-conversion
+-fgcse  -fgcse-after-reload  -fgcse-las  -fgcse-lm  -fgimple-crc-optimization
+-fgraphite-identity -fgcse-sm  -fhoist-adjacent-loads  -fif-conversion
 -fif-conversion2  -findirect-inlining
 -finline-stringops[=@var{fn}]
 -finline-functions  -finline-functions-called-once  -finline-limit=@var{n}
@@ -13696,6 +13696,19 @@ This flag is disabled by default.
 Note that @option{-flive-patching} is not supported with link-time optimization
 (@option{-flto}).
 
+@opindex fgimple-crc-optimization
+@item -fgimple-crc-optimization
+Detect loops calculating CRC (performing polynomial long division) and
+replace them with a faster implementation.  Detect 8, 16, 32, and 64 bit CRC,
+with a constant polynomial without the leading 1 bit,
+for both bit-forward and bit-reversed cases.
+If the target supports a CRC instruction and the polynomial used in the source
+code matches the polynomial used in the CRC instruction, generate that CRC
+instruction.  Otherwise, if the target supports a carry-less-multiplication
+instruction, generate CRC using it; otherwise generate table-based CRC.
+This flag is enabled by default at @option{-O2} and higher
+if @option{-Os} is not also specified.
+
 @opindex fisolate-erroneous-paths-dereference
 @item -fisolate-erroneous-paths-dereference
 Detect paths that trigger erroneous or undefined behavior due to
diff --git a/gcc/gimple-crc-optimization.cc b/gcc/gimple-crc-optimization.cc
new file mode 100644
index 000..4a17fa75930
--- /dev/null
+++ b/gcc/gimple-crc-optimization.cc
@@ -0,0 +1,965 @@
+/* CRC optimization.
+   Copyright (C) 2022-2024 Free Software Foundation, Inc.
+   Contributed by Mariam Arutunian 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+

[RFC/RFA] [PATCH 05/12] i386: Implement new expander for efficient CRC computation

2024-05-24 Thread Mariam Arutunian
This patch introduces two new expanders for the i386 backend,
dedicated to generating optimized code for CRC computations.
The new expanders are designed to leverage specific hardware capabilities
to achieve faster CRC calculations,
particularly using the pclmulqdq or crc32 instructions when supported by
the target architecture.

Expander 1: Bit-Forward CRC (crc4)
For targets that support the pclmulqdq instruction (TARGET_PCLMUL) and are
64-bit (TARGET_64BIT), the expander generates code that uses the pclmulqdq
instruction for CRC computation.

Expander 2: Bit-Reversed CRC (crc_rev4)
The expander first checks whether the target supports the CRC32 instruction set
(TARGET_CRC32) and whether the polynomial in use is 0x1EDC6F41 (iSCSI). If both
conditions are met, it emits the corresponding crc32 instruction (crc32b,
crc32w, or crc32l, depending on the data size).
If the target does not support crc32 but supports pclmulqdq, it uses the
pclmulqdq instruction for bit-reversed CRC computation.

Otherwise table-based CRC is generated.
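
To make the polynomial condition concrete, here is a sketch of a bit-reversed
CRC loop using the iSCSI polynomial.  In the bit-reversed form the polynomial
constant appears as 0x82F63B78, which is 0x1EDC6F41 with its 32 bits reversed.
The function names are hypothetical, and the 0xFFFFFFFF initial value and
final inversion in crc32c are the conventional CRC-32C parameters, not
something taken from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the order of the 32 bits in X.  */
static uint32_t
reverse_bits32 (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i++)
    {
      r = (r << 1) | (x & 1);
      x >>= 1;
    }
  return r;
}

/* Naive bit-reversed CRC update for one byte, using the reversed form of
   the iSCSI polynomial 0x1EDC6F41.  */
static uint32_t
crc32c_rev_byte (uint32_t crc, uint8_t data)
{
  crc ^= data;
  for (int i = 0; i < 8; i++)
    crc = (crc & 1) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
  return crc;
}

/* Conventional CRC-32C over a buffer: initial value 0xFFFFFFFF and a final
   bitwise inversion.  */
static uint32_t
crc32c (const unsigned char *buf, size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++)
    crc = crc32c_rev_byte (crc, buf[i]);
  return ~crc;
}
```

A loop of the shape of crc32c_rev_byte, with exactly this polynomial, is the
case in which a hardware crc32 instruction can stand in for the eight
iterations; with any other polynomial the pclmulqdq or table-based paths
described above apply.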

  gcc/config/i386/

* i386-protos.h (ix86_expand_crc_using_clmul): New extern function
declaration.
(ix86_expand_reversed_crc_using_clmul):  Likewise.
* i386.cc (ix86_expand_crc_using_clmul): New function.
(ix86_expand_reversed_crc_using_clmul):  Likewise.
* i386.md (UNSPEC_CRC, UNSPEC_CRC_REV):  New unspecs.
(SWI124dup): New iterator.
(crc4): New expander for bit-forward CRC.
(crc_rev4): New expander for reversed CRC.

  gcc/testsuite/gcc.target/i386/

* crc-crc32-data16.c: New test.
* crc-crc32-data32.c: Likewise.
* crc-crc32-data8.c: Likewise.
* crc-1-pclmul.c: Likewise.
* crc-10-pclmul.c: Likewise.
* crc-12-pclmul.c: Likewise.
* crc-13-pclmul.c: Likewise.
* crc-14-pclmul.c: Likewise.
* crc-17-pclmul.c: Likewise.
* crc-18-pclmul.c: Likewise.
* crc-21-pclmul.c: Likewise.
* crc-22-pclmul.c: Likewise.
* crc-23-pclmul.c: Likewise.
* crc-4-pclmul.c: Likewise.
* crc-5-pclmul.c: Likewise.
* crc-6-pclmul.c: Likewise.
* crc-7-pclmul.c: Likewise.
* crc-8-pclmul.c: Likewise.
* crc-9-pclmul.c: Likewise.
* crc-CCIT-data16-pclmul.c: Likewise.
* crc-CCIT-data8-pclmul.c: Likewise.
* crc-coremark-16bitdata-pclmul.c: Likewise.

Signed-off-by: Mariam Arutunian 
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index dbc861fb1ea..c09c174ca9a 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -288,6 +288,8 @@ extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_abs (rtx, rtx);
 extern bool ix86_expand_vector_init_duplicate (bool, machine_mode, rtx,
 	   rtx);
+extern void ix86_expand_crc_using_clmul (rtx *);
+extern void ix86_expand_reversed_crc_using_clmul (rtx *);
 extern bool ix86_extract_perm_from_pool_constant (int*, rtx);
 
 /* In i386-c.cc  */
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 69cd4ae05a7..164258dffef 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -26185,6 +26185,135 @@ ix86_run_selftests (void)
 
 } // namespace selftest
 
+/* Generate assembly to calculate CRC using pclmulqdq instruction.
+   OPERANDS[1] is input CRC,
+   OPERANDS[2] is data (message),
+   OPERANDS[3] is the polynomial without the leading 1.  */
+
+void
+ix86_expand_crc_using_clmul (rtx *operands)
+{
+/* Check and keep arguments.  */
+  gcc_assert (!CONST_INT_P (operands[0]));
+  gcc_assert (CONST_INT_P (operands[3]));
+  rtx crc = operands[1];
+  rtx data = operands[2];
+  unsigned HOST_WIDE_INT crc_size = GET_MODE_BITSIZE (GET_MODE (operands[0]));
+  gcc_assert (crc_size <= 32);
+  unsigned HOST_WIDE_INT data_size = GET_MODE_BITSIZE (GET_MODE (data));
+  unsigned HOST_WIDE_INT DImode_size = GET_MODE_BITSIZE (DImode);
+
+  /* Calculate the quotient.  */
+  unsigned HOST_WIDE_INT
+  q = gf2n_poly_long_div_quotient (UINTVAL (operands[3]), crc_size + 1);
+
+  if (crc_size > data_size)
+crc = expand_shift (RSHIFT_EXPR, DImode, crc, crc_size - data_size,
+			NULL_RTX, 1);
+
+  /* Keep the quotient in V2DImode.  */
+  rtx q_v2di = gen_reg_rtx (V2DImode);
+  rtx quotient = gen_reg_rtx (DImode);
+  convert_move (quotient, gen_int_mode (q, DImode), 0);
+  emit_insn (gen_vec_concatv2di (q_v2di, quotient, const0_rtx));
+
+  /* crc ^ data and keep in V2DImode.  */
+  rtx cd_xor = expand_binop (DImode, xor_optab, crc, data, NULL_RTX, 1,
+			 OPTAB_WIDEN);
+  rtx res = gen_reg_rtx (V2DImode);
+  emit_insn (gen_vec_concatv2di (res, cd_xor, const0_rtx));
+  /* Perform carry-less multiplication.  */
+  emit_insn (gen_pclmulqdq (res, res, q_v2di, gen_int_mode (0, DImode)));
+
+  res = expand_shift (RSHIFT_EXPR, V2DImode, res, crc_size, NULL_RTX, 0);
+
+  /* Keep the polynomial in V2DImode.  */
+  rtx polynomial = gen_reg_rtx (DImode);
+  convert_move (polynomial, operands[3], 0);
+  rtx p_v2di = gen_reg_rtx (V2DImode);
+  emit_insn (gen_vec_concatv2di 

[RFC/RFA] [PATCH 09/12] Add symbolic execution support.

2024-05-24 Thread Mariam Arutunian
Allows code to be executed at the bit level,
assigning symbolic values to variables that have no initial value.
Only CRC-specific operations are supported.

Example:

uint8_t crc;
uint8_t pol = 1;
crc = crc ^ pol;

During symbolic execution, crc's value will be:
crc(7), crc(6), ... crc(1), crc(0) ^ 1
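
The example can be modelled with a few lines of C.  This is a hypothetical
sketch, not the patch's value_bit/bit_expression classes: struct sym_bit and
the sym_* helpers are names invented here, and only XOR with a constant is
modelled.  Each bit of the 8-bit value is tracked as "symbol bit i,
optionally XORed with 1".

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One bit of a symbolic 8-bit value: bit INDEX of the unknown input,
   XORed with the constant FLIP (0 or 1).  */
struct sym_bit
{
  int index;
  int flip;
};

/* Make V fully symbolic: bit i is crc(i).  */
void
sym_init (struct sym_bit v[8])
{
  for (int i = 0; i < 8; i++)
    {
      v[i].index = i;
      v[i].flip = 0;
    }
}

/* XOR the symbolic value with the constant C, bit by bit.  */
void
sym_xor_const (struct sym_bit v[8], uint8_t c)
{
  for (int i = 0; i < 8; i++)
    v[i].flip ^= (c >> i) & 1;
}

/* Write V into BUF, most significant bit first.  */
void
sym_format (const struct sym_bit v[8], char *buf, size_t size)
{
  size_t used = 0;
  buf[0] = '\0';
  for (int i = 7; i >= 0 && used < size; i--)
    used += (size_t) snprintf (buf + used, size - used, "crc(%d)%s%s",
			       v[i].index, v[i].flip ? " ^ 1" : "",
			       i ? ", " : "");
}
```

Calling sym_init, then sym_xor_const with 1, then sym_format into a
128-byte buffer yields the bit list for crc ^ pol with pol = 1: only the
lowest bit carries the "^ 1".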

Co-authored-by: Mariam Arutunian 

  gcc/

* Makefile.in (OBJS): Add sym-exec/expression.o,
sym-exec/state.o, sym-exec/condition.o.
* configure (sym-exec): New subdir.

  gcc/sym-exec/

* condition.cc: New file.
* condition.h: New file.
* expression-is-a-helper.h: New file.
* expression.cc: New file.
* expression.h: New file.
* state.cc: New file.
* state.h: New file.

Signed-off-by: Mariam Arutunian 
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index e9e2ecc3a17..aab909c3510 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1717,6 +1717,9 @@ OBJS = \
 	tree-logical-location.o \
 	tree-loop-distribution.o \
 	gimple-crc-optimization.o \
+	sym-exec/expression.o \
+	sym-exec/state.o \
+	sym-exec/condition.o \
 	tree-nested.o \
 	tree-nrv.o \
 	tree-object-size.o \
diff --git a/gcc/configure b/gcc/configure
index aaf5899cc03..c430c76400e 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -36139,7 +36139,7 @@ $as_echo "$as_me: executing $ac_file commands" >&6;}
 "depdir":C) $SHELL $ac_aux_dir/mkinstalldirs $DEPDIR ;;
 "gccdepdir":C)
   ${CONFIG_SHELL-/bin/sh} $ac_aux_dir/mkinstalldirs build/$DEPDIR
-  for lang in $subdirs c-family common analyzer text-art rtl-ssa
+  for lang in $subdirs c-family common analyzer text-art rtl-ssa sym-exec
   do
   ${CONFIG_SHELL-/bin/sh} $ac_aux_dir/mkinstalldirs $lang/$DEPDIR
   done ;;
diff --git a/gcc/sym-exec/condition.cc b/gcc/sym-exec/condition.cc
new file mode 100644
index 000..5b558d1e315
--- /dev/null
+++ b/gcc/sym-exec/condition.cc
@@ -0,0 +1,53 @@
+#include "condition.h"
+
+bit_condition::bit_condition (value_bit *left, value_bit *right, tree_code code)
+{
+  this->m_left = left;
+  this->m_right = right;
+  this->m_code = code;
+  m_type = BIT_CONDITION;
+}
+
+
+bit_condition::bit_condition (const bit_condition &expr)
+{
+  bit_expression::copy (&expr);
+  m_code = expr.get_code ();
+}
+
+
+tree_code
+bit_condition::get_code () const
+{
+  return m_code;
+}
+
+
+value_bit *
+bit_condition::copy () const
+{
+  return new bit_condition (*this);
+}
+
+
+void
+bit_condition::print_expr_sign ()
+{
+  switch (m_code)
+{
+  case GT_EXPR:
+	fprintf (dump_file, " > ");
+	break;
+  case LT_EXPR:
+	fprintf (dump_file, " < ");
+	break;
+  case EQ_EXPR:
+	fprintf (dump_file, " == ");
+	break;
+  case NE_EXPR:
+	fprintf (dump_file, " != ");
+	break;
+  default:
+	fprintf (dump_file, " ? ");
+}
+}
\ No newline at end of file
diff --git a/gcc/sym-exec/condition.h b/gcc/sym-exec/condition.h
new file mode 100644
index 000..1882c6cfa91
--- /dev/null
+++ b/gcc/sym-exec/condition.h
@@ -0,0 +1,26 @@
+#ifndef SYM_EXEC_CONDITION_H
+#define SYM_EXEC_CONDITION_H
+
+#include "expression.h"
+
+enum condition_status {
+  CS_NO_COND,
+  CS_TRUE,
+  CS_FALSE,
+  CS_SYM
+};
+
+
+class bit_condition : public bit_expression {
+ private:
+  tree_code m_code;
+  void print_expr_sign ();
+
+ public:
+  bit_condition (value_bit *left, value_bit *right, tree_code type);
+  bit_condition (const bit_condition &expr);
+  tree_code get_code () const;
+  value_bit *copy () const;
+};
+
+#endif /* SYM_EXEC_CONDITION_H.  */
\ No newline at end of file
diff --git a/gcc/sym-exec/expression-is-a-helper.h b/gcc/sym-exec/expression-is-a-helper.h
new file mode 100644
index 000..9931254c36e
--- /dev/null
+++ b/gcc/sym-exec/expression-is-a-helper.h
@@ -0,0 +1,204 @@
+#ifndef SYM_EXEC_EXPRESSION_IS_A_HELPER_H
+#define SYM_EXEC_EXPRESSION_IS_A_HELPER_H
+
+#include "condition.h"
+
+/* Defining test functions for value conversion via dyn_cast.  */
+
+template<>
+template<>
+inline bool
+is_a_helper::test (value_bit *ptr)
+{
+  return ptr->get_type () == value_type::SYMBOLIC_BIT;
+}
+
+
+template<>
+template<>
+inline bool
+is_a_helper::test (value_bit *ptr)
+{
+  return ptr->get_type () == value_type::BIT;
+}
+
+
+template<>
+template<>
+inline bool
+is_a_helper::test (value_bit *ptr)
+{
+  value_type type = ptr->get_type ();
+  return type == value_type::BIT_AND_EXPRESSION
+	 || type == value_type::BIT_OR_EXPRESSION
+	 || type == value_type::BIT_XOR_EXPRESSION
+	 || type == value_type::BIT_COMPLEMENT_EXPRESSION
+	 || type == value_type::SHIFT_RIGHT_EXPRESSION
+	 || type == value_type::SHIFT_LEFT_EXPRESSION
+	 || type == value_type::ADD_EXPRESSION
+	 || type == value_type::SUB_EXPRESSION
+	 || type == value_type::BIT_CONDITION;
+}
+
+
+template<>
+template<>
+inline bool
+is_a_helper::test (value_bit *ptr)
+{
+  return ptr->get_type () == value_type::BIT_AND_EXPRESSION;
+}
+
+
+template<>
+template<>
+inline bool
+is_a_helper::test (value_bit *ptr)
+{
+  return ptr->get_type () == value_type::BIT_OR_EXPRESSION;
+}
+
+

[RFC/RFA] [PATCH 07/12] aarch64: Add CRC built-ins test for the target AES.

2024-05-24 Thread Mariam Arutunian
  gcc/testsuite/gcc.target/aarch64/

* crc-builtin-pmul64.c: New test.

Signed-off-by: Mariam Arutunian 
diff --git a/gcc/testsuite/gcc.target/aarch64/crc-builtin-pmul64.c b/gcc/testsuite/gcc.target/aarch64/crc-builtin-pmul64.c
new file mode 100644
index 000..d8bb1724a65
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/crc-builtin-pmul64.c
@@ -0,0 +1,61 @@
+/* { dg-options "-march=armv8-a+crypto" } */
+
+#include <stdint.h>
+int8_t crc8_data8 ()
+{
+  return __builtin_crc8_data8 ('a', 0xff, 0x12);
+}
+int16_t crc16_data8 ()
+{
+  return __builtin_crc16_data8 (0x1234, 'a', 0x1021);
+}
+
+int16_t crc16_data16 ()
+{
+  return __builtin_crc16_data16 (0x1234, 0x3214, 0x1021);
+}
+
+int32_t crc32_data8 ()
+{
+  return __builtin_crc32_data8 (0x, 0x32, 0x4002123);
+}
+int32_t crc32_data16 ()
+{
+  return __builtin_crc32_data16 (0x, 0x3232, 0x4002123);
+}
+
+int32_t crc32_data32 ()
+{
+  return __builtin_crc32_data32 (0x, 0x123546ff, 0x4002123);
+}
+
+int8_t rev_crc8_data8 ()
+{
+  return __builtin_rev_crc8_data8 (0x34, 'a', 0x12);
+}
+
+int16_t rev_crc16_data8 ()
+{
+  return __builtin_rev_crc16_data8 (0x1234, 'a', 0x1021);
+}
+
+int16_t rev_crc16_data16 ()
+{
+  return __builtin_rev_crc16_data16 (0x1234, 0x3214, 0x1021);
+}
+
+int32_t rev_crc32_data8 ()
+{
+  return __builtin_rev_crc32_data8 (0x, 0x32, 0x4002123);
+}
+
+int32_t rev_crc32_data16 ()
+{
+  return __builtin_rev_crc32_data16 (0x, 0x3232, 0x4002123);
+}
+
+int32_t rev_crc32_data32 ()
+{
+  return __builtin_rev_crc32_data32 (0x, 0x123546ff, 0x4002123);
+} 
+/* { dg-final { scan-assembler-times "pmull" 24 } } */
\ No newline at end of file
-- 
2.25.1



[RFC/RFA] [PATCH 04/12] RISC-V: Add CRC built-ins tests for the target ZBC.

2024-05-24 Thread Mariam Arutunian
  gcc/testsuite/gcc.target/riscv/

* crc-builtin-zbc32.c: New file.
* crc-builtin-zbc64.c: Likewise.

Signed-off-by: Mariam Arutunian 
diff --git a/gcc/testsuite/gcc.target/riscv/crc-builtin-zbc32.c b/gcc/testsuite/gcc.target/riscv/crc-builtin-zbc32.c
new file mode 100644
index 000..20d7d25f60e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/crc-builtin-zbc32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile { target { riscv32*-*-* } } } */
+/* { dg-options "-march=rv32gc_zbc" } */
+
+#include <stdint.h>
+
+int8_t crc8_data8 ()
+{
+  return __builtin_crc8_data8 (0x34, 'a', 0x12);
+}
+
+int16_t crc16_data8 ()
+{
+  return __builtin_crc16_data8 (0x1234, 'a', 0x1021);
+}
+
+int16_t crc16_data16 ()
+{
+  return __builtin_crc16_data16 (0x1234, 0x3214, 0x1021);
+}
+
+/* { dg-final { scan-assembler-times "clmul\t" 6 } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/crc-builtin-zbc64.c b/gcc/testsuite/gcc.target/riscv/crc-builtin-zbc64.c
new file mode 100644
index 000..c9509d56d01
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/crc-builtin-zbc64.c
@@ -0,0 +1,66 @@
+/* { dg-do compile { target { riscv64*-*-* } } } */
+/* { dg-options "-march=rv64gc_zbc" } */
+
+#include <stdint.h>
+
+int8_t crc8_data8 ()
+{
+  return __builtin_crc8_data8 (0x34, 'a', 0x12);
+}
+
+int16_t crc16_data8 ()
+{
+  return __builtin_crc16_data8 (0x1234, 'a', 0x1021);
+}
+
+int16_t crc16_data16 ()
+{
+  return __builtin_crc16_data16 (0x1234, 0x3214, 0x1021);
+}
+
+int32_t crc32_data8 ()
+{
+  return __builtin_crc32_data8 (0x, 0x32, 0x4002123);
+}
+
+int32_t crc32_data16 ()
+{
+  return __builtin_crc32_data16 (0x, 0x3232, 0x4002123);
+}
+
+int32_t crc32_data32 ()
+{
+  return __builtin_crc32_data32 (0x, 0x123546ff, 0x4002123);
+}
+
+int8_t rev_crc8_data8 ()
+{
+  return __builtin_rev_crc8_data8 (0x34, 'a', 0x12);
+}
+
+int16_t rev_crc16_data8 ()
+{
+  return __builtin_rev_crc16_data8 (0x1234, 'a', 0x1021);
+}
+
+int16_t rev_crc16_data16 ()
+{
+  return __builtin_rev_crc16_data16 (0x1234, 0x3214, 0x1021);
+}
+
+int32_t rev_crc32_data8 ()
+{
+  return __builtin_rev_crc32_data8 (0x, 0x32, 0x4002123);
+}
+
+int32_t rev_crc32_data16 ()
+{
+  return __builtin_rev_crc32_data16 (0x, 0x3232, 0x4002123);
+}
+
+int32_t rev_crc32_data32 ()
+{
+  return __builtin_rev_crc32_data32 (0x, 0x123546ff, 0x4002123);
+}
+/* { dg-final { scan-assembler-times "clmul\t" 18 } } */
+/* { dg-final { scan-assembler-times "clmulh" 6 } } */
\ No newline at end of file
-- 
2.25.1



[RFC/RFA] [PATCH 02/12] Add built-ins and tests for bit-forward and bit-reversed CRCs

2024-05-24 Thread Mariam Arutunian
This patch introduces new built-in functions to GCC for computing
bit-forward and bit-reversed CRCs.
These builtins aim to provide efficient CRC calculation capabilities.
When the target architecture supports CRC operations (as indicated by the
presence of a CRC optab),
the builtins will utilize the expander to generate CRC code.
In the absence of hardware support, the builtins default to generating code
for a table-based CRC calculation.

The builtins are defined as follows:
__builtin_rev_crc16_data8,
__builtin_rev_crc32_data8, __builtin_rev_crc32_data16,
__builtin_rev_crc32_data32
__builtin_crc8_data8,
__builtin_crc16_data16, __builtin_crc16_data8,
__builtin_crc32_data8, __builtin_crc32_data16, __builtin_crc32_data32,
__builtin_crc64_data8, __builtin_crc64_data16,  __builtin_crc64_data32,
__builtin_crc64_data64

Each builtin takes three parameters:
crc: The initial CRC value.
data: The data to be processed.
polynomial: The CRC polynomial without the leading 1.
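
As a point of reference for the three parameters, here is a portable bitwise
loop for the 16-bit CRC / 8-bit data case.  It is a sketch under the
assumption that the bit-forward built-ins compute the conventional
MSB-first CRC update; ref_crc16_data8 is a hypothetical helper name and is
not part of the patch.

```c
#include <stdint.h>

/* Bit-forward CRC16 update for one byte of data.
   CRC is the running CRC value, DATA the next message byte and
   POLY the polynomial without the leading 1 (e.g. 0x1021).  */
uint16_t
ref_crc16_data8 (uint16_t crc, uint8_t data, uint16_t poly)
{
  /* Align the byte with the top of the CRC register.  */
  crc ^= (uint16_t) (data << 8);
  for (int i = 0; i < 8; i++)
    crc = (crc & 0x8000) ? (uint16_t) ((crc << 1) ^ poly)
			 : (uint16_t) (crc << 1);
  return crc;
}
```

Feeding the ASCII string "123456789" through this helper one byte at a
time with polynomial 0x1021 gives the standard check values 0x31C3 for an
initial CRC of 0 and 0x29B1 for an initial CRC of 0xFFFF.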

To validate the correctness of these builtins, this patch also includes
additions to the GCC testsuite.
This enhancement allows GCC to offer developers high-performance CRC
computation options
that automatically adapt to the capabilities of the target hardware.

Co-authored-by: Joern Rennecke 

Not complete. May continue the work if these built-ins are needed.

gcc/

 * builtin-types.def (BT_FN_UINT8_UINT8_UINT8_CONST_SIZE): Define.
 (BT_FN_UINT16_UINT16_UINT8_CONST_SIZE): Likewise.
  (BT_FN_UINT16_UINT16_UINT16_CONST_SIZE): Likewise.
  (BT_FN_UINT32_UINT32_UINT8_CONST_SIZE): Likewise.
  (BT_FN_UINT32_UINT32_UINT16_CONST_SIZE): Likewise.
  (BT_FN_UINT32_UINT32_UINT32_CONST_SIZE): Likewise.
  (BT_FN_UINT64_UINT64_UINT8_CONST_SIZE): Likewise.
  (BT_FN_UINT64_UINT64_UINT16_CONST_SIZE): Likewise.
  (BT_FN_UINT64_UINT64_UINT32_CONST_SIZE): Likewise.
  (BT_FN_UINT64_UINT64_UINT64_CONST_SIZE): Likewise.
  * builtins.cc (associated_internal_fn): Handle
BUILT_IN_CRC8_DATA8,
  BUILT_IN_CRC16_DATA8, BUILT_IN_CRC16_DATA16,
  BUILT_IN_CRC32_DATA8, BUILT_IN_CRC32_DATA16,
BUILT_IN_CRC32_DATA32,
  BUILT_IN_CRC64_DATA8, BUILT_IN_CRC64_DATA16,
BUILT_IN_CRC64_DATA32,
  BUILT_IN_CRC64_DATA64,
  BUILT_IN_REV_CRC8_DATA8,
  BUILT_IN_REV_CRC16_DATA8, BUILT_IN_REV_CRC16_DATA16,
  BUILT_IN_REV_CRC32_DATA8, BUILT_IN_REV_CRC32_DATA16,
BUILT_IN_REV_CRC32_DATA32.
  (expand_builtin_crc_table_based): New function.
  (expand_builtin): Handle BUILT_IN_CRC8_DATA8,
  BUILT_IN_CRC16_DATA8, BUILT_IN_CRC16_DATA16,
  BUILT_IN_CRC32_DATA8, BUILT_IN_CRC32_DATA16,
BUILT_IN_CRC32_DATA32,
  BUILT_IN_CRC64_DATA8, BUILT_IN_CRC64_DATA16,
BUILT_IN_CRC64_DATA32,
  BUILT_IN_CRC64_DATA64,
  BUILT_IN_REV_CRC8_DATA8,
  BUILT_IN_REV_CRC16_DATA8, BUILT_IN_REV_CRC16_DATA16,
  BUILT_IN_REV_CRC32_DATA8, BUILT_IN_REV_CRC32_DATA16,
BUILT_IN_REV_CRC32_DATA32.
  * builtins.def (BUILT_IN_CRC8_DATA8): New builtin.
  (BUILT_IN_CRC16_DATA8): Likewise.
  (BUILT_IN_CRC16_DATA16): Likewise.
  (BUILT_IN_CRC32_DATA8): Likewise.
  (BUILT_IN_CRC32_DATA16): Likewise.
  (BUILT_IN_CRC32_DATA32): Likewise.
  (BUILT_IN_CRC64_DATA8): Likewise.
  (BUILT_IN_CRC64_DATA16): Likewise.
  (BUILT_IN_CRC64_DATA32): Likewise.
  (BUILT_IN_CRC64_DATA64): Likewise.
  (BUILT_IN_REV_CRC8_DATA8): New builtin.
  (BUILT_IN_REV_CRC16_DATA8): Likewise.
  (BUILT_IN_REV_CRC16_DATA16): Likewise.
  (BUILT_IN_REV_CRC32_DATA8): Likewise.
  (BUILT_IN_REV_CRC32_DATA16): Likewise.
  (BUILT_IN_REV_CRC32_DATA32): Likewise.
  * builtins.h (expand_builtin_crc_table_based): New function
declaration.
  * doc/extend.texi (__builtin_rev_crc16_data8,
  __builtin_rev_crc32_data32, __builtin_rev_crc32_data8,
  __builtin_rev_crc32_data16, __builtin_crc8_data8,
  __builtin_crc16_data16, __builtin_crc16_data8,
  __builtin_crc32_data32, __builtin_crc32_data8,
  __builtin_crc32_data16, __builtin_crc64_data64,
  __builtin_crc64_data8, __builtin_crc64_data16,
  __builtin_crc64_data32): Document.

  gcc/testsuite/

 * gcc.c-torture/compile/crc-builtin-rev-target32.c
 * gcc.c-torture/compile/crc-builtin-rev-target64.c
 * gcc.c-torture/compile/crc-builtin-target32.c
 * gcc.c-torture/compile/crc-builtin-target64.c

Signed-off-by: Mariam Arutunian 
diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index c97d6bad1de..a0c4b8b9ca6 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -829,6 +829,26 @@ DEF_FUNCTION_TYPE_3 (BT_FN_PTR_SIZE_SIZE_PTRMODE,
 		 BT_PTR, BT_SIZE, BT_SIZE, BT_PTRMODE)
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_UINT8_PTRMODE, BT_VOID, BT_PTR, BT_UINT8,
 		 BT_PTRMODE)
+DEF_FUNCTION_TYPE_3 

[RFC/RFA] [PATCH 06/12] aarch64: Implement new expander for efficient CRC computation

2024-05-24 Thread Mariam Arutunian
This patch introduces two new expanders for the aarch64 backend,
dedicated to generating optimized code for CRC computations.
The new expanders are designed to leverage specific hardware capabilities
to achieve faster CRC calculations,
particularly using the pmul or crc32 instructions when supported by the
target architecture.

Expander 1: Bit-Forward CRC (crc4)
For targets that support pmul instruction (TARGET_AES),
the expander will generate code that uses the pmul (crypto_pmulldi)
instruction for CRC computation.

Expander 2: Bit-Reversed CRC (crc_rev4)
The expander first checks if the target supports the CRC32 instruction set
(TARGET_CRC32)
and the polynomial in use is 0x1EDC6F41 (iSCSI). If the conditions are met,
it emits calls to the corresponding crc32 instruction (crc32b, crc32h,
crc32w, or crc32x depending on the data size).
If the target does not support crc32 but supports pmul, it then uses the
pmul (crypto_pmulldi) instruction for bit-reversed CRC computation.

Otherwise table-based CRC is generated.

  gcc/config/aarch64/

* aarch64-protos.h (aarch64_expand_crc_using_clmul): New extern
function declaration.
(aarch64_expand_reversed_crc_using_clmul):  Likewise.
* aarch64.cc (aarch64_expand_crc_using_clmul): New function.
(aarch64_expand_reversed_crc_using_clmul):  Likewise.
* aarch64.md (UNSPEC_CRC, UNSPEC_CRC_REV):  New unspecs.
(crc_rev4): New expander for reversed CRC.
(crc4): New expander for bit-forward CRC.
* iterators.md (crc_data_type): New mode attribute.

  gcc/testsuite/gcc.target/aarch64/

* crc-1-pmul.c: New test.
* crc-10-pmul.c: Likewise.
* crc-12-pmul.c: Likewise.
* crc-13-pmul.c: Likewise.
* crc-14-pmul.c: Likewise.
* crc-17-pmul.c: Likewise.
* crc-18-pmul.c: Likewise.
* crc-21-pmul.c: Likewise.
* crc-22-pmul.c: Likewise.
* crc-23-pmul.c: Likewise.
* crc-4-pmul.c: Likewise.
* crc-5-pmul.c: Likewise.
* crc-6-pmul.c: Likewise.
* crc-7-pmul.c: Likewise.
* crc-8-pmul.c: Likewise.
* crc-9-pmul.c: Likewise.
* crc-CCIT-data16-pmul.c: Likewise.
* crc-CCIT-data8-pmul.c: Likewise.
* crc-coremark-16bitdata-pmul.c: Likewise.
* crc-crc32-data16.c: New test.
* crc-crc32-data32.c: Likewise.
* crc-crc32-data8.c: Likewise.

Signed-off-by: Mariam Arutunian 
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 1d3f94c813e..167e1140f0d 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1117,5 +1117,8 @@ extern void mingw_pe_encode_section_info (tree, rtx, int);
 
 bool aarch64_optimize_mode_switching (aarch64_mode_entity);
 void aarch64_restore_za (rtx);
+void aarch64_expand_crc_using_clmul (rtx *);
+void aarch64_expand_reversed_crc_using_clmul (rtx *);
+
 
 #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index ee12d8897a8..05cd0296d38 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -30265,6 +30265,135 @@ aarch64_retrieve_sysreg (const char *regname, bool write_p, bool is128op)
   return sysreg->encoding;
 }
 
+/* Generate assembly to calculate CRC
+   using carry-less multiplication instruction.
+   OPERANDS[1] is input CRC,
+   OPERANDS[2] is data (message),
+   OPERANDS[3] is the polynomial without the leading 1.  */
+
+void
+aarch64_expand_crc_using_clmul (rtx *operands)
+{
+  /* Check and keep arguments.  */
+  gcc_assert (!CONST_INT_P (operands[0]));
+  gcc_assert (CONST_INT_P (operands[3]));
+  rtx crc = operands[1];
+  rtx data = operands[2];
+  rtx polynomial = operands[3];
+
+  unsigned HOST_WIDE_INT
+  crc_size = GET_MODE_BITSIZE (GET_MODE (operands[0])).to_constant ();
+  gcc_assert (crc_size <= 32);
+  unsigned HOST_WIDE_INT
+  data_size = GET_MODE_BITSIZE (GET_MODE (data)).to_constant ();
+
+  /* Calculate the quotient.  */
+  unsigned HOST_WIDE_INT
+  q = gf2n_poly_long_div_quotient (UINTVAL (polynomial), crc_size + 1);
+
+  /* CRC calculation's main part.  */
+  if (crc_size > data_size)
+crc = expand_shift (RSHIFT_EXPR, DImode, crc, crc_size - data_size,
+			NULL_RTX, 1);
+
+  rtx t0 = gen_reg_rtx (DImode);
+  aarch64_emit_move (t0, gen_int_mode (q, DImode));
+  rtx t1 = gen_reg_rtx (DImode);
+  aarch64_emit_move (t1, polynomial);
+
+  rtx a0 = expand_binop (DImode, xor_optab, crc, data, NULL_RTX, 1,
+			 OPTAB_WIDEN);
+
+  rtx clmul_res = gen_reg_rtx (TImode);
+  emit_insn (gen_aarch64_crypto_pmulldi (clmul_res, a0, t0));
+  a0 = gen_lowpart (DImode, clmul_res);
+
+  a0 = expand_shift (RSHIFT_EXPR, DImode, a0, crc_size, NULL_RTX, 1);
+
+  emit_insn (gen_aarch64_crypto_pmulldi (clmul_res, a0, t1));
+  a0 = gen_lowpart (DImode, clmul_res);
+
+  if (crc_size > data_size)
+{
+  rtx crc_part = expand_shift (LSHIFT_EXPR, DImode, operands[1], data_size,
+   NULL_RTX, 0);
+  a0 =  expand_binop (DImode, xor_optab, a0, crc_part, NULL_RTX, 1,
+			  OPTAB_DIRECT);
+}

[RFC/RFA] [PATCH 03/12] RISC-V: Add CRC expander to generate faster CRC.

2024-05-24 Thread Mariam Arutunian
If the target supports ZBC or ZBKC, the clmul instruction is used for the CRC
calculation.
Otherwise, if the target supports ZBKB, a table-based CRC is generated,
with the bswap and brev8 instructions used to reverse the inputs and the output.
New tests check CRC generation for ZBC, ZBKC and ZBKB targets.

  gcc/

 * expr.cc (gf2n_poly_long_div_quotient): New function.
 (reflect): Likewise.
 * expr.h (gf2n_poly_long_div_quotient): New function declaration.
 (reflect): Likewise.

  gcc/config/riscv/

 * bitmanip.md (crc_rev4): New expander for
reversed CRC.
 (crc4): New expander for bit-forward CRC.
 (SUBX1, ANYI1): New iterators.
 * riscv-protos.h (generate_reflecting_code_using_brev): New function
declaration.
 (expand_crc_using_clmul): Likewise.
 (expand_reversed_crc_using_clmul): Likewise.
 * riscv.cc (generate_reflecting_code_using_brev): New function.
 (expand_crc_using_clmul): Likewise.
 (expand_reversed_crc_using_clmul): Likewise.
 * riscv.md (UNSPEC_CRC, UNSPEC_CRC_REV):  New unspecs.

  gcc/testsuite/gcc.target/riscv/

* crc-1-zbc.c: New test.
* crc-10-zbc.c: Likewise.
* crc-12-zbc.c: Likewise.
* crc-13-zbc.c: Likewise.
* crc-14-zbc.c: Likewise.
* crc-17-zbc.c: Likewise.
* crc-18-zbc.c: Likewise.
* crc-21-zbc.c: Likewise.
* crc-22-rv64-zbc.c: Likewise.
* crc-22-zbkb.c: Likewise.
* crc-23-zbc.c: Likewise.
* crc-4-zbc.c: Likewise.
* crc-5-zbc.c: Likewise.
* crc-5-zbkb.c: Likewise.
* crc-6-zbc.c: Likewise.
* crc-7-zbc.c: Likewise.
* crc-8-zbc.c: Likewise.
* crc-8-zbkb.c: Likewise.
* crc-9-zbc.c: Likewise.
* crc-CCIT-data16-zbc.c: Likewise.
* crc-CCIT-data8-zbc.c: Likewise.
* crc-coremark-16bitdata-zbc.c: Likewise.

Signed-off-by: Mariam Arutunian 
diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 8769a6b818b..c98d451f404 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -973,3 +973,66 @@
   "TARGET_ZBC"
   "clmulr\t%0,%1,%2"
   [(set_attr "type" "clmul")])
+
+
+;; Iterator for hardware integer modes narrower than XLEN, same as SUBX
+(define_mode_iterator SUBX1 [QI HI (SI "TARGET_64BIT")])
+
+;; Iterator for hardware integer modes narrower than XLEN, same as ANYI
+(define_mode_iterator ANYI1 [QI HI SI (DI "TARGET_64BIT")])
+
+;; Reversed CRC 8, 16, 32 for TARGET_64
+(define_expand "crc_rev4"
+	;; return value (calculated CRC)
+  [(set (match_operand:ANYI 0 "register_operand" "=r")
+		  ;; initial CRC
+	(unspec:ANYI [(match_operand:ANYI 1 "register_operand" "r")
+		  ;; data
+		  (match_operand:ANYI1 2 "register_operand" "r")
+		  ;; polynomial without leading 1
+		  (match_operand:ANYI 3)]
+		  UNSPEC_CRC_REV))]
+  /* We don't support the case when data's size is bigger than CRC's size.  */
+  "(((TARGET_ZBKC || TARGET_ZBC) && mode < word_mode) || TARGET_ZBKB)
+   && mode >= mode"
+{
+  /* If we have the ZBC or ZBKC extension (ie, clmul) and
+ it is possible to store the quotient within a single variable
+ (E.g.  CRC64's quotient may need 65 bits,
+ we can't keep it in 64 bit variable.)
+ then use clmul instruction to implement the CRC,
+ otherwise (TARGET_ZBKB) generate table based using brev.  */
+  if ((TARGET_ZBKC || TARGET_ZBC) && mode < word_mode)
+expand_reversed_crc_using_clmul (operands);
+  else
+/* Generate table-based CRC.
+   To reflect values use brev and bswap instructions.  */
+expand_reversed_crc_table_based (operands[0], operands[1],
+ operands[2], operands[3],
+ GET_MODE (operands[2]),
+ generate_reflecting_code_using_brev);
+  DONE;
+})
+
+;; CRC 8, 16, (32 for TARGET_64)
+(define_expand "crc4"
+	;; return value (calculated CRC)
+  [(set (match_operand:SUBX 0 "register_operand" "=r")
+		  ;; initial CRC
+	(unspec:SUBX [(match_operand:SUBX 1 "register_operand" "r")
+		  ;; data
+		  (match_operand:SUBX1 2 "register_operand" "r")
+		  ;; polynomial without leading 1
+		  (match_operand:SUBX 3)]
+		  UNSPEC_CRC))]
+  /* We don't support the case when data's size is bigger than CRC's size.  */
+  "(TARGET_ZBKC || TARGET_ZBC) && mode >= mode"
+{
+  /* If we have the ZBC or ZBKC extension (ie, clmul) and
+ it is possible to store the quotient within a single variable
+ (E.g.  CRC64's quotient may need 65 bits,
+ we can't keep it in 64 bit variable.)
+ then use clmul instruction to implement the CRC.  */
+  expand_crc_using_clmul (operands);
+  DONE;
+})
\ No newline at end of file
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 0704968561b..5b6cc52ce0b 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -169,6 +169,9 @@ extern enum memmodel riscv_union_memmodels (enum memmodel, enum memmodel);
 extern bool riscv_reg_frame_related 

[RFC/RFA] [PATCH 01/12] Implement internal functions for efficient CRC computation

2024-05-24 Thread Mariam Arutunian
Add two new internal functions (IFN_CRC, IFN_CRC_REV), to provide faster
CRC generation.
One performs bit-forward and the other bit-reversed CRC computation.
If CRC optabs are supported, they are used for the CRC computation.
Otherwise, table-based CRC is generated.
The supported data and CRC sizes are 8, 16, 32, and 64 bits.
The polynomial is without the leading 1.
A table with 256 elements is used to store precomputed CRCs.
For the reflection of inputs and the output, a simple algorithm involving
SHIFT, AND, and OR operations is used.
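
A rough C model of the table-based fallback and of the reflection step, for
the 16-bit CRC / 8-bit data case.  The names make_crc16_table,
crc16_table_step and reflect8 are hypothetical and the code is only a
sketch of the general technique; it is not the RTL that the patch
generates.

```c
#include <stdint.h>

/* Fill TABLE with the CRC of every possible byte value for POLY
   (the polynomial without the leading 1).  */
void
make_crc16_table (uint16_t table[256], uint16_t poly)
{
  for (unsigned i = 0; i < 256; i++)
    {
      uint16_t crc = (uint16_t) (i << 8);
      for (int bit = 0; bit < 8; bit++)
	crc = (crc & 0x8000) ? (uint16_t) ((crc << 1) ^ poly)
			     : (uint16_t) (crc << 1);
      table[i] = crc;
    }
}

/* Bit-forward CRC16 update for one byte using the precomputed table.  */
uint16_t
crc16_table_step (const uint16_t table[256], uint16_t crc, uint8_t data)
{
  return (uint16_t) ((crc << 8) ^ table[(crc >> 8) ^ data]);
}

/* Reverse the bit order of a byte using only SHIFT, AND and OR:
   swap nibbles, then bit pairs, then adjacent bits.  */
uint8_t
reflect8 (uint8_t x)
{
  x = (uint8_t) (((x & 0xF0) >> 4) | ((x & 0x0F) << 4));
  x = (uint8_t) (((x & 0xCC) >> 2) | ((x & 0x33) << 2));
  x = (uint8_t) (((x & 0xAA) >> 1) | ((x & 0x55) << 1));
  return x;
}
```

With polynomial 0x1021 and an initial CRC of 0, running crc16_table_step
over the ASCII string "123456789" gives 0x31C3, the same value a bitwise
loop produces.  A bit-reversed CRC can be modelled by reflecting the inputs
before the table lookup and reflecting the result afterwards.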

Co-authored-by: Joern Rennecke 

gcc/

   * doc/md.texi (crc@var{m}@var{n}4,
   crc_rev@var{m}@var{n}4): Document.
   * expr.cc (generate_crc_table): New function.
   (calculate_table_based_CRC): Likewise.
   (expand_crc_table_based): Likewise.
   (gen_common_operation_to_reflect): Likewise.
   (reflect_64_bit_value): Likewise.
   (reflect_32_bit_value): Likewise.
   (reflect_16_bit_value): Likewise.
   (reflect_8_bit_value): Likewise.
   (generate_reflecting_code_standard): Likewise.
   (expand_reversed_crc_table_based): Likewise.
   * expr.h (generate_reflecting_code_standard): New function declaration.
   (expand_crc_table_based): Likewise.
   (expand_reversed_crc_table_based): Likewise.
   * internal-fn.cc (crc_direct): Define.
   (direct_crc_optab_supported_p): Likewise.
   (expand_crc_optab_fn): New function.
   * internal-fn.def (CRC, CRC_REV): New internal functions.
   * optabs.def (crc_optab, crc_rev_optab): New optabs.

Signed-off-by: Mariam Arutunian 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5730bda80dc..be68ef860f9 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8557,6 +8557,20 @@ operand 2, greater than operand 2 or is unordered with operand 2.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{crc@var{m}@var{n}4} instruction pattern
+@item @samp{crc@var{m}@var{n}4}
+Calculate a bit-forward CRC using operands 1, 2 and 3,
+then store the result in operand 0.
+Operand 1 is the initial CRC, operand 2 is the data and operand 3 is the
+polynomial without leading 1.
+Operands 0, 1 and 3 have mode @var{n} and operand 2 has mode @var{m}, where
+both modes are integers.  The size of CRC to be calculated is determined by the
+mode; for example, if @var{n} is 'hi', a CRC16 is calculated.
+
+@cindex @code{crc_rev@var{m}@var{n}4} instruction pattern
+@item @samp{crc_rev@var{m}@var{n}4}
+Similar to @samp{crc@var{m}@var{n}4}, but calculates a bit-reversed CRC.
+
 @end table
 
 @end ifset
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 1baa39b98eb..18368ae6b6c 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -14091,3 +14091,359 @@ int_expr_size (const_tree exp)
 
   return tree_to_shwi (size);
 }
+
+/* Calculate CRC for the initial CRC and given POLYNOMIAL.
+   CRC_BITS is CRC size.  */
+
+static unsigned HOST_WIDE_INT
+calculate_crc (unsigned HOST_WIDE_INT crc,
+	  unsigned HOST_WIDE_INT polynomial,
+	  unsigned crc_bits)
+{
+  crc = crc << (crc_bits - 8);
+  for (int i = 8; i > 0; --i)
+{
+  if ((crc >> (crc_bits - 1)) & 1)
+	crc = (crc << 1) ^ polynomial;
+  else
+	crc <<= 1;
+}
+
+  crc <<=  (sizeof (crc) * BITS_PER_UNIT - crc_bits);
+  crc >>=  (sizeof (crc) * BITS_PER_UNIT - crc_bits);
+
+  return crc;
+}
+
+/* Assemble CRC table with 256 elements for the given POLYNOM and CRC_BITS with
+   given ID.
+   ID is the identifier of the table, the name of the table is unique,
+   contains CRC size and the polynomial.
+   POLYNOM is the polynomial used to calculate the CRC table's elements.
+   CRC_BITS is the size of CRC, may be 8, 16, ... . */
+
+rtx
+assemble_crc_table (tree id, unsigned HOST_WIDE_INT polynom, unsigned crc_bits)
+{
+  unsigned table_el_n = 0x100;
+  tree ar = build_array_type (make_unsigned_type (crc_bits),
+			  build_index_type (size_int (table_el_n - 1)));
+  tree decl = build_decl (UNKNOWN_LOCATION, VAR_DECL, id, ar);
+  SET_DECL_ASSEMBLER_NAME (decl, id);
+  DECL_ARTIFICIAL (decl) = 1;
+  rtx tab = gen_rtx_SYMBOL_REF (Pmode, IDENTIFIER_POINTER (id));
+  TREE_ASM_WRITTEN (decl) = 0;
+
+  /* Initialize the table.  */
+  vec *initial_values;
+  vec_alloc (initial_values, table_el_n);
+  for (size_t i = 0; i < table_el_n; ++i)
+{
+  unsigned HOST_WIDE_INT crc = calculate_crc (i, polynom, crc_bits);
+  tree element = build_int_cstu (make_unsigned_type (crc_bits), crc);
+  vec_safe_push (initial_values, element);
+}
+  DECL_INITIAL (decl) = build_constructor_from_vec (ar, initial_values);
+
+  TREE_READONLY (decl) = 1;
+  TREE_STATIC (decl) = 1;
+
+  if (TREE_PUBLIC (id))
+{
+  TREE_PUBLIC (decl) = 1;
+  make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+}
+
+  mark_decl_referenced (decl);
+  varpool_node::finalize_decl (decl);
+
+  return tab;
+}
+
+/* Generate CRC lookup table by calculating CRC for all possible
+   8-bit data values.  The table is stored with a specific name in the read-only
+   data section.
+   POLYNOM is the polynomial used to calculate 

[RFC/RFA][PATCH 00/12] CRC optimization

2024-05-24 Thread Mariam Arutunian
Hello!
This patch set detects bitwise CRC implementation loops (with branches) in
the GIMPLE optimizers and replaces them with more optimal CRC
implementations in RTL. These patches introduce new internal functions,
built-in functions, and expanders for CRC generation, leveraging hardware
instructions where available. Additionally, various tests are included to
check CRC detection and generation.

Main Features:

   1. CRC Loop Detection and Replacement:
      - Detection of CRC loops involves two stages: fast checks to identify
        potential candidates and verification using symbolic execution. The
        algorithm detects only CRCs (8, 16, 32, and 64 bits, both bit-forward
        and bit-reversed) with constant polynomials used without the leading
        1. This part can be improved to detect more implementation types.
      - Once identified, the CRC loops are replaced with calls to newly
        added internal functions. These internal functions use target-specific
        expanders if available, otherwise generating table-based CRCs.
   2. Architecture-Specific Expanders:
      - Expanders are added for RISC-V, aarch64, and i386 architectures.
      - These expanders generate CRCs using either carry-less
        multiplication instructions or direct CRC instructions, based on
        the target architecture's capabilities.
   3. New Internal and Built-In Functions:
      - Introduces internal functions and built-in functions for CRC
        generation, supporting various CRC and data sizes (8, 16, 32,
        and 64 bits).
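As a rough illustration of the two forms mentioned above (this is not code
from the patch set), here is a bit-forward, bitwise CRC-8 loop of the kind the
detection looks for, next to a table-based equivalent of the kind generated
as a fallback. The polynomial 0x07 and the names crc8_bitwise,
crc8_table_init and crc8_tablewise are assumptions chosen for the sketch.

```c
#include <stdint.h>

/* Assumed polynomial x^8 + x^2 + x + 1, written without the leading 1.  */
#define CRC8_POLY 0x07u

/* Bit-forward, bitwise CRC-8 update for one data byte (loop with branches).  */
static uint8_t crc8_bitwise(uint8_t crc, uint8_t data)
{
    crc ^= data;
    for (int i = 0; i < 8; i++) {
        if (crc & 0x80)
            crc = (uint8_t)((crc << 1) ^ CRC8_POLY);
        else
            crc = (uint8_t)(crc << 1);
    }
    return crc;
}

/* 256-entry lookup table: entry i holds the CRC of the single byte i.  */
static uint8_t crc8_table[256];

static void crc8_table_init(void)
{
    for (unsigned i = 0; i < 256; i++)
        crc8_table[i] = crc8_bitwise(0, (uint8_t)i);
}

/* Table-based update: one lookup replaces the eight loop iterations.  */
static uint8_t crc8_tablewise(uint8_t crc, uint8_t data)
{
    return crc8_table[crc ^ data];
}
```

Both update functions return the same value for every (crc, data) pair, which
is what makes replacing the loop with a table lookup safe in this simple case.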

I presented this work during the GNU Tools Cauldron 2023. You can view the
presentation here: GCC CRC optimization presentation.

Previously, I submitted a patch to GCC upstream that included built-in
parts and expanders for RISC-V. However, the main component of the
previously sent patch has been changed. You can find the patch here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626279.html


Best regards,
Mariam


Re: [PATCH] MATCH: Look through VIEW_CONVERT when folding VEC_PERM_EXPRs.

2024-05-24 Thread Richard Biener
On Fri, 24 May 2024, Manolis Tsamis wrote:

> On Fri, May 24, 2024 at 10:46 AM Richard Biener  wrote:
> >
> > On Fri, 24 May 2024, Manolis Tsamis wrote:
> >
> > > On Fri, May 24, 2024 at 9:31 AM Richard Biener  wrote:
> > > >
> > > > On Wed, 22 May 2024, Manolis Tsamis wrote:
> > > >
> > > > > The match.pd patterns to merge two vector permutes into one fail when 
> > > > > a
> > > > > potentially no-op view convert expressions is between the two 
> > > > > permutes.
> > > > > This change lifts this restriction.
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > >   * match.pd: Allow no-op view_convert between permutes.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > >   * gcc.dg/fold-perm-2.c: New test.
> > > > >
> > > > > Signed-off-by: Manolis Tsamis 
> > > > > ---
> > > > >
> > > > >  gcc/match.pd   | 14 --
> > > > >  gcc/testsuite/gcc.dg/fold-perm-2.c | 16 
> > > > >  2 files changed, 24 insertions(+), 6 deletions(-)
> > > > >  create mode 100644 gcc/testsuite/gcc.dg/fold-perm-2.c
> > > > >
> > > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > > index 07e743ae464..cbb3c5d86e0 100644
> > > > > --- a/gcc/match.pd
> > > > > +++ b/gcc/match.pd
> > > > > @@ -10039,19 +10039,21 @@ and,
> > > > >   d = VEC_PERM_EXPR ;  */
> > > > >
> > > > >  (simplify
> > > > > - (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
> > > > > + (vec_perm (view_convert?@0 (vec_perm@1 @2 @3 VECTOR_CST@4)) @0 
> > > > > VECTOR_CST@5)
> > > > >   (if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
> > > > >(with
> > > > > {
> > > > >   machine_mode result_mode = TYPE_MODE (type);
> > > > > - machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
> > > > > + machine_mode op_mode = TYPE_MODE (TREE_TYPE (@2));
> > > > >   int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
> > > > >   vec_perm_builder builder0;
> > > > >   vec_perm_builder builder1;
> > > > >   vec_perm_builder builder2 (nelts, nelts, 1);
> > > > > }
> > > > > > -   (if (tree_to_vec_perm_builder (&builder0, @3)
> > > > > > - && tree_to_vec_perm_builder (&builder1, @4))
> > > > > > +   (if (tree_to_vec_perm_builder (&builder0, @4)
> > > > > > + && tree_to_vec_perm_builder (&builder1, @5)
> > > > > + && element_precision (TREE_TYPE (@0))
> > > > > +== element_precision (TREE_TYPE (@1)))
> > > >
> > > > I think you want to check TYPE_SIZE (TREE_TYPE (@0/@1)) for equality
> > > > instead.
> > > >
> > >
> > > I think TYPE_SIZE is not enough as we need the vector elements to have
> > > the same size, not just the vector as a whole.
> >
> > Err, yes - you want to check the element sizes of course.
> >
> From what I understand, checking the element size should be enough.
> Otherwise we can check both TYPE_SIZE and element_precision to be
> equal.
> So OK to commit with just element_precision?

Please just check the element size.  I'm always worried when
using TYPE_PRECISION on FP types and for shuffles it's really
only about size.
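
Since a shuffle only moves whole elements, merging the two permutes comes
down to composing their selectors, which is valid only when both operate on
elements of the same size. A minimal sketch in plain C (not GCC internals;
compose_selectors is a hypothetical helper), which can be tried with the
selectors from the new fold-perm-2.c test:

```c
#include <stddef.h>

/* Compose the selectors of
     c = perm (a, b, sel0);
     d = perm (c, c, sel1);
   into OUT such that d = perm (a, b, out).
   Indices 0..n-1 pick from the first operand, n..2n-1 from the second.
   Both operands of the outer permute are c, so its indices are reduced
   modulo n before looking up sel0.  */
static void compose_selectors(const unsigned *sel0, const unsigned *sel1,
                              unsigned *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = sel0[sel1[i] % n];
}
```

With sel0 = { 0, 5, 2, 7 } and sel1 = { 2, 3, 1, 0 } this gives
{ 2, 7, 5, 0 }, the selector the test scans for.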

> BTW I also noticed from these testcases that there is a gcc 13 -> 14
> regression with weird XORs being introduced:
> 
> typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
> void fun (veci *a, veci *b, veci *c) {
>   *c = __builtin_shufflevector (*a, *b, 0, 5, 2, 7);
> }
> 
> gcc 13.3:
>   adrp x3, .LC0
>   ldr q0, [x0]
>   ldr q1, [x1]
>   ldr q2, [x3, #:lo12:.LC0]
>   tbl v0.16b, {v0.16b - v1.16b}, v2.16b
>   str q0, [x2]
> 
> gcc 14.1:
>   ldr q30, [x1]
>   adrp x3, .LC0
>   ldr q31, [x0]
>   ldr q29, [x3, #:lo12:.LC0]
>   eor v31.16b, v31.16b, v30.16b
>   eor v30.16b, v31.16b, v30.16b
>   eor v31.16b, v31.16b, v30.16b
>   tbl v30.16b, {v30.16b - v31.16b}, v29.16b
>   str q30, [x2]

You'd need to bisect that but I'd guess we got some extra
match patterns triggering?

> Manolis
> 
> > > For example, when using the TYPE_SIZE check instead the following
> > > testcase miscompiles
> > >
> > > typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
> > > typedef double vecd __attribute__ ((vector_size (2 * sizeof (double))));
> > >
> > > void fun (veci *a, veci *b, veci *c)
> > > {
> > >   char data[16];
> > >   veci r1 = __builtin_shufflevector (*a, *b, 0, 5, 2, 7);
> > >   vecd r2;
> > >   __builtin_memcpy(data, &r1, sizeof(veci));
> > >   __builtin_memcpy(&r2, data, sizeof(vecd));
> > >   vecd r3 = __builtin_shufflevector (r2, r2, 1, 0);
> > >   __builtin_memcpy(data, &r3, sizeof(vecd));
> > >   __builtin_memcpy(c, data, sizeof(veci));
> > > }
> > >
> > > To:
> > >
> > > ldr q31, [x0]
> > > rev64   v31.4s, v31.4s
> > > str q31, [x2]
> > > ret
> > >
> > > > Otherwise OK.
> > > >
> > > > Thanks,
> > > > Richard.
> > > >
> > > > >  (with
> > > > >   {
> > > > > vec_perm_indices sel0 (builder0, 2, nelts);
> > > > > @@ -10073,10 +10075,10 @@ and,
> > > > >  ? (!can_vec_perm_const_p (result_mode, op_mode, sel0, 
> > > > > false)
> > > > > || !can_vec_perm_const_p (result_mode, 

Re: [PATCH] MATCH: Look through VIEW_CONVERT when folding VEC_PERM_EXPRs.

2024-05-24 Thread Manolis Tsamis
On Fri, May 24, 2024 at 10:46 AM Richard Biener  wrote:
>
> On Fri, 24 May 2024, Manolis Tsamis wrote:
>
> > On Fri, May 24, 2024 at 9:31 AM Richard Biener  wrote:
> > >
> > > On Wed, 22 May 2024, Manolis Tsamis wrote:
> > >
> > > > The match.pd patterns to merge two vector permutes into one fail when a
> > > > potentially no-op view convert expressions is between the two permutes.
> > > > This change lifts this restriction.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >   * match.pd: Allow no-op view_convert between permutes.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >   * gcc.dg/fold-perm-2.c: New test.
> > > >
> > > > Signed-off-by: Manolis Tsamis 
> > > > ---
> > > >
> > > >  gcc/match.pd   | 14 --
> > > >  gcc/testsuite/gcc.dg/fold-perm-2.c | 16 
> > > >  2 files changed, 24 insertions(+), 6 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.dg/fold-perm-2.c
> > > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > index 07e743ae464..cbb3c5d86e0 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -10039,19 +10039,21 @@ and,
> > > >   d = VEC_PERM_EXPR ;  */
> > > >
> > > >  (simplify
> > > > - (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
> > > > + (vec_perm (view_convert?@0 (vec_perm@1 @2 @3 VECTOR_CST@4)) @0 
> > > > VECTOR_CST@5)
> > > >   (if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
> > > >(with
> > > > {
> > > >   machine_mode result_mode = TYPE_MODE (type);
> > > > - machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
> > > > + machine_mode op_mode = TYPE_MODE (TREE_TYPE (@2));
> > > >   int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
> > > >   vec_perm_builder builder0;
> > > >   vec_perm_builder builder1;
> > > >   vec_perm_builder builder2 (nelts, nelts, 1);
> > > > }
> > > > > -   (if (tree_to_vec_perm_builder (&builder0, @3)
> > > > > - && tree_to_vec_perm_builder (&builder1, @4))
> > > > > +   (if (tree_to_vec_perm_builder (&builder0, @4)
> > > > > + && tree_to_vec_perm_builder (&builder1, @5)
> > > > + && element_precision (TREE_TYPE (@0))
> > > > +== element_precision (TREE_TYPE (@1)))
> > >
> > > I think you want to check TYPE_SIZE (TREE_TYPE (@0/@1)) for equality
> > > instead.
> > >
> >
> > I think TYPE_SIZE is not enough as we need the vector elements to have
> > the same size, not just the vector as a whole.
>
> Err, yes - you want to check the element sizes of course.
>
From what I understand, checking the element size should be enough.
Otherwise we can check both TYPE_SIZE and element_precision to be
equal.
So OK to commit with just element_precision?

BTW I also noticed from these testcases that there is a gcc 13 -> 14
regression with weird XORs being introduced:

typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
void fun (veci *a, veci *b, veci *c) {
  *c = __builtin_shufflevector (*a, *b, 0, 5, 2, 7);
}

gcc 13.3:
  adrp x3, .LC0
  ldr q0, [x0]
  ldr q1, [x1]
  ldr q2, [x3, #:lo12:.LC0]
  tbl v0.16b, {v0.16b - v1.16b}, v2.16b
  str q0, [x2]

gcc 14.1:
  ldr q30, [x1]
  adrp x3, .LC0
  ldr q31, [x0]
  ldr q29, [x3, #:lo12:.LC0]
  eor v31.16b, v31.16b, v30.16b
  eor v30.16b, v31.16b, v30.16b
  eor v31.16b, v31.16b, v30.16b
  tbl v30.16b, {v30.16b - v31.16b}, v29.16b
  str q30, [x2]

Manolis

> > For example, when using the TYPE_SIZE check instead the following
> > testcase miscompiles
> >
> > typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
> > typedef double vecd __attribute__ ((vector_size (2 * sizeof (double))));
> >
> > void fun (veci *a, veci *b, veci *c)
> > {
> >   char data[16];
> >   veci r1 = __builtin_shufflevector (*a, *b, 0, 5, 2, 7);
> >   vecd r2;
> >   __builtin_memcpy(data, &r1, sizeof(veci));
> >   __builtin_memcpy(&r2, data, sizeof(vecd));
> >   vecd r3 = __builtin_shufflevector (r2, r2, 1, 0);
> >   __builtin_memcpy(data, &r3, sizeof(vecd));
> >   __builtin_memcpy(c, data, sizeof(veci));
> > }
> >
> > To:
> >
> > ldr q31, [x0]
> > rev64   v31.4s, v31.4s
> > str q31, [x2]
> > ret
> >
> > > Otherwise OK.
> > >
> > > Thanks,
> > > Richard.
> > >
> > > >  (with
> > > >   {
> > > > vec_perm_indices sel0 (builder0, 2, nelts);
> > > > @@ -10073,10 +10075,10 @@ and,
> > > >  ? (!can_vec_perm_const_p (result_mode, op_mode, sel0, 
> > > > false)
> > > > || !can_vec_perm_const_p (result_mode, op_mode, sel1, 
> > > > false))
> > > >  : !can_vec_perm_const_p (result_mode, op_mode, sel1, 
> > > > false)))
> > > > -  op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
> > > > +  op0 = vec_perm_indices_to_tree (TREE_TYPE (@5), sel2);
> > > >   }
> > > >   (if (op0)
> > > > -  (vec_perm @1 @2 { op0; })))
> > > > +  (view_convert (vec_perm @2 @3 { op0; }
> > > >
> > > >  /* Merge
> > > >   c = VEC_PERM_EXPR ;
> > > > diff --git a/gcc/testsuite/gcc.dg/fold-perm-2.c 
> > > > 

Re: [PATCH][14 backport] c++: Fix instantiation of imported temploid friends [PR114275]

2024-05-24 Thread Nathaniel Shead
On Thu, May 23, 2024 at 06:41:06PM -0400, Jason Merrill wrote:
> On 5/13/24 07:56, Nathaniel Shead wrote:
> > > > @@ -11751,9 +11767,16 @@ tsubst_friend_class (tree friend_tmpl, tree 
> > > > args)
> > > >  if (tmpl != error_mark_node)
> > > > {
> > > >   /* The new TMPL is not an instantiation of anything, so we
> > > > -forget its origins.  We don't reset CLASSTYPE_TI_TEMPLATE
> > > > +forget its origins.  It is also not a specialization of
> > > > +anything.  We don't reset CLASSTYPE_TI_TEMPLATE
> > > >  for the new type because that is supposed to be the
> > > >  corresponding template decl, i.e., TMPL.  */
> > > > + spec_entry elt;
> > > > + elt.tmpl = friend_tmpl;
> > > > + elt.args = CLASSTYPE_TI_ARGS (TREE_TYPE (tmpl));
> > > > + elt.spec = TREE_TYPE (tmpl);
> > > > > + type_specializations->remove_elt (&elt);
> > > 
> > > For GCC 14.2 let's guard this with if (modules_p ()); for GCC 15 it can be
> > > unconditional.  OK.
> > 
> > I'm looking to backport this patch to GCC 14 now that it's been on trunk
> > some time.  Here's the patch I'm aiming to add (squashed with the
> > changes from r15-220-gec2365e07537e8) after cherrypicking the
> > prerequisite commit r15-58-g2faf040335f9b4; is this OK?
> > 
> > Or should I keep it as two separate commits to make the cherrypicking
> > more obvious? Not entirely sure on the etiquette around this.
> 
> It's OK to squash them, but it's typical to use -x (directly or via git
> gcc-backport) to mention where a branch change was cherry-picked from, and
> in this case it would make sense to edit in the second commit so it's clear
> the backport includes both.  OK that way.
> 
> Jason
> 

Sorry, still a bit confused :)  Do you mean to merge the two commits
together such that there are two "cherry picked from commit ..."s in the
commit message?  Or just list the second commit, and mention in the
commit message that it includes both?

Nathaniel


[PATCH] Fix typo in the testcase.

2024-05-24 Thread liuhongt
Committed as an obvious patch.

gcc/testsuite/ChangeLog:

PR target/114148
* gcc.target/i386/pr106010-7b.c: Refine testcase.
---
 gcc/testsuite/gcc.target/i386/pr106010-7b.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr106010-7b.c 
b/gcc/testsuite/gcc.target/i386/pr106010-7b.c
index 26482cc10f5..917e56e45f7 100644
--- a/gcc/testsuite/gcc.target/i386/pr106010-7b.c
+++ b/gcc/testsuite/gcc.target/i386/pr106010-7b.c
@@ -34,11 +34,11 @@ avx_test (void)
 p_init[i] = i % 2 + 3;
 
   memcpy (pd_src, p_init, 2 * N * sizeof (double));
-  memcpy (ps_dst, p_init, 2 * N * sizeof (float));
-  memcpy (epi64_dst, p_init, 2 * N * sizeof (long long));
-  memcpy (epi32_dst, p_init, 2 * N * sizeof (int));
-  memcpy (epi16_dst, p_init, 2 * N * sizeof (short));
-  memcpy (epi8_dst, p_init, 2 * N * sizeof (char));
+  memcpy (ps_src, p_init, 2 * N * sizeof (float));
+  memcpy (epi64_src, p_init, 2 * N * sizeof (long long));
+  memcpy (epi32_src, p_init, 2 * N * sizeof (int));
+  memcpy (epi16_src, p_init, 2 * N * sizeof (short));
+  memcpy (epi8_src, p_init, 2 * N * sizeof (char));
 
   foo_pd (pd_dst, pd_src[0]);
   foo_ps (ps_dst, ps_src[0]);
-- 
2.31.1



RE: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

2024-05-24 Thread Li, Pan2
Thanks Richard for help and comments.

If my understanding is correct, I should be able to follow the steps below to
support the branch form for unsigned SAT_ADD.

1. Build a helper in one place to match a PHI def as COND_EXPR (or, even
better, provide native support for this in genmatch).
2. Leverage this helper in widen-mul and recognize it as .SAT_ADD if it matches.
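
For reference, a small sketch of the source-level forms in question, written
for uint8_t (the function names are hypothetical, not taken from the
testsuite). The first two are branch forms that end up as a PHI in GIMPLE;
the last is a branchless form that computes the same result.

```c
#include <stdint.h>

/* Branch form using the overflow builtin.  */
static uint8_t sat_add_u8_overflow(uint8_t x, uint8_t y)
{
    uint8_t sum;
    if (__builtin_add_overflow(x, y, &sum))
        return UINT8_MAX;
    return sum;
}

/* Branch form with an explicit compare; the two returns correspond to
   a PHI like PHI <255, _1> after the wrapped addition.  */
static uint8_t sat_add_u8_branch(uint8_t x, uint8_t y)
{
    uint8_t sum = (uint8_t)(x + y);
    if (sum >= x)
        return sum;
    return UINT8_MAX;
}

/* Branchless form: OR the wrapped sum with an all-ones mask when the
   addition wrapped around.  */
static uint8_t sat_add_u8_branchless(uint8_t x, uint8_t y)
{
    uint8_t sum = (uint8_t)(x + y);
    uint8_t mask = (uint8_t)-(sum < x);
    return (uint8_t)(sum | mask);
}
```

All three agree for every pair of inputs, so recognizing any of them as
.SAT_ADD would not change the result.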

Will have a try and keep you posted.

Pan

-Original Message-
From: Richard Biener  
Sent: Friday, May 24, 2024 3:21 PM
To: Li, Pan2 
Cc: Jeff Law ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com; 
pins...@gmail.com
Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for 
unsigned SAT_ADD

On Fri, May 24, 2024 at 8:56 AM Richard Biener
 wrote:
>
> On Fri, May 24, 2024 at 8:37 AM Li, Pan2  wrote:
> >
> > Thanks Jeff and Richard for suggestion and reviewing.
> >
> > Have another try in phiopt to do the convert from PHI to stmt = cond ? a : 
> > b.
> > It can perform the convert from PHI to stmt = cond ? a : b successfully, 
> > and then
> > the widen-mul is able to do the recog to .SAT_ADD.
> >
> > For now, to limit the risk, the above convert from PHI to stmt = cond ? a 
> > : b only be performed when matched,
> > as well as the backend support the usadd standard name. Unfortunately, I am 
> > stuck in the case that when the lhs
> > is not matched, we need to clean up something like created stmt in 
> > previous, or we will have ICE for missing definition.
> >
> > sat_add.c: In function ‘sat_add_u_3_uint8_t’:
> > sat_add.c:69:1: error: missing definition
> >69 | SAT_ADD_U_3(uint8_t);
> >   | ^~~
> > for SSA_NAME: _6 in statement:
> > # VUSE <.MEM_14(D)>
> > return _6;
> > during GIMPLE pass: phiopt
> > dump file: sat_add.c.046t.phiopt1
> > sat_add.c:69:1: internal compiler error: verify_ssa failed
> > 0x1db41ba verify_ssa(bool, bool
> > /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/tree-ssa.cc:1203
> > 0x18e3075 execute_function_todo
> > 
> > /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:2096
> > 0x18e1c52 do_per_function
> > 
> > /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:1688
> > 0x18e3222 execute_todo
> >
> > I bet the reason is that we created new stmt like stmt_cond and stmt_val 
> > but we don't insert it.
> > Thus, there will be orphan nodes somewhere and we need something like 
> > rollback to recover the
> > gimple up to a point. I tried sorts of release_xx or likewise but seems not 
> > working.
> >
> > So is there any suggest to take care of such gimple rollback or another 
> > solution for this? Below are
> > The function to perform the convert from PHI to stmt = cond ? a : b for 
> > reference, thanks a lot.
> >
> > Pan
> >
> > diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> > index 918cf50b589..7982b65bac4 100644
> > --- a/gcc/tree-ssa-phiopt.cc
> > +++ b/gcc/tree-ssa-phiopt.cc
> > @@ -486,6 +486,88 @@ phiopt_early_allow (gimple_seq &seq, gimple_match_op &op)
> >  }
> >  }
> >
> > +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> > +
> > +/* Try to match the phi expr to the gimple cond. Return true if we can
> > +   perform the convert or return false.  There will be some restrictions
> > +   or such kind of conversion, aka:
> > +
> > +   1. Only selected pattern will try this convert.
> > +   2. The generated gassign matched the selected IFN pattern.
> > +   3. The backend has implement the standard name.
> > +
> > +   From:
> > +  :
> > + _1 = x_3(D) + y_4(D);
> > + if (_1 >= x_3(D))
> > +   goto ; [INV]
> > + else
> > +   goto ; [INV]
> > +
> > +  :
> > +
> > +  :
> > + # _2 = PHI <255(2), _1(3)>
> > +
> > +   To:
> > +  :
> > + _1 = x_3(D) + y_4(D);
> > + phi_cond_6 = _1 >= x_3(D);
> > + _2 = phi_cond_6 ? _1 : 255; */
> > +
> > +static bool
> > +match_phi_to_gimple_cond (basic_block cond_bb, gphi *phi, tree arg0, tree 
> > arg1)
>
> You should do this in widen-mult and/or ISEL and if necessary for 
> vectorization
> in tree-if-conv.cc, though eventually what if-convert creates might be
> good enough
> to match during pattern recognition.
>
> > +{
> > +  gcond *cond = as_a <gcond *> (*gsi_last_bb (cond_bb));
> > +
> > +  if (!cond)
> > +return false;
> > +
> > +  enum tree_code code = gimple_cond_code (cond);
> > +  tree phi_result = gimple_phi_result (phi);
> > +  tree cond_tree = make_temp_ssa_name (boolean_type_node, NULL, 
> > "phi_cond");
> > +  tree cmp_tree = build2 (code, boolean_type_node, gimple_cond_lhs (cond),
> > + gimple_cond_rhs (cond));
> > +  tree rhs = build3 (COND_EXPR, TREE_TYPE (phi_result), cond_tree, arg0, 
> > arg1);
>
> phiopt directly uses cmp_tree, so you could do that as well and avoid 
> stmt_cond.
>
> > +
> > +  gassign *stmt_cond = gimple_build_assign (cond_tree, cmp_tree);
> > +  

Re: [PATCH] MATCH: Look through VIEW_CONVERT when folding VEC_PERM_EXPRs.

2024-05-24 Thread Richard Biener
On Fri, 24 May 2024, Manolis Tsamis wrote:

> On Fri, May 24, 2024 at 9:31 AM Richard Biener  wrote:
> >
> > On Wed, 22 May 2024, Manolis Tsamis wrote:
> >
> > > The match.pd patterns to merge two vector permutes into one fail when a
> > > potentially no-op view convert expressions is between the two permutes.
> > > This change lifts this restriction.
> > >
> > > gcc/ChangeLog:
> > >
> > >   * match.pd: Allow no-op view_convert between permutes.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.dg/fold-perm-2.c: New test.
> > >
> > > Signed-off-by: Manolis Tsamis 
> > > ---
> > >
> > >  gcc/match.pd   | 14 --
> > >  gcc/testsuite/gcc.dg/fold-perm-2.c | 16 
> > >  2 files changed, 24 insertions(+), 6 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/fold-perm-2.c
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index 07e743ae464..cbb3c5d86e0 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -10039,19 +10039,21 @@ and,
> > >   d = VEC_PERM_EXPR ;  */
> > >
> > >  (simplify
> > > - (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
> > > + (vec_perm (view_convert?@0 (vec_perm@1 @2 @3 VECTOR_CST@4)) @0 
> > > VECTOR_CST@5)
> > >   (if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
> > >(with
> > > {
> > >   machine_mode result_mode = TYPE_MODE (type);
> > > - machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
> > > + machine_mode op_mode = TYPE_MODE (TREE_TYPE (@2));
> > >   int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
> > >   vec_perm_builder builder0;
> > >   vec_perm_builder builder1;
> > >   vec_perm_builder builder2 (nelts, nelts, 1);
> > > }
> > > -   (if (tree_to_vec_perm_builder (&builder0, @3)
> > > - && tree_to_vec_perm_builder (&builder1, @4))
> > > +   (if (tree_to_vec_perm_builder (&builder0, @4)
> > > + && tree_to_vec_perm_builder (&builder1, @5)
> > > + && element_precision (TREE_TYPE (@0))
> > > +== element_precision (TREE_TYPE (@1)))
> >
> > I think you want to check TYPE_SIZE (TREE_TYPE (@0/@1)) for equality
> > instead.
> >
> 
> I think TYPE_SIZE is not enough as we need the vector elements to have
> the same size, not just the vector as a whole.

Err, yes - you want to check the element sizes of course.

> For example, when using the TYPE_SIZE check instead the following
> testcase miscompiles
> 
> typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
> typedef double vecd __attribute__ ((vector_size (2 * sizeof (double))));
> 
> void fun (veci *a, veci *b, veci *c)
> {
>   char data[16];
>   veci r1 = __builtin_shufflevector (*a, *b, 0, 5, 2, 7);
>   vecd r2;
>   __builtin_memcpy(data, &r1, sizeof(veci));
>   __builtin_memcpy(&r2, data, sizeof(vecd));
>   vecd r3 = __builtin_shufflevector (r2, r2, 1, 0);
>   __builtin_memcpy(data, &r3, sizeof(vecd));
>   __builtin_memcpy(c, data, sizeof(veci));
> }
> 
> To:
> 
> ldr q31, [x0]
> rev64   v31.4s, v31.4s
> str q31, [x2]
> ret
> 
> > Otherwise OK.
> >
> > Thanks,
> > Richard.
> >
> > >  (with
> > >   {
> > > vec_perm_indices sel0 (builder0, 2, nelts);
> > > @@ -10073,10 +10075,10 @@ and,
> > >  ? (!can_vec_perm_const_p (result_mode, op_mode, sel0, false)
> > > || !can_vec_perm_const_p (result_mode, op_mode, sel1, 
> > > false))
> > >  : !can_vec_perm_const_p (result_mode, op_mode, sel1, false)))
> > > -  op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
> > > +  op0 = vec_perm_indices_to_tree (TREE_TYPE (@5), sel2);
> > >   }
> > >   (if (op0)
> > > -  (vec_perm @1 @2 { op0; })))
> > > +  (view_convert (vec_perm @2 @3 { op0; }
> > >
> > >  /* Merge
> > >   c = VEC_PERM_EXPR ;
> > > diff --git a/gcc/testsuite/gcc.dg/fold-perm-2.c 
> > > b/gcc/testsuite/gcc.dg/fold-perm-2.c
> > > new file mode 100644
> > > index 000..1a4ab4065de
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/fold-perm-2.c
> > > @@ -0,0 +1,16 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O -fdump-tree-fre1" } */
> > > +
> > > +typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
> > > +typedef unsigned int vecu __attribute__ ((vector_size (4 * sizeof 
> > > (unsigned int))));
> > > +
> > > +void fun (veci *a, veci *b, veci *c)
> > > +{
> > > +  veci r1 = __builtin_shufflevector (*a, *b, 0, 5, 2, 7);
> > > +  vecu r2 = __builtin_convertvector (r1, vecu);
> > > +  vecu r3 = __builtin_shufflevector (r2, r2, 2, 3, 1, 0);
> > > +  *c = __builtin_convertvector (r3, veci);
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 7, 5, 0 }" "fre1" } 
> > > } */
> > > +/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "fre1" } } */
> > >
> >
> > --
> > Richard Biener 
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener 
SUSE Software 

Re: [PATCH] MATCH: Look through VIEW_CONVERT when folding VEC_PERM_EXPRs.

2024-05-24 Thread Manolis Tsamis
On Fri, May 24, 2024 at 9:31 AM Richard Biener  wrote:
>
> On Wed, 22 May 2024, Manolis Tsamis wrote:
>
> > The match.pd patterns to merge two vector permutes into one fail when a
> > potentially no-op view convert expressions is between the two permutes.
> > This change lifts this restriction.
> >
> > gcc/ChangeLog:
> >
> >   * match.pd: Allow no-op view_convert between permutes.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/fold-perm-2.c: New test.
> >
> > Signed-off-by: Manolis Tsamis 
> > ---
> >
> >  gcc/match.pd   | 14 --
> >  gcc/testsuite/gcc.dg/fold-perm-2.c | 16 
> >  2 files changed, 24 insertions(+), 6 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/fold-perm-2.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 07e743ae464..cbb3c5d86e0 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -10039,19 +10039,21 @@ and,
> >   d = VEC_PERM_EXPR ;  */
> >
> >  (simplify
> > - (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
> > + (vec_perm (view_convert?@0 (vec_perm@1 @2 @3 VECTOR_CST@4)) @0 
> > VECTOR_CST@5)
> >   (if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
> >(with
> > {
> >   machine_mode result_mode = TYPE_MODE (type);
> > - machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
> > + machine_mode op_mode = TYPE_MODE (TREE_TYPE (@2));
> >   int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
> >   vec_perm_builder builder0;
> >   vec_perm_builder builder1;
> >   vec_perm_builder builder2 (nelts, nelts, 1);
> > }
> > -   (if (tree_to_vec_perm_builder (&builder0, @3)
> > - && tree_to_vec_perm_builder (&builder1, @4))
> > +   (if (tree_to_vec_perm_builder (&builder0, @4)
> > + && tree_to_vec_perm_builder (&builder1, @5)
> > + && element_precision (TREE_TYPE (@0))
> > +== element_precision (TREE_TYPE (@1)))
>
> I think you want to check TYPE_SIZE (TREE_TYPE (@0/@1)) for equality
> instead.
>

I think TYPE_SIZE is not enough as we need the vector elements to have
the same size, not just the vector as a whole.
For example, when using the TYPE_SIZE check instead the following
testcase miscompiles

typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
typedef double vecd __attribute__ ((vector_size (2 * sizeof (double))));

void fun (veci *a, veci *b, veci *c)
{
  char data[16];
  veci r1 = __builtin_shufflevector (*a, *b, 0, 5, 2, 7);
  vecd r2;
  __builtin_memcpy(data, &r1, sizeof(veci));
  __builtin_memcpy(&r2, data, sizeof(vecd));
  vecd r3 = __builtin_shufflevector (r2, r2, 1, 0);
  __builtin_memcpy(data, &r3, sizeof(vecd));
  __builtin_memcpy(c, data, sizeof(veci));
}

To:

ldr q31, [x0]
rev64   v31.4s, v31.4s
str q31, [x2]
ret
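
To spell out why the element size matters here, the following plain C sketch
(no vector extensions; swap_halves_as_i32 is a hypothetical helper) applies
the { 1, 0 } shuffle on a 2 x 64-bit view of a 16-byte vector and reads the
result back as 4 x 32-bit lanes:

```c
#include <stdint.h>
#include <string.h>

/* Swap the two 64-bit halves of a 16-byte vector, which is what a
   { 1, 0 } shuffle does on a 2 x 64-bit view, and return the result
   viewed as 4 x 32-bit lanes.  */
static void swap_halves_as_i32(const int32_t in[4], int32_t out[4])
{
    unsigned char data[16], swapped[16];

    memcpy(data, in, sizeof data);
    memcpy(swapped, data + 8, 8);   /* upper half moves to the front */
    memcpy(swapped + 8, data, 8);   /* lower half moves to the back */
    memcpy(out, swapped, sizeof swapped);
}
```

In 32-bit lanes this is the selector { 2, 3, 0, 1 }, not { 1, 0 }, so the
selector of the second permute cannot be composed with the first one as is
when the element sizes differ, even though both vectors are 16 bytes.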

> Otherwise OK.
>
> Thanks,
> Richard.
>
> >  (with
> >   {
> > vec_perm_indices sel0 (builder0, 2, nelts);
> > @@ -10073,10 +10075,10 @@ and,
> >  ? (!can_vec_perm_const_p (result_mode, op_mode, sel0, false)
> > || !can_vec_perm_const_p (result_mode, op_mode, sel1, 
> > false))
> >  : !can_vec_perm_const_p (result_mode, op_mode, sel1, false)))
> > -  op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
> > +  op0 = vec_perm_indices_to_tree (TREE_TYPE (@5), sel2);
> >   }
> >   (if (op0)
> > -  (vec_perm @1 @2 { op0; })))
> > +  (view_convert (vec_perm @2 @3 { op0; }
> >
> >  /* Merge
> >   c = VEC_PERM_EXPR ;
> > diff --git a/gcc/testsuite/gcc.dg/fold-perm-2.c 
> > b/gcc/testsuite/gcc.dg/fold-perm-2.c
> > new file mode 100644
> > index 000..1a4ab4065de
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/fold-perm-2.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O -fdump-tree-fre1" } */
> > +
> > +typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
> > +typedef unsigned int vecu __attribute__ ((vector_size (4 * sizeof 
> > (unsigned int))));
> > +
> > +void fun (veci *a, veci *b, veci *c)
> > +{
> > +  veci r1 = __builtin_shufflevector (*a, *b, 0, 5, 2, 7);
> > +  vecu r2 = __builtin_convertvector (r1, vecu);
> > +  vecu r3 = __builtin_shufflevector (r2, r2, 2, 3, 1, 0);
> > +  *c = __builtin_convertvector (r3, veci);
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 2, 7, 5, 0 }" "fre1" } } 
> > */
> > +/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "fre1" } } */
> >
>
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[V3 PATCH] Don't reduce estimated unrolled size for innermost loop.

2024-05-24 Thread liuhongt
Update in V3:
> Since this was about vectorization can you instead add a testcase to
> gcc.dg/vect/ and check for
> vectorization to happen?
Move to vect/pr112325.c.
>
> I believe the if (unr_insn <= 0) check can go as well.
Removed.

> as said, you want to do
>
>   curolli = false;
>
> after the above since we are iterating and for a subsequent unrolling
> of an outer loop
> of an unrolled inner loop we _do_ want to apply the 2/3 reduction
> since there's likely
> inter-loop redundancies exposed (as happens in SPEC calculix for example).
>
> Not sure if that changes any of the testsuite outcome - it possibly avoids the
> gcc.dg/vect/pr69783.c FAIL?
Yes, it avoids that, cunrolli is set to false when CHANGED is true.

> Not sure about the arm fallout.
It's the same reason as pr69783.c: there's subsequent unrolling of an outer loop
of an unrolled inner loop, and since the inner loop is completely unrolled,
outer_loop->inner is false and the outer loop escapes the check.
The change also fixes the 2 arm fallouts.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

For the innermost loop, complete unrolling will most likely not reduce the
body size to 2/3. The current 2/3 reduction causes some of the larger loops
to be completely unrolled during cunrolli, which then prevents them from
being vectorized. It also increases the register pressure.

The patch moves the 2/3 reduction from estimated_unrolled_size to
try_unroll_loop_completely.
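
A toy model of the resulting decision, assuming only what the ChangeLog below
states (the 2/3 reduction is applied when !cunrolli || loop->inner). The
function and parameter names are hypothetical and this is not the code in
tree-ssa-loop-ivcanon.cc:

```c
#include <stdbool.h>

/* Estimated size after complete unrolling.  The optimistic 2/3 reduction
   is kept for later unrolling passes and for loops that contain an inner
   loop, but not for innermost loops during cunrolli.  */
static unsigned long adjusted_unrolled_size(unsigned long unr_insns,
                                            bool cunrolli,
                                            bool has_inner_loop)
{
    if (!cunrolli || has_inner_loop)
        return unr_insns * 2 / 3;
    return unr_insns;
}

/* Compare the adjusted estimate against a size budget, standing in for
   param_max_completely_peeled_insns.  */
static bool fits_unroll_budget(unsigned long unr_insns, bool cunrolli,
                               bool has_inner_loop, unsigned long max_insns)
{
    return adjusted_unrolled_size(unr_insns, cunrolli, has_inner_loop)
           <= max_insns;
}
```

With an estimate of 300 insns and a budget of 200, an innermost loop in
cunrolli is no longer unrolled in this model, while the same loop seen by a
later pass still is.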

gcc/ChangeLog:

PR tree-optimization/112325
* tree-ssa-loop-ivcanon.cc (estimated_unrolled_size): Move the
2 / 3 loop body size reduction to ..
(try_unroll_loop_completely): .. here, add it for the check of
body size shrink, and the check of comparison against
param_max_completely_peeled_insns when
(!cunrolli ||loop->inner).
(canonicalize_loop_induction_variables): Add new parameter
cunrolli and pass down.
(tree_unroll_loops_completely_1): Ditto.
(canonicalize_induction_variables): Pass cunrolli as false to
canonicalize_loop_induction_variables.
(tree_unroll_loops_completely): Set cunrolli to true at
beginning and set it to false after CHANGED is true.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr112325.c: New test.
---
 gcc/testsuite/gcc.dg/vect/pr112325.c | 59 
 gcc/tree-ssa-loop-ivcanon.cc | 46 +++---
 2 files changed, 83 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr112325.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr112325.c 
b/gcc/testsuite/gcc.dg/vect/pr112325.c
new file mode 100644
index 000..71cf4099253
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr112325.c
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -funroll-loops -fdump-tree-vect-details" } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-mavx2" { target x86_64-*-* i?86-*-* } } */
+
+typedef unsigned short ggml_fp16_t;
+static float table_f32_f16[1 << 16];
+
+inline static float ggml_lookup_fp16_to_fp32(ggml_fp16_t f) {
+unsigned short s;
+__builtin_memcpy(&s, &f, sizeof(unsigned short));
+return table_f32_f16[s];
+}
+
+typedef struct {
+ggml_fp16_t d;
+ggml_fp16_t m;
+unsigned char qh[4];
+unsigned char qs[32 / 2];
+} block_q5_1;
+
+typedef struct {
+float d;
+float s;
+char qs[32];
+} block_q8_1;
+
+void ggml_vec_dot_q5_1_q8_1(const int n, float * restrict s, const void * restrict vx, const void * restrict vy) {
+const int qk = 32;
+const int nb = n / qk;
+
+const block_q5_1 * restrict x = vx;
+const block_q8_1 * restrict y = vy;
+
+float sumf = 0.0;
+
+for (int i = 0; i < nb; i++) {
+unsigned qh;
+__builtin_memcpy(&qh, x[i].qh, sizeof(qh));
+
+int sumi = 0;
+
+for (int j = 0; j < qk/2; ++j) {
+const unsigned char xh_0 = ((qh >> (j + 0)) << 4) & 0x10;
+const unsigned char xh_1 = ((qh >> (j + 12)) ) & 0x10;
+
+const int x0 = (x[i].qs[j] & 0xF) | xh_0;
+const int x1 = (x[i].qs[j] >> 4) | xh_1;
+
+sumi += (x0 * y[i].qs[j]) + (x1 * y[i].qs[j + qk/2]);
+}
+
+sumf += (ggml_lookup_fp16_to_fp32(x[i].d)*y[i].d)*sumi + ggml_lookup_fp16_to_fp32(x[i].m)*y[i].s;
+}
+
+*s = sumf;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
index bf017137260..216e81ef15f 100644
--- a/gcc/tree-ssa-loop-ivcanon.cc
+++ b/gcc/tree-ssa-loop-ivcanon.cc
@@ -437,11 +437,7 @@ tree_estimate_loop_size (class loop *loop, edge exit, edge edge_to_cancel,
It is (NUNROLL + 1) * size of loop body with taking into account
the fact that in last copy everything after exit conditional
is dead and that some instructions will be eliminated after
-   peeling.
-
-   Loop 

Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

2024-05-24 Thread Richard Biener
On Fri, May 24, 2024 at 8:56 AM Richard Biener
 wrote:
>
> On Fri, May 24, 2024 at 8:37 AM Li, Pan2  wrote:
> >
> > Thanks Jeff and Richard for suggestion and reviewing.
> >
> > Have another try in phiopt to do the convert from PHI to stmt = cond ? a : b.
> > It can perform the convert from PHI to stmt = cond ? a : b successfully, and then
> > the widen-mul is able to do the recog to .SAT_ADD.
> >
> > For now, to limit the risk, the above convert from PHI to stmt = cond ? a : b is only performed when matched,
> > and when the backend supports the usadd standard name. Unfortunately, I am stuck in the case that when the lhs
> > is not matched, we need to clean up something like the previously created stmt, or we will have an ICE for missing definition.
> >
> > sat_add.c: In function ‘sat_add_u_3_uint8_t’:
> > sat_add.c:69:1: error: missing definition
> >69 | SAT_ADD_U_3(uint8_t);
> >   | ^~~
> > for SSA_NAME: _6 in statement:
> > # VUSE <.MEM_14(D)>
> > return _6;
> > during GIMPLE pass: phiopt
> > dump file: sat_add.c.046t.phiopt1
> > sat_add.c:69:1: internal compiler error: verify_ssa failed
> > 0x1db41ba verify_ssa(bool, bool
> > /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/tree-ssa.cc:1203
> > 0x18e3075 execute_function_todo
> > /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:2096
> > 0x18e1c52 do_per_function
> > /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:1688
> > 0x18e3222 execute_todo
> >
> > I bet the reason is that we created new stmts like stmt_cond and stmt_val but we don't insert them.
> > Thus, there will be orphan nodes somewhere and we need something like a rollback to recover the
> > gimple up to a point. I tried sorts of release_xx or the like, but none seems to work.
> >
> > So is there any suggestion for handling such a gimple rollback, or another solution for this? Below is
> > the function to perform the convert from PHI to stmt = cond ? a : b for reference, thanks a lot.
> >
> > Pan
> >
> > diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> > index 918cf50b589..7982b65bac4 100644
> > --- a/gcc/tree-ssa-phiopt.cc
> > +++ b/gcc/tree-ssa-phiopt.cc
> > @@ -486,6 +486,88 @@ phiopt_early_allow (gimple_seq &seq, gimple_match_op &op)
> >  }
> >  }
> >
> > +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> > +
> > +/* Try to match the phi expr to the gimple cond.  Return true if we can
> > +   perform the conversion, false otherwise.  There are some restrictions
> > +   on this kind of conversion, namely:
> > +
> > +   1. Only selected patterns will try this conversion.
> > +   2. The generated gassign matches the selected IFN pattern.
> > +   3. The backend has implemented the standard name.
> > +
> > +   From:
> > +  <bb 2> :
> > + _1 = x_3(D) + y_4(D);
> > + if (_1 >= x_3(D))
> > +   goto <bb 3>; [INV]
> > + else
> > +   goto <bb 4>; [INV]
> > +
> > +  <bb 3> :
> > +
> > +  <bb 4> :
> > + # _2 = PHI <255(2), _1(3)>
> > +
> > +   To:
> > +  <bb 2> :
> > + _1 = x_3(D) + y_4(D);
> > + phi_cond_6 = _1 >= x_3(D);
> > + _2 = phi_cond_6 ? _1 : 255; */
> > +
> > +static bool
> > +match_phi_to_gimple_cond (basic_block cond_bb, gphi *phi, tree arg0, tree arg1)
>
> You should do this in widen-mult and/or ISEL and if necessary for vectorization
> in tree-if-conv.cc, though eventually what if-convert creates might be good enough
> to match during pattern recognition.
>
> > +{
> > +  gcond *cond = as_a <gcond *> (*gsi_last_bb (cond_bb));
> > +
> > +  if (!cond)
> > +return false;
> > +
> > +  enum tree_code code = gimple_cond_code (cond);
> > +  tree phi_result = gimple_phi_result (phi);
> > +  tree cond_tree = make_temp_ssa_name (boolean_type_node, NULL, "phi_cond");
> > +  tree cmp_tree = build2 (code, boolean_type_node, gimple_cond_lhs (cond),
> > + gimple_cond_rhs (cond));
> > +  tree rhs = build3 (COND_EXPR, TREE_TYPE (phi_result), cond_tree, arg0, arg1);
>
> phiopt directly uses cmp_tree, so you could do that as well and avoid stmt_cond.
>
> > +
> > +  gassign *stmt_cond = gimple_build_assign (cond_tree, cmp_tree);
> > +  gassign *stmt_val = gimple_build_assign (phi_result, rhs);
> > +
> > +  tree ops[2];
> > +  tree lhs = gimple_assign_lhs (stmt_val);
> > +  bool matched_p = (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
> > +&& direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs),
> > +  OPTIMIZE_FOR_BOTH));
> > +
> > +  if (matched_p)
> > +{
> > +  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
> > +  gimple_stmt_iterator psi = gsi_for_stmt (phi);
> > +
> > +  gsi_insert_before (&gsi, stmt_cond, GSI_SAME_STMT);
> > +  gsi_insert_before (&gsi, stmt_val, GSI_SAME_STMT);
> > +  remove_phi_node (&psi, false);
>
> You only matched but you do not insert the actual .SAT_ADD here and that's
> the 

Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

2024-05-24 Thread Richard Biener
On Fri, May 24, 2024 at 8:37 AM Li, Pan2  wrote:
>
> Thanks Jeff and Richard for suggestion and reviewing.
>
> Have another try in phiopt to do the convert from PHI to stmt = cond ? a : b.
> It can perform the convert from PHI to stmt = cond ? a : b successfully, and then
> the widen-mul is able to do the recog to .SAT_ADD.
>
> For now, to limit the risk, the above convert from PHI to stmt = cond ? a : b is only performed when matched,
> and when the backend supports the usadd standard name. Unfortunately, I am stuck in the case that when the lhs
> is not matched, we need to clean up something like the previously created stmt, or we will have an ICE for missing definition.
>
> sat_add.c: In function ‘sat_add_u_3_uint8_t’:
> sat_add.c:69:1: error: missing definition
>69 | SAT_ADD_U_3(uint8_t);
>   | ^~~
> for SSA_NAME: _6 in statement:
> # VUSE <.MEM_14(D)>
> return _6;
> during GIMPLE pass: phiopt
> dump file: sat_add.c.046t.phiopt1
> sat_add.c:69:1: internal compiler error: verify_ssa failed
> 0x1db41ba verify_ssa(bool, bool
> /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/tree-ssa.cc:1203
> 0x18e3075 execute_function_todo
> /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:2096
> 0x18e1c52 do_per_function
> /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:1688
> 0x18e3222 execute_todo
>
> I bet the reason is that we created new stmts like stmt_cond and stmt_val but we don't insert them.
> Thus, there will be orphan nodes somewhere and we need something like a rollback to recover the
> gimple up to a point. I tried sorts of release_xx or the like, but none seems to work.
>
> So is there any suggestion for handling such a gimple rollback, or another solution for this? Below is
> the function to perform the convert from PHI to stmt = cond ? a : b for reference, thanks a lot.
>
> Pan
>
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index 918cf50b589..7982b65bac4 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -486,6 +486,88 @@ phiopt_early_allow (gimple_seq &seq, gimple_match_op &op)
>  }
>  }
>
> +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
> +
> +/* Try to match the phi expr to the gimple cond.  Return true if we can
> +   perform the conversion, false otherwise.  There are some restrictions
> +   on this kind of conversion, namely:
> +
> +   1. Only selected patterns will try this conversion.
> +   2. The generated gassign matches the selected IFN pattern.
> +   3. The backend has implemented the standard name.
> +
> +   From:
> +  <bb 2> :
> + _1 = x_3(D) + y_4(D);
> + if (_1 >= x_3(D))
> +   goto <bb 3>; [INV]
> + else
> +   goto <bb 4>; [INV]
> +
> +  <bb 3> :
> +
> +  <bb 4> :
> + # _2 = PHI <255(2), _1(3)>
> +
> +   To:
> +  <bb 2> :
> + _1 = x_3(D) + y_4(D);
> + phi_cond_6 = _1 >= x_3(D);
> + _2 = phi_cond_6 ? _1 : 255; */
> +
> +static bool
> +match_phi_to_gimple_cond (basic_block cond_bb, gphi *phi, tree arg0, tree arg1)

You should do this in widen-mult and/or ISEL and if necessary for vectorization
in tree-if-conv.cc, though eventually what if-convert creates might be
good enough to match during pattern recognition.

> +{
> +  gcond *cond = as_a <gcond *> (*gsi_last_bb (cond_bb));
> +
> +  if (!cond)
> +return false;
> +
> +  enum tree_code code = gimple_cond_code (cond);
> +  tree phi_result = gimple_phi_result (phi);
> +  tree cond_tree = make_temp_ssa_name (boolean_type_node, NULL, "phi_cond");
> +  tree cmp_tree = build2 (code, boolean_type_node, gimple_cond_lhs (cond),
> + gimple_cond_rhs (cond));
> +  tree rhs = build3 (COND_EXPR, TREE_TYPE (phi_result), cond_tree, arg0, arg1);

phiopt directly uses cmp_tree, so you could do that as well and avoid stmt_cond.

> +
> +  gassign *stmt_cond = gimple_build_assign (cond_tree, cmp_tree);
> +  gassign *stmt_val = gimple_build_assign (phi_result, rhs);
> +
> +  tree ops[2];
> +  tree lhs = gimple_assign_lhs (stmt_val);
> +  bool matched_p = (gimple_unsigned_integer_sat_add (lhs, ops, NULL)
> +&& direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs),
> +  OPTIMIZE_FOR_BOTH));
> +
> +  if (matched_p)
> +{
> +  gimple_stmt_iterator gsi = gsi_last_bb (cond_bb);
> +  gimple_stmt_iterator psi = gsi_for_stmt (phi);
> +
> +  gsi_insert_before (&gsi, stmt_cond, GSI_SAME_STMT);
> +  gsi_insert_before (&gsi, stmt_val, GSI_SAME_STMT);
> +  remove_phi_node (&psi, false);

You only matched but you do not insert the actual .SAT_ADD here and that's
the definition that's missing.  You probably shouldn't need to add the
cond-stmt?
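
For reference, the two source-level shapes under discussion can be
sketched in C as below.  This is only an illustration: the function
names are hypothetical and are not the SAT_ADD_U_3 test macro from the
thread.  The first is the __builtin_add_overflow branch form named in
the subject, the second mirrors the "_1 >= x_3(D)" compare shown in the
GIMPLE comment of the quoted patch.

```c
#include <stdint.h>

/* Branch form: saturate when the overflow builtin reports a wrap.  */
uint8_t
sat_add_u8_builtin (uint8_t x, uint8_t y)
{
  uint8_t sum;
  if (__builtin_add_overflow (x, y, &sum))
    return UINT8_MAX;
  return sum;
}

/* Compare form: an unsigned sum that wrapped is smaller than X.  */
uint8_t
sat_add_u8_compare (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  return sum >= x ? sum : UINT8_MAX;
}
```

Both functions return 255 for any pair whose mathematical sum exceeds
255 and the plain sum otherwise.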

> +}
> +  else
> +{
> +  // Clean up the stmt created, but none of the below works well.
> +  // gsi = gsi_for_stmt (stmt_val);
> +  // gsi_remove (&gsi, true);
> +  // release_defs (stmt_val);
> +  // ggc_free (stmt_val);
> +
> +  // gsi = 
