[PATCH] Fix PR rtl-optimization/115038

2024-05-20 Thread Eric Botcazou
Hi,

this is a regression present on mainline and the 14 branch in the form of an 
ICE in seh_cfa_offset from config/i386/winnt.cc on the attached C++ testcase 
compiled with -O2 -fno-omit-frame-pointer.

The problem comes directly from the -ffold-mem-offsets pass messing with 
the prologue and the frame-related instructions, which is a no-no with SEH, so 
the fix simply disables the pass in these circumstances, the question being 
whether this should be done unconditionally as in the fix or only with SEH.

Tested on x86-64/Linux, OK for the mainline and 14 branch?


2024-05-20  Eric Botcazou  

PR rtl-optimization/115038
* fold-mem-offsets.cc (fold_offsets): Return 0 if the defining
instruction of the register is frame related.


2024-05-20  Eric Botcazou  

* g++.dg/opt/fmo1.C: New test.

-- 
Eric Botcazou
diff --git a/gcc/fold-mem-offsets.cc b/gcc/fold-mem-offsets.cc
index 2e15b05529e..84b9623058b 100644
--- a/gcc/fold-mem-offsets.cc
+++ b/gcc/fold-mem-offsets.cc
@@ -491,7 +491,7 @@ fold_offsets (rtx_insn *insn, rtx reg, bool analyze, bitmap foldable_insns)
 {
   rtx_insn *def = get_single_def_in_bb (insn, reg);
 
-  if (!def || GET_CODE (PATTERN (def)) != SET)
+  if (!def || RTX_FRAME_RELATED_P (def) || GET_CODE (PATTERN (def)) != SET)
     return 0;
 
   rtx dest = SET_DEST (PATTERN (def));
// PR rtl-optimization/115038
// Reported by Christoph Reiter 

// { dg-do compile }
// { dg-options "-O2 -fno-omit-frame-pointer" }

struct d {
  d();
};

struct e {
  e() : c(1.0) {}
  float c;
};

class k {
  d g;
  e h;
};

class a {
  k f;
} a;

k b;


Re: [Patch] contrib/gcc-changelog/git_update_version.py: Improve diagnostic (was: [Patch] contrib/gcc-changelog/git_update_version.py: Add ignore commit, improve diagnostic)

2024-05-20 Thread Jakub Jelinek
On Mon, May 20, 2024 at 08:31:02AM +0200, Tobias Burnus wrote:
> Hmm, there were now two daily bumps:
> 
> Date:   Mon May 20 00:16:30 2024 +
> 
> Date:   Sun May 19 18:15:28 2024 +
> 
> I really wonder why.

Because I've done it by hand.
I have in ~gccadmin a gcc-changelog copy and adjusted update_version_git
script which doesn't use contrib/gcc-changelog subdirectory from the
checkout it makes but from the ~gccadmin directory, because I don't want to
constantly try to add some commit number to IGNORED_COMMITS, see that it
either works or doesn't (I think sometimes it needs the hash of the revert
commit, at other times the commit hash referenced in the revert commit)
or that further ones are needed.

> From f56b1764f2b5c2c83c6852607405e5be0a763a2c Mon Sep 17 00:00:00 2001
> From: Tobias Burnus 
> Date: Sun, 19 May 2024 08:17:42 +0200
> Subject: [PATCH] contrib/gcc-changelog/git_update_version.py: Improve 
> diagnostic
> 
> contrib/ChangeLog:
> 
> * gcc-changelog/git_update_version.py (prepend_to_changelog_files): 
> Output

8 spaces rather than tab

>   git hash in case errors occurred.
> 
> diff --git a/contrib/gcc-changelog/git_update_version.py b/contrib/gcc-changelog/git_update_version.py
> index 24f6c43d0b2..ec0151b83fe 100755
> --- a/contrib/gcc-changelog/git_update_version.py
> +++ b/contrib/gcc-changelog/git_update_version.py
> @@ -58,6 +58,7 @@ def read_timestamp(path):
>  
>  def prepend_to_changelog_files(repo, folder, git_commit, add_to_git):
>  if not git_commit.success:
> +logging.info(f"While processing {git_commit.info.hexsha}:")
>  for error in git_commit.errors:
>  logging.info(error)
>  raise AssertionError()

So, your commit is a useful part of it; I'm already using something similar in
my hack (I was just doing it even for successful commits, but I think your
patch is better).
And I think it would be best if the update_version_git script simply
accepted a list of ignored commits from the command line too,
passed it to the git_update_version.py script and that one
added those to IGNORED_COMMITS.
Because typically if the DATESTAMP/ChangeLog updates get stuck,
one doesn't just adjust IGNORED_COMMITS and wait up to another
day to see if it worked, but runs the script by hand to make sure
it works.
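
A minimal sketch of the suggestion above, not the actual script: it assumes
a hypothetical repeatable --ignore option and a hypothetical helper
effective_ignored that merges the extra hashes into the built-in tuple.
Neither name exists in git_update_version.py as posted; only
IGNORED_COMMITS does.

```python
import argparse

# Stand-in for the tuple hard-coded in git_update_version.py.
IGNORED_COMMITS = (
    '040e5b0edbca861196d9e2ea2af5e805769c8d5d',
    '8057f9aa1f7e70490064de796d7a8d42d446caf8',
)


def build_parser():
    parser = argparse.ArgumentParser(description='Sketch: extra ignored commits')
    # Hypothetical option, assumed for illustration only.
    parser.add_argument('-i', '--ignore', action='append', default=[],
                        metavar='HASH',
                        help='commit hash to skip (may be repeated)')
    return parser


def effective_ignored(builtin, extra):
    """Return the built-in hashes followed by the extra ones, without duplicates."""
    merged = list(builtin)
    for sha in extra:
        if sha not in merged:
            merged.append(sha)
    return tuple(merged)


# Simulate a manual run with one extra hash given on the command line.
args = build_parser().parse_args(
    ['--ignore', 'da73261ce7731be7f2b164f1db796878cdc23365'])
ignored = effective_ignored(IGNORED_COMMITS, args.ignore)
```

With something along these lines, a manual run could try a candidate hash
right away instead of editing the tuple and waiting for the next daily bump.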

--- gcc-checkout/contrib/gcc-changelog/git_update_version.py  2024-05-13 16:52:57.890151748 +
+++ gcc-changelog/git_update_version.py 2024-05-19 18:13:44.953648834 +
@@ -41,7 +41,21 @@ IGNORED_COMMITS = (
 '040e5b0edbca861196d9e2ea2af5e805769c8d5d',
 '8057f9aa1f7e70490064de796d7a8d42d446caf8',
 '109f1b28fc94c93096506e3df0c25e331cef19d0',
-'39f81924d88e3cc197fc3df74204c9b5e01e12f7')
+'39f81924d88e3cc197fc3df74204c9b5e01e12f7',
+'d7bb8eaade3cd3aa70715c8567b4d7b08098e699',
+'89feb3557a018893cfe50c2e07f91559bd3cde2b',
+'ccf8d3e3d26c6ba3d5e11fffeed8d64018e9c060',
+'e0c52905f666e3d23881f82dbf39466a24f009f4',
+'b38472ffc1e631bd357573b44d956ce16d94e666',
+'a0b13d0860848dd5f2876897ada1e22e4e681e91',
+'b8c772cae97b54386f7853edf0f9897012bfa90b',
+'810d35a7e054bcbb5b66d2e5924428e445f5fba9',
+'0df1ee083434ac00ecb19582b1e5b25e105981b2',
+'2c688f6afce4cbb414f5baab1199cd525f309fca',
+'60dcb710b6b4aa22ea96abc8df6dfe9067f3d7fe',
+'44968a0e00f656e9bb3e504bb2fa1a8282002015',
+'d7bb8eaade3cd3aa70715c8567b4d7b08098e699',
+'da73261ce7731be7f2b164f1db796878cdc23365')
 
 FORMAT = '%(asctime)s:%(levelname)s:%(name)s:%(message)s'
 logging.basicConfig(level=logging.INFO, format=FORMAT,
@@ -125,6 +139,7 @@ def update_current_branch(ref_name):
   % (commit.hexsha, head.hexsha), ref_name)
 commits = [c for c in commits if c.info.hexsha not in IGNORED_COMMITS]
 for git_commit in reversed(commits):
+logging.info('trying %s', git_commit.info.hexsha)
 prepend_to_changelog_files(repo, args.git_path, git_commit,
not args.dry_mode)
 if args.dry_mode:

Jakub



[COMMITTED 03/30] ada: Implement representation aspect Max_Entry_Queue_Length

2024-05-20 Thread Marc Poulhiès
From: Jose Ruiz 

Enforce Max_Entry_Queue_Length (and its
synonym Max_Entry_Queue_Depth) when applied to individual
protected entries.

gcc/ada/

* exp_ch9.adb (Expand_N_Protected_Type_Declaration): Clarify
comments.
* sem_prag.adb (Analyze_Pragma): Check for duplicate
Max_Entry_Queue_Length, Max_Entry_Queue_Depth and Max_Queue_Length
for the same protected entry.
* sem_util.adb (Get_Max_Queue_Length): Take into account all three
representation aspects that can be used to set this restriction.
(Has_Max_Queue_Length): Likewise.
* doc/gnat_rm/implementation_defined_pragmas.rst:
(pragma Max_Queue_Length): Fix pragma in example.
* gnat_rm.texi: Regenerate.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 .../implementation_defined_pragmas.rst|  2 +-
 gcc/ada/exp_ch9.adb   |  6 ++--
 gcc/ada/gnat_rm.texi  |  2 +-
 gcc/ada/sem_prag.adb  | 11 +++
 gcc/ada/sem_util.adb  | 33 ++-
 5 files changed, 41 insertions(+), 13 deletions(-)

diff --git a/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst b/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst
index bcbd85984dc..0661670e047 100644
--- a/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst
+++ b/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst
@@ -3771,7 +3771,7 @@ Pragma Max_Queue_Length
 
 Syntax::
 
-   pragma Max_Entry_Queue (static_integer_EXPRESSION);
+   pragma Max_Queue_Length (static_integer_EXPRESSION);
 
 
 This pragma is used to specify the maximum callers per entry queue for
diff --git a/gcc/ada/exp_ch9.adb b/gcc/ada/exp_ch9.adb
index 051b1df060f..4de253ab6e8 100644
--- a/gcc/ada/exp_ch9.adb
+++ b/gcc/ada/exp_ch9.adb
@@ -9405,7 +9405,8 @@ package body Exp_Ch9 is
   end loop;
 
   --  Create the declaration of an array object which contains the values
-  --  of aspect/pragma Max_Queue_Length for all entries of the protected
+  --  of any aspect/pragma Max_Queue_Length, Max_Entry_Queue_Length or
+  --  Max_EntryQueue_Depth for all entries of the protected
   --  type. This object is later passed to the appropriate protected object
   --  initialization routine.
 
@@ -9422,7 +9423,8 @@ package body Exp_Ch9 is
 Need_Array : Boolean := False;
 
  begin
---  First check if there is any Max_Queue_Length pragma
+--  First check if there is any Max_Queue_Length,
+--  Max_Entry_Queue_Length or Max_Entry_Queue_Depth pragma.
 
 Item := First_Entity (Prot_Typ);
 while Present (Item) loop
diff --git a/gcc/ada/gnat_rm.texi b/gcc/ada/gnat_rm.texi
index 40516121b7a..4dbbb036a25 100644
--- a/gcc/ada/gnat_rm.texi
+++ b/gcc/ada/gnat_rm.texi
@@ -5312,7 +5312,7 @@ no effect in GNAT, other than being syntax checked.
 Syntax:
 
 @example
-pragma Max_Entry_Queue (static_integer_EXPRESSION);
+pragma Max_Queue_Length (static_integer_EXPRESSION);
 @end example
 
 This pragma is used to specify the maximum callers per entry queue for
diff --git a/gcc/ada/sem_prag.adb b/gcc/ada/sem_prag.adb
index f27e40edcbb..0e2ce9de4b5 100644
--- a/gcc/ada/sem_prag.adb
+++ b/gcc/ada/sem_prag.adb
@@ -20388,6 +20388,17 @@ package body Sem_Prag is
  ("pragma % must apply to a protected entry declaration");
 end if;
 
+--  Check for duplicates
+
+if Has_Rep_Pragma (Entry_Id, Name_Max_Entry_Queue_Length)
+ or else
+   Has_Rep_Pragma (Entry_Id, Name_Max_Entry_Queue_Depth)
+ or else
+   Has_Rep_Pragma (Entry_Id, Name_Max_Queue_Length)
+then
+   Error_Msg_N ("??duplicate Max_Entry_Queue_Length pragma", N);
+end if;
+
 --  Mark the pragma as Ghost if the related subprogram is also
 --  Ghost. This also ensures that any expansion performed further
 --  below will produce Ghost nodes.
diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index d512d462b44..09358278210 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -10714,26 +10714,38 @@ package body Sem_Util is
 
function Get_Max_Queue_Length (Id : Entity_Id) return Uint is
   pragma Assert (Is_Entry (Id));
-  Prag : constant Entity_Id := Get_Pragma (Id, Pragma_Max_Queue_Length);
-  Max  : Uint;
+  PMQL  : constant Entity_Id := Get_Pragma (Id, Pragma_Max_Queue_Length);
+  PMEQD : constant Entity_Id :=
+ Get_Pragma (Id, Pragma_Max_Entry_Queue_Depth);
+  PMEQL : constant Entity_Id :=
+ Get_Pragma (Id, Pragma_Max_Entry_Queue_Length);
+  Max   : Uint;
 
begin
   --  A value of 0 or -1 represents no maximum specified, and entries and
   --  entry families with no Max_Queue_Length aspect or pragma default to
   --  it.
 
-  if No (Prag) then
- retur

[COMMITTED 02/30] ada: Small cleanup in System.Finalization_Primitives unit

2024-05-20 Thread Marc Poulhiès
From: Eric Botcazou 

It has been made possible by recent changes.

gcc/ada/

* libgnat/s-finpri.ads (Collection_Node): Move to private part.
(Collection_Node_Ptr): Likewise.
(Header_Alignment): Change to declaration and move completion to
private part.
(Header_Size): Likewise.
(Lock_Type): Delete.
(Finalization_Collection): Move Lock component and remove default
value for Finalization_Started component.
* libgnat/s-finpri.adb (Initialize): Reorder statements.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/s-finpri.adb |  4 +--
 gcc/ada/libgnat/s-finpri.ads | 48 +++-
 2 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/gcc/ada/libgnat/s-finpri.adb b/gcc/ada/libgnat/s-finpri.adb
index 028c9d76062..bc90fe23ac9 100644
--- a/gcc/ada/libgnat/s-finpri.adb
+++ b/gcc/ada/libgnat/s-finpri.adb
@@ -394,14 +394,14 @@ package body System.Finalization_Primitives is
  (Collection : in out Finalization_Collection)
is
begin
-  Collection.Finalization_Started := False;
-
   --  The dummy head must point to itself in both directions
 
   Collection.Head.Prev := Collection.Head'Unchecked_Access;
   Collection.Head.Next := Collection.Head'Unchecked_Access;
 
   Initialize_RTS_Lock (Collection.Lock'Address);
+
+  Collection.Finalization_Started := False;
end Initialize;
 
-
diff --git a/gcc/ada/libgnat/s-finpri.ads b/gcc/ada/libgnat/s-finpri.ads
index 62c2474b4f4..a821f1db657 100644
--- a/gcc/ada/libgnat/s-finpri.ads
+++ b/gcc/ada/libgnat/s-finpri.ads
@@ -146,16 +146,6 @@ package System.Finalization_Primitives with Preelaborate is
--  collection, in some arbitrary order. Calls to this procedure with
--  a collection that has already been finalized have no effect.
 
-   type Collection_Node is private;
-   --  Each controlled object associated with a finalization collection has
-   --  an associated object of this type.
-
-   type Collection_Node_Ptr is access all Collection_Node;
-   for Collection_Node_Ptr'Storage_Size use 0;
-   pragma No_Strict_Aliasing (Collection_Node_Ptr);
-   --  A reference to a collection node. Since this type may not be used to
-   --  allocate objects, its storage size is zero.
-
procedure Attach_Object_To_Collection
  (Object_Address   : System.Address;
   Finalize_Address : not null Finalize_Address_Ptr;
@@ -171,13 +161,13 @@ package System.Finalization_Primitives with Preelaborate is
--  Calls to the procedure with an object that has already been detached
--  have no effects.
 
-   function Header_Alignment return System.Storage_Elements.Storage_Count is
- (Collection_Node'Alignment);
-   --  Return the alignment of type Collection_Node as Storage_Count
+   function Header_Alignment return System.Storage_Elements.Storage_Count;
+   --  Return the alignment of the header to be placed immediately in front of
+   --  a controlled object allocated for some access type, in storage units.
 
-   function Header_Size return System.Storage_Elements.Storage_Count is
- (Collection_Node'Object_Size / Storage_Unit);
-   --  Return the object size of type Collection_Node as Storage_Count
+   function Header_Size return System.Storage_Elements.Storage_Count;
+  --  Return the size of the header to be placed immediately in front of a
+  --  controlled object allocated for some access type, in storage units.
 
 private
 
@@ -221,6 +211,16 @@ private
 
--  Finalization collections:
 
+   type Collection_Node;
+   --  Each controlled object associated with a finalization collection has
+   --  an associated object of this type.
+
+   type Collection_Node_Ptr is access all Collection_Node;
+   for Collection_Node_Ptr'Storage_Size use 0;
+   pragma No_Strict_Aliasing (Collection_Node_Ptr);
+   --  A reference to a collection node. Since this type may not be used to
+   --  allocate objects, its storage size is zero.
+
--  Collection node type structure. Finalize_Address comes first because it
--  is an access-to-subprogram and, therefore, might be twice as large and
--  as aligned as an access-to-object on some platforms.
@@ -237,7 +237,11 @@ private
   --  Collection nodes are managed as a circular doubly-linked list
end record;
 
-   type Lock_Type is mod 2**8 with Size => 8;
+   function Header_Alignment return System.Storage_Elements.Storage_Count is
+ (Collection_Node'Alignment);
+
+   function Header_Size return System.Storage_Elements.Storage_Count is
+ (Collection_Node'Object_Size / Storage_Unit);
 
--  Finalization collection type structure
 
@@ -245,15 +249,15 @@ private
  new Ada.Finalization.Limited_Controlled with
record
   Head : aliased Collection_Node;
-  --  The head of the circular doubly-linked list of Collection_Nodes
+  --  The head of the circular doubly-linked list of collection nodes
+
+  Lock 

[COMMITTED 04/30] ada: Detect only conflict with synonyms of max queue length

2024-05-20 Thread Marc Poulhiès
From: Jose Ruiz 

Duplicated representation aspects are detected elsewhere,
so we do not try to detect them here, to avoid repeated
messages.
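
The condition introduced by the hunk below can be sketched as follows. This
is an illustration only: the names SYNONYMS and conflicts are hypothetical,
and the set already_present is an assumed stand-in for what Has_Rep_Pragma
reports for the entry.

```python
# The three spellings that set the same restriction on a protected entry.
SYNONYMS = (
    'Max_Entry_Queue_Length',
    'Max_Entry_Queue_Depth',
    'Max_Queue_Length',
)


def conflicts(current, already_present):
    """True if a *different* synonym is already attached to the entry.

    A repeat of the same pragma name is deliberately not reported here,
    mirroring the "Prag_Id /= ..." guards in the patch.
    """
    return any(name in already_present and name != current
               for name in SYNONYMS)


same_name = conflicts('Max_Queue_Length', {'Max_Queue_Length'})
other_name = conflicts('Max_Queue_Length', {'Max_Entry_Queue_Depth'})
```

Here same_name is False (a plain duplicate is left to the other check) and
other_name is True (two synonyms are mixed on one entry).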

gcc/ada/

* sem_prag.adb (Analyze_Pragma): Exclude detection of duplicates
because they are detected elsewhere.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_prag.adb | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/gcc/ada/sem_prag.adb b/gcc/ada/sem_prag.adb
index 0e2ce9de4b5..a895fd2053a 100644
--- a/gcc/ada/sem_prag.adb
+++ b/gcc/ada/sem_prag.adb
@@ -20388,15 +20388,23 @@ package body Sem_Prag is
  ("pragma % must apply to a protected entry declaration");
 end if;
 
---  Check for duplicates
+--  Check for conflicting use of synonyms. Note that we exclude
+--  the detection of duplicates here because they are detected
+--  elsewhere.
 
-if Has_Rep_Pragma (Entry_Id, Name_Max_Entry_Queue_Length)
+if (Has_Rep_Pragma (Entry_Id, Name_Max_Entry_Queue_Length)
+  and then
+Prag_Id /= Pragma_Max_Entry_Queue_Length)
  or else
-   Has_Rep_Pragma (Entry_Id, Name_Max_Entry_Queue_Depth)
+   (Has_Rep_Pragma (Entry_Id, Name_Max_Entry_Queue_Depth)
+  and then
+Prag_Id /= Pragma_Max_Entry_Queue_Depth)
  or else
-   Has_Rep_Pragma (Entry_Id, Name_Max_Queue_Length)
+   (Has_Rep_Pragma (Entry_Id, Name_Max_Queue_Length)
+  and then
+Prag_Id /= Pragma_Max_Queue_Length)
 then
-   Error_Msg_N ("??duplicate Max_Entry_Queue_Length pragma", N);
+   Error_Msg_N ("??maximum entry queue length already set", N);
 end if;
 
 --  Mark the pragma as Ghost if the related subprogram is also
-- 
2.43.2



[COMMITTED 06/30] ada: Reject too-strict alignment specifications.

2024-05-20 Thread Marc Poulhiès
From: Steve Baird 

For a discrete (or fixed-point) type T, GNAT requires that T'Object_Size
shall be a multiple of T'Alignment * 8.
GNAT also requires that T'Object_Size shall be no larger than
Standard'Max_Integer_Size.
For a sufficiently-large alignment specification, these requirements can
conflict.
The conflict is resolved by rejecting such alignment specifications (which
were previously accepted in some cases).
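
A small sketch of the arithmetic behind this limit, assuming a storage unit
of 8 bits and taking the maximum integer size as a parameter (64 and 128 are
used as example values; the actual value is target-dependent). The function
names are hypothetical.

```python
def max_discrete_alignment(max_integer_size_bits, storage_unit_bits=8):
    """Largest alignment, in storage units, that satisfies both rules."""
    return max_integer_size_bits // storage_unit_bits


def alignment_is_acceptable(alignment, max_integer_size_bits,
                            storage_unit_bits=8):
    # Object_Size must be a positive multiple of alignment * storage_unit
    # and must not exceed max_integer_size_bits, so the smallest candidate,
    # alignment * storage_unit itself, already has to fit.
    return alignment * storage_unit_bits <= max_integer_size_bits


limit_128 = max_discrete_alignment(128)
limit_64 = max_discrete_alignment(64)
align_32_ok = alignment_is_acceptable(32, 128)
```

An alignment of 32 needs an Object_Size of at least 256 bits, which exceeds
both example limits, so align_32_ok is False. This matches the new test,
where an alignment of 32 on a type derived from Integer_32 is now an error.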

gcc/ada/

* freeze.adb (Adjust_Esize_For_Alignment): Assert that a valid
Alignment specification cannot result in adjusting the given
type's Esize to be larger than System_Max_Integer_Size.
* sem_ch13.adb (Analyze_Attribute_Definition_Clause): In analyzing
an Alignment specification, enforce the rule that a specified
Alignment value for a discrete or fixed-point type shall not be
larger than System_Max_Integer_Size / 8.

gcc/testsuite/ChangeLog:

* gnat.dg/specs/alignment2.ads: Adjust.
* gnat.dg/specs/alignment2_bis.ads: New test.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/freeze.adb|  8 +++--
 gcc/ada/sem_ch13.adb  | 15 
 gcc/testsuite/gnat.dg/specs/alignment2.ads| 14 
 .../gnat.dg/specs/alignment2_bis.ads  | 36 +++
 4 files changed, 57 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gnat.dg/specs/alignment2_bis.ads

diff --git a/gcc/ada/freeze.adb b/gcc/ada/freeze.adb
index a980c7e5b47..26e9d01d8b2 100644
--- a/gcc/ada/freeze.adb
+++ b/gcc/ada/freeze.adb
@@ -303,8 +303,12 @@ package body Freeze is
   if Known_Esize (Typ) and then Known_Alignment (Typ) then
  Align := Alignment_In_Bits (Typ);
 
- if Align > Esize (Typ) and then Align <= System_Max_Integer_Size then
-Set_Esize (Typ, Align);
+ if Align > Esize (Typ) then
+if Align > System_Max_Integer_Size then
+   pragma Assert (Serious_Errors_Detected > 0);
+else
+   Set_Esize (Typ, Align);
+end if;
  end if;
   end if;
end Adjust_Esize_For_Alignment;
diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
index 13bf93ca548..59c80022c20 100644
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -6573,6 +6573,21 @@ package body Sem_Ch13 is
 ("alignment for & set to Maximum_Aligment??", Nam);
   Set_Alignment (U_Ent, Max_Align);
 
+   --  Because Object_Size must be multiple of Alignment (in bits),
+   --  System_Max_Integer_Size limit for discrete and fixed point
+   --  types implies a limit on alignment for such types.
+
+   elsif (Is_Discrete_Type (U_Ent)
+or else Is_Fixed_Point_Type (U_Ent))
+ and then Align > System_Max_Integer_Size / System_Storage_Unit
+   then
+  Error_Msg_N
+("specified alignment too large for discrete or fixed " &
+ "point type", Expr);
+  Set_Alignment
+(U_Ent, UI_From_Int (System_Max_Integer_Size /
+ System_Storage_Unit));
+
--  All other cases
 
else
diff --git a/gcc/testsuite/gnat.dg/specs/alignment2.ads b/gcc/testsuite/gnat.dg/specs/alignment2.ads
index 0b6c14f1b7d..75a002e9bee 100644
--- a/gcc/testsuite/gnat.dg/specs/alignment2.ads
+++ b/gcc/testsuite/gnat.dg/specs/alignment2.ads
@@ -32,18 +32,4 @@ package Alignment2 is
   end record;
   for R4'Alignment use 32;
 
-  -- warning
-  type I1 is new Integer_32;
-  for I1'Size use 32;
-  for I1'Alignment use 32; -- { dg-warning "suspiciously large alignment" }
-
-  -- warning
-  type I2 is new Integer_32;
-  for I2'Alignment use 32; -- { dg-warning "suspiciously large alignment" }
-
-  -- OK, big size
-  type I3 is new Integer_32;
-  for I3'Size use 32 * 8; -- { dg-warning "unused" }
-  for I3'Alignment use 32;
-
 end Alignment2;
diff --git a/gcc/testsuite/gnat.dg/specs/alignment2_bis.ads b/gcc/testsuite/gnat.dg/specs/alignment2_bis.ads
new file mode 100644
index 000..ad31a400b84
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/specs/alignment2_bis.ads
@@ -0,0 +1,36 @@
+-- { dg-do compile }
+
+with Interfaces; use Interfaces;
+
+package Alignment2_Bis is
+
+  pragma Warnings (Off, "*size*");
+
+  -- OK, big size
+  type R3 is record
+A, B, C, D : Integer_8;
+  end record;
+  for R3'Size use 32 * 8;
+  for R3'Alignment use 32;
+
+  -- OK, big size
+  type R4 is record
+A, B, C, D, E, F, G, H : Integer_32;
+  end record;
+  for R4'Alignment use 32;
+
+  -- warning
+  type I1 is new Integer_32;
+  for I1'Size use 32;
+  for I1'Alignment use 32; -- { dg-error "error: specified alignment too large for discrete or fixed point type" }
+
+  -- warning
+  type I2 is new Integer_32;
+  for I2'Alignment use 32; -- { dg-error "error: specified 

[COMMITTED 07/30] ada: Use System.Address for address computation in System.Pool_Global

2024-05-20 Thread Marc Poulhiès
From: Sebastian Poeplau 

Some architectures don't let us convert
System.Storage_Elements.Integer_Address back to a valid System.Address.
Using the arithmetic operations on System.Address from
System.Storage_Elements prevents the problem while leaving semantics
unchanged.
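
The realignment formula itself is the same before and after the patch; only
the types it is computed with change. A sketch with plain integers standing
in for addresses (the function name realign is hypothetical):

```python
def realign(allocated, alignment):
    """Next multiple of alignment strictly above allocated.

    Mirrors: Allocated + Alignment - (Allocated mod Alignment).
    """
    return allocated + alignment - (allocated % alignment)


already_aligned = realign(0x1000, 16)
misaligned = realign(0x1003, 16)
```

Both calls yield 0x1010. Note that an already aligned address still moves
forward by a full alignment, so there is always some room in front of the
result, which is consistent with the "Save the block address" step that
follows in the hunk.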

gcc/ada/

* libgnat/s-pooglo.adb (Allocate): Use arithmetic on
System.Address to compute the aligned address.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/s-pooglo.adb | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/libgnat/s-pooglo.adb b/gcc/ada/libgnat/s-pooglo.adb
index dea3de15cc5..9ce21c8fd0d 100644
--- a/gcc/ada/libgnat/s-pooglo.adb
+++ b/gcc/ada/libgnat/s-pooglo.adb
@@ -75,9 +75,10 @@ package body System.Pool_Global is
 
  --  Realign the returned address
 
- Aligned_Address := To_Address
-   (To_Integer (Allocated) + Integer_Address (Alignment)
-  - (To_Integer (Allocated) mod Integer_Address (Alignment)));
+ Aligned_Address :=
+   Allocated + Alignment
+   - Storage_Offset (To_Integer (Allocated)
+ mod Integer_Address (Alignment));
 
  --  Save the block address
 
-- 
2.43.2



[COMMITTED 05/30] ada: One more adjustment coming from aliasing considerations

2024-05-20 Thread Marc Poulhiès
From: Eric Botcazou 

It is needed on PowerPC platforms because of specific calling conventions.

gcc/ada/

* libgnat/g-sothco.ads (In_Addr): Add aspect Universal_Aliasing.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/g-sothco.ads | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/libgnat/g-sothco.ads b/gcc/ada/libgnat/g-sothco.ads
index 8c219333649..da1e6f5bcdd 100644
--- a/gcc/ada/libgnat/g-sothco.ads
+++ b/gcc/ada/libgnat/g-sothco.ads
@@ -123,10 +123,13 @@ package GNAT.Sockets.Thin_Common is
 
type In_Addr is record
   S_B1, S_B2, S_B3, S_B4 : C.unsigned_char;
-   end record with Convention => C, Alignment => C.int'Alignment;
+   end record
+ with Convention => C, Alignment  => C.int'Alignment, Universal_Aliasing;
--  IPv4 address, represented as a network-order C.int. Note that the
--  underlying operating system may assume that values of this type have
-   --  C.int alignment, so we need to provide a suitable alignment clause here.
+   --  C.int's alignment, so we need to provide a suitable alignment clause.
+   --  We also need to inhibit strict type-based aliasing optimizations in
+   --  order to implement the following unchecked conversions efficiently.
 
function To_In_Addr is new Ada.Unchecked_Conversion (C.int, In_Addr);
function To_Int is new Ada.Unchecked_Conversion (In_Addr, C.int);
-- 
2.43.2



[COMMITTED 08/30] ada: Fix for attribute Width on enumeration types with Discard_Name

2024-05-20 Thread Marc Poulhiès
From: Piotr Trojanek 

Fix computation of attribute 'Width for enumeration types with
Discard_Name aspect enabled.
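
To see what changes, the old and new loops from the two hunks below can be
transcribed into Python. This is only a sketch: the function names are
hypothetical, and it assumes the expected width is the number of decimal
digits of the largest position plus one for the leading space, as the
comments in the patch state.

```python
def width_old(hi):
    """Compile-time loop before the patch (sem_attr.adb)."""
    w = 1
    t = hi
    while t != 0:
        t //= 10
        w += 1
    return w


def width_new(hi):
    """Compile-time loop after the patch."""
    w = 1          # one character for the leading space
    w += 1         # one character for the 0 .. 9 digit
    t = hi
    while t >= 10:  # one character for every further decimal digit
        t //= 10
        w += 1
    return w


def k_old(p):
    """Run-time K computation before the patch (exp_imgv.adb)."""
    k, m = 1, 1
    while m < p:
        m *= 10
        k += 1
    return k


def k_new(p):
    """Run-time K computation after the patch."""
    k, m = 2, 10   # with 2 characters we can represent values in 0..9
    while p >= m:
        m *= 10
        k += 1
    return k
```

Under that assumption, width_old is off only for a high bound of 0, where it
returns 1 instead of 2, while k_old is also too small at exact powers of
ten: k_old(10) is 2, whereas " 10" needs 3 characters.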

gcc/ada/

* exp_imgv.adb (Expand_Width_Attribute): Fix for 'Width that
is computed at run time.
* sem_attr.adb (Eval_Attribute): Fix for 'Width that is computed
at compilation time.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_imgv.adb | 25 +++--
 gcc/ada/sem_attr.adb |  7 ---
 2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/gcc/ada/exp_imgv.adb b/gcc/ada/exp_imgv.adb
index 6dc59f2c6f3..e5d84cc52e3 100644
--- a/gcc/ada/exp_imgv.adb
+++ b/gcc/ada/exp_imgv.adb
@@ -2294,7 +2294,7 @@ package body Exp_Imgv is
  --  in the range of the subtype + 1 for the space at the start. We
  --  build:
 
- -- Tnn : constant Integer := Rtyp'Pos (Ptyp'Last)
+ -- Tnn : constant Integer := Rtyp'Pos (Ptyp'Last);
 
  --  and replace the expression by
 
@@ -2320,9 +2320,15 @@ package body Exp_Imgv is
 declare
Tnn   : constant Entity_Id := Make_Temporary (Loc, 'T');
Cexpr : Node_Id;
-   P : Int;
-   M : Int;
-   K : Int;
+
+   P : constant Nat :=
+ UI_To_Int (Enumeration_Pos (Entity (Type_High_Bound (Rtyp))));
+   --  The largest value that might need to be represented
+
+   K : Pos;
+   M : Pos;
+   --  K is the number of chars that will fit the image of 0..M-1;
+   --  M is the smallest number that won't fit in K chars.
 
 begin
Insert_Action (N,
@@ -2342,14 +2348,13 @@ package body Exp_Imgv is
  Attribute_Name => Name_Last));
 
--  OK, now we need to build the if expression. First get the
-   --  value of M, the largest possible value needed.
+   --  values of K and M for the largest possible value P.
 
-   P := UI_To_Int
-  (Enumeration_Pos (Entity (Type_High_Bound (Rtyp))));
+   K := 2;
+   M := 10;
+   --  With 2 characters we can represent values in 0..9
 
-   K := 1;
-   M := 1;
-   while M < P loop
+   while P >= M loop
   M := M * 10;
   K := K + 1;
end loop;
diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
index a921909685a..96f216cc587 100644
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -10906,9 +10906,10 @@ package body Sem_Attr is
  --  that accommodates the Pos of the largest value, which
  --  is the high bound of the range + one for the space.
 
- W := 1;
- T := Hi;
- while T /= 0 loop
+ W := 1;  --  one character for the leading space
+ W := W + 1;  --  one character for the 0 .. 9 digit
+ T := Hi; --  one character for every decimal digit
+ while T >= 10 loop
 T := T / 10;
 W := W + 1;
  end loop;
-- 
2.43.2



[COMMITTED 15/30] ada: Fix style in list of implementation-defined attributes

2024-05-20 Thread Marc Poulhiès
From: Piotr Trojanek 

Code cleanup.

gcc/ada/

* sem_attr.ads (Attribute_Impl_Def): Fix style in comment.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_attr.ads | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/ada/sem_attr.ads b/gcc/ada/sem_attr.ads
index 0e7d1693682..d18bd5b0667 100644
--- a/gcc/ada/sem_attr.ads
+++ b/gcc/ada/sem_attr.ads
@@ -288,6 +288,10 @@ package Sem_Attr is
   --  attribute is primarily intended for use in implementation of the
   --  standard input-output functions for fixed-point values.
 
+  --------------------
+  --  Invalid_Value --
+  --------------------
+
   Attribute_Invalid_Value => True,
   --  For every scalar type, S'Invalid_Value designates an undefined value
   --  of the type. If possible this value is an invalid value, and in fact
@@ -298,6 +302,10 @@ package Sem_Attr is
   --  coding standards in use), but logically no initialization is needed,
   --  and the value should never be accessed.
 
+  ----------------
+  -- Loop_Entry --
+  ----------------
+
   Attribute_Loop_Entry => True,
   --  For every object of a non-limited type, S'Loop_Entry [(Loop_Name)]
   --  denotes the constant value of prefix S at the point of entry into the
-- 
2.43.2



[COMMITTED 16/30] ada: Use discrete choice list in declaration of universal type attributes

2024-05-20 Thread Marc Poulhiès
From: Piotr Trojanek 

Code cleanup.

gcc/ada/

* sem_attr.ads (Universal_Type_Attribute): Simplify using
array aggregate syntax with discrete choice list.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_attr.ads | 62 ++--
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/gcc/ada/sem_attr.ads b/gcc/ada/sem_attr.ads
index d18bd5b0667..40ec423c4c7 100644
--- a/gcc/ada/sem_attr.ads
+++ b/gcc/ada/sem_attr.ads
@@ -615,37 +615,37 @@ package Sem_Attr is
--  universal type.
 
Universal_Type_Attribute : constant array (Attribute_Id) of Boolean :=
- (Attribute_Aft  => True,
-  Attribute_Alignment=> True,
-  Attribute_Component_Size   => True,
-  Attribute_Count=> True,
-  Attribute_Delta=> True,
-  Attribute_Digits   => True,
-  Attribute_Exponent => True,
-  Attribute_First_Bit=> True,
-  Attribute_Fore => True,
-  Attribute_Last_Bit => True,
-  Attribute_Length   => True,
-  Attribute_Machine_Emax => True,
-  Attribute_Machine_Emin => True,
-  Attribute_Machine_Mantissa => True,
-  Attribute_Machine_Radix=> True,
-  Attribute_Max_Alignment_For_Allocation => True,
-  Attribute_Max_Size_In_Storage_Elements => True,
-  Attribute_Model_Emin   => True,
-  Attribute_Model_Epsilon=> True,
-  Attribute_Model_Mantissa   => True,
-  Attribute_Model_Small  => True,
-  Attribute_Modulus  => True,
-  Attribute_Pos  => True,
-  Attribute_Position => True,
-  Attribute_Safe_First   => True,
-  Attribute_Safe_Last=> True,
-  Attribute_Scale=> True,
-  Attribute_Size => True,
-  Attribute_Small=> True,
-  Attribute_Wide_Wide_Width  => True,
-  Attribute_Wide_Width   => True,
+ (Attribute_Aft  |
+  Attribute_Alignment|
+  Attribute_Component_Size   |
+  Attribute_Count|
+  Attribute_Delta|
+  Attribute_Digits   |
+  Attribute_Exponent |
+  Attribute_First_Bit|
+  Attribute_Fore |
+  Attribute_Last_Bit |
+  Attribute_Length   |
+  Attribute_Machine_Emax |
+  Attribute_Machine_Emin |
+  Attribute_Machine_Mantissa |
+  Attribute_Machine_Radix|
+  Attribute_Max_Alignment_For_Allocation |
+  Attribute_Max_Size_In_Storage_Elements |
+  Attribute_Model_Emin   |
+  Attribute_Model_Epsilon|
+  Attribute_Model_Mantissa   |
+  Attribute_Model_Small  |
+  Attribute_Modulus  |
+  Attribute_Pos  |
+  Attribute_Position |
+  Attribute_Safe_First   |
+  Attribute_Safe_Last|
+  Attribute_Scale|
+  Attribute_Size |
+  Attribute_Small|
+  Attribute_Wide_Wide_Width  |
+  Attribute_Wide_Width   |
   Attribute_Width=> True,
   others => False);
 
-- 
2.43.2



[COMMITTED 01/30] ada: Rework and augment documentation on strict aliasing

2024-05-20 Thread Marc Poulhiès
From: Eric Botcazou 

The documentation was originally centered around pragma No_Strict_Aliasing
and pragma Universal_Aliasing was mentioned only as an afterthought.  It
also contained a warning about the usage of overlays implemented by means
of address clauses that has long been obsolete.

gcc/ada/

* doc/gnat_rm/implementation_defined_pragmas.rst
(Universal_Aliasing): Remove reference to No_Strict_Aliasing.
* doc/gnat_ugn/gnat_and_program_execution.rst
(Optimization and Strict Aliasing): Simplify first example and
make it more consistent with the second.  Add description of the
effects of pragma Universal_Aliasing and document new warning
issued for unchecked conversions.  Remove obsolete stuff.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 .../implementation_defined_pragmas.rst|   7 +-
 .../gnat_ugn/gnat_and_program_execution.rst   | 296 +
 gcc/ada/gnat_rm.texi  |   7 +-
 gcc/ada/gnat_ugn.texi | 306 ++
 4 files changed, 353 insertions(+), 263 deletions(-)

diff --git a/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst b/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst
index 7f221e32344..bcbd85984dc 100644
--- a/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst
+++ b/gcc/ada/doc/gnat_rm/implementation_defined_pragmas.rst
@@ -6949,10 +6949,9 @@ Syntax:
 
 ``type_LOCAL_NAME`` must refer to a type declaration in the current
 declarative part.  The effect is to inhibit strict type-based aliasing
-optimization for the given type.  In other words, the effect is as though
-access types designating this type were subject to pragma No_Strict_Aliasing.
-For a detailed description of the strict aliasing optimization, and the
-situations in which it must be suppressed, see the section on
+optimizations for the given type.  For a detailed description of the
+strict type-based aliasing optimizations and the situations in which
+they need to be suppressed, see the section on
 ``Optimization and Strict Aliasing`` in the :title:`GNAT User's Guide`.
 
 .. _Pragma-Unmodified:
diff --git a/gcc/ada/doc/gnat_ugn/gnat_and_program_execution.rst 
b/gcc/ada/doc/gnat_ugn/gnat_and_program_execution.rst
index 35e34772658..d502da87eb0 100644
--- a/gcc/ada/doc/gnat_ugn/gnat_and_program_execution.rst
+++ b/gcc/ada/doc/gnat_ugn/gnat_and_program_execution.rst
@@ -2072,37 +2072,36 @@ the following example:
 
   .. code-block:: ada
 
- procedure R is
+ procedure M is
 type Int1 is new Integer;
+I1 : Int1;
+
 type Int2 is new Integer;
-type Int1A is access Int1;
-type Int2A is access Int2;
-Int1V : Int1A;
-Int2V : Int2A;
+type A2 is access Int2;
+V2 : A2;
 ...
 
  begin
 ...
 for J in Data'Range loop
-   if Data (J) = Int1V.all then
-  Int2V.all := Int2V.all + 1;
+   if Data (J) = I1 then
+  V2.all := V2.all + 1;
end if;
 end loop;
 ...
- end R;
+ end;
 
-In this example, since the variable ``Int1V`` can only access objects
-of type ``Int1``, and ``Int2V`` can only access objects of type
-``Int2``, there is no possibility that the assignment to
-``Int2V.all`` affects the value of ``Int1V.all``. This means that
-the compiler optimizer can "know" that the value ``Int1V.all`` is constant
-for all iterations of the loop and avoid the extra memory reference
-required to dereference it each time through the loop.
+In this example, since ``V2`` can only access objects of type ``Int2``
+and ``I1`` is not one of them, there is no possibility that the assignment
+to ``V2.all`` affects the value of ``I1``. This means that the compiler
+optimizer can infer that the value ``I1`` is constant for all iterations
+of the loop and load it from memory only once, before entering the loop,
+instead of in every iteration (this is called load hoisting).
 
-This kind of optimization, called strict aliasing analysis, is
+This kind of optimizations, based on strict type-based aliasing, is
 triggered by specifying an optimization level of :switch:`-O2` or
-higher or :switch:`-Os` and allows GNAT to generate more efficient code
-when access values are involved.
+higher (or :switch:`-Os`) and allows the compiler to generate more
+efficient code.
 
 However, although this optimization is always correct in terms of
 the formal semantics of the Ada Reference Manual, difficulties can
@@ -2111,173 +2110,214 @@ the typing system. Consider the following complete 
program example:
 
   .. code-block:: ada
 
-  package p1 is
- type int1 is new integer;
- type int2 is new integer;
- type a1 is access int1;
- type a2 is access int2;
-  end p1;
+  package P1 is
+ type Int1 is new Integer;
+ ty

[COMMITTED 09/30] ada: Fix static 'Img for enumeration type with Discard_Names

2024-05-20 Thread Marc Poulhiès
From: Piotr Trojanek 

Fix the short-circuit folding of 'Img for enumeration types, which wrongly
ignored Discard_Names and exposed the names of enumeration literals.
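
For readers who do not have the GNAT sources at hand, the folding rule that
the patch introduces can be modelled outside the compiler.  What follows is
only a sketch in C, not the front-end code: enum_img and its parameters are
hypothetical names chosen for illustration.  It assumes the behaviour stated
in the patch comment, namely that with Discard_Names in effect the image is
the decimal 'Pos value prefixed with a single space, and that otherwise it is
the literal name in upper case.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the static 'Img folding for an enumeration literal.
   discard_names : whether Discard_Names applies to the enumeration type
   pos           : the 'Pos value of the literal
   literal       : the source name of the literal
   Writes the image into out and returns its length, or -1 if it does not
   fit in out_size bytes.  */
int enum_img(bool discard_names, unsigned pos, const char *literal,
             char *out, size_t out_size)
{
    if (discard_names) {
        /* The name must not leak: emit " <pos>" instead.  */
        int n = snprintf(out, out_size, " %u", pos);
        return (n < 0 || (size_t) n >= out_size) ? -1 : n;
    }

    size_t len = strlen(literal);
    if (len >= out_size)
        return -1;

    /* Without Discard_Names the image is the upper-case literal name.  */
    for (size_t i = 0; i < len; i++)
        out[i] = (char) toupper((unsigned char) literal[i]);
    out[len] = '\0';
    return (int) len;
}
```

With this model, the second literal of a type subject to Discard_Names yields
the two-character string " 1", while the same literal without the pragma
yields its upper-case name.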

gcc/ada/

* sem_attr.adb (Eval_Attribute): Handle enumeration type with
Discard_Names.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_attr.adb | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
index 96f216cc587..2b22cf13ad0 100644
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -8221,13 +8221,26 @@ package body Sem_Attr is
   then
  declare
 Lit : constant Entity_Id := Expr_Value_E (P);
+Typ : constant Entity_Id := Etype (Entity (P));
 Str : String_Id;
 
  begin
 Start_String;
-Get_Unqualified_Decoded_Name_String (Chars (Lit));
-Set_Casing (All_Upper_Case);
-Store_String_Chars (Name_Buffer (1 .. Name_Len));
+
+--  If Discard_Names is in effect for the type, then we emit the
+--  numeric representation of the prefix literal 'Pos attribute,
+--  prefixed with a single space.
+
+if Discard_Names (Typ) then
+   UI_Image (Enumeration_Pos (Lit), Decimal);
+   Store_String_Char  (' ');
+   Store_String_Chars (UI_Image_Buffer (1 .. UI_Image_Length));
+else
+   Get_Unqualified_Decoded_Name_String (Chars (Lit));
+   Set_Casing (All_Upper_Case);
+   Store_String_Chars (Name_Buffer (1 .. Name_Len));
+end if;
+
 Str := End_String;
 
 Rewrite (N, Make_String_Literal (Loc, Strval => Str));
-- 
2.43.2



[COMMITTED 14/30] ada: Tweak handling of thread ID on POSIX

2024-05-20 Thread Marc Poulhiès
From: Ronan Desplanques 

This patch changes the task initialization subprograms on POSIX
platforms so that the thread ID of an ATCB is only set once.
This has the advantage of getting rid of the Atomic aspect on
the corresponding record component, and silences a Helgrind
warning about a data race.
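
The single-writer scheme can be illustrated directly against the POSIX API
that the runtime binds.  This is a minimal sketch in C, not the libgnarl
code: struct atcb, task_wrapper and create_task_demo are hypothetical
stand-ins.  It relies on the POSIX guarantee that a successful
pthread_create stores the new thread's ID in the location passed as its
first argument, and it uses a mutex to play the role of the RTS lock that
the creator holds across the call.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the part of an ATCB that matters here.  */
struct atcb {
    pthread_mutex_t lock;   /* plays the role of the RTS lock */
    pthread_t thread;       /* written once, by the creator only */
    bool id_matches;        /* filled in by the child for this demo */
};

/* Child side: never writes t->thread, and only reads it after the
   creator has released the lock.  */
static void *task_wrapper(void *arg)
{
    struct atcb *t = arg;

    pthread_mutex_lock(&t->lock);
    t->id_matches = pthread_equal(t->thread, pthread_self()) != 0;
    pthread_mutex_unlock(&t->lock);
    return NULL;
}

/* Creator side: the only write to t.thread is the one performed by
   pthread_create itself, while the lock is held.  */
bool create_task_demo(void)
{
    struct atcb t;
    t.id_matches = false;

    if (pthread_mutex_init(&t.lock, NULL) != 0)
        return false;

    pthread_mutex_lock(&t.lock);
    int rc = pthread_create(&t.thread, NULL, task_wrapper, &t);
    pthread_mutex_unlock(&t.lock);

    if (rc != 0) {
        pthread_mutex_destroy(&t.lock);
        return false;
    }

    pthread_join(t.thread, NULL);
    pthread_mutex_destroy(&t.lock);
    return t.id_matches;
}
```

Because the field has exactly one writer and the child's read is ordered
after that write by the lock, the sketch needs no atomic access to the
thread field.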

gcc/ada/

* libgnarl/s-taprop__linux.adb (Enter_Task): Move setting
of thread ID out of Enter_Task.
(Initialize): Set thread ID for the environment task.
(Create_Task): Remove now unnecessary Unrestricted_Access
attribute and add justification for a memory write.
* libgnarl/s-taprop__posix.adb: Likewise.
* libgnarl/s-taprop__qnx.adb: Likewise.
* libgnarl/s-taprop__rtems.adb: Likewise.
* libgnarl/s-taprop__solaris.adb: Likewise.
* libgnarl/s-taspri__posix.ads: Remove pragma Atomic for
Private_Data.Thread, and update documentation comment.
* libgnarl/s-taspri__lynxos.ads: Likewise.
* libgnarl/s-taspri__posix-noaltstack.ads: Likewise.
* libgnarl/s-taspri__solaris.ads: Likewise.
* libgnarl/s-tporft.adb (Register_Foreign_Thread): Adapt to
Enter_Task not setting the thread ID anymore.
* libgnarl/s-tassta.adb (Task_Wrapper): Update comment.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnarl/s-taprop__linux.adb| 14 +++---
 gcc/ada/libgnarl/s-taprop__posix.adb| 14 +++---
 gcc/ada/libgnarl/s-taprop__qnx.adb  | 14 +++---
 gcc/ada/libgnarl/s-taprop__rtems.adb| 14 +++---
 gcc/ada/libgnarl/s-taprop__solaris.adb  | 16 
 gcc/ada/libgnarl/s-taspri__lynxos.ads   | 16 ++--
 gcc/ada/libgnarl/s-taspri__posix-noaltstack.ads | 16 ++--
 gcc/ada/libgnarl/s-taspri__posix.ads| 16 ++--
 gcc/ada/libgnarl/s-taspri__solaris.ads  | 16 ++--
 gcc/ada/libgnarl/s-tassta.adb   |  2 +-
 gcc/ada/libgnarl/s-tporft.adb   |  1 +
 11 files changed, 78 insertions(+), 61 deletions(-)

diff --git a/gcc/ada/libgnarl/s-taprop__linux.adb 
b/gcc/ada/libgnarl/s-taprop__linux.adb
index 0c09817739c..0a51b3601c0 100644
--- a/gcc/ada/libgnarl/s-taprop__linux.adb
+++ b/gcc/ada/libgnarl/s-taprop__linux.adb
@@ -730,7 +730,6 @@ package body System.Task_Primitives.Operations is
  raise Invalid_CPU_Number;
   end if;
 
-  Self_ID.Common.LL.Thread := pthread_self;
   Self_ID.Common.LL.LWP := lwp_self;
 
   --  Set thread name to ease debugging. If the name of the task is
@@ -1004,14 +1003,14 @@ package body System.Task_Primitives.Operations is
   --  do not need to manipulate caller's signal mask at this point.
   --  All tasks in RTS will have All_Tasks_Mask initially.
 
-  --  Note: the use of Unrestricted_Access in the following call is needed
-  --  because otherwise we have an error of getting a access-to-volatile
-  --  value which points to a non-volatile object. But in this case it is
-  --  safe to do this, since we know we have no problems with aliasing and
-  --  Unrestricted_Access bypasses this check.
+  --  The write to T.Common.LL.Thread is not racy with regard to the
+  --  created thread because the created thread will not access it until
+  --  we release the RTS lock (or the current task's lock when
+  --  Restricted.Stages is used). One can verify that by inspecting the
+  --  Task_Wrapper procedures.
 
   Result := pthread_create
-(T.Common.LL.Thread'Unrestricted_Access,
+(T.Common.LL.Thread'Access,
  Thread_Attr'Access,
  Thread_Body_Access (Wrapper),
  To_Address (T));
@@ -1385,6 +1384,7 @@ package body System.Task_Primitives.Operations is
 
begin
   Environment_Task_Id := Environment_Task;
+  Environment_Task.Common.LL.Thread := pthread_self;
 
   Interrupt_Management.Initialize;
 
diff --git a/gcc/ada/libgnarl/s-taprop__posix.adb 
b/gcc/ada/libgnarl/s-taprop__posix.adb
index 7ed52ea2d82..fb70aaf4976 100644
--- a/gcc/ada/libgnarl/s-taprop__posix.adb
+++ b/gcc/ada/libgnarl/s-taprop__posix.adb
@@ -636,7 +636,6 @@ package body System.Task_Primitives.Operations is
 
procedure Enter_Task (Self_ID : Task_Id) is
begin
-  Self_ID.Common.LL.Thread := pthread_self;
   Self_ID.Common.LL.LWP := lwp_self;
 
   Specific.Set (Self_ID);
@@ -841,14 +840,14 @@ package body System.Task_Primitives.Operations is
   --  do not need to manipulate caller's signal mask at this point.
   --  All tasks in RTS will have All_Tasks_Mask initially.
 
-  --  Note: the use of Unrestricted_Access in the following call is needed
-  --  because otherwise we have an error of getting a access-to-volatile
-  --  value which points to a non-volatile object. But in this case it is
-  --  safe to do this, since we know we have no problems with aliasin

[COMMITTED 22/30] ada: Handle accessibility calculations for 'First and 'Last

2024-05-20 Thread Marc Poulhiès
From: Justin Squirek 

This patch fixes a crash in the compiler whereby calculating the accessibility
level of a local variable whose original expression is a 'First on an
array type led to an error during compilation.

gcc/ada/

* accessibility.adb (Accessibility_Level): Add cases for 'First
and 'Last.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/accessibility.adb | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/accessibility.adb b/gcc/ada/accessibility.adb
index c0a9d50f38a..33ce001718a 100644
--- a/gcc/ada/accessibility.adb
+++ b/gcc/ada/accessibility.adb
@@ -465,7 +465,15 @@ package body Accessibility is
 --  so handle these cases explicitly.
 
 elsif Attribute_Name (E)
-in Name_Old | Name_Loop_Entry | Name_Result | Name_Super
+in Name_Old|
+   Name_Loop_Entry |
+   Name_Result |
+   Name_Super  |
+   Name_Tag|
+   Name_Safe_First |
+   Name_Safe_Last  |
+   Name_First  |
+   Name_Last
 then
--  Named access types
 
-- 
2.43.2



[COMMITTED 11/30] ada: Fix incorrect free with Task_Info pragma

2024-05-20 Thread Marc Poulhiès
From: Ronan Desplanques 

Before this patch, on Linux, the procedure
System.Task_Primitives.Operations.Set_Task_Affinity called CPU_FREE on
instances of cpu_set_t_ptr that it didn't own when the obsolescent
Task_Info pragma was in play. This patch fixes that issue.
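
The shape of the fix is an ownership flag that records whether the set was
allocated locally.  The sketch below shows that pattern in portable C under
simplifying assumptions: calloc and free stand in for CPU_ALLOC and CPU_FREE,
a byte array stands in for the CPU set, and struct task, borrowed_set,
frees_performed and set_affinity_sketch are hypothetical names that do not
appear in the runtime.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical task record.  borrowed_set models a CPU set supplied by
   the user (as with Task_Info); the task does not own that memory.  */
struct task {
    int base_cpu;                 /* negative: no specific CPU requested */
    unsigned char *borrowed_set;  /* may be NULL */
};

/* Counts local frees so that the behaviour can be observed.  */
int frees_performed = 0;

int set_affinity_sketch(const struct task *t, size_t size)
{
    unsigned char *set = NULL;
    bool is_set_owned = false;

    if (t->base_cpu >= 0) {
        if ((size_t) t->base_cpu / 8 >= size)
            return -1;
        /* Locally allocated set: this function owns it.  */
        set = calloc(size, 1);
        if (set == NULL)
            return -1;
        is_set_owned = true;
        set[t->base_cpu / 8] |= (unsigned char) (1u << (t->base_cpu % 8));
    } else if (t->borrowed_set != NULL) {
        /* Borrowed set: owned by the caller, must not be freed here.  */
        set = t->borrowed_set;
    } else {
        return 0;  /* nothing to apply */
    }

    /* The real code applies the set with pthread_setaffinity_np here.  */

    if (is_set_owned) {
        free(set);
        frees_performed++;
    }
    return 0;
}
```

Freeing unconditionally at the end, as the old code did, would release the
caller's memory in the borrowed case; the flag limits the free to the sets
that were allocated locally.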

gcc/ada/

* libgnarl/s-taprop__linux.adb (Set_Task_Affinity): Fix
decision about whether to call CPU_FREE.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnarl/s-taprop__linux.adb | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/gcc/ada/libgnarl/s-taprop__linux.adb 
b/gcc/ada/libgnarl/s-taprop__linux.adb
index 1faa3d8914e..0c09817739c 100644
--- a/gcc/ada/libgnarl/s-taprop__linux.adb
+++ b/gcc/ada/libgnarl/s-taprop__linux.adb
@@ -1466,12 +1466,13 @@ package body System.Task_Primitives.Operations is
 and then T.Common.LL.Thread /= Null_Thread_Id
   then
  declare
-CPUs: constant size_t :=
-C.size_t (Multiprocessors.Number_Of_CPUs);
-CPU_Set : cpu_set_t_ptr := null;
-Size: constant size_t := CPU_ALLOC_SIZE (CPUs);
+CPUs : constant size_t :=
+  C.size_t (Multiprocessors.Number_Of_CPUs);
+CPU_Set  : cpu_set_t_ptr := null;
+Is_Set_Owned : Boolean := False;
+Size : constant size_t := CPU_ALLOC_SIZE (CPUs);
 
-Result  : C.int;
+Result   : C.int;
 
  begin
 --  We look at the specific CPU (Base_CPU) first, then at the
@@ -1483,6 +1484,7 @@ package body System.Task_Primitives.Operations is
--  Set the affinity to an unique CPU
 
CPU_Set := CPU_ALLOC (CPUs);
+   Is_Set_Owned := True;
System.OS_Interface.CPU_ZERO (Size, CPU_Set);
System.OS_Interface.CPU_SET
  (int (T.Common.Base_CPU), Size, CPU_Set);
@@ -1499,6 +1501,7 @@ package body System.Task_Primitives.Operations is
--  dispatching domain.
 
CPU_Set := CPU_ALLOC (CPUs);
+   Is_Set_Owned := True;
System.OS_Interface.CPU_ZERO (Size, CPU_Set);
 
for Proc in T.Common.Domain'Range loop
@@ -1512,7 +1515,9 @@ package body System.Task_Primitives.Operations is
   pthread_setaffinity_np (T.Common.LL.Thread, Size, CPU_Set);
 pragma Assert (Result = 0);
 
-CPU_FREE (CPU_Set);
+if Is_Set_Owned then
+   CPU_FREE (CPU_Set);
+end if;
  end;
   end if;
end Set_Task_Affinity;
-- 
2.43.2



[COMMITTED 12/30] ada: Resolve ACATS compilation and execution issues with container aggregates

2024-05-20 Thread Marc Poulhiès
From: Gary Dismukes 

This change set addresses various compilation and execution problems
encountered in the draft ACATS tests for container aggregates:

C435001 (container aggregates with Assign_Indexed)
C435002 (container aggregates with Add_Unnamed)
C435003 (container aggregates with Add_Named)
C435004 (container aggregates with Assign_Indexed and Add_Unnamed)

gcc/ada/

* exp_aggr.adb (Expand_Container_Aggregate): Add top-level
variables Choice_{Lo|Hi} and Int_Choice_{Lo|Hi} used for
determining the low and high bounds of component association
choices. Replace code for determining whether we have an indexed
aggregate with call to new function Sem_Aggr.Is_Indexed_Aggregate.
Remove test of whether Empty_Subp is a function, since it must be
a function. Move Default and Count_Type to be locals of a new
block enclosing the code that creates the object to hold the
aggregate length, and set them according to the default and type
of the Empty function's parameter when present (and to Empty and
Standard_Natural otherwise). Use Siz_Exp for the aggregate length
when set, and use Empty's default length when available, and use
zero for the length otherwise. In generating the call to the
New_Indexed function, use the determined lower and upper bounds if
determined earlier by Aggregate_Size, and otherwise compute those
from the index type's lower bound and the determined aggregate
length. In the case where a call to Empty is generated and the
function has a formal parameter, pass the value saved in Siz_Decl
(otherwise the parameter list is empty). Remove code specific to
making a parameterless call to the Empty function. Extend the code
for handling positional container aggregates to account for types
that define Assign_Indexed, rather than just Add_Unnamed, and in
the case of indexed aggregates, create a temporary object to hold
values of the aggregate's key index, and initialize and increment
that temporary for each call generated to the Assign_Indexed
procedure. For named container aggregates that have key choices
given by ranges, call Expand_Range_Component to generate a loop
that will call the appropriate insertion procedure for each value
of the range. For indexed aggregates with a Component_Associations
list, set and use the Assign_Indexed procedure for each component
association, whether or not there's an iterator specification.
(Add_Range_Size): Add code to determine the low and high bounds of
the range and capture those in up-level variables when their value
is less than or greater than (respectively) the current minimum
and maximum bounds values.
(Aggregate_Size): Separately handle the case where a single choice
is of a discrete type, and call Add_Range_Size to take its value
into consideration for determination of min and max bounds of the
aggregate. Add comments in a couple of places.
(Build_Siz_Exp): Remove the last sentence and "???" from the
comment that talks about accumulating nonstatic sizes, since that
sentence seems to be obsolete. Record the low and high bound
values in Choice_Lo and Choice_Hi in the case of a nonstatic
range.
(Expand_Iterated_Component): Set the Defining_Identifier of the
iterator specification to the Loop_Id in the
N_Iterated_Component_Association case.
(Expand_Range_Component): Procedure unnested from the block
handling indexed aggregates in Expand_Container_Aggregate, and
moved to top level of that procedure so it can also be called for
Add_Named cases. A formal parameter Insert_Op is added, and
existing calls to this procedure are changed to pass the
appropriate insertion procedure's Entity.
* sem_aggr.ads: Add with_clause for Sinfo.Nodes.
(Is_Indexed_Aggregate): New function for use by
Resolve_Container_Aggregate and Expand_Container_Aggregate.
* sem_aggr.adb: Add with_clause for Sem_Ch5. Move with_clause for
Sinfo.Nodes to sem_aggr.ads.
(Is_Indexed_Aggregate): New function to determine whether a
container aggregate is an indexed aggregate (replacing local
variable of the same name in Resolve_Container_Aggregate).
(Resolve_Iterated_Association): Remove part of comment saying that
a Key_Expression is always present. Set Parent field of the copy
of a component association with a loop parameter specification. On
the setting of Loop_Param_Id, account for a
Loop_Parameter_Specification being changed into an
Iterator_Specification as a result of being analyzed. Only call
Preanalyze_And_Resolve on Key_Expr when a key expression is
actua

[COMMITTED 13/30] ada: Extend expansion delaying mechanism to conditional expressions

2024-05-20 Thread Marc Poulhiès
From: Eric Botcazou 

When an aggregate that needs to be converted into a series of assignments is
present in an expression of a parent aggregate, or in the expression of an
allocator, an object declaration, or an assignment in very specific cases,
its expansion is delayed until its parent itself is expanded.  This makes
it possible to avoid creating a superfluous temporary for the aggregate.

This change extends the delaying mechanism in the case of record aggregates
to intermediate conditional expressions, that is to say, to the conditional
expressions that are present between the parent and the aggregate, provided
that the aggregate be a dependent expression, directly or recursively.  This
again makes it possible to avoid creating a temporary for the aggregate.

gcc/ada/

* exp_aggr.ads (Is_Delayed_Conditional_Expression): New predicate.
* exp_aggr.adb (Convert_To_Assignments.Known_Size): Likewise.
(Convert_To_Assignments): Climb the parent chain, looking through
qualified expressions and dependent expressions of conditional
expressions, to find out whether the expansion may be delayed.
Call Known_Size for this in the case of an object declaration.
If so, set Expansion_Delayed on the aggregate as well as all the
intermediate conditional expressions.
(Initialize_Component): Reset the Analyzed flag on an initialization
expression that is a conditional expression whose expansion has been
delayed.
(Is_Delayed_Conditional_Expression): New predicate.
* exp_ch3.adb (Expand_N_Object_Declaration): Handle initialization
expressions that are conditional expressions whose expansion has
been delayed.
* exp_ch4.adb (Build_Explicit_Assignment): New procedure.
(Expand_Allocator_Expression): Handle initialization expressions
that are conditional expressions whose expansion has been delayed.
(Expand_N_Case_Expression): Deal with expressions whose expansion
has been delayed by waiting for the rewriting of their parent as
an assignment statement and then optimizing the assignment.
(Expand_N_If_Expression): Likewise.
(Expand_N_Qualified_Expression): Do not apply a predicate check to
an operand that is a delayed aggregate or conditional expression.
* gen_il-gen-gen_nodes.adb (N_If_Expression): Add Expansion_Delayed
semantic flag.
(N_Case_Expression): Likewise.
* sinfo.ads (Expansion_Delayed): Document extended usage.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb | 201 -
 gcc/ada/exp_aggr.ads |   4 +
 gcc/ada/exp_ch3.adb  |  38 
 gcc/ada/exp_ch4.adb  | 363 ---
 gcc/ada/gen_il-gen-gen_nodes.adb |   4 +-
 gcc/ada/sinfo.ads|   4 +
 6 files changed, 479 insertions(+), 135 deletions(-)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index 6208b49ffd9..a386aa85ae4 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -4216,84 +4216,152 @@ package body Exp_Aggr is
procedure Convert_To_Assignments (N : Node_Id; Typ : Entity_Id) is
   Loc : constant Source_Ptr := Sloc (N);
 
-  Aggr_Code   : List_Id;
-  Full_Typ: Entity_Id;
-  Instr   : Node_Id;
-  Parent_Kind : Node_Kind;
-  Parent_Node : Node_Id;
-  Target_Expr : Node_Id;
-  Temp: Entity_Id;
-  Unc_Decl: Boolean := False;
+  function Known_Size (Decl : Node_Id; Cond_Init : Boolean) return Boolean;
+  --  Decl is an N_Object_Declaration node. Return true if it declares an
+  --  object with a known size; in this context, that is always the case,
+  --  except for a declaration without explicit constraints of an object,
+  --  either whose nominal subtype is class-wide, or whose initialization
+  --  contains a conditional expression and whose nominal subtype is both
+  --  discriminated and unconstrained.
+
+  ----------------
+  -- Known_Size --
+  ----------------
+
+  function Known_Size (Decl : Node_Id; Cond_Init : Boolean) return Boolean
+  is
+  begin
+ if Is_Entity_Name (Object_Definition (Decl)) then
+declare
+   Typ : constant Entity_Id := Entity (Object_Definition (Decl));
+
+begin
+   return not Is_Class_Wide_Type (Typ)
+ and then not (Cond_Init
+and then Has_Discriminants (Typ)
+and then not Is_Constrained (Typ));
+end;
+
+ else
+return True;
+ end if;
+  end Known_Size;
+
+  --  Local variables
+
+  Aggr_Code: List_Id;
+  Full_Typ : Entity_Id;
+  In_Cond_Expr : Boolean;
+  Instr: Node_Id;
+  Node : Node_Id;
+  Parent_Node  : Node_Id;
+  Target_Exp

[COMMITTED 24/30] ada: Error on instantiation of generic containing legal container aggregate

2024-05-20 Thread Marc Poulhiès
From: Gary Dismukes 

When a container aggregate for a predefined container type (such as
a Vector type) that has an iterated component association occurs within
a generic unit and that generic is instantiated, the compiler reports
a spurious error message "iterated component association can only appear
in an array aggregate" and the compilation aborts (because Unrecoverable_Error
is raised unconditionally after that error). The problem is that as part of
the instantiation process, for aggregates whose type has a partial view,
in Copy_Generic_Node the compiler switches the visibility so that the full
view of the type is available, and for a type whose full view is a record
type this leads to incorrectly trying to process the aggregate as a record
aggregate in Resolve_Aggregate (making a call to Resolve_Record_Aggregate).

Rather than trying to address this by changing what Copy_Generic_Node does,
this can be fixed by reordering and adjusting the code in Resolve_Aggregate,
so that we first test whether we need to resolve as a record aggregate
(if the aggregate is not homogeneous), followed by testing whether the
type has an Aggregate aspect and calling Resolve_Container_Aggregate.
As a bonus, we also remove the subsequent complex condition and redundant
code for handling null container aggregates.

gcc/ada/

* sem_aggr.adb (Resolve_Aggregate): Move condition and call for
Resolve_Record_Aggregate in front of code related to calling
Resolve_Container_Aggregate (and add test that the aggregate is
not homogeneous), and remove special-case testing and call to
Resolve_Container_Aggregate for empty aggregates. Also, add error
check for an attempt to use "[]" for an aggregate of a record type
that does not specify an Aggregate aspect.
(Resolve_Record_Aggregate): Remove error check for record
aggregates with "[]" (now done by Resolve_Aggregate).

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_aggr.adb | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/gcc/ada/sem_aggr.adb b/gcc/ada/sem_aggr.adb
index 6e40e5c2564..60738550ec1 100644
--- a/gcc/ada/sem_aggr.adb
+++ b/gcc/ada/sem_aggr.adb
@@ -1198,6 +1198,14 @@ package body Sem_Aggr is
 
  Resolve_Container_Aggregate (N, Typ);
 
+  --  Check for an attempt to use "[]" for an aggregate of a record type
+  --  after handling the case where the type has an Aggregate aspect,
+  --  because the aspect can be specified for record types, but if it
+  --  wasn't specified, then this is an error.
+
+  elsif Is_Record_Type (Typ) and then Is_Homogeneous_Aggregate (N) then
+ Error_Msg_N ("record aggregate must use (), not '[']", N);
+
   elsif Is_Array_Type (Typ) then
 
  --  First a special test, for the case of a positional aggregate of
@@ -5518,15 +5526,6 @@ package body Sem_Aggr is
  return;
   end if;
 
-  --  A record aggregate can only use parentheses
-
-  if Nkind (N) = N_Aggregate
-and then Is_Homogeneous_Aggregate (N)
-  then
- Error_Msg_N ("record aggregate must use (), not '[']", N);
- return;
-  end if;
-
   --  STEP 2: Verify aggregate structure
 
   Step_2 : declare
-- 
2.43.2



[COMMITTED 10/30] ada: Another small cleanup about allocators and aggregates

2024-05-20 Thread Marc Poulhiès
From: Eric Botcazou 

This eliminates a few more oddities present in the expander for allocators
and aggregates nested in allocators and other constructs:

  - Convert_Aggr_In_Allocator takes both the N_Allocator and the aggregate
as parameters, while the sibling procedures Convert_Aggr_In_Assignment
and Convert_Aggr_In_Object_Decl only take the former.  This changes the
first to be consistent with the two others and propagates the change to
Convert_Array_Aggr_In_Allocator.

  - Convert_Aggr_In_Object_Decl contains an awkward code structure with a
useless inner block statement.

  - In_Place_Assign_OK and Convert_To_Assignments have some declarations of
local variables not in the right place.

No functional changes (presumably).

gcc/ada/

* exp_aggr.ads (Convert_Aggr_In_Allocator): Remove Aggr parameter
and adjust description.
(Convert_Aggr_In_Object_Decl): Adjust description.
* exp_aggr.adb (Convert_Aggr_In_Allocator): Remove Aggr parameter
and add local variable of the same name instead.  Adjust call to
Convert_Array_Aggr_In_Allocator.
(Convert_Aggr_In_Object_Decl): Add comment for early return and
remove useless inner block statement.
(Convert_Array_Aggr_In_Allocator):  Remove Aggr parameter and add
local variable of the same name instead.
(In_Place_Assign_OK): Move down declarations of local variables.
(Convert_To_Assignments): Put all declarations of local variables
in the same place.  Fix typo in comment.  Replace T with Full_Typ.
* exp_ch4.adb (Expand_Allocator_Expression): Call Unqualify instead
of Expression on the qualified expression of the allocator for the
sake of consistency.  Adjust call to Convert_Aggr_In_Allocator.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb | 188 +--
 gcc/ada/exp_aggr.ads |  18 ++---
 gcc/ada/exp_ch4.adb  |   4 +-
 3 files changed, 104 insertions(+), 106 deletions(-)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index 2476675604c..8a3d1685cb3 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -282,10 +282,7 @@ package body Exp_Aggr is
--Indexes is the current list of expressions used to index the object we
--are writing into.
 
-   procedure Convert_Array_Aggr_In_Allocator
- (N  : Node_Id;
-  Aggr   : Node_Id;
-  Target : Node_Id);
+   procedure Convert_Array_Aggr_In_Allocator (N : Node_Id; Target : Node_Id);
--  If the aggregate appears within an allocator and can be expanded in
--  place, this routine generates the individual assignments to components
--  of the designated object. This is an optimization over the general
@@ -3543,11 +3540,8 @@ package body Exp_Aggr is
-- Convert_Aggr_In_Allocator --
---
 
-   procedure Convert_Aggr_In_Allocator
- (N: Node_Id;
-  Aggr : Node_Id;
-  Temp : Entity_Id)
-   is
+   procedure Convert_Aggr_In_Allocator (N : Node_Id; Temp : Entity_Id) is
+  Aggr : constant Node_Id:= Unqualify (Expression (N));
   Loc  : constant Source_Ptr := Sloc (Aggr);
   Typ  : constant Entity_Id  := Etype (Aggr);
 
@@ -3557,7 +3551,7 @@ package body Exp_Aggr is
 
begin
   if Is_Array_Type (Typ) then
- Convert_Array_Aggr_In_Allocator (N, Aggr, Occ);
+ Convert_Array_Aggr_In_Allocator (N, Occ);
 
   elsif Has_Default_Init_Comps (Aggr) then
  declare
@@ -3605,12 +3599,9 @@ package body Exp_Aggr is
   Aggr : constant Node_Id:= Unqualify (Expression (N));
   Loc  : constant Source_Ptr := Sloc (Aggr);
   Typ  : constant Entity_Id  := Etype (Aggr);
-  Occ  : constant Node_Id:= New_Occurrence_Of (Obj, Loc);
-
-  Has_Transient_Scope : Boolean := False;
 
   function Discriminants_Ok return Boolean;
-  --  If the object type is constrained, the discriminants in the
+  --  If the object's subtype is constrained, the discriminants in the
   --  aggregate must be checked against the discriminants of the subtype.
   --  This cannot be done using Apply_Discriminant_Checks because after
   --  expansion there is no aggregate left to check.
@@ -3677,10 +3668,19 @@ package body Exp_Aggr is
  return True;
   end Discriminants_Ok;
 
+  --  Local variables
+
+  Has_Transient_Scope : Boolean;
+  Occ : Node_Id;
+  Param   : Node_Id;
+  Stmt: Node_Id;
+  Stmts   : List_Id;
+
--  Start of processing for Convert_Aggr_In_Object_Decl
 
begin
-  Set_Assignment_OK (Occ);
+  --  First generate discriminant checks if need be, and bail out if one
+  --  of them fails statically.
 
   if Has_Discriminants (Typ)
 and then Typ /= Etype (Obj)
@@ -3706,61 +3706,59 @@ package body Exp_Aggr is
   then
  Establish_Trans

[COMMITTED 23/30] ada: Error on instantiation of generic containing legal container aggregate

2024-05-20 Thread Marc Poulhiès
From: Gary Dismukes 

When a container aggregate for a predefined container type (such as
a Vector type) that has an iterated component association occurs within
a generic unit and that generic is instantiated, the compiler reports
a spurious error message "iterated component association can only appear
in an array aggregate" and the compilation aborts (because Unrecoverable_Error
is raised unconditionally after that error). The problem is that as part of
the instantiation process, for aggregates whose type has a partial view,
in Copy_Generic_Node the compiler switches the visibility so that the full
view of the type is available, and for a type whose full view is a record
type this leads to incorrectly trying to process the aggregate as a record
aggregate in Resolve_Aggregate (making a call to Resolve_Record_Aggregate).

Rather than trying to address this by changing what Copy_Generic_Node does,
this can be fixed by reordering and adjusting the code in Resolve_Aggregate,
so that we first test whether we need to resolve as a record aggregate
(if the aggregate is not homogeneous), followed by testing whether the
type has an Aggregate aspect and calling Resolve_Container_Aggregate.
As a bonus, we also remove the subsequent complex condition and redundant
code for handling null container aggregates.

gcc/ada/

* sem_aggr.adb (Resolve_Aggregate): Move condition and call for
Resolve_Record_Aggregate in front of code related to calling
Resolve_Container_Aggregate (and add test that the aggregate
is not homogeneous), and remove special-case testing and call
to Resolve_Container_Aggregate for empty aggregates.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_aggr.adb | 22 +-
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/gcc/ada/sem_aggr.adb b/gcc/ada/sem_aggr.adb
index 658b3a4634c..6e40e5c2564 100644
--- a/gcc/ada/sem_aggr.adb
+++ b/gcc/ada/sem_aggr.adb
@@ -1182,8 +1182,12 @@ package body Sem_Aggr is
   elsif Is_Array_Type (Typ) and then Null_Record_Present (N) then
  Error_Msg_N ("null record forbidden in array aggregate", N);
 
+  elsif Is_Record_Type (Typ)
+and then not Is_Homogeneous_Aggregate (N)
+  then
+ Resolve_Record_Aggregate (N, Typ);
+
   elsif Has_Aspect (Typ, Aspect_Aggregate)
-and then Ekind (Typ) /= E_Record_Type
 and then Ada_Version >= Ada_2022
   then
  --  Check for Ada 2022 and () aggregate.
@@ -1194,22 +1198,6 @@ package body Sem_Aggr is
 
  Resolve_Container_Aggregate (N, Typ);
 
-  --  Check Ada 2022 empty aggregate [] initializing a record type that has
-  --  aspect aggregate; the empty aggregate will be expanded into a call to
-  --  the empty function specified in the aspect aggregate.
-
-  elsif Has_Aspect (Typ, Aspect_Aggregate)
-and then Ekind (Typ) = E_Record_Type
-and then Is_Homogeneous_Aggregate (N)
-and then Is_Empty_List (Expressions (N))
-and then Is_Empty_List (Component_Associations (N))
-and then Ada_Version >= Ada_2022
-  then
- Resolve_Container_Aggregate (N, Typ);
-
-  elsif Is_Record_Type (Typ) then
- Resolve_Record_Aggregate (N, Typ);
-
   elsif Is_Array_Type (Typ) then
 
  --  First a special test, for the case of a positional aggregate of
-- 
2.43.2



[COMMITTED 20/30] ada: Fix list of implementation-defined attributes

2024-05-20 Thread Marc Poulhiès
From: Piotr Trojanek 

Several of the implementation-defined attributes were wrongly recognized
as defined by the Ada RM.

This change only affects code with restriction
No_Implementation_Attributes.

gcc/ada/

* sem_attr.ads (Attribute_Impl_Def): Fix list of
implementation-defined attributes.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_attr.ads | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/gcc/ada/sem_attr.ads b/gcc/ada/sem_attr.ads
index 40ec423c4c7..52359e40ef6 100644
--- a/gcc/ada/sem_attr.ads
+++ b/gcc/ada/sem_attr.ads
@@ -609,6 +609,33 @@ package Sem_Attr is
   --  for constructing this definition in package System (see note above
   --  in Default_Bit_Order description). This is a static attribute.
 
+  Attribute_Atomic_Always_Lock_Free|
+  Attribute_Bit_Position   |
+  Attribute_Compiler_Version   |
+  Attribute_Descriptor_Size|
+  Attribute_Enabled|
+  Attribute_Fast_Math  |
+  Attribute_From_Any   |
+  Attribute_Has_Access_Values  |
+  Attribute_Has_Tagged_Values  |
+  Attribute_Initialized|
+  Attribute_Library_Level  |
+  Attribute_Pool_Address   |
+  Attribute_Restriction_Set|
+  Attribute_Scalar_Storage_Order   |
+  Attribute_Simple_Storage_Pool|
+  Attribute_Small_Denominator  |
+  Attribute_Small_Numerator|
+  Attribute_System_Allocator_Alignment |
+  Attribute_To_Any |
+  Attribute_TypeCode   |
+  Attribute_Type_Key   |
+  Attribute_Unconstrained_Array|
+  Attribute_Update |
+  Attribute_Valid_Value|
+  Attribute_Wchar_T_Size   => True,
+  --  See description in GNAT RM
+
   others => False);
 
--  The following table lists all attributes that yield a result of a
-- 
2.43.2



[COMMITTED 26/30] ada: Formal package comment corrections in sinfo.ads

2024-05-20 Thread Marc Poulhiès
From: Bob Duff 

Misc comment corrections and clarifications in sinfo.ads
related to generic formal packages.

gcc/ada/

* sinfo.ads: Misc comment corrections and clarifications.

The syntax for GENERIC_ASSOCIATION and FORMAL_PACKAGE_ACTUAL_PART
was wrong.

Emphasize that "others => <>" is not represented as an
N_Generic_Association (with or without Box_Present set),
and give examples illustrating the various possibilities.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sinfo.ads | 61 +++
 1 file changed, 46 insertions(+), 15 deletions(-)

diff --git a/gcc/ada/sinfo.ads b/gcc/ada/sinfo.ads
index 228082eb823..599f4f63cce 100644
--- a/gcc/ada/sinfo.ads
+++ b/gcc/ada/sinfo.ads
@@ -1574,9 +1574,9 @@ package Sinfo is
--  Instance_Spec
--This field is present in generic instantiation nodes, and also in
--formal package declaration nodes (formal package declarations are
-   --treated in a manner very similar to package instantiations). It points
-   --to the node for the spec of the instance, inserted as part of the
-   --semantic processing for instantiations in Sem_Ch12.
+   --treated similarly to package instantiations). It points to the node
+   --for the spec of the instance, inserted as part of the semantic
+   --processing for instantiations in Sem_Ch12.
 
--  Is_Abort_Block
--Present in N_Block_Statement nodes. True if the block protects a list
@@ -3639,8 +3639,8 @@ package Sinfo is
 
   --  The only choice that appears explicitly is the OTHERS choice, as
   --  defined here. Other cases of discrete choice (expression and
-  --  discrete range) appear directly. This production is also used
-  --  for the OTHERS possibility of an exception choice.
+  --  discrete range) appear directly. N_Others_Choice is also used
+  --  in exception handlers and generic formal packages.
 
   --  Note: in accordance with the syntax, the parser does not check that
   --  OTHERS appears at the end on its own in a choice list context. This
@@ -7139,6 +7139,7 @@ package Sinfo is
 
   --  GENERIC_ASSOCIATION ::=
   --[generic_formal_parameter_SELECTOR_NAME =>]
+  --  EXPLICIT_GENERIC_ACTUAL_PARAMETER
 
   --  Note: unlike the procedure call case, a generic association node
   --  is generated for every association, even if no formal parameter
@@ -7149,7 +7150,8 @@ package Sinfo is
   --  In Ada 2005, a formal may be associated with a box, if the
   --  association is part of the list of actuals for a formal package.
   --  If the association is given by  OTHERS => <>, the association is
-  --  an N_Others_Choice.
+  --  an N_Others_Choice (not an N_Generic_Association whose Selector_Name
+  --  is an N_Others_Choice).
 
   --  N_Generic_Association
   --  Sloc points to first token of generic association
@@ -7442,7 +7444,7 @@ package Sinfo is
   --  Defining_Identifier
   --  Name
   --  Generic_Associations (set to No_List if (<>) case or
-  --   empty generic actual part)
+  --   empty formal package actual part)
   --  Box_Present
   --  Instance_Spec
   --  Is_Known_Guaranteed_ABE
@@ -7452,21 +7454,50 @@ package Sinfo is
   --
 
   --  FORMAL_PACKAGE_ACTUAL_PART ::=
-  --([OTHERS] => <>)
+  --([OTHERS =>] <>)
   --| [GENERIC_ACTUAL_PART]
-  --(FORMAL_PACKAGE_ASSOCIATION {. FORMAL_PACKAGE_ASSOCIATION}
+  --| (FORMAL_PACKAGE_ASSOCIATION {, FORMAL_PACKAGE_ASSOCIATION}
+  --[, OTHERS => <>])
 
   --  FORMAL_PACKAGE_ASSOCIATION ::=
   --   GENERIC_ASSOCIATION
   --  | GENERIC_FORMAL_PARAMETER_SELECTOR_NAME => <>
 
   --  There is no explicit node in the tree for a formal package actual
-  --  part. Instead the information appears in the parent node (i.e. the
-  --  formal package declaration node itself).
-
-  --  There is no explicit node for a formal package association. All of
-  --  them are represented either by a generic association, possibly with
-  --  Box_Present, or by an N_Others_Choice.
+  --  part, nor for a formal package association. A formal package
+  --  association is represented as a generic association, possibly with
+  --  Box_Present.
+  --
+  --  The "others => <>" syntax (both cases) is represented as an
+  --  N_Others_Choice (not an N_Generic_Association whose Selector_Name
+  --  is an N_Others_Choice). This admittedly odd representation does not
+  --  lose information, because "others" cannot be followed by anything
+  --  other than "=> <>". Thus:
+  --
+  --  "... is new G;"
+  --The N_Formal_Package_Declaration has empty Generic_Associations,
+  --and Box_Present = False.
+  --
+  --  "... is new G(<>);"
+  -

[COMMITTED 17/30] ada: Remove repeated condition in check for implementation attributes

2024-05-20 Thread Marc Poulhiès
From: Piotr Trojanek 

Code cleanup; semantics is unaffected.

gcc/ada/

* sem_attr.adb (Analyze_Attribute): Remove condition that is
already checked by an enclosing IF statement.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_attr.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
index 2b22cf13ad0..6c32d201c55 100644
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -3225,7 +3225,7 @@ package body Sem_Attr is
 
   if Comes_From_Source (N) then
  if not Attribute_83 (Attr_Id) then
-if Ada_Version = Ada_83 and then Comes_From_Source (N) then
+if Ada_Version = Ada_83 then
Error_Msg_Name_1 := Aname;
Error_Msg_N ("(Ada 83) attribute% is not standard??", N);
 end if;
-- 
2.43.2



[COMMITTED 21/30] ada: Further refine 'Super attribute

2024-05-20 Thread Marc Poulhiès
From: Justin Squirek 

This patch relaxes the restriction on 'Super such that it can apply to abstract
type objects.

gcc/ada/

* sem_attr.adb (Analyze_Attribute): Remove restriction on 'Super
for abstract types.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_attr.adb | 4 
 1 file changed, 4 deletions(-)

diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
index df52229b6aa..403810c8b5e 100644
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -6683,10 +6683,6 @@ package body Sem_Attr is
 elsif Depends_On_Private (P_Type) then
Error_Attr_P ("prefix type of % is a private extension");
 
---  Check that we don't view convert to an abstract type
-
-elsif Is_Abstract_Type (Node (First_Elmt (Parents))) then
-   Error_Attr_P ("type of % cannot be abstract");
 end if;
 
 --  Generate a view conversion and analyze it
-- 
2.43.2



[COMMITTED 18/30] ada: Apply restriction No_Implementation_Attributes to source nodes only

2024-05-20 Thread Marc Poulhiès
From: Piotr Trojanek 

Restriction No_Implementation_Attributes must not be applied to nodes
that come from expansion. In particular, it must not be applied to
Object_Size, which is an implementation-defined attribute before Ada 2022,
but appears in expansion of tagged types since Ada 95.

gcc/ada/

* sem_attr.adb (Analyze_Attribute): Move IF statement that
checks restriction No_Implementation_Attributes for Ada 2005,
2012 and Ada 2022 attributes inside Comes_From_Source condition
that checks the same restriction for Ada 83 attributes.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_attr.adb | 27 ++-
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
index 6c32d201c55..414224e86b6 100644
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -3221,9 +3221,10 @@ package body Sem_Attr is
 
   Check_Restriction_No_Use_Of_Attribute (N);
 
-  --  Deal with Ada 83 issues
-
   if Comes_From_Source (N) then
+
+ --  Deal with Ada 83 issues
+
  if not Attribute_83 (Attr_Id) then
 if Ada_Version = Ada_83 then
Error_Msg_Name_1 := Aname;
@@ -3234,19 +3235,19 @@ package body Sem_Attr is
Check_Restriction (No_Implementation_Attributes, N);
 end if;
  end if;
-  end if;
 
-  --  Deal with Ada 2005 attributes that are implementation attributes
-  --  because they appear in a version of Ada before Ada 2005, ditto for
-  --  Ada 2012 and Ada 2022 attributes appearing in an earlier version.
+ --  Deal with Ada 2005 attributes that are implementation attributes
+ --  because they appear in a version of Ada before Ada 2005, ditto for
+ --  Ada 2012 and Ada 2022 attributes appearing in an earlier version.
 
-  if (Attribute_05 (Attr_Id) and then Ada_Version < Ada_2005)
-or else
- (Attribute_12 (Attr_Id) and then Ada_Version < Ada_2012)
-or else
- (Attribute_22 (Attr_Id) and then Ada_Version < Ada_2022)
-  then
- Check_Restriction (No_Implementation_Attributes, N);
+ if (Attribute_05 (Attr_Id) and then Ada_Version < Ada_2005)
+   or else
+(Attribute_12 (Attr_Id) and then Ada_Version < Ada_2012)
+   or else
+(Attribute_22 (Attr_Id) and then Ada_Version < Ada_2022)
+ then
+Check_Restriction (No_Implementation_Attributes, N);
+ end if;
   end if;
 
   --   Remote access to subprogram type access attribute reference needs
-- 
2.43.2



[COMMITTED 25/30] ada: Add Is_Base_Type predicate to C interface

2024-05-20 Thread Marc Poulhiès
From: Eric Botcazou 

This also documents what the predicate effectively does.

gcc/ada/

* einfo-utils.ads (Is_Base_Type): Move to Miscellaneous Subprograms
section and add description.
* fe.h (Is_Base_Type): Declare.

Tested on x86_64-pc-linux-gnu, committed on master.
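
For readers unfamiliar with the fe.h convention, here is a minimal sketch of how the macro/extern pair works. It is only an illustration: the typedefs and the function body are stand-ins (the real types come from the GNAT C headers and the real body is the Ada subprogram), and needs_base_type_handling is a hypothetical caller.

```cpp
// Stand-in typedefs, assumed for this sketch only.
typedef int Entity_Id;
typedef unsigned char Boolean;

// Same shape as the fe.h lines in the patch: the macro maps the
// C-friendly name onto the link name of the Ada subprogram.
#define Is_Base_Type einfo__utils__is_base_type
extern "C" Boolean Is_Base_Type (Entity_Id);

// Stand-in for the Ada body so that the sketch links on its own.
// It simply pretends that even ids denote base types.
extern "C" Boolean einfo__utils__is_base_type (Entity_Id id)
{
  return id % 2 == 0;
}

// Hypothetical caller: it uses the short name, and the preprocessor
// rewrites the call to the Ada link name.
bool needs_base_type_handling (Entity_Id id)
{
  return Is_Base_Type (id) != 0;
}
```

The point of the design is that C code can keep using the Ada spelling of the name while the linker sees the mangled unit-qualified symbol.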

---
 gcc/ada/einfo-utils.ads | 8 ++--
 gcc/ada/fe.h| 4 +++-
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/einfo-utils.ads b/gcc/ada/einfo-utils.ads
index d87a3e34f49..01953c35bc3 100644
--- a/gcc/ada/einfo-utils.ads
+++ b/gcc/ada/einfo-utils.ads
@@ -183,8 +183,6 @@ package Einfo.Utils is
function Has_Null_Abstract_State (Id : E) return B;
function Has_Null_Visible_Refinement (Id : E) return B;
function Implementation_Base_Type (Id : E) return E;
-   function Is_Base_Type (Id : E) return B with Inline;
-   --  Note that Is_Base_Type returns True for nontypes
function Is_Boolean_Type (Id : E) return B with Inline;
function Is_Constant_Object (Id : E) return B with Inline;
function Is_Controlled (Id : E) return B with Inline;
@@ -504,6 +502,12 @@ package Einfo.Utils is
--  is the name of a class_wide type whose root is incomplete, return the
--  corresponding full declaration, else return T itself.
 
+   function Is_Base_Type (Id : E) return B with Inline;
+   --  Return True for a type entity and False for a subtype entity. Note that
+   --  this returns True for nontypes.
+
+   --  WARNING: There is a matching C declaration of this subprogram in fe.h
+
function Is_Entity_Name (N : Node_Id) return Boolean with Inline;
--  Test if the node N is the name of an entity (i.e. is an identifier,
--  expanded name, or an attribute reference that returns an entity).
diff --git a/gcc/ada/fe.h b/gcc/ada/fe.h
index 692c29a70af..b4c1aea5c8b 100644
--- a/gcc/ada/fe.h
+++ b/gcc/ada/fe.h
@@ -98,9 +98,11 @@ extern void Set_Normalized_First_Bit (Entity_Id, Uint);
 extern void Set_Normalized_Position(Entity_Id, Uint);
 extern void Set_RM_Size(Entity_Id, Uint);
 
+#define Is_Base_Type   einfo__utils__is_base_type
 #define Is_Entity_Name einfo__utils__is_entity_name
 
-extern Boolean Is_Entity_Name  (Node_Id);
+extern Boolean Is_Base_Type(Entity_Id);
+extern Boolean Is_Entity_Name  (Node_Id);
 
 #define Get_Attribute_Definition_Clause einfo__utils__get_attribute_definition_clause
 
-- 
2.43.2



[COMMITTED 19/30] ada: Fix list of attributes defined by Ada 2012

2024-05-20 Thread Marc Poulhiès
From: Piotr Trojanek 

Recognize references to attributes Old, Overlaps_Storage and Result as
language-defined in Ada 2012 and implementation-defined in earlier
versions of Ada. Other attributes introduced by Ada 2012 RM are
correctly categorized.

This change only affects code with restriction
No_Implementation_Attributes.

gcc/ada/

* sem_attr.adb (Attribute_12): Add attributes Old,
Overlaps_Storage and Result.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_attr.adb | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
index 414224e86b6..df52229b6aa 100644
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -170,7 +170,10 @@ package body Sem_Attr is
  (Attribute_First_Valid  |
   Attribute_Has_Same_Storage |
   Attribute_Last_Valid   |
-  Attribute_Max_Alignment_For_Allocation => True,
+  Attribute_Max_Alignment_For_Allocation |
+  Attribute_Old  |
+  Attribute_Overlaps_Storage |
+  Attribute_Result   => True,
   others => False);
 
--  The following array is the list of attributes defined in the Ada 2022
-- 
2.43.2



[COMMITTED 29/30] ada: Add direct workaround for limitations of RTSfind mechanism

2024-05-20 Thread Marc Poulhiès
From: Eric Botcazou 

This adds a direct workaround for the spurious compilation errors caused by
the presence of preconditions/postconditions in the Interfaces.C unit, which
trip on limitations of the RTSfind mechanism when it comes to visibility, as
well as removes an indirect workaround that was added very recently.

These errors were first triggered in the context of finalization and worked
around by preloading the System.Finalization_Primitives unit.  Now they also
appear in the context of tasking, and it turns out that the preloading trick
does not work for separate compilation units.

gcc/ada/

* exp_ch7.ads (Preload_Finalization_Collection): Delete.
* exp_ch7.adb (Allows_Finalization_Collection): Revert change.
(Preload_Finalization_Collection): Delete.
* opt.ads (Interface_Seen): Likewise.
* scng.adb (Scan): Revert latest change.
* sem_ch10.adb: Remove clause for Exp_Ch7.
(Analyze_Compilation_Unit): Revert latest change.
* libgnat/i-c.ads: Use a fully qualified name for the standard "+"
operator in the preconditions/postconditions of subprograms.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch7.adb | 38 --
 gcc/ada/exp_ch7.ads |  6 --
 gcc/ada/libgnat/i-c.ads | 19 +++
 gcc/ada/opt.ads |  4 
 gcc/ada/scng.adb|  5 +
 gcc/ada/sem_ch10.adb|  3 ---
 6 files changed, 12 insertions(+), 63 deletions(-)

diff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
index fdacf1cdc01..993c13c7318 100644
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -965,12 +965,6 @@ package body Exp_Ch7 is
   if Restriction_Active (No_Finalization) then
  return False;
 
-  --  The System.Finalization_Primitives unit must have been preloaded if
-  --  finalization is really required.
-
-  elsif not RTU_Loaded (System_Finalization_Primitives) then
- return False;
-
   --  Do not consider C and C++ types since it is assumed that the non-Ada
   --  side will handle their cleanup.
 
@@ -8630,38 +8624,6 @@ package body Exp_Ch7 is
   return Scope_Stack.Table (Scope_Stack.Last).Node_To_Be_Wrapped;
end Node_To_Be_Wrapped;
 
-   --
-   -- Preload_Finalization_Collection --
-   --
-
-   procedure Preload_Finalization_Collection (Compilation_Unit : Node_Id) is
-   begin
-  --  We can't call RTE (Finalization_Collection) for at least some
-  --  predefined units, because it would introduce cyclic dependences,
-  --  as the type is itself a controlled type.
-  --
-  --  It's only needed when finalization is involved in the unit, which
-  --  requires the presence of controlled or class-wide types in the unit
-  --  (see the Sem_Util.Needs_Finalization predicate for the rationale).
-  --  But controlled types are tagged or contain tagged (sub)components
-  --  so it is sufficient for the parser to detect the "interface" and
-  --  "tagged" keywords.
-  --
-  --  Don't do it if Finalization_Collection is unavailable in the runtime
-
-  if not In_Predefined_Unit (Compilation_Unit)
-and then (Interface_Seen or else Tagged_Seen)
-and then not No_Run_Time_Mode
-and then RTE_Available (RE_Finalization_Collection)
-  then
- declare
-Ignore : constant Entity_Id := RTE (RE_Finalization_Collection);
- begin
-null;
- end;
-  end if;
-   end Preload_Finalization_Collection;
-

-- Store_Actions_In_Scope --

diff --git a/gcc/ada/exp_ch7.ads b/gcc/ada/exp_ch7.ads
index 386a02b9283..712671a427e 100644
--- a/gcc/ada/exp_ch7.ads
+++ b/gcc/ada/exp_ch7.ads
@@ -257,12 +257,6 @@ package Exp_Ch7 is
--  Build a call to suppress the finalization of the object Obj, only after
--  creating the Master_Node of Obj if it does not already exist.
 
-   procedure Preload_Finalization_Collection (Compilation_Unit : Node_Id);
-   --  Call RTE (RE_Finalization_Collection) if necessary to load the packages
-   --  involved in finalization support. We need to do this explicitly, fairly
-   --  early during compilation, because otherwise it happens during freezing,
-   --  which triggers visibility bugs in generic instantiations.
-

-- Task and Protected Object finalization --

diff --git a/gcc/ada/libgnat/i-c.ads b/gcc/ada/libgnat/i-c.ads
index fe87fba32b6..f9f9f75fc03 100644
--- a/gcc/ada/libgnat/i-c.ads
+++ b/gcc/ada/libgnat/i-c.ads
@@ -24,6 +24,9 @@ pragma Assertion_Policy (Pre=> Ignore,
  Contract_Cases => Ignore,
  Ghost  => Ignore);
 
+--  Pre/postconditions use a fully qualified name for the st

[COMMITTED 27/30] ada: Get rid of secondary stack for indefinite record types with size clause

2024-05-20 Thread Marc Poulhiès
From: Eric Botcazou 

This change eliminates the use of the secondary stack for indefinite record
types for which a valid (object) size clause is specified.  In accordance
with the RM, the compiler accepts (object) size clauses on such types only
if all the components, including those of the variants of the variant part
if any, have a size known at compile time, and only if the clauses specify
a value that is at least as large as the largest possible size of objects
of the types when all the variants are considered.  However, it would still
have used the secondary stack, despite valid (object) size clauses, before
the change, as soon as a variant part was present in the types.

gcc/ada/

* freeze.ads (Check_Compile_Time_Size): Remove obsolete description
of usage for the Size_Known_At_Compile_Time flag.
* freeze.adb (Check_Compile_Time_Size.Size_Known): In the case where
a variant part is present, do not return False if Esize is known.
* sem_util.adb (Needs_Secondary_Stack.Caller_Known_Size_Record): Add
missing "Start of processing" comment.  Return true if either a size
clause or an object size clause has been given for the first subtype
of the type.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/freeze.adb   |  1 +
 gcc/ada/freeze.ads   | 11 +--
 gcc/ada/sem_util.adb | 12 
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/gcc/ada/freeze.adb b/gcc/ada/freeze.adb
index 26e9d01d8b2..ea6106e6455 100644
--- a/gcc/ada/freeze.adb
+++ b/gcc/ada/freeze.adb
@@ -1077,6 +1077,7 @@ package body Freeze is
 and then
   No (Discriminant_Default_Value (First_Discriminant (T)))
 and then not Known_RM_Size (T)
+and then not Known_Esize (T)
   then
  return False;
   end if;
diff --git a/gcc/ada/freeze.ads b/gcc/ada/freeze.ads
index fc0b7678fdc..066d8f054f6 100644
--- a/gcc/ada/freeze.ads
+++ b/gcc/ada/freeze.ads
@@ -156,17 +156,16 @@ package Freeze is
--RM_Size field is set to the required size, allowing for possible front
--end packing of an array using this type as a component type.
--
-   --  Note: the flag Size_Known_At_Compile_Time is used to determine if the
-   --  secondary stack must be used to return a value of the type, and also
-   --  to determine whether a component clause is allowed for a component
-   --  of the given type.
-   --
-   --  Note: this is public because of one dubious use in Sem_Res???
+   --  Note: the flag Size_Known_At_Compile_Time is used to determine whether a
+   --  size clause is allowed for the type, and also whether a component clause
+   --  is allowed for a component of the type.
--
--  Note: Check_Compile_Time_Size does not test the case of the size being
--  known because a size clause is specifically given. That is because we
--  do not allow a size clause if the size would not otherwise be known at
--  compile time in any case.
+   --
+   --  ??? This is public because of dubious uses in Sem_Ch3 and Sem_Res
 
procedure Check_Inherited_Conditions
 (R   : Entity_Id;
diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
index 09358278210..15994b4d1e9 100644
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -22409,6 +22409,8 @@ package body Sem_Util is
 return False;
  end Depends_On_Discriminant;
 
+  --  Start of processing for Caller_Known_Size_Record
+
   begin
  --  This is a protected type without Corresponding_Record_Type set,
  --  typically because expansion is disabled. The safe thing to do is
@@ -22418,6 +22420,16 @@ package body Sem_Util is
 return True;
  end if;
 
+ --  If either size is specified for the type, then it's known in the
+ --  caller in particular. Note that, even if the clause is confirming,
+ --  this does not change the outcome since the size was already known.
+
+ if Has_Size_Clause (First_Subtype (Typ))
+   or else Has_Object_Size_Clause (First_Subtype (Typ))
+ then
+return True;
+ end if;
+
  --  First see if we have a variant part and return False if it depends
  --  on discriminants.
 
-- 
2.43.2



[COMMITTED 28/30] ada: Fix internal error on nested aggregate in conditional expression

2024-05-20 Thread Marc Poulhiès
From: Eric Botcazou 

This plugs a loophole in the change improving code generation for nested
aggregates present in conditional expressions: once the delayed expansion
is chosen for the nested aggregate, the expansion of the parent aggregate
cannot be left to the back-end and the test must be adjusted to implement
this in the presence of conditional expressions too.

gcc/ada/

* exp_aggr.adb (Expand_Record_Aggregate.Component_OK_For_Backend):
Also return False for a delayed conditional expression.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index a386aa85ae4..796b0f1e0de 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -8376,7 +8376,9 @@ package body Exp_Aggr is
Static_Components := False;
return False;
 
-elsif Is_Delayed_Aggregate (Expr_Q) then
+elsif Is_Delayed_Aggregate (Expr_Q)
+  or else Is_Delayed_Conditional_Expression (Expr_Q)
+then
Static_Components := False;
return False;
 
-- 
2.43.2



[COMMITTED 30/30] ada: Allow 'others' in formal packages with overloaded formals

2024-05-20 Thread Marc Poulhiès
From: Bob Duff 

If a generic package has two or more generic formal parameters with the
same defining name (which can happen only for formal subprograms), then
RM-12.7(4.1/3) disallows named associations in a corresponding formal
package. This is not intended to cover "others => <>".

This patch allows "others => <>" even when it applies to such
formals. Previously, the compiler incorrectly gave an error.

Minor related cleanups involving type Text_Ptr.

gcc/ada/

* sem_ch12.adb: Misc cleanups and comment fixes.
(Check_Overloaded_Formal_Subprogram): Remove the Others_Choice
error message.
(Others_Choice): Remove this variable; no longer needed.
* types.ads (Text_Ptr): Add a range constraint limiting the
subtype to values that are actually used. This has the advantage
that when the compiler is compiled with validity checks,
uninitialized values of subtypes Text_Ptr and Source_Ptr will be
caught.
* sinput.ads (Sloc_Adjust): Use the base subtype; this is used as
an offset, so we need to allow arbitrary negative values.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch12.adb | 27 ++-
 gcc/ada/sinput.ads   |  2 +-
 gcc/ada/types.ads|  7 +++
 3 files changed, 14 insertions(+), 22 deletions(-)

diff --git a/gcc/ada/sem_ch12.adb b/gcc/ada/sem_ch12.adb
index 4ceddda2052..9919cda6340 100644
--- a/gcc/ada/sem_ch12.adb
+++ b/gcc/ada/sem_ch12.adb
@@ -1130,10 +1130,11 @@ package body Sem_Ch12 is
   Saved_Formal: Node_Id;
 
   Default_Formals : constant List_Id := New_List;
-  --  If an Others_Choice is present, some of the formals may be defaulted.
-  --  To simplify the treatment of visibility in an instance, we introduce
-  --  individual defaults for each such formal. These defaults are
-  --  appended to the list of associations and replace the Others_Choice.
+  --  If an N_Others_Choice is present, some of the formals may be
+  --  defaulted. To simplify the treatment of visibility in an instance,
+  --  we introduce individual defaults for each such formal. These
+  --  defaults are appended to the list of associations and replace the
+  --  N_Others_Choice.
 
   Found_Assoc : Node_Id;
   --  Association for the current formal being match. Empty if there are
@@ -1145,9 +1146,8 @@ package body Sem_Ch12 is
   Num_Actuals: Nat := 0;
 
   Others_Present : Boolean := False;
-  Others_Choice  : Node_Id := Empty;
   --  In Ada 2005, indicates partial parameterization of a formal
-  --  package. As usual an other association must be last in the list.
+  --  package. As usual an 'others' association must be last in the list.
 
   procedure Build_Subprogram_Wrappers;
   --  Ada 2022: AI12-0272 introduces pre/postconditions for formal
@@ -1195,7 +1195,7 @@ package body Sem_Ch12 is
   procedure Process_Default (Formal : Node_Id);
   --  Add a copy of the declaration of a generic formal to the list of
   --  associations, and add an explicit box association for its entity
-  --  if there is none yet, and the default comes from an Others_Choice.
+  --  if there is none yet, and the default comes from an N_Others_Choice.
 
   function Renames_Standard_Subprogram (Subp : Entity_Id) return Boolean;
   --  Determine whether Subp renames one of the subprograms defined in the
@@ -1314,14 +1314,8 @@ package body Sem_Ch12 is
   Error_Msg_N
 ("named association not allowed for overloaded formal",
  Found_Assoc);
-
-   else
-  Error_Msg_N
-("named association not allowed for overloaded formal",
- Others_Choice);
+  Abandon_Instantiation (Instantiation_Node);
end if;
-
-   Abandon_Instantiation (Instantiation_Node);
 end if;
 
 Next (Temp_Formal);
@@ -1592,7 +1586,7 @@ package body Sem_Ch12 is
 
  Append (Decl, Assoc_List);
 
- if No (Found_Assoc) then
+ if No (Found_Assoc) then -- i.e. 'others'
 Default :=
Make_Generic_Association (Loc,
  Selector_Name =>
@@ -1686,7 +1680,6 @@ package body Sem_Ch12 is
  while Present (Actual) loop
 if Nkind (Actual) = N_Others_Choice then
Others_Present := True;
-   Others_Choice  := Actual;
 
if Present (Next (Actual)) then
   Error_Msg_N ("OTHERS must be last association", Actual);
@@ -2311,7 +2304,7 @@ package body Sem_Ch12 is
 
   --  If this is a formal package, normalize the parameter list by adding
   --  explicit box associations for the formals that are covered by an
-  --  Others_Choice.
+  --  N_Others_Choice.
 
   Append_List (Default_Formals, Formals);
 
diff

[PATCH] RISC-V: Enable vectorization for vect-early-break_124-pr114403.c

2024-05-20 Thread Li Xu
From: xuli 

Because "targetm.slow_unaligned_access" is set to true by default
(aka -mtune=rocket) for RISC-V, the 8-byte __builtin_memcpy fails to
be folded into an int64 assignment during ccp1.

So adding "-mtune=generic-ooo" for the RISC-V target allows
vect-early-break_124-pr114403.c to be vectorized.
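
As a rough illustration of the pattern involved (the helper name load_u64 is hypothetical and not taken from the testcase), this is the kind of fixed-size copy that the compiler may turn into a single 64-bit load when the tuning model does not consider unaligned accesses slow:

```cpp
#include <cstdint>

// Hypothetical helper, not part of the testcase: read 8 bytes from a
// possibly unaligned buffer.  If the target tuning reports that
// unaligned accesses are cheap, GCC can fold this fixed-size copy into
// a plain 64-bit assignment; otherwise it may be kept as a memcpy,
// which is what gets in the way of vectorization here.
std::uint64_t load_u64 (const unsigned char *p)
{
  std::uint64_t v;
  __builtin_memcpy (&v, p, sizeof v);
  return v;
}
```

The function result is the same either way; only the generated code, and hence what the vectorizer sees, differs.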

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-early-break_124-pr114403.c: Enable vectorization for
RISC-V target.
---
 gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
index 101ae1e0eaa..610b951b262 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
@@ -1,8 +1,9 @@
 /* { dg-add-options vect_early_break } */
 /* { dg-require-effective-target vect_early_break_hw } */
 /* { dg-require-effective-target vect_long_long } */
+/* { dg-additional-options "-mtune=generic-ooo" { target riscv*-*-* } } */
 
-/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { xfail riscv*-*-* } } } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
 
 #include "tree-vect.h"
 
@@ -74,4 +75,3 @@ int main ()
 
   return 0;
 }
-
-- 
2.17.1



Re: [Patch, aarch64] v7: Preparatory patch to place target independent and dependent changed code in one file

2024-05-20 Thread Richard Sandiford
Ajit Agarwal  writes:
> Hello Alex/Richard:
>
> All comments are addressed.
>
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
>
> Target independent code is the Generic code with pure virtual function
> to interface between target independent and dependent code.
>
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
>
> Bootstrapped and regtested on aarch64-linux-gnu.
>
> Thanks & Regards
> Ajit
>
>
> aarch64: Preparatory patch to place target independent and
> dependent changed code in one file
>
> Common infrastructure of load store pair fusion is divided into target
> independent and target dependent changed code.
>
> Target independent code is the Generic code with pure virtual function
> to interface between target independent and dependent code.
>
> Target dependent code is the implementation of pure virtual function for
> aarch64 target and the call to target independent code.
>
> 2024-05-18  Ajit Kumar Agarwal  
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-ldp-fusion.cc: Factor out a
>   target-independent interface and move it to the head of the file
> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 555 +++
>  1 file changed, 373 insertions(+), 182 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 1d9caeab05d..e4e55b84f8b 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -138,6 +138,235 @@ struct alt_base
>poly_int64 offset;
>  };
>  
> +// Virtual base class for load/store walkers used in alias analysis.
> +struct alias_walker
> +{
> +  virtual bool conflict_p (int &budget) const = 0;
> +  virtual insn_info *insn () const = 0;
> +  virtual bool valid () const = 0;
> +  virtual void advance () = 0;
> +};
> +
> +// When querying should_handle_writeback, this enum is used to
> +// qualify which opportunities we are asking about.
> +enum class writeback {
> +  // Only those writeback opportunities that arise from existing
> +  // auto-increment accesses.
> +  EXISTING,
> +
> +  // All writeback opportunities including those that involve folding
> +  // base register updates into a non-writeback pair.

This misses:

> There should be a comma after "opportunities"

from the previous review.  I.e.:

  // All writeback opportunities, including those that involve folding
  // base register updates into a non-writeback pair.

OK with that change, thanks.

Richard


[PATCHv2] Optab: add isfinite_optab for __builtin_isfinite

2024-05-20 Thread HAO CHEN GUI
Hi,
  This patch adds an optab for __builtin_isfinite. On rs6000 the finite
check can be implemented with a single instruction, and an optab is needed
so that the builtin can be expanded to that instruction sequence.

  Subsequent patches will implement the expander on rs6000.

  Compared to the previous version, the main change is to document isfinite
in md.texi.
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
optab: Add isfinite_optab for isfinite builtin

gcc/
* builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
for isfinite builtin.
* optabs.def (isfinite_optab): New.
* doc/md.texi (isfinite): Document.


patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index f8d94c4b435..b8432f84020 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
   errno_set = true; builtin_optab = ilogb_optab; break;
 CASE_FLT_FN (BUILT_IN_ISINF):
   builtin_optab = isinf_optab; break;
-case BUILT_IN_ISNORMAL:
 case BUILT_IN_ISFINITE:
+  builtin_optab = isfinite_optab; break;
+case BUILT_IN_ISNORMAL:
 CASE_FLT_FN (BUILT_IN_FINITE):
 case BUILT_IN_FINITED32:
 case BUILT_IN_FINITED64:
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5730bda80dc..8ed70b3feea 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8557,6 +8557,11 @@ operand 2, greater than operand 2 or is unordered with operand 2.

 This pattern is not allowed to @code{FAIL}.

+@cindex @code{isfinite@var{m}2} instruction pattern
+@item @samp{isfinite@var{m}2}
+Set operand 0 to nonzero if operand 1 is a finite floating-point
+number and to 0 otherwise.
+
 @end table

 @end ifset
diff --git a/gcc/optabs.def b/gcc/optabs.def
index ad14f9328b9..dcd77315c2a 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
 OPTAB_D (hypot_optab, "hypot$a3")
 OPTAB_D (ilogb_optab, "ilogb$a2")
 OPTAB_D (isinf_optab, "isinf$a2")
+OPTAB_D (isfinite_optab, "isfinite$a2")
 OPTAB_D (issignaling_optab, "issignaling$a2")
 OPTAB_D (ldexp_optab, "ldexp$a3")
 OPTAB_D (log10_optab, "log10$a2")
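
For readers unfamiliar with what the new pattern has to compute, here is a minimal C sketch of a finite check on an IEEE binary64 value. It is only an illustration under that format assumption: the helper names finite_by_bits and check_finite_by_bits are hypothetical and are not part of the patch, and the sketch says nothing about how rs6000 implements the test.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the patch.  Assuming IEEE binary64,
   a double is finite when its 11-bit biased exponent is not all ones;
   the all-ones exponent encodes infinities and NaNs.  */
int
finite_by_bits (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);	/* Well-defined way to read the bits.  */
  return ((bits >> 52) & 0x7ff) != 0x7ff;
}

/* Compare the bit test with the GCC builtin on a few inputs.  */
void
check_finite_by_bits (void)
{
  assert (finite_by_bits (0.0) && finite_by_bits (-1.5));
  assert (!finite_by_bits (INFINITY) && !finite_by_bits (NAN));
  assert (finite_by_bits (1e308) == (__builtin_isfinite (1e308) != 0));
}
```

The result follows the documented contract of the pattern: nonzero for a finite operand and 0 otherwise.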


Re: [PATCH] RISC-V: Enable vectorization for vect-early-break_124-pr114403.c

2024-05-20 Thread juzhe.zh...@rivai.ai
CC Robin, who knows the RISC-V scheduling models better than I do.



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2024-05-20 15:59
To: gcc-patches
CC: kito.cheng; palmer; tamar.christina; richard.guenther; Richard.Sandiford; juzhe.zhong; zhengyu; pan2.li; xuli
Subject: [PATCH] RISC-V: Enable vectorization for vect-early-break_124-pr114403.c
From: xuli 
 
Because "targetm.slow_unaligned_access" is set to true by default
(aka -mtune=rocket) for RISC-V, the 8-byte __builtin_memcpy fails to be
folded into an int64 assignment during ccp1.
 
So adding "-mtune=generic-ooo" for the RISC-V target allows
vect-early-break_124-pr114403.c to be vectorized.
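
As an illustration of the kind of code involved, the following C sketch shows an 8-byte memcpy used as an unaligned 64-bit load and store. The helper names are hypothetical and this is not the test case itself. The assumption, taken from the description above, is that the compiler turns such a copy into a single int64 assignment only when unaligned access is not considered slow for the tuned target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 8 bytes from a possibly unaligned address.
   This memcpy is the kind of call that the description above expects to
   be folded into a plain 64-bit assignment.  */
uint64_t
load_u64 (const unsigned char *p)
{
  uint64_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Hypothetical helper: write 8 bytes to a possibly unaligned address.  */
void
store_u64 (unsigned char *p, uint64_t v)
{
  memcpy (p, &v, sizeof v);
}

/* Round-trip through an odd offset to exercise the unaligned case.  */
void
check_unaligned_round_trip (void)
{
  unsigned char buf[16] = { 0 };
  store_u64 (buf + 1, 0x0123456789abcdefULL);
  assert (load_u64 (buf + 1) == 0x0123456789abcdefULL);
}
```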
 
gcc/testsuite/ChangeLog:
 
* gcc.dg/vect/vect-early-break_124-pr114403.c: Enable vectorization for RISC-V target.
---
gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
index 101ae1e0eaa..610b951b262 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_124-pr114403.c
@@ -1,8 +1,9 @@
/* { dg-add-options vect_early_break } */
/* { dg-require-effective-target vect_early_break_hw } */
/* { dg-require-effective-target vect_long_long } */
+/* { dg-additional-options "-mtune=generic-ooo" { target riscv*-*-* } } */
-/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" { xfail riscv*-*-* } } } */
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
#include "tree-vect.h"
@@ -74,4 +75,3 @@ int main ()
   return 0;
}
-
-- 
2.17.1
 
 


[PATCHv2] Optab: add isnormal_optab for __builtin_isnormal

2024-05-20 Thread HAO CHEN GUI
Hi,
  This patch adds an optab for __builtin_isnormal. On rs6000 the normal
check can be implemented with a single instruction, and an optab is needed
so that the builtin can be expanded to that instruction sequence.

  Subsequent patches will implement the expander on rs6000.

  Compared to the previous version, the main change is to document isnormal
in md.texi.
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649366.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
optab: Add isnormal_optab for isnormal builtin

gcc/
* builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab
for isnormal builtin.
* optabs.def (isnormal_optab): New.
* doc/md.texi (isnormal): Document.


patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index b8432f84020..ccd57fce522 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2462,6 +2462,7 @@ interclass_mathfn_icode (tree arg, tree fndecl)
 case BUILT_IN_ISFINITE:
   builtin_optab = isfinite_optab; break;
 case BUILT_IN_ISNORMAL:
+  builtin_optab = isnormal_optab; break;
 CASE_FLT_FN (BUILT_IN_FINITE):
 case BUILT_IN_FINITED32:
 case BUILT_IN_FINITED64:
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 8ed70b3feea..b81b9dec18a 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8562,6 +8562,11 @@ This pattern is not allowed to @code{FAIL}.
 Set operand 0 to nonzero if operand 1 is a finite floating-point
 number and to 0 otherwise.

+@cindex @code{isnormal@var{m}2} instruction pattern
+@item @samp{isnormal@var{m}2}
+Set operand 0 to nonzero if operand 1 is a normal floating-point
+number and to 0 otherwise.
+
 @end table

 @end ifset
diff --git a/gcc/optabs.def b/gcc/optabs.def
index dcd77315c2a..3c401fc0b4c 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3")
 OPTAB_D (ilogb_optab, "ilogb$a2")
 OPTAB_D (isinf_optab, "isinf$a2")
 OPTAB_D (isfinite_optab, "isfinite$a2")
+OPTAB_D (isnormal_optab, "isnormal$a2")
 OPTAB_D (issignaling_optab, "issignaling$a2")
 OPTAB_D (ldexp_optab, "ldexp$a3")
 OPTAB_D (log10_optab, "log10$a2")
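
To make the difference from a finite check concrete, here is a minimal C sketch of a normal check on an IEEE binary64 value. The format assumption and the helper names normal_by_bits and check_normal_by_bits are mine, not the patch's: a value is normal when its biased exponent is neither zero (zeros and subnormals) nor all ones (infinities and NaNs).

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the patch.  Assumes IEEE binary64.  */
int
normal_by_bits (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  unsigned int exponent = (unsigned int) ((bits >> 52) & 0x7ff);
  return exponent != 0 && exponent != 0x7ff;
}

/* Zero and subnormals are finite but not normal.  */
void
check_normal_by_bits (void)
{
  assert (normal_by_bits (1.0) && normal_by_bits (DBL_MIN));
  assert (!normal_by_bits (0.0) && !normal_by_bits (DBL_MIN / 2));
  assert (!normal_by_bits (INFINITY) && !normal_by_bits (NAN));
}
```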


[Patch, fortran] PR103312 - [11/12/13/14/15 Regression] ICE in gfc_find_component since r9-1098-g3cf89a7b992d483e

2024-05-20 Thread Paul Richard Thomas
Hi All,

I don't think that this PR is really a regression although the fact that it
is marked as such brought it to my attention :-)

The fix turned out to be remarkably simple. It was found after going down a
silly number of rabbit holes, though!

The chunk in dependency.cc is probably more elaborate than it needs to be.
Returning -2 is sufficient for the testcase to work. Otherwise, the
comments in the patch say it all.

OK for mainline? I will delay for a month before backporting.

Regards

Paul


Change.Logs
Description: Binary data
diff --git a/gcc/fortran/dependency.cc b/gcc/fortran/dependency.cc
index fb4d94de641..bafe8cbc5bc 100644
--- a/gcc/fortran/dependency.cc
+++ b/gcc/fortran/dependency.cc
@@ -440,6 +440,38 @@ gfc_dep_compare_expr (gfc_expr *e1, gfc_expr *e2)
 	return mpz_sgn (e2->value.op.op2->value.integer);
 }
 
+
+  if (e1->expr_type == EXPR_COMPCALL)
+{
+  /* This will have emerged from interface.cc(gfc_check_typebound_override)
+	 via gfc_check_result_characteristics. It is possible that other
+	 variants exist that are 'equal' but play it safe for now by setting
+	 the relationship as 'indeterminate'.  */
+  if (e2->expr_type == EXPR_FUNCTION && e2->ref)
+	{
+	  gfc_ref *ref = e2->ref;
+	  gfc_symbol *s = NULL;
+
+	  if (e1->value.compcall.tbp->u.specific)
+	s = e1->value.compcall.tbp->u.specific->n.sym;
+
+	  /* Check if the proc ptr points to an interface declaration and the
+	 names are the same; ie. the overriden proc. of an abstract type.
+	 The checking of the arguments will already have been done.  */
+	  for (; ref && s; ref = ref->next)
+	if (!ref->next && ref->type == REF_COMPONENT
+		&& ref->u.c.component->attr.proc_pointer
+		&& ref->u.c.component->ts.interface
+		&& ref->u.c.component->ts.interface->attr.if_source
+			== IFSRC_IFBODY
+		&& !strcmp (s->name, ref->u.c.component->name))
+	  return 0;
+	}
+
+  /* Assume as default that TKR checking is sufficient.  */
+ return -2;
+  }
+
   if (e1->expr_type != e2->expr_type)
 return -3;
 
diff --git a/gcc/fortran/expr.cc b/gcc/fortran/expr.cc
index c883966646c..4ee2ad55915 100644
--- a/gcc/fortran/expr.cc
+++ b/gcc/fortran/expr.cc
@@ -3210,6 +3210,11 @@ gfc_reduce_init_expr (gfc_expr *expr)
 {
   bool t;
 
+  /* It is far too early to resolve a class compcall. Punt to resolution.  */
+  if (expr && expr->expr_type == EXPR_COMPCALL
+  && expr->symtree->n.sym->ts.type == BT_CLASS)
+return true;
+
   gfc_init_expr_flag = true;
   t = gfc_resolve_expr (expr);
   if (t)
diff --git a/gcc/testsuite/gfortran.dg/pr103312.f90 b/gcc/testsuite/gfortran.dg/pr103312.f90
new file mode 100644
index 000..deacc70bf5d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr103312.f90
@@ -0,0 +1,87 @@
+! { dg-do run }
+!
+! Test the fix for pr103312, in which the use of a component call in
+! initialization expressions, eg. character(this%size()), caused ICEs.
+!
+! Contributed by Arseny Solokha  
+!
+module example
+
+  type, abstract :: foo
+integer :: i
+  contains
+procedure(foo_size), deferred :: size
+procedure(foo_func), deferred :: func
+  end type
+
+  interface
+function foo_func (this) result (string)
+  import :: foo
+  class(foo) :: this
+  character(this%size()) :: string
+end function
+pure integer function foo_size (this)
+  import foo
+  class(foo), intent(in) :: this
+end function
+  end interface
+
+end module
+
+module extension
+  use example
+  implicit none
+  type, extends(foo) :: bar
+  contains
+procedure :: size
+procedure :: func
+  end type
+
+contains
+pure integer function size (this)
+  class(bar), intent(in) :: this
+  size = this%i
+end function
+function func (this) result (string)
+  class(bar) :: this
+  character(this%size()) :: string
+  string = repeat ("x", len (string))
+end function
+
+end module
+
+module unextended
+  implicit none
+  type :: foobar
+integer :: i
+  contains
+procedure :: size
+procedure :: func
+  end type
+
+contains
+pure integer function size (this)
+  class(foobar), intent(in) :: this
+  size = this%i
+end function
+function func (this) result (string)
+  class(foobar) :: this
+  character(this%size()) :: string
+  character(:), allocatable :: chr
+  string = repeat ("y", len (string))
+  allocate (character(this%size()) :: chr)
+  if (len (string) .ne. len (chr)) stop 1
+end function
+
+end module
+
+  use example
+  use extension
+  use unextended
+  type(bar) :: a
+  type(foobar) :: b
+  a%i = 5
+  if (a%func() .ne. 'x') stop 2
+  b%i = 7
+  if (b%func() .ne. 'yyy') stop 3
+end


Re: [PATCH] aarch64: Fold vget_low_* intrinsics to BIT_FIELD_REF [PR102171]

2024-05-20 Thread Richard Sandiford
Pengxuan Zheng  writes:
> This patch folds vget_low_* intrinsics to BIT_FIELD_REF to open up more
> optimization opportunities for gimple optimizers.
>
> While we are here, we also remove the vget_low_* definitions from arm_neon.h
> and use the new intrinsics framework.
>
> PR target/102171
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-builtins.cc (AARCH64_SIMD_VGET_LOW_BUILTINS):
>   New macro to create definitions for all vget_low intrinsics.
>   (VGET_LOW_BUILTIN): Likewise.
>   (enum aarch64_builtins): Add vget_low function codes.
>   (aarch64_general_fold_builtin): Fold vget_low calls.
>   * config/aarch64/aarch64-simd-builtins.def: Delete vget_low builtins.
>   * config/aarch64/aarch64-simd.md (aarch64_get_low): Delete.
>   (aarch64_vget_lo_halfv8bf): Likewise.
>   * config/aarch64/arm_neon.h (__attribute__): Delete.
>   (vget_low_f16): Likewise.
>   (vget_low_f32): Likewise.
>   (vget_low_f64): Likewise.
>   (vget_low_p8): Likewise.
>   (vget_low_p16): Likewise.
>   (vget_low_p64): Likewise.
>   (vget_low_s8): Likewise.
>   (vget_low_s16): Likewise.
>   (vget_low_s32): Likewise.
>   (vget_low_s64): Likewise.
>   (vget_low_u8): Likewise.
>   (vget_low_u16): Likewise.
>   (vget_low_u32): Likewise.
>   (vget_low_u64): Likewise.
>   (vget_low_bf16): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/pr113573.c: Replace __builtin_aarch64_get_lowv8hi
>   with vget_low_s16.
>   * gcc.target/aarch64/vget_low_2.c: New test.
>   * gcc.target/aarch64/vget_low_2_be.c: New test.

Ok, thanks.  I suppose the patch has the side effect of allowing
vget_low_bf16 to be called without +bf16.  IMO that's the correct
behaviour though, and is consistent with how we handle reinterprets.

Richard
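
As a rough picture of what the fold expresses, here is a portable C model of vget_low_s16 that treats the 128-bit vector as eight 16-bit lanes and returns lanes 0 to 3. The struct types and function name are hypothetical stand-ins, not the arm_neon.h types, and the model does not try to show which bit offset the BIT_FIELD_REF uses or how that differs on big-endian targets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for int16x8_t and int16x4_t.  */
typedef struct { int16_t lane[8]; } vec128_s16;
typedef struct { int16_t lane[4]; } vec64_s16;

/* Model of vget_low_s16: the result is lanes 0..3 of the input, i.e. a
   fixed 64-bit slice of the 128-bit value.  Because it is a plain slice
   rather than an opaque call, later passes can see through it.  */
vec64_s16
model_vget_low_s16 (vec128_s16 v)
{
  vec64_s16 lo;
  memcpy (&lo, &v, sizeof lo);	/* Copies lane[0] through lane[3].  */
  return lo;
}

void
check_model_vget_low_s16 (void)
{
  vec128_s16 v = { { 0, 1, 2, 3, 4, 5, 6, 7 } };
  vec64_s16 lo = model_vget_low_s16 (v);
  assert (lo.lane[0] == 0 && lo.lane[3] == 3);
}
```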

> Signed-off-by: Pengxuan Zheng 
> ---
>  gcc/config/aarch64/aarch64-builtins.cc|  60 ++
>  gcc/config/aarch64/aarch64-simd-builtins.def  |   5 +-
>  gcc/config/aarch64/aarch64-simd.md|  23 +---
>  gcc/config/aarch64/arm_neon.h | 105 --
>  gcc/testsuite/gcc.target/aarch64/pr113573.c   |   2 +-
>  gcc/testsuite/gcc.target/aarch64/vget_low_2.c |  30 +
>  .../gcc.target/aarch64/vget_low_2_be.c|  31 ++
>  7 files changed, 124 insertions(+), 132 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/vget_low_2.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/vget_low_2_be.c
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
> index 75d21de1401..4afe7c86ae3 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -658,6 +658,23 @@ static aarch64_simd_builtin_datum aarch64_simd_builtin_data[] = {
>VREINTERPRET_BUILTINS \
>VREINTERPRETQ_BUILTINS
>  
> +#define AARCH64_SIMD_VGET_LOW_BUILTINS \
> +  VGET_LOW_BUILTIN(f16) \
> +  VGET_LOW_BUILTIN(f32) \
> +  VGET_LOW_BUILTIN(f64) \
> +  VGET_LOW_BUILTIN(p8) \
> +  VGET_LOW_BUILTIN(p16) \
> +  VGET_LOW_BUILTIN(p64) \
> +  VGET_LOW_BUILTIN(s8) \
> +  VGET_LOW_BUILTIN(s16) \
> +  VGET_LOW_BUILTIN(s32) \
> +  VGET_LOW_BUILTIN(s64) \
> +  VGET_LOW_BUILTIN(u8) \
> +  VGET_LOW_BUILTIN(u16) \
> +  VGET_LOW_BUILTIN(u32) \
> +  VGET_LOW_BUILTIN(u64) \
> +  VGET_LOW_BUILTIN(bf16)
> +
>  typedef struct
>  {
>const char *name;
> @@ -697,6 +714,9 @@ typedef struct
>  #define VREINTERPRET_BUILTIN(A, B, L) \
>AARCH64_SIMD_BUILTIN_VREINTERPRET##L##_##A##_##B,
>  
> +#define VGET_LOW_BUILTIN(A) \
> +  AARCH64_SIMD_BUILTIN_VGET_LOW_##A,
> +
>  #undef VAR1
>  #define VAR1(T, N, MAP, FLAG, A) \
>AARCH64_SIMD_BUILTIN_##T##_##N##A,
> @@ -732,6 +752,7 @@ enum aarch64_builtins
>AARCH64_CRC32_BUILTIN_MAX,
>/* SIMD intrinsic builtins.  */
>AARCH64_SIMD_VREINTERPRET_BUILTINS
> +  AARCH64_SIMD_VGET_LOW_BUILTINS
>/* ARMv8.3-A Pointer Authentication Builtins.  */
>AARCH64_PAUTH_BUILTIN_AUTIA1716,
>AARCH64_PAUTH_BUILTIN_PACIA1716,
> @@ -823,8 +844,37 @@ static aarch64_fcmla_laneq_builtin_datum aarch64_fcmla_lane_builtin_data[] = {
>   && SIMD_INTR_QUAL(A) == SIMD_INTR_QUAL(B) \
>},
>  
> +#undef VGET_LOW_BUILTIN
> +#define VGET_LOW_BUILTIN(A) \
> +  {"vget_low_" #A, \
> +   AARCH64_SIMD_BUILTIN_VGET_LOW_##A, \
> +   2, \
> +   { SIMD_INTR_MODE(A, d), SIMD_INTR_MODE(A, q) }, \
> +   { SIMD_INTR_QUAL(A), SIMD_INTR_QUAL(A) }, \
> +   FLAG_AUTO_FP, \
> +   false \
> +  },
> +
> +#define AARCH64_SIMD_VGET_LOW_BUILTINS \
> +  VGET_LOW_BUILTIN(f16) \
> +  VGET_LOW_BUILTIN(f32) \
> +  VGET_LOW_BUILTIN(f64) \
> +  VGET_LOW_BUILTIN(p8) \
> +  VGET_LOW_BUILTIN(p16) \
> +  VGET_LOW_BUILTIN(p64) \
> +  VGET_LOW_BUILTIN(s8) \
> +  VGET_LOW_BUILTIN(s16) \
> +  VGET_LOW_BUILTIN(s32) \
> +  VGET_LOW_BUILTIN(s64) \
> +  VGET_LOW_BUILTIN(u8) \
> +  VGET_LOW_BUILTIN(u16) \
> +  VGET_LOW_BUILTIN(u32) \
> +  VGET_LOW_BUILTIN(u64) \
> +  VGET_LOW_BUILTIN(bf16

Re: [PATCH] AArch64: Fix printing of 2-instruction alternatives

2024-05-20 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Add missing '\' in 2-instruction movsi/di alternatives so that they are
> printed on separate lines.
>
> Passes bootstrap and regress, OK for commit once stage 1 reopens?
>
> gcc:
> * config/aarch64/aarch64.md (movsi_aarch64): Use '\;' to force
> newline in 2-instruction pattern.
> (movdi_aarch64): Likewise.

Oops, good catch.  Ok for trunk, thanks.

Richard

>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 1a2e01284249223565cd12cf1bfd5db5475e56fb..5416c2e3b2002d0e53baf23e7c0048ddf683 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1447,7 +1447,7 @@ (define_insn_and_split "*movsi_aarch64"
>   [w  , m  ; load_4   , fp  , 4] ldr\t%s0, %1
>   [m  , r Z; store_4  , *   , 4] str\t%w1, %0
>   [m  , w  ; store_4  , fp  , 4] str\t%s1, %0
> - [r  , Usw; load_4   , *   , 8] adrp\t%x0, %A1;ldr\t%w0, [%x0, %L1]
> + [r  , Usw; load_4   , *   , 8] adrp\t%x0, %A1\;ldr\t%w0, [%x0, %L1]
>   [r  , Usa; adr  , *   , 4] adr\t%x0, %c1
>   [r  , Ush; adr  , *   , 4] adrp\t%x0, %A1
>   [w  , r Z; f_mcr, fp  , 4] fmov\t%s0, %w1
> @@ -1484,7 +1484,7 @@ (define_insn_and_split "*movdi_aarch64"
>   [w, m  ; load_8   , fp  , 4] ldr\t%d0, %1
>   [m, r Z; store_8  , *   , 4] str\t%x1, %0
>   [m, w  ; store_8  , fp  , 4] str\t%d1, %0
> - [r, Usw; load_8   , *   , 8] << TARGET_ILP32 ? "adrp\t%0, %A1;ldr\t%w0, [%0, %L1]" : "adrp\t%0, %A1;ldr\t%0, [%0, %L1]";
> + [r, Usw; load_8   , *   , 8] << TARGET_ILP32 ? "adrp\t%0, %A1\;ldr\t%w0, [%0, %L1]" : "adrp\t%0, %A1\;ldr\t%0, [%0, %L1]";
>   [r, Usa; adr  , *   , 4] adr\t%x0, %c1
>   [r, Ush; adr  , *   , 4] adrp\t%x0, %A1
>   [w, r Z; f_mcr, fp  , 4] fmov\t%d0, %x1


Re: [PATCH] AArch64: Improve costing of ctz

2024-05-20 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Improve costing of ctz - both TARGET_CSSC and vector cases were not handled
> yet.
>
> Passes regress & bootstrap - OK for commit?
>
> gcc:
> * config/aarch64/aarch64.cc (aarch64_rtx_costs): Improve CTZ costing.

Ok, thanks.

Richard

> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index fe13c9a0d4863041eb9101882ea57c2094240d16..2a6f76f4008839bf0aa158504430af9b971c 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -14309,10 +14309,24 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int outer ATTRIBUTE_UNUSED,
>return false;
>  
>  case CTZ:
> -  *cost = COSTS_N_INSNS (2);
> -
> -  if (speed)
> - *cost += extra_cost->alu.clz + extra_cost->alu.rev;
> +  if (VECTOR_MODE_P (mode))
> + {
> +   *cost = COSTS_N_INSNS (3);
> +   if (speed)
> + *cost += extra_cost->vect.alu * 3;
> + }
> +  else if (TARGET_CSSC)
> + {
> +   *cost = COSTS_N_INSNS (1);
> +   if (speed)
> + *cost += extra_cost->alu.clz;
> + }
> +  else
> + {
> +   *cost = COSTS_N_INSNS (2);
> +   if (speed)
> + *cost += extra_cost->alu.clz + extra_cost->alu.rev;
> + }
>return false;
>  
>  case COMPARE:
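
The two-instruction scalar case charges alu.clz plus alu.rev, which is consistent with computing a count of trailing zeros as a bit reversal followed by a count of leading zeros. The C sketch below only illustrates that identity for 32-bit values; the function names are hypothetical and it is not a description of the code GCC emits.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the bit order of a 32-bit value (the "rev" half of the cost).  */
uint32_t
bit_reverse32 (uint32_t x)
{
  x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);
  x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);
  x = ((x & 0x0f0f0f0fu) << 4) | ((x >> 4) & 0x0f0f0f0fu);
  x = ((x & 0x00ff00ffu) << 8) | ((x >> 8) & 0x00ff00ffu);
  return (x << 16) | (x >> 16);
}

/* Count leading zero bits (the "clz" half); returns 32 for zero.  */
unsigned int
count_leading_zeros32 (uint32_t x)
{
  unsigned int n = 0;
  if (x == 0)
    return 32;
  while (!(x & 0x80000000u))
    {
      x <<= 1;
      n++;
    }
  return n;
}

/* Trailing zeros of X are the leading zeros of X with its bits reversed.  */
unsigned int
ctz_via_rev_clz (uint32_t x)
{
  return count_leading_zeros32 (bit_reverse32 (x));
}

void
check_ctz_via_rev_clz (void)
{
  assert (ctz_via_rev_clz (1u) == 0);
  assert (ctz_via_rev_clz (8u) == 3);
}
```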


[PATCH v2] Match: Extract integer_types_ternary_match helper to avoid code dup [NFC]

2024-05-20 Thread pan2 . li
From: Pan Li 

There are several match patterns for SAT-related cases, and each repeats
the code that checks that dest, op_0 and op_1 have the same tree type,
i.e. a ternary tree type match.  Thus, extract one helper function that
does this check and avoids the duplicated match code.

The following test suites pass with this patch:
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 regression test.

gcc/ChangeLog:

* generic-match-head.cc (integer_types_ternary_match): New helper
function to check whether three tree types match.
* gimple-match-head.cc (integer_types_ternary_match): Ditto, but
for GIMPLE.
* match.pd: Leverage above helper function to avoid code dup.

Signed-off-by: Pan Li 
---
 gcc/generic-match-head.cc | 17 +
 gcc/gimple-match-head.cc  | 17 +
 gcc/match.pd  | 25 +
 3 files changed, 39 insertions(+), 20 deletions(-)

diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
index 0d3f648fe8d..cdd48c7a5cc 100644
--- a/gcc/generic-match-head.cc
+++ b/gcc/generic-match-head.cc
@@ -59,6 +59,23 @@ types_match (tree t1, tree t2)
   return TYPE_MAIN_VARIANT (t1) == TYPE_MAIN_VARIANT (t2);
 }
 
+/* Routine to determine if the types T1,  T2 and T3 are effectively
+   the same integer type for GENERIC.  If T1,  T2 or T3 is not a type,
+   the test applies to their TREE_TYPE.  */
+
+static inline bool
+integer_types_ternary_match (tree t1, tree t2, tree t3)
+{
+  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
+  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
+  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);
+
+  if (!INTEGRAL_TYPE_P (t1) || !INTEGRAL_TYPE_P (t2) || !INTEGRAL_TYPE_P (t3))
+return false;
+
+  return types_match (t1, t2) && types_match (t1, t3);
+}
+
 /* Return if T has a single use.  For GENERIC, we assume this is
always true.  */
 
diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index 5f8a1a1ad8e..91f2e56b8ef 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -79,6 +79,23 @@ types_match (tree t1, tree t2)
   return types_compatible_p (t1, t2);
 }
 
+/* Routine to determine if the types T1,  T2 and T3 are effectively
+   the same integer type for GIMPLE.  If T1,  T2 or T3 is not a type,
+   the test applies to their TREE_TYPE.  */
+
+static inline bool
+integer_types_ternary_match (tree t1, tree t2, tree t3)
+{
+  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
+  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
+  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);
+
+  if (!INTEGRAL_TYPE_P (t1) || !INTEGRAL_TYPE_P (t2) || !INTEGRAL_TYPE_P (t3))
+return false;
+
+  return types_match (t1, t2) && types_match (t1, t3);
+}
+
 /* Return if T has a single use.  For GIMPLE, we also allow any
non-SSA_NAME (ie constants) and zero uses to cope with uses
that aren't linked up yet.  */
diff --git a/gcc/match.pd b/gcc/match.pd
index 0f9c34fa897..401b52e7573 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3046,38 +3046,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* Unsigned Saturation Add */
 (match (usadd_left_part_1 @0 @1)
  (plus:c @0 @1)
- (if (INTEGRAL_TYPE_P (type)
-  && TYPE_UNSIGNED (TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@1)))))
+ (if (integer_types_ternary_match (type, @0, @1) && TYPE_UNSIGNED (type))))
 
 (match (usadd_left_part_2 @0 @1)
  (realpart (IFN_ADD_OVERFLOW:c @0 @1))
- (if (INTEGRAL_TYPE_P (type)
-  && TYPE_UNSIGNED (TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@1)))))
+ (if (integer_types_ternary_match (type, @0, @1) && TYPE_UNSIGNED (type))))
 
 (match (usadd_right_part_1 @0 @1)
  (negate (convert (lt (plus:c @0 @1) @0)))
- (if (INTEGRAL_TYPE_P (type)
-  && TYPE_UNSIGNED (TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@1)))))
+ (if (integer_types_ternary_match (type, @0, @1) && TYPE_UNSIGNED (type))))
 
 (match (usadd_right_part_1 @0 @1)
  (negate (convert (gt @0 (plus:c @0 @1))))
- (if (INTEGRAL_TYPE_P (type)
-  && TYPE_UNSIGNED (TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@1)))))
+ (if (integer_types_ternary_match (type, @0, @1) && TYPE_UNSIGNED (type))))
 
 (match (usadd_right_part_2 @0 @1)
  (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1)) integer_zerop)))
- (if (INTEGRAL_TYPE_P (type)
-  && TYPE_UNSIGNED (TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@1)))))
+ (if (integer_types_ternary_match (type, @0, @1) && TYPE_UNSIGNED (type))))
 
 /* We cannot merge or overload usadd_left_part_1 and usadd_left_part_2
because the sub part of left_part_2 cannot work with right_part_1.
-- 
2.34.1



RE: [PATCH v1] Match: Extract integer_types_ternary_match helper to avoid code dup [NFC]

2024-05-20 Thread Li, Pan2
Thanks Andrew for comments, updated in v2.

Pan

From: Andrew Pinski 
Sent: Sunday, May 19, 2024 12:25 PM
To: Li, Pan2 
Cc: GCC Patches; 钟居哲; Kito Cheng; Tamar Christina; Richard Guenther
Subject: Re: [PATCH v1] Match: Extract integer_types_ternary_match helper to avoid code dup [NFC]


On Sat, May 18, 2024, 9:17 PM <pan2...@intel.com> wrote:
From: Pan Li <pan2...@intel.com>

There are sorts of match pattern for SAT related cases,  there will be
some duplicated code to check the dest, op_0, op_1 are same tree types.
Aka ternary tree type matches.  Thus, extract one helper function to
do this and avoid match code duplication.

The below test suites are passed for this patch:
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 regression test.

gcc/ChangeLog:

* generic-match-head.cc (integer_types_ternary_match): New helper
function to check ternary tree type matches or not.
* gimple-match-head.cc (integer_types_ternary_match): Ditto but
for match.
* match.pd: Leverage above helper function to avoid code dup.

Signed-off-by: Pan Li <pan2...@intel.com>
---
 gcc/generic-match-head.cc | 17 +
 gcc/gimple-match-head.cc  | 17 +
 gcc/match.pd  | 25 +
 3 files changed, 39 insertions(+), 20 deletions(-)

diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
index 0d3f648fe8d..cdd48c7a5cc 100644
--- a/gcc/generic-match-head.cc
+++ b/gcc/generic-match-head.cc
@@ -59,6 +59,23 @@ types_match (tree t1, tree t2)
   return TYPE_MAIN_VARIANT (t1) == TYPE_MAIN_VARIANT (t2);
 }

+/* Routine to determine if the types T1,  T2 and T3 are effectively
+   the same integer type for GENERIC.  If T1,  T2 or T3 is not a type,
+   the test applies to their TREE_TYPE.  */
+
+static inline bool
+integer_types_ternary_match (tree t1, tree t2, tree t3)
+{
+  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
+  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
+  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);
+
+  if (!INTEGRAL_TYPE_P (t1) || !INTEGRAL_TYPE_P (t2) || !INTEGRAL_TYPE_P (t3))
+return false;
+
+  return types_match (t1, t2) && types_match (t1, t3);
+}
+
 /* Return if T has a single use.  For GENERIC, we assume this is
always true.  */

diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index 5f8a1a1ad8e..91f2e56b8ef 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -79,6 +79,23 @@ types_match (tree t1, tree t2)
   return types_compatible_p (t1, t2);
 }

+/* Routine to determine if the types T1,  T2 and T3 are effectively
+   the same integer type for GIMPLE.  If T1,  T2 or T3 is not a type,
+   the test applies to their TREE_TYPE.  */
+
+static inline bool
+integer_types_ternary_match (tree t1, tree t2, tree t3)
+{
+  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
+  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
+  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);
+
+  if (!INTEGRAL_TYPE_P (t1) || !INTEGRAL_TYPE_P (t2) || !INTEGRAL_TYPE_P (t3))
+return false;
+
+  return types_match (t1, t2) && types_match (t1, t3);
+}
+
 /* Return if T has a single use.  For GIMPLE, we also allow any
non-SSA_NAME (ie constants) and zero uses to cope with uses
that aren't linked up yet.  */
diff --git a/gcc/match.pd b/gcc/match.pd
index 0f9c34fa897..b291e34bbe4 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3046,38 +3046,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* Unsigned Saturation Add */
 (match (usadd_left_part_1 @0 @1)
  (plus:c @0 @1)
- (if (INTEGRAL_TYPE_P (type)
-  && TYPE_UNSIGNED (TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@1)))))
+ (if (TYPE_UNSIGNED (type) && integer_types_ternary_match (type, @0, @1))))


Even though unsigned might be the cheaper check, you might need to swap the
order back to where it was so you check integral first.

Otherwise this is a nice cleanup. (Note I can't approve it though).

Thanks,
Andrew


 (match (usadd_left_part_2 @0 @1)
  (realpart (IFN_ADD_OVERFLOW:c @0 @1))
- (if (INTEGRAL_TYPE_P (type)
-  && TYPE_UNSIGNED (TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@1)))))
+ (if (TYPE_UNSIGNED (type) && integer_types_ternary_match (type, @0, @1))))

 (match (usadd_right_part_1 @0 @1)
  (negate (convert (lt (plus:c @0 @1) @0)))
- (if (INTEGRAL_TYPE_P (type)
-  && TYPE_UNSIGNED (TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@1)))))
+ (if (TYPE_UNSIGNED (type) && integer_types_ternary_match (type, @0, @1))))

 (match (usadd_right_part_1 @0 @1)
  (negate (convert (gt @0 (plus:c @0 @1))))
- (if (INTEGRAL_TYPE_P (type)
-  && TYPE_UNSIGNED (TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@1)))))
+ (if (TYPE_UNSIGNED (type) && integer_types_ternary_match (type, @0, @1))))

 (match (usadd_right_part_2 @0 @1)

[PATCH v1 1/2] Match: Support branch form for unsigned SAT_ADD

2024-05-20 Thread pan2 . li
From: Pan Li 

This patch adds support for the branch form of unsigned SAT_ADD.
For example:

uint64_t
sat_add (uint64_t x, uint64_t y)
{
  return (uint64_t) (x + y) >= x ? (x + y) : -1;
}

Unlike the branchless version, we use a simplify to convert the branch
version of SAT_ADD into the branchless one if and only if the backend
supports IFN_SAT_ADD.  Thus, the backend is able to choose a branch or
branchless implementation of .SAT_ADD.  For example, some targets can
handle branchy code more efficiently.
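
To show that the two forms agree, here is a small C sketch that places the branch form from the example next to a branchless form built as the sum OR-ed with the negated overflow test. The function names are hypothetical; the branchless expression is written to mirror the bit_ior/negate/lt shape used by the new simplify.

```c
#include <assert.h>
#include <stdint.h>

/* Branch form, as in the example above.  */
uint64_t
sat_add_branch (uint64_t x, uint64_t y)
{
  return (uint64_t) (x + y) >= x ? (x + y) : (uint64_t) -1;
}

/* Branchless form: if the sum wrapped around, (sum < x) is 1 and its
   negation is all ones, so the OR saturates the result.  */
uint64_t
sat_add_branchless (uint64_t x, uint64_t y)
{
  uint64_t sum = x + y;
  return sum | -(uint64_t) (sum < x);
}

void
check_sat_add_forms_agree (void)
{
  assert (sat_add_branch (1, 2) == sat_add_branchless (1, 2));
  assert (sat_add_branch (UINT64_MAX, 1) == sat_add_branchless (UINT64_MAX, 1));
}
```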

When the target implements IFN_SAT_ADD for unsigned types, before this
patch:
uint64_t sat_add_u_1_uint64_t (uint64_t x, uint64_t y)
{
  long unsigned int _1;
  uint64_t _2;
  __complex__ long unsigned int _6;
  long unsigned int _7;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _6 = .ADD_OVERFLOW (x_3(D), y_4(D));
  _1 = REALPART_EXPR <_6>;
  _7 = IMAGPART_EXPR <_6>;
  if (_7 == 0)
    goto <bb 4>; [65.00%]
  else
    goto <bb 3>; [35.00%]
;;succ:   4
;;3

;;   basic block 3, loop depth 0
;;pred:   2
;;succ:   4

;;   basic block 4, loop depth 0
;;pred:   3
;;2
  # _2 = PHI <18446744073709551615(3), _1(2)>
  return _2;
;;succ:   EXIT

}

After this patch:
uint64_t sat_add (uint64_t x, uint64_t y)
{
  long unsigned int _9;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _9 = .SAT_ADD (x_3(D), y_4(D)); [tail call]
  return _9;
;;succ:   EXIT
}

The following test suites pass with this patch:
* The x86 bootstrap test.
* The x86 full regression test.
* The riscv full regression test.

gcc/ChangeLog:

* match.pd: Add new simplify to convert branch SAT_ADD into
branchless, if and only if the backend implements the IFN.

Signed-off-by: Pan Li 
---
 gcc/match.pd | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index 0f9c34fa897..0547b57b3a3 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3094,6 +3094,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (match (unsigned_integer_sat_add @0 @1)
  (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
 
+#if GIMPLE
+
+/* Simplify the branch version of SAT_ADD into branchless if and only if
+   the backend has supported the IFN_SAT_ADD.  Thus, the backend has the
+   ability to choose branch or branchless implementation of .SAT_ADD.  */
+
+(simplify
+ (cond (ge (plus:c@2 @0 @1) @0) @2 integer_minus_onep)
+  (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
+   (bit_ior @2 (negate (convert (lt @2 @0))))))
+
+(simplify
+ (cond (le @0 (plus:c@2 @0 @1)) @2 integer_minus_onep)
+  (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
+   (bit_ior @2 (negate (convert (lt @2 @0))))))
+
+#endif
+
 /* x >  y  &&  x != XXX_MIN  -->  x > y
x >  y  &&  x == XXX_MIN  -->  false . */
 (for eqne (eq ne)
-- 
2.34.1



[PATCH v1 2/2] RISC-V: Add test cases for branch form unsigned SAT_ADD

2024-05-20 Thread pan2 . li
From: Pan Li 

Now that the middle end supports the branch form of unsigned SAT_ADD,
add more test cases to cover the functionality.

The following test suite passes:
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add branch form test macro.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c: New test.
* gcc.target/riscv/sat_u_add-10.c: New test.
* gcc.target/riscv/sat_u_add-11.c: New test.
* gcc.target/riscv/sat_u_add-12.c: New test.
* gcc.target/riscv/sat_u_add-9.c: New test.
* gcc.target/riscv/sat_u_add-run-10.c: New test.
* gcc.target/riscv/sat_u_add-run-11.c: New test.
* gcc.target/riscv/sat_u_add-run-12.c: New test.
* gcc.target/riscv/sat_u_add-run-9.c: New test.

Signed-off-by: Pan Li 
---
 .../rvv/autovec/binop/vec_sat_u_add-10.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-11.c  | 20 +
 .../rvv/autovec/binop/vec_sat_u_add-12.c  | 20 +
 .../riscv/rvv/autovec/binop/vec_sat_u_add-9.c | 19 +
 .../rvv/autovec/binop/vec_sat_u_add-run-10.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-11.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-12.c  | 75 +++
 .../rvv/autovec/binop/vec_sat_u_add-run-9.c   | 75 +++
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 23 ++
 gcc/testsuite/gcc.target/riscv/sat_u_add-10.c | 21 ++
 gcc/testsuite/gcc.target/riscv/sat_u_add-11.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_add-12.c | 17 +
 gcc/testsuite/gcc.target/riscv/sat_u_add-9.c  | 19 +
 .../gcc.target/riscv/sat_u_add-run-10.c   | 25 +++
 .../gcc.target/riscv/sat_u_add-run-11.c   | 25 +++
 .../gcc.target/riscv/sat_u_add-run-12.c   | 25 +++
 .../gcc.target/riscv/sat_u_add-run-9.c| 25 +++
 17 files changed, 577 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_add-run-9.c

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c
new file mode 100644
index 000..db2233f04b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "../../../sat_arith.h"
+
+/*
+** vec_sat_u_add_uint16_t_fmt_3:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e16,\s*m1,\s*ta,\s*ma
+** ...
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
+** vsaddu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** ...
+*/
+DEF_VEC_SAT_U_ADD_FMT_3(uint16_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 4 "expand" } } */
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c
new file mode 100644
index 000..27cd38ea74f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_

RE: [PATCH v1] Match: Extract integer_types_ternary_match helper to avoid code dup [NFC]

2024-05-20 Thread Tamar Christina
> -Original Message-
> From: pan2...@intel.com 
> Sent: Sunday, May 19, 2024 5:17 AM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> ; richard.guent...@gmail.com; Pan Li
> 
> Subject: [PATCH v1] Match: Extract integer_types_ternary_match helper to avoid
> code dup [NFC]
> 
> From: Pan Li 
> 
> There are several match patterns for SAT related cases, which leads to
> duplicated code checking that dest, op_0 and op_1 have the same tree type,
> aka a ternary tree type match.  Thus, extract one helper function to
> do this and avoid match code duplication.
> 
> The following test suites pass with this patch:
> * The rv64gcv full regression test.
> * The x86 bootstrap test.
> * The x86 regression test.
> 
> gcc/ChangeLog:
> 
>   * generic-match-head.cc (integer_types_ternary_match): New helper
>   function to check whether ternary tree types match.
>   * gimple-match-head.cc (integer_types_ternary_match): Ditto but
>   for match.
>   * match.pd: Leverage above helper function to avoid code dup.
> 

Nice cleanup!

This function isn't part of the machinery of match.pd and is instead part of a
pattern.  For these things we usually put them in tree.h/tree.cc and declare
them at the top of match.pd in the "define_predicates" list.

This will also allow you to get rid of the code duplication.  In addition, we
consider such functions that return true/false to be predicates, and give them
names ending in _p.

See e.g. bitmask_inv_cst_vector_p which is also defined in tree.h/tree.cc.

Cheers,
Tamar

> Signed-off-by: Pan Li 
> ---
>  gcc/generic-match-head.cc | 17 +
>  gcc/gimple-match-head.cc  | 17 +
>  gcc/match.pd  | 25 +
>  3 files changed, 39 insertions(+), 20 deletions(-)
> 
> diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
> index 0d3f648fe8d..cdd48c7a5cc 100644
> --- a/gcc/generic-match-head.cc
> +++ b/gcc/generic-match-head.cc
> @@ -59,6 +59,23 @@ types_match (tree t1, tree t2)
>return TYPE_MAIN_VARIANT (t1) == TYPE_MAIN_VARIANT (t2);
>  }
> 
> +/* Routine to determine if the types T1,  T2 and T3 are effectively
> +   the same integer type for GENERIC.  If T1,  T2 or T3 is not a type,
> +   the test applies to their TREE_TYPE.  */
> +
> +static inline bool
> +integer_types_ternary_match (tree t1, tree t2, tree t3)
> +{
> +  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
> +  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
> +  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);
> +
> +  if (!INTEGRAL_TYPE_P (t1) || !INTEGRAL_TYPE_P (t2) || !INTEGRAL_TYPE_P (t3))
> +    return false;
> +
> +  return types_match (t1, t2) && types_match (t1, t3);
> +}
> +
>  /* Return if T has a single use.  For GENERIC, we assume this is
> always true.  */
> 
> diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
> index 5f8a1a1ad8e..91f2e56b8ef 100644
> --- a/gcc/gimple-match-head.cc
> +++ b/gcc/gimple-match-head.cc
> @@ -79,6 +79,23 @@ types_match (tree t1, tree t2)
>return types_compatible_p (t1, t2);
>  }
> 
> +/* Routine to determine if the types T1,  T2 and T3 are effectively
> +   the same integer type for GIMPLE.  If T1,  T2 or T3 is not a type,
> +   the test applies to their TREE_TYPE.  */
> +
> +static inline bool
> +integer_types_ternary_match (tree t1, tree t2, tree t3)
> +{
> +  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
> +  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
> +  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);
> +
> +  if (!INTEGRAL_TYPE_P (t1) || !INTEGRAL_TYPE_P (t2) || !INTEGRAL_TYPE_P (t3))
> +    return false;
> +
> +  return types_match (t1, t2) && types_match (t1, t3);
> +}
> +
>  /* Return if T has a single use.  For GIMPLE, we also allow any
> non-SSA_NAME (ie constants) and zero uses to cope with uses
> that aren't linked up yet.  */
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0f9c34fa897..b291e34bbe4 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3046,38 +3046,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* Unsigned Saturation Add */
>  (match (usadd_left_part_1 @0 @1)
>   (plus:c @0 @1)
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)))))
> + (if (TYPE_UNSIGNED (type) && integer_types_ternary_match (type, @0, @1))))
> 
>  (match (usadd_left_part_2 @0 @1)
>   (realpart (IFN_ADD_OVERFLOW:c @0 @1))
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)))))
> + (if (TYPE_UNSIGNED (type) && integer_types_ternary_match (type, @0, @1))))
> 
>  (match (usadd_right_part_1 @0 @1)
>   (negate (convert (lt (plus:c @0 @1) @0)))
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)))))
> + (if (TYPE_UNSIGN

[COMMITTED] Regenerate riscv.opt.urls and i386.opt.urls

2024-05-20 Thread Mark Wielaard
risc-v added an -mfence-tso option. i386 removed Xeon Phi ISA support
options. But the opt.urls files weren't regenerated.

Fixes: a6114c2a6911 ("RISC-V: Implement -m{,no}fence-tso")
Fixes: e1a7e2c54d52 ("i386: Remove Xeon Phi ISA support")

gcc/ChangeLog:

* config/riscv/riscv.opt.urls: Regenerate.
* config/i386/i386.opt.urls: Likewise.
---
 gcc/config/i386/i386.opt.urls   | 15 ---
 gcc/config/riscv/riscv.opt.urls |  3 +++
 2 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/gcc/config/i386/i386.opt.urls b/gcc/config/i386/i386.opt.urls
index 81c5bb9a9270..40e8a8449367 100644
--- a/gcc/config/i386/i386.opt.urls
+++ b/gcc/config/i386/i386.opt.urls
@@ -238,12 +238,6 @@ UrlSuffix(gcc/x86-Options.html#index-mavx2)
 mavx512f
 UrlSuffix(gcc/x86-Options.html#index-mavx512f)
 
-mavx512pf
-UrlSuffix(gcc/x86-Options.html#index-mavx512pf)
-
-mavx512er
-UrlSuffix(gcc/x86-Options.html#index-mavx512er)
-
 mavx512cd
 UrlSuffix(gcc/x86-Options.html#index-mavx512cd)
 
@@ -262,12 +256,6 @@ UrlSuffix(gcc/x86-Options.html#index-mavx512ifma)
 mavx512vbmi
 UrlSuffix(gcc/x86-Options.html#index-mavx512vbmi)
 
-mavx5124fmaps
-UrlSuffix(gcc/x86-Options.html#index-mavx5124fmaps)
-
-mavx5124vnniw
-UrlSuffix(gcc/x86-Options.html#index-mavx5124vnniw)
-
 mavx512vpopcntdq
 UrlSuffix(gcc/x86-Options.html#index-mavx512vpopcntdq)
 
@@ -409,9 +397,6 @@ UrlSuffix(gcc/x86-Options.html#index-mrdrnd)
 mf16c
 UrlSuffix(gcc/x86-Options.html#index-mf16c)
 
-mprefetchwt1
-UrlSuffix(gcc/x86-Options.html#index-mprefetchwt1)
-
 mfentry
 UrlSuffix(gcc/x86-Options.html#index-mfentry)
 
diff --git a/gcc/config/riscv/riscv.opt.urls b/gcc/config/riscv/riscv.opt.urls
index 2f01ae5d6271..e02ef3ee3dd9 100644
--- a/gcc/config/riscv/riscv.opt.urls
+++ b/gcc/config/riscv/riscv.opt.urls
@@ -91,3 +91,6 @@ UrlSuffix(gcc/RISC-V-Options.html#index-minline-strlen)
 
 ; skipping UrlSuffix for 'mtls-dialect=' due to finding no URLs
 
+mfence-tso
+UrlSuffix(gcc/RISC-V-Options.html#index-mfence-tso)
+
-- 
2.45.1



RE: [PATCH v1] Match: Extract integer_types_ternary_match helper to avoid code dup [NFC]

2024-05-20 Thread Li, Pan2
> See e.g. bitmask_inv_cst_vector_p which is also defined in tree.h/tree.cc.

Aha, I tried that approach first, but then thought it should be something like
types_match, hence the v1.  I will follow bitmask_inv_cst_vector_p in v3.

Pan

-Original Message-
From: Tamar Christina  
Sent: Monday, May 20, 2024 7:20 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; richard.guent...@gmail.com
Subject: RE: [PATCH v1] Match: Extract integer_types_ternary_match helper to 
avoid code dup [NFC]


RE: [PATCH v1 1/2] Match: Support branch form for unsigned SAT_ADD

2024-05-20 Thread Tamar Christina
Hi Pan,

> -Original Message-
> From: pan2...@intel.com 
> Sent: Monday, May 20, 2024 12:01 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> ; richard.guent...@gmail.com; Pan Li
> 
> Subject: [PATCH v1 1/2] Match: Support branch form for unsigned SAT_ADD
> 
> From: Pan Li 
> 
> This patch would like to support the branch form for unsigned
> SAT_ADD.  For example as below:
> 
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   return (uint64_t) (x + y) >= x ? (x + y) : -1;
> }
> 
> Unlike the branchless version, we use simplify to convert the branch
> version of SAT_ADD into the branchless form if and only if the backend
> supports IFN_SAT_ADD.  Thus, the backend can choose between a branch and
> a branchless implementation of .SAT_ADD.  For example, some targets
> handle branchy code more efficiently.
> 
> When the target implements IFN_SAT_ADD for unsigned types, before this
> patch:
> uint64_t sat_add_u_1_uint64_t (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   uint64_t _2;
>   __complex__ long unsigned int _6;
>   long unsigned int _7;
> 
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .ADD_OVERFLOW (x_3(D), y_4(D));
>   _1 = REALPART_EXPR <_6>;
>   _7 = IMAGPART_EXPR <_6>;
>   if (_7 == 0)
>     goto <bb 4>; [65.00%]
>   else
>     goto <bb 3>; [35.00%]
> ;;succ:   4
> ;;3
> 
> ;;   basic block 3, loop depth 0
> ;;pred:   2
> ;;succ:   4
> 
> ;;   basic block 4, loop depth 0
> ;;pred:   3
> ;;2
>   # _2 = PHI <18446744073709551615(3), _1(2)>
>   return _2;
> ;;succ:   EXIT
> 
> }
> 
> After this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _9;
> 
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _9 = .SAT_ADD (x_3(D), y_4(D)); [tail call]
>   return _9;
> ;;succ:   EXIT
> }
> 
> The below test suites are passed for this patch:
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The riscv fully regression test.
> 
> gcc/ChangeLog:
> 
>   * match.pd: Add new simplify to convert branch SAT_ADD into
>   branchless,  if and only if backend implement the IFN.
> 
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0f9c34fa897..0547b57b3a3 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3094,6 +3094,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (unsigned_integer_sat_add @0 @1)
>   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> 
> +#if GIMPLE
> +
> +/* Simplify the branch version of SAT_ADD into branchless if and only if
> +   the backend has supported the IFN_SAT_ADD.  Thus, the backend has the
> +   ability to choose branch or branchless implementation of .SAT_ADD.  */
> +
> +(simplify
> + (cond (ge (plus:c@2 @0 @1) @0) @2 integer_minus_onep)
> +  (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
> +   (bit_ior @2 (negate (convert (lt @2 @0))))))
> +
> +(simplify
> + (cond (le @0 (plus:c@2 @0 @1)) @2 integer_minus_onep)
> +  (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
> +   (bit_ior @2 (negate (convert (lt @2 @0))))))
> +
> +#endif

Thanks, this looks good to me!

I'll leave it up to Richard to approve.
Richard: the reason for the direct_internal_fn_supported_p check is that some
targets said they currently handle the branch version better due to the lack
of some types.  At the time I reasoned it was just a target expansion bug, but
I didn't hear anything back.

To be honest, it feels to me like we should do this unconditionally, and just
have the targets that get a faster branch version handle it during expand,
since the patch series now provides a canonicalized version.

This means we can also better support targets that have the vector optab but
not the scalar one, as the above check would fail for these targets.

What do you think?

Thanks,
Tamar

> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> --
> 2.34.1



Re: [PATCH 00/12] aarch64: Extend aarch64_feature_flags to 128 bits

2024-05-20 Thread Andrew Carlotti
On Fri, May 17, 2024 at 04:45:05PM +0100, Richard Sandiford wrote:
> Andrew Carlotti  writes:
> > The end goal of the series is to change the definition of
> > aarch64_feature_flags from a uint64_t typedef to a class with 128 bits of
> > storage.  This class uses operator overloading to mimic the existing
> > integer interface as much as possible, but with added restrictions to
> > facilitate type checking and extensibility.
> >
> > Patches 01-10 are preliminary enablement work, and have passed regression
> > testing.  Are these ok for master?
> >
> > Patch 11 is an RFC, and the only patch that touches the middle end.  I am
> > seeking clarity on which part(s) of the compiler should be expected to
> > handle or prevent non-bool types in instruction pattern conditions.  The
> > actual patch does not compile by itself (though it does in combination
> > with 12/12), but that is not important to the questions I'm asking.
> >
> > Patch 12 is then a small patch that actually replaces the uint64_t typedef
> > with a class.  I think this patch is fine in its current form, but it
> > depends on a resolution to the issues in patch 11/12 first.
> 
> Thanks for doing this.
> 
> Rather than disallowing flags == 0, etc., I think we should allow
> aarch64_feature_flags to be constructed from a single uint64_t.
> It's a lossless conversion.  The important thing is that we don't
> allow conversions the other way (and the patch doesn't allow them).

I agree that allowing conversion from a single int should be safe (albeit it
was probably helpful to disallow it during the development of this series).
It does feel a little bit strange to have a separate mechanism for
setting the first 64 bits (and zeroing the rest).

Do you consider the existing code in some places to be clearer than the new
versions in this patch series?  If so, it would be helpful to know which
patches (or parts of patches) I should drop.
 
> Also, I think we should make the new class in 12/12 be a templated
>  type that provides an N-bit bitmask.  It should arguably
> also be target-independent code.  aarch64_feature_flags would then be
> an alias with the appropriate number of bits.

I think the difficult part is to do this for generic N while still satisfying
C++11 constexpr function requirements (we can't use a loop, for example).
However, while writing this response, I've realised that I can do this using
recursion, with an N-bit bitmask being implemented as a class containing an
M-bit integer and (recursively) an (N-M)-bit bitmask.
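
To make the recursive idea concrete, here is a rough sketch under C++11 constexpr rules, taking M = 64.  This is not the code from the series: the class name bitmask, the member names, the bit () helper and the FL_* constants are all made up for illustration, and only &, | and == are shown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: an N-bit mask is one 64-bit word plus, recursively,
// a mask holding the remaining N - 64 bits.  No loops are needed, so every
// member can be a C++11 constexpr function.
template<unsigned N, bool Last = (N <= 64)>
class bitmask;

// Base case: at most 64 bits fit in a single word.
template<unsigned N>
class bitmask<N, true>
{
public:
  constexpr bitmask () : m_lo (0) {}
  constexpr bitmask (uint64_t lo) : m_lo (lo) {}

  constexpr bitmask operator& (bitmask o) const { return bitmask (m_lo & o.m_lo); }
  constexpr bitmask operator| (bitmask o) const { return bitmask (m_lo | o.m_lo); }
  constexpr bool operator== (bitmask o) const { return m_lo == o.m_lo; }
  constexpr explicit operator bool () const { return m_lo != 0; }

  // Mask with only bit I set, counting from the low end.
  static constexpr bitmask bit (unsigned i) { return bitmask (uint64_t (1) << i); }

private:
  uint64_t m_lo;
};

// Recursive case: the low word here, the rest in a smaller bitmask.
template<unsigned N>
class bitmask<N, false>
{
  typedef bitmask<N - 64> rest;

public:
  constexpr bitmask () : m_lo (0), m_hi () {}
  // Lossless conversion from a single integer: sets the low 64 bits and
  // zeroes the rest.  There is no conversion back to an integer.
  constexpr bitmask (uint64_t lo) : m_lo (lo), m_hi () {}
  constexpr bitmask (uint64_t lo, rest hi) : m_lo (lo), m_hi (hi) {}

  constexpr bitmask operator& (bitmask o) const
  { return bitmask (m_lo & o.m_lo, m_hi & o.m_hi); }
  constexpr bitmask operator| (bitmask o) const
  { return bitmask (m_lo | o.m_lo, m_hi | o.m_hi); }
  constexpr bool operator== (bitmask o) const
  { return m_lo == o.m_lo && m_hi == o.m_hi; }
  constexpr explicit operator bool () const { return m_lo != 0 || bool (m_hi); }

  static constexpr bitmask bit (unsigned i)
  {
    return i < 64 ? bitmask (uint64_t (1) << i, rest ())
                  : bitmask (0, rest::bit (i - 64));
  }

private:
  uint64_t m_lo;
  rest m_hi;
};

// Illustrative use: a 128-bit flag type with made-up flag constants.
typedef bitmask<128> feature_flags;
constexpr feature_flags FL_LOW = feature_flags::bit (3);
constexpr feature_flags FL_HIGH = feature_flags::bit (100);

static_assert (bool ((FL_LOW | FL_HIGH) & FL_HIGH), "high bit survives | and &");
static_assert (!bool (FL_LOW & FL_HIGH), "distinct bits do not overlap");
static_assert (feature_flags (5) == (feature_flags::bit (0) | feature_flags::bit (2)),
               "construction from one integer fills only the low word");
```

The explicit operator bool still allows if (flags & FL_X) while rejecting accidental conversions to integer types.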
 
> For the RFC in 11/12, how about, as another prepatch before 12/12,
> removing all the mechanical:
> 
> #define AARCH64_ISA_LS64 (aarch64_isa_flags & AARCH64_FL_LS64)
> 
> style macros and replacing uses with something like:
> 
>   AARCH64_HAVE_ISA (LS64)

This sounds like a good approach, and is roughly what I was already planning to
do (although I hadn't worked out the details yet).  I think that can entirely
replace 11/12 in the context of this series, but the questions about
instruction pattern condition type checking still ought to be addressed
separately.
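
For illustration only, one possible shape for such a macro is sketched below, using token pasting to build the flag name.  The typedef, the flag values and the have_ls64 helper are stand-ins invented for this sketch rather than the real aarch64.h definitions, and the final macro may well look different.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the real definitions; the bit positions are made up.
typedef uint64_t aarch64_feature_flags;
constexpr aarch64_feature_flags AARCH64_FL_LS64 = aarch64_feature_flags (1) << 5;
constexpr aarch64_feature_flags AARCH64_FL_SVE = aarch64_feature_flags (1) << 9;

// Pretend the current ISA enables LS64 only.
static aarch64_feature_flags aarch64_isa_flags = AARCH64_FL_LS64;

// One generic macro instead of one AARCH64_ISA_* macro per feature.  The
// explicit conversion to bool means the result has type bool even if
// aarch64_feature_flags later becomes a class.
#define AARCH64_HAVE_ISA(X) (bool (aarch64_isa_flags & AARCH64_FL_##X))

// Example use in a condition.
static bool have_ls64 ()
{
  return AARCH64_HAVE_ISA (LS64);
}
```

Since the macro always yields a bool, conditions built from it would not depend on how the flags type converts in a boolean context.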

> Uses outside aarch64.h should arguably be changed to TARGET_* instead,
> since the convention seems to be that TARGET_* checks the underlying
> ISA flag and also any other relevant conditions (where applicable).
> 
> Thanks,
> Richard


Re: [PATCH] wwwdocs: contribute.html: Update consensus on patch content.

2024-05-20 Thread Nick Clifton

Hi Christophe,

> I have a follow-up one: I think the same applies to binutils, but I
> don't think any maintainer / contributor expressed an opinion, and
> IIUC patch policy for binutils is (lightly) documented at
> https://sourceware.org/binutils/wiki/HowToContribute
> Maybe Nick can update it?

Done.

> (I don't have such rights)

Would you like them?  It is easy enough to set up.

Cheers
  Nick




[Patch, aarch64] Further renaming of generic code

2024-05-20 Thread Ajit Agarwal
Hello Alex/Richard:

The generic code is renamed to separate target-independent code from
target-dependent code, so that multiple targets can be supported.

The target-independent code is the generic code, with pure virtual functions
as the interface between target-independent and target-dependent code.

The target-dependent code is the implementation of the pure virtual functions
for the aarch64 target, plus the call into the target-independent code.
Bootstrapped and regtested on aarch64-linux-gnu.

Thanks & Regards
Ajit

aarch64: Further renaming of generic code

The generic code is renamed to separate target-independent code from
target-dependent code, so that multiple targets can be supported.

The target-independent code is the generic code, with pure virtual functions
as the interface between target-independent and target-dependent code.

The target-dependent code is the implementation of the pure virtual functions
for the aarch64 target, plus the call into the target-independent code.

2024-05-20  Ajit Kumar Agarwal  

gcc/ChangeLog:

* config/aarch64/aarch64-ldp-fusion.cc: Renaming of generic code
---
 gcc/config/aarch64/aarch64-ldp-fusion.cc | 55 
 1 file changed, 28 insertions(+), 27 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc b/gcc/config/aarch64/aarch64-ldp-fusion.cc
index 6b2a44f101b..6924e48fe7e 100644
--- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
+++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
@@ -368,7 +368,7 @@ struct aarch64_pair_fusion : public pair_fusion
 };
 
 // State used by the pass for a given basic block.
-struct ldp_bb_info
+struct pair_fusion_bb_info
 {
   using def_hash = nofree_ptr_hash;
   using expr_key_t = pair_hash>;
@@ -389,14 +389,14 @@ struct ldp_bb_info
 
   static const size_t obstack_alignment = sizeof (void *);
 
-  ldp_bb_info (bb_info *bb, pair_fusion *d)
+  pair_fusion_bb_info (bb_info *bb, pair_fusion *d)
 : m_bb (bb), m_pass (d), m_emitted_tombstone (false)
   {
 obstack_specify_allocation (&m_obstack, OBSTACK_CHUNK_SIZE,
obstack_alignment, obstack_chunk_alloc,
obstack_chunk_free);
   }
-  ~ldp_bb_info ()
+  ~pair_fusion_bb_info ()
   {
 obstack_free (&m_obstack, nullptr);
 
@@ -484,7 +484,7 @@ aarch64_pair_fusion::gen_pair (rtx *pats, rtx writeback, bool load_p)
 }
 
 splay_tree_node *
-ldp_bb_info::node_alloc (access_record *access)
+pair_fusion_bb_info::node_alloc (access_record *access)
 {
   using T = splay_tree_node;
   void *addr = obstack_alloc (&m_obstack, sizeof (T));
@@ -532,7 +532,7 @@ drop_writeback (rtx mem)
 // RTX_AUTOINC addresses.  The interface is like strip_offset except we take a
 // MEM so that we know the mode of the access.
 static rtx
-ldp_strip_offset (rtx mem, poly_int64 *offset)
+pair_mem_strip_offset (rtx mem, poly_int64 *offset)
 {
   rtx addr = XEXP (mem, 0);
 
@@ -658,7 +658,8 @@ access_group::track (Alloc alloc_node, poly_int64 offset, insn_info *insn)
 // MEM_EXPR base (i.e. a tree decl) relative to which we can track the access.
 // LFS is used as part of the key to the hash table, see track_access.
 bool
-ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
+pair_fusion_bb_info::track_via_mem_expr (insn_info *insn, rtx mem,
+lfs_fields lfs)
 {
   if (!MEM_EXPR (mem) || !MEM_OFFSET_KNOWN_P (mem))
 return false;
@@ -706,7 +707,7 @@ ldp_bb_info::track_via_mem_expr (insn_info *insn, rtx mem, lfs_fields lfs)
 // this basic block.  LOAD_P is true if the access is a load, and MEM
 // is the mem rtx that occurs in INSN.
 void
-ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
+pair_fusion_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
 {
   // We can't combine volatile MEMs, so punt on these.
   if (MEM_VOLATILE_P (mem))
@@ -739,7 +740,7 @@ ldp_bb_info::track_access (insn_info *insn, bool load_p, rtx mem)
   poly_int64 mem_off;
   rtx addr = XEXP (mem, 0);
   const bool autoinc_p = GET_RTX_CLASS (GET_CODE (addr)) == RTX_AUTOINC;
-  rtx base = ldp_strip_offset (mem, &mem_off);
+  rtx base = pair_mem_strip_offset (mem, &mem_off);
   if (!REG_P (base))
 return;
 
@@ -1099,7 +1100,7 @@ def_upwards_move_range (def_info *def)
 // Class that implements a state machine for building the changes needed to form
 // a store pair instruction.  This allows us to easily build the changes in
 // program order, as required by rtl-ssa.
-struct stp_change_builder
+struct store_change_builder
 {
   enum class state
   {
@@ -1126,7 +1127,7 @@ struct stp_change_builder
 
   bool done () const { return m_state == state::DONE; }
 
-  stp_change_builder (insn_info *insns[2],
+  store_change_builder (insn_info *insns[2],
  insn_info *repurpose,
  insn_info *dest)
 : m_state (state::FIRST), m_insns { insns[0], insns[1] },
@@ -1402,7 +1403,7 @@ extract_writebacks (bool load_p, rtx pats[2], int changed)
   const bool autoinc_p = GET_RTX_CLASS (GET_CODE (addr)) == RTX_AUTOINC;
 
   poly_in

Re: [Patch, aarch64] Further renaming of generic code

2024-05-20 Thread Richard Sandiford
Ajit Agarwal  writes:
> Hello Alex/Richard:
>
> The generic code is renamed to separate target-independent code from
> target-dependent code, so that multiple targets can be supported.
>
> The target-independent code is the generic code, with pure virtual functions
> as the interface between target-independent and target-dependent code.
>
> The target-dependent code is the implementation of the pure virtual functions
> for the aarch64 target, plus the call into the target-independent code.
>
> Bootstrapped and regtested on aarch64-linux-gnu.
>
> Thanks & Regards
> Ajit
>
> aarch64: Further renaming of generic code
>
> The generic code is renamed to separate target-independent code from
> target-dependent code, so that multiple targets can be supported.
>
> The target-independent code is the generic code, with pure virtual functions
> as the interface between target-independent and target-dependent code.
>
> The target-dependent code is the implementation of the pure virtual functions
> for the aarch64 target, plus the call into the target-independent code.
>
> 2024-05-20  Ajit Kumar Agarwal  
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-ldp-fusion.cc: Renaming of generic code

* config/aarch64/aarch64-ldp-fusion.cc: Rename generic parts of code
to avoid "ldp" and "stp".

> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 55 
>  1 file changed, 28 insertions(+), 27 deletions(-)
>
> [...]
> @@ -1126,7 +1127,7 @@ struct stp_change_builder
>  
>bool done () const { return m_state == state::DONE; }
>  
> -  stp_change_builder (insn_info *insns[2],
> +  store_change_builder (insn_info *insns[2],
> insn_info *repurpose,
> insn_info *dest)

Please reindent the parameters for the new longer name.

>  : m_state (state::FIRST), m_insns { insns[0], insns[1] },
> [...]
> @@ -1916,7 +1917,7 @@ fixup_debug_uses (obstack_watermark &attempt,
>  // BASE gives the chosen base candidate for the pair and MOVE_RANGE is
>  // a singleton range which says where to place the pair.
>  bool
> -ldp_bb_info::fuse_pair (bool load_p,
> +pair_fusion_bb_info::fuse_pair (bool load_p,
>   unsigned access_size,
>   int writeback,
>   insn_info *i1, insn_info *i2,

Same here.

> @@ -2687,7 +2688,7 @@ pair_fusion::get_viable_bases (insn_info *insns[2],
>  // ACCESS_SIZE gives the (common) size of a single access, LOAD_P is true
>  // if the accesses are both loads, otherwise they are both stores.
>  bool
> -ldp_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
> +pair_fusion_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
>   insn_info *i1, insn_info *i2)
>  {
>if (dump_file)

And here.

OK with those changes, thanks.

Richard


[PATCH v3] aarch64: Fix normal returns inside functions which use eh_returns [PR114843]

2024-05-20 Thread Wilco Dijkstra
Hi Andrew,

A few comments on the implementation, I think it can be simplified a lot:

> +++ b/gcc/config/aarch64/aarch64.h
> @@ -700,8 +700,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF;
> #define DWARF2_UNWIND_INFO 1
>  
>  /* Use R0 through R3 to pass exception handling information.  */
> +#define EH_RETURN_DATA_REGISTERS_N 4
>  #define EH_RETURN_DATA_REGNO(N) \
> -  ((N) < 4 ? ((unsigned int) R0_REGNUM + (N)) : INVALID_REGNUM)
> +  ((N) < EH_RETURN_DATA_REGISTERS_N ? ((unsigned int) R0_REGNUM + (N)) : INVALID_REGNUM)
 
It would be useful to add a macro IS_EH_RETURN_REGNUM(regnum) that just checks
the range R0_REGNUM to R0_REGNUM + EH_RETURN_DATA_REGISTERS_N.
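
A minimal sketch of what such a macro could look like. The macro body and the
stand-in values for R0_REGNUM and EH_RETURN_DATA_REGISTERS_N are assumptions
made for illustration; the real definitions live in aarch64.h and the patch.

```cpp
// Stand-in values so the sketch compiles on its own (assumptions).
#define R0_REGNUM 0
#define EH_RETURN_DATA_REGISTERS_N 4

// Hypothetical helper: true if REGNUM lies in the range
// R0_REGNUM .. R0_REGNUM + EH_RETURN_DATA_REGISTERS_N - 1.
#define IS_EH_RETURN_REGNUM(REGNUM) \
  ((REGNUM) >= R0_REGNUM \
   && (REGNUM) < R0_REGNUM + EH_RETURN_DATA_REGISTERS_N)
```

With a helper of this shape, a range test on the register number can stand in
for a loop over EH_RETURN_DATA_REGNO.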

> @@ -929,6 +928,7 @@ struct GTY (()) aarch64_frame
>  outgoing arguments) of each register save slot, or -2 if no save is
>  needed.  */
>   poly_int64 reg_offset[LAST_SAVED_REGNUM + 1];
> +  bool eh_return_allocated[EH_RETURN_DATA_REGISTERS_N];

This doesn't make much sense - besides X0-X3, we also need X5 and X6 for
eh_return.  If these or any of the other temporaries used by epilog are
callee-saved somehow, things are going horribly wrong already... So what do
we gain by doing this?


> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -7792,6 +7792,7 @@ aarch64_layout_frame (void)
> 
>  #define SLOT_NOT_REQUIRED (-2)
>  #define SLOT_REQUIRED (-1)
> +#define SLOT_EH_RETURN_REQUIRED (-3)
 
I don't see a need for this.


> @@ -7949,6 +7950,18 @@ aarch64_layout_frame (void)
> stopping it from being individually shrink-wrapped.  */
>  allocate_gpr_slot (R30_REGNUM);
>  
> +  /* Allocate the eh_return first. */
> +  if (crtl->calls_eh_return)
> +for (regno = 0; EH_RETURN_DATA_REGNO (regno) != INVALID_REGNUM; regno++)
> +  {
> + int realregno = EH_RETURN_DATA_REGNO (regno);
> + if (known_eq (frame.reg_offset[realregno], SLOT_EH_RETURN_REQUIRED))
> +   {
> + frame.eh_return_allocated[regno] = true;
> + allocate_gpr_slot (realregno);
> +   }
> +  }

This change is unnecessary if we just mark the slots with SLOT_REQUIRED.


> @@ -8035,6 +8048,23 @@ aarch64_layout_frame (void)
>   frame.wb_pop_candidate1 = frame.wb_push_candidate1;
>   frame.wb_pop_candidate2 = frame.wb_push_candidate2;
>  
> +  /* EH data registers are not pop canidates. */
> +  if (crtl->calls_eh_return)
> +for (regno = 0; EH_RETURN_DATA_REGNO (regno) != INVALID_REGNUM; regno++)
> +  {
> + if (frame.eh_return_allocated[regno]
> + && frame.wb_pop_candidate1 == EH_RETURN_DATA_REGNO (regno))
> + {
> +   frame.wb_pop_candidate1 = frame.wb_pop_candidate2;
> +   frame.wb_pop_candidate2 = INVALID_REGNUM;
> + }
> + if (frame.eh_return_allocated[regno]
> + && frame.wb_pop_candidate2 == EH_RETURN_DATA_REGNO (regno))
> + {
> +   frame.wb_pop_candidate2 = INVALID_REGNUM;
> + }
> +  }

This is unnecessary since we can just avoid making them push candidates
if there is no frame chain, eg:

if ((!crtl->calls_eh_return || frame.emit_frame_chain) && !push_regs.empty ()
  && known_eq (frame.reg_offset[push_regs[0]], frame.bytes_below_hard_fp))


@@ -8681,6 +8712,20 @@ aarch64_restore_callee_saves (poly_int64 bytes_below_sp,
   if (frame.is_scs_enabled && regno == LR_REGNUM)
return true;
 
+  /* Skip the eh return data registers if we are
+returning normally rather than via eh_return. */
+  if (!was_eh_return && crtl->calls_eh_return)
+   {
+ for (unsigned ehregno = 0;
+  EH_RETURN_DATA_REGNO (ehregno) != INVALID_REGNUM;
+  ehregno++)
+   {
+ if (EH_RETURN_DATA_REGNO (ehregno) == regno
+ && frame.eh_return_allocated[ehregno])
+   return true;
+   }
+   }
+

So this could be something like:

  if (!was_eh_return && crtl->calls_eh_return && IS_EH_RETURN_REGNUM (regno))
    return true;
 
Cheers,
Wilco

[committed] PATCH for Re: Stepping down as maintainer for ARC and Epiphany

2024-05-20 Thread Gerald Pfeifer
On Wed, 5 Jul 2023, Joern Rennecke wrote:
> I haven't worked with these targets in years and can't really do 
> sensible maintenance or reviews of patches for them. I am currently 
> working on optimizations for other ports like RISC-V.

I noticed MAINTAINERS was not updated, so pushed the patch below.

Gerald


commit f94598ffaf5affbc9421ff230502357b07c55d9c
Author: Gerald Pfeifer 
Date:   Mon May 20 16:43:05 2024 +0200

MAINTAINERS: Update Joern Rennecke's status

This is per his mail to g...@gcc.gnu.org on 7 Jul 2023.

ChangeLog:
* MAINTAINERS: Move Joern Rennecke from arc and epiphany maintainer
to Write After Approval.

diff --git a/MAINTAINERS b/MAINTAINERS
index 8e0add6bef8..e2870eef2ef 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -56,7 +56,6 @@ aarch64 port  Kyrylo Tkachov  

 alpha port Richard Henderson   
 amdgcn portJulian Brown
 amdgcn portAndrew Stubbs   
-arc port   Joern Rennecke  
 arc port   Claudiu Zissulescu  
 arm port   Nick Clifton
 arm port   Richard Earnshaw
@@ -68,7 +67,6 @@ c6x port  Bernd Schmidt   

 cris port  Hans-Peter Nilsson  
 c-sky port Xianmiao Qu 
 c-sky port Yunhai Shang
-epiphany port  Joern Rennecke  
 fr30 port  Nick Clifton
 frv port   Nick Clifton
 frv port   Alexandre Oliva 
@@ -634,6 +632,7 @@ Joe Ramsay  

 Rolf Rasmussen 
 Fritz Reese
 Volker Reichelt

+Joern Rennecke 
 Bernhard Reutner-Fischer   
 Tom Rix
 Thomas Rodgers 


Re: [PATCH 00/12] aarch64: Extend aarch64_feature_flags to 128 bits

2024-05-20 Thread Richard Sandiford
Andrew Carlotti  writes:
> On Fri, May 17, 2024 at 04:45:05PM +0100, Richard Sandiford wrote:
>> Andrew Carlotti  writes:
>> > The end goal of the series is to change the definition of aarch64_feature_flags
>> > from a uint64_t typedef to a class with 128 bits of storage.  This class uses
>> > operator overloading to mimic the existing integer interface as much as
>> > possible, but with added restrictions to facilitate type checking and
>> > extensibility.
>> >
>> > Patches 01-10 are preliminary enablement work, and have passed regression
>> > testing.  Are these ok for master?
>> >
>> > Patch 11 is an RFC, and the only patch that touches the middle end.  I am
>> > seeking clarity on which part(s) of the compiler should be expected to handle
>> > or prevent non-bool types in instruction pattern conditions.  The actual patch
>> > does not compile by itself (though it does in combination with 12/12), but that
>> > is not important to the questions I'm asking.
>> >
>> > Patch 12 is then a small patch that actually replaces the uint64_t typedef with
>> > a class.  I think this patch is fine in its current form, but it depends on a
>> > resolution to the issues in patch 11/12 first.
>> 
>> Thanks for doing this.
>> 
>> Rather than disallowing flags == 0, etc., I think we should allow
>> aarch64_feature_flags to be constructed from a single uint64_t.
>> It's a lossless conversion.  The important thing is that we don't
>> allow conversions the other way (and the patch doesn't allow them).
>
> I agree that allowing conversion from a single int should be safe (albeit it
> was probably helpful to disallow it during the development of this series).
> It does feel a little bit strange to have a separate mechanism for
> setting the first 64 bits (and zeroing the rest).

With a templated class, I think it makes sense.  The constructor would
take a variable number of arguments and any unspecified elements would
implicitly be zero.  In that sense, a single uint64_t isn't a special
case.  It's just an instance of a generic rule.
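
As a rough illustration of that rule, here is a sketch of a C++11-compatible
class whose constructor takes a variable number of 64-bit words and leaves the
rest zero.  The names bitmask_sketch and feature_flags_sketch are made up for
this example and are not taken from the patch series.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical N-word bitmask; a sketch, not the class from patch 12/12.
template<std::size_t N>
class bitmask_sketch
{
public:
  // Accept up to N 64-bit words, lowest word first.  Words that are not
  // specified are zero-initialized by the array initializer.
  template<typename... Words>
  constexpr bitmask_sketch (Words... words)
    : m_words { std::uint64_t (words)... }
  {
    static_assert (sizeof... (Words) <= N, "too many words");
  }

  // Read back one 64-bit word.
  constexpr std::uint64_t word (std::size_t i) const { return m_words[i]; }

private:
  std::uint64_t m_words[N];
};

// A 128-bit instance, built from one word and from two words.
using feature_flags_sketch = bitmask_sketch<2>;
constexpr feature_flags_sketch low_only (0x5);
constexpr feature_flags_sketch both (0x5, 0x1);
```

Here the single-word case goes through the same constructor as every other
case; only the number of supplied words differs.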

> Do you consider the existing code in some places to be clearer than the new
> versions in this patch series?  If so, it would be helpful to know which
> patches (or parts of patches) I should drop.

Probably patches 3, 4, and (for unrelated reasons) 9.  (9 feels like
a microoptimisation, given that the underlying issue has been fixed.)

>> Also, I think we should make the new class in 12/12 be a templated
>>  type that provides an N-bit bitmask.  It should arguably
>> also be target-independent code.  aarch64_feature_flags would then be
>> an alias with the appropriate number of bits.
>
> I think the difficult part is to do this for generic N while still satisfying
> C++11 constexpr function requirements (we can't use a loop, for example).
> However, while writing this response, I've realised that I can do this using
> recursion, with an N-bit bitmask being implemented as a class containing an
> M-bit integer and (recursively) an (N-M)-bit bitmask.

I think it'd be better to keep a flat object, not least for debugging.

Things like operator| could be handled using code like:


template<int N>
struct operators
{
  template<typename Result, typename Operator, typename Arg, typename ...Rest>
  static constexpr Result binary(Operator op, const Arg &x, const Arg &y,
                                 Rest ...rest)
  {
    return operators<N - 1>::template binary<Result, Operator, Arg>
      (op, x, y, op (x[N - 1], y[N - 1]), rest...);
  }
};

template<>
struct operators<0>
{
  template<typename Result, typename Operator, typename Arg, typename ...Rest>
  static constexpr Result binary(Operator op, const Arg &x, const Arg &y,
                                 Rest ...rest)
  {
    return Result { rest... };
  }
};

using T = std::array<int, 2>;

template<typename T>
constexpr T f(T x, T y) { return x | y; }
constexpr T x = { 1, 2 };
constexpr T y = { 0x100, 0x400 };
constexpr T z = operators<2>::binary<T> (f<int>, x, y);


(Unfortunately, constexpr lambdas are also not supported in C++11.)

>> For the RFC in 11/12, how about, as another prepatch before 12/12,
>> removing all the mechanical:
>> 
>> #define AARCH64_ISA_LS64 (aarch64_isa_flags & AARCH64_FL_LS64)
>> 
>> style macros and replacing uses with something like:
>> 
>>   AARCH64_HAVE_ISA (LS64)
>
> This sounds like a good approach, and is roughly what I was already planning to
> do (although I hadn't worked out the details yet).  I think that can entirely
> replace 11/12 in the context of this series, but the questions about
> instruction pattern condition type checking still ought to be addressed
> separately.

Yeah, stronger typing would be good.  I think in practice the generators
should add the "bool (...)" wrapper.
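
For illustration only, a sketch of how a single AARCH64_HAVE_ISA macro could
replace the per-feature macros.  The flag values below are stand-ins, and
aarch64_isa_flags is made constexpr purely so the sketch can be checked at
compile time; in the compiler it is run-time state.

```cpp
#include <cstdint>

// Stand-in flag values and flag state (assumptions for this sketch).
constexpr std::uint64_t AARCH64_FL_LS64 = std::uint64_t (1) << 0;
constexpr std::uint64_t AARCH64_FL_SVE2 = std::uint64_t (1) << 1;
constexpr std::uint64_t aarch64_isa_flags = AARCH64_FL_LS64;

// One generic test instead of one AARCH64_ISA_<X> macro per feature.
// The explicit bool conversion means callers always get a bool,
// whatever type the flags expression itself has.
#define AARCH64_HAVE_ISA(X) (bool (aarch64_isa_flags & AARCH64_FL_##X))
```

A use such as AARCH64_HAVE_ISA (LS64) then reads the same whether the flags
are a plain integer or a class type with an explicit conversion to bool.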

Thanks,
Richard


RE: [COMMITTED] Regenerate riscv.opt.urls and i386.opt.urls

2024-05-20 Thread Jiang, Haochen
Thanks for your help! I hadn't noticed that this file was newly added to GCC.
I suppose that is why the buildbot has been reporting something to me all
afternoon.

Just to confirm, does that mean we will always need to run
gcc/regenerate-opt-urls.py after adding or removing options in GCC?
My current understanding is yes.

Thx,
Haochen

> -Original Message-
> From: Mark Wielaard 
> Sent: Monday, May 20, 2024 7:22 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Palmer Dabbelt ; Jeff Law
> ; Jiang, Haochen ; Hu,
> Lin1 ; Mark Wielaard 
> Subject: [COMMITTED] Regenerate riscv.opt.urls and i386.opt.urls
> 
> risc-v added an -mfence-tso option. i386 removed Xeon Phi ISA support options.
> But the opt.urls files weren't regenerated.
> 
> Fixes: a6114c2a6911 ("RISC-V: Implement -m{,no}fence-tso")
> Fixes: e1a7e2c54d52 ("i386: Remove Xeon Phi ISA support")
> 
> gcc/ChangeLog:
> 
>   * config/riscv/riscv.opt.urls: Regenerate.
>   * config/i386/i386.opt.urls: Likewise.
> ---
>  gcc/config/i386/i386.opt.urls   | 15 ---
>  gcc/config/riscv/riscv.opt.urls |  3 +++
>  2 files changed, 3 insertions(+), 15 deletions(-)
> 
> diff --git a/gcc/config/i386/i386.opt.urls b/gcc/config/i386/i386.opt.urls
> index 81c5bb9a9270..40e8a8449367 100644
> --- a/gcc/config/i386/i386.opt.urls
> +++ b/gcc/config/i386/i386.opt.urls
> @@ -238,12 +238,6 @@ UrlSuffix(gcc/x86-Options.html#index-mavx2)
>  mavx512f
>  UrlSuffix(gcc/x86-Options.html#index-mavx512f)
> 
> -mavx512pf
> -UrlSuffix(gcc/x86-Options.html#index-mavx512pf)
> -
> -mavx512er
> -UrlSuffix(gcc/x86-Options.html#index-mavx512er)
> -
>  mavx512cd
>  UrlSuffix(gcc/x86-Options.html#index-mavx512cd)
> 
> @@ -262,12 +256,6 @@ UrlSuffix(gcc/x86-Options.html#index-mavx512ifma)
>  mavx512vbmi
>  UrlSuffix(gcc/x86-Options.html#index-mavx512vbmi)
> 
> -mavx5124fmaps
> -UrlSuffix(gcc/x86-Options.html#index-mavx5124fmaps)
> -
> -mavx5124vnniw
> -UrlSuffix(gcc/x86-Options.html#index-mavx5124vnniw)
> -
>  mavx512vpopcntdq
>  UrlSuffix(gcc/x86-Options.html#index-mavx512vpopcntdq)
> 
> @@ -409,9 +397,6 @@ UrlSuffix(gcc/x86-Options.html#index-mrdrnd)
>  mf16c
>  UrlSuffix(gcc/x86-Options.html#index-mf16c)
> 
> -mprefetchwt1
> -UrlSuffix(gcc/x86-Options.html#index-mprefetchwt1)
> -
>  mfentry
>  UrlSuffix(gcc/x86-Options.html#index-mfentry)
> 
> diff --git a/gcc/config/riscv/riscv.opt.urls b/gcc/config/riscv/riscv.opt.urls
> index 2f01ae5d6271..e02ef3ee3dd9 100644
> --- a/gcc/config/riscv/riscv.opt.urls
> +++ b/gcc/config/riscv/riscv.opt.urls
> @@ -91,3 +91,6 @@ UrlSuffix(gcc/RISC-V-Options.html#index-minline-strlen)
> 
>  ; skipping UrlSuffix for 'mtls-dialect=' due to finding no URLs
> 
> +mfence-tso
> +UrlSuffix(gcc/RISC-V-Options.html#index-mfence-tso)
> +
> --
> 2.45.1



Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-20 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, May 17, 2024 at 11:56 AM Tamar Christina
>  wrote:
>>
>> > -Original Message-
>> > From: Richard Biener 
>> > Sent: Friday, May 17, 2024 10:46 AM
>> > To: Tamar Christina 
>> > Cc: Victor Do Nascimento ; gcc-
>> > patc...@gcc.gnu.org; Richard Sandiford ; Richard
>> > Earnshaw ; Victor Do Nascimento
>> > 
>> > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
>> > autovectorizer
>> >
>> > On Fri, May 17, 2024 at 11:05 AM Tamar Christina
>> >  wrote:
>> > >
>> > > > -Original Message-
>> > > > From: Richard Biener 
>> > > > Sent: Friday, May 17, 2024 6:51 AM
>> > > > To: Victor Do Nascimento 
>> > > > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
>> > ;
>> > > > Richard Earnshaw ; Victor Do Nascimento
>> > > > 
>> > > > Subject: Re: [PATCH] middle-end: Expand {u|s}dot product support in
>> > > > autovectorizer
>> > > >
>> > > > On Thu, May 16, 2024 at 4:40 PM Victor Do Nascimento
>> > > >  wrote:
>> > > > >
>> > > > > From: Victor Do Nascimento 
>> > > > >
>> > > > > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
>> > > > > optabs for dealing with vectorizable dot product code sequences.  The
>> > > > > consequence of using a direct optab for this is that backend-pattern
>> > > > > selection is only ever able to match against one datatype - Either
>> > > > > that of the operands or of the accumulated value, never both.
>> > > > >
>> > > > > With the introduction of the 2-way (un)signed dot-product insn [1][2]
>> > > > > in AArch64 SVE2, the existing direct opcode approach is no longer
>> > > > > sufficient for full specification of all the possible dot product
>> > > > > machine instructions to be matched to the code sequence; a dot product
>> > > > > resulting in VNx4SI may result from either dot products on VNx16QI or
>> > > > > VNx8HI values for the 4- and 2-way dot product operations, respectively.
>> > > > >
>> > > > > This means that the following example fails autovectorization:
>> > > > >
>> > > > > uint32_t foo(int n, uint16_t* data) {
>> > > > >   uint32_t sum = 0;
>> > > > >   for (int i=0; i<n; i+=1) {
>> > > > >     sum += data[i] * data[i];
>> > > > >   }
>> > > > >   return sum;
>> > > > > }
>> > > > >
>> > > > > To remedy the issue a new optab is added, tentatively named
>> > > > > `udot_prod_twoway_optab', whose selection is dependent upon checking
>> > > > > of both input and output types involved in the operation.
>> > > >
>> > > > I don't like this too much.  I'll note we document dot_prod as
>> > > >
>> > > > @cindex @code{sdot_prod@var{m}} instruction pattern
>> > > > @item @samp{sdot_prod@var{m}}
>> > > >
>> > > > Compute the sum of the products of two signed elements.
>> > > > Operand 1 and operand 2 are of the same mode. Their
>> > > > product, which is of a wider mode, is computed and added to operand 3.
>> > > > Operand 3 is of a mode equal or wider than the mode of the product. The
>> > > > result is placed in operand 0, which is of the same mode as operand 3.
>> > > > @var{m} is the mode of operand 1 and operand 2.
>> > > >
>> > > > with no restriction on the wider mode but we don't specify it which is
>> > > > bad design.  This should have been a convert optab with two modes
>> > > > from the start - adding a _twoway variant is just a hack.
>> > >
>> > > We did discuss this at the time we started implementing it.  There were two
>> > > options, one was indeed to change it to a convert dot_prod optab, but doing
>> > > this means we have to update every target that uses it.
>> > >
>> > > Now that means 3 ISAs for AArch64, Arm, Arc, c6x, 2 for x86, loongson and
>> > > altivec.
>> > >
>> > > Which sure could be possible, but there's also every use in the backends that
>> > > needs to be updated, and tested, which for some targets we don't even know how
>> > > to begin.
>> > >
>> > > So it seems very hard to correct dotprod to a convert optab now.
>> >
>> > It's still the correct way to go.  At _least_ your new pattern should
>> > have been this,
>> > otherwise what do you do when you have two-way, four-way and eight-way
>> > variants?
>> > Add yet another optab?
>>
>> I guess that's fair, but having the new optab only be convert resulted in messy
>> code as everywhere you must check for both variants.
>>
>> Additionally that optab would then overlap with the existing optabs as, as you
>> say, the documentation only says it's of a wider type and doesn't indicate
>> precision.
>>
>> So to avoid issues down the line, if the new optab isn't acceptable then
>> we'll have to do a wholesale conversion.
>
> Yep.  It shouldn't be difficult though.

Still catching up, but FWIW, I agree this is the way to go.  (Convert all
existing dot_prods to convert optabs first, and then add the new AArch64
ones.)  Having two mechanisms feels like storing up trouble for later. :)
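
To make the distinction concrete, here is a toy sketch of the two lookup
styles discussed in this thread.  The mode and instruction labels are
hypothetical stand-ins, not GCC's real optab tables; the only point is that a
lookup keyed on one mode cannot tell the 4-way and 2-way forms apart, while a
lookup keyed on the (output, input) mode pair can.

```cpp
// Hypothetical mode and instruction labels, for illustration only.
enum mode_sketch { VNx16QI, VNx8HI, VNx4SI };
enum insn_sketch { NO_INSN, UDOT_4WAY, UDOT_2WAY };

// Direct-optab style: a single mode selects the pattern.  Keyed on the
// accumulator mode alone, only one pattern can be registered for VNx4SI.
constexpr insn_sketch
direct_lookup (mode_sketch acc)
{
  return acc == VNx4SI ? UDOT_4WAY : NO_INSN;
}

// Convert-optab style: the (output, input) mode pair selects the pattern,
// so both dot-product widths can coexist.
constexpr insn_sketch
convert_lookup (mode_sketch acc, mode_sketch in)
{
  return (acc == VNx4SI && in == VNx16QI) ? UDOT_4WAY
         : (acc == VNx4SI && in == VNx8HI) ? UDOT_2WAY
         : NO_INSN;
}
```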

Richard


Re: [COMMITTED] Regenerate riscv.opt.urls and i386.opt.urls

2024-05-20 Thread David Malcolm
On Mon, 2024-05-20 at 16:19 +, Jiang, Haochen wrote:
> Thanks for your help! I haven't noticed this file is newly added to
> GCC.
> I suppose that is why the buildbot is reporting something the whole
> afternoon for me.
> 
> So just for confirm, does that mean we will always need to run
> gcc/regenerate-opt-urls.py after adding or removing options in GCC?
> My current understanding is yes.

Yes please (and make sure you've got a clean build of the HTML docs
with the new options added when you do)

Though if you forget, the only problem will be some missing URLs at the
command line, and complaints from the CI.

Dave

> 
> Thx,
> Haochen
> 
> > -Original Message-
> > From: Mark Wielaard 
> > Sent: Monday, May 20, 2024 7:22 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Palmer Dabbelt ; Jeff Law
> > ; Jiang, Haochen ;
> > Hu,
> > Lin1 ; Mark Wielaard 
> > Subject: [COMMITTED] Regenerate riscv.opt.urls and i386.opt.urls
> > 
> > risc-v added an -mfence-tso option. i386 removed Xeon Phi ISA
> > support options.
> > But the opt.urls files weren't regenerated.
> > 
> > Fixes: a6114c2a6911 ("RISC-V: Implement -m{,no}fence-tso")
> > Fixes: e1a7e2c54d52 ("i386: Remove Xeon Phi ISA support")
> > 
> > gcc/ChangeLog:
> > 
> > * config/riscv/riscv.opt.urls: Regenerate.
> > * config/i386/i386.opt.urls: Likewise.
> > ---
> >  gcc/config/i386/i386.opt.urls   | 15 ---
> >  gcc/config/riscv/riscv.opt.urls |  3 +++
> >  2 files changed, 3 insertions(+), 15 deletions(-)
> > 
> > diff --git a/gcc/config/i386/i386.opt.urls b/gcc/config/i386/i386.opt.urls
> > index 81c5bb9a9270..40e8a8449367 100644
> > --- a/gcc/config/i386/i386.opt.urls
> > +++ b/gcc/config/i386/i386.opt.urls
> > @@ -238,12 +238,6 @@ UrlSuffix(gcc/x86-Options.html#index-mavx2)
> >  mavx512f
> >  UrlSuffix(gcc/x86-Options.html#index-mavx512f)
> > 
> > -mavx512pf
> > -UrlSuffix(gcc/x86-Options.html#index-mavx512pf)
> > -
> > -mavx512er
> > -UrlSuffix(gcc/x86-Options.html#index-mavx512er)
> > -
> >  mavx512cd
> >  UrlSuffix(gcc/x86-Options.html#index-mavx512cd)
> > 
> > @@ -262,12 +256,6 @@ UrlSuffix(gcc/x86-Options.html#index-mavx512ifma)
> >  mavx512vbmi
> >  UrlSuffix(gcc/x86-Options.html#index-mavx512vbmi)
> > 
> > -mavx5124fmaps
> > -UrlSuffix(gcc/x86-Options.html#index-mavx5124fmaps)
> > -
> > -mavx5124vnniw
> > -UrlSuffix(gcc/x86-Options.html#index-mavx5124vnniw)
> > -
> >  mavx512vpopcntdq
> >  UrlSuffix(gcc/x86-Options.html#index-mavx512vpopcntdq)
> > 
> > @@ -409,9 +397,6 @@ UrlSuffix(gcc/x86-Options.html#index-mrdrnd)
> >  mf16c
> >  UrlSuffix(gcc/x86-Options.html#index-mf16c)
> > 
> > -mprefetchwt1
> > -UrlSuffix(gcc/x86-Options.html#index-mprefetchwt1)
> > -
> >  mfentry
> >  UrlSuffix(gcc/x86-Options.html#index-mfentry)
> > 
> > diff --git a/gcc/config/riscv/riscv.opt.urls b/gcc/config/riscv/riscv.opt.urls
> > index 2f01ae5d6271..e02ef3ee3dd9 100644
> > --- a/gcc/config/riscv/riscv.opt.urls
> > +++ b/gcc/config/riscv/riscv.opt.urls
> > @@ -91,3 +91,6 @@ UrlSuffix(gcc/RISC-V-Options.html#index-minline-strlen)
> > 
> >  ; skipping UrlSuffix for 'mtls-dialect=' due to finding no URLs
> > 
> > +mfence-tso
> > +UrlSuffix(gcc/RISC-V-Options.html#index-mfence-tso)
> > +
> > --
> > 2.45.1
> 



epilogue expansion alloca codepath (was Re: [PATCH v2 2/2] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733])

2024-05-20 Thread Vineet Gupta
On 5/14/24 08:44, Jeff Law wrote:
>>> I was able to find the summary info:
>>>
>>>> Tests that now fail, but worked before (15 tests):
>>>> libgomp: libgomp.fortran/simd7.f90   -O0  execution test
>>>> libgomp: libgomp.fortran/task2.f90   -O0  execution test
>>>> libgomp: libgomp.fortran/vla2.f90   -O0  execution test
>>>> libgomp: libgomp.fortran/vla3.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
>>>> libgomp: libgomp.fortran/vla3.f90   -O3 -g  execution test
>>>> libgomp: libgomp.fortran/vla4.f90   -O1  execution test
>>>> libgomp: libgomp.fortran/vla4.f90   -O2  execution test
>>>> libgomp: libgomp.fortran/vla4.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
>>>> libgomp: libgomp.fortran/vla4.f90   -O3 -g  execution test
>>>> libgomp: libgomp.fortran/vla4.f90   -Os  execution test
>>>> libgomp: libgomp.fortran/vla5.f90   -O1  execution test
>>>> libgomp: libgomp.fortran/vla5.f90   -O2  execution test
>>>> libgomp: libgomp.fortran/vla5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
>>>> libgomp: libgomp.fortran/vla5.f90   -O3 -g  execution test
>>>> libgomp: libgomp.fortran/vla5.f90   -Os  execution test
>>> So if you could check on those, it'd be appreciated.
>> I checked rv64gcv linux and those do not currently run in CI.
> So just ran with Vineet's patch in our CI system.  His patch is still 
> triggering those regressions.  So we need to get that resolved before 
> that second patch can go in.

So CI pointed to a lone Fortran execution failure, which is very likely
also the cause of the failures above.

    FAIL: gfortran.dg/PR93963.f90 -O0 execution test

Turns out the alloca codepath in epilogue expansion is simply busted -
I'm surprised that we only see 1 failure in the entire testsuite run
(libgomp run aside).

> -      if (!SMALL_OPERAND (adjust_offset.to_constant ()))
> +      HOST_WIDE_INT adj_off_value = adjust_offset.to_constant ();
> +      if (SMALL_OPERAND (adj_off_value))
> +        {
> +      adjust = GEN_INT (adj_off_value);
> +        }
> +      else if (SUM_OF_TWO_S12_ALGN (adj_off_value))
> +        {
> +      HOST_WIDE_INT base, off;
> +      riscv_split_sum_of_two_s12 (adj_off_value, &base, &off);
> +      insn = gen_add3_insn (stack_pointer_rtx, hard_frame_pointer_rtx,
> +                    GEN_INT (base));

1. A missing emit_insn () here causes the insn to be overwritten by the
subsequent emit_insn below.

2. In the sum-of-two-s12 logic the first insn is sp = xx + 2048, with the
additional insn expected to be of the form
    sp = sp + off, corresponding to the operands
       stack_pointer_rtx, stack_pointer_rtx, off
    which the emit_insn () below is not.
...

>    insn = emit_insn (gen_add3_insn (stack_pointer_rtx, hard_frame_pointer_rtx,
>                                     adjust));

3. Yet another issue had to do with which insn the dwarf info should be
attached to - and the adj needed adjusting :-)

So in v3 I've split this patch into the main prologue/epilogue change
and a separate one for the alloca codepath, which can be decided upon
depending on reviewer guidance (aesthetics, ugliness, trying to keep
uniformity, etc.).

Thx,
-Vineet


Re: [COMMITTED] Regenerate riscv.opt.urls and i386.opt.urls

2024-05-20 Thread Mark Wielaard
Hi,

On Mon, 2024-05-20 at 12:37 -0400, David Malcolm wrote:
> On Mon, 2024-05-20 at 16:19 +, Jiang, Haochen wrote:
> > Thanks for your help! I haven't noticed this file is newly added to
> > GCC.
> > I suppose that is why the buildbot is reporting something the whole
> > afternoon for me.
> > 
> > So just for confirm, does that mean we will always need to run
> > gcc/regenerate-opt-urls.py after adding or removing options in GCC?
> > My current understanding is yes.
> 
> Yes please (and make sure you've got a clean build of the HTML docs
> with the new options added when you do)
> 
> Though if you forget, the only problem will be some missing URLs at the
> command line, and complaints from the CI.

Also note that the CI will provide a diff that is most likely the
correct patch you need to apply. e.g. for the last issue:
https://builder.sourceware.org/buildbot/#/builders/269/builds/5194/steps/8/logs/stdio

It is still recommended you run regenerate-opt-urls yourself. But if
you just quickly want to shut up the buildbot then you cannot really go
wrong with just applying the diff it generated for you.

Cheers,

Mark


CFG edge visualization to path-printing bootstrap failure

2024-05-20 Thread David Edelsohn
Hi, David

Unfortunately r15-636-g770657d02c986c causes a bootstrap failure on AIX
when building f951 in stage2.  cc1 and cc1plus link successfully. There
doesn't seem to be a similar failure for powerpc64-linux BE or LE.

The failure is

ld: 0711-317 ERROR: Undefined symbol: _ZTV29range_label_for_type_mismatch
ld: 0711-317 ERROR: Undefined symbol: ._ZNK29range_label_for_type_mismatch8get_textEj

which corresponds to

vtable for range_label_for_type_mismatch
range_label_for_type_mismatch::get_text(unsigned int) const

I suspect that something is not being explicitly instantiated, which is
running afoul of the AIX linker.

Somehow your patch is causing the f951 compiler to reference these
additional, undefined symbols.  I suspect that they also are undefined for
Linux targets, but the linker ignores the error and nothing is amiss if the
symbols never are called.

Thanks, David


Re: [Patch, aarch64, middle-end] Move pair_fusion pass from aarch64 to middle-end

2024-05-20 Thread Alex Coplan
On 20/05/2024 18:44, Alex Coplan wrote:
> Hi Ajit,
> 
> On 20/05/2024 21:50, Ajit Agarwal wrote:
> > Hello Alex/Richard:
> > 
> > Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end
> > to support multiple targets.
> > 
> > Common infrastructure of load store pair fusion is divided into
> > target independent and target dependent code.
> > 
> > Target independent code is structured in the following files.
> > gcc/pair-fusion.h
> > gcc/pair-fusion.cc
> > 
> > Target independent code is the Generic code with pure virtual
> > function to interface between target independent and dependent
> > code.
> > 
> > Bootstrapped and regtested on aarch64-linux-gnu.
> > 
> > Thanks & Regards
> > Ajit
> > 
> > aarch64, middle-end: Move pair_fusion pass from aarch64 to middle-end
> > 
> > Move pair fusion pass from aarch64-ldp-fusion.cc to middle-end
> > to support multiple targets.
> > 
> > Common infrastructure of load store pair fusion is divided into
> > target independent and target dependent code.
> > 
> > Target independent code is structured in the following files.
> > gcc/pair-fusion.h
> > gcc/pair-fusion.cc
> > 
> > Target independent code is the Generic code with pure virtual
> > function to interface between target independent and dependent
> > code.
> > 
> > 2024-05-20  Ajit Kumar Agarwal  
> > 
> > gcc/ChangeLog:
> > 
> > * pair-fusion.h: Generic header code for load store fusion
> 
> Insert "pair" before fusion?
> 
> > that can be shared across different architectures.
> > * pair-fusion.cc: Generic source code implementation for
> > load store fusion that can be shared across different architectures.
> 
> Likewise.
> 
> > * Makefile.in: Add new executable pair-fusion.o
> 
> It's not an executable but an object file.
> 
> > * config/aarch64/aarch64-ldp-fusion.cc: Target specific
> > code for load store fusion of aarch64.
> 
> I guess this should say something like: "Delete generic code and move it
> to pair-fusion.cc in the middle-end."
> 
> I've left some comments below on the header file.  The rest of the patch
> looks pretty good to me.  I tried diffing the original contents of
> aarch64-ldp-fusion.cc with pair-fusion.cc, and that looks as expected.
> 



> > diff --git a/gcc/pair-fusion.h b/gcc/pair-fusion.h
> > new file mode 100644
> > index 000..00f6d3e149a
> > --- /dev/null
> > +++ b/gcc/pair-fusion.h
> > @@ -0,0 +1,340 @@
> > +// Pair Mem fusion generic header file.
> > +// Copyright (C) 2024 Free Software Foundation, Inc.
> > +//
> > +// This file is part of GCC.
> > +//
> > +// GCC is free software; you can redistribute it and/or modify it
> > +// under the terms of the GNU General Public License as published by
> > +// the Free Software Foundation; either version 3, or (at your option)
> > +// any later version.
> > +//
> > +// GCC is distributed in the hope that it will be useful, but
> > +// WITHOUT ANY WARRANTY; without even the implied warranty of
> > +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +// General Public License for more details.
> > +//
> > +// You should have received a copy of the GNU General Public License
> > +// along with GCC; see the file COPYING3.  If not see
> > +// .
> > +
> > +#define INCLUDE_ALGORITHM
> > +#define INCLUDE_FUNCTIONAL
> > +#define INCLUDE_LIST
> > +#define INCLUDE_TYPE_TRAITS
> > +#include "config.h"
> > +#include "system.h"
> > +#include "coretypes.h"
> > +#include "backend.h"
> > +#include "rtl.h"
> > +#include "df.h"
> > +#include "rtl-iter.h"
> > +#include "rtl-ssa.h"
> 
> I'm not sure how desirable this is, but you might be able to
> forward-declare RTL-SSA types like this:
> 
> class def_info;
> class insn_info;
> class insn_range_info;
> 
> thus removing the need to include the header here, since the interface
> only refers to these types by pointer or reference.
> 
> Richard: please say if you'd prefer keeping the include.
> 
> > +#include "cfgcleanup.h"
> > +#include "tree-pass.h"
> > +#include "ordered-hash-map.h"
> > +#include "tree-dfa.h"
> > +#include "fold-const.h"
> > +#include "tree-hash-traits.h"
> > +#include "print-tree.h"
> > +#include "insn-attr.h"
> 
> I expect we don't need all of these includes here.  I think we should
> have the minimum necessary set of includes here and most of the includes
> should be in the *.cc files.
> 
> > +
> > +using namespace rtl_ssa;
> > +
> > +// We pack these fields (load_p, fpsimd_p, and size) into an integer
> > +// (LFS) which we use as part of the key into the main hash tables.
> > +//
> > +// The idea is that we group candidates together only if they agree on
> > +// the fields below.  Candidates that disagree on any of these
> > +// properties shouldn't be merged together.
> > +struct lfs_fields
> > +{
> > +  bool load_p;
> > +  bool fpsimd_p;
> > +  unsigned size;
> > +};
> 
> This struct shouldn't be needed in the header file (it should only be
> needed in pair-fusion.cc).  I can see that it'

Re: [PATCH] aarch64: Fold vget_low_* intrinsics to BIT_FIELD_REF [PR102171]

2024-05-20 Thread Andrew Pinski
On Mon, May 20, 2024 at 2:57 AM Richard Sandiford
 wrote:
>
> Pengxuan Zheng  writes:
> > This patch folds vget_low_* intrinsics to BIT_FIELD_REF to open up more
> > optimization opportunities for gimple optimizers.
> >
> > While we are here, we also remove the vget_low_* definitions from
> > arm_neon.h and use the new intrinsics framework.
> >
> > PR target/102171
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64-builtins.cc (AARCH64_SIMD_VGET_LOW_BUILTINS):
> >   New macro to create definitions for all vget_low intrinsics.
> >   (VGET_LOW_BUILTIN): Likewise.
> >   (enum aarch64_builtins): Add vget_low function codes.
> >   (aarch64_general_fold_builtin): Fold vget_low calls.
> >   * config/aarch64/aarch64-simd-builtins.def: Delete vget_low builtins.
> >   * config/aarch64/aarch64-simd.md (aarch64_get_low): Delete.
> >   (aarch64_vget_lo_halfv8bf): Likewise.
> >   * config/aarch64/arm_neon.h (__attribute__): Delete.
> >   (vget_low_f16): Likewise.
> >   (vget_low_f32): Likewise.
> >   (vget_low_f64): Likewise.
> >   (vget_low_p8): Likewise.
> >   (vget_low_p16): Likewise.
> >   (vget_low_p64): Likewise.
> >   (vget_low_s8): Likewise.
> >   (vget_low_s16): Likewise.
> >   (vget_low_s32): Likewise.
> >   (vget_low_s64): Likewise.
> >   (vget_low_u8): Likewise.
> >   (vget_low_u16): Likewise.
> >   (vget_low_u32): Likewise.
> >   (vget_low_u64): Likewise.
> >   (vget_low_bf16): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/aarch64/pr113573.c: Replace __builtin_aarch64_get_lowv8hi
> >   with vget_low_s16.
> >   * gcc.target/aarch64/vget_low_2.c: New test.
> >   * gcc.target/aarch64/vget_low_2_be.c: New test.
>
> Ok, thanks.  I suppose the patch has the side effect of allowing
> vget_low_bf16 to be called without +bf16.  IMO that's the correct
> behaviour though, and is consistent with how we handle reinterprets.

Pushed as r15-697-ga2e4fe5a53cf75cd055f64e745ebd51253e42254 .

Thanks,
Andrew

>
> Richard
>
> > Signed-off-by: Pengxuan Zheng 
> > ---
> >  gcc/config/aarch64/aarch64-builtins.cc|  60 ++
> >  gcc/config/aarch64/aarch64-simd-builtins.def  |   5 +-
> >  gcc/config/aarch64/aarch64-simd.md|  23 +---
> >  gcc/config/aarch64/arm_neon.h | 105 --
> >  gcc/testsuite/gcc.target/aarch64/pr113573.c   |   2 +-
> >  gcc/testsuite/gcc.target/aarch64/vget_low_2.c |  30 +
> >  .../gcc.target/aarch64/vget_low_2_be.c|  31 ++
> >  7 files changed, 124 insertions(+), 132 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/vget_low_2.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/vget_low_2_be.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> > b/gcc/config/aarch64/aarch64-builtins.cc
> > index 75d21de1401..4afe7c86ae3 100644
> > --- a/gcc/config/aarch64/aarch64-builtins.cc
> > +++ b/gcc/config/aarch64/aarch64-builtins.cc
> > @@ -658,6 +658,23 @@ static aarch64_simd_builtin_datum 
> > aarch64_simd_builtin_data[] = {
> >VREINTERPRET_BUILTINS \
> >VREINTERPRETQ_BUILTINS
> >
> > +#define AARCH64_SIMD_VGET_LOW_BUILTINS \
> > +  VGET_LOW_BUILTIN(f16) \
> > +  VGET_LOW_BUILTIN(f32) \
> > +  VGET_LOW_BUILTIN(f64) \
> > +  VGET_LOW_BUILTIN(p8) \
> > +  VGET_LOW_BUILTIN(p16) \
> > +  VGET_LOW_BUILTIN(p64) \
> > +  VGET_LOW_BUILTIN(s8) \
> > +  VGET_LOW_BUILTIN(s16) \
> > +  VGET_LOW_BUILTIN(s32) \
> > +  VGET_LOW_BUILTIN(s64) \
> > +  VGET_LOW_BUILTIN(u8) \
> > +  VGET_LOW_BUILTIN(u16) \
> > +  VGET_LOW_BUILTIN(u32) \
> > +  VGET_LOW_BUILTIN(u64) \
> > +  VGET_LOW_BUILTIN(bf16)
> > +
> >  typedef struct
> >  {
> >const char *name;
> > @@ -697,6 +714,9 @@ typedef struct
> >  #define VREINTERPRET_BUILTIN(A, B, L) \
> >AARCH64_SIMD_BUILTIN_VREINTERPRET##L##_##A##_##B,
> >
> > +#define VGET_LOW_BUILTIN(A) \
> > +  AARCH64_SIMD_BUILTIN_VGET_LOW_##A,
> > +
> >  #undef VAR1
> >  #define VAR1(T, N, MAP, FLAG, A) \
> >AARCH64_SIMD_BUILTIN_##T##_##N##A,
> > @@ -732,6 +752,7 @@ enum aarch64_builtins
> >AARCH64_CRC32_BUILTIN_MAX,
> >/* SIMD intrinsic builtins.  */
> >AARCH64_SIMD_VREINTERPRET_BUILTINS
> > +  AARCH64_SIMD_VGET_LOW_BUILTINS
> >/* ARMv8.3-A Pointer Authentication Builtins.  */
> >AARCH64_PAUTH_BUILTIN_AUTIA1716,
> >AARCH64_PAUTH_BUILTIN_PACIA1716,
> > @@ -823,8 +844,37 @@ static aarch64_fcmla_laneq_builtin_datum 
> > aarch64_fcmla_lane_builtin_data[] = {
> >   && SIMD_INTR_QUAL(A) == SIMD_INTR_QUAL(B) \
> >},
> >
> > +#undef VGET_LOW_BUILTIN
> > +#define VGET_LOW_BUILTIN(A) \
> > +  {"vget_low_" #A, \
> > +   AARCH64_SIMD_BUILTIN_VGET_LOW_##A, \
> > +   2, \
> > +   { SIMD_INTR_MODE(A, d), SIMD_INTR_MODE(A, q) }, \
> > +   { SIMD_INTR_QUAL(A), SIMD_INTR_QUAL(A) }, \
> > +   FLAG_AUTO_FP, \
> > +   false \
> > +  },
> > +
> > +#define AARCH64_SIMD_VGET_LOW_BUILTINS \
> > +  VGET_LOW_BUILTIN(f16) \
> > +  VGET

Re: committed: Stepping down as maintainer for ARC and Epiphany

2024-05-20 Thread Gerald Pfeifer
On Fri, 7 Jul 2023, Joern Wolfgang Rennecke wrote:
> Stepping down as maintainer for ARC and Epiphany
> 
> * MAINTAINERS (CPU Port Maintainers): Remove myself as ARC and
> Epiphany maintainer.
> (Write After Approval): Add myself.

Hmm, is it possible you committed this last year, but never pushed?

(I essentially recreated that patch today and pushed it, commit 
f94598ffaf5affbc9421ff230502357b07c55d9c, before seeing the above.)

Gerald


RE: [PATCH] aarch64: Fold vget_low_* intrinsics to BIT_FIELD_REF [PR102171]

2024-05-20 Thread Pengxuan Zheng (QUIC)
> On Mon, May 20, 2024 at 2:57 AM Richard Sandiford
>  wrote:
> >
> > Pengxuan Zheng  writes:
> > > This patch folds vget_low_* intrinsics to BIT_FIELD_REF to open up
> > > more optimization opportunities for gimple optimizers.
> > >
> > > While we are here, we also remove the vget_low_* definitions from
> > > arm_neon.h and use the new intrinsics framework.
> > >
> > > PR target/102171
> > >
> > > gcc/ChangeLog:
> > >
> > >   * config/aarch64/aarch64-builtins.cc
> (AARCH64_SIMD_VGET_LOW_BUILTINS):
> > >   New macro to create definitions for all vget_low intrinsics.
> > >   (VGET_LOW_BUILTIN): Likewise.
> > >   (enum aarch64_builtins): Add vget_low function codes.
> > >   (aarch64_general_fold_builtin): Fold vget_low calls.
> > >   * config/aarch64/aarch64-simd-builtins.def: Delete vget_low 
> > > builtins.
> > >   * config/aarch64/aarch64-simd.md (aarch64_get_low): Delete.
> > >   (aarch64_vget_lo_halfv8bf): Likewise.
> > >   * config/aarch64/arm_neon.h (__attribute__): Delete.
> > >   (vget_low_f16): Likewise.
> > >   (vget_low_f32): Likewise.
> > >   (vget_low_f64): Likewise.
> > >   (vget_low_p8): Likewise.
> > >   (vget_low_p16): Likewise.
> > >   (vget_low_p64): Likewise.
> > >   (vget_low_s8): Likewise.
> > >   (vget_low_s16): Likewise.
> > >   (vget_low_s32): Likewise.
> > >   (vget_low_s64): Likewise.
> > >   (vget_low_u8): Likewise.
> > >   (vget_low_u16): Likewise.
> > >   (vget_low_u32): Likewise.
> > >   (vget_low_u64): Likewise.
> > >   (vget_low_bf16): Likewise.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.target/aarch64/pr113573.c: Replace
> __builtin_aarch64_get_lowv8hi
> > >   with vget_low_s16.
> > >   * gcc.target/aarch64/vget_low_2.c: New test.
> > >   * gcc.target/aarch64/vget_low_2_be.c: New test.
> >
> > Ok, thanks.  I suppose the patch has the side effect of allowing
> > vget_low_bf16 to be called without +bf16.  IMO that's the correct
> > behaviour though, and is consistent with how we handle reinterprets.

Thanks, Richard! Yes, it does have the side effect you pointed out and is 
consistent with how reinterprets are handled currently.
> 
> Pushed as r15-697-ga2e4fe5a53cf75cd055f64e745ebd51253e42254 .

Thanks, Andrew!
> 
> Thanks,
> Andrew
> 
> >
> > Richard
> >
> > > Signed-off-by: Pengxuan Zheng 
> > > ---
> > >  gcc/config/aarch64/aarch64-builtins.cc|  60 ++
> > >  gcc/config/aarch64/aarch64-simd-builtins.def  |   5 +-
> > >  gcc/config/aarch64/aarch64-simd.md|  23 +---
> > >  gcc/config/aarch64/arm_neon.h | 105 --
> > >  gcc/testsuite/gcc.target/aarch64/pr113573.c   |   2 +-
> > >  gcc/testsuite/gcc.target/aarch64/vget_low_2.c |  30 +
> > >  .../gcc.target/aarch64/vget_low_2_be.c|  31 ++
> > >  7 files changed, 124 insertions(+), 132 deletions(-)  create mode
> > > 100644 gcc/testsuite/gcc.target/aarch64/vget_low_2.c
> > >  create mode 100644 gcc/testsuite/gcc.target/aarch64/vget_low_2_be.c
> > >
> > > diff --git a/gcc/config/aarch64/aarch64-builtins.cc
> > > b/gcc/config/aarch64/aarch64-builtins.cc
> > > index 75d21de1401..4afe7c86ae3 100644
> > > --- a/gcc/config/aarch64/aarch64-builtins.cc
> > > +++ b/gcc/config/aarch64/aarch64-builtins.cc
> > > @@ -658,6 +658,23 @@ static aarch64_simd_builtin_datum
> aarch64_simd_builtin_data[] = {
> > >VREINTERPRET_BUILTINS \
> > >VREINTERPRETQ_BUILTINS
> > >
> > > +#define AARCH64_SIMD_VGET_LOW_BUILTINS \
> > > +  VGET_LOW_BUILTIN(f16) \
> > > +  VGET_LOW_BUILTIN(f32) \
> > > +  VGET_LOW_BUILTIN(f64) \
> > > +  VGET_LOW_BUILTIN(p8) \
> > > +  VGET_LOW_BUILTIN(p16) \
> > > +  VGET_LOW_BUILTIN(p64) \
> > > +  VGET_LOW_BUILTIN(s8) \
> > > +  VGET_LOW_BUILTIN(s16) \
> > > +  VGET_LOW_BUILTIN(s32) \
> > > +  VGET_LOW_BUILTIN(s64) \
> > > +  VGET_LOW_BUILTIN(u8) \
> > > +  VGET_LOW_BUILTIN(u16) \
> > > +  VGET_LOW_BUILTIN(u32) \
> > > +  VGET_LOW_BUILTIN(u64) \
> > > +  VGET_LOW_BUILTIN(bf16)
> > > +
> > >  typedef struct
> > >  {
> > >const char *name;
> > > @@ -697,6 +714,9 @@ typedef struct
> > >  #define VREINTERPRET_BUILTIN(A, B, L) \
> > >AARCH64_SIMD_BUILTIN_VREINTERPRET##L##_##A##_##B,
> > >
> > > +#define VGET_LOW_BUILTIN(A) \
> > > +  AARCH64_SIMD_BUILTIN_VGET_LOW_##A,
> > > +
> > >  #undef VAR1
> > >  #define VAR1(T, N, MAP, FLAG, A) \
> > >AARCH64_SIMD_BUILTIN_##T##_##N##A,
> > > @@ -732,6 +752,7 @@ enum aarch64_builtins
> > >AARCH64_CRC32_BUILTIN_MAX,
> > >/* SIMD intrinsic builtins.  */
> > >AARCH64_SIMD_VREINTERPRET_BUILTINS
> > > +  AARCH64_SIMD_VGET_LOW_BUILTINS
> > >/* ARMv8.3-A Pointer Authentication Builtins.  */
> > >AARCH64_PAUTH_BUILTIN_AUTIA1716,
> > >AARCH64_PAUTH_BUILTIN_PACIA1716,
> > > @@ -823,8 +844,37 @@ static aarch64_fcmla_laneq_builtin_datum
> aarch64_fcmla_lane_builtin_data[] = {
> > >   && SIMD_INTR_QUAL(A) == SIMD_INTR_QUAL(B) \
> > >},
> > >
> > > +#un

Re: [C PATCH] Fix for some variably modified types not being recognized [PR114831]

2024-05-20 Thread Joseph Myers
On Sat, 18 May 2024, Martin Uecker wrote:

> We did not propagate C_TYPE_VARIABLY_MODIFIED to pointers in all
> cases.   I added this directly in two places, but maybe we should
> check all cases of build_pointer_type or integrate this into 
> c_build_pointer_type and use this everywhere (but I do not fully 
> understand the pointer mode logic there).

This is OK.  I think there's at least one further bug in this area: 
composite types of pointer types (which happen to use 
build_pointer_type_for_mode) don't seem to propagate 
C_TYPE_VARIABLY_MODIFIED either.  Maybe there are other cases as well; 
checking all uses of build_pointer_type functions would make sense.

-- 
Joseph S. Myers
josmy...@redhat.com

Re: [C PATCH, v2] Fix for redeclared enumerator initialized with different type [PR115109]

2024-05-20 Thread Joseph Myers
On Sun, 19 May 2024, Martin Uecker wrote:

> C23 specifies that the type of a redeclared enumerator is the one of the
> previous declaration.  Convert initializers with different type accordingly
> and add -Woverflow warning.

It doesn't make sense to use -Woverflow.  Either the value is the same (in 
which case it fits in the desired type), or it's different (and you should 
get the "conflicting redeclaration of enumerator" error or some equivalent 
error, whether or not the value in the redeclaration fits in the previous 
type).

Note that this includes both explicit values and values determined by 
adding 1 implicitly.  E.g.

  enum e { A = 0, B = UINT_MAX };
  enum e { B = UINT_MAX, A };

is not valid, because in the redefinition, A gets the value 1 greater than 
UINT_MAX (which is not representable in unsigned int) - there is *not* an 
addition in type unsigned int, or in type enum e.
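
The arithmetic behind that example can be checked in isolation.  What follows is only a rough C++ sketch, not part of the patch and not C23 code: the helper names implicit_next and fits_unsigned_int are made up for illustration, and it assumes unsigned long long is wider than unsigned int (true on the usual GCC targets).

```cpp
#include <climits>

// Hypothetical helper: the mathematical value an enumerator without an
// initializer would get, i.e. the previous value plus one, computed in a
// wider type so that nothing wraps.
constexpr unsigned long long implicit_next(unsigned int prev) {
  return static_cast<unsigned long long>(prev) + 1ULL;
}

// Hypothetical helper: is the value representable in unsigned int?
constexpr bool fits_unsigned_int(unsigned long long v) {
  return v <= UINT_MAX;
}

// What a wrapping addition in unsigned int would have produced instead.
constexpr unsigned int wrapped_next(unsigned int prev) {
  return prev + 1u;
}
```

Under these assumptions implicit_next(UINT_MAX) is not representable in unsigned int, whereas wrapped_next(UINT_MAX) would be 0 and would happen to match A = 0 from the first definition; that is exactly the reading being ruled out here.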

The constraint violated is the general one "If an identifier has no 
linkage, there shall be no more than one declaration of the identifier (in 
a declarator or type specifier) with the same scope and in the same name 
space, except that: ... enumeration constants and tags may be redeclared 
as specified in 6.7.3.3 and 6.7.3.4, respectively." (where 6.7.3.3 says 
"Enumeration constants can be redefined in the same scope with the same 
value as part of a redeclaration of the same enumerated type." - as the 
redefinition is not with the same value, the "as specified in 6.7.3.3" is 
not satisfied and so the general constraint against redeclarations with no 
linkage applies).

-- 
Joseph S. Myers
josmy...@redhat.com



RE: [PATCH] PHIOPT: Don't transform minmax if middle bb contains a phi [PR115143]

2024-05-20 Thread Andrew Pinski (QUIC)
> -Original Message-
> From: Richard Biener 
> Sent: Sunday, May 19, 2024 11:55 AM
> To: Andrew Pinski (QUIC) 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] PHIOPT: Don't transform minmax if
> middle bb contains a phi [PR115143]
> 
> 
> 
> > On 19.05.2024 at 01:12, Andrew Pinski wrote:
> >
> > The problem here is even if last_and_only_stmt returns a
> statement,
> > the bb might still contain a phi node which defines a ssa
> name which
> > is used in that statement so we need to add a check to make
> sure that
> > the phi nodes are empty for the middle bbs in both the
> > `CMP?MINMAX:MINMAX` case and the `CMP?MINMAX:B`
> cases.
> 
> Is that single arg PHIs or do we have an extra edge into the
> middle BB?  I think that might be unexpected, at least costing
> wise.  Maybe Also to some of the replacement code we have ?

It is only a single-arg PHI, since we already reject multiple edges into the 
middle BBs for these cases.
In the original testcase it was EVRP that produced the single-arg PHI, by 
folding a conditional to false; EVRP does not do simple name propagation in this 
case, and there is no pass between EVRP and phiopt that cleans up single-arg 
PHIs.
I added the GIMPLE-based testcases to avoid depending on what previous passes 
happen to produce.

> 
> > OK for trunk and backport to all open branches since r14-
> 3827-g30e6ee074588ba was backported?
> > Bootstrapped and tested on x86_64_linux-gnu with no
> regressions.
> >
> 
> Ok

Does this include the GCC 13 branch or should I wait until after the GCC 13.3.0 
release?

Thanks,
Andrew Pinski

> 
> Richard
> 
> >PR tree-optimization/115143
> >
> > gcc/ChangeLog:
> >
> >* tree-ssa-phiopt.cc (minmax_replacement): Check for
> empty
> >phi nodes for middle bbs for the case where middle bb is
> not empty.
> >
> > gcc/testsuite/ChangeLog:
> >
> >* gcc.c-torture/compile/pr115143-1.c: New test.
> >* gcc.c-torture/compile/pr115143-2.c: New test.
> >* gcc.c-torture/compile/pr115143-3.c: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> > .../gcc.c-torture/compile/pr115143-1.c| 21
> +
> > .../gcc.c-torture/compile/pr115143-2.c| 30
> +++
> > .../gcc.c-torture/compile/pr115143-3.c| 29
> ++
> > gcc/tree-ssa-phiopt.cc| 12 
> > 4 files changed, 92 insertions(+)
> > create mode 100644 gcc/testsuite/gcc.c-
> torture/compile/pr115143-1.c
> > create mode 100644 gcc/testsuite/gcc.c-
> torture/compile/pr115143-2.c
> > create mode 100644 gcc/testsuite/gcc.c-
> torture/compile/pr115143-3.c
> >
> > diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
> > b/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
> > new file mode 100644
> > index 000..5cb119ea432
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
> > @@ -0,0 +1,21 @@
> > +/* PR tree-optimization/115143 */
> > +/* This used to ICE.
> > +   minmax part of phiopt would transform,
> > +   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
> > +   which was correct except b was defined by a phi in the
> inner
> > +   bb which was not handled. */
> > +short a, d;
> > +char b;
> > +long c;
> > +unsigned long e, f;
> > +void g(unsigned long h) {
> > +  if (c ? e : b)
> > +if (e)
> > +  if (d) {
> > +a = f ? ({
> > +  unsigned long i = d ? f : 0, j = e ? h : 0;
> > +  i < j ? i : j;
> > +}) : 0;
> > +  }
> > +}
> > +
> > diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
> > b/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
> > new file mode 100644
> > index 000..05c3bbe9738
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
> > @@ -0,0 +1,30 @@
> > +/* { dg-options "-fgimple" } */
> > +/* PR tree-optimization/115143 */
> > +/* This used to ICE.
> > +   minmax part of phiopt would transform,
> > +   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
> > +   which was correct except b was defined by a phi in the
> inner
> > +   bb which was not handled. */
> > +unsigned __GIMPLE (ssa,startwith("phiopt")) foo (unsigned
> a, unsigned
> > +b) {
> > +  unsigned j;
> > +  unsigned _23;
> > +  unsigned _12;
> > +
> > +  __BB(2):
> > +  if (a_6(D) != 0u)
> > +goto __BB3;
> > +  else
> > +goto __BB4;
> > +
> > +  __BB(3):
> > +  j_10 = __PHI (__BB2: b_11(D));
> > +  _23 = __MIN (a_6(D), j_10);
> > +  goto __BB4;
> > +
> > +  __BB(4):
> > +  _12 = __PHI (__BB3: _23, __BB2: 0u);  return _12;
> > +
> > +}
> > diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-3.c
> > b/gcc/testsuite/gcc.c-torture/compile/pr115143-3.c
> > new file mode 100644
> > index 000..53c5fb5588e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-3.c
> > @@ -0,0 +1,29 @@
> > +/* { dg-options "-fgimple" } */
> > +/* PR tree-optimization/115143 */
> > +/* This used to ICE.
> > +   minmax part of phiopt would tra

Re: [x86 SSE] Improve handling of ternlog instructions in i386/sse.md (v2)

2024-05-20 Thread Alexander Monakov


Hello!

I looked at ternlog a bit last year, so I'd like to offer some drive-by
comments. If you want to tackle them in a follow-up patch, or leave for
someone else to handle, please let me know.

On Fri, 17 May 2024, Roger Sayle wrote:

> This revised patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?

Just to make sure: no new tests for the new tricks?

> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> +/* Determine the ternlog immediate index that implements 3-operand
> +   ternary logic expression OP.  This uses and modifies the 3 element
> +   array ARGS to record and check the leaves, either 3 REGs, or 2 REGs
> +   and MEM.  Returns an index between 0 and 255 for a valid ternlog,
> +   or -1 if the expression isn't suitable.  */
> +
> +int
> +ix86_ternlog_idx (rtx op, rtx *args)
> +{
> +  int idx0, idx1;
> +
> +  if (!op)
> +return -1;
> +
> +  switch (GET_CODE (op))
> +{
> +case REG:
> +  if (!args[0])
> + {
> +   args[0] = op;
> +   return 0xf0;

From a readability perspective, I wonder if it's nicer to have something like

enum {
  TERNLOG_A = 0xf0,
  TERNLOG_B = 0xcc,
  TERNLOG_C = 0xaa
}

and then use them to build the immediates.
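
To make the idea concrete, here is a small self-contained C++ sketch (not taken from the patch).  The three enumerator values are the ones quoted above; the helper ternlog_imm is a hypothetical name that only masks the result back to 8 bits, since ~ promotes to int.

```cpp
#include <cstdint>

// Truth-table masks for the three ternlog operands (values from the
// suggestion above).
enum : std::uint8_t {
  TERNLOG_A = 0xf0,
  TERNLOG_B = 0xcc,
  TERNLOG_C = 0xaa
};

// Hypothetical helper: keep only the low 8 bits of a boolean expression
// built from the masks, because the operands are promoted to int.
constexpr int ternlog_imm(int expr) {
  return expr & 0xff;
}

// A few immediates written as expressions instead of magic numbers.
constexpr int imm_andn_a_c = ternlog_imm(~TERNLOG_A & TERNLOG_C);               // ~a & c
constexpr int imm_xor3     = ternlog_imm(TERNLOG_A ^ TERNLOG_B ^ TERNLOG_C);    // a ^ b ^ c
constexpr int imm_not_a    = ternlog_imm(~TERNLOG_A);                           // ~a
```

With these definitions ~a & c comes out as 0x0a, matching the case label later in the patch, so a switch could in principle use the expressions directly as case values.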

> + }
> +  if (REGNO (op) == REGNO (args[0]))
> + return 0xf0;
> +  if (!args[1])
> + {
> +   args[1] = op;
> +   return 0xcc;
> + }
[snip]
> +
> +/* Return TRUE if OP (in mode MODE) is the leaf of a ternary logic
> +   expression, such as a register or a memory reference.  */
> + 
> +bool
> +ix86_ternlog_leaf_p (rtx op, machine_mode mode)
> +{
> +  /* We can't use memory_operand here, as it may return a different
> + value before and after reload (for volatile MEMs) which creates
> + problems splitting instructions.  */
> +  return register_operand (op, mode)
> +  || MEM_P (op)
> +  || GET_CODE (op) == CONST_VECTOR
> +  || bcst_mem_operand (op, mode);

Did your editor automatically indent this correctly for you? I think
usually such expressions have outer parentheses.

> +}
[snip]
> +/* Expand a 3-operand ternary logic expression.  Return TARGET. */
> +rtx
> +ix86_expand_ternlog (machine_mode mode, rtx op0, rtx op1, rtx op2, int idx,
> +  rtx target)
> +{
> +  rtx tmp0, tmp1, tmp2;
> +
> +  if (!target)
> +target = gen_reg_rtx (mode);
> +
> +  /* Canonicalize ternlog index for degenerate (duplicated) operands.  */

But this only canonicalizes the case of triplicated operands, and does nothing
if two operands are duplicates of each other, and the third is distinct.
Handling that would complicate the already large patch a lot though.
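
For reference, the reasoning behind the (idx & 0x81) switch can be sketched as follows.  This is a standalone C++ model, not code from the patch; ternlog_eval and canonicalize_triplicated are hypothetical names, and the model assumes operand A selects bit 2, B bit 1 and C bit 0 of the truth-table row, which is what the 0xf0/0xcc/0xaa masks imply.

```cpp
// Evaluate an 8-bit ternlog truth table for one bit of each operand.
// The row index is a*4 + b*2 + c.
constexpr int ternlog_eval(int idx, int a, int b, int c) {
  return (idx >> ((a << 2) | (b << 1) | c)) & 1;
}

// Same mapping as the switch in the quoted patch: when all three operands
// are identical only rows 0 and 7 are reachable, i.e. bits 0x01 and 0x80.
constexpr int canonicalize_triplicated(int idx) {
  return ((idx & 0x80) ? 0xf0 : 0x00) | ((idx & 0x01) ? 0x0f : 0x00);
}

// Exhaustively check that the canonical immediate agrees with the original
// one whenever a == b == c.
constexpr bool triplicated_agrees() {
  for (int idx = 0; idx < 256; ++idx) {
    int canon = canonicalize_triplicated(idx);
    for (int v = 0; v <= 1; ++v)
      if (ternlog_eval(idx, v, v, v) != ternlog_eval(canon, v, v, v))
        return false;
  }
  return true;
}
```

The same style of argument would apply to two duplicated operands, where four of the eight rows stay reachable, but that case is not modelled here.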

> +  if (rtx_equal_p (op0, op1) && rtx_equal_p (op0, op2))
> +switch (idx & 0x81)
> +  {
> +  case 0x00:
> + idx = 0x00;
> + break;
> +  case 0x01:
> + idx = 0x0f;
> + break;
> +  case 0x80:
> + idx = 0xf0;
> + break;
> +  case 0x81:
> + idx = 0xff;
> + break;
> +  }
> +
> +  switch (idx & 0xff)
> +{
> +case 0x00:
> +  if ((!op0 || !side_effects_p (op0))
> +  && (!op1 || !side_effects_p (op1))
> +  && (!op2 || !side_effects_p (op2)))
> +{
> +   emit_move_insn (target, CONST0_RTX (mode));
> +   return target;
> + }
> +  break;
> +
> +case 0x0a: /* ~a&c */

With the enum idea above, this could be 'case ~TERNLOG_A & TERNLOG_C', etc.

Alexander


Re: [PATCH v2] c++/modules: Remember that header units have CMIs

2024-05-20 Thread Jason Merrill

On 5/17/24 02:14, Nathaniel Shead wrote:

On Tue, May 14, 2024 at 06:21:48PM -0400, Jason Merrill wrote:

On 5/12/24 22:58, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?


OK.



I realised as I was looking over this again that I might have spoken too
soon with the header unit example being supported. Doing the following:

   // a.H
   struct { int y; } s;
   decltype(s) f(decltype(s));  // { dg-error "used but never defined" }
   inline auto x = f({ 123 });
   
   // b.C

   struct {} unrelated;
   import "a.H";
   decltype(s) f(decltype(s) x) {
 return { 456 + x.y };
   }

   // c.C
   import "linkage-3_a.H";
   int main() { auto a = x.y; }

Actually does fail to link, because in 'c.C' we call 'f(.anon_0)' but
the definition in 'b.C' is f(.anon_1).

I don't think this is fixable, so I don't think this direction is
workable.


Since namespace-scope anonymous types are TU-local, we don't need to 
support that for proper modules, but it's not clear to me that we don't 
need to support it for header units.


OTOH, https://eel.is/c++draft/module#import-5.3 allows c.C to import a 
different header unit than b.C, in which case the type is different and 
x violates the odr.



That said, I think that it might still be worth making header modules
satisfy 'module_has_cmi_p', since that is true to the name, and will be
useful in other places we currently use 'module_p ()': in which case we
could instead make all the callers in 'no_linkage_check' do
'module_maybe_has_cmi_p () && !header_module_p ()'; something like the
following, perhaps?


If we need that condition, it should be its own predicate rather than 
expecting callers to do that combined check.


But it's not clear to me how this is different from a type in the GMF of 
a named module, which is exactly the maybe_has_cmi case; there we could 
again see a different version of the type if another TU includes the header.


Jason



[PATCH] match: Disable `(type)zero_one_valuep*CST` for 1bit signed types [PR115154]

2024-05-20 Thread Andrew Pinski
The problem here is that the pattern added in r13-1162-g9991d84d2a8435
assumes it is well defined to multiply a zero_one_valued_p operand by the
truncated, converted integer constant. That is well defined for all types
except signed 1bit types, where `a * -1` is produced, which is undefined.
So disable this pattern for 1bit signed types.
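
As a back-of-the-envelope check of why the 1bit signed case is special, here is a small C++ model (a sketch only, not GCC code; all function names are hypothetical).  It treats an N-bit signed type as the two's-complement range and mirrors the guard added to the pattern.

```cpp
// Hypothetical model of an N-bit two's-complement signed type
// (valid for 1 <= bits <= 62 in this sketch).
constexpr long long signed_min(unsigned bits) { return -(1LL << (bits - 1)); }
constexpr long long signed_max(unsigned bits) { return (1LL << (bits - 1)) - 1; }

constexpr bool fits_signed(long long v, unsigned bits) {
  return v >= signed_min(bits) && v <= signed_max(bits);
}

// Mirrors the new condition in the pattern:
//   TYPE_UNSIGNED (type) || TYPE_PRECISION (type) > 1
constexpr bool narrowed_mult_allowed(bool is_unsigned, unsigned precision) {
  return is_unsigned || precision > 1;
}
```

In this model a signed 1bit type holds only -1 and 0, so a "true" operand converts to -1, the truncated odd constant is also -1, and their product 1 is out of range; for any wider signed type the narrowed operand stays 0 or 1 and the product is either 0 or the constant itself.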

Note that the pattern added in r14-3432-gddd64a6ec3b38e is able to work around
the undefinedness except when `-fsanitize=undefined` is turned on, which is why
I added a testcase for that.

OK for trunk and gcc-14 and gcc-13 branches? Bootstrapped and tested on 
x86_64-linux-gnu with no regressions.

PR tree-optimization/115154

gcc/ChangeLog:

* match.pd (convert (mult zero_one_valued_p@1 INTEGER_CST@2)): Disable
for 1bit signed types.

gcc/testsuite/ChangeLog:

* c-c++-common/ubsan/signed1bitfield-1.c: New test.
* gcc.c-torture/execute/signed1bitfield-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/match.pd  |  6 +++--
 .../c-c++-common/ubsan/signed1bitfield-1.c| 25 +++
 .../gcc.c-torture/execute/signed1bitfield-1.c | 23 +
 3 files changed, 52 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 0f9c34fa897..35e3d82b131 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2395,12 +2395,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (mult (convert @0) @1)))
 
 /* Narrow integer multiplication by a zero_one_valued_p operand.
-   Multiplication by [0,1] is guaranteed not to overflow.  */
+   Multiplication by [0,1] is guaranteed not to overflow except for
+   1bit signed types.  */
 (simplify
  (convert (mult@0 zero_one_valued_p@1 INTEGER_CST@2))
  (if (INTEGRAL_TYPE_P (type)
   && INTEGRAL_TYPE_P (TREE_TYPE (@0))
-  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@0)))
+  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@0))
+  && (TYPE_UNSIGNED (type) || TYPE_PRECISION (type) > 1))
    (mult (convert @1) (convert @2))))
 
 /* (X << C) != 0 can be simplified to X, when C is zero_one_valued_p.
diff --git a/gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c 
b/gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c
new file mode 100644
index 000..2ba8cf4dab0
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fsanitize=undefined" } */
+
+/* PR tree-optimization/115154 */
+/* This was being miscompiled with -fsanitize=undefined due to
+   `(signed:1)(t*5)` being transformed into `-((signed:1)t)` which
+   is undefined. */
+
+struct s {
+  signed b : 1;
+} f;
+int i = 55;
+__attribute__((noinline))
+void check(int a)
+{
+if (!a)
+__builtin_abort();
+}
+int main() {
+int t = i != 5;
+t = t*5;
+f.b = t;
+int tt = f.b;
+check(f.b);
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c 
b/gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c
new file mode 100644
index 000..ab888ca3a04
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c
@@ -0,0 +1,23 @@
+/* PR tree-optimization/115154 */
+/* This was being miscompiled to `(signed:1)(t*5)`
+   being transformed into `-((signed:1)t)` which is undefined.
+   Note there is a pattern which removes the negative in some cases
+   which works around the issue.  */
+
+struct {
+  signed b : 1;
+} f;
+int i = 55;
+__attribute__((noinline))
+void check(int a)
+{
+if (!a)
+__builtin_abort();
+}
+int main() {
+int t = i != 5;
+t = t*5;
+f.b = t;
+int tt = f.b;
+check(f.b);
+}
-- 
2.43.0



Re: [PATCH v2] c++: Strengthen checks on 'main'

2024-05-20 Thread Jason Merrill

On 5/17/24 02:47, Nathaniel Shead wrote:

On Tue, May 14, 2024 at 06:25:29PM -0400, Jason Merrill wrote:

On 5/11/24 08:32, Nathaniel Shead wrote:

I wasn't entirely sure what to do with the 'abi/main.C' testcase here;
is this OK, or should I e.g. lower the linkage error to a pedwarn for
the purposes of this test?


I think it should be a pedwarn anyway, since it's harmless.  The others can
still be errors.


Fair enough; how about this?  OK for trunk if bootstrap + regtest
succeeds?


OK.


-- >8 --

This patch adds some missing requirements for legal main declarations,
as according to [basic.start.main] p2.

gcc/cp/ChangeLog:

* decl.cc (grokfndecl): Check for main functions with language
linkage or module attachment.
(grokvardecl): Check for extern 'C' entities named main.

gcc/testsuite/ChangeLog:

* g++.dg/abi/main.C: Check pedwarn for main with linkage-spec.
* g++.dg/modules/contracts-1_b.C: Don't declare main in named
module.
* g++.dg/modules/contracts-3_b.C: Likewise.
* g++.dg/modules/contracts-4_d.C: Likewise.
* g++.dg/modules/horcrux-1_a.C: Export declarations, so that...
* g++.dg/modules/horcrux-1_b.C: Don't declare main in named
module.
* g++.dg/modules/main-1.C: New test.
* g++.dg/parse/linkage5.C: New test.
* g++.dg/parse/linkage6.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/decl.cc   | 19 +++
  gcc/testsuite/g++.dg/abi/main.C  |  3 ++-
  gcc/testsuite/g++.dg/modules/contracts-1_b.C |  4 
  gcc/testsuite/g++.dg/modules/contracts-3_b.C |  4 
  gcc/testsuite/g++.dg/modules/contracts-4_d.C |  2 --
  gcc/testsuite/g++.dg/modules/horcrux-1_a.C   |  3 +++
  gcc/testsuite/g++.dg/modules/horcrux-1_b.C   |  2 +-
  gcc/testsuite/g++.dg/modules/main-1.C|  5 +
  gcc/testsuite/g++.dg/parse/linkage5.C| 14 ++
  gcc/testsuite/g++.dg/parse/linkage6.C| 13 +
  10 files changed, 53 insertions(+), 16 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/main-1.C
  create mode 100644 gcc/testsuite/g++.dg/parse/linkage5.C
  create mode 100644 gcc/testsuite/g++.dg/parse/linkage6.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 6fcab615d55..a992d54dc8f 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -10788,6 +10788,11 @@ grokfndecl (tree ctype,
  "cannot declare %<::main%> to be %qs", "consteval");
if (!publicp)
error_at (location, "cannot declare %<::main%> to be static");
+  if (current_lang_depth () != 0)
+   pedwarn (location, OPT_Wpedantic, "cannot declare %<::main%> with a"
+" linkage specification");
+  if (module_attach_p ())
+   error_at (location, "cannot attach %<::main%> to a named module");
inlinep = 0;
publicp = 1;
  }
@@ -11287,10 +11292,16 @@ grokvardecl (tree type,
  DECL_INTERFACE_KNOWN (decl) = 1;
  
if (DECL_NAME (decl)

-  && MAIN_NAME_P (DECL_NAME (decl))
-  && scope == global_namespace)
-error_at (DECL_SOURCE_LOCATION (decl),
- "cannot declare %<::main%> to be a global variable");
+  && MAIN_NAME_P (DECL_NAME (decl)))
+{
+  if (scope == global_namespace)
+   error_at (DECL_SOURCE_LOCATION (decl),
+ "cannot declare %<::main%> to be a global variable");
+  else if (DECL_EXTERN_C_P (decl))
+   error_at (DECL_SOURCE_LOCATION (decl),
+ "an entity named %<main%> cannot be declared with "
+ "C language linkage");
+}
  
/* Check that the variable can be safely declared as a concept.

   Note that this also forbids explicit specializations.  */
diff --git a/gcc/testsuite/g++.dg/abi/main.C b/gcc/testsuite/g++.dg/abi/main.C
index 4c5f1ea213c..2797a16df5b 100644
--- a/gcc/testsuite/g++.dg/abi/main.C
+++ b/gcc/testsuite/g++.dg/abi/main.C
@@ -1,10 +1,11 @@
  /* { dg-do compile } */
+/* { dg-additional-options "-Wno-error=pedantic" }
  
  /* Check if entry points get implicit C linkage. If they don't, compiler will

   * error on incompatible declarations */
  
  int main();

-extern "C" int main();
+extern "C" int main();  // { dg-warning "linkage specification" }
  
  #ifdef __MINGW32__
  
diff --git a/gcc/testsuite/g++.dg/modules/contracts-1_b.C b/gcc/testsuite/g++.dg/modules/contracts-1_b.C

index 30c15f6928b..aa36c8d6b1b 100644
--- a/gcc/testsuite/g++.dg/modules/contracts-1_b.C
+++ b/gcc/testsuite/g++.dg/modules/contracts-1_b.C
@@ -1,15 +1,11 @@
  // { dg-module-do run }
  // { dg-additional-options "-fmodules-ts -fcontracts 
-fcontract-continuation-mode=on" }
-module;
  #include 
-export module bar;
-// { dg-module-cmi bar }
  import foo;
  
  template

  bool bar_fn_pre(T n) { printf("bar fn pre(%d)\n", n); return true; }
  
-export

  template
  T bar_fn(T n)
[[ pre: bar_fn_pre(n) && n > 0 ]]
diff --git a/gcc/testsuite/g++.dg/modules/contracts-3_b.C 
b

3rd Ping [PATCH v9 0/5] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2024-05-20 Thread Qing Zhao
Hi, 

This is the 3rd ping for the middle-end approval for this patch set.
I’m hoping to commit this feature into GCC15 as early as possible to enable
the Linux kernel to use it and to get more testing.

For your info, this new feature has been in Clang since last October
(https://github.com/llvm/llvm-project/pull/68750,
https://github.com/llvm/llvm-project/pull/70606)

The Linux kernel has used the counted_by attribute in its source code with
Clang since then:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/compiler_attributes.h?h=v6.9#n97
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?h=v6.9&qt=grep&q=__counted_by
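
For readers who have not seen the attribute in use, here is a minimal sketch
of annotating a flexible array member. The macro name COUNTED_BY, the struct
int_vec and the helper int_vec_new are made up for this illustration and are
not part of the patch set. The __has_attribute guard follows the same idea as
the kernel header linked above, so the code should still build on compilers
that lack the attribute.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fall back to "no attribute" on compilers without __has_attribute. */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* COUNTED_BY is a hypothetical wrapper name used only in this sketch. */
#if __has_attribute(__counted_by__)
# define COUNTED_BY(member) __attribute__((__counted_by__(member)))
#else
# define COUNTED_BY(member)
#endif

struct int_vec {
    size_t count;                 /* number of elements in data[] */
    int data[] COUNTED_BY(count); /* C99 flexible array member */
};

/* Allocate a vector of n zeroed elements; returns NULL on failure. */
struct int_vec *int_vec_new(size_t n)
{
    struct int_vec *v = malloc(sizeof *v + n * sizeof v->data[0]);
    if (v == NULL)
        return NULL;
    v->count = n;                 /* set the count before touching data[] */
    for (size_t i = 0; i < n; i++)
        v->data[i] = 0;
    return v;
}
```

With the annotation in place, the compiler has a way to relate data[] to
count, which is what bounds checking of the array relies on.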

The patch set includes C FE changes, documentation changes, and middle end 
changes.

**All C FE changes + documentation changes have been approved by Joseph already 
(please see the approval below):
1/5 (All are C FE and documentation changes, approved):
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649857.html
2/5: (C FE approved)
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649189.html
4/5: (All are C FE changes, approved)
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649190.html
5/5: (C FE approved)
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649191.html

**All middle-end changes have been reviewed by Siddhesh; I have addressed all
the comments he raised and no issues remain:
2/5: (middle-end reviewed without issue)
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647566.html
3/5: (All are Middle-end changes reviewed without issue)
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649256.html
5/5: (middle-end reviewed without issue)
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649217.html

Okay for GCC15?

Thanks a lot !

Qing
> On May 7, 2024, at 10:14, Qing Zhao  wrote:
> 
> 
> 
>> On May 7, 2024, at 10:02, Qing Zhao  wrote:
>> 
>> 2nd Ping for the middle-end change approval. -:)
>> 
>> **Approval status:
>> 
>> All C FE changes have been approved.
>> 
>> **Review status:
>> 
>> All Middle-end changes have been reviewed by Sid, no remaining issue. 
>> 
>> Okay for GCC15? 
> 
> For convenience, the following are the links to the 9th version:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649389.html
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649390.html
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649391.html
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649392.html
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649394.html
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649393.html
> 
> One more note, CLANG has supported this attribute since last year.
> 
> Qing
>> 
>> 
>> 
>> thanks.
>> 
>> Qing
>> 
>>> Begin forwarded message:
>>> 
>>> From: Qing Zhao 
>>> Subject: Re: [PATCH v9 0/5] New attribute "counted_by" to annotate bounds 
>>> for C99 FAM(PR108896)
>>> Date: April 23, 2024 at 15:56:26 EDT
>>> To: Richard Biener , Siddhesh Poyarekar 
>>> 
>>> Cc: Joseph Myers , "gcc-patches@gcc.gnu.org" 
>>> , "isanb...@gmail.com" , Kees 
>>> Cook , "uec...@tugraz.at" 
>>> 
>>> Ping for the middle-end change approval.
>>> 
>>> And an update on the status of the patch set:
>>> 
>>> **Approval status:
>>> 
>>> All C FE changes have been approved.
>>> 
>>> **Review status:
>>> 
>>> All Middle-end changes have been reviewed by Sid, no remaining issue. 
>>> 
>>> Okay for GCC15? 
>>> 
>>> thanks.
>>> 
>>> Qing
>>> 
 On Apr 12, 2024, at 09:54, Qing Zhao  wrote:
 
 Hi,
 
 This is the 9th version of the patch.
 
 Compared with the 8th version, the differences are:
 
 updates per Joseph's comments:
 
 1. in C FE, add checking for counted_by attribute for the new multiple 
 definitions of the same tag for C23 in the routine 
 "tagged_types_tu_compatible_p".
  Add a new testing case flex-array-counted-by-8.c for this. 
  This is for Patch 1;
 
 2. two minor typo fixes in c-typeck.cc. 
  This is for Patch 2;
 
 Approval status:
 
  Patch 2's C FE change has been approved with minor typo fixes (the above 
 2);
  Patch 4 has been approved; 
  Patch 5's C FE change has been approved;
 
 Review status:
 
  The middle-end changes in Patch 2, Patch 3 and Patch 5 have been reviewed
 by Sid, with no issues.
 
 More review needed:
 
  Patch 1's new change to C FE (the above 1);
  Patch 2, 3 and 5's middle-end change need to be approved   
 
 The 8th version is here:
 https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648559.html
 https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648560.html
 https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648561.html
 https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648562.html
 https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648563.html
 
 It is based on the following original proposal:
 
 https://gcc.gnu.org/pipermail/gcc-

[PATCH] modula2: Fully respect DESTDIR in texi rule

2024-05-20 Thread Sam James
This was originally reported in Gentoo at https://bugs.gentoo.org/930014.

2024-05-20  Sam James 
gcc/m2/
* Make-lang.in (m2.install-info): Pass --destdir for dir index.
---
 gcc/m2/Make-lang.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/m2/Make-lang.in b/gcc/m2/Make-lang.in
index 0abd8ce14555..da4226123dff 100644
--- a/gcc/m2/Make-lang.in
+++ b/gcc/m2/Make-lang.in
@@ -425,7 +425,7 @@ m2.install-info: installdirs
else true; fi
-if [ -f gm2$(exeext) ] && [ -f $(DESTDIR)$(infodir)/m2.info ]; then \
  if $(SHELL) -c 'install-info --version' >/dev/null 2>&1; then \
-   install-info --dir-file=$(infodir)/dir 
$(DESTDIR)$(infodir)/m2.info; \
+   install-info --dir-file=$(DESTDIR)$(infodir)/dir 
$(DESTDIR)$(infodir)/m2.info; \
  else true; fi; \
else true; fi
 
-- 
2.45.1



Re: [PATCH] modula2: Fully respect DESTDIR in texi rule

2024-05-20 Thread Gaius Mulley
Sam James  writes:

> This was originally reported in Gentoo at https://bugs.gentoo.org/930014.
>
> 2024-05-20  Sam James 
> gcc/m2/
> * Make-lang.in (m2.install-info): Pass --destdir for dir index.
> ---
>  gcc/m2/Make-lang.in | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/m2/Make-lang.in b/gcc/m2/Make-lang.in
> index 0abd8ce14555..da4226123dff 100644
> --- a/gcc/m2/Make-lang.in
> +++ b/gcc/m2/Make-lang.in
> @@ -425,7 +425,7 @@ m2.install-info: installdirs
>   else true; fi
>   -if [ -f gm2$(exeext) ] && [ -f $(DESTDIR)$(infodir)/m2.info ]; then \
> if $(SHELL) -c 'install-info --version' >/dev/null 2>&1; then \
> - install-info --dir-file=$(infodir)/dir 
> $(DESTDIR)$(infodir)/m2.info; \
> + install-info --dir-file=$(DESTDIR)$(infodir)/dir 
> $(DESTDIR)$(infodir)/m2.info; \
> else true; fi; \
>   else true; fi

thanks for the patch and url above - looks good to me,

regards,
Gaius


[PATCH v3 1/2] RISC-V: avoid LUI based const mat in prologue/epilogue expansion [PR/105733]

2024-05-20 Thread Vineet Gupta
Changes since v2:
  - Broke out the hunk corresponding to alloca in epilogue expansion into
a separate patch.
---

If the constant used for a stack offset can be expressed as the sum of two
S12 values, the constant need not be materialized in a register; instead the
two S12 values can be added directly by the instructions involved with the
frame pointer. This avoids burning a register and, more importantly, can
often get the sequence down to 2 insns instead of 3.
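
To make the idea concrete, here is a small standalone sketch of such a
split. It is not the code from the patch: the names S12_MIN,
S12_MAX_ALIGNED, fits_s12_aligned and split_sum_of_two_s12 are invented for
the illustration, and the upper bound of 2032 is an assumption taken from
the "more conservative range -2048 to 2032" mentioned in the patch comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed conservative S12 range for stack-related offsets. */
#define S12_MIN          (-2048)
#define S12_MAX_ALIGNED  2032

static bool fits_s12_aligned(int64_t v)
{
    return v >= S12_MIN && v <= S12_MAX_ALIGNED;
}

/* Try to express val as base + off with both parts inside the range above.
   Returns false when val already fits in one immediate, or when it is too
   large for two of them (a register would then be needed).  */
bool split_sum_of_two_s12(int64_t val, int64_t *base, int64_t *off)
{
    if (fits_s12_aligned(val))
        return false;                 /* a single addi is enough */

    /* Take the largest step in the direction of val first. */
    int64_t b = val < 0 ? S12_MIN : S12_MAX_ALIGNED;
    int64_t o = val - b;

    if (!fits_s12_aligned(o))
        return false;                 /* not a sum of two S12 values */

    *base = b;
    *off = o;
    return true;
}
```

Under these assumptions a frame adjustment of -2064 splits into -2048 and
-16, and +2064 splits into 2032 and 32, which are the immediates that appear
in the "This patch" column of the comparison in this message.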

The previous patches to generally avoid LUI based const materialization
didn't fix this PR, so it needs this directed fix in function
prologue/epilogue expansion.

This fix doesn't move the needle for SPEC at all, but it is still a
win considering gcc generates one insn fewer than llvm for the test ;-)

   gcc-13.1 release   |     gcc 230823      |                   |
                      |    g6619b3d4c15c    |    This patch     |    clang/llvm
----------------------------------------------------------------------------------
li      t0,-4096      | li    t0,-4096      | addi  sp,sp,-2048 | addi sp,sp,-2048
addi    t0,t0,2016    | addi  t0,t0,2032    | add   sp,sp,-16   | addi sp,sp,-32
li      a4,4096       | add   sp,sp,t0      | add   a5,sp,a0    | add  a1,sp,16
add     sp,sp,t0      | addi  a5,sp,-2032   | sb    zero,0(a5)  | add  a0,a0,a1
li      a5,-4096      | add   a0,a5,a0      | addi  sp,sp,2032  | sb   zero,0(a0)
addi    a4,a4,-2032   | li    t0, 4096      | addi  sp,sp,32    | addi sp,sp,2032
add     a4,a4,a5      | sb    zero,2032(a0) | ret               | addi sp,sp,48
addi    a5,sp,16      | addi  t0,t0,-2032   |                   | ret
add     a5,a4,a5      | add   sp,sp,t0      |
add     a0,a5,a0      | ret                 |
li      t0,4096       |
sd      a5,8(sp)      |
sb      zero,2032(a0) |
addi    t0,t0,-2016   |
add     sp,sp,t0      |
ret                   |

gcc/ChangeLog:
PR target/105733
* config/riscv/riscv.h: New macros for with aligned offsets.
* config/riscv/riscv.cc (riscv_split_sum_of_two_s12): New
function to split a sum of two s12 values into constituents.
(riscv_expand_prologue): Handle offset being sum of two S12.
(riscv_expand_epilogue): Ditto.
* config/riscv/riscv-protos.h (riscv_split_sum_of_two_s12): New.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr105733.c: New Test.
* gcc.target/riscv/rvv/autovec/vls/spill-1.c: Adjust to not
expect LUI 4096.
* gcc.target/riscv/rvv/autovec/vls/spill-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/spill-7.c: Ditto.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv-protos.h   |  2 +
 gcc/config/riscv/riscv.cc | 54 +--
 gcc/config/riscv/riscv.h  |  7 +++
 gcc/testsuite/gcc.target/riscv/pr105733.c | 15 ++
 .../riscv/rvv/autovec/vls/spill-1.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-2.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-3.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-4.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-5.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-6.c   |  4 +-
 .../riscv/rvv/autovec/vls/spill-7.c   |  4 +-
 11 files changed, 89 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr105733.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index c64aae18deb9..0704968561bb 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -167,6 +167,8 @@ extern void riscv_subword_address (rtx, rtx *, rtx *, rtx 
*, rtx *);
 extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *);
 extern enum memmodel riscv_union_memmodels (enum memmodel, enum memmodel);
 extern bool riscv_reg_frame_related (rtx);
+extern void riscv_split_sum_of_two_s12 (HOST_WIDE_INT, HOST_WIDE_INT *,
+   HOST_WIDE_INT *);
 
 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d0c22058b8c3..2ecbcf1d0af8 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4075,6 +4075,32 @@ riscv_split_doubleword_move (rtx dest, rtx src)
riscv_emit_move (riscv_subword (dest, true), riscv_subword (src, true));
  }
 }
+
+/* Constant VAL is known to be sum of two S12 constants.  Break it into
+   comprising BASE and OFF.
+   Numerically S12 is -2048 to 2047, however it uses the more conservative
+   range -2048 to 2032 as offsets pertain to stack related registers.  */
+
+void
+riscv_split_sum_of_two_s12 (HOST_WIDE_INT val, HOST_WIDE_INT *base,
+   HOST_WIDE_INT *off)
+{
+  if (SUM_OF_TWO_S12_N (val))
+{
+

[PATCH v3 2/2] RISC-V: avoid LUI based const mat in alloca epilogue expansion

2024-05-20 Thread Vineet Gupta
This is testsuite clean; however, there's a DWARF quirk which I want to
run by the experts. The test that was tripping CI has the following
fragment:

      Before patch           |      After Patch
------------------------------------------------------------------
   li   t0,-4096             |  addi sp,s0,-2048
   addi t0,t0,560            |  .cfi_def_cfa 2, 2048         <- #1
   add  sp,s0,t0             |  addi sp,sp,-1488
   .cfi_def_cfa 2, 3536      |  .cfi_def_cfa_offset 3536     <- #2
   addi sp,sp,1504           |  addi sp,sp,1504
   .cfi_def_cfa_offset 2032  |  .cfi_def_cfa_offset 2032     <- #3

DWARF insns #1 and #3 seem OK; however, #2 seems dubious to me.
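
As a sanity check on the numbers only, the following sketch replays the
three stack adjustments from the "After Patch" fragment. It assumes the CFA
equals s0 at this point, which is what the "Before patch" fragment suggests
(sp = s0 - 3536 together with .cfi_def_cfa 2, 3536). The function names are
invented for the illustration. It says nothing about whether GCC ought to
emit the intermediate note, only whether the offsets add up.

```c
/* sp is tracked as a displacement from s0; the CFA is assumed to equal s0,
   so the CFA offset relative to sp is just the negated displacement.  */
static long cfa_offset_from_sp(long sp_minus_s0)
{
    return -sp_minus_s0;
}

/* Replay the "After Patch" adjustments and record the CFA offset that
   would be expected after each one (#1, #2, #3 in the fragment).  */
void after_patch_cfa_offsets(long out[3])
{
    long sp = -2048;                  /* addi sp,s0,-2048 */
    out[0] = cfa_offset_from_sp(sp);  /* expected at #1 */

    sp += -1488;                      /* addi sp,sp,-1488 */
    out[1] = cfa_offset_from_sp(sp);  /* expected at #2 */

    sp += 1504;                       /* addi sp,sp,1504 */
    out[2] = cfa_offset_from_sp(sp);  /* expected at #3 */
}
```

Under that assumption the three values come out as 2048, 3536 and 2032, so
all three notes are numerically consistent with the instructions before
them. Whether the intermediate note at #2 is desirable is a separate
question for the DWARF experts.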

---

This continues the previous patch in function epilogue expansion.

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_expand_epilogue): Handle offset
being sum of two S12.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.cc | 33 ++---
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 2ecbcf1d0af8..85df5b7ab498 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -8111,7 +8111,10 @@ riscv_expand_epilogue (int style)
   need_barrier_p = false;
 
   poly_int64 adjust_offset = -frame->hard_frame_pointer_offset;
+  rtx dwarf_adj = gen_int_mode (adjust_offset, Pmode);
   rtx adjust = NULL_RTX;
+  bool sum_of_two_s12 = false;
+  HOST_WIDE_INT one, two;
 
   if (!adjust_offset.is_constant ())
{
@@ -8123,14 +8126,23 @@ riscv_expand_epilogue (int style)
}
   else
{
- if (!SMALL_OPERAND (adjust_offset.to_constant ()))
+ HOST_WIDE_INT adj_off_value = adjust_offset.to_constant ();
+ if (SMALL_OPERAND (adj_off_value))
+   {
+ adjust = GEN_INT (adj_off_value);
+   }
+ else if (SUM_OF_TWO_S12_ALGN (adj_off_value))
+   {
+ riscv_split_sum_of_two_s12 (adj_off_value, &one, &two);
+ dwarf_adj = adjust = GEN_INT (one);
+ sum_of_two_s12 = true;
+   }
+ else
{
  riscv_emit_move (RISCV_PROLOGUE_TEMP (Pmode),
-  GEN_INT (adjust_offset.to_constant ()));
+  GEN_INT (adj_off_value));
  adjust = RISCV_PROLOGUE_TEMP (Pmode);
}
- else
-   adjust = GEN_INT (adjust_offset.to_constant ());
}
 
   insn = emit_insn (
@@ -8138,14 +8150,21 @@ riscv_expand_epilogue (int style)
  adjust));
 
   rtx dwarf = NULL_RTX;
-  rtx cfa_adjust_value = gen_rtx_PLUS (
-  Pmode, hard_frame_pointer_rtx,
-  gen_int_mode (-frame->hard_frame_pointer_offset, 
Pmode));
+  rtx cfa_adjust_value = gen_rtx_PLUS (Pmode, hard_frame_pointer_rtx,
+  dwarf_adj);
   rtx cfa_adjust_rtx = gen_rtx_SET (stack_pointer_rtx, cfa_adjust_value);
   dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, cfa_adjust_rtx, dwarf);
+
   RTX_FRAME_RELATED_P (insn) = 1;
 
   REG_NOTES (insn) = dwarf;
+
+  if (sum_of_two_s12)
+   {
+ insn = emit_insn (gen_add3_insn (stack_pointer_rtx, stack_pointer_rtx,
+   GEN_INT (two)));
+ RTX_FRAME_RELATED_P (insn) = 1;
+   }
 }
 
   if (use_restore_libcall || use_multi_pop)
-- 
2.34.1



[PATCH v3] Match: Extract ternary_integer_types_match_p helper func [NFC]

2024-05-20 Thread pan2 . li
From: Pan Li 

There are several match patterns for SAT related cases, and each of them
duplicates the code that checks that dest, op_0 and op_1 have the same tree
type, aka a ternary tree type match.  Thus, extract one helper function to
do this and avoid the match code duplication.
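
For context, here is a plain C sketch of the kind of source expression these
SAT_ADD patterns are meant to recognize. It assumes that a "left part"
(the wrapped sum) and a "right part" (the negated overflow test) are
combined with a bitwise OR; that combining pattern is not part of the hunks
in this mail, and the function name sat_add_u8 is made up for the example.
All three values involved have the same unsigned integer type, which is the
property the new helper checks.

```c
#include <stdint.h>

/* Branchless unsigned saturating add for 8-bit operands. */
uint8_t sat_add_u8(uint8_t x, uint8_t y)
{
    /* Left part: the sum, wrapped modulo 256. */
    uint8_t sum = (uint8_t)(x + y);

    /* Right part: all ones if the sum wrapped around, zero otherwise. */
    uint8_t overflow_mask = (uint8_t)-(sum < x);

    /* OR-ing the mask in forces the result to 255 on overflow. */
    return (uint8_t)(sum | overflow_mask);
}
```

For example, sat_add_u8 (200, 100) saturates to 255, while
sat_add_u8 (1, 2) is simply 3.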

The following test suites passed with this patch:
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 regression test.

gcc/ChangeLog:

* match.pd: Leverage helper func for SAT_ADD match.
* tree.cc (ternary_integer_types_match_p): New func impl to
check if ternary tree types are all integer.
* tree.h (ternary_integer_types_match_p): New func decl.

Signed-off-by: Pan Li 
---
 gcc/match.pd | 28 +++-
 gcc/tree.cc  | 16 
 gcc/tree.h   |  5 +
 3 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 0f9c34fa897..cff67c84498 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -39,7 +39,8 @@ along with GCC; see the file COPYING3.  If not see
HONOR_NANS
uniform_vector_p
expand_vec_cmp_expr_p
-   bitmask_inv_cst_vector_p)
+   bitmask_inv_cst_vector_p
+   ternary_integer_types_match_p)
 
 /* Operator lists.  */
 (define_operator_list tcc_comparison
@@ -3046,38 +3047,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* Unsigned Saturation Add */
 (match (usadd_left_part_1 @0 @1)
  (plus:c @0 @1)
- (if (INTEGRAL_TYPE_P (type)
-  && TYPE_UNSIGNED (TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@1)
+ (if (ternary_integer_types_match_p (type, @0, @1) && TYPE_UNSIGNED (type
 
 (match (usadd_left_part_2 @0 @1)
  (realpart (IFN_ADD_OVERFLOW:c @0 @1))
- (if (INTEGRAL_TYPE_P (type)
-  && TYPE_UNSIGNED (TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@1)
+ (if (ternary_integer_types_match_p (type, @0, @1) && TYPE_UNSIGNED (type
 
 (match (usadd_right_part_1 @0 @1)
  (negate (convert (lt (plus:c @0 @1) @0)))
- (if (INTEGRAL_TYPE_P (type)
-  && TYPE_UNSIGNED (TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@1)
+ (if (ternary_integer_types_match_p (type, @0, @1) && TYPE_UNSIGNED (type
 
 (match (usadd_right_part_1 @0 @1)
  (negate (convert (gt @0 (plus:c @0 @1
- (if (INTEGRAL_TYPE_P (type)
-  && TYPE_UNSIGNED (TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@1)
+ (if (ternary_integer_types_match_p (type, @0, @1) && TYPE_UNSIGNED (type
 
 (match (usadd_right_part_2 @0 @1)
  (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1)) integer_zerop)))
- (if (INTEGRAL_TYPE_P (type)
-  && TYPE_UNSIGNED (TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@0))
-  && types_match (type, TREE_TYPE (@1)
+ (if (ternary_integer_types_match_p (type, @0, @1) && TYPE_UNSIGNED (type
 
 /* We cannot merge or overload usadd_left_part_1 and usadd_left_part_2
because the sub part of left_part_2 cannot work with right_part_1.
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 6564b002dc1..b59d42c3e47 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -10622,6 +10622,22 @@ uniform_integer_cst_p (tree t)
   return NULL_TREE;
 }
 
+/* Check if the types T1,  T2 and T3 are effectively the same integer type.
+   If T1,  T2 or T3 is not a type, the test applies to their TREE_TYPE.  */
+
+bool
+ternary_integer_types_match_p (tree t1, tree t2, tree t3)
+{
+  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
+  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
+  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);
+
+  if (!INTEGRAL_TYPE_P (t1) || !INTEGRAL_TYPE_P (t2) || !INTEGRAL_TYPE_P (t3))
+return false;
+
+  return types_compatible_p (t1, t2) && types_compatible_p (t2, t3);
+}
+
 /* Checks to see if T is a constant or a constant vector and if each element E
adheres to ~E + 1 == pow2 then return ~E otherwise NULL_TREE.  */
 
diff --git a/gcc/tree.h b/gcc/tree.h
index ee2aae332a4..4ac59ac55cb 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -5212,6 +5212,11 @@ extern bool integer_pow2p (const_tree);
 
 extern tree bitmask_inv_cst_vector_p (tree);
 
+/* Check if the types T1,  T2 and T3 are effectively the same integer type.
+   If T1,  T2 or T3 is not a type, the test applies to their TREE_TYPE.  */
+
+extern bool ternary_integer_types_match_p (tree, tree, tree);
+
 /* integer_nonzerop (tree x) is nonzero if X is an integer constant
with a nonzero value.  */
 
-- 
2.34.1



Re: [PATCH] Don't reduce estimated unrolled size for innermost loop.

2024-05-20 Thread Hongtao Liu
On Wed, May 15, 2024 at 5:24 PM Richard Biener
 wrote:
>
> On Wed, May 15, 2024 at 4:15 AM Hongtao Liu  wrote:
> >
> > On Mon, May 13, 2024 at 3:40 PM Richard Biener
> >  wrote:
> > >
> > > On Mon, May 13, 2024 at 4:29 AM liuhongt  wrote:
> > > >
> > > > As testcase in the PR, O3 cunrolli may prevent vectorization for the
> > > > innermost loop and increase register pressure.
> > > > The patch removes the 1/3 reduction of unr_insn for innermost loop for 
> > > > UL_ALL.
> > > > ul != UR_ALL is needed since some small loop complete unrolling at O2 
> > > > relies
> > > > the reduction.
> > > >
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > > No big impact for SPEC2017.
> > > > Ok for trunk?
> > >
> > > This removes the 1/3 reduction when unrolling a loop nest (the case I was
> > > concerned about).  Unrolling of a nest is by iterating in
> > > tree_unroll_loops_completely
> > > so the to be unrolled loop appears innermost.  So I think you need a new
> > > parameter on tree_unroll_loops_completely_1 indicating whether we're in 
> > > the
> > > first iteration (or whether to assume inner most loops will "simplify").
> > yes, it would be better.
> > >
> > > Few comments below
> > >
> > > > gcc/ChangeLog:
> > > >
> > > > PR tree-optimization/112325
> > > > * tree-ssa-loop-ivcanon.cc (estimated_unrolled_size): Add 2
> > > > new parameters: loop and ul, and remove unr_insns reduction
> > > > for innermost loop.
> > > > (try_unroll_loop_completely): Pass loop and ul to
> > > > estimated_unrolled_size.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.dg/tree-ssa/pr112325.c: New test.
> > > > * gcc.dg/vect/pr69783.c: Add extra option --param
> > > > max-completely-peeled-insns=300.
> > > > ---
> > > >  gcc/testsuite/gcc.dg/tree-ssa/pr112325.c | 57 
> > > >  gcc/testsuite/gcc.dg/vect/pr69783.c  |  2 +-
> > > >  gcc/tree-ssa-loop-ivcanon.cc | 16 +--
> > > >  3 files changed, 71 insertions(+), 4 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr112325.c
> > > >
> > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr112325.c 
> > > > b/gcc/testsuite/gcc.dg/tree-ssa/pr112325.c
> > > > new file mode 100644
> > > > index 000..14208b3e7f8
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr112325.c
> > > > @@ -0,0 +1,57 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
> > > > +
> > > > +typedef unsigned short ggml_fp16_t;
> > > > +static float table_f32_f16[1 << 16];
> > > > +
> > > > +inline static float ggml_lookup_fp16_to_fp32(ggml_fp16_t f) {
> > > > +unsigned short s;
> > > > +__builtin_memcpy(&s, &f, sizeof(unsigned short));
> > > > +return table_f32_f16[s];
> > > > +}
> > > > +
> > > > +typedef struct {
> > > > +ggml_fp16_t d;
> > > > +ggml_fp16_t m;
> > > > +unsigned char qh[4];
> > > > +unsigned char qs[32 / 2];
> > > > +} block_q5_1;
> > > > +
> > > > +typedef struct {
> > > > +float d;
> > > > +float s;
> > > > +char qs[32];
> > > > +} block_q8_1;
> > > > +
> > > > +void ggml_vec_dot_q5_1_q8_1(const int n, float * restrict s, const 
> > > > void * restrict vx, const void * restrict vy) {
> > > > +const int qk = 32;
> > > > +const int nb = n / qk;
> > > > +
> > > > +const block_q5_1 * restrict x = vx;
> > > > +const block_q8_1 * restrict y = vy;
> > > > +
> > > > +float sumf = 0.0;
> > > > +
> > > > +for (int i = 0; i < nb; i++) {
> > > > +unsigned qh;
> > > > +__builtin_memcpy(&qh, x[i].qh, sizeof(qh));
> > > > +
> > > > +int sumi = 0;
> > > > +
> > > > +for (int j = 0; j < qk/2; ++j) {
> > > > +const unsigned char xh_0 = ((qh >> (j + 0)) << 4) & 0x10;
> > > > +const unsigned char xh_1 = ((qh >> (j + 12)) ) & 0x10;
> > > > +
> > > > +const int x0 = (x[i].qs[j] & 0xF) | xh_0;
> > > > +const int x1 = (x[i].qs[j] >> 4) | xh_1;
> > > > +
> > > > +sumi += (x0 * y[i].qs[j]) + (x1 * y[i].qs[j + qk/2]);
> > > > +}
> > > > +
> > > > +sumf += (ggml_lookup_fp16_to_fp32(x[i].d)*y[i].d)*sumi + 
> > > > ggml_lookup_fp16_to_fp32(x[i].m)*y[i].s;
> > > > +}
> > > > +
> > > > +*s = sumf;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump {(?n)Not unrolling loop [1-9] \(--param 
> > > > max-completely-peel-times limit reached} "cunrolli"} } */
> > > > diff --git a/gcc/testsuite/gcc.dg/vect/pr69783.c 
> > > > b/gcc/testsuite/gcc.dg/vect/pr69783.c
> > > > index 5df95d0ce4e..a1f75514d72 100644
> > > > --- a/gcc/testsuite/gcc.dg/vect/pr69783.c
> > > > +++ b/gcc/testsuite/gcc.dg/vect/pr69783.c
> > > > @@ -1,6 +1,6 @@
> > > >  /* { dg-do compile } */
> > > >  /* { dg-require-effective-target vect_float } */
> > > > -/* { dg-additional-options "-Ofast -funroll-loops" } */
> > > > +/* { dg-additional-optio

[PATCH] [testsuite] cope with rtems implicit -ftls-model=local-exec

2024-05-20 Thread Alexandre Oliva


gcc/config/rtems.h's OS_CC1_SPEC changes the -ftls-model default to
local-exec, which breaks some tests that compile with PIC and thus
expect dynamic TLS access models.

I assume the default overriding even with PIC is intended, so I'm
adjusting the testcases.

For those in gcc.dg/tls, I adjusted the ipa dump expectations, so that
they check for tls-local- only, which covers both -dynamic and -exec.

For those in g{cc,++}.target/aarch64/sve, I've added a -ftls-model
override, so that the expected opcodes for dynamic TLS are generated.

Regstrapped on x86_64-linux-gnu.  Also tested with gcc-13 targeting
aarch64-rtems6.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.dg/tls/vis-attr-hidden.c: Match tls-local- for both
-dynamic and -exec.  Note rtems's default.
* gcc.dg/tls/vis-flag-hidden.c: Likewise.
* gcc.dg/tls/vis-pragma-hidden.c: Likewise.
* gcc.target/aarch64/sve/tls_1.c: Override -ftls-model default
on rtems.
* gcc.target/aarch64/sve/tls_preserve_2.c: Likewise.
* gcc.target/aarch64/sve/tls_preserve_3.c: Likewise.
* g++.target/aarch64/sve/tls_2.C: Likewise.
---
 gcc/testsuite/g++.target/aarch64/sve/tls_2.C   |1 +
 gcc/testsuite/gcc.dg/tls/vis-attr-hidden.c |3 ++-
 gcc/testsuite/gcc.dg/tls/vis-flag-hidden.c |3 ++-
 gcc/testsuite/gcc.dg/tls/vis-pragma-hidden.c   |3 ++-
 gcc/testsuite/gcc.target/aarch64/sve/tls_1.c   |1 +
 .../gcc.target/aarch64/sve/tls_preserve_2.c|1 +
 .../gcc.target/aarch64/sve/tls_preserve_3.c|1 +
 7 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/g++.target/aarch64/sve/tls_2.C 
b/gcc/testsuite/g++.target/aarch64/sve/tls_2.C
index a1a2c85e59106..23004d9984d5f 100644
--- a/gcc/testsuite/g++.target/aarch64/sve/tls_2.C
+++ b/gcc/testsuite/g++.target/aarch64/sve/tls_2.C
@@ -2,6 +2,7 @@
 /* { dg-require-effective-target tls } */
 /* { dg-options "-O2 -fPIC -msve-vector-bits=256" } */
 /* { dg-require-effective-target fpic } */
+/* { dg-additional-options "-ftls-model=global-dynamic" { target *-*-rtems* } 
} */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.dg/tls/vis-attr-hidden.c 
b/gcc/testsuite/gcc.dg/tls/vis-attr-hidden.c
index 0d43fc565b090..007d382fa9a0a 100644
--- a/gcc/testsuite/gcc.dg/tls/vis-attr-hidden.c
+++ b/gcc/testsuite/gcc.dg/tls/vis-attr-hidden.c
@@ -9,4 +9,5 @@ __thread int x;
 
 void reference() { x++; }
 
-/* { dg-final { scan-ipa-dump "Varpool flags: tls-local-dynamic" 
"whole-program" } } */
+/* rtems defaults to local-exec, others should get local-dynamic.  */
+/* { dg-final { scan-ipa-dump "Varpool flags: tls-local-" "whole-program" } } 
*/
diff --git a/gcc/testsuite/gcc.dg/tls/vis-flag-hidden.c 
b/gcc/testsuite/gcc.dg/tls/vis-flag-hidden.c
index a15df092d4d0c..baf248dc3babc 100644
--- a/gcc/testsuite/gcc.dg/tls/vis-flag-hidden.c
+++ b/gcc/testsuite/gcc.dg/tls/vis-flag-hidden.c
@@ -9,4 +9,5 @@ __thread int x;
 
 void reference() { x++; }
 
-/* { dg-final { scan-ipa-dump "Varpool flags: tls-local-dynamic" 
"whole-program" } } */
+/* rtems defaults to local-exec, others should get local-dynamic.  */
+/* { dg-final { scan-ipa-dump "Varpool flags: tls-local-" "whole-program" } } 
*/
diff --git a/gcc/testsuite/gcc.dg/tls/vis-pragma-hidden.c 
b/gcc/testsuite/gcc.dg/tls/vis-pragma-hidden.c
index 1be97644243ab..50cd010924cfb 100644
--- a/gcc/testsuite/gcc.dg/tls/vis-pragma-hidden.c
+++ b/gcc/testsuite/gcc.dg/tls/vis-pragma-hidden.c
@@ -13,4 +13,5 @@ __thread int x;
 
 void reference() { x++; }
 
-/* { dg-final { scan-ipa-dump "Varpool flags: tls-local-dynamic" 
"whole-program" } } */
+/* rtems defaults to local-exec, others should get local-dynamic.  */
+/* { dg-final { scan-ipa-dump "Varpool flags: tls-local-" "whole-program" } } 
*/
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/tls_1.c 
b/gcc/testsuite/gcc.target/aarch64/sve/tls_1.c
index 43c52bc2b9061..71f354dfe1b48 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/tls_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/tls_1.c
@@ -1,5 +1,6 @@
 /* { dg-options "-O2 -fPIC -msve-vector-bits=256" } */
 /* { dg-require-effective-target fpic } */
+/* { dg-additional-options "-ftls-model=global-dynamic" { target *-*-rtems* } 
} */
 
 typedef unsigned int v8si __attribute__((vector_size(32)));
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/tls_preserve_2.c 
b/gcc/testsuite/gcc.target/aarch64/sve/tls_preserve_2.c
index 20e939fbb85b4..1f477ba8c2599 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/tls_preserve_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/tls_preserve_2.c
@@ -2,6 +2,7 @@
 /* { dg-options "-O3 -fpic -msve-vector-bits=256 -fno-schedule-insns" } */
 /* { dg-require-effective-target fpic } */
 /* { dg-require-effective-target tls_native } */
+/* { dg-additional-options "-ftls-model=global-dynamic" { target *-*-rtems* } 
} */
 
 typedef float v8si __attribute__ ((vector_size (32)));
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/tl

[PATCH-1v2] Value Range: Add range op for builtin isinf

2024-05-20 Thread HAO CHEN GUI
Hi,
  The builtin isinf is not folded at the front end if the corresponding
optab exists. This causes range evaluation to fail on targets that have
optab_isinf. For instance, range-sincos.c will fail on such targets because
it calls builtin_isinf.

  This patch fixes the problem by adding a range op for builtin isinf.

  Compared with the previous version, the main change is to set varying if
nothing is known about the range.
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648303.html
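
The folding rule can be pictured with a small standalone model. This is
only a sketch: struct frange_sketch, enum tri and fold_isinf_sketch are
invented stand-ins for GCC's frange and irange, reduced to a pair of bounds
plus two NaN flags. The order of the checks follows the fold_range hunk
below, including the "varying" fallback when nothing is known.

```c
#include <math.h>
#include <stdbool.h>

/* Result of folding isinf over a range: known false, known true, or
   unknown (the "varying" case).  */
enum tri { TRI_FALSE, TRI_TRUE, TRI_UNKNOWN };

/* Simplified stand-in for a floating-point value range. */
struct frange_sketch {
    double lo, hi;    /* numeric bounds, ignored when only_nan is set */
    bool maybe_nan;   /* the value may also be a NaN */
    bool only_nan;    /* the value is known to be a NaN */
};

enum tri fold_isinf_sketch(struct frange_sketch r)
{
    /* A NaN is never an infinity. */
    if (r.only_nan)
        return TRI_FALSE;

    bool lo_inf = isinf(r.lo) != 0;
    bool hi_inf = isinf(r.hi) != 0;

    /* Known infinity: a single infinite value and no NaN possible. */
    if (!r.maybe_nan && lo_inf && hi_inf && r.lo == r.hi)
        return TRI_TRUE;

    /* Neither bound is infinite, so no value in the range is. */
    if (!lo_inf && !hi_inf)
        return TRI_FALSE;

    /* Nothing is known: fall back to varying. */
    return TRI_UNKNOWN;
}
```

A range such as [0, +Inf] lands in the last case, which is the situation
the change from the previous version addresses.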

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen


ChangeLog
Value Range: Add range op for builtin isinf

The builtin isinf is not folded at front end if the corresponding optab
exists.  So the range op for isinf is needed for value range analysis.
This patch adds range op for builtin isinf.

gcc/
* gimple-range-op.cc (class cfn_isinf): New.
(op_cfn_isinf): New variables.
(gimple_range_op_handler::maybe_builtin_call): Handle
CASE_FLT_FN (BUILT_IN_ISINF).

gcc/testsuite/
* gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c: New test.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 55dfbb23ce2..eb1b0aff77c 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1175,6 +1175,62 @@ private:
   bool m_is_pos;
 } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true);

+// Implement range operator for CFN_BUILT_IN_ISINF
+class cfn_isinf : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange &r, tree type, const frange &op1,
+  const irange &, relation_trio) const override
+  {
+if (op1.undefined_p ())
+  return false;
+
+if (op1.known_isinf ())
+  {
+   r.set_nonzero (type);
+   return true;
+  }
+
+if (op1.known_isnan ()
+   || (!real_isinf (&op1.lower_bound ())
+   && !real_isinf (&op1.upper_bound (
+  {
+   r.set_zero (type);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+ const frange &, relation_trio) const override
+  {
+if (lhs.undefined_p ())
+  return false;
+
+if (lhs.zero_p ())
+  {
+   nan_state nan (true);
+   r.set (type, real_min_representable (type),
+  real_max_representable (type), nan);
+   return true;
+  }
+
+if (!range_includes_zero_p (lhs))
+  {
+   // The range is [-INF,-INF][+INF,+INF], but it can't be represented.
+   // Set range to [-INF,+INF]
+   r.set_varying (type);
+   r.clear_nan ();
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+} op_cfn_isinf;

 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator
@@ -1268,6 +1324,11 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_operator = &op_cfn_signbit;
   break;

+CASE_FLT_FN (BUILT_IN_ISINF):
+  m_op1 = gimple_call_arg (call, 0);
+  m_operator = &op_cfn_isinf;
+  break;
+
 CASE_CFN_COPYSIGN_ALL:
   m_op1 = gimple_call_arg (call, 0);
   m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c 
b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
new file mode 100644
index 000..468f1bcf5c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include 
+void link_error();
+
+void
+test1 (double x)
+{
+  if (x > __DBL_MAX__ && !__builtin_isinf (x))
+link_error ();
+  if (x < -__DBL_MAX__ && !__builtin_isinf (x))
+link_error ();
+}
+
+void
+test2 (float x)
+{
+  if (x > __FLT_MAX__ && !__builtin_isinf (x))
+link_error ();
+  if (x < -__FLT_MAX__ && !__builtin_isinf (x))
+link_error ();
+}
+
+void
+test3 (double x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __DBL_MAX__)
+link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__DBL_MAX__)
+link_error ();
+}
+
+void
+test4 (float x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __FLT_MAX__)
+link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__FLT_MAX__)
+link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
+


[PATCH-2v3] Value Range: Add range op for builtin isfinite

2024-05-20 Thread HAO CHEN GUI
Hi,
  This patch adds the range op for builtin isfinite.

  Compared to the previous version, the main change is to set the range to
varying if nothing is known about it.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650857.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
Value Range: Add range op for builtin isfinite

The former patch adds an optab for builtin isfinite, so builtin isfinite
might no longer be folded in the front end.  The range op for isfinite is
therefore needed for value range analysis.  This patch adds that range op.

gcc/
* gimple-range-op.cc (class cfn_isfinite): New.
(op_cfn_isfinite): New variable.
(gimple_range_op_handler::maybe_builtin_call): Handle
CFN_BUILT_IN_ISFINITE.

gcc/testsuite/
* gcc.dg/tree-ssa/range-isfinite.c: New test.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 922ee7bf0f7..49b6d7abde1 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1229,6 +1229,61 @@ public:
   }
 } op_cfn_isinf;

+//Implement range operator for CFN_BUILT_IN_ISFINITE
+class cfn_isfinite : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange &r, tree type, const frange &op1,
+  const irange &, relation_trio) const override
+  {
+if (op1.undefined_p ())
+  return false;
+
+if (op1.known_isfinite ())
+  {
+   r.set_nonzero (type);
+   return true;
+  }
+
+if (op1.known_isnan ()
+   || op1.known_isinf ())
+  {
+   r.set_zero (type);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+ const frange &, relation_trio) const override
+  {
+if (lhs.undefined_p ())
+  return false;
+
+if (lhs.zero_p ())
+  {
+   // The range is [-INF,-INF][+INF,+INF] NAN, but it can't be represented.
+   // Set range to varying
+   r.set_varying (type);
+   return true;
+  }
+
+if (!range_includes_zero_p (lhs))
+  {
+   nan_state nan (false);
+   r.set (type, real_min_representable (type),
+  real_max_representable (type), nan);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+} op_cfn_isfinite;
+
 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator
 {
@@ -1326,6 +1381,11 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_operator = &op_cfn_isinf;
   break;

+case CFN_BUILT_IN_ISFINITE:
+  m_op1 = gimple_call_arg (call, 0);
+  m_operator = &op_cfn_isfinite;
+  break;
+
 CASE_CFN_COPYSIGN_ALL:
   m_op1 = gimple_call_arg (call, 0);
   m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c 
b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c
new file mode 100644
index 000..f5dce0a0486
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include 
+void link_error();
+
+void test1 (double x)
+{
+  if (x < __DBL_MAX__ && x > -__DBL_MAX__ && !__builtin_isfinite (x))
+link_error ();
+}
+
+void test2 (float x)
+{
+  if (x < __FLT_MAX__ && x > -__FLT_MAX__ && !__builtin_isfinite (x))
+link_error ();
+}
+
+void test3 (double x)
+{
+  if (__builtin_isfinite (x) && __builtin_isinf (x))
+link_error ();
+}
+
+void test4 (float x)
+{
+  if (__builtin_isfinite (x) && __builtin_isinf (x))
+link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */


[PATCH-3] Value Range: Add range op for builtin isnormal

2024-05-20 Thread HAO CHEN GUI
Hi,
  This patch adds the range op for builtin isnormal. It also adds two
helper functions to frange that detect a range of normal floating-point
values and a range of subnormal or zero values.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
Value Range: Add range op for builtin isnormal

The former patch adds an optab for builtin isnormal, so builtin isnormal
might no longer be folded in the front end.  The range op for isnormal is
therefore needed for value range analysis.  This patch adds that range op.

gcc/
* gimple-range-op.cc (class cfn_isnormal): New.
(op_cfn_isnormal): New variable.
(gimple_range_op_handler::maybe_builtin_call): Handle
CFN_BUILT_IN_ISNORMAL.
* value-range.h (class frange): Declare known_isnormal and
known_isdenormal_or_zero.
(frange::known_isnormal): Define.
(frange::known_isdenormal_or_zero): Define.

gcc/testsuite/
* gcc.dg/tree-ssa/range-isnormal.c: New test.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index d69900d1f56..4c3f9c98282 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1281,6 +1281,60 @@ public:
   }
 } op_cfn_isfinite;

+//Implement range operator for CFN_BUILT_IN_ISNORMAL
+class cfn_isnormal :  public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange &r, tree type, const frange &op1,
+  const irange &, relation_trio) const override
+  {
+if (op1.undefined_p ())
+  return false;
+
+if (op1.known_isnormal ())
+  {
+   r.set_nonzero (type);
+   return true;
+  }
+
+if (op1.known_isnan ()
+   || op1.known_isinf ()
+   || op1.known_isdenormal_or_zero ())
+  {
+   r.set_zero (type);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+ const frange &, relation_trio) const override
+  {
+if (lhs.undefined_p ())
+  return false;
+
+if (lhs.zero_p ())
+  {
+   r.set_varying (type);
+   return true;
+  }
+
+if (!range_includes_zero_p (lhs))
+  {
+   nan_state nan (false);
+   r.set (type, real_min_representable (type),
+  real_max_representable (type), nan);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+} op_cfn_isnormal;
+
 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator
 {
@@ -1383,6 +1437,11 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_operator = &op_cfn_isfinite;
   break;

+case CFN_BUILT_IN_ISNORMAL:
+  m_op1 = gimple_call_arg (call, 0);
+  m_operator = &op_cfn_isnormal;
+  break;
+
 CASE_CFN_COPYSIGN_ALL:
   m_op1 = gimple_call_arg (call, 0);
   m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c 
b/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c
new file mode 100644
index 000..c4df4d839b0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include 
+void link_error();
+
+void test1 (double x)
+{
+  if (x < __DBL_MAX__ && x > __DBL_MIN__ && !__builtin_isnormal (x))
+link_error ();
+
+  if (x < -__DBL_MIN__ && x > -__DBL_MAX__ && !__builtin_isnormal (x))
+link_error ();
+}
+
+void test2 (float x)
+{
+  if (x < __FLT_MAX__ && x > __FLT_MIN__ && !__builtin_isnormal (x))
+link_error ();
+
+  if (x < -__FLT_MIN__ && x > - __FLT_MAX__ && !__builtin_isnormal (x))
+link_error ();
+}
+
+void test3 (double x)
+{
+  if (__builtin_isnormal (x) && __builtin_isinf (x))
+link_error ();
+}
+
+void test4 (float x)
+{
+  if (__builtin_isnormal (x) && __builtin_isinf (x))
+link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
diff --git a/gcc/value-range.h b/gcc/value-range.h
index 37ce91dc52d..1443d1906e5 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -588,6 +588,8 @@ public:
   bool maybe_isinf () const;
   bool signbit_p (bool &signbit) const;
   bool nan_signbit_p (bool &signbit) const;
+  bool known_isnormal () const;
+  bool known_isdenormal_or_zero () const;

 protected:
   virtual bool contains_p (tree cst) const override;
@@ -1650,6 +1652,33 @@ frange::known_isfinite () const
   return (!maybe_isnan () && !real_isinf (&m_min) && !real_isinf (&m_max));
 }

+// Return TRUE if range is known to be normal.
+
+inline bool
+frange::known_isnormal () const
+{
+  if (!known_isfinite ())
+return false;
+
+  machine_mode mode = TYPE_MODE (type ());
+  return (!real_isdenormal (&m_min, mode) && !real_isdenormal (&m_max, mode)
+ && !real_iszero (&m_min) && !real_iszero (&m_max)
+ && (!real_isneg (&m_min) ||

[PATCH v2] [testsuite] xfail pr79004 on longdouble64; drop long_double_64bit (was: ppc: testsuite: pr79004 needs -mlong-double-128)

2024-05-20 Thread Alexandre Oliva
On May  8, 2024, "Kewen.Lin"  wrote:

>>> How about the generic one "longdouble64"?  I did a grep and found it has one
>>> use, I'd expect it can work here. :)
>> 
>> ... since this and longdouble128 exist, maybe we can fix it and leave
>> them all alone, despite the interface oddity.
>> 
> ... personally I'm inclined to drop this 64 bit one. :)

Some of the asm opcodes expected by pr79004 depend on
-mlong-double-128 to be output.  E.g., without this flag, the
conditions of patterns @extenddf2 and extendsf2 do not
hold, and so GCC resorts to libcalls instead of even trying
rs6000_expand_float128_convert.

Perhaps the conditions are too strict, and they could enable the use
of conversion insns involving __ieee128/_Float128 even with 64-bit
long doubles.

For now, xfail the opcodes that are not available on longdouble64.

While at that, drop long_double_64bit, since it's broken and sort of
redundant.

Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on ppc64-vx7r2.
Ok to install?


for  gcc/testsuite/ChangeLog

PR target/105359
* gcc.target/powerpc/pr79004.c: Xfail opcodes not available on
longdouble64.
* lib/target-supports.exp
(check_effective_target_long_double_64bit): Drop.
(add_options_for_long_double_64bit): Likewise.
---
 gcc/testsuite/gcc.target/powerpc/pr79004.c |   14 +
 gcc/testsuite/lib/target-supports.exp  |   43 
 2 files changed, 8 insertions(+), 49 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/pr79004.c 
b/gcc/testsuite/gcc.target/powerpc/pr79004.c
index caf1f6c1eefe4..2cb8bf4bc14bc 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr79004.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr79004.c
@@ -100,10 +100,12 @@ void to_uns_short_store_n (TYPE a, unsigned short *p, 
long n) { p[n] = (unsigned
 void to_uns_int_store_n (TYPE a, unsigned int *p, long n) { p[n] = (unsigned 
int)a; }
 void to_uns_long_store_n (TYPE a, unsigned long *p, long n) { p[n] = (unsigned 
long)a; }
 
-/* { dg-final { scan-assembler-not {\mbl __}   } } */
-/* { dg-final { scan-assembler {\mxscvdpqp\M}  } } */
-/* { dg-final { scan-assembler {\mxscvqpdp\M}  } } */
-/* { dg-final { scan-assembler {\mxscvqpdpo\M} } } */
+/* On targets with 64-bit long double, some opcodes to deal with __float128 are
+   disabled, see PR target/105359.  */
+/* { dg-final { scan-assembler-not {\mbl __}   { xfail longdouble64 } } } 
*/
+/* { dg-final { scan-assembler {\mxscvdpqp\M}  { xfail longdouble64 } } } 
*/
+/* { dg-final { scan-assembler {\mxscvqpdp\M}  { xfail longdouble64 } } } 
*/
+/* { dg-final { scan-assembler {\mxscvqpdpo\M} { xfail longdouble64 } } } 
*/
 /* { dg-final { scan-assembler {\mxscvqpsdz\M} } } */
 /* { dg-final { scan-assembler {\mxscvqpswz\M} } } */
 /* { dg-final { scan-assembler {\mxscvsdqp\M}  } } */
@@ -111,7 +113,7 @@ void to_uns_long_store_n (TYPE a, unsigned long *p, long n) 
{ p[n] = (unsigned l
 /* { dg-final { scan-assembler {\mlxsd\M}  } } */
 /* { dg-final { scan-assembler {\mlxsiwax\M}   } } */
 /* { dg-final { scan-assembler {\mlxsiwzx\M}   } } */
-/* { dg-final { scan-assembler {\mlxssp\M} } } */
+/* { dg-final { scan-assembler {\mlxssp\M} { xfail longdouble64 } } } 
*/
 /* { dg-final { scan-assembler {\mstxsd\M} } } */
 /* { dg-final { scan-assembler {\mstxsiwx\M}   } } */
-/* { dg-final { scan-assembler {\mstxssp\M}} } */
+/* { dg-final { scan-assembler {\mstxssp\M}{ xfail longdouble64 } } } 
*/
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index ec9baa4f32a30..dc7d4f2b5f39e 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2930,49 +2930,6 @@ proc add_options_for_long_double_ieee128 { flags } {
 return "$flags"
 }
 
-# Check if GCC and GLIBC supports explicitly specifying that the long double
-# format uses the IEEE 64-bit.  Under little endian PowerPC Linux, you need
-# GLIBC 2.32 or later to be able to use a different long double format for
-# running a program than the system default.
-
-proc check_effective_target_long_double_64bit { } {
-return [check_runtime_nocache long_double_64bit {
-   #include 
-   #include 
-   /* use volatile to prevent optimization.  */
-   volatile double a = 3.0;
-   volatile long double one = 1.0L;
-   volatile long double two = 2.0L;
-   volatile long double b;
-   char buffer[20];
-   int main()
-   {
- double a2;
- long double b2;
- if (sizeof (long double) != 8)
-   return 1;
- b = one + two;
- /* eliminate removing volatile cast warning.  */
- a2 = a;
- b2 = b;
- if (memcmp (&a2, &b2, 16) != 0)
-   return 1;
- sprintf (buffer, "%lg", b);
- return strcmp (buffer, "3") != 0;
-   }
-}  [add_options_for_ppc_long_double_ov

RE: [PATCH v3] Match: Extract ternary_integer_types_match_p helper func [NFC]

2024-05-20 Thread Tamar Christina



> -Original Message-
> From: pan2...@intel.com 
> Sent: Tuesday, May 21, 2024 2:13 AM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina
> ; richard.guent...@gmail.com; Pan Li
> 
> Subject: [PATCH v3] Match: Extract ternary_integer_types_match_p helper func
> [NFC]
> 
> From: Pan Li 
> 
> There are several match patterns for SAT-related cases, and each would
> duplicate the code that checks that dest, op_0 and op_1 have the same tree
> type, i.e. a ternary tree type match.  Thus, extract one helper function to
> do this and avoid match code duplication.
> 
> The below test suites are passed for this patch:
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 regression test.
> 
> gcc/ChangeLog:
> 
>   * match.pd: Leverage helper func for SAT_ADD match.
>   * tree.cc (ternary_integer_types_match_p): New func impl to
>   check if ternary tree types are all integer.
>   * tree.h (ternary_integer_types_match_p): New func decl.
> 

Thanks, looks good to me! You still need approval from a maintainer.

Cheers,
Tamar

> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 28 +++-
>  gcc/tree.cc  | 16 
>  gcc/tree.h   |  5 +
>  3 files changed, 28 insertions(+), 21 deletions(-)
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0f9c34fa897..cff67c84498 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -39,7 +39,8 @@ along with GCC; see the file COPYING3.  If not see
> HONOR_NANS
> uniform_vector_p
> expand_vec_cmp_expr_p
> -   bitmask_inv_cst_vector_p)
> +   bitmask_inv_cst_vector_p
> +   ternary_integer_types_match_p)
> 
>  /* Operator lists.  */
>  (define_operator_list tcc_comparison
> @@ -3046,38 +3047,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* Unsigned Saturation Add */
>  (match (usadd_left_part_1 @0 @1)
>   (plus:c @0 @1)
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (ternary_integer_types_match_p (type, @0, @1) && TYPE_UNSIGNED
> (type
> 
>  (match (usadd_left_part_2 @0 @1)
>   (realpart (IFN_ADD_OVERFLOW:c @0 @1))
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (ternary_integer_types_match_p (type, @0, @1) && TYPE_UNSIGNED
> (type
> 
>  (match (usadd_right_part_1 @0 @1)
>   (negate (convert (lt (plus:c @0 @1) @0)))
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (ternary_integer_types_match_p (type, @0, @1) && TYPE_UNSIGNED
> (type
> 
>  (match (usadd_right_part_1 @0 @1)
>   (negate (convert (gt @0 (plus:c @0 @1
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (ternary_integer_types_match_p (type, @0, @1) && TYPE_UNSIGNED
> (type
> 
>  (match (usadd_right_part_2 @0 @1)
>   (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1))
> integer_zerop)))
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (ternary_integer_types_match_p (type, @0, @1) && TYPE_UNSIGNED
> (type
> 
>  /* We cannot merge or overload usadd_left_part_1 and usadd_left_part_2
> because the sub part of left_part_2 cannot work with right_part_1.
> diff --git a/gcc/tree.cc b/gcc/tree.cc
> index 6564b002dc1..b59d42c3e47 100644
> --- a/gcc/tree.cc
> +++ b/gcc/tree.cc
> @@ -10622,6 +10622,22 @@ uniform_integer_cst_p (tree t)
>return NULL_TREE;
>  }
> 
> +/* Check if the types T1,  T2 and T3 are effectively the same integer type.
> +   If T1,  T2 or T3 is not a type, the test applies to their TREE_TYPE.  */
> +
> +bool
> +ternary_integer_types_match_p (tree t1, tree t2, tree t3)
> +{
> +  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
> +  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
> +  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);
> +
> +  if (!INTEGRAL_TYPE_P (t1) || !INTEGRAL_TYPE_P (t2) || !INTEGRAL_TYPE_P
> (t3))
> +return false;
> +
> +  return types_compatible_p (t1, t2) && types_compatible_p (t2, t3);
> +}
> +
>  /* Checks to see if T is a constant or a constant vector and if each element 
> E
> adheres to ~E + 1 == pow2 then return ~E otherwise NULL_TREE.  */
> 
> diff --git a/gcc/tree.h b/gcc/tree.h
> index ee2aae332a4..4ac59ac55cb 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -5212,6 +5212,11 @@ extern bool integer_pow2p (const_tree);
> 
>  extern tree bitmask_inv_cst_vector_p (tree);
> 
> +/* Check if the types T1,  T2 and T3 are effectively the same integer type.
> +   If T1,  T2 or T3 is not a type, the test 

RE: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations.

2024-05-20 Thread Tamar Christina
> -Original Message-
> From: Richard Sandiford 
> Sent: Wednesday, May 15, 2024 10:31 PM
> To: Tamar Christina 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org; nd
> ; Richard Earnshaw ; Marcus
> Shawcroft ; ktkac...@gcc.gnu.org
> Subject: Re: [PATCH 0/4]AArch64: support conditional early clobbers on certain
> operations.
> 
> Tamar Christina  writes:
> >> >> On Wed, May 15, 2024 at 12:29 PM Tamar Christina
> >> >>  wrote:
> >> >> >
> >> >> > Hi All,
> >> >> >
> >> >> > Some Neoverse Software Optimization Guides (SWoG) have a clause that
> state
> >> >> > that for predicated operations that also produce a predicate it is 
> >> >> > preferred
> >> >> > that the codegen should use a different register for the destination 
> >> >> > than
> that
> >> >> > of the input predicate in order to avoid a performance overhead.
> >> >> >
> >> >> > This of course has the problem that it increases register pressure 
> >> >> > and so
> >> should
> >> >> > be done with care.  Additionally not all micro-architectures have this
> >> >> > consideration and so it shouldn't be done as a default thing.
> >> >> >
> >> >> > The patch series adds support for doing conditional early clobbers 
> >> >> > through
> a
> >> >> > combination of new alternatives and attributes to control their 
> >> >> > availability.
> >> >>
> >> >> You could have two alternatives, one with early clobber and one with
> >> >> a matching constraint where you'd disparage the matching constraint one?
> >> >>
> >> >
> >> > Yeah, that's what I do, though there's no need to disparage the non-early
> clobber
> >> > alternative as the early clobber alternative will naturally get a 
> >> > penalty if it
> needs a
> >> > reload.
> >>
> >> But I think Richard's suggestion was to disparage the one with a matching
> >> constraint (not the earlyclobber), to reflect the increased cost of
> >> reusing the register.
> >>
> >> We did take that approach for gathers, e.g.:
> >>
> >>  [&w, Z,   w, Ui1, Ui1, Upl] ld1\t%0.s, %5/z, [%2.s]
> >>  [?w, Z,   0, Ui1, Ui1, Upl] ^
> >>
> >> The (supposed) advantage is that, if register pressure is so tight
> >> that using matching registers is the only alternative, we still
> >> have the opportunity to do that, as a last resort.
> >>
> >> Providing only an earlyclobber version means that using the same
> >> register is prohibited outright.  If no other register is free, the RA
> >> would need to spill something else to free up a temporary register.
> >> And it might then do the equivalent of (pseudo-code):
> >>
> >>   not p1.b, ..., p0.b
> >>   mov p0.d, p1.d
> >>
> >> after spilling what would otherwise have occupied p1.  In that
> >> situation it would be better use:
> >>
> >>   not p0.b, ..., p0.b
> >>
> >> and not introduce the spill of p1.
> >
> > I think I understood what Richi meant, but I thought it was already working 
> > that
> way.
> 
> The suggestion was to use matching constraints (like "0") though,
> whereas the patch doesn't.  I think your argument is that you don't
> need to use matching constraints.  But that's different from the
> suggestion (and from how we handle gathers).
> 
> I was going to say in response to patch 3 (but got distracted, sorry):
> I don't think we should have:
> 
>&Upa, Upa, ...
>Upa, Upa, ...
> 
> (taken from the pure logic ops) enabled at the same time.  Even though
> it works for the testcases, I don't think it has well-defined semantics.
> 
> The problem is that, taken on its own, the second alternative says that
> matching operands are free.  And fundamentally, I don't think the costs
> *must* take the earlyclobber alternative over the non-earlyclobber one
> (when costing during IRA, for instance).  In principle, the cheapest
> is best.
> 
> The aim of the gather approach is to make each alternative correct in
> isolation.  In:
> 
>   [&w, Z,   w, Ui1, Ui1, Upl] ld1\t%0.s, %5/z, [%2.s]
>   [?w, Z,   0, Ui1, Ui1, Upl] ^
> 
> the second alternative says that it is possible to have operands 0
> and 2 be the same vector register, but using that version has the
> cost of an extra reload.  In that sense the alternatives are
> (essentially) consistent about the restriction.
> 
> > i.e. as one of the testcases I had:
> >
> >> aarch64-none-elf-gcc -O3 -g0 -S -o - pred-clobber.c -mcpu=neoverse-n2 
> >> -ffixed-
> p[1-15]
> >
> > foo:
> > mov z31.h, w0
> > ptrue   p0.b, all
> > cmplo   p0.h, p0/z, z0.h, z31.h
> > b   use
> >
> > and reload did not force a spill.
> >
> > My understanding of how this works, and how it seems to be working is that
> since reload costs
> > Alternative from front to back the cheapest one wins and it stops 
> > evaluating the
> rest.
> >
> > The early clobber case is first and preferred, however when it's not 
> > possible, i.e.
> requires a non-pseudo
> > reload, the reload cost is added to the alternative.
> >
> > However you're right that in the following testcase:
> >
> > -mcpu=neoverse-n2 -

Re: [PATCH v3 2/2] RISC-V: avoid LUI based const mat in alloca epilogue expansion

2024-05-20 Thread Jeff Law




On 5/20/24 5:32 PM, Vineet Gupta wrote:

This is testsuite clean however there's a dwarf quirk which I want to
run by the experts. The test that was tripping CI has following
fragment:

Before patch              |  After Patch
--------------------------+---------------------------------
li   t0,-4096             |  addi sp,s0,-2048
addi t0,t0,560            |  .cfi_def_cfa 2, 2048       <- #1
add  sp,s0,t0             |  addi sp,sp,-1488
.cfi_def_cfa 2, 3536      |  .cfi_def_cfa_offset 3536   <- #2
addi sp,sp,1504           |  addi sp,sp,1504
.cfi_def_cfa_offset 2032  |  .cfi_def_cfa_offset 2032   <- #3

The dwarf insn #1 and #3 seem ok, however #2 seems dubious to me.

What about it seems dubious?  We need a CFA adjustment on each insn that
modifies the stack pointer so that we can unwind at any arbitrary point.


The first adjustment says the prior frame is at sp + 2048.  Then it's at 
sp + 3536.  Then after the final insn the prior frame is at sp+2032.


Jeff


Re: [PING^2][PATCH v2] rs6000: Fix issue in specifying PTImode as an attribute [PR106895]

2024-05-20 Thread jeevitha
Ping!

please review.

Thanks & Regards
Jeevitha


On 17/04/24 2:44 pm, jeevitha wrote:
> Ping!
> 
> I've incorporated all the suggested changes. Please review.
> 
> Thanks & Regards
> Jeevitha
> 
> On 21/03/24 6:21 pm, jeevitha wrote:
>> Hi All,
>>
>> The following patch has been bootstrapped and regtested on powerpc64le-linux.
>>
>> PTImode assists in generating even/odd register pairs on 128 bits. When the 
>> user 
>> specifies PTImode as an attribute, it breaks because there is no internal 
>> type 
>> to handle this mode. To address this, we have created a tree node with dummy 
>> type
>> to handle PTImode. We are not documenting this dummy type since users are not
>> allowed to use this type externally.
>>
>> 2024-03-21  Jeevitha Palanisamy  
>>
>> gcc/
>>  PR target/110411
>>  * config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add
>>  RS6000_BTI_INTPTI.
>>  * config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Add node
>>  for PTImode type.
>>
>> gcc/testsuite/
>>  PR target/106895
>>  * gcc.target/powerpc/pr106895.c: New testcase.
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
>> b/gcc/config/rs6000/rs6000-builtin.cc
>> index 6698274031b..f553c72779e 100644
>> --- a/gcc/config/rs6000/rs6000-builtin.cc
>> +++ b/gcc/config/rs6000/rs6000-builtin.cc
>> @@ -756,6 +756,15 @@ rs6000_init_builtins (void)
>>else
>>  ieee128_float_type_node = NULL_TREE;
>>  
>> +  /* PTImode to get even/odd register pairs.  */
>> +  intPTI_type_internal_node = make_node(INTEGER_TYPE);
>> +  TYPE_PRECISION (intPTI_type_internal_node) = GET_MODE_BITSIZE (PTImode);
>> +  layout_type (intPTI_type_internal_node);
>> +  SET_TYPE_MODE (intPTI_type_internal_node, PTImode);
>> +  t = build_qualified_type (intPTI_type_internal_node, TYPE_QUAL_CONST);
>> +  lang_hooks.types.register_builtin_type (intPTI_type_internal_node,
>> +  "__dummypti");
>> +
>>/* Vector pair and vector quad support.  */
>>vector_pair_type_node = make_node (OPAQUE_TYPE);
>>SET_TYPE_MODE (vector_pair_type_node, OOmode);
>> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
>> index 68bc45d65ba..b6078077b20 100644
>> --- a/gcc/config/rs6000/rs6000.h
>> +++ b/gcc/config/rs6000/rs6000.h
>> @@ -2302,6 +2302,7 @@ enum rs6000_builtin_type_index
>>RS6000_BTI_ptr_vector_quad,
>>RS6000_BTI_ptr_long_long,
>>RS6000_BTI_ptr_long_long_unsigned,
>> +  RS6000_BTI_INTPTI,
>>RS6000_BTI_MAX
>>  };
>>  
>> @@ -2346,6 +2347,7 @@ enum rs6000_builtin_type_index
>>  #define uintDI_type_internal_node
>> (rs6000_builtin_types[RS6000_BTI_UINTDI])
>>  #define intTI_type_internal_node 
>> (rs6000_builtin_types[RS6000_BTI_INTTI])
>>  #define uintTI_type_internal_node
>> (rs6000_builtin_types[RS6000_BTI_UINTTI])
>> +#define intPTI_type_internal_node
>> (rs6000_builtin_types[RS6000_BTI_INTPTI])
>>  #define float_type_internal_node 
>> (rs6000_builtin_types[RS6000_BTI_float])
>>  #define double_type_internal_node
>> (rs6000_builtin_types[RS6000_BTI_double])
>>  #define long_double_type_internal_node   
>> (rs6000_builtin_types[RS6000_BTI_long_double])
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106895.c 
>> b/gcc/testsuite/gcc.target/powerpc/pr106895.c
>> new file mode 100644
>> index 000..56547b7fa9d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr106895.c
>> @@ -0,0 +1,15 @@
>> +/* PR target/106895 */
>> +/* { dg-require-effective-target int128 } */
>> +/* { dg-options "-O2" } */
>> +
>> +/* Verify the following generates even/odd register pairs.  */
>> +
>> +typedef __int128 pti __attribute__((mode(PTI)));
>> +
>> +void
>> +set128 (pti val, pti *mem)
>> +{
>> +asm("stq %1,%0" : "=m"(*mem) : "r"(val));
>> +}
>> +
>> +/* { dg-final { scan-assembler "stq \[123\]?\[02468\]" } } */
>>
>>


Re: [PING^4][PATCH] rs6000: load high and low part of 128bit vector independently [PR110040]

2024-05-20 Thread jeevitha
Ping!

please review.

Thanks & Regards
Jeevitha


On 17/04/24 2:46 pm, jeevitha wrote:
> Ping!
> 
> please review.
> 
> Thanks & Regards
> Jeevitha
> 
> On 26/03/24 10:23 am, jeevitha wrote:
>> Ping!
>>
>> please review.
>>
>> Thanks & Regards
>> Jeevitha
>>
>>
>> On 26/02/24 11:13 am, jeevitha wrote:
>>> Hi All,
>>>
>>> The following patch has been bootstrapped and regtested on 
>>> powerpc64le-linux.
>>>
>>> PR110040 exposes an issue concerning moves from vector registers to GPRs.
>>> There are two moves, one for upper 64 bits and the other for the lower
>>> 64 bits.  In the problematic test case, we are only interested in storing
>>> the lower 64 bits.  However, the instruction for copying the upper 64 bits
>>> is still emitted and is dead code.  This patch adds a splitter that splits
>>> apart the two move instructions so that DCE can remove the dead code after
>>> splitting.
>>>
>>> 2024-02-26  Jeevitha Palanisamy  
>>>
>>> gcc/
>>> PR target/110040
>>> * config/rs6000/vsx.md (split pattern for V1TI to DI move): Defined.
>>>
>>> gcc/testsuite/
>>> PR target/110040
>>> * gcc.target/powerpc/pr110040-1.c: New testcase.
>>> * gcc.target/powerpc/pr110040-2.c: New testcase.
>>>
>>>
>>> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
>>> index 6111cc90eb7..78457f8fb14 100644
>>> --- a/gcc/config/rs6000/vsx.md
>>> +++ b/gcc/config/rs6000/vsx.md
>>> @@ -6706,3 +6706,19 @@
>>>"vmsumcud %0,%1,%2,%3"
>>>[(set_attr "type" "veccomplex")]
>>>  )
>>> +
>>> +(define_split
>>> +  [(set (match_operand:V1TI 0 "int_reg_operand")
>>> +   (match_operand:V1TI 1 "vsx_register_operand"))]
>>> +  "reload_completed
>>> +   && TARGET_DIRECT_MOVE_64BIT"
>>> +   [(pc)]
>>> +{
>>> +  rtx op0 = gen_rtx_REG (DImode, REGNO (operands[0]));
>>> +  rtx op1 = gen_rtx_REG (V2DImode, REGNO (operands[1]));
>>> +  rtx op2 = gen_rtx_REG (DImode, REGNO (operands[0]) + 1);
>>> +  rtx op3 = gen_rtx_REG (V2DImode, REGNO (operands[1]));
>>> +  emit_insn (gen_vsx_extract_v2di (op0, op1, GEN_INT (0)));
>>> +  emit_insn (gen_vsx_extract_v2di (op2, op3, GEN_INT (1)));
>>> +  DONE;
>>> +})
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr110040-1.c 
>>> b/gcc/testsuite/gcc.target/powerpc/pr110040-1.c
>>> new file mode 100644
>>> index 000..fb3bd254636
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr110040-1.c
>>> @@ -0,0 +1,14 @@
>>> +/* PR target/110040 */
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target powerpc_p9vector_ok } */
>>> +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
>>> +/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */
>>> +
>>> +#include 
>>> +
>>> +void
>>> +foo (signed long *dst, vector signed __int128 src)
>>> +{
>>> +  *dst = (signed long) src[0];
>>> +}
>>> +
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr110040-2.c 
>>> b/gcc/testsuite/gcc.target/powerpc/pr110040-2.c
>>> new file mode 100644
>>> index 000..f3aa22be4e8
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr110040-2.c
>>> @@ -0,0 +1,13 @@
>>> +/* PR target/110040 */
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target power10_ok } */
>>> +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
>>> +/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */
>>> +
>>> +#include 
>>> +
>>> +void
>>> +foo (signed int *dst, vector signed __int128 src)
>>> +{
>>> +  __builtin_vec_xst_trunc (src, 0, dst);
>>> +}
>>>
>>>


[PATCH 1/2] Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode.

2024-05-20 Thread liuhongt
When mask is ((1 << (prec - imm)) - 1), which clears the upper bits
of A, the expression can be simplified to LSHIFTRT.

i.e. simplify
(and:v8hi
  (ashiftrt:v8hi A 8)
  (const_vector 0xff x8))
to
(lshiftrt:v8hi A 8)
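
As a scalar sanity check of this identity (not part of the patch), the following C sketch models a single 16-bit lane and compares the two forms exhaustively. The helper names are made up for illustration, and the arithmetic shift is emulated by hand so the code does not depend on how a compiler shifts negative values.

```c
#include <stdint.h>

/* Arithmetic right shift of a 16-bit lane, written without relying on
   implementation-defined behavior of >> on negative values.  */
static uint16_t
ashiftrt16 (uint16_t x, unsigned imm)
{
  uint16_t logical = (uint16_t) (x >> imm);
  /* Replicate the sign bit into the IMM vacated upper bits.  */
  uint16_t sign_fill = (x & 0x8000u) ? (uint16_t) ~(0xffffu >> imm) : 0;
  return (uint16_t) (logical | sign_fill);
}

/* Logical right shift of a 16-bit lane.  */
static uint16_t
lshiftrt16 (uint16_t x, unsigned imm)
{
  return (uint16_t) (x >> imm);
}

/* Mask that keeps the low (prec - imm) bits: (1 << (prec - imm)) - 1.  */
static uint16_t
low_bits_mask16 (unsigned imm)
{
  return (uint16_t) ((1u << (16 - imm)) - 1);
}

/* Return 1 if (ashiftrt x imm) & mask == (lshiftrt x imm) for every
   16-bit x and every shift count 0 <= imm < 16, else 0.  */
int
and_ashiftrt_equals_lshiftrt (void)
{
  for (unsigned imm = 0; imm < 16; imm++)
    for (uint32_t x = 0; x <= 0xffffu; x++)
      if ((ashiftrt16 ((uint16_t) x, imm) & low_bits_mask16 (imm))
          != lshiftrt16 ((uint16_t) x, imm))
        return 0;
  return 1;
}
```

The mask removes exactly the bits that the arithmetic shift fills with copies of the sign bit, which is why only this specific mask value qualifies for the simplification.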

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

PR target/114428
* simplify-rtx.cc
(simplify_context::simplify_binary_operation_1):
Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for
specific mask.
---
 gcc/simplify-rtx.cc | 25 +
 1 file changed, 25 insertions(+)

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 53f54d1d392..6c91409200e 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -4021,6 +4021,31 @@ simplify_context::simplify_binary_operation_1 (rtx_code 
code,
return tem;
}
 
+  /* (and:v4si
+  (ashiftrt:v4si A 16)
+  (const_vector: 0xffff x4))
+is just (lshiftrt:v4si A 16).  */
+  if (VECTOR_MODE_P (mode) && GET_CODE (op0) == ASHIFTRT
+ && (CONST_INT_P (XEXP (op0, 1))
+ || (GET_CODE (XEXP (op0, 1)) == CONST_VECTOR
+ && CONST_VECTOR_DUPLICATE_P (XEXP (op0, 1))))
+ && GET_CODE (op1) == CONST_VECTOR
+ && CONST_VECTOR_DUPLICATE_P (op1))
+   {
+ unsigned HOST_WIDE_INT shift_count
+   = (CONST_INT_P (XEXP (op0, 1))
+  ? UINTVAL (XEXP (op0, 1))
+  : UINTVAL (XVECEXP (XEXP (op0, 1), 0, 0)));
+ unsigned HOST_WIDE_INT inner_prec
+   = GET_MODE_PRECISION (GET_MODE_INNER (mode));
+
+ /* Avoid UD shift count.  */
+ if (shift_count < inner_prec
+ && (UINTVAL (XVECEXP (op1, 0, 0))
+ == (HOST_WIDE_INT_1U << (inner_prec - shift_count)) - 1))
+   return simplify_gen_binary (LSHIFTRT, mode, XEXP (op0, 0), XEXP 
(op0, 1));
+   }
+
   tem = simplify_byte_swapping_operation (code, mode, op0, op1);
   if (tem)
return tem;
-- 
2.31.1



[PATCH 2/2] [x86] Adjust rtx_cost for MEM to enable more simplification

2024-05-20 Thread liuhongt
A CONST_VECTOR_DUPLICATE_P constant in the constant pool is loaded with
just a broadcast, or one of the variants in
ix86_vector_duplicate_simode_const.
Adjust its cost to COSTS_N_INSNS (2) + speed, which should be slightly
larger than the cost of a broadcast.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:
PR target/114428
* config/i386/i386.cc (ix86_rtx_costs): Adjust cost for
CONST_VECTOR_DUPLICATE_P in constant_pool.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr114428.c: New test.
---
 gcc/config/i386/i386-expand.cc   |  2 +-
 gcc/config/i386/i386-protos.h|  1 +
 gcc/config/i386/i386.cc  | 13 +
 gcc/testsuite/gcc.target/i386/pr114428.c | 18 ++
 4 files changed, 33 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr114428.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 4e16aedc5c1..d96c365e144 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -588,7 +588,7 @@ ix86_expand_move (machine_mode mode, rtx operands[])
 
 /* OP is a memref of CONST_VECTOR, return scalar constant mem
if CONST_VECTOR is a vec_duplicate, else return NULL.  */
-static rtx
+rtx
 ix86_broadcast_from_constant (machine_mode mode, rtx op)
 {
   int nunits = GET_MODE_NUNITS (mode);
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index dbc861fb1ea..90712769200 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -107,6 +107,7 @@ extern void ix86_expand_clear (rtx);
 extern void ix86_expand_move (machine_mode, rtx[]);
 extern void ix86_expand_vector_move (machine_mode, rtx[]);
 extern void ix86_expand_vector_move_misalign (machine_mode, rtx[]);
+extern rtx ix86_broadcast_from_constant (machine_mode, rtx);
 extern rtx ix86_fixup_binary_operands (enum rtx_code, machine_mode,
   rtx[], bool = false);
 extern void ix86_fixup_binary_operands_no_copy (enum rtx_code, machine_mode,
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b4838b7939e..fdd9343e47a 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -22197,6 +22197,19 @@ ix86_rtx_costs (rtx x, machine_mode mode, int 
outer_code_i, int opno,
   return true;
 
 case MEM:
+  /* CONST_VECTOR_DUPLICATE_P in constant_pool is just broadcast.
+or variants in ix86_vector_duplicate_simode_const.  */
+
+  if (GET_MODE_SIZE (mode) >= 16
+ && VECTOR_MODE_P (mode)
+ && SYMBOL_REF_P (XEXP (x, 0))
+ && CONSTANT_POOL_ADDRESS_P (XEXP (x, 0))
+ && ix86_broadcast_from_constant (mode, x))
+   {
+ *total = COSTS_N_INSNS (2) + speed;
+ return true;
+   }
+
   /* An insn that accesses memory is slightly more expensive
  than one that does not.  */
   if (speed)
diff --git a/gcc/testsuite/gcc.target/i386/pr114428.c 
b/gcc/testsuite/gcc.target/i386/pr114428.c
new file mode 100644
index 000..bbbc5a080f6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr114428.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=x86-64-v3 -mno-avx512f -O2" } */
+/* { dg-final { scan-assembler-not "vpsra[dw]" } } */
+
+void
+foo2 (char* __restrict a, short* b)
+{
+  for (int i = 0; i != 32; i++)
+a[i] = b[i] >> (short)8;
+}
+
+void
+foo3 (char* __restrict a, short* b)
+{
+  for (int i = 0; i != 16; i++)
+a[i] = b[i] >> (short)8;
+}
+
-- 
2.31.1



Re: [C PATCH, v2] Fix for redeclared enumerator initialized with different type [PR115109]

2024-05-20 Thread Martin Uecker
On Monday, 20.05.2024 at 21:30 +, Joseph Myers wrote:
> On Sun, 19 May 2024, Martin Uecker wrote:
> 
> > C23 specifies that the type of a redeclared enumerator is the one of the
> > previous declaration.  Convert initializers with different type accordingly
> > and add -Woverflow warning.
> 
> It doesn't make sense to use -Woverflow.  Either the value is the same (in 
> which case it fits in the desired type), or it's different (and you should 
> get the "conflicting redeclaration of enumerator" error or some equivalent 
> error, whether or not the value in the redeclaration fits in the previous 
> type).
> 
> Note that this includes both explicit values and values determined by 
> adding 1 implicitly.  E.g.
> 
>   enum e { A = 0, B = UINT_MAX };
>   enum e { B = UINT_MAX, A };
> 
> is not valid, because in the redefinition, A gets the value 1 greater than 
> UINT_MAX (which is not representable in unsigned int) - there is *not* an 
> addition in type unsigned int, or in type enum e.
> 
> The constraint violated is the general one "If an identifier has no 
> linkage, there shall be no more than one declaration of the identifier (in 
> a declarator or type specifier) with the same scope and in the same name 
> space, except that: ... enumeration constants and tags may be redeclared 
> as specified in 6.7.3.3 and 6.7.3.4, respectively." (where 6.7.3.3 says 
> "Enumeration constants can be redefined in the same scope with the same 
> value as part of a redeclaration of the same enumerated type." - as the 
> redefinition is not with the same value, the "as specified in 6.7.3.3" is 
> not satisfied and so the general constraint against redeclarations with no 
> linkage applies).

This assumes that the value in question is the one of the initializer and not
the one after initialization (with no clear rules how this works in this case),
which is probably not how this wording would be understood in other contexts.
But I agree that your interpretation is probably closer to what was intended
and makes more sense in this case.
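
To make the arithmetic in the quoted `enum e { B = UINT_MAX, A };` example concrete, here is a small C sketch (not compiler code, helper names invented for illustration). It contrasts the implicit "previous value plus one" computed as a mathematical integer with the same addition done in unsigned int. It assumes unsigned long long is wider than unsigned int.

```c
#include <limits.h>

/* Value an enumerator receives when it has no initializer: the previous
   enumerator's value plus one, computed as a mathematical integer rather
   than in unsigned int.  A wider type stands in for that here.  */
unsigned long long
implicit_next_value (unsigned int previous)
{
  return (unsigned long long) previous + 1;
}

/* What the result would be if the addition were (incorrectly) performed
   in unsigned int: it wraps around to 0.  */
unsigned int
wrapped_next_value (unsigned int previous)
{
  return previous + 1u;
}

/* Nonzero if VALUE is representable in unsigned int.  */
int
fits_unsigned_int (unsigned long long value)
{
  return value <= UINT_MAX;
}
```

Under the reading in the quoted text, the redeclared A gets the value from implicit_next_value (UINT_MAX), which is neither representable in unsigned int nor equal to the original 0, so the redeclaration is rejected rather than silently wrapped.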

Martin

> 



[PATCH] x86: Fix Logical Shift Issue in expand_vec_perm_psrlw_psllw_por [PR115146]

2024-05-20 Thread Levy Hsu
Replaced arithmetic shifts with logical shifts in 
expand_vec_perm_psrlw_psllw_por to avoid sign bit extension issues. Also 
corrected gen_vlshrv8hi3 to gen_lshrv8hi3 and gen_vashlv8hi3 to gen_ashlv8hi3.
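
The sign-extension problem can be reproduced on a single 16-bit lane in plain C. The sketch below is only an illustration with invented helper names, not the expander code: it swaps the two bytes of a lane with a right shift, a left shift and an OR, once with a logical shift and once with a hand-emulated arithmetic shift.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit lane with a logical right shift,
   a left shift and an OR (the psrlw/psllw/por idea).  */
uint16_t
swap_bytes_logical (uint16_t x)
{
  return (uint16_t) ((x >> 8) | (x << 8));
}

/* Same sequence, but with an emulated arithmetic right shift: the sign
   bit is copied into the vacated upper byte, which then corrupts the
   byte that the left shift was supposed to place there.  */
uint16_t
swap_bytes_arithmetic (uint16_t x)
{
  uint16_t shr = (uint16_t) (x >> 8);
  if (x & 0x8000u)
    shr |= 0xff00u;
  return (uint16_t) (shr | (uint16_t) (x << 8));
}
```

With the byte values 140 and 141 used by the new test (lane value 0x8d8c), the logical variant gives 0x8c8d while the arithmetic variant gives 0xff8d. Lanes whose high byte is below 0x80 are unaffected, which is presumably why the problem only shows up with element values above 127.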

Bootstrapped and tested on x86_64-linux-gnu, OK for trunk?

Co-authored-by: H.J. Lu 

gcc/ChangeLog:

PR target/115146
* config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): Replace
arithmetic shift gen_ashrv4hi3 with logical shift gen_lshrv4hi3.
Replace gen_vlshrv8hi3 with gen_lshrv8hi3 and gen_vashlv8hi3 with
gen_ashlv8hi3.

gcc/testsuite/ChangeLog:

PR target/115146
* g++.target/i386/pr107563-a.C: Append '-mno-sse3' to compile option
to avoid test failure on hosts with SSE3 support.
* g++.target/i386/pr107563-b.C: Append '-mno-sse3' to compile option
to avoid test failure on hosts with SSE3 support.
* gcc.target/i386/pr115146.c: New test.
---
 gcc/config/i386/i386-expand.cc |  6 ++--
 gcc/testsuite/g++.target/i386/pr107563-a.C |  4 +--
 gcc/testsuite/g++.target/i386/pr107563-b.C |  2 +-
 gcc/testsuite/gcc.target/i386/pr115146.c   | 37 ++
 4 files changed, 43 insertions(+), 6 deletions(-)
 create mode 100755 gcc/testsuite/gcc.target/i386/pr115146.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 4e16aedc5c1..945530d6481 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -22386,14 +22386,14 @@ expand_vec_perm_psrlw_psllw_por (struct 
expand_vec_perm_d *d)
   if (!TARGET_MMX_WITH_SSE)
return false;
   mode = V4HImode;
-  gen_shr = gen_ashrv4hi3;
+  gen_shr = gen_lshrv4hi3;
   gen_shl = gen_ashlv4hi3;
   gen_or = gen_iorv4hi3;
   break;
 case E_V16QImode:
   mode = V8HImode;
-  gen_shr = gen_vlshrv8hi3;
-  gen_shl = gen_vashlv8hi3;
+  gen_shr = gen_lshrv8hi3;
+  gen_shl = gen_ashlv8hi3;
   gen_or = gen_iorv8hi3;
   break;
 default: return false;
diff --git a/gcc/testsuite/g++.target/i386/pr107563-a.C 
b/gcc/testsuite/g++.target/i386/pr107563-a.C
index 605c1bdf814..c1c332bb948 100755
--- a/gcc/testsuite/g++.target/i386/pr107563-a.C
+++ b/gcc/testsuite/g++.target/i386/pr107563-a.C
@@ -1,8 +1,8 @@
 /* PR target/107563.C */
 /* { dg-do compile { target { ! ia32 } } } */
-/* { dg-options "-std=c++2b -O3 -msse2" } */
+/* { dg-options "-std=c++2b -O3 -msse2 -mno-sse3" } */
 /* { dg-final { scan-assembler-times "psllw" 1 } } */
-/* { dg-final { scan-assembler-times "psraw" 1 } } */
+/* { dg-final { scan-assembler-times "psrlw" 1 } } */
 /* { dg-final { scan-assembler-times "por" 1 } } */
 
 using temp_vec_type2 [[__gnu__::__vector_size__(8)]] = char;
diff --git a/gcc/testsuite/g++.target/i386/pr107563-b.C 
b/gcc/testsuite/g++.target/i386/pr107563-b.C
index 0ce3e8263bb..d5cc0300f46 100755
--- a/gcc/testsuite/g++.target/i386/pr107563-b.C
+++ b/gcc/testsuite/g++.target/i386/pr107563-b.C
@@ -1,5 +1,5 @@
 /* PR target/107563.C */
-/* { dg-options "-std=c++2b -O3 -msse2" } */
+/* { dg-options "-std=c++2b -O3 -msse2 -mno-sse3" } */
 /* { dg-final { scan-assembler-times "psllw" 1 } } */
 /* { dg-final { scan-assembler-times "psrlw" 1 } } */
 /* { dg-final { scan-assembler-times "por" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr115146.c 
b/gcc/testsuite/gcc.target/i386/pr115146.c
new file mode 100755
index 000..df7d0131968
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr115146.c
@@ -0,0 +1,38 @@
+/* { dg-do run { target sse2_runtime } } */
+/* { dg-options "-O2 -msse2" } */
+
+typedef unsigned char v8qi __attribute__((vector_size (8)));
+
+v8qi res, a;
+
+void __attribute__((noipa))
+foo (void)
+{
+  res = __builtin_shufflevector(a, a, 1, 0, 3, 2, 5, 4, 7, 6);
+}
+
+void
+comp (v8qi a, v8qi b, int n)
+{
+  for (unsigned i = 0; i < n; ++i)
+if ((a)[i] != (b)[i])
+  __builtin_abort ();
+}
+
+#define E0 140
+#define E1 141
+#define E2 142
+#define E3 143
+#define E4 144
+#define E5 145
+#define E6 146
+#define E7 147
+
+int main()
+{
+  a = (v8qi) { E0, E1, E2, E3, E4, E5, E6, E7 };
+  foo ();
+  comp (res, ((v8qi) { E1, E0, E3, E2, E5, E4, E7, E6 }), 8);
+  return 0;
+}
+
-- 
2.31.1



Re: [PATCH] PHIOPT: Don't transform minmax if middle bb contains a phi [PR115143]

2024-05-20 Thread Richard Biener
On Mon, May 20, 2024 at 11:37 PM Andrew Pinski (QUIC)
 wrote:
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Sunday, May 19, 2024 11:55 AM
> > To: Andrew Pinski (QUIC) 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH] PHIOPT: Don't transform minmax if
> > middle bb contains a phi [PR115143]
> >
> >
> >
> > > On 19.05.2024 at 01:12, Andrew Pinski wrote:
> > >
> > > The problem here is even if last_and_only_stmt returns a
> > statement,
> > > the bb might still contain a phi node which defines a ssa
> > name which
> > > is used in that statement so we need to add a check to make
> > sure that
> > > the phi nodes are empty for the middle bbs in both the
> > > `CMP?MINMAX:MINMAX` case and the `CMP?MINMAX:B`
> > cases.
> >
> > Is that single arg PHIs or do we have an extra edge into the
> > middle BB?  I think that might be unexpected, at least costing
> > wise.  Maybe Also to some of the replacement code we have ?
>
> It is only a single arg PHI since we already reject multiple edges in the
> middle BBs for these cases.
> It was EVRP that produced the single arg PHI in the original testcase from
> folding of a conditional to false; evrp does not do simple name prop in
> this case, and there is no pass in between evrp and phiopt that will clean up
> single arg PHIs.
> I added the GIMPLE based testcases basically to avoid depending on what
> previous passes could produce.
>
> >
> > > OK for trunk and backport to all open branches since r14-
> > 3827-g30e6ee074588ba was backported?
> > > Bootstrapped and tested on x86_64_linux-gnu with no
> > regressions.
> > >
> >
> > Ok
>
> Does this include the GCC 13 branch or should I wait until after the GCC 
> 13.3.0 release?

Please wait until after the release.

Thanks,
Richard.

> Thanks,
> Andrew Pinski
>
> >
> > Richard
> >
> > >PR tree-optimization/115143
> > >
> > > gcc/ChangeLog:
> > >
> > >* tree-ssa-phiopt.cc (minmax_replacement): Check for
> > empty
> > >phi nodes for middle bbs for the case where middle bb is
> > not empty.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >* gcc.c-torture/compile/pr115143-1.c: New test.
> > >* gcc.c-torture/compile/pr115143-2.c: New test.
> > >* gcc.c-torture/compile/pr115143-3.c: New test.
> > >
> > > Signed-off-by: Andrew Pinski 
> > > ---
> > > .../gcc.c-torture/compile/pr115143-1.c| 21
> > +
> > > .../gcc.c-torture/compile/pr115143-2.c| 30
> > +++
> > > .../gcc.c-torture/compile/pr115143-3.c| 29
> > ++
> > > gcc/tree-ssa-phiopt.cc| 12 
> > > 4 files changed, 92 insertions(+)
> > > create mode 100644 gcc/testsuite/gcc.c-
> > torture/compile/pr115143-1.c
> > > create mode 100644 gcc/testsuite/gcc.c-
> > torture/compile/pr115143-2.c
> > > create mode 100644 gcc/testsuite/gcc.c-
> > torture/compile/pr115143-3.c
> > >
> > > diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
> > > b/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
> > > new file mode 100644
> > > index 000..5cb119ea432
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
> > > @@ -0,0 +1,21 @@
> > > +/* PR tree-optimization/115143 */
> > > +/* This used to ICE.
> > > +   minmax part of phiopt would transform,
> > > +   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
> > > +   which was correct except b was defined by a phi in the
> > inner
> > > +   bb which was not handled. */
> > > +short a, d;
> > > +char b;
> > > +long c;
> > > +unsigned long e, f;
> > > +void g(unsigned long h) {
> > > +  if (c ? e : b)
> > > +if (e)
> > > +  if (d) {
> > > +a = f ? ({
> > > +  unsigned long i = d ? f : 0, j = e ? h : 0;
> > > +  i < j ? i : j;
> > > +}) : 0;
> > > +  }
> > > +}
> > > +
> > > diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
> > > b/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
> > > new file mode 100644
> > > index 000..05c3bbe9738
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
> > > @@ -0,0 +1,30 @@
> > > +/* { dg-options "-fgimple" } */
> > > +/* PR tree-optimization/115143 */
> > > +/* This used to ICE.
> > > +   minmax part of phiopt would transform,
> > > +   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
> > > +   which was correct except b was defined by a phi in the
> > inner
> > > +   bb which was not handled. */
> > > +unsigned __GIMPLE (ssa,startwith("phiopt")) foo (unsigned
> > a, unsigned
> > > +b) {
> > > +  unsigned j;
> > > +  unsigned _23;
> > > +  unsigned _12;
> > > +
> > > +  __BB(2):
> > > +  if (a_6(D) != 0u)
> > > +goto __BB3;
> > > +  else
> > > +goto __BB4;
> > > +
> > > +  __BB(3):
> > > +  j_10 = __PHI (__BB2: b_11(D));
> > > +  _23 = __MIN (a_6(D), j_10);
> > > +  goto __BB4;
> > > +
> > > +  __BB(4):
> > > +  _12 = __PHI (__BB3: _23, __BB2: 0u);  return _12;
> > > +
> > > +}
> > > diff

Re: [committed] PATCH for Re: Stepping down as maintainer for ARC and Epiphany

2024-05-20 Thread Richard Biener
On Mon, May 20, 2024 at 4:45 PM Gerald Pfeifer  wrote:
>
> On Wed, 5 Jul 2023, Joern Rennecke wrote:
> > I haven't worked with these targets in years and can't really do
> > sensible maintenance or reviews of patches for them. I am currently
> > working on optimizations for other ports like RISC-V.
>
> I noticed MAINTAINERS was not updated, so pushed the patch below.

That leaves the epiphany port unmaintained.  Should we automatically add such
ports to the list of obsoleted ports?

Richard.

> Gerald
>
>
> commit f94598ffaf5affbc9421ff230502357b07c55d9c
> Author: Gerald Pfeifer 
> Date:   Mon May 20 16:43:05 2024 +0200
>
> MAINTAINERS: Update Joern Rennecke's status
>
> This is per his mail to g...@gcc.gnu.org on 7 Jul 2023.
>
> ChangeLog:
> * MAINTAINERS: Move Joern Rennecke from arc and epiphany 
> maintainer
> to Write After Approval.
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 8e0add6bef8..e2870eef2ef 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -56,7 +56,6 @@ aarch64 port  Kyrylo Tkachov  
> 
>  alpha port Richard Henderson   
>  amdgcn portJulian Brown
>  amdgcn portAndrew Stubbs   
> -arc port   Joern Rennecke  
>  arc port   Claudiu Zissulescu  
>  arm port   Nick Clifton
>  arm port   Richard Earnshaw
> @@ -68,7 +67,6 @@ c6x port  Bernd Schmidt   
> 
>  cris port  Hans-Peter Nilsson  
>  c-sky port Xianmiao Qu 
>  c-sky port Yunhai Shang
> -epiphany port  Joern Rennecke  
>  fr30 port  Nick Clifton
>  frv port   Nick Clifton
>  frv port   Alexandre Oliva 
> @@ -634,6 +632,7 @@ Joe Ramsay  
> 
>  Rolf Rasmussen 
>  Fritz Reese
>  Volker Reichelt
> 
> +Joern Rennecke 
>  Bernhard Reutner-Fischer   
>  Tom Rix
>  Thomas Rodgers 


Re: [PATCH] match: Disable `(type)zero_one_valuep*CST` for 1bit signed types [PR115154]

2024-05-20 Thread Richard Biener
On Tue, May 21, 2024 at 12:02 AM Andrew Pinski  wrote:
>
> The problem here is the pattern added in r13-1162-g9991d84d2a8435
> assumes that it is well defined to multiply zero_one_valuep by the truncated
> converted integer constant. It is well defined for all types except for
> signed 1bit types, where `a * -1` is produced, which is undefined.
> So disable this pattern for 1bit signed types.
>
> Note the pattern added in r14-3432-gddd64a6ec3b38e is able to work around the
> undefinedness except when `-fsanitize=undefined` is turned on; this is why
> I added a testcase for that.
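
The overflow can be seen by modelling a signed 1-bit type in ordinary int arithmetic. The sketch below uses invented helper names and emulates the truncation by hand instead of using a real bit-field, so it does not depend on implementation-defined conversions.

```c
/* Truncate V to a signed 1-bit value by keeping only bit 0 and
   interpreting it as the sign bit.  The only results are 0 and -1.  */
int
to_signed_1bit (int v)
{
  return (v & 1) ? -1 : 0;
}

/* Nonzero if V is representable in a signed 1-bit type.  */
int
fits_signed_1bit (int v)
{
  return v == 0 || v == -1;
}

/* Original form: multiply in the wide type, then narrow.  */
int
narrow_after_mult (int zero_one, int cst)
{
  return to_signed_1bit (zero_one * cst);
}

/* Narrowed form: convert both operands first.  The product is returned
   in int so that the caller can check whether it still fits.  */
int
mult_after_narrow (int zero_one, int cst)
{
  return to_signed_1bit (zero_one) * to_signed_1bit (cst);
}
```

For t = 1 and the constant 5 from the testcases, the original form yields -1, while the narrowed form computes -1 * -1 = 1, which is not representable in a signed 1-bit type. That is the overflow the extra condition in the pattern avoids.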
>
> OK for trunk and gcc-14 and gcc-13 branches? Bootstrapped and tested on 
> x86_64-linux-gnu with no regressions.

OK for trunk and branches.  Please wait until after 13.3.

Richard.

> PR tree-optimization/115154
>
> gcc/ChangeLog:
>
> * match.pd (convert (mult zero_one_valued_p@1 INTEGER_CST@2)): Disable
> for 1bit signed types.
>
> gcc/testsuite/ChangeLog:
>
> * c-c++-common/ubsan/signed1bitfield-1.c: New test.
> * gcc.c-torture/execute/signed1bitfield-1.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/match.pd  |  6 +++--
>  .../c-c++-common/ubsan/signed1bitfield-1.c| 25 +++
>  .../gcc.c-torture/execute/signed1bitfield-1.c | 23 +
>  3 files changed, 52 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0f9c34fa897..35e3d82b131 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2395,12 +2395,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(mult (convert @0) @1)))
>
>  /* Narrow integer multiplication by a zero_one_valued_p operand.
> -   Multiplication by [0,1] is guaranteed not to overflow.  */
> +   Multiplication by [0,1] is guaranteed not to overflow except for
> +   1bit signed types.  */
>  (simplify
>   (convert (mult@0 zero_one_valued_p@1 INTEGER_CST@2))
>   (if (INTEGRAL_TYPE_P (type)
>&& INTEGRAL_TYPE_P (TREE_TYPE (@0))
> -  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@0)))
> +  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@0))
> +  && (TYPE_UNSIGNED (type) || TYPE_PRECISION (type) > 1))
> >(mult (convert @1) (convert @2))))
>
>  /* (X << C) != 0 can be simplified to X, when C is zero_one_valued_p.
> diff --git a/gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c 
> b/gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c
> new file mode 100644
> index 000..2ba8cf4dab0
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c
> @@ -0,0 +1,25 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fsanitize=undefined" } */
> +
> +/* PR tree-optimization/115154 */
> +/* This was being miscompiled with -fsanitize=undefined due to
> +   `(signed:1)(t*5)` being transformed into `-((signed:1)t)` which
> +   is undefined. */
> +
> +struct s {
> +  signed b : 1;
> +} f;
> +int i = 55;
> +__attribute__((noinline))
> +void check(int a)
> +{
> +if (!a)
> +__builtin_abort();
> +}
> +int main() {
> +int t = i != 5;
> +t = t*5;
> +f.b = t;
> +int tt = f.b;
> +check(f.b);
> +}
> diff --git a/gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c 
> b/gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c
> new file mode 100644
> index 000..ab888ca3a04
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c
> @@ -0,0 +1,23 @@
> +/* PR tree-optimization/115154 */
> +/* This was being miscompiled to `(signed:1)(t*5)`
> +   being transformed into `-((signed:1)t)` which is undefined.
> +   Note there is a pattern which removes the negative in some cases
> +   which works around the issue.  */
> +
> +struct {
> +  signed b : 1;
> +} f;
> +int i = 55;
> +__attribute__((noinline))
> +void check(int a)
> +{
> +if (!a)
> +__builtin_abort();
> +}
> +int main() {
> +int t = i != 5;
> +t = t*5;
> +f.b = t;
> +int tt = f.b;
> +check(f.b);
> +}
> --
> 2.43.0
>


Re: [PATCH] Don't reduce estimated unrolled size for innermost loop.

2024-05-20 Thread Richard Biener
On Tue, May 21, 2024 at 4:35 AM Hongtao Liu  wrote:
>
> On Wed, May 15, 2024 at 5:24 PM Richard Biener
>  wrote:
> >
> > On Wed, May 15, 2024 at 4:15 AM Hongtao Liu  wrote:
> > >
> > > On Mon, May 13, 2024 at 3:40 PM Richard Biener
> > >  wrote:
> > > >
> > > > On Mon, May 13, 2024 at 4:29 AM liuhongt  wrote:
> > > > >
> > > > > As testcase in the PR, O3 cunrolli may prevent vectorization for the
> > > > > innermost loop and increase register pressure.
> > > > > > The patch removes the 1/3 reduction of unr_insn for innermost loop
> > > > > > for UL_ALL.
> > > > > > ul != UL_ALL is needed since some small loop complete unrolling at O2
> > > > > > relies on the reduction.
> > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > > > No big impact for SPEC2017.
> > > > > Ok for trunk?
> > > >
> > > > > This removes the 1/3 reduction when unrolling a loop nest (the case I was
> > > > > concerned about).  Unrolling of a nest is by iterating in
> > > > > tree_unroll_loops_completely
> > > > > so the to be unrolled loop appears innermost.  So I think you need a new
> > > > > parameter on tree_unroll_loops_completely_1 indicating whether we're in the
> > > > > first iteration (or whether to assume inner most loops will "simplify").
> > > yes, it would be better.
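
To illustrate what the 1/3 reduction does to the size estimate, here is a deliberately simplified C model. It is not the code in tree-ssa-loop-ivcanon.cc: the struct, the function name and the assume_simplification flag are all hypothetical, and the real estimate takes more inputs into account.

```c
/* Hypothetical, simplified size summary of a loop body.  */
struct loop_size_sketch
{
  long overall;                /* Instructions in one iteration.  */
  long eliminated_by_peeling;  /* Instructions expected to fold away.  */
};

/* Estimate the instruction count after unrolling NUNROLL times.  When
   ASSUME_SIMPLIFICATION is nonzero the estimate is scaled by 2/3, i.e.
   reduced by one third, on the guess that later passes will simplify
   the unrolled copies.  */
long
estimated_unrolled_size_sketch (const struct loop_size_sketch *size,
                                long nunroll, int assume_simplification)
{
  long insns = nunroll * (size->overall - size->eliminated_by_peeling);
  if (assume_simplification)
    insns = insns * 2 / 3;
  return insns > 0 ? insns : 1;
}
```

With a body of 30 instructions, 6 of which fold away, and 16 iterations, the model gives 384 without the reduction and 256 with it. A loop can therefore fall on either side of an unrolling size limit depending only on whether the reduction is applied, which is the effect under discussion here.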
> > > >
> > > > Few comments below
> > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > PR tree-optimization/112325
> > > > > * tree-ssa-loop-ivcanon.cc (estimated_unrolled_size): Add 2
> > > > > new parameters: loop and ul, and remove unr_insns reduction
> > > > > for innermost loop.
> > > > > (try_unroll_loop_completely): Pass loop and ul to
> > > > > estimated_unrolled_size.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > * gcc.dg/tree-ssa/pr112325.c: New test.
> > > > > * gcc.dg/vect/pr69783.c: Add extra option --param
> > > > > max-completely-peeled-insns=300.
> > > > > ---
> > > > >  gcc/testsuite/gcc.dg/tree-ssa/pr112325.c | 57 
> > > > > 
> > > > >  gcc/testsuite/gcc.dg/vect/pr69783.c  |  2 +-
> > > > >  gcc/tree-ssa-loop-ivcanon.cc | 16 +--
> > > > >  3 files changed, 71 insertions(+), 4 deletions(-)
> > > > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr112325.c
> > > > >
> > > > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr112325.c 
> > > > > b/gcc/testsuite/gcc.dg/tree-ssa/pr112325.c
> > > > > new file mode 100644
> > > > > index 000..14208b3e7f8
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr112325.c
> > > > > @@ -0,0 +1,57 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
> > > > > +
> > > > > +typedef unsigned short ggml_fp16_t;
> > > > > +static float table_f32_f16[1 << 16];
> > > > > +
> > > > > +inline static float ggml_lookup_fp16_to_fp32(ggml_fp16_t f) {
> > > > > +unsigned short s;
> > > > > +__builtin_memcpy(&s, &f, sizeof(unsigned short));
> > > > > +return table_f32_f16[s];
> > > > > +}
> > > > > +
> > > > > +typedef struct {
> > > > > +ggml_fp16_t d;
> > > > > +ggml_fp16_t m;
> > > > > +unsigned char qh[4];
> > > > > +unsigned char qs[32 / 2];
> > > > > +} block_q5_1;
> > > > > +
> > > > > +typedef struct {
> > > > > +float d;
> > > > > +float s;
> > > > > +char qs[32];
> > > > > +} block_q8_1;
> > > > > +
> > > > > +void ggml_vec_dot_q5_1_q8_1(const int n, float * restrict s, const 
> > > > > void * restrict vx, const void * restrict vy) {
> > > > > +const int qk = 32;
> > > > > +const int nb = n / qk;
> > > > > +
> > > > > +const block_q5_1 * restrict x = vx;
> > > > > +const block_q8_1 * restrict y = vy;
> > > > > +
> > > > > +float sumf = 0.0;
> > > > > +
> > > > > +for (int i = 0; i < nb; i++) {
> > > > > +unsigned qh;
> > > > > +__builtin_memcpy(&qh, x[i].qh, sizeof(qh));
> > > > > +
> > > > > +int sumi = 0;
> > > > > +
> > > > > +for (int j = 0; j < qk/2; ++j) {
> > > > > +const unsigned char xh_0 = ((qh >> (j + 0)) << 4) & 0x10;
> > > > > +const unsigned char xh_1 = ((qh >> (j + 12)) ) & 0x10;
> > > > > +
> > > > > +const int x0 = (x[i].qs[j] & 0xF) | xh_0;
> > > > > +const int x1 = (x[i].qs[j] >> 4) | xh_1;
> > > > > +
> > > > > +sumi += (x0 * y[i].qs[j]) + (x1 * y[i].qs[j + qk/2]);
> > > > > +}
> > > > > +
> > > > > +sumf += (ggml_lookup_fp16_to_fp32(x[i].d)*y[i].d)*sumi + 
> > > > > ggml_lookup_fp16_to_fp32(x[i].m)*y[i].s;
> > > > > +}
> > > > > +
> > > > > +*s = sumf;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-tree-dump {(?n)Not unrolling loop [1-9] 
> > > > > \(--param max-completely-peel-times limit reached} "cunrolli"} } */
> > > > > diff --git a/gcc/testsuite/gcc.dg/vect/pr69783.c 
> > > > > b/gcc/testsuite/gcc.dg/vect/pr69783.c
> > > > > index 5df95d0ce4e..a1f75514d7

[PATCH] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

2024-05-20 Thread Haochen Jiang
Hi all,

Since vpermq is really slow, we should avoid using it when it is
the only instruction that could be used for ix86_expand_vecop_qihi2.

Bootstrapped and regtested on x86_64-pc-linux-gnu. Ok for trunk?

Thx,
Haochen

gcc/ChangeLog:

PR target/115069
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
Do not enable the optimization when AVX512BW is not enabled.
---
 gcc/config/i386/i386-expand.cc | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index a6132911e6a..f24c800bb4f 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -24323,6 +24323,11 @@ ix86_expand_vecop_qihi2 (enum rtx_code code, rtx dest, 
rtx op1, rtx op2)
   bool op2vec = GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT;
   bool uns_p = code != ASHIFTRT;
 
+  /* vpermq is slow and we should not fall into the optimization when
+ it is the only instruction to be selected.  */
+  if (!TARGET_AVX512BW)
+return false;
+
   if ((qimode == V16QImode && !TARGET_AVX2)
   || (qimode == V32QImode && (!TARGET_AVX512BW || !TARGET_EVEX512))
   /* There are no V64HImode instructions.  */
-- 
2.31.1



Re: [PATCH] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

2024-05-20 Thread Hongtao Liu
On Tue, May 21, 2024 at 2:16 PM Haochen Jiang  wrote:
>
> Hi all,
>
> Since vpermq is really slow, we should avoid using it when it is
> the only instruction that could be used for ix86_expand_vecop_qihi2.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu. Ok for trunk?
Please add a testcase for it.
>
> Thx,
> Haochen
>
> gcc/ChangeLog:
>
> PR target/115069
> * config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
> Do not enable the optimization when AVX512BW is not enabled.
> ---
>  gcc/config/i386/i386-expand.cc | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index a6132911e6a..f24c800bb4f 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -24323,6 +24323,11 @@ ix86_expand_vecop_qihi2 (enum rtx_code code, rtx 
> dest, rtx op1, rtx op2)
>bool op2vec = GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT;
>bool uns_p = code != ASHIFTRT;
>
> +  /* vpermq is slow and we should not fall into the optimization when
> + it is the only instruction to be selected.  */
> +  if (!TARGET_AVX512BW)
> +return false;
> +
>if ((qimode == V16QImode && !TARGET_AVX2)
>|| (qimode == V32QImode && (!TARGET_AVX512BW || !TARGET_EVEX512))
>/* There are no V64HImode instructions.  */
> --
> 2.31.1
>


-- 
BR,
Hongtao

