[PATCH] Var-Tracking: Leverage pointer_mux for decl_or_value

2023-05-10 Thread Pan Li via Gcc-patches
From: Pan Li 

decl_or_value is defined as void * before this PATCH. It holds either
a tree_node or an rtx_def. Unfortunately, given only a void pointer, we
cannot tell whether the input is a tree_node or an rtx_def.

We therefore have an implicit structure layout requirement similar to
the one below. Otherwise we will touch unreasonable bits when casting
void * to tree_node or rtx_def.

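+--------+-----------+----------+
| offset | tree_node | rtx_def  |
+--------+-----------+----------+
|      0 | code: 16  | code: 16 | <- require the location and bitsize
+--------+-----------+----------+
|     16 | ...       | mode: 8  |
+--------+-----------+----------+
| ...                           |
+--------+-----------+----------+
|     24 | ...       | ...      |
+--------+-----------+----------+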

This behavior blocks the PATCH that extends the rtx_def mode from 8 to
16 bits because we are running out of machine modes. This PATCH
introduces pointer_mux to tell whether the input is a tree_node or an
rtx_def, and decouples the above implicit dependency.

Signed-off-by: Pan Li 
Co-Authored-By: Richard Sandiford 
Co-Authored-By: Richard Biener 

gcc/ChangeLog:

* var-tracking.cc (DECL_OR_VALUE_OR_DEFAULT): New macro for
  clean code.
(dv_is_decl_p): Adjust type changes to pointer_mux.
(dv_as_decl): Likewise.
(dv_as_value): Likewise.
(dv_as_opaque): Likewise.
(variable_hasher::equal): Likewise.
(dv_from_decl): Likewise.
(dv_from_value): Likewise.
(shared_hash_find_slot_unshare_1): Likewise.
(shared_hash_find_slot_1): Likewise.
(shared_hash_find_slot_noinsert_1): Likewise.
(shared_hash_find_1): Likewise.
(unshare_variable): Likewise.
(vars_copy): Likewise.
(find_loc_in_1pdv): Likewise.
(find_mem_expr_in_1pdv): Likewise.
(dataflow_set_different): Likewise.
(variable_from_dropped): Likewise.
(variable_was_changed): Likewise.
(loc_exp_insert_dep): Likewise.
(notify_dependents_of_resolved_value): Likewise.
(vt_expand_loc_callback): Likewise.
(remove_value_from_changed_variables): Likewise.
(notify_dependents_of_changed_value): Likewise.
(emit_notes_for_differences_1): Likewise.
(emit_notes_for_differences_2): Likewise.
---
 gcc/var-tracking.cc | 119 +++-
 1 file changed, 74 insertions(+), 45 deletions(-)

diff --git a/gcc/var-tracking.cc b/gcc/var-tracking.cc
index fae0c73e02f..9bc9d21e5ba 100644
--- a/gcc/var-tracking.cc
+++ b/gcc/var-tracking.cc
@@ -116,9 +116,17 @@
 #include "fibonacci_heap.h"
 #include "print-rtl.h"
 #include "function-abi.h"
+#include "mux-utils.h"
 
 typedef fibonacci_heap  bb_heap_t;
 
+/* A declaration of a variable, or an RTL value being handled like a
+   declaration by pointer_mux.  */
+typedef pointer_mux<tree_node, rtx_def> decl_or_value;
+
+#define DECL_OR_VALUE_OR_DEFAULT(ptr) \
+  ((ptr) ? decl_or_value (ptr) : decl_or_value ())
+
 /* var-tracking.cc assumes that tree code with the same value as VALUE rtx code
has no chance to appear in REG_EXPR/MEM_EXPRs and isn't a decl.
Currently the value is the same as IDENTIFIER_NODE, which has such
@@ -196,15 +204,21 @@ struct micro_operation
 };
 
 
-/* A declaration of a variable, or an RTL value being handled like a
-   declaration.  */
-typedef void *decl_or_value;
-
 /* Return true if a decl_or_value DV is a DECL or NULL.  */
 static inline bool
 dv_is_decl_p (decl_or_value dv)
 {
-  return !dv || (int) TREE_CODE ((tree) dv) != (int) VALUE;
+  bool is_decl = !dv;
+
+  if (dv)
+    {
+      if (dv.is_first ())
+        is_decl = (int) TREE_CODE (dv.known_first ()) != (int) VALUE;
+      else if (!dv.is_first () && !dv.is_second ())
+        is_decl = true;
+    }
+
+  return is_decl;
 }
 
 /* Return true if a decl_or_value is a VALUE rtl.  */
@@ -219,7 +233,7 @@ static inline tree
 dv_as_decl (decl_or_value dv)
 {
   gcc_checking_assert (dv_is_decl_p (dv));
-  return (tree) dv;
+  return dv.is_first () ? dv.known_first () : NULL_TREE;
 }
 
 /* Return the value in the decl_or_value.  */
@@ -227,14 +241,20 @@ static inline rtx
 dv_as_value (decl_or_value dv)
 {
   gcc_checking_assert (dv_is_value_p (dv));
-  return (rtx)dv;
+  return dv.is_second () ? dv.known_second () : NULL_RTX;;
 }
 
 /* Return the opaque pointer in the decl_or_value.  */
 static inline void *
 dv_as_opaque (decl_or_value dv)
 {
-  return dv;
+  if (dv.is_first ())
+    return dv.known_first ();
+
+  if (dv.is_second ())
+    return dv.known_second ();
+
+  return NULL;
 }
 
 
@@ -503,9 +523,7 @@ variable_hasher::hash (const variable *v)
 inline bool
 variable_hasher::equal (const variable *v, const void *y)
 {
-  decl_or_value dv = CONST_CAST2 (decl_or_value, const void *, y);
-
-  return (dv_as_opaque (v->dv) == dv_as_opaque (dv));
+  return dv_as_opaque (v->dv) == y;
 }
 
 /* Free the element of VARIABLE_HTAB (its type is struct variable_def).  */
@@ -1396,8 +1414,7 @@ onepart_pool_al

RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-05-10 Thread Li, Pan2 via Gcc-patches
Filed the PATCH with var-tracking only as below, please help to review. Thanks!

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617973.html

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Li, Pan2 via Gcc-patches
Sent: Wednesday, May 10, 2023 1:09 PM
To: Richard Biener ; Richard Sandiford 

Cc: Jeff Law ; Kito Cheng ; 
juzhe.zh...@rivai.ai; gcc-patches ; palmer 
; jakub 
Subject: RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit

I just migrated var-tracking to pointer_mux; it works well even when the 
bit size of the tree_base code differs from that of the rtx_def code. I will 
prepare the PATCH if there is no surprise from the X86 bootstrap test.

Thanks Richard for pointing out the pointer_mux, 😉!

Pan 
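
The layout problem that this migration sidesteps can be modelled with plain
shifts and masks. The sketch below is only an illustration: real bit-field
layout is implementation-defined, the helper names (pack_old_rtx,
pack_new_rtx, read_as_tree_code) are hypothetical, the mode value 0x25 is
arbitrary, and it assumes that mode directly follows code in the header. The
field widths are the ones quoted further down in this message (rtx_def code
16 => 8 bits, mode 8 => 16 bits), and VALUE == 1 is taken from the thread.

```cpp
#include <cassert>
#include <cstdint>

// VALUE == 1 is stated in the thread; everything else here is a model.
constexpr std::uint32_t VALUE_CODE = 1;

// Old rtx_def-like header: code in bits 0-15, mode in bits 16-23.
constexpr std::uint32_t pack_old_rtx (std::uint32_t code, std::uint32_t mode)
{
  return (code & 0xffff) | ((mode & 0xff) << 16);
}

// Proposed rtx_def-like header: code in bits 0-7, mode in bits 8-23
// (assumption: mode immediately follows code).
constexpr std::uint32_t pack_new_rtx (std::uint32_t code, std::uint32_t mode)
{
  return (code & 0xff) | ((mode & 0xffff) << 8);
}

// What a TREE_CODE-style read inspects: always the low 16 bits.
constexpr std::uint32_t read_as_tree_code (std::uint32_t header)
{
  return header & 0xffff;
}
```

With the old layout, read_as_tree_code (pack_old_rtx (VALUE_CODE, 0x25))
yields VALUE_CODE again, so the tree-side check keeps working. With the new
layout the same read also picks up the low byte of the mode, so the
comparison against VALUE no longer means anything.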

-Original Message-
From: Li, Pan2
Sent: Tuesday, May 9, 2023 7:51 PM
To: Richard Biener ; Richard Sandiford 

Cc: Jeff Law ; Kito Cheng ; 
juzhe.zh...@rivai.ai; gcc-patches ; palmer 
; jakub 
Subject: RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit

Sure thing, I will have a try and keep you posted.

Pan

-Original Message-
From: Richard Biener 
Sent: Tuesday, May 9, 2023 6:26 PM
To: Richard Sandiford 
Cc: Li, Pan2 ; Jeff Law ; Kito Cheng 
; juzhe.zh...@rivai.ai; gcc-patches 
; palmer ; jakub 
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit

On Tue, 9 May 2023, Richard Sandiford wrote:

> "Li, Pan2"  writes:
> > After the bits patch like below.
> >
> > rtx_def code 16 => 8 bits.
> > rtx_def mode 8 => 16 bits.
> > tree_base code unchanged.
> >
> > The structure layout of both the rtx_def and tree_base will be something 
> > similar as below. As I understand, the lower 8-bits of tree_base will be 
> > inspected when 'dv' is a tree for the rtx conversion.
> >
> > tree_base               rtx_def
> > code: 16                code: 8
> > side_effects_flag: 1    mode: 16
> 
> I think we should try hard to avoid that though.  The 16-bit value 
> should be aligned to 16 bits if at all possible.  decl_or_value 
> doesn't seem like something that should be dictating our approach here.
> 
> Perhaps we can use pointer_mux for decl_or_value instead?  pointer_mux 
> is intended to be a standards-compliant (hah!) way of switching 
> between two pointer types in a reasonably efficient way.

Ah, I wasn't aware of that - yes, that looks good to use I think.

Pan, can you prepare a patch only doing such conversion of the var-tracking 
decl_or_value type?  Aka make it

typedef pointer_mux<tree_node, rtx_def> decl_or_value;

and adjust uses?

Thanks,
Richard.
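
As a rough illustration of the idea behind that suggestion, here is a
minimal sketch of a two-way tagged pointer. It is not GCC's pointer_mux:
tiny_mux, fake_tree, fake_rtx, fake_dv and classify are hypothetical names
made up for this example, and the sketch assumes both pointee types are at
least 2-byte aligned so that bit 0 of the address is free to record which
type is stored.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for tree_node and rtx_def; only alignment matters.
struct alignas (8) fake_tree { int code; };
struct alignas (8) fake_rtx { int code; };

// Simplified two-way tagged pointer (an illustration, not mux-utils.h).
template<typename T1, typename T2>
class tiny_mux
{
public:
  tiny_mux () : m_bits (0) {}
  tiny_mux (T1 *p) : m_bits (reinterpret_cast<std::uintptr_t> (p)) {}
  // Set bit 0 to mark "this is the second type".
  tiny_mux (T2 *p) : m_bits (reinterpret_cast<std::uintptr_t> (p) | 1) {}

  explicit operator bool () const { return m_bits != 0; }
  bool is_first () const { return (m_bits & 1) == 0; }
  bool is_second () const { return (m_bits & 1) != 0; }
  T1 *known_first () const { return reinterpret_cast<T1 *> (m_bits); }
  // Clear the tag bit before converting back to a pointer.
  T2 *known_second () const
  { return reinterpret_cast<T2 *> (m_bits & ~std::uintptr_t (1)); }

private:
  std::uintptr_t m_bits;
};

typedef tiny_mux<fake_tree, fake_rtx> fake_dv;

// Returns 'd' for a decl (or null) and 'v' for a value, without ever
// looking inside the pointee.
inline char classify (fake_dv dv) { return dv.is_first () ? 'd' : 'v'; }
```

The point of the design is that classify never dereferences the pointer, so
the two pointee types no longer need a common prefix layout. In this sketch
a default-constructed fake_dv counts as "first", which matches the
expectation stated elsewhere in this thread that a null decl_or_value is
treated as a decl.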

> Thanks,
> Richard
> 
> > constant_flag: 1
> > addressable_flag: 1
> > volatile_flag: 1
> > readonly_flag: 1
> > asm_written_flag: 1
> > nowarning_flag: 1
> > visited: 1
> > used_flag: 1
> > nothrow_flag: 1
> > static_flag: 1
> > public_flag: 1
> > private_flag: 1
> > protected_flag: 1
> > deprecated_flag: 1
> > default_def_flag: 1
> >
> > I have tried an approach similar to the one you mentioned (as below), i.e. 
> > shrinking tree_code to keep the 1:1 overlap with rtx_code, and completed 
> > one memory-allocated-bytes test in another email.
> >
> > rtx_def code 16 => 12 bits.
> > rtx_def mode 8 => 12 bits.
> > tree_base code 16 => 12 bits.
> >
> > Pan
> >
> > -Original Message-
> > From: Richard Biener 
> > Sent: Monday, May 8, 2023 3:38 PM
> > To: Li, Pan2 
> > Cc: Jeff Law ; Kito Cheng 
> > ; juzhe.zh...@rivai.ai; richard.sandiford 
> > ; gcc-patches ; 
> > palmer ; jakub 
> > Subject: RE: [PATCH] machine_mode type size: Extend enum size from 
> > 8-bit to 16-bit
> >
> > On Mon, 8 May 2023, Li, Pan2 wrote:
> >
> >> return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE; } is able 
> >> to fix this ICE after mode bits change.
> >
> > Can you check which bits this will inspect when 'dv' is a tree after your 
> > patch?  VALUE is 1 and would map to IDENTIFIER_NODE on the tree side when 
> > there was a 1:1 overlap.
> >
> > I think for all cases but struct loc_exp_dep we could find a bit to record 
> > whether we deal with a VALUE or a decl, but for loc_exp_dep it's going to be 
> > difficult (unless we start to take bits from pointer representations).
> >
> > That said, I agree with Jeff that the code is ugly, but a simplistic 
> > conversion isn't what we want.
> >
> > An alternative "solution" might be to also shrink tree_code when we shrink 
> > rtx_code and keep the 1:1 overlap.
> >
> > Richard.
> >
> >> I will re-trigger the memory allocate bytes test with below changes 
> >> for X86.
> >> 
> >> rtx_def code 16 => 8 bits.
> >> rtx_def mode 8 => 16 bits.
> >> tree_base code unchanged.
> >> 
> >> Pan
> >> 
> >> -Original Message-
> >> From: Li, Pan2
> >> Sent: Monday, May 8, 2023 2:42 PM
> >> To: Richard Biener ; Jeff Law 
> >> 
> >> Cc: Kito Cheng ; juzhe.zh...@rivai.ai; 
> >> richard.sandiford ; gcc-patches 
> >> ; palmer ; jakub 
> >> 
> >> Subject: RE: [PATCH] machine_mode type size: Extend enum size from 
> >> 8-bit to 16-bit
> >> 
> >> Oops. Actually I am patching a ver

Re: [PATCH] libffi: fix handling of homogeneous float128 structs [PR109447]

2023-05-10 Thread Andreas Schwab
On May 09 2023, Peter Bergner via Gcc-patches wrote:

> On 5/9/23 3:50 PM, Andreas Schwab wrote:
>> On May 09 2023, Peter Bergner via Gcc-patches wrote:
>> 
>>> It's almost as if the top level build machinery
>>> adds a LD_LIBRARY_PATH=...
>> 
>> See how the toplevel Makefile sets LD_LIBRARY_PATH (via RPATH_ENVVAR) if
>> gcc-bootstrap is set.
>
> I'm sorry to be dense, but can you point to the specific line?  In my
> $GCC_BUILD/Makefile, the only mention of LD_LIBRARY_PATH is:
>
>   RPATH_ENVVAR = LD_LIBRARY_PATH
>
> ...so that isn't setting LD_LIBRARY_PATH, but using it.

Have you considered searching for uses of RPATH_ENVVAR?

$ grep RPATH_ENVVAR Makefile.in 
RPATH_ENVVAR = @RPATH_ENVVAR@
# On targets where RPATH_ENVVAR is PATH, a subdirectory of the GCC build path
$(RPATH_ENVVAR)=`echo "$(TARGET_LIB_PATH)$$$(RPATH_ENVVAR)" | sed 
's,::*,:,g;s,^:*,,;s,:*$$,,'`; export $(RPATH_ENVVAR); \
$(RPATH_ENVVAR)=`echo "$(HOST_LIB_PATH)$$$(RPATH_ENVVAR)" | sed 
's,::*,:,g;s,^:*,,;s,:*$$,,'`; export $(RPATH_ENVVAR);
$(RPATH_ENVVAR)=`echo "$(TARGET_LIB_PATH)$$$(RPATH_ENVVAR)" | sed 
's,::*,:,g;s,^:*,,;s,:*$$,,'`; export $(RPATH_ENVVAR); \
$(RPATH_ENVVAR)=`echo "$(HOST_LIB_PATH)$$$(RPATH_ENVVAR)" | sed 
's,::*,:,g;s,^:*,,;s,:*$$,,'`; export $(RPATH_ENVVAR); \
# This is the list of directories that may be needed in RPATH_ENVVAR
# This is the list of directories that may be needed in RPATH_ENVVAR
"RPATH_ENVVAR=$(RPATH_ENVVAR)" \
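
For readers unfamiliar with the sed expression in those recipes
(s,::*,:,g;s,^:*,,;s,:*$,, once make has reduced "$$" to "$"), the following
sketch reproduces its effect on a colon-separated path list: runs of colons
collapse to one, and leading and trailing colons are dropped. The function
names normalize_path_list and prepend_lib_path are hypothetical, and the
assumption that the library path ends in a colon before being concatenated
with the old value is mine, not something shown in the grep output.

```cpp
#include <cassert>
#include <string>

// Collapse repeated ':' and strip leading/trailing ':' from a path list.
inline std::string normalize_path_list (const std::string &in)
{
  std::string out;
  for (char c : in)
    {
      // Skip a colon that would start the string or repeat the previous one.
      if (c == ':' && (out.empty () || out.back () == ':'))
        continue;
      out += c;
    }
  // At most one trailing colon can remain after collapsing; drop it.
  if (!out.empty () && out.back () == ':')
    out.pop_back ();
  return out;
}

// Model of "$(TARGET_LIB_PATH)$$$(RPATH_ENVVAR)" piped through sed:
// the build's library directories go in front of the existing value.
inline std::string prepend_lib_path (const std::string &lib_path,
                                     const std::string &current)
{
  return normalize_path_list (lib_path + current);
}
```

So if the environment variable was empty beforehand, the result is just the
build's library directories with no stray empty entry at the end.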

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH 2/2] aarch64: Improve register allocation for lane instructions

2023-05-10 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Wed, May 10, 2023 at 12:05 AM Richard Sandiford via Gcc-patches
>  wrote:
>>
>> Andrew Pinski  writes:
>> >  On Tue, May 9, 2023 at 11:02 AM Richard Sandiford via Gcc-patches
>> >  wrote:
>> >>
>> >> REG_ALLOC_ORDER is much less important than it used to be, but it
>> >> is still used as a tie-breaker when multiple registers in a class
>> >> are equally good.
>> >
>> > This was tried before but had to be reverted. I have not looked into
>> > the history on why though.
>> > Anyways it was recorded as https://gcc.gnu.org/PR63521.
>>
>> It looks like that was about the traditional use of REG_ALLOC_ORDER:
>> putting call-clobbered registers first and defining
>> HONOR_REG_ALLOC_ORDER to make order trump IRA's usual costing.
>> We definitely don't want to do that, for the reasons described in the
>> patch and that Richard gave in comment 2.  (IRA already accounts for
>> call-preservedness.  It also accounts for conflicts with argument
>> registers, so putting those in reverse order shouldn't be necessary.)
>>
>> The aim here is different: to keep REG_ALLOC_ORDER as a pure tiebreaker,
>> but to avoid eating into restricted FP register classes when we don't
>> need to.
>
> I wonder if IRA/LRA could do this on its own - when a register belongs
> to multiple
> register classes and there's choice between two being in N and M register
> classes prefer the register that's in fewer register classes?  I mean,
> that's your
> logic - choose a register that leaves maximum freedom of use for the remaining
> registers?

Yeah, I wondered about that.  But the problem is that targets
tend to define classes for special purposes.  E.g. aarch64 has
TAILCALL_ADDR_REGS, which contains just x16 and x17.  But that class
is only used for the address in an indirect sibling call.  Something
that niche shouldn't affect the allocation of ordinary GPRs.

I also think it would be hard for a target-independent algorithm to do
a good job with the x86 register classes.

So in the end it seemed like some target-dependent knowledge was needed
to determine which classes are important enough and which aren't.

Thanks,
Richard


Testsuite: Add 'torture-init-done', and use it to conditionalize implicit 'torture-init' (was: Testsuite: Add missing 'torture-init'/'torture-finish' around 'LTO_TORTURE_OPTIONS' usage (was: Let each

2023-05-10 Thread Thomas Schwinge
Hi Christophe!

On 2023-05-09T21:14:07+0200, Christophe Lyon  wrote:
> On Tue, 9 May 2023 at 17:17, Christophe Lyon 
> wrote:
>> On Tue, 9 May 2023 at 11:00, Thomas Schwinge 
>> wrote:
>>> On 2023-05-09T09:32:55+0200, Christophe Lyon 
>>> wrote:
>>> > On Wed, 3 May 2023 at 13:47, Richard Biener via Gcc-patches <
>>> gcc-patches@gcc.gnu.org> wrote:
>>> >> On Wed, 3 May 2023, Thomas Schwinge wrote:
>>> >> > "Let each 'lto_init' determine the default 'LTO_OPTIONS', and
>>> 'torture-init' the 'LTO_TORTURE_OPTIONS'"?
>>> >
>>> > This is causing issues on arm/aarch64, including:
>>> >
>>> > ERROR: can't read "LTO_TORTURE_OPTIONS": no such variable
>>> > in gcc.target/arm/acle/acle.exp:
>>> >
>>> > ERROR: torture-init: LTO_TORTURE_OPTIONS is not empty as expected
>>> > in gcc.target/aarch64/sls-mitigation/sls-mitigation.exp,
>>> > gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp,
>>> > gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp,
>>> > gcc.target/aarch64/torture/aarch64-torture.exp
>>> >
>>> > and maybe others
>>> >
>>> > Are other targets affected too?
>>>
>>> Sorry for that -- it means, the safe-guards I added are working as
>>> expected.
>>>
>>> Please test whether all these issues are gone with the attached
>>> "Testsuite: Add missing 'torture-init'/'torture-finish' around
>>> 'LTO_TORTURE_OPTIONS' usage"?
>>
>> Your patch seemed reasonable,  but it doesn't work :-(
>>
>> Well now I get:
>> ERROR: torture-init: LTO_TORTURE_OPTIONS is not empty as expected
>> because gcc-dg-runtest itself calls torture-init
>>
>> but I'm not sure where LTO_TORTURE_OPTIONS is set
>
> Just checking, are you able to test your changes on arm (a cross toolchain
> is OK) ?

Sorry, I don't currently have an arm/aarch64 toolchain built.

> The problem shows up even if running only acle.exp, so it's quick once you
> have built the toolchain once.

I did a quick hack:

--- gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-mitigation.exp
+++ gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-mitigation.exp
@@ -22,3 +21,0 @@
-if {![istarget aarch64*-*-*] } then {
-  return
-}
--- gcc/testsuite/gcc.target/arm/acle/acle.exp
+++ gcc/testsuite/gcc.target/arm/acle/acle.exp
@@ -20,3 +19,0 @@
-if ![istarget arm*-*-*] then {
-  return
-}

..., and confirm to run into the DejaGnu/TCL ERRORs in my
x86_64-pc-linux-gnu testing.

> I spent some time looking at it, and the conflict is that the .exp file
> calls torture-init and gcc-dg-runtest, which in turn calls torture-init
> again, leading to the error.

I see, thanks -- and sorry, once again.

> I haven't checked the details of why there are similar failures on aarch64.

I now understand that the problem is the following: most '*.exp'
files have 'torture-init' followed by 'set-torture-options' before
'gcc-dg-runtest' etc., and therefore don't run into the latter's
"Some callers set torture options themselves; don't override those."
code.  Some '*.exp' files, however, do 'torture-init' but not
'set-torture-options', and therefore we can no longer conditionalize
the implicit 'torture-init' on '![torture-options-exist]'.
Please in addition to the earlier
"Testsuite: Add missing 'torture-init'/'torture-finish' around 
'LTO_TORTURE_OPTIONS' usage"
also apply the attached
"Testsuite: Add 'torture-init-done', and use it to conditionalize implicit 
'torture-init'".
That hopefully should restore sanity -- if not, I'll get arm/aarch64
toolchains built.
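
The interaction described above can be modelled in a few lines. This is only
a sketch of the control flow, written in C++ rather than Tcl: torture_state
and its members are hypothetical names, and a single boolean stands in for
the real "LTO_TORTURE_OPTIONS is not empty as expected" check.

```cpp
#include <cassert>
#include <stdexcept>

struct torture_state
{
  bool init_done = false;      // models what 'torture-init-done' reports
  bool options_exist = false;  // models what 'torture-options-exist' reports

  // Non-idempotent on purpose: a second call is an error.
  void torture_init ()
  {
    if (init_done)
      throw std::logic_error ("torture-init: already initialized");
    init_done = true;
  }

  void set_torture_options () { options_exist = true; }

  // Old guard: implicit init unless the caller set torture options.
  void runtest_old_guard ()
  {
    if (!options_exist)
      torture_init ();
  }

  // New guard: implicit init unless init already happened.
  void runtest_new_guard ()
  {
    if (!init_done)
      torture_init ();
  }
};
```

A caller that does torture_init () without set_torture_options () trips the
old guard, because the guard asks the wrong question; the new guard asks
directly whether initialization has already happened.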


Regards
 Thomas


From 4a069073834c5b710e17315c0844f1212fa54164 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 10 May 2023 09:17:47 +0200
Subject: [PATCH] Testsuite: Add 'torture-init-done', and use it to
 conditionalize implicit 'torture-init'

Recent commit d6654a4be3ba44c0d57be7c8a51d76d9721345e1
"Let each 'lto_init' determine the default 'LTO_OPTIONS', and 'torture-init' the 'LTO_TORTURE_OPTIONS'"
made 'torture-init' non-idempotent re 'LTO_TORTURE_OPTIONS', in order to catch
certain classes of errors.  Now, most '*.exp' files have 'torture-init'
followed by 'set-torture-options' before 'gcc-dg-runtest' etc., and therefore
don't run into the latter's
"Some callers set torture options themselves; don't override those." code.
Some '*.exp' files however do 'torture-init' but not 'set-torture-options', and
therefore we can't any longer conditionalize the implicit 'torture-init' by
'![torture-options-exist]'.

	gcc/testsuite/
	* lib/torture-options.exp (torture-init-done): Add.
	* lib/gcc-dg.exp (gcc-dg-runtest): Use it to conditionalize
	implicit 'torture-init'.
	* lib/gfortran-dg.exp (gfortran-dg-runtest): Likewise.
	* lib/obj-c++-dg.exp (obj-c++-dg-runtest): Likewise.
	* lib/objc-dg.exp (objc-dg-runtest): Lik

Re: [PATCH] Var-Tracking: Leverage pointer_mux for decl_or_value

2023-05-10 Thread Richard Biener via Gcc-patches
On Wed, 10 May 2023, pan2...@intel.com wrote:

> From: Pan Li 
> 
> decl_or_value is defined as void * before this PATCH. It holds either
> a tree_node or an rtx_def. Unfortunately, given only a void pointer, we
> cannot tell whether the input is a tree_node or an rtx_def.
> 
> We therefore have an implicit structure layout requirement similar to
> the one below. Otherwise we will touch unreasonable bits when casting
> void * to tree_node or rtx_def.
> 
> +--------+-----------+----------+
> | offset | tree_node | rtx_def  |
> +--------+-----------+----------+
> |      0 | code: 16  | code: 16 | <- require the location and bitsize
> +--------+-----------+----------+
> |     16 | ...       | mode: 8  |
> +--------+-----------+----------+
> | ...                           |
> +--------+-----------+----------+
> |     24 | ...       | ...      |
> +--------+-----------+----------+
> 
> This behavior blocks the PATCH that extends the rtx_def mode from 8 to
> 16 bits because we are running out of machine modes. This PATCH
> introduces pointer_mux to tell whether the input is a tree_node or an
> rtx_def, and decouples the above implicit dependency.
> 
> Signed-off-by: Pan Li 
> Co-Authored-By: Richard Sandiford 
> Co-Authored-By: Richard Biener 
> 
> gcc/ChangeLog:
> 
>   * var-tracking.cc (DECL_OR_VALUE_OR_DEFAULT): New macro for
> clean code.
>   (dv_is_decl_p): Adjust type changes to pointer_mux.
>   (dv_as_decl): Likewise.
>   (dv_as_value): Likewise.
>   (dv_as_opaque): Likewise.
>   (variable_hasher::equal): Likewise.
>   (dv_from_decl): Likewise.
>   (dv_from_value): Likewise.
>   (shared_hash_find_slot_unshare_1): Likewise.
>   (shared_hash_find_slot_1): Likewise.
>   (shared_hash_find_slot_noinsert_1): Likewise.
>   (shared_hash_find_1): Likewise.
>   (unshare_variable): Likewise.
>   (vars_copy): Likewise.
>   (find_loc_in_1pdv): Likewise.
>   (find_mem_expr_in_1pdv): Likewise.
>   (dataflow_set_different): Likewise.
>   (variable_from_dropped): Likewise.
>   (variable_was_changed): Likewise.
>   (loc_exp_insert_dep): Likewise.
>   (notify_dependents_of_resolved_value): Likewise.
>   (vt_expand_loc_callback): Likewise.
>   (remove_value_from_changed_variables): Likewise.
>   (notify_dependents_of_changed_value): Likewise.
>   (emit_notes_for_differences_1): Likewise.
>   (emit_notes_for_differences_2): Likewise.
> ---
>  gcc/var-tracking.cc | 119 +++-
>  1 file changed, 74 insertions(+), 45 deletions(-)
> 
> diff --git a/gcc/var-tracking.cc b/gcc/var-tracking.cc
> index fae0c73e02f..9bc9d21e5ba 100644
> --- a/gcc/var-tracking.cc
> +++ b/gcc/var-tracking.cc
> @@ -116,9 +116,17 @@
>  #include "fibonacci_heap.h"
>  #include "print-rtl.h"
>  #include "function-abi.h"
> +#include "mux-utils.h"
>  
>  typedef fibonacci_heap  bb_heap_t;
>  
> +/* A declaration of a variable, or an RTL value being handled like a
> +   declaration by pointer_mux.  */
> +typedef pointer_mux<tree_node, rtx_def> decl_or_value;
> +
> +#define DECL_OR_VALUE_OR_DEFAULT(ptr) \
> +  ((ptr) ? decl_or_value (ptr) : decl_or_value ())
> +
>  /* var-tracking.cc assumes that tree code with the same value as VALUE rtx 
> code
> has no chance to appear in REG_EXPR/MEM_EXPRs and isn't a decl.
> Currently the value is the same as IDENTIFIER_NODE, which has such
> @@ -196,15 +204,21 @@ struct micro_operation
>  };
>  
>  
> -/* A declaration of a variable, or an RTL value being handled like a
> -   declaration.  */
> -typedef void *decl_or_value;
> -
>  /* Return true if a decl_or_value DV is a DECL or NULL.  */
>  static inline bool
>  dv_is_decl_p (decl_or_value dv)
>  {
> -  return !dv || (int) TREE_CODE ((tree) dv) != (int) VALUE;
> +  bool is_decl = !dv;
> +
> +  if (dv)
> +{
> +  if (dv.is_first ())
> + is_decl = (int) TREE_CODE (dv.known_first ()) != (int) VALUE;
> +  else if (!dv.is_first () && !dv.is_second ())
> + is_decl = true;
> +}
> +
> +  return is_decl;

This all looks very confused, shouldn't it just be

 return dv.is_first ();

?  All the keying on VALUE should no longer be necessary.

>  }
>  
>  /* Return true if a decl_or_value is a VALUE rtl.  */
> @@ -219,7 +233,7 @@ static inline tree
>  dv_as_decl (decl_or_value dv)
>  {
>gcc_checking_assert (dv_is_decl_p (dv));
> -  return (tree) dv;
> +  return dv.is_first () ? dv.known_first () : NULL_TREE;

and this should be

 return dv.known_first ();

?  (knowing that ptr-mux will not mutate 'first' and thus preserves
a nullptr there)

>  }
>  
>  /* Return the value in the decl_or_value.  */
> @@ -227,14 +241,20 @@ static inline rtx
>  dv_as_value (decl_or_value dv)
>  {
>gcc_checking_assert (dv_is_value_p (dv));
> -  return (rtx)dv;
> +  return dv.is_second () ? dv.known_second () : NULL_RTX;;

return dv.known_second ();  (the assert makes sure it isn't nullptr)

>  }
>  
>  /* Return the opaque pointer in the decl_or_value.  */
>  static inline void 

Re: [PATCH] Var-Tracking: Leverage pointer_mux for decl_or_value

2023-05-10 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Wed, 10 May 2023, pan2...@intel.com wrote:
>
>> From: Pan Li 
>> 
>> decl_or_value is defined as void * before this PATCH. It holds either
>> a tree_node or an rtx_def. Unfortunately, given only a void pointer, we
>> cannot tell whether the input is a tree_node or an rtx_def.
>> 
>> We therefore have an implicit structure layout requirement similar to
>> the one below. Otherwise we will touch unreasonable bits when casting
>> void * to tree_node or rtx_def.
>> 
>> +--------+-----------+----------+
>> | offset | tree_node | rtx_def  |
>> +--------+-----------+----------+
>> |      0 | code: 16  | code: 16 | <- require the location and bitsize
>> +--------+-----------+----------+
>> |     16 | ...       | mode: 8  |
>> +--------+-----------+----------+
>> | ...                           |
>> +--------+-----------+----------+
>> |     24 | ...       | ...      |
>> +--------+-----------+----------+
>> 
>> This behavior blocks the PATCH that extends the rtx_def mode from 8 to
>> 16 bits because we are running out of machine modes. This PATCH
>> introduces pointer_mux to tell whether the input is a tree_node or an
>> rtx_def, and decouples the above implicit dependency.
>> 
>> Signed-off-by: Pan Li 
>> Co-Authored-By: Richard Sandiford 
>> Co-Authored-By: Richard Biener 
>> 
>> gcc/ChangeLog:
>> 
>>  * var-tracking.cc (DECL_OR_VALUE_OR_DEFAULT): New macro for
>>clean code.
>>  (dv_is_decl_p): Adjust type changes to pointer_mux.
>>  (dv_as_decl): Likewise.
>>  (dv_as_value): Likewise.
>>  (dv_as_opaque): Likewise.
>>  (variable_hasher::equal): Likewise.
>>  (dv_from_decl): Likewise.
>>  (dv_from_value): Likewise.
>>  (shared_hash_find_slot_unshare_1): Likewise.
>>  (shared_hash_find_slot_1): Likewise.
>>  (shared_hash_find_slot_noinsert_1): Likewise.
>>  (shared_hash_find_1): Likewise.
>>  (unshare_variable): Likewise.
>>  (vars_copy): Likewise.
>>  (find_loc_in_1pdv): Likewise.
>>  (find_mem_expr_in_1pdv): Likewise.
>>  (dataflow_set_different): Likewise.
>>  (variable_from_dropped): Likewise.
>>  (variable_was_changed): Likewise.
>>  (loc_exp_insert_dep): Likewise.
>>  (notify_dependents_of_resolved_value): Likewise.
>>  (vt_expand_loc_callback): Likewise.
>>  (remove_value_from_changed_variables): Likewise.
>>  (notify_dependents_of_changed_value): Likewise.
>>  (emit_notes_for_differences_1): Likewise.
>>  (emit_notes_for_differences_2): Likewise.
>> ---
>>  gcc/var-tracking.cc | 119 +++-
>>  1 file changed, 74 insertions(+), 45 deletions(-)
>> 
>> diff --git a/gcc/var-tracking.cc b/gcc/var-tracking.cc
>> index fae0c73e02f..9bc9d21e5ba 100644
>> --- a/gcc/var-tracking.cc
>> +++ b/gcc/var-tracking.cc
>> @@ -116,9 +116,17 @@
>>  #include "fibonacci_heap.h"
>>  #include "print-rtl.h"
>>  #include "function-abi.h"
>> +#include "mux-utils.h"
>>  
>>  typedef fibonacci_heap  bb_heap_t;
>>  
>> +/* A declaration of a variable, or an RTL value being handled like a
>> +   declaration by pointer_mux.  */
>> +typedef pointer_mux<tree_node, rtx_def> decl_or_value;
>> +
>> +#define DECL_OR_VALUE_OR_DEFAULT(ptr) \
>> +  ((ptr) ? decl_or_value (ptr) : decl_or_value ())
>> +
>>  /* var-tracking.cc assumes that tree code with the same value as VALUE rtx 
>> code
>> has no chance to appear in REG_EXPR/MEM_EXPRs and isn't a decl.
>> Currently the value is the same as IDENTIFIER_NODE, which has such
>> @@ -196,15 +204,21 @@ struct micro_operation
>>  };
>>  
>>  
>> -/* A declaration of a variable, or an RTL value being handled like a
>> -   declaration.  */
>> -typedef void *decl_or_value;
>> -
>>  /* Return true if a decl_or_value DV is a DECL or NULL.  */
>>  static inline bool
>>  dv_is_decl_p (decl_or_value dv)
>>  {
>> -  return !dv || (int) TREE_CODE ((tree) dv) != (int) VALUE;
>> +  bool is_decl = !dv;
>> +
>> +  if (dv)
>> +{
>> +  if (dv.is_first ())
>> +is_decl = (int) TREE_CODE (dv.known_first ()) != (int) VALUE;
>> +  else if (!dv.is_first () && !dv.is_second ())
>> +is_decl = true;
>> +}
>> +
>> +  return is_decl;
>
> This all looks very confused, shouldn't it just be
>
>  return dv.is_first ();
>
> ?  All the keying on VALUE should no longer be necessary.
>
>>  }
>>  
>>  /* Return true if a decl_or_value is a VALUE rtl.  */
>> @@ -219,7 +233,7 @@ static inline tree
>>  dv_as_decl (decl_or_value dv)
>>  {
>>gcc_checking_assert (dv_is_decl_p (dv));
>> -  return (tree) dv;
>> +  return dv.is_first () ? dv.known_first () : NULL_TREE;
>
> and this should be
>
>  return dv.known_first ();
>
> ?  (knowing that ptr-mux will not mutate 'first' and thus preserves
> a nullptr there)
>
>>  }
>>  
>>  /* Return the value in the decl_or_value.  */
>> @@ -227,14 +241,20 @@ static inline rtx
>>  dv_as_value (decl_or_value dv)
>>  {
>>gcc_checking_assert (dv_is_value_p (dv));
>> -  return (rtx)dv;
>> +  return dv.is_second () ? dv.known_second () : NULL_RTX;;
>
> return 

Re: [PATCH] Var-Tracking: Leverage pointer_mux for decl_or_value

2023-05-10 Thread Jakub Jelinek via Gcc-patches
On Wed, May 10, 2023 at 03:17:58PM +0800, Pan Li via Gcc-patches wrote:
> gcc/ChangeLog:
> 
>   * var-tracking.cc (DECL_OR_VALUE_OR_DEFAULT): New macro for
> clean code.

ChangeLog formatting shouldn't have spaces after the initial tab.
Furthermore, the entry doesn't describe what changes you've made.
It should start with:
	* var-tracking.cc: Include mux-utils.h.
	(decl_or_value): Changed from void * to
	pointer_mux<tree_node, rtx_def>.
	(DECL_OR_VALUE_OR_DEFAULT): Define.
etc.

> --- a/gcc/var-tracking.cc
> +++ b/gcc/var-tracking.cc
> @@ -116,9 +116,17 @@
>  #include "fibonacci_heap.h"
>  #include "print-rtl.h"
>  #include "function-abi.h"
> +#include "mux-utils.h"
>  
>  typedef fibonacci_heap  bb_heap_t;
>  
> +/* A declaration of a variable, or an RTL value being handled like a
> +   declaration by pointer_mux.  */
> +typedef pointer_mux<tree_node, rtx_def> decl_or_value;
> +
> +#define DECL_OR_VALUE_OR_DEFAULT(ptr) \
> +  ((ptr) ? decl_or_value (ptr) : decl_or_value ())
> +
>  /* var-tracking.cc assumes that tree code with the same value as VALUE rtx 
> code
> has no chance to appear in REG_EXPR/MEM_EXPRs and isn't a decl.
> Currently the value is the same as IDENTIFIER_NODE, which has such
> @@ -196,15 +204,21 @@ struct micro_operation
>  };
>  
>  
> -/* A declaration of a variable, or an RTL value being handled like a
> -   declaration.  */
> -typedef void *decl_or_value;
> -
>  /* Return true if a decl_or_value DV is a DECL or NULL.  */
>  static inline bool
>  dv_is_decl_p (decl_or_value dv)
>  {
> -  return !dv || (int) TREE_CODE ((tree) dv) != (int) VALUE;
> +  bool is_decl = !dv;
> +
> +  if (dv)
> +{
> +  if (dv.is_first ())
> + is_decl = (int) TREE_CODE (dv.known_first ()) != (int) VALUE;
> +  else if (!dv.is_first () && !dv.is_second ())
> + is_decl = true;
> +}
> +
> +  return is_decl;

I really don't understand why it needs to be so complicated.
decl_or_value is dv_is_decl_p if it is NULL or if it is a tree,
and is false if it is rtx VALUE, no other rtxes are expected.
pointer_mux<tree_node, rtx_def> should accept nullptr as being the first
one, so I'd expect just

/* Return true if a decl_or_value DV is a DECL or NULL.  */
static inline bool
dv_is_decl_p (decl_or_value dv)
{
  return dv.is_first ();
}

/* Return true if a decl_or_value is a VALUE rtl.  */
static inline bool
dv_is_value_p (decl_or_value dv)
{
  return dv.is_second ();
} 

/* Return the decl in the decl_or_value.  */
static inline tree
dv_as_decl (decl_or_value dv)
{
  gcc_checking_assert (dv_is_decl_p (dv));
  return dv.known_first ();
}
  
/* Return the value in the decl_or_value.  */
static inline rtx
dv_as_value (decl_or_value dv)
{
  gcc_checking_assert (dv_is_value_p (dv));
  return dv.known_second ();
}
   
/* Return the opaque pointer in the decl_or_value.  */
static inline void *
dv_as_opaque (decl_or_value dv)
{
  return dv.is_first () ? (void *) dv.known_first ()
                        : (void *) dv.known_second ();
}

// Ideally dv_as_opaque would just return m_ptr but that
// is unfortunately private.

And define a hasher for decl_or_value now that it is a class
(that would hash/compare the m_ptr value or separately
dv.is_first () bool and dv_as_opaque pointer).

And then I'd hope you don't need to do any further changes
in the file.
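
A hasher along the lines suggested above might look roughly like the
following. This is a sketch under assumptions, not the patch: GCC's own
hash_table uses hasher traits rather than the standard containers, and
dv_key, dv_key_hash, dv_key_equal and dv_set are hypothetical names. The key
simply pairs the opaque pointer with the is_first () answer, which
corresponds to the second of the two options mentioned.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical key: what dv_as_opaque and dv.is_first () would report.
struct dv_key
{
  const void *opaque;
  bool is_decl;
};

struct dv_key_hash
{
  std::size_t operator() (const dv_key &k) const
  {
    std::size_t h = std::hash<const void *> () (k.opaque);
    // Fold the kind into the hash; equal keys still hash equally.
    return k.is_decl ? h : h ^ static_cast<std::size_t> (0x9e3779b9u);
  }
};

struct dv_key_equal
{
  bool operator() (const dv_key &a, const dv_key &b) const
  {
    return a.opaque == b.opaque && a.is_decl == b.is_decl;
  }
};

typedef std::unordered_set<dv_key, dv_key_hash, dv_key_equal> dv_set;
```

In practice a tree and an rtx never share an address, so hashing the pointer
alone (the first option) would give the same behaviour; carrying the kind
along just makes the intent explicit.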

Jakub



Re: Re: [PATCH V2] RISC-V: Fix incorrect implementation of TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT

2023-05-10 Thread Kito Cheng via Gcc-patches
Thanks. I still have a concern that we should consider checking
movmisalign together with STRICT_ALIGNMENT, but I am OK with this for
now; we can always fix that if we run into issues later.

committed to trunk

On Tue, May 9, 2023 at 11:22 PM 钟居哲  wrote:
>
> No, I don't think so. The reason I added -fno-vect-cost-model to some
> testcases is that we don't have enough patterns to enable some
> auto-vectorizations. I added -fno-vect-cost-model to force
> auto-vectorization in such cases for testing.
>
>
>
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-05-09 22:36
> To: juzhe.zhong
> CC: gcc-patches@gcc.gnu.org; pal...@dabbelt.com; jeffreya...@gmail.com; 
> rdapp@gmail.com
> Subject: Re: [PATCH V2] RISC-V: Fix incorrect implementation of 
> TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
> One more question from me: should we just add  -fno-vect-cost-model to
> AUTOVEC_TEST_OPTS?
>
> On Tue, May 9, 2023 at 10:29 PM Kito Cheng  wrote:
> >
> > Oh, I checked default_builtin_support_vector_misalignment and realized
> > we can just remove riscv_support_vector_misalignment entirely...
> >
> >
> > On Tue, May 9, 2023 at 10:18 PM juzhe.zhong  wrote:
> > >
> > > > The riscv_support_vector_misalignment update makes some of the testcase
> > > > checks fail. I have looked at those failures and they are reasonable, so
> > > > I included the testcase adaptations in this patch.
> > > >  Replied Message 
> > > > From: Kito Cheng
> > > > Date: 05/09/2023 21:54
> > > > To: juzhe.zh...@rivai.ai
> > > > Cc: gcc-patc...@gcc.gnu.org, pal...@dabbelt.com, jeffreya...@gmail.com,
> > > > rdapp@gmail.com
> > > > Subject: Re: [PATCH V2] RISC-V: Fix incorrect implementation of
> > > > TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
> > > > I am OK with both changes, but I tried to build some test cases, and it
> > > > seems the testcase changes are caused by the options update, not by the
> > > > riscv_support_vector_misalignment update? So I would like to see the
> > > > testcase changes split out into a separate patch.
> > >
> > > > +/* Return true if the vector misalignment factor is supported by the
> > > > +   target.  */
> > > >  bool
> > > >  riscv_support_vector_misalignment (machine_mode mode,
> > > >const_tree type ATTRIBUTE_UNUSED,
> > > >int misalignment,
> > > >bool is_packed ATTRIBUTE_UNUSED)
> > > >  {
> > > > -  if (TARGET_VECTOR)
> > > > -{
> > > > -  if (STRICT_ALIGNMENT)
> > > > -   {
> > > > - /* Return if movmisalign pattern is not supported for this 
> > > > mode.  */
> > > > - if (optab_handler (movmisalign_optab, mode) == 
> > > > CODE_FOR_nothing)
> > > > -   return false;
> > > > -
> > > > - /* Misalignment factor is unknown at compile time.  */
> > > > - if (misalignment == -1)
> > > > -   return false;
> > > > -   }
> > > > -  return true;
> > > > -}
> > > > +  /* TODO: For RVV scalable vector auto-vectorization, we should allow
> > > > + movmisalign pattern to handle misalign data movement to 
> > > > unblock
> > > > + possible auto-vectorization.
> > > >
> > > > + RVV VLS auto-vectorization or SIMD auto-vectorization can be 
> > > > supported here
> > > > + in the future.  */
> > > >return default_builtin_support_vector_misalignment (mode, type, 
> > > > misalignment,
> > > >   is_packed);
> > > >  }
> > >
> > > Should we have some corresponding change on autovec.md like this?
> > >
> > > diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> > > index f1c5ff5951bf..c2873201d82e 100644
> > > --- a/gcc/config/riscv/autovec.md
> > > +++ b/gcc/config/riscv/autovec.md
> > > @@ -51,7 +51,7 @@
> > > (define_expand "movmisalign"
> > >  [(set (match_operand:V 0 "nonimmediate_operand")
> > >   (match_operand:V 1 "general_operand"))]
> > > -  "TARGET_VECTOR"
> > > +  "TARGET_VECTOR && !STRICT_ALIGNMENT"
> > >  {
> > >/* Equivalent to a normal move for our purpooses.  */
> > >emit_move_insn (operands[0], operands[1]);
>


Re: [PATCH] RISC-V: Fix dead loop for user vsetvli intrinsic avl checking [PR109773]

2023-05-10 Thread Kito Cheng via Gcc-patches
Thanks, pushed to trunk.

On Tue, May 9, 2023 at 10:20 AM  wrote:
>
> From: Juzhe-Zhong 
>
> This patch fixes a dead loop in vsetvl intrinsic avl checking.
>
> When the def of a vsetvli's avl is itself a vsetvli, whose def is again a
> vsetvli, the avl checking keeps following the chain and never terminates.
>
> PR target/109773
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vsetvl.cc (avl_source_has_vsetvl_p): New 
> function.
> (source_equal_p): Fix dead loop in vsetvl avl checking.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/vsetvl/pr109773-1.c: New test.
> * gcc.target/riscv/rvv/vsetvl/pr109773-2.c: New test.
>
> ---
>  gcc/config/riscv/riscv-vsetvl.cc  | 25 ++
>  .../gcc.target/riscv/rvv/vsetvl/pr109773-1.c  | 20 ++
>  .../gcc.target/riscv/rvv/vsetvl/pr109773-2.c  | 26 +++
>  3 files changed, 71 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109773-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109773-2.c
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc 
> b/gcc/config/riscv/riscv-vsetvl.cc
> index 72aa2bfcf6f..2577b2bd9b7 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -1056,6 +1056,24 @@ change_vsetvl_insn (const insn_info *insn, const 
> vector_insn_info &info)
>change_insn (rinsn, new_pat);
>  }
>
> +static bool
> +avl_source_has_vsetvl_p (set_info *avl_source)
> +{
> +  if (!avl_source)
> +return false;
> +  if (!avl_source->insn ())
> +return false;
> +  if (avl_source->insn ()->is_real ())
> +return vsetvl_insn_p (avl_source->insn ()->rtl ());
> +  hash_set sets = get_all_sets (avl_source, true, false, true);
> +  for (const auto set : sets)
> +{
> +  if (set->insn ()->is_real () && vsetvl_insn_p (set->insn ()->rtl ()))
> +   return true;
> +}
> +  return false;
> +}
> +
>  static bool
>  source_equal_p (insn_info *insn1, insn_info *insn2)
>  {
> @@ -1098,6 +1116,13 @@ source_equal_p (insn_info *insn1, insn_info *insn2)
>vector_insn_info insn1_info, insn2_info;
>insn1_info.parse_insn (insn1);
>insn2_info.parse_insn (insn2);
> +
> +  /* To avoid dead loop, we don't optimize a vsetvli def has vsetvli
> +instructions which will complicate the situation.  */
> +  if (avl_source_has_vsetvl_p (insn1_info.get_avl_source ())
> + || avl_source_has_vsetvl_p (insn2_info.get_avl_source ()))
> +   return false;
> +
>if (insn1_info.same_vlmax_p (insn2_info)
>   && insn1_info.compatible_avl_p (insn2_info))
> return true;
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109773-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109773-1.c
> new file mode 100644
> index 000..8656e473117
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109773-1.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize 
> -fno-schedule-insns -fno-schedule-insns2" } */
> +
> +#include "riscv_vector.h"
> +
> +void f (int32_t *a, int32_t *b, int n)
> +{
> +  if (n <= 0)
> +return;
> +  int i = n;
> +  size_t vl = __riscv_vsetvl_e8mf4 (i);
> +  for (; i >= 0; i--)
> +{
> +  vint32m1_t v = __riscv_vle32_v_i32m1 (a + i, vl);
> +  __riscv_vse32_v_i32m1 (b + i, v, vl);
> +  vl = __riscv_vsetvl_e8mf4 (vl);
> +}
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0" 
> no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts 
> "-funroll-loops" } } } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109773-2.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109773-2.c
> new file mode 100644
> index 000..7bfbaaf3713
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109773-2.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize 
> -fno-schedule-insns -fno-schedule-insns2" } */
> +
> +#include "riscv_vector.h"
> +
> +void f (int32_t * a, int32_t * b, int n)
> +{
> +if (n <= 0)
> +  return;
> +int i = n;
> +size_t vl = __riscv_vsetvl_e8mf4 (i);
> +for (; i >= 0; i--)
> +  {
> +vint32m1_t v = __riscv_vle32_v_i32m1 (a + i, vl);
> +   v = __riscv_vle32_v_i32m1_tu (v, a + i + 100, vl);
> +__riscv_vse32_v_i32m1 (b + i, v, vl);
> +
> +if (i >= vl)
> +  continue;
> +if (i == 0)
> +  return;
> +vl = __riscv_vsetvl_e8mf4 (vl);
> +  }
> +}
> +
> +/* { dg-final { scan-assembler {vsetvli} { target { no-opts "-O0" no-opts 
> "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } 
> } */
> --
> 2.36.3
>


Re: [PATCH] Fixes and workarounds for warnings during autoprofiledbootstrap build

2023-05-10 Thread Martin Jambor
Hello,

On Wed, May 10 2023, Eugene Rozenfeld via Gcc-patches wrote:
> autoprofiledbootstrap build produces new warnings since inlining decisions
> are different from other builds. This patch contains fixes and workarounds
> for those warnings.
>
> Tested on x86_64-pc-linux-gnu.
>
> gcc/ChangeLog:
>
>   * config/i386/i386-expand.cc (expand_vec_perm_interleave2): Work around
>   -Wstringop-overflow false positive during autoprofiledbootstrap
>   * ipa-devirt.cc (debug_tree_odr_name): Fix for -Wformat-overflow
>   warning during autoprofiledbootstrap
>   * lra-eliminations.cc (setup_can_eliminate): Work around
>   -Wmaybe-uninitialized false positive during autoprofiledbootstrap
>   * opts-common.cc (candidates_list_and_hint): Work around
>   -Wstringop-overflow false positive during autoprofiledbootstrap
>   * tree-ssa-ccp.cc (bit_value_unop): Work around -Wmaybe-uninitialized
>   false positive during autoprofiledbootstrap
>   * wide-int.h (wi::copy): Work around -Wmaybe-uninitialized false
>   positive during autoprofiledbootstrap
> ---
>  gcc/config/i386/i386-expand.cc | 11 +++
>  gcc/ipa-devirt.cc  |  3 ++-
>  gcc/lra-eliminations.cc| 11 +++
>  gcc/opts-common.cc |  1 +
>  gcc/tree-ssa-ccp.cc| 11 +++
>  gcc/wide-int.h | 11 +++
>  6 files changed, 47 insertions(+), 1 deletion(-)
>

[...]

> diff --git a/gcc/ipa-devirt.cc b/gcc/ipa-devirt.cc
> index 819860258d1..36ea266e834 100644
> --- a/gcc/ipa-devirt.cc
> +++ b/gcc/ipa-devirt.cc
> @@ -4033,7 +4033,8 @@ debug_tree_odr_name (tree type, bool demangle)
>odr = cplus_demangle (odr, opts);
>  }
>  
> -  fprintf (stderr, "%s\n", odr);
> +  if (odr != NULL)
> +fprintf (stderr, "%s\n", odr);
>  }

I cannot find a call to this debug function on trunk.  How exactly did
this trigger a warning?

In any case, IMHO the function should rather print something that makes
it clear that an odr name could not be obtained rather than printing
nothing.

I also think that if we want to handle the case, we should do it before
possibly passing odr to the demangler.
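
A minimal sketch of the two suggestions above, assuming a hypothetical helper
that is not part of ipa-devirt.cc: the null check happens before any
demangling, and a placeholder string is produced instead of printing nothing.
The name odr_name_for_debug, the "<no odr name>" text and the demangle
callback are all invented for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not GCC code): returns the text a debug function
// could print for an ODR name.  DEMANGLE may be null when no demangling
// is requested; it is assumed to return null when it cannot demangle.
inline std::string
odr_name_for_debug (const char *odr,
                    const char *(*demangle) (const char *))
{
  // Handle the missing name first, before the demangler ever sees it.
  if (odr == nullptr)
    return "<no odr name>";

  if (demangle != nullptr)
    {
      const char *demangled = demangle (odr);
      // Fall back to the mangled name if demangling fails.
      if (demangled != nullptr)
        odr = demangled;
    }
  return odr;
}
```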

Thanks,

Martin


Re: [PATCH] _Hashtable implementation cleanup

2023-05-10 Thread Jonathan Wakely via Gcc-patches
On Wed, 10 May 2023 at 05:59, François Dumont via Libstdc++ <
libstd...@gcc.gnu.org> wrote:

> Hi
>
> Rather than providing a series of patches for _Hashtable I prefer to
> submit them one by one. It will maximize the chances to have some of
> them in gcc 14.
>
> I'm starting with this simple patch to do some cleanup in the current
> implementation to ease compiler optimizations by making some methods
> implicitly inline and avoiding the iterator abstraction when useless.
>
> It is also replacing some faulty usages of __node_type* with __node_ptr.
> It should simplify the patch to make use of allocator custom pointer I
> would like to reactivate.
>
> libstdc++: [_Hashtable] Implement several small methods implicitly inline
>
> Make implementation of 3 simple _Hashtable methods implicitly inline.
>
> Avoid usage of const_iterator abstraction within _Hashtable implementation.
>
> Replace several usages of __node_type* with expected __node_ptr.
>
> libstdc++-v3/ChangeLog:
>
>  * include/bits/hashtable_policy.h
>  (_NodeBuilder<>::_S_build): Use __node_ptr.
>  (_ReuseOrAllocNode<>): Use __node_ptr in place of
> __node_type*.
>  (_AllocNode<>): Likewise.
>  (_Equality<>::_M_equal): Remove const_iterator usages. Only
> preserved
>  to call std::is_permutation in the non-unique key
> implementation.
>  * include/bits/hashtable.h
> (_Hashtable<>::_M_update_begin()): Capture
>  _M_begin() once.
>  (_Hashtable<>::_M_bucket_begin(size_type)): Implement
> implicitly inline.
>  (_Hashtable<>::_M_insert_bucket_begin): Likewise.
>  (_Hashtable<>::_M_remove_bucket_begin): Likewise.
>  (_Hashtable<>::_M_compute_hash_code): Use __node_ptr rather
> than
>  const_iterator.
>  (_Hashtable<>::find): Likewise.
>  (_Hashtable<>::_M_emplace): Likewise.
>  (_Hashtable<>::_M_insert_unique): Likewise.
>
> Ok to commit ?
>

OK, thanks


Re: [PATCH V2] RISC-V: Insert vsetivli zero, 0 for vmv.x.s/vfmv.f.s instructions satisfying REG_P(operand[1]) in -O0.

2023-05-10 Thread Kito Cheng via Gcc-patches
Committed, thanks for catching this issue :)

On Wed, May 10, 2023 at 12:08 PM juzhe.zh...@rivai.ai
 wrote:
>
> LGTM. Let's wait for kito's feedback.
> Thanks :)
>
>
>
> juzhe.zh...@rivai.ai
>
> From: Li Xu
> Date: 2023-05-10 12:02
> To: gcc-patches
> CC: kito.cheng; palmer; juzhe.zhong; Li Xu
> Subject: [PATCH V2] RISC-V: Insert vsetivli zero, 0 for vmv.x.s/vfmv.f.s 
> instructions satisfying REG_P(operand[1]) in -O0.
> This issue happens because operand 1 of a scalar move can satisfy
> REG_P (operand[1]) at -O0, which causes the VSETVL pass to
> not insert the vsetvl instruction correctly, and the compiler crashes.
>
> Consider the following case:
> int16_t foo1 (void *base, size_t vl)
> {
>   int16_t maxVal = __riscv_vmv_x_s_i16m1_i16 (__riscv_vle16_v_i16m1 (base, vl));
>   return maxVal;
> }
>
> Before this patch:
> bug.c:15:1: internal compiler error: Segmentation fault
>15 | }
>   | ^
> 0x145d723 crash_signal
> ../.././riscv-gcc/gcc/toplev.cc:314
> 0x22929dd const_csr_operand(rtx_def*, machine_mode)
> ../.././riscv-gcc/gcc/config/riscv/predicates.md:44
> 0x2292a21 csr_operand(rtx_def*, machine_mode)
> ../.././riscv-gcc/gcc/config/riscv/predicates.md:46
> 0x23dfbb0 recog_356
> ../.././riscv-gcc/gcc/config/riscv/iterators.md:72
> 0x23efecd recog(rtx_def*, rtx_insn*, int*)
> ../.././riscv-gcc/gcc/config/riscv/iterators.md:89
> 0xdddc15 recog_memoized(rtx_insn*)
> ../.././riscv-gcc/gcc/recog.h:273
>
> After this patch:
> vsetivli zero,0,e16,m1,ta,ma
> vmv.x.s a5,v1
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): For vfmv.f.s/vmv.x.s
> instruction replace null avl with (const_int 0).
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/scalar_move-10.c: New test.
> * gcc.target/riscv/rvv/base/scalar_move-11.c: New test.
> ---
> gcc/config/riscv/riscv-vsetvl.cc  |  5 +++
> .../riscv/rvv/base/scalar_move-10.c   | 31 +++
> .../riscv/rvv/base/scalar_move-11.c   | 20 
> 3 files changed, 56 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-10.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-11.c
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc 
> b/gcc/config/riscv/riscv-vsetvl.cc
> index d4d6f336ef9..14ebae1f3f6 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -618,6 +618,11 @@ static rtx
> gen_vsetvl_pat (enum vsetvl_type insn_type, const vl_vtype_info &info, rtx vl)
> {
>rtx avl = info.get_avl ();
> +  /* if optimization == 0 and the instruction is vmv.x.s/vfmv.f.s,
> + set the value of avl to (const_int 0) so that VSETVL PASS will
> + insert vsetvl correctly.*/
> +  if (info.has_avl_no_reg ())
> +avl = GEN_INT (0);
>rtx sew = gen_int_mode (info.get_sew (), Pmode);
>rtx vlmul = gen_int_mode (info.get_vlmul (), Pmode);
>rtx ta = gen_int_mode (info.get_ta (), Pmode);
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-10.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-10.c
> new file mode 100644
> index 000..9760d77fb22
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-10.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O0" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#include "riscv_vector.h"
> +
> +/*
> +** foo1:
> +** ...
> +** vsetivli\tzero,0,e16,m1,t[au],m[au]
> +** vmv.x.s\t[a-x0-9]+,v[0-9]+
> +** ...
> +*/
> +int16_t foo1 (void *base, size_t vl)
> +{
> +int16_t maxVal = __riscv_vmv_x_s_i16m1_i16 (__riscv_vle16_v_i16m1 (base, 
> vl));
> +return maxVal;
> +}
> +
> +/*
> +** foo2:
> +** ...
> +** vsetivli\tzero,0,e32,m1,t[au],m[au]
> +** vfmv.f.s\tf[a-x0-9]+,v[0-9]+
> +** ...
> +*/
> +float foo2 (void *base, size_t vl)
> +{
> +float maxVal = __riscv_vfmv_f_s_f32m1_f32 (__riscv_vle32_v_f32m1 (base, 
> vl));
> +return maxVal;
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-11.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-11.c
> new file mode 100644
> index 000..8036acd0a52
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-11.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gcv -mabi=ilp32d -O0" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#include "riscv_vector.h"
> +
> +/*
> +** foo:
> +** ...
> +** vsetivli\tzero,0,e64,m4,t[au],m[au]
> +** vmv.x.s\t[a-x0-9]+,v[0-9]+
> +** vsetivli\tzero,0,e64,m4,t[au],m[au]
> +** vmv.x.s\t[a-x0-9]+,v[0-9]+
> +** ...
> +*/
> +int16_t foo (void *base, size_t vl)
> +{
> +int16_t maxVal = __riscv_vmv_x_s_i64m4_i64 (__riscv_vle64_v_i64m4 (base, 
> vl));
> +return maxVal;
> +}
> --
> 2.17.1
>
>


Re: [PATCH] vect: Missed opportunity to use [SU]ABD

2023-05-10 Thread Richard Sandiford via Gcc-patches
Oluwatamilore Adebayo  writes:
> From 0b5f469171c340ef61a48a31877d495bb77bd35f Mon Sep 17 00:00:00 2001
> From: oluade01 
> Date: Fri, 14 Apr 2023 10:24:43 +0100
> Subject: [PATCH 1/4] Missed opportunity to use [SU]ABD
>
> This adds a recognition pattern for the non-widening
> absolute difference (ABD).
>
> gcc/ChangeLog:
>
> * doc/md.texi (sabd, uabd): Document them.
> * internal-fn.def (ABD): Use new optab.
> * optabs.def (sabd_optab, uabd_optab): New optabs,
> * tree-vect-patterns.cc (vect_recog_absolute_difference):
> Recognize the following idiom abs (a - b).
> (vect_recog_sad_pattern): Refactor to use
> vect_recog_absolute_difference.
> (vect_recog_abd_pattern): Use patterns found by
> vect_recog_absolute_difference to build a new ABD
> internal call.
> ---
>  gcc/doc/md.texi   |  10 ++
>  gcc/internal-fn.def   |   3 +
>  gcc/optabs.def|   2 +
>  gcc/tree-vect-patterns.cc | 250 +-
>  4 files changed, 234 insertions(+), 31 deletions(-)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 
> 07bf8bdebffb2e523f25a41f2b57e43c0276b745..0ad546c63a8deebb4b6db894f437d1e21f0245a8
>  100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5778,6 +5778,16 @@ Other shift and rotate instructions, analogous to the
>  Vector shift and rotate instructions that take vectors as operand 2
>  instead of a scalar type.
>
> +@cindex @code{uabd@var{m}} instruction pattern
> +@cindex @code{sabd@var{m}} instruction pattern
> +@item @samp{uabd@var{m}}, @samp{sabd@var{m}}
> +Signed and unsigned absolute difference instructions.  These
> +instructions find the difference between operands 1 and 2
> +then return the absolute value.  A C code equivalent would be:
> +@smallexample
> +op0 = abs (op0 - op1)

op0 = abs (op1 - op2)

But that isn't the correct calculation for unsigned (where abs doesn't
really work).  It also doesn't handle some cases correctly for signed.

I think it's more:

  op0 = op1 > op2 ? (unsigned type) op1 - op2 : (unsigned type) op2 - op1

or (conceptually) max minus min.

E.g. for 16-bit values, the absolute difference between signed 0x7fff
and signed -0x8000 is 0xffff (reinterpreted as -1 if you cast back
to signed).  But, ignoring undefined behaviour:

  0x7fff - 0x8000 = -1
  abs(-1) = 1

which gives the wrong answer.

We might still be able to fold C abs(a - b) to abd for signed a and b
by relying on undefined behaviour (TYPE_OVERFLOW_UNDEFINED).  But we
can't do it for -fwrapv.
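
The 16-bit example above can be checked with a small sketch. The function
names abd_s16 and wrapped_abs_diff_s16 are made up for illustration; the
first follows the "max minus min in the unsigned type" formula, the second
mimics abs (a - b) with wrapping 16-bit arithmetic (as under -fwrapv).

```cpp
#include <cassert>
#include <cstdint>

// Absolute difference as "max minus min", computed on the unsigned
// bit patterns so the subtraction cannot overflow.
inline std::uint16_t
abd_s16 (std::int16_t a, std::int16_t b)
{
  std::uint16_t ua = static_cast<std::uint16_t> (a);
  std::uint16_t ub = static_cast<std::uint16_t> (b);
  return a > b ? static_cast<std::uint16_t> (ua - ub)
               : static_cast<std::uint16_t> (ub - ua);
}

// abs (a - b) where the subtraction wraps modulo 2^16.  The narrowing
// conversions assume the usual two's complement wraparound.
inline std::int16_t
wrapped_abs_diff_s16 (std::int16_t a, std::int16_t b)
{
  std::uint16_t udiff
    = static_cast<std::uint16_t> (static_cast<std::uint16_t> (a)
                                  - static_cast<std::uint16_t> (b));
  std::int16_t diff = static_cast<std::int16_t> (udiff);
  return static_cast<std::int16_t> (diff < 0 ? -diff : diff);
}
```

For a = 0x7fff and b = -0x8000 the first function gives 0xffff while the
second gives 1, which is the mismatch described above.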

Richi knows better than me what would be appropriate here.

Thanks,
Richard

> +@end smallexample
> +
>  @cindex @code{avg@var{m}3_floor} instruction pattern
>  @cindex @code{uavg@var{m}3_floor} instruction pattern
>  @item @samp{avg@var{m}3_floor}
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 
> 7fe742c2ae713e7152ab05cfdfba86e4e0aa3456..0f1724ecf37a31c231572edf90b5577e2d82f468
>  100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -167,6 +167,9 @@ DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
>  DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
>  DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)
>
> +DEF_INTERNAL_SIGNED_OPTAB_FN (ABD, ECF_CONST | ECF_NOTHROW, first,
> + sabd, uabd, binary)
> +
>  DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_FLOOR, ECF_CONST | ECF_NOTHROW, first,
>   savg_floor, uavg_floor, binary)
>  DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_CEIL, ECF_CONST | ECF_NOTHROW, first,
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 
> 695f5911b300c9ca5737de9be809fa01aabe5e01..29bc92281a2175f898634cbe6af63c18021e5268
>  100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -359,6 +359,8 @@ OPTAB_D (mask_fold_left_plus_optab, 
> "mask_fold_left_plus_$a")
>  OPTAB_D (extract_last_optab, "extract_last_$a")
>  OPTAB_D (fold_extract_last_optab, "fold_extract_last_$a")
>
> +OPTAB_D (uabd_optab, "uabd$a3")
> +OPTAB_D (sabd_optab, "sabd$a3")
>  OPTAB_D (savg_floor_optab, "avg$a3_floor")
>  OPTAB_D (uavg_floor_optab, "uavg$a3_floor")
>  OPTAB_D (savg_ceil_optab, "avg$a3_ceil")
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 
> a49b09539776c0056e77f99b10365d0a8747fbc5..91e1f9d4b610275dd833ec56dc77f76367ee7886
>  100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -770,6 +770,89 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info 
> stmt2_info, tree new_rhs,
>  }
>  }
>
> +/* Look for the following pattern
> +   X = x[i]
> +   Y = y[i]
> +   DIFF = X - Y
> +   DAD = ABS_EXPR
> + */
> +static bool
> +vect_recog_absolute_difference (vec_info *vinfo, gassign *abs_stmt,
> +   tree *half_type, bool reject_unsigned,
> +   vect_unpromoted_value unprom[2],
> +   tree diff_oprnds[2])
> +{
> +  if (!abs_stmt)
> +return false;

[PATCH] x86: Add a new option -mdaz-ftz to enable FTZ and DAZ flags in MXCSR.

2023-05-10 Thread liuhongt via Gcc-patches
> The quoted patch shows -shared in context and  you didn't post a
> backport version
> to look at.  But yes, we shouldn't change -shared behavior on a
> branch, even less so make it
> inconsistent between targets.
Here's the patch.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for GCC 11/12 backport?

if (mdaz-ftz)
  link crtfastmath.o
else if ((Ofast || ffast-math || funsafe-math-optimizations)
 && !mno-daz-ftz)
  link crtfastmath.o
else
  Don't link crtfastmath.o
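
As a rough model of the decision above (not the driver's actual spec
processing), the logic can be written as a predicate. The struct
fastmath_flags and the function links_crtfastmath are hypothetical names;
treating -mdaz-ftz and -mno-daz-ftz as two independent booleans is a
simplifying assumption.

```cpp
#include <cassert>

// Hypothetical view of the relevant command-line switches.
struct fastmath_flags
{
  bool mdaz_ftz;
  bool mno_daz_ftz;
  bool ofast;
  bool ffast_math;
  bool funsafe_math_optimizations;
};

// Returns true when crtfastmath.o would be linked under the rules above.
inline bool
links_crtfastmath (const fastmath_flags &f)
{
  // -mdaz-ftz always links crtfastmath.o.
  if (f.mdaz_ftz)
    return true;
  // The fast-math family links it unless -mno-daz-ftz opts out.
  if ((f.ofast || f.ffast_math || f.funsafe_math_optimizations)
      && !f.mno_daz_ftz)
    return true;
  return false;
}
```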

gcc/ChangeLog:

* config/i386/cygwin.h (ENDFILE_SPEC): Link crtfastmath.o
whenever -mdaz-ftz is specified. Don't link crtfastmath.o
when -mno-daz-ftz is specified.
* config/i386/darwin.h (ENDFILE_SPEC): Ditto.
* config/i386/gnu-user-common.h
(GNU_USER_TARGET_MATHFILE_SPEC): Ditto.
* config/i386/mingw32.h (ENDFILE_SPEC): Ditto.
* config/i386/i386.opt (mdaz-ftz): New option.
* doc/invoke.texi (x86 options): Document -mdaz-ftz.
---
 gcc/config/i386/cygwin.h  |  2 +-
 gcc/config/i386/darwin.h  |  4 ++--
 gcc/config/i386/gnu-user-common.h |  2 +-
 gcc/config/i386/i386.opt  |  4 
 gcc/config/i386/mingw32.h |  2 +-
 gcc/doc/invoke.texi   | 11 ++-
 6 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/cygwin.h b/gcc/config/i386/cygwin.h
index d06eda369cf..5412c5d4479 100644
--- a/gcc/config/i386/cygwin.h
+++ b/gcc/config/i386/cygwin.h
@@ -57,7 +57,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #undef ENDFILE_SPEC
 #define ENDFILE_SPEC \
-  "%{Ofast|ffast-math|funsafe-math-optimizations:crtfastmath.o%s}\
+  "%{mdaz-ftz:crtfastmath.o%s;Ofast|ffast-math|funsafe-math-optimizations:%{!mno-daz-ftz:crtfastmath.o%s}} \
%{!shared:%:if-exists(default-manifest.o%s)}\
%{fvtable-verify=none:%s; \
 fvtable-verify=preinit:vtv_end.o%s; \
diff --git a/gcc/config/i386/darwin.h b/gcc/config/i386/darwin.h
index a55f6b2b874..2f773924d6e 100644
--- a/gcc/config/i386/darwin.h
+++ b/gcc/config/i386/darwin.h
@@ -109,8 +109,8 @@ along with GCC; see the file COPYING3.  If not see
 "%{!force_cpusubtype_ALL:-force_cpusubtype_ALL} "
 
 #undef ENDFILE_SPEC
-#define ENDFILE_SPEC \
-  "%{Ofast|ffast-math|funsafe-math-optimizations:crtfastmath.o%s} \
+#define ENDFILE_SPEC \
+  "%{mdaz-ftz:crtfastmath.o%s;Ofast|ffast-math|funsafe-math-optimizations:%{!mno-daz-ftz:crtfastmath.o%s}} \
%{mpc32:crtprec32.o%s} \
%{mpc64:crtprec64.o%s} \
%{mpc80:crtprec80.o%s}" TM_DESTRUCTOR
diff --git a/gcc/config/i386/gnu-user-common.h 
b/gcc/config/i386/gnu-user-common.h
index 23b54c5be52..3d2a33f1714 100644
--- a/gcc/config/i386/gnu-user-common.h
+++ b/gcc/config/i386/gnu-user-common.h
@@ -47,7 +47,7 @@ along with GCC; see the file COPYING3.  If not see
 
 /* Similar to standard GNU userspace, but adding -ffast-math support.  */
 #define GNU_USER_TARGET_MATHFILE_SPEC \
-  "%{Ofast|ffast-math|funsafe-math-optimizations:crtfastmath.o%s} \
+  "%{mdaz-ftz:crtfastmath.o%s;Ofast|ffast-math|funsafe-math-optimizations:%{!mno-daz-ftz:crtfastmath.o%s}} \
%{mpc32:crtprec32.o%s} \
%{mpc64:crtprec64.o%s} \
%{mpc80:crtprec80.o%s}"
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index a3675e515bc..5cfb7cdcbc2 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -420,6 +420,10 @@ mpc80
 Target RejectNegative
 Set 80387 floating-point precision to 80-bit.
 
+mdaz-ftz
+Target
+Set the FTZ and DAZ Flags.
+
 mpreferred-stack-boundary=
 Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg)
 Attempt to keep stack aligned to this power of 2.
diff --git a/gcc/config/i386/mingw32.h b/gcc/config/i386/mingw32.h
index d3ca0cd0279..ddbe6a4054b 100644
--- a/gcc/config/i386/mingw32.h
+++ b/gcc/config/i386/mingw32.h
@@ -197,7 +197,7 @@ along with GCC; see the file COPYING3.  If not see
 
 #undef ENDFILE_SPEC
 #define ENDFILE_SPEC \
-  "%{Ofast|ffast-math|funsafe-math-optimizations:crtfastmath.o%s} \
+  "%{mdaz-ftz:crtfastmath.o%s;Ofast|ffast-math|funsafe-math-optimizations:%{!mno-daz-ftz:crtfastmath.o%s}} \
%{!shared:%:if-exists(default-manifest.o%s)}\
%{fvtable-verify=none:%s; \
 fvtable-verify=preinit:vtv_end.o%s; \
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index cb83dd8a1cc..87eedfffa6c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1434,7 +1434,7 @@ See RS/6000 and PowerPC Options.
 -m96bit-long-double  -mlong-double-64  -mlong-double-80  -mlong-double-128 @gol
 -mregparm=@var{num}  -msseregparm @gol
 -mveclibabi=@var{type}  -mvect8-ret-in-mem @gol
--mpc32  -mpc64  -mpc80  -mstackrealign @gol
+-mpc32  -mpc64  -mpc80 -mdaz-ftz -mstackrealign @gol
 -momit-leaf-frame-pointer  -mno-red-zone  -mno-tls-direct-seg-refs @gol
 -mcmodel=@var{code-model}  -mabi=@var{name}  -maddress-mode=@var{mode} @gol
 -m32  -m64  -mx32  -m16  -miamcu  -mlarge-data-threshold=@var{num} @gol
@@ -32078,6 +3

Re: [PATCH] xtensa: Make full transition to LRA

2023-05-10 Thread Max Filippov via Gcc-patches
Hi Suwa-san,

On Mon, May 8, 2023 at 6:38 AM Takayuki 'January June' Suwa
 wrote:
>
> gcc/ChangeLog:
>
> * config/xtensa/constraints.md (R, T, U):
> Change define_constraint to define_memory_constraint.
> * config/xtensa/xtensa.cc
> (xtensa_lra_p, TARGET_LRA_P): Remove.
> (xtensa_emit_move_sequence): Remove "if (reload_in_progress)"
> clause as it can no longer be true.
> (xtensa_output_integer_literal_parts): Consider 16-bit wide
> constants.
> (xtensa_legitimate_constant_p): Add short-circuit path for
> integer load instructions.
> * config/xtensa/xtensa.md (movsf): Use can_create_pseudo_p()
> rather than reload_in_progress and reload_completed.
> * config/xtensa/xtensa.opt (mlra): Remove.
> ---
>  gcc/config/xtensa/constraints.md | 26 --
>  gcc/config/xtensa/xtensa.cc  | 26 +-
>  gcc/config/xtensa/xtensa.md  |  2 +-
>  gcc/config/xtensa/xtensa.opt |  4 
>  4 files changed, 14 insertions(+), 44 deletions(-)

That's impressive.
This version introduces a few execution failures in the testsuite on
little endian targets and a bunch more (but not all, some execution
tests still pass) on big endian.
I'm traveling this week and likely won't be able to take a deep look
into it until 5/15.

New LE failures:

+FAIL: gcc.c-torture/execute/pr56866.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution
test
+FAIL: gcc.dg/torture/pr45764.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution
test
+FAIL: gcc.dg/torture/pr45764.c   -O3 -g  execution test

+FAIL: gfortran.dg/c-interop/section-2.f90   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution
test
+FAIL: gfortran.dg/c-interop/section-2p.f90   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution
test
+FAIL: gfortran.dg/c-interop/section-3.f90   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution
test
+FAIL: gfortran.dg/c-interop/section-3p.f90   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution
test
+FAIL: gfortran.dg/bind-c-contiguous-3.f90   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution
test
+FAIL: gfortran.dg/bind-c-contiguous-3.f90   -O3 -g  execution test
+FAIL: gfortran.dg/check_bits_2.f90   -O1  output pattern test
+FAIL: gfortran.dg/coarray_ptr_comp_1.f08   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution
test
+FAIL: gfortran.dg/coarray_ptr_comp_1.f08   -O3 -g  execution test
+FAIL: gfortran.dg/loc_2.f90   -O2  execution test
+FAIL: gfortran.dg/loc_2.f90   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  execution test
+FAIL: gfortran.dg/loc_2.f90   -O3 -g  execution test
+FAIL: gfortran.dg/loc_2.f90   -Os  execution test
+FAIL: gfortran.dg/sizeof_6.f90   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution
test
+FAIL: gfortran.fortran-torture/execute/forall_7.f90 execution,  -O2
-fbounds-check

New BE failures:

+FAIL: gcc.c-torture/execute/builtins/memset-chk.c execution,  -O3
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
-finline-functions
+FAIL: gcc.c-torture/execute/builtins/memset-chk.c execution,  -O3 -g
+FAIL: gcc.c-torture/execute/2412-3.c   -O2  execution test
+FAIL: gcc.c-torture/execute/2412-3.c   -O3 -g  execution test
+FAIL: gcc.c-torture/execute/2412-3.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none  execution test
+FAIL: gcc.c-torture/execute/20020201-1.c   -O2  execution test
+FAIL: gcc.c-torture/execute/20020201-1.c   -O3 -g  execution test
+FAIL: gcc.c-torture/execute/20020201-1.c   -Os  execution test
+FAIL: gcc.c-torture/execute/20020201-1.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none  execution test
+FAIL: gcc.c-torture/execute/20030224-2.c   -O0  execution test
+FAIL: gcc.c-torture/execute/20040629-1.c   -O0  execution test
+FAIL: gcc.c-torture/execute/20040629-1.c   -O1  execution test
+FAIL: gcc.c-torture/execute/20040705-1.c   -O0  execution test
+FAIL: gcc.c-torture/execute/20040705-1.c   -O1  execution test
+FAIL: gcc.c-torture/execute/20040705-2.c   -O0  execution test
+FAIL: gcc.c-torture/execute/20040705-2.c   -O1  execution test
+FAIL: gcc.c-torture/execute/930603-3.c   -O2  execution test
+FAIL: gcc.c-torture/execute/930603-3.c   -O3 -g  execution test
+FAIL: gcc.c-torture/execute/930603-3.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none  execution test
+FAIL: gcc.c-torture/execute/931004-10.c   -O2  execution test
+FAIL: gcc.c-torture/execute/931004-10.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution
test
+FAIL: gcc.c-torture/execute/931004-10.c   -O3 -g  execution 

Re: [PATCH 3/3] Remove widen_plus/minus_expr tree codes

2023-05-10 Thread Andre Vieira (lists) via Gcc-patches




On 03/05/2023 13:29, Richard Biener wrote:

On Fri, 28 Apr 2023, Andre Vieira (lists) wrote:


This is a rebase of Joel's previous patch.

This patch removes the old widen plus/minus tree codes which have been
replaced by internal functions.


I guess that's obvious then.  I wonder what we do to internal
fns in debug stmts?  Looks like we throw those away and do not
generate debug stmts from calls.

See the comment above the removed lines in expand_debug_expr:
 /* Vector stuff.  For most of the codes we don't have rtl codes.  */

And it then just returns NULL for those expr's. So the behaviour there 
remains unchanged, not saying we couldn't do anything but I don





Given you remove handling of the scalar WIDEN_PLUS/MINUS_EXPR
codes everywhere do we want to add checking code that the scalar
IFNs do not appear in the IL?  For at least some cases there
are corresponding functions handling internal functions that
you could have amended otherwise.


I am making some changes to PATCH 2 of this series. In the new version I 
am adding some extra code to the gimple checks, one of which is to error 
if it comes across an IFN that decomposes to HILO, as that should only 
occur as an intermediary representation of the vect pass.


Richard.


gcc/ChangeLog:

2023-04-28  Andre Vieira  
 Joel Hutton  

* doc/generic.texi: Remove old tree codes.
* expr.cc (expand_expr_real_2): Remove old tree code cases.
* gimple-pretty-print.cc (dump_binary_rhs): Likewise.
* optabs-tree.cc (optab_for_tree_code): Likewise.
(supportable_half_widening_operation): Likewise.
* tree-cfg.cc (verify_gimple_assign_binary): Likewise.
* tree-inline.cc (estimate_operator_cost): Likewise.
(op_symbol_code): Likewise.
* tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Likewise.
(vect_analyze_data_ref_accesses): Likewise.
* tree-vect-generic.cc (expand_vector_operations_1): Likewise.
* cfgexpand.cc (expand_debug_expr): Likewise.
* tree-vect-stmts.cc (vectorizable_conversion): Likewise.
(supportable_widening_operation): Likewise.
* gimple-range-op.cc (gimple_range_op_handler::maybe_non_standard):
Likewise.
* tree-vect-patterns.cc (vect_widened_op_tree): Refactor to replace
usage in vect_recog_sad_pattern.
(vect_recog_sad_pattern): Replace tree code widening pattern with
internal function.
(vect_recog_average_pattern): Likewise.
* tree-pretty-print.cc (dump_generic_node): Remove tree code
definition.
* tree.def (WIDEN_PLUS_EXPR, WIDEN_MINUS_EXPR, VEC_WIDEN_PLUS_HI_EXPR,
VEC_WIDEN_PLUS_LO_EXPR, VEC_WIDEN_MINUS_HI_EXPR,
VEC_WIDEN_MINUS_LO_EXPR): Likewise





Re: [PATCH] i386: Honour -mdirect-extern-access when calling __fentry__

2023-05-10 Thread Uros Bizjak via Gcc-patches
On Tue, May 9, 2023 at 10:58 AM Ard Biesheuvel  wrote:
>
> The small and medium PIC code models generate profiling calls that
> always load the address of __fentry__() via the GOT, even if
> -mdirect-extern-access is in effect.
>
> This deviates from the behavior with respect to other external
> references, and results in a longer opcode that relies on linker
> relaxation to eliminate the GOT load. In this particular case, the
> transformation replaces an indirect 'CALL *__fentry__@GOTPCREL(%rip)'
> with either 'CALL __fentry__; NOP' or 'NOP; CALL __fentry__', where the
> NOP is a 1 byte NOP that preserves the 6 byte length of the sequence.
>
> This is problematic for the Linux kernel, which generally relies on
> -mdirect-extern-access and hidden visibility to eliminate GOT based
> symbol references in code generated with -fpie/-fpic, without having to
> depend on linker relaxation.
>
> The Linux kernel relies on code patching to replace these opcodes with
> NOPs at runtime, and this is complicated code that we'd prefer not to
> complicate even more by adding support for patching both 5 and 6 byte
> sequences as well as parsing the instruction stream to decide which
> variant of CALL+NOP we are dealing with.
>
> So let's honour -mdirect-extern-access, and only load the address of
> __fentry__ via the GOT if direct references to external symbols are not
> permitted.
>
> Note that the GOT reference in question is in fact a data reference: we
> explicitly load the address of __fentry__ from the GOT, which amounts to
> eager binding, rather than emitting a PLT call that could bind eagerly,
> lazily or directly at link time.
>
> gcc/ChangeLog:
>
> * config/i386/i386.cc (x86_function_profiler): Take
>   ix86_direct_extern_access into account when generating calls
>   to __fentry__()

HJ, is the patch OK with you?

Uros.

>
> Cc: H.J. Lu 
> Cc: Jakub Jelinek 
> Cc: Richard Biener 
> Cc: Uros Bizjak 
> Cc: Hou Wenlong 
> ---
>  gcc/config/i386/i386.cc | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index b1d08ecdb3d44729..69b183abb4318b0a 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -21836,8 +21836,12 @@ x86_function_profiler (FILE *file, int labelno 
> ATTRIBUTE_UNUSED)
>   break;
> case CM_SMALL_PIC:
> case CM_MEDIUM_PIC:
> - fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n", mcount_name);
> - break;
> + if (!ix86_direct_extern_access)
> +   {
> + fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n", 
> mcount_name);
> + break;
> +   }
> + /* fall through */
> default:
>   x86_print_call_or_nop (file, mcount_name);
>   break;
> --
> 2.39.2
>
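
For illustration, the control flow of the patched switch case can be sketched
in standalone C. This is a simplified model under stated assumptions, not the
GCC source: the enum, the function name emit_fentry_call and the
direct_extern_access parameter are hypothetical stand-ins for ix86_cmodel,
x86_function_profiler and ix86_direct_extern_access, and the direct path
prints a plain call where GCC calls x86_print_call_or_nop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the code models the hunk distinguishes.  */
enum code_model { MODEL_SMALL_PIC, MODEL_MEDIUM_PIC, MODEL_OTHER };

/* Write the profiling call for MCOUNT_NAME into BUF.  Return true when
   the callee address is loaded through the GOT.  */
static bool
emit_fentry_call (char *buf, size_t len, enum code_model model,
                  bool direct_extern_access, const char *mcount_name)
{
  assert (buf != NULL && mcount_name != NULL);
  switch (model)
    {
    case MODEL_SMALL_PIC:
    case MODEL_MEDIUM_PIC:
      if (!direct_extern_access)
        {
          /* Old behaviour, now only used when direct references to
             external symbols are not permitted.  */
          snprintf (buf, len, "1:\tcall\t*%s@GOTPCREL(%%rip)\n",
                    mcount_name);
          return true;
        }
      /* fall through */
    default:
      /* Simplification: GCC calls x86_print_call_or_nop here.  */
      snprintf (buf, len, "1:\tcall\t%s\n", mcount_name);
      return false;
    }
}
```

With direct_extern_access set, both PIC models take the same path as the
non-PIC default, so only one call form remains to be patched at runtime.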


Re: Re: [PATCH 1/2] [RISC-V] disable shrink-wrap-separate if zcmp enabled.

2023-05-10 Thread Fei Gao
On 2023-05-08 17:20  Kito Cheng  wrote:
>
>-msave-restore is a different story; it's only enabled when the user
>requests, but `-march` describes the capability of the target
>architecture, not specify the preference of performance or size, which
>should be determined by -O1~-O3/-Ofast or -Os/-Oz.
> 

I see and fully agree. 
I'll find a better way to resolve the conflict. 
My current idea is to disable zcmp when shrink-wrap-separate is actually 
active. 

Thanks Kito and Andrew Pinski for your patience.

BR, 
Fei
>On Mon, May 8, 2023 at 4:54 PM Fei Gao  wrote:
>>
>> On 2023-05-08 16:05  Kito Cheng  wrote:
>> >
>> >> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> >> > index 45a63cab9c9..629e5e45cac 100644
>> >> > --- a/gcc/config/riscv/riscv.cc
>> >> > +++ b/gcc/config/riscv/riscv.cc
>> >> > @@ -5729,7 +5729,8 @@ riscv_get_separate_components (void)
>> >> >
>> >> >    if (riscv_use_save_libcall (&cfun->machine->frame)
>> >> >    || cfun->machine->interrupt_handler_p
>> >> > -  || !cfun->machine->frame.gp_sp_offset.is_constant ())
>> >> > +  || !cfun->machine->frame.gp_sp_offset.is_constant ()
>> >> > +  || TARGET_ZCMP)
>> >> >  return components;
>> >>
>> >> I think this is a bad idea. I have a use case where we use the C
>> >> extensions but still compile for -O2 because we want the code to be
>> >> fast as possible but still having the savings of the C extensions.
>> >
>> >Yeah, agree, so I would prefer to drop this from the patch series.
>>
>> Zcmp is a little different here than C.
>> C extension is done fully in AS.  So  we have the code to be
>> fast as possible but still having the savings of the C extensions.
>>
>> Zcmp and shrink-wrap-separate are both done in prologue/epilogue pass
>> and you can only have one switch active to direct sregs save and restore.
>> In my understanding, zcmp push and pop insns seem to
>> be mutually exclusive in functionality to shrink-wrap-separate.
>> It's not expected to see zcmp insns at the beginning/end of prologue/epilogue,
>> and also repeated store/load sregs in separate blocks.
>>
>> Same for save and restore, and i guess that's why we have
>> riscv_use_save_libcall (&cfun->machine->frame) check here.
>>
>> BR,
>> Fei
>>
>> >
>> >> Thanks,
>> >> Andrew Pinski

Re: Re: [PATCH 2/2] [RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-05-10 Thread Fei Gao
On 2023-05-08 10:48  Kito Cheng  wrote:
>
>diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
>new file mode 100644
>index 000..1c2f390269e
>--- /dev/null
>+++ b/gcc/config/riscv/zc.md
>@@ -0,0 +1,55 @@
>...
>+(define_insn "gpr_multi_pop"
>+  [(unspec_volatile [(match_operand 0 "const_int_operand")
>+ (match_operand 1 "const_int_operand")]
>+    UNSPECV_GPR_MULTI_POP)]
>
>I would strongly suggest modeling the right memory and register access
>here correctly instead of using unspec,
>and same for other two patterns.
>
>That will help GCC know the semantics of this operation. 

Sure, working on it. 

BR, 
Fei

[PATCH][committed] aarch64: PR target/99195 annotate simple narrowing patterns for vec-concat-zero

2023-05-10 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This patch cleans up some almost-duplicate patterns for the XTN, SQXTN, UQXTN 
instructions.
Using the vczle/vczbe attributes we can remove the BYTES_BIG_ENDIAN and 
!BYTES_BIG_ENDIAN cases,
as well as the intrinsic expanders that select between the two.
Tests are also added. Thankfully the diffstat comes out negative \O/.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_xtn_insn_le): Delete.
(aarch64_xtn_insn_be): Likewise.
(trunc2): Rename to...
(trunc2): ... This.
(aarch64_xtn): Move under the above.  Just emit the truncate RTL.
(aarch64_qmovn): Likewise.
(aarch64_qmovn): New define_insn.
(aarch64_qmovn_insn_le): Delete.
(aarch64_qmovn_insn_be): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/pr99195_4.c: Add tests for vmovn, vqmovn.


vcxtn.patch
Description: vcxtn.patch


[PATCH][committed] aarch64: Simplify QSHRN expanders and patterns

2023-05-10 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This patch deletes the explicit BYTES_BIG_ENDIAN and !BYTES_BIG_ENDIAN patterns 
for the QSHRN instructions in favour
of annotating a single one with vczle/vczbe. This allows simplification of 
the expander too.
Tests are added to ensure that we still optimise away the concat-with-zero use 
case.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md
(aarch64_qshrn_n_insn_le): Delete.
(aarch64_qshrn_n_insn_be): Delete.
(aarch64_qshrn_n_insn): New define_insn.
(aarch64_qshrn_n): Simplify expander.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/pr99195_5.c: New test.


qshrn.patch
Description: qshrn.patch


[PATCH] ipa-prop: Fix ipa_get_callee_param_type for calls with argument type mismatches

2023-05-10 Thread Jakub Jelinek via Gcc-patches
Hi!

The PR contains a testcase where the Fortran FE creates FUNCTION_TYPE
which doesn't really match the passed in arguments (FUNCTION_TYPE has
5 arguments, call has 6).  Now, I think that is a Fortran FE bug that
should be fixed there, but I think with function pointers one can
create something similar (of course invalid) in C/C++ too, so IMHO IPA
should be also more careful.
The ipa_get_callee_param_type function can return NULL if something goes
wrong and it does e.g. if asked for 7th argument type on a function
with just 5 arguments and similar.  But, if a function isn't varargs,
when asked for 6th argument type on a function with just 5 arguments
it actually returns void_type_node because the argument list is in that
case terminated with void_list_node.

The following patch makes sure we don't treat void_list_node as something
holding another argument.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-05-10  Jakub Jelinek  

PR fortran/109788
* ipa-prop.cc (ipa_get_callee_param_type): Don't return TREE_VALUE (t)
if t is void_list_node.

--- gcc/ipa-prop.cc.jj  2023-05-01 09:59:46.485296735 +0200
+++ gcc/ipa-prop.cc 2023-05-09 15:07:44.975578250 +0200
@@ -2147,7 +2147,7 @@ ipa_get_callee_param_type (struct cgraph
 break;
   t = TREE_CHAIN (t);
 }
-  if (t)
+  if (t && t != void_list_node)
 return TREE_VALUE (t);
   if (!e->callee)
 return NULL;

Jakub
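
The off-by-one described above can be reproduced with a miniature linked
list. This is only a sketch: struct type_list, callee_param_type and the
skip_terminator flag are hypothetical names, the string values stand in for
type nodes, and the fallback to the callee declaration that follows in the
real function is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of a TREE_LIST of parameter types.  */
struct type_list
{
  const char *value;        /* plays the role of TREE_VALUE */
  struct type_list *chain;  /* plays the role of TREE_CHAIN */
};

/* Shared terminator of non-varargs parameter lists.  */
static struct type_list void_list_node = { "void", NULL };

/* Return the type of parameter I (0-based), or NULL if there is none.
   SKIP_TERMINATOR selects the patched condition.  */
static const char *
callee_param_type (struct type_list *t, int i, bool skip_terminator)
{
  assert (i >= 0);
  for (int n = 0; n < i; n++)
    {
      if (!t)
        break;
      t = t->chain;
    }
  /* Without the second test the terminator is mistaken for one more
     parameter of type void.  */
  if (t && (!skip_terminator || t != &void_list_node))
    return t->value;
  return NULL;
}
```

For a two-parameter list, index 2 yields "void" with the old condition and
NULL with the patched one; index 3 yields NULL either way because the walk
runs off the end of the chain.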



Re: [PATCH] vect: Missed opportunity to use [SU]ABD

2023-05-10 Thread Richard Biener via Gcc-patches
On Wed, May 10, 2023 at 11:01 AM Richard Sandiford
 wrote:
>
> Oluwatamilore Adebayo  writes:
> > From 0b5f469171c340ef61a48a31877d495bb77bd35f Mon Sep 17 00:00:00 2001
> > From: oluade01 
> > Date: Fri, 14 Apr 2023 10:24:43 +0100
> > Subject: [PATCH 1/4] Missed opportunity to use [SU]ABD
> >
> > This adds a recognition pattern for the non-widening
> > absolute difference (ABD).
> >
> > gcc/ChangeLog:
> >
> > * doc/md.texi (sabd, uabd): Document them.
> > * internal-fn.def (ABD): Use new optab.
> > * optabs.def (sabd_optab, uabd_optab): New optabs,
> > * tree-vect-patterns.cc (vect_recog_absolute_difference):
> > Recognize the following idiom abs (a - b).
> > (vect_recog_sad_pattern): Refactor to use
> > vect_recog_absolute_difference.
> > (vect_recog_abd_pattern): Use patterns found by
> > vect_recog_absolute_difference to build a new ABD
> > internal call.
> > ---
> >  gcc/doc/md.texi   |  10 ++
> >  gcc/internal-fn.def   |   3 +
> >  gcc/optabs.def|   2 +
> >  gcc/tree-vect-patterns.cc | 250 +-
> >  4 files changed, 234 insertions(+), 31 deletions(-)
> >
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 
> > 07bf8bdebffb2e523f25a41f2b57e43c0276b745..0ad546c63a8deebb4b6db894f437d1e21f0245a8
> >  100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5778,6 +5778,16 @@ Other shift and rotate instructions, analogous to the
> >  Vector shift and rotate instructions that take vectors as operand 2
> >  instead of a scalar type.
> >
> > +@cindex @code{uabd@var{m}} instruction pattern
> > +@cindex @code{sabd@var{m}} instruction pattern
> > +@item @samp{uabd@var{m}}, @samp{sabd@var{m}}
> > +Signed and unsigned absolute difference instructions.  These
> > +instructions find the difference between operands 1 and 2
> > +then return the absolute value.  A C code equivalent would be:
> > +@smallexample
> > +op0 = abs (op0 - op1)
>
> op0 = abs (op1 - op2)
>
> But that isn't the correct calculation for unsigned (where abs doesn't
> really work).  It also doesn't handle some cases correctly for signed.
>
> I think it's more:
>
>   op0 = op1 > op2 ? (unsigned type) op1 - op2 : (unsigned type) op2 - op1
>
> or (conceptually) max minus min.
>
> E.g. for 16-bit values, the absolute difference between signed 0x7fff
> and signed -0x8000 is 0xffff (reinterpreted as -1 if you cast back
> to signed).  But, ignoring undefined behaviour:
>
>   0x7fff - 0x8000 = -1
>   abs(-1) = 1
>
> which gives the wrong answer.
>
> We might still be able to fold C abs(a - b) to abd for signed a and b
> by relying on undefined behaviour (TYPE_OVERFLOW_UNDEFINED).  But we
> can't do it for -fwrapv.
>
> Richi knows better than me what would be appropriate here.

The question is what does the hardware do?  For the widening [us]sad it's
obvious since the difference is computed in a wider signed mode and the
absolute value always fits.

So what does it actually do, esp. when the difference yields 0x8000?

Richard.
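
To make the 16-bit example in the quoted review concrete, here is a small C
sketch. It assumes wrap-around semantics for the naive form (as under
-fwrapv); the helper names wrap_s16, naive_abs_diff and abd_s16 are
hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Reduce V modulo 2^16 and reinterpret the result as signed, without
   relying on implementation-defined narrowing conversions.  */
static int16_t
wrap_s16 (int32_t v)
{
  uint16_t u = (uint16_t) v;
  return (int16_t) (u >= 0x8000u ? (int32_t) u - 0x10000 : (int32_t) u);
}

/* abs (a - b) with every step wrapped to 16 bits.  */
static uint16_t
naive_abs_diff (int16_t a, int16_t b)
{
  int16_t d = wrap_s16 ((int32_t) a - b);
  return (uint16_t) wrap_s16 (d < 0 ? -(int32_t) d : d);
}

/* Max minus min, computed in the unsigned type.  */
static uint16_t
abd_s16 (int16_t a, int16_t b)
{
  return a > b ? (uint16_t) ((uint16_t) a - (uint16_t) b)
               : (uint16_t) ((uint16_t) b - (uint16_t) a);
}

/* The pair of values discussed in the review.  */
static void
check_thread_example (void)
{
  assert (naive_abs_diff (0x7fff, INT16_MIN) == 1);
  assert (abd_s16 (0x7fff, INT16_MIN) == 0xffff);
}
```

The two forms agree whenever a - b is representable in the signed type,
which is why the fold could only be justified by relying on undefined
overflow.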

>
> Thanks,
> Richard
>
> > +@end smallexample
> > +
> >  @cindex @code{avg@var{m}3_floor} instruction pattern
> >  @cindex @code{uavg@var{m}3_floor} instruction pattern
> >  @item @samp{avg@var{m}3_floor}
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 
> > 7fe742c2ae713e7152ab05cfdfba86e4e0aa3456..0f1724ecf37a31c231572edf90b5577e2d82f468
> >  100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -167,6 +167,9 @@ DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
> >  DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
> >  DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)
> >
> > +DEF_INTERNAL_SIGNED_OPTAB_FN (ABD, ECF_CONST | ECF_NOTHROW, first,
> > + sabd, uabd, binary)
> > +
> >  DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_FLOOR, ECF_CONST | ECF_NOTHROW, first,
> >   savg_floor, uavg_floor, binary)
> >  DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_CEIL, ECF_CONST | ECF_NOTHROW, first,
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index 
> > 695f5911b300c9ca5737de9be809fa01aabe5e01..29bc92281a2175f898634cbe6af63c18021e5268
> >  100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -359,6 +359,8 @@ OPTAB_D (mask_fold_left_plus_optab, 
> > "mask_fold_left_plus_$a")
> >  OPTAB_D (extract_last_optab, "extract_last_$a")
> >  OPTAB_D (fold_extract_last_optab, "fold_extract_last_$a")
> >
> > +OPTAB_D (uabd_optab, "uabd$a3")
> > +OPTAB_D (sabd_optab, "sabd$a3")
> >  OPTAB_D (savg_floor_optab, "avg$a3_floor")
> >  OPTAB_D (uavg_floor_optab, "uavg$a3_floor")
> >  OPTAB_D (savg_ceil_optab, "avg$a3_ceil")
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index 
> > a49b09539776c0056e77f99b10365d0a8747fbc5..91e1f9d4b610275dd833ec56dc77f76367ee7886
> >  100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -770,6 +770,89 @@ vect_spli

Re: [PATCH] vect: Missed opportunity to use [SU]ABD

2023-05-10 Thread Richard Biener via Gcc-patches
On Wed, May 10, 2023 at 11:49 AM Richard Biener
 wrote:
>
> On Wed, May 10, 2023 at 11:01 AM Richard Sandiford
>  wrote:
> >
> > Oluwatamilore Adebayo  writes:
> > > From 0b5f469171c340ef61a48a31877d495bb77bd35f Mon Sep 17 00:00:00 2001
> > > From: oluade01 
> > > Date: Fri, 14 Apr 2023 10:24:43 +0100
> > > Subject: [PATCH 1/4] Missed opportunity to use [SU]ABD
> > >
> > > This adds a recognition pattern for the non-widening
> > > absolute difference (ABD).
> > >
> > > gcc/ChangeLog:
> > >
> > > * doc/md.texi (sabd, uabd): Document them.
> > > * internal-fn.def (ABD): Use new optab.
> > > * optabs.def (sabd_optab, uabd_optab): New optabs,
> > > * tree-vect-patterns.cc (vect_recog_absolute_difference):
> > > Recognize the following idiom abs (a - b).
> > > (vect_recog_sad_pattern): Refactor to use
> > > vect_recog_absolute_difference.
> > > (vect_recog_abd_pattern): Use patterns found by
> > > vect_recog_absolute_difference to build a new ABD
> > > internal call.
> > > ---
> > >  gcc/doc/md.texi   |  10 ++
> > >  gcc/internal-fn.def   |   3 +
> > >  gcc/optabs.def|   2 +
> > >  gcc/tree-vect-patterns.cc | 250 +-
> > >  4 files changed, 234 insertions(+), 31 deletions(-)
> > >
> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > > index 
> > > 07bf8bdebffb2e523f25a41f2b57e43c0276b745..0ad546c63a8deebb4b6db894f437d1e21f0245a8
> > >  100644
> > > --- a/gcc/doc/md.texi
> > > +++ b/gcc/doc/md.texi
> > > @@ -5778,6 +5778,16 @@ Other shift and rotate instructions, analogous to 
> > > the
> > >  Vector shift and rotate instructions that take vectors as operand 2
> > >  instead of a scalar type.
> > >
> > > +@cindex @code{uabd@var{m}} instruction pattern
> > > +@cindex @code{sabd@var{m}} instruction pattern
> > > +@item @samp{uabd@var{m}}, @samp{sabd@var{m}}
> > > +Signed and unsigned absolute difference instructions.  These
> > > +instructions find the difference between operands 1 and 2
> > > +then return the absolute value.  A C code equivalent would be:
> > > +@smallexample
> > > +op0 = abs (op0 - op1)
> >
> > op0 = abs (op1 - op2)
> >
> > But that isn't the correct calculation for unsigned (where abs doesn't
> > really work).  It also doesn't handle some cases correctly for signed.
> >
> > I think it's more:
> >
> >   op0 = op1 > op2 ? (unsigned type) op1 - op2 : (unsigned type) op2 - op1
> >
> > or (conceptually) max minus min.
> >
> > E.g. for 16-bit values, the absolute difference between signed 0x7fff
> > and signed -0x8000 is 0xffff (reinterpreted as -1 if you cast back
> > to signed).  But, ignoring undefined behaviour:
> >
> >   0x7fff - 0x8000 = -1
> >   abs(-1) = 1
> >
> > which gives the wrong answer.
> >
> > We might still be able to fold C abs(a - b) to abd for signed a and b
> > by relying on undefined behaviour (TYPE_OVERFLOW_UNDEFINED).  But we
> > can't do it for -fwrapv.
> >
> > Richi knows better than me what would be appropriate here.
>
> The question is what does the hardware do?  For the widening [us]sad it's
> obvious since the difference is computed in a wider signed mode and the
> absolute value always fits.
>
> So what does it actually do, esp. when the difference yields 0x8000?

A "sensible" definition would be that it works like the widening [us]sad
and applies truncation to the result (modulo-reducing when the result
isn't always unsigned).

Richard.
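
A sketch of that "sensible" definition for 16-bit elements, assuming the
difference is formed in a 32-bit type and the absolute value is then
truncated; abd_via_widening is a hypothetical name, and this says nothing
about what any particular hardware does.

```c
#include <assert.h>
#include <stdint.h>

/* Widen, subtract, take the absolute value, then truncate to 16 bits.  */
static uint16_t
abd_via_widening (int16_t a, int16_t b)
{
  int32_t d = (int32_t) a - (int32_t) b;  /* cannot overflow in 32 bits */
  int32_t m = d < 0 ? -d : d;             /* always in 0 .. 0xffff */
  assert (m >= 0 && m <= 0xffff);
  return (uint16_t) m;
}
```

Under this definition a difference of magnitude 0x8000 is returned as 0x8000
in the unsigned result; reading the same bits as a signed 16-bit value gives
-0x8000, which is the modulo reduction mentioned above.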

> Richard.
>
> >
> > Thanks,
> > Richard
> >
> > > +@end smallexample
> > > +
> > >  @cindex @code{avg@var{m}3_floor} instruction pattern
> > >  @cindex @code{uavg@var{m}3_floor} instruction pattern
> > >  @item @samp{avg@var{m}3_floor}
> > > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > > index 
> > > 7fe742c2ae713e7152ab05cfdfba86e4e0aa3456..0f1724ecf37a31c231572edf90b5577e2d82f468
> > >  100644
> > > --- a/gcc/internal-fn.def
> > > +++ b/gcc/internal-fn.def
> > > @@ -167,6 +167,9 @@ DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
> > >  DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
> > >  DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)
> > >
> > > +DEF_INTERNAL_SIGNED_OPTAB_FN (ABD, ECF_CONST | ECF_NOTHROW, first,
> > > + sabd, uabd, binary)
> > > +
> > >  DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_FLOOR, ECF_CONST | ECF_NOTHROW, first,
> > >   savg_floor, uavg_floor, binary)
> > >  DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_CEIL, ECF_CONST | ECF_NOTHROW, first,
> > > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > > index 
> > > 695f5911b300c9ca5737de9be809fa01aabe5e01..29bc92281a2175f898634cbe6af63c18021e5268
> > >  100644
> > > --- a/gcc/optabs.def
> > > +++ b/gcc/optabs.def
> > > @@ -359,6 +359,8 @@ OPTAB_D (mask_fold_left_plus_optab, 
> > > "mask_fold_left_plus_$a")
> > >  OPTAB_D (extract_last_optab, "extract_last_$a")
> > >  OPTAB_D (fold_extract_last_optab, "fold_extract_last_$a")
> > >
> > > +OPTAB_D (uabd_optab

Re: [PATCH] ipa-prop: Fix ipa_get_callee_param_type for calls with argument type mismatches

2023-05-10 Thread Richard Biener via Gcc-patches
On Wed, 10 May 2023, Jakub Jelinek wrote:

> Hi!
> 
> The PR contains a testcase where the Fortran FE creates FUNCTION_TYPE
> which doesn't really match the passed in arguments (FUNCTION_TYPE has
> 5 arguments, call has 6).  Now, I think that is a Fortran FE bug that
> should be fixed there, but I think with function pointers one can
> create something similar (of course invalid) in C/C++ too, so IMHO IPA
> should be also more careful.
> The ipa_get_callee_param_type function can return NULL if something goes
> wrong and it does e.g. if asked for 7th argument type on a function
> with just 5 arguments and similar.  But, if a function isn't varargs,
> when asked for 6th argument type on a function with just 5 arguments
> it actually returns void_type_node because the argument list is in that
> case terminated with void_list_node.
> 
> The following patch makes sure we don't treat void_list_node as something
> holding another argument.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2023-05-10  Jakub Jelinek  
> 
>   PR fortran/109788
>   * ipa-prop.cc (ipa_get_callee_param_type): Don't return TREE_VALUE (t)
>   if t is void_list_node.
> 
> --- gcc/ipa-prop.cc.jj2023-05-01 09:59:46.485296735 +0200
> +++ gcc/ipa-prop.cc   2023-05-09 15:07:44.975578250 +0200
> @@ -2147,7 +2147,7 @@ ipa_get_callee_param_type (struct cgraph
>  break;
>t = TREE_CHAIN (t);
>  }
> -  if (t)
> +  if (t && t != void_list_node)
>  return TREE_VALUE (t);
>if (!e->callee)
>  return NULL;
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-05-10 Thread Uros Bizjak via Gcc-patches
On Fri, Apr 28, 2023 at 2:47 AM Fangrui Song  wrote:
>
> When using -mcmodel=medium, large data is placed into .l* sections.  GNU ld
> places .l* sections into separate output sections.  If small and medium
> code model object files are mixed, the .l* sections won't cause
> relocation overflow pressure on sections in -mcmodel=small object files.
>
> However, when using -mcmodel=large, -mlarge-data-threshold doesn't apply.  
> This
> means that the .rodata/.data/.bss sections may cause relocation overflow
> pressure on sections in -mcmodel=small object files.
>
> This patch allows -mcmodel=large to generate .l* sections.

The x86_64 psABI does not specify how -mlarge-data-threshold is handled
with -mcmodel=large and it also doesn't mention that -mcmodel=large
inherits handling from -mcmodel=medium. The ABI does say that the
-mcmodel=large is strictly only required if the text of a single
function becomes larger than what the medium model allows.

OTOH, the ABI also doesn't prohibit large sections with -mcmodel=large
and IMO, the introduction of -mlarge-data-threshold with -mcmodel=large
does not create an ABI change.

I think the best way is to first discuss the issue with the x86_64
psABI group, to clarify how -mlarge-data-threshold and large data are
handled under a large code model.

Uros.
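
For reference, the size test that the quoted patch extends can be modelled in
a few lines of C. This is a simplified sketch, not the GCC code: the enum, the
function goes_in_large_section and the large_model_too flag are hypothetical,
and the additional checks made by ix86_in_large_data_p are omitted. The
threshold value 65535 is the default stated in the quoted invoke.texi text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant ix86_cmodel values.  */
enum code_model
{
  MODEL_SMALL, MODEL_MEDIUM, MODEL_MEDIUM_PIC, MODEL_LARGE, MODEL_LARGE_PIC
};

/* Return true if an object of SIZE bytes would be placed in a large
   (.l*) section.  LARGE_MODEL_TOO selects the behaviour proposed by the
   patch; false models the current behaviour.  */
static bool
goes_in_large_section (enum code_model model, unsigned long long size,
                       unsigned int threshold, bool large_model_too)
{
  bool model_ok = model == MODEL_MEDIUM || model == MODEL_MEDIUM_PIC;
  if (large_model_too)
    model_ok = model_ok || model == MODEL_LARGE || model == MODEL_LARGE_PIC;
  return model_ok && size > threshold;
}

/* Objects of exactly the threshold size stay in the regular sections.  */
static void
check_default_threshold (void)
{
  assert (!goes_in_large_section (MODEL_MEDIUM, 65535, 65535, false));
  assert (goes_in_large_section (MODEL_MEDIUM, 65536, 65535, false));
}
```

The only difference between the two behaviours is whether a large code model
passes the first half of the test; the size comparison is unchanged.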
>
> Signed-off-by: Fangrui Song 
> ---
>  gcc/config/i386/i386.cc| 15 +--
>  gcc/config/i386/i386.opt   |  2 +-
>  gcc/doc/invoke.texi|  7 ---
>  gcc/testsuite/gcc.target/i386/large-data.c | 13 +
>  4 files changed, 27 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/large-data.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index a3db55642e3..c68c66a5567 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -637,7 +637,8 @@ ix86_can_inline_p (tree caller, tree callee)
>  static bool
>  ix86_in_large_data_p (tree exp)
>  {
> -  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC)
> +  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC &&
> +  ix86_cmodel != CM_LARGE && ix86_cmodel != CM_LARGE_PIC)
>  return false;
>
>if (exp == NULL_TREE)
> @@ -848,8 +849,9 @@ x86_elf_aligned_decl_common (FILE *file, tree decl,
> const char *name, unsigned HOST_WIDE_INT size,
> unsigned align)
>  {
> -  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
> -  && size > (unsigned int)ix86_section_threshold)
> +  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC ||
> +  ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC) &&
> + size > (unsigned int)ix86_section_threshold)
>  {
>switch_to_section (get_named_section (decl, ".lbss", 0));
>fputs (LARGECOMM_SECTION_ASM_OP, file);
> @@ -869,9 +871,10 @@ void
>  x86_output_aligned_bss (FILE *file, tree decl, const char *name,
> unsigned HOST_WIDE_INT size, unsigned align)
>  {
> -  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
> -  && size > (unsigned int)ix86_section_threshold)
> -switch_to_section (get_named_section (decl, ".lbss", 0));
> +  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC ||
> +   ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC) &&
> +  size > (unsigned int)ix86_section_threshold)
> +switch_to_section(get_named_section(decl, ".lbss", 0));
>else
>  switch_to_section (bss_section);
>ASM_OUTPUT_ALIGN (file, floor_log2 (align / BITS_PER_UNIT));
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index d74f6b1f8fc..de8e722cd62 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -282,7 +282,7 @@ Branches are this expensive (arbitrary units).
>
>  mlarge-data-threshold=
>  Target RejectNegative Joined UInteger Var(ix86_section_threshold) 
> Init(DEFAULT_LARGE_SECTION_THRESHOLD)
> --mlarge-data-threshold=Data greater than given threshold 
> will go into .ldata section in x86-64 medium model.
> +-mlarge-data-threshold=Data greater than given threshold 
> will go into a large data section in x86-64 medium and large code models.
>
>  mcmodel=
>  Target RejectNegative Joined Enum(cmodel) Var(ix86_cmodel) Init(CM_32)
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index e5ee2d536fc..4a20eef92e5 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -32927,9 +32927,10 @@ the cache line size.  @samp{compat} is the default.
>
>  @opindex mlarge-data-threshold
>  @item -mlarge-data-threshold=@var{threshold}
> -When @option{-mcmodel=medium} is specified, data objects larger than
> -@var{threshold} are placed in the large data section.  This value must be the
> -same across all objects linked into the binary, and defaults to 65535.
> +When @option{-mcmodel=medium} or @option{-mcmodel=large} is specified, data
> +objects larger than @var{

[PATCH] fix radix sort on 32bit platforms [PR109670]

2023-05-10 Thread Thomas Neumann via Gcc-patches

The radix sort uses two buffers, a1 for input and a2 for output.
After every digit the role of the two buffers is swapped.
When terminating the sort early the code made sure the output
was in a2.  However, when we run out of bits, as can happen on
32bit platforms, the sorted result was in a1, as we had just
swapped a1 and a2.
This patch fixes the problem by unconditionally having a1 as
output after every loop iteration.

This bug manifested itself only on 32bit platforms and even then
only in some circumstances, as it needs frames where a swap
is required due to differences in the top-most byte, which is
affected by ASLR. The new logic was validated by exhaustive
search over 32bit input values.

libgcc/ChangeLog:
* unwind-dw2-fde.c: Fix radix sort buffer management.
---
 libgcc/unwind-dw2-fde.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
index 7b74c391ced..31a3834156b 100644
--- a/libgcc/unwind-dw2-fde.c
+++ b/libgcc/unwind-dw2-fde.c
@@ -624,8 +624,6 @@ fde_radixsort (struct object *ob, fde_extractor_t 
fde_extractor,

   // Stop if we are already sorted.
   if (!violations)
{
- // The sorted data is in a1 now.
- a2 = a1;
  break;
}

@@ -660,9 +658,9 @@ fde_radixsort (struct object *ob, fde_extractor_t 
fde_extractor,

 #undef FANOUT
 #undef FANOUTBITS

-  // The data is in a2 now, move in place if needed.
-  if (a2 != v1->array)
-memcpy (v1->array, a2, sizeof (const fde *) * n);
+  // The data is in a1 now, move in place if needed.
+  if (a1 != v1->array)
+memcpy (v1->array, a1, sizeof (const fde *) * n);
 }

 static inline void
--
2.39.2
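
The buffer invariant the patch establishes can be shown on a stripped-down
radix sort over plain 32-bit keys. This is a sketch under assumptions, not the
libgcc code: radix_sort_u32 and its parameters are hypothetical, keys are
sorted directly instead of going through an FDE extractor, and the scratch
buffer is supplied by the caller.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FANOUTBITS 8
#define FANOUT (1u << FANOUTBITS)

/* Sort V[0..N) in place, using SCRATCH[0..N) as the second buffer.  */
static void
radix_sort_u32 (uint32_t *v, uint32_t *scratch, size_t n)
{
  assert (n == 0 || (v != NULL && scratch != NULL));
  uint32_t *a1 = v, *a2 = scratch;  /* a1: pass input, a2: pass output */

  for (unsigned shift = 0; shift < 32; shift += FANOUTBITS)
    {
      /* Count digits and detect whether a1 is already sorted.  */
      size_t counts[FANOUT] = { 0 };
      size_t violations = 0;
      uint32_t last = 0;
      for (size_t i = 0; i < n; i++)
        {
          counts[(a1[i] >> shift) & (FANOUT - 1)]++;
          if (a1[i] < last)
            violations++;
          last = a1[i];
        }

      /* Stop if we are already sorted; the data is in a1.  */
      if (!violations)
        break;

      /* Turn counts into start offsets, then scatter a1 into a2.  */
      size_t sum = 0;
      for (size_t d = 0; d < FANOUT; d++)
        {
          size_t c = counts[d];
          counts[d] = sum;
          sum += c;
        }
      for (size_t i = 0; i < n; i++)
        a2[counts[(a1[i] >> shift) & (FANOUT - 1)]++] = a1[i];

      /* Swap roles: the freshly written buffer becomes a1.  */
      uint32_t *tmp = a1;
      a1 = a2;
      a2 = tmp;
    }

  /* Whether we left via break or ran out of bits, the data is in a1.  */
  if (a1 != v)
    memcpy (v, a1, sizeof (uint32_t) * n);
}
```

Because the swap happens at the end of every completed pass, both loop exits
leave the result in a1, so a single copy-back test suffices.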



[PATCH][committed] aarch64: PR target/99195 annotate simple saturating add/sub patterns for vec-concat-zero

2023-05-10 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

Moving onto the saturating instructions, this one goes through the simple 
add/sub ones.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_q):
Rename to...
(aarch64_q): ... This.
(aarch64_qadd): Rename to...
(aarch64_qadd): ... This.

gcc/testsuite/ChangeLog:

PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add testing for qadd, qsub.
* gcc.target/aarch64/simd/pr99195_6.c: New test.


qaddsub.patch
Description: qaddsub.patch


[PATCH][committed] aarch64: PR target/99195 annotate simple permutation patterns for vec-concat-zero

2023-05-10 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

Another straightforward patch annotating patterns for the zip1, zip2, uzp1, 
uzp2, rev* instructions, plus tests.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_):
 Rename to...
(aarch64_): ... This.
(aarch64_rev): Rename to...
(aarch64_rev): ... This.

gcc/testsuite/ChangeLog:

PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add tests for zip and rev
intrinsics.


permvcz.patch
Description: permvcz.patch


[PATCH][committed] aarch64: Simplify sqmovun expander

2023-05-10 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This patch is a no-op as it removes the explicit vec-concat-zero patterns in 
favour of vczle/vczbe.
This allows us to delete the explicit expander too. Tests are added to ensure 
the optimisation required
still triggers.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_sqmovun_insn_le): 
Delete.
(aarch64_sqmovun_insn_be): Delete.
(aarch64_sqmovun): New define_insn.
(aarch64_sqmovun): Delete expander.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/pr99195_4.c: Add tests for sqmovun.


sqmovun.patch
Description: sqmovun.patch


[RFC] libstdc++: Do not use pthread_mutex_clocklock with ThreadSanitizer

2023-05-10 Thread Jonathan Wakely via Gcc-patches
This patch would avoid TSan false positives when using timed waiting
functions on mutexes and condvars, but as noted below, it changes the
semantics.

I'm not sure whether we want this workaround in place until tsan gets
fixed.

On one hand, there's no guarantee that those functions use the right
clock anyway (and they won't do unless a recent-ish glibc is used). But
on the other hand, if they normally would use the right clock because
you have glibc support, it's not ideal for tsan to cause a different
clock to be used.

-- >8 --

As noted in https://github.com/llvm/llvm-project/issues/62623 there are
no tsan interceptors for the new POSIX-1:202x APIs added by
https://austingroupbugs.net/view.php?id=1216 so tsan gives false
positive warnings.

Disable the uses of the new APIs when tsan is active. This changes the
semantics of those functions, because it can change which clock is used
for the wait. This means those functions might be affected by system
clock adjustments when tsan is used, when they would not be affected
otherwise.
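
To make the semantic change concrete, here is a minimal sketch of what a fallback path has to do when it cannot wait on the monotonic clock directly: translate the deadline to the wall clock once and wait on that. The helper names (to_system_deadline, effective_wait) are hypothetical and this is not the libstdc++ implementation, only an illustration of why a wall-clock step after the translation changes how long the wait really lasts.

```cpp
#include <cassert>
#include <chrono>

using steady_tp = std::chrono::steady_clock::time_point;
using system_tp = std::chrono::system_clock::time_point;

// Hypothetical helper: translate a monotonic deadline into a wall-clock
// deadline, given one sample of each clock taken at the same moment.
inline system_tp
to_system_deadline (steady_tp deadline, steady_tp steady_now,
                    system_tp system_now)
{
  auto remaining = deadline - steady_now;
  return system_now
         + std::chrono::duration_cast<std::chrono::system_clock::duration>
             (remaining);
}

// How long a wait on the translated deadline lasts if the wall clock is
// stepped by ADJUSTMENT right after the translation.  The epochs used here
// are arbitrary; only the differences matter.
inline std::chrono::seconds
effective_wait (std::chrono::seconds requested,
                std::chrono::seconds adjustment)
{
  steady_tp steady_now {};
  system_tp system_now {};
  system_tp deadline
    = to_system_deadline (steady_now + requested, steady_now, system_now);
  system_tp adjusted_now = system_now + adjustment;
  return std::chrono::duration_cast<std::chrono::seconds>
           (deadline - adjusted_now);
}
```

With no clock adjustment the wait lasts as long as requested; if the wall clock is stepped back by an hour, a 5 second wait stretches by that hour. A wait performed directly against the monotonic clock would not see the step at all.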

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_CHECK_PTHREAD_COND_CLOCKWAIT): Define
_GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT to depend on _GLIBCXX_TSAN.
(GLIBCXX_CHECK_PTHREAD_MUTEX_CLOCKLOCK): Likewise for
_GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK.
(GLIBCXX_CHECK_PTHREAD_RWLOCK_CLOCKLOCK): Likewise for
_GLIBCXX_USE_PTHREAD_RWLOCK_CLOCKLOCK.
* configure: Regenerate.
---
 libstdc++-v3/acinclude.m4 | 6 +++---
 libstdc++-v3/configure| 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 89e7f5f5f45..e2700b05ec3 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -4284,7 +4284,7 @@ AC_DEFUN([GLIBCXX_CHECK_PTHREAD_COND_CLOCKWAIT], [
   [glibcxx_cv_PTHREAD_COND_CLOCKWAIT=no])
   ])
   if test $glibcxx_cv_PTHREAD_COND_CLOCKWAIT = yes; then
-AC_DEFINE(_GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT, 1, [Define if 
pthread_cond_clockwait is available in .])
+AC_DEFINE(_GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT, (_GLIBCXX_TSAN==0), [Define 
if pthread_cond_clockwait is available in .])
   fi
 
   CXXFLAGS="$ac_save_CXXFLAGS"
@@ -4314,7 +4314,7 @@ AC_DEFUN([GLIBCXX_CHECK_PTHREAD_MUTEX_CLOCKLOCK], [
   [glibcxx_cv_PTHREAD_MUTEX_CLOCKLOCK=no])
   ])
   if test $glibcxx_cv_PTHREAD_MUTEX_CLOCKLOCK = yes; then
-AC_DEFINE(_GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK, 1, [Define if 
pthread_mutex_clocklock is available in .])
+AC_DEFINE(_GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK, (_GLIBCXX_TSAN==0), 
[Define if pthread_mutex_clocklock is available in .])
   fi
 
   CXXFLAGS="$ac_save_CXXFLAGS"
@@ -4346,7 +4346,7 @@ AC_DEFUN([GLIBCXX_CHECK_PTHREAD_RWLOCK_CLOCKLOCK], [
   [glibcxx_cv_PTHREAD_RWLOCK_CLOCKLOCK=no])
   ])
   if test $glibcxx_cv_PTHREAD_RWLOCK_CLOCKLOCK = yes; then
-AC_DEFINE(_GLIBCXX_USE_PTHREAD_RWLOCK_CLOCKLOCK, 1, [Define if 
pthread_rwlock_clockrdlock and pthread_rwlock_clockwrlock are available in 
.])
+AC_DEFINE(_GLIBCXX_USE_PTHREAD_RWLOCK_CLOCKLOCK, (_GLIBCXX_TSAN==0), 
[Define if pthread_rwlock_clockrdlock and pthread_rwlock_clockwrlock are 
available in .])
   fi
 
   CXXFLAGS="$ac_save_CXXFLAGS"



Re: [RFC] libstdc++: Do not use pthread_mutex_clocklock with ThreadSanitizer

2023-05-10 Thread Jonathan Wakely via Gcc-patches
On Wed, 10 May 2023 at 12:20, Jonathan Wakely via Libstdc++ <
libstd...@gcc.gnu.org> wrote:

> This patch would avoid TSan false positives when using timed waiting
> functions on mutexes and condvars, but as noted below, it changes the
> semantics.
>
> I'm not sure whether we want this workaround in place until tsan gets
> fixed.
>
> On one hand, there's no guarantee that those functions use the right
> clock anyway (and they won't do unless a recent-ish glibc is used). But
> on the other hand, if they normally would use the right clock because
> you have glibc support, it's not ideal for tsan to cause a different
> clock to be used.
>

But of course, it's not ideal to get false positives from tsan either
(especially when it looks like a libstdc++ bug, as initially reported to
me).



>
> -- >8 --
>
> As noted in https://github.com/llvm/llvm-project/issues/62623 there are
> no tsan interceptors for the new POSIX-1:202x APIs added by
> https://austingroupbugs.net/view.php?id=1216 so tsan gives false
> positive warnings.
>
> Disable the uses of the new APIs when tsan is active. This changes the
> semantics of those functions, because it can change which clock is used
> for the wait. This means those functions might be affected by system
> clock adjustments when tsan is used, when they would not be affected
> otherwise.
>
> libstdc++-v3/ChangeLog:
>
> * acinclude.m4 (GLIBCXX_CHECK_PTHREAD_COND_CLOCKWAIT): Define
> _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT to depend on _GLIBCXX_TSAN.
> (GLIBCXX_CHECK_PTHREAD_MUTEX_CLOCKLOCK): Likewise for
> _GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK.
> (GLIBCXX_CHECK_PTHREAD_RWLOCK_CLOCKLOCK): Likewise for
> _GLIBCXX_USE_PTHREAD_RWLOCK_CLOCKLOCK.
> * configure: Regenerate.
> ---
>  libstdc++-v3/acinclude.m4 | 6 +++---
>  libstdc++-v3/configure| 6 +++---
>  2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
> index 89e7f5f5f45..e2700b05ec3 100644
> --- a/libstdc++-v3/acinclude.m4
> +++ b/libstdc++-v3/acinclude.m4
> @@ -4284,7 +4284,7 @@ AC_DEFUN([GLIBCXX_CHECK_PTHREAD_COND_CLOCKWAIT], [
>[glibcxx_cv_PTHREAD_COND_CLOCKWAIT=no])
>])
>if test $glibcxx_cv_PTHREAD_COND_CLOCKWAIT = yes; then
> -AC_DEFINE(_GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT, 1, [Define if
> pthread_cond_clockwait is available in .])
> +AC_DEFINE(_GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT, (_GLIBCXX_TSAN==0),
> [Define if pthread_cond_clockwait is available in .])
>fi
>
>CXXFLAGS="$ac_save_CXXFLAGS"
> @@ -4314,7 +4314,7 @@ AC_DEFUN([GLIBCXX_CHECK_PTHREAD_MUTEX_CLOCKLOCK], [
>[glibcxx_cv_PTHREAD_MUTEX_CLOCKLOCK=no])
>])
>if test $glibcxx_cv_PTHREAD_MUTEX_CLOCKLOCK = yes; then
> -AC_DEFINE(_GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK, 1, [Define if
> pthread_mutex_clocklock is available in .])
> +AC_DEFINE(_GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK, (_GLIBCXX_TSAN==0),
> [Define if pthread_mutex_clocklock is available in .])
>fi
>
>CXXFLAGS="$ac_save_CXXFLAGS"
> @@ -4346,7 +4346,7 @@ AC_DEFUN([GLIBCXX_CHECK_PTHREAD_RWLOCK_CLOCKLOCK], [
>[glibcxx_cv_PTHREAD_RWLOCK_CLOCKLOCK=no])
>])
>if test $glibcxx_cv_PTHREAD_RWLOCK_CLOCKLOCK = yes; then
> -AC_DEFINE(_GLIBCXX_USE_PTHREAD_RWLOCK_CLOCKLOCK, 1, [Define if
> pthread_rwlock_clockrdlock and pthread_rwlock_clockwrlock are available in
> .])
> +AC_DEFINE(_GLIBCXX_USE_PTHREAD_RWLOCK_CLOCKLOCK, (_GLIBCXX_TSAN==0),
> [Define if pthread_rwlock_clockrdlock and pthread_rwlock_clockwrlock are
> available in .])
>fi
>
>CXXFLAGS="$ac_save_CXXFLAGS"
>
>


[committed] Fix a couple constraints on the H8 in preparation for LRA conversion

2023-05-10 Thread Jeff Law via Gcc-patches

So this is the 2nd patch on the way to LRA for the H8.

LRA is more sensitive than reload to getting define_constraint vs 
define_memory_constraint vs define_special_memory_constraint correct.


The H8 port has the "Q" constraint, which is used to indicate memory 
addresses that can be used under certain circumstances in various ALU 
operations.  So it should be a memory constraint.  Ideally it would be 
a simple memory constraint, but it's used in contexts where MEMs are 
valid only for certain parts in the H8 family.  So it really needs to be 
a special_memory_constraint.


The "Zz" constraint accepts memory, but the forms are limited and cannot 
be reloaded into a register.  It seems to be working, but I wouldn't be 
totally surprised if it broke when stressed in the right way.


Anyway, this patch fixes "Q" and "Zz" to be special memory constraints.

Regression tested with gdbsim and pushed to the trunk.

jeff

commit 02d79fb228c9f4c7d00dfe09b6ca7ef1392afbec
Author: Jeff Law 
Date:   Wed May 10 05:25:12 2023 -0600

Fix a couple constraints on the H8 in preparation for LRA conversion

So this is the 2nd patch on the way to LRA for the H8.

LRA is more sensitive than reload to getting define_constraint vs
define_memory_constraint vs define_special_memory_constraint correct.

The H8 port has the "Q" constraint, which is used to indicate memory addresses
that can be used under certain circumstances in various ALU operations.  So it
should be a memory constraint.  Ideally it would be a simple memory
constraint, but it's used in contexts where MEMs are valid only for certain
parts in the H8 family.  So it really needs to be a special_memory_constraint.

The "Zz" constraint accepts memory, but the forms are limited and cannot be
reloaded into a register.  It seems to be working, but I wouldn't be totally
surprised if it broke when stressed in the right way.

Anyway, this patch fixes "Q" and "Zz" to be special memory constraints.

Regression tested with gdbsim and pushed to the trunk.

gcc
* config/h8300/constraints.md (Q): Make this a special memory
constraint.
(Zz): Similarly.

diff --git a/gcc/config/h8300/constraints.md b/gcc/config/h8300/constraints.md
index f1ebce14dad..3aef1205fef 100644
--- a/gcc/config/h8300/constraints.md
+++ b/gcc/config/h8300/constraints.md
@@ -144,7 +144,7 @@ (define_constraint "G"
(match_test "op == CONST0_RTX (SFmode)")))
 
 ;; Extra constraints.
-(define_constraint "Q"
+(define_special_memory_constraint "Q"
   "@internal"
   (and (match_test "TARGET_H8300SX")
(match_operand 0 "memory_operand")))
@@ -211,7 +211,7 @@ (define_constraint "Y2"
   (and (match_code "const_int")
(match_test "exact_log2 (ival & 0xff) != -1")))
 
-(define_constraint "Zz"
+(define_special_memory_constraint "Zz"
   "@internal"
   (and (match_test "TARGET_H8300SX")
(match_code "mem")


[PATCH v2] Var-Tracking: Typedef pointer_mux as decl_or_value

2023-05-10 Thread Pan Li via Gcc-patches
From: Pan Li 

Before this PATCH, decl_or_value is defined as void *. It holds either
a tree_node or an rtx_def. Unfortunately, a void pointer alone cannot
tell whether it points to a tree_node or an rtx_def.

So var-tracking relies on an implicit structure layout requirement like
the one below. Otherwise we would read the wrong bits when casting the
void * to tree_node or rtx_def.

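+--------+-----------+----------+
| offset | tree_node | rtx_def  |
+--------+-----------+----------+
|      0 | code: 16  | code: 16 | <- require the location and bit size
+--------+-----------+----------+
|     16 | ...       | mode: 8  |
+--------+-----------+----------+
| ...                           |
+--------+-----------+----------+
|     24 | ...       | ...      |
+--------+-----------+----------+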

This behavior blocks the PATCH that extends the rtx_def mode from 8 to
16 bits, which is needed because we are running out of machine modes.
This PATCH introduces pointer_mux to tell whether the input is a
tree_node or an rtx_def, and removes the implicit dependency above.
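
As an illustration of the idea only, a tagged pointer that records which of two types it holds can be sketched as below. This is not the code in mux-utils.h: the class name tagged_ptr and the stand-in types decl_node and value_node are hypothetical, and the sketch assumes both pointee types are at least 2-byte aligned so the low bit of the address is free to use as the tag.

```cpp
#include <cassert>
#include <cstdint>

struct decl_node { int id; };   // hypothetical stand-in for tree_node
struct value_node { int id; };  // hypothetical stand-in for rtx_def

// Minimal tagged pointer: the low bit says which type is stored.
template <typename T1, typename T2>
class tagged_ptr
{
  static_assert (alignof (T1) >= 2 && alignof (T2) >= 2,
                 "low address bit must be free for the tag");

public:
  // A default-constructed (null) pointer counts as the first type.
  tagged_ptr () : m_bits (0) {}

  static tagged_ptr first (T1 *p)
  { return tagged_ptr (reinterpret_cast<std::uintptr_t> (p)); }

  static tagged_ptr second (T2 *p)
  { return tagged_ptr (reinterpret_cast<std::uintptr_t> (p) | 1); }

  bool is_first () const { return (m_bits & 1) == 0; }
  bool is_second () const { return (m_bits & 1) != 0; }

  T1 *known_first () const
  {
    assert (is_first ());
    return reinterpret_cast<T1 *> (m_bits);
  }

  T2 *known_second () const
  {
    assert (is_second ());
    return reinterpret_cast<T2 *> (m_bits & ~std::uintptr_t (1));
  }

  // Const-qualified so they also work on const objects.
  bool operator== (const tagged_ptr &other) const
  { return m_bits == other.m_bits; }
  bool operator!= (const tagged_ptr &other) const
  { return m_bits != other.m_bits; }

private:
  explicit tagged_ptr (std::uintptr_t bits) : m_bits (bits) {}
  std::uintptr_t m_bits;
};
```

With a type like this, the "is it a decl or a value" question is answered from the pointer itself, so nothing depends on where the code field sits inside the pointed-to structures.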

Signed-off-by: Pan Li 
Co-Authored-By: Richard Sandiford 
Co-Authored-By: Richard Biener 
Co-Authored-By: Jakub Jelinek 

gcc/ChangeLog:

* mux-utils.h: Add overloaded operators == and != for pointer_mux.
* var-tracking.cc: Include mux-utils.h for pointer_mux.
(decl_or_value): Changed from void * to pointer_mux.
(dv_is_decl_p): Reconciled to the new type, aka pointer_mux.
(dv_as_decl): Likewise.
(dv_as_value): Likewise.
(dv_as_opaque): Likewise.
(variable_hasher::equal): Likewise.
(dv_from_decl): Likewise.
(dv_from_value): Likewise.
(attrs_list_member): Likewise.
(shared_hash_find_slot_unshare_1): Likewise.
(shared_hash_find_slot_1): Likewise.
(shared_hash_find_slot_noinsert_1): Likewise.
(shared_hash_find_1): Likewise.
(unshare_variable): Likewise.
(vars_copy): Likewise.
(var_reg_decl_set): Likewise.
(var_reg_delete_and_set): Likewise.
(find_loc_in_1pdv): Likewise.
(canonicalize_values_star): Likewise.
(variable_post_merge_new_vals): Likewise.
(find_mem_expr_in_1pdv): Likewise.
(dump_onepart_variable_differences): Likewise.
(variable_different_p): Likewise.
(dataflow_set_different): Likewise.
(variable_from_dropped): Likewise.
(variable_was_changed): Likewise.
(set_slot_part): Likewise.
(clobber_slot_part): Likewise.
(loc_exp_insert_dep): Likewise.
(notify_dependents_of_resolved_value): Likewise.
(vt_expand_loc_callback): Likewise.
(remove_value_from_changed_variables): Likewise.
(notify_dependents_of_changed_value): Likewise.
(emit_notes_for_differences_1): Likewise.
(emit_notes_for_differences_2): Likewise.
---
 gcc/mux-utils.h |  12 
 gcc/var-tracking.cc | 145 +++-
 2 files changed, 87 insertions(+), 70 deletions(-)

diff --git a/gcc/mux-utils.h b/gcc/mux-utils.h
index a2b6a316899..3eec4edc833 100644
--- a/gcc/mux-utils.h
+++ b/gcc/mux-utils.h
@@ -72,6 +72,18 @@ public:
   // Return true unless the pointer is a null A pointer.
   explicit operator bool () const { return m_ptr; }
 
+  // Return true if class has the same m_ptr, or false.
+  bool operator == (const pointer_mux &other)
+{
+  return this->m_ptr == other.m_ptr;
+}
+
+  // Return true if class has the different m_ptr, or false.
+  bool operator != (const pointer_mux &other)
+{
+  return this->m_ptr != other.m_ptr;
+}
+
   // Assign A and B pointers respectively.
   void set_first (T1 *ptr) { *this = first (ptr); }
   void set_second (T2 *ptr) { *this = second (ptr); }
diff --git a/gcc/var-tracking.cc b/gcc/var-tracking.cc
index fae0c73e02f..48a585423d9 100644
--- a/gcc/var-tracking.cc
+++ b/gcc/var-tracking.cc
@@ -116,9 +116,14 @@
 #include "fibonacci_heap.h"
 #include "print-rtl.h"
 #include "function-abi.h"
+#include "mux-utils.h"
 
 typedef fibonacci_heap <long, basic_block_def> bb_heap_t;
 
+/* A declaration of a variable, or an RTL value being handled like a
+   declaration by pointer_mux.  */
+typedef pointer_mux<tree_node, rtx_def> decl_or_value;
+
 /* var-tracking.cc assumes that tree code with the same value as VALUE rtx code
has no chance to appear in REG_EXPR/MEM_EXPRs and isn't a decl.
Currently the value is the same as IDENTIFIER_NODE, which has such
@@ -196,15 +201,11 @@ struct micro_operation
 };
 
 
-/* A declaration of a variable, or an RTL value being handled like a
-   declaration.  */
-typedef void *decl_or_value;
-
 /* Return true if a decl_or_value DV is a DECL or NULL.  */
 static inline bool
 dv_is_decl_p (decl_or_value dv)
 {
-  return !dv || (int) TREE_CODE ((tree) dv) != (int) VALUE;
+  return dv.is_first ();
 }
 
 /* Return true if a decl_or_value is a VALUE rtl.  */
@@ -219,7 +220,7 @@ static inline tree
 dv_as_decl (decl_or_value dv)
 {
   gcc_checking_assert (dv_is_decl_p (dv));
-  return (t

RE: [PATCH] Var-Tracking: Leverage pointer_mux for decl_or_value

2023-05-10 Thread Li, Pan2 via Gcc-patches
Thanks all for comments.

It looks like I paid too much attention to the NULL check, which is 
already covered by pointer_mux. I have posted PATCH v2 at the link below; 
please help to continue the review.

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618007.html

Pan

-----Original Message-----
From: Jakub Jelinek  
Sent: Wednesday, May 10, 2023 4:14 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, 
Yanzhang ; jeffreya...@gmail.com; rguent...@suse.de; 
richard.sandif...@arm.com
Subject: Re: [PATCH] Var-Tracking: Leverage pointer_mux for decl_or_value

On Wed, May 10, 2023 at 03:17:58PM +0800, Pan Li via Gcc-patches wrote:
> gcc/ChangeLog:
> 
>   * var-tracking.cc (DECL_OR_VALUE_OR_DEFAULT): New macro for
> clean code.

ChangeLog formatting shouldn't have spaces after the initial tab.
Furthermore, the entry doesn't describe what changes you've made.
It should start with:
* var-tracking.cc: Include mux-utils.h.
(decl_or_value): Changed from void * to
pointer_mux.
(DECL_OR_VALUE_OR_DEFAULT): Define.
etc.

> --- a/gcc/var-tracking.cc
> +++ b/gcc/var-tracking.cc
> @@ -116,9 +116,17 @@
>  #include "fibonacci_heap.h"
>  #include "print-rtl.h"
>  #include "function-abi.h"
> +#include "mux-utils.h"
>  
>  typedef fibonacci_heap  bb_heap_t;
>  
> +/* A declaration of a variable, or an RTL value being handled like a
> +   declaration by pointer_mux.  */
> +typedef pointer_mux decl_or_value;
> +
> +#define DECL_OR_VALUE_OR_DEFAULT(ptr) \
> +  ((ptr) ? decl_or_value (ptr) : decl_or_value ())
> +
>  /* var-tracking.cc assumes that tree code with the same value as VALUE rtx 
> code
> has no chance to appear in REG_EXPR/MEM_EXPRs and isn't a decl.
> Currently the value is the same as IDENTIFIER_NODE, which has such 
> @@ -196,15 +204,21 @@ struct micro_operation  };
>  
>  
> -/* A declaration of a variable, or an RTL value being handled like a
> -   declaration.  */
> -typedef void *decl_or_value;
> -
>  /* Return true if a decl_or_value DV is a DECL or NULL.  */  static 
> inline bool  dv_is_decl_p (decl_or_value dv)  {
> -  return !dv || (int) TREE_CODE ((tree) dv) != (int) VALUE;
> +  bool is_decl = !dv;
> +
> +  if (dv)
> +{
> +  if (dv.is_first ())
> + is_decl = (int) TREE_CODE (dv.known_first ()) != (int) VALUE;
> +  else if (!dv.is_first () && !dv.is_second ())
> + is_decl = true;
> +}
> +
> +  return is_decl;

I really don't understand why it needs to be so complicated.
dv_is_decl_p is true for a decl_or_value if it is NULL or if it is a tree,
and false if it is an rtx VALUE; no other rtxes are expected.
pointer_mux should accept nullptr as being the first one, so I'd expect just

/* Return true if a decl_or_value DV is a DECL or NULL.  */
static inline bool
dv_is_decl_p (decl_or_value dv)
{
  return dv.is_first ();
}

/* Return true if a decl_or_value is a VALUE rtl.  */
static inline bool
dv_is_value_p (decl_or_value dv)
{
  return dv.is_second ();
}

/* Return the decl in the decl_or_value.  */
static inline tree
dv_as_decl (decl_or_value dv)
{
  gcc_checking_assert (dv_is_decl_p (dv));
  return dv.known_first ();
}

/* Return the value in the decl_or_value.  */
static inline rtx
dv_as_value (decl_or_value dv)
{
  gcc_checking_assert (dv_is_value_p (dv));
  return dv.known_second ();
}

/* Return the opaque pointer in the decl_or_value.  */
static inline void *
dv_as_opaque (decl_or_value dv)
{
  return dv.is_first () ? (void *) dv.known_first ()
			: (void *) dv.known_second ();
}

// Ideally dv_as_opaque would just return m_ptr but that
// is unfortunately private.

And define a hasher for decl_or_value now that it is a class (that would 
hash/compare the m_ptr value or separately dv.is_first () bool and dv_as_opaque 
pointer).

And then I'd hope you don't need to do any further changes in the file.
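
The second variant of the hasher idea (hash and compare the is_first flag together with the opaque pointer) could look roughly like the sketch below. It uses the standard library rather than GCC's hash_table traits, and dv_key and dv_key_hash are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical stand-in for decl_or_value: the opaque pointer plus a flag
// saying whether it is the first (decl) or second (value) kind.
struct dv_key
{
  const void *ptr;
  bool is_first;

  bool operator== (const dv_key &other) const
  { return ptr == other.ptr && is_first == other.is_first; }
};

// Hash the pointer and fold the kind flag into the result, so that hashing
// takes into account the same two fields that equality compares.
struct dv_key_hash
{
  std::size_t operator() (const dv_key &k) const
  {
    std::size_t h = std::hash<const void *> () (k.ptr);
    std::size_t kind_salt
      = k.is_first ? 0 : static_cast<std::size_t> (0x9e3779b9u);
    return h ^ kind_salt;
  }
};

typedef std::unordered_set<dv_key, dv_key_hash> dv_set;
```

The point of the sketch is only that the key type passed to lookup is the class itself, so callers do not need to convert to an opaque pointer first.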

Jakub



Re: [PATCH v2] Var-Tracking: Typedef pointer_mux as decl_or_value

2023-05-10 Thread Jakub Jelinek via Gcc-patches
On Wed, May 10, 2023 at 07:57:05PM +0800, pan2...@intel.com wrote:
> --- a/gcc/var-tracking.cc
> +++ b/gcc/var-tracking.cc
> @@ -116,9 +116,14 @@
>  #include "fibonacci_heap.h"
>  #include "print-rtl.h"
>  #include "function-abi.h"
> +#include "mux-utils.h"
>  
>  typedef fibonacci_heap  bb_heap_t;
>  
> +/* A declaration of a variable, or an RTL value being handled like a
> +   declaration by pointer_mux.  */
> +typedef pointer_mux decl_or_value;
> +
>  /* var-tracking.cc assumes that tree code with the same value as VALUE rtx 
> code
> has no chance to appear in REG_EXPR/MEM_EXPRs and isn't a decl.
> Currently the value is the same as IDENTIFIER_NODE, which has such
> @@ -196,15 +201,11 @@ struct micro_operation
>  };
>  
>  
> -/* A declaration of a variable, or an RTL value being handled like a
> -   declaration.  */
> -typedef void *decl_or_value;

Why do you move the typedef?

> @@ -503,9 +505,7 @@ variable_hasher::hash (const variable *v)
>  inline bool
>  variable_hasher::equal (const variable *v, const void *y)
>  {
> -  decl_or_value dv = CONST_CAST2 (decl_or_value, const void *, y);
> -
> -  return (dv_as_opaque (v->dv) == dv_as_opaque (dv));
> +  return dv_as_opaque (v->dv) == y;
>  }

I'm not convinced this is correct.  I think all the find_slot_with_hash
etc. pass in a decl_or_value, so I'd expect y to have decl_or_value
type or something similar.

>  /* Free the element of VARIABLE_HTAB (its type is struct variable_def).  */
> @@ -1396,8 +1396,7 @@ onepart_pool_allocate (onepart_enum onepart)
>  static inline decl_or_value
>  dv_from_decl (tree decl)
>  {
> -  decl_or_value dv;
> -  dv = decl;
> +  decl_or_value dv = decl_or_value::first (decl);

Can't you just decl_or_value dv = decl; ?  I think pointer_mux has ctors
from pointers to the template parameter types.

>gcc_checking_assert (dv_is_decl_p (dv));
>return dv;
>  }
> @@ -1406,8 +1405,7 @@ dv_from_decl (tree decl)
>  static inline decl_or_value
>  dv_from_value (rtx value)
>  {
> -  decl_or_value dv;
> -  dv = value;
> +  decl_or_value dv = decl_or_value::second (value);

Ditto.

> @@ -1661,7 +1659,8 @@ shared_hash_find_slot_unshare_1 (shared_hash **pvars, 
> decl_or_value dv,
>  {
>if (shared_hash_shared (*pvars))
>  *pvars = shared_hash_unshare (*pvars);
> -  return shared_hash_htab (*pvars)->find_slot_with_hash (dv, dvhash, ins);
> +  return shared_hash_htab (*pvars)->find_slot_with_hash (dv_as_opaque (dv),
> +  dvhash, ins);

Then you wouldn't need to change all these.

Jakub



Re: [PATCH v2] Var-Tracking: Typedef pointer_mux as decl_or_value

2023-05-10 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek  writes:
> On Wed, May 10, 2023 at 07:57:05PM +0800, pan2...@intel.com wrote:
>> --- a/gcc/var-tracking.cc
>> +++ b/gcc/var-tracking.cc
>> @@ -116,9 +116,14 @@
>>  #include "fibonacci_heap.h"
>>  #include "print-rtl.h"
>>  #include "function-abi.h"
>> +#include "mux-utils.h"
>>  
>>  typedef fibonacci_heap  bb_heap_t;
>>  
>> +/* A declaration of a variable, or an RTL value being handled like a
>> +   declaration by pointer_mux.  */
>> +typedef pointer_mux decl_or_value;
>> +
>>  /* var-tracking.cc assumes that tree code with the same value as VALUE rtx 
>> code
>> has no chance to appear in REG_EXPR/MEM_EXPRs and isn't a decl.
>> Currently the value is the same as IDENTIFIER_NODE, which has such
>> @@ -196,15 +201,11 @@ struct micro_operation
>>  };
>>  
>>  
>> -/* A declaration of a variable, or an RTL value being handled like a
>> -   declaration.  */
>> -typedef void *decl_or_value;
>
> Why do you move the typedef?
>
>> @@ -503,9 +505,7 @@ variable_hasher::hash (const variable *v)
>>  inline bool
>>  variable_hasher::equal (const variable *v, const void *y)
>>  {
>> -  decl_or_value dv = CONST_CAST2 (decl_or_value, const void *, y);
>> -
>> -  return (dv_as_opaque (v->dv) == dv_as_opaque (dv));
>> +  return dv_as_opaque (v->dv) == y;
>>  }
>
> I'm not convinced this is correct.  I think all the find_slot_with_hash
> etc. pass in a decl_or_value, so I'd expect y to have decl_or_value
> type or something similar.
>
>>  /* Free the element of VARIABLE_HTAB (its type is struct variable_def).  */
>> @@ -1396,8 +1396,7 @@ onepart_pool_allocate (onepart_enum onepart)
>>  static inline decl_or_value
>>  dv_from_decl (tree decl)
>>  {
>> -  decl_or_value dv;
>> -  dv = decl;
>> +  decl_or_value dv = decl_or_value::first (decl);
>
> Can't you just decl_or_value dv = decl; ?  I think pointer_mux has ctors
> from pointers to the template parameter types.
>
>>gcc_checking_assert (dv_is_decl_p (dv));
>>return dv;
>>  }
>> @@ -1406,8 +1405,7 @@ dv_from_decl (tree decl)
>>  static inline decl_or_value
>>  dv_from_value (rtx value)
>>  {
>> -  decl_or_value dv;
>> -  dv = value;
>> +  decl_or_value dv = decl_or_value::second (value);
>
> Ditto.
>
>> @@ -1661,7 +1659,8 @@ shared_hash_find_slot_unshare_1 (shared_hash **pvars, 
>> decl_or_value dv,
>>  {
>>if (shared_hash_shared (*pvars))
>>  *pvars = shared_hash_unshare (*pvars);
>> -  return shared_hash_htab (*pvars)->find_slot_with_hash (dv, dvhash, ins);
>> +  return shared_hash_htab (*pvars)->find_slot_with_hash (dv_as_opaque (dv),
>> + dvhash, ins);
>
> Then you wouldn't need to change all these.

Also, please do try changing variable_hasher::compare_type to
decl_or_value, and changing the type of the second parameter to
variable_hasher::equal accordingly.  I still feel that we should
be able to get rid of dv_as_opaque entirely.

Thanks,
Richard


[PATCH] RISC-V: Add basic vec_init support for RVV auto-vectorization

2023-05-10 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch adds basic vec_init support for RVV auto-vectorization.
Testing is on-going.

This patch gives vec_init a common vector initialization path (using 
vslide1down to insert elements one by one), which can handle any 
initialization vector but is not optimal in every case.

It also supports the Case 1 optimization:
https://godbolt.org/z/GzYsTEfqx

#include <stdint.h>

typedef int8_t vnx16qi __attribute__((vector_size (16)));

__attribute__((noipa))
void foo(int8_t a, int8_t b, int8_t c, int8_t *out)
{
  vnx16qi v = { a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b };
  *(vnx16qi*) out = v;
}

LLVM codegen:
foo:                                    # @foo
        lui     a3, 5
        addiw   a3, a3, 1365
        vsetivli        zero, 1, e16, mf4, ta, ma
        vmv.s.x v0, a3
        vsetivli        zero, 16, e8, m1, ta, ma
        vmv.v.x v8, a1
        vmerge.vxm      v8, v8, a0, v0
        vse8.v  v8, (a2)
        ret

This patch codegen:
foo:
        slli    a0,a0,8
        or      a0,a0,a1
        vsetvli a5,zero,e16,m1,ta,ma
        vmv.v.x v1,a0
        vs1r.v  v1,0(a2)
        ret

We will support more optimization cases in the future, but they are not 
included in this patch.
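
The Case 1 idea (treat { a, b, a, b, ... } as a broadcast of one wider "super element") can be sketched on plain scalars as below. This is a standalone illustration, not the rvv_builder code: the function names are hypothetical, the elements are fixed at 8 bits, and the sketch assumes element 0 lands in the least significant byte of the merged value. The 8-element limit mirrors the 64-bit cap (MAX_DUP_SIZE) in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Return true if ELTS consists of its first PERIOD elements repeated.
inline bool
repeats_with_period (const std::vector<std::uint8_t> &elts,
                     std::size_t period)
{
  if (period == 0 || elts.size () % period != 0)
    return false;
  for (std::size_t i = period; i < elts.size (); ++i)
    if (elts[i] != elts[i - period])
      return false;
  return true;
}

// Fuse the first PERIOD 8-bit elements into one wider scalar.
// Assumption: element 0 goes into the least significant byte.
inline std::uint64_t
merge_first_period (const std::vector<std::uint8_t> &elts,
                    std::size_t period)
{
  assert (period <= 8 && period <= elts.size ());
  std::uint64_t merged = 0;
  for (std::size_t i = 0; i < period; ++i)
    merged |= static_cast<std::uint64_t> (elts[i]) << (8 * i);
  return merged;
}
```

When repeats_with_period holds, broadcasting the merged scalar in a vector of the wider element type produces the same bytes as the original element-by-element initializer, which is what lets a single broadcast replace the per-element inserts.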

gcc/ChangeLog:

* config/riscv/autovec.md (vec_init): New pattern.
* config/riscv/riscv-protos.h (expand_vec_init): New function.
* config/riscv/riscv-v.cc (class rvv_builder): New class.
(rvv_builder::can_duplicate_repeating_sequence_p): New function.
(rvv_builder::get_merged_repeating_sequence): Ditto.
(expand_vector_init_insert_elems): Ditto.
(expand_vec_init): Ditto.
* config/riscv/vector-iterators.md: New attribute.

---
 gcc/config/riscv/autovec.md  |  16 
 gcc/config/riscv/riscv-protos.h  |   1 +
 gcc/config/riscv/riscv-v.cc  | 127 +++
 gcc/config/riscv/vector-iterators.md |   9 ++
 4 files changed, 153 insertions(+)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 99dc4f046b0..fb57a52a4b6 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -82,3 +82,19 @@
 DONE;
   }
 )
+
+;; -
+;;  [INT,FP] Initialize from individual elements
+;; -
+;; This is the pattern initialize the vector
+;; -
+
+(define_expand "vec_init"
+  [(match_operand:V 0 "register_operand")
+   (match_operand 1 "")]
+  "TARGET_VECTOR"
+  {
+riscv_vector::expand_vec_init (operands[0], operands[1]);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index e8a728ae226..7196e34e335 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -220,6 +220,7 @@ void expand_tuple_move (machine_mode, rtx *);
 machine_mode preferred_simd_mode (scalar_mode);
 opt_machine_mode get_mask_mode (machine_mode);
 void expand_vec_series (rtx, rtx, rtx);
+void expand_vec_init (rtx, rtx);
 }
 
 /* We classify builtin types into two classes:
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 0c3b1b4c40b..9ab6d7d5f41 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1091,4 +1091,131 @@ preferred_simd_mode (scalar_mode mode)
   return word_mode;
 }
 
+class rvv_builder : public rtx_vector_builder
+{
+public:
+  static const uint8_t MAX_DUP_SIZE = 64;
+
+  rvv_builder () : rtx_vector_builder () {}
+  rvv_builder (machine_mode mode, unsigned int npatterns,
+  unsigned int nelts_per_pattern)
+: rtx_vector_builder (mode, npatterns, nelts_per_pattern)
+  {
+m_inner_mode = GET_MODE_INNER (mode);
+m_inner_size = GET_MODE_BITSIZE (m_inner_mode).to_constant ();
+  }
+
+  bool can_duplicate_repeating_sequence_p ();
+  rtx get_merged_repeating_sequence ();
+
+  machine_mode new_mode () const { return m_new_mode; }
+
+private:
+  machine_mode m_inner_mode;
+  machine_mode m_new_mode;
+  scalar_int_mode m_new_inner_mode;
+  unsigned int m_inner_size;
+};
+
+/* Return true if the vector duplicated by a super element which is the fusion
+   of consecutive elements.
+
+ v = { a, b, a, b } super element = ab, v = { ab, ab }  */
+bool
+rvv_builder::can_duplicate_repeating_sequence_p ()
+{
+  poly_uint64 new_size = exact_div (full_nelts (), npatterns ());
+  unsigned int new_inner_size = m_inner_size * npatterns ();
+  if (!int_mode_for_size (new_inner_size, 0).exists (&m_new_inner_mode)
+  || GET_MODE_BITSIZE (m_new_inner_mode) > MAX_DUP_SIZE
+  || !get_vector_mode (m_new_inner_mode, new_size).exists (&m_new_mode))
+return false;
+  return repeating_sequence_p (0, encoded_nelts (), npatterns ());
+}
+
+/* Merge the repeating sequence into a single element and return the RTX.  */
+rtx
+rvv_builder::get_merged_repeating_sequence ()
+{
+  scala

Re: Question on patch -fprofile-partial-training

2023-05-10 Thread Jan Hubicka via Gcc-patches
> Honza,
> > Main motivation for this was profiling programs that contain specific
> > code paths for different CPUs (such as graphics library in Firefox or Linux
> > kernel). When the training machine differs from the machine the
> > program is later run on, we end up optimizing for size all code paths
> > except the ones taken by the specific CPU.  This patch essentially tells gcc
> > to consider every non-trained function as built without profile
> > feedback.
> Make sense.
> > 
> > For Firefox it had important impact on graphics rendering tests back
> > then since the building machine had AVX while the benchmarking one did not.
> > Some benchmarks improved several times which is not a surprise if you
> > consider tight graphics rendering loop optimized for size versus
> > vectorized one.  
> 
> That’s a lot of improvement. So, without -fprofile-partial-training, the PGO 
> hurt the performance for those cases? 

Yes, to get code size improvements we assume that the non-trained part
of code is cold and with -Os we are very aggressive to optimize for
size.  We now have two-level optimize_for size, so I think we could
make this more fine grained this stage1.

Honza
> 
> > The patch has bad effect on code size which in turn
> > impacts performance too, so I think it makes sense to use
> > -fprofile-partial-training with bit of care (i.e. only on code where
> > such scenarios are likely).
> 
> Right. 
> > 
> > As for backporting, I do not have checkout of GCC 8 right now. It
> > depends on profile infrastructure that was added in 2017 (so stage1 of
> > GCC 8), so the patch may backport quite easily.  I am not 100% sure
> > what shape the infrastructure was in the first version, but I am quite
> > convinced it had the necessary bits - it was able to make the difference
> > between 0 profile count and missing profile feedback.
> 
> This is good to know, I will try to back port to GCC8 and let them test to 
> see any good impact.
> 
> Qing
> > 
> > Honza
> >> 
> 


Re: [PATCH] vect: Missed opportunity to use [SU]ABD

2023-05-10 Thread Oluwatamilore Adebayo via Gcc-patches
When using inputs of 0x7fff and 0x8000 the result yielded is -1.
When using inputs of -1 and 0x7fff the result yielded is 0x8000.
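
For reference, the "max minus min" formulation from the quoted discussion can be sketched for 16-bit values as below, next to a wrapping abs (a - b). The function names are hypothetical, this models the arithmetic only and makes no claim about what any particular instruction does, and the conversion back to int16_t assumes the usual two's complement wrap-around (guaranteed from C++20, and what GCC does).

```cpp
#include <cassert>
#include <cstdint>

// Absolute difference of two signed 16-bit values, returned as unsigned so
// that the full range 0..0xffff is representable.  The subtraction is done
// on the unsigned bit patterns, so there is no signed overflow.
inline std::uint16_t
abd_s16 (std::int16_t a, std::int16_t b)
{
  std::uint16_t ua = static_cast<std::uint16_t> (a);
  std::uint16_t ub = static_cast<std::uint16_t> (b);
  return a > b ? static_cast<std::uint16_t> (ua - ub)
               : static_cast<std::uint16_t> (ub - ua);
}

// abs (a - b) with the subtraction wrapping to 16 bits first, i.e. what a
// -fwrapv style evaluation in the narrow type would compute.
inline std::int16_t
wrapping_abs_diff_s16 (std::int16_t a, std::int16_t b)
{
  std::uint16_t diff
    = static_cast<std::uint16_t> (static_cast<std::uint16_t> (a)
                                  - static_cast<std::uint16_t> (b));
  std::int16_t d = static_cast<std::int16_t> (diff);
  return d < 0 ? static_cast<std::int16_t> (-d) : d;
}
```

For inputs 0x7fff and -0x8000, abd_s16 gives 0xffff (which reads as -1 if cast back to signed), while the wrapping abs gives 1. For inputs -1 and 0x7fff, abd_s16 gives 0x8000.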

Tami

From: Richard Biener 
Sent: Wednesday, May 10, 2023 10:49 AM
To: Oluwatamilore Adebayo ; 
gcc-patches@gcc.gnu.org ; richard.guent...@gmail.com 
; Richard Sandiford 
Subject: Re: [PATCH] vect: Missed opportunity to use [SU]ABD

On Wed, May 10, 2023 at 11:01 AM Richard Sandiford
 wrote:
>
> Oluwatamilore Adebayo  writes:
> > From 0b5f469171c340ef61a48a31877d495bb77bd35f Mon Sep 17 00:00:00 2001
> > From: oluade01 
> > Date: Fri, 14 Apr 2023 10:24:43 +0100
> > Subject: [PATCH 1/4] Missed opportunity to use [SU]ABD
> >
> > This adds a recognition pattern for the non-widening
> > absolute difference (ABD).
> >
> > gcc/ChangeLog:
> >
> > * doc/md.texi (sabd, uabd): Document them.
> > * internal-fn.def (ABD): Use new optab.
> > * optabs.def (sabd_optab, uabd_optab): New optabs,
> > * tree-vect-patterns.cc (vect_recog_absolute_difference):
> > Recognize the following idiom abs (a - b).
> > (vect_recog_sad_pattern): Refactor to use
> > vect_recog_absolute_difference.
> > (vect_recog_abd_pattern): Use patterns found by
> > vect_recog_absolute_difference to build a new ABD
> > internal call.
> > ---
> >  gcc/doc/md.texi   |  10 ++
> >  gcc/internal-fn.def   |   3 +
> >  gcc/optabs.def|   2 +
> >  gcc/tree-vect-patterns.cc | 250 +-
> >  4 files changed, 234 insertions(+), 31 deletions(-)
> >
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 
> > 07bf8bdebffb2e523f25a41f2b57e43c0276b745..0ad546c63a8deebb4b6db894f437d1e21f0245a8
> >  100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -5778,6 +5778,16 @@ Other shift and rotate instructions, analogous to the
> >  Vector shift and rotate instructions that take vectors as operand 2
> >  instead of a scalar type.
> >
> > +@cindex @code{uabd@var{m}} instruction pattern
> > +@cindex @code{sabd@var{m}} instruction pattern
> > +@item @samp{uabd@var{m}}, @samp{sabd@var{m}}
> > +Signed and unsigned absolute difference instructions.  These
> > +instructions find the difference between operands 1 and 2
> > +then return the absolute value.  A C code equivalent would be:
> > +@smallexample
> > +op0 = abs (op0 - op1)
>
> op0 = abs (op1 - op2)
>
> But that isn't the correct calculation for unsigned (where abs doesn't
> really work).  It also doesn't handle some cases correctly for signed.
>
> I think it's more:
>
>   op0 = op1 > op2 ? (unsigned type) op1 - op2 : (unsigned type) op2 - op1
>
> or (conceptually) max minus min.
>
> E.g. for 16-bit values, the absolute difference between signed 0x7fff
> and signed -0x8000 is 0xffff (reinterpreted as -1 if you cast back
> to signed).  But, ignoring undefined behaviour:
>
>   0x7fff - 0x8000 = -1
>   abs(-1) = 1
>
> which gives the wrong answer.
>
> We might still be able to fold C abs(a - b) to abd for signed a and b
> by relying on undefined behaviour (TYPE_OVERFLOW_UNDEFINED).  But we
> can't do it for -fwrapv.
>
> Richi knows better than me what would be appropriate here.

The question is what does the hardware do?  For the widening [us]sad it's
obvious since the difference is computed in a wider signed mode and the
absolute value always fits.

So what does it actually do, esp. when the difference yields 0x8000?

Richard.
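
For reference, the difference between the two formulations discussed above
can be sketched in plain C for 16-bit elements.  This is only a scalar model
under stated assumptions, not what the patch or any target implements: the
names sabd16, naive_abd16 and check_thread_values are hypothetical, and the
narrowing casts rely on GCC's documented modulo behaviour for out-of-range
conversions to signed types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: signed absolute difference as "max minus min",
   with the subtraction done in the unsigned type so that it wraps
   instead of overflowing.  */
uint16_t
sabd16 (int16_t a, int16_t b)
{
  return a > b ? (uint16_t) ((uint16_t) a - (uint16_t) b)
	       : (uint16_t) ((uint16_t) b - (uint16_t) a);
}

/* Hypothetical helper: abs (a - b) with the difference truncated to
   16 bits first, i.e. roughly what -fwrapv semantics would give.
   The conversions back to int16_t are implementation-defined; GCC
   reduces the value modulo 2^16.  */
int16_t
naive_abd16 (int16_t a, int16_t b)
{
  int16_t d = (int16_t) (uint16_t) ((uint16_t) a - (uint16_t) b);
  return d < 0 ? (int16_t) -d : d;
}

/* The input pairs mentioned in this thread.  */
void
check_thread_values (void)
{
  /* 0x7fff and -0x8000: max minus min is 0xffff, i.e. -1 if the
     result is reinterpreted as signed.  */
  assert (sabd16 (0x7fff, -0x8000) == 0xffff);
  /* -1 and 0x7fff: max minus min is 0x8000.  */
  assert (sabd16 (-1, 0x7fff) == 0x8000);
  /* The truncating abs (a - b) gives 1 for the first pair instead.  */
  assert (naive_abd16 (0x7fff, -0x8000) == 1);
}
```

The two helpers agree whenever a - b fits in the signed element type; they
only diverge in the overflow cases discussed here.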

>
> Thanks,
> Richard
>
> > +@end smallexample
> > +
> >  @cindex @code{avg@var{m}3_floor} instruction pattern
> >  @cindex @code{uavg@var{m}3_floor} instruction pattern
> >  @item @samp{avg@var{m}3_floor}
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 
> > 7fe742c2ae713e7152ab05cfdfba86e4e0aa3456..0f1724ecf37a31c231572edf90b5577e2d82f468
> >  100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -167,6 +167,9 @@ DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
> >  DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
> >  DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)
> >
> > +DEF_INTERNAL_SIGNED_OPTAB_FN (ABD, ECF_CONST | ECF_NOTHROW, first,
> > + sabd, uabd, binary)
> > +
> >  DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_FLOOR, ECF_CONST | ECF_NOTHROW, first,
> >   savg_floor, uavg_floor, binary)
> >  DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_CEIL, ECF_CONST | ECF_NOTHROW, first,
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index 
> > 695f5911b300c9ca5737de9be809fa01aabe5e01..29bc92281a2175f898634cbe6af63c18021e5268
> >  100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -359,6 +359,8 @@ OPTAB_D (mask_fold_left_plus_optab, 
> > "mask_fold_left_plus_$a")
> >  OPTAB_D (extract_last_optab, "extract_last_$a")
> >  OPTAB_D (fold_extract_last_optab, "fold_extract_last_$a")
> >
> > +OPTAB_D (uabd_optab, "uabd$a3")
> > +OPTAB_D (sabd_optab, "sabd$a3")
> >  OPTAB_D (savg_floor_optab, "

[PATCH 01/20] arm: [MVE intrinsics] factorize vcmp

2023-05-10 Thread Christophe Lyon via Gcc-patches
Factorize vcmp so that they use the same pattern.

2022-10-25  Christophe Lyon  

gcc/
* config/arm/iterators.md (MVE_CMP_M, MVE_CMP_M_F, MVE_CMP_M_N)
(MVE_CMP_M_N_F, mve_cmp_op1): New.
(isu): Add VCMP*
(supf): Likewise.
* config/arm/mve.md (mve_vcmpq_n_): Rename into ...
(@mve_vcmpq_n_): ... this.
(mve_vcmpeqq_m_f, mve_vcmpgeq_m_f)
(mve_vcmpgtq_m_f, mve_vcmpleq_m_f)
(mve_vcmpltq_m_f, mve_vcmpneq_m_f): Merge into ...
(@mve_vcmpq_m_f): ... this.
(mve_vcmpcsq_m_u, mve_vcmpeqq_m_)
(mve_vcmpgeq_m_s, mve_vcmpgtq_m_s)
(mve_vcmphiq_m_u, mve_vcmpleq_m_s)
(mve_vcmpltq_m_s, mve_vcmpneq_m_): Merge into
...
(@mve_vcmpq_m_): ... this.
(mve_vcmpcsq_m_n_u, mve_vcmpeqq_m_n_)
(mve_vcmpgeq_m_n_s, mve_vcmpgtq_m_n_s)
(mve_vcmphiq_m_n_u, mve_vcmpleq_m_n_s)
(mve_vcmpltq_m_n_s, mve_vcmpneq_m_n_): Merge
into ...
(@mve_vcmpq_m_n_): ... this.
(mve_vcmpeqq_m_n_f, mve_vcmpgeq_m_n_f)
(mve_vcmpgtq_m_n_f, mve_vcmpleq_m_n_f)
(mve_vcmpltq_m_n_f, mve_vcmpneq_m_n_f): Merge into ...
(@mve_vcmpq_m_n_f): ... this.
---
 gcc/config/arm/iterators.md | 108 ++
 gcc/config/arm/mve.md   | 414 +++-
 2 files changed, 135 insertions(+), 387 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 3c70fd7f56d..ef9fae0412b 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -583,6 +583,47 @@ (define_int_iterator MVE_FP_CREATE_ONLY [
 VCREATEQ_F
 ])
 
+;; MVE comparison iterators
+(define_int_iterator MVE_CMP_M [
+VCMPCSQ_M_U
+VCMPEQQ_M_S VCMPEQQ_M_U
+VCMPGEQ_M_S
+VCMPGTQ_M_S
+VCMPHIQ_M_U
+VCMPLEQ_M_S
+VCMPLTQ_M_S
+VCMPNEQ_M_S VCMPNEQ_M_U
+])
+
+(define_int_iterator MVE_CMP_M_F [
+VCMPEQQ_M_F
+VCMPGEQ_M_F
+VCMPGTQ_M_F
+VCMPLEQ_M_F
+VCMPLTQ_M_F
+VCMPNEQ_M_F
+])
+
+(define_int_iterator MVE_CMP_M_N [
+VCMPCSQ_M_N_U
+VCMPEQQ_M_N_S VCMPEQQ_M_N_U
+VCMPGEQ_M_N_S
+VCMPGTQ_M_N_S
+VCMPHIQ_M_N_U
+VCMPLEQ_M_N_S
+VCMPLTQ_M_N_S
+VCMPNEQ_M_N_S VCMPNEQ_M_N_U
+])
+
+(define_int_iterator MVE_CMP_M_N_F [
+VCMPEQQ_M_N_F
+VCMPGEQ_M_N_F
+VCMPGTQ_M_N_F
+VCMPLEQ_M_N_F
+VCMPLTQ_M_N_F
+VCMPNEQ_M_N_F
+])
+
 (define_int_iterator MVE_VMAXVQ_VMINVQ [
 VMAXAVQ_S
 VMAXVQ_S VMAXVQ_U
@@ -655,6 +696,37 @@ (define_code_attr mve_addsubmul [
 (plus "vadd")
 ])
 
+(define_int_attr mve_cmp_op1 [
+(VCMPCSQ_M_U "cs")
+(VCMPEQQ_M_S "eq") (VCMPEQQ_M_U "eq")
+(VCMPGEQ_M_S "ge")
+(VCMPGTQ_M_S "gt")
+(VCMPHIQ_M_U "hi")
+(VCMPLEQ_M_S "le")
+(VCMPLTQ_M_S "lt")
+(VCMPNEQ_M_S "ne") (VCMPNEQ_M_U "ne")
+(VCMPEQQ_M_F "eq")
+(VCMPGEQ_M_F "ge")
+(VCMPGTQ_M_F "gt")
+(VCMPLEQ_M_F "le")
+(VCMPLTQ_M_F "lt")
+(VCMPNEQ_M_F "ne")
+(VCMPCSQ_M_N_U "cs")
+(VCMPEQQ_M_N_S "eq") (VCMPEQQ_M_N_U "eq")
+(VCMPGEQ_M_N_S "ge")
+(VCMPGTQ_M_N_S "gt")
+(VCMPHIQ_M_N_U "hi")
+(VCMPLEQ_M_N_S "le")
+(VCMPLTQ_M_N_S "lt")
+(VCMPNEQ_M_N_S "ne") (VCMPNEQ_M_N_U "ne")
+(VCMPEQQ_M_N_F "eq")
+(VCMPGEQ_M_N_F "ge")
+(VCMPGTQ_M_N_F "gt")
+(VCMPLEQ_M_N_F "le")
+(VCMPLTQ_M_N_F "lt")
+(VCMPNEQ_M_N_F "ne")
+])
+
 (define_int_attr mve_insn [
 (VABDQ_M_S "vabd") (VABDQ_M_U "vabd") (VABDQ_M_F "vabd")
 (VABDQ_S "vabd") (VABDQ_U "vabd") (VABDQ_F "vabd")
@@ -836,6 +908,26 @@ (define_int_attr isu[
 (VCLSQ_M_S "s")
 (VCLZQ_M_S "i")
 (VCLZQ_M_U "i")
+(VCMPCSQ_M_N_U "u")
+(VCMPCSQ_M_U "u")
+(VCMPEQQ_M_N_S "i")
+(VCMPEQQ_M_N_U "i")
+(VCMPEQQ_M_S "i")
+(VCMPEQQ_M_U "i")
+(VCMPGEQ_M_N_S "s")
+(VCMPGEQ_M_S "s")
+(VCMPGTQ_M_N_S "s")
+  

[PATCH 11/20] arm: [MVE intrinsics] rework vaddvq

2023-05-10 Thread Christophe Lyon via Gcc-patches
Implement vaddvq using the new MVE builtins framework.

2022-10-25  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (vaddvq): New.
* config/arm/arm-mve-builtins-base.def (vaddvq): New.
* config/arm/arm-mve-builtins-base.h (vaddvq): New.
* config/arm/arm_mve.h (vaddvq): Remove.
(vaddvq_p): Remove.
(vaddvq_s8): Remove.
(vaddvq_s16): Remove.
(vaddvq_s32): Remove.
(vaddvq_u8): Remove.
(vaddvq_u16): Remove.
(vaddvq_u32): Remove.
(vaddvq_p_u8): Remove.
(vaddvq_p_s8): Remove.
(vaddvq_p_u16): Remove.
(vaddvq_p_s16): Remove.
(vaddvq_p_u32): Remove.
(vaddvq_p_s32): Remove.
(__arm_vaddvq_s8): Remove.
(__arm_vaddvq_s16): Remove.
(__arm_vaddvq_s32): Remove.
(__arm_vaddvq_u8): Remove.
(__arm_vaddvq_u16): Remove.
(__arm_vaddvq_u32): Remove.
(__arm_vaddvq_p_u8): Remove.
(__arm_vaddvq_p_s8): Remove.
(__arm_vaddvq_p_u16): Remove.
(__arm_vaddvq_p_s16): Remove.
(__arm_vaddvq_p_u32): Remove.
(__arm_vaddvq_p_s32): Remove.
(__arm_vaddvq): Remove.
(__arm_vaddvq_p): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |   1 +
 gcc/config/arm/arm-mve-builtins-base.def |   1 +
 gcc/config/arm/arm-mve-builtins-base.h   |   1 +
 gcc/config/arm/arm_mve.h | 200 ---
 4 files changed, 3 insertions(+), 200 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc
index cb572130c2b..7f90fc65ae2 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -244,6 +244,7 @@ namespace arm_mve {
 FUNCTION_WITHOUT_N (vabdq, VABDQ)
 FUNCTION (vabsq, unspec_based_mve_function_exact_insn, (ABS, ABS, ABS, -1, -1, 
-1, VABSQ_M_S, -1, VABSQ_M_F, -1, -1, -1))
 FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
+FUNCTION_PRED_P_S_U (vaddvq, VADDVQ)
 FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
 FUNCTION_WITHOUT_N_NO_U_F (vclsq, VCLSQ)
 FUNCTION (vclzq, unspec_based_mve_function_exact_insn, (CLZ, CLZ, CLZ, -1, -1, 
-1, VCLZQ_M_S, VCLZQ_M_U, -1, -1, -1 ,-1))
diff --git a/gcc/config/arm/arm-mve-builtins-base.def 
b/gcc/config/arm/arm-mve-builtins-base.def
index 30e6aa1e1e6..d32745f334a 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -21,6 +21,7 @@
 DEF_MVE_FUNCTION (vabdq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vabsq, unary, all_signed, mx_or_none)
 DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_integer, mx_or_none)
+DEF_MVE_FUNCTION (vaddvq, unary_int32, all_integer, p_or_none)
 DEF_MVE_FUNCTION (vandq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vclsq, unary, all_signed, mx_or_none)
 DEF_MVE_FUNCTION (vclzq, unary, all_integer, mx_or_none)
diff --git a/gcc/config/arm/arm-mve-builtins-base.h 
b/gcc/config/arm/arm-mve-builtins-base.h
index 3dc9114045f..9080542e7e3 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -26,6 +26,7 @@ namespace functions {
 extern const function_base *const vabdq;
 extern const function_base *const vabsq;
 extern const function_base *const vaddq;
+extern const function_base *const vaddvq;
 extern const function_base *const vandq;
 extern const function_base *const vclsq;
 extern const function_base *const vclzq;
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index c3d18e4cc6f..11f1033deb9 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -43,7 +43,6 @@
 #ifndef __ARM_MVE_PRESERVE_USER_NAMESPACE
 #define vst4q(__addr, __value) __arm_vst4q(__addr, __value)
 #define vaddlvq(__a) __arm_vaddlvq(__a)
-#define vaddvq(__a) __arm_vaddvq(__a)
 #define vmovlbq(__a) __arm_vmovlbq(__a)
 #define vmovltq(__a) __arm_vmovltq(__a)
 #define vmvnq(__a) __arm_vmvnq(__a)
@@ -55,7 +54,6 @@
 #define vcaddq_rot90(__a, __b) __arm_vcaddq_rot90(__a, __b)
 #define vcaddq_rot270(__a, __b) __arm_vcaddq_rot270(__a, __b)
 #define vbicq(__a, __b) __arm_vbicq(__a, __b)
-#define vaddvq_p(__a, __p) __arm_vaddvq_p(__a, __p)
 #define vaddvaq(__a, __b) __arm_vaddvaq(__a, __b)
 #define vbrsrq(__a, __b) __arm_vbrsrq(__a, __b)
 #define vqshluq(__a, __imm) __arm_vqshluq(__a, __imm)
@@ -329,9 +327,6 @@
 #define vcvtq_f16_u16(__a) __arm_vcvtq_f16_u16(__a)
 #define vcvtq_f32_u32(__a) __arm_vcvtq_f32_u32(__a)
 #define vaddlvq_s32(__a) __arm_vaddlvq_s32(__a)
-#define vaddvq_s8(__a) __arm_vaddvq_s8(__a)
-#define vaddvq_s16(__a) __arm_vaddvq_s16(__a)
-#define vaddvq_s32(__a) __arm_vaddvq_s32(__a)
 #define vmovlbq_s8(__a) __arm_vmovlbq_s8(__a)
 #define vmovlbq_s16(__a) __arm_vmovlbq_s16(__a)
 #define vmovltq_s8(__a) __arm_vmovltq_s8(__a)
@@ -354,9 +349,6 @@
 #define vmvnq_u8(__a) __arm_vmvnq_u8(__a)
 #define vmvnq_u16(__a) __arm_vmvnq_u16(__a)
 #define vmvnq_u32(__a) __arm_vmvnq_u32(__a)
-#define vaddvq_u8(__a) __arm_vaddvq_u8(__a)
-#define v

[PATCH 09/20] arm: [MVE intrinsics] factorize vaddvq

2023-05-10 Thread Christophe Lyon via Gcc-patches
Factorize vaddvq builtins so that they use parameterized names.

2022-10-25  Christophe Lyon  

gcc/
* config/arm/iterators.md (mve_insn): Add vaddv.
* config/arm/mve.md (@mve_vaddvq_): Rename into ...
(@mve_q_): ... this.
(mve_vaddvq_p_): Rename into ...
(@mve_q_p_): ... this.
* config/arm/vec-common.md: Use gen_mve_q instead of
gen_mve_vaddvq.
---
 gcc/config/arm/iterators.md  | 2 ++
 gcc/config/arm/mve.md| 8 
 gcc/config/arm/vec-common.md | 2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index aff4e7fb814..46c7ddeda67 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -762,6 +762,8 @@ (define_int_attr mve_insn [
 (VADDQ_M_N_S "vadd") (VADDQ_M_N_U "vadd") (VADDQ_M_N_F "vadd")
 (VADDQ_M_S "vadd") (VADDQ_M_U "vadd") (VADDQ_M_F "vadd")
 (VADDQ_N_S "vadd") (VADDQ_N_U "vadd") (VADDQ_N_F "vadd")
+(VADDVQ_P_S "vaddv") (VADDVQ_P_U "vaddv")
+(VADDVQ_S "vaddv") (VADDVQ_U "vaddv")
 (VANDQ_M_S "vand") (VANDQ_M_U "vand") (VANDQ_M_F "vand")
 (VBICQ_M_N_S "vbic") (VBICQ_M_N_U "vbic")
 (VBICQ_M_S "vbic") (VBICQ_M_U "vbic") (VBICQ_M_F "vbic")
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 0c4e4e60bc4..d772f4d4380 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -360,14 +360,14 @@ (define_insn "@mve_q_"
 ;;
 ;; [vaddvq_s, vaddvq_u])
 ;;
-(define_insn "@mve_vaddvq_"
+(define_insn "@mve_q_"
   [
(set (match_operand:SI 0 "s_register_operand" "=Te")
(unspec:SI [(match_operand:MVE_2 1 "s_register_operand" "w")]
 VADDVQ))
   ]
   "TARGET_HAVE_MVE"
-  "vaddv.%#\t%0, %q1"
+  ".%#\t%0, %q1"
   [(set_attr "type" "mve_move")
 ])
 
@@ -773,7 +773,7 @@ (define_insn "mve_vaddvaq_"
 ;;
 ;; [vaddvq_p_u, vaddvq_p_s])
 ;;
-(define_insn "mve_vaddvq_p_"
+(define_insn "@mve_q_p_"
   [
(set (match_operand:SI 0 "s_register_operand" "=Te")
(unspec:SI [(match_operand:MVE_2 1 "s_register_operand" "w")
@@ -781,7 +781,7 @@ (define_insn "mve_vaddvq_p_"
 VADDVQ_P))
   ]
   "TARGET_HAVE_MVE"
-  "vpst\;vaddvt.%#%0, %q1"
+  "vpst\;t.%#\t%0, %q1"
   [(set_attr "type" "mve_move")
(set_attr "length""8")])
 
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 6183c931e36..9af8429968d 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -559,7 +559,7 @@ (define_expand "reduc_plus_scal_"
   /* vaddv generates a 32 bits accumulator.  */
   rtx op0 = gen_reg_rtx (SImode);
 
-  emit_insn (gen_mve_vaddvq (VADDVQ_S, mode, op0, operands[1]));
+  emit_insn (gen_mve_q (VADDVQ_S, VADDVQ_S, mode, op0, operands[1]));
   emit_move_insn (operands[0], gen_lowpart (mode, op0));
 }
 
-- 
2.34.1
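
As a side note on the "vaddv generates a 32 bits accumulator" comment in the
reduc_plus_scal_ hunk above, here is a rough scalar model in C of what that
expander does for 16-bit lanes: sum everything into a 32-bit value, then keep
only the low part.  The function names are hypothetical and this is only a
sketch of the arithmetic, not of the RTL; the final narrowing cast assumes
GCC's modulo behaviour for out-of-range conversions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a vaddv-style reduction on eight signed
   16-bit lanes: all lanes are summed into a 32-bit accumulator.  */
int32_t
model_vaddvq_s16 (const int16_t lanes[8])
{
  int32_t acc = 0;
  for (int i = 0; i < 8; i++)
    acc += lanes[i];
  return acc;
}

/* Hypothetical model of the expander: the 32-bit sum is narrowed back
   to the element type by keeping its low 16 bits (gen_lowpart in the
   patch).  */
int16_t
model_reduc_plus_scal_s16 (const int16_t lanes[8])
{
  return (int16_t) (uint16_t) model_vaddvq_s16 (lanes);
}
```

With eight lanes of 0x7fff, the 32-bit sum is 262136 (0x3fff8) while the
narrowed result is -8, which is why the expander needs the SImode temporary.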



[PATCH 06/20] arm: [MVE intrinsics] factorize vdupq

2023-05-10 Thread Christophe Lyon via Gcc-patches
Factorize vdup builtins so that they use parameterized names.

2022-10-25  Christophe Lyon  

gcc/
* config/arm/iterators.md (MVE_FP_M_N_VDUPQ_ONLY)
(MVE_FP_N_VDUPQ_ONLY): New.
(mve_insn): Add vdupq.
* config/arm/mve.md (mve_vdupq_n_f): Rename into ...
(@mve_q_n_f): ... this.
(mve_vdupq_n_): Rename into ...
(@mve_q_n_): ... this.
(mve_vdupq_m_n_): Rename into ...
(@mve_q_m_n_): ... this.
(mve_vdupq_m_n_f): Rename into ...
(@mve_q_m_n_f): ... this.
---
 gcc/config/arm/iterators.md | 10 ++
 gcc/config/arm/mve.md   | 20 ++--
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 878210471c8..aff4e7fb814 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -391,6 +391,14 @@ (define_int_iterator MVE_FP_M_VREV32Q_ONLY [
 VREV32Q_M_F
 ])
 
+(define_int_iterator MVE_FP_M_N_VDUPQ_ONLY [
+VDUPQ_M_N_F
+])
+
+(define_int_iterator MVE_FP_N_VDUPQ_ONLY [
+VDUPQ_N_F
+])
+
 ;; MVE integer binary operations.
 (define_code_iterator MVE_INT_BINARY_RTX [plus minus mult])
 
@@ -762,6 +770,8 @@ (define_int_attr mve_insn [
 (VCLSQ_S "vcls")
 (VCLZQ_M_S "vclz") (VCLZQ_M_U "vclz")
 (VCREATEQ_S "vcreate") (VCREATEQ_U "vcreate") (VCREATEQ_F 
"vcreate")
+(VDUPQ_M_N_S "vdup") (VDUPQ_M_N_U "vdup") (VDUPQ_M_N_F "vdup")
+(VDUPQ_N_S "vdup") (VDUPQ_N_U "vdup") (VDUPQ_N_F "vdup")
 (VEORQ_M_S "veor") (VEORQ_M_U "veor") (VEORQ_M_F "veor")
 (VHADDQ_M_N_S "vhadd") (VHADDQ_M_N_U "vhadd")
 (VHADDQ_M_S "vhadd") (VHADDQ_M_U "vhadd")
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 4dfcd6c4280..0c4e4e60bc4 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -179,14 +179,14 @@ (define_insn "mve_vq_f"
 ;;
 ;; [vdupq_n_f])
 ;;
-(define_insn "mve_vdupq_n_f"
+(define_insn "@mve_q_n_f"
   [
(set (match_operand:MVE_0 0 "s_register_operand" "=w")
(unspec:MVE_0 [(match_operand: 1 "s_register_operand" "r")]
-VDUPQ_N_F))
+MVE_FP_N_VDUPQ_ONLY))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vdup.%#\t%q0, %1"
+  ".%#\t%q0, %1"
   [(set_attr "type" "mve_move")
 ])
 
@@ -310,14 +310,14 @@ (define_expand "mve_vmvnq_s"
 ;;
 ;; [vdupq_n_u, vdupq_n_s])
 ;;
-(define_insn "mve_vdupq_n_"
+(define_insn "@mve_q_n_"
   [
(set (match_operand:MVE_2 0 "s_register_operand" "=w")
(unspec:MVE_2 [(match_operand: 1 "s_register_operand" "r")]
 VDUPQ_N))
   ]
   "TARGET_HAVE_MVE"
-  "vdup.%#\t%q0, %1"
+  ".%#\t%q0, %1"
   [(set_attr "type" "mve_move")
 ])
 
@@ -2006,7 +2006,7 @@ (define_insn "@mve_vcmpq_m_"
 ;;
 ;; [vdupq_m_n_s, vdupq_m_n_u])
 ;;
-(define_insn "mve_vdupq_m_n_"
+(define_insn "@mve_q_m_n_"
   [
(set (match_operand:MVE_2 0 "s_register_operand" "=w")
(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
@@ -2015,7 +2015,7 @@ (define_insn "mve_vdupq_m_n_"
 VDUPQ_M_N))
   ]
   "TARGET_HAVE_MVE"
-  "vpst\;vdupt.%#\t%q0, %2"
+  "vpst\;t.%#\t%q0, %2"
   [(set_attr "type" "mve_move")
(set_attr "length""8")])
 
@@ -2666,16 +2666,16 @@ (define_insn "mve_vcvttq_m_f32_f16v4sf"
 ;;
 ;; [vdupq_m_n_f])
 ;;
-(define_insn "mve_vdupq_m_n_f"
+(define_insn "@mve_q_m_n_f"
   [
(set (match_operand:MVE_0 0 "s_register_operand" "=w")
(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
   (match_operand: 2 "s_register_operand" "r")
   (match_operand: 3 "vpr_register_operand" 
"Up")]
-VDUPQ_M_N_F))
+MVE_FP_M_N_VDUPQ_ONLY))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vdupt.%#\t%q0, %2"
+  "vpst\;t.%#\t%q0, %2"
   [(set_attr "type" "mve_move")
(set_attr "length""8")])
 
-- 
2.34.1



[PATCH 04/20] arm: [MVE intrinsics] factorize vrev16q vrev32q vrev64q

2023-05-10 Thread Christophe Lyon via Gcc-patches
Factorize vrev16q vrev32q vrev64q so that they use generic builtin
names.

2022-10-25  Christophe Lyon  

gcc/
* config/arm/iterators.md (MVE_V8HF, MVE_V16QI)
(MVE_FP_VREV64Q_ONLY, MVE_FP_M_VREV64Q_ONLY, MVE_FP_VREV32Q_ONLY)
(MVE_FP_M_VREV32Q_ONLY): New iterators.
(mve_insn): Add vrev16q, vrev32q, vrev64q.
* config/arm/mve.md (mve_vrev64q_f): Rename into ...
(@mve_q_f): ... this.
(mve_vrev32q_fv8hf): Rename into @mve_q_f.
(mve_vrev64q_): Rename into ...
(@mve_q_): ... this.
(mve_vrev32q_): Rename into
@mve_q_.
(mve_vrev16q_v16qi): Rename into
@mve_q_.
(mve_vrev64q_m_): Rename into
@mve_q_m_.
(mve_vrev32q_m_fv8hf): Rename into @mve_q_m_f.
(mve_vrev32q_m_): Rename into
@mve_q_m_.
(mve_vrev64q_m_f): Rename into @mve_q_m_f.
(mve_vrev16q_m_v16qi): Rename into
@mve_q_m_.
---
 gcc/config/arm/iterators.md | 25 +
 gcc/config/arm/mve.md   | 72 ++---
 2 files changed, 61 insertions(+), 36 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index ef9fae0412b..878210471c8 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -1,3 +1,4 @@
+
 ;; Code and mode itertator and attribute definitions for the ARM backend
 ;; Copyright (C) 2010-2023 Free Software Foundation, Inc.
 ;; Contributed by ARM Ltd.
@@ -274,6 +275,8 @@ (define_mode_iterator MVE_5 [V8HI V4SI])
 (define_mode_iterator MVE_6 [V8HI V4SI])
 (define_mode_iterator MVE_7 [V16BI V8BI V4BI V2QI])
 (define_mode_iterator MVE_7_HI [HI V16BI V8BI V4BI V2QI])
+(define_mode_iterator MVE_V8HF [V8HF])
+(define_mode_iterator MVE_V16QI [V16QI])
 
 ;;
 ;; Code iterators
@@ -372,6 +375,22 @@ (define_int_iterator MVE_FP_M_UNARY [
 VRNDXQ_M_F
 ])
 
+(define_int_iterator MVE_FP_VREV64Q_ONLY [
+VREV64Q_F
+])
+
+(define_int_iterator MVE_FP_M_VREV64Q_ONLY [
+VREV64Q_M_F
+])
+
+(define_int_iterator MVE_FP_VREV32Q_ONLY [
+VREV32Q_F
+])
+
+(define_int_iterator MVE_FP_M_VREV32Q_ONLY [
+VREV32Q_M_F
+])
+
 ;; MVE integer binary operations.
 (define_code_iterator MVE_INT_BINARY_RTX [plus minus mult])
 
@@ -862,6 +881,12 @@ (define_int_attr mve_insn [
 (VQSUBQ_M_S "vqsub") (VQSUBQ_M_U "vqsub")
 (VQSUBQ_N_S "vqsub") (VQSUBQ_N_U "vqsub")
 (VQSUBQ_S "vqsub") (VQSUBQ_U "vqsub")
+(VREV16Q_M_S "vrev16") (VREV16Q_M_U "vrev16")
+(VREV16Q_S "vrev16") (VREV16Q_U "vrev16")
+(VREV32Q_M_S "vrev32") (VREV32Q_M_U "vrev32") (VREV32Q_M_F 
"vrev32")
+(VREV32Q_S "vrev32") (VREV32Q_U "vrev32") (VREV32Q_F "vrev32")
+(VREV64Q_M_S "vrev64") (VREV64Q_M_U "vrev64") (VREV64Q_M_F 
"vrev64")
+(VREV64Q_S "vrev64") (VREV64Q_U "vrev64") (VREV64Q_F "vrev64")
 (VRHADDQ_M_S "vrhadd") (VRHADDQ_M_U "vrhadd")
 (VRHADDQ_S "vrhadd") (VRHADDQ_U "vrhadd")
 (VRMULHQ_M_S "vrmulh") (VRMULHQ_M_U "vrmulh")
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 191d1268ad6..4dfcd6c4280 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -151,14 +151,14 @@ (define_insn "@mve_q_f"
 ;;
 ;; [vrev64q_f])
 ;;
-(define_insn "mve_vrev64q_f"
+(define_insn "@mve_q_f"
   [
(set (match_operand:MVE_0 0 "s_register_operand" "=&w")
(unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w")]
-VREV64Q_F))
+MVE_FP_VREV64Q_ONLY))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vrev64.%# %q0, %q1"
+  ".%#\t%q0, %q1"
   [(set_attr "type" "mve_move")
 ])
 
@@ -193,14 +193,14 @@ (define_insn "mve_vdupq_n_f"
 ;;
 ;; [vrev32q_f])
 ;;
-(define_insn "mve_vrev32q_fv8hf"
+(define_insn "@mve_q_f"
   [
-   (set (match_operand:V8HF 0 "s_register_operand" "=w")
-   (unspec:V8HF [(match_operand:V8HF 1 "s_register_operand" "w")]
-VREV32Q_F))
+   (set (match_operand:MVE_V8HF 0 "s_register_operand" "=w")
+   (unspec:MVE_V8HF [(match_operand:MVE_V8HF 1 "s_register_operand" "w")]
+MVE_FP_VREV32Q_ONLY))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vrev32.16 %q0, %q1"
+  ".\t%q0, %q1"
   [(set_attr "type" "mve_move")
 ])
 ;;
@@ -248,14 +248,14 @@ (define_insn "mve_vcvtq_to_f_"
 ;;
 ;; [vrev64q_u, vrev64q_s])
 ;;
-(define_insn "mve_vrev64q_"
+(define_insn "@mve_q_"
   [
(set (match_operand:MVE_2 0 "s_register_operand" "=&w")
(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")]
 VREV64Q))
   ]
   "TARGET_HAVE_MVE"
-  "vrev64.%# %q0, %q1"
+  ".%#\t%q0, %q1"
   [(set_attr "type" "mve_move")
 ])
 
@@ -374,14 +374,14 @@ (

[PATCH 20/20] arm: [MVE intrinsics] rework vmovlbq vmovltq

2023-05-10 Thread Christophe Lyon via Gcc-patches
Implement vmovlbq, vmovltq using the new MVE builtins framework.

2022-10-25  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (vmovlbq, vmovltq): New.
* config/arm/arm-mve-builtins-base.def (vmovlbq, vmovltq): New.
* config/arm/arm-mve-builtins-base.h (vmovlbq, vmovltq): New.
* config/arm/arm_mve.h (vmovlbq): Remove.
(vmovltq): Remove.
(vmovlbq_m): Remove.
(vmovltq_m): Remove.
(vmovlbq_x): Remove.
(vmovltq_x): Remove.
(vmovlbq_s8): Remove.
(vmovlbq_s16): Remove.
(vmovltq_s8): Remove.
(vmovltq_s16): Remove.
(vmovltq_u8): Remove.
(vmovltq_u16): Remove.
(vmovlbq_u8): Remove.
(vmovlbq_u16): Remove.
(vmovlbq_m_s8): Remove.
(vmovltq_m_s8): Remove.
(vmovlbq_m_u8): Remove.
(vmovltq_m_u8): Remove.
(vmovlbq_m_s16): Remove.
(vmovltq_m_s16): Remove.
(vmovlbq_m_u16): Remove.
(vmovltq_m_u16): Remove.
(vmovlbq_x_s8): Remove.
(vmovlbq_x_s16): Remove.
(vmovlbq_x_u8): Remove.
(vmovlbq_x_u16): Remove.
(vmovltq_x_s8): Remove.
(vmovltq_x_s16): Remove.
(vmovltq_x_u8): Remove.
(vmovltq_x_u16): Remove.
(__arm_vmovlbq_s8): Remove.
(__arm_vmovlbq_s16): Remove.
(__arm_vmovltq_s8): Remove.
(__arm_vmovltq_s16): Remove.
(__arm_vmovltq_u8): Remove.
(__arm_vmovltq_u16): Remove.
(__arm_vmovlbq_u8): Remove.
(__arm_vmovlbq_u16): Remove.
(__arm_vmovlbq_m_s8): Remove.
(__arm_vmovltq_m_s8): Remove.
(__arm_vmovlbq_m_u8): Remove.
(__arm_vmovltq_m_u8): Remove.
(__arm_vmovlbq_m_s16): Remove.
(__arm_vmovltq_m_s16): Remove.
(__arm_vmovlbq_m_u16): Remove.
(__arm_vmovltq_m_u16): Remove.
(__arm_vmovlbq_x_s8): Remove.
(__arm_vmovlbq_x_s16): Remove.
(__arm_vmovlbq_x_u8): Remove.
(__arm_vmovlbq_x_u16): Remove.
(__arm_vmovltq_x_s8): Remove.
(__arm_vmovltq_x_s16): Remove.
(__arm_vmovltq_x_u8): Remove.
(__arm_vmovltq_x_u16): Remove.
(__arm_vmovlbq): Remove.
(__arm_vmovltq): Remove.
(__arm_vmovlbq_m): Remove.
(__arm_vmovltq_m): Remove.
(__arm_vmovlbq_x): Remove.
(__arm_vmovltq_x): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |   2 +
 gcc/config/arm/arm-mve-builtins-base.def |   2 +
 gcc/config/arm/arm-mve-builtins-base.h   |   2 +
 gcc/config/arm/arm_mve.h | 454 ---
 4 files changed, 6 insertions(+), 454 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc
index fdc0ff50b96..2dec15ac0b1 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -279,6 +279,8 @@ FUNCTION (vminnmq, unspec_based_mve_function_exact_insn, 
(UNKNOWN, UNKNOWN, SMIN
 FUNCTION_PRED_P_F (vminnmvq, VMINNMVQ)
 FUNCTION_WITH_RTX_M_NO_F (vminq, SMIN, UMIN, VMINQ)
 FUNCTION_PRED_P_S_U (vminvq, VMINVQ)
+FUNCTION_WITHOUT_N_NO_F (vmovlbq, VMOVLBQ)
+FUNCTION_WITHOUT_N_NO_F (vmovltq, VMOVLTQ)
 FUNCTION_WITHOUT_N_NO_F (vmovnbq, VMOVNBQ)
 FUNCTION_WITHOUT_N_NO_F (vmovntq, VMOVNTQ)
 FUNCTION_WITHOUT_N_NO_F (vmulhq, VMULHQ)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def 
b/gcc/config/arm/arm-mve-builtins-base.def
index dcfb426a7fb..b0de5af1013 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -48,6 +48,8 @@ DEF_MVE_FUNCTION (vminaq, binary_maxamina, all_signed, 
m_or_none)
 DEF_MVE_FUNCTION (vminavq, binary_maxavminav, all_signed, p_or_none)
 DEF_MVE_FUNCTION (vminq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vminvq, binary_maxvminv, all_integer, p_or_none)
+DEF_MVE_FUNCTION (vmovlbq, unary_widen, integer_8_16, mx_or_none)
+DEF_MVE_FUNCTION (vmovltq, unary_widen, integer_8_16, mx_or_none)
 DEF_MVE_FUNCTION (vmovnbq, binary_move_narrow, integer_16_32, m_or_none)
 DEF_MVE_FUNCTION (vmovntq, binary_move_narrow, integer_16_32, m_or_none)
 DEF_MVE_FUNCTION (vmulhq, binary, all_integer, mx_or_none)
diff --git a/gcc/config/arm/arm-mve-builtins-base.h 
b/gcc/config/arm/arm-mve-builtins-base.h
index 5de70d5e1d4..fa2e97fd461 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -61,6 +61,8 @@ extern const function_base *const vminnmq;
 extern const function_base *const vminnmvq;
 extern const function_base *const vminq;
 extern const function_base *const vminvq;
+extern const function_base *const vmovlbq;
+extern const function_base *const vmovltq;
 extern const function_base *const vmovnbq;
 extern const function_base *const vmovntq;
 extern const function_base *const vmulhq;
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 21d7768a732..c0891b7592a 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.

[PATCH 15/20] arm: [MVE intrinsics] add unary_acc shape

2023-05-10 Thread Christophe Lyon via Gcc-patches
This patch adds the unary_acc shape description.

2022-10-25  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (unary_acc): New.
* config/arm/arm-mve-builtins-shapes.h (unary_acc): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 28 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 29 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc 
b/gcc/config/arm/arm-mve-builtins-shapes.cc
index bff1c3e843b..e77a0cc20ac 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -1066,6 +1066,34 @@ struct unary_def : public overloaded_base<0>
 };
 SHAPE (unary)
 
+/* _t vfoo[_](_t)
+
+   i.e. a version of "unary" in which the source elements are half the
+   size of the destination scalar, but have the same type class.
+
+   Example: vaddlvq.
+   int64_t [__arm_]vaddlvq[_s32](int32x4_t a)
+   int64_t [__arm_]vaddlvq_p[_s32](int32x4_t a, mve_pred16_t p) */
+struct unary_acc_def : public overloaded_base<0>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+build_all (b, "sw0,v0", group, MODE_none, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+/* FIXME: check that the return value is actually
+   twice as wide as arg 0.  */
+return r.resolve_unary ();
+  }
+};
+SHAPE (unary_acc)
+
 /* _t foo_t0[_t1](_t)
 
where the target type  must be specified explicitly but the source
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h 
b/gcc/config/arm/arm-mve-builtins-shapes.h
index fc1bacbd4da..c062fe624c4 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -53,6 +53,7 @@ namespace arm_mve
 extern const function_shape *const create;
 extern const function_shape *const inherent;
 extern const function_shape *const unary;
+extern const function_shape *const unary_acc;
 extern const function_shape *const unary_convert;
 extern const function_shape *const unary_int32;
 extern const function_shape *const unary_int32_acc;
-- 
2.34.1



[PATCH 07/20] arm: [MVE intrinsics] add unary_n shape

2023-05-10 Thread Christophe Lyon via Gcc-patches
This patch adds the unary_n shape description.

2022-10-25  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (unary_n): New.
* config/arm/arm-mve-builtins-shapes.h (unary_n): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 53 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 54 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc 
b/gcc/config/arm/arm-mve-builtins-shapes.cc
index ea0112b3e99..c78683aaba2 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -1094,6 +1094,59 @@ struct unary_convert_def : public overloaded_base<1>
 };
 SHAPE (unary_convert)
 
+/* _t vfoo[_n]_t0(_t)
+
+   Example: vdupq.
+   int16x8_t [__arm_]vdupq_n_s16(int16_t a)
+   int16x8_t [__arm_]vdupq_m[_n_s16](int16x8_t inactive, int16_t a, 
mve_pred16_t p)
+   int16x8_t [__arm_]vdupq_x_n_s16(int16_t a, mve_pred16_t p)  */
+struct unary_n_def : public overloaded_base<0>
+{
+  bool
+  explicit_type_suffix_p (unsigned int, enum predication_index pred,
+ enum mode_suffix_index) const override
+  {
+return pred != PRED_m;
+  }
+
+  bool
+  explicit_mode_suffix_p (enum predication_index pred,
+ enum mode_suffix_index mode) const override
+  {
+return ((mode == MODE_n)
+   && (pred != PRED_m));
+  }
+
+  bool
+  skip_overload_p (enum predication_index pred, enum mode_suffix_index mode)
+const override
+  {
+switch (mode)
+  {
+  case MODE_n:
+   return pred != PRED_m;
+
+  default:
+   gcc_unreachable ();
+  }
+  }
+
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_n, preserve_user_namespace);
+build_all (b, "v0,s0", group, MODE_n, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+return r.resolve_unary_n ();
+  }
+};
+SHAPE (unary_n)
+
 } /* end namespace arm_mve */
 
 #undef SHAPE
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h 
b/gcc/config/arm/arm-mve-builtins-shapes.h
index 59c4dc39c39..a35faec2542 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -54,6 +54,7 @@ namespace arm_mve
 extern const function_shape *const inherent;
 extern const function_shape *const unary;
 extern const function_shape *const unary_convert;
+extern const function_shape *const unary_n;
 
   } /* end namespace arm_mve::shapes */
 } /* end namespace arm_mve */
-- 
2.34.1



[PATCH 14/20] arm: [MVE intrinsics] rework vaddvaq

2023-05-10 Thread Christophe Lyon via Gcc-patches
Implement vaddvaq using the new MVE builtins framework.

2022-10-25  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (vaddvaq): New.
* config/arm/arm-mve-builtins-base.def (vaddvaq): New.
* config/arm/arm-mve-builtins-base.h (vaddvaq): New.
* config/arm/arm_mve.h (vaddvaq): Remove.
(vaddvaq_p): Remove.
(vaddvaq_u8): Remove.
(vaddvaq_s8): Remove.
(vaddvaq_u16): Remove.
(vaddvaq_s16): Remove.
(vaddvaq_u32): Remove.
(vaddvaq_s32): Remove.
(vaddvaq_p_u8): Remove.
(vaddvaq_p_s8): Remove.
(vaddvaq_p_u16): Remove.
(vaddvaq_p_s16): Remove.
(vaddvaq_p_u32): Remove.
(vaddvaq_p_s32): Remove.
(__arm_vaddvaq_u8): Remove.
(__arm_vaddvaq_s8): Remove.
(__arm_vaddvaq_u16): Remove.
(__arm_vaddvaq_s16): Remove.
(__arm_vaddvaq_u32): Remove.
(__arm_vaddvaq_s32): Remove.
(__arm_vaddvaq_p_u8): Remove.
(__arm_vaddvaq_p_s8): Remove.
(__arm_vaddvaq_p_u16): Remove.
(__arm_vaddvaq_p_s16): Remove.
(__arm_vaddvaq_p_u32): Remove.
(__arm_vaddvaq_p_s32): Remove.
(__arm_vaddvaq): Remove.
(__arm_vaddvaq_p): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |   1 +
 gcc/config/arm/arm-mve-builtins-base.def |   1 +
 gcc/config/arm/arm-mve-builtins-base.h   |   1 +
 gcc/config/arm/arm_mve.h | 202 ---
 4 files changed, 3 insertions(+), 202 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index 7f90fc65ae2..e87069b0467 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -245,6 +245,7 @@ FUNCTION_WITHOUT_N (vabdq, VABDQ)
 FUNCTION (vabsq, unspec_based_mve_function_exact_insn, (ABS, ABS, ABS, -1, -1, -1, VABSQ_M_S, -1, VABSQ_M_F, -1, -1, -1))
 FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
 FUNCTION_PRED_P_S_U (vaddvq, VADDVQ)
+FUNCTION_PRED_P_S_U (vaddvaq, VADDVAQ)
 FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
 FUNCTION_WITHOUT_N_NO_U_F (vclsq, VCLSQ)
 FUNCTION (vclzq, unspec_based_mve_function_exact_insn, (CLZ, CLZ, CLZ, -1, -1, -1, VCLZQ_M_S, VCLZQ_M_U, -1, -1, -1 ,-1))
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index d32745f334a..413fe4a1ef0 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -21,6 +21,7 @@
 DEF_MVE_FUNCTION (vabdq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vabsq, unary, all_signed, mx_or_none)
 DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_integer, mx_or_none)
+DEF_MVE_FUNCTION (vaddvaq, unary_int32_acc, all_integer, p_or_none)
 DEF_MVE_FUNCTION (vaddvq, unary_int32, all_integer, p_or_none)
 DEF_MVE_FUNCTION (vandq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vclsq, unary, all_signed, mx_or_none)
diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
index 9080542e7e3..5338b777444 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -26,6 +26,7 @@ namespace functions {
 extern const function_base *const vabdq;
 extern const function_base *const vabsq;
 extern const function_base *const vaddq;
+extern const function_base *const vaddvaq;
 extern const function_base *const vaddvq;
 extern const function_base *const vandq;
 extern const function_base *const vclsq;
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 11f1033deb9..74783570561 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -54,7 +54,6 @@
 #define vcaddq_rot90(__a, __b) __arm_vcaddq_rot90(__a, __b)
 #define vcaddq_rot270(__a, __b) __arm_vcaddq_rot270(__a, __b)
 #define vbicq(__a, __b) __arm_vbicq(__a, __b)
-#define vaddvaq(__a, __b) __arm_vaddvaq(__a, __b)
 #define vbrsrq(__a, __b) __arm_vbrsrq(__a, __b)
 #define vqshluq(__a, __imm) __arm_vqshluq(__a, __imm)
 #define vmlsdavxq(__a, __b) __arm_vmlsdavxq(__a, __b)
@@ -89,7 +88,6 @@
 #define vmlaq(__a, __b, __c) __arm_vmlaq(__a, __b, __c)
 #define vmladavq_p(__a, __b, __p) __arm_vmladavq_p(__a, __b, __p)
 #define vmladavaq(__a, __b, __c) __arm_vmladavaq(__a, __b, __c)
-#define vaddvaq_p(__a, __b, __p) __arm_vaddvaq_p(__a, __b, __p)
 #define vsriq(__a, __b, __imm) __arm_vsriq(__a, __b, __imm)
 #define vsliq(__a, __b, __imm) __arm_vsliq(__a, __b, __imm)
 #define vmlsdavxq_p(__a, __b, __p) __arm_vmlsdavxq_p(__a, __b, __p)
@@ -390,7 +388,6 @@
 #define vcaddq_rot90_u8(__a, __b) __arm_vcaddq_rot90_u8(__a, __b)
 #define vcaddq_rot270_u8(__a, __b) __arm_vcaddq_rot270_u8(__a, __b)
 #define vbicq_u8(__a, __b) __arm_vbicq_u8(__a, __b)
-#define vaddvaq_u8(__a, __b) __arm_vaddvaq_u8(__a, __b)
 #define vbrsrq_n_u8(__a, __b) __arm_vbrsrq_n_u8(__a, __b)
 #define vqshluq_n_s8(__a,  __imm) __arm_vqshluq_n_s8(__a,  __imm)
 #define vornq_s8(__a, __b) __arm_vornq_s8(__a, __b)
@@ -406,7 +4

[PATCH 10/20] arm: [MVE intrinsics] add unary_int32 shape

2023-05-10 Thread Christophe Lyon via Gcc-patches
This patch adds the unary_int32 shape description.

2022-10-25  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (unary_int32): New.
* config/arm/arm-mve-builtins-shapes.h (unary_int32): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 27 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 28 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index c78683aaba2..0bd91b24147 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -1094,6 +1094,33 @@ struct unary_convert_def : public overloaded_base<1>
 };
 SHAPE (unary_convert)
 
+/* [u]int32_t vfoo[_](_t)
+
+   i.e. a version of "unary" which generates a scalar of type int32_t
+   or uint32_t depending on the signedness of the elements of the input
+   vector.
+
+   Example: vaddvq
+   int32_t [__arm_]vaddvq[_s16](int16x8_t a)
+   int32_t [__arm_]vaddvq_p[_s16](int16x8_t a, mve_pred16_t p)  */
+struct unary_int32_def : public overloaded_base<0>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+build_all (b, "sx32,v0", group, MODE_none, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+return r.resolve_uniform (1);
+  }
+};
+SHAPE (unary_int32)
+
 /* _t vfoo[_n]_t0(_t)
 
Example: vdupq.
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
index a35faec2542..f422550559e 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -54,6 +54,7 @@ namespace arm_mve
 extern const function_shape *const inherent;
 extern const function_shape *const unary;
 extern const function_shape *const unary_convert;
+extern const function_shape *const unary_int32;
 extern const function_shape *const unary_n;
 
   } /* end namespace arm_mve::shapes */
-- 
2.34.1



[PATCH 16/20] arm: [MVE intrinsics] factorize vaddlvq

2023-05-10 Thread Christophe Lyon via Gcc-patches
Factorize vaddlvq builtins so that they use parameterized names.

2022-10-25  Christophe Lyon  

gcc/
* config/arm/iterators.md (mve_insn): Add vaddlv.
* config/arm/mve.md (mve_vaddlvq_v4si): Rename into ...
(@mve_q_v4si): ... this.
(mve_vaddlvq_p_v4si): Rename into ...
(@mve_q_p_v4si): ... this.
---
 gcc/config/arm/iterators.md | 2 ++
 gcc/config/arm/mve.md   | 8 
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 00123c0a376..84dd97249f9 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -759,6 +759,8 @@ (define_int_attr mve_insn [
 (VABDQ_S "vabd") (VABDQ_U "vabd") (VABDQ_F "vabd")
 (VABSQ_M_F "vabs")
 (VABSQ_M_S "vabs")
+(VADDLVQ_P_S "vaddlv") (VADDLVQ_P_U "vaddlv")
+(VADDLVQ_S "vaddlv") (VADDLVQ_U "vaddlv")
 (VADDQ_M_N_S "vadd") (VADDQ_M_N_U "vadd") (VADDQ_M_N_F "vadd")
 (VADDQ_M_S "vadd") (VADDQ_M_U "vadd") (VADDQ_M_F "vadd")
 (VADDQ_N_S "vadd") (VADDQ_N_U "vadd") (VADDQ_N_F "vadd")
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 1ccbce3c89c..c5373fef9a2 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -500,14 +500,14 @@ (define_insn "@mve_q_"
 ;;
 ;; [vaddlvq_s vaddlvq_u])
 ;;
-(define_insn "mve_vaddlvq_v4si"
+(define_insn "@mve_q_v4si"
   [
(set (match_operand:DI 0 "s_register_operand" "=r")
(unspec:DI [(match_operand:V4SI 1 "s_register_operand" "w")]
 VADDLVQ))
   ]
   "TARGET_HAVE_MVE"
-  "vaddlv.32\t%Q0, %R0, %q1"
+  ".32\t%Q0, %R0, %q1"
   [(set_attr "type" "mve_move")
 ])
 
@@ -666,7 +666,7 @@ (define_insn "mve_vcvtq_n_from_f_"
 ;;
 ;; [vaddlvq_p_s])
 ;;
-(define_insn "mve_vaddlvq_p_v4si"
+(define_insn "@mve_q_p_v4si"
   [
(set (match_operand:DI 0 "s_register_operand" "=r")
(unspec:DI [(match_operand:V4SI 1 "s_register_operand" "w")
@@ -674,7 +674,7 @@ (define_insn "mve_vaddlvq_p_v4si"
 VADDLVQ_P))
   ]
   "TARGET_HAVE_MVE"
-  "vpst\;vaddlvt.32\t%Q0, %R0, %q1"
+  "vpst\;t.32\t%Q0, %R0, %q1"
   [(set_attr "type" "mve_move")
(set_attr "length""8")])
 
-- 
2.34.1



[PATCH 02/20] arm: [MVE intrinsics] add cmp shape

2023-05-10 Thread Christophe Lyon via Gcc-patches
This patch adds the cmp shape description.
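
As a rough illustration of what the two signature strings in this shape stand for, the sketch below expands "p,v0,v0" and "p,v0,s0" into C prototypes for a given type suffix. It is not the framework's parser: the token meanings (p = predicate, v0 = vector of the suffix type, s0 = scalar of the suffix type) are inferred from the examples in the shape comment, and the helper names parse_suffix, scalar_name, vector_name and expand are hypothetical.

```cpp
#include <cassert>
#include <string>

// Element type named by a suffix such as "s16" or "u8" (assumed form:
// one signedness letter followed by the element width in bits).
struct elem { bool is_unsigned; unsigned bits; };

elem parse_suffix (const std::string &s)
{
  assert (s.size () >= 2 && (s[0] == 's' || s[0] == 'u'));
  return { s[0] == 'u', static_cast<unsigned> (std::stoul (s.substr (1))) };
}

// e.g. int16_t
std::string scalar_name (elem e)
{
  return std::string (e.is_unsigned ? "uint" : "int")
	 + std::to_string (e.bits) + "_t";
}

// e.g. int16x8_t; MVE vectors are 128 bits wide.
std::string vector_name (elem e)
{
  return std::string (e.is_unsigned ? "uint" : "int")
	 + std::to_string (e.bits) + "x" + std::to_string (128 / e.bits) + "_t";
}

// Expand a comma-separated signature (return type first) for one suffix.
std::string expand (const std::string &sig, const std::string &suffix)
{
  elem e = parse_suffix (suffix);
  std::string out;
  std::size_t pos = 0;
  int index = 0;
  while (pos <= sig.size ())
    {
      std::size_t comma = sig.find (',', pos);
      if (comma == std::string::npos)
	comma = sig.size ();
      std::string tok = sig.substr (pos, comma - pos);

      std::string type;
      if (tok == "p")
	type = "mve_pred16_t";
      else if (tok == "v0")
	type = vector_name (e);
      else if (tok == "s0")
	type = scalar_name (e);
      else
	type = "?";	/* Token not handled by this sketch.  */

      if (index == 0)
	out = type + " (";
      else
	out += (index > 1 ? ", " : "") + type;

      pos = comma + 1;
      ++index;
    }
  return out + ")";
}
```

With the suffix "s16", expand ("p,v0,s0", "s16") yields the argument list of the vcmpeqq[_n_s16] example in the shape comment.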

2022-10-25  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (cmp): New.
* config/arm/arm-mve-builtins-shapes.h (cmp): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 27 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 28 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index c9eac80d1e3..ea0112b3e99 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -974,6 +974,33 @@ struct binary_widen_n_def : public overloaded_base<0>
 };
 SHAPE (binary_widen_n)
 
+/* Shape for comparison operations that operate on
+   uniform types.
+
+   Examples: vcmpq.
+   mve_pred16_t [__arm_]vcmpeqq[_s16](int16x8_t a, int16x8_t b)
+   mve_pred16_t [__arm_]vcmpeqq[_n_s16](int16x8_t a, int16_t b)
+   mve_pred16_t [__arm_]vcmpeqq_m[_s16](int16x8_t a, int16x8_t b, mve_pred16_t p)
+   mve_pred16_t [__arm_]vcmpeqq_m[_n_s16](int16x8_t a, int16_t b, mve_pred16_t p)  */
+struct cmp_def : public overloaded_base<0>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+build_all (b, "p,v0,v0", group, MODE_none, preserve_user_namespace);
+build_all (b, "p,v0,s0", group, MODE_n, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+return r.resolve_uniform_opt_n (2);
+  }
+};
+SHAPE (cmp)
+
 /* xN_t vfoo[_t0](uint64_t, uint64_t)
 
where there are N arguments in total.
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
index 7f582d7375a..59c4dc39c39 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -49,6 +49,7 @@ namespace arm_mve
 extern const function_shape *const binary_rshift_narrow;
 extern const function_shape *const binary_rshift_narrow_unsigned;
 extern const function_shape *const binary_widen_n;
+extern const function_shape *const cmp;
 extern const function_shape *const create;
 extern const function_shape *const inherent;
 extern const function_shape *const unary;
-- 
2.34.1



[PATCH 18/20] arm: [MVE intrinsics] factorize vmovlbq vmovltq

2023-05-10 Thread Christophe Lyon via Gcc-patches
Factorize vmovlbq, vmovltq builtins so that they use the same
parameterized names.
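
A minimal sketch of the idea, in plain C++ rather than machine-description syntax: once each unspec maps to its mnemonic (as the mve_insn entries added below do), one template can produce the assembly name for both vmovlb and vmovlt. The enum, the function names and the way the signedness letter and element size are appended are assumptions made for this illustration only.

```cpp
#include <cassert>
#include <string>

// Stand-ins for the unspec codes covered by the merged iterator.
enum unspec_code { VMOVLBQ_S, VMOVLBQ_U, VMOVLTQ_S, VMOVLTQ_U };

// Mirrors the (UNSPEC "mnemonic") pairs added to mve_insn.
const char *mve_insn (unspec_code u)
{
  switch (u)
    {
    case VMOVLBQ_S: case VMOVLBQ_U: return "vmovlb";
    case VMOVLTQ_S: case VMOVLTQ_U: return "vmovlt";
    }
  return "";
}

// Signedness letter for the unspec (hypothetical helper).
char supf (unspec_code u)
{
  return (u == VMOVLBQ_U || u == VMOVLTQ_U) ? 'u' : 's';
}

// One shared "template" instead of one hand-written string per insn.
std::string asm_mnemonic (unspec_code u, unsigned element_bits)
{
  assert (element_bits == 8 || element_bits == 16);
  return std::string (mve_insn (u)) + "." + supf (u)
	 + std::to_string (element_bits);
}
```

Here asm_mnemonic (VMOVLTQ_U, 16) gives "vmovlt.u16", where the two separate patterns previously each spelled out their own mnemonic.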

2022-10-25  Christophe Lyon  

gcc/
* config/arm/iterators.md (mve_insn): Add vmovlb, vmovlt.
(VMOVLBQ, VMOVLTQ): Merge into ...
(VMOVLxQ): ... this.
(VMOVLTQ_M, VMOVLBQ_M): Merge into ...
(VMOVLxQ_M): ... this.
* config/arm/mve.md (mve_vmovltq_)
(mve_vmovlbq_): Merge into ...
(@mve_q_): ... this.
(mve_vmovlbq_m_, mve_vmovltq_m_): Merge
into ...
(@mve_q_m_): ... this.
---
 gcc/config/arm/iterators.md | 10 +
 gcc/config/arm/mve.md   | 44 -
 2 files changed, 15 insertions(+), 39 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 84dd97249f9..2f6de937ef7 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -817,6 +817,10 @@ (define_int_attr mve_insn [
 (VMINVQ_S "vminv") (VMINVQ_U "vminv")
 (VMLAQ_M_N_S "vmla") (VMLAQ_M_N_U "vmla")
 (VMLASQ_M_N_S "vmlas") (VMLASQ_M_N_U "vmlas")
+(VMOVLBQ_M_S "vmovlb") (VMOVLBQ_M_U "vmovlb")
+(VMOVLBQ_S "vmovlb") (VMOVLBQ_U "vmovlb")
+(VMOVLTQ_M_S "vmovlt") (VMOVLTQ_M_U "vmovlt")
+(VMOVLTQ_S "vmovlt") (VMOVLTQ_U "vmovlt")
 (VMOVNBQ_M_S "vmovnb") (VMOVNBQ_M_U "vmovnb")
 (VMOVNBQ_S "vmovnb") (VMOVNBQ_U "vmovnb")
 (VMOVNTQ_M_S "vmovnt") (VMOVNTQ_M_U "vmovnt")
@@ -2318,8 +2322,7 @@ (define_int_iterator VCVTAQ [VCVTAQ_U VCVTAQ_S])
 (define_int_iterator VDUPQ_N [VDUPQ_N_U VDUPQ_N_S])
 (define_int_iterator VADDVQ [VADDVQ_U VADDVQ_S])
 (define_int_iterator VREV32Q [VREV32Q_U VREV32Q_S])
-(define_int_iterator VMOVLBQ [VMOVLBQ_S VMOVLBQ_U])
-(define_int_iterator VMOVLTQ [VMOVLTQ_U VMOVLTQ_S])
+(define_int_iterator VMOVLxQ [VMOVLBQ_S VMOVLBQ_U VMOVLTQ_U VMOVLTQ_S])
 (define_int_iterator VCVTPQ [VCVTPQ_S VCVTPQ_U])
 (define_int_iterator VCVTNQ [VCVTNQ_S VCVTNQ_U])
 (define_int_iterator VCVTMQ [VCVTMQ_S VCVTMQ_U])
@@ -2413,7 +2416,7 @@ (define_int_iterator VSLIQ_N [VSLIQ_N_S VSLIQ_N_U])
 (define_int_iterator VSRIQ_N [VSRIQ_N_S VSRIQ_N_U])
 (define_int_iterator VMLALDAVQ_P [VMLALDAVQ_P_U VMLALDAVQ_P_S])
 (define_int_iterator VQMOVNBQ_M [VQMOVNBQ_M_S VQMOVNBQ_M_U])
-(define_int_iterator VMOVLTQ_M [VMOVLTQ_M_U VMOVLTQ_M_S])
+(define_int_iterator VMOVLxQ_M [VMOVLBQ_M_U VMOVLBQ_M_S VMOVLTQ_M_U VMOVLTQ_M_S])
 (define_int_iterator VMOVNBQ_M [VMOVNBQ_M_U VMOVNBQ_M_S])
 (define_int_iterator VRSHRNTQ_N [VRSHRNTQ_N_U VRSHRNTQ_N_S])
 (define_int_iterator VORRQ_M_N [VORRQ_M_N_S VORRQ_M_N_U])
@@ -2421,7 +2424,6 @@ (define_int_iterator VREV32Q_M [VREV32Q_M_S VREV32Q_M_U])
 (define_int_iterator VREV16Q_M [VREV16Q_M_S VREV16Q_M_U])
 (define_int_iterator VQRSHRNTQ_N [VQRSHRNTQ_N_U VQRSHRNTQ_N_S])
 (define_int_iterator VMOVNTQ_M [VMOVNTQ_M_U VMOVNTQ_M_S])
-(define_int_iterator VMOVLBQ_M [VMOVLBQ_M_U VMOVLBQ_M_S])
 (define_int_iterator VMLALDAVAQ [VMLALDAVAQ_S VMLALDAVAQ_U])
 (define_int_iterator VQSHRNBQ_N [VQSHRNBQ_N_U VQSHRNBQ_N_S])
 (define_int_iterator VSHRNBQ_N [VSHRNBQ_N_U VSHRNBQ_N_S])
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index c5373fef9a2..f5cb8ef48ef 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -386,30 +386,17 @@ (define_insn "@mve_q_"
 ])
 
 ;;
-;; [vmovltq_u, vmovltq_s])
+;; [vmovlbq_s, vmovlbq_u]
+;; [vmovltq_u, vmovltq_s]
 ;;
-(define_insn "mve_vmovltq_"
-  [
-   (set (match_operand: 0 "s_register_operand" "=w")
-   (unspec: [(match_operand:MVE_3 1 "s_register_operand" "w")]
-VMOVLTQ))
-  ]
-  "TARGET_HAVE_MVE"
-  "vmovlt.%#   %q0, %q1"
-  [(set_attr "type" "mve_move")
-])
-
-;;
-;; [vmovlbq_s, vmovlbq_u])
-;;
-(define_insn "mve_vmovlbq_"
+(define_insn "@mve_q_"
   [
(set (match_operand: 0 "s_register_operand" "=w")
(unspec: [(match_operand:MVE_3 1 "s_register_operand" "w")]
-VMOVLBQ))
+VMOVLxQ))
   ]
   "TARGET_HAVE_MVE"
-  "vmovlb.%#   %q0, %q1"
+  ".%#\t%q0, %q1"
   [(set_attr "type" "mve_move")
 ])
 
@@ -2904,34 +2891,21 @@ (define_insn "mve_vmlsldavxq_p_s"
   "vpst\;vmlsldavxt.s%# %Q0, %R0, %q1, %q2"
   [(set_attr "type" "mve_move")
(set_attr "length""8")])
+
 ;;
 ;; [vmovlbq_m_u, vmovlbq_m_s])
-;;
-(define_insn "mve_vmovlbq_m_"
-  [
-   (set (match_operand: 0 "s_register_operand" "=w")
-   (unspec: [(match_operand: 1 "s_register_operand" "0")
-  (match_operand:MVE_3 2 "s_register_operand" "w")
-  (match_operand: 3 "vpr_register_operand" "Up")]
-VMOVLBQ_M))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vmovlbt.%#   %q0, %q2"
-  [(set_attr "type" "mve_move")
-   (set_attr "length""8")])
-;;
 ;; [vmovltq_m_u, vmovltq_m_s])
 ;;
-(define_insn "mve_vmovltq_m_"
+(define_insn "@mve_q_m_"
   [
(set (match_operand: 0 "s_register_operand" "=w")
(unspec: [(match_operand: 1 
"s_register_operand" "0")
  

[PATCH 19/20] arm: [MVE intrinsics] add unary_widen shape

2023-05-10 Thread Christophe Lyon via Gcc-patches
This patch adds the unary_widen shape description.
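
The widening step in resolve can be pictured with the self-contained sketch below: given a narrow type suffix, look for the suffix with the same type class and twice the element width. The table, enum and function names are hypothetical stand-ins for this illustration; they are not the framework's type_suffixes array or its find_type_suffix.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the type-suffix table.
enum class type_class { signed_int, unsigned_int };

struct suffix_info
{
  const char *name;
  type_class tclass;
  unsigned element_bits;
};

const suffix_info suffixes[] = {
  { "s8",  type_class::signed_int,   8 },
  { "s16", type_class::signed_int,   16 },
  { "s32", type_class::signed_int,   32 },
  { "u8",  type_class::unsigned_int, 8 },
  { "u16", type_class::unsigned_int, 16 },
  { "u32", type_class::unsigned_int, 32 },
};
const std::size_t num_suffixes = sizeof (suffixes) / sizeof (suffixes[0]);

// Return the index of the suffix with the same type class and twice the
// element width as NARROW, or num_suffixes if the table has none.
std::size_t widen_suffix (std::size_t narrow)
{
  assert (narrow < num_suffixes);
  for (std::size_t i = 0; i < num_suffixes; ++i)
    if (suffixes[i].tclass == suffixes[narrow].tclass
	&& suffixes[i].element_bits == suffixes[narrow].element_bits * 2)
      return i;
  return num_suffixes;
}
```

For the vmovlbq_m[_s16] example, the source vector has suffix s16, so the inactive argument is expected to have the widened suffix s32.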

2022-10-25  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (unary_widen): New.
* config/arm/arm-mve-builtins-shapes.h (unary_widen): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 46 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 47 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index e77a0cc20ac..ae73fc6b1b7 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -1236,6 +1236,52 @@ struct unary_n_def : public overloaded_base<0>
 };
 SHAPE (unary_n)
 
+/* _t vfoo[_t0](_t)
+
+   i.e. a version of "unary" in which the source elements are half the
+   size of the destination, but have the same type class.
+
+   Example: vmovlbq.
+   int32x4_t [__arm_]vmovlbq[_s16](int16x8_t a)
+   int32x4_t [__arm_]vmovlbq_m[_s16](int32x4_t inactive, int16x8_t a, mve_pred16_t p)
+   int32x4_t [__arm_]vmovlbq_x[_s16](int16x8_t a, mve_pred16_t p)  */
+struct unary_widen_def : public overloaded_base<0>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+build_all (b, "vw0,v0", group, MODE_none, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+unsigned int i, nargs;
+type_suffix_index type;
+tree res;
+if (!r.check_gp_argument (1, i, nargs)
+   || (type = r.infer_vector_type (i)) == NUM_TYPE_SUFFIXES)
+  return error_mark_node;
+
+type_suffix_index wide_suffix
+  = find_type_suffix (type_suffixes[type].tclass,
+ type_suffixes[type].element_bits * 2);
+
+/* Check the inactive argument has the wide type.  */
+if ((r.pred == PRED_m)
+   && (r.infer_vector_type (0) != wide_suffix))
+return r.report_no_such_form (type);
+
+if ((res = r.lookup_form (r.mode_suffix_id, type)))
+   return res;
+
+return r.report_no_such_form (type);
+  }
+};
+SHAPE (unary_widen)
+
 } /* end namespace arm_mve */
 
 #undef SHAPE
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
index c062fe624c4..5a8d9fe2b2d 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -58,6 +58,7 @@ namespace arm_mve
 extern const function_shape *const unary_int32;
 extern const function_shape *const unary_int32_acc;
 extern const function_shape *const unary_n;
+extern const function_shape *const unary_widen;
 
   } /* end namespace arm_mve::shapes */
 } /* end namespace arm_mve */
-- 
2.34.1



[PATCH 17/20] arm: [MVE intrinsics] rework vaddlvq

2023-05-10 Thread Christophe Lyon via Gcc-patches
Implement vaddlvq using the new MVE builtins framework.

Since we kept v4si hardcoded in the builtin name, we need to
special-case it in unspec_mve_function_exact_insn_pred_p.
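
The shape of that special case can be sketched as follows: the generic lookup takes a vector mode, while the vaddlv lookup has the mode fixed in its name and therefore takes none, so the caller has to pick between them. Everything here (the enum, the returned strings, the function names) is a hypothetical model for illustration, not the generated code_for_* helpers.

```cpp
#include <cassert>
#include <string>

enum unspec_code { VADDVQ_S, VADDVQ_U, VADDLVQ_S, VADDLVQ_U };

struct insn_info { const char *mnemonic; char supf; };

insn_info info (unspec_code u)
{
  switch (u)
    {
    case VADDVQ_S:  return { "vaddv", 's' };
    case VADDVQ_U:  return { "vaddv", 'u' };
    case VADDLVQ_S: return { "vaddlv", 's' };
    case VADDLVQ_U: return { "vaddlv", 'u' };
    }
  return { "", '?' };
}

// Generic lookup: the pattern is parameterized by the vector mode.
std::string generic_lookup (unspec_code u, const std::string &mode)
{
  insn_info i = info (u);
  return std::string (i.mnemonic) + "q_" + i.supf + "_" + mode;
}

// Fixed-mode lookup: v4si is part of the name, so no mode argument.
std::string v4si_lookup (unspec_code u)
{
  insn_info i = info (u);
  return std::string (i.mnemonic) + "q_" + i.supf + "_v4si";
}

// Choose the lookup the way the expander's special case does: test the
// signed unspec of the function, then pick signed or unsigned.
std::string select_pattern (unspec_code for_sint, unspec_code for_uint,
			    bool is_unsigned, const std::string &mode)
{
  unspec_code code = is_unsigned ? for_uint : for_sint;
  if (for_sint == VADDLVQ_S)
    return v4si_lookup (code);
  return generic_lookup (code, mode);
}
```

In this model the mode argument is simply ignored for vaddlv, which is why the real expander needs a separate branch for it.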

2022-10-25  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (vaddlvq): New.
* config/arm/arm-mve-builtins-base.def (vaddlvq): New.
* config/arm/arm-mve-builtins-base.h (vaddlvq): New.
* config/arm/arm-mve-builtins-functions.h
(unspec_mve_function_exact_insn_pred_p): Handle vaddlvq.
* config/arm/arm_mve.h (vaddlvq): Remove.
(vaddlvq_p): Remove.
(vaddlvq_s32): Remove.
(vaddlvq_u32): Remove.
(vaddlvq_p_s32): Remove.
(vaddlvq_p_u32): Remove.
(__arm_vaddlvq_s32): Remove.
(__arm_vaddlvq_u32): Remove.
(__arm_vaddlvq_p_s32): Remove.
(__arm_vaddlvq_p_u32): Remove.
(__arm_vaddlvq): Remove.
(__arm_vaddlvq_p): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc |  1 +
 gcc/config/arm/arm-mve-builtins-base.def|  1 +
 gcc/config/arm/arm-mve-builtins-base.h  |  1 +
 gcc/config/arm/arm-mve-builtins-functions.h | 69 ++--
 gcc/config/arm/arm_mve.h| 72 -
 5 files changed, 51 insertions(+), 93 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index e87069b0467..fdc0ff50b96 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -244,6 +244,7 @@ namespace arm_mve {
 FUNCTION_WITHOUT_N (vabdq, VABDQ)
 FUNCTION (vabsq, unspec_based_mve_function_exact_insn, (ABS, ABS, ABS, -1, -1, -1, VABSQ_M_S, -1, VABSQ_M_F, -1, -1, -1))
 FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
+FUNCTION_PRED_P_S_U (vaddlvq, VADDLVQ)
 FUNCTION_PRED_P_S_U (vaddvq, VADDVQ)
 FUNCTION_PRED_P_S_U (vaddvaq, VADDVAQ)
 FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index 413fe4a1ef0..dcfb426a7fb 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -20,6 +20,7 @@
 #define REQUIRES_FLOAT false
 DEF_MVE_FUNCTION (vabdq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vabsq, unary, all_signed, mx_or_none)
+DEF_MVE_FUNCTION (vaddlvq, unary_acc, integer_32, p_or_none)
 DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vaddvaq, unary_int32_acc, all_integer, p_or_none)
 DEF_MVE_FUNCTION (vaddvq, unary_int32, all_integer, p_or_none)
diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
index 5338b777444..5de70d5e1d4 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -25,6 +25,7 @@ namespace functions {
 
 extern const function_base *const vabdq;
 extern const function_base *const vabsq;
+extern const function_base *const vaddlvq;
 extern const function_base *const vaddq;
 extern const function_base *const vaddvaq;
 extern const function_base *const vaddvq;
diff --git a/gcc/config/arm/arm-mve-builtins-functions.h b/gcc/config/arm/arm-mve-builtins-functions.h
index d069990dcab..ea926e42b81 100644
--- a/gcc/config/arm/arm-mve-builtins-functions.h
+++ b/gcc/config/arm/arm-mve-builtins-functions.h
@@ -408,32 +408,59 @@ public:
   expand (function_expander &e) const override
   {
 insn_code code;
-switch (e.pred)
+
+if ((m_unspec_for_sint == VADDLVQ_S)
+   || m_unspec_for_sint == VADDLVAQ_S)
   {
-  case PRED_none:
-   if (e.type_suffix (0).integer_p)
- if (e.type_suffix (0).unsigned_p)
-   code = code_for_mve_q (m_unspec_for_uint, m_unspec_for_uint, e.vector_mode (0));
- else
-   code = code_for_mve_q (m_unspec_for_sint, m_unspec_for_sint, e.vector_mode (0));
-   else
- code = code_for_mve_q_f (m_unspec_for_fp, e.vector_mode (0));
+   switch (e.pred)
+ {
+ case PRED_none:
+   if (e.type_suffix (0).unsigned_p)
+ code = code_for_mve_q_v4si (m_unspec_for_uint, m_unspec_for_uint);
+   else
+ code = code_for_mve_q_v4si (m_unspec_for_sint, m_unspec_for_sint);
+   return e.use_exact_insn (code);
 
-   return e.use_exact_insn (code);
+ case PRED_p:
+   if (e.type_suffix (0).unsigned_p)
+ code = code_for_mve_q_p_v4si (m_unspec_for_p_uint, m_unspec_for_p_uint);
+   else
+ code = code_for_mve_q_p_v4si (m_unspec_for_p_sint, m_unspec_for_p_sint);
+   return e.use_exact_insn (code);
 
-  case PRED_p:
-   if (e.type_suffix (0).integer_p)
- if (e.type_suffix (0).unsigned_p)
-   code = code_for_mve_q_p (m_unspec_for_p_uint, m_unspec_for_p_uint, e.vector_mode (0));
- else
-   code = code_for_mve_q_p (m_unspec_for_p_sint, m_unspec_for_p_sint, e.vector_mode (0));
-   else
- code = 

[PATCH 08/20] arm: [MVE intrinsics] rework vdupq

2023-05-10 Thread Christophe Lyon via Gcc-patches
Implement vdupq using the new MVE builtins framework.

2022-10-25  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_ONLY_N): New.
(vdupq): New.
* config/arm/arm-mve-builtins-base.def (vdupq): New.
* config/arm/arm-mve-builtins-base.h (vdupq): New.
* config/arm/arm_mve.h (vdupq_n): Remove.
(vdupq_m): Remove.
(vdupq_n_f16): Remove.
(vdupq_n_f32): Remove.
(vdupq_n_s8): Remove.
(vdupq_n_s16): Remove.
(vdupq_n_s32): Remove.
(vdupq_n_u8): Remove.
(vdupq_n_u16): Remove.
(vdupq_n_u32): Remove.
(vdupq_m_n_u8): Remove.
(vdupq_m_n_s8): Remove.
(vdupq_m_n_u16): Remove.
(vdupq_m_n_s16): Remove.
(vdupq_m_n_u32): Remove.
(vdupq_m_n_s32): Remove.
(vdupq_m_n_f16): Remove.
(vdupq_m_n_f32): Remove.
(vdupq_x_n_s8): Remove.
(vdupq_x_n_s16): Remove.
(vdupq_x_n_s32): Remove.
(vdupq_x_n_u8): Remove.
(vdupq_x_n_u16): Remove.
(vdupq_x_n_u32): Remove.
(vdupq_x_n_f16): Remove.
(vdupq_x_n_f32): Remove.
(__arm_vdupq_n_s8): Remove.
(__arm_vdupq_n_s16): Remove.
(__arm_vdupq_n_s32): Remove.
(__arm_vdupq_n_u8): Remove.
(__arm_vdupq_n_u16): Remove.
(__arm_vdupq_n_u32): Remove.
(__arm_vdupq_m_n_u8): Remove.
(__arm_vdupq_m_n_s8): Remove.
(__arm_vdupq_m_n_u16): Remove.
(__arm_vdupq_m_n_s16): Remove.
(__arm_vdupq_m_n_u32): Remove.
(__arm_vdupq_m_n_s32): Remove.
(__arm_vdupq_x_n_s8): Remove.
(__arm_vdupq_x_n_s16): Remove.
(__arm_vdupq_x_n_s32): Remove.
(__arm_vdupq_x_n_u8): Remove.
(__arm_vdupq_x_n_u16): Remove.
(__arm_vdupq_x_n_u32): Remove.
(__arm_vdupq_n_f16): Remove.
(__arm_vdupq_n_f32): Remove.
(__arm_vdupq_m_n_f16): Remove.
(__arm_vdupq_m_n_f32): Remove.
(__arm_vdupq_x_n_f16): Remove.
(__arm_vdupq_x_n_f32): Remove.
(__arm_vdupq_n): Remove.
(__arm_vdupq_m): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |  10 +
 gcc/config/arm/arm-mve-builtins-base.def |   2 +
 gcc/config/arm/arm-mve-builtins-base.h   |   1 +
 gcc/config/arm/arm_mve.h | 333 ---
 4 files changed, 13 insertions(+), 333 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index 76294ddb7fb..cb572130c2b 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -176,6 +176,15 @@ namespace arm_mve {
 UNSPEC##_M_S, UNSPEC##_M_U, UNSPEC##_M_F,  \
 -1, -1, -1))
 
+  /* Helper for builtins with only unspec codes, _m predicated
+ overrides, only _n version.  */
+#define FUNCTION_ONLY_N(NAME, UNSPEC) FUNCTION \
+  (NAME, unspec_mve_function_exact_insn,   \
+   (-1, -1, -1, \
+UNSPEC##_N_S, UNSPEC##_N_U, UNSPEC##_N_F,  \
+-1, -1, -1, \
+UNSPEC##_M_N_S, UNSPEC##_M_N_U, UNSPEC##_M_N_F))
+
   /* Helper for builtins with only unspec codes, _m predicated
  overrides, only _n version, no floating-point.  */
 #define FUNCTION_ONLY_N_NO_F(NAME, UNSPEC) FUNCTION\
@@ -247,6 +256,7 @@ FUNCTION (vcmpltq, unspec_based_mve_function_exact_insn_vcmp, (LT, UNKNOWN, LT,
 FUNCTION (vcmpcsq, unspec_based_mve_function_exact_insn_vcmp, (UNKNOWN, GEU, UNKNOWN, UNKNOWN, VCMPCSQ_M_U, UNKNOWN, UNKNOWN, VCMPCSQ_M_N_U, UNKNOWN))
 FUNCTION (vcmphiq, unspec_based_mve_function_exact_insn_vcmp, (UNKNOWN, GTU, UNKNOWN, UNKNOWN, VCMPHIQ_M_U, UNKNOWN, UNKNOWN, VCMPHIQ_M_N_U, UNKNOWN))
 FUNCTION_WITHOUT_M_N (vcreateq, VCREATEQ)
+FUNCTION_ONLY_N (vdupq, VDUPQ)
 FUNCTION_WITH_RTX_M (veorq, XOR, VEORQ)
 FUNCTION_WITH_M_N_NO_F (vhaddq, VHADDQ)
 FUNCTION_WITH_M_N_NO_F (vhsubq, VHSUBQ)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index 2602cbf20e3..30e6aa1e1e6 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -33,6 +33,7 @@ DEF_MVE_FUNCTION (vcmpleq, cmp, all_signed, m_or_none)
 DEF_MVE_FUNCTION (vcmpltq, cmp, all_signed, m_or_none)
 DEF_MVE_FUNCTION (vcmpneq, cmp, all_integer, m_or_none)
 DEF_MVE_FUNCTION (vcreateq, create, all_integer_with_64, none)
+DEF_MVE_FUNCTION (vdupq, unary_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (veorq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vhaddq, binary_opt_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vhsubq, binary_opt_n, all_integer, mx_or_none)
@@ -104,6 +105,7 @@ DEF_MVE_FUNCTION (vcmpleq, cmp, all_float, m_or_none)
 DEF_MVE_FUNCTION (vcmpltq, cmp, all_float, m_or_none)
 DEF_MVE_FUNC

[PATCH 13/20] arm: [MVE intrinsics] add unary_int32_acc shape

2023-05-10 Thread Christophe Lyon via Gcc-patches
This patch adds the unary_int32_acc shape description.

2022-10-25  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (unary_int32_acc): New.
* config/arm/arm-mve-builtins-shapes.h (unary_int32_acc): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 34 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 35 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 0bd91b24147..bff1c3e843b 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -1121,6 +1121,40 @@ struct unary_int32_def : public overloaded_base<0>
 };
 SHAPE (unary_int32)
 
+/* [u]int32_t vfoo[_]([u]int32_t, _t)
+
+   i.e. a version of "unary" which accumulates into a scalar of type
+   int32_t or uint32_t depending on the signedness of the elements of
+   the input vector.
+
+   Example: vaddvaq.
+   int32_t [__arm_]vaddvaq[_s16](int32_t a, int16x8_t b)
+   int32_t [__arm_]vaddvaq_p[_s16](int32_t a, int16x8_t b, mve_pred16_t p)  */
+struct unary_int32_acc_def : public overloaded_base<0>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+build_all (b, "sx32,sx32,v0", group, MODE_none, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+unsigned int i, nargs;
+type_suffix_index type;
+if (!r.check_gp_argument (2, i, nargs)
+   || !r.require_integer_immediate (0)
+   || (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES)
+  return error_mark_node;
+
+return r.resolve_to (r.mode_suffix_id, type);
+  }
+};
+SHAPE (unary_int32_acc)
+
 /* _t vfoo[_n]_t0(_t)
 
Example: vdupq.
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
index f422550559e..fc1bacbd4da 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -55,6 +55,7 @@ namespace arm_mve
 extern const function_shape *const unary;
 extern const function_shape *const unary_convert;
 extern const function_shape *const unary_int32;
+extern const function_shape *const unary_int32_acc;
 extern const function_shape *const unary_n;
 
   } /* end namespace arm_mve::shapes */
-- 
2.34.1



[PATCH 12/20] arm: [MVE intrinsics] factorize vaddvaq

2023-05-10 Thread Christophe Lyon via Gcc-patches
Factorize vaddvaq builtins so that they use parameterized names.

2022-10-25  Christophe Lyon  

gcc/
* config/arm/iterators.md (mve_insn): Add vaddva.
* config/arm/mve.md (mve_vaddvaq_): Rename into ...
(@mve_q_): ... this.
(mve_vaddvaq_p_): Rename into ...
(@mve_q_p_): ... this.
---
 gcc/config/arm/iterators.md | 2 ++
 gcc/config/arm/mve.md   | 8 
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 46c7ddeda67..00123c0a376 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -762,6 +762,8 @@ (define_int_attr mve_insn [
 (VADDQ_M_N_S "vadd") (VADDQ_M_N_U "vadd") (VADDQ_M_N_F "vadd")
 (VADDQ_M_S "vadd") (VADDQ_M_U "vadd") (VADDQ_M_F "vadd")
 (VADDQ_N_S "vadd") (VADDQ_N_U "vadd") (VADDQ_N_F "vadd")
+(VADDVAQ_P_S "vaddva") (VADDVAQ_P_U "vaddva")
+(VADDVAQ_S "vaddva") (VADDVAQ_U "vaddva")
 (VADDVQ_P_S "vaddv") (VADDVQ_P_U "vaddv")
 (VADDVQ_S "vaddv") (VADDVQ_U "vaddv")
 (VANDQ_M_S "vand") (VANDQ_M_U "vand") (VANDQ_M_F "vand")
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index d772f4d4380..1ccbce3c89c 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -758,7 +758,7 @@ (define_insn "@mve_q_n_"
 ;;
 ;; [vaddvaq_s, vaddvaq_u])
 ;;
-(define_insn "mve_vaddvaq_"
+(define_insn "@mve_q_"
   [
(set (match_operand:SI 0 "s_register_operand" "=Te")
(unspec:SI [(match_operand:SI 1 "s_register_operand" "0")
@@ -766,7 +766,7 @@ (define_insn "mve_vaddvaq_"
 VADDVAQ))
   ]
   "TARGET_HAVE_MVE"
-  "vaddva.%#\t%0, %q2"
+  ".%#\t%0, %q2"
   [(set_attr "type" "mve_move")
 ])
 
@@ -1944,7 +1944,7 @@ (define_insn "@mve_q_m_"
 ;;
 ;; [vaddvaq_p_u, vaddvaq_p_s])
 ;;
-(define_insn "mve_vaddvaq_p_"
+(define_insn "@mve_q_p_"
   [
(set (match_operand:SI 0 "s_register_operand" "=Te")
(unspec:SI [(match_operand:SI 1 "s_register_operand" "0")
@@ -1953,7 +1953,7 @@ (define_insn "mve_vaddvaq_p_"
 VADDVAQ_P))
   ]
   "TARGET_HAVE_MVE"
-  "vpst\;vaddvat.%#   %0, %q2"
+  "vpst\;t.%#\t%0, %q2"
   [(set_attr "type" "mve_move")
(set_attr "length""8")])
 
-- 
2.34.1



[PATCH 05/20] arm: [MVE intrinsics] rework vrev16q vrev32q vrev64q

2023-05-10 Thread Christophe Lyon via Gcc-patches
Implement vrev16q, vrev32q, vrev64q using the new MVE builtins
framework.

2022-10-25  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (vrev16q, vrev32q, vrev64q):
New.
* config/arm/arm-mve-builtins-base.def (vrev16q, vrev32q)
(vrev64q): New.
* config/arm/arm-mve-builtins-base.h (vrev16q, vrev32q)
(vrev64q): New.
* config/arm/arm_mve.h (vrev16q): Remove.
(vrev32q): Remove.
(vrev64q): Remove.
(vrev64q_m): Remove.
(vrev16q_m): Remove.
(vrev32q_m): Remove.
(vrev16q_x): Remove.
(vrev32q_x): Remove.
(vrev64q_x): Remove.
(vrev64q_f16): Remove.
(vrev64q_f32): Remove.
(vrev32q_f16): Remove.
(vrev16q_s8): Remove.
(vrev32q_s8): Remove.
(vrev32q_s16): Remove.
(vrev64q_s8): Remove.
(vrev64q_s16): Remove.
(vrev64q_s32): Remove.
(vrev64q_u8): Remove.
(vrev64q_u16): Remove.
(vrev64q_u32): Remove.
(vrev32q_u8): Remove.
(vrev32q_u16): Remove.
(vrev16q_u8): Remove.
(vrev64q_m_u8): Remove.
(vrev64q_m_s8): Remove.
(vrev64q_m_u16): Remove.
(vrev64q_m_s16): Remove.
(vrev64q_m_u32): Remove.
(vrev64q_m_s32): Remove.
(vrev16q_m_s8): Remove.
(vrev32q_m_f16): Remove.
(vrev16q_m_u8): Remove.
(vrev32q_m_s8): Remove.
(vrev64q_m_f16): Remove.
(vrev32q_m_u8): Remove.
(vrev32q_m_s16): Remove.
(vrev64q_m_f32): Remove.
(vrev32q_m_u16): Remove.
(vrev16q_x_s8): Remove.
(vrev16q_x_u8): Remove.
(vrev32q_x_s8): Remove.
(vrev32q_x_s16): Remove.
(vrev32q_x_u8): Remove.
(vrev32q_x_u16): Remove.
(vrev64q_x_s8): Remove.
(vrev64q_x_s16): Remove.
(vrev64q_x_s32): Remove.
(vrev64q_x_u8): Remove.
(vrev64q_x_u16): Remove.
(vrev64q_x_u32): Remove.
(vrev32q_x_f16): Remove.
(vrev64q_x_f16): Remove.
(vrev64q_x_f32): Remove.
(__arm_vrev16q_s8): Remove.
(__arm_vrev32q_s8): Remove.
(__arm_vrev32q_s16): Remove.
(__arm_vrev64q_s8): Remove.
(__arm_vrev64q_s16): Remove.
(__arm_vrev64q_s32): Remove.
(__arm_vrev64q_u8): Remove.
(__arm_vrev64q_u16): Remove.
(__arm_vrev64q_u32): Remove.
(__arm_vrev32q_u8): Remove.
(__arm_vrev32q_u16): Remove.
(__arm_vrev16q_u8): Remove.
(__arm_vrev64q_m_u8): Remove.
(__arm_vrev64q_m_s8): Remove.
(__arm_vrev64q_m_u16): Remove.
(__arm_vrev64q_m_s16): Remove.
(__arm_vrev64q_m_u32): Remove.
(__arm_vrev64q_m_s32): Remove.
(__arm_vrev16q_m_s8): Remove.
(__arm_vrev16q_m_u8): Remove.
(__arm_vrev32q_m_s8): Remove.
(__arm_vrev32q_m_u8): Remove.
(__arm_vrev32q_m_s16): Remove.
(__arm_vrev32q_m_u16): Remove.
(__arm_vrev16q_x_s8): Remove.
(__arm_vrev16q_x_u8): Remove.
(__arm_vrev32q_x_s8): Remove.
(__arm_vrev32q_x_s16): Remove.
(__arm_vrev32q_x_u8): Remove.
(__arm_vrev32q_x_u16): Remove.
(__arm_vrev64q_x_s8): Remove.
(__arm_vrev64q_x_s16): Remove.
(__arm_vrev64q_x_s32): Remove.
(__arm_vrev64q_x_u8): Remove.
(__arm_vrev64q_x_u16): Remove.
(__arm_vrev64q_x_u32): Remove.
(__arm_vrev64q_f16): Remove.
(__arm_vrev64q_f32): Remove.
(__arm_vrev32q_f16): Remove.
(__arm_vrev32q_m_f16): Remove.
(__arm_vrev64q_m_f16): Remove.
(__arm_vrev64q_m_f32): Remove.
(__arm_vrev32q_x_f16): Remove.
(__arm_vrev64q_x_f16): Remove.
(__arm_vrev64q_x_f32): Remove.
(__arm_vrev16q): Remove.
(__arm_vrev32q): Remove.
(__arm_vrev64q): Remove.
(__arm_vrev64q_m): Remove.
(__arm_vrev16q_m): Remove.
(__arm_vrev32q_m): Remove.
(__arm_vrev16q_x): Remove.
(__arm_vrev32q_x): Remove.
(__arm_vrev64q_x): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |   3 +
 gcc/config/arm/arm-mve-builtins-base.def |   5 +
 gcc/config/arm/arm-mve-builtins-base.h   |   3 +
 gcc/config/arm/arm_mve.h | 820 ---
 4 files changed, 11 insertions(+), 820 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index 14870f5b1aa..76294ddb7fb 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -293,6 +293,9 @@ FUNCTION_ONLY_N_NO_U_F (vqshrunbq, VQSHRUNBQ)
 FUNCTION_ONLY_N_NO_U_F (vqshruntq, VQSHRUNTQ)
 FUNCTION_WITH_M_N_NO_F (vqsubq, VQSUBQ)
 FUNCTION (vreinterpretq, vreinterpretq_impl,)
+FUNCTION_WITHOUT_N_NO_F (vrev16q, VREV16Q)
+FUNCTION_WITHOUT_N (vrev32q, VREV32Q)
+FUNCTION_WITHOUT_N (vrev64q, VREV64Q)
 FUNCTION_WITHOUT_N_NO_F (vrhaddq, VRHADDQ)
 FUNCTION_

Re: Testsuite: Add 'torture-init-done', and use it to conditionalize implicit 'torture-init' (was: Testsuite: Add missing 'torture-init'/'torture-finish' around 'LTO_TORTURE_OPTIONS' usage (was: Let e

2023-05-10 Thread Christophe Lyon via Gcc-patches
Hi Thomas,


On Wed, 10 May 2023 at 09:52, Thomas Schwinge 
wrote:

> Hi Christophe!
>
> On 2023-05-09T21:14:07+0200, Christophe Lyon 
> wrote:
> > On Tue, 9 May 2023 at 17:17, Christophe Lyon  >
> > wrote:
> >> On Tue, 9 May 2023 at 11:00, Thomas Schwinge 
> >> wrote:
> >>> On 2023-05-09T09:32:55+0200, Christophe Lyon <
> christophe.l...@linaro.org>
> >>> wrote:
> >>> > On Wed, 3 May 2023 at 13:47, Richard Biener via Gcc-patches <
> >>> gcc-patches@gcc.gnu.org> wrote:
> >>> >> On Wed, 3 May 2023, Thomas Schwinge wrote:
> >>> >> > "Let each 'lto_init' determine the default 'LTO_OPTIONS', and
> >>> 'torture-init' the 'LTO_TORTURE_OPTIONS'"?
> >>> >
> >>> > This is causing issues on arm/aarch64, including:
> >>> >
> >>> > ERROR: can't read "LTO_TORTURE_OPTIONS": no such variable
> >>> > in gcc.target/arm/acle/acle.exp:
> >>> >
> >>> > ERROR: torture-init: LTO_TORTURE_OPTIONS is not empty as expected
> >>> > in gcc.target/aarch64/sls-mitigation/sls-mitigation.exp,
> >>> > gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp,
> >>> > gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp,
> >>> > gcc.target/aarch64/torture/aarch64-torture.exp
> >>> >
> >>> > and maybe others
> >>> >
> >>> > Are other targets affected too?
> >>>
> >>> Sorry for that -- it means, the safe-guards I added are working as
> >>> expected.
> >>>
> >>> Please test whether all these issues are gone with the attached
> >>> "Testsuite: Add missing 'torture-init'/'torture-finish' around
> >>> 'LTO_TORTURE_OPTIONS' usage"?
> >>
> >> Your patch seemed reasonable,  but it doesn't work :-(
> >>
> >> Well now I get:
> >> ERROR: torture-init: LTO_TORTURE_OPTIONS is not empty as expected
> >> because gcc-dg-runtest itself calls torture-init
> >>
> >> but I'm not sure where LTO_TORTURE_OPTIONS is set
> >
> > Just checking, are you able to test your changes on arm (a cross
> > toolchain is OK)?
>
> Sorry, I don't currently have an arm/aarch64 toolchain built.
>
> > The problem shows up even if running only acle.exp, so it's quick once
> > you have built the toolchain once.
>
> I did a quick hack:
>
> --- gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-mitigation.exp
> +++ gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-mitigation.exp
> @@ -22,3 +21,0 @@
> -if {![istarget aarch64*-*-*] } then {
> -  return
> -}
> --- gcc/testsuite/gcc.target/arm/acle/acle.exp
> +++ gcc/testsuite/gcc.target/arm/acle/acle.exp
> @@ -20,3 +19,0 @@
> -if ![istarget arm*-*-*] then {
> -  return
> -}
>
> ..., and can confirm that I run into the DejaGnu/TCL ERRORs in my
> x86_64-pc-linux-gnu testing.
>
> > I spent some time looking at it, and the conflict is that the .exp file
> > calls torture-init and gcc-dg-runtest, which in turn calls torture-init
> > again, leading to the error.
>
> I see, thanks -- and sorry, once again.
>
> > I haven't checked the details of why there are similar failures on
> > aarch64.
>
> I now understand that the problem is the following: most '*.exp'
> files have 'torture-init' followed by 'set-torture-options' before
> 'gcc-dg-runtest' etc., and therefore don't run into the latter's
> "Some callers set torture options themselves; don't override those."
> code.  Some '*.exp' files however do 'torture-init' but not
> 'set-torture-options', and therefore we can no longer conditionalize
> the implicit 'torture-init' by '![torture-options-exist]'.
> Please in addition to the earlier
> "Testsuite: Add missing 'torture-init'/'torture-finish' around
> 'LTO_TORTURE_OPTIONS' usage"
> also apply the attached
> "Testsuite: Add 'torture-init-done', and use it to conditionalize implicit
> 'torture-init'".
> That hopefully should restore sanity -- if not, I'll get arm/aarch64
> toolchains built.
>
>
Thanks for the patch, it seems to work!

Christophe


>
> Regards
>  Thomas
>
>
>


[PATCH] Avoid g++.dg/torture/pr106922.C FAIL with the pre-C++11 ABI

2023-05-10 Thread Richard Biener via Gcc-patches
The following forces the g++.dg/torture/pr106922.C testcase to use
the C++11 libstdc++ ABI and checks if that was successful.

Does this look OK?

Thanks,
Richard.

* g++.dg/torture/pr106922.C: Force _GLIBCXX_USE_CXX11_ABI to 1.
---
 gcc/testsuite/g++.dg/torture/pr106922.C | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/gcc/testsuite/g++.dg/torture/pr106922.C b/gcc/testsuite/g++.dg/torture/pr106922.C
index 046fc6cce76..4b692d621ea 100644
--- a/gcc/testsuite/g++.dg/torture/pr106922.C
+++ b/gcc/testsuite/g++.dg/torture/pr106922.C
@@ -4,8 +4,16 @@
 // -O1 doesn't iterate VN and thus has bogus uninit diagnostics
 // { dg-skip-if "" { *-*-* } { "-O1" } { "" } }
 
+// The testcase still emits bogus diagnostics with the pre-C++11 ABI
+#undef _GLIBCXX_USE_CXX11_ABI
+#define _GLIBCXX_USE_CXX11_ABI 1
+
 #include 
 
+// When the library is not dual-ABI and defaults to old just compile
+// an empty TU
+#if _GLIBCXX_USE_CXX11_ABI
+
 #include 
 template 
 using Optional = std::optional;
@@ -46,3 +54,4 @@ void test()
 externals.external2 = internal2;
 }
 }
+#endif
-- 
2.35.3


RE: [PATCH v2] Var-Tracking: Typedef pointer_mux as decl_or_value

2023-05-10 Thread Li, Pan2 via Gcc-patches
I see, will try to get rid of dv_as_opaque everywhere. Thank you all!

Pan

-----Original Message-----
From: Richard Sandiford  
Sent: Wednesday, May 10, 2023 8:53 PM
To: Jakub Jelinek 
Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, Yanzhang 
; jeffreya...@gmail.com; rguent...@suse.de
Subject: Re: [PATCH v2] Var-Tracking: Typedef pointer_mux 
as decl_or_value

Jakub Jelinek  writes:
> On Wed, May 10, 2023 at 07:57:05PM +0800, pan2...@intel.com wrote:
>> --- a/gcc/var-tracking.cc
>> +++ b/gcc/var-tracking.cc
>> @@ -116,9 +116,14 @@
>>  #include "fibonacci_heap.h"
>>  #include "print-rtl.h"
>>  #include "function-abi.h"
>> +#include "mux-utils.h"
>>  
>>  typedef fibonacci_heap  bb_heap_t;
>>  
>> +/* A declaration of a variable, or an RTL value being handled like a
>> +   declaration by pointer_mux.  */
>> +typedef pointer_mux decl_or_value;
>> +
>>  /* var-tracking.cc assumes that tree code with the same value as VALUE rtx 
>> code
>> has no chance to appear in REG_EXPR/MEM_EXPRs and isn't a decl.
>> Currently the value is the same as IDENTIFIER_NODE, which has 
>> such @@ -196,15 +201,11 @@ struct micro_operation  };
>>  
>>  
>> -/* A declaration of a variable, or an RTL value being handled like a
>> -   declaration.  */
>> -typedef void *decl_or_value;
>
> Why do you move the typedef?
>
>> @@ -503,9 +505,7 @@ variable_hasher::hash (const variable *v)  inline 
>> bool  variable_hasher::equal (const variable *v, const void *y)  {
>> -  decl_or_value dv = CONST_CAST2 (decl_or_value, const void *, y);
>> -
>> -  return (dv_as_opaque (v->dv) == dv_as_opaque (dv));
>> +  return dv_as_opaque (v->dv) == y;
>>  }
>
> I'm not convinced this is correct.  I think all the 
> find_slot_with_hash etc. pass in a decl_or_value, so I'd expect y to 
> have decl_or_value type or something similar.
>
>>  /* Free the element of VARIABLE_HTAB (its type is struct 
>> variable_def).  */ @@ -1396,8 +1396,7 @@ onepart_pool_allocate 
>> (onepart_enum onepart)  static inline decl_or_value  dv_from_decl 
>> (tree decl)  {
>> -  decl_or_value dv;
>> -  dv = decl;
>> +  decl_or_value dv = decl_or_value::first (decl);
>
> Can't you just decl_or_value dv = decl; ?  I think pointer_mux has 
> ctors from pointers to the template parameter types.
>
>>gcc_checking_assert (dv_is_decl_p (dv));
>>return dv;
>>  }
>> @@ -1406,8 +1405,7 @@ dv_from_decl (tree decl)  static inline 
>> decl_or_value  dv_from_value (rtx value)  {
>> -  decl_or_value dv;
>> -  dv = value;
>> +  decl_or_value dv = decl_or_value::second (value);
>
> Ditto.
>
>> @@ -1661,7 +1659,8 @@ shared_hash_find_slot_unshare_1 (shared_hash 
>> **pvars, decl_or_value dv,  {
>>if (shared_hash_shared (*pvars))
>>  *pvars = shared_hash_unshare (*pvars);
>> -  return shared_hash_htab (*pvars)->find_slot_with_hash (dv, dvhash, 
>> ins);
>> +  return shared_hash_htab (*pvars)->find_slot_with_hash (dv_as_opaque (dv),
>> + dvhash, ins);
>
> Then you wouldn't need to change all these.

Also, please do try changing variable_hasher::compare_type to decl_or_value, 
and changing the type of the second parameter to variable_hasher::equal 
accordingly.  I still feel that we should be able to get rid of dv_as_opaque 
entirely.

Thanks,
Richard


[PATCH take #3] match.pd: Simplify popcount/parity of bswap/rotate.

2023-05-10 Thread Roger Sayle

This is the latest iteration of my patch from August 2020
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552391.html
incorporating feedback and suggestions from reviewers.

This patch to match.pd optimizes away bit permutation operations,
specifically bswap and rotate, in calls to popcount and parity.
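
The identities behind these simplifications can be checked directly. The
sketch below is illustrative only and is not part of the patch: rotl32 and
permutation_preserves_popcount are hypothetical helper names, 32-bit
unsigned int is assumed, and the builtins are the GCC ones exercised by the
new tests.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: rotate a 32-bit value left by n bits (n taken mod 32).
static inline uint32_t rotl32(uint32_t x, unsigned n) {
  n &= 31u;
  return n == 0 ? x : (x << n) | (x >> (32u - n));
}

// Byte swapping and rotation only permute bits, so the number of set
// bits (and hence its parity) is unchanged.
static inline bool permutation_preserves_popcount(uint32_t x, unsigned n) {
  const int base = __builtin_popcount(x);
  return __builtin_popcount(__builtin_bswap32(x)) == base
         && __builtin_popcount(rotl32(x, n)) == base
         && __builtin_parity(__builtin_bswap32(x)) == __builtin_parity(x)
         && __builtin_parity(rotl32(x, n)) == __builtin_parity(x);
}
```

Because the result is the same for every input, the compiler can drop the
bswap or rotate entirely, which is what the new patterns aim to do.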

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-05-10  Roger Sayle  

gcc/ChangeLog
* match.pd : Simplify popcount(bswap(x))
as popcount(x).  Simplify popcount(rotate(x,y)) as popcount(x).
:  Simplify parity(bswap(x)) as parity(x).
Simplify parity(rotate(x,y)) as parity(x).

gcc/testsuite/ChangeLog
* gcc.dg/fold-parity-6.c: New test.
* gcc.dg/fold-parity-7.c: New test.
* gcc.dg/fold-popcount-6.c: New test.
* gcc.dg/fold-popcount-7.c: New test.


Thanks again,
Roger
--

diff --git a/gcc/match.pd b/gcc/match.pd
index ceae1c3..bc083be 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7766,6 +7766,32 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (cmp (popcount @0) integer_zerop)
   (rep @0 { build_zero_cst (TREE_TYPE (@0)); }))))
 
+/* popcount(bswap(x)) is popcount(x).  */
+(for popcount (POPCOUNT)
+  (for bswap (BUILT_IN_BSWAP16 BUILT_IN_BSWAP32
+ BUILT_IN_BSWAP64 BUILT_IN_BSWAP128)
+(simplify
+  (popcount (convert?@0 (bswap:s@1 @2)))
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+  && INTEGRAL_TYPE_P (TREE_TYPE (@1)))
+   (with { unsigned int prec0 = TYPE_PRECISION (TREE_TYPE (@0));
+   unsigned int prec1 = TYPE_PRECISION (TREE_TYPE (@1)); }
+ (if (prec0 == prec1 || (prec0 > prec1 && TYPE_UNSIGNED (@1)))
+   (popcount @2)))))))
+
+/* popcount(rotate(X Y)) is popcount(X).  */
+(for popcount (POPCOUNT)
+  (for rot (lrotate rrotate)
+(simplify
+  (popcount (convert?@0 (rot:s@1 @2 @3)))
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+  && INTEGRAL_TYPE_P (TREE_TYPE (@1))  
+  && (GIMPLE || !TREE_SIDE_EFFECTS (@3)))
+   (with { unsigned int prec0 = TYPE_PRECISION (TREE_TYPE (@0));
+   unsigned int prec1 = TYPE_PRECISION (TREE_TYPE (@1)); }
+ (if (prec0 == prec1 || (prec0 > prec1 && TYPE_UNSIGNED (@1)))
+   (popcount @2)))))))
+
 /* Canonicalize POPCOUNT(x)&1 as PARITY(X).  */
 (simplify
   (bit_and (POPCOUNT @0) integer_onep)
@@ -7777,6 +7803,30 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (PARITY (bit_not @0))
   (PARITY @0))
 
+/* parity(bswap(x)) is parity(x).  */
+(for parity (PARITY)
+  (for bswap (BUILT_IN_BSWAP16 BUILT_IN_BSWAP32
+ BUILT_IN_BSWAP64 BUILT_IN_BSWAP128)
+(simplify
+  (parity (convert?@0 (bswap:s@1 @2)))
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+  && INTEGRAL_TYPE_P (TREE_TYPE (@1))
+  && TYPE_PRECISION (TREE_TYPE (@0))
+ >= TYPE_PRECISION (TREE_TYPE (@1)))
+   (parity @2)))))
+
+/* parity(rotate(X Y)) is parity(X).  */
+(for parity (PARITY)
+  (for rot (lrotate rrotate)
+(simplify
+  (parity (convert?@0 (rot:s@1 @2 @3)))
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+  && INTEGRAL_TYPE_P (TREE_TYPE (@1))
+  && (GIMPLE || !TREE_SIDE_EFFECTS (@3))
+  && TYPE_PRECISION (TREE_TYPE (@0))
+ >= TYPE_PRECISION (TREE_TYPE (@1)))
+   (parity @2)))))
+
 /* parity(X)^parity(Y) is parity(X^Y).  */
 (simplify
   (bit_xor (PARITY:s @0) (PARITY:s @1))
diff --git a/gcc/testsuite/gcc.dg/fold-parity-6.c b/gcc/testsuite/gcc.dg/fold-parity-6.c
new file mode 100644
index 0000000..623afb9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-parity-6.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int foo(unsigned int x)
+{
+#if __SIZEOF_INT__ == 4
+  return __builtin_parity (__builtin_bswap32(x));
+#elif __SIZEOF_INT__ == 2
+  return __builtin_parity (__builtin_bswap16(x));
+#else
+  return x;
+#endif
+}
+
+int bar(unsigned long x)
+{
+#if __SIZEOF_LONG__ == 8
+  return __builtin_parityl (__builtin_bswap64(x));
+#elif __SIZEOF_LONG__ == 4
+  return __builtin_parityl (__builtin_bswap32(x));
+#else
+  return x;
+#endif
+}
+
+int baz(unsigned long long x)
+{
+#if __SIZEOF_LONG_LONG__ == 8
+  return __builtin_parityll (__builtin_bswap64(x));
+#elif __SIZEOF_LONG_LONG__ == 4
+  return __builtin_parityll (__builtin_bswap32(x));
+#else
+  return x;
+#endif
+}
+
+/* { dg-final { scan-tree-dump-times "bswap" 0 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/fold-parity-7.c b/gcc/testsuite/gcc.dg/fold-parity-7.c
new file mode 100644
index 0000000..c08cdee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-parity-7.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int foo(unsigned int x)
+{
+#if __SIZEOF_INT__ == 4
+  unsigned int y = (x>>4) | (x<<28);
+  return __builtin_parity(y);
+#elif __SIZEOF_INT__ == 2
+  unsigne

RE: [PATCH 15/20] arm: [MVE intrinsics] add unary_acc shape

2023-05-10 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Wednesday, May 10, 2023 2:31 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 15/20] arm: [MVE intrinsics] add unary_acc shape
> 
> This patch adds the unary_acc shape description.
> 
> 2022-10-25  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm-mve-builtins-shapes.cc (unary_acc): New.
>   * config/arm/arm-mve-builtins-shapes.h (unary_acc): New.
> ---
>  gcc/config/arm/arm-mve-builtins-shapes.cc | 28 +++
>  gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
>  2 files changed, 29 insertions(+)
> 
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
> index bff1c3e843b..e77a0cc20ac 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -1066,6 +1066,34 @@ struct unary_def : public overloaded_base<0>
>  };
>  SHAPE (unary)
> 
> +/* _t vfoo[_](_t)
> +
> +   i.e. a version of "unary" in which the source elements are half the
> +   size of the destination scalar, but have the same type class.
> +
> +   Example: vaddlvq.
> +   int64_t [__arm_]vaddlvq[_s32](int32x4_t a)
> +   int64_t [__arm_]vaddlvq_p[_s32](int32x4_t a, mve_pred16_t p) */
> +struct unary_acc_def : public overloaded_base<0>
> +{
> +  void
> +  build (function_builder &b, const function_group_info &group,
> +  bool preserve_user_namespace) const override
> +  {
> +b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
> +build_all (b, "sw0,v0", group, MODE_none, preserve_user_namespace);
> +  }
> +
> +  tree
> +  resolve (function_resolver &r) const override
> +  {
> +/* FIXME: check that the return value is actually
> +   twice as wide as arg 0.  */

Any reason why we can't add that check now?
I'd rather not add new FIXMEs here...
Thanks,
Kyrill

> +return r.resolve_unary ();
> +  }
> +};
> +SHAPE (unary_acc)
> +
>  /* _t foo_t0[_t1](_t)
> 
> where the target type  must be specified explicitly but the source
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h b/gcc/config/arm/arm-mve-builtins-shapes.h
> index fc1bacbd4da..c062fe624c4 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.h
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.h
> @@ -53,6 +53,7 @@ namespace arm_mve
>  extern const function_shape *const create;
>  extern const function_shape *const inherent;
>  extern const function_shape *const unary;
> +extern const function_shape *const unary_acc;
>  extern const function_shape *const unary_convert;
>  extern const function_shape *const unary_int32;
>  extern const function_shape *const unary_int32_acc;
> --
> 2.34.1



[pushed 1/2] c++: always check consteval address

2023-05-10 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

The restriction on the "permitted result of a constant expression" to not
refer to an immediate function applies regardless of context.  The previous
code tried to only check in cases where we wouldn't get the check in
cp_fold_r, but with the next patch I would need to add another case and it
shouldn't be a problem to always check.

We also shouldn't talk about immediate evaluation when we aren't dealing
with one.

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_outermost_constant_expr): Always check
for address of immediate fn.
(maybe_constant_init_1): Evaluate PTRMEM_CST.

gcc/testsuite/ChangeLog:

* g++.dg/DRs/dr2478.C: Handle -fimplicit-constexpr.
* g++.dg/cpp23/consteval-if12.C: Adjust diagnostics.
* g++.dg/cpp2a/consteval20.C: Likewise.
* g++.dg/cpp2a/consteval24.C: Likewise.
* g++.dg/cpp2a/srcloc20.C: Likewise.
---
 gcc/cp/constexpr.cc | 22 ++---
 gcc/testsuite/g++.dg/DRs/dr2478.C   | 17 +---
 gcc/testsuite/g++.dg/cpp23/consteval-if12.C |  8 
 gcc/testsuite/g++.dg/cpp2a/consteval20.C|  4 ++--
 gcc/testsuite/g++.dg/cpp2a/consteval24.C|  8 
 gcc/testsuite/g++.dg/cpp2a/srcloc20.C   |  4 ++--
 6 files changed, 37 insertions(+), 26 deletions(-)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 987a536d515..7b8090625e8 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -8353,7 +8353,8 @@ cxx_eval_outermost_constant_expr (tree t, bool allow_non_constant,
   non_constant_p = true;
 }
 
-  if (!global_ctx.heap_vars.is_empty ())
+  if (!non_constant_p && cxx_dialect >= cxx20
+  && !global_ctx.heap_vars.is_empty ())
 {
   tree heap_var = cp_walk_tree_without_duplicates (&r, find_heap_var_refs,
   NULL);
@@ -8384,15 +8385,22 @@ cxx_eval_outermost_constant_expr (tree t, bool allow_non_constant,
 
   /* Check that immediate invocation does not return an expression referencing
  any immediate function decls.  */
-  if (is_consteval || in_immediate_context ())
+  if (!non_constant_p && cxx_dialect >= cxx20)
 if (tree immediate_fndecl
= cp_walk_tree_without_duplicates (&r, find_immediate_fndecl,
   NULL))
 {
   if (!allow_non_constant && !non_constant_p)
-   error_at (cp_expr_loc_or_input_loc (t),
- "immediate evaluation returns address of immediate "
- "function %qD", immediate_fndecl);
+   {
+ if (is_consteval)
+   error_at (cp_expr_loc_or_input_loc (t),
+ "immediate evaluation returns address of immediate "
+ "function %qD", immediate_fndecl);
+ else
+   error_at (cp_expr_loc_or_input_loc (t),
+ "constant evaluation returns address of immediate "
+ "function %qD", immediate_fndecl);
+   }
   r = t;
   non_constant_p = true;
 }
@@ -8795,8 +8803,8 @@ maybe_constant_init_1 (tree t, tree decl, bool allow_non_constant,
 t = TARGET_EXPR_INITIAL (t);
   if (!is_nondependent_static_init_expression (t))
 /* Don't try to evaluate it.  */;
-  else if (CONSTANT_CLASS_P (t) && allow_non_constant)
-/* No evaluation needed.  */;
+  else if (CONSTANT_CLASS_P (t) && TREE_CODE (t) != PTRMEM_CST)
+/* No evaluation needed.  PTRMEM_CST needs the immediate fn check.  */;
   else
 {
   /* [basic.start.static] allows constant-initialization of variables with
diff --git a/gcc/testsuite/g++.dg/DRs/dr2478.C b/gcc/testsuite/g++.dg/DRs/dr2478.C
index 7e939ac6850..7f581cabb7b 100644
--- a/gcc/testsuite/g++.dg/DRs/dr2478.C
+++ b/gcc/testsuite/g++.dg/DRs/dr2478.C
@@ -1,11 +1,14 @@
 // DR 2478 - Properties of explicit specializations of implicitly-instantiated class templates
 // { dg-do compile { target c++20 } }
 
+// Defeat -fimplicit-constexpr
+int ii;
+
 template 
 struct S {
-  int foo () { return 0; }
+  int foo () { return ii; }
   constexpr int bar () { return 0; }
-  int baz () { return 0; }
+  int baz () { return ii; }
   consteval int qux () { return 0; }
   constexpr S () {}
   static constinit T x;
@@ -14,7 +17,7 @@ struct S {
 
 template 
 T S::x = S ().foo ();// { dg-error "'constinit' variable 
'S::x' does not have a constant initializer" }
-   // { dg-error "call to non-'constexpr' 
function" "" { target *-*-* } .-1 }
+   // { dg-error "call to non-'constexpr' 
function|called in a constant expression" "" { target *-*-* } .-1 }
 
 template 
 T S::y = S ().foo ();
@@ -49,14 +52,14 @@ S::qux ()
 
 template <>
 long S::x = S ().foo ();   // { dg-bogus "'constinit' variable 
'S::x' does not have a constant initializer" "" { xfail *-*-* } }
-   // { dg-bogus "call to non-'constexpr' 
fun

[pushed 2/2] c++: be stricter about constinit [CWG2543]

2023-05-10 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

DR 2543 clarifies that constinit variables should follow the language, and
diagnose non-constant initializers (according to [expr.const]) even if they
can actually initialize the variables statically.

DR 2543

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_outermost_constant_expr): Preserve
TARGET_EXPR flags.
(potential_constant_expression_1): Check TARGET_EXPR_ELIDING_P.
* typeck2.cc (store_init_value): Diagnose constinit sooner.

gcc/testsuite/ChangeLog:

* g++.dg/DRs/dr2543.C: New test.
---
 gcc/cp/constexpr.cc   | 12 +++
 gcc/cp/typeck2.cc | 55 +--
 gcc/testsuite/g++.dg/DRs/dr2543.C |  5 +++
 3 files changed, 48 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/DRs/dr2543.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 7b8090625e8..8f7f0b7d325 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -8448,6 +8448,17 @@ cxx_eval_outermost_constant_expr (tree t, bool allow_non_constant,
}
 }
 
+  if (TREE_CODE (t) == TARGET_EXPR
+  && TREE_CODE (r) == TARGET_EXPR)
+{
+  /* Preserve this flag for potential_constant_expression, and the others
+for good measure.  */
+  TARGET_EXPR_ELIDING_P (r) = TARGET_EXPR_ELIDING_P (t);
+  TARGET_EXPR_IMPLICIT_P (r) = TARGET_EXPR_IMPLICIT_P (t);
+  TARGET_EXPR_LIST_INIT_P (r) = TARGET_EXPR_LIST_INIT_P (t);
+  TARGET_EXPR_DIRECT_INIT_P (r) = TARGET_EXPR_DIRECT_INIT_P (t);
+}
+
   /* Remember the original location if that wouldn't need a wrapper.  */
   if (location_t loc = EXPR_LOCATION (t))
 protected_set_expr_location (r, loc);
@@ -9774,6 +9785,7 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
 
 case TARGET_EXPR:
   if (!TARGET_EXPR_DIRECT_INIT_P (t)
+ && !TARGET_EXPR_ELIDING_P (t)
  && !literal_type_p (TREE_TYPE (t)))
{
  if (flags & tf_error)
diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
index bf03967a71f..f5cc7c8371c 100644
--- a/gcc/cp/typeck2.cc
+++ b/gcc/cp/typeck2.cc
@@ -843,23 +843,45 @@ store_init_value (tree decl, tree init, vec** cleanups, int flags)
   bool const_init;
   tree oldval = value;
   if (DECL_DECLARED_CONSTEXPR_P (decl)
+ || DECL_DECLARED_CONSTINIT_P (decl)
  || (DECL_IN_AGGR_P (decl)
  && DECL_INITIALIZED_IN_CLASS_P (decl)))
{
  value = fold_non_dependent_expr (value, tf_warning_or_error,
   /*manifestly_const_eval=*/true,
   decl);
+ if (value == error_mark_node)
+   ;
  /* Diagnose a non-constant initializer for constexpr variable or
 non-inline in-class-initialized static data member.  */
- if (!require_constant_expression (value))
-   value = error_mark_node;
- else if (processing_template_decl)
-   /* In a template we might not have done the necessary
-  transformations to make value actually constant,
-  e.g. extend_ref_init_temps.  */
-   value = maybe_constant_init (value, decl, true);
+ else if (!is_constant_expression (value))
+   {
+ /* Maybe we want to give this message for constexpr variables as
+well, but that will mean a lot of testsuite adjustment.  */
+ if (DECL_DECLARED_CONSTINIT_P (decl))
+ error_at (location_of (decl),
+   "%<constinit%> variable %qD does not have a "
+   "constant initializer", decl);
+ require_constant_expression (value);
+ value = error_mark_node;
+   }
  else
-   value = cxx_constant_init (value, decl);
+   {
+ value = maybe_constant_init (value, decl, true);
+
+ /* In a template we might not have done the necessary
+transformations to make value actually constant,
+e.g. extend_ref_init_temps.  */
+ if (!processing_template_decl
+ && !TREE_CONSTANT (value))
+   {
+ if (DECL_DECLARED_CONSTINIT_P (decl))
+ error_at (location_of (decl),
+   "%<constinit%> variable %qD does not have a "
+   "constant initializer", decl);
+ value = cxx_constant_init (value, decl);
+   }
+   }
}
   else
value = fold_non_dependent_init (value, tf_warning_or_error,
@@ -875,22 +897,7 @@ store_init_value (tree decl, tree init, vec** 
cleanups, int flags)
   if (!TYPE_REF_P (type))
TREE_CONSTANT (decl) = const_init && decl_maybe_constant_var_p (decl);
   if (!const_init)
-   {
- /* [dcl.constinit]/2 "If a variable declared with the constinit
-specifier has d

[PATCH v3] Var-Tracking: Typedef pointer_mux as decl_or_value

2023-05-10 Thread Pan Li via Gcc-patches
From: Pan Li 

Before this patch, decl_or_value is defined as void *. It holds either
a tree_node or an rtx_def. Unfortunately, a void pointer alone cannot
tell whether the input is a tree_node or an rtx_def.

As a result, there is an implicit structure layout requirement like the
one below; otherwise we would read the wrong bits when casting void * to
tree_node or rtx_def.

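+--------+-----------+----------+
| offset | tree_node | rtx_def  |
+--------+-----------+----------+
|      0 | code: 16  | code: 16 | <- require the location and bit size
+--------+-----------+----------+
|     16 | ...       | mode: 8  |
+--------+-----------+----------+
| ...                           |
+--------+-----------+----------+
|     24 | ...       | ...      |
+--------+-----------+----------+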

This behavior blocks the patch that extends the rtx_def mode from 8 to
16 bits because we are running out of machine modes. This patch
introduces pointer_mux to tell whether the input is a tree_node or an
rtx_def, and removes the implicit layout dependency described above.
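
To illustrate the idea, here is a minimal sketch of a two-way tagged
pointer. It is not GCC's mux-utils.h: TaggedPtr, Decl and Value are
hypothetical names standing in for pointer_mux, tree_node and rtx_def, and
the sketch assumes both pointee types are aligned to at least 2 bytes so
that bit 0 of the address is free to use as a tag.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for tree_node and rtx_def.
struct alignas(8) Decl { int uid; };
struct alignas(8) Value { int id; };

// Illustrative two-way tagged pointer: bit 0 clear means "first" (T1),
// bit 0 set means "second" (T2).  A default-constructed (null) object
// therefore reports is_first (), like a null decl.
template <typename T1, typename T2>
class TaggedPtr {
  std::uintptr_t bits_ = 0;

public:
  static TaggedPtr first(T1 *p) {
    TaggedPtr t;
    t.bits_ = reinterpret_cast<std::uintptr_t>(p);
    return t;
  }
  static TaggedPtr second(T2 *p) {
    TaggedPtr t;
    t.bits_ = reinterpret_cast<std::uintptr_t>(p) | 1u;
    return t;
  }

  bool is_first() const { return (bits_ & 1u) == 0; }
  bool is_second() const { return (bits_ & 1u) != 0; }

  // Return the stored pointer, or nullptr if the other kind is stored.
  T1 *as_first() const {
    return is_first() ? reinterpret_cast<T1 *>(bits_) : nullptr;
  }
  T2 *as_second() const {
    return is_second()
             ? reinterpret_cast<T2 *>(bits_ & ~std::uintptr_t(1))
             : nullptr;
  }

  // Two tagged pointers are equal when both the address and the tag match.
  bool operator==(const TaggedPtr &o) const { return bits_ == o.bits_; }
  bool operator!=(const TaggedPtr &o) const { return bits_ != o.bits_; }
};
```

With a representation like this, the kind of the pointee is read from the
pointer itself rather than from a field at a fixed offset in the pointee,
so the two structures no longer need to agree on their layout.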

Signed-off-by: Pan Li 
Co-Authored-By: Richard Sandiford 
Co-Authored-By: Richard Biener 
Co-Authored-By: Jakub Jelinek 

gcc/ChangeLog:

* mux-utils.h: Add overloaded operators == and != for pointer_mux.
* var-tracking.cc: Include mux-utils.h for pointer_mux.
(decl_or_value): Changed from void * to pointer_mux.
(dv_is_decl_p): Reconciled to the new type, aka pointer_mux.
(dv_as_decl): Ditto.
(dv_as_opaque): Removed as unnecessary.
(struct variable_hasher): Take decl_or_value as compare_type.
(variable_hasher::equal): Ditto.
(dv_from_decl): Reconciled to the new type, aka pointer_mux.
(dv_from_value): Ditto.
(attrs_list_member): Ditto.
(vars_copy): Ditto.
(var_reg_decl_set): Ditto.
(var_reg_delete_and_set): Ditto.
(find_loc_in_1pdv): Ditto.
(canonicalize_values_star): Ditto.
(variable_post_merge_new_vals): Ditto.
(dump_onepart_variable_differences): Ditto.
(variable_different_p): Ditto.
(variable_was_changed): Ditto.
(set_slot_part): Ditto.
(clobber_slot_part): Ditto.
(clobber_variable_part): Ditto.
(remove_value_from_changed_variables): Ditto.
(notify_dependents_of_changed_value): Ditto.
---
 gcc/mux-utils.h | 12 ++
 gcc/var-tracking.cc | 96 ++---
 2 files changed, 51 insertions(+), 57 deletions(-)

diff --git a/gcc/mux-utils.h b/gcc/mux-utils.h
index a2b6a316899..adf3d3b722b 100644
--- a/gcc/mux-utils.h
+++ b/gcc/mux-utils.h
@@ -72,6 +72,18 @@ public:
   // Return true unless the pointer is a null A pointer.
   explicit operator bool () const { return m_ptr; }
 
+  // Return true if class has the same m_ptr, or false.
+  bool operator == (const pointer_mux &other) const
+{
+  return this->m_ptr == other.m_ptr;
+}
+
+  // Return true if class has the different m_ptr, or false.
+  bool operator != (const pointer_mux &other) const
+{
+  return this->m_ptr != other.m_ptr;
+}
+
   // Assign A and B pointers respectively.
   void set_first (T1 *ptr) { *this = first (ptr); }
   void set_second (T2 *ptr) { *this = second (ptr); }
diff --git a/gcc/var-tracking.cc b/gcc/var-tracking.cc
index fae0c73e02f..7a35f49020a 100644
--- a/gcc/var-tracking.cc
+++ b/gcc/var-tracking.cc
@@ -116,6 +116,7 @@
 #include "fibonacci_heap.h"
 #include "print-rtl.h"
 #include "function-abi.h"
+#include "mux-utils.h"
 
 typedef fibonacci_heap  bb_heap_t;
 
@@ -197,14 +198,14 @@ struct micro_operation
 
 
 /* A declaration of a variable, or an RTL value being handled like a
-   declaration.  */
-typedef void *decl_or_value;
+   declaration by pointer_mux.  */
+typedef pointer_mux<tree_node, rtx_def> decl_or_value;
 
 /* Return true if a decl_or_value DV is a DECL or NULL.  */
 static inline bool
 dv_is_decl_p (decl_or_value dv)
 {
-  return !dv || (int) TREE_CODE ((tree) dv) != (int) VALUE;
+  return dv.is_first ();
 }
 
 /* Return true if a decl_or_value is a VALUE rtl.  */
@@ -219,7 +220,7 @@ static inline tree
 dv_as_decl (decl_or_value dv)
 {
   gcc_checking_assert (dv_is_decl_p (dv));
-  return (tree) dv;
+  return dv.known_first ();
 }
 
 /* Return the value in the decl_or_value.  */
@@ -227,14 +228,7 @@ static inline rtx
 dv_as_value (decl_or_value dv)
 {
   gcc_checking_assert (dv_is_value_p (dv));
-  return (rtx)dv;
-}
-
-/* Return the opaque pointer in the decl_or_value.  */
-static inline void *
-dv_as_opaque (decl_or_value dv)
-{
-  return dv;
+  return dv.known_second ();
 }
 
 
@@ -483,9 +477,9 @@ static void variable_htab_free (void *);
 
 struct variable_hasher : pointer_hash 
 {
-  typedef void *compare_type;
+  typedef decl_or_value compare_type;
   static inline hashval_t hash (const variable *);
-  static inline bool equal (const variable *, const void *);
+  static inline bool equal (const variable *, const decl_or_value);
   static inline void remove (variable *);
 };
 

[PATCH] riscv: Add vectorized binops and insn_expander helpers.

2023-05-10 Thread Robin Dapp via Gcc-patches
Hi,

this patch adds basic binary integer operations support.  It is based
on Michael Collison's work and makes use of the existing helpers in
riscv-c.cc.  It introduces emit_nonvlmax_binop which, in turn, uses
emit_pred_binop.  Setting the destination as well as the mask and the
length is factored out into separate functions.

There are several things still missing, most notably the scalar variants
(.vx) as well as multiplication variants and more.

Bootstrapped and regtested.

Regards
 Robin

--

gcc/ChangeLog:

* config/riscv/autovec.md (3): Add integer binops.
* config/riscv/riscv-protos.h (emit_nonvlmax_binop): Declare.
* config/riscv/riscv-v.cc (emit_pred_op): New function.
(set_expander_dest_and_mask): New function.
(emit_pred_binop): New function.
(emit_nonvlmax_binop): New function.
* config/riscv/vector-iterators.md: Add new code attribute.
---
 gcc/config/riscv/autovec.md  | 33 ++
 gcc/config/riscv/riscv-protos.h  |  2 +
 gcc/config/riscv/riscv-v.cc  | 98 ++--
 gcc/config/riscv/vector-iterators.md | 22 +++
 4 files changed, 136 insertions(+), 19 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index f1c5ff5951b..15f8d007e07 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -58,3 +58,36 @@ (define_expand "movmisalign"
 DONE;
   }
 )
+
+;; =
+;; == Binary integer operations
+;; =
+
+(define_expand "3"
+  [(set (match_operand:VI 0 "register_operand")
+(any_int_binop:VI
+ (match_operand:VI 1 "")
+ (match_operand:VI 2 "")))]
+  "TARGET_VECTOR"
+{
+  if (!register_operand (operands[2], mode))
+{
+  rtx cst;
+  gcc_assert (const_vec_duplicate_p(operands[2], &cst));
+  machine_mode inner = mode;
+  machine_mode op2mode = Pmode;
+  if (inner == E_QImode || inner == E_HImode || inner == E_SImode)
+   op2mode = inner;
+
+  riscv_vector::emit_nonvlmax_binop (code_for_pred_scalar
+(, mode),
+operands[0], operands[1], cst,
+NULL_RTX, mode, op2mode);
+}
+  else
+riscv_vector::emit_nonvlmax_binop (code_for_pred
+  (, mode),
+  operands[0], operands[1], operands[2],
+  NULL_RTX, mode);
+  DONE;
+})
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index c0293a306f9..75cdb90b9c9 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -169,6 +169,8 @@ void emit_hard_vlmax_vsetvl (machine_mode, rtx);
 void emit_vlmax_op (unsigned, rtx, rtx, machine_mode);
 void emit_vlmax_op (unsigned, rtx, rtx, rtx, machine_mode);
 void emit_nonvlmax_op (unsigned, rtx, rtx, rtx, machine_mode);
+void emit_nonvlmax_binop (unsigned, rtx, rtx, rtx, rtx, machine_mode,
+ machine_mode = VOIDmode);
 enum vlmul_type get_vlmul (machine_mode);
 unsigned int get_ratio (machine_mode);
 unsigned int get_nf (machine_mode);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 7ca49ca67c1..3c43dfc5eea 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -53,7 +53,7 @@ namespace riscv_vector {
 template  class insn_expander
 {
 public:
-  insn_expander () : m_opno (0) {}
+  insn_expander () : m_opno (0), has_dest(false) {}
   void add_output_operand (rtx x, machine_mode mode)
   {
 create_output_operand (&m_ops[m_opno++], x, mode);
@@ -84,6 +84,44 @@ public:
 add_input_operand (gen_int_mode (type, Pmode), Pmode);
   }
 
+  void set_dest_and_mask (rtx mask, rtx dest, machine_mode mask_mode)
+  {
+dest_mode = GET_MODE (dest);
+has_dest = true;
+
+add_output_operand (dest, dest_mode);
+
+if (mask)
+  add_input_operand (mask, GET_MODE (mask));
+else
+  add_all_one_mask_operand (mask_mode);
+
+add_vundef_operand (dest_mode);
+  }
+
+  void set_len_and_policy (rtx len, bool vlmax_p)
+{
+  gcc_assert (has_dest);
+  gcc_assert (len || vlmax_p);
+
+  if (len)
+   add_input_operand (len, Pmode);
+  else
+   {
+ rtx vlmax = gen_reg_rtx (Pmode);
+ emit_vlmax_vsetvl (dest_mode, vlmax);
+ add_input_operand (vlmax, Pmode);
+   }
+
+  if (GET_MODE_CLASS (dest_mode) != MODE_VECTOR_BOOL)
+   add_policy_operand (get_prefer_tail_policy (), get_prefer_mask_policy ());
+
+  if (vlmax_p)
+   add_avl_type_operand (avl_type::VLMAX);
+  else
+   add_avl_type_operand (avl_type::NONVLMAX);
+}
+
   void expand (enum insn_code icode, bool temporary_volatile_p = false)
   {
 if (temporary_volatile_p)
@@ -97,6 +135,8 @@ public:
 
 

[PATCH] riscv: Split off shift patterns for autovectorization.

2023-05-10 Thread Robin Dapp via Gcc-patches
Hi,

this patch splits off the shift patterns from the binop patterns.
This is necessary as the scalar shifts require a Pmode operand
as shift count.  To this end, a new iterator any_int_binop_no_shift
is introduced.  At a later point when the binops are split up
further into commutative and non-commutative patterns (which both
do not include the shift patterns) we might not need this anymore.
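
As a rough illustration of the two shift flavours, the following sketch
shows the loop shapes one would expect to map to a shift by scalar and
a shift by vector.  The function names are hypothetical, and which
expander the vectorizer actually selects for a given loop is an
assumption here, not something this patch states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Every element is shifted by the same scalar count (count < 32).
void shift_by_scalar (std::uint32_t *dst, const std::uint32_t *src,
                      unsigned count, std::size_t n)
{
  for (std::size_t i = 0; i < n; i++)
    dst[i] = src[i] << count;
}

// Each element has its own shift count, i.e. the count is itself a vector.
void shift_by_vector (std::uint32_t *dst, const std::uint32_t *src,
                      const std::uint32_t *counts, std::size_t n)
{
  for (std::size_t i = 0; i < n; i++)
    dst[i] = src[i] << counts[i];
}

// Small usage example for both loop shapes.
void self_check ()
{
  const std::uint32_t src[4] = {1, 2, 3, 4};
  const std::uint32_t counts[4] = {0, 1, 2, 3};
  std::uint32_t dst[4];

  shift_by_scalar (dst, src, 2, 4);
  assert (dst[0] == 4 && dst[3] == 16);

  shift_by_vector (dst, src, counts, 4);
  assert (dst[0] == 1 && dst[1] == 4 && dst[2] == 12 && dst[3] == 32);
}
```

In the first loop the count is a single scalar value, which is why that
pattern needs a Pmode operand rather than a vector operand.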

Bootstrapped and regtested.

Regards
 Robin

--

gcc/ChangeLog:

* config/riscv/autovec.md (3): Add scalar shift
pattern.
(v3): Add vector shift pattern.
* config/riscv/vector-iterators.md: New iterator.
---
 gcc/config/riscv/autovec.md  | 40 +++-
 gcc/config/riscv/vector-iterators.md |  4 +++
 2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 8347e42bb9c..2da4fc67d51 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -65,7 +65,7 @@ (define_expand "movmisalign"
 
 (define_expand "3"
   [(set (match_operand:VI 0 "register_operand")
-(any_int_binop:VI
+(any_int_binop_no_shift:VI
  (match_operand:VI 1 "")
  (match_operand:VI 2 "")))]
   "TARGET_VECTOR"
@@ -91,3 +91,41 @@ (define_expand "3"
  NULL_RTX, mode);
   DONE;
 })
+
+;; =
+;; == Binary integer shifts by scalar.
+;; =
+
+(define_expand "3"
+  [(set (match_operand:VI 0 "register_operand")
+(any_shift:VI
+ (match_operand:VI 1 "register_operand")
+ (match_operand: 2 "csr_operand")))]
+  "TARGET_VECTOR"
+{
+  if (!CONST_SCALAR_INT_P (operands[2]))
+  operands[2] = gen_lowpart (Pmode, operands[2]);
+  riscv_vector::emit_len_binop (code_for_pred_scalar
+   (, mode),
+   operands[0], operands[1], operands[2],
+   NULL_RTX, mode, Pmode);
+  DONE;
+})
+
+;; =
+;; == Binary integer shifts by vector.
+;; =
+
+(define_expand "v3"
+  [(set (match_operand:VI 0 "register_operand")
+(any_shift:VI
+ (match_operand:VI 1 "register_operand")
+ (match_operand:VI 2 "vector_shift_operand")))]
+  "TARGET_VECTOR"
+{
+  riscv_vector::emit_len_binop (code_for_pred
+   (, mode),
+   operands[0], operands[1], operands[2],
+   NULL_RTX, mode);
+  DONE;
+})
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 42848627c8c..fdb0bfbe3b1 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -1429,6 +1429,10 @@ (define_code_iterator any_commutative_binop [plus and ior xor
 
 (define_code_iterator any_non_commutative_binop [minus div udiv mod umod])
 
+(define_code_iterator any_int_binop_no_shift
+ [plus minus and ior xor smax umax smin umin mult div udiv mod umod
+])
+
 (define_code_iterator any_immediate_binop [plus minus and ior xor])
 
 (define_code_iterator any_sat_int_binop [ss_plus ss_minus us_plus us_minus])
-- 
2.40.0



[PATCH] riscv: Add autovectorization tests for binary integer

2023-05-10 Thread Robin Dapp via Gcc-patches
Hi,

this patch adds scan as well as execution tests for vectorized
binary integer operations.  It is based on Michael Collison's work
and also includes scalar variants.  The tests are not fully comprehensive
as the vector type promotions (vec_unpack, extend etc.) are not
implemented yet.  Also, vmulh, vmulhu, and vmulhsu and others are
still missing.

Regards
 Robin

--

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/shift-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/shift-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/shift-scalar-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/shift-scalar-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/shift-scalar-template.h: New test.
* gcc.target/riscv/rvv/autovec/shift-template.h: New test.
* gcc.target/riscv/rvv/autovec/shift-run-template.h: New test.
* gcc.target/riscv/rvv/autovec/vadd-run-template.h: New test.
* gcc.target/riscv/rvv/autovec/vadd-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vadd-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vadd-template.h: New test.
* gcc.target/riscv/rvv/autovec/vand-run-template.h: New test.
* gcc.target/riscv/rvv/autovec/vand-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vand-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vand-template.h: New test.
* gcc.target/riscv/rvv/autovec/vdiv-run-template.h: New test.
* gcc.target/riscv/rvv/autovec/vdiv-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vdiv-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vdiv-template.h: New test.
* gcc.target/riscv/rvv/autovec/vmax-run-template.h: New test.
* gcc.target/riscv/rvv/autovec/vmax-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vmax-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vmax-template.h: New test.
* gcc.target/riscv/rvv/autovec/vmin-run-template.h: New test.
* gcc.target/riscv/rvv/autovec/vmin-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vmin-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vmin-template.h: New test.
* gcc.target/riscv/rvv/autovec/vmul-run-template.h: New test.
* gcc.target/riscv/rvv/autovec/vmul-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vmul-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vmul-template.h: New test.
* gcc.target/riscv/rvv/autovec/vor-run-template.h: New test.
* gcc.target/riscv/rvv/autovec/vor-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vor-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vor-template.h: New test.
* gcc.target/riscv/rvv/autovec/vrem-run-template.h: New test.
* gcc.target/riscv/rvv/autovec/vrem-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vrem-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vrem-template.h: New test.
* gcc.target/riscv/rvv/autovec/vsub-run-template.h: New test.
* gcc.target/riscv/rvv/autovec/vsub-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vsub-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vsub-template.h: New test.
* gcc.target/riscv/rvv/autovec/vxor-run-template.h: New test.
* gcc.target/riscv/rvv/autovec/vxor-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vxor-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vxor-template.h: New test.
---
 .../riscv/rvv/autovec/shift-run-template.h|  47 +++
 .../riscv/rvv/autovec/shift-rv32gcv.c |  12 ++
 .../riscv/rvv/autovec/shift-rv64gcv.c |  12 ++
 .../riscv/rvv/autovec/shift-scalar-rv32gcv.c  |   7 ++
 .../riscv/rvv/autovec/shift-scalar-rv64gcv.c  |   7 ++
 .../riscv/rvv/autovec/shift-scalar-template.h | 119 ++
 .../riscv/rvv/autovec/shift-template.h|  34 +
 .../riscv/rvv/autovec/vadd-run-template.h |  64 ++
 .../riscv/rvv/autovec/vadd-rv32gcv.c  |   8 ++
 .../riscv/rvv/autovec/vadd-rv64gcv.c  |   8 ++
 .../riscv/rvv/autovec/vadd-template.h |  56 +
 .../riscv/rvv/autovec/vand-run-template.h |  64 ++
 .../riscv/rvv/autovec/vand-rv32gcv.c  |   8 ++
 .../riscv/rvv/autovec/vand-rv64gcv.c  |   8 ++
 .../riscv/rvv/autovec/vand-template.h |  56 +
 .../riscv/rvv/autovec/vdiv-run-template.h |  42 +++
 .../riscv/rvv/autovec/vdiv-rv32gcv.c  |  10 ++
 .../riscv/rvv/autovec/vdiv-rv64gcv.c  |  10 ++
 .../riscv/rvv/autovec/vdiv-template.h |  34 +
 .../riscv/rvv/autovec/vmax-run-template.h |  42 +++
 .../riscv/rvv/autovec/vmax-rv32gcv.c  |   8 ++
 .../riscv/rvv/autovec/vmax-rv64gcv.c  |   8 ++
 .../riscv/rvv/autovec/vmax-template.h |  34 +
 .../riscv/rvv/autovec/vmin-run-template.h |  42 +++
 .../riscv/rvv/autovec/vmin-

[PATCH] riscv: Clarify vlmax and length handling.

2023-05-10 Thread Robin Dapp via Gcc-patches
Hi,

this patch tries to improve the wrappers that emit either vlmax or
non-vlmax operations.  Now, emit_len_op can be used to
emit a regular operation.  Depending on whether a length != NULL
is passed, either no VLMAX flags are set or we emit a vsetvli and
set VLMAX flags.  The patch also adds some comments that describe
some of the rationale of the current handling of vlmax/nonvlmax
operations.
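
The length handling described above can be modelled roughly as follows.
This is a toy model only, not the code in riscv-v.cc: length_choice,
choose_length and the register name vlmax_tmp are hypothetical, and
strings stand in for RTL operands and emitted instructions.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class avl_type { NONVLMAX, VLMAX };

struct length_choice
{
  std::string len_operand;          // register holding the length
  avl_type type;                    // flag attached to the operation
  std::vector<std::string> emitted; // instructions emitted beforehand
};

// An explicit length is used as-is and no VLMAX flag is set.  Without a
// length, a vsetvli is emitted first and the operation is flagged VLMAX.
length_choice choose_length (const char *len)
{
  if (len != nullptr)
    return {len, avl_type::NONVLMAX, {}};
  return {"vlmax_tmp", avl_type::VLMAX, {"vsetvli vlmax_tmp, zero, <vtype>"}};
}

// Small usage example covering both cases.
void self_check ()
{
  length_choice with_len = choose_length ("a5");
  assert (with_len.type == avl_type::NONVLMAX && with_len.emitted.empty ());

  length_choice without_len = choose_length (nullptr);
  assert (without_len.type == avl_type::VLMAX);
  assert (without_len.emitted.size () == 1);
}
```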

Bootstrapped and regtested.

Regards
 Robin

--

gcc/ChangeLog:

* config/riscv/autovec.md: Use renamed functions.
* config/riscv/riscv-protos.h (emit_vlmax_op): Rename.
(emit_vlmax_reg_op): To this.
(emit_nonvlmax_op): Rename.
(emit_len_op): To this.
(emit_nonvlmax_binop): Rename.
(emit_len_binop): To this.
* config/riscv/riscv-v.cc (emit_pred_op): Add default parameter.
(emit_pred_binop): Remove vlmax_p.
(emit_vlmax_op): Rename.
(emit_vlmax_reg_op): To this.
(emit_nonvlmax_op): Rename.
(emit_len_op): To this.
(emit_nonvlmax_binop): Rename.
(emit_len_binop): To this.
(sew64_scalar_helper): Use renamed functions.
(expand_tuple_move): Use renamed functions.
* config/riscv/riscv.cc (vector_zero_call_used_regs): Use
renamed functions.
* config/riscv/vector.md: Use renamed functions.
---
 gcc/config/riscv/autovec.md | 24 +-
 gcc/config/riscv/riscv-protos.h |  8 ++--
 gcc/config/riscv/riscv-v.cc | 82 -
 gcc/config/riscv/riscv.cc   |  4 +-
 gcc/config/riscv/vector.md  | 12 +++--
 5 files changed, 75 insertions(+), 55 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 15f8d007e07..8347e42bb9c 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -31,8 +31,8 @@ (define_expand "len_load_"
(match_operand 3 "const_0_operand")]
   "TARGET_VECTOR"
 {
-  riscv_vector::emit_nonvlmax_op (code_for_pred_mov (mode), operands[0],
- operands[1], operands[2], mode);
+  riscv_vector::emit_len_op (code_for_pred_mov (mode), operands[0],
+operands[1], operands[2], mode);
   DONE;
 })
 
@@ -43,8 +43,8 @@ (define_expand "len_store_"
(match_operand 3 "const_0_operand")]
   "TARGET_VECTOR"
 {
-  riscv_vector::emit_nonvlmax_op (code_for_pred_mov (mode), operands[0],
- operands[1], operands[2], mode);
+  riscv_vector::emit_len_op (code_for_pred_mov (mode), operands[0],
+operands[1], operands[2], mode);
   DONE;
 })
 
@@ -79,15 +79,15 @@ (define_expand "3"
   if (inner == E_QImode || inner == E_HImode || inner == E_SImode)
op2mode = inner;
 
-  riscv_vector::emit_nonvlmax_binop (code_for_pred_scalar
-(, mode),
-operands[0], operands[1], cst,
-NULL_RTX, mode, op2mode);
+  riscv_vector::emit_len_binop (code_for_pred_scalar
+   (, mode),
+   operands[0], operands[1], cst,
+   NULL_RTX, mode, op2mode);
 }
   else
-riscv_vector::emit_nonvlmax_binop (code_for_pred
-  (, mode),
-  operands[0], operands[1], operands[2],
-  NULL_RTX, mode);
+riscv_vector::emit_len_binop (code_for_pred
+ (, mode),
+ operands[0], operands[1], operands[2],
+ NULL_RTX, mode);
   DONE;
 })
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 75cdb90b9c9..bfdf09b17ee 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -167,10 +167,10 @@ bool legitimize_move (rtx, rtx, machine_mode);
 void emit_vlmax_vsetvl (machine_mode, rtx);
 void emit_hard_vlmax_vsetvl (machine_mode, rtx);
 void emit_vlmax_op (unsigned, rtx, rtx, machine_mode);
-void emit_vlmax_op (unsigned, rtx, rtx, rtx, machine_mode);
-void emit_nonvlmax_op (unsigned, rtx, rtx, rtx, machine_mode);
-void emit_nonvlmax_binop (unsigned, rtx, rtx, rtx, rtx, machine_mode,
- machine_mode = VOIDmode);
+void emit_vlmax_reg_op (unsigned, rtx, rtx, rtx, machine_mode);
+void emit_len_op (unsigned, rtx, rtx, rtx, machine_mode);
+void emit_len_binop (unsigned, rtx, rtx, rtx, rtx, machine_mode, machine_mode =
+VOIDmode);
 enum vlmul_type get_vlmul (machine_mode);
 unsigned int get_ratio (machine_mode);
 unsigned int get_nf (machine_mode);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 3c43dfc5eea..07b7783282f 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -99,27 +99,24 @@ public:
 add_vundef_operand (dest_mode);
   }
 

Re: [PATCH] vect: Missed opportunity to use [SU]ABD

2023-05-10 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Wed, May 10, 2023 at 11:49 AM Richard Biener
>  wrote:
>>
>> On Wed, May 10, 2023 at 11:01 AM Richard Sandiford
>>  wrote:
>> >
>> > Oluwatamilore Adebayo  writes:
>> > > From 0b5f469171c340ef61a48a31877d495bb77bd35f Mon Sep 17 00:00:00 2001
>> > > From: oluade01 
>> > > Date: Fri, 14 Apr 2023 10:24:43 +0100
>> > > Subject: [PATCH 1/4] Missed opportunity to use [SU]ABD
>> > >
>> > > This adds a recognition pattern for the non-widening
>> > > absolute difference (ABD).
>> > >
>> > > gcc/ChangeLog:
>> > >
>> > > * doc/md.texi (sabd, uabd): Document them.
>> > > * internal-fn.def (ABD): Use new optab.
>> > > * optabs.def (sabd_optab, uabd_optab): New optabs,
>> > > * tree-vect-patterns.cc (vect_recog_absolute_difference):
>> > > Recognize the following idiom abs (a - b).
>> > > (vect_recog_sad_pattern): Refactor to use
>> > > vect_recog_absolute_difference.
>> > > (vect_recog_abd_pattern): Use patterns found by
>> > > vect_recog_absolute_difference to build a new ABD
>> > > internal call.
>> > > ---
>> > >  gcc/doc/md.texi   |  10 ++
>> > >  gcc/internal-fn.def   |   3 +
>> > >  gcc/optabs.def|   2 +
>> > >  gcc/tree-vect-patterns.cc | 250 +-
>> > >  4 files changed, 234 insertions(+), 31 deletions(-)
>> > >
>> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
>> > > index 07bf8bdebffb2e523f25a41f2b57e43c0276b745..0ad546c63a8deebb4b6db894f437d1e21f0245a8 100644
>> > > --- a/gcc/doc/md.texi
>> > > +++ b/gcc/doc/md.texi
>> > > @@ -5778,6 +5778,16 @@ Other shift and rotate instructions, analogous to the
>> > >  Vector shift and rotate instructions that take vectors as operand 2
>> > >  instead of a scalar type.
>> > >
>> > > +@cindex @code{uabd@var{m}} instruction pattern
>> > > +@cindex @code{sabd@var{m}} instruction pattern
>> > > +@item @samp{uabd@var{m}}, @samp{sabd@var{m}}
>> > > +Signed and unsigned absolute difference instructions.  These
>> > > +instructions find the difference between operands 1 and 2
>> > > +then return the absolute value.  A C code equivalent would be:
>> > > +@smallexample
>> > > +op0 = abs (op0 - op1)
>> >
>> > op0 = abs (op1 - op2)
>> >
>> > But that isn't the correct calculation for unsigned (where abs doesn't
>> > really work).  It also doesn't handle some cases correctly for signed.
>> >
>> > I think it's more:
>> >
>> >   op0 = op1 > op2 ? (unsigned type) op1 - op2 : (unsigned type) op2 - op1
>> >
>> > or (conceptually) max minus min.
>> >
>> > E.g. for 16-bit values, the absolute difference between signed 0x7fff
>> > and signed -0x8000 is 0xffff (reinterpreted as -1 if you cast back
>> > to signed).  But, ignoring undefined behaviour:
>> >
>> >   0x7fff - 0x8000 = -1
>> >   abs(-1) = 1
>> >
>> > which gives the wrong answer.
>> >
>> > We might still be able to fold C abs(a - b) to abd for signed a and b
>> > by relying on undefined behaviour (TYPE_OVERFLOW_UNDEFINED).  But we
>> > can't do it for -fwrapv.
>> >
>> > Richi knows better than me what would be appropriate here.
>>
>> The question is what does the hardware do?  For the widening [us]sad it's
>> obvious since the difference is computed in a wider signed mode and the
>> absolute value always fits.
>>
>> So what does it actually do, esp. when the difference yields 0x8000?
>
> A "sensible" definition would be that it works like the widening [us]sad
> and applies truncation to the result (modulo-reducing when the result
> isn't always unsigned).

Yeah.  Like Tami says, this is what the instruction does.

I think all three definitions are equivalent: the extend/operate/truncate
one, the ?: one above, and the "max - min" one.  Probably just personal
preference as to which seems more natural.
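
For what it's worth, the three definitions can be written down for
16-bit values as in the sketch below, together with the naive
abs (a - b) on a wrapping 16-bit subtraction for comparison.  The
function names are made up for illustration, and the conversions back
to int16_t assume the usual modular behaviour (as GCC implements it).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using std::int16_t;
using std::int32_t;
using std::uint16_t;

// Definition 1: extend to a wider type, operate, truncate.
uint16_t abd_widen (int16_t a, int16_t b)
{
  int32_t d = int32_t (a) - int32_t (b);
  return uint16_t (d < 0 ? -d : d);
}

// Definition 2: the ?: form, with the result taken in the unsigned type.
uint16_t abd_select (int16_t a, int16_t b)
{
  return a > b ? uint16_t (uint16_t (a) - uint16_t (b))
               : uint16_t (uint16_t (b) - uint16_t (a));
}

// Definition 3: max minus min, again truncated to the unsigned type.
uint16_t abd_maxmin (int16_t a, int16_t b)
{
  return uint16_t (uint16_t (std::max (a, b)) - uint16_t (std::min (a, b)));
}

// Naive abs (a - b) where the 16-bit subtraction wraps (as with -fwrapv).
int16_t abs_of_wrapped_diff (int16_t a, int16_t b)
{
  int16_t d = int16_t (uint16_t (uint16_t (a) - uint16_t (b)));
  return d < 0 ? int16_t (-d) : d;
}

// The example from the discussion: 0x7fff versus -0x8000.
void self_check ()
{
  assert (abd_widen (0x7fff, -0x8000) == 0xffff);
  assert (abd_select (0x7fff, -0x8000) == 0xffff);
  assert (abd_maxmin (0x7fff, -0x8000) == 0xffff);
  assert (abs_of_wrapped_diff (0x7fff, -0x8000) == 1);
}
```

The last function shows why a plain abs (a - b) in the narrow type is
not the same operation once the subtraction is allowed to wrap.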

Reading the patch again, it does check TYPE_OVERFLOW_WRAPS, so -fwrapv
might be handled correctly after all.  Sorry for missing it first time.

On the patch:

> +/* Look for the following pattern
> + X = x[i]
> + Y = y[i]
> + DIFF = X - Y
> + DAD = ABS_EXPR
> + */
> +static bool
> +vect_recog_absolute_difference (vec_info *vinfo, gassign *abs_stmt,
> + tree *half_type, bool reject_unsigned,
> + vect_unpromoted_value unprom[2],
> + tree diff_oprnds[2])

It would be good to document what the parameters mean (except VINFO,
which is obvious).

> +  /* Peel off conversions from the ABS input.  This can involve sign
> + changes (e.g.  from an unsigned subtraction to a signed ABS input)
> + or signed promotion, but it can't include unsigned promotion.
> + (Note that ABS of an unsigned promotion should have been folded
> + away before now anyway.)  */
> +  vect_unpromoted_value unprom_diff;
> +  abs_oprnd = vect_look_through_possible_promotion (vinfo, abs_oprnd,
> + &unprom_diff);
> +  if (!a

[PATCH] c++: converted lambda as template argument [PR83258, ...]

2023-05-10 Thread Patrick Palka via Gcc-patches
r8-1253-g3d2e25a240c711 removed the template argument linkage requirement
in convert_nontype_argument for C++17, but we need to also remove the one
in convert_nontype_argument_function for sake of the first and third test
case which we incorrectly reject (in C++17/20 mode).

And in invalid_tparm_referent_p we're inadvertently rejecting using the
address of a lambda's static op() due to the DECL_ARTIFICIAL check.
This patch relaxes this check for sake of the second test case which we
incorrectly reject (in C++20 mode).

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk and perhaps 13 (since it's a relatively easy/safe fix for a
popular non-regression bug)?

Co-authored-by: Jonathan Wakely 

PR c++/83258
PR c++/80488
PR c++/97700

gcc/cp/ChangeLog:

* pt.cc (convert_nontype_argument_function): Disable linkage
requirement for C++17 and later.
(invalid_tparm_referent_p): Relax DECL_ARTIFICIAL check for
the artificial static op() of a lambda.

gcc/testsuite/ChangeLog:

* g++.dg/ext/visibility/anon8.C: Don't expect a "no linkage"
error for the template argument &B2:fn in C++17 mode.
* g++.dg/cpp0x/lambda/lambda-conv15.C: New test.
* g++.dg/cpp2a/nontype-class56.C: New test.
* g++.dg/template/function2.C: New test.
---
 gcc/cp/pt.cc  |  7 +--
 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-conv15.C | 11 +++
 gcc/testsuite/g++.dg/cpp2a/nontype-class56.C  |  8 
 gcc/testsuite/g++.dg/ext/visibility/anon8.C   |  4 ++--
 gcc/testsuite/g++.dg/template/function2.C |  8 
 5 files changed, 34 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-conv15.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class56.C
 create mode 100644 gcc/testsuite/g++.dg/template/function2.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 696df2bdd9f..c9b089f8fa7 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -6782,7 +6782,8 @@ convert_nontype_argument_function (tree type, tree expr,
 }
 
   linkage = decl_linkage (fn_no_ptr);
-  if (cxx_dialect >= cxx11 ? linkage == lk_none : linkage != lk_external)
+  if ((cxx_dialect < cxx11 && linkage != lk_external)
+  || (cxx_dialect < cxx17 && linkage == lk_none))
 {
   if (complain & tf_error)
{
@@ -7180,7 +7181,9 @@ invalid_tparm_referent_p (tree type, tree expr, tsubst_flags_t complain)
   * a string literal (5.13.5),
   * the result of a typeid expression (8.2.8), or
   * a predefined __func__ variable (11.4.1).  */
-   else if (DECL_ARTIFICIAL (decl))
+   else if (DECL_ARTIFICIAL (decl)
+/* Accept the artificial static op() of a lambda.  */
+&& !LAMBDA_TYPE_P (CP_DECL_CONTEXT (decl)))
  {
if (complain & tf_error)
  error ("the address of %qD is not a valid template argument",
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-conv15.C b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-conv15.C
new file mode 100644
index 000..cf45e06a33d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-conv15.C
@@ -0,0 +1,11 @@
+// PR c++/83258
+// PR c++/80488
+// { dg-do compile { target c++11 } }
+
+template struct A { };
+
+int main() {
+  constexpr auto fp = +[]{}; // { dg-error "non-'constexpr' function" "" { target c++14_down } }
+  A a1;// { dg-error "not a valid template argument" "" { target c++14_down } }
+  A<[]{}> a2;  // { dg-error "lambda-expression in template-argument|invalid" "" { target c++17_down } }
+}
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class56.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class56.C
new file mode 100644
index 000..0efd735c8a3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class56.C
@@ -0,0 +1,8 @@
+// PR c++/97700
+// { dg-do compile { target c++20 } }
+
+struct S { void (*f)(); };
+
+template struct X { };
+
+X x;
diff --git a/gcc/testsuite/g++.dg/ext/visibility/anon8.C b/gcc/testsuite/g++.dg/ext/visibility/anon8.C
index b8507497d32..bfcc2d06df6 100644
--- a/gcc/testsuite/g++.dg/ext/visibility/anon8.C
+++ b/gcc/testsuite/g++.dg/ext/visibility/anon8.C
@@ -2,7 +2,7 @@
 // { dg-do compile }
 
 template 
-void call ()   // { dg-message "note" }
+void call ()   // { dg-message "note" "" { target c++14_down } }
 {
   fn ();
 }
@@ -26,7 +26,7 @@ int main ()
 static void fn2 () {}
   };
   call<&B1::fn1> ();
-  call<&B2::fn2> ();   // { dg-error "linkage|no matching" }
+  call<&B2::fn2> ();   // { dg-error "linkage|no matching" "" { target c++14_down } }
   call<&fn3> ();
   call<&B1::fn4> ();
   call<&fn5> ();   // { dg-error "linkage|no matching" "" { target { ! c++11 } } }
diff --git a/gcc/testsuite/g++.dg/template/function2.C b/gcc/testsuite/g++.dg/template/function2.C
new file mode 100644
index 000..ad19f24c1cd
--- /dev/nul

[PATCH V5, 0/2] PR target/105325: Fix constraint issue with power10 fusion

2023-05-10 Thread Michael Meissner via Gcc-patches
I have posted 4 previous versions of this patch (April 26th, March 28th, March
24th, and March 21st).

In this patch, rather than just add changes to the existing code in
genfusion.pl, I rewrote the function completely.  There are two patches within
this patch set:

* The first patch rewrites the perl function to be more readable.  This
  patch produces the same output for fusion.md that the current version
  generates.

* The second patch then uses the rewrite from the first patch to add the
  changes that fix the problem.

The issue in the original bug is that the power10 load GPR + cmpi -1/0/1 fusion
optimization generates illegal assembler code when the -fstack-protector option
is used.

Ultimately the code was dying because the fusion load + compare -1/0/1 patterns
did not handle the possibility that the load might be prefixed.

The main cause is the constraints for the individual loads in the fusion did not
match the machine.  In particular, LWA is a ds format instruction when it is
unprefixed.  The code also did not set the prefixed attribute correctly.
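
As a rough sketch of why the instruction format matters: a D-form load
such as lwz takes a signed 16-bit displacement, while a DS-form load
such as lwa or ld does not encode the low two bits, so the displacement
must also be a multiple of 4.  The helper names below are hypothetical
and are not the predicates used in the rs6000 backend.

```cpp
#include <cassert>
#include <cstdint>

// D-form: the displacement is a signed 16-bit immediate.
bool fits_d_form (std::int64_t offset)
{
  return offset >= -32768 && offset <= 32767;
}

// DS-form: same range, but the low two bits of the displacement are not
// encoded, so the offset must be a multiple of 4.
bool fits_ds_form (std::int64_t offset)
{
  return fits_d_form (offset) && (offset & 3) == 0;
}

// Small usage example: an offset of 6 is fine for D-form but not DS-form.
void self_check ()
{
  assert (fits_d_form (6));
  assert (!fits_ds_form (6));
  assert (fits_ds_form (8));
}
```

An offset that passes the first check but not the second is the kind of
address a ds-format load cannot use in its unprefixed form.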

These patches have been tested on:

* Little endian power9 with both IEEE and IBM long double
* Little endian power10
* Big endian power8 using both 32-bit and 64-bit code generation.

Can I check these into the master branch?  Assuming I can check this in, I will
also commit to the active GCC branches after a burn-in period.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH V5, 1/2] PR target/105325: Rewrite genfusion.pl's gen_ld_cmpi_p10 function.

2023-05-10 Thread Michael Meissner via Gcc-patches
This patch rewrites the gen_ld_cmpi_p10 function in genfusion.pl to be clearer.
The resulting fusion.md file that this patch generates is exactly the same
output that the previous version of genfusion.pl generated.  The next patch in
this series will fix PR target/105325 (provide correct predicates and
constraints for power10 fusion of load and compare immediate).

This patch has been tested on:

* Little endian power9 with both IEEE and IBM long double
* Little endian power10
* Big endian power8 using both 32-bit and 64-bit code generation.

Can I check this into the master branch?  Assuming I can check this in, I will
also commit to the active GCC branches after a burn-in period.

2023-05-10   Michael Meissner  

gcc/

PR target/105325
* config/rs6000/genfusion.pl (mode_to_ldst_char): Delete.
(print_ld_cmpi_p10): New function, split off from gen_ld_cmpi_p10.
(gen_ld_cmpi_p10): Rewrite completely.
---
 gcc/config/rs6000/genfusion.pl | 248 +
 1 file changed, 157 insertions(+), 91 deletions(-)

diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index e4db352e0ce..81ba4b33940 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -45,103 +45,169 @@ print <<'EOF';
 
 EOF
 
-sub mode_to_ldst_char
+# Print the insns for load and compare with -1/0/1.
+# Arguments:
+# lmode  -- Integer mode ("DI", "SI", "HI", or "QI").
+# result -- "clobber", "GPR", or $lmode
+# ccmode -- Sign vs. unsigned ("CC" or "CCUNS").
+# mem_format -- Memory format ("d" or "ds").
+# cmpl   -- Suffix for compare ("l" or "")
+# const_pred -- Predicate for constant (i.e. -1/0/1 or 0/1).
+# extend -- "sign", "zero", or "none".
+# echr   -- Suffix for load ("a", "z", or "").
+# load   -- Load instruction (i.e. "ld", "lwa", "lwz", etc.)
+# np -- enum non_prefixed_form for memory type
+# constraint -- constraint to use
+# mem_pred   -- predicate for the memory operation
+
+sub print_ld_cmpi_p10
 {
-my ($mode) = @_;
-my %x = (DI => 'd', SI => 'w', HI => 'h', QI => 'b');
-return $x{$mode} if exists $x{$mode};
-return '?';
+  my ($lmode, $result, $ccmode, $cmpl, $const_pred,
+  $extend, $load, $np, $constraint, $mem_pred) = @_;
+
+  # For clobber, we need a SI/DI reg in case we split because we have to
+  # sign/zero extend.
+  my $clobbermode = ($lmode =~ m/^[HQ]I$/) ? "GPR" : $lmode;
+
+  # Break long print statements into smaller lines.
+  my $info = join (" ",
+  "load mode is ${lmode} result mode is ${result}",
+  "compare mode is ${ccmode} extend is ${extend}");
+
+  my $name = join ("",
+  "${load}_cmp${cmpl}di_cr0_${lmode}",
+  "_${result}_${ccmode}_${extend}");
+
+  my $cmp_op1 = "(match_operand:${lmode} 1 \"${mem_pred}\" \"${constraint}\")";
+
+  my $spaces = " " x (length ($ccmode) + 18);
+
+  print ";; load-cmpi fusion pattern generated by gen_ld_cmpi_p10\n";
+  print ";; ${info}\n";
+  print "(define_insn_and_split \"*${name}\"\n";
+  print "  [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" \"=x\")\n";
+  print "(compare:${ccmode} ${cmp_op1}\n";
+  print "${spaces}(match_operand:${lmode} 3 \"${const_pred}\" \"n\")))\n";
+
+  if ($result eq "clobber")
+{
+  print "   (clobber (match_scratch:${clobbermode} 0 \"=r\"))]\n";
+}
+
+  else
+{
+  my $load_op0 = "(match_operand:${result} 0 \"gpc_reg_operand\" \"=r\")";
+  my $load_op1 = (($result eq $lmode)
+ ? "(match_dup 1)"
+ : "(${extend}_extend:${result} (match_dup 1))");
+  print "   (set ${load_op0} ${load_op1})]\n";
+}
+
+  # Do not match prefixed loads.  The machine only fuses non-prefixed loads
+  # with compare immediate.  Take into account whether the load is a ds-form
+  # or a d-form instruction.
+  print "  \"(TARGET_P10_FUSION)\"\n";
+  print "  \"${load}%X1 %0,%1\\;cmp${cmpl}di %2,%0,%3\"\n";
+  print "  \"&& reload_completed\n";
+  print "   && (cc_reg_not_cr0_operand (operands[2], CCmode)\n";
+  print "   || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),\n";
+  print "  ${lmode}mode, ${np}))\"\n";
+
+  if ($extend eq "none")
+{
+  print "  [(set (match_dup 0) (match_dup 1))\n";
+}
+
+  else
+{
+  my $resultmode = ($result eq "clobber") ? $clobbermode : $result;
+  print "  [(set (match_dup 0) (${extend}_extend:${resultmode} (match_dup 1)))\n";
+}
+
+  print "   (set (match_dup 2)\n";
+  print "(compare:${ccmode} (match_dup 0) (match_dup 3)))]\n";
+  print "  \"\"\n";
+  print "  [(set_attr \"type\" \"fused_load_cmpi\")\n";
+  print "   (set_attr \"cost\" \"8\")\n";
+  print "   (set_attr \"length\" \"8\")])\n";
+  print "\n";
 }
 
 sub gen_ld_cmpi_p10
 {
-my ($lmode, $ldst, $clobbermode, $result, $cmpl, $echr, $constpred,
-   $mempred, $ccmode, $np, $extend, $r

[PATCH V5, 2/2] PR target/105325: Fix memory constraints for power10 fusion.

2023-05-10 Thread Michael Meissner via Gcc-patches
This patch applies stricter predicates and constraints for LD and LWA
instructions with power10 fusion.  These instructions are DS-form instructions,
which means that the bottom 2 bits of the address must be 0.  In the past, we
did not use the stricter predicates and constraints, and if the user used the
-fstack-protector option, it would generate a non-prefixed load instruction
whose offset was too big if the stack is large.
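
As a rough illustration of the DS-form restriction described above, here is a
minimal sketch of the offset checks involved.  The helper names fits_d_form and
fits_ds_form are hypothetical and are not the predicates used in rs6000; the
sketch only assumes a signed 16-bit displacement whose low 2 bits must be zero
for DS-form loads such as LD and LWA.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a D-form displacement is any signed 16-bit value.
inline bool fits_d_form(std::int64_t offset) {
    return offset >= -32768 && offset <= 32767;
}

// Hypothetical helper: a DS-form displacement (used by ld and lwa) is a
// signed 16-bit value whose bottom 2 bits are zero.
inline bool fits_ds_form(std::int64_t offset) {
    return fits_d_form(offset) && (offset & 3) == 0;
}
```

An offset that fails fits_ds_form is the kind of address that the stricter
predicate and constraint are meant to keep out of the fused pattern.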

This patch has been tested on:

* Little endian power9 with both IEEE and IBM long double
* Little endian power10
* Big endian power8 using both 32-bit and 64-bit code generation.

Can I check this into the master branch?  Assuming I can check this in, I will
also commit to the active GCC branches after a burn-in period.

2023-05-10   Michael Meissner  

gcc/

PR target/105325
* config/rs6000/genfusion.pl (print_ld_cmpi_p10): Use "YZ" constraints
for DS-form loads.  Set the sign_extend attribute for loads that do sign
extension.  Use the lwa_operand predicate for the LWA instruction.
* config/rs6000/fusion.md: Regenerate.

gcc/testsuite/

PR target/105325
* g++.target/powerpc/pr105325.C: New test.
* gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust insn counts.
---
 gcc/config/rs6000/fusion.md   | 17 +++-
 gcc/config/rs6000/genfusion.pl| 20 +++---
 gcc/testsuite/g++.target/powerpc/pr105325.C   | 26 +++
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c|  4 +--
 4 files changed, 54 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr105325.C

diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 81ba4b33940..836dbd20948 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -129,6 +129,12 @@ sub print_ld_cmpi_p10
   print "  \"\"\n";
   print "  [(set_attr \"type\" \"fused_load_cmpi\")\n";
   print "   (set_attr \"cost\" \"8\")\n";
+
+  if ($extend eq "sign")
+{
+  print "   (set_attr \"sign_extend\" \"yes\")\n";
+}
+
   print "   (set_attr \"length\" \"8\")])\n";
   print "\n";
 }
@@ -147,9 +153,9 @@ sub gen_ld_cmpi_p10
   "HI" => "lhz",
   "QI" => "lbz");
 
-  # Memory predicate to use.
+  # Memory predicate to use.  For LWA, use the special LWA_OPERAND.
   my %signed_memory_predicate = ("DI" => "ds_form_mem_operand",
-"SI" => "ds_form_mem_operand",
+"SI" => "lwa_operand",
 "HI" => "non_update_memory_operand");
 
   my %unsigned_memory_predicate = ("DI" => "ds_form_mem_operand",
@@ -161,6 +167,10 @@ sub gen_ld_cmpi_p10
   my %np = ("ds" => "NON_PREFIXED_DS",
"d"  => "NON_PREFIXED_D");
 
+  # Constraint to use.
+  my %constraint = ("ds" => "YZ",
+   "d"  => "m");
+
   # Result modes to use. Clobber is used when you are comparing the load to
   # -1/0/1, but you are not using it otherwise.  EXTDI does not exist. We
   # cannot directly use HI/QI results because we only have word and double word
@@ -189,7 +199,8 @@ sub gen_ld_cmpi_p10
 
  print_ld_cmpi_p10 ($lmode, $result, "CC", "",
 "const_m1_to_1_operand", $extend,
-$signed_load{$lmode}, $np{$mem_format}, "m",
+$signed_load{$lmode}, $np{$mem_format},
+$constraint{$mem_format},
 $signed_memory_predicate{$lmode});
}
 
@@ -204,7 +215,8 @@ sub gen_ld_cmpi_p10
 
  print_ld_cmpi_p10 ($lmode, $result, "CCUNS", "l",
 "const_0_to_1_operand", $extend,
-$unsigned_load{$lmode}, $np{$mem_format}, "m",
+$unsigned_load{$lmode}, $np{$mem_format},
+$constraint{$mem_format},
 $unsigned_memory_predicate{$lmode});
}
 }
diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..da9953d9ad9 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+(compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
 (match_operand:DI 3 "const_m1_to_1_operand" "n")))
(clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_op

Re: [PATCH] riscv: Add vectorized binops and insn_expander helpers.

2023-05-10 Thread 钟居哲
Thanks Robin.

A couple comments here:
+  machine_mode op2mode = Pmode;
+  if (inner == E_QImode || inner == E_HImode || inner == E_SImode)
+ op2mode = inner;

Remove it.

+  
change it into 

+ e.add_input_operand (src2, op2mode == VOIDmode ? GET_MODE (src2) : op2mode);
Very confusing here.

+(define_code_attr BINOP_TO_UPPERCASE [
+(plus "PLUS")
+(minus "MINUS")
+(and "AND")
+(ior "IOR")
+(xor "XOR")
+(ashift "ASHIFT")
+(ashiftrt "ASHIFTRT")
+(lshiftrt "LSHIFTRT")
+(smax "SMAX")
+(umax "UMAX")
+(smin "SMIN")
+(umin "UMIN")
+(mult "MULT")
+(div "DIV")
+(udiv "UDIV")
+(mod "MOD")
+(umod "UMOD")
+])

Remove it.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-10 23:24
To: gcc-patches; juzhe.zh...@rivai.ai; Kito Cheng; Michael Collison; palmer; jeffreyalaw
CC: rdapp.gcc
Subject: [PATCH] riscv: Add vectorized binops and insn_expander helpers.
Hi,
 
this patch adds basic binary integer operations support.  It is based
on Michael Collison's work and makes use of the existing helpers in
riscv-c.cc.  It introduces emit_nonvlmax_binop which, in turn, uses
emit_pred_binop.  Setting the destination as well as the mask and the
length is factored out into separate functions.
 
There are several things still missing, most notably the scalar variants
(.vx) as well as multiplication variants and more.
 
Bootstrapped and regtested.
 
Regards
Robin
 
--
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (3): Add integer binops.
* config/riscv/riscv-protos.h (emit_nonvlmax_binop): Declare.
* config/riscv/riscv-v.cc (emit_pred_op): New function.
(set_expander_dest_and_mask): New function.
(emit_pred_binop): New function.
(emit_nonvlmax_binop): New function.
* config/riscv/vector-iterators.md: Add new code attribute.
---
gcc/config/riscv/autovec.md  | 33 ++
gcc/config/riscv/riscv-protos.h  |  2 +
gcc/config/riscv/riscv-v.cc  | 98 ++--
gcc/config/riscv/vector-iterators.md | 22 +++
4 files changed, 136 insertions(+), 19 deletions(-)
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index f1c5ff5951b..15f8d007e07 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -58,3 +58,36 @@ (define_expand "movmisalign"
 DONE;
   }
)
+
+;; =
+;; == Binary integer operations
+;; =
+
+(define_expand "3"
+  [(set (match_operand:VI 0 "register_operand")
+(any_int_binop:VI
+ (match_operand:VI 1 "")
+ (match_operand:VI 2 "")))]
+  "TARGET_VECTOR"
+{
+  if (!register_operand (operands[2], mode))
+{
+  rtx cst;
+  gcc_assert (const_vec_duplicate_p(operands[2], &cst));
+  machine_mode inner = mode;
+  machine_mode op2mode = Pmode;
+  if (inner == E_QImode || inner == E_HImode || inner == E_SImode)
+ op2mode = inner;
+
+  riscv_vector::emit_nonvlmax_binop (code_for_pred_scalar
+ (, mode),
+ operands[0], operands[1], cst,
+ NULL_RTX, mode, op2mode);
+}
+  else
+riscv_vector::emit_nonvlmax_binop (code_for_pred
+(, mode),
+operands[0], operands[1], operands[2],
+NULL_RTX, mode);
+  DONE;
+})
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index c0293a306f9..75cdb90b9c9 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -169,6 +169,8 @@ void emit_hard_vlmax_vsetvl (machine_mode, rtx);
void emit_vlmax_op (unsigned, rtx, rtx, machine_mode);
void emit_vlmax_op (unsigned, rtx, rtx, rtx, machine_mode);
void emit_nonvlmax_op (unsigned, rtx, rtx, rtx, machine_mode);
+void emit_nonvlmax_binop (unsigned, rtx, rtx, rtx, rtx, machine_mode,
+   machine_mode = VOIDmode);
enum vlmul_type get_vlmul (machine_mode);
unsigned int get_ratio (machine_mode);
unsigned int get_nf (machine_mode);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 7ca49ca67c1..3c43dfc5eea 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -53,7 +53,7 @@ namespace riscv_vector {
template  class insn_expander
{
public:
-  insn_expander () : m_opno (0) {}
+  insn_expander () : m_opno (0), has_dest(false) {}
   void add_output_operand (rtx x, machine_mode mode)
   {
 create_output_operand (&m_ops[m_opno++], x, mode);
@@ -84,6 +84,44 @@ public:
 add_input_operand (gen_int_mode (type, Pmode), Pmode);
   }
+  void set_dest_and_mask (rtx mask, rtx dest, machine_mode mask_mode)
+  {
+dest_mode = GET_MODE (dest);
+has_dest = true;
+
+add_output_operand (dest, dest_mode);
+
+if (mask)
+  add_input_operand (mask, GET_MODE (mask));
+else
+  add_all_one_mask_operand (mask_mode);
+
+add_vundef_operand (dest_mode);
+  }
+
+  void set_len_and_policy (rtx len, bool vlmax_p)
+{
+  gcc_assert (has_dest);
+  gcc_assert (len || vlmax_p);
+
+ 

Re: [PATCH take #3] match.pd: Simplify popcount/parity of bswap/rotate.

2023-05-10 Thread Bernhard Reutner-Fischer via Gcc-patches
Hi Roger!
On 10 May 2023 16:46:10 CEST, Roger Sayle  wrote:

Just a nit:

+/* { dg-final { scan-tree-dump-times "bswap" 0 "optimized" } } */

Can you please use scan-tree-dump-not instead?
thanks,


RISC-V: Remove masking third operand of rotate instructions

2023-05-10 Thread Jivan Hakobyan via Gcc-patches
Rotate instructions do not need to mask the third operand.
For example, on RV64 the following code:

unsigned long foo1(unsigned long rs1, unsigned long rs2)
{
long shamt = rs2 & (64 - 1);
return (rs1 << shamt) | (rs1 >> ((64 - shamt) & (64 - 1)));
}

Compiles to:
foo1:
andi a1,a1,63
rol a0,a0,a1
ret

This patch removes the unnecessary masking.
In addition, I have merged the masking insns for shifts that were written earlier.
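
To illustrate why the mask is redundant, here is a small sketch.  It assumes
only that the RV64 rotate looks at the low 6 bits of the shift-amount register;
rol64_model is a hypothetical software model of that behaviour, not code from
the patch, and rotl_masked mirrors the foo1 example above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 64-bit rotate-left that, like RV64 rol, uses only
// the low 6 bits of the shift amount.
inline std::uint64_t rol64_model(std::uint64_t value, std::uint64_t amount) {
    unsigned s = static_cast<unsigned>(amount & 63);
    if (s == 0)
        return value;  // avoid an undefined shift by 64
    return (value << s) | (value >> (64 - s));
}

// Source-level rotate with the explicit mask, as in foo1.
inline std::uint64_t rotl_masked(std::uint64_t rs1, std::uint64_t rs2) {
    std::uint64_t shamt = rs2 & (64 - 1);
    return (rs1 << shamt) | (rs1 >> ((64 - shamt) & (64 - 1)));
}
```

Since the model already discards the upper bits of the amount, passing rs2
directly gives the same result as passing rs2 & 63, which is why the andi in
the generated code does no useful work.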


gcc/ChangeLog:
* config/riscv/riscv.md: Merged
* config/riscv/bitmanip.md: New insns
* config/riscv/iterators.md: New iterator and optab items
* config/riscv/predicates.md: New predicates

gcc/testsuite/ChangeLog:
* testsuite/gcc.target/riscv/shift-and-2.c: Fixed test
* testsuite/gcc.target/riscv/zbb-rol-ror-01.c: New test
* testsuite/gcc.target/riscv/zbb-rol-ror-02.c: New test
* testsuite/gcc.target/riscv/zbb-rol-ror-03.c: New test
* testsuite/gcc.target/riscv/zbb-rol-ror-04.c: New test
* testsuite/gcc.target/riscv/zbb-rol-ror-05.c: New test
* testsuite/gcc.target/riscv/zbb-rol-ror-06.c: New test
* testsuite/gcc.target/riscv/zbb-rol-ror-07.c: New test


-- 
With the best regards
Jivan Hakobyan
diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index a27fc3e34a1..0fd0cbdeb04 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -351,6 +351,42 @@
   "rolw\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+(define_insn_and_split "*3_mask"
+  [(set (match_operand:X 0 "register_operand" "= r")
+(bitmanip_rotate:X
+(match_operand:X 1 "register_operand" "  r")
+(match_operator 4 "subreg_lowpart_operator"
+ [(and:X
+   (match_operand:X 2 "register_operand"  "r")
+   (match_operand 3 "" ""))])))]
+  "TARGET_ZBB || TARGET_ZBKB"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+(bitmanip_rotate:X (match_dup 1)
+   (match_dup 2)))]
+  "operands[2] = gen_lowpart (QImode, operands[2]);"
+  [(set_attr "type" "bitmanip")
+   (set_attr "mode" "")])
+
+(define_insn_and_split "*si3_sext_mask"
+  [(set (match_operand:DI 0 "register_operand" "= r")
+  (sign_extend:DI (bitmanip_rotate:SI
+(match_operand:SI 1 "register_operand" "  r")
+(match_operator 4 "subreg_lowpart_operator"
+ [(and:DI
+   (match_operand:DI 2 "register_operand"  "r")
+   (match_operand 3 "const_si_mask_operand"))]))))]
+  "TARGET_64BIT && (TARGET_ZBB || TARGET_ZBKB)"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+  (sign_extend:DI (bitmanip_rotate:SI (match_dup 1)
+   (match_dup 2))))]
+  "operands[2] = gen_lowpart (QImode, operands[2]);"
+  [(set_attr "type" "bitmanip")
+   (set_attr "mode" "DI")])
+
 ;; orc.b (or-combine) is added as an unspec for the benefit of the support
 ;; for optimized string functions (such as strcmp).
 (define_insn "orcb2"
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 1d56324df03..8afe98e4410 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -117,7 +117,7 @@
 (define_mode_attr HALFMODE [(DF "SI") (DI "SI") (TF "DI")])
 
 ; bitmanip mode attribute
-(define_mode_attr shiftm1 [(SI "const31_operand") (DI "const63_operand")])
+(define_mode_attr shiftm1 [(SI "const_si_mask_operand") (DI "const_di_mask_operand")])
 (define_mode_attr shiftm1p [(SI "DsS") (DI "DsD")])
 
 ;; ---
@@ -174,6 +174,8 @@
 
 (define_code_iterator clz_ctz_pcnt [clz ctz popcount])
 
+(define_code_iterator bitmanip_rotate [rotate rotatert])
+
 ;; ---
 ;; Code Attributes
 ;; ---
@@ -271,7 +273,9 @@
   (umax "umax")
   (clz "clz")
   (ctz "ctz")
-  (popcount "popcount")])
+  (popcount "popcount")
+  (rotate "rotl")
+  (rotatert "rotr")])
 (define_code_attr bitmanip_insn [(smin "min")
  (smax "max")
  (umin "minu")
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index e5adf06fa25..ffcbb9a7589 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -235,13 +235,15 @@
   (and (match_code "const_int")
(match_test "SINGLE_BIT_MASK_OPERAND (~UINTVAL (op))")))
 
-(define_predicate "const31_operand"
+(define_predicate "const_si_mask_operand"
   (and (match_code "const_int")
-   (match_test "INTVAL (op) == 31")))
+   (match_test "(INTVAL (op) & (GET_MODE_BITSIZE (SImode) - 1))
+== GET_MODE_BITSIZE (SImode) - 1")))
 
-(define_predicate "const63_operand"
+(define_predicate "const_di_mask_operand"
   (and (match_code "const_int")
-   (match_test "INTVAL (op) == 63")))
+   (match_test "(INTVAL (op) & (GET_MODE_BITSIZE (DImode) - 1))
+

Re: [PATCH v3] Var-Tracking: Typedef pointer_mux as decl_or_value

2023-05-10 Thread Richard Sandiford via Gcc-patches
Thanks, mostly looks good to me.  Some minor comments below.

pan2...@intel.com writes:
> From: Pan Li 
>
> Before this patch, decl_or_value is defined as void * and holds either
> a tree_node or an rtx_def. Unfortunately, a void pointer alone cannot
> tell us whether it points to a tree_node or an rtx_def.
>
> The code therefore relies on an implicit structure layout requirement
> like the one below; otherwise we would read the wrong bits when casting
> the void * to tree_node or rtx_def.
>
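> +--------+-----------+----------+
> | offset | tree_node | rtx_def  |
> +--------+-----------+----------+
> |      0 | code: 16  | code: 16 | <- require the location and bit size
> +--------+-----------+----------+
> |     16 | ...       | mode: 8  |
> +--------+-----------+----------+
> | ...                           |
> +--------+-----------+----------+
> |     24 | ...       | ...      |
> +--------+-----------+----------+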
>
> This behavior blocks the patch that extends the rtx_def mode field from
> 8 to 16 bits because we are running out of machine modes. This patch
> uses pointer_mux to record whether the input is a tree_node or an
> rtx_def, and removes the implicit dependency described above.
>
> Signed-off-by: Pan Li 
> Co-Authored-By: Richard Sandiford 
> Co-Authored-By: Richard Biener 
> Co-Authored-By: Jakub Jelinek 
>
> gcc/ChangeLog:
>
>   * mux-utils.h: Add overload operator == and != for pointer_mux.
>   * var-tracking.cc: Include mux-utils.h for pointer_mux.
>   (decl_or_value): Changed from void * to pointer_mux.
>   (dv_is_decl_p): Reconciled to the new type, aka pointer_mux.
>   (dv_as_decl): Ditto.
>   (dv_as_opaque): Removed as unnecessary.
>   (struct variable_hasher): Take decl_or_value as compare_type.
>   (variable_hasher::equal): Ditto.
>   (dv_from_decl): Reconciled to the new type, aka pointer_mux.
>   (dv_from_value): Ditto.
>   (attrs_list_member): Ditto.
>   (vars_copy): Ditto.
>   (var_reg_decl_set): Ditto.
>   (var_reg_delete_and_set): Ditto.
>   (find_loc_in_1pdv): Ditto.
>   (canonicalize_values_star): Ditto.
>   (variable_post_merge_new_vals): Ditto.
>   (dump_onepart_variable_differences): Ditto.
>   (variable_different_p): Ditto.
>   (variable_was_changed): Ditto.
>   (set_slot_part): Ditto.
>   (clobber_slot_part): Ditto.
>   (clobber_variable_part): Ditto.
>   (remove_value_from_changed_variables): Ditto.
>   (notify_dependents_of_changed_value): Ditto.
> ---
>  gcc/mux-utils.h | 12 ++
>  gcc/var-tracking.cc | 96 ++---
>  2 files changed, 51 insertions(+), 57 deletions(-)
>
> diff --git a/gcc/mux-utils.h b/gcc/mux-utils.h
> index a2b6a316899..adf3d3b722b 100644
> --- a/gcc/mux-utils.h
> +++ b/gcc/mux-utils.h
> @@ -72,6 +72,18 @@ public:
>// Return true unless the pointer is a null A pointer.
>explicit operator bool () const { return m_ptr; }
>  
> +  // Return true if class has the same m_ptr, or false.
> +  bool operator == (const pointer_mux &other) const
> +{
> +  return this->m_ptr == other.m_ptr;
> +}
> +
> +  // Return true if class has the different m_ptr, or false.
> +  bool operator != (const pointer_mux &other) const
> +{
> +  return this->m_ptr != other.m_ptr;
> +}
> +

The current code tries to follow the coding standard rule that functions
should be defined outside the class if the whole thing doesn't fit on
one line.  Admittedly that's not widely followed, but we might as well
continue to stick to it here.

The comment shouldn't talk about m_ptr, since that's an internal
implementation detail rather than a user-facing thing.  I think it's
OK to leave the functions uncommented, since it's obvious what ==
and != do.
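
For illustration, here is a minimal sketch of the layout being suggested: the
comparison operators are only declared inside the class and defined outside
it.  The class tagged_ptr is a hypothetical stand-in, not GCC's pointer_mux;
it assumes both pointee types have an alignment of at least 2 so that the low
bit of the address can serve as the tag.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-way tagged pointer (not GCC's pointer_mux).  The low bit
// of the stored address records which of the two types is held.
template <typename T1, typename T2>
class tagged_ptr
{
public:
  tagged_ptr () : m_bits (0) {}

  static tagged_ptr first (T1 *p) { return tagged_ptr (to_bits (p)); }
  static tagged_ptr second (T2 *p) { return tagged_ptr (to_bits (p) | 1); }

  bool is_first () const { return (m_bits & 1) == 0; }
  bool is_second () const { return (m_bits & 1) != 0; }

  T1 *known_first () const { return reinterpret_cast<T1 *> (m_bits); }
  T2 *known_second () const;

  // Declared here, defined outside the class.
  bool operator == (const tagged_ptr &other) const;
  bool operator != (const tagged_ptr &other) const;

private:
  explicit tagged_ptr (std::uintptr_t bits) : m_bits (bits) {}

  template <typename T>
  static std::uintptr_t to_bits (T *p)
  {
    return reinterpret_cast<std::uintptr_t> (p);
  }

  std::uintptr_t m_bits;
};

template <typename T1, typename T2>
inline T2 *
tagged_ptr<T1, T2>::known_second () const
{
  // Clear the tag bit before converting back to a pointer.
  return reinterpret_cast<T2 *> (m_bits & ~std::uintptr_t (1));
}

template <typename T1, typename T2>
inline bool
tagged_ptr<T1, T2>::operator == (const tagged_ptr &other) const
{
  return m_bits == other.m_bits;
}

template <typename T1, typename T2>
inline bool
tagged_ptr<T1, T2>::operator != (const tagged_ptr &other) const
{
  return m_bits != other.m_bits;
}
```

Because the tag is part of the compared bits, a first and a second value never
compare equal even when they wrap the same address.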

>// Assign A and B pointers respectively.
>void set_first (T1 *ptr) { *this = first (ptr); }
>void set_second (T2 *ptr) { *this = second (ptr); }
> diff --git a/gcc/var-tracking.cc b/gcc/var-tracking.cc
> index fae0c73e02f..7a35f49020a 100644
> --- a/gcc/var-tracking.cc
> +++ b/gcc/var-tracking.cc
> @@ -116,6 +116,7 @@
>  #include "fibonacci_heap.h"
>  #include "print-rtl.h"
>  #include "function-abi.h"
> +#include "mux-utils.h"
>  
>  typedef fibonacci_heap  bb_heap_t;
>  
> @@ -197,14 +198,14 @@ struct micro_operation
>  
>  
>  /* A declaration of a variable, or an RTL value being handled like a
> -   declaration.  */
> -typedef void *decl_or_value;
> +   declaration by pointer_mux.  */
> +typedef pointer_mux decl_or_value;
>  
>  /* Return true if a decl_or_value DV is a DECL or NULL.  */
>  static inline bool
>  dv_is_decl_p (decl_or_value dv)
>  {
> -  return !dv || (int) TREE_CODE ((tree) dv) != (int) VALUE;
> +  return dv.is_first ();
>  }
>  
>  /* Return true if a decl_or_value is a VALUE rtl.  */
> @@ -219,7 +220,7 @@ static inline tree
>  dv_as_decl (decl_or_value dv)
>  {
>gcc_checking_assert (dv_is_decl_p (dv));
> -  return (tree) dv;
> +  return dv.known_first ();
>  }
>  
>  /* Return the value in

[committed] MAINTAINERS: Add myself to write after approval

2023-05-10 Thread Pan Li via Gcc-patches
From: Pan Li 

Signed-off-by: Pan Li 

ChangeLog:

* MAINTAINERS: Add myself.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8b609411a30..4b846c6b288 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -520,6 +520,7 @@ Marc Lehmann

 James Lemke
 Ilya Leoshkevich   
 Kriang Lerdsuwanakij   
+Pan Li 
 Renlin Li  
 Xinliang David Li  
 Chen Liqin 
-- 
2.34.1



Re: [PATCH] c++: converted lambda as template argument [PR83258, ...]

2023-05-10 Thread Jason Merrill via Gcc-patches

On 5/10/23 11:36, Patrick Palka wrote:

r8-1253-g3d2e25a240c711 removed the template argument linkage requirement
in convert_nontype_argument for C++17, but we also need to remove the one
in convert_nontype_argument_function for the sake of the first and third
test cases, which we incorrectly reject (in C++17/20 mode).

And in invalid_tparm_referent_p we're inadvertently rejecting the use of
the address of a lambda's static op() due to the DECL_ARTIFICIAL check.
This patch relaxes this check for the sake of the second test case, which
we incorrectly reject (in C++20 mode).

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk and perhaps 13 (since it's a relatively easy/safe fix for a
popular non-regression bug)?

Co-authored-by: Jonathan Wakely 

PR c++/83258
PR c++/80488
PR c++/97700

gcc/cp/ChangeLog:

* pt.cc (convert_nontype_argument_function): Disable linkage
requirement for C++17 and later.
(invalid_tparm_referent_p): Relax DECL_ARTIFICIAL check for
the artificial static op() of a lambda.

gcc/testsuite/ChangeLog:

* g++.dg/ext/visibility/anon8.C: Don't expect a "no linkage"
error for the template argument &B2:fn in C++17 mode.
* g++.dg/cpp0x/lambda/lambda-conv15.C: New test.
* g++.dg/cpp2a/nontype-class56.C: New test.
* g++.dg/template/function2.C: New test.
---
  gcc/cp/pt.cc  |  7 +--
  gcc/testsuite/g++.dg/cpp0x/lambda/lambda-conv15.C | 11 +++
  gcc/testsuite/g++.dg/cpp2a/nontype-class56.C  |  8 
  gcc/testsuite/g++.dg/ext/visibility/anon8.C   |  4 ++--
  gcc/testsuite/g++.dg/template/function2.C |  8 
  5 files changed, 34 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-conv15.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class56.C
  create mode 100644 gcc/testsuite/g++.dg/template/function2.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 696df2bdd9f..c9b089f8fa7 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -6782,7 +6782,8 @@ convert_nontype_argument_function (tree type, tree expr,
  }
  
linkage = decl_linkage (fn_no_ptr);

-  if (cxx_dialect >= cxx11 ? linkage == lk_none : linkage != lk_external)
+  if ((cxx_dialect < cxx11 && linkage != lk_external)
+  || (cxx_dialect < cxx17 && linkage == lk_none))
  {
if (complain & tf_error)
{
@@ -7180,7 +7181,9 @@ invalid_tparm_referent_p (tree type, tree expr, tsubst_flags_t complain)
   * a string literal (5.13.5),
   * the result of a typeid expression (8.2.8), or
   * a predefined __func__ variable (11.4.1).  */
-   else if (DECL_ARTIFICIAL (decl))
+   else if (DECL_ARTIFICIAL (decl)
+/* Accept the artificial static op() of a lambda.  */
+&& !LAMBDA_TYPE_P (CP_DECL_CONTEXT (decl)))


Maybe check for FUNCTION_DECL instead?  I think the cases we want to
diagnose are all VAR_DECL.



  {
if (complain & tf_error)
  error ("the address of %qD is not a valid template argument",
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-conv15.C b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-conv15.C
new file mode 100644
index 000..cf45e06a33d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-conv15.C
@@ -0,0 +1,11 @@
+// PR c++/83258
+// PR c++/80488
+// { dg-do compile { target c++11 } }
+
+template struct A { };
+
+int main() {
+  constexpr auto fp = +[]{}; // { dg-error "non-'constexpr' function" "" { target c++14_down } }
+  A a1;// { dg-error "not a valid template argument" "" { target c++14_down } }
+  A<[]{}> a2;  // { dg-error "lambda-expression in template-argument|invalid" "" { target c++17_down } }
+}
diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class56.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class56.C
new file mode 100644
index 000..0efd735c8a3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class56.C
@@ -0,0 +1,8 @@
+// PR c++/97700
+// { dg-do compile { target c++20 } }
+
+struct S { void (*f)(); };
+
+template struct X { };
+
+X x;
diff --git a/gcc/testsuite/g++.dg/ext/visibility/anon8.C b/gcc/testsuite/g++.dg/ext/visibility/anon8.C
index b8507497d32..bfcc2d06df6 100644
--- a/gcc/testsuite/g++.dg/ext/visibility/anon8.C
+++ b/gcc/testsuite/g++.dg/ext/visibility/anon8.C
@@ -2,7 +2,7 @@
  // { dg-do compile }
  
  template 

-void call ()   // { dg-message "note" }
+void call ()   // { dg-message "note" "" { target c++14_down } }
  {
fn ();
  }
@@ -26,7 +26,7 @@ int main ()
  static void fn2 () {}
};
call<&B1::fn1> ();
-  call<&B2::fn2> (); // { dg-error "linkage|no matching" }
+  call<&B2::fn2> (); // { dg-error "linkage|no matching" "" { target c++14_down } }
call<&fn3> ();
call<&B1::fn4> ();
   call<&fn5> ();// { dg-error "linkage|no matching" "" { target { ! c++11 } } }

Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-10 Thread Richard Sandiford via Gcc-patches
In addition to Jeff's comments:

juzhe.zh...@rivai.ai writes:
> [...]
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index cc4a93a8763..99cf0cdbdca 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4974,6 +4974,40 @@ for (i = 1; i < operand3; i++)
>operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
>  @end smallexample
>  
> +@cindex @code{select_vl@var{m}} instruction pattern
> +@item @code{select_vl@var{m}}
> +Set operand 0 to the number of active elements in vector will be updated value.
> +operand 1 is the total elements need to be updated value.
> +operand 2 is the vectorization factor.
> +The value of operand 0 is target dependent and flexible in each iteration.
> +The operation of this pattern can be:
> +
> +@smallexample
> +Case 1:
> +operand0 = MIN (operand1, operand2);
> +operand2 can be const_poly_int or poly_int related to vector mode size.
> +Some target like RISC-V has a standalone instruction to get MIN (n, MODE SIZE) so
> +that we can reduce a use of general purpose register.
> +
> +In this case, only the last iteration of the loop is partial iteration.
> +@end smallexample
> +
> +@smallexample
> +Case 2:
> +if (operand1 <= operand2)
> +  operand0 = operand1;
> +else if (operand1 < 2 * operand2)
> +  operand0 = IN_RANGE (ceil (operand1 / 2), operand2);

GCC's IN_RANGE is a predicate, so it would be best to avoid that here.
Why isn't it simply ceil (operand1 / 2), which must be <= operand2?
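
To make the two documented behaviours concrete, here is a rough sketch.  The
function names select_vl_min and select_vl_even are hypothetical, and the
middle branch of Case 2 uses a plain ceiling division, as suggested in the
comment above, rather than IN_RANGE.

```cpp
#include <cassert>
#include <cstdint>

// Case 1: operand0 = MIN (operand1, operand2).
// remaining plays the role of operand1 and vf the role of operand2.
inline std::uint64_t select_vl_min(std::uint64_t remaining, std::uint64_t vf) {
    return remaining < vf ? remaining : vf;
}

// Case 2: split the tail evenly over the last two iterations.
inline std::uint64_t select_vl_even(std::uint64_t remaining, std::uint64_t vf) {
    if (remaining <= vf)
        return remaining;
    if (remaining < 2 * vf)
        return (remaining + 1) / 2;  // ceil (remaining / 2), at most vf here
    return vf;
}
```

With 10 elements left and a vectorization factor of 8, the first function
yields iterations of 8 and 2 elements, while the second yields 5 and 5.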

> +else
> +  operand0 = operand2;
> +
> +This case will evenly distribute work over the last 2 iterations of a stripmine loop.
> +@end smallexample
> +
> +The output of this pattern is not only used as IV of loop control counter, but also
> +is used as the IV of address calculation with multiply/shift operation. This allow
> +us dynamic adjust the number of elements is processed in each iteration of the loop.
> +
>  @cindex @code{check_raw_ptrs@var{m}} instruction pattern
>  @item @samp{check_raw_ptrs@var{m}}
>  Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
> [...]
> diff --git a/gcc/tree-ssa-loop-manip.cc b/gcc/tree-ssa-loop-manip.cc
> index 909b705d00d..5abca64379e 100644
> --- a/gcc/tree-ssa-loop-manip.cc
> +++ b/gcc/tree-ssa-loop-manip.cc
> @@ -47,7 +47,9 @@ along with GCC; see the file COPYING3.  If not see
> so that we can free them all at once.  */
>  static bitmap_obstack loop_renamer_obstack;
>  
> -/* Creates an induction variable with value BASE + STEP * iteration in LOOP.
> +/* Creates an induction variable with value BASE (+/-) STEP * iteration in LOOP.
> +   If CODE is PLUS_EXPR, the induction variable is BASE + STEP * iteration.
> +   If CODE is MINUS_EXPR, the induction variable is BASE - STEP * iteration.
> It is expected that neither BASE nor STEP are shared with other expressions
> (unless the sharing rules allow this).  Use VAR as a base var_decl for it
> (if NULL, a new temporary will be created).  The increment will occur at
> @@ -57,8 +59,8 @@ static bitmap_obstack loop_renamer_obstack;
> VAR_AFTER (unless they are NULL).  */
>  
>  void
> -create_iv (tree base, tree step, tree var, class loop *loop,
> -gimple_stmt_iterator *incr_pos, bool after,
> +create_iv (tree base, tree_code code, tree step, tree var,
> +class loop *loop, gimple_stmt_iterator *incr_pos, bool after,
>  tree *var_before, tree *var_after)
>  {
>gassign *stmt;
> @@ -66,7 +68,9 @@ create_iv (tree base, tree step, tree var, class loop *loop,
>tree initial, step1;
>gimple_seq stmts;
>tree vb, va;
> -  enum tree_code incr_op = PLUS_EXPR;
> +  /* The code can only be PLUS_EXPR or MINUS_EXPR.  */
> +  gcc_assert (code == PLUS_EXPR || code == MINUS_EXPR);
> +  tree_code incr_op = code;

As Richard said, we should be able to get rid of incr_op, probably
by calling the parameter incr_op.

I think the later:

  incr_op = MINUS_EXPR;

stmts need to be updated to something that flips between PLUS_EXPR
and MINUS_EXPR (with updates to the comments).

It would probably make sense to split the create_iv part out as a
separate prepatch.
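
As a sketch of what "flipping" could look like, the following models the idea
with plain integers.  The names incr_code, iv_step, flip and normalize_step
are hypothetical stand-ins and are not part of tree-ssa-loop-manip.cc; the
sketch assumes the goal is to keep the step non-negative by swapping the
operation whenever the step is negated.

```cpp
#include <cassert>
#include <climits>

// Hypothetical stand-ins for PLUS_EXPR and MINUS_EXPR.
enum class incr_code { plus, minus };

struct iv_step {
  incr_code code;
  long long step;
};

// Swap the operation instead of hard-coding one of the two codes.
inline incr_code flip (incr_code code) {
  return code == incr_code::plus ? incr_code::minus : incr_code::plus;
}

// If the step is negative and can be negated without overflow, negate it
// and flip the code, so that BASE (+/-) STEP * iteration is unchanged.
inline iv_step normalize_step (incr_code code, long long step) {
  if (step < 0 && step != LLONG_MIN)
    return { flip (code), -step };
  return { code, step };
}

// Value of the induction variable after the given number of iterations.
inline long long iv_value (long long base, iv_step s, long long iteration) {
  return s.code == incr_code::plus ? base + s.step * iteration
                                   : base - s.step * iteration;
}
```

The point of flip is that a caller passing MINUS_EXPR with a negative step
ends up with PLUS_EXPR, which a hard-coded assignment of MINUS_EXPR would get
wrong.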

>edge pe = loop_preheader_edge (loop);
>  
>if (var != NULL_TREE)
> @@ -1365,7 +1369,7 @@ tree_transform_and_unroll_loop (class loop *loop, unsigned factor,
>tree ctr_before, ctr_after;
>gimple_stmt_iterator bsi = gsi_last_nondebug_bb (new_exit->src);
>exit_if = as_a  (gsi_stmt (bsi));
> -  create_iv (exit_base, exit_step, NULL_TREE, loop,
> +  create_iv (exit_base, PLUS_EXPR, exit_step, NULL_TREE, loop,
>&bsi, false, &ctr_before, &ctr_after);
>gimple_cond_set_code (exit_if, exit_cmp);
>gimple_cond_set_lhs (exit_if, ctr_after);
> @@ -1580,8 +1584,8 @@ canonicalize_loop_ivs (class loop *loop, tree *nit, bool bump_in_latch)
>  gsi = gsi_last_bb (loop->latch);
>else
>  gsi = gsi_last_nondebug_bb (loop->header);
> -  create_iv (build_int_cst_type (type, 0), b

[PATCH v2] Fortran: Narrow return types [PR78798]

2023-05-10 Thread Bernhard Reutner-Fischer via Gcc-patches
From: Bernhard Reutner-Fischer 

gcc/fortran/ChangeLog:

PR fortran/78798
* array.cc (compare_bounds): Use narrower return type.
(gfc_compare_array_spec): Likewise.
(is_constant_element): Likewise.
(gfc_constant_ac): Likewise.
* check.cc (dim_rank_check): Likewise.
* cpp.cc (gfc_cpp_init_options): Likewise.
(dump_macro): Likewise.
* cpp.h (gfc_cpp_handle_option): Likewise.
* dependency.cc (gfc_ref_needs_temporary_p): Likewise.
(gfc_check_argument_dependency): Likewise.
(gfc_check_fncall_dependency): Likewise.
(ref_same_as_full_array): Likewise.
* dependency.h (gfc_check_fncall_dependency): Likewise.
(gfc_dep_resolver): Likewise.
(gfc_are_equivalenced_arrays): Likewise.
* expr.cc (gfc_copy_ref): Likewise.
(gfc_kind_max): Likewise.
(numeric_type): Likewise.
* gfortran.h (gfc_at_end): Likewise.
(gfc_at_eof): Likewise.
(gfc_at_bol): Likewise.
(gfc_at_eol): Likewise.
(gfc_define_undef_line): Likewise.
(gfc_wide_is_printable): Likewise.
(gfc_wide_is_digit): Likewise.
(gfc_wide_fits_in_byte): Likewise.
(gfc_find_sym_tree): Likewise.
(gfc_generic_intrinsic): Likewise.
(gfc_specific_intrinsic): Likewise.
(gfc_intrinsic_actual_ok): Likewise.
(gfc_has_vector_index): Likewise.
(gfc_numeric_ts): Likewise.
(gfc_impure_variable): Likewise.
(gfc_pure): Likewise.
(gfc_implicit_pure): Likewise.
(gfc_elemental): Likewise.
(gfc_pure_function): Likewise.
(gfc_implicit_pure_function): Likewise.
(gfc_compare_array_spec): Likewise.
(gfc_constant_ac): Likewise.
(gfc_expanded_ac): Likewise.
(gfc_check_digit): Likewise.
* intrinsic.cc (gfc_find_subroutine): Likewise.
(gfc_generic_intrinsic): Likewise.
(gfc_specific_intrinsic): Likewise.
* io.cc (compare_to_allowed_values): Likewise. And remove
unneeded forward declaration.
* misc.cc (gfc_done_2): Likewise.
* parse.cc: Likewise.
* parse.h (gfc_check_do_variable): Likewise.
* primary.cc (gfc_check_digit): Likewise.
* resolve.cc (resolve_structure_cons): Likewise.
(pure_stmt_function): Likewise.
(gfc_pure_function): Likewise.
(impure_stmt_fcn): Likewise.
(resolve_forall_iterators): Likewise.
(resolve_data): Likewise.
(gfc_impure_variable): Likewise.
(gfc_pure): Likewise.
(gfc_unset_implicit_pure): Likewise.
* scanner.cc (wide_is_ascii): Likewise.
(gfc_wide_toupper): Likewise.
(gfc_open_included_file): Likewise.
(gfc_at_end): Likewise.
(gfc_at_eof): Likewise.
(gfc_at_bol): Likewise.
(skip_comment_line): Likewise.
(gfc_gobble_whitespace): Likewise.
* symbol.cc (gfc_find_symtree_in_proc): Likewise.
* trans-array.cc: Likewise.
* trans-decl.cc (gfc_set_decl_assembler_name): Likewise.
* trans-types.cc (gfc_get_element_type): Likewise.
(gfc_add_field_to_struct): Likewise.
* trans-types.h (gfc_copy_dt_decls_ifequal): Likewise.
(gfc_return_by_reference): Likewise.
(gfc_is_nodesc_array): Likewise.
* trans.h (gfc_can_put_var_on_stack): Likewise.
---
Bootstrapped without new warnings and regression tested on
x86_64-linux with no regressions, OK for trunk?

 gcc/fortran/array.cc   |  8 +++
 gcc/fortran/check.cc   |  2 +-
 gcc/fortran/cpp.cc |  3 +--
 gcc/fortran/cpp.h  |  2 +-
 gcc/fortran/dependency.cc  |  8 +++
 gcc/fortran/dependency.h   |  6 ++---
 gcc/fortran/expr.cc|  6 ++---
 gcc/fortran/gfortran.h | 48 +++---
 gcc/fortran/intrinsic.cc   |  6 ++---
 gcc/fortran/io.cc  | 13 ++-
 gcc/fortran/parse.cc   |  2 +-
 gcc/fortran/parse.h|  2 +-
 gcc/fortran/primary.cc |  4 ++--
 gcc/fortran/resolve.cc | 22 -
 gcc/fortran/scanner.cc | 20 
 gcc/fortran/symbol.cc  |  2 +-
 gcc/fortran/trans-array.cc |  2 +-
 gcc/fortran/trans-decl.cc  |  2 +-
 gcc/fortran/trans-types.cc |  6 ++---
 gcc/fortran/trans-types.h  |  6 ++---
 gcc/fortran/trans.h|  2 +-
 21 files changed, 81 insertions(+), 91 deletions(-)

diff --git a/gcc/fortran/array.cc b/gcc/fortran/array.cc
index be5eb8b6a0f..4b7c1e715bf 100644
--- a/gcc/fortran/array.cc
+++ b/gcc/fortran/array.cc
@@ -994,7 +994,7 @@ compare_bounds (gfc_expr *bound1, gfc_expr *bound2)
 /* Compares two array specifications.  They must be constant or deferred
shape.  */
 
-int
+bool
 gfc_compare_array_spec (gfc_array_spec *as1, gfc_array_spec *as2)
 {
   int i;
@@ -1039,7 +1039,7 @@ gfc_compare_array_spec (gfc_array_spec *as1, 
gfc_array_spec *as2)
use the symbol as an implied-DO iterator.  Return

[PATCH 1/2] Fortran: dump-parse-tree attribs: fix unbalanced braces [PR109624]

2023-05-10 Thread Bernhard Reutner-Fischer via Gcc-patches
From: Bernhard Reutner-Fischer 

gcc/fortran/ChangeLog:

PR fortran/109624
* dump-parse-tree.cc (debug): New function for gfc_namespace.
(gfc_debug_code): Delete forward declaration.
(show_attr): Make sure to print balanced braces.

---
(gdb) call debug(gfc_current_ns)

Namespace: A-H: (REAL 4) I-N: (INTEGER 4) O-Z: (REAL 4)
procedure name = fmodule
  symtree: 'C_ptr'   || symbol: 'c_ptr'
type spec : (UNKNOWN 0)
attributes: )

There is an open brace missing after "attributes: "
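
The shape of the fix can be sketched outside of GCC as follows. This is only an illustration under assumptions: `format_attributes` is a hypothetical helper that builds a string, whereas the real show_attr writes to dumpfile. The point is that the opening parenthesis is emitted unconditionally, so it always pairs with the closing one, even when no attribute is printed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: render a list of attribute names with balanced
// parentheses.  The opening parenthesis is written before any attribute
// is considered, so an empty list still yields a matched pair.
static std::string
format_attributes (const std::vector<std::string> &attrs)
{
  std::string out = "(";
  for (const std::string &attr : attrs)
    {
      out += attr;
      out += ' ';  // each attribute is followed by one space
    }
  out += ')';
  return out;
}
```

An empty attribute list then prints as "()" instead of a lone ")".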

Regression tested on x86_64-linux, OK for trunk?
---
 gcc/fortran/dump-parse-tree.cc | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index 1fc1f311e84..2380fa04796 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -125,6 +125,16 @@ void debug (gfc_ref *p)
   dumpfile = tmp;
 }
 
+void
+debug (gfc_namespace *ns)
+{
+  FILE *tmp = dumpfile;
+  dumpfile = stderr;
+  show_namespace (ns);
+  fputc ('\n', dumpfile);
+  dumpfile = tmp;
+}
+
 void
 gfc_debug_expr (gfc_expr *e)
 {
@@ -136,7 +146,6 @@ gfc_debug_expr (gfc_expr *e)
 }
 
 /* Allow for dumping of a piece of code in the debugger.  */
-void gfc_debug_code (gfc_code *c);
 
 void
 gfc_debug_code (gfc_code *c)
@@ -758,12 +767,13 @@ show_expr (gfc_expr *p)
 static void
 show_attr (symbol_attribute *attr, const char * module)
 {
+  fputc ('(', dumpfile);
   if (attr->flavor != FL_UNKNOWN)
 {
   if (attr->flavor == FL_DERIVED && attr->pdt_template)
-   fputs (" (PDT-TEMPLATE", dumpfile);
+   fputs ("PDT-TEMPLATE ", dumpfile);
   else
-fprintf (dumpfile, "(%s ", gfc_code2string (flavors, attr->flavor));
+   fprintf (dumpfile, "%s ", gfc_code2string (flavors, attr->flavor));
 }
   if (attr->access != ACCESS_UNKNOWN)
 fprintf (dumpfile, "%s ", gfc_code2string (access_types, attr->access));
-- 
2.30.2



[PATCH 2/2] Fortran: dump-parse-tree: Mark debug functions with DEBUG_FUNCTION

2023-05-10 Thread Bernhard Reutner-Fischer via Gcc-patches
From: Bernhard Reutner-Fischer 

gcc/fortran/ChangeLog:

* dump-parse-tree.cc (gfc_debug_expr): Remove forward declaration.
(debug): Add DEBUG_FUNCTION.
(show_code_node): Remove erroneous whitespace.

---
Regression tested on x86_64-linux, OK for trunk?
---
 gcc/fortran/dump-parse-tree.cc | 38 --
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc
index 2380fa04796..644f8f37d63 100644
--- a/gcc/fortran/dump-parse-tree.cc
+++ b/gcc/fortran/dump-parse-tree.cc
@@ -55,10 +55,8 @@ static void show_typespec (gfc_typespec *);
 static void show_ref (gfc_ref *);
 static void show_attr (symbol_attribute *, const char *);
 
-/* Allow dumping of an expression in the debugger.  */
-void gfc_debug_expr (gfc_expr *);
-
-void debug (symbol_attribute *attr)
+DEBUG_FUNCTION void
+debug (symbol_attribute *attr)
 {
   FILE *tmp = dumpfile;
   dumpfile = stderr;
@@ -67,7 +65,8 @@ void debug (symbol_attribute *attr)
   dumpfile = tmp;
 }
 
-void debug (gfc_formal_arglist *formal)
+DEBUG_FUNCTION void
+debug (gfc_formal_arglist *formal)
 {
   FILE *tmp = dumpfile;
   dumpfile = stderr;
@@ -80,12 +79,14 @@ void debug (gfc_formal_arglist *formal)
   dumpfile = tmp;
 }
 
-void debug (symbol_attribute attr)
+DEBUG_FUNCTION void
+debug (symbol_attribute attr)
 {
   debug (&attr);
 }
 
-void debug (gfc_expr *e)
+DEBUG_FUNCTION void
+debug (gfc_expr *e)
 {
   FILE *tmp = dumpfile;
   dumpfile = stderr;
@@ -102,7 +103,8 @@ void debug (gfc_expr *e)
   dumpfile = tmp;
 }
 
-void debug (gfc_typespec *ts)
+DEBUG_FUNCTION void
+debug (gfc_typespec *ts)
 {
   FILE *tmp = dumpfile;
   dumpfile = stderr;
@@ -111,12 +113,14 @@ void debug (gfc_typespec *ts)
   dumpfile = tmp;
 }
 
-void debug (gfc_typespec ts)
+DEBUG_FUNCTION void
+debug (gfc_typespec ts)
 {
   debug (&ts);
 }
 
-void debug (gfc_ref *p)
+DEBUG_FUNCTION void
+debug (gfc_ref *p)
 {
   FILE *tmp = dumpfile;
   dumpfile = stderr;
@@ -125,7 +129,7 @@ void debug (gfc_ref *p)
   dumpfile = tmp;
 }
 
-void
+DEBUG_FUNCTION void
 debug (gfc_namespace *ns)
 {
   FILE *tmp = dumpfile;
@@ -135,7 +139,7 @@ debug (gfc_namespace *ns)
   dumpfile = tmp;
 }
 
-void
+DEBUG_FUNCTION void
 gfc_debug_expr (gfc_expr *e)
 {
   FILE *tmp = dumpfile;
@@ -147,7 +151,7 @@ gfc_debug_expr (gfc_expr *e)
 
 /* Allow for dumping of a piece of code in the debugger.  */
 
-void
+DEBUG_FUNCTION void
 gfc_debug_code (gfc_code *c)
 {
   FILE *tmp = dumpfile;
@@ -157,7 +161,8 @@ gfc_debug_code (gfc_code *c)
   dumpfile = tmp;
 }
 
-void debug (gfc_symbol *sym)
+DEBUG_FUNCTION void
+debug (gfc_symbol *sym)
 {
   FILE *tmp = dumpfile;
   dumpfile = stderr;
@@ -2513,7 +2518,7 @@ show_code_node (int level, gfc_code *c)
 case EXEC_SYNC_MEMORY:
   fputs ("SYNC MEMORY ", dumpfile);
   if (c->expr2 != NULL)
-   {
+   {
  fputs (" stat=", dumpfile);
  show_expr (c->expr2);
}
@@ -4031,7 +4036,8 @@ gfc_dump_global_symbols (FILE *f)
 
 /* Show an array ref.  */
 
-void debug (gfc_array_ref *ar)
+DEBUG_FUNCTION void
+debug (gfc_array_ref *ar)
 {
   FILE *tmp = dumpfile;
   dumpfile = stderr;
-- 
2.30.2



RE: [PATCH 01/20] arm: [MVE intrinsics] factorize vcmp

2023-05-10 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Wednesday, May 10, 2023 2:30 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 01/20] arm: [MVE intrinsics] factorize vcmp
> 
> Factorize vcmp so that they use the same pattern.
> 

Besides my comments on patch 15/20, this series looks good.
So once that patch is resolved, this series is ok for trunk.
Thanks,
Kyrill

> 2022-10-25  Christophe Lyon  
> 
>   gcc/
>   * config/arm/iterators.md (MVE_CMP_M, MVE_CMP_M_F,
> MVE_CMP_M_N)
>   (MVE_CMP_M_N_F, mve_cmp_op1): New.
>   (isu): Add VCMP*
>   (supf): Likewise.
>   * config/arm/mve.md (mve_vcmpq_n_):
> Rename into ...
>   (@mve_vcmpq_n_): ... this.
>   (mve_vcmpeqq_m_f, mve_vcmpgeq_m_f)
>   (mve_vcmpgtq_m_f, mve_vcmpleq_m_f)
>   (mve_vcmpltq_m_f, mve_vcmpneq_m_f): Merge into
> ...
>   (@mve_vcmpq_m_f): ... this.
>   (mve_vcmpcsq_m_u, mve_vcmpeqq_m_)
>   (mve_vcmpgeq_m_s, mve_vcmpgtq_m_s)
>   (mve_vcmphiq_m_u, mve_vcmpleq_m_s)
>   (mve_vcmpltq_m_s, mve_vcmpneq_m_):
> Merge into
>   ...
>   (@mve_vcmpq_m_): ... this.
>   (mve_vcmpcsq_m_n_u,
> mve_vcmpeqq_m_n_)
>   (mve_vcmpgeq_m_n_s, mve_vcmpgtq_m_n_s)
>   (mve_vcmphiq_m_n_u, mve_vcmpleq_m_n_s)
>   (mve_vcmpltq_m_n_s, mve_vcmpneq_m_n_):
> Merge
>   into ...
>   (@mve_vcmpq_m_n_): ... this.
>   (mve_vcmpeqq_m_n_f, mve_vcmpgeq_m_n_f)
>   (mve_vcmpgtq_m_n_f, mve_vcmpleq_m_n_f)
>   (mve_vcmpltq_m_n_f, mve_vcmpneq_m_n_f):
> Merge into ...
>   (@mve_vcmpq_m_n_f): ... this.
> ---
>  gcc/config/arm/iterators.md | 108 ++
>  gcc/config/arm/mve.md   | 414 +++-
>  2 files changed, 135 insertions(+), 387 deletions(-)
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 3c70fd7f56d..ef9fae0412b 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -583,6 +583,47 @@ (define_int_iterator MVE_FP_CREATE_ONLY [
>VCREATEQ_F
>])
> 
> +;; MVE comparison iterators
> +(define_int_iterator MVE_CMP_M [
> +  VCMPCSQ_M_U
> +  VCMPEQQ_M_S VCMPEQQ_M_U
> +  VCMPGEQ_M_S
> +  VCMPGTQ_M_S
> +  VCMPHIQ_M_U
> +  VCMPLEQ_M_S
> +  VCMPLTQ_M_S
> +  VCMPNEQ_M_S VCMPNEQ_M_U
> +  ])
> +
> +(define_int_iterator MVE_CMP_M_F [
> +  VCMPEQQ_M_F
> +  VCMPGEQ_M_F
> +  VCMPGTQ_M_F
> +  VCMPLEQ_M_F
> +  VCMPLTQ_M_F
> +  VCMPNEQ_M_F
> +  ])
> +
> +(define_int_iterator MVE_CMP_M_N [
> +  VCMPCSQ_M_N_U
> +  VCMPEQQ_M_N_S VCMPEQQ_M_N_U
> +  VCMPGEQ_M_N_S
> +  VCMPGTQ_M_N_S
> +  VCMPHIQ_M_N_U
> +  VCMPLEQ_M_N_S
> +  VCMPLTQ_M_N_S
> +  VCMPNEQ_M_N_S VCMPNEQ_M_N_U
> +  ])
> +
> +(define_int_iterator MVE_CMP_M_N_F [
> +  VCMPEQQ_M_N_F
> +  VCMPGEQ_M_N_F
> +  VCMPGTQ_M_N_F
> +  VCMPLEQ_M_N_F
> +  VCMPLTQ_M_N_F
> +  VCMPNEQ_M_N_F
> +  ])
> +
>  (define_int_iterator MVE_VMAXVQ_VMINVQ [
>VMAXAVQ_S
>VMAXVQ_S VMAXVQ_U
> @@ -655,6 +696,37 @@ (define_code_attr mve_addsubmul [
>(plus "vadd")
>])
> 
> +(define_int_attr mve_cmp_op1 [
> +  (VCMPCSQ_M_U "cs")
> +  (VCMPEQQ_M_S "eq") (VCMPEQQ_M_U "eq")
> +  (VCMPGEQ_M_S "ge")
> +  (VCMPGTQ_M_S "gt")
> +  (VCMPHIQ_M_U "hi")
> +  (VCMPLEQ_M_S "le")
> +  (VCMPLTQ_M_S "lt")
> +  (VCMPNEQ_M_S "ne") (VCMPNEQ_M_U "ne")
> +  (VCMPEQQ_M_F "eq")
> +  (VCMPGEQ_M_F "ge")
> +  (VCMPGTQ_M_F "gt")
> +  (VCMPLEQ_M_F "le")
> +  (VCMPLTQ_M_F "lt")
> +  (VCMPNEQ_M_F "ne")
> +  (VCMPCSQ_M_N_U "cs")
> +  (VCMPEQQ_M_N_S "eq") (VCMPEQQ_M_N_U "eq")
> +  (VCMPGEQ_M_N_S "ge")
> +  (VCMPGTQ_M_N_S "gt")
> +  (VCMPHIQ_M_N_U "hi")
> +  (VCMPLEQ_M_N_S "le")
> +  (VCMPLTQ_M_N_S "lt")
> +  (VCMPNEQ_M_N_S "ne") (VCMPNEQ_M_N_U "ne")
> +  (VCMPEQQ_M_N_F "eq")
> +  (VCMPGEQ_M_N_F "ge")
> +  (VCMPGTQ_M_N_F "gt")
> +  (VCMPLEQ_M_N_F "le")
> +  (VCMPLTQ_M_N_F "lt")
> +  (VCMPNEQ_M_N_F "ne")
> +  ])
> +
>  (define_int_attr mve_insn [
>(VABDQ_M_S "vabd") (VABDQ_M_U "vabd") (VABDQ_M_F
> "vabd")
>(VABDQ_S "vabd") (VABDQ_U "vabd") (VAB

[PATCH] Add another new testcase

2023-05-10 Thread Andrew Pinski via Gcc-patches
While working on improving min/max detection, this
code (which is reduced from worse_state in ipa-pure-const.cc)
was being miscompiled. Since there was no testcase in the
testsuite yet for this, this patch adds one.

Committed as obvious after testing the testcase via:
make check-gcc RUNTESTFLAGS="execute.exp=20230510-1.c"

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/20230510-1.c: New test.
---
 .../gcc.c-torture/execute/20230510-1.c| 34 +++
 1 file changed, 34 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/20230510-1.c

diff --git a/gcc/testsuite/gcc.c-torture/execute/20230510-1.c 
b/gcc/testsuite/gcc.c-torture/execute/20230510-1.c
new file mode 100644
index 000..ec9c9e6eae4
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/20230510-1.c
@@ -0,0 +1,34 @@
+/* This code shows up in worse_state in ipa-pure-const.cc:
+   *looping = MAX (*looping, looping2);
+   was miscompiling it as just `return 1` though instead of
+   `MAX_EXPR<*a, b>` (which should be transformed into `*a | b`
+   note MAX_EXPR is really `bool | bool` so we
+   use that to compare against here.
+ */
+#define bool _Bool
+bool __attribute__((noipa)) f(bool *a, bool b)
+{
+  bool t = *a;
+  if (t <= b)
+return b;
+  return t;
+}
+bool __attribute__((noipa)) f1(bool *a, bool b)
+{
+  return *a | b;
+}
+
+int main()
+{
+  int i = 0;
+  int j = 0;
+
+  for (i = 0; i <= 1; i++)
+for (j = 0; j <= 1; j++)
+  {
+bool a = i;
+if (f(&a, j) != f1(&a, j))
+  __builtin_abort();
+  }
+  return 0;
+}
-- 
2.31.1



Re: [PATCH] riscv: Add vectorized binops and insn_expander helpers.

2023-05-10 Thread Robin Dapp via Gcc-patches
> +  machine_mode op2mode = Pmode;
> +  if (inner == E_QImode || inner == E_HImode || inner == E_SImode)
> + op2mode = inner;

This I added in order to match the scalar variants like

  [(set (match_operand:VI_QHS 0 "register_operand"  "=vd,vd, vr, vr")
(if_then_else:VI_QHS
  (unspec:
[(match_operand: 1 "vector_mask_operand" "vm,vm,Wc1,Wc1")
 (match_operand 5 "vector_length_operand""rK,rK, rK, rK")
 (match_operand 6 "const_int_operand"" i, i,  i,  i")
 (match_operand 7 "const_int_operand"" i, i,  i,  i")
 (match_operand 8 "const_int_operand"" i, i,  i,  i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (any_commutative_binop:VI_QHS
(vec_duplicate:VI_QHS
  (match_operand: 4 "reg_or_0_operand"  "rJ,rJ, rJ, rJ"))

Any other way to get there?

> + e.add_input_operand (src2, op2mode == VOIDmode ? GET_MODE (src2) : op2mode);
> Very confusing here.

Hmm I see, the VOIDmode being abused as default might be confusing here.
Would an additional parameter like "bool set_op2_mode" make it clearer?
Another option is to separate this into another function altogether like
emit_len_binop_scalar or so.
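
As a rough sketch of a third option, the "no override" case could be spelled with an explicit optional argument instead of overloading VOIDmode. Everything here is an assumption for illustration: `mode_sketch` and `pick_op2_mode` are hypothetical names, not GCC's machine_mode or any function in the patch, and std::optional needs C++17, which may not fit GCC's own build requirements.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for a few machine modes; not GCC's machine_mode.
enum class mode_sketch { QI, HI, SI, DI };

// Pick the mode for the second operand.  An explicitly requested mode
// wins; otherwise fall back to the mode the operand already has.
static mode_sketch
pick_op2_mode (mode_sketch src2_mode,
	       std::optional<mode_sketch> requested = std::nullopt)
{
  return requested.value_or (src2_mode);
}
```

A call such as pick_op2_mode (src_mode) keeps the operand's own mode, while pick_op2_mode (src_mode, mode_sketch::QI) makes the override visible at the call site.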

> +  
> change it into 

Done and removed the rest.

Thanks.


[PATCH] rs6000: Fix __builtin_vec_xst_trunc definition

2023-05-10 Thread Carl Love via Gcc-patches
GCC maintainers:

The following patch fixes errors in the arguments in the
__builtin_altivec_tr_stxvrhx,   __builtin_altivec_tr_stxvrwx builtin
definitions.  Note, these builtins are used by the overloaded
__builtin_vec_xst_trunc builtin.

The patch adds a new overloaded builtin definition for
__builtin_vec_xst_trunc for the third argument to be unsigned and
signed long int.

A new testcase is added for the various overloaded versions of
__builtin_vec_xst_trunc.

The patch has been tested on Power 10 with no new regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

Carl

---
rs6000: Fix __builtin_vec_xst_trunc definition

Built-in __builtin_vec_xst_trunc calls __builtin_altivec_tr_stxvrhx
and __builtin_altivec_tr_stxvrwx to handle the short and word cases.  The
arguments for these two builtins are wrong.  This patch fixes the wrong
arguments for the builtins.

Additionally, the patch adds a new __builtin_vec_xst_trunc overloaded
version for the destination being signed or unsigned long int.

A runnable test case is added to test each of the overloaded definitions
of __builtin_vec_xst_tru

gcc/
* config/rs6000/builtins.def (__builtin_altivec_tr_stxvrhx,
__builtin_altivec_tr_stxvrwx): Fix type of second argument.
Add definition for second argument to be signed long.
* config/rs6000/rs6000-overload.def (__builtin_vec_xst_trunc):
Add definition with third argument signed and unsigned long.
* doc/extend.texi (__builtin_vec_xst_trunc): Add documentation for
new unsigned long and signed long versions.

gcc/testsuite/
* gcc.target/powerpc/vsx-builtin-vec_xst_trunc.c: New test case
for __builtin_vec_xst_trunc builtin.
---
 gcc/config/rs6000/rs6000-builtins.def |   7 +-
 gcc/config/rs6000/rs6000-overload.def |   4 +
 gcc/doc/extend.texi   |   2 +
 .../powerpc/vsx-builtin-vec_xst_trunc.c   | 217 ++
 4 files changed, 228 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-vec_xst_trunc.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 638d0bc72ca..a378491b358 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -3161,12 +3161,15 @@
   void __builtin_altivec_tr_stxvrbx (vsq, signed long, signed char *);
 TR_STXVRBX vsx_stxvrbx {stvec}
 
-  void __builtin_altivec_tr_stxvrhx (vsq, signed long, signed int *);
+  void __builtin_altivec_tr_stxvrhx (vsq, signed long, signed short *);
 TR_STXVRHX vsx_stxvrhx {stvec}
 
-  void __builtin_altivec_tr_stxvrwx (vsq, signed long, signed short *);
+  void __builtin_altivec_tr_stxvrwx (vsq, signed long, signed int *);
 TR_STXVRWX vsx_stxvrwx {stvec}
 
+  void __builtin_altivec_tr_stxvrlx (vsq, signed long, signed long *);
+TR_STXVRLX vsx_stxvrdx {stvec}
+
   void __builtin_altivec_tr_stxvrdx (vsq, signed long, signed long long *);
 TR_STXVRDX vsx_stxvrdx {stvec}
 
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index c582490c084..54b7ae5e51b 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -4872,6 +4872,10 @@
 TR_STXVRWX  TR_STXVRWX_S
   void __builtin_vec_xst_trunc (vuq, signed long long, unsigned int *);
 TR_STXVRWX  TR_STXVRWX_U
+  void __builtin_vec_xst_trunc (vsq, signed long long, signed long *);
+TR_STXVRLX  TR_STXVRLX_S
+  void __builtin_vec_xst_trunc (vuq, signed long long, unsigned long *);
+TR_STXVRLX  TR_STXVRLX_U
   void __builtin_vec_xst_trunc (vsq, signed long long, signed long long *);
 TR_STXVRDX  TR_STXVRDX_S
   void __builtin_vec_xst_trunc (vuq, signed long long, unsigned long long *);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e426a2eb7d8..7e2ae790ab3 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -18570,10 +18570,12 @@ instructions.
 @defbuiltin{{void} vec_xst_trunc (vector signed __int128, signed long long, 
signed char *)}
 @defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, 
signed short *)}
 @defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, 
signed int *)}
+@defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, 
signed long *)}
 @defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, 
signed long long *)}
 @defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, 
unsigned char *)}
 @defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, 
unsigned short *)}
 @defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, 
unsigned int *)}
+@defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, 
unsigned long *)}
 @defbuiltinx{{void} vec_xst_trunc (vector unsigned __int1

Re: [PATCH] c++: converted lambda as template argument [PR83258, ...]

2023-05-10 Thread Patrick Palka via Gcc-patches
On Wed, 10 May 2023, Jason Merrill wrote:

> On 5/10/23 11:36, Patrick Palka wrote:
> > r8-1253-g3d2e25a240c711 removed the template argument linkage requirement
> > in convert_nontype_argument for C++17, but we need to also remove the one
> > in convert_nontype_argument_function for sake of the first and third test
> > case which we incorrectly reject (in C++17/20 mode).
> > 
> > And in invalid_tparm_referent_p we're inadvertently rejecting using the
> > address of a lambda's static op() due to the DECL_ARTIFICIAL check.
> > This patch relaxes this check for sake of the second test case which we
> > incorrectly reject (in C++20 mode).
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk and perhaps 13 (since it's a relatively easy/safe fix for a
> > popular non-regression bug).
> > 
> > Co-authored-by: Jonathan Wakely 
> > 
> > PR c++/83258
> > PR c++/80488
> > PR c++/97700
> > 
> > gcc/cp/ChangeLog:
> > 
> > * pt.cc (convert_nontype_argument_function): Disable linkage
> > requirement for C++17 and later.
> > (invalid_tparm_referent_p): Relax DECL_ARTIFICIAL check for
> > the artificial static op() of a lambda.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/ext/visibility/anon8.C: Don't expect a "no linkage"
> > error for the template argument &B2:fn in C++17 mode.
> > * g++.dg/cpp0x/lambda/lambda-conv15.C: New test.
> > * g++.dg/cpp2a/nontype-class56.C: New test.
> > * g++.dg/template/function2.C: New test.
> > ---
> >   gcc/cp/pt.cc  |  7 +--
> >   gcc/testsuite/g++.dg/cpp0x/lambda/lambda-conv15.C | 11 +++
> >   gcc/testsuite/g++.dg/cpp2a/nontype-class56.C  |  8 
> >   gcc/testsuite/g++.dg/ext/visibility/anon8.C   |  4 ++--
> >   gcc/testsuite/g++.dg/template/function2.C |  8 
> >   5 files changed, 34 insertions(+), 4 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-conv15.C
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class56.C
> >   create mode 100644 gcc/testsuite/g++.dg/template/function2.C
> > 
> > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > index 696df2bdd9f..c9b089f8fa7 100644
> > --- a/gcc/cp/pt.cc
> > +++ b/gcc/cp/pt.cc
> > @@ -6782,7 +6782,8 @@ convert_nontype_argument_function (tree type, tree
> > expr,
> >   }
> >   linkage = decl_linkage (fn_no_ptr);
> > -  if (cxx_dialect >= cxx11 ? linkage == lk_none : linkage != lk_external)
> > +  if ((cxx_dialect < cxx11 && linkage != lk_external)
> > +  || (cxx_dialect < cxx17 && linkage == lk_none))
> >   {
> > if (complain & tf_error)
> > {
> > @@ -7180,7 +7181,9 @@ invalid_tparm_referent_p (tree type, tree expr,
> > tsubst_flags_t complain)
> >* a string literal (5.13.5),
> >* the result of a typeid expression (8.2.8), or
> >* a predefined __func__ variable (11.4.1).  */
> > -   else if (DECL_ARTIFICIAL (decl))
> > +   else if (DECL_ARTIFICIAL (decl)
> > +/* Accept the artificial static op() of a lambda.  */
> > +&& !LAMBDA_TYPE_P (CP_DECL_CONTEXT (decl)))
> 
> Maybe check for FUNCTION_DECL instead?  I think the cases we want to diagnose
> are all VAR_DECL.

Makes sense, before r13-6970-gb5e38b1c166357 this code path would only
be reachable for VAR_DECL anyway.  Like so?  Bootstrapped and regtested
on x86_64-pc-linux-gnu, does this look OK for trunk and perhaps 13?

-- >8 --

Subject: [PATCH] c++: converted lambda as template argument [PR83258, ...]

r8-1253-g3d2e25a240c711 removed the template argument linkage requirement
in convert_nontype_argument for C++17, but we need to also remove the one
in convert_nontype_argument_function for sake of the first and third test
case which we incorrectly reject (in C++17/20 mode).

And in invalid_tparm_referent_p we're inadvertently diagnosing using the
address of a lambda's static op() since it's DECL_ARTIFICIAL, which causes
us to reject the second (C++20) testcase.  But this DECL_ARTIFICIAL check
seems to be relevant only for VAR_DECL, and indeed this code path was
reached only for VAR_DECL until r13-6970-gb5e38b1c166357.  So this patch
relaxes the check accordingly.
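
The effect of the linkage change in convert_nontype_argument_function can be checked in isolation. The sketch below copies the old and new conditions from the quoted hunk onto stand-in enums; `dialect_sketch`, `linkage_sketch`, `rejected_before` and `rejected_after` are illustrative names, and the real cxx_dialect and linkage enums in GCC have more enumerators than shown.

```cpp
#include <cassert>

// Illustrative stand-ins, ordered so that later dialects compare greater.
enum dialect_sketch { sk_cxx98, sk_cxx11, sk_cxx14, sk_cxx17, sk_cxx20 };
enum linkage_sketch { sk_lk_none, sk_lk_internal, sk_lk_external };

// Condition before the patch; true means the argument is rejected.
static bool
rejected_before (dialect_sketch d, linkage_sketch l)
{
  return d >= sk_cxx11 ? l == sk_lk_none : l != sk_lk_external;
}

// Condition after the patch; true means the argument is rejected.
static bool
rejected_after (dialect_sketch d, linkage_sketch l)
{
  return (d < sk_cxx11 && l != sk_lk_external)
	 || (d < sk_cxx17 && l == sk_lk_none);
}
```

On these stand-ins the two predicates agree for every dialect before C++17 and differ only for functions with no linkage in C++17 and later, which are now accepted.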

Co-authored-by: Jonathan Wakely 

PR c++/83258
PR c++/80488
PR c++/97700

gcc/cp/ChangeLog:

* pt.cc (convert_nontype_argument_function): Disable linkage
check for C++17 and later.
(invalid_tparm_referent_p): Accept DECL_ARTIFICIAL FUNCTION_DECL.

gcc/testsuite/ChangeLog:

* g++.dg/ext/visibility/anon8.C: Don't expect a "no linkage"
error for the template argument &B2:fn in C++17 mode.
* g++.dg/cpp0x/lambda/lambda-conv15.C: New test.
* g++.dg/cpp2a/nontype-class56.C: New test.
* g++.dg/template/function2.C: New test.
---
 gcc/cp/pt.cc  |  5 +++--
 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-conv15.C | 11 +++
 gcc/testsuite/g++.dg/c

Re: [PATCH] riscv: Clarify vlmax and length handling.

2023-05-10 Thread Palmer Dabbelt

On Wed, 10 May 2023 08:24:40 PDT (-0700), rdapp@gmail.com wrote:

Hi,

this patch tries to improve the wrappers that emit either vlmax or
non-vlmax operations.  Now, emit_len_op can be used to
emit a regular operation.  Depending on whether a length != NULL
is passed either no VLMAX flags are set or we emit a vsetvli and
set VLMAX flags.  The patch also adds some comments that describe
some of the rationale of the current handling of vlmax/nonvlmax
operations.

Bootstrapped and regtested.

Regards
 Robin


It's somewhat common for mail clients to treat "--" as a signature 
delimiter; it's "---" that git uses as a comment delimiter.


Re: [PATCH] riscv: Clarify vlmax and length handling.

2023-05-10 Thread Robin Dapp via Gcc-patches
It's somewhat common for mail clients to treat "--" as a signature 
delimiter; it's "---" that git uses as a comment delimiter.


It's in my muscle memory somehow.  Always did it that way because I
didn't want the same delimiter as in the git part of the message.  Time 
to change that habit I suppose :) (or automate more of the process).


Re: [PATCH] riscv: Clarify vlmax and length handling.

2023-05-10 Thread Palmer Dabbelt

On Wed, 10 May 2023 11:50:32 PDT (-0700), rdapp@gmail.com wrote:

It's somewhat common for mail clients to treat "--" as a signature
delimiter; it's "---" that git uses as a comment delimiter.


It's in my muscle memory somehow.  Always did it that way because I
didn't want the same delimiter as in the git part of the message.  Time
to change that habit I suppose :) (or automate more of the process).


I guess if you're committing your own code it doesn't matter, but mixing 
them will trip up git-am and such.


The patch LGTM, but it's mostly Juzhe's code so it's probably best to at 
least give him a chance to see it when he's awake.
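
For anyone automating patch mails, the distinction discussed in this thread can be sketched as a small classifier. This is an assumption-laden illustration, not how git or any mail client is implemented: `line_kind` and `classify_line` are hypothetical names, the signature rule accepts both the conventional "-- " (with trailing space) and a bare "--", and only an exact "---" line is treated as the separator that precedes the comments and diffstat.

```cpp
#include <cassert>
#include <string>

enum class line_kind { signature_delimiter, patch_separator, ordinary };

// Classify one line of a patch mail body (without its trailing newline).
static line_kind
classify_line (const std::string &line)
{
  // Mail signature convention: "-- ", treated leniently here so that a
  // bare "--" counts as well.
  if (line == "-- " || line == "--")
    return line_kind::signature_delimiter;
  // Three dashes on a line of their own separate the commit message from
  // the free-form comments and diffstat in a formatted patch.
  if (line == "---")
    return line_kind::patch_separator;
  return line_kind::ordinary;
}
```

Under these rules a comment section introduced with "--" would be read as the start of a signature rather than as patch commentary.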


Re: [PATCH 1/2] Fortran: dump-parse-tree attribs: fix unbalanced braces [PR109624]

2023-05-10 Thread Bernhard Reutner-Fischer via Gcc-patches
[re-adding the lists, I hope you don't mind]

On Wed, 10 May 2023 18:52:54 +0200
Thomas Koenig  wrote:

> Hi Bernhard,
> 
> both patches look good to me.

Pushed as r14-664-g39f7c0963a9c00 and r14-665-gbdc10c2bfaceb3
Thanks!

> 
> No user impact, so they should have the lowest possible impact :-)
> 
> (And I didn't know about DEBUG_FUNCTION, that could come in handy
> later).
> 
> Thanks for the patch!
> 
> Best regards
> 
>      Thomas



Re: [PATCH] riscv: Split off shift patterns for autovectorization.

2023-05-10 Thread Palmer Dabbelt
On Wed, 10 May 2023 08:24:50 PDT (-0700), rdapp@gmail.com wrote:
> Hi,
>
> this patch splits off the shift patterns of the binop patterns.
> This is necessary as the scalar shifts require a Pmode operand
> as shift count.  To this end, a new iterator any_int_binop_no_shift
> is introduced.  At a later point when the binops are split up
> further in commutative and non-commutative patterns (which both
> do not include the shift patterns) we might not need this anymore.
>
> Bootstrapped and regtested.
>
> Regards
>  Robin
>
> --
>
> gcc/ChangeLog:
>
>   * config/riscv/autovec.md (3): Add scalar shift
>   pattern.
>   (v3): Add vector shift pattern.
>   * config/riscv/vector-iterators.md: New iterator.
> ---
>  gcc/config/riscv/autovec.md  | 40 +++-
>  gcc/config/riscv/vector-iterators.md |  4 +++
>  2 files changed, 43 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 8347e42bb9c..2da4fc67d51 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -65,7 +65,7 @@ (define_expand "movmisalign"
>
>  (define_expand "3"
>[(set (match_operand:VI 0 "register_operand")
> -(any_int_binop:VI
> +(any_int_binop_no_shift:VI
>   (match_operand:VI 1 "")
>   (match_operand:VI 2 "")))]
>"TARGET_VECTOR"
> @@ -91,3 +91,41 @@ (define_expand "3"
> NULL_RTX, mode);
>DONE;
>  })
> +
> +;; =
> +;; == Binary integer shifts by scalar.
> +;; =
> +
> +(define_expand "3"
> +  [(set (match_operand:VI 0 "register_operand")
> +(any_shift:VI
> + (match_operand:VI 1 "register_operand")
> + (match_operand: 2 "csr_operand")))]

I don't think VEL is _wrong_ here, as it's an integer type that's big
enough to hold the shift amount, but we might get some odd generated
code for the QI and HI flavors as we frequently don't handle the shorter
types well.

"csr_operand" does seem wrong, though, as that just accepts constants.
Maybe "arith_operand" is the way to go?  I haven't looked at the
V immediates though.

> +  "TARGET_VECTOR"
> +{
> +  if (!CONST_SCALAR_INT_P (operands[2]))
> +  operands[2] = gen_lowpart (Pmode, operands[2]);
> +  riscv_vector::emit_len_binop (code_for_pred_scalar
> + (, mode),
> + operands[0], operands[1], operands[2],
> + NULL_RTX, mode, Pmode);
> +  DONE;
> +})
> +
> +;; =
> +;; == Binary integer shifts by vector.
> +;; =
> +
> +(define_expand "v3"
> +  [(set (match_operand:VI 0 "register_operand")
> +(any_shift:VI
> + (match_operand:VI 1 "register_operand")
> + (match_operand:VI 2 "vector_shift_operand")))]
> +  "TARGET_VECTOR"
> +{
> +  riscv_vector::emit_len_binop (code_for_pred
> + (, mode),
> + operands[0], operands[1], operands[2],
> + NULL_RTX, mode);
> +  DONE;
> +})
> diff --git a/gcc/config/riscv/vector-iterators.md 
> b/gcc/config/riscv/vector-iterators.md
> index 42848627c8c..fdb0bfbe3b1 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -1429,6 +1429,10 @@ (define_code_iterator any_commutative_binop [plus and 
> ior xor
>
>  (define_code_iterator any_non_commutative_binop [minus div udiv mod umod])
>
> +(define_code_iterator any_int_binop_no_shift
> + [plus minus and ior xor smax umax smin umin mult div udiv mod umod
> +])
> +
>  (define_code_iterator any_immediate_binop [plus minus and ior xor])
>
>  (define_code_iterator any_sat_int_binop [ss_plus ss_minus us_plus us_minus])
> --
> 2.40.0

It'd be great to have test cases for the patterns we're adding, at least
for some of the stickier ones.


Re: [vxworks] [testsuite] [aarch64] use builtin in pred-not-gen-4.c

2023-05-10 Thread Richard Sandiford via Gcc-patches
Alexandre Oliva via Gcc-patches  writes:
> On vxworks, isunordered is defined as a macro that ultimately calls a
> _Fpcomp function, that GCC doesn't recognize as a builtin, so it
> can't optimize accordingly.
>
> Use __builtin_isunordered instead to get the desired code for the
> test.
>
> Regstrapped on x86_64-linux-gnu.  Also tested on aarch64-vx7r2 with
> gcc-12.  Ok to install?
>
>
> for  gcc/testsuite/ChangeLog
>
>   * gcc.target/aarch64/sve/pred-not-gen-4.c: Drop math.h include,
>   call builtin.

OK, thanks.

Richard

> ---
>  .../gcc.target/aarch64/sve/pred-not-gen-4.c|4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred-not-gen-4.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pred-not-gen-4.c
> index 0001dd3fc211f..1845bd3f0f704 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/pred-not-gen-4.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pred-not-gen-4.c
> @@ -1,12 +1,10 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O3" } */
>  
> -#include <math.h>
> -
>  void f13(double * restrict z, double * restrict w, double * restrict x, 
> double * restrict y, int n)
>  {
>  for (int i = 0; i < n; i++) {
> -z[i] = (isunordered(w[i], 0)) ? x[i] + w[i] : y[i] - w[i];
> +z[i] = (__builtin_isunordered(w[i], 0)) ? x[i] + w[i] : y[i] - w[i];
>  }
>  }
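
For readers following along, here is a minimal sketch of the behaviour the test relies on. It assumes GCC or Clang, where __builtin_isunordered is a type-generic builtin that returns nonzero when either argument is a NaN. The function names select_on_unordered and check_select_on_unordered are made up for illustration and are not part of the patch.

```c
#include <assert.h>

/* Same loop shape as f13 in the test, but calling the builtin directly
   so that no <math.h> macro (such as the VxWorks one) is involved.  */
void select_on_unordered (double *restrict z, const double *restrict w,
                          const double *restrict x, const double *restrict y,
                          int n)
{
  for (int i = 0; i < n; i++)
    z[i] = __builtin_isunordered (w[i], 0) ? x[i] + w[i] : y[i] - w[i];
}

/* Illustrative self-check (hypothetical helper): an ordered element takes
   the subtraction arm, a NaN element takes the addition arm.  */
int check_select_on_unordered (void)
{
  double w[2] = { 1.0, __builtin_nan ("") };
  double x[2] = { 10.0, 20.0 };
  double y[2] = { 5.0, 7.0 };
  double z[2];

  select_on_unordered (z, w, x, y, 2);

  assert (z[0] == 4.0);   /* y[0] - w[0] = 5.0 - 1.0 */
  assert (z[1] != z[1]);  /* x[1] + NaN is NaN, and NaN != NaN */
  return 1;
}
```

Because the builtin is recognized by the compiler itself, the comparison can be folded into the vectorized loop, which is what the quoted patch depends on.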


Re: [PATCH] riscv: Add autovectorization tests for binary integer

2023-05-10 Thread Palmer Dabbelt
On Wed, 10 May 2023 08:24:57 PDT (-0700), rdapp@gmail.com wrote:
> Hi,
>
> this patch adds scan as well as execution tests for vectorized
> binary integer operations.  It is based on Michael Collison's work
> and also includes scalar variants.  The tests are not fully comprehensive
> as the vector type promotions (vec_unpack, extend etc.) are not
> implemented yet.  Also, vmulh, vmulhu, and vmulhsu and others are
> still missing.

Ah, I guess there's the tests... ;)

>
> Regards
>  Robin
>
> --
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/riscv/rvv/autovec/shift-rv32gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/shift-rv64gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/shift-scalar-rv32gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/shift-scalar-rv64gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/shift-scalar-template.h: New test.
>   * gcc.target/riscv/rvv/autovec/shift-template.h: New test.
>   * gcc.target/riscv/rvv/autovec/shift-run-template.h: New test.
>   * gcc.target/riscv/rvv/autovec/vadd-run-template.h: New test.
>   * gcc.target/riscv/rvv/autovec/vadd-rv32gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/vadd-rv64gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/vadd-template.h: New test.
>   * gcc.target/riscv/rvv/autovec/vand-run-template.h: New test.
>   * gcc.target/riscv/rvv/autovec/vand-rv32gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/vand-rv64gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/vand-template.h: New test.
>   * gcc.target/riscv/rvv/autovec/vdiv-run-template.h: New test.
>   * gcc.target/riscv/rvv/autovec/vdiv-rv32gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/vdiv-rv64gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/vdiv-template.h: New test.
>   * gcc.target/riscv/rvv/autovec/vmax-run-template.h: New test.
>   * gcc.target/riscv/rvv/autovec/vmax-rv32gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/vmax-rv64gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/vmax-template.h: New test.
>   * gcc.target/riscv/rvv/autovec/vmin-run-template.h: New test.
>   * gcc.target/riscv/rvv/autovec/vmin-rv32gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/vmin-rv64gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/vmin-template.h: New test.
>   * gcc.target/riscv/rvv/autovec/vmul-run-template.h: New test.
>   * gcc.target/riscv/rvv/autovec/vmul-rv32gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/vmul-rv64gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/vmul-template.h: New test.
>   * gcc.target/riscv/rvv/autovec/vor-run-template.h: New test.
>   * gcc.target/riscv/rvv/autovec/vor-rv32gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/vor-rv64gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/vor-template.h: New test.
>   * gcc.target/riscv/rvv/autovec/vrem-run-template.h: New test.
>   * gcc.target/riscv/rvv/autovec/vrem-rv32gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/vrem-rv64gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/vrem-template.h: New test.
>   * gcc.target/riscv/rvv/autovec/vsub-run-template.h: New test.
>   * gcc.target/riscv/rvv/autovec/vsub-rv32gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/vsub-rv64gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/vsub-template.h: New test.
>   * gcc.target/riscv/rvv/autovec/vxor-run-template.h: New test.
>   * gcc.target/riscv/rvv/autovec/vxor-rv32gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/vxor-rv64gcv.c: New test.
>   * gcc.target/riscv/rvv/autovec/vxor-template.h: New test.

I just skimmed them, but nothing jumps out as a problem.  IMO that's
good enough to land them on trunk once the dependencies do.
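
As a rough illustration of the kind of loop a shift template could contain, here is a sketch under stated assumptions. The macro DEF_SHIFT_LOOPS and the generated function names are hypothetical and are not copied from shift-template.h. The check helper is likewise only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical template macro: one left-shift and one right-shift loop
   per element type, each shifting by a per-element (vector) amount.  */
#define DEF_SHIFT_LOOPS(TYPE, NAME)                                   \
  void shl_##NAME (TYPE *dst, const TYPE *a, const TYPE *b, int n)    \
  {                                                                   \
    for (int i = 0; i < n; i++)                                       \
      dst[i] = a[i] << b[i];                                          \
  }                                                                   \
  void shr_##NAME (TYPE *dst, const TYPE *a, const TYPE *b, int n)    \
  {                                                                   \
    for (int i = 0; i < n; i++)                                       \
      dst[i] = a[i] >> b[i];                                          \
  }

DEF_SHIFT_LOOPS (int32_t, i32)
DEF_SHIFT_LOOPS (uint64_t, u64)

/* Illustrative run-style check on small non-negative inputs.  */
int check_shift_loops (void)
{
  int32_t a[4] = { 1, 2, 3, 4 };
  int32_t b[4] = { 0, 1, 2, 3 };
  int32_t d[4];

  shl_i32 (d, a, b, 4);
  assert (d[0] == 1 && d[1] == 4 && d[2] == 12 && d[3] == 32);

  shr_i32 (d, a, b, 4);
  assert (d[0] == 1 && d[1] == 1 && d[2] == 0 && d[3] == 0);
  return 1;
}
```

A scan test would presumably compile such loops with the vector extension enabled and look for the vector shift instructions in the assembly, while a run test would compare results as above.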

> ---
>  .../riscv/rvv/autovec/shift-run-template.h|  47 +++
>  .../riscv/rvv/autovec/shift-rv32gcv.c |  12 ++
>  .../riscv/rvv/autovec/shift-rv64gcv.c |  12 ++
>  .../riscv/rvv/autovec/shift-scalar-rv32gcv.c  |   7 ++
>  .../riscv/rvv/autovec/shift-scalar-rv64gcv.c  |   7 ++
>  .../riscv/rvv/autovec/shift-scalar-template.h | 119 ++
>  .../riscv/rvv/autovec/shift-template.h|  34 +
>  .../riscv/rvv/autovec/vadd-run-template.h |  64 ++
>  .../riscv/rvv/autovec/vadd-rv32gcv.c  |   8 ++
>  .../riscv/rvv/autovec/vadd-rv64gcv.c  |   8 ++
>  .../riscv/rvv/autovec/vadd-template.h |  56 +
>  .../riscv/rvv/autovec/vand-run-template.h |  64 ++
>  .../riscv/rvv/autovec/vand-rv32gcv.c  |   8 ++
>  .../riscv/rvv/autovec/vand-rv64gcv.c  |   8 ++
>  .../riscv/rvv/autovec/vand-template.h |  56 +
>  .../riscv/rvv/autovec/vdiv-run-template.h |  42 +++
>  .../riscv/rvv/autovec/vdiv-rv32gcv.c  |  10 ++
>  .../riscv/rvv/autovec/vdiv-rv64gcv.c  |  10 ++
>  .../riscv/rvv/autovec/vdiv-template.h |  34 +
>  .../riscv/rvv

[x86_64 PATCH] Use [(const_int 0)] idiom consistently in i386.md

2023-05-10 Thread Roger Sayle

Hi Uros,
This cleans up the use of [(clobber (const_int 0))] in the i386 backend.
My apologies, I must have copied this idiom from one of the other targets:
aarch64.md, arm.md, thumb1.md, avr.md, or sparc.md.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-05-10  Roger Sayle  

gcc/ChangeLog
* config/i386/i386.md (*concat3_1): Use preferred
[(const_int 0)] idiom, instead of [(clobber (const_int 0))].
(*concat3_2): Likewise.
(*concat3_3): Likewise.
(*concat3_4): Likewise.
(*concat3_5): Likewise.
(*concat3_6): Likewise.
(*concat3_7): Likewise.


Thanks,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index cf90867..f2dd67e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -11584,7 +11584,7 @@
   "INTVAL (operands[2]) ==  * BITS_PER_UNIT"
   "#"
   "&& reload_completed"
-  [(clobber (const_int 0))]
+  [(const_int 0)]
 {
   split_double_concat (mode, operands[0], operands[3],
   gen_lowpart (mode, operands[1]));
@@ -11601,7 +11601,7 @@
   "INTVAL (operands[3]) ==  * BITS_PER_UNIT"
   "#"
   "&& reload_completed"
-  [(clobber (const_int 0))]
+  [(const_int 0)]
 {
   split_double_concat (mode, operands[0], operands[1],
   gen_lowpart (mode, operands[2]));
@@ -11620,7 +11620,7 @@
   "INTVAL (operands[2]) ==  * BITS_PER_UNIT"
   "#"
   "&& reload_completed"
-  [(clobber (const_int 0))]
+  [(const_int 0)]
 {
   split_double_concat (mode, operands[0], operands[3], operands[1]);
   DONE;
@@ -11638,7 +11638,7 @@
   "INTVAL (operands[3]) ==  * BITS_PER_UNIT"
   "#"
   "&& reload_completed"
-  [(clobber (const_int 0))]
+  [(const_int 0)]
 {
   split_double_concat (mode, operands[0], operands[1], operands[2]);
   DONE;
@@ -11665,7 +11665,7 @@
VOIDmode))"
   "#"
   "&& reload_completed"
-  [(clobber (const_int 0))]
+  [(const_int 0)]
 {
   rtx op3 = simplify_subreg (mode, operands[3], mode, 0);
   split_double_concat (mode, operands[0], op3,
@@ -11697,7 +11697,7 @@
VOIDmode))"
   "#"
   "&& reload_completed"
-  [(clobber (const_int 0))]
+  [(const_int 0)]
 {
   rtx op3 = simplify_subreg (mode, operands[3], mode, 0);
   split_double_concat (mode, operands[0], op3, operands[1]);
@@ -11723,7 +11723,7 @@
   VOIDmode)"
   "#"
   "&& reload_completed"
-  [(clobber (const_int 0))]
+  [(const_int 0)]
 {
   rtx op2;
   if (mode == DImode)

