[patch,avr,applied] Tweak comparisons with constant

2024-08-01 Thread Georg-Johann Lay

Applied this tweak to the 16-bit and 32-bit comparisons.
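
The arithmetic behind the 32-bit case can be sketched in plain C. This is only an illustration of the identity named in the patch comment ("Unsigned >= 65536 and < 65536 can be performed by testing the high word against 0"), not of the RTL split itself; the function names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Reference form: the full 32-bit unsigned comparison.  */
int geu_65536_full (uint32_t x)
{
  return x >= UINT32_C (65536);
}

/* Tweaked form: only the high 16-bit word is tested against 0.  */
int geu_65536_high_word (uint32_t x)
{
  uint16_t hi = (uint16_t) (x >> 16);
  return hi != 0;
}

/* Compare both forms around the boundary and at the extremes.  */
void check_high_word_tweak (void)
{
  static const uint32_t samples[] =
    { 0, 1, 255, 256, 65535, 65536, 65537, 0x7fffffff, 0xffffffff };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (geu_65536_full (samples[i]) == geu_65536_high_word (samples[i]));
}
```

The LTU case is the negation of the above, which is why the split maps GEU to NE and LTU to EQ.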

Johann

--

gcc/
* config/avr/constraints.md (YMM): New constraint.
* config/avr/avr.md (cmp3, *cmp3)
(cbranch4_insn): Allow YMM where M is allowed.
(cbranch4_insn): Split to a test of the
high part against 0 if possible.

diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index 91a306f2522..fce5349bbe5 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -6568,9 +6568,9 @@ (define_insn "*cmp..1"
 ;; "cmpha3" "cmpuha3"
 (define_insn "cmp3"
   [(set (reg:CC REG_CC)
-(compare:CC (match_operand:ALL2 0 "register_operand"  "!w  ,r  ,r,d ,r  ,d,r")
-(match_operand:ALL2 1 "nonmemory_operand"  "Y00,Y00,r,s ,s  ,M,n Ynn")))
-   (clobber (match_scratch:QI 2   "=X  ,X  ,X,, ,X,"))]
+(compare:CC (match_operand:ALL2 0 "register_operand"  "!w  ,r  ,r,d ,r ,d,r")
+(match_operand:ALL2 1 "nonmemory_operand"  "Y00,Y00,r,s ,s ,M YMM,n Ynn")))
+   (clobber (match_scratch:QI 2   "=X  ,X  ,X,,,X,"))]
   "reload_completed"
   {
 switch (which_alternative)
@@ -6635,9 +6635,9 @@ (define_insn "*cmppsi"
 ;; "*cmpsa" "*cmpusa"
 (define_insn "*cmp"
   [(set (reg:CC REG_CC)
-(compare:CC (match_operand:ALL4 0 "register_operand"  "r  ,r ,d,r ,r")
-(match_operand:ALL4 1 "nonmemory_operand" "Y00,r ,M,M ,n Ynn")))
-   (clobber (match_scratch:QI 2  "=X  ,X ,X,,"))]
+(compare:CC (match_operand:ALL4 0 "register_operand"  "r  ,r ,d,r")
+(match_operand:ALL4 1 "nonmemory_operand" "Y00,r ,M YMM,n Ynn")))
+   (clobber (match_scratch:QI 2  "=X  ,X ,X,"))]
   "reload_completed"
   {
 if (0 == which_alternative)
@@ -6647,8 +6647,8 @@ (define_insn "*cmp"
 
 return avr_out_compare (insn, operands, NULL);
   }
-  [(set_attr "length" "4,4,4,5,8")
-   (set_attr "adjust_len" "tstsi,*,compare,compare,compare")])
+  [(set_attr "length" "4,4,4,8")
+   (set_attr "adjust_len" "tstsi,*,compare,compare")])
 
 
 ;; A helper for avr_pass_ifelse::avr_rest_of_handle_ifelse().
@@ -6727,11 +6727,11 @@ (define_insn_and_split "cbranch4_insn"
   [(set (pc)
 (if_then_else
  (match_operator 0 "ordered_comparison_operator"
-   [(match_operand:ALL4 1 "register_operand"  "r  ,r,d,r ,r")
-(match_operand:ALL4 2 "nonmemory_operand" "Y00,r,M,M ,n Ynn")])
+   [(match_operand:ALL4 1 "register_operand"  "r  ,r,d ,r")
+(match_operand:ALL4 2 "nonmemory_operand" "Y00,r,M YMM ,n Ynn")])
  (label_ref (match_operand 3))
  (pc)))
-   (clobber (match_scratch:QI 4  "=X  ,X,X,,"))]
+   (clobber (match_scratch:QI 4  "=X  ,X,X ,"))]
""
"#"
"reload_completed"
@@ -6742,7 +6742,23 @@ (define_insn_and_split "cbranch4_insn"
  (if_then_else (match_op_dup 0
  [(reg:CC REG_CC) (const_int 0)])
(label_ref (match_dup 3))
-   (pc)))])
+   (pc)))]
+   {
+ // Unsigned >= 65536 and < 65536 can be performed by testing the
+ // high word against 0.
+ if ((GET_CODE (operands[0]) == LTU
+  || GET_CODE (operands[0]) == GEU)
+ && const_operand (operands[2], mode)
+ && INTVAL (avr_to_int_mode (operands[2])) == 65536)
+   {
+ // "cmphi3" of the high word against 0.
+ operands[0] = copy_rtx (operands[0]);
+ PUT_CODE (operands[0], GET_CODE (operands[0]) == GEU ? NE : EQ);
+ operands[1] = simplify_gen_subreg (HImode, operands[1], mode, 2);
+ operands[2] = const0_rtx;
+ operands[4] = gen_rtx_SCRATCH (QImode);
+   }
+   })
 
 ;; "cbranchpsi4_insn"
 (define_insn_and_split "cbranchpsi4_insn"
@@ -6772,11 +6788,11 @@ (define_insn_and_split "cbranch4_insn"
   [(set (pc)
 (if_then_else
  (match_operator 0 "ordered_comparison_operator"
-   [(match_operand:ALL2 1 "register_operand" "!w  ,r  ,r,d ,r ,d,r")
-(match_operand:ALL2 2 "nonmemory_operand" "Y00,Y00,r,s ,s ,M,n Ynn")])
+   [(match_operand:ALL2 1 "register_operand" "!w  ,r  ,r,d ,r ,d,r")
+(match_operand:ALL2 2 "nonmemory_operand" "Y00,Y00,r,s ,s ,M YMM,n Ynn")])
  (label_ref (match_operand 3))
  (pc)))
-   (clobber (match_scratch:QI 4  "=X  ,X  ,X,,,X,"))]
+   (clobber (match_scratch:QI 4  "=X  ,X  ,X,,,X,"))]
""
"#"
"reload_completed"
@@ -6787,7 +6803,23 @@ (define_insn_and_split "cbranch4_insn"
  (if_then_else (match_op_dup 0
  [(reg:CC REG_CC) (const_int 0)])
(label_ref (match_dup 3))
-   (pc)))])
+   (pc)))]
+   {
+ // Unsigned >= 256 and < 256 can be performed by 

Re: [Patch] libgomp: Device load_image - minor num-funcs/vars check improvement

2024-08-01 Thread Tobias Burnus
I sent the following patch in February (Stage 4) and didn't want to
commit it back then. But for Stage 1, it should be fine ... I would like to
commit it tomorrow, unless there are comments suggesting otherwise.


Attached is the unchanged patch and I also added a "diff -w -U1" patch 
as that makes it easier to see the non-re-indent changes.


Tobias

On February 19, 2024, Tobias Burnus wrote:
When debugging a linker issue, leading to a mismatch in the number of 
host/device functions, I was surprised by seeing one additional entry. 
Well, it turned out to be due to the ICV variable.


This patch makes it more consistent. The "+1" is returned since 
r12-2769-g0bac793ed6bad2 (for the on-device omp_get_device_num), 
extended in r13-2545-g9f2fca56593a2b for a struct to support more ICV 
variables on the devices [to handle OMP_..._DEV environment variables].


As the value is returned unconditionally, it makes sense to use it 
both for the expected-value diagnostic and for the condition further 
below.
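
A minimal C sketch of the tightened check, assuming a simplified stand-in for the address-pair table; the struct and function names here are illustrative and are not the ones in target.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for one table entry (hypothetical name).  */
struct addr_pair_sketch
{
  uintptr_t start;
  uintptr_t end;
};

/* Old check: both N and N + 1 entries were accepted.  */
bool count_ok_old (unsigned num_target_entries, unsigned num_funcs,
		   unsigned num_vars)
{
  return num_target_entries == num_funcs + num_vars
	 || num_target_entries == num_funcs + num_vars + 1;
}

/* New check: the "+1" for the ICV struct is always expected.  */
bool count_ok_new (unsigned num_target_entries, unsigned num_funcs,
		   unsigned num_vars)
{
  return num_target_entries == num_funcs + num_vars + 1;
}

/* The last entry describes the ICV struct; start == 0 means absent.  */
bool icv_present (const struct addr_pair_sketch *table, unsigned num_funcs,
		  unsigned num_vars)
{
  return table[num_funcs + num_vars].start != 0;
}

void check_counts (void)
{
  struct addr_pair_sketch table[3] = { { 16, 32 }, { 64, 72 }, { 0, 0 } };
  assert (count_ok_old (2, 1, 1) && !count_ok_new (2, 1, 1));
  assert (count_ok_old (3, 1, 1) && count_ok_new (3, 1, 1));
  assert (!icv_present (table, 1, 1));
}
```

With the count fixed at N + 1, the only remaining question is whether the last entry is populated, which is what the simplified condition further down in the patch tests.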


Comments, suggestions, remarks?

Tobias

PS: An alternative would be to make the plugin's value depend on whether
the data was loaded. But that would make the number-of-entries assert 
weaker and might cause corner-case issues when a slightly older 
libgomp plugin is used with the updated libgomp.so. Thus, I have 
settled for the attached variant.

diff --git a/libgomp/target.c b/libgomp/target.c
index efed6ad68ff..fb9a6fb5c79 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2364,5 +2364,4 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 
-  if (num_target_entries != num_funcs + num_vars
-  /* "+1" due to the additional ICV struct.  */
-  && num_target_entries != num_funcs + num_vars + 1)
+  /* The "+1" is due to the additional ICV struct.  */
+  if (num_target_entries != num_funcs + num_vars + 1)
 {
@@ -2372,3 +2371,3 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   gomp_fatal ("Cannot map target functions or variables"
-		  " (expected %u, have %u)", num_funcs + num_vars,
+		  " (expected %u + %u + 1, have %u)", num_funcs, num_vars,
 		  num_target_entries);
@@ -2456,11 +2455,5 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 
-  /* Last entry is for a ICVs variable.
- Tolerate case where plugin does not return those entries.  */
-  if (num_funcs + num_vars < num_target_entries)
-{
-  struct addr_pair *var = &target_table[num_funcs + num_vars];
-
-  /* Start address will be non-zero for the ICVs variable if
-	 the variable was found in this image.  */
-  if (var->start != 0)
+  /* Last entry is for the ICV struct variable; if absent, start = end = 0.  */
+  struct addr_pair *icv_var = &target_table[num_funcs + num_vars];
+  if (icv_var->start != 0)
 {
@@ -2471,3 +2464,3 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   struct gomp_offload_icvs *icvs = get_gomp_offload_icvs (dev_num);
-	  size_t var_size = var->end - var->start;
+  size_t var_size = icv_var->end - icv_var->start;
   if (var_size != sizeof (struct gomp_offload_icvs))
@@ -2482,3 +2475,3 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 	 actually designating its device number into effect.  */
-	  gomp_copy_host2dev (devicep, NULL, (void *) var->start, icvs,
+  gomp_copy_host2dev (devicep, NULL, (void *) icv_var->start, icvs,
 			  var_size, false, NULL);
@@ -2489,3 +2482,3 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   k->tgt = tgt;
-	  k->tgt_offset = var->start;
+  k->tgt_offset = icv_var->start;
   k->refcount = REFCOUNT_INFINITY;
@@ -2498,3 +2491,2 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 }
-}
 
libgomp: Device load_image - improve minor num-funcs/vars check

The run-time library loads the offload functions and variables and optionally
the ICV variable and returns the number of loaded items, which has to match
the host side. The plugin returns "+1" (since GCC 12) for the ICV variable
entry, regardless of whether it was loaded or not, but the var's value
(start == end == 0) can be used to detect when this failed.

Thus, we can tighten the assert check - which this commit does together with
making the output less surprising - and simplify the condition further below.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (GOMP_OFFLOAD_load_image): If ICV variable
	is not available, decrement other_count and thus the return value.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Likewise.
	* target.c (gomp_load_image_to_device): Extend fatal-error message;
	simplify a condition.

 libgomp/target.c | 78 +---
 1 file changed, 35 insertions(+),

Re: [PATCH] Fix mismatch between constraint and predicate for ashl3_doubleword.

2024-08-01 Thread Uros Bizjak
On Tue, Jul 30, 2024 at 5:05 AM liuhongt  wrote:
>
> (insn 98 94 387 2 (parallel [
> (set (reg:TI 337 [ _32 ])
> (ashift:TI (reg:TI 329)
> (reg:QI 521)))
> (clobber (reg:CC 17 flags))
> ]) "test.c":11:13 953 {ashlti3_doubleword}
>
> is reloaded into
>
> (insn 98 452 387 2 (parallel [
> (set (reg:TI 0 ax [orig:337 _32 ] [337])
> (ashift:TI (const_int 1671291085 [0x639de0cd])
> (reg:QI 2 cx [521])))
> (clobber (reg:CC 17 flags))
>
> since constraint n in the pattern accepts that.
> (Not sure why reload doesn't check predicate)

This is how reload works. It doesn't look at predicates, only at
constraints. To avoid checking errors in later passes, the predicate
should allow a superset of operands compared to the operands of the
constraint. Basically, reload is "refining" the operands to fit
constraints.

OTOH, predicates are used by pre-reload passes, e.g. combine to
combine various instructions. This is the reason sometimes insn has
nonimmediate_operand predicate and "r" constraint - to allow insn
combination while expecting that reload will fix the operand to fit
the constraint. Post-reload passes check both predicates and
constraints; this is where you get failures due to a mismatched reload
operand.
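
A toy model of the mismatch in C, purely illustrative: the operand struct and the three functions below are hypothetical stand-ins for the reg_or_pm1_operand predicate and the "n" and "Wc" constraints, and do not reflect how GCC implements them.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy operand: either a register or an integer constant.  */
struct toy_operand
{
  bool is_reg;
  long value;	/* Only meaningful when !is_reg.  */
};

/* Models the predicate: a register, or the constant 1 or -1.  */
bool toy_pred_reg_or_pm1 (struct toy_operand op)
{
  return op.is_reg || op.value == 1 || op.value == -1;
}

/* Models constraint "n": any integer constant.  */
bool toy_constraint_n (struct toy_operand op)
{
  return !op.is_reg;
}

/* Models constraint "Wc": only the constants 1 and -1.  */
bool toy_constraint_Wc (struct toy_operand op)
{
  return !op.is_reg && (op.value == 1 || op.value == -1);
}

void check_toy_mismatch (void)
{
  /* The constant that reload substituted in the dump above.  */
  struct toy_operand bad = { false, 1671291085L };
  struct toy_operand one = { false, 1 };

  /* "n" accepts it although the predicate rejects it: the mismatch.  */
  assert (toy_constraint_n (bad) && !toy_pred_reg_or_pm1 (bad));
  /* "Wc" accepts nothing that the predicate would reject.  */
  assert (!toy_constraint_Wc (bad));
  assert (toy_constraint_Wc (one) && toy_pred_reg_or_pm1 (one));
}
```

The point of the model is the subset relation: everything a constraint accepts should also be accepted by the predicate, which holds for "Wc" but not for "n".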

> (define_insn "ashl3_doubleword"
>   [(set (match_operand:DWI 0 "register_operand" "=,")
> (ashift:DWI (match_operand:DWI 1 "reg_or_pm1_operand" "0n,r")
> (match_operand:QI 2 "nonmemory_operand" "c,c")))
>
> The patch fixes the mismatch between constraint and predicate.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/116096
> * config/i386/constraints.md (Wc): New constraint for integer
> 1 or -1.
> * config/i386/i386.md (ashl3_doubleword): Refine
> constraint with Wc.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr116096.c: New test.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/constraints.md   |  6 ++
>  gcc/config/i386/i386.md  |  2 +-
>  gcc/testsuite/gcc.target/i386/pr116096.c | 26 
>  3 files changed, 33 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr116096.c
>
> diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
> index 7508d7a58bd..154cbccd09e 100644
> --- a/gcc/config/i386/constraints.md
> +++ b/gcc/config/i386/constraints.md
> @@ -254,6 +254,12 @@ (define_constraint "Wb"
>(and (match_code "const_int")
> (match_test "IN_RANGE (ival, 0, 7)")))
>
> +(define_constraint "Wc"
> +  "Integer constant -1 or 1."
> +  (and (match_code "const_int")
> +   (ior (match_test "op == constm1_rtx")
> +   (match_test "op == const1_rtx"))))
> +
>  (define_constraint "Ww"
>"Integer constant in the range 0 @dots{} 15, for 16-bit shifts."
>(and (match_code "const_int")
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 6207036a2a0..79d5de5b46a 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -14774,7 +14774,7 @@ (define_insn_and_split "*ashl3_doubleword_mask_1"
>
>  (define_insn "ashl3_doubleword"
>[(set (match_operand:DWI 0 "register_operand" "=,")
> -   (ashift:DWI (match_operand:DWI 1 "reg_or_pm1_operand" "0n,r")
> +   (ashift:DWI (match_operand:DWI 1 "reg_or_pm1_operand" "0Wc,r")
> (match_operand:QI 2 "nonmemory_operand" "c,c")))
> (clobber (reg:CC FLAGS_REG))]
>""
> diff --git a/gcc/testsuite/gcc.target/i386/pr116096.c 
> b/gcc/testsuite/gcc.target/i386/pr116096.c
> new file mode 100644
> index 000..5ef39805f58
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr116096.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile { target int128 } } */
> +/* { dg-options "-O2 -flive-range-shrinkage -fno-peephole2 -mstackrealign 
> -Wno-psabi" } */
> +
> +typedef char U __attribute__((vector_size (32)));
> +typedef unsigned V __attribute__((vector_size (32)));
> +typedef __int128 W __attribute__((vector_size (32)));
> +U g;
> +
> +W baz ();
> +
> +static inline U
> +bar (V x, W y)
> +{
> +  y = y | y << (W) x;
> +  return (U)y;
> +}
> +
> +void
> +foo (W w)
> +{
> +  g = g <<
> +bar ((V){baz ()[1], 3, 3, 5, 7},
> +(W){w[0], ~(int) 2623676210}) >>
> +bar ((V){baz ()[1]},
> +(W){-w[0], ~(int) 2623676210});
> +}
> --
> 2.31.1
>


[PATCH 1/1] Initial support for AVX10.2

2024-08-01 Thread Haochen Jiang
gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_available_features): Handle
avx10.2.
* common/config/i386/i386-common.cc
(OPTION_MASK_ISA2_AVX10_2_256_SET): New.
(OPTION_MASK_ISA2_AVX10_2_512_SET): Ditto.
(OPTION_MASK_ISA2_AVX10_1_256_UNSET):
Add OPTION_MASK_ISA2_AVX10_2_256_UNSET.
(OPTION_MASK_ISA2_AVX10_1_512_UNSET):
Add OPTION_MASK_ISA2_AVX10_2_512_UNSET.
(OPTION_MASK_ISA2_AVX10_2_256_UNSET): New.
(OPTION_MASK_ISA2_AVX10_2_512_UNSET): Ditto.
(ix86_handle_option): Handle avx10.2-256 and avx10.2-512.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_AVX10_2_256 and FEATURE_AVX10_2_512.
* common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY for
avx10.2-256 and avx10.2-512.
* config/i386/i386-c.cc (ix86_target_macros_internal): Define
__AVX10_2_256__ and __AVX10_2_512__.
* config/i386/i386-isa.def (AVX10_2): Add DEF_PTA(AVX10_2_256)
and DEF_PTA(AVX10_2_512).
* config/i386/i386-options.cc (isa2_opts): Add -mavx10.2-256 and
-mavx10.2-512.
(ix86_valid_target_attribute_inner_p): Handle avx10.2-256 and
avx10.2-512.
* config/i386/i386.opt: Add option -mavx10.2, -mavx10.2-256 and
-mavx10.2-512.
* config/i386/i386.opt.urls: Regenerated.
* doc/extend.texi: Document avx10.2, avx10.2-256 and avx10.2-512.
* doc/invoke.texi: Document -mavx10.2, -mavx10.2-256 and
-mavx10.2-512.
* doc/sourcebuild.texi: Document target avx10.2, avx10.2-256,
avx10.2-512.

gcc/testsuite/ChangeLog:

* g++.dg/other/i386-2.C: Ditto.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/sse-12.c: Ditto.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
---
 gcc/common/config/i386/cpuinfo.h   |  6 
 gcc/common/config/i386/i386-common.cc  | 43 --
 gcc/common/config/i386/i386-cpuinfo.h  |  2 ++
 gcc/common/config/i386/i386-isas.h |  3 ++
 gcc/config/i386/i386-c.cc  |  4 +++
 gcc/config/i386/i386-isa.def   |  2 ++
 gcc/config/i386/i386-options.cc|  7 -
 gcc/config/i386/i386.opt   | 15 +
 gcc/config/i386/i386.opt.urls  |  9 ++
 gcc/doc/extend.texi| 15 +
 gcc/doc/invoke.texi| 17 --
 gcc/doc/sourcebuild.texi   |  9 ++
 gcc/testsuite/g++.dg/other/i386-2.C|  9 +++---
 gcc/testsuite/g++.dg/other/i386-3.C|  9 +++---
 gcc/testsuite/gcc.target/i386/sse-12.c |  2 +-
 gcc/testsuite/gcc.target/i386/sse-13.c |  2 +-
 gcc/testsuite/gcc.target/i386/sse-14.c |  2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c |  4 +--
 gcc/testsuite/gcc.target/i386/sse-23.c |  2 +-
 19 files changed, 140 insertions(+), 22 deletions(-)

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 2ae77d335d2..2ae383eb6ab 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -1006,6 +1006,9 @@ get_available_features (struct __processor_model 
*cpu_model,
   if (ebx & bit_AVX10_256)
switch (version)
  {
+ case 2:
+   set_feature (FEATURE_AVX10_2_256);
+   /* Fall through.  */
  case 1:
set_feature (FEATURE_AVX10_1_256);
break;
@@ -1016,6 +1019,9 @@ get_available_features (struct __processor_model 
*cpu_model,
   if (ebx & bit_AVX10_512)
switch (version)
  {
+ case 2:
+   set_feature (FEATURE_AVX10_2_512);
+   /* Fall through.  */
  case 1:
set_feature (FEATURE_AVX10_1_512);
break;
diff --git a/gcc/common/config/i386/i386-common.cc 
b/gcc/common/config/i386/i386-common.cc
index e38b1b22ffb..fb744319b05 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -122,6 +122,11 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_AVX10_1_256_SET OPTION_MASK_ISA2_AVX10_1_256
 #define OPTION_MASK_ISA2_AVX10_1_512_SET \
   (OPTION_MASK_ISA2_AVX10_1_256_SET | OPTION_MASK_ISA2_AVX10_1_512)
+#define OPTION_MASK_ISA2_AVX10_2_256_SET \
+  (OPTION_MASK_ISA2_AVX10_1_256_SET | OPTION_MASK_ISA2_AVX10_2_256)
+#define OPTION_MASK_ISA2_AVX10_2_512_SET \
+  (OPTION_MASK_ISA2_AVX10_1_512_SET | OPTION_MASK_ISA2_AVX10_2_256_SET \
+   | OPTION_MASK_ISA2_AVX10_2_512)
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
as -msse4.2.  */
@@ -307,8 +312,12 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_EVEX512_UNSET OPTION_MASK_ISA2_EVEX512
 #define OPTION_MASK_ISA2_USER_MSR_UNSET OPTION_MASK_ISA2_USER_MSR
 #define OPTION_MASK_ISA2_AVX10_1_256_UNSET \
-  (OPTION_MASK_ISA2_AVX10_1_256 | 

[PATCH 0/1] Initial support for AVX10.2

2024-08-01 Thread Haochen Jiang
Hi all,

The AVX10.2 technical details were just published on July 31st at the
following link:

https://cdrdv2.intel.com/v1/dl/getContent/828965

The new features and instructions can be divided into two parts.
One is ymm rounding control, the other is the new instructions.

In the following weeks, we plan to upstream the ymm rounding part first,
followed by the new instructions. After all of them are upstreamed, we will
also upstream several patches optimizing codegen with the new AVX10.2
instructions.

The patch coming next is the initial support for AVX10.2. This patch
will be the foundation of all our patches. It adds the support for
cpuid, option, target attribute, etc.
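
A small C sketch of the fall-through idea used for the cpuid handling in the patch, where a reported version of 2 also sets the version 1 feature. The enum values and the function name are invented for this sketch; the real code calls set_feature with FEATURE_AVX10_* values.

```c
#include <assert.h>

/* Hypothetical feature bits for this sketch only.  */
enum
{
  SKETCH_AVX10_1_256 = 1 << 0,
  SKETCH_AVX10_2_256 = 1 << 1
};

/* Map a reported AVX10 version to the set of implied features.  */
unsigned sketch_features_for_version (int version)
{
  unsigned features = 0;
  switch (version)
    {
    case 2:
      features |= SKETCH_AVX10_2_256;
      /* Fall through.  */
    case 1:
      features |= SKETCH_AVX10_1_256;
      break;
    default:
      break;
    }
  return features;
}

void check_sketch_features (void)
{
  assert (sketch_features_for_version (1) == SKETCH_AVX10_1_256);
  assert (sketch_features_for_version (2)
	  == (SKETCH_AVX10_1_256 | SKETCH_AVX10_2_256));
  assert (sketch_features_for_version (0) == 0);
}
```

The option masks in the patch follow the same layering: OPTION_MASK_ISA2_AVX10_2_256_SET includes OPTION_MASK_ISA2_AVX10_1_256_SET.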

Bootstrapped on x86-64-pc-linux-gnu. Ok for trunk?

Thx,
Haochen




Re: [PATCH] middle-end/114563 - improve release_pages

2024-08-01 Thread Richard Biener
On Wed, Jul 31, 2024 at 5:37 PM Andi Kleen  wrote:
>
> On Wed, Jul 31, 2024 at 04:02:22PM +0200, Richard Biener wrote:
> > The following improves release_pages when using the madvise path
> > to sort the freelist to get more page entries contiguous and possibly
> > release them.  This populates the unused prev pointer so the reclaim
> > can then easily unlink from the freelist without re-ordering it.
> > The paths not having madvise do not keep the memory allocated, so
> > I left them untouched.
> >
> > Re-bootstrap and regtest running on x86_64-unknown-linux-gnu.
> >
> > I've CCed people messing with release_pages;  This doesn't really
> > address PR114563 but I thought I post this patch anyway - the
> > actual issue we run into for the PR is the linear search of
> > G.free_pages when that list becomes large but a requested allocation
> > cannot be served from it.
> >
> >   PR middle-end/114563
> >   * ggc-page.cc (page_sort): New qsort comparator.
> >   (release_pages): Sort the free_pages list entries after their
> >   memory block virtual address to improve contiguous memory
> >   chunk release.
>
> I saw this in a profile some time ago and tried it with a slightly
> different patch. Instead of a full sort it uses an array to keep
> multiple free lists. But I couldn't find any speed ups in non checking
> builds later.
>
> My feeling is that an array is probably more efficient.
>
> I guess we should compare both on that PR.
>
>
> diff --git a/gcc/ggc-page.cc b/gcc/ggc-page.cc
> index 4245f843a29f..af1627b002c6 100644
> --- a/gcc/ggc-page.cc
> +++ b/gcc/ggc-page.cc
> @@ -234,6 +234,8 @@ static struct
>  }
>  inverse_table[NUM_ORDERS];
>
> +struct free_list;
> +
>  /* A page_entry records the status of an allocation page.  This
> structure is dynamically sized to fit the bitmap in_use_p.  */
>  struct page_entry
> @@ -251,6 +253,9 @@ struct page_entry
>   of the host system page size.)  */
>size_t bytes;
>
> +  /* Free list of this page size.  */
> +  struct free_list *free_list;
> +

?  this seems misplaced

>/* The address at which the memory is allocated.  */
>char *page;
>
> @@ -368,6 +373,15 @@ struct free_object
>  };
>  #endif
>
> +constexpr int num_free_list = 8;
> +
> +/* A free_list for pages with BYTES size.  */
> +struct free_list
> +{
> +  size_t bytes;
> +  page_entry *free_pages;
> +};
> +
>  /* The rest of the global variables.  */
>  static struct ggc_globals
>  {
> @@ -412,8 +426,8 @@ static struct ggc_globals
>int dev_zero_fd;
>  #endif
>
> -  /* A cache of free system pages.  */
> -  page_entry *free_pages;
> +  /* A cache of free system pages. Entry 0 is fallback.  */
> +  struct free_list free_lists[num_free_list];

I also thought of this, but ...

>  #ifdef USING_MALLOC_PAGE_GROUPS
>page_group *page_groups;
> @@ -754,6 +768,26 @@ clear_page_group_in_use (page_group *group, char *page)
>  }
>  #endif
>
> +/* Find a free list for ENTRY_SIZE.  */
> +
> +static inline struct free_list *
> +find_free_list (size_t entry_size)
> +{
> +  int i;
> +  for (i = 1; i < num_free_list; i++)
> +{
> +  if (G.free_lists[i].bytes == entry_size)
> +   return &G.free_lists[i];
> +  if (G.free_lists[i].bytes == 0)
> +   {
> + G.free_lists[i].bytes = entry_size;
> + return &G.free_lists[i];
> +   }
> +}
> +  /* Fallback.  */
> +  return &G.free_lists[0];
> +}
> +
>  /* Allocate a new page for allocating objects of size 2^ORDER,
> and return an entry for it.  The entry is not added to the
> appropriate page_table list.  */
> @@ -770,6 +804,7 @@ alloc_page (unsigned order)
>  #ifdef USING_MALLOC_PAGE_GROUPS
>page_group *group;
>  #endif
> +  struct free_list *free_list;
>
>num_objects = OBJECTS_PER_PAGE (order);
>bitmap_size = BITMAP_SIZE (num_objects + 1);
> @@ -782,8 +817,10 @@ alloc_page (unsigned order)
>entry = NULL;
>page = NULL;
>
> +  free_list = find_free_list (entry_size);

... this made me not consider it.  Looking closer I think
'entry_size' only depends on 'order' and thus a direct mapping
would work here.
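
A rough C sketch of such a direct mapping, assuming (as suggested above, not verified here) that the entry size is a function of the order alone. All names are hypothetical and the page struct is reduced to what the sketch needs.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NUM_ORDERS 8

/* Reduced page entry: only the link and the order it was sized for.  */
struct sketch_page
{
  struct sketch_page *next;
  unsigned order;
};

/* One free list per order, so lookup needs no search.  */
static struct sketch_page *sketch_free_pages[SKETCH_NUM_ORDERS];

/* Put a page back on the free list of its order.  */
void sketch_release_page (struct sketch_page *p)
{
  assert (p->order < SKETCH_NUM_ORDERS);
  p->next = sketch_free_pages[p->order];
  sketch_free_pages[p->order] = p;
}

/* Take a free page for ORDER in O(1), or return NULL if none is cached.  */
struct sketch_page *sketch_take_page (unsigned order)
{
  assert (order < SKETCH_NUM_ORDERS);
  struct sketch_page *p = sketch_free_pages[order];
  if (p != NULL)
    {
      sketch_free_pages[order] = p->next;
      p->next = NULL;
    }
  return p;
}
```

Compared with find_free_list in the quoted patch, indexing by order avoids both the scan over the array and the fallback list.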

> +
>/* Check the list of free pages for one we can use.  */
> -  for (pp = &G.free_pages, p = *pp; p; pp = &p->next, p = *pp)
> +  for (pp = &free_list->free_pages, p = *pp; p; pp = &p->next, p = *pp)
>  if (p->bytes == entry_size)

and this search should then go - if this cannot become O(1) adding any
complexity elsewhere is not worth it.

I also saw we only use "pages" for low-order allocations but malloc for
higher orders bu

Re: [PATCH v2] match: Fix wrong code due to `(a ? e : f) !=/== (b ? e : f)` patterns [PR116120]

2024-08-01 Thread Richard Biener
On Thu, Aug 1, 2024 at 12:30 AM Andrew Pinski  wrote:
>
> When this pattern was converted from only dealing with 0/-1, we missed
> that if `e == f` is true then the optimization is wrong and needs an
> extra check for that.
>
> This changes the patterns to be:
> /* (a ? x : y) != (b ? x : y) --> (a^b & (x != y)) ? TRUE  : FALSE */
> /* (a ? x : y) == (b ? x : y) --> (a^b & (x != y)) ? FALSE : TRUE  */
> /* (a ? x : y) != (b ? y : x) --> (a^b | (x == y)) ? FALSE : TRUE  */
> /* (a ? x : y) == (b ? y : x) --> (a^b | (x == y)) ? TRUE  : FALSE */
>
> Also, this can't be done if X can be a NaN either, since that changes the
> value there too.
>
> This still produces better code than the original case and in many cases
> (x != y) will still reduce to either false or true.
>
> With this change we also need to make sure `a`, `b` and the resulting types
> are all the same for the same reason as the previous patch.
>
> I updated (well, added to) the testcases to make sure the right number of
> comparisons is left.
>
> Changes since v1:
> * v2: Fixed the testcase names and fixed dg-run to be `dg-do run`. Added a 
> check for HONORS_NANS too.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK
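
The scalar integer case of the corrected folds can be checked exhaustively over a small range with a few lines of C. This is only a sanity sketch with made-up function names; it does not cover the vector or floating-point cases.

```c
#include <assert.h>
#include <stdbool.h>

/* (a ? x : y) != (b ? x : y), evaluated directly.  */
bool ne_same_arms (bool a, bool b, int x, int y)
{
  return (a ? x : y) != (b ? x : y);
}

/* Old fold: (a^b) ? TRUE : FALSE.  Wrong when x == y.  */
bool ne_same_arms_old_fold (bool a, bool b)
{
  return a ^ b;
}

/* New fold: (a^b & (x != y)) ? TRUE : FALSE.  */
bool ne_same_arms_new_fold (bool a, bool b, int x, int y)
{
  return (a ^ b) & (x != y);
}

/* (a ? x : y) == (b ? y : x), evaluated directly.  */
bool eq_swapped_arms (bool a, bool b, int x, int y)
{
  return (a ? x : y) == (b ? y : x);
}

/* New fold: (a^b | (x == y)) ? TRUE : FALSE.  */
bool eq_swapped_arms_new_fold (bool a, bool b, int x, int y)
{
  return (a ^ b) | (x == y);
}

/* Exhaustive check over both conditions and a small value range.  */
void check_folds (void)
{
  for (int a = 0; a <= 1; a++)
    for (int b = 0; b <= 1; b++)
      for (int x = -1; x <= 1; x++)
	for (int y = -1; y <= 1; y++)
	  {
	    assert (ne_same_arms (a, b, x, y)
		    == ne_same_arms_new_fold (a, b, x, y));
	    assert (eq_swapped_arms (a, b, x, y)
		    == eq_swapped_arms_new_fold (a, b, x, y));
	  }
}
```

For a = true, b = false and x == y, the direct evaluation yields false while the old fold yields true, which is the wrong code this patch addresses.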

> PR tree-optimization/116120
>
> gcc/ChangeLog:
>
> * match.pd (`(a ? x : y) eq/ne (b ? x : y)`): Add test for `x != y`
> in result.
> (`(a ? x : y) eq/ne (b ? y : x)`): Add test for `x == y` in result.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/tree-ssa/pr50.C: Add extra checks on the test.
> * gcc.dg/tree-ssa/pr50-1.c: Likewise.
> * gcc.dg/tree-ssa/pr50.c: Likewise.
> * g++.dg/torture/pr116120-1.C: New test.
> * g++.dg/torture/pr116120-2.C: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/match.pd   | 23 ++
>  gcc/testsuite/g++.dg/torture/pr116120-1.C  | 32 +
>  gcc/testsuite/g++.dg/torture/pr116120-2.C  | 53 ++
>  gcc/testsuite/g++.dg/tree-ssa/pr50.C   | 10 
>  gcc/testsuite/gcc.dg/tree-ssa/pr50-1.c |  9 
>  gcc/testsuite/gcc.dg/tree-ssa/pr50.c   |  1 +
>  6 files changed, 120 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/torture/pr116120-1.C
>  create mode 100644 gcc/testsuite/g++.dg/torture/pr116120-2.C
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 881a827860f..c9c8478d286 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5632,21 +5632,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(vec_cond (bit_and (bit_not @0) @1) @2 @3)))
>  #endif
>
> -/* (a ? x : y) != (b ? x : y) --> (a^b) ? TRUE  : FALSE */
> -/* (a ? x : y) == (b ? x : y) --> (a^b) ? FALSE : TRUE  */
> -/* (a ? x : y) != (b ? y : x) --> (a^b) ? FALSE : TRUE  */
> -/* (a ? x : y) == (b ? y : x) --> (a^b) ? TRUE  : FALSE */
> +/* (a ? x : y) != (b ? x : y) --> (a^b & (x != y)) ? TRUE  : FALSE */
> +/* (a ? x : y) == (b ? x : y) --> (a^b & (x != y)) ? FALSE : TRUE  */
> +/* (a ? x : y) != (b ? y : x) --> (a^b | (x == y)) ? FALSE : TRUE  */
> +/* (a ? x : y) == (b ? y : x) --> (a^b | (x == y)) ? TRUE  : FALSE */
> +/* These are only valid if x and y don't have NaNs. */
>  (for cnd (cond vec_cond)
>   (for eqne (eq ne)
>(simplify
> (eqne:c (cnd @0 @1 @2) (cnd @3 @1 @2))
> -(if (types_match (TREE_TYPE (@0), TREE_TYPE (@3)))
> - (cnd (bit_xor @0 @3) { constant_boolean_node (eqne == NE_EXPR, type); }
> +(if (!HONOR_NANS (@1)
> +&& types_match (TREE_TYPE (@0), TREE_TYPE (@3))
> + && types_match (type, TREE_TYPE (@0)))
> + (cnd (bit_and (bit_xor @0 @3) (ne:type @1 @2))
> +  { constant_boolean_node (eqne == NE_EXPR, type); }
>{ constant_boolean_node (eqne != NE_EXPR, type); })))
>(simplify
> (eqne:c (cnd @0 @1 @2) (cnd @3 @2 @1))
> -(if (types_match (TREE_TYPE (@0), TREE_TYPE (@3)))
> - (cnd (bit_xor @0 @3) { constant_boolean_node (eqne != NE_EXPR, type); }
> +(if (!HONOR_NANS (@1)
> +&& types_match (TREE_TYPE (@0), TREE_TYPE (@3))
> + && types_match (type, TREE_TYPE (@0)))
> + (cnd (bit_ior (bit_xor @0 @3) (eq:type @1 @2))
> +  { constant_boolean_node (eqne != NE_EXPR, type); }
>{ constant_boolean_node (eqne == NE_EXPR, type); })
>
>  /* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask
> diff --git a/gcc/testsuite/g++.dg/torture/pr116120-1.C 
> b/gcc/testsuite/g++.dg/torture/pr116120-1.C
> new file mode 100644
> index 000..209946f17a4
> --- /dev/null
> +++ b/gcc/testsuite/g+

Re: [PATCH] testsuite: Adjust fam-in-union-alone-in-struct-2.c to support BE [PR116148]

2024-08-01 Thread Richard Biener
On Wed, Jul 31, 2024 at 9:00 PM Qing Zhao  wrote:
>
> Hi, Kewen,
>
> Thanks a lot for fixing this testing case issue.
> Yes, the change LGTM though I can’t approve it.

OK.

Richard.
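
For reference, the byte-order dependence that the test trips over can be shown with a short C sketch. It assumes a 32-bit int and the GCC-style __BYTE_ORDER__ macros that the patch itself uses; the function names are invented here.

```c
#include <assert.h>
#include <string.h>

/* Return the byte stored at the lowest address of V.  */
unsigned char first_byte_of (unsigned int v)
{
  unsigned char b;
  memcpy (&b, &v, 1);
  return b;
}

/* Expected first byte of 0x1f2f3f4f, selected as in the adjusted test.  */
unsigned char expected_first_byte (void)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return 0x4f;	/* Least significant byte comes first.  */
#else
  return 0x1f;	/* Most significant byte comes first.  */
#endif
}

void check_first_byte (void)
{
  assert (sizeof (unsigned int) == 4);
  assert (first_byte_of (0x1f2f3f4fu) == expected_first_byte ());
}
```

The same reasoning applies to a[7], the last byte of the second int: it holds 0x5f on little-endian and the low byte on big-endian, which is why the second initializer was changed so that its low byte (0x8f) differs from its neighbour (0x7f).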

> Qing
>
> > On Jul 31, 2024, at 05:22, Kewen.Lin  wrote:
> >
> > Hi,
> >
> > As Andrew pointed out in PR116148, fam-in-union-alone-in-struct-2.c
> > was designed for little-endian; the recent commit r15-2403 made it
> > run on BE as well, which exposed PR116148.
> >
> > This patch is to adjust the expected data for members in with_fam_2_v
> > and with_fam_3_v by considering endianness, also update with_fam_3_v.b[1]
> > from 0x5f6f7f7f to 0x5f6f7f8f to avoid two "7f"s.
> >
> > Tested on powerpc64-linux-gnu P8/P9 and powerpc64le-linux-gnu P9/P10.
> >
> > Is it ok for trunk?
> >
> > BR,
> > Kewen
> > -
> >   PR testsuite/116148
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * c-c++-common/fam-in-union-alone-in-struct-2.c: Define macros
> >   WITH_FAM_2_V_B[03] and WITH_FAM_3_V_A[07] as endianness, update the
> >   checking with these macros and initialize with_fam_3_v.b[1] with
> >   0x5f6f7f8f instead of 0x5f6f7f7f.
> > ---
> > .../fam-in-union-alone-in-struct-2.c  | 22 ++-
> > 1 file changed, 17 insertions(+), 5 deletions(-)
> >
> > diff --git a/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c 
> > b/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c
> > index 93f9d5128f6..7845a7fbab3 100644
> > --- a/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c
> > +++ b/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c
> > @@ -16,7 +16,7 @@ union with_fam_2 {
> > union with_fam_3 {
> >   char a[];
> >   int b[];
> > -} with_fam_3_v = {.b = {0x1f2f3f4f, 0x5f6f7f7f}};
> > +} with_fam_3_v = {.b = {0x1f2f3f4f, 0x5f6f7f8f}};
> >
> > struct only_fam {
> >   int b[];
> > @@ -28,16 +28,28 @@ struct only_fam_2 {
> >   int b[];
> > } only_fam_2_v = {{7, 11}};
> >
> > +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> > +#define WITH_FAM_2_V_B0 0x4f
> > +#define WITH_FAM_2_V_B3 0x1f
> > +#define WITH_FAM_3_V_A0 0x4f
> > +#define WITH_FAM_3_V_A7 0x5f
> > +#else
> > +#define WITH_FAM_2_V_B0 0x1f
> > +#define WITH_FAM_2_V_B3 0x4f
> > +#define WITH_FAM_3_V_A0 0x1f
> > +#define WITH_FAM_3_V_A7 0x8f
> > +#endif
> > +
> > int main ()
> > {
> >   if (with_fam_1_v.b[3] != 4
> >   || with_fam_1_v.b[0] != 1)
> > __builtin_abort ();
> > -  if (with_fam_2_v.b[3] != 0x1f
> > -  || with_fam_2_v.b[0] != 0x4f)
> > +  if (with_fam_2_v.b[3] != WITH_FAM_2_V_B3
> > +  || with_fam_2_v.b[0] != WITH_FAM_2_V_B0)
> > __builtin_abort ();
> > -  if (with_fam_3_v.a[0] != 0x4f
> > -  || with_fam_3_v.a[7] != 0x5f)
> > +  if (with_fam_3_v.a[0] != WITH_FAM_3_V_A0
> > +  || with_fam_3_v.a[7] != WITH_FAM_3_V_A7)
> > __builtin_abort ();
> >   if (only_fam_v.b[0] != 7
> >   || only_fam_v.b[1] != 11)
> > --
> > 2.45.2
>


Re: [Patch, libgfortran] PR105361 Followup fix to test case

2024-08-01 Thread Andre Vehreschild
Hi Jerry,

looks fine to me. The change is so small that I see no harm, ok to merge.
Thanks for the patch.

- Andre

On Wed, 31 Jul 2024 09:08:54 -0700
Jerry D  wrote:

> I plan to push this soon to hopefully fix some test breakage on some
> architectures.  It is simple and obvious. I did not get any feedback on
> this and I do not have access to the machines in question.
>
> Regression tested on linux-x86-64.
>
> Regards,
>
> Jerry
>
> commit bc4ee05dc7c60d534ef927ac5e679f67fb99d54b
> Author: Jerry DeLisle 
> Date:   Wed Jul 31 08:58:17 2024 -0700
>
>  Fortran: Add newline character to test input.
>
>  gcc/testsuite/ChangeLog:
>
>  PR libfortran/105361
>
>  * gfortran.dg/pr105361.f90: Add newline character to test
>  input to provide more compliant test.
>
> diff --git a/gcc/testsuite/gfortran.dg/pr105361.f90
> b/gcc/testsuite/gfortran.dg/pr105361.f90
> index e2d3b07caca..62821c2802d 100644
> --- a/gcc/testsuite/gfortran.dg/pr105361.f90
> +++ b/gcc/testsuite/gfortran.dg/pr105361.f90
> @@ -27,7 +27,7 @@ program main
> type(foo) :: a, b
> real :: c, d
> open(10, access="stream")
> -  write(10) "1 2" ! // NEW_LINE('A')
> +  write(10) "1 2" // NEW_LINE('A')
> close(10)
> open(10)
> read(10,*) c, d


--
Andre Vehreschild * Email: vehre ad gmx dot de


Re: [PATCH] Fix mismatch between constraint and predicate for ashl3_doubleword.

2024-08-01 Thread Hongtao Liu
On Tue, Jul 30, 2024 at 11:04 AM liuhongt  wrote:
>
> (insn 98 94 387 2 (parallel [
> (set (reg:TI 337 [ _32 ])
> (ashift:TI (reg:TI 329)
> (reg:QI 521)))
> (clobber (reg:CC 17 flags))
> ]) "test.c":11:13 953 {ashlti3_doubleword}
>
> is reloaded into
>
> (insn 98 452 387 2 (parallel [
> (set (reg:TI 0 ax [orig:337 _32 ] [337])
> (ashift:TI (const_int 1671291085 [0x639de0cd])
> (reg:QI 2 cx [521])))
> (clobber (reg:CC 17 flags))
>
> since constraint n in the pattern accepts that.
> (Not sure why reload doesn't check predicate)
>
> (define_insn "ashl3_doubleword"
>   [(set (match_operand:DWI 0 "register_operand" "=,")
> (ashift:DWI (match_operand:DWI 1 "reg_or_pm1_operand" "0n,r")
> (match_operand:QI 2 "nonmemory_operand" "c,c")))
>
> The patch fixes the mismatch between constraint and predicate.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/116096
> * config/i386/constraints.md (Wc): New constraint for integer
> 1 or -1.
> * config/i386/i386.md (ashl3_doubleword): Refine
> constraint with Wc.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr116096.c: New test.
> ---
>  gcc/config/i386/constraints.md   |  6 ++
>  gcc/config/i386/i386.md  |  2 +-
>  gcc/testsuite/gcc.target/i386/pr116096.c | 26 
>  3 files changed, 33 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr116096.c
>
> diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
> index 7508d7a58bd..154cbccd09e 100644
> --- a/gcc/config/i386/constraints.md
> +++ b/gcc/config/i386/constraints.md
> @@ -254,6 +254,12 @@ (define_constraint "Wb"
>(and (match_code "const_int")
> (match_test "IN_RANGE (ival, 0, 7)")))
>
> +(define_constraint "Wc"
> +  "Integer constant -1 or 1."
> +  (and (match_code "const_int")
> +   (ior (match_test "op == constm1_rtx")
> +   (match_test "op == const1_rtx"))))
> +
>  (define_constraint "Ww"
>"Integer constant in the range 0 @dots{} 15, for 16-bit shifts."
>(and (match_code "const_int")
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 6207036a2a0..79d5de5b46a 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -14774,7 +14774,7 @@ (define_insn_and_split "*ashl3_doubleword_mask_1"
>
>  (define_insn "ashl3_doubleword"
>[(set (match_operand:DWI 0 "register_operand" "=,")
> -   (ashift:DWI (match_operand:DWI 1 "reg_or_pm1_operand" "0n,r")
> +   (ashift:DWI (match_operand:DWI 1 "reg_or_pm1_operand" "0Wc,r")
> (match_operand:QI 2 "nonmemory_operand" "c,c")))
> (clobber (reg:CC FLAGS_REG))]
>""
> diff --git a/gcc/testsuite/gcc.target/i386/pr116096.c 
> b/gcc/testsuite/gcc.target/i386/pr116096.c
> new file mode 100644
> index 000..5ef39805f58
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr116096.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile { target int128 } } */
> +/* { dg-options "-O2 -flive-range-shrinkage -fno-peephole2 -mstackrealign -Wno-psabi" } */
> +
> +typedef char U __attribute__((vector_size (32)));
> +typedef unsigned V __attribute__((vector_size (32)));
> +typedef __int128 W __attribute__((vector_size (32)));
> +U g;
> +
> +W baz ();
> +
> +static inline U
> +bar (V x, W y)
> +{
> +  y = y | y << (W) x;
> +  return (U)y;
> +}
> +
> +void
> +foo (W w)
> +{
> +  g = g <<
> +bar ((V){baz ()[1], 3, 3, 5, 7},
> +(W){w[0], ~(int) 2623676210}) >>
> +bar ((V){baz ()[1]},
> +(W){-w[0], ~(int) 2623676210});
> +}
> --
> 2.31.1
>


-- 
BR,
Hongtao


Re: [PATCH] i386: Fix memory constraint for APX NF

2024-07-31 Thread Hongtao Liu
On Thu, Aug 1, 2024 at 10:03 AM Kong, Lingling  wrote:
>
>
>
> > -Original Message-
> > From: Liu, Hongtao 
> > Sent: Thursday, August 1, 2024 9:35 AM
> > To: Kong, Lingling ; gcc-patches@gcc.gnu.org
> > Cc: Wang, Hongyu 
> > Subject: RE: [PATCH] i386: Fix memory constraint for APX NF
> >
> >
> >
> > > -Original Message-
> > > From: Kong, Lingling 
> > > Sent: Thursday, August 1, 2024 9:30 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Liu, Hongtao ; Wang, Hongyu
> > > 
> > > Subject: [PATCH] i386: Fix memory constraint for APX NF
> > >
> > > The je constraint should be used for APX NDD ADD with register source
> > > operand. The jM is for APX NDD patterns with immediate operand.
> > > But these 2 alternatives are for Non-NDD.
> The jM constraint is for the size limit of 15 bytes when a non-default
> address space is used, and it also applies to APX NF. The je constraint is
> for TLS code with an EVEX prefix for ADD, and APX NF also has the EVEX
> prefix.
I see. Could you also rename apx_ndd_add_memory_operand and
apx_ndd_memory_operand to apx_evex_add_memory_operand and
apx_evex_memory_operand, and update the comments? That can be a
separate patch.
The patch LGTM.
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/i386/i386.md (nf_mem_constraint): Fixed the constraint
> > > for the define_subst_attr.
> > > (nf_mem_constraint): Added new define_subst_attr.
> > > (*add_1): Fixed the constraint.
> > > ---
> > >  gcc/config/i386/i386.md | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > > > index fb10fdc9f96..aa7220ee17c 100644
> > > > --- a/gcc/config/i386/i386.md
> > > > +++ b/gcc/config/i386/i386.md
> > > > @@ -6500,7 +6500,8 @@
> > > >  (define_subst_attr "nf_name" "nf_subst" "_nf" "")
> > > >  (define_subst_attr "nf_prefix" "nf_subst" "%{nf%} " "")
> > > >  (define_subst_attr "nf_condition" "nf_subst" "TARGET_APX_NF" "true")
> > > > -(define_subst_attr "nf_mem_constraint" "nf_subst" "je" "m")
> > > > +(define_subst_attr "nf_add_mem_constraint" "nf_subst" "je" "m")
> > > > +(define_subst_attr "nf_mem_constraint" "nf_subst" "jM" "m")
> > > >  (define_subst_attr "nf_applied" "nf_subst" "true" "false")
> > > >  (define_subst_attr "nf_nonf_attr" "nf_subst"  "noapx_nf" "*")
> > > >  (define_subst_attr "nf_nonf_x64_attr" "nf_subst" "noapx_nf" "x64")
> > > > @@ -6514,7 +6515,7 @@
> > > (clobber (reg:CC FLAGS_REG))])
> > >
> > >  (define_insn "*add_1"
> > > -  [(set (match_operand:SWI48 0 "nonimmediate_operand"
> > > "=rm,r,r,r,r,r,r,r")
> > > +  [(set (match_operand:SWI48 0 "nonimmediate_operand"
> > > + "=r,r,r,r,r,r,r,r")
> > > (plus:SWI48
> > >   (match_operand:SWI48 1 "nonimmediate_operand"
> > > "%0,0,0,r,r,rje,jM,r")
> > >   (match_operand:SWI48 2 "x86_64_general_operand"
> > > "r,e,BM,0,le,r,e,BM")))]
> > > --
> > > 2.31.1



-- 
BR,
Hongtao


Re: [PATCH] MATCH: add abs support for half float

2024-07-31 Thread Kugan Vivekanandarajah

On Mon, Jul 29, 2024 at 10:11 AM Andrew Pinski  wrote:
>
> On Mon, Jul 29, 2024 at 12:57 AM Kugan Vivekanandarajah
>  wrote:
> >
> > On Thu, Jul 25, 2024 at 10:19 PM Richard Biener
> >  wrote:
> > >
> > > On Thu, Jul 25, 2024 at 4:42 AM Kugan Vivekanandarajah
> > >  wrote:
> > > >
> > > > On Tue, Jul 23, 2024 at 11:56 PM Richard Biener
> > > >  wrote:
> > > > >
> > > > > On Tue, Jul 23, 2024 at 10:27 AM Kugan Vivekanandarajah
> > > > >  wrote:
> > > > > >
> > > > > > On Tue, Jul 23, 2024 at 10:35 AM Andrew Pinski  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Mon, Jul 22, 2024 at 5:26 PM Kugan Vivekanandarajah
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Revised based on the comment and moved it into existing 
> > > > > > > > patterns as.
> > > > > > > >
> > > > > > > > gcc/ChangeLog:
> > > > > > > >
> > > > > > > > * match.pd: Extend A CMP 0 ? A : -A into (type)A CMP 0 ? A : -A.
> > > > > > > > Extend A CMP 0 ? A : -A into (type) A CMP 0 ? A : -A.
> > > > > > > >
> > > > > > > > gcc/testsuite/ChangeLog:
> > > > > > > >
> > > > > > > > * gcc.dg/tree-ssa/absfloat16.c: New test.
> > > > > > >
> > > > > > > The testcase needs to make sure it runs only for targets that 
> > > > > > > support
> > > > > > > float16 so like:
> > > > > > >
> > > > > > > /* { dg-require-effective-target float16 } */
> > > > > > > /* { dg-add-options float16 } */
> > > > > > Added in the attached version.
> > > > >
> > > > > > + /* (type)A >=/> 0 ? A : -A    same as abs (A) */
> > > > >   (for cmp (ge gt)
> > > > >(simplify
> > > > > -   (cnd (cmp @0 zerop) @1 (negate @1))
> > > > > -(if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0))
> > > > > -&& !TYPE_UNSIGNED (TREE_TYPE(@0))
> > > > > -&& bitwise_equal_p (@0, @1))
> > > > > +   (cnd (cmp (convert?@0 @1) zerop) @2 (negate @2))
> > > > > +(if (!HONOR_SIGNED_ZEROS (TREE_TYPE (@1))
> > > > > +&& !TYPE_UNSIGNED (TREE_TYPE (@1))
> > > > > +&& ((VECTOR_TYPE_P (type)
> > > > > + && tree_nop_conversion_p (TREE_TYPE (@0), TREE_TYPE 
> > > > > (@1)))
> > > > > +   || (!VECTOR_TYPE_P (type)
> > > > > +   && (TYPE_PRECISION (TREE_TYPE (@1))
> > > > > > +   <= TYPE_PRECISION (TREE_TYPE (@0)))))
> > > > > +&& bitwise_equal_p (@1, @2))
> > > > >
> > > > > I wonder about the bitwise_equal_p which tests @1 against @2 now
> > > > > with the convert still applied to @1 - that looks odd.  You are 
> > > > > allowing
> > > > > sign-changing conversions but doesn't that change ge/gt behavior?
> > > > > Also why are sign/zero-extensions not OK for vector types?
> > > > Thanks for the review.
> > > > My main motivation here is for _Float16  as below.
> > > >
> > > > _Float16 absfloat16 (_Float16 x)
> > > > {
> > > >   float _1;
> > > >   _Float16 _2;
> > > >   _Float16 _4;
> > > >[local count: 1073741824]:
> > > >   _1 = (float) x_3(D);
> > > >   if (_1 < 0.0)
> > > > goto ; [41.00%]
> > > >   else
> > > > goto ; [59.00%]
> > > >[local count: 440234144]:\
> > > >   _4 = -x_3(D);
> > > >[local count: 1073741824]:
> > > >   # _2 = PHI <_4(3), x_3(D)(2)>
> > > >   return _2;
> > > > }
> > > >
> > > > This is why I added  bitwise_equal_p test of @1 against @2 with
> > > > TYPE_PRECISION checks.
> > > > I agree that I will have to check for sign-changing conversions.
> > > >
> > > > Just to keep it simple, I disallowed vector types. I am not sure if
> > > > this would  hit vec types. I am happy to handle this if that is
> > > > needed
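
To make the sign-changing conversion concern from the review concrete, here is a minimal C sketch. It models the source-level idiom only, not the match.pd pattern itself, and both function names are made up for illustration. When the compared value goes through a conversion to an unsigned type, the comparison against zero is always true, so the select no longer computes an absolute value; a sign-preserving widening conversion keeps the comparison result unchanged.

```c
/* Hypothetical helper: the comparison sees the value after a
   sign-changing conversion.  (unsigned) a >= 0 holds for every a,
   so this always returns a unchanged and is NOT abs (a).  */
int
select_after_unsigned_convert (int a)
{
  return (unsigned) a >= 0u ? a : -a;
}

/* Hypothetical helper: the comparison sees the value after a
   sign-preserving widening conversion.  (long long) a >= 0 has the
   same truth value as a >= 0, so this matches abs (a) for every a
   other than INT_MIN.  */
int
select_after_widening_convert (int a)
{
  return (long long) a >= 0 ? a : -a;
}
```

This is why the precision check alone does not seem sufficient: the signedness of the converted type also has to match for the ge/gt comparison to mean the same thing.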

Re: [PATCH] rs6000, document built-ins vec_test_lsbb_all_ones and, vec_test_lsbb_all_zeros

2024-07-31 Thread Kewen.Lin
on 2024/8/1 01:52, Carl Love wrote:
> Kewen:
> 
> On 7/31/24 2:12 AM, Kewen.Lin wrote:
>> Hi Carl,
>>
>> on 2024/7/27 06:56, Carl Love wrote:
>>> GCC maintainers:
>>>
>>> Per a report from a user, the existing vec_test_lsbb_all_ones and
>>> vec_test_lsbb_all_zeros built-ins are not documented in the GCC
>>> documentation file.
>>>
>>> The following patch adds missing documentation for the
>>> vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins.
>>>
>>> Please let me know if the patch is acceptable for mainline.  Thanks.
>>>
>>>  Carl
>>>
>>> ---
>>> rs6000, document built-ins vec_test_lsbb_all_ones and 
>>> vec_test_lsbb_all_zeros
>>>
>>> Add documentation for the Power 10 built-ins vec_test_lsbb_all_ones
>>> and vec_test_lsbb_all_zeros.  The vec_test_lsbb_all_ones built-in
>>> returns 1 if the least significant bit in each byte is a 1, returns
>>> 0 otherwise.  Similarly, vec_test_lsbb_all_zeros returns a 1 if
>>> the least significant bit in each byte is a zero and 0 otherwise.
>>>
>>> The test cases for the built-ins are in files:
>>>    gcc/testsuite/gcc.target/powerpc/lsbb.c
>>>    gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c
>>>
>>>
>>> gcc/ChangeLog:
>>>      * doc/extend.texi (vec_test_lsbb_all_ones,
>>>      vec_test_lsbb_all_zeros): Add documentation for the
>>>      existing built-ins.
>>> ---
>>>   gcc/doc/extend.texi | 15 +++
>>>   1 file changed, 15 insertions(+)
>>>
>>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>>> index 83ff168faf6..96e41c9a905 100644
>>> --- a/gcc/doc/extend.texi
>>> +++ b/gcc/doc/extend.texi
>>> @@ -23240,6 +23240,21 @@ signed long long will sign extend the rightmost 
>>> byte of each doubleword.
>>>   The following additional built-in functions are also available for the
>>>   PowerPC family of processors, starting with ISA 3.1 
>>> (@option{-mcpu=power10}):
>>>
>>> +@smallexample
>>> +@exdent int vec_test_lsbb_all_ones (vector char);
>> I think we need to specify "unsigned" char explicitly since we don't actually
>> allow vector "signed" char as the below testing shows:
>>
>> int foo11 (vector signed char va)
>> {
>>    return vec_test_lsbb_all_ones (va);
>> }
>>
>> :17:3: error: invalid parameter combination for AltiVec intrinsic 
>> '__builtin_vec_xvtlsbb_all_ones'
>>     17 |   return vec_test_lsbb_all_ones (va);
>>
>>
>> Now we make these two bifs as overload, but there is only one instance 
>> respectively,
> Yes, I noticed that the built-ins were defined as overloaded but only had one 
> definition.   Did seem odd to me.
> 
>> each taking "vector unsigned char" as its argument type, while the
>> corresponding instance prototype in the builtin table uses "vector signed
>> char".  That is inconsistent and odd.  I think we can just update the
>> prototype in the builtin table to "vector unsigned char" and remove the
>> entries in the overload table.  That can be a follow-up patch.
> 
> I didn't notice that it was signed in the instance prototype but unsigned in 
> the overloaded definition.  That is definitely inconsistent.
> 
> That said, should we just go ahead and support both signed and unsigned 
> argument versions of the all ones and all zeros built-ins?

Good question. I thought about that, but found that openxl only supports the
unsigned version, so I felt it's probably better to stay consistent with it.
I'm fine with either, though; if we decide to extend it to cover both signed
and unsigned, we should notify the openxl team to extend it as well.
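
For reference, the semantics described in the commit message can be modelled in portable scalar C as below. This is only a sketch: the function names are hypothetical, a 16-element byte array stands in for the vector argument, and it does not use the Power10 built-ins themselves. It uses unsigned bytes, in line with the "vector unsigned char" prototype discussed above.

```c
#include <stdint.h>

/* Portable scalar model (not the Power10 built-in): return 1 if the
   least significant bit of every byte in V is 1, otherwise 0.  */
int
model_test_lsbb_all_ones (const uint8_t v[16])
{
  for (int i = 0; i < 16; i++)
    if ((v[i] & 1) == 0)
      return 0;
  return 1;
}

/* Portable scalar model: return 1 if the least significant bit of
   every byte in V is 0, otherwise 0.  */
int
model_test_lsbb_all_zeros (const uint8_t v[16])
{
  for (int i = 0; i < 16; i++)
    if ((v[i] & 1) != 0)
      return 0;
  return 1;
}
```

Note that both models return 0 for an input whose bytes have mixed least significant bits, so the two functions are not complements of each other.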

openxl doc links:

https://www.ibm.com/docs/en/openxl-c-and-cpp-aix/17.1.2?topic=functions-vec-test-lsbb-all-ones
https://www.ibm.com/docs/en/openxl-c-and-cpp-aix/17.1.2?topic=functions-vec-test-lsbb-all-zeros

BR,
Kewen

> 
> For example
> 
> [VEC_TEST_LSBB_ALL_ONES, vec_test_lsbb_all_ones, 
> __builtin_vec_xvtlsbb_all_ones]
>   signed int __builtin_vec_xvtlsbb_all_ones (vsc);
>     XVTLSBB_ONES   LSBB_ALL_ONES_VSC
>   signed int __builtin_vec_xvtlsbb_all_ones (vuc);
>     XVTLSBB_ONES   LSBB_ALL_ONES_VUC
> 
> I tried this with the testcase, I borrowed from you and extended:
> 
> int foo11 (vector char va) <- 
> compi

[COMMITTED] Re: Re: [PATCH] RISC-V: NFC: Do not use zicond for pr105314 testcases

2024-07-31 Thread Xiao Zeng
2024-08-01 09:53  Jeff Law  wrote:
>
>
>
>On 7/30/24 7:05 PM, Xiao Zeng wrote:
>> 2024-07-31 03:10  Jeff Law  wrote:
>>>
>>>
>>>
>>> On 7/28/24 7:58 PM, Xiao Zeng wrote:
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>    * gcc.target/riscv/pr105314-rtl.c: Skip zicond.
>>>>    * gcc.target/riscv/pr105314-rtl32.c: Dotto.
>>>>    * gcc.target/riscv/pr105314.c: Dotto.
>>> Why do you want to skip zicond for this test?
>> Yes, I should provide as detailed a description as possible for each 
>> submitted patch.
>>>
>>> Jeff
>> riscv64-unknown-linux-gnu-gcc  -O2 -march=rv64gc_zicond -mabi=lp64d 
>> ../gcc/testsuite/gcc.target/riscv/pr105314.c -fdump-rtl-ce1 -S -o 
>> pr105314.c.S
>>
>> This output will be obtained:
>[ ... ]
>Thanks.  That's exactly what I needed. 
Yes, patches may appear more straightforward in the eyes of the submitter.
But sometimes they are difficult to understand for people without the background.

Providing detailed explanations may benefit everyone.
>
>This is fine for the trunk, though please fix the typo in your
>ChangeLog.  It's spelled "Ditto" rather than "Dotto". 
Pushed to the trunk after fixing this spelling error.

>
>jeff
Thanks
Xiao Zeng



RE: [PATCH] i386: Fix memory constraint for APX NF

2024-07-31 Thread Kong, Lingling



> -Original Message-
> From: Liu, Hongtao 
> Sent: Thursday, August 1, 2024 9:35 AM
> To: Kong, Lingling ; gcc-patches@gcc.gnu.org
> Cc: Wang, Hongyu 
> Subject: RE: [PATCH] i386: Fix memory constraint for APX NF
> 
> 
> 
> > -Original Message-
> > From: Kong, Lingling 
> > Sent: Thursday, August 1, 2024 9:30 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Liu, Hongtao ; Wang, Hongyu
> > 
> > Subject: [PATCH] i386: Fix memory constraint for APX NF
> >
> > The je constraint should be used for APX NDD ADD with register source
> > operand. The jM is for APX NDD patterns with immediate operand.
> But these 2 alternatives are for Non-NDD.
The jM constraint is for the size limit of 15 bytes when a non-default address
space is used, and it also applies to APX NF. The je constraint is for TLS code
with an EVEX prefix for ADD, and APX NF also has the EVEX prefix.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386.md (nf_mem_constraint): Fixed the constraint
> > for the define_subst_attr.
> > (nf_mem_constraint): Added new define_subst_attr.
> > (*add_1): Fixed the constraint.
> > ---
> >  gcc/config/i386/i386.md | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index fb10fdc9f96..aa7220ee17c 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -6500,7 +6500,8 @@
> >  (define_subst_attr "nf_name" "nf_subst" "_nf" "")
> >  (define_subst_attr "nf_prefix" "nf_subst" "%{nf%} " "")
> >  (define_subst_attr "nf_condition" "nf_subst" "TARGET_APX_NF" "true")
> > -(define_subst_attr "nf_mem_constraint" "nf_subst" "je" "m")
> > +(define_subst_attr "nf_add_mem_constraint" "nf_subst" "je" "m")
> > +(define_subst_attr "nf_mem_constraint" "nf_subst" "jM" "m")
> >  (define_subst_attr "nf_applied" "nf_subst" "true" "false")
> >  (define_subst_attr "nf_nonf_attr" "nf_subst"  "noapx_nf" "*")
> >  (define_subst_attr "nf_nonf_x64_attr" "nf_subst" "noapx_nf" "x64")
> > @@ -6514,7 +6515,7 @@
> > (clobber (reg:CC FLAGS_REG))])
> >
> >  (define_insn "*add_1"
> > -  [(set (match_operand:SWI48 0 "nonimmediate_operand"
> > "=rm,r,r,r,r,r,r,r")
> > +  [(set (match_operand:SWI48 0 "nonimmediate_operand"
> > + "=r,r,r,r,r,r,r,r")
> > (plus:SWI48
> >   (match_operand:SWI48 1 "nonimmediate_operand"
> > "%0,0,0,r,r,rje,jM,r")
> >   (match_operand:SWI48 2 "x86_64_general_operand"
> > "r,e,BM,0,le,r,e,BM")))]
> > --
> > 2.31.1


Re: [PATCH] RISC-V: NFC: Do not use zicond for pr105314 testcases

2024-07-31 Thread Jeff Law




On 7/30/24 7:05 PM, Xiao Zeng wrote:
> 2024-07-31 03:10  Jeff Law  wrote:
>>
>> On 7/28/24 7:58 PM, Xiao Zeng wrote:
>>> gcc/testsuite/ChangeLog:
>>>
>>>    * gcc.target/riscv/pr105314-rtl.c: Skip zicond.
>>>    * gcc.target/riscv/pr105314-rtl32.c: Dotto.
>>>    * gcc.target/riscv/pr105314.c: Dotto.
>> Why do you want to skip zicond for this test?
> Yes, I should provide as detailed a description as possible for each
> submitted patch.
>>
>> Jeff
> riscv64-unknown-linux-gnu-gcc  -O2 -march=rv64gc_zicond -mabi=lp64d
> ../gcc/testsuite/gcc.target/riscv/pr105314.c -fdump-rtl-ce1 -S -o pr105314.c.S
>
> This output will be obtained:
[ ... ]
Thanks.  That's exactly what I needed.

This is fine for the trunk, though please fix the typo in your
ChangeLog.  It's spelled "Ditto" rather than "Dotto".

jeff


RE: [PATCH] i386: Fix memory constraint for APX NF

2024-07-31 Thread Liu, Hongtao



> -Original Message-
> From: Kong, Lingling 
> Sent: Thursday, August 1, 2024 9:30 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Liu, Hongtao ; Wang, Hongyu
> 
> Subject: [PATCH] i386: Fix memory constraint for APX NF
> 
> The je constraint should be used for APX NDD ADD with register source
> operand. The jM is for APX NDD patterns with immediate operand.
But these 2 alternatives are for Non-NDD.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
> 
> gcc/ChangeLog:
> 
> * config/i386/i386.md (nf_mem_constraint): Fixed the constraint
> for the define_subst_attr.
> (nf_mem_constraint): Added new define_subst_attr.
> (*add_1): Fixed the constraint.
> ---
>  gcc/config/i386/i386.md | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index fb10fdc9f96..aa7220ee17c 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -6500,7 +6500,8 @@
>  (define_subst_attr "nf_name" "nf_subst" "_nf" "")
>  (define_subst_attr "nf_prefix" "nf_subst" "%{nf%} " "")
>  (define_subst_attr "nf_condition" "nf_subst" "TARGET_APX_NF" "true")
> -(define_subst_attr "nf_mem_constraint" "nf_subst" "je" "m")
> +(define_subst_attr "nf_add_mem_constraint" "nf_subst" "je" "m")
> +(define_subst_attr "nf_mem_constraint" "nf_subst" "jM" "m")
>  (define_subst_attr "nf_applied" "nf_subst" "true" "false")
>  (define_subst_attr "nf_nonf_attr" "nf_subst"  "noapx_nf" "*")
>  (define_subst_attr "nf_nonf_x64_attr" "nf_subst" "noapx_nf" "x64")
> @@ -6514,7 +6515,7 @@
> (clobber (reg:CC FLAGS_REG))])
> 
>  (define_insn "*add_1"
> -  [(set (match_operand:SWI48 0 "nonimmediate_operand"
> "=rm,r,r,r,r,r,r,r")
> +  [(set (match_operand:SWI48 0 "nonimmediate_operand"
> + "=r,r,r,r,r,r,r,r")
> (plus:SWI48
>   (match_operand:SWI48 1 "nonimmediate_operand"
> "%0,0,0,r,r,rje,jM,r")
>   (match_operand:SWI48 2 "x86_64_general_operand"
> "r,e,BM,0,le,r,e,BM")))]
> --
> 2.31.1


[PATCH] i386: Fix memory constraint for APX NF

2024-07-31 Thread Kong, Lingling
The je constraint should be used for APX NDD ADD with register source
operand. The jM is for APX NDD patterns with immediate operand.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

* config/i386/i386.md (nf_mem_constraint): Fixed the constraint
for the define_subst_attr.
(nf_mem_constraint): Added new define_subst_attr.
(*add_1): Fixed the constraint.
---
 gcc/config/i386/i386.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index fb10fdc9f96..aa7220ee17c 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6500,7 +6500,8 @@
 (define_subst_attr "nf_name" "nf_subst" "_nf" "")
 (define_subst_attr "nf_prefix" "nf_subst" "%{nf%} " "")
 (define_subst_attr "nf_condition" "nf_subst" "TARGET_APX_NF" "true")
-(define_subst_attr "nf_mem_constraint" "nf_subst" "je" "m")
+(define_subst_attr "nf_add_mem_constraint" "nf_subst" "je" "m")
+(define_subst_attr "nf_mem_constraint" "nf_subst" "jM" "m")
 (define_subst_attr "nf_applied" "nf_subst" "true" "false")
 (define_subst_attr "nf_nonf_attr" "nf_subst"  "noapx_nf" "*")
 (define_subst_attr "nf_nonf_x64_attr" "nf_subst" "noapx_nf" "x64")
@@ -6514,7 +6515,7 @@
(clobber (reg:CC FLAGS_REG))])

 (define_insn "*add_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r,r,r,r,r")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=r,r,r,r,r,r,r,r")
(plus:SWI48
  (match_operand:SWI48 1 "nonimmediate_operand" "%0,0,0,r,r,rje,jM,r")
  (match_operand:SWI48 2 "x86_64_general_operand" "r,e,BM,0,le,r,e,BM")))]
--
2.31.1


Re: [PATCH] LoongArch: Rework bswap{hi,si,di}2 definition

2024-07-31 Thread Lulu Cheng



On 2024/7/31 at 6:25 PM, Xi Ruoyao wrote:
> On Wed, 2024-07-31 at 16:57 +0800, Lulu Cheng wrote:
>> On 2024/7/29 at 3:58 PM, Xi Ruoyao wrote:
>>> Per a gcc-help thread we are generating sub-optimal code for
>>> __builtin_bswap{32,64}.  To fix it:
>>>
>>> - Use a single revb.d instruction for bswapdi2.
>>> - Use a single revb.2w instruction for bswapsi2 for TARGET_64BIT,
>>>   revb.2h + rotri.w for !TARGET_64BIT.
>>> - Use a single revb.2h instruction for bswapsi2 (x) r>> 16, and a single
>>>   revb.2w instruction for bswapdi2 (x) r>> 32.
>>>
>>> Unfortunately I cannot figure out a way to make the compiler generate
>>> revb.4h or revh.{2w,d} instructions.
>>
>> This optimization is really ingenious and I have no problem.
>>
>> I also haven't figured out how to generate revb.4h or revh.{2w,d}.
>> I think we can merge this patch first.
>
> Pushed r15-2433.

Ok. Thanks!

> FWIW I tried a naive pattern for revh.2w:
>
> (set (match_operand:DI 0 "register_operand" "=r")
>      (ior:DI
>        (and:DI
>          (ashift:DI (match_operand:DI 1 "register_operand" "r")
>                     (const_int 16))
>          (const_int 18446462603027742720))
>        (and:DI
>          (lshiftrt:DI (match_dup 1)
>                       (const_int 16))
>          (const_int 281470681808895))))
>
> But it seems too complex to be recognized.

I think it needs to be recognized as a bswap operation in the tree-bswap
phase, but that seems a bit difficult.
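
For readers following along, the RTL pattern quoted above corresponds to the C expression below, as far as I can tell. This is only a sketch for illustration: the function name is made up, and the two masks are the pattern's const_int values written in hex (18446462603027742720 is 0xffff0000ffff0000 and 281470681808895 is 0x0000ffff0000ffff).

```c
#include <stdint.h>

/* C equivalent of the quoted RTL pattern: swap the two 16-bit
   halfwords inside each 32-bit word of X.  */
uint64_t
swap_halfwords_in_words (uint64_t x)
{
  return ((x << 16) & UINT64_C (0xffff0000ffff0000))
	 | ((x >> 16) & UINT64_C (0x0000ffff0000ffff));
}
```

For example, 0x1122334455667788 becomes 0x3344112277885566.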







[PATCH] testsuite: add print-stack.exp

2024-07-31 Thread David Malcolm
I wrote this support file to help me debug Tcl issues in the
testsuite.

Adding a call to:

  print_stack_backtrace

somewhere in a .exp file (along with "load_lib print-stack.exp") leads
to the interpreter printing a backtrace in a form that e.g. Emacs can
consume, with filename:linenum: lines, and quoting the line of .exp
source code.

For example, adding a print_stack_backtrace to scansarif.exp in
run-sarif-pytest I get this output:

VVV START OF BACKTRACE VVV
  /home/david/coding/gcc-newgit/src/gcc/testsuite/lib/scansarif.exp:142: frame 16 in proc print_stack_backtrace
142 | print_stack_backtrace
  : frame 15 in proc run-sarif-pytest
  : frame 14 in proc dg-final-proc
  /usr/share/dejagnu/dg.exp:851: frame 13 in proc dg-final-proc
851 |   if {[catch "dg-final-proc $prog" errmsg]} {
  : frame 12 in proc saved-dg-test
  /home/david/coding/gcc-newgit/src/gcc/testsuite/lib/gcc-dg.exp:1080: frame 11 in proc saved-dg-test
1080 |  if { [ catch { eval saved-dg-test $args } errmsg ] } {
  /usr/share/dejagnu/dg.exp:559: frame 10 in proc dg-test
559 |   dg-test $testcase $options ${default-extra-options}
  /home/david/coding/gcc-newgit/src/gcc/testsuite/gcc.dg/sarif-output/sarif-output.exp:28: frame 9
28 | dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.c]] "" ""
  : frame 8
  : frame 7
  /usr/share/dejagnu/runtest.exp:1460: frame 6
1460 |  if { [catch "uplevel #0 source $test_file_name"] == 1 } {
  /usr/share/dejagnu/runtest.exp:1886: frame 5 in proc dg-runtest
1886 |  runtest $test_name
  /usr/share/dejagnu/runtest.exp:1845: frame 4 in proc dg-runtest
1845 |  foreach test_name [lsort [find ${dir} *.exp]] {
  /usr/share/dejagnu/runtest.exp:1788: frame 3 in proc dg-runtest
1788 |  foreach dir "${test_top_dirs}" {
  /usr/share/dejagnu/runtest.exp:1669: frame 2 in proc dg-runtest
1669 | foreach pass $multipass {
  /usr/share/dejagnu/runtest.exp:1619: frame 1 in proc dg-runtest
1619 | foreach current_target $target_list {
^^^  END OF BACKTRACE  ^^^

and can click on the lines in Emacs's compilation buffer to take
me to the relevant places.

I found this made it *much* easier to debug my .exp files.  That
said, I'm uncomfortable with Tcl, and so
(a) there may be a better way of doing this
(b) I may have made mistakes

OK for trunk?

gcc/testsuite/ChangeLog:
* lib/print-stack.exp: New file.

Signed-off-by: David Malcolm 
---
 gcc/testsuite/lib/print-stack.exp | 59 +++
 1 file changed, 59 insertions(+)
 create mode 100644 gcc/testsuite/lib/print-stack.exp

diff --git a/gcc/testsuite/lib/print-stack.exp 
b/gcc/testsuite/lib/print-stack.exp
new file mode 100644
index 000..5688d0a63de
--- /dev/null
+++ b/gcc/testsuite/lib/print-stack.exp
@@ -0,0 +1,59 @@
+# Copyright (C) 2024 Free Software Foundation, Inc.
+#  Contributed by David Malcolm .
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# Get the 1-based line for LINENUM from FILENAME as a string
+
+proc get_line { filename linenum } {
+set f [open $filename]
+set lines [split [read $f] \n]
+close $f
+return [lindex $lines [expr $linenum - 1] ]
+}
+
+# Print a backtrace of the Tcl interpreter's stack, showing
+# frames, levels, source file and line where available.
+
+proc print_stack_backtrace {} {
+set current_frame_level [info frame]
+puts "VVV START OF BACKTRACE VVV"
+for {set i [expr $current_frame_level - 1]} {$i > 0} {incr i -1} {
+   set frame [info frame $i]
+   if { [dict exists $frame "level"] } {
+   set level_num [dict get $frame "level"]
+   set relative_level_offset [expr 1 - $level_num]
+   set level [info level $relative_level_offset]
+   set procname [lindex $level 0]
+   # TODO: args = rest of $level, but this can be very long
+   } else {
+   set procname ""
+   }
+   set suffix ""
+   if { $procname != "" } {
+   set suffix " in proc $procname"
+   }
+   if { [dict get $frame "type"] == "source" } {
+   set fname [dict get $frame "file"]
+   set line [dict get $frame "line"]
+   puts "  $fname:$line: frame $i$suffix"
+   puts "$line | [get_line $fname $line]"
+   } else {
+   set type [dict get $frame "type"]
+ 

RE: [PATCH] aarch64: Improve Advanced SIMD popcount expansion by using SVE [PR113860]

2024-07-31 Thread Pengxuan Zheng (QUIC)
> Sorry for the slow review.
> 
> Pengxuan Zheng  writes:
> > This patch improves the Advanced SIMD popcount expansion by using SVE
> > if available.
> >
> > For example, GCC currently generates the following code sequence for V2DI:
> >   cnt v31.16b, v31.16b
> >   uaddlp  v31.8h, v31.16b
> >   uaddlp  v31.4s, v31.8h
> >   uaddlp  v31.2d, v31.4s
> >
> > However, by using SVE, we can generate the following sequence instead:
> >   ptrue   p7.b, all
> >   cnt z31.d, p7/m, z31.d
> >
> > Similar improvements can be made for V4HI, V8HI, V2SI and V4SI too.
> >
> > The scalar popcount expansion can also be improved similarly by using
> > SVE and those changes will be included in a separate patch.
> >
> > PR target/113860
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-simd.md (popcount2): Add
> TARGET_SVE
> > support.
> > * config/aarch64/aarch64-sve.md
> (@aarch64_pred_popcount): New
> > insn.
> > * config/aarch64/iterators.md (VPRED): Add V4HI, V8HI and V2SI.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/popcnt-sve.c: New test.
> >
> > Signed-off-by: Pengxuan Zheng 
> > ---
> >  gcc/config/aarch64/aarch64-simd.md|  9 ++
> >  gcc/config/aarch64/aarch64-sve.md | 12 +++
> >  gcc/config/aarch64/iterators.md   |  1 +
> >  gcc/testsuite/gcc.target/aarch64/popcnt-sve.c | 88
> > +++
> >  4 files changed, 110 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt-sve.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md
> > b/gcc/config/aarch64/aarch64-simd.md
> > index bbeee221f37..895d6e5eab5 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -3508,6 +3508,15 @@ (define_expand "popcount2"
> > (popcount:VDQHSD (match_operand:VDQHSD 1
> "register_operand")))]
> >"TARGET_SIMD"
> >{
> > +if (TARGET_SVE)
> > +  {
> > +   rtx p = aarch64_ptrue_reg (mode);
> > +   emit_insn (gen_aarch64_pred_popcount (operands[0],
> > +   p,
> > +   operands[1]));
> > +   DONE;
> > +  }
> > +
> >  /* Generate a byte popcount.  */
> >  machine_mode mode =  == 64 ? V8QImode : V16QImode;
> >  rtx tmp = gen_reg_rtx (mode);
> > diff --git a/gcc/config/aarch64/aarch64-sve.md
> > b/gcc/config/aarch64/aarch64-sve.md
> > index 5331e7121d5..b5021dd2da0 100644
> > --- a/gcc/config/aarch64/aarch64-sve.md
> > +++ b/gcc/config/aarch64/aarch64-sve.md
> > @@ -3168,6 +3168,18 @@ (define_insn "*cond__any"
> >}
> >  )
> >
> > +;; Popcount predicated with a PTRUE.
> > +(define_insn "@aarch64_pred_popcount"
> > +  [(set (match_operand:VDQHSD 0 "register_operand" "=w")
> > +   (unspec:VDQHSD
> > + [(match_operand: 1 "register_operand" "Upl")
> > +  (popcount:VDQHSD
> > +(match_operand:VDQHSD 2 "register_operand" "0"))]
> > + UNSPEC_PRED_X))]
> > +  "TARGET_SVE"
> > +  "cnt\t%Z0., %1/m, %Z2."
> > +)
> > +
> 
> Could you instead change:
> 
> (define_insn "@aarch64_pred_"
>   [(set (match_operand:SVE_I 0 "register_operand")
>   (unspec:SVE_I
> [(match_operand: 1 "register_operand")
>  (SVE_INT_UNARY:SVE_I
>(match_operand:SVE_I 2 "register_operand"))]
> UNSPEC_PRED_X))]
>   "TARGET_SVE"
>   {@ [ cons: =0 , 1   , 2 ; attrs: movprfx ]
>  [ w, Upl , 0 ; *  ] \t%0., %1/m,
> %2.
>  [ ?  , Upl , w ; yes] movprfx\t%0,
> %2\;\t%0., %1/m, %2.
>   }
> )
> 
> to use a new iterator SVE_VDQ_I, defined as:
> 
> (define_mode_iterator SVE_VDQ_I [SVE_I VDQI_I])
> 
> ?  That will give the benefit of the movprfx handling and avoid code
> duplication.  It will define some patterns that are initially unused, but 
> that's
> ok.  I think the direction of travel would be to use some of the others
> eventually.
> 
> OK with that change if there are no other comments in 24 hours.

Thanks, Richard. Here's the patch updated according to your feedback.
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/658929.html

I'll commit it if there are no other comments in 24 hours.

Thanks,
Pengxuan
> 
>

[PATCH v2] aarch64: Improve Advanced SIMD popcount expansion by using SVE [PR113860]

2024-07-31 Thread Pengxuan Zheng
This patch improves the Advanced SIMD popcount expansion by using SVE if
available.

For example, GCC currently generates the following code sequence for V2DI:
  cnt v31.16b, v31.16b
  uaddlp  v31.8h, v31.16b
  uaddlp  v31.4s, v31.8h
  uaddlp  v31.2d, v31.4s

However, by using SVE, we can generate the following sequence instead:
  ptrue   p7.b, all
  cnt z31.d, p7/m, z31.d

Similar improvements can be made for V4HI, V8HI, V2SI and V4SI too.

The scalar popcount expansion can also be improved similarly by using SVE and
those changes will be included in a separate patch.
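
As a rough illustration of the kind of source that reaches this expansion, here is a minimal sketch of a per-lane popcount over two 64-bit elements (the V2DI case). The function name `popcount_v2di` is made up for this example and is not taken from the patch or its testcase. Whether GCC vectorizes the loop and selects the SVE `cnt` sequence depends on the compiler version and options (something like `-O2 -march=armv8.2-a+sve` is an assumption here, not something the patch states).

```cpp
#include <cassert>
#include <cstdint>

// Two 64-bit lanes, mirroring the V2DI case in the description above.
constexpr int kLanes = 2;

// Hypothetical example function: per-lane population count.
// __builtin_popcountll is a GCC/Clang builtin.
void popcount_v2di(const std::uint64_t *in, std::uint64_t *out) {
  assert(in != nullptr && out != nullptr);
  for (int i = 0; i < kLanes; ++i)
    out[i] = static_cast<std::uint64_t>(__builtin_popcountll(in[i]));
}
```

Comparing the assembly for this function with and without SVE enabled is one way to see the two instruction sequences quoted above.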

PR target/113860

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (popcount2): Add TARGET_SVE
support.
* config/aarch64/aarch64-sve.md (@aarch64_pred_): Use new
iterator SVE_VDQ_I.
* config/aarch64/iterators.md (SVE_VDQ_I): New mode iterator.
(VPRED): Add V8QI, V16QI, V4HI, V8HI and V2SI.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/popcnt-sve.c: New test.

Signed-off-by: Pengxuan Zheng 
---
 gcc/config/aarch64/aarch64-simd.md|  9 ++
 gcc/config/aarch64/aarch64-sve.md | 13 +--
 gcc/config/aarch64/iterators.md   |  5 ++
 gcc/testsuite/gcc.target/aarch64/popcnt-sve.c | 88 +++
 4 files changed, 109 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt-sve.c

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index bbeee221f37..895d6e5eab5 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3508,6 +3508,15 @@ (define_expand "popcount2"
(popcount:VDQHSD (match_operand:VDQHSD 1 "register_operand")))]
   "TARGET_SIMD"
   {
+if (TARGET_SVE)
+  {
+   rtx p = aarch64_ptrue_reg (mode);
+   emit_insn (gen_aarch64_pred_popcount (operands[0],
+   p,
+   operands[1]));
+   DONE;
+  }
+
 /* Generate a byte popcount.  */
 machine_mode mode =  == 64 ? V8QImode : V16QImode;
 rtx tmp = gen_reg_rtx (mode);
diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index 5331e7121d5..eb3705ae515 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -3104,16 +3104,16 @@ (define_expand "2"
 
 ;; Integer unary arithmetic predicated with a PTRUE.
 (define_insn "@aarch64_pred_"
-  [(set (match_operand:SVE_I 0 "register_operand")
-   (unspec:SVE_I
+  [(set (match_operand:SVE_VDQ_I 0 "register_operand")
+   (unspec:SVE_VDQ_I
  [(match_operand: 1 "register_operand")
-  (SVE_INT_UNARY:SVE_I
-(match_operand:SVE_I 2 "register_operand"))]
+  (SVE_INT_UNARY:SVE_VDQ_I
+(match_operand:SVE_VDQ_I 2 "register_operand"))]
  UNSPEC_PRED_X))]
   "TARGET_SVE"
   {@ [ cons: =0 , 1   , 2 ; attrs: movprfx ]
- [ w, Upl , 0 ; *  ] \t%0., %1/m, 
%2.
- [ ?  , Upl , w ; yes] movprfx\t%0, 
%2\;\t%0., %1/m, %2.
+ [ w, Upl , 0 ; *  ] \t%Z0., %1/m, 
%Z2.
+ [ ?  , Upl , w ; yes] movprfx\t%Z0, 
%Z2\;\t%Z0., %1/m, %Z2.
   }
 )
 
@@ -3168,6 +3168,7 @@ (define_insn "*cond__any"
   }
 )
 
+
 ;; -
 ;;  [INT] General unary arithmetic corresponding to unspecs
 ;; -
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index f527b2cfeb8..ee3d1fb98fd 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -559,6 +559,9 @@ (define_mode_iterator SVE_I [VNx16QI VNx8QI VNx4QI VNx2QI
 ;; element modes
 (define_mode_iterator SVE_I_SIMD_DI [SVE_I V2DI])
 
+;; All SVE and Advanced SIMD integer vector modes.
+(define_mode_iterator SVE_VDQ_I [SVE_I VDQ_I])
+
 ;; SVE integer vector modes whose elements are 16 bits or wider.
 (define_mode_iterator SVE_HSDI [VNx8HI VNx4HI VNx2HI
VNx4SI VNx2SI
@@ -2278,6 +2281,8 @@ (define_mode_attr VPRED [(VNx16QI "VNx16BI") (VNx8QI 
"VNx8BI")
 (VNx32BF "VNx8BI")
 (VNx16SI "VNx4BI") (VNx16SF "VNx4BI")
 (VNx8DI "VNx2BI") (VNx8DF "VNx2BI")
+(V8QI "VNx8BI") (V16QI "VNx16BI")
+(V4HI "VNx4BI") (V8HI "VNx8BI") (V2SI "VNx2BI")
 (V4SI "VNx4BI") (V2DI "VNx2BI")])
 
 ;; ...and again in lower case.
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt-sve.c 
b/gcc/testsuite/gcc.target

[PATCH v2] aarch64: Improve Advanced SIMD popcount expansion by using SVE [PR113860]

2024-07-31 Thread Pengxuan Zheng
This has been approved and will be committed if there are no other comments
within a day.



[PATCH v2] match: Fix wrong code due to `(a ? e : f) !=/== (b ? e : f)` patterns [PR116120]

2024-07-31 Thread Andrew Pinski
When this pattern was generalized from handling only 0/-1, we missed that
the optimization is wrong if `e == f` is true; it needs an extra check for that.

This changes the patterns to be:
/* (a ? x : y) != (b ? x : y) --> (a^b & (x != y)) ? TRUE  : FALSE */
/* (a ? x : y) == (b ? x : y) --> (a^b & (x != y)) ? FALSE : TRUE  */
/* (a ? x : y) != (b ? y : x) --> (a^b | (x == y)) ? FALSE : TRUE  */
/* (a ? x : y) == (b ? y : x) --> (a^b | (x == y)) ? TRUE  : FALSE */

Also, this can't be done if X can be a NaN, since that changes the
value there too.
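
To illustrate why the extra `x != y` / `x == y` term and the NaN restriction are needed, here is a small scalar sketch of the first and last patterns. All function names are invented for this illustration and do not appear in match.pd; the check over a small integer range is only a sanity test, not a proof.

```cpp
#include <cassert>
#include <limits>

// (a ? x : y) != (b ? x : y), written out directly.
template <typename T>
bool ne_same_arms(bool a, bool b, T x, T y) {
  return (a ? x : y) != (b ? x : y);
}

// Rewritten form from the description: (a^b & (x != y)) ? TRUE : FALSE.
template <typename T>
bool ne_same_arms_rewritten(bool a, bool b, T x, T y) {
  return ((a ^ b) & (x != y)) ? true : false;
}

// (a ? x : y) == (b ? y : x), written out directly.
template <typename T>
bool eq_swapped_arms(bool a, bool b, T x, T y) {
  return (a ? x : y) == (b ? y : x);
}

// Rewritten form from the description: (a^b | (x == y)) ? TRUE : FALSE.
template <typename T>
bool eq_swapped_arms_rewritten(bool a, bool b, T x, T y) {
  return ((a ^ b) | (x == y)) ? true : false;
}

// The form used before the fix: (a^b) ? TRUE : FALSE.
inline bool ne_same_arms_old(bool a, bool b) { return a ^ b; }

// Compare direct and rewritten forms over all conditions and a small
// range of integer values.
bool rewrites_agree_on_ints() {
  for (int a = 0; a < 2; ++a)
    for (int b = 0; b < 2; ++b)
      for (int x = -2; x <= 2; ++x)
        for (int y = -2; y <= 2; ++y) {
          bool ca = a != 0, cb = b != 0;
          if (ne_same_arms(ca, cb, x, y) != ne_same_arms_rewritten(ca, cb, x, y))
            return false;
          if (eq_swapped_arms(ca, cb, x, y) != eq_swapped_arms_rewritten(ca, cb, x, y))
            return false;
        }
  return true;
}

// With x and y both NaN, the direct form is true (NaN != NaN) even when
// a == b, but the rewritten form is false; the two disagree.
bool nan_breaks_rewrite() {
  double n = std::numeric_limits<double>::quiet_NaN();
  return ne_same_arms(true, true, n, n) != ne_same_arms_rewritten(true, true, n, n);
}
```

The wrong-code case is `ne_same_arms(true, false, 3, 3)`: both sides select 3, so the comparison is false, while the old `a^b` form yields true.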

This still produces better code than the original case, and in many cases
(x != y) will still reduce to either false or true.

With this change we also need to make sure `a`, `b` and the resulting types
are all the same, for the same reason as in the previous patch.

I updated (well, added to) the testcases to make sure the right number of
comparisons is left.

Changes since v1:
* v2: Fixed the testcase names and fixed dg-run to be `dg-do run`. Added a
check for HONOR_NANS too.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/116120

gcc/ChangeLog:

* match.pd (`(a ? x : y) eq/ne (b ? x : y)`): Add test for `x != y`
in result.
(`(a ? x : y) eq/ne (b ? y : x)`): Add test for `x == y` in result.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/pr50.C: Add extra checks on the test.
* gcc.dg/tree-ssa/pr50-1.c: Likewise.
* gcc.dg/tree-ssa/pr50.c: Likewise.
* g++.dg/torture/pr116120-1.C: New test.
* g++.dg/torture/pr116120-2.C: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/match.pd   | 23 ++
 gcc/testsuite/g++.dg/torture/pr116120-1.C  | 32 +
 gcc/testsuite/g++.dg/torture/pr116120-2.C  | 53 ++
 gcc/testsuite/g++.dg/tree-ssa/pr50.C   | 10 
 gcc/testsuite/gcc.dg/tree-ssa/pr50-1.c |  9 
 gcc/testsuite/gcc.dg/tree-ssa/pr50.c   |  1 +
 6 files changed, 120 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr116120-1.C
 create mode 100644 gcc/testsuite/g++.dg/torture/pr116120-2.C

diff --git a/gcc/match.pd b/gcc/match.pd
index 881a827860f..c9c8478d286 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5632,21 +5632,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (vec_cond (bit_and (bit_not @0) @1) @2 @3)))
 #endif
 
-/* (a ? x : y) != (b ? x : y) --> (a^b) ? TRUE  : FALSE */
-/* (a ? x : y) == (b ? x : y) --> (a^b) ? FALSE : TRUE  */
-/* (a ? x : y) != (b ? y : x) --> (a^b) ? FALSE : TRUE  */
-/* (a ? x : y) == (b ? y : x) --> (a^b) ? TRUE  : FALSE */
+/* (a ? x : y) != (b ? x : y) --> (a^b & (x != y)) ? TRUE  : FALSE */
+/* (a ? x : y) == (b ? x : y) --> (a^b & (x != y)) ? FALSE : TRUE  */
+/* (a ? x : y) != (b ? y : x) --> (a^b | (x == y)) ? FALSE : TRUE  */
+/* (a ? x : y) == (b ? y : x) --> (a^b | (x == y)) ? TRUE  : FALSE */
+/* These are only valid if x and y don't have NaNs. */
 (for cnd (cond vec_cond)
  (for eqne (eq ne)
   (simplify
(eqne:c (cnd @0 @1 @2) (cnd @3 @1 @2))
-(if (types_match (TREE_TYPE (@0), TREE_TYPE (@3)))
- (cnd (bit_xor @0 @3) { constant_boolean_node (eqne == NE_EXPR, type); }
+(if (!HONOR_NANS (@1)
+&& types_match (TREE_TYPE (@0), TREE_TYPE (@3))
+ && types_match (type, TREE_TYPE (@0)))
+ (cnd (bit_and (bit_xor @0 @3) (ne:type @1 @2))
+  { constant_boolean_node (eqne == NE_EXPR, type); }
   { constant_boolean_node (eqne != NE_EXPR, type); })))
   (simplify
(eqne:c (cnd @0 @1 @2) (cnd @3 @2 @1))
-(if (types_match (TREE_TYPE (@0), TREE_TYPE (@3)))
- (cnd (bit_xor @0 @3) { constant_boolean_node (eqne != NE_EXPR, type); }
+(if (!HONOR_NANS (@1)
+&& types_match (TREE_TYPE (@0), TREE_TYPE (@3))
+ && types_match (type, TREE_TYPE (@0)))
+ (cnd (bit_ior (bit_xor @0 @3) (eq:type @1 @2))
+  { constant_boolean_node (eqne != NE_EXPR, type); }
   { constant_boolean_node (eqne == NE_EXPR, type); })
 
 /* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask
diff --git a/gcc/testsuite/g++.dg/torture/pr116120-1.C 
b/gcc/testsuite/g++.dg/torture/pr116120-1.C
new file mode 100644
index 000..209946f17a4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr116120-1.C
@@ -0,0 +1,32 @@
+// { dg-do run }
+// PR tree-optimization/116120
+
+// The optimization for `(a ? x : y) != (b ? x : y)`
+// missed that x and y could be the same value.
+
+typedef int v4si __attribute((__vector_size__(1 * sizeof(int;
+v4si f1(v4si a, v4si b, v4si c, v4si d, v4si e, v4si f) {
+  v4si X = a == b ? e : f;
+  v4si Y = c == d ? e : f;
+  return (X != Y); // ~(X == Y ? -1 : 0) (x ^ Y)
+}
+
+int f2(int a, int b, int c, int d, int e, int f) {
+  int X = a == b ? e : f;
+  int Y = c == d ? e : f;
+  return (X != Y) ? -1 : 0; // ~(X ==

Re: Ping^6: [PATCH 0/2] Fix two test failures with --enable-default-pie [PR70150]

2024-07-31 Thread Joseph Myers
On Mon, 22 Jul 2024, Xi Ruoyao wrote:

> On Mon, 2024-05-06 at 12:45 +0800, Xi Ruoyao wrote:
> > In GCC 14.1-rc1, there are two new (compared to GCC 13) failures if
> > the build is configured --enable-default-pie.  Let's fix them.
> > 
> > Tested on x86_64-linux-gnu.  Ok for trunk and releases/gcc-14?
> > 
> > Xi Ruoyao (2):
> >   i386: testsuite: Add -no-pie for pr113689-1.c [PR70150]
> >   i386: testsuite: Adapt fentryname3.c for r14-811 change [PR70150]
> > 
> >  gcc/testsuite/gcc.target/i386/fentryname3.c | 3 +--
> >  gcc/testsuite/gcc.target/i386/pr113689-1.c  | 2 +-
> >  2 files changed, 2 insertions(+), 3 deletions(-)

OK in the absence of i386 maintainer objections within 72 hours.

-- 
Joseph S. Myers
josmy...@redhat.com

Re: [PATCH] libstdc++: Fix future.wait_until when given a negative time_point

2024-07-31 Thread Jonathan Wakely
On Wed, 24 Jul 2024 at 14:14, William Tsai  wrote:
>
> The template `future.wait_until` expands to
> `_M_load_and_test_until_impl`, which calls
> `_M_load_and_test_until*` with the given time_point cast to seconds and
> nanoseconds. The callee expects the caller to provide valid values,
> but the caller does not check them. One possible error is that if
> `future.wait_until` is given a negative time_point, the underlying
> system call will raise an error, as the system call does not accept
> seconds < 0 or nanoseconds < 0.

Thanks for the patch, it looks correct. The futex syscall returns
EINVAL in this case, which we don't handle, so the caller loops and
keeps calling the syscall again, which fails again the same way.
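
A minimal sketch of how a negative time_point ends up as an invalid seconds/nanoseconds pair may help here. `SplitTime`, `split_time_point` and `is_valid_for_wait` are hypothetical names for this illustration only, not libstdc++ internals; the split mirrors the `time_point_cast` / `duration_cast` pair in the patch context, and the validity test mirrors the check the patch adds.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical holder for the two values handed to the wait routine.
struct SplitTime {
  long long seconds;
  long long nanoseconds;
};

// Split a time_point into whole seconds plus a nanosecond remainder.
// time_point_cast truncates toward zero, so a small negative time_point
// gives 0 seconds and a negative nanosecond remainder.
SplitTime split_time_point(std::chrono::steady_clock::time_point atime) {
  namespace chrono = std::chrono;
  auto s = chrono::time_point_cast<chrono::seconds>(atime);
  auto ns = chrono::duration_cast<chrono::nanoseconds>(atime - s);
  return {s.time_since_epoch().count(), ns.count()};
}

// Mirrors the condition added by the patch: both parts must be non-negative.
bool is_valid_for_wait(const SplitTime &t) {
  return t.seconds >= 0 && t.nanoseconds >= 0;
}
```

For the -10 ms time_point from the testcase, this split yields 0 seconds and -10000000 nanoseconds, so it is the nanosecond field that is out of range.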

I think it would be good to mention EINVAL, e.g. "will raise an EINVAL
error" instead of just "will raise an error".

It would also be good to add a test to the testsuite for this.

Do you have git write access, or do you need me to push it once it's approved?


>
> Following is a simple testcase:
> ```
> #include <chrono>
> #include <future>
> #include <iostream>
>
> using namespace std;
>
> int main() {
> promise p;
> future f = p.get_future();
> chrono::steady_clock::time_point tp(chrono::milliseconds{-10});
> future_status status = f.wait_until(tp);
> if (status == future_status::timeout) {
> cout << "Timed out" << endl;
> }
> return 0;
> }
> ```
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/atomic_futex.h: Check that __s and __ns are valid
> before calling _M_load_and_test_until*.
>
> Signed-off-by: William Tsai 
> ---
>  libstdc++-v3/include/bits/atomic_futex.h | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/libstdc++-v3/include/bits/atomic_futex.h 
> b/libstdc++-v3/include/bits/atomic_futex.h
> index dd654174873..4c31946a97f 100644
> --- a/libstdc++-v3/include/bits/atomic_futex.h
> +++ b/libstdc++-v3/include/bits/atomic_futex.h
> @@ -173,6 +173,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>auto __s = chrono::time_point_cast(__atime);
>auto __ns = chrono::duration_cast(__atime - __s);
>// XXX correct?
> +  if ((__s.time_since_epoch().count() < 0) || (__ns.count() < 0))
> +   return false;
>return _M_load_and_test_until(__assumed, __operand, __equal, __mo,
>   true, __s.time_since_epoch(), __ns);
>  }
> @@ -186,6 +188,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>auto __s = chrono::time_point_cast(__atime);
>auto __ns = chrono::duration_cast(__atime - __s);
>// XXX correct?
> +  if ((__s.time_since_epoch().count() < 0) || (__ns.count() < 0))
> +   return false;
>return _M_load_and_test_until_steady(__assumed, __operand, __equal, 
> __mo,
>   true, __s.time_since_epoch(), __ns);
>  }
> --
> 2.37.1
>



Re: [PATCH v2] c++/coroutines: only defer expanding co_{await,return,yield} if dependent [PR112341]

2024-07-31 Thread Jason Merrill

On 7/31/24 3:56 PM, Arsen Arsenović wrote:

Okay, I've reworked it, and it built and passed coroutine tests.
Regstrapping overnight.  Is the following OK with you?


OK.


-- >8 --
By doing so, we can get diagnostics in template decls when we know we
can.  For instance, in the following:

   awaitable g();
   template
   task f()
   {
 co_await g();
 co_yield 1;
 co_return "foo";
   }

... the coroutine promise type in each statement is always
std::coroutine_handle::promise_type, and all of the operands are
not type-dependent, so we can always compute the resulting types (and
expected types) of these expressions and statements.

Also, when we do not know the type of the CO_AWAIT_EXPR or
CO_YIELD_EXPR, we now return NULL_TREE as the type rather than
unknown_type_node.  This is more correct, since the type is not unknown,
it just isn't determined yet.  This also means we can remove the
CO_AWAIT_EXPR and CO_YIELD_EXPR special-cases from
type_dependent_expression_p.

PR c++/112341 - error: insufficient contextual information to determine type on 
co_await result in function template

gcc/cp/ChangeLog:

PR c++/112341
* coroutines.cc (struct coroutine_info): Also cache the
traits type.
(ensure_coro_initialized): New function.  Makes sure we have
initialized the coroutine state successfully, or informs the
caller should it fail to do so.  Extracted from
coro_promise_type_found_p.
(coro_get_traits_class): New function.  Gets the (cached)
coroutine traits type for a given coroutine.  Extracted from
coro_promise_type_found_p and refactored to cache the result.
(coro_promise_type_found_p): Use the two functions above.
(build_template_co_await_expr): New function.  Builds a
CO_AWAIT_EXPR representing a CO_AWAIT_EXPR in a template
declaration.
(build_co_await): Use the above if processing_template_decl, and
give it a proper type.
(coro_dependent_p): New function.  Returns true iff its
argument is a type-dependent expression OR the current function's
traits class is type-dependent.
(finish_co_await_expr): Defer expansion only in the case
coro_dependent_p returns true.
(finish_co_yield_expr): Ditto.
(finish_co_return_stmt): Ditto.
* pt.cc (type_dependent_expression_p): Do not treat
CO_AWAIT/CO_YIELD specially.

gcc/testsuite/ChangeLog:

PR c++/112341
* g++.dg/coroutines/pr112341-2.C: New test.
* g++.dg/coroutines/pr112341-3.C: New test.
* g++.dg/coroutines/torture/co-yield-03-tmpl-nondependent.C: New
test.
* g++.dg/coroutines/pr112341.C: New test.
---
  gcc/cp/coroutines.cc  | 157 ++
  gcc/cp/pt.cc  |   5 -
  gcc/testsuite/g++.dg/coroutines/pr112341-2.C  |  25 +++
  gcc/testsuite/g++.dg/coroutines/pr112341-3.C  |  65 
  gcc/testsuite/g++.dg/coroutines/pr112341.C|  21 +++
  .../torture/co-yield-03-tmpl-nondependent.C   | 140 
  6 files changed, 376 insertions(+), 37 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/coroutines/pr112341-2.C
  create mode 100644 gcc/testsuite/g++.dg/coroutines/pr112341-3.C
  create mode 100644 gcc/testsuite/g++.dg/coroutines/pr112341.C
  create mode 100644 
gcc/testsuite/g++.dg/coroutines/torture/co-yield-03-tmpl-nondependent.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 08a610afc82b..b535519b56d1 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -85,6 +85,7 @@ struct GTY((for_user)) coroutine_info
tree actor_decl;/* The synthesized actor function.  */
tree destroy_decl;  /* The synthesized destroy function.  */
tree promise_type;  /* The cached promise type for this function.  */
+  tree traits_type;   /* The cached traits type for this function.  */
tree handle_type;   /* The cached coroutine handle for this function.  */
tree self_h_proxy;  /* A handle instance that is used as the proxy for the
 one that will eventually be allocated in the coroutine
@@ -527,11 +528,12 @@ find_promise_type (tree traits_class)
return promise_type;
  }
  
+/* Perform initialization of the coroutine processor state, if not done
+   before.  */
+
  static bool
-coro_promise_type_found_p (tree fndecl, location_t loc)
+ensure_coro_initialized (location_t loc)
  {
-  gcc_assert (fndecl != NULL_TREE);
-
if (!coro_initialized)
  {
/* Trees we only need to create once.
@@ -569,6 +571,30 @@ coro_promise_type_found_p (tree fndecl, location_t loc)
  
coro_initialized = true;

  }
+  return true;
+}
+
+/* Try to get the coroutine traits class.  */
+static tree
+coro_get_traits_class (tree fndecl, location_t loc)
+{
+  gcc_assert (fndecl != NULL_TREE);
+  gcc_assert (coro_initialized);
+
+  coroutine_info *coro_info = 

Re: [PATCH ver 2] rs6000, Add new overloaded vector shift builtin int128, varients

2024-07-31 Thread Carl Love



Kewen:

On 7/29/24 3:21 AM, Kewen.Lin wrote:

+@smallexample
+@exdent vector signed __int128 vec_sld (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_sld (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
+@exdent vector signed __int128 vec_sldw (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_sldw (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
+@exdent vector signed __int128 vec_slo (vector signed __int128,
+vector signed char);
+@exdent vector signed __int128 vec_slo (vector signed __int128,
+vector unsigned char);
+@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
+vector signed char);
+@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
+vector unsigned char);
+@exdent vector signed __int128 vec_sro (vector signed __int128,
+vector signed char);
+@exdent vector signed __int128 vec_sro (vector signed __int128,
+vector unsigned char);
+@exdent vector unsigned __int128 vec_sro (vector unsigned __int128,
+vector signed char);
+@exdent vector unsigned __int128 vec_sro (vector unsigned __int128,
+vector unsigned char);
+@exdent vector signed __int128 vec_srl (vector signed __int128,
+vector unsigned char);
+@exdent vector unsigned __int128 vec_srl (vector unsigned __int128,
+vector unsigned char);
+@end smallexample
+
+The above instances are extensions of the existing overloaded built-ins
+@code{vec_sld}, @code{vec_sldw}, @code{vec_slo}, @code{vec_sro}, @code{vec_srl}
+that are documented in the PVIPR.
+
  @findex vec_srdb

Nit: The above new @smallexample section and its associated description should
be placed after this @findex vec_srdb (otherwise it breaks the connection
between the index and the content of vec_srdb),

Yes, my bad.  I didn't notice I got the findex vec_srdb in the wrong place.


  but personally I preferred it to be placed at
the end of this node, that is: after
"int vec_any_le (vector unsigned __int128, vector unsigned __int128);
@end smallexample
" as what's in your previous version, since most of these beginning entries have
their headings but this @smallexample section doesn't have a heading, it looks a
bit weird.


OK, perhaps I didn't understand where you wanted it in the previous 
email.  I moved it.  Hopefully I have it correct this time.



  Vector Splat
diff --git 
a/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c 
b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
new file mode 100644
index 000..65e8e94ec07
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
@@ -0,0 +1,358 @@
+/* { dg-do run  { target power10_hw } } */
+/* { dg-do link { target { ! power10_hw } } } */
+/* { dg-require-effective-target power10_ok } */

As Peter pointed out in another thread, you need an int128 effective-target
check as well, otherwise it will fail with power10 -m32.

Another nit: power10_hw should already guarantee power10_ok, so power10_ok
is only required for dg-do link.


Changed to:

+/* { dg-do run  { target power10_hw } } */
+/* { dg-do compile  { target { ! power10_hw } } } */
+/* { dg-require-effective-target int128 } */

per the discussion/feedback from Kewen and Peter.

 Carl


Re: [PATCH v2] c: Add support for byte arrays in C2Y

2024-07-31 Thread Joseph Myers
On Sat, 13 Jul 2024, Martin Uecker wrote:

> This marks structures which include a byte array
> as typeless storage for all C language modes.
> 
> 
> Bootstrapped and regression tested on x86_64.
> 
> 
> 
> 
> c: Add support for byte arrays in C2Y
> 
> To get correct aliasing behavior requires that structures and unions
> that contain a byte array, i.e. an array of non-atomic character
> type (N3254), are marked with TYPE_TYPELESS_STORAGE.  This change
> affects also earlier language modes.
> 
> gcc/c/
> * c-decl.cc (grokdeclarator, finish_struct): Set and
> propagate TYPE_TYPELESS_STORAGE.
> 
> gcc/testsuite/
> * gcc.dg/c2y-byte-alias-1.c: New test.
> * gcc.dg/c2y-byte-alias-2.c: New test.
> * gcc.dg/c2y-byte-alias-3.c: New test.

OK.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [RFC/RFA] [PATCH v2 09/12] Add symbolic execution support.

2024-07-31 Thread Andrew Pinski
On Wed, Jul 31, 2024 at 3:42 AM Mariam Arutunian
 wrote:
>
> Gives an opportunity to execute the code on bit level,
>assigning symbolic values to the variables which don't have initial values.
>Supports only CRC specific operations.
>
>Example:
>
>uint8_t crc;
>uint8_t pol = 1;
>crc = crc ^ pol;
>
>during symbolic execution crc's value will be:
>crc(8), crc(7), ... crc(1), crc(0) ^ 1
>
>Author: Matevos Mehrabyan 
>
>  gcc/
>
>* Makefile.in (OBJS): Add sym-exec/expression.o,
>sym-exec/state.o, sym-exec/condition.o.
>* configure (sym-exec): New subdir.

I have one thing to mention, and I am not 100% sure it is a big issue:
generic file names like expression.o can cause problems when added to an
archive (in this case libbackend.a), because the full path of the object
file is not always used. I am not sure this is a problem we need to worry
about right now, but it might become one in the future, and we should
document it in the coding conventions for naming things that will be
included in OBJS (which puts them into libbackend.a).

Thanks,
Andrew Pinski

>
>  gcc/sym-exec/
>
>* condition.cc: New file.
>* condition.h: New file.
>* expression-is-a-helper.h: New file.
>* expression.cc: New file.
>* expression.h: New file.
>* state.cc: New file.
>* state.h: New file.
>
>Signed-off-by: Mariam Arutunian 
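
The quoted `crc = crc ^ pol` example can be illustrated with a very small bit-level sketch. This is not the sym-exec implementation from the patch: `SymByte`, `make_symbolic` and `xor_constant` are hypothetical names, each bit is modelled as a plain string expression, and bits are numbered 0 (least significant) to 7, which is a choice of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// One symbolic 8-bit value: element i is the expression for bit i,
// with index 0 as the least significant bit.
using SymByte = std::array<std::string, 8>;

// Give every bit of an uninitialized variable its own symbol, e.g. "crc(3)".
SymByte make_symbolic(const std::string &name) {
  SymByte v;
  for (int i = 0; i < 8; ++i)
    v[i] = name + "(" + std::to_string(i) + ")";
  return v;
}

// XOR with a known constant: bits where the constant is 0 are unchanged,
// bits where it is 1 become "<expr> ^ 1".
SymByte xor_constant(SymByte v, std::uint8_t k) {
  for (int i = 0; i < 8; ++i)
    if ((k >> i) & 1u)
      v[i] += " ^ 1";
  return v;
}
```

With `pol = 1`, only the lowest bit picks up the `^ 1` term and the other bits stay purely symbolic, which matches the shape of the result in the quoted description.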


[PATCH 1/8] fortran: Add tests covering inline MINLOC/MAXLOC without DIM [PR90608]

2024-07-31 Thread Mikael Morin
From: Mikael Morin 

Tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

Add the tests covering the various cases for which we are about to implement
inline expansion of MINLOC and MAXLOC.  Those are cases where the DIM
argument is not present.

PR fortran/90608

gcc/testsuite/ChangeLog:

* gfortran.dg/maxloc_7.f90: New test.
* gfortran.dg/maxloc_with_mask_1.f90: New test.
* gfortran.dg/minloc_8.f90: New test.
* gfortran.dg/minloc_with_mask_1.f90: New test.
---
 gcc/testsuite/gfortran.dg/maxloc_7.f90| 220 ++
 .../gfortran.dg/maxloc_with_mask_1.f90| 393 ++
 gcc/testsuite/gfortran.dg/minloc_8.f90| 220 ++
 .../gfortran.dg/minloc_with_mask_1.f90| 392 +
 4 files changed, 1225 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/maxloc_7.f90
 create mode 100644 gcc/testsuite/gfortran.dg/maxloc_with_mask_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minloc_8.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minloc_with_mask_1.f90

diff --git a/gcc/testsuite/gfortran.dg/maxloc_7.f90 
b/gcc/testsuite/gfortran.dg/maxloc_7.f90
new file mode 100644
index 000..a875083052a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/maxloc_7.f90
@@ -0,0 +1,220 @@
+! { dg-do run }
+!
+! PR fortran/90608
+! Check the correct behaviour of the inline maxloc implementation,
+! when there is no optional argument.
+
+program p
+  implicit none
+  integer, parameter :: data5(*) = (/ 1, 7, 2, 7, 0 /)
+  integer, parameter :: data64(*) = (/ 2, 5, 4, 6, 0, 9, 3, 5,  &
+   4, 4, 1, 7, 3, 2, 1, 2,  &
+   5, 4, 6, 0, 9, 3, 5, 4,  &
+   4, 1, 7, 3, 2, 1, 2, 5,  &
+   4, 6, 0, 9, 3, 5, 4, 4,  &
+   1, 7, 3, 2, 1, 2, 5, 4,  &
+   6, 0, 9, 3, 5, 4, 4, 1,  &
+   7, 3, 2, 1, 2, 5, 4, 6  /)
+  call check_int_const_shape_rank_1
+  call check_int_const_shape_rank_3
+  call check_int_const_shape_empty_4
+  call check_int_alloc_rank_1
+  call check_int_alloc_rank_3
+  call check_int_alloc_empty_4
+  call check_real_const_shape_rank_1
+  call check_real_const_shape_rank_3
+  call check_real_const_shape_empty_4
+  call check_real_alloc_rank_1
+  call check_real_alloc_rank_3
+  call check_real_alloc_empty_4
+  call check_int_lower_bounds
+  call check_real_lower_bounds
+  call check_all_nans
+  call check_dependencies
+contains
+  subroutine check_int_const_shape_rank_1()
+integer :: a(5)
+integer, allocatable :: m(:)
+a = data5
+m = maxloc(a)
+if (size(m, dim=1) /= 1) stop 11
+if (any(m /= (/ 2 /))) stop 12
+  end subroutine
+  subroutine check_int_const_shape_rank_3()
+integer :: a(4,4,4)
+integer, allocatable :: m(:)
+a = reshape(data64, shape(a))
+m = maxloc(a)
+if (size(m, dim=1) /= 3) stop 21
+if (any(m /= (/ 2, 2, 1 /))) stop 22
+  end subroutine
+  subroutine check_int_const_shape_empty_4()
+integer :: a(9,3,0,7)
+integer, allocatable :: m(:)
+a = reshape((/ integer:: /), shape(a))
+m = maxloc(a)
+if (size(m, dim=1) /= 4) stop 31
+if (any(m /= (/ 0, 0, 0, 0 /))) stop 32
+  end subroutine
+  subroutine check_int_alloc_rank_1()
+integer, allocatable :: a(:)
+integer, allocatable :: m(:)
+allocate(a(5))
+a(:) = data5
+m = maxloc(a)
+if (size(m, dim=1) /= 1) stop 41
+if (any(m /= (/ 2 /))) stop 42
+  end subroutine
+  subroutine check_int_alloc_rank_3()
+integer, allocatable :: a(:,:,:)
+integer, allocatable :: m(:)
+allocate(a(4,4,4))
+a(:,:,:) = reshape(data64, shape(a))
+m = maxloc(a)
+if (size(m, dim=1) /= 3) stop 51
+if (any(m /= (/ 2, 2, 1 /))) stop 52
+  end subroutine
+  subroutine check_int_alloc_empty_4()
+integer, allocatable :: a(:,:,:,:)
+integer, allocatable :: m(:)
+allocate(a(9,3,0,7))
+a(:,:,:,:) = reshape((/ integer:: /), shape(a))
+m = maxloc(a)
+if (size(m, dim=1) /= 4) stop 61
+if (any(m /= (/ 0, 0, 0, 0 /))) stop 62
+  end subroutine
+  subroutine check_real_const_shape_rank_1()
+real :: a(5)
+integer, allocatable :: m(:)
+a = (/ real:: data5 /)
+m = maxloc(a)
+if (size(m, dim=1) /= 1) stop 71
+if (any(m /= (/ 2 /))) stop 72
+  end subroutine
+  subroutine check_real_const_shape_rank_3()
+real :: a(4,4,4)
+integer, allocatable :: m(:)
+a = reshape((/ real:: data64 /), shape(a))
+m = maxloc(a)
+if (size(m, dim=1) /= 3) stop 81
+if (any(m /= (/ 2, 2, 1 /))) stop 82
+  end subroutine
+  subroutine check_real_const_shape_empty_4()
+real :: a(9,3,0,7)
+integer, allocatable :: m(:)
+a = reshape((/ real:: /), shape(a))
+m = maxloc(a)
+if (size(m, dim=1) /= 4) stop 91
+if (any(m /= (/ 0, 0, 0, 0 /))) stop 92
+  end subroutine
+  

[PATCH 6/8] fortran: Inline integral MINLOC/MAXLOC with no DIM and scalar MASK [PR90608]

2024-07-31 Thread Mikael Morin
From: Mikael Morin 

Regression-tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

Enable the generation of inline code for MINLOC/MAXLOC when argument ARRAY
is of integral type, DIM is not present, and MASK is present and is scalar
(only absent MASK or rank 1 ARRAY were inlined before).

Scalar masks are implemented with a wrapping condition around the code one
would generate if MASK wasn't present, so they are easy to support once
inline code without MASK is working.
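
As an illustration only (this is a standalone model, not the code GCC emits), the wrapping condition can be sketched in C++ as follows.  The names maxloc_rank2 and maxloc_rank2_scalar_mask, and the column-major std::vector layout, are assumptions made for the sketch:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical model of the unmasked case: 1-based position of the first
// maximum of a column-major rank-2 integer array with extents n0 x n1,
// or {0, 0} if the array is empty.  One position variable per dimension.
std::array<std::size_t, 2>
maxloc_rank2 (const std::vector<int> &a, std::size_t n0, std::size_t n1)
{
  std::array<std::size_t, 2> pos = {0, 0};
  int limit = 0;
  for (std::size_t s1 = 0; s1 < n1; s1++)
    for (std::size_t s0 = 0; s0 < n0; s0++)
      {
        int v = a[s1 * n0 + s0];
        // The first element always wins; later ones need a strictly
        // larger value, so ties keep the earliest position.
        if (pos[0] == 0 || v > limit)
          {
            limit = v;
            pos = {s0 + 1, s1 + 1};
          }
      }
  return pos;
}

// Scalar MASK: the unmasked code is wrapped in a single condition, and the
// else branch zeroes the position for every dimension.
std::array<std::size_t, 2>
maxloc_rank2_scalar_mask (const std::vector<int> &a, std::size_t n0,
                          std::size_t n1, bool mask)
{
  if (mask)
    return maxloc_rank2 (a, n0, n1);
  return {0, 0};
}
```

The else branch of the model corresponds to the loop over loop.dimen that this patch adds to initialize every pos[i] to zero.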

PR fortran/90608

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Generate
variable initialization for each dimension in the else branch of
the toplevel condition.
(gfc_inline_intrinsic_function_p): Return TRUE for scalar MASK.

gcc/testsuite/ChangeLog:

* gfortran.dg/maxloc_bounds_7.f90: Additionally accept the error message
reported by the scalarizer.
---
 gcc/fortran/trans-intrinsic.cc| 13 -
 gcc/testsuite/gfortran.dg/maxloc_bounds_7.f90 |  4 ++--
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index ac8bd2d4812..85520871797 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -5886,7 +5886,6 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op)
   /* For a scalar mask, enclose the loop in an if statement.  */
   if (maskexpr && maskss == NULL)
 {
-  gcc_assert (loop.dimen == 1);
   tree ifmask;
 
   gfc_init_se (, NULL);
@@ -5901,7 +5900,8 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op)
 the pos variable the same way as above.  */
 
   gfc_init_block ();
-  gfc_add_modify (, pos[0], gfc_index_zero_node);
+  for (int i = 0; i < loop.dimen; i++)
+   gfc_add_modify (, pos[i], gfc_index_zero_node);
   elsetmp = gfc_finish_block ();
   ifmask = conv_mask_condition (, maskexpr, optional_mask);
   tmp = build3_v (COND_EXPR, ifmask, tmp, elsetmp);
@@ -11795,9 +11795,12 @@ gfc_inline_intrinsic_function_p (gfc_expr *expr)
if (array->rank == 1)
  return true;
 
-   if (array->ts.type == BT_INTEGER
-   && dim == nullptr
-   && mask == nullptr)
+   if (array->ts.type != BT_INTEGER
+   || dim != nullptr)
+ return false;
+
+   if (mask == nullptr
+   || mask->rank == 0)
  return true;
 
return false;
diff --git a/gcc/testsuite/gfortran.dg/maxloc_bounds_7.f90 b/gcc/testsuite/gfortran.dg/maxloc_bounds_7.f90
index 206a29b149d..3aa9d3dcebe 100644
--- a/gcc/testsuite/gfortran.dg/maxloc_bounds_7.f90
+++ b/gcc/testsuite/gfortran.dg/maxloc_bounds_7.f90
@@ -1,6 +1,6 @@
 ! { dg-do run }
 ! { dg-options "-fbounds-check" }
-! { dg-shouldfail "Incorrect extent in return value of MAXLOC intrinsic: is 3, should be 2" }
+! { dg-shouldfail "Incorrect extent in return value of MAXLOC intrinsic: is 3, should be 2|Array bound mismatch for dimension 1 of array 'res' .3/2." }
 module tst
 contains
   subroutine foo(res)
@@ -18,4 +18,4 @@ program main
   integer :: res(3)
   call foo(res)
 end program main
-! { dg-output "Fortran runtime error: Incorrect extent in return value of MAXLOC intrinsic: is 3, should be 2" }
+! { dg-output "Fortran runtime error: Incorrect extent in return value of MAXLOC intrinsic: is 3, should be 2|Array bound mismatch for dimension 1 of array 'res' .3/2." }
-- 
2.43.0



[PATCH 8/8] fortran: Continue MINLOC/MAXLOC second loop where the first stopped [PR90608]

2024-07-31 Thread Mikael Morin
From: Mikael Morin 

Regression-tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

Continue the second set of loops where the first one stopped in the
generated inline MINLOC/MAXLOC code in the cases where the generated code
contains two sets of loops.  This fixes a regression that was introduced
when enabling the generation of inline MINLOC/MAXLOC code with ARRAY of rank
greater than 1, no DIM argument, and either non-scalar MASK or floating-
point ARRAY.

In the cases where two sets of loops are generated as inline MINLOC/MAXLOC
code, we previously generated code such as (for rank 2 ARRAY, so with two
levels of nesting):

for (idx11 in lower1..upper1)
  {
for (idx12 in lower2..upper2)
  {
...
if (...)
  {
...
goto second_loop;
  }
  }
  }
second_loop:
for (idx21 in lower1..upper1)
  {
for (idx22 in lower2..upper2)
  {
...
  }
  }

which means we process the first elements twice, once in the first set
of loops and once in the second one.  This change avoids this duplicate
processing by using a conditional as lower bound for the second set of
loops, generating code like:

second_loop_entry = false;
for (idx11 in lower1..upper1)
  {
for (idx12 in lower2..upper2)
  {
...
if (...)
  {
...
second_loop_entry = true;
goto second_loop;
  }
  }
  }
second_loop:
for (idx21 in (second_loop_entry ? idx11 : lower1)..upper1)
  {
for (idx22 in (second_loop_entry ? idx12 : lower2)..upper2)
  {
...
second_loop_entry = false;
  }
  }

It was expected that compiler optimizations would be able to remove the
state variable second_loop_entry.  This is the case if ARRAY has rank 1 (so
without loop nesting): the variable is removed and the loop bounds become
unconditional, which restores the previously generated code and fully fixes
the regression.  For larger ranks, unfortunately, the state variable and
conditional loop bounds remain, but those cases previously used library
calls, so it's not a regression.
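
For illustration, here is a minimal standalone C++ model of the scheme for a rank 2 floating-point ARRAY without MASK.  It is a sketch and not the generated code: the names MinlocResult and minloc_rank2, the column-major std::vector layout, and the visit counter (added only to make the saved work observable) are assumptions, and break plus a flag stands in for the goto:

```cpp
#include <array>
#include <cstddef>
#include <limits>
#include <vector>

struct MinlocResult
{
  std::array<std::size_t, 2> pos;  // 1-based position, {0, 0} if empty
  std::size_t second_loop_visits;  // elements examined by the second loops
};

// Hypothetical model of a NaN-aware MINLOC on a column-major n0 x n1 array.
MinlocResult
minloc_rank2 (const std::vector<double> &a, std::size_t n0, std::size_t n1)
{
  double limit = std::numeric_limits<double>::infinity ();
  MinlocResult r = {{0, 0}, 0};
  std::size_t idx0 = 0, idx1 = 0;
  bool second_loop_entry = false;

  // First set of loops: skip NaNs and stop at the first comparable element.
  for (std::size_t s1 = 0; s1 < n1 && !second_loop_entry; s1++)
    for (std::size_t s0 = 0; s0 < n0; s0++)
      if (a[s1 * n0 + s0] <= limit)
        {
          limit = a[s1 * n0 + s0];
          r.pos = {s0 + 1, s1 + 1};
          idx0 = s0;  // remember where the first set of loops stopped
          idx1 = s1;
          second_loop_entry = true;
          break;
        }

  if (!second_loop_entry)
    {
      // Only NaNs (or no element at all): position 1 if nonempty.
      if (n0 > 0 && n1 > 0)
        r.pos = {1, 1};
      return r;
    }

  // Second set of loops: each lower bound is conditional on the predicate,
  // which is cleared as soon as the first element has been processed.
  for (std::size_t s1 = second_loop_entry ? idx1 : 0; s1 < n1; s1++)
    for (std::size_t s0 = second_loop_entry ? idx0 : 0; s0 < n0; s0++)
      {
        r.second_loop_visits++;
        if (a[s1 * n0 + s0] < limit)
          {
            limit = a[s1 * n0 + s0];
            r.pos = {s0 + 1, s1 + 1};
          }
        second_loop_entry = false;
      }
  return r;
}
```

In this model the inner lower bound is re-evaluated on every iteration of the outer loop, so only the first inner loop starts at the saved index; the later ones start again at the lower bound.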

PR fortran/90608

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Generate a set
of index variables.  Set them using the loop indexes before leaving
the first set of loops.  Generate a new loop entry predicate.
Initialize it.  Set it before leaving the first set of loops.  Clear
it in the body of the second set of loops.  For the second set of
loops, update each loop lower bound to use the corresponding index
variable if the predicate variable is set.
---
 gcc/fortran/trans-intrinsic.cc | 33 +++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index 3a6a73d4241..89134b1190b 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -5342,6 +5342,7 @@ strip_kind_from_actual (gfc_actual_arglist * actual)
 pos0 = 0;
 pos1 = 0;
 S1 = from1;
+second_loop_entry = false;
 while (S1 <= to1) {
   S0 = from0;
   while (s0 <= to0 {
@@ -5354,6 +5355,7 @@ strip_kind_from_actual (gfc_actual_arglist * actual)
 limit = a[S1][S0];
 pos0 = S0 + (1 - from0);
 pos1 = S1 + (1 - from1);
+second_loop_entry = true;
 goto lab1;
   }
 }
@@ -5363,9 +5365,9 @@ strip_kind_from_actual (gfc_actual_arglist * actual)
 }
 goto lab2;
 lab1:;
-S1 = from1;
+S1 = second_loop_entry ? S1 : from1;
 while (S1 <= to1) {
-  S0 = from0;
+  S0 = second_loop_entry ? S0 : from0;
   while (S0 <= to0) {
 if (mask[S1][S0])
   if (a[S1][S0] < limit) {
@@ -5373,6 +5375,7 @@ strip_kind_from_actual (gfc_actual_arglist * actual)
 pos0 = S + (1 - from0);
 pos1 = S + (1 - from1);
   }
+second_loop_entry = false;
 S0++;
   }
   S1++;
@@ -5444,6 +5447,7 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op)
   gfc_expr *backexpr;
   gfc_se backse;
   tree pos[GFC_MAX_DIMENSIONS];
+  tree idx[GFC_MAX_DIMENSIONS];
   tree result_var = NULL_TREE;
   int n;
   bool optional_mask;
@@ -5525,6 +5529,8 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, gfc_expr * expr, enum tree_code op)
   gfc_get_string ("pos%d", i));
   offset[i] = 

[PATCH 4/8] fortran: Outline array bound check generation code

2024-07-31 Thread Mikael Morin
From: Mikael Morin 

The next patch will need reindenting of the array bound check generation
code.  This outlines it to its own function beforehand, reducing the churn
in the next patch.

Regression-tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_conv_ss_startstride): Move array bound check
generation code...
(add_check_section_in_array_bounds): ... here as a new function.
---
 gcc/fortran/trans-array.cc | 297 ++---
 1 file changed, 143 insertions(+), 154 deletions(-)

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 0c78e1fecd8..99a603a3afb 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -4736,6 +4736,146 @@ gfc_conv_section_startstride (stmtblock_t * block, gfc_ss * ss, int dim)
 }
 
 
+/* Generate in INNER the bounds checking code along the dimension DIM for
+   the array associated with SS_INFO.  */
+
+static void
+add_check_section_in_array_bounds (stmtblock_t *inner, gfc_ss_info *ss_info,
+  int dim)
+{
+  gfc_expr *expr = ss_info->expr;
+  locus *expr_loc = >where;
+  const char *expr_name = expr->symtree->name;
+
+  gfc_array_info *info = _info->data.array;
+
+  bool check_upper;
+  if (dim == info->ref->u.ar.dimen - 1
+  && info->ref->u.ar.as->type == AS_ASSUMED_SIZE)
+check_upper = false;
+  else
+check_upper = true;
+
+  /* Zero stride is not allowed.  */
+  tree tmp = fold_build2_loc (input_location, EQ_EXPR, logical_type_node,
+ info->stride[dim], gfc_index_zero_node);
+  char * msg = xasprintf ("Zero stride is not allowed, for dimension %d "
+ "of array '%s'", dim + 1, expr_name);
+  gfc_trans_runtime_check (true, false, tmp, inner, expr_loc, msg);
+  free (msg);
+
+  tree desc = info->descriptor;
+
+  /* This is the run-time equivalent of resolve.cc's
+ check_dimension.  The logical is more readable there
+ than it is here, with all the trees.  */
+  tree lbound = gfc_conv_array_lbound (desc, dim);
+  tree end = info->end[dim];
+  tree ubound = check_upper ? gfc_conv_array_ubound (desc, dim) : NULL_TREE;
+
+  /* non_zerosized is true when the selected range is not
+ empty.  */
+  tree stride_pos = fold_build2_loc (input_location, GT_EXPR, logical_type_node,
+info->stride[dim], gfc_index_zero_node);
+  tmp = fold_build2_loc (input_location, LE_EXPR, logical_type_node,
+info->start[dim], end);
+  stride_pos = fold_build2_loc (input_location, TRUTH_AND_EXPR,
+   logical_type_node, stride_pos, tmp);
+
+  tree stride_neg = fold_build2_loc (input_location, LT_EXPR, logical_type_node,
+info->stride[dim], gfc_index_zero_node);
+  tmp = fold_build2_loc (input_location, GE_EXPR, logical_type_node,
+info->start[dim], end);
+  stride_neg = fold_build2_loc (input_location, TRUTH_AND_EXPR,
+   logical_type_node, stride_neg, tmp);
+  tree non_zerosized = fold_build2_loc (input_location, TRUTH_OR_EXPR,
+   logical_type_node, stride_pos,
+   stride_neg);
+
+  /* Check the start of the range against the lower and upper
+ bounds of the array, if the range is not empty.
+ If upper bound is present, include both bounds in the
+ error message.  */
+  if (check_upper)
+{
+  tmp = fold_build2_loc (input_location, LT_EXPR, logical_type_node,
+info->start[dim], lbound);
+  tmp = fold_build2_loc (input_location, TRUTH_AND_EXPR, logical_type_node,
+non_zerosized, tmp);
+  tree tmp2 = fold_build2_loc (input_location, GT_EXPR, logical_type_node,
+  info->start[dim], ubound);
+  tmp2 = fold_build2_loc (input_location, TRUTH_AND_EXPR, logical_type_node,
+ non_zerosized, tmp2);
+  msg = xasprintf ("Index '%%ld' of dimension %d of array '%s' outside of "
+  "expected range (%%ld:%%ld)", dim + 1, expr_name);
+  gfc_trans_runtime_check (true, false, tmp, inner, expr_loc, msg,
+ fold_convert (long_integer_type_node, info->start[dim]),
+ fold_convert (long_integer_type_node, lbound),
+ fold_convert (long_integer_type_node, ubound));
+  gfc_trans_runtime_check (true, false, tmp2, inner, expr_loc, msg,
+ fold_convert (long_integer_type_node, info->start[dim]),
+ fold_convert (long_integer_type_node, lbound),
+ fold_convert (long_integer_type_node, ubound));
+  free (msg);
+}
+  else
+{
+  tmp = fold_build2_loc (input_location, LT_EXPR, logical_type_node,
+ 

[PATCH 0/8] fortran: Inline MINLOC/MAXLOC without DIM argument [PR90608]

2024-07-31 Thread Mikael Morin
From: Mikael Morin 

This series of patches enables the generation of inline code for the MINLOC
and MAXLOC intrinsics when the DIM argument is not present.  The
generated code is based on the inline implementation already generated in
the scalar case, that is when ARRAY has rank 1 and DIM is present.  The
code is extended by using several variables (one for each dimension) where
the scalar code used just one, and collecting the variables into an array
before returning.

The patches are split in a way that allows inlining in more and more cases,
as controlled by the gfc_inline_intrinsic_function_p predicate, which
evolves with the patches.

They have been generated on top of the patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657959.html

Mikael Morin (8):
  fortran: Add tests covering inline MINLOC/MAXLOC without DIM [PR90608]
  fortran: Disable frontend passes for inlinable MINLOC/MAXLOC [PR90608]
  fortran: Inline MINLOC/MAXLOC with no DIM and ARRAY of rank 1
[PR90608]
  fortran: Outline array bound check generation code
  fortran: Inline integral MINLOC/MAXLOC with no DIM and no MASK
[PR90608]
  fortran: Inline integral MINLOC/MAXLOC with no DIM and scalar MASK
[PR90608]
  fortran: Inline non-character MINLOC/MAXLOC with no DIM [PR90608]
  fortran: Continue MINLOC/MAXLOC second loop where the first stopped
[PR90608]

 gcc/fortran/frontend-passes.cc|   3 +-
 gcc/fortran/trans-array.cc| 382 ---
 gcc/fortran/trans-intrinsic.cc| 454 +-
 gcc/testsuite/gfortran.dg/maxloc_7.f90| 220 +
 gcc/testsuite/gfortran.dg/maxloc_bounds_4.f90 |   4 +-
 gcc/testsuite/gfortran.dg/maxloc_bounds_5.f90 |   4 +-
 gcc/testsuite/gfortran.dg/maxloc_bounds_6.f90 |   4 +-
 gcc/testsuite/gfortran.dg/maxloc_bounds_7.f90 |   4 +-
 .../gfortran.dg/maxloc_with_mask_1.f90| 393 +++
 gcc/testsuite/gfortran.dg/minloc_8.f90| 220 +
 .../gfortran.dg/minloc_with_mask_1.f90| 392 +++
 11 files changed, 1792 insertions(+), 288 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/maxloc_7.f90
 create mode 100644 gcc/testsuite/gfortran.dg/maxloc_with_mask_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minloc_8.f90
 create mode 100644 gcc/testsuite/gfortran.dg/minloc_with_mask_1.f90

-- 
2.43.0



[PATCH 3/8] fortran: Inline MINLOC/MAXLOC with no DIM and ARRAY of rank 1 [PR90608]

2024-07-31 Thread Mikael Morin
From: Mikael Morin 

Regression-tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

Enable inline code generation for the MINLOC and MAXLOC intrinsics, if the
DIM argument is not present and ARRAY has rank 1.  This case is similar to
the case where the result is scalar (DIM present and rank 1 ARRAY), which
already supports inline expansion of the intrinsic.  Both cases return
the same value, with the difference that the result is an array of size 1 if
DIM is absent, whereas it's a scalar if DIM is present.  So all there is
to do for the new case to work is to hook the inline expansion up to the
scalarizer.
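
The relation between the two cases can be modelled with a short standalone C++ sketch.  It is an illustration under assumptions, not compiler code; the names maxloc_scalar and maxloc_array are hypothetical:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Scalar case (DIM present, rank 1 ARRAY): 1-based position of the first
// maximum, or 0 if the array is empty.
std::size_t
maxloc_scalar (const std::vector<int> &a)
{
  std::size_t pos = 0;
  int limit = 0;
  for (std::size_t s = 0; s < a.size (); s++)
    if (pos == 0 || a[s] > limit)
      {
        limit = a[s];
        pos = s + 1;
      }
  return pos;
}

// DIM absent, rank 1 ARRAY: the same value, wrapped in an array of size 1.
std::array<std::size_t, 1>
maxloc_array (const std::vector<int> &a)
{
  return {maxloc_scalar (a)};
}
```

The wrapper adds nothing but the array shape, which is why the new case mostly needs plumbing between the existing expansion and the scalarizer.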

PR fortran/90608

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_conv_ss_startstride): Set the scalarization
rank based on the MINLOC/MAXLOC rank if needed.  Call the inline
code generation and set up the scalarizer array descriptor info
in the MINLOC and MAXLOC cases.
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Return the
result array element if the scalarizer is set up and we are inside
the loops.  Restrict library function call dispatch to the case
where inline expansion is not supported.  Declare an array result
if the expression isn't scalar.  Initialize the array result single
element and return the result variable if the expression isn't
scalar.
(walk_inline_intrinsic_minmaxloc): New function.
(walk_inline_intrinsic_function): Add MINLOC and MAXLOC cases,
dispatching to walk_inline_intrinsic_minmaxloc.
(gfc_add_intrinsic_ss_code): Add MINLOC and MAXLOC cases.
(gfc_inline_intrinsic_function_p): Return true if ARRAY has rank 1,
regardless of DIM.
---
 gcc/fortran/trans-array.cc |  25 +
 gcc/fortran/trans-intrinsic.cc | 198 ++---
 2 files changed, 155 insertions(+), 68 deletions(-)

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index c93a5f1e754..0c78e1fecd8 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -4771,6 +4771,8 @@ gfc_conv_ss_startstride (gfc_loopinfo * loop)
case GFC_ISYM_UBOUND:
case GFC_ISYM_LCOBOUND:
case GFC_ISYM_UCOBOUND:
+   case GFC_ISYM_MAXLOC:
+   case GFC_ISYM_MINLOC:
case GFC_ISYM_SHAPE:
case GFC_ISYM_THIS_IMAGE:
  loop->dimen = ss->dimen;
@@ -4820,6 +4822,29 @@ done:
case GFC_SS_INTRINSIC:
  switch (expr->value.function.isym->id)
{
+   case GFC_ISYM_MINLOC:
+   case GFC_ISYM_MAXLOC:
+ {
+   gfc_se se;
+   gfc_init_se (, nullptr);
+   se.loop = loop;
+   se.ss = ss;
+   gfc_conv_intrinsic_function (, expr);
+   gfc_add_block_to_block (_loop->pre, );
+   gfc_add_block_to_block (_loop->post, );
+
+   info->descriptor = se.expr;
+
+   info->data = gfc_conv_array_data (info->descriptor);
+   info->data = gfc_evaluate_now (info->data, _loop->pre);
+
+   info->offset = gfc_index_zero_node;
+   info->start[0] = gfc_index_zero_node;
+   info->end[0] = gfc_index_zero_node;
+   info->stride[0] = gfc_index_one_node;
+   continue;
+ }
+
/* Fall through to supply start and stride.  */
case GFC_ISYM_LBOUND:
case GFC_ISYM_UBOUND:
diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index cc0d00f4e39..a947dd1ba0b 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -5273,66 +5273,69 @@ strip_kind_from_actual (gfc_actual_arglist * actual)
we need to handle.  For performance reasons we sometimes create two
loops instead of one, where the second one is much simpler.
Examples for minloc intrinsic:
-   1) Result is an array, a call is generated
-   2) Array mask is used and NaNs need to be supported:
-  limit = Infinity;
-  pos = 0;
-  S = from;
-  while (S <= to) {
-   if (mask[S]) {
- if (pos == 0) pos = S + (1 - from);
- if (a[S] <= limit) { limit = a[S]; pos = S + (1 - from); goto lab1; }
-   }
-   S++;
-  }
-  goto lab2;
-  lab1:;
-  while (S <= to) {
-   if (mask[S]) if (a[S] < limit) { limit = a[S]; pos = S + (1 - from); }
-   S++;
-  }
-  lab2:;
-   3) NaNs need to be supported, but it is known at compile time or cheaply
-  at runtime whether array is nonempty or not:
-  limit = Infinity;
-  pos = 0;
-  S = from;
-  while (S <= to) {
-   if (a[S] <= limit) { limit = a[S]; pos = S + (1 - from); goto lab1; }
-   S++;
-  }
-  if (from <= to) pos = 1;
-  goto lab2;
-  lab1:;
-  while (S <= to) {
-   if (a[S] < limit) { limit = a[S]; pos = S + (1 - from); }
-   S++;
-  }
-  lab2:;
-   4) NaNs 

[PATCH 5/8] fortran: Inline integral MINLOC/MAXLOC with no DIM and no MASK [PR90608]

2024-07-31 Thread Mikael Morin
From: Mikael Morin 

Regression-tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

Enable generation of inline code for the MINLOC and MAXLOC intrinsics
if the ARRAY argument is of integral type and of any rank (only the rank 1
case was previously inlined), and neither DIM nor MASK arguments are
present.

This needs a few adjustments in gfc_conv_intrinsic_minmaxloc,
mainly to replace the single variables POS and OFFSET, with collections
of variables, one variable per dimension each.

The restriction to integral ARRAY and absent MASK limits the scope of
the change to the cases where we generate single loop inline code.  The
code generation for the second loop is only accessible with ARRAY of rank
1, so it can continue using a single variable.  A later change will extend
inlining to the double loop cases.

Some bounds checking that was previously handled by the library required
changes in the scalarizer to avoid regressions.  The scalarizer already
supported bounds check code generation, but it only applied to array
reference sections, checking both for array bound violation and for shape
conformability between all the involved arrays.  With this change, for
MINLOC or MAXLOC, the conformability check between all the scalarized
arrays is enabled and the array bound violation check is disabled.
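
As a rough standalone model of the conformability check (not the scalarizer code), the extent of the array receiving the MINLOC/MAXLOC result is compared with the rank of ARRAY.  The helper name check_extent is hypothetical, and the exact punctuation of the message is an assumption inferred from the pattern accepted by the maxloc_bounds tests of this series:

```cpp
#include <string>

// Hypothetical helper: return an empty string when the extents conform,
// otherwise a diagnostic modelled on the scalarizer's runtime error.
std::string
check_extent (const std::string &name, int dim, long actual, long expected)
{
  if (actual == expected)
    return "";
  return "Array bound mismatch for dimension " + std::to_string (dim)
         + " of array '" + name + "' (" + std::to_string (actual) + "/"
         + std::to_string (expected) + ")";
}
```

For example, assigning the MAXLOC result of a rank 2 ARRAY to an array res of extent 3 would correspond to check_extent ("res", 1, 3, 2) in this model.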

PR fortran/90608

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_conv_ss_startstride): Set the MINLOC/MAXLOC
result upper bound using the rank of the ARRAY argument.  Adjust
the error message for intrinsic result arrays.  Only check array
bounds for array references.  Move bound check decision code...
(bounds_check_needed): ... here as a new predicate.  Allow bound
check for MINLOC/MAXLOC intrinsic results.
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Change the
result array upper bound to the rank of ARRAY.  Update the NONEMPTY
variable to depend on the non-empty extent of every dimension.  Use
one variable per dimension instead of a single variable for the
position and the offset.  Update their declaration, initialization,
and update to affect the variable of each dimension.  Use the first
variable only in areas only accessed with rank 1 ARRAY argument.
Set every element of the result using its corresponding variable.
(gfc_inline_intrinsic_function_p): Return true for integral ARRAY
and absent DIM and MASK.

gcc/testsuite/ChangeLog:

* gfortran.dg/maxloc_bounds_4.f90: Additionally accept the error
message emitted by the scalarizer.
---
 gcc/fortran/trans-array.cc|  70 ++--
 gcc/fortran/trans-intrinsic.cc| 150 +-
 gcc/testsuite/gfortran.dg/maxloc_bounds_4.f90 |   4 +-
 3 files changed, 166 insertions(+), 58 deletions(-)

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 99a603a3afb..76448c8ac0e 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -4876,6 +4876,35 @@ add_check_section_in_array_bounds (stmtblock_t *inner, gfc_ss_info *ss_info,
 }
 
 
+/* Tells whether we need to generate bounds checking code for the array
+   associated with SS.  */
+
+bool
+bounds_check_needed (gfc_ss *ss)
+{
+  /* Catch allocatable lhs in f2003.  */
+  if (flag_realloc_lhs && ss->no_bounds_check)
+return false;
+
+  gfc_ss_info *ss_info = ss->info;
+  if (ss_info->type == GFC_SS_SECTION)
+return true;
+
+  if (!(ss_info->type == GFC_SS_INTRINSIC
+   && ss_info->expr
+   && ss_info->expr->expr_type == EXPR_FUNCTION))
+return false;
+
+  gfc_intrinsic_sym *isym = ss_info->expr->value.function.isym;
+  if (!(isym
+   && (isym->id == GFC_ISYM_MAXLOC
+   || isym->id == GFC_ISYM_MINLOC)))
+return false;
+
+  return gfc_inline_intrinsic_function_p (ss_info->expr);
+}
+
+
 /* Calculates the range start and stride for a SS chain.  Also gets the
descriptor and data pointer.  The range of vector subscripts is the size
of the vector.  Array bounds are also checked.  */
@@ -4977,10 +5006,17 @@ done:
info->data = gfc_conv_array_data (info->descriptor);
info->data = gfc_evaluate_now (info->data, _loop->pre);
 
-   info->offset = gfc_index_zero_node;
+   gfc_expr *array = expr->value.function.actual->expr;
+   tree rank = build_int_cst (gfc_array_index_type, array->rank);
+
+   tree tmp = fold_build2_loc (input_location, MINUS_EXPR,
+   gfc_array_index_type, rank,
+   gfc_index_one_node);
+
+   info->end[0] = gfc_evaluate_now (tmp, _loop->pre);
info->start[0] = gfc_index_zero_node;
-   info->end[0] = gfc_index_zero_node;
info->stride[0] = gfc_index_one_node;
+   info->offset = 

[PATCH 7/8] fortran: Inline non-character MINLOC/MAXLOC with no DIM [PR90608]

2024-07-31 Thread Mikael Morin
From: Mikael Morin 

Regression-tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

Enable generation of inline MINLOC/MAXLOC code in the case where DIM
is not present, and either ARRAY is of floating point type or MASK is an
array.  Those cases are the remaining bits to fully support inlining of
non-CHARACTER MINLOC/MAXLOC without DIM.  They are treated together because
they generate similar code, the NANs for REAL types being handled a bit like
a second level of masking.  These are the cases for which we generate two
sets of loops.

This change affects the code generating the second loop, which was
previously reachable only when ARRAY has rank 1.  The single variable
initialization and update are changed to apply to multiple variables, one
per dimension.

The code generated is as follows (if ARRAY has rank 2):

for (idx11 in lower1..upper1)
  {
for (idx12 in lower2..upper2)
  {
...
if (...)
  {
...
goto second_loop;
  }
  }
  }
second_loop:
for (idx21 in lower1..upper1)
  {
for (idx22 in lower2..upper2)
  {
...
  }
  }

This code leads to processing the first elements redundantly, both in the
first set of loops and in the second one.  The loop over idx22 could
start from idx12 the first time it is run, but as it has to start from
lower2 for the rest of the runs, this change uses the same bounds for both
sets of loops for simplicity.  In the rank 1 case, this makes the generated
code worse compared to the inline code that was generated before.  A later
change will introduce conditionals to avoid the duplicate processing and
restore the generated code in that case.
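
To make the combined handling of array MASK and NaNs concrete, here is a minimal standalone C++ model for a rank 2 ARRAY.  It is a sketch under assumptions, not the generated code: the name minloc_rank2_masked and the column-major std::vector layout are hypothetical, and break plus a flag stands in for the goto:

```cpp
#include <array>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical model of MINLOC with an array MASK on a column-major
// n0 x n1 floating-point array.  Returns a 1-based position, or {0, 0}
// if no element is selected by the mask.
std::array<std::size_t, 2>
minloc_rank2_masked (const std::vector<double> &a,
                     const std::vector<bool> &mask,
                     std::size_t n0, std::size_t n1)
{
  double limit = std::numeric_limits<double>::infinity ();
  std::array<std::size_t, 2> pos = {0, 0};
  bool found = false;

  // First set of loops: the mask filters elements, then the <= comparison
  // filters out NaNs, acting like a second level of masking.
  for (std::size_t s1 = 0; s1 < n1 && !found; s1++)
    for (std::size_t s0 = 0; s0 < n0; s0++)
      {
        std::size_t i = s1 * n0 + s0;
        if (!mask[i])
          continue;
        if (pos[0] == 0)
          pos = {s0 + 1, s1 + 1};  // first selected element, even if NaN
        if (a[i] <= limit)
          {
            limit = a[i];
            pos = {s0 + 1, s1 + 1};
            found = true;
            break;
          }
      }
  if (!found)
    return pos;

  // Second set of loops: restarts from the lower bounds, so the elements
  // already seen by the first set of loops are processed again.
  for (std::size_t s1 = 0; s1 < n1; s1++)
    for (std::size_t s0 = 0; s0 < n0; s0++)
      {
        std::size_t i = s1 * n0 + s0;
        if (mask[i] && a[i] < limit)
          {
            limit = a[i];
            pos = {s0 + 1, s1 + 1};
          }
      }
  return pos;
}
```

The redundant work is harmless for correctness because the second set of loops uses a strict comparison, so revisiting earlier elements never changes the position.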

PR fortran/90608

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Initialize
and update all the variables.  Put the label and goto in the
outermost scalarizer loop.  Don't start the second loop where the
first stopped.
(gfc_inline_intrinsic_function_p): Also return TRUE for array MASK
or for any REAL type.

gcc/testsuite/ChangeLog:

* gfortran.dg/maxloc_bounds_5.f90: Additionally accept error
messages reported by the scalarizer.
* gfortran.dg/maxloc_bounds_6.f90: Ditto.
---
 gcc/fortran/trans-intrinsic.cc| 127 --
 gcc/testsuite/gfortran.dg/maxloc_bounds_5.f90 |   4 +-
 gcc/testsuite/gfortran.dg/maxloc_bounds_6.f90 |   4 +-
 3 files changed, 87 insertions(+), 48 deletions(-)

diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index 85520871797..3a6a73d4241 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -5332,12 +5332,55 @@ strip_kind_from_actual (gfc_actual_arglist * actual)
   if (a[S] < limit) { limit = a[S]; pos = S + (1 - from); }
   S++;
 }
-   B: ARRAY has rank 1, and DIM is absent.  Use the same code as the scalar
-  case and wrap the result in an array.
-   C: ARRAY has rank > 1, NANs are not supported, and DIM and MASK are absent.
-  Generate code similar to the single loop scalar case, but using one
-  variable per dimension, for example if ARRAY has rank 2:
-  4) NAN's aren't supported, no MASK:
+   B: Array result, non-CHARACTER type, DIM absent
+  Generate similar code as in the scalar case, using a collection of
+  variables (one per dimension) instead of a single variable as result.
+  Picking only cases 1) and 4) with ARRAY of rank 2, the generated code
+  becomes:
+  1) Array mask is used and NaNs need to be supported:
+limit = Infinity;
+pos0 = 0;
+pos1 = 0;
+S1 = from1;
+while (S1 <= to1) {
+  S0 = from0;
+  while (s0 <= to0 {
+if (mask[S1][S0]) {
+  if (pos0 == 0) {
+pos0 = S0 + (1 - from0);
+pos1 = S1 + (1 - from1);
+  }
+  if (a[S1][S0] <= limit) {
+limit = a[S1][S0];
+pos0 = S0 + (1 - from0);
+pos1 = S1 + (1 - from1);
+goto lab1;
+  }
+}
+S0++;
+  }
+  S1++;
+}
+goto lab2;
+lab1:;
+S1 = from1;
+while (S1 <= to1) {
+  S0 = from0;
+  while (S0 <= to0) {
+if (mask[S1][S0])
+  if (a[S1][S0] < limit) {
+limit = a[S1][S0];
+pos0 = S + (1 - from0);
+pos1 = S + (1 - from1);
+  }
+S0++;
+  }
+  S1++;
+}
+lab2:;
+result = { pos0, pos1 };
+  ...
+  4) NANs aren't supported, no array mask.
 limit = infinities_supported ? Infinity : huge (limit);
 pos0 = 

[PATCH 2/8] fortran: Disable frontend passes for inlinable MINLOC/MAXLOC [PR90608]

2024-07-31 Thread Mikael Morin
From: Mikael Morin 

Regression-tested on x86_64-pc-linux-gnu.
OK for master?

-- >8 --

Disable rewriting of MINLOC/MAXLOC expressions for which inline code
generation is supported.  To that end, update the existing
gfc_inline_intrinsic_function_p predicate to reflect the current state of
MINLOC/MAXLOC inlining support, which for now covers only the case of a
scalar result and a non-CHARACTER argument.

This change has no effect currently, as the MINLOC/MAXLOC front-end passes
only change expressions of rank 1, but the inlining control predicate
gfc_inline_intrinsic_function_p returns false for those.  However, later
changes will extend MINLOC/MAXLOC inline expansion support to array
expressions and update the inlining control predicate, at which point this
change will take effect.

PR fortran/90608

gcc/fortran/ChangeLog:

* frontend-passes.cc (optimize_minmaxloc): Skip if we can generate
inline code for the unmodified expression.
* trans-intrinsic.cc (gfc_inline_intrinsic_function_p): Add
MINLOC and MAXLOC cases.
---
 gcc/fortran/frontend-passes.cc |  3 ++-
 gcc/fortran/trans-intrinsic.cc | 23 +++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/frontend-passes.cc b/gcc/fortran/frontend-passes.cc
index 3c06018fdbb..8e4c6310ba8 100644
--- a/gcc/fortran/frontend-passes.cc
+++ b/gcc/fortran/frontend-passes.cc
@@ -2277,7 +2277,8 @@ optimize_minmaxloc (gfc_expr **e)
   || fn->value.function.actual == NULL
   || fn->value.function.actual->expr == NULL
   || fn->value.function.actual->expr->ts.type == BT_CHARACTER
-  || fn->value.function.actual->expr->rank != 1)
+  || fn->value.function.actual->expr->rank != 1
+  || gfc_inline_intrinsic_function_p (fn))
 return;
 
   *e = gfc_get_array_expr (fn->ts.type, fn->ts.kind, >where);
diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index 9f3c3ce47bc..cc0d00f4e39 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -11650,6 +11650,29 @@ gfc_inline_intrinsic_function_p (gfc_expr *expr)
 case GFC_ISYM_TRANSPOSE:
   return true;
 
+case GFC_ISYM_MINLOC:
+case GFC_ISYM_MAXLOC:
+  {
+   /* Disable inline expansion if code size matters.  */
+   if (optimize_size)
+ return false;
+
+   gfc_actual_arglist *array_arg = expr->value.function.actual;
+   gfc_actual_arglist *dim_arg = array_arg->next;
+
+   gfc_expr *array = array_arg->expr;
+   gfc_expr *dim = dim_arg->expr;
+
+   if (!(array->ts.type == BT_INTEGER
+ || array->ts.type == BT_REAL))
+ return false;
+
+   if (array->rank == 1 && dim != nullptr)
+ return true;
+
+   return false;
+  }
+
 default:
   return false;
 }
-- 
2.43.0



[PATCH v2] c++/coroutines: only defer expanding co_{await,return,yield} if dependent [PR112341]

2024-07-31 Thread Arsen Arsenović
Okay, I've reworked it, and it built and passed coroutine tests.
Regstrapping overnight.  Is the following OK with you?
-- >8 --
By doing so, we can get diagnostics in template decls when we know we
can.  For instance, in the following:

  awaitable g();
  template
  task f()
  {
co_await g();
co_yield 1;
co_return "foo";
  }

... the coroutine promise type in each statement is always
std::coroutine_handle::promise_type, and all of the operands are
not type-dependent, so we can always compute the resulting types (and
expected types) of these expressions and statements.

Also, when we do not know the type of the CO_AWAIT_EXPR or
CO_YIELD_EXPR, we now return NULL_TREE as the type rather than
unknown_type_node.  This is more correct, since the type is not unknown,
it just isn't determined yet.  This also means we can remove the
CO_AWAIT_EXPR and CO_YIELD_EXPR special-cases from
type_dependent_expression_p.

PR c++/112341 - error: insufficient contextual information to determine type on co_await result in function template

gcc/cp/ChangeLog:

PR c++/112341
* coroutines.cc (struct coroutine_info): Also cache the
traits type.
(ensure_coro_initialized): New function.  Makes sure we have
initialized the coroutine state successfully, or informs the
caller should it fail to do so.  Extracted from
coro_promise_type_found_p.
(coro_get_traits_class): New function.  Gets the (cached)
coroutine traits type for a given coroutine.  Extracted from
coro_promise_type_found_p and refactored to cache the result.
(coro_promise_type_found_p): Use the two functions above.
(build_template_co_await_expr): New function.  Builds a
CO_AWAIT_EXPR representing a CO_AWAIT_EXPR in a template
declaration.
(build_co_await): Use the above if processing_template_decl, and
give it a proper type.
(coro_dependent_p): New function.  Returns true iff its
argument is a type-dependent expression OR the current function's
traits class is type dependent.
(finish_co_await_expr): Defer expansion only in the case
coro_dependent_p returns true.
(finish_co_yield_expr): Ditto.
(finish_co_return_stmt): Ditto.
* pt.cc (type_dependent_expression_p): Do not treat
CO_AWAIT/CO_YIELD specially.

gcc/testsuite/ChangeLog:

PR c++/112341
* g++.dg/coroutines/pr112341-2.C: New test.
* g++.dg/coroutines/pr112341-3.C: New test.
* g++.dg/coroutines/torture/co-yield-03-tmpl-nondependent.C: New
test.
* g++.dg/coroutines/pr112341.C: New test.
---
 gcc/cp/coroutines.cc  | 157 ++
 gcc/cp/pt.cc  |   5 -
 gcc/testsuite/g++.dg/coroutines/pr112341-2.C  |  25 +++
 gcc/testsuite/g++.dg/coroutines/pr112341-3.C  |  65 
 gcc/testsuite/g++.dg/coroutines/pr112341.C|  21 +++
 .../torture/co-yield-03-tmpl-nondependent.C   | 140 
 6 files changed, 376 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr112341-2.C
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr112341-3.C
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr112341.C
 create mode 100644 
gcc/testsuite/g++.dg/coroutines/torture/co-yield-03-tmpl-nondependent.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 08a610afc82b..b535519b56d1 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -85,6 +85,7 @@ struct GTY((for_user)) coroutine_info
   tree actor_decl;/* The synthesized actor function.  */
   tree destroy_decl;  /* The synthesized destroy function.  */
   tree promise_type;  /* The cached promise type for this function.  */
+  tree traits_type;   /* The cached traits type for this function.  */
   tree handle_type;   /* The cached coroutine handle for this function.  */
   tree self_h_proxy;  /* A handle instance that is used as the proxy for the
 one that will eventually be allocated in the coroutine
@@ -527,11 +528,12 @@ find_promise_type (tree traits_class)
   return promise_type;
 }
 
+/* Perform initialization of the coroutine processor state, if not done
+   before.  */
+
 static bool
-coro_promise_type_found_p (tree fndecl, location_t loc)
+ensure_coro_initialized (location_t loc)
 {
-  gcc_assert (fndecl != NULL_TREE);
-
   if (!coro_initialized)
 {
   /* Trees we only need to create once.
@@ -569,6 +571,30 @@ coro_promise_type_found_p (tree fndecl, location_t loc)
 
   coro_initialized = true;
 }
+  return true;
+}
+
+/* Try to get the coroutine traits class.  */
+static tree
+coro_get_traits_class (tree fndecl, location_t loc)
+{
+  gcc_assert (fndecl != NULL_TREE);
+  gcc_assert (coro_initialized);
+
+  coroutine_info *coro_info = get_or_insert_coroutine_info (fndecl);
+  auto& traits_type = coro_info->traits_type;
+  if 

Re: [PATCH 2/2] match: Fix wrong code due to `(a ? e : f) !=/== (b ? e : f)` patterns [PR116120]

2024-07-31 Thread Andrew Pinski
On Wed, Jul 31, 2024 at 5:05 AM Richard Biener
 wrote:
>
> On Tue, Jul 30, 2024 at 5:26 PM Andrew Pinski  
> wrote:
> >
> > When this pattern was generalized from handling only 0/-1, we missed that
> > the optimization is wrong if `e == f` is true; it needs an extra check
> > for that.
> >
> > This changes the patterns to be:
> > /* (a ? x : y) != (b ? x : y) --> (a^b & (x != y)) ? TRUE  : FALSE */
> > /* (a ? x : y) == (b ? x : y) --> (a^b & (x != y)) ? FALSE : TRUE  */
> > /* (a ? x : y) != (b ? y : x) --> (a^b | (x == y)) ? FALSE : TRUE  */
> > /* (a ? x : y) == (b ? y : x) --> (a^b | (x == y)) ? TRUE  : FALSE */
> >
> > This still produces better code than the original case, and in many cases
> > (x != y) will still reduce to either false or true.
> >
> > With this change we also need to make sure `a`, `b` and the resulting types
> > are all the same, for the same reason as the previous patch.
> >
> > I updated (well, added to) the testcases to make sure the right number of
> > comparisons is left.
> >
> > Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> OK

I found a few issues with this version of the patch in the end, dealing
with NaNs.  They were not spotted before because the testcase was not
actually being run.
I am testing a new version of the patch and will submit it after it
finishes testing.
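
For integer operands, the rewritten identities from the quoted patch comment can be checked by brute force. The following is only a minimal sketch under that assumption: the helper names (select_ne, rewritten_ne, old_ne and so on) are made up for illustration and are not part of match.pd, and the sketch deliberately stays away from floating point, so it does not model the NaN problems mentioned above.

```cpp
#include <cassert>

// Original expression: compare two selections over the same operands.
static bool select_ne(bool a, bool b, int x, int y) {
  return (a ? x : y) != (b ? x : y);
}

// Form from the patch comment: (a^b & (x != y)) ? TRUE : FALSE.
// For booleans, a ^ b is the same as a != b.
static bool rewritten_ne(bool a, bool b, int x, int y) {
  return (a != b) && (x != y);
}

// Pre-patch form, which ignored the x == y case: (a^b) ? TRUE : FALSE.
static bool old_ne(bool a, bool b, int x, int y) {
  (void)x;
  (void)y;
  return a != b;
}

// Swapped-arm original: (a ? x : y) == (b ? y : x).
static bool select_swapped_eq(bool a, bool b, int x, int y) {
  return (a ? x : y) == (b ? y : x);
}

// Form from the patch comment: (a^b | (x == y)) ? TRUE : FALSE.
static bool rewritten_swapped_eq(bool a, bool b, int x, int y) {
  return (a != b) || (x == y);
}

// Exhaustively compare the original and rewritten forms on a small
// range of integer inputs.  Returns true if they always agree.
static bool identities_hold() {
  for (int a = 0; a < 2; ++a)
    for (int b = 0; b < 2; ++b)
      for (int x = -2; x <= 2; ++x)
        for (int y = -2; y <= 2; ++y) {
          const bool ba = a != 0, bb = b != 0;
          if (select_ne(ba, bb, x, y) != rewritten_ne(ba, bb, x, y))
            return false;
          if (select_swapped_eq(ba, bb, x, y)
              != rewritten_swapped_eq(ba, bb, x, y))
            return false;
        }
  return true;
}
```

With a != b and x == y, old_ne reports a difference while select_ne does not, which is the wrong-code case the extra `x != y` test is meant to close for integers.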

Thanks,
Andrew

>
> > PR tree-optimization/116120
> >
> > gcc/ChangeLog:
> >
> > * match.pd (`(a ? x : y) eq/ne (b ? x : y)`): Add test for `x != y`
> > in result.
> > (`(a ? x : y) eq/ne (b ? y : x)`): Add test for `x == y` in result.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.dg/tree-ssa/pr50.C: Add extra checks on the test.
> > * gcc.dg/tree-ssa/pr50-1.c: Likewise.
> > * gcc.dg/tree-ssa/pr50.c: Likewise.
> > * g++.dg/torture/pr116120-1.c: New test.
> > * g++.dg/torture/pr116120-2.c: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/match.pd   | 20 -
> >  gcc/testsuite/g++.dg/torture/pr116120-1.c  | 32 
> >  gcc/testsuite/g++.dg/torture/pr116120-2.c  | 35 ++
> >  gcc/testsuite/g++.dg/tree-ssa/pr50.C   | 10 +++
> >  gcc/testsuite/gcc.dg/tree-ssa/pr50-1.c |  9 ++
> >  gcc/testsuite/gcc.dg/tree-ssa/pr50.c   |  1 +
> >  6 files changed, 99 insertions(+), 8 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/torture/pr116120-1.c
> >  create mode 100644 gcc/testsuite/g++.dg/torture/pr116120-2.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 881a827860f..4d3ee578371 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -5632,21 +5632,25 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >(vec_cond (bit_and (bit_not @0) @1) @2 @3)))
> >  #endif
> >
> > -/* (a ? x : y) != (b ? x : y) --> (a^b) ? TRUE  : FALSE */
> > -/* (a ? x : y) == (b ? x : y) --> (a^b) ? FALSE : TRUE  */
> > -/* (a ? x : y) != (b ? y : x) --> (a^b) ? FALSE : TRUE  */
> > -/* (a ? x : y) == (b ? y : x) --> (a^b) ? TRUE  : FALSE */
> > +/* (a ? x : y) != (b ? x : y) --> (a^b & (x != y)) ? TRUE  : FALSE */
> > +/* (a ? x : y) == (b ? x : y) --> (a^b & (x != y)) ? FALSE : TRUE  */
> > +/* (a ? x : y) != (b ? y : x) --> (a^b | (x == y)) ? FALSE : TRUE  */
> > +/* (a ? x : y) == (b ? y : x) --> (a^b | (x == y)) ? TRUE  : FALSE */
> >  (for cnd (cond vec_cond)
> >   (for eqne (eq ne)
> >(simplify
> > (eqne:c (cnd @0 @1 @2) (cnd @3 @1 @2))
> > -(if (types_match (TREE_TYPE (@0), TREE_TYPE (@3)))
> > - (cnd (bit_xor @0 @3) { constant_boolean_node (eqne == NE_EXPR, type); 
> > }
> > +(if (types_match (TREE_TYPE (@0), TREE_TYPE (@3))
> > + && types_match (type, TREE_TYPE (@0)))
> > + (cnd (bit_and (bit_xor @0 @3) (ne:type @1 @2))
> > +  { constant_boolean_node (eqne == NE_EXPR, type); }
> >{ constant_boolean_node (eqne != NE_EXPR, type); })))
> >(simplify
> > (eqne:c (cnd @0 @1 @2) (cnd @3 @2 @1))
> > -(if (types_match (TREE_TYPE (@0), TREE_TYPE (@3)))
> > - (cnd (bit_xor @0 @3) { constant_boolean_node (eqne != NE_EXPR, type); 
> > }
> > +(if (types_match (TREE_TYPE (@0), TREE_TYPE (@3))
> > + && types_match (type, TREE_TYPE (@0)))
> > + (cnd (bit_ior (bit_xor @0 @3) (eq:type @1 @2))
> > +  { co

Re: [PATCH] aarch64: Fuse CMP+CSEL and CMP+CSET for -mcpu=neoverse-v2

2024-07-31 Thread Richard Sandiford
Jennifer Schmitz  writes:
> Thanks for the feedback! I updated the patch based on your comments; more
> detailed comments are inline below. The updated version was bootstrapped and
> tested again, no regression.
> Best,
> Jennifer
>
> From 89936b7bc2de7a1e4bc55c3a1e8d5e6ac0de579d Mon Sep 17 00:00:00 2001
> From: Jennifer Schmitz 
> Date: Wed, 24 Jul 2024 06:13:59 -0700
> Subject: [PATCH] AArch64: Fuse CMP+CSEL and CMP+CSET for -mcpu=neoverse-v2
>
> According to the Neoverse V2 Software Optimization Guide (section 4.14), the
> instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not been
> implemented so far. This patch implements and tests the two fusion pairs.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> There was also no non-noise impact on SPEC CPU2017 benchmark.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz 
>
> gcc/
>
>   * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
>   fusion logic.
>   * config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry.
>   (cmp+cset): Likewise.
>   * config/aarch64/tuning_models/neoversev2.h: Enable logic in
>   field fusible_ops.
>
> gcc/testsuite/
>
>   * gcc.target/aarch64/fuse_cmp_csel.c: New test.
>   * gcc.target/aarch64/fuse_cmp_cset.c: Likewise.
> ---
>  gcc/config/aarch64/aarch64-fusion-pairs.def   |  2 ++
>  gcc/config/aarch64/aarch64.cc | 20 +++
>  gcc/config/aarch64/tuning_models/neoversev2.h |  5 ++-
>  .../gcc.target/aarch64/fuse_cmp_csel.c| 33 +++
>  .../gcc.target/aarch64/fuse_cmp_cset.c| 31 +
>  5 files changed, 90 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/fuse_cmp_csel.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/fuse_cmp_cset.c
>
> diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def 
> b/gcc/config/aarch64/aarch64-fusion-pairs.def
> index 9a43b0c8065..bf5e85ba8fe 100644
> --- a/gcc/config/aarch64/aarch64-fusion-pairs.def
> +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
> @@ -37,5 +37,7 @@ AARCH64_FUSION_PAIR ("aes+aesmc", AES_AESMC)
>  AARCH64_FUSION_PAIR ("alu+branch", ALU_BRANCH)
>  AARCH64_FUSION_PAIR ("alu+cbz", ALU_CBZ)
>  AARCH64_FUSION_PAIR ("addsub_2reg_const1", ADDSUB_2REG_CONST1)
> +AARCH64_FUSION_PAIR ("cmp+csel", CMP_CSEL)
> +AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET)
>  
>  #undef AARCH64_FUSION_PAIR
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index e0cf382998c..d42c153443e 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -27345,6 +27345,26 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn 
> *curr)
>&& reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
>  return true;
>  
> +  /* Fuse CMP and CSEL/CSET.  */
> +  if (prev_set && curr_set
> +  && GET_CODE (SET_SRC (prev_set)) == COMPARE
> +  && SCALAR_INT_MODE_P (GET_MODE (XEXP (SET_SRC (prev_set), 0)))
> +  && reg_referenced_p (SET_DEST (prev_set), PATTERN (curr)))
> +{
> +  enum attr_type prev_type = get_attr_type (prev);
> +  if ((prev_type == TYPE_ALUS_SREG || prev_type == TYPE_ALUS_IMM)
> +&& (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSEL)
> +&& GET_CODE (SET_SRC (curr_set)) == IF_THEN_ELSE
> +&& REG_P (XEXP (SET_SRC (curr_set), 1))
> +&& REG_P (XEXP (SET_SRC (curr_set), 2))

Gah, I'd meant to say this in a previous review, but now realise
that I forgot.  I think these REG_Ps can be relaxed to:

   && aarch64_reg_or_zero (XEXP (SET_SRC (curr_set), N), VOIDmode)

since CSEL can take w/xzr.

> +&& SCALAR_INT_MODE_P (GET_MODE (XEXP (SET_SRC (curr_set), 1
> +|| (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_CSET)
> +&& GET_RTX_CLASS (GET_CODE (SET_SRC (curr_set)))
> +   == RTX_COMPARE
> +&& REG_P (SET_DEST (curr_set
> + return true;
> +}

It looks like there might be a missing pair of brackets here.  AFAICT,
the current bracketing works out as:

  if ((prev_type conditions)
  && (CMP_CSEL conditions)
 || (CMP_CSET conditions))

and since || binds less tightly than &&, I think the CMP_CSET condition
doesn't include the prev_type restriction.  Enclosing everything after
the first && in an extra set of brackets would fix that.
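
To make the precedence point concrete, here is a small standalone sketch.  The three booleans are hypothetical stand-ins for the prev_type, CMP_CSEL and CMP_CSET condition groups; nothing here is taken from aarch64.cc itself.

```cpp
#include <cassert>

// How the condition in the patch parses: && binds more tightly than ||,
// so "prev_ok && csel_ok || cset_ok" groups as shown by the explicit
// parentheses below.
static bool as_written(bool prev_ok, bool csel_ok, bool cset_ok) {
  return (prev_ok && csel_ok) || cset_ok;
}

// The grouping the review asks for: the prev_type restriction applies
// to both the CSEL and the CSET alternatives.
static bool as_intended(bool prev_ok, bool csel_ok, bool cset_ok) {
  return prev_ok && (csel_ok || cset_ok);
}
```

The two forms only differ when the prev_type check fails but the CSET conditions hold: as_written (false, false, true) is true, whereas as_intended (false, false, true) is false.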

OK with those changes from my POV (no need for another round of review,
unless you'd prefer one).  Please give others 

Re: [PATCH 8/8]AArch64: take gather/scatter decode overhead into account

2024-07-31 Thread Kyrylo Tkachov
>>> @@ -135,6 +135,8 @@ static const sve_vec_cost cortexx925_sve_vector_cost =
>>> operation more than a 64-bit gather.  */
>>>  14, /* gather_load_x32_cost  */
>>>  12, /* gather_load_x64_cost  */
>>> +  42, /* gather_load_x32_init_cost  */
>>> +  24, /* gather_load_x64_init_cost  */
>> 
>> 
>> Can you comment on how these numbers are derived?
> 
> They were derived essentially from benchmarking.  I did a bunch of runs over
> various cores to determine at which iteration count they become profitable.
> From that, as you can probably tell, the costs are a multiple of the cost of
> the operations for the specific core.
> 
> This is because that cost already takes things like VL differences into
> account.
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
> 
> Ok for master?

Ok with Richard’s comments addressed.
Thanks,
Kyrill

> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>* config/aarch64/aarch64-protos.h (struct sve_vec_cost): Add
>gather_load_x32_init_cost and gather_load_x64_init_cost.
>* config/aarch64/aarch64.cc (aarch64_vector_costs): Add
>m_sve_gather_scatter_init_cost.
>(aarch64_vector_costs::add_stmt_cost): Use them.
>(aarch64_vector_costs::finish_cost): Likewise.
>* config/aarch64/tuning_models/a64fx.h: Update.
>* config/aarch64/tuning_models/cortexx925.h: Update.
>* config/aarch64/tuning_models/generic.h: Update.
>* config/aarch64/tuning_models/generic_armv8_a.h: Update.
>* config/aarch64/tuning_models/generic_armv9_a.h: Update.
>* config/aarch64/tuning_models/neoverse512tvb.h: Update.
>* config/aarch64/tuning_models/neoversen2.h: Update.
>* config/aarch64/tuning_models/neoversen3.h: Update.
>* config/aarch64/tuning_models/neoversev1.h: Update.
>* config/aarch64/tuning_models/neoversev2.h: Update.
>* config/aarch64/tuning_models/neoversev3.h: Update.
>* config/aarch64/tuning_models/neoversev3ae.h: Update.
> 
> -- inline copy of patch --
> 
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 
> 42639e9efcf1e0f9362f759ae63a31b8eeb0d581..16eb8edab4d9fdfc6e3672c56ef5c9f6962d0c0b
>  100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -262,6 +262,8 @@ struct sve_vec_cost : simd_vec_cost
>  unsigned int fadda_f64_cost,
>  unsigned int gather_load_x32_cost,
>  unsigned int gather_load_x64_cost,
> + unsigned int gather_load_x32_init_cost,
> + unsigned int gather_load_x64_init_cost,
>  unsigned int scatter_store_elt_cost)
> : simd_vec_cost (base),
>   clast_cost (clast_cost),
> @@ -270,6 +272,8 @@ struct sve_vec_cost : simd_vec_cost
>   fadda_f64_cost (fadda_f64_cost),
>   gather_load_x32_cost (gather_load_x32_cost),
>   gather_load_x64_cost (gather_load_x64_cost),
> +  gather_load_x32_init_cost (gather_load_x32_init_cost),
> +  gather_load_x64_init_cost (gather_load_x64_init_cost),
>   scatter_store_elt_cost (scatter_store_elt_cost)
>   {}
> 
> @@ -289,6 +293,12 @@ struct sve_vec_cost : simd_vec_cost
>   const int gather_load_x32_cost;
>   const int gather_load_x64_cost;
> 
> +  /* Additional loop initialization cost of using a gather load instruction. 
>  The x32
> + value is for loads of 32-bit elements and the x64 value is for loads of
> + 64-bit elements.  */
> +  const int gather_load_x32_init_cost;
> +  const int gather_load_x64_init_cost;
> +
>   /* The per-element cost of a scatter store.  */
>   const int scatter_store_elt_cost;
> };
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> eafa377cb095f49408d8a926fb49ce13e2155ba2..da2feb54ddad9b39db92e0a9ec7c4e40cfa3e4e2
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -16227,6 +16227,10 @@ private:
>  supported by Advanced SIMD and SVE2.  */
>   bool m_has_avg = false;
> 
> +  /* Additional initialization costs for using gather or scatter operation in
> + the current loop.  */
> +  unsigned int m_sve_gather_scatter_init_cost = 0;
> +
>   /* True if the vector body contains a store to a decl and if the
>  function is known to have a vld1 from the same decl.
> 
> @@ -17291,6 +17295,20 @@ aarch64_vector_costs::add_stmt_cost (int count, 
> vect_cost_for_stmt kind,
>stmt_cost = aarch64_detect_vector_stmt_subtype (m_vinfo, kind,
>   

Re: [PATCH] c++/coroutines: only defer expanding co_{await,return,yield} if dependent [PR112341]

2024-07-31 Thread Arsen Arsenović
Jason Merrill  writes:

> On 7/31/24 6:54 AM, Arsen Arsenović wrote:
>> Tested on x86_64-pc-linux-gnu.  OK for trunk?
>> TIA, have a lovely day.
>> -- >8 --
>> By doing so, we can get diagnostics in template decls when we know we
>> can.  For instance, in the following:
>>awaitable g();
>>template
>>task f()
>>{
>>  co_await g();
>>  co_yield 1;
>>  co_return "foo";
>>}
>> ... the coroutine promise type in each statement is always
>> std::coroutine_handle::promise_type, and all of the operands are
>> not type-dependent, so we can always compute the resulting types (and
>> expected types) of these expressions and statements.
>> Also, when we do not know the type of the CO_AWAIT_EXPR or
>> CO_YIELD_EXPR, we now return NULL_TREE as the type rather than
>> unknown_type_node.  This is more correct, since the type is not unknown,
>> it just isn't determined yet.  This also means we can remove the
>> CO_AWAIT_EXPR and CO_YIELD_EXPR special-cases from
>> type_dependent_expression_p.
>> PR c++/112341 - error: insufficient contextual information to determine type
>> on co_await result in function template
>> gcc/cp/ChangeLog:
>>  PR c++/112341
>>  * coroutines.cc (struct coroutine_info): Also cache the
>>  traits type.
>>  (ensure_coro_initialized): New function.  Makes sure we have
>>  initialized the coroutine state successfully, or informs the
>>  caller should it fail to do so.  Extracted from
>>  coro_promise_type_found_p.
>>  (coro_get_traits_class): New function.  Gets the (cached)
>>  coroutine traits type for a given coroutine.  Extracted from
>>  coro_promise_type_found_p and refactored to cache the result.
>>  (coro_promise_type_found_p): Use the two functions above.
>>  (build_template_co_await_expr): New function.  Builds a
>>  CO_AWAIT_EXPR representing a CO_AWAIT_EXPR in a template
>>  declaration.
>>  (build_co_await): Use the above if processing_template_decl, and
>>  give it a proper type.
>>  (defer_expansion_p): New function.  Returns true iff its
>>  argument is a type-dependent expression OR the current function's
>>  traits class is type dependent.
>>  (finish_co_await_expr): Defer expansion only in the case
>>  defer_expansion_p returns true.
>>  (finish_co_yield_expr): Ditto.
>>  (finish_co_return_stmt): Ditto.
>>  * pt.cc (type_dependent_expression_p): Do not treat
>>  CO_AWAIT/CO_YIELD specially.
>> gcc/testsuite/ChangeLog:
>>  PR c++/112341
>>  * g++.dg/coroutines/pr112341-2.C: New test.
>>  * g++.dg/coroutines/pr112341-3.C: New test.
>>  * g++.dg/coroutines/torture/co-yield-03-tmpl-nondependent.C: New
>>  test.
>>  * g++.dg/coroutines/pr112341.C: New test.
>> ---
>>   gcc/cp/coroutines.cc  | 142 ++
>>   gcc/cp/pt.cc  |   5 -
>>   gcc/testsuite/g++.dg/coroutines/pr112341-2.C  |  25 +++
>>   gcc/testsuite/g++.dg/coroutines/pr112341-3.C  |  65 
>>   gcc/testsuite/g++.dg/coroutines/pr112341.C|  21 +++
>>   .../torture/co-yield-03-tmpl-nondependent.C   | 140 +
>>   6 files changed, 361 insertions(+), 37 deletions(-)
>>   create mode 100644 gcc/testsuite/g++.dg/coroutines/pr112341-2.C
>>   create mode 100644 gcc/testsuite/g++.dg/coroutines/pr112341-3.C
>>   create mode 100644 gcc/testsuite/g++.dg/coroutines/pr112341.C
>>   create mode 100644 
>> gcc/testsuite/g++.dg/coroutines/torture/co-yield-03-tmpl-nondependent.C
>> diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
>> index 127a1c06b56e..9494cb499454 100644
>> --- a/gcc/cp/coroutines.cc
>> +++ b/gcc/cp/coroutines.cc
>> @@ -85,6 +85,7 @@ struct GTY((for_user)) coroutine_info
>> tree actor_decl;/* The synthesized actor function.  */
>> tree destroy_decl;  /* The synthesized destroy function.  */
>> tree promise_type;  /* The cached promise type for this function.  */
>> +  tree traits_type;   /* The cached traits type for this function.  */
>> tree handle_type;   /* The cached coroutine handle for this function.  */
>> tree self_h_proxy;  /* A handle instance that is used as the proxy for 
>> the
>>   one that will eventually be allocated in the coroutine
>> @@ -429,11 +430,12 @@ find_promise_type (tree traits_class)
>> return promise_type;
>>   }
>>   +/* Perform initialization of the coroutine processor state, if not done
>> +   before.  */
>> +
>>   static bool
>> -coro_promise_type_found_p (tree fndecl, location_t loc)
>> +ensure_coro_initialized (location_t loc)
>>   {
>> -  gcc_assert (fndecl != NULL_TREE);
>> -
>> if (!coro_initialized)
>>   {
>> /* Trees we only need to create once.
>> @@ -466,6 +468,33 @@ coro_promise_type_found_p (tree fndecl, location_t loc)
>>   coro_initialized = true;
>>   }
>> +  return true;
>> +}
>> +
>> +/* Try to get the coroutine traits class.  */
>> 

Re: [PATCH] testsuite: Adjust fam-in-union-alone-in-struct-2.c to support BE [PR116148]

2024-07-31 Thread Qing Zhao
Hi, Kewen,

Thanks a lot for fixing this testing case issue.
Yes, the change LGTM though I can’t approve it. 
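
As a rough illustration of why the expected values in the patch quoted below depend on byte order, the following sketch reads the lowest-addressed byte of a 32-bit value.  The function names are made up for this example, and it assumes the GCC/Clang predefined macros __BYTE_ORDER__ and __ORDER_LITTLE_ENDIAN__ that the patch itself uses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Return the byte stored at the lowest address of VALUE's object
// representation, i.e. what a char array overlaying it sees at index 0.
static unsigned char first_byte(std::uint32_t value) {
  unsigned char bytes[sizeof value];
  std::memcpy(bytes, &value, sizeof value);
  return bytes[0];
}

// The byte we expect at index 0: the least significant byte on
// little-endian targets, the most significant byte otherwise.
static unsigned char expected_first_byte(std::uint32_t value) {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return static_cast<unsigned char>(value & 0xffu);
#else
  return static_cast<unsigned char>(value >> 24);
#endif
}
```

For 0x1f2f3f4f this gives 0x4f on little-endian and 0x1f on big-endian, matching the WITH_FAM_3_V_A0 values in the patch.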

Qing

> On Jul 31, 2024, at 05:22, Kewen.Lin  wrote:
> 
> Hi,
> 
> As Andrew pointed out in PR116148, fam-in-union-alone-in-struct-2.c
> was designed for little-endian; the recent commit r15-2403 made it
> run on BE as well, which exposed PR116148.
> 
> This patch adjusts the expected data for members in with_fam_2_v
> and with_fam_3_v according to endianness, and also updates
> with_fam_3_v.b[1] from 0x5f6f7f7f to 0x5f6f7f8f to avoid two "7f"s.
> 
> Tested on powerpc64-linux-gnu P8/P9 and powerpc64le-linux-gnu P9/P10.
> 
> Is it ok for trunk?
> 
> BR,
> Kewen
> -
>   PR testsuite/116148
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/fam-in-union-alone-in-struct-2.c: Define macros
>   WITH_FAM_2_V_B[03] and WITH_FAM_3_V_A[07] according to endianness, update the
>   checking with these macros and initialize with_fam_3_v.b[1] with
>   0x5f6f7f8f instead of 0x5f6f7f7f.
> ---
> .../fam-in-union-alone-in-struct-2.c  | 22 ++-
> 1 file changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c 
> b/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c
> index 93f9d5128f6..7845a7fbab3 100644
> --- a/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c
> +++ b/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c
> @@ -16,7 +16,7 @@ union with_fam_2 {
> union with_fam_3 {
>   char a[];
>   int b[];
> -} with_fam_3_v = {.b = {0x1f2f3f4f, 0x5f6f7f7f}};
> +} with_fam_3_v = {.b = {0x1f2f3f4f, 0x5f6f7f8f}};
> 
> struct only_fam {
>   int b[];
> @@ -28,16 +28,28 @@ struct only_fam_2 {
>   int b[];
> } only_fam_2_v = {{7, 11}};
> 
> +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> +#define WITH_FAM_2_V_B0 0x4f
> +#define WITH_FAM_2_V_B3 0x1f
> +#define WITH_FAM_3_V_A0 0x4f
> +#define WITH_FAM_3_V_A7 0x5f
> +#else
> +#define WITH_FAM_2_V_B0 0x1f
> +#define WITH_FAM_2_V_B3 0x4f
> +#define WITH_FAM_3_V_A0 0x1f
> +#define WITH_FAM_3_V_A7 0x8f
> +#endif
> +
> int main ()
> {
>   if (with_fam_1_v.b[3] != 4
>   || with_fam_1_v.b[0] != 1)
> __builtin_abort ();
> -  if (with_fam_2_v.b[3] != 0x1f
> -  || with_fam_2_v.b[0] != 0x4f)
> +  if (with_fam_2_v.b[3] != WITH_FAM_2_V_B3
> +  || with_fam_2_v.b[0] != WITH_FAM_2_V_B0)
> __builtin_abort ();
> -  if (with_fam_3_v.a[0] != 0x4f
> -  || with_fam_3_v.a[7] != 0x5f)
> +  if (with_fam_3_v.a[0] != WITH_FAM_3_V_A0
> +  || with_fam_3_v.a[7] != WITH_FAM_3_V_A7)
> __builtin_abort ();
>   if (only_fam_v.b[0] != 7
>   || only_fam_v.b[1] != 11)
> --
> 2.45.2



Re: [PATCH 8/8]AArch64: take gather/scatter decode overhead into account

2024-07-31 Thread Richard Sandiford
Tamar Christina  writes:
> @@ -289,6 +293,12 @@ struct sve_vec_cost : simd_vec_cost
>const int gather_load_x32_cost;
>const int gather_load_x64_cost;
>  
> +  /* Additional loop initialization cost of using a gather load instruction. 
>  The x32

Sorry for the trivia, but: long line.

> + value is for loads of 32-bit elements and the x64 value is for loads of
> + 64-bit elements.  */
> +  const int gather_load_x32_init_cost;
> +  const int gather_load_x64_init_cost;
> +
>/* The per-element cost of a scatter store.  */
>const int scatter_store_elt_cost;
>  };
> [...]
> @@ -17291,6 +17295,20 @@ aarch64_vector_costs::add_stmt_cost (int count, 
> vect_cost_for_stmt kind,
>   stmt_cost = aarch64_detect_vector_stmt_subtype (m_vinfo, kind,
>   stmt_info, vectype,
>   where, stmt_cost);
> +
> +  /* Check if we've seen an SVE gather/scatter operation and which size. 
>  */
> +  if (kind == scalar_load
> +   && aarch64_sve_mode_p (TYPE_MODE (vectype))
> +   && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_GATHER_SCATTER)
> + {
> +   const sve_vec_cost *sve_costs = aarch64_tune_params.vec_costs->sve;

I think we need to check whether this is nonnull, since not all tuning
targets provide SVE costs.

> +   if (GET_MODE_UNIT_BITSIZE (TYPE_MODE (vectype)) == 64)
> + m_sve_gather_scatter_init_cost
> +   += sve_costs->gather_load_x64_init_cost;
> +   else
> + m_sve_gather_scatter_init_cost
> +   += sve_costs->gather_load_x32_init_cost;
> + }
>  }
>  
>/* Do any SVE-specific adjustments to the cost.  */
> @@ -17676,6 +17694,12 @@ aarch64_vector_costs::finish_cost (const 
> vector_costs *uncast_scalar_costs)
>m_costs[vect_body] = adjust_body_cost (loop_vinfo, scalar_costs,
>m_costs[vect_body]);
>m_suggested_unroll_factor = determine_suggested_unroll_factor ();
> +
> +  /* For gather and scatters there's an additional overhead for the first
> +  iteration.  For low count loops they're not beneficial so model the
> +  overhead as loop prologue costs.  */
> +  if (m_sve_gather_scatter_init_cost)
> + m_costs[vect_prologue] += m_sve_gather_scatter_init_cost;

Might as well make this unconditional now.
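
Putting the two suggestions together (the nonnull check and the unconditional add), the accumulation could look roughly like the sketch below.  The types and member names are simplified, hypothetical stand-ins, not the real sve_vec_cost or aarch64_vector_costs classes.

```cpp
#include <cassert>

// Hypothetical, heavily reduced stand-in for the SVE cost table.
struct sve_costs_sketch {
  unsigned x32_init_cost;
  unsigned x64_init_cost;
};

// Hypothetical stand-in for the per-loop cost accumulator.
struct cost_accumulator_sketch {
  unsigned gather_scatter_init_cost = 0;
  unsigned prologue_cost = 0;

  // Record one gather/scatter statement.  SVE may be null, because not
  // every tuning target provides SVE costs; in that case add nothing.
  void note_gather(const sve_costs_sketch *sve, unsigned element_bits) {
    if (!sve)
      return;
    gather_scatter_init_cost
      += (element_bits == 64) ? sve->x64_init_cost : sve->x32_init_cost;
  }

  // Adding zero is harmless, so the prologue update needs no guard.
  void finish() { prologue_cost += gather_scatter_init_cost; }
};
```

With the cortexx925 numbers from the patch (42 and 24), one 32-bit and one 64-bit gather would add 66 to the prologue cost in this sketch, and a null cost table would leave it at 0.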

LGTM with those changes, but please wait for Kyrill's review too.

Thanks,
Richard

>  }
>  
>/* Apply the heuristic described above m_stp_sequence_cost.  Prefer
> diff --git a/gcc/config/aarch64/tuning_models/a64fx.h 
> b/gcc/config/aarch64/tuning_models/a64fx.h
> index 
> 6091289d4c3c66f01d7e4dbf97a85c1f8c40bb0b..378a1b3889ee265859786c1ff6525fce2305b615
>  100644
> --- a/gcc/config/aarch64/tuning_models/a64fx.h
> +++ b/gcc/config/aarch64/tuning_models/a64fx.h
> @@ -104,6 +104,8 @@ static const sve_vec_cost a64fx_sve_vector_cost =
>13, /* fadda_f64_cost  */
>64, /* gather_load_x32_cost  */
>32, /* gather_load_x64_cost  */
> +  0, /* gather_load_x32_init_cost  */
> +  0, /* gather_load_x64_init_cost  */
>1 /* scatter_store_elt_cost  */
>  };
>  
> diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h 
> b/gcc/config/aarch64/tuning_models/cortexx925.h
> index 
> 6cae5b7de5ca7ffad8a0f683e1285039bb55d159..b509cae758419a415d9067ec751ef1e6528eb09a
>  100644
> --- a/gcc/config/aarch64/tuning_models/cortexx925.h
> +++ b/gcc/config/aarch64/tuning_models/cortexx925.h
> @@ -135,6 +135,8 @@ static const sve_vec_cost cortexx925_sve_vector_cost =
>   operation more than a 64-bit gather.  */
>14, /* gather_load_x32_cost  */
>12, /* gather_load_x64_cost  */
> +  42, /* gather_load_x32_init_cost  */
> +  24, /* gather_load_x64_init_cost  */
>1 /* scatter_store_elt_cost  */
>  };
>  
> diff --git a/gcc/config/aarch64/tuning_models/generic.h 
> b/gcc/config/aarch64/tuning_models/generic.h
> index 
> 2b1f68b3052117814161a32f426422736ad6462b..101969bdbb9ccf7eafbd9a1cd6e25f0b584fb261
>  100644
> --- a/gcc/config/aarch64/tuning_models/generic.h
> +++ b/gcc/config/aarch64/tuning_models/generic.h
> @@ -105,6 +105,8 @@ static const sve_vec_cost generic_sve_vector_cost =
>2, /* fadda_f64_cost  */
>4, /* gather_load_x32_cost  */
>2, /* gather_load_x64_cost  */
> +  12, /* gather_load_x32_init_cost  */
> +  4, /* gather_load_x64_init_cost  */
>1 /* scatter_store_elt_cost  */
>  };
>  
> diff --git a/gcc/config/aarch64/tuning_models/generic_armv8_a.h 
> b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
> index 
> b38b9a8c5cad7d12aa38afdb610a14a25e755010..b5088afe068aa4be7f9dd614cfdd2a51fa96e524
>  100644
> --- a/gcc/config/aarch64/tuning_models/generic_armv8_a.h
> +++ b/gcc/config/aarch64/tuning_models/generic_armv8_a.h
> @@ -106,6 +106,8 @@ static const sve_vec_cost generic_armv8_a_sve_vector_cost 
> =
>2, /* fadda_f64_cost  */
>4, /* gather_load_x32_cost  */
>2, /* gather_load_x64_cost  */
> +  

Re: [PATCH] c++/coroutines: only defer expanding co_{await,return,yield} if dependent [PR112341]

2024-07-31 Thread Jason Merrill

On 7/31/24 6:54 AM, Arsen Arsenović wrote:

Tested on x86_64-pc-linux-gnu.  OK for trunk?

TIA, have a lovely day.
-- >8 --
By doing so, we can get diagnostics in template decls when we know we
can.  For instance, in the following:

   awaitable g();
   template
   task f()
   {
 co_await g();
 co_yield 1;
 co_return "foo";
   }

... the coroutine promise type in each statement is always
std::coroutine_handle::promise_type, and all of the operands are
not type-dependent, so we can always compute the resulting types (and
expected types) of these expressions and statements.

Also, when we do not know the type of the CO_AWAIT_EXPR or
CO_YIELD_EXPR, we now return NULL_TREE as the type rather than
unknown_type_node.  This is more correct, since the type is not unknown,
it just isn't determined yet.  This also means we can remove the
CO_AWAIT_EXPR and CO_YIELD_EXPR special-cases from
type_dependent_expression_p.

PR c++/112341 - error: insufficient contextual information to determine type on 
co_await result in function template

gcc/cp/ChangeLog:

PR c++/112341
* coroutines.cc (struct coroutine_info): Also cache the
traits type.
(ensure_coro_initialized): New function.  Makes sure we have
initialized the coroutine state successfully, or informs the
caller should it fail to do so.  Extracted from
coro_promise_type_found_p.
(coro_get_traits_class): New function.  Gets the (cached)
coroutine traits type for a given coroutine.  Extracted from
coro_promise_type_found_p and refactored to cache the result.
(coro_promise_type_found_p): Use the two functions above.
(build_template_co_await_expr): New function.  Builds a
CO_AWAIT_EXPR representing a CO_AWAIT_EXPR in a template
declaration.
(build_co_await): Use the above if processing_template_decl, and
give it a proper type.
(defer_expansion_p): New function.  Returns true iff its
argument is a type-dependent expression OR the current function's
traits class is type dependent.
(finish_co_await_expr): Defer expansion only in the case
defer_expansion_p returns true.
(finish_co_yield_expr): Ditto.
(finish_co_return_stmt): Ditto.
* pt.cc (type_dependent_expression_p): Do not treat
CO_AWAIT/CO_YIELD specially.

gcc/testsuite/ChangeLog:

PR c++/112341
* g++.dg/coroutines/pr112341-2.C: New test.
* g++.dg/coroutines/pr112341-3.C: New test.
* g++.dg/coroutines/torture/co-yield-03-tmpl-nondependent.C: New
test.
* g++.dg/coroutines/pr112341.C: New test.
---
  gcc/cp/coroutines.cc  | 142 ++
  gcc/cp/pt.cc  |   5 -
  gcc/testsuite/g++.dg/coroutines/pr112341-2.C  |  25 +++
  gcc/testsuite/g++.dg/coroutines/pr112341-3.C  |  65 
  gcc/testsuite/g++.dg/coroutines/pr112341.C|  21 +++
  .../torture/co-yield-03-tmpl-nondependent.C   | 140 +
  6 files changed, 361 insertions(+), 37 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/coroutines/pr112341-2.C
  create mode 100644 gcc/testsuite/g++.dg/coroutines/pr112341-3.C
  create mode 100644 gcc/testsuite/g++.dg/coroutines/pr112341.C
  create mode 100644 
gcc/testsuite/g++.dg/coroutines/torture/co-yield-03-tmpl-nondependent.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 127a1c06b56e..9494cb499454 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -85,6 +85,7 @@ struct GTY((for_user)) coroutine_info
tree actor_decl;/* The synthesized actor function.  */
tree destroy_decl;  /* The synthesized destroy function.  */
tree promise_type;  /* The cached promise type for this function.  */
+  tree traits_type;   /* The cached traits type for this function.  */
tree handle_type;   /* The cached coroutine handle for this function.  */
tree self_h_proxy;  /* A handle instance that is used as the proxy for the
 one that will eventually be allocated in the coroutine
@@ -429,11 +430,12 @@ find_promise_type (tree traits_class)
return promise_type;
  }
  
+/* Perform initialization of the coroutine processor state, if not done
+   before.  */
+
  static bool
-coro_promise_type_found_p (tree fndecl, location_t loc)
+ensure_coro_initialized (location_t loc)
  {
-  gcc_assert (fndecl != NULL_TREE);
-
if (!coro_initialized)
  {
/* Trees we only need to create once.
@@ -466,6 +468,33 @@ coro_promise_type_found_p (tree fndecl, location_t loc)
  
coro_initialized = true;

  }
+  return true;
+}
+
+/* Try to get the coroutine traits class.  */
+static tree
+coro_get_traits_class (tree fndecl, location_t loc)
+{
+  gcc_assert (fndecl != NULL_TREE);
+
+  if (!ensure_coro_initialized (loc))
+/* We can't continue.  */
+return error_mark_node;
+
+  coroutine_info *coro_info = 

Re: [PATCH] dir-locals: apply our C settings in C++ also

2024-07-31 Thread Richard Sandiford
Arsen Arsenović  writes:
> We haven't been applying our settings to our C++.  This patch fixes
> that.
>
> Sadly, it seems that the only documented way to apply settings to
> multiple modes is to repeat them.  I thought that we can provide a list
> of modes to apply, but that seems to not be the case (even though it
> happened to work on my machine).
>
> As a result, C-h C-v fill-column now shows:
>
>   This variable’s value is directory-local, set by the file
>   ‘/home/arsen/gcc/pristine/.dir-locals.el’.
>
> As this could affect people's workflows, I'm posting as a heads-up and
> sanity check.
>
> OK for trunk?
>
> TIA, have a lovely day.
> -- >8 --
> This also works with Emacs 30 Tree-Sitter C and C++ modes, as they are
> submodes.
>
> ChangeLog:
>
>   * .dir-locals.el: Change c-mode to a list of C, C++ and ObjC
>   modes that Emacs currently provides.

OK, thanks.

Richard

> ---
>  .dir-locals.el | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/.dir-locals.el b/.dir-locals.el
> index fa031cbded99..2c12b3866633 100644
> --- a/.dir-locals.el
> +++ b/.dir-locals.el
> @@ -18,6 +18,10 @@
> (tcl-continued-indent-level . 4)
> (indent-tabs-mode . t)))
>   (nil . ((bug-reference-url-format . "https://gcc.gnu.org/PR%s")))
> + ;; Please keep C and C++ in sync.
>   (c-mode . ((c-file-style . "GNU")
>   (indent-tabs-mode . t)
> - (fill-column . 79))))
> + (fill-column . 79)))
> + (c++-mode . ((c-file-style . "GNU")
> +   (indent-tabs-mode . t)
> +   (fill-column . 79))))


Re: [PATCH] rs6000, document built-ins vec_test_lsbb_all_ones and, vec_test_lsbb_all_zeros

2024-07-31 Thread Carl Love

Kewen:

On 7/31/24 2:12 AM, Kewen.Lin wrote:

Hi Carl,

on 2024/7/27 06:56, Carl Love wrote:

GCC maintainers:

Per a report from a user, the existing vec_test_lsbb_all_ones and
vec_test_lsbb_all_zeros built-ins are not documented in the GCC documentation
file.

The following patch adds missing documentation for the vec_test_lsbb_all_ones
and vec_test_lsbb_all_zeros built-ins.

Please let me know if the patch is acceptable for mainline.  Thanks.

     Carl

---
rs6000, document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros

Add documentation for the Power 10 built-ins vec_test_lsbb_all_ones
and vec_test_lsbb_all_zeros.  The vec_test_lsbb_all_ones built-in
returns 1 if the least significant bit in each byte is a 1, returns
0 otherwise.  Similarly, vec_test_lsbb_all_zeros returns a 1 if
the least significant bit in each byte is a zero and 0 otherwise.
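
The semantics described above can be modelled in portable C.  This is only
a sketch for illustration: the names model_lsbb_all_ones and
model_lsbb_all_zeros are hypothetical stand-ins, the 16-byte array stands in
for a vector unsigned char, and nothing here uses the real Power10 built-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model, not the real built-in: returns 1 if the
   least significant bit of every one of the 16 bytes is 1, else 0.  */
int
model_lsbb_all_ones (const unsigned char v[16])
{
  assert (v != NULL);
  for (int i = 0; i < 16; i++)
    if ((v[i] & 1) == 0)
      return 0;
  return 1;
}

/* Hypothetical scalar model, not the real built-in: returns 1 if the
   least significant bit of every one of the 16 bytes is 0, else 0.  */
int
model_lsbb_all_zeros (const unsigned char v[16])
{
  assert (v != NULL);
  for (int i = 0; i < 16; i++)
    if ((v[i] & 1) != 0)
      return 0;
  return 1;
}
```

Note that a vector with mixed low bits makes both models return 0, so the
two functions are not complements of each other.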

The test cases for the built-ins are in files:
   gcc/testsuite/gcc.target/powerpc/lsbb.c
   gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c


gcc/ChangeLog:
     * doc/extend.texi (vec_test_lsbb_all_ones,
     vec_test_lsbb_all_zeros): Add documentation for the
     existing built-ins.
---
  gcc/doc/extend.texi | 15 +++
  1 file changed, 15 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 83ff168faf6..96e41c9a905 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -23240,6 +23240,21 @@ signed long long will sign extend the rightmost byte 
of each doubleword.
  The following additional built-in functions are also available for the
  PowerPC family of processors, starting with ISA 3.1 (@option{-mcpu=power10}):

+@smallexample
+@exdent int vec_test_lsbb_all_ones (vector char);

I think we need to specify "unsigned" char explicitly since we don't actually
allow vector "signed" char as the below testing shows:

int foo11 (vector signed char va)
{
   return vec_test_lsbb_all_ones (va);
}

:17:3: error: invalid parameter combination for AltiVec intrinsic 
'__builtin_vec_xvtlsbb_all_ones'
17 |   return vec_test_lsbb_all_ones (va);


Now we make these two bifs as overload, but there is only one instance 
respectively,
Yes, I noticed that the built-ins were defined as overloaded but only 
had one definition.   Did seem odd to me.



either is with "vector unsigned char" as argument type, but the corresponding 
instance
prototype in builtin table is with "vector signed char".  It's inconsistent and 
weird,
I think we can just update the prototype in builtin table with "vector unsigned 
char"
and remove the entries in overload table.  It can be a follow up patch.


I didn't notice that it was signed in the instance prototype but 
unsigned in the overloaded definition.  That is definitely inconsistent.


That said, should we just go ahead and support both signed and unsigned 
argument versions of the all ones and all zeros built-ins?


For example

[VEC_TEST_LSBB_ALL_ONES, vec_test_lsbb_all_ones, 
__builtin_vec_xvtlsbb_all_ones]

  signed int __builtin_vec_xvtlsbb_all_ones (vsc);
    XVTLSBB_ONES   LSBB_ALL_ONES_VSC
  signed int __builtin_vec_xvtlsbb_all_ones (vuc);
    XVTLSBB_ONES   LSBB_ALL_ONES_VUC

I tried this with the testcase, I borrowed from you and extended:

int foo11 (vector char va)    <- compiles fine
{
  return vec_test_lsbb_all_ones (va);
}

int sfoo11 (vector signed char sva)    <- currently fails to compile without change to overloaded def, compiles with additional overloaded definition.
{
  return vec_test_lsbb_all_ones (sva);
}

int ufoo11 (vector unsigned char uva)    <- compiles fine
{
  return vec_test_lsbb_all_ones (uva);
}

I did a quick test to see that the testcase does compile.  We would need
to add testcases to lsbb.c and lsbb-runnable.c and then update
the documentation to say both are supported.

Thoughts on expanding the scope of the patch from just documentation to 
adding additional overloaded cases and updating the documentation?





+@end smallexample
+@findex vec_test_lsbb_all_ones
+
+The builtin @code{vec_test_lsbb_all_ones} returns 1 if the least significant
+bit in each byte is a one.  It returns a zero otherwise.

May be better to use the wording "equal to 1" referred from ISA and "returns 0"
matches the preceding "returns 1", like:

“... in each byte is equal to 1.  It returns 0 otherwise.”


Changed.

+
+@smallexample
+@exdent int vec_test_lsbb_all_zeros (vector char);
+@end smallexample
+@findex vec_test_lsbb_all_zeros
+
+The builtin @code{vec_test_lsbb_all_zeros} returns 1 if the least significant
+bit in each byte is a zero.  It returns a zero otherwise.

Likewise, "... in each byte is equal to 0.  It returns 0 otherwise."


Changed.

  Carl



Re: [PATCH] RISC-V: Correct mode_idx attribute for viwalu wx variants [PR116149].

2024-07-31 Thread Jeff Law




On 7/31/24 9:55 AM, Robin Dapp wrote:

Hi,

in PR116149 we choose a wrong vector length which causes wrong values in
a reduction.  The problem happens in avlprop where we choose the
number of units in the instruction's mode as vector length.  For the
non-scalar variants the respective operand has the correct non-widened
mode.  For the scalar variants, however, the same operand has a scalar
mode which obviously only has one unit.  This makes us choose VL = 1
leaving three elements undisturbed (so potentially -1).  Those end up
in the reduction causing the wrong result.

This patch adjusts the mode_idx just for the scalar variants of the
affected instruction patterns.

Regards
  Robin

gcc/ChangeLog:

PR target/116149

* config/riscv/vector.md: Fix mode_idx attribute of scalar
widen add/sub variants.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr116149.c: New test.

OK
jeff



RE: [PATCH 8/8]AArch64: take gather/scatter decode overhead into account

2024-07-31 Thread Tamar Christina
ons for the 
specific core.

This is because that cost already takes things like VL differences into account.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (struct sve_vec_cost): Add
gather_load_x32_init_cost and gather_load_x64_init_cost.
* config/aarch64/aarch64.cc (aarch64_vector_costs): Add
m_sve_gather_scatter_init_cost.
(aarch64_vector_costs::add_stmt_cost): Use them.
(aarch64_vector_costs::finish_cost): Likewise.
* config/aarch64/tuning_models/a64fx.h: Update.
* config/aarch64/tuning_models/cortexx925.h: Update.
* config/aarch64/tuning_models/generic.h: Update.
* config/aarch64/tuning_models/generic_armv8_a.h: Update.
* config/aarch64/tuning_models/generic_armv9_a.h: Update.
* config/aarch64/tuning_models/neoverse512tvb.h: Update.
* config/aarch64/tuning_models/neoversen2.h: Update.
* config/aarch64/tuning_models/neoversen3.h: Update.
* config/aarch64/tuning_models/neoversev1.h: Update.
* config/aarch64/tuning_models/neoversev2.h: Update.
* config/aarch64/tuning_models/neoversev3.h: Update.
* config/aarch64/tuning_models/neoversev3ae.h: Update.

-- inline copy of patch --

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 42639e9efcf1e0f9362f759ae63a31b8eeb0d581..16eb8edab4d9fdfc6e3672c56ef5c9f6962d0c0b 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -262,6 +262,8 @@ struct sve_vec_cost : simd_vec_cost
  unsigned int fadda_f64_cost,
  unsigned int gather_load_x32_cost,
  unsigned int gather_load_x64_cost,
+ unsigned int gather_load_x32_init_cost,
+ unsigned int gather_load_x64_init_cost,
  unsigned int scatter_store_elt_cost)
 : simd_vec_cost (base),
   clast_cost (clast_cost),
@@ -270,6 +272,8 @@ struct sve_vec_cost : simd_vec_cost
   fadda_f64_cost (fadda_f64_cost),
   gather_load_x32_cost (gather_load_x32_cost),
   gather_load_x64_cost (gather_load_x64_cost),
+  gather_load_x32_init_cost (gather_load_x32_init_cost),
+  gather_load_x64_init_cost (gather_load_x64_init_cost),
   scatter_store_elt_cost (scatter_store_elt_cost)
   {}
 
@@ -289,6 +293,12 @@ struct sve_vec_cost : simd_vec_cost
   const int gather_load_x32_cost;
   const int gather_load_x64_cost;
 
+  /* Additional loop initialization cost of using a gather load instruction.  The x32
+ value is for loads of 32-bit elements and the x64 value is for loads of
+ 64-bit elements.  */
+  const int gather_load_x32_init_cost;
+  const int gather_load_x64_init_cost;
+
   /* The per-element cost of a scatter store.  */
   const int scatter_store_elt_cost;
 };
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index eafa377cb095f49408d8a926fb49ce13e2155ba2..da2feb54ddad9b39db92e0a9ec7c4e40cfa3e4e2 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16227,6 +16227,10 @@ private:
  supported by Advanced SIMD and SVE2.  */
   bool m_has_avg = false;
 
+  /* Additional initialization costs for using gather or scatter operation in
+ the current loop.  */
+  unsigned int m_sve_gather_scatter_init_cost = 0;
+
   /* True if the vector body contains a store to a decl and if the
  function is known to have a vld1 from the same decl.
 
@@ -17291,6 +17295,20 @@ aarch64_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
stmt_cost = aarch64_detect_vector_stmt_subtype (m_vinfo, kind,
stmt_info, vectype,
where, stmt_cost);
+
+  /* Check if we've seen an SVE gather/scatter operation and which size.  */
+  if (kind == scalar_load
+ && aarch64_sve_mode_p (TYPE_MODE (vectype))
+ && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_GATHER_SCATTER)
+   {
+ const sve_vec_cost *sve_costs = aarch64_tune_params.vec_costs->sve;
+ if (GET_MODE_UNIT_BITSIZE (TYPE_MODE (vectype)) == 64)
+   m_sve_gather_scatter_init_cost
+ += sve_costs->gather_load_x64_init_cost;
+ else
+   m_sve_gather_scatter_init_cost
+ += sve_costs->gather_load_x32_init_cost;
+   }
 }
 
   /* Do any SVE-specific adjustments to the cost.  */
@@ -17676,6 +17694,12 @@ aarch64_vector_costs::finish_cost (const vector_costs *uncast_scalar_costs)
   m_costs[vect_body] = adjust_body_cost (loop_vinfo, scalar_costs,
 m_costs[vect_body]);
   m_suggested_unroll_factor = determine_suggested_unroll_factor ();
+
+  /* For gather and scatters there's an addition

[PATCH] Make may_trap_p_1 return false for constant pool references [PR116145]

2024-07-31 Thread Richard Sandiford
The testcase contains the constant:

  arr2 = svreinterpret_u8(svdup_u32(0x0a0d5c3f));

which was initially hoisted by hand, but which gimple optimisers later
propagated to each use (as expected).  The constant was then expanded
as a load-and-duplicate from the constant pool.  Normally that load
should then be hoisted back out of the loop, but may_trap_or_fault_p
stopped that from happening in this case.

The code responsible was:

  if (/* MEM_NOTRAP_P only relates to the actual position of the memory
 reference; moving it out of context such as when moving code
 when optimizing, might cause its address to become invalid.  */
  code_changed
  || !MEM_NOTRAP_P (x))
{
  poly_int64 size = MEM_SIZE_KNOWN_P (x) ? MEM_SIZE (x) : -1;
  return rtx_addr_can_trap_p_1 (XEXP (x, 0), 0, size,
GET_MODE (x), code_changed);
}

where code_changed is true.  (Arguably it doesn't need to be true in
this case, if we inserted invariants on the preheader edge, but it
would still need to be true for conditionally executed loads.)

Normally this wouldn't be a problem, since rtx_addr_can_trap_p_1
would recognise that the address refers to the constant pool.
However, the SVE load-and-replicate instructions have a limited
offset range, so it isn't possible for them to have a LO_SUM address.
All we have is a plain pseudo base register.

MEM_READONLY_P is defined as:

/* 1 if RTX is a mem that is statically allocated in read-only memory.  */
  (RTL_FLAG_CHECK1 ("MEM_READONLY_P", (RTX), MEM)->unchanging)

and so I think it should be safe to move memory references if both
MEM_READONLY_P and MEM_NOTRAP_P are true.
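
As a rough illustration of the reasoning, the old and proposed guard
conditions can be written as plain boolean functions.  This is a sketch
only: the function names are hypothetical, the three flags stand in for
code_changed, MEM_READONLY_P (x) and MEM_NOTRAP_P (x), and a true result
means "fall through to the rtx_addr_can_trap_p_1 address check".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the original condition:
   code_changed || !MEM_NOTRAP_P (x).  */
bool
must_check_address_old (bool code_changed, bool mem_notrap)
{
  return code_changed || !mem_notrap;
}

/* Hypothetical model of the proposed condition:
   (code_changed && !MEM_READONLY_P (x)) || !MEM_NOTRAP_P (x).  */
bool
must_check_address (bool code_changed, bool mem_readonly, bool mem_notrap)
{
  return (code_changed && !mem_readonly) || !mem_notrap;
}

/* For memory that is not read-only the two models agree, so the change
   only affects references with the read-only flag set.  */
void
check_models_agree_without_readonly (void)
{
  for (int cc = 0; cc <= 1; cc++)
    for (int nt = 0; nt <= 1; nt++)
      assert (must_check_address (cc, false, nt)
              == must_check_address_old (cc, nt));
}
```

In this model the only input whose result changes is a moved reference
(code_changed true) that is both read-only and no-trap, which is the
constant pool case described above.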

The testcase isn't a minimal reproducer, but I think it's good
to have a realistic full routine in the testsuite.

Bootstrapped & regression-tested on aarch64-linux-gnu.  OK to install?

Richard


gcc/
PR rtl-optimization/116145
* rtlanal.cc (may_trap_p_1): Trust MEM_NOTRAP_P even for code
movement if MEM_READONLY_P is also true.

gcc/testsuite/
PR rtl-optimization/116145
* gcc.target/aarch64/sve/acle/general/pr116145.c: New test.
---
 gcc/rtlanal.cc| 14 --
 .../aarch64/sve/acle/general/pr116145.c   | 46 +++
 2 files changed, 56 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c

diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc
index 4158a531bdd..893a6afbbc5 100644
--- a/gcc/rtlanal.cc
+++ b/gcc/rtlanal.cc
@@ -3152,10 +3152,16 @@ may_trap_p_1 (const_rtx x, unsigned flags)
  && MEM_VOLATILE_P (x)
  && XEXP (x, 0) == stack_pointer_rtx)
return true;
-  if (/* MEM_NOTRAP_P only relates to the actual position of the memory
-reference; moving it out of context such as when moving code
-when optimizing, might cause its address to become invalid.  */
- code_changed
+  if (/* MEM_READONLY_P means that the memory is both statically
+allocated and readonly, so MEM_NOTRAP_P should remain true
+even if the memory reference is moved.  This is certainly
+true for the important case of force_const_mem.
+
+Otherwise, MEM_NOTRAP_P only relates to the actual position
+of the memory reference; moving it out of context such as
+when moving code when optimizing, might cause its address
+to become invalid.  */
+ (code_changed && !MEM_READONLY_P (x))
  || !MEM_NOTRAP_P (x))
{
  poly_int64 size = MEM_SIZE_KNOWN_P (x) ? MEM_SIZE (x) : -1;
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c
new file mode 100644
index 000..a3d93d3e1c8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c
@@ -0,0 +1,46 @@
+// { dg-options "-O2" }
+
+#include 
+#include 
+
+#pragma GCC target "+sve2"
+
+typedef unsigned char uchar;
+
+const uchar *
+search_line_fast (const uchar *s, const uchar *end)
+{
+  size_t VL = svcntb();
+  svuint8_t arr1, arr2;
+  svbool_t pc, pg = svptrue_b8();
+
+  // This should not be loaded inside the loop every time.
+  arr2 = svreinterpret_u8(svdup_u32(0x0a0d5c3f));
+
+  for (; s+VL <= end; s += VL) {
+arr1 = svld1_u8(pg, s);
+pc = svmatch_u8(pg, arr1, arr2);
+
+if (svptest_any(pg, pc)) {
+  pc = svbrkb_z(pg, pc);
+  return s+svcntp_b8(pg, pc);
+}
+  }
+
+  // Handle remainder.
+  if (s < end) {
+pg = svwhilelt_b8((size_t)s, (size_t)end);
+
+arr1 = svld1_u8(pg, s);
+pc = svmatch_u8(pg, arr1, arr2);
+
+if (svptest_any(pg, pc)) {
+  pc = svbrkb_z(pg, pc);
+  return s+svcntp_b8(pg, pc);
+}
+  }
+
+  return end;
+}
+
+// { dg-final { scan-assembler {:\n\tld1b\t[^\n]*\n\tmatch\t[^\n]*\n\tb\.} } }
-- 
2.25.1



Re: [PATCH] libstdc++: drop bogus 'dg_do run' directive

2024-07-31 Thread Sam James
Jonathan Wakely  writes:

> On Wed, 31 Jul 2024 at 16:45, Sam James  wrote:
>>
>> We already have a valid 'dg-do run' (- vs _) directive, so drop the bogus
>> one.
>>
>> libstdc++-v3/ChangeLog:
>> * testsuite/28_regex/traits/char/translate.cc: Drop bogus 'dg_do run'.
>> ---
>> OK? No regressions in the logs but it's a bit weird that it's got a proper
>> directive with a target specifier so I thought I'd check rather than doing
>> it as obvious.
>
> Definitely OK. Dejagnu will ignore it because it doesn't start with
> dg- so it is useless.

Thank you! Will push shortly.

>
> Even if it was used, it would be wrong because std::regex can't be
> used in C++98 so the c++11 effective target is needed.

That's the missing piece I was looking for -- I just didn't want to be
dropping the bogus directive and covering up if something else was wrong
there.

> [...]

thanks,
sam




[Patch, libgfortran] PR105361 Followup fix to test case

2024-07-31 Thread Jerry D
I plan to push this soon to hopefully fix some test breakage on some
architectures.  It is simple and obvious. I did not get any feedback on
this and I do not have access to the machines in question.


Regression tested on linux-x86-64.

Regards,

Jerry

commit bc4ee05dc7c60d534ef927ac5e679f67fb99d54b
Author: Jerry DeLisle 
Date:   Wed Jul 31 08:58:17 2024 -0700

Fortran: Add newline character to test input.

gcc/testsuite/ChangeLog:

PR libfortran/105361

* gfortran.dg/pr105361.f90: Add newline character to test
input to provide more compliant test.

diff --git a/gcc/testsuite/gfortran.dg/pr105361.f90 b/gcc/testsuite/gfortran.dg/pr105361.f90
index e2d3b07caca..62821c2802d 100644
--- a/gcc/testsuite/gfortran.dg/pr105361.f90
+++ b/gcc/testsuite/gfortran.dg/pr105361.f90
@@ -27,7 +27,7 @@ program main
   type(foo) :: a, b
   real :: c, d
   open(10, access="stream")
-  write(10) "1 2" ! // NEW_LINE('A')
+  write(10) "1 2" // NEW_LINE('A')
   close(10)
   open(10)
   read(10,*) c, d


Re: [PATCH] libstdc++: drop bogus 'dg_do run' directive

2024-07-31 Thread Jonathan Wakely
On Wed, 31 Jul 2024 at 16:45, Sam James  wrote:
>
> We already have a valid 'dg-do run' (- vs _) directive, so drop the bogus
> one.
>
> libstdc++-v3/ChangeLog:
> * testsuite/28_regex/traits/char/translate.cc: Drop bogus 'dg_do run'.
> ---
> OK? No regressions in the logs but it's a bit weird that it's got a proper
> directive with a target specifier so I thought I'd check rather than doing
> it as obvious.

Definitely OK. Dejagnu will ignore it because it doesn't start with
dg- so it is useless.

Even if it was used, it would be wrong because std::regex can't be
used in C++98 so the c++11 effective target is needed.
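
To make the "- vs _" point concrete, here is a small sketch of a prefix
check.  It is not how DejaGnu actually parses test files; the function name
is hypothetical and the only thing it models is that a directive name has
to begin with "dg-" to be recognised.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if the text after the first '{'
   on the line starts with "dg-".  Illustration only.  */
bool
line_has_dg_directive (const char *line)
{
  assert (line != NULL);
  const char *p = strchr (line, '{');
  if (p == NULL)
    return false;
  p++;
  /* Skip blanks between the brace and the directive name.  */
  while (*p == ' ' || *p == '\t')
    p++;
  return strncmp (p, "dg-", 3) == 0;
}
```

With this model, the line "// { dg-do run { target c++11 } }" is accepted
while "// { dg_do run }" is not.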


>
>  libstdc++-v3/testsuite/28_regex/traits/char/translate.cc | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/libstdc++-v3/testsuite/28_regex/traits/char/translate.cc b/libstdc++-v3/testsuite/28_regex/traits/char/translate.cc
> index e2552e3cbf05..65119e67e25b 100644
> --- a/libstdc++-v3/testsuite/28_regex/traits/char/translate.cc
> +++ b/libstdc++-v3/testsuite/28_regex/traits/char/translate.cc
> @@ -1,4 +1,3 @@
> -// { dg_do run }
>  // { dg-do run { target c++11 } }
>  // { dg-timeout-factor 2 }
>
>
> --
> 2.45.2
>



Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Uros Bizjak
On Wed, Jul 31, 2024 at 11:33 AM Richard Biener  wrote:

> > > > > > OK. Richard, can you please mention the above in the comment why
> > > > > > XFmode is rejected in the hook?
> > > > > >
> > > > > > Later, we can perhaps benchmark XFmode move vs. generic memory copy to
> > > > > > get some hard data.
> > > > >
> > > > > My (limited) understanding was that the hook would be used only for cases
> > > > > where we'd like to e.g. value number some SF/DF/XF etc. mode loads and some
> > > > > subsequent loads from the same address with different mode but same size
> > > > > the same and replace say int or long long later load with VIEW_CONVERT_EXPR
> > > > > of the result of the SF/SF mode load.  That is what was incorrect, because
> > > > > the load didn't preserve all the bits.  The patch would still keep doing
> > > > > normal SF/DF/XF etc. mode copies if that is all that happens in the program,
> > > > > load some floating point value and store it elsewhere or as part of larger
> > > > > aggregate copy.
> > > >
> > > > So, the hook should allow everything besides SF/DFmode, simply:
> > > >
> > > >
> > > > switch (GET_MODE_INNER (mode))
> > > >   {
> > > >   case SFmode:
> > > >   case DFmode:
> > > > /* These suffer from normalization upon load when not using SSE.  */
> > > > return !(ix86_fpmath & FPMATH_387);
> > > >   default:
> > > > return true;
> > > >   }
> > >
> > > OK, I think I'll go with this then.  I'm now unsure whether the
> > > wrapper around the hook should reject modes with padding or if
> > > the supposed users (value-numbering and SRA) should deal with that
> > > issue separately.  I do wonder whether
> > >
> > > ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_LONG_DOUBLE
> > >   ? &ieee_extended_intel_128_format
> > >   : TARGET_96_ROUND_53_LONG_DOUBLE
> > >   ? &ieee_extended_intel_96_round_53_format
> > >   : &ieee_extended_intel_96_format));
> > > ADJUST_BYTESIZE  (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
> > > ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);
> > >
> > > unambiguously specifies where the padding is - m68k has
> > >
> > > FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_motorola_format);
> > >
> > > It's also not clear we can model a x87 10 byte memory copy in RTL since
> > > a mem:XF still touches 12 or 16 bytes - IIRC a store leaves
> > > possible padding as unspecified and not "masked out" even if
> > > the actual fstp will only store 10 bytes.
> >
> > The hardware will never touch bytes outside 10 bytes range, the
> > padding is some artificial compiler thingy, so IMO it should be
> > handled before the hook is called. Please find attached the source I
> > have used to confirm that a) the copied bits will never be mangled and
> > b) there is no access outside the 10 bytes range. (BTW: these
> > particular values are to test the effect of leading bit 63, the
> > non-hidden normalized bit).
>
> Thanks - I do wonder why GET_MODE_SIZE (XFmode) is not 10 then,
> mode_base_align[XFmode] seems to be correctly set to ensure
> 12 bytes / 16 bytes "effective" size.

FTR, "long double" AKA __float80 is defined as fundamental type in psABI as:

sizeof 12, alignment 4 for i386 [1] and
sizeof 16, alignment 16 for x86_64 [2].

These values are thus set by ABI despite the fact that hardware
handles only 10 bytes.
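
A trivial sketch of the arithmetic, assuming the 10-byte hardware format
and the ABI sizes quoted above (the helper name and the constant are
hypothetical, introduced only for this illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Bytes actually read or written by the x87 extended format, as stated
   above.  */
enum { X87_EXTENDED_BYTES = 10 };

/* Hypothetical helper: number of padding bytes the ABI adds on top of
   the 10 bytes the hardware handles.  */
size_t
x87_padding_bytes (size_t abi_size)
{
  assert (abi_size >= X87_EXTENDED_BYTES);
  return abi_size - X87_EXTENDED_BYTES;
}
```

So the i386 size of 12 leaves 2 bytes of padding and the x86_64 size of 16
leaves 6.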

[1] Table 2.1, page 8 of https://www.uclibc.org/docs/psABI-i386.pdf
[2] Figure 3.1, page 12 of
https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf

Uros.


[PATCH] RISC-V: Correct mode_idx attribute for viwalu wx variants [PR116149].

2024-07-31 Thread Robin Dapp
Hi,

in PR116149 we choose a wrong vector length which causes wrong values in
a reduction.  The problem happens in avlprop where we choose the
number of units in the instruction's mode as vector length.  For the
non-scalar variants the respective operand has the correct non-widened
mode.  For the scalar variants, however, the same operand has a scalar
mode which obviously only has one unit.  This makes us choose VL = 1
leaving three elements undisturbed (so potentially -1).  Those end up
in the reduction causing the wrong result.

This patch adjusts the mode_idx just for the scalar variants of the
affected instruction patterns.
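
The effect can be imitated in scalar C.  This is a toy model, not RVV code:
the function name is hypothetical, and the four elements and the stale
value -1 are taken from the description above ("leaving three elements
undisturbed (so potentially -1)").

```c
#include <assert.h>

enum { NUNITS = 4 };

/* Toy model: an instruction that should write 0 to all NUNITS elements
   only writes the first VL of them; the rest keep their stale contents.
   The elements are then summed, as the reduction would do.  */
long
reduce_after_partial_write (int vl)
{
  assert (vl >= 0 && vl <= NUNITS);
  long acc[NUNITS];
  /* Stale contents of the destination register.  */
  for (int i = 0; i < NUNITS; i++)
    acc[i] = -1;
  /* Only the first VL elements are written; the tail is undisturbed.  */
  for (int i = 0; i < vl; i++)
    acc[i] = 0;
  long sum = 0;
  for (int i = 0; i < NUNITS; i++)
    sum += acc[i];
  return sum;
}
```

With the full vector length the model yields 0, while VL = 1 yields -3, a
wrong reduction result of the kind the PR reports.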

Regards
 Robin

gcc/ChangeLog:

PR target/116149

* config/riscv/vector.md: Fix mode_idx attribute of scalar
widen add/sub variants.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr116149.c: New test.
---
 gcc/config/riscv/vector.md |  2 ++
 .../gcc.target/riscv/rvv/autovec/pr116149.c| 18 ++
 2 files changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116149.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index bcedf3d79e2..d4d9bd87e91 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -4016,6 +4016,7 @@ (define_insn "@pred_single_widen_add_extended_scalar"
   "TARGET_VECTOR"
   "vwadd.wx\t%0,%3,%z4%p1"
   [(set_attr "type" "viwalu")
+   (set_attr "mode_idx" "3")
(set_attr "mode" "")])
 
 (define_insn "@pred_single_widen_sub_extended_scalar"
@@ -4038,6 +4039,7 @@ (define_insn "@pred_single_widen_sub_extended_scalar"
   "TARGET_VECTOR"
   "vwsub.wx\t%0,%3,%z4%p1"
   [(set_attr "type" "viwalu")
+   (set_attr "mode_idx" "3")
(set_attr "mode" "")])
 
 (define_insn "@pred_widen_mulsu"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116149.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116149.c
new file mode 100644
index 000..4f5927b96fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr116149.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=rv64gcv_zvl256b -mabi=lp64d -mrvv-vector-bits=zvl" } */
+
+long a;
+short b[6];
+short c[20];
+int main() {
+  for (short d = 0; d < 20; d += 3) {
+c[d] = 0;
+for (int e = 0; e < 20; e += 2)
+  for (int f = 1; f < 20; f += 2)
+a += (unsigned)b[f + e];
+  }
+  if (a != 0)
+__builtin_abort ();
+}
+
+/* { dg-final { scan-assembler-times "vsetivli\tzero,1" 0 } } */
-- 
2.45.2



Re: [PATCH 2/3] [x86] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Uros Bizjak
On Wed, Jul 31, 2024 at 3:40 PM Richard Biener  wrote:
>
> The following implements the hook, excluding x87 modes for scalar
> and complex float modes.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> OK this way?
>
> Thanks,
> Richard.
>
> * i386.cc (TARGET_MODE_CAN_TRANSFER_BITS): Define.
> (ix86_mode_can_transfer_bits): New function.

OK.

Thanks for your efforts and your patience to resolve this issue!

Uros.

> ---
>  gcc/config/i386/i386.cc | 22 ++
>  1 file changed, 22 insertions(+)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 12d15feb5e9..9869c44ee15 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -26113,6 +26113,25 @@ ix86_have_ccmp ()
>return (bool) TARGET_APX_CCMP;
>  }
>
> +/* Implement TARGET_MODE_CAN_TRANSFER_BITS.  */
> +static bool
> +ix86_mode_can_transfer_bits (machine_mode mode)
> +{
> +  if (GET_MODE_CLASS (mode) == MODE_FLOAT
> +  || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT)
> +switch (GET_MODE_INNER (mode))
> +  {
> +  case SFmode:
> +  case DFmode:
> +   /* These suffer from normalization upon load when not using SSE.  */
> +   return !(ix86_fpmath & FPMATH_387);
> +  default:
> +   return true;
> +  }
> +
> +  return true;
> +}
> +
>  /* Target-specific selftests.  */
>
>  #if CHECKING_P
> @@ -26959,6 +26978,9 @@ ix86_libgcc_floating_mode_supported_p
>  #undef TARGET_HAVE_CCMP
>  #define TARGET_HAVE_CCMP ix86_have_ccmp
>
> +#undef TARGET_MODE_CAN_TRANSFER_BITS
> +#define TARGET_MODE_CAN_TRANSFER_BITS ix86_mode_can_transfer_bits
> +
>  static bool
>  ix86_libc_has_fast_function (int fcode ATTRIBUTE_UNUSED)
>  {
> --
> 2.43.0
>


[PATCH] libstdc++: drop bogus 'dg_do run' directive

2024-07-31 Thread Sam James
We already have a valid 'dg-do run' (- vs _) directive, so drop the bogus
one.

libstdc++-v3/ChangeLog:
* testsuite/28_regex/traits/char/translate.cc: Drop bogus 'dg_do run'.
---
OK? No regressions in the logs but it's a bit weird that it's got a proper
directive with a target specifier so I thought I'd check rather than doing
it as obvious.

 libstdc++-v3/testsuite/28_regex/traits/char/translate.cc | 1 -
 1 file changed, 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/28_regex/traits/char/translate.cc b/libstdc++-v3/testsuite/28_regex/traits/char/translate.cc
index e2552e3cbf05..65119e67e25b 100644
--- a/libstdc++-v3/testsuite/28_regex/traits/char/translate.cc
+++ b/libstdc++-v3/testsuite/28_regex/traits/char/translate.cc
@@ -1,4 +1,3 @@
-// { dg_do run }
 // { dg-do run { target c++11 } }
 // { dg-timeout-factor 2 }
 

-- 
2.45.2



Re: [RFH PATCH] c++: Implement C++26 P2963R3 - Ordering of constraints involving fold expressions [PR115746]

2024-07-31 Thread Patrick Palka
On Tue, 30 Jul 2024, Jason Merrill wrote:

> On 7/29/24 5:32 PM, Patrick Palka wrote:
> > On Mon, 29 Jul 2024, Jakub Jelinek wrote:
> > 
> > > On Fri, Jul 26, 2024 at 06:00:12PM -0400, Patrick Palka wrote:
> > > > On Fri, 26 Jul 2024, Jakub Jelinek wrote:
> > > > 
> > > > > On Fri, Jul 26, 2024 at 04:42:36PM -0400, Patrick Palka wrote:
> > > > > > > // P2963R3 - Ordering of constraints involving fold expressions
> > > > > > > // { dg-do compile { target c++20 } }
> > > > > > > 
> > > > > > > template  concept C = (__is_same (T, int) && ...);
> > > > > > > template 
> > > > > > > struct S {
> > > > > > >template  requires (C)
> > > > > > >static constexpr bool foo () { return true; }
> > > > > > > };
> > > > > > > 
> > > > > > > static_assert (S::foo  ());
> > > > > > > 
> > > > > > > > somehow the template parameter mapping needs to be remembered even for the
> > > > > > > > fold expanded constraint, right now the patch will see the pack is T,
> > > > > > > > which is level 1 index 0, but args aren't arguments of the C concept,
> > > > > > > > but of the foo function template.
> > > > > > > > One can also use requires (C) etc., no?
> > > > > > > 
> > > > > > > It seems the problem is FOLD_EXPR_PACKS is currently set to the
> > > > > > > parameter packs used inside the non-normalized constraints, but I think
> > > > > > > what we really need are the packs used in the normalized constraints,
> > > > > > > specifically the packs used in the target of each parameter mapping of
> > > > > > > each atomic constraint?
> > > > > 
> > > > > But in that case there might be no packs at all.
> > > > > 
> > > > > template  C = true;
> > > > > template  requires (C && ...)
> > > > > constexpr bool foo () { return true; }
> > > > > 
> > > > > > If normalized C is just true, it doesn't use any packs.
> > > > > > But the [temp.constr.fold] wording assumes it is a pack expansion and that
> > > > > > there is at least one pack expansion parameter, otherwise N wouldn't be
> > > > > > defined.
> > > > 
> > > > Hmm yeah, I see what you mean.  That seems to be an edge case that's not
> > > > fully accounted for by the wording.
> 
> I agree the wording is unclear, but it seems necessary to me that T is a pack
> expansion parameter, even if it isn't mentioned by the normalized constraint.
> 
> > > > One thing that's unclear to me in that wording is what are the pack
> > > > expansion parameters of a fold expanded constraint.
> > > > 
> > > > In
> > > > 
> > > >template concept C = (__is_same (T, int) && ...);
> > > >template
> > > >void f() requires C;
> > > > 
> > > > is the pack expansion parameter T or V?  In
> > > > 
> > > >template concept C = (__is_same (T, int) && ...);
> > > >template
> > > >void g() requires C;
> > > > 
> > > > it must be T.  So I guess in both cases it must be T.  But then I reckon
> > > > when [temp.constr.fold] mentions "pack expansion parameter(s)" what it
> > > > really means is "target of each pack expansion parameter within the
> > > > parameter mapping"...
> 
> Yeah.
> 
> In the paper a fold expanded constraint doesn't have a parameter mapping, only
> atomic constraints do.  Within the normal form of (__is_same (T, int) && ...)
> we have a single atomic constraint with parameter mapping T -> T, which only
> comes into play when we're checking satisfaction for each element.
> 
> But that doesn't specify how the packs are established.  For many cases it's a
> simple matter of connecting one pack to another, so you could kind of handwave
> it, but it isn't that hard to come up with a testcase that isn't so simple,
> say
> 
> template concept C = (__is_same (T, int) && ...);
> template 

Re: [PATCH] middle-end/114563 - improve release_pages

2024-07-31 Thread Andi Kleen
On Wed, Jul 31, 2024 at 04:02:22PM +0200, Richard Biener wrote:
> The following improves release_pages when using the madvise path
> to sort the freelist to get more page entries contiguous and possibly
> release them.  This populates the unused prev pointer so the reclaim
> can then easily unlink from the freelist without re-ordering it.
> The paths not having madvise do not keep the memory allocated, so
> I left them untouched.
> 
> Re-bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> I've CCed people messing with release_pages;  This doesn't really
> address PR114563 but I thought I'd post this patch anyway - the
> actual issue we run into for the PR is the linear search of
> G.free_pages when that list becomes large but a requested allocation
> cannot be served from it.
> 
>   PR middle-end/114563
>   * ggc-page.cc (page_sort): New qsort comparator.
>   (release_pages): Sort the free_pages list entries after their
>   memory block virtual address to improve contiguous memory
>   chunk release.

I saw this in a profile some time ago and tried it with a slightly
different patch. Instead of a full sort it uses an array to keep
multiple free lists. But I couldn't find any speed ups in non checking
builds later.

My feeling is that an array is probably more efficient.

I guess we should compare both on that PR.
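
For anyone skimming the thread, the array idea can be sketched roughly as
below.  This is a standalone illustration and not the patch itself:
FreeBucket, BucketTable and kNumBuckets are made-up names, and a plain
counter stands in for the linked list of page entries.  Slot 0 is the
shared fallback once all other slots have been claimed by a size.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for one free list: the entry size it serves and
// how many free entries it currently holds.
struct FreeBucket {
  std::size_t bytes = 0;  // 0 means "slot not claimed yet"
  int count = 0;
};

constexpr int kNumBuckets = 8;  // assumed table size, mirroring the patch

class BucketTable {
public:
  // Return the bucket dedicated to SIZE, claiming the first unused slot
  // if SIZE has not been seen before.  Slot 0 is the shared fallback.
  FreeBucket& find(std::size_t size) {
    for (int i = 1; i < kNumBuckets; ++i) {
      if (buckets_[i].bytes == size)
        return buckets_[i];
      if (buckets_[i].bytes == 0) {
        buckets_[i].bytes = size;
        return buckets_[i];
      }
    }
    return buckets_[0];
  }

  // Index of bucket B inside the table, for inspection only.
  int index_of(const FreeBucket& b) const {
    return static_cast<int>(&b - buckets_.data());
  }

private:
  std::array<FreeBucket, kNumBuckets> buckets_{};
};
```

The point of the design is that a lookup for one entry size never has to
walk entries of another size, unless it ends up in the fallback slot.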


diff --git a/gcc/ggc-page.cc b/gcc/ggc-page.cc
index 4245f843a29f..af1627b002c6 100644
--- a/gcc/ggc-page.cc
+++ b/gcc/ggc-page.cc
@@ -234,6 +234,8 @@ static struct
 }
 inverse_table[NUM_ORDERS];
 
+struct free_list;
+
 /* A page_entry records the status of an allocation page.  This
structure is dynamically sized to fit the bitmap in_use_p.  */
 struct page_entry
@@ -251,6 +253,9 @@ struct page_entry
  of the host system page size.)  */
   size_t bytes;
 
+  /* Free list of this page size.  */
+  struct free_list *free_list;
+
   /* The address at which the memory is allocated.  */
   char *page;
 
@@ -368,6 +373,15 @@ struct free_object
 };
 #endif
 
+constexpr int num_free_list = 8;
+
+/* A free_list for pages with BYTES size.  */
+struct free_list
+{
+  size_t bytes;
+  page_entry *free_pages;
+};
+
 /* The rest of the global variables.  */
 static struct ggc_globals
 {
@@ -412,8 +426,8 @@ static struct ggc_globals
   int dev_zero_fd;
 #endif
 
-  /* A cache of free system pages.  */
-  page_entry *free_pages;
+  /* A cache of free system pages. Entry 0 is fallback.  */
+  struct free_list free_lists[num_free_list];
 
 #ifdef USING_MALLOC_PAGE_GROUPS
   page_group *page_groups;
@@ -754,6 +768,26 @@ clear_page_group_in_use (page_group *group, char *page)
 }
 #endif
 
+/* Find a free list for ENTRY_SIZE.  */
+
+static inline struct free_list *
+find_free_list (size_t entry_size)
+{
+  int i;
+  for (i = 1; i < num_free_list; i++)
+{
+  if (G.free_lists[i].bytes == entry_size)
+   return &G.free_lists[i];
+  if (G.free_lists[i].bytes == 0)
+   {
+ G.free_lists[i].bytes = entry_size;
+ return &G.free_lists[i];
+   }
+}
+  /* Fallback.  */
+  return &G.free_lists[0];
+}
+
 /* Allocate a new page for allocating objects of size 2^ORDER,
and return an entry for it.  The entry is not added to the
appropriate page_table list.  */
@@ -770,6 +804,7 @@ alloc_page (unsigned order)
 #ifdef USING_MALLOC_PAGE_GROUPS
   page_group *group;
 #endif
+  struct free_list *free_list;
 
   num_objects = OBJECTS_PER_PAGE (order);
   bitmap_size = BITMAP_SIZE (num_objects + 1);
@@ -782,8 +817,10 @@ alloc_page (unsigned order)
   entry = NULL;
   page = NULL;
 
+  free_list = find_free_list (entry_size);
+
   /* Check the list of free pages for one we can use.  */
-  for (pp = &G.free_pages, p = *pp; p; pp = &p->next, p = *pp)
+  for (pp = &free_list->free_pages, p = *pp; p; pp = &p->next, p = *pp)
 if (p->bytes == entry_size)
   break;
 
@@ -816,7 +853,7 @@ alloc_page (unsigned order)
   /* We want just one page.  Allocate a bunch of them and put the
 extras on the freelist.  (Can only do this optimization with
 mmap for backing store.)  */
-  struct page_entry *e, *f = G.free_pages;
+  struct page_entry *e, *f = free_list->free_pages;
   int i, entries = GGC_QUIRE_SIZE;
 
   page = alloc_anon (NULL, G.pagesize * GGC_QUIRE_SIZE, false);
@@ -833,12 +870,13 @@ alloc_page (unsigned order)
  e = XCNEWVAR (struct page_entry, page_entry_size);
  e->order = order;
  e->bytes = G.pagesize;
+ e->free_list = free_list;
  e->page = page + (i << G.lg_pagesize);
  e->next = f;
  f = e;
}
 
-  G.free_pages = f;
+  free_list->free_pages = f;
 }
   else
 page = alloc_anon (NULL, entry_size, true);
@@ -904,12 +942,13 @@ alloc_page (unsigned order)
  e = XCNEWVAR (struct page_entry, page_entry_size);
  e->order = order;
  e->b

[PATCH] dir-locals: apply our C settings in C++ also

2024-07-31 Thread Arsen Arsenović
We haven't been applying our settings to our C++.  This patch fixes
that.

Sadly, it seems that the only documented way to apply settings to
multiple modes is to repeat them.  I thought that we can provide a list
of modes to apply, but that seems to not be the case (even though it
happened to work on my machine).

As a result, C-h C-v fill-column now shows:

  This variable’s value is directory-local, set by the file
  ‘/home/arsen/gcc/pristine/.dir-locals.el’.

As this could affect people's workflows, I'm posting as a heads-up and
sanity check.

OK for trunk?

TIA, have a lovely day.
-- >8 --
This also works with Emacs 30 Tree-Sitter C and C++ modes, as they are
submodes.

ChangeLog:

* .dir-locals.el: Change c-mode to a list of C, C++ and ObjC
modes that Emacs currently provides.
---
 .dir-locals.el | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/.dir-locals.el b/.dir-locals.el
index fa031cbded99..2c12b3866633 100644
--- a/.dir-locals.el
+++ b/.dir-locals.el
@@ -18,6 +18,10 @@
  (tcl-continued-indent-level . 4)
  (indent-tabs-mode . t)))
  (nil . ((bug-reference-url-format . "https://gcc.gnu.org/PR%s")))
+ ;; Please keep C and C++ in sync.
  (c-mode . ((c-file-style . "GNU")
(indent-tabs-mode . t)
-   (fill-column . 79))))
+   (fill-column . 79)))
+ (c++-mode . ((c-file-style . "GNU")
+ (indent-tabs-mode . t)
+ (fill-column . 79))))
-- 
2.45.2



Re: [PATCH] Fix overwriting files with fs::copy_file on windows

2024-07-31 Thread Jonathan Wakely
On Wed, 31 Jul 2024 at 15:42, Björn Schäpers  wrote:
>
> Am 30.07.2024 um 11:13 schrieb Jonathan Wakely:
> > On Sun, 24 Mar 2024 at 21:34, Björn Schäpers  wrote:
> >>
> >> From: Björn Schäpers 
> >>
> >> This fixes i.e. https://github.com/msys2/MSYS2-packages/issues/1937
> >> I don't know if I picked the right way to do it.
> >>
> >> When acceptable I think the declaration should be moved into
> >> ops-common.h, since then we could use stat_type and also use that in the
> >> commonly used function.
> >>
> >> Manually tested on i686-w64-mingw32.
> >>
> >> -- >8 --
> >> libstdc++: Fix overwriting files on windows
> >>
> >> The inodes have no meaning on windows, thus all files have an inode of
> >> 0. Use a different approach to identify equivalent files. As a result
> >> std::filesystem::copy_file did not honor
> >> copy_options::overwrite_existing. Factored the method out of
> >> std::filesystem::equivalent.
> >>
> >> libstdc++-v3/Changelog:
> >>
> >>  * include/bits/fs_ops.h: Add declaration of
> >>__detail::equivalent_win32.
> >>  * src/c++17/fs_ops.cc (__detail::equivalent_win32): Implement it
> >>  (fs::equivalent): Use __detail::equivalent_win32, factored the
> >>  old test out.
> >>  * src/filesystem/ops-common.h (_GLIBCXX_FILESYSTEM_IS_WINDOWS):
> >>Use the function.
> >>
> >> Signed-off-by: Björn Schäpers 
> >> ---
> >>   libstdc++-v3/include/bits/fs_ops.h   |  8 +++
> >>   libstdc++-v3/src/c++17/fs_ops.cc | 79 +---
> >>   libstdc++-v3/src/filesystem/ops-common.h | 10 ++-
> >>   3 files changed, 60 insertions(+), 37 deletions(-)
> >>
> >> diff --git a/libstdc++-v3/include/bits/fs_ops.h 
> >> b/libstdc++-v3/include/bits/fs_ops.h
> >> index 90650c47b46..d10b78a4bdd 100644
> >> --- a/libstdc++-v3/include/bits/fs_ops.h
> >> +++ b/libstdc++-v3/include/bits/fs_ops.h
> >> @@ -40,6 +40,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>
> >>   namespace filesystem
> >>   {
> >> +#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
> >> +namespace __detail
> >> +{
> >> +  bool
> >> +  equivalent_win32(const wchar_t* p1, const wchar_t* p2, error_code& ec);
> >> +} // namespace __detail
> >> +#endif //_GLIBCXX_FILESYSTEM_IS_WINDOWS
> >> +
> >> /** @addtogroup filesystem
> >>  *  @{
> >>  */
> >> diff --git a/libstdc++-v3/src/c++17/fs_ops.cc 
> >> b/libstdc++-v3/src/c++17/fs_ops.cc
> >> index 61df19753ef..3cc87d45237 100644
> >> --- a/libstdc++-v3/src/c++17/fs_ops.cc
> >> +++ b/libstdc++-v3/src/c++17/fs_ops.cc
> >> @@ -67,6 +67,49 @@
> >>   namespace fs = std::filesystem;
> >>   namespace posix = std::filesystem::__gnu_posix;
> >>
> >> +#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
> >> +bool
> >> +fs::__detail::equivalent_win32(const wchar_t* p1, const wchar_t* p2,
> >> +  error_code& ec)
> >> +{
> >> +  struct auto_handle {
> >> +explicit auto_handle(const path& p_)
> >> +: handle(CreateFileW(p_.c_str(), 0,
> >> +   FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE,
> >> +   0, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, 0))
> >> +{ }
> >> +
> >> +~auto_handle()
> >> +{ if (*this) CloseHandle(handle); }
> >> +
> >> +explicit operator bool() const
> >> +{ return handle != INVALID_HANDLE_VALUE; }
> >> +
> >> +bool get_info()
> >> +{ return GetFileInformationByHandle(handle, &info); }
> >> +
> >> +HANDLE handle;
> >> +BY_HANDLE_FILE_INFORMATION info;
> >> +  };
> >> +  auto_handle h1(p1);
> >> +  auto_handle h2(p2);
> >> +  if (!h1 || !h2)
> >> +{
> >> +  if (!h1 && !h2)
> >> +   ec = __last_system_error();
> >> +  return false;
> >> +}
> >> +  if (!h1.get_info() || !h2.get_info())
> >> +{
> >> +  ec = __last_system_error();
> >> +  return false;
> >> +}
> >> +  return h1.info.dwVolumeSerialNumber == h2.info.dwVolumeSerialNumber
> >> +&& h1.info.nFileIndexHigh == h2.info.nFileIndexHigh
> >> +&& h1.info.nFileIndexLow == h2.info.nFileIndexLow;
> >> +}
> >> +#endif //_GLIBCXX_FILESYSTEM_IS_WINDOWS
> >> +
> >>   fs::path
> >>   fs::absolute(const path& p)
> >>   {
> >> @@ -858,41 +901,7 @@ fs::equivalent(const path& p1, const path& p2, 
> >> error_code& ec) noexcept
> >> if (st1.st_mode != st2.st_mode || st1.st_dev != st2.st_dev)
> >>  return false;
> >>
> >> -  struct auto_handle {
> >> -   explicit auto_handle(const path& p_)
> >> -   : handle(CreateFileW(p_.c_str(), 0,
> >> - FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE,
> >> - 0, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, 0))
> >> -   { }
> >> -
> >> -   ~auto_handle()
> >> -   { if (*this) CloseHandle(handle); }
> >> -
> >> -   explicit operator bool() const
> >> -   { return handle != INVALID_HANDLE_VALUE; }
> >> -
> >> -   bool get_info()
> >> -   { return GetFileInformationByHandle(handle, &info); }
> >> -
> >> -   HANDLE handle;
> >> -   BY_HANDLE_FILE_INFORMATION 

[COMMITTED PATCH 5/5] testsuite: fix dg-require-* order vs dg-additional-sources

2024-07-31 Thread Sam James
Per gccint, 'dg-require-*' must come before any
'dg-additional-sources' directives. Fix a handful of deviant cases.

* gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c: Fix 
dg-require-profiling
directive order.
* gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c: Likewise.
---
Committed as obvious.

 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c | 2 +-
 gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c 
b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
index b57d30f91637..f6ec71a9298d 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-1.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target lto } */
-/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
 /* { dg-require-profiling "-fprofile-generate" } */
+/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
 /* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate" } */
 
 #ifdef FOR_AUTOFDO_TESTING
diff --git a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c 
b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
index 6b5ae93214a5..2ace3c3b9bf1 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/crossmodule-indir-call-topn-2.c
@@ -1,6 +1,6 @@
 /* { dg-require-effective-target lto } */
-/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
 /* { dg-require-profiling "-fprofile-generate" } */
+/* { dg-additional-sources "crossmodule-indir-call-topn-1a.c" } */
 /* { dg-options "-O2 -flto -DDOJOB=1 -fdump-ipa-profile_estimate" } */
 
 #ifdef FOR_AUTOFDO_TESTING
-- 
2.45.2



[COMMITTED PATCH 4/5] testsuite: fix dg-require-effective-target order vs dg-additional-sources

2024-07-31 Thread Sam James
Per gccint, 'dg-require-effective-target' must come before any
'dg-additional-sources' directives. Fix a handful of deviant cases.

gcc/testsuite/ChangeLog:
* gcc.target/aarch64/aapcs64/func-ret-3.c: Fix 
dg-require-effective-target directive order.
* gcc.target/aarch64/aapcs64/func-ret-4.c: Likewise.
* gfortran.dg/PR100914.f90: Likewise.

libgomp/ChangeLog:
* testsuite/libgomp.c++/pr24455.C: Fix dg-require-effective-target 
directive order.
* testsuite/libgomp.c/pr24455.c: Likewise.
---
Committed as obvious.

 gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.c | 2 +-
 gcc/testsuite/gfortran.dg/PR100914.f90| 2 +-
 libgomp/testsuite/libgomp.c++/pr24455.C   | 2 +-
 libgomp/testsuite/libgomp.c/pr24455.c | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.c 
b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.c
index 1d35ebf14b4b..ebd2e8dd8791 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-3.c
@@ -4,9 +4,9 @@
in AAPCS64 \S 4.3.5.  */
 
 /* { dg-do run { target aarch64-*-* } } */
+/* { dg-require-effective-target aarch64_big_endian } */
 /* { dg-additional-options "-mbranch-protection=none" } */
 /* { dg-additional-sources "abitest.S" } */
-/* { dg-require-effective-target aarch64_big_endian } */
 
 #ifndef IN_FRAMEWORK
 #define TESTFILE "func-ret-3.c"
diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.c 
b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.c
index 15e1408c62d7..03d42f3dd047 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/func-ret-4.c
@@ -5,9 +5,9 @@
are treated as general composite types.  */
 
 /* { dg-do run { target aarch64*-*-* } } */
+/* { dg-require-effective-target aarch64_big_endian } */
 /* { dg-additional-options "-mbranch-protection=none" } */
 /* { dg-additional-sources "abitest.S" } */
-/* { dg-require-effective-target aarch64_big_endian } */
 
 #ifndef IN_FRAMEWORK
 #define TESTFILE "func-ret-4.c"
diff --git a/gcc/testsuite/gfortran.dg/PR100914.f90 
b/gcc/testsuite/gfortran.dg/PR100914.f90
index 8588157e59c0..161f1265fa21 100644
--- a/gcc/testsuite/gfortran.dg/PR100914.f90
+++ b/gcc/testsuite/gfortran.dg/PR100914.f90
@@ -1,7 +1,7 @@
 ! Fails on x86 targets where sizeof(long double) == 16.
 ! { dg-do run }
-! { dg-additional-sources PR100914.c }
 ! { dg-require-effective-target fortran_real_c_float128 }
+! { dg-additional-sources PR100914.c }
 ! { dg-additional-options "-Wno-pedantic" }
 !
 ! Test the fix for PR100914
diff --git a/libgomp/testsuite/libgomp.c++/pr24455.C 
b/libgomp/testsuite/libgomp.c++/pr24455.C
index 8256b6693c8f..9816d37461a5 100644
--- a/libgomp/testsuite/libgomp.c++/pr24455.C
+++ b/libgomp/testsuite/libgomp.c++/pr24455.C
@@ -1,6 +1,6 @@
 // { dg-do run }
-// { dg-additional-sources pr24455-1.C }
 // { dg-require-effective-target tls_runtime }
+// { dg-additional-sources pr24455-1.C }
 // { dg-options "-fno-extern-tls-init" }
 
 extern "C" void abort (void);
diff --git a/libgomp/testsuite/libgomp.c/pr24455.c 
b/libgomp/testsuite/libgomp.c/pr24455.c
index 8af449e7b5c3..4284c1095293 100644
--- a/libgomp/testsuite/libgomp.c/pr24455.c
+++ b/libgomp/testsuite/libgomp.c/pr24455.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
-/* { dg-additional-sources pr24455-1.c } */
 /* { dg-require-effective-target tls_runtime } */
+/* { dg-additional-sources pr24455-1.c } */
 
 extern void abort (void);
 
-- 
2.45.2



[COMMITTED PATCH 3/5] testsuite: fix 'dg-do-preprocess' typo

2024-07-31 Thread Sam James
We want 'dg-do preprocess', not 'dg-do-preprocess'. Fix that.

PR target/106828
* g++.target/loongarch/pr106828.C: Fix 'dg-do compile' typo.
---
Committed as obvious.

 gcc/testsuite/g++.target/loongarch/pr106828.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.target/loongarch/pr106828.C 
b/gcc/testsuite/g++.target/loongarch/pr106828.C
index 190c1db715f4..0d13cbbd5153 100644
--- a/gcc/testsuite/g++.target/loongarch/pr106828.C
+++ b/gcc/testsuite/g++.target/loongarch/pr106828.C
@@ -1,4 +1,4 @@
-/* { dg-do-preprocess } */
+/* { dg-do preprocess } */
 /* { dg-options "-mabi=lp64d -fsanitize=address" } */
 
 /* Tests whether the compiler supports compile option '-fsanitize=address'.  */
-- 
2.45.2



[COMMITTED PATCH 2/5] testsuite: fix 'dg-do-compile' typos

2024-07-31 Thread Sam James
We want 'dg-do compile', not 'dg-do-compile'. Fix that.

PR target/69194
PR c++/92024
PR c++/110057
* c-c++-common/Wshadow-1.c: Fix 'dg-do compile' typo.
* g++.dg/tree-ssa/devirt-array-destructor-1.C: Likewise.
* g++.dg/tree-ssa/devirt-array-destructor-2.C: Likewise.
* gcc.target/arm/pr69194.c: Likewise.
---
Committed as obvious.

 gcc/testsuite/c-c++-common/Wshadow-1.c| 2 +-
 gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-1.C | 2 +-
 gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-2.C | 2 +-
 gcc/testsuite/gcc.target/arm/pr69194.c| 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/Wshadow-1.c 
b/gcc/testsuite/c-c++-common/Wshadow-1.c
index 4d1edf07f002..3cd99e9087ec 100644
--- a/gcc/testsuite/c-c++-common/Wshadow-1.c
+++ b/gcc/testsuite/c-c++-common/Wshadow-1.c
@@ -1,4 +1,4 @@
-/* { dg-do-compile } */
+/* { dg-do compile } */
 /* { dg-additional-options "-Wshadow=local -Wno-shadow=compatible-local" } */
 int c;
 void foo(int *c, int *d)   /* { dg-bogus   "Wshadow" } */
diff --git a/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-1.C 
b/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-1.C
index ce8dc2a57cd7..eed9a7c17698 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-1.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-1.C
@@ -1,5 +1,5 @@
 // PR c++/110057
-/* { dg-do-compile } */
+/* { dg-do compile } */
 /* Virtual calls should be devirtualized because we know dynamic type of 
object in array at compile time */
 /* { dg-options "-O3 -fdump-tree-optimized -fno-inline"  } */
 
diff --git a/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-2.C 
b/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-2.C
index 6b44dc1a4eea..448f3739700f 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-2.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/devirt-array-destructor-2.C
@@ -1,5 +1,5 @@
 // PR c++/110057
-/* { dg-do-compile } */
+/* { dg-do compile } */
 /* Virtual calls should be devirtualized because we know dynamic type of 
object in array at compile time */
 /* { dg-options "-O3 -fdump-tree-optimized -fno-inline"  } */
 
diff --git a/gcc/testsuite/gcc.target/arm/pr69194.c 
b/gcc/testsuite/gcc.target/arm/pr69194.c
index 477d5f92c8ec..dc1b0d306c2b 100644
--- a/gcc/testsuite/gcc.target/arm/pr69194.c
+++ b/gcc/testsuite/gcc.target/arm/pr69194.c
@@ -1,5 +1,5 @@
 /* PR target/69194 */
-/* { dg-do-compile } */
+/* { dg-do compile } */
 /* { dg-require-effective-target arm_neon_ok } */
 /* { dg-options "-O2" } */
 /* { dg-add-options arm_neon } */
-- 
2.45.2



[COMMITTED PATCH 1/5] testsuite: libgomp: fix dg-do run typo

2024-07-31 Thread Sam James
'dg-run' is not a valid dejagnu directive, 'dg-do run' is needed here
for the test to be executed.

That said, it actually seems to be executed for me anyway, presumably
a default in the directory, but let's fix it to be consistent with
other uses in the tree and in that test directory even.

libgomp/ChangeLog:
* testsuite/libgomp.c++/declare-target-indirect-1.C: Fix 'dg-run' typo.
---
Committed as obvious.

 libgomp/testsuite/libgomp.c++/declare-target-indirect-1.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/testsuite/libgomp.c++/declare-target-indirect-1.C 
b/libgomp/testsuite/libgomp.c++/declare-target-indirect-1.C
index 1eac6b3fa96b..bd84b492feec 100644
--- a/libgomp/testsuite/libgomp.c++/declare-target-indirect-1.C
+++ b/libgomp/testsuite/libgomp.c++/declare-target-indirect-1.C
@@ -1,4 +1,4 @@
-// { dg-run }
+// { dg-do run }
 
 #pragma omp begin declare target indirect
 class C

-- 
2.45.2



Re: [PATCH] Fix overwriting files with fs::copy_file on windows

2024-07-31 Thread Björn Schäpers

Am 30.07.2024 um 11:13 schrieb Jonathan Wakely:

On Sun, 24 Mar 2024 at 21:34, Björn Schäpers  wrote:


From: Björn Schäpers 

This fixes i.e. https://github.com/msys2/MSYS2-packages/issues/1937
I don't know if I picked the right way to do it.

When acceptable I think the declaration should be moved into
ops-common.h, since then we could use stat_type and also use that in the
commonly used function.

Manually tested on i686-w64-mingw32.

-- >8 --
libstdc++: Fix overwriting files on windows

The inodes have no meaning on windows, thus all files have an inode of
0. Use a different approach to identify equivalent files. As a result
std::filesystem::copy_file did not honor
copy_options::overwrite_existing. Factored the method out of
std::filesystem::equivalent.
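
As a rough, portable illustration of the behaviour this is after (not
part of the patch): copying over an existing, distinct file with
copy_options::overwrite_existing should replace the target.  The helper
names write_file, read_file and overwrite_copy are made up for the
sketch; only the std::filesystem calls are real library API.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Write TEXT to PATH, replacing any previous contents.  The stream is
// closed when the function returns, so the data is on disk afterwards.
void write_file(const fs::path& path, const std::string& text) {
  std::ofstream out(path, std::ios::trunc);
  out << text;
}

// Read the whole file at PATH into a string.
std::string read_file(const fs::path& path) {
  std::ifstream in(path);
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}

// Copy FROM over TO, asking the library to replace an existing target.
// Returns false and sets EC on failure.
bool overwrite_copy(const fs::path& from, const fs::path& to,
                    std::error_code& ec) {
  return fs::copy_file(from, to, fs::copy_options::overwrite_existing, ec);
}
```

With two distinct files, overwrite_copy is expected to succeed and the
target to end up with the source contents; fs::equivalent on the two
paths should report false, since they are different files.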

libstdc++-v3/Changelog:

 * include/bits/fs_ops.h: Add declaration of
   __detail::equivalent_win32.
 * src/c++17/fs_ops.cc (__detail::equivalent_win32): Implement it
 (fs::equivalent): Use __detail::equivalent_win32, factored the
 old test out.
 * src/filesystem/ops-common.h (_GLIBCXX_FILESYSTEM_IS_WINDOWS):
   Use the function.

Signed-off-by: Björn Schäpers 
---
  libstdc++-v3/include/bits/fs_ops.h   |  8 +++
  libstdc++-v3/src/c++17/fs_ops.cc | 79 +---
  libstdc++-v3/src/filesystem/ops-common.h | 10 ++-
  3 files changed, 60 insertions(+), 37 deletions(-)

diff --git a/libstdc++-v3/include/bits/fs_ops.h 
b/libstdc++-v3/include/bits/fs_ops.h
index 90650c47b46..d10b78a4bdd 100644
--- a/libstdc++-v3/include/bits/fs_ops.h
+++ b/libstdc++-v3/include/bits/fs_ops.h
@@ -40,6 +40,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

  namespace filesystem
  {
+#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
+namespace __detail
+{
+  bool
+  equivalent_win32(const wchar_t* p1, const wchar_t* p2, error_code& ec);
+} // namespace __detail
+#endif //_GLIBCXX_FILESYSTEM_IS_WINDOWS
+
/** @addtogroup filesystem
 *  @{
 */
diff --git a/libstdc++-v3/src/c++17/fs_ops.cc b/libstdc++-v3/src/c++17/fs_ops.cc
index 61df19753ef..3cc87d45237 100644
--- a/libstdc++-v3/src/c++17/fs_ops.cc
+++ b/libstdc++-v3/src/c++17/fs_ops.cc
@@ -67,6 +67,49 @@
  namespace fs = std::filesystem;
  namespace posix = std::filesystem::__gnu_posix;

+#ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
+bool
+fs::__detail::equivalent_win32(const wchar_t* p1, const wchar_t* p2,
+  error_code& ec)
+{
+  struct auto_handle {
+explicit auto_handle(const path& p_)
+: handle(CreateFileW(p_.c_str(), 0,
+   FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE,
+   0, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, 0))
+{ }
+
+~auto_handle()
+{ if (*this) CloseHandle(handle); }
+
+explicit operator bool() const
+{ return handle != INVALID_HANDLE_VALUE; }
+
+bool get_info()
+{ return GetFileInformationByHandle(handle, &info); }
+
+HANDLE handle;
+BY_HANDLE_FILE_INFORMATION info;
+  };
+  auto_handle h1(p1);
+  auto_handle h2(p2);
+  if (!h1 || !h2)
+{
+  if (!h1 && !h2)
+   ec = __last_system_error();
+  return false;
+}
+  if (!h1.get_info() || !h2.get_info())
+{
+  ec = __last_system_error();
+  return false;
+}
+  return h1.info.dwVolumeSerialNumber == h2.info.dwVolumeSerialNumber
+&& h1.info.nFileIndexHigh == h2.info.nFileIndexHigh
+&& h1.info.nFileIndexLow == h2.info.nFileIndexLow;
+}
+#endif //_GLIBCXX_FILESYSTEM_IS_WINDOWS
+
  fs::path
  fs::absolute(const path& p)
  {
@@ -858,41 +901,7 @@ fs::equivalent(const path& p1, const path& p2, error_code& 
ec) noexcept
if (st1.st_mode != st2.st_mode || st1.st_dev != st2.st_dev)
 return false;

-  struct auto_handle {
-   explicit auto_handle(const path& p_)
-   : handle(CreateFileW(p_.c_str(), 0,
- FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE,
- 0, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, 0))
-   { }
-
-   ~auto_handle()
-   { if (*this) CloseHandle(handle); }
-
-   explicit operator bool() const
-   { return handle != INVALID_HANDLE_VALUE; }
-
-   bool get_info()
-   { return GetFileInformationByHandle(handle, &info); }
-
-   HANDLE handle;
-   BY_HANDLE_FILE_INFORMATION info;
-  };
-  auto_handle h1(p1);
-  auto_handle h2(p2);
-  if (!h1 || !h2)
-   {
- if (!h1 && !h2)
-   ec = __last_system_error();
- return false;
-   }
-  if (!h1.get_info() || !h2.get_info())
-   {
- ec = __last_system_error();
- return false;
-   }
-  return h1.info.dwVolumeSerialNumber == h2.info.dwVolumeSerialNumber
-   && h1.info.nFileIndexHigh == h2.info.nFileIndexHigh
-   && h1.info.nFileIndexLow == h2.info.nFileIndexLow;
+  return __detail::equivalent_win32(p1.c_str(), p2.c_str(), ec);
  #else
return st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino;
  #endif
diff --git 

Re: [PATCH 01/15] arm: [MVE intrinsics] improve comment for orrq shape

2024-07-31 Thread Christophe Lyon
ping for the series?


On Thu, 11 Jul 2024 at 23:43, Christophe Lyon
 wrote:
>
> Add a comment about the lack of "n" forms for floating-point and 8-bit
> integers, to make it clearer why we use build_16_32 for MODE_n.
>
> 2024-07-11  Christophe Lyon  
>
> gcc/
> * config/arm/arm-mve-builtins-shapes.cc (binary_orrq_def): Improve 
> comment.
> ---
>  gcc/config/arm/arm-mve-builtins-shapes.cc | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc 
> b/gcc/config/arm/arm-mve-builtins-shapes.cc
> index ba20c6a8f73..e01939469e3 100644
> --- a/gcc/config/arm/arm-mve-builtins-shapes.cc
> +++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
> @@ -865,7 +865,12 @@ SHAPE (binary_opt_n)
> int16x8_t [__arm_]vorrq_m[_s16](int16x8_t inactive, int16x8_t a, 
> int16x8_t b, mve_pred16_t p)
> int16x8_t [__arm_]vorrq_x[_s16](int16x8_t a, int16x8_t b, mve_pred16_t p)
> int16x8_t [__arm_]vorrq[_n_s16](int16x8_t a, const int16_t imm)
> -   int16x8_t [__arm_]vorrq_m_n[_s16](int16x8_t a, const int16_t imm, 
> mve_pred16_t p)  */
> +   int16x8_t [__arm_]vorrq_m_n[_s16](int16x8_t a, const int16_t imm, 
> mve_pred16_t p)
> +
> +   No "_n" forms for floating-point, nor 8-bit integers:
> +   float16x8_t [__arm_]vorrq[_f16](float16x8_t a, float16x8_t b)
> +   float16x8_t [__arm_]vorrq_m[_f16](float16x8_t inactive, float16x8_t a, 
> float16x8_t b, mve_pred16_t p)
> +   float16x8_t [__arm_]vorrq_x[_f16](float16x8_t a, float16x8_t b, 
> mve_pred16_t p)  */
>  struct binary_orrq_def : public overloaded_base<0>
>  {
>bool
> --
> 2.34.1
>


[PATCH] middle-end/114563 - improve release_pages

2024-07-31 Thread Richard Biener
The following improves release_pages when using the madvise path
to sort the freelist to get more page entries contiguous and possibly
release them.  This populates the unused prev pointer so the reclaim
can then easily unlink from the freelist without re-ordering it.
The paths not having madvise do not keep the memory allocated, so
I left them untouched.

Re-bootstrap and regtest running on x86_64-unknown-linux-gnu.

I've CCed people messing with release_pages;  This doesn't really
address PR114563 but I thought I'd post this patch anyway - the
actual issue we run into for the PR is the linear search of
G.free_pages when that list becomes large but a requested allocation
cannot be served from it.
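
To illustrate the sorting idea outside of ggc-page.cc, here is a minimal
standalone sketch.  It is not the patch: Page, Run and releasable_runs
are hypothetical names, and the sketch only reports which address ranges
could be released instead of unlinking entries and calling munmap.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a free page entry: start address and size.
struct Page {
  std::uintptr_t addr;
  std::size_t bytes;
};

// A maximal run of address-contiguous pages: [start, start + len).
struct Run {
  std::uintptr_t start;
  std::size_t len;
};

// Sort PAGES by address, then collect every contiguous run whose total
// length reaches MIN_LEN.  These are the ranges that could be handed
// back to the OS in one go.
std::vector<Run> releasable_runs(std::vector<Page> pages,
                                 std::size_t min_len) {
  std::sort(pages.begin(), pages.end(),
            [](const Page& a, const Page& b) { return a.addr < b.addr; });
  std::vector<Run> runs;
  for (std::size_t i = 0; i < pages.size();) {
    const std::uintptr_t start = pages[i].addr;
    std::size_t len = 0;
    // Extend the run while the next page begins exactly where it ends.
    while (i < pages.size() && pages[i].addr == start + len) {
      len += pages[i].bytes;
      ++i;
    }
    if (len >= min_len)
      runs.push_back({start, len});
  }
  return runs;
}
```

Without the sort, neighbouring pages that sit far apart in the list are
never seen as one run, which is the case the change is meant to improve.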

PR middle-end/114563
* ggc-page.cc (page_sort): New qsort comparator.
(release_pages): Sort the free_pages list entries after their
memory block virtual address to improve contiguous memory
chunk release.
---
 gcc/ggc-page.cc | 68 ++---
 1 file changed, 48 insertions(+), 20 deletions(-)

diff --git a/gcc/ggc-page.cc b/gcc/ggc-page.cc
index 4245f843a29..c9d8a8cd8e9 100644
--- a/gcc/ggc-page.cc
+++ b/gcc/ggc-page.cc
@@ -1010,6 +1010,19 @@ free_page (page_entry *entry)
   G.free_pages = entry;
 }
 
+/* Comparison function to sort page_entry after virtual address.  */
+
+static int
+page_sort (const void *pa_, const void *pb_)
+{
+  const page_entry *pa = *(const page_entry * const *)pa_;
+  const page_entry *pb = *(const page_entry * const *)pb_;
+  if ((uintptr_t)pa->page < (uintptr_t)pb->page)
+return -1;
+  else
+return 1;
+}
+
 /* Release the free page cache to the system.  */
 
 static void
@@ -1022,7 +1035,7 @@ release_pages (void)
   char *start;
   size_t len;
   size_t mapped_len;
-  page_entry *next, *prev, *newprev;
+  page_entry *prev;
   size_t free_unit = (GGC_QUIRE_SIZE/2) * G.pagesize;
 
   /* First free larger continuous areas to the OS.
@@ -1031,41 +1044,56 @@ release_pages (void)
  This does not always work because the free_pages list is only
  approximately sorted. */
 
-  p = G.free_pages;
+  auto_vec<page_entry *> pages;
   prev = NULL;
+  p = G.free_pages;
   while (p)
 {
+  p->prev = prev;
+  pages.safe_push (p);
+  prev = p;
+  p = p->next;
+}
+  pages.qsort (page_sort);
+
+  for (unsigned i = 0; i < pages.length ();)
+{
+  p = pages[i];
   start = p->page;
-  start_p = p;
+  unsigned start_i = i;
   len = 0;
   mapped_len = 0;
-  newprev = prev;
-  while (p && p->page == start + len)
+  while (i < pages.length () && pages[i]->page == start + len)
 {
+ p = pages[i];
   len += p->bytes;
  if (!p->discarded)
- mapped_len += p->bytes;
- newprev = p;
-  p = p->next;
+   mapped_len += p->bytes;
+ ++i;
 }
   if (len >= free_unit)
 {
-  while (start_p != p)
-{
-  next = start_p->next;
-  free (start_p);
-  start_p = next;
-}
+ for (unsigned j = start_i; j != i; ++j)
+   {
+ p = pages[j];
+ if (!p->prev)
+   {
+ G.free_pages = p->next;
+ if (p->next)
+   p->next->prev = NULL;
+   }
+ else
+   {
+ p->prev->next = p->next;
+ if (p->next)
+   p->next->prev = p->prev;
+   }
+ free (pages[j]);
+   }
   munmap (start, len);
- if (prev)
-   prev->next = p;
-  else
-G.free_pages = p;
   G.bytes_mapped -= mapped_len;
  n1 += len;
- continue;
 }
-  prev = newprev;
}
 
   /* Now give back the fragmented pages to the OS, but keep the address 
-- 
2.43.0
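
For readers who want to try the idea outside of GCC, the sort-then-scan approach of the patch above can be sketched standalone. This is only an illustration under assumptions: `Block` and `releasable_runs` are made-up names, `std::sort` replaces the qsort comparator, and no memory is actually unmapped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a free page: start address and size in bytes.
struct Block
{
  std::uintptr_t addr;
  std::size_t bytes;
};

// Sort blocks by address, group adjacent blocks into contiguous runs and
// return (start, length) for every run of at least free_unit bytes.
std::vector<std::pair<std::uintptr_t, std::size_t>>
releasable_runs (std::vector<Block> blocks, std::size_t free_unit)
{
  std::sort (blocks.begin (), blocks.end (),
             [] (const Block &a, const Block &b) { return a.addr < b.addr; });

  std::vector<std::pair<std::uintptr_t, std::size_t>> runs;
  for (std::size_t i = 0; i < blocks.size ();)
    {
      std::uintptr_t start = blocks[i].addr;
      std::size_t len = 0;
      // Extend the run while the next block begins where the run ends.
      while (i < blocks.size () && blocks[i].addr == start + len)
        {
          len += blocks[i].bytes;
          ++i;
        }
      if (len >= free_unit)
        runs.emplace_back (start, len);
    }
  return runs;
}
```

With an unsorted list the three adjacent blocks at 0x1000, 0x2000 and 0x3000 are only recognized as one run after sorting, which is the effect the patch is after.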


Re: [PATCH] libstdc++: Only append "@euro" to locale names for Glibc testing

2024-07-31 Thread Rainer Orth
Hi Jonathan,

>> agreed: while Solaris 11.4 does have a few *.ISO8859-15@euro locales
>>
>> da_DK.ISO8859-15@euro
>> en_GB.ISO8859-15@euro
>> en_US.ISO8859-15@euro
>> sv_SE.ISO8859-15@euro
>>
>> the majority (17) are not.
>
> Ah interesting, I only saw en_US.ISO8859-15@euro on cfarm216, which is
> an interesting one. US locale using Euro symbol for currency?!

don't ask me what they were thinking ;-)  Anyway, I found that both
Solaris cfarm systems only had a subset of the available locales
installed.  An artifact of the exact installation method, I suppose.
Whatever the case, that's fixed now.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] tree-optimization/115825 - improve unroll estimates for volatile accesses

2024-07-31 Thread Richard Biener
On Wed, 10 Jul 2024, Richard Biener wrote:

> The loop unrolling code assumes that one third of all volatile accesses
> can be possibly optimized away which is of course not true.  This leads
> to excessive unrolling in some cases.  The following tracks the number
> of stmts with side-effects as those are not eliminatable later and
> only assumes one third of the other stmts can be further optimized.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> There's quite some testsuite fallout, mostly because of different rounding
> and a size of 8 now no longer is optimistically optimized to 5 but only 6.
> I can fix that by writing
> 
>   *est_eliminated = (unr_insns - not_elim) / 3;
> 
> as
> 
>   *est_eliminated = unr_insns - not_elim - (unr_insns - not_elim) * 2 / 3;
> 
> to preserve the old rounding behavior.  But for example
> 
> FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++14 LP64 note (test for warnings, line 56)
> 
> shows
> 
>   size:   3 C::C (_25,   [(void *)&_ZTT2D1 + 48B]);
> 
> which we now consider not being optimizable (correctly I think) and thus
> the optimistic size reduction isn't enough to get the loop unrolled.
> Previously the computed size of 20 was reduced to 13, exactly the size
> of the not unrolled body.
> 
> So the remaining fallout will be
> 
> +FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++14 LP64 note (test for warnings, line 56)
> +FAIL: g++.dg/warn/Warray-bounds-20.C  -std=gnu++14 note (test for warnings, line 66)
> ...
> +FAIL: c-c++-common/ubsan/unreachable-3.c  -std=gnu++14  scan-tree-dump optimized "__builtin___ubsan_handle_builtin_unreachable"
> ...
> +FAIL: c-c++-common/ubsan/unreachable-3.c   -O0   scan-tree-dump optimized "__builtin___ubsan_handle_builtin_unreachable"
> 
> for the latter the issue is __builtin___sanitizer_cov_trace_pc ()
> 
> Does this seem feasible overall?  I can fixup the testcases above
> with #pragma unroll ...

Honza - any comments?
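
The rounding difference mentioned in the quoted mail can be checked with a small standalone sketch. The two function names are made up for illustration; the expressions are the ones quoted above, and integer division truncates as in C.

```cpp
#include <cassert>

// Estimate as written in the quoted patch: plain truncating division.
int est_eliminated_new (int unr_insns, int not_elim)
{
  return (unr_insns - not_elim) / 3;
}

// Alternative form from the mail that preserves the old rounding.
int est_eliminated_old_rounding (int unr_insns, int not_elim)
{
  return unr_insns - not_elim - (unr_insns - not_elim) * 2 / 3;
}
```

With not_elim == 0, a size of 8 gives 8 / 3 = 2 eliminated (6 remaining) in the first form and 8 - 16 / 3 = 3 eliminated (5 remaining) in the second, matching the "5 but only 6" remark.  A size of 20 leaves 13 with the old rounding.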

> Thanks,
> Richard.
> 
>   PR tree-optimization/115825
>   * tree-ssa-loop-ivcanon.cc (loop_size::not_eliminatable_after_peeling):
>   New.
>   (loop_size::last_iteration_not_eliminatable_after_peeling): Likewise.
>   (tree_estimate_loop_size): Count stmts with side-effects as
>   not optimistically eliminatable.
>   (estimated_unrolled_size): Compute the number of stmts that can
>   be optimistically eliminated by followup transforms.
>   (try_unroll_loop_completely): Adjust.
> 
>   * gcc.dg/tree-ssa/cunroll-17.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c | 11 +++
>  gcc/tree-ssa-loop-ivcanon.cc   | 35 +-
>  2 files changed, 38 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c
> new file mode 100644
> index 000..282db99c883
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-17.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Os -fdump-tree-optimized" } */
> +
> +char volatile v;
> +void for16 (void)
> +{
> +  for (char i = 16; i > 0; i -= 2)
> +v = i;
> +}
> +
> +/* { dg-final { scan-tree-dump-times " ={v} " 1 "optimized" } } */
> diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
> index 5ef24a91917..dd941c31648 100644
> --- a/gcc/tree-ssa-loop-ivcanon.cc
> +++ b/gcc/tree-ssa-loop-ivcanon.cc
> @@ -139,11 +139,16 @@ struct loop_size
>   variable where induction variable starts at known constant.)  */
>int eliminated_by_peeling;
>  
> +  /* Number of instructions that cannot be further optimized in the
> + peeled loop, for example volatile accesses.  */
> +  int not_eliminatable_after_peeling;
> +
>/* Same statistics for last iteration of loop: it is smaller because
>   instructions after exit are not executed.  */
>int last_iteration;
>int last_iteration_eliminated_by_peeling;
> -  
> +  int last_iteration_not_eliminatable_after_peeling;
> +
>/* If some IV computation will become constant.  */
>bool constant_iv;
>  
> @@ -267,8 +272,10 @@ tree_estimate_loop_size (class loop *loop, edge exit, 
> edge edge_to_cancel,
>  
>size->overall = 0;
>size->eliminated_by_peeling = 0;
> +  size->not_eliminatable_after_peeling = 0;
>size->last_iteration = 0;
>size->last_iteration_eliminated_by_peeling = 0;
> +  size->last_iteration_not_eliminatable_after_peeling = 0;
>size->num_pure_calls_on_hot_path = 0;
>size->num_non_pure_calls_on_hot_path = 0;
>size->non_call_stmts_on_hot_path = 0;
> @@ -292,6 +299,7 @@ tree_estimate_loop_size (class loop *loop, edge exit, 
> edge edge_to_cancel,
>   {
> gimple *stmt = gsi_stmt (gsi);
> int num = estimate_num_insns (stmt, &eni_size_weights);
> +   bool not_eliminatable_after_peeling = false;
> bool likely_eliminated = false;
> bool 

Re: [PATCH v4 0/3] aarch64: Add initial support for +fp8 arch extensions

2024-07-31 Thread Richard Sandiford
Claudio Bantaloukas  writes:
> This series introduces initial flags and functionality for the fp8 feature.
>
> Specifically, the following are added:
> - functions that enable constructing valid fpm register values.
> - support for the '+fp8' -march modifier.
> - support for reading and writing the new system register FPMR (Floating 
> Point Mode
>   Register) which configures the new FP8 features
>
> Tested against aarch64-unknown-linux-gnu.
>
> V1 of this patch series had "aarch64: Add march flags for +fp8 arch 
> extensions" as
> cover letter title. Since then, changes in V2 are:
>
> aarch64: Add march flags for +fp8 arch extensions
> - Removed __ARM_FEATURE_FP8 define: will be added once the relevant features 
> are in.
> - Some unnecessary whitespace changes were removed.
> - Helper function names now begin with __arm.
>
> aarch64: Add support for moving fpm system register
> - Removed a misleading comment.
> - Removed unnecessary modifier in .md
>
> aarch64: Add fpm register helper functions.
> - Helper functions and fpm_t types are available unconditionally when 
> including arm_acle.h
>
> Changes in V3 are:
>
> aarch64: Add march flags for +fp8 arch extensions
> - removed unnecessary check-function-bodies check
>
> aarch64: Add support for moving fpm system register
> - added check-function-bodies check
>
> aarch64: Add fpm register helper functions.
> - moved fp8 types and helper functions into a new private header file 
> arm_private_fp8.h
> - arm_neon.h and arm_sve.h now include the new header
> - added tests that check the helpers are available when including arm_neon.h
>   arm_sve.h or arm_sme.h 
>
> Changes in V4 are:
>
> aarch64: Add support for moving fpm system register
> - updated commit message
> - fixed length in .md
> - fixed tests to only exercise register moves for specific sizes
>
> aarch64: Add fpm register helper functions.
> - updated error message in arm_private_fp8.h
>
> Is this ok for master? I do not have merge permissions. Can someone merge 
> this for me please?
>
> Thanks,
> Claudio Bantaloukas
>
>
>
> Claudio Bantaloukas (3):
>   aarch64: Add march flags for +fp8 arch extensions
>   aarch64: Add support for moving fpm system register
>   aarch64: Add fpm register helper functions.

LGTM, thanks.  Pushed to trunk.

Richard

>  gcc/config.gcc|   2 +-
>  .../aarch64/aarch64-option-extensions.def |   2 +
>  gcc/config/aarch64/aarch64.cc |   8 ++
>  gcc/config/aarch64/aarch64.h  |  17 ++-
>  gcc/config/aarch64/aarch64.md |  30 +++--
>  gcc/config/aarch64/arm_neon.h |   1 +
>  gcc/config/aarch64/arm_private_fp8.h  |  80 
>  gcc/config/aarch64/arm_sve.h  |   1 +
>  gcc/config/aarch64/constraints.md |   3 +
>  gcc/doc/invoke.texi   |   2 +
>  .../aarch64/acle/fp8-helpers-neon.c   |  53 
>  .../gcc.target/aarch64/acle/fp8-helpers-sme.c |  12 ++
>  .../gcc.target/aarch64/acle/fp8-helpers-sve.c |  12 ++
>  gcc/testsuite/gcc.target/aarch64/acle/fp8.c   | 121 ++
>  14 files changed, 329 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/config/aarch64/arm_private_fp8.h
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-neon.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sme.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8-helpers-sve.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/fp8.c


[PATCH 3/3][v3] tree-optimization/114659 - VN and FP to int punning

2024-07-31 Thread Richard Biener
The following addresses another case where x87 FP loads mangle the
bit representation and thus are not suitable for a representative
in other types.  VN was value-numbering a later integer load of 'x'
as the same as a former float load of 'x'.

We can use the new TARGET_MODE_CAN_TRANSFER_BITS hook to identify
problematic modes and enforce strict compatibility for those in
the reference comparison, improving the handling of modes with
padding in visit_reference_op_load.

PR tree-optimization/114659
* tree-ssa-sccvn.cc (visit_reference_op_load): Do not
prevent punning from modes with padding here, but ...
(vn_reference_eq): ... ensure this here, also honoring
types with modes that cannot act as bit container.

* gcc.target/i386/pr114659.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr114659.c | 62 
 gcc/tree-ssa-sccvn.cc| 11 ++---
 2 files changed, 66 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr114659.c

diff --git a/gcc/testsuite/gcc.target/i386/pr114659.c 
b/gcc/testsuite/gcc.target/i386/pr114659.c
new file mode 100644
index 000..e1e24d55687
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr114659.c
@@ -0,0 +1,62 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+int
+my_totalorderf (float const *x, float const *y)
+{
+  int xs = __builtin_signbit (*x);
+  int ys = __builtin_signbit (*y);
+  if (!xs != !ys)
+return xs;
+
+  int xn = __builtin_isnan (*x);
+  int yn = __builtin_isnan (*y);
+  if (!xn != !yn)
+return !xn == !xs;
+  if (!xn)
+return *x <= *y;
+
+  unsigned int extended_sign = -!!xs;
+  union { unsigned int i; float f; } xu = {0}, yu = {0};
+  __builtin_memcpy (&xu, x, sizeof (float));
+  __builtin_memcpy (&yu, y, sizeof (float));
+  return (xu.i ^ extended_sign) <= (yu.i ^ extended_sign);
+}
+
+static float
+positive_NaNf ()
+{
+  float volatile nan = 0.0f / 0.0f;
+  return (__builtin_signbit (nan) ? - nan : nan);
+}
+
+typedef union { float value; unsigned int word[1]; } memory_float;
+
+static memory_float
+construct_memory_SNaNf (float quiet_value)
+{
+  memory_float m;
+  m.value = quiet_value;
+  m.word[0] ^= (unsigned int) 1 << 22;
+  m.word[0] |= (unsigned int) 1;
+  return m;
+}
+
+memory_float x[7] =
+  {
+{ 0 },
+{ 1e-5 },
+{ 1 },
+{ 1e37 },
+{ 1.0f / 0.0f },
+  };
+
+int
+main ()
+{
+  x[5] = construct_memory_SNaNf (positive_NaNf ());
+  x[6] = (memory_float) { positive_NaNf () };
+  if (! my_totalorderf (&x[5].value, &x[6].value))
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index dc377fa16ce..0639ba426ff 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -837,6 +837,9 @@ vn_reference_eq (const_vn_reference_t const vr1, 
const_vn_reference_t const vr2)
TYPE_VECTOR_SUBPARTS (vr2->type)))
return false;
 }
+  else if (TYPE_MODE (vr1->type) != TYPE_MODE (vr2->type)
+  && !mode_can_transfer_bits (TYPE_MODE (vr1->type)))
+return false;
 
   i = 0;
   j = 0;
@@ -5814,13 +5817,7 @@ visit_reference_op_load (tree lhs, tree op, gimple *stmt)
   if (result
   && !useless_type_conversion_p (TREE_TYPE (result), TREE_TYPE (op)))
 {
-  /* Avoid the type punning in case the result mode has padding where
-the op we lookup has not.  */
-  if (TYPE_MODE (TREE_TYPE (result)) != BLKmode
- && maybe_lt (GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (result))),
-  GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (op)))))
-   result = NULL_TREE;
-  else if (CONSTANT_CLASS_P (result))
+  if (CONSTANT_CLASS_P (result))
result = const_unop (VIEW_CONVERT_EXPR, TREE_TYPE (op), result);
   else
{
-- 
2.43.0
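
To see what the testcase above relies on, the bit manipulation can be followed on the integer image alone. This is a rough sketch that assumes float is 32-bit IEEE binary32; `float_bits` and `quiet_to_signaling_bits` are names invented here, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read the object representation of a float with memcpy, i.e. as a plain
// byte copy rather than a value conversion.  Assumes 32-bit binary32.
std::uint32_t float_bits (float f)
{
  static_assert (sizeof (float) == sizeof (std::uint32_t),
                 "binary32 float assumed");
  std::uint32_t u;
  std::memcpy (&u, &f, sizeof u);
  return u;
}

// The same two steps as construct_memory_SNaNf, applied to the bits only.
std::uint32_t quiet_to_signaling_bits (std::uint32_t quiet_bits)
{
  quiet_bits ^= std::uint32_t (1) << 22;  // flip the quiet bit
  quiet_bits |= std::uint32_t (1);        // keep the mantissa nonzero
  return quiet_bits;
}
```

For the common quiet NaN pattern 0x7fc00000 this yields 0x7f800001.  The testcase depends on that exact pattern surviving a load, which is what an x87 float load does not guarantee.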


[PATCH 2/3] [x86] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Richard Biener
The following implements the hook, excluding x87 modes for scalar
and complex float modes.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK this way?

Thanks,
Richard.

* i386.cc (TARGET_MODE_CAN_TRANSFER_BITS): Define.
(ix86_mode_can_transfer_bits): New function.
---
 gcc/config/i386/i386.cc | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 12d15feb5e9..9869c44ee15 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -26113,6 +26113,25 @@ ix86_have_ccmp ()
   return (bool) TARGET_APX_CCMP;
 }
 
+/* Implement TARGET_MODE_CAN_TRANSFER_BITS.  */
+static bool
+ix86_mode_can_transfer_bits (machine_mode mode)
+{
+  if (GET_MODE_CLASS (mode) == MODE_FLOAT
+  || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT)
+switch (GET_MODE_INNER (mode))
+  {
+  case SFmode:
+  case DFmode:
+   /* These suffer from normalization upon load when not using SSE.  */
+   return !(ix86_fpmath & FPMATH_387);
+  default:
+   return true;
+  }
+
+  return true;
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -26959,6 +26978,9 @@ ix86_libgcc_floating_mode_supported_p
 #undef TARGET_HAVE_CCMP
 #define TARGET_HAVE_CCMP ix86_have_ccmp
 
+#undef TARGET_MODE_CAN_TRANSFER_BITS
+#define TARGET_MODE_CAN_TRANSFER_BITS ix86_mode_can_transfer_bits
+
 static bool
 ix86_libc_has_fast_function (int fcode ATTRIBUTE_UNUSED)
 {
-- 
2.43.0



[PATCH 1/3][v3] Add TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Richard Biener
The following adds a target hook to specify whether regs of MODE can be
used to transfer bits.  The hook is supposed to be used for value-numbering
to decide whether a value loaded in such mode can be punned to another
mode instead of re-loading the value in the other mode and for SRA to
decide whether MODE is suitable as container holding a value to be
used in different modes.

Adjusted documentation in v3.

* target.def (mode_can_transfer_bits): New target hook.
* target.h (mode_can_transfer_bits): New function wrapping the
hook and providing default behavior.
* doc/tm.texi.in: Update.
* doc/tm.texi: Re-generate.
---
 gcc/doc/tm.texi| 11 +++
 gcc/doc/tm.texi.in |  2 ++
 gcc/target.def | 13 +
 gcc/target.h   | 16 
 4 files changed, 42 insertions(+)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c7535d07f4d..cc33084ed32 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4545,6 +4545,17 @@ is either a declaration of type int or accessed by 
dereferencing
 a pointer to int.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_MODE_CAN_TRANSFER_BITS (machine_mode 
@var{mode})
+Define this to return false if the mode @var{mode} cannot be used
+for memory copying of @code{GET_MODE_SIZE (mode)} units.  This might be
+because a register class allowed for @var{mode} has registers that do
+not transparently transfer every bit pattern or because the load or
+store patterns available for @var{mode} have this issue.
+
+The default is to assume modes with the same precision as size are fine
+to be used.
+@end deftypefn
+
 @deftypefn {Target Hook} machine_mode TARGET_TRANSLATE_MODE_ATTRIBUTE 
(machine_mode @var{mode})
 Define this hook if during mode attribute processing, the port should
 translate machine_mode @var{mode} to another mode.  For example, rs6000's
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 64cea3b1eda..8af3f414505 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3455,6 +3455,8 @@ stack.
 
 @hook TARGET_REF_MAY_ALIAS_ERRNO
 
+@hook TARGET_MODE_CAN_TRANSFER_BITS
+
 @hook TARGET_TRANSLATE_MODE_ATTRIBUTE
 
 @hook TARGET_SCALAR_MODE_SUPPORTED_P
diff --git a/gcc/target.def b/gcc/target.def
index 3de1aad4c84..1d0ea6f30ca 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3363,6 +3363,19 @@ a pointer to int.",
  bool, (ao_ref *ref),
  default_ref_may_alias_errno)
 
+DEFHOOK
+(mode_can_transfer_bits,
+ "Define this to return false if the mode @var{mode} cannot be used\n\
+for memory copying of @code{GET_MODE_SIZE (mode)} units.  This might be\n\
+because a register class allowed for @var{mode} has registers that do\n\
+not transparently transfer every bit pattern or because the load or\n\
+store patterns available for @var{mode} have this issue.\n\
+\n\
+The default is to assume modes with the same precision as size are fine\n\
+to be used.",
+ bool, (machine_mode mode),
+ NULL)
+
 /* Support for named address spaces.  */
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_ADDR_SPACE_"
diff --git a/gcc/target.h b/gcc/target.h
index c1f99b97b86..837651d273a 100644
--- a/gcc/target.h
+++ b/gcc/target.h
@@ -312,6 +312,22 @@ estimated_poly_value (poly_int64 x,
 return targetm.estimated_poly_value (x, kind);
 }
 
+/* Return true when MODE can be used to copy GET_MODE_BITSIZE bits
+   unchanged.  */
+
+inline bool
+mode_can_transfer_bits (machine_mode mode)
+{
+  if (mode == BLKmode)
+return true;
+  if (maybe_ne (GET_MODE_BITSIZE (mode),
+   GET_MODE_UNIT_PRECISION (mode) * GET_MODE_NUNITS (mode)))
+return false;
+  if (targetm.mode_can_transfer_bits)
+return targetm.mode_can_transfer_bits (mode);
+  return true;
+}
+
 #ifdef GCC_TM_H
 
 #ifndef CUMULATIVE_ARGS_MAGIC
-- 
2.43.0
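
The decision order of the wrapper in target.h above can be modelled without any GCC headers. The following is a simplified sketch: `ModeInfo`, `TransferHook`, `can_transfer_bits` and `reject_all` are hypothetical names, and plain integers stand in for GCC's poly-int mode sizes.

```cpp
#include <cassert>

// Hypothetical, simplified description of a machine mode.
struct ModeInfo
{
  bool is_blk;              // stands in for BLKmode
  unsigned bitsize;         // size in bits, including any padding
  unsigned unit_precision;  // value bits per unit
  unsigned nunits;          // number of units
};

// Hypothetical hook type; a null pointer means the target defines no hook.
using TransferHook = bool (*) (const ModeInfo &);

// Same order of checks as the wrapper: BLKmode is always fine, modes with
// padding are rejected, otherwise the target hook (if any) decides.
bool can_transfer_bits (const ModeInfo &mode, TransferHook hook)
{
  if (mode.is_blk)
    return true;
  if (mode.bitsize != mode.unit_precision * mode.nunits)
    return false;
  if (hook)
    return hook (mode);
  return true;
}

// Example hook that rejects every mode it is asked about.
bool reject_all (const ModeInfo &) { return false; }
```

A mode with 96 bits of size but only 80 bits of precision is rejected before the hook is consulted, while a 32-bit mode without padding is accepted unless the hook says otherwise.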



Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Richard Biener
On Wed, 31 Jul 2024, Jakub Jelinek wrote:

> On Wed, Jul 31, 2024 at 02:43:36PM +0200, Richard Biener wrote:
> > diff --git a/gcc/config/i386/i386-modes.def 
> > b/gcc/config/i386/i386-modes.def
> > index 6d8f1946f3a..2cc03e30f13 100644
> > --- a/gcc/config/i386/i386-modes.def
> > +++ b/gcc/config/i386/i386-modes.def
> > @@ -21,7 +21,7 @@ along with GCC; see the file COPYING3.  If not see
> > XFmode is __float80 is IEEE extended; TFmode is __float128
> > is IEEE quad.  */
> >  
> > -FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
> > +FRACTIONAL_FLOAT_MODE (XF, 80, 10, ieee_extended_intel_96_format);
> >  FLOAT_MODE (TF, 16, ieee_quad_format);
> >  FLOAT_MODE (HF, 2, ieee_half_format);
> >  FLOAT_MODE (BF, 2, 0);
> > 
> > bootstraps and tests (-m64/-m32) OK on x86_64-linux.
> 
> And does it e.g. pass compat.exp / structure-layout-1.exp testing
> against gcc without that patch (ALT_CC_UNDER_TEST=gcc ALT_CXX_UNDER_TEST=g++)?

It doesn't.  I would expect differences at least for packed structs
since TYPE_SIZE changes with MODE_SIZE.

Richard.


[PATCH] libstdc++: Handle strerror returning null

2024-07-31 Thread Jonathan Wakely
As discussed a couple of weeks ago, I'm going to push this.

Tested x86_64-linux (where this #else isn't even used, but I checked it
does at least compile when the #if isn't true).

-- >8 --

The Linux man page for strerror says that some systems return NULL for
an unknown error number. That violates the C and POSIX standards, but we
can easily handle it to avoid a segfault.

libstdc++-v3/ChangeLog:

* src/c++11/system_error.cc (strerror_string): Handle
non-conforming NULL return from strerror.
---
 libstdc++-v3/src/c++11/system_error.cc | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/src/c++11/system_error.cc 
b/libstdc++-v3/src/c++11/system_error.cc
index d01451ba1ef..38bc0446110 100644
--- a/libstdc++-v3/src/c++11/system_error.cc
+++ b/libstdc++-v3/src/c++11/system_error.cc
@@ -110,7 +110,11 @@ namespace
 #else
   string strerror_string(int err)
   {
-return strerror(err); // XXX Not thread-safe.
+auto str = strerror(err); // XXX Not thread-safe.
+if (str) [[__likely__]]
+  return str;
+// strerror should not return NULL, but some implementations do.
+return "Unknown error";
   }
 #endif
 
-- 
2.45.2
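
The null check from the patch above can be tried in isolation. This is a minimal sketch with a made-up helper name, `message_or_default`, which takes the pointer as a parameter so the null case can be exercised without a non-conforming libc.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: constructing std::string from a null pointer is
// undefined behaviour, so fall back to a fixed message instead.
std::string message_or_default (const char *str)
{
  if (str)
    return str;
  return "Unknown error";
}
```

A caller would pass the result of strerror, e.g. `message_or_default (std::strerror (err))`; the thread-safety caveat noted in the patch still applies to that call.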



Re: [PATCH] libstdc++: Only append "@euro" to locale names for Glibc testing

2024-07-31 Thread Jonathan Wakely
On Wed, 31 Jul 2024 at 13:42, Rainer Orth  wrote:
>
> Hi Jonathan,
>
> > On Wed, 31 Jul 2024 at 13:27, Jonathan Wakely  wrote:
> >>
> >> I doubt we want the @euro suffix anywhere except Glibc-based targets. We
> >> certainly don't want to append "@euro" on Solaris, where this change
> >> flips some tests from UNSUPPORTED to PASS, e.g.
> >> 21_strings/basic_string/numeric_conversions/char/to_string_float.cc
> >> It will probably also cause some to flip from UNSUPPORTED to FAIL, which
> >> we'll need to address.
> >
> > Oh, I've just realised that the UNSUPPORTED -> PASS I observed on
> > Solaris was a build using my patch for PR 57585, which is not pushed
> > yet. I think without that all uses of dg-require-namedlocale might
> > fail on Solaris, so this change won't actually change anything ...
> > yet.
> >
> > It still seems worth doing now though.
>
> agreed: while Solaris 11.4 does have a few *.ISO8859-15@euro locales
>
> da_DK.ISO8859-15@euro
> en_GB.ISO8859-15@euro
> en_US.ISO8859-15@euro
> sv_SE.ISO8859-15@euro
>
> the majority (17) are not.

Ah interesting, I only saw en_US.ISO8859-15@euro on cfarm216, which is
an interesting one. US locale using Euro symbol for currency?!

Anyway, thanks for confirming, I'll push this.



Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Jakub Jelinek
On Wed, Jul 31, 2024 at 02:43:36PM +0200, Richard Biener wrote:
> diff --git a/gcc/config/i386/i386-modes.def 
> b/gcc/config/i386/i386-modes.def
> index 6d8f1946f3a..2cc03e30f13 100644
> --- a/gcc/config/i386/i386-modes.def
> +++ b/gcc/config/i386/i386-modes.def
> @@ -21,7 +21,7 @@ along with GCC; see the file COPYING3.  If not see
> XFmode is __float80 is IEEE extended; TFmode is __float128
> is IEEE quad.  */
>  
> -FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
> +FRACTIONAL_FLOAT_MODE (XF, 80, 10, ieee_extended_intel_96_format);
>  FLOAT_MODE (TF, 16, ieee_quad_format);
>  FLOAT_MODE (HF, 2, ieee_half_format);
>  FLOAT_MODE (BF, 2, 0);
> 
> bootstraps and tests (-m64/-m32) OK on x86_64-linux.

And does it e.g. pass compat.exp / structure-layout-1.exp testing
against gcc without that patch (ALT_CC_UNDER_TEST=gcc ALT_CXX_UNDER_TEST=g++)?

Jakub



Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Richard Biener
On Wed, 31 Jul 2024, Uros Bizjak wrote:

> On Wed, Jul 31, 2024 at 11:33 AM Richard Biener  wrote:
> >
> > On Wed, 31 Jul 2024, Uros Bizjak wrote:
> >
> > > On Wed, Jul 31, 2024 at 10:48 AM Richard Biener  wrote:
> > > >
> > > > On Wed, 31 Jul 2024, Uros Bizjak wrote:
> > > >
> > > > > On Wed, Jul 31, 2024 at 10:24 AM Jakub Jelinek  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, Jul 31, 2024 at 10:11:44AM +0200, Uros Bizjak wrote:
> > > > > > > OK. Richard, can you please mention the above in the comment why
> > > > > > > XFmode is rejected in the hook?
> > > > > > >
> > > > > > > Later, we can perhaps benchmark XFmode move vs. generic memory 
> > > > > > > copy to
> > > > > > > get some hard data.
> > > > > >
> > > > > > My (limited) understanding was that the hook would be used only for 
> > > > > > cases
> > > > > > where we'd like to e.g. value number some SF/DF/XF etc. mode loads 
> > > > > > and some
> > > > > > subsequent loads from the same address with different mode but same 
> > > > > > size
> > > > > > the same and replace say int or long long later load with 
> > > > > > VIEW_CONVERT_EXPR
> > > > > > of the result of the SF/SF mode load.  That is what was incorrect, 
> > > > > > because
> > > > > > the load didn't preserve all the bits.  The patch would still keep 
> > > > > > doing
> > > > > > normal SF/DF/XF etc. mode copies if that is all that happens in the 
> > > > > > program,
> > > > > > load some floating point value and store it elsewhere or as part of 
> > > > > > larger
> > > > > > aggregate copy.
> > > > >
> > > > > So, the hook should allow everything besides SF/DFmode, simply:
> > > > >
> > > > >
> > > > > switch (GET_MODE_INNER (mode))
> > > > >   {
> > > > >   case SFmode:
> > > > >   case DFmode:
> > > > > /* These suffer from normalization upon load when not using 
> > > > > SSE.  */
> > > > > return !(ix86_fpmath & FPMATH_387);
> > > > >   default:
> > > > > return true;
> > > > >   }
> > > >
> > > > OK, I think I'll go with this then.  I'm now unsure whether the
> > > > wrapper around the hook should reject modes with padding or if
> > > > the supposed users (value-numbering and SRA) should deal with that
> > > > issue separately.  I do wonder whether
> > > >
> > > > ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_LONG_DOUBLE
> > > > >   ? &ieee_extended_intel_128_format
> > > > >   : TARGET_96_ROUND_53_LONG_DOUBLE
> > > > >   ? &ieee_extended_intel_96_round_53_format
> > > > >   : &ieee_extended_intel_96_format));
> > > > ADJUST_BYTESIZE  (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
> > > > ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);
> > > >
> > > > unambiguously specifies where the padding is - m68k has
> > > >
> > > > FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_motorola_format);
> > > >
> > > > It's also not clear we can model a x87 10 byte memory copy in RTL since
> > > > a mem:XF still touches 12 or 16 bytes - IIRC a store leaves
> > > > possible padding as unspecified and not "masked out" even if
> > > > the actual fstp will only store 10 bytes.
> > >
> > > The hardware will never touch bytes outside 10 bytes range, the
> > > padding is some artificial compiler thingy, so IMO it should be
> > > handled before the hook is called. Please find attached the source I
> > > have used to confirm that a) the copied bits will never be mangled and
> > > b) there is no access outside the 10 bytes range. (BTW: these
> > > particular values are to test the effect of leading bit 63, the
> > > non-hidden normalized bit).
> >
> > Thanks - I do wonder why GET_MODE_SIZE (XFmode) is not 10 then,
> > mode_base_align[XFmode] seems to be correctly set to ensure
> > 12 bytes / 16 bytes "effective" size.
> 
> Uh, this decision predates my involvement in GCC development by a long shot ;)

diff --git a/gcc/config/i386/i386-modes.def 
b/gcc/config/i386/i386-modes.def
index 6d8f1946f3a..2cc03e30f13 100644
--- a/gcc/config/i386/i386-modes.def
+++ b/gcc/config/i386/i386-modes.def
@@ -21,7 +21,7 @@ along with GCC; see the file COPYING3.  If not see
XFmode is __float80 is IEEE extended; TFmode is __float128
is IEEE quad.  */
 
-FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_intel_96_format);
+FRACTIONAL_FLOAT_MODE (XF, 80, 10, ieee_extended_intel_96_format);
 FLOAT_MODE (TF, 16, ieee_quad_format);
 FLOAT_MODE (HF, 2, ieee_half_format);
 FLOAT_MODE (BF, 2, 0);

bootstraps and tests (-m64/-m32) OK on x86_64-linux.

Richard.

Re: [PATCH] libstdc++: Only append "@euro" to locale names for Glibc testing

2024-07-31 Thread Rainer Orth
Hi Jonathan,

> On Wed, 31 Jul 2024 at 13:27, Jonathan Wakely  wrote:
>>
>> I doubt we want the @euro suffix anywhere except Glibc-based targets. We
>> certainly don't want to append "@euro" on Solaris, where this change
>> flips some tests from UNSUPPORTED to PASS, e.g.
>> 21_strings/basic_string/numeric_conversions/char/to_string_float.cc
>> It will probably also cause some to flip from UNSUPPORTED to FAIL, which
>> we'll need to address.
>
> Oh, I've just realised that the UNSUPPORTED -> PASS I observed on
> Solaris was a build using my patch for PR 57585, which is not pushed
> yet. I think without that all uses of dg-require-namedlocale might
> fail on Solaris, so this change won't actually change anything ...
> yet.
>
> It still seems worth doing now though.

agreed: while Solaris 11.4 does have a few *.ISO8859-15@euro locales

da_DK.ISO8859-15@euro
en_GB.ISO8859-15@euro
en_US.ISO8859-15@euro
sv_SE.ISO8859-15@euro

the majority (17) are not.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] libstdc++: Only append "@euro" to locale names for Glibc testing

2024-07-31 Thread Jonathan Wakely
On Wed, 31 Jul 2024 at 13:27, Jonathan Wakely  wrote:
>
> I doubt we want the @euro suffix anywhere except Glibc-based targets. We
> certainly don't want to append "@euro" on Solaris, where this change
> flips some tests from UNSUPPORTED to PASS, e.g.
> 21_strings/basic_string/numeric_conversions/char/to_string_float.cc
> It will probably also cause some to flip from UNSUPPORTED to FAIL, which
> we'll need to address.

Oh, I've just realised that the UNSUPPORTED -> PASS I observed on
Solaris was a build using my patch for PR 57585, which is not pushed
yet. I think without that all uses of dg-require-namedlocale might
fail on Solaris, so this change won't actually change anything ...
yet.

It still seems worth doing now though.

>
> Let's restrict it to Glibc.
>
> Tested x86_64-linux and sparc-solaris11.4.
>
> -- >8 --
>
> The testsuite automatically appends "@euro" to "xx.ISO8859-15" locale
> names on all targets except FreeBSD, DragonflyBSD, and NetBSD. It should
> only be for Glibc, not all non-BSD targets.
>
> libstdc++-v3/ChangeLog:
>
> * testsuite/lib/libstdc++.exp (check_v3_target_namedlocale):
> Only append "@euro" to ".ISO8859-15" locales for Glibc.
> ---
>  libstdc++-v3/testsuite/lib/libstdc++.exp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp 
> b/libstdc++-v3/testsuite/lib/libstdc++.exp
> index 18331c80bc2..2510c7f4cbb 100644
> --- a/libstdc++-v3/testsuite/lib/libstdc++.exp
> +++ b/libstdc++-v3/testsuite/lib/libstdc++.exp
> @@ -1032,7 +1032,7 @@ proc check_v3_target_namedlocale { args } {
> puts $f "strcpy(result, name);"
> puts $f "#if defined __FreeBSD__ || defined __DragonFly__ || defined 
> __NetBSD__"
> puts $f "/* fall-through */"
> -   puts $f "#else"
> +   puts $f "#elif defined __GLIBC__"
> puts $f "if (strstr(result, \"ISO8859-15\")) {"
> puts $f "strcat(result, \"@euro\");"
> puts $f "}"
> --
> 2.45.2
>



[PATCH] libstdc++: Only append "@euro" to locale names for Glibc testing

2024-07-31 Thread Jonathan Wakely
I doubt we want the @euro suffix anywhere except Glibc-based targets. We
certainly don't want to append "@euro" on Solaris, where this change
flips some tests from UNSUPPORTED to PASS, e.g.
21_strings/basic_string/numeric_conversions/char/to_string_float.cc
It will probably also cause some to flip from UNSUPPORTED to FAIL, which
we'll need to address.

Let's restrict it to Glibc.

Tested x86_64-linux and sparc-solaris11.4.

-- >8 --

The testsuite automatically appends "@euro" to "xx.ISO8859-15" locale
names on all targets except FreeBSD, DragonflyBSD, and NetBSD. It should
only be for Glibc, not all non-BSD targets.

libstdc++-v3/ChangeLog:

* testsuite/lib/libstdc++.exp (check_v3_target_namedlocale):
Only append "@euro" to ".ISO8859-15" locales for Glibc.
---
 libstdc++-v3/testsuite/lib/libstdc++.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp 
b/libstdc++-v3/testsuite/lib/libstdc++.exp
index 18331c80bc2..2510c7f4cbb 100644
--- a/libstdc++-v3/testsuite/lib/libstdc++.exp
+++ b/libstdc++-v3/testsuite/lib/libstdc++.exp
@@ -1032,7 +1032,7 @@ proc check_v3_target_namedlocale { args } {
puts $f "strcpy(result, name);"
puts $f "#if defined __FreeBSD__ || defined __DragonFly__ || defined 
__NetBSD__"
puts $f "/* fall-through */"
-   puts $f "#else"
+   puts $f "#elif defined __GLIBC__"
puts $f "if (strstr(result, \"ISO8859-15\")) {"
puts $f "strcat(result, \"@euro\");"
puts $f "}"
-- 
2.45.2
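
For reference, the C fragment that the Tcl proc writes out behaves roughly
like the sketch below after this change.  This is only an illustration:
adjust_locale_name and its size handling are hypothetical and not part of
the testsuite, which emits the code inline with fixed-size buffers.

```c
#include <stdio.h>
#include <string.h>

/* Copy NAME into RESULT (capacity SIZE) and, only when building against
   Glibc, append "@euro" to ISO8859-15 locale names.  Returns 0 on success
   and -1 if RESULT is too small.  Hypothetical helper for illustration.  */
static int
adjust_locale_name (char *result, size_t size, const char *name)
{
  size_t len = strlen (name);

  if (len + 1 > size)
    return -1;
  memcpy (result, name, len + 1);

#if defined __FreeBSD__ || defined __DragonFly__ || defined __NetBSD__
  /* fall-through: these targets use the plain locale name.  */
#elif defined __GLIBC__
  if (strstr (result, "ISO8859-15"))
    {
      static const char suffix[] = "@euro";

      /* sizeof suffix counts the terminating NUL.  */
      if (len + sizeof suffix > size)
	return -1;
      strcat (result, suffix);
    }
#endif
  /* Any other libc (e.g. Solaris) now keeps the name unchanged.  */
  return 0;
}
```

With the old "#else" the suffix was appended on every non-BSD target; with
"#elif defined __GLIBC__" a target such as Solaris falls through with the
name as given.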



Re: [Patch,v3] omp-offload.cc: Fix value-expr handling of 'declare target link' vars [PR115637] (was: [Patch] gimplify.cc: Handle VALUE_EXPR of MEM_REF's ADDR_EXPR argument [PR115637])

2024-07-31 Thread Richard Biener
On Wed, Jul 31, 2024 at 1:21 PM Tobias Burnus  wrote:
>
> Hi Richard, hi all,
>
> Richard Biener wrote:
>
> Looking at pass_omp_target_link::execute I wonder iff find_link_var_op
> shouldn't simply do the substitution?  Aka
>
> This seems to work ...
>
> --- a/gcc/omp-offload.cc
> +++ b/gcc/omp-offload.cc
> @@ -2893,6 +2893,7 @@ find_link_var_op (tree *tp, int *walk_subtrees, void *)
>&& is_global_var (t)
>&& lookup_attribute ("omp declare target link", DECL_ATTRIBUTES (t)))
>  {
> +  *tp = unshare_expr (DECL_VALUE_EXPR (t));
>*walk_subtrees = 0;
>return t;
>  }
>
> which then makes the stmt obviously not gimple?
>
> ... except that 'return t' prevents updating other value-expr in the same 
> stmt, but that can be fixed.
>
> Updated patch attached.

You can pass a

  walk_stmt_info wi;
  wi->data = NULL;

to walk_gimple_stmt and set wi->data instead of using a global
variable (or make wi->data point
to a local variable for some more indirection).

OK as-is or with cleanup as suggested above.

Thanks,
Richard.

> Thanks for the suggestion!
>
> Tobias
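
A minimal sketch of the cleanup suggested above, under loud assumptions:
walk_info, walk_operands and the callback are hypothetical stand-ins, not
the GCC walker API.  It shows the two points from this thread: the callback
reports "found something" through the walker's data pointer instead of a
global, and it returns 0 so that the walk continues and every operand in
the statement gets updated.

```c
#include <stddef.h>

/* Hypothetical stand-in for a walker's per-walk info block.  */
struct walk_info
{
  void *data;	/* Caller-owned state, visible to the callback.  */
};

typedef int (*walk_fn) (int *op, struct walk_info *wi);

/* Call CB on each of the N operands; stop early if CB returns nonzero.  */
static int
walk_operands (int *ops, size_t n, walk_fn cb, struct walk_info *wi)
{
  for (size_t i = 0; i < n; i++)
    {
      int r = cb (&ops[i], wi);
      if (r != 0)
	return r;
    }
  return 0;
}

/* Replace every negative operand by 0 and record that a replacement
   happened through wi->data rather than a file-scope variable.  */
static int
replace_negative (int *op, struct walk_info *wi)
{
  if (*op < 0)
    {
      *op = 0;
      *(int *) wi->data = 1;
    }
  return 0;	/* Keep walking so later operands are updated too.  */
}

/* Returns 1 if any operand was replaced, 0 otherwise.  */
static int
fixup_operands (int *ops, size_t n)
{
  int found = 0;
  struct walk_info wi;

  wi.data = &found;	/* Point at a local for the indirection.  */
  walk_operands (ops, n, replace_negative, &wi);
  return found;
}
```

Had replace_negative returned nonzero on the first hit, the second negative
operand in a statement would have been left untouched, which mirrors the
'return t' issue mentioned in the quoted mail.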


Re: [PATCH] aarch64: Fuse CMP+CSEL and CMP+CSET for -mcpu=neoverse-v2

2024-07-31 Thread Jennifer Schmitz
Thanks for the feedback! I updated the patch based on your comments; more
detailed replies are inline below. The updated version was bootstrapped and
tested again, with no regressions.
Best,
Jennifer
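
For context, the kind of source that exercises the two fusion pairs might
look like the sketch below.  The function names are made up here, and the
claim about the generated code is an assumption: on AArch64 at -O2 one would
typically expect a cmp followed by csel for the first function and a cmp
followed by cset for the second, but that was not verified for this note and
these are not the tests from the patch.

```c
/* Candidate for CMP+CSEL: select one of two values on a comparison.  */
int
select_gt (int a, int b, int c, int d)
{
  return a > b ? c : d;
}

/* Candidate for CMP+CSET: materialize a comparison result as 0 or 1.  */
int
flag_gt (int a, int b)
{
  return a > b;
}
```

With the fusion pair enabled in the tuning model, the scheduler is meant to
keep each compare adjacent to its consumer.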



0001-AArch64-Fuse-CMP-CSEL-and-CMP-CSET-for-mcpu-neoverse.patch
Description: Binary data

> On 25 Jul 2024, at 14:49, Kyrylo Tkachov  wrote:
> 
> 
> 
>> On 25 Jul 2024, at 13:58, Richard Sandiford  
>> wrote:
>> 
>> Jennifer Schmitz  writes:
>>> Thank you for the feedback. I added checks for SCALAR_INT_MODE_P for the 
>>> reg operands of the compare and if-then-else expressions. As it is not 
>>> legal to have different modes in the operand registers, I only added one 
>>> check for each of the expressions.
>>> The updated patch was bootstrapped and tested again.
>>> Best,
>>> Jennifer
>>> 
>>> From 8da609be99fece8130cf1429bd938b2a26c6672b Mon Sep 17 00:00:00 2001
>>> From: Jennifer Schmitz 
>>> Date: Wed, 24 Jul 2024 06:13:59 -0700
>>> Subject: [PATCH] aarch64: Fuse CMP+CSEL and CMP+CSET for -mcpu=neoverse-v2
>>> 
>>> According to the Neoverse V2 Software Optimization Guide (section 4.14), the
>>> instruction pairs CMP+CSEL and CMP+CSET can be fused, which had not been
>>> implemented so far. This patch implements and tests the two fusion pairs.
>>> 
>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>>> regression.
>>> There was also no non-noise impact on SPEC CPU2017 benchmark.
>>> OK for mainline?
>>> 
>>> Signed-off-by: Jennifer Schmitz 
>>> 
>>> gcc/
>>> 
>>> * config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Implement
>>> fusion logic.
>>> * config/aarch64/aarch64-fusion-pairs.def (cmp+csel): New entry.
>>> (cmp+cset): Likewise.
>>> * config/aarch64/tuning_models/neoversev2.h: Enable logic in
>>> field fusible_ops.
>>> 
>>> gcc/testsuite/
>>> 
>>> * gcc.target/aarch64/fuse_cmp_csel.c: New test.
>>> * gcc.target/aarch64/fuse_cmp_cset.c: Likewise.
>> 
>> Thanks for the update.
>> 
>> It looks from a quick scan like the main three instructions associated
>> with single-set integer COMPAREs are CMP, CMN and TST.  TST could be
>> distinguished from CMP and CMN based on get_attr_type (), although it
>> looks like:
>> 
>> (define_insn "*and_compare0"
>> [(set (reg:CC_Z CC_REGNUM)
>>   (compare:CC_Z
>>(match_operand:SHORT 0 "register_operand" "r")
>>(const_int 0)))]
>> ""
>> "tst\\t%0, "
>> [(set_attr "type" "alus_imm")]
>> )
> 
> We can change that independently.
I submitted a small patch to fix that.
> 
>> 
>> should use logics_imm instead of alus_imm.
>> 
>> Alternatively, we could add a new attribute for "compare_type" and use
>> that.  That would make the test in aarch_macro_fusion_pair_p slightly
>> simpler, since it could use get_attr_compare_type without having to
>> look at the pattern of prev_set.  But there's a danger that we'd
>> forget to add the new attribute for new comparison instructions.
>> 
>> I did wonder whether we could simply punt on CC_Zmode, but that's
>> not a reliable test.
>> 
>> But I suppose the counter-argument to my questions above is: how bad
>> would it be if we fused CMN and TST?  They are at least plausible
>> fusions, so it probably doesn't matter if we include them too.
> 
> CMN and TST can be fused with conditional branches, but not with CSEL 
> according to my reading of the SWOG so I guess we’d want to keep them 
> separate in principle. In practice, I can’t imagine the performance 
> difference will be measurable in real workloads if they are kept together.
> Jennifer’s benchmarking of this patch didn’t show any negative performance 
> consequences of the more aggressive fusion, and even a slight improvement.
As suggested, I used get_attr_type to distinguish TST from CMP/CMN.
> 
> 
>> 
>> So:
>> 
>>> ---
>>> gcc/config/aarch64/aarch64-fusion-pairs.def   |  2 ++
>>> gcc/config/aarch64/aarch64.cc | 22 +
>>> gcc/config/aarch64/tuning_models/neoversev2.h |  5 ++-
>>> .../gcc.target/aarch64/fuse_cmp_csel.c| 33 +++
>>> .../gcc.target/aarch64/fuse_cmp_cset.c| 31 +
>>> 5 files changed, 92 inser

Re: [PATCH 2/2] match: Fix wrong code due to `(a ? e : f) !=/== (b ? e : f)` patterns [PR116120]

2024-07-31 Thread Sam James
Andrew Pinski  writes:

> When this pattern was generalized from handling only 0/-1, we missed that
> the optimization is wrong if `e == f` is true; it needs an extra check
> for that.
>
> This changes the patterns to be:
> /* (a ? x : y) != (b ? x : y) --> (a^b & (x != y)) ? TRUE  : FALSE */
> /* (a ? x : y) == (b ? x : y) --> (a^b & (x != y)) ? FALSE : TRUE  */
> /* (a ? x : y) != (b ? y : x) --> (a^b | (x == y)) ? FALSE : TRUE  */
> /* (a ? x : y) == (b ? y : x) --> (a^b | (x == y)) ? TRUE  : FALSE */
>
> This still produces better code than the original case and in many cases (x 
> != y) will
> still reduce to either false or true.
>
> With this change we also need to make sure `a`, `b` and the resulting types 
> are all
> the same for the same reason as the previous patch.
>
> I updated (well, added to) the testcases to make sure the right number of
> comparisons is left.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
>   PR tree-optimization/116120
>
> [...]
> diff --git a/gcc/testsuite/g++.dg/torture/pr116120-1.c 
> b/gcc/testsuite/g++.dg/torture/pr116120-1.c
> new file mode 100644
> index 000..cffb7fbdc5b
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr116120-1.c
> @@ -0,0 +1,32 @@
> +// { dg-run }

This should be { dg-do run }, not { dg-run }! Ditto elsewhere.

> [...]

thanks,
sam




Re: [PATCH 2/2] match: Fix wrong code due to `(a ? e : f) !=/== (b ? e : f)` patterns [PR116120]

2024-07-31 Thread Richard Biener
On Tue, Jul 30, 2024 at 5:26 PM Andrew Pinski  wrote:
>
> When this pattern was generalized from handling only 0/-1, we missed that
> the optimization is wrong if `e == f` is true; it needs an extra check
> for that.
>
> This changes the patterns to be:
> /* (a ? x : y) != (b ? x : y) --> (a^b & (x != y)) ? TRUE  : FALSE */
> /* (a ? x : y) == (b ? x : y) --> (a^b & (x != y)) ? FALSE : TRUE  */
> /* (a ? x : y) != (b ? y : x) --> (a^b | (x == y)) ? FALSE : TRUE  */
> /* (a ? x : y) == (b ? y : x) --> (a^b | (x == y)) ? TRUE  : FALSE */
>
> This still produces better code than the original case and in many cases (x 
> != y) will
> still reduce to either false or true.
>
> With this change we also need to make sure `a`, `b` and the resulting types 
> are all
> the same for the same reason as the previous patch.
>
> I updated (well, added to) the testcases to make sure the right number of
> comparisons is left.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK
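
The four rewritten forms quoted above can be checked by brute force for
scalar conditions.  This is only a sketch for readers: rewrites_hold and
old_rewrite_holds are hypothetical helpers, plain ints stand in for the
operands, and nothing here models vec_cond or the type checks in match.pd.

```c
#include <stdbool.h>

/* Return true if all four rewritten forms from the patch agree with the
   original expressions for the given inputs.  */
static bool
rewrites_hold (bool a, bool b, int x, int y)
{
  bool ne_same = (a ? x : y) != (b ? x : y);
  bool eq_same = (a ? x : y) == (b ? x : y);
  bool ne_swap = (a ? x : y) != (b ? y : x);
  bool eq_swap = (a ? x : y) == (b ? y : x);

  bool t1 = (a ^ b) & (x != y);	/* Condition for the same-arms forms.  */
  bool t2 = (a ^ b) | (x == y);	/* Condition for the swapped-arms forms.  */

  return ne_same == (t1 ? true : false)
	 && eq_same == (t1 ? false : true)
	 && ne_swap == (t2 ? false : true)
	 && eq_swap == (t2 ? true : false);
}

/* The pre-patch rewrite of the first form: (a^b) ? TRUE : FALSE.  */
static bool
old_rewrite_holds (bool a, bool b, int x, int y)
{
  bool ne_same = (a ? x : y) != (b ? x : y);

  return ne_same == ((a ^ b) ? true : false);
}
```

For a = 1, b = 0 and x == y the old form claims the two sides differ even
though both evaluate to the same value, which is the wrong-code case from
the PR; the new forms hold for every combination.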

> PR tree-optimization/116120
>
> gcc/ChangeLog:
>
> * match.pd (`(a ? x : y) eq/ne (b ? x : y)`): Add test for `x != y`
> in result.
> (`(a ? x : y) eq/ne (b ? y : x)`): Add test for `x == y` in result.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/tree-ssa/pr50.C: Add extra checks on the test.
> * gcc.dg/tree-ssa/pr50-1.c: Likewise.
> * gcc.dg/tree-ssa/pr50.c: Likewise.
> * g++.dg/torture/pr116120-1.c: New test.
> * g++.dg/torture/pr116120-2.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/match.pd   | 20 -
>  gcc/testsuite/g++.dg/torture/pr116120-1.c  | 32 
>  gcc/testsuite/g++.dg/torture/pr116120-2.c  | 35 ++
>  gcc/testsuite/g++.dg/tree-ssa/pr50.C   | 10 +++
>  gcc/testsuite/gcc.dg/tree-ssa/pr50-1.c |  9 ++
>  gcc/testsuite/gcc.dg/tree-ssa/pr50.c   |  1 +
>  6 files changed, 99 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/torture/pr116120-1.c
>  create mode 100644 gcc/testsuite/g++.dg/torture/pr116120-2.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 881a827860f..4d3ee578371 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5632,21 +5632,25 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(vec_cond (bit_and (bit_not @0) @1) @2 @3)))
>  #endif
>
> -/* (a ? x : y) != (b ? x : y) --> (a^b) ? TRUE  : FALSE */
> -/* (a ? x : y) == (b ? x : y) --> (a^b) ? FALSE : TRUE  */
> -/* (a ? x : y) != (b ? y : x) --> (a^b) ? FALSE : TRUE  */
> -/* (a ? x : y) == (b ? y : x) --> (a^b) ? TRUE  : FALSE */
> +/* (a ? x : y) != (b ? x : y) --> (a^b & (x != y)) ? TRUE  : FALSE */
> +/* (a ? x : y) == (b ? x : y) --> (a^b & (x != y)) ? FALSE : TRUE  */
> +/* (a ? x : y) != (b ? y : x) --> (a^b | (x == y)) ? FALSE : TRUE  */
> +/* (a ? x : y) == (b ? y : x) --> (a^b | (x == y)) ? TRUE  : FALSE */
>  (for cnd (cond vec_cond)
>   (for eqne (eq ne)
>(simplify
> (eqne:c (cnd @0 @1 @2) (cnd @3 @1 @2))
> -(if (types_match (TREE_TYPE (@0), TREE_TYPE (@3)))
> - (cnd (bit_xor @0 @3) { constant_boolean_node (eqne == NE_EXPR, type); }
> +(if (types_match (TREE_TYPE (@0), TREE_TYPE (@3))
> + && types_match (type, TREE_TYPE (@0)))
> + (cnd (bit_and (bit_xor @0 @3) (ne:type @1 @2))
> +  { constant_boolean_node (eqne == NE_EXPR, type); }
>{ constant_boolean_node (eqne != NE_EXPR, type); })))
>(simplify
> (eqne:c (cnd @0 @1 @2) (cnd @3 @2 @1))
> -(if (types_match (TREE_TYPE (@0), TREE_TYPE (@3)))
> - (cnd (bit_xor @0 @3) { constant_boolean_node (eqne != NE_EXPR, type); }
> +(if (types_match (TREE_TYPE (@0), TREE_TYPE (@3))
> + && types_match (type, TREE_TYPE (@0)))
> + (cnd (bit_ior (bit_xor @0 @3) (eq:type @1 @2))
> +  { constant_boolean_node (eqne != NE_EXPR, type); }
>{ constant_boolean_node (eqne == NE_EXPR, type); })
>
>  /* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask
> diff --git a/gcc/testsuite/g++.dg/torture/pr116120-1.c 
> b/gcc/testsuite/g++.dg/torture/pr116120-1.c
> new file mode 100644
> index 000..cffb7fbdc5b
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr116120-1.c
> @@ -0,0 +1,32 @@
> +// { dg-run }
> +// PR tree-optimization/116120
> +
> +// The optimization for `(a ? x : y) != (b ? x : y)`
> +// missed that x and y could be the same value.
> +
> +typedef int v4si __attribute((__vector_size__(1 * sizeof(int))));
> +v4si f1(v4si a, v4si b, v4si c, v4si d, v4si e, v4si f) {
> +  v4si X = a == b

Re: [PATCH 1/2] match: Fix types matching for `(?:) !=/== (?:)` [PR116134]

2024-07-31 Thread Richard Biener
On Tue, Jul 30, 2024 at 5:25 PM Andrew Pinski  wrote:
>
> The problem here is that in GENERIC, the types of comparisons don't need
> to be boolean types (or vector boolean types). This patch fixes that by
> making sure the types of the conditions match before doing the optimization.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK

> PR middle-end/116134
>
> gcc/ChangeLog:
>
> * match.pd (`(a ? x : y) eq/ne (b ? x : y)`): Check that
> a and b types match.
> (`(a ? x : y) eq/ne (b ? y : x)`): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/torture/pr116134-1.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/match.pd  | 10 ++
>  gcc/testsuite/gcc.dg/torture/pr116134-1.c |  9 +
>  2 files changed, 15 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr116134-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 1c8601229e3..881a827860f 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5640,12 +5640,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (for eqne (eq ne)
>(simplify
> (eqne:c (cnd @0 @1 @2) (cnd @3 @1 @2))
> -(cnd (bit_xor @0 @3) { constant_boolean_node (eqne == NE_EXPR, type); }
> - { constant_boolean_node (eqne != NE_EXPR, type); }))
> +(if (types_match (TREE_TYPE (@0), TREE_TYPE (@3)))
> + (cnd (bit_xor @0 @3) { constant_boolean_node (eqne == NE_EXPR, type); }
> +  { constant_boolean_node (eqne != NE_EXPR, type); })))
>(simplify
> (eqne:c (cnd @0 @1 @2) (cnd @3 @2 @1))
> -(cnd (bit_xor @0 @3) { constant_boolean_node (eqne != NE_EXPR, type); }
> - { constant_boolean_node (eqne == NE_EXPR, type); }
> +(if (types_match (TREE_TYPE (@0), TREE_TYPE (@3)))
> + (cnd (bit_xor @0 @3) { constant_boolean_node (eqne != NE_EXPR, type); }
> +  { constant_boolean_node (eqne == NE_EXPR, type); })
>
>  /* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask
> types are compatible.  */
> diff --git a/gcc/testsuite/gcc.dg/torture/pr116134-1.c 
> b/gcc/testsuite/gcc.dg/torture/pr116134-1.c
> new file mode 100644
> index 000..ab595f99680
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr116134-1.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +
> +/* This used to ICE as comparisons on generic can be different types. */
> +/* PR middle-end/116134  */
> +
> +int a;
> +int b;
> +int d;
> +void c() { 1UL <= (d < b) != (1UL & (0 < a | 0L)); }
> --
> 2.43.0
>


Re: [PATCH v2] testsuite: Adjust switch-exp-transform-3.c for 32bit

2024-07-31 Thread Filip Kastl
On Wed 2024-07-31 13:34:28, Jakub Jelinek wrote:
> On Wed, Jul 31, 2024 at 01:32:06PM +0200, Filip Kastl wrote:
> > Thanks for the feedback!  Here is a second version of the patch.  I've 
> > tested
> > this version with
> > 
> > make check RUNTESTFLAGS="i386.exp=gcc.target/i386/switch-exp-transform-3.c 
> > --target_board='unix{-m32}'"
> > 
> > and
> > 
> > make check RUNTESTFLAGS="i386.exp=gcc.target/i386/switch-exp-transform-3.c"
> 
> You can just use
> make check RUNTESTFLAGS="--target_board=unix\{-m32,-m64\} 
> i386.exp=switch-exp-transform-3.c"
> 
> > testsuite: Adjust switch-exp-transform-3.c for 32bit
> > 
> > 32bit x86 CPUs won't natively support the FFS operation on a 64 bit
> > type.  Therefore, I'm setting the long long int part of the
> > switch-exp-transform-3.c test to only execute with 64bit targets.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/i386/switch-exp-transform-3.c: Set the long long
> >   int test to only execute with 64bit targets.
> 
> There should be just tab, not tab + 2 spaces before int test.
> Ok with that nit changed.
> 
>   Jakub
> 

Ok, removed the 2 spaces and pushed the patch.

Thanks,
Filip Kastl


Re: [PATCH v2] testsuite: Adjust switch-exp-transform-3.c for 32bit

2024-07-31 Thread Jakub Jelinek
On Wed, Jul 31, 2024 at 01:32:06PM +0200, Filip Kastl wrote:
> Thanks for the feedback!  Here is a second version of the patch.  I've tested
> this version with
> 
> make check RUNTESTFLAGS="i386.exp=gcc.target/i386/switch-exp-transform-3.c 
> --target_board='unix{-m32}'"
> 
> and
> 
> make check RUNTESTFLAGS="i386.exp=gcc.target/i386/switch-exp-transform-3.c"

You can just use
make check RUNTESTFLAGS="--target_board=unix\{-m32,-m64\} 
i386.exp=switch-exp-transform-3.c"

> testsuite: Adjust switch-exp-transform-3.c for 32bit
> 
> 32bit x86 CPUs won't natively support the FFS operation on a 64 bit
> type.  Therefore, I'm setting the long long int part of the
> switch-exp-transform-3.c test to only execute with 64bit targets.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/switch-exp-transform-3.c: Set the long long
> int test to only execute with 64bit targets.

There should be just tab, not tab + 2 spaces before int test.
Ok with that nit changed.

Jakub



[PATCH v2] testsuite: Adjust switch-exp-transform-3.c for 32bit

2024-07-31 Thread Filip Kastl
On Wed 2024-07-31 12:18:34, Jakub Jelinek wrote:
> On Wed, Jul 31, 2024 at 12:02:08PM +0200, Filip Kastl wrote:
> > 32bit x86 CPUs won't natively support the FFS operation on a 64 bit
> > type.  Therefore, the switch-exp-transform-3.c test will always fail
> > with a 32bit target.  I'm fixing my mistake.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/i386/switch-exp-transform-3.c: Remove code testing
> >   that the exponential index transform is able to handle long
> >   long int.
> 
> But for -m64 it does and it is good to test even that.
> Can't you wrap the long long stuff with
> #ifdef __x86_64__
> and
> do
> /* { dg-final { scan-tree-dump-times "Applying exponential index transform" 4 
> "switchconv" { target ia32 } } } */
> /* { dg-final { scan-tree-dump-times "Applying exponential index transform" 6 
> "switchconv" { target { ! ia32 } } } } */
> or so?
> 
>   Jakub
> 

Thanks for the feedback!  Here is a second version of the patch.  I've tested
this version with

make check RUNTESTFLAGS="i386.exp=gcc.target/i386/switch-exp-transform-3.c 
--target_board='unix{-m32}'"

and

make check RUNTESTFLAGS="i386.exp=gcc.target/i386/switch-exp-transform-3.c"

on an x86_64 machine and in both cases the test didn't produce any errors and
scan-tree-dump-times was successful.

Is this version ok?

Thanks,
Filip Kastl


-- 8< --


testsuite: Adjust switch-exp-transform-3.c for 32bit

32bit x86 CPUs won't natively support the FFS operation on a 64 bit
type.  Therefore, I'm setting the long long int part of the
switch-exp-transform-3.c test to only execute with 64bit targets.

gcc/testsuite/ChangeLog:

* gcc.target/i386/switch-exp-transform-3.c: Set the long long
  int test to only execute with 64bit targets.

Signed-off-by: Filip Kastl 
---
 gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c 
b/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
index c8fae70692e..64a7b146172 100644
--- a/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
+++ b/gcc/testsuite/gcc.target/i386/switch-exp-transform-3.c
@@ -99,6 +99,8 @@ int unopt_unsigned_long(unsigned long bit_position)
 }
 }
 
+#ifdef __x86_64__
+
 int unopt_long_long(long long bit_position)
 {
 switch (bit_position)
@@ -145,4 +147,7 @@ int unopt_unsigned_long_long(unsigned long long 
bit_position)
 }
 }
 
-/* { dg-final { scan-tree-dump-times "Applying exponential index transform" 6 
"switchconv" } } */
+#endif
+
+/* { dg-final { scan-tree-dump-times "Applying exponential index transform" 4 
"switchconv" { target ia32 } } } */
+/* { dg-final { scan-tree-dump-times "Applying exponential index transform" 6 
"switchconv" { target { ! ia32 } } } } */
-- 
2.45.2
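
For readers unfamiliar with the transform, here is a rough sketch of the
idea as I understand it: a switch whose case labels are all powers of two
can be rewritten to index a dense table by the position of the set bit.
The function names and table values below are invented, and the portable
loop stands in for the FFS/CTZ operation the pass relies on; this is not
the code GCC generates.

```c
/* Original form: every case label is a power of two.  */
static int
lookup_switch (long long bit_position)
{
  switch (bit_position)
    {
    case 1LL << 0: return 3;
    case 1LL << 1: return 5;
    case 1LL << 2: return 7;
    case 1LL << 3: return 11;
    case 1LL << 4: return 13;
    case 1LL << 5: return 17;
    default: return -1;
    }
}

/* Transformed form: check for a single set bit, then use its position
   as a dense index.  */
static int
lookup_transformed (long long bit_position)
{
  static const int table[] = { 3, 5, 7, 11, 13, 17 };
  unsigned long long v = (unsigned long long) bit_position;
  unsigned int idx = 0;

  /* Not a power of two: take the default.  */
  if (v == 0 || (v & (v - 1)) != 0)
    return -1;

  /* Position of the set bit; a target with a native 64-bit FFS/CTZ
     would do this in one instruction.  */
  while ((v >>= 1) != 0)
    idx++;

  if (idx >= sizeof table / sizeof table[0])
    return -1;
  return table[idx];
}
```

The bit-position step is what needs a 64-bit FFS for long long, which is
why the long long cases are compiled only under __x86_64__ and the expected
count drops from 6 to 4 on ia32.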



[Patch, v3] omp-offload.cc: Fix value-expr handling of 'declare target link' vars [PR115637] (was: [Patch] gimplify.cc: Handle VALUE_EXPR of MEM_REF's ADDR_EXPR argument [PR115637])

2024-07-31 Thread Tobias Burnus

Hi Richard, hi all,

Richard Biener wrote:

Looking at pass_omp_target_link::execute I wonder iff find_link_var_op
shouldn't simply do the substitution?  Aka


This seems to work ...


--- a/gcc/omp-offload.cc
+++ b/gcc/omp-offload.cc
@@ -2893,6 +2893,7 @@ find_link_var_op (tree *tp, int *walk_subtrees, void *)
&& is_global_var (t)
&& lookup_attribute ("omp declare target link", DECL_ATTRIBUTES (t)))
  {
+  *tp = unshare_expr (DECL_VALUE_EXPR (t));
*walk_subtrees = 0;
return t;
  }

which then makes the stmt obviously not gimple?


... except that 'return t' prevents updating other value-expr in the 
same stmt, but that can be fixed.


Updated patch attached.

Thanks for the suggestion!

Tobias
omp-offload.cc: Fix value-expr handling of 'declare target link' vars

As the PR and included testcase shows, replacing 'arr2' by its value expression
'*arr2$13$linkptr' failed for
  MEM  [(c_char * {ref-all})]
which left 'arr2' in the code as unknown symbol. Now expand the value expression
already in pass_omp_target_link::execute's process_link_var_op walk_gimple_stmt
walk - and don't rely on gimple_regimplify_operands.

PR middle-end/115637

gcc/ChangeLog:

	* gimplify.cc (gimplify_body): Fix macro name in the comment.
	* omp-offload.cc (found_link_var): New global var.
	(find_link_var_op): Rename to ...
	(process_link_var_op): ... this. Replace value expr; set
	found_link_var.
	(pass_omp_target_link::execute): Update walk_gimple_stmt call.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/declare-target-link.f90: Uncomment
	now working code.

Co-authored-by: Richard Biener  PR115637
-! if (res /= -11436) stop 5
-if (res /= -11546) stop 5 ! FIXME
+! print *, res
+if (res /= -11436) stop 5
   end
   integer function run_device1()
 !$omp declare target
 integer :: i
 run_device1 = -99
-! FIXME: arr2 not link mapped -> PR115637
-!   arr2 = [11,22,33,44]
+arr2 = [11,22,33,44]
 if (any (arr(10:50) /= [(i, i=10,50)])) then
   run_device1 = arr(11)
   return
 end if
-! FIXME: -> PR115637
-! run_device1 = sum(arr(10:13) + arr2)
-run_device1 = sum(arr(10:13) ) ! FIXME
+run_device1 = sum(arr(10:13) + arr2)
 do i = 10, 50
   arr(i) = 3 - 10 * arr(i)
 end do


[PATCH] c++/coroutines: only defer expanding co_{await, return, yield} if dependent [PR112341]

2024-07-31 Thread Arsen Arsenović
Tested on x86_64-pc-linux-gnu.  OK for trunk?

TIA, have a lovely day.
-- >8 --
By doing so, we can get diagnostics in template decls when we know we
can.  For instance, in the following:

  awaitable g();
  template
  task f()
  {
co_await g();
co_yield 1;
co_return "foo";
  }

... the coroutine promise type in each statement is always
std::coroutine_handle::promise_type, and all of the operands are
not type-dependent, so we can always compute the resulting types (and
expected types) of these expressions and statements.

Also, when we do not know the type of the CO_AWAIT_EXPR or
CO_YIELD_EXPR, we now return NULL_TREE as the type rather than
unknown_type_node.  This is more correct, since the type is not unknown,
it just isn't determined yet.  This also means we can remove the
CO_AWAIT_EXPR and CO_YIELD_EXPR special-cases from
type_dependent_expression_p.

PR c++/112341 - error: insufficient contextual information to determine type on 
co_await result in function template

gcc/cp/ChangeLog:

PR c++/112341
* coroutines.cc (struct coroutine_info): Also cache the
traits type.
(ensure_coro_initialized): New function.  Makes sure we have
initialized the coroutine state successfully, or informs the
caller should it fail to do so.  Extracted from
coro_promise_type_found_p.
(coro_get_traits_class): New function.  Gets the (cached)
coroutine traits type for a given coroutine.  Extracted from
coro_promise_type_found_p and refactored to cache the result.
(coro_promise_type_found_p): Use the two functions above.
(build_template_co_await_expr): New function.  Builds a
CO_AWAIT_EXPR representing a CO_AWAIT_EXPR in a template
declaration.
(build_co_await): Use the above if processing_template_decl, and
give it a proper type.
(defer_expansion_p): New function.  Returns true iff its
argument is a type-dependent expression OR the current function's
traits class is type dependent.
(finish_co_await_expr): Defer expansion only in the case
defer_expansion_p returns true.
(finish_co_yield_expr): Ditto.
(finish_co_return_stmt): Ditto.
* pt.cc (type_dependent_expression_p): Do not treat
CO_AWAIT/CO_YIELD specially.

gcc/testsuite/ChangeLog:

PR c++/112341
* g++.dg/coroutines/pr112341-2.C: New test.
* g++.dg/coroutines/pr112341-3.C: New test.
* g++.dg/coroutines/torture/co-yield-03-tmpl-nondependent.C: New
test.
* g++.dg/coroutines/pr112341.C: New test.
---
 gcc/cp/coroutines.cc  | 142 ++
 gcc/cp/pt.cc  |   5 -
 gcc/testsuite/g++.dg/coroutines/pr112341-2.C  |  25 +++
 gcc/testsuite/g++.dg/coroutines/pr112341-3.C  |  65 
 gcc/testsuite/g++.dg/coroutines/pr112341.C|  21 +++
 .../torture/co-yield-03-tmpl-nondependent.C   | 140 +
 6 files changed, 361 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr112341-2.C
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr112341-3.C
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr112341.C
 create mode 100644 
gcc/testsuite/g++.dg/coroutines/torture/co-yield-03-tmpl-nondependent.C

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 127a1c06b56e..9494cb499454 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -85,6 +85,7 @@ struct GTY((for_user)) coroutine_info
   tree actor_decl;/* The synthesized actor function.  */
   tree destroy_decl;  /* The synthesized destroy function.  */
   tree promise_type;  /* The cached promise type for this function.  */
+  tree traits_type;   /* The cached traits type for this function.  */
   tree handle_type;   /* The cached coroutine handle for this function.  */
   tree self_h_proxy;  /* A handle instance that is used as the proxy for the
 one that will eventually be allocated in the coroutine
@@ -429,11 +430,12 @@ find_promise_type (tree traits_class)
   return promise_type;
 }
 
+/* Perform initialization of the coroutine processor state, if not done
+   before.  */
+
 static bool
-coro_promise_type_found_p (tree fndecl, location_t loc)
+ensure_coro_initialized (location_t loc)
 {
-  gcc_assert (fndecl != NULL_TREE);
-
   if (!coro_initialized)
 {
   /* Trees we only need to create once.
@@ -466,6 +468,33 @@ coro_promise_type_found_p (tree fndecl, location_t loc)
 
   coro_initialized = true;
 }
+  return true;
+}
+
+/* Try to get the coroutine traits class.  */
+static tree
+coro_get_traits_class (tree fndecl, location_t loc)
+{
+  gcc_assert (fndecl != NULL_TREE);
+
+  if (!ensure_coro_initialized (loc))
+/* We can't continue.  */
+return error_mark_node;
+
+  coroutine_info *coro_info = get_or_insert_coroutine_info (fndecl);
+  auto& traits_type = 

Re: [PATCH] testsuite: Adjust fam-in-union-alone-in-struct-2.c to support BE [PR116148]

2024-07-31 Thread Sam James
"Kewen.Lin"  writes:

> Hi,
>
> As Andrew pointed out in PR116148, fam-in-union-alone-in-struct-2.c
> was designed for little-endian; the recent commit r15-2403 made it
> a run test, so it now executes on BE too, which exposed PR116148.
>
> This patch adjusts the expected data for the members of with_fam_2_v
> and with_fam_3_v to account for endianness, and also changes
> with_fam_3_v.b[1] from 0x5f6f7f7f to 0x5f6f7f8f to avoid two "7f"s.
>
> Tested on powerpc64-linux-gnu P8/P9 and powerpc64le-linux-gnu P9/P10.
>
> Is it ok for trunk?

I can't approve it but LGTM & thanks for doing it. Maybe give Qing at
least the rest of the day to comment given it's theirs.

>
> BR,
> Kewen

thanks,
sam




[RFC/RFA] [PATCH v2 09/12] Add symbolic execution support.

2024-07-31 Thread Mariam Arutunian
Adds the ability to execute code at the bit level,
   assigning symbolic values to variables that have no initial value.
   Only CRC-specific operations are supported.

   Example:

   uint8_t crc;
   uint8_t pol = 1;
   crc = crc ^ pol;

   During symbolic execution, crc's value will be:
   crc(7), crc(6), ... crc(1), crc(0) ^ 1
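
   The example can be mimicked with a very small model.  The sketch below
   is an illustration only: sym_bit, sym_byte and the helper functions are
   hypothetical and far simpler than the value_bit class hierarchy added by
   this patch.  Bits are numbered from 0 (least significant) to 7.

```c
#include <stdint.h>
#include <stdio.h>

enum { WIDTH = 8 };

/* One bit of a symbolic value: either a known constant, or unknown input
   bit number VAR, optionally XORed with 1.  */
struct sym_bit
{
  int var;	/* Index of the unknown input bit, or -1 for a constant.  */
  int flip;	/* The constant's value if var < 0, else the XOR term.  */
};

struct sym_byte
{
  struct sym_bit bits[WIDTH];	/* bits[0] is the least significant bit.  */
};

/* A byte with no initial value: bit i is the symbol i.  */
static struct sym_byte
sym_unknown (void)
{
  struct sym_byte v;

  for (int i = 0; i < WIDTH; i++)
    {
      v.bits[i].var = i;
      v.bits[i].flip = 0;
    }
  return v;
}

/* XOR a symbolic byte with the constant C, bit by bit.  */
static struct sym_byte
sym_xor_const (struct sym_byte v, uint8_t c)
{
  for (int i = 0; i < WIDTH; i++)
    v.bits[i].flip ^= (c >> i) & 1;
  return v;
}

/* Write V to OUT (capacity SIZE), most significant bit first.  */
static void
sym_format (const struct sym_byte *v, const char *name, char *out, size_t size)
{
  size_t used = 0;

  if (size > 0)
    out[0] = '\0';
  for (int i = WIDTH - 1; i >= 0 && used < size; i--)
    {
      const struct sym_bit *b = &v->bits[i];
      int n;

      if (b->var < 0)
	n = snprintf (out + used, size - used, "%d", b->flip);
      else if (b->flip)
	n = snprintf (out + used, size - used, "%s(%d) ^ 1", name, b->var);
      else
	n = snprintf (out + used, size - used, "%s(%d)", name, b->var);
      if (n < 0)
	break;
      used += (size_t) n;

      if (i > 0 && used < size)
	{
	  n = snprintf (out + used, size - used, ", ");
	  if (n < 0)
	    break;
	  used += (size_t) n;
	}
    }
}
```

   Calling sym_xor_const (sym_unknown (), 1) leaves bits 7..1 as plain
   symbols and turns bit 0 into "crc(0) ^ 1" when formatted with the name
   "crc".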

   Author: Matevos Mehrabyan 

 gcc/

   * Makefile.in (OBJS): Add sym-exec/expression.o,
   sym-exec/state.o, sym-exec/condition.o.
   * configure (sym-exec): New subdir.

 gcc/sym-exec/

   * condition.cc: New file.
   * condition.h: New file.
   * expression-is-a-helper.h: New file.
   * expression.cc: New file.
   * expression.h: New file.
   * state.cc: New file.
   * state.h: New file.

   Signed-off-by: Mariam Arutunian 
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 0238201981d..1d10120baf3 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1722,6 +1722,9 @@ OBJS = \
 	tree-logical-location.o \
 	tree-loop-distribution.o \
 	gimple-crc-optimization.o \
+	sym-exec/expression.o \
+	sym-exec/state.o \
+	sym-exec/condition.o \
 	tree-nested.o \
 	tree-nrv.o \
 	tree-object-size.o \
diff --git a/gcc/configure b/gcc/configure
index 1335db2d4d2..68e905fb48e 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -36203,7 +36203,7 @@ $as_echo "$as_me: executing $ac_file commands" >&6;}
 "depdir":C) $SHELL $ac_aux_dir/mkinstalldirs $DEPDIR ;;
 "gccdepdir":C)
   ${CONFIG_SHELL-/bin/sh} $ac_aux_dir/mkinstalldirs build/$DEPDIR
-  for lang in $subdirs c-family common analyzer text-art rtl-ssa
+  for lang in $subdirs c-family common analyzer text-art rtl-ssa sym-exec
   do
   ${CONFIG_SHELL-/bin/sh} $ac_aux_dir/mkinstalldirs $lang/$DEPDIR
   done ;;
diff --git a/gcc/sym-exec/condition.cc b/gcc/sym-exec/condition.cc
new file mode 100644
index 000..5b558d1e315
--- /dev/null
+++ b/gcc/sym-exec/condition.cc
@@ -0,0 +1,53 @@
+#include "condition.h"
+
+bit_condition::bit_condition (value_bit *left, value_bit *right, tree_code code)
+{
+  this->m_left = left;
+  this->m_right = right;
+  this->m_code = code;
+  m_type = BIT_CONDITION;
+}
+
+
+bit_condition::bit_condition (const bit_condition &expr)
+{
+  bit_expression::copy (&expr);
+  m_code = expr.get_code ();
+}
+
+
+tree_code
+bit_condition::get_code () const
+{
+  return m_code;
+}
+
+
+value_bit *
+bit_condition::copy () const
+{
+  return new bit_condition (*this);
+}
+
+
+void
+bit_condition::print_expr_sign ()
+{
+  switch (m_code)
+{
+  case GT_EXPR:
+	fprintf (dump_file, " > ");
+	break;
+  case LT_EXPR:
+	fprintf (dump_file, " < ");
+	break;
+  case EQ_EXPR:
+	fprintf (dump_file, " == ");
+	break;
+  case NE_EXPR:
+	fprintf (dump_file, " != ");
+	break;
+  default:
+	fprintf (dump_file, " ? ");
+}
+}
\ No newline at end of file
diff --git a/gcc/sym-exec/condition.h b/gcc/sym-exec/condition.h
new file mode 100644
index 000..1882c6cfa91
--- /dev/null
+++ b/gcc/sym-exec/condition.h
@@ -0,0 +1,26 @@
+#ifndef SYM_EXEC_CONDITION_H
+#define SYM_EXEC_CONDITION_H
+
+#include "expression.h"
+
+enum condition_status {
+  CS_NO_COND,
+  CS_TRUE,
+  CS_FALSE,
+  CS_SYM
+};
+
+
+class bit_condition : public bit_expression {
+ private:
+  tree_code m_code;
+  void print_expr_sign ();
+
+ public:
+  bit_condition (value_bit *left, value_bit *right, tree_code type);
+  bit_condition (const bit_condition &expr);
+  tree_code get_code () const;
+  value_bit *copy () const;
+};
+
+#endif /* SYM_EXEC_CONDITION_H.  */
\ No newline at end of file
diff --git a/gcc/sym-exec/expression-is-a-helper.h b/gcc/sym-exec/expression-is-a-helper.h
new file mode 100644
index 000..9931254c36e
--- /dev/null
+++ b/gcc/sym-exec/expression-is-a-helper.h
@@ -0,0 +1,204 @@
+#ifndef SYM_EXEC_EXPRESSION_IS_A_HELPER_H
+#define SYM_EXEC_EXPRESSION_IS_A_HELPER_H
+
+#include "condition.h"
+
+/* Defining test functions for value conversion via dyn_cast.  */
+
+template<>
+template<>
+inline bool
+is_a_helper::test (value_bit *ptr)
+{
+  return ptr->get_type () == value_type::SYMBOLIC_BIT;
+}
+
+
+template<>
+template<>
+inline bool
+is_a_helper::test (value_bit *ptr)
+{
+  return ptr->get_type () == value_type::BIT;
+}
+
+
+template<>
+template<>
+inline bool
+is_a_helper::test (value_bit *ptr)
+{
+  value_type type = ptr->get_type ();
+  return type == value_type::BIT_AND_EXPRESSION
+	 || type == value_type::BIT_OR_EXPRESSION
+	 || type == value_type::BIT_XOR_EXPRESSION
+	 || type == value_type::BIT_COMPLEMENT_EXPRESSION
+	 || type == value_type::SHIFT_RIGHT_EXPRESSION
+	 || type == value_type::SHIFT_LEFT_EXPRESSION
+	 || type == value_type::ADD_EXPRESSION
+	 || type == value_type::SUB_EXPRESSION
+	 || type == value_type::BIT_CONDITION;
+}
+
+
+template<>
+template<>
+inline bool
+is_a_helper::test (value_bit *ptr)
+{
+  return ptr->get_type () == value_type::BIT_AND_EXPRESSION;
+}
+
+
+template<>
+template<>
+inline bool
+is_a_helper::test (value_bit *ptr)
+{
+  

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Uros Bizjak
On Wed, Jul 31, 2024 at 11:33 AM Richard Biener  wrote:
>
> On Wed, 31 Jul 2024, Uros Bizjak wrote:
>
> > On Wed, Jul 31, 2024 at 10:48 AM Richard Biener  wrote:
> > >
> > > On Wed, 31 Jul 2024, Uros Bizjak wrote:
> > >
> > > > On Wed, Jul 31, 2024 at 10:24 AM Jakub Jelinek  wrote:
> > > > >
> > > > > On Wed, Jul 31, 2024 at 10:11:44AM +0200, Uros Bizjak wrote:
> > > > > > OK. Richard, can you please mention the above in the comment why
> > > > > > XFmode is rejected in the hook?
> > > > > >
> > > > > > Later, we can perhaps benchmark XFmode move vs. generic memory
> > > > > > copy to get some hard data.
> > > > >
> > > > > > My (limited) understanding was that the hook would be used only
> > > > > > for cases where we'd like to e.g. value number some SF/DF/XF etc.
> > > > > > mode loads and some subsequent loads from the same address with
> > > > > > different mode but same size the same and replace say int or long
> > > > > > long later load with VIEW_CONVERT_EXPR of the result of the SF/SF
> > > > > > mode load.  That is what was incorrect, because the load didn't
> > > > > > preserve all the bits.  The patch would still keep doing normal
> > > > > > SF/DF/XF etc. mode copies if that is all that happens in the
> > > > > > program, load some floating point value and store it elsewhere or
> > > > > > as part of larger aggregate copy.
> > > >
> > > > So, the hook should allow everything besides SF/DFmode, simply:
> > > >
> > > >
> > > > switch (GET_MODE_INNER (mode))
> > > >   {
> > > >   case SFmode:
> > > >   case DFmode:
> > > >     /* These suffer from normalization upon load when not using SSE.  */
> > > >     return !(ix86_fpmath & FPMATH_387);
> > > >   default:
> > > >     return true;
> > > >   }
> > >
> > > OK, I think I'll go with this then.  I'm now unsure whether the
> > > wrapper around the hook should reject modes with padding or if
> > > the supposed users (value-numbering and SRA) should deal with that
> > > issue separately.  I do wonder whether
> > >
> > > ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_LONG_DOUBLE
> > >                           ? &ieee_extended_intel_128_format
> > >                           : TARGET_96_ROUND_53_LONG_DOUBLE
> > >                           ? &ieee_extended_intel_96_round_53_format
> > >                           : &ieee_extended_intel_96_format));
> > > ADJUST_BYTESIZE  (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
> > > ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);
> > >
> > > unambiguously specifies where the padding is - m68k has
> > >
> > > FRACTIONAL_FLOAT_MODE (XF, 80, 12, ieee_extended_motorola_format);
> > >
> > > It's also not clear we can model a x87 10 byte memory copy in RTL since
> > > a mem:XF still touches 12 or 16 bytes - IIRC a store leaves
> > > possible padding as unspecified and not "masked out" even if
> > > the actual fstp will only store 10 bytes.
> >
> > The hardware will never touch bytes outside 10 bytes range, the
> > padding is some artificial compiler thingy, so IMO it should be
> > handled before the hook is called. Please find attached the source I
> > have used to confirm that a) the copied bits will never be mangled and
> > b) there is no access outside the 10 bytes range. (BTW: these
> > particular values are to test the effect of leading bit 63, the
> > non-hidden normalized bit).
>
> Thanks - I do wonder why GET_MODE_SIZE (XFmode) is not 10 then,
> mode_base_align[XFmode] seems to be correctly set to ensure
> 12 bytes / 16 bytes "effective" size.

Uh, this decision predates my involvement in GCC development by a long shot ;)

Uros.
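
For readers following the thread, the rule proposed above can be modelled outside of GCC. The following is only a sketch under stated assumptions: the enum `model_mode`, the `MODEL_FPMATH_*` flags and the function `model_mode_can_transfer_bits` are hypothetical stand-ins invented for illustration, not GCC identifiers, and the model ignores vector modes (the quoted snippet applies GET_MODE_INNER first).

```c
#include <assert.h>   /* Only needed for the sanity checks described below.  */
#include <stdbool.h>

/* Hypothetical stand-ins for machine modes; not GCC's real enum.  */
enum model_mode { MODEL_SF, MODEL_DF, MODEL_XF, MODEL_SI, MODEL_DI };

/* Hypothetical stand-ins for the fpmath selection bits.  */
enum { MODEL_FPMATH_387 = 1, MODEL_FPMATH_SSE = 2 };

/* Return true if a load/store in MODE is assumed to preserve every bit.
   Mirrors the quoted switch: only SF and DF are rejected, and only when
   x87 math is enabled, because those loads may be normalized.  */
static bool
model_mode_can_transfer_bits (enum model_mode mode, int fpmath)
{
  switch (mode)
    {
    case MODEL_SF:
    case MODEL_DF:
      return !(fpmath & MODEL_FPMATH_387);
    default:
      /* Everything else, including XF, is accepted in this model.  */
      return true;
    }
}
```

In this model, `model_mode_can_transfer_bits (MODEL_SF, MODEL_FPMATH_387)` is false while the same query for `MODEL_XF` is true, which matches the outcome discussed in the thread; how padding bytes of XFmode are handled is left out, as that question is still open above.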


Re: [PATCH] LoongArch: Rework bswap{hi,si,di}2 definition

2024-07-31 Thread Xi Ruoyao
On Wed, 2024-07-31 at 16:57 +0800, Lulu Cheng wrote:
> 
> On 2024/7/29 at 3:58 PM, Xi Ruoyao wrote:
> > Per a gcc-help thread we are generating sub-optimal code for
> > __builtin_bswap{32,64}.  To fix it:
> > 
> > - Use a single revb.d instruction for bswapdi2.
> > - Use a single revb.2w instruction for bswapsi2 for TARGET_64BIT,
> >     revb.2h + rotri.w for !TARGET_64BIT.
> > - Use a single revb.2h instruction for bswapsi2 (x) r>> 16, and a single
> >     revb.2w instruction for bswapdi2 (x) r>> 32.
> > 
> > Unfortunately I cannot figure out a way to make the compiler generate
> > revb.4h or revh.{2w,d} instructions.
> 
> This optimization is really ingenious and I have no problem.
> 
> I also haven't figured out how to generate revb.4h or revh.{2w,d}.
> I think we can merge this patch first.

Pushed r15-2433.

FWIW I tried a naive pattern for revh.2w:

(set (match_operand:DI 0 "register_operand" "=r")
     (ior:DI
       (and:DI
         (ashift:DI (match_operand:DI 1 "register_operand" "r")
                    (const_int 16))
         (const_int 18446462603027742720))
       (and:DI
         (lshiftrt:DI (match_dup 1)
                      (const_int 16))
         (const_int 281470681808895))))

But it seems too complex to be recognized.
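
For reference, the operation that the pattern above describes can be written in plain C. This is a minimal sketch, assuming revh.2w swaps the two 16-bit halfwords inside each 32-bit word of a 64-bit register; the function name `revh_2w_model` is made up for illustration and is not part of the patch. The two masks are the hexadecimal forms of the decimal constants in the pattern.

```c
#include <assert.h>   /* Only needed for the sanity checks described below.  */
#include <stdint.h>

/* 18446462603027742720 in the RTL above.  */
#define HI_HALVES UINT64_C (0xffff0000ffff0000)
/* 281470681808895 in the RTL above.  */
#define LO_HALVES UINT64_C (0x0000ffff0000ffff)

/* Swap the two 16-bit halfwords inside each 32-bit word of X.  */
static uint64_t
revh_2w_model (uint64_t x)
{
  return ((x << 16) & HI_HALVES) | ((x >> 16) & LO_HALVES);
}
```

With the input 0x0123456789abcdef this gives 0x45670123cdef89ab, and applying the function twice returns the original value. Whether the compiler can match this shape against a machine pattern is a separate matter, as noted above.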

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

