[PATCH] speed up end_fde_sort using radix sort

2022-11-22 Thread Thomas Neumann via Gcc-patches

When registering a dynamic unwinding frame the fde list is sorted.
Previously, we split the list into a sorted and an unsorted part,
sorted the latter using heap sort, and merged both. That can be
quite slow due to the large number of (expensive) comparisons.

This patch replaces that logic with a radix sort instead. The
radix sort uses the same amount of memory as the old logic,
using the second list as auxiliary space, and it includes two
techniques to speed up sorting: First, it computes the pointer
addresses for blocks of values, reducing the decoding overhead.
Second, it recognizes when the data has reached a sorted state,
allowing for early termination. When running out of memory
we fall back to pure heap sort, as before.

For this test program

#include 
int main(int argc, char** argv) {
 return 0;
}

compiled with g++ -O -o hello -static hello.c we get with
perf stat -r 200 on a 5950X the following performance numbers:

old logic:

  0,20 msec task-clock
   930.834  cycles
 3.079.765  instructions
0,00030478 +- 0,0237 seconds time elapsed

new logic:

  0,10 msec task-clock
   473.269  cycles
 1.239.077  instructions
0,00021119 +- 0,0168 seconds time elapsed

libgcc/ChangeLog:
* unwind-dw2-fde.c: Use radix sort instead of split+sort+merge.
---
 libgcc/unwind-dw2-fde.c | 234 +++-
 1 file changed, 134 insertions(+), 100 deletions(-)

diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
index 3c0cc654ec0..b81540c41a4 100644
--- a/libgcc/unwind-dw2-fde.c
+++ b/libgcc/unwind-dw2-fde.c
@@ -456,22 +456,52 @@ fde_mixed_encoding_compare (struct object *ob, const fde *x, const fde *y)
 
 typedef int (*fde_compare_t) (struct object *, const fde *, const fde *);
 
+// The extractor functions compute the pointer values for a block of
+// fdes. The block processing hides the call overhead.
 
-/* This is a special mix of insertion sort and heap sort, optimized for
-   the data sets that actually occur. They look like
-   101 102 103 127 128 105 108 110 190 111 115 119 125 160 126 129 130.
-   I.e. a linearly increasing sequence (coming from functions in the text
-   section), with additionally a few unordered elements (coming from functions
-   in gnu_linkonce sections) whose values are higher than the values in the
-   surrounding linear sequence (but not necessarily higher than the values
-   at the end of the linear sequence!).
-   The worst-case total run time is O(N) + O(n log (n)), where N is the
-   total number of FDEs and n is the number of erratic ones.  */
+static void
+fde_unencoded_extract (struct object *ob __attribute__ ((unused)),
+                       _Unwind_Ptr *target, const fde **x, int count)
+{
+  for (int index = 0; index < count; ++index)
+    memcpy (target + index, x[index]->pc_begin, sizeof (_Unwind_Ptr));
+}
+
+static void
+fde_single_encoding_extract (struct object *ob, _Unwind_Ptr *target,
+                             const fde **x, int count)
+{
+  _Unwind_Ptr base;
+
+  base = base_from_object (ob->s.b.encoding, ob);
+  for (int index = 0; index < count; ++index)
+    read_encoded_value_with_base (ob->s.b.encoding, base, x[index]->pc_begin,
+                                  target + index);
+}
+
+static void
+fde_mixed_encoding_extract (struct object *ob, _Unwind_Ptr *target,
+                            const fde **x, int count)
+{
+  for (int index = 0; index < count; ++index)
+    {
+      int encoding = get_fde_encoding (x[index]);
+      read_encoded_value_with_base (encoding, base_from_object (encoding, ob),
+                                    x[index]->pc_begin, target + index);
+    }
+}
+
+typedef void (*fde_extractor_t) (struct object *, _Unwind_Ptr *, const fde **,
+                                 int);
+
+// Data is sorted using radix sort if possible, using a temporary
+// auxiliary data structure of the same size as the input. When running
+// out of memory we fall back to in-place heap sort.
 
 struct fde_accumulator
 {
   struct fde_vector *linear;
-  struct fde_vector *erratic;
+  struct fde_vector *aux;
 };
 
 static inline int
@@ -485,8 +515,8 @@ start_fde_sort (struct fde_accumulator *accu, size_t count)
   if ((accu->linear = malloc (size)))
 {
   accu->linear->count = 0;
-  if ((accu->erratic = malloc (size)))
-   accu->erratic->count = 0;
+  if ((accu->aux = malloc (size)))
+   accu->aux->count = 0;
   return 1;
 }
   else
@@ -500,59 +530,6 @@ fde_insert (struct fde_accumulator *accu, const fde *this_fde)
 accu->linear->array[accu->linear->count++] = this_fde;
 }
 
-/* Split LINEAR into a linear sequence with low values and an erratic
-   sequence with high values, put the linear one (of longest possible
-   length) into LINEAR and the erratic one into ERRATIC. This is O(N).
-
-   Because the longest linear sequence we are trying to locate within the
-   incoming LINEAR array c

Re: [PATCH 2/2] Add a new warning option -Wstrict-flex-arrays.

2022-11-22 Thread Richard Biener via Gcc-patches
On Mon, 21 Nov 2022, Qing Zhao wrote:

> 
> 
> > On Nov 18, 2022, at 11:31 AM, Kees Cook  wrote:
> > 
> > On Fri, Nov 18, 2022 at 03:19:07PM +, Qing Zhao wrote:
> >> Hi, Richard,
> >> 
> >> Honestly, it's very hard for me to decide what's the best way to handle
> >> the interaction
> >> between -fstrict-flex-array=M and -Warray-bounds=N.
> >> 
> >> Ideally,  -fstrict-flex-array=M should completely control the behavior of 
> >> -Warray-bounds.
> >> If possible, I prefer this solution.
> >> 
> >> However, -Warray-bounds is included in -Wall, and has been used 
> >> extensively for a long time.
> >> It's not safe to change its default behavior.
> > 
> > I prefer that -fstrict-flex-arrays controls -Warray-bounds. That
> > it is in -Wall is _good_ for this reason. :) No one is going to add
> > -fstrict-flex-arrays (at any level) without understanding what it does
> > and wanting those effects on -Warray-bounds.
> 
> 
> The major difficulties in letting -fstrict-flex-arrays control -Warray-bounds
> were discussed in the following threads:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604133.html
> 
> Please take a look at the discussion and let me know your opinion.

My opinion is now, after re-considering and with seeing your new 
patch, that -Warray-bounds=2 should be changed to only add
"the intermediate results of pointer arithmetic that may yield out of 
bounds values" and that what it considers a flex array should now
be controlled by -fstrict-flex-arrays only.

That is, I think, the only thing that's not confusing to users even
if that implies a change from previous behavior that we should
document by clarifying the -Warray-bounds documentation as well as
by adding an entry to the Caveats section of gcc-13/changes.html

That also means that =2 will get _fewer_ warnings with GCC 13 when
the user doesn't use -fstrict-flex-arrays as well.

Richard.

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-11-22 Thread Florian Weimer via Gcc-patches
* Thomas Neumann:

> Hi,
>
> When dynamically linking, a fast enough machine hides the latency, but when
> statically linking or on slower devices this change caused a 5x increase in
> instruction count and 2x increase in cycle count before getting to main.
>
> I have looked at ways to fix that. The problem is that with static
> linking unwinding tables are registered dynamically, and with my patch
> that registration triggers an eager sort of fde lists, while
> previously the lists were sorted when the first exception was
> thrown. If an application throws at least one exception there is no
> downside in eager sorting, but if the application never throws there
> is overhead.

Would it be possible to trigger lazy registration if the version is read
as a zero?  This would not introduce any additional atomic instructions
on the fast path.

Thanks,
Florian



Re: [PATCH] Remove legacy VRP (maybe?)

2022-11-22 Thread Richard Biener via Gcc-patches
On Mon, Nov 21, 2022 at 5:49 PM Jeff Law  wrote:
>
>
> On 11/21/22 09:35, Aldy Hernandez via Gcc-patches wrote:
> > I've been playing around with removing the legacy VRP code for the
> > next release.  It's a layered onion to get this right, but the first
> > bit is pretty straightforward and may be useful for this release.
> > Basically, it entails removing the old VRP pass itself, along with
> > value_range_equiv which have no producers left.  The current users of
> > value_range_equiv don't put anything in the equivalence bitmaps, so
> > they're basically behaving like plain value_range.
> >
> > I removed as much as possible without having to change any behavior,
> > and this is what I came up with.  Is this something that would be
> > useful for this release?  Would it help release managers have less
> > unused cruft in the tree?
> >
> > Neither Andrew nor I have any strong feelings here.  We don't foresee
> > the legacy code changing at all in the offseason, so we can just
> > accumulate these patches in local trees.
>
> I'd lean towards removal after gcc-13 releases.

I think removing the ability to switch to the old implementation eases
maintenance so I'd prefer to have this before the gcc-13 release.

So please go ahead.

Thanks,
Richard.

>
> jeff
>


Re: [PATCH] Remove legacy VRP (maybe?)

2022-11-22 Thread Richard Biener via Gcc-patches
On Tue, Nov 22, 2022 at 9:24 AM Richard Biener
 wrote:
>
> On Mon, Nov 21, 2022 at 5:49 PM Jeff Law  wrote:
> >
> >
> > On 11/21/22 09:35, Aldy Hernandez via Gcc-patches wrote:
> > > I've been playing around with removing the legacy VRP code for the
> > > next release.  It's a layered onion to get this right, but the first
> > > bit is pretty straightforward and may be useful for this release.
> > > Basically, it entails removing the old VRP pass itself, along with
> > > value_range_equiv which have no producers left.  The current users of
> > > value_range_equiv don't put anything in the equivalence bitmaps, so
> > > they're basically behaving like plain value_range.
> > >
> > > I removed as much as possible without having to change any behavior,
> > > and this is what I came up with.  Is this something that would be
> > > useful for this release?  Would it help release managers have less
> > > unused cruft in the tree?
> > >
> > > Neither Andrew nor I have any strong feelings here.  We don't foresee
> > > the legacy code changing at all in the offseason, so we can just
> > > accumulate these patches in local trees.
> >
> > I'd lean towards removal after gcc-13 releases.
>
> I think removing the ability to switch to the old implementation eases
> maintenance so I'd prefer to have this before the gcc-13 release.
>
> So please go ahead.

Btw, ASSERT_EXPR should also go away with this, no?

Richard.

> Thanks,
> Richard.
>
> >
> > jeff
> >


Re: [PATCH] AArch64: Add fma_reassoc_width [PR107413]

2022-11-22 Thread Richard Biener via Gcc-patches
On Tue, Nov 22, 2022 at 8:59 AM Richard Sandiford via Gcc-patches
 wrote:
>
> Wilco Dijkstra  writes:
> > Add a reassociation width for FMAs in per-CPU tuning structures. Keep the
> > existing setting for cores with 2 FMA pipes, and use 4 for cores with 4
> > FMA pipes.  This improves SPECFP2017 on Neoverse V1 by ~1.5%.
> >
> > Passes regress/bootstrap, OK for commit?
> >
> > gcc/
> > PR 107413
> > * config/aarch64/aarch64.cc (struct tune_params): Add
> > fma_reassoc_width to all CPU tuning structures.
> > * config/aarch64/aarch64-protos.h (struct tune_params): Add
> > fma_reassoc_width.
> >
> > ---
> >
> > diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> > index a73bfa20acb9b92ae0475794c3f11c67d22feb97..71365a446007c26b906b61ca8b2a68ee06c83037 100644
> > --- a/gcc/config/aarch64/aarch64-protos.h
> > +++ b/gcc/config/aarch64/aarch64-protos.h
> > @@ -540,6 +540,7 @@ struct tune_params
> >const char *loop_align;
> >int int_reassoc_width;
> >int fp_reassoc_width;
> > +  int fma_reassoc_width;
> >int vec_reassoc_width;
> >int min_div_recip_mul_sf;
> >int min_div_recip_mul_df;
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 798363bcc449c414de5bbb4f26b8e1c64a0cf71a..643162cdecd6a8fe5587164cb2d0d62b709a491d 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -1346,6 +1346,7 @@ static const struct tune_params generic_tunings =
> >"8", /* loop_align.  */
> >2,   /* int_reassoc_width.  */
> >4,   /* fp_reassoc_width.  */
> > +  1,   /* fma_reassoc_width.  */
> >1,   /* vec_reassoc_width.  */
> >2,   /* min_div_recip_mul_sf.  */
> >2,   /* min_div_recip_mul_df.  */
> > @@ -1382,6 +1383,7 @@ static const struct tune_params cortexa35_tunings =
> >"8", /* loop_align.  */
> >2,   /* int_reassoc_width.  */
> >4,   /* fp_reassoc_width.  */
> > +  1,   /* fma_reassoc_width.  */
> >1,   /* vec_reassoc_width.  */
> >2,   /* min_div_recip_mul_sf.  */
> >2,   /* min_div_recip_mul_df.  */
> > @@ -1415,6 +1417,7 @@ static const struct tune_params cortexa53_tunings =
> >"8", /* loop_align.  */
> >2,   /* int_reassoc_width.  */
> >4,   /* fp_reassoc_width.  */
> > +  1,   /* fma_reassoc_width.  */
> >1,   /* vec_reassoc_width.  */
> >2,   /* min_div_recip_mul_sf.  */
> >2,   /* min_div_recip_mul_df.  */
> > @@ -1448,6 +1451,7 @@ static const struct tune_params cortexa57_tunings =
> >"8", /* loop_align.  */
> >2,   /* int_reassoc_width.  */
> >4,   /* fp_reassoc_width.  */
> > +  1,   /* fma_reassoc_width.  */
> >1,   /* vec_reassoc_width.  */
> >2,   /* min_div_recip_mul_sf.  */
> >2,   /* min_div_recip_mul_df.  */
> > @@ -1481,6 +1485,7 @@ static const struct tune_params cortexa72_tunings =
> >"8", /* loop_align.  */
> >2,   /* int_reassoc_width.  */
> >4,   /* fp_reassoc_width.  */
> > +  1,   /* fma_reassoc_width.  */
> >1,   /* vec_reassoc_width.  */
> >2,   /* min_div_recip_mul_sf.  */
> >2,   /* min_div_recip_mul_df.  */
> > @@ -1514,6 +1519,7 @@ static const struct tune_params cortexa73_tunings =
> >"8", /* loop_align.  */
> >2,   /* int_reassoc_width.  */
> >4,   /* fp_reassoc_width.  */
> > +  1,   /* fma_reassoc_width.  */
> >1,   /* vec_reassoc_width.  */
> >2,   /* min_div_recip_mul_sf.  */
> >2,   /* min_div_recip_mul_df.  */
> > @@ -1548,6 +1554,7 @@ static const struct tune_params exynosm1_tunings =
> >"4", /* loop_align.  */
> >2,   /* int_reassoc_width.  */
> >4,   /* fp_reassoc_width.  */
> > +  1,   /* fma_reassoc_width.  */
> >1,   /* vec_reassoc_width.  */
> >2,   /* min_div_recip_mul_sf.  */
> >2,   /* min_div_recip_mul_df.  */
> > @@ -1580,6 +1587,7 @@ static const struct tune_params thunderxt88_tunings =
> >"8", /* loop_align.  */
> >2,   /* int_reassoc_width.  */
> >4,   /* fp_reassoc_width.  */
> > +  1,   /* fma_reassoc_width.  */
> >1,   /* vec_reassoc_width.  */
> >2,   /* min_div_recip_mul_sf.  */
> >2,   /* min_div_recip_mul_df.  */
> > @@ -1612,6 +1620,7 @@ static const struct tune_params thunderx_tunings =
> >"8", /* loop_align.  */
> >2,   /* int_reassoc_width.  */
> >4,   /* fp_reassoc_width.  */
> > +  1,   /* fma_reassoc_width.  */
> >1,   /* vec_reassoc_width.  */
> >2,   /* min_div_recip_mul_sf.  */
> >2,   /* min_div_recip_mul_df.  */
> > @@ -1646,6 +1655,7 @@ static const struct tune_params tsv110_tunings =
> >"8",  /* loop_align.  */
> >2,/* int_reassoc_width.  */
> >4,/* fp_reassoc_width.  */
> > +  1,   /* fma_reassoc_width.  */
> >1,/* vec_reassoc_width.  */
> >2,/* min_div_recip_mul_sf.  */
> >2,/* min_div_recip_mul_df.  */
> > @@ -1678,6 +1688,7 @@ static const struct tune_params xgene1_tunings =
> >"16",/* loop

Re: [PATCH RFA(configure)] c++: provide strchrnul on targets without it [PR107781]

2022-11-22 Thread Jakub Jelinek via Gcc-patches
On Mon, Nov 21, 2022 at 06:31:47PM -0500, Jason Merrill via Gcc-patches wrote:
> Tested x86_64-pc-linux-gnu, and also manually changing the HAVE_DECL_STRCHRNUL
> flag.  OK for trunk?
> 
> -- 8< --
> 
> The Contracts implementation uses strchrnul, which is a glibc extension, so
> bootstrap broke on non-glibc targets.  I considered unconditionally using a
> local definition, but I guess we might as well use the libc version if it
> exists.
> 
>   PR c++/107781
> 
> gcc/cp/ChangeLog:
> 
>   * contracts.cc (strchrnul): Define if needed.
> 
> gcc/ChangeLog:
> 
>   * configure.ac: Check for strchrnul.
>   * config.in, configure: Regenerate.

Normally we'd add such a local definition to libiberty, shouldn't we do it
in this case too?

Jakub



[PATCH] RISC-V doesn't support "-fprefetch-loop-arrays"; skip.

2022-11-22 Thread Yixuan Chen
gcc/testsuite/ChangeLog:

RISC-V doesn't support the "-fprefetch-loop-arrays" option; skip.

2022-11-22  Yixuan Chen  

* gcc.dg/pr106397.c: RISC-V doesn't support the "-fprefetch-loop-arrays"
option; skip.
---
 gcc/testsuite/gcc.dg/pr106397.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr106397.c b/gcc/testsuite/gcc.dg/pr106397.c
index 2bc17f8cf80..7b507125575 100644
--- a/gcc/testsuite/gcc.dg/pr106397.c
+++ b/gcc/testsuite/gcc.dg/pr106397.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -fprefetch-loop-arrays --param l2-cache-size=0 --param prefetch-latency=3 -fprefetch-loop-arrays" } */
 /* { dg-additional-options "-march=i686 -msse" { target { { i?86-*-* x86_64-*-* } && ia32 } } } */
-
+/* { dg-skip-if "" { riscv*-*-* } } */
 int
 bar (void)
 {
-- 
2.37.2



[PATCH] tree-optimization/107672 - avoid vector mode type_for_mode call

2022-11-22 Thread Richard Biener via Gcc-patches
The following avoids using type_for_mode on vector modes which might
not work for all frontends.  Instead we look for the inner mode
type and use build_vector_type_for_mode instead.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/107672
* tree-vect-stmts.cc (supportable_widening_operation): Avoid
type_for_mode on vector modes.
---
 gcc/tree-vect-stmts.cc | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index bc0ef136f19..b35b986889d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12195,9 +12195,15 @@ supportable_widening_operation (vec_info *vinfo,
 	intermediate_type
 	  = vect_halve_mask_nunits (prev_type, intermediate_mode);
       else
-	intermediate_type
-	  = lang_hooks.types.type_for_mode (intermediate_mode,
-					    TYPE_UNSIGNED (prev_type));
+	{
+	  gcc_assert (VECTOR_MODE_P (intermediate_mode));
+	  tree intermediate_element_type
+	    = lang_hooks.types.type_for_mode (GET_MODE_INNER (intermediate_mode),
+					      TYPE_UNSIGNED (prev_type));
+	  intermediate_type
+	    = build_vector_type_for_mode (intermediate_element_type,
+					  intermediate_mode);
+	}
 
   if (VECTOR_BOOLEAN_TYPE_P (intermediate_type)
  && VECTOR_BOOLEAN_TYPE_P (prev_type)
-- 
2.35.3


Re: [PATCH] RISC-V doesn't support "-fprefetch-loop-arrays"; skip.

2022-11-22 Thread Richard Biener via Gcc-patches
On Tue, Nov 22, 2022 at 9:43 AM Yixuan Chen  wrote:
>
> gcc/testsuite/ChangeLog:
>
> RISC-V doesn't support the "-fprefetch-loop-arrays" option; skip.

Looking around other testcases simply add -w to the set of command-line options,
can you do that instead?

OK with that change,
Richard.

> 2022-11-22  Yixuan Chen  
>
> * gcc.dg/pr106397.c: RISC-V doesn't support the "-fprefetch-loop-arrays"
> option; skip.
> ---
>  gcc/testsuite/gcc.dg/pr106397.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.dg/pr106397.c b/gcc/testsuite/gcc.dg/pr106397.c
> index 2bc17f8cf80..7b507125575 100644
> --- a/gcc/testsuite/gcc.dg/pr106397.c
> +++ b/gcc/testsuite/gcc.dg/pr106397.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O3 -fprefetch-loop-arrays --param l2-cache-size=0 --param prefetch-latency=3 -fprefetch-loop-arrays" } */
>  /* { dg-additional-options "-march=i686 -msse" { target { { i?86-*-* x86_64-*-* } && ia32 } } } */
> -
> +/* { dg-skip-if "" { riscv*-*-* } } */
>  int
>  bar (void)
>  {
> --
> 2.37.2
>


Re: [PATCH] rs6000: Adjust loop_unroll_adjust to match middle-end change [PR 107692]

2022-11-22 Thread Richard Biener via Gcc-patches
On Fri, Nov 18, 2022 at 3:47 PM Segher Boessenkool
 wrote:
>
> [ Please cc: me and Ke Wen on rs6000 patches ]
>
> On Thu, Nov 17, 2022 at 07:54:29AM +0800, Hongyu Wang wrote:
> > r13-3950-g071e428c24ee8c enables O2 small loop unrolling, but it breaks
> > -fno-unroll-loops for rs6000 with loop_unroll_adjust hook. Adjust the
> > option handling and target hook accordingly.
>
> NAK.
>
> This is wrong.  -munroll-only-small-loops does not enable loop
> unrolling; doing that with a machine flag is completely unmaintainable,
> also for people using different targets.

I suggested that because doing it fully in the backend by twiddling
-funroll-loops is unmaintainable as well.

> Something in your patch was wrong, please fix that (or revert the
> patch).  You should not have to touch config/rs6000/ at all.

Sure something is wrong, but I think there's the opportunity to
simplify rs6000/ and s390x/, the only other two implementors of
the hook used.

Richard.

>
> Segher


[PATCH] aarch64: Fix test_dfp_17.c for big-endian [PR 107604]

2022-11-22 Thread Christophe Lyon via Gcc-patches
gcc.target/aarch64/aapcs64/test_dfp_17.c has been failing on
big-endian, because the _Decimal32 on-stack argument is not padded in
the same direction depending on endianness.

This patch fixes the testcase so that it expects the argument in the
right stack location, similarly to what other tests do in the same
directory.

gcc/testsuite/ChangeLog:

PR target/107604
* gcc.target/aarch64/aapcs64/test_dfp_17.c: Fix for big-endian.
---
 gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
index 22dc462bf7c..3c45f715cf7 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
@@ -32,6 +32,10 @@ struct z b = { 9.0dd, 10.0dd, 11.0dd, 12.0dd };
   ANON(struct z, a, D1)
   ANON(struct z, b, STACK)
   ANON(int , 5, W0)
+#ifndef __AAPCS64_BIG_ENDIAN__
   ANON(_Decimal32, f1, STACK+32) /* Note: no promotion to _Decimal64.  */
+#else
+  ANON(_Decimal32, f1, STACK+36) /* Note: no promotion to _Decimal64.  */
+#endif
   LAST_ANON(_Decimal64, 0.5dd, STACK+40)
 #endif
-- 
2.34.1



Re: [PATCH] Remove legacy VRP (maybe?)

2022-11-22 Thread Aldy Hernandez via Gcc-patches




On 11/22/22 09:25, Richard Biener wrote:

On Tue, Nov 22, 2022 at 9:24 AM Richard Biener
 wrote:


On Mon, Nov 21, 2022 at 5:49 PM Jeff Law  wrote:



On 11/21/22 09:35, Aldy Hernandez via Gcc-patches wrote:

I've been playing around with removing the legacy VRP code for the
next release.  It's a layered onion to get this right, but the first
bit is pretty straightforward and may be useful for this release.
Basically, it entails removing the old VRP pass itself, along with
value_range_equiv which have no producers left.  The current users of
value_range_equiv don't put anything in the equivalence bitmaps, so
they're basically behaving like plain value_range.

I removed as much as possible without having to change any behavior,
and this is what I came up with.  Is this something that would be
useful for this release?  Would it help release managers have less
unused cruft in the tree?

Neither Andrew nor I have any strong feelings here.  We don't foresee
the legacy code changing at all in the offseason, so we can just
accumulate these patches in local trees.


I'd lean towards removal after gcc-13 releases.


I think removing the ability to switch to the old implementation eases
maintenance so I'd prefer to have this before the gcc-13 release.

So please go ahead.


Btw, ASSERT_EXPR should also go away with this, no?


Ah yes, for everything except ipa-*.* which uses it internally (and sets 
it in its internal structures):


   - ASSERT_EXPR means that only the value in operand is allowed to pass
     through (without any change), for all other values the result is
     unknown.

I can remove all other uses, including any externally visible documentation.

Thanks.
Aldy



Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-11-22 Thread Thomas Neumann via Gcc-patches

> Would it be possible to trigger lazy registration if the version is read
> as a zero?  This would not introduce any additional atomic instructions
> on the fast path.


yes, that is possible. The main problem is the transition from lazy to
non-lazy mode when the first exception is thrown. We must somehow stop
the world for that without introducing an additional mutex. But I have
thought about that some more, and that is possible too, by encoding a
magic value as version during the transition, which causes the other
threads to block. A bit ugly, but manageable. I will implement that in a
few days.


Independent of that I think we should improve the sort logic, as we
still have to sort, even in lazy mode, at latest when the first
exception is thrown. I have sent a patch that significantly improves
that step.


Best

Thomas



Re: [PATCH] Remove legacy VRP (maybe?)

2022-11-22 Thread Richard Biener via Gcc-patches
On Tue, Nov 22, 2022 at 10:04 AM Aldy Hernandez  wrote:
>
>
>
> On 11/22/22 09:25, Richard Biener wrote:
> > On Tue, Nov 22, 2022 at 9:24 AM Richard Biener
> >  wrote:
> >>
> >> On Mon, Nov 21, 2022 at 5:49 PM Jeff Law  wrote:
> >>>
> >>>
> >>> On 11/21/22 09:35, Aldy Hernandez via Gcc-patches wrote:
>  I've been playing around with removing the legacy VRP code for the
>  next release.  It's a layered onion to get this right, but the first
>  bit is pretty straightforward and may be useful for this release.
>  Basically, it entails removing the old VRP pass itself, along with
>  value_range_equiv which have no producers left.  The current users of
>  value_range_equiv don't put anything in the equivalence bitmaps, so
>  they're basically behaving like plain value_range.
> 
>  I removed as much as possible without having to change any behavior,
>  and this is what I came up with.  Is this something that would be
>  useful for this release?  Would it help release managers have less
>  unused cruft in the tree?
> 
>  Neither Andrew nor I have any strong feelings here.  We don't foresee
>  the legacy code changing at all in the offseason, so we can just
>  accumulate these patches in local trees.
> >>>
> >>> I'd lean towards removal after gcc-13 releases.
> >>
> >> I think removing the ability to switch to the old implementation eases
> >> maintenance so I'd prefer to have this before the gcc-13 release.
> >>
> >> So please go ahead.
> >
> > Btw, ASSERT_EXPR should also go away with this, no?
>
> Ah yes, for everything except ipa-*.* which uses it internally (and sets
> it in its internal structures):
>
> - ASSERT_EXPR means that only the value in operand is allowed to
> pass
>   through (without any change), for all other values the result is
>   unknown.

Ick.  But yeah, I can see how 'ASSERT_EXPR' looked nice to use here
(but it's only a distinct value, so TARGET_OPTION_NODE would have
worked here as well)

> I can remove all other uses, including any externally visible documentation.

Works for me.

Richard.

> Thanks.
> Aldy
>


[PATCH] c-family: Fix up -Wsign-compare BIT_NOT_EXPR handling [PR107465]

2022-11-22 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch fixes multiple bugs in warn_for_sign_compare related to
the BIT_NOT_EXPR related warnings.
My understanding is that what those 3 warnings are meant to warn (since 1995
apparently) is the case where we have BIT_NOT_EXPR of a zero-extended
value, so in result_type the value is something like:
0b (e.g. ~ of an 8->16 bit zero extension)
0b (e.g. ~ of an 8->16 bit zero extension
then zero extended to 32 bits)
0b (e.g. ~ of an 8->16 bit zero extension
then sign extended to 32 bits)
and the intention of the warning is to warn when this is compared against
something that has some 0 bits at the place where the above has guaranteed
1 bits, either ensured through comparison against constant where we know
the bits exactly, or through zero extension from some narrower type where
again we know at least some upper bits are zero extended.
The bugs in the warning code are:
1) misunderstanding of the {,c_common_}get_narrower APIs - the unsignedp
   it sets is only meaningful if the function actually returns something
   narrower (in that case it says whether the narrower value is then
   sign (0) or zero (1) extended to the originally passed value).
   Though op0 or op1 at this point might be already narrower than
   result_type, and if the function doesn't return anything narrower,
   it all depends on whether the passed in op{0,1} had TYPE_UNSIGNED
   type or not.
2) the code didn't check at all whether the BIT_NOT_EXPR operand
   was actually zero extended (i.e. that it was narrower and unsignedp
   was set to 1 for it), all it did is check that unsignedp from the
   call was 1.  But that isn't well defined thing, if the argument
   is returned as is, the function sets unsignedp to 0, but if there
   is e.g. a useless cast to the same or compatible type in between,
   it can return 1 if the cast is unsigned; now, if BIT_NOT_EXPR
   operand is not zero extended, we know nothing at all about any bits
   in the operand containing BIT_NOT_EXPR, so there is nothing to warn
   about
3) the code was actually testing both operands after calling
   c_common_get_narrower on them and on the one with BIT_NOT_EXPR
   again for constants; I think that is just wrong in case the BIT_NOT_EXPR
   operand wouldn't be fully folded, the warning makes sense only if the
   other operand not having BIT_NOT_EXPR in it is constant
4) as can be seen from the above bit pattern examples, the upper bits above
   (in the patch arg0) aren't always all 1s, there could be some zero extension
   above it and from it one would have 0s, so that needs to be taken into
   account for the choice which constant bits to test for being always set
   otherwise warning is emitted, or for the zero extension guaranteed zero
   bits
5) the patch also simplifies the handling, we only do it if one but not
   both operands are BIT_NOT_EXPR after first {,c_common_}get_narrower,
   so we can just use std::swap to ensure it is the first one
6) the code compared bits against HOST_BITS_PER_LONG, which made sense
   back in 1995 when the values were stored into long, but now that they
   are HOST_WIDE_INT should test HOST_BITS_PER_WIDE_INT (or we could rewrite
   the stuff to wide_int, not done in the patch)

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-11-21  Jakub Jelinek  

PR c/107465
* c-warn.cc (warn_for_sign_compare): If c_common_get_narrower
doesn't return a narrower result, use TYPE_UNSIGNED to set unsignedp0
and unsignedp1.  For the one BIT_NOT_EXPR case vs. one without,
only check for constant in the non-BIT_NOT_EXPR operand, use std::swap
to simplify the code, only warn if BIT_NOT_EXPR operand is extended
from narrower unsigned, fix up computation of mask for the constant
cases and for unsigned other operand case handle differently
BIT_NOT_EXPR result being sign vs. zero extended.

* c-c++-common/Wsign-compare-2.c: New test.
* c-c++-common/pr107465.c: New test.

--- gcc/c-family/c-warn.cc.jj   2022-10-28 11:00:53.738247032 +0200
+++ gcc/c-family/c-warn.cc  2022-11-21 19:29:42.351885444 +0100
@@ -2344,42 +2344,50 @@ warn_for_sign_compare (location_t locati
  have all bits set that are set in the ~ operand when it is
  extended.  */
 
-  op0 = c_common_get_narrower (op0, &unsignedp0);
-  op1 = c_common_get_narrower (op1, &unsignedp1);
+  tree arg0 = c_common_get_narrower (op0, &unsignedp0);
+  if (TYPE_PRECISION (TREE_TYPE (arg0)) == TYPE_PRECISION (TREE_TYPE (op0)))
+    unsignedp0 = TYPE_UNSIGNED (TREE_TYPE (op0));
+  op0 = arg0;
+  tree arg1 = c_common_get_narrower (op1, &unsignedp1);
+  if (TYPE_PRECISION (TREE_TYPE (arg1)) == TYPE_PRECISION (TREE_TYPE (op1)))
+    unsignedp1 = TYPE_UNSIGNED (TREE_TYPE (op1));
+  op1 = arg1;
 
   if ((TREE_CODE (op0) == BIT_NOT_EXPR)
       ^ (TREE_CODE (op1) == BIT_NOT_EXPR))
     {
-      if (TREE_CODE

[PATCH] d: respect --enable-link-mutex configure option

2022-11-22 Thread Martin Liška
I noticed the option is ignored because @DO_LINK_MUTEX@
is not defined in d/Make-lang.in.

Tested locally before and after the patch.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

* Makefile.in: Set DO_LINK_MUTEX.

gcc/d/ChangeLog:

* Make-lang.in: Use it as $DO_LINK_MUTEX.
---
 gcc/Makefile.in| 1 +
 gcc/d/Make-lang.in | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 5ad638f59d8..c57d62229ee 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -272,6 +272,7 @@ COMPILER += $(CET_HOST_FLAGS)
 
 NO_PIE_CFLAGS = @NO_PIE_CFLAGS@
 NO_PIE_FLAG = @NO_PIE_FLAG@
+DO_LINK_MUTEX = @DO_LINK_MUTEX@
 
 # We don't want to compile the compilers with -fPIE, it make PCH fail.
 COMPILER += $(NO_PIE_CFLAGS)
diff --git a/gcc/d/Make-lang.in b/gcc/d/Make-lang.in
index 6f9b2e5c26a..984b1d63dcb 100644
--- a/gcc/d/Make-lang.in
+++ b/gcc/d/Make-lang.in
@@ -70,7 +70,7 @@ DPOSTCOMPILE = @mv $(@D)/$(DEPDIR)/$(*F).TPo 
$(@D)/$(DEPDIR)/$(*F).Po
 DLINKER = $(GDC) $(NO_PIE_FLAG) -lstdc++
 
 # Like LINKER, but use a mutex for serializing front end links.
-ifeq (@DO_LINK_MUTEX@,true)
+ifeq ($(DO_LINK_MUTEX),true)
 DLLINKER = $(SHELL) $(srcdir)/lock-and-run.sh linkfe.lck $(DLINKER)
 else
 DLLINKER = $(DLINKER)
-- 
2.38.1



[PATCH] Ver2: RISC-V doesn't support "-fprefetch-loop-arrays" option, add "-w" option.

2022-11-22 Thread Yixuan Chen
gcc/testsuite/ChangeLog:

RISC-V doesn't support the "-fprefetch-loop-arrays" option, so add the "-w" option.

2022-11-22  Yixuan Chen  

* gcc.dg/pr106397.c: RISC-V doesn't support the "-fprefetch-loop-arrays"
option, so add the "-w" option.
---
 gcc/testsuite/gcc.dg/pr106397.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/pr106397.c b/gcc/testsuite/gcc.dg/pr106397.c
index 2bc17f8cf80..b0983b61dfc 100644
--- a/gcc/testsuite/gcc.dg/pr106397.c
+++ b/gcc/testsuite/gcc.dg/pr106397.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -fprefetch-loop-arrays --param l2-cache-size=0 --param 
prefetch-latency=3 -fprefetch-loop-arrays" } */
 /* { dg-additional-options "-march=i686 -msse" { target { { i?86-*-* 
x86_64-*-* } && ia32 } } } */
+/* { dg-additional-options "-w" { target riscv*-*-* } } */
 
 int
 bar (void)
-- 
2.37.2



Re: [PATCH 16/35] arm: Add integer vector overloading of vsubq_x intrinsic

2022-11-22 Thread Christophe Lyon via Gcc-patches




On 11/17/22 17:37, Andrea Corallo via Gcc-patches wrote:

From: Stam Markianos-Wright 

In the past we had only defined the vsubq_x generic overload of the
vsubq_x_* intrinsics for float vector types.  This would cause them
to fall back to the `__ARM_undef` failure state if they were called
through the generic version.
This patch simply adds these overloads.

gcc/ChangeLog:

 * config/arm/arm_mve.h (__arm_vsubq_x FP): New overloads.
  (__arm_vsubq_x Integer): New.


Hi Stam,

To hopefully help Kyrill in the review, I think this fix is tested by 
patch #19, where we now have

+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
(this line explains why this bug was not noticed so far)

Thanks,

Christophe


---
  gcc/config/arm/arm_mve.h | 28 
  1 file changed, 28 insertions(+)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index f6b42dc3fab..09167ec118e 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -38259,6 +38259,18 @@ extern void *__ARM_undef;
  #define __arm_vsubq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
__typeof(p2) __p2 = (p2); \
_Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: 
__arm_vsubq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, 
int8x16_t), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: 
__arm_vsubq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, 
int16x8_t), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: 
__arm_vsubq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, 
int32x4_t), p3), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s8 
(__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s16 
(__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s32 
(__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: 
__arm_vsubq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce(__p2, 
uint8x16_t), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: 
__arm_vsubq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce(__p2, 
uint16x8_t), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: 
__arm_vsubq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce(__p2, 
uint32x4_t), p3), \
+  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_u8 
(__ARM_mve_coerce(__p1, uint8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: 
__arm_vsubq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t), __ARM_mve_coerce3(p2, 
int), p3), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: 
__arm_vsubq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t), __ARM_mve_coerce3(p2, 
int), p3), \
int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: 
__arm_vsubq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce(__p2, 
float16x8_t), p3), \
int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: 
__arm_vsubq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t), __ARM_mve_coerce(__p2, 
float32x4_t), p3), \
int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]: 
__arm_vsubq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t), __ARM_mve_coerce2(p2, 
double), p3), \
@@ -40223,6 +40235,22 @@ extern void *__ARM_undef;
int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld4q_u16 
(__ARM_mve_coerce1(p0, uint16_t *)), \
int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld4q_u32 
(__ARM_mve_coerce1(p0, uint32_t *
  
+#define __arm_vsubq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \

+  __typeof(p2) __p2 = (p2); \
+  _Generic( (int (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: 
__arm_vsubq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce(__p2, 
int8x16_t), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: 
__arm_vsubq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce(__p2, 
int16x8_t), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: 
__arm_vsubq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce(__p2, 
int32x4_t), p3), \
+  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s8 
(__ARM_mve_coerce(__p1, int8x16_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s16 
(__ARM_mve_coerce(__p1, int16x8_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vsubq_x_n_s32 
(__ARM_mve_coerce(__p1, int32x4_t), __ARM_mve_coerce3(p2, int), p3), \
+  int (*

Re: [PATCH] Ver2: RISC-V doesn't support "-fprefetch-loop-arrays" option, add "-w" option.

2022-11-22 Thread Richard Biener via Gcc-patches



> Am 22.11.2022 um 10:49 schrieb Yixuan Chen :
> 
> gcc/testsuite/ChangeLog:
> 
> RISC-V doesn't support the "-fprefetch-loop-arrays" option, so add the "-w" option.

Ok.

Richard 

> 2022-11-22  Yixuan Chen  
> 
>* gcc.dg/pr106397.c: RISC-V doesn't support the "-fprefetch-loop-arrays"
> option, so add the "-w" option.
> ---
> gcc/testsuite/gcc.dg/pr106397.c | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/gcc/testsuite/gcc.dg/pr106397.c b/gcc/testsuite/gcc.dg/pr106397.c
> index 2bc17f8cf80..b0983b61dfc 100644
> --- a/gcc/testsuite/gcc.dg/pr106397.c
> +++ b/gcc/testsuite/gcc.dg/pr106397.c
> @@ -1,6 +1,7 @@
> /* { dg-do compile } */
> /* { dg-options "-O3 -fprefetch-loop-arrays --param l2-cache-size=0 --param 
> prefetch-latency=3 -fprefetch-loop-arrays" } */
> /* { dg-additional-options "-march=i686 -msse" { target { { i?86-*-* 
> x86_64-*-* } && ia32 } } } */
> +/* { dg-additional-options "-w" { target riscv*-*-* } } */
> 
> int
> bar (void)
> -- 
> 2.37.2
> 


Re: [PATCH] Add pattern to convert vector shift + bitwise and + multiply to vector compare in some cases.

2022-11-22 Thread Philipp Tomsich
Richard & Tamar,

On Fri, 26 Aug 2022 at 15:29, Tamar Christina  wrote:
>
> > -Original Message-
> > From: Gcc-patches  > bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Richard
> > Biener via Gcc-patches
> > Sent: Friday, August 26, 2022 10:08 AM
> > To: mtsamis 
> > Cc: GCC Patches ;
> > jiangning@amperecomputing.com; Philipp Tomsich
> > 
> > Subject: Re: [PATCH] Add pattern to convert vector shift + bitwise and +
> > multiply to vector compare in some cases.
> >
> > On Sat, Aug 13, 2022 at 11:59 AM mtsamis  wrote:
> > >
> > > When using SWAR (SIMD in a register) techniques a comparison operation
> > > within such a register can be made by using a combination of shifts,
> > > bitwise AND, and multiplication. If code using this scheme is
> > > vectorized then there is potential to replace all these operations
> > > with a single vector comparison, by reinterpreting the vector types to
> > match the width of the SWAR register.
> > >
> > > For example, for the test function packed_cmp_16_32, the original
> > generated code is:
> > >
> > > ldr q0, [x0]
> > > add w1, w1, 1
> > > ushrv0.4s, v0.4s, 15
> > > and v0.16b, v0.16b, v2.16b
> > > shl v1.4s, v0.4s, 16
> > > sub v0.4s, v1.4s, v0.4s
> > > str q0, [x0], 16
> > > cmp w2, w1
> > > bhi .L20
> > >
> > > with this pattern the above can be optimized to:
> > >
> > > ldr q0, [x0]
> > > add w1, w1, 1
> > > cmltv0.8h, v0.8h, #0
> > > str q0, [x0], 16
> > > cmp w2, w1
> > > bhi .L20
> > >
> > > The effect is similar for x86-64.
> > >
> > > gcc/ChangeLog:
> > >
> > > * match.pd: Simplify vector shift + bit_and + multiply in some 
> > > cases.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/aarch64/swar_to_vec_cmp.c: New test.
> > >
> > > Signed-off-by: mtsamis 
> > > ---
> > >  gcc/match.pd  | 57 +++
> > >  .../gcc.target/aarch64/swar_to_vec_cmp.c  | 72
> > +++
> > >  2 files changed, 129 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/aarch64/swar_to_vec_cmp.c
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd index
> > > 8bbc0dbd5cd..5c768a94846 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -301,6 +301,63 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >  (view_convert (bit_and:itype (view_convert @0)
> > >  (ne @1 { build_zero_cst (type);
> > > })))
> > >
> > > +/* In SWAR (SIMD in a register) code a comparison of packed data can
> > > +   be constructed with a particular combination of shift, bitwise and,
> > > +   and multiplication by constants.  If that code is vectorized we can
> > > +   convert this pattern into a more efficient vector comparison.  */
> > > +(simplify  (mult (bit_and (rshift @0 @1) @2) @3)
> >
> > You should restrict the pattern a bit more, below you use
> > uniform_integer_cst_p and also require a vector type thus
> >
> >   (simplify
> >(mult (bit_and (rshift @0 VECTOR_CST@1) VECTOR_CST@2)
> > VECTOR_CST@3)
> >
> >
> > > + (with {
> > > +   tree op_type = TREE_TYPE (@0);
> >
> > that's the same as 'type' which is already available.
> >
> > > +   tree rshift_cst = NULL_TREE;
> > > +   tree bit_and_cst = NULL_TREE;
> > > +   tree mult_cst = NULL_TREE;
> > > +  }
> > > +  /* Make sure we're working with vectors and uniform vector
> > > + constants.  */  (if (VECTOR_TYPE_P (op_type)
> > > +   && (rshift_cst = uniform_integer_cst_p (@1))
> > > +   && (bit_and_cst = uniform_integer_cst_p (@2))
> > > +   && (mult_cst = uniform_integer_cst_p (@3)))
> > > +   /* Compute what constants would be needed for this to represent a
> > packed
> > > +  comparison based on the shift amount denoted by RSHIFT_CST.  */
> > > +   (with {
> > > + HOST_WIDE_INT vec_elem_bits = vector_element_bits (op_type);
> > > + HOST_WIDE_INT vec_nelts = TYPE_VECTOR_SUBPARTS
> > > + (op_type).to_constant ();
> >
> > you need to check that this isn't a VLA vector operation.
>
> Seems like this pattern should be applicable to VLA as well no?
> So could we not keep vec_nelts as a poly and just use exact_div
> below in the division? The pattern is only valid if cmp_bits_i is a
> multiple of vec_elem_bits anyway.  The build_vector_* should then
> do the right thing.

Seems like we never agreed on what should go into the next version.
Am I right in assuming that applicability to VLA is ok and that we
should primarily focus on addressing the below comments for v2?

Cheers,
Philipp.

> >
> > > + HOST_WIDE_INT vec_bits = vec_elem_bits * vec_nelts;
> > > +
> > > + unsigned HOST_WIDE_INT cmp_bits_i, bit_and_i, mult_i;
> > > + unsigned HOST_WIDE_INT target_mult_i, target_bit_and_i;
> > > + cmp_bits_i = tree_to_uhwi (rshift_cst) + 1;
> >
> > and that the rshift_cst and others actually 

Re: [PATCH] AArch64: Add fma_reassoc_width [PR107413]

2022-11-22 Thread Wilco Dijkstra via Gcc-patches
Hi Richard,

> I guess an obvious question is: if 1 (rather than 2) was the right value
> for cores with 2 FMA pipes, why is 4 the right value for cores with 4 FMA
> pipes?  It would be good to clarify how, conceptually, the core property
> should map to the fma_reassoc_width value.

1 turns off reassociation so that FMAs get properly formed. After reassociation
far fewer FMAs get formed, so we end up with more floating-point operations,
which means slower execution. It's a significant slowdown on cores that are not
wide, have only 1 or 2 FP pipes, and may have high FP latencies. So we turn it
off by default on all older cores.

> It sounds from the existing comment like the main motivation for returning 1
> was to encourage more FMAs to be formed, rather than to prevent FMAs from
> being reassociated.  Is that no longer an issue?  Or is the point that,
> with more FMA pipes, lower FMA formation is a price worth paying for
> the better parallelism we get when FMAs can be formed?

Exactly. A wide CPU can deal with the extra instructions, so the loss from
fewer FMAs ends up lower than the speedup from the extra parallelism. Having
more FMAs will be even faster of course.

> Does this code ever see opc == FMA?

No, that's the problem: reassociation ignores the fact that we actually want
FMAs. A smart reassociation pass could form more FMAs while also increasing
parallelism, but the way it currently works always results in fewer FMAs.

Cheers,
Wilco

Re: [PATCH 1/8]middle-end: Recognize scalar reductions from bitfields and array_refs

2022-11-22 Thread Richard Sandiford via Gcc-patches
Tamar Christina via Gcc-patches  writes:
>> So it's not easily possible within the current infrastructure.  But it does
>> look like ARM might eventually benefit from something like STV on x86?
>> 
>
> I'm not sure.  The problem with trying to do this in RTL is that you'd have
> to be able to decide from two pseudos whether they come from extracts that
> are sequential.  When coming in from a hard register that's easy, yes.  When
> coming in from a load, or any other operation that produces pseudos, that
> becomes harder.

Yeah.

Just in case anyone reading the above is tempted to implement STV for
AArch64: I think it would set a bad precedent if we had a paste-&-adjust
version of the x86 pass.  AFAIK, the target capabilities and constraints
are mostly modelled correctly using existing mechanisms, so I don't
think there's anything particularly target-specific about the process
of forcing things to be on the general or SIMD/FP side.

So if we did have an STV-ish thing for AArch64, I think it should be
a target-independent pass that uses hooks and recog, even if the pass
is initially enabled for AArch64 only.

(FWIW, on the patch itself, I tend to agree that this is really an
SLP optimisation.  If the vectoriser fails to see the benefit, or if
it fails to handle more complex cases, then it would be good to try
to fix that.)

Thanks,
Richard


[committed] libstdc++: Fix pool resource build errors for H8 [PR107801]

2022-11-22 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux, Pushed to trunk. Backports will follow.

-- >8 --

The array of pool sizes was previously adjusted to work for msp430-elf
which has 16-bit int and either 16-bit size_t or 20-bit size_t. The
largest pool sizes were disabled unless size_t has more than 20 bits.

The H8 family has 16-bit int but 32-bit size_t, which means that the
largest sizes are enabled, but 1<<15 produces a negative number that
then cannot be narrowed to size_t.

Replace the test for 32-bit size_t with a test for 32-bit int, which
means we won't use the 4kiB to 4MiB pools for targets with 16-bit int
even if they have a wider size_t.

libstdc++-v3/ChangeLog:

PR libstdc++/107801
* src/c++17/memory_resource.cc (pool_sizes): Disable large pools
for targets with 16-bit int.
---
 libstdc++-v3/src/c++17/memory_resource.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/src/c++17/memory_resource.cc 
b/libstdc++-v3/src/c++17/memory_resource.cc
index 651d07489aa..0bd94dbc6a7 100644
--- a/libstdc++-v3/src/c++17/memory_resource.cc
+++ b/libstdc++-v3/src/c++17/memory_resource.cc
@@ -873,9 +873,11 @@ namespace pmr
   256, 320, 384, 448,
   512, 768,
 #if __SIZE_WIDTH__ > 16
+  // Use bigger pools if size_t has at least 20 bits.
   1024, 1536,
   2048, 3072,
-#if __SIZE_WIDTH__ > 20
+#if __INT_WIDTH__ >= 32
+  // Use even bigger pools if int has at least 32 bits.
   1<<12, 1<<13, 1<<14,
   1<<15, 1<<16, 1<<17,
   1<<20, 1<<21, 1<<22 // 4MB should be enough for anybody
-- 
2.38.1



Re: [PATCH 16/35] arm: Add integer vector overloading of vsubq_x intrinsic

2022-11-22 Thread Andrea Corallo via Gcc-patches
Christophe Lyon  writes:

> On 11/17/22 17:37, Andrea Corallo via Gcc-patches wrote:
>> From: Stam Markianos-Wright 
>> In the past we had only defined the vsubq_x generic overload of the
>> vsubq_x_* intrinsics for float vector types.  This would cause them
>> to fall back to the `__ARM_undef` failure state if they were called
>> through the generic version.
>> This patch simply adds these overloads.
>> gcc/ChangeLog:
>>  * config/arm/arm_mve.h (__arm_vsubq_x FP): New overloads.
>>   (__arm_vsubq_x Integer): New.
>
> Hi Stam,
>
> To hopefully help Kyrill in the review, I think this fix is tested by
> patch #19, where we now have
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> (this line explains why this bug was not noticed so far)
>
> Thanks,
>
> Christophe

Exactly

PS: also the fact that the tests are now 'check-function-bodies' should
catch that.

Thanks

  Andrea


RE: [PATCH] Add pattern to convert vector shift + bitwise and + multiply to vector compare in some cases.

2022-11-22 Thread Tamar Christina via Gcc-patches
Hi,

> -Original Message-
> From: Philipp Tomsich 
> Sent: Tuesday, November 22, 2022 10:35 AM
> To: Tamar Christina 
> Cc: Richard Biener ; mtsamis
> ; GCC Patches ;
> jiangning@amperecomputing.com
> Subject: Re: [PATCH] Add pattern to convert vector shift + bitwise and +
> multiply to vector compare in some cases.
> 
> Richard & Tamar,
> 
> On Fri, 26 Aug 2022 at 15:29, Tamar Christina 
> wrote:
> >
> > > -Original Message-
> > > From: Gcc-patches  > > bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Richard
> > > Biener via Gcc-patches
> > > Sent: Friday, August 26, 2022 10:08 AM
> > > To: mtsamis 
> > > Cc: GCC Patches ;
> > > jiangning@amperecomputing.com; Philipp Tomsich
> > > 
> > > Subject: Re: [PATCH] Add pattern to convert vector shift + bitwise
> > > and + multiply to vector compare in some cases.
> > >
> > > On Sat, Aug 13, 2022 at 11:59 AM mtsamis 
> wrote:
> > > >
> > > > When using SWAR (SIMD in a register) techniques a comparison
> > > > operation within such a register can be made by using a
> > > > combination of shifts, bitwise and and multiplication. If code
> > > > using this scheme is vectorized then there is potential to replace
> > > > all these operations with a single vector comparison, by
> > > > reinterpreting the vector types to
> > > match the width of the SWAR register.
> > > >
> > > > For example, for the test function packed_cmp_16_32, the original
> > > generated code is:
> > > >
> > > > ldr q0, [x0]
> > > > add w1, w1, 1
> > > > ushrv0.4s, v0.4s, 15
> > > > and v0.16b, v0.16b, v2.16b
> > > > shl v1.4s, v0.4s, 16
> > > > sub v0.4s, v1.4s, v0.4s
> > > > str q0, [x0], 16
> > > > cmp w2, w1
> > > > bhi .L20
> > > >
> > > > with this pattern the above can be optimized to:
> > > >
> > > > ldr q0, [x0]
> > > > add w1, w1, 1
> > > > cmltv0.8h, v0.8h, #0
> > > > str q0, [x0], 16
> > > > cmp w2, w1
> > > > bhi .L20
> > > >
> > > > The effect is similar for x86-64.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * match.pd: Simplify vector shift + bit_and + multiply in some 
> > > > cases.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.target/aarch64/swar_to_vec_cmp.c: New test.
> > > >
> > > > Signed-off-by: mtsamis 
> > > > ---
> > > >  gcc/match.pd  | 57 +++
> > > >  .../gcc.target/aarch64/swar_to_vec_cmp.c  | 72
> > > +++
> > > >  2 files changed, 129 insertions(+)  create mode 100644
> > > > gcc/testsuite/gcc.target/aarch64/swar_to_vec_cmp.c
> > > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd index
> > > > 8bbc0dbd5cd..5c768a94846 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -301,6 +301,63 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > >  (view_convert (bit_and:itype (view_convert @0)
> > > >  (ne @1 { build_zero_cst (type);
> > > > })))
> > > >
> > > > +/* In SWAR (SIMD in a register) code a comparison of packed data can
> > > > +   be constructed with a particular combination of shift, bitwise and,
> > > > +   and multiplication by constants.  If that code is vectorized we can
> > > > +   convert this pattern into a more efficient vector comparison.
> > > > +*/ (simplify  (mult (bit_and (rshift @0 @1) @2) @3)
> > >
> > > You should restrict the pattern a bit more, below you use
> > > uniform_integer_cst_p and also require a vector type thus
> > >
> > >   (simplify
> > >(mult (bit_and (rshift @0 VECTOR_CST@1) VECTOR_CST@2)
> > > VECTOR_CST@3)
> > >
> > >
> > > > + (with {
> > > > +   tree op_type = TREE_TYPE (@0);
> > >
> > > that's the same as 'type' which is already available.
> > >
> > > > +   tree rshift_cst = NULL_TREE;
> > > > +   tree bit_and_cst = NULL_TREE;
> > > > +   tree mult_cst = NULL_TREE;
> > > > +  }
> > > > +  /* Make sure we're working with vectors and uniform vector
> > > > + constants.  */  (if (VECTOR_TYPE_P (op_type)
> > > > +   && (rshift_cst = uniform_integer_cst_p (@1))
> > > > +   && (bit_and_cst = uniform_integer_cst_p (@2))
> > > > +   && (mult_cst = uniform_integer_cst_p (@3)))
> > > > +   /* Compute what constants would be needed for this to
> > > > + represent a
> > > packed
> > > > +  comparison based on the shift amount denoted by RSHIFT_CST.  */
> > > > +   (with {
> > > > + HOST_WIDE_INT vec_elem_bits = vector_element_bits (op_type);
> > > > + HOST_WIDE_INT vec_nelts = TYPE_VECTOR_SUBPARTS
> > > > + (op_type).to_constant ();
> > >
> > > you need to check that this isn't a VLA vector operation.
> >
> > Seems like this pattern should be applicable to VLA as well no?
> > So could we not keep vec_nelts as a poly and just use exact_div below
> > in the division? The pattern is only valid if cmp_bits_i is a multiple
> > of vec_elem_bits anyway.  Th

Re: [PATCH] aarch64: Fix test_dfp_17.c for big-endian [PR 107604]

2022-11-22 Thread Richard Sandiford via Gcc-patches
Christophe Lyon via Gcc-patches  writes:
> gcc.target/aarch64/aapcs64/test_dfp_17.c has been failing on
> big-endian, because the _Decimal32 on-stack argument is not padded in
> the same direction depending on endianness.
>
> This patch fixes the testcase so that it expects the argument in the
> right stack location, similarly to what other tests do in the same
> directory.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/107604
>   * gcc.target/aarch64/aapcs64/test_dfp_17.c: Fix for big-endian.

OK, thanks.

Richard

> ---
>  gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c 
> b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
> index 22dc462bf7c..3c45f715cf7 100644
> --- a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
> +++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
> @@ -32,6 +32,10 @@ struct z b = { 9.0dd, 10.0dd, 11.0dd, 12.0dd };
>ANON(struct z, a, D1)
>ANON(struct z, b, STACK)
>ANON(int , 5, W0)
> +#ifndef __AAPCS64_BIG_ENDIAN__
>ANON(_Decimal32, f1, STACK+32) /* Note: no promotion to _Decimal64.  */
> +#else
> +  ANON(_Decimal32, f1, STACK+36) /* Note: no promotion to _Decimal64.  */
> +#endif
>LAST_ANON(_Decimal64, 0.5dd, STACK+40)
>  #endif


Re: [PATCH 1/8]middle-end: Recognize scalar reductions from bitfields and array_refs

2022-11-22 Thread Richard Biener via Gcc-patches
On Tue, 22 Nov 2022, Richard Sandiford wrote:

> Tamar Christina via Gcc-patches  writes:
> >> So it's not easily possible within the current infrastructure.  But it
> >> does look like ARM might eventually benefit from something like STV on
> >> x86?
> >> 
> >
> > I'm not sure.  The problem with trying to do this in RTL is that you'd
> > have to be able to decide from two pseudos whether they come from
> > extracts that are sequential.  When coming in from a hard register
> > that's easy, yes.  When coming in from a load, or any other operation
> > that produces pseudos, that becomes harder.
> 
> Yeah.
> 
> Just in case anyone reading the above is tempted to implement STV for
> AArch64: I think it would set a bad precedent if we had a paste-&-adjust
> version of the x86 pass.  AFAIK, the target capabilities and constraints
> are mostly modelled correctly using existing mechanisms, so I don't
> think there's anything particularly target-specific about the process
> of forcing things to be on the general or SIMD/FP side.
> 
> So if we did have an STV-ish thing for AArch64, I think it should be
> a target-independent pass that uses hooks and recog, even if the pass
> is initially enabled for AArch64 only.

Agreed - maybe some of the x86 code can be leveraged, but of course
the cost modeling is the most difficult to get right - IIRC the x86
backend resorts to backend specific tuning flags rather than trying
to get rtx_cost or insn_cost "correct" here.

> (FWIW, on the patch itself, I tend to agree that this is really an
> SLP optimisation.  If the vectoriser fails to see the benefit, or if
> it fails to handle more complex cases, then it would be good to try
> to fix that.)

Also agreed - but costing is hard ;)

Richard.


RE: [PATCH 1/8]middle-end: Recognize scalar reductions from bitfields and array_refs

2022-11-22 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, November 22, 2022 10:59 AM
> To: Richard Sandiford 
> Cc: Tamar Christina via Gcc-patches ; Tamar
> Christina ; Richard Biener
> ; nd 
> Subject: Re: [PATCH 1/8]middle-end: Recognize scalar reductions from
> bitfields and array_refs
> 
> On Tue, 22 Nov 2022, Richard Sandiford wrote:
> 
> > Tamar Christina via Gcc-patches  writes:
> > >> So it's not easily possible within the current infrastructure.  But
> > >> it does look like ARM might eventually benefit from something like STV
> > >> on x86?
> > >>
> > >
> > > I'm not sure.  The problem with trying to do this in RTL is that
> > > you'd have to be able to decide from two pseudos whether they come
> > > from extracts that are sequential.  When coming in from a hard
> > > register that's easy, yes.  When coming in from a load, or any other
> > > operation that produces pseudos, that becomes harder.
> >
> > Yeah.
> >
> > Just in case anyone reading the above is tempted to implement STV for
> > AArch64: I think it would set a bad precedent if we had a
> > paste-&-adjust version of the x86 pass.  AFAIK, the target
> > capabilities and constraints are mostly modelled correctly using
> > existing mechanisms, so I don't think there's anything particularly
> > target-specific about the process of forcing things to be on the general or
> SIMD/FP side.
> >
> > So if we did have an STV-ish thing for AArch64, I think it should be a
> > target-independent pass that uses hooks and recog, even if the pass is
> > initially enabled for AArch64 only.
> 
> Agreed - maybe some of the x86 code can be leveraged, but of course the
> cost modeling is the most difficult to get right - IIRC the x86 backend 
> resorts
> to backend specific tuning flags rather than trying to get rtx_cost or 
> insn_cost
> "correct" here.
> 
> > (FWIW, on the patch itself, I tend to agree that this is really an SLP
> > optimisation.  If the vectoriser fails to see the benefit, or if it
> > fails to handle more complex cases, then it would be good to try to
> > fix that.)
> 
> Also agreed - but costing is hard ;)

I guess I still disagree here, but I've clearly been out-Richard'd.  The
problem is still that this is just basic codegen.  I still don't think it
requires -O2 to be usable.

So I guess the only correct implementation is to use an STV-like patch.  But
given that this is already the second attempt (the first RTL one was rejected
by Richard, the second GIMPLE one by Richi), I'd like to get an agreement on
this STV thing before I waste months more.

Thanks,
Tamar

> 
> Richard.


[PATCH] tree-optimization/107803 - abnormal cleanup from the SSA propagator

2022-11-22 Thread Richard Biener via Gcc-patches
The SSA propagator is missing abnormal edge cleanup, which shows up in a
sanity check in the uninit engine (and was missed by CFG verification).
The following adds that cleanup.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/107803
* tree-ssa-propagate.cc (substitute_and_fold_dom_walker): Add
need_ab_cleanup member.
(substitute_and_fold_dom_walker::before_dom_children): When
a stmt can no longer transfer control flow abnormally set
need_ab_cleanup.
(substitute_and_fold_engine::substitute_and_fold): Cleanup
abnormal control flow.

* g++.dg/pr107803.C: New testcase.
---
 gcc/testsuite/g++.dg/pr107803.C | 19 +++
 gcc/tree-ssa-propagate.cc   | 20 ++--
 2 files changed, 37 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pr107803.C

diff --git a/gcc/testsuite/g++.dg/pr107803.C b/gcc/testsuite/g++.dg/pr107803.C
new file mode 100644
index 000..f814e968b69
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr107803.C
@@ -0,0 +1,19 @@
+// { dg-do compile }
+// { dg-options "-O -fno-tree-dominator-opts -fno-tree-fre 
-Wmaybe-uninitialized" }
+
+void printf(...);
+void __sigsetjmp_cancel() __attribute__((__returns_twice__));
+int z, main_ret;
+void func(void *) {}
+
+int
+main()
+{
+  int x;
+  void (*__cancel_routine)(void *)(func);
+  __sigsetjmp_cancel();
+  __cancel_routine(0);
+  if (main_ret)
+x = z;
+  printf(x);
+}
diff --git a/gcc/tree-ssa-propagate.cc b/gcc/tree-ssa-propagate.cc
index 9dc4bfd85bf..d8b0aed4564 100644
--- a/gcc/tree-ssa-propagate.cc
+++ b/gcc/tree-ssa-propagate.cc
@@ -671,12 +671,14 @@ public:
   stmts_to_remove.create (0);
   stmts_to_fixup.create (0);
   need_eh_cleanup = BITMAP_ALLOC (NULL);
+  need_ab_cleanup = BITMAP_ALLOC (NULL);
 }
 ~substitute_and_fold_dom_walker ()
 {
   stmts_to_remove.release ();
   stmts_to_fixup.release ();
   BITMAP_FREE (need_eh_cleanup);
+  BITMAP_FREE (need_ab_cleanup);
 }
 
 edge before_dom_children (basic_block) final override;
@@ -689,6 +691,7 @@ public:
    vec<gimple *> stmts_to_remove;
    vec<gimple *> stmts_to_fixup;
 bitmap need_eh_cleanup;
+bitmap need_ab_cleanup;
 
 class substitute_and_fold_engine *substitute_and_fold_engine;
 
@@ -838,8 +841,13 @@ substitute_and_fold_dom_walker::before_dom_children (basic_block bb)
 folded.  */
   did_replace = false;
   gimple *old_stmt = stmt;
-  bool was_noreturn = (is_gimple_call (stmt)
-  && gimple_call_noreturn_p (stmt));
+  bool was_noreturn = false;
+  bool can_make_abnormal_goto = false;
+  if (is_gimple_call (stmt))
+   {
+ was_noreturn = gimple_call_noreturn_p (stmt);
+ can_make_abnormal_goto = stmt_can_make_abnormal_goto (stmt);
+   }
 
   /* Replace real uses in the statement.  */
   did_replace |= substitute_and_fold_engine->replace_uses_in (stmt);
@@ -905,6 +913,12 @@ substitute_and_fold_dom_walker::before_dom_children (basic_block bb)
  if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt))
bitmap_set_bit (need_eh_cleanup, bb->index);
 
+ /* If we turned a call with possible abnormal control transfer
+into one that doesn't, remove abnormal edges.  */
+ if (can_make_abnormal_goto
+ && !stmt_can_make_abnormal_goto (stmt))
+   bitmap_set_bit (need_ab_cleanup, bb->index);
+
  /* If we turned a not noreturn call into a noreturn one
 schedule it for fixup.  */
  if (!was_noreturn
@@ -1012,6 +1026,8 @@ substitute_and_fold_engine::substitute_and_fold (basic_block block)
 
   if (!bitmap_empty_p (walker.need_eh_cleanup))
 gimple_purge_all_dead_eh_edges (walker.need_eh_cleanup);
+  if (!bitmap_empty_p (walker.need_ab_cleanup))
+gimple_purge_all_dead_abnormal_call_edges (walker.need_ab_cleanup);
 
   /* Fixup stmts that became noreturn calls.  This may require splitting
  blocks and thus isn't possible during the dominator walk.  Do this
-- 
2.35.3


[PATCH] Fix wrong array type conversion with different storage order

2022-11-22 Thread Eric Botcazou via Gcc-patches
Hi,

when two arrays of scalars have a different storage order in Ada, the
front-end makes sure that the conversion is performed component-wise
so that each component can be reversed.  So it's a little bit counter
productive that the ldist pass performs the opposite transformation
and synthesizes a memcpy/memmove in this case.

Tested on x86-64/Linux, OK for the mainline?


2022-11-22  Eric Botcazou  

* tree-loop-distribution.cc (loop_distribution::classify_builtin_ldst):
Bail out if source and destination do not have the same storage order.


2022-11-22  Eric Botcazou  

* gnat.dg/sso18.adb: New test.

-- 
Eric Botcazou

diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index ed3dd73e1a9..15ae2410861 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -1790,10 +1790,15 @@ loop_distribution::classify_builtin_ldst (loop_p loop, struct graph *rdg,
   if (res != 2)
 return;
 
-  /* They much have the same access size.  */
+  /* They must have the same access size.  */
   if (!operand_equal_p (size, src_size, 0))
 return;
 
+  /* They must have the same storage order.  */
+  if (reverse_storage_order_for_component_p (DR_REF (dst_dr))
+  != reverse_storage_order_for_component_p (DR_REF (src_dr)))
+return;
+
   /* Load and store in loop nest must access memory in the same way, i.e,
  their must have the same steps in each loop of the nest.  */
   if (dst_steps.length () != src_steps.length ())
--  { dg-do run }
--  { dg-options "-O2" }

with System;

procedure SSO18 is

  type Arr is array (1..32) of Short_Integer;
  type Rev_Arr is array (1..32) of Short_Integer
with Scalar_Storage_Order => System.High_Order_First;
  C : constant Arr := (others => 16);
  RA : Rev_Arr;
  A  : Arr;

begin
  RA := Rev_Arr(C);
  A := Arr (RA);
  if A /= C or else RA(1) /= 16 then
 raise Program_Error;
  end if;
end;


Re: [PATCH 1/8]middle-end: Recognize scalar reductions from bitfields and array_refs

2022-11-22 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Biener 
>> Sent: Tuesday, November 22, 2022 10:59 AM
>> To: Richard Sandiford 
>> Cc: Tamar Christina via Gcc-patches ; Tamar
>> Christina ; Richard Biener
>> ; nd 
>> Subject: Re: [PATCH 1/8]middle-end: Recognize scalar reductions from
>> bitfields and array_refs
>>
>> On Tue, 22 Nov 2022, Richard Sandiford wrote:
>>
>> > Tamar Christina via Gcc-patches  writes:
>> > >> So it's not easily possible within the current infrastructure.  But
>> > >> it does look like ARM might eventually benefit from something like STV
>> on x86?
>> > >>
>> > >
>> > > I'm not sure.  The problem with trying to do this in RTL is that
>> > > you'd have to be able to decide from two pseudos whether they come
>> > > from extracts that are sequential. When coming in from a hard
>> > > register that's easy yes.  When coming in from a load, or any other
>> operation that produces pseudos that becomes harder.
>> >
>> > Yeah.
>> >
>> > Just in case anyone reading the above is tempted to implement STV for
>> > AArch64: I think it would set a bad precedent if we had a
>> > paste-&-adjust version of the x86 pass.  AFAIK, the target
>> > capabilities and constraints are mostly modelled correctly using
>> > existing mechanisms, so I don't think there's anything particularly
>> > target-specific about the process of forcing things to be on the general or
>> SIMD/FP side.
>> >
>> > So if we did have an STV-ish thing for AArch64, I think it should be a
>> > target-independent pass that uses hooks and recog, even if the pass is
>> > initially enabled for AArch64 only.
>>
>> Agreed - maybe some of the x86 code can be leveraged, but of course the
>> cost modeling is the most difficult to get right - IIRC the x86 backend 
>> resorts
>> to backend specific tuning flags rather than trying to get rtx_cost or 
>> insn_cost
>> "correct" here.
>>
>> > (FWIW, on the patch itself, I tend to agree that this is really an SLP
>> > optimisation.  If the vectoriser fails to see the benefit, or if it
>> > fails to handle more complex cases, then it would be good to try to
>> > fix that.)
>>
>> Also agreed - but costing is hard ;)
>
> I guess, I still disagree here but I've clearly been out-Richard.  The 
> problem is still
> that this is just basic codegen.  I still don't think it requires -O2 to be 
> usable.
>
> So I guess the only correct implementation is to use an STV-like patch.  But 
> given
> that this is already the second attempt, first RTL one was rejected by 
> Richard,
> second GIMPLE one was rejected by Richi I'd like to get an agreement on this 
> STV
> thing before I waste months more..

I don't think this in itself is a good motivation for STV.  My comment
above was more about the idea of STV for AArch64 in general (since it
had been raised).

Personally I still think the reduction should be generated in gimple.

Thanks,
Richard


Re: [PATCH 1/8]middle-end: Recognize scalar reductions from bitfields and array_refs

2022-11-22 Thread Richard Biener via Gcc-patches
On Tue, 22 Nov 2022, Richard Sandiford wrote:

> Tamar Christina  writes:
> >> -Original Message-
> >> From: Richard Biener 
> >> Sent: Tuesday, November 22, 2022 10:59 AM
> >> To: Richard Sandiford 
> >> Cc: Tamar Christina via Gcc-patches ; Tamar
> >> Christina ; Richard Biener
> >> ; nd 
> >> Subject: Re: [PATCH 1/8]middle-end: Recognize scalar reductions from
> >> bitfields and array_refs
> >>
> >> On Tue, 22 Nov 2022, Richard Sandiford wrote:
> >>
> >> > Tamar Christina via Gcc-patches  writes:
> >> > >> So it's not easily possible within the current infrastructure.  But
> >> > >> it does look like ARM might eventually benefit from something like STV
> >> on x86?
> >> > >>
> >> > >
> >> > > I'm not sure.  The problem with trying to do this in RTL is that
> >> > > you'd have to be able to decide from two pseudos whether they come
> >> > > from extracts that are sequential. When coming in from a hard
> >> > > register that's easy yes.  When coming in from a load, or any other
> >> operation that produces pseudos that becomes harder.
> >> >
> >> > Yeah.
> >> >
> >> > Just in case anyone reading the above is tempted to implement STV for
> >> > AArch64: I think it would set a bad precedent if we had a
> >> > paste-&-adjust version of the x86 pass.  AFAIK, the target
> >> > capabilities and constraints are mostly modelled correctly using
> >> > existing mechanisms, so I don't think there's anything particularly
> >> > target-specific about the process of forcing things to be on the general 
> >> > or
> >> SIMD/FP side.
> >> >
> >> > So if we did have an STV-ish thing for AArch64, I think it should be a
> >> > target-independent pass that uses hooks and recog, even if the pass is
> >> > initially enabled for AArch64 only.
> >>
> >> Agreed - maybe some of the x86 code can be leveraged, but of course the
> >> cost modeling is the most difficult to get right - IIRC the x86 backend 
> >> resorts
> >> to backend specific tuning flags rather than trying to get rtx_cost or 
> >> insn_cost
> >> "correct" here.
> >>
> >> > (FWIW, on the patch itself, I tend to agree that this is really an SLP
> >> > optimisation.  If the vectoriser fails to see the benefit, or if it
> >> > fails to handle more complex cases, then it would be good to try to
> >> > fix that.)
> >>
> >> Also agreed - but costing is hard ;)
> >
> > I guess, I still disagree here but I've clearly been out-Richard.  The 
> > problem is still
> > that this is just basic codegen.  I still don't think it requires -O2 to be 
> > usable.
> >
> > So I guess the only correct implementation is to use an STV-like patch.  
> > But given
> > that this is already the second attempt, first RTL one was rejected by 
> > Richard,
> > second GIMPLE one was rejected by Richi I'd like to get an agreement on 
> > this STV
> > thing before I waste months more..
> 
> I don't think this in itself is a good motivation for STV.  My comment
> above was more about the idea of STV for AArch64 in general (since it
> had been raised).
> 
> Personally I still think the reduction should be generated in gimple.

I agree, and the proper place to generate the reduction is in SLP.

Richard.


Re: [PATCH] aarch64: Fix test_dfp_17.c for big-endian [PR 107604]

2022-11-22 Thread Richard Earnshaw via Gcc-patches




On 22/11/2022 09:01, Christophe Lyon via Gcc-patches wrote:

gcc.target/aarch64/aapcs64/test_dfp_17.c has been failing on
big-endian, because the _Decimal32 on-stack argument is not padded in
the same direction depending on endianness.

This patch fixes the testcase so that it expects the argument in the
right stack location, similarly to what other tests do in the same
directory.

gcc/testsuite/ChangeLog:

PR target/107604
* gcc.target/aarch64/aapcs64/test_dfp_17.c: Fix for big-endian.
---
  gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c | 4 ++++
  1 file changed, 4 insertions(+)

diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
index 22dc462bf7c..3c45f715cf7 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
@@ -32,6 +32,10 @@ struct z b = { 9.0dd, 10.0dd, 11.0dd, 12.0dd };
ANON(struct z, a, D1)
ANON(struct z, b, STACK)
ANON(int , 5, W0)
+#ifndef __AAPCS64_BIG_ENDIAN__
ANON(_Decimal32, f1, STACK+32) /* Note: no promotion to _Decimal64.  */
+#else
+  ANON(_Decimal32, f1, STACK+36) /* Note: no promotion to _Decimal64.  */
+#endif
LAST_ANON(_Decimal64, 0.5dd, STACK+40)
  #endif


Why would a Decimal32 change stack placement based on the endianness? 
Isn't it a 4-byte object?


Re: [PATCH RFA(configure)] c++: provide strchrnul on targets without it [PR107781]

2022-11-22 Thread Jakub Jelinek via Gcc-patches
On Tue, Nov 22, 2022 at 09:41:24AM +0100, Jakub Jelinek via Gcc-patches wrote:
> On Mon, Nov 21, 2022 at 06:31:47PM -0500, Jason Merrill via Gcc-patches wrote:
> > Tested x86_64-pc-linux-gnu, and also manually changing the 
> > HAVE_DECL_STRCHRNUL
> > flag.  OK for trunk?
> > 
> > -- 8< --
> > 
> > The Contracts implementation uses strchrnul, which is a glibc extension, so
> > bootstrap broke on non-glibc targets.  I considered unconditionally using a
> > local definition, but I guess we might as well use the libc version if it
> > exists.
> > 
> > PR c++/107781
> > 
> > gcc/cp/ChangeLog:
> > 
> > * contracts.cc (strchrnul): Define if needed.
> > 
> > gcc/ChangeLog:
> > 
> > * configure.ac: Check for strchrnul.
> > * config.in, configure: Regenerate.
> 
> Normally we'd add such a local definition to libiberty, shouldn't we do it
> in this case too?

Or use strcspn as Jonathan posted in the PR, at least glibc will handle
it as strchrnul (start, reject[0]) - start early in the strcspn
implementation.

Jakub



Re: [PATCH] aarch64: Fix test_dfp_17.c for big-endian [PR 107604]

2022-11-22 Thread Richard Sandiford via Gcc-patches
Richard Earnshaw via Gcc-patches  writes:
> On 22/11/2022 09:01, Christophe Lyon via Gcc-patches wrote:
>> gcc.target/aarch64/aapcs64/test_dfp_17.c has been failing on
>> big-endian, because the _Decimal32 on-stack argument is not padded in
>> the same direction depending on endianness.
>> 
>> This patch fixes the testcase so that it expects the argument in the
>> right stack location, similarly to what other tests do in the same
>> directory.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  PR target/107604
>>  * gcc.target/aarch64/aapcs64/test_dfp_17.c: Fix for big-endian.
>> ---
>>   gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c | 4 
>>   1 file changed, 4 insertions(+)
>> 
>> diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
>> index 22dc462bf7c..3c45f715cf7 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
>> @@ -32,6 +32,10 @@ struct z b = { 9.0dd, 10.0dd, 11.0dd, 12.0dd };
>> ANON(struct z, a, D1)
>> ANON(struct z, b, STACK)
>> ANON(int , 5, W0)
>> +#ifndef __AAPCS64_BIG_ENDIAN__
>> ANON(_Decimal32, f1, STACK+32) /* Note: no promotion to _Decimal64.  */
>> +#else
>> +  ANON(_Decimal32, f1, STACK+36) /* Note: no promotion to _Decimal64.  */
>> +#endif
>> LAST_ANON(_Decimal64, 0.5dd, STACK+40)
>>   #endif
>
> Why would a Decimal32 change stack placement based on the endianness? 
> Isn't it a 4-byte object?

Yes, but PARM_BOUNDARY (64) sets a minimum alignment for all stack arguments.

Richard


Re: [PATCH] Fix wrong array type conversion with different storage order

2022-11-22 Thread Richard Biener via Gcc-patches
On Tue, Nov 22, 2022 at 12:06 PM Eric Botcazou via Gcc-patches
 wrote:
>
> Hi,
>
> when two arrays of scalars have a different storage order in Ada, the
> front-end makes sure that the conversion is performed component-wise
> so that each component can be reversed.  So it's a little bit counter
> productive that the ldist pass performs the opposite transformation
> and synthesizes a memcpy/memmove in this case.
>
> Tested on x86-64/Linux, OK for the mainline?

OK for trunk and branches.

Richard.

>
> 2022-11-22  Eric Botcazou  
>
> * tree-loop-distribution.cc 
> (loop_distribution::classify_builtin_ldst):
> Bail out if source and destination do not have the same storage order.
>
>
> 2022-11-22  Eric Botcazou  
>
> * gnat.dg/sso18.adb: New test.
>
> --
> Eric Botcazou


Re: [PATCH] aarch64: Fix test_dfp_17.c for big-endian [PR 107604]

2022-11-22 Thread Richard Earnshaw via Gcc-patches




On 22/11/2022 11:21, Richard Sandiford wrote:

Richard Earnshaw via Gcc-patches  writes:

On 22/11/2022 09:01, Christophe Lyon via Gcc-patches wrote:

gcc.target/aarch64/aapcs64/test_dfp_17.c has been failing on
big-endian, because the _Decimal32 on-stack argument is not padded in
the same direction depending on endianness.

This patch fixes the testcase so that it expects the argument in the
right stack location, similarly to what other tests do in the same
directory.

gcc/testsuite/ChangeLog:

PR target/107604
* gcc.target/aarch64/aapcs64/test_dfp_17.c: Fix for big-endian.
---
   gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c | 4 
   1 file changed, 4 insertions(+)

diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
index 22dc462bf7c..3c45f715cf7 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
@@ -32,6 +32,10 @@ struct z b = { 9.0dd, 10.0dd, 11.0dd, 12.0dd };
 ANON(struct z, a, D1)
 ANON(struct z, b, STACK)
 ANON(int , 5, W0)
+#ifndef __AAPCS64_BIG_ENDIAN__
 ANON(_Decimal32, f1, STACK+32) /* Note: no promotion to _Decimal64.  */
+#else
+  ANON(_Decimal32, f1, STACK+36) /* Note: no promotion to _Decimal64.  */
+#endif
 LAST_ANON(_Decimal64, 0.5dd, STACK+40)
   #endif


Why would a Decimal32 change stack placement based on the endianness?
Isn't it a 4-byte object?


Yes, but PARM_BOUNDARY (64) sets a minimum alignment for all stack arguments.

Richard


Ah, OK.

I wonder if we should have a new macro in the tests, something like 
ANON_PADDED to describe this case and that works things out more 
automagically for big-endian.


I notice the new ANON definition is not correctly indented.

R.


Re: [PATCH 1/2] Fortran: Cleanup struct ext_attr_t

2022-11-22 Thread Mikael Morin

Le 21/11/2022 à 21:34, Bernhard Reutner-Fischer a écrit :

On Mon, 21 Nov 2022 12:08:20 +0100
Mikael Morin  wrote:


* gfortran.h (struct ext_attr_t): Remove middle_end_name.
* trans-decl.cc (add_attributes_to_decl): Move building
tree_list to ...
* decl.cc (gfc_match_gcc_attributes): ... here. Add the attribute to
the tree_list for the middle end.
   

I prefer to not do any middle-end stuff at parsing time, so I would
rather not do this change.
Not OK.


Ok, that means we should filter out those bits that we don't want to
write to the module (right?). We've plenty of bits left, more than Dave
Love would want to have added, I hope, so that should not be much of a
concern.

I didn't think of modules.  Yes, that means we have to store (in memory) 
the attribute we have parsed, and we can filter-out the attributes at 
the time the attributes are written to the module.  I don't think it is 
strictly necessary (for flatten, at least) though.



What that table really wants to say is whether or not this attribute
should be passed to the ME. Would it be acceptable to remove these
duplicate strings and just have a bool/char/int that is true if it
should be lowered (in trans-decl, as before)? But now i admit it's just
bikeshedding and we can as well leave it alone for now.. It was just a
though.


Yes, that would be acceptable.


Re: [PATCH 1/2] symtab: also change RTL decl name

2022-11-22 Thread Jan Hubicka via Gcc-patches
> On Mon, 21 Nov 2022 20:02:49 +0100
> Jan Hubicka  wrote:
> 
> > > Hi Honza, Ping.
> > > Regtests cleanly for c,fortran,c++,ada,d,go,lto,objc,obj-c++
> > > Ok?
> > > I'd need this for attribute target_clones for the Fortran FE.  
> > Sorry for delay here.
> > > >  void
> > > > @@ -303,6 +301,10 @@ symbol_table::change_decl_assembler_name (tree 
> > > > decl, tree name)
> > > > warning (0, "%qD renamed after being referenced in assembly", 
> > > > decl);
> > > >  
> > > >SET_DECL_ASSEMBLER_NAME (decl, name);
> > > > +  /* Set the new name in rtl.  */
> > > > +  if (DECL_RTL_SET_P (decl))
> > > > +   XSTR (XEXP (DECL_RTL (decl), 0), 0) = IDENTIFIER_POINTER 
> > > > (name);  
> > 
> > I am not quite sure how safe this is.  We generally produce DECL_RTL
> > when we produce assembly file.  So if DECL_RTL is set then we probably
> > already output the original function name and it is too late to change
> > it.
> 
> AFAICS we make_decl_rtl in the fortran FE in trans_function_start.

I see, it may be a relic of something that is no longer necessary.  Can
you see why one needs DECL_RTL so early?
> 
> > 
> > Also RTL is shared so changing it in-place is going to rewrite all the
> > existing RTL expressions using it.
> > 
> > Why the DECL_RTL is produced for function you want to rename?
> 
> I think the fortran FE sets it quite early when lowering a function.
> Later, when the ME creates the target_clones, it wants to rename the
> initial function to initial_fun.default for the default target.
> That's where the change_decl_assembler_name is called (only on the
> decl).
> But nobody changes the RTL name, so the ifunc (which should be the
> initial, unchanged name) is properly emitted but
> assemble_start_function uses the same, unchanged, initial fnname it
> later obtains by get_fnname_from_decl which fetches the (wrong) initial
> name where it should use the .default target name.
> See
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605081.html
> 
> I'm open to other suggestions to make this work in a different way, of
> course. Maybe we're missing some magic somewhere that might share the
> name between the fndecl and the RTL XSTR so the RTL is magically
> updated by that single SET_DECL_ASSEMBLER_NAME in
> change_decl_assembler_name? But i didn't quite see where that'd be?

I think we should start by understanding why Fortran FE produces
DECL_RTL early.  It was written before symbol table code emerged, it may
be simply an oversight I made while converting FE to symbol table.

Honza
> 
> thanks,
> 
> > Honza
> > > > +
> > > >if (alias)
> > > > {
> > > >   IDENTIFIER_TRANSPARENT_ALIAS (name) = 1;  
> > >   
> 


Re: [PATCH] maintainer-scripts/gcc_release: compress xz in parallel

2022-11-22 Thread Richard Sandiford via Gcc-patches
Sam James via Gcc-patches  writes:
>> On 8 Nov 2022, at 07:14, Sam James  wrote:
>> 
>> 1. This should speed up decompression for folks, as parallel xz
>>   creates a different archive which can be decompressed in parallel.
>> 
>>   Note that this different method is enabled by default in a new
>>   xz release coming shortly anyway (>= 5.3.3_alpha1).
>> 
>>   I build GCC regularly from the weekly snapshots
>>   and so the decompression time adds up.
>> 
>> 2. It should speed up compression on the webserver a bit.
>> 
>>   Note that -T0 won't be the default in the new xz release,
>>   only the parallel compression mode (which enables parallel
>>   decompression).
>> 
>>   -T0 detects the number of cores available.
>> 
>>   So, if a different number of threads is preferred, it's fine
>>   to set e.g. -T2, etc.
>> 
>> Signed-off-by: Sam James 
>> ---
>> maintainer-scripts/gcc_release | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> 
>
> Given no disagreements, anyone fancy pushing
> this in time for Sunday evening for the next 13
> snapshot? ;)

I didn't see an explicit ACK or NACK, but it looks good to me.  I'll push
tomorrow if there are no objections before then.

Thanks,
Richard


[PATCH] c: Propagate erroneous types to declaration specifiers [PR107805]

2022-11-22 Thread Florian Weimer via Gcc-patches
Without this change, finish_declspecs cannot tell that whether there
was an erroneous type specified, or no type at all.  This may result
in additional diagnostics for implicit ints, or missing diagnostics
for multiple types.

PR c/107805

gcc/c/
* c-decl.cc (declspecs_add_type): Propagate error_mark_node
from type to specs.

gcc/testsuite/
* gcc.dg/pr107805-1.c: New test.
* gcc.dg/pr107805-2.c: Likewise.

---
Note regarding testing: I bootstrap with c,c++,lto on x86-64
(non-multilib) and diffed these .sum files:

gcc/testsuite/gcc/gcc.sum
gcc/testsuite/g++/g++.sum
x86_64-pc-linux-gnu/libgomp/testsuite/libgomp.sum
x86_64-pc-linux-gnu/libstdc++-v3/testsuite/libstdc++.sum
x86_64-pc-linux-gnu/libatomic/testsuite/libatomic.sum
x86_64-pc-linux-gnu/libitm/testsuite/libitm.sum

Apart from timestamps, the only differences I get is this change:

--- ./gcc/testsuite/gcc/gcc.sum 2022-11-22 05:45:33.813264761 -0500
+++ /tmp/b/build/./gcc/testsuite/gcc/gcc.sum 2022-11-22 06:39:10.667590185 -0500
@@ -83303,6 +83303,11 @@
 PASS: gcc.dg/pr107618.c  (test for bogus messages, line 9)
 PASS: gcc.dg/pr107618.c (test for excess errors)
 PASS: gcc.dg/pr107686.c (test for excess errors)
+PASS: gcc.dg/pr107805-1.c  (test for errors, line 3)
+PASS: gcc.dg/pr107805-1.c (test for excess errors)
+PASS: gcc.dg/pr107805-2.c  (test for errors, line 3)
+PASS: gcc.dg/pr107805-2.c  (test for errors, line 4)
+PASS: gcc.dg/pr107805-2.c (test for excess errors)
 PASS: gcc.dg/pr11459-1.c (test for excess errors)
 PASS: gcc.dg/pr11492.c  (test for bogus messages, line 8)
 PASS: gcc.dg/pr11492.c (test for excess errors)
@@ -190486,7 +190491,7 @@
 
=== gcc Summary ===
 
-# of expected passes   185932
+# of expected passes   185937
 # of unexpected failures   99
 # of unexpected successes  20
 # of expected failures 1484

So I think this means there are no test suite regressions.

Thanks,
Florian

 gcc/c/c-decl.cc   | 6 ++
 gcc/testsuite/gcc.dg/pr107805-1.c | 5 +
 gcc/testsuite/gcc.dg/pr107805-2.c | 4 
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 098e475f65d..4adb89e4aaf 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -12243,11 +12243,9 @@ declspecs_add_type (location_t loc, struct c_declspecs *specs,
 error_at (loc, "two or more data types in declaration specifiers");
   else if (TREE_CODE (type) == TYPE_DECL)
 {
-  if (TREE_TYPE (type) == error_mark_node)
-   ; /* Allow the type to default to int to avoid cascading errors.  */
-  else
+  specs->type = TREE_TYPE (type);
+  if (TREE_TYPE (type) != error_mark_node)
{
- specs->type = TREE_TYPE (type);
  specs->decl_attr = DECL_ATTRIBUTES (type);
  specs->typedef_p = true;
  specs->explicit_signed_p = C_TYPEDEF_EXPLICITLY_SIGNED (type);
diff --git a/gcc/testsuite/gcc.dg/pr107805-1.c b/gcc/testsuite/gcc.dg/pr107805-1.c
new file mode 100644
index 000..559b6a5586e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr107805-1.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+typedef int t;
+typedef struct { double a; int b; } t; /* { dg-error "conflicting types" } */
+t x; /* No warning here.  */
+
diff --git a/gcc/testsuite/gcc.dg/pr107805-2.c b/gcc/testsuite/gcc.dg/pr107805-2.c
new file mode 100644
index 000..fa5fa4ce273
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr107805-2.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+typedef int t;
+typedef struct { double a; int b; } t; /* { dg-error "conflicting types" } */
+t char x; /* { dg-error "two or more data types" } */

base-commit: e4faee8d02ec5d65bf418612f7181823eb08c078



[COMMITTED] ada: Fix recent assertion failure on GPR2

2022-11-22 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

It's the compiler trying to load the nonexistent body of a generic package
when trying to inline a call to an expression function of this package that
has a pre or post-condition (hence the need for -gnata to trigger the ICE).

gcc/ada/

* contracts.adb (Build_Subprogram_Contract_Wrapper): Do not fiddle
with the Was_Expression_Function flag. Move a few lines around.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/contracts.adb | 18 --
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/gcc/ada/contracts.adb b/gcc/ada/contracts.adb
index fef3d24870f..6f474eb2944 100644
--- a/gcc/ada/contracts.adb
+++ b/gcc/ada/contracts.adb
@@ -1691,6 +1691,10 @@ package body Contracts is
   Set_Debug_Info_Needed  (Wrapper_Id);
   Set_Wrapped_Statements (Subp_Id, Wrapper_Id);
 
+  Set_Has_Pragma_Inline (Wrapper_Id, Has_Pragma_Inline (Subp_Id));
+  Set_Has_Pragma_Inline_Always
+(Wrapper_Id, Has_Pragma_Inline_Always (Subp_Id));
+
   --  Create specification and declaration for the wrapper
 
   if No (Ret_Type) or else Ret_Type = Standard_Void_Type then
@@ -1719,20 +1723,6 @@ package body Contracts is
 Make_Handled_Sequence_Of_Statements (Loc,
   End_Label  => Make_Identifier (Loc, Chars (Wrapper_Id)));
 
-  --  Move certain flags which are relevant to the body
-
-  --  Wouldn't a better way be to perform some sort of copy of Body_Decl
-  --  for Wrapper_Body be less error-prone ???
-
-  if Was_Expression_Function (Body_Decl) then
- Set_Was_Expression_Function (Body_Decl, False);
- Set_Was_Expression_Function (Wrapper_Body);
-  end if;
-
-  Set_Has_Pragma_Inline (Wrapper_Id, Has_Pragma_Inline (Subp_Id));
-  Set_Has_Pragma_Inline_Always
-(Wrapper_Id, Has_Pragma_Inline_Always (Subp_Id));
-
   --  Prepend a call to the wrapper when the subprogram is a procedure
 
   if No (Ret_Type) or else Ret_Type = Standard_Void_Type then
-- 
2.34.1



[COMMITTED] ada: Fix formatting glitches in Make_Tag_Assignment

2022-11-22 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

gcc/ada/

* exp_ch3.adb (Make_Tag_Assignment): Fix formatting glitches.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch3.adb | 40 +---
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
index 7b194bb9816..2661a3ff9f6 100644
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -11769,37 +11769,39 @@ package body Exp_Ch3 is
 
function Make_Tag_Assignment (N : Node_Id) return Node_Id is
   Loc  : constant Source_Ptr := Sloc (N);
-  Def_If   : constant Entity_Id := Defining_Identifier (N);
-  Expr : constant Node_Id := Expression (N);
-  Typ  : constant Entity_Id := Etype (Def_If);
-  Full_Typ : constant Entity_Id := Underlying_Type (Typ);
+  Def_If   : constant Entity_Id  := Defining_Identifier (N);
+  Expr : constant Node_Id:= Expression (N);
+  Typ  : constant Entity_Id  := Etype (Def_If);
+  Full_Typ : constant Entity_Id  := Underlying_Type (Typ);
+
   New_Ref  : Node_Id;
 
begin
-  --  This expansion activity is called during analysis.
+  --  This expansion activity is called during analysis
 
   if Is_Tagged_Type (Typ)
-   and then not Is_Class_Wide_Type (Typ)
-   and then not Is_CPP_Class (Typ)
-   and then Tagged_Type_Expansion
-   and then Nkind (Expr) /= N_Aggregate
-   and then (Nkind (Expr) /= N_Qualified_Expression
-  or else Nkind (Expression (Expr)) /= N_Aggregate)
+and then not Is_Class_Wide_Type (Typ)
+and then not Is_CPP_Class (Typ)
+and then Tagged_Type_Expansion
+and then Nkind (Expr) /= N_Aggregate
+and then (Nkind (Expr) /= N_Qualified_Expression
+   or else Nkind (Expression (Expr)) /= N_Aggregate)
   then
  New_Ref :=
Make_Selected_Component (Loc,
-  Prefix=> New_Occurrence_Of (Def_If, Loc),
-  Selector_Name =>
-New_Occurrence_Of (First_Tag_Component (Full_Typ), Loc));
+ Prefix=> New_Occurrence_Of (Def_If, Loc),
+ Selector_Name =>
+   New_Occurrence_Of (First_Tag_Component (Full_Typ), Loc));
+
  Set_Assignment_OK (New_Ref);
 
  return
Make_Assignment_Statement (Loc,
-  Name   => New_Ref,
-  Expression =>
-Unchecked_Convert_To (RTE (RE_Tag),
-  New_Occurrence_Of (Node
-  (First_Elmt (Access_Disp_Table (Full_Typ))), Loc)));
+ Name   => New_Ref,
+ Expression =>
+   Unchecked_Convert_To (RTE (RE_Tag),
+ New_Occurrence_Of
+   (Node (First_Elmt (Access_Disp_Table (Full_Typ))), Loc)));
   else
  return Empty;
   end if;
-- 
2.34.1



[COMMITTED] ada: Adjust number of errors when removing warning in dead code

2022-11-22 Thread Marc Poulhiès via Gcc-patches
From: Piotr Trojanek 

When a warning about a runtime exception is emitted for code in a
generic instance, we add continuation warnings "in instantiation ..."
and only the original message increases the total number of errors.

When removing these messages, e.g. after detecting that the code inside
the generic instance is dead, we must decrease the total number of errors,
as otherwise the compiler exit status might stop gnatmake or gprbuild.

gcc/ada/

* errout.adb (To_Be_Removed): Decrease total number of errors when
removing a warning that has been escalated into error.
* erroutc.adb (dmsg): Print Warn_Runtime_Raise flag.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/errout.adb  | 11 +++
 gcc/ada/erroutc.adb | 35 ++-
 2 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/gcc/ada/errout.adb b/gcc/ada/errout.adb
index afa30674fa3..b30e8b51d15 100644
--- a/gcc/ada/errout.adb
+++ b/gcc/ada/errout.adb
@@ -3351,6 +3351,17 @@ package body Errout is
   Warning_Info_Messages := Warning_Info_Messages - 1;
end if;
 
+   --  When warning about a runtime exception has been escalated
+   --  into error, the starting message has increased the total
+   --  errors counter, so here we decrease this counter.
+
+   if Errors.Table (E).Warn_Runtime_Raise
+ and then not Errors.Table (E).Msg_Cont
+ and then Warning_Mode = Treat_Run_Time_Warnings_As_Errors
+   then
+  Total_Errors_Detected := Total_Errors_Detected - 1;
+   end if;
+
return True;
 
 --  No removal required
diff --git a/gcc/ada/erroutc.adb b/gcc/ada/erroutc.adb
index 7766c972730..d40c668be8a 100644
--- a/gcc/ada/erroutc.adb
+++ b/gcc/ada/erroutc.adb
@@ -312,32 +312,33 @@ package body Erroutc is
 
begin
   w ("Dumping error message, Id = ", Int (Id));
-  w ("  Text = ", E.Text.all);
-  w ("  Next = ", Int (E.Next));
-  w ("  Prev = ", Int (E.Prev));
-  w ("  Sfile= ", Int (E.Sfile));
+  w ("  Text   = ", E.Text.all);
+  w ("  Next   = ", Int (E.Next));
+  w ("  Prev   = ", Int (E.Prev));
+  w ("  Sfile  = ", Int (E.Sfile));
 
   Write_Str
-("  Sptr = ");
+("  Sptr   = ");
   Write_Location (E.Sptr.Ptr);  --  ??? Do not write the full span for now
   Write_Eol;
 
   Write_Str
-("  Optr = ");
+("  Optr   = ");
   Write_Location (E.Optr.Ptr);
   Write_Eol;
 
-  w ("  Line = ", Int (E.Line));
-  w ("  Col  = ", Int (E.Col));
-  w ("  Warn = ", E.Warn);
-  w ("  Warn_Err = ", E.Warn_Err);
-  w ("  Warn_Chr = '" & E.Warn_Chr & ''');
-  w ("  Style= ", E.Style);
-  w ("  Serious  = ", E.Serious);
-  w ("  Uncond   = ", E.Uncond);
-  w ("  Msg_Cont = ", E.Msg_Cont);
-  w ("  Deleted  = ", E.Deleted);
-  w ("  Node = ", Int (E.Node));
+  w ("  Line   = ", Int (E.Line));
+  w ("  Col= ", Int (E.Col));
+  w ("  Warn   = ", E.Warn);
+  w ("  Warn_Err   = ", E.Warn_Err);
+  w ("  Warn_Runtime_Raise = ", E.Warn_Runtime_Raise);
+  w ("  Warn_Chr   = '" & E.Warn_Chr & ''');
+  w ("  Style  = ", E.Style);
+  w ("  Serious= ", E.Serious);
+  w ("  Uncond = ", E.Uncond);
+  w ("  Msg_Cont   = ", E.Msg_Cont);
+  w ("  Deleted= ", E.Deleted);
+  w ("  Node   = ", Int (E.Node));
 
   Write_Eol;
end dmsg;
-- 
2.34.1



[COMMITTED] ada: Disable checking of Elab_Spec procedures in CodePeer_Mode

2022-11-22 Thread Marc Poulhiès via Gcc-patches
From: Ghjuvan Lacambre 

This commit re-enables the Validate_Subprogram_Calls check that had been
disabled in a previous commit, and makes said check skip over Elab_Spec
procedures in CodePeer_Mode.

gcc/ada/

* frontend.adb (Frontend): Re-enable Validate_Subprogram_Calls.
* exp_ch6.adb (Check_BIP_Actuals): When in CodePeer mode, do not
attempt to validate procedures coming from an
Elab_Spec/Elab_Body/Elab_Subp_Body procedure.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch6.adb  | 17 +
 gcc/ada/frontend.adb |  2 +-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/exp_ch6.adb b/gcc/ada/exp_ch6.adb
index a5dee38c55f..237a19d1327 100644
--- a/gcc/ada/exp_ch6.adb
+++ b/gcc/ada/exp_ch6.adb
@@ -1115,6 +1115,23 @@ package body Exp_Ch6 is
 | N_Function_Call
 | N_Procedure_Call_Statement);
 
+  --  In CodePeer_Mode, the tree for `'Elab_Spec` procedures will be
+  --  malformed because GNAT does not perform the usual expansion that
+  --  results in the importation of external elaboration procedure symbols.
+  --  This is expected: the CodePeer backend has special handling for this
+  --  malformed tree.
+  --  Thus, we do not need to check the tree (and in fact can't, because
+  --  it's malformed).
+
+  if CodePeer_Mode
+and then Nkind (Name (Subp_Call)) = N_Attribute_Reference
+and then Attribute_Name (Name (Subp_Call)) in Name_Elab_Spec
+| Name_Elab_Body
+| Name_Elab_Subp_Body
+  then
+ return True;
+  end if;
+
   Formal := First_Formal_With_Extras (Subp_Id);
   Actual := First_Actual (Subp_Call);
 
diff --git a/gcc/ada/frontend.adb b/gcc/ada/frontend.adb
index bc3da30b0cf..033ecf3b7be 100644
--- a/gcc/ada/frontend.adb
+++ b/gcc/ada/frontend.adb
@@ -531,7 +531,7 @@ begin
--  formals). It is invoked using pragma Debug to avoid adding any cost
--  when the compiler is built with assertions disabled.
 
-   if not Debug_Flag_Underscore_XX and then not CodePeer_Mode then
+   if not Debug_Flag_Underscore_XX then
   pragma Debug (Exp_Ch6.Validate_Subprogram_Calls (Cunit (Main_Unit)));
end if;
 
-- 
2.34.1



[COMMITTED] ada: Accept aspects Global and Depends on abstract subprograms

2022-11-22 Thread Marc Poulhiès via Gcc-patches
From: Piotr Trojanek 

Aspects Global and Depends are now allowed on abstract subprograms
(as substitutes for Global'Class and Depends'Class).

This patch implements the recently modified rules SPARK RM 6.1.2(2-3).
The behavior for Contract_Cases and aspects on null subprograms stays
as it was.
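A minimal sketch of the form that is now accepted (hypothetical names; assumes a SPARK-annotated unit):

```ada
package Sketch is
   G : Integer := 0;

   type T is abstract tagged null record;

   --  Previously, Global/Depends here required the 'Class variants;
   --  under the modified SPARK RM 6.1.2(2-3) rules the plain aspects
   --  are accepted on an abstract subprogram.
   procedure Op (X : T) is abstract
     with Global  => (In_Out => G),
          Depends => (G =>+ X);
end Sketch;
```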

gcc/ada/

* sem_prag.adb (Analyze_Depends_Global): Accept aspects on
abstract subprograms.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_prag.adb | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/ada/sem_prag.adb b/gcc/ada/sem_prag.adb
index f2c1a3f0e6e..0a91518cff9 100644
--- a/gcc/ada/sem_prag.adb
+++ b/gcc/ada/sem_prag.adb
@@ -4549,6 +4549,11 @@ package body Sem_Prag is
  elsif Nkind (Subp_Decl) = N_Single_Task_Declaration then
 null;
 
+ --  Abstract subprogram declaration
+
+ elsif Nkind (Subp_Decl) = N_Abstract_Subprogram_Declaration then
+null;
+
  --  Subprogram body acts as spec
 
  elsif Nkind (Subp_Decl) = N_Subprogram_Body
-- 
2.34.1



Re: [PATCH] d: respect --enable-link-mutex configure option

2022-11-22 Thread Iain Buclaw via Gcc-patches
Excerpts from Martin Liška's message of November 22, 2022 10:41 am:
> I noticed the option is ignored because @DO_LINK_MUTEX@
> is not defined in d/Make-lang.in.
> 
> Tested locally before and after the patch.
> 
> Ready to be installed?
> Thanks,
> Martin
> 

Fine on my end.  Thanks!

Iain.


Re: [PATCH] aarch64: Fix test_dfp_17.c for big-endian [PR 107604]

2022-11-22 Thread Christophe Lyon via Gcc-patches




On 11/22/22 12:33, Richard Earnshaw wrote:



On 22/11/2022 11:21, Richard Sandiford wrote:

Richard Earnshaw via Gcc-patches  writes:

On 22/11/2022 09:01, Christophe Lyon via Gcc-patches wrote:
gcc.target/aarch64/aapcs64/test_dfp_17.c has been failing on
big-endian, because the _Decimal32 on-stack argument is not padded
in the same direction depending on endianness.

This patch fixes the testcase so that it expects the argument
in the right stack location, similarly to what other tests do
in the same directory.

gcc/testsuite/ChangeLog:

	PR target/107604
	* gcc.target/aarch64/aapcs64/test_dfp_17.c: Fix for big-endian.
---
 gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
index 22dc462bf7c..3c45f715cf7 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
@@ -32,6 +32,10 @@ struct z b = { 9.0dd, 10.0dd, 11.0dd, 12.0dd };
 ANON(struct z, a, D1)
 ANON(struct z, b, STACK)
 ANON(int , 5, W0)
+#ifndef __AAPCS64_BIG_ENDIAN__
 ANON(_Decimal32, f1, STACK+32) /* Note: no promotion to _Decimal64.  */
+#else
+ANON(_Decimal32, f1, STACK+36) /* Note: no promotion to _Decimal64.  */
+#endif
 LAST_ANON(_Decimal64, 0.5dd, STACK+40)
 #endif


Why would a Decimal32 change stack placement based on the
endianness? Isn't it a 4-byte object?


Yes, but PARM_BOUNDARY (64) sets a minimum alignment for all stack
 arguments.

Richard


Ah, OK.
Indeed, it was not immediately obvious to me either, when looking at 
aarch64_layout_arg. aarch64_function_arg_padding comes into play, too.




I wonder if we should have a new macro in the tests, something like 
ANON_PADDED to describe this case and that works things out more 
automagically for big-endian.

Maybe. There are many other tests under aapcs64/ which have a similar
#ifndef __AAPCS64_BIG_ENDIAN__



I notice the new ANON definition is not correctly indented.

It looks OK on my side (2 spaces).

Thanks,

Christophe



R.


Re: [PATCH 2/2] Fortran: add attribute target_clones

2022-11-22 Thread Mikael Morin

On 21/11/2022 at 23:26, Bernhard Reutner-Fischer wrote:

On Mon, 21 Nov 2022 20:13:40 +0100
Mikael Morin  wrote:


Hello,

On 09/11/2022 at 20:02, Bernhard Reutner-Fischer via Fortran wrote:

Hi!


(...)

+  if (allow_multiple && gfc_match_char (')') != MATCH_YES)
+{
+  gfc_error ("expected ')' at %C");
+  return NULL_TREE;
+}
+
+  return attr_args;
+}

I'm not sure this function need to do all the parsing manually.
I would rather use gfc_match_actual_arglist, or maybe implement the
function as a wrapper around it.
What is allowed here?  Are non-literal constants allowed, for example
parameter variables?  Is line continuation supported ?


Line continuation is supported, I think.
Parameter variables supposedly are not, or should not be, supported. Why
would you do that in the context of an attribute target decl?  Either
way, if the ME does not find such an fndecl, it will complain and
ignore the attribute.
I don't understand non-literal constants in this context.
This very attribute applies to decls, so the existing code supposedly
matches a comma separated list of identifiers. The usual dollar-ok
caveats apply.

No, my comment and my questions were about your function, which, as I 
understand it, matches the arguments to the attribute: it matches open 
and closing parenthesis, double quotes, etc.

Matching of decl names comes after that.
I asked the question about non-literal constant (and the other as well), 
because I saw it as a possible reason to not reuse the existing parsing 
functions.



As to gfc_match_actual_arglist, probably.
target_clones has
+  { "target_clones",  1, -1, true, false, false, false,
+ dummy, NULL },
with tree-core.h struct attribute_spec, so
name, min=1, max=unbounded, decl_required=yes, ...ignore...

hence applies to functions and subroutines and the like. It does take an
unbounded list of strings, isa1, isa2, isa4, default. We could add
"default" unless seen, but I'd rather want it spelled out by the user,
for the user is supposed to know what she's doing, as in C or C++.
The ME has code to sanity-check the attributes, including conflicting
(ME) attributes.

The reason why i contemplated with a separate parser was that for stuff
like regparm or sseregparm, you would want to require a single number
for the equivalent of

__attribute__((regparm(3),stdcall))

which you would provide in 2 separate !GCC$ attributes i assume.

Well, the check could as easily be enforced after parsing with the 
existing parsing functions.




Nothing (bad) to say about the rest, but there is enough to change with
the above comments.


Yes, many thanks for your comments.
I think there is no other non-intrusive way to pass the data through the
frontend.  So an acceptable approach means touching quite a few spots
for every single ME attribute anybody would like to add in the future.


I'm not sure I understand.  Please let's just add what is necessary for 
this attribute, not more.




RE: [PATCH 2/2]AArch64 Support new tbranch optab.

2022-11-22 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: Tuesday, November 15, 2022 11:34 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> ; nd ; Marcus Shawcroft
> 
> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> 
> Tamar Christina  writes:
> >> -Original Message-
> >> From: Richard Sandiford 
> >> Sent: Tuesday, November 15, 2022 11:15 AM
> >> To: Tamar Christina 
> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> >> ; nd ; Marcus Shawcroft
> >> 
> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> >>
> >> Tamar Christina  writes:
> >> >> -Original Message-
> >> >> From: Richard Sandiford 
> >> >> Sent: Tuesday, November 15, 2022 10:51 AM
> >> >> To: Tamar Christina 
> >> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> >> >> ; nd ; Marcus
> Shawcroft
> >> >> 
> >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> >> >>
> >> >> Tamar Christina  writes:
> >> >> >> -Original Message-
> >> >> >> From: Richard Sandiford 
> >> >> >> Sent: Tuesday, November 15, 2022 10:36 AM
> >> >> >> To: Tamar Christina 
> >> >> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> >> >> >> ; nd ; Marcus
> >> Shawcroft
> >> >> >> 
> >> >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> >> >> >>
> >> >> >> Tamar Christina  writes:
> >> >> >> > Hello,
> >> >> >> >
> >> >> >> > Ping and updated patch.
> >> >> >> >
> >> >> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no
> issues.
> >> >> >> >
> >> >> >> > Ok for master?
> >> >> >> >
> >> >> >> > Thanks,
> >> >> >> > Tamar
> >> >> >> >
> >> >> >> > gcc/ChangeLog:
> >> >> >> >
> >> >> >> > * config/aarch64/aarch64.md (*tb1):
> >> >> >> > Rename
> >> to...
> >> >> >> > (*tb1): ... this.
> >> >> >> > (tbranch4): New.
> >> >> >> >
> >> >> >> > gcc/testsuite/ChangeLog:
> >> >> >> >
> >> >> >> > * gcc.target/aarch64/tbz_1.c: New test.
> >> >> >> >
> >> >> >> > --- inline copy of patch ---
> >> >> >> >
> >> >> >> > diff --git a/gcc/config/aarch64/aarch64.md
> >> >> >> > b/gcc/config/aarch64/aarch64.md index
> >> >> >> >
> >> >> >>
> >> >>
> >>
> 2bc2684b82c35a44e0a2cea6e3aaf32d939f8cdf..d7684c93fba5b717d568e1a4fd
> >> >> >> 71
> >> >> >> > 2bde55c7c72e 100644
> >> >> >> > --- a/gcc/config/aarch64/aarch64.md
> >> >> >> > +++ b/gcc/config/aarch64/aarch64.md
> >> >> >> > @@ -943,12 +943,29 @@ (define_insn "*cb1"
> >> >> >> >   (const_int 1)))]
> >> >> >> >  )
> >> >> >> >
> >> >> >> > -(define_insn "*tb1"
> >> >> >> > +(define_expand "tbranch4"
> >> >> >> >[(set (pc) (if_then_else
> >> >> >> > - (EQL (zero_extract:DI (match_operand:GPI 0
> >> "register_operand"
> >> >> >> "r")
> >> >> >> > -   (const_int 1)
> >> >> >> > -   (match_operand 1
> >> >> >> > - 
> >> >> >> > "aarch64_simd_shift_imm_" "n"))
> >> >> >> > +   (match_operator 0 "aarch64_comparison_operator"
> >> >> >> > +[(match_operand:ALLI 1 "register_operand")
> >> >> >> > + (match_operand:ALLI 2
> >> >> >> "aarch64_simd_shift_imm_")])
> >> >> >> > +   (label_ref (match_operand 3 "" ""))
> >> >> >> > +   (pc)))]
> >> >> >> > +  "optimize > 0"
> >> >> >>
> >> >> >> Why's the pattern conditional on optimize?  Seems a valid
> >> >> >> choice at -O0
> >> >> too.
> >> >> >>
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > I had explained the reason why in the original patch, just
> >> >> > didn't repeat it in
> >> >> the ping:
> >> >> >
> >> >> > Instead of emitting the instruction directly I've chosen to
> >> >> > expand the pattern using a zero extract and generating the
> >> >> > existing pattern for comparisons for two
> >> >> > reasons:
> >> >> >
> >> >> >   1. Allows for CSE of the actual comparison.
> >> >> >   2. It looks like the code in expand makes the label as unused
> >> >> > and removed
> >> >> it
> >> >> >  if it doesn't see a separate reference to it.
> >> >> >
> >> >> > Because of this expansion though I disable the pattern at -O0
> >> >> > since we
> >> >> have no combine in that case so we'd end up with worse code.  I
> >> >> did try emitting the pattern directly, but as mentioned in no#2
> >> >> expand would then kill the label.
> >> >> >
> >> >> > Basically I emit the pattern directly, immediately during expand
> >> >> > the label is
> >> >> marked as dead for some weird reason.
> >> >>
> >> >> Isn't #2 a bug though?  It seems like something we should fix
> >> >> rather than work around.
> >> >
> >> > Yes it's a bug ☹ ok if I'm going to fix that bug then do I need to
> >> > split the optabs still? Isn't the problem atm that I need the split?
> >> > If I'm emitting the instruction directly then the recog pattern for
> >> > it can just be (eq (vec_extract x 1) 0) which is the correct semantics?
> >>
> >> What rtx does the code that uses the optab pass for op

[PATCH] Remove ASSERT_EXPR.

2022-11-22 Thread Aldy Hernandez via Gcc-patches
This removes all uses of ASSERT_EXPR except the internal one in ipa-*.

OK pending tests?

gcc/ChangeLog:

* doc/gimple.texi: Remove ASSERT_EXPR references.
* fold-const.cc (tree_expr_nonzero_warnv_p): Same.
(fold_binary_loc): Same.
(tree_expr_nonnegative_warnv_p): Same.
* gimple-array-bounds.cc (get_base_decl): Same.
* gimple-pretty-print.cc (dump_unary_rhs): Same.
* gimple.cc (get_gimple_rhs_num_ops): Same.
* pointer-query.cc (handle_ssa_name): Same.
* tree-cfg.cc (verify_gimple_assign_single): Same.
* tree-pretty-print.cc (dump_generic_node): Same.
* tree-scalar-evolution.cc (scev_dfs::follow_ssa_edge_expr): Same.
(interpret_rhs_expr): Same.
* tree-ssa-operands.cc (operands_scanner::get_expr_operands): Same.
* tree-ssa-propagate.cc
(substitute_and_fold_dom_walker::before_dom_children): Same.
* tree-ssa-threadedge.cc: Same.
* tree-vrp.cc (overflow_comparison_p): Same.
* tree.def (ASSERT_EXPR): Add note.
* tree.h (ASSERT_EXPR_VAR): Remove.
(ASSERT_EXPR_COND): Remove.
* vr-values.cc (simplify_using_ranges::vrp_visit_cond_stmt):
Remove comment.
---
 gcc/doc/gimple.texi  |  3 +--
 gcc/fold-const.cc|  6 -
 gcc/gimple-array-bounds.cc   |  9 +---
 gcc/gimple-pretty-print.cc   |  1 -
 gcc/gimple.cc|  1 -
 gcc/pointer-query.cc |  6 -
 gcc/tree-cfg.cc  | 11 -
 gcc/tree-pretty-print.cc |  8 ---
 gcc/tree-scalar-evolution.cc | 15 -
 gcc/tree-ssa-operands.cc |  1 -
 gcc/tree-ssa-propagate.cc|  5 +
 gcc/tree-ssa-threadedge.cc   |  6 ++---
 gcc/tree-vrp.cc  |  7 +++---
 gcc/tree.def |  5 -
 gcc/tree.h   |  4 
 gcc/vr-values.cc | 43 
 16 files changed, 13 insertions(+), 118 deletions(-)

diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
index 7832fa6ff90..a4263922887 100644
--- a/gcc/doc/gimple.texi
+++ b/gcc/doc/gimple.texi
@@ -682,8 +682,7 @@ more than two slots on the RHS.  For instance, a 
@code{COND_EXPR}
 expression of the form @code{(a op b) ? x : y} could be flattened
 out on the operand vector using 4 slots, but it would also
 require additional processing to distinguish @code{c = a op b}
-from @code{c = a op b ? x : y}.  Something similar occurs with
-@code{ASSERT_EXPR}.   In time, these special case tree
+from @code{c = a op b ? x : y}.  In time, these special case tree
 expressions should be flattened into the operand vector.
 @end itemize
 
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index b89cac91cae..114258fa182 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -10751,7 +10751,6 @@ tree_expr_nonzero_warnv_p (tree t, bool 
*strict_overflow_p)
 case COND_EXPR:
 case CONSTRUCTOR:
 case OBJ_TYPE_REF:
-case ASSERT_EXPR:
 case ADDR_EXPR:
 case WITH_SIZE_EXPR:
 case SSA_NAME:
@@ -12618,10 +12617,6 @@ fold_binary_loc (location_t loc, enum tree_code code, 
tree type,
 : fold_convert_loc (loc, type, arg1);
   return tem;
 
-case ASSERT_EXPR:
-  /* An ASSERT_EXPR should never be passed to fold_binary.  */
-  gcc_unreachable ();
-
 default:
   return NULL_TREE;
 } /* switch (code) */
@@ -15117,7 +15112,6 @@ tree_expr_nonnegative_warnv_p (tree t, bool 
*strict_overflow_p, int depth)
 case COND_EXPR:
 case CONSTRUCTOR:
 case OBJ_TYPE_REF:
-case ASSERT_EXPR:
 case ADDR_EXPR:
 case WITH_SIZE_EXPR:
 case SSA_NAME:
diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
index 1eafd3fd3e1..eae49ab3910 100644
--- a/gcc/gimple-array-bounds.cc
+++ b/gcc/gimple-array-bounds.cc
@@ -75,14 +75,7 @@ get_base_decl (tree ref)
   if (gimple_assign_single_p (def))
{
  base = gimple_assign_rhs1 (def);
- if (TREE_CODE (base) != ASSERT_EXPR)
-   return base;
-
- base = TREE_OPERAND (base, 0);
- if (TREE_CODE (base) != SSA_NAME)
-   return base;
-
- continue;
+ return base;
}
 
   if (!gimple_nop_p (def))
diff --git a/gcc/gimple-pretty-print.cc b/gcc/gimple-pretty-print.cc
index 7ec079f15c6..af704257633 100644
--- a/gcc/gimple-pretty-print.cc
+++ b/gcc/gimple-pretty-print.cc
@@ -339,7 +339,6 @@ dump_unary_rhs (pretty_printer *buffer, const gassign *gs, 
int spc,
   switch (rhs_code)
 {
 case VIEW_CONVERT_EXPR:
-case ASSERT_EXPR:
   dump_generic_node (buffer, rhs, spc, flags, false);
   break;
 
diff --git a/gcc/gimple.cc b/gcc/gimple.cc
index 6c23dd77609..dd054e16453 100644
--- a/gcc/gimple.cc
+++ b/gcc/gimple.cc
@@ -2408,7 +2408,6 @@ get_gimple_rhs_num_ops (enum tree_code code)
   || (SYM) == BIT_INSERT_EXPR) ? GIMPLE_TERNARY_RHS
\
: ((SYM) == CONSTRUCTOR 

[PATCH] Remove follow_assert_exprs from overflow_comparison.

2022-11-22 Thread Aldy Hernandez via Gcc-patches
OK pending tests?

gcc/ChangeLog:

* tree-vrp.cc (overflow_comparison_p_1): Remove follow_assert_exprs.
(overflow_comparison_p): Remove use_equiv_p.
* tree-vrp.h (overflow_comparison_p): Same.
* vr-values.cc (vrp_evaluate_conditional_warnv_with_ops): Remove
use_equiv_p argument to overflow_comparison_p.
---
 gcc/tree-vrp.cc  | 40 
 gcc/tree-vrp.h   |  2 +-
 gcc/vr-values.cc |  2 +-
 3 files changed, 6 insertions(+), 38 deletions(-)

diff --git a/gcc/tree-vrp.cc b/gcc/tree-vrp.cc
index d29941d0f2d..3846dc1d849 100644
--- a/gcc/tree-vrp.cc
+++ b/gcc/tree-vrp.cc
@@ -679,7 +679,7 @@ range_fold_unary_expr (value_range *vr,
 
 static bool
 overflow_comparison_p_1 (enum tree_code code, tree op0, tree op1,
-bool follow_assert_exprs, bool reversed, tree *new_cst)
+bool reversed, tree *new_cst)
 {
   /* See if this is a relational operation between two SSA_NAMES with
  unsigned, overflow wrapping values.  If so, check it more deeply.  */
@@ -693,19 +693,6 @@ overflow_comparison_p_1 (enum tree_code code, tree op0, 
tree op1,
 {
   gimple *op1_def = SSA_NAME_DEF_STMT (op1);
 
-  /* If requested, follow any ASSERT_EXPRs backwards for OP1.  */
-  if (follow_assert_exprs)
-   {
- while (gimple_assign_single_p (op1_def)
-&& TREE_CODE (gimple_assign_rhs1 (op1_def)) == ASSERT_EXPR)
-   {
- op1 = TREE_OPERAND (gimple_assign_rhs1 (op1_def), 0);
- if (TREE_CODE (op1) != SSA_NAME)
-   break;
- op1_def = SSA_NAME_DEF_STMT (op1);
-   }
-   }
-
   /* Now look at the defining statement of OP1 to see if it adds
 or subtracts a nonzero constant from another operand.  */
   if (op1_def
@@ -716,24 +703,6 @@ overflow_comparison_p_1 (enum tree_code code, tree op0, 
tree op1,
{
  tree target = gimple_assign_rhs1 (op1_def);
 
- /* If requested, follow ASSERT_EXPRs backwards for op0 looking
-for one where TARGET appears on the RHS.  */
- if (follow_assert_exprs)
-   {
- /* Now see if that "other operand" is op0, following the chain
-of ASSERT_EXPRs if necessary.  */
- gimple *op0_def = SSA_NAME_DEF_STMT (op0);
- while (op0 != target
-&& gimple_assign_single_p (op0_def)
-&& TREE_CODE (gimple_assign_rhs1 (op0_def)) == ASSERT_EXPR)
-   {
- op0 = TREE_OPERAND (gimple_assign_rhs1 (op0_def), 0);
- if (TREE_CODE (op0) != SSA_NAME)
-   break;
- op0_def = SSA_NAME_DEF_STMT (op0);
-   }
-   }
-
  /* If we did not find our target SSA_NAME, then this is not
 an overflow test.  */
  if (op0 != target)
@@ -764,13 +733,12 @@ overflow_comparison_p_1 (enum tree_code code, tree op0, 
tree op1,
the alternate range representation is often useful within VRP.  */
 
 bool
-overflow_comparison_p (tree_code code, tree name, tree val,
-  bool use_equiv_p, tree *new_cst)
+overflow_comparison_p (tree_code code, tree name, tree val, tree *new_cst)
 {
-  if (overflow_comparison_p_1 (code, name, val, use_equiv_p, false, new_cst))
+  if (overflow_comparison_p_1 (code, name, val, false, new_cst))
 return true;
   return overflow_comparison_p_1 (swap_tree_comparison (code), val, name,
- use_equiv_p, true, new_cst);
+ true, new_cst);
 }
 
 /* Handle
diff --git a/gcc/tree-vrp.h b/gcc/tree-vrp.h
index 07630b5b1ca..127909604f0 100644
--- a/gcc/tree-vrp.h
+++ b/gcc/tree-vrp.h
@@ -39,7 +39,7 @@ extern enum value_range_kind intersect_range_with_nonzero_bits
 extern bool find_case_label_range (gswitch *, tree, tree, size_t *, size_t *);
 extern tree find_case_label_range (gswitch *, const irange *vr);
 extern bool find_case_label_index (gswitch *, size_t, tree, size_t *);
-extern bool overflow_comparison_p (tree_code, tree, tree, bool, tree *);
+extern bool overflow_comparison_p (tree_code, tree, tree, tree *);
 extern void maybe_set_nonzero_bits (edge, tree);
 
 #endif /* GCC_TREE_VRP_H */
diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
index 0347c29b216..b0dd30260ae 100644
--- a/gcc/vr-values.cc
+++ b/gcc/vr-values.cc
@@ -837,7 +837,7 @@ 
simplify_using_ranges::vrp_evaluate_conditional_warnv_with_ops
  occurs when the chosen argument is zero and does not occur if the
  chosen argument is not zero.  */
   tree x;
-  if (overflow_comparison_p (code, op0, op1, use_equiv_p, &x))
+  if (overflow_comparison_p (code, op0, op1, &x))
 {
   wide_int max = wi::max_value (TYPE_PRECISION (TREE_TYPE (op0)), 
UNSIGNED);
   /* B = A - 1; if (A < B) -> B = A - 1; if (A == 0)
-- 
2.38.1



[PATCH] Remove use_equiv_p in vr-values.cc

2022-11-22 Thread Aldy Hernandez via Gcc-patches
With no equivalences, the use_equiv_p argument in various methods in
simplify_using_ranges is always false.  This means we can remove all
calls to compare_names, along with the function.

OK pending tests?

gcc/ChangeLog:

* vr-values.cc (simplify_using_ranges::compare_names): Remove.
(vrp_evaluate_conditional_warnv_with_ops): Remove call to
compare_names.
(simplify_using_ranges::vrp_visit_cond_stmt): Remove use_equiv_p
argument to vrp_evaluate_conditional_warnv_with_ops.
* vr-values.h (class simplify_using_ranges): Remove
compare_names.
Remove use_equiv_p to vrp_evaluate_conditional_warnv_with_ops.
---
 gcc/vr-values.cc | 127 +--
 gcc/vr-values.h  |   4 +-
 2 files changed, 3 insertions(+), 128 deletions(-)

diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
index b0dd30260ae..1dbd9e47085 100644
--- a/gcc/vr-values.cc
+++ b/gcc/vr-values.cc
@@ -667,124 +667,6 @@ simplify_using_ranges::compare_name_with_value
   return retval;
 }
 
-/* Given a comparison code COMP and names N1 and N2, compare all the
-   ranges equivalent to N1 against all the ranges equivalent to N2
-   to determine the value of N1 COMP N2.  Return the same value
-   returned by compare_ranges.  Set *STRICT_OVERFLOW_P to indicate
-   whether we relied on undefined signed overflow in the comparison.  */
-
-
-tree
-simplify_using_ranges::compare_names (enum tree_code comp, tree n1, tree n2,
- bool *strict_overflow_p, gimple *s)
-{
-  /* ?? These bitmaps are NULL as there are no longer any equivalences
- available in the value_range*.  */
-  bitmap e1 = NULL;
-  bitmap e2 = NULL;
-
-  /* Use the fake bitmaps if e1 or e2 are not available.  */
-  static bitmap s_e1 = NULL, s_e2 = NULL;
-  static bitmap_obstack *s_obstack = NULL;
-  if (s_obstack == NULL)
-{
-  s_obstack = XNEW (bitmap_obstack);
-  bitmap_obstack_initialize (s_obstack);
-  s_e1 = BITMAP_ALLOC (s_obstack);
-  s_e2 = BITMAP_ALLOC (s_obstack);
-}
-  if (e1 == NULL)
-e1 = s_e1;
-  if (e2 == NULL)
-e2 = s_e2;
-
-  /* Add N1 and N2 to their own set of equivalences to avoid
- duplicating the body of the loop just to check N1 and N2
- ranges.  */
-  bitmap_set_bit (e1, SSA_NAME_VERSION (n1));
-  bitmap_set_bit (e2, SSA_NAME_VERSION (n2));
-
-  /* If the equivalence sets have a common intersection, then the two
- names can be compared without checking their ranges.  */
-  if (bitmap_intersect_p (e1, e2))
-{
-  bitmap_clear_bit (e1, SSA_NAME_VERSION (n1));
-  bitmap_clear_bit (e2, SSA_NAME_VERSION (n2));
-
-  return (comp == EQ_EXPR || comp == GE_EXPR || comp == LE_EXPR)
-? boolean_true_node
-: boolean_false_node;
-}
-
-  /* Start at -1.  Set it to 0 if we do a comparison without relying
- on overflow, or 1 if all comparisons rely on overflow.  */
-  int used_strict_overflow = -1;
-
-  /* Otherwise, compare all the equivalent ranges.  First, add N1 and
- N2 to their own set of equivalences to avoid duplicating the body
- of the loop just to check N1 and N2 ranges.  */
-  bitmap_iterator bi1;
-  unsigned i1;
-  EXECUTE_IF_SET_IN_BITMAP (e1, 0, i1, bi1)
-{
-  if (!ssa_name (i1))
-   continue;
-
-  value_range tem_vr1;
-  const value_range *vr1 = get_vr_for_comparison (i1, &tem_vr1, s);
-
-  tree t = NULL_TREE, retval = NULL_TREE;
-  bitmap_iterator bi2;
-  unsigned i2;
-  EXECUTE_IF_SET_IN_BITMAP (e2, 0, i2, bi2)
-   {
- if (!ssa_name (i2))
-   continue;
-
- bool sop = false;
-
- value_range tem_vr2;
- const value_range *vr2 = get_vr_for_comparison (i2, &tem_vr2, s);
-
- t = compare_ranges (comp, vr1, vr2, &sop);
- if (t)
-   {
- /* If we get different answers from different members
-of the equivalence set this check must be in a dead
-code region.  Folding it to a trap representation
-would be correct here.  For now just return don't-know.  */
- if (retval != NULL && t != retval)
-   {
- bitmap_clear_bit (e1, SSA_NAME_VERSION (n1));
- bitmap_clear_bit (e2, SSA_NAME_VERSION (n2));
- return NULL_TREE;
-   }
- retval = t;
-
- if (!sop)
-   used_strict_overflow = 0;
- else if (used_strict_overflow < 0)
-   used_strict_overflow = 1;
-   }
-   }
-
-  if (retval)
-   {
- bitmap_clear_bit (e1, SSA_NAME_VERSION (n1));
- bitmap_clear_bit (e2, SSA_NAME_VERSION (n2));
- if (used_strict_overflow > 0)
-   *strict_overflow_p = true;
- return retval;
-   }
-}
-
-  /* None of the equivalent ranges are useful in computing this
- comparison.  */
-  bitmap_clear_bit (e1, SSA_NAME_VERSION (n1))

Re: [PATCH 2/2]AArch64 Support new tbranch optab.

2022-11-22 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Tuesday, November 15, 2022 11:34 AM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
>> ; nd ; Marcus Shawcroft
>> 
>> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
>> 
>> Tamar Christina  writes:
>> >> -Original Message-
>> >> From: Richard Sandiford 
>> >> Sent: Tuesday, November 15, 2022 11:15 AM
>> >> To: Tamar Christina 
>> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
>> >> ; nd ; Marcus Shawcroft
>> >> 
>> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
>> >>
>> >> Tamar Christina  writes:
>> >> >> -Original Message-
>> >> >> From: Richard Sandiford 
>> >> >> Sent: Tuesday, November 15, 2022 10:51 AM
>> >> >> To: Tamar Christina 
>> >> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
>> >> >> ; nd ; Marcus
>> Shawcroft
>> >> >> 
>> >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
>> >> >>
>> >> >> Tamar Christina  writes:
>> >> >> >> -Original Message-
>> >> >> >> From: Richard Sandiford 
>> >> >> >> Sent: Tuesday, November 15, 2022 10:36 AM
>> >> >> >> To: Tamar Christina 
>> >> >> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
>> >> >> >> ; nd ; Marcus
>> >> Shawcroft
>> >> >> >> 
>> >> >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
>> >> >> >>
>> >> >> >> Tamar Christina  writes:
>> >> >> >> > Hello,
>> >> >> >> >
>> >> >> >> > Ping and updated patch.
>> >> >> >> >
>> >> >> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no
>> issues.
>> >> >> >> >
>> >> >> >> > Ok for master?
>> >> >> >> >
>> >> >> >> > Thanks,
>> >> >> >> > Tamar
>> >> >> >> >
>> >> >> >> > gcc/ChangeLog:
>> >> >> >> >
>> >> >> >> > * config/aarch64/aarch64.md (*tb1):
>> >> >> >> > Rename
>> >> to...
>> >> >> >> > (*tb1): ... this.
>> >> >> >> > (tbranch4): New.
>> >> >> >> >
>> >> >> >> > gcc/testsuite/ChangeLog:
>> >> >> >> >
>> >> >> >> > * gcc.target/aarch64/tbz_1.c: New test.
>> >> >> >> >
>> >> >> >> > --- inline copy of patch ---
>> >> >> >> >
>> >> >> >> > diff --git a/gcc/config/aarch64/aarch64.md
>> >> >> >> > b/gcc/config/aarch64/aarch64.md index
>> >> >> >> >
>> >> >> >>
>> >> >>
>> >>
>> 2bc2684b82c35a44e0a2cea6e3aaf32d939f8cdf..d7684c93fba5b717d568e1a4fd
>> >> >> >> 71
>> >> >> >> > 2bde55c7c72e 100644
>> >> >> >> > --- a/gcc/config/aarch64/aarch64.md
>> >> >> >> > +++ b/gcc/config/aarch64/aarch64.md
>> >> >> >> > @@ -943,12 +943,29 @@ (define_insn "*cb1"
>> >> >> >> >   (const_int 1)))]
>> >> >> >> >  )
>> >> >> >> >
>> >> >> >> > -(define_insn "*tb1"
>> >> >> >> > +(define_expand "tbranch4"
>> >> >> >> >[(set (pc) (if_then_else
>> >> >> >> > - (EQL (zero_extract:DI (match_operand:GPI 0
>> >> "register_operand"
>> >> >> >> "r")
>> >> >> >> > -   (const_int 1)
>> >> >> >> > -   (match_operand 1
>> >> >> >> > - 
>> >> >> >> > "aarch64_simd_shift_imm_" "n"))
>> >> >> >> > +   (match_operator 0 "aarch64_comparison_operator"
>> >> >> >> > +[(match_operand:ALLI 1 "register_operand")
>> >> >> >> > + (match_operand:ALLI 2
>> >> >> >> "aarch64_simd_shift_imm_")])
>> >> >> >> > +   (label_ref (match_operand 3 "" ""))
>> >> >> >> > +   (pc)))]
>> >> >> >> > +  "optimize > 0"
>> >> >> >>
>> >> >> >> Why's the pattern conditional on optimize?  Seems a valid
>> >> >> >> choice at -O0
>> >> >> too.
>> >> >> >>
>> >> >> >
>> >> >> > Hi,
>> >> >> >
>> >> >> > I had explained the reason why in the original patch, just
>> >> >> > didn't repeat it in
>> >> >> the ping:
>> >> >> >
>> >> >> > Instead of emitting the instruction directly I've chosen to
>> >> >> > expand the pattern using a zero extract and generating the
>> >> >> > existing pattern for comparisons for two
>> >> >> > reasons:
>> >> >> >
>> >> >> >   1. Allows for CSE of the actual comparison.
>> >> >> >   2. It looks like the code in expand makes the label as unused
>> >> >> > and removed
>> >> >> it
>> >> >> >  if it doesn't see a separate reference to it.
>> >> >> >
>> >> >> > Because of this expansion though I disable the pattern at -O0
>> >> >> > since we
>> >> >> have no combine in that case so we'd end up with worse code.  I
>> >> >> did try emitting the pattern directly, but as mentioned in no#2
>> >> >> expand would then kill the label.
>> >> >> >
>> >> >> > Basically I emit the pattern directly, immediately during expand
>> >> >> > the label is
>> >> >> marked as dead for some weird reason.
>> >> >>
>> >> >> Isn't #2 a bug though?  It seems like something we should fix
>> >> >> rather than work around.
>> >> >
>> >> > Yes it's a bug ☹ ok if I'm going to fix that bug then do I need to
>> >> > split the optabs still? Isn't the problem atm that I need the split?
>> >> > If I'm emitting the instruction directly then t

Re: [PATCH v4] LoongArch: Optimize immediate load.

2022-11-22 Thread Xi Ruoyao via Gcc-patches
While I still can't fully understand the immediate load issue and how
this patch fixes it, I've tested this patch (alongside the prefetch
instruction patch) with bootstrap-ubsan.  And the compiled result of
imm-load1.c seems OK.

On Thu, 2022-11-17 at 17:59 +0800, Lulu Cheng wrote:
> v1 -> v2:
> 1. Change the code format.
> 2. Fix bugs in the code.
> 
> v2 -> v3:
> Reworked a code path whose implementation relied on undefined behavior.
> 
> v3 -> v4:
> Move the immediate number decomposition from the expand pass to the
> split pass.
> 
> Both regression tests and spec2006 passed.
> 
> The problem mentioned in the link below -- the four immediate load
> instructions were not being moved out of the loop -- has been fixed.
> Now, as in the test case, the four immediate load instructions are
> generated outside the loop.
> (https://sourceware.org/pipermail/libc-alpha/2022-September/142202.html)
> 
> 
> Because the loop2_invariant pass extracts instructions that do not
> change within the loop, some instructions would fail to meet the
> extraction conditions if the machine performed the immediate
> decomposition during the expand pass, so the decomposition has been
> moved to the split pass.
> 
> gcc/ChangeLog:
> 
> * config/loongarch/loongarch.cc (enum
> loongarch_load_imm_method):
> Remove the member METHOD_INSV that is not currently used.
> (struct loongarch_integer_op): Define a new member curr_value,
> that records the value of the number stored in the destination
> register immediately after the current instruction has run.
> (loongarch_build_integer): Assign a value to the curr_value
> member variable.
> (loongarch_move_integer): Add information for the immediate
> load instruction.
> * config/loongarch/loongarch.md (*movdi_32bit): Redefine as
> define_insn_and_split.
> (*movdi_64bit): Likewise.
> (*movsi_internal): Likewise.
> (*movhi_internal): Likewise.
> * config/loongarch/predicates.md: Return true as long as the
> operand is a CONST_INT, ensuring that the immediate is not
> broken up by decomposition during the expand pass.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/loongarch/imm-load.c: New test.
> * gcc.target/loongarch/imm-load1.c: New test.
> ---
>  gcc/config/loongarch/loongarch.cc | 62 ++
> -
>  gcc/config/loongarch/loongarch.md | 44 +++--
>  gcc/config/loongarch/predicates.md    |  2 +-
>  gcc/testsuite/gcc.target/loongarch/imm-load.c | 10 +++
>  .../gcc.target/loongarch/imm-load1.c  | 26 
>  5 files changed, 110 insertions(+), 34 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/imm-load.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/imm-load1.c
> 
> diff --git a/gcc/config/loongarch/loongarch.cc
> b/gcc/config/loongarch/loongarch.cc
> index 8ee32c90573..9e0d6c7c3ea 100644
> --- a/gcc/config/loongarch/loongarch.cc
> +++ b/gcc/config/loongarch/loongarch.cc
> @@ -139,22 +139,21 @@ struct loongarch_address_info
>  
>     METHOD_LU52I:
>   Load 52-63 bit of the immediate number.
> -
> -   METHOD_INSV:
> - immediate like 0xfff0fxxx
> -   */
> +*/
>  enum loongarch_load_imm_method
>  {
>    METHOD_NORMAL,
>    METHOD_LU32I,
> -  METHOD_LU52I,
> -  METHOD_INSV
> +  METHOD_LU52I
>  };
>  
>  struct loongarch_integer_op
>  {
>    enum rtx_code code;
>    HOST_WIDE_INT value;
> +  /* Represent the result of the immediate count of the load
> instruction at
> + each step.  */
> +  HOST_WIDE_INT curr_value;
>    enum loongarch_load_imm_method method;
>  };
>  
> @@ -1475,24 +1474,27 @@ loongarch_build_integer (struct
> loongarch_integer_op *codes,
>  {
>    /* The value of the lower 32 bit be loaded with one
> instruction.
>  lu12i.w.  */
> -  codes[0].code = UNKNOWN;
> -  codes[0].method = METHOD_NORMAL;
> -  codes[0].value = low_part;
> +  codes[cost].code = UNKNOWN;
> +  codes[cost].method = METHOD_NORMAL;
> +  codes[cost].value = low_part;
> +  codes[cost].curr_value = low_part;
>    cost++;
>  }
>    else
>  {
>    /* lu12i.w + ior.  */
> -  codes[0].code = UNKNOWN;
> -  codes[0].method = METHOD_NORMAL;
> -  codes[0].value = low_part & ~(IMM_REACH - 1);
> +  codes[cost].code = UNKNOWN;
> +  codes[cost].method = METHOD_NORMAL;
> +  codes[cost].value = low_part & ~(IMM_REACH - 1);
> +  codes[cost].curr_value = codes[cost].value;
>    cost++;
>    HOST_WIDE_INT iorv = low_part & (IMM_REACH - 1);
>    if (iorv != 0)
> {
> - codes[1].code = IOR;
> - codes[1].method = METHOD_NORMAL;
> - codes[1].value = iorv;
> + codes[cost].code = IOR;
> + codes[cost].method = METHOD_NORMAL;
> + codes[cost].value = iorv;
> + co

Re: [PATCH] aarch64: Fix test_dfp_17.c for big-endian [PR 107604]

2022-11-22 Thread Richard Earnshaw via Gcc-patches




On 22/11/2022 13:09, Christophe Lyon wrote:



On 11/22/22 12:33, Richard Earnshaw wrote:



On 22/11/2022 11:21, Richard Sandiford wrote:

Richard Earnshaw via Gcc-patches  writes:

On 22/11/2022 09:01, Christophe Lyon via Gcc-patches wrote:
gcc.target/aarch64/aapcs64/test_dfp_17.c has been failing on
big-endian, because the _Decimal32 on-stack argument is not
padded in the same direction depending on endianness.

This patch fixes the testcase so that it expects the argument
in the right stack location, similarly to what other tests do
in the same directory.

gcc/testsuite/ChangeLog:

	PR target/107604
	* gcc.target/aarch64/aapcs64/test_dfp_17.c: Fix for big-endian.
---
 gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
index 22dc462bf7c..3c45f715cf7 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
@@ -32,6 +32,10 @@ struct z b = { 9.0dd, 10.0dd, 11.0dd, 12.0dd };
 ANON(struct z, a, D1)
 ANON(struct z, b, STACK)
 ANON(int , 5, W0)
+#ifndef __AAPCS64_BIG_ENDIAN__
 ANON(_Decimal32, f1, STACK+32) /* Note: no promotion to _Decimal64.  */
+#else
+ANON(_Decimal32, f1, STACK+36) /* Note: no promotion to _Decimal64.  */
+#endif
 LAST_ANON(_Decimal64, 0.5dd, STACK+40)
 #endif


Why would a Decimal32 change stack placement based on the
endianness? Isn't it a 4-byte object?


Yes, but PARM_BOUNDARY (64) sets a minimum alignment for all stack
 arguments.

Richard


Ah, OK.
Indeed, it was not immediately obvious to me either, when looking at 
aarch64_layout_arg. aarch64_function_arg_padding comes into play, too.




I wonder if we should have a new macro in the tests, something like 
ANON_PADDED to describe this case and that works things out more 
automagically for big-endian.

Maybe. There are many other tests under aapcs64/ which have a similar
#ifndef __AAPCS64_BIG_ENDIAN__



Yes, it could be used to clean all those up as well.




I notice the new ANON definition is not correctly indented.

It looks OK on my side (2 spaces).


Never mind then, it must be a quirk of how the diff is displayed.


Thanks,

Christophe



R.


Re: [PATCH 2/2] Add a new warning option -Wstrict-flex-arrays.

2022-11-22 Thread Qing Zhao via Gcc-patches



> On Nov 22, 2022, at 3:16 AM, Richard Biener  wrote:
> 
> On Mon, 21 Nov 2022, Qing Zhao wrote:
> 
>> 
>> 
>>> On Nov 18, 2022, at 11:31 AM, Kees Cook  wrote:
>>> 
>>> On Fri, Nov 18, 2022 at 03:19:07PM +, Qing Zhao wrote:
 Hi, Richard,
 
 Honestly, it's very hard for me to decide what's the best way to handle 
 the interaction 
 between -fstrict-flex-array=M and -Warray-bounds=N. 
 
 Ideally,  -fstrict-flex-array=M should completely control the behavior of 
 -Warray-bounds.
 If possible, I prefer this solution.
 
 However, -Warray-bounds is included in -Wall, and has been used 
 extensively for a long time.
 It's not safe to change its default behavior. 
>>> 
>>> I prefer that -fstrict-flex-arrays controls -Warray-bounds. That
>>> it is in -Wall is _good_ for this reason. :) No one is going to add
>>> -fstrict-flex-arrays (at any level) without understanding what it does
>>> and wanting those effects on -Warray-bounds.
>> 
>> 
>> The major difficulties to let -fstrict-flex-arrays controlling 
>> -Warray-bounds was discussed in the following threads:
>> 
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604133.html
>> 
>> Please take a look at the discussion and let me know your opinion.
> 
> My opinion is now, after re-considering and with seeing your new 
> patch, that -Warray-bounds=2 should be changed to only add
> "the intermediate results of pointer arithmetic that may yield out of 
> bounds values" and that what it considers a flex array should now
> be controlled by -fstrict-flex-arrays only.
> 
> That is, I think, the only thing that's not confusing to users even
> if that implies a change from previous behavior that we should
> document by clarifying the -Warray-bounds documentation as well as
> by adding an entry to the Caveats section of gcc-13/changes.html
> 
> That also means that =2 will get _less_ warnings with GCC 13 when
> the user doesn't use -fstrict-flex-arrays as well.

Okay.  So, this is for -Warray-bounds=2.

For -Warray-bounds=1 -fstrict-flex-array=N, if N > 1, should 
-fstrict-flex-array=N control -Warray-bounds=1?

Qing

> 
> Richard.
> 
> -- 
> Richard Biener 
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> HRB 36809 (AG Nuernberg)



[PATCH] ipa-cp: Do not be too optimistic about self-recursive edges (PR 107661)

2022-11-22 Thread Martin Jambor
Hi,

PR 107661 shows that function push_agg_values_for_index_from_edge
should not attempt to optimize self-recursive call graph edges when
called from cgraph_edge_brings_all_agg_vals_for_node.  Unlike when
being called from find_aggregate_values_for_callers_subset, we cannot
expect that any cloning for constants would lead to the edge leading
from a new clone to the same new clone, in this case it would only be
redirected to a new callee.

Fixed by adding a parameter to push_agg_values_from_edge indicating
whether being optimistic about self-recursive edges is possible.

Bootstrapped, LTO-bootstrapped and tested on x86_64-linux.  OK for
trunk?

Thanks,

Martin


gcc/ChangeLog:

2022-11-22  Martin Jambor  

PR ipa/107661
* ipa-cp.cc (push_agg_values_from_edge): New parameter
optimize_self_recursion, use it to decide whether to pass interim to
the helper function.
(find_aggregate_values_for_callers_subset): Pass true in the new
parameter of push_agg_values_from_edge.
(cgraph_edge_brings_all_agg_vals_for_node): Pass false in the new
parameter of push_agg_values_from_edge.

gcc/testsuite/ChangeLog:

2022-11-22  Martin Jambor  

PR ipa/107661
* g++.dg/ipa/pr107661.C: New test.
---
 gcc/ipa-cp.cc   | 18 +++-
 gcc/testsuite/g++.dg/ipa/pr107661.C | 45 +
 2 files changed, 56 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ipa/pr107661.C

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index d2bcd5e5e69..f0feb4beb8f 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -5752,14 +5752,16 @@ push_agg_values_for_index_from_edge (struct cgraph_edge 
*cs, int index,
description of ultimate callee of CS or the one it was cloned from (the
summary where lattices are).  If INTERIM is non-NULL, it contains the
current interim state of collected aggregate values which can be used to
-   compute values passed over self-recursive edges and to skip values which
-   clearly will not be part of intersection with INTERIM.  */
+   compute values passed over self-recursive edges (if OPTIMIZE_SELF_RECURSION
+   is true) and to skip values which clearly will not be part of intersection
+   with INTERIM.  */
 
 static void
 push_agg_values_from_edge (struct cgraph_edge *cs,
   ipa_node_params *dest_info,
   vec *res,
-  const ipa_argagg_value_list *interim)
+  const ipa_argagg_value_list *interim,
+  bool optimize_self_recursion)
 {
   ipa_edge_args *args = ipa_edge_args_sum->get (cs);
   if (!args)
@@ -5785,7 +5787,9 @@ push_agg_values_from_edge (struct cgraph_edge *cs,
   ipcp_param_lattices *plats = ipa_get_parm_lattices (dest_info, index);
   if (plats->aggs_bottom)
continue;
-  push_agg_values_for_index_from_edge (cs, index, res, interim);
+  push_agg_values_for_index_from_edge (cs, index, res,
+  optimize_self_recursion ? interim
+  : NULL);
 }
 }
 
@@ -5804,7 +5808,7 @@ find_aggregate_values_for_callers_subset (struct 
cgraph_node *node,
   /* gather_edges_for_value puts a non-recursive call into the first element of
  callers if it can.  */
   auto_vec interim;
-  push_agg_values_from_edge (callers[0], dest_info, &interim, NULL);
+  push_agg_values_from_edge (callers[0], dest_info, &interim, NULL, true);
 
   unsigned valid_entries = interim.length ();
   if (!valid_entries)
@@ -5815,7 +5819,7 @@ find_aggregate_values_for_callers_subset (struct 
cgraph_node *node,
 {
   auto_vec last;
   ipa_argagg_value_list avs (&interim);
-  push_agg_values_from_edge (callers[i], dest_info, &last, &avs);
+  push_agg_values_from_edge (callers[i], dest_info, &last, &avs, true);
 
   valid_entries = intersect_argaggs_with (interim, last);
   if (!valid_entries)
@@ -5882,7 +5886,7 @@ cgraph_edge_brings_all_agg_vals_for_node (struct 
cgraph_edge *cs,
   ipa_node_params *dest_info = ipa_node_params_sum->get (node);
   gcc_checking_assert (dest_info->ipcp_orig_node);
   dest_info = ipa_node_params_sum->get (dest_info->ipcp_orig_node);
-  push_agg_values_from_edge (cs, dest_info, &edge_values, &existing);
+  push_agg_values_from_edge (cs, dest_info, &edge_values, &existing, false);
   const ipa_argagg_value_list avl (&edge_values);
   return avl.superset_of_p (existing);
 }
diff --git a/gcc/testsuite/g++.dg/ipa/pr107661.C 
b/gcc/testsuite/g++.dg/ipa/pr107661.C
new file mode 100644
index 000..cc6f8538dbf
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/pr107661.C
@@ -0,0 +1,45 @@
+/* { dg-do run  { target c++11 } } */
+/* { dg-options "-O1 -fipa-cp -fipa-cp-clone" } */
+
+struct R {} RGood;
+struct L {} LBad;
+
+volatile int vi;
+static void __attribute__((noipa)) L_run(void) { vi = 0; __builtin_abort (); }
+static void callback_fn_L(void) { vi 

Re: [PATCH] AArch64: Add fma_reassoc_width [PR107413]

2022-11-22 Thread Richard Sandiford via Gcc-patches
Wilco Dijkstra  writes:
> Hi Richard,
>
>> I guess an obvious question is: if 1 (rather than 2) was the right value
>> for cores with 2 FMA pipes, why is 4 the right value for cores with 4 FMA
>> pipes?  It would be good to clarify how, conceptually, the core property
>> should map to the fma_reassoc_width value.
>
> 1 turns off reassociation so that FMAs get properly formed. After 
> reassociation far
> fewer FMAs get formed so we end up with more FLOPS which means slower 
> execution.
> It's a significant slowdown on cores that are not wide, have only 1 or 2 FP 
> pipes and
> may have high FP latencies. So we turn it off by default on all older cores.
>
>> It sounds from the existing comment like the main motivation for returning 1
>> was to encourage more FMAs to be formed, rather than to prevent FMAs from
>> being reassociated.  Is that no longer an issue?  Or is the point that,
>> with more FMA pipes, lower FMA formation is a price worth paying for
>> the better parallelism we get when FMAs can be formed?
>
> Exactly. A wide CPU can deal with the extra instructions, so the loss from 
> fewer
> FMAs ends up lower than the speedup from the extra parallelism. Having more 
> FMAs
> will be even faster of course.

Thanks.  It would be good to put this in a comment somewhere, perhaps above
the fma_reassoc_width field.  It isn't obvious from the patch as posted,
and changing the existing comment drops the previous hint about what
was going on.

>
>> Does this code ever see opc == FMA?
>
> No, that's the problem, reassociation ignores the fact that we actually want 
> FMAs.

Yeah, but I was wondering if later code would sometimes query this
hook for existing FMAs, even if that code wasn't the focus of the patch.
Once we add the distinction between FMAs and other ops, it seemed natural
to test for existing FMAs.

But of course, FMA is an rtl code rather than a tree code (oops), so that
was never going to happen.

> A smart reassociation pass could form more FMAs while also increasing
> parallelism, but the way it currently works always results in fewer FMAs.

Yeah, as Richard said, that seems the right long-term fix.
It would also avoid the hack of treating PLUS_EXPR as a signal
of an FMA, which has the drawback of assuming (for 2-FMA cores)
that plain addition never benefits from reassociation in its own right.

Still, I guess the hackiness is pre-existing and the patch is removing
the hackiness for some cores, so from that point of view it's a strict
improvement over the status quo.  And it's too late in the GCC 13
cycle to do FMA reassociation properly.  So I'm OK with the patch
in principle, but could you post an update with more commentary?

Thanks,
Richard


Re: [PATCH] ipa-cp: Do not be too optimistic about self-recursive edges (PR 107661)

2022-11-22 Thread Jan Hubicka via Gcc-patches
> Hi,
> 
> PR 107661 shows that function push_agg_values_for_index_from_edge
> should not attempt to optimize self-recursive call graph edges when
> called from cgraph_edge_brings_all_agg_vals_for_node.  Unlike when
> being called from find_aggregate_values_for_callers_subset, we cannot
> expect that any cloning for constants would lead to the edge leading
> from a new clone to the same new clone, in this case it would only be
> redirected to a new callee.
> 
> Fixed by adding a parameter to push_agg_values_from_edge whether being
> optimistic about self-recursive edges is possible.
> 
> Bootstrapped, LTO-bootstrapped and tested on x86_64-linux.  OK for
> trunk?
OK,
thanks!
Honza
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2022-11-22  Martin Jambor  
> 
>   PR ipa/107661
>   * ipa-cp.cc (push_agg_values_from_edge): New parameter
>   optimize_self_recursion, use it to decide whether to pass interim to
>   the helper function.
>   (find_aggregate_values_for_callers_subset): Pass true in the new
>   parameter of push_agg_values_from_edge.
>   (cgraph_edge_brings_all_agg_vals_for_node): Pass false in the new
>   parameter of push_agg_values_from_edge.
> 
> gcc/testsuite/ChangeLog:
> 
> 2022-11-22  Martin Jambor  
> 
>   PR ipa/107661
>   * g++.dg/ipa/pr107661.C: New test.
> ---
>  gcc/ipa-cp.cc   | 18 +++-
>  gcc/testsuite/g++.dg/ipa/pr107661.C | 45 +
>  2 files changed, 56 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/ipa/pr107661.C
> 
> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index d2bcd5e5e69..f0feb4beb8f 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -5752,14 +5752,16 @@ push_agg_values_for_index_from_edge (struct 
> cgraph_edge *cs, int index,
> description of ultimate callee of CS or the one it was cloned from (the
> summary where lattices are).  If INTERIM is non-NULL, it contains the
> current interim state of collected aggregate values which can be used to
> -   compute values passed over self-recursive edges and to skip values which
> -   clearly will not be part of intersection with INTERIM.  */
> +   compute values passed over self-recursive edges (if 
> OPTIMIZE_SELF_RECURSION
> +   is true) and to skip values which clearly will not be part of intersection
> +   with INTERIM.  */
>  
>  static void
>  push_agg_values_from_edge (struct cgraph_edge *cs,
>  ipa_node_params *dest_info,
>  vec *res,
> -const ipa_argagg_value_list *interim)
> +const ipa_argagg_value_list *interim,
> +bool optimize_self_recursion)
>  {
>ipa_edge_args *args = ipa_edge_args_sum->get (cs);
>if (!args)
> @@ -5785,7 +5787,9 @@ push_agg_values_from_edge (struct cgraph_edge *cs,
>ipcp_param_lattices *plats = ipa_get_parm_lattices (dest_info, index);
>if (plats->aggs_bottom)
>   continue;
> -  push_agg_values_for_index_from_edge (cs, index, res, interim);
> +  push_agg_values_for_index_from_edge (cs, index, res,
> +optimize_self_recursion ? interim
> +: NULL);
>  }
>  }
>  
> @@ -5804,7 +5808,7 @@ find_aggregate_values_for_callers_subset (struct 
> cgraph_node *node,
>/* gather_edges_for_value puts a non-recursive call into the first element 
> of
>   callers if it can.  */
>auto_vec interim;
> -  push_agg_values_from_edge (callers[0], dest_info, &interim, NULL);
> +  push_agg_values_from_edge (callers[0], dest_info, &interim, NULL, true);
>  
>unsigned valid_entries = interim.length ();
>if (!valid_entries)
> @@ -5815,7 +5819,7 @@ find_aggregate_values_for_callers_subset (struct 
> cgraph_node *node,
>  {
>auto_vec last;
>ipa_argagg_value_list avs (&interim);
> -  push_agg_values_from_edge (callers[i], dest_info, &last, &avs);
> +  push_agg_values_from_edge (callers[i], dest_info, &last, &avs, true);
>  
>valid_entries = intersect_argaggs_with (interim, last);
>if (!valid_entries)
> @@ -5882,7 +5886,7 @@ cgraph_edge_brings_all_agg_vals_for_node (struct 
> cgraph_edge *cs,
>ipa_node_params *dest_info = ipa_node_params_sum->get (node);
>gcc_checking_assert (dest_info->ipcp_orig_node);
>dest_info = ipa_node_params_sum->get (dest_info->ipcp_orig_node);
> -  push_agg_values_from_edge (cs, dest_info, &edge_values, &existing);
> +  push_agg_values_from_edge (cs, dest_info, &edge_values, &existing, false);
>const ipa_argagg_value_list avl (&edge_values);
>return avl.superset_of_p (existing);
>  }
> diff --git a/gcc/testsuite/g++.dg/ipa/pr107661.C 
> b/gcc/testsuite/g++.dg/ipa/pr107661.C
> new file mode 100644
> index 000..cc6f8538dbf
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ipa/pr107661.C
> @@ -0,0 +1,45 @@
> +/* { dg-do run  { target c++11 } } */
> +

[pushed] c++: don't use strchrnul [PR107781]

2022-11-22 Thread Jason Merrill via Gcc-patches
As Jonathan suggested.

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

The contracts implementation was using strchrnul, which is a glibc
extension, so bootstrap broke on non-glibc targets.  Use C89 strcspn
instead.

PR c++/107781

gcc/cp/ChangeLog:

* contracts.cc (role_name_equal): Use strcspn instead
of strchrnul.
---
 gcc/cp/contracts.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/contracts.cc b/gcc/cp/contracts.cc
index f3afcc62ba0..a9097016768 100644
--- a/gcc/cp/contracts.cc
+++ b/gcc/cp/contracts.cc
@@ -210,8 +210,8 @@ lookup_concrete_semantic (const char *name)
 static bool
 role_name_equal (const char *role, const char *name)
 {
-  size_t role_len = strchrnul (role, ':') - role;
-  size_t name_len = strchrnul (name, ':') - name;
+  size_t role_len = strcspn (role, ":");
+  size_t name_len = strcspn (name, ":");
   if (role_len != name_len)
 return false;
   return strncmp (role, name, role_len) == 0;

base-commit: 4eb3a48698b2ca43967a4e7e7cfc0408192e85b2
-- 
2.31.1



Re: [PATCH 1/8]middle-end: Recognize scalar reductions from bitfields and array_refs

2022-11-22 Thread Jeff Law via Gcc-patches



On 11/22/22 04:08, Richard Biener via Gcc-patches wrote:

On Tue, 22 Nov 2022, Richard Sandiford wrote:


Tamar Christina  writes:

-Original Message-
From: Richard Biener 
Sent: Tuesday, November 22, 2022 10:59 AM
To: Richard Sandiford 
Cc: Tamar Christina via Gcc-patches ; Tamar
Christina ; Richard Biener
; nd 
Subject: Re: [PATCH 1/8]middle-end: Recognize scalar reductions from
bitfields and array_refs

On Tue, 22 Nov 2022, Richard Sandiford wrote:


Tamar Christina via Gcc-patches  writes:

So it's not easily possible within the current infrastructure.  But
it does look like ARM might eventually benefit from something like STV
on x86?

I'm not sure.  The problem with trying to do this in RTL is that
you'd have to be able to decide from two pseudos whether they come
from extracts that are sequential.  When coming in from a hard
register that's easy, yes.  When coming in from a load, or any other
operation that produces pseudos, that becomes harder.

Yeah.

Just in case anyone reading the above is tempted to implement STV for
AArch64: I think it would set a bad precedent if we had a
paste-&-adjust version of the x86 pass.  AFAIK, the target
capabilities and constraints are mostly modelled correctly using
existing mechanisms, so I don't think there's anything particularly
target-specific about the process of forcing things to be on the general or

SIMD/FP side.

So if we did have an STV-ish thing for AArch64, I think it should be a
target-independent pass that uses hooks and recog, even if the pass is
initially enabled for AArch64 only.

Agreed - maybe some of the x86 code can be leveraged, but of course the
cost modeling is the most difficult to get right - IIRC the x86 backend resorts
to backend specific tuning flags rather than trying to get rtx_cost or insn_cost
"correct" here.


(FWIW, on the patch itself, I tend to agree that this is really an SLP
optimisation.  If the vectoriser fails to see the benefit, or if it
fails to handle more complex cases, then it would be good to try to
fix that.)

Also agreed - but costing is hard ;)

I guess, I still disagree here but I've clearly been out-Richard.  The problem 
is still
that this is just basic codegen.  I still don't think it requires -O2 to be 
usable.

So I guess the only correct implementation is to use an STV-like patch.  But 
given
that this is already the second attempt, first RTL one was rejected by Richard,
second GIMPLE one was rejected by Richi I'd like to get an agreement on this STV
thing before I waste months more..

I don't think this in itself is a good motivation for STV.  My comment
above was more about the idea of STV for AArch64 in general (since it
had been raised).

Personally I still think the reduction should be generated in gimple.

I agree, and the proper place to generate the reduction is in SLP.


Sorry to have sent things astray with my earlier ACK.  It looked 
reasonable to me.


jeff



Re: [PATCH] 8/19 modula2 front end: libgm2 contents

2022-11-22 Thread Gaius Mulley via Gcc-patches
Richard Biener  writes:

> On Mon, Oct 10, 2022 at 5:35 PM Gaius Mulley via Gcc-patches
>  wrote:
>>
>>
>>
>> This patch set consists of the libgm2 makefile, autoconf sources
>> necessary to build the libm2pim, libm2iso, libm2min, libm2cor
>> and libm2log.
>
> This looks OK.

Thanks!

> I suppose it was also tested building a cross-compiler?

yes, it builds a cross-compiler tool chain targeting aarch64, hosted and
built on amd64 GNU/Linux.

> Can we get some up-to-date status on the build and support status for the
> list of primary and secondary platforms we list on
> https://gcc.gnu.org/gcc-13/criteria.html?

Primary platform summary


aarch64-none-linux-gnu  bootstrapped 6 reg failures
arm-linux-gnueabi   (still building)
i686-pc-linux-gnu   bootstrapped 7 reg failures
powerpc64-unknown-linux-gnu bootstrapped 12 reg failures
powerpc64le-unknown-linux-gnu   bootstrapped 18 reg failures
sparc-sun-solaris2.11   (still building)
x86_64-pc-linux-gnu bootstrapped 6 reg failures
(tumbleweed and bullseye)


there are six regression test failures common to all platforms (one
test fails with 6 option permutations and, having a reasonably obvious
fix, will be purged soon)


i586-unknown-freebsdfails at:

ctype_members.cc:137:3: error: redefinition of ‘bool 
std::ctype::do_is(std::ctype_base::mask, char_type) const’
  137 |   ctype::
  |   ^~
In file included from 
/home/gaius/GM2/graft-combine/build-devel-modula2-enabled/i586-unknown-freebsd13.0/libstdc++-v3/include/bits/locale_facets.h:1546,
 from 
/home/gaius/GM2/graft-combine/build-devel-modula2-enabled/i586-unknown-freebsd13.0/libstdc++-v3/include/locale:42,
 from ctype_members.cc:31:


regards,
Gaius


Re: [PATCH 2/2] Add a new warning option -Wstrict-flex-arrays.

2022-11-22 Thread Qing Zhao via Gcc-patches



> On Nov 22, 2022, at 9:10 AM, Qing Zhao via Gcc-patches 
>  wrote:
> 
> 
> 
>> On Nov 22, 2022, at 3:16 AM, Richard Biener  wrote:
>> 
>> On Mon, 21 Nov 2022, Qing Zhao wrote:
>> 
>>> 
>>> 
 On Nov 18, 2022, at 11:31 AM, Kees Cook  wrote:
 
 On Fri, Nov 18, 2022 at 03:19:07PM +, Qing Zhao wrote:
> Hi, Richard,
> 
> Honestly, it's very hard for me to decide what's the best way to handle 
> the interaction 
> between -fstrict-flex-array=M and -Warray-bounds=N. 
> 
> Ideally,  -fstrict-flex-array=M should completely control the behavior of 
> -Warray-bounds.
> If possible, I prefer this solution.
> 
> However, -Warray-bounds is included in -Wall, and has been used 
> extensively for a long time.
> It's not safe to change its default behavior. 
 
 I prefer that -fstrict-flex-arrays controls -Warray-bounds. That
 it is in -Wall is _good_ for this reason. :) No one is going to add
 -fstrict-flex-arrays (at any level) without understanding what it does
 and wanting those effects on -Warray-bounds.
>>> 
>>> 
>>> The major difficulties to let -fstrict-flex-arrays controlling 
>>> -Warray-bounds was discussed in the following threads:
>>> 
>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604133.html
>>> 
>>> Please take a look at the discussion and let me know your opinion.
>> 
>> My opinion is now, after re-considering and with seeing your new 
>> patch, that -Warray-bounds=2 should be changed to only add
>> "the intermediate results of pointer arithmetic that may yield out of 
>> bounds values" and that what it considers a flex array should now
>> be controlled by -fstrict-flex-arrays only.
>> 
>> That is, I think, the only thing that's not confusing to users even
>> if that implies a change from previous behavior that we should
>> document by clarifying the -Warray-bounds documentation as well as
>> by adding an entry to the Caveats section of gcc-13/changes.html
>> 
>> That also means that =2 will get _less_ warnings with GCC 13 when
>> the user doesn't use -fstrict-flex-arrays as well.
> 
> Okay.  So, this is for -Warray-bounds=2.
> 
> For -Warray-bounds=1 -fstrict-flex-array=N, if N > 1, should 
> -fstrict-flex-array=N control -Warray-bounds=1?

More thinking on this. (I might have misunderstood a little bit in the 
previous email.)

If I understand correctly now, what you proposed was:

1. The level of -Warray-bounds will NOT control whether a trailing array is 
considered a flexible array member anymore. 
2. Only the level of -fstrict-flex-arrays will control this;
3. Keep the current default behavior of -Warray-bounds on treating trailing 
arrays as flexible array members (treating all [0], [1], and [] as flexible 
array members). 
4. Update the documentation for -Warray-bounds to clarify this change, and 
add an entry to the Caveats section about this change to -Warray-bounds.

If the above is correct, then yes, I like this change. Both the user interface 
and the internal implementation will be simpler and cleaner. 

Let me know if you see any issue with my above understanding.

Thanks a lot.

Qing

> 
> Qing
> 
>> 
>> Richard.
>> 
>> -- 
>> Richard Biener 
>> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
>> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
>> HRB 36809 (AG Nuernberg)



Re: [PATCH] c++: Fix up -fcontract* options

2022-11-22 Thread Jason Merrill via Gcc-patches

On 11/21/22 18:00, Jakub Jelinek wrote:

Hi!

I've noticed
+FAIL: compiler driver --help=c++ option(s): "^ +-.*[^:.]\$" absent from output: "  
-fcontract-build-level=[off|default|audit] Specify max contract level to generate runtime checks 
for"
error; this is due to a missing dot at the end of the description.

The second part of the first hunk should fix that, but while at it,
I find it weird that some options don't have RejectNegative, yet
for options that accept an argument a negative option looks weird
and isn't really handled.


OK.


Though, shall we have those [on|off] options at all?
Those are inconsistent with all other boolean options gcc has.
Every other boolean option is -fwhatever for it being on
and -fno-whatever for it being off, shouldn't the options be
without arguments and accept negatives (-fcontract-assumption-mode
vs. -fno-contract-assumption-mode etc.)?


True, but I think let's leave them alone for now, they'll probably all 
be replaced as the feature specification evolves.



2022-11-21  Jakub Jelinek  

* c.opt (fcontract-assumption-mode=, fcontract-continuation-mode=,
fcontract-role=, fcontract-semantic=): Add RejectNegative.
(fcontract-build-level=): Terminate description with dot.

--- gcc/c-family/c.opt.jj   2022-11-19 09:21:14.31706 +0100
+++ gcc/c-family/c.opt  2022-11-21 23:51:55.605736499 +0100
@@ -1692,12 +1692,12 @@ EnumValue
  Enum(on_off) String(on) Value(1)
  
  fcontract-assumption-mode=

-C++ Joined
+C++ Joined RejectNegative
  -fcontract-assumption-mode=[on|off]   Enable or disable treating axiom level 
contracts as assumptions (default on).
  
  fcontract-build-level=

  C++ Joined RejectNegative
--fcontract-build-level=[off|default|audit] Specify max contract level to 
generate runtime checks for
+-fcontract-build-level=[off|default|audit] Specify max contract level to 
generate runtime checks for.
  
  fcontract-strict-declarations=

  C++ Var(flag_contract_strict_declarations) Enum(on_off) Joined Init(0) 
RejectNegative
@@ -1708,15 +1708,15 @@ C++ Var(flag_contract_mode) Enum(on_off)
  -fcontract-mode=[on|off]  Enable or disable all contract facilities 
(default on).
  
  fcontract-continuation-mode=

-C++ Joined
+C++ Joined RejectNegative
  -fcontract-continuation-mode=[on|off] Enable or disable contract continuation 
mode (default off).
  
  fcontract-role=

-C++ Joined
+C++ Joined RejectNegative
  -fcontract-role=:Specify the semantics for all 
levels in a role (default, review), or a custom contract role with given semantics (ex: 
opt:assume,assume,assume)
  
  fcontract-semantic=

-C++ Joined
+C++ Joined RejectNegative
  -fcontract-semantic=:Specify the concrete semantics for 
level
  
  fcoroutines


Jakub





Re: [PATCH] RISC-V: Add the Zihpm and Zicntr extensions

2022-11-22 Thread Jeff Law via Gcc-patches



On 11/20/22 18:36, Kito Cheng wrote:

So the idea here is just to define the extension so that it gets defined
in the ISA strings and passed through to the assembler, right?

That will also define the arch test macro:

https://github.com/riscv-non-isa/riscv-c-api-doc/blob/master/riscv-c-api.md#architecture-extension-test-macro


Sorry I should have been clearer and included the test macro(s) as well.

So a better summary would be that while it doesn't change the codegen 
behavior in the compiler, it does provide the mechanisms to pass along 
ISA strings to other tools such as the assembler and to signal via the 
test macros that this extension is available.



If so I think that it meets Andrew's requirements and at least some of 
those issues raised by Jim.   But I'm not sure it can address your 
concern WRT consistency.  In fact, I don't really see a way to address 
that concern with option #2 which Andrew seems to think is the only 
reasonable path forward from an RVI standpoint.



I'm at a loss for next steps, particularly as the newbie in this world.


jeff




Re: [PATCH] RISC-V: Add the Zihpm and Zicntr extensions

2022-11-22 Thread Palmer Dabbelt

On Tue, 22 Nov 2022 07:20:15 PST (-0800), jeffreya...@gmail.com wrote:


On 11/20/22 18:36, Kito Cheng wrote:

So the idea here is just to define the extension so that it gets defined
in the ISA strings and passed through to the assembler, right?

That will also define the arch test macro:

https://github.com/riscv-non-isa/riscv-c-api-doc/blob/master/riscv-c-api.md#architecture-extension-test-macro


Sorry I should have been clearer and included the test macro(s) as well.

So a better summary would be that while it doesn't change the codegen
behavior in the compiler, it does provide the mechanisms to pass along
ISA strings to other tools such as the assembler and to signal via the
test macros that this extension is available.


IMO the important bit here is that we're not adding any compatibility 
flags, like we did when fence.i was removed from the ISA.  That's fine 
as long as we never remove these instructions from the base ISA in the 
software, but that's what's suggested by Andrew in the post.



If so I think that it meets Andrew's requirements and at least some of
those issues raised by Jim.   But I'm not sure it can address your
concern WRT consistency.  In fact, I don't really see a way to address
that concern with option #2 which Andrew seems to think is the only
reasonable path forward from an RVI standpoint.


I'm at a loss for next steps, particularly as the newbie in this world.


It's a super weird one, but there's a bunch of cases in RISC-V where 
we're told to just ignore words in the ISA manual.  Definitely a trap 
for users (and we already had some Linux folks get bit by the counter 
changes here), but that's just how RISC-V works.


RE: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.

2022-11-22 Thread Tamar Christina via Gcc-patches
Ping

> -Original Message-
> From: Gcc-patches  bounces+tamar.christina=arm@gcc.gnu.org> On Behalf Of Tamar
> Christina via Gcc-patches
> Sent: Friday, November 11, 2022 2:40 PM
> To: Richard Sandiford 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: RE: [PATCH 5/8]AArch64 aarch64: Make existing V2HF be usable.
> 
> Hi,
> 
> 
> > This name might cause confusion with the SVE iterators, where FULL
> > means "every bit of the register is used".  How about something like
> > VMOVE instead?
> >
> > With this change, I guess VALL_F16 represents "The set of all modes
> > for which the vld1 intrinsics are provided" and VMOVE or whatever is
> > "All Advanced SIMD modes suitable for moving, loading, and storing".
> > That is, VMOVE extends VALL_F16 with modes that are not manifested via
> > intrinsics.
> >
> 
> Done.
> 
> > Where is the 2h used, and is it valid syntax in that context?
> >
> > Same for later instances of 2h.
> 
> They are, but they weren't meant to be in this patch.  They belong in a
> separate FP16 series that I won't get to finish for GCC 13 due to not
> being able to finish writing all the tests.  I have moved them to that
> patch series though.
> 
> While the addp patch series has been killed, this patch is still good 
> standalone
> and improves codegen as shown in the updated testcase.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-simd.md (*aarch64_simd_movv2hf): New.
>   (mov, movmisalign, aarch64_dup_lane,
>   aarch64_store_lane0, aarch64_simd_vec_set,
>   @aarch64_simd_vec_copy_lane, vec_set,
>   reduc__scal_, reduc__scal_,
>   aarch64_reduc__internal,
> aarch64_get_lane,
>   vec_init, vec_extract): Support V2HF.
>   (aarch64_simd_dupv2hf): New.
>   * config/aarch64/aarch64.cc (aarch64_classify_vector_mode):
>   Add E_V2HFmode.
>   * config/aarch64/iterators.md (VHSDF_P): New.
>   (V2F, VMOVE, nunits, Vtype, Vmtype, Vetype, stype, VEL,
>   Vel, q, vp): Add V2HF.
>   * config/arm/types.md (neon_fp_reduc_add_h): New.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/sve/slp_1.c: Update testcase.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/config/aarch64/aarch64-simd.md
> b/gcc/config/aarch64/aarch64-simd.md
> index
> f4152160084d6b6f34bd69f0ba6386c1ab50f77e..487a31010245accec28e779661
> e6c2d578fca4b7 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -19,10 +19,10 @@
>  ;; .
> 
>  (define_expand "mov"
> -  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
> - (match_operand:VALL_F16 1 "general_operand"))]
> +  [(set (match_operand:VMOVE 0 "nonimmediate_operand")
> + (match_operand:VMOVE 1 "general_operand"))]
>"TARGET_SIMD"
> -  "
> +{
>/* Force the operand into a register if it is not an
>   immediate whose use can be replaced with xzr.
>   If the mode is 16 bytes wide, then we will be doing @@ -46,12 +46,11 @@
> (define_expand "mov"
>aarch64_expand_vector_init (operands[0], operands[1]);
>DONE;
>  }
> -  "
> -)
> +})
> 
>  (define_expand "movmisalign"
> -  [(set (match_operand:VALL_F16 0 "nonimmediate_operand")
> -(match_operand:VALL_F16 1 "general_operand"))]
> +  [(set (match_operand:VMOVE 0 "nonimmediate_operand")
> +(match_operand:VMOVE 1 "general_operand"))]
>"TARGET_SIMD && !STRICT_ALIGNMENT"
>  {
>/* This pattern is not permitted to fail during expansion: if both 
> arguments
> @@ -73,6 +72,16 @@ (define_insn "aarch64_simd_dup"
>[(set_attr "type" "neon_dup, neon_from_gp")]
>  )
> 
> +(define_insn "aarch64_simd_dupv2hf"
> +  [(set (match_operand:V2HF 0 "register_operand" "=w")
> + (vec_duplicate:V2HF
> +   (match_operand:HF 1 "register_operand" "0")))]
> +  "TARGET_SIMD"
> +  "@
> +   sli\\t%d0, %d1, 16"
> +  [(set_attr "type" "neon_shift_imm")]
> +)
> +
>  (define_insn "aarch64_simd_dup"
>[(set (match_operand:VDQF_F16 0 "register_operand" "=w,w")
>   (vec_duplicate:VDQF_F16
> @@ -85,10 +94,10 @@ (define_insn "aarch64_simd_dup"
>  )
> 
>  (define_insn "aarch64_dup_lane"
> -  [(set (match_operand:VALL_F16 0 "register_operand" "=w")
> - (vec_duplicate:VALL_F16
> +  [(set (match_operand:VMOVE 0 "register_operand" "=w")
> + (vec_duplicate:VMOVE
> (vec_select:
> - (match_operand:VALL_F16 1 "register_operand" "w")
> + (match_operand:VMOVE 1 "register_operand" "w")
>   (parallel [(match_operand:SI 2 "immediate_operand" "i")])
>)))]
>"TARGET_SIMD"
> @@ -142,6 +151,29 @@ (define_insn
> "*aarch64_simd_mov"
>mov_reg, neon_move")]
>  )
> 
> +(define_insn "*aarch64_simd_movv2hf"
> +  [(set (match_operand:V2HF 0 "nonimmediate_operand"
> + "=w, m,  m,  w, ?r, ?w, ?r, w, w")
> + (matc

Re: PING^2 [PATCH] Adjust the symbol for SECTION_LINK_ORDER linked_to section [PR99889]

2022-11-22 Thread Richard Sandiford via Gcc-patches
"Kewen.Lin"  writes:
> Hi Richard,
>
> Many thanks for your review comments!
>
 on 2022/8/24 16:17, Kewen.Lin via Gcc-patches wrote:
> Hi,
>
> As discussed in PR98125, -fpatchable-function-entry with
> SECTION_LINK_ORDER support doesn't work well on powerpc64
> ELFv1 because the filled "Symbol" in
>
>   .section name,"flags"o,@type,Symbol
>
> sits in .opd section instead of in the function_section
> like .text or named .text*.
>
> Since we already generate one label LPFE* which sits in
> function_section of current_function_decl, this patch is
> to reuse it as the symbol for the linked_to section.  It
> avoids the above ABI specific issue when using the symbol
> concluded from current_function_decl.
>
> Besides, with this support some previous workarounds for
> powerpc64 ELFv1 can be reverted.
>
> btw, rs6000_print_patchable_function_entry can be dropped
> but there is another rs6000 patch which needs this rs6000
> specific hook rs6000_print_patchable_function_entry, not
> sure which one gets landed first, so just leave it here.
>
> Bootstrapped and regtested on below:
>
>   1) powerpc64-linux-gnu P8 with default binutils 2.27
>  and latest binutils 2.39.
>   2) powerpc64le-linux-gnu P9 (default binutils 2.30).
>   3) powerpc64le-linux-gnu P10 (default binutils 2.30).
>   4) x86_64-redhat-linux with default binutils 2.30
>  and latest binutils 2.39.
>   5) aarch64-linux-gnu  with default binutils 2.30
>  and latest binutils 2.39.
>
>
> [snip...]
>
> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
> index 4db8506b106..d4de6e164ee 100644
> --- a/gcc/varasm.cc
> +++ b/gcc/varasm.cc
> @@ -6906,11 +6906,16 @@ default_elf_asm_named_section (const char *name, 
> unsigned int flags,
>   fprintf (asm_out_file, ",%d", flags & SECTION_ENTSIZE);
>if (flags & SECTION_LINK_ORDER)
>   {
> -   tree id = DECL_ASSEMBLER_NAME (decl);
> -   ultimate_transparent_alias_target (&id);
> -   const char *name = IDENTIFIER_POINTER (id);
> -   name = targetm.strip_name_encoding (name);
> -   fprintf (asm_out_file, ",%s", name);
> +   /* For now, only section "__patchable_function_entries"
> +  adopts flag SECTION_LINK_ORDER, internal label LPFE*
> +  was emitted in default_print_patchable_function_entry,
> +  just place it here for linked_to section.  */
> +   gcc_assert (!strcmp (name, "__patchable_function_entries"));
>> 
>> I like the idea of removing the rs600 workaround in favour of making the
>> target-independent more robust.  But this seems a bit hackish.  What
>> would we do if SECTION_LINK_ORDER was used for something else in future?
>> 
>
> Good question!  I think it depends on how we can get the symbol for the
> linked_to section.  If adopting the name of the decl would suffer a
> similar issue to the one this patch fixes, we have to reuse the label
> LPFE* or some kind of new artificial label in the related section;
> otherwise we can just go with the name of the given decl, or something
> related to that decl.  Since we can't predict any future uses, I just
> placed an assertion here to ensure that we would revisit and adjust this
> part at that time.  Does it sound reasonable to you?

Yeah, I guess that's good enough.  If the old scheme ends up being
correct for some future use, we can make the new behaviour conditional
on __patchable_function_entries.

So yeah, the patch LGTM, thanks.

Richard


Re: [PATCH] aarch64: Fix test_dfp_17.c for big-endian [PR 107604]

2022-11-22 Thread Christophe Lyon via Gcc-patches




On 11/22/22 12:33, Richard Earnshaw wrote:



On 22/11/2022 11:21, Richard Sandiford wrote:

Richard Earnshaw via Gcc-patches  writes:

On 22/11/2022 09:01, Christophe Lyon via Gcc-patches wrote:

gcc.target/aarch64/aapcs64/test_dfp_17.c has been failing on
big-endian, because the _Decimal32 on-stack argument is not padded in
the same direction depending on endianness.

This patch fixes the testcase so that it expects the argument in the
right stack location, similarly to what other tests do in the same
directory.

gcc/testsuite/ChangeLog:

PR target/107604
* gcc.target/aarch64/aapcs64/test_dfp_17.c: Fix for big-endian.
---
   gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c | 4 ++++
   1 file changed, 4 insertions(+)

diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c 
b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c

index 22dc462bf7c..3c45f715cf7 100644
--- a/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
+++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/test_dfp_17.c
@@ -32,6 +32,10 @@ struct z b = { 9.0dd, 10.0dd, 11.0dd, 12.0dd };
 ANON(struct z, a, D1)
 ANON(struct z, b, STACK)
 ANON(int , 5, W0)
+#ifndef __AAPCS64_BIG_ENDIAN__
 ANON(_Decimal32, f1, STACK+32) /* Note: no promotion to 
_Decimal64.  */

+#else
+  ANON(_Decimal32, f1, STACK+36) /* Note: no promotion to 
_Decimal64.  */

+#endif
 LAST_ANON(_Decimal64, 0.5dd, STACK+40)
   #endif


Why would a Decimal32 change stack placement based on the endianness?
Isn't it a 4-byte object?


Yes, but PARM_BOUNDARY (64) sets a minimum alignment for all stack 
arguments.


Richard


Ah, OK.


Indeed, it was not immediately obvious to me either when looking at 
aarch64_layout_arg. aarch64_function_arg_padding comes into play, too.




I wonder if we should have a new macro in the tests, something like 
ANON_PADDED to describe this case and that works things out more 
automagically for big-endian.

Maybe, there are quite a few tests under aapcs64 which have a similar
#ifndef __AAPCS64_BIG_ENDIAN__



I notice the new ANON definition is not correctly indented.

R.


Re: [PATCH v4] LoongArch: Optimize immediate load.

2022-11-22 Thread Xi Ruoyao via Gcc-patches
On Tue, 2022-11-22 at 22:03 +0800, Xi Ruoyao via Gcc-patches wrote:
> While I still can't fully understand the immediate load issue and how
> this patch fixes it, I've tested this patch (alongside the prefetch
> instruction patch) with bootstrap-ubsan.  And the compiled result of
> imm-load1.c seems OK.

And it's doing the correct thing for Glibc's "improved generic string
functions" patch, producing some really tight loops now.

> 
> On Thu, 2022-11-17 at 17:59 +0800, Lulu Cheng wrote:
> > v1 -> v2:
> > 1. Change the code format.
> > 2. Fix bugs in the code.
> > 
> > v2 -> v3:
> > Fix a code path that invoked undefined behavior.
> > 
> > v3 -> v4:
> > Move the part of the immediate number decomposition from expand pass
> > to split
> > pass.
> > 
> > Both regression tests and spec2006 passed.
> > 
> > The problem mentioned in the link below (the four immediate load
> > instructions were not moved out of the loop) has been fixed. Now, as
> > in the test case, the four immediate load instructions are generated
> > outside the loop.
> > (https://sourceware.org/pipermail/libc-alpha/2022-September/142202.html)
> > 
> > 
> > Because the loop2_invariant pass hoists instructions that do not
> > change within the loop out of the loop, some instructions will not
> > meet the hoisting conditions if the target performs immediate
> > decomposition during the expand pass, so the immediate decomposition
> > is deferred to the split pass.
> > 
> > gcc/ChangeLog:
> > 
> > * config/loongarch/loongarch.cc (enum
> > loongarch_load_imm_method):
> > Remove the member METHOD_INSV that is not currently used.
> > (struct loongarch_integer_op): Define a new member
> > curr_value,
> > that records the value of the number stored in the
> > destination
> > register immediately after the current instruction has run.
> > (loongarch_build_integer): Assign a value to the curr_value
> > member variable.
> > (loongarch_move_integer): Adds information for the immediate
> > load instruction.
> > * config/loongarch/loongarch.md (*movdi_32bit): Redefine as
> > define_insn_and_split.
> > (*movdi_64bit): Likewise.
> > (*movsi_internal): Likewise.
> > (*movhi_internal): Likewise.
> > * config/loongarch/predicates.md: Return true as long as it is
> > CONST_INT, ensuring that the immediate is not decomposed during
> > the expand pass.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/loongarch/imm-load.c: New test.
> > * gcc.target/loongarch/imm-load1.c: New test.
> > ---
> >  gcc/config/loongarch/loongarch.cc | 62 ++-----
> >  gcc/config/loongarch/loongarch.md | 44 +++--
> >  gcc/config/loongarch/predicates.md    |  2 +-
> >  gcc/testsuite/gcc.target/loongarch/imm-load.c | 10 +++
> >  .../gcc.target/loongarch/imm-load1.c  | 26 
> >  5 files changed, 110 insertions(+), 34 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/loongarch/imm-load.c
> >  create mode 100644 gcc/testsuite/gcc.target/loongarch/imm-load1.c
> > 
> > diff --git a/gcc/config/loongarch/loongarch.cc
> > b/gcc/config/loongarch/loongarch.cc
> > index 8ee32c90573..9e0d6c7c3ea 100644
> > --- a/gcc/config/loongarch/loongarch.cc
> > +++ b/gcc/config/loongarch/loongarch.cc
> > @@ -139,22 +139,21 @@ struct loongarch_address_info
> >  
> >     METHOD_LU52I:
> >   Load 52-63 bit of the immediate number.
> > -
> > -   METHOD_INSV:
> > - immediate like 0xfff0fxxx
> > -   */
> > +*/
> >  enum loongarch_load_imm_method
> >  {
> >    METHOD_NORMAL,
> >    METHOD_LU32I,
> > -  METHOD_LU52I,
> > -  METHOD_INSV
> > +  METHOD_LU52I
> >  };
> >  
> >  struct loongarch_integer_op
> >  {
> >    enum rtx_code code;
> >    HOST_WIDE_INT value;
> > +  /* Represent the result of the immediate count of the load
> > instruction at
> > + each step.  */
> > +  HOST_WIDE_INT curr_value;
> >    enum loongarch_load_imm_method method;
> >  };
> >  
> > @@ -1475,24 +1474,27 @@ loongarch_build_integer (struct
> > loongarch_integer_op *codes,
> >  {
> >    /* The value of the lower 32 bit be loaded with one
> > instruction.
> >  lu12i.w.  */
> > -  codes[0].code = UNKNOWN;
> > -  codes[0].method = METHOD_NORMAL;
> > -  codes[0].value = low_part;
> > +  codes[cost].code = UNKNOWN;
> > +  codes[cost].method = METHOD_NORMAL;
> > +  codes[cost].value = low_part;
> > +  codes[cost].curr_value = low_part;
> >    cost++;
> >  }
> >    else
> >  {
> >    /* lu12i.w + ior.  */
> > -  codes[0].code = UNKNOWN;
> > -  codes[0].method = METHOD_NORMAL;
> > -  codes[0].value = low_part & ~(IMM_REACH - 1);
> > +  codes[cost].code = UNKNOWN;
> > +  codes[cost].method = METH

RE: [PATCH 16/35] arm: Add integer vector overloading of vsubq_x instrinsic

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Stam Markianos-Wright  wri...@arm.com>
> Subject: [PATCH 16/35] arm: Add integer vector overloading of vsubq_x
> instrinsic
> 
> From: Stam Markianos-Wright 
> 
> In the past we had only defined the vsubq_x generic overload of the
> vsubq_x_* intrinsics for float vector types.  This would cause them
> to fall back to the `__ARM_undef` failure state if they were called
> through the generic version.
> This patch simply adds these overloads.

Ok.
Thanks,
Kyrill

> 
> gcc/ChangeLog:
> 
> * config/arm/arm_mve.h (__arm_vsubq_x FP): New overloads.
>  (__arm_vsubq_x Integer): New.
> ---
>  gcc/config/arm/arm_mve.h | 28 
>  1 file changed, 28 insertions(+)
> 
> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
> index f6b42dc3fab..09167ec118e 100644
> --- a/gcc/config/arm/arm_mve.h
> +++ b/gcc/config/arm/arm_mve.h
> @@ -38259,6 +38259,18 @@ extern void *__ARM_undef;
>  #define __arm_vsubq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
>__typeof(p2) __p2 = (p2); \
>_Generic( (int
> (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vsubq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8x16_t), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vsubq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vsubq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]:
> __arm_vsubq_x_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce(__p2, uint8x16_t), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]:
> __arm_vsubq_x_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce(__p2, uint16x8_t), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]:
> __arm_vsubq_x_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce(__p2, uint32x4_t), p3), \
> +  int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_u8 (__ARM_mve_coerce(__p1, uint8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_u16 (__ARM_mve_coerce(__p1, uint16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_u32 (__ARM_mve_coerce(__p1, uint32x4_t),
> __ARM_mve_coerce3(p2, int), p3), \
>int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]:
> __arm_vsubq_x_f16 (__ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce(__p2, float16x8_t), p3), \
>int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]:
> __arm_vsubq_x_f32 (__ARM_mve_coerce(__p1, float32x4_t),
> __ARM_mve_coerce(__p2, float32x4_t), p3), \
>int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_fp_n]:
> __arm_vsubq_x_n_f16 (__ARM_mve_coerce(__p1, float16x8_t),
> __ARM_mve_coerce2(p2, double), p3), \
> @@ -40223,6 +40235,22 @@ extern void *__ARM_undef;
>int (*)[__ARM_mve_type_uint16_t_ptr]: __arm_vld4q_u16
> (__ARM_mve_coerce1(p0, uint16_t *)), \
>int (*)[__ARM_mve_type_uint32_t_ptr]: __arm_vld4q_u32
> (__ARM_mve_coerce1(p0, uint32_t *
> 
> +#define __arm_vsubq_x(p1,p2,p3) ({ __typeof(p1) __p1 = (p1); \
> +  __typeof(p2) __p2 = (p2); \
> +  _Generic( (int
> (*)[__ARM_mve_typeid(__p1)][__ARM_mve_typeid(__p2)])0, \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]:
> __arm_vsubq_x_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce(__p2, int8x16_t), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]:
> __arm_vsubq_x_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce(__p2, int16x8_t), p3), \
> +  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]:
> __arm_vsubq_x_s32 (__ARM_mve_coerce(__p1, int32x4_t),
> __ARM_mve_coerce(__p2, int32x4_t), p3), \
> +  int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_s8 (__ARM_mve_coerce(__p1, int8x16_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]:
> __arm_vsubq_x_n_s16 (__ARM_mve_coerce(__p1, int16x8_t),
> __ARM_mve_coerce3(p2, int), p3), \
> +  int (*)

RE: [PATCH 17/35] arm: improve tests and fix vadd*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 17/35] arm: improve tests and fix vadd*
> 
> gcc/ChangeLog:
> 
>   * config/arm/mve.md (mve_vaddlvq_p_v4si)
>   (mve_vaddq_n_, mve_vaddvaq_)
>   (mve_vaddlvaq_v4si, mve_vaddq_n_f)
>   (mve_vaddlvaq_p_v4si, mve_vaddq,
> mve_vaddq_f):
>   Fix spacing.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vaddlvaq_p_s32.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vaddlvaq_p_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddlvaq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddlvaq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddlvq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddlvq_p_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddlvq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddlvq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_n_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_n_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_m_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_n_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_n_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_n_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_n_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddq_x_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_p_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_p_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_p_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_p_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_p_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vaddvaq_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/

RE: [PATCH 18/35] arm: improve tests for vmulq*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 18/35] arm: improve tests for vmulq*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vmulq_f16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vmulq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_n_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_n_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_m_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_n_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_n_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_n_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_n_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmulq_x_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../gcc.target/arm/mve/intrinsics/vmulq_f16.c | 16 ++-
>  .../gcc.target/arm/mve/intrinsics/vmulq_f32.c | 16 ++-
>  .../arm/mve/intrinsics/vmulq_m_f16.c  | 26 ++--
>  .../arm/mve/intrinsics/vmulq_m_f32.c  | 26 ++--
>  .../arm/mve/intrinsics/vmulq_m_n_f16.c| 42 +--
>  .../arm/mve/intrinsics/vmulq_m_n_f32.c| 42 +--
>  .../arm/mve/intrinsics/vmulq_m_n_s16.c| 26 ++--
>  .../arm/mve/intrinsics/vmulq_m_n_s32.c| 26 ++--
>  .../arm/mve/intrinsics/vmulq_m_n_s8.c | 26 ++--
>  .../arm/mve/intrinsics/vmulq_m_n_u16.c| 42 +--
>  .../arm/mve/intrinsics/vmulq_m_n_u32.c| 42 +--
>  .../arm/mve/intrinsics/vmulq_m_n_u8.c | 42 +--
>  .../arm/mve/intrinsics/vmulq_m_s16.c  | 26 ++--
>  .../arm/mve/intrinsics/vmulq_m_s32.c  | 26 ++--
>  .../arm/mve/intrinsics/vmulq_m_s8.c   | 26 ++--
>  .../arm/mve/intrinsics/vmulq_m_u16.c  | 26 ++--
>  .../arm/mve/intrinsics/vmulq_m_u32.c  | 26 ++--
>  .../arm/mve/intrinsics/vmulq_m_u8.c   | 26 ++--
>  .../arm/mve/intrinsics/vmulq_n_f16.c  | 28 -
>  .../arm/mve/intrinsics/vmulq_n_f32.c  | 28 -
>  .../arm/mve/intrinsics/vmulq_n_s16.c  | 16 ++-
>  .../arm/mve/intrinsics/vmulq_n_s32.c  | 16 ++-
>  .../arm/mve/intrinsics/vmulq_n_s8.c   | 16 ++-
>  .../arm/mv

RE: [PATCH 19/35] arm: improve tests and fix vsubq*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 19/35] arm: improve tests and fix vsubq*
> 
> gcc/ChangeLog:
> 
>   * config/arm/mve.md (mve_vsubq_n_f): Fix spacing.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vsubq_f16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vsubq_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_n_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_n_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_m_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_n_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_n_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_n_f16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_n_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsubq_x_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  gcc/config/arm/mve.md |  2 +-
>  .../gcc.target/arm/mve/intrinsics/vsubq_f16.c | 16 ++-
>  .../gcc.target/arm/mve/intrinsics/vsubq_f32.c | 16 ++-
>  .../arm/mve/intrinsics/vsubq_m_f16.c  | 26 --
>  .../arm/mve/intrinsics/vsubq_m_f32.c  | 26 --
>  .../arm/mve/intrinsics/vsubq_m_n_f16.c| 42 ++--
>  .../arm/mve/intrinsics/vsubq_m_n_f32.c| 42 ++--
>  .../arm/mve/intrinsics/vsubq_m_n_s16.c| 26 --
>  .../arm/mve/intrinsics/vsubq_m_n_s32.c| 26 --
>  .../arm/mve/intrinsics/vsubq_m_n_s8.c | 26 --
>  .../arm/mve/intrinsics/vsubq_m_n_u16.c| 42 ++--
>  .../arm/mve/intrinsics/vsubq_m_n_u32.c| 42 ++--
>  .../arm/mve/intrinsics/vsubq_m_n_u8.c | 42 ++--
>  .../arm/mve/intrinsics/vsubq_m_s16.c  | 25 --
>  .../arm/mve/intrinsics/vsubq_m_s32.c  | 25 --
>  .../arm/mve/intrinsics/vsubq_m_s8.c   | 25 --
>  .../arm/mve/intrinsics/vsubq_m_u16.c  | 25 --
>  .../arm/mve/intrinsics/vsubq_m_u32.c  | 25 --
>  .../arm/mve/intrinsics/vsubq_m_u8.c   | 25 --
>  .../arm/mve/intrinsics/vsubq_n_f16.c  | 28 ++-
>  .../arm/mve/intrinsics/vsubq_n_f32.c  | 28 ++-
>  .../arm/mve/intrinsics/vsubq_n_s16.c  | 17 +--
>  .../arm/mve/intrinsics/vsubq_n_s3

RE: [PATCH 21/35] arm: improve tests for vhaddq_m*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 21/35] arm: improve tests for vhaddq_m*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_n_s16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_m_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhaddq_x_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vhaddq_m_n_s16.c   | 26 ++--
>  .../arm/mve/intrinsics/vhaddq_m_n_s32.c   | 26 ++--
>  .../arm/mve/intrinsics/vhaddq_m_n_s8.c| 26 ++--
>  .../arm/mve/intrinsics/vhaddq_m_n_u16.c   | 42 +--
>  .../arm/mve/intrinsics/vhaddq_m_n_u32.c   | 42 +--
>  .../arm/mve/intrinsics/vhaddq_m_n_u8.c| 42 +--
>  .../arm/mve/intrinsics/vhaddq_m_s16.c | 26 ++--
>  .../arm/mve/intrinsics/vhaddq_m_s32.c | 26 ++--
>  .../arm/mve/intrinsics/vhaddq_m_s8.c  | 26 ++--
>  .../arm/mve/intrinsics/vhaddq_m_u16.c | 26 ++--
>  .../arm/mve/intrinsics/vhaddq_m_u32.c | 26 ++--
>  .../arm/mve/intrinsics/vhaddq_m_u8.c  | 26 ++--
>  .../arm/mve/intrinsics/vhaddq_n_s16.c | 16 ++-
>  .../arm/mve/intrinsics/vhaddq_n_s32.c | 16 ++-
>  .../arm/mve/intrinsics/vhaddq_n_s8.c  | 16 ++-
>  .../arm/mve/intrinsics/vhaddq_n_u16.c | 28 -
>  .../arm/mve/intrinsics/vhaddq_n_u32.c | 28 -
>  .../arm/mve/intrinsics/vhaddq_n_u8.c  | 28 -
>  .../arm/mve/intrinsics/vhaddq_s16.c   | 16 ++-
>  .../arm/mve/intrinsics/vhaddq_s32.c   | 16 ++-
>  .../gcc.target/arm/mve/intrinsics/vhaddq_s8.c | 16 ++-
>  .../arm/mve/intrinsics/vhaddq_u16.c   | 16 ++-
>  .../arm/mve/intrinsics/vhaddq_u32.c   | 16 ++-
>  .../gcc.target/arm/mve/intrinsics/vhaddq_u8.c | 16 ++-
>  .../arm/mve/intrinsics/vhaddq_x_n_s16.c   | 26 ++--
>  .../arm/mve/intrinsics/vhaddq_x_n_s32.c   | 26 ++--
>  .../arm/mve/intrinsics/vhaddq_x_n_s8.c| 26 ++--
>  .../arm/mve/intrinsics/vhaddq_x_n_u16.c   | 42 +--
>  .../arm/mve/intrinsics/vhaddq_x_n_u32.c   | 42 +--
>  .../arm/mve/intrinsics/vhaddq_x_n_u8.c| 42 +--
>  .../arm/mve/intrinsics/vhaddq_x_s16.c | 25 +--
>  .../arm/mve/intrinsics/vhaddq_x_s32.c | 25 +--
>  .../arm/mve/intrinsics/vhaddq_x_s8.c  | 25 +--
>  .../arm/mve/intrinsics/vhaddq_x_u16.c | 25 +--
>  .../arm/mve/intrinsics/vhaddq_x_u3

RE: [PATCH 20/35] arm: improve tests for vfmasq_m*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 20/35] arm: improve tests for vfmasq_m*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vfmasq_m_n_f16.c   | 50 ---
>  .../arm/mve/intrinsics/vfmasq_m_n_f32.c   | 50 ---
>  2 files changed, 84 insertions(+), 16 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16.c
> index 06d2d114e46..03b376c9bbe 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vfmast.f16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  float16x8_t
> -foo (float16x8_t a, float16x8_t b, float16_t c, mve_pred16_t p)
> +foo (float16x8_t m1, float16x8_t m2, float16_t add, mve_pred16_t p)
>  {
> -  return vfmasq_m_n_f16 (a, b, c, p);
> +  return vfmasq_m_n_f16 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vfmast.f16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vfmast.f16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  float16x8_t
> -foo1 (float16x8_t a, float16x8_t b, float16_t c, mve_pred16_t p)
> +foo1 (float16x8_t m1, float16x8_t m2, float16_t add, mve_pred16_t p)
>  {
> -  return vfmasq_m (a, b, c, p);
> +  return vfmasq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vfmast.f16"  }  } */
> +/*
> +**foo2:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vfmast.f16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
> +float16x8_t
> +foo2 (float16x8_t m1, float16x8_t m2, mve_pred16_t p)
> +{
> +  return vfmasq_m (m1, m2, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32.c
> index bf1773d0eeb..ecf30ba9826 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vfmast.f32  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  float32x4_t
> -foo (float32x4_t a, float32x4_t b, float32_t c, mve_pred16_t p)
> +foo (float32x4_t m1, float32x4_t m2, float32_t add, mve_pred16_t p)
>  {
> -  return vfmasq_m_n_f32 (a, b, c, p);
> +  return vfmasq_m_n_f32 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vfmast.f32"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vfmast.f32  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  float32x4_t
> -foo1 (float32x4_t a, float32x4_t b, float32_t c, mve_pred16_t p)
> +foo1 (float32x4_t m1, float32x4_t m2, float32_t add, mve_pred16_t p)
>  {
> -  return vfmasq_m (a, b, c, p);
> +  return vfmasq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vfmast.f32"  }  } */
> +/*
> +**foo2:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vfmast.f32  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
> +float32x4_t
> +foo2 (float32x4_t m1, float32x4_t m2, mve_pred16_t p)
> +{
> +  return vfmasq_m (m1, m2, 1.1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1



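[Archive note: the vfmasq diffs above show the conversion template used throughout this series — loose `scan-assembler` directives are replaced with one `check-function-bodies` block per function, plus a `scan-assembler-not "__ARM_undef"` check that the polymorphic overload resolved. A minimal sketch of the resulting test shape, using `vaddq_m` as a stand-in intrinsic; the register/comment regexes follow the examples above, but the exact expected body here is an illustrative assumption, not taken from the series:]

```
/* { dg-require-effective-target arm_v8_1m_mve_ok } */
/* { dg-add-options arm_v8_1m_mve } */
/* { dg-additional-options "-O2" } */
/* { dg-final { check-function-bodies "**" "" } } */

#include "arm_mve.h"

/*
**foo:
**	...
**	vpst(?:	@.*|)
**	...
**	vaddt.i16	q[0-9]+, q[0-9]+, q[0-9]+(?:	@.*|)
**	...
*/
int16x8_t
foo (int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
{
  /* Predicated add; lanes where p is false keep `inactive`.  */
  return vaddq_m (inactive, a, b, p);
}

/* A leftover __ARM_undef symbol would mean overload resolution failed.  */
/* { dg-final { scan-assembler-not "__ARM_undef" } } */
```

[Unlike a bare `scan-assembler "vaddt.i16"`, this fails if the predicated instruction appears in the wrong function or without its `vpst` predicate setup.]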
RE: [PATCH 23/35] arm: improve tests for viwdupq*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 23/35] arm: improve tests for viwdupq*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/viwdupq_m_n_u16.c: Improve tests.
>   * gcc.target/arm/mve/intrinsics/viwdupq_m_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_m_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_wb_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_wb_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_wb_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_x_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_x_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_x_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/viwdupq_m_n_u16.c  | 46 ++---
>  .../arm/mve/intrinsics/viwdupq_m_n_u32.c  | 46 ++---
>  .../arm/mve/intrinsics/viwdupq_m_n_u8.c   | 46 ++---
>  .../arm/mve/intrinsics/viwdupq_m_wb_u16.c | 46 ++---
>  .../arm/mve/intrinsics/viwdupq_m_wb_u32.c | 46 ++---
>  .../arm/mve/intrinsics/viwdupq_m_wb_u8.c  | 46 ++---
>  .../arm/mve/intrinsics/viwdupq_n_u16.c| 32 ++--
>  .../arm/mve/intrinsics/viwdupq_n_u32.c| 32 ++--
>  .../arm/mve/intrinsics/viwdupq_n_u8.c | 28 ++-
>  .../arm/mve/intrinsics/viwdupq_wb_u16.c   | 36 ++---
>  .../arm/mve/intrinsics/viwdupq_wb_u32.c   | 36 ++---
>  .../arm/mve/intrinsics/viwdupq_wb_u8.c| 36 ++---
>  .../arm/mve/intrinsics/viwdupq_x_n_u16.c  | 46 ++---
>  .../arm/mve/intrinsics/viwdupq_x_n_u32.c  | 46 ++---
>  .../arm/mve/intrinsics/viwdupq_x_n_u8.c   | 46 ++---
>  .../arm/mve/intrinsics/viwdupq_x_wb_u16.c | 50 ---
>  .../arm/mve/intrinsics/viwdupq_x_wb_u32.c | 50 ---
>  .../arm/mve/intrinsics/viwdupq_x_wb_u8.c  | 50 ---
>  18 files changed, 658 insertions(+), 106 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u16.c
> index 0f999cc672b..67a2465f435 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_m_n_u16.c
> @@ -1,23 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**   ...
> +*/
>  uint16x8_t
>  foo (uint16x8_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_m_n_u16 (inactive, a, b, 2, p);
> +  return viwdupq_m_n_u16 (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**   ...
> +*/
>  uint16x8_t
>  foo1 (uint16x8_t inactive, uint32_t a, uint32_t b, mve_pred16_t p)
>  {
> -  return viwdupq_m (inactive, a, b, 2, p);
> +  return viwdupq_m (inactive, a, b, 1, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "viwdupt.u16"  }  } */
> +/*
> +**foo2:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**	viwdupt.u16	q[0-9]+, (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #[0-9]+(?:	@.*|)
> +**   ...
> +*/
> +uint16x8_t
> +foo2 (uint16x8_t inactive, mve_pred16_t p)
> +{
> +  return viwdupq_m (inactive, 1, 1, 1, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> 

RE: [PATCH 22/35] arm: improve tests for vhsubq_m*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 22/35] arm: improve tests for vhsubq_m*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_n_s16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_m_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vhsubq_x_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vhsubq_m_n_s16.c   | 26 ++--
>  .../arm/mve/intrinsics/vhsubq_m_n_s32.c   | 26 ++--
>  .../arm/mve/intrinsics/vhsubq_m_n_s8.c| 26 ++--
>  .../arm/mve/intrinsics/vhsubq_m_n_u16.c   | 42 +--
>  .../arm/mve/intrinsics/vhsubq_m_n_u32.c   | 42 +--
>  .../arm/mve/intrinsics/vhsubq_m_n_u8.c| 42 +--
>  .../arm/mve/intrinsics/vhsubq_m_s16.c | 26 ++--
>  .../arm/mve/intrinsics/vhsubq_m_s32.c | 26 ++--
>  .../arm/mve/intrinsics/vhsubq_m_s8.c  | 26 ++--
>  .../arm/mve/intrinsics/vhsubq_m_u16.c | 26 ++--
>  .../arm/mve/intrinsics/vhsubq_m_u32.c | 26 ++--
>  .../arm/mve/intrinsics/vhsubq_m_u8.c  | 26 ++--
>  .../arm/mve/intrinsics/vhsubq_n_s16.c | 16 ++-
>  .../arm/mve/intrinsics/vhsubq_n_s32.c | 16 ++-
>  .../arm/mve/intrinsics/vhsubq_n_s8.c  | 16 ++-
>  .../arm/mve/intrinsics/vhsubq_n_u16.c | 28 -
>  .../arm/mve/intrinsics/vhsubq_n_u32.c | 28 -
>  .../arm/mve/intrinsics/vhsubq_n_u8.c  | 28 -
>  .../arm/mve/intrinsics/vhsubq_s16.c   | 16 ++-
>  .../arm/mve/intrinsics/vhsubq_s32.c   | 16 ++-
>  .../gcc.target/arm/mve/intrinsics/vhsubq_s8.c | 16 ++-
>  .../arm/mve/intrinsics/vhsubq_u16.c   | 16 ++-
>  .../arm/mve/intrinsics/vhsubq_u32.c   | 16 ++-
>  .../gcc.target/arm/mve/intrinsics/vhsubq_u8.c | 16 ++-
>  .../arm/mve/intrinsics/vhsubq_x_n_s16.c   | 26 ++--
>  .../arm/mve/intrinsics/vhsubq_x_n_s32.c   | 26 ++--
>  .../arm/mve/intrinsics/vhsubq_x_n_s8.c| 26 ++--
>  .../arm/mve/intrinsics/vhsubq_x_n_u16.c   | 42 +--
>  .../arm/mve/intrinsics/vhsubq_x_n_u32.c   | 42 +--
>  .../arm/mve/intrinsics/vhsubq_x_n_u8.c| 42 +--
>  .../arm/mve/intrinsics/vhsubq_x_s16.c | 25 +--
>  .../arm/mve/intrinsics/vhsubq_x_s32.c | 25 +--
>  .../arm/mve/intrinsics/vhsubq_x_s8.c  | 25 +--
>  .../arm/mve/intrinsics/vhsubq_x_u16.c | 25 +--
>  .../arm/mve/intrinsics/vhsubq_x_u3

RE: [PATCH 24/35] arm: improve tests for vmladavaq*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 24/35] arm: improve tests for vmladavaq*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vmladavaq_p_s16.c: Improve tests.
>   * gcc.target/arm/mve/intrinsics/vmladavaq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmladavaq_p_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmladavaq_p_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmladavaq_p_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmladavaq_p_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmladavaxq_p_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmladavaxq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmladavaxq_p_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmladavaxq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmladavaxq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmladavaxq_s8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vmladavaq_p_s16.c  | 33 ++---
>  .../arm/mve/intrinsics/vmladavaq_p_s32.c  | 33 ++---
>  .../arm/mve/intrinsics/vmladavaq_p_s8.c   | 33 ++---
>  .../arm/mve/intrinsics/vmladavaq_p_u16.c  | 49 ---
>  .../arm/mve/intrinsics/vmladavaq_p_u32.c  | 49 ---
>  .../arm/mve/intrinsics/vmladavaq_p_u8.c   | 49 ---
>  .../arm/mve/intrinsics/vmladavaxq_p_s16.c | 33 ++---
>  .../arm/mve/intrinsics/vmladavaxq_p_s32.c | 33 ++---
>  .../arm/mve/intrinsics/vmladavaxq_p_s8.c  | 33 ++---
>  .../arm/mve/intrinsics/vmladavaxq_s16.c   | 24 ++---
>  .../arm/mve/intrinsics/vmladavaxq_s32.c   | 24 ++---
>  .../arm/mve/intrinsics/vmladavaxq_s8.c| 24 ++---
>  12 files changed, 336 insertions(+), 81 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s16.c
> index e458204c41b..f3e5eba3b08 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vmladavat.s16   (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  int32_t
> -foo (int32_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
> +foo (int32_t add, int16x8_t m1, int16x8_t m2, mve_pred16_t p)
>  {
> -  return vmladavaq_p_s16 (a, b, c, p);
> +  return vmladavaq_p_s16 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavat.s16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vmladavat.s16   (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  int32_t
> -foo1 (int32_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
> +foo1 (int32_t add, int16x8_t m1, int16x8_t m2, mve_pred16_t p)
>  {
> -  return vmladavaq_p (a, b, c, p);
> +  return vmladavaq_p (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavat.s16"  }  } */
> -/* { dg-final { scan-assembler "vmladavat.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s32.c
> index e3544787adb..71f6957bfc5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmladavaq_p_s32.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vmladavat.s32   (?:ip|fp|r[0-9]+), q[0-9]+, q[0-9]+(?:  @.*|)
> +**   ...
> +*/
>  int32_t
> -foo (int32_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
> +foo (int32_t add, int32x4_t m1, int32x4_t m2, mve_pred16_t p)
>  {
> -  return vmladavaq_p_s32 (a, b, c, p);
> +  return vmladavaq_p_s32 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmladavat.s32"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vmladavat.s32  

RE: [PATCH 25/35] arm: improve tests and fix vmlaldavaxq*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 25/35] arm: improve tests and fix vmlaldavaxq*
> 
> gcc/ChangeLog:
> 
>   * config/arm/mve.md (mve_vmlaldavaq_<supf><mode>)
>   (mve_vmlaldavaxq_s<mode>, mve_vmlaldavaxq_p_<supf><mode>): Fix
>   spacing vs tabs.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c: Improve
> tests.
>   * gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlaldavaxq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlaldavaxq_s32.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  gcc/config/arm/mve.md |  6 ++--
>  .../arm/mve/intrinsics/vmlaldavaxq_p_s16.c| 32 +++
>  .../arm/mve/intrinsics/vmlaldavaxq_p_s32.c| 32 +++
>  .../arm/mve/intrinsics/vmlaldavaxq_s16.c  | 24 ++
>  .../arm/mve/intrinsics/vmlaldavaxq_s32.c  | 24 ++
>  5 files changed, 91 insertions(+), 27 deletions(-)
> 
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index 714dc6fc7ce..d2ffae6a425 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -4163,7 +4163,7 @@ (define_insn "mve_vmlaldavaq_<supf><mode>"
>VMLALDAVAQ))
>]
>"TARGET_HAVE_MVE"
> -  "vmlaldava.%# %Q0, %R0, %q2, %q3"
> +  "vmlaldava.%#\t%Q0, %R0, %q2, %q3"
>[(set_attr "type" "mve_move")
>  ])
> 
> @@ -4179,7 +4179,7 @@ (define_insn "mve_vmlaldavaxq_s<mode>"
>VMLALDAVAXQ_S))
>]
>"TARGET_HAVE_MVE"
> -  "vmlaldavax.s%# %Q0, %R0, %q2, %q3"
> +  "vmlaldavax.s%#\t%Q0, %R0, %q2, %q3"
>[(set_attr "type" "mve_move")
>  ])
> 
> @@ -6126,7 +6126,7 @@ (define_insn
> "mve_vmlaldavaxq_p_"
>VMLALDAVAXQ_P))
>]
>"TARGET_HAVE_MVE"
> -  "vpst\;vmlaldavaxt.%# %Q0, %R0, %q2, %q3"
> +  "vpst\;vmlaldavaxt.%#\t%Q0, %R0, %q2, %q3"
>[(set_attr "type" "mve_move")
> (set_attr "length""8")])
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c
> index f33d3880236..87f0354a636 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s16.c
> @@ -1,21 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vmlaldavaxt.s16 (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+,
> q[0-9]+(?:@.*|)
> +**   ...
> +*/
>  int64_t
> -foo (int64_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
> +foo (int64_t add, int16x8_t m1, int16x8_t m2, mve_pred16_t p)
>  {
> -  return vmlaldavaxq_p_s16 (a, b, c, p);
> +  return vmlaldavaxq_p_s16 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmlaldavaxt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vmlaldavaxt.s16 (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+,
> q[0-9]+(?:@.*|)
> +**   ...
> +*/
>  int64_t
> -foo1 (int64_t a, int16x8_t b, int16x8_t c, mve_pred16_t p)
> +foo1 (int64_t add, int16x8_t m1, int16x8_t m2, mve_pred16_t p)
>  {
> -  return vmlaldavaxq_p (a, b, c, p);
> +  return vmlaldavaxq_p (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmlaldavaxt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c
> index ab072a9850e..d26bf5b90af 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlaldavaxq_p_s32.c
> @@ -1,21 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vmlaldavaxt.s32 (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+,
> q[0-9]+(?:@.*|)
> +**   ...
> +*/
>  int64_t
> -foo (int64_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
> +foo (int64_t add, int32x4_t m1, int32x4_t m2, mve_pred16_t p)
>  {
> -  return vmlaldavaxq_p_s32 (a, b, c, p);
> +  return vmlaldavaxq_p_s32 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vmlaldavaxt.s32"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+

RE: [PATCH 26/35] arm: improve tests for vmlasq*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 26/35] arm: improve tests for vmlasq*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlasq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlasq_m_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlasq_m_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlasq_m_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlasq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlasq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlasq_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlasq_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlasq_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vmlasq_n_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vmlasq_m_n_s16.c   | 34 ++---
>  .../arm/mve/intrinsics/vmlasq_m_n_s32.c   | 34 ++---
>  .../arm/mve/intrinsics/vmlasq_m_n_s8.c| 34 ++---
>  .../arm/mve/intrinsics/vmlasq_m_n_u16.c   | 50 ---
>  .../arm/mve/intrinsics/vmlasq_m_n_u32.c   | 50 ---
>  .../arm/mve/intrinsics/vmlasq_m_n_u8.c| 50 ---
>  .../arm/mve/intrinsics/vmlasq_n_s16.c | 24 ++---
>  .../arm/mve/intrinsics/vmlasq_n_s32.c | 24 ++---
>  .../arm/mve/intrinsics/vmlasq_n_s8.c  | 24 ++---
>  .../arm/mve/intrinsics/vmlasq_n_u16.c | 36 ++---
>  .../arm/mve/intrinsics/vmlasq_n_u32.c | 36 ++---
>  .../arm/mve/intrinsics/vmlasq_n_u8.c  | 36 ++---
>  12 files changed, 348 insertions(+), 84 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c
> index bf66e616ec7..af6e588adad 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vmlast.s16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int16x8_t
> -foo (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo (int16x8_t m1, int16x8_t m2, int16_t add, mve_pred16_t p)
>  {
> -  return vmlasq_m_n_s16 (a, b, c, p);
> +  return vmlasq_m_n_s16 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmlast.s16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vmlast.s16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int16x8_t
> -foo1 (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo1 (int16x8_t m1, int16x8_t m2, int16_t add, mve_pred16_t p)
>  {
> -  return vmlasq_m (a, b, c, p);
> +  return vmlasq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmlast.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c
> index 53c21e2e5b6..9d0cc3076d9 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vmlasq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vmlast.s32  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int32x4_t
> -foo (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
> +foo (int32x4_t m1, int32x4_t m2, int32_t add, mve_pred16_t p)
>  {
> -  return vmlasq_m_n_s32 (a, b, c, p);
> +  return vmlasq_m_n_s32 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vmlast.s32"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)

RE: [PATCH 27/35] arm: improve tests for vqaddq_m*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 27/35] arm: improve tests for vqaddq_m*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_n_s16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_m_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqaddq_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vqaddq_m_n_s16.c   | 26 ++--
>  .../arm/mve/intrinsics/vqaddq_m_n_s32.c   | 26 ++--
>  .../arm/mve/intrinsics/vqaddq_m_n_s8.c| 26 ++--
>  .../arm/mve/intrinsics/vqaddq_m_n_u16.c   | 42 +--
>  .../arm/mve/intrinsics/vqaddq_m_n_u32.c   | 42 +--
>  .../arm/mve/intrinsics/vqaddq_m_n_u8.c| 42 +--
>  .../arm/mve/intrinsics/vqaddq_m_s16.c | 26 ++--
>  .../arm/mve/intrinsics/vqaddq_m_s32.c | 26 ++--
>  .../arm/mve/intrinsics/vqaddq_m_s8.c  | 26 ++--
>  .../arm/mve/intrinsics/vqaddq_m_u16.c | 26 ++--
>  .../arm/mve/intrinsics/vqaddq_m_u32.c | 26 ++--
>  .../arm/mve/intrinsics/vqaddq_m_u8.c  | 26 ++--
>  .../arm/mve/intrinsics/vqaddq_n_s16.c | 16 ++-
>  .../arm/mve/intrinsics/vqaddq_n_s32.c | 16 ++-
>  .../arm/mve/intrinsics/vqaddq_n_s8.c  | 16 ++-
>  .../arm/mve/intrinsics/vqaddq_n_u16.c | 28 -
>  .../arm/mve/intrinsics/vqaddq_n_u32.c | 28 -
>  .../arm/mve/intrinsics/vqaddq_n_u8.c  | 28 -
>  .../arm/mve/intrinsics/vqaddq_s16.c   | 16 ++-
>  .../arm/mve/intrinsics/vqaddq_s32.c   | 16 ++-
>  .../gcc.target/arm/mve/intrinsics/vqaddq_s8.c | 16 ++-
>  .../arm/mve/intrinsics/vqaddq_u16.c   | 16 ++-
>  .../arm/mve/intrinsics/vqaddq_u32.c   | 16 ++-
>  .../gcc.target/arm/mve/intrinsics/vqaddq_u8.c | 16 ++-
>  24 files changed, 516 insertions(+), 72 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s16.c
> index 65d3f770fe2..a659373d441 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqaddq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqaddt.s16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>return vqaddq_m_n_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqaddt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqaddt.s16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>return vqaddq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assemble

RE: [PATCH 28/35] arm: improve tests for vqdmlahq_m*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 28/35] arm: improve tests for vqdmlahq_m*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmlahq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmlahq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmlahq_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmlashq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmlashq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmlashq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmlashq_n_s8.c: Likewise.

Ok.
Thanks,
Kyrill


> ---
>  .../arm/mve/intrinsics/vqdmlahq_m_n_s16.c | 34 ++-
>  .../arm/mve/intrinsics/vqdmlahq_m_n_s32.c | 34 ++-
>  .../arm/mve/intrinsics/vqdmlahq_m_n_s8.c  | 34 ++-
>  .../arm/mve/intrinsics/vqdmlahq_n_s16.c   | 24 +
>  .../arm/mve/intrinsics/vqdmlahq_n_s32.c   | 24 +
>  .../arm/mve/intrinsics/vqdmlahq_n_s8.c| 24 +
>  .../arm/mve/intrinsics/vqdmlashq_m_n_s16.c| 34 ++-
>  .../arm/mve/intrinsics/vqdmlashq_m_n_s32.c| 34 ++-
>  .../arm/mve/intrinsics/vqdmlashq_m_n_s8.c | 34 ++-
>  .../arm/mve/intrinsics/vqdmlashq_n_s16.c  | 24 +
>  .../arm/mve/intrinsics/vqdmlashq_n_s32.c  | 24 +
>  .../arm/mve/intrinsics/vqdmlashq_n_s8.c   | 24 +
>  12 files changed, 264 insertions(+), 84 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c
> index d8c4f4bab8e..94d93874542 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**	vqdmlaht.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +*/
>  int16x8_t
> -foo (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo (int16x8_t add, int16x8_t m1, int16_t m2, mve_pred16_t p)
>  {
> -  return vqdmlahq_m_n_s16 (a, b, c, p);
> +  return vqdmlahq_m_n_s16 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmlaht.s16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**	vqdmlaht.s16	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +*/
>  int16x8_t
> -foo1 (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo1 (int16x8_t add, int16x8_t m1, int16_t m2, mve_pred16_t p)
>  {
> -  return vqdmlahq_m (a, b, c, p);
> +  return vqdmlahq_m (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmlaht.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c
> index 361f5d00bdf..a3dab7fa02e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmlahq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**	vqdmlaht.s32	q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +*/
>  int32x4_t
> -foo (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
> +foo (int32x4_t add, int32x4_t m1, int32_t m2, mve_pred16_t p)
>  {
> -  return vqdmlahq_m_n_s32 (a, b, c, p);
> +  return vqdmlahq_m_n_s32 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqdmlaht.s32"  }  } */
> 
> +/*
> +**foo1:

RE: [PATCH 29/35] arm: improve tests for vqdmul*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 29/35] arm: improve tests for vqdmul*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c: Improve
> tests.
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulhq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmullbq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmullbq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmullbq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmullbq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmullbq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmullbq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmullbq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulltq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulltq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulltq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulltq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulltq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulltq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqdmulltq_s32.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vqdmulhq_m_n_s16.c | 26 ---
>  .../arm/mve/intrinsics/vqdmulhq_m_n_s32.c | 26 ---
>  .../arm/mve/intrinsics/vqdmulhq_m_n_s8.c  | 26 ---
>  .../arm/mve/intrinsics/vqdmulhq_m_s16.c   | 26 ---
>  .../arm/mve/intrinsics/vqdmulhq_m_s32.c   | 26 ---
>  .../arm/mve/intrinsics/vqdmulhq_m_s8.c| 26 ---
>  .../arm/mve/intrinsics/vqdmulhq_n_s16.c   | 16 ++--
>  .../arm/mve/intrinsics/vqdmulhq_n_s32.c   | 16 ++--
>  .../arm/mve/intrinsics/vqdmulhq_n_s8.c| 16 ++--
>  .../arm/mve/intrinsics/vqdmulhq_s16.c | 16 ++--
>  .../arm/mve/intrinsics/vqdmulhq_s32.c | 16 ++--
>  .../arm/mve/intrinsics/vqdmulhq_s8.c  | 16 ++--
>  .../arm/mve/intrinsics/vqdmullbq_m_n_s16.c| 26 ---
>  .../arm/mve/intrinsics/vqdmullbq_m_n_s32.c| 26 ---
>  .../arm/mve/intrinsics/vqdmullbq_m_s16.c  | 26 ---
>  .../arm/mve/intrinsics/vqdmullbq_m_s32.c  | 26 ---
>  .../arm/mve/intrinsics/vqdmullbq_n_s16.c  | 16 ++--
>  .../arm/mve/intrinsics/vqdmullbq_n_s32.c  | 16 ++--
>  .../arm/mve/intrinsics/vqdmullbq_s16.c| 16 ++--
>  .../arm/mve/intrinsics/vqdmullbq_s32.c| 16 ++--
>  .../arm/mve/intrinsics/vqdmulltq_m_n_s16.c| 26 ---
>  .../arm/mve/intrinsics/vqdmulltq_m_n_s32.c| 26 ---
>  .../arm/mve/intrinsics/vqdmulltq_m_s16.c  | 26 ---
>  .../arm/mve/intrinsics/vqdmulltq_m_s32.c  | 26 ---
>  .../arm/mve/intrinsics/vqdmulltq_n_s16.c  | 16 ++--
>  .../arm/mve/intrinsics/vqdmulltq_n_s32.c  | 16 ++--
>  .../arm/mve/intrinsics/vqdmulltq_s16.c| 16 ++--
>  .../arm/mve/intrinsics/vqdmulltq_s32.c| 16 ++--
>  28 files changed, 504 insertions(+), 84 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c
> index 57ab85eaf52..a5c1a106205 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqdmulhq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**	vqdmulht.s16	q[0-9]+, q[

RE: [PATCH 30/35] arm: improve tests for vqrdmlahq*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 30/35] arm: improve tests for vqrdmlahq*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s16.c: Improve
> test.
>   * gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vqrdmlahq_n_s8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vqrdmlahq_m_n_s16.c| 34 ++-
>  .../arm/mve/intrinsics/vqrdmlahq_m_n_s32.c| 34 ++-
>  .../arm/mve/intrinsics/vqrdmlahq_m_n_s8.c | 34 ++-
>  .../arm/mve/intrinsics/vqrdmlahq_n_s16.c  | 24 +
>  .../arm/mve/intrinsics/vqrdmlahq_n_s32.c  | 24 +
>  .../arm/mve/intrinsics/vqrdmlahq_n_s8.c   | 24 +
>  6 files changed, 132 insertions(+), 42 deletions(-)
> 
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s16.c
> index 70c3fa0e9b1..07d689279ac 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqrdmlaht.s16   q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int16x8_t
> -foo (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo (int16x8_t add, int16x8_t m1, int16_t m2, mve_pred16_t p)
>  {
> -  return vqrdmlahq_m_n_s16 (a, b, c, p);
> +  return vqrdmlahq_m_n_s16 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlaht.s16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqrdmlaht.s16   q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int16x8_t
> -foo1 (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo1 (int16x8_t add, int16x8_t m1, int16_t m2, mve_pred16_t p)
>  {
> -  return vqrdmlahq_m (a, b, c, p);
> +  return vqrdmlahq_m (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlaht.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s32.c
> index 75ed9911276..3b02ca16038 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqrdmlaht.s32   q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int32x4_t
> -foo (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
> +foo (int32x4_t add, int32x4_t m1, int32_t m2, mve_pred16_t p)
>  {
> -  return vqrdmlahq_m_n_s32 (a, b, c, p);
> +  return vqrdmlahq_m_n_s32 (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlaht.s32"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqrdmlaht.s32   q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int32x4_t
> -foo1 (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
> +foo1 (int32x4_t add, int32x4_t m1, int32_t m2, mve_pred16_t p)
>  {
> -  return vqrdmlahq_m (a, b, c, p);
> +  return vqrdmlahq_m (add, m1, m2, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlaht.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlahq_m_n_s8.c
> index ddaea545f40..b661bdcb4cf 100644
> -

RE: [PATCH 31/35] arm: improve tests for vqrdmlashq_m*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 31/35] arm: improve tests for vqrdmlashq_m*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s16.c:
>   * gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s32.c:
>   * gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s8.c:

Missing ChangeLog entries.
Ok with that fixed.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vqrdmlashq_m_n_s16.c   | 34 ++-
>  .../arm/mve/intrinsics/vqrdmlashq_m_n_s32.c   | 34 ++-
>  .../arm/mve/intrinsics/vqrdmlashq_m_n_s8.c| 34 ++-
>  3 files changed, 78 insertions(+), 24 deletions(-)
> 
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s16.c
> index 35b9618ca47..da4d724bb46 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqrdmlasht.s16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int16x8_t
> -foo (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo (int16x8_t m1, int16x8_t m2, int16_t add, mve_pred16_t p)
>  {
> -  return vqrdmlashq_m_n_s16 (a, b, c, p);
> +  return vqrdmlashq_m_n_s16 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlasht.s16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**	vmsr	p0, (?:ip|fp|r[0-9]+)(?:	@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqrdmlasht.s16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int16x8_t
> -foo1 (int16x8_t a, int16x8_t b, int16_t c, mve_pred16_t p)
> +foo1 (int16x8_t m1, int16x8_t m2, int16_t add, mve_pred16_t p)
>  {
> -  return vqrdmlashq_m (a, b, c, p);
> +  return vqrdmlashq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlasht.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s32.c
> index 8517835eb61..2430f1cb102 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s32.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?:@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqrdmlasht.s32  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int32x4_t
> -foo (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
> +foo (int32x4_t m1, int32x4_t m2, int32_t add, mve_pred16_t p)
>  {
> -  return vqrdmlashq_m_n_s32 (a, b, c, p);
> +  return vqrdmlashq_m_n_s32 (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlasht.s32"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?:@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqrdmlasht.s32  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int32x4_t
> -foo1 (int32x4_t a, int32x4_t b, int32_t c, mve_pred16_t p)
> +foo1 (int32x4_t m1, int32x4_t m2, int32_t add, mve_pred16_t p)
>  {
> -  return vqrdmlashq_m (a, b, c, p);
> +  return vqrdmlashq_m (m1, m2, add, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqrdmlasht.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s8.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s8.c
> index e42cc63fa74..30915b24e5e 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s8.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqrdmlashq_m_n_s8.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "

RE: [PATCH 32/35] arm: improve tests for vqsubq*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 32/35] arm: improve tests for vqsubq*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_n_s32.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_n_s8.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_n_u16.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_n_u32.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_n_u8.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_s16.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_s32.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_s8.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_u16.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_u32.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_m_u8.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_n_s16.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_n_s32.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_n_s8.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_n_u16.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_n_u32.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_n_u8.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_s16.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_s32.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_s8.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_u16.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_u32.c:
>   * gcc.target/arm/mve/intrinsics/vqsubq_u8.c:

Missing text.
Ok with ChangeLog fixed.
Kyrill

> ---
>  .../arm/mve/intrinsics/vqsubq_m_n_s16.c   | 26 ++--
>  .../arm/mve/intrinsics/vqsubq_m_n_s32.c   | 26 ++--
>  .../arm/mve/intrinsics/vqsubq_m_n_s8.c| 26 ++--
>  .../arm/mve/intrinsics/vqsubq_m_n_u16.c   | 42 +--
>  .../arm/mve/intrinsics/vqsubq_m_n_u32.c   | 42 +--
>  .../arm/mve/intrinsics/vqsubq_m_n_u8.c| 42 +--
>  .../arm/mve/intrinsics/vqsubq_m_s16.c | 26 ++--
>  .../arm/mve/intrinsics/vqsubq_m_s32.c | 26 ++--
>  .../arm/mve/intrinsics/vqsubq_m_s8.c  | 26 ++--
>  .../arm/mve/intrinsics/vqsubq_m_u16.c | 26 ++--
>  .../arm/mve/intrinsics/vqsubq_m_u32.c | 26 ++--
>  .../arm/mve/intrinsics/vqsubq_m_u8.c  | 26 ++--
>  .../arm/mve/intrinsics/vqsubq_n_s16.c | 16 ++-
>  .../arm/mve/intrinsics/vqsubq_n_s32.c | 16 ++-
>  .../arm/mve/intrinsics/vqsubq_n_s8.c  | 16 ++-
>  .../arm/mve/intrinsics/vqsubq_n_u16.c | 28 -
>  .../arm/mve/intrinsics/vqsubq_n_u32.c | 28 -
>  .../arm/mve/intrinsics/vqsubq_n_u8.c  | 28 -
>  .../arm/mve/intrinsics/vqsubq_s16.c   | 16 ++-
>  .../arm/mve/intrinsics/vqsubq_s32.c   | 16 ++-
>  .../gcc.target/arm/mve/intrinsics/vqsubq_s8.c | 16 ++-
>  .../arm/mve/intrinsics/vqsubq_u16.c   | 16 ++-
>  .../arm/mve/intrinsics/vqsubq_u32.c   | 16 ++-
>  .../gcc.target/arm/mve/intrinsics/vqsubq_u8.c | 16 ++-
>  24 files changed, 516 insertions(+), 72 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c
> index abcff4f0e3c..39b8089919d 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s16.c
> @@ -1,23 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?:@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqsubt.s16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int16x8_t
>  foo (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>return vqsubq_m_n_s16 (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.s16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?:@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vqsubt.s16  q[0-9]+, q[0-9]+, (?:ip|fp|r[0-9]+)(?:  @.*|)
> +**   ...
> +*/
>  int16x8_t
>  foo1 (int16x8_t inactive, int16x8_t a, int16_t b, mve_pred16_t p)
>  {
>return vqsubq_m (inactive, a, b, p);
>  }
> 
> -/* { dg-final { scan-assembler "vpst" } } */
> -/* { dg-final { scan-assembler "vqsubt.s16"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vqsubq_m_n_s32.c
> b/gcc/testsuite/gcc.target/a

RE: [PATCH 34/35] arm: improve tests for vrshlq*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 34/35] arm: improve tests for vrshlq*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c: Improve tests.
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_m_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_n_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_n_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_n_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_n_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_n_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_n_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_u8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_x_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_x_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_x_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_x_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_x_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vrshlq_x_u8.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  .../arm/mve/intrinsics/vrshlq_m_n_s16.c   | 25 +++---
>  .../arm/mve/intrinsics/vrshlq_m_n_s32.c   | 25 +++---
>  .../arm/mve/intrinsics/vrshlq_m_n_s8.c| 25 +++---
>  .../arm/mve/intrinsics/vrshlq_m_n_u16.c   | 25 +++---
>  .../arm/mve/intrinsics/vrshlq_m_n_u32.c   | 25 +++---
>  .../arm/mve/intrinsics/vrshlq_m_n_u8.c| 25 +++---
>  .../arm/mve/intrinsics/vrshlq_m_s16.c | 26 ---
>  .../arm/mve/intrinsics/vrshlq_m_s32.c | 26 ---
>  .../arm/mve/intrinsics/vrshlq_m_s8.c  | 26 ---
>  .../arm/mve/intrinsics/vrshlq_m_u16.c | 26 ---
>  .../arm/mve/intrinsics/vrshlq_m_u32.c | 26 ---
>  .../arm/mve/intrinsics/vrshlq_m_u8.c  | 26 ---
>  .../arm/mve/intrinsics/vrshlq_n_s16.c | 16 ++--
>  .../arm/mve/intrinsics/vrshlq_n_s32.c | 16 ++--
>  .../arm/mve/intrinsics/vrshlq_n_s8.c  | 16 ++--
>  .../arm/mve/intrinsics/vrshlq_n_u16.c | 16 ++--
>  .../arm/mve/intrinsics/vrshlq_n_u32.c | 16 ++--
>  .../arm/mve/intrinsics/vrshlq_n_u8.c  | 16 ++--
>  .../arm/mve/intrinsics/vrshlq_s16.c   | 16 ++--
>  .../arm/mve/intrinsics/vrshlq_s32.c   | 16 ++--
>  .../gcc.target/arm/mve/intrinsics/vrshlq_s8.c | 16 ++--
>  .../arm/mve/intrinsics/vrshlq_u16.c   | 16 ++--
>  .../arm/mve/intrinsics/vrshlq_u32.c   | 16 ++--
>  .../gcc.target/arm/mve/intrinsics/vrshlq_u8.c | 16 ++--
>  .../arm/mve/intrinsics/vrshlq_x_s16.c | 25 +++---
>  .../arm/mve/intrinsics/vrshlq_x_s32.c | 25 +++---
>  .../arm/mve/intrinsics/vrshlq_x_s8.c  | 25 +++---
>  .../arm/mve/intrinsics/vrshlq_x_u16.c | 25 +++---
>  .../arm/mve/intrinsics/vrshlq_x_u32.c | 25 +++---
>  .../arm/mve/intrinsics/vrshlq_x_u8.c  | 25 +++---
>  30 files changed, 564 insertions(+), 84 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c
> index cf51de6aa9c..c7d1f3a5b1c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrshlq_m_n_s16.c
> @@ -1,22 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */

RE: [PATCH 33/35] arm: improve tests and fix vrmlaldavhaq*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 33/35] arm: improve tests and fix vrmlaldavhaq*
> 
> gcc/ChangeLog:
> 
>   * config/arm/mve.md (mve_vrmlaldavhq_v4si,
>   mve_vrmlaldavhaq_v4si): Fix spacing vs tabs.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_s32.c: Improve
> test.
>   * gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_u32.c: Likewise.

Ok.
Thanks,
Kyrill

> ---
>  gcc/config/arm/mve.md |  4 +-
>  .../arm/mve/intrinsics/vrmlaldavhaq_p_s32.c   | 24 ++-
>  .../arm/mve/intrinsics/vrmlaldavhaq_p_u32.c   | 40 ++-
>  3 files changed, 62 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index d2ffae6a425..b5e6da4b133 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -2543,7 +2543,7 @@ (define_insn "mve_vrmlaldavhq_v4si"
>VRMLALDAVHQ))
>]
>"TARGET_HAVE_MVE"
> -  "vrmlaldavh.32 %Q0, %R0, %q1, %q2"
> +  "vrmlaldavh.32\t%Q0, %R0, %q1, %q2"
>[(set_attr "type" "mve_move")
>  ])
> 
> @@ -2649,7 +2649,7 @@ (define_insn "mve_vrmlaldavhaq_v4si"
>VRMLALDAVHAQ))
>]
>"TARGET_HAVE_MVE"
> -  "vrmlaldavha.32 %Q0, %R0, %q2, %q3"
> +  "vrmlaldavha.32\t%Q0, %R0, %q2, %q3"
>[(set_attr "type" "mve_move")
>  ])
> 
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_s32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_s32.c
> index 263d3509771..dec4a969dfe 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_s32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_s32.c
> @@ -1,21 +1,41 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?:@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vrmlaldavhat.s32 (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+,
> q[0-9]+(?:@.*|)
> +**   ...
> +*/
>  int64_t
>  foo (int64_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
>  {
>return vrmlaldavhaq_p_s32 (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vrmlaldavhat.s32"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?:@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vrmlaldavhat.s32 (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+,
> q[0-9]+(?:@.*|)
> +**   ...
> +*/
>  int64_t
>  foo1 (int64_t a, int32x4_t b, int32x4_t c, mve_pred16_t p)
>  {
>return vrmlaldavhaq_p (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vrmlaldavhat.s32"  }  } */
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git
> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_u32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_u32.c
> index 83ab68c001b..f3c8bfd121c 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_u32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vrmlaldavhaq_p_u32.c
> @@ -1,21 +1,57 @@
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
>  /* { dg-add-options arm_v8_1m_mve } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?:@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vrmlaldavhat.u32 (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+,
> q[0-9]+(?:@.*|)
> +**   ...
> +*/
>  uint64_t
>  foo (uint64_t a, uint32x4_t b, uint32x4_t c, mve_pred16_t p)
>  {
>return vrmlaldavhaq_p_u32 (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vrmlaldavhat.u32"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?:@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vrmlaldavhat.u32 (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+,
> q[0-9]+(?:@.*|)
> +**   ...
> +*/
>  uint64_t
>  foo1 (uint64_t a, uint32x4_t b, uint32x4_t c, mve_pred16_t p)
>  {
>return vrmlaldavhaq_p (a, b, c, p);
>  }
> 
> -/* { dg-final { scan-assembler "vrmlaldavhat.u32"  }  } */
> +/*
> +**foo2:
> +**   ...
> +**   vmsr p0, (?:ip|fp|r[0-9]+)(?:@.*|)
> +**   ...
> +**   vpst(?: @.*|)
> +**   ...
> +**   vrmlaldavhat.u32 (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), q[0-9]+,
> q[0-9]+(?:@.*|)
> +**   ...
> +*/
> +uint64_t
> +foo2 (uint32x4_t b, uint32x4_t c, mve_pred16_t p)
> +{
> +  return vrmlaldavhaq_p (1, b, c, p);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> --
> 2.25.1



RE: [PATCH 35/35] arm: improve tests for vsetq_lane*

2022-11-22 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 17, 2022 4:38 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> ; Andrea Corallo 
> Subject: [PATCH 35/35] arm: improve tests for vsetq_lane*
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c: Improve test.
>   * gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c: Likewise.
>   * gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c: Likewise.
> ---
>  .../arm/mve/intrinsics/vsetq_lane_f16.c   | 36 +++--
>  .../arm/mve/intrinsics/vsetq_lane_f32.c   | 36 +++--
>  .../arm/mve/intrinsics/vsetq_lane_s16.c   | 24 ++--
>  .../arm/mve/intrinsics/vsetq_lane_s32.c   | 24 ++--
>  .../arm/mve/intrinsics/vsetq_lane_s64.c   | 27 ++---
>  .../arm/mve/intrinsics/vsetq_lane_s8.c| 24 ++--
>  .../arm/mve/intrinsics/vsetq_lane_u16.c   | 36 +++--
>  .../arm/mve/intrinsics/vsetq_lane_u32.c   | 36 +++--
>  .../arm/mve/intrinsics/vsetq_lane_u64.c   | 39 ---
>  .../arm/mve/intrinsics/vsetq_lane_u8.c| 36 +++--
>  10 files changed, 284 insertions(+), 34 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> index e03e9620528..b5c9f4d5eb8 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> @@ -1,15 +1,45 @@
> -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } 
> {""} } */
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmov.16 q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +*/
>  float16x8_t
>  foo (float16_t a, float16x8_t b)
>  {
> -return vsetq_lane_f16 (a, b, 0);
> +  return vsetq_lane_f16 (a, b, 1);
>  }
> 

Hmm, for these tests we should be able to scan for more specific codegen as 
we're setting individual lanes, so we should be able to scan for lane 1 in the 
vmov instruction, though it may need to be flipped for big-endian.
Thanks,
Kyrill

> -/* { dg-final { scan-assembler "vmov.16"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmov.16 q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +*/
> +float16x8_t
> +foo1 (float16_t a, float16x8_t b)
> +{
> +  return vsetq_lane (a, b, 1);
> +}
> +
> +/*
> +**foo2:
> +**   ...
> +**   vmov.16 q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +*/
> +float16x8_t
> +foo2 (float16x8_t b)
> +{
> +  return vsetq_lane (1.1, b, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
> index 2b9f1a7e627..211083ce5d4 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
> @@ -1,15 +1,45 @@
> -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } 
> {""} } */
>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>  /* { dg-add-options arm_v8_1m_mve_fp } */
>  /* { dg-additional-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> 
>  #include "arm_mve.h"
> 
> +/*
> +**foo:
> +**   ...
> +**   vmov.32 q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +*/
>  float32x4_t
>  foo (float32_t a, float32x4_t b)
>  {
> -return vsetq_lane_f32 (a, b, 0);
> +  return vsetq_lane_f32 (a, b, 1);
>  }
> 
> -/* { dg-final { scan-assembler "vmov.32"  }  } */
> 
> +/*
> +**foo1:
> +**   ...
> +**   vmov.32 q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +*/
> +float32x4_t
> +foo1 (float32_t a, float32x4_t b)
> +{
> +  return vsetq_lane (a, b, 1);
> +}
> +
> +/*
> +**foo2:
> +**   ...
> +**   vmov.32 q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?: @.*|)
> +**   ...
> +*/
> +float32x4_t
> +foo2 (float32x4_t b)
> +{
> +  return vsetq_lane (1.1, b, 1);
> +}
> +
> +/* { dg-final { scan-assembler-not "__ARM_undef" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c
> b/gcc/testsuite/gcc

Re: [PATCH 2/2] Add a new warning option -Wstrict-flex-arrays.

2022-11-22 Thread Kees Cook via Gcc-patches
On Tue, Nov 22, 2022 at 03:02:04PM +, Qing Zhao wrote:
> 
> 
> > On Nov 22, 2022, at 9:10 AM, Qing Zhao via Gcc-patches 
> >  wrote:
> > 
> > 
> > 
> >> On Nov 22, 2022, at 3:16 AM, Richard Biener  wrote:
> >> 
> >> On Mon, 21 Nov 2022, Qing Zhao wrote:
> >> 
> >>> 
> >>> 
>  On Nov 18, 2022, at 11:31 AM, Kees Cook  wrote:
>  
>  On Fri, Nov 18, 2022 at 03:19:07PM +, Qing Zhao wrote:
> > Hi, Richard,
> > 
> > Honestly, it's very hard for me to decide what's the best way to handle 
> > the interaction 
> > between -fstrict-flex-array=M and -Warray-bounds=N. 
> > 
> > Ideally,  -fstrict-flex-array=M should completely control the behavior 
> > of -Warray-bounds.
> > If possible, I prefer this solution.
> > 
> > However, -Warray-bounds is included in -Wall, and has been used 
> > extensively for a long time.
> > It's not safe to change its default behavior. 
>  
>  I prefer that -fstrict-flex-arrays controls -Warray-bounds. That
>  it is in -Wall is _good_ for this reason. :) No one is going to add
>  -fstrict-flex-arrays (at any level) without understanding what it does
>  and wanting those effects on -Warray-bounds.
> >>> 
> >>> 
> >>> The major difficulties to let -fstrict-flex-arrays controlling 
> >>> -Warray-bounds was discussed in the following threads:
> >>> 
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604133.html
> >>> 
> >>> Please take a look at the discussion and let me know your opinion.
> >> 
> >> My opinion is now, after re-considering and with seeing your new 
> >> patch, that -Warray-bounds=2 should be changed to only add
> >> "the intermediate results of pointer arithmetic that may yield out of 
> >> bounds values" and that what it considers a flex array should now
> >> be controlled by -fstrict-flex-arrays only.
> >> 
> >> That is, I think, the only thing that's not confusing to users even
> >> if that implies a change from previous behavior that we should
> >> document by clarifying the -Warray-bounds documentation as well as
> >> by adding an entry to the Caveats section of gcc-13/changes.html
> >> 
> >> That also means that =2 will get _less_ warnings with GCC 13 when
> >> the user doesn't use -fstrict-flex-arrays as well.
> > 
> > Okay.  So, this is for -Warray-bounds=2.
> > 
> > For -Warray-bounds=1 -fstrict-flex-array=N, if N > 1, should 
> > -fstrict-flex-array=N control -Warray-bounds=1?
> 
> More thinking on this. (I might have misunderstood a little bit in the 
> previous email.)
> 
> If I understand correctly now, what you proposed was:
> 
> 1. The level of -Warray-bounds will NOT control whether a trailing array is 
> considered a flex array member anymore. 
> 2. Only the level of -fstrict-flex-arrays will control this;
> 3. Keep the current default behavior of -Warray-bounds on treating trailing 
> arrays as flex array members (treating all [0], [1], and [] as flexible array 
> members). 
> 4. Updating the documentation for -Warray-bounds by clarifying this change, 
> and also as an entry to the Caveats section on such change on -Warray-bounds.
> 
> If the above is correct, Yes, I like this change. Both the user interface and 
> the internal implementation will be simplified and cleaner. 
> 
> Let me know if you see any issue with my above understanding.
> 
> Thanks a lot.

FWIW, this matches what I think makes the most sense too.

-- 
Kees Cook


[committed] libstdc++: Add testcase for fs::path constraint recursion [PR106201]

2022-11-22 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/106201
* testsuite/27_io/filesystem/iterators/106201.cc: New test.
---
 .../testsuite/27_io/filesystem/iterators/106201.cc   | 12 
 1 file changed, 12 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc

diff --git a/libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc 
b/libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc
new file mode 100644
index 000..4a64e675816
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc
@@ -0,0 +1,12 @@
+// { dg-options "-std=gnu++20" }
+// { dg-do compile { target c++20 } }
+// { dg-require-filesystem-ts "" }
+
+// PR libstdc++/106201 constraint recursion in path(Source const&) constructor.
+
+#include <filesystem>
+#include <iterator>
+using I = std::counted_iterator<std::filesystem::directory_iterator>;
+static_assert( std::swappable<I> );
+using R = std::counted_iterator<std::filesystem::recursive_directory_iterator>;
+static_assert( std::swappable<R> );
-- 
2.38.1



[committed] libstdc++: Replace std::isdigit and std::isxdigit in <format> [PR107817]

2022-11-22 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

These functions aren't usable in constant expressions. Provide our own
implementations, based on __from_chars_alnum_to_val from <charconv>.

libstdc++-v3/ChangeLog:

PR libstdc++/107817
* include/std/charconv (__from_chars_alnum_to_val): Add
constexpr for C++20.
* include/std/format (__is_digit, __is_xdigit): New functions.
(_Spec::_S_parse_width_or_precision): Use __is_digit.
(__formatter_fp::parse): Use __is_xdigit.
---
 libstdc++-v3/include/std/charconv |  2 +-
 libstdc++-v3/include/std/format   | 12 +---
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/std/charconv 
b/libstdc++-v3/include/std/charconv
index 8f02395172f..8b2acc5bf8d 100644
--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -454,7 +454,7 @@ namespace __detail
   // If _DecOnly is false: if the character is an alphanumeric digit, then
   // return its corresponding base-36 value, otherwise return a value >= 127.
   template
-_GLIBCXX23_CONSTEXPR unsigned char
+_GLIBCXX20_CONSTEXPR unsigned char
 __from_chars_alnum_to_val(unsigned char __c)
 {
   if _GLIBCXX17_CONSTEXPR (_DecOnly)
diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 7ae58eb2416..23ffbdabed8 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -358,6 +358,12 @@ namespace __format
 size_t
 __int_from_arg(const basic_format_arg<_Context>& __arg);
 
+  constexpr bool __is_digit(char __c)
+  { return std::__detail::__from_chars_alnum_to_val(__c) < 10; }
+
+  constexpr bool __is_xdigit(char __c)
+  { return std::__detail::__from_chars_alnum_to_val(__c) < 16; }
+
   template
 struct _Spec
 {
@@ -469,7 +475,7 @@ namespace __format
  unsigned short& __val, bool& __arg_id,
  basic_format_parse_context<_CharT>& __pc)
   {
-   if (std::isdigit(*__first))
+   if (__format::__is_digit(*__first))
  {
auto [__v, __ptr] = __format::__parse_integer(__first, __last);
if (!__ptr)
@@ -1537,7 +1543,7 @@ namespace __format
 
  if (__trailing_zeros)
{
- if (!std::isxdigit(__s[0]))
+ if (!__format::__is_xdigit(__s[0]))
--__sigfigs;
  __z = __prec - __sigfigs;
}
@@ -1627,7 +1633,7 @@ namespace __format
{
  __fill_char = _CharT('0');
  // Write sign before zero filling.
- if (!std::isxdigit(__narrow_str[0]))
+ if (!__format::__is_xdigit(__narrow_str[0]))
{
  *__out++ = __str[0];
  __str.remove_prefix(1);
-- 
2.38.1



Re: [PATCH] testsuite: Fix missing EFFECTIVE_TARGETS variable errors

2022-11-22 Thread Maciej W. Rozycki
On Mon, 21 Nov 2022, Jeff Law wrote:

> > gcc/testsuite/
> > * lib/target-supports.exp
> > (check_effective_target_mpaired_single): Add `args' argument and
> > pass it to `check_no_compiler_messages' replacing
> > `-mpaired-single'.
> > (add_options_for_mips_loongson_mmi): Add `args' argument and
> > pass it to `check_no_compiler_messages'.
> > (check_effective_target_mips_msa): Add `args' argument and pass
> > it to `check_no_compiler_messages' replacing `-mmsa'.
> > (check_effective_target_mpaired_single_runtime)
> > (add_options_for_mpaired_single): Pass `-mpaired-single' to
> > `check_effective_target_mpaired_single'.
> > (check_effective_target_mips_loongson_mmi_runtime)
> > (add_options_for_mips_loongson_mmi): Pass `-mloongson-mmi' to
> > `check_effective_target_mips_loongson_mmi'.
> > (check_effective_target_mips_msa_runtime)
> > (add_options_for_mips_msa): Pass `-mmsa' to
> > `check_effective_target_mips_msa'.
> > (et-is-effective-target): Verify that EFFECTIVE_TARGETS exists
> > and if not, just check if the current compilation environment
> > supports the target feature requested.
> > (check_vect_support_and_set_flags): Pass `-mpaired-single',
> > `-mloongson-mmi', and `-mmsa' to the respective target feature
> > checks.
> 
> OK.

 I have committed it now, thanks for your review.

  Maciej


Re: [PATCH] Fix autoprofiledbootstrap build

2022-11-22 Thread Jeff Law via Gcc-patches



On 11/21/22 14:57, Eugene Rozenfeld via Gcc-patches wrote:

1. Fix gcov version
2. Don't attempt to create an autoprofile file for cc1 since cc1plus
(not cc1) is not invoked when building cc1
3. Fix documentation typo

Tested on x86_64-pc-linux-gnu.

gcc/ChangeLog:

* c/Make-lang.in: Don't attempt to create an autoprofile file for cc1
* cp/Make-lang.in: Fix gcov version
* lto/Make-lang.in: Fix gcov version
* doc/install.texi: Fix documentation typo


Just to be 100% sure.  While the compiler is built with cc1plus, various 
runtime libraries are still built with the C compiler and thus would use 
cc1.  AFAICT it looks like we don't try to build the runtime libraries 
to get any data about the behavior of the C compiler.  Can you confirm?



Assuming that's correct, this is fine for the trunk.


Thanks,

Jeff



Re: [PATCH] Fix count comparison in ipa-cp

2022-11-22 Thread Jeff Law via Gcc-patches



On 11/21/22 14:26, Eugene Rozenfeld via Gcc-patches wrote:

The existing comparison was incorrect for non-PRECISE counts
(e.g., AFDO): we could end up with a 0 base_count, which could
lead to asserts, e.g., in good_cloning_opportunity_p.

gcc/ChangeLog:

 * ipa-cp.cc (ipcp_propagate_stage): Fix profile count comparison.


OK.  Probably somewhat painful to pull together a reliable test for 
this, right?



Jeff




Re: [PATCH 2/5] c++: Set the locus of the function result decl

2022-11-22 Thread Jason Merrill via Gcc-patches

On 11/20/22 12:06, Bernhard Reutner-Fischer wrote:

Hi Jason!

The "meh" of result-decl-plugin-test-2.C should likely be omitted;
grokdeclarator would need some changes to add richloc hints and we would not
be able to make a reliable guess what to remove precisely.
C.f. /* Check all other uses of type modifiers.  */
Furthermore it is unrelated to DECL_RESULT so not of direct interest
here. The other tests in test-2.C, f() and huh() should work though.

I don't know if it's acceptable to change ipa-pure-const to make the
missing noreturn warning more precise and emit a fixit-hint. At least it
would be a real test for the DECL_RESULT and would spare us the plugin.


The main problem I see with that change is that the syntax of the fixit 
might be wrong for non-C-family front-ends.


Here's a version of the patch that fixes template/method handling, and 
adjusts -Waggregate-return as well:
From 5075d2ac12f655f8f83f6f3be27e2c1141e1ce99 Mon Sep 17 00:00:00 2001
From: Bernhard Reutner-Fischer 
Date: Sun, 20 Nov 2022 18:06:04 +0100
Subject: [PATCH] c++: Set the locus of the function result decl
To: gcc-patches@gcc.gnu.org

gcc/cp/ChangeLog:

	* decl.cc (grokdeclarator): Build RESULT_DECL.
	(start_preparsed_function): Copy location from template.

gcc/ChangeLog:

	* function.cc (init_function_start): Use DECL_RESULT location
	for -Waggregate-return warning.
	* ipa-pure-const.cc (suggest_attribute): Add fixit-hint for the
	noreturn attribute.

gcc/testsuite/ChangeLog:

	* c-c++-common/pr68833-1.c: Adjust noreturn warning line number.
	* gcc.dg/noreturn-1.c: Likewise.
	* g++.dg/diagnostic/return-type-loc1.C: New test.
	* g++.dg/other/resultdecl-1.C: New test.

Co-authored-by: Jason Merrill 
---
 gcc/cp/decl.cc| 26 +--
 gcc/function.cc   |  3 +-
 gcc/ipa-pure-const.cc | 14 +++-
 gcc/testsuite/c-c++-common/pr68833-1.c|  2 +-
 .../g++.dg/diagnostic/return-type-loc1.C  | 20 
 gcc/testsuite/g++.dg/other/resultdecl-1.C | 32 +++
 gcc/testsuite/gcc.dg/noreturn-1.c |  2 +-
 7 files changed, 93 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/return-type-loc1.C
 create mode 100644 gcc/testsuite/g++.dg/other/resultdecl-1.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 544efdc9914..2c5cd930e0a 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -14774,6 +14774,18 @@ grokdeclarator (const cp_declarator *declarator,
 	else if (constinit_p)
 	  DECL_DECLARED_CONSTINIT_P (decl) = true;
   }
+else if (TREE_CODE (decl) == FUNCTION_DECL)
+  {
+	location_t loc = smallest_type_location (declspecs);
+	if (loc != UNKNOWN_LOCATION)
+	  {
+	tree restype = TREE_TYPE (TREE_TYPE (decl));
+	tree resdecl = build_decl (loc, RESULT_DECL, 0, restype);
+	DECL_ARTIFICIAL (resdecl) = 1;
+	DECL_IGNORED_P (resdecl) = 1;
+	DECL_RESULT (decl) = resdecl;
+	  }
+  }
 
 /* Record constancy and volatility on the DECL itself .  There's
no need to do this when processing a template; we'll do this
@@ -17328,9 +17340,19 @@ start_preparsed_function (tree decl1, tree attrs, int flags)
 
   if (DECL_RESULT (decl1) == NULL_TREE)
 {
-  tree resdecl;
+  /* In a template instantiation, copy the return type location.  When
+	 parsing, the location will be set in grokdeclarator.  */
+  location_t loc = input_location;
+  if (DECL_TEMPLATE_INSTANTIATION (decl1)
+	  && !DECL_CXX_CONSTRUCTOR_P (decl1)
+	  && !DECL_CXX_DESTRUCTOR_P (decl1))
+	{
+	  tree tmpl = template_for_substitution (decl1);
+	  tree res = DECL_RESULT (DECL_TEMPLATE_RESULT (tmpl));
+	  loc = DECL_SOURCE_LOCATION (res);
+	}
 
-  resdecl = build_decl (input_location, RESULT_DECL, 0, restype);
+  tree resdecl = build_decl (loc, RESULT_DECL, 0, restype);
   DECL_ARTIFICIAL (resdecl) = 1;
   DECL_IGNORED_P (resdecl) = 1;
   DECL_RESULT (decl1) = resdecl;
diff --git a/gcc/function.cc b/gcc/function.cc
index 9c8773bbc59..dc333c27e92 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -4997,7 +4997,8 @@ init_function_start (tree subr)
   /* Warn if this value is an aggregate type,
  regardless of which calling convention we are using for it.  */
   if (AGGREGATE_TYPE_P (TREE_TYPE (DECL_RESULT (subr
-warning (OPT_Waggregate_return, "function returns an aggregate");
+warning_at (DECL_SOURCE_LOCATION (DECL_RESULT (subr)),
+		OPT_Waggregate_return, "function returns an aggregate");
 }
 
 /* Expand code to verify the stack_protect_guard.  This is invoked at
diff --git a/gcc/ipa-pure-const.cc b/gcc/ipa-pure-const.cc
index 572a6da274f..8f6e8f63d91 100644
--- a/gcc/ipa-pure-const.cc
+++ b/gcc/ipa-pure-const.cc
@@ -63,6 +63,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-fnsummary.h"
 #include "symtab-thunks.h"
 #include "dbgcnt.h"
+#include "gcc-rich-location.h"
 
 /* Lattice values for const and pure f

Re: [PATCH v2] tree-object-size: Support strndup and strdup

2022-11-22 Thread Jeff Law via Gcc-patches



On 11/21/22 07:27, Siddhesh Poyarekar wrote:

On 2022-11-20 10:42, Jeff Law wrote:


On 11/4/22 06:48, Siddhesh Poyarekar wrote:
Use string length of input to strdup to determine the usable size of the
resulting object.  Avoid doing the same for strndup since there's a
chance that the input may be too large, resulting in unnecessary
overhead, or worse, the input may not be NUL-terminated, resulting in a
crash where there would otherwise have been none.

gcc/ChangeLog:

* tree-object-size.cc (todo): New variable.
(object_sizes_execute): Use it.
(strdup_object_size): New function.
(call_object_size): Use it.

gcc/testsuite/ChangeLog:

* gcc.dg/builtin-dynamic-object-size-0.c (test_strdup,
test_strndup, test_strdup_min, test_strndup_min): New tests.
(main): Call them.
* gcc.dg/builtin-dynamic-object-size-1.c: Silence overread
warnings.
* gcc.dg/builtin-dynamic-object-size-2.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-3.c: Likewise.
* gcc.dg/builtin-dynamic-object-size-4.c: Likewise.
* gcc.dg/builtin-object-size-1.c: Silence overread warnings.
Declare free, strdup and strndup.
(test11): New test.
(main): Call it.
* gcc.dg/builtin-object-size-2.c: Silence overread warnings.
Declare free, strdup and strndup.
(test9): New test.
(main): Call it.
* gcc.dg/builtin-object-size-3.c: Silence overread warnings.
Declare free, strdup and strndup.
(test11): New test.
(main): Call it.
* gcc.dg/builtin-object-size-4.c: Silence overread warnings.
Declare free, strdup and strndup.
(test9): New test.
(main): Call it.


I'm struggling to see how the SSA updating is correct.  Yes we need 
to update the virtuals due to the introduction of the call to strlen, 
particularly when SRC is not a string constant.  But do we need to do 
more?


Don't we end up gimplifying the 1 + strlenfn (src) expression? Can 
that possibly create new SSA_NAMEs?  Do those need to be put into SSA 
form? I feel like I'm missing something here...


We do all of that manually in gimplify_size_expressions, the only 
thing left to do is updating virtuals AFAICT.


I guess it's actually buried down in force_gimple_operand; the 
temporaries aren't going to be live across the new gimple sequence, and 
each destination gets its own SSA_NAME, so it ought to be safe.  Just 
had to work a bit further through things.


OK for the trunk.


Thanks,
jeff



