Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.

2021-08-05 Thread Hongtao Liu via Gcc-patches
On Fri, Aug 6, 2021 at 12:59 PM Hongtao Liu  wrote:
>
> On Fri, Aug 6, 2021 at 11:44 AM Andrew Pinski via Gcc-patches
>  wrote:
> >
> > On Thu, Aug 5, 2021 at 8:33 PM liuhongt via Gcc-patches
> >  wrote:
> > >
> > > Hi:
> > > ---
> > > OK, I think sth is amiss here upthread.  insv/extv do look like they
> > > are designed
> > > to work on integer modes (but docs do not say anything about this here).
> > > In fact the caller of extract_bit_field_using_extv is named
> > > extract_integral_bit_field.  Of course nothing seems to check what kind of
> > > modes we're dealing with, but we're for example happily doing
> > > expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
> > > some integer mode and op0 is HFmode?  From the above I get it's
> > > the other way around?  In that case we should wrap the
> > > call to extract_integral_bit_field, extracting in an integer mode with the
> > > same size as 'mode' and then converting the result as (subreg:HF (reg:HI 
> > > ...)).
> >
> > This seems related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93235 .
CC jakub.
> > I wonder why the fix for that did not help here.
> >
> aarch64 didn't hit gcc_assert with my testcase, and I debugged it to
> figure out why.
>
> At the GIMPLE level, both x86 and aarch64 are the same, with
> _3 = BIT_FIELD_REF ;
>
> and they both go into
> extract_bit_field_using_extv
>
> The difference is aarch64 has ext_mode as DImode, but x86 has ext_mode
> as SImode.
> with ext_mode as DImode and target as (reg:HF 94), aarch64 doesn't hit
> gcc_assert in
>  gen_lowpart (ext_mode, target)
>
> since validate_subreg allows (subreg:DI (reg:HF)), but disallows
> (subreg:SI (reg:HF)).
>
>   /* ??? This should not be here.  Temporarily continue to allow word_mode
>  subregs of anything.  The most common offender is (subreg:SI (reg:DF)).
>  Generally, backends are doing something sketchy but it'll take time to
>  fix them all.  */
>   if (omode == word_mode)
> ;
>
> ext_mode is assigned from extv->field_mode, which is initialized in
> get_best_reg_extraction_insn.
> get_best_reg_extraction_insn will finally call
> get_optab_extraction_insn and find that
> aarch64 doesn't have CODE_FOR_extzvsi but x86 does.
>
> That's why aarch64 has ext_mode as DImode and x86 as SImode.
>
> > Thanks,
> > Andrew Pinski
> >
> > > ---
> > >   This is a separate patch as a follow-up to the comments above.
> > >
> > > gcc/ChangeLog:
> > >
> > > * expmed.c (extract_bit_field_1): Wrap the call to
> > > extract_integral_bit_field, extracting in an integer mode with
> > > the same size as 'tmode' and then converting the result
> > > as (subreg:tmode (reg:imode)).
> > >
> > > gcc/testsuite/ChangeLog:
> > > * gcc.target/i386/float16-5.c: New test.
> > > ---
> > >  gcc/expmed.c  | 19 +++
> > >  gcc/testsuite/gcc.target/i386/float16-5.c | 12 
> > >  2 files changed, 31 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
> > >
> > > diff --git a/gcc/expmed.c b/gcc/expmed.c
> > > index 3143f38e057..72790693ef0 100644
> > > --- a/gcc/expmed.c
> > > +++ b/gcc/expmed.c
> > > @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 
> > > bitsize, poly_uint64 bitnum,
> > >op0_mode = opt_scalar_int_mode ();
> > >  }
> > >
> > > +  /* Make sure we are playing with integral modes.  Pun with subregs
> > > + if we aren't.  When tmode is HFmode and op0 is SImode, there would
> > > + be an ICE in extract_integral_bit_field.  */
> > > +  if (int_mode_for_mode (tmode).exists (&imode)
> > > +  && imode != tmode
> > > +  && imode != GET_MODE (op0))
> > > +{
> > > +  rtx ret = extract_integral_bit_field (op0, op0_mode,
> > > +   bitsize.to_constant (),
> > > +   bitnum.to_constant (), 
> > > unsignedp,
> > > +   NULL, imode, imode,
> > > +   reverse, fallback_p);
> > > +  gcc_assert (ret);
> > > +
> > > +  if (!REG_P (ret))
> > > +   ret = force_reg (imode, ret);
> > > +  return gen_lowpart_SUBREG (tmode, ret);
> > > +}
> > > +
> > >/* It's possible we'll need to handle other cases here for
> > >   polynomial bitnum and bitsize.  */
> > >
> > > diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c 
> > > b/gcc/testsuite/gcc.target/i386/float16-5.c
> > > new file mode 100644
> > > index 000..ebc0af1490b
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/float16-5.c
> > > @@ -0,0 +1,12 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-msse2 -O2" } */
> > > +_Float16
> > > +foo (int a)
> > > +{
> > > +  union {
> > > +int a;
> > > +_Float16 b;
> > > +  }c;
> > > +  c.a = a;
> > > +  return c.b;
> > > +}
> > > --
> > > 2.27.0
> > >
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao


Re: [PATCH v3] gcov: Add __gcov_info_to_gcda()

2021-08-05 Thread Sebastian Huber

On 05/08/2021 14:53, Martin Liška wrote:

On 7/23/21 11:39 AM, Sebastian Huber wrote:
Add __gcov_info_to_gcda() to libgcov to get the gcda data for a gcda
info in a freestanding environment.  It is intended to be used with the
-fprofile-info-section option.  A crude test program which doesn't use
a linker script is (use "gcc -coverage -fprofile-info-section -lgcc
test.c" to compile it):


The patch can be installed once the following nits are fixed:


Thanks for the review, I checked it in like this:

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=9124bbe1857f0d3a3015d6461d5f8d04f07cab85

I hope the format is now all right.

--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/


Re: Why vectorization didn't turn on by -O2

2021-08-05 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 5, 2021 at 5:20 AM Segher Boessenkool
 wrote:
>
> On Wed, Aug 04, 2021 at 11:22:53AM +0100, Richard Sandiford wrote:
> > Segher Boessenkool  writes:
> > > On Wed, Aug 04, 2021 at 10:10:36AM +0100, Richard Sandiford wrote:
> > >> Richard Biener  writes:
> > >> > Alternatively only enable loop vectorization at -O2 (the above checks
> > >> > flag_tree_slp_vectorize as well).  At least the cost model kind
> > >> > does not have any influence on BB vectorization, that is, we get the
> > >> > same pros and cons as we do for -O3.
> > >>
> > >> Yeah, but a lot of the loop vector cost model choice is about controlling
> > >> code size growth and avoiding excessive runtime versioning tests.
> > >
> > > Both of those depend a lot on the target, and target-specific conditions
> > > as well (which CPU model is selected for example).  Can we factor that
> > > in somehow?  Maybe we need some target hook that returns the expected
> > > percentage code growth for vectorising a given loop, for example, and
> > > -O2 vs. -O3 then selects what percentage is acceptable.
> > >
> > >> BB SLP
> > >> should be a win on both code size and performance (barring significant
> > >> target costing issues).
> > >
> > > Yeah -- but this could use a similar hook as well (just a straightline
> > > piece of code instead of a loop).
> >
> > I think anything like that should be driven by motivating use cases.
> > It's not something that we can easily decide in the abstract.
> >
> > The results so far with using very-cheap at -O2 have been promising,
> > so I don't think new hooks should block that becoming the default.
>
> Right, but it wouldn't hurt to think a sec if we are on the right path
> forward.  It is crystal clear that to make good decisions about what
> and how to vectorise you need to take *some* target characteristics into
> account, and that will have to happen sooner rather than later.
>
> This was all in reply to
>
> > >> Yeah, but a lot of the loop vector cost model choice is about controlling
> > >> code size growth and avoiding excessive runtime versioning tests.
>
> It was not meant to hold up these patches :-)
>
> > >> PR100089 was an exception because we ended up keeping unvectorised
> > >> scalar code that would never have existed otherwise.  BB SLP proper
> > >> shouldn't have that problem.
> > >
> > > It also is a tiny piece of code.  There will always be tiny examples
> > > that are much worse (or much better) than average.
> >
> > Yeah, what makes PR100089 important isn't IMO the test itself, but the
> > underlying problem that the PR exposed.  Enabling this “BB SLP in loop
> > vectorisation” code can lead to the generation of scalar COND_EXPRs even
> > though we know that ifcvt doesn't have a proper cost model for deciding
> > whether scalar COND_EXPRs are a win.
> >
> > Introducing scalar COND_EXPRs at -O3 is arguably an acceptable risk
> > (although still dubious), but I think it's something we need to avoid
> > for -O2, even if that means losing the optimisation.
>
> Yeah -- -O2 should almost always do the right thing, while -O3 can do
> bad things more often, it just has to be better "on average".
>
>
> Segher

Move thread to gcc-patches and gcc

-- 
BR,
Hongtao


Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.

2021-08-05 Thread Hongtao Liu via Gcc-patches
On Fri, Aug 6, 2021 at 11:44 AM Andrew Pinski via Gcc-patches
 wrote:
>
> On Thu, Aug 5, 2021 at 8:33 PM liuhongt via Gcc-patches
>  wrote:
> >
> > Hi:
> > ---
> > OK, I think sth is amiss here upthread.  insv/extv do look like they
> > are designed
> > to work on integer modes (but docs do not say anything about this here).
> > In fact the caller of extract_bit_field_using_extv is named
> > extract_integral_bit_field.  Of course nothing seems to check what kind of
> > modes we're dealing with, but we're for example happily doing
> > expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
> > some integer mode and op0 is HFmode?  From the above I get it's
> > the other way around?  In that case we should wrap the
> > call to extract_integral_bit_field, extracting in an integer mode with the
> > same size as 'mode' and then converting the result as (subreg:HF (reg:HI 
> > ...)).
>
> This seems related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93235 .
> I wonder why the fix for that did not help here.
>
aarch64 didn't hit gcc_assert with my testcase, and I debugged it to
figure out why.

At the GIMPLE level, both x86 and aarch64 are the same, with
_3 = BIT_FIELD_REF ;

and they both go into
extract_bit_field_using_extv

The difference is aarch64 has ext_mode as DImode, but x86 has ext_mode
as SImode.
with ext_mode as DImode and target as (reg:HF 94), aarch64 doesn't hit
gcc_assert in
 gen_lowpart (ext_mode, target)

since validate_subreg allows (subreg:DI (reg:HF)), but disallows
(subreg:SI (reg:HF)).

  /* ??? This should not be here.  Temporarily continue to allow word_mode
 subregs of anything.  The most common offender is (subreg:SI (reg:DF)).
 Generally, backends are doing something sketchy but it'll take time to
 fix them all.  */
  if (omode == word_mode)
;

ext_mode is assigned from extv->field_mode, which is initialized in
get_best_reg_extraction_insn.
get_best_reg_extraction_insn will finally call
get_optab_extraction_insn and find that
aarch64 doesn't have CODE_FOR_extzvsi but x86 does.

That's why aarch64 has ext_mode as DImode and x86 as SImode.

> Thanks,
> Andrew Pinski
>
> > ---
> >   This is a separate patch as a follow-up to the comments above.
> >
> > gcc/ChangeLog:
> >
> > * expmed.c (extract_bit_field_1): Wrap the call to
> > extract_integral_bit_field, extracting in an integer mode with
> > the same size as 'tmode' and then converting the result
> > as (subreg:tmode (reg:imode)).
> >
> > gcc/testsuite/ChangeLog:
> > * gcc.target/i386/float16-5.c: New test.
> > ---
> >  gcc/expmed.c  | 19 +++
> >  gcc/testsuite/gcc.target/i386/float16-5.c | 12 
> >  2 files changed, 31 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
> >
> > diff --git a/gcc/expmed.c b/gcc/expmed.c
> > index 3143f38e057..72790693ef0 100644
> > --- a/gcc/expmed.c
> > +++ b/gcc/expmed.c
> > @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 
> > bitsize, poly_uint64 bitnum,
> >op0_mode = opt_scalar_int_mode ();
> >  }
> >
> > +  /* Make sure we are playing with integral modes.  Pun with subregs
> > + if we aren't.  When tmode is HFmode and op0 is SImode, there would
> > + be an ICE in extract_integral_bit_field.  */
> > +  if (int_mode_for_mode (tmode).exists (&imode)
> > +  && imode != tmode
> > +  && imode != GET_MODE (op0))
> > +{
> > +  rtx ret = extract_integral_bit_field (op0, op0_mode,
> > +   bitsize.to_constant (),
> > +   bitnum.to_constant (), 
> > unsignedp,
> > +   NULL, imode, imode,
> > +   reverse, fallback_p);
> > +  gcc_assert (ret);
> > +
> > +  if (!REG_P (ret))
> > +   ret = force_reg (imode, ret);
> > +  return gen_lowpart_SUBREG (tmode, ret);
> > +}
> > +
> >/* It's possible we'll need to handle other cases here for
> >   polynomial bitnum and bitsize.  */
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c 
> > b/gcc/testsuite/gcc.target/i386/float16-5.c
> > new file mode 100644
> > index 000..ebc0af1490b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/float16-5.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-msse2 -O2" } */
> > +_Float16
> > +foo (int a)
> > +{
> > +  union {
> > +int a;
> > +_Float16 b;
> > +  }c;
> > +  c.a = a;
> > +  return c.b;
> > +}
> > --
> > 2.27.0
> >



-- 
BR,
Hongtao


Re: [PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.

2021-08-05 Thread Andrew Pinski via Gcc-patches
On Thu, Aug 5, 2021 at 8:33 PM liuhongt via Gcc-patches
 wrote:
>
> Hi:
> ---
> OK, I think sth is amiss here upthread.  insv/extv do look like they
> are designed
> to work on integer modes (but docs do not say anything about this here).
> In fact the caller of extract_bit_field_using_extv is named
> extract_integral_bit_field.  Of course nothing seems to check what kind of
> modes we're dealing with, but we're for example happily doing
> expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
> some integer mode and op0 is HFmode?  From the above I get it's
> the other way around?  In that case we should wrap the
> call to extract_integral_bit_field, extracting in an integer mode with the
> same size as 'mode' and then converting the result as (subreg:HF (reg:HI 
> ...)).

This seems related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93235 .
I wonder why the fix for that did not help here.

Thanks,
Andrew Pinski

> ---
>   This is a separate patch as a follow-up to the comments above.
>
> gcc/ChangeLog:
>
> * expmed.c (extract_bit_field_1): Wrap the call to
> extract_integral_bit_field, extracting in an integer mode with
> the same size as 'tmode' and then converting the result
> as (subreg:tmode (reg:imode)).
>
> gcc/testsuite/ChangeLog:
> * gcc.target/i386/float16-5.c: New test.
> ---
>  gcc/expmed.c  | 19 +++
>  gcc/testsuite/gcc.target/i386/float16-5.c | 12 
>  2 files changed, 31 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c
>
> diff --git a/gcc/expmed.c b/gcc/expmed.c
> index 3143f38e057..72790693ef0 100644
> --- a/gcc/expmed.c
> +++ b/gcc/expmed.c
> @@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, 
> poly_uint64 bitnum,
>op0_mode = opt_scalar_int_mode ();
>  }
>
> +  /* Make sure we are playing with integral modes.  Pun with subregs
> + if we aren't.  When tmode is HFmode and op0 is SImode, there would
> + be an ICE in extract_integral_bit_field.  */
> +  if (int_mode_for_mode (tmode).exists (&imode)
> +  && imode != tmode
> +  && imode != GET_MODE (op0))
> +{
> +  rtx ret = extract_integral_bit_field (op0, op0_mode,
> +   bitsize.to_constant (),
> +   bitnum.to_constant (), unsignedp,
> +   NULL, imode, imode,
> +   reverse, fallback_p);
> +  gcc_assert (ret);
> +
> +  if (!REG_P (ret))
> +   ret = force_reg (imode, ret);
> +  return gen_lowpart_SUBREG (tmode, ret);
> +}
> +
>/* It's possible we'll need to handle other cases here for
>   polynomial bitnum and bitsize.  */
>
> diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c 
> b/gcc/testsuite/gcc.target/i386/float16-5.c
> new file mode 100644
> index 000..ebc0af1490b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/float16-5.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-msse2 -O2" } */
> +_Float16
> +foo (int a)
> +{
> +  union {
> +int a;
> +_Float16 b;
> +  }c;
> +  c.a = a;
> +  return c.b;
> +}
> --
> 2.27.0
>


[PATCH] Make sure we're playing with integral modes before call extract_integral_bit_field.

2021-08-05 Thread liuhongt via Gcc-patches
Hi:
---
OK, I think sth is amiss here upthread.  insv/extv do look like they
are designed
to work on integer modes (but docs do not say anything about this here).
In fact the caller of extract_bit_field_using_extv is named
extract_integral_bit_field.  Of course nothing seems to check what kind of
modes we're dealing with, but we're for example happily doing
expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
some integer mode and op0 is HFmode?  From the above I get it's
the other way around?  In that case we should wrap the
call to extract_integral_bit_field, extracting in an integer mode with the
same size as 'mode' and then converting the result as (subreg:HF (reg:HI ...)).
---
  This is a separate patch as a follow-up to the comments above.
 
gcc/ChangeLog:

* expmed.c (extract_bit_field_1): Wrap the call to
extract_integral_bit_field, extracting in an integer mode with
the same size as 'tmode' and then converting the result
as (subreg:tmode (reg:imode)).

gcc/testsuite/ChangeLog:
* gcc.target/i386/float16-5.c: New test.
---
 gcc/expmed.c  | 19 +++
 gcc/testsuite/gcc.target/i386/float16-5.c | 12 
 2 files changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/float16-5.c

diff --git a/gcc/expmed.c b/gcc/expmed.c
index 3143f38e057..72790693ef0 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -1850,6 +1850,25 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, 
poly_uint64 bitnum,
   op0_mode = opt_scalar_int_mode ();
 }
 
+  /* Make sure we are playing with integral modes.  Pun with subregs
+ if we aren't.  When tmode is HFmode and op0 is SImode, there would
+ be an ICE in extract_integral_bit_field.  */
+  if (int_mode_for_mode (tmode).exists (&imode)
+  && imode != tmode
+  && imode != GET_MODE (op0))
+{
+  rtx ret = extract_integral_bit_field (op0, op0_mode,
+   bitsize.to_constant (),
+   bitnum.to_constant (), unsignedp,
+   NULL, imode, imode,
+   reverse, fallback_p);
+  gcc_assert (ret);
+
+  if (!REG_P (ret))
+   ret = force_reg (imode, ret);
+  return gen_lowpart_SUBREG (tmode, ret);
+}
+
   /* It's possible we'll need to handle other cases here for
  polynomial bitnum and bitsize.  */
 
diff --git a/gcc/testsuite/gcc.target/i386/float16-5.c 
b/gcc/testsuite/gcc.target/i386/float16-5.c
new file mode 100644
index 000..ebc0af1490b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/float16-5.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-msse2 -O2" } */
+_Float16
+foo (int a)
+{
+  union {
+int a;
+_Float16 b;
+  }c;
+  c.a = a;
+  return c.b;
+}
-- 
2.27.0



[PATCH] Fix typo in fold-vec-load-builtin_vec_xl-* tests.

2021-08-05 Thread Michael Meissner via Gcc-patches
[PATCH] Fix typo in fold-vec-load-builtin_vec_xl-* tests.

When I checked in the fix for running tests on power10 systems with
power10 code generation, I had a typo in the
fold-vec-load-builtin_vec_xl-* tests, swapping 'x' and 'v' in the p?lxv
pattern.

I checked this patch on a little endian power10 building GCC with
--with-cpu=power10 and I also ran it on a little endian power9 configured with
--with-cpu=power9 to verify that I didn't cause a regression.  Can I check this
into the master branch?

gcc/testsuite/
2021-08-05  Michael Meissner  

* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c: Fix
typo in regular expression.
* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-double.c:
Likewise.
* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-float.c:
Likewise.
* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-int.c:
Likewise.
* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-longlong.c:
Likewise.
* gcc.target/powerpc/fold-vec-load-builtin_vec_xl-short.c:
Likewise.
---
 .../gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c  | 2 +-
 .../gcc.target/powerpc/fold-vec-load-builtin_vec_xl-double.c| 2 +-
 .../gcc.target/powerpc/fold-vec-load-builtin_vec_xl-float.c | 2 +-
 .../gcc.target/powerpc/fold-vec-load-builtin_vec_xl-int.c   | 2 +-
 .../gcc.target/powerpc/fold-vec-load-builtin_vec_xl-longlong.c  | 2 +-
 .../gcc.target/powerpc/fold-vec-load-builtin_vec_xl-short.c | 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c
index f6eb88fbe39..ea6adc4f112 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-char.c
@@ -36,4 +36,4 @@ BUILD_VAR_TEST( test10, vector unsigned char, signed long 
long, vector unsigned
 BUILD_VAR_TEST( test11, vector unsigned char, signed int, vector unsigned 
char);
 BUILD_CST_TEST( test12, vector unsigned char, 8, vector unsigned char);
 
-/* { dg-final { scan-assembler-times 
{\mlxvw4x\M|\mlxvd2x\M|\mlxvx\M|\mp?lvx\M} 12 } } */
+/* { dg-final { scan-assembler-times 
{\mlxvw4x\M|\mlxvd2x\M|\mlxvx\M|\mp?lxv\M} 12 } } */
diff --git 
a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-double.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-double.c
index 66d544530f6..71236472ea4 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-double.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-double.c
@@ -28,4 +28,4 @@ BUILD_VAR_TEST( test4, vector double, signed long long, 
vector double);
 BUILD_VAR_TEST( test5, vector double, signed int, vector double);
 BUILD_CST_TEST( test6, vector double, 12, vector double);
 
-/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mlxvx\M|\mp?lvx\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mlxvx\M|\mp?lxv\M} 6 } } */
diff --git 
a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-float.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-float.c
index 7d84c2091df..cc124af253f 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-float.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-float.c
@@ -28,4 +28,4 @@ BUILD_VAR_TEST( test4, vector float, signed long long, vector 
float);
 BUILD_VAR_TEST( test5, vector float, signed int, vector float);
 BUILD_CST_TEST( test6, vector float, 12, vector float);
 
-/* { dg-final { scan-assembler-times 
{\mlxvw4x\M|\mlxvd2x\M|\mlxvx\M|\mp?lvx\M} 6 } } */
+/* { dg-final { scan-assembler-times 
{\mlxvw4x\M|\mlxvd2x\M|\mlxvx\M|\mp?lxv\M} 6 } } */
diff --git 
a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-int.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-int.c
index c6a8226d012..c57adadb6fc 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-int.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-int.c
@@ -36,4 +36,4 @@ BUILD_VAR_TEST( test10, vector unsigned int, signed long 
long, vector unsigned i
 BUILD_VAR_TEST( test11, vector unsigned int, signed int, vector unsigned int);
 BUILD_CST_TEST( test12, vector unsigned int, 12, vector unsigned int);
 
-/* { dg-final { scan-assembler-times 
{\mlxvw4x\M|\mlxvd2x\M|\mlxvx\M|\mp?lvx\M} 12 } } */
+/* { dg-final { scan-assembler-times 
{\mlxvw4x\M|\mlxvd2x\M|\mlxvx\M|\mp?lxv\M} 12 } } */
diff --git 
a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-longlong.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-longlong.c
index 6f0cd734475..9105bf14d36 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-load-builtin_vec_xl-longlong.c
@@ -36,4 +36,4 @@ BUILD_VAR_TEST( 

Re: [PING][PATCH] define auto_vec copy ctor and assignment (PR 90904)

2021-08-05 Thread Martin Sebor via Gcc-patches

On 7/30/21 9:06 AM, Jason Merrill wrote:

On 7/27/21 2:56 PM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575690.html

Are there any other suggestions or comments or is the latest revision
okay to commit?


OK.


I had to make a few more adjustments to fix up code that's snuck
in since I last tested the patch.  I committed r12-2776 after
retesting on x86_64-linux.

With the cleanup out of the way I'll resubmit the copy ctor patch
next.

Martin




On 7/20/21 12:34 PM, Martin Sebor wrote:

On 7/14/21 10:23 AM, Jason Merrill wrote:

On 7/14/21 10:46 AM, Martin Sebor wrote:

On 7/13/21 9:39 PM, Jason Merrill wrote:

On 7/13/21 4:02 PM, Martin Sebor wrote:

On 7/13/21 12:37 PM, Jason Merrill wrote:

On 7/13/21 10:08 AM, Jonathan Wakely wrote:

On Mon, 12 Jul 2021 at 12:02, Richard Biener wrote:

Somebody with more C++ knowledge than me needs to approve the
vec.h changes - I don't feel competent to assess all effects 
of the change.


They look OK to me except for:

-extern vnull vNULL;
+static constexpr vnull vNULL{ };

Making vNULL have static linkage can make it an ODR violation to use
vNULL in templates and inline functions, because different
instantiations will refer to a different "vNULL" in each translation
unit.


The ODR says this is OK because it's a literal constant with the 
same value (6.2/12.2.1).


But it would be better without the explicit 'static'; then in 
C++17 it's implicitly inline instead of static.


I'll remove the static.



But then, do we really want to keep vNULL at all?  It's a weird 
blurring of the object/pointer boundary that is also dependent 
on vec being a thin wrapper around a pointer.  In almost all 
cases it can be replaced with {}; one exception is == 
comparison, where it seems to be testing that the embedded 
pointer is null, which is a weird thing to want to test.


The one use case I know of for vNULL where I can't think of
an equally good substitute is in passing a vec as an argument by
value.  The only way to do that that I can think of is to name
the full vec type (i.e., the specialization) which is more typing
and less generic than vNULL.  I don't use vNULL myself so I wouldn't
miss this trick if it were to be removed but others might feel
differently.


In C++11, it can be replaced by {} in that context as well.


Cool.  I thought I'd tried { } here but I guess not.




If not, I'm all for getting rid of vNULL but with over 350 uses
of it left, unless there's some clever trick to make the removal
(mostly) effortless and seamless, I'd much rather do it independently
of this initial change.  I also don't know if I can commit to doing
all this cleanup.


I already have a patch to replace all but one use of vNULL, but 
I'll hold off with it until after your patch.


So what's the next step?  The patch only removes a few uses of vNULL
but doesn't add any.  Is it good to go as is (without the static and
with the additional const changes Richard suggested)?  This patch is
attached to my reply to Richard:
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575199.html


As Richard wrote:


The pieces where you change vec<> passing to const vec<>& and the few
where you change vec<> * to const vec<> * are OK - this should make
the rest a smaller piece to review.


Please go ahead and apply those changes and send a new patch with 
the remainder of the changes.


I have just pushed r12-2418:
https://gcc.gnu.org/pipermail/gcc-cvs/2021-July/350886.html



A few other comments:


-   omp_declare_simd_clauses);
+   *omp_declare_simd_clauses);


Instead of doing this indirection in all of the callers, let's 
change c_finish_omp_declare_simd to take a pointer as well, and do 
the indirection in initializing a reference variable at the top of 
the function.


Okay.




+    sched_init_luids (bbs.to_vec ());
+    haifa_init_h_i_d (bbs.to_vec ());


Why are these to_vec changes needed when you are also changing the 
functions to take const&?


Calling to_vec() here isn't necessary so I've removed it.




-  vec checks = LOOP_VINFO_CHECK_NONZERO (loop_vinfo);
+  vec checks = LOOP_VINFO_CHECK_NONZERO (loop_vinfo).to_vec ();


Why not use a reference here and in other similar spots?


Sure, that works too.

Attached is what's left of the original changes now that r12-2418
has been applied.

Martin








[PATCH] diagnose more new/delete mismatches (PR 101791)

2021-08-05 Thread Martin Sebor via Gcc-patches

-Wmismatched-new-delete partly relies on valid_new_delete_pair_p()
to detect matching calls to operator new and delete.  The function
returns a conservative result; when it indicates a mismatch, the
warning does further work to refine the result before it triggers.
As it turns out, the logic is inadvertently overly permissive in
the most common case of a mismatch: between the array and scalar
forms of the global operators new and delete, such as in:

  delete new int[2];   // should be delete[]

The attached patch solves the problem by adding a new argument to
valid_new_delete_pair_p() for the function to set to indicate when
its negative result is definitive rather than a conservative guess.

Tested on x86_64-linux.

Martin
Diagnose mismatches between array and scalar new and delete [PR101791].

Resolves:
PR middle-end/101791 - missing warning on a mismatch between scalar and array forms of new and delete


gcc/ChangeLog:

	PR middle-end/101791
	* gimple-ssa-warn-access.cc (new_delete_mismatch_p): Use new argument
	to valid_new_delete_pair_p.
	* tree.c (valid_new_delete_pair_p): Add argument.
	* tree.h (valid_new_delete_pair_p): Same.

gcc/testsuite/ChangeLog:

	PR middle-end/101791
	* g++.dg/warn/Wmismatched-new-delete-6.C: New test.
	* g++.dg/warn/Wmismatched-new-delete-7.C: New test.

diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index e4d98b2ec28..c1ced98780f 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -1233,9 +1782,14 @@ new_delete_mismatch_p (tree new_decl, tree delete_decl)
 
   /* valid_new_delete_pair_p() returns a conservative result (currently
  it only handles global operators).  A true result is reliable but
- a false result doesn't necessarily mean the operators don't match.  */
-  if (valid_new_delete_pair_p (new_name, delete_name))
+ a false result doesn't necessarily mean the operators don't match
+ unless CERTAIN is set.  */
+  bool certain;
  if (valid_new_delete_pair_p (new_name, delete_name, &certain))
 return false;
+  /* CERTAIN is set when the negative result is certain.  */
+  if (certain)
+return true;
 
   /* For anything not handled by valid_new_delete_pair_p() such as member
  operators compare the individual demangled components of the mangled
diff --git a/gcc/tree.c b/gcc/tree.c
index e923e67b694..5c747321002 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -14311,17 +14311,28 @@ verify_type_context (location_t loc, type_context_kind context,
 	  || targetm.verify_type_context (loc, context, type, silent_p));
 }
 
-/* Return that NEW_ASM and DELETE_ASM name a valid pair of new and
-   delete operators.  */
+/* Return true if NEW_ASM and DELETE_ASM name a valid pair of new and
+   delete operators.  Return false if they may or may not name such
+   a pair and, when nonnull, set *PCERTAIN to true if they certainly
+   do not.  */
 
 bool
-valid_new_delete_pair_p (tree new_asm, tree delete_asm)
+valid_new_delete_pair_p (tree new_asm, tree delete_asm,
+			 bool *pcertain /* = NULL */)
 {
+  bool certain;
+  if (!pcertain)
+    pcertain = &certain;
+
   const char *new_name = IDENTIFIER_POINTER (new_asm);
   const char *delete_name = IDENTIFIER_POINTER (delete_asm);
   unsigned int new_len = IDENTIFIER_LENGTH (new_asm);
   unsigned int delete_len = IDENTIFIER_LENGTH (delete_asm);
 
+  /* The following failures are due to invalid names so they're not
+ considered certain mismatches.  */
+  *pcertain = false;
+
   if (new_len < 5 || delete_len < 6)
 return false;
   if (new_name[0] == '_')
@@ -14334,11 +14345,19 @@ valid_new_delete_pair_p (tree new_asm, tree delete_asm)
 ++delete_name, --delete_len;
   if (new_len < 4 || delete_len < 5)
 return false;
+
+  /* The following failures are due to names of user-defined operators
+ so they're also not considered certain mismatches.  */
+
   /* *_len is now just the length after initial underscores.  */
   if (new_name[0] != 'Z' || new_name[1] != 'n')
 return false;
   if (delete_name[0] != 'Z' || delete_name[1] != 'd')
 return false;
+
+  /* The following failures are certain mismatches.  */
+  *pcertain = true;
+
   /* _Znw must match _Zdl, _Zna must match _Zda.  */
   if ((new_name[2] != 'w' || delete_name[2] != 'l')
   && (new_name[2] != 'a' || delete_name[2] != 'a'))
@@ -14377,6 +14396,9 @@ valid_new_delete_pair_p (tree new_asm, tree delete_asm)
 	  && !memcmp (delete_name + 5, "St11align_val_tRKSt9nothrow_t", 29))
 	return true;
 }
+
+  /* The negative result is conservative.  */
+  *pcertain = false;
   return false;
 }
 
diff --git a/gcc/tree.h b/gcc/tree.h
index 972ceb370f8..69d12193686 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -5396,7 +5396,9 @@ extern bool gimple_canonical_types_compatible_p (const_tree, const_tree,
 extern bool type_with_interoperable_signedness (const_tree);
 extern bitmap get_nonnull_args (const_tree);
 extern int get_range_pos_neg (tree);
-extern bool valid_new_delete_pair_p (tree, tree);
+
+/* 

[PATCH] move more code to access warning pass

2021-08-05 Thread Martin Sebor via Gcc-patches

As I mentioned in the description of the access warning pass when
I submitted it in July(*), I planned to change the -Wstringop-xxx
code in the pass to run on the GIMPLE IL instead of on trees in
builtins.c (and elsewhere).  The attached patch implements this
change along with moving more warning code from builtins.c and
calls.c into the pass source.

The changes are mostly mechanical but I should explain one aspect
that might draw attention: since some of the warning functions are
still called from outside the pass with tree arguments, I made them
templates parameterized on the type of the argument: either gimple*
or tree, and provided overloads for each(**).  I expect this to be
a transient solution until remaining callers that pass in trees are
moved into the new pass.  This might take a bit of effort and time
and involve more churn than feels appropriate for a single patch.

Tested on x86_64-linux and by building Glibc and GDB+Binutils with
no new warnings.

As the next steps I plan to:

* integrate the new pass with ranger and enable the pointer query
  caching to avoid repeatedly computing object sizes for statements
  involving related pointers
* move remaining warning code from builtins.c and calls.c (and
  possibly also gimple-fold.c) into the new pass (as much of it
  as possible)
* investigate running a subset of the new pass early on during
  optimization in addition to late as it does now, to detect
  problems that are impossible to detect otherwise (i.e., split
  the pass into two stages similar to -Wuninitialized and
  -Wmaybe-uninitialized)
* investigate integrating the uninitialized predicate analyzer
  into the pass to help reduce false positives and perhaps also
  false negatives by enabling maybe-kinds of diagnostics for
  conditional code
* integrate -Warray-bounds into the pass (and remove it from vrp)

Martin

[*] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575377.html
[**] This was made easy by introducing overloads of functions like
get_location(gimple*) and get_location(tree) to return the location
of a GIMPLE statement or a tree decl or expression.
gcc/ChangeLog:

	* builtins.c (expand_builtin_memchr): Move to gimple-ssa-warn-access.cc.
	(expand_builtin_strcat): Same.
	(expand_builtin_stpncpy): Same.
	(expand_builtin_strncat): Same.
	(check_read_access): Same.
	(check_memop_access): Same.
	(expand_builtin_strlen): Move checks to gimple-ssa-warn-access.cc.
	(expand_builtin_strnlen): Same.
	(expand_builtin_memcpy): Same.
	(expand_builtin_memmove): Same.
	(expand_builtin_mempcpy): Same.
	(expand_builtin_strcpy): Same.
	(expand_builtin_strcpy_args): Same.
	(expand_builtin_stpcpy_1): Same.
	(expand_builtin_strncpy): Same.
	(expand_builtin_memset): Same.
	(expand_builtin_bzero): Same.
	(expand_builtin_strcmp): Same.
	(expand_builtin_strncmp): Same.
	(expand_builtin): Remove handlers.
	(fold_builtin_strlen): Add a comment.
	* builtins.h (check_access): Move to gimple-ssa-warn-access.cc.
	* calls.c (maybe_warn_nonstring_arg): Same.
	* gimple-fold.c (gimple_fold_builtin_strcpy): Pass argument to callee.
	(gimple_fold_builtin_stpcpy): Same.
	* gimple-ssa-warn-access.cc (has_location): New function.
	(get_location): Same.
	(get_callee_fndecl): Same.
	(call_nargs): Same.
	(call_arg): Same.
	(warn_string_no_nul): Define.
	(unterminated_array): Same.
	(check_nul_terminated_array): Same.
	(maybe_warn_nonstring_arg): Same.
	(maybe_warn_for_bound): Same.
	(warn_for_access): Same.
	(check_access): Same.
	(check_memop_access): Same.
	(check_read_access): Same.
	(warn_dealloc_offset): Use helper functions.
	(maybe_emit_free_warning): Same.
	(class pass_waccess): Add members.
	(check_strcat): New function.
	(check_strncat): New function.
	(check_stxcpy): New function.
	(check_stxncpy): New function.
	(check_strncmp): New function.
	(pass_waccess::check_builtin): New function.
	(pass_waccess::check): Call it.
	* gimple-ssa-warn-access.h (warn_string_no_nul): Move here from
	builtins.h.
	(maybe_warn_for_bound): Same.
	(check_access): Same.
	(check_memop_access): Same.
	(check_read_access): Same.
	* pointer-query.h (struct access_data): Define a ctor overload.

gcc/testsuite/ChangeLog:

	* c-c++-common/Wsizeof-pointer-memaccess1.c: Also disable
	-Wstringop-overread.
	* c-c++-common/attr-nonstring-3.c: Adjust pattern of expected message.
	* gcc.dg/Warray-bounds-39.c: Add an xfail due to a known bug.
	* gcc.dg/Wstring-compare-3.c: Also disable -Wstringop-overread.
	* gcc.dg/attr-nonstring-2.c: Adjust pattern of expected message.
	* gcc.dg/attr-nonstring-4.c: Same.
	* gcc.dg/Wstringop-overread-6.c: New test.
	* gcc.dg/sso-14.c: Fix typos to avoid buffer overflow.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 2387b5d2a5d..d2be807f1d6 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -131,7 +131,6 @@ static rtx expand_builtin_va_copy (tree);
 static rtx inline_expand_builtin_bytecmp (tree, rtx);
 static rtx expand_builtin_strcmp (tree, rtx);
 static rtx expand_builtin_strncmp (tree, rtx, 

Re: [RFC, Fortran] Fix c_float128 and c_float128_complex on targets with 128-bit long double.

2021-08-05 Thread Michael Meissner via Gcc-patches
On Thu, Aug 05, 2021 at 12:19:37PM -0600, Sandra Loosemore wrote:
> On 8/5/21 11:33 AM, Michael Meissner wrote:
> >At the moment, we only fully support C and C++ when changing the long double
> >format (between IEEE 128-bit, IBM 128-bit, and 64-bit) when the compiler is
> >invoked (and assuming you are using GLIBC 2.32 or newer).
> >
> >For Fortran and the other languages, the only way to change the floating 
> >point
> >format is to configure the compiler with the '--with-long-double-format=ieee'
> >configuration option.  This makes TFmode use IEEE 128-bit floating point
> >instead of IBM 128-bit floating point.
> 
> My understanding from reading the code is that for GNU/Linux
> targets, PowerPC already defaults to the IEEE format for TFmode?
> I'm not sure what targets the IBM format might be the default on.

All PowerPC systems that I'm aware of that use 128-bit floating point use the
IBM format.  It is anticipated that one or more Linux distributions in the
future may move to IEEE 128-bit format, but right now, I'm not aware that any
have moved.

> >It would take somebody familiar with the Fortran front end and libraries to 
> >make
> >the same sort of modifications that were made for the C and C++ languages.
> >Basically you need build libgfortran so that it has support for both floating
> >point formats, using different names.  You would need to modify the fortran
> >front end to call the alternate functions when the switch is used to change 
> >the
> >floating point format.  It might be nice to have a Fortran specific way to
> >specify which of the two floating point formats are used for REAL*16 (similar
> >to the use of __float128 and __ibm128 in C/C++, and also _Float128 in just 
> >C).
> >
> >If you are going to do it, good luck.
> 
> Well, I am actually not at all interested in doing that.  My
> questions for the PowerPC experts are:
> 
> (1) When is the __float128 type defined, and which format does it
> specify?  Is it always the IEEE format, or does it specify the same
> format as TFmode/long double?

__float128 (and _Float128 in C) is always IEEE 128-bit, if IEEE 128-bit is
supported.  By default, IEEE 128-bit is only supported on little endian PowerPC
64-bit systems.

For C (but not C++), you can declare constants with the f128 suffix so that
they would be compatible with the _Float128 type.

> 
> (2) If __float128 is not always the same 128-bit format as
> TFmode/long double, how can I detect that in the Fortran front end
> in a target-independent way?  Is it possible without adding a new
> target hook?

You can look at the constants in float.h:

For a system with IBM long double:

FLT_RADIX= 2
LDBL_MANT_DIG= 106
LDBL_DIG = 31
LDBL_MIN_EXP = -968
LDBL_MIN_10_EXP  = -291
LDBL_MAX_EXP = 1024
LDBL_MAX_10_EXP  = 308

For a system with IEEE long double:

FLT_RADIX= 2
LDBL_MANT_DIG= 113
LDBL_DIG = 33
LDBL_MIN_EXP = -16381
LDBL_MIN_10_EXP  = -4931
LDBL_MAX_EXP = 16384
LDBL_MAX_10_EXP  = 4932

For a system that uses 64-bit numbers for long double:

FLT_RADIX= 2
LDBL_MANT_DIG= 53
LDBL_DIG = 15
LDBL_MIN_EXP = -1021
LDBL_MIN_10_EXP  = -307
LDBL_MAX_EXP = 1024
LDBL_MAX_10_EXP  = 308

In addition, the PowerPC GCC defines __LONG_DOUBLE_IEEE128__ if long double is
IEEE 128-bit, and __LONG_DOUBLE_IBM128__ if long double IBM 128-bit.  If long
double is 64-bit, the macro __LONG_DOUBLE_128__ is not defined.

> (3) Can we do anything about the "Additional Floating Types" section
> in extend.texi?  It's not clear on the answer to (1), and I think
> the stuff about "future versions of GCC" is bit-rotten as it goes
> back to at least GCC 6.  (Either it's been implemented by now, or
> the idea was discarded.)
> 
> Basically, I want the Fortran front end to define c_float128 to 16
> if C supports __float128 and it corresponds to TFmode, otherwise it
> ought to continue to define c_float128 to -4.  I do not want to make
> the Fortran front end support multiple 128-bit encodings at the same
> time, just accurately define c_float128.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH] rs6000: Fix restored rs6000_long_double_type_size.

2021-08-05 Thread Segher Boessenkool
On Thu, Aug 05, 2021 at 06:49:21PM +0200, Martin Liška wrote:
> On 8/5/21 5:39 PM, Segher Boessenkool wrote:
> >>>If -mbork is the default, the compiler would behave the same if you
> >>>invoke it with -mbork as when you do not.  And the optimize attribute
> >>>should work exactly the same as command line options.
> >>
> >>Ah, got your point. All right, let's use then 'optimize(1)'.
> >>
> >>Is it fine with the adjustment?
> >
> >You are saying the compiler's behaviour is broken, but are changing the
> >testcase to avoid exhibiting that behaviour?
> 
> No, both selections of the 'optimize' attribute trigger the ICE. The reason
> is that any optimize attribute eventually triggers the
> cl_target_option_save/restore
> mechanism, which is what the patch addresses.

Aha.  So say that in the test case!  That the test case is to check if
the bug fixed in  has been reintroduced (or describe the bug
more precisely if needed).

Okay for trunk with that fixed.  Thanks!


Segher


Go patch committed: extend runtime/internal/atomic to sync/atomic

2021-08-05 Thread Ian Lance Taylor via Gcc-patches
This Go patch extends the internal runtime/internal/atomic package to
match the externally visible sync/atomic package.  This is the
gofrontend version of https://golang.org/cl/289152.  Bootstrapped and
ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
a6b138f257a431f3337f3c6bf53f7865c455ba60
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 394530c1cbc..19ab2de5c18 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-b47bcf942daa9a0c252db9b57b8f138adbfcdaa2
+32590102c464679f845667b5554e1dcce2549ad2
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/expressions.cc b/gcc/go/gofrontend/expressions.cc
index 3e433d6c20d..33177a709ab 100644
--- a/gcc/go/gofrontend/expressions.cc
+++ b/gcc/go/gofrontend/expressions.cc
@@ -11590,12 +11590,10 @@ Call_expression::intrinsify(Gogo* gogo,
   // sync/atomic functions and runtime/internal/atomic functions
   // are very similar. In order not to duplicate code, we just
   // redirect to the latter and let the code below to handle them.
-  // In case there is no equivalent functions (slight variance
-  // in types), we just make an artificial name (begin with '$').
   // Note: no StorePointer, SwapPointer, and CompareAndSwapPointer,
   // as they need write barriers.
   if (name == "LoadInt32")
-name = "$Loadint32";
+name = "Loadint32";
   else if (name == "LoadInt64")
 name = "Loadint64";
   else if (name == "LoadUint32")
@@ -11607,9 +11605,9 @@ Call_expression::intrinsify(Gogo* gogo,
   else if (name == "LoadPointer")
 name = "Loadp";
   else if (name == "StoreInt32")
-name = "$Storeint32";
+name = "Storeint32";
   else if (name == "StoreInt64")
-name = "$Storeint64";
+name = "Storeint64";
   else if (name == "StoreUint32")
 name = "Store";
   else if (name == "StoreUint64")
@@ -11617,7 +11615,7 @@ Call_expression::intrinsify(Gogo* gogo,
   else if (name == "StoreUintptr")
 name = "Storeuintptr";
   else if (name == "AddInt32")
-name = "$Xaddint32";
+name = "Xaddint32";
   else if (name == "AddInt64")
 name = "Xaddint64";
   else if (name == "AddUint32")
@@ -11627,9 +11625,9 @@ Call_expression::intrinsify(Gogo* gogo,
   else if (name == "AddUintptr")
 name = "Xadduintptr";
   else if (name == "SwapInt32")
-name = "$Xchgint32";
+name = "Xchgint32";
   else if (name == "SwapInt64")
-name = "$Xchgint64";
+name = "Xchgint64";
   else if (name == "SwapUint32")
 name = "Xchg";
   else if (name == "SwapUint64")
@@ -11637,9 +11635,9 @@ Call_expression::intrinsify(Gogo* gogo,
   else if (name == "SwapUintptr")
 name = "Xchguintptr";
   else if (name == "CompareAndSwapInt32")
-name = "$Casint32";
+name = "Casint32";
   else if (name == "CompareAndSwapInt64")
-name = "$Casint64";
+name = "Casint64";
   else if (name == "CompareAndSwapUint32")
 name = "Cas";
   else if (name == "CompareAndSwapUint64")
@@ -11875,7 +11873,7 @@ Call_expression::intrinsify(Gogo* gogo,
 
   if ((name == "Load" || name == "Load64" || name == "Loadint64" || name == "Loadp"
|| name == "Loaduint" || name == "Loaduintptr" || name == "LoadAcq"
-   || name == "$Loadint32")
+   || name == "Loadint32")
   && this->args_ != NULL && this->args_->size() == 1)
 {
   if (int_size < 8 && (name == "Load64" || name == "Loadint64"))
@@ -11895,7 +11893,7 @@ Call_expression::intrinsify(Gogo* gogo,
   code = Runtime::ATOMIC_LOAD_8;
   res_type = uint64_type;
 }
-  else if (name == "$Loadint32")
+  else if (name == "Loadint32")
 {
   code = Runtime::ATOMIC_LOAD_4;
   res_type = int32_type;
@@ -11942,10 +11940,10 @@ Call_expression::intrinsify(Gogo* gogo,
 
   if ((name == "Store" || name == "Store64" || name == "StorepNoWB"
|| name == "Storeuintptr" || name == "StoreRel"
-   || name == "$Storeint32" || name == "$Storeint64")
+   || name == "Storeint32" || name == "Storeint64")
   && this->args_ != NULL && this->args_->size() == 2)
 {
-  if (int_size < 8 && (name == "Store64" || name == "$Storeint64"))
+  if (int_size < 8 && (name == "Store64" || name == "Storeint64"))
 return NULL;
 
   Runtime::Function code;
@@ -11955,9 +11953,9 @@ Call_expression::intrinsify(Gogo* gogo,
 code = Runtime::ATOMIC_STORE_4;
   else if (name == "Store64")
 code = Runtime::ATOMIC_STORE_8;
-  else if (name == "$Storeint32")
+  else if (name == "Storeint32")
 code = 

Re: [RFC, Fortran] Fix c_float128 and c_float128_complex on targets with 128-bit long double.

2021-08-05 Thread Sandra Loosemore

On 8/5/21 11:33 AM, Michael Meissner wrote:

At the moment, we only fully support C and C++ when changing the long double
format (between IEEE 128-bit, IBM 128-bit, and 64-bit) when the compiler is
invoked (and assuming you are using GLIBC 2.32 or newer).

For Fortran and the other languages, the only way to change the floating point
format is to configure the compiler with the '--with-long-double-format=ieee'
configuration option.  This makes TFmode use IEEE 128-bit floating point
instead of IBM 128-bit floating point.


My understanding from reading the code is that for GNU/Linux targets, 
PowerPC already defaults to the IEEE format for TFmode?  I'm not sure 
what targets the IBM format might be the default on.



It would take somebody familiar with the Fortran front end and libraries to make
the same sort of modifications that were made for the C and C++ languages.
Basically you need build libgfortran so that it has support for both floating
point formats, using different names.  You would need to modify the fortran
front end to call the alternate functions when the switch is used to change the
floating point format.  It might be nice to have a Fortran specific way to
specify which of the two floating point formats are used for REAL*16 (similar
to the use of __float128 and __ibm128 in C/C++, and also _Float128 in just C).

If you are going to do it, good luck.


Well, I am actually not at all interested in doing that.  My questions 
for the PowerPC experts are:


(1) When is the __float128 type defined, and which format does it 
specify?  Is it always the IEEE format, or does it specify the same 
format as TFmode/long double?


(2) If __float128 is not always the same 128-bit format as TFmode/long 
double, how can I detect that in the Fortran front end in a 
target-independent way?  Is it possible without adding a new target hook?


(3) Can we do anything about the "Additional Floating Types" section in 
extend.texi?  It's not clear on the answer to (1), and I think the stuff 
about "future versions of GCC" is bit-rotten as it goes back to at least 
GCC 6.  (Either it's been implemented by now, or the idea was discarded.)


Basically, I want the Fortran front end to define c_float128 to 16 if C 
supports __float128 and it corresponds to TFmode, otherwise it ought to 
continue to define c_float128 to -4.  I do not want to make the Fortran 
front end support multiple 128-bit encodings at the same time, just 
accurately define c_float128.


-Sandra



Re: [committed] libstdc++: Move attributes that follow requires-clauses [PR101782]

2021-08-05 Thread Jonathan Wakely via Gcc-patches

On 05/08/21 15:40 +0100, Jonathan Wakely wrote:

On 05/08/21 15:19 +0100, Jonathan Wakely wrote:

On 04/08/21 12:55 +0100, Jonathan Wakely wrote:

This adds [[nodiscard]] throughout , as proposed by P2377R0
(with some minor corrections).

The attribute is added for all modes from C++11 up, using
[[__nodiscard__]] or _GLIBCXX_NODISCARD where C++17 [[nodiscard]] can't
be used directly.


This change causes errors when -fconcepts-ts is used. Fixed like so.

Tested powerpc64le-linux, committed to trunk.




commit 7b1de3eb9ed3f8dde54732d88520292c5ad1157d
Author: Jonathan Wakely 
Date:   Thu Aug 5 13:34:00 2021

  libstdc++: Move attributes that follow requires-clauses [PR101782]
  As explained in the PR, the grammar in the Concepts TS means that a [
  token following a requires-clause is parsed as part of the
  logical-or-expression rather than the start of an attribute. That makes
  the following ill-formed when using -fconcepts-ts:
template<typename T> requires foo<T> [[nodiscard]] int f(T);
  This change moves all attributes that follow a requires-clause to the
  end of the function declarator.



Except that as Jakub pointed out, putting it there doesn't work.

It needs to be:

  template<typename T> requires foo<T> int f [[nodiscard]] (T);

At least the testsuite isn't failing now, but the attributes I moved
have no effect. I'll fix it ... some time.


This should be correct now.

Tested powerpc64le-linux, pushed to trunk.


commit c8b024fa4b76bfd914e96dd3cecfbb6ee8e91316
Author: Jonathan Wakely 
Date:   Thu Aug 5 16:46:00 2021

libstdc++: Move [[nodiscard]] attributes again [PR101782]

Where I moved these nodiscard attributes to made them apply to the
function type, not to the function. This meant they no longer generated
the desired -Wunused-result warnings, and were ill-formed with Clang
(but only a pedwarn with GCC).

Clang also detected ill-formed attributes in  which this fixes.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101782
* include/bits/ranges_base.h (ranges::begin, ranges::end)
(ranges::rbegin, ranges::rend, ranges::size, ranges::ssize)
(ranges::empty, ranges::data): Move attribute after the
declarator-id instead of at the end of the declarator.
* include/bits/stl_iterator.h (__gnu_cxx::__normal_iterator):
Move attributes back to the start of the function declarator,
but move the requires-clause to the end.
(common_iterator): Move attribute after the declarator-id.
* include/bits/stl_queue.h (queue): Remove ill-formed attributes
from friend declaration that are not definitions.
* include/std/ranges (views::all, views::filter)
(views::transform, views::take, views::take_while,
views::drop) (views::drop_while, views::join,
views::lazy_split) (views::split, views::counted,
views::common, views::reverse) (views::elements): Move
attributes after the declarator-id.

diff --git a/libstdc++-v3/include/bits/ranges_base.h b/libstdc++-v3/include/bits/ranges_base.h
index 1dac9680b4f..49c7d9c9f06 100644
--- a/libstdc++-v3/include/bits/ranges_base.h
+++ b/libstdc++-v3/include/bits/ranges_base.h
@@ -111,8 +111,7 @@ namespace ranges
	requires is_array_v<remove_reference_t<_Tp>> || __member_begin<_Tp>
 	  || __adl_begin<_Tp>
 	constexpr auto
-	operator()(_Tp&& __t) const noexcept(_S_noexcept<_Tp&>())
-	[[nodiscard]]
+	operator()[[nodiscard]](_Tp&& __t) const noexcept(_S_noexcept<_Tp&>())
 	{
 	  if constexpr (is_array_v<remove_reference_t<_Tp>>)
 	{
@@ -163,8 +162,7 @@ namespace ranges
	requires is_bounded_array_v<remove_reference_t<_Tp>>
 	  || __member_end<_Tp> || __adl_end<_Tp>
 	constexpr auto
-	operator()(_Tp&& __t) const noexcept(_S_noexcept<_Tp&>())
-	[[nodiscard]]
+	operator()[[nodiscard]](_Tp&& __t) const noexcept(_S_noexcept<_Tp&>())
 	{
 	  if constexpr (is_bounded_array_v<remove_reference_t<_Tp>>)
 	{
@@ -268,9 +266,8 @@ namespace ranges
   template<__maybe_borrowed_range _Tp>
 	requires __member_rbegin<_Tp> || __adl_rbegin<_Tp> || __reversable<_Tp>
 	constexpr auto
-	operator()(_Tp&& __t) const
+	operator()[[nodiscard]](_Tp&& __t) const
 	noexcept(_S_noexcept<_Tp&>())
-	[[nodiscard]]
 	{
 	  if constexpr (__member_rbegin<_Tp>)
 	return __t.rbegin();
@@ -327,9 +324,8 @@ namespace ranges
   template<__maybe_borrowed_range _Tp>
 	requires __member_rend<_Tp> || __adl_rend<_Tp> || __reversable<_Tp>
 	constexpr auto
-	operator()(_Tp&& __t) const
+	operator()[[nodiscard]](_Tp&& __t) const
 	noexcept(_S_noexcept<_Tp&>())
-	[[nodiscard]]
 	{
 	  if constexpr (__member_rend<_Tp>)
 	return __t.rend();
@@ -417,8 +413,7 @@ namespace ranges
	requires is_bounded_array_v<remove_reference_t<_Tp>>
 	  || __member_size<_Tp> || __adl_size<_Tp> || __sentinel_size<_Tp>
 	constexpr auto
-	operator()(_Tp&& __t) const noexcept(_S_noexcept<_Tp&>())
-	[[nodiscard]]
+	operator()[[nodiscard]](_Tp&& __t) const noexcept(_S_noexcept<_Tp&>())
 	{
 	  if constexpr 

Re: [RFC, Fortran] Fix c_float128 and c_float128_complex on targets with 128-bit long double.

2021-08-05 Thread Michael Meissner via Gcc-patches
On Wed, Aug 04, 2021 at 02:14:07PM -0600, Sandra Loosemore wrote:
> I was trying last week to run my not-yet-committed TS29113 testsuite
> on a powerpc64le-linux-gnu target and ran into some problems with
> the kind constants c_float128 and c_float128_complex from the
> ISO_C_BINDING module; per the gfortran manual they are supposed to
> represent the kind of the gcc extension type __float128 and the
> corresponding complex type.  They were being set to -4 (e.g., not
> supported) instead of 16, although this target does define
> __float128 and real(16) is accepted as a supported type by the
> Fortran front end.
> 
> Anyway, the root of the problem is that the definition of these
> constants only looked at gfc_float128_type_node, which only gets set
> if TFmode is not the same type as long_double_type_node.  I
> experimented with setting gfc_float128_type_node =
> long_double_type_node but that caused various Bad Things to happen
> elsewhere in code that expected them to be distinct types, so I
> ended up with this minimally intrusive patch that only tweaks the
> definitions of the c_float128 and c_float128_complex constants.
> 
> I'm not sure this is completely correct, though.  I see PowerPC
> supports 2 different 128-bit encodings and it looks like TFmode/long
> double is mapped onto the one selected by the ABI and/or
> command-line options; that's the only one the Fortran front end
> knows about.  All of TFmode, IFmode, and KFmode would map onto kind
> 16 anyway (in spite of having different TYPE_PRECISION values) so
> Fortran wouldn't be able to distinguish them.  The thing that
> confuses me is how/when the rs6000 backend defines __float128; it
> looks like the documentation in the GCC manual doesn't agree with
> the code, and I'm not sure what the intended behavior really is.  Is
> it possible that __float128 could end up defined but specifying a
> different type than TFmode, and if so is there a target-independent
> way to identify that situation?  Can the PowerPC experts help
> straighten me out?
> 
> -Sandra

At the moment, we only fully support C and C++ when changing the long double
format (between IEEE 128-bit, IBM 128-bit, and 64-bit) when the compiler is
invoked (and assuming you are using GLIBC 2.32 or newer).

For Fortran and the other languages, the only way to change the floating point
format is to configure the compiler with the '--with-long-double-format=ieee'
configuration option.  This makes TFmode use IEEE 128-bit floating point
instead of IBM 128-bit floating point.

It would take somebody familiar with the Fortran front end and libraries to make
the same sort of modifications that were made for the C and C++ languages.
Basically you need build libgfortran so that it has support for both floating
point formats, using different names.  You would need to modify the fortran
front end to call the alternate functions when the switch is used to change the
floating point format.  It might be nice to have a Fortran specific way to
specify which of the two floating point formats are used for REAL*16 (similar
to the use of __float128 and __ibm128 in C/C++, and also _Float128 in just C).

If you are going to do it, good luck.

FWIW, I have built GCC compilers with the alternate floating point format, and
I've been able to run the test suite and compile the Spec 2017 test suite with
it.

Generally to build a bootstrap compiler with the alternate long double
representation I go through the following steps.

1) Make sure you have a GLIBC that supports switching long double
representation (2.32 or newer).  I tend to use the IBM Advance Toolchain AT
14.0-3 to provide the new library, but if the native GLIBC on your system is
new enough you could use that.  I've discovered that there are problems if the
GCC zlib is used, so I use the system zlib.  The options I use when configuring
the compiler include:

--with-advance-toolchain=at14.0
--with-system-zlib
--with-native-system-header-dir=/opt/at14.0/include

If you are building and running on a power9 system, it is helpful if you add
the option to set the default CPU to power9, since that way the compiler will
automatically use the hardware IEEE 128-bit instructions that were added in
power9.  Otherwise it has to go through a call to do each operation.  If you
are running on a power9, we use the ifunc attribute to select modules that are
built to use the hardware instruction.

--with-cpu=power9

2) Build a non-bootstrap compiler with the '--with-long-double-format=ieee'
configuration option, and install it somewhere.

3) With the compiler built in step #2, build the three libraries GCC uses (MPC,
MPFR, and GMP), and install them some place.  You need to do this because these
libraries have functions in them with long double arguments.  GCC doesn't
actually use the functions with the long double arguments, but you will get a
linker warning about mixing long double floating point types.

4) With the compiler built in step #2 

[PATCH 4/4] aarch64: Use memcpy to copy structures in bfloat vst* intrinsics

2021-08-05 Thread Jonathan Wright via Gcc-patches
Hi,

As subject, this patch uses __builtin_memcpy to copy vector structures
instead of using a union - or constructing a new opaque structure one
vector at a time - in each of the vst[234][q] and vst1[q]_x[234] bfloat
Neon intrinsics in arm_neon.h.

It also adds new code generation tests to verify that superfluous move
instructions are not generated for the vst[234]q or vst1q_x[234] bfloat
intrinsics.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-30  Jonathan Wright  

* config/aarch64/arm_neon.h (vst1_bf16_x2): Use
__builtin_memcpy instead of constructing an additional
__builtin_aarch64_simd_oi one vector at a time.
(vst1q_bf16_x2): Likewise.
(vst1_bf16_x3): Use __builtin_memcpy instead of constructing
an additional __builtin_aarch64_simd_ci one vector at a time.
(vst1q_bf16_x3): Likewise.
(vst1_bf16_x4): Use __builtin_memcpy instead of a union.
(vst1q_bf16_x4): Likewise.
(vst2_bf16): Use __builtin_memcpy instead of constructing an
additional __builtin_aarch64_simd_oi one vector at a time.
(vst2q_bf16): Likewise.
(vst3_bf16): Use __builtin_memcpy instead of constructing an
additional __builtin_aarch64_simd_ci mode one vector at a
time.
(vst3q_bf16): Likewise.
(vst4_bf16): Use __builtin_memcpy instead of constructing an
additional __builtin_aarch64_simd_xi one vector at a time.
(vst4q_bf16): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.


rb14731.patch
Description: rb14731.patch


[PATCH 3/4] aarch64: Use memcpy to copy structures in vst2[q]_lane intrinsics

2021-08-05 Thread Jonathan Wright via Gcc-patches
Hi,

As subject, this patch uses __builtin_memcpy to copy vector structures
instead of using a union - or constructing a new opaque structure one
vector at a time - in each of the vst2[q]_lane Neon intrinsics in
arm_neon.h.

It also adds new code generation tests to verify that superfluous move
instructions are not generated for the vst2q_lane intrinsics.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-30  Jonathan Wright  

* config/aarch64/arm_neon.h (__ST2_LANE_FUNC): Delete.
(__ST2Q_LANE_FUNC): Delete.
(vst2_lane_f16): Use __builtin_memcpy to copy vector
structure instead of constructing __builtin_aarch64_simd_oi
one vector at a time.
(vst2_lane_f32): Likewise.
(vst2_lane_f64): Likewise.
(vst2_lane_p8): Likewise.
(vst2_lane_p16): Likewise.
(vst2_lane_p64): Likewise.
(vst2_lane_s8): Likewise.
(vst2_lane_s16): Likewise.
(vst2_lane_s32): Likewise.
(vst2_lane_s64): Likewise.
(vst2_lane_u8): Likewise.
(vst2_lane_u16): Likewise.
(vst2_lane_u32): Likewise.
(vst2_lane_u64): Likewise.
(vst2_lane_bf16): Likewise.
(vst2q_lane_f16): Use __builtin_memcpy to copy vector
structure instead of using a union.
(vst2q_lane_f32): Likewise.
(vst2q_lane_f64): Likewise.
(vst2q_lane_p8): Likewise.
(vst2q_lane_p16): Likewise.
(vst2q_lane_p64): Likewise.
(vst2q_lane_s8): Likewise.
(vst2q_lane_s16): Likewise.
(vst2q_lane_s32): Likewise.
(vst2q_lane_s64): Likewise.
(vst2q_lane_u8): Likewise.
(vst2q_lane_u16): Likewise.
(vst2q_lane_u32): Likewise.
(vst2q_lane_u64): Likewise.
(vst2q_lane_bf16): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.


rb14730.patch


Re: [PATCH] rs6000: Fix restored rs6000_long_double_type_size.

2021-08-05 Thread Martin Liška

On 8/5/21 5:39 PM, Segher Boessenkool wrote:

On Thu, Aug 05, 2021 at 02:05:24PM +0200, Martin Liška wrote:

On 7/23/21 7:57 PM, Segher Boessenkool wrote:

Hi!

On Fri, Jul 23, 2021 at 07:47:54AM +0200, Martin Liška wrote:

On 7/12/21 7:20 PM, Segher Boessenkool wrote:

+static __attribute__ ((optimize ("-fno-stack-protector"))) __typeof
(f) *


-fno-stack-protector is default.


Yes, but one needs an optimize attribute in order to trigger the
cl_target_option_save/restore mechanism.


So it behaves differently if you select the default than if you do not
select anything?  That is wrong, no?


Sorry, I don't get your example, please explain it.


If -mbork is the default, the compiler would behave the same if you
invoke it with -mbork as when you do not.  And the optimize attribute
should work exactly the same as command-line options.


Ah, got your point.  All right, let's use 'optimize(1)' then.

Is it fine with the adjustment?


You are saying the compiler's behaviour is broken, but are changing the
testcase to avoid exhibiting that behaviour?


No, both selections of the 'optimize' attribute trigger the ICE.  The reason
is that any optimize attribute eventually triggers the cl_target_option_save/restore
mechanism, which is what the patch addresses.


No, this is not fine.

If a flag is the default the compiler should do the same thing with and
without any attribute setting that flag, directly or indirectly.


Yes, but it should not crash.  And that's what my patch is dealing with.

Cheers,
Martin




Segher





[PATCH 2/4] aarch64: Use memcpy to copy structures in vst3[q]_lane intrinsics

2021-08-05 Thread Jonathan Wright via Gcc-patches
Hi,

As subject, this patch uses __builtin_memcpy to copy vector structures
instead of using a union - or constructing a new opaque structure one
vector at a time - in each of the vst3[q]_lane Neon intrinsics in
arm_neon.h.

It also adds new code generation tests to verify that superfluous move
instructions are not generated for the vst3q_lane intrinsics.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-30  Jonathan Wright  

* config/aarch64/arm_neon.h (__ST3_LANE_FUNC): Delete.
(__ST3Q_LANE_FUNC): Delete.
(vst3_lane_f16): Use __builtin_memcpy to copy vector
structure instead of constructing __builtin_aarch64_simd_ci
one vector at a time.
(vst3_lane_f32): Likewise.
(vst3_lane_f64): Likewise.
(vst3_lane_p8): Likewise.
(vst3_lane_p16): Likewise.
(vst3_lane_p64): Likewise.
(vst3_lane_s8): Likewise.
(vst3_lane_s16): Likewise.
(vst3_lane_s32): Likewise.
(vst3_lane_s64): Likewise.
(vst3_lane_u8): Likewise.
(vst3_lane_u16): Likewise.
(vst3_lane_u32): Likewise.
(vst3_lane_u64): Likewise.
(vst3_lane_bf16): Likewise.
(vst3q_lane_f16): Use __builtin_memcpy to copy vector
structure instead of using a union.
(vst3q_lane_f32): Likewise.
(vst3q_lane_f64): Likewise.
(vst3q_lane_p8): Likewise.
(vst3q_lane_p16): Likewise.
(vst3q_lane_p64): Likewise.
(vst3q_lane_s8): Likewise.
(vst3q_lane_s16): Likewise.
(vst3q_lane_s32): Likewise.
(vst3q_lane_s64): Likewise.
(vst3q_lane_u8): Likewise.
(vst3q_lane_u16): Likewise.
(vst3q_lane_u32): Likewise.
(vst3q_lane_u64): Likewise.
(vst3q_lane_bf16): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.


rb14729.patch


[PATCH v2 7/7] fortran: Ignore unused args in scalarization [PR97896]

2021-08-05 Thread Mikael Morin via Gcc-patches

The KIND argument of the INDEX intrinsic is a compile time constant
that is used at compile time only to resolve to a kind-specific library
method.  It is otherwise completely ignored at runtime, and there is
no code generated for it as the library procedure has no kind argument.
This confuses the scalarizer, which expects every argument
of an elemental function to be used when calling a procedure.
This change removes the argument from the scalarization lists
at the beginning of the scalarization process, so that the argument
is completely ignored.

gcc/fortran/
PR fortran/97896
* interface.c (gfc_dummy_arg_get_name): New function.
* gfortran.h (gfc_dummy_arg_get_name): Declare it.
* trans-array.h (gfc_get_intrinsic_for_expr,
gfc_get_proc_ifc_for_expr): New.
* trans-array.c (gfc_get_intrinsic_for_expr,
arg_evaluated_for_scalarization): New.
(gfc_walk_elemental_function_args): Add intrinsic procedure
as argument.  Check arg_evaluated_for_scalarization.
* trans-intrinsic.c (gfc_walk_intrinsic_function): Update call.
* trans-stmt.c (get_intrinsic_for_code): New.
(gfc_trans_call): Update call.

gcc/testsuite/
PR fortran/97896
* gfortran.dg/index_5.f90: New.
---
 gcc/fortran/gfortran.h|  1 +
 gcc/fortran/interface.c   | 17 +
 gcc/fortran/trans-array.c | 51 ++-
 gcc/fortran/trans-array.h |  3 ++
 gcc/fortran/trans-intrinsic.c |  1 +
 gcc/fortran/trans-stmt.c  | 20 +++
 gcc/testsuite/gfortran.dg/index_5.f90 | 23 
 7 files changed, 115 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/index_5.f90

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 5a28d1408eb..4035d260498 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -2196,6 +2196,7 @@ struct gfc_dummy_arg
 #define gfc_get_dummy_arg() XCNEW (gfc_dummy_arg)
 
 
+const char * gfc_dummy_arg_get_name (gfc_dummy_arg &);
 const gfc_typespec & gfc_dummy_arg_get_typespec (gfc_dummy_arg &);
 bool gfc_dummy_arg_is_optional (gfc_dummy_arg &);
 
diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
index 7289374e932..22aa916c88e 100644
--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -5400,6 +5400,23 @@ gfc_get_formal_from_actual_arglist (gfc_symbol *sym,
 }
 
 
+const char *
+gfc_dummy_arg_get_name (gfc_dummy_arg & dummy_arg)
+{
+  switch (dummy_arg.kind)
+{
+case GFC_INTRINSIC_DUMMY_ARG:
+  return dummy_arg.u.intrinsic->name;
+
+case GFC_NON_INTRINSIC_DUMMY_ARG:
+  return dummy_arg.u.non_intrinsic->sym->name;
+
+default:
+  gcc_unreachable ();
+}
+}
+
+
 const gfc_typespec &
 gfc_dummy_arg_get_typespec (gfc_dummy_arg & dummy_arg)
 {
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 6ae72a354e5..96b0a2583b0 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -11201,6 +11201,51 @@ gfc_get_proc_ifc_for_expr (gfc_expr *procedure_ref)
 }
 
 
+/* Given an expression referring to an intrinsic function call,
+   return the intrinsic symbol.  */
+
+gfc_intrinsic_sym *
+gfc_get_intrinsic_for_expr (gfc_expr *call)
+{
+  if (call == NULL)
+return NULL;
+
+  /* Normal procedure case.  */
+  if (call->expr_type == EXPR_FUNCTION)
+return call->value.function.isym;
+  else
+return NULL;
+}
+
+
+/* Indicates whether an argument to an intrinsic function should be used in
+   scalarization.  It is usually the case, except for some intrinsics
+   requiring the value to be constant, and using the value at compile time only.
+   As the value is not used at runtime in those cases, we don’t produce code
+   for it, and it should not be visible to the scalarizer.  */
+
+static bool
+arg_evaluated_for_scalarization (gfc_intrinsic_sym *function,
+ gfc_dummy_arg *dummy_arg)
+{
+  if (function != NULL && dummy_arg != NULL)
+{
+  switch (function->id)
+	{
+	  case GFC_ISYM_INDEX:
+	if (strcmp ("kind", gfc_dummy_arg_get_name (*dummy_arg)) == 0)
+	  return false;
+	  /* Fallthrough.  */
+
+	  default:
+	break;
+	}
+}
+
+  return true;
+}
+
+
 /* Walk the arguments of an elemental function.
PROC_EXPR is used to check whether an argument is permitted to be absent.  If
it is NULL, we don't do the check and the argument is assumed to be present.
@@ -11208,6 +11253,7 @@ gfc_get_proc_ifc_for_expr (gfc_expr *procedure_ref)
 
 gfc_ss *
 gfc_walk_elemental_function_args (gfc_ss * ss, gfc_actual_arglist *arg,
+  gfc_intrinsic_sym *intrinsic_sym,
   gfc_ss_type type)
 {
   int scalar;
@@ -11222,7 +11268,9 @@ gfc_walk_elemental_function_args (gfc_ss * ss, gfc_actual_arglist *arg,
   for (; arg; arg = arg->next)
 {
   gfc_dummy_arg * const dummy_arg = arg->associated_dummy;
-  if (!arg->expr || arg->expr->expr_type == EXPR_NULL)
+  if (!arg->expr
+	

[PATCH v2 6/7] Revert "Remove KIND argument from INDEX so it does not mess up scalarization."

2021-08-05 Thread Mikael Morin via Gcc-patches

This reverts commit d09847357b965a2c2cda063827ce362d4c9c86f2 except for
its testcase.

gcc/fortran/
* intrinsic.c (add_sym_4ind): Remove.
(add_functions): Use add_sym_4 instead of add_sym_4ind.
Don't special-case the index intrinsic.
* iresolve.c (gfc_resolve_index_func): Use the individual arguments
directly instead of the full argument list.
* intrinsic.h (gfc_resolve_index_func): Update the declaration
accordingly.
* trans-decl.c (gfc_get_extern_function_decl): Don't modify the
list of arguments in the case of the index intrinsic.
---
 gcc/fortran/intrinsic.c  | 48 ++--
 gcc/fortran/intrinsic.h  |  3 ++-
 gcc/fortran/iresolve.c   | 21 --
 gcc/fortran/trans-decl.c | 24 +---
 4 files changed, 14 insertions(+), 82 deletions(-)

diff --git a/gcc/fortran/intrinsic.c b/gcc/fortran/intrinsic.c
index d8bf5732e0a..5cd4225762b 100644
--- a/gcc/fortran/intrinsic.c
+++ b/gcc/fortran/intrinsic.c
@@ -888,39 +888,6 @@ add_sym_4 (const char *name, gfc_isym_id id, enum klass cl, int actual_ok, bt ty
 	   (void *) 0);
 }
 
-/* Add a symbol to the function list where the function takes 4
-   arguments and resolution may need to change the number or
-   arrangement of arguments. This is the case for INDEX, which needs
-   its KIND argument removed.  */
-
-static void
-add_sym_4ind (const char *name, gfc_isym_id id, enum klass cl, int actual_ok,
-	  bt type, int kind, int standard,
-	  bool (*check) (gfc_expr *, gfc_expr *, gfc_expr *, gfc_expr *),
-	  gfc_expr *(*simplify) (gfc_expr *, gfc_expr *, gfc_expr *,
- gfc_expr *),
-	  void (*resolve) (gfc_expr *, gfc_actual_arglist *),
-	  const char *a1, bt type1, int kind1, int optional1,
-	  const char *a2, bt type2, int kind2, int optional2,
-	  const char *a3, bt type3, int kind3, int optional3,
-	  const char *a4, bt type4, int kind4, int optional4 )
-{
-  gfc_check_f cf;
-  gfc_simplify_f sf;
-  gfc_resolve_f rf;
-
-  cf.f4 = check;
-  sf.f4 = simplify;
-  rf.f1m = resolve;
-
-  add_sym (name, id, cl, actual_ok, type, kind, standard, cf, sf, rf,
-	   a1, type1, kind1, optional1, INTENT_IN,
-	   a2, type2, kind2, optional2, INTENT_IN,
-	   a3, type3, kind3, optional3, INTENT_IN,
-	   a4, type4, kind4, optional4, INTENT_IN,
-	   (void *) 0);
-}
-
 
 /* Add a symbol to the subroutine list where the subroutine takes
4 arguments.  */
@@ -2223,11 +2190,11 @@ add_functions (void)
 
   /* The resolution function for INDEX is called gfc_resolve_index_func
  because the name gfc_resolve_index is already used in resolve.c.  */
-  add_sym_4ind ("index", GFC_ISYM_INDEX, CLASS_ELEMENTAL, ACTUAL_YES,
-		BT_INTEGER, di, GFC_STD_F77,
-		gfc_check_index, gfc_simplify_index, gfc_resolve_index_func,
-		stg, BT_CHARACTER, dc, REQUIRED, ssg, BT_CHARACTER, dc, REQUIRED,
-		bck, BT_LOGICAL, dl, OPTIONAL, kind, BT_INTEGER, di, OPTIONAL);
+  add_sym_4 ("index", GFC_ISYM_INDEX, CLASS_ELEMENTAL, ACTUAL_YES,
+	 BT_INTEGER, di, GFC_STD_F77,
+	 gfc_check_index, gfc_simplify_index, gfc_resolve_index_func,
+	 stg, BT_CHARACTER, dc, REQUIRED, ssg, BT_CHARACTER, dc, REQUIRED,
+	 bck, BT_LOGICAL, dl, OPTIONAL, kind, BT_INTEGER, di, OPTIONAL);
 
   make_generic ("index", GFC_ISYM_INDEX, GFC_STD_F77);
 
@@ -4547,10 +4514,9 @@ resolve_intrinsic (gfc_intrinsic_sym *specific, gfc_expr *e)
 
   arg = e->value.function.actual;
 
-  /* Special case hacks for MIN, MAX and INDEX.  */
+  /* Special case hacks for MIN and MAX.  */
   if (specific->resolve.f1m == gfc_resolve_max
-  || specific->resolve.f1m == gfc_resolve_min
-  || specific->resolve.f1m == gfc_resolve_index_func)
+  || specific->resolve.f1m == gfc_resolve_min)
 {
   (*specific->resolve.f1m) (e, arg);
   return;
diff --git a/gcc/fortran/intrinsic.h b/gcc/fortran/intrinsic.h
index 2148f89e194..b195e0b271a 100644
--- a/gcc/fortran/intrinsic.h
+++ b/gcc/fortran/intrinsic.h
@@ -521,7 +521,8 @@ void gfc_resolve_ibits (gfc_expr *, gfc_expr *, gfc_expr *, gfc_expr *);
 void gfc_resolve_ibset (gfc_expr *, gfc_expr *, gfc_expr *);
 void gfc_resolve_image_index (gfc_expr *, gfc_expr *, gfc_expr *);
 void gfc_resolve_image_status (gfc_expr *, gfc_expr *, gfc_expr *);
-void gfc_resolve_index_func (gfc_expr *, gfc_actual_arglist *);
+void gfc_resolve_index_func (gfc_expr *, gfc_expr *, gfc_expr *, gfc_expr *,
+			 gfc_expr *);
 void gfc_resolve_ierrno (gfc_expr *);
 void gfc_resolve_ieor (gfc_expr *, gfc_expr *, gfc_expr *);
 void gfc_resolve_ichar (gfc_expr *, gfc_expr *, gfc_expr *);
diff --git a/gcc/fortran/iresolve.c b/gcc/fortran/iresolve.c
index e17fe45f080..598c0409b66 100644
--- a/gcc/fortran/iresolve.c
+++ b/gcc/fortran/iresolve.c
@@ -1276,27 +1276,16 @@ gfc_resolve_ior (gfc_expr *f, gfc_expr *i, gfc_expr *j)
 
 
 void
-gfc_resolve_index_func (gfc_expr *f, gfc_actual_arglist *a)
+gfc_resolve_index_func (gfc_expr *f, gfc_expr *str,
+	

[PATCH v2 5/7] fortran: Delete redundant missing_arg_type field

2021-08-05 Thread Mikael Morin via Gcc-patches

Now that we can get information about an actual arg's associated
dummy using the associated_dummy attribute, the field missing_arg_type
contains redundant information.
This removes it.

gcc/fortran/
* gfortran.h (gfc_actual_arglist::missing_arg_type): Remove.
* interface.c (gfc_compare_actual_formal): Remove
missing_arg_type initialization.
* intrinsic.c (sort_actual): Ditto.
* trans-expr.c (gfc_conv_procedure_call): Use associated_dummy
and gfc_dummy_arg_get_typespec to get the dummy argument type.
---
 gcc/fortran/gfortran.h   | 5 -
 gcc/fortran/interface.c  | 5 -
 gcc/fortran/intrinsic.c  | 5 +
 gcc/fortran/trans-expr.c | 9 +++--
 4 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 12dd33bf74f..5a28d1408eb 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1155,11 +1155,6 @@ typedef struct gfc_actual_arglist
   /* Alternate return label when the expr member is null.  */
   struct gfc_st_label *label;
 
-  /* This is set to the type of an eventual omitted optional
- argument. This is used to determine if a hidden string length
- argument has to be added to a function call.  */
-  bt missing_arg_type;
-
   gfc_param_spec_type spec_type;
 
   struct gfc_expr *expr;
diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
index d463ee8228a..7289374e932 100644
--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -3581,11 +3581,6 @@ gfc_compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal,
   if (*ap == NULL && n > 0)
 *ap = new_arg[0];
 
-  /* Note the types of omitted optional arguments.  */
-  for (a = *ap, f = formal; a; a = a->next, f = f->next)
-if (a->expr == NULL && a->label == NULL)
-  a->missing_arg_type = f->sym->ts.type;
-
   return true;
 }
 
diff --git a/gcc/fortran/intrinsic.c b/gcc/fortran/intrinsic.c
index c42891e7e1a..d8bf5732e0a 100644
--- a/gcc/fortran/intrinsic.c
+++ b/gcc/fortran/intrinsic.c
@@ -4438,10 +4438,7 @@ do_sort:
 	}
 
   if (a == NULL)
-	{
-	  a = gfc_get_actual_arglist ();
-	  a->missing_arg_type = f->ts.type;
-	}
+	a = gfc_get_actual_arglist ();
 
   a->associated_dummy = get_intrinsic_dummy_arg (f);
 
diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index b18a9ec9799..3e1f12bfbc7 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -5831,7 +5831,10 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 		{
 		  /* Pass a NULL pointer for an absent arg.  */
 		  parmse.expr = null_pointer_node;
-		  if (arg->missing_arg_type == BT_CHARACTER)
+		  gfc_dummy_arg * const dummy_arg = arg->associated_dummy;
+		  if (dummy_arg
+		  && gfc_dummy_arg_get_typespec (*dummy_arg).type
+			 == BT_CHARACTER)
 		parmse.string_length = build_int_cst (gfc_charlen_type_node,
 			  0);
 		}
@@ -5848,7 +5851,9 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 			  || !CLASS_DATA (fsym)->attr.allocatable));
 	  gfc_init_se (, NULL);
 	  parmse.expr = null_pointer_node;
-	  if (arg->missing_arg_type == BT_CHARACTER)
+	  if (arg->associated_dummy
+	  && gfc_dummy_arg_get_typespec (*arg->associated_dummy).type
+		 == BT_CHARACTER)
 	parmse.string_length = build_int_cst (gfc_charlen_type_node, 0);
 	}
   else if (fsym && fsym->ts.type == BT_CLASS


[PATCH v2 4/7] fortran: simplify elemental arguments walking

2021-08-05 Thread Mikael Morin via Gcc-patches

This adds two functions working with the wrapper class gfc_dummy_arg
and uses them to simplify somewhat the walking of elemental procedure
arguments for scalarization.  As information about dummy arguments
can be obtained from the actual argument through the just-introduced
associated_dummy field, there is no need to carry around the procedure
interface and walk dummy arguments manually together with actual arguments.

gcc/fortran/
* interface.c (gfc_dummy_arg_get_typespec,
gfc_dummy_arg_is_optional): New functions.
* gfortran.h (gfc_dummy_arg_get_typespec,
gfc_dummy_arg_is_optional): Declare them.
* trans.h (gfc_ss_info::dummy_arg): Use the wrapper type
as declaration type.
* trans-array.c (gfc_scalar_elemental_arg_saved_as_reference):
use gfc_dummy_arg_get_typespec function to get the type.
(gfc_walk_elemental_function_args): Remove proc_ifc argument.
Get info about the dummy arg using the associated_dummy field.
* trans-array.h (gfc_walk_elemental_function_args): Update declaration.
* trans-intrinsic.c (gfc_walk_intrinsic_function):
Update call to gfc_walk_elemental_function_args.
* trans-stmt.c (gfc_trans_call): Ditto.
(get_proc_ifc_for_call): Remove.
---
 gcc/fortran/gfortran.h|  4 
 gcc/fortran/interface.c   | 34 ++
 gcc/fortran/trans-array.c | 23 +++
 gcc/fortran/trans-array.h |  2 +-
 gcc/fortran/trans-intrinsic.c |  2 +-
 gcc/fortran/trans-stmt.c  | 22 --
 gcc/fortran/trans.h   |  4 ++--
 7 files changed, 49 insertions(+), 42 deletions(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index c890d80bce0..12dd33bf74f 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -2201,6 +2201,10 @@ struct gfc_dummy_arg
 #define gfc_get_dummy_arg() XCNEW (gfc_dummy_arg)
 
 
+const gfc_typespec & gfc_dummy_arg_get_typespec (gfc_dummy_arg &);
+bool gfc_dummy_arg_is_optional (gfc_dummy_arg &);
+
+
 /* Specifies the various kinds of check functions used to verify the
argument lists of intrinsic functions. fX with X an integer refer
to check functions of intrinsics with X arguments. f1m is used for
diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
index dba167559d1..d463ee8228a 100644
--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -5403,3 +5403,37 @@ gfc_get_formal_from_actual_arglist (gfc_symbol *sym,
   f = &((*f)->next);
 }
 }
+
+
+const gfc_typespec &
+gfc_dummy_arg_get_typespec (gfc_dummy_arg & dummy_arg)
+{
+  switch (dummy_arg.kind)
+{
+case GFC_INTRINSIC_DUMMY_ARG:
+  return dummy_arg.u.intrinsic->ts;
+
+case GFC_NON_INTRINSIC_DUMMY_ARG:
+  return dummy_arg.u.non_intrinsic->sym->ts;
+
+default:
+  gcc_unreachable ();
+}
+}
+
+
+bool
+gfc_dummy_arg_is_optional (gfc_dummy_arg & dummy_arg)
+{
+  switch (dummy_arg.kind)
+{
+case GFC_INTRINSIC_DUMMY_ARG:
+  return dummy_arg.u.intrinsic->optional;
+
+case GFC_NON_INTRINSIC_DUMMY_ARG:
+  return dummy_arg.u.non_intrinsic->sym->attr.optional;
+
+default:
+  gcc_unreachable ();
+}
+}
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 0d013defdbb..6ae72a354e5 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -2879,7 +2879,8 @@ gfc_scalar_elemental_arg_saved_as_reference (gfc_ss_info * ss_info)
   /* If the expression is of polymorphic type, it's actual size is not known,
  so we avoid copying it anywhere.  */
   if (ss_info->data.scalar.dummy_arg
-  && ss_info->data.scalar.dummy_arg->ts.type == BT_CLASS
+  && gfc_dummy_arg_get_typespec (*ss_info->data.scalar.dummy_arg).type
+	 == BT_CLASS
   && ss_info->expr->ts.type == BT_CLASS)
 return true;
 
@@ -11207,9 +11208,8 @@ gfc_get_proc_ifc_for_expr (gfc_expr *procedure_ref)
 
 gfc_ss *
 gfc_walk_elemental_function_args (gfc_ss * ss, gfc_actual_arglist *arg,
-  gfc_symbol *proc_ifc, gfc_ss_type type)
+  gfc_ss_type type)
 {
-  gfc_formal_arglist *dummy_arg;
   int scalar;
   gfc_ss *head;
   gfc_ss *tail;
@@ -11218,16 +11218,12 @@ gfc_walk_elemental_function_args (gfc_ss * ss, gfc_actual_arglist *arg,
   head = gfc_ss_terminator;
   tail = NULL;
 
-  if (proc_ifc)
-dummy_arg = gfc_sym_get_dummy_args (proc_ifc);
-  else
-dummy_arg = NULL;
-
   scalar = 1;
   for (; arg; arg = arg->next)
 {
+  gfc_dummy_arg * const dummy_arg = arg->associated_dummy;
   if (!arg->expr || arg->expr->expr_type == EXPR_NULL)
-	goto loop_continue;
+	continue;
 
   newss = gfc_walk_subexpr (head, arg->expr);
   if (newss == head)
@@ -11237,13 +11233,13 @@ gfc_walk_elemental_function_args (gfc_ss * ss, gfc_actual_arglist *arg,
 	  newss = gfc_get_scalar_ss (head, arg->expr);
 	  newss->info->type = type;
 	  if (dummy_arg)
-	newss->info->data.scalar.dummy_arg = 

[PATCH v2 3/7] fortran: Reverse actual vs dummy argument mapping

2021-08-05 Thread Mikael Morin via Gcc-patches

There was originally no way to get from an actual argument
to the corresponding dummy argument, even after the job of sorting
and matching actual with dummy arguments had been done.
The closest was a field named actual in gfc_intrinsic_arg that was
used as scratch data when sorting arguments of one specific call.
However that value was overwritten later on as arguments of another
call to the same procedure were sorted and matched.

This change removes that field and adds instead a new field
associated_dummy in gfc_actual_arglist.  This field uses the just
introduced gfc_dummy_arg interface, which makes it usable with
both external and intrinsic procedure dummy arguments.

As the removed field was used in the code sorting and matching arguments,
that code has to be updated.  Two local vectors with matching indices
are introduced for respectively dummy and actual arguments, and the
loops are modified to use indices and update those argument vectors.

gcc/fortran/
* gfortran.h (gfc_actual_arglist): New field associated_dummy.
(gfc_intrinsic_arg): Remove field actual.
* interface.c (get_nonintrinsic_dummy_arg): New.
(gfc_compare_actual_formal): Initialize associated_dummy.
* intrinsic.c (get_intrinsic_dummy_arg): New.
(sort_actual):  Add argument vectors.
Use loops with indices on argument vectors.
Initialize associated_dummy.
---
 gcc/fortran/gfortran.h  | 11 ++-
 gcc/fortran/interface.c | 21 ++--
 gcc/fortran/intrinsic.c | 43 ++---
 3 files changed, 61 insertions(+), 14 deletions(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 55ac4a80549..c890d80bce0 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1144,6 +1144,9 @@ gfc_formal_arglist;
 #define gfc_get_formal_arglist() XCNEW (gfc_formal_arglist)
 
 
+struct gfc_dummy_arg;
+
+
 /* The gfc_actual_arglist structure is for actual arguments and
for type parameter specification lists.  */
 typedef struct gfc_actual_arglist
@@ -1160,6 +1163,11 @@ typedef struct gfc_actual_arglist
   gfc_param_spec_type spec_type;
 
   struct gfc_expr *expr;
+
+  /*  The dummy arg this actual arg is associated with, if the interface
+  is explicit.  NULL otherwise.  */
+  gfc_dummy_arg *associated_dummy;
+
   struct gfc_actual_arglist *next;
 }
 gfc_actual_arglist;
@@ -2166,7 +2174,6 @@ typedef struct gfc_intrinsic_arg
   gfc_typespec ts;
   unsigned optional:1, value:1;
   ENUM_BITFIELD (sym_intent) intent:2;
-  gfc_actual_arglist *actual;
 
   struct gfc_intrinsic_arg *next;
 }
@@ -2191,6 +2198,8 @@ struct gfc_dummy_arg
   } u;
 };
 
+#define gfc_get_dummy_arg() XCNEW (gfc_dummy_arg)
+
 
 /* Specifies the various kinds of check functions used to verify the
argument lists of intrinsic functions. fX with X an integer refer
diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
index 9e3e8aa9da9..dba167559d1 100644
--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -3026,6 +3026,18 @@ lookup_arg_fuzzy (const char *arg, gfc_formal_arglist *arguments)
 }
 
 
+static gfc_dummy_arg *
+get_nonintrinsic_dummy_arg (gfc_formal_arglist *formal)
+{
+  gfc_dummy_arg * const dummy_arg = gfc_get_dummy_arg ();
+
+  dummy_arg->kind = GFC_NON_INTRINSIC_DUMMY_ARG;
+  dummy_arg->u.non_intrinsic = formal;
+
+  return dummy_arg;
+}
+
+
 /* Given formal and actual argument lists, see if they are compatible.
If they are compatible, the actual argument list is sorted to
correspond with the formal list, and elements for missing optional
@@ -3131,6 +3143,8 @@ gfc_compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal,
 			   "call at %L", where);
 	  return false;
 	}
+  else
+	a->associated_dummy = get_nonintrinsic_dummy_arg (f);
 
   if (a->expr == NULL)
 	{
@@ -3546,9 +3560,12 @@ gfc_compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal,
   /* The argument lists are compatible.  We now relink a new actual
  argument list with null arguments in the right places.  The head
  of the list remains the head.  */
-  for (i = 0; i < n; i++)
+  for (f = formal, i = 0; f; f = f->next, i++)
 if (new_arg[i] == NULL)
-  new_arg[i] = gfc_get_actual_arglist ();
+  {
+	new_arg[i] = gfc_get_actual_arglist ();
+	new_arg[i]->associated_dummy = get_nonintrinsic_dummy_arg (f);
+  }
 
   if (na != 0)
 {
diff --git a/gcc/fortran/intrinsic.c b/gcc/fortran/intrinsic.c
index ffeaf2841b7..c42891e7e1a 100644
--- a/gcc/fortran/intrinsic.c
+++ b/gcc/fortran/intrinsic.c
@@ -4269,6 +4269,18 @@ remove_nullargs (gfc_actual_arglist **ap)
 }
 
 
+static gfc_dummy_arg *
+get_intrinsic_dummy_arg (gfc_intrinsic_arg *intrinsic)
+{
+  gfc_dummy_arg * const dummy_arg = gfc_get_dummy_arg ();
+
+  dummy_arg->kind = GFC_INTRINSIC_DUMMY_ARG;
+  dummy_arg->u.intrinsic = intrinsic;
+
+  return dummy_arg;
+}
+
+
 /* Given an actual arglist and a formal arglist, sort the actual

[PATCH v2 2/7] fortran: Tiny sort_actual internal refactoring

2021-08-05 Thread Mikael Morin via Gcc-patches

Preliminary refactoring to make further changes more obvious.
No functional change.

gcc/fortran/
* intrinsic.c (sort_actual): Initialise variable and use it earlier.
---
 gcc/fortran/intrinsic.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/gcc/fortran/intrinsic.c b/gcc/fortran/intrinsic.c
index 219f04f2317..ffeaf2841b7 100644
--- a/gcc/fortran/intrinsic.c
+++ b/gcc/fortran/intrinsic.c
@@ -4411,19 +4411,18 @@ do_sort:
 
   for (f = formal; f; f = f->next)
 {
-  if (f->actual && f->actual->label != NULL && f->ts.type)
+  a = f->actual;
+  if (a && a->label != NULL && f->ts.type)
 	{
 	  gfc_error ("ALTERNATE RETURN not permitted at %L", where);
 	  return false;
 	}
 
-  if (f->actual == NULL)
+  if (a == NULL)
 	{
 	  a = gfc_get_actual_arglist ();
 	  a->missing_arg_type = f->ts.type;
 	}
-  else
-	a = f->actual;
 
   if (actual == NULL)
 	*ap = a;


[PATCH v2 1/7] fortran: new wrapper class gfc_dummy_arg

2021-08-05 Thread Mikael Morin via Gcc-patches

Introduce a new wrapper class gfc_dummy_arg that provides a common
interface to both dummy arguments of user-defined procedures (which
have type gfc_formal_arglist) and dummy arguments of intrinsic procedures
(which have type gfc_intrinsic_arg).

gcc/fortran/
* gfortran.h (gfc_dummy_arg_kind, gfc_dummy_arg): New.
---
 gcc/fortran/gfortran.h | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 921aed93dc3..55ac4a80549 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -2169,11 +2169,29 @@ typedef struct gfc_intrinsic_arg
   gfc_actual_arglist *actual;
 
   struct gfc_intrinsic_arg *next;
-
 }
 gfc_intrinsic_arg;
 
 
+typedef enum {
+  GFC_UNDEFINED_DUMMY_ARG = 0,
+  GFC_INTRINSIC_DUMMY_ARG,
+  GFC_NON_INTRINSIC_DUMMY_ARG
+}
+gfc_dummy_arg_kind;
+
+/* dummy arg of either an intrinsic or a user-defined procedure.  */
+struct gfc_dummy_arg
+{
+  gfc_dummy_arg_kind kind;
+
+  union {
+gfc_intrinsic_arg *intrinsic;
+gfc_formal_arglist *non_intrinsic;
+  } u;
+};
+
+
 /* Specifies the various kinds of check functions used to verify the
argument lists of intrinsic functions. fX with X an integer refer
to check functions of intrinsics with X arguments. f1m is used for


[PATCH v2 0/7] fortran: Ignore unused arguments for scalarisation [PR97896]

2021-08-05 Thread Mikael Morin via Gcc-patches
Hello,

This is the second submit of a patch series whose first version[1] was not
welcome because of its C++ usage.

After some thought I figured out that rewriting the series without C++
features would not be that impacting after all.
So here you go, the (not so) good old-fashioned way.

The problematic case is intrinsic procedures where an argument is actually not
used in the generated code (the KIND argument of INDEX in the testcase), which
confuses the scalariser.

Thomas König committed a change to work around the problem, but it regressed in
PR97896.  This patch series puts the workaround where I think it is more appropriate,
namely at the beginning of the scalarisation procedure.  This is patch 7 of
the series, preceded by the revert in patch 6.  I intend to commit both of
them squashed together.

The rest of the series (patches 1-5) is preliminary work to be able to identify 
the KIND argument of the INDEX intrinsic by its name, rather than using the 
right number of next->next->next indirections starting with the first argument. 
 It is probably overkill for just this use case, but I think it’s worth having 
that facility in the long term.

I intend to submit a separate patch for the release branch with only patch 6 
and 7 and the next->next->next indirections.

Regression-tested on x86_64-linux-gnu.  Ok for master? 


[1] https://gcc.gnu.org/pipermail/fortran/2021-August/056303.html

Mikael Morin (7):
  fortran: new wrapper class gfc_dummy_arg
  fortran: Tiny sort_actual internal refactoring
  fortran: Reverse actual vs dummy argument mapping
  fortran: simplify elemental arguments walking
  fortran: Delete redundant missing_arg_type field
  Revert "Remove KIND argument from INDEX so it does not mess up
scalarization."
  fortran: Ignore unused args in scalarization [PR97896]

 gcc/fortran/gfortran.h|  41 +--
 gcc/fortran/interface.c   |  77 ++--
 gcc/fortran/intrinsic.c   | 101 +++---
 gcc/fortran/intrinsic.h   |   3 +-
 gcc/fortran/iresolve.c|  21 +-
 gcc/fortran/trans-array.c |  74 ++-
 gcc/fortran/trans-array.h |   5 +-
 gcc/fortran/trans-decl.c  |  24 +-
 gcc/fortran/trans-expr.c  |   9 ++-
 gcc/fortran/trans-intrinsic.c |   3 +-
 gcc/fortran/trans-stmt.c  |  30 
 gcc/fortran/trans.h   |   4 +-
 gcc/testsuite/gfortran.dg/index_5.f90 |  23 ++
 13 files changed, 262 insertions(+), 153 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/index_5.f90

-- 
2.30.2



[PATCH 1/4] aarch64: Use memcpy to copy structures in vst4[q]_lane intrinsics

2021-08-05 Thread Jonathan Wright via Gcc-patches
Hi,

As subject, this patch uses __builtin_memcpy to copy vector structures
instead of using a union - or constructing a new opaque structure one
vector at a time - in each of the vst4[q]_lane Neon intrinsics in
arm_neon.h.

It also adds new code generation tests to verify that superfluous move
instructions are not generated for the vst4q_lane intrinsics.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-29  Jonathan Wright  

* config/aarch64/arm_neon.h (__ST4_LANE_FUNC): Delete.
(__ST4Q_LANE_FUNC): Delete.
(vst4_lane_f16): Use __builtin_memcpy to copy vector
structure instead of constructing __builtin_aarch64_simd_xi
one vector at a time.
(vst4_lane_f32): Likewise.
(vst4_lane_f64): Likewise.
(vst4_lane_p8): Likewise.
(vst4_lane_p16): Likewise.
(vst4_lane_p64): Likewise.
(vst4_lane_s8): Likewise.
(vst4_lane_s16): Likewise.
(vst4_lane_s32): Likewise.
(vst4_lane_s64): Likewise.
(vst4_lane_u8): Likewise.
(vst4_lane_u16): Likewise.
(vst4_lane_u32): Likewise.
(vst4_lane_u64): Likewise.
(vst4_lane_bf16): Likewise.
(vst4q_lane_f16): Use __builtin_memcpy to copy vector
structure instead of using a union.
(vst4q_lane_f32): Likewise.
(vst4q_lane_f64): Likewise.
(vst4q_lane_p8): Likewise.
(vst4q_lane_p16): Likewise.
(vst4q_lane_p64): Likewise.
(vst4q_lane_s8): Likewise.
(vst4q_lane_s16): Likewise.
(vst4q_lane_s32): Likewise.
(vst4q_lane_s64): Likewise.
(vst4q_lane_u8): Likewise.
(vst4q_lane_u16): Likewise.
(vst4q_lane_u32): Likewise.
(vst4q_lane_u64): Likewise.
(vst4q_lane_bf16): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_structure_intrinsics.c: Add new
tests.


rb14728.patch
Description: rb14728.patch


Re: [PATCH 01/34] rs6000: Incorporate new builtins code into the build machinery

2021-08-05 Thread Segher Boessenkool
On Thu, Aug 05, 2021 at 08:47:54AM -0500, Bill Schmidt wrote:
> Hi Segher,
> 
> On 8/4/21 5:29 PM, Segher Boessenkool wrote:
> >On Thu, Jul 29, 2021 at 08:30:48AM -0500, Bill Schmidt wrote:
> >+rs6000-gen-builtins: rs6000-gen-builtins.o rbtree.o
> >>+   $(LINKER_FOR_BUILD) $(BUILD_LINKERFLAGS) $(BUILD_LDFLAGS) -o $@ \
> >>+   $(filter-out $(BUILD_LIBDEPS), $^) $(BUILD_LIBS)
> >I wonder what the difference is between BUILD_LINKERFLAGS and
> >BUILD_LDFLAGS?  Do you have any idea?
> >
> I couldn't find evidence that BUILD_LINKERFLAGS ever has anything that 
> BUILD_LDFLAGS doesn't, but I put that down to my ignorance of the 
> cobwebbed corners of the build system.  There is probably some configure 
> magic that can set it, and I suspect it has something to do with cross 
> builds; but it might also just be a leftover artifact.  I decided I 
> should use the same build rule as the other gen- programs to make sure 
> cross builds work as expected. Certainly open to better ideas if you 
> have them!

Oh no, the patch is fine as is, I approved it...  I'm just terminally
nosy :-)  It isn't clear what (if any) difference there is between the
two vars.  I do know you are just copying exising practice here.


Segher


Re: [PATCH] rs6000: Fix restored rs6000_long_double_type_size.

2021-08-05 Thread Segher Boessenkool
On Thu, Aug 05, 2021 at 02:05:24PM +0200, Martin Liška wrote:
> On 7/23/21 7:57 PM, Segher Boessenkool wrote:
> >Hi!
> >
> >On Fri, Jul 23, 2021 at 07:47:54AM +0200, Martin Liška wrote:
> >>On 7/12/21 7:20 PM, Segher Boessenkool wrote:
> >>+static __attribute__ ((optimize ("-fno-stack-protector"))) __typeof
> >>(f) *
> >
> >-fno-stack-protector is default.
> 
> Yes, but one needs an optimize attribute in order to trigger
> cl_target_option_save/restore
> mechanism.
> >>>
> >>>So it behaves differently if you select the default than if you do not
> >>>select anything?  That is wrong, no?
> >>
> >>Sorry, I don't get your example, please explain it.
> >
> >If -mbork is the default, the coompiler whould behave the same if you
> >invoke it with -mbork as when you do not.  And the optimize attribute
> >should work exactly the same as command line options.
> 
> Ah, got your point. All right, let's use then 'optimize(1)'.
> 
> Is it fine with the adjustment?

You are saying the compiler's behaviour is broken, but are changing the
testcase to avoid exhibiting that behaviour?  No, this is not fine.

If a flag is the default the compiler should do the same thing with and
without any attribute setting that flag, directly or indirectly.


Segher


[PATCH, v3, libgomp, OpenMP 5.0, committed] Implement omp_get_device_num

2021-08-05 Thread Chung-Lin Tang



On 2021/8/3 8:07 PM, Thomas Schwinge wrote:

I'd really suggest re-working the intelmic support as an offload plugin inside
libgomp, rather than having it float outside by itself.

Well, it is a regular libgomp plugin, just its sources are not in
'libgomp/plugin/' and it's not built during libgomp build.  Are you
suggesting just to move it into 'libgomp/plugin/'?  This may need some
more complicated setup because of its 'liboffloadmic' dependency?


Well, it appears that liboffloadmic is layered atop a COI API (Common Offload
Interface?) that is supposed to be the true proprietary interface to Intel MIC
devices.

I think it is more reasonable to have each libgomp plugin be built directly
atop the vendor-specific interface for the accelerator.  Having another
in-tree library serving in between makes things unnecessarily complex.

(I'm not sure if I recall correctly, but did liboffloadmic have another use
besides libgomp?)


--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -102,6 +102,12 @@ struct addr_pair
 uintptr_t end;
   };
   
+/* This symbol is to name a target side variable that holds the designated
+   'device number' of the target device. The symbol needs to be available to
+   libgomp code and the  offload plugin (which in the latter case must be
+   stringified).  */
+#define GOMP_DEVICE_NUM_VAR __gomp_device_num

For a single var it is acceptable (though, please avoid the double space
before offload plugin in the comment), but once we have more than one
variable, I think we should simply have a struct which will contain all the
parameters that need to be copied from the host to the offloading device at
image load time (and have eventually another struct that holds parameters
that we'll need to copy to the device on each kernel launch, I bet some ICVs
will be one category, other ICVs another one).

ACK.  Also other program state, like 'fenv' or the gfortran "state blob".
This is  "Missing data/state
sharing/propagation between host and offloading devices".


Okay, so we actually have a PR number for this :)



Actually, if you look at the 5.[01] specifications, omp_get_device_num() is not
defined in terms of an ICV. Maybe it conceptually ought to be, but the current
description of "the device number of the device on which the calling thread is
executing" is not one of the defined ICVs.

It looks like there will eventually be some kind of ICV block handled in a 
similar
way, but I think that the modifications will be straightforward then. For now,
I think it's okay for GOMP_DEVICE_NUM_VAR to just be a normal global variable.

There is, by the way, precedent for that:
'libgomp/config/nvptx/time.c:double __nvptx_clocktick', set up in
'libgomp/plugin/plugin-nvptx.c:nvptx_set_clocktick' ('cuModuleGetGlobal'
to get the device address, followed by 'cuMemcpyHtoD'), invoked from
'libgomp/plugin/plugin-nvptx.c:GOMP_OFFLOAD_load_image', quite simple.

For the case discussed here, we're now adding more complex
'other_count'/'other_entries'/'num_others' bookkeeping.  (Great that all
of the plugins plus 'libgomp/target.c' invented their own terminology...)
;-)


Well, that is kind of what nvptx is doing by itself internally.
(e.g. libgomp/config/gcn/time.c does not use such external setting by the 
plugin)

Maybe that "last" entry handled by load_image will eventually turn into a large
block struct to handle all such cases.


--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c

...

+  if (status == HSA_STATUS_SUCCESS)
+{
+  uint64_t device_num_varptr;
+  uint32_t device_num_varsize;
+
+  status = hsa_fns.hsa_executable_symbol_get_info_fn
+   (var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_ADDRESS,
+    &device_num_varptr);
+  if (status != HSA_STATUS_SUCCESS)
+   hsa_fatal ("Could not extract a variable from its symbol", status);
+  status = hsa_fns.hsa_executable_symbol_get_info_fn
+   (var_symbol, HSA_EXECUTABLE_SYMBOL_INFO_VARIABLE_SIZE,
+    &device_num_varsize);
+  if (status != HSA_STATUS_SUCCESS)
+   hsa_fatal ("Could not extract a variable size from its symbol", status);
+
+  pair->start = device_num_varptr;
+  pair->end = device_num_varptr + device_num_varsize;
+}
+  else
+pair->start = pair->end = 0;
+

Is the 'else' branch to accommodate "old" executables running against
"new" libgomp?  If yes, then please add a comment, "for compatibility
with pre-GCC 12 executables" or similar.


No, it's because GOMP_DEVICE_NUM_VAR is only linked in when the program
actually uses omp_get_device_num().

Even when a program does use omp_get_device_num(), only that offload image
which contains that part of libgomp has the device number variable defined.

So the else case should actually be quite common.


Also, add 'pair++;', to avoid future confusion?


Done.


+  if (r == CUDA_SUCCESS)
+{
+  targ_tbl->start = (uintptr_t) device_num_varptr;
+  targ_tbl->end = 

Re: [PATCH] libcpp: Regenerate ucnid.h using Unicode 13.0.0 files [PR100977]

2021-08-05 Thread Jeff Law via Gcc-patches




On 8/5/2021 2:17 AM, Jakub Jelinek via Gcc-patches wrote:

Hi!

The following patch (incremental to the makeucnid.c fix) regenerates
ucnid.h with https://www.unicode.org/Public/13.0.0/ucd/ files.

Bootstrapped/regtested on top of the previous patch (which has also
been bootstrapped/regtested alone) on x86_64-linux and i686-linux,
ok for trunk?

2021-08-04  Jakub Jelinek  

PR c++/100977
* ucnid.h: Regenerated using Unicode 13.0.0 files.

This and its prerequisite makeucnid.c fix are both OK.
jeff



Re: [PATCH] libgcc: Honor LDFLAGS_FOR_TARGET when linking libgcc_s

2021-08-05 Thread Jeff Law via Gcc-patches




On 8/5/2021 6:41 AM, Jakub Jelinek via Gcc-patches wrote:

Hi!

When building gcc with some specific LDFLAGS_FOR_TARGET, e.g.
LDFLAGS_FOR_TARGET=-Wl,-z,relro,-z,now
those flags propagate info linking of target shared libraries,
e.g. 
lib{ubsan,tsan,stdc++,quadmath,objc,lsan,itm,gphobos,gdruntime,gomp,go,gfortran,atomic,asan}.so.*
but there is one important exception, libgcc_s.so.* linking ignores it.

The following patch fixes that.

Bootstrapped/regtested on x86_64-linux with 
LDFLAGS_FOR_TARGET=-Wl,-z,relro,-z,now
and verified that libgcc_s.so.* is BIND_NOW when it previously wasn't, and
without any LDFLAGS_FOR_TARGET on x86_64-linux and i686-linux.
There on x86_64-linux I've verified that the libgcc_s.so.1 linking command
line for -m64 is identical except for whitespace to one without the patch,
and for -m32 multilib $(LDFLAGS) actually do supply there an extra -m32
that also repeats later in the @multilib_flags@, which should be harmless.

Ok for trunk?

2021-08-04  Jakub Jelinek  

* config/t-slibgcc (SHLIB_LINK): Add $(LDFLAGS).
* config/t-slibgcc-darwin (SHLIB_LINK): Likewise.
* config/t-slibgcc-vms (SHLIB_LINK): Likewise.
* config/t-slibgcc-fuchsia (SHLIB_LDFLAGS): Remove $(LDFLAGS).

OK
jeff



RE: [PATCH take 2] Fold (X<

2021-08-05 Thread Roger Sayle
 

Sorry (again) for the disruption.  Hopefully Jakub agrees, but I'd like to
think that this isn't a problem with my recent patch, but a slightly fragile
test case.  PR target/100056 concerned an issue where the compiler would
generate three instructions for a given idiom when this could optimally be
done in two, and this was resolved by Jakub's backend splitters.

With my recent change, the code generated for (i<<11)|i is now the same as has
always been generated for (i<<11)+i and 2049*i, which is still done optimally
in two instructions.  Alas, the trouble is that one of those instructions is a
[us]bfiz.
 

Both i+(i<<11) and i|(i<<11) compute exactly the same result for unsigned
char i, so the question is whether one implementation is superior to the
other, in which case GCC should generate that form for both expressions.

 

I'm unfamiliar with AArch64, but presumably the code generated by synth_mult
for 2049*i is optimal if the backend's RTX_COST is well parameterized.  [LLVM
also generates a two-instruction sequence for 2049*i on ARM64.]

 

I believe a solution is to tweak the test case by changing the dg-final
condition to:

scan-assembler-not \\t[us]bfiz\\tw0, w0, 11

which failed prior to Jakub's fix for PR100056 and passes since then, but this
too is a little fragile; ideally we'd count the number of instructions.

 

Does this sound like a reasonable fix/workaround?  Or is the issue that
[us]bfiz is slow, and that synth_mult (and the addition form) should also
avoid generating this insn?

 

Thanks in advance,

Roger

--

 

From: Christophe Lyon  
Sent: 05 August 2021 14:53
To: Roger Sayle 
Cc: GCC Patches ; Marc Glisse 
Subject: Re: [PATCH take 2] Fold (X > wrote:


Hi Marc,

Thanks for the feedback.  After some quality time in gdb, I now appreciate
that
match.pd behaves (subtly) differently between generic and gimple, and the
trees actually being passed to tree_nonzero_bits were not quite what I had
expected.  Sorry for my confusion, the revised patch below is now much
shorter
(and my follow-up patch that was originally to tree_nonzero_bits looks like
it
now needs to be to get_nonzero_bits!).

This revised patch has been retested on x86_64-pc-linux-gnu with a
"make bootstrap" and "make -k check" with no new failures.

Ok for mainline?

2021-07-28  Roger Sayle  <ro...@nextmovesoftware.com>
	    Marc Glisse  <marc.gli...@inria.fr>

gcc/ChangeLog
* match.pd (bit_ior, bit_xor): Canonicalize (X*C1)|(X*C2) and
(X*C1)^(X*C2) as X*(C1+C2), and related variants, using
tree_nonzero_bits to ensure that operands are bit-wise disjoint.

gcc/testsuite/ChangeLog
* gcc.dg/fold-ior-4.c: New test.

 

Hi,

 

This patch introduces a regression on aarch64:

FAIL: gcc.target/aarch64/pr100056.c scan-assembler-not \\t[us]bfiz\\tw[0-9]+, w[0-9]+, 11

 

Can you check?

 

Thanks,

 

Christophe

 

 

Roger
--

-Original Message-
From: Marc Glisse <marc.gli...@inria.fr>
Sent: 26 July 2021 16:45
To: Roger Sayle <ro...@nextmovesoftware.com>
Cc: 'GCC Patches' <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] Fold (X<

> The one aspect that's a little odd is that each transform is paired
> with a convert@1 variant, using the efficient match machinery to
> expose any zero extension to fold-const.c's tree_nonzero_bits
> functionality.

Copying the first transform for context

+(for op (bit_ior bit_xor)
+ (simplify
+  (op (mult:s@0 @1 INTEGER_CST@2)
+  (mult:s@3 @1 INTEGER_CST@4))
+  (if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type)
+   && (tree_nonzero_bits (@0) & tree_nonzero_bits (@3)) == 0)
+   (mult @1
+{ wide_int_to_tree (type, wi::to_wide (@2) + wi::to_wide (@4));
})))  
+(simplify
+  (op (mult:s@0 (convert@1 @2) INTEGER_CST@3)
+  (mult:s@4 (convert@1 @2) INTEGER_CST@5))
+  (if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type)
+   && (tree_nonzero_bits (@0) & tree_nonzero_bits (@4)) == 0)
+   (mult @1
+{ wide_int_to_tree (type, wi::to_wide (@3) + wi::to_wide (@5));
})))

Could you explain how the convert helps exactly?

--
Marc Glisse



Re: [committed] libstdc++: Move attributes that follow requires-clauses [PR101782]

2021-08-05 Thread Jonathan Wakely via Gcc-patches

On 05/08/21 15:19 +0100, Jonathan Wakely wrote:

On 04/08/21 12:55 +0100, Jonathan Wakely wrote:

This adds [[nodiscard]] throughout <iterator>, as proposed by P2377R0
(with some minor corrections).

The attribute is added for all modes from C++11 up, using
[[__nodiscard__]] or _GLIBCXX_NODISCARD where C++17 [[nodiscard]] can't
be used directly.


This change causes errors when -fconcepts-ts is used. Fixed like so.

Tested powerpc64le-linux, committed to trunk.




commit 7b1de3eb9ed3f8dde54732d88520292c5ad1157d
Author: Jonathan Wakely 
Date:   Thu Aug 5 13:34:00 2021

   libstdc++: Move attributes that follow requires-clauses [PR101782]
   
   As explained in the PR, the grammar in the Concepts TS means that a [
   token following a requires-clause is parsed as part of the
   logical-or-expression rather than the start of an attribute. That makes
   the following ill-formed when using -fconcepts-ts:
   
 template<typename T> requires foo<T> [[nodiscard]] int f(T);
   
   This change moves all attributes that follow a requires-clause to the
   end of the function declarator.



Except that as Jakub pointed out, putting it there doesn't work.

It needs to be:

   template<typename T> requires foo<T> int f [[nodiscard]] (T);

At least the testsuite isn't failing now, but the attributes I moved
have no effect. I'll fix it ... some time.





Re: [committed 2/2] libstdc++: Add [[nodiscard]] to sequence containers

2021-08-05 Thread Jonathan Wakely via Gcc-patches

On 04/08/21 13:00 +0100, Jonathan Wakely wrote:

On 04/08/21 12:56 +0100, Jonathan Wakely wrote:

... and container adaptors.

This adds the [[nodiscard]] attribute to functions with no side-effects
for the sequence containers and their iterators, and the debug versions
of those containers, and the container adaptors,


I don't plan to add any more [[nodiscard]] attributes for now, but
these two commits should demonstrate how to do it for anybody who
wants to contribute similar patches.


OK, one more change in this vein, adding [[nodiscard]] to <compare>.

Tested powerpc64le-linux, committed to trunk.


commit 8dec72aeb54e98643c0fb3d53768cdb96cf1342a
Author: Jonathan Wakely 
Date:   Thu Aug 5 14:01:31 2021

libstdc++: Add [[nodiscard]] to <compare>

This adds the [[nodiscard]] attribute to all conversion operators,
comparison operators, call operators and non-member functions in
<compare>. Nothing in this header except constructors has side effects.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* libsupc++/compare (partial_ordering, weak_ordering)
(strong_ordering, is_eq, is_neq, is_lt, is_lteq, is_gt, is_gteq)
(compare_three_way, strong_order, weak_order, partial_order)
(compare_strong_order_fallback, compare_weak_order_fallback)
(compare_partial_order_fallback, __detail::__synth3way): Add
nodiscard attribute.
* testsuite/18_support/comparisons/categories/zero_neg.cc: Add
-Wno-unused-result to options.

diff --git a/libstdc++-v3/libsupc++/compare b/libstdc++-v3/libsupc++/compare
index dd0ec5fa36d..faeff641437 100644
--- a/libstdc++-v3/libsupc++/compare
+++ b/libstdc++-v3/libsupc++/compare
@@ -86,49 +86,61 @@ namespace std
 static const partial_ordering unordered;
 
 // comparisons
+[[nodiscard]]
 friend constexpr bool
 operator==(partial_ordering __v, __cmp_cat::__unspec) noexcept
 { return __v._M_value == 0; }
 
+[[nodiscard]]
 friend constexpr bool
 operator==(partial_ordering, partial_ordering) noexcept = default;
 
+[[nodiscard]]
 friend constexpr bool
 operator< (partial_ordering __v, __cmp_cat::__unspec) noexcept
 { return __v._M_value == -1; }
 
+[[nodiscard]]
 friend constexpr bool
 operator> (partial_ordering __v, __cmp_cat::__unspec) noexcept
 { return __v._M_value == 1; }
 
+[[nodiscard]]
 friend constexpr bool
 operator<=(partial_ordering __v, __cmp_cat::__unspec) noexcept
 { return __v._M_value <= 0; }
 
+[[nodiscard]]
 friend constexpr bool
 operator>=(partial_ordering __v, __cmp_cat::__unspec) noexcept
 { return __cmp_cat::type(__v._M_value & 1) == __v._M_value; }
 
+[[nodiscard]]
 friend constexpr bool
 operator< (__cmp_cat::__unspec, partial_ordering __v) noexcept
 { return __v._M_value == 1; }
 
+[[nodiscard]]
 friend constexpr bool
 operator> (__cmp_cat::__unspec, partial_ordering __v) noexcept
 { return __v._M_value == -1; }
 
+[[nodiscard]]
 friend constexpr bool
 operator<=(__cmp_cat::__unspec, partial_ordering __v) noexcept
 { return __cmp_cat::type(__v._M_value & 1) == __v._M_value; }
 
+[[nodiscard]]
 friend constexpr bool
 operator>=(__cmp_cat::__unspec, partial_ordering __v) noexcept
 { return 0 >= __v._M_value; }
 
+[[nodiscard]]
 friend constexpr partial_ordering
 operator<=>(partial_ordering __v, __cmp_cat::__unspec) noexcept
 { return __v; }
 
+[[nodiscard]]
 friend constexpr partial_ordering
 operator<=>(__cmp_cat::__unspec, partial_ordering __v) noexcept
 {
@@ -168,53 +180,66 @@ namespace std
 static const weak_ordering equivalent;
 static const weak_ordering greater;
 
+[[nodiscard]]
 constexpr operator partial_ordering() const noexcept
 { return partial_ordering(__cmp_cat::_Ord(_M_value)); }
 
 // comparisons
+[[nodiscard]]
 friend constexpr bool
 operator==(weak_ordering __v, __cmp_cat::__unspec) noexcept
 { return __v._M_value == 0; }
 
+[[nodiscard]]
 friend constexpr bool
 operator==(weak_ordering, weak_ordering) noexcept = default;
 
+[[nodiscard]]
 friend constexpr bool
 operator< (weak_ordering __v, __cmp_cat::__unspec) noexcept
 { return __v._M_value < 0; }
 
+[[nodiscard]]
 friend constexpr bool
 operator> (weak_ordering __v, __cmp_cat::__unspec) noexcept
 { return __v._M_value > 0; }
 
+[[nodiscard]]
 friend constexpr bool
 operator<=(weak_ordering __v, __cmp_cat::__unspec) noexcept
 { return __v._M_value <= 0; }
 
+[[nodiscard]]
 friend constexpr bool
 operator>=(weak_ordering __v, __cmp_cat::__unspec) noexcept
 { return __v._M_value >= 0; }
 
+[[nodiscard]]
 friend constexpr bool
 operator< (__cmp_cat::__unspec, weak_ordering __v) noexcept
 { return 0 < __v._M_value; }
 
+[[nodiscard]]
 friend constexpr 

Re: [committed] libstdc++: Move attributes that follow requires-clauses [PR101782]

2021-08-05 Thread Ville Voutilainen via Gcc-patches
On Thu, 5 Aug 2021 at 17:21, Jonathan Wakely via Libstdc++
 wrote:
>
> On 04/08/21 12:55 +0100, Jonathan Wakely wrote:
> >This adds [[nodiscard]] throughout <iterator>, as proposed by P2377R0
> >(with some minor corrections).
> >
> >The attribute is added for all modes from C++11 up, using
> >[[__nodiscard__]] or _GLIBCXX_NODISCARD where C++17 [[nodiscard]] can't
> >be used directly.
>
> This change causes errors when -fconcepts-ts is used. Fixed like so.

But this makes the attribute appertain to the function type, not to
the function. It's also ill-formed:
"The attribute-token nodiscard may be applied to the declarator-id in
a function declaration or to the
declaration of a class or enumeration. " Your change makes nodiscard
be applied to the function declaration,
not to the declarator-id.


Re: [committed 2/2] libstdc++: Add [[nodiscard]] to sequence containers

2021-08-05 Thread Jonathan Wakely via Gcc-patches
On Thu, 5 Aug 2021 at 13:14, Ville Voutilainen
 wrote:
>
> On Thu, 5 Aug 2021 at 15:11, Christophe Lyon via Libstdc++
>  wrote:
> >
> > Hi Jonathan,
> >
> > On Wed, Aug 4, 2021 at 2:04 PM Jonathan Wakely via Gcc-patches <
> > gcc-patches@gcc.gnu.org> wrote:
> >
> > > On 04/08/21 12:56 +0100, Jonathan Wakely wrote:
> > > >... and container adaptors.
> > > >
> > > >This adds the [[nodiscard]] attribute to functions with no side-effects
> > > >for the sequence containers and their iterators, and the debug versions
> > > >of those containers, and the container adaptors,
> > >
> > > I don't plan to add any more [[nodiscard]] attributes for now, but
> > > these two commits should demonstrate how to do it for anybody who
> > > wants to contribute similar patches.
> > >
> > > I didn't add tests that verify we do actually warn on each of those
> > > functions, because there are hundreds of them, and I know they're
> > > working because I had to alter existing tests to not warn.
> > >
> > >
> > I've noticed a regression on aarch64/arm:
> > FAIL: g++.old-deja/g++.other/inline7.C  -std=gnu++17 (test for excess
> > errors)
> > Excess errors:
> > /gcc/testsuite/g++.old-deja/g++.other/inline7.C:11:11: warning: ignoring
> > return value of 'std::__cxx11::list<_Tp, _Alloc>::size_type
> > std::__cxx11::list<_Tp, _Alloc>::size() const [with _Tp = int*; _Alloc =
> > std::allocator; std::__cxx11::list<_Tp, _Alloc>::size_type = long
> > unsigned int]', declared with attribute 'nodiscard' [-Wunused-result]
> >
> > FAIL: g++.old-deja/g++.other/inline7.C  -std=gnu++2a (test for excess
> > errors)
> > Excess errors:
> > /gcc/testsuite/g++.old-deja/g++.other/inline7.C:11:11: warning: ignoring
> > return value of 'std::__cxx11::list<_Tp, _Alloc>::size_type
> > std::__cxx11::list<_Tp, _Alloc>::size() const [with _Tp = int*; _Alloc =
> > std::allocator; std::__cxx11::list<_Tp, _Alloc>::size_type = long
> > unsigned int]', declared with attribute 'nodiscard' [-Wunused-result]
> >
> > Not sure why you didn't see it?
>
> That can easily happen when running just the library tests, rather
> than all of them. :P

Right, I didn't run all the compiler tests.

Fixed with this patch, tested x86_64-linux, pushed to trunk.
commit 03d47da7e1e91adddbde261ffefd2760df59a564
Author: Jonathan Wakely 
Date:   Thu Aug 5 14:00:35 2021

testsuite: Fix warning introduced by nodiscard in libstdc++

Signed-off-by: Jonathan Wakely 

gcc/testsuite/ChangeLog:

* g++.old-deja/g++.other/inline7.C: Cast nodiscard call to void.

diff --git a/gcc/testsuite/g++.old-deja/g++.other/inline7.C 
b/gcc/testsuite/g++.old-deja/g++.other/inline7.C
index a3723cfba1e..62639c5 100644
--- a/gcc/testsuite/g++.old-deja/g++.other/inline7.C
+++ b/gcc/testsuite/g++.old-deja/g++.other/inline7.C
@@ -8,7 +8,7 @@ std::list li;
 
 void f ()
 {
-  li.size ();
+  (void) li.size ();
 }
 
 int main ()


[committed] libstdc++: Move attributes that follow requires-clauses [PR101782]

2021-08-05 Thread Jonathan Wakely via Gcc-patches

On 04/08/21 12:55 +0100, Jonathan Wakely wrote:

This adds [[nodiscard]] throughout <iterator>, as proposed by P2377R0
(with some minor corrections).

The attribute is added for all modes from C++11 up, using
[[__nodiscard__]] or _GLIBCXX_NODISCARD where C++17 [[nodiscard]] can't
be used directly.


This change causes errors when -fconcepts-ts is used. Fixed like so.

Tested powerpc64le-linux, committed to trunk.

commit 7b1de3eb9ed3f8dde54732d88520292c5ad1157d
Author: Jonathan Wakely 
Date:   Thu Aug 5 13:34:00 2021

libstdc++: Move attributes that follow requires-clauses [PR101782]

As explained in the PR, the grammar in the Concepts TS means that a [
token following a requires-clause is parsed as part of the
logical-or-expression rather than the start of an attribute. That makes
the following ill-formed when using -fconcepts-ts:

  template<typename T> requires foo<T> [[nodiscard]] int f(T);

This change moves all attributes that follow a requires-clause to the
end of the function declarator.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR libstdc++/101782
* include/bits/ranges_base.h (ranges::begin, ranges::end)
(ranges::rbegin, ranges::rend, ranges::size, ranges::ssize)
(ranges::empty, ranges::data): Move attribute to the end of
the declarator.
* include/bits/stl_iterator.h (__gnu_cxx::__normal_iterator)
(common_iterator): Likewise for non-member operator functions.
* include/std/ranges (views::all, views::filter)
(views::transform, views::take, views::take_while, views::drop)
(views::drop_while, views::join, views::lazy_split)
(views::split, views::counted, views::common, views::reverse)
(views::elements): Likewise.
* testsuite/std/ranges/access/101782.cc: New test.

diff --git a/libstdc++-v3/include/bits/ranges_base.h b/libstdc++-v3/include/bits/ranges_base.h
index 614b6edf9df..1dac9680b4f 100644
--- a/libstdc++-v3/include/bits/ranges_base.h
+++ b/libstdc++-v3/include/bits/ranges_base.h
@@ -110,9 +110,9 @@ namespace ranges
   template<__maybe_borrowed_range _Tp>
 	requires is_array_v> || __member_begin<_Tp>
 	  || __adl_begin<_Tp>
-	[[nodiscard]]
 	constexpr auto
 	operator()(_Tp&& __t) const noexcept(_S_noexcept<_Tp&>())
+	[[nodiscard]]
 	{
 	  if constexpr (is_array_v>)
 	{
@@ -162,9 +162,9 @@ namespace ranges
   template<__maybe_borrowed_range _Tp>
 	requires is_bounded_array_v>
 	  || __member_end<_Tp> || __adl_end<_Tp>
-	[[nodiscard]]
 	constexpr auto
 	operator()(_Tp&& __t) const noexcept(_S_noexcept<_Tp&>())
+	[[nodiscard]]
 	{
 	  if constexpr (is_bounded_array_v>)
 	{
@@ -267,10 +267,10 @@ namespace ranges
 public:
   template<__maybe_borrowed_range _Tp>
 	requires __member_rbegin<_Tp> || __adl_rbegin<_Tp> || __reversable<_Tp>
-	[[nodiscard]]
 	constexpr auto
 	operator()(_Tp&& __t) const
 	noexcept(_S_noexcept<_Tp&>())
+	[[nodiscard]]
 	{
 	  if constexpr (__member_rbegin<_Tp>)
 	return __t.rbegin();
@@ -326,10 +326,10 @@ namespace ranges
 public:
   template<__maybe_borrowed_range _Tp>
 	requires __member_rend<_Tp> || __adl_rend<_Tp> || __reversable<_Tp>
-	[[nodiscard]]
 	constexpr auto
 	operator()(_Tp&& __t) const
 	noexcept(_S_noexcept<_Tp&>())
+	[[nodiscard]]
 	{
 	  if constexpr (__member_rend<_Tp>)
 	return __t.rend();
@@ -416,9 +416,9 @@ namespace ranges
   template
 	requires is_bounded_array_v>
 	  || __member_size<_Tp> || __adl_size<_Tp> || __sentinel_size<_Tp>
-	[[nodiscard]]
 	constexpr auto
 	operator()(_Tp&& __t) const noexcept(_S_noexcept<_Tp&>())
+	[[nodiscard]]
 	{
 	  if constexpr (is_bounded_array_v>)
 	return extent_v>;
@@ -437,9 +437,9 @@ namespace ranges
   // 3403. Domain of ranges::ssize(E) doesn't match ranges::size(E)
   template
 	requires requires (_Tp& __t) { _Size{}(__t); }
-	[[nodiscard]]
 	constexpr auto
 	operator()(_Tp&& __t) const noexcept(noexcept(_Size{}(__t)))
+	[[nodiscard]]
 	{
 	  auto __size = _Size{}(__t);
 	  using __size_type = decltype(__size);
@@ -497,9 +497,9 @@ namespace ranges
   template
 	requires __member_empty<_Tp> || __size0_empty<_Tp>
 	  || __eq_iter_empty<_Tp>
-	[[nodiscard]]
 	constexpr bool
 	operator()(_Tp&& __t) const noexcept(_S_noexcept<_Tp&>())
+	[[nodiscard]]
 	{
 	  if constexpr (__member_empty<_Tp>)
 	return bool(__t.empty());
@@ -539,9 +539,9 @@ namespace ranges
 public:
   template<__maybe_borrowed_range _Tp>
 	requires __member_data<_Tp> || __begin_data<_Tp>
-	[[nodiscard]]
 	constexpr auto
 	operator()(_Tp&& __t) const noexcept(_S_noexcept<_Tp>())
+	[[nodiscard]]
 	{
 	  if constexpr (__member_data<_Tp>)
 	return __t.data();
diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h
index 3773d600b8f..053ae41e9c3 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ 

Re: [PATCH take 2] Fold (X<

2021-08-05 Thread Christophe Lyon via Gcc-patches
On Wed, Jul 28, 2021 at 2:45 PM Roger Sayle 
wrote:

>
> Hi Marc,
>
> Thanks for the feedback.  After some quality time in gdb, I now appreciate
> that
> match.pd behaves (subtly) differently between generic and gimple, and the
> trees actually being passed to tree_nonzero_bits were not quite what I had
> expected.  Sorry for my confusion, the revised patch below is now much
> shorter
> (and my follow-up patch that was originally to tree_nonzero_bits looks like
> it
> now needs to be to get_nonzero_bits!).
>
> This revised patch has been retested on x86_64-pc-linux-gnu with a
> "make bootstrap" and "make -k check" with no new failures.
>
> Ok for mainline?
>
> 2021-07-28  Roger Sayle  
> Marc Glisse 
>
> gcc/ChangeLog
> * match.pd (bit_ior, bit_xor): Canonicalize (X*C1)|(X*C2) and
> (X*C1)^(X*C2) as X*(C1+C2), and related variants, using
> tree_nonzero_bits to ensure that operands are bit-wise disjoint.
>
> gcc/testsuite/ChangeLog
> * gcc.dg/fold-ior-4.c: New test.
>
>
Hi,

This patch introduces a regression on aarch64:
FAIL: gcc.target/aarch64/pr100056.c scan-assembler-not \\t[us]bfiz\\tw[0-9]+, w[0-9]+, 11

Can you check?

Thanks,

Christophe



> Roger
> --
>
> -Original Message-
> From: Marc Glisse 
> Sent: 26 July 2021 16:45
> To: Roger Sayle 
> Cc: 'GCC Patches' 
> Subject: Re: [PATCH] Fold (X< possible.
>
> On Mon, 26 Jul 2021, Roger Sayle wrote:
>
> > The one aspect that's a little odd is that each transform is paired
> > with a convert@1 variant, using the efficient match machinery to
> > expose any zero extension to fold-const.c's tree_nonzero_bits
> functionality.
>
> Copying the first transform for context
>
> +(for op (bit_ior bit_xor)
> + (simplify
> +  (op (mult:s@0 @1 INTEGER_CST@2)
> +  (mult:s@3 @1 INTEGER_CST@4))
> +  (if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type)
> +   && (tree_nonzero_bits (@0) & tree_nonzero_bits (@3)) == 0)
> +   (mult @1
> +{ wide_int_to_tree (type, wi::to_wide (@2) + wi::to_wide (@4));
> })))
> +(simplify
> +  (op (mult:s@0 (convert@1 @2) INTEGER_CST@3)
> +  (mult:s@4 (convert@1 @2) INTEGER_CST@5))
> +  (if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_WRAPS (type)
> +   && (tree_nonzero_bits (@0) & tree_nonzero_bits (@4)) == 0)
> +   (mult @1
> +{ wide_int_to_tree (type, wi::to_wide (@3) + wi::to_wide (@5));
> })))
>
> Could you explain how the convert helps exactly?
>
> --
> Marc Glisse
>


Re: [PATCH 01/34] rs6000: Incorporate new builtins code into the build machinery

2021-08-05 Thread Bill Schmidt via Gcc-patches

Hi Segher,

On 8/4/21 5:29 PM, Segher Boessenkool wrote:

On Thu, Jul 29, 2021 at 08:30:48AM -0500, Bill Schmidt wrote:
+rs6000-gen-builtins: rs6000-gen-builtins.o rbtree.o
+   $(LINKER_FOR_BUILD) $(BUILD_LINKERFLAGS) $(BUILD_LDFLAGS) -o $@ \
+   $(filter-out $(BUILD_LIBDEPS), $^) $(BUILD_LIBS)

I wonder what the difference is between BUILD_LINKERFLAGS and
BUILD_LDFLAGS?  Do you have any idea?

I couldn't find evidence that BUILD_LINKERFLAGS ever has anything that 
BUILD_LDFLAGS doesn't, but I put that down to my ignorance of the 
cobwebbed corners of the build system.  There is probably some configure 
magic that can set it, and I suspect it has something to do with cross 
builds; but it might also just be a leftover artifact.  I decided I 
should use the same build rule as the other gen- programs to make sure 
cross builds work as expected. Certainly open to better ideas if you 
have them!


Thanks,
Bill



Re: [PATCH 1/2] Add emulated gather capability to the vectorizer

2021-08-05 Thread Christophe Lyon via Gcc-patches
On Thu, Aug 5, 2021 at 11:53 AM Richard Biener  wrote:

> On Thu, 5 Aug 2021, Christophe Lyon wrote:
>
> > On Wed, Aug 4, 2021 at 2:08 PM Richard Biener  wrote:
> >
> > > On Wed, 4 Aug 2021, Richard Sandiford wrote:
> > >
> > > > Richard Biener  writes:
> > > > > This adds a gather vectorization capability to the vectorizer
> > > > > without target support by decomposing the offset vector, doing
> > > > > scalar loads and then building a vector from the result.  This
> > > > > is aimed mainly at cases where vectorizing the rest of the loop
> > > > > offsets the cost of vectorizing the gather.
> > > > >
> > > > > Note it's difficult to avoid vectorizing the offset load, but in
> > > > > some cases later passes can turn the vector load + extract into
> > > > > scalar loads, see the followup patch.
> > > > >
> > > > > On SPEC CPU 2017 510.parest_r this improves runtime from 250s
> > > > > to 219s on a Zen2 CPU which has its native gather instructions
> > > > > disabled (using those the runtime instead increases to 254s)
> > > > > using -Ofast -march=znver2 [-flto].  It turns out the critical
> > > > > loops in this benchmark all perform gather operations.
> > > > >
> > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > > >
> > > > > 2021-07-30  Richard Biener  
> > > > >
> > > > > * tree-vect-data-refs.c (vect_check_gather_scatter):
> > > > > Include widening conversions only when the result is
> > > > > still handled by native gather or the current offset
> > > > > size not already matches the data size.
> > > > > Also succeed analysis in case there's no native support,
> > > > > noted by an IFN_LAST ifn and a NULL decl.
> > > > > (vect_analyze_data_refs): Always consider gathers.
> > > > > * tree-vect-patterns.c (vect_recog_gather_scatter_pattern):
> > > > > Test for no IFN gather rather than decl gather.
> > > > > * tree-vect-stmts.c (vect_model_load_cost): Pass in the
> > > > > gather-scatter info and cost emulated gathers accordingly.
> > > > > (vect_truncate_gather_scatter_offset): Properly test for
> > > > > no IFN gather.
> > > > > (vect_use_strided_gather_scatters_p): Likewise.
> > > > > (get_load_store_type): Handle emulated gathers and its
> > > > > restrictions.
> > > > > (vectorizable_load): Likewise.  Emulate them by extracting
> > > > > scalar offsets, doing scalar loads and a vector construct.
> > > > >
> > > > > * gcc.target/i386/vect-gather-1.c: New testcase.
> > > > > * gfortran.dg/vect/vect-8.f90: Adjust.
> > >
> >
> > Hi,
> >
> > The adjusted testcase now fails on aarch64:
> > FAIL:  gfortran.dg/vect/vect-8.f90   -O   scan-tree-dump-times vect
> > "vectorized 23 loops" 1
>
> That likely means it needs adjustment for the aarch64 case as well
> which I didn't touch.  I suppose it's now vectorizing 24 loops?
> And 24 with SVE as well, so we might be able to merge the
> aarch64_sve and aarch64 && ! aarch64_sve cases?
>
> Like with
>
> diff --git a/gcc/testsuite/gfortran.dg/vect/vect-8.f90
> b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
> index cc1aebfbd84..c8a7d896bac 100644
> --- a/gcc/testsuite/gfortran.dg/vect/vect-8.f90
> +++ b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
> @@ -704,7 +704,6 @@ CALL track('KERNEL  ')
>  RETURN
>  END SUBROUTINE kernel
>
> -! { dg-final { scan-tree-dump-times "vectorized 24 loops" 1 "vect" {
> target aarch64_sve } } }
> -! { dg-final { scan-tree-dump-times "vectorized 23 loops" 1 "vect" {
> target { aarch64*-*-* && { ! aarch64_sve } } } } }
> +! { dg-final { scan-tree-dump-times "vectorized 24 loops" 1 "vect" {
> target aarch64*-*-* } } }
>  ! { dg-final { scan-tree-dump-times "vectorized 2\[234\] loops" 1 "vect"
> { target { vect_intdouble_cvt && { ! aarch64*-*-* } } } } }
>  ! { dg-final { scan-tree-dump-times "vectorized 17 loops" 1 "vect" {
> target { { ! vect_intdouble_cvt } && { ! aarch64*-*-* } } } } }
>
> f951 vect.exp testing with and without -march=armv8.3-a+sve shows
> this might work, but if you can double-check that would be nice.
>
>
Indeed LGTM, thanks


> Richard.
>


Re: [PATCH] vect: Move costing helpers from aarch64 code

2021-08-05 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Thu, Aug 5, 2021 at 2:04 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Tue, Aug 3, 2021 at 2:09 PM Richard Sandiford via Gcc-patches
>> >  wrote:
>> >>
>> >> When the vectoriser scalarises a strided store, it counts one
>> >> scalar_store for each element plus one vec_to_scalar extraction
>> >> for each element.  However, extracting element 0 is free on AArch64,
>> >> so it should have zero cost.
>> >>
>> >> I don't have a testcase that requires this for existing -mtune
>> >> options, but it becomes more important with a later patch.
>> >>
>> >> gcc/
>> >> * config/aarch64/aarch64.c (aarch64_is_store_elt_extraction): New
>> >> function, split out from...
>> >> (aarch64_detect_vector_stmt_subtype): ...here.
>> >> (aarch64_add_stmt_cost): Treat extracting element 0 as free.
>> >> ---
>> >>  gcc/config/aarch64/aarch64.c | 22 +++---
>> >>  1 file changed, 19 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> >> index 36f11808916..084f8caa0da 100644
>> >> --- a/gcc/config/aarch64/aarch64.c
>> >> +++ b/gcc/config/aarch64/aarch64.c
>> >> @@ -14622,6 +14622,18 @@ aarch64_builtin_vectorization_cost (enum 
>> >> vect_cost_for_stmt type_of_cost,
>> >>  }
>> >>  }
>> >>
>> >> +/* Return true if an operation of kind KIND for STMT_INFO represents
>> >> +   the extraction of an element from a vector in preparation for
>> >> +   storing the element to memory.  */
>> >> +static bool
>> >> +aarch64_is_store_elt_extraction (vect_cost_for_stmt kind,
>> >> +stmt_vec_info stmt_info)
>> >> +{
>> >> +  return (kind == vec_to_scalar
>> >> + && STMT_VINFO_DATA_REF (stmt_info)
>> >> + && DR_IS_WRITE (STMT_VINFO_DATA_REF (stmt_info)));
>> >> +}
>> >
>> > It would be nice to put functions like this in tree-vectorizer.h in some
>> > section marked with a comment to contain helpers for the target
>> > add_stmt_cost.
>>
>> Yeah, I guess that would avoid pointless cut-&-paste between targets.
>> How does this look?  Tested on aarch64-linux-gnu and x86_64-linux-gnu.
>
> Looks good besides ...
>
>> Thanks,
>> Richard
>>
>>
>> gcc/
>> * tree-vectorizer.h (vect_is_store_elt_extraction, vect_is_reduction)
>> (vect_reduc_type, vect_embedded_comparison_type, 
>> vect_comparison_type)
>> (vect_is_extending_load, vect_is_integer_truncation): New functions,
>> moved from aarch64.c but given different names.
>> * config/aarch64/aarch64.c (aarch64_is_store_elt_extraction)
>> (aarch64_is_reduction, aarch64_reduc_type)
>> (aarch64_embedded_comparison_type, aarch64_comparison_type)
>> (aarch64_extending_load_p, aarch64_integer_truncation_p): Delete
>> in favor of the above.  Update callers accordingly.
>> ---
>>  gcc/config/aarch64/aarch64.c | 125 ---
>>  gcc/tree-vectorizer.h| 104 +
>>  2 files changed, 118 insertions(+), 111 deletions(-)
>>
>> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
>> index deb22477e28..fd8681747ca 100644
>> --- a/gcc/tree-vectorizer.h
>> +++ b/gcc/tree-vectorizer.h
>> @@ -2192,4 +2192,108 @@ extern vect_pattern_decl_t slp_patterns[];
>>  /* Number of supported pattern matchers.  */
>>  extern size_t num__slp_patterns;
>>
>> +/* --
>> +   Target support routines
>> +   ---
>> +   The following routines are provided to simplify costing decisions in
>> +   target code.  Please add more as needed.  */
>> +
>> +/* Return true if an operation of kind KIND for STMT_INFO represents
>> +   the extraction of an element from a vector in preparation for
>> +   storing the element to memory.  */
>> +inline bool
>> +vect_is_store_elt_extraction (vect_cost_for_stmt kind, stmt_vec_info 
>> stmt_info)
>> +{
>> +  return (kind == vec_to_scalar
>> + && STMT_VINFO_DATA_REF (stmt_info)
>> + && DR_IS_WRITE (STMT_VINFO_DATA_REF (stmt_info)));
>> +}
>> +
>> +/* Return true if STMT_INFO represents part of a reduction.  */
>> +inline bool
>> +vect_is_reduction (stmt_vec_info stmt_info)
>> +{
>> +  return (STMT_VINFO_REDUC_DEF (stmt_info)
>> + || VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info)));
>> +}
>> +
>> +/* If STMT_INFO describes a reduction, return the type of reduction
>> +   it describes, otherwise return -1.  */
>> +inline int
>
> it's not clear what 'type of reduction' is - why not return enum
> vect_reduction_type?
> Because of the -1?  Maybe we can simply add a NOT_REDUCTION member
> to the enum?

Yeah, because of the -1.  I don't like the idea of adding a fake value
since it complicates switch statements that handle all non-fake values.

> Or simply adjust the comment as "return the vect_reduction_type
> of 

Re: [PATCH v3] gcov: Add __gcov_info_to_gcda()

2021-08-05 Thread Martin Liška

On 7/23/21 11:39 AM, Sebastian Huber wrote:

Add __gcov_info_to_gcda() to libgcov to get the gcda data for a gcda info in a
freestanding environment.  It is intended to be used with the
-fprofile-info-section option.  A crude test program which doesn't use a linker
script is (use "gcc -coverage -fprofile-info-section -lgcc test.c" to compile
it):


The patch can be installed once the following nits are fixed:



   #include <gcov.h>
   #include <stdio.h>
   #include <stdlib.h>

   extern const struct gcov_info *my_info;

   static void
   filename (const char *f, void *arg)
   {
 printf("filename: %s\n", f);
   }

   static void
   dump (const void *d, unsigned n, void *arg)
   {
 const unsigned char *c = d;

 for (unsigned i = 0; i < n; ++i)
   printf ("%02x", c[i]);
   }

   static void *
   allocate (unsigned length, void *arg)
   {
 return malloc (length);
   }

   int main()
   {
 __asm__ volatile (".set my_info, .LPBX2");
 __gcov_info_to_gcda (my_info, filename, dump, allocate, NULL);
 return 0;
   }

With this patch, <stdint.h> is included in libgcov-driver.c even if
inhibit_libc is defined.  This header file should be also available for
freestanding environments.  If this is not the case, then we have to define
intptr_t somehow.

The patch removes one use of memset() which makes the <string.h> include
superfluous.

gcc/

* gcov-io.h (gcov_write): Declare.
* gcov-io.c (gcov_write): New.
(gcov_write_counter): Remove.
(gcov_write_tag_length): Likewise.
(gcov_write_summary): Replace gcov_write_tag_length() with calls to
gcov_write_unsigned().
* doc/invoke.texi (fprofile-info-section): Mention
__gcov_info_to_gcda().

gcc/testsuite/

* gcc.dg/gcov-info-to-gcda.c: New test.

libgcc/

* Makefile.in (LIBGCOV_DRIVER): Add _gcov_info_to_gcda.
* gcov.h (gcov_info): Declare.
(__gcov_info_to_gcda): Likewise.
* libgcov.h (gcov_write_counter): Remove.
(gcov_write_tag_length): Likewise.
* libgcov-driver.c (#include <stdint.h>): New.
(#include <string.h>): Remove.
(NEED_L_GCOV): Conditionally define.
(NEED_L_GCOV_INFO_TO_GCDA): Likewise.
(are_all_counters_zero): New.
(gcov_dump_handler): Likewise.
(gcov_allocate_handler): Likewise.
(dump_unsigned): Likewise.
(dump_counter): Likewise.
(write_topn_counters): Add dump_fn, allocate_fn, and arg parameters.
Use dump_unsigned() and dump_counter().
(write_one_data): Add dump_fn, allocate_fn, and arg parameters.  Use
dump_unsigned(), dump_counter(), and are_all_counters_zero().
(__gcov_info_to_gcda): New.
---
  gcc/doc/invoke.texi  |  80 +++--
  gcc/gcov-io.c|  36 ++---
  gcc/gcov-io.h|   1 +
  gcc/testsuite/gcc.dg/gcov-info-to-gcda.c |  60 +++
  libgcc/Makefile.in   |   2 +-
  libgcc/gcov.h|  19 +++
  libgcc/libgcov-driver.c  | 196 ++-
  libgcc/libgcov.h |   5 -
  8 files changed, 313 insertions(+), 86 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/gcov-info-to-gcda.c

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 32697e6117c0..5f31312b9485 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14798,17 +14798,17 @@ To optimize the program based on the collected 
profile information, use
  Register the profile information in the specified section instead of using a
  constructor/destructor.  The section name is @var{name} if it is specified,
  otherwise the section name defaults to @code{.gcov_info}.  A pointer to the
-profile information generated by @option{-fprofile-arcs} or
-@option{-ftest-coverage} is placed in the specified section for each
-translation unit.  This option disables the profile information registration
-through a constructor and it disables the profile information processing
-through a destructor.  This option is not intended to be used in hosted
-environments such as GNU/Linux.  It targets systems with limited resources
-which do not support constructors and destructors.  The linker could collect
-the input sections in a continuous memory block and define start and end
-symbols.  The runtime support could dump the profiling information registered
-in this linker set during program termination to a serial line for example.  A
-GNU linker script example which defines a linker output section follows:
+profile information generated by @option{-fprofile-arcs} is placed in the
+specified section for each translation unit.  This option disables the profile
+information registration through a constructor and it disables the profile
+information processing through a destructor.  This option is not intended to be
+used in hosted environments such as GNU/Linux.  It targets free-standing
+environments (for example embedded systems) with limited resources 

[PATCH] libgcc: Honor LDFLAGS_FOR_TARGET when linking libgcc_s

2021-08-05 Thread Jakub Jelinek via Gcc-patches
Hi!

When building gcc with some specific LDFLAGS_FOR_TARGET, e.g.
LDFLAGS_FOR_TARGET=-Wl,-z,relro,-z,now
those flags propagate info linking of target shared libraries,
e.g. 
lib{ubsan,tsan,stdc++,quadmath,objc,lsan,itm,gphobos,gdruntime,gomp,go,gfortran,atomic,asan}.so.*
but there is one important exception, libgcc_s.so.* linking ignores it.

The following patch fixes that.

Bootstrapped/regtested on x86_64-linux with 
LDFLAGS_FOR_TARGET=-Wl,-z,relro,-z,now
and verified that libgcc_s.so.* is BIND_NOW when it previously wasn't, and
without any LDFLAGS_FOR_TARGET on x86_64-linux and i686-linux.
There on x86_64-linux I've verified that the libgcc_s.so.1 linking command
line for -m64 is identical except for whitespace to one without the patch,
and for -m32 multilib $(LDFLAGS) actually do supply there an extra -m32
that also repeats later in the @multilib_flags@, which should be harmless.


Ok for trunk?

2021-08-04  Jakub Jelinek  

* config/t-slibgcc (SHLIB_LINK): Add $(LDFLAGS).
* config/t-slibgcc-darwin (SHLIB_LINK): Likewise.
* config/t-slibgcc-vms (SHLIB_LINK): Likewise.
* config/t-slibgcc-fuchsia (SHLIB_LDFLAGS): Remove $(LDFLAGS).

--- libgcc/config/t-slibgcc.jj  2021-01-04 10:25:53.778064598 +0100
+++ libgcc/config/t-slibgcc 2021-08-04 12:25:36.931692406 +0200
@@ -32,7 +32,7 @@ SHLIB_INSTALL_SOLINK = $(LN_S) $(SHLIB_S
$(DESTDIR)$(slibdir)$(SHLIB_SLIBDIR_QUAL)/$(SHLIB_SOLINK)
 
 SHLIB_LINK = $(CC) $(LIBGCC2_CFLAGS) -shared -nodefaultlibs \
-   $(SHLIB_LDFLAGS) \
+   $(SHLIB_LDFLAGS) $(LDFLAGS) \
-o $(SHLIB_DIR)/$(SHLIB_SONAME).tmp @multilib_flags@ \
$(SHLIB_OBJS) $(SHLIB_LC) && \
rm -f $(SHLIB_DIR)/$(SHLIB_SOLINK) && \
--- libgcc/config/t-slibgcc-darwin.jj   2020-01-12 11:54:38.690379055 +0100
+++ libgcc/config/t-slibgcc-darwin  2021-08-04 12:26:56.484588816 +0200
@@ -15,7 +15,7 @@ SHLIB_LC = -lc
 # Note that this version is used for the loader, not the linker; the linker
 # uses the stub versions named by the versioned members of $(INSTALL_FILES).
 
-SHLIB_LINK = $(CC) $(LIBGCC2_CFLAGS) -dynamiclib -nodefaultlibs \
+SHLIB_LINK = $(CC) $(LIBGCC2_CFLAGS) $(LDFLAGS) -dynamiclib -nodefaultlibs \
-install_name @shlib_slibdir@/$(SHLIB_INSTALL_NAME) \
-single_module -o $(SHLIB_DIR)/$(SHLIB_SONAME) \
-Wl,-exported_symbols_list,$(SHLIB_MAP) \
--- libgcc/config/t-slibgcc-vms.jj  2020-01-12 11:54:38.691379040 +0100
+++ libgcc/config/t-slibgcc-vms 2021-08-04 12:27:23.644212047 +0200
@@ -22,7 +22,7 @@ SHLIB_LINK = \
   objdump --syms $(SHLIB_OBJS) | \
   $(SHLIB_SYMVEC) >> SYMVEC_.opt ; \
   echo "case_sensitive=NO" >> SYMVEC_.opt; \
-  $(CC) $(LIBGCC2_CFLAGS) -nodefaultlibs \
+  $(CC) $(LIBGCC2_CFLAGS) $(LDFLAGS) -nodefaultlibs \
   -shared --for-linker=/noinform -o $(SHLIB_NAME) $(SHLIB_OBJS) \
   --for-linker=SYMVEC_.opt \
   --for-linker=gsmatch=equal,$(shlib_version)
--- libgcc/config/t-slibgcc-fuchsia.jj  2021-08-04 11:40:49.313107664 +0200
+++ libgcc/config/t-slibgcc-fuchsia 2021-08-04 12:26:05.078301945 +0200
@@ -18,5 +18,4 @@
 
 # Fuchsia-specific shared library overrides.
 
-SHLIB_LDFLAGS = -Wl,--soname=$(SHLIB_SONAME) \
-$(LDFLAGS)
+SHLIB_LDFLAGS = -Wl,--soname=$(SHLIB_SONAME)

Jakub



Re: [ARM] PR98435: Missed optimization in expanding vector constructor

2021-08-05 Thread Christophe Lyon via Gcc-patches
On Thu, Aug 5, 2021 at 2:28 PM Prathamesh Kulkarni <
prathamesh.kulka...@linaro.org> wrote:

> On Tue, 3 Aug 2021 at 20:52, Christophe Lyon
>  wrote:
> >
> >
> >
> > On Tue, Aug 3, 2021 at 12:57 PM Prathamesh Kulkarni <
> prathamesh.kulka...@linaro.org> wrote:
> >>
> >> On Tue, 3 Aug 2021 at 14:59, Christophe Lyon
> >>  wrote:
> >> >
> >> >
> >> >
> >> > On Tue, Jul 6, 2021 at 11:26 AM Prathamesh Kulkarni via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
> >> >>
> >> >> On Tue, 6 Jul 2021 at 13:33, Kyrylo Tkachov 
> wrote:
> >> >> >
> >> >> >
> >> >> >
> >> >> > > -Original Message-
> >> >> > > From: Prathamesh Kulkarni 
> >> >> > > Sent: 06 July 2021 08:06
> >> >> > > To: Christophe LYON 
> >> >> > > Cc: Kyrylo Tkachov ; gcc Patches  >> >> > > patc...@gcc.gnu.org>
> >> >> > > Subject: Re: [ARM] PR98435: Missed optimization in expanding
> vector
> >> >> > > constructor
> >> >> > >
> >> >> > > On Thu, 1 Jul 2021 at 16:26, Prathamesh Kulkarni
> >> >> > >  wrote:
> >> >> > > >
> >> >> > > > On Wed, 30 Jun 2021 at 20:51, Christophe LYON
> >> >> > > >  wrote:
> >> >> > > > >
> >> >> > > > >
> >> >> > > > > On 29/06/2021 12:46, Prathamesh Kulkarni wrote:
> >> >> > > > > > On Mon, 28 Jun 2021 at 14:48, Christophe LYON
> >> >> > > > > >  wrote:
> >> >> > > > > >>
> >> >> > > > > >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote:
> >> >> > > > >  -Original Message-
> >> >> > > > >  From: Prathamesh Kulkarni <
> prathamesh.kulka...@linaro.org>
> >> >> > > > >  Sent: 28 June 2021 09:38
> >> >> > > > >  To: Kyrylo Tkachov 
> >> >> > > > >  Cc: Christophe Lyon ; gcc
> Patches
> >> >> > >  >> >> > > > >  patc...@gcc.gnu.org>
> >> >> > > > >  Subject: Re: [ARM] PR98435: Missed optimization in
> expanding
> >> >> > > vector
> >> >> > > > >  constructor
> >> >> > > > > 
> >> >> > > > >  On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov
> >> >> > > 
> >> >> > > > >  wrote:
> >> >> > > > > >
> >> >> > > > > >> -Original Message-
> >> >> > > > > >> From: Prathamesh Kulkarni <
> prathamesh.kulka...@linaro.org>
> >> >> > > > > >> Sent: 14 June 2021 09:02
> >> >> > > > > >> To: Christophe Lyon 
> >> >> > > > > >> Cc: gcc Patches ; Kyrylo
> Tkachov
> >> >> > > > > >> 
> >> >> > > > > >> Subject: Re: [ARM] PR98435: Missed optimization in
> expanding
> >> >> > > vector
> >> >> > > > > >> constructor
> >> >> > > > > >>
> >> >> > > > > >> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
> >> >> > > > > >>  wrote:
> >> >> > > > > >>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon
> >> >> > > > >  
> >> >> > > > > >> wrote:
> >> >> > > > >  On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni
> via Gcc-
> >> >> > > patches
> >> >> > > > >   wrote:
> >> >> > > > > > Hi,
> >> >> > > > > > As mentioned in PR, for the following test-case:
> >> >> > > > > >
> >> >> > > > > > #include <arm_neon.h>
> >> >> > > > > >
> >> >> > > > > > bfloat16x4_t f1 (bfloat16_t a)
> >> >> > > > > > {
> >> >> > > > > > return vdup_n_bf16 (a);
> >> >> > > > > > }
> >> >> > > > > >
> >> >> > > > > > bfloat16x4_t f2 (bfloat16_t a)
> >> >> > > > > > {
> >> >> > > > > > return (bfloat16x4_t) {a, a, a, a};
> >> >> > > > > > }
> >> >> > > > > >
> >> >> > > > > > Compiling with arm-linux-gnueabi -O3 -mfpu=neon
> -mfloat-
> >> >> > > > >  abi=softfp
> >> >> > > > > > -march=armv8.2-a+bf16+fp16 results in f2 not being
> >> >> > > vectorized:
> >> >> > > > > >
> >> >> > > > > > f1:
> >> >> > > > > >   vdup.16 d16, r0
> >> >> > > > > >   vmovr0, r1, d16  @ v4bf
> >> >> > > > > >   bx  lr
> >> >> > > > > >
> >> >> > > > > > f2:
> >> >> > > > > >   mov r3, r0  @ __bf16
> >> >> > > > > >   adr r1, .L4
> >> >> > > > > >   ldrdr0, [r1]
> >> >> > > > > >   mov r2, r3  @ __bf16
> >> >> > > > > >   mov ip, r3  @ __bf16
> >> >> > > > > >   bfi r1, r2, #0, #16
> >> >> > > > > >   bfi r0, ip, #0, #16
> >> >> > > > > >   bfi r1, r3, #16, #16
> >> >> > > > > >   bfi r0, r2, #16, #16
> >> >> > > > > >   bx  lr
> >> >> > > > > >
> >> >> > > > > > This seems to happen because vec_init pattern in
> neon.md
> >> >> > > has VDQ
> >> >> > > > > >> mode
> >> >> > > > > > iterator, which doesn't include V4BF. In attached
> patch, I
> >> >> > > changed
> >> >> > > > > > mode
> >> >> > > > > > to VDQX which seems to work for the test-case, and
> the
> >> >> > > compiler
> >> >> > > > >  now
> >> >> > > > > >> generates:
> >> >> > > > > > f2:
> >> >> > > > > >   

Re: [ARM] PR98435: Missed optimization in expanding vector constructor

2021-08-05 Thread Prathamesh Kulkarni via Gcc-patches
On Tue, 3 Aug 2021 at 20:52, Christophe Lyon
 wrote:
>
>
>
> On Tue, Aug 3, 2021 at 12:57 PM Prathamesh Kulkarni 
>  wrote:
>>
>> On Tue, 3 Aug 2021 at 14:59, Christophe Lyon
>>  wrote:
>> >
>> >
>> >
>> > On Tue, Jul 6, 2021 at 11:26 AM Prathamesh Kulkarni via Gcc-patches 
>> >  wrote:
>> >>
>> >> On Tue, 6 Jul 2021 at 13:33, Kyrylo Tkachov  
>> >> wrote:
>> >> >
>> >> >
>> >> >
>> >> > > -Original Message-
>> >> > > From: Prathamesh Kulkarni 
>> >> > > Sent: 06 July 2021 08:06
>> >> > > To: Christophe LYON 
>> >> > > Cc: Kyrylo Tkachov ; gcc Patches > >> > > patc...@gcc.gnu.org>
>> >> > > Subject: Re: [ARM] PR98435: Missed optimization in expanding vector
>> >> > > constructor
>> >> > >
>> >> > > On Thu, 1 Jul 2021 at 16:26, Prathamesh Kulkarni
>> >> > >  wrote:
>> >> > > >
>> >> > > > On Wed, 30 Jun 2021 at 20:51, Christophe LYON
>> >> > > >  wrote:
>> >> > > > >
>> >> > > > >
>> >> > > > > On 29/06/2021 12:46, Prathamesh Kulkarni wrote:
>> >> > > > > > On Mon, 28 Jun 2021 at 14:48, Christophe LYON
>> >> > > > > >  wrote:
>> >> > > > > >>
>> >> > > > > >> On 28/06/2021 10:40, Kyrylo Tkachov via Gcc-patches wrote:
>> >> > > > >  -Original Message-
>> >> > > > >  From: Prathamesh Kulkarni 
>> >> > > > >  Sent: 28 June 2021 09:38
>> >> > > > >  To: Kyrylo Tkachov 
>> >> > > > >  Cc: Christophe Lyon ; gcc Patches
>> >> > > > >> > > > >  patc...@gcc.gnu.org>
>> >> > > > >  Subject: Re: [ARM] PR98435: Missed optimization in expanding
>> >> > > vector
>> >> > > > >  constructor
>> >> > > > > 
>> >> > > > >  On Thu, 24 Jun 2021 at 22:01, Kyrylo Tkachov
>> >> > > 
>> >> > > > >  wrote:
>> >> > > > > >
>> >> > > > > >> -Original Message-
>> >> > > > > >> From: Prathamesh Kulkarni 
>> >> > > > > >> Sent: 14 June 2021 09:02
>> >> > > > > >> To: Christophe Lyon 
>> >> > > > > >> Cc: gcc Patches ; Kyrylo Tkachov
>> >> > > > > >> 
>> >> > > > > >> Subject: Re: [ARM] PR98435: Missed optimization in 
>> >> > > > > >> expanding
>> >> > > vector
>> >> > > > > >> constructor
>> >> > > > > >>
>> >> > > > > >> On Wed, 9 Jun 2021 at 15:58, Prathamesh Kulkarni
>> >> > > > > >>  wrote:
>> >> > > > > >>> On Fri, 4 Jun 2021 at 13:15, Christophe Lyon
>> >> > > > >  
>> >> > > > > >> wrote:
>> >> > > > >  On Fri, 4 Jun 2021 at 09:27, Prathamesh Kulkarni via Gcc-
>> >> > > patches
>> >> > > > >   wrote:
>> >> > > > > > Hi,
>> >> > > > > > As mentioned in PR, for the following test-case:
>> >> > > > > >
>> >> > > > > > #include <arm_neon.h>
>> >> > > > > >
>> >> > > > > > bfloat16x4_t f1 (bfloat16_t a)
>> >> > > > > > {
>> >> > > > > > return vdup_n_bf16 (a);
>> >> > > > > > }
>> >> > > > > >
>> >> > > > > > bfloat16x4_t f2 (bfloat16_t a)
>> >> > > > > > {
>> >> > > > > > return (bfloat16x4_t) {a, a, a, a};
>> >> > > > > > }
>> >> > > > > >
>> >> > > > > > Compiling with arm-linux-gnueabi -O3 -mfpu=neon -mfloat-
>> >> > > > >  abi=softfp
>> >> > > > > > -march=armv8.2-a+bf16+fp16 results in f2 not being
>> >> > > vectorized:
>> >> > > > > >
>> >> > > > > > f1:
>> >> > > > > >   vdup.16 d16, r0
>> >> > > > > >   vmovr0, r1, d16  @ v4bf
>> >> > > > > >   bx  lr
>> >> > > > > >
>> >> > > > > > f2:
>> >> > > > > >   mov r3, r0  @ __bf16
>> >> > > > > >   adr r1, .L4
>> >> > > > > >   ldrdr0, [r1]
>> >> > > > > >   mov r2, r3  @ __bf16
>> >> > > > > >   mov ip, r3  @ __bf16
>> >> > > > > >   bfi r1, r2, #0, #16
>> >> > > > > >   bfi r0, ip, #0, #16
>> >> > > > > >   bfi r1, r3, #16, #16
>> >> > > > > >   bfi r0, r2, #16, #16
>> >> > > > > >   bx  lr
>> >> > > > > >
>> >> > > > > > This seems to happen because vec_init pattern in neon.md
>> >> > > has VDQ
>> >> > > > > >> mode
>> >> > > > > > iterator, which doesn't include V4BF. In attached 
>> >> > > > > > patch, I
>> >> > > changed
>> >> > > > > > mode
>> >> > > > > > to VDQX which seems to work for the test-case, and the
>> >> > > compiler
>> >> > > > >  now
>> >> > > > > >> generates:
>> >> > > > > > f2:
>> >> > > > > >   vdup.16 d16, r0
>> >> > > > > >   vmovr0, r1, d16  @ v4bf
>> >> > > > > >   bx  lr
>> >> > > > > >
>> >> > > > > > However, the pattern is also gated on TARGET_HAVE_MVE
>> >> > > and I am
>> >> > > > > >> not
>> >> > > > > > sure if either VDQ or VDQX are correct modes for MVE 
>> >> > > > > > since
>> >> > > MVE
>> >> > > > >  has
>> >> > > > 

Re: [committed 2/2] libstdc++: Add [[nodiscard]] to sequence containers

2021-08-05 Thread Ville Voutilainen via Gcc-patches
On Thu, 5 Aug 2021 at 15:11, Christophe Lyon via Libstdc++
 wrote:
>
> Hi Jonathan,
>
> On Wed, Aug 4, 2021 at 2:04 PM Jonathan Wakely via Gcc-patches <
> gcc-patches@gcc.gnu.org> wrote:
>
> > On 04/08/21 12:56 +0100, Jonathan Wakely wrote:
> > >... and container adaptors.
> > >
> > >This adds the [[nodiscard]] attribute to functions with no side-effects
> > >for the sequence containers and their iterators, and the debug versions
> > >of those containers, and the container adaptors,
> >
> > I don't plan to add any more [[nodiscard]] attributes for now, but
> > these two commits should demonstrate how to do it for anybody who
> > wants to contribute similar patches.
> >
> > I didn't add tests that verify we do actually warn on each of those
> > functions, because there are hundreds of them, and I know they're
> > working because I had to alter existing tests to not warn.
> >
> >
> I've noticed a regression on aarch64/arm:
> FAIL: g++.old-deja/g++.other/inline7.C  -std=gnu++17 (test for excess
> errors)
> Excess errors:
> /gcc/testsuite/g++.old-deja/g++.other/inline7.C:11:11: warning: ignoring
> return value of 'std::__cxx11::list<_Tp, _Alloc>::size_type
> std::__cxx11::list<_Tp, _Alloc>::size() const [with _Tp = int*; _Alloc =
> std::allocator<int*>; std::__cxx11::list<_Tp, _Alloc>::size_type = long
> unsigned int]', declared with attribute 'nodiscard' [-Wunused-result]
>
> FAIL: g++.old-deja/g++.other/inline7.C  -std=gnu++2a (test for excess
> errors)
> Excess errors:
> /gcc/testsuite/g++.old-deja/g++.other/inline7.C:11:11: warning: ignoring
> return value of 'std::__cxx11::list<_Tp, _Alloc>::size_type
> std::__cxx11::list<_Tp, _Alloc>::size() const [with _Tp = int*; _Alloc =
> std::allocator<int*>; std::__cxx11::list<_Tp, _Alloc>::size_type = long
> unsigned int]', declared with attribute 'nodiscard' [-Wunused-result]
>
> Not sure why you didn't see it?

That can easily happen when running just the library tests, rather
than all of them. :P


Re: [PATCH] vect: Move costing helpers from aarch64 code

2021-08-05 Thread Richard Biener via Gcc-patches
On Thu, Aug 5, 2021 at 2:04 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Tue, Aug 3, 2021 at 2:09 PM Richard Sandiford via Gcc-patches
> >  wrote:
> >>
> >> When the vectoriser scalarises a strided store, it counts one
> >> scalar_store for each element plus one vec_to_scalar extraction
> >> for each element.  However, extracting element 0 is free on AArch64,
> >> so it should have zero cost.
> >>
> >> I don't have a testcase that requires this for existing -mtune
> >> options, but it becomes more important with a later patch.
> >>
> >> gcc/
> >> * config/aarch64/aarch64.c (aarch64_is_store_elt_extraction): New
> >> function, split out from...
> >> (aarch64_detect_vector_stmt_subtype): ...here.
> >> (aarch64_add_stmt_cost): Treat extracting element 0 as free.
> >> ---
> >>  gcc/config/aarch64/aarch64.c | 22 +++---
> >>  1 file changed, 19 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> >> index 36f11808916..084f8caa0da 100644
> >> --- a/gcc/config/aarch64/aarch64.c
> >> +++ b/gcc/config/aarch64/aarch64.c
> >> @@ -14622,6 +14622,18 @@ aarch64_builtin_vectorization_cost (enum 
> >> vect_cost_for_stmt type_of_cost,
> >>  }
> >>  }
> >>
> >> +/* Return true if an operation of kind KIND for STMT_INFO represents
> >> +   the extraction of an element from a vector in preparation for
> >> +   storing the element to memory.  */
> >> +static bool
> >> +aarch64_is_store_elt_extraction (vect_cost_for_stmt kind,
> >> +stmt_vec_info stmt_info)
> >> +{
> >> +  return (kind == vec_to_scalar
> >> + && STMT_VINFO_DATA_REF (stmt_info)
> >> + && DR_IS_WRITE (STMT_VINFO_DATA_REF (stmt_info)));
> >> +}
> >
> > It would be nice to put functions like this in tree-vectorizer.h in some
> > section marked with a comment to contain helpers for the target
> > add_stmt_cost.
>
> Yeah, I guess that would avoid pointless cut-&-paste between targets.
> How does this look?  Tested on aarch64-linux-gnu and x86_64-linux-gnu.

Looks good besides ...

> Thanks,
> Richard
>
>
> gcc/
> * tree-vectorizer.h (vect_is_store_elt_extraction, vect_is_reduction)
> (vect_reduc_type, vect_embedded_comparison_type, vect_comparison_type)
> (vect_is_extending_load, vect_is_integer_truncation): New functions,
> moved from aarch64.c but given different names.
> * config/aarch64/aarch64.c (aarch64_is_store_elt_extraction)
> (aarch64_is_reduction, aarch64_reduc_type)
> (aarch64_embedded_comparison_type, aarch64_comparison_type)
> (aarch64_extending_load_p, aarch64_integer_truncation_p): Delete
> in favor of the above.  Update callers accordingly.
> ---
>  gcc/config/aarch64/aarch64.c | 125 ---
>  gcc/tree-vectorizer.h| 104 +
>  2 files changed, 118 insertions(+), 111 deletions(-)
>
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index deb22477e28..fd8681747ca 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2192,4 +2192,108 @@ extern vect_pattern_decl_t slp_patterns[];
>  /* Number of supported pattern matchers.  */
>  extern size_t num__slp_patterns;
>
> +/* --
> +   Target support routines
> +   ---
> +   The following routines are provided to simplify costing decisions in
> +   target code.  Please add more as needed.  */
> +
> +/* Return true if an operation of kind KIND for STMT_INFO represents
> +   the extraction of an element from a vector in preparation for
> +   storing the element to memory.  */
> +inline bool
> +vect_is_store_elt_extraction (vect_cost_for_stmt kind, stmt_vec_info 
> stmt_info)
> +{
> +  return (kind == vec_to_scalar
> + && STMT_VINFO_DATA_REF (stmt_info)
> + && DR_IS_WRITE (STMT_VINFO_DATA_REF (stmt_info)));
> +}
> +
> +/* Return true if STMT_INFO represents part of a reduction.  */
> +inline bool
> +vect_is_reduction (stmt_vec_info stmt_info)
> +{
> +  return (STMT_VINFO_REDUC_DEF (stmt_info)
> + || VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info)));
> +}
> +
> +/* If STMT_INFO describes a reduction, return the type of reduction
> +   it describes, otherwise return -1.  */
> +inline int

it's not clear what 'type of reduction' is - why not return enum
vect_reduction_type?
Because of the -1?  Maybe we can simply add a NOT_REDUCTION member
to the enum?  Or simply adjust the comment as "return the vect_reduction_type
of the reduction it describes, otherwise return -1"?

> +vect_reduc_type (vec_info *vinfo, stmt_vec_info stmt_info)
> +{
> +  if (loop_vec_info loop_vinfo = dyn_cast (vinfo))
> +if (STMT_VINFO_REDUC_DEF (stmt_info))
> +  {
> +   stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, 

Re: [committed 2/2] libstdc++: Add [[nodiscard]] to sequence containers

2021-08-05 Thread Christophe Lyon via Gcc-patches
Hi Jonathan,

On Wed, Aug 4, 2021 at 2:04 PM Jonathan Wakely via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> On 04/08/21 12:56 +0100, Jonathan Wakely wrote:
> >... and container adaptors.
> >
> >This adds the [[nodiscard]] attribute to functions with no side-effects
> >for the sequence containers and their iterators, and the debug versions
> >of those containers, and the container adaptors,
>
> I don't plan to add any more [[nodiscard]] attributes for now, but
> these two commits should demonstrate how to do it for anybody who
> wants to contribute similar patches.
>
> I didn't add tests that verify we do actually warn on each of those
> functions, because there are hundreds of them, and I know they're
> working because I had to alter existing tests to not warn.
>
>
I've noticed a regression on aarch64/arm:
FAIL: g++.old-deja/g++.other/inline7.C  -std=gnu++17 (test for excess
errors)
Excess errors:
/gcc/testsuite/g++.old-deja/g++.other/inline7.C:11:11: warning: ignoring
return value of 'std::__cxx11::list<_Tp, _Alloc>::size_type
std::__cxx11::list<_Tp, _Alloc>::size() const [with _Tp = int*; _Alloc =
std::allocator; std::__cxx11::list<_Tp, _Alloc>::size_type = long
unsigned int]', declared with attribute 'nodiscard' [-Wunused-result]

FAIL: g++.old-deja/g++.other/inline7.C  -std=gnu++2a (test for excess
errors)
Excess errors:
/gcc/testsuite/g++.old-deja/g++.other/inline7.C:11:11: warning: ignoring
return value of 'std::__cxx11::list<_Tp, _Alloc>::size_type
std::__cxx11::list<_Tp, _Alloc>::size() const [with _Tp = int*; _Alloc =
std::allocator; std::__cxx11::list<_Tp, _Alloc>::size_type = long
unsigned int]', declared with attribute 'nodiscard' [-Wunused-result]

Not sure why you didn't see it?

Christophe


Re: [PATCH] rs6000: Fix restored rs6000_long_double_type_size.

2021-08-05 Thread Martin Liška

On 7/23/21 7:57 PM, Segher Boessenkool wrote:

Hi!

On Fri, Jul 23, 2021 at 07:47:54AM +0200, Martin Liška wrote:

On 7/12/21 7:20 PM, Segher Boessenkool wrote:

+static __attribute__ ((optimize ("-fno-stack-protector"))) __typeof
(f) *


-fno-stack-protector is default.


Yes, but one needs an optimize attribute in order to trigger
cl_target_option_save/restore
mechanism.


So it behaves differently if you select the default than if you do not
select anything?  That is wrong, no?


Sorry, I don't get your example, please explain it.


If -mbork is the default, the compiler would behave the same if you
invoke it with -mbork as when you do not.  And the optimize attribute
should work exactly the same as command line options.


Ah, got your point. All right, let's use then 'optimize(1)'.

Is it fine with the adjustment?
Cheers,
Martin



Or perhaps you are saying you have this in the testcase only to exercise
the option save/restore code paths?  Please document that then, in the
testcase.


Segher



From 517e32a75aa59b4b538eda30e78de0fa925bb0f9 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 1 Jun 2021 15:39:14 +0200
Subject: [PATCH] rs6000: Fix restored rs6000_long_double_type_size

As mentioned in the "Fallout: save/restore target options in handle_optimize_attribute"
thread, we need to support target option restore
of rs6000_long_double_type_size == FLOAT_PRECISION_TFmode.

gcc/ChangeLog:

	* config/rs6000/rs6000.c (rs6000_option_override_internal): When
	a target option is restored, it can have
	rs6000_long_double_type_size set to FLOAT_PRECISION_TFmode
	and error should not be emitted.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pragma-optimize.c: New test.
---
 gcc/config/rs6000/rs6000.c |  2 ++
 gcc/testsuite/gcc.target/powerpc/pragma-optimize.c | 13 +
 2 files changed, 15 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pragma-optimize.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 2de5a96e1b6..5b1c06b09fc 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4189,6 +4189,8 @@ rs6000_option_override_internal (bool global_init_p)
   else
 	rs6000_long_double_type_size = default_long_double_size;
 }
+  else if (rs6000_long_double_type_size == FLOAT_PRECISION_TFmode)
+; /* The option value can be seen when cl_target_option_restore is called.  */
   else if (rs6000_long_double_type_size == 128)
 rs6000_long_double_type_size = FLOAT_PRECISION_TFmode;
   else if (global_options_set.x_rs6000_ieeequad)
diff --git a/gcc/testsuite/gcc.target/powerpc/pragma-optimize.c b/gcc/testsuite/gcc.target/powerpc/pragma-optimize.c
new file mode 100644
index 000..e8ba63a0667
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pragma-optimize.c
@@ -0,0 +1,13 @@
+/* { dg-options "-O2 -mlong-double-128 -mabi=ibmlongdouble" } */
+
+extern unsigned long int x;
+extern float f (float);
+extern __typeof (f) f_power8;
+extern __typeof (f) f_power9;
+extern __typeof (f) f __attribute__ ((ifunc ("f_ifunc")));
+static __attribute__ ((optimize (1))) __typeof (f) *
+f_ifunc (void)
+{
+  __typeof (f) *res = x ? f_power9 : f_power8;
+  return res;
+}
-- 
2.32.0



[PATCH] vect: Move costing helpers from aarch64 code

2021-08-05 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Tue, Aug 3, 2021 at 2:09 PM Richard Sandiford via Gcc-patches
>  wrote:
>>
>> When the vectoriser scalarises a strided store, it counts one
>> scalar_store for each element plus one vec_to_scalar extraction
>> for each element.  However, extracting element 0 is free on AArch64,
>> so it should have zero cost.
>>
>> I don't have a testcase that requires this for existing -mtune
>> options, but it becomes more important with a later patch.
>>
>> gcc/
>> * config/aarch64/aarch64.c (aarch64_is_store_elt_extraction): New
>> function, split out from...
>> (aarch64_detect_vector_stmt_subtype): ...here.
>> (aarch64_add_stmt_cost): Treat extracting element 0 as free.
>> ---
>>  gcc/config/aarch64/aarch64.c | 22 +++---
>>  1 file changed, 19 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 36f11808916..084f8caa0da 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -14622,6 +14622,18 @@ aarch64_builtin_vectorization_cost (enum 
>> vect_cost_for_stmt type_of_cost,
>>  }
>>  }
>>
>> +/* Return true if an operation of kind KIND for STMT_INFO represents
>> +   the extraction of an element from a vector in preparation for
>> +   storing the element to memory.  */
>> +static bool
>> +aarch64_is_store_elt_extraction (vect_cost_for_stmt kind,
>> +stmt_vec_info stmt_info)
>> +{
>> +  return (kind == vec_to_scalar
>> + && STMT_VINFO_DATA_REF (stmt_info)
>> + && DR_IS_WRITE (STMT_VINFO_DATA_REF (stmt_info)));
>> +}
>
> It would be nice to put functions like this in tree-vectorizer.h in some
> section marked with a comment to contain helpers for the target
> add_stmt_cost.

Yeah, I guess that would avoid pointless cut-&-paste between targets.
How does this look?  Tested on aarch64-linux-gnu and x86_64-linux-gnu.

Thanks,
Richard


gcc/
* tree-vectorizer.h (vect_is_store_elt_extraction, vect_is_reduction)
(vect_reduc_type, vect_embedded_comparison_type, vect_comparison_type)
(vect_is_extending_load, vect_is_integer_truncation): New functions,
moved from aarch64.c but given different names.
* config/aarch64/aarch64.c (aarch64_is_store_elt_extraction)
(aarch64_is_reduction, aarch64_reduc_type)
(aarch64_embedded_comparison_type, aarch64_comparison_type)
(aarch64_extending_load_p, aarch64_integer_truncation_p): Delete
in favor of the above.  Update callers accordingly.
---
 gcc/config/aarch64/aarch64.c | 125 ---
 gcc/tree-vectorizer.h| 104 +
 2 files changed, 118 insertions(+), 111 deletions(-)

diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index deb22477e28..fd8681747ca 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2192,4 +2192,108 @@ extern vect_pattern_decl_t slp_patterns[];
 /* Number of supported pattern matchers.  */
 extern size_t num__slp_patterns;
 
+/* --
+   Target support routines
+   ---
+   The following routines are provided to simplify costing decisions in
+   target code.  Please add more as needed.  */
+
+/* Return true if an operation of kind KIND for STMT_INFO represents
+   the extraction of an element from a vector in preparation for
+   storing the element to memory.  */
+inline bool
+vect_is_store_elt_extraction (vect_cost_for_stmt kind, stmt_vec_info stmt_info)
+{
+  return (kind == vec_to_scalar
+ && STMT_VINFO_DATA_REF (stmt_info)
+ && DR_IS_WRITE (STMT_VINFO_DATA_REF (stmt_info)));
+}
+
+/* Return true if STMT_INFO represents part of a reduction.  */
+inline bool
+vect_is_reduction (stmt_vec_info stmt_info)
+{
+  return (STMT_VINFO_REDUC_DEF (stmt_info)
+ || VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info)));
+}
+
+/* If STMT_INFO describes a reduction, return the type of reduction
+   it describes, otherwise return -1.  */
+inline int
+vect_reduc_type (vec_info *vinfo, stmt_vec_info stmt_info)
+{
+  if (loop_vec_info loop_vinfo = dyn_cast (vinfo))
+if (STMT_VINFO_REDUC_DEF (stmt_info))
+  {
+   stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
+   return int (STMT_VINFO_REDUC_TYPE (reduc_info));
+  }
+  return -1;
+}
+
+/* If STMT_INFO is a COND_EXPR that includes an embedded comparison, return the
+   scalar type of the values being compared.  Return null otherwise.  */
+inline tree
+vect_embedded_comparison_type (stmt_vec_info stmt_info)
+{
+  if (auto *assign = dyn_cast (stmt_info->stmt))
+if (gimple_assign_rhs_code (assign) == COND_EXPR)
+  {
+   tree cond = gimple_assign_rhs1 (assign);
+   if (COMPARISON_CLASS_P (cond))
+ return TREE_TYPE (TREE_OPERAND (cond, 0));
+  

[committed v2 3/3] arm: reorder assembler architecture directives [PR101723]

2021-08-05 Thread Richard Earnshaw via Gcc-patches

A change to the way gas interprets the .fpu directive in binutils-2.34
means that issuing .fpu will clear any features set by .arch_extension
that apply to the floating point or simd units.  This unfortunately
causes problems for more recent versions of the architecture because
we currently emit .arch, .arch_extension and .fpu directives at
different times and try to suppress redundant changes.

This change addresses this by firstly unifying all the places where we
emit these directives to a single block of code and secondly
(re)emitting all the directives if any changes have been made to the
target options.  Whilst this is slightly more than the strict minimum
it should be enough to catch all cases where a change could have
happened.  The new code also emits the directives in the order: .arch,
.fpu, .arch_extension.  This ensures that the additional architectural
extensions are not removed by a later .fpu directive.

Whilst writing this patch I also noticed that in the corner case where
the last function to be compiled had a non-standard set of
architecture flags, the assembler would add an incorrect set of
derived attributes for the file as a whole.  Instead of reflecting the
command-line options it would reflect the flags from the last file in
the function.  To address this I've also added a call to re-emit the
flags from the asm_file_end callback so the assembler will be in the
correct state when it finishes processing the intput.

There's some slight churn to the testsuite as a consequence of this,
because previously we had a hack to suppress emitting a .fpu directive
for one specific case, but with the new order this is no longer
necessary.
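A sketch of the resulting ordering (directive operands here are illustrative, not taken from the patch's literal output):

```asm
	.arch armv8-a           @ base architecture, emitted first
	.fpu neon-fp-armv8      @ FPU next; in gas >= 2.34 this clears any
	                        @ fp/simd features set by .arch_extension
	.arch_extension crc     @ extensions last, so .fpu cannot undo them
```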

gcc/ChangeLog:

PR target/101723
* config/arm/arm-cpus.in (generic-armv7-a): Add quirk to suppress
writing .cpu directive in asm output.
* config/arm/arm.c (arm_identify_fpu_from_isa): New variable.
(arm_last_printed_arch_string): Delete.
(arm_last_printed_fpu_string): Delete.
(arm_configure_build_target): If use of floating-point/SIMD is
disabled, remove all fp/simd related features from the target ISA.
(last_arm_targ_options): New variable.
(arm_print_asm_arch_directives): Add new parameters.  Change order
of emitted directives and handle all cases here.
(arm_file_start): Always call arm_print_asm_arch_directives, move
all generation of .arch/.arch_extension here.
(arm_file_end): Call arm_print_asm_arch.
(arm_declare_function_name): Call arm_print_asm_arch_directives
instead of printing .arch/.fpu directives directly.

gcc/testsuite/ChangeLog:

PR target/101723
* gcc.target/arm/cortex-m55-nofp-flag-hard.c: Update expected output.
* gcc.target/arm/cortex-m55-nofp-flag-softfp.c: Likewise.
* gcc.target/arm/cortex-m55-nofp-nomve-flag-softfp.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_fpu1.c: Convert to dg-do assemble.
Add a non-no-op function body.
* gcc.target/arm/mve/intrinsics/mve_fpu2.c: Likewise.
* gcc.target/arm/pr98636.c (dg-options): Add -mfloat-abi=softfp.
* gcc.target/arm/attr-neon.c: Tighten scan-assembler tests.
* gcc.target/arm/attr-neon2.c: Use -Ofast, convert test to use
check-function-bodies.
* gcc.target/arm/attr-neon3.c: Likewise.
* gcc.target/arm/pr69245.c: Tighten scan-assembler match, but allow
multiple instances.
* gcc.target/arm/pragma_fpu_attribute.c: Likewise.
* gcc.target/arm/pragma_fpu_attribute_2.c: Likewise.
---
 gcc/config/arm/arm-cpus.in|   1 +
 gcc/config/arm/arm.c  | 186 --
 gcc/testsuite/gcc.target/arm/attr-neon.c  |   9 +-
 gcc/testsuite/gcc.target/arm/attr-neon2.c |  35 ++--
 gcc/testsuite/gcc.target/arm/attr-neon3.c |  48 +++--
 .../arm/cortex-m55-nofp-flag-hard.c   |   2 +-
 .../arm/cortex-m55-nofp-flag-softfp.c |   2 +-
 .../arm/cortex-m55-nofp-nomve-flag-softfp.c   |   2 +-
 .../gcc.target/arm/mve/intrinsics/mve_fpu1.c  |   5 +-
 .../gcc.target/arm/mve/intrinsics/mve_fpu2.c  |   5 +-
 gcc/testsuite/gcc.target/arm/pr69245.c|   6 +-
 gcc/testsuite/gcc.target/arm/pr98636.c|   3 +-
 .../gcc.target/arm/pragma_fpu_attribute.c |   7 +-
 .../gcc.target/arm/pragma_fpu_attribute_2.c   |   7 +-
 14 files changed, 169 insertions(+), 149 deletions(-)

diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index ab4b6acf5ea..249995a6bca 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -1080,6 +1080,7 @@ begin cpu generic-armv7-a
  cname genericv7a
  tune flags LDSCHED
  architecture armv7-a+fp
+ isa quirk_no_asmcpu
  option mp add mp
  option sec add sec
  option vfpv3-d16 add VFPv3 FP_DBL
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 273202ac2fd..11dafc70067 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c

[committed v2 2/3] arm: Don't reconfigure globals in arm_configure_build_target

2021-08-05 Thread Richard Earnshaw via Gcc-patches

arm_configure_build_target is usually used to reconfigure the
arm_active_target structure, which is then used to reconfigure a
number of other global variables describing the current target.
Occasionally, however, we need to use arm_configure_build_target to
construct a temporary target structure and in that case it is wrong to
try to reconfigure the global variables (although probably harmless,
since arm_option_reconfigure_globals() only looks at
arm_active_target).  At the very least, however, this is wasted work,
so it is best not to do it unless needed.  What's more, several
callers of arm_configure_build target call
arm_option_reconfigure_globals themselves within a few lines, making
the call from within arm_configure_build_target completely redundant.

So this patch moves the responsibility of calling of
arm_configure_build_target to its callers (only two places needed
updating).

gcc:
* config/arm/arm.c (arm_configure_build_target): Don't call
arm_option_reconfigure_globals.
(arm_option_restore): Call arm_option_reconfigure_globals after
reconfiguring the target.
* config/arm/arm-c.c (arm_pragma_target_parse): Likewise.
---
 gcc/config/arm/arm-c.c | 1 +
 gcc/config/arm/arm.c   | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index ae2139c4bfa..cc7901bca8d 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -409,6 +409,7 @@ arm_pragma_target_parse (tree args, tree pop_target)
   target_option_current_node = cur_tree;
   arm_configure_build_target (&arm_active_target,
   TREE_TARGET_OPTION (cur_tree), false);
+  arm_option_reconfigure_globals ();
 }
 
   /* Update macros if target_node changes. The global state will be restored
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index b2dd58d8751..273202ac2fd 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3058,6 +3058,7 @@ arm_option_restore (struct gcc_options */* opts */,
 		struct cl_target_option *ptr)
 {
   arm_configure_build_target (&arm_active_target, ptr, false);
+  arm_option_reconfigure_globals ();
 }
 
 /* Reset options between modes that the user has specified.  */
@@ -3441,7 +3442,6 @@ arm_configure_build_target (struct arm_build_target *target,
   target->tune_flags = tune_data->tune_flags;
   target->tune = tune_data->tune;
   target->tune_core = tune_data->scheduler;
-  arm_option_reconfigure_globals ();
 }
 
 /* Fix up any incompatible options that the user has specified.  */


[committed v2 1/3] arm: ensure the arch_name is always set for the build target

2021-08-05 Thread Richard Earnshaw via Gcc-patches

This should never happen now if GCC is invoked by the driver, but in
the unusual case of calling cc1 (or its ilk) directly from the command
line the build target's arch_name string can remain NULL.  This can
complicate later processing meaning that we need to check for this
case explicitly in some circumstances.  Nothing should rely on this
behaviour, so it's simpler to always set the arch_name when
configuring the build target and be done with it.

gcc:

* config/arm/arm.c (arm_configure_build_target): Ensure the target's
arch_name is always set.
---
 gcc/config/arm/arm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6d781e23ee9..b2dd58d8751 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3432,6 +3432,8 @@ arm_configure_build_target (struct arm_build_target *target,
   const cpu_tune *tune_data = &all_tunes[arm_selected_tune - all_cores];
 
   /* Finish initializing the target structure.  */
+  if (!target->arch_name)
+target->arch_name = arm_selected_arch->common.name;
   target->arch_pp_name = arm_selected_arch->arch;
   target->base_arch = arm_selected_arch->base_arch;
   target->profile = arm_selected_arch->profile;


[committed v2 0/3] arm: fix problems when targetting extended FPUs [PR101723]

2021-08-05 Thread Richard Earnshaw via Gcc-patches
Thanks, Christophe, I've updated the testsuite to fix all the issues I could
see from your test runs.

This is what I've finally committed, but if there's any more fallout, please
let me know.

R.

Richard Earnshaw (3):
  arm: ensure the arch_name is always set for the build target
  arm: Don't reconfigure globals in arm_configure_build_target
  arm: reorder assembler architecture directives [PR101723]

 gcc/config/arm/arm-c.c|   1 +
 gcc/config/arm/arm-cpus.in|   1 +
 gcc/config/arm/arm.c  | 190 --
 gcc/testsuite/gcc.target/arm/attr-neon.c  |   9 +-
 gcc/testsuite/gcc.target/arm/attr-neon2.c |  35 +++-
 gcc/testsuite/gcc.target/arm/attr-neon3.c |  48 -
 .../arm/cortex-m55-nofp-flag-hard.c   |   2 +-
 .../arm/cortex-m55-nofp-flag-softfp.c |   2 +-
 .../arm/cortex-m55-nofp-nomve-flag-softfp.c   |   2 +-
 .../gcc.target/arm/mve/intrinsics/mve_fpu1.c  |   5 +-
 .../gcc.target/arm/mve/intrinsics/mve_fpu2.c  |   5 +-
 gcc/testsuite/gcc.target/arm/pr69245.c|   6 +-
 gcc/testsuite/gcc.target/arm/pr98636.c|   3 +-
 .../gcc.target/arm/pragma_fpu_attribute.c |   7 +-
 .../gcc.target/arm/pragma_fpu_attribute_2.c   |   7 +-
 15 files changed, 173 insertions(+), 150 deletions(-)

-- 
2.25.1



[PATCH, AArch64] PR target/101609 - Use the correct iterator for AArch64 vector right shift pattern.

2021-08-05 Thread Tejas Belagod via Gcc-patches
Hi,

Loops containing long long shifts fail to vectorize due to the vectorizer
not being able to recognize long long right shifts. This is due to a bug
in the iterator used for the vashr and vlshr patterns in aarch64-simd.md.

Tested and bootstrapped on aarch64-linux. OK?

2021-08-05  Tejas Belagod  

gcc/ChangeLog:

PR target/101609
* config/aarch64/aarch64-simd.md (vlshr<mode>3, vashr<mode>3): Use
  the right iterator.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-shr-reg.c: New testcase.
* gcc.target/aarch64/vect-shr-reg-run.c: Likewise.


Thanks,
Tejas Belagod.
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
c5638d096fa84a27b4ea397f62cd0d05a28e7c8c..48eddf64e05afe3788abfa05141f6544a9323ea1
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1299,13 +1299,10 @@ (define_expand "vashl3"
   DONE;
 })
 
-;; Using mode VDQ_BHSI as there is no V2DImode neg!
-;; Negating individual lanes most certainly offsets the
-;; gain from vectorization.
 (define_expand "vashr3"
- [(match_operand:VDQ_BHSI 0 "register_operand")
-  (match_operand:VDQ_BHSI 1 "register_operand")
-  (match_operand:VDQ_BHSI 2 "register_operand")]
+ [(match_operand:VDQ_I 0 "register_operand")
+  (match_operand:VDQ_I 1 "register_operand")
+  (match_operand:VDQ_I 2 "register_operand")]
  "TARGET_SIMD"
 {
   rtx neg = gen_reg_rtx (<MODE>mode);
@@ -1333,9 +1330,9 @@ (define_expand "aarch64_ashr_simddi"
 )
 
 (define_expand "vlshr3"
- [(match_operand:VDQ_BHSI 0 "register_operand")
-  (match_operand:VDQ_BHSI 1 "register_operand")
-  (match_operand:VDQ_BHSI 2 "register_operand")]
+ [(match_operand:VDQ_I 0 "register_operand")
+  (match_operand:VDQ_I 1 "register_operand")
+  (match_operand:VDQ_I 2 "register_operand")]
  "TARGET_SIMD"
 {
   rtx neg = gen_reg_rtx (<MODE>mode);
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-shr-reg-run.c 
b/gcc/testsuite/gcc.target/aarch64/vect-shr-reg-run.c
new file mode 100644
index 
..3190448e0936b9d5265f538304f9d20f13927339
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-shr-reg-run.c
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -march=armv8.2-a" } */
+
+#include "vect-shr-reg.c"
+
+int
+main(void)
+{
+  int64_t a[16];
+  int64_t b[16];
+  int64_t c[17];
+
+  uint64_t ua[16];
+  uint64_t ub[16];
+  uint64_t uc[17];
+
+  int64_t res_a[16];
+  uint64_t res_ua[16];
+
+  int i;
+
+  /* Set up inputs.  */
+  for (i = 0; i < 16; i++)
+{
+  b[i] = -2;
+  c[i] = 34;
+  ub[i] = 0x;
+  uc[i] = 52;
+}
+
+  /* Set up reference values.  */
+  for (i = 0; i < 16; i++)
+{
+  res_a[i] = -1LL;
+  res_ua[i] = 0x0fffLL;
+}
+
+  /* Do the shifts.  */
+  f (ua, ub, uc);
+  g (a, b, c);
+
+  /* Compare outputs against reference values.  */
+  for (i = 0; i < 16; i++)
+{
+  if (a[i] != res_a[i])
+   __builtin_abort ();
+
+  if (ua[i] != res_ua[i])
+   __builtin_abort ();
+}
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-shr-reg.c 
b/gcc/testsuite/gcc.target/aarch64/vect-shr-reg.c
new file mode 100644
index 
..5736dafb5a19957032e7b4bc1e90b218f52788fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-shr-reg.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.2-a" } */
+
+#include 
+#include 
+
+#pragma GCC target "+nosve"
+
+int __attribute__((noinline))
+f(uint64_t *__restrict a, uint64_t *__restrict b, uint64_t *__restrict c)
+{
+  int i;
+
+  for (i = 0; i < 16; i++)
+a[i] = b[i] >> c[i];
+}
+
+
+int __attribute__((noinline))
+g(int64_t *__restrict a, int64_t *__restrict b, int64_t *__restrict c)
+{
+  int i;
+
+  for (i = 0; i < 16; i++)
+a[i] = b[i] >> c[i];
+}
+
+/* { dg-final { scan-assembler "neg\\tv" } } */
+/* { dg-final { scan-assembler "ushl\\tv" } } */
+/* { dg-final { scan-assembler "sshl\\tv" } } */


Re: [PATCH V2] aarch64: Don't include vec_select high-half in SIMD subtract cost

2021-08-05 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> V2 of this change implements the same approach as for the multiply
> and add-widen patches.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-07-28  Jonathan Wright  
>
> * config/aarch64/aarch64.c: Traverse RTL tree to prevent cost
> of vec_select high-half from being added into Neon subtract
> cost.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/vsubX_high_cost.c: New test.

OK, thanks.

Richard

> From: Jonathan Wright
> Sent: 29 July 2021 10:23
> To: gcc-patches@gcc.gnu.org 
> Cc: Richard Sandiford ; Kyrylo Tkachov 
> 
> Subject: [PATCH] aarch64: Don't include vec_select high-half in SIMD subtract 
> cost
>
> Hi,
>
> The Neon subtract-long/subtract-widen instructions can select the top
> or bottom half of the operand registers. This selection does not
> change the cost of the underlying instruction and this should be
> reflected by the RTL cost function.
>
> This patch adds RTL tree traversal in the Neon subtract cost function
> to match vec_select high-half of its operands. This traversal
> prevents the cost of the vec_select from being added into the cost of
> the subtract - meaning that these instructions can now be emitted in
> the combine pass as they are no longer deemed prohibitively
> expensive.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-07-28  Jonathan Wright  
>
> * config/aarch64/aarch64.c: Traverse RTL tree to prevent cost
> of vec_select high-half from being added into Neon subtract
> cost.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/vsubX_high_cost.c: New test.
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> cc92cc9c208e63f262c22c7fe8e6915825884775..89129c8ecf1655fbb69437733b0d42d79c864836
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -13089,6 +13089,21 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int 
> outer ATTRIBUTE_UNUSED,
>   op1 = XEXP (x, 1);
>  
>  cost_minus:
> + if (VECTOR_MODE_P (mode))
> +   {
> + /* SUBL2 and SUBW2.  */
> + unsigned int vec_flags = aarch64_classify_vector_mode (mode);
> + if (vec_flags & VEC_ADVSIMD)
> +   {
> + /* The select-operand-high-half versions of the sub instruction
> +have the same cost as the regular three vector version -
> +don't add the costs of the select into the costs of the sub.
> +*/
> + op0 = aarch64_strip_extend_vec_half (op0);
> + op1 = aarch64_strip_extend_vec_half (op1);
> +   }
> +   }
> +
>   *cost += rtx_cost (op0, mode, MINUS, 0, speed);
>  
>   /* Detect valid immediates.  */
> diff --git a/gcc/testsuite/gcc.target/aarch64/vsubX_high_cost.c 
> b/gcc/testsuite/gcc.target/aarch64/vsubX_high_cost.c
> new file mode 100644
> index 
> ..09bc7fc7766e8bcb468d592cbf4005a57cf09397
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vsubX_high_cost.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +#include 
> +
> +#define TEST_SUBL(rettype, intype, ts, rs) \
> +  rettype test_vsubl_ ## ts (intype a, intype b, intype c) \
> + { \
> + rettype t0 = vsubl_ ## ts (vget_high_ ## ts (a), \
> +vget_high_ ## ts (c)); \
> + rettype t1 = vsubl_ ## ts (vget_high_ ## ts (b), \
> +vget_high_ ## ts (c)); \
> + return vaddq ## _ ## rs (t0, t1); \
> + }
> +
> +TEST_SUBL (int16x8_t, int8x16_t, s8, s16)
> +TEST_SUBL (uint16x8_t, uint8x16_t, u8, u16)
> +TEST_SUBL (int32x4_t, int16x8_t, s16, s32)
> +TEST_SUBL (uint32x4_t, uint16x8_t, u16, u32)
> +TEST_SUBL (int64x2_t, int32x4_t, s32, s64)
> +TEST_SUBL (uint64x2_t, uint32x4_t, u32, u64)
> +
> +#define TEST_SUBW(rettype, intype, intypel, ts, rs) \
> +  rettype test_vsubw_ ## ts (intype a, intype b, intypel c) \
> + { \
> + rettype t0 = vsubw_ ## ts (a, vget_high_ ## ts (c)); \
> + rettype t1 = vsubw_ ## ts (b, vget_high_ ## ts (c)); \
> + return vaddq ## _ ## rs (t0, t1); \
> + }
> +
> +TEST_SUBW (int16x8_t, int16x8_t, int8x16_t, s8, s16)
> +TEST_SUBW (uint16x8_t, uint16x8_t, uint8x16_t, u8, u16)
> +TEST_SUBW (int32x4_t, int32x4_t, int16x8_t, s16, s32)
> +TEST_SUBW (uint32x4_t, uint32x4_t, uint16x8_t, u16, u32)
> +TEST_SUBW (int64x2_t, int64x2_t, int32x4_t, s32, s64)
> +TEST_SUBW (uint64x2_t, uint64x2_t, uint32x4_t, u32, u64)
> +
> +/* { dg-final { scan-assembler-not "dup\\t" } } */


Re: [PATCH] doc: Document cond_* shift optabs in md.texi

2021-08-05 Thread Richard Biener via Gcc-patches
On Thu, Aug 5, 2021 at 12:37 PM Richard Sandiford via Gcc-patches
 wrote:
>
> As per $SUBJECT.  OK to install?

OK.

> Richard
>
>
> gcc/
> PR middle-end/101787
> * doc/md.texi (cond_ashl, cond_ashr, cond_lshr): Document.
> ---
>  gcc/doc/md.texi | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index f6d1bc1ad0f..f8047aefccc 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6921,6 +6921,9 @@ operand 0, otherwise (operand 2 + operand 3) is moved.
>  @cindex @code{cond_smax@var{mode}} instruction pattern
>  @cindex @code{cond_umin@var{mode}} instruction pattern
>  @cindex @code{cond_umax@var{mode}} instruction pattern
> +@cindex @code{cond_ashl@var{mode}} instruction pattern
> +@cindex @code{cond_ashr@var{mode}} instruction pattern
> +@cindex @code{cond_lshr@var{mode}} instruction pattern
>  @item @samp{cond_add@var{mode}}
>  @itemx @samp{cond_sub@var{mode}}
>  @itemx @samp{cond_mul@var{mode}}
> @@ -6935,6 +6938,9 @@ operand 0, otherwise (operand 2 + operand 3) is moved.
>  @itemx @samp{cond_smax@var{mode}}
>  @itemx @samp{cond_umin@var{mode}}
>  @itemx @samp{cond_umax@var{mode}}
> +@itemx @samp{cond_ashl@var{mode}}
> +@itemx @samp{cond_ashr@var{mode}}
> +@itemx @samp{cond_lshr@var{mode}}
>  When operand 1 is true, perform an operation on operands 2 and 3 and
>  store the result in operand 0, otherwise store operand 4 in operand 0.
>  The operation works elementwise if the operands are vectors.
> @@ -6962,6 +6968,11 @@ Operands 0, 2, 3 and 4 all have mode @var{m}.  Operand 
> 1 is a scalar
>  integer if @var{m} is scalar, otherwise it has the mode returned by
>  @code{TARGET_VECTORIZE_GET_MASK_MODE}.
>
> +@samp{cond_@var{op}@var{mode}} generally corresponds to a conditional
> +form of @samp{@var{op}@var{mode}3}.  As an exception, the vector forms
> +of shifts correspond to patterns like @code{vashl@var{mode}3} rather
> +than patterns like @code{ashl@var{mode}3}.
> +
>  @cindex @code{cond_fma@var{mode}} instruction pattern
>  @cindex @code{cond_fms@var{mode}} instruction pattern
>  @cindex @code{cond_fnma@var{mode}} instruction pattern
> --
> 2.17.1
>


Re: [PATCH V2] aarch64: Don't include vec_select high-half in SIMD add cost

2021-08-05 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> V2 of this patch uses the same approach as that just implemented
> for the multiply high-half cost patch.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-07-28  Jonathan Wright  
>
> * config/aarch64/aarch64.c: Traverse RTL tree to prevent cost
> of vec_select high-half from being added into Neon add cost.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/vaddX_high_cost.c: New test.

OK, thanks.

Richard

>
> From: Jonathan Wright
> Sent: 29 July 2021 10:22
> To: gcc-patches@gcc.gnu.org 
> Cc: Richard Sandiford ; Kyrylo Tkachov 
> 
> Subject: [PATCH] aarch64: Don't include vec_select high-half in SIMD add cost
>
> Hi,
>
> The Neon add-long/add-widen instructions can select the top or bottom
> half of the operand registers. This selection does not change the
> cost of the underlying instruction and this should be reflected by
> the RTL cost function.
>
> This patch adds RTL tree traversal in the Neon add cost function to
> match vec_select high-half of its operands. This traversal prevents
> the cost of the vec_select from being added into the cost of the
> subtract - meaning that these instructions can now be emitted in the
> combine pass as they are no longer deemed prohibitively expensive.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-07-28  Jonathan Wright  
>
> * config/aarch64/aarch64.c: Traverse RTL tree to prevent cost
> of vec_select high-half from being added into Neon add cost.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/vaddX_high_cost.c: New test.
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 10a436ad7e6fa6c5de706ee5abbdc6fb3d268076..cc92cc9c208e63f262c22c7fe8e6915825884775
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -13161,6 +13161,21 @@ cost_minus:
>   op1 = XEXP (x, 1);
>  
>  cost_plus:
> + if (VECTOR_MODE_P (mode))
> +   {
> + /* ADDL2 and ADDW2.  */
> + unsigned int vec_flags = aarch64_classify_vector_mode (mode);
> + if (vec_flags & VEC_ADVSIMD)
> +   {
> + /* The select-operand-high-half versions of the add instruction
> +have the same cost as the regular three vector version -
> +don't add the costs of the select into the costs of the add.
> +*/
> + op0 = aarch64_strip_extend_vec_half (op0);
> + op1 = aarch64_strip_extend_vec_half (op1);
> +   }
> +   }
> +
>   if (GET_RTX_CLASS (GET_CODE (op0)) == RTX_COMPARE
>   || GET_RTX_CLASS (GET_CODE (op0)) == RTX_COMM_COMPARE)
> {
> diff --git a/gcc/testsuite/gcc.target/aarch64/vaddX_high_cost.c 
> b/gcc/testsuite/gcc.target/aarch64/vaddX_high_cost.c
> new file mode 100644
> index 
> ..43f28d597a94d8aceac87ef2240a50cc56c07240
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vaddX_high_cost.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +#include 
> +
> +#define TEST_ADDL(rettype, intype, ts, rs) \
> +  rettype test_vaddl_ ## ts (intype a, intype b, intype c) \
> + { \
> + rettype t0 = vaddl_ ## ts (vget_high_ ## ts (a), \
> +vget_high_ ## ts (c)); \
> + rettype t1 = vaddl_ ## ts (vget_high_ ## ts (b), \
> +vget_high_ ## ts (c)); \
> + return vaddq ## _ ## rs (t0, t1); \
> + }
> +
> +TEST_ADDL (int16x8_t, int8x16_t, s8, s16)
> +TEST_ADDL (uint16x8_t, uint8x16_t, u8, u16)
> +TEST_ADDL (int32x4_t, int16x8_t, s16, s32)
> +TEST_ADDL (uint32x4_t, uint16x8_t, u16, u32)
> +TEST_ADDL (int64x2_t, int32x4_t, s32, s64)
> +TEST_ADDL (uint64x2_t, uint32x4_t, u32, u64)
> +
> +#define TEST_ADDW(rettype, intype, intypel, ts, rs) \
> +  rettype test_vaddw_ ## ts (intype a, intype b, intypel c) \
> + { \
> + rettype t0 = vaddw_ ## ts (a, vget_high_ ## ts (c)); \
> + rettype t1 = vaddw_ ## ts (b, vget_high_ ## ts (c)); \
> + return vaddq ## _ ## rs (t0, t1); \
> + }
> +
> +TEST_ADDW (int16x8_t, int16x8_t, int8x16_t, s8, s16)
> +TEST_ADDW (uint16x8_t, uint16x8_t, uint8x16_t, u8, u16)
> +TEST_ADDW (int32x4_t, int32x4_t, int16x8_t, s16, s32)
> +TEST_ADDW (uint32x4_t, uint32x4_t, uint16x8_t, u16, u32)
> +TEST_ADDW (int64x2_t, int64x2_t, int32x4_t, s32, s64)
> +TEST_ADDW (uint64x2_t, uint64x2_t, uint32x4_t, u32, u64)
> +
> +/* { dg-final { scan-assembler-not "dup\\t" } } */


[PATCH] doc: Document cond_* shift optabs in md.texi

2021-08-05 Thread Richard Sandiford via Gcc-patches
As per $SUBJECT.  OK to install?

Richard


gcc/
PR middle-end/101787
* doc/md.texi (cond_ashl, cond_ashr, cond_lshr): Document.
---
 gcc/doc/md.texi | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index f6d1bc1ad0f..f8047aefccc 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6921,6 +6921,9 @@ operand 0, otherwise (operand 2 + operand 3) is moved.
 @cindex @code{cond_smax@var{mode}} instruction pattern
 @cindex @code{cond_umin@var{mode}} instruction pattern
 @cindex @code{cond_umax@var{mode}} instruction pattern
+@cindex @code{cond_ashl@var{mode}} instruction pattern
+@cindex @code{cond_ashr@var{mode}} instruction pattern
+@cindex @code{cond_lshr@var{mode}} instruction pattern
 @item @samp{cond_add@var{mode}}
 @itemx @samp{cond_sub@var{mode}}
 @itemx @samp{cond_mul@var{mode}}
@@ -6935,6 +6938,9 @@ operand 0, otherwise (operand 2 + operand 3) is moved.
 @itemx @samp{cond_smax@var{mode}}
 @itemx @samp{cond_umin@var{mode}}
 @itemx @samp{cond_umax@var{mode}}
+@itemx @samp{cond_ashl@var{mode}}
+@itemx @samp{cond_ashr@var{mode}}
+@itemx @samp{cond_lshr@var{mode}}
 When operand 1 is true, perform an operation on operands 2 and 3 and
 store the result in operand 0, otherwise store operand 4 in operand 0.
 The operation works elementwise if the operands are vectors.
@@ -6962,6 +6968,11 @@ Operands 0, 2, 3 and 4 all have mode @var{m}.  Operand 1 
is a scalar
 integer if @var{m} is scalar, otherwise it has the mode returned by
 @code{TARGET_VECTORIZE_GET_MASK_MODE}.
 
+@samp{cond_@var{op}@var{mode}} generally corresponds to a conditional
+form of @samp{@var{op}@var{mode}3}.  As an exception, the vector forms
+of shifts correspond to patterns like @code{vashl@var{mode}3} rather
+than patterns like @code{ashl@var{mode}3}.
+
 @cindex @code{cond_fma@var{mode}} instruction pattern
 @cindex @code{cond_fms@var{mode}} instruction pattern
 @cindex @code{cond_fnma@var{mode}} instruction pattern
-- 
2.17.1



Re: [ARM] PR66791: Replace builtins for signed vmul_n intrinsics

2021-08-05 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 12 Jul 2021 at 15:24, Prathamesh Kulkarni
 wrote:
>
> On Mon, 12 Jul 2021 at 15:23, Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 5 Jul 2021 at 14:47, Prathamesh Kulkarni
> >  wrote:
> > >
> > > Hi,
> > > This patch replaces builtins with __a * __b for signed variants of
> > > vmul_n intrinsics.
> > > As discussed earlier, the patch has an issue if __a * __b overflows,
> > > and the open question is whether we wish to leave
> > > that as UB.
> > ping 
> > https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=6785eb595981abd93ad85edcfdf1d2e43c0841f5
> Oops sorry, I meant this link:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574428.html
ping * 2 https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574428.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Prathamesh


Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.

2021-08-05 Thread Richard Biener via Gcc-patches
On Thu, Aug 5, 2021 at 11:43 AM Hongtao Liu  wrote:
>
> On Thu, Aug 5, 2021 at 5:24 PM Richard Biener
>  wrote:
> >
> > On Thu, Aug 5, 2021 at 9:25 AM Hongtao Liu  wrote:
> > >
> > > On Wed, Aug 4, 2021 at 7:28 PM Richard Biener
> > >  wrote:
> > > >
> > > > On Wed, Aug 4, 2021 at 4:39 AM Hongtao Liu  wrote:
> > > > >
> > > > > On Mon, Aug 2, 2021 at 2:31 PM liuhongt  wrote:
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > > * config/i386/i386-modes.def (FLOAT_MODE): Define ieee 
> > > > > > HFmode.
> > > > > > * config/i386/i386.c (enum x86_64_reg_class): Add
> > > > > > X86_64_SSEHF_CLASS.
> > > > > > (merge_classes): Handle X86_64_SSEHF_CLASS.
> > > > > > (examine_argument): Ditto.
> > > > > > (construct_container): Ditto.
> > > > > > (classify_argument): Ditto, and set HFmode/HCmode to
> > > > > > X86_64_SSEHF_CLASS.
> > > > > > (function_value_32): Return _Float16/Complex Float16 by
> > > > > > %xmm0.
> > > > > > (function_value_64): Return _Float16/Complex Float16 by SSE
> > > > > > register.
> > > > > > (ix86_print_operand): Handle CONST_DOUBLE HFmode.
> > > > > > (ix86_secondary_reload): Require gpr as intermediate 
> > > > > > register
> > > > > > to store _Float16 from sse register when sse4 is not
> > > > > > available.
> > > > > > (ix86_libgcc_floating_mode_supported_p): Enable _Float16 
> > > > > > under
> > > > > > sse2.
> > > > > > (ix86_scalar_mode_supported_p): Ditto.
> > > > > > (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined.
> > > > > > * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
> > > > > > (VALID_INT_MODE_P): Add HFmode and HCmode.
> > > > > > * config/i386/i386.md (*pushhf_rex64): New define_insn.
> > > > > > (*pushhf): Ditto.
> > > > > > (*movhf_internal): Ditto.
> > > > > > * doc/extend.texi (Half-Precision Floating Point): Document
> > > > > > _Float16 for x86.
> > > > > > * emit-rtl.c (validate_subreg): Allow (subreg:SI (reg:HF) 0)
> > > > > > which is used by extract_bit_field but not backends.
> > > > > >
> > > > [...]
> > > > >
> > > > > Ping, I'd like to ask for approval for the code below, which is
> > > > > related to the generic part.
> > > > >
> > > > > start from ..
> > > > > > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> > > > > > index ff3b4449b37..775ee397836 100644
> > > > > > --- a/gcc/emit-rtl.c
> > > > > > +++ b/gcc/emit-rtl.c
> > > > > > @@ -928,6 +928,11 @@ validate_subreg (machine_mode omode, 
> > > > > > machine_mode imode,
> > > > > >   fix them all.  */
> > > > > >if (omode == word_mode)
> > > > > >  ;
> > > > > > +  /* ??? Similarly to (subreg:DI (reg:SF)), also allow (subreg:SI 
> > > > > > (reg:HF))
> > > > > > + here.  Though extract_bit_field is the culprit here, not the 
> > > > > > backends.  */
> > > > > > +  else if (known_gt (regsize, osize) && known_gt (osize, isize)
> > > > > > +  && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
> > > > > > +;
> > > > > >/* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though 
> > > > > > store_bit_field
> > > > > >   is the culprit here, and not the backends.  */
> > > > > >else if (known_ge (osize, regsize) && known_ge (isize, osize))
> > > > >
> > > > > and end here.
> > > >
> > > > So the main restriction otherwise in place is
> > > >
> > > >   /* Subregs involving floating point modes are not allowed to
> > > >  change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
> > > >  (subreg:SI (reg:DF) 0) isn't.  */
> > > >   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> > > > {
> > > >   if (! (known_eq (isize, osize)
> > > >  /* LRA can use subreg to store a floating point value in
> > > > an integer mode.  Although the floating point and the
> > > > integer modes need the same number of hard registers,
> > > > the size of floating point mode can be less than the
> > > > integer mode.  LRA also uses subregs for a register
> > > > should be used in different mode in on insn.  */
> > > >  || lra_in_progress))
> > > > return false;
> > > >
> > > > I'm not sure if it would be possible to do (subreg:SI (subreg:HI 
> > > > (reg:HF)))
> > >
> > > After debugging, I find (subreg:SI (reg:HF)) is not really needed;
> > > it ends up being handled by the code below
> > > cut-
> > >   /* Find a correspondingly-sized integer field, so we can apply
> > >  shifts and masks to it.  */
> > >   scalar_int_mode int_mode;
> > >   if (!int_mode_for_mode (tmode).exists (&int_mode))
> > > /* If this fails, we should probably push op0 out to memory and then
> > >do a load.  */
> > > int_mode = int_mode_for_mode (mode).require ();
> > >
> > >   target = extract_fixed_bit_field 

Re: [ARM] PR66791: Replace builtin in vld1_dup intrinsics

2021-08-05 Thread Prathamesh Kulkarni via Gcc-patches
On Thu, 29 Jul 2021 at 19:58, Prathamesh Kulkarni
 wrote:
>
> Hi,
> The attached patch replaces builtins in vld1_dup intrinsics with call
> to corresponding vdup_n intrinsic and removes entry for vld1_dup from
> arm_neon_builtins.def.
> Bootstrapped+tested on arm-linux-gnueabihf.
> OK to commit ?
ping https://gcc.gnu.org/pipermail/gcc-patches/2021-July/576321.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh


Re: [PATCH 1/2] Add emulated gather capability to the vectorizer

2021-08-05 Thread Richard Biener
On Thu, 5 Aug 2021, Christophe Lyon wrote:

> On Wed, Aug 4, 2021 at 2:08 PM Richard Biener  wrote:
> 
> > On Wed, 4 Aug 2021, Richard Sandiford wrote:
> >
> > > Richard Biener  writes:
> > > > This adds a gather vectorization capability to the vectorizer
> > > > without target support by decomposing the offset vector, doing
> > > > scalar loads and then building a vector from the result.  This
> > > > is aimed mainly at cases where vectorizing the rest of the loop
> > > > offsets the cost of vectorizing the gather.
> > > >
> > > > Note it's difficult to avoid vectorizing the offset load, but in
> > > > some cases later passes can turn the vector load + extract into
> > > > scalar loads, see the followup patch.
> > > >
> > > > On SPEC CPU 2017 510.parest_r this improves runtime from 250s
> > > > to 219s on a Zen2 CPU which has its native gather instructions
> > > > disabled (using those the runtime instead increases to 254s)
> > > > using -Ofast -march=znver2 [-flto].  It turns out the critical
> > > > loops in this benchmark all perform gather operations.
> > > >
> > > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > > >
> > > > 2021-07-30  Richard Biener  
> > > >
> > > > * tree-vect-data-refs.c (vect_check_gather_scatter):
> > > > Include widening conversions only when the result is
> > > > still handled by native gather or the current offset
> > > > size does not already match the data size.
> > > > Also succeed analysis in case there's no native support,
> > > > noted by an IFN_LAST ifn and a NULL decl.
> > > > (vect_analyze_data_refs): Always consider gathers.
> > > > * tree-vect-patterns.c (vect_recog_gather_scatter_pattern):
> > > > Test for no IFN gather rather than decl gather.
> > > > * tree-vect-stmts.c (vect_model_load_cost): Pass in the
> > > > gather-scatter info and cost emulated gathers accordingly.
> > > > (vect_truncate_gather_scatter_offset): Properly test for
> > > > no IFN gather.
> > > > (vect_use_strided_gather_scatters_p): Likewise.
> > > > (get_load_store_type): Handle emulated gathers and their
> > > > restrictions.
> > > > (vectorizable_load): Likewise.  Emulate them by extracting
> > > > scalar offsets, doing scalar loads and a vector construct.
> > > >
> > > > * gcc.target/i386/vect-gather-1.c: New testcase.
> > > > * gfortran.dg/vect/vect-8.f90: Adjust.
> >
> 
> Hi,
> 
> The adjusted testcase now fails on aarch64:
> FAIL:  gfortran.dg/vect/vect-8.f90   -O   scan-tree-dump-times vect
> "vectorized 23 loops" 1

That likely means it needs adjustment for the aarch64 case as well
which I didn't touch.  I suppose it's now vectorizing 24 loops?
And 24 with SVE as well, so we might be able to merge the
aarch64_sve and aarch64 && ! aarch64_sve cases?

Like with

diff --git a/gcc/testsuite/gfortran.dg/vect/vect-8.f90 
b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
index cc1aebfbd84..c8a7d896bac 100644
--- a/gcc/testsuite/gfortran.dg/vect/vect-8.f90
+++ b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
@@ -704,7 +704,6 @@ CALL track('KERNEL  ')
 RETURN
 END SUBROUTINE kernel
 
-! { dg-final { scan-tree-dump-times "vectorized 24 loops" 1 "vect" { 
target aarch64_sve } } }
-! { dg-final { scan-tree-dump-times "vectorized 23 loops" 1 "vect" { 
target { aarch64*-*-* && { ! aarch64_sve } } } } }
+! { dg-final { scan-tree-dump-times "vectorized 24 loops" 1 "vect" { 
target aarch64*-*-* } } }
 ! { dg-final { scan-tree-dump-times "vectorized 2\[234\] loops" 1 "vect" 
{ target { vect_intdouble_cvt && { ! aarch64*-*-* } } } } }
 ! { dg-final { scan-tree-dump-times "vectorized 17 loops" 1 "vect" { 
target { { ! vect_intdouble_cvt } && { ! aarch64*-*-* } } } } }

f951 vect.exp testing with and without -march=armv8.3-a+sve shows
this might work, but if you can double-check that would be nice.

Richard.


[PATCH] Remove legacy back threader.

2021-08-05 Thread Aldy Hernandez via Gcc-patches
At this point I don't see any use for the legacy mode, which I had
originally left in place during the transition.

This patch removes the legacy back threader, and cleans up the code a
bit.  There are no functional changes to the non-legacy code.

Tested on x86-64 Linux.

OK?

gcc/ChangeLog:

* doc/invoke.texi: Remove docs for threader-mode param.
* flag-types.h (enum threader_mode): Remove.
* params.opt: Remove threader-mode param.
* tree-ssa-threadbackward.c (class back_threader): Remove
path_is_unreachable_p.
Make find_paths private.
Add maybe_thread and thread_through_all_blocks.
Remove reference marker for m_registry.
Remove reference marker for m_profit.
(back_threader::back_threader): Adjust for registry and profit not
being references.
(dump_path): Move down.
(debug): Move down.
(class thread_jumps): Remove.
(class back_threader_registry): Remove m_all_paths.
Remove destructor.
(thread_jumps::thread_through_all_blocks): Move to back_threader
class.
(fsm_find_thread_path): Remove.
(back_threader::maybe_thread): New.
(back_threader::thread_through_all_blocks): Move from
thread_jumps.
(back_threader_registry::back_threader_registry): Remove
m_all_paths.
(back_threader_registry::~back_threader_registry): Remove.
(thread_jumps::find_taken_edge): Remove.
(thread_jumps::check_subpath_and_update_thread_path): Remove.
(thread_jumps::maybe_register_path): Remove.
(thread_jumps::handle_phi): Remove.
(handle_assignment_p): Remove.
(thread_jumps::handle_assignment): Remove.
(thread_jumps::fsm_find_control_statement_thread_paths): Remove.
(thread_jumps::find_jump_threads_backwards): Remove.
(thread_jumps::find_jump_threads_backwards_with_ranger): Remove.
(try_thread_blocks): Rename find_jump_threads_backwards to
maybe_thread.
(pass_early_thread_jumps::execute): Same.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Remove call into the legacy
code and adjust for ranger threader.
---
 gcc/doc/invoke.texi   |   3 -
 gcc/flag-types.h  |   7 -
 gcc/params.opt|  13 -
 .../gcc.dg/tree-ssa/ssa-dom-thread-7.c|   3 +-
 gcc/tree-ssa-threadbackward.c | 539 ++
 5 files changed, 61 insertions(+), 504 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4efc8b757ec..65bb9981f02 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13421,9 +13421,6 @@ Setting to 0 disables the analysis completely.
 @item modref-max-escape-points
 Specifies the maximum number of escape points tracked by modref per SSA-name.
 
-@item threader-mode
-Specifies the mode the backwards threader should run in.
-
 @item profile-func-internal-id
 A parameter to control whether to use function internal id in profile
 database lookup. If the value is 0, the compiler uses an id that
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index e39673f6716..e43d1de490d 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -454,13 +454,6 @@ enum evrp_mode
   EVRP_MODE_RVRP_DEBUG = EVRP_MODE_RVRP_ONLY | EVRP_MODE_DEBUG
 };
 
-/* Backwards threader mode.  */
-enum threader_mode
-{
-  THREADER_MODE_LEGACY = 0,
-  THREADER_MODE_RANGER = 1
-};
-
 /* Modes of OpenACC 'kernels' constructs handling.  */
 enum openacc_kernels
 {
diff --git a/gcc/params.opt b/gcc/params.opt
index aa2fb4047b6..92b003e38cb 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1010,19 +1010,6 @@ Maximum depth of DFS walk used by modref escape analysis.
 Common Joined UInteger Var(param_modref_max_escape_points) Init(256) Param 
Optimization
 Maximum number of escape points tracked by modref per SSA-name.
 
--param=threader-mode=
-Common Joined Var(param_threader_mode) Enum(threader_mode) 
Init(THREADER_MODE_RANGER) Param Optimization
---param=threader-mode=[legacy|ranger] Specifies the mode the backwards 
threader should run in.
-
-Enum
-Name(threader_mode) Type(enum threader_mode) UnknownError(unknown threader 
mode %qs)
-
-EnumValue
-Enum(threader_mode) String(legacy) Value(THREADER_MODE_LEGACY)
-
-EnumValue
-Enum(threader_mode) String(ranger) Value(THREADER_MODE_RANGER)
-
 -param=tm-max-aggregate-size=
 Common Joined UInteger Var(param_tm_max_aggregate_size) Init(9) Param 
Optimization
 Size in bytes after which thread-local aggregates should be instrumented with 
the logging functions instead of save/restore pairs.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
index 1c2d12aa9ea..5fc2145a432 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
@@ -1,6 +1,5 @@
 /* { dg-do compile } */
 

Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.

2021-08-05 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 5, 2021 at 5:24 PM Richard Biener
 wrote:
>
> On Thu, Aug 5, 2021 at 9:25 AM Hongtao Liu  wrote:
> >
> > On Wed, Aug 4, 2021 at 7:28 PM Richard Biener
> >  wrote:
> > >
> > > On Wed, Aug 4, 2021 at 4:39 AM Hongtao Liu  wrote:
> > > >
> > > > On Mon, Aug 2, 2021 at 2:31 PM liuhongt  wrote:
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
> > > > > * config/i386/i386.c (enum x86_64_reg_class): Add
> > > > > X86_64_SSEHF_CLASS.
> > > > > (merge_classes): Handle X86_64_SSEHF_CLASS.
> > > > > (examine_argument): Ditto.
> > > > > (construct_container): Ditto.
> > > > > (classify_argument): Ditto, and set HFmode/HCmode to
> > > > > X86_64_SSEHF_CLASS.
> > > > > (function_value_32): Return _Float16/Complex Float16 by
> > > > > %xmm0.
> > > > > (function_value_64): Return _Float16/Complex Float16 by SSE
> > > > > register.
> > > > > (ix86_print_operand): Handle CONST_DOUBLE HFmode.
> > > > > (ix86_secondary_reload): Require gpr as intermediate register
> > > > > to store _Float16 from sse register when sse4 is not
> > > > > available.
> > > > > (ix86_libgcc_floating_mode_supported_p): Enable _Float16 under
> > > > > sse2.
> > > > > (ix86_scalar_mode_supported_p): Ditto.
> > > > > (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined.
> > > > > * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
> > > > > (VALID_INT_MODE_P): Add HFmode and HCmode.
> > > > > * config/i386/i386.md (*pushhf_rex64): New define_insn.
> > > > > (*pushhf): Ditto.
> > > > > (*movhf_internal): Ditto.
> > > > > * doc/extend.texi (Half-Precision Floating Point): Document
> > > > > _Float16 for x86.
> > > > > * emit-rtl.c (validate_subreg): Allow (subreg:SI (reg:HF) 0)
> > > > > which is used by extract_bit_field but not backends.
> > > > >
> > > [...]
> > > >
> > > > Ping, I'd like to ask for approval for the code below, which is
> > > > related to the generic part.
> > > >
> > > > start from ..
> > > > > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> > > > > index ff3b4449b37..775ee397836 100644
> > > > > --- a/gcc/emit-rtl.c
> > > > > +++ b/gcc/emit-rtl.c
> > > > > @@ -928,6 +928,11 @@ validate_subreg (machine_mode omode, 
> > > > > machine_mode imode,
> > > > >   fix them all.  */
> > > > >if (omode == word_mode)
> > > > >  ;
> > > > > +  /* ??? Similarly to (subreg:DI (reg:SF)), also allow (subreg:SI 
> > > > > (reg:HF))
> > > > > + here.  Though extract_bit_field is the culprit here, not the 
> > > > > backends.  */
> > > > > +  else if (known_gt (regsize, osize) && known_gt (osize, isize)
> > > > > +  && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
> > > > > +;
> > > > >/* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though 
> > > > > store_bit_field
> > > > >   is the culprit here, and not the backends.  */
> > > > >else if (known_ge (osize, regsize) && known_ge (isize, osize))
> > > >
> > > > and end here.
> > >
> > > So the main restriction otherwise in place is
> > >
> > >   /* Subregs involving floating point modes are not allowed to
> > >  change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
> > >  (subreg:SI (reg:DF) 0) isn't.  */
> > >   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> > > {
> > >   if (! (known_eq (isize, osize)
> > >  /* LRA can use subreg to store a floating point value in
> > > an integer mode.  Although the floating point and the
> > > integer modes need the same number of hard registers,
> > > the size of floating point mode can be less than the
> > > integer mode.  LRA also uses subregs for a register
> > > should be used in different mode in on insn.  */
> > >  || lra_in_progress))
> > > return false;
> > >
> > > I'm not sure if it would be possible to do (subreg:SI (subreg:HI 
> > > (reg:HF)))
> >
> > After debugging, I find (subreg:SI (reg:HF)) is not really needed;
> > it ends up being handled by the code below
> > cut-
> >   /* Find a correspondingly-sized integer field, so we can apply
> >  shifts and masks to it.  */
> >   scalar_int_mode int_mode;
> >   if (!int_mode_for_mode (tmode).exists (&int_mode))
> > /* If this fails, we should probably push op0 out to memory and then
> >do a load.  */
> > int_mode = int_mode_for_mode (mode).require ();
> >
> >   target = extract_fixed_bit_field (int_mode, op0, op0_mode, bitsize,
> > bitnum, target, unsignedp, reverse);
> > -end
> >
> > and generate things like below cut
> >
> > ---cut
> > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> > (insn 6 3 7 2 (parallel [
> > (set (reg:HI 86)
> > (and:HI (subreg:HI 

Re: [PATCH 1/2] Add emulated gather capability to the vectorizer

2021-08-05 Thread Christophe Lyon via Gcc-patches
On Wed, Aug 4, 2021 at 2:08 PM Richard Biener  wrote:

> On Wed, 4 Aug 2021, Richard Sandiford wrote:
>
> > Richard Biener  writes:
> > > This adds a gather vectorization capability to the vectorizer
> > > without target support by decomposing the offset vector, doing
> > > sclar loads and then building a vector from the result.  This
> > > is aimed mainly at cases where vectorizing the rest of the loop
> > > offsets the cost of vectorizing the gather.
> > >
> > > Note it's difficult to avoid vectorizing the offset load, but in
> > > some cases later passes can turn the vector load + extract into
> > > scalar loads, see the followup patch.
> > >
> > > On SPEC CPU 2017 510.parest_r this improves runtime from 250s
> > > to 219s on a Zen2 CPU which has its native gather instructions
> > > disabled (using those the runtime instead increases to 254s)
> > > using -Ofast -march=znver2 [-flto].  It turns out the critical
> > > loops in this benchmark all perform gather operations.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > >
> > > 2021-07-30  Richard Biener  
> > >
> > > * tree-vect-data-refs.c (vect_check_gather_scatter):
> > > Include widening conversions only when the result is
> > > still handed by native gather or the current offset
> > > size not already matches the data size.
> > > Also succeed analysis in case there's no native support,
> > > noted by a IFN_LAST ifn and a NULL decl.
> > > (vect_analyze_data_refs): Always consider gathers.
> > > * tree-vect-patterns.c (vect_recog_gather_scatter_pattern):
> > > Test for no IFN gather rather than decl gather.
> > > * tree-vect-stmts.c (vect_model_load_cost): Pass in the
> > > gather-scatter info and cost emulated gathers accordingly.
> > > (vect_truncate_gather_scatter_offset): Properly test for
> > > no IFN gather.
> > > (vect_use_strided_gather_scatters_p): Likewise.
> > > (get_load_store_type): Handle emulated gathers and their
> > > restrictions.
> > > (vectorizable_load): Likewise.  Emulate them by extracting
> > > scalar offsets, doing scalar loads and a vector construct.
> > >
> > > * gcc.target/i386/vect-gather-1.c: New testcase.
> > > * gfortran.dg/vect/vect-8.f90: Adjust.
>

Hi,

The adjusted testcase now fails on aarch64:
FAIL:  gfortran.dg/vect/vect-8.f90   -O   scan-tree-dump-times vect
"vectorized 23 loops" 1


Christophe

> > ---
> > >  gcc/testsuite/gcc.target/i386/vect-gather-1.c |  18 
> > >  gcc/testsuite/gfortran.dg/vect/vect-8.f90 |   2 +-
> > >  gcc/tree-vect-data-refs.c |  34 --
> > >  gcc/tree-vect-patterns.c  |   2 +-
> > >  gcc/tree-vect-stmts.c | 100 --
> > >  5 files changed, 138 insertions(+), 18 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/vect-gather-1.c
> > >
> > > diff --git a/gcc/testsuite/gcc.target/i386/vect-gather-1.c
> b/gcc/testsuite/gcc.target/i386/vect-gather-1.c
> > > new file mode 100644
> > > index 000..134aef39666
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/vect-gather-1.c
> > > @@ -0,0 +1,18 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-Ofast -msse2 -fdump-tree-vect-details" } */
> > > +
> > > +#ifndef INDEXTYPE
> > > +#define INDEXTYPE int
> > > +#endif
> > > +double vmul(INDEXTYPE *rowstart, INDEXTYPE *rowend,
> > > +   double *luval, double *dst)
> > > +{
> > > +  double res = 0;
> > > +  for (const INDEXTYPE * col = rowstart; col != rowend; ++col,
> ++luval)
> > > +res += *luval * dst[*col];
> > > +  return res;
> > > +}
> > > +
> > > +/* With gather emulation this should be profitable to vectorize
> > > +   even with plain SSE2.  */
> > > +/* { dg-final { scan-tree-dump "loop vectorized" "vect" } } */
> > > diff --git a/gcc/testsuite/gfortran.dg/vect/vect-8.f90
> b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
> > > index 9994805d77f..cc1aebfbd84 100644
> > > --- a/gcc/testsuite/gfortran.dg/vect/vect-8.f90
> > > +++ b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
> > > @@ -706,5 +706,5 @@ END SUBROUTINE kernel
> > >
> > >  ! { dg-final { scan-tree-dump-times "vectorized 24 loops" 1 "vect" {
> target aarch64_sve } } }
> > >  ! { dg-final { scan-tree-dump-times "vectorized 23 loops" 1 "vect" {
> target { aarch64*-*-* && { ! aarch64_sve } } } } }
> > > -! { dg-final { scan-tree-dump-times "vectorized 2\[23\] loops" 1
> "vect" { target { vect_intdouble_cvt && { ! aarch64*-*-* } } } } }
> > > +! { dg-final { scan-tree-dump-times "vectorized 2\[234\] loops" 1
> "vect" { target { vect_intdouble_cvt && { ! aarch64*-*-* } } } } }
> > >  ! { dg-final { scan-tree-dump-times "vectorized 17 loops" 1 "vect" {
> target { { ! vect_intdouble_cvt } && { ! aarch64*-*-* } } } } }
> > > diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> > > index 6995efba899..3c29ff04fd8 100644
> > > --- a/gcc/tree-vect-data-refs.c
> > > +++ 

Re: [PATCH] tree-optimization/101756 - avoid vectorizing boolean MAX reductions

2021-08-05 Thread Richard Biener
On Thu, 5 Aug 2021, Christophe Lyon wrote:

> On Wed, Aug 4, 2021 at 12:33 PM Richard Biener  wrote:
> 
> > The following avoids vectorizing MIN/MAX reductions on bools which,
> > when ending up as vector(2)  would need to be
> > adjusted because of the sign change.  The fix instead avoids any
> > reduction vectorization where the result isn't compatible
> > to the original scalar type since we don't compensate for that
> > either.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
> >
> > 2021-08-04  Richard Biener  
> >
> > PR tree-optimization/101756
> > * tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Make sure
> > the result of the reduction epilogue is compatible to the original
> > scalar result.
> >
> > * gcc.dg/vect/bb-slp-pr101756.c: New testcase.
> >
> 
> Hi,
> 
> The new testcase fails on aarch64 because:
>  FAIL: gcc.dg/vect/bb-slp-pr101756.c (test for excess errors)
> Excess errors:
> /gcc/testsuite/gcc.dg/vect/bb-slp-pr101756.c:4:1: warning: GCC does not
> currently support mixed size types for 'simd' functions
> 
> Can you check?

I have pushed

>From 425fce297da1696b4b5178e533d823f13fb250a5 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Thu, 5 Aug 2021 11:39:50 +0200
Subject: [PATCH] Adjust gcc.dg/vect/bb-slp-pr101756.c
To: gcc-patches@gcc.gnu.org

This adjusts the testcase for excess diagnostics emitted by some
targets because of the attribute simd usage like

warning: GCC does not currently support mixed size types for 'simd' functions

on aarch64.

2021-08-05  Richard Biener  

* gcc.dg/vect/bb-slp-pr101756.c: Add -w.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr101756.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr101756.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101756.c
index 9420e77f64e..de7f1806926 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr101756.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101756.c
@@ -1,4 +1,6 @@
 /* { dg-do compile } */
+/* SIMD support can emit additional diagnostics.  */
+/* { dg-additional-options "-w" } */
 
 __attribute__ ((simd)) int
 tq (long int ea, int of, int kk)
-- 
2.31.1
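As background for why a boolean OR-accumulation shows up as a MAX reduction at all (a restatement for illustration, not part of either patch): for values restricted to {0, 1}, a | b and MAX (a, b) coincide, which is how an `of |= ...` accumulation over booleans can be recognized as a MAX reduction. A quick scalar check of that identity:

```c
#include <assert.h>

/* For operands restricted to {0, 1}, bitwise OR and MAX coincide,
   which is why a boolean |= accumulation can be treated as a MAX
   reduction by the vectorizer.  */
static int max2 (int a, int b) { return a > b ? a : b; }

static void check_or_is_max (void)
{
  for (int a = 0; a <= 1; a++)
    for (int b = 0; b <= 1; b++)
      assert ((a | b) == max2 (a, b));
}
```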



Re: [PATCH] tree-optimization/101756 - avoid vectorizing boolean MAX reductions

2021-08-05 Thread Christophe Lyon via Gcc-patches
On Wed, Aug 4, 2021 at 12:33 PM Richard Biener  wrote:

> The following avoids vectorizing MIN/MAX reductions on bools which,
> when ending up as vector(2)  would need to be
> adjusted because of the sign change.  The fix instead avoids any
> reduction vectorization where the result isn't compatible
> to the original scalar type since we don't compensate for that
> either.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
>
> 2021-08-04  Richard Biener  
>
> PR tree-optimization/101756
> * tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Make sure
> the result of the reduction epilogue is compatible to the original
> scalar result.
>
> * gcc.dg/vect/bb-slp-pr101756.c: New testcase.
>

Hi,

The new testcase fails on aarch64 because:
 FAIL: gcc.dg/vect/bb-slp-pr101756.c (test for excess errors)
Excess errors:
/gcc/testsuite/gcc.dg/vect/bb-slp-pr101756.c:4:1: warning: GCC does not
currently support mixed size types for 'simd' functions

Can you check?

Thanks

Christophe




> ---
>  gcc/testsuite/gcc.dg/vect/bb-slp-pr101756.c | 15 +++
>  gcc/tree-vect-slp.c |  8 +---
>  2 files changed, 20 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr101756.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr101756.c
> b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101756.c
> new file mode 100644
> index 000..9420e77f64e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr101756.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +
> +__attribute__ ((simd)) int
> +tq (long int ea, int of, int kk)
> +{
> +  int bc;
> +
> +  for (bc = 0; bc < 2; ++bc)
> +{
> +  ++ea;
> +  of |= !!kk < !!ea;
> +}
> +
> +  return of;
> +}
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index a554c24e0fb..d169bed8e94 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -4847,15 +4847,17 @@ static bool
>  vectorizable_bb_reduc_epilogue (slp_instance instance,
> stmt_vector_for_cost *cost_vec)
>  {
> -  enum tree_code reduc_code
> -= gimple_assign_rhs_code (instance->root_stmts[0]->stmt);
> > +  gassign *stmt = as_a <gassign *> (instance->root_stmts[0]->stmt);
> +  enum tree_code reduc_code = gimple_assign_rhs_code (stmt);
>if (reduc_code == MINUS_EXPR)
>  reduc_code = PLUS_EXPR;
>internal_fn reduc_fn;
>tree vectype = SLP_TREE_VECTYPE (SLP_INSTANCE_TREE (instance));
> >if (!reduction_fn_for_scalar_code (reduc_code, &reduc_fn)
>|| reduc_fn == IFN_LAST
> -  || !direct_internal_fn_supported_p (reduc_fn, vectype,
> OPTIMIZE_FOR_BOTH))
> +  || !direct_internal_fn_supported_p (reduc_fn, vectype,
> OPTIMIZE_FOR_BOTH)
> +  || !useless_type_conversion_p (TREE_TYPE (gimple_assign_lhs (stmt)),
> +TREE_TYPE (vectype)))
>  return false;
>
>/* There's no way to cost a horizontal vector reduction via REDUC_FN so
> --
> 2.31.1
>


Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.

2021-08-05 Thread Richard Biener via Gcc-patches
On Thu, Aug 5, 2021 at 9:25 AM Hongtao Liu  wrote:
>
> On Wed, Aug 4, 2021 at 7:28 PM Richard Biener
>  wrote:
> >
> > On Wed, Aug 4, 2021 at 4:39 AM Hongtao Liu  wrote:
> > >
> > > On Mon, Aug 2, 2021 at 2:31 PM liuhongt  wrote:
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
> > > > * config/i386/i386.c (enum x86_64_reg_class): Add
> > > > X86_64_SSEHF_CLASS.
> > > > (merge_classes): Handle X86_64_SSEHF_CLASS.
> > > > (examine_argument): Ditto.
> > > > (construct_container): Ditto.
> > > > (classify_argument): Ditto, and set HFmode/HCmode to
> > > > X86_64_SSEHF_CLASS.
> > > > (function_value_32): Return _Float16/Complex Float16 by
> > > > %xmm0.
> > > > (function_value_64): Return _Float16/Complex Float16 by SSE
> > > > register.
> > > > (ix86_print_operand): Handle CONST_DOUBLE HFmode.
> > > > (ix86_secondary_reload): Require gpr as intermediate register
> > > > to store _Float16 from sse register when sse4 is not
> > > > available.
> > > > (ix86_libgcc_floating_mode_supported_p): Enable _Float16 under
> > > > sse2.
> > > > (ix86_scalar_mode_supported_p): Ditto.
> > > > (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined.
> > > > * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
> > > > (VALID_INT_MODE_P): Add HFmode and HCmode.
> > > > * config/i386/i386.md (*pushhf_rex64): New define_insn.
> > > > (*pushhf): Ditto.
> > > > (*movhf_internal): Ditto.
> > > > * doc/extend.texi (Half-Precision Floating Point): Document
> > > > _Float16 for x86.
> > > > * emit-rtl.c (validate_subreg): Allow (subreg:SI (reg:HF) 0)
> > > > which is used by extract_bit_field but not backends.
> > > >
> > [...]
> > >
> > > Ping, i'd like to ask for approval for the below codes which is
> > > related to generic part.
> > >
> > > start from ..
> > > > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> > > > index ff3b4449b37..775ee397836 100644
> > > > --- a/gcc/emit-rtl.c
> > > > +++ b/gcc/emit-rtl.c
> > > > @@ -928,6 +928,11 @@ validate_subreg (machine_mode omode, machine_mode 
> > > > imode,
> > > >   fix them all.  */
> > > >if (omode == word_mode)
> > > >  ;
> > > > +  /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI 
> > > > (reg:HF))
> > > > + here. Though extract_bit_field is the culprit here, not the 
> > > > backends.  */
> > > > +  else if (known_gt (regsize, osize) && known_gt (osize, isize)
> > > > +  && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
> > > > +;
> > > >/* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though 
> > > > store_bit_field
> > > >   is the culprit here, and not the backends.  */
> > > >else if (known_ge (osize, regsize) && known_ge (isize, osize))
> > >
> > > and end here.
> >
> > So the main restriction otherwise in place is
> >
> >   /* Subregs involving floating point modes are not allowed to
> >  change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
> >  (subreg:SI (reg:DF) 0) isn't.  */
> >   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> > {
> >   if (! (known_eq (isize, osize)
> >  /* LRA can use subreg to store a floating point value in
> > an integer mode.  Although the floating point and the
> > integer modes need the same number of hard registers,
> > the size of floating point mode can be less than the
> > integer mode.  LRA also uses subregs for a register
> > should be used in different mode in on insn.  */
> >  || lra_in_progress))
> > return false;
> >
> > I'm not sure if it would be possible to do (subreg:SI (subreg:HI (reg:HF)))
>
> After debug, I find (subreg:SI (reg:HF)) is not really needed, it
> would be finally handled by below cut
> cut-
>   /* Find a correspondingly-sized integer field, so we can apply
>  shifts and masks to it.  */
>   scalar_int_mode int_mode;
>   if (!int_mode_for_mode (tmode).exists (&int_mode))
> /* If this fails, we should probably push op0 out to memory and then
>do a load.  */
> int_mode = int_mode_for_mode (mode).require ();
>
>   target = extract_fixed_bit_field (int_mode, op0, op0_mode, bitsize,
> bitnum, target, unsignedp, reverse);
> -end
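The fallback quoted above — find a correspondingly-sized integer mode, then apply shifts and masks — can be modeled in plain C. This is an illustrative sketch only (the names below are not GCC internals), assuming the field lives in a 32-bit integer container:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the shift-and-mask fallback: pull BITSIZE bits starting
   at bit BITNUM out of a 32-bit integer "mode" value.  */
static uint32_t
extract_bits (uint32_t word, unsigned bitnum, unsigned bitsize)
{
  uint32_t mask = bitsize < 32 ? ((uint32_t) 1 << bitsize) - 1
                               : ~(uint32_t) 0;
  return (word >> bitnum) & mask;
}
```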
>
> and generate things like below cut
>
> ---cut
> (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> (insn 6 3 7 2 (parallel [
> (set (reg:HI 86)
> (and:HI (subreg:HI (reg/v:SI 83 [ a ]) 0)
> (const_int -1 [0x])))
> (clobber (reg:CC 17 flags))
> ]) 
> "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":11:11
> -1
>  (nil))
> (insn 7 6 11 2 (set (reg:HF 82 [  ])
> 

Re: [PATCH] RISC-V: Allow unaligned accesses in cpymemsi expansion

2021-08-05 Thread Christoph Müllner via Gcc-patches
Ping.

On Thu, Jul 29, 2021 at 4:33 PM Christoph Muellner
 wrote:
>
> The RISC-V cpymemsi expansion is called whenever the by-pieces
> infrastructure will not be taking care of the builtin expansion.
> Currently, that's the case for e.g. memcpy() with n <= 24 bytes.
> The by-pieces infrastructure emits code that performs unaligned
> accesses if the target's riscv_slow_unaligned_access_p is false
> (and n is not 1).
>
> If n > 24, then the RISC-V cpymemsi expansion is called, which is
> implemented in riscv_expand_block_move(). The current implementation
> does not check riscv_slow_unaligned_access_p and never emits unaligned
> accesses.
>
> Since by-pieces emits unaligned accesses, it is reasonable to implement
> the same behaviour in the cpymemsi expansion. And that's what this patch
> is doing.
>
> The patch checks riscv_slow_unaligned_access_p at the entry and sets
> the allowed alignment accordingly. This alignment is then propagated
> down to the routines that emit the actual instructions.
>
> Without the patch, a memcpy() with n==25 will be expanded only
> if the given pointers are aligned. With the patch, unaligned
> pointers are also accepted if riscv_slow_unaligned_access_p is false.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.c (riscv_block_move_straight): Add
> parameter align.
> (riscv_adjust_block_mem): Replace parameter length by parameter
> align.
> (riscv_block_move_loop): Add parameter align.
> (riscv_expand_block_move): Set alignment properly if the target
> has fast unaligned access.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/builtins-strict-align.c: New test.
> * gcc.target/riscv/builtins-unaligned-1.c: New test.
> * gcc.target/riscv/builtins-unaligned-2.c: New test.
> * gcc.target/riscv/builtins-unaligned-3.c: New test.
> * gcc.target/riscv/builtins-unaligned-4.c: New test.
> * gcc.target/riscv/builtins.h: New test.
>
> Signed-off-by: Christoph Muellner 
> ---
>  gcc/config/riscv/riscv.c  | 53 +++
>  .../gcc.target/riscv/builtins-strict-align.c  | 13 +
>  .../gcc.target/riscv/builtins-unaligned-1.c   | 15 ++
>  .../gcc.target/riscv/builtins-unaligned-2.c   | 15 ++
>  .../gcc.target/riscv/builtins-unaligned-3.c   | 15 ++
>  .../gcc.target/riscv/builtins-unaligned-4.c   | 15 ++
>  gcc/testsuite/gcc.target/riscv/builtins.h | 10 
>  7 files changed, 115 insertions(+), 21 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/builtins-strict-align.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/builtins-unaligned-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/builtins-unaligned-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/builtins-unaligned-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/builtins-unaligned-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/builtins.h
>
> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> index 576960bb37c..0596a9ff1b6 100644
> --- a/gcc/config/riscv/riscv.c
> +++ b/gcc/config/riscv/riscv.c
> @@ -3173,11 +3173,13 @@ riscv_legitimize_call_address (rtx addr)
>return addr;
>  }
>
> -/* Emit straight-line code to move LENGTH bytes from SRC to DEST.
> +/* Emit straight-line code to move LENGTH bytes from SRC to DEST
> +   with accesses that are ALIGN bytes aligned.
> Assume that the areas do not overlap.  */
>
>  static void
> -riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length)
> +riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
> +  unsigned HOST_WIDE_INT align)
>  {
>unsigned HOST_WIDE_INT offset, delta;
>unsigned HOST_WIDE_INT bits;
> @@ -3185,8 +3187,7 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned 
> HOST_WIDE_INT length)
>enum machine_mode mode;
>rtx *regs;
>
> -  bits = MAX (BITS_PER_UNIT,
> - MIN (BITS_PER_WORD, MIN (MEM_ALIGN (src), MEM_ALIGN (dest;
> +  bits = MAX (BITS_PER_UNIT, MIN (BITS_PER_WORD, align));
>
>mode = mode_for_size (bits, MODE_INT, 0).require ();
>delta = bits / BITS_PER_UNIT;
> @@ -3211,21 +3212,20 @@ riscv_block_move_straight (rtx dest, rtx src, 
> unsigned HOST_WIDE_INT length)
>  {
>src = adjust_address (src, BLKmode, offset);
>dest = adjust_address (dest, BLKmode, offset);
> -  move_by_pieces (dest, src, length - offset,
> - MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), RETURN_BEGIN);
> +  move_by_pieces (dest, src, length - offset, align, RETURN_BEGIN);
>  }
>  }
>
>  /* Helper function for doing a loop-based block operation on memory
> -   reference MEM.  Each iteration of the loop will operate on LENGTH
> -   bytes of MEM.
> +   reference MEM.
>
> Create a new base register for use within the loop and point it to
> the start of MEM.  Create a new memory reference that uses this
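The access-width computation in riscv_block_move_straight, bits = MAX (BITS_PER_UNIT, MIN (BITS_PER_WORD, align)), clamps the chunk size between one byte and one word. A standalone model of that clamp (BITS_PER_WORD set to 64 here purely for illustration, matching rv64):

```c
#include <assert.h>

#define BITS_PER_UNIT 8
#define BITS_PER_WORD 64  /* illustrative: rv64 word size */

/* Chunk width in bits for a straight-line block move whose accesses
   are known to be ALIGN_BITS-aligned: at most a word, at least a byte.  */
static unsigned chunk_bits (unsigned align_bits)
{
  unsigned bits = align_bits < BITS_PER_WORD ? align_bits : BITS_PER_WORD;
  return bits > BITS_PER_UNIT ? bits : BITS_PER_UNIT;
}
```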

Re: [PATCH] RISC-V: Enable overlap-by-pieces in case of fast unaliged access

2021-08-05 Thread Christoph Müllner via Gcc-patches
Ping.

On Thu, Jul 29, 2021 at 9:36 PM Christoph Müllner  wrote:
>
> On Thu, Jul 29, 2021 at 8:54 PM Palmer Dabbelt  wrote:
> >
> > On Tue, 27 Jul 2021 02:32:12 PDT (-0700), cmuell...@gcc.gnu.org wrote:
> > > Ok, so if I understand correctly Palmer and Andrew prefer
> > > overlap_op_by_pieces to be controlled
> > > by its own field in the riscv_tune_param struct and not by the field
> > > slow_unaligned_access in this struct
> > > (i.e. slow_unaligned_access==false is not enough to imply
> > > overlap_op_by_pieces==true).
> >
> > I guess, but I'm not really worried about this at that level of detail
> > right now.  It's not like the tune structures form any sort of external
> > interface we have to keep stable, we can do whatever we want with those
> > fields so I'd just aim for encoding the desired behavior as simply as
> > possible rather than trying to build something extensible.
> >
> > There are really two questions we need to answer: is this code actually
> > faster for the C906, and is this what the average user wants under -Os.
>
> I never mentioned -Os.
> My main goal is code compiled for -O2, -O3 or even -Ofast.
> And I want to execute code as fast as possible.
>
> Loading hot data from the cache is faster when done by a single
> load-word instruction rather than four load-byte instructions.
> Fewer instructions mean less pressure on the instruction cache.
> Fewer instructions mean less work for the CPU pipeline.
> Architectures that don't have a penalty for unaligned accesses
> therefore observe a performance benefit.
>
> What I understand from Andrew's email is that it is not that simple
> and implementation might have a penalty for overlapping accesses
> that is high enough to avoid them. I don't have the details for C906,
> so I can't say if that's the case.
>
> > That first one is pretty easy: just running those simple code sequences
> > under a sweep of page offsets should be sufficient to determine if this
> > is always faster (in which case it's an easy yes), if it's always slower
> > (an easy no), or if there's some slow cases like page/cache line
> > crossing (in which case we'd need to think a bit).
> >
> > The second one is a bit tricker.  In the past we'd said these sort of
> > "actively misalign accesses to generate smaller code" sort of thing
> > isn't suitable for -Os (as most machines still have very slow unaligned
> > accesses) but is suitable for -Oz (don't remember if that ever ended up
> > in GCC, though).  That still seems like a reasonable decision, but if it
> > turns out that implementations with fast unaligned accesses become the
> > norm then it'd probably be worth revisiting it.  Not sure exactly how to
> > determine that tipping point, but I think we're a long way away from it
> > right now.
> >
> > IMO it's really just premature to try and design an encoding of the
> > tuning paramaters until we have an idea of what they are, as we'll just
> > end up devolving down the path of trying to encode all possible hardware
> > and that's generally a huge waste of time.  Since there's no ABI here we
> > can refactor this however we want as new tunings show up.
>
> I guess you mean that there needs to be a clear benefit for a supported
> machine in GCC. Either obviously (see below), by measurement results,
> or by decision
> of the machine's maintainer (especially if the decision is a trade-off).
>
> >
> > > I don't have access to pipeline details that give proof that there are 
> > > cases
> > > where this patch causes a performance penalty.
> > >
> > > So, I leave this here as a summary for someone who has enough information 
> > > and
> > > interest to move this forward:
> > > * the original patch should be sufficient, but does not have tests:
> > >   https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575791.html
> > > * the tests can be taken from this patch:
> > >   https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575864.html
> > >   Note, that there is a duplicated "sw" in builtins-overlap-6.c, which
> > > should be a "sd".
> > >
> > > Thanks for the feedback!
> >
> > Cool.  Looks like the C906 is starting to show up in the real world, so
> > we should be able to find someone who has access to one and cares enough
> > to at least run some simple benchmarks of these code sequences.  IMO
> > that's a pretty low interest bar, so I don't see any harm in waiting --
> > when the hardware is common then I'm sure someone will care enough to
> > give this a shot, and until then it's not really impacting anyone either
> > way.
> >
> > The -Os thing is a bigger discussion, and while I'm happy to have it I
> > don't really think we're even close to these being common enough yet.  I
> > saw your memmove patch and think the same rationale might apply there,
> > but I haven't looked closely and won't have time to for a bit as I've
> > got to get around to the other projects.
>
> The cpymemsi patch is also targeting -O2 or higher for fast code execution.
> And it is one of the cases 
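For reference, what overlap_op_by_pieces buys (a hand-written model, not a GCC-generated sequence): on a target with fast unaligned accesses, a 7-byte copy can be done with two overlapping 4-byte moves instead of a 4-, a 2- and a 1-byte piece:

```c
#include <assert.h>
#include <string.h>

/* Copy exactly 7 bytes with two 4-byte accesses whose middle byte
   overlaps: [0..3] and [3..6].  */
static void copy7 (unsigned char *dst, const unsigned char *src)
{
  memcpy (dst, src, 4);          /* bytes 0-3 */
  memcpy (dst + 3, src + 3, 4);  /* bytes 3-6, overlapping byte 3 */
}
```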

Re: [PATCH 1/7] dwarf: externalize lookup_type_die

2021-08-05 Thread Richard Biener via Gcc-patches
On Wed, Aug 4, 2021 at 7:55 PM David Faust via Gcc-patches
 wrote:
>
> Expose the function lookup_type_die in dwarf2out, so that it can be used
> by CTF/BTF when adding BPF CO-RE information. The function is now
> non-static, and an extern prototype is added in dwarf2out.h.

OK.

> gcc/ChangeLog:
>
> * dwarf2out.c (lookup_type_die): Function is no longer static.
> * dwarf2out.h: Expose it here.
> ---
>  gcc/dwarf2out.c | 3 +--
>  gcc/dwarf2out.h | 1 +
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index 1022fb75315..f32084c3eaf 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -3740,7 +3740,6 @@ static bool remove_AT (dw_die_ref, enum 
> dwarf_attribute);
>  static void remove_child_TAG (dw_die_ref, enum dwarf_tag);
>  static void add_child_die (dw_die_ref, dw_die_ref);
>  static dw_die_ref new_die (enum dwarf_tag, dw_die_ref, tree);
> -static dw_die_ref lookup_type_die (tree);
>  static dw_die_ref strip_naming_typedef (tree, dw_die_ref);
>  static dw_die_ref lookup_type_die_strip_naming_typedef (tree);
>  static void equate_type_number_to_die (tree, dw_die_ref);
> @@ -5838,7 +5837,7 @@ new_die (enum dwarf_tag tag_value, dw_die_ref 
> parent_die, tree t)
>
>  /* Return the DIE associated with the given type specifier.  */
>
> -static inline dw_die_ref
> +dw_die_ref
>  lookup_type_die (tree type)
>  {
>dw_die_ref die = TYPE_SYMTAB_DIE (type);
> diff --git a/gcc/dwarf2out.h b/gcc/dwarf2out.h
> index b2152a53bf9..312a9909784 100644
> --- a/gcc/dwarf2out.h
> +++ b/gcc/dwarf2out.h
> @@ -417,6 +417,7 @@ extern dw_die_ref new_die_raw (enum dwarf_tag);
>  extern dw_die_ref base_type_die (tree, bool);
>
>  extern dw_die_ref lookup_decl_die (tree);
> +extern dw_die_ref lookup_type_die (tree);
>
>  extern dw_die_ref dw_get_die_child (dw_die_ref);
>  extern dw_die_ref dw_get_die_sib (dw_die_ref);
> --
> 2.32.0
>


Re: [PATCH v3] Make loops_list support an optional loop_p root

2021-08-05 Thread Kewen.Lin via Gcc-patches
on 2021/8/4 8:04 PM, Richard Biener wrote:
> On Wed, Aug 4, 2021 at 12:47 PM Kewen.Lin  wrote:
>>
>> on 2021/8/4 6:01 PM, Richard Biener wrote:
>>> On Wed, Aug 4, 2021 at 4:36 AM Kewen.Lin  wrote:

 on 2021/8/3 8:08 PM, Richard Biener wrote:
> On Fri, Jul 30, 2021 at 7:20 AM Kewen.Lin  wrote:
>>
>> on 2021/7/29 4:01 PM, Richard Biener wrote:
>>> On Fri, Jul 23, 2021 at 10:41 AM Kewen.Lin  wrote:

 on 2021/7/22 8:56 PM, Richard Biener wrote:
> On Tue, Jul 20, 2021 at 4:37
> PM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> This v2 has addressed some review comments/suggestions:
>>
>>   - Use "!=" instead of "<" in function operator!= (const Iter )
>>   - Add new CTOR loops_list (struct loops *loops, unsigned flags)
>> to support loop hierarchy tree rather than just a function,
>> and adjust to use loops* accordingly.
>
> I actually meant struct loop *, not struct loops * ;)  At the point
> we pondered to make loop invariant motion work on single
> loop nests we gave up not only but also because it iterates
> over the loop nest but all the iterators only ever can process
> all loops, not say, all loops inside a specific 'loop' (and
> including that 'loop' if LI_INCLUDE_ROOT).  So the
> CTOR would take the 'root' of the loop tree as argument.
>
> I see that doesn't trivially fit how loops_list works, at least
> not for LI_ONLY_INNERMOST.  But I guess FROM_INNERMOST
> could be adjusted to do ONLY_INNERMOST as well?
>


 Thanks for the clarification!  I just realized that the previous
 version with struct loops* is problematic, all traversal is
 still bounded with outer_loop == NULL.  I think what you expect
 is to respect the given loop_p root boundary.  Since we just
 record the loops' nums, I think we still need the function* fn?
>>>
>>> Would it simplify things if we recorded the actual loop *?
>>>
>>
>> I'm afraid it's unsafe to record the loop*.  I had the same
>> question why the loop iterator uses index rather than loop* when
>> I read this at the first time.  I guess the design of processing
>> loops allows its user to update or even delete the following
>> loops to be visited.  For example, when the user does some tricks
>> on one loop, then it duplicates the loop and its children to
>> somewhere and then removes the loop and its children, when
>> iterating onto its children later, the "index" way will check its
>> validity by get_loop at that point, but the "loop *" way will
>> have some recorded pointers to become dangling, can't do the
>> validity check on itself, seems to need a side linear search to
>> ensure the validity.
>>
>>> There's still the to_visit reserve which needs a bound on
>>> the number of loops for efficiency reasons.
>>>
>>
>> Yes, I still keep the fn in the updated version.
>>
 So I add one optional argument loop_p root and update the
 visiting codes accordingly.  Before this change, the previous
 visiting uses the outer_loop == NULL as the termination condition,
 it perfectly includes the root itself, but with this given root,
 we have to use it as the termination condition to avoid to iterate
 onto its possible existing next.

 For LI_ONLY_INNERMOST, I was thinking whether we can use the
 code like:

 struct loops *fn_loops = loops_for_fn (fn)->larray;
> for (i = 0; vec_safe_iterate (fn_loops, i, &aloop); i++)
 if (aloop != NULL
 && aloop->inner == NULL
 && flow_loop_nested_p (tree_root, aloop))
  this->to_visit.quick_push (aloop->num);

 it has the stable bound, but if the given root only has several
 child loops, it can be much worse if there are many loops in fn.
 It seems impossible to predict the given root loop hierarchy size,
 maybe we can still use the original linear searching for the case
 loops_for_fn (fn) == root?  But since this visiting seems not so
 performance critical, I chose to share the code originally used
 for FROM_INNERMOST, hope it can have better readability and
 maintainability.
>>>
>>> I was indeed looking for something that has execution/storage
>>> bound on the subtree we're interested in.  If we pull the CTOR
>>> out-of-line we can probably keep the linear search for
>>> LI_ONLY_INNERMOST when looking at the whole loop tree.
>>>
>>
>> OK, I've moved the suggested single loop tree walker out-of-line
>> to cfgloop.c, and brought the linear search back for
>> LI_ONLY_INNERMOST when looking at the whole 
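The semantics under discussion — an LI_ONLY_INNERMOST walk bounded to the subtree of a given root loop, instead of a linear scan over all of loops_for_fn — can be sketched with a toy loop tree. This is a model only; as noted above, the real iterator records loop numbers and revalidates them via get_loop:

```c
#include <stddef.h>

struct loop { struct loop *inner; struct loop *next; };

/* Count innermost loops in the subtree rooted at ROOT, visiting only
   that subtree (ROOT's siblings are never touched).  */
static int count_innermost (const struct loop *root)
{
  if (!root)
    return 0;
  if (!root->inner)
    return 1;
  int n = 0;
  for (const struct loop *c = root->inner; c; c = c->next)
    n += count_innermost (c);
  return n;
}
```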

[PATCH] sanitizer: cherry pick 414482751452e54710f16bae58458c66298aaf69

2021-08-05 Thread Martin Liška

I'm going to push the commit to all active branches (except master).

The patch is needed in order to support recent glibc (2.34).

libsanitizer/ChangeLog:

PR sanitizer/101749
* sanitizer_common/sanitizer_posix_libcdep.cpp: Prevent
generation of dependency on _cxa_guard for static
initialization.
---
 libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cpp | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cpp 
b/libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cpp
index 7ff48c35851..a65b16f5290 100644
--- a/libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cpp
+++ b/libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cpp
@@ -166,9 +166,10 @@ bool SupportsColoredOutput(fd_t fd) {
 #if !SANITIZER_GO
 // TODO(glider): different tools may require different altstack size.
 static uptr GetAltStackSize() {
-  // SIGSTKSZ is not enough.
-  static const uptr kAltStackSize = SIGSTKSZ * 4;
-  return kAltStackSize;
+  // Note: since GLIBC_2.31, SIGSTKSZ may be a function call, so this may be
+  // more costly than you think. However GetAltStackSize is only called 2-3 times
+  // per thread so don't cache the evaluation.
+  return SIGSTKSZ * 4;
 }
 
 void SetAlternateSignalStack() {

--
2.32.0



Re: [PATCH] Optimize x ? bswap(x) : 0 in tree-ssa-phiopt

2021-08-05 Thread Martin Liška

Hello.

I noticed the patch caused new Clang warnings:

/home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-clang/build/gcc/tree-ssa-phiopt.c:2586:10:
 warning: comparison of different enumeration types in switch statement 
('combined_fn' and 'built_in_function') [-Wenum-compare-switch]
/home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-clang/build/gcc/tree-ssa-phiopt.c:2589:10:
 warning: comparison of different enumeration types in switch statement 
('combined_fn' and 'built_in_function') [-Wenum-compare-switch]
/home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-clang/build/gcc/tree-ssa-phiopt.c:2592:10:
 warning: comparison of different enumeration types in switch statement 
('combined_fn' and 'built_in_function') [-Wenum-compare-switch]

Can you please take a look?
Thanks,
Martin
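For context on the transform itself (the warning aside): x ? bswap (x) : 0 folds to bswap (x) because byte-swapping zero yields zero. A quick check of the identity the phiopt pattern relies on, using the GCC built-in:

```c
#include <stdint.h>

/* The x == 0 arm is redundant because __builtin_bswap32 (0) == 0,
   which is what lets phiopt collapse the PHI into a plain bswap.  */
static uint32_t cond_bswap (uint32_t x)
{
  return x ? __builtin_bswap32 (x) : 0;
}
```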


Re: [PATCH, v2, libgomp, OpenMP 5.0] Implement omp_get_device_num

2021-08-05 Thread Chung-Lin Tang




On 2021/8/3 8:22 PM, Thomas Schwinge wrote:

Hi Chung-Lin!

On 2021-08-02T21:10:57+0800, Chung-Lin Tang  wrote:

--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c



+int32_t
+omp_get_device_num_ (void)
+{
+  return omp_get_device_num ();
+}


Missing 'ialias_redirect (omp_get_device_num)'?


Regards
  Thomas



Thanks, will fix before committing.

Chung-Lin


[PATCH V2] aarch64: Don't include vec_select high-half in SIMD subtract cost

2021-08-05 Thread Jonathan Wright via Gcc-patches
Hi,

V2 of this change implements the same approach as for the multiply
and add-widen patches.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-28  Jonathan Wright  

* config/aarch64/aarch64.c: Traverse RTL tree to prevent cost
of vec_select high-half from being added into Neon subtract
cost.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vsubX_high_cost.c: New test.



From: Jonathan Wright
Sent: 29 July 2021 10:23
To: gcc-patches@gcc.gnu.org 
Cc: Richard Sandiford ; Kyrylo Tkachov 

Subject: [PATCH] aarch64: Don't include vec_select high-half in SIMD subtract 
cost 
 
Hi,

The Neon subtract-long/subract-widen instructions can select the top
or bottom half of the operand registers. This selection does not
change the cost of the underlying instruction and this should be
reflected by the RTL cost function.

This patch adds RTL tree traversal in the Neon subtract cost function
to match vec_select high-half of its operands. This traversal
prevents the cost of the vec_select from being added into the cost of
the subtract - meaning that these instructions can now be emitted in
the combine pass as they are no longer deemed prohibitively
expensive.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-28  Jonathan Wright  

    * config/aarch64/aarch64.c: Traverse RTL tree to prevent cost
    of vec_select high-half from being added into Neon subtract
    cost.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/vsubX_high_cost.c: New test.

rb14711.patch
Description: rb14711.patch
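As a reference model of what a widening subtract on the high halves computes (scalar C, illustrative only — the patch changes only the cost calculation, not code generation): elements 4..7 of each 8-element int16 vector are widened to int32 and subtracted.

```c
#include <stdint.h>

/* Scalar model of a widening subtract on the high halves of two
   8-element int16 vectors.  */
static void
subl_high (int32_t out[4], const int16_t a[8], const int16_t b[8])
{
  for (int i = 0; i < 4; i++)
    out[i] = (int32_t) a[4 + i] - (int32_t) b[4 + i];
}
```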


Fix PR tree-optimization/101626

2021-08-05 Thread Eric Botcazou
This is a regression present on the mainline, caused by an oversight of mine 
in an earlier fix for SRA, whereby I forgot to exclude cases for reverse SSO.

Tested on x86-64/Linux, applied on the mainline as obvious.


2021-08-05  Eric Botcazou  

PR tree-optimization/101626
* tree-sra.c (propagate_subaccesses_from_rhs): Do not set the
reverse scalar storage order on a pointer or vector component.


2021-08-05  Eric Botcazou  

* gcc.dg/sso-15.c: New test.

-- 
Eric Botcazou

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index c05d22f3e8f..3a9e14f50a0 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -2790,7 +2790,10 @@ propagate_subaccesses_from_rhs (struct access *lacc, struct access *racc)
 	{
 	  /* We are about to change the access type from aggregate to scalar,
 	 so we need to put the reverse flag onto the access, if any.  */
-	  const bool reverse = TYPE_REVERSE_STORAGE_ORDER (lacc->type);
+	  const bool reverse
+	= TYPE_REVERSE_STORAGE_ORDER (lacc->type)
+	  && !POINTER_TYPE_P (racc->type)
+	  && !VECTOR_TYPE_P (racc->type);
 	  tree t = lacc->base;
 
 	  lacc->type = racc->type;
/* { dg-do compile } */
/* { dg-options "-O2" } */

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define REV_ENDIANNESS __attribute__((scalar_storage_order("big-endian")))
#else
#define REV_ENDIANNESS __attribute__((scalar_storage_order("little-endian")))
#endif

struct X { int *p; } REV_ENDIANNESS;

struct X x;

struct X __attribute__((noinline)) foo (int *p)
{
  struct X x;
  x.p = p;
  return x;
}

void __attribute((noinline)) bar (void)
{
  *x.p = 1;
}

extern void abort (void);

int main (void)
{
  int i = 0;
  x = foo (&i);
  bar();
  if (i != 1)
abort ();
  return 0;
}


[PATCH] libcpp: Regenerate ucnid.h using Unicode 13.0.0 files [PR100977]

2021-08-05 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch (incremental to the makeucnid.c fix) regenerates
ucnid.h with https://www.unicode.org/Public/13.0.0/ucd/ files.

Bootstrapped/regtested on top of the previous patch (which has also
been bootstrapped/regtested alone) on x86_64-linux and i686-linux,
ok for trunk?

2021-08-04  Jakub Jelinek  

PR c++/100977
* ucnid.h: Regenerated using Unicode 13.0.0 files.

--- libcpp/ucnid.h  2021-08-04 15:53:37.436955348 +0200
+++ libcpp/ucnid.h  2021-08-04 22:55:29.010108021 +0200
@@ -391,6 +391,8 @@ static const struct ucnrange ucnranges[]
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 230, 0x07f1 },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 220, 0x07f2 },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 230, 0x07f3 },
+{   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x07fc },
+{   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 220, 0x07fd },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x0815 },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 230, 0x0819 },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x081a },
@@ -401,7 +403,11 @@ static const struct ucnrange ucnranges[]
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 230, 0x082d },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x0858 },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 220, 0x085b },
-{   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x08e3 },
+{   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x08d2 },
+{   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 220, 0x08d3 },
+{   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 230, 0x08e1 },
+{   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x08e2 },
+{   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 220, 0x08e3 },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 230, 0x08e5 },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 220, 0x08e6 },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 230, 0x08e8 },
@@ -415,7 +421,7 @@ static const struct ucnrange ucnranges[]
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 220, 0x08f6 },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 230, 0x08f8 },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 220, 0x08fa },
-{   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 230, 0x08fe },
+{   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 230, 0x08ff },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x0900 },
 { C99|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x0903 },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x0904 },
@@ -476,6 +482,8 @@ static const struct ucnrange ucnranges[]
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x09e5 },
 { C99|N99|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x09ef },
 { C99|  0|CXX|C11|  0|CID|NFC|NKC|  0,   0, 0x09f1 },
+{   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x09fd },
+{   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 230, 0x09fe },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x0a01 },
 { C99|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x0a02 },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x0a04 },
@@ -683,6 +691,8 @@ static const struct ucnrange ucnranges[]
 { C99|  0|CXX|C11|  0|CID|NFC|NKC|  0,   0, 0x0d28 },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x0d29 },
 { C99|  0|CXX|C11|  0|CID|NFC|NKC|  0,   0, 0x0d39 },
+{   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x0d3a },
+{   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   9, 0x0d3c },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x0d3d },
 { C99|  0|  0|C11|  0|CID|NFC|NKC|CTX,   0, 0x0d3e },
 { C99|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x0d43 },
@@ -754,7 +764,7 @@ static const struct ucnrange ucnranges[]
 { C99|  0|CXX|C11|  0|CID|NFC|  0|  0,   0, 0x0eb3 },
 { C99|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x0eb7 },
 { C99|  0|  0|C11|  0|CID|NFC|NKC|  0, 118, 0x0eb9 },
-{   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x0eba },
+{   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   9, 0x0eba },
 { C99|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x0ebc },
 { C99|  0|CXX|C11|  0|CID|NFC|NKC|  0,   0, 0x0ebd },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x0ebf },
@@ -890,6 +900,13 @@ static const struct ucnrange ucnranges[]
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 230, 0x1a7c },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x1a7e },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 220, 0x1a7f },
+{   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x1aaf },
+{   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 230, 0x1ab4 },
+{   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 220, 0x1aba },
+{   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 230, 0x1abc },
+{   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 220, 0x1abd },
+{   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x1abe },
+{   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 220, 0x1ac0 },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x1b05 },
 {   0|  0|  0|C11|  0|  0|NFC|NKC|  0,   0, 0x1b06 },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x1b07 },
@@ -940,6 +957,8 @@ static const struct ucnrange ucnranges[]
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 220, 0x1ced },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x1cf3 },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 230, 0x1cf4 },
+{   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x1cf7 },
+{   0|  0|  0|C11|  0|CID|NFC|NKC|  0, 230, 0x1cf9 },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x1d2b },
 {   0|  0|  0|C11|  0|CID|NFC|  0|  0,   0, 0x1d2e },
 {   0|  0|  0|C11|  

[PATCH] libcpp: Fix makeucnid bug with combining values [PR100977]

2021-08-05 Thread Jakub Jelinek via Gcc-patches
Hi!

I've noticed two adjacent lines in ucnid.h with identical flags and combine
values, which should therefore have been merged into one entry.

This is due to a bug in makeucnid.c, which records the last_flag,
last_combine and really_safe of what has just been printed but, because of
a typo, mishandles last_combine: it always compares against
combining_value[0], which is 0.

This has two effects on the table.  One is that the table is often
unnecessarily large: for non-zero .combine, every character gets its own
record instead of adjacent characters with the same flags and combine
value being merged.
The other is that sometimes the last character with combine set doesn't
actually have it in the tables: entries are printed only upon seeing the
next character, and if that character has a combining_value of 0 while the
flags are otherwise the same as what was last printed, nothing is printed
at all.

The following patch fixes that.  For clarity about what exactly it
affects, I've regenerated ucnid.h with the same Unicode files as the last
time it was regenerated.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-08-05  Jakub Jelinek  

PR c++/100977
* makeucnid.c (write_table): Fix computation of last_combine.
* ucnid.h: Regenerated using Unicode 6.3.0 files.

--- libcpp/makeucnid.c.jj   2021-01-04 10:25:53.555067123 +0100
+++ libcpp/makeucnid.c  2021-08-04 15:53:28.253082984 +0200
@@ -274,7 +274,7 @@ write_table (void)
combining_value[i - 1],
i - 1);
last_flag = flags[i];
-   last_combine = combining_value[0];
+   last_combine = combining_value[i];
really_safe = decomp[i][0] == 0;
   }
 
--- libcpp/ucnid.h.jj   2021-08-04 15:04:46.053701822 +0200
+++ libcpp/ucnid.h  2021-08-04 15:53:37.436955348 +0200
@@ -116,116 +116,52 @@ static const struct ucnrange ucnranges[]
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x02df },
 { C99|  0|  0|C11|  0|CID|NFC|  0|  0,   0, 0x02e4 },
 {   0|  0|  0|C11|  0|CID|NFC|NKC|  0,   0, 0x02ff },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 230, 0x0300 },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 230, 0x0301 },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 230, 0x0302 },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 230, 0x0303 },
 {   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 230, 0x0304 },
 {   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 230, 0x0305 },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 230, 0x0306 },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 230, 0x0307 },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 230, 0x0308 },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 230, 0x0309 },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 230, 0x030a },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 230, 0x030b },
 {   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 230, 0x030c },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 230, 0x030d },
 {   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 230, 0x030e },
 {   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 230, 0x030f },
 {   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 230, 0x0310 },
 {   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 230, 0x0311 },
 {   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 230, 0x0312 },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 230, 0x0313 },
 {   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 230, 0x0314 },
 {   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 232, 0x0315 },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 220, 0x0316 },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 220, 0x0317 },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 220, 0x0318 },
 {   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 220, 0x0319 },
 {   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 232, 0x031a },
 {   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 216, 0x031b },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 220, 0x031c },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 220, 0x031d },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 220, 0x031e },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 220, 0x031f },
 {   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 220, 0x0320 },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 202, 0x0321 },
 {   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 202, 0x0322 },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 220, 0x0323 },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 220, 0x0324 },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 220, 0x0325 },
 {   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 220, 0x0326 },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 202, 0x0327 },
 {   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 202, 0x0328 },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 220, 0x0329 },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 220, 0x032a },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 220, 0x032b },
 {   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 220, 0x032c },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 220, 0x032d },
 {   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 220, 0x032e },
 {   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 220, 0x032f },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 220, 0x0330 },
 {   0|  0|  0|C11|N11|CID|NFC|NKC|CTX, 220, 0x0331 },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 220, 0x0332 },
 {   0|  0|  0|C11|N11|CID|NFC|NKC|  0, 220, 0x0333 },
-{   0|  0|  0|C11|N11|CID|NFC|NKC|  0,   1, 

Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.

2021-08-05 Thread Hongtao Liu via Gcc-patches
On Thu, Aug 5, 2021 at 3:31 PM Hongtao Liu  wrote:
>
> On Wed, Aug 4, 2021 at 7:28 PM Richard Biener
>  wrote:
> >
> > On Wed, Aug 4, 2021 at 4:39 AM Hongtao Liu  wrote:
> > >
> > > On Mon, Aug 2, 2021 at 2:31 PM liuhongt  wrote:
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
> > > > * config/i386/i386.c (enum x86_64_reg_class): Add
> > > > X86_64_SSEHF_CLASS.
> > > > (merge_classes): Handle X86_64_SSEHF_CLASS.
> > > > (examine_argument): Ditto.
> > > > (construct_container): Ditto.
> > > > (classify_argument): Ditto, and set HFmode/HCmode to
> > > > X86_64_SSEHF_CLASS.
> > > > (function_value_32): Return _Float16/Complex Float16 by
> > > > %xmm0.
> > > > (function_value_64): Return _Float16/Complex Float16 by SSE
> > > > register.
> > > > (ix86_print_operand): Handle CONST_DOUBLE HFmode.
> > > > (ix86_secondary_reload): Require gpr as intermediate register
> > > > to store _Float16 from sse register when sse4 is not
> > > > available.
> > > > (ix86_libgcc_floating_mode_supported_p): Enable _Float16 under
> > > > sse2.
> > > > (ix86_scalar_mode_supported_p): Ditto.
> > > > (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined.
> > > > * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
> > > > (VALID_INT_MODE_P): Add HFmode and HCmode.
> > > > * config/i386/i386.md (*pushhf_rex64): New define_insn.
> > > > (*pushhf): Ditto.
> > > > (*movhf_internal): Ditto.
> > > > * doc/extend.texi (Half-Precision Floating Point): Document
> > > > _Float16 for x86.
> > > > * emit-rtl.c (validate_subreg): Allow (subreg:SI (reg:HF) 0)
> > > > which is used by extract_bit_field but not backends.
> > > >
> > [...]
> > >
> > > Ping, i'd like to ask for approval for the below codes which is
> > > related to generic part.
> > >
> > > start from ..
> > > > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> > > > index ff3b4449b37..775ee397836 100644
> > > > --- a/gcc/emit-rtl.c
> > > > +++ b/gcc/emit-rtl.c
> > > > @@ -928,6 +928,11 @@ validate_subreg (machine_mode omode, machine_mode 
> > > > imode,
> > > >   fix them all.  */
> > > >if (omode == word_mode)
> > > >  ;
> > > > +  /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI 
> > > > (reg:HF))
> > > > + here. Though extract_bit_field is the culprit here, not the 
> > > > backends.  */
> > > > +  else if (known_gt (regsize, osize) && known_gt (osize, isize)
> > > > +  && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
> > > > +;
> > > >/* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though 
> > > > store_bit_field
> > > >   is the culprit here, and not the backends.  */
> > > >else if (known_ge (osize, regsize) && known_ge (isize, osize))
> > >
> > > and end here.
> >
> > So the main restriction otherwise in place is
> >
> >   /* Subregs involving floating point modes are not allowed to
> >  change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
> >  (subreg:SI (reg:DF) 0) isn't.  */
> >   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> > {
> >   if (! (known_eq (isize, osize)
> >  /* LRA can use subreg to store a floating point value in
> > an integer mode.  Although the floating point and the
> > integer modes need the same number of hard registers,
> > the size of floating point mode can be less than the
> > integer mode.  LRA also uses subregs for a register
> > should be used in different mode in on insn.  */
> >  || lra_in_progress))
> > return false;
> >
> > I'm not sure if it would be possible to do (subreg:SI (subreg:HI (reg:HF)))
>
> After debugging, I found (subreg:SI (reg:HF)) is not really needed; it
> is ultimately handled by the code below:
> ---cut---
>   /* Find a correspondingly-sized integer field, so we can apply
>  shifts and masks to it.  */
>   scalar_int_mode int_mode;
>   if (!int_mode_for_mode (tmode).exists (&int_mode))
> /* If this fails, we should probably push op0 out to memory and then
>do a load.  */
> int_mode = int_mode_for_mode (mode).require ();
>
>   target = extract_fixed_bit_field (int_mode, op0, op0_mode, bitsize,
> bitnum, target, unsignedp, reverse);
> ---end---
>
> and generates RTL like the following:
>
> ---cut---
> (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> (insn 6 3 7 2 (parallel [
> (set (reg:HI 86)
> (and:HI (subreg:HI (reg/v:SI 83 [ a ]) 0)
> (const_int -1 [0x])))
> (clobber (reg:CC 17 flags))
> ]) 
> "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":11:11
> -1
>  (nil))
> (insn 7 6 11 2 (set (reg:HF 82 [  ])
> 

Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.

2021-08-05 Thread Hongtao Liu via Gcc-patches
On Wed, Aug 4, 2021 at 7:28 PM Richard Biener
 wrote:
>
> On Wed, Aug 4, 2021 at 4:39 AM Hongtao Liu  wrote:
> >
> > On Mon, Aug 2, 2021 at 2:31 PM liuhongt  wrote:
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
> > > * config/i386/i386.c (enum x86_64_reg_class): Add
> > > X86_64_SSEHF_CLASS.
> > > (merge_classes): Handle X86_64_SSEHF_CLASS.
> > > (examine_argument): Ditto.
> > > (construct_container): Ditto.
> > > (classify_argument): Ditto, and set HFmode/HCmode to
> > > X86_64_SSEHF_CLASS.
> > > (function_value_32): Return _Float16/Complex Float16 by
> > > %xmm0.
> > > (function_value_64): Return _Float16/Complex Float16 by SSE
> > > register.
> > > (ix86_print_operand): Handle CONST_DOUBLE HFmode.
> > > (ix86_secondary_reload): Require gpr as intermediate register
> > > to store _Float16 from sse register when sse4 is not
> > > available.
> > > (ix86_libgcc_floating_mode_supported_p): Enable _Float16 under
> > > sse2.
> > > (ix86_scalar_mode_supported_p): Ditto.
> > > (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined.
> > > * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
> > > (VALID_INT_MODE_P): Add HFmode and HCmode.
> > > * config/i386/i386.md (*pushhf_rex64): New define_insn.
> > > (*pushhf): Ditto.
> > > (*movhf_internal): Ditto.
> > > * doc/extend.texi (Half-Precision Floating Point): Document
> > > _Float16 for x86.
> > > * emit-rtl.c (validate_subreg): Allow (subreg:SI (reg:HF) 0)
> > > which is used by extract_bit_field but not backends.
> > >
> [...]
> >
> > Ping, i'd like to ask for approval for the below codes which is
> > related to generic part.
> >
> > start from ..
> > > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> > > index ff3b4449b37..775ee397836 100644
> > > --- a/gcc/emit-rtl.c
> > > +++ b/gcc/emit-rtl.c
> > > @@ -928,6 +928,11 @@ validate_subreg (machine_mode omode, machine_mode 
> > > imode,
> > >   fix them all.  */
> > >if (omode == word_mode)
> > >  ;
> > > +  /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI 
> > > (reg:HF))
> > > + here. Though extract_bit_field is the culprit here, not the 
> > > backends.  */
> > > +  else if (known_gt (regsize, osize) && known_gt (osize, isize)
> > > +  && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
> > > +;
> > >/* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though 
> > > store_bit_field
> > >   is the culprit here, and not the backends.  */
> > >else if (known_ge (osize, regsize) && known_ge (isize, osize))
> >
> > and end here.
>
> So the main restriction otherwise in place is
>
>   /* Subregs involving floating point modes are not allowed to
>  change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
>  (subreg:SI (reg:DF) 0) isn't.  */
>   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> {
>   if (! (known_eq (isize, osize)
>  /* LRA can use subreg to store a floating point value in
> an integer mode.  Although the floating point and the
> integer modes need the same number of hard registers,
> the size of floating point mode can be less than the
> integer mode.  LRA also uses subregs for a register
> should be used in different mode in on insn.  */
>  || lra_in_progress))
> return false;
>
> I'm not sure if it would be possible to do (subreg:SI (subreg:HI (reg:HF)))

After debugging, I found (subreg:SI (reg:HF)) is not really needed; it
is ultimately handled by the code below:
---cut---
  /* Find a correspondingly-sized integer field, so we can apply
 shifts and masks to it.  */
  scalar_int_mode int_mode;
  if (!int_mode_for_mode (tmode).exists (&int_mode))
/* If this fails, we should probably push op0 out to memory and then
   do a load.  */
int_mode = int_mode_for_mode (mode).require ();

  target = extract_fixed_bit_field (int_mode, op0, op0_mode, bitsize,
bitnum, target, unsignedp, reverse);
---end---

and generates RTL like the following:

---cut---
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (parallel [
(set (reg:HI 86)
(and:HI (subreg:HI (reg/v:SI 83 [ a ]) 0)
(const_int -1 [0x])))
(clobber (reg:CC 17 flags))
]) 
"../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":11:11
-1
 (nil))
(insn 7 6 11 2 (set (reg:HF 82 [  ])
(subreg:HF (reg:HI 86) 0))
"../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":11:11
-1
 (nil))
(insn 11 7 12 2 (set (reg/i:HF 20 xmm0)
(reg:HF 82 [  ]))
"../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":12:1
-1