Re: [PATCH v1 5/7] RISC-V: Add testcases for unsigned .SAT_ADD vector form 6

2024-06-18 Thread ??????
lgtm

 --Reply to Message--
 On Mon, Jun 17, 2024 22:34 PM pan2.li

Re: [PATCH v1 7/7] RISC-V: Add testcases for unsigned .SAT_ADD vector form 8

2024-06-18 Thread ??????
lgtm

 --Reply to Message--
 On Mon, Jun 17, 2024 22:34 PM pan2.li

Re: [PATCH v1 4/7] RISC-V: Add testcases for unsigned .SAT_ADD vector form 5

2024-06-18 Thread ??????
lgtm

 --Reply to Message--
 On Mon, Jun 17, 2024 22:34 PM pan2.li

Re: [PATCH v1 6/7] RISC-V: Add testcases for unsigned .SAT_ADD vector form 7

2024-06-18 Thread ??????
lgtm

 --Reply to Message--
 On Mon, Jun 17, 2024 22:34 PM pan2.li

Re: [PATCH v1 3/7] RISC-V: Add testcases for unsigned .SAT_ADD vector form 4

2024-06-18 Thread ??????
lgtm

 --Reply to Message--
 On Mon, Jun 17, 2024 22:34 PM pan2.li

Re: [PATCH v1 2/7] RISC-V: Add testcases for unsigned .SAT_ADD vector form 3

2024-06-18 Thread ??????
lgtm

 --Reply to Message--
 On Mon, Jun 17, 2024 22:34 PM pan2.li

Re: [PATCH v1 1/7] RISC-V: Add testcases for unsigned .SAT_ADD vector form 2

2024-06-18 Thread ??????
lgtm

 --Reply to Message--
 On Mon, Jun 17, 2024 22:34 PM pan2.li

Re: [PATCH v1 2/2] RISC-V: Add testcases for unsigned .SAT_SUB scalar form 12

2024-06-18 Thread ??????
lgtm

 --Reply to Message--
 On Tue, Jun 18, 2024 16:25 PM Li, Pan2

Re: [PATCH v1 1/2] RISC-V: Add testcases for unsigned .SAT_SUB scalar form 11

2024-06-18 Thread ??????
lgtm

 --Reply to Message--
 On Tue, Jun 18, 2024 16:25 PM Li, Pan2

Re: [PATCH] rs6000, altivec-2-runnable.c update the require-effective-target

2024-06-18 Thread Kewen.Lin
Hi Carl,

>> I'd expect the "-runnable" test case focuses on testing for run.  Normally,
>> the one without "-runnable" would focus on testing for compiling (scan some
>> desired insn), but this altivec-1.c and altivec-1-runnable.c seems to test
>> for different things, maybe we should separate them into different names
>> if they don't test for a same test point.
> 
> The altivec-1-runnable.c and altivec-2-runnable.c tests were added for various
> built-ins that didn't have any test cases.  There was no intention of any
> connection to the existing altivec-*.c test files.  I started creating runnable
> tests when I started adding support for built-ins that we claimed to support
> but that had never actually been implemented.  I created runnable tests to make
> sure my implementation actually worked.  I continued to add runnable tests for
> built-ins that existed but didn't have a test case.  Adding runnable tests did
> find a couple of issues where the existing implementation had a bug.
>
> That all said, if we want to change the names of altivec-1-runnable.c and
> altivec-2-runnable.c to a different naming scheme, that is fine with me.
> Perhaps we should finish fixing the header for this test file, then do
> altivec-1-runnable, and then a final patch that does all the file renaming?

Yes, that's what I preferred, maybe something like altivec-run-n.c or
altivec-runnable-n.c to avoid the possible confusion.


>>> That said, I don't like not having a -mdejagnu-cpu=... here.
>>> I think for our server cpus, this is fine, but on an embedded system
>>> with an old ISA default for -mcpu=... (so we'd be doing a dg-do compile),
>>> just adding -maltivec to that default may not make much sense for that
>>> default and probably should be an error.  Maybe something like:
>>
>> Yes, for some embedded cpus, there will be some error messages, but since
>> we have powerpc_altivec_ok effective target, the error would make that
>> effective target checking fail so I'd expect it'll stop it being tested
>> (unsupported).
>>
>>>
>>> /* { dg-do run { target vmx_hw } } */
>>> /* { dg-do compile { target { ! vmx_hw } } } */
>>> /* { dg-require-effective-target powerpc_altivec_ok } */
>>> /* { dg-options "-O2 -mdejagnu=power7" } */
>>>

...

> We had -mdejagnu=power8 before, but it looks like we want to go to power7 now.
> 
> It sounds like we want the following:
> 
> /* { dg-do run { target vmx_hw } } */
> /* { dg-do compile { target { ! vmx_hw } } } */
> /* { dg-options "-O2 -mdejagnu=power7" } */
> /* { dg-require-effective-target powerpc_altivec } */

As mentioned above, I'd expect powerpc_altivec to stop this from being tested
when altivec isn't supported, so IMHO an explicit cpu type isn't necessary
(though I'm not opposed to specifying it).  Btw, s/-mdejagnu/-mdejagnu-cpu/.
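
Putting the corrections from this reply together, the header under discussion
might look like the sketch below.  It assumes the powerpc_altivec effective
target named above and keeps the explicit cpu, which this reply treats as
optional; it is not taken from a committed version of the test.

```c
/* Run where AltiVec hardware is available, otherwise only compile.  */
/* { dg-do run { target vmx_hw } } */
/* { dg-do compile { target { ! vmx_hw } } } */
/* Note the option spelling: -mdejagnu-cpu=, not -mdejagnu=.  */
/* { dg-options "-O2 -mdejagnu-cpu=power7" } */
/* { dg-require-effective-target powerpc_altivec } */
```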

BR,
Kewen



Re: [PATCH 13/13 ver4] rs6000, remove vector set and vector init built-ins

2024-06-18 Thread Kewen.Lin
Hi Carl,

on 2024/6/14 03:40, Carl Love wrote:
> GCC maintainers:
> 
> The patch has been updated per the feedback from version 3.  Please let me
> know if the patch is acceptable for mainline.
> 
> Thanks.
> 
>   Carl 
> 
> --
> 
> rs6000, remove vector set and vector init built-ins
> 
> The vector init built-ins:
> 
>   __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
>   __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
>   __builtin_vec_init_v2di, __builtin_vec_init_v2df,
>   __builtin_vec_init_v1ti
>
> perform the same operation as initializing the vector in C code.  For
> example:
>
>   result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
>   result_v4si = {1, 2, 3, 4};
>
> These two constructs were tested and verified to generate identical
> assembly instructions with no optimization and with -O3 optimization.
>
> The vector set built-ins:
>
>   __builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
>   __builtin_vec_set_v4si, __builtin_vec_set_v4sf,
>   __builtin_vec_set_v1ti, __builtin_vec_set_v2di,
>   __builtin_vec_set_v2df
>
> perform the same operation as setting a specific element in the vector in
> C code.  For example:
>
>   src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
>   src_v4si[index] = int_val;
> 
> The built-in actually generates more instructions than the inline C code
> with no optimization but is identical with -O3 optimizations.
> 
> All of the above built-ins that are removed do not have test cases and
> are not documented.
> 
> Built-ins __builtin_vec_set_v1ti, __builtin_vec_set_v2di and
> __builtin_vec_set_v2df are not removed as they are used in function
> resolve_vec_insert() in file rs6000-c.cc.
> 
> The built-ins are removed as they don't provide any benefit over just
> using C code.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vec_init_v16qi,
>   __builtin_vec_init_v4sf, __builtin_vec_init_v4si,
>   __builtin_vec_init_v8hi, __builtin_vec_init_v1ti,
>   __builtin_vec_init_v2df, __builtin_vec_init_v2di,
>   __builtin_vec_set_v16qi, __builtin_vec_set_v4sf,
>   __builtin_vec_set_v4si, __builtin_vec_set_v8hi): Remove
>   built-in definitions.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 44 +++
>  1 file changed, 4 insertions(+), 40 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 02aa04e5698..053dc0115d2 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1118,37 +1118,6 @@
>const signed short __builtin_vec_ext_v8hi (vss, signed int);
>  VEC_EXT_V8HI nothing {extract}
>  
> -  const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, 
> \
> -signed char, signed char, signed char, signed char, signed char, 
> \
> -signed char, signed char, signed char, signed char, signed char, 
> \
> -signed char, signed char, signed char);
> -VEC_INIT_V16QI nothing {init}

I just realized this {init} attribute is customized for vec_init only, and the
removed vec_init bifs are its only users, so we should remove the attribute as
well.  Sorry, I should have spotted this and pointed it out in the previous
review.  I think it means some removals are needed in:

1) comments in rs6000-builtins.def
   ;   init Process as a vec_init function

2) related gen code for this attribute bit, like:

  fprintf (header_file, "#define bif_init_bit\t\t(0x0001)\n");
  fprintf (header_file,
           "#define bif_is_init(x)\t\t((x).bifattrs & bif_init_bit)\n");
  if (bifp->attrs.isinit)
    fprintf (init_file, " | bif_init_bit");
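
For readers unfamiliar with the generated attribute bits, here is a minimal
sketch of what the emitted macros amount to.  The struct bif_entry type, the
entry_is_init helper and the second bit (bif_other_bit) are hypothetical and
exist only for this illustration; only bif_init_bit and bif_is_init come from
the generator output quoted above.

```c
#include <assert.h>

/* Hypothetical stand-in for a generated built-in table entry; only the
   attribute mask matters here.  */
struct bif_entry
{
  const char *name;
  unsigned int bifattrs;
};

/* Same shape as the macros written by the generator code quoted above.  */
#define bif_init_bit    (0x0001)
#define bif_is_init(x)  ((x).bifattrs & bif_init_bit)

/* Hypothetical second attribute, to show that the bits are independent.  */
#define bif_other_bit   (0x0002)

/* Return nonzero if ENTRY carries the init attribute.  */
static inline int
entry_is_init (const struct bif_entry *entry)
{
  assert (entry);
  return bif_is_init (*entry) != 0;
}
```

Once no built-in sets the bit, both the macro pair and the code that ORs the
bit into the mask become dead, which is why they can go together.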

The others look good to me!

BR,
Kewen
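
The equivalence claimed in the quoted commit message (plain C initialization
and element assignment instead of the init/set built-ins) can be sketched with
GCC's generic vector extension.  The type name v4si_t and the two helper names
are made up for this example, and it does not use the rs6000 built-ins at all,
so it says nothing about the generated PowerPC code.

```c
#include <assert.h>

/* GCC/Clang generic vector type: four 32-bit ints in 16 bytes.  The name
   v4si_t is chosen for this example only.  */
typedef int v4si_t __attribute__ ((vector_size (16)));

/* Plain C counterpart of an init built-in: a brace initializer.  */
static inline v4si_t
init_v4si (int a, int b, int c, int d)
{
  v4si_t v = { a, b, c, d };
  return v;
}

/* Plain C counterpart of a set built-in: assign through a subscript.  */
static inline v4si_t
set_v4si (v4si_t v, int val, unsigned int index)
{
  assert (index < 4);
  v[index] = val;
  return v;
}
```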


Re: [PATCH 11/13 ver4] rs6000, extend vec_xxpermdi built-in for __int128 args

2024-06-18 Thread Kewen.Lin
Hi Carl,

on 2024/6/14 03:40, Carl Love wrote:
> 
> GCC maintainers:
> 
> The patch has been updated per the comments from version 3.  Please let me 
> know if the patch is acceptable for mainline.
> 
> Thanks.
> 
>  Carl 
> 
> -
> 
> rs6000, extend vec_xxpermdi built-in for __int128 args
> 
> Add new signed and unsigned overloaded instances for vec_xxpermdi
> 
>__int128 vec_xxpermdi (__int128, __int128, const int);
>__uint128 vec_xxpermdi (__uint128, __uint128, const int);

Nit: I think we need the "vector" keyword here to avoid confusion.

> 
> Update the documentation to include a reference to the new built-in
> instances.
> 
> Add test cases for the new overloaded instances.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-overload.def (vec_xxpermdi): Add new
>   overloaded built-in instances.

Better to mention something like:  "built-in instances for vector
signed and unsigned int128".

>   * doc/extend.texi:  Add documentation for new overloaded built-in

Nit: One more space before "Add".

>   instances.

... can be extended similarly.

> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/vec_perm-runnable-i128.c: New test file.
> ---
>  gcc/config/rs6000/rs6000-overload.def |   4 +
>  gcc/doc/extend.texi   |   4 +
>  .../powerpc/vec_perm-runnable-i128.c  | 229 ++
>  3 files changed, 237 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
> 
> diff --git a/gcc/config/rs6000/rs6000-overload.def 
> b/gcc/config/rs6000/rs6000-overload.def
> index 6cec1ad4f1a..354f8fabe0f 100644
> --- a/gcc/config/rs6000/rs6000-overload.def
> +++ b/gcc/config/rs6000/rs6000-overload.def
> @@ -4936,6 +4936,10 @@
>  XXPERMDI_2DI  XXPERMDI_VSLL
>vull __builtin_vsx_xxpermdi (vull, vull, const int);
>  XXPERMDI_2DI  XXPERMDI_VULL
> +  vsq __builtin_vsx_xxpermdi (vsq, vsq, const int);
> +XXPERMDI_1TI  XXPERMDI_1SQ
> +  vuq __builtin_vsx_xxpermdi (vuq, vuq, const int);
> +XXPERMDI_1TI  XXPERMDI_1UQ

Nit: XXPERMDI_1SQ -> XXPERMDI_SQ
 XXPERMDI_1UQ -> XXPERMDI_UQ
(removing "1" to align with the above).

>vf __builtin_vsx_xxpermdi (vf, vf, const int);
>  XXPERMDI_4SF  XXPERMDI_VF
>vd __builtin_vsx_xxpermdi (vd, vd, const int);
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index d7d8d149a43..9e45976436b 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -22610,6 +22610,10 @@ void vec_vsx_st (vector bool char, int, signed char 
> *);
>  
>  vector double vec_xxpermdi (vector double, vector double, const int);
>  vector float vec_xxpermdi (vector float, vector float, const int);
> +vector __int128 vec_xxpermdi (vector signed __int128,
> +  vector signed __int128, const int);

Nit: either s/vector __int128/vector signed __int128/
 or s/signed //g
to keep consistent.

> +vector __int128 vec_xxpermdi (vector unsigned __int128,
> +  vector unsigned __int128, const int);

This line misses unsigned for the return type.

OK for trunk with nits above tweaked, thanks!

BR,
Kewen
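
As background for the new 128-bit overloads, here is a scalar model of what
the underlying xxpermdi instruction selects.  It is a sketch under
assumptions: struct q128 and xxpermdi_model are hypothetical names, dw[0] is
the most significant doubleword (ISA numbering), and how GCC's vec_xxpermdi
maps the immediate on little-endian targets is not discussed in this thread
and is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value split into doublewords; dw[0] is the most
   significant one (ISA numbering).  */
struct q128
{
  uint64_t dw[2];
};

/* The high bit of DM picks a doubleword of A for result dw[0]; the low
   bit picks a doubleword of B for result dw[1].  */
static inline struct q128
xxpermdi_model (struct q128 a, struct q128 b, int dm)
{
  struct q128 r;

  assert (dm >= 0 && dm <= 3);
  r.dw[0] = a.dw[(dm >> 1) & 1];
  r.dw[1] = b.dw[dm & 1];
  return r;
}
```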


Re: [PATCH 7/13 ver4] rs6000, add overloaded vec_sel with int128 arguments

2024-06-18 Thread Kewen.Lin
Hi Carl,

on 2024/6/14 03:40, Carl Love wrote:
> 
> GCC maintainers:
> 
> The patch has been updated per the comments from version 3.  Please let me 
> know if the patch is acceptable for mainline.
> 
>  Carl 
> 
> -
> 
> rs6000, add overloaded vec_sel with int128 arguments
> 
> Extend the vec_sel built-in to take three signed/unsigned/bool int128
> arguments and return a signed/unsigned/bool int128 result.
> 
> Extending the vec_sel built-in makes the existing built-ins
> __builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete.  The
> patch removes these built-ins.
> 
> The patch adds documentation and test cases for the new overloaded
> vec_sel built-ins.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti,
>   __builtin_vsx_xxsel_1ti_uns): Remove built-in definitions.
>   * config/rs6000/rs6000-overload.def (vec_sel): Add new
>   overloaded  definitions.

Nit: unexpected tab between "overloaded" and "definitions"; it should be a
space.  Also better to mention which types of overloaded function are added,
like "for vector signed, unsigned and bool int128 types."

>   * doc/extend.texi: Add documentation for new vec_sel instances.

Likewise.

> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/builtins-10-runnable.c: New runnable test
>   file.
>   * gcc.target/powerpc/builtins-10.c: New compile only test file.
> ---
>  gcc/config/rs6000/rs6000-builtins.def |   6 -
>  gcc/config/rs6000/rs6000-overload.def |  12 +
>  gcc/doc/extend.texi   |  20 ++
>  .../gcc.target/powerpc/builtins-10-runnable.c | 220 ++
>  .../gcc.target/powerpc/builtins-10.c  |  63 +
>  5 files changed, 315 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-10.c
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index b90b3f34167..c969cd0f3f6 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1907,12 +1907,6 @@
>const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
>  XXSEL_16QI_UNS vector_select_v16qi_uns {}
>  
> -  const vsq __builtin_vsx_xxsel_1ti (vsq, vsq, vsq);
> -XXSEL_1TI vector_select_v1ti {}
> -
> -  const vsq __builtin_vsx_xxsel_1ti_uns (vsq, vsq, vsq);
> -XXSEL_1TI_UNS vector_select_v1ti_uns {}
> -
>const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
>  XXSEL_2DF vector_select_v2df {}
>  
> diff --git a/gcc/config/rs6000/rs6000-overload.def 
> b/gcc/config/rs6000/rs6000-overload.def
> index 4d857bb1af3..6cec1ad4f1a 100644
> --- a/gcc/config/rs6000/rs6000-overload.def
> +++ b/gcc/config/rs6000/rs6000-overload.def
> @@ -3274,6 +3274,18 @@
>  VSEL_2DF  VSEL_2DF_B
>vd __builtin_vec_sel (vd, vd, vull);
>  VSEL_2DF  VSEL_2DF_U
> +  vsq __builtin_vec_sel (vsq, vsq, vbq);
> +VSEL_1TI  VSEL_1TI_B
> +  vsq __builtin_vec_sel (vsq, vsq, vuq);
> +VSEL_1TI  VSEL_1TI_U
> +  vuq __builtin_vec_sel (vuq, vuq, vbq);
> +VSEL_1TI_UNS  VSEL_1TI_UB
> +  vuq __builtin_vec_sel (vuq, vuq, vuq);
> +VSEL_1TI_UNS  VSEL_1TI_UU
> +  vbq __builtin_vec_sel (vbq, vbq, vbq);
> +VSEL_1TI_UNS  VSEL_1TI_BB
> +  vbq __builtin_vec_sel (vbq, vbq, vuq);
> +VSEL_1TI_UNS  VSEL_1TI_BU

Nit: Put these new lines after the line "VSEL_2DI_UNS  VSEL_2DI_BU"
and before "vf __builtin_vec_sel (vf, vf, vbi);", so that all integral
element types are placed together.

>  ; The following variants are deprecated.
>vsll __builtin_vec_sel (vsll, vsll, vsll);
>  VSEL_2DI_B  VSEL_2DI_S
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index b1620274285..d7d8d149a43 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -21420,6 +21420,26 @@ Additional built-in functions are available for the 
> 64-bit PowerPC
>  family of processors, for efficient use of 128-bit floating point
>  (@code{__float128}) values.
>  
> +Vector select
> +
> +@smallexample
> +vector signed __int128 vec_sel (vector signed __int128,
> +   vector signed __int128, vector bool __int128);
> +vector signed __int128 vec_sel (vector signed __int128,
> +   vector signed __int128, vector unsigned __int128);
> +vector unsigned __int128 vec_sel (vector unsigned __int128,
> +   vector unsigned __int128, vector bool __int128);
> +vector unsigned __int128 vec_sel (vector unsigned __int128,
> +   vector unsigned __int128, vector unsigned __int128);
> +vector bool __int128 vec_sel (vector bool __int128,
> +   vector bool __int128, vector bool __int128);
> +vector bool __int128 vec_sel (vector bool __int128,
> +   vector bool __int128, vector unsigned __int128);
> +@end smallexample
> +
> +The instance is an extension of the existing 
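
For reference, the bitwise select that vec_sel performs can be modeled on a
single 128-bit scalar.  This is only a sketch: it assumes a compiler that
provides unsigned __int128 (GCC and Clang on 64-bit hosts), and the names
u128, sel_model and make_u128 are invented for the example rather than taken
from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for a one-element 128-bit vector.  */
typedef unsigned __int128 u128;

/* Each result bit comes from B where MASK has a 1, and from A elsewhere.  */
static inline u128
sel_model (u128 a, u128 b, u128 mask)
{
  return (a & ~mask) | (b & mask);
}

/* Helper to build a 128-bit value from two 64-bit halves.  */
static inline u128
make_u128 (uint64_t hi, uint64_t lo)
{
  return ((u128) hi << 64) | lo;
}
```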

Re: [PATCH 4/13 ver4] rs6000, extend the current vec_{un,}signed{e,o}, built-ins

2024-06-18 Thread Kewen.Lin
Hi Carl,

on 2024/6/14 03:40, Carl Love wrote:
> 
> GCC maintainers:
> 
> As noted, the removal of __builtin_vsx_xvcvdpuxds_uns and
> __builtin_vsx_xvcvspuxws was moved to patch 2 in the series.  The patch has
> been updated per the comments from version 3.
> 
> Please let me know if this patch is acceptable for mainline.  
> 
>  Carl 
> 
> --
> 
> rs6000, extend the current vec_{un,}signed{e,o} built-ins
> 
> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
> convert a vector of floats to signed/unsigned long long ints.  Extend the

Nit: s/signed/a vector of signed/

> existing vec_{un,}signed{e,o} built-ins to handle the argument
> vector of floats to return the even/odd signed/unsigned integers.
> 

Likewise.

> The define expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf,
> vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o}
> built-ins.
> 
> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are
> now for internal use only. They are not documented and they do not
> have testcases.
> 


> The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by
> vec_signed{e,o}, remove.
> 
> The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by
> vec_unsigned{e,o}, remove.

As noted in my comments on 2/13 v4 and in the previous review, I'd prefer that
these two removals be moved to 2/13 as well (this patch should focus on
extending).

> 
> Add testcases and update documentation.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def: __builtin_vsx_xvcvdpsxws,
>   __builtin_vsx_xvcvdpuxws): Removed.
>   (__builtin_vsx_xvcvspsxds, __builtin_vsx_xvcvspuxds): Renamed

Nit: s/Renamed/Rename to/

>   __builtin_vsignede_v4sf, __builtin_vunsignede_v4sf respectively.
>   (XVCVSPSXDS, XVCVSPUXDS): Renamed VEC_VSIGNEDE_V4SF,
>   VEC_VUNSIGNEDE_V4SF respectively.

Likewise.

>   (__builtin_vsignedo_v4sf, __builtin_vunsignedo_v4sf): New
>   built-in definitions.
>   * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo,
>   vec_unsignede,vec_unsignedo):  Add new overloaded specifications.

Formatting nits: "..,.." -> ".., ..", "  " -> " "

>   * config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf,
>   vunsignede_v4sf, vunsignedo_v4sf): New  define_expands.

Likewise.

>   * doc/extend.texi (vec_signedo, vec_signede): Add documentation
>   for new overloaded built-ins.

Missing vec_unsignedo and vec_unsignede; maybe also mention which types are
handled, like "converting vector float to vector {un,}signed long long".
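
A scalar model of that conversion, for illustration only: it assumes "even"
means elements 0 and 2 and "odd" means elements 1 and 3 of a four-float
vector, truncates toward zero like a C cast, and ignores saturation and NaN
handling.  The function names are hypothetical, and element numbering on a
given endianness is not something this thread spells out.

```c
#include <assert.h>

/* Convert the even-numbered floats (0 and 2) to signed long long.  */
static inline void
signede_model (const float src[4], long long dst[2])
{
  assert (src && dst);
  dst[0] = (long long) src[0];
  dst[1] = (long long) src[2];
}

/* Convert the odd-numbered floats (1 and 3) to signed long long.  */
static inline void
signedo_model (const float src[4], long long dst[2])
{
  assert (src && dst);
  dst[0] = (long long) src[1];
  dst[1] = (long long) src[3];
}
```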

> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/builtins-3-runnable.c
>   (test_unsigned_int_result, test_ll_unsigned_int_result): Add
>   new argument.
>   (vec_signede, vec_signedo, vec_unsignede, vec_unsignedo): New
>   tests for the overloaded built-ins.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 20 ++---
>  gcc/config/rs6000/rs6000-overload.def |  8 ++
>  gcc/config/rs6000/vsx.md  | 84 +++
>  gcc/doc/extend.texi   | 10 +++
>  .../gcc.target/powerpc/builtins-3-runnable.c  | 49 +--
>  5 files changed, 154 insertions(+), 17 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 322d27b7a0d..29a9deb3410 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1688,26 +1688,26 @@
>const vsll __builtin_vsx_xvcvdpsxds_scale (vd, const int);
>  XVCVDPSXDS_SCALE vsx_xvcvdpsxds_scale {}
>  
> -  const vsi __builtin_vsx_xvcvdpsxws (vd);
> -XVCVDPSXWS vsx_xvcvdpsxws {}
> -
>const vsll __builtin_vsx_xvcvdpuxds (vd);
>  XVCVDPUXDS vsx_fixuns_truncv2dfv2di2 {}
>  
>const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
>  XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
>  
> -  const vsi __builtin_vsx_xvcvdpuxws (vd);
> -XVCVDPUXWS vsx_xvcvdpuxws {}
> -
>const vd __builtin_vsx_xvcvspdp (vf);
>  XVCVSPDP vsx_xvcvspdp {}
>  
> -  const vsll __builtin_vsx_xvcvspsxds (vf);
> -XVCVSPSXDS vsx_xvcvspsxds {}
> +  const vsll __builtin_vsignede_v4sf (vf);
> +VEC_VSIGNEDE_V4SF vsignede_v4sf {}
> +
> +  const vsll __builtin_vsignedo_v4sf (vf);
> +VEC_VSIGNEDO_V4SF vsignedo_v4sf {}
> +
> +  const vull __builtin_vunsignede_v4sf (vf);
> +VEC_VUNSIGNEDE_V4SF vunsignede_v4sf {}
>  
> -  const vsll __builtin_vsx_xvcvspuxds (vf);
> -XVCVSPUXDS vsx_xvcvspuxds {}
> +  const vull __builtin_vunsignedo_v4sf (vf);
> +VEC_VUNSIGNEDO_V4SF vunsignedo_v4sf {}
>  
>const vd __builtin_vsx_xvcvsxddp (vsll);
>  XVCVSXDDP vsx_floatv2div2df2 {}
> diff --git a/gcc/config/rs6000/rs6000-overload.def 
> b/gcc/config/rs6000/rs6000-overload.def
> index 84bd9ae6554..4d857bb1af3 100644
> --- a/gcc/config/rs6000/rs6000-overload.def
> +++ 

Re: [PATCH 2/13 ver4] rs6000, Remove __builtin_vsx_xvcvspsxws, __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws built-ins.

2024-06-18 Thread Kewen.Lin
Hi Carl,

on 2024/6/14 03:40, Carl Love wrote:
> GCC maintainers:
> 
> Per the comments on patch 0004 from version 3, the removal of the built-ins
> __builtin_vsx_xvcvdpuxds_uns and __builtin_vsx_xvcvspuxws was moved to this
> patch.  The rest of the patch is unchanged from version 3.  There were no
> comments on this patch for version 3.
> 
> Please let me know if this patch is acceptable.  Thanks.
> 
> Carl 
> 
> 
> -
> 
> rs6000, Remove __builtin_vsx_xvcvspsxws,
>  __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws built-ins.

Nit: Maybe make it shorter, like: Remove built-ins
__builtin_vsx_xvcv{sp{s,u}xws,dpuxds_uns}

> 
> The built-in __builtin_vsx_xvcvspsxws is a duplicate of the vec_signed

Nit: Strictly speaking, not a duplicate of vec_signed but covered by it.

> built-in that is documented in the PVIPR.  The __builtin_vsx_xvcvspsxws
> built-in is not documented and there are no test cases for it.
> 
> The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by
> vec_unsigned, remove.
> 
> The __builtin_vsx_xvcvspuxws is redundant as it is covered by
> vec_unsigned, remove.

As mentioned in the previous review, I'd expect patch 4/13 only focuses on
extending vec_{un,}signed{e,o} for vector float (aka. __builtin_vsx_xvcvspsxds
and __builtin_vsx_xvcvspuxds related), and this patch focuses on some built-in
removals which have been covered by the existing vec_{un,}signed{,e,o}, so
it can also drop the built-ins:

"The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by
vec_signed{e,o}, remove.

The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by
vec_unsigned{e,o}, remove."

// copied from 4/13.

BR,
Kewen

> 
> This patch removes the redundant built-in.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxws,
>   __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws):
>   Remove built-in definitions.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 9 -
>  1 file changed, 9 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 7c36976a089..8cf0b715898 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1697,9 +1697,6 @@
>const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
>  XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
>  
> -  const vull __builtin_vsx_xvcvdpuxds_uns (vd);
> -XVCVDPUXDS_UNS vsx_fixuns_truncv2dfv2di2 {}
> -
>const vsi __builtin_vsx_xvcvdpuxws (vd);
>  XVCVDPUXWS vsx_xvcvdpuxws {}
>  
> @@ -1709,15 +1706,9 @@
>const vsll __builtin_vsx_xvcvspsxds (vf);
>  XVCVSPSXDS vsx_xvcvspsxds {}
>  
> -  const vsi __builtin_vsx_xvcvspsxws (vf);
> -XVCVSPSXWS vsx_fix_truncv4sfv4si2 {}
> -
>const vsll __builtin_vsx_xvcvspuxds (vf);
>  XVCVSPUXDS vsx_xvcvspuxds {}
>  
> -  const vsi __builtin_vsx_xvcvspuxws (vf);
> -XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {}
> -
>const vd __builtin_vsx_xvcvsxddp (vsll);
>  XVCVSXDDP vsx_floatv2div2df2 {}
>  



Re: Re: [RE] [v2] RISC-V: Add Zfbfmin extension

2024-06-18 Thread Jin Ma

Great news, thanks for the quick reply.

BR,
Jin

--
From: wangf...@eswincomputing.com
Send Time: 2024 Jun. 19 (Wed.) 08:18
To: Jin Ma
Cc: kito.cheng; juzhe.zhong; jinma.contrib; zengxiao; gcc-patches; gaofei
Subject: Re: Re: [RE] [v2] RISC-V: Add Zfbfmin extension


Hi Jin,

I will submit the patch after internal review, maybe today.


wangf...@eswincomputing.com

 
From: Jin Ma
Date: 2024-06-18 18:25
To: wangfeng
CC: Kito Cheng; juzhe.zhong; jinma.contrib; zengxiao; gcc-patches; Fei Gao
Subject: Re: [RE] [v2] RISC-V: Add Zfbfmin extension

 
Hi, Feng
  Any new developments here on zvfbfmin and zvfbfwma?
 
BR,
Jin

--
From: Fei Gao
Send Time: 2024 Jun. 7 (Fri.) 17:34
To: jinma; gcc-patches; zengxiao; wangfeng
Cc: jeffreyalaw; Kito Cheng; juzhe.zhong; jinma.contrib; jinma
Subject: Re: [RE] [v2] RISC-V: Add Zfbfmin extension

Hi Jin

We have completed zvfbfmin and zvfbfwma in GCC.
Wang Feng will post the patches after the Dragon Boat Festival.

BR,
Fei

From: Jin Ma
Date: 2024-06-07 15:35
To: gcc-patches; zengxiao
CC: jeffreyalaw; kito.cheng; juzhe.zhong; jinma.contrib; Jin Ma
Subject: [RE] [v2] RISC-V: Add Zfbfmin extension
 
Hi,
 
Is there a plan to implement zvfbfmin and zvfbfwma? Or how can I get the
relevant patches in advance for testing? By the way, LLVM support seems to be
fully implemented now :-)
 
Ref:
 
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/293
 
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/auto-generated/bfloat16/intrinsic_funcs.adoc
 
 
 
Thanks,
Jin


[MAINTAINERS] Update my email address and move to DCO.

2024-06-18 Thread Ramana Radhakrishnan
As $Subject. 

Pushed. 

Ramana

commit 01691a6d0582a921bbcc09ab5e0cd9e7deca2cca
Author: Ramana Radhakrishnan 
Date:   Tue Jun 18 16:05:31 2024 +0530

[MAINTAINERS] Update my email address and move to DCO.

 Signed-off-by: Ramana Radhakrishnan  

 * MAINTAINERS: Update my email address.

diff --git a/MAINTAINERS b/MAINTAINERS
index 8b6fa16f79a..41319595bb5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -59,7 +59,7 @@ amdgcn port Andrew Stubbs 
 arc port Claudiu Zissulescu 
 arm port Nick Clifton 
 arm port Richard Earnshaw 
-arm port Ramana Radhakrishnan 
+arm port Ramana Radhakrishnan 
 avr port Denis Chertykov 
 bfin port Jie Zhang 
 bpf port Jose E. Marchesi 
@@ -776,6 +776,7 @@ Immad Mir 
 Gaius Mulley 
 Andrew Pinski 
 Siddhesh Poyarekar 
+Ramana Radhakrishnan 
 Navid Rahimi 
 Rishi Raj 
 Trevor Saunders 




Re: Re: [RE] [v2] RISC-V: Add Zfbfmin extension

2024-06-18 Thread wangf...@eswincomputing.com
Hi Jin,

I will submit the patch after internal review, maybe today.



wangf...@eswincomputing.com
 
From: Jin Ma
Date: 2024-06-18 18:25
To: wangfeng
CC: Kito Cheng; juzhe.zhong; jinma.contrib; zengxiao; gcc-patches; Fei Gao
Subject: Re: [RE] [v2] RISC-V: Add Zfbfmin extension
 
Hi, Feng
  Any new developments here on zvfbfmin and zvfbfwma?
 
BR,
Jin

--
From: Fei Gao
Send Time: 2024 Jun. 7 (Fri.) 17:34
To: jinma; gcc-patches; zengxiao; wangfeng
Cc: jeffreyalaw; Kito Cheng; juzhe.zhong; jinma.contrib; jinma
Subject: Re: [RE] [v2] RISC-V: Add Zfbfmin extension

Hi Jin

We have completed zvfbfmin and zvfbfwma in GCC.
Wang Feng will post the patches after the Dragon Boat Festival.

BR,
Fei

From: Jin Ma
Date: 2024-06-07 15:35
To: gcc-patches; zengxiao
CC: jeffreyalaw; kito.cheng; juzhe.zhong; jinma.contrib; Jin Ma
Subject: [RE] [v2] RISC-V: Add Zfbfmin extension
 
Hi,
Is there a plan to implement zvfbfmin and zvfbfwma? Or how can I get the
relevant patches in advance for testing? By the way, LLVM support seems to be
fully implemented now :-)
 
Ref:
 
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/293
 
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/auto-generated/bfloat16/intrinsic_funcs.adoc
 
 
 
Thanks,
Jin


Re: [to-be-committed] [RISC-V] Minor cleanup/improvement to bset/binv patterns

2024-06-18 Thread Maciej W. Rozycki
On Tue, 18 Jun 2024, Jeff Law wrote:

> This has gone through my tester.  I'll wait for a verdict from pre-commit CI
> before moving forward.

 Why do these "[to-be-committed]" annotations end up in the repository 
though?  It does not appear to me to be useful information to be stored 
there forever.

  Maciej


[PATCH v2] rs6000: ROP - Do not disable shrink-wrapping for leaf functions [PR114759]

2024-06-18 Thread Peter Bergner
Updated patch.  This passed bootstrap and regtesting on powerpc64le-linux
with no regressions.  Ok for trunk?

Changes from v1:
1. Moved the disabling of shrink-wrapping to rs6000_emit_prologue
   and beefed up comment.  Used a more accurate test.
2. Added comment to the test case on why rop_ok is needed.

Peter


rs6000: ROP - Do not disable shrink-wrapping for leaf functions [PR114759]

Only disable shrink-wrapping when using -mrop-protect when we know we
will be emitting the ROP-protect hash instructions (i.e., non-leaf functions).

2024-06-17  Peter Bergner  

gcc/
PR target/114759
* config/rs6000/rs6000.cc (rs6000_override_options_after_change): Move
the disabling of shrink-wrapping from here
* config/rs6000/rs6000-logue.cc (rs6000_emit_prologue): ...to here.

gcc/testsuite/
PR target/114759
* gcc.target/powerpc/pr114759-1.c: New test.
---
 gcc/config/rs6000/rs6000-logue.cc |  5 +
 gcc/config/rs6000/rs6000.cc   |  4 
 gcc/testsuite/gcc.target/powerpc/pr114759-1.c | 16 
 3 files changed, 21 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr114759-1.c

diff --git a/gcc/config/rs6000/rs6000-logue.cc b/gcc/config/rs6000/rs6000-logue.cc
index 193e2122c0f..c384e48e378 100644
--- a/gcc/config/rs6000/rs6000-logue.cc
+++ b/gcc/config/rs6000/rs6000-logue.cc
@@ -3018,6 +3018,11 @@ rs6000_emit_prologue (void)
&& (lookup_attribute ("no_split_stack",
  DECL_ATTRIBUTES (cfun->decl))
== NULL));
+  /* If we are inserting ROP-protect hash instructions, disable shrink-wrap
+ until the bug where the hashst insn is emitted in the wrong location
+ is fixed.  See PR101324 for details.  */
+  if (info->rop_hash_size)
+flag_shrink_wrap = 0;
 
   frame_pointer_needed_indeed
 = frame_pointer_needed && df_regs_ever_live_p (HARD_FRAME_POINTER_REGNUM);
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index e4dc629ddcc..fd6e013c346 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -3427,10 +3427,6 @@ rs6000_override_options_after_change (void)
 }
   else if (!OPTION_SET_P (flag_cunroll_grow_size))
 flag_cunroll_grow_size = flag_peel_loops || optimize >= 3;
-
-  /* If we are inserting ROP-protect instructions, disable shrink wrap.  */
-  if (rs6000_rop_protect)
-flag_shrink_wrap = 0;
 }
 
 #ifdef TARGET_USES_LINUX64_OPT
diff --git a/gcc/testsuite/gcc.target/powerpc/pr114759-1.c b/gcc/testsuite/gcc.target/powerpc/pr114759-1.c
new file mode 100644
index 000..579e08e920f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr114759-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -mrop-protect -fdump-rtl-pro_and_epilogue" } */
+/* { dg-require-effective-target rop_ok } Only enable on supported ABIs. */
+
+/* Verify we still attempt shrink-wrapping when using -mrop-protect
+   and there are no function calls.  */
+
+long
+foo (long arg)
+{
+  if (arg)
+asm ("" ::: "r20");
+  return 0;
+}
+
+/* { dg-final { scan-rtl-dump-times "Performing shrink-wrapping" 1 "pro_and_epilogue" } } */
-- 
2.43.0
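
The behavioral change in this patch can be summarized with a small
standalone model.  It is a sketch, not GCC code: struct frame_info_model and
both function names are hypothetical, and the real patch clears the global
flag_shrink_wrap inside rs6000_emit_prologue instead of returning a decision.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-function frame information; only the
   field tested by the patch is modeled.  */
struct frame_info_model
{
  int rop_hash_size;  /* Nonzero when hash insns will be emitted.  */
};

/* Before the patch: any use of -mrop-protect turned shrink-wrapping off.  */
static inline bool
shrink_wrap_allowed_old (bool rop_protect)
{
  return !rop_protect;
}

/* After the patch: only functions that get hash insns lose it.  */
static inline bool
shrink_wrap_allowed_new (const struct frame_info_model *info)
{
  assert (info);
  return info->rop_hash_size == 0;
}
```

Under this model a leaf function compiled with -mrop-protect keeps
shrink-wrapping, which is what the new test checks through the
pro_and_epilogue dump.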



Re: [Committed V3 2/2] RISC-V: Move mode assertion out of conditional branch in emit_insn

2024-06-18 Thread Edwin Lu

Thanks!

Edwin

On 6/17/2024 5:33 PM, Jeff Law wrote:



On 6/17/24 12:33 PM, Edwin Lu wrote:

When emitting insns, we have an early assertion to ensure the input
operand's mode and the expanded operand's mode are the same; however, it
does not perform this check if the pattern does not have an explicit
machine mode specifying the operand. In this scenario, it will always
assume that mode = Pmode to correctly satisfy the
maybe_legitimize_operand check; however, this can cause problems when
working in 32-bit environments.

Make the assert unconditional and replace it with an internal error for
more descriptive logging

gcc/ChangeLog:

* config/riscv/riscv-v.cc: Move assert out of conditional block

OK.

Jeff



Re: [Committed V3 1/2] RISC-V: Fix vwsll combine on rv32 targets

2024-06-18 Thread Edwin Lu

Committed. Thanks!

Edwin

On 6/17/2024 5:31 PM, Jeff Law wrote:



On 6/17/24 12:33 PM, Edwin Lu wrote:

On rv32 targets, vwsll_zext1_scalar_ would trigger an ice in
maybe_legitimize_instruction when zero extending a uint32 to uint64 due
to a mismatch between the input operand's mode (DI) and the expanded insn
operand's mode (Pmode == SI). Ensure that mode of the operands match

Tested on rv32/64 gcv newlib. Letting CI perform additional testing

gcc/ChangeLog:

* config/riscv/autovec-opt.md: Fix mode mismatch

OK
jeff




Re: [PATCH] rs6000: ROP - Do not disable shrink-wrapping for leaf functions [PR114759]

2024-06-18 Thread Peter Bergner
On 6/18/24 3:38 PM, Segher Boessenkool wrote:
> From my viewpoint, -mrop-protect should not change code generation at
> all, except of course it has to emit some hash* insns :-)

Ideally, I agree with that.  That said, the hash* insns only accept negative
offsets and the allowed range is rather limited, so from a practical standpoint,
we might have to modify the prologue code slightly to satisfy the restrictions
those insns have.

Peter



[PATCH v4] RISC-V: Promote Zaamo/Zalrsc to a when using an old binutils

2024-06-18 Thread Patrick O'Neill
Binutils 2.42 and before don't support Zaamo/Zalrsc. When users specify
both Zaamo and Zalrsc, promote them to 'a' in the -march string.
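
As a rough illustration of the promotion rule described above, here is a toy
model in C. It is not the riscv-common.cc code: the helper names has_ext and
promote_to_a are made up for this sketch, and the extension list is just an
array of strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when NAME appears in the list of N extensions. */
static bool has_ext(const char *const *exts, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp(exts[i], name) == 0)
      return true;
  return false;
}

/* Hypothetical model of the rule: 'a' is only substituted when BOTH
   halves of the atomics extension were requested.  */
static bool promote_to_a(const char *const *exts, size_t n)
{
  return has_ext(exts, n, "zaamo") && has_ext(exts, n, "zalrsc");
}
```

With only one of the two sub-extensions present, the model returns false,
which matches the note that calls are still emitted in that case.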

This does not affect testsuite results for users with old versions of binutils.
Testcases that failed due to 'call'/isa string continue to fail after this PATCH
when using an old version of binutils.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add 'a' extension to
riscv_combine_info.

Signed-off-by: Patrick O'Neill 
---
We will emit calls if the user only specifies Zaamo or Zalrsc.
To my knowledge there isn't a way to make a testcase for this in dejagnu.
I used the most recent version of the 'a' extension arbitrarily since AFAICT the
version of the extension doesn't affect the combine logic.
---
 gcc/common/config/riscv/riscv-common.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 1dc1d9904c7..410e673f5e0 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -401,6 +401,7 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
 /* Combine extensions defined in this table  */
 static const struct riscv_ext_version riscv_combine_info[] =
 {
+  {"a", ISA_SPEC_CLASS_20191213, 2, 1},
   {"zk",  ISA_SPEC_CLASS_NONE, 1, 0},
   {"zkn",  ISA_SPEC_CLASS_NONE, 1, 0},
   {"zks",  ISA_SPEC_CLASS_NONE, 1, 0},
--
2.34.1



[x86 PATCH] Allow all register_operand SUBREGs in x86_ternlog_idx.

2024-06-18 Thread Roger Sayle

This patch tweaks ix86_ternlog_idx to allow any SUBREG that matches
the register_operand predicate, and is split out as an independent
piece of a patch that I have to clean-up redundant ternlog patterns
in sse.md.  It turns out that some of these patterns aren't (yet)
sufficiently redundant to be obsolete.  The problem is that the
"new" ternlog pattern has the restriction that it allows SUBREGs,
but only those where the inner and outer modes are the same size,
where regular patterns use "register_operand" which allows arbitrary
(including paradoxical) SUBREGs.

A motivating example is f2 in gcc.target/i386/avx512dq-abs-copysign-1.c

void f2 (float x, float y)
{
  register float a __asm ("xmm16"), b __asm ("xmm17");
  a = x;
  b = y;
  asm volatile ("" : "+v" (a), "+v" (b));
  a = __builtin_copysignf (a, b);
  asm volatile ("" : "+v" (a));
}

for which combine tries:

(set (subreg:V4SF (reg:SF 100 [ _3 ]) 0)
(ior:V4SF (and:V4SF (not:V4SF (reg:V4SF 104))
(subreg:V4SF (reg:SF 110) 0))
(reg:V4SF 106)))

where the SUBREG is paradoxical, with inner mode SF and outer mode V4SF.
This patch allows the recently added ternlog_operand to accept this case.
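
For readers less familiar with the ternlog index, the following standalone C
sketch shows how the 8-bit immediate falls out for the RTL above. The leaf
values 0xf0, 0xcc and 0xaa and the "^ 0xff" for NOT are the ones used in
ix86_ternlog_idx; the enum and function names are invented for this example,
and the mapping of RTL operands to A/B/C is only an assumption for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Truth-table columns for the three leaf operands.  */
enum { TERNLOG_A = 0xf0, TERNLOG_B = 0xcc, TERNLOG_C = 0xaa };

/* NOT of a sub-expression flips every entry of its truth table.  */
static unsigned ternlog_not(unsigned idx)
{
  return idx ^ 0xff;
}

/* Immediate for (~A & B) | C, the shape of the copysign RTL.  */
static unsigned ternlog_imm_andn_or(void)
{
  return ((ternlog_not(TERNLOG_A) & TERNLOG_B) | TERNLOG_C) & 0xff;
}

/* Reference evaluation: for each bit position, the three input bits
   select one bit of the immediate.  */
static uint32_t ternlog_eval(unsigned imm, uint32_t a, uint32_t b, uint32_t c)
{
  uint32_t r = 0;
  for (int bit = 0; bit < 32; bit++) {
    unsigned sel = (((a >> bit) & 1) << 2)
                 | (((b >> bit) & 1) << 1)
                 | ((c >> bit) & 1);
    r |= (uint32_t)((imm >> sel) & 1) << bit;
  }
  return r;
}
```

Because the index depends only on which leaf is A, B or C, comparing leaves
with rtx_equal_p instead of REGNO is what lets a SUBREG take part as a leaf.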

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2024-06-18  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_ternlog_idx): Allow any SUBREG
that matches register_operand.  Use rtx_equal_p to compare REG
or SUBREG "leaf" operands.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 312329e..174c52b 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -25570,27 +25570,32 @@ ix86_ternlog_idx (rtx op, rtx *args)
 
   switch (GET_CODE (op))
 {
+case SUBREG:
+  if (!register_operand (op, GET_MODE (op)))
+   return -1;
+  /* FALLTHRU */
+
 case REG:
   if (!args[0])
{
  args[0] = op;
  return 0xf0;
}
-  if (REGNO (op) == REGNO (args[0]))
+  if (rtx_equal_p (op, args[0]))
return 0xf0;
   if (!args[1])
{
  args[1] = op;
  return 0xcc;
}
-  if (REGNO (op) == REGNO (args[1]))
+  if (rtx_equal_p (op, args[1]))
return 0xcc;
   if (!args[2])
{
  args[2] = op;
  return 0xaa;
}
-  if (REG_P (args[2]) && REGNO (op) == REGNO (args[2]))
+  if (rtx_equal_p (op, args[2]))
return 0xaa;
   return -1;
 
@@ -25628,12 +25633,6 @@ ix86_ternlog_idx (rtx op, rtx *args)
return 0x55;
   return -1;
 
-case SUBREG:
-  if (GET_MODE_SIZE (GET_MODE (SUBREG_REG (op)))
- != GET_MODE_SIZE (GET_MODE (op)))
-   return -1;
-  return ix86_ternlog_idx (SUBREG_REG (op), args);
-
 case NOT:
   idx0 = ix86_ternlog_idx (XEXP (op, 0), args);
   return (idx0 >= 0) ? idx0 ^ 0xff : -1;


RE: [PATCH v4] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-18 Thread Pengxuan Zheng (QUIC)
> On Mon, Jun 17, 2024 at 11:25 PM Pengxuan Zheng
>  wrote:
> >
> > This patch improves GCC’s vectorization of __builtin_popcount for
> > aarch64 target by adding popcount patterns for vector modes besides
> > QImode, i.e., HImode, SImode and DImode.
> >
> > With this patch, we now generate the following for V8HI:
> >   cnt v1.16b, v0.16b
> >   uaddlp  v2.8h, v1.16b
> >
> > For V4HI, we generate:
> >   cnt v1.8b, v0.8b
> >   uaddlp  v2.4h, v1.8b
> >
> > For V4SI, we generate:
> >   cnt v1.16b, v0.16b
> >   uaddlp  v2.8h, v1.16b
> >   uaddlp  v3.4s, v2.8h
> >
> > For V4SI with TARGET_DOTPROD, we generate the following instead:
> >   moviv0.4s, #0
> >   moviv1.16b, #1
> >   cnt v3.16b, v2.16b
> >   udotv0.4s, v3.16b, v1.16b
> >
> > For V2SI, we generate:
> >   cnt v1.8b, v.8b
> >   uaddlp  v2.4h, v1.8b
> >   uaddlp  v3.2s, v2.4h
> >
> > For V2SI with TARGET_DOTPROD, we generate the following instead:
> >   moviv0.8b, #0
> >   moviv1.8b, #1
> >   cnt v3.8b, v2.8b
> >   udotv0.2s, v3.8b, v1.8b
> >
> > For V2DI, we generate:
> >   cnt v1.16b, v.16b
> >   uaddlp  v2.8h, v1.16b
> >   uaddlp  v3.4s, v2.8h
> >   uaddlp  v4.2d, v3.4s
> >
> > For V4SI with TARGET_DOTPROD, we generate the following instead:
> >   moviv0.4s, #0
> >   moviv1.16b, #1
> >   cnt v3.16b, v2.16b
> >   udotv0.4s, v3.16b, v1.16b
> >   uaddlp  v0.2d, v0.4s
> >
> > PR target/113859
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-simd.md (aarch64_addlp):
> Rename to...
> > (@aarch64_addlp): ... This.
> > (popcount2): New define_expand.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/popcnt-udot.c: New test.
> > * gcc.target/aarch64/popcnt-vec.c: New test.
> >
> > Signed-off-by: Pengxuan Zheng 
> > ---
> >  gcc/config/aarch64/aarch64-simd.md| 52 +-
> >  .../gcc.target/aarch64/popcnt-udot.c  | 45 
> >  gcc/testsuite/gcc.target/aarch64/popcnt-vec.c | 69
> > +++
> >  3 files changed, 165 insertions(+), 1 deletion(-)  create mode 100644
> > gcc/testsuite/gcc.target/aarch64/popcnt-udot.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md
> > b/gcc/config/aarch64/aarch64-simd.md
> > index 0bb39091a38..3bdd4400408 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -3461,7 +3461,7 @@ (define_insn
> "*aarch64_addlv_ze"
> >[(set_attr "type" "neon_reduc_add")]
> >  )
> >
> > -(define_expand "aarch64_addlp"
> > +(define_expand "@aarch64_addlp"
> >[(set (match_operand: 0 "register_operand")
> > (plus:
> >   (vec_select:
> > @@ -3517,6 +3517,56 @@ (define_insn
> "popcount2"
> >[(set_attr "type" "neon_cnt")]
> >  )
> >
> > +(define_expand "popcount2"
> > +  [(set (match_operand:VDQHSD 0 "register_operand")
> > +(popcount:VDQHSD (match_operand:VDQHSD 1
> > +"register_operand")))]
> > +  "TARGET_SIMD"
> > +  {
> > +/* Generate a byte popcount. */
> > +machine_mode mode =  == 64 ? V8QImode : V16QImode;
> > +rtx tmp = gen_reg_rtx (mode);
> > +auto icode = optab_handler (popcount_optab, mode);
> > +emit_insn (GEN_FCN (icode) (tmp, gen_lowpart (mode,
> > +operands[1])));
> > +
> > +if (TARGET_DOTPROD)
> > +  {
> > +/* For V4SI and V2SI, we can generate a UDOT with a 0 accumulator
> and a
> > +   1 multiplicant. For V2DI, another UAADDLP is needed. */
> > +if (mode == V4SImode || mode == V2SImode
> > +|| mode == V2DImode)
> 
> I think the above simplified/modified to just `mode == SImode ||
> mode == DImode`.
> Also s/multiplicant/multiplicand/ .

Thanks, Andrew! I have updated the patch accordingly.

https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655020.html
> 
> > +  {
> > +machine_mode dp_mode =  == 64 ? V2SImode : V4SImode;
> > +rtx ones = force_reg (mode, CONST1_RTX (mode));
> > +rtx zeros = CONST0_RTX (dp_mode);
> > +rtx dp = gen_reg_rtx (dp_mode);
> > +auto dp_icode = optab_handler (udot_prod_optab, mode);
> > +emit_move_insn (dp, zeros);
> > +emit_insn (GEN_FCN (dp_icode) (dp, tmp, ones, dp));
> > +if (mode == V2DImode)
> > +  {
> > +emit_insn (gen_aarch64_uaddlpv4si (operands[0], dp));
> > +DONE;
> > +  }
> > +emit_move_insn (operands[0], dp);
> > +DONE;
> > +  }
> > +  }
> > +
> > +/* Use a sequence of UADDLPs to accumulate the counts. Each step
> doubles
> > +   the element size and halves the number of elements. */
> > +do
> > +  {
> > +auto icode = code_for_aarch64_addlp (ZERO_EXTEND, GET_MODE
> (tmp));
> > +mode = insn_data[icode].operand[0].mode;
> > +rtx dest = mode == mode ? operands[0] : 

Re: [PATCH] rs6000: ROP - Do not disable shrink-wrapping for leaf functions [PR114759]

2024-06-18 Thread Segher Boessenkool
On Tue, Jun 18, 2024 at 12:53:09PM -0500, Peter Bergner wrote:
> On 6/18/24 8:20 AM, Segher Boessenkool wrote:
> > On Mon, Jun 17, 2024 at 08:54:46PM -0500, Peter Bergner wrote:
> >> So we should be able to shrink-wrap in the presence of the ROP protection.
> [snip]
> > But do we want to?  And, how far, in what cases not?
> 
> My answer to the above would be "yes", "as far as we do today without
> -mrop-protect" and "none". :-)  I don't think -mrop-protect should affect
> whether we shrink-wrap or not.

That is a good answer, and I agree :-)

> I don't think shrink-wrapping call free
> paths makes the compiled code less secure by not emitting the hashst/hashchk
> insns on those paths, so why would we do anything different wrt 
> shrink-wrapping?

From my viewpoint, -mrop-protect should not change code generation at
all, except of course it has to emit some hash* insns :-)

If we want to have some functions noipa, then we should just put that
attribute there in the code!  Maybe some applications / libraries /
kernels / whatever should do some of that, but the compiler cannot
really help with policy questions like that.


Segher


[PATCH v5] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-18 Thread Pengxuan Zheng
This patch improves GCC’s vectorization of __builtin_popcount for aarch64 target
by adding popcount patterns for vector modes besides QImode, i.e., HImode,
SImode and DImode.

With this patch, we now generate the following for V8HI:
  cnt v1.16b, v0.16b
  uaddlp  v2.8h, v1.16b

For V4HI, we generate:
  cnt v1.8b, v0.8b
  uaddlp  v2.4h, v1.8b

For V4SI, we generate:
  cnt v1.16b, v0.16b
  uaddlp  v2.8h, v1.16b
  uaddlp  v3.4s, v2.8h

For V4SI with TARGET_DOTPROD, we generate the following instead:
  movi    v0.4s, #0
  movi    v1.16b, #1
  cnt     v3.16b, v2.16b
  udot    v0.4s, v3.16b, v1.16b

For V2SI, we generate:
  cnt v1.8b, v.8b
  uaddlp  v2.4h, v1.8b
  uaddlp  v3.2s, v2.4h

For V2SI with TARGET_DOTPROD, we generate the following instead:
  movi    v0.8b, #0
  movi    v1.8b, #1
  cnt     v3.8b, v2.8b
  udot    v0.2s, v3.8b, v1.8b

For V2DI, we generate:
  cnt v1.16b, v0.16b
  uaddlp  v2.8h, v1.16b
  uaddlp  v3.4s, v2.8h
  uaddlp  v4.2d, v3.4s

For V2DI with TARGET_DOTPROD, we generate the following instead:
  movi    v0.4s, #0
  movi    v1.16b, #1
  cnt     v3.16b, v2.16b
  udot    v0.4s, v3.16b, v1.16b
  uaddlp  v0.2d, v0.4s
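
To make the two V4SI strategies above easier to follow, here is a scalar C
model of them. It only mimics what CNT, UADDLP and UDOT compute lane by lane;
the function names are made up for this sketch and it is not the expander code
from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* CNT: population count of every byte of a 128-bit vector.  */
static void cnt_v16qi(const uint8_t in[16], uint8_t out[16])
{
  for (int i = 0; i < 16; i++) {
    uint8_t v = in[i], c = 0;
    while (v) { c += v & 1; v >>= 1; }
    out[i] = c;
  }
}

/* V4SI popcount as CNT followed by two UADDLP steps.  Each step adds
   adjacent lanes, doubling the element size and halving the lane count.  */
static void popcount_v4si_uaddlp(const uint32_t in[4], uint32_t out[4])
{
  uint8_t bytes[16], counts[16];
  uint16_t halves[8];
  memcpy(bytes, in, sizeof bytes);
  cnt_v16qi(bytes, counts);
  for (int i = 0; i < 8; i++)
    halves[i] = (uint16_t)(counts[2 * i] + counts[2 * i + 1]);
  for (int i = 0; i < 4; i++)
    out[i] = (uint32_t)halves[2 * i] + halves[2 * i + 1];
}

/* V4SI popcount as CNT followed by UDOT with an all-ones multiplicand
   and a zero accumulator: each 32-bit lane sums its four byte counts.  */
static void popcount_v4si_udot(const uint32_t in[4], uint32_t out[4])
{
  uint8_t bytes[16], counts[16], ones[16];
  memcpy(bytes, in, sizeof bytes);
  cnt_v16qi(bytes, counts);
  memset(ones, 1, sizeof ones);
  for (int i = 0; i < 4; i++) {
    uint32_t acc = 0;
    for (int j = 0; j < 4; j++)
      acc += (uint32_t)counts[4 * i + j] * ones[4 * i + j];
    out[i] = acc;
  }
}

/* Demo helper: place X in LANE (others zero) and return that lane's count.  */
static uint32_t lane_popcount(uint32_t x, int lane, int use_udot)
{
  uint32_t in[4] = { 0, 0, 0, 0 }, out[4];
  in[lane] = x;
  if (use_udot)
    popcount_v4si_udot(in, out);
  else
    popcount_v4si_uaddlp(in, out);
  return out[lane];
}
```

Both routes give the same per-lane result; the UDOT form simply replaces the
two widening additions with a single dot product against a vector of ones.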

PR target/113859

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_addlp): Rename to...
(@aarch64_addlp): ... This.
(popcount2): New define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/popcnt-udot.c: New test.
* gcc.target/aarch64/popcnt-vec.c: New test.

Signed-off-by: Pengxuan Zheng 
---
 gcc/config/aarch64/aarch64-simd.md| 51 +-
 .../gcc.target/aarch64/popcnt-udot.c  | 58 
 gcc/testsuite/gcc.target/aarch64/popcnt-vec.c | 69 +++
 3 files changed, 177 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt-udot.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt-vec.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 0bb39091a38..1c76123a518 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3461,7 +3461,7 @@ (define_insn "*aarch64_addlv_ze"
   [(set_attr "type" "neon_reduc_add")]
 )
 
-(define_expand "aarch64_addlp"
+(define_expand "@aarch64_addlp"
   [(set (match_operand: 0 "register_operand")
(plus:
  (vec_select:
@@ -3517,6 +3517,55 @@ (define_insn "popcount2"
   [(set_attr "type" "neon_cnt")]
 )
 
+(define_expand "popcount2"
+  [(set (match_operand:VDQHSD 0 "register_operand")
+(popcount:VDQHSD (match_operand:VDQHSD 1 "register_operand")))]
+  "TARGET_SIMD"
+  {
+/* Generate a byte popcount. */
+machine_mode mode =  == 64 ? V8QImode : V16QImode;
+rtx tmp = gen_reg_rtx (mode);
+auto icode = optab_handler (popcount_optab, mode);
+emit_insn (GEN_FCN (icode) (tmp, gen_lowpart (mode, operands[1])));
+
+if (TARGET_DOTPROD)
+  {
+/* For V4SI and V2SI, we can generate a UDOT with a 0 accumulator and a
+   1 multiplicand. For V2DI, another UADDLP is needed. */
+if (mode == SImode || mode == DImode)
+  {
+machine_mode dp_mode =  == 64 ? V2SImode : V4SImode;
+rtx ones = force_reg (mode, CONST1_RTX (mode));
+rtx zeros = CONST0_RTX (dp_mode);
+rtx dp = gen_reg_rtx (dp_mode);
+auto dp_icode = optab_handler (udot_prod_optab, mode);
+emit_move_insn (dp, zeros);
+emit_insn (GEN_FCN (dp_icode) (dp, tmp, ones, dp));
+if (mode == V2DImode)
+  {
+emit_insn (gen_aarch64_uaddlpv4si (operands[0], dp));
+DONE;
+  }
+emit_move_insn (operands[0], dp);
+DONE;
+  }
+  }
+
+/* Use a sequence of UADDLPs to accumulate the counts. Each step doubles
+   the element size and halves the number of elements. */
+do
+  {
+auto icode = code_for_aarch64_addlp (ZERO_EXTEND, GET_MODE (tmp));
+mode = insn_data[icode].operand[0].mode;
+rtx dest = mode == mode ? operands[0] : gen_reg_rtx (mode);
+emit_insn (GEN_FCN (icode) (dest, tmp));
+tmp = dest;
+  }
+while (mode != mode);
+DONE;
+  }
+)
+
 ;; 'across lanes' max and min ops.
 
 ;; Template for outputting a scalar, so we can create __builtins which can be
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt-udot.c b/gcc/testsuite/gcc.target/aarch64/popcnt-udot.c
new file mode 100644
index 000..150ff746361
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt-udot.c
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8.2-a+dotprod -fno-vect-cost-model" } */
+
+/*
+** bar:
+** ldr q([0-9]+), \[x0\]
+** moviv([0-9]+).16b, 0x1
+** moviv([0-9]+).4s, 0
+** cnt v([0-9]+).16b, v\1.16b
+** udotv\3.4s, v\4.16b, v\2.16b
+** str q\3, \[x1\]
+** ret
+*/
+void
+bar (unsigned int 

Re: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

2024-06-18 Thread Segher Boessenkool
On Fri, Feb 10, 2023 at 10:59:52AM +0800, Xionghu Luo via Gcc-patches wrote:
So, nothing here is obvious at all still.  Could you please split it up
a bit more, so that every step is either small or simple?

So maybe first just split patterns to BE and LE versions, and nothing
else?

And one patch per insn, if at all possible.

This matters so that a regression search will immediately show the
culprit pattern, if anything went wrong.

Most patches will not change anything consequential, but some will, and
it should be very clear which do!

And change (or add) comments in the patch so that I don't have to ask
the same questions as before again! :-)

Most of this seems clean and good, but there is just too much
independent stuff going on at the same time.  If your patch series is
split up correctly writing a changelog for it is very easy (this is a
good canary to use!), and if we get regressions from this it should be
trivial to find the problem, too.

> @@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
>  {
>emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
> +  emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>  }
>else
>  {
>emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
> -  emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
> +  emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
>  }
>DONE;
>  })

Please don't.  Call the generic gen_vmrg* patterns from the widen
things, don't try to do the compiler's job of specialising stuff, it
only makes things much less readable, and causes more mistakes.  Just do
like what was there before, essentially.


Segher


Re: [Patch, rs6000, middle-end] v2: Add implementation for different targets for pair mem fusion

2024-06-18 Thread Richard Sandiford
Ajit Agarwal  writes:
> Hello Richard:
>
> On 14/06/24 4:26 pm, Richard Sandiford wrote:
>> Ajit Agarwal  writes:
>>> Hello Richard:
>>>
>>> All comments are addressed.
>> 
>> I don't think this addresses the following comments from the previous
>> reviews:
>> 
>> (1) It is not correct to mark existing insn uses as live-out.
>> The patch mustn't try to do this.
>> 
>
> Addressed in v3 of the patch.

The new version still tries to circumvent the live-out assert though.
While the old patch brute-forced the assert to false by setting
the live-out flag, the new patch just removes the assert.

Like I said earlier, the assert is showing up a real bug and we
should fix the bug rather than suppress the assert.

rtl-ssa live-out uses are somewhat like DF_LIVE_OUT in df.
They occur at the end of a basic block, in an artificial insn_info
that does not correspond to an actual rtl insn.

The comment above process_uses_of_deleted_def says:

// SET has been deleted.  Clean up all remaining uses.  Such uses are
// either dead phis or now-redundant live-out uses.

In other words, if we're removing a definition, all uses in "real"
debug and non-debug insns must be removed either earlier than the
definition or at the same time as the definition.  No such uses
should remain.  The only uses that should be left are phis and
the fake end-of-block live-out uses that I described above.  These
uses are just "plumbing" that support something that is now neither
defined nor used by real instructions.  It's therefore safe to delete
the plumbing.

Please see the previous discussion about this:


> +// Check whether load can be fusable or not.
> +// Return true if fuseable otherwise false.
> +bool
> +rs6000_pair_fusion::fuseable_load_p (insn_info *info)
> +{
> +  for (auto def : info->defs())
> +{
> +  auto set = dyn_cast (def);
> +  for (auto use1 : set->nondebug_insn_uses ())
> + use1->set_is_live_out_use (true);
> +}

 What was the reason for adding this loop?

>>>
>>> The purpose of adding is to avoid assert failure in 
>>> gcc/rtl-ssa/changes.cc:252
>>
>> That assert is making sure that we don't delete a definition of a
>> register (or memory) while a real insn still uses it.  If the assert
>> is firing then something has gone wrong.
>>
>> Live-out uses are a particular kind of use that occur at the end of
>> basic blocks.  It's incorrect to mark normal insn uses as live-out.
>>
>> When an assert fails, it's important to understand why the failure
>> occurs, rather than brute-force the assert condition to true.
>>
>
> The above assert failure occurs when there is a debug insn and its
> use is not live-out.

Uses in debug insns are never live-out uses.

It sounds like the bug is that we're failing to update all debug uses of
the original register.  We need to do that, or "reset" the debug insn if
substitution fails for some reason.

See fixup_debug_uses for what the target-independent part of the pass
does for debug insns that are affected by movement.  Hopefully the
update needed here will be simpler than that.


What happens if you leave the assert alone?  When does it fire?  Is it
still for uses in debug insns?  If so, it's the fusion pass's responsibility
to update those, as mentioned above.  And it must update them before,
or at the same time as, it deletes the definition.

Thanks,
Richard


Re: [PATCH-1v4] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]

2024-06-18 Thread Richard Sandiford
HAO CHEN GUI  writes:
> Hi Richard,
>
> 在 2024/6/17 17:04, Richard Sandiford 写道:
>> I don't think we should keep the single_set condition after this change.
>> insn_cost can handle all instructions.
>
> Just tested with removing single_set condition. It causes some regressions.
> If the new_rtl is a debug insn, it still can do the replacement as it can
> be recog and it's not a single_set rtl. After removing single_set condition,
> the "ok" (recog) will be reset to false in the "if" block as the cost of
> debug insn is unknown. So the replacement won't be done.
>
>   bool ok = recog (attempt, use_change);
>   if (ok && !prop.changed_mem_p () && !use_insn->is_asm ())
> {
>   bool strict_p = !prop.likely_profitable_p ();
>   if (!change_is_worthwhile (use_change, strict_p))
> {
>   if (dump_file)
> fprintf (dump_file, "change not profitable");
>   ok = false;
> }
> }
>
>   if (!ok)
> {
>   /* The pattern didn't match, but if all uses of SRC folded to
>  constants, we can add a REG_EQUAL note for the result, if there
>  isn't one already.  */
>   if (!prop.folded_to_constants_p ())

We shouldn't be trying to cost changes to debug insns, so I think
the extra condition should be !use_insn->is_debug_insn ().

Thanks,
Richard



[PATCH] Fortran: fix for CHARACTER(len=*) dummies with bind(C) [PR115390]

2024-06-18 Thread Harald Anlauf
Dear all,

the attached simple patch fixes warnings for use of uninitialized
temporaries for the string length before being defined.  The cause
is obvious: type sizes were being calculated before the temporaries
were set from the descriptor for the dummy passed to the BIND(C)
procedure.  Wrong code might have been possible as well.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From 95a3cefd5e84cf0d393c2606757894389c08ebba Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Tue, 18 Jun 2024 21:57:19 +0200
Subject: [PATCH] Fortran: fix for CHARACTER(len=*) dummies with bind(C)
 [PR115390]

gcc/fortran/ChangeLog:

	PR fortran/115390
	* trans-decl.cc (gfc_conv_cfi_to_gfc): Move derivation of type sizes
	for character via gfc_trans_vla_type_sizes to after character length
	has been set.

gcc/testsuite/ChangeLog:

	PR fortran/115390
	* gfortran.dg/bind_c_char_11.f90: New test.
---
 gcc/fortran/trans-decl.cc|  4 +-
 gcc/testsuite/gfortran.dg/bind_c_char_11.f90 | 45 
 2 files changed, 47 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/bind_c_char_11.f90

diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index dca7779528b..704f24be84a 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -7063,8 +7063,8 @@ gfc_conv_cfi_to_gfc (stmtblock_t *init, stmtblock_t *finally,
   if (sym->ts.type == BT_CHARACTER
   && !INTEGER_CST_P (sym->ts.u.cl->backend_decl))
 {
-  gfc_conv_string_length (sym->ts.u.cl, NULL, init);
-  gfc_trans_vla_type_sizes (sym, init);
+  gfc_conv_string_length (sym->ts.u.cl, NULL, );
+  gfc_trans_vla_type_sizes (sym, );
 }

   /* gfc->data = cfi->base_addr - or for scalars: gfc = cfi->base_addr.
diff --git a/gcc/testsuite/gfortran.dg/bind_c_char_11.f90 b/gcc/testsuite/gfortran.dg/bind_c_char_11.f90
new file mode 100644
index 000..5ed8e82853b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/bind_c_char_11.f90
@@ -0,0 +1,45 @@
+! { dg-do compile }
+! { dg-additional-options "-Wuninitialized" }
+!
+! PR fortran/115390 - fixes for CHARACTER(len=*) dummies with bind(C)
+
+module test
+  implicit none
+contains
+  subroutine bar(s,t) bind(c)
+character(*), intent(in) :: s,t
+optional :: t
+call foo(s,t)
+  end
+  subroutine bar1(s,t) bind(c)
+character(*), intent(in) :: s(:),t(:)
+optional :: t
+call foo1(s,t)
+  end
+  subroutine bar4(s,t) bind(c)
+character(len=*,kind=4), intent(in) :: s,t
+optional:: t
+call foo4(s,t)
+  end
+  subroutine bar5(s,t) bind(c)
+character(len=*,kind=4), intent(in) :: s(:),t(:)
+optional:: t
+call foo5(s,t)
+  end
+  subroutine foo(s,t)
+character(*), intent(in) :: s,t
+optional :: t
+  end
+  subroutine foo1(s,t)
+character(*), intent(in) :: s(:),t(:)
+optional :: t
+  end
+  subroutine foo4(s,t)
+character(len=*,kind=4), intent(in) :: s,t
+optional:: t
+  end
+  subroutine foo5(s,t)
+character(len=*,kind=4), intent(in) :: s(:),t(:)
+optional:: t
+  end
+end
--
2.35.3



[to-be-committed] [RISC-V] Minor cleanup/improvement to bset/binv patterns

2024-06-18 Thread Jeff Law
This patch introduces a bit_optab iterator that maps IOR/XOR to bset and 
binv (and one day bclr if we need it).  That allows us to combine some 
patterns that only differed in the RTL opcode (IOR vs XOR) and in the 
name/assembly (bset vs binv).
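
As a small aside, the reason these patterns can share one template is visible
in a scalar model of the two instructions: they differ only in the logical
operation applied to the single-bit mask. The C sketch below assumes RV64
(so the bit index uses the low 6 bits of rs2); the enum and function names are
invented for the example.

```c
#include <assert.h>
#include <stdint.h>

enum bit_op { BIT_SET, BIT_INV };

/* Model of bset/binv on RV64: build a one-bit mask from the low 6 bits
   of RS2, then OR it in (bset) or XOR it in (binv).  */
static uint64_t zbs_op(enum bit_op op, uint64_t rs1, uint64_t rs2)
{
  uint64_t mask = (uint64_t)1 << (rs2 & 63);
  return op == BIT_SET ? (rs1 | mask) : (rs1 ^ mask);
}
```

Since the hardware already ignores the upper bits of the index, an explicit
AND of the index with a suitable mask is redundant, which is the case the
mask variant of the pattern is meant to catch.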


Additionally, this also allows us to use the iterator in the
bsetmask and bsetidisi patterns, thus potentially fixing a missed
optimization.


This has gone through my tester.  I'll wait for a verdict from 
pre-commit CI before moving forward.


Jeff



diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index ae5e7e510c0..17a036bdb60 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -569,22 +569,22 @@ (define_insn_and_split "*minmax"
 
 ;; ZBS extension.
 
-(define_insn "*bset"
+(define_insn "*"
   [(set (match_operand:X 0 "register_operand" "=r")
-   (ior:X (ashift:X (const_int 1)
-(match_operand:QI 2 "register_operand" "r"))
-  (match_operand:X 1 "register_operand" "r")))]
+   (any_or:X (ashift:X (const_int 1)
+   (match_operand:QI 2 "register_operand" "r"))
+ (match_operand:X 1 "register_operand" "r")))]
   "TARGET_ZBS"
-  "bset\t%0,%1,%2"
+  "\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
-(define_insn "*bset_mask"
+(define_insn "*_mask"
   [(set (match_operand:X 0 "register_operand" "=r")
-   (ior:X (ashift:X (const_int 1)
-(subreg:QI
- (and:X (match_operand:X 2 "register_operand" "r")
-(match_operand 3 "" 
"")) 0))
-  (match_operand:X 1 "register_operand" "r")))]
+   (any_or:X (ashift:X (const_int 1)
+   (subreg:QI
+ (and:X (match_operand:X 2 "register_operand" "r")
+(match_operand 3 "" 
"")) 0))
+ (match_operand:X 1 "register_operand" "r")))]
   "TARGET_ZBS"
   "bset\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
@@ -655,24 +655,24 @@ (define_insn "*bset_1_mask"
   "bset\t%0,x0,%1"
   [(set_attr "type" "bitmanip")])
 
-(define_insn "*bseti"
+(define_insn "*i"
   [(set (match_operand:X 0 "register_operand" "=r")
-   (ior:X (match_operand:X 1 "register_operand" "r")
-  (match_operand:X 2 "single_bit_mask_operand" "DbS")))]
+   (any_or:X (match_operand:X 1 "register_operand" "r")
+ (match_operand:X 2 "single_bit_mask_operand" "DbS")))]
   "TARGET_ZBS"
-  "bseti\t%0,%1,%S2"
+  "i\t%0,%1,%S2"
   [(set_attr "type" "bitmanip")])
 
 ;; As long as the SImode operand is not a partial subreg, we can use a
 ;; bseti without postprocessing, as the middle end is smart enough to
 ;; stay away from the signbit.
-(define_insn "*bsetidisi"
+(define_insn "*idisi"
   [(set (match_operand:DI 0 "register_operand" "=r")
-   (ior:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
-   (match_operand 2 "single_bit_mask_operand" "i")))]
+   (any_or:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
+  (match_operand 2 "single_bit_mask_operand" "i")))]
   "TARGET_ZBS && TARGET_64BIT
&& !partial_subreg_p (operands[1])"
-  "bseti\t%0,%1,%S2"
+  "i\t%0,%1,%S2"
   [(set_attr "type" "bitmanip")])
 
 ;; We can easily handle zero extensions
@@ -781,23 +781,6 @@ (define_split
  (and:DI (rotate:DI (const_int -2) (match_dup 1))
  (match_dup 3)))])
 
-(define_insn "*binv"
-  [(set (match_operand:X 0 "register_operand" "=r")
-   (xor:X (ashift:X (const_int 1)
-(match_operand:QI 2 "register_operand" "r"))
-  (match_operand:X 1 "register_operand" "r")))]
-  "TARGET_ZBS"
-  "binv\t%0,%1,%2"
-  [(set_attr "type" "bitmanip")])
-
-(define_insn "*binvi"
-  [(set (match_operand:X 0 "register_operand" "=r")
-   (xor:X (match_operand:X 1 "register_operand" "r")
-  (match_operand:X 2 "single_bit_mask_operand" "DbS")))]
-  "TARGET_ZBS"
-  "binvi\t%0,%1,%S2"
-  [(set_attr "type" "bitmanip")])
-
 (define_insn "*bext"
   [(set (match_operand:X 0 "register_operand" "=r")
(zero_extract:X (match_operand:X 1 "register_operand" "r")
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 1e37e843023..20745faa55e 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -275,6 +275,9 @@ (define_code_attr optab [(ashift "ashl")
 (fix "fix_trunc")
 (unsigned_fix "fixuns_trunc")])
 
+(define_code_attr bit_optab [(ior "bset")
+(xor "binv")])
+
 ;;  code attributes
 (define_code_attr or_optab [(ior "ior")
(xor "xor")])


Re: [PATCH v2 1/2] libstdc++: Handle extended alignment in std::get_temporary_buffer [PR105258]

2024-06-18 Thread Jonathan Wakely
On Tue, 18 Jun 2024 at 19:05, Stephan Bergmann  wrote:
>
> On 6/3/24 22:22, Jonathan Wakely wrote:
> > Pushed to trunk now.
>
> Just a heads-up that this started to cause Clang (at least 18/19) to
> emit a -Wdeprecated-declarations now,

Yes, I saw this too.

> (There already is another such pragma diagnostic ignored a bit further
> down in that file, so I assume that's the way to go there?)

Yes, the call to get_temporary_buffer used to be in that function
further down in the file. I moved it elsewhere in the file, but didn't
move the pragma to go with it, oops.

So we can remove those later pragmas, and move them earlier in the
file. I'm already testing a patch locally.


[PATCH] libcpp: Add support for gnu::offset #embed/__has_embed parameter

2024-06-18 Thread Jakub Jelinek
Hi!

The following patch adds on top of the just posted #embed patch
a first extension, gnu::offset, which allows seeking in the data
file (for seekable files, otherwise read and throw away).
I think this is useful e.g. when some binary data start with
some well known header which shouldn't be included in the data etc.
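
The "seek if possible, otherwise read and throw away" idea can be sketched
with plain stdio as below. This is only an illustration under my own
assumptions: the function names are hypothetical, it uses fseek/fread rather
than whatever libcpp does internally, and the demo helper writes its input to
a temporary file.

```c
#include <assert.h>
#include <stdio.h>

/* Read and throw away COUNT bytes.  Returns 0 on success, -1 if the
   stream ends (or fails) before COUNT bytes were consumed.  */
static int discard_bytes(FILE *fp, long count)
{
  char buf[4096];
  while (count > 0) {
    size_t want = count < (long)sizeof buf ? (size_t)count : sizeof buf;
    size_t got = fread(buf, 1, want, fp);
    if (got == 0)
      return -1;
    count -= (long)got;
  }
  return 0;
}

/* Skip OFFSET bytes: try a relative seek first, and fall back to
   reading when the stream is not seekable (e.g. a pipe).  */
static int skip_offset(FILE *fp, long offset)
{
  if (fseek(fp, offset, SEEK_CUR) == 0)
    return 0;
  return discard_bytes(fp, offset);
}

/* Demo helper: store DATA in a temporary file, skip OFFSET bytes and
   return the next byte, EOF if none is left, or -2 if no temporary
   file could be created.  FORCE_READ selects the fallback path.  */
static int byte_after_offset(const char *data, size_t len, long offset,
                             int force_read)
{
  FILE *fp = tmpfile();
  if (!fp)
    return -2;
  fwrite(data, 1, len, fp);
  rewind(fp);
  int rc = force_read ? discard_bytes(fp, offset) : skip_offset(fp, offset);
  int ch = rc == 0 ? fgetc(fp) : EOF;
  fclose(fp);
  return ch;
}
```

For example, byte_after_offset ("HDR:payload", 11, 4, 0) skips a four-byte
header and returns the first payload byte on either path.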

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-06-18  Jakub Jelinek  

libcpp/
* internal.h (struct cpp_embed_params): Add offset member.
* directives.cc (_cpp_parse_embed_params): Parse gnu::offset
parameter.
* files.cc (struct _cpp_file): Add offset member.
(_cpp_stack_embed): Handle params->offset.
gcc/
* doc/cpp.texi (Binary Resource Inclusion): Document gnu::offset
#embed parameter.
gcc/testsuite/
* c-c++-common/cpp/embed-15.c: New test.
* c-c++-common/cpp/embed-16.c: New test.
* gcc.dg/cpp/embed-5.c: New test.

--- libcpp/internal.h.jj 2024-06-18 08:37:55.759488622 +0200
+++ libcpp/internal.h   2024-06-18 08:38:39.355915868 +0200
@@ -630,7 +630,7 @@ struct cpp_embed_params
 {
   location_t loc;
   bool has_embed;
-  cpp_num_part limit;
+  cpp_num_part limit, offset;
   cpp_embed_params_tokens prefix, suffix, if_empty;
 };
 
--- libcpp/directives.cc.jj 2024-06-18 08:37:55.767488517 +0200
+++ libcpp/directives.cc 2024-06-18 11:32:05.073768407 +0200
@@ -1102,6 +1102,21 @@ _cpp_parse_embed_params (cpp_reader *pfi
break;
  }
}
+  else if (param_prefix_len == 3 && memcmp (param_prefix, "gnu", 3) == 0)
+   {
+ struct { int len; const char *name; } gnu_params[] = {
+   { 6, "offset" }
+ };
+ for (size_t i = 0;
+  i < sizeof (gnu_params) / sizeof (gnu_params[0]); ++i)
+   if (param_name_len == gnu_params[i].len
+   && memcmp (param_name, gnu_params[i].name,
+  param_name_len) == 0)
+ {
+   param_kind = 4 + i;
+   break;
+ }
+   }
   if (param_kind != (size_t) -1)
{
  if ((seen & (1 << param_kind)) == 0)
@@ -1130,11 +1145,21 @@ _cpp_parse_embed_params (cpp_reader *pfi
   if (param_kind != (size_t) -1 && token->type != CPP_OPEN_PAREN)
cpp_error_with_line (pfile, CPP_DL_ERROR, loc, 0,
 "expected '('");
-  else if (param_kind == 0)
+  else if (param_kind == 0 || param_kind == 4)
{
  if (params->has_embed && pfile->op_stack == NULL)
_cpp_expand_op_stack (pfile);
- params->limit = _cpp_parse_expr (pfile, "#embed", token);
+ cpp_num_part res = _cpp_parse_expr (pfile, "#embed", token);
+ if (param_kind == 0)
+   params->limit = res;
+ else
+   {
+ if (res > INTTYPE_MAXIMUM (off_t))
+   cpp_error_with_line (pfile, CPP_DL_ERROR, loc, 0,
+"too large 'gnu::offset' argument");
+ else
+   params->offset = res;
+   }
  token = _cpp_get_token_no_padding (pfile);
}
   else if (token->type == CPP_OPEN_PAREN)
--- libcpp/files.cc.jj  2024-06-18 11:24:24.598781748 +0200
+++ libcpp/files.cc 2024-06-18 11:31:42.238066309 +0200
@@ -90,6 +90,9 @@ struct _cpp_file
   /* Size for #embed, perhaps smaller than st.st_size.  */
   size_t limit;
 
+  /* Offset for #embed.  */
+  off_t offset;
+
   /* File descriptor.  Invalid if -1, otherwise open.  */
   int fd;
 
@@ -1243,8 +1246,11 @@ _cpp_stack_embed (cpp_reader *pfile, con
   _cpp_file *orig_file = file;
   if (file->buffer_valid
   && (!S_ISREG (file->st.st_mode)
- || (file->limit < file->st.st_size + (size_t) 0
- && file->limit < params->limit)))
+ || file->offset + (cpp_num_part) 0 > params->offset
+ || (file->limit < file->st.st_size - file->offset + (size_t) 0
+ && (params->offset - file->offset > (cpp_num_part) file->limit
+ || file->limit - (params->offset
+   - file->offset) < params->limit
 {
   bool found = false;
   if (S_ISREG (file->st.st_mode))
@@ -1257,8 +1263,13 @@ _cpp_stack_embed (cpp_reader *pfile, con
 && strcmp (file->path, file->next_file->path) == 0)
{
  file = file->next_file;
- if (file->limit >= file->st.st_size + (size_t) 0
- || file->limit >= params->limit)
+ if (file->offset + (cpp_num_part) 0 <= params->offset
+ && (file->limit >= (file->st.st_size - file->offset
+ + (size_t) 0)
+ || (params->offset
+ - file->offset <= (cpp_num_part) file->limit
+ && file->limit - (params->offset
+   - file->offset) >= params->limit)))
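
One way to read the reuse condition in the _cpp_stack_embed hunks above is
as an interval-coverage check: a cached buffer can serve a new request if it
starts at or before the requested offset and either runs to the end of the
file or still has enough bytes left after the requested offset.  The sketch
below restates that reading in standalone C.  struct cached_embed and
cache_covers are illustrative names, not libcpp identifiers, and the sketch
assumes offset <= file_size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the cached state of one embedded file.  */
struct cached_embed
{
  uint64_t offset;     /* first byte of the file held in the cache */
  uint64_t limit;      /* number of bytes held in the cache */
  uint64_t file_size;  /* total size of the file on disk */
};

/* Return true if the cached bytes can satisfy a request for REQ_LIMIT
   bytes starting at REQ_OFFSET.  Assumes c->offset <= c->file_size.  */
static bool
cache_covers (const struct cached_embed *c, uint64_t req_offset,
              uint64_t req_limit)
{
  /* The cache starts after the requested data: unusable.  */
  if (c->offset > req_offset)
    return false;
  /* The cache runs to the end of the file, so everything from
     c->offset onwards is available.  */
  if (c->limit >= c->file_size - c->offset)
    return true;
  /* Otherwise the request must fit in what remains after skipping.  */
  uint64_t skip = req_offset - c->offset;
  return skip <= c->limit && c->limit - skip >= req_limit;
}
```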
  

[PATCH] libcpp, c-family, v2: Add (dumb) C23 N3017 #embed support [PR105863]

2024-06-18 Thread Jakub Jelinek
Hi!

Here is an updated patch.  It fixes a one-liner in files.cc (|| instead of
&&), fixes -fdirectives-only preprocessing of #embed (it isn't 100% in the
spirit of -fdirectives-only mode, because for the tokens from
prefix/suffix/if_empty clauses it has to actually preprocess them and can't
leave them as is, while
#define FOO a + b + c +
#embed __FILE__ prefix(FOO) limit(2)
perhaps could be somehow in theory preprocessed into
 FOO 35,100
#define BAR a + b + c + ) suffix (d
#embed __FILE__ prefix(BAR) limit(2)
can't, so this patch emits
 a + b + c + 35,100
and
  a + b + c + 35,100 d
there even in -fdirectives-only mode.  And for -traditional-cpp, the patch
just errors out, I'm afraid #embed is quite incompatible with the
traditional preprocessing and people using traditional preprocessing IMNSHO
shouldn't be encountering #embed nor __has_embed.
Additionally, I've added documentation for the directive and operators.
While initially I thought that is not needed because C23 documents that,
I've noticed we actually document how #include/#define etc. behaves even
when the standards define that too, it is useful to explain what search
paths are searched and we need some spot where to document extensions (the
gnu:: namespace parameters).

What follows is the previous description:
The following patch implements the C23 N3017 "#embed - a scannable,
tooling-friendly binary resource inclusion mechanism" paper.

The implementation is intentionally dumb, in that it doesn't significantly
speed up compilation of larger initializers and doesn't make it possible
to use huge #embeds (like several gigabytes large, that is compile time
and memory still infeasible).
There are 2 reasons for this.  One is that I think the way it is implemented
now in the patch is how we should handle the smaller #embed sizes,
dunno with which boundary, whether 32 bytes or 64 or something like that,
certainly handling the single byte cases which is something that can appear
anywhere in the source where constant integer literal can appear is
desirable and I think for a few bytes it isn't worth it to come up with
something smarter and users would like to e.g. see it in -E readably as
well (perhaps the slow vs. fast boundary should be determined by command
line option).  And the other one is to be able to more easily find
regressions in behavior caused by the optimizations, so we have something
to get back in git to compare against.
I'm definitely willing to work on the optimizations (likely introduce a new
CPP_* token type to refer to a range of libcpp owned memory (start + size)
and similarly some tree which can do the same, and can be at any time e.g.
split into 2 subparts + say INTEGER_CST in between if needed say for
const unsigned char d[] = {
#embed "2GB.dat" prefix (0, 0, ) suffix (, [0x4000] = 42)
}; still without having to copy around huge amounts of data; STRING_CST
owns the memory it points to and can be only 2GB in size), but would
like to do that incrementally.
And would like to first include some extensions also not included in
this patch, like gnu::offset (off) parameter to allow to skip certain
constant amount of bytes at the start of the files, plus
gnu::base64 ("base64_encoded_data") parameter to add something which can
store more efficiently large amounts of the #embed data in preprocessed
source.

I've been cross-checking all the tests also against the LLVM implementation
https://github.com/llvm/llvm-project/pull/68620
which has been for a few hours even committed to LLVM trunk but reverted
afterwards.

The patch uses --embed-dir= option that clang plans to add above and doesn't
use other variants on the search directories yet, plus there are no
default directories at least for the time being where to search for embed
files.  So, #embed "..." works if it is found in the same directory (or
relative to the current file's directory) and #embed "/..." or #embed 
work always, but relative #embed <...> doesn't unless at least one
--embed-dir= is specified.  There is no reason to differentiate between
system and non-system directories, so we don't need -isystem like
counterpart, perhaps -iquote like counterpart could be useful in the future,
dunno what else.  Should we have --embed-directory= as alias to --embed-dir=?

There are some differences beyond clang ICEs, so I'd like to point them out
to make sure there is agreement on the choices in the patch.  They are also
mentioned in the comments of the llvm pull request.

The most important is that the GCC patch (as well as the original thephd.dev
LLVM branch on godbolt) expands #embed (or acts as if it is expanded) into
a mere sequence of numbers like 123,2,35,26 rather than what clang
effectively treats as (unsigned char)123,(unsigned char)2,(unsigned
char)35,(unsigned char)26 but only does that when using integrated
preprocessor, not when using -save-temps where it acts as GCC.
JeanHeyd as the original author agrees that is how it is currently worded in
C23.
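
To show why the two expansions are distinguishable at all, here is a small
C11 sketch that does not use #embed itself.  It only compares the type of a
plain integer constant with the type of the same constant cast to unsigned
char, which is where the two behaviours described above would differ.
ELEM_TYPE, plain_type and cast_type are illustrative names.

```c
/* Name the type of an expression; illustrative macro only.  */
#define ELEM_TYPE(x) \
  _Generic ((x), int: "int", unsigned char: "unsigned char", default: "other")

/* An element as a plain number, like 123 in "123,2,35,26".  */
static const char *
plain_type (void)
{
  return ELEM_TYPE (123);
}

/* An element as if written (unsigned char)123.  */
static const char *
cast_type (void)
{
  return ELEM_TYPE ((unsigned char) 123);
}
```

In an array initializer both forms give the same bytes; the difference only
shows up in contexts such as _Generic or sizeof.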

Another difference 

Re: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

2024-06-18 Thread Peter Bergner
On 6/12/24 2:50 AM, Kewen.Lin wrote:
> As the recent PR115355 shows, this issue can also affect the
> behavior when users are adopting vectorization optimization,
> IMHO we should get this landed as soon as possible.

I agree we want this fixed ASAP.




> As all said above, I believe this patch is a correct fix and
> considering the impact of the issue, I'd like to get this
> pushed next week if no objections.

The only complaint I have on the patch, and I know this existed before
the patch, is we're using register_operand for the predicate for these
patterns when we probably should be using altivec_register_operand or
vsx_register_operand depending on the specific pattern.

Yes, other pre-existing patterns use that, but those should probably be
fixed too.  Maybe we go with register_operand for now with this patch
and then have a follow-on patch (from us) that cleans those all up???

Otherwise, LGTM (although I can't approve it).

Peter




assumed size

2024-06-18 Thread Martin Uecker


Hi all,

I am working on a paper for the following syntax extension

int a[10];
int (*a)[*] = 


This would not be a wide pointer; it would just initialize
the size of the type from the initializer.  This would
also work for VM types.  So the result is a conventional
pointer to an array, with either a regular or a variably
modified type.

I am not so sure how to best integrate it.  Maybe we
could just say the type becomes the composite type.
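
For comparison, here is a sketch of what can already be written today to
get a pointer whose array size comes from the object it points to.  It does
not use the proposed [*] syntax; it relies on the GNU __typeof__ extension
(C23 spells it typeof), and fixed_case and vla_case are illustrative names.

```c
#include <stddef.h>

/* Regular array type: p has type int (*)[10].  */
static size_t
fixed_case (void)
{
  int a[10];
  __typeof__ (a) *p = &a;
  return sizeof *p / sizeof ((*p)[0]);
}

/* Variably modified type: p has type int (*)[n].  */
static size_t
vla_case (size_t n)
{
  int a[n];
  __typeof__ (a) *p = &a;
  return sizeof *p / sizeof ((*p)[0]);
}
```

In both cases p is a conventional (thin) pointer to an array, and the
element count is recovered from its type rather than stored alongside it.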


Martin


[PATCH] libstdc++: testsuite: Skip atomics test if there's no -latomic

2024-06-18 Thread Thiago Jung Bauermann
On arm-none-eabi, 29_atomics/atomic_float/compare_exchange_padding.cc
fails to build:

FAIL: 29_atomics/atomic_float/compare_exchange_padding.cc  -std=gnu++20 (test for excess errors)
Excess errors:
/home/bauermann/.cache/builds/combined-tree-thumb-m55-hard-eabi/ld/.libs/ld-new: cannot find -latomic: No such file or directory
collect2: error: ld returned 1 exit status

UNRESOLVED: 29_atomics/atomic_float/compare_exchange_padding.cc  -std=gnu++20 compilation failed to produce executable

This test should be skipped if libatomic is not available for the
target.  To that end, add dg-require-libatomic-available and use it in
29_atomics/atomic_float/compare_exchange_padding.cc.

Also, check_effective_target_libatomic_available is fixed to use
atomic_link_flags to properly compile the test executable.

Tested on:
- Host x86_64-linux-gnu, target arm-unknown-eabi
- Native aarch64-linux-gnu
- Native x86_64-linux-gnu

gcc/testsuite/
* lib/target-supports-dg.exp (dg-require-libatomic-available):
New procedure.
* lib/target-supports.exp (check_effective_target_libatomic_available):
Use atomic_link_flags.

libstdc++-v3/
* testsuite/29_atomics/atomic_float/compare_exchange_padding.cc:
Use dg-require-libatomic-available.
---
 gcc/testsuite/lib/target-supports-dg.exp | 9 +
 gcc/testsuite/lib/target-supports.exp| 2 +-
 .../29_atomics/atomic_float/compare_exchange_padding.cc  | 1 +
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports-dg.exp b/gcc/testsuite/lib/target-supports-dg.exp
index 6dce9fdc1ce2..502e4e22b368 100644
--- a/gcc/testsuite/lib/target-supports-dg.exp
+++ b/gcc/testsuite/lib/target-supports-dg.exp
@@ -698,3 +698,12 @@ proc dg-require-prog-name-available { args } {
 }
 }
 
+# If the atomic library is not supported on this target, skip this test.
+
+proc dg-require-libatomic-available { args } {
+set libatomic_available [check_effective_target_libatomic_available]
+if { $libatomic_available == 0 } {
+   upvar dg-do-what dg-do-what
+   set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"]
+}
+}
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index e307f4e69efb..de27297c1787 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1662,7 +1662,7 @@ proc check_iconv_available { test_what } {
 proc check_effective_target_libatomic_available { } {
 return [check_no_compiler_messages libatomic_available executable {
int main (void) { return 0; }
-} "-latomic"]
+} "[atomic_link_flags [get_multilibs]] -latomic"]
 }
 
 # Return 1 if an ASCII locale is supported on this host, 0 otherwise.
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_float/compare_exchange_padding.cc b/libstdc++-v3/testsuite/29_atomics/atomic_float/compare_exchange_padding.cc
index 49626ac66511..351244b25279 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic_float/compare_exchange_padding.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_float/compare_exchange_padding.cc
@@ -1,4 +1,5 @@
 // { dg-do run { target c++20 } }
+// { dg-require-libatomic-available "" }
 // { dg-options "-O0" }
 // { dg-additional-options "[atomic_link_flags [get_multilibs]] -latomic" }
 


Re: [PATCH 6/7] diagnostics: eliminate diagnostic_context::m_print_path callback

2024-06-18 Thread David Malcolm
On Tue, 2024-06-18 at 11:08 -0400, David Malcolm wrote:
> No functional change intended.

Sorry, it looks like I should have combined patches 6 and 7, in that
patch 6 temporarily breaks the build:

e.g.:
  https://builder.sourceware.org/buildbot/#/builders/156/builds/10063

make[2]: Leaving directory '/home/mjw/wildebeest/buildbot/gcc-fedora-ppc64le/gcc-build/gcc'
g++ -no-pie   -g -O2 -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings   -DHAVE_CONFIG_H -no-pie  gcov-dump.o \
hash-table.o ggc-none.o\
libcommon.a ../libcpp/libcpp.a   ../libbacktrace/.libs/libbacktrace.a ../libiberty/libiberty.a ../libdecnumber/libdecnumber.a  -o gcov-dump
/usr/bin/ld: libcommon.a(diagnostic.o): in function `diagnostic_context::show_any_path(diagnostic_info const&)':
/home/mjw/wildebeest/buildbot/gcc-fedora-ppc64le/gcc-build/gcc/../../gcc/gcc/diagnostic.cc:918:(.text+0x2084): undefined reference to `diagnostic_context::print_path(diagnostic_path const*)'
/usr/bin/ld: /home/mjw/wildebeest/buildbot/gcc-fedora-ppc64le/gcc-build/gcc/../../gcc/gcc/diagnostic.cc:918:(.text+0x4a34): undefined reference to `diagnostic_context::print_path(diagnostic_path const*)'
collect2: error: ld returned 1 exit status

Should be fixed by patch 7 in the kit, which puts that symbol in
libcommon.a; am continuing to watch the post-commit CI.

Sorry again
Dave



Re: [PATCH] rs6000, altivec-2-runnable.c update the require-effective-target

2024-06-18 Thread Carl Love
Kewen, Peter, Segher:

On 6/17/24 19:56, Kewen.Lin wrote:
> Hi,
> 
> on 2024/6/18 00:08, Peter Bergner wrote:
>> On 6/14/24 1:37 PM, Carl Love wrote:
>>> Per the additional feedback after patch: 
>>>
>>>   commit c892525813c94b018464d5a4edc17f79186606b7
>>>   Author: Carl Love 
>>>   Date:   Tue Jun 11 14:01:16 2024 -0400
>>>
>>>   rs6000, altivec-2-runnable.c should be a runnable test
>>> 
>>>   The test case has "dg-do compile" set not "dg-do run" for a runnable
>>>   test.  This patch changes the dg-do command argument to run.
>>> 
>>>   gcc/testsuite/ChangeLog:gcc/testsuite/ChangeLog:
>>>   * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
>>>   argument to run.
>>
>> Test case altivec-1-runnable.c seems to have the same issue, in that it
>> is currently a dg-do compile test case rather than the intended dg-do run.
> 
> Good catch!

OK, will update that as well.  I think it will need the same header as
altivec-2-runnable.c, so once we have a final change for altivec-2-runnable.c,
I will make the header for altivec-1-runnable.c the same.

> 
>> Can you have a look at changing that to dg-do run too?  My guess it that
>> this one will want something similar to some other altivec test cases, ala:
>>
>> /* { dg-do run { target vmx_hw } } */
>> /* { dg-do compile { target { ! vmx_hw } } } */
>> /* { dg-require-effective-target powerpc_altivec_ok } */
>> /* { dg-options "-O2 -maltivec -mabi=altivec" } */
> 
> I'd expect the "-runnable" test case focuses on testing for run.  Normally,
> the one without "-runnable" would focus on testing for compiling (scan some
> desired insn), but this altivec-1.c and altivec-1-runnable.c seems to test
> for different things, maybe we should separate them into different names
> if they don't test for a same test point.

The altivec-1-runnable.c and altivec-2-runnable.c tests were added for various
built-ins that didn't have any test cases.  There was no intended connection
to the existing altivec-*.c test files.  I started creating runnable tests
when I started adding support for built-ins that we claimed to support but that
had never actually been implemented.  I created runnable tests to make sure my
implementation actually worked.  I continued to add runnable tests for built-ins
that existed but didn't have a test case.  Adding runnable tests did find a
couple of issues where the existing implementation had a bug.

That all said, if we want to change the names of altivec-1-runnable.c and
altivec-2-runnable.c to a different naming scheme, that is fine with me.  Perhaps
we should finish fixing the header for this test file, then do
altivec-1-runnable.c, and then a final patch that does all the file renaming?

> 
>>
>> That said, I don't like not having a -mdejagnu-cpu=... here.
>> I think for our server cpus, this is fine, but on an embedded system
>> with a old ISA default for -mcpu=... (so we be doing a dg-do compile),
>> just adding -maltivec to that default may not make much sense for that
>> default and probably should be an error.  Maybe something like:
> 
> Yes, for some embedded cpus, there will be some error messages, but since
> we have powerpc_altivec_ok effective target, the error would make that
> effective target checking fail so I'd expect it'll stop it being tested
> (unsupported).
> 
>>
>> /* { dg-do run { target vmx_hw } } */
>> /* { dg-do compile { target { ! vmx_hw } } } */
>> /* { dg-require-effective-target powerpc_altivec_ok } */
>> /* { dg-options "-O2 -mdejagnu=power7" } */
>>
>> ...makes more sense?   Ke Wen & Segher, thoughts on that?
>> Ke Wen, should powerpc_altivec_ok be powerpc_altivec here???
> 
> Yes, I just pushed r15-1390 for this change.
> 
> BR,
> Kewen
> 

We had -mdejagnu-cpu=power8 before, but it looks like we want to go to power7 now.

It sounds like we want the following:

/* { dg-do run { target vmx_hw } } */
/* { dg-do compile { target { ! vmx_hw } } } */
/* { dg-options "-O2 -mdejagnu-cpu=power7" } */
/* { dg-require-effective-target powerpc_altivec } */

 Carl 


Re: [C PATCH] Fix ICE related to incomplete structures in C23 [PR114930,PR115502].

2024-06-18 Thread Martin Uecker
Am Dienstag, dem 18.06.2024 um 17:27 +0200 schrieb Richard Biener:
> 
> > Am 18.06.2024 um 17:20 schrieb Martin Uecker :
> > 
> > 
> > As discussed this replaces the use of check_qualified_type with
> > a simple check for qualifiers as suggested by Jakub in
> > c_update_type_canonical.
> 
> Note a canonical type should always be unqualified (for
> classical qualifiers, not address space or atomic qualification)

The logic in build_qualified_type is the same as in this patch,
it constructs TYPE_CANONICAL with qualifiers.  Or what am I
missing?

Martin

> 
> Richard 
> 
> > Martin
> > 
> > 
> > Bootstrapped and regression tested on x86_64.
> > 
> > 
> >C23: Fix ICE related to incomplete structures [PR114930,PR115502].
> > 
> >The fix for PR114574 needs to be further revised because 
> > check_qualified_type
> >makes decision based on TYPE_NAME which can be incorrect for C when there
> >are TYPE_DECLS involved.
> > 
> >gcc/c/:
> >* c-decl.c (c_update_type_canonical): Do not use 
> > check_qualified_type.
> > 
> >gcc/testsuite/:
> >* gcc.dg/pr114930.c: New test.
> >* gcc.dg/pr115502.c: New test.
> > 
> > diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
> > index 01326570e2b..610061a07f8 100644
> > --- a/gcc/c/c-decl.cc
> > +++ b/gcc/c/c-decl.cc
> > @@ -9374,7 +9374,7 @@ c_update_type_canonical (tree t)
> >  if (TYPE_QUALS (x) == TYPE_QUALS (t))
> >TYPE_CANONICAL (x) = TYPE_CANONICAL (t);
> >  else if (TYPE_CANONICAL (t) != t
> > -   || check_qualified_type (x, t, TYPE_QUALS (x)))
> > +   || TYPE_QUALS (x) != TYPE_QUALS (TYPE_CANONICAL (t)))
> >TYPE_CANONICAL (x)
> >  = build_qualified_type (TYPE_CANONICAL (t), TYPE_QUALS (x));
> >  else
> > diff --git a/gcc/testsuite/gcc.dg/pr114930.c 
> > b/gcc/testsuite/gcc.dg/pr114930.c
> > new file mode 100644
> > index 000..5e982fb8929
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr114930.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do compile }
> > + * { dg-options "-std=c23 -flto" } */
> > +
> > +typedef struct WebPPicture WebPPicture;
> > +typedef int (*WebPProgressHook)(const WebPPicture *);
> > +WebPProgressHook progress_hook;
> > +struct WebPPicture {
> > +} WebPGetColorPalette(const struct WebPPicture *);
> > +
> > diff --git a/gcc/testsuite/gcc.dg/pr115502.c 
> > b/gcc/testsuite/gcc.dg/pr115502.c
> > new file mode 100644
> > index 000..02b52622c5a
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr115502.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do compile }
> > + * { dg-options "-std=c23 -flto" } */
> > +
> > +typedef struct _OSet OSet;
> > +typedef OSet AvlTree;
> > +void vgPlain_OSetGen_Lookup(const OSet *);
> > +struct _OSet {};
> > +void vgPlain_OSetGen_Lookup(const AvlTree *);
> > +
> > 



[PATCH v2] ARM: thumb1: Use LDMIA/STMIA for DI/DF loads/stores

2024-06-18 Thread Siarhei Volkau
If the address register is dead after the load/store operation, it looks
beneficial to use LDMIA/STMIA instead of a pair of LDR/STR instructions,
at least if optimizing for size.
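
To spell out why the address register has to be dead (or overlap the
destination), here is a toy C model of the two instruction sequences.  It
assumes the Thumb-1 rule that LDMIA writes the incremented address back to
the base register unless the base is in the register list; struct cpu,
ldr_pair and ldmia_pair are illustrative names, and memory is modelled as a
plain array of words.

```c
#include <stdint.h>

/* Toy model: the eight low registers r0-r7.  */
struct cpu
{
  uint32_t r[8];
};

/* Model of "ldr rLO, [rN]; ldr rLO+1, [rN, #4]".  The base address is
   read up front, i.e. the loads are assumed to be ordered so that the
   base is not clobbered before the second load.  */
static void
ldr_pair (struct cpu *c, const uint32_t *mem, unsigned rn, unsigned rlo)
{
  uint32_t addr = c->r[rn];
  c->r[rlo] = mem[addr / 4];
  c->r[rlo + 1] = mem[addr / 4 + 1];
}

/* Model of "ldmia rN!, {rLO, rLO+1}".  */
static void
ldmia_pair (struct cpu *c, const uint32_t *mem, unsigned rn, unsigned rlo)
{
  uint32_t addr = c->r[rn];
  uint32_t lo = mem[addr / 4];
  uint32_t hi = mem[addr / 4 + 1];
  /* Write-back happens only when the base is not in the list.  */
  if (rn != rlo && rn != rlo + 1)
    c->r[rn] = addr + 8;
  c->r[rlo] = lo;
  c->r[rlo + 1] = hi;
}
```

In this model the two sequences leave the destination registers identical
and differ only in the base register, which is why the replacement is safe
when the base is dead afterwards or is itself overwritten by the load.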

Changes v1 -> v2:
 - switching to peephole2 approach
 - added test case

gcc/ChangeLog:

* config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New.
(peephole2 to rewrite DI/DF store): New.
(thumb1_movdi_insn): Handle overlapped regs ldmia case.
(thumb_movdf_insn): Likewise.

* config/arm/iterators.md (DIDF): New.

gcc/testsuite:

* gcc.target/arm/thumb1-load-store-64bit.c: Add new test.

Signed-off-by: Siarhei Volkau 
---
 gcc/config/arm/iterators.md   |  3 +++
 gcc/config/arm/thumb1.md  | 27 ++-
 .../gcc.target/arm/thumb1-load-store-64bit.c  | 16 +++
 3 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 8d066fcf05d..09046bff83b 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -50,6 +50,9 @@ (define_mode_iterator QHSD [QI HI SI DI])
 ;; A list of the 32bit and 64bit integer modes
 (define_mode_iterator SIDI [SI DI])
 
+;; A list of the 64bit modes for thumb1.
+(define_mode_iterator DIDF [DI DF])
+
 ;; A list of atomic compare and swap success return modes
 (define_mode_iterator CCSI [(CC_Z "TARGET_32BIT") (SI "TARGET_THUMB1")])
 
diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index d7074b43f60..ed4b706773a 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -633,6 +633,8 @@ (define_insn "*thumb1_movdi_insn"
   gcc_assert (TARGET_HAVE_MOVT);
   return \"movw\\t%Q0, %L1\;movs\\tR0, #0\";
 case 4:
+  if (reg_overlap_mentioned_p (operands[0], operands[1]))
+   return \"ldmia\\t%m1, {%0, %H0}\";
   return \"ldmia\\t%1, {%0, %H0}\";
 case 5:
   return \"stmia\\t%0, {%1, %H1}\";
@@ -966,6 +968,8 @@ (define_insn "*thumb_movdf_insn"
return \"adds\\t%0, %1, #0\;adds\\t%H0, %H1, #0\";
   return \"adds\\t%H0, %H1, #0\;adds\\t%0, %1, #0\";
 case 1:
+  if (reg_overlap_mentioned_p (operands[0], operands[1]))
+   return \"ldmia\\t%m1, {%0, %H0}\";
   return \"ldmia\\t%1, {%0, %H0}\";
 case 2:
   return \"stmia\\t%0, {%1, %H1}\";
@@ -2055,4 +2059,25 @@ (define_insn "thumb1_stack_protect_test_insn"
(set_attr "conds" "clob")
(set_attr "type" "multiple")]
 )
-
+
+;; match patterns usable by ldmia/stmia
+(define_peephole2
+  [(set (match_operand:DIDF 0 "low_register_operand" "")
+   (mem:DIDF (match_operand:SI 1 "low_register_operand")))]
+  "TARGET_THUMB1
+   && (peep2_reg_dead_p (1, operands[1])
+   || REGNO (operands[0]) + 1 == REGNO (operands[1]))"
+  [(set (match_dup 0)
+   (mem:DIDF (post_inc:SI (match_dup 1))))]
+  ""
+)
+
+(define_peephole2
+  [(set (mem:DIDF (match_operand:SI 1 "low_register_operand"))
+   (match_operand:DIDF 0 "low_register_operand" ""))]
+  "TARGET_THUMB1
+   && peep2_reg_dead_p (1, operands[1])"
+  [(set (mem:DIDF (post_inc:SI (match_dup 1)))
+   (match_dup 0))]
+  ""
+)
diff --git a/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
new file mode 100644
index 000..167fa9ec876
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mthumb -Os" }  */
+/* { dg-require-effective-target arm_thumb1_ok } */
+
+void copy_df(double *dst, const double *src)
+{
+*dst = *src;
+}
+
+void copy_di(unsigned long long *dst, const unsigned long long *src)
+{
+*dst = *src;
+}
+
+/* { dg-final { scan-assembler-times "ldmia\tr\[0-7\]" 2 } } */
+/* { dg-final { scan-assembler-times "stmia\tr\[0-7\]!" 2 } } */
-- 
2.45.2



[committed] [RISC-V] Fix wrong patch application

2024-06-18 Thread Jeff Law


Applied the wrong patch which didn't have the final testsuite adjustment 
to skip -Os on the new test.  Fixed thusly.


Pushed to the trunk.

Jeff

commit cbf7245c8b305fe997a535051a4fec379a429243
Author: Jeff Law 
Date:   Tue Jun 18 12:10:57 2024 -0600

[committed] [RISC-V] Fix wrong patch application

Applied the wrong patch which didn't have the final testsuite adjustment to
skip -Os on the new test.  Fixed thusly.

Pushed to the trunk.

gcc/testsuite
* gcc.target/riscv/zbs-ext-2.c: Do not run for -Os.

diff --git a/gcc/testsuite/gcc.target/riscv/zbs-ext-2.c b/gcc/testsuite/gcc.target/riscv/zbs-ext-2.c
index 301bc9d89c4..690dd722bce 100644
--- a/gcc/testsuite/gcc.target/riscv/zbs-ext-2.c
+++ b/gcc/testsuite/gcc.target/riscv/zbs-ext-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-march=rv64gc_zbb_zbs -mabi=lp64" } */
-/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" } } */
 
 
 typedef unsigned int uint32_t;


Re: [PATCH v2 1/2] libstdc++: Handle extended alignment in std::get_temporary_buffer [PR105258]

2024-06-18 Thread Stephan Bergmann

On 6/3/24 22:22, Jonathan Wakely wrote:

Pushed to trunk now.


Just a heads-up that this started to cause Clang (at least 18/19) to 
emit a -Wdeprecated-declarations now,



$ cat test.cc
#include 
void f(int * p1, int * p2) { std::stable_sort(p1, p2); }



$ clang++ 
--gcc-install-dir=/home/sberg/gcc/inst/lib/gcc/x86_64-pc-linux-gnu/15.0.0 
-fsyntax-only test.cc
In file included from test.cc:1:
In file included from 
/home/sberg/gcc/inst/lib/gcc/x86_64-pc-linux-gnu/15.0.0/../../../../include/c++/15.0.0/algorithm:61:
In file included from 
/home/sberg/gcc/inst/lib/gcc/x86_64-pc-linux-gnu/15.0.0/../../../../include/c++/15.0.0/bits/stl_algo.h:69:
/home/sberg/gcc/inst/lib/gcc/x86_64-pc-linux-gnu/15.0.0/../../../../include/c++/15.0.0/bits/stl_tempbuf.h:207:11:
 warning: 'get_temporary_buffer' is deprecated [-Wdeprecated-declarations]
  207 | std::get_temporary_buffer(__original_len));
  |  ^
/home/sberg/gcc/inst/lib/gcc/x86_64-pc-linux-gnu/15.0.0/../../../../include/c++/15.0.0/bits/stl_tempbuf.h:323:40:
 note: in instantiation of member function 
'std::_Temporary_buffer<__gnu_cxx::__normal_iterator>, 
int>::_Impl::_Impl' requested here
  323 | : _M_original_len(__original_len), _M_impl(__original_len)
  |^
/home/sberg/gcc/inst/lib/gcc/x86_64-pc-linux-gnu/15.0.0/../../../../include/c++/15.0.0/bits/stl_algo.h:4948:15:
 note: in instantiation of member function 
'std::_Temporary_buffer<__gnu_cxx::__normal_iterator>, 
int>::_Temporary_buffer' requested here
 4948 |   _TmpBuf __buf(__first, (__last - __first + 1) / 2);
  |   ^
/home/sberg/gcc/inst/lib/gcc/x86_64-pc-linux-gnu/15.0.0/../../../../include/c++/15.0.0/bits/stl_algo.h:4993:23:
 note: in instantiation of function template specialization 
'std::__stable_sort<__gnu_cxx::__normal_iterator>, 
__gnu_cxx::__ops::_Iter_less_iter>' requested here
 4993 |   _GLIBCXX_STD_A::__stable_sort(__first, __last,
  |   ^
test.cc:3:37: note: in instantiation of function template specialization 
'std::stable_sort<__gnu_cxx::__normal_iterator>>' 
requested here
3 | void f(std::vector & v) { std::stable_sort(v.begin(), v.end()); }
  | ^
/home/sberg/gcc/inst/lib/gcc/x86_64-pc-linux-gnu/15.0.0/../../../../include/c++/15.0.0/bits/stl_tempbuf.h:141:5:
 note: 'get_temporary_buffer' has been explicitly marked deprecated here
  141 | _GLIBCXX17_DEPRECATED
  | ^
/home/sberg/gcc/inst/lib/gcc/x86_64-pc-linux-gnu/15.0.0/../../../../include/c++/15.0.0/x86_64-pc-linux-gnu/bits/c++config.h:123:34:
 note: expanded from macro '_GLIBCXX17_DEPRECATED'
  123 | # define _GLIBCXX17_DEPRECATED [[__deprecated__]]
  |  ^
1 warning generated.


which could be silenced with


--- a/libstdc++-v3/include/bits/stl_tempbuf.h
+++ b/libstdc++-v3/include/bits/stl_tempbuf.h
@@ -203,8 +203,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
explicit
_Impl(ptrdiff_t __original_len)
{
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
  pair __p(
std::get_temporary_buffer(__original_len));
+#pragma GCC diagnostic pop
  _M_len = __p.second;
  _M_buffer = __p.first;
}


(There already is another such pragma diagnostic ignored a bit further 
down in that file, so I assume that's the way to go there?)
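
For readers unfamiliar with the idiom, here is a standalone C sketch of the
same push/ignored/pop pattern around a call to a deprecated function.
old_api and wrapper are hypothetical names standing in for the deprecated
function and its caller; the point is only that the pragmas have to
surround the call site itself.

```c
/* Stand-in for a deprecated function; hypothetical name.  */
__attribute__ ((deprecated)) static int
old_api (int x)
{
  return x + 1;
}

static int
wrapper (int x)
{
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
  int r = old_api (x);  /* the warning is suppressed for this call only */
#pragma GCC diagnostic pop
  return r;
}
```

Both GCC and Clang accept the "#pragma GCC diagnostic" spelling.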




Re: [PATCH] rs6000: ROP - Do not disable shrink-wrapping for leaf functions [PR114759]

2024-06-18 Thread Peter Bergner
On 6/18/24 8:20 AM, Segher Boessenkool wrote:
> On Mon, Jun 17, 2024 at 08:54:46PM -0500, Peter Bergner wrote:
>> So we should be able to shrink-wrap in the presence of the ROP protection.
[snip]
> But do we want to?  And, how far, in what cases not?

My answer to the above would be "yes", "as far as we do today without
-mrop-protect" and "none". :-)  I don't think -mrop-protect should affect
whether we shrink-wrap or not.  I don't think shrink-wrapping call-free
paths makes the compiled code less secure by not emitting the hashst/hashchk
insns on those paths, so why would we do anything different wrt shrink-wrapping?
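
As a concrete picture of a call-free path, consider a sketch like the
following (helper and lookup are made-up names).  The early return makes no
call, so on that path there is no saved return address for hashst/hashchk
to protect; only the path that calls helper needs the full prologue.
Whether the compiler actually shrink-wraps this function depends on the
target and options.

```c
#include <stddef.h>

/* Kept out of line so that lookup really contains a call.  */
__attribute__ ((noinline)) static int
helper (const int *p)
{
  return *p * 2;
}

int
lookup (const int *p)
{
  if (p == NULL)
    return -1;            /* call-free path */
  return helper (p) + 1;  /* path that makes a call */
}
```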


Peter



Re: [pushed] readings: Drop FORTRAN 77 test suite at itl.nist.gov

2024-06-18 Thread Steve Kargl
On Tue, Jun 18, 2024 at 09:13:23AM +0200, Gerald Pfeifer wrote:
> The original subsite has disappeared and we couldn't find it elsewhere.
> 

https://github.com/gklimowicz/FCVS

gklimowicz is a flang developer and member of J3.

-- 
Steve


Re: [Patch, rs6000, middle-end] v2: Add implementation for different targets for pair mem fusion

2024-06-18 Thread Ajit Agarwal
Hello Richard:

On 14/06/24 4:26 pm, Richard Sandiford wrote:
> Ajit Agarwal  writes:
>> Hello Richard:
>>
>> All comments are addressed.
> 
> I don't think this addresses the following comments from the previous
> reviews:
> 
> (1) It is not correct to mark existing insn uses as live-out.
> The patch mustn't try to do this.
> 

Addressed in v3 of the patch.

> (2) To quote a previous review:
> 
> It's probably better to create a fresh OO register, rather than
> change an existing 128-bit register to 256 bits.  If we do that,
> and if reg:V16QI 125 is the destination of the second load
> (which I assume it is from the 16 offset in the subreg),
> then the new RTL should be:
> 
>   (vec_select:HI (subreg:V8HI (reg:OO NEW_REG) 16) ...)
> 
> It's possible to get this by using insn_propagation to replace
> (reg:V16QI 125) with (subreg:V16QI (reg:OO NEW_REG) 16).
> insn_propagation should then take care of the rest.
> 
> There are no existing rtl-ssa routines for handling new registers
> though.  (The idea was to add things as the need arose.)
> 
> The reason for (2) is that changing the mode of an existing pseudo
> invalidates all existing references to that pseudo.  Although the
> patch tries to fix things up, it's doing that at a stage where
> there is already "garbage in" (in the sense that the starting
> RTL is invalid).  Just changing the mode would also invalidate
> things like REG_EXPR, for example.
> 
> In contrast, the advantage of creating a new pseudo means that every
> insn transformation is from structurally valid RTL to structurally
> valid RTL.  It also prevents information being incorrectly carried
> over from the old pseudo.
> 

Addressed in v3 of the patch.

> Thanks,
> Richard

Thanks & Regards
Ajit


[Patch, rs6000, middle-end] v3: Add implementation for different targets for pair mem fusion

2024-06-18 Thread Ajit Agarwal
Hello Richard:

All comments are addressed.

Common infrastructure using generic code for pair mem fusion of different
targets.

rs6000 target-specific code implements the virtual functions defined by the
generic code.

Target-specific code is added in rs6000-mem-fusion.cc.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit


rs6000, middle-end: Add implementation for different targets for pair mem fusion

Common infrastructure using generic code for pair mem fusion of different
targets.

rs6000 target-specific code implements the virtual functions defined by the
generic code.

Target-specific code is added in rs6000-mem-fusion.cc.

2024-06-18  Ajit Kumar Agarwal  

gcc/ChangeLog:

* config/rs6000/rs6000-passes.def: New mem fusion pass
before pass_early_remat.
* pair-fusion.h: Add additional pure virtual function
required for rs6000 target implementation.
* pair-fusion.cc: Use the additional virtual functions
added for the rs6000 target.
* config/rs6000/rs6000-mem-fusion.cc: Add new pass.
Add target specific implementation for generic pure virtual
functions.
* config/rs6000/mma.md: Modify movoo machine description.
Add new machine description movoo1.
* config/rs6000/rs6000.cc: Modify rs6000_split_multireg_move
to expand movoo machine description for all constraints.
* config.gcc: Add new object file.
* config/rs6000/rs6000-protos.h: Add new prototype for mem
fusion pass.
* config/rs6000/t-rs6000: Add new rule.
* rtl-ssa/changes.cc: Add new is_live_out function and use of
same.
* rtl-ssa/functions.h: Move out allocate function from private
to public and add get_m_temp_defs function.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/mem-fusion.C: New test.
* g++.target/powerpc/mem-fusion-1.C: New test.
* gcc.target/powerpc/mma-builtin-1.c: Modify test.
---
 gcc/config.gcc|   2 +
 gcc/config/rs6000/mma.md  |  26 +-
 gcc/config/rs6000/rs6000-mem-fusion.cc| 731 ++
 gcc/config/rs6000/rs6000-passes.def   |   4 +-
 gcc/config/rs6000/rs6000-protos.h |   1 +
 gcc/config/rs6000/rs6000.cc   |  55 +-
 gcc/config/rs6000/rs6000.md   |   1 +
 gcc/config/rs6000/t-rs6000|   5 +
 gcc/pair-fusion.cc|  26 +-
 gcc/pair-fusion.h |  28 +
 gcc/rtl-ssa/changes.cc|  34 +-
 gcc/rtl-ssa/functions.h   |   7 +-
 .../g++.target/powerpc/mem-fusion-1.C |  22 +
 gcc/testsuite/g++.target/powerpc/mem-fusion.C |  15 +
 .../gcc.target/powerpc/mma-builtin-1.c|   4 +-
 15 files changed, 935 insertions(+), 26 deletions(-)
 create mode 100644 gcc/config/rs6000/rs6000-mem-fusion.cc
 create mode 100644 gcc/testsuite/g++.target/powerpc/mem-fusion-1.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/mem-fusion.C

diff --git a/gcc/config.gcc b/gcc/config.gcc
index e500ba63e32..348308b2e93 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -524,6 +524,7 @@ powerpc*-*-*)
extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
+   extra_objs="${extra_objs} rs6000-mem-fusion.o"
extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
@@ -560,6 +561,7 @@ rs6000*-*-*)
extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
+   extra_objs="${extra_objs} rs6000-mem-fusion.o"
target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.cc \$(srcdir)/config/rs6000/rs6000-call.cc"
target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
;;
diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 04e2d0066df..88413926a02 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -294,7 +294,31 @@
 
 (define_insn_and_split "*movoo"
   [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,ZwO,wa")
-   (match_operand:OO 1 "input_operand" "ZwO,wa,wa"))]
+(match_operand:OO 1 "input_operand" "ZwO,wa,wa"))]
+  "TARGET_MMA
+   && (gpc_reg_operand (operands[0], OOmode)
+   || gpc_reg_operand (operands[1], OOmode))"
+;;""
+  "@
+   #
+   #
+   #"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rs6000_split_multireg_move (operands[0], operands[1]);
+  DONE;
+}
+  [(set_attr "type" "vecload,vecstore,veclogical")
+   

Re: [RFC v3] RISC-V: Promote Zaamo/Zalrsc to a when using an old binutils

2024-06-18 Thread Patrick O'Neill

Ah that makes sense. We discussed it a bit during the patchworks
meeting - I'll drop the other changes and add it to riscv_combine_info.

Thanks,
Patrick


On 6/17/24 22:45, Kito Cheng wrote:

When 'a' is put into riscv_combine_info, 'a' will be added to the
arch string only if both zaamo *AND* zalrsc are present, so zalrsc alone
won't trigger that.
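
A minimal sketch of the rule described above, assuming a simplified model in which the two sub-extensions are plain flags. `AtomicExts` and `combine_adds_a` are hypothetical names for illustration and are not the riscv_combine_info data structure itself.

```cpp
// Hypothetical model of the two atomic sub-extensions in an arch string.
struct AtomicExts
{
  bool zaamo;   // AMO instructions
  bool zalrsc;  // LR/SC instructions
};

// 'a' is synthesised only when both sub-extensions are present, so a
// zalrsc-only (or zaamo-only) arch string is left unchanged.
inline bool combine_adds_a(const AtomicExts &exts)
{
  return exts.zaamo && exts.zalrsc;
}
```

Under this model a zalrsc-only configuration never gains 'a', which is the property that avoids emitting AMO instructions for a chip that lacks them.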

On Tue, Jun 18, 2024 at 1:35 PM Patrick O'Neill  wrote:



On Mon, Jun 17, 2024 at 5:51 PM Kito Cheng  wrote:

Maybe just add 'a' to riscv_combine_info and other logic to keep the
same (e.g. keep the logic for skip_zaamo_zalrsc)?


I did consider unconditionally upgrading zaamo/zalrsc to ‘a’ (I think that’s 
what you’re suggesting w/ riscv_combine_info).
That could cause issues if users are trying to compile for a zalrsc-only chip with 
an old version of binutils. If we upgrade zalrsc -> ‘a’ for both cc1 and 
binutils then cc1 will emit amo ops instead of their lr/sc equivalent.
GCC would end up emitting insns that are illegal for the user-provided -march 
string.

Patrick


On Tue, Jun 18, 2024 at 8:03 AM Patrick O'Neill  wrote:

Binutils 2.42 and before don't support Zaamo/Zalrsc. Promote Zaamo/Zalrsc to
'a' in the -march string when assembling.

This change respects Zaamo/Zalrsc when generating code.

Testcases that check for the default isa string will fail with the old binutils
since zaamo/zalrsc aren't emitted anymore. All other Zaamo/Zalrsc testcases
pass.

gcc/ChangeLog:

 * common/config/riscv/riscv-common.cc
 (riscv_subset_list::to_string): Add toggle to promote Zaamo/Zalrsc
 extensions to 'a'.
 (riscv_arch_str): Ditto.
 (riscv_expand_arch): Ditto.
 (riscv_expand_arch_from_cpu): Ditto.
 (riscv_expand_arch_upgrade_exts): New function. Wrapper around
 riscv_expand_arch to preserve the function signature.
 (riscv_expand_arch_no_upgrade_exts): Ditto
 (riscv_expand_arch_from_cpu_upgrade_exts): New function. Wrapper around
 riscv_expand_arch_from_cpu to preserve the function signature.
 (riscv_expand_arch_from_cpu_no_upgrade_exts): Ditto.
 * config/riscv/riscv-protos.h (riscv_arch_str): Add toggle to function
 prototype.
 * config/riscv/riscv-subset.h: Ditto.
 * config/riscv/riscv-target-attr.cc (riscv_process_target_attr):
 * config/riscv/riscv.cc (riscv_emit_attribute):
 (riscv_declare_function_name):
 * config/riscv/riscv.h (riscv_expand_arch): Remove.
 (riscv_expand_arch_from_cpu): Ditto.
 (riscv_expand_arch_upgrade_exts): Add toggle wrapper functions.
 (riscv_expand_arch_no_upgrade_exts): Ditto.
 (riscv_expand_arch_from_cpu_upgrade_exts): Ditto.
 (riscv_expand_arch_from_cpu_no_upgrade_exts): Ditto.
 (EXTRA_SPEC_FUNCTIONS): Ditto.
 (OPTION_DEFAULT_SPECS): Use non-upgraded march string when invoking the
 compiler.
 (ASM_SPEC): Use upgraded march string when invoking the assembler.

Signed-off-by: Patrick O'Neill 
---
v3 ChangeLog:
Rebased on non-promoting patch.
Wrap all Zaamo/Zalrsc upgrade code in #ifndef to prevent compiler
warnings about unused/potentially undefined variables.
Silence unused parameter warning with a void cast.
---
RFC since I'm not sure if this upgrade behavior is more trouble than
it's worth - this is a pretty invasive change. Happy to iterate further
or just drop these changes.
---
  gcc/common/config/riscv/riscv-common.cc | 111 +---
  gcc/config/riscv/riscv-protos.h |   3 +-
  gcc/config/riscv/riscv-subset.h |   2 +-
  gcc/config/riscv/riscv-target-attr.cc   |   4 +-
  gcc/config/riscv/riscv.cc   |   7 +-
  gcc/config/riscv/riscv.h|  46 ++
  6 files changed, 137 insertions(+), 36 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 1dc1d9904c7..05c26f73b73 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -907,7 +907,7 @@ riscv_subset_list::add (const char *subset, bool implied_p)
 VERSION_P to determine append version info or not.  */

  std::string
-riscv_subset_list::to_string (bool version_p) const
+riscv_subset_list::to_string (bool version_p, bool upgrade_exts) const
  {
std::ostringstream oss;
oss << "rv" << m_xlen;
@@ -916,10 +916,17 @@ riscv_subset_list::to_string (bool version_p) const
riscv_subset_t *subset;

bool skip_zifencei = false;
-  bool skip_zaamo_zalrsc = false;
bool skip_zicsr = false;
bool i2p0 = false;

+#ifndef HAVE_AS_MARCH_ZAAMO_ZALRSC
+  bool upgrade_zaamo_zalrsc = false;
+  bool has_a_ext = false;
+  bool insert_a_ext = false;
+  bool inserted_a_ext = false;
+  riscv_subset_t *a_subset;
+#endif
+
/* For RISC-V ISA version 2.2 or earlier version, zicsr and zifencei is
   included in the base ISA.  */
if (riscv_isa_spec == ISA_SPEC_CLASS_2P2)
@@ -945,8 +952,33 @@ 

Re: [PATCH v4] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-18 Thread Andrew Pinski
On Mon, Jun 17, 2024 at 11:25 PM Pengxuan Zheng  wrote:
>
> This patch improves GCC’s vectorization of __builtin_popcount for the aarch64
> target by adding popcount patterns for vector modes besides QImode, i.e.,
> HImode, SImode and DImode.
>
> With this patch, we now generate the following for V8HI:
>   cnt v1.16b, v0.16b
>   uaddlp  v2.8h, v1.16b
>
> For V4HI, we generate:
>   cnt v1.8b, v0.8b
>   uaddlp  v2.4h, v1.8b
>
> For V4SI, we generate:
>   cnt v1.16b, v0.16b
>   uaddlp  v2.8h, v1.16b
>   uaddlp  v3.4s, v2.8h
>
> For V4SI with TARGET_DOTPROD, we generate the following instead:
>   movi    v0.4s, #0
>   movi    v1.16b, #1
>   cnt     v3.16b, v2.16b
>   udot    v0.4s, v3.16b, v1.16b
>
> For V2SI, we generate:
>   cnt v1.8b, v0.8b
>   uaddlp  v2.4h, v1.8b
>   uaddlp  v3.2s, v2.4h
>
> For V2SI with TARGET_DOTPROD, we generate the following instead:
>   movi    v0.8b, #0
>   movi    v1.8b, #1
>   cnt     v3.8b, v2.8b
>   udot    v0.2s, v3.8b, v1.8b
>
> For V2DI, we generate:
>   cnt v1.16b, v0.16b
>   uaddlp  v2.8h, v1.16b
>   uaddlp  v3.4s, v2.8h
>   uaddlp  v4.2d, v3.4s
>
> For V2DI with TARGET_DOTPROD, we generate the following instead:
>   movi    v0.4s, #0
>   movi    v1.16b, #1
>   cnt     v3.16b, v2.16b
>   udot    v0.4s, v3.16b, v1.16b
>   uaddlp  v0.2d, v0.4s
>
> PR target/113859
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-simd.md (aarch64_addlp): Rename to...
> (@aarch64_addlp): ... This.
> (popcount2): New define_expand.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/popcnt-udot.c: New test.
> * gcc.target/aarch64/popcnt-vec.c: New test.
>
> Signed-off-by: Pengxuan Zheng 
> ---
>  gcc/config/aarch64/aarch64-simd.md| 52 +-
>  .../gcc.target/aarch64/popcnt-udot.c  | 45 
>  gcc/testsuite/gcc.target/aarch64/popcnt-vec.c | 69 +++
>  3 files changed, 165 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt-udot.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 0bb39091a38..3bdd4400408 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -3461,7 +3461,7 @@ (define_insn "*aarch64_addlv_ze"
>[(set_attr "type" "neon_reduc_add")]
>  )
>
> -(define_expand "aarch64_addlp"
> +(define_expand "@aarch64_addlp"
>[(set (match_operand: 0 "register_operand")
> (plus:
>   (vec_select:
> @@ -3517,6 +3517,56 @@ (define_insn "popcount2"
>[(set_attr "type" "neon_cnt")]
>  )
>
> +(define_expand "popcount2"
> +  [(set (match_operand:VDQHSD 0 "register_operand")
> +(popcount:VDQHSD (match_operand:VDQHSD 1 "register_operand")))]
> +  "TARGET_SIMD"
> +  {
> +/* Generate a byte popcount. */
> +machine_mode mode =  == 64 ? V8QImode : V16QImode;
> +rtx tmp = gen_reg_rtx (mode);
> +auto icode = optab_handler (popcount_optab, mode);
> +emit_insn (GEN_FCN (icode) (tmp, gen_lowpart (mode, operands[1])));
> +
> +if (TARGET_DOTPROD)
> +  {
> +/* For V4SI and V2SI, we can generate a UDOT with a 0 accumulator 
> and a
> +   1 multiplicant. For V2DI, another UAADDLP is needed. */
> +if (mode == V4SImode || mode == V2SImode
> +|| mode == V2DImode)

I think the above can be simplified to just `mode == SImode ||
mode == DImode`.
Also s/multiplicant/multiplicand/ .

> +  {
> +machine_mode dp_mode =  == 64 ? V2SImode : V4SImode;
> +rtx ones = force_reg (mode, CONST1_RTX (mode));
> +rtx zeros = CONST0_RTX (dp_mode);
> +rtx dp = gen_reg_rtx (dp_mode);
> +auto dp_icode = optab_handler (udot_prod_optab, mode);
> +emit_move_insn (dp, zeros);
> +emit_insn (GEN_FCN (dp_icode) (dp, tmp, ones, dp));
> +if (mode == V2DImode)
> +  {
> +emit_insn (gen_aarch64_uaddlpv4si (operands[0], dp));
> +DONE;
> +  }
> +emit_move_insn (operands[0], dp);
> +DONE;
> +  }
> +  }
> +
> +/* Use a sequence of UADDLPs to accumulate the counts. Each step doubles
> +   the element size and halves the number of elements. */
> +do
> +  {
> +auto icode = code_for_aarch64_addlp (ZERO_EXTEND, GET_MODE (tmp));
> +mode = insn_data[icode].operand[0].mode;
> +rtx dest = mode == mode ? operands[0] : gen_reg_rtx (mode);
> +emit_insn (GEN_FCN (icode) (dest, tmp));
> +tmp = dest;
> +  }
> +while (mode != mode);
> +DONE;
> +  }
> +)
> +
>  ;; 'across lanes' max and min ops.
>
>  ;; Template for outputting a scalar, so we can create __builtins which can be
> diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt-udot.c b/gcc/testsuite/gcc.target/aarch64/popcnt-udot.c
> new file mode 
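
To illustrate the UADDLP-based sequence described in the commit message above, here is a scalar sketch for the V4SI case: count bits per byte (the CNT step), then add adjacent pairs twice, each step doubling the element size and halving the element count. The names `popcount8` and `popcount_v4si` are hypothetical, and this models only the arithmetic, not the RTL expander.

```cpp
#include <array>
#include <cstdint>

// Per-byte population count, the scalar analogue of CNT on one lane.
inline std::uint8_t popcount8(std::uint8_t b)
{
  std::uint8_t n = 0;
  for (; b != 0; b &= static_cast<std::uint8_t>(b - 1))
    ++n;  // each iteration clears the lowest set bit
  return n;
}

// Popcount of each 32-bit lane of a 4-lane vector, done the UADDLP way.
inline std::array<std::uint32_t, 4>
popcount_v4si(const std::array<std::uint32_t, 4> &v)
{
  // Step 1 (cnt v1.16b): one bit count per byte.
  std::array<std::uint8_t, 16> bytes{};
  for (int i = 0; i < 4; ++i)
    for (int j = 0; j < 4; ++j)
      bytes[4 * i + j] = popcount8(static_cast<std::uint8_t>(v[i] >> (8 * j)));

  // Step 2 (uaddlp v2.8h): add adjacent bytes into 16-bit lanes.
  std::array<std::uint16_t, 8> halves{};
  for (int i = 0; i < 8; ++i)
    halves[i] = static_cast<std::uint16_t>(bytes[2 * i] + bytes[2 * i + 1]);

  // Step 3 (uaddlp v3.4s): add adjacent halfwords into 32-bit lanes.
  std::array<std::uint32_t, 4> words{};
  for (int i = 0; i < 4; ++i)
    words[i] = static_cast<std::uint32_t>(halves[2 * i]) + halves[2 * i + 1];
  return words;
}
```

The TARGET_DOTPROD variant reaches the same per-lane sums in one step, since a dot product of the byte counts with a vector of ones adds each group of four bytes directly into a 32-bit lane.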

Re: [C PATCH] Fix ICE related to incomplete structures in C23 [PR114930, PR115502].

2024-06-18 Thread Richard Biener



> On 18.06.2024 at 17:20, Martin Uecker wrote:
> 
> 
> As discussed this replaces the use of check_qualified_type with
> a simple check for qualifiers as suggested by Jakub in
> c_update_type_canonical.

Note that a canonical type should always be unqualified (for classical
qualifiers, not address-space or atomic qualification).

Richard 

> Martin
> 
> 
> Bootstrapped and regression tested on x86_64.
> 
> 
>C23: Fix ICE related to incomplete structures [PR114930,PR115502].
> 
>The fix for PR114574 needs to be further revised because check_qualified_type
>makes decisions based on TYPE_NAME, which can be incorrect for C when there
>are TYPE_DECLs involved.
> 
>gcc/c/:
>* c-decl.cc (c_update_type_canonical): Do not use check_qualified_type.
> 
>gcc/testsuite/:
>* gcc.dg/pr114930.c: New test.
>* gcc.dg/pr115502.c: New test.
> 
> diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
> index 01326570e2b..610061a07f8 100644
> --- a/gcc/c/c-decl.cc
> +++ b/gcc/c/c-decl.cc
> @@ -9374,7 +9374,7 @@ c_update_type_canonical (tree t)
>  if (TYPE_QUALS (x) == TYPE_QUALS (t))
>TYPE_CANONICAL (x) = TYPE_CANONICAL (t);
>  else if (TYPE_CANONICAL (t) != t
> -   || check_qualified_type (x, t, TYPE_QUALS (x)))
> +   || TYPE_QUALS (x) != TYPE_QUALS (TYPE_CANONICAL (t)))
>TYPE_CANONICAL (x)
>  = build_qualified_type (TYPE_CANONICAL (t), TYPE_QUALS (x));
>  else
> diff --git a/gcc/testsuite/gcc.dg/pr114930.c b/gcc/testsuite/gcc.dg/pr114930.c
> new file mode 100644
> index 000..5e982fb8929
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr114930.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile }
> + * { dg-options "-std=c23 -flto" } */
> +
> +typedef struct WebPPicture WebPPicture;
> +typedef int (*WebPProgressHook)(const WebPPicture *);
> +WebPProgressHook progress_hook;
> +struct WebPPicture {
> +} WebPGetColorPalette(const struct WebPPicture *);
> +
> diff --git a/gcc/testsuite/gcc.dg/pr115502.c b/gcc/testsuite/gcc.dg/pr115502.c
> new file mode 100644
> index 000..02b52622c5a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr115502.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile }
> + * { dg-options "-std=c23 -flto" } */
> +
> +typedef struct _OSet OSet;
> +typedef OSet AvlTree;
> +void vgPlain_OSetGen_Lookup(const OSet *);
> +struct _OSet {};
> +void vgPlain_OSetGen_Lookup(const AvlTree *);
> +
> 


[C PATCH] Fix ICE related to incomplete structures in C23 [PR114930,PR115502].

2024-06-18 Thread Martin Uecker


As discussed this replaces the use of check_qualified_type with
a simple check for qualifiers as suggested by Jakub in
c_update_type_canonical.

Martin


Bootstrapped and regression tested on x86_64.


C23: Fix ICE related to incomplete structures [PR114930,PR115502].

The fix for PR114574 needs to be further revised because check_qualified_type
makes decisions based on TYPE_NAME, which can be incorrect for C when there
are TYPE_DECLs involved.

gcc/c/:
* c-decl.cc (c_update_type_canonical): Do not use check_qualified_type.

gcc/testsuite/:
* gcc.dg/pr114930.c: New test.
* gcc.dg/pr115502.c: New test.

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 01326570e2b..610061a07f8 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -9374,7 +9374,7 @@ c_update_type_canonical (tree t)
  if (TYPE_QUALS (x) == TYPE_QUALS (t))
TYPE_CANONICAL (x) = TYPE_CANONICAL (t);
  else if (TYPE_CANONICAL (t) != t
-  || check_qualified_type (x, t, TYPE_QUALS (x)))
+  || TYPE_QUALS (x) != TYPE_QUALS (TYPE_CANONICAL (t)))
TYPE_CANONICAL (x)
  = build_qualified_type (TYPE_CANONICAL (t), TYPE_QUALS (x));
  else
diff --git a/gcc/testsuite/gcc.dg/pr114930.c b/gcc/testsuite/gcc.dg/pr114930.c
new file mode 100644
index 000..5e982fb8929
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114930.c
@@ -0,0 +1,9 @@
+/* { dg-do compile }
+ * { dg-options "-std=c23 -flto" } */
+
+typedef struct WebPPicture WebPPicture;
+typedef int (*WebPProgressHook)(const WebPPicture *);
+WebPProgressHook progress_hook;
+struct WebPPicture {
+} WebPGetColorPalette(const struct WebPPicture *);
+
diff --git a/gcc/testsuite/gcc.dg/pr115502.c b/gcc/testsuite/gcc.dg/pr115502.c
new file mode 100644
index 000..02b52622c5a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr115502.c
@@ -0,0 +1,9 @@
+/* { dg-do compile }
+ * { dg-options "-std=c23 -flto" } */
+
+typedef struct _OSet OSet;
+typedef OSet AvlTree;
+void vgPlain_OSetGen_Lookup(const OSet *);
+struct _OSet {};
+void vgPlain_OSetGen_Lookup(const AvlTree *);
+



[PATCH 5/7] diagnostics: introduce diagnostic-macro-unwinding.h/cc

2024-06-18 Thread David Malcolm
Eliminate a dependency on "tree" from the code used by
diagnostic_path handling.

No functional change intended.

gcc/ChangeLog:
* Makefile.in (OBJS): Add diagnostic-macro-unwinding.o.

gcc/c-family/ChangeLog:
* c-opts.cc: Replace include of "tree-diagnostic.h" with
"diagnostic-macro-unwinding.h".

gcc/ChangeLog:
* diagnostic-macro-unwinding.cc: New file, with material taken
from tree-diagnostic.cc.
* diagnostic-macro-unwinding.h: New file, with material taken
from tree-diagnostic.h.
* tree-diagnostic-path.cc: Replace include of "tree-diagnostic.h"
with "diagnostic-macro-unwinding.h".
* tree-diagnostic.cc (struct loc_map_pair): Move to
diagnostic-macro-unwinding.cc.
(maybe_unwind_expanded_macro_loc): Likewise.
(virt_loc_aware_diagnostic_finalizer): Likewise.
* tree-diagnostic.h (virt_loc_aware_diagnostic_finalizer): Move
decl to diagnostic-macro-unwinding.h.
(maybe_unwind_expanded_macro_loc): Likewise.

Signed-off-by: David Malcolm 
---
 gcc/Makefile.in   |   1 +
 gcc/c-family/c-opts.cc|   2 +-
 gcc/diagnostic-macro-unwinding.cc | 221 ++
 gcc/diagnostic-macro-unwinding.h  |  29 
 gcc/tree-diagnostic-path.cc   |   2 +-
 gcc/tree-diagnostic.cc| 195 --
 gcc/tree-diagnostic.h |   5 -
 7 files changed, 253 insertions(+), 202 deletions(-)
 create mode 100644 gcc/diagnostic-macro-unwinding.cc
 create mode 100644 gcc/diagnostic-macro-unwinding.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a2799b8d826..e701d9fb082 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1826,6 +1826,7 @@ OBJS = \
 OBJS-libcommon = diagnostic-spec.o diagnostic.o diagnostic-color.o \
diagnostic-format-json.o \
diagnostic-format-sarif.o \
+   diagnostic-macro-unwinding.o \
diagnostic-show-locus.o \
edit-context.o \
pretty-print.o intl.o \
diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index faaf9ee6350..33114f13c8d 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -32,7 +32,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "flags.h"
 #include "toplev.h"
 #include "langhooks.h"
-#include "tree-diagnostic.h" /* for virt_loc_aware_diagnostic_finalizer */
+#include "diagnostic-macro-unwinding.h" /* for virt_loc_aware_diagnostic_finalizer */
 #include "intl.h"
 #include "cppdefault.h"
 #include "incpath.h"
diff --git a/gcc/diagnostic-macro-unwinding.cc b/gcc/diagnostic-macro-unwinding.cc
new file mode 100644
index 000..3056d8c8afb
--- /dev/null
+++ b/gcc/diagnostic-macro-unwinding.cc
@@ -0,0 +1,221 @@
+/* Code for unwinding macro expansions in diagnostics.
+   Copyright (C) 1999-2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tree.h"
+#include "diagnostic.h"
+#include "diagnostic-macro-unwinding.h"
+#include "intl.h"
+
+/* This is a pair made of a location and the line map it originated
+   from.  It's used in the maybe_unwind_expanded_macro_loc function
+   below.  */
+struct loc_map_pair
+{
+  const line_map_macro *map;
+  location_t where;
+};
+
+
+/* Unwind the different macro expansions that lead to the token which
+   location is WHERE and emit diagnostics showing the resulting
+   unwound macro expansion trace.  Let's look at an example to see how
+   the trace looks like.  Suppose we have this piece of code,
+   artificially annotated with the line numbers to increase
+   legibility:
+
+$ cat -n test.c
+  1    #define OPERATE(OPRD1, OPRT, OPRD2) \
+  2      OPRD1 OPRT OPRD2;
+  3
+  4    #define SHIFTL(A,B) \
+  5      OPERATE (A,<<,B)
+  6
+  7    #define MULT(A) \
+  8      SHIFTL (A,1)
+  9
+ 10    void
+ 11    g ()
+ 12    {
+ 13      MULT (1.0);// 1.0 << 1; <-- so this is an error.
+ 14    }
+
+   Here is the diagnostic that we want the compiler to generate:
+
+test.c: In function ‘g’:
+test.c:5:14: error: invalid operands to binary << (have ‘double’ and ‘int’)
+test.c:2:9: note: in definition of macro 'OPERATE'
+test.c:8:3: note: in expansion of macro 'SHIFTL'
+test.c:13:3: note: in expansion of macro 'MULT'
+
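
As a companion to the example in the comment above, here is a sketch of the same three-level macro nesting applied to an integer operand, where the expansion is valid. It assumes a slight variation: the trailing semicolon is dropped from `OPERATE` so the result can be used as an expression, and `twice` is a hypothetical wrapper.

```cpp
// Same nesting as in the comment: MULT -> SHIFTL -> OPERATE.
#define OPERATE(OPRD1, OPRT, OPRD2) OPRD1 OPRT OPRD2
#define SHIFTL(A, B) OPERATE(A, <<, B)
#define MULT(A) SHIFTL(A, 1)

// MULT(x) expands through SHIFTL and OPERATE to "x << 1".
inline int twice(int x)
{
  return MULT(x);
}
```

With a `double` argument, as in the comment, the innermost expansion is where the error lands, and the unwinder then reports each enclosing macro in turn.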

[PATCH 7/7] diagnostics: rename tree-diagnostic-path.cc to diagnostic-path.cc

2024-06-18 Thread David Malcolm
Now that nothing in tree-diagnostic-path.cc uses "tree", this patch
renames it to diagnostic-path.cc and moves it from OBJS to
OBJS-libcommon.

No functional change intended.

gcc/ChangeLog:
* Makefile.in (OBJS): Move selftest-diagnostic-path.o,
selftest-logical-location.o, and tree-diagnostic-path.o to...
(OBJS-libcommon): ...here, renaming tree-diagnostic-path.o to
diagnostic-path.o.
* tree-diagnostic-path.cc: Rename to...
* diagnostic-path.cc: ...this.  Drop include of "tree.h".
(tree_diagnostic_path_cc_tests): Rename to...
(diagnostic_path_cc_tests): ...this.
* selftest-run-tests.cc (selftest::run_tests): Update for above
renaming.
* selftest.h (tree_diagnostic_path_cc_tests): Rename decl to...
(diagnostic_path_cc_tests): ...this.

Signed-off-by: David Malcolm 
---
 gcc/Makefile.in | 6 +++---
 gcc/{tree-diagnostic-path.cc => diagnostic-path.cc} | 3 +--
 gcc/selftest-run-tests.cc   | 2 +-
 gcc/selftest.h  | 2 +-
 4 files changed, 6 insertions(+), 7 deletions(-)
 rename gcc/{tree-diagnostic-path.cc => diagnostic-path.cc} (99%)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index e701d9fb082..638ea6b2307 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1700,8 +1700,6 @@ OBJS = \
ubsan.o \
sanopt.o \
sancov.o \
-   selftest-diagnostic-path.o \
-   selftest-logical-location.o \
simple-diagnostic-path.o \
tree-call-cdce.o \
tree-cfg.o \
@@ -1712,7 +1710,6 @@ OBJS = \
tree-dfa.o \
tree-diagnostic.o \
tree-diagnostic-client-data-hooks.o \
-   tree-diagnostic-path.o \
tree-dump.o \
tree-eh.o \
tree-emutls.o \
@@ -1827,6 +1824,7 @@ OBJS-libcommon = diagnostic-spec.o diagnostic.o diagnostic-color.o \
diagnostic-format-json.o \
diagnostic-format-sarif.o \
diagnostic-macro-unwinding.o \
+   diagnostic-path.o \
diagnostic-show-locus.o \
edit-context.o \
pretty-print.o intl.o \
@@ -1834,6 +1832,8 @@ OBJS-libcommon = diagnostic-spec.o diagnostic.o diagnostic-color.o \
sbitmap.o \
vec.o input.o hash-table.o ggc-none.o memory-block.o \
selftest.o selftest-diagnostic.o sort.o \
+   selftest-diagnostic-path.o \
+   selftest-logical-location.o \
text-art/box-drawing.o \
text-art/canvas.o \
text-art/ruler.o \
diff --git a/gcc/tree-diagnostic-path.cc b/gcc/diagnostic-path.cc
similarity index 99%
rename from gcc/tree-diagnostic-path.cc
rename to gcc/diagnostic-path.cc
index 35f8ea2b8b6..882dc1c5805 100644
--- a/gcc/tree-diagnostic-path.cc
+++ b/gcc/diagnostic-path.cc
@@ -25,7 +25,6 @@ along with GCC; see the file COPYING3.  If not see
 #define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
-#include "tree.h"
 #include "diagnostic.h"
 #include "diagnostic-macro-unwinding.h"
 #include "intl.h"
@@ -2199,7 +2198,7 @@ control_flow_tests (const line_table_case _)
 /* Run all of the selftests within this file.  */
 
 void
-tree_diagnostic_path_cc_tests ()
+diagnostic_path_cc_tests ()
 {
   /* In a few places we use the global dc's printer to determine
  colorization so ensure this off during the tests.  */
diff --git a/gcc/selftest-run-tests.cc b/gcc/selftest-run-tests.cc
index 3275db38ba9..e6779206c47 100644
--- a/gcc/selftest-run-tests.cc
+++ b/gcc/selftest-run-tests.cc
@@ -102,7 +102,7 @@ selftest::run_tests ()
   spellcheck_cc_tests ();
   spellcheck_tree_cc_tests ();
   tree_cfg_cc_tests ();
-  tree_diagnostic_path_cc_tests ();
+  diagnostic_path_cc_tests ();
   simple_diagnostic_path_cc_tests ();
   attribs_cc_tests ();
 
diff --git a/gcc/selftest.h b/gcc/selftest.h
index 2d1aa91607e..dcb1463ed90 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -222,6 +222,7 @@ extern void cgraph_cc_tests ();
 extern void convert_cc_tests ();
 extern void diagnostic_color_cc_tests ();
 extern void diagnostic_format_json_cc_tests ();
+extern void diagnostic_path_cc_tests ();
 extern void diagnostic_show_locus_cc_tests ();
 extern void digraph_cc_tests ();
 extern void dumpfile_cc_tests ();
@@ -259,7 +260,6 @@ extern void sreal_cc_tests ();
 extern void store_merging_cc_tests ();
 extern void tree_cc_tests ();
 extern void tree_cfg_cc_tests ();
-extern void tree_diagnostic_path_cc_tests ();
 extern void tristate_cc_tests ();
 extern void typed_splay_tree_cc_tests ();
 extern void vec_cc_tests ();
-- 
2.26.3



[PATCH 2/7] diagnostics: eliminate "tree" from diagnostic_{event, path}

2024-06-18 Thread David Malcolm
This patch eliminates the use of "tree" from diagnostic_{event,path} in
favor of const logical_location *.

No functional change intended.
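
A rough sketch of the idea, assuming a much simplified model: events carry a pointer to a logical location rather than a function decl, and the path answers "same function?" queries. All types here (`LogicalLocation`, `Event`, `Path`, `SimplePath`) are hypothetical stand-ins, and `interprocedural_p` below is an illustrative reimplementation, not the code in diagnostic.cc.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for logical_location.
struct LogicalLocation
{
  std::string name;
};

// An event refers to its enclosing function only via a logical location.
struct Event
{
  const LogicalLocation *logical_loc;  // may be null outside any function
};

// Hypothetical path interface with a pure virtual same_function_p.
class Path
{
public:
  virtual ~Path() = default;
  virtual std::size_t num_events() const = 0;
  virtual bool same_function_p(std::size_t a, std::size_t b) const = 0;

  // True if any two consecutive events lie in different functions.
  bool interprocedural_p() const
  {
    for (std::size_t i = 1; i < num_events(); ++i)
      if (!same_function_p(i - 1, i))
        return true;
    return false;
  }
};

class SimplePath : public Path
{
public:
  explicit SimplePath(std::vector<Event> events)
    : m_events(std::move(events)) {}

  std::size_t num_events() const override { return m_events.size(); }

  bool same_function_p(std::size_t a, std::size_t b) const override
  {
    // Compare logical locations by identity; no tree node is involved.
    return m_events[a].logical_loc == m_events[b].logical_loc;
  }

private:
  std::vector<Event> m_events;
};
```

Putting the comparison behind a virtual function lets the generic path code stay free of "tree", while each concrete path type decides how two events are matched.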

gcc/analyzer/ChangeLog:
* checker-event.h (checker_event::fndecl): Drop "final" and
"override", converting from a vfunc implementation to a plain
accessor.
* checker-path.cc (checker_path::same_function_p): New.
* checker-path.h (checker_path::same_function_p): New decl.

gcc/ChangeLog:
* diagnostic.cc: Include "logical-location.h".
(diagnostic_path::get_first_event_in_a_function): Fix typo in
leading comment.  Rewrite to use logical_location rather than
tree.  Drop test on stack depth.
(diagnostic_path::interprocedural_p): Rewrite to use
logical_location rather than tree.
(logical_location::function_p): New.
* diagnostic-path.h (diagnostic_event::get_fndecl): Eliminate
vfunc.
(diagnostic_path::same_function_p): New pure virtual func.
* logical-location.h (logical_location::get_name_for_path_output):
New pure virtual func.
* simple-diagnostic-path.cc
(simple_diagnostic_path::same_function_p): New.
(simple_diagnostic_event::simple_diagnostic_event): Initialize
m_logical_loc.
* simple-diagnostic-path.h: Include "tree-logical-location.h".
(simple_diagnostic_event::get_fndecl): Convert from a vfunc
implementation to an accessor.
(simple_diagnostic_event::get_logical_location): Use
m_logical_loc.
(simple_diagnostic_event::m_logical_loc): New field.
(simple_diagnostic_path::same_function_p): New decl.
* tree-diagnostic-path.cc: Move pragma disabling -Wformat-diag to
cover the whole file.
(can_consolidate_events): Add params "path", "ev1_idx", and
"ev2_idx".  Rewrite to use diagnostic_path::same_function_p rather
than tree.
(per_thread_summary::per_thread_summary): Add "path" param.
(per_thread_summary::m_path): New field.
(event_range::event_range): Update for conversion of m_fndecl to
m_logical_loc.
(event_range::maybe_add_event): Rename param "idx" to
"new_ev_idx".  Update call to can_consolidate_events to pass in
"m_path", "m_start_idx", and "new_ev_idx".
(event_range::m_fndecl): Replace with...
(event_range::m_logical_loc): ...this.
(path_summary::get_or_create_events_for_thread_id): Pass "path" to
per_thread_summary ctor.
(per_thread_summary::interprocedural_p): Rewrite to use
diagnostic_path::same_function_p rather than tree.
(print_fndecl): Delete.
(thread_event_printer::print_swimlane_for_event_range): Update for
conversion from tree to logical_location.
(default_tree_diagnostic_path_printer): Likewise.
(default_tree_make_json_for_path): Likewise.
* tree-logical-location.cc: Include "intl.h".
(compiler_logical_location::get_name_for_tree_for_path_output):
New.
(tree_logical_location::get_name_for_path_output): New.
(current_fndecl_logical_location::get_name_for_path_output): New.
* tree-logical-location.h
(compiler_logical_location::get_name_for_tree_for_path_output):
New decl.
(tree_logical_location::get_name_for_path_output): New decl.
(current_fndecl_logical_location::get_name_for_path_output): New
decl.

Signed-off-by: David Malcolm 
---
 gcc/analyzer/checker-event.h  |   2 +-
 gcc/analyzer/checker-path.cc  |   8 +++
 gcc/analyzer/checker-path.h   |   4 ++
 gcc/diagnostic-path.h |  10 +++-
 gcc/diagnostic.cc |  47 
 gcc/logical-location.h|   5 ++
 gcc/simple-diagnostic-path.cc |  11 +++-
 gcc/simple-diagnostic-path.h  |  13 -
 gcc/tree-diagnostic-path.cc   | 103 +-
 gcc/tree-logical-location.cc  |  25 +
 gcc/tree-logical-location.h   |   3 +
 11 files changed, 161 insertions(+), 70 deletions(-)

diff --git a/gcc/analyzer/checker-event.h b/gcc/analyzer/checker-event.h
index d0935aca985..4343641f441 100644
--- a/gcc/analyzer/checker-event.h
+++ b/gcc/analyzer/checker-event.h
@@ -91,7 +91,6 @@ public:
   /* Implementation of diagnostic_event.  */
 
   location_t get_location () const final override { return m_loc; }
-  tree get_fndecl () const final override { return m_effective_fndecl; }
   int get_stack_depth () const final override { return m_effective_depth; }
   const logical_location *get_logical_location () const final override
   {
@@ -111,6 +110,7 @@ public:
   maybe_add_sarif_properties (sarif_object _flow_loc_obj) const 
override;
 
   /* Additional functionality.  */
+  tree get_fndecl () const { return m_effective_fndecl; }
 
   int get_original_stack_depth () const { return m_original_depth; }
 
diff --git a/gcc/analyzer/checker-path.cc b/gcc/analyzer/checker-path.cc
index 

[PATCH 3/7] diagnostics: remove tree usage from tree-diagnostic-path.cc

2024-06-18 Thread David Malcolm
No functional change intended.

gcc/ChangeLog:
* Makefile.in (OBJS): Add selftest-diagnostic-path.o and
selftest-logical-location.o.
* logical-location.h: Include "label-text.h".
(class logical_location): Update leading comment.
* selftest-diagnostic-path.cc: New file, adapted from
simple-diagnostic-path.cc and from material in
tree-diagnostic-path.cc.
* selftest-diagnostic-path.h: New file, adapted from
simple-diagnostic-path.h and from material in
tree-diagnostic-path.cc.
* selftest-logical-location.cc: New file.
* selftest-logical-location.h: New file.
* tree-diagnostic-path.cc: Remove includes of "tree-pretty-print.h",
"langhooks.h", and "simple-diagnostic-path.h".  Add include of
"selftest-diagnostic-path.h".
(class test_diagnostic_path): Delete, in favor of new
implementation in selftest-diagnostic-path.{h,cc}, which is
directly derived from diagnostic_path, rather than from
simple_diagnostic_path.
(selftest::test_intraprocedural_path): Eliminate tree usage,
via change to test_diagnostic_path, using strings rather than
function_decls for identifying functions in the test.
(selftest::test_interprocedural_path_1): Likewise.
(selftest::test_interprocedural_path_2): Likewise.
(selftest::test_recursion): Likewise.
(selftest::test_control_flow_1): Likewise.
(selftest::test_control_flow_2): Likewise.
(selftest::test_control_flow_3): Likewise.
(selftest::assert_cfg_edge_path_streq): Likewise.
(selftest::test_control_flow_5): Likewise.
(selftest::test_control_flow_6): Likewise.

Signed-off-by: David Malcolm 
---
 gcc/Makefile.in  |   2 +
 gcc/logical-location.h   |   5 +-
 gcc/selftest-diagnostic-path.cc  | 233 +++
 gcc/selftest-diagnostic-path.h   | 163 +
 gcc/selftest-logical-location.cc |  71 ++
 gcc/selftest-logical-location.h  |  58 
 gcc/tree-diagnostic-path.cc  | 161 +++--
 7 files changed, 580 insertions(+), 113 deletions(-)
 create mode 100644 gcc/selftest-diagnostic-path.cc
 create mode 100644 gcc/selftest-diagnostic-path.h
 create mode 100644 gcc/selftest-logical-location.cc
 create mode 100644 gcc/selftest-logical-location.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 35f259da858..a2799b8d826 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1700,6 +1700,8 @@ OBJS = \
ubsan.o \
sanopt.o \
sancov.o \
+   selftest-diagnostic-path.o \
+   selftest-logical-location.o \
simple-diagnostic-path.o \
tree-call-cdce.o \
tree-cfg.o \
diff --git a/gcc/logical-location.h b/gcc/logical-location.h
index c3b72081135..bba21087786 100644
--- a/gcc/logical-location.h
+++ b/gcc/logical-location.h
@@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_LOGICAL_LOCATION_H
 #define GCC_LOGICAL_LOCATION_H
 
+#include "label-text.h"
+
 /* An enum for discriminating between different kinds of logical location
for a diagnostic.
 
@@ -46,7 +48,8 @@ enum logical_location_kind
- "within function 'foo'", or
- "within method 'bar'",
but *without* requiring knowledge of trees
-   (see tree-logical-location.h for subclasses relating to trees).  */
+   (see tree-logical-location.h for concrete subclasses relating to trees,
+   and selftest-logical-location.h for a concrete subclass for selftests).  */
 
 class logical_location
 {
diff --git a/gcc/selftest-diagnostic-path.cc b/gcc/selftest-diagnostic-path.cc
new file mode 100644
index 000..6d21f2e5599
--- /dev/null
+++ b/gcc/selftest-diagnostic-path.cc
@@ -0,0 +1,233 @@
+/* Concrete classes for selftests involving diagnostic paths.
+   Copyright (C) 2019-2024 Free Software Foundation, Inc.
+   Contributed by David Malcolm 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+
+#include "config.h"
+#define INCLUDE_VECTOR
+#include "system.h"
+#include "coretypes.h"
+#include "version.h"
+#include "demangle.h"
+#include "backtrace.h"
+#include "diagnostic.h"
+#include "selftest-diagnostic-path.h"
+
+#if CHECKING_P
+
+namespace selftest {
+
+/* class test_diagnostic_path : public diagnostic_path.  */
+

[PATCH 6/7] diagnostics: eliminate diagnostic_context::m_print_path callback

2024-06-18 Thread David Malcolm
No functional change intended.

gcc/ChangeLog:
* diagnostic-format-json.cc (diagnostic_output_format_init_json):
Replace clearing of diagnostic_context::m_print_path callback with
setting the path format to DPF_NONE.
* diagnostic-format-sarif.cc
(diagnostic_output_format_init_sarif): Likewise.
* diagnostic.cc (diagnostic_context::show_any_path): Replace call
to diagnostic_context::m_print_path callback with a direct call to
diagnostic_context::print_path.
* diagnostic.h (diagnostic_context::print_path): New decl.
(diagnostic_context::m_print_path): Delete callback.
* tree-diagnostic-path.cc (default_tree_diagnostic_path_printer):
Convert to...
(diagnostic_context::print_path): ...this.
* tree-diagnostic.cc (tree_diagnostics_defaults): Delete
initialization of m_print_path.
* tree-diagnostic.h (default_tree_diagnostic_path_printer): Delete
decl.
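
As an illustration of the shape of this change, here is a minimal C++ sketch.
All names (toy_context, toy_path, path_format) are hypothetical stand-ins, not
the real GCC declarations; the point is only that a nullable function-pointer
callback is replaced by a direct member call whose behaviour is selected by a
"format" setting, with a "none" value taking over the role of a null callback.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration; not the GCC classes.
enum class path_format { none, separate_events };

struct toy_path { unsigned num_events; };

class toy_context
{
public:
  void set_path_format (path_format f) { m_format = f; }

  // Previously this would have tested and called a function pointer;
  // now it calls the member function directly.
  void show_any_path (const toy_path *path)
  {
    if (!path)
      return;
    print_path (*path);
  }

  const std::string &output () const { return m_out; }

private:
  void print_path (const toy_path &path)
  {
    switch (m_format)
      {
      case path_format::none:
        /* Do nothing: this replaces "callback is null".  */
        break;
      case path_format::separate_events:
        for (unsigned i = 0; i < path.num_events; ++i)
          m_out += "event " + std::to_string (i + 1) + "\n";
        break;
      }
  }

  path_format m_format = path_format::separate_events;
  std::string m_out;
};
```

A structured-output client in this sketch would call
set_path_format (path_format::none) once at initialization instead of
clearing a callback.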

Signed-off-by: David Malcolm 
---
 gcc/diagnostic-format-json.cc  |  4 ++--
 gcc/diagnostic-format-sarif.cc |  4 +++-
 gcc/diagnostic.cc  |  3 +--
 gcc/diagnostic.h   |  4 ++--
 gcc/tree-diagnostic-path.cc| 23 +++
 gcc/tree-diagnostic.cc |  1 -
 gcc/tree-diagnostic.h  |  3 ---
 7 files changed, 19 insertions(+), 23 deletions(-)

diff --git a/gcc/diagnostic-format-json.cc b/gcc/diagnostic-format-json.cc
index 2bdc2c13d37..ec03ac15aeb 100644
--- a/gcc/diagnostic-format-json.cc
+++ b/gcc/diagnostic-format-json.cc
@@ -395,8 +395,8 @@ private:
 static void
 diagnostic_output_format_init_json (diagnostic_context *context)
 {
-  /* Override callbacks.  */
-  context->m_print_path = nullptr; /* handled in json_end_diagnostic.  */
+  /* Suppress normal textual path output.  */
+  context->set_path_format (DPF_NONE);
 
   /* The metadata is handled in JSON format, rather than as text.  */
   context->set_show_cwe (false);
diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index 79116f051bc..5f438dd38a8 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -1991,8 +1991,10 @@ private:
 static void
 diagnostic_output_format_init_sarif (diagnostic_context *context)
 {
+  /* Suppress normal textual path output.  */
+  context->set_path_format (DPF_NONE);
+
   /* Override callbacks.  */
-  context->m_print_path = nullptr; /* handled in sarif_end_diagnostic.  */
   context->set_ice_handler_callback (sarif_ice_handler);
 
   /* The metadata is handled in SARIF format, rather than as text.  */
diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc
index 844eb8e1048..471135f16de 100644
--- a/gcc/diagnostic.cc
+++ b/gcc/diagnostic.cc
@@ -915,8 +915,7 @@ diagnostic_context::show_any_path (const diagnostic_info 
)
   if (!path)
 return;
 
-  if (m_print_path)
-m_print_path (this, path);
+  print_path (path);
 }
 
 /* class diagnostic_event.  */
diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h
index ff2aa3dd9a3..c6846525da3 100644
--- a/gcc/diagnostic.h
+++ b/gcc/diagnostic.h
@@ -583,6 +583,8 @@ private:
   pretty_printer *pp,
   diagnostic_source_effect_info *effect_info);
 
+  void print_path (const diagnostic_path *path);
+
   /* Data members.
  Ideally, all of these would be private and have "m_" prefixes.  */
 
@@ -712,8 +714,6 @@ private:
   urlifier *m_urlifier;
 
 public:
-  void (*m_print_path) (diagnostic_context *, const diagnostic_path *);
-
   /* Auxiliary data for client.  */
   void *m_client_aux_data;
 
diff --git a/gcc/tree-diagnostic-path.cc b/gcc/tree-diagnostic-path.cc
index adaaf30b84f..35f8ea2b8b6 100644
--- a/gcc/tree-diagnostic-path.cc
+++ b/gcc/tree-diagnostic-path.cc
@@ -884,17 +884,16 @@ print_path_summary_as_text (const path_summary *ps, 
diagnostic_context *dc,
 
 } /* end of anonymous namespace for path-printing code.  */
 
-/* Print PATH to CONTEXT, according to CONTEXT's path_format.  */
+/* Print PATH according to this context's path_format.  */
 
 void
-default_tree_diagnostic_path_printer (diagnostic_context *context,
- const diagnostic_path *path)
+diagnostic_context::print_path (const diagnostic_path *path)
 {
   gcc_assert (path);
 
   const unsigned num_events = path->num_events ();
 
-  switch (context->get_path_format ())
+  switch (get_path_format ())
 {
 case DPF_NONE:
   /* Do nothing.  */
@@ -909,7 +908,7 @@ default_tree_diagnostic_path_printer (diagnostic_context 
*context,
label_text event_text (event.get_desc (false));
gcc_assert (event_text.get ());
diagnostic_event_id_t event_id (i);
-   if (context->show_path_depths_p ())
+   if (this->show_path_depths_p ())
  {
int stack_depth = event.get_stack_depth ();
/* -fdiagnostics-path-format=separate-events doesn't print
@@ -941,13 +940,13 @@ 

[PATCH 4/7] diagnostics: eliminate diagnostic_context::m_make_json_for_path

2024-06-18 Thread David Malcolm
Now that the path-handling code for json_output_format no longer
needs "tree", and thus can be in OBJS-libcommon, we can move it
from tree-diagnostic-path.cc to diagnostic-format-json.cc, where it
should have been all along.

No functional change intended.

gcc/ChangeLog:
* diagnostic-format-json.cc: Include "diagnostic-path.h" and
"logical-location.h".
(make_json_for_path): Move tree-diagnostic-path.cc's
default_tree_make_json_for_path here, renaming it and making it
static.
(json_output_format::on_end_diagnostic): Replace call of
m_context's m_make_json_for_path callback with a direct call to
make_json_for_path.
* diagnostic.h (diagnostic_context::m_make_json_for_path): Drop
field.
* tree-diagnostic-path.cc: Drop include of "json.h".
(default_tree_make_json_for_path): Rename to make_json_for_path
and move to diagnostic-format-json.cc.
* tree-diagnostic.cc (tree_diagnostics_defaults): Drop
initialization of m_make_json_for_path.
* tree-diagnostic.h (default_tree_make_json_for_path): Delete decl.

Signed-off-by: David Malcolm 
---
 gcc/diagnostic-format-json.cc | 37 ---
 gcc/diagnostic.h  |  2 --
 gcc/tree-diagnostic-path.cc   | 32 --
 gcc/tree-diagnostic.cc|  1 -
 gcc/tree-diagnostic.h |  2 --
 5 files changed, 34 insertions(+), 40 deletions(-)

diff --git a/gcc/diagnostic-format-json.cc b/gcc/diagnostic-format-json.cc
index 0782ae831eb..2bdc2c13d37 100644
--- a/gcc/diagnostic-format-json.cc
+++ b/gcc/diagnostic-format-json.cc
@@ -25,8 +25,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic.h"
 #include "selftest-diagnostic.h"
 #include "diagnostic-metadata.h"
+#include "diagnostic-path.h"
 #include "json.h"
 #include "selftest.h"
+#include "logical-location.h"
 
 /* Subclass of diagnostic_output_format for JSON output.  */
 
@@ -187,6 +189,36 @@ json_from_metadata (const diagnostic_metadata *metadata)
   return metadata_obj;
 }
 
+/* Make a JSON value for PATH.  */
+
+static json::value *
+make_json_for_path (diagnostic_context *context,
+   const diagnostic_path *path)
+{
+  json::array *path_array = new json::array ();
+  for (unsigned i = 0; i < path->num_events (); i++)
+{
+  const diagnostic_event &event = path->get_event (i);
+
+  json::object *event_obj = new json::object ();
+  if (event.get_location ())
+   event_obj->set ("location",
+   json_from_expanded_location (context,
+event.get_location ()));
+  label_text event_text (event.get_desc (false));
+  event_obj->set_string ("description", event_text.get ());
+  if (const logical_location *logical_loc = event.get_logical_location ())
+   {
+ label_text name (logical_loc->get_name_for_path_output ());
+ event_obj->set_string ("function", name.get ());
+   }
+  event_obj->set_integer ("depth", event.get_stack_depth ());
+  path_array->append (event_obj);
+}
+  return path_array;
+}
+
+
 /* Implementation of "on_end_diagnostic" vfunc for JSON output.
Generate a JSON object for DIAGNOSTIC, and store for output
within current diagnostic group.  */
@@ -291,10 +323,9 @@ json_output_format::on_end_diagnostic (const 
diagnostic_info ,
 }
 
   const diagnostic_path *path = richloc->get_path ();
-  if (path && m_context.m_make_json_for_path)
+  if (path)
 {
-  json::value *path_value
-   = m_context.m_make_json_for_path (&m_context, path);
+  json::value *path_value = make_json_for_path (&m_context, path);
   diag_obj->set ("path", path_value);
 }
 
diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h
index 9a9571bb76d..ff2aa3dd9a3 100644
--- a/gcc/diagnostic.h
+++ b/gcc/diagnostic.h
@@ -713,8 +713,6 @@ private:
 
 public:
   void (*m_print_path) (diagnostic_context *, const diagnostic_path *);
-  json::value *(*m_make_json_for_path) (diagnostic_context *,
-   const diagnostic_path *);
 
   /* Auxiliary data for client.  */
   void *m_client_aux_data;
diff --git a/gcc/tree-diagnostic-path.cc b/gcc/tree-diagnostic-path.cc
index 39a85d33015..40b197d971c 100644
--- a/gcc/tree-diagnostic-path.cc
+++ b/gcc/tree-diagnostic-path.cc
@@ -30,7 +30,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-diagnostic.h"
 #include "intl.h"
 #include "diagnostic-path.h"
-#include "json.h"
 #include "gcc-rich-location.h"
 #include "diagnostic-color.h"
 #include "diagnostic-event-id.h"
@@ -954,37 +953,6 @@ default_tree_diagnostic_path_printer (diagnostic_context 
*context,
 }
 }
 
-/* This has to be here, rather than diagnostic-format-json.cc,
-   since diagnostic-format-json.o is within OBJS-libcommon and thus
-   doesn't have access to trees (for m_fndecl).  */
-
-json::value *

[PATCH 1/7] diagnostics: move simple_diagnostic_{path, thread, event} to their own .h/cc

2024-06-18 Thread David Malcolm
As work towards eliminating the dependency on "tree" from
path-printing, move these classes to a new simple-diagnostic-path.h/cc.

No functional change intended.

gcc/analyzer/ChangeLog:
* checker-path.h: Include "simple-diagnostic-path.h".

gcc/ChangeLog:
* Makefile.in (OBJS): Add simple-diagnostic-path.o.
* diagnostic-path.h (class simple_diagnostic_event): Move to
simple-diagnostic-path.h.
(class simple_diagnostic_thread): Likewise.
(class simple_diagnostic_path): Likewise.
* diagnostic.cc (simple_diagnostic_path::simple_diagnostic_path):
Move to simple-diagnostic-path.cc.
(simple_diagnostic_path::num_events): Likewise.
(simple_diagnostic_path::get_event): Likewise.
(simple_diagnostic_path::num_threads): Likewise.
(simple_diagnostic_path::get_thread): Likewise.
(simple_diagnostic_path::add_thread): Likewise.
(simple_diagnostic_path::add_event): Likewise.
(simple_diagnostic_path::add_thread_event): Likewise.
(simple_diagnostic_path::connect_to_next_event): Likewise.
(simple_diagnostic_event::simple_diagnostic_event): Likewise.
(simple_diagnostic_event::~simple_diagnostic_event): Likewise.
* selftest-run-tests.cc (selftest::run_tests): Call
selftest::simple_diagnostic_path_cc_tests.
* selftest.h (selftest::simple_diagnostic_path_cc_tests): New
decl.
* simple-diagnostic-path.cc: New file, from the above material.
* simple-diagnostic-path.h: New file, from the above material
from diagnostic-path.h.
* tree-diagnostic-path.cc: Include "simple-diagnostic-path.h".

gcc/testsuite/ChangeLog
* gcc.dg/plugin/diagnostic_plugin_test_paths.c: Include
"simple-diagnostic-path.h".

Signed-off-by: David Malcolm 
---
 gcc/Makefile.in   |   1 +
 gcc/analyzer/checker-path.h   |   1 +
 gcc/diagnostic-path.h | 104 +---
 gcc/diagnostic.cc | 149 
 gcc/selftest-run-tests.cc |   1 +
 gcc/selftest.h|   1 +
 gcc/simple-diagnostic-path.cc | 228 ++
 gcc/simple-diagnostic-path.h  | 130 ++
 .../plugin/diagnostic_plugin_test_paths.c |   1 +
 gcc/tree-diagnostic-path.cc   |   1 +
 10 files changed, 366 insertions(+), 251 deletions(-)
 create mode 100644 gcc/simple-diagnostic-path.cc
 create mode 100644 gcc/simple-diagnostic-path.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index f5adb647d3f..35f259da858 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1700,6 +1700,7 @@ OBJS = \
ubsan.o \
sanopt.o \
sancov.o \
+   simple-diagnostic-path.o \
tree-call-cdce.o \
tree-cfg.o \
tree-cfgcleanup.o \
diff --git a/gcc/analyzer/checker-path.h b/gcc/analyzer/checker-path.h
index 6b3e8a34fe5..162ebb3f0d8 100644
--- a/gcc/analyzer/checker-path.h
+++ b/gcc/analyzer/checker-path.h
@@ -22,6 +22,7 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_ANALYZER_CHECKER_PATH_H
 
 #include "analyzer/checker-event.h"
+#include "simple-diagnostic-path.h"
 
 namespace ana {
 
diff --git a/gcc/diagnostic-path.h b/gcc/diagnostic-path.h
index 938bd583a3d..958eb725322 100644
--- a/gcc/diagnostic-path.h
+++ b/gcc/diagnostic-path.h
@@ -201,108 +201,8 @@ private:
   bool get_first_event_in_a_function (unsigned *out_idx) const;
 };
 
-/* Concrete subclasses.  */
-
-/* A simple implementation of diagnostic_event.  */
-
-class simple_diagnostic_event : public diagnostic_event
-{
- public:
-  simple_diagnostic_event (location_t loc, tree fndecl, int depth,
-  const char *desc,
-  diagnostic_thread_id_t thread_id = 0);
-  ~simple_diagnostic_event ();
-
-  location_t get_location () const final override { return m_loc; }
-  tree get_fndecl () const final override { return m_fndecl; }
-  int get_stack_depth () const final override { return m_depth; }
-  label_text get_desc (bool) const final override
-  {
-return label_text::borrow (m_desc);
-  }
-  const logical_location *get_logical_location () const final override
-  {
-return NULL;
-  }
-  meaning get_meaning () const final override
-  {
-return meaning ();
-  }
-  bool connect_to_next_event_p () const final override
-  {
-return m_connected_to_next_event;
-  }
-  diagnostic_thread_id_t get_thread_id () const final override
-  {
-return m_thread_id;
-  }
-
-  void connect_to_next_event ()
-  {
-m_connected_to_next_event = true;
-  }
-
- private:
-  location_t m_loc;
-  tree m_fndecl;
-  int m_depth;
-  char *m_desc; // has been i18n-ed and formatted
-  bool m_connected_to_next_event;
-  diagnostic_thread_id_t m_thread_id;
-};
-
-/* A simple implementation of diagnostic_thread.  */
-
-class simple_diagnostic_thread : public 

[pushed 0/7] diagnostics: remove "tree" dependency from diagnostic paths

2024-06-18 Thread David Malcolm
This patch kit removes the dependency on "tree" from diagnostic paths,
renaming tree-diagnostic-path.cc to diagnostic-path.cc.

I have an updated prototype of libdiagnostics that uses this to support
execution paths.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r15-1410-gf89f9c7ae7190c through
r15-1416-g524cdf4dab610e.

David Malcolm (7):
  diagnostics: move simple_diagnostic_{path,thread,event} to their own
.h/cc
  diagnostics: eliminate "tree" from diagnostic_{event,path}
  diagnostics: remove tree usage from tree-diagnostic-path.cc
  diagnostics: eliminate diagnostic_context::m_make_json_for_path
  diagnostics: introduce diagnostic-macro-unwinding.h/cc
  diagnostics: eliminate diagnostic_context::m_print_path callback
  diagnostics: rename tree-diagnostic-path.cc to diagnostic-path.cc

 gcc/Makefile.in   |   6 +-
 gcc/analyzer/checker-event.h  |   2 +-
 gcc/analyzer/checker-path.cc  |   8 +
 gcc/analyzer/checker-path.h   |   5 +
 gcc/c-family/c-opts.cc|   2 +-
 gcc/diagnostic-format-json.cc |  41 ++-
 gcc/diagnostic-format-sarif.cc|   4 +-
 gcc/diagnostic-macro-unwinding.cc | 221 
 gcc/diagnostic-macro-unwinding.h  |  29 ++
 ...-diagnostic-path.cc => diagnostic-path.cc} | 317 ++
 gcc/diagnostic-path.h | 114 +--
 gcc/diagnostic.cc | 199 +++
 gcc/diagnostic.h  |   6 +-
 gcc/logical-location.h|  10 +-
 gcc/selftest-diagnostic-path.cc   | 233 +
 gcc/selftest-diagnostic-path.h| 163 +
 gcc/selftest-logical-location.cc  |  71 
 gcc/selftest-logical-location.h   |  58 
 gcc/selftest-run-tests.cc |   3 +-
 gcc/selftest.h|   3 +-
 gcc/simple-diagnostic-path.cc | 237 +
 gcc/simple-diagnostic-path.h  | 139 
 .../plugin/diagnostic_plugin_test_paths.c |   1 +
 gcc/tree-diagnostic.cc| 197 ---
 gcc/tree-diagnostic.h |  10 -
 gcc/tree-logical-location.cc  |  25 ++
 gcc/tree-logical-location.h   |   3 +
 27 files changed, 1410 insertions(+), 697 deletions(-)
 create mode 100644 gcc/diagnostic-macro-unwinding.cc
 create mode 100644 gcc/diagnostic-macro-unwinding.h
 rename gcc/{tree-diagnostic-path.cc => diagnostic-path.cc} (89%)
 create mode 100644 gcc/selftest-diagnostic-path.cc
 create mode 100644 gcc/selftest-diagnostic-path.h
 create mode 100644 gcc/selftest-logical-location.cc
 create mode 100644 gcc/selftest-logical-location.h
 create mode 100644 gcc/simple-diagnostic-path.cc
 create mode 100644 gcc/simple-diagnostic-path.h

-- 
2.26.3



[PATCH] c++: ICE with __dynamic_cast redecl [PR115501]

2024-06-18 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Since r13-3299, build_dynamic_cast_1 calls pushdecl which calls
duplicate_decls and that in this testcase emits the "conflicting
declaration" error and returns error_mark_node, so the subsequent
build_cxx_call crashes on the error_mark_node.

PR c++/115501

gcc/cp/ChangeLog:

* rtti.cc (build_dynamic_cast_1): Return if dcast_fn is erroneous.

gcc/testsuite/ChangeLog:

* g++.dg/rtti/dyncast8.C: New test.
---
 gcc/cp/rtti.cc   |  2 ++
 gcc/testsuite/g++.dg/rtti/dyncast8.C | 15 +++
 2 files changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/rtti/dyncast8.C

diff --git a/gcc/cp/rtti.cc b/gcc/cp/rtti.cc
index ed69606f4dd..cc006ea927f 100644
--- a/gcc/cp/rtti.cc
+++ b/gcc/cp/rtti.cc
@@ -794,6 +794,8 @@ build_dynamic_cast_1 (location_t loc, tree type, tree expr,
  pop_abi_namespace (flags);
  dynamic_cast_node = dcast_fn;
}
+ if (dcast_fn == error_mark_node)
+   return error_mark_node;
  result = build_cxx_call (dcast_fn, 4, elems, complain);
  SET_EXPR_LOCATION (result, loc);
 
diff --git a/gcc/testsuite/g++.dg/rtti/dyncast8.C 
b/gcc/testsuite/g++.dg/rtti/dyncast8.C
new file mode 100644
index 000..de23433dd9b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/rtti/dyncast8.C
@@ -0,0 +1,15 @@
+// PR c++/115501
+// { dg-do compile }
+
+struct s{virtual void f();};
+struct s1 : s{};
+namespace __cxxabiv1
+{
+  extern "C" void __dynamic_cast(); // { dg-message "previous declaration" }
+}
+void diagnostic_information_impl(s const *se)
+{
+  dynamic_cast(se);
+}
+
+// { dg-error "conflicting declaration" "" { target *-*-* } 0 }

base-commit: e4f938936867d8799775d1455e67bd3fb8711afd
-- 
2.45.1



[PATCH] c++: ICE with __has_unique_object_representations [PR115476]

2024-06-18 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14/13?

-- >8 --
Here we started to ICE with r13-25: in check_trait_type, for "X[]" we
return true here:

  if (kind == 1 && TREE_CODE (type) == ARRAY_TYPE && !TYPE_DOMAIN (type))
return true; // Array of unknown bound. Don't care about completeness.

and then end up crashing in record_has_unique_obj_representations:

4836  if (cur != wi::to_offset (sz))

because sz is null.

https://eel.is/c++draft/type.traits#tab:meta.unary.prop-row-47-column-3-sentence-1
says that the preconditions for __has_unique_object_representations are:
"T shall be a complete type, cv void, or an array of unknown bound" and
that "For an array type T, the same result as
has_unique_object_representations_v<remove_all_extents_t<T>>" so T[]
should be treated as T.  So we should use kind==2 for the trait.
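
The library-level rule can be checked with a small C++17 sketch. This is only
an illustration of the standard wording quoted above, not the new testcase;
the names Point and arrays_match_element are made up for the example.

```cpp
#include <cassert>
#include <type_traits>

struct Point { int x; };

// True when T and the array types built from it all report the same
// value for the trait, as the remove_all_extents_t wording requires.
template <typename T>
constexpr bool arrays_match_element ()
{
  constexpr bool elem = std::has_unique_object_representations_v<T>;
  return elem == std::has_unique_object_representations_v<T[]>
         && elem == std::has_unique_object_representations_v<T[1]>
         && elem == std::has_unique_object_representations_v<T[][1]>;
}

static_assert (arrays_match_element<Point> (),
               "array types follow their element type");
```

The comparison is against the element type's own result, so the check does not
depend on whether a given target's int has padding bits.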

PR c++/115476

gcc/cp/ChangeLog:

* semantics.cc (finish_trait_expr)
<case CPTK_HAS_UNIQUE_OBJ_REPRESENTATIONS>: Move below to call
check_trait_type with kind==2.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/has-unique-obj-representations4.C: New test.
---
 gcc/cp/semantics.cc  |  2 +-
 .../cpp1z/has-unique-obj-representations4.C  | 16 
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/has-unique-obj-representations4.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 08f5f245e7d..42251b6764b 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12966,7 +12966,6 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, 
tree type1, tree type2)
 case CPTK_HAS_NOTHROW_COPY:
 case CPTK_HAS_TRIVIAL_COPY:
 case CPTK_HAS_TRIVIAL_DESTRUCTOR:
-case CPTK_HAS_UNIQUE_OBJ_REPRESENTATIONS:
   if (!check_trait_type (type1))
return error_mark_node;
   break;
@@ -12976,6 +12975,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, 
tree type1, tree type2)
 case CPTK_IS_STD_LAYOUT:
 case CPTK_IS_TRIVIAL:
 case CPTK_IS_TRIVIALLY_COPYABLE:
+case CPTK_HAS_UNIQUE_OBJ_REPRESENTATIONS:
   if (!check_trait_type (type1, /* kind = */ 2))
return error_mark_node;
   break;
diff --git a/gcc/testsuite/g++.dg/cpp1z/has-unique-obj-representations4.C 
b/gcc/testsuite/g++.dg/cpp1z/has-unique-obj-representations4.C
new file mode 100644
index 000..d6949dc7005
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/has-unique-obj-representations4.C
@@ -0,0 +1,16 @@
+// PR c++/115476
+// { dg-do compile { target c++11 } }
+
+struct X;
+static_assert(__has_unique_object_representations(X), "");   // { dg-error 
"invalid use of incomplete type" }
+static_assert(__has_unique_object_representations(X[]), "");  // { dg-error 
"invalid use of incomplete type" }
+static_assert(__has_unique_object_representations(X[1]), "");  // { dg-error 
"invalid use of incomplete type" }
+static_assert(__has_unique_object_representations(X[][1]), "");  // { dg-error 
"invalid use of incomplete type" }
+
+struct X {
+  int x;
+};
+static_assert(__has_unique_object_representations(X), "");
+static_assert(__has_unique_object_representations(X[]), "");
+static_assert(__has_unique_object_representations(X[1]), "");
+static_assert(__has_unique_object_representations(X[][1]), "");

base-commit: 7f9be55a4630134a237219af9cc8143e02080380
-- 
2.45.1



[r15-1394 Regression] FAIL: gcc.dg/pr115109.c (test for excess errors) on Linux/x86_64

2024-06-18 Thread haochen.jiang
On Linux/x86_64,

c9b96a68860bfdee49d40b4a844af7c5ef69cd12 is the first bad commit
commit c9b96a68860bfdee49d40b4a844af7c5ef69cd12
Author: Martin Uecker 
Date:   Sat May 18 22:00:04 2024 +0200

c23: Fix for redeclared enumerator initialized with different type 
[PR115109]

caused

FAIL: gcc.dg/c23-tag-enum-6.c  (test for errors, line 10)
FAIL: gcc.dg/c23-tag-enum-6.c  (test for errors, line 13)
FAIL: gcc.dg/c23-tag-enum-7.c (test for excess errors)
FAIL: gcc.dg/pr115109.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-1394/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/c23-tag-enum-6.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/c23-tag-enum-6.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/c23-tag-enum-7.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/c23-tag-enum-7.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr115109.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr115109.c 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email. For questions about this report, contact me
at haochen dot jiang at intel.com.)
(If you run into cascadelake-related problems, disabling AVX512F on the command
line might help.)
(However, please make sure that there are no potential problems with AVX512.)


RE: [PATCH v1] Match: Support form 11 for the unsigned scalar .SAT_SUB

2024-06-18 Thread Li, Pan2
Thanks Richard, I will commit this one and then try to reduce the unnecessary
patterns following your suggestion.

Pan

-Original Message-
From: Richard Biener  
Sent: Tuesday, June 18, 2024 7:08 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support form 11 for the unsigned scalar .SAT_SUB

On Mon, Jun 17, 2024 at 9:07 AM  wrote:
>
> From: Pan Li 
>
> We missed one match pattern for the unsigned scalar .SAT_SUB,  aka
> form 11.
>
> Form 11:
>   #define SAT_SUB_U_11(T) \
>   T sat_sub_u_11_##T (T x, T y) \
>   { \
> T ret; \
> bool overflow = __builtin_sub_overflow (x, y, &ret); \
> return overflow ? 0 : ret; \
>   }
>
> Thus, add the above form 11 to the match pattern gimple_unsigned_integer_sat_sub.
>
> The following test suites pass with this patch:
> 1. The rv64gcv full regression test with newlib.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap test.
> 4. The x86 full regression test.
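
For readers following along, form 11 can be written out by hand. The following
is a minimal C++ sketch, assuming a GCC- or Clang-compatible compiler that
provides __builtin_sub_overflow; the template name sat_sub_u is illustrative
and is not part of the patch or the testsuite.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned saturating subtraction in the shape of form 11:
// compute x - y, and clamp the result to 0 when the subtraction wraps.
template <typename T>
T sat_sub_u (T x, T y)
{
  T ret;
  // The builtin returns true when the exact value of x - y does not fit in T.
  bool overflow = __builtin_sub_overflow (x, y, &ret);
  return overflow ? 0 : ret;
}
```

For example, sat_sub_u<uint8_t> (5, 10) yields 0 rather than the wrapped
value 251, while sat_sub_u<uint8_t> (10, 5) yields 5.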

OK, but see my other mail.  Eventually sth like

(for cmp (tcc_comparison)
   icmp (inverted_tcc_comparison)
   ncmp (inverted_tcc_comparison_with_nans)
(simplify
 (cond (cmp @0 @1) @2 @3)
 (if (tree_swap_operands_p (@2, @3))
  (with { enum tree_code ic = invert_tree_comparison (cmp, HONOR_NANS (@0)); }
   (if (ic == icmp)
   (cond (icmp @0 @1) @3 @2)
   (if (ic == ncmp)
(cond (ncmp @0 @1) @3 @2)))))))

helps here.  Of course with matching PHIs the above isn't going to help.
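
The equivalence behind this suggestion can be illustrated at the source level.
The sketch below is plain C++, not match.pd syntax, and the function names are
made up for illustration. It shows that swapping the two arms of a select
requires inverting the comparison, and why the inversion has to account for
NaNs (the role HONOR_NANS plays in the snippet above).

```cpp
#include <cassert>
#include <limits>

// Original select: cmp ? x : y.
inline double select_lt (double a, double b, double x, double y)
{
  return a < b ? x : y;
}

// Swapped arms with the NaN-aware inverse of "<", i.e. "unordered or >=".
inline double select_swapped (double a, double b, double x, double y)
{
  return !(a < b) ? y : x;
}

// Swapped arms with plain ">=": only equivalent when no operand is a NaN.
inline double select_swapped_ge (double a, double b, double x, double y)
{
  return a >= b ? y : x;
}
```

For ordinary operands all three agree; with a NaN operand only the first two
do, which is why the sketch in the mail distinguishes icmp from ncmp.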

> gcc/ChangeLog:
>
> * match.pd: Add form 11 match pattern for .SAT_SUB.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 99968d316ed..5c330a43ed0 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3186,13 +3186,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1))))
>
> -/* Unsigned saturation sub, case 7 (branch with .SUB_OVERFLOW).  */
> +/* Unsigned saturation sub, case 7 (branch eq with .SUB_OVERFLOW).  */
>  (match (unsigned_integer_sat_sub @0 @1)
>   (cond^ (eq (imagpart (IFN_SUB_OVERFLOW@2 @0 @1)) integer_zerop)
>(realpart @2) integer_zerop)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>&& types_match (type, @0, @1))))
>
> +/* Unsigned saturation sub, case 8 (branch ne with .SUB_OVERFLOW).  */
> +(match (unsigned_integer_sat_sub @0 @1)
> + (cond^ (ne (imagpart (IFN_SUB_OVERFLOW@2 @0 @1)) integer_zerop)
> +   integer_zerop (realpart @2))
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && types_match (type, @0, @1))))
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> --
> 2.34.1
>


RE: [PATCH v1] Match: Support forms 7 and 8 for the unsigned .SAT_ADD

2024-06-18 Thread Li, Pan2
Thanks Richard for comments.

> we might want to consider such transform in match.pd, in this case this
> would allow to elide one of the patterns.

That makes much more sense to me; it is not a good idea to have many patterns
for SAT_ADD. I will commit this first and try that transform in a separate patch.

Pan

-Original Message-
From: Richard Biener  
Sent: Tuesday, June 18, 2024 7:03 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Support forms 7 and 8 for the unsigned .SAT_ADD

On Mon, Jun 17, 2024 at 3:41 AM  wrote:
>
> From: Pan Li 
>
> When investigating the vectorization of .SAT_ADD, we noticed there
> are 2 additional forms, aka form 7 and 8, for .SAT_ADD.
>
> Form 7:
>   #define DEF_SAT_U_ADD_FMT_7(T)  \
>   T __attribute__((noinline)) \
>   sat_u_add_##T##_fmt_7 (T x, T y)\
>   {   \
> return x > (T)(x + y) ? -1 : (x + y); \
>   }
>
> Form 8:
>   #define DEF_SAT_U_ADD_FMT_8(T)   \
>   T __attribute__((noinline))  \
>   sat_u_add_##T##_fmt_8 (T x, T y) \
>   {\
> return x <= (T)(x + y) ? (x + y) : -1; \
>   }
>
> Thus,  add above 2 forms to the match gimple_unsigned_integer_sat_add,
> and then the vectorizer can try to recog the pattern like form 7 and
> form 8.
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression test with newlib.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap test.
> 4. The x86 fully regression test.
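
For readers following along, here is a minimal standalone sketch (not part of
the patch) of forms 7 and 8 from the description above, instantiated for
uint8_t and checked exhaustively against a widening reference. The helper
names sat_u_add_ref and check_sat_u_add_u8 are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Form 7 from the patch description, instantiated for uint8_t.  */
static uint8_t
sat_u_add_fmt_7 (uint8_t x, uint8_t y)
{
  return x > (uint8_t)(x + y) ? -1 : (x + y);
}

/* Form 8: the inverted comparison with the two arms swapped.  */
static uint8_t
sat_u_add_fmt_8 (uint8_t x, uint8_t y)
{
  return x <= (uint8_t)(x + y) ? (x + y) : -1;
}

/* Hypothetical reference: add in a wider type, then clamp.  */
static uint8_t
sat_u_add_ref (uint8_t x, uint8_t y)
{
  unsigned sum = (unsigned) x + y;
  return sum > UINT8_MAX ? UINT8_MAX : (uint8_t) sum;
}

/* Compare both forms with the reference for every uint8_t pair.  */
static void
check_sat_u_add_u8 (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
	uint8_t ref = sat_u_add_ref (x, y);
	assert (sat_u_add_fmt_7 (x, y) == ref);
	assert (sat_u_add_fmt_8 (x, y) == ref);
      }
}
```

The wrapped sum is smaller than x exactly when the addition overflowed, which
is why both comparisons select the all-ones value in the overflow case.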

OK.

Note that fold-const.cc has canonicalization for the minus one to be put last:

  /* If the second operand is simpler than the third, swap them
 since that produces better jump optimization results.  */
  if (truth_value_p (TREE_CODE (arg0))
  && tree_swap_operands_p (op1, op2))
{
  location_t loc0 = expr_location_or (arg0, loc);
  /* See if this can be inverted.  If it can't, possibly because
 it was a floating-point inequality comparison, don't do
 anything.  */
  tem = fold_invert_truthvalue (loc0, arg0);
  if (tem)
return fold_build3_loc (loc, code, type, tem, op2, op1);

we might want to consider such transform in match.pd, in this case this
would allow to elide one of the patterns.

Richard.

> gcc/ChangeLog:
>
> * match.pd: Add form 7 and 8 for the unsigned .SAT_ADD match.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 99968d316ed..aae6d30a5e4 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3144,6 +3144,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (cond^ (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1)) integer_zerop)
>integer_minus_onep (usadd_left_part_2 @0 @1)))
>
> +/* Unsigned saturation add, case 7 (branch with le):
> +   SAT_ADD = x <= (X + Y) ? (X + Y) : -1.  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (cond^ (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep))
> +
> +/* Unsigned saturation add, case 8 (branch with gt):
> +   SAT_ADD = x > (X + Y) ? -1 : (X + Y).  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (cond^ (gt @0 (usadd_left_part_1@2 @0 @1)) integer_minus_onep @2))
> +
>  /* Unsigned saturation sub, case 1 (branch with gt):
> SAT_U_SUB = X > Y ? X - Y : 0  */
>  (match (unsigned_integer_sat_sub @0 @1)
> --
> 2.34.1
>


Re: [PATCH] rs6000: ROP - Do not disable shrink-wrapping for leaf functions [PR114759]

2024-06-18 Thread Segher Boessenkool
On Mon, Jun 17, 2024 at 08:54:46PM -0500, Peter Bergner wrote:
> On 6/17/24 7:57 PM, Segher Boessenkool wrote:
> > On Mon, Jun 17, 2024 at 06:49:18PM -0500, Peter Bergner wrote:
> >> On 6/17/24 6:11 PM, Segher Boessenkool wrote:
> >> Yeah, I didn't write that, I only moved it, but I can try to come up with
> >> an explanation of why we need to disable it now.  That said, my hope is to
> >> not have to disable shrink-wrapping even when we emit the ROP protect hash
> >> insns in the future, but that will take some extra work.  If I can manage
> >> that, then this should all just go away. :-)  Until then, we can stick
> >> with this patch's micro-optimization.
> > 
> > If you inline one function into another, there is no ROP protection on
> > their boundary anymore (since there is no such boundary anymore!)  This
> > is not necessarily a problem, but you do want some noipa or similar
> > markup where without ROP protection you have no incentive to do that.
> > 
> > Shrink-wrapping allows more inlining, and more inlining allows more
> > shrink-wrapping, but there is no direct relation between shrink-wrapping
> > and our ROP protect stuff?  We just need to make sure the hashst and
> > hashchk things are done at the very start and the very end of the
> > functions, but we need to make sure of that anyway!
> > 
> > So yeah, please investigate a bit more :-)
> 
> So we should be able to shrink-wrap in the presence of the ROP protection.

Well of course, if we really *cannot* currently that obviously is
just a bug.

But do we want to?  And, how far, in what cases not?

In extremis everything is inlined into main(), and -mrop-protect doesn't
do any protection.  This can be done for *any* program btw.

In practice this isn't likely to happen so much.  But we need to
monitor -- a lot of (target, and backend) optimisations try to reduce
nesting overhead, usually by nesting less.  And -mrop-protect only does
anything at function boundaries :-)

> The ROP attacks work by buffer overrun type issues, clobbering the return
> address that was saved on the stack causing us to return to somewhere else.

Buffer overflows on the stack are the easiest / most common way to do
this, yes.  But there are other ways to do a return address overwrite.
Not as easy to exploit by far, in most programs, sure.

> If we don't need to save the return address on the stack like for leaf
> functions, or shrink-wrapped sections that are call free, those codes
> are not really susceptible to ROP attacks.

That is true in some way, yes.  Your caller will still check the chain
(albeit later).

> It's the call paths where we
> save the return address on the stack that we have to protect.  If inlining
> or shrink wrapping increases the amount of code that is call free (ie, we
> don't need to save the return address), then that code is not less safe
> than before but as safe or safer than before.  It seems the reason we
> disabled shrink-wrapping now, was that we were emitting the hashst in the
> wrong location (PR101324) causing us to store a bad hash value.  I think
> that was just a "bug" that probably should have been fixed rather than
> worked around by disabling shrink-wrapping.  It's on my TODO to take a
> look at fixing that correctly.

Sounds good!


Segher


Re: [PATCH v2] libstdc++: Fix build for AVR [PR115481, PR111639]

2024-06-18 Thread Detlef Vollmann

On 6/14/24 19:59, Jonathan Wakely wrote:

On Fri, 14 Jun 2024 at 18:45, Xi Ruoyao wrote:


On Fri, 2024-06-14 at 19:37 +0200, Detlef Vollmann wrote:

diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 5645e991af7..17dbae7bd87 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -5080,7 +5080,7 @@ else
  We can't simply define LARGE_OFF_T to be 9223372036854775807,
  since some C++ compilers masquerading as C compilers
  incorrectly reject 9223372036854775807.  */
-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))
int off_t_is_large[(LARGE_OFF_T % 2147483629 == 721
  && LARGE_OFF_T % 2147483647 == 1)
 ? 1 : -1];
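
As a side note, both spellings of the constant in the hunk above should denote
the same value, 2^63 - 1, when off_t is 64 bits wide. A small sketch, assuming
int64_t as a stand-in for off_t (the *_OLD/*_NEW macro names and the check
function are hypothetical and exist only for this illustration):

```c
#include <assert.h>
#include <stdint.h>

/* int64_t stands in for a 64-bit off_t; that is an assumption here.  */
#define LARGE_OFF_T_OLD (((int64_t) 1 << 62) - 1 + ((int64_t) 1 << 62))
#define LARGE_OFF_T_NEW \
  ((((int64_t) 1 << 31) << 31) - 1 + (((int64_t) 1 << 31) << 31))

/* Same trick as the configure test: a negative array size is a compile
   error, so this typedef only compiles if the remainders match.  */
typedef int off_t_is_large[(LARGE_OFF_T_NEW % 2147483629 == 721
			    && LARGE_OFF_T_NEW % 2147483647 == 1)
			   ? 1 : -1];

/* Run-time confirmation that both spellings give INT64_MAX.  */
static void
check_large_off_t (void)
{
  assert (LARGE_OFF_T_OLD == INT64_MAX);
  assert (LARGE_OFF_T_NEW == INT64_MAX);
}
```

The two moduli are 2^31 - 19 and 2^31 - 1, so a truncated or narrower off_t
would fail at least one of the remainder checks.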


This shouldn't happen.  Please regenerate using *vanilla* autoconf-2.69.


Yes, please, if possible. But I can regenerate it locally before
pushing if needed.


This version of the patch doesn't have the problem (this was the
easy part :-)

However, my tests show that this is only a partial solution.
But I believe the patch is correct, and it solves the build
problems for libstdc++ for AVR target.

However, at least for avr-libc 2.0.0 (the version that has the
*f functions as macros) it's not a complete patch:
std::sinf etc, and std::sinl etc are not provided
(::sinf and ::sinl are not provided either).
std::sin exists and has overloads for float, double and long double.

And calling std::sin(float) only works if the call is inlined,
otherwise I get a linker error
".text._ZSt3sinf[_ZSt3sinf]+0x10): undefined reference to `sinf'".

I don't get a similar error for sinl when calling std::sin(long double).
I didn't have time to dig into the build of libstdc++ to find out
what the difference is.

For avr-libc 2.1.0 (that provides real *f functions but no *l)
I still get the error that std::sinl is not provided but none
of the other problems.

For avr-libc 2.2.0 (that provides real *f and *l functions) I don't
get any of the problems above.

But for all three avr-libc versions I get the problem that the
C++ sin(long double) returns a different value than the C sinl
(which is just another linker name for sin):
I don't really know about floating point representations, but
it looks like the C version fills the mantissa for a long double
return with 0, while the C++ version has bits there.
Looking at the implementation in math_stubs_long_double.cc,
which only calls the double version and promotes the return
to long double this looks strange.
But probably libstdc++ is compiled with different flags than
my test program.
And if I convert the long double result back to double I get
the same value as for C, so it's not really wrong, but it's
irritating...

So again, I think my patch helps, but is not a complete solution.

  Detlef
commit 1e68bbe1da86913820af146ec5294d1ab53d72f7
Author: Detlef Vollmann 
Date:   Sat Jun 15 00:19:45 2024 +0200

libstdc++-v3: detect math functions when using avr-libc and handle macros

Different versions of avr-libc have different definitions of
math functions:
  - 2.0.0 and earlier have float versions as macros
  - 2.1.0 has float versions as proper functions, but no long double versions
  - 2.2.0 has long double versions

This commit tells configure to always check if these functions are available.
It also #undef's any macros.

libstdc++-v3/ChangeLog:

* crossconfig.m4 [avr*-*-*]: Add compile-checks for
float and long double functions.
* configure: Regenerate.
* src/c++98/math_stubs_float.cc: #undef any macros.

diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 5645e991af7..891a5f8e053 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -28628,51 +28628,2354 @@ case "${host}" in
 ;;
 
   avr*-*-*)
-$as_echo "#define HAVE_ACOSF 1" >>confdefs.h
 
-$as_echo "#define HAVE_ASINF 1" >>confdefs.h
 
-$as_echo "#define HAVE_ATAN2F 1" >>confdefs.h
 
-$as_echo "#define HAVE_ATANF 1" >>confdefs.h
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for acosf declaration" >&5
+$as_echo_n "checking for acosf declaration... " >&6; }
+if ${glibcxx_cv_func_acosf_use+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+
+
+  ac_ext=c
+ac_cpp='$CPP $CPPFLAGS'
+ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_c_compiler_gnu
+
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+#include <math.h>
+#ifdef HAVE_IEEEFP_H
+# include <ieeefp.h>
+#endif
+#undef acosf
+
+int
+main ()
+{
+
+  void (*f)(void) = (void (*)(void))acosf;
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  glibcxx_cv_func_acosf_use=yes
+
+else
+  glibcxx_cv_func_acosf_use=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi

[committed] libstdc++: Fix outdated comment about standard integer types

2024-06-18 Thread Jonathan Wakely
Pushed to trunk.

-->8 --

The long long and unsigned long long types have been standard since
C++11, so are not extensions. There are also the char8_t, char16_t and
char32_t types. Just refer to the standard integer types, without saying
how many there are.

libstdc++-v3/ChangeLog:

* include/bits/cpp_type_traits.h: Fix outdated comment about the
number of standard integer types.
---
 libstdc++-v3/include/bits/cpp_type_traits.h | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/bits/cpp_type_traits.h 
b/libstdc++-v3/include/bits/cpp_type_traits.h
index 679eee99b90..6834dee5557 100644
--- a/libstdc++-v3/include/bits/cpp_type_traits.h
+++ b/libstdc++-v3/include/bits/cpp_type_traits.h
@@ -130,10 +130,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   typedef __false_type __type;
 };
 
-  // Thirteen specializations (yes there are eleven standard integer
-  // types; long long and unsigned long long are
-  // supported as extensions).  Up to four target-specific __int
-  // types are supported as well.
+  // Explicit specializations for the standard integer types.
+  // Up to four target-specific __int types are supported as well.
   template<>
 struct __is_integer
 {
-- 
2.45.1



Re: [RFC PATCH] ARM: thumb1: Use LDMIA/STMIA for DI/DF loads/stores

2024-06-18 Thread Siarhei Volkau
On Mon, 17 Jun 2024 at 15:43, Richard Earnshaw (lists) wrote:

> I like the idea behind this patch, but I think I'd try first doing this as a 
> peephole2 rule to rewrite the address in this case.  That has the additional 
> advantage that we then estimate the size of the instruction more accurately.

Indeed, I tried it and it seems to work, although sometimes it does
odd things that I can't explain, e.g:

define_insn patch            define_peephole2 patch
...                          ...
ldmia   r0!, {r4, r5}        movs    r3, r0
ldmia   r1!, {r2, r3}        ldmia   r3!, {r4, r5}
movs    r0, r7               movs    r0, r7
...                          ldr     r2, [r1, #0]
                             ldr     r3, [r1, #4]
                             # r1 unused later on
                             ...

But in general it finds a little bit more cases where ldmia/stmia can
be applied.

> > 2. Might it be profitable for thumb2?

> I think it would then be easy to extend this to thumb2 as well if it looks 
> like a win (perhaps only for -Os in the thumb2 case).

Sounds good, I'll look at it later.

> For testing, I'd start with something like 
> gcc/testsuite/gcc.target/arm/thumb-andsi.c as a template and adapt that for 
> your specific case.  Matching something like "ldmia\tr[0-7]!," should be 
> enough.

I'll send the v2 patch with test case(s) soon.
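
A rough sketch of what such a test could look like, following the suggestion
above. This is only an illustration: the function names are made up, the
option set is a guess, and whether the ldmia form is actually emitted for
these functions depends on the final patch and would need checking.

```c
/* { dg-do compile } */
/* { dg-require-effective-target arm_thumb1_ok } */
/* { dg-options "-mthumb -Os" } */

/* Hypothetical test bodies: a DImode and a DFmode copy through pointers,
   so the loaded value lands in registers other than the base.  */
void
copy_di (long long *dst, long long *src)
{
  *dst = *src;
}

void
copy_df (double *dst, double *src)
{
  *dst = *src;
}

/* { dg-final { scan-assembler "ldmia\tr\[0-7\]!," } } */
```

The brackets in the scan pattern are escaped because the string is
interpreted by Tcl before it reaches the regexp matcher.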

BR, Siarhei


Re: [pushed][PR114415][scheduler]: Fixing wrong code generation

2024-06-18 Thread Vaseeharan Vinayagamoorthy
Hi,

I have found that this patch has introduced a regression in the arm-none-eabi 
toolchain for a testcase in the libstdc++ testsuite, which was previously 
passing:

FAIL: 27_io/basic_istream/ignore/char/94749.cc execution test

The toolchain was built with:
Build = x86_64-none-linux-gnu
Host = x86_64-none-linux-gnu
Target = arm-none-eabi

The test is running on a simulator with:
 -mthumb/-march=armv8.1-m.main+mve/-mfloat-abi=hard

Kind regards,
Vasee


From: Vladimir Makarov 
Sent: 04 April 2024 21:10
To: gcc-patches@gcc.gnu.org
Subject: [pushed][PR114415][scheduler]: Fixing wrong code generation

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114415

The patch was successfully tested and bootstrapped on x86_64, ppc64le, aarch64.



[PATCH] tree-optimization/115537 - ICE with SLP condition reduction vectorization

2024-06-18 Thread Richard Biener
The condition rejecting "multiple-type" SLP condition reduction lacks
handling EXTRACT_LAST reductions.

Bootstrap and regtest in progress on x86_64-unknown-linux-gnu.

Richard.

PR tree-optimization/115537
* tree-vect-loop.cc (vectorizable_reduction): Also reject
SLP condition reductions of EXTRACT_LAST kind when multiple
statement copies are involved.

* gcc.dg/vect/pr115537.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr115537.c | 19 +++
 gcc/tree-vect-loop.cc|  5 +++--
 2 files changed, 22 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr115537.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr115537.c 
b/gcc/testsuite/gcc.dg/vect/pr115537.c
new file mode 100644
index 000..99ed467feb8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115537.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mcpu=neoverse-n1" { target aarch64*-*-* } } */
+
+char *a;
+int b;
+void c()
+{
+  int d = 0, e = 0, f;
+  for (; f; ++f)
+if (a[f] == 5)
+  ;
+else if (a[f])
+  e = 1;
+else
+  d = 1;
+  if (d)
+if (e)
+  b = 0;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 7c79e9da106..eeb75c09e91 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8083,13 +8083,14 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 
   if ((reduction_type == COND_REDUCTION
|| reduction_type == INTEGER_INDUC_COND_REDUCTION
-   || reduction_type == CONST_COND_REDUCTION)
+   || reduction_type == CONST_COND_REDUCTION
+   || reduction_type == EXTRACT_LAST_REDUCTION)
   && slp_node
   && SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) > 1)
 {
   if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"multiple types in condition reduction reduction.\n");
+"multiple types in condition reduction.\n");
   return false;
 }
 
-- 
2.35.3


[MAINTAINERS] Update my email address

2024-06-18 Thread Kyrylo Tkachov
Hi all,

Pushing to trunk.
Thanks,
Kyrill

Signed-off-by: Kyrylo Tkachov 

* MAINTAINERS (aarch64 port): Update my email address.
(DCO section): Likewise.



maintainers.patch
Description: maintainers.patch


Re: [PATCH 1/3 v3] vect: generate suitable convert insn for int -> int, float -> float and int <-> float.

2024-06-18 Thread Richard Biener
On Tue, 11 Jun 2024, Hu, Lin1 wrote:

> I wrap a part of code about indirect conversion. The API refers to 
> supportable_narrowing/widening_operations.

Sorry for the delay - comments inline.

> BRs,
> Lin
> 
> gcc/ChangeLog:
> 
>   PR target/107432
>   * tree-vect-generic.cc
>   (expand_vector_conversion): Support convert for int -> int,
>   float -> float and int <-> float.
>   * tree-vect-stmts.cc (vectorizable_conversion): Wrap the
>   indirect convert part.
>   (supportable_indirect_convert_operation): New function.
>   * tree-vectorizer.h (supportable_indirect_convert_operation):
>   Define the new function.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/107432
>   * gcc.target/i386/pr107432-1.c: New test.
>   * gcc.target/i386/pr107432-2.c: Ditto.
>   * gcc.target/i386/pr107432-3.c: Ditto.
>   * gcc.target/i386/pr107432-4.c: Ditto.
>   * gcc.target/i386/pr107432-5.c: Ditto.
>   * gcc.target/i386/pr107432-6.c: Ditto.
>   * gcc.target/i386/pr107432-7.c: Ditto.
> ---
>  gcc/testsuite/gcc.target/i386/pr107432-1.c | 234 
>  gcc/testsuite/gcc.target/i386/pr107432-2.c | 105 +
>  gcc/testsuite/gcc.target/i386/pr107432-3.c |  55 +
>  gcc/testsuite/gcc.target/i386/pr107432-4.c |  56 +
>  gcc/testsuite/gcc.target/i386/pr107432-5.c |  72 ++
>  gcc/testsuite/gcc.target/i386/pr107432-6.c | 139 
>  gcc/testsuite/gcc.target/i386/pr107432-7.c | 156 +
>  gcc/tree-vect-generic.cc   |  33 ++-
>  gcc/tree-vect-stmts.cc | 244 +
>  gcc/tree-vectorizer.h  |   9 +
>  10 files changed, 1011 insertions(+), 92 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-5.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-6.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-7.c
> 
> diff --git a/gcc/testsuite/gcc.target/i386/pr107432-1.c 
> b/gcc/testsuite/gcc.target/i386/pr107432-1.c
> new file mode 100644
> index 000..a4f37447eb4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr107432-1.c
> @@ -0,0 +1,234 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=x86-64 -mavx512bw -mavx512vl -O3" } */
> +/* { dg-final { scan-assembler-times "vpmovqd" 6 } } */
> +/* { dg-final { scan-assembler-times "vpmovqw" 6 } } */
> +/* { dg-final { scan-assembler-times "vpmovqb" 6 } } */
> +/* { dg-final { scan-assembler-times "vpmovdw" 6 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vpmovdw" 8 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vpmovdb" 6 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vpmovdb" 8 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vpmovwb" 8 } } */
> +
> +#include 
> +
> +typedef short __v2hi __attribute__ ((__vector_size__ (4)));
> +typedef char __v2qi __attribute__ ((__vector_size__ (2)));
> +typedef char __v4qi __attribute__ ((__vector_size__ (4)));
> +typedef char __v8qi __attribute__ ((__vector_size__ (8)));
> +
> +typedef unsigned short __v2hu __attribute__ ((__vector_size__ (4)));
> +typedef unsigned short __v4hu __attribute__ ((__vector_size__ (8)));
> +typedef unsigned char __v2qu __attribute__ ((__vector_size__ (2)));
> +typedef unsigned char __v4qu __attribute__ ((__vector_size__ (4)));
> +typedef unsigned char __v8qu __attribute__ ((__vector_size__ (8)));
> +typedef unsigned int __v2su __attribute__ ((__vector_size__ (8)));
> +
> +__v2si mm_cvtepi64_epi32_builtin_convertvector(__m128i a)
> +{
> +  return __builtin_convertvector((__v2di)a, __v2si);
> +}
> +
> +__m128i  mm256_cvtepi64_epi32_builtin_convertvector(__m256i a)
> +{
> +  return (__m128i)__builtin_convertvector((__v4di)a, __v4si);
> +}
> +
> +__m256i  mm512_cvtepi64_epi32_builtin_convertvector(__m512i a)
> +{
> +  return (__m256i)__builtin_convertvector((__v8di)a, __v8si);
> +}
> +
> +__v2hi   mm_cvtepi64_epi16_builtin_convertvector(__m128i a)
> +{
> +  return __builtin_convertvector((__v2di)a, __v2hi);
> +}
> +
> +__v4hi   mm256_cvtepi64_epi16_builtin_convertvector(__m256i a)
> +{
> +  return __builtin_convertvector((__v4di)a, __v4hi);
> +}
> +
> +__m128i  mm512_cvtepi64_epi16_builtin_convertvector(__m512i a)
> +{
> +  return (__m128i)__builtin_convertvector((__v8di)a, __v8hi);
> +}
> +
> +__v2qi   mm_cvtepi64_epi8_builtin_convertvector(__m128i a)
> +{
> +  return __builtin_convertvector((__v2di)a, __v2qi);
> +}
> +
> +__v4qi   mm256_cvtepi64_epi8_builtin_convertvector(__m256i a)
> +{
> +  return __builtin_convertvector((__v4di)a, __v4qi);
> +}
> +
> +__v8qi   mm512_cvtepi64_epi8_builtin_convertvector(__m512i a)
> 

Re: [PATCH 0/8] Follow-on force_subreg patches

2024-06-18 Thread Richard Biener
On Mon, Jun 17, 2024 at 11:55 AM Richard Sandiford
 wrote:
>
> This series expands on the fix for PR115464 by using force_subreg
> in more places.  It also adds some convenience wrappers for lowpart
> and highpart subregs.
>
> A part of this will need to be backported after a grace period,
> but I'll post the cherry-picked parts separately.
>
> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard Sandiford (8):
>   Make force_subreg emit nothing on failure
>   aarch64: Use force_subreg in more places
>   Make more use of force_subreg
>   Add force_lowpart_subreg
>   aarch64: Add some uses of force_lowpart_subreg
>   Make more use of force_lowpart_subreg
>   Add force_highpart_subreg
>   aarch64: Add some uses of force_highpart_subreg
>
>  gcc/builtins.cc   | 22 +++---
>  gcc/config/aarch64/aarch64-builtins.cc| 15 +++
>  gcc/config/aarch64/aarch64-simd.md|  4 +-
>  .../aarch64/aarch64-sve-builtins-base.cc  | 10 ++---
>  .../aarch64/aarch64-sve-builtins-functions.h  |  6 +--
>  .../aarch64/aarch64-sve-builtins-sme.cc   |  2 +-
>  gcc/config/aarch64/aarch64.cc | 31 -
>  gcc/explow.cc | 34 +-
>  gcc/explow.h  |  2 +
>  gcc/expmed.cc | 26 ---
>  gcc/expr.cc   | 44 +--
>  gcc/optabs.cc | 26 ++-
>  .../aarch64/sve/acle/general/pr115464_2.c | 11 +
>  13 files changed, 111 insertions(+), 122 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr115464_2.c
>
> --
> 2.25.1
>


Re: [PATCH v1] Match: Support form 11 for the unsigned scalar .SAT_SUB

2024-06-18 Thread Richard Biener
On Mon, Jun 17, 2024 at 9:07 AM  wrote:
>
> From: Pan Li 
>
> We missed one match pattern for the unsigned scalar .SAT_SUB,  aka
> form 11.
>
> Form 11:
>   #define SAT_SUB_U_11(T) \
>   T sat_sub_u_11_##T (T x, T y) \
>   { \
> T ret; \
> bool overflow = __builtin_sub_overflow (x, y, &ret); \
> return overflow ? 0 : ret; \
>   }
>
> Thus,  add above form 11 to the match pattern gimple_unsigned_integer_sat_sub.
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression test with newlib.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap test.
> 4. The x86 fully regression test.
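
For reference, a minimal standalone sketch (not taken from the patch) of form
11 instantiated for uint32_t, compared with the plain branch form
"X > Y ? X - Y : 0". It assumes a compiler that provides
__builtin_sub_overflow (GCC or Clang); the check function is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Form 11: let the overflow builtin report the borrow.  */
static uint32_t
sat_sub_u_11 (uint32_t x, uint32_t y)
{
  uint32_t ret;
  bool overflow = __builtin_sub_overflow (x, y, &ret);
  return overflow ? 0 : ret;
}

/* Plain branch form, used here as the reference.  */
static uint32_t
sat_sub_u_branch (uint32_t x, uint32_t y)
{
  return x > y ? x - y : 0;
}

/* Spot-check a few boundary values against the reference.  */
static void
check_sat_sub_u32 (void)
{
  static const uint32_t v[] = { 0, 1, 2, 0x7fffffffu, 0x80000000u,
				UINT32_MAX - 1, UINT32_MAX };
  const unsigned n = sizeof (v) / sizeof (v[0]);

  for (unsigned i = 0; i < n; i++)
    for (unsigned j = 0; j < n; j++)
      assert (sat_sub_u_11 (v[i], v[j]) == sat_sub_u_branch (v[i], v[j]));
}
```

For unsigned operands the builtin reports overflow exactly when x < y, so the
"overflow ? 0 : ret" arm order is the "ne" variant of the existing "eq" case.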

OK, but see my other mail.  Eventually sth like

(for cmp (tcc_comparison)
     icmp (inverted_tcc_comparison)
     ncmp (inverted_tcc_comparison_with_nans)
 (simplify
  (cond (cmp @0 @1) @2 @3)
  (if (tree_swap_operands_p (@2, @3))
   (with { enum tree_code ic = invert_tree_comparison (cmp, HONOR_NANS (@0)); }
    (if (ic == icmp)
     (cond (icmp @0 @1) @3 @2)
     (if (ic == ncmp)
      (cond (ncmp @0 @1) @3 @2)))))))

helps here.  Of course with matching PHIs the above isn't going to help.

> gcc/ChangeLog:
>
> * match.pd: Add form 11 match pattern for .SAT_SUB.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 99968d316ed..5c330a43ed0 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3186,13 +3186,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>        && types_match (type, @0, @1))))
>
> -/* Unsigned saturation sub, case 7 (branch with .SUB_OVERFLOW).  */
> +/* Unsigned saturation sub, case 7 (branch eq with .SUB_OVERFLOW).  */
>  (match (unsigned_integer_sat_sub @0 @1)
>   (cond^ (eq (imagpart (IFN_SUB_OVERFLOW@2 @0 @1)) integer_zerop)
>    (realpart @2) integer_zerop)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>        && types_match (type, @0, @1))))
>
> +/* Unsigned saturation sub, case 8 (branch ne with .SUB_OVERFLOW).  */
> +(match (unsigned_integer_sat_sub @0 @1)
> + (cond^ (ne (imagpart (IFN_SUB_OVERFLOW@2 @0 @1)) integer_zerop)
> +   integer_zerop (realpart @2))
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +      && types_match (type, @0, @1))))
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> --
> 2.34.1
>


Re: [PATCH v1] Match: Support forms 7 and 8 for the unsigned .SAT_ADD

2024-06-18 Thread Richard Biener
On Mon, Jun 17, 2024 at 3:41 AM  wrote:
>
> From: Pan Li 
>
> When investigating the vectorization of .SAT_ADD, we noticed there
> are 2 additional forms, aka form 7 and 8, for .SAT_ADD.
>
> Form 7:
>   #define DEF_SAT_U_ADD_FMT_7(T)  \
>   T __attribute__((noinline)) \
>   sat_u_add_##T##_fmt_7 (T x, T y)\
>   {   \
> return x > (T)(x + y) ? -1 : (x + y); \
>   }
>
> Form 8:
>   #define DEF_SAT_U_ADD_FMT_8(T)   \
>   T __attribute__((noinline))  \
>   sat_u_add_##T##_fmt_8 (T x, T y) \
>   {\
> return x <= (T)(x + y) ? (x + y) : -1; \
>   }
>
> Thus,  add above 2 forms to the match gimple_unsigned_integer_sat_add,
> and then the vectorizer can try to recog the pattern like form 7 and
> form 8.
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression test with newlib.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap test.
> 4. The x86 fully regression test.

OK.

Note that fold-const.cc has canonicalization for the minus one to be put last:

  /* If the second operand is simpler than the third, swap them
 since that produces better jump optimization results.  */
  if (truth_value_p (TREE_CODE (arg0))
  && tree_swap_operands_p (op1, op2))
{
  location_t loc0 = expr_location_or (arg0, loc);
  /* See if this can be inverted.  If it can't, possibly because
 it was a floating-point inequality comparison, don't do
 anything.  */
  tem = fold_invert_truthvalue (loc0, arg0);
  if (tem)
return fold_build3_loc (loc, code, type, tem, op2, op1);

we might want to consider such transform in match.pd, in this case this
would allow to elide one of the patterns.
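
To illustrate the canonicalization discussed above in plain C (a sketch only,
with made-up function names): for integers, inverting the comparison and
swapping the two arms never changes the result, while for floating point the
naive inverse of "<" is not ">=" once a NaN is involved, which is presumably
why the fold-const.cc code gives up when the condition cannot be inverted.

```c
#include <assert.h>
#include <math.h>

/* Integer case: 'x > y ? a : b' and 'x <= y ? b : a' always agree.  */
static int
select_gt (int x, int y, int a, int b)
{
  return x > y ? a : b;
}

static int
select_le_swapped (int x, int y, int a, int b)
{
  return x <= y ? b : a;
}

/* Floating-point case: '>=' is only the inverse of '<' without NaNs.  */
static int
select_lt_f (double x, double y, int a, int b)
{
  return x < y ? a : b;
}

static int
select_ge_swapped_f (double x, double y, int a, int b)
{
  return x >= y ? b : a;
}

static void
check_cond_swap (void)
{
  for (int x = -2; x <= 2; x++)
    for (int y = -2; y <= 2; y++)
      assert (select_gt (x, y, 10, 20) == select_le_swapped (x, y, 10, 20));

  /* With a NaN operand both comparisons are false, so the two
     "equivalent" selects pick different arms.  */
  assert (select_lt_f (NAN, 1.0, 10, 20) == 20);
  assert (select_ge_swapped_f (NAN, 1.0, 10, 20) == 10);
}
```

For the unsigned integer .SAT_ADD and .SAT_SUB patterns in this thread only
the integer case matters, so the swap is always valid there.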

Richard.

> gcc/ChangeLog:
>
> * match.pd: Add form 7 and 8 for the unsigned .SAT_ADD match.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 99968d316ed..aae6d30a5e4 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3144,6 +3144,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (cond^ (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1)) integer_zerop)
>integer_minus_onep (usadd_left_part_2 @0 @1)))
>
> +/* Unsigned saturation add, case 7 (branch with le):
> +   SAT_ADD = x <= (X + Y) ? (X + Y) : -1.  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (cond^ (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep))
> +
> +/* Unsigned saturation add, case 8 (branch with gt):
> +   SAT_ADD = x > (X + Y) ? -1 : (X + Y).  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (cond^ (gt @0 (usadd_left_part_1@2 @0 @1)) integer_minus_onep @2))
> +
>  /* Unsigned saturation sub, case 1 (branch with gt):
> SAT_U_SUB = X > Y ? X - Y : 0  */
>  (match (unsigned_integer_sat_sub @0 @1)
> --
> 2.34.1
>


Re: [PATCH] build: Fix missing variable quotes

2024-06-18 Thread Richard Biener
On Tue, Jun 18, 2024 at 10:35 AM Sam James  wrote:
>
> YunQiang Su  writes:
>
> > OK for trunk?
>
> It looks good to me, but I can't approve. (I'd dare say it's obvious,
> even.)
>
> Richard, any chance you could give it a quick ack?

OK


Re: [RE] [v2] RISC-V: Add Zfbfmin extension

2024-06-18 Thread Jin Ma

Hi, Feng
  Any new developments here on zvfbfmin and zvfbfwma?

BR,
Jin

--
From: Fei Gao
Send Time: 2024 Jun. 7 (Fri.) 17:34
To: jinma; "gcc-patches"; zengxiao; wangfeng
Cc: jeffreyalaw; Kito Cheng; "juzhe.zhong"; "jinma.contrib"; jinma
Subject: Re: [RE] [v2] RISC-V: Add Zfbfmin extension

Hi Jin


We have completed zvfbfmin and zvfbfwma in GCC. 
Wang Feng will post after dragon boat festival. 


BR, 
Fei
From: Jin Ma
Date: 2024-06-07 15:35
To: gcc-patches; zengxiao
CC: jeffreyalaw; kito.cheng; juzhe.zhong; jinma.contrib; Jin Ma
Subject: [RE] [v2] RISC-V: Add Zfbfmin extension

Hi,
 
Is there a plan to implement zvfbfmin and zvfbfwma? Or how can I get the
relevant patches in advance for testing? By the way, the LLVM support seems
to be fully implemented now :-)
 
Ref:
 
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/293
 
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/auto-generated/bfloat16/intrinsic_funcs.adoc
 
 
 
Thanks,
Jin






[PATCH, OpenACC 2.7, v3] Implement reductions for arrays and structs

2024-06-18 Thread Chung-Lin Tang
On 2024/6/6 9:41 PM, Chung-Lin Tang wrote:
> This is v2 of the C/C++/middle-end parts of array/struct
> support for OpenACC reductions.
> 
> The main changes are much fixed support for sub-arrays,
> and some new testcases.
> 
> Tested on mainline using x86_64 host and nvptx/amdgcn offloading.
> Will backport to upcoming omp/devel/gcc-14 branch after approved for mainline.

This is a quick update to a "v3" version: apart from tiny bug fixes in
testcases, it adds an automatic LDS increase for GCN (triggered by
reductions over arrays of sufficient size).

Andrew, what I now do in gcn_shared_mem_layout is: increase acc_lds_size
in increments of 0x600, while giving a warning that this may decrease
occupancy. Another warning type is given when the LDS usage is more than
the architectural limit of 64KB, but compilation is allowed to proceed.
I think this is the better route, since maybe this limit is not very
"hard" (more allowed in future?)

(FWIW, I was able to at least run such offload regions with more than 64K
LDS usage, though I'm not sure if somewhere later in the compiler/linker
curbs this automatically)
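
For context, a small sketch of the kind of user code an array reduction is
meant for. This is an illustration only, not one of the testcases in the
patch; NBINS, histogram and check_histogram are made-up names, and the input
is assumed to be non-negative. Without -fopenacc the pragma is ignored and
the loop simply runs serially on the host.

```c
#include <assert.h>

#define NBINS 4

/* Count how many inputs fall into each bin, with the whole local
   array named in the reduction clause.  */
static void
histogram (const int *data, int n, int out[NBINS])
{
  int bins[NBINS] = { 0 };

#pragma acc parallel loop reduction(+:bins) copyin(data[0:n])
  for (int i = 0; i < n; i++)
    bins[data[i] % NBINS] += 1;

  for (int b = 0; b < NBINS; b++)
    out[b] = bins[b];
}

/* Host-side sanity check of the expected counts.  */
static void
check_histogram (void)
{
  const int data[] = { 0, 1, 1, 3, 5, 7 };
  int out[NBINS];

  histogram (data, 6, out);
  assert (out[0] == 1);
  assert (out[1] == 3);
  assert (out[2] == 0);
  assert (out[3] == 2);
}
```

Each gang or worker would accumulate into a private copy of the array that is
combined at the end, which is where the extra shared-memory (LDS) usage
mentioned above comes from.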

Thanks,
Chung-Lin

2024-06-18  Chung-Lin Tang  

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_clause_reduction): Adjustments for
OpenACC-specific cases.
* c-typeck.cc (c_oacc_reduction_defined_type_p): New function.
(c_oacc_reduction_code_name): Likewise.
(c_finish_omp_clauses): Handle OpenACC cases using new functions.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_omp_clause_reduction): Adjustments for
OpenACC-specific cases.
* semantics.cc (cp_oacc_reduction_defined_type_p): New function.
(cp_oacc_reduction_code_name): Likewise.
(finish_omp_reduction_clause): Handle OpenACC cases using new
functions.

gcc/ChangeLog:

* config/gcn/gcn.cc (LDS_INCR_UNIT): New macro symbol.
(acc_lds_size): Adjust init value definition.
(gcn_shared_mem_layout): Adjust acc_lds_size when reduction size too
large. Issue warning when reduction size causes LDS usage to increase
or break 64K limit.
* config/gcn/gcn-tree.cc (gcn_reduction_update): Additions for
handling ARRAY_TYPE and RECORD_TYPE reductions.
(gcn_goacc_reduction_setup): Likewise.
(gcn_goacc_reduction_init): Likewise.
(gcn_goacc_reduction_fini): Likewise.
(gcn_goacc_reduction_teardown): Likewise.

* config/nvptx/nvptx.cc (nvptx_gen_shuffle): Properly generate
V2SI shuffle using vec_extract op.
(nvptx_get_shared_red_addr): Adjust type/alignment calculations to
use TYPE_SIZE/ALIGN_UNIT instead of machine mode based.
(nvptx_reduction_update): Additions for handling ARRAY_TYPE and
RECORD_TYPE reductions.
(nvptx_goacc_reduction_setup): Likewise.
(nvptx_goacc_reduction_init): Likewise.
(nvptx_goacc_reduction_fini): Likewise.
(nvptx_goacc_reduction_teardown): Likewise.

* gimplify.cc (gimplify_scan_omp_clauses): Sanity checking for
supported array reduction cases.
(gimplify_adjust_omp_clauses): Peel away array MEM_REF for decl lookup.

* omp-low.cc (scan_sharing_clauses): Adjust ARRAY_REF pointer type
building to use decl type, rather than generic ptr_type_node.
(omp_reduction_init_op): Add ARRAY_TYPE and RECORD_TYPE init op
construction.
(lower_rec_input_clauses): Set OMP_CLAUSE_REDUCTION_PRIVATE_EXPR.
(oacc_array_reduction_bias): New function.
(lower_oacc_reductions): Add code to teardown/recover array access
MEM_REF in OMP_CLAUSE_DECL, to accommodate lookup requirements.
Use OMP_CLAUSE_REDUCTION_PRIVATE_EXPR as reduction private copy if set.
Handle array reductions using new oacc_array_reduction_bias function.
Adjust type/alignment calculations to use TYPE_SIZE/ALIGN_UNIT
instead of machine mode based.

* omp-oacc-neuter-broadcast.cc (worker_single_copy):
Add 'hash_set *array_reduction_base_vars' parameter.
Add xxx.

(neuter_worker_single): Add 'hash_set *array_reduction_base_vars'
parameter. Adjust recursive calls to self and worker_single_copy.
(oacc_do_neutering): Add 'hash_set *array_reduction_base_vars'
parameter. Adjust call to neuter_worker_single.
(execute_omp_oacc_neuter_broadcast): Add local
'hash_set array_reduction_base_vars' declaration. Collect MEM_REF
base-pointer SSA_NAMEs of arrays into array_reduction_base_vars. Add
'_reduction_base_vars' argument to call of oacc_do_neutering.

* omp-offload.cc (default_goacc_reduction): Add unshare_expr.

* tree.cc (omp_clause_num_ops): Increase OMP_CLAUSE_REDUCTION ops to 6.
* tree.h (OMP_CLAUSE_REDUCTION_PRIVATE_EXPR): New macro.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/reduction-9.c: New test.
* 

Re: [PATCH] [alpha] adjust MEM alignment for block move [PR115459] (was: Re: [PATCH v2] [PR100106] Reject unaligned subregs when strict alignment is required)

2024-06-18 Thread Maciej W. Rozycki
On Thu, 13 Jun 2024, Maciej W. Rozycki wrote:

> > Maciej, would you be so kind as to give it a spin with a native
> > regstrap?  TIA,
> 
>  I will certainly run regression-testing once the job I started yesterday 
> has finished with my Alpha system, which should be fairly soon as it's 
> already well into libstdc++ testing.

 This has now completed successfully, with no regressions observed across 
all the GCC and library testsuites other than the gnat one.

 The gnat one obviously couldn't be run without the fix in place, not even 
in a build without libada, because it depends on gnattools, which in turn 
need libada.  Results with the change applied appear reasonable however, 
though they are not completely clean.  I triggered issues while running 
this part of the testsuite and I have now posted a proposed fix.

 Also one of the libstdc++ test cases caused the target machine to lock up 
regardless of the fix, due to memory exhaustion, which took me some time to 
investigate and sort out (now dealt with using `ulimit -d'), and a testsuite 
run takes almost exactly 24 hours, hence the total amount of time it took me 
to complete this verification.

>  I cannot make a native bootstrap however as I have only just set up my 
> Alpha to run at all and it has a very rudimentary and outdated userland, 
> suitable for remote regression testing only.  It's NFS-rooted too and due 
> to a failure in my lab last month I may not be able to recover from before 
> August it runs over backup 10Mbps Ethernet rather than intended 100Mbps 
> FDDI, so I can imagine performance would be abysmal even if I brought the 
> userland up to date.
> 
>  However Adrian (cc-ed) has recently told me he could be running all kinds 
> of stuff with his Alpha.  Adrian, would you be able to verify Alexandre's 
> proposed fix in a native regstrap?

 Adrian and I have exchanged a couple of messages off-list and I'm not 
sure after all whether he'll be able to run such a regstrap.  Given that 
this fix has addressed a problem affecting building a part of the compiler 
itself and the overall status of the Alpha port I do hope it can go ahead 
based on cross-compilation verification only.  Several of our targets are 
routinely verified in such a manner only.

 FAOD I run my builds with `--enable-werror-always' requested to have 
`-Werror' included in the compilation options just as a native regstrap 
has, so this part of verification has been covered as well.

  Maciej


[PATCH 1/1] ada: Make the names of uninstalled cross-gnattools consistent across builds

2024-06-18 Thread Maciej W. Rozycki
We suffer from an inconsistency in the names of uninstalled gnattools 
executables in cross-compiler configurations.  The cause is a recipe we 
have:

ada.all.cross:
for tool in $(ADA_TOOLS) ; do \
  if [ -f $$tool$(exeext) ] ; \
  then \
$(MV) $$tool$(exeext) $$tool-cross$(exeext); \
  fi; \
done

the intent of which is to give the names of gnattools executables the 
'-cross' suffix, consistently with the compiler drivers: 'gcc-cross', 
'g++-cross', etc.

A problem with the recipe is that this 'make' target is called too early 
in the build process, before gnattools have been made.  Consequently no 
renames happen and, because they are conditional on the presence of the 
individual executables, the recipe succeeds while doing nothing.

However if a target is requested later on such as 'make pdf' that does 
not cause gnattools executables to be rebuilt, then 'ada.all.cross' does 
succeed in renaming the executables already present in the build tree.  
Then if the 'gnat' testsuite is run later on, which expects a non-suffixed 
'gnatmake' executable, it does not find the 'gnatmake-cross' executable 
in the build tree and may either catastrophically fail or incorrectly 
use a system-installed copy of 'gnatmake'.

Of course if a target is requested such as `make all' that does cause 
gnattools executables to be rebuilt, then both suffixed and non-suffixed 
uninstalled executables result.

Fix the problem by moving the renaming of gnattools to a separate 'make' 
recipe, pasted into a new 'gnattools-cross-mv' target and the existing 
legacy 'cross-gnattools' target.  Then invoke the new target explicitly 
from the 'gnattools-cross' recipe in gnattools/.

Update the test harness accordingly, so that suffixed gnattools are used 
in cross-compilation testsuite runs.

gcc/
* ada/gcc-interface/Make-lang.in (ada.all.cross): Move recipe 
to...
(GNATTOOLS_CROSS_MV): ... this new variable.
(cross-gnattools): Paste it here.
(gnattools-cross-mv): New target.

gnattools/
* Makefile.in (gnattools-cross): Also build 'gnattools-cross-mv' 
in GCC_DIR.

gcc/testsuite/
* lib/gnat.exp (local_find_gnatmake, find_gnatclean): Use 
'-cross' suffix where testing a cross-compiler.
---
 gcc/ada/gcc-interface/Make-lang.in |   19 ---
 gcc/testsuite/lib/gnat.exp |   22 ++
 gnattools/Makefile.in  |1 +
 3 files changed, 31 insertions(+), 11 deletions(-)

gcc-ada-all-cross-gnattools.diff
Index: gcc/gcc/ada/gcc-interface/Make-lang.in
===
--- gcc.orig/gcc/ada/gcc-interface/Make-lang.in
+++ gcc/gcc/ada/gcc-interface/Make-lang.in
@@ -780,6 +780,7 @@ gnattools: $(GCC_PARTS) $(CONFIG_H) pref
 cross-gnattools: force
$(MAKE) -C ada $(ADA_TOOLS_FLAGS_TO_PASS) gnattools1-re
$(MAKE) -C ada $(ADA_TOOLS_FLAGS_TO_PASS) gnattools2
+   $(GNATTOOLS_CROSS_MV)
 
 canadian-gnattools: force
$(MAKE) -C ada $(ADA_TOOLS_FLAGS_TO_PASS) gnattools1-re
@@ -795,19 +796,23 @@ gnatlib gnatlib-sjlj gnatlib-zcx gnatlib
   FORCE_DEBUG_ADAFLAGS="$(FORCE_DEBUG_ADAFLAGS)" \
   $@
 
+gnattools-cross-mv:
+   $(GNATTOOLS_CROSS_MV)
+
+GNATTOOLS_CROSS_MV=\
+  for tool in $(ADA_TOOLS) ; do \
+if [ -f $$tool$(exeext) ] ; \
+then \
+  $(MV) $$tool$(exeext) $$tool-cross$(exeext); \
+fi; \
+  done
+
 # use only for native compiler
 gnatlib_and_tools: gnatlib gnattools
 
 # Build hooks:
 
 ada.all.cross:
-   for tool in $(ADA_TOOLS) ; do \
- if [ -f $$tool$(exeext) ] ; \
- then \
-   $(MV) $$tool$(exeext) $$tool-cross$(exeext); \
- fi; \
-   done
-
 ada.start.encap:
 ada.rest.encap:
 ada.man:
Index: gcc/gcc/testsuite/lib/gnat.exp
===
--- gcc.orig/gcc/testsuite/lib/gnat.exp
+++ gcc/gcc/testsuite/lib/gnat.exp
@@ -199,12 +199,19 @@ proc prune_gnat_output { text } {
 # which prevent multilib from working, so define a new one.
 
 proc local_find_gnatmake {} {
+global target_triplet
 global tool_root_dir
+global host_triplet
 
 if ![is_remote host] {
-set file [lookfor_file $tool_root_dir gnatmake]
+   if { "$host_triplet" == "$target_triplet" } {
+   set gnatmake gnatmake
+   } else {
+   set gnatmake gnatmake-cross
+   }
+   set file [lookfor_file $tool_root_dir $gnatmake]
 if { $file == "" } {
-   set file [lookfor_file $tool_root_dir gcc/gnatmake]
+   set file [lookfor_file $tool_root_dir gcc/$gnatmake]
 }
 if { $file != "" } {
set root [file dirname $file]
@@ -225,12 +232,19 @@ proc local_find_gnatmake {} {
 }
 
 proc find_gnatclean {} {
+global target_triplet
 global tool_root_dir
+global host_triplet
 
 if ![is_remote host] {
-set 

[PATCH 0/1] ada: Make the names of uninstalled cross-gnattools consistent across builds

2024-06-18 Thread Maciej W. Rozycki
Hi,

 Having rebuilt GCC with no changes relevant to Ada I saw all the gnat 
tests fail all of a sudden.  Upon a closer inspection I have noticed that 
in the earlier build where tests passed `gnatmake' was invoked (in an 
`alpha-linux-gnu' cross-compiler build) as:

/path/to/alpha-linux/obj/gcc/gcc/gnatmake 
--GCC=/path/to/alpha-linux/obj/gcc/gcc/xgcc 
--GNATBIND=/path/to/alpha-linux/obj/gcc/gcc/gnatbind 
--GNATLINK=/path/to/alpha-linux/obj/gcc/gcc/gnatlink -cargs 
-B/path/to/alpha-linux/obj/gcc/gcc -largs 
--GCC=/path/to/alpha-linux/obj/gcc/gcc/xgcc -B/path/to/alpha-linux/obj/gcc/gcc  
-margs --RTS=/path/to/alpha-linux/obj/gcc/alpha-linux-gnu/./libada [...]

while in the later one it was instead invoked as:

alpha-linux-gnu-gnatmake 
--RTS=/path/to/alpha-linux/obj/gcc/alpha-linux-gnu/./libada [...]

so rather than the uninstalled program a previously installed one from the 
install root in /path/to/alpha-linux/install/usr/bin was run with all the 
options pointing to the build tree missing, causing failures such as:

FAIL: gnat.dg/abstract1.adb (test for excess errors)
Excess errors:
alpha-linux-gnu-gcc: fatal error: cannot execute 'gnat1': posix_spawnp: No such 
file or directory
compilation terminated.

(why `alpha-linux-gnu-gcc' couldn't find `gnat1' via a relative path to 
its libexec dir while it finds `cc1', etc. just fine is another matter).

 I have tracked down the cause to a small difference in the two build 
scripts that I use: one runs `make pdf' after the main build and the other 
one does not.  The latter one makes the gnat testsuite work, while the 
former one causes it to fail.

 Now one might ask themselves: "Why would building PDF documentation
matter for the gnat testsuite?"  Indeed, a good question, with a 
surprising answer.  It boils down to how uninstalled cross-gnattools are 
each named.  There is a hook in gcc/ada/gcc-interface/Make-lang.in, which 
has this:

ada.all.cross:
for tool in $(ADA_TOOLS) ; do \
  if [ -f $$tool$(exeext) ] ; \
  then \
$(MV) $$tool$(exeext) $$tool-cross$(exeext); \
  fi; \
done

Clearly cross-gnattools are meant to be called `gnatmake-cross', etc. in 
their uninstalled forms and this hook is supposed to take care of it.  
However in a normal build of the compiler the hook is invoked too early, 
before gnattools have been compiled and given that it's all conditional it 
silently does nothing.  Now upon a reinvocation of `make' to build PDF 
documentation gnattools will have already been built and are there in 
place, so the hook faithfully renames the executables as told:

[...]
make[4]: Nothing to be done for 'pdf'.
Making pdf in mpcheck
make[4]: Nothing to be done for 'pdf'.
make[4]: Nothing to be done for 'pdf-am'.
make[3]: Nothing to be done for 'pdf-am'.
make[2]: Entering directory '/path/to/alpha-linux/obj/gcc/gcc'
for tool in gnatbind gnatchop gnat gnatkr gnatlink gnatls gnatmake gnatname 
gnatprep gnatclean ; do \
  if [ -f $tool ] ; \
  then \
mv $tool $tool-cross; \
  fi; \
done
make[2]: Leaving directory '/path/to/alpha-linux/obj/gcc/gcc'
make[1]: Entering directory '/path/to/alpha-linux/obj/gcc'
[...]

Oh well!

 And then `make install' does not care, because it has:

gnat-install-tools:
$(MKDIR) $(DESTDIR)$(bindir)
-if [ -f gnat1$(exeext) ] ; \
then \
  for tool in $(ADA_TOOLS) ; do \
install_name=`echo $$tool|sed 
'$(program_transform_name)'`$(exeext); \
$(RM) $(DESTDIR)$(bindir)/$$install_name; \
if [ -f $$tool-cross$(exeext) ] ; \
then \
  $(INSTALL_PROGRAM) $$tool-cross$(exeext) 
$(DESTDIR)$(bindir)/$$install_name; \
else \
  $(INSTALL_PROGRAM) $$tool$(exeext) 
$(DESTDIR)$(bindir)/$$install_name; \
fi ; \
  done; \
  $(RM) $(DESTDIR)$(bindir)/gnatdll$(exeext); \
  $(INSTALL_PROGRAM) gnatdll$(exeext) 
$(DESTDIR)$(bindir)/gnatdll$(exeext); \
fi

so it works regardless of whether the names have a `-cross' suffix or not.

 Then the test harness which looks for `gnatmake' in the build tree 
rather than `gnatmake-cross' fails to find the executable and resorts to 
generic `alpha-linux-gnu-gnatmake', having skipped all the logic to figure 
out the options pointing to the build tree.  Looking for `gnatmake' rather 
than `gnatmake-cross' in the cross-compilation case is the second bug 
here.

 This seems like a long-standing issue as there haven't been changes in 
these areas since forever and I guess nobody noticed this because they 
didn't invoke `make pdf' or a similar target that would cause the 
`ada.all.cross' `make' target to trigger again after the main build.

 This patch addresses the issue by moving the renaming of cross-gnattools 
to separate `make' targets/recipes, explicitly invoked after gnattools has 
been built in a cross-compiler configuration, while adjusting the test 
harness accordingly.  I have verified with 

Re: [PATCH-1v4] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]

2024-06-18 Thread HAO CHEN GUI
Hi Richard,

On 2024/6/17 17:04, Richard Sandiford wrote:
> I don't think we should keep the single_set condition after this change.
> insn_cost can handle all instructions.

I just tested removing the single_set condition. It causes some regressions.
If new_rtl is a debug insn, the replacement can currently still be done, as
the insn can be recognized even though it is not a single_set rtl. After
removing the single_set condition, "ok" (the recog result) is reset to false
in the "if" block below, as the cost of a debug insn is unknown. So the
replacement won't be done.

  bool ok = recog (attempt, use_change);
  if (ok && !prop.changed_mem_p () && !use_insn->is_asm ())
{
  bool strict_p = !prop.likely_profitable_p ();
  if (!change_is_worthwhile (use_change, strict_p))
{
  if (dump_file)
fprintf (dump_file, "change not profitable");
  ok = false;
}
}

  if (!ok)
{
  /* The pattern didn't match, but if all uses of SRC folded to
 constants, we can add a REG_EQUAL note for the result, if there
 isn't one already.  */
  if (!prop.folded_to_constants_p ())

  Looking forward to your advice.

Thanks
Gui Haochen


[PATCH] MIPS: Set condmove cost to SET(REG, REG)

2024-06-18 Thread YunQiang Su
On most uarches, the cost of a condmove is the same as that of other normal
integer instructions, so it should be COSTS_N_INSNS(1).

In GCC 12 and earlier, condmove is always enabled; from
GCC 13 on, we compare the costs.

The generic rtx_cost gives a result of COSTS_N_INSNS(2).
Let's define it as COSTS_N_INSNS(1) in mips_rtx_costs.

gcc
* config/mips/mips.cc(mips_rtx_costs): Set condmove cost.
* config/mips/mips.md(mov_on_,
mov_on__mips16e2,
mov_on__ne
mov_on__ne_mips16e2): Define name by
removing the leading *, so that we can use CODE_FOR_.

gcc/testsuite
* gcc.target/mips/movcc-2.c: Add k?100:1000 test.
---
 gcc/config/mips/mips.cc | 24 
 gcc/config/mips/mips.md |  8 
 gcc/testsuite/gcc.target/mips/movcc-2.c | 14 ++
 3 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index b7acf041903..48924116937 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -4692,6 +4692,30 @@ mips_rtx_costs (rtx x, machine_mode mode, int outer_code,
  *total = mips_set_reg_reg_cost (GET_MODE (SET_DEST (x)));
  return true;
}
+  int insn_code;
+  if (register_operand (SET_DEST (x), VOIDmode)
+ && GET_CODE (SET_SRC (x)) == IF_THEN_ELSE)
+   insn_code = recog_memoized (make_insn_raw (x));
+  else
+   insn_code = -1;
+  switch (insn_code)
+   {
+   /* MIPS16e2 ones may be listed here, while the only known CPU core
+  that implements MIPS16e2 is interAptiv.  The Dependency delays
+  of MOVN/MOVZ on interAptiv is 3.  */
+   case CODE_FOR_movsi_on_si:
+   case CODE_FOR_movdi_on_si:
+   case CODE_FOR_movsi_on_di:
+   case CODE_FOR_movdi_on_di:
+   case CODE_FOR_movsi_on_si_ne:
+   case CODE_FOR_movdi_on_si_ne:
+   case CODE_FOR_movsi_on_di_ne:
+   case CODE_FOR_movdi_on_di_ne:
+ *total = mips_set_reg_reg_cost (GET_MODE (SET_DEST (x)));
+ return true;
+   default:
+ break;
+   }
   return false;
 
 default:
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 508fb1afa6c..9962313602a 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -7492,7 +7492,7 @@ (define_insn "insn_pseudo"
 
 ;; MIPS4 Conditional move instructions.
 
-(define_insn "*mov_on_"
+(define_insn "mov_on_"
   [(set (match_operand:GPR 0 "register_operand" "=d,d")
(if_then_else:GPR
 (match_operator 4 "equality_operator"
@@ -7507,7 +7507,7 @@ (define_insn "*mov_on_"
   [(set_attr "type" "condmove")
(set_attr "mode" "")])
 
-(define_insn "*mov_on__mips16e2"
+(define_insn "mov_on__mips16e2"
   [(set (match_operand:GPR 0 "register_operand" "=d,d,d,d")
(if_then_else:GPR
 (match_operator 4 "equality_operator"
@@ -7525,7 +7525,7 @@ (define_insn "*mov_on__mips16e2"
(set_attr "mode" "")
(set_attr "extended_mips16" "yes")])
 
-(define_insn "*mov_on__ne"
+(define_insn "mov_on__ne"
   [(set (match_operand:GPR 0 "register_operand" "=d,d")
(if_then_else:GPR
 (match_operand:GPR2 1 "register_operand" ",")
@@ -7538,7 +7538,7 @@ (define_insn "*mov_on__ne"
   [(set_attr "type" "condmove")
(set_attr "mode" "")])
 
-(define_insn "*mov_on__ne_mips16e2"
+(define_insn "mov_on__ne_mips16e2"
   [(set (match_operand:GPR 0 "register_operand" "=d,d,d,d")
   (if_then_else:GPR
(match_operand:GPR2 1 "register_operand" 
",,t,t")
diff --git a/gcc/testsuite/gcc.target/mips/movcc-2.c 
b/gcc/testsuite/gcc.target/mips/movcc-2.c
index 1926e6460d1..cbda3c8febc 100644
--- a/gcc/testsuite/gcc.target/mips/movcc-2.c
+++ b/gcc/testsuite/gcc.target/mips/movcc-2.c
@@ -3,6 +3,8 @@
 /* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
 /* { dg-final { scan-assembler "\tmovz\t" } } */
 /* { dg-final { scan-assembler "\tmovn\t" } } */
+/* { dg-final { scan-assembler "\tmovz\t" } } */
+/* { dg-final { scan-assembler "\tmovn\t" } } */
 
 void ext_long (long);
 
@@ -17,3 +19,15 @@ sub5 (long i, long j, int k)
 {
   ext_long (!k ? i : j);
 }
+
+NOMIPS16 long
+sub6 (int k)
+{
+  return !k ? 100 : 1000;
+}
+
+NOMIPS16 long
+sub7 (int k)
+{
+  return !k ? 100 : 1000;
+}
-- 
2.39.3 (Apple Git-146)



Re: [wwwdocs,pushed] backends.html - Update weblinks to AVR simulators

2024-06-18 Thread Georg-Johann Lay




On 18.06.24 at 00:06, Gerald Pfeifer wrote:

On Sat, 15 Jun 2024, Georg-Johann Lay wrote:

Applied this one:


Cool.


+SimulAVR at https://www.nongnu.org/simulavr;


This one gives a http response of "301 Moved Permanently" redirecting to
https://www.nongnu.org/simulavr/ . I'll fix this in a minute.

On a related note, though, can we update the references to the simulators
from (exemplary)

+avrtest at
+  https://github.com/sprintersb/atest;
+>https://github.com/sprintersb/atest

to

+https://github.com/sprintersb/atest;>avrtest


Thanks,
Gerald


Of course, yes.  I was unsure about the used link style; the old page
had link text = href.  Using the project name feels much more natural.

Johann



Re: [PATCH] build: Fix missing variable quotes

2024-06-18 Thread Sam James
YunQiang Su  writes:

> OK for trunk?

It looks good to me, but I can't approve. (I'd dare say it's obvious,
even.)

Richard, any chance you could give it a quick ack?


[PATCH v1 2/2] RISC-V: Add testcases for unsigned .SAT_SUB scalar form 12

2024-06-18 Thread pan2 . li
From: Pan Li 

After the middle-end supports form 12 of unsigned SAT_SUB and
the RISC-V backend implements SAT_SUB for vector mode, add
more test cases to cover form 12.

Form 12:
  #define DEF_SAT_U_SUB_FMT_12(T)\
  T __attribute__((noinline))\
  sat_u_sub_##T##_fmt_12 (T x, T y)  \
  {  \
T ret;   \
bool overflow = __builtin_sub_overflow (x, y, &ret); \
return !overflow ? ret : 0;  \
  }

Passed the rv64gcv regression tests.
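
For readers who want to try form 12 outside the testsuite, here is a
minimal standalone sketch. It assumes GCC or Clang (for
__builtin_sub_overflow) and assumes the third argument is the address of
ret; the noinline attribute is dropped because it is not needed for a
functional check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Form 12 of unsigned saturating subtraction: return x - y, or 0 when
   the subtraction would wrap below zero.  The result pointer (&ret) is
   an assumption of this sketch.  */
#define DEF_SAT_U_SUB_FMT_12(T)                         \
T sat_u_sub_##T##_fmt_12 (T x, T y)                     \
{                                                       \
  T ret;                                                \
  bool overflow = __builtin_sub_overflow (x, y, &ret);  \
  return !overflow ? ret : 0;                           \
}

/* Instantiate the helper for two unsigned element types.  */
DEF_SAT_U_SUB_FMT_12 (uint8_t)
DEF_SAT_U_SUB_FMT_12 (uint32_t)
```

With these definitions, sat_u_sub_uint8_t_fmt_12 (5, 3) yields 2, while
sat_u_sub_uint8_t_fmt_12 (3, 5) saturates to 0 instead of wrapping to 254.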

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper macro for
testing.
* gcc.target/riscv/sat_u_sub-45.c: New test.
* gcc.target/riscv/sat_u_sub-46.c: New test.
* gcc.target/riscv/sat_u_sub-47.c: New test.
* gcc.target/riscv/sat_u_sub-48.c: New test.
* gcc.target/riscv/sat_u_sub-run-45.c: New test.
* gcc.target/riscv/sat_u_sub-run-46.c: New test.
* gcc.target/riscv/sat_u_sub-run-47.c: New test.
* gcc.target/riscv/sat_u_sub-run-48.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 10 
 gcc/testsuite/gcc.target/riscv/sat_u_sub-45.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-46.c | 19 ++
 gcc/testsuite/gcc.target/riscv/sat_u_sub-47.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-48.c | 17 +
 .../gcc.target/riscv/sat_u_sub-run-45.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-46.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-47.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-48.c   | 25 +++
 9 files changed, 182 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-45.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-46.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-47.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-48.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-45.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-46.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-47.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-48.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index ab7289a6947..0c2e44af718 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -150,6 +150,15 @@ sat_u_sub_##T##_fmt_11 (T x, T y)  \
   return overflow ? 0 : ret;   \
 }
 
+#define DEF_SAT_U_SUB_FMT_12(T)\
+T __attribute__((noinline))\
+sat_u_sub_##T##_fmt_12 (T x, T y)  \
+{  \
+  T ret;   \
+  bool overflow = __builtin_sub_overflow (x, y, &ret); \
+  return !overflow ? ret : 0;  \
+}
+
 #define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
 #define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
 #define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
@@ -161,5 +170,6 @@ sat_u_sub_##T##_fmt_11 (T x, T y)  \
 #define RUN_SAT_U_SUB_FMT_9(T, x, y) sat_u_sub_##T##_fmt_9(x, y)
 #define RUN_SAT_U_SUB_FMT_10(T, x, y) sat_u_sub_##T##_fmt_10(x, y)
 #define RUN_SAT_U_SUB_FMT_11(T, x, y) sat_u_sub_##T##_fmt_11(x, y)
+#define RUN_SAT_U_SUB_FMT_12(T, x, y) sat_u_sub_##T##_fmt_12(x, y)
 
 #endif
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-45.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_sub-45.c
new file mode 100644
index 000..1aad8961e29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-45.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint8_t_fmt_12:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*a0,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_SUB_FMT_12(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-46.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_sub-46.c
new file mode 100644
index 000..d184043f6f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-46.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } 

Re: [wwwdocs] [PATCH 2/4] codingconventions: Fix various typos

2024-06-18 Thread Gerald Pfeifer
On Tue, 22 Mar 2022, Pokechu22 via Gcc-patches wrote:
>  htdocs/codingconventions.html | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Thank you for this patch, and apologies it fell through the cracks.

Applying your patch I found that I independently had discovered and fixed 
the first two items half a year later via

   commit 69102e7bf8c4afdc6380e0e6547c84cc5649eae5
   Author: Gerald Pfeifer 
   Date:   Wed Oct 19 16:11:10 2022 +0200

codingconventions: Fix two typos

The third hunk, though, was still there, and I fixed this now:

> -On the rare occasion that using mulitple inheritance is indeed useful,
> +On the rare occasion that using multiple inheritance is indeed useful,

Thank you for pointing these out.

Gerald


[PATCH v1 1/2] RISC-V: Add testcases for unsigned .SAT_SUB scalar form 11

2024-06-18 Thread pan2 . li
From: Pan Li 

After the middle-end supports form 11 of unsigned SAT_SUB and
the RISC-V backend implements SAT_SUB for vector mode, add
more test cases to cover form 11.

Form 11:
  #define DEF_SAT_U_SUB_FMT_11(T)\
  T __attribute__((noinline))\
  sat_u_sub_##T##_fmt_11 (T x, T y)  \
  {  \
T ret;   \
bool overflow = __builtin_sub_overflow (x, y, &ret); \
return overflow ? 0 : ret;   \
  }

Passed the rv64gcv regression tests.
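
The instruction sequence that the new sat_u_sub-41.c test expects (sub,
sltu, addi -1, and, andi 0xff) corresponds to a branchless formulation of
the same operation. The following C sketch spells that out step by step;
the function name is hypothetical and the mapping of statements to
instructions is an informal reading of the expected assembly, not compiler
output.

```c
#include <stdint.h>

/* Branchless unsigned saturating subtract for 8-bit operands, written to
   mirror the expected RV64 sequence one statement per instruction.  */
uint8_t
sat_u_sub_u8_branchless (uint8_t x, uint8_t y)
{
  uint64_t diff = (uint64_t) x - y;  /* sub: wraps around when x < y  */
  uint64_t lt = x < y;               /* sltu: 1 when the result underflows  */
  uint64_t mask = lt - 1;            /* addi -1: 0 on underflow, else all ones  */
  return (uint8_t) (diff & mask);    /* and, then andi 0xff to truncate  */
}
```

The mask is all ones when no underflow occurs, so the difference passes
through unchanged, and zero otherwise, which yields the saturated result.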

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper
macro for testing.
* gcc.target/riscv/sat_u_sub-41.c: New test.
* gcc.target/riscv/sat_u_sub-42.c: New test.
* gcc.target/riscv/sat_u_sub-43.c: New test.
* gcc.target/riscv/sat_u_sub-44.c: New test.
* gcc.target/riscv/sat_u_sub-run-41.c: New test.
* gcc.target/riscv/sat_u_sub-run-42.c: New test.
* gcc.target/riscv/sat_u_sub-run-43.c: New test.
* gcc.target/riscv/sat_u_sub-run-44.c: New test.

Signed-off-by: Pan Li 
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h| 11 
 gcc/testsuite/gcc.target/riscv/sat_u_sub-41.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-42.c | 19 ++
 gcc/testsuite/gcc.target/riscv/sat_u_sub-43.c | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_sub-44.c | 17 +
 .../gcc.target/riscv/sat_u_sub-run-41.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-42.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-43.c   | 25 +++
 .../gcc.target/riscv/sat_u_sub-run-44.c   | 25 +++
 9 files changed, 183 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-41.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-42.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-43.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-44.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-41.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-42.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-43.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub-run-44.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 0f94c5ff087..ab7289a6947 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -2,6 +2,7 @@
 #define HAVE_SAT_ARITH
 
 #include 
+#include 
 
 
/**/
 /* Saturation Add (unsigned and signed)   
*/
@@ -140,6 +141,15 @@ sat_u_sub_##T##_fmt_10 (T x, T y)   \
   return !overflow ? ret : 0;   \
 }
 
+#define DEF_SAT_U_SUB_FMT_11(T)\
+T __attribute__((noinline))\
+sat_u_sub_##T##_fmt_11 (T x, T y)  \
+{  \
+  T ret;   \
+  bool overflow = __builtin_sub_overflow (x, y, &ret); \
+  return overflow ? 0 : ret;   \
+}
+
 #define RUN_SAT_U_SUB_FMT_1(T, x, y) sat_u_sub_##T##_fmt_1(x, y)
 #define RUN_SAT_U_SUB_FMT_2(T, x, y) sat_u_sub_##T##_fmt_2(x, y)
 #define RUN_SAT_U_SUB_FMT_3(T, x, y) sat_u_sub_##T##_fmt_3(x, y)
@@ -150,5 +160,6 @@ sat_u_sub_##T##_fmt_10 (T x, T y)   \
 #define RUN_SAT_U_SUB_FMT_8(T, x, y) sat_u_sub_##T##_fmt_8(x, y)
 #define RUN_SAT_U_SUB_FMT_9(T, x, y) sat_u_sub_##T##_fmt_9(x, y)
 #define RUN_SAT_U_SUB_FMT_10(T, x, y) sat_u_sub_##T##_fmt_10(x, y)
+#define RUN_SAT_U_SUB_FMT_11(T, x, y) sat_u_sub_##T##_fmt_11(x, y)
 
 #endif
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-41.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_sub-41.c
new file mode 100644
index 000..dd13f94e40f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_sub-41.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_sub_uint8_t_fmt_11:
+** sub\s+[atx][0-9]+,\s*a0,\s*a1
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** and\s+a0,\s*a0,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_SUB_FMT_11(uint8_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_SUB " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_sub-42.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_sub-42.c
new file mode 100644
index 000..3ed4195b18b
--- /dev/null
+++ 

Re: [PATCH] build: Fix missing variable quotes

2024-06-18 Thread YunQiang Su
OK for trunk?


-- 
YunQiang Su


[RFC] MIPS: Use SLL+BGEZ for one bit test on pre-R2

2024-06-18 Thread YunQiang Su
PR target/111376.
Currently, we use LUI/ANDI/BEQZ for a one-bit test if bitpos>=16,
while in fact we can use SLL/BGEZ.

Note:
1) if bitpos<16, we can use ANDI/BEQZ.
2) For R2+, we have EXT.

Known problems:
  1. On some uarches, such as the 74K, SLL has a longer delay:
     see the discussion in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111376.
  2. We haven't tested it on any real pre-R2 hardware for performance,
     so I would like to ask for some testing here.
---
 gcc/config/mips/mips.md   | 33 +++
 .../gcc.target/mips/mips3-one-bit-test.c  | 55 +++
 2 files changed, 88 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/mips/mips3-one-bit-test.c

diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 806fd29cf97..508fb1afa6c 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -6256,6 +6256,39 @@ (define_insn "*branch_bit_inverted"
 }
   [(set_attr "type" "branch")
(set_attr "branch_likely" "no")])
+
+(define_insn_and_split "*branch_on_bit"
+  [(set (pc)
+   (if_then_else
+   (match_operator 0 "equality_operator"
+   [(zero_extract:GPR (match_operand:GPR 2 "register_operand" "d")
+(const_int 1)
+(match_operand:GPR 3 "const_int_operand"))
+(const_int 0)])
+   (label_ref (match_operand 1))
+   (pc)))]
+  "!ISA_HAS_BBIT && !ISA_HAS_EXT_INS && !TARGET_MIPS16 && UINTVAL (operands[3]) >= 16"
+  "#"
+  "&& !reload_completed"
+  [(set (match_dup 4)
+   (ashift:GPR (match_dup 2) (match_dup 3)))
+   (set (pc)
+   (if_then_else
+   (match_op_dup 0 [(match_dup 4) (const_int 0)])
+   (label_ref (match_operand 1))
+   (pc)))]
+{
+  int shift = GET_MODE_BITSIZE (<MODE>mode) - 1 - INTVAL (operands[3]);
+  operands[3] = GEN_INT (shift);
+  operands[4] = gen_reg_rtx (<MODE>mode);
+
+  if (GET_CODE (operands[0]) == EQ)
+operands[0] = gen_rtx_GE (<MODE>mode, operands[4], const0_rtx);
+  else
+operands[0] = gen_rtx_LT (<MODE>mode, operands[4], const0_rtx);
+}
+[(set_attr "type" "branch")])
+
 
 ;;
 ;;  
diff --git a/gcc/testsuite/gcc.target/mips/mips3-one-bit-test.c b/gcc/testsuite/gcc.target/mips/mips3-one-bit-test.c
new file mode 100644
index 000..50672e71d73
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/mips3-one-bit-test.c
@@ -0,0 +1,55 @@
+/* { dg-options "-mips3 -mgp64" } */
+/* FIXME: -Os fails due to rtx_cost: PR115473.  */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" "-Os" } { "" } } */
+/* { dg-final { scan-assembler "f32_15:.*andi\t\\\$4,\\\$4,0x8000" } } */
+/* { dg-final { scan-assembler "f32:.*sll\t\\\$4,\\\$4,15" } } */
+/* { dg-final { scan-assembler "f64_15:.*andi\t\\\$4,\\\$4,0x8000" } } */
+/* { dg-final { scan-assembler "f64:.*dsll\t\\\$4,\\\$4,47" } } */
+
+/* Test to make sure we can use sll+bgez to test one bit.
+   See PR111376.  */
+
+int f1();
+int f2();
+
+/* If the bit position is < 16, we can use andi+beqz.  */
+NOMIPS16 int
+f32_15(int a)
+{
+  int p = (a & (1<<15));
+  if (p)
+return f1();
+  else
+return f2();
+}
+
+/* If the bit position is >= 16, we can use sll+bgez.  */
+NOMIPS16 int
+f32(int a)
+{
+  int p = (a & (1<<16));
+  if (p)
+return f1();
+  else
+return f2();
+}
+
+NOMIPS16 int
+f64_15(long long a)
+{
+  long long p = (a & (1LL<<15));
+  if (p)
+return f1();
+  else
+return f2();
+}
+
+NOMIPS16 int
+f64(long long a)
+{
+  long long p = (a & (1LL<<16));
+  if (p)
+return f1();
+  else
+return f2();
+}
-- 
2.39.3 (Apple Git-146)



[PATCH v2 2/2] [APX CFCMOV] Support APX CFCMOV in backend

2024-06-18 Thread Kong, Lingling
gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_can_cfcmov_p): New function that
tests whether the cfcmov can be generated.
(ix86_expand_int_movcc): Expand to cfcmov pattern if ix86_can_cfcmov_p
returns true.
* config/i386/i386-opts.h (enum apx_features): Add apx_cfcmov.
* config/i386/i386.cc (ix86_have_conditional_move_mem_notrap): New
function to hook TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP
(TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP): Target hook define.
(ix86_rtx_costs): Add UNSPEC_APX_CFCMOV cost;
* config/i386/i386.h (TARGET_APX_CFCMOV): Define.
* config/i386/i386.md (cfmovcc): New define_insn to support
cfcmov.
(cfmovcc_2): Ditto.
(UNSPEC_APX_CFCMOV): New unspec for cfcmov.
* config/i386/i386.opt: Add enum value for cfcmov.
* config/i386/predicates.md (register_or_cfc_mem_operand): New
define_predicate.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-cfcmov-1.c: New test.
* gcc.target/i386/apx-cfcmov-2.c: Ditto.
---
 gcc/config/i386/i386-expand.cc   | 63 +
 gcc/config/i386/i386-opts.h  |  4 +-
 gcc/config/i386/i386.cc  | 16 +++--
 gcc/config/i386/i386.h   |  1 +
 gcc/config/i386/i386.md  | 53 --
 gcc/config/i386/i386.opt |  3 +
 gcc/config/i386/predicates.md|  7 ++
 gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c | 73 
 gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c | 40 +++
 9 files changed, 248 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 312329e550b..c02a4bcbec3 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -3336,6 +3336,30 @@ ix86_expand_int_addcc (rtx operands[])
   return true;
 }
 
+/* Return TRUE if we could convert "if (test) x = a; else x = b;" to cfcmov,
+   especially when loading a or b, or storing to x, may cause memory faults.  */
+bool
+ix86_can_cfcmov_p (rtx x, rtx a, rtx b)
+{
+  machine_mode mode = GET_MODE (x);
+  if (TARGET_APX_CFCMOV
+  && (mode == DImode || mode == SImode || mode == HImode))
+{
+  /* C load (r m r), (r m C), (r r m). For r m m could use
+two cfcmov. */
+  if (register_operand (x, mode)
+ && ((MEM_P (a) && register_operand (b, mode))
+ || (MEM_P (a) && b == const0_rtx)
+ || (register_operand (a, mode) && MEM_P (b))
+ || (MEM_P (a) && MEM_P (b))))
+   return true;
+  /* C store  (m r 0).  */
+  else if (MEM_P (x) && x == b && register_operand (a, mode))
+   return true;
+}
+  return false;
+}
+
 bool
 ix86_expand_int_movcc (rtx operands[])
 {
@@ -3366,6 +3390,45 @@ ix86_expand_int_movcc (rtx operands[])
 
   compare_code = GET_CODE (compare_op);
 
+  if (MEM_P (operands[0])
+  && !ix86_can_cfcmov_p (operands[0], op2, op3))
+return false;
+
+  if (may_trap_or_fault_p (op2) || may_trap_or_fault_p (op3))
+  {
+   if (ix86_can_cfcmov_p (operands[0], op2, op3))
+ {
+   if (may_trap_or_fault_p (op2))
+ op2 = gen_rtx_UNSPEC (mode, gen_rtvec (1, operands[2]),
+   UNSPEC_APX_CFCMOV);
+   if (may_trap_or_fault_p (op3))
+ op3 = gen_rtx_UNSPEC (mode, gen_rtvec (1, operands[3]),
+   UNSPEC_APX_CFCMOV);
+   emit_insn (compare_seq);
+
+   if (may_trap_or_fault_p (op2) && may_trap_or_fault_p (op3))
+ {
+   emit_insn (gen_rtx_SET (operands[0],
+   gen_rtx_IF_THEN_ELSE (mode,
+ compare_op,
+ op2,
+ operands[0])));
+   emit_insn (gen_rtx_SET (operands[0],
+   gen_rtx_IF_THEN_ELSE (mode,
+ compare_op,
+ operands[0],
+ op3)));
+ }
+   else
+ emit_insn (gen_rtx_SET (operands[0],
+ gen_rtx_IF_THEN_ELSE (mode,
+   compare_op,
+   op2, op3)));
+   return true;
+ }
+   return false;
+  }
+
   if ((op1 == const0_rtx && (code == GE || code == LT))
   || (op1 == constm1_rtx && (code == GT || code == LE)))
 sign_bit_compare_p = true;
diff --git 

[PATCH v2 0/2] [APX CFCMOV] Support APX CFCMOV

2024-06-18 Thread Kong, Lingling
Hi,

Thank you for reviewing v1!

Changes in v2:
Removed the target hook and added a new optab for cfcmov.

Lingling Kong (2):
  [APX CFCMOV] Support APX CFCMOV in if_convert pass
  [APX CFCMOV] Support APX CFCMOV in backend

 gcc/config/i386/i386-expand.cc   |  63 +
 gcc/config/i386/i386-opts.h  |   4 +-
 gcc/config/i386/i386.cc  |  16 +-
 gcc/config/i386/i386.h   |   1 +
 gcc/config/i386/i386.md  |  53 +++-
 gcc/config/i386/i386.opt |   3 +
 gcc/config/i386/predicates.md|   7 +
 gcc/ifcvt.cc | 246 ++-
 gcc/optabs.def   |   1 +
 gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c |  73 ++ 
 gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c |  40 +++
 11 files changed, 494 insertions(+), 13 deletions(-) 
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c

--

> -Original Message-
> From: Hongtao Liu 
> Sent: Monday, June 17, 2024 11:05 AM
> To: Jeff Law 
> Cc: Alexander Monakov ; Kong, Lingling
> ; gcc-patches@gcc.gnu.org; Liu, Hongtao
> ; Uros Bizjak 
> Subject: Re: [PATCH 0/3] [APX CFCMOV] Support APX CFCMOV
> 
> On Sat, Jun 15, 2024 at 1:22 AM Jeff Law  wrote:
> >
> >
> >
> > On 6/14/24 11:10 AM, Alexander Monakov wrote:
> > >
> > > On Fri, 14 Jun 2024, Kong, Lingling wrote:
> > >
> > >> APX CFCMOV[1] feature implements conditionally faulting which means
> > >> that all memory faults are suppressed when the condition code
> > >> evaluates to false and load or store a memory operand. Now we could
> > >> load or store a memory operand may trap or fault for conditional move.
> > >>
> > >> In middle-end, now we don't support a conditional move if we knew
> > >> that a load from A or B could trap or fault.
> > >
> > > Predicated loads on Itanium don't trap either. They are
> > > modeled via COND_EXEC on RTL. The late if-conversion pass (the
> > > instance that runs after
> > > reload) is capable of introducing them.
> > >
> > >> To enable CFCMOV, we add a target HOOK
> > >> TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP
> > >> in if-conversion pass to allow convert to cmov.
> > >
> > > Considering the above, is the new hook really necessary? Can you
> > > model the new instructions via (cond_exec () (set ...)) instead of
> > > (set (if_then_else ...)) ?
> > Note that turning on cond_exec will turn off some of the cmove support.
> Yes, cfcmov looks more like a cmov than a cond_exec.
> >
> > But the general suggestion of trying to avoid a hook for this is a
> > good one.  In fact, my first reaction to this thread was "do we really
> > need a hook for this".
> Maybe a new optab, i.e. cfmovmodecc, which differs from movcc in handling
> Conditional Fault?
> >
> > jeff
> 
> 
> 
> --
> BR,
> Hongtao


[PATCH v2 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass

2024-06-18 Thread Kong, Lingling
The APX CFCMOV feature implements conditional faulting: when the condition
code evaluates to false, all memory faults from loading or storing the memory
operand are suppressed. A conditional move can therefore load or store a
memory operand that may trap or fault.

The middle-end currently does not generate a conditional move if it knows
that a load from A or B could trap or fault. To enable CFCMOV, we added a
new optab.

For a conditional memory store, the fault-suppressing conditional move does
not move any arithmetic calculations. For a conditional memory load, only
the case of one operand that is a possibly trapping MEM and one operand that
is neither trapping nor a MEM is supported for now.
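
As a rough source-level illustration of the load case described above (and of
the "if (test) x = *a; else x = c - d;" comment in the patch), the sketch below
shows the two shapes in plain C. The function names are hypothetical; the
second `if` merely stands in for a fault-suppressing conditional load and does
not model the CFCMOV instruction itself.

```c
#include <assert.h>
#include <stddef.h>

/* Original branchy form: the load from A happens only when TEST holds.  */
static long select_branchy (int test, const long *a, long c, long d)
{
  long x;
  if (test)
    x = *a;
  else
    x = c - d;
  return x;
}

/* Shape after conversion: compute the non-trapping arm unconditionally,
   then conditionally overwrite it with the load.  The load must not fault
   when TEST is false, which is what conditional faulting provides.  */
static long select_converted (int test, const long *a, long c, long d)
{
  long x = c - d;
  if (test)
    x = *a; /* stands in for the fault-suppressing conditional load */
  return x;
}
```

A plain conditional move would need the value of *a before the condition is
applied, which is why a possibly faulting load blocks the usual conversion.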



gcc/ChangeLog:

   * ifcvt.cc (noce_try_cmove_load_mem_notrap): Allow convert
   to cfcmov for conditional load.
   (noce_try_cmove_store_mem_notrap): Convert to conditional store.
   (noce_process_if_block): Ditto.
   * optabs.def (OPTAB_D): New optab.
---
gcc/ifcvt.cc   | 246 -
gcc/optabs.def |   1 +
2 files changed, 246 insertions(+), 1 deletion(-)



diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 58ed42673e5..65c069b8cc6 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -783,6 +783,8 @@ static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
 rtx, rtx, rtx, rtx = NULL, rtx = NULL);
static bool noce_try_cmove (struct noce_if_info *);
static bool noce_try_cmove_arith (struct noce_if_info *);
+static bool noce_try_cmove_load_mem_notrap (struct noce_if_info *);
+static bool noce_try_cmove_store_mem_notrap (struct noce_if_info *, rtx *, rtx);
static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
static bool noce_try_minmax (struct noce_if_info *);
static bool noce_try_abs (struct noce_if_info *);
@@ -2401,6 +2403,233 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   return false;
}

+/* When target support suppress memory fault, try more complex cases involving
+   conditional_move's source or dest may trap or fault.  */
+
+static bool
+noce_try_cmove_load_mem_notrap (struct noce_if_info *if_info)
+{
+  rtx a = if_info->a;
+  rtx b = if_info->b;
+  rtx x = if_info->x;
+
+  if (MEM_P (x))
+return false;
+  /* Just handle a conditional move from one trap MEM + other non_trap,
+ non mem cases.  */
+  if (!(MEM_P (a) ^ MEM_P (b)))
+  return false;
+  bool a_trap = may_trap_or_fault_p (a);
+  bool b_trap = may_trap_or_fault_p (b);
+
+  if (!(a_trap ^ b_trap))
+return false;
+  if (a_trap && !MEM_P (a))
+return false;
+  if (b_trap && !MEM_P (b))
+return false;
+
+  rtx orig_b;
+  rtx_insn *insn_a, *insn_b;
+  bool a_simple = if_info->then_simple;
+  bool b_simple = if_info->else_simple;
+  basic_block then_bb = if_info->then_bb;
+  basic_block else_bb = if_info->else_bb;
+  rtx target;
+  enum rtx_code code;
+  rtx cond = if_info->cond;
+  rtx_insn *ifcvt_seq;
+
+  /* if (test) x = *a; else x = c - d;
+ => x = c - d;
+ if (test)
+   x = *a;
+  */
+
+  code = GET_CODE (cond);
+  insn_a = if_info->insn_a;
+  insn_b = if_info->insn_b;
+  machine_mode x_mode = GET_MODE (x);
+
+  /* Because we only handle one trap MEM + other non_trap, non mem cases,
+ just move one trap MEM always in then_bb.  */
+  if (noce_reversed_cond_code (if_info) != UNKNOWN)
+{
+  bool reversep = false;
+  if (b_trap)
+ reversep = true;
+
+  if (reversep)
+ {
+   if (if_info->rev_cond)
+ {
+   cond = if_info->rev_cond;
+   code = GET_CODE (cond);
+ }
+   else
+ code = reversed_comparison_code (cond, if_info->jump);
+   std::swap (a, b);
+   std::swap (insn_a, insn_b);
+   std::swap (a_simple, b_simple);
+   std::swap (then_bb, else_bb);
+ }
+}
+
+  if (then_bb && else_bb
+  && (!bbs_ok_for_cmove_arith (then_bb, else_bb,  if_info->orig_x)
+   || !bbs_ok_for_cmove_arith (else_bb, then_bb,  if_info->orig_x)))
+return false;
+
+  start_sequence ();
+
+  /* If one of the blocks is empty then the corresponding B or A value
+ came from the test block.  The non-empty complex block that we will
+ emit might clobber the register used by B or A, so move it to a pseudo
+ first.  */
+
+  rtx tmp_b = NULL_RTX;
+
+  /* Don't move trap mem to a pseudo. */
+  if (!may_trap_or_fault_p (b) && (b_simple || !else_bb))
+tmp_b = gen_reg_rtx (x_mode);
+
+  orig_b = b;
+
+  rtx emit_a = NULL_RTX;
+  rtx emit_b = NULL_RTX;
+  rtx_insn *tmp_insn = NULL;
+  bool modified_in_a = false;
+  bool modified_in_b = false;
+  /* If either operand is complex, load it into a register first.
+ The best way to do this is to 

Re: [PATCH] [x86_64]: Zhaoxin shijidadao enablement

2024-06-18 Thread mayshao-oc




On 5/28/24 14:15, Uros Bizjak wrote:




On Mon, May 27, 2024 at 10:33 AM MayShao  wrote:


From: mayshao 

Hi all:
 This patch enables -march/-mtune=shijidadao; costs and tunings are set
 according to the characteristics of the processor.

 Bootstrapped/regtested on x86_64.

 Ok for trunk?


OK.

Thanks,
Uros.


Thanks for your review, please help me commit.

BR
Mayshao




BR
Mayshao
gcc/ChangeLog:

 * common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Recognize shijidadao.
 * common/config/i386/i386-common.cc: Add shijidadao.
 * common/config/i386/i386-cpuinfo.h (enum processor_subtypes):
 Add ZHAOXIN_FAM7H_SHIJIDADAO.
 * config.gcc: Add shijidadao.
 * config/i386/driver-i386.cc (host_detect_local_cpu):
 Let -march=native recognize shijidadao processors.
 * config/i386/i386-c.cc (ix86_target_macros_internal): Add shijidadao.
 * config/i386/i386-options.cc (m_ZHAOXIN): Add m_SHIJIDADAO.
 (m_SHIJIDADAO): New definition.
 * config/i386/i386.h (enum processor_type): Add PROCESSOR_SHIJIDADAO.
 * config/i386/x86-tune-costs.h (struct processor_costs):
 Add shijidadao_cost.
 * config/i386/x86-tune-sched.cc (ix86_issue_rate): Add shijidadao.
 (ix86_adjust_cost): Ditto.
 * config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Add
 m_SHIJIDADAO.
 (X86_TUNE_USE_GATHER_4PARTS): Ditto.
 (X86_TUNE_USE_GATHER_8PARTS): Ditto.
 (X86_TUNE_AVOID_128FMA_CHAINS): Ditto.
 * doc/extend.texi: Add details about shijidadao.
 * doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

 * g++.target/i386/mv32.C: Handle new -march
 * gcc.target/i386/funcspec-56.inc: Ditto.
---
  gcc/common/config/i386/cpuinfo.h  |   8 +-
  gcc/common/config/i386/i386-common.cc |   8 +-
  gcc/common/config/i386/i386-cpuinfo.h |   1 +
  gcc/config.gcc|  14 ++-
  gcc/config/i386/driver-i386.cc|  11 +-
  gcc/config/i386/i386-c.cc |   7 ++
  gcc/config/i386/i386-options.cc   |   4 +-
  gcc/config/i386/i386.h|   1 +
  gcc/config/i386/x86-tune-costs.h  | 116 ++
  gcc/config/i386/x86-tune-sched.cc |   2 +
  gcc/config/i386/x86-tune.def  |   8 +-
  gcc/doc/extend.texi   |   3 +
  gcc/doc/invoke.texi   |   6 +
  gcc/testsuite/g++.target/i386/mv32.C  |   6 +
  gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
  15 files changed, 183 insertions(+), 14 deletions(-)

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 4610bf6d6a4..936039725ab 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -667,12 +667,18 @@ get_zhaoxin_cpu (struct __processor_model *cpu_model,
   reset_cpu_feature (cpu_model, cpu_features2, FEATURE_F16C);
   cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_LUJIAZUI;
 }
- else if (model >= 0x5b)
+ else if (model == 0x5b)
 {
   cpu = "yongfeng";
   CHECK___builtin_cpu_is ("yongfeng");
   cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_YONGFENG;
 }
+ else if (model >= 0x6b)
+   {
+ cpu = "shijidadao";
+ CHECK___builtin_cpu_is ("shijidadao");
+ cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_SHIJIDADAO;
+   }
break;
  default:
break;
diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index 895e5fa662d..eb3f94c529c 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -2066,6 +2066,7 @@ const char *const processor_names[] =
"intel",
"lujiazui",
"yongfeng",
+  "shijidadao",
"geode",
"k6",
"athlon",
@@ -2271,10 +2272,13 @@ const pta processor_alias_table[] =
| PTA_SSSE3 | PTA_SSE4_1 | PTA_FXSR, 0, P_NONE},
{"lujiazui", PROCESSOR_LUJIAZUI, CPU_LUJIAZUI,
 PTA_LUJIAZUI,
-   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_LUJIAZUI), P_NONE},
+   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_LUJIAZUI), P_PROC_BMI},
{"yongfeng", PROCESSOR_YONGFENG, CPU_YONGFENG,
 PTA_YONGFENG,
-   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_YONGFENG), P_NONE},
+   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_YONGFENG), P_PROC_AVX2},
+  {"shijidadao", PROCESSOR_SHIJIDADAO, CPU_YONGFENG,
+   PTA_YONGFENG,
+   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_SHIJIDADAO), P_PROC_AVX2},
{"k8", PROCESSOR_K8, CPU_K8,
  PTA_64BIT | PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE
| PTA_SSE2 | PTA_NO_SAHF | PTA_FXSR, 0, P_NONE},
diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
index 9edad96d4fd..fa3b76f4931 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -104,6 +104,7 @@ enum processor_subtypes

Re: [PATCH 0/2] aarch64: Small cleanups of the cavium cores

2024-06-18 Thread Kyrylo Tkachov
Hi Andrew,

> On 18 Jun 2024, at 05:40, Andrew Pinski  wrote:
> 
> While thinking about the variant patch I had posted, I went back to
> look at the original cores which used the variant and saw there
> was small cleanup for them since thunderx was no longer considered
> a V8.1-a core but rather just a V8-a one; when I did that change
> I didn't do the cleanups like is done in this patch set.
> Note there is a core which uses the variant selection so we can't
> remove the code there.
> 

Ok for both.
Thanks,
Kyrill

> Andrew Pinski (2):
>  aarch64: make thunderxt88p1 an alias of thunderxt88
>  aarch64: Add comment about thunderxt81/t83 being aliases
> 
> gcc/config/aarch64/aarch64-cores.def | 6 +++---
> gcc/config/aarch64/aarch64-tune.md   | 2 +-
> 2 files changed, 4 insertions(+), 4 deletions(-)
> 
> --
> 2.43.0
> 





[pushed] readings: Drop FORTRAN 77 test suite at itl.nist.gov

2024-06-18 Thread Gerald Pfeifer
The original subsite has disappeared and we couldn't find it elsewhere.

Pushed.

Gerald
---
 htdocs/readings.html | 6 --
 1 file changed, 6 deletions(-)

diff --git a/htdocs/readings.html b/htdocs/readings.html
index 784a3bd7..ae1b52bb 100644
--- a/htdocs/readings.html
+++ b/htdocs/readings.html
@@ -423,12 +423,6 @@ names.
   Testing and Validation -
   Some packages aimed at Fortran compiler validation.
 
-  
-https://www.itl.nist.gov/div897/ctg/fortran_form.htm;>FORTRAN
-77 test suite by the NIST Information Technology Laboratory
-(https://www.itl.nist.gov/div897/ctg/software.htm;>license)
-contains legal and operational Fortran 77 code.
-  
   
 The g77 testsuite (which is part of GCC).
   
-- 
2.45.2


Re: [PATCH] function.h: eliminate macros "dom_computed" and "n_bbs_in_dom_tree"

2024-06-18 Thread Richard Biener
On Tue, Jun 18, 2024 at 2:11 AM David Malcolm  wrote:
>
> Be explicit when we use "cfun".
>
> No functional change intended.
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
>
> OK for trunk?
>
> gcc/ChangeLog:
> * dominance.cc (compute_dom_fast_query): Replace uses of
> "dom_computed" macro with explicit use of cfun.
> (compute_dom_fast_query_in_region): Likewise.
> (calculate_dominance_info): Likewise, also for macro
> "n_bbs_in_dom_tree".
> (calculate_dominance_info_for_region): Likewise for
> "dom_computed" macro.
> (get_immediate_dominator): Likewise.
> (set_immediate_dominator): Likewise.
> (get_dominated_by): Likewise.
> (redirect_immediate_dominators): Likewise.
> (nearest_common_dominator): Likewise.
> (dominated_by_p): Likewise.
> (bb_dom_dfs_in): Likewise.
> (bb_dom_dfs_out): Likewise.
> (recompute_dominator): Likewise.
> (iterate_fix_dominators): Likewise.
> (add_to_dominance_info): Likewise, also for macro
> "n_bbs_in_dom_tree".
> (delete_from_dominance_info): Likewise.
> (set_dom_info_availability): Likewise for
> "dom_computed" macro.
> * function.h (dom_computed): Delete macro.
> (n_bbs_in_dom_tree): Delete macro.
>
> Signed-off-by: David Malcolm 
> ---
>  gcc/dominance.cc | 70 +---
>  gcc/function.h   |  3 ---
>  2 files changed, 36 insertions(+), 37 deletions(-)
>
> diff --git a/gcc/dominance.cc b/gcc/dominance.cc
> index 0357210ed27f..528b38caa9db 100644
> --- a/gcc/dominance.cc
> +++ b/gcc/dominance.cc
> @@ -672,7 +672,7 @@ compute_dom_fast_query (enum cdi_direction dir)
>
>gcc_checking_assert (dom_info_available_p (dir));
>
> -  if (dom_computed[dir_index] == DOM_OK)
> +  if (cfun->cfg->x_dom_computed[dir_index] == DOM_OK)

The x_* members are named this way so that they are not used directly.
I think the canonical replacement these days would be an inline function:

inline bool dom_computed (function *, enum cdi_direction) {...}
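
A minimal sketch of the accessor style suggested here, written as standalone C
with simplified stand-in types (the real definitions in GCC carry many more
fields). The setter and the enum dom_state return type are assumptions made
for illustration; the suggestion above only names a dom_computed accessor.

```c
#include <assert.h>

/* Simplified stand-ins for GCC's types; illustration only.  */
enum dom_state { DOM_NONE, DOM_NO_FAST_QUERY, DOM_OK };
enum cdi_direction { CDI_DOMINATORS = 1, CDI_POST_DOMINATORS = 2 };

struct control_flow_graph
{
  enum dom_state x_dom_computed[2];
  unsigned x_n_bbs_in_dom_tree[2];
};

struct function
{
  struct control_flow_graph *cfg;
};

/* Map a direction to an array index (assumed mapping: direction - 1).  */
static inline unsigned dir_to_idx (enum cdi_direction dir)
{
  return (unsigned) dir - 1;
}

/* Accessor that hides the x_ member from callers.  */
static inline enum dom_state
dom_computed (const struct function *fn, enum cdi_direction dir)
{
  return fn->cfg->x_dom_computed[dir_to_idx (dir)];
}

/* Hypothetical setter counterpart, not part of the suggestion above.  */
static inline void
set_dom_computed (struct function *fn, enum cdi_direction dir,
                  enum dom_state state)
{
  fn->cfg->x_dom_computed[dir_to_idx (dir)] = state;
}
```

Callers would then write dom_computed (cfun, dir) == DOM_OK rather than
touching cfun->cfg->x_dom_computed directly.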

Richard.

>  return;
>
>FOR_ALL_BB_FN (bb, cfun)
> @@ -681,7 +681,7 @@ compute_dom_fast_query (enum cdi_direction dir)
> assign_dfs_numbers (bb->dom[dir_index], );
>  }
>
> -  dom_computed[dir_index] = DOM_OK;
> +  cfun->cfg->x_dom_computed[dir_index] = DOM_OK;
>  }
>
>  /* Analogous to the previous function but compute the data for reducible
> @@ -697,7 +697,7 @@ compute_dom_fast_query_in_region (enum cdi_direction dir,
>
>gcc_checking_assert (dom_info_available_p (dir));
>
> -  if (dom_computed[dir_index] == DOM_OK)
> +  if (cfun->cfg->x_dom_computed[dir_index] == DOM_OK)
>  return;
>
>/* Assign dfs numbers for region nodes except for entry and exit nodes.  */
> @@ -708,7 +708,7 @@ compute_dom_fast_query_in_region (enum cdi_direction dir,
> assign_dfs_numbers (bb->dom[dir_index], );
>  }
>
> -  dom_computed[dir_index] = DOM_OK;
> +  cfun->cfg->x_dom_computed[dir_index] = DOM_OK;
>  }
>
>  /* The main entry point into this module.  DIR is set depending on whether
> @@ -721,7 +721,7 @@ calculate_dominance_info (cdi_direction dir, bool compute_fast_query)
>  {
>unsigned int dir_index = dom_convert_dir_to_idx (dir);
>
> -  if (dom_computed[dir_index] == DOM_OK)
> +  if (cfun->cfg->x_dom_computed[dir_index] == DOM_OK)
>  {
>checking_verify_dominators (dir);
>return;
> @@ -730,14 +730,14 @@ calculate_dominance_info (cdi_direction dir, bool compute_fast_query)
>timevar_push (TV_DOMINANCE);
>if (!dom_info_available_p (dir))
>  {
> -  gcc_assert (!n_bbs_in_dom_tree[dir_index]);
> +  gcc_assert (!cfun->cfg->x_n_bbs_in_dom_tree[dir_index]);
>
>basic_block b;
>FOR_ALL_BB_FN (b, cfun)
> {
>   b->dom[dir_index] = et_new_tree (b);
> }
> -  n_bbs_in_dom_tree[dir_index] = n_basic_blocks_for_fn (cfun);
> +  cfun->cfg->x_n_bbs_in_dom_tree[dir_index] = n_basic_blocks_for_fn (cfun);
>
>dom_info di (cfun, dir);
>di.calc_dfs_tree ();
> @@ -749,7 +749,7 @@ calculate_dominance_info (cdi_direction dir, bool compute_fast_query)
> et_set_father (b->dom[dir_index], d->dom[dir_index]);
> }
>
> -  dom_computed[dir_index] = DOM_NO_FAST_QUERY;
> +  cfun->cfg->x_dom_computed[dir_index] = DOM_NO_FAST_QUERY;
>  }
>else
>  checking_verify_dominators (dir);
> @@ -772,7 +772,7 @@ calculate_dominance_info_for_region (cdi_direction dir,
>basic_block bb;
>unsigned int i;
>
> -  if (dom_computed[dir_index] == DOM_OK)
> +  if (cfun->cfg->x_dom_computed[dir_index] == DOM_OK)
>  return;
>
>timevar_push (TV_DOMINANCE);
> @@ -791,7 +791,7 @@ calculate_dominance_info_for_region (cdi_direction dir,
>  if (basic_block d = di.get_idom (bb))
>et_set_father (bb->dom[dir_index], d->dom[dir_index]);
>
> -  dom_computed[dir_index] = 

RE: [PATCH v3] aarch64: Add vector popcount besides QImode [PR113859]

2024-06-18 Thread Pengxuan Zheng (QUIC)
> Hi,
> 
> > -Original Message-
> > From: Pengxuan Zheng 
> > Sent: Friday, June 14, 2024 12:57 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Pengxuan Zheng 
> > Subject: [PATCH v3] aarch64: Add vector popcount besides QImode
> > [PR113859]
> >
> > This patch improves GCC’s vectorization of __builtin_popcount for
> > aarch64 target by adding popcount patterns for vector modes besides
> > QImode, i.e., HImode, SImode and DImode.
> >
> > With this patch, we now generate the following for V8HI:
> >   cnt v1.16b, v.16b
> >   uaddlp  v2.8h, v1.16b
> >
> > For V4HI, we generate:
> >   cnt v1.8b, v.8b
> >   uaddlp  v2.4h, v1.8b
> >
> > For V4SI, we generate:
> >   cnt v1.16b, v.16b
> >   uaddlp  v2.8h, v1.16b
> >   uaddlp  v3.4s, v2.8h
> >
> > For V2SI, we generate:
> >   cnt v1.8b, v.8b
> >   uaddlp  v2.4h, v1.8b
> >   uaddlp  v3.2s, v2.4h
> >
> > For V2DI, we generate:
> >   cnt v1.16b, v.16b
> >   uaddlp  v2.8h, v1.16b
> >   uaddlp  v3.4s, v2.8h
> >   uaddlp  v4.2d, v3.4s
> 
> Nice patch!  We can do better for these sequences though. Would you
> instead consider using udot with a 0 accumulator and a multiplicand of 1?
> 
> Essentially
> movi v0.16b, #0
> movi v1.16b, #1
> cnt v3.16b, v2.16b
> udot  v0.4s, v3.16b, v1.16b
> 
> this has 1 instruction less on the critical path so should be half the
> latency of the uaddlp variants.
> 
> For the DI case you'll still need a final uaddlp.
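
To make the two reduction strategies concrete, here is a scalar C model of one
32-bit lane: a per-byte popcount followed either by two pairwise widening adds
(the cnt + uaddlp + uaddlp shape) or by a dot product with an all-ones vector
(the cnt + udot shape). The function names are hypothetical and the model
covers only the arithmetic, not instruction latency.

```c
#include <assert.h>
#include <stdint.h>

/* Per-byte population count, modelling what cnt does for each byte.  */
static unsigned popcount_byte (uint8_t b)
{
  unsigned n = 0;
  while (b)
    {
      n += b & 1u;
      b >>= 1;
    }
  return n;
}

/* cnt, then two pairwise widening adds: bytes -> halfwords -> word.  */
static uint32_t popcount32_pairwise (uint32_t x)
{
  uint8_t cnt[4];
  for (int i = 0; i < 4; i++)
    cnt[i] = (uint8_t) popcount_byte ((uint8_t) (x >> (8 * i)));
  uint16_t h0 = (uint16_t) (cnt[0] + cnt[1]); /* first widening add */
  uint16_t h1 = (uint16_t) (cnt[2] + cnt[3]);
  return (uint32_t) h0 + h1;                  /* second widening add */
}

/* cnt, then a dot product with all ones into a zero accumulator.  */
static uint32_t popcount32_dot (uint32_t x)
{
  const uint8_t ones[4] = { 1, 1, 1, 1 };
  uint32_t acc = 0;
  for (int i = 0; i < 4; i++)
    acc += popcount_byte ((uint8_t) (x >> (8 * i))) * ones[i];
  return acc;
}
```

Both functions compute the same lane value; the dot-product form replaces the
chain of widening adds with a single accumulation step for 32-bit lanes.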

Thanks for your suggestions, Tamar! That's indeed more efficient. I have
updated the patch accordingly. Please let me know if you have any other
comments.

https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654947.html

Thanks,
Pengxuan
> 
> Cheers,
> Tamar
> 
> >
> > PR target/113859
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-simd.md (aarch64_addlp):
> > Rename to...
> > (@aarch64_addlp): ... This.
> > (popcount2): New define_expand.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/popcnt-vec.c: New test.
> >
> > Signed-off-by: Pengxuan Zheng 
> > ---
> >  gcc/config/aarch64/aarch64-simd.md| 28 +++-
> >  gcc/testsuite/gcc.target/aarch64/popcnt-vec.c | 69
> > +++
> >  2 files changed, 96 insertions(+), 1 deletion(-)  create mode 100644
> > gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md
> > b/gcc/config/aarch64/aarch64- simd.md index 0bb39091a38..ee73e13534b
> > 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -3461,7 +3461,7 @@ (define_insn
> > "*aarch64_addlv_ze"
> >[(set_attr "type" "neon_reduc_add")]
> >  )
> >
> > -(define_expand "aarch64_addlp"
> > +(define_expand "@aarch64_addlp"
> >[(set (match_operand: 0 "register_operand")
> > (plus:
> >   (vec_select:
> > @@ -3517,6 +3517,32 @@ (define_insn
> "popcount2"
> >[(set_attr "type" "neon_cnt")]
> >  )
> >
> > +(define_expand "popcount2"
> > +  [(set (match_operand:VDQHSD 0 "register_operand")
> > +(popcount:VDQHSD (match_operand:VDQHSD 1
> > +"register_operand")))]
> > +  "TARGET_SIMD"
> > +  {
> > +/* Generate a byte popcount. */
> > +machine_mode mode =  == 64 ? V8QImode : V16QImode;
> > +rtx tmp = gen_reg_rtx (mode);
> > +auto icode = optab_handler (popcount_optab, mode);
> > +emit_insn (GEN_FCN (icode) (tmp, gen_lowpart (mode,
> > +operands[1])));
> > +
> > +/* Use a sequence of UADDLPs to accumulate the counts. Each step
> doubles
> > +   the element size and halves the number of elements. */
> > +do
> > +  {
> > +auto icode = code_for_aarch64_addlp (ZERO_EXTEND, GET_MODE
> (tmp));
> > +mode = insn_data[icode].operand[0].mode;
> > +rtx dest = mode == mode ? operands[0] : gen_reg_rtx
> (mode);
> > +emit_insn (GEN_FCN (icode) (dest, tmp));
> > +tmp = dest;
> > +  }
> > +while (mode != mode);
> > +DONE;
> > +  }
> > +)
> > +
> >  ;; 'across lanes' max and min ops.
> >
> >  ;; Template for outputting a scalar, so we can create __builtins
> > which can be diff --git
> > a/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
> > b/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
> > new file mode 100644
> > index 000..0c4926d7ca8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/popcnt-vec.c
> > @@ -0,0 +1,69 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fno-vect-cost-model" } */
> > +
> > +/* This function should produce cnt v.16b. */ void bar (unsigned char
> > +*__restrict b, unsigned char *__restrict d) {
> > +  for (int i = 0; i < 1024; i++)
> > +d[i] = __builtin_popcount (b[i]); }
> > +
> > +/* This function should produce cnt v.16b and uaddlp (Add Long
> > +Pairwise). */ void
> > +bar1 (unsigned short *__restrict b, unsigned short *__restrict d) {
> > +  for (int i = 0; i < 1024; i++)
> > +d[i] = __builtin_popcount (b[i]); }
> > +
> > +/* This function should produce cnt v.16b and 2 uaddlp (Add Long
> > 
