Re: [PATCH 23/52] mmix: Remove macros {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE

2024-06-05 Thread Kewen.Lin
Hi Hans-Peter,

on 2024/6/6 09:41, Hans-Peter Nilsson wrote:
> On Sun, 2 Jun 2024, Kewen Lin wrote:
> 
>> This removes the {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE macro
>> definitions from the mmix port.
> 
> This is fine once prerequisites are in place.
> 
> If I may add a nit: In these target change commit messages, add 
> a hint as to which defaulted hook or macro the removed macro now 
> corresponds to, like "these now correspond to the default values 
> of the new target hook mode_for_floating_type".  Else when doing 
> port archaeology, from the commit message it looks like the 
> context has not changed, you just removed some definitions.

Thanks for the comments!  Sorry, I should have pointed out that once all
sub-patches after 09/52 are approved, I'll squash all port-specific
sub-patches (after 09/52) into 09/52, since that one poisons these
macros and the later sub-patches cannot be committed separately.
As they will end up in the same commit, I think the missing context
will not be a concern.

BR,
Kewen

> 
>>
>> gcc/ChangeLog:
>>
>>  * config/mmix/mmix.h (FLOAT_TYPE_SIZE): Remove.
>>  (DOUBLE_TYPE_SIZE): Likewise.
>>  (LONG_DOUBLE_TYPE_SIZE): Likewise.
>> ---
>>  gcc/config/mmix/mmix.h | 4 
>>  1 file changed, 4 deletions(-)
>>
>> diff --git a/gcc/config/mmix/mmix.h b/gcc/config/mmix/mmix.h
>> index c3c5a2a69c9..e20bca1d363 100644
>> --- a/gcc/config/mmix/mmix.h
>> +++ b/gcc/config/mmix/mmix.h
>> @@ -195,10 +195,6 @@ struct GTY(()) machine_function
>>  #define SHORT_TYPE_SIZE 16
>>  #define LONG_LONG_TYPE_SIZE 64
>>  
>> -#define FLOAT_TYPE_SIZE 32
>> -#define DOUBLE_TYPE_SIZE 64
>> -#define LONG_DOUBLE_TYPE_SIZE 64
>> -
>>  #define DEFAULT_SIGNED_CHAR 1
>>  
>>  
>> -- 
>> 2.43.0
>>
>>



Re: [Patch-2, rs6000] Eliminate unnecessary byte swaps for duplicated constant vector store [PR113325]

2024-06-05 Thread Kewen.Lin
Hi,

on 2024/6/6 09:43, HAO CHEN GUI wrote:
> Hi Kewen,
> 
> on 2024/6/5 17:00, Kewen.Lin wrote:
>> This predicate can be moved to its only use (define_insn part condition).
>> The const_vector match_code check is redundant as const_vec_duplicate_p
>> already checks that, I wonder if we really need easy_altivec_constant?
>> Even if one vector constant doesn't meet easy_altivec_constant, but if
>> it matches the desired duplicated pattern, it doesn't need the swapping
>> either, no?
> 
> Thanks for your comments.
> I think we need easy_altivec_constant as the constant will be directly
> moved to a vector register after split. It might fail if it's not an easy
> altivec constant?
> 
>   [(set (match_dup 2)
>   (match_dup 1))

For that case, can we move operand 1 to a pseudo first by emit_move_insn, then
use that pseudo for the source of set?

BR,
Kewen



Re: [PATCH 01/52] ada: Replace use of LONG_DOUBLE_TYPE_SIZE

2024-06-05 Thread Kewen.Lin
Hi Eric,

on 2024/6/3 16:51, Eric Botcazou wrote:
>> Joseph pointed out "floating types should have their mode,
>> not a poorly defined precision value" in the discussion[1],
>> as he and Richi suggested, the existing macros
>> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
>> hook mode_for_floating_type.  To be prepared for that, this
>> patch is to replace use of LONG_DOUBLE_TYPE_SIZE in ada
>> with TYPE_PRECISION of long_double_type_node.
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
>>
>> gcc/ada/ChangeLog:
>>
>>  * gcc-interface/decl.cc (gnat_to_gnu_entity): Use TYPE_PRECISION of
>>  long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.
> 
> OK, thanks.
> 

Pushed as r15-1031, thanks!

BR,
Kewen



Re: [PATCH 02/52 v2] d: Replace use of LONG_DOUBLE_TYPE_SIZE

2024-06-05 Thread Kewen.Lin
Hi Iain,

on 2024/6/4 19:35, Iain Buclaw wrote:
> Excerpts from Kewen.Lin's message of June 4, 2024 5:17 am:
>> Hi Iain,
>>
>> on 2024/6/3 22:39, Iain Buclaw wrote:
>>> Excerpts from Kewen.Lin's message of June 3, 2024 10:57 am:
>>>> Hi Iain,
>>>>
>>>> on 2024/6/3 16:40, Iain Buclaw wrote:
>>>>> Excerpts from Kewen Lin's message of June 3, 2024 5:00 am:
>>>>>> Joseph pointed out "floating types should have their mode,
>>>>>> not a poorly defined precision value" in the discussion[1],
>>>>>> as he and Richi suggested, the existing macros
>>>>>> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
>>>>>> hook mode_for_floating_type.  To be prepared for that, this
>>>>>> patch is to replace use of LONG_DOUBLE_TYPE_SIZE in d with
>>>>>> TYPE_PRECISION of long_double_type_node.
>>>>>>
>>>>>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
>>>>>>
>>>>>
>>>>> Thanks, one question though: Is TYPE_PRECISION really equivalent to
>>>>> LONG_DOUBLE_TYPE_SIZE?
>>>>
>>>> Yes, it's guaranteed by the code in build_common_tree_nodes:
>>>>
>>>>   long_double_type_node = make_node (REAL_TYPE);
>>>>   TYPE_PRECISION (long_double_type_node) = LONG_DOUBLE_TYPE_SIZE;
>>>>   layout_type (long_double_type_node);
>>>>
>>>> , the macro LONG_DOUBLE_TYPE_SIZE is assigned to TYPE_PRECISION of
>>>> long_double_type_node, layout_type will only pick up one mode as
>>>> the given precision and won't change it.
>>>>
>>>>>
>>>>> Unless LONG_DOUBLE_TYPE_SIZE was poorly named to begin with, I'd assume
>>>>> the answer to be "no".
>>>>
>>>> I'm afraid it's poorly named before.
>>>>
>>>
>>> Thanks for confirming Kewen.
>>>
>>> I suspect then that this code is incorrectly using this macro, and it
>>> should instead be using:
>>>
>>> int_size_in_bytes(long_double_type_node)
>>>
>>> as any padding should be considered as part of the overall type size for
>>> the purpose that this field serves in the D part of the front-end.
>>
>> Got it, thanks for the explanation and suggestion.
>>
>>>
>>> Are you able to update the patch this way instead? Otherwise I'm happy
>>> to push the change instead.
>>
>> Sure, updated as below:
>>
> 
> Thanks!
> 
> This is OK to apply any time.

Pushed as r15-1032, thanks!

BR,
Kewen
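
For illustration, the distinction discussed above (precision versus padded
storage size) can be observed from plain C.  This is only a sketch under
assumptions: the helper names below are hypothetical, and the standard macro
LDBL_MANT_DIG reports significand digits, which is related to but not the
same thing as GCC's internal TYPE_PRECISION.  On x86-64 Linux, for example,
long double is typically the 80-bit x87 format stored in 16 bytes, so the
storage size is larger than the format itself.

```c
#include <float.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: storage size of long double in bits,
   including any padding the ABI adds.  */
size_t
long_double_storage_bits (void)
{
  return sizeof (long double) * CHAR_BIT;
}

/* Hypothetical helper: number of significand digits (in base
   FLT_RADIX) of long double, as reported by <float.h>.  */
int
long_double_mantissa_digits (void)
{
  return LDBL_MANT_DIG;
}
```

A front end that needs the size of the object in memory wants the first
quantity (padding included), which is the point of the suggestion to use
int_size_in_bytes (long_double_type_node).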



Re: [PATCH 03/52] fortran: Replace uses of {FLOAT, {, LONG_}DOUBLE}_TYPE_SIZE

2024-06-05 Thread Kewen.Lin
Hi Harald,

on 2024/6/4 04:01, Harald Anlauf wrote:
> Hi,
> 
> On 03.06.24 at 05:00, Kewen Lin wrote:
>> Joseph pointed out "floating types should have their mode,
>> not a poorly defined precision value" in the discussion[1],
>> as he and Richi suggested, the existing macros
>> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
>> hook mode_for_floating_type.  To be prepared for that, this
>> patch is to replace use of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
>> in fortran with TYPE_PRECISION of
>> {float,{,long_}double}_type_node.
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
>>
>> gcc/fortran/ChangeLog:
>>
>> * trans-intrinsic.cc (build_round_expr): Use TYPE_PRECISION of
>> long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.
>> * trans-types.cc (gfc_build_real_type): Use TYPE_PRECISION of
>> {float,double,long_double}_type_node to replace
>> {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE.
>> ---
>>   gcc/fortran/trans-intrinsic.cc |  3 ++-
>>   gcc/fortran/trans-types.cc | 10 ++
>>   2 files changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
>> index 912c1000e18..96839705112 100644
>> --- a/gcc/fortran/trans-intrinsic.cc
>> +++ b/gcc/fortran/trans-intrinsic.cc
>> @@ -395,7 +395,8 @@ build_round_expr (tree arg, tree restype)
>>    don't have an appropriate function that converts directly to the integer
>>    type (such as kind == 16), just use ROUND, and then convert the result to
>>    an integer.  We might also need to convert the result afterwards.  */
>> -  if (resprec <= INT_TYPE_SIZE && argprec <= LONG_DOUBLE_TYPE_SIZE)
>> +  if (resprec <= INT_TYPE_SIZE
>> +  && argprec <= TYPE_PRECISION (long_double_type_node))
>>   fn = builtin_decl_for_precision (BUILT_IN_IROUND, argprec);
>>     else if (resprec <= LONG_TYPE_SIZE)
>>   fn = builtin_decl_for_precision (BUILT_IN_LROUND, argprec);
>> diff --git a/gcc/fortran/trans-types.cc b/gcc/fortran/trans-types.cc
>> index 8466c595e06..0ef67723fcd 100644
>> --- a/gcc/fortran/trans-types.cc
>> +++ b/gcc/fortran/trans-types.cc
>> @@ -873,13 +873,15 @@ gfc_build_real_type (gfc_real_info *info)
>>     int mode_precision = info->mode_precision;
>>     tree new_type;
>>
>> -  if (mode_precision == FLOAT_TYPE_SIZE)
>> +  if (mode_precision == TYPE_PRECISION (float_type_node))
>>   info->c_float = 1;
>> -  if (mode_precision == DOUBLE_TYPE_SIZE)
>> +  if (mode_precision == TYPE_PRECISION (double_type_node))
>>   info->c_double = 1;
>> -  if (mode_precision == LONG_DOUBLE_TYPE_SIZE && !info->c_float128)
>> +  if (mode_precision == TYPE_PRECISION (long_double_type_node)
>> +  && !info->c_float128)
>>   info->c_long_double = 1;
>> -  if (mode_precision != LONG_DOUBLE_TYPE_SIZE && mode_precision == 128)
>> +  if (mode_precision != TYPE_PRECISION (long_double_type_node)
>> +  && mode_precision == 128)
>>   {
>>     /* TODO: see PR101835.  */
>>     info->c_float128 = 1;
> 
> the Fortran part looks good to me.

Pushed as r15-1033, thanks!

BR,
Kewen



Re: [PATCH 07/52] darwin: Replace use of LONG_DOUBLE_TYPE_SIZE

2024-06-05 Thread Kewen.Lin
Hi Iain,

on 2024/6/4 14:21, Iain Sandoe wrote:
> Hi Kewen,
> 
>> On 3 Jun 2024, at 04:00, Kewen Lin  wrote:
>>
>> Joseph pointed out "floating types should have their mode,
>> not a poorly defined precision value" in the discussion[1],
>> as he and Richi suggested, the existing macros
>> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
>> hook mode_for_floating_type.  To be prepared for that, this
>> patch is to replace use of LONG_DOUBLE_TYPE_SIZE in darwin.c
>> with TYPE_PRECISION of long_double_type_node.
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
>>
>> gcc/ChangeLog:
>>
>>  * config/darwin.cc (darwin_patch_builtins): Use TYPE_PRECISION of
>>  long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.
>> ---
>> gcc/config/darwin.cc | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/gcc/config/darwin.cc b/gcc/config/darwin.cc
>> index 63b8c509405..9129378be37 100644
>> --- a/gcc/config/darwin.cc
>> +++ b/gcc/config/darwin.cc
>> @@ -3620,7 +3620,7 @@ darwin_patch_builtin (enum built_in_function fncode)
>> void
>> darwin_patch_builtins (void)
>> {
>> -  if (LONG_DOUBLE_TYPE_SIZE != 128)
>> +  if (TYPE_PRECISION (long_double_type_node) != 128)
>> return;
> 
> Darwin (at this revision) supports long-double-{64,128}, but the support is
> limited to ibm128 for the 128b case.
> 
> The purpose of this code is to adjust the libc function name in response to
> the {64,128} for the long double type - when that is 128.
> 
> It seems that the revised version should be no less fragile than the original
> (since we now have potentially two 128b long double formats, although IEEE754
> is not yet implemented for < p7 so should not (yet) be relevant).

Thanks for the information, yes, from what's in build_common_tree_nodes, both
LONG_DOUBLE_TYPE_SIZE and TYPE_PRECISION (long_double_type_node) should be the
same.
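
As a rough illustration of why a bare size of 128 does not identify the
format (the "two 128b long double formats" mentioned above), a program can
look at LDBL_MANT_DIG instead of the size.  The mapping below is an
assumption based on the usual significand widths (53 for IEEE binary64, 64
for x87 extended, 106 for IBM double-double, 113 for IEEE binary128); the
function name is hypothetical and this is not what darwin.cc does.

```c
#include <float.h>

/* Hypothetical helper: map a significand width (as given by
   LDBL_MANT_DIG) to a human-readable format name.  The widths are
   the commonly used ones and are an assumption of this sketch.  */
const char *
long_double_format_name (int mant_dig)
{
  switch (mant_dig)
    {
    case 53:
      return "IEEE binary64 (same as double)";
    case 64:
      return "x87 extended";
    case 106:
      return "IBM double-double (ibm128)";
    case 113:
      return "IEEE binary128";
    default:
      return "unknown";
    }
}
```

Calling long_double_format_name (LDBL_MANT_DIG) then names the format the
current target uses for long double, under the assumptions stated above.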

> 
> So, OK for the Darwin parts.

Pushed as r15-1034, thanks!

BR,
Kewen

> thanks,
> Iain
> 
> 
>>
>> #define PATCH_BUILTIN(fncode) darwin_patch_builtin (fncode);
>> -- 
>> 2.43.0
>>
> 



Re: [Patch-2, rs6000] Eliminate unnecessary byte swaps for duplicated constant vector store [PR113325]

2024-06-05 Thread Kewen.Lin
Hi Haochen,

on 2024/1/26 09:17, HAO CHEN GUI wrote:
> Hi,
>   This patch creates an insn_and_split pattern which helps the duplicated
> constant vector replace the source pseudo of store insn in fwprop pass.
> Thus the store can be implemented by a single stxvd2x and it eliminates the
> unnecessary byte swap insn on P8 LE. The test case shows the optimization.
> 
>   The patch depends on the first generic patch which uses insn cost in fwprop.
> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions.
> 
> Thanks
> Gui Haochen
> 
> 
> ChangeLog
> rs6000: Eliminate unnecessary byte swaps for duplicated constant vector store
> 
> gcc/
>   PR target/113325
>   * config/rs6000/predicates.md (duplicate_easy_altivec_constant): New.
>   * config/rs6000/vsx.md (vsx_stxvd2x4_le_const_): New.
> 
> gcc/testsuite/
>   PR target/113325
>   * gcc.target/powerpc/pr113325.c: New.
> 
> 
> patch.diff
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index ef7d3f214c4..8ab6db630b7 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -759,6 +759,14 @@ (define_predicate "easy_vector_constant"
>return false;
>  })
> 
> +;; Return 1 if it's a duplicated easy_altivec_constant.
> +(define_predicate "duplicate_easy_altivec_constant"
> +  (and (match_code "const_vector")
> +   (match_test "easy_altivec_constant (op, mode)"))
> +{
> +  return const_vec_duplicate_p (op);
> +})

This predicate can be moved to its only use (the define_insn condition).
The const_vector match_code check is redundant, as const_vec_duplicate_p
already checks that.  I also wonder whether we really need
easy_altivec_constant: even if a vector constant doesn't meet
easy_altivec_constant, it doesn't need the swapping either as long as it
matches the desired duplicated pattern, no?

> +
>  ;; Same as easy_vector_constant but only for EASY_VECTOR_15_ADD_SELF.
>  (define_predicate "easy_vector_constant_add_self"
>(and (match_code "const_vector")
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 26fa32829af..98e4be26f64 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -3362,6 +3362,29 @@ (define_insn "*vsx_stxvd2x4_le_"
>"stxvd2x %x1,%y0"
>[(set_attr "type" "vecstore")])
> 
> +(define_insn_and_split "vsx_stxvd2x4_le_const_"
> +  [(set (match_operand:VSX_W 0 "memory_operand" "=Z")
> + (match_operand:VSX_W 1 "duplicate_easy_altivec_constant" "W"))]
> +  "!BYTES_BIG_ENDIAN
> +   && VECTOR_MEM_VSX_P (mode)
> +   && !TARGET_P9_VECTOR"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 2)
> + (match_dup 1))
> +   (set (match_dup 0)
> + (vec_select:VSX_W
> +   (match_dup 2)
> +   (parallel [(const_int 2) (const_int 3)
> +  (const_int 0) (const_int 1)])))]
> +{
> +  operands[2] = can_create_pseudo_p () ? gen_reg_rtx_and_attrs (operands[1])
> +  : operands[1];
> +
> +}
> +  [(set_attr "type" "vecstore")
> +   (set_attr "length" "8")])
> +
>  (define_insn "*vsx_stxvd2x8_le_V8HI"
>[(set (match_operand:V8HI 0 "memory_operand" "=Z")
>  (vec_select:V8HI
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr113325.c b/gcc/testsuite/gcc.target/powerpc/pr113325.c
> new file mode 100644
> index 000..dff68ac0a51
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr113325.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8 -mvsx" } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */

Nit: s/powerpc_vsx_ok/powerpc_vsx/

BR,
Kewen

> +/* { dg-final { scan-assembler-not {\mxxpermdi\M} } } */
> +
> +void* foo (void* s1)
> +{
> +  return __builtin_memset (s1, 0, 32);
> +}



Re: [PATCH 4/13 ver 3] rs6000, extend the current vec_{un,}signed{e,o} built-ins

2024-06-04 Thread Kewen.Lin
Hi,

on 2024/5/29 23:58, Carl Love wrote:
> Updated the patch per the feedback comments from the previous version.
> 
>  Carl 
> ---
> 
> rs6000, extend the current vec_{un,}signed{e,o} built-ins
> 
> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
> convert a vector of floats to signed/unsigned long long ints.  Extend the
> existing vec_{un,}signed{e,o} built-ins to handle the argument
> vector of floats to return the even/odd signed/unsigned integers.
> 
> The define expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf,
> vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o}
> built-ins.
> 
> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are
> now for internal use only. They are not documented and they do not
> have testcases.
> 
> The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by
> vec_signed{e,o}, remove.
> 
> The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by
> vec_unsigned{e,o}, remove.
> 
> The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by
> vec_unsigned, remove.
> 
> The built-in __builtin_vsx_xvcvspuxws is redundant as it is covered by
> vec_unsigned, remove.

I prefer to move these removals into sub-patch 2/13 or split them out into
a new patch, since they don't match the subject of this patch.  Moving them
to sub-patch 2/13 looks good as they are all about vec_{un,}signed{,e,o}.

> 
> Add testcases and update documentation.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxds_low,
>   __builtin_vsx_xvcvspuxds_low): New built-in definitions.
>   (__builtin_vsx_xvcvspuxds): Fix return type.
>   (XVCVSPSXDS, XVCVSPUXDS): Renamed VEC_VSIGNEDE_V4SF,
>   VEC_VUNSIGNEDE_V4SF respectively.
>   (vsx_xvcvspsxds, vsx_xvcvspuxds): Renamed vsignede_v4sf,
>   vunsignede_v4sf respectively.
>   (__builtin_vsx_xvcvdpsxws, __builtin_vsx_xvcvdpuxws,
>   __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws): Removed.
>   * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo,
>   vec_unsignede,vec_unsignedo):  Add new overloaded specifications.
>   * config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf,
>   vunsignede_v4sf, vunsignedo_v4sf): New define_expands.
>   * doc/extend.texi (vec_signedo, vec_signede): Add documentation.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/builtins-3-runnable.c: New tests for the added
>   overloaded built-ins.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 25 ++
>  gcc/config/rs6000/rs6000-overload.def |  8 ++
>  gcc/config/rs6000/vsx.md  | 88 +++
>  gcc/doc/extend.texi   | 10 +++
>  .../gcc.target/powerpc/builtins-3-runnable.c  | 51 +--
>  5 files changed, 157 insertions(+), 25 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index bf9a0ae22fc..cea2649b86c 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1688,32 +1688,23 @@
>const vsll __builtin_vsx_xvcvdpsxds_scale (vd, const int);
>  XVCVDPSXDS_SCALE vsx_xvcvdpsxds_scale {}
>  
> -  const vsi __builtin_vsx_xvcvdpsxws (vd);
> -XVCVDPSXWS vsx_xvcvdpsxws {}
> -
> -  const vsll __builtin_vsx_xvcvdpuxds (vd);
> -XVCVDPUXDS vsx_fixuns_truncv2dfv2di2 {}
> -
>const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
>  XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
>  
> -  const vull __builtin_vsx_xvcvdpuxds_uns (vd);
> -XVCVDPUXDS_UNS vsx_fixuns_truncv2dfv2di2 {}
> -
> -  const vsi __builtin_vsx_xvcvdpuxws (vd);
> -XVCVDPUXWS vsx_xvcvdpuxws {}
> -
>const vd __builtin_vsx_xvcvspdp (vf);
>  XVCVSPDP vsx_xvcvspdp {}
>  
>const vsll __builtin_vsx_xvcvspsxds (vf);
> -XVCVSPSXDS vsx_xvcvspsxds {}
> +VEC_VSIGNEDE_V4SF vsignede_v4sf {}

We should rename __builtin_vsx_xvcvspsxds to
__builtin_vsx_vsignede_v4sf.  One reason is to align with
the other existing built-ins; more importantly, it does not
map 1-1 to xvcvspsxds, so keeping that mnemonic in the name
can be misleading.

> +
> +  const vsll __builtin_vsx_xvcvspsxds_low (vf);

Ditto.

> +VEC_VSIGNEDO_V4SF vsignedo_v4sf {}
>  
> -  const vsll __builtin_vsx_xvcvspuxds (vf);
> -XVCVSPUXDS vsx_xvcvspuxds {}
> +  const vull __builtin_vsx_xvcvspuxds (vf);

Ditto.

> +VEC_VUNSIGNEDE_V4SF vunsignede_v4sf {}
>  
> -  const vsi __builtin_vsx_xvcvspuxws (vf);
> -XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {}
> +  const vull __builtin_vsx_xvcvspuxds_low (vf);

Ditto.

> +VEC_VUNSIGNEDO_V4SF vunsignedo_v4sf {}
>  
>const vd __builtin_vsx_xvcvsxddp (vsll);
>  XVCVSXDDP vsx_floatv2div2df2 {}
> diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
> index 84bd9ae6554..4d857bb1af3 100644

Re: [PATCH 5/13 ver 3] rs6000, Remove redundant float/double type conversions

2024-06-04 Thread Kewen.Lin
Hi,

on 2024/5/30 00:00, Carl Love wrote:
> This is a new patch to remove the built-ins that were inadvertently missing
> in the previous series.
> 
>   Carl 
> --
> 
> rs6000, Remove redundant float/double type conversions

Nit: s! float/double type conversions! vector float/double conversion builtins!

OK for trunk with this subject tweaked.

BR,
Kewen

> 
> The following built-ins are redundant as they are covered by another
> overloaded built-in.
> 
>   __builtin_vsx_xvcvspdp covered by vec_double{e,o}
>   __builtin_vsx_xvcvdpsp covered by vec_float{e,o}
>   __builtin_vsx_xvcvsxwdp covered by vec_double{e,o}
>   __builtin_vsx_xvcvuxddp_uns covered by  vec_double
> 
> Remove the redundant built-ins. They are not documented nor do they have
> test cases.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspdp,
>   __builtin_vsx_xvcvdpsp, __builtin_vsx_xvcvsxwdp,
>   __builtin_vsx_xvcvuxddp_uns): Remove.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 12 
>  1 file changed, 12 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index cea2649b86c..6049f3a4599 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1679,9 +1679,6 @@
>const signed int __builtin_vsx_xvcmpgtsp_p (signed int, vf, vf);
>  XVCMPGTSP_P vector_gt_v4sf_p {pred}
>  
> -  const vf __builtin_vsx_xvcvdpsp (vd);
> -XVCVDPSP vsx_xvcvdpsp {}
> -
>const vsll __builtin_vsx_xvcvdpsxds (vd);
>  XVCVDPSXDS vsx_fix_truncv2dfv2di2 {}
>  
> @@ -1691,9 +1688,6 @@
>const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
>  XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
>  
> -  const vd __builtin_vsx_xvcvspdp (vf);
> -XVCVSPDP vsx_xvcvspdp {}
> -
>const vsll __builtin_vsx_xvcvspsxds (vf);
>  VEC_VSIGNEDE_V4SF vsignede_v4sf {}
>  
> @@ -1715,9 +1709,6 @@
>const vf __builtin_vsx_xvcvsxdsp (vsll);
>  XVCVSXDSP vsx_xvcvsxdsp {}
>  
> -  const vd __builtin_vsx_xvcvsxwdp (vsi);
> -XVCVSXWDP vsx_xvcvsxwdp {}
> -
>const vf __builtin_vsx_xvcvsxwsp (vsi);
>  XVCVSXWSP vsx_floatv4siv4sf2 {}
>  
> @@ -1727,9 +1718,6 @@
>const vd __builtin_vsx_xvcvuxddp_scale (vsll, const int<5>);
>  XVCVUXDDP_SCALE vsx_xvcvuxddp_scale {}
>  
> -  const vd __builtin_vsx_xvcvuxddp_uns (vull);
> -XVCVUXDDP_UNS vsx_floatunsv2div2df2 {}
> -
>const vf __builtin_vsx_xvcvuxdsp (vull);
>  XVCVUXDSP vsx_xvcvuxdsp {}
>  



Re: [PATCH 1/13 ver 3] rs6000, Remove __builtin_vsx_cmple* builtins

2024-06-04 Thread Kewen.Lin
Hi Carl,

on 2024/5/29 23:52, Carl Love wrote:
> This patch was approved in the previous series.  There are no changes to this 
> patch.  Reposting for completeness. 

I guess you can just push the approved ones, as there is no dependency
between any two of them?  It can help to reduce the size of this series.

BR,
Kewen

> 
>  Carl 
> ---
> 
> rs6000, Remove __builtin_vsx_cmple* builtins
> 
> The built-ins __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di,
> __builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi should take
> unsigned arguments and return an unsigned result.  The current definitions
> take signed arguments and return signed results which is incorrect.
> 
> The signed and unsigned versions of __builtin_vsx_cmple* are not
> documented in extend.texi.  Also there are no test cases for the
> built-ins.
> 
> Users can use the existing vec_cmple as PVIPR defines instead of
> __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di,
> __builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi,
> __builtin_vsx_cmple_16qi, __builtin_vsx_cmple_2di,
> __builtin_vsx_cmple_4si and __builtin_vsx_cmple_8hi,
> __builtin_altivec_cmple_1ti, __builtin_altivec_cmple_u1ti.
> 
> Hence these built-ins are redundant and are removed by this patch.
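
To illustrate why the signedness of the arguments matters for a
less-than-or-equal compare, here is a minimal sketch using GCC's generic
vector extensions.  It is an assumption-laden stand-in: it uses neither the
removed __builtin_vsx_cmple* built-ins nor vec_cmple, and the type and
function names are hypothetical.  The same bit pattern (all ones) compares
as -1 when signed and as the largest value when unsigned, so the two
flavors give different answers.

```c
/* Generic GCC vector types; names are hypothetical.  */
typedef int v4si __attribute__ ((vector_size (16)));
typedef unsigned int v4su __attribute__ ((vector_size (16)));

/* Signed element-wise a <= b; returns lane 0 of the mask,
   which is -1 for true and 0 for false.  */
int
le_signed_lane0 (int a, int b)
{
  v4si va = {a, a, a, a};
  v4si vb = {b, b, b, b};
  v4si r = va <= vb;
  return r[0];
}

/* Unsigned element-wise a <= b; the comparison result is a vector
   of signed integers of the same width, hence the v4si mask.  */
int
le_unsigned_lane0 (unsigned int a, unsigned int b)
{
  v4su va = {a, a, a, a};
  v4su vb = {b, b, b, b};
  v4si r = (v4si) (va <= vb);
  return r[0];
}
```

With these helpers, le_signed_lane0 (-1, 1) is true while
le_unsigned_lane0 (0xFFFFFFFFu, 1u) is false, which is the kind of
mismatch that signed prototypes on an unsigned built-in can cause.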
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtin.cc (RS6000_BIF_CMPLE_16QI,
>   RS6000_BIF_CMPLE_U16QI, RS6000_BIF_CMPLE_8HI,
>   RS6000_BIF_CMPLE_U8HI, RS6000_BIF_CMPLE_4SI, RS6000_BIF_CMPLE_U4SI,
>   RS6000_BIF_CMPLE_2DI, RS6000_BIF_CMPLE_U2DI, RS6000_BIF_CMPLE_1TI,
>   RS6000_BIF_CMPLE_U1TI): Remove case statements.
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_cmple_16qi,
>   __builtin_vsx_cmple_2di, __builtin_vsx_cmple_4si,
>   __builtin_vsx_cmple_8hi, __builtin_vsx_cmple_u16qi,
>   __builtin_vsx_cmple_u2di, __builtin_vsx_cmple_u4si,
>   __builtin_vsx_cmple_u8hi): Remove built-in definitions.
> ---
>  gcc/config/rs6000/rs6000-builtin.cc   | 13 
>  gcc/config/rs6000/rs6000-builtins.def | 30 ---
>  2 files changed, 43 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
> index 320affd79e3..ac9f16fe51a 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -2027,19 +2027,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>fold_compare_helper (gsi, GT_EXPR, stmt);
>return true;
>  
> -case RS6000_BIF_CMPLE_16QI:
> -case RS6000_BIF_CMPLE_U16QI:
> -case RS6000_BIF_CMPLE_8HI:
> -case RS6000_BIF_CMPLE_U8HI:
> -case RS6000_BIF_CMPLE_4SI:
> -case RS6000_BIF_CMPLE_U4SI:
> -case RS6000_BIF_CMPLE_2DI:
> -case RS6000_BIF_CMPLE_U2DI:
> -case RS6000_BIF_CMPLE_1TI:
> -case RS6000_BIF_CMPLE_U1TI:
> -  fold_compare_helper (gsi, LE_EXPR, stmt);
> -  return true;
> -
>  /* flavors of vec_splat_[us]{8,16,32}.  */
>  case RS6000_BIF_VSPLTISB:
>  case RS6000_BIF_VSPLTISH:
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index 3bc7fed6956..7c36976a089 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1337,30 +1337,6 @@
>const vss __builtin_vsx_cmpge_u8hi (vus, vus);
>  CMPGE_U8HI vector_nltuv8hi {}
>  
> -  const vsc __builtin_vsx_cmple_16qi (vsc, vsc);
> -CMPLE_16QI vector_ngtv16qi {}
> -
> -  const vsll __builtin_vsx_cmple_2di (vsll, vsll);
> -CMPLE_2DI vector_ngtv2di {}
> -
> -  const vsi __builtin_vsx_cmple_4si (vsi, vsi);
> -CMPLE_4SI vector_ngtv4si {}
> -
> -  const vss __builtin_vsx_cmple_8hi (vss, vss);
> -CMPLE_8HI vector_ngtv8hi {}
> -
> -  const vsc __builtin_vsx_cmple_u16qi (vsc, vsc);
> -CMPLE_U16QI vector_ngtuv16qi {}
> -
> -  const vsll __builtin_vsx_cmple_u2di (vsll, vsll);
> -CMPLE_U2DI vector_ngtuv2di {}
> -
> -  const vsi __builtin_vsx_cmple_u4si (vsi, vsi);
> -CMPLE_U4SI vector_ngtuv4si {}
> -
> -  const vss __builtin_vsx_cmple_u8hi (vss, vss);
> -CMPLE_U8HI vector_ngtuv8hi {}
> -
>const vd __builtin_vsx_concat_2df (double, double);
>  CONCAT_2DF vsx_concat_v2df {}
>  
> @@ -3117,12 +3093,6 @@
>const vbq __builtin_altivec_cmpge_u1ti (vuq, vuq);
>  CMPGE_U1TI vector_nltuv1ti {}
>  
> -  const vbq __builtin_altivec_cmple_1ti (vsq, vsq);
> -CMPLE_1TI vector_ngtv1ti {}
> -
> -  const vbq __builtin_altivec_cmple_u1ti (vuq, vuq);
> -CMPLE_U1TI vector_ngtuv1ti {}
> -
>const unsigned long long __builtin_altivec_cntmbb (vuc, const int<1>);
>  VCNTMBB vec_cntmb_v16qi {}
>  



Re: [PATCH 12/13 ver 3] rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in

2024-06-04 Thread Kewen.Lin
Hi Carl,

on 2024/5/30 00:11, Carl Love wrote:
> This was patch 11 from the previous series.  Patch was updated to address 
> feedback comments.
> 
>Carl 
> --
> 
> rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in
> 
> The built-in __builtin_vsx_xvcmpeqsp_p is a duplicate of the overloaded
> __builtin_altivec_vcmpeqfp_p built-in.  The built-in is undocumented and
> there are no test cases for it.  The patch removes built-in
> __builtin_vsx_xvcmpeqsp_p.

OK for trunk, thanks!

BR,
Kewen

> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp_p):
>   Remove built-in definition.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index 64690b9b9b5..48ebc018a8d 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1619,9 +1619,6 @@
>const vf __builtin_vsx_xvcmpeqsp (vf, vf);
>  XVCMPEQSP vector_eqv4sf {}
>  
> -  const signed int __builtin_vsx_xvcmpeqsp_p (signed int, vf, vf);
> -XVCMPEQSP_P vector_eq_v4sf_p {pred}
> -
>const vd __builtin_vsx_xvcmpgedp (vd, vd);
>  XVCMPGEDP vector_gev2df {}
>  





Re: [PATCH 13/13 ver 3] rs6000, remove vector set and vector init built-ins.

2024-06-04 Thread Kewen.Lin
Hi,

on 2024/5/30 00:16, Carl Love wrote:
> This was patch 13 from the previous series.  Note the previous series patch
> 12 was dropped.  This patch is the same as the previous version.  The
> additional work to remove __builtin_vec_set_v1ti, __builtin_vec_set_v2di,
> __builtin_vec_set_v2df per the feedback comments with equivalent gimple code
> is being deferred to a future patch.  The goal of this series was simply to
> remove duplicated built-ins, extending overloaded built-ins as needed.
> Adding the needed gimple code to remove the additional built-ins is beyond
> the goal of this patch series.
> 
>  Carl 
> ---
> 
> rs6000, remove vector set and vector init built-ins.
> 
> The vector init built-ins:
> 
>   __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
>   __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
>   __builtin_vec_init_v2di, __builtin_vec_init_v2df,
>   __builtin_vec_set_v1ti

Typo here, s/__builtin_vec_set_v1ti/__builtin_vec_init_v1ti/

> 
> perform the same operation as initializing the vector in C code.  For
> example:
> 
>   result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
>   result_v4si = {1, 2, 3, 4};
> 
> These two constructs were tested and verified they generate identical
> assembly instructions with no optimization and -O3 optimization.
> 
> The vector set built-ins:
> 
>   __builtin_vec_set_v16qi, __builtin_vec_set_v8hi.
>   __builtin_vec_set_v4si, __builtin_vec_set_v4sf

Please also list the reserved ones (...v1ti/v2di/v2df) here, as they behave
the same way; temporarily keeping them for the uses in resolve_vec_insert()
doesn't change that.

> 
> perform the same operation as setting a specific element in the vector in
> C code.  For example:
> 
>   src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
>   src_v4si[index] = int_val;
> 
> The built-in actually generates more instructions than the inline C code
> with no optimization but is identical with -O3 optimizations.
> 
> All of the above built-ins that are removed do not have test cases and
> are not documented.
> 
> Built-ins __builtin_vec_set_v1ti, __builtin_vec_set_v2di,
> __builtin_vec_set_v2df are not removed as they are used in function
> resolve_vec_insert() in file rs6000-c.cc.
> 
> The built-ins are removed as they don't provide any benefit over just
> using C code.
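
The plain-C equivalents described in the commit message above can be
sketched with GCC's generic vector extensions.  This is an illustration
only: the typedef and function names are hypothetical, and it makes no
claim about the code generated on any particular target.

```c
/* A four-element int vector via GCC's vector extension;
   the typedef name mirrors the v4si used in the thread.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Counterpart of __builtin_vec_init_v4si: a brace initializer.  */
v4si
make_v4si (int a, int b, int c, int d)
{
  v4si v = {a, b, c, d};
  return v;
}

/* Counterpart of __builtin_vec_set_v4si: subscript assignment.
   INDEX is assumed to be in the range 0..3.  */
v4si
set_lane (v4si v, int value, int index)
{
  v[index] = value;
  return v;
}

/* Read one element back; INDEX is assumed to be in 0..3.  */
int
get_lane (v4si v, int index)
{
  return v[index];
}
```

For example, set_lane (make_v4si (1, 2, 3, 4), 42, 2) yields a vector
whose element 2 is 42 and whose other elements are unchanged.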
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vec_init_v16qi,
>   __builtin_vec_init_v8hi, __builtin_vec_init_v4si,
>   __builtin_vec_init_v4sf, __builtin_vec_init_v2di,
>   __builtin_vec_init_v2df, __builtin_vec_set_v1ti,

Typo, s/__builtin_vec_set_v1ti/__builtin_vec_init_v1ti/

>   __builtin_vec_set_v16qi, __builtin_vec_set_v8hi.
>   __builtin_vec_set_v4si, __builtin_vec_set_v4sf,
>   __builtin_vec_set_v2di, __builtin_vec_set_v2df,
>   __builtin_vec_set_v1ti): Remove built-in definitions.

The last three ones are not actually removed.

> ---
>  gcc/config/rs6000/rs6000-builtins.def | 42 ++-
>  1 file changed, 2 insertions(+), 40 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index 48ebc018a8d..8349d45169f 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1118,37 +1118,6 @@
>const signed short __builtin_vec_ext_v8hi (vss, signed int);
>  VEC_EXT_V8HI nothing {extract}
>  
> -  const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, \
> -signed char, signed char, signed char, signed char, signed char, \
> -signed char, signed char, signed char, signed char, signed char, \
> -signed char, signed char, signed char);
> -VEC_INIT_V16QI nothing {init}
> -
> -  const vf __builtin_vec_init_v4sf (float, float, float, float);
> -VEC_INIT_V4SF nothing {init}
> -
> -  const vsi __builtin_vec_init_v4si (signed int, signed int, signed int, \
> - signed int);
> -VEC_INIT_V4SI nothing {init}
> -
> -  const vss __builtin_vec_init_v8hi (signed short, signed short, signed 
> short,\
> - signed short, signed short, signed short, signed short, \
> - signed short);
> -VEC_INIT_V8HI nothing {init}
> -
> -  const vsc __builtin_vec_set_v16qi (vsc, signed char, const int<4>);
> -VEC_SET_V16QI nothing {set}
> -
> -  const vf __builtin_vec_set_v4sf (vf, float, const int<2>);
> -VEC_SET_V4SF nothing {set}
> -
> -  const vsi __builtin_vec_set_v4si (vsi, signed int, const int<2>);
> -VEC_SET_V4SI nothing {set}
> -
> -  const vss __builtin_vec_set_v8hi (vss, signed short, const int<3>);
> -VEC_SET_V8HI nothing {set}
> -
> -
>  ; Cell builtins.
>  [cell]
>pure vsc __builtin_altivec_lvlx (signed long, const void *);
> @@ -1295,15 +1264,8 @@
>const signed long long __builtin_vec_ext_v2di (vsll, signed int);
>  

Re: [PATCH 11/13 ver 3] rs6000, extend vec_xxpermdi built-in for __int128 args

2024-06-03 Thread Kewen.Lin
Hi,

on 2024/5/30 00:10, Carl Love wrote:
>  This was patch 10 from the previous series.  The patch was updated to 
> address feedback comments.
> 
> Carl 
> ---
> 
> rs6000, extend vec_xxpermdi built-in for __int128 args
> 
> Add new signed and unsigned overloaded instances for vec_xxpermdi:
> 
>__int128 vec_xxpermdi (__int128, __int128, const int);
>__uint128 vec_xxpermdi (__uint128, __uint128, const int);
> 
> Update the documentation to include a reference to the new built-in
> instances.
> 
> Add test cases for the new overloaded instances.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-overload.def (vec_xxpermdi): Add new
>   overloaded built-in instances.
>   * doc/extend.texi:  Add documentation for new overloaded built-in
>   instances.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/vec_perm-runnable-i128.c: New test file.
> ---
>  gcc/config/rs6000/rs6000-overload.def |   4 +
>  gcc/doc/extend.texi   |   2 +
>  .../powerpc/vec_perm-runnable-i128.c  | 229 ++
>  3 files changed, 235 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
> 
> diff --git a/gcc/config/rs6000/rs6000-overload.def 
> b/gcc/config/rs6000/rs6000-overload.def
> index a210c5ad10d..45000f161e4 100644
> --- a/gcc/config/rs6000/rs6000-overload.def
> +++ b/gcc/config/rs6000/rs6000-overload.def
> @@ -4932,6 +4932,10 @@
>  XXPERMDI_4SF  XXPERMDI_VF
>vd __builtin_vsx_xxpermdi (vd, vd, const int);
>  XXPERMDI_2DF  XXPERMDI_VD
> +  vsq __builtin_vsx_xxpermdi (vsq, vsq, const int);
> +XXPERMDI_1TI  XXPERMDI_1TI
> +  vuq __builtin_vsx_xxpermdi (vuq, vuq, const int);
> +XXPERMDI_1TI  XXPERMDI_1TUI

Nits:
  - Move them before "vf __builtin_vsx_xxpermdi (vf, vf, const int);" so
they are close to instances for other integral types.
  - Following the existing naming convention, _{SQ,UQ} would be better.

vsq __builtin_vsx_xxpermdi (vsq, vsq, const int);
   XXPERMDI_1TI  XXPERMDI_1SQ
vuq __builtin_vsx_xxpermdi (vuq, vuq, const int);
   XXPERMDI_1TI  XXPERMDI_1UQ
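
As a reminder of what these instances select, here is a portable model of the
underlying xxpermdi instruction.  It is only a sketch: struct vreg128 and
model_xxpermdi are hypothetical names, the doublewords use the instruction's
own numbering (doubleword 0 is the most significant half of the register), and
how the built-in maps onto that numbering on little-endian targets is not
modeled here.

```c
#include <stdint.h>

/* Hypothetical stand-in for one 128-bit vector register.  dw0 is the
   doubleword the ISA numbers 0, dw1 the doubleword it numbers 1.  */
struct vreg128
{
  uint64_t dw0;
  uint64_t dw1;
};

/* Model of xxpermdi with a 2-bit selector DM: the high selector bit picks
   which doubleword of A becomes result doubleword 0, and the low selector
   bit picks which doubleword of B becomes result doubleword 1.  */
struct vreg128
model_xxpermdi (struct vreg128 a, struct vreg128 b, unsigned dm)
{
  struct vreg128 r;
  r.dw0 = (dm & 2) ? a.dw1 : a.dw0;
  r.dw1 = (dm & 1) ? b.dw1 : b.dw0;
  return r;
}
```

With a single __int128 element per vector, the two doublewords are simply the
two halves of that element, so the result combines one half of each input.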

>  
>  [VEC_XXSLDWI, vec_xxsldwi, __builtin_vsx_xxsldwi]
>vsc __builtin_vsx_xxsldwi (vsc, vsc, const int);
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 0756230b19e..edfef1bdab7 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -22555,6 +22555,8 @@ void vec_vsx_st (vector bool char, int, signed char 
> *);
>  vector double vec_xxpermdi (vector double, vector double, const int);
>  vector float vec_xxpermdi (vector float, vector float, const int);
>  vector long long vec_xxpermdi (vector long long, vector long long, const 
> int);

> +vector __int128 vec_xxpermdi (vector __int128, vector __int128, const int);
> +vector __int128 vec_xxpermdi (vector __uint128, vector __uint128, const int);

Nit: These two lines split the long long and unsigned long long entries; can
you move them one line upward?  Also, using the explicit "signed" and
"unsigned" would be better than "__{u,}int128".

>  vector unsigned long long vec_xxpermdi (vector unsigned long long,
>  vector unsigned long long, const 
> int);
>  vector int vec_xxpermdi (vector int, vector int, const int);
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c 
> b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
> new file mode 100644
> index 000..2d5dce09404
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
> @@ -0,0 +1,229 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vmx_hw } */
> +/* { dg-options "-save-temps" } */

Nit: dg-options line isn't needed as it doesn't check assembly.

BR,
Kewen

> +
> +#include 
> +
> +#define DEBUG 0
> +
> +#if DEBUG
> +#include 
> +void print_i128 (unsigned __int128 val)
> +{
> +  printf(" 0x%016llx%016llx",
> + (unsigned long long)(val >> 64),
> + (unsigned long long)(val & 0x));
> +}
> +#endif
> +
> +extern void abort (void);
> +
> +union convert_union {
> +  vector signed __int128s128;
> +  vector unsigned __int128  u128;
> +  char  val[16];
> +} convert;
> +
> +int check_u128_result(vector unsigned __int128 vresult_u128,
> +   vector unsigned __int128 expected_vresult_u128)
> +{
> +  /* Use a for loop to check each byte manually so the test case will
> + run with ISA 2.06.
> +
> + Return 1 if they match, 0 otherwise.  */
> +
> +  int i;
> +
> +  union convert_union result;
> +  union convert_union expected;
> +
> +  result.u128 = vresult_u128;
> +  expected.u128 = expected_vresult_u128;
> +
> +  /* Check if each byte of the result and expected match. */
> +  for (i = 0; i < 16; i++)
> +{
> +  if (result.val[i] != expected.val[i])
> + return 0;
> +}
> +  return 1;
> +}
> +
> +int check_s128_result(vector 

Re: [PATCH 9/13 ver 3] rs6000, remove __builtin_vsx_vperm_* built-ins

2024-06-03 Thread Kewen.Lin
Hi,

on 2024/5/30 00:06, Carl Love wrote:
> This was patch 8 in the previous series.  Updated patch per the feedback 
> comments.
> 
> Carl 
> 
> 
> rs6000, remove __builtin_vsx_vperm_* built-ins
> 
> The undocumented built-ins:
>   __builtin_vsx_vperm_16qi_uns,
>   __builtin_vsx_vperm_1ti,
>   __builtin_vsx_vperm_1ti_uns,
>   __builtin_vsx_vperm_2df,
>   __builtin_vsx_vperm_2di,
>   __builtin_vsx_vperm_2di_uns,
>   __builtin_vsx_vperm_4sf,
>   __builtin_vsx_vperm_4si,
>   __builtin_vsx_vperm_4si_uns
> 
> are duplicates of the __builtin_altivec_* builtins that are used by
> the overloaded vec_perm built-in that is documented in the PVIPR.

OK for trunk, thanks!

BR,
Kewen

> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_vperm_16qi_uns,
>   __builtin_vsx_vperm_1ti, __builtin_vsx_vperm_1ti_uns,
>   __builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di,
>   __builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf,
>   __builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns): Remove
>   built-in definitions and comments.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_vperm_16qi_uns,
>   __builtin_vsx_vperm_1ti, __builtin_vsx_vperm_1ti_uns,
>   __builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di,
>   __builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf,
>   __builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns,
>   __builtin_vsx_vperm): Change call to built-in to the  overloaded
>   built-in vec_perm.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 33 ---
>  .../gcc.target/powerpc/vsx-builtin-3.c| 22 ++---
>  2 files changed, 11 insertions(+), 44 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index a78c52183bc..f02a8c4de45 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1529,39 +1529,6 @@
>const vf __builtin_vsx_uns_floato_v2di (vsll);
>  UNS_FLOATO_V2DI unsfloatov2di {}
>  
> -; These are duplicates of __builtin_altivec_* counterparts, and are being
> -; kept for backwards compatibility.  The reason for their existence is
> -; unclear.  TODO: Consider deprecation/removal at some point.
> -  const vsc __builtin_vsx_vperm_16qi (vsc, vsc, vuc);
> -VPERM_16QI_X altivec_vperm_v16qi {}
> -
> -  const vuc __builtin_vsx_vperm_16qi_uns (vuc, vuc, vuc);
> -VPERM_16QI_UNS_X altivec_vperm_v16qi_uns {}
> -
> -  const vsq __builtin_vsx_vperm_1ti (vsq, vsq, vsc);
> -VPERM_1TI_X altivec_vperm_v1ti {}
> -
> -  const vsq __builtin_vsx_vperm_1ti_uns (vsq, vsq, vsc);
> -VPERM_1TI_UNS_X altivec_vperm_v1ti_uns {}
> -
> -  const vd __builtin_vsx_vperm_2df (vd, vd, vuc);
> -VPERM_2DF_X altivec_vperm_v2df {}
> -
> -  const vsll __builtin_vsx_vperm_2di (vsll, vsll, vuc);
> -VPERM_2DI_X altivec_vperm_v2di {}
> -
> -  const vull __builtin_vsx_vperm_2di_uns (vull, vull, vuc);
> -VPERM_2DI_UNS_X altivec_vperm_v2di_uns {}
> -
> -  const vf __builtin_vsx_vperm_4sf (vf, vf, vuc);
> -VPERM_4SF_X altivec_vperm_v4sf {}
> -
> -  const vsi __builtin_vsx_vperm_4si (vsi, vsi, vuc);
> -VPERM_4SI_X altivec_vperm_v4si {}
> -
> -  const vui __builtin_vsx_vperm_4si_uns (vui, vui, vuc);
> -VPERM_4SI_UNS_X altivec_vperm_v4si_uns {}
> -
>const vss __builtin_vsx_vperm_8hi (vss, vss, vuc);
>  VPERM_8HI_X altivec_vperm_v8hi {}
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> index e20d3f03c86..f06d871b6b1 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> @@ -88,17 +88,17 @@ int do_perm(void)
>  {
>int i = 0;
>  
> -  si[i][0] = __builtin_vsx_vperm_4si (si[i][1], si[i][2], uc[i][3]); i++;
> -  ss[i][0] = __builtin_vsx_vperm_8hi (ss[i][1], ss[i][2], uc[i][3]); i++;
> -  sc[i][0] = __builtin_vsx_vperm_16qi (sc[i][1], sc[i][2], uc[i][3]); i++;
> -  f[i][0] = __builtin_vsx_vperm_4sf (f[i][1], f[i][2], uc[i][3]); i++;
> -  d[i][0] = __builtin_vsx_vperm_2df (d[i][1], d[i][2], uc[i][3]); i++;
> -
> -  si[i][0] = __builtin_vsx_vperm (si[i][1], si[i][2], uc[i][3]); i++;
> -  ss[i][0] = __builtin_vsx_vperm (ss[i][1], ss[i][2], uc[i][3]); i++;
> -  sc[i][0] = __builtin_vsx_vperm (sc[i][1], sc[i][2], uc[i][3]); i++;
> -  f[i][0] = __builtin_vsx_vperm (f[i][1], f[i][2], uc[i][3]); i++;
> -  d[i][0] = __builtin_vsx_vperm (d[i][1], d[i][2], uc[i][3]); i++;
> +  si[i][0] = vec_perm (si[i][1], si[i][2], uc[i][3]); i++;
> +  ss[i][0] = vec_perm (ss[i][1], ss[i][2], uc[i][3]); i++;
> +  sc[i][0] = vec_perm (sc[i][1], sc[i][2], uc[i][3]); i++;
> +  f[i][0] = vec_perm (f[i][1], f[i][2], uc[i][3]); i++;
> +  d[i][0] = vec_perm (d[i][1], d[i][2], uc[i][3]); i++;
> +
> +  si[i][0] = vec_perm (si[i][1], si[i][2], uc[i][3]); 

Re: [PATCH 8/13 ver 3] rs6000, remove the vec_xxsel built-ins, they are, duplicates

2024-06-03 Thread Kewen.Lin
Hi,

on 2024/5/30 00:05, Carl Love wrote:
> This was patch 7 in the previous series.  Patch was updated to address the 
> feedback comments.
> 
> Carl 
> 
> 
> rs6000, remove the vec_xxsel built-ins, they are duplicates
> 
> The following undocumented built-ins are covered by the existing overloaded
> vec_sel built-in definitions.
> 
>   const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc);
> same as vsc __builtin_vec_sel (vsc, vsc, vuc);  (overloaded vec_sel)
> 
>   const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
> same as vuc __builtin_vec_sel (vuc, vuc, vuc);  (overloaded vec_sel)
> 
>   const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
> same as  vd __builtin_vec_sel (vd, vd, vull);   (overloaded vec_sel)
> 
>   const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll);
> same as vsll __builtin_vec_sel (vsll, vsll, vsll);  (overloaded vec_sel)
> 
>   const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull);
> same as vull __builtin_vec_sel (vull, vull, vsll);  (overloaded vec_sel)
> 
>   const vf __builtin_vsx_xxsel_4sf (vf, vf, vf);
> same as vf __builtin_vec_sel (vf, vf, vsi)  (overloaded vec_sel)
> 
>   const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi);
> same as vsi __builtin_vec_sel (vsi, vsi, vbi);  (overloaded vec_sel)
> 
>   const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui);
> same as vui __builtin_vec_sel (vui, vui, vui);  (overloaded vec_sel)
> 
>   const vss __builtin_vsx_xxsel_8hi (vss, vss, vss);
> same as vss __builtin_vec_sel (vss, vss, vbs);  (overloaded vec_sel)
> 
>   const vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus);
> same as vus __builtin_vec_sel (vus, vus, vus);  (overloaded vec_sel)
> 
> This patch removed the duplicate built-in definitions so users will only
> use the documented vec_sel built-in.  The __builtin_vsx_xxsel_[4si, 8hi,
> 16qi, 4sf, 2df] tests are also removed.

OK for trunk, thanks!

BR,
Kewen

> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_16qi,
>   __builtin_vsx_xxsel_16qi_uns, __builtin_vsx_xxsel_2df,
>   __builtin_vsx_xxsel_2di, __builtin_vsx_xxsel_2di_uns,
>   __builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_4si,
>   __builtin_vsx_xxsel_4si_uns, __builtin_vsx_xxsel_8hi,
>   __builtin_vsx_xxsel_8hi_uns): Remove built-in definitions.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_xxsel_4si,
>   __builtin_vsx_xxsel_8hi, __builtin_vsx_xxsel_16qi,
>   __builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_2df,
>   __builtin_vsx_xxsel): Change built-in call to overloaded built-in
>   call vec_sel.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 30 
>  .../gcc.target/powerpc/vsx-builtin-3.c| 36 ++-
>  2 files changed, 19 insertions(+), 47 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index ea0da77f13e..a78c52183bc 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1898,36 +1898,6 @@
>const vss __builtin_vsx_xxpermdi_8hi (vss, vss, const int<2>);
>  XXPERMDI_8HI vsx_xxpermdi_v8hi {}
>  
> -  const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc);
> -XXSEL_16QI vector_select_v16qi {}
> -
> -  const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
> -XXSEL_16QI_UNS vector_select_v16qi_uns {}
> -
> -  const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
> -XXSEL_2DF vector_select_v2df {}
> -
> -  const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll);
> -XXSEL_2DI vector_select_v2di {}
> -
> -  const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull);
> -XXSEL_2DI_UNS vector_select_v2di_uns {}
> -
> -  const vf __builtin_vsx_xxsel_4sf (vf, vf, vf);
> -XXSEL_4SF vector_select_v4sf {}
> -
> -  const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi);
> -XXSEL_4SI vector_select_v4si {}
> -
> -  const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui);
> -XXSEL_4SI_UNS vector_select_v4si_uns {}
> -
> -  const vss __builtin_vsx_xxsel_8hi (vss, vss, vss);
> -XXSEL_8HI vector_select_v8hi {}
> -
> -  const vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus);
> -XXSEL_8HI_UNS vector_select_v8hi_uns {}
> -
>const vsc __builtin_vsx_xxsldwi_16qi (vsc, vsc, const int<2>);
>  XXSLDWI_16QI vsx_xxsldwi_v16qi {}
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> index ff875c55304..e20d3f03c86 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> @@ -37,6 +37,8 @@
>  /* { dg-final { scan-assembler "xvcvsxdsp" } } */
>  /* { dg-final { scan-assembler "xvcvuxdsp" } } */
>  
> +#include 
> +
>  extern __vector int si[][4];
>  extern __vector short ss[][4];
>  extern __vector signed char sc[][4];
> @@ 

Re: [PATCH 7/13 ver 3] rs6000, add overloaded vec_sel with int128 arguments

2024-06-03 Thread Kewen.Lin
Hi,

on 2024/5/30 00:03, Carl Love wrote:
> This was patch 6 in the previous series.  Updated the documentation file per 
> the comments.  No functional changes to the patch.
> 
>   Carl 
> 
> 
> rs6000, add overloaded vec_sel with int128 arguments
> 
> Extend the vec_sel built-in to take three signed/unsigned int128 arguments
> and return a signed/unsigned int128 result.
> 
> Extending the vec_sel built-in makes the existing built-ins
> __builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete.  The
> patch removes these built-ins.
> 
> The patch adds documentation and test cases for the new overloaded vec_sel
> built-ins.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti,
>   __builtin_vsx_xxsel_1ti_uns): Remove built-in definitions.
>   * config/rs6000/rs6000-overload.def (vec_sel): Add new overloaded
>   definitions.
>   * doc/extend.texi: Add documentation for new vec_sel instances.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/vec-sel-runnable-i128.c: New test file.
> ---
>  gcc/config/rs6000/rs6000-builtins.def |   6 -
>  gcc/config/rs6000/rs6000-overload.def |   4 +
>  gcc/doc/extend.texi   |  12 ++
>  .../powerpc/vec-sel-runnable-i128.c   | 129 ++
>  4 files changed, 145 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 13e36df008d..ea0da77f13e 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1904,12 +1904,6 @@
>const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
>  XXSEL_16QI_UNS vector_select_v16qi_uns {}
>  
> -  const vsq __builtin_vsx_xxsel_1ti (vsq, vsq, vsq);
> -XXSEL_1TI vector_select_v1ti {}
> -
> -  const vsq __builtin_vsx_xxsel_1ti_uns (vsq, vsq, vsq);
> -XXSEL_1TI_UNS vector_select_v1ti_uns {}
> -
>const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
>  XXSEL_2DF vector_select_v2df {}
>  
> diff --git a/gcc/config/rs6000/rs6000-overload.def 
> b/gcc/config/rs6000/rs6000-overload.def
> index 4d857bb1af3..a210c5ad10d 100644
> --- a/gcc/config/rs6000/rs6000-overload.def
> +++ b/gcc/config/rs6000/rs6000-overload.def
> @@ -3274,6 +3274,10 @@
>  VSEL_2DF  VSEL_2DF_B
>vd __builtin_vec_sel (vd, vd, vull);
>  VSEL_2DF  VSEL_2DF_U
> +  vsq __builtin_vec_sel (vsq, vsq, vsq);
> +VSEL_1TI  VSEL_1TI_S
> +  vuq __builtin_vec_sel (vuq, vuq, vuq);
> +VSEL_1TI_UNS  VSEL_1TI_U

I just noticed that for integral types, such as signed/unsigned int, we have
six instances:

  vsi __builtin_vec_sel (vsi, vsi, vbi);
VSEL_4SI  VSEL_4SI_B
  vsi __builtin_vec_sel (vsi, vsi, vui);
VSEL_4SI  VSEL_4SI_U
  vui __builtin_vec_sel (vui, vui, vbi);
VSEL_4SI_UNS  VSEL_4SI_UB
  vui __builtin_vec_sel (vui, vui, vui);
VSEL_4SI_UNS  VSEL_4SI_UU
  vbi __builtin_vec_sel (vbi, vbi, vbi);
VSEL_4SI_UNS  VSEL_4SI_BB
  vbi __builtin_vec_sel (vbi, vbi, vui);

It assumes the control vector can only be of unsigned or bool type, and also
that the return type can be bool.  This aligns with what the PVIPR defines, so
here we should have:

vsq __builtin_vec_sel (vsq, vsq, vbq);
vsq __builtin_vec_sel (vsq, vsq, vuq);
vuq __builtin_vec_sel (vuq, vuq, vbq);
vuq __builtin_vec_sel (vuq, vuq, vuq);
vbq __builtin_vec_sel (vbq, vbq, vbq);
vbq __builtin_vec_sel (vbq, vbq, vuq);

Sorry that I didn't find this in the previous review.
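
For reference, a portable model of the select operation itself, as a sketch
under my own naming (struct v128 and model_vec_sel are hypothetical, and the
128-bit value is split into two 64-bit halves only so the code builds
anywhere).  The control operand acts purely as a bit mask, which is presumably
why only unsigned and bool control types are defined.

```c
#include <stdint.h>

/* Hypothetical stand-in for a vector holding one 128-bit element.  */
struct v128
{
  uint64_t hi;
  uint64_t lo;
};

/* Model of vec_sel: each result bit comes from A where the matching bit of
   the control C is 0, and from B where it is 1.  */
struct v128
model_vec_sel (struct v128 a, struct v128 b, struct v128 c)
{
  struct v128 r;
  r.hi = (a.hi & ~c.hi) | (b.hi & c.hi);
  r.lo = (a.lo & ~c.lo) | (b.lo & c.lo);
  return r;
}
```

An all-zero control returns A unchanged and an all-ones control returns B, so
the signedness of the data operands never affects the result bits.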


>  ; The following variants are deprecated.
>vsll __builtin_vec_sel (vsll, vsll, vsll);
>  VSEL_2DI_B  VSEL_2DI_S
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index b88e61641a2..0756230b19e 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -21372,6 +21372,18 @@ Additional built-in functions are available for the 
> 64-bit PowerPC
>  family of processors, for efficient use of 128-bit floating point
>  (@code{__float128}) values.
>  
> +Vector select
> +
> +@smallexample
> +vector signed __int128 vec_sel (vector signed __int128,
> +   vector signed __int128, vector signed __int128);
> +vector unsigned __int128 vec_sel (vector unsigned __int128,
> +   vector unsigned __int128, vector unsigned __int128);
> +@end smallexample

As above, the documentation here has to cover vector bool __int128 and note
that the control vector is of type either vector unsigned __int128 or vector
bool __int128.

> +
> +The instance is an extension of the exiting overloaded built-in 
> @code{vec_sel}
> +that is documented in the PVIPR.
> +
>  @node Basic PowerPC Built-in Functions Available on ISA 2.06
>  @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.06
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c 
> 

Re: [PATCH 3/13 ver 3] rs6000, fix error in unsigned vector float to unsigned int built-in definition

2024-06-03 Thread Kewen.Lin
Hi,

on 2024/5/29 23:56, Carl Love wrote:
> This patch was updated per the feedback comment from the previous version in 
> series 2.
> 
>  Carl 
> ---
> 
> rs6000, fix error in unsigned vector float to unsigned int built-in 
> definitions
> 
> The built-in __builtin_vsx_vunsigned_v2df is supposed to take a vector of
> doubles and return a vector of unsigned long long ints.  Similarly
> __builtin_vsx_vunsigned_v4sf takes a vector of floats and is supposed to
> return a vector of unsigned ints.  The definitions are using the signed
> version of the instructions not the unsigned version of the instruction.
> The results should also be unsigned.  The builtins are used by the
> overloaded vec_unsigned builtin which has an unsigned result.
> 
> Similarly the built-ins __builtin_vsx_vunsignede_v2df and
> __builtin_vsx_vunsignedo_v2df are supposed to return an unsigned result.
> If the floating point argument is negative, the unsigned result is zero.
> The built-ins are used in the overloaded built-in vec_unsignede and
> vec_unsignedo respectively.
> 
> Add test cases for negative floating point arguments for each of the
> above built-ins.

OK for trunk, thanks!
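
For reference, a scalar sketch of the per-element behaviour the new negative
tests check.  The function name is made up, and the handling of NaN and of
values at or above 2^32 reflects my understanding of the unsigned convert
instruction rather than anything stated in the patch; only the "negative
gives zero" part is taken from the commit message.

```c
#include <stdint.h>

/* Model of one element of a float to unsigned int vector conversion:
   truncate toward zero and saturate instead of wrapping.  A plain C cast
   of a negative float to unsigned would be undefined behaviour, hence the
   explicit checks.  */
uint32_t
model_float_to_u32 (float x)
{
  /* NaN compares unequal to itself; treat it like a negative input.  */
  if (x != x || x <= 0.0f)
    return 0;
  /* 2^32 and above cannot be represented; clamp to the largest value.  */
  if (x >= 4294967296.0f)
    return UINT32_MAX;
  /* In range: the cast truncates toward zero.  */
  return (uint32_t) x;
}
```

With the inputs from the new test, -14.930, -834.49, -3.3 and -5.4 all map to
0, matching vec_uns_int_expected.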

BR,
Kewen

> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_vunsigned_v2df,
>   __builtin_vsx_vunsigned_v4sf, __builtin_vsx_vunsignede_v2df,
>   __builtin_vsx_vunsignedo_v2df): Change the result type to unsigned.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/builtins-3-runnable.c: Add tests for
>   vec_unsignede and vec_unsignedo with negative arguments.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 12 
>  .../gcc.target/powerpc/builtins-3-runnable.c  | 30 +--
>  2 files changed, 33 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index c6d2ea1bc39..bf9a0ae22fc 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1580,16 +1580,16 @@
>const vsi __builtin_vsx_vsignedo_v2df (vd);
>  VEC_VSIGNEDO_V2DF vsignedo_v2df {}
>  
> -  const vsll __builtin_vsx_vunsigned_v2df (vd);
> -VEC_VUNSIGNED_V2DF vsx_xvcvdpsxds {}
> +  const vull __builtin_vsx_vunsigned_v2df (vd);
> +VEC_VUNSIGNED_V2DF vsx_xvcvdpuxds {}
>  
> -  const vsi __builtin_vsx_vunsigned_v4sf (vf);
> -VEC_VUNSIGNED_V4SF vsx_xvcvspsxws {}
> +  const vui __builtin_vsx_vunsigned_v4sf (vf);
> +VEC_VUNSIGNED_V4SF vsx_xvcvspuxws {}
>  
> -  const vsi __builtin_vsx_vunsignede_v2df (vd);
> +  const vui __builtin_vsx_vunsignede_v2df (vd);
>  VEC_VUNSIGNEDE_V2DF vunsignede_v2df {}
>  
> -  const vsi __builtin_vsx_vunsignedo_v2df (vd);
> +  const vui __builtin_vsx_vunsignedo_v2df (vd);
>  VEC_VUNSIGNEDO_V2DF vunsignedo_v2df {}
>  
>const vf __builtin_vsx_xscvdpsp (double);
> diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c 
> b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
> index 0231a1fd086..5dcdfbee791 100644
> --- a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
> @@ -313,6 +313,14 @@ int main()
>   test_unsigned_int_result (ALL, vec_uns_int_result,
> vec_uns_int_expected);
>  
> + /* Convert single precision float to  unsigned int.  Negative
> +arguments.  */
> + vec_flt0 = (vector float){-14.930, -834.49, -3.3, -5.4};
> + vec_uns_int_expected = (vector unsigned int){0, 0, 0, 0};
> + vec_uns_int_result = vec_unsigned (vec_flt0);
> + test_unsigned_int_result (ALL, vec_uns_int_result,
> +   vec_uns_int_expected);
> +
>   /* Convert double precision float to long long unsigned int */
>   vec_dble0 = (vector double){124.930, 8134.49};
>   vec_ll_uns_int_expected = (vector long long unsigned int){124, 8134};
> @@ -320,10 +328,18 @@ int main()
>   test_ll_unsigned_int_result (vec_ll_uns_int_result,
>vec_ll_uns_int_expected);
>  
> + /* Convert double precision float to long long unsigned int. Negative
> +arguments.  */
> + vec_dble0 = (vector double){-24.93, -134.9};
> + vec_ll_uns_int_expected = (vector long long unsigned int){0, 0};
> + vec_ll_uns_int_result = vec_unsigned (vec_dble0);
> + test_ll_unsigned_int_result (vec_ll_uns_int_result,
> +  vec_ll_uns_int_expected);
> +
>   /* Convert double precision vector float to vector unsigned int,
> -even words */
> - vec_dble0 = (vector double){3124.930, 8234.49};
> - vec_uns_int_expected = (vector unsigned int){3124, 0, 8234, 0};
> +even words.  Negative arguments */
> + vec_dble0 = (vector double){-124.930, -234.49};
> + vec_uns_int_expected = (vector unsigned int){0, 0, 0, 0};
> 

Re: [PATCH] fix PowerPC < 7 w/ Altivec not to default to power7

2024-06-03 Thread Kewen.Lin
Hi Rene,

on 2024/5/31 22:57, Rene Rebe wrote:
> Hi Kewen,
> 
> thank you for your reply.
> 
>> on 2024/3/8 19:33, Rene Rebe wrote:
>>> This might not be the best timing -short before a major release-,
>>> however, Sam just commented on the bug I filled years ago [1], so here
>>> we go:
>>>
>>> Glibc uses .machine to determine assembler optimizations to use.
>>> However, since reworking the rs6000 .machine output selection in
>>> commit e154242724b084380e3221df7c08fcdbd8460674 22 May 2019, G5 as
>>> well as Cell, and even power4 w/ -maltivec currently resulted in
>>> power7. Mask _ALTIVEC away as the .machine selection already did for
>>> GFX and GPOPT.
>>
>> Thanks for fixing, this fix looks reasonable to me, OPTION_MASK_ALTIVEC
>> is a part of POWERPC_7400_MASK so any specified cpu type which has this
>> POWERPC_7400_MASK by default and isn't handled early in function
>> rs6000_machine_from_flags can suffer from this issue.
>>
>>>
>>> powerpc64-t2-linux-gnu-gcc  test.c -S -o - -mcpu=G5
>>> .file   "test.c"
>>> .machine power7
>>> .abiversion 2
>>> .section".text"
>>> .ident  "GCC: (GNU) 10.2.0"
>>> .section.note.GNU-stack,"",@progbits
>>>
>>
>> Nit: Could you also add one test case for this?
>>
>> btw, -mdejagnu-cpu=G5 can force the cpu type in dg-options.
> 
> It took me a while to allocate enough time to study dejagnu and write
> a suitable test, I hope this suits your needs:
> 
> --- ./gcc/testsuite/gcc.target/powerpc/pr97367.c.vanilla  2024-05-30 
> 18:26:29.839784279 +0200
> +++ ./gccc/testsuite/gcc.target/powerpc/pr97367.c 2024-05-30 
> 18:20:34.873818482 +0200
> @@ -0,0 +1,5 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target lp64 } */

Nit: I think this require-effective-target line isn't needed.

> +/* { dg-options "-S -mcpu=G5" } */

"dg-do compile" ensures "Compile with ‘-S’ to produce an assembly
code file.", so "-S" is redundant.  As hinted before, we want
-mdejagnu-cpu=G5 here rather than -mcpu=G5, because with some old
versions of dejagnu the command line arguments you set in RUNTESTFLAGS
will override the ones set in dg-options; -mdejagnu-cpu= is an internal
option used only by the test suite (and is powerpc specific).

Nit: I prefer to have one dummy function here to avoid possible
failures if users specify options which forbid an empty
translation unit.  Maybe something like:

int dummy ()
{
  return 0;
}
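
Putting the comments above together, the whole test file could look roughly
like the sketch below.  The scan-assembler pattern is copied unchanged from
the posted test; whether "power4" is the best pattern to match has not been
checked here.

```c
/* { dg-do compile } */
/* { dg-options "-mdejagnu-cpu=G5" } */

/* Keep the translation unit non-empty, so that options which reject an
   empty translation unit do not make the test fail for unrelated
   reasons.  */
int
dummy (void)
{
  return 0;
}

/* { dg-final { scan-assembler "power4" } } */
```

The dg- directives live in ordinary comments, so the file also compiles as
plain C outside the test harness.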

> +
> +/* { dg-final { scan-assembler "power4" } } */
> 
> I double checked it works and fails as expected.
> 
>>> We ship this in T2/Linux [2] since 2020 and it is tested on G5, Cell
>>> and Power8.
>>>
>>> Signed-off-by: René Rebe
>>>
>>> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97367
>>> [2] https://t2sde.org
>>>
>>> --- gcc-11.1.0-RC-20210423/gcc/config/rs6000/rs6000.cc.vanilla  
>>> 2021-04-25 22:57:16.964223106 +0200
>>> +++ gcc-11.1.0-RC-20210423/gcc/config/rs6000/rs6000.cc  2021-04-25 
>>> 22:57:27.193223841 +0200
>>> @@ -5765,7 +5765,7 @@
>>>HOST_WIDE_INT flags = rs6000_isa_flags;
>>>  
>>>/* Disable the flags that should never influence the .machine selection. 
>>>  */
>>> -  flags &= ~(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT | 
>>> OPTION_MASK_ISEL);
>>> +  flags &= ~(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT | 
>>> OPTION_MASK_ALTIVEC | OPTION_MASK_ISEL);
>>
>> Nit: This line is too long and needs re-format.
> 
> While I don't really find ~100 chars too long for modern standards,

It's required by https://gcc.gnu.org/codingconventions.html#C_Formatting
and there is a script to check it, contrib/check_GNU_style.sh,
as described at https://gcc.gnu.org/contribute.html#standards.
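
One possible way to wrap the statement, breaking before the operator as GNU
style prefers, is sketched below.  The wrapper function and the bit values of
the masks are placeholders so that the fragment builds on its own; the real
OPTION_MASK_* macros come from the rs6000 option machinery, and long long
stands in for HOST_WIDE_INT.

```c
/* Placeholder bit values, only so the fragment compiles on its own.  */
#define OPTION_MASK_PPC_GFXOPT (1LL << 0)
#define OPTION_MASK_PPC_GPOPT  (1LL << 1)
#define OPTION_MASK_ALTIVEC    (1LL << 2)
#define OPTION_MASK_ISEL       (1LL << 3)

/* Hypothetical wrapper around the statement from the patch.  */
long long
strip_machine_irrelevant_flags (long long flags)
{
  /* Disable the flags that should never influence the .machine
     selection.  */
  flags &= ~(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT
             | OPTION_MASK_ALTIVEC | OPTION_MASK_ISEL);
  return flags;
}
```

With OPTION_MASK_ALTIVEC masked out, a flag set that differs from an older cpu
only by Altivec no longer looks different to the later .machine checks.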

BR,
Kewen

> I'm happy to line break that for you once the above test is approved.
> 
> Thank you so much,
> 
>   René
> 


[PATCH 09/52 v2] Replace {FLOAT, {, LONG_}DOUBLE}_TYPE_SIZE with new hook mode_for_floating_type

2024-06-03 Thread Kewen.Lin
Hi Joseph,

on 2024/6/4 01:59, Joseph Myers wrote:
> On Sun, 2 Jun 2024, Kewen Lin wrote:
> 
>> +value less than or equal to mode precision of the mode used for C type
>> +@code{long double} (from hook @code{targetm.c.mode_for_floating_type}
>> +with tree_index TI_LONG_DOUBLE_TYPE).  If you do not define this macro,
>> +mode precision of the mode used for C type @code{long double} is the
>> +default.
> 
> Identifiers such as tree_index and TI_LONG_DOUBLE_TYPE should be enclosed 
> in @code{} in documentation (in this case it would be better to say "with 
> argument @code{TI_LONG_DOUBLE_TYPE}" rather than mentioning the tree_index 
> type of the argument).

Thanks for the review, updated as v2 (also incorporating Richi's comments on
"type type" typo):

Subject: [PATCH 09/52] Replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE with new hook
 mode_for_floating_type

Currently, the mode used for a floating point type is
determined by calling mode_for_size with the given type
precision (size), which returns the first mode of that
size in the specified class.  On PowerPC, three modes
(TF/KF/IF) share the same mode precision 128 (see [1]),
so this processing forces us to place TF first.  That
would require more adjustments in some generic code to
avoid unexpected mode conversions, and it would be even
worse if we eventually get rid of TF one day.  As Joseph
pointed out in [2], "floating types should have their
mode, not a poorly defined precision value".  As Joseph
and Richi suggested, this patch introduces one hook,
mode_for_floating_type, which returns the corresponding
mode for type float, double or long double.  The default
implementation returns SFmode for float and DFmode for
double or long double.  Ports which need special
treatment get their own port specific implementation in
separate patches (referring to how
{,LONG_}DOUBLE_TYPE_SIZE get used there).  All generic
uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE are replaced,
depending on the context, either with TYPE_PRECISION of
the corresponding type node or with GET_MODE_PRECISION
on the mode from mode_for_floating_type.  This patch
also poisons {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE, so most
port specific defines of them are removed; the few that
are worth keeping for readability are renamed with a
port specific prefix.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651017.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
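
To make the default behaviour described above concrete, here is a standalone
model of it.  This is only a sketch: the two enums are cut-down stand-ins for
GCC's tree_index and machine_mode, and the function name carries a model_
prefix to mark that it is not the code from targhooks.cc.

```c
#include <assert.h>

/* Cut-down stand-ins for the GCC enums, for illustration only.  */
enum model_tree_index
{
  TI_FLOAT_TYPE,
  TI_DOUBLE_TYPE,
  TI_LONG_DOUBLE_TYPE
};

enum model_machine_mode
{
  SFmode,
  DFmode
};

/* Model of the default hook: float gets SFmode, while double and
   long double both get DFmode.  */
enum model_machine_mode
model_default_mode_for_floating_type (enum model_tree_index ti)
{
  if (ti == TI_FLOAT_TYPE)
    return SFmode;
  assert (ti == TI_DOUBLE_TYPE || ti == TI_LONG_DOUBLE_TYPE);
  return DFmode;
}
```

A port whose long double is not DFmode would override the hook and return its
own mode for TI_LONG_DOUBLE_TYPE, instead of defining LONG_DOUBLE_TYPE_SIZE.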

gcc/ChangeLog:

* coretypes.h (enum tree_index): Forward declaration.
* defaults.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* doc/rtl.texi: Update document by replacing {FLOAT,DOUBLE}_TYPE_SIZE
with C type {float,double}.
* doc/tm.texi.in: Document new hook mode_for_floating_type, remove
document entries for {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE and
update document for WIDEST_HARDWARE_FP_SIZE.
* doc/tm.texi: Regenerate.
* emit-rtl.cc (init_emit_once): Replace DOUBLE_TYPE_SIZE by
calling targetm.c.mode_for_floating_type with TI_DOUBLE_TYPE.
* real.h (REAL_VALUE_TO_TARGET_LONG_DOUBLE): Use TYPE_PRECISION of
long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.
* system.h (FLOAT_TYPE_SIZE): Poison.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* target.def (mode_for_floating_type): New hook.
* targhooks.cc (default_mode_for_floating_type): New function.
(default_scalar_mode_supported_p): Update macros
{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE by calling
targetm.c.mode_for_floating_type with
TI_{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE.
* targhooks.h (default_mode_for_floating_type): New declaration.
* tree-core.h (enum tree_index): Specify underlying type unsigned
to sync with forward declaration in coretypes.h.
(NUM_FLOATN_TYPES): Explicitly convert to int.
(NUM_FLOATNX_TYPES): Likewise.
(NUM_FLOATN_NX_TYPES): Likewise.
* tree.cc (build_common_tree_nodes): Update macros
{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE by calling
targetm.c.mode_for_floating_type with
TI_{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE and set type mode accordingly.
---
 gcc/coretypes.h|  1 +
 gcc/defaults.h | 12 
 gcc/doc/rtl.texi   |  2 +-
 gcc/doc/tm.texi| 33 +
 gcc/doc/tm.texi.in | 27 +++
 gcc/emit-rtl.cc|  3 ++-
 gcc/real.h |  7 ---
 gcc/system.h   |  3 ++-
 gcc/target.def |  9 +
 gcc/targhooks.cc   | 18 +++---
 gcc/targhooks.h|  1 +
 gcc/tree-core.h| 13 +++--
 gcc/tree.cc| 18 +++---
 13 files changed, 77 insertions(+), 70 deletions(-)

diff 

Re: [PATCH 06/52] m2: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE

2024-06-03 Thread Kewen.Lin
Hi Joseph and Gaius,

on 2024/6/4 02:02, Joseph Myers wrote:
> On Sun, 2 Jun 2024, Kewen Lin wrote:
> 
>> diff --git a/gcc/m2/gm2-gcc/m2type.cc b/gcc/m2/gm2-gcc/m2type.cc
>> index 571923c08ef..d52cbdf0b99 100644
>> --- a/gcc/m2/gm2-gcc/m2type.cc
>> +++ b/gcc/m2/gm2-gcc/m2type.cc
>> @@ -1420,7 +1420,7 @@ build_m2_short_real_node (void)
>>/* Define `REAL'.  */
>>  
>>c = make_node (REAL_TYPE);
>> -  TYPE_PRECISION (c) = FLOAT_TYPE_SIZE;
>> +  TYPE_PRECISION (c) = TYPE_PRECISION (float_type_node);
>>layout_type (c);
>>return c;
>>  }
>> @@ -1433,7 +1433,7 @@ build_m2_real_node (void)
>>/* Define `REAL'.  */
>>  
>>c = make_node (REAL_TYPE);
>> -  TYPE_PRECISION (c) = DOUBLE_TYPE_SIZE;
>> +  TYPE_PRECISION (c) = TYPE_PRECISION (double_type_node);
>>layout_type (c);
>>return c;
>>  }
>> @@ -1447,7 +1447,7 @@ build_m2_long_real_node (void)
>>if (M2Options_GetIBMLongDouble ())
>>  {
>>longreal = make_node (REAL_TYPE);
>> -  TYPE_PRECISION (longreal) = LONG_DOUBLE_TYPE_SIZE;
>> +  TYPE_PRECISION (longreal) = TYPE_PRECISION (long_double_type_node);
> 
> This looks rather like m2 would still have the same problem the generic 
> code previously had: going via precision when that might not uniquely 
> determine the desired machine mode.  And so making sure to use the right 
> machine mode as done when setting up long_double_type_node etc. would be 
> better than keeping this code copying TYPE_PRECISION and hoping to 
> determine a machine mode from that.  It certainly looks like this code 
> wants to match float, double and long double, rather than possibly getting 
> a different mode with possibly the same TYPE_PRECISION.

Good point, sorry that I just did a replacement without checking the context.
If the above holds (Gaius can confirm or clarify), SET_TYPE_MODE should
also be applied here, that is:

diff --git a/gcc/m2/gm2-gcc/m2type.cc b/gcc/m2/gm2-gcc/m2type.cc
index d52cbdf0b99..5ff02a18876 100644
--- a/gcc/m2/gm2-gcc/m2type.cc
+++ b/gcc/m2/gm2-gcc/m2type.cc
@@ -1421,6 +1421,7 @@ build_m2_short_real_node (void)

   c = make_node (REAL_TYPE);
   TYPE_PRECISION (c) = TYPE_PRECISION (float_type_node);
+  SET_TYPE_MODE (c, TYPE_MODE (float_type_node));
   layout_type (c);
   return c;
 }
@@ -1434,6 +1435,7 @@ build_m2_real_node (void)

   c = make_node (REAL_TYPE);
   TYPE_PRECISION (c) = TYPE_PRECISION (double_type_node);
+  SET_TYPE_MODE (c, TYPE_MODE (double_type_node));
   layout_type (c);
   return c;
 }
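
To illustrate Joseph's concern, here is a minimal toy model rather than
GCC code.  All toy_* names are made up for this sketch, and the mode table
only assumes what the thread states: three 128-bit float modes (TF/KF/IF)
share one precision, and a precision-based lookup returns the first match.

```cpp
#include <cassert>

// Toy stand-ins for machine modes; not GCC's real enum.
enum toy_mode { TOY_VOIDmode, TOY_SFmode, TOY_DFmode,
                TOY_TFmode, TOY_KFmode, TOY_IFmode };

struct toy_mode_info { toy_mode mode; unsigned precision; };

// Order matters: a lookup by precision returns the first entry that matches.
const toy_mode_info toy_float_modes[] = {
  {TOY_SFmode, 32},  {TOY_DFmode, 64},
  {TOY_TFmode, 128}, {TOY_KFmode, 128}, {TOY_IFmode, 128},
};

toy_mode toy_mode_for_precision (unsigned precision)
{
  for (const toy_mode_info &m : toy_float_modes)
    if (m.precision == precision)
      return m.mode;
  return TOY_VOIDmode;
}

struct toy_type { unsigned precision; toy_mode mode; };

// Copy only the precision and let the "layout" step guess the mode.
toy_type toy_copy_by_precision (const toy_type &src)
{
  return {src.precision, toy_mode_for_precision (src.precision)};
}

// Copy precision and mode together, in the spirit of SET_TYPE_MODE above.
toy_type toy_copy_with_mode (const toy_type &src)
{
  return {src.precision, src.mode};
}
```

With a source type of {128, TOY_IFmode}, the precision-only copy ends up
with TOY_TFmode in this model, while the copy that carries the mode keeps
TOY_IFmode.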

I'm curious why the code above builds new nodes for short real and
real but reuses float128_type_node or long_double_type_node in some
cases.  Is there some special need for the former to have separate nodes?

> 
> (I don't know if the M2Options_GetIBMLongDouble call would be needed at 
> all once you use the machine mode for long double in a reliable way, or 
> whether this code could be further simplified.)

long_double_type_node should already take care of ibmlongdouble, IIUC it
would be like:

@@ -1443,13 +1445,7 @@ build_m2_long_real_node (void)
 {
   tree longreal;

-  /* Define `LONGREAL'.  */
-  if (M2Options_GetIBMLongDouble ())
-{
-  longreal = make_node (REAL_TYPE);
-  TYPE_PRECISION (longreal) = TYPE_PRECISION (long_double_type_node);
-}
-  else if (M2Options_GetIEEELongDouble ())
+  if (M2Options_GetIEEELongDouble ())
 longreal = float128_type_node;
   else
 longreal = long_double_type_node;

unless there is some special need for a separate node for
ibmlongdouble, in which case:

diff --git a/gcc/m2/gm2-gcc/m2type.cc b/gcc/m2/gm2-gcc/m2type.cc
index c808e3e8dbb..5e479e4bbce 100644
--- a/gcc/m2/gm2-gcc/m2type.cc
+++ b/gcc/m2/gm2-gcc/m2type.cc
@@ -1450,6 +1450,7 @@ build_m2_long_real_node (void)
 {
   longreal = make_node (REAL_TYPE);
   TYPE_PRECISION (longreal) = TYPE_PRECISION (long_double_type_node);
+  SET_TYPE_MODE (longreal, TYPE_MODE (long_double_type_node));
 }
   else if (M2Options_GetIEEELongDouble ())
 longreal = float128_type_node;

BR,
Kewen



[PATCH 02/52 v2] d: Replace use of LONG_DOUBLE_TYPE_SIZE

2024-06-03 Thread Kewen.Lin
Hi Iain,

on 2024/6/3 22:39, Iain Buclaw wrote:
> Excerpts from Kewen.Lin's message of June 3, 2024 10:57 am:
>> Hi Iain,
>>
>> on 2024/6/3 16:40, Iain Buclaw wrote:
>>> Excerpts from Kewen Lin's message of June 3, 2024 5:00 am:
>>>> Joseph pointed out "floating types should have their mode,
>>>> not a poorly defined precision value" in the discussion[1],
>>>> as he and Richi suggested, the existing macros
>>>> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
>>>> hook mode_for_floating_type.  To be prepared for that, this
>>>> patch is to replace use of LONG_DOUBLE_TYPE_SIZE in d with
>>>> TYPE_PRECISION of long_double_type_node.
>>>>
>>>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
>>>>
>>>
>>> Thanks, one question though: Is TYPE_PRECISION really equivalent to
>>> LONG_DOUBLE_TYPE_SIZE?
>>
>> Yes, it's guaranteed by the code in build_common_tree_nodes:
>>
>>   long_double_type_node = make_node (REAL_TYPE);
>>   TYPE_PRECISION (long_double_type_node) = LONG_DOUBLE_TYPE_SIZE;
>>   layout_type (long_double_type_node);
>>
>> , the macro LONG_DOUBLE_TYPE_SIZE is assigned to TYPE_PRECISION of
>> long_double_type_node, layout_type will only pick up one mode as
>> the given precision and won't change it.
>>
>>>
>>> Unless LONG_DOUBLE_TYPE_SIZE was poorly named to begin with, I'd assume
>>> the answer to be "no".
>>
>> I'm afraid it's poorly named before.
>>
> 
> Thanks for confirming Kewen.
> 
> I suspect then that this code is incorrectly using this macro, and it
> should instead be using:
> 
> int_size_in_bytes(long_double_type_node)
> 
> as any padding should be considered as part of the overall type size for
> the purpose that this field serves in the D part of the front-end.

Got it, thanks for the explanation and suggestion.
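
As a small sketch of the difference (a toy model, not GCC code): the toy_*
names and the 80-bit/16-byte figures are illustrative assumptions for an
x87-style extended type with padding, not values taken from any port.

```cpp
#include <cassert>

const unsigned TOY_BITS_PER_UNIT = 8;

// Toy description of a floating type: value bits versus storage bytes.
struct toy_fp_type { unsigned precision_bits; unsigned size_bytes; };

// Assumed example: 80 bits of precision stored in 16 bytes (padding included).
const toy_fp_type toy_long_double = {80, 16};

// What the old expression computed: precision divided by the unit size.
unsigned toy_size_from_precision (const toy_fp_type &t)
{
  return t.precision_bits / TOY_BITS_PER_UNIT;
}

// What a size-in-bytes query reports: the full storage size.
unsigned toy_size_in_bytes (const toy_fp_type &t)
{
  return t.size_bytes;
}
```

In this model the precision-based value is 10 bytes while the storage size
is 16 bytes, which is the kind of gap the padding remark is about.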

> 
> Are you able to update the patch this way instead? Otherwise I'm happy
> to push the change instead.

Sure, updated as below:

Subject: [PATCH 02/52] d: Replace use of LONG_DOUBLE_TYPE_SIZE

Joseph pointed out "floating types should have their mode,
not a poorly defined precision value" in the discussion[1],
as he and Richi suggested, the existing macros
{FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
hook mode_for_floating_type.  To be prepared for that, this
patch is to remove the only one use of LONG_DOUBLE_TYPE_SIZE
in d.  Iain found that LONG_DOUBLE_TYPE_SIZE is poorly named
and used incorrectly before, so this patch follows his advice
with int_size_in_bytes.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html

Co-authored-by: Iain Buclaw 

gcc/d/ChangeLog:

* d-target.cc (Target::_init): Use int_size_in_bytes of
long_double_type_node to replace the expression with
LONG_DOUBLE_TYPE_SIZE for c.long_doublesize assignment.
---
 gcc/d/d-target.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/d/d-target.cc b/gcc/d/d-target.cc
index 127b9d7ce7c..dd46e535891 100644
--- a/gcc/d/d-target.cc
+++ b/gcc/d/d-target.cc
@@ -163,7 +163,7 @@ Target::_init (const Param &)
   this->c.intsize = (INT_TYPE_SIZE / BITS_PER_UNIT);
   this->c.longsize = (LONG_TYPE_SIZE / BITS_PER_UNIT);
   this->c.long_longsize = (LONG_LONG_TYPE_SIZE / BITS_PER_UNIT);
-  this->c.long_doublesize = (LONG_DOUBLE_TYPE_SIZE / BITS_PER_UNIT);
+  this->c.long_doublesize = int_size_in_bytes (long_double_type_node);
   this->c.wchar_tsize = (WCHAR_TYPE_SIZE / BITS_PER_UNIT);

   this->c.bitFieldStyle = targetm.ms_bitfield_layout_p (unknown_type_node)
--
2.43.0

BR,
Kewen


Re: [PATCH 17/52] gcn: Remove macros {FLOAT, DOUBLE, LONG_DOUBLE}_TYPE_SIZE

2024-06-03 Thread Kewen.Lin
Hi Andrew,

on 2024/6/3 17:14, Andrew Stubbs wrote:
> On 03/06/2024 04:01, Kewen Lin wrote:
>> This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
>> defines in gcn port.
>>
>> gcc/ChangeLog:
>>
>> * config/gcn/gcn.h (FLOAT_TYPE_SIZE): Remove.
>> (DOUBLE_TYPE_SIZE): Likewise.
>> (LONG_DOUBLE_TYPE_SIZE): Likewise.
> 
> 
> Assuming that this does not enable some long-double mode support that wasn't 
> present before then this LGTM.

Thanks!  Right, it doesn't, as the default hook implementation returns DFmode
for the long double type.

> 
> GCN does have some partially implemented support for HFmode ... do I need to 
> do something new for that to work?

For this hook, no, as it's mainly for the float, double and long double types
(the non-decimal floating point types the C language supports).  If you are
referring to _Float16, I guess you may be interested in another hook,
TARGET_FLOATN_MODE, which is for the _FloatN types.

BR,
Kewen

> 
> Andrew
> 
>> ---
>>   gcc/config/gcn/gcn.h | 3 ---
>>   1 file changed, 3 deletions(-)
>>
>> diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
>> index afa615320ca..e3bfd29c17d 100644
>> --- a/gcc/config/gcn/gcn.h
>> +++ b/gcc/config/gcn/gcn.h
>> @@ -111,9 +111,6 @@
>>   #define INT_TYPE_SIZE  32
>>   #define LONG_TYPE_SIZE  64
>>   #define LONG_LONG_TYPE_SIZE  64
>> -#define FLOAT_TYPE_SIZE  32
>> -#define DOUBLE_TYPE_SIZE  64
>> -#define LONG_DOUBLE_TYPE_SIZE  64
>>   #define DEFAULT_SIGNED_CHAR  1
>>   #define PCC_BITFIELD_TYPE_MATTERS 1
>>   
> 



Re: [PATCH 51/52] sparc: New hook implementation sparc_c_mode_for_floating_type

2024-06-03 Thread Kewen.Lin
Hi Eric,

on 2024/6/3 17:02, Eric Botcazou wrote:
>>  * config/sparc/sparc.cc (sparc_c_mode_for_floating_type): New
>>  (TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
>>  (FLOAT_TYPE_SIZE): Remove.
>>  (DOUBLE_TYPE_SIZE): Likewise.
>>  (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>>  (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>>  (sparc_type_code): Replace FLOAT_TYPE_SIZE with TYPE_PRECISION of
>>  float_type_node.
>>  * config/sparc/sparc.h (FLOAT_TYPE_SIZE): Remove.
>>  (DOUBLE_TYPE_SIZE): Remove.
>>  * config/sparc/freebsd.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>>  (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>>  * config/sparc/linux.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>>  (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>>  * config/sparc/linux64.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>>  (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>>  * config/sparc/netbsd-elf.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>>  (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>>  * config/sparc/openbsd64.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>>  (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>>  * config/sparc/sol2.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>>  (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>>  * config/sparc/sp-elf.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>>  (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
>>  * config/sparc/sp64-elf.h (LONG_DOUBLE_TYPE_SIZE): Rename to ...
>>  (SPARC_LONG_DOUBLE_TYPE_SIZE): ... this.
> 
> OK, modulo the following tweaks:

Thanks for the review!

> 
>> --- a/gcc/config/sparc/sparc.cc
>> +++ b/gcc/config/sparc/sparc.cc
>> @@ -718,6 +718,7 @@ static bool sparc_vectorize_vec_perm_const
>> (machine_mode, machine_mode, const vec_perm_indices &);
>>  static bool sparc_can_follow_jump (const rtx_insn *, const rtx_insn *);
>>  static HARD_REG_SET sparc_zero_call_used_regs (HARD_REG_SET);
>> +static machine_mode sparc_c_mode_for_floating_type (enum tree_index);
>>  
>>  #ifdef SUBTARGET_ATTRIBUTE_TABLE
>>  /* Table of valid machine attributes.  */
>> @@ -971,6 +972,9 @@ char sparc_hard_reg_printed[8];
>>  #undef TARGET_ZERO_CALL_USED_REGS
>>  #define TARGET_ZERO_CALL_USED_REGS sparc_zero_call_used_regs
>>
>> +#undef TARGET_C_MODE_FOR_FLOATING_TYPE
>> +#define TARGET_C_MODE_FOR_FLOATING_TYPE sparc_c_mode_for_floating_type
>> +
>>  struct gcc_target targetm = TARGET_INITIALIZER;
>>
>>  /* Return the memory reference contained in X if any, zero otherwise.  */
>> @@ -9824,16 +9828,9 @@ sparc_assemble_integer (rtx x, unsigned int size, int aligned_p)
>>  #define LONG_LONG_TYPE_SIZE (BITS_PER_WORD * 2)
>>  #endif
>>
>> -#ifndef FLOAT_TYPE_SIZE
>> -#define FLOAT_TYPE_SIZE BITS_PER_WORD
>> -#endif
>> -
>> -#ifndef DOUBLE_TYPE_SIZE
>> -#define DOUBLE_TYPE_SIZE (BITS_PER_WORD * 2)
>> -#endif
>> -
>> -#ifndef LONG_DOUBLE_TYPE_SIZE
>> -#define LONG_DOUBLE_TYPE_SIZE (BITS_PER_WORD * 2)
>> +/* LONG_DOUBLE_TYPE_SIZE get poisoned, so add SPARC_ prefix.  */
>> +#ifndef SPARC_LONG_LONG_TYPE_SIZE
>> +#define SPARC_LONG_DOUBLE_TYPE_SIZE (BITS_PER_WORD * 2)
>>  #endif
>>
>>  unsigned long
> 
> You can delete {SPARC_}LONG_DOUBLE_TYPE_SIZE too.

Good point, sparc.h already defines the default.

> 
>> @@ -9920,7 +9917,7 @@ sparc_type_code (tree type)
>>/* Carefully distinguish all the standard types of C,
>>   without messing up if the language is not C.  */
>>
>> -  if (TYPE_PRECISION (type) == FLOAT_TYPE_SIZE)
>> +  if (TYPE_PRECISION (type) == TYPE_PRECISION (float_type_node))
>>  return (qualifiers | 6);
>>
>>else
>> @@ -13984,4 +13981,16 @@ sparc_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>>    return need_zeroed_hardregs;
>>  }
>>
>> +/* Implement TARGET_C_MODE_FOR_FLOATING_TYPE.  Return TFmode or DFmode
>> +   for TI_LONG_DOUBLE_TYPE which is for long double type, go with the
>> +   default one for the others.  */
>> +
>> +static machine_mode
>> +sparc_c_mode_for_floating_type (enum tree_index ti)
>> +{
>> +  if (ti == TI_LONG_DOUBLE_TYPE)
>> +return SPARC_LONG_DOUBLE_TYPE_SIZE == 128 ? TFmode : DFmode;
>> +  return default_mode_for_floating_type (ti);
>> +}
>> +
>>  #include "gt-sparc.h"
> 
> I think that TI_LONG_DOUBLE_TYPE is self-explanatory so just:
> 
> /* Implement TARGET_C_MODE_FOR_FLOATING_TYPE.  Return TFmode or DFmode
>for TI_LONG_DOUBLE_TYPE and the default for others.
> 

Will adjust it, thanks!
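
For readers following the series, the shape of such a port hook can be
sketched with a toy model (not GCC code).  The toy_* names are invented,
and passing the long double width as a parameter is an assumption of this
sketch; in the patch it comes from SPARC_LONG_DOUBLE_TYPE_SIZE.

```cpp
#include <cassert>

// Toy stand-ins for machine modes and tree indices.
enum toy_machine_mode { TOY_SFmode, TOY_DFmode, TOY_TFmode };
enum toy_tree_index : unsigned {
  TOY_TI_FLOAT_TYPE,
  TOY_TI_DOUBLE_TYPE,
  TOY_TI_LONG_DOUBLE_TYPE
};

// Default as described in 09/52: SFmode for float, DFmode for double
// and long double.
toy_machine_mode toy_default_mode_for_floating_type (toy_tree_index ti)
{
  return ti == TOY_TI_FLOAT_TYPE ? TOY_SFmode : TOY_DFmode;
}

// Port override: only long double is special, everything else falls
// back to the default.
toy_machine_mode
toy_port_mode_for_floating_type (toy_tree_index ti, unsigned long_double_bits)
{
  if (ti == TOY_TI_LONG_DOUBLE_TYPE)
    return long_double_bits == 128 ? TOY_TFmode : TOY_DFmode;
  return toy_default_mode_for_floating_type (ti);
}
```

The override touches a single case and delegates the rest, so a port that
does not override the hook keeps the default behaviour.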

BR,
Kewen



Re: [PATCH 09/52] Replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE with new hook mode_for_floating_type

2024-06-03 Thread Kewen.Lin
Hi Richi,

on 2024/6/3 14:49, Richard Biener wrote:
> On Mon, Jun 3, 2024 at 5:02 AM Kewen Lin  wrote:
>>
>> Currently how we determine which mode will be used for a
>> floating point type is that for a given type precision
>> (size) call mode_for_size to get the first mode which has
>> this size in the specified class.  On Powerpc, we have
>> three modes (TF/KF/IF) having the same mode precision 128
>> (see[1]), so the processing forces us to have to place TF
>> at the first place, it would require us to make more
>> adjustment in some generic code to avoid some unexpected
>> mode conversions and it would be even worse if we get rid
>> of TF eventually one day.  And as Joseph pointed out in [2],
>> "floating types should have their mode, not a poorly
>> defined precision value", as Joseph and Richi suggested,
>> this patch is to introduce one hook mode_for_floating_type
>> which returns the corresponding mode for type float, double
>> or long double.  The default implementation returns SFmode
>> for float and DFmode for double or long double.  For ports
>> which need special treatment, there are some other patches
>> for their own port specific implementation (referring to
>> how {,LONG_}DOUBLE_TYPE_SIZE get used there).  For all
>> generic uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE, depending
>> on the context, some of them are replaced with TYPE_PRECISION
>> of the according type node, some other are replaced with
>> GET_MODE_PRECISION on the mode from mode_for_floating_type.
>> This patch also poisons {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE,
>> so most defines of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE in port
>> specific are removed, but there are still some which are
>> good to be kept for readability then they get renamed with
>> port specific prefix.
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651017.html
>> [2] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
>>
>> gcc/ChangeLog:
>>
>> * coretypes.h (enum tree_index): Forward declaration.
>> * defaults.h (FLOAT_TYPE_SIZE): Remove.
>> (DOUBLE_TYPE_SIZE): Likewise.
>> (LONG_DOUBLE_TYPE_SIZE): Likewise.
>> * doc/rtl.texi: Update document by replacing {FLOAT,DOUBLE}_TYPE_SIZE
>> with C type {float,double}.
>> * doc/tm.texi.in: Document new hook mode_for_floating_type, remove
>> document entries for {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE and
>> update document for WIDEST_HARDWARE_FP_SIZE.
>> * doc/tm.texi: Regenerate.
>> * emit-rtl.cc (init_emit_once): Replace DOUBLE_TYPE_SIZE by
>> calling targetm.c.mode_for_floating_type with TI_DOUBLE_TYPE.
>> * real.h (REAL_VALUE_TO_TARGET_LONG_DOUBLE): Use TYPE_PRECISION of
>> long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.
>> * system.h (FLOAT_TYPE_SIZE): Poison.
>> (DOUBLE_TYPE_SIZE): Likewise.
>> (LONG_DOUBLE_TYPE_SIZE): Likewise.
>> * target.def (mode_for_floating_type): New hook.
>> * targhooks.cc (default_mode_for_floating_type): New function.
>> (default_scalar_mode_supported_p): Update macros
>> {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE by calling
>> targetm.c.mode_for_floating_type with
>> TI_{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE.
>> * targhooks.h (default_mode_for_floating_type): New declaration.
>> * tree-core.h (enum tree_index): Specify underlying type unsigned
>> to sync with forward declaration in coretypes.h.
>> (NUM_FLOATN_TYPES): Explicitly convert to int.
>> (NUM_FLOATNX_TYPES): Likewise.
>> (NUM_FLOATN_NX_TYPES): Likewise.
>> * tree.cc (build_common_tree_nodes): Update macros
>> {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE by calling
>> targetm.c.mode_for_floating_type with
>> TI_{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE and set type mode accordingly.
>> ---
>>  gcc/coretypes.h|  1 +
>>  gcc/defaults.h | 12 
>>  gcc/doc/rtl.texi   |  2 +-
>>  gcc/doc/tm.texi| 33 +
>>  gcc/doc/tm.texi.in | 27 +++
>>  gcc/emit-rtl.cc|  3 ++-
>>  gcc/real.h |  7 ---
>>  gcc/system.h   |  3 ++-
>>  gcc/target.def |  9 +
>>  gcc/targhooks.cc   | 18 +++---
>>  gcc/targhooks.h|  1 +
>>  gcc/tree-core.h| 13 +++--
>>  gcc/tree.cc| 18 +++---
>>  13 files changed, 77 insertions(+), 70 deletions(-)
>>
>> diff --git a/gcc/coretypes.h b/gcc/coretypes.h
>> index 1ac6f0abea3..00c1c58bd8c 100644
>> --- a/gcc/coretypes.h
>> +++ b/gcc/coretypes.h
>> @@ -100,6 +100,7 @@ struct gimple;
>>  typedef gimple *gimple_seq;
>>  struct gimple_stmt_iterator;
>>  class code_helper;
>> +enum tree_index : unsigned;
>>
>>  /* Forward declare rtx_code, so that we can use it in target hooks without
>> needing to pull in rtl.h.  */
>> diff --git a/gcc/defaults.h b/gcc/defaults.h
>> index 92f3e07f742..ac2d25852ab 100644
>> --- 

Re: [PATCH 02/52] d: Replace use of LONG_DOUBLE_TYPE_SIZE

2024-06-03 Thread Kewen.Lin
Hi Iain,

on 2024/6/3 16:40, Iain Buclaw wrote:
> Excerpts from Kewen Lin's message of June 3, 2024 5:00 am:
>> Joseph pointed out "floating types should have their mode,
>> not a poorly defined precision value" in the discussion[1],
>> as he and Richi suggested, the existing macros
>> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
>> hook mode_for_floating_type.  To be prepared for that, this
>> patch is to replace use of LONG_DOUBLE_TYPE_SIZE in d with
>> TYPE_PRECISION of long_double_type_node.
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
>>
> 
> Thanks, one question though: Is TYPE_PRECISION really equivalent to
> LONG_DOUBLE_TYPE_SIZE?

Yes, it's guaranteed by the code in build_common_tree_nodes:

  long_double_type_node = make_node (REAL_TYPE);
  TYPE_PRECISION (long_double_type_node) = LONG_DOUBLE_TYPE_SIZE;
  layout_type (long_double_type_node);

, the macro LONG_DOUBLE_TYPE_SIZE is assigned to TYPE_PRECISION of
long_double_type_node, layout_type will only pick up one mode as
the given precision and won't change it.

> 
> Unless LONG_DOUBLE_TYPE_SIZE was poorly named to begin with, I'd assume
> the answer to be "no".

I'm afraid it's poorly named before.

> 
> i.e: TYPE_PRECISION = 80, but LONG_DOUBLE_TYPE_SIZE = 96 or 128.

From what I interpreted from the code, it should never happen.

BR,
Kewen



Re: [PATCH-1, rs6000] Add a new type of CC mode - CCBCD for bcd insns [PR100736, PR114732]

2024-05-31 Thread Kewen.Lin
Hi Haochen,

on 2024/5/30 11:14, HAO CHEN GUI wrote:
> Hi Kewen,
> 
> On 2024/5/29 13:26, Kewen.Lin wrote:
>> I can understand re-using "unordered" and "eq" will save some efforts than
>> doing with unspecs, but they are actually RTL codes instead of bits on the
>> specific hardware CR, a downside is that people who isn't aware of this
>> design point can have some misunderstanding when reading/checking the code
>> or dumping, from this perspective unspecs (with reasonable name) can be
>> more meaningful.  Normally adopting RTL code is better since they have the
>> chance to be considered (optimized) in generic pass/code, but it isn't the
>> case here as we just use the code itself but not be with the same semantic
>> (meaning).  Looking forward to others' opinions on this, if we want to adopt
>> "unordered" and "eq" like what this patch does, I think we should at least
>> emphasize such points in rs6000-modes.def.
> 
> Thanks so much for your comments. IMHO, the core is if we can re-define
> "unordered" or "eq" for certain CC mode on a specific target. If we can't or
> it's unsafe, we have to use the unspecs. In this case, I just want to define
> the code "unordered" on CCBCD as testing if the bit 3 is set on this CR field.

But my point is that "unordered" has its own semantics, and it looks a bit
tricky to adopt it for the result of a BCD comparison, which only has
"invalid" and "overflow" besides the normal outcomes, though I can understand
that this patch wants to use it to test bit 3, since for float comparison
bit 3 is for "unordered".  However, IMHO it would be clearer to have one
unspec to test bit 3 when bit 3 doesn't actually mean an unordered result.

> Actually rs6000 already use "lt" code to test if bit 0 is set for vector
> compare instructions. The following expand is an example.

Yeah, but that doesn't mean it's the most sensible way to do this.  IMHO it
suffers from a similar issue and can be improved as well.

> 
> (define_expand "vector_ae__p"
>   [(parallel
> [(set (reg:CC CR6_REGNO)
>   (unspec:CC [(ne:CC (match_operand:VI 1 "vlogical_operand")
>  (match_operand:VI 2 "vlogical_operand"))]
>UNSPEC_PREDICATE))
>  (set (match_dup 3)
>   (ne:VI (match_dup 1)
>  (match_dup 2)))])
>(set (match_operand:SI 0 "register_operand" "=r")
> (lt:SI (reg:CC CR6_REGNO)
>(const_int 0)))
>(set (match_dup 0)
> (xor:SI (match_dup 0)
> (const_int 1)))]
> 
> I think the "lt" on CC just doesn't mean it compares if CC value is less than
> an integer. It just tests the "lt" bit (bit 0) is set or not on this CC.

But bit 0 no longer represents an "lt" comparison result in this context.

BR,
Kewen

> 
>   Looking forward to your and Segher's further invaluable comments.
> 
> Thanks
> Gui Haochen



Re: [PATCH v3 #1/2] [rs6000] adjust return_pc debug attrs

2024-05-31 Thread Kewen.Lin
on 2024/5/29 14:52, Alexandre Oliva wrote:
> On May 27, 2024, "Kewen.Lin"  wrote:
> 
>> I wonder if it's possible to have a test case for this?
> 
> gcc.dg/guality/pr54519-[34].c at -O[1g] are fixed by this patch on

Nice!

> ppc64le-linux-gnu.  Are these the sort of test case you're interested

Yes, I was curious whether we can have some testing coverage on this.  As
Segher pointed out, it would be good to have this information in the commit
log.

BR,
Kewen

> in, or are you looking for something that tests the offsets in debug
> info, rather than the end-to-end debugging feature?
> 





Re: [PATCH v2] add explicit ABI and align options to pr88233.c

2024-05-31 Thread Kewen.Lin
on 2024/5/29 14:32, Alexandre Oliva wrote:
> On May 26, 2024, "Kewen.Lin"  wrote:
> 
>> Hi,
>> on 2024/4/22 17:38, Alexandre Oliva wrote:
>>> Ping?
>>> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566530.html
>>> (modified version follows)
> 
>> Segher originated this test case, I was expecting he can chime in this. :)
> 
> Me too ;-)
> 
>>> We've observed failures of this test on powerpc configurations that
>>> default to different calling conventions and alignment requirements.
> 
>> It seems that it was using the original "BE" and "LE" guards to shadow
>> ABIs, could you share some more on how you found this failure?  It seems
>> that your test environment with -mstrict-align turned on by default?  And
>> also having a ABI which passing small struct return value in register?
> 
> Exactly, AdaCore's ppc64-vx7r2 are configured so as to enable
> -mstrict-align and -freg-struct-return by default.

OK, thanks for the information!

> 
> But since these settings may change depending on the target variant, I
> figured it would be useful to record what the assumptions are that the
> test makes.  That one of these settings changed depending on endianness
> and affected codegen was, to me, further evidence that this would be
> useful, so, with the explicit settings, I could restore the original
> test's expectations.

Got it, but it also means we would probably be testing without the default
ABI of the test environment, so someone may argue this testing is of less
value.  Going by the original PR, maybe we can drop the scan for the load
insns and just keep the scan-not on mtvsr; the test then becomes insensitive
to the alignment and to how the struct result is passed.  Looking forward to
Segher's opinion on this patch. :)

BR,
Kewen



Re: [PATCH-1, rs6000] Add a new type of CC mode - CCBCD for bcd insns [PR100736, PR114732]

2024-05-28 Thread Kewen.Lin
Hi,

on 2024/4/30 15:18, HAO CHEN GUI wrote:
> Hi,
>   It's the first patch of a series of patches optimizing CC modes on
> rs6000.
> 
>   bcd insns set all four bits of a CR field. But it has different single
> bit reverse behavior than CCFP's. The forth bit of bcd cr fields is used
> to indict overflow or invalid number. It's not a bit for unordered test.
> So the "le" test should be reversed to "gt" not "ungt". The "ge" test
> should be reversed to "lt" not "unlt". That's the root cause of PR100736
> and PR114732.
> 
>   This patch fixes the issue by adding a new type of CC mode - CCBCD for
> all bcd insns. Here a new setcc_rev pattern is added for ccbcd. It will
> be merged to a uniform pattern which is for all CC modes in sequential
> patch.

Thanks for doing this, adding one more CC mode specifically for BCD looks
reasonable and makes the code clearer.
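
The reversal rule described in the quoted text can be written down as a
tiny toy model (not the rs6000 implementation).  The toy_* names are
invented for illustration, and only the "le" and "ge" cases named above
are modelled.

```cpp
#include <cassert>

// Toy comparison codes, named after the RTL codes mentioned above.
enum toy_cond { TOY_LT, TOY_GT, TOY_LE, TOY_GE, TOY_UNLT, TOY_UNGT };

// Toy CC kinds: floating-point compare versus BCD compare.
enum toy_cc_kind { TOY_CCFP, TOY_CCBCD };

// Reverse a condition.  For CCFP the fourth CR bit means "unordered", so
// the reverse of "le" must also accept unordered ("ungt").  For CCBCD that
// bit flags overflow or an invalid number, so the reverse is plain "gt".
toy_cond toy_reverse_condition (toy_cond c, toy_cc_kind kind)
{
  switch (c)
    {
    case TOY_LE:
      return kind == TOY_CCFP ? TOY_UNGT : TOY_GT;
    case TOY_GE:
      return kind == TOY_CCFP ? TOY_UNLT : TOY_LT;
    default:
      /* Other codes are outside this sketch and returned unchanged.  */
      return c;
    }
}
```

Keeping the CC kind as an explicit input is what a separate CCBCD mode buys:
the same RTL code can be reversed differently depending on the mode.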

> 
>   The rtl code "unordered" is still used for testing overflow or
> invalid number. IMHO, the "unordered" on a CC mode can be considered as
> testing the forth bit of a CR field setting or not. The "eq" on a CC mode
> can be considered as testing the third bit setting or not. Thus we avoid
> creating lots of unspecs for the CR bit testing.

I can understand that re-using "unordered" and "eq" saves some effort
compared with unspecs, but they are RTL codes rather than bits of the
hardware CR.  A downside is that people who aren't aware of this design
point can misunderstand the code or the dumps when reading or checking
them; from this perspective unspecs (with reasonable names) can be more
meaningful.  Normally adopting an RTL code is better since it has the
chance to be considered (optimized) in generic passes/code, but that isn't
the case here, as we just use the code itself without its usual semantics.
Looking forward to others' opinions on this.  If we want to adopt
"unordered" and "eq" like this patch does, I think we should at least
emphasize such points in rs6000-modes.def.

> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?

Some minor comments are inlined, Segher did a lot of work on CC, I'm looking
forward to his review on this patch series. :)

> 
> Thanks
> Gui Haochen
> 
> 
> ChangeLog
> rs6000: Add a new type of CC mode - CCBCD for bcd insns
> 
> gcc/
>   PR target/100736
>   PR target/114732
>   * config/rs6000/altivec.md (bcd_): Replace CCFP
>   with CCBCD.
>   (*bcd_test_): Likewise.
>   (*bcd_test2_): Likewise.
>   (bcd__): Likewise.
>   (*bcdinvalid_): Likewise.
>   (bcdinvalid_): Likewise.
>   (bcdshift_v16qi): Likewise.
>   (bcdmul10_v16qi): Likewise.
>   (bcddiv10_v16qi): Likewise.
>   (peephole for bcd_add/sub): Likewise.
>   * config/rs6000/predicates.md (branch_comparison_operator): Add CCBCD
>   and its supported comparison codes.
>   * config/rs6000/rs6000-modes.def (CC_MODE): Add CCBCD.
>   * config/rs6000/rs6000.cc (validate_condition_mode): Add CCBCD
>   assertion.
>   * config/rs6000/rs6000.md (CC_any): Add CCBCD.
>   (ccbcd_rev): New code iterator.
>   (*_cc): New insn and split pattern for CCBCD reverse
>   compare.
> 
> gcc/testsuite/
>   PR target/100736
>   PR target/114732
>   * gcc.target/powerpc/pr100736.c: New.
>   * gcc.target/powerpc/pr114732.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index bb20441c096..9fa8cf89f61 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -4443,7 +4443,7 @@ (define_insn "bcd_"
> (match_operand:VBCD 2 "register_operand" "v")
> (match_operand:QI 3 "const_0_to_1_operand" "n")]
>UNSPEC_BCD_ADD_SUB))
> -   (clobber (reg:CCFP CR6_REGNO))]
> +   (clobber (reg:CCBCD CR6_REGNO))]
>"TARGET_P8_VECTOR"
>"bcd. %0,%1,%2,%3"
>[(set_attr "type" "vecsimple")])
> @@ -4454,8 +4454,8 @@ (define_insn "bcd_"
>  ;; probably should be one that can go in the VMX (Altivec) registers, so we
>  ;; can't use DDmode or DFmode.

Here is a paragraph of comments above:

;; Use a floating point type (V2DFmode) for the compare to set CR6 so that we
;; can use the unordered test for BCD nans and add/subtracts that overflow.  An
;; UNORDERED test on an integer type (like V1TImode) is not defined.  The type
;; probably should be one that can go in the VMX (Altivec) registers, so we
;; can't use DDmode or DFmode.

Does it still hold?  It's not obvious where the code checks that an unordered
test must be on an fp type (mode).  If it still takes effect, an "unspec"
would help to get rid of this restriction.  Otherwise, this comment should be
updated and we can drop this workaround with V2DF here.

>  (define_insn "*bcd_test_"
> -  [(set (reg:CCFP CR6_REGNO)
> - (compare:CCFP
> +  [(set (reg:CCBCD CR6_REGNO)
> + (compare:CCBCD
>(unspec:V2DF [(match_operand:VBCD 1 

Re: [RFC/PATCH] Replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE with new hook

2024-05-28 Thread Kewen.Lin
Hi Richi and Joseph,

on 2024/5/24 20:23, Richard Biener wrote:
> On Fri, May 24, 2024 at 12:20 PM Kewen.Lin  wrote:
>> btw, the attached patch is bootstrapped and regtested on
>> powerpc64-linux-gnu and powerpc64le-linux-gnu with all
>> languages on, cross cc1 built well for affected ports.
> 
> Looks reasonable to me - I'd split language changes out but
> keep target and middle-end together.  The middle-end parts
> look good to me - I'm always a bit nervous when using
> size and precision exchangably, esp. for FP, but it seems
> this has been done before.

Thanks for the suggestion!  I'll split them into a patch series
by component soon and follow this suggestion when committing
(some preparatory language changes go first, and the others get
squashed together).

on 2024/5/29 05:06, Joseph Myers wrote:
> On Fri, 24 May 2024, Kewen.Lin wrote:
> 
>> Following your suggestion and comments, I made this patch
>> for mode_for_floating_type first, considering this touches
>> a few FE and port specific code, I think I have to split
>> it into a patch series.  Before making that, I'd like to
>> ensure this meets what you expected, and also seek for the
> 
> The general idea seems reasonable (I haven't reviewed it in detail).  
> Note that when removing a target macro, it's a good idea to add it to the 
> "Old target macros that have moved to the target hooks structure." list 
> (of #pragma GCC poison) in system.h to ensure any new target that was 
> originally written before the change doesn't accidentally get into GCC 
> while still using the old macros.
> 

Thanks for the comments on target macro removal!  I found that poisoning
means we can't use such macros any more, even if they have become port
specific.  Some targets such as pa redefine these macros in subtarget
headers, or use them in other macro definitions.  Since keeping them
gives better readability, I didn't try to change them in this RFC/PATCH;
I'll rename them with a target prefix in the following patch series.
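
A minimal sketch of the poisoning effect discussed above, assuming a
hypothetical port prefix MYPORT_ (the macro and function names below are
illustrative only and are not taken from any real port).  Once an identifier
is poisoned, any later appearance of it is a hard error, so a port that still
wants a readable size macro has to rename it:

```c
#include <assert.h>

/* Hypothetical port-prefixed replacement for the formerly generic
   LONG_DOUBLE_TYPE_SIZE; the MYPORT_ prefix is an assumption.  */
#define MYPORT_LONG_DOUBLE_TYPE_SIZE 64

/* From here on, any use of the old generic names fails to compile,
   even inside an #ifdef.  */
#pragma GCC poison FLOAT_TYPE_SIZE DOUBLE_TYPE_SIZE LONG_DOUBLE_TYPE_SIZE

/* Port code keeps working because it only refers to the prefixed name.  */
static int
myport_long_double_bits (void)
{
  assert (MYPORT_LONG_DOUBLE_TYPE_SIZE % 8 == 0);
  return MYPORT_LONG_DOUBLE_TYPE_SIZE;
}
```

Adding a line such as "int x = LONG_DOUBLE_TYPE_SIZE;" after the pragma
would be rejected at compile time, which is the safety net the poison list
provides against ports written before the change.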

BR,
Kewen



Re: [PATCHv3] Optab: add isfinite_optab for __builtin_isfinite

2024-05-28 Thread Kewen.Lin
on 2024/5/28 20:09, Richard Biener wrote:
> On Tue, May 28, 2024 at 9:09 AM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> on 2024/5/27 20:54, Richard Biener wrote:
>>> On Mon, May 27, 2024 at 11:37 AM HAO CHEN GUI  wrote:
>>>>
>>>> Hi,
>>>>   This patch adds an optab for __builtin_isfinite. The finite check can be
>>>> implemented on rs6000 by a single instruction. It needs an optab to be
>>>> expanded to the certain sequence of instructions.
>>>>
>>>>   The subsequent patches will implement the expand on rs6000.
>>>>
>>>>   Compared to previous version, the main change is to specify acceptable
>>>> modes for the optab.
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652170.html
>>>>
>>>>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>>>> regressions. Is this OK for trunk?
>>>>
>>>> Thanks
>>>> Gui Haochen
>>>>
>>>> ChangeLog
>>>> optab: Add isfinite_optab for isfinite builtin
>>>>
>>>> gcc/
>>>> * builtins.cc (interclass_mathfn_icode): Set optab to 
>>>> isfinite_optab
>>>> for isfinite builtin.
>>>> * optabs.def (isfinite_optab): New.
>>>> * doc/md.texi (isfinite): Document.
>>>>
>>>>
>>>> patch.diff
>>>> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
>>>> index f8d94c4b435..b8432f84020 100644
>>>> --- a/gcc/builtins.cc
>>>> +++ b/gcc/builtins.cc
>>>> @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>>>>errno_set = true; builtin_optab = ilogb_optab; break;
>>>>  CASE_FLT_FN (BUILT_IN_ISINF):
>>>>builtin_optab = isinf_optab; break;
>>>> -case BUILT_IN_ISNORMAL:
>>>>  case BUILT_IN_ISFINITE:
>>>> +  builtin_optab = isfinite_optab; break;
>>>> +case BUILT_IN_ISNORMAL:
>>>>  CASE_FLT_FN (BUILT_IN_FINITE):
>>>>  case BUILT_IN_FINITED32:
>>>>  case BUILT_IN_FINITED64:
>>>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
>>>> index 5730bda80dc..67407fad37d 100644
>>>> --- a/gcc/doc/md.texi
>>>> +++ b/gcc/doc/md.texi
>>>> @@ -8557,6 +8557,15 @@ operand 2, greater than operand 2 or is unordered 
>>>> with operand 2.
>>>>
>>>>  This pattern is not allowed to @code{FAIL}.
>>>>
>>>> +@cindex @code{isfinite@var{m}2} instruction pattern
>>>> +@item @samp{isfinite@var{m}2}
>>>> +Set operand 0 to nonzero if operand 1 is a finite @code{SFmode},
>>>> +@code{DFmode}, or @code{TFmode} floating point number and to 0
>>>
>>> It should probably say scalar floating-point mode?  But what about the 
>>> result?
>>> Is any integer mode OK?  That's esp. important if this might be used on
>>> vector modes.
>>>
>>>> +otherwise.
>>>> +
>>>> +If this pattern @code{FAIL}, a call to the library function
>>>> +@code{isfinite} is used.
>>>
>>> Or it's otherwise inline expanded?  Or does this imply targets
>>> have to make sure to implement the pattern when isfinite is
>>> not available in libc/libm?  I suggest to leave this sentence out,
>>> we usually only say when a pattern may _not_ FAIL (and usually
>>> FAILing isn't different from not providing a pattern).
>>
>> As Haochen's previous reply, I think there are three cases:
>>   1) no optab defined, fold in a generic way;
>>   2) optab defined, SUCC, expand as what it defines;
>>   3) optab defined, FAIL, generate a library call;
>>
>> From above, I had the concern that ports may assume FAILing can
>> fall back with the generic folding, but it's not actually.
> 
> Hmm, but it should.  Can you make that work?

Good point, sure, I'll follow up on this.

BR,
Kewen

> 
>> Does your comment imply ports usually don't make such assumption
>> (or they just check what happens for FAIL)?
>>
>> BR,
>> Kewen
>>
>>>
>>>>  @end table
>>>>
>>>>  @end ifset
>>>> diff --git a/gcc/optabs.def b/gcc/optabs.def
>>>> index ad14f9328b9..dcd77315c2a 100644
>>>> --- a/gcc/optabs.def
>>>> +++ b/gcc/optabs.def
>>>> @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
>>>>  OPTAB_D (hypot_optab, "hypot$a3")
>>>>  OPTAB_D (ilogb_optab, "ilogb$a2")
>>>>  OPTAB_D (isinf_optab, "isinf$a2")
>>>> +OPTAB_D (isfinite_optab, "isfinite$a2")
>>>>  OPTAB_D (issignaling_optab, "issignaling$a2")
>>>>  OPTAB_D (ldexp_optab, "ldexp$a3")
>>>>  OPTAB_D (log10_optab, "log10$a2")
>>
>>
>>



Re: [PATCHv3] Optab: add isfinite_optab for __builtin_isfinite

2024-05-28 Thread Kewen.Lin
Hi,

on 2024/5/27 20:54, Richard Biener wrote:
> On Mon, May 27, 2024 at 11:37 AM HAO CHEN GUI  wrote:
>>
>> Hi,
>>   This patch adds an optab for __builtin_isfinite. The finite check can be
>> implemented on rs6000 by a single instruction. It needs an optab to be
>> expanded to the certain sequence of instructions.
>>
>>   The subsequent patches will implement the expand on rs6000.
>>
>>   Compared to previous version, the main change is to specify acceptable
>> modes for the optab.
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652170.html
>>
>>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
>> regressions. Is this OK for trunk?
>>
>> Thanks
>> Gui Haochen
>>
>> ChangeLog
>> optab: Add isfinite_optab for isfinite builtin
>>
>> gcc/
>> * builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
>> for isfinite builtin.
>> * optabs.def (isfinite_optab): New.
>> * doc/md.texi (isfinite): Document.
>>
>>
>> patch.diff
>> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
>> index f8d94c4b435..b8432f84020 100644
>> --- a/gcc/builtins.cc
>> +++ b/gcc/builtins.cc
>> @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>>errno_set = true; builtin_optab = ilogb_optab; break;
>>  CASE_FLT_FN (BUILT_IN_ISINF):
>>builtin_optab = isinf_optab; break;
>> -case BUILT_IN_ISNORMAL:
>>  case BUILT_IN_ISFINITE:
>> +  builtin_optab = isfinite_optab; break;
>> +case BUILT_IN_ISNORMAL:
>>  CASE_FLT_FN (BUILT_IN_FINITE):
>>  case BUILT_IN_FINITED32:
>>  case BUILT_IN_FINITED64:
>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
>> index 5730bda80dc..67407fad37d 100644
>> --- a/gcc/doc/md.texi
>> +++ b/gcc/doc/md.texi
>> @@ -8557,6 +8557,15 @@ operand 2, greater than operand 2 or is unordered 
>> with operand 2.
>>
>>  This pattern is not allowed to @code{FAIL}.
>>
>> +@cindex @code{isfinite@var{m}2} instruction pattern
>> +@item @samp{isfinite@var{m}2}
>> +Set operand 0 to nonzero if operand 1 is a finite @code{SFmode},
>> +@code{DFmode}, or @code{TFmode} floating point number and to 0
> 
> It should probably say scalar floating-point mode?  But what about the result?
> Is any integer mode OK?  That's esp. important if this might be used on
> vector modes.
> 
>> +otherwise.
>> +
>> +If this pattern @code{FAIL}, a call to the library function
>> +@code{isfinite} is used.
> 
> Or it's otherwise inline expanded?  Or does this imply targets
> have to make sure to implement the pattern when isfinite is
> not available in libc/libm?  I suggest to leave this sentence out,
> we usually only say when a pattern may _not_ FAIL (and usually
> FAILing isn't different from not providing a pattern).

As in Haochen's previous reply, I think there are three cases:
  1) no optab defined, fold in a generic way;
  2) optab defined, SUCC, expand as what it defines;
  3) optab defined, FAIL, generate a library call;

From the above, my concern was that ports may assume FAILing can
fall back to the generic folding, but it actually doesn't.
Does your comment imply ports usually don't make such an assumption
(or they just check what happens for FAIL)?
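
The three cases can be modelled with a small stand-alone sketch.  This is
only a toy model of the behaviour described in this thread, not GCC code:
the struct, the enum and the function names are made up for illustration,
and the "generic fold" shown is an assumed form (a comparison of the
magnitude against the largest finite value).

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* Which way a __builtin_isfinite call ends up being expanded.  */
enum expand_path { GENERIC_FOLD, TARGET_INSN, LIBRARY_CALL };

/* Toy description of a target: whether it defines the optab at all and,
   if so, whether its expander FAILs for the given operands.  */
struct toy_target
{
  bool has_optab;
  bool expander_fails;
};

/* Mirrors the three cases: a FAIL leads to a library call and does not
   fall back to the generic fold.  */
static enum expand_path
choose_path (const struct toy_target *t)
{
  assert (t != 0);
  if (!t->has_optab)
    return GENERIC_FOLD;
  if (!t->expander_fails)
    return TARGET_INSN;
  return LIBRARY_CALL;
}

/* Assumed shape of a generic fold: finite means |x| <= DBL_MAX.
   Both infinities and NaN make the comparison false.  */
static int
generic_isfinite (double x)
{
  double ax = x < 0 ? -x : x;
  return ax <= DBL_MAX;
}
```

If FAIL were changed to fall back to the generic fold, as suggested in the
reply, the last return in choose_path would become GENERIC_FOLD.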

BR,
Kewen

> 
>>  @end table
>>
>>  @end ifset
>> diff --git a/gcc/optabs.def b/gcc/optabs.def
>> index ad14f9328b9..dcd77315c2a 100644
>> --- a/gcc/optabs.def
>> +++ b/gcc/optabs.def
>> @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
>>  OPTAB_D (hypot_optab, "hypot$a3")
>>  OPTAB_D (ilogb_optab, "ilogb$a2")
>>  OPTAB_D (isinf_optab, "isinf$a2")
>> +OPTAB_D (isfinite_optab, "isfinite$a2")
>>  OPTAB_D (issignaling_optab, "issignaling$a2")
>>  OPTAB_D (ldexp_optab, "ldexp$a3")
>>  OPTAB_D (log10_optab, "log10$a2")





Re: [PATCHv2] Optab: add isfinite_optab for __builtin_isfinite

2024-05-27 Thread Kewen.Lin
Hi Haochen,

on 2024/5/27 15:22, HAO CHEN GUI wrote:
> Hi Kewen,
>   Thanks for your comments.
> 
> On 2024/5/27 11:18, Kewen.Lin wrote:
>> Does this require "This pattern is not allowed to FAIL."?
>>
>> I guess yes?  Since if it's decided to go with this pattern
>> expanding, there is no fall back?
> 
>   The builtin is inline folded if the optab doesn't exist on
> the target. Otherwise, it is expanded by target specific
> insns. If it fails at expand, the library is called. It can't
> fall back to inline folding when it fails at expand. I am not
> sure whether it should be marked "allowed to FAIL" or not.

Ah, so it's allowed to FAIL, it just doesn't fall back to the
generic folding, so clearly it should not be marked as "not
allowed to FAIL".  Maybe we can note that "FAIL" makes it call
the corresponding library function instead (avoiding the
possible assumption that it will fall back to generic folding).

BR,
Kewen

> 
>   Could anyone advise me?
> 
> Thanks
> Gui Haochen



Re: [PATCH v3 #1/2] [rs6000] adjust return_pc debug attrs

2024-05-26 Thread Kewen.Lin
Hi,

on 2024/5/25 20:13, Alexandre Oliva wrote:
> On Apr 27, 2023, Alexandre Oliva  wrote:
> 
>> On Apr 14, 2023, Alexandre Oliva  wrote:
>>> On Mar 23, 2023, Alexandre Oliva  wrote:
>>>> This patch introduces infrastructure for targets to add an offset to
>>>> the label issued after the call_insn to set the call_return_pc
>>>> attribute.  This will be used on rs6000, that sometimes issues another
>>>> instruction after the call proper as part of a call insn.
> 
>>> Ping?
>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614453.html
> 
>> Ping?
> 
> Ping?
> Refreshed, retested on ppc64le-linux-gnu.  Ok to install?
> 
> 
> Some of the rs6000 call patterns, on some ABIs, issue multiple opcodes
> out of a single call insn, but the call (bl) or jump (b) is not always
> the last opcode in the sequence.
> 
> This does not seem to be a problem for exception handling tables, but
> the return_pc attribute in the call graph output in dwarf2+ debug
> information, that takes the address of a label output right after the
> call, does not match the value of the link register even for non-tail
> calls.  E.g., with ABI_AIX or ABI_ELFv2, such code as:
> 
>   foo ();
> 
> outputs:
> 
>   bl foo
>   nop
>  LVL#:
> [...]
>   .8byte .LVL#  # DW_AT_call_return_pc
> 
> but debug info consumers may rely on the return_pc address, and draw
> incorrect conclusions from its off-by-4 value.
> 
> This patch uses the infrastructure for targets to add an offset to the
> label issued after the call_insn to set the call_return_pc attribute,
> on rs6000, to account for opcodes issued after actual call opcode as
> part of call insns output patterns.

I wonder if it's possible to have a test case for this?
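
For reference, the arithmetic such a test would need to check can be
sketched as below.  This is a stand-alone toy model, assuming (as the quoted
hook comment states) that every call insn starts with a 4-byte b/bl and that
the label is emitted after the whole insn; the function names and the
addresses are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same formula as the quoted hook, with the insn length passed in
   directly instead of being read from the insn attributes.  */
static int
toy_call_offset_return_label (int insn_length_bytes)
{
  assert (insn_length_bytes >= 4 && insn_length_bytes % 4 == 0);
  return 4 - insn_length_bytes;
}

/* Address that would end up in DW_AT_call_return_pc: the label sits
   after the entire call insn, then the (non-positive) offset is added.  */
static uint64_t
toy_call_return_pc (uint64_t call_insn_addr, int insn_length_bytes)
{
  uint64_t label_addr = call_insn_addr + (uint64_t) insn_length_bytes;
  int64_t offset = toy_call_offset_return_label (insn_length_bytes);
  return label_addr + (uint64_t) offset;
}
```

For a "bl foo; nop" sequence at a made-up address 0x1000 the insn length is
8, the label lands at 0x1008, and the adjusted return pc is 0x1004, which is
the value the link register holds after the bl.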

BR,
Kewen

> 
> 
> for  gcc/ChangeLog
> 
>   * config/rs6000/rs6000.cc (TARGET_CALL_OFFSET_RETURN_LABEL):
>   Override.
>   (rs6000_call_offset_return_label): New.
> ---
>  gcc/config/rs6000/rs6000.cc |   18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index e4dc629ddcc9a..77e6b94a539da 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1779,6 +1779,8 @@ static const scoped_attribute_specs *const 
> rs6000_attribute_table[] =
>  #undef TARGET_OVERLAP_OP_BY_PIECES_P
>  #define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
>  
> +#undef TARGET_CALL_OFFSET_RETURN_LABEL
> +#define TARGET_CALL_OFFSET_RETURN_LABEL rs6000_call_offset_return_label
>  
>  
>  /* Processor table.  */
> @@ -14822,6 +14824,22 @@ rs6000_assemble_integer (rtx x, unsigned int size, 
> int aligned_p)
>return default_assemble_integer (x, size, aligned_p);
>  }
>  
> +/* Return the offset to be added to the label output after CALL_INSN
> +   to compute the address to be placed in DW_AT_call_return_pc.  */
> +
> +static int
> +rs6000_call_offset_return_label (rtx_insn *call_insn)
> +{
> +  /* All rs6000 CALL_INSN output patterns start with a b or bl, always
> + a 4-byte instruction, but some output patterns issue other
> + opcodes afterwards.  The return label is issued after the entire
> + call insn, including any such post-call opcodes.  Instead of
> + figuring out which cases need adjustments, we compute the offset
> + back to the address of the call opcode proper, then add the
> + constant 4 bytes, to get the address after that opcode.  */
> +  return 4 - get_attr_length (call_insn);
> +}
> +
>  /* Return a template string for assembly to emit when making an
> external call.  FUNOP is the call mem argument operand number.  */
>  
> 
> 



Re: [PATCHv2] Optab: add isfinite_optab for __builtin_isfinite

2024-05-26 Thread Kewen.Lin
Hi,

on 2024/5/20 16:15, HAO CHEN GUI wrote:
> Hi,
>   This patch adds an optab for __builtin_isfinite. The finite check can be
> implemented on rs6000 by a single instruction. It needs an optab to be
> expanded to the certain sequence of instructions.
> 
>   The subsequent patches will implement the expand on rs6000.
> 
>   Compared to previous version, the main change is to document isfinite
> in md.texi.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html
> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> optab: Add isfinite_optab for isfinite builtin
> 
> gcc/
>   * builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
>   for isfinite builtin.
>   * optabs.def (isfinite_optab): New.
>   * doc/md.texi (isfinite): Document.
> 
> 
> patch.diff
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index f8d94c4b435..b8432f84020 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>errno_set = true; builtin_optab = ilogb_optab; break;
>  CASE_FLT_FN (BUILT_IN_ISINF):
>builtin_optab = isinf_optab; break;
> -case BUILT_IN_ISNORMAL:
>  case BUILT_IN_ISFINITE:
> +  builtin_optab = isfinite_optab; break;
> +case BUILT_IN_ISNORMAL:
>  CASE_FLT_FN (BUILT_IN_FINITE):
>  case BUILT_IN_FINITED32:
>  case BUILT_IN_FINITED64:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 5730bda80dc..8ed70b3feea 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -8557,6 +8557,11 @@ operand 2, greater than operand 2 or is unordered with 
> operand 2.
> 
>  This pattern is not allowed to @code{FAIL}.
> 
> +@cindex @code{isfinite@var{m}2} instruction pattern
> +@item @samp{isfinite@var{m}2}
> +Set operand 0 to nonzero if operand 1 is a finite floating-point
> +number and to 0 otherwise.
> +

Does this require "This pattern is not allowed to FAIL."?

I guess yes?  Since once it's decided to expand with this
pattern, there is no fallback?

BR,
Kewen


>  @end table
> 
>  @end ifset
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..dcd77315c2a 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
>  OPTAB_D (hypot_optab, "hypot$a3")
>  OPTAB_D (ilogb_optab, "ilogb$a2")
>  OPTAB_D (isinf_optab, "isinf$a2")
> +OPTAB_D (isfinite_optab, "isfinite$a2")
>  OPTAB_D (issignaling_optab, "issignaling$a2")
>  OPTAB_D (ldexp_optab, "ldexp$a3")
>  OPTAB_D (log10_optab, "log10$a2")



Re: [PATCH v2] [testsuite] [powerpc] adjust -m32 counts for fold-vec-extract*

2024-05-26 Thread Kewen.Lin
Hi,

on 2024/4/22 18:11, Alexandre Oliva wrote:
> Ping?-ish
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619678.html
> 
> It's that time of the year again.  The good news is that this is the
> last patch in my ppc*-vxworks7* set ;-)
> 
> On May 25, 2023, Segher Boessenkool  wrote:
> 
>> On Thu, May 25, 2023 at 10:55:37AM -0300, Alexandre Oliva wrote:
>>> I've actually identified the corresponding change to the
>>> lp64 tests, compared the effects of the codegen changes, and concluded
>>> the tests needed this changing for ilp32 to keep on testing for the same
>>> thing after code changes brought about by changes that AFAICT had been
>>> well understood when making the lp64 adjustments.
> 
>>> /* -m32 target has an 'add' in place of one of the 'addi'. */
>>> -/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 { target lp64 } 
>>> } } */
>>> -/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 3 { target ilp32 } 
>>> } } */
>>> +/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 } } */
> 
>> Just {\madd} or more conservative {\maddi?\M} then?
> 
> I've made these changes in the v2 below.
> 
> Codegen changes caused add instruction count mismatches on
> ppc-*-linux-gnu and other 32-bit ppc targets.  At some point the
> expected counts were adjusted for lp64, but ilp32 differences
> remained, and published test results confirm it.
> 
> Regstrapped on x86_64-linux-gnu and ppc64el-linux-gnu.  Also tested with
> gcc-13 on ppc64-vx7r2 and ppc-vx7r2.  Ok to install?
> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   PR testsuite/101169
>   * gcc.target/powerpc/fold-vec-extract-double.p7.c: Adjust addi
>   counts for ilp32.
>   * gcc.target/powerpc/fold-vec-extract-float.p7.c: Likewise.
>   * gcc.target/powerpc/fold-vec-extract-float.p8.c: Likewise.
>   * gcc.target/powerpc/fold-vec-extract-int.p7.c: Likewise.
>   * gcc.target/powerpc/fold-vec-extract-int.p8.c: Likewise.
>   * gcc.target/powerpc/fold-vec-extract-short.p7.c: Likewise.
>   * gcc.target/powerpc/fold-vec-extract-short.p8.c: Likewise.
> ---
>  .../powerpc/fold-vec-extract-double.p7.c   |5 ++---
>  .../gcc.target/powerpc/fold-vec-extract-float.p7.c |5 ++---
>  .../gcc.target/powerpc/fold-vec-extract-float.p8.c |2 +-
>  .../gcc.target/powerpc/fold-vec-extract-int.p7.c   |3 +--
>  .../gcc.target/powerpc/fold-vec-extract-int.p8.c   |2 +-
>  .../gcc.target/powerpc/fold-vec-extract-short.p7.c |3 +--
>  .../gcc.target/powerpc/fold-vec-extract-short.p8.c |2 +-
>  7 files changed, 9 insertions(+), 13 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-double.p7.c 
> b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-double.p7.c
> index 3cae644b90b71..e69d9253e2d28 100644
> --- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-double.p7.c
> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-double.p7.c
> @@ -13,12 +13,11 @@
>  /* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
>  /* { dg-final { scan-assembler-times {\mli\M} 1 } } */
>  /* -m32 target has an 'add' in place of one of the 'addi'. */
> -/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 { target lp64 } } 
> } */
> -/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 3 { target ilp32 } } 
> } */
> +/* { dg-final { scan-assembler-times {\maddi?\M} 2 } } */
>  /* -m32 target has a rlwinm in place of a rldic .  */
>  /* { dg-final { scan-assembler-times {\mrldic\M|\mrlwinm\M} 1 } } */
>  /* { dg-final { scan-assembler-times {\mstxvd2x\M} 1 } } */
> -/* { dg-final { scan-assembler-times {\mlfdx\M|\mlfd\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mlfdx?\M} 1 } } */
>  
>  #include 
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p7.c 
> b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p7.c
> index 59a4979457dcb..9ff197a704906 100644
> --- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p7.c
> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p7.c
> @@ -12,13 +12,12 @@
>  /* { dg-final { scan-assembler-times {\mxscvspdp\M} 1 } } */
>  /* { dg-final { scan-assembler-times {\mli\M} 1 } } */
>  /* -m32 as an add in place of an addi. */
> -/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 { target lp64 } } 
> } */
> -/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 3 { target ilp32 } } 
> } */
> +/* { dg-final { scan-assembler-times {\maddi?\M} 2 } } */
>  /* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstvx\M|\mstxv\M} 1 } } */
>  /* -m32 uses rlwinm in place of rldic */
>  /* { dg-final { scan-assembler-times {\mrldic\M|\mrlwinm\M} 1 } } */
>  /* -m32 has lfs in place of lfsx */
> -/* { dg-final { scan-assembler-times {\mlfsx\M|\mlfs\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mlfsx?\M} 1 } } */
>  
>  #include 
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p8.c 
> b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p8.c
> index 

Re: [PATCH v2] add explicit ABI and align options to pr88233.c

2024-05-26 Thread Kewen.Lin
Hi,

on 2024/4/22 17:38, Alexandre Oliva wrote:
> Ping?
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566530.html
> (modified version follows)

Segher originated this test case, so I was expecting he could chime in on this. :)

> 
> 
> We've observed failures of this test on powerpc configurations that
> default to different calling conventions and alignment requirements.

It seems that it was using the original "BE" and "LE" guards to shadow
ABIs; could you share some more on how you found this failure?  It seems
that your test environment has -mstrict-align turned on by default?  And
also has an ABI which passes small struct return values in registers?

BR,
Kewen


> Both settings are needed for the original expectations to be met.
> 
> The test was later modified to have different expectations for big and
> little endian code generation.  This patch restores the original
> codegen expectations, that, with the explicit options, don't vary any
> more.
> 
> Regstrapped on x86_64-linux-gnu and ppc64el-linux-gnu.  Also tested with
> gcc-13 on ppc64-vx7r2 and ppc-vx7r2.  Ok to install?
> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   * gcc.target/powerpc/pr88233.c: Make some alignment strictness
>   and calling conventions assumptions explicit.  Restore uniform
>   codegen expectations
> ---
>  gcc/testsuite/gcc.target/powerpc/pr88233.c |7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr88233.c 
> b/gcc/testsuite/gcc.target/powerpc/pr88233.c
> index 27c73717a3f79..46a3ebfa28775 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr88233.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr88233.c
> @@ -1,5 +1,5 @@
>  /* { dg-require-effective-target lp64 } */
> -/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8 -mno-strict-align 
> -fpcc-struct-return" } */
>  
>  typedef struct { double a[2]; } A;
>  A
> @@ -9,6 +9,5 @@ foo (const A *a)
>  }
>  
>  /* { dg-final { scan-assembler-not {\mmtvsr} } } */
> -/* { dg-final { scan-assembler-times {\mlxvd2x\M} 1 { target { be } } } } */
> -/* { dg-final { scan-assembler-times {\mstxvd2x\M} 1 { target { be } } } } */
> -/* { dg-final { scan-assembler-times {\mlfd\M} 2 { target { le } } } } */
> +/* { dg-final { scan-assembler-times {\mlxvd2x\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mstxvd2x\M} 1 } } */
> 
> 



Re: [PATCH 2/13] rs6000, Remove __builtin_vsx_xvcvspsxws built-in

2024-05-26 Thread Kewen.Lin
Hi Carl,

on 2024/5/25 04:18, Carl Love wrote:
> Kewen:
> 
> On 5/14/24 01:43, Kewen.Lin wrote:
>> Hi,
>>
>> on 2024/4/20 05:17, Carl Love wrote:
>>> rs6000, Remove __builtin_vsx_xvcvspsxws built-in
>>>
>>> The built-in __builtin_vsx_xvcvspsxws is a duplicate of the vec_signed
>>> built-in that is documented in the PVIPR.  The __builtin_vsx_xvcvspsxws
>>> built-in is not documented and there are no test cases for it.
>>>
>>> This patch removes the redundant built-in.
>>
>> By revisiting the comments on the previous version:
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646723.html
> 
> The comments from the previous version:
> -
>I think we should recommend users to adopt the recommended built-ins in
>PVIPR, by checking the corresponding mnemonic in PVIPR, I got:
> 
>__builtin_vsx_xvcvspsxws -> vec_signed
>__builtin_vsx_xvcvspsxds -> N/A
>__builtin_vsx_xvcvspuxds -> N/A
>__builtin_vsx_xvcvdpsxws -> vec_signed{e,o}
>__builtin_vsx_xvcvdpuxws -> vec_unsigned{e,o}
>__builtin_vsx_xvcvdpuxds_uns -> vec_unsigned
>__builtin_vsx_xvcvspdp   -> vec_double{e,o}
>__builtin_vsx_xvcvdpsp   -> vec_float{e,o}
>__builtin_vsx_xvcvspuxws -> vec_unsigned
>__builtin_vsx_xvcvsxwdp  -> vec_double{e,o}
>__builtin_vsx_xvcvuxddp_uns -> vec_double
> 
>For __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds which don't have
>the according PVIPR built-ins, we can extend the current 
> vec_{un,}signed{e,o}
>to cover them and document them following the section mentioning PVIPR.
> 
> are handled by multiple patches in the new series.  The main comment on the 
> previous patch series was to remove most of the built-ins as they were 
> redundant.  So, basically most of the patches in the previous series were 
> thrown out and a new series to remove the built-ins in the current series.
> 
> 
> That all said, I distinctly remember addressing each of the above built-ins.  
> The work on the series got
> interrupted a couple of times and it looks like some of the patches to 
> address the above got lost.  My bad.
> The following is a list of which patch takes care of removing the duplicate 
> built-ins.

No problem, thanks for working on this!!

> 
> __builtin_vsx_xvcvspsxws         patch 2 removes this built-in
> __builtin_vsx_xvcvspsxds -> N/A  patch 4 extends vec_{un,}signede to cover this built-in.
>                                  Built-in used in rs6000-overload.def.  Built-in now for
>                                  internal use only.
> __builtin_vsx_xvcvspuxds -> N/A  patch 4 extends vec_{un,}signedo to cover this built-in.
>                                  Built-in used in rs6000-overload.def.  Built-in now for
>                                  internal use only.
> 
> 
> __builtin_vsx_xvcvdpsxws -> vec_signed{e,o}   removed in patch 4
> __builtin_vsx_xvcvdpuxws -> vec_unsigned{e,o} removed in patch 4
> 
> __builtin_vsx_xvcvdpuxds_uns -> vec_unsigned  remove in patch 4
> __builtin_vsx_xvcvspuxws -> vec_unsigned  remove in patch 4

Just to avoid some misunderstanding, I guess you meant the new patch 4?
As the current patch 4 doesn't remove these.

> 
> The following changes will be put into a new patch when the series is 
> reposted.  It appears they
> got lost in the current series.  My bad.
> 
> __builtin_vsx_xvcvspdp   -> vec_double{e,o}   remove in new patch number 5
> __builtin_vsx_xvcvdpsp   -> vec_float{e,o}remove in new patch number 5
> 
> __builtin_vsx_xvcvsxwdp  -> vec_double{e,o}   remove in new patch number 5
> __builtin_vsx_xvcvuxddp_uns -> vec_double   remove in new patch number 5
> 
>>
>> I wonder if it's intentional to keep the others, at least bifs
>> __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws and
>> __builtin_vsx_xvcvuxddp_uns looks removable, users can just uses the
>> equivalent ones in PVIPR.  And for the others, users can still use
>> the PVIPR ones by considering endianness (controlling with endianness
>> macros).
>>
> 
> Hopefully that makes it clearer where the various changes are.   
> 
> The next series will add a new patch 5 in the series.  The remaining patches 
> in this series, patches 5, 6, ... will get moved to patch 6, 7, ... in the 
> next posting of the built-in cleanup patch series.
> 

OK, thanks!

BR,
Kewen



Re: [PATCH 12/13] rs6000, remove __builtin_vsx_xvcmpeqsp built-in

2024-05-24 Thread Kewen.Lin
Hi,

on 2024/5/24 02:21, Carl Love wrote:
> 
> 
> On 5/13/24 22:37, Kewen.Lin wrote:
>> Hi,
>>
>> on 2024/4/20 05:18, Carl Love wrote:
>>> rs6000, remove __builtin_vsx_xvcmpeqsp built-in
>>>
>>> The built-in __builtin_vsx_xvcmpeqsp is a duplicate of the overloaded
>>> vec_cmpeq built-in.  The built-in is undocumented.  The built-in and
>>> the test cases are removed.
>>>
>>> gcc/ChangeLog:
>>> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp):
>>> Remove built-in definition.
>>>
>>
>> Ah, you separated this __builtin_vsx_xvcmpeqsp from the one for
>> __builtin_vsx_xvcmpeqsp_p, it's fine, please ignore the comments for
>> considering this __builtin_vsx_xvcmpeqsp in my previous reply to 11/13.
>>
>>
>>> gcc/testsuite/ChangeLog:
>>> * vsx-builtin-3.c (do_cmp): Remove test case for
>>> __builtin_vsx_xvcmpeqsp.
>>> ---
>>>  gcc/config/rs6000/rs6000-builtins.def| 3 ---
>>>  gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c | 2 --
>>>  2 files changed, 5 deletions(-)
>>>
>>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>>> b/gcc/config/rs6000/rs6000-builtins.def
>>> index 2f6149edd5f..19d05b8043a 100644
>>> --- a/gcc/config/rs6000/rs6000-builtins.def
>>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>>> @@ -1613,9 +1613,6 @@
>>>const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd);
>>>  XVCMPEQDP_P vector_eq_v2df_p {pred}
>>>  
>>> -  const vf __builtin_vsx_xvcmpeqsp (vf, vf);
>>> -XVCMPEQSP vector_eqv4sf {}
>>> -
>>>const vd __builtin_vsx_xvcmpgedp (vd, vd);
>>>  XVCMPGEDP vector_gev2df {}
>>>  
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
>>> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>>> index 35ea31b2616..245893dc0e3 100644
>>> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>>> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
>>> @@ -27,7 +27,6 @@
>>>  /* { dg-final { scan-assembler "xvcmpeqdp" } } */
>>>  /* { dg-final { scan-assembler "xvcmpgtdp" } } */
>>>  /* { dg-final { scan-assembler "xvcmpgedp" } } */
>>> -/* { dg-final { scan-assembler "xvcmpeqsp" } } */
>>>  /* { dg-final { scan-assembler "xvcmpgtsp" } } */
>>>  /* { dg-final { scan-assembler "xvcmpgesp" } } */
>>>  /* { dg-final { scan-assembler "xxsldwi" } } */
>>> @@ -112,7 +111,6 @@ int do_cmp (void)
>>>d[i][0] = __builtin_vsx_xvcmpgtdp (d[i][1], d[i][2]); i++;
>>>d[i][0] = __builtin_vsx_xvcmpgedp (d[i][1], d[i][2]); i++;
>>>  
>>> -  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
>>>f[i][0] = __builtin_vsx_xvcmpgtsp (f[i][1], f[i][2]); i++;
>>>f[i][0] = __builtin_vsx_xvcmpgesp (f[i][1], f[i][2]); i++;
>>>return i;
>>
>> As the other in this patch series, I prefer to change it with
>> vec_cmpeq here, OK for trunk with this tweaked (also keep the
>> scan there), thanks!
> 
> When I went to change the test case I noticed that __builtin_vsx_xvcmpeqsp 
> and vec_cmpeq both return a vector where the element is all ones if the 
> comparison is True and zeros if False.  However, the return type for 
> __builtin_vsx_xvcmpeqsp is vector floats but vec_cmpeq returns vector bool.
> 

Ah, so they are not equivalent from a prototype perspective.

> The PVIPR says the vec_cmpeq built-in returns a value where each bit in the 
> vector element is a 1 if the comparison is equal and 0 otherwise.  However, 
> the documented result is a vector bool int for the floating point comparison. 
>  The return value for __builtin_vsx_xvcmpeqsp was vector float.

IMHO the PVIPR prototype (returning vector bool) makes more sense;
it matches better what the result holds.

> 
> So, the "bit values" returned are the same but not of the same type. So 
> technically vec_cmpeq is not a drop in replacement for 
> __builtin_vsx_xvcmpeqsp.  Given that, perhaps we should not be removing 
> __builtin_vsx_xvcmpeqsp?
> 
> The testcase has to be changed from:
>  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
> to:
>  bi[i][0] = vec_cmpeq (f[i][1], f[i][2]); i++;

For the test case change, I'd expect that it can work with:

-  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
+  f[i][0] = (vector float) vec_cmpeq (f[i][1], f[i][2]); i++;
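
To make the point about types versus bits concrete, here is a minimal
portable C sketch.  It uses plain arrays instead of the real AltiVec
types, and the helper names cmpeq_mask and mask_as_float are made up
for illustration; it only models the semantics discussed above (all
ones for equal lanes, zeros otherwise, and a cast that reinterprets
the bits without converting them).

```c
#include <stdint.h>
#include <string.h>

#define LANES 4

/* Model of a lane-wise float equality compare: each result lane is
   all ones when the inputs compare equal and all zeros otherwise.
   This is the "vector bool" view of the result.  */
static void
cmpeq_mask (const float *a, const float *b, uint32_t *mask)
{
  for (int i = 0; i < LANES; i++)
    mask[i] = (a[i] == b[i]) ? UINT32_MAX : 0;
}

/* Model of the explicit (vector float) cast: the bits are copied
   unchanged, no value conversion happens.  */
static void
mask_as_float (const uint32_t *mask, float *out)
{
  memcpy (out, mask, LANES * sizeof (float));
}
```

Calling cmpeq_mask and then mask_as_float leaves exactly the same bit
pattern in the float array, which is why the explicit cast should be
enough for a caller that still wants the old return type.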

> 
> I am thinking we should drop this patch from the series, i.e. don't remove 
> __builtin_vsx_xvcmpeqsp.  Thoughts?
> 

Since __builtin_vsx_xvcmpeqsp is an undocumented built-in, I don't
expect users to use it; even if someone does, IMHO vector bool is
a better fit.  In case someone actually wants the vector non-bool
type, he/she can just add an explicit conversion.  So I'm inclined
to remove vsx_xvcmpeqsp; users should use the PVIPR built-ins
as much as they can.  But I'm also fine with holding off on this, as
there are some other related cmp* built-ins (cmpge, cmpgt...) that we
can revisit and handle together later.

BR,
Kewen


[RFC/PATCH] Replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE with new hook

2024-05-24 Thread Kewen.Lin
Hi Joseph and Richi,

on 2024/5/13 21:18, Joseph Myers wrote:
> On Mon, 13 May 2024, Kewen.Lin wrote:
> 
>>> In fact replacing all of X_TYPE_SIZE with a single hook might be worthwhile
>>> though this removes the "convenient" defaulting, requiring each target to
>>> enumerate all standard C ABI type modes.  But that might be also a good 
>>> thing.
>>>
>>
>> I guess the main value of extending from floating point types to all is to
>> unify them?  (Assuming that, except for floating types, the others would
>> not have multiple possible representations like what we face on 128bit fp).
> 
> For integer types, giving the number of bits makes sense as an interface - 
> there isn't an issue with different modes.
> 
> So I think it's appropriate for floating and integer types to have 
> separate hooks - with the one for floating types returning a mode, and the 
> one for integer types returning a number of bits.  (And also keep the 
> existing separate hook for _FloatN / _FloatNx modes.)
> 
> That may also make for more convenient defaults (whether a target has long 
> double wider than double is largely independent of what sizes it uses for 
> integer types).
> 

Following your suggestion and comments, I made this patch
for mode_for_floating_type first.  Considering this touches
a few FEs and port specific code, I think I have to split
it into a patch series.  Before doing that, I'd like to
ensure this meets what you expected, and also ask for
suggestions on how to organize the sub-patches.  There seem
to be two ways to split them:
  1) split this into pieces according to FEs and ports, and
 squash all of them and commit one patch.
  2) extract all hook implementation as 1st series (split
 as ports);
 extract the hook enablement as 2nd part (split as
 generic and FEs);
 the remaining is to remove useless macros (split it
 as generic and ports);

Option 1) is straightforward, while 2) is fine-grained and
easier to isolate, but I'm not sure if it's worth doing.

btw, the attached patch is bootstrapped and regtested on
powerpc64-linux-gnu and powerpc64le-linux-gnu with all
languages on, cross cc1 built well for affected ports.

BR,
Kewen

-
From 2935750160f4eaf72eb7fba5832c99d6bf552862 Mon Sep 17 00:00:00 2001
From: Kewen Lin 
Date: Fri, 24 May 2024 00:10:22 -0500
Subject: [PATCH] Replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE with new hook
 mode_for_floating_type

Currently, to determine which mode is used for a
floating point type, we call mode_for_size for a given type
precision (size) to get the first mode which has this size
in the specified class.  On Powerpc, we have three modes
(TF/KF/IF) having the same mode precision 128 (see [1]), so
this processing forces us to place TF first.  That would
require more adjustments in some generic code to avoid
unexpected mode conversions, and it would be even worse if
we get rid of TF one day.  As Joseph pointed out in [2],
"floating types should have their mode, not a poorly
defined precision value".  As Joseph and Richi suggested,
this patch introduces one hook mode_for_floating_type
which returns the corresponding mode for type float, double
or long double.  The default implementation returns SFmode
for float and DFmode for double or long double, and ports
which need special treatment have their own port specific
implementation (referring to {,LONG_}DOUBLE_TYPE_SIZE).
For all generic uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE,
depending on the context, it replaces them with
TYPE_PRECISION of the corresponding type node, or
GET_MODE_PRECISION on the mode from mode_for_floating_type.
It also removes some useless uses of
{FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE in target specific code,
but leaves those being used (like defining other macros)
untouched.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651017.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
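
As a rough illustration of the default behaviour described above, the
following standalone C sketch models the hook.  It is not GCC source:
the enums are simplified stand-ins for the real tree_index and
machine_mode, and example_port_mode_for_floating_type is a
hypothetical port override that assumes a 128-bit long double in
TFmode.

```c
/* Simplified stand-ins, for illustration only.  */
enum tree_index_model { TI_FLOAT_TYPE, TI_DOUBLE_TYPE, TI_LONG_DOUBLE_TYPE };
enum machine_mode_model { SFmode, DFmode, TFmode };

/* Default: SFmode for float, DFmode for double and long double.  */
static enum machine_mode_model
default_mode_for_floating_type (enum tree_index_model ti)
{
  return ti == TI_FLOAT_TYPE ? SFmode : DFmode;
}

/* Hypothetical port that only differs from the default for
   long double.  */
static enum machine_mode_model
example_port_mode_for_floating_type (enum tree_index_model ti)
{
  if (ti == TI_LONG_DOUBLE_TYPE)
    return TFmode;
  return default_mode_for_floating_type (ti);
}

/* Precision in bits of the modelled modes; generic code would ask
   the mode for its precision instead of using a *_TYPE_SIZE macro.  */
static unsigned int
mode_precision_model (enum machine_mode_model mode)
{
  switch (mode)
    {
    case SFmode:
      return 32;
    case DFmode:
      return 64;
    case TFmode:
      return 128;
    }
  return 0;
}
```

The point of the design is that the port names the mode directly, so
several modes with the same precision no longer depend on their
ordering.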

gcc/ada/ChangeLog:

* gcc-interface/decl.cc (gnat_to_gnu_entity): Use TYPE_PRECISION of
long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_c_mode_for_floating_type):
New function.
(TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
* config/aarch64/aarch64.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/alpha/alpha.cc (alpha_c_mode_for_floating_type): New
function.
(TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
* config/alpha/alpha.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.
* config/arc/arc.h (FLOAT_TYPE_SIZE): Remove.
(DOUBLE_TYPE_SIZE): Likewise.
(LONG_DOUBLE_TYPE_SIZE): Likewise.

Re: [PATCH] missing reuire target has_arch_ppc64 for pr106550.c

2024-05-22 Thread Kewen.Lin
Hi Jeff,

subject typo: s/reuire/require/

on 2024/5/23 09:11, Jiufu Guo wrote:
> Hi,
> 
> Case pr106550.c is testing constant building for 64bit
> register. So, this case requires target of has_arch_ppc64.
> 

Nit: Maybe add more comments saying it fails with -m32
without having the expected rldimi?  So it requires
has_arch_ppc64.

> Bootstrap and regtest pass on ppc64{,le}.
> Is this ok for trunk?
> 

Missing a changelog entry here, maybe something like:

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr106550.c: Adjust by requiring has_arch_ppc64
effective target.

> BR,
> Jeff(Jiufu) Guo
> 
> ---
>  gcc/testsuite/gcc.target/powerpc/pr106550.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106550.c 
> b/gcc/testsuite/gcc.target/powerpc/pr106550.c
> index 74e395331ab..146514b3adf 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr106550.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106550.c
> @@ -1,6 +1,7 @@
>  /* PR target/106550 */
>  /* { dg-options "-O2 -mdejagnu-cpu=power10" } */
>  /* { dg-require-effective-target power10_ok } */

Nit: power10_ok can be dropped.

> +/* { dg-require-effective-target has_arch_ppc64 } */
OK with the nits above tweaked, thanks.

BR,
Kewen



Re: [PATCH 13/13] rs6000, remove vector set and vector init built-ins.

2024-05-22 Thread Kewen.Lin
Hi Carl,

on 2024/5/23 08:29, Carl Love wrote:
> Kewen:
> 
> On 5/13/24 22:44, Kewen.Lin wrote:
>>> perform the same operation as setting a specific element in the vector in
>>> C code.  For example:
>>>
>>>   src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
>>>   src_v4si[index] = int_val;
>>>
>>> The built-in actually generates more instructions than the inline C code
>>> with no optimization but is identical with -O3 optimizations.
>>>
>>> All of the above built-ins that are removed do not have test cases and
>>> are not documented.
>>>
>>> Built-ins   __builtin_vec_set_v1ti __builtin_vec_set_v2di,
>>> __builtin_vec_set_v2df are not removed as they are used in function
>>> resolve_vec_insert() in file rs6000-c.cc.
>> I think we can replace these calls with the equivalent gimple codes
>> (early expanding it) and then we can get rid of these instances.
> 
> Hmm, going to need a little coaching here.  I am not sure how to do this.  
> Looks like I get to learn something new.
> 

We have functions rs6000_gimple_fold.*_builtin to fold the builtins;
they fold (expand) a bif into equivalent gimple code, and what
we want here is similar, so you can refer to some implementations there.
For the expected gimple code, you can refer to what's generated for
normal C code.  Feel free to let me know if you hit any issues
when trying this, or even if you'd prefer me to follow up on this.
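
As a starting point, the "normal C code" could be something like the
sketch below, written with the generic GCC vector extension so that it
does not need altivec.h.  The function name set_elem_v4si is made up,
and masking the index to the element count is my assumption to keep
the sketch well-defined, not something taken from the patch.
Compiling it with -fdump-tree-gimple should show the kind of gimple to
aim for.

```c
/* Generic GCC vector type with four int elements.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Insert VAL at element INDEX of V using plain subscripting, i.e.
   the same operation as src_v4si[index] = int_val in the quoted
   commit message.  The vector is passed by value, so the caller's
   copy is left untouched.  */
static v4si
set_elem_v4si (v4si v, int val, unsigned int index)
{
  v[index & 3] = val;
  return v;
}
```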

BR,
Kewen


Re: [PATCH v2] [testsuite] xfail pr79004 on longdouble64; drop long_double_64bit (was: ppc: testsuite: pr79004 needs -mlong-double-128)

2024-05-21 Thread Kewen.Lin
Hi,

on 2024/5/21 11:04, Alexandre Oliva wrote:
> On May  8, 2024, "Kewen.Lin"  wrote:
> 
>>>> How about the generic one "longdouble64"?  I did a grep and found it has
>>>> one use, I'd expect it can work here. :)
>>>
>>> ... since this and longdouble128 exist, maybe we can fix it and leave
>>> them all alone, despite the interface oddity.
>>>
>> ... personally I'm inclined to drop this 64 bit one. :)
> 
> Some of the asm opcodes expected by pr79004 depend on
> -mlong-double-128 to be output.  E.g., without this flag, the
> conditions of patterns @extenddf2 and extendsf2 do not
> hold, and so GCC resorts to libcalls instead of even trying
> rs6000_expand_float128_convert.
> 
> Perhaps the conditions are too strict, and they could enable the use
> of conversion insns involving __ieee128/_Float128 even with 64-bit
> long doubles.
> 
> For now, xfail the opcodes that are not available on longdouble64.
> 
> While at that, drop long_double_64bit, since it's broken and sort of
> redundant.
> 
> Regstrapped on x86_64-linux-gnu, also tested with gcc-13 on ppc64-vx7r2.
> Ok to install?

OK for trunk, thanks!

BR,
Kewen

> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   PR target/105359
>   * gcc.target/powerpc/pr79004.c: Xfail opcodes not available on
>   longdouble64.
>   * lib/target-supports.exp
>   (check_effective_target_long_double_64bit): Drop.
>   (add_options_for_long_double_64bit): Likewise.
> ---
>  gcc/testsuite/gcc.target/powerpc/pr79004.c |   14 +
>  gcc/testsuite/lib/target-supports.exp  |   43 
> 
>  2 files changed, 8 insertions(+), 49 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr79004.c 
> b/gcc/testsuite/gcc.target/powerpc/pr79004.c
> index caf1f6c1eefe4..2cb8bf4bc14bc 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr79004.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr79004.c
> @@ -100,10 +100,12 @@ void to_uns_short_store_n (TYPE a, unsigned short *p, 
> long n) { p[n] = (unsigned
>  void to_uns_int_store_n (TYPE a, unsigned int *p, long n) { p[n] = (unsigned 
> int)a; }
>  void to_uns_long_store_n (TYPE a, unsigned long *p, long n) { p[n] = 
> (unsigned long)a; }
>  
> -/* { dg-final { scan-assembler-not {\mbl __}   } } */
> -/* { dg-final { scan-assembler {\mxscvdpqp\M}  } } */
> -/* { dg-final { scan-assembler {\mxscvqpdp\M}  } } */
> -/* { dg-final { scan-assembler {\mxscvqpdpo\M} } } */
> +/* On targets with 64-bit long double, some opcodes to deal with __float128 
> are
> +   disabled, see PR target/105359.  */
> +/* { dg-final { scan-assembler-not {\mbl __}   { xfail longdouble64 } } 
> } */
> +/* { dg-final { scan-assembler {\mxscvdpqp\M}  { xfail longdouble64 } } 
> } */
> +/* { dg-final { scan-assembler {\mxscvqpdp\M}  { xfail longdouble64 } } 
> } */
> +/* { dg-final { scan-assembler {\mxscvqpdpo\M} { xfail longdouble64 } } 
> } */
>  /* { dg-final { scan-assembler {\mxscvqpsdz\M} } } */
>  /* { dg-final { scan-assembler {\mxscvqpswz\M} } } */
>  /* { dg-final { scan-assembler {\mxscvsdqp\M}  } } */
> @@ -111,7 +113,7 @@ void to_uns_long_store_n (TYPE a, unsigned long *p, long 
> n) { p[n] = (unsigned l
>  /* { dg-final { scan-assembler {\mlxsd\M}  } } */
>  /* { dg-final { scan-assembler {\mlxsiwax\M}   } } */
>  /* { dg-final { scan-assembler {\mlxsiwzx\M}   } } */
> -/* { dg-final { scan-assembler {\mlxssp\M} } } */
> +/* { dg-final { scan-assembler {\mlxssp\M} { xfail longdouble64 } } 
> } */
>  /* { dg-final { scan-assembler {\mstxsd\M} } } */
>  /* { dg-final { scan-assembler {\mstxsiwx\M}   } } */
> -/* { dg-final { scan-assembler {\mstxssp\M}} } */
> +/* { dg-final { scan-assembler {\mstxssp\M}{ xfail longdouble64 } } 
> } */
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index ec9baa4f32a30..dc7d4f2b5f39e 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -2930,49 +2930,6 @@ proc add_options_for_long_double_ieee128 { flags } {
>  return "$flags"
>  }
>  
> -# Check if GCC and GLIBC supports explicitly specifying that the long double
> -# format uses the IEEE 64-bit.  Under little endian PowerPC Linux, you need
> -# GLIBC 2.32 or later to be able to use a different long double format for
> -# running a program than the system default.
> -
> -proc check_effective_target_long_double_64bit { } {
> -return [check_runtime_nocache long_double_64bit {
> - #include 
> -   

Re: [PATCH] rs6000: load high and low part of 128bit vector independently [PR110040]

2024-05-21 Thread Kewen.Lin
Hi,

on 2024/2/26 13:43, jeevitha wrote:
> Hi All,
> 
> The following patch has been bootstrapped and regtested on powerpc64le-linux.
> 
> PR110040 exposes an issue concerning moves from vector registers to GPRs.
> There are two moves, one for upper 64 bits and the other for the lower
> 64 bits.  In the problematic test case, we are only interested in storing
> the lower 64 bits.  However, the instruction for copying the upper 64 bits
> is still emitted and is dead code.  This patch adds a splitter that splits
> apart the two move instructions so that DCE can remove the dead code after
> splitting.
> 
> 2024-02-26  Jeevitha Palanisamy  
> 
> gcc/
>   PR target/110040
>   * config/rs6000/vsx.md (split pattern for V1TI to DI move): Defined.
> 
> gcc/testsuite/
>   PR target/110040
>   * gcc.target/powerpc/pr110040-1.c: New testcase.
>   * gcc.target/powerpc/pr110040-2.c: New testcase.
> 
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 6111cc90eb7..78457f8fb14 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -6706,3 +6706,19 @@
>"vmsumcud %0,%1,%2,%3"
>[(set_attr "type" "veccomplex")]
>  )
> +
> +(define_split
> +  [(set (match_operand:V1TI 0 "int_reg_operand")
> +   (match_operand:V1TI 1 "vsx_register_operand"))]
> +  "reload_completed
> +   && TARGET_DIRECT_MOVE_64BIT"
> +   [(pc)]
> +{
> +  rtx op0 = gen_rtx_REG (DImode, REGNO (operands[0]));
> +  rtx op1 = gen_rtx_REG (V2DImode, REGNO (operands[1]));
> +  rtx op2 = gen_rtx_REG (DImode, REGNO (operands[0]) + 1);
> +  rtx op3 = gen_rtx_REG (V2DImode, REGNO (operands[1]));

Nit: op3 is the same as op1 (so useless and removable); maybe just
call it src_op?  And maybe name op0 and op2 dest_op0 and dest_op1
to match the extracted index?

> +  emit_insn (gen_vsx_extract_v2di (op0, op1, GEN_INT (0)));
> +  emit_insn (gen_vsx_extract_v2di (op2, op3, GEN_INT (1)));

Nit: GEN_INT ({0,1}) can be substituted with const{0,1}_rtx.

> +  DONE;
> +})
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr110040-1.c 
> b/gcc/testsuite/gcc.target/powerpc/pr110040-1.c
> new file mode 100644
> index 000..fb3bd254636
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr110040-1.c
> @@ -0,0 +1,14 @@
> +/* PR target/110040 */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */

powerpc_p9vector_ok doesn't exist any more, you have to rebase.

This also requires int128 effective target.

Change it to:

/* { dg-options "-O2" } */
/* { dg-require-effective-target int128 } */
/* { dg-require-effective-target powerpc_vsx } */

Since we just check no mfvsrd, I think we can even drop the
"-mdejagnu-cpu=power9", could you test if it works on multiple
environments?

> +/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */
> +
> +#include 
> +
> +void
> +foo (signed long *dst, vector signed __int128 src)
> +{
> +  *dst = (signed long) src[0];
> +}
> +
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr110040-2.c 
> b/gcc/testsuite/gcc.target/powerpc/pr110040-2.c
> new file mode 100644
> index 000..f3aa22be4e8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr110040-2.c
> @@ -0,0 +1,13 @@
> +/* PR target/110040 */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
> +/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */

Similar to the above:

/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
/* { dg-require-effective-target int128 } */
/* { dg-require-effective-target powerpc_vsx } */

Also add a comment on why it requires power10.

BR,
Kewen

> +
> +#include 
> +
> +void
> +foo (signed int *dst, vector signed __int128 src)
> +{
> +  __builtin_vec_xst_trunc (src, 0, dst);
> +}
> 
> 



Re: [PATCH 6/13] rs6000, add overloaded vec_sel with int128 arguments

2024-05-21 Thread Kewen.Lin
Hi Carl,

on 2024/5/22 08:13, Carl Love wrote:
> Kewen:
> 
> On 5/13/24 19:54, Kewen.Lin wrote:
>> Hi,
>>
>> on 2024/4/20 05:17, Carl Love wrote:
>>> rs6000, add overloaded vec_sel with int128 arguments
>>>
>>> Extend the vec_sel built-in to take three signed/unsigned int128 arguments
>>> and return a signed/unsigned int128 result.
>>>
>>> Extending the vec_sel built-in makes the existing built-ins
>>> __builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete.  The
>>> patch removes these built-ins.
>>>
>>> The patch adds documentation and test cases for the new overloaded vec_sel
>>> built-ins.
>>>
>>> gcc/ChangeLog:
>>> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti,
>>> __builtin_vsx_xxsel_1ti_uns): Remove built-in definitions.
>>> * config/rs6000/rs6000-overload.def (vec_sel): Add new overloaded
>>> definitions.
>>> * doc/extend.texi: Add documentation for new vec_sel arguments.
>>>
>>> gcc/testsuite/ChangeLog:
>>> * gcc.target/powerpc/vec_sel_runnable-int128.c: New test file.
>>> ---
>>>  gcc/config/rs6000/rs6000-builtins.def |  6 --
>>>  gcc/config/rs6000/rs6000-overload.def |  4 +
>>>  gcc/doc/extend.texi   | 14 
>>>  .../powerpc/vec-sel-runnable-i128.c   | 84 +++
>>>  4 files changed, 102 insertions(+), 6 deletions(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c
>>>
>>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>>> b/gcc/config/rs6000/rs6000-builtins.def
>>> index d09e21a9151..46d2ae7b7cb 100644
>>> --- a/gcc/config/rs6000/rs6000-builtins.def
>>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>>> @@ -1931,12 +1931,6 @@
>>>const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
>>>  XXSEL_16QI_UNS vector_select_v16qi_uns {}
>>>  
>>> -  const vsq __builtin_vsx_xxsel_1ti (vsq, vsq, vsq);
>>> -XXSEL_1TI vector_select_v1ti {}
>>> -
>>> -  const vsq __builtin_vsx_xxsel_1ti_uns (vsq, vsq, vsq);
>>> -XXSEL_1TI_UNS vector_select_v1ti_uns {}
>>> -
>>>const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
>>>  XXSEL_2DF vector_select_v2df {}
>>>  
>>> diff --git a/gcc/config/rs6000/rs6000-overload.def 
>>> b/gcc/config/rs6000/rs6000-overload.def
>>> index 68501c05289..5912c9452f4 100644
>>> --- a/gcc/config/rs6000/rs6000-overload.def
>>> +++ b/gcc/config/rs6000/rs6000-overload.def
>>> @@ -3274,6 +3274,10 @@
>>>  VSEL_2DF  VSEL_2DF_B
>>>vd __builtin_vec_sel (vd, vd, vull);
>>>  VSEL_2DF  VSEL_2DF_U
>>> +  vsq __builtin_vec_sel (vsq, vsq, vsq);
>>> +VSEL_1TI  VSEL_1TI_S
>>> +  vuq __builtin_vec_sel (vuq, vuq, vuq);
>>> +VSEL_1TI_UNS  VSEL_1TI_U
>>>  ; The following variants are deprecated.
>>>vsll __builtin_vec_sel (vsll, vsll, vsll);
>>>  VSEL_2DI_B  VSEL_2DI_S
>>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>>> index 64a43b55e2d..86b8e536dbe 100644
>>> --- a/gcc/doc/extend.texi
>>> +++ b/gcc/doc/extend.texi
>>> @@ -23358,6 +23358,20 @@ The programmer is responsible for understanding 
>>> the endianness issues involved
>>>  with the first argument and the result.
>>>  @findex vec_replace_unaligned
>>>  
>>> +Vector select
>>> +
>>> +@smallexample
>>> +vector signed __int128 vec_sel (vector signed __int128,
>>> +   vector signed __int128, vector signed __int128);
>>> +vector unsigned __int128 vec_sel (vector unsigned __int128,
>>> +   vector unsigned __int128, vector unsigned __int128);
>>> +@end smallexample
>>> +
>>> +The overloaded built-in @code{vec_sel} with vector signed/unsigned __int128
>>> +arguments and returns a vector selecting bits from the two source vectors 
>>> based
>>> +on the values of the third input vector.  This built-in is an extension of 
>>> the
>>> +@code{vec_sel} built-in documented in the PVIPR.
>>> +
>>
>> Why did you place this in a section for ISA 3.1 (Power10)?  It doesn't really
>> require this support.  The used instance VSEL_1TI and VSEL_1TI_UNS are placed
>> in altivec stanza, so it looks that we should put it under the section
>> "PowerPC AltiVec Built-in Functions on ISA 2.05".  And since it's an 

Re: [PATCH 4/13] rs6000, extend the current vec_{un,}signed{e,o} built-ins

2024-05-19 Thread Kewen.Lin
Hi Carl,

on 2024/5/18 04:20, Carl Love wrote:
> Kewen:
> 
> I am working thru the patches.  I made the changes as requested for this 
> patch but have a question about 
> one of your comments.
> 
> On 5/14/24 00:53, Kewen.Lin wrote:
>> Hi,
>>
>> on 2024/4/20 05:17, Carl Love wrote:
>>> rs6000, extend the current vec_{un,}signed{e,o} built-ins
>>>
>>> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
>>> convert a vector of floats to signed/unsigned long long ints.  Extend the
>>> existing vec_{un,}signed{e,o} built-ins to handle the argument
>>> vector of floats to return the even/odd signed/unsigned integers.
>>>
>>> Add testcases and update documentation.
>>>
>>> gcc/ChangeLog:
>>> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxds_low,
>>> __builtin_vsx_xvcvspuxds_low): New built-in definitions.
>>> * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo):
>>> Add new overloaded specifications.
>>> * config/rs6000/vsx.md (vsx_xvcvspxds_low): New define_expand.
>>> * doc/extend.texi (vec_signedo, vec_signede): Add documentation.
>>>
>>> gcc/testsuite/ChangeLog:
>>> * gcc.target/powerpc/builtins-3-runnable: New tests for the added
> 
> 
> 
>>
>> As the existing instances for vec_signed and vec_unsigned are with
>> names like VEC_V{UN,}SIGNED{O,E}_V2DF, I prefer these are updated
>> with similar style, maybe something like:
>>
>> VEC_V{UN,}SIGNED{E,O}_V4SF v{un,}signed{e,o}_v4sf
> 
> Yes, sounds reasonable.  Changed XVCVSPUXDS -> VEC_VUNSIGNEDE_V4SF
>  XVCVSPUXDSO -> VEC_VUNSIGNEDO_V4SF
>XVCVSPSXDS  -> VEC_VSIGNEDE_V4SF
>XVCVSPSXDSO  -> VEC_VSIGNEDO_V4SF
> 
> QUESTION:
> I am not sure what you want changed to v{un,}signed{e,o}_v4sf??  The 
> overloaded instance entry names

It's about the expander name, just like the existing *vunsignede_v2df* and 
*vunsignedo_v2df*:

  const vsi __builtin_vsx_vunsignede_v2df (vd);
VEC_VUNSIGNEDE_V2DF vunsignede_v2df {}

  const vsi __builtin_vsx_vunsignedo_v2df (vd);
VEC_VUNSIGNEDO_V2DF vunsignedo_v2df {}

, not for the actual builtin names, sorry for the confusion.

BR,
Kewen

> for vd, vf have to match the first line of the definition. The name can't be 
> type specific, i.e. v4sf.  
> So not sure where you want the v{un,}signed{e,o}_v4sf name used?
> 
> For example, file rs6000-overload.def now looks like:
> 
> [VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede]
>vsi __builtin_vec_vsignede (vd);
>  VEC_VSIGNEDE_V2DF
> +  vsll __builtin_vec_vsignede (vf);
> +VEC_VSIGNEDE_V4SF
>  
>  [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo]
>vsi __builtin_vec_vsignedo (vd);
>  VEC_VSIGNEDO_V2DF
> +  vsll __builtin_vec_vsignedo (vf);
> +VEC_VSIGNEDO_V4SF
>  
> 
> 
> 
> 
>  Carl 



Re: [PATCH] rs6000: Don't clobber return value when eh_return called [PR114846]

2024-05-16 Thread Kewen.Lin
Hi,

on 2024/5/16 12:08, Andrew Pinski wrote:
> 
> On Thu, May 16, 2024, 4:09 AM Kewen.Lin wrote:
> 
> Hi,
> 
> As the associated test case in PR114846 shows, currently
> with eh_return involved, some register restoring for EH
> RETURN DATA in the epilogue can clobber the register holding
> the return value.  Following the existing handling in
> some other targets, this patch makes the eh_return expander
> call one new define_insn_and_split eh_return_internal which
> directly calls rs6000_emit_epilogue with epilogue_type
> EPILOGUE_TYPE_EH_RETURN, instead of the previous approach of
> specially treating a normal return with crtl->calls_eh_return.
> 
> Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
> powerpc64le-linux-gnu P9 and P10.
> 
> I'm going to push this next week if no objections.
> 
> 
> 
> Thanks for fixing this for powerpc. I hope my patch for aarch64 gets reviewed 
> soon and it will contain many more testcases. Hopefully someone will fix the 
> arm target too.
> 

Looking forward to that!  Thanks for contributing those new eh-return c-torture
test cases, I just tested all of them on LE, all passed. :)

BR,
Kewen

> Thanks,
> Andrew
> 
> 
> 
> BR,
> Kewen
> -
>         PR target/114846
> 
> gcc/ChangeLog:
> 
>         * config/rs6000/rs6000-logue.cc (rs6000_emit_epilogue): As
>         EPILOGUE_TYPE_EH_RETURN would be passed as epilogue_type directly
>         now, adjust the relevant handlings on it.
>         * config/rs6000/rs6000.md (eh_return expander): Append by calling
>         gen_eh_return_internal and emit_barrier.
>         (eh_return_internal): New define_insn_and_split, call function
>         rs6000_emit_epilogue with epilogue type EPILOGUE_TYPE_EH_RETURN.
> 
> gcc/testsuite/ChangeLog:
> 
>         * gcc.target/powerpc/pr114846.c: New test.
> ---
>  gcc/config/rs6000/rs6000-logue.cc           |  7 +++
>  gcc/config/rs6000/rs6000.md                 | 15 +++
>  gcc/testsuite/gcc.target/powerpc/pr114846.c | 20 
>  3 files changed, 38 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr114846.c
> 
> diff --git a/gcc/config/rs6000/rs6000-logue.cc 
> b/gcc/config/rs6000/rs6000-logue.cc
> index 60ba15a8bc3..bd5d56ba002 100644
> --- a/gcc/config/rs6000/rs6000-logue.cc
> +++ b/gcc/config/rs6000/rs6000-logue.cc
> @@ -4308,9 +4308,6 @@ rs6000_emit_epilogue (enum epilogue_type 
> epilogue_type)
> 
>    rs6000_stack_t *info = rs6000_stack_info ();
> 
> -  if (epilogue_type == EPILOGUE_TYPE_NORMAL && crtl->calls_eh_return)
> -    epilogue_type = EPILOGUE_TYPE_EH_RETURN;
> -
>    int strategy = info->savres_strategy;
>    bool using_load_multiple = !!(strategy & REST_MULTIPLE);
>    bool restoring_GPRs_inline = !!(strategy & REST_INLINE_GPRS);
> @@ -4788,7 +4785,9 @@ rs6000_emit_epilogue (enum epilogue_type 
> epilogue_type)
> 
>    /* In the ELFv2 ABI we need to restore all call-saved CR fields from
>       *separate* slots if the routine calls __builtin_eh_return, so
> -     that they can be independently restored by the unwinder.  */
> +     that they can be independently restored by the unwinder.  Since
> +     it is for CR fields restoring, it should be done for any epilogue
> +     types (not EPILOGUE_TYPE_EH_RETURN specific).  */
>    if (DEFAULT_ABI == ABI_ELFv2 && crtl->calls_eh_return)
>      {
>        int i, cr_off = info->ehcr_offset;
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index ac5651d7420..d4120c3b9ce 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -14281,6 +14281,8 @@ (define_expand "eh_return"
>    ""
>  {
>    emit_insn (gen_eh_set_lr (Pmode, operands[0]));
> +  emit_jump_insn (gen_eh_return_internal ());
> +  emit_barrier ();
>    DONE;
>  })
> 
> @@ -14297,6 +14299,19 @@ (define_insn_and_split "@eh_set_lr_"
>    DONE;
>  })
> 
> +(define_insn_and_split "eh_return_internal"
> +  [(eh_return)]
> +  ""
> +  "#"
> +  "epilogue_completed"
> +  [(const_int 0)]
> +{
> +  if (!TARGET_SCHED_PROLOG)
> +    emit_insn (gen_blockage ());
> +  rs6000_emit_epilogue (EPILOGUE_TYPE_EH_RETURN);
> +  DONE;
> +})
> +
>  (define_insn "pr

[PATCH] rs6000: Don't clobber return value when eh_return called [PR114846]

2024-05-15 Thread Kewen.Lin
Hi,

As the associated test case in PR114846 shows, currently
with eh_return involved, some register restoring for EH
RETURN DATA in the epilogue can clobber the register holding
the return value.  Following the existing handling in
some other targets, this patch makes the eh_return expander
call one new define_insn_and_split eh_return_internal which
directly calls rs6000_emit_epilogue with epilogue_type
EPILOGUE_TYPE_EH_RETURN, instead of the previous approach of
specially treating a normal return with crtl->calls_eh_return.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this next week if no objections.

BR,
Kewen
-
PR target/114846

gcc/ChangeLog:

* config/rs6000/rs6000-logue.cc (rs6000_emit_epilogue): As
EPILOGUE_TYPE_EH_RETURN would be passed as epilogue_type directly
now, adjust the relevant handlings on it.
* config/rs6000/rs6000.md (eh_return expander): Append by calling
gen_eh_return_internal and emit_barrier.
(eh_return_internal): New define_insn_and_split, call function
rs6000_emit_epilogue with epilogue type EPILOGUE_TYPE_EH_RETURN.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr114846.c: New test.
---
 gcc/config/rs6000/rs6000-logue.cc   |  7 +++
 gcc/config/rs6000/rs6000.md | 15 +++
 gcc/testsuite/gcc.target/powerpc/pr114846.c | 20 
 3 files changed, 38 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr114846.c

diff --git a/gcc/config/rs6000/rs6000-logue.cc 
b/gcc/config/rs6000/rs6000-logue.cc
index 60ba15a8bc3..bd5d56ba002 100644
--- a/gcc/config/rs6000/rs6000-logue.cc
+++ b/gcc/config/rs6000/rs6000-logue.cc
@@ -4308,9 +4308,6 @@ rs6000_emit_epilogue (enum epilogue_type epilogue_type)

   rs6000_stack_t *info = rs6000_stack_info ();

-  if (epilogue_type == EPILOGUE_TYPE_NORMAL && crtl->calls_eh_return)
-epilogue_type = EPILOGUE_TYPE_EH_RETURN;
-
   int strategy = info->savres_strategy;
   bool using_load_multiple = !!(strategy & REST_MULTIPLE);
   bool restoring_GPRs_inline = !!(strategy & REST_INLINE_GPRS);
@@ -4788,7 +4785,9 @@ rs6000_emit_epilogue (enum epilogue_type epilogue_type)

   /* In the ELFv2 ABI we need to restore all call-saved CR fields from
  *separate* slots if the routine calls __builtin_eh_return, so
- that they can be independently restored by the unwinder.  */
+ that they can be independently restored by the unwinder.  Since
+ it is for CR fields restoring, it should be done for any epilogue
+ types (not EPILOGUE_TYPE_EH_RETURN specific).  */
   if (DEFAULT_ABI == ABI_ELFv2 && crtl->calls_eh_return)
 {
   int i, cr_off = info->ehcr_offset;
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index ac5651d7420..d4120c3b9ce 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -14281,6 +14281,8 @@ (define_expand "eh_return"
   ""
 {
   emit_insn (gen_eh_set_lr (Pmode, operands[0]));
+  emit_jump_insn (gen_eh_return_internal ());
+  emit_barrier ();
   DONE;
 })

@@ -14297,6 +14299,19 @@ (define_insn_and_split "@eh_set_lr_"
   DONE;
 })

+(define_insn_and_split "eh_return_internal"
+  [(eh_return)]
+  ""
+  "#"
+  "epilogue_completed"
+  [(const_int 0)]
+{
+  if (!TARGET_SCHED_PROLOG)
+emit_insn (gen_blockage ());
+  rs6000_emit_epilogue (EPILOGUE_TYPE_EH_RETURN);
+  DONE;
+})
+
 (define_insn "prefetch"
   [(prefetch (match_operand 0 "indexed_or_indirect_address" "a")
 (match_operand:SI 1 "const_int_operand" "n")
diff --git a/gcc/testsuite/gcc.target/powerpc/pr114846.c 
b/gcc/testsuite/gcc.target/powerpc/pr114846.c
new file mode 100644
index 000..efe2300b73a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr114846.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-require-effective-target builtin_eh_return } */
+
+/* Ensure it runs successfully.  */
+
+__attribute__ ((noipa))
+int f (int *a, long offset, void *handler)
+{
+  if (*a == 5)
+return 5;
+  __builtin_eh_return (offset, handler);
+}
+
+int main ()
+{
+  int t = 5;
+  if (f (&t, 0, 0) != 5)
+__builtin_abort ();
+  return 0;
+}
--
2.39.3


Re: [PATCH 1/4] rs6000: Make all 128 bit scalar FP modes have 128 bit precision [PR112993]

2024-05-14 Thread Kewen.Lin
Hi Joseph and Richi,

Thanks for the suggestions and comments!

on 2024/5/10 14:31, Richard Biener wrote:
> On Thu, May 9, 2024 at 9:12 PM Joseph Myers  wrote:
>>
>> On Wed, 8 May 2024, Kewen.Lin wrote:
>>
>>> to widen IFmode to TFmode.  To make build_common_tree_nodes
>>> be able to find the correct mode for long double type node,
>>> it introduces one hook mode_for_longdouble to offer target
>>> a way to specify the mode used for long double type node.
>>
>> I don't really like layering a hook on top of the old target macro as a
>> way to address a deficiency in the design of that target macro (floating
>> types should have their mode, not a poorly defined precision value,
>> specified directly by the target).

Good point!

> 
> Seconded.
> 
>> A better hook design might be something like mode_for_floating_type (enum
>> tree_index), where the argument is TI_FLOAT_TYPE, TI_DOUBLE_TYPE or
>> TI_LONG_DOUBLE_TYPE, replacing all definitions and uses of
>> FLOAT_TYPE_SIZE, DOUBLE_TYPE_SIZE and LONG_DOUBLE_TYPE_SIZE with the
>> single new hook and appropriate definitions for each target (with a
>> default definition that uses SFmode for float and DFmode for double and
>> long double, which would be suitable for many targets).
> 

The originally proposed hook was meant to leave the other ports unaffected,
but I agree that introducing such a hook would be clearer.

> In fact replacing all of X_TYPE_SIZE with a single hook might be worthwhile
> though this removes the "convenient" defaulting, requiring each target to
> enumerate all standard C ABI type modes.  But that might be also a good thing.
> 

I guess the main value of extending this from floating point types to all
types is to unify them?  (Assuming that, except for floating types, the others
would not have multiple possible representations like what we face with 128-bit fp.)

> The most pragmatic solution would be to do
> s/LONG_DOUBLE_TYPE_SIZE/LONG_DOUBLE_TYPE_MODE/

Yeah, this beats my proposed hook (assuming the default is VOIDmode too).

So it seems we have three alternatives here:
  1) s/LONG_DOUBLE_TYPE_SIZE/LONG_DOUBLE_TYPE_MODE/
  2) mode_for_floating_type
  3) mode_for_abi_type

Since 1) would make the long double type special (different from the other
types, which would still have _TYPE_SIZE), personally I'm inclined to 3):
implement 2) first, get this patch series landed, then extend it to all types.
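
(A minimal sketch of the shape alternative 2) could take, to make the
comparison concrete.  The enums below are hypothetical stand-ins, not the
real GCC definitions, and the function names are made up for illustration;
only the default mapping, SFmode for float and DFmode for double and long
double, comes from Joseph's description above.)

```c
#include <assert.h>

/* Hypothetical stand-ins for GCC's tree_index and machine_mode enums;
   the enumerator values here are illustrative only.  */
enum tree_index { TI_FLOAT_TYPE, TI_DOUBLE_TYPE, TI_LONG_DOUBLE_TYPE };
enum machine_mode { VOIDmode, SFmode, DFmode, TFmode, IFmode };

/* Default described in the thread: SFmode for float, DFmode for both
   double and long double.  */
enum machine_mode
default_mode_for_floating_type (enum tree_index ti)
{
  switch (ti)
    {
    case TI_FLOAT_TYPE:
      return SFmode;
    case TI_DOUBLE_TYPE:
    case TI_LONG_DOUBLE_TYPE:
      return DFmode;
    }
  return VOIDmode;
}

/* Hypothetical port override: a target whose long double is a 128-bit
   mode names that mode directly instead of giving a precision.  */
enum machine_mode
example_target_mode_for_floating_type (enum tree_index ti)
{
  if (ti == TI_LONG_DOUBLE_TYPE)
    return TFmode;
  return default_mode_for_floating_type (ti);
}
```

The point of the sketch is that the port returns a mode, so two 128-bit
formats with the same size can no longer be confused by a size value.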

Do you have any preference?  

BR,
Kewen


Re: [PATCH 11/13] rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in

2024-05-14 Thread Kewen.Lin
Hi,

on 2024/4/20 05:18, Carl Love wrote:
> rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in
> 
> The built-in __builtin_vsx_xvcmpeqsp_p is a duplicate of the overloaded
> __builtin_altivec_vcmpeqfp_p built-in.  The built-in is undocumented and
> there are no test cases for it.  The patch removes built-in
> __builtin_vsx_xvcmpeqsp_p.
As noted in the review comments on v1 (this is actually v2):
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646728.html
both __builtin_vsx_xvcmpeqsp_p and __builtin_vsx_xvcmpeqsp can be
dropped, so please consider __builtin_vsx_xvcmpeqsp as well.

> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtin.cc (case RS6000_BIF_RSQRT):
>   Remove case statement.

It seems you mixed this with some other patch, this line doesn't
belong to this patch, ...

> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp_p):
>   Remove built-in definition.
> ---
>  gcc/config/rs6000/rs6000-builtin.cc   | 6 --
>  gcc/config/rs6000/rs6000-builtins.def | 6 --
>  2 files changed, 12 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
> b/gcc/config/rs6000/rs6000-builtin.cc
> index f83d65b06ef..74ed8fc1805 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -269,12 +269,6 @@ rs6000_builtin_md_vectorized_function (tree fndecl, tree 
> type_out,
>  = (enum rs6000_gen_builtins) DECL_MD_FUNCTION_CODE (fndecl);
>switch (fn)
>  {
> -case RS6000_BIF_RSQRTF:
> -  if (VECTOR_UNIT_ALTIVEC_OR_VSX_P (V4SFmode)
> -   && out_mode == SFmode && out_n == 4
> -   && in_mode == SFmode && in_n == 4)
> - return rs6000_builtin_decls[RS6000_BIF_VRSQRTFP];
> -  break;

... and this ...

>  case RS6000_BIF_RSQRT:
>if (VECTOR_UNIT_VSX_P (V2DFmode)
> && out_mode == DFmode && out_n == 2
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index d65c858ac0c..2f6149edd5f 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -917,9 +917,6 @@
>fpmath vf __builtin_altivec_vrsqrtefp (vf);
>  VRSQRTEFP rsqrtev4sf2 {}
>  
> -  fpmath vf __builtin_altivec_vrsqrtfp (vf);
> -VRSQRTFP rsqrtv4sf2 {}
> -

..., also this.

BR,
Kewen

>const vsc __builtin_altivec_vsel_16qi (vsc, vsc, vuc);
>  VSEL_16QI vector_select_v16qi {}
>  
> @@ -1619,9 +1616,6 @@
>const vf __builtin_vsx_xvcmpeqsp (vf, vf);
>  XVCMPEQSP vector_eqv4sf {}
>  
> -  const signed int __builtin_vsx_xvcmpeqsp_p (signed int, vf, vf);
> -XVCMPEQSP_P vector_eq_v4sf_p {pred}
> -
>const vd __builtin_vsx_xvcmpgedp (vd, vd);
>  XVCMPGEDP vector_gev2df {}
>  


Re: [PATCH 2/13] rs6000, Remove __builtin_vsx_xvcvspsxws built-in

2024-05-14 Thread Kewen.Lin
Hi,

on 2024/4/20 05:17, Carl Love wrote:
> rs6000, Remove __builtin_vsx_xvcvspsxws built-in
> 
> The built-in __builtin_vsx_xvcvspsxws is a duplicate of the vec_signed
> built-in that is documented in the PVIPR.  The __builtin_vsx_xvcvspsxws
> built-in is not documented and there are no test cases for it.
> 
> This patch removes the redundant built-in.

By revisiting the comments on the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646723.html

I wonder if it's intentional to keep the others.  At least the bifs
__builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws and
__builtin_vsx_xvcvuxddp_uns look removable; users can just use the
equivalent ones in the PVIPR.  And for the others, users can still use
the PVIPR ones by taking endianness into account (controlled with the
endianness macros).

BR,
Kewen

> 
> gcc/ChangeLog:
> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxws):
>   Remove built-in definition.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 7c36976a089..c6d2ea1bc39 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1709,9 +1709,6 @@
>const vsll __builtin_vsx_xvcvspsxds (vf);
>  XVCVSPSXDS vsx_xvcvspsxds {}
>  
> -  const vsi __builtin_vsx_xvcvspsxws (vf);
> -XVCVSPSXWS vsx_fix_truncv4sfv4si2 {}
> -
>const vsll __builtin_vsx_xvcvspuxds (vf);
>  XVCVSPUXDS vsx_xvcvspuxds {}
>  


Re: [PATCH 4/13] rs6000, extend the current vec_{un,}signed{e,o} built-ins

2024-05-14 Thread Kewen.Lin
Hi,

on 2024/4/20 05:17, Carl Love wrote:
> rs6000, extend the current vec_{un,}signed{e,o} built-ins
> 
> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
> convert a vector of floats to signed/unsigned long long ints.  Extend the
> existing vec_{un,}signed{e,o} built-ins to handle the argument
> vector of floats to return the even/odd signed/unsigned integers.
> 
> Add testcases and update documentation.
> 
> gcc/ChangeLog:
> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxds_low,
> __builtin_vsx_xvcvspuxds_low): New built-in definitions.
> * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo):
> Add new overloaded specifications.
> * config/rs6000/vsx.md (vsx_xvcvspxds_low): New define_expand.
> * doc/extend.texi (vec_signedo, vec_signede): Add documentation.
> 
> gcc/testsuite/ChangeLog:
> * gcc.target/powerpc/builtins-3-runnable: New tests for the added
> overloaded built-ins.

This part is missing, there are no test case changes in this patch.

> ---
>  gcc/config/rs6000/rs6000-builtins.def |  6 ++
>  gcc/config/rs6000/rs6000-overload.def |  8 
>  gcc/config/rs6000/vsx.md  | 23 +++
>  gcc/doc/extend.texi   | 13 +
>  4 files changed, 50 insertions(+)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index bf9a0ae22fc..5b7237a2327 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1709,9 +1709,15 @@
>const vsll __builtin_vsx_xvcvspsxds (vf);
>  XVCVSPSXDS vsx_xvcvspsxds {}
>  
> +  const vsll __builtin_vsx_xvcvspsxds_low (vf);
> +XVCVSPSXDSO vsx_xvcvspsxds_low {}
> +
>const vsll __builtin_vsx_xvcvspuxds (vf);
>  XVCVSPUXDS vsx_xvcvspuxds {}

This existing one should return type vull, ...

>  
> +  const vsll __builtin_vsx_xvcvspuxds_low (vf);
> +XVCVSPUXDSO vsx_xvcvspuxds_low {}

... so this copied one should be vull too.

As the existing instances for vec_signed and vec_unsigned have
names like VEC_V{UN,}SIGNED{O,E}_V2DF, I'd prefer these to be updated
in a similar style, maybe something like:

VEC_V{UN,}SIGNED{E,O}_V4SF v{un,}signed{e,o}_v4sf

>const vsi __builtin_vsx_xvcvspuxws (vf);
>  XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {}
>  
> diff --git a/gcc/config/rs6000/rs6000-overload.def 
> b/gcc/config/rs6000/rs6000-overload.def
> index 84bd9ae6554..68501c05289 100644
> --- a/gcc/config/rs6000/rs6000-overload.def
> +++ b/gcc/config/rs6000/rs6000-overload.def
> @@ -3307,10 +3307,14 @@
>  [VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede]
>vsi __builtin_vec_vsignede (vd);
>  VEC_VSIGNEDE_V2DF
> +  vsll __builtin_vec_vsignede (vf);
> +XVCVSPSXDS
>  
>  [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo]
>vsi __builtin_vec_vsignedo (vd);
>  VEC_VSIGNEDO_V2DF
> +  vsll __builtin_vec_vsignedo (vf);
> +XVCVSPSXDSO
>  
>  [VEC_SIGNEXTI, vec_signexti, __builtin_vec_signexti]
>vsi __builtin_vec_signexti (vsc);
> @@ -4433,10 +4437,14 @@
>  [VEC_UNSIGNEDE, vec_unsignede, __builtin_vec_vunsignede]
>vui __builtin_vec_vunsignede (vd);
>  VEC_VUNSIGNEDE_V2DF
> +  vull __builtin_vec_vunsignede (vf);
> +XVCVSPUXDS
>  
>  [VEC_UNSIGNEDO, vec_unsignedo, __builtin_vec_vunsignedo]
>vui __builtin_vec_vunsignedo (vd);
>  VEC_VUNSIGNEDO_V2DF
> +  vull __builtin_vec_vunsignedo (vf);
> +XVCVSPUXDSO
>  
As above, the name can be tweaked.

>  [VEC_VEE, vec_extract_exp, __builtin_vec_extract_exp]
>vui __builtin_vec_extract_exp (vf);
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index f135fa079bd..3d39ae7995f 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -2704,6 +2704,29 @@
>DONE;
>  })
>  
> +;; Convert low vector elements of 32-bit floating point numbers to vector of
> +;; 64-bit signed/unsigned integers.
> +(define_expand "vsx_xvcvspxds_low"
> +  [(match_operand:V2DI 0 "vsx_register_operand")
> +   (match_operand:V4SF 1 "vsx_register_operand")
> +   (any_fix (pc))]
> +  "VECTOR_UNIT_VSX_P (V2DFmode)"
> +{
> +  /* Shift left one word to put even word in correct location */
> +  rtx rtx_tmp;
> +  rtx rtx_val = GEN_INT (4);
> +  rtx_tmp = gen_reg_rtx (V4SFmode);
> +  emit_insn (gen_altivec_vsldoi_v4sf (rtx_tmp, operands[1], operands[1],
> +  rtx_val));
> +

I think this shift is only needed for LE; see the existing handling of
float/signed int to double conversions, like:

(define_expand "doublee2"
(define_expand "doubleo2"

> +  if (BYTES_BIG_ENDIAN)
> +emit_insn (gen_vsx_xvcvspxds_be (operands[0], rtx_tmp));
> +  else
> +emit_insn (gen_vsx_xvcvspxds_le (operands[0], rtx_tmp));
> +
> +  DONE;
> +})
> +
>  ;; Generate float2 double
>  ;; convert two double to float
>  (define_expand "float2_v2df"
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 

Re: [PATCH 3/13] rs6000, fix error in unsigned vector float to unsigned int built-in definitions

2024-05-14 Thread Kewen.Lin
Hi,

on 2024/4/20 05:17, Carl Love wrote:
> rs6000, fix error in unsigned vector float to unsigned  int built-in 
> definitions
> 
> The built-ins __builtin_vsx_vunsigned_v2df and __builtin_vsx_vunsigned_v4sf
> are supposed to take a vector of floats and return a vector of unsigned
> long long ints.  The definitions are using the signed version of the

Sorry for nitpicking: here __builtin_vsx_vunsigned_v2df takes a vector of doubles
and returns a vector of unsigned long long ints, while __builtin_vsx_vunsigned_v4sf
takes a vector of floats and returns a vector of unsigned ints.

> instructions not the unsigned version of the instruction.  The results
> should also be unsigned.  The builtins are used by the overloaded
> vec_unsigned builtin which has an unsigned result.
> 
> Similarly the built-ins __builtin_vsx_vunsignede_v2df and
> __builtin_vsx_vunsignedo_v2df are supposed to retun an unsigned result.

Nit: s/retun/return/

> If the floating point argument is negative, the unsigned result is zero.
> The built-ins are used in the overloaded built-in vec_unsignede and
> vec_unsignedo respectively.
> 
> Add a test cases for a negative floating point arguments for each of the
> above built-ins.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_vunsigned_v2df,
>   __builtin_vsx_vunsigned_v4sf, __builtin_vsx_vunsignede_v2df,
>   __builtin_vsx_vunsignedo_v2df): Change the result type to unsigned.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/builtins-3-runnable.c: Add tests for
>   vec_unsignede and vec_unsignedo with negative arguments.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 12 +-
>  .../gcc.target/powerpc/builtins-3-runnable.c  | 23 ---
>  2 files changed, 26 insertions(+), 9 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index c6d2ea1bc39..bf9a0ae22fc 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1580,16 +1580,16 @@
>const vsi __builtin_vsx_vsignedo_v2df (vd);
>  VEC_VSIGNEDO_V2DF vsignedo_v2df {}
>  
> -  const vsll __builtin_vsx_vunsigned_v2df (vd);
> -VEC_VUNSIGNED_V2DF vsx_xvcvdpsxds {}
> +  const vull __builtin_vsx_vunsigned_v2df (vd);
> +VEC_VUNSIGNED_V2DF vsx_xvcvdpuxds {}
>  
> -  const vsi __builtin_vsx_vunsigned_v4sf (vf);
> -VEC_VUNSIGNED_V4SF vsx_xvcvspsxws {}
> +  const vui __builtin_vsx_vunsigned_v4sf (vf);
> +VEC_VUNSIGNED_V4SF vsx_xvcvspuxws {}
>  
> -  const vsi __builtin_vsx_vunsignede_v2df (vd);
> +  const vui __builtin_vsx_vunsignede_v2df (vd);
>  VEC_VUNSIGNEDE_V2DF vunsignede_v2df {}
>  
> -  const vsi __builtin_vsx_vunsignedo_v2df (vd);
> +  const vui __builtin_vsx_vunsignedo_v2df (vd);
>  VEC_VUNSIGNEDO_V2DF vunsignedo_v2df {}
>  
>const vf __builtin_vsx_xscvdpsp (double);
> diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c 
> b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
> index 0231a1fd086..6d4fe84c8a1 100644
> --- a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
> @@ -313,6 +313,15 @@ int main()
>   test_unsigned_int_result (ALL, vec_uns_int_result,
> vec_uns_int_expected);
>  
> + /* Convert single precision float to  unsigned int.  Negative
> +arguments
> +  */
> + vec_flt0 = (vector float){-14.930, -834.49, -3.3, -5.4};
> + vec_uns_int_expected = (vector unsigned int){0, 0, 0, 0};
> + vec_uns_int_result = vec_unsigned (vec_flt0);
> + test_unsigned_int_result (ALL, vec_uns_int_result,
> +   vec_uns_int_expected);
> +
>   /* Convert double precision float to long long unsigned int */
>   vec_dble0 = (vector double){124.930, 8134.49};
>   vec_ll_uns_int_expected = (vector long long unsigned int){124, 8134};
> @@ -321,9 +330,9 @@ int main()
>vec_ll_uns_int_expected);

Nit: Similar coverage of negative arguments for vector double can be added here.
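
(For reference, a scalar sketch of the expectation such a test would
encode.  This is a plain C model, not the vec_unsigned built-in itself:
the function name is made up, and the clamping of NaN and of values at or
above 2^64 is my assumption about the saturating behavior; only "negative
input gives zero" is stated in the commit message above.)

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a saturating double -> unsigned long long conversion.
   Plain C casts of negative values to unsigned types are undefined, so
   the out-of-range cases are handled explicitly.  */
uint64_t
model_double_to_u64 (double x)
{
  /* NaN compares unequal to itself; treat it like a negative input.  */
  if (x != x || x <= 0.0)
    return 0;
  /* 2^64 is exactly representable as a double.  */
  if (x >= 18446744073709551616.0)
    return UINT64_MAX;
  /* In range: the cast truncates toward zero.  */
  return (uint64_t) x;
}
```

With inputs such as {-124.930, -234.49} the model gives {0, 0}, which is
what the expected vector in a negative-argument test would hold.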

BR,
Kewen

>  
>   /* Convert double precision vector float to vector unsigned int,
> -even words */
> - vec_dble0 = (vector double){3124.930, 8234.49};
> - vec_uns_int_expected = (vector unsigned int){3124, 0, 8234, 0};
> +even words.  Negative arguments */
> + vec_dble0 = (vector double){-124.930, -234.49};
> + vec_uns_int_expected = (vector unsigned int){0, 0, 0, 0};
>   vec_uns_int_result = vec_unsignede (vec_dble0);
>   test_unsigned_int_result (EVEN, vec_uns_int_result,
> vec_uns_int_expected);
> @@ -335,5 +344,13 @@ int main()
>   vec_uns_int_result = vec_unsignedo (vec_dble0);
>   test_unsigned_int_result (ODD, vec_uns_int_result,
> vec_uns_int_expected);
> +
> + /* Convert double precision vector float to vector unsigned int,
> +

Re: [PATCH 13/13] rs6000, remove vector set and vector init built-ins.

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/4/20 05:18, Carl Love wrote:
> rs6000, remove vector set and vector init built-ins.
> 
> The vector init built-ins:
> 
>   __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
>   __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
>   __builtin_vec_init_v2di, __builtin_vec_init_v2df,
>   __builtin_vec_set_v1ti
> 
> perform the same operation as initializing the vector in C code.  For
> example:
> 
>   result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
>   result_v4si = {1, 2, 3, 4};
> 
> These two constructs were tested and verified they generate identical
> assembly instructions with no optimization and -O3 optimization.
> 
> The vector set built-ins:
> 
>   __builtin_vec_set_v16qi, __builtin_vec_set_v8hi.
>   __builtin_vec_set_v4si, __builtin_vec_set_v4sf
> 
> perform the same operation as setting a specific element in the vector in
> C code.  For example:
> 
>   src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
>   src_v4si[index] = int_val;
> 
> The built-in actually generates more instructions than the inline C code
> with no optimization but is identical with -O3 optimizations.
> 
> All of the above built-ins that are removed do not have test cases and
> are not documented.
> 
> Built-ins   __builtin_vec_set_v1ti __builtin_vec_set_v2di,
> __builtin_vec_set_v2df are not removed as they are used in function
> resolve_vec_insert() in file rs6000-c.cc.

I think we can replace these calls with the equivalent GIMPLE code
(expanding them early), and then we can get rid of these instances.
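
(To illustrate the "plain C" forms the commit message refers to, here is a
small sketch using GCC's generic vector extension rather than the rs6000
built-ins.  The typedef and function names are only for illustration, and
masking the index with 3 is my own choice to keep the example in bounds.)

```c
#include <assert.h>

/* Generic GCC vector type: four 32-bit ints in 16 bytes.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Same effect as the removed init built-in: a brace initializer.  */
v4si
make_v4si (void)
{
  v4si v = {1, 2, 3, 4};
  return v;
}

/* Same effect as the removed set built-in: a subscripted store.
   The index is masked so it always names a valid element.  */
void
set_elem (v4si *v, int val, unsigned int index)
{
  (*v)[index & 3] = val;
}
```

For example, calling set_elem (&v, 42, 2) on the vector from make_v4si ()
leaves elements 0, 1 and 3 unchanged and stores 42 in element 2.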

BR,
Kewen

> 
> The built-ins are removed as they don't provide any benefit over just
> using C code.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vec_init_v16qi,
>__builtin_vec_init_v8hi, __builtin_vec_init_v4si,
>   __builtin_vec_init_v4sf, __builtin_vec_init_v2di,
>   __builtin_vec_init_v2df, __builtin_vec_set_v1ti,
>   __builtin_vec_set_v16qi, __builtin_vec_set_v8hi.
>   __builtin_vec_set_v4si, __builtin_vec_set_v4sf,
>   __builtin_vec_set_v2di, __builtin_vec_set_v2df,
>   __builtin_vec_set_v1ti): Remove built-in definitions.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 42 ++-
>  1 file changed, 2 insertions(+), 40 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 19d05b8043a..d04ad4ce7e5 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1115,37 +1115,6 @@
>const signed short __builtin_vec_ext_v8hi (vss, signed int);
>  VEC_EXT_V8HI nothing {extract}
>  
> -  const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, 
> \
> -signed char, signed char, signed char, signed char, signed char, 
> \
> -signed char, signed char, signed char, signed char, signed char, 
> \
> -signed char, signed char, signed char);
> -VEC_INIT_V16QI nothing {init}
> -
> -  const vf __builtin_vec_init_v4sf (float, float, float, float);
> -VEC_INIT_V4SF nothing {init}
> -
> -  const vsi __builtin_vec_init_v4si (signed int, signed int, signed int, \
> - signed int);
> -VEC_INIT_V4SI nothing {init}
> -
> -  const vss __builtin_vec_init_v8hi (signed short, signed short, signed 
> short,\
> - signed short, signed short, signed short, signed short, \
> - signed short);
> -VEC_INIT_V8HI nothing {init}
> -
> -  const vsc __builtin_vec_set_v16qi (vsc, signed char, const int<4>);
> -VEC_SET_V16QI nothing {set}
> -
> -  const vf __builtin_vec_set_v4sf (vf, float, const int<2>);
> -VEC_SET_V4SF nothing {set}
> -
> -  const vsi __builtin_vec_set_v4si (vsi, signed int, const int<2>);
> -VEC_SET_V4SI nothing {set}
> -
> -  const vss __builtin_vec_set_v8hi (vss, signed short, const int<3>);
> -VEC_SET_V8HI nothing {set}
> -
> -
>  ; Cell builtins.
>  [cell]
>pure vsc __builtin_altivec_lvlx (signed long, const void *);
> @@ -1292,15 +1261,8 @@
>const signed long long __builtin_vec_ext_v2di (vsll, signed int);
>  VEC_EXT_V2DI nothing {extract}
>  
> -  const vsq __builtin_vec_init_v1ti (signed __int128);
> -VEC_INIT_V1TI nothing {init}
> -
> -  const vd __builtin_vec_init_v2df (double, double);
> -VEC_INIT_V2DF nothing {init}
> -
> -  const vsll __builtin_vec_init_v2di (signed long long, signed long long);
> -VEC_INIT_V2DI nothing {init}
> -
> +;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in
> +;; resolve_vec_insert(), rs6000-c.cc
>const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>);
>  VEC_SET_V1TI nothing {set}
>  



Re: [PATCH 12/13] rs6000, remove __builtin_vsx_xvcmpeqsp built-in

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/4/20 05:18, Carl Love wrote:
> rs6000, remove __builtin_vsx_xvcmpeqsp built-in
> 
> The built-in __builtin_vsx_xvcmpeqsp is a duplicate of the overloaded
> vec_cmpeq built-in.  The built-in is undocumented.  The built-in and
> the test cases are removed.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp):
>   Remove built-in definition.
> 

Ah, you separated this __builtin_vsx_xvcmpeqsp from the one for
__builtin_vsx_xvcmpeqsp_p.  That's fine; please ignore my comment in the
previous reply to 11/13 about also considering __builtin_vsx_xvcmpeqsp.


> gcc/testsuite/ChangeLog:
>   * vsx-builtin-3.c (do_cmp): Remove test case for
>   __builtin_vsx_xvcmpeqsp.
> ---
>  gcc/config/rs6000/rs6000-builtins.def| 3 ---
>  gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c | 2 --
>  2 files changed, 5 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 2f6149edd5f..19d05b8043a 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1613,9 +1613,6 @@
>const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd);
>  XVCMPEQDP_P vector_eq_v2df_p {pred}
>  
> -  const vf __builtin_vsx_xvcmpeqsp (vf, vf);
> -XVCMPEQSP vector_eqv4sf {}
> -
>const vd __builtin_vsx_xvcmpgedp (vd, vd);
>  XVCMPGEDP vector_gev2df {}
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> index 35ea31b2616..245893dc0e3 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> @@ -27,7 +27,6 @@
>  /* { dg-final { scan-assembler "xvcmpeqdp" } } */
>  /* { dg-final { scan-assembler "xvcmpgtdp" } } */
>  /* { dg-final { scan-assembler "xvcmpgedp" } } */
> -/* { dg-final { scan-assembler "xvcmpeqsp" } } */
>  /* { dg-final { scan-assembler "xvcmpgtsp" } } */
>  /* { dg-final { scan-assembler "xvcmpgesp" } } */
>  /* { dg-final { scan-assembler "xxsldwi" } } */
> @@ -112,7 +111,6 @@ int do_cmp (void)
>d[i][0] = __builtin_vsx_xvcmpgtdp (d[i][1], d[i][2]); i++;
>d[i][0] = __builtin_vsx_xvcmpgedp (d[i][1], d[i][2]); i++;
>  
> -  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
>f[i][0] = __builtin_vsx_xvcmpgtsp (f[i][1], f[i][2]); i++;
>f[i][0] = __builtin_vsx_xvcmpgesp (f[i][1], f[i][2]); i++;
>return i;

As with the others in this patch series, I'd prefer to replace it with
vec_cmpeq here.  OK for trunk with this tweak (also keep the
scan there), thanks!

BR,
Kewen



Re: [PATCH 10/13] rs6000, extend vec_xxpermdi built-in for __int128 args

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/4/20 05:18, Carl Love wrote:
> rs6000, extend vec_xxpermdi built-in for __int128 args
> 
> Add a new overloaded instance for vec_xxpermdi
> 
>__int128 vec_xxpermdi (__int128, __int128, const int);
> 
> Update the documentation to include a reference to the new built-in
> instance.
> 
> gcc/ChangeLog:
> * config/rs6000/rs6000-builtins.def (vec_xxpermdi): Add new
>   overloaded built-in instance.
> ---
>  gcc/config/rs6000/rs6000-overload.def | 2 ++
>  gcc/doc/extend.texi   | 1 +
>  2 files changed, 3 insertions(+)
> 
> diff --git a/gcc/config/rs6000/rs6000-overload.def 
> b/gcc/config/rs6000/rs6000-overload.def
> index 5912c9452f4..49962e2f2a2 100644
> --- a/gcc/config/rs6000/rs6000-overload.def
> +++ b/gcc/config/rs6000/rs6000-overload.def
> @@ -4932,6 +4932,8 @@
>  XXPERMDI_4SF  XXPERMDI_VF
>vd __builtin_vsx_xxpermdi (vd, vd, const int);
>  XXPERMDI_2DF  XXPERMDI_VD
> +  vsq __builtin_vsx_xxpermdi (vsq, vsq, const int);
> +XXPERMDI_1TI  XXPERMDI_1TI

This actually introduces only the signed __int128.  Considering the other
existing ones, I think we want both signed and unsigned.

>  
>  [VEC_XXSLDWI, vec_xxsldwi, __builtin_vsx_xxsldwi]
>vsc __builtin_vsx_xxsldwi (vsc, vsc, const int);
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 86b8e536dbe..47cf2f3bc8b 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -22505,6 +22505,7 @@ void vec_vsx_st (vector bool char, int, vector bool 
> char *);
>  void vec_vsx_st (vector bool char, int, unsigned char *);
>  void vec_vsx_st (vector bool char, int, signed char *);
>  
> +vector __int128 vec_xxpermdi (vector __int128, vector __int128, const int);
>  vector double vec_xxpermdi (vector double, vector double, const int);
>  vector float vec_xxpermdi (vector float, vector float, const int);

Nit: Considering the existing ones are sorted by element size descending, I guess
it's better to move the above here (with explicit signed and unsigned variants).

And we need a test case for it as well?

BR,
Kewen

>  vector long long vec_xxpermdi (vector long long, vector long long, const 
> int);




Re: [PATCH] report message for operator %a on unaddressible exp

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/5/14 11:00, Jiufu Guo wrote:
> Hi,
> 
> Thanks a lot for your helpful review!
> 
> "Kewen.Lin"  writes:
> 
>> Hi,
>>
>> on 2024/5/13 10:57, Jiufu Guo wrote:
>>> Hi,
>>>
>>> For PR96866, when gcc print asm code for modifier "%a" which requires
>>> an address operand, while the operand is with the constraint "X" which
>>> allow non-address form.  An error message would be reported to indicate
>>> the invalid asm operands.
>>>
>>> Bootstrap pass on ppc64{,le}.
>>> Is this ok for trunk?
>>>
>>> BR,
>>> Jeff(Jiufu Guo)
>>>
>>> PR target/96866
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/rs6000/rs6000.cc (print_operand_address):
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/powerpc/pr96866-1.c: New test.
>>> * gcc.target/powerpc/pr96866-2.c: New test.
>>>
>>> ---
>>>  gcc/config/rs6000/rs6000.cc  |  6 ++
>>>  gcc/testsuite/gcc.target/powerpc/pr96866-1.c | 15 +++
>>>  gcc/testsuite/gcc.target/powerpc/pr96866-2.c | 10 ++
>>>  3 files changed, 31 insertions(+)
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>>>
>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>> index 117999613d8..50943d76f79 100644
>>> --- a/gcc/config/rs6000/rs6000.cc
>>> +++ b/gcc/config/rs6000/rs6000.cc
>>> @@ -14659,6 +14659,12 @@ print_operand_address (FILE *file, rtx x)
>>>else if (SYMBOL_REF_P (x) || GET_CODE (x) == CONST
>>>|| GET_CODE (x) == LABEL_REF)
>>>  {
>>> +  if (this_is_asm_operands && !address_operand (x, VOIDmode))
>>
>> Do we really need this_is_asm_operands here?
> I understand your point: 
> since in function 'print_operand_address' which supports not only user
> asm code.  So, it maybe incorrect if 'x' is not an 'address_operand',
> no matter this_is_asm_operands.
> 
> Here, 'this_is_asm_operands' is needed because it would be treated as an
> user fault in asm-code (otherwise, internal_error in the compiler).

The called function "output_operand_lossage" already takes different
actions for the this_is_asm_operands and !this_is_asm_operands cases, so
for this_is_asm_operands it goes with error_for_asm and there is no ICE, no?

And without this_is_asm_operands, if we adopt constraint X internally
and hit this (which means it's already unexpected), isn't it better to see
the ICE instead of going further?
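
(A toy model of the point above, not GCC code: every name here is
hypothetical, and the dispatch is reduced to "user asm gives an ordinary
error, anything else gives an internal error".  It only shows that the
extra this_is_asm_operands guard changes the non-asm case alone.)

```c
#include <assert.h>
#include <stdbool.h>

enum diag_kind { DIAG_NONE, DIAG_ASM_ERROR, DIAG_INTERNAL_ERROR };

/* Toy stand-in for the lossage dispatch: an error for user asm,
   an internal error (ICE) otherwise.  */
enum diag_kind
model_operand_lossage (bool in_user_asm)
{
  return in_user_asm ? DIAG_ASM_ERROR : DIAG_INTERNAL_ERROR;
}

/* The check as written in the patch, guarded by the asm flag.  */
enum diag_kind
check_guarded (bool in_user_asm, bool is_address)
{
  if (in_user_asm && !is_address)
    return model_operand_lossage (in_user_asm);
  return DIAG_NONE;
}

/* The check without the guard, as suggested in the review.  */
enum diag_kind
check_unguarded (bool in_user_asm, bool is_address)
{
  if (!is_address)
    return model_operand_lossage (in_user_asm);
  return DIAG_NONE;
}
```

For user asm both variants report the same error; only the unguarded one
surfaces the unexpected internal case instead of continuing silently.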

BR,
Kewen

> 
> I notice one thing:
> As what we need is emitting error for printing address if the address
> can not be access directly.
> So it would be better to emit message through 'output_operand_lossage'
> just before gcc_assert(TARGET_TOC).
> 
> Thanks a lot for your insight comment!
> 
>>
>>> +   {
>>> + output_operand_lossage ("invalid expression as operand");
>>> + return;
>>> +   }
>>> +
>>>output_addr_const (file, x);
>>>if (small_data_operand (x, GET_MODE (x)))
>>> fprintf (file, "@%s(%s)", SMALL_DATA_RELOC,
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-1.c 
>>> b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>>> new file mode 100644
>>> index 000..6554a472a11
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>>> @@ -0,0 +1,15 @@
>>> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  
>>> */
>>> +/* { dg-excess-errors "pr96866-2.c" } */
>>> +/* { dg-options "-fPIC -O2" } */
>>
>> Nit: If these two options are required, it would be good to have a comment 
>> explaining it a bit
>> when it's not obvious.
> 
> Good suggestion, thanks!
>>
>>> +
>>> +int x[2];
>>> +
>>> +int __attribute__ ((noipa))
>>> +f1 (void)
>>> +{
>>> +  int n;
>>> +  int *p = x;
>>> +  *p++;
>>> +  __asm__ volatile("ld %0, %a1" : "=r"(n) : "X"(p));
>>> +  return n;
>>> +}
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-2.c 
>>> b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>>> new file mode 100644
>>> index 000..a5ec96f29dd
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
>>> @@ -0,0 +1,10 @@
>>> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  
>>> */
>>> +/* { dg-excess-errors "pr96866-2.c" } */
>>> +/* { dg-options "-fPIC -O2" } */
>>
>> Ditto.
> Thanks!
> 
> BR,
> Jeff(Jiufu) Guo
>>
>> BR,
>> Kewen
>>
>>> +
>>> +void
>>> +f (void)
>>> +{
>>> +  extern int x;
>>> +  __asm__ volatile("#%a0" ::"X"(&x));
>>> +}





Re: [PATCH 9/13] rs6000, remove __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp built-ins

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/4/20 05:18, Carl Love wrote:
> rs6000, remove __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp built-ins
> 
> The undocumented __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp are
> redundant.  The overloaded vec_neg built-in provides the same
> functionality.  The two built-ins are not documented nor are there any
> test cases for them.
> 
> Remove the definitions so users will use the overloaded vec_neg built-in
> which is documented in the PVIPR.

OK, thanks!

BR,
Kewen

> 
> gcc/ChangeLog:
> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvnegdp,
>   __builtin_vsx_xvnegsp): Remove built-in definitions.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 6 --
>  1 file changed, 6 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index f33564d3d9c..d65c858ac0c 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1763,12 +1763,6 @@
>const vf __builtin_vsx_xvnabssp (vf);
>  XVNABSSP vsx_nabsv4sf2 {}
>  
> -  const vd __builtin_vsx_xvnegdp (vd);
> -XVNEGDP negv2df2 {}
> -
> -  const vf __builtin_vsx_xvnegsp (vf);
> -XVNEGSP negv4sf2 {}
> -
>const vd __builtin_vsx_xvnmadddp (vd, vd, vd);
>  XVNMADDDP nfmav2df4 {}
>  



Re: [PATCH 8/13] rs6000, remove __builtin_vsx_vperm_* built-ins

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/4/20 05:18, Carl Love wrote:
> rs6000, remove __builtin_vsx_vperm_* built-ins
> 
> The undocumented built-ins:
>   __builtin_vsx_vperm_16qi_uns,
>   __builtin_vsx_vperm_1ti,
>   __builtin_vsx_vperm_1ti_uns,
>   __builtin_vsx_vperm_2df,
>   __builtin_vsx_vperm_2di,
>   __builtin_vsx_vperm_2di_uns,
>   __builtin_vsx_vperm_4sf,
>   __builtin_vsx_vperm_4si,
>   __builtin_vsx_vperm_4si_uns
> 
> are duplicates of the __builtin_altivec_* builtins that are used by
> the overloaded vec_perm built-in that is documented in the PVIPR.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_vperm_16qi_uns,
>   __builtin_vsx_vperm_1ti, __builtin_vsx_vperm_1ti_uns,
>   __builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di,
>   __builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf,
>   __builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns): Remove
>   built-in definitions and comments.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_vperm_16qi_uns,
>__builtin_vsx_vperm_1ti, __builtin_vsx_vperm_1ti_uns,
>   __builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di,
>   __builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf,
>   __builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns): Remove
>   test cases.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 33 ---
>  .../gcc.target/powerpc/vsx-builtin-3.c| 20 ---
>  2 files changed, 53 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 3c409d729ea..f33564d3d9c 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1529,39 +1529,6 @@
>const vf __builtin_vsx_uns_floato_v2di (vsll);
>  UNS_FLOATO_V2DI unsfloatov2di {}
>  
> -; These are duplicates of __builtin_altivec_* counterparts, and are being
> -; kept for backwards compatibility.  The reason for their existence is
> -; unclear.  TODO: Consider deprecation/removal at some point.
> -  const vsc __builtin_vsx_vperm_16qi (vsc, vsc, vuc);
> -VPERM_16QI_X altivec_vperm_v16qi {}
> -
> -  const vuc __builtin_vsx_vperm_16qi_uns (vuc, vuc, vuc);
> -VPERM_16QI_UNS_X altivec_vperm_v16qi_uns {}
> -
> -  const vsq __builtin_vsx_vperm_1ti (vsq, vsq, vsc);
> -VPERM_1TI_X altivec_vperm_v1ti {}
> -
> -  const vsq __builtin_vsx_vperm_1ti_uns (vsq, vsq, vsc);
> -VPERM_1TI_UNS_X altivec_vperm_v1ti_uns {}
> -
> -  const vd __builtin_vsx_vperm_2df (vd, vd, vuc);
> -VPERM_2DF_X altivec_vperm_v2df {}
> -
> -  const vsll __builtin_vsx_vperm_2di (vsll, vsll, vuc);
> -VPERM_2DI_X altivec_vperm_v2di {}
> -
> -  const vull __builtin_vsx_vperm_2di_uns (vull, vull, vuc);
> -VPERM_2DI_UNS_X altivec_vperm_v2di_uns {}
> -
> -  const vf __builtin_vsx_vperm_4sf (vf, vf, vuc);
> -VPERM_4SF_X altivec_vperm_v4sf {}
> -
> -  const vsi __builtin_vsx_vperm_4si (vsi, vsi, vuc);
> -VPERM_4SI_X altivec_vperm_v4si {}
> -
> -  const vui __builtin_vsx_vperm_4si_uns (vui, vui, vuc);
> -VPERM_4SI_UNS_X altivec_vperm_v4si_uns {}
> -
>const vss __builtin_vsx_vperm_8hi (vss, vss, vuc);
>  VPERM_8HI_X altivec_vperm_v8hi {}
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> index 01f35dad713..35ea31b2616 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> @@ -2,7 +2,6 @@
>  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
>  /* { dg-require-effective-target powerpc_vsx_ok } */
>  /* { dg-options "-O2 -mdejagnu-cpu=power7" } */
> -/* { dg-final { scan-assembler "vperm" } } */
>  /* { dg-final { scan-assembler "xvrdpi" } } */
>  /* { dg-final { scan-assembler "xvrdpic" } } */
>  /* { dg-final { scan-assembler "xvrdpim" } } */
> @@ -56,25 +55,6 @@ extern __vector unsigned long long ull[][4];
>  extern __vector __bool long bl[][4];
>  #endif
>  
> -int do_perm(void)
> -{
> -  int i = 0;
> -
> -  si[i][0] = __builtin_vsx_vperm_4si (si[i][1], si[i][2], uc[i][3]); i++;
> -  ss[i][0] = __builtin_vsx_vperm_8hi (ss[i][1], ss[i][2], uc[i][3]); i++;
> -  sc[i][0] = __builtin_vsx_vperm_16qi (sc[i][1], sc[i][2], uc[i][3]); i++;
> -  f[i][0] = __builtin_vsx_vperm_4sf (f[i][1], f[i][2], uc[i][3]); i++;
> -  d[i][0] = __builtin_vsx_vperm_2df (d[i][1], d[i][2], uc[i][3]); i++;
> -
> -  si[i][0] = __builtin_vsx_vperm (si[i][1], si[i][2], uc[i][3]); i++;
> -  ss[i][0] = __builtin_vsx_vperm (ss[i][1], ss[i][2], uc[i][3]); i++;
> -  sc[i][0] = __builtin_vsx_vperm (sc[i][1], sc[i][2], uc[i][3]); i++;
> -  f[i][0] = __builtin_vsx_vperm (f[i][1], f[i][2], uc[i][3]); i++;
> -  d[i][0] = __builtin_vsx_vperm (d[i][1], d[i][2], uc[i][3]); i++;
> -
> -  return i;
> -}
> -

I'd prefer to just replace these __builtin_vsx_vperm calls with vec_perm.
OK with this tweaked (also keep the vperm scan that is removed above), thanks!

BR,
Kewen

>  int do_xxperm (void)
>  {
>int i 
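
For readers following the vec_perm suggestion above, here is a minimal
portable sketch of what a 16-byte permute computes.  It is a plain C model,
not the rs6000 implementation: the name perm16_model is hypothetical, and it
assumes array (element) order and that only the low 5 bits of each control
byte are used.

```c
#include <stdint.h>

/* Hypothetical model of a 16-byte permute.  Each control byte's low
   5 bits index into the 32-byte concatenation of A followed by B.
   OUT must not alias A, B, or CTL.  */
static void
perm16_model (uint8_t out[16], const uint8_t a[16], const uint8_t b[16],
              const uint8_t ctl[16])
{
  for (int i = 0; i < 16; i++)
    {
      unsigned idx = ctl[i] & 0x1f;   /* upper 3 bits are ignored */
      out[i] = idx < 16 ? a[idx] : b[idx - 16];
    }
}
```

In the test being discussed, the same data flow would come from a vec_perm
call with the three vector operands, which is why the vperm assembler scan
can stay.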

Re: [PATCH 7/13] rs6000, remove the vec_xxsel built-ins, they are duplicates

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/4/20 05:18, Carl Love wrote:
> rs6000, remove the vec_xxsel built-ins, they are duplicates
> 
> The following undocumented built-ins are covered by the existing overloaded
> vec_sel built-in definitions.
> 
>   const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc);
> same as vsc __builtin_vec_sel (vsc, vsc, vuc);  (overloaded vec_sel)
> 
>   const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
> same as vuc __builtin_vec_sel (vuc, vuc, vuc);  (overloaded vec_sel)
> 
>   const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
> same as  vd __builtin_vec_sel (vd, vd, vull);   (overloaded vec_sel)
> 
>   const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll);
> same as vsll __builtin_vec_sel (vsll, vsll, vsll);  (overloaded vec_sel)
> 
>   const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull);
> same as vull __builtin_vec_sel (vull, vull, vsll);  (overloaded vec_sel)
> 
>   const vf __builtin_vsx_xxsel_4sf (vf, vf, vf);
> same as vf __builtin_vec_sel (vf, vf, vsi)  (overloaded vec_sel)
> 
>   const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi);
> same as vsi __builtin_vec_sel (vsi, vsi, vbi);  (overloaded vec_sel)
> 
>   const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui);
> same as vui __builtin_vec_sel (vui, vui, vui);  (overloaded vec_sel)
> 
>   const vss __builtin_vsx_xxsel_8hi (vss, vss, vss);
> same as vss __builtin_vec_sel (vss, vss, vbs);  (overloaded vec_sel)
> 
>   const vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus);
> same as vus __builtin_vec_sel (vus, vus, vus);  (overloaded vec_sel)
> 
> This patch removed the duplicate built-in definitions so users will only
> use the documented vec_sel built-in.  The __builtin_vsx_xxsel_[4si, 8hi,
> 16qi, 4sf, 2df] tests are also removed.
> 
> gcc/ChangeLog:
> * config/rs6000/rs6000-builtins.def (__builtin_vsx_xxmrglw_4si,

Typo: __builtin_vsx_xxmrglw_4si, which doesn't belong to this patch.

>   __builtin_vsx_xxsel_16qi, __builtin_vsx_xxsel_16qi_uns,
>   __builtin_vsx_xxsel_2df, __builtin_vsx_xxsel_2di,
>   __builtin_vsx_xxsel_2di_uns, __builtin_vsx_xxsel_4sf,
>   __builtin_vsx_xxsel_4si, __builtin_vsx_xxsel_4si_uns,
>   __builtin_vsx_xxsel_8hi, __builtin_vsx_xxsel_8hi_uns): Remove
>   built-in definitions.
> 
> gcc/testsuite/ChangeLog:
> * gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_xxsel_4si,
> __builtin_vsx_xxsel_8hi, __builtin_vsx_xxsel_16qi,
> __builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_2df): Remove test
> cases for removed built-ins.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 30 ---
>  .../gcc.target/powerpc/vsx-builtin-3.c| 26 
>  2 files changed, 56 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 46d2ae7b7cb..3c409d729ea 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1925,36 +1925,6 @@
>const vss __builtin_vsx_xxpermdi_8hi (vss, vss, const int<2>);
>  XXPERMDI_8HI vsx_xxpermdi_v8hi {}
>  
> -  const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc);
> -XXSEL_16QI vector_select_v16qi {}
> -
> -  const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
> -XXSEL_16QI_UNS vector_select_v16qi_uns {}
> -
> -  const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
> -XXSEL_2DF vector_select_v2df {}
> -
> -  const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll);
> -XXSEL_2DI vector_select_v2di {}
> -
> -  const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull);
> -XXSEL_2DI_UNS vector_select_v2di_uns {}
> -
> -  const vf __builtin_vsx_xxsel_4sf (vf, vf, vf);
> -XXSEL_4SF vector_select_v4sf {}
> -
> -  const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi);
> -XXSEL_4SI vector_select_v4si {}
> -
> -  const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui);
> -XXSEL_4SI_UNS vector_select_v4si_uns {}
> -
> -  const vss __builtin_vsx_xxsel_8hi (vss, vss, vss);
> -XXSEL_8HI vector_select_v8hi {}
> -
> -  const vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus);
> -XXSEL_8HI_UNS vector_select_v8hi_uns {}
> -
>const vsc __builtin_vsx_xxsldwi_16qi (vsc, vsc, const int<2>);
>  XXSLDWI_16QI vsx_xxsldwi_v16qi {}
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> index ff875c55304..01f35dad713 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> @@ -2,7 +2,6 @@
>  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
>  /* { dg-require-effective-target powerpc_vsx_ok } */
>  /* { dg-options "-O2 -mdejagnu-cpu=power7" } */
> -/* { dg-final { scan-assembler "xxsel" } } */
>  /* { dg-final { scan-assembler "vperm" } } */
>  /* { dg-final { scan-assembler "xvrdpi" } } */
>  /* { dg-final { scan-assembler "xvrdpic" } } */
> @@ -57,31 +56,6 @@ extern __vector unsigned long long ull[][4];
>  extern __vector __bool long 

Re: [PATCH 6/13] rs6000, add overloaded vec_sel with int128 arguments

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/4/20 05:17, Carl Love wrote:
> rs6000, add overloaded vec_sel with int128 arguments
> 
> Extend the vec_sel built-in to take three signed/unsigned int128 arguments
> and return a signed/unsigned int128 result.
> 
> Extending the vec_sel built-in makes the existing built-ins
> __builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete.  The
> patch removes these built-ins.
> 
> The patch adds documentation and test cases for the new overloaded vec_sel
> built-ins.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti,
>   __builtin_vsx_xxsel_1ti_uns): Remove built-in definitions.
>   * config/rs6000/rs6000-overload.def (vec_sel): Add new overloaded
>   definitions.
>   * doc/extend.texi: Add documentation for new vec_sel arguments.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/vec_sel_runnable-int128.c: New test file.
> ---
>  gcc/config/rs6000/rs6000-builtins.def |  6 --
>  gcc/config/rs6000/rs6000-overload.def |  4 +
>  gcc/doc/extend.texi   | 14 
>  .../powerpc/vec-sel-runnable-i128.c   | 84 +++
>  4 files changed, 102 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index d09e21a9151..46d2ae7b7cb 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1931,12 +1931,6 @@
>const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
>  XXSEL_16QI_UNS vector_select_v16qi_uns {}
>  
> -  const vsq __builtin_vsx_xxsel_1ti (vsq, vsq, vsq);
> -XXSEL_1TI vector_select_v1ti {}
> -
> -  const vsq __builtin_vsx_xxsel_1ti_uns (vsq, vsq, vsq);
> -XXSEL_1TI_UNS vector_select_v1ti_uns {}
> -
>const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
>  XXSEL_2DF vector_select_v2df {}
>  
> diff --git a/gcc/config/rs6000/rs6000-overload.def 
> b/gcc/config/rs6000/rs6000-overload.def
> index 68501c05289..5912c9452f4 100644
> --- a/gcc/config/rs6000/rs6000-overload.def
> +++ b/gcc/config/rs6000/rs6000-overload.def
> @@ -3274,6 +3274,10 @@
>  VSEL_2DF  VSEL_2DF_B
>vd __builtin_vec_sel (vd, vd, vull);
>  VSEL_2DF  VSEL_2DF_U
> +  vsq __builtin_vec_sel (vsq, vsq, vsq);
> +VSEL_1TI  VSEL_1TI_S
> +  vuq __builtin_vec_sel (vuq, vuq, vuq);
> +VSEL_1TI_UNS  VSEL_1TI_U
>  ; The following variants are deprecated.
>vsll __builtin_vec_sel (vsll, vsll, vsll);
>  VSEL_2DI_B  VSEL_2DI_S
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 64a43b55e2d..86b8e536dbe 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -23358,6 +23358,20 @@ The programmer is responsible for understanding the 
> endianness issues involved
>  with the first argument and the result.
>  @findex vec_replace_unaligned
>  
> +Vector select
> +
> +@smallexample
> +vector signed __int128 vec_sel (vector signed __int128,
> +   vector signed __int128, vector signed __int128);
> +vector unsigned __int128 vec_sel (vector unsigned __int128,
> +   vector unsigned __int128, vector unsigned __int128);
> +@end smallexample
> +
> +The overloaded built-in @code{vec_sel} with vector signed/unsigned __int128
> +arguments and returns a vector selecting bits from the two source vectors 
> based
> +on the values of the third input vector.  This built-in is an extension of 
> the
> +@code{vec_sel} built-in documented in the PVIPR.
> +

Why did you place this in a section for ISA 3.1 (Power10)?  It doesn't really
require this support.  The instances used, VSEL_1TI and VSEL_1TI_UNS, are placed
in the altivec stanza, so it looks like we should put it under the section
"PowerPC AltiVec Built-in Functions on ISA 2.05".  And since it's an extension
of @code{vec_sel} documented in the PVIPR, I prefer to just mention it's "an
extension of the @code{vec_sel} built-in documented in the PVIPR" and omit
the description to avoid possibly slightly different wording.
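
To make the selection semantics concrete, a minimal portable sketch follows.
It models a 128-bit value as two 64-bit halves so it builds without __int128
support; the type u128_model and the function sel128_model are hypothetical
names, and the sketch assumes the usual vec_sel rule that a result bit comes
from the first operand where the mask bit is 0 and from the second where it
is 1.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector element.  */
typedef struct
{
  uint64_t hi;
  uint64_t lo;
} u128_model;

/* Bitwise select: take bits of A where MASK is 0, bits of B where
   MASK is 1.  Signedness does not matter for a pure bit select.  */
static u128_model
sel128_model (u128_model a, u128_model b, u128_model mask)
{
  u128_model r;
  r.hi = (a.hi & ~mask.hi) | (b.hi & mask.hi);
  r.lo = (a.lo & ~mask.lo) | (b.lo & mask.lo);
  return r;
}
```

Nothing in this operation depends on Power10 features, which is consistent
with the point above about where the documentation belongs.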

>  Vector Shift Left Double Bit Immediate
>  @smallexample
>  @exdent vector signed char vec_sldb (vector signed char, vector signed char,
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c 
> b/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c
> new file mode 100644
> index 000..58eb383e8c3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c
> @@ -0,0 +1,84 @@
> +/* { dg-do run  { target power10_hw }} */
> +/* { dg-require-effective-target int128 } */
> +/* { dg-require-effective-target power10_hw } */

As mentioned above, this doesn't require power10; you can specify vmx_hw
(btw, also remove { target power10_hw } from the dg-do run line).

> +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */

s/-mdejagnu-cpu=power10/-maltivec/
s/-save-temps//

> +
> +
> +#include 
> +
> +
> +#define DEBUG 0
> +
> +#if DEBUG
> +#include 
> +void 

Re: [PATCH 5/13] rs6000, remove duplicated built-ins of vecmergl and vec_mergeh

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/4/20 05:17, Carl Love wrote:
> rs6000, remove duplicated built-ins of vecmergl and vec_mergeh
> 
> The following undocumented built-ins are same as existing documented
> overloaded builtins.
> 
>   const vf __builtin_vsx_xxmrghw (vf, vf);
> same as  vf __builtin_vec_mergeh (vf, vf);  (overloaded vec_mergeh)
> 
>   const vsi __builtin_vsx_xxmrghw_4si (vsi, vsi);
> same as vsi __builtin_vec_mergeh (vsi, vsi);   (overloaded vec_mergeh)
> 
>   const vf __builtin_vsx_xxmrglw (vf, vf);
> same as vf __builtin_vec_mergel (vf, vf);  (overloaded vec_mergel)
> 
>   const vsi __builtin_vsx_xxmrglw_4si (vsi, vsi);
> same as vsi __builtin_vec_mergel (vsi, vsi);   (overloaded vec_mergel)
> 
> This patch removes the duplicate built-in definitions so only the
> documented built-ins will be available for use.  The case statements in
> rs6000_gimple_fold_builtin are removed as they are no longer needed.  The
> patch removes the now unused define_expands for vsx_xxmrghw_ and
> vsx_xxmrglw_.

Ok for trunk, thanks!

BR,
Kewen
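
As an illustration of why the removed built-ins were redundant, here is a
small portable model of the word merges that vec_mergeh and vec_mergel
perform on four-element vectors.  The function names are hypothetical and
the sketch assumes array (element) order; it is not the expander code from
vsx.md.

```c
#include <stdint.h>

/* Hypothetical model of merge-high for four 32-bit elements:
   interleave the first halves of A and B (elements 0 4 1 5 of the
   concatenation).  OUT must not alias A or B.  */
static void
mergeh4_model (uint32_t out[4], const uint32_t a[4], const uint32_t b[4])
{
  out[0] = a[0];
  out[1] = b[0];
  out[2] = a[1];
  out[3] = b[1];
}

/* Hypothetical model of merge-low: interleave the second halves.  */
static void
mergel4_model (uint32_t out[4], const uint32_t a[4], const uint32_t b[4])
{
  out[0] = a[2];
  out[1] = b[2];
  out[2] = a[3];
  out[3] = b[3];
}
```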

> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_xxmrghw,
>   __builtin_vsx_xxmrghw_4si, __builtin_vsx_xxmrglw,
>   __builtin_vsx_xxmrglw_4si, __builtin_vsx_xxsel_16qi): Remove
>   built-in definition.
>   * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin):
>   remove case entries RS6000_BIF_XXMRGLW_4SI,
>   RS6000_BIF_XXMRGLW_4SF, RS6000_BIF_XXMRGHW_4SI,
>   RS6000_BIF_XXMRGHW_4SF.
>   * config/rs6000/vsx.md (vsx_xxmrghw_, vsx_xxmrglw_):
>   Remove unused define_expands.
> ---
>  gcc/config/rs6000/rs6000-builtin.cc   |  4 ---
>  gcc/config/rs6000/rs6000-builtins.def | 12 
>  gcc/config/rs6000/vsx.md  | 41 ---
>  3 files changed, 57 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
> b/gcc/config/rs6000/rs6000-builtin.cc
> index ac9f16fe51a..f83d65b06ef 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -2097,20 +2097,16 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  /* vec_mergel (integrals).  */
>  case RS6000_BIF_VMRGLH:
>  case RS6000_BIF_VMRGLW:
> -case RS6000_BIF_XXMRGLW_4SI:
>  case RS6000_BIF_VMRGLB:
>  case RS6000_BIF_VEC_MERGEL_V2DI:
> -case RS6000_BIF_XXMRGLW_4SF:
>  case RS6000_BIF_VEC_MERGEL_V2DF:
>fold_mergehl_helper (gsi, stmt, 1);
>return true;
>  /* vec_mergeh (integrals).  */
>  case RS6000_BIF_VMRGHH:
>  case RS6000_BIF_VMRGHW:
> -case RS6000_BIF_XXMRGHW_4SI:
>  case RS6000_BIF_VMRGHB:
>  case RS6000_BIF_VEC_MERGEH_V2DI:
> -case RS6000_BIF_XXMRGHW_4SF:
>  case RS6000_BIF_VEC_MERGEH_V2DF:
>fold_mergehl_helper (gsi, stmt, 0);
>return true;
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 5b7237a2327..d09e21a9151 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1904,18 +1904,6 @@
>const signed int __builtin_vsx_xvtsqrtsp_fg (vf);
>  XVTSQRTSP_FG vsx_tsqrtv4sf2_fg {}
>  
> -  const vf __builtin_vsx_xxmrghw (vf, vf);
> -XXMRGHW_4SF vsx_xxmrghw_v4sf {}
> -
> -  const vsi __builtin_vsx_xxmrghw_4si (vsi, vsi);
> -XXMRGHW_4SI vsx_xxmrghw_v4si {}
> -
> -  const vf __builtin_vsx_xxmrglw (vf, vf);
> -XXMRGLW_4SF vsx_xxmrglw_v4sf {}
> -
> -  const vsi __builtin_vsx_xxmrglw_4si (vsi, vsi);
> -XXMRGLW_4SI vsx_xxmrglw_v4si {}
> -
>const vsc __builtin_vsx_xxpermdi_16qi (vsc, vsc, const int<2>);
>  XXPERMDI_16QI vsx_xxpermdi_v16qi {}
>  
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 3d39ae7995f..26560ecc38a 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -4810,47 +4810,6 @@
>  }
>[(set_attr "type" "vecperm")])
>  
> -;; V4SF/V4SI interleave
> -(define_expand "vsx_xxmrghw_"
> -  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
> -(vec_select:VSX_W
> -   (vec_concat:
> - (match_operand:VSX_W 1 "vsx_register_operand" "wa")
> - (match_operand:VSX_W 2 "vsx_register_operand" "wa"))
> -   (parallel [(const_int 0) (const_int 4)
> -  (const_int 1) (const_int 5)])))]
> -  "VECTOR_MEM_VSX_P (mode)"
> -{
> -  rtx (*fun) (rtx, rtx, rtx);
> -  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_
> -  : gen_altivec_vmrglw_direct_;
> -  if (!BYTES_BIG_ENDIAN)
> -std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> -  DONE;
> -}
> -  [(set_attr "type" "vecperm")])
> -
> -(define_expand "vsx_xxmrglw_"
> -  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
> - (vec_select:VSX_W
> -   (vec_concat:
> - (match_operand:VSX_W 1 "vsx_register_operand" "wa")
> - (match_operand:VSX_W 2 "vsx_register_operand" "wa"))
> -   (parallel [(const_int 2) 

Re: [PATCH] rs6000: Enable overlapped by-pieces operations

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/5/9 15:35, HAO CHEN GUI wrote:
> Hi Kewen,
>   Thanks for your comments.
> 
> 在 2024/5/9 13:44, Kewen.Lin 写道:
>> Hi,
>>
>> on 2024/5/8 14:47, HAO CHEN GUI wrote:
>>> Hi,
>>>   This patch enables overlapped by-piece operations. On rs6000, default
>>> move/set/clear ratio is 2. So the overlap is only enabled with compare
>>> by-pieces.
>>
>> Thanks for enabling this, did you evaluate if it can help some benchmark?
> 
> Tested it with SPEC2017. No obvious performance impact. I think memory
> compare might not be hot enough.
> 
> Tested it with my micro benchmark. 5-10% performance gain when compare
> length is 7.

Nice!

> 
>>
>>>
>>>   Bootstrapped and tested on powerpc64-linux BE and LE with no
>>> regressions. Is it OK for the trunk?
>>>
>>> Thanks
>>> Gui Haochen
>>>
>>> ChangeLog
>>> rs6000: Enable overlapped by-pieces operations
>>>
>>> This patch enables overlapped by-piece operations by defining
>>> TARGET_OVERLAP_OP_BY_PIECES_P to true.  On rs6000, default move/set/clear
>>> ratio is 2.  So the overlap is only enabled with compare by-pieces.
>>>
>>> gcc/
>>> * config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define.
>>>
>>> gcc/testsuite/
>>> * gcc.target/powerpc/block-cmp-9.c: New.
>>>
>>>
>>> patch.diff
>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>> index 6b9a40fcc66..2b5f5cf1d86 100644
>>> --- a/gcc/config/rs6000/rs6000.cc
>>> +++ b/gcc/config/rs6000/rs6000.cc
>>> @@ -1774,6 +1774,9 @@ static const scoped_attribute_specs *const 
>>> rs6000_attribute_table[] =
>>>  #undef TARGET_CONST_ANCHOR
>>>  #define TARGET_CONST_ANCHOR 0x8000
>>>
>>> +#undef TARGET_OVERLAP_OP_BY_PIECES_P
>>> +#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
>>> +
>>>  
>>>
>>>  /* Processor table.  */
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c 
>>> b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
>>> new file mode 100644
>>> index 000..b5f51affbb7
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
>>> @@ -0,0 +1,11 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
>>
>> Why does it need power8 forced here?
> 
> I just wanted to exclude P7 LE, as targetm.slow_unaligned_access returns false
> for it and the cmpmemsi expander won't be invoked.

> I thought it over.  It's not needed.  For the sub-targets where the library
> is called, l[hb]z won't be generated either.

Thanks for checking, OK with dropping this forced power8.

BR,
Kewen

> 
>>
>> BR,
>> Kewen
>>
>>> +/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */
>>> +
>>> +/* Test if by-piece overlap compare is enabled and following case is
>>> +   implemented by two overlap word loads and compares.  */
>>> +
>>> +int foo (const char* s1, const char* s2)
>>> +{
>>> +  return __builtin_memcmp (s1, s2, 7) == 0;
>>> +}
>>
> 
> Thanks
> Gui Haochen



Re: [PATCH] report message for operator %a on unaddressible exp

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/5/13 10:57, Jiufu Guo wrote:
> Hi,
> 
> For PR96866, when gcc prints asm code for the modifier "%a", which requires
> an address operand, the operand may carry the constraint "X", which
> allows a non-address form.  An error message would be reported to indicate
> the invalid asm operands.
> 
> Bootstrap pass on ppc64{,le}.
> Is this ok for trunk?
> 
> BR,
> Jeff(Jiufu Guo)
> 
>   PR target/96866
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (print_operand_address):
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/pr96866-1.c: New test.
>   * gcc.target/powerpc/pr96866-2.c: New test.
> 
> ---
>  gcc/config/rs6000/rs6000.cc  |  6 ++
>  gcc/testsuite/gcc.target/powerpc/pr96866-1.c | 15 +++
>  gcc/testsuite/gcc.target/powerpc/pr96866-2.c | 10 ++
>  3 files changed, 31 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-2.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 117999613d8..50943d76f79 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -14659,6 +14659,12 @@ print_operand_address (FILE *file, rtx x)
>else if (SYMBOL_REF_P (x) || GET_CODE (x) == CONST
>  || GET_CODE (x) == LABEL_REF)
>  {
> +  if (this_is_asm_operands && !address_operand (x, VOIDmode))

Do we really need this_is_asm_operands here?

> + {
> +   output_operand_lossage ("invalid expression as operand");
> +   return;
> + }
> +
>output_addr_const (file, x);
>if (small_data_operand (x, GET_MODE (x)))
>   fprintf (file, "@%s(%s)", SMALL_DATA_RELOC,
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-1.c 
> b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
> new file mode 100644
> index 000..6554a472a11
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
> @@ -0,0 +1,15 @@
> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  */
> +/* { dg-excess-errors "pr96866-2.c" } */
> +/* { dg-options "-fPIC -O2" } */

Nit: If these two options are required, it would be good to have a comment
explaining it a bit when it's not obvious.

> +
> +int x[2];
> +
> +int __attribute__ ((noipa))
> +f1 (void)
> +{
> +  int n;
> +  int *p = x;
> +  *p++;
> +  __asm__ volatile("ld %0, %a1" : "=r"(n) : "X"(p));
> +  return n;
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-2.c 
> b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
> new file mode 100644
> index 000..a5ec96f29dd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
> @@ -0,0 +1,10 @@
> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  */
> +/* { dg-excess-errors "pr96866-2.c" } */
> +/* { dg-options "-fPIC -O2" } */

Ditto.

BR,
Kewen

> +
> +void
> +f (void)
> +{
> +  extern int x;
> +  __asm__ volatile("#%a0" ::"X"());
> +}



Re: [PATCH 1/13] rs6000, Remove __builtin_vsx_cmple* builtins

2024-05-13 Thread Kewen.Lin
Hi,

on 2024/4/20 05:16, Carl Love wrote:
> 
> rs6000, Remove __builtin_vsx_cmple* builtins
> 
> The built-ins __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di,
> __builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi should take
> unsigned arguments and return an unsigned result.  The current definitions
> take signed arguments and return signed results which is incorrect.
> 
> The signed and unsigned versions of __builtin_vsx_cmple* are not
> documented in extend.texi.  Also there are no test cases for the
> built-ins.
> 
> Users can use the existing vec_cmple as PVIPR defines instead of
> __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di,
> __builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi,
> __builtin_vsx_cmple_16qi, __builtin_vsx_cmple_2di,
> __builtin_vsx_cmple_4si and __builtin_vsx_cmple_8hi,
> __builtin_altivec_cmple_1ti, __builtin_altivec_cmple_u1ti.
> 
> Hence these built-ins are redundant and are removed by this patch.

OK for trunk, thanks.

BR,
Kewen
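
The signedness point in the commit message can be shown with a small
portable model.  The removed definitions map to "not greater than" patterns
(vector_ngt* and vector_ngtu*); the sketch below assumes the usual vector
compare convention of an all-ones element for true and zero for false, and
the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of a signed element-wise "less than or equal",
   computed as "not greater than".  */
static void
cmple_s32_model (uint32_t out[4], const int32_t a[4], const int32_t b[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = (a[i] > b[i]) ? 0 : UINT32_MAX;
}

/* Same operation on unsigned elements.  The same bit patterns can give
   a different answer, which is why the operand types matter.  */
static void
cmple_u32_model (uint32_t out[4], const uint32_t a[4], const uint32_t b[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = (a[i] > b[i]) ? 0 : UINT32_MAX;
}
```

For example, the bit pattern 0xffffffff compares as -1 <= 1 (true) in the
signed model but as 4294967295 <= 1 (false) in the unsigned one.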

> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtin.cc (RS6000_BIF_CMPLE_16QI,
>   RS6000_BIF_CMPLE_U16QI, RS6000_BIF_CMPLE_8HI,
>   RS6000_BIF_CMPLE_U8HI, RS6000_BIF_CMPLE_4SI, RS6000_BIF_CMPLE_U4SI,
>   RS6000_BIF_CMPLE_2DI, RS6000_BIF_CMPLE_U2DI, RS6000_BIF_CMPLE_1TI,
>   RS6000_BIF_CMPLE_U1TI): Remove case statements.
>   config/rs6000/rs6000-builtins.def (__builtin_vsx_cmple_16qi,
>   __builtin_vsx_cmple_2di, __builtin_vsx_cmple_4si,
>   __builtin_vsx_cmple_8hi, __builtin_vsx_cmple_u16qi,
>   __builtin_vsx_cmple_u2di, __builtin_vsx_cmple_u4si,
>   __builtin_vsx_cmple_u8hi): Remove built-in definitions.
> ---
>  gcc/config/rs6000/rs6000-builtin.cc   | 13 
>  gcc/config/rs6000/rs6000-builtins.def | 30 ---
>  2 files changed, 43 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
> b/gcc/config/rs6000/rs6000-builtin.cc
> index 320affd79e3..ac9f16fe51a 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -2027,19 +2027,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>fold_compare_helper (gsi, GT_EXPR, stmt);
>return true;
>  
> -case RS6000_BIF_CMPLE_16QI:
> -case RS6000_BIF_CMPLE_U16QI:
> -case RS6000_BIF_CMPLE_8HI:
> -case RS6000_BIF_CMPLE_U8HI:
> -case RS6000_BIF_CMPLE_4SI:
> -case RS6000_BIF_CMPLE_U4SI:
> -case RS6000_BIF_CMPLE_2DI:
> -case RS6000_BIF_CMPLE_U2DI:
> -case RS6000_BIF_CMPLE_1TI:
> -case RS6000_BIF_CMPLE_U1TI:
> -  fold_compare_helper (gsi, LE_EXPR, stmt);
> -  return true;
> -
>  /* flavors of vec_splat_[us]{8,16,32}.  */
>  case RS6000_BIF_VSPLTISB:
>  case RS6000_BIF_VSPLTISH:
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 3bc7fed6956..7c36976a089 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1337,30 +1337,6 @@
>const vss __builtin_vsx_cmpge_u8hi (vus, vus);
>  CMPGE_U8HI vector_nltuv8hi {}
>  
> -  const vsc __builtin_vsx_cmple_16qi (vsc, vsc);
> -CMPLE_16QI vector_ngtv16qi {}
> -
> -  const vsll __builtin_vsx_cmple_2di (vsll, vsll);
> -CMPLE_2DI vector_ngtv2di {}
> -
> -  const vsi __builtin_vsx_cmple_4si (vsi, vsi);
> -CMPLE_4SI vector_ngtv4si {}
> -
> -  const vss __builtin_vsx_cmple_8hi (vss, vss);
> -CMPLE_8HI vector_ngtv8hi {}
> -
> -  const vsc __builtin_vsx_cmple_u16qi (vsc, vsc);
> -CMPLE_U16QI vector_ngtuv16qi {}
> -
> -  const vsll __builtin_vsx_cmple_u2di (vsll, vsll);
> -CMPLE_U2DI vector_ngtuv2di {}
> -
> -  const vsi __builtin_vsx_cmple_u4si (vsi, vsi);
> -CMPLE_U4SI vector_ngtuv4si {}
> -
> -  const vss __builtin_vsx_cmple_u8hi (vss, vss);
> -CMPLE_U8HI vector_ngtuv8hi {}
> -
>const vd __builtin_vsx_concat_2df (double, double);
>  CONCAT_2DF vsx_concat_v2df {}
>  
> @@ -3117,12 +3093,6 @@
>const vbq __builtin_altivec_cmpge_u1ti (vuq, vuq);
>  CMPGE_U1TI vector_nltuv1ti {}
>  
> -  const vbq __builtin_altivec_cmple_1ti (vsq, vsq);
> -CMPLE_1TI vector_ngtv1ti {}
> -
> -  const vbq __builtin_altivec_cmple_u1ti (vuq, vuq);
> -CMPLE_U1TI vector_ngtuv1ti {}
> -
>const unsigned long long __builtin_altivec_cntmbb (vuc, const int<1>);
>  VCNTMBB vec_cntmb_v16qi {}
>  




Re: [PATCHv2] rs6000: Enable overlapped by-pieces operations

2024-05-12 Thread Kewen.Lin
on 2024/5/10 17:29, HAO CHEN GUI wrote:
> Hi,
>   This patch enables overlapped by-piece operations. On rs6000, default
> move/set/clear ratio is 2. So the overlap is only enabled with compare
> by-pieces.
> 
>   Compared to previous version, the change is to remove power8
> requirement from test case.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651045.html
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?

OK, thanks!

BR,
Kewen

> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Enable overlapped by-pieces operations
> 
> This patch enables overlapped by-piece operations by defining
> TARGET_OVERLAP_OP_BY_PIECES_P to true.  On rs6000, default move/set/clear
> ratio is 2.  So the overlap is only enabled with compare by-pieces.
> 
> gcc/
>   * config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/block-cmp-9.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 117999613d8..e713a1e1d57 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1776,6 +1776,9 @@ static const scoped_attribute_specs *const 
> rs6000_attribute_table[] =
>  #undef TARGET_CONST_ANCHOR
>  #define TARGET_CONST_ANCHOR 0x8000
> 
> +#undef TARGET_OVERLAP_OP_BY_PIECES_P
> +#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
> +
>  
> 
>  /* Processor table.  */
> diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c 
> b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
> new file mode 100644
> index 000..f16429c2ffb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */
> +
> +/* Test if by-piece overlap compare is enabled and following case is
> +   implemented by two overlap word loads and compares.  */
> +
> +int foo (const char* s1, const char* s2)
> +{
> +  return __builtin_memcmp (s1, s2, 7) == 0;
> +}
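
For illustration, the code shape this test expects can be sketched in
plain C as below.  This is only a sketch assuming 4-byte word loads;
the name eq7_overlapped is made up here and this is not the expander's
actual output.  The seven bytes are covered by two word loads at
offsets 0 and 3, which overlap in one byte, instead of a word load
plus a halfword load plus a byte load.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: equality-only compare of 7 bytes using two
   overlapping 4-byte loads (bytes 0..3 and bytes 3..6).  Byte 3 is
   loaded twice, which is harmless for an equality test.  */
int
eq7_overlapped (const char *s1, const char *s2)
{
  uint32_t a_lo, a_hi, b_lo, b_hi;

  /* memcpy keeps the loads well defined for unaligned pointers.  */
  memcpy (&a_lo, s1, 4);
  memcpy (&a_hi, s1 + 3, 4);
  memcpy (&b_lo, s2, 4);
  memcpy (&b_hi, s2 + 3, 4);

  /* Any differing byte leaves a nonzero bit in one of the XORs.  */
  return ((a_lo ^ b_lo) | (a_hi ^ b_hi)) == 0;
}
```

Since only equality is tested, the result does not depend on
endianness, and bytes past offset 6 are never read.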



Re: [PATCH] rs6000: Enable overlapped by-pieces operations

2024-05-08 Thread Kewen.Lin
Hi,

on 2024/5/8 14:47, HAO CHEN GUI wrote:
> Hi,
>   This patch enables overlapped by-piece operations. On rs6000, default
> move/set/clear ratio is 2. So the overlap is only enabled with compare
> by-pieces.

Thanks for enabling this.  Did you evaluate whether it helps any
benchmarks?

> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Enable overlapped by-pieces operations
> 
> This patch enables overlapped by-piece operations by defining
> TARGET_OVERLAP_OP_BY_PIECES_P to true.  On rs6000, default move/set/clear
> ratio is 2.  So the overlap is only enabled with compare by-pieces.
> 
> gcc/
>   * config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/block-cmp-9.c: New.
> 
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 6b9a40fcc66..2b5f5cf1d86 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1774,6 +1774,9 @@ static const scoped_attribute_specs *const 
> rs6000_attribute_table[] =
>  #undef TARGET_CONST_ANCHOR
>  #define TARGET_CONST_ANCHOR 0x8000
> 
> +#undef TARGET_OVERLAP_OP_BY_PIECES_P
> +#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
> +
>  
> 
>  /* Processor table.  */
> diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c 
> b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
> new file mode 100644
> index 000..b5f51affbb7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */

Why does it need power8 forced here?

BR,
Kewen

> +/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */
> +
> +/* Test if by-piece overlap compare is enabled and following case is
> +   implemented by two overlap word loads and compares.  */
> +
> +int foo (const char* s1, const char* s2)
> +{
> +  return __builtin_memcmp (s1, s2, 7) == 0;
> +}



Re: [PATCH 2/4] fortran: Teach get_real_kind_from_node for Power 128 fp modes [PR112993]

2024-05-08 Thread Kewen.Lin
Hi,

on 2024/5/9 06:01, Steve Kargl wrote:
> On Wed, May 08, 2024 at 01:27:53PM +0800, Kewen.Lin wrote:
>>
>> Previously effective target fortran_real_c_float128 never
>> passes on Power regardless of whether the default 128-bit
>> long double is ibmlongdouble or ieeelongdouble.  This is
>> because TF mode is always used for kind 16 real, which has
>> precision 127, while the node float128_type_node for
>> c_float128 has type precision 128, so
>> get_real_kind_from_node can't find a match as it only
>> compares gfc_real_kinds[i].mode_precision with the type
>> precision.
>>
>> With TFmode/IFmode/KFmode changed to have the same mode
>> precision 128, fortran_real_c_float128 can now pass with
>> ieeelongdouble enabled by default, and test cases guarded
>> with it get tested accordingly.  But with ibmlongdouble
>> enabled by default, since TFmode has precision 128, which
>> is the same as the type precision 128 of float128_type_node,
>> get_real_kind_from_node considers the kind for TFmode to
>> match float128_type_node, which is wrong because at this
>> point TFmode uses the IBM extended format.  So this patch
>> teaches get_real_kind_from_node to check one more field,
>> one that distinguishes the underlying real format, which
>> avoids the unexpected match when more than one mode has
>> the same precision.
>>
>> Bootstrapped and regress-tested on:
>>   - powerpc64-linux-gnu P8/P9 (with ibm128 by default)
>>   - powerpc64le-linux-gnu P9/P10 (with ibm128 by default)
>>   - powerpc64le-linux-gnu P9 (with ieee128 by default)
>>
>> BR,
>> Kewen
>> -
>>  PR target/112993
>>

> OK from the fortran point of view.
> Thanks.

> 
> First, I have no issue with Mikael's OK for committing the
> patch. 

Thanks to both!

> 
> That said, Fortran has the concept of model numbers, which
> are set in arith.c.  Does this change give the expected 
> value for ibm128?  For example, with "REAL(16) X", one
> has "DIGITS(X) = 113", which is the precision of the
> underlying IEEE754 binary128 type.
> 

With some local testing, I noticed that DIGITS is already correct
even without this change.  For "REAL(16) X", with -mabi=ibmlongdouble
it's long double with ibm128 format and DIGITS(X) is 106, while with
-mabi=ieeelongdouble it's long double with ieee128 format and
DIGITS(X) is 113.
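
The C-side counterpart of DIGITS for long double is LDBL_MANT_DIG
from <float.h>, so the same distinction can be cross-checked from C.
A small sketch follows; the enum and function names are made up for
this mail, and the mapping only covers the two values mentioned above
plus plain binary64.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical classification of a long double format by the number
   of mantissa digits it reports.  */
enum ld_format { LD_IBM128, LD_IEEE128, LD_BINARY64, LD_OTHER };

enum ld_format
classify_mant_dig (int mant_dig)
{
  switch (mant_dig)
    {
    case 106: return LD_IBM128;    /* IBM extended (double-double).  */
    case 113: return LD_IEEE128;   /* IEEE 754 binary128.  */
    case 53:  return LD_BINARY64;  /* long double same as double.  */
    default:  return LD_OTHER;
    }
}

/* Print what the current compilation uses for long double.  */
void
print_long_double_format (void)
{
  static const char *const names[] =
    { "ibm128", "ieee128", "binary64", "other" };

  printf ("LDBL_MANT_DIG = %d (%s)\n", LDBL_MANT_DIG,
          names[classify_mant_dig (LDBL_MANT_DIG)]);
}
```

Building this twice, once with -mabi=ibmlongdouble and once with
-mabi=ieeelongdouble, and calling print_long_double_format should
show the two formats reporting different digit counts.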

BR,
Kewen



Re: [PATCH] ppc: testsuite: pr79004 needs -mlong-double-128

2024-05-08 Thread Kewen.Lin
on 2024/4/30 07:11, Alexandre Oliva wrote:
> On Apr 29, 2024, "Kewen.Lin"  wrote:
> 
>> Thanks for catching this and sorry
>> that I didn't check it before suggesting it, I think we can aggressively
>> drop this effective target instead to avoid any possible confusion.
> 
> The 128-bit ones, unfortunately, follow the same pattern but are
> probably used.  IMHO we should transition all 3 to an '_ok' suffix, but...
> 

Yeah, I noticed the 128-bit ones are used.  I was just suggesting dropping
check_effective_target_long_double_64bit and add_options_for_long_double_64bit,
as they have had no users since they were introduced in r12-3151 (release 12),
and IMHO there won't be any uses in the future, ...

>> How about the generic one "longdouble64"?  I did a grep and found it has one
>> use, I'd expect it can work here. :)
> 
> ... since this and longdouble128 exist, maybe we can fix it and leave
> them all alone, despite the interface oddity.
> 
... personally I'm inclined to drop this 64 bit one. :)

BR,
Kewen



Re: [PATCH] rs6000: Adjust -fpatchable-function-entry* support for dual entry [PR112980]

2024-05-08 Thread Kewen.Lin
Hi Richi,

>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index c584664e168..58e48f7dc55 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -18363,11 +18363,11 @@ If @code{N=0}, no pad location is recorded.
>>  The NOP instructions are inserted at---and maybe before, depending on
>>  @var{M}---the function entry address, even before the prologue.  On
>>  PowerPC with the ELFv2 ABI, for a function with dual entry points,
>> -the local entry point is this function entry address.
>> +@var{M} NOP instructions are inserted before the global entry point and
>> +@var{N} - @var{M} NOP instructions are inserted after the local entry
>> +point, which means the NOP instructions may not be consecutive.
> 
> Isn't it @var{M-1} NOP instructions before the global entry?  I suppose

No, the existing documentation is a bit confusing, sigh ...

> the existing
> 
> "... with the function entry point before the @var{M}th NOP.
> If @var{M} is omitted, it defaults to @code{0} so the
> function entry points to the address just at the first NOP."
> 
> wording is self-contradicting in a way since before the 0th NOP (default)
> to me is the same as before the 1st NOP (M == 1).  So maybe that should
> be _after_ the @var{M}th NOP instead which would be consistent with your
> ELFv2 docs?  Maybe the sentence should be re-worded similar to your
> ELFv2 one, specifying the number of NOPs before and after the entry point.
> 

... the current "with the function entry point before the Mth NOP."
has the 0th NOP assumption, so the default (0th) NOP and 1st NOP (M == 1)
are actually different, such as:

-fpatchable-function-entry=3,0

foo:
nop
nop
nop

-fpatchable-function-entry=3,1

nop
foo:
nop
nop
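
Put as arithmetic, for the generic (single entry point) case shown
above, M NOPs land before the function label and N - M after it.  A
minimal sketch of that bookkeeping is below; the struct and function
names are hypothetical and not part of GCC, and rejecting M > N is
this sketch's own assumption.

```c
/* Hypothetical helper, not part of GCC.  */
struct nop_layout
{
  unsigned before_entry;  /* NOPs placed before the function label.  */
  unsigned after_entry;   /* NOPs placed at and after the label.  */
};

/* Fill *OUT for -fpatchable-function-entry=N,M and return 0, or
   return -1 if M > N (treated as invalid in this sketch).  */
int
patchable_layout (unsigned n, unsigned m, struct nop_layout *out)
{
  if (m > n)
    return -1;

  out->before_entry = m;
  out->after_entry = n - m;
  return 0;
}
```

With N=3, M=0 this gives 0 before and 3 after, and with N=3, M=1 it
gives 1 before and 2 after, matching the two listings.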

Alan also raised a similar concern about this wording before:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99888#c8

" Alan Modra 2022-08-12 03:00:29 UTC
"
"(In reply to Segher Boessenkool from comment #7)
"> '-fpatchable-function-entry=N[,M]'
">  Generate N NOPs right at the beginning of each function, with the
">  function entry point before the Mth NOP.
"
" Bad doco.  Should be "after the Mth NOP" I think.
" Or better written to avoid the concept of a 0th nop.
" Default for M is zero, placing all nops after the function entry and
" before normal function prologue code.

BR,
Kewen

>> -The maximum value of @var{N} and @var{M} is 65535.  On PowerPC with the
>> -ELFv2 ABI, for a function with dual entry points, the supported values
>> -for @var{M} are 0, 2, 6 and 14.
>> +The maximum value of @var{N} and @var{M} is 65535.
>>  @end table
>>





[PATCH] testsuite, rs6000: Remove powerpcspe test cases and checks

2024-05-08 Thread Kewen.Lin
Hi,

powerpcspe support was removed in r9-4728; this follow-up
patch removes the remaining pieces in the testsuite.

Regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_vect_cmdline_needed): Remove
check_effective_target_powerpc_spe.
(check_effective_target_powerpc_spe_nocache): Remove.
(check_effective_target_powerpc_spe): Remove.
(check_ppc_cpu_supports_hw_available): Remove powerpc*-*-eabispe check.
(check_p8vector_hw_available): Likewise.
(check_p9vector_hw_available): Likewise.
(check_p9modulo_hw_available): Likewise.
(check_ppc_float128_sw_available): Likewise.
(check_ppc_float128_hw_available): Likewise.
(check_vsx_hw_available): Likewise.
(check_vmx_hw_available): Likewise.
(check_ppc_recip_hw_available): Likewise.
(check_dfp_hw_available): Likewise.
(check_htm_hw_available): Likewise.
* g++.dg/ext/spe1.C: Remove.
* g++.dg/other/opaque-1.C: Remove.
* g++.dg/other/opaque-2.C: Remove.
* g++.dg/other/opaque-3.C: Remove.
* g++.target/powerpc/simd-5.C: Remove.
---
 gcc/testsuite/g++.dg/ext/spe1.C   | 10 -
 gcc/testsuite/g++.dg/other/opaque-1.C | 31 --
 gcc/testsuite/g++.dg/other/opaque-2.C | 19 -
 gcc/testsuite/g++.dg/other/opaque-3.C | 12 --
 gcc/testsuite/g++.target/powerpc/simd-5.C | 44 ---
 gcc/testsuite/lib/target-supports.exp | 51 +++
 6 files changed, 5 insertions(+), 162 deletions(-)
 delete mode 100644 gcc/testsuite/g++.dg/ext/spe1.C
 delete mode 100644 gcc/testsuite/g++.dg/other/opaque-1.C
 delete mode 100644 gcc/testsuite/g++.dg/other/opaque-2.C
 delete mode 100644 gcc/testsuite/g++.dg/other/opaque-3.C
 delete mode 100644 gcc/testsuite/g++.target/powerpc/simd-5.C

diff --git a/gcc/testsuite/g++.dg/ext/spe1.C b/gcc/testsuite/g++.dg/ext/spe1.C
deleted file mode 100644
index b98d4b27b3d..000
--- a/gcc/testsuite/g++.dg/ext/spe1.C
+++ /dev/null
@@ -1,10 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-mcpu=8540 -mspe -mabi=spe -mfloat-gprs=single -O0" } */
-/* { dg-skip-if "not an SPE target" { ! powerpc_spe_nocache } } */
-
-typedef int v2si __attribute__ ((vector_size (8)));
-
-/* The two specializations must be considered different.  */
-template  class X { };
-template <>class X<__ev64_opaque__> { };
-template <>class X   { };
diff --git a/gcc/testsuite/g++.dg/other/opaque-1.C 
b/gcc/testsuite/g++.dg/other/opaque-1.C
deleted file mode 100644
index 669776b9f97..000
--- a/gcc/testsuite/g++.dg/other/opaque-1.C
+++ /dev/null
@@ -1,31 +0,0 @@
-/* { dg-do run } */
-/* { dg-options "-mcpu=8540 -mspe -mabi=spe -mfloat-gprs=single" } */
-/* { dg-skip-if "not an SPE target" { ! powerpc_spe_nocache } } */
-
-#define __vector __attribute__((vector_size(8)))
-typedef float __vector __ev64_fs__;
-
-__ev64_fs__ f;
-__ev64_opaque__ o;
-
-int here = 0;
-
-void bar (__ev64_opaque__ x)
-{
-  here = 0;
-}
-
-void bar (__ev64_fs__ x)
-{
-  here = 888;
-}
-
-int main ()
-{
-  f = o;
-  o = f;
-  bar (f);
-  if (here != 888)
-return 1;
-  return 0;
-}
diff --git a/gcc/testsuite/g++.dg/other/opaque-2.C 
b/gcc/testsuite/g++.dg/other/opaque-2.C
deleted file mode 100644
index 414f87e6c9a..000
--- a/gcc/testsuite/g++.dg/other/opaque-2.C
+++ /dev/null
@@ -1,19 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-mcpu=8540 -mspe -mabi=spe -mfloat-gprs=single" } */
-/* { dg-skip-if "not an SPE target" { ! powerpc_spe_nocache } } */
-
-#define __vector __attribute__((vector_size(8)))
-typedef float __vector __ev64_fs__;
-
-__ev64_fs__ f;
-__ev64_opaque__ o;
-
-extern void bar (__ev64_opaque__);
-
-int main ()
-{
-  f = o;
-  o = f;
-  bar (f);
-  return 0;
-}
diff --git a/gcc/testsuite/g++.dg/other/opaque-3.C 
b/gcc/testsuite/g++.dg/other/opaque-3.C
deleted file mode 100644
index f915f840510..000
--- a/gcc/testsuite/g++.dg/other/opaque-3.C
+++ /dev/null
@@ -1,12 +0,0 @@
-/* { dg-do compile } */
-/* { dg-options "-mcpu=8540 -mspe -mabi=spe -mfloat-gprs=single" } */
-/* { dg-skip-if "not an SPE target" { ! powerpc_spe_nocache } } */
-
-__ev64_opaque__ o;
-#define v __attribute__((vector_size(8)))
-v unsigned int *p;
-
-void m()
-{
-  o = __builtin_spe_evldd(p, 5);
-}
diff --git a/gcc/testsuite/g++.target/powerpc/simd-5.C 
b/gcc/testsuite/g++.target/powerpc/simd-5.C
deleted file mode 100644
index 71e117ead2a..000
--- a/gcc/testsuite/g++.target/powerpc/simd-5.C
+++ /dev/null
@@ -1,44 +0,0 @@
-// Test EH with V2SI SIMD registers actually restores correct values.
-// Origin: Joseph Myers 
-// { dg-options "-O" }
-// { dg-do run { target { powerpc_spe && { ! *-*-vxworks* } } } }
-
-extern "C" void abort (void);
-extern 

[PATCH] testsuite, rs6000: Remove powerpc_popcntb_ok

2024-05-08 Thread Kewen.Lin
Hi,

There are three uses of effective target powerpc_popcntb_ok,
all of them in compile-only tests, but powerpc_popcntb_ok
builds and runs an executable, which is heavier than needed.
This patch removes powerpc_popcntb_ok and adjusts its three
uses accordingly.

Regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_powerpc_popcntb_ok):
Remove.
* gcc.target/powerpc/cmpb-2.c: Adjust with dg-skip-if as
powerpc_popcntb_ok gets removed.
* gcc.target/powerpc/cmpb-3.c: Likewise.
* gcc.target/powerpc/cmpb32-2.c: Likewise.
---
 gcc/testsuite/gcc.target/powerpc/cmpb-2.c   |  3 ++-
 gcc/testsuite/gcc.target/powerpc/cmpb-3.c   |  3 ++-
 gcc/testsuite/gcc.target/powerpc/cmpb32-2.c |  3 ++-
 gcc/testsuite/lib/target-supports.exp   | 20 
 4 files changed, 6 insertions(+), 23 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/cmpb-2.c 
b/gcc/testsuite/gcc.target/powerpc/cmpb-2.c
index 02b84d0731d..44a554bee4a 100644
--- a/gcc/testsuite/gcc.target/powerpc/cmpb-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/cmpb-2.c
@@ -1,6 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
+/* Skip powerpc*-*-darwin* powerpc-*-eabi as dropped popcntb_ok.  */
+/* { dg-skip-if "" { powerpc*-*-darwin* powerpc-*-eabi } } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-require-effective-target powerpc_popcntb_ok } */
 /* { dg-options "-mdejagnu-cpu=power5" } */

 void abort ();
diff --git a/gcc/testsuite/gcc.target/powerpc/cmpb-3.c 
b/gcc/testsuite/gcc.target/powerpc/cmpb-3.c
index 75641bdb22c..43de37a571d 100644
--- a/gcc/testsuite/gcc.target/powerpc/cmpb-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/cmpb-3.c
@@ -1,6 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
+/* Skip powerpc*-*-darwin* powerpc-*-eabi as dropped popcntb_ok.  */
+/* { dg-skip-if "" { powerpc*-*-darwin* powerpc-*-eabi } } */
 /* { dg-require-effective-target ilp32 } */
-/* { dg-require-effective-target powerpc_popcntb_ok } */
 /* { dg-options "-mdejagnu-cpu=power6" } */

 void abort ();
diff --git a/gcc/testsuite/gcc.target/powerpc/cmpb32-2.c 
b/gcc/testsuite/gcc.target/powerpc/cmpb32-2.c
index d4264ab6e7d..0713c44fcff 100644
--- a/gcc/testsuite/gcc.target/powerpc/cmpb32-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/cmpb32-2.c
@@ -1,5 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
-/* { dg-require-effective-target powerpc_popcntb_ok } */
+/* Skip powerpc*-*-darwin* powerpc-*-eabi as dropped popcntb_ok.  */
+/* { dg-skip-if "" { powerpc*-*-darwin* powerpc-*-eabi } } */
 /* { dg-options "-mdejagnu-cpu=power5" } */

 void abort ();
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 16dc2766850..5f34f02c387 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3952,26 +3952,6 @@ proc check_effective_target_unsigned_char {} {
 }]
 }

-proc check_effective_target_powerpc_popcntb_ok { } {
-return [check_cached_effective_target powerpc_popcntb_ok {
-
-   # Disable on Darwin.
-   if { [istarget powerpc-*-eabi] || [istarget powerpc*-*-eabispe] || 
[istarget *-*-darwin*]} {
-   expr 0
-   } else {
-   check_runtime_nocache powerpc_popcntb_ok {
-   volatile int r;
-   volatile int a = 0x12345678;
-   int main()
-   {
-   asm volatile ("popcntb %0,%1" : "=r" (r) : "r" (a));
-   return 0;
-   }
-   } "-mcpu=power5"
-   }
-}]
-}
-
 # Return 1 if the target supports executing DFP hardware instructions,
 # 0 otherwise.  Cache the result.

--
2.39.1


[PATCH 1/2] testsuite, rs6000: Make powerpc_vsx consider current_compiler_flags [PR114842]

2024-05-08 Thread Kewen.Lin
Hi,

As noted in PR114842, most of the test cases which require
effective target powerpc_vsx_ok actually care about whether
the VSX feature is enabled, so they should use effective
target powerpc_vsx instead.  Since a number of test cases
already pass -mvsx explicitly in dg-options etc., this patch
makes powerpc_vsx consider current_compiler_flags so that
they are still tested as before even when VSX is not
enabled by default.

Regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

PR testsuite/114842

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_powerpc_vsx): Take
current_compiler_flags into account.
---
 gcc/testsuite/lib/target-supports.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 969456281c7..713898d5554 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -7144,7 +7144,7 @@ proc check_effective_target_powerpc_vsx { } {
  nope no vsx
#endif
}
-}]
+} [current_compiler_flags]]
 }

 # Return 1 if this is a PowerPC target supporting -mvsx
--
2.39.1


[PATCH] testsuite, rs6000: Remove effective target powerpc_405_nocache

2024-05-08 Thread Kewen.Lin
Hi,

With the introduction of -mdejagnu-cpu=, a test case that
specifies -mdejagnu-cpu=405 overrides any other -mcpu= that
may be given, so it is always compiled for PowerPC 405.
This patch removes the effective target powerpc_405_nocache
and updates all its uses.

Regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/405-dlmzb-strlen-1.c: Remove the line using
powerpc_405_nocache check.
* gcc.target/powerpc/405-macchw-1.c: Likewise.
* gcc.target/powerpc/405-macchw-2.c: Likewise.
* gcc.target/powerpc/405-macchwu-1.c: Likewise.
* gcc.target/powerpc/405-macchwu-2.c: Likewise.
* gcc.target/powerpc/405-machhw-1.c: Likewise.
* gcc.target/powerpc/405-machhw-2.c: Likewise.
* gcc.target/powerpc/405-machhwu-1.c: Likewise.
* gcc.target/powerpc/405-machhwu-2.c: Likewise.
* gcc.target/powerpc/405-maclhw-1.c: Likewise.
* gcc.target/powerpc/405-maclhw-2.c: Likewise.
* gcc.target/powerpc/405-maclhwu-1.c: Likewise.
* gcc.target/powerpc/405-maclhwu-2.c: Likewise.
* gcc.target/powerpc/405-mulchw-1.c: Likewise.
* gcc.target/powerpc/405-mulchw-2.c: Likewise.
* gcc.target/powerpc/405-mulchwu-1.c: Likewise.
* gcc.target/powerpc/405-mulchwu-2.c: Likewise.
* gcc.target/powerpc/405-mulhhw-1.c: Likewise.
* gcc.target/powerpc/405-mulhhw-2.c: Likewise.
* gcc.target/powerpc/405-mulhhwu-1.c: Likewise.
* gcc.target/powerpc/405-mulhhwu-2.c: Likewise.
* gcc.target/powerpc/405-mullhw-1.c: Likewise.
* gcc.target/powerpc/405-mullhw-2.c: Likewise.
* gcc.target/powerpc/405-mullhwu-1.c: Likewise.
* gcc.target/powerpc/405-mullhwu-2.c: Likewise.
* gcc.target/powerpc/405-nmacchw-1.c: Likewise.
* gcc.target/powerpc/405-nmacchw-2.c: Likewise.
* gcc.target/powerpc/405-nmachhw-1.c: Likewise.
* gcc.target/powerpc/405-nmachhw-2.c: Likewise.
* gcc.target/powerpc/405-nmaclhw-1.c: Likewise.
* gcc.target/powerpc/405-nmaclhw-2.c: Likewise.
* lib/target-supports.exp
(check_effective_target_powerpc_405_nocache): Remove.
---
 .../gcc.target/powerpc/405-dlmzb-strlen-1.c |  1 -
 gcc/testsuite/gcc.target/powerpc/405-macchw-1.c |  6 +-
 gcc/testsuite/gcc.target/powerpc/405-macchw-2.c |  1 -
 .../gcc.target/powerpc/405-macchwu-1.c  |  1 -
 .../gcc.target/powerpc/405-macchwu-2.c  |  1 -
 gcc/testsuite/gcc.target/powerpc/405-machhw-1.c |  1 -
 gcc/testsuite/gcc.target/powerpc/405-machhw-2.c |  1 -
 .../gcc.target/powerpc/405-machhwu-1.c  |  1 -
 .../gcc.target/powerpc/405-machhwu-2.c  |  1 -
 gcc/testsuite/gcc.target/powerpc/405-maclhw-1.c |  1 -
 gcc/testsuite/gcc.target/powerpc/405-maclhw-2.c |  1 -
 .../gcc.target/powerpc/405-maclhwu-1.c  |  1 -
 .../gcc.target/powerpc/405-maclhwu-2.c  |  1 -
 gcc/testsuite/gcc.target/powerpc/405-mulchw-1.c |  1 -
 gcc/testsuite/gcc.target/powerpc/405-mulchw-2.c |  1 -
 .../gcc.target/powerpc/405-mulchwu-1.c  |  1 -
 .../gcc.target/powerpc/405-mulchwu-2.c  |  1 -
 gcc/testsuite/gcc.target/powerpc/405-mulhhw-1.c |  1 -
 gcc/testsuite/gcc.target/powerpc/405-mulhhw-2.c |  1 -
 .../gcc.target/powerpc/405-mulhhwu-1.c  |  1 -
 .../gcc.target/powerpc/405-mulhhwu-2.c  |  1 -
 gcc/testsuite/gcc.target/powerpc/405-mullhw-1.c |  1 -
 gcc/testsuite/gcc.target/powerpc/405-mullhw-2.c |  1 -
 .../gcc.target/powerpc/405-mullhwu-1.c  |  1 -
 .../gcc.target/powerpc/405-mullhwu-2.c  |  1 -
 .../gcc.target/powerpc/405-nmacchw-1.c  |  1 -
 .../gcc.target/powerpc/405-nmacchw-2.c  |  1 -
 .../gcc.target/powerpc/405-nmachhw-1.c  |  1 -
 .../gcc.target/powerpc/405-nmachhw-2.c  |  1 -
 .../gcc.target/powerpc/405-nmaclhw-1.c  |  1 -
 .../gcc.target/powerpc/405-nmaclhw-2.c  |  1 -
 gcc/testsuite/lib/target-supports.exp   | 17 -
 32 files changed, 5 insertions(+), 48 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/405-dlmzb-strlen-1.c 
b/gcc/testsuite/gcc.target/powerpc/405-dlmzb-strlen-1.c
index 5ee427a3b4a..984ffe7144c 100644
--- a/gcc/testsuite/gcc.target/powerpc/405-dlmzb-strlen-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/405-dlmzb-strlen-1.c
@@ -4,7 +4,6 @@
 /* { dg-skip-if "" { powerpc*-*-aix* } } */
 /* { dg-require-effective-target ilp32 } */
 /* { dg-options "-O2 -mdejagnu-cpu=405" } */
-/* { dg-skip-if "other options override -mcpu=405" { ! powerpc_405_nocache } } 
*/

 /* { dg-final { scan-assembler "dlmzb\\. " } } */

diff --git a/gcc/testsuite/gcc.target/powerpc/405-macchw-1.c 
b/gcc/testsuite/gcc.target/powerpc/405-macchw-1.c
index 2253a9c9deb..10ea9cc10f8 100644
--- a/gcc/testsuite/gcc.target/powerpc/405-macchw-1.c
+++ 

[PATCH] libgcc, rs6000: Remove powerpcspe related code

2024-05-08 Thread Kewen.Lin
Hi,

powerpcspe support was removed in r9-4728; this follow-up
patch removes the remaining pieces in libgcc.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

libgcc/ChangeLog:

* config.host: Remove powerpc-*-eabispe* support.
* config/rs6000/linux-unwind.h (ppc_fallback_frame_state): Remove
__SPE__ code.
* config/rs6000/t-savresfgpr (LIB2ADD_ST): Remove e500crtres32gpr.S,
e500crtres32gpr.S, e500crtsav64gpr.S, e500crtsav64gprctr.S,
e500crtres64gpr.S, e500crtsav32gpr.S, e500crtsavg32gpr.S,
e500crtres64gprctr.S, e500crtsavg64gprctr.S, e500crtresx32gpr.S,
e500crtrest32gpr.S, e500crtrest64gpr.S and e500crtresx64gpr.S.
* config/rs6000/e500crtres32gpr.S: Remove.
* config/rs6000/e500crtres64gpr.S: Remove.
* config/rs6000/e500crtres64gprctr.S: Remove.
* config/rs6000/e500crtrest32gpr.S: Remove.
* config/rs6000/e500crtrest64gpr.S: Remove.
* config/rs6000/e500crtresx32gpr.S: Remove.
* config/rs6000/e500crtresx64gpr.S: Remove.
* config/rs6000/e500crtsav32gpr.S: Remove.
* config/rs6000/e500crtsav64gpr.S: Remove.
* config/rs6000/e500crtsav64gprctr.S: Remove.
* config/rs6000/e500crtsavg32gpr.S: Remove.
* config/rs6000/e500crtsavg64gpr.S: Remove.
* config/rs6000/e500crtsavg64gprctr.S: Remove.
---
 libgcc/config.host |  4 -
 libgcc/config/rs6000/e500crtres32gpr.S | 73 -
 libgcc/config/rs6000/e500crtres64gpr.S | 73 -
 libgcc/config/rs6000/e500crtres64gprctr.S  | 90 -
 libgcc/config/rs6000/e500crtrest32gpr.S| 75 --
 libgcc/config/rs6000/e500crtrest64gpr.S| 74 --
 libgcc/config/rs6000/e500crtresx32gpr.S| 75 --
 libgcc/config/rs6000/e500crtresx64gpr.S| 75 --
 libgcc/config/rs6000/e500crtsav32gpr.S | 73 -
 libgcc/config/rs6000/e500crtsav64gpr.S | 72 -
 libgcc/config/rs6000/e500crtsav64gprctr.S  | 91 --
 libgcc/config/rs6000/e500crtsavg32gpr.S| 73 -
 libgcc/config/rs6000/e500crtsavg64gpr.S| 73 -
 libgcc/config/rs6000/e500crtsavg64gprctr.S | 90 -
 libgcc/config/rs6000/linux-unwind.h| 11 ---
 libgcc/config/rs6000/t-savresfgpr  | 15 +---
 16 files changed, 1 insertion(+), 1036 deletions(-)
 delete mode 100644 libgcc/config/rs6000/e500crtres32gpr.S
 delete mode 100644 libgcc/config/rs6000/e500crtres64gpr.S
 delete mode 100644 libgcc/config/rs6000/e500crtres64gprctr.S
 delete mode 100644 libgcc/config/rs6000/e500crtrest32gpr.S
 delete mode 100644 libgcc/config/rs6000/e500crtrest64gpr.S
 delete mode 100644 libgcc/config/rs6000/e500crtresx32gpr.S
 delete mode 100644 libgcc/config/rs6000/e500crtresx64gpr.S
 delete mode 100644 libgcc/config/rs6000/e500crtsav32gpr.S
 delete mode 100644 libgcc/config/rs6000/e500crtsav64gpr.S
 delete mode 100644 libgcc/config/rs6000/e500crtsav64gprctr.S
 delete mode 100644 libgcc/config/rs6000/e500crtsavg32gpr.S
 delete mode 100644 libgcc/config/rs6000/e500crtsavg64gpr.S
 delete mode 100644 libgcc/config/rs6000/e500crtsavg64gprctr.S

diff --git a/libgcc/config.host b/libgcc/config.host
index e75a7af647f..fe958caa040 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1236,10 +1236,6 @@ powerpc*-*-freebsd*)
 powerpc-*-netbsd*)
tmake_file="$tmake_file rs6000/t-netbsd rs6000/t-crtstuff"
;;
-powerpc-*-eabispe*)
-   tmake_file="${tmake_file} rs6000/t-ppccomm rs6000/t-savresfgpr 
rs6000/t-crtstuff t-crtstuff-pic t-fdpbit"
-   extra_parts="$extra_parts crtbegin.o crtend.o crtbeginS.o crtendS.o 
crtbeginT.o ecrti.o ecrtn.o ncrti.o ncrtn.o"
-   ;;
 powerpc-*-eabisimaltivec*)
tmake_file="${tmake_file} rs6000/t-ppccomm rs6000/t-crtstuff 
t-crtstuff-pic t-fdpbit"
extra_parts="$extra_parts crtbegin.o crtend.o crtbeginS.o crtendS.o 
crtbeginT.o ecrti.o ecrtn.o ncrti.o ncrtn.o"
diff --git a/libgcc/config/rs6000/e500crtres32gpr.S 
b/libgcc/config/rs6000/e500crtres32gpr.S
deleted file mode 100644
index b19703073ca..000
--- a/libgcc/config/rs6000/e500crtres32gpr.S
+++ /dev/null
@@ -1,73 +0,0 @@
-/*
- * Special support for e500 eabi and SVR4
- *
- *   Copyright (C) 2008-2024 Free Software Foundation, Inc.
- *   Written by Nathan Froyd
- *
- * This file is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License as published by the
- * Free Software Foundation; either version 3, or (at your option) any
- * later version.
- *
- * This file is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * 

[PATCH] rs6000: Add assert !TARGET_VSX if !TARGET_ALTIVEC and strip a useless check

2024-05-08 Thread Kewen.Lin
Hi,

In function rs6000_option_override_internal, we have the
checks and adjustments like:

  if (TARGET_P8_VECTOR && !TARGET_ALTIVEC)
rs6000_isa_flags &= ~OPTION_MASK_P8_VECTOR;

  if (TARGET_P8_VECTOR && !TARGET_VSX)
rs6000_isa_flags &= ~OPTION_MASK_P8_VECTOR;

But in fact some previous code has guaranteed !TARGET_VSX if
!TARGET_ALTIVEC, so we can remove the former check and
adjustment.  This patch is to remove it accordingly and also
place an explicit assertion.
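
To make the redundancy concrete, here is a small standalone sketch.
The MASK_* bits and function names are hypothetical stand-ins, not
the real rs6000 definitions.  It runs the old and the new sequence
over every flag combination that satisfies "no VSX without AltiVec"
and checks that they agree.

```c
#include <assert.h>

/* Hypothetical stand-ins for the ISA flag bits.  */
#define MASK_ALTIVEC   (1u << 0)
#define MASK_VSX       (1u << 1)
#define MASK_P8_VECTOR (1u << 2)

/* Old sequence: clear P8_VECTOR on !ALTIVEC, then on !VSX.  */
unsigned
adjust_old (unsigned flags)
{
  if ((flags & MASK_P8_VECTOR) && !(flags & MASK_ALTIVEC))
    flags &= ~MASK_P8_VECTOR;
  if ((flags & MASK_P8_VECTOR) && !(flags & MASK_VSX))
    flags &= ~MASK_P8_VECTOR;
  return flags;
}

/* New sequence: assert the invariant, keep only the !VSX check.  */
unsigned
adjust_new (unsigned flags)
{
  assert ((flags & MASK_ALTIVEC) || !(flags & MASK_VSX));
  if ((flags & MASK_P8_VECTOR) && !(flags & MASK_VSX))
    flags &= ~MASK_P8_VECTOR;
  return flags;
}

/* Return 1 if both sequences give the same result for every flag
   combination that satisfies the invariant, 0 otherwise.  */
int
sequences_agree (void)
{
  for (unsigned flags = 0; flags < 8; flags++)
    {
      /* Skip VSX without AltiVec, which earlier code rules out.  */
      if ((flags & MASK_VSX) && !(flags & MASK_ALTIVEC))
        continue;
      if (adjust_old (flags) != adjust_new (flags))
        return 0;
    }
  return 1;
}
```

The two only differ for VSX without AltiVec, which is exactly the
combination the new assertion excludes.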

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_option_override_internal): Remove
useless check on TARGET_P8_VECTOR && !TARGET_ALTIVEC and add an
assertion on !TARGET_VSX if !TARGET_ALTIVEC.
---
 gcc/config/rs6000/rs6000.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 117999613d8..b5553e27aa3 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -3937,8 +3937,9 @@ rs6000_option_override_internal (bool global_init_p)
   rs6000_isa_flags &= ~OPTION_MASK_FPRND;
 }

-  if (TARGET_P8_VECTOR && !TARGET_ALTIVEC)
-rs6000_isa_flags &= ~OPTION_MASK_P8_VECTOR;
+  /* Assert !TARGET_VSX if !TARGET_ALTIVEC and make some adjustments
+ based on either !TARGET_VSX or !TARGET_ALTIVEC concise.  */
+  gcc_assert (TARGET_ALTIVEC || !TARGET_VSX);

   if (TARGET_P8_VECTOR && !TARGET_VSX)
 rs6000_isa_flags &= ~OPTION_MASK_P8_VECTOR;
--
2.39.1


[PATCH] testsuite, rs6000: Remove all linux*paired* checks and cases

2024-05-08 Thread Kewen.Lin
Hi,

Support for paired single was dropped in r9-115-g559289370f76bf,
but we still have some test checks and cases for it; this patch
removes them.

Regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_effective_target_vect_int): Remove
the check on powerpc-*-linux*paired*.
(check_effective_target_vect_intfloat_cvt): Likewise.
(check_effective_target_vect_uintfloat_cvt): Likewise.
(check_effective_target_vect_floatint_cvt): Likewise.
(check_effective_target_vect_floatuint_cvt): Likewise.
(check_effective_target_powerpc_altivec_ok): Likewise.
(check_effective_target_powerpc_p9modulo_ok): Likewise.
(check_effective_target_powerpc_float128_sw_ok): Likewise.
(check_effective_target_powerpc_float128_hw_ok): Likewise.
(check_effective_target_powerpc_vsx_ok): Likewise.
(check_effective_target_powerpc_htm_ok): Likewise.
(check_effective_target_vect_shift): Likewise.
(check_effective_target_vect_char_add): Likewise.
(check_effective_target_vect_shift_char): Likewise.
(check_effective_target_vect_long): Likewise.
(check_effective_target_ifn_copysign): Likewise.
(check_effective_target_vect_sdot_hi): Likewise.
(check_effective_target_vect_udot_hi): Likewise.
(check_effective_target_vect_pack_trunc): Likewise.
(check_effective_target_vect_int_mult): Likewise.
* gcc.target/powerpc/paired-1.c: Remove.
* gcc.target/powerpc/paired-10.c: Remove.
* gcc.target/powerpc/paired-2.c: Remove.
* gcc.target/powerpc/paired-3.c: Remove.
* gcc.target/powerpc/paired-4.c: Remove.
* gcc.target/powerpc/paired-5.c: Remove.
* gcc.target/powerpc/paired-6.c: Remove.
* gcc.target/powerpc/paired-7.c: Remove.
* gcc.target/powerpc/paired-8.c: Remove.
* gcc.target/powerpc/paired-9.c: Remove.
* gcc.target/powerpc/ppc-paired.c: Remove.
---
 gcc/testsuite/gcc.target/powerpc/paired-1.c   | 33 ---
 gcc/testsuite/gcc.target/powerpc/paired-10.c  | 25 
 gcc/testsuite/gcc.target/powerpc/paired-2.c   | 35 ---
 gcc/testsuite/gcc.target/powerpc/paired-3.c   | 34 ---
 gcc/testsuite/gcc.target/powerpc/paired-4.c   | 34 ---
 gcc/testsuite/gcc.target/powerpc/paired-5.c   | 34 ---
 gcc/testsuite/gcc.target/powerpc/paired-6.c   | 34 ---
 gcc/testsuite/gcc.target/powerpc/paired-7.c   | 34 ---
 gcc/testsuite/gcc.target/powerpc/paired-8.c   | 25 
 gcc/testsuite/gcc.target/powerpc/paired-9.c   | 25 
 gcc/testsuite/gcc.target/powerpc/ppc-paired.c | 45 --
 gcc/testsuite/lib/target-supports.exp | 59 +++
 12 files changed, 20 insertions(+), 397 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/paired-1.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/paired-10.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/paired-2.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/paired-3.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/paired-4.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/paired-5.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/paired-6.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/paired-7.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/paired-8.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/paired-9.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/ppc-paired.c

diff --git a/gcc/testsuite/gcc.target/powerpc/paired-1.c b/gcc/testsuite/gcc.target/powerpc/paired-1.c
deleted file mode 100644
index 19a66a15b30..000
--- a/gcc/testsuite/gcc.target/powerpc/paired-1.c
+++ /dev/null
@@ -1,33 +0,0 @@
-/* { dg-do compile { target { powerpc-*-linux*paired* && ilp32} } } */
-/* { dg-options "-mpaired -ffinite-math-only " } */
-
-/* Test PowerPC PAIRED extensions.  */
-
-#include 
-
-static float in1[2] __attribute__ ((aligned (8))) =
-{6.0, 7.0};
-static float in2[2] __attribute__ ((aligned (8))) =
-{4.0, 3.0};
-
-static float out[2] __attribute__ ((aligned (8)));
-
-vector float a, b, c, d;
-void
-test_api ()
-{
-  b = paired_lx (0, in1);
-  c = paired_lx (0, in2);
-
-  a = paired_sub (b, c);
-
-  paired_stx (a, 0, out);
-}
-
-int
-main ()
-{
-  test_api ();
-  return (0);
-}
-
diff --git a/gcc/testsuite/gcc.target/powerpc/paired-10.c b/gcc/testsuite/gcc.target/powerpc/paired-10.c
deleted file mode 100644
index 1f904c25841..000
--- a/gcc/testsuite/gcc.target/powerpc/paired-10.c
+++ /dev/null
@@ -1,25 +0,0 @@
-/* { dg-do compile { target { powerpc-*-linux*paired* && ilp32 } } } */
-/* { dg-options "-mpaired -ffinite-math-only " } */
-
-/* Test PowerPC PAIRED extensions.  */
-
-#include 
-
-static float out[2] __attribute__ ((aligned (8)));
-void

[PATCH] testsuite, rs6000: Remove some checks with aix[456]

2024-05-08 Thread Kewen.Lin
Hi,

aix6 support was dropped in r12-75-g0745b6fa66c69c, so we
no longer need to check for aix[456].* when testing.  This
patch removes such checks.

Regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_powerpc_altivec_ok): Remove checks for
aix[456].*
(check_effective_target_powerpc_p9modulo_ok): Likewise.
(check_effective_target_powerpc_float128_sw_ok): Likewise.
(check_effective_target_powerpc_float128_hw_ok): Likewise.
(check_effective_target_powerpc_vsx_ok): Likewise.
---
 gcc/testsuite/lib/target-supports.exp | 29 ---
 1 file changed, 29 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 3a55b2a4159..16dc2766850 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6963,11 +6963,6 @@ proc check_effective_target_powerpc_altivec_ok { } {
 # Paired Single, then not ok
 if { [istarget powerpc-*-linux*paired*] } { return 0 }

-# AltiVec is not supported on AIX before 5.3.
-if { [istarget powerpc*-*-aix4*]
-|| [istarget powerpc*-*-aix5.1*]
-|| [istarget powerpc*-*-aix5.2*] } { return 0 }
-
 # Return true iff compiling with -maltivec does not error.
 return [check_no_compiler_messages powerpc_altivec_ok object {
int dummy;
@@ -6980,12 +6975,6 @@ proc check_effective_target_powerpc_p9modulo_ok { } {
 if { ([istarget powerpc*-*-*]
  && ![istarget powerpc-*-linux*paired*])
 || [istarget rs6000-*-*] } {
-   # AltiVec is not supported on AIX before 5.3.
-   if { [istarget powerpc*-*-aix4*]
-|| [istarget powerpc*-*-aix5.1*]
-|| [istarget powerpc*-*-aix5.2*] } {
-   return 0
-   }
return [check_no_compiler_messages powerpc_p9modulo_ok object {
int main (void) {
int i = 5, j = 3, r = -1;
@@ -7116,12 +7105,6 @@ proc check_effective_target_powerpc_float128_sw_ok { } {
 if { ([istarget powerpc*-*-*]
  && ![istarget powerpc-*-linux*paired*])
 || [istarget rs6000-*-*] } {
-   # AltiVec is not supported on AIX before 5.3.
-   if { [istarget powerpc*-*-aix4*]
-|| [istarget powerpc*-*-aix5.1*]
-|| [istarget powerpc*-*-aix5.2*] } {
-   return 0
-   }
# Darwin doesn't have VSX, so no soft support for float128.
if { [istarget *-*-darwin*] } {
return 0
@@ -7146,12 +7129,6 @@ proc check_effective_target_powerpc_float128_hw_ok { } {
 if { ([istarget powerpc*-*-*]
  && ![istarget powerpc-*-linux*paired*])
 || [istarget rs6000-*-*] } {
-   # AltiVec is not supported on AIX before 5.3.
-   if { [istarget powerpc*-*-aix4*]
-|| [istarget powerpc*-*-aix5.1*]
-|| [istarget powerpc*-*-aix5.2*] } {
-   return 0
-   }
# Darwin doesn't run on any machine with float128 h/w so far.
if { [istarget *-*-darwin*] } {
return 0
@@ -7215,12 +7192,6 @@ proc check_effective_target_powerpc_vsx_ok { } {
 if { ([istarget powerpc*-*-*]
  && ![istarget powerpc-*-linux*paired*])
 || [istarget rs6000-*-*] } {
-   # VSX is not supported on AIX before 7.1.
-   if { [istarget powerpc*-*-aix4*]
-|| [istarget powerpc*-*-aix5*]
-|| [istarget powerpc*-*-aix6*] } {
-   return 0
-   }
# Darwin doesn't have VSX, even if it's used with an assembler
# which recognises the insns.
if { [istarget *-*-darwin*] } {
--
2.39.1


[PATCH] testsuite: Fix typo in torture/vector-{1,2}.c

2024-05-08 Thread Kewen.Lin
Hi,

While preparing some clean-up patches, I found that test
cases vector-{1,2}.c have the typo "powerpc64--*-*" in their
target selector, which should be powerpc64-*-*.  We didn't
catch this before because all our testing machines support
VMX insns, so the tests always pass.  But they would break
on a test machine that doesn't support VMX, so this patch
fixes the typo to ensure robustness.
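
To illustrate why the doubled dash never matches, here is a minimal C
sketch.  It assumes the selector is compared against the target triplet
with shell-style glob matching; fnmatch is only a stand-in for what the
testsuite does, and triplet_matches is a hypothetical helper, not
something from the testsuite.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: glob-match a target triplet against a selector
   pattern.  fnmatch is only a stand-in for the testsuite's matching.  */
bool
triplet_matches (const char *pattern, const char *triplet)
{
  assert (pattern != NULL && triplet != NULL);
  return fnmatch (pattern, triplet, 0) == 0;
}
```

With this helper, the pattern "powerpc64--*-*" does not match the
triplet "powerpc64-unknown-linux-gnu", while "powerpc64-*-*" does.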

Regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

gcc/testsuite/ChangeLog:

* gcc.dg/torture/vector-1.c: Fix typo.
* gcc.dg/torture/vector-2.c: Likewise.
---
 gcc/testsuite/gcc.dg/torture/vector-1.c | 2 +-
 gcc/testsuite/gcc.dg/torture/vector-2.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/torture/vector-1.c b/gcc/testsuite/gcc.dg/torture/vector-1.c
index 205fee6d6de..1b98ee26ff3 100644
--- a/gcc/testsuite/gcc.dg/torture/vector-1.c
+++ b/gcc/testsuite/gcc.dg/torture/vector-1.c
@@ -4,7 +4,7 @@
 /* { dg-options "-msse" { target { i?86-*-* x86_64-*-* } } } */
 /* { dg-require-effective-target sse_runtime { target { i?86-*-* x86_64-*-* } } } */
 /* { dg-options "-mabi=altivec" { target { powerpc-*-* powerpc64-*-* } } } */
-/* { dg-require-effective-target vmx_hw { target { powerpc-*-* powerpc64--*-* } } } */
+/* { dg-require-effective-target vmx_hw { target { powerpc-*-* powerpc64-*-* } } } */

 #define vector __attribute__((vector_size(16) ))

diff --git a/gcc/testsuite/gcc.dg/torture/vector-2.c b/gcc/testsuite/gcc.dg/torture/vector-2.c
index b004d005775..c9a3a44d4df 100644
--- a/gcc/testsuite/gcc.dg/torture/vector-2.c
+++ b/gcc/testsuite/gcc.dg/torture/vector-2.c
@@ -4,7 +4,7 @@
 /* { dg-options "-msse" { target { i?86-*-* x86_64-*-* } } } */
 /* { dg-require-effective-target sse_runtime { target { i?86-*-* x86_64-*-* } } } */
 /* { dg-options "-mabi=altivec" { target { powerpc-*-* powerpc64-*-* } } } */
-/* { dg-require-effective-target vmx_hw { target { powerpc-*-* powerpc64--*-* } } } */
+/* { dg-require-effective-target vmx_hw { target { powerpc-*-* powerpc64-*-* } } } */

 #define vector __attribute__((vector_size(16) ))

--
2.39.1


[PATCH] rs6000: Drop useless vector_{load,store}_ defines

2024-05-08 Thread Kewen.Lin
Hi,

When I was working on a patch to get rid of TFmode, I
noticed that define_expands vector_load_ and
vector_store_ are useless.  This patch is to clean up
both.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

gcc/ChangeLog:

* config/rs6000/vector.md (define_expand vector_load_): Remove.
(vector_store_): Likewise.
---
 gcc/config/rs6000/vector.md | 14 --
 1 file changed, 14 deletions(-)

diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index f9796fb3781..59489e06839 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -163,20 +163,6 @@ (define_expand "mov"
 }
 })

-;; Generic vector floating point load/store instructions.  These will match
-;; insns defined in vsx.md or altivec.md depending on the switches.
-(define_expand "vector_load_"
-  [(set (match_operand:VEC_M 0 "vfloat_operand")
-   (match_operand:VEC_M 1 "memory_operand"))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)"
-  "")
-
-(define_expand "vector_store_"
-  [(set (match_operand:VEC_M 0 "memory_operand")
-   (match_operand:VEC_M 1 "vfloat_operand"))]
-  "VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)"
-  "")
-
 ;; Splits if a GPR register was chosen for the move
 (define_split
   [(set (match_operand:VEC_L 0 "nonimmediate_operand")
--
2.39.1


[PATCH] rs6000: Remove useless entries in rreg

2024-05-08 Thread Kewen.Lin
Hi,

While working on a trial patch to get rid of TFmode, I
noticed that mode attribute rreg is only used with mode
iterator SFDF.  That means only the SF and DF key-value
pairs are useful and the others are useless, so this patch
removes them.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

gcc/ChangeLog:

* config/rs6000/rs6000.md (mode attribute rreg): Remove useless
entries with modes TF, TD, V4SF and V2DF.
---
 gcc/config/rs6000/rs6000.md | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bc8bc6ab060..4b70b50edca 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -725,11 +725,7 @@ (define_mode_attr ptrm [(SI "m")
(DI "Y")])

 (define_mode_attr rreg [(SF   "f")
-   (DF   "wa")
-   (TF   "f")
-   (TD   "f")
-   (V4SF "wa")
-   (V2DF "wa")])
+   (DF   "wa")])

 (define_mode_attr rreg2 [(SF   "f")
 (DF   "d")])
--
2.39.1


[PATCH] rs6000: Remove useless operands[3]

2024-05-08 Thread Kewen.Lin
Hi,

As shown, three uses of operands[3] are totally useless, so
this patch is to remove them to avoid any confusion.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

gcc/ChangeLog:

* config/rs6000/rs6000.md (@ieee_128bit_vsx_neg2): Remove
the use of operands[3].
(@ieee_128bit_vsx_abs2): Likewise.
(*ieee_128bit_vsx_nabs2): Likewise.
---
 gcc/config/rs6000/rs6000.md | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 4b70b50edca..daae2f81061 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -9256,7 +9256,6 @@ (define_insn_and_split "@ieee_128bit_vsx_neg2"
   if (GET_CODE (operands[2]) == SCRATCH)
 operands[2] = gen_reg_rtx (V16QImode);

-  operands[3] = gen_reg_rtx (V16QImode);
   emit_insn (gen_ieee_128bit_negative_zero (operands[2]));
 }
   [(set_attr "length" "8")
@@ -9285,7 +9284,6 @@ (define_insn_and_split "@ieee_128bit_vsx_abs2"
   if (GET_CODE (operands[2]) == SCRATCH)
 operands[2] = gen_reg_rtx (V16QImode);

-  operands[3] = gen_reg_rtx (V16QImode);
   emit_insn (gen_ieee_128bit_negative_zero (operands[2]));
 }
   [(set_attr "length" "8")
@@ -9317,7 +9315,6 @@ (define_insn_and_split "*ieee_128bit_vsx_nabs2"
   if (GET_CODE (operands[2]) == SCRATCH)
 operands[2] = gen_reg_rtx (V16QImode);

-  operands[3] = gen_reg_rtx (V16QImode);
   emit_insn (gen_ieee_128bit_negative_zero (operands[2]));
 }
   [(set_attr "length" "8")
--
2.39.1


[PATCH] rs6000: Clean up TF and TD check with FLOAT128_2REG_P

2024-05-08 Thread Kewen.Lin
Hi,

Commit r6-2116-g2c83faf86827bf cleaned up the TFmode and
TDmode checks with FLOAT128_2REG_P, but it missed updating
one assertion.  This patch makes that assertion consistent.

BTW, I noticed this while working on a patch to get rid of
TFmode.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this soon if no objections.

BR,
Kewen
-

gcc/ChangeLog:

* config/rs6000/rs6000-call.cc (rs6000_darwin64_record_arg_recurse):
Clean up TFmode and TDmode check with FLOAT128_2REG_P.
---
 gcc/config/rs6000/rs6000-call.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000-call.cc b/gcc/config/rs6000/rs6000-call.cc
index 1f8f93a2ee7..a039ff75f3c 100644
--- a/gcc/config/rs6000/rs6000-call.cc
+++ b/gcc/config/rs6000/rs6000-call.cc
@@ -1391,7 +1391,7 @@ rs6000_darwin64_record_arg_recurse (CUMULATIVE_ARGS *cum, const_tree type,
if (cum->fregno + n_fpreg > FP_ARG_MAX_REG + 1)
  {
gcc_assert (cum->fregno == FP_ARG_MAX_REG
-   && (mode == TFmode || mode == TDmode));
+   && FLOAT128_2REG_P (mode));
/* Long double or _Decimal128 split over regs and memory.  */
mode = DECIMAL_FLOAT_MODE_P (mode) ? DDmode : DFmode;
cum->use_stack=1;
--
2.39.1


[PATCH] rs6000: Fix ICE on IEEE128 long double without vsx [PR114402]

2024-05-07 Thread Kewen.Lin
Hi,

As PR114402 shows, we support IEEE128 format long double
even if there is no vsx support, but there is an ICE on
cbranch as the test case shows.  For now, we only support
the compare:CCFP pattern for IEEE128 fp if TARGET_FLOAT128_HW,
so function rs6000_generate_compare has a check with
!TARGET_FLOAT128_HW && FLOAT128_VECTOR_P (mode) to make
!TARGET_FLOAT128_HW IEEE128 fp handling go with a libcall.
Unfortunately, IEEE128 without vsx support doesn't satisfy
FLOAT128_VECTOR_P (mode), so it goes further with an
unmatched compare:CCFP pattern, which triggers the ICE.

So this patch makes rs6000_generate_compare consider
IEEE128 without vsx as well, so that it ends up with a libcall.
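
The difference between the two checks can be sketched with a small
standalone C model.  This is not the GCC code: struct target_model,
old_path and new_path are hypothetical names, and the model assumes
(per the description above) that FLOAT128_VECTOR_P only holds for an
IEEE128 mode when vsx is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the target state.  */
struct target_model
{
  bool float128_hw;   /* models TARGET_FLOAT128_HW */
  bool vsx;           /* models vsx being enabled */
};

enum compare_path { PATH_LIBCALL, PATH_CCFP_PATTERN };

/* Old check: the libcall path needs a vsx-backed IEEE128 mode.  */
enum compare_path
old_path (const struct target_model *t, bool mode_is_ieee128)
{
  assert (t != NULL);
  bool float128_vector_p = mode_is_ieee128 && t->vsx;
  if (!t->float128_hw && float128_vector_p)
    return PATH_LIBCALL;
  return PATH_CCFP_PATTERN;
}

/* New check: any IEEE128 mode without hardware support uses a libcall.  */
enum compare_path
new_path (const struct target_model *t, bool mode_is_ieee128)
{
  assert (t != NULL);
  if (!t->float128_hw && mode_is_ieee128)
    return PATH_LIBCALL;
  return PATH_CCFP_PATTERN;
}
```

In this model, a target with neither float128 hardware nor vsx takes
the compare:CCFP path under the old check and the libcall path under
the new one.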

Bootstrapped and regress-tested on powerpc64-linux-gnu
P8/P9 and powerpc64le-linux-gnu P9 and P10.

I'm going to push this next week if no objections.

BR,
Kewen
-
PR target/114402

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_generate_compare): Make IEEE128
handling without vsx go with libcall.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr114402.c: New test.
---
 gcc/config/rs6000/rs6000.cc |  4 ++--
 gcc/testsuite/gcc.target/powerpc/pr114402.c | 16 
 2 files changed, 18 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr114402.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index d6214bd672b..7ae6cf43da4 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15283,7 +15283,7 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)
   rtx op0 = XEXP (cmp, 0);
   rtx op1 = XEXP (cmp, 1);

-  if (!TARGET_FLOAT128_HW && FLOAT128_VECTOR_P (mode))
+  if (!TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode))
 comp_mode = CCmode;
   else if (FLOAT_MODE_P (mode))
 comp_mode = CCFPmode;
@@ -15315,7 +15315,7 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)

   /* IEEE 128-bit support in VSX registers when we do not have hardware
  support.  */
-  if (!TARGET_FLOAT128_HW && FLOAT128_VECTOR_P (mode))
+  if (!TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode))
 {
   rtx libfunc = NULL_RTX;
   bool check_nan = false;
diff --git a/gcc/testsuite/gcc.target/powerpc/pr114402.c b/gcc/testsuite/gcc.target/powerpc/pr114402.c
new file mode 100644
index 000..b927138382f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr114402.c
@@ -0,0 +1,16 @@
+/* Explicitly disable VSX when VSX is on.  */
+/* { dg-options "-mno-vsx" { target powerpc_vsx } } */
+
+/* Verify there is no ICE.  */
+
+long double a;
+long double b;
+
+int
+foo ()
+{
+  if (a > b)
+return 0;
+  else
+return 1;
+}
--
2.39.1


[PATCH] rs6000: Adjust -fpatchable-function-entry* support for dual entry [PR112980]

2024-05-07 Thread Kewen.Lin
Hi,

As discussed in PR112980, although the current
implementation of -fpatchable-function-entry* conforms
to the documentation (making the N NOPs consecutive),
it's inefficient for both kernel and userspace livepatching
(see the comments in the PR for details).

So this patch changes the current implementation to emit
the "before" NOPs before the global entry point and the
"after" NOPs after the local entry point.  The new behavior
no longer keeps the NOPs consecutive, so the documentation
is updated to emphasize this.
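
For reference, a minimal C sketch of how the counts split.  The
attribute is the documented per-function form of the option (N NOPs in
total, M of them before the function entry); after_nops is a
hypothetical helper that only models the subtraction done in the
prologue code of this patch, and the values 5 and 2 are just an
example.

```c
#include <assert.h>

/* Ask for 5 NOPs in total, 2 of them before the function entry.  */
__attribute__ ((patchable_function_entry (5, 2)))
int
add_one (int x)
{
  return x + 1;
}

/* Hypothetical model: NOPs left to emit after the local entry point
   once the "before" ones are subtracted from the total.  */
unsigned
after_nops (unsigned total, unsigned before)
{
  return total > before ? total - before : 0;
}

/* Small self-check tying the two together.  */
void
check_example (void)
{
  assert (after_nops (5, 2) == 3);
  assert (add_one (41) == 42);
}
```

With dual entry on ELFv2, the intent described above is that the 2
"before" NOPs precede the global entry point and the remaining 3
follow the local entry point.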

Bootstrapped and regress-tested on powerpc64-linux-gnu
P8/P9 and powerpc64le-linux-gnu P9 and P10.

Is it ok for trunk?  And backporting to active branches
after burn-in time?  I guess we should also mention this
change in changes.html?

BR,
Kewen
-
PR target/112980

gcc/ChangeLog:

* config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue):
Adjust the handling on patch area emitting with dual entry, remove
the restriction on "before" NOPs count, not emit "before" NOPs any
more but only emit "after" NOPs.
* config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry):
Adjust by respecting cfun->machine->stop_patch_area_print.
(rs6000_elf_declare_function_name): For ELFv2 with dual entry, set
cfun->machine->stop_patch_area_print as true.
* config/rs6000/rs6000.h (struct machine_function): Remove member
global_entry_emitted, add new member stop_patch_area_print.
* doc/invoke.texi (option -fpatchable-function-entry): Adjust the
documentation for PowerPC ELFv2 dual entry.

gcc/testsuite/ChangeLog:

* c-c++-common/patchable_function_entry-default.c: Adjust.
* gcc.target/powerpc/pr99888-4.c: Likewise.
* gcc.target/powerpc/pr99888-5.c: Likewise.
* gcc.target/powerpc/pr99888-6.c: Likewise.
---
 gcc/config/rs6000/rs6000-logue.cc | 40 +--
 gcc/config/rs6000/rs6000.cc   | 15 +--
 gcc/config/rs6000/rs6000.h| 10 +++--
 gcc/doc/invoke.texi   |  8 ++--
 .../patchable_function_entry-default.c|  3 --
 gcc/testsuite/gcc.target/powerpc/pr99888-4.c  |  4 +-
 gcc/testsuite/gcc.target/powerpc/pr99888-5.c  |  4 +-
 gcc/testsuite/gcc.target/powerpc/pr99888-6.c  |  4 +-
 8 files changed, 33 insertions(+), 55 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-logue.cc b/gcc/config/rs6000/rs6000-logue.cc
index 60ba15a8bc3..0eb019b44b3 100644
--- a/gcc/config/rs6000/rs6000-logue.cc
+++ b/gcc/config/rs6000/rs6000-logue.cc
@@ -4006,43 +4006,21 @@ rs6000_output_function_prologue (FILE *file)
  fprintf (file, "\tadd 2,2,12\n");
}

-  unsigned short patch_area_size = crtl->patch_area_size;
-  unsigned short patch_area_entry = crtl->patch_area_entry;
-  /* Need to emit the patching area.  */
-  if (patch_area_size > 0)
-   {
- cfun->machine->global_entry_emitted = true;
- /* As ELFv2 ABI shows, the allowable bytes between the global
-and local entry points are 0, 4, 8, 16, 32 and 64 when
-there is a local entry point.  Considering there are two
-non-prefixed instructions for global entry point prologue
-(8 bytes), the count for patchable nops before local entry
-point would be 2, 6 and 14.  It's possible to support those
-other counts of nops by not making a local entry point, but
-we don't have clear use cases for them, so leave them
-unsupported for now.  */
- if (patch_area_entry > 0)
-   {
- if (patch_area_entry != 2
- && patch_area_entry != 6
- && patch_area_entry != 14)
-   error ("unsupported number of nops before function entry (%u)",
-  patch_area_entry);
- rs6000_print_patchable_function_entry (file, patch_area_entry,
-true);
- patch_area_size -= patch_area_entry;
-   }
-   }
-
   fputs ("\t.localentry\t", file);
   assemble_name (file, name);
   fputs (",.-", file);
   assemble_name (file, name);
   fputs ("\n", file);
   /* Emit the nops after local entry.  */
-  if (patch_area_size > 0)
-   rs6000_print_patchable_function_entry (file, patch_area_size,
-  patch_area_entry == 0);
+  unsigned short patch_area_size = crtl->patch_area_size;
+  unsigned short patch_area_entry = crtl->patch_area_entry;
+  if (patch_area_size > patch_area_entry)
+   {
+ cfun->machine->stop_patch_area_print = false;
+ patch_area_size -= patch_area_entry;
+ rs6000_print_patchable_function_entry (file, patch_area_size,
+patch_area_entry == 0);
+   }
 }

   else if (rs6000_pcrel_p ())
diff --git 

[PATCH 4/4] tree: Remove KFmode workaround [PR112993]

2024-05-07 Thread Kewen.Lin
Hi,

The fix for PR112993 makes KFmode have 128 bit mode precision,
so we no longer need this workaround to fix up the type
precision and can just go with the mode precision.  So this
patch removes the KFmode workaround.
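
As a worked example of what the removed workaround computed, here is a
standalone C sketch of its minimum precision formula.  The names
ceil_log2_model and struct format_model are hypothetical stand-ins for
ceil_log2 and struct real_format, and the IEEE quad parameters used in
the note below (p = 113, emin = -16381, emax = 16384) are assumptions
taken from the usual format definition.

```c
#include <assert.h>

/* Smallest k with 2^k >= x; a local stand-in for ceil_log2.  */
int
ceil_log2_model (unsigned long x)
{
  int k = 0;
  while ((1UL << k) < x)
    k++;
  return k;
}

/* Hypothetical subset of the real format description.  */
struct format_model
{
  int p;      /* significand precision in bits */
  int emin;   /* minimum exponent */
  int emax;   /* maximum exponent */
};

/* Significand bits plus the bits needed for the exponent range.  */
int
min_precision (const struct format_model *fmt)
{
  assert (fmt->emin + fmt->emax == 3);
  return fmt->p + ceil_log2_model ((unsigned long) (fmt->emax - fmt->emin));
}
```

For the assumed IEEE quad parameters this gives 113 + 15 = 128, which
is the value the workaround used to force for the type precision.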

Bootstrapped and regress-tested on:
  - powerpc64-linux-gnu P8/P9 (with ibm128 by default)
  - powerpc64le-linux-gnu P9/P10 (with ibm128 by default)
  - powerpc64le-linux-gnu P9 (with ieee128 by default)

Is it OK for trunk if {1,2}/4 in this series get landed?

BR,
Kewen
-

PR target/112993

gcc/ChangeLog:

* tree.cc (build_common_tree_nodes): Drop the workaround for rs6000
KFmode precision adjustment.
---
 gcc/tree.cc | 9 -
 1 file changed, 9 deletions(-)

diff --git a/gcc/tree.cc b/gcc/tree.cc
index f801712c9dd..f730981ec8b 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -9575,15 +9575,6 @@ build_common_tree_nodes (bool signed_char)
   if (!targetm.floatn_mode (n, extended).exists ())
continue;
   int precision = GET_MODE_PRECISION (mode);
-  /* Work around the rs6000 KFmode having precision 113 not
-128.  */
-  const struct real_format *fmt = REAL_MODE_FORMAT (mode);
-  gcc_assert (fmt->b == 2 && fmt->emin + fmt->emax == 3);
-  int min_precision = fmt->p + ceil_log2 (fmt->emax - fmt->emin);
-  if (!extended)
-   gcc_assert (min_precision == n);
-  if (precision < min_precision)
-   precision = min_precision;
   FLOATN_NX_TYPE_NODE (i) = make_node (REAL_TYPE);
   TYPE_PRECISION (FLOATN_NX_TYPE_NODE (i)) = precision;
   layout_type (FLOATN_NX_TYPE_NODE (i));
--
2.39.1


[PATCH 3/4] ranger: Revert the workaround introduced in PR112788 [PR112993]

2024-05-07 Thread Kewen.Lin
Hi,

This reverts commit r14-6478-gfda8e2f8292a90 "range:
Workaround different type precision between _Float128 and
long double [PR112788]".  As the fixes for PR112993 make
all 128 bit scalar floating point modes have the same 128 bit
precision, this workaround isn't needed any more.

Bootstrapped and regress-tested on:
  - powerpc64-linux-gnu P8/P9 (with ibm128 by default)
  - powerpc64le-linux-gnu P9/P10 (with ibm128 by default)
  - powerpc64le-linux-gnu P9 (with ieee128 by default)

Is it OK for trunk if {1,2}/4 in this series get landed?

BR,
Kewen
-

PR target/112993

gcc/ChangeLog:

* value-range.h (range_compatible_p): Remove the workaround on
different type precision between _Float128 and long double.
---
 gcc/value-range.h | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/gcc/value-range.h b/gcc/value-range.h
index 9531df56988..39de7daf3d9 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -1558,13 +1558,7 @@ range_compatible_p (tree type1, tree type2)
   // types_compatible_p requires conversion in both directions to be useless.
   // GIMPLE only requires a cast one way in order to be compatible.
   // Ranges really only need the sign and precision to be the same.
-  return TYPE_SIGN (type1) == TYPE_SIGN (type2)
-&& (TYPE_PRECISION (type1) == TYPE_PRECISION (type2)
-// FIXME: As PR112788 shows, for now on rs6000 _Float128 has
-// type precision 128 while long double has type precision 127
-// but both have the same mode so their precision is actually
-// the same, workaround it temporarily.
-|| (SCALAR_FLOAT_TYPE_P (type1)
-&& TYPE_MODE (type1) == TYPE_MODE (type2)));
+  return (TYPE_PRECISION (type1) == TYPE_PRECISION (type2)
+ && TYPE_SIGN (type1) == TYPE_SIGN (type2));
 }
 #endif // GCC_VALUE_RANGE_H
--
2.39.1


[PATCH 2/4] fortran: Teach get_real_kind_from_node for Power 128 fp modes [PR112993]

2024-05-07 Thread Kewen.Lin
Hi,

Previously effective target fortran_real_c_float128 never
passed on Power, regardless of whether the default 128 bit
long double is ibmlongdouble or ieeelongdouble.  That's
because TFmode is always used for kind 16 real, which has
precision 127, while the node float128_type_node for
c_float128 has type precision 128, so get_real_kind_from_node
can't find a match as it only compares
gfc_real_kinds[i].mode_precision with the type precision.

With TFmode/IFmode/KFmode changed to have the same mode
precision 128, fortran_real_c_float128 now passes with
ieeelongdouble enabled by default, and test cases guarded
with it get tested accordingly.  But with ibmlongdouble
enabled by default, since TFmode has precision 128, which
is the same as the type precision 128 of float128_type_node,
get_real_kind_from_node considers the kind for TFmode to
match float128_type_node.  That's wrong, as at that point
TFmode uses the ibm extended format.  So this patch teaches
get_real_kind_from_node to check one more field that can
distinguish the underlying real formats, which avoids the
unexpected match when more than one mode has the same
precision.
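
The extra check can be sketched with a small standalone C model.  It
is not the Fortran front end code: struct real_kind_model and
kind_from_type are hypothetical, and the digit counts used in the note
below (106 for the ibm extended format, 113 for IEEE quad) are
assumptions about the significand precision of the two formats.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of one real kind table entry.  */
struct real_kind_model
{
  int kind;             /* 0 terminates the table */
  int mode_precision;
  int digits;           /* significand precision of the format */
};

/* Return the kind matching the type, or -4 if there is none.  For
   kind 16 the significand digits must match as well.  */
int
kind_from_type (const struct real_kind_model *kinds,
                int type_precision, int type_digits)
{
  assert (kinds != NULL);
  for (int i = 0; kinds[i].kind != 0; i++)
    if (kinds[i].mode_precision == type_precision)
      {
        if (kinds[i].kind == 16 && kinds[i].digits != type_digits)
          continue;
        return kinds[i].kind;
      }
  return -4;
}
```

With a table whose kind 16 entry has 106 digits (modelling ibm128 as
the default), a 128 bit type with 113 digits finds no match, whereas a
128 bit type with 106 digits maps to kind 16.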

Bootstrapped and regress-tested on:
  - powerpc64-linux-gnu P8/P9 (with ibm128 by default)
  - powerpc64le-linux-gnu P9/P10 (with ibm128 by default)
  - powerpc64le-linux-gnu P9 (with ieee128 by default)

BR,
Kewen
-
PR target/112993

gcc/fortran/ChangeLog:

* trans-types.cc (get_real_kind_from_node): Consider the case where
more than one modes have the same precision.
---
 gcc/fortran/trans-types.cc | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-types.cc b/gcc/fortran/trans-types.cc
index 676014e9b98..dd94ef77741 100644
--- a/gcc/fortran/trans-types.cc
+++ b/gcc/fortran/trans-types.cc
@@ -183,7 +183,21 @@ get_real_kind_from_node (tree type)

   for (i = 0; gfc_real_kinds[i].kind != 0; i++)
 if (gfc_real_kinds[i].mode_precision == TYPE_PRECISION (type))
-  return gfc_real_kinds[i].kind;
+  {
+   /* On Power, we have three 128-bit scalar floating-point modes
+  and all of their types have 128 bit type precision, so we
+  should check underlying real format details further.  */
+#if defined(HAVE_TFmode) && defined(HAVE_IFmode) && defined(HAVE_KFmode)
+   if (gfc_real_kinds[i].kind == 16)
+ {
+   machine_mode mode = TYPE_MODE (type);
+   const struct real_format *fmt = REAL_MODE_FORMAT (mode);
+   if (fmt->p != gfc_real_kinds[i].digits)
+ continue;
+ }
+#endif
+   return gfc_real_kinds[i].kind;
+  }

   return -4;
 }
--
2.39.1


[PATCH 1/4] rs6000: Make all 128 bit scalar FP modes have 128 bit precision [PR112993]

2024-05-07 Thread Kewen.Lin
Hi,

On rs6000, there are three 128 bit scalar floating point
modes: TFmode, IFmode and KFmode.  For historical reasons,
we define them with different mode precisions, that is
KFmode 126, TFmode 127 and IFmode 128.  But in fact all of
them should have the same mode precision 128.  This special
setting has caused some issues, like the unexpected
failures mentioned in [1], and has also made us introduce
some workarounds, such as the workaround in
build_common_tree_nodes for KFmode 126 and the workaround in
range_compatible_p for the same mode but different precision
issue.

This patch makes these three 128 bit scalar floating
point modes TFmode, IFmode and KFmode have 128 bit mode
precision, and keeps their order the same as before so that
the machine independent parts of the compiler do not try
to widen IFmode to TFmode.  To let build_common_tree_nodes
find the correct mode for the long double type node,
it introduces one hook mode_for_longdouble to offer targets
a way to specify the mode used for the long double type node.
Previously I tried ordering them as TF/KF/IF so that the long
double type node can pick up TF as expected, but then we
need to teach some functions not to try the conversions
from IF(TF) to KF.  More importantly, we would further
remove TF and leave only two 128 bit floating point modes.
Without such a hook the first 128 bit mode would be chosen
for the long double type node, but whichever mode comes
first might not be the expected one, as it actually depends
on the selected long double format.

In function convert_mode_scalar, sext_optab is used for
conversions between same precision modes if
!DECIMAL_FLOAT_MODE_P, so we only need to support sext_optab
for any possible conversion.  Thus this patch removes some
useless trunc optab support, adds one new sext_optab expander
which calls the common handler rs6000_expand_float128_convert,
and unnames two define_insn_and_split patterns to avoid
conflicts and make them clearer.  Since the current
implementation leaves no chance of a KF <-> IF conversion
(as either of them would be TF already), it adds two dummy
define_expands to assert this.

Bootstrapped and regress-tested (with patch 2/4) on:
  - powerpc64-linux-gnu P8/P9 (with ibm128 by default)
  - powerpc64le-linux-gnu P9/P10 (with ibm128 by default)
  - powerpc64le-linux-gnu P9 (with ieee128 by default)

Is it OK for trunk (especially the generic code change)?

[1] https://inbox.sourceware.org/gcc-patches/718677e7-614d-7977-312d-05a75e1fd...@linux.ibm.com/

BR,
Kewen
-
PR target/112993

gcc/ChangeLog:

* config/rs6000/rs6000-modes.def (IFmode, KFmode, TFmode): Define
with FLOAT_MODE instead of FRACTIONAL_FLOAT_MODE, don't use special
precisions any more.
(rs6000-modes.h): Remove include.
* config/rs6000/rs6000-modes.h: Remove.
* config/rs6000/rs6000.h (rs6000-modes.h): Remove include.
* config/rs6000/t-rs6000: Remove rs6000-modes.h include.
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Replace
all uses of FLOAT_PRECISION_TFmode with 128.
(TARGET_C_MODE_FOR_LONGDOUBLE): New macro.
(rs6000_c_mode_for_longdouble): New hook implementation.
* config/rs6000/rs6000.md (define_expand trunciftf2): Remove.
(define_expand truncifkf2): Remove.
(define_expand trunckftf2): Remove.
(define_expand trunctfif2): Remove.
(define_expand expandtfkf2, expandtfif2): Merge to ...
(define_expand expandtf2): ... this, new.
(define_expand expandiftf2): Merge to ...
(define_expand expandtf2): ... this, new.
(define_expand expandiftf2): Update with assert.
(define_expand expandkfif2): New.
(define_insn_and_split extendkftf2): Rename to  ...
(define_insn_and_split *extendkftf2): ... this.
(define_insn_and_split trunctfkf2): Rename to ...
(define_insn_and_split *extendtfkf2): ... this.
* expr.cc (convert_mode_scalar): Allow same precision conversion
between scalar floating point modes if whose underlying format is
ibm_extended_format or ieee_quad_format, and refactor assertion
with new lambda function acceptable_same_precision_modes.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_C_MODE_FOR_LONGDOUBLE): New hook.
* target.def (mode_for_longdouble): Likewise.
* targhooks.cc (default_mode_for_longdouble): New hook default
implementation.
* targhooks.h (default_mode_for_longdouble): New hook declaration.
* tree.cc (build_common_tree_nodes): Call newly added hook
targetm.c.mode_for_longdouble.
---
 gcc/config/rs6000/rs6000-modes.def | 31 +
 gcc/config/rs6000/rs6000-modes.h   | 36 ---
 gcc/config/rs6000/rs6000.cc| 18 +---
 gcc/config/rs6000/rs6000.h |  5 ---
 gcc/config/rs6000/rs6000.md| 72 

Re: [PATCH] ppc: testsuite: pr79004 needs -mlong-double-128

2024-04-29 Thread Kewen.Lin
on 2024/4/29 15:20, Alexandre Oliva wrote:
> On Apr 28, 2024, "Kewen.Lin"  wrote:
> 
>> OK, from this perspective IMHO it seems more clear to adopt xfail
>> with effective target long_double_64bit?
> 
> That's effective target is quite broken, alas.  I doubt it's used
> anywhere: it calls an undefined proc, and its memcmp call seems to have
> the size cut from the 128-bit functions.  (a patchlet that
> fixes these most glaring issues is below)
> 
> Furthermore, it doesn't really work.  Since it adds -mlong-double-64 for
> the effective target test, it overrides the default, so it sort of
> always passes, even on a 128-bit long double target.  But since the test
> itself doesn't add that option, any xfails on long_double_64bit would be
> flagged as XPASS.
> 
> There's no effective target test for 64-bit long double that doesn't
> override the default, so we'd have to add one.  Alas, the natural name
> for it is the one that's taken with overriding behavior, and the current
> option-overriding tests, that need to be used along with the
> corresponding add-options in testcases, might benefit from a renaming to
> make them fit the already-established (?) naming standards.  Yuck.
> 

Oops, that's not what I expected.  I just noticed that no test cases
use this effective target, and even the commit that contributed it,
r12-3151-g4c5d76a655b9ab, doesn't adopt it.  Thanks for catching this, and
sorry that I didn't check before suggesting it.  I think we can go further
and drop this effective target instead, to avoid any possible confusion.

CC Mike for this.

How about the generic one "longdouble64"?  I did a grep and found it has one
use, I'd expect it can work here. :)

gcc/testsuite//gcc.target/powerpc/pr99708.c:/* { dg-xfail-run-if "unsupported 
type __ibm128 with long-double-64" { longdouble64 } } */
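
As a rough illustration of a check that does not override the default:
the sketch below looks at the width of long double under whatever options
are already in effect, instead of forcing -mlong-double-64 first.  The
helper names are hypothetical, and the negative-array-size probe is only
assumed to resemble what a generic check like longdouble64 does.

```c
#include <float.h>
#include <stdbool.h>

/* Compile-time probe: the array size becomes -1 (a compile error) unless
   the condition holds.  The condition used here is one that holds on
   every common target, so this file compiles everywhere; an
   effective-target style check would use sizeof (long double) == 8
   instead and treat a compile failure as "no".  */
typedef char probe_long_double_not_smaller[
  sizeof (long double) >= sizeof (double) ? 1 : -1];

/* Hypothetical helper: true when long double is 8 bytes wide under the
   options currently in effect.  */
bool
long_double_is_64bit (void)
{
  return sizeof (long double) == 8;
}

/* Hypothetical helper: true when long double carries no more mantissa
   bits than double.  */
bool
long_double_matches_double (void)
{
  return LDBL_MANT_DIG == DBL_MANT_DIG;
}
```

Because nothing here adds an option, the answer follows the configuration
under test, which is what an xfail selector needs.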

BR,
Kewen

> 
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index 182d80129de9b..603da25c97d67 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -2961,12 +2961,12 @@ proc check_effective_target_long_double_64bit { } {
> /* eliminate removing volatile cast warning.  */
> a2 = a;
> b2 = b;
> -   if (memcmp (&a2, &b2, 16) != 0)
> +   if (memcmp (&a2, &b2, 8) != 0)
>   return 1;
> sprintf (buffer, "%lg", b);
> return strcmp (buffer, "3") != 0;
>   }
> -}  [add_options_for_ppc_long_double_override_64bit ""]]
> +}  [add_options_for_long_double_64bit ""]]
>  }
>  
>  # Return the appropriate options to specify that long double uses the IEEE
> 
> 



Re: [PATCH] adjust vectorization expectations for ppc costmodel 76b

2024-04-29 Thread Kewen.Lin
on 2024/4/29 14:28, Alexandre Oliva wrote:
> On Apr 28, 2024, "Kewen.Lin"  wrote:
> 
>> Nit: Maybe add a prefix "testsuite: ".
> 
> ACK
> 
>>>
>>> From: Kewen Lin 
> 
>> Thanks, you can just drop this.  :)
> 
> I've turned it into Co-Authored-By, since you insist.
> 
> But unfortunately with the patch it still fails when testing for
> -mcpu=power7 on ppc64le-linux-gnu: it does vectorize the loop with 13
> iterations.  We need 16 iterations, as in an earlier version of this
> test, for it to pass for -mcpu=power7, but then it doesn't pass for
> -mcpu=power6.
> 
> It looks like we're going to have to adjust the expectations.
> 

I had a look at the failure; it's because "vect_no_align" unexpectedly
evaluates to true.

  "selector_expression: ` vect_no_align || {! vector_alignment_reachable} ' 1"

Currently powerpc* checks check_p8vector_hw_available.  ppc64le-linux-gnu
has at least Power8 support (that is, the testing machine can run p8vector
code), so it concludes vect_no_align is true.

proc check_effective_target_vect_no_align { } {
return [check_cached_effective_target_indexed vect_no_align {
  expr { [istarget mipsisa64*-*-*]
 || [istarget mips-sde-elf]
 || [istarget sparc*-*-*]
 || [istarget ia64-*-*]
 || [check_effective_target_arm_vect_no_misalign]
 || ([istarget powerpc*-*-*] && [check_p8vector_hw_available])

I'll fix this in PR113535, which was filed previously to revisit the
powerpc-specific checks in these vect* effective targets.  If the testing
just uses the native cpu type, this issue doesn't show up.  I think you can
still push the patch, as the testing just exposes another issue.
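
To make the mismatch concrete, here is a small model of the selector,
covering only the powerpc arm of vect_no_align from the excerpt above.
The struct and function names are hypothetical; the point is just that
the cpu selected for the test never enters the decision.

```c
#include <stdbool.h>

/* Hypothetical description of one test run; field names are illustrative. */
struct test_run
{
  bool target_is_powerpc;          /* istarget powerpc*-*-* */
  bool machine_runs_p8vector;      /* result of check_p8vector_hw_available */
  bool vector_alignment_reachable; /* the effective target of that name */
  int mcpu_under_test;             /* e.g. 7 for -mcpu=power7; never consulted */
};

/* Only the powerpc arm of vect_no_align; the other targets are omitted.  */
bool
vect_no_align (const struct test_run *run)
{
  return run->target_is_powerpc && run->machine_runs_p8vector;
}

/* Models the selector "vect_no_align || {! vector_alignment_reachable}".  */
bool
expects_vectorized_loop (const struct test_run *run)
{
  return vect_no_align (run) || !run->vector_alignment_reachable;
}
```

With -mcpu=power7 tested on a machine that can run p8vector code, the
model expects "vectorized 1 loops" even though the cpu under test is
Power7.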

BR,
Kewen



Re: [PATCH] ppc: testsuite: pr79004 needs -mlong-double-128

2024-04-28 Thread Kewen.Lin
Hi,

on 2024/4/28 16:20, Alexandre Oliva wrote:
> On Apr 23, 2024, "Kewen.Lin"  wrote:
> 
>> This patch seemed to miss to CC gcc-patches list. :)
> 
> Oops, sorry, thanks for catching that.
> 
> Here it is.  FTR, you've already responded suggesting an apparent
> preference for addressing PR105359, but since I meant to contribute it,
> I'm reposting it to gcc-patches, now with a reference to the PR.

OK, from this perspective IMHO it seems more clear to adopt xfail
with effective target long_double_64bit?

BR,
Kewen

> 
> 
> ppc: testsuite: pr79004 needs -mlong-double-128
> 
> Some of the asm opcodes expected by pr79004 depend on
> -mlong-double-128 to be output.  E.g., without this flag, the
> conditions of patterns @extenddf2 and extendsf2 do not
> hold, and so GCC resorts to libcalls instead of even trying
> rs6000_expand_float128_convert.
> 
> Perhaps the conditions are too strict, and they could enable the use
> of conversion insns involving __ieee128/_Float128 even with 64-bit
> long doubles.  Alas, for now, we need this flag for the test to pass
> on target variants that use 64-bit long doubles.
> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   * gcc.target/powerpc/pr79004.c: Add -mlong-double-128.
> ---
>  gcc/testsuite/gcc.target/powerpc/pr79004.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr79004.c 
> b/gcc/testsuite/gcc.target/powerpc/pr79004.c
> index e411702dc98a9..061a0e83fe2ad 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr79004.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr79004.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
>  /* { dg-require-effective-target powerpc_p9vector_ok } */
> -/* { dg-options "-mdejagnu-cpu=power9 -O2 -mfloat128" } */
> +/* { dg-options "-mdejagnu-cpu=power9 -O2 -mfloat128 -mlong-double-128" } */
>  /* { dg-prune-output ".-mfloat128. option may not be fully supported" } */
>  
>  #include 
> 
> 





Re: [PATCH] adjust vectorization expectations for ppc costmodel 76b

2024-04-28 Thread Kewen.Lin
Hi,

on 2024/4/28 16:14, Alexandre Oliva wrote:
> On Apr 24, 2024, "Kewen.Lin"  wrote:
> 
>> For !has_arch_pwr7 case, it still adopts peeling but as the comment (one 
>> line above)
>> shows the original intention of this case is to expect not profitable for 
>> peeling
>> so it's not expected to be handled here, can we just tweak the loop bound 
>> instead,
>> such as:
> 
>> -#define N 14
>> +#define N 13
>>  #define OFF 4 
> 
>> ?, it can make this loop not profitable to be vectorized for !vect_no_align 
>> with
>> peeling (both pwr7 and pwr6) and keep consistent.
> 
> Like this?  I didn't feel I could claim authorship of this one-liner
> just because I turned it into a patch and tested it, so I took the
> liberty of turning your own words above into the commit message.  So

Feel free to do so!

> far, tested on ppc64le-linux-gnu (ppc9).  Testing with vxworks targets
> now.  Would you like to tweak the commit message to your liking?

OK, tweaked as below.

> Otherwise, is this ok to install?
> 
> Thanks,
> 
> 
> adjust iteration count for ppc costmodel 76b

Nit: Maybe add a prefix "testsuite: ".

> 
> From: Kewen Lin 

Thanks, you can just drop this.  :)

> 
> The original intention of this case is to expect not profitable for
> peeling.  Tweak the loop bound to make this loop not profitable to be
> vectorized for !vect_no_align with peeling (both pwr7 and pwr6) and
> keep consistent.

On hardware that doesn't support unaligned vector memory access, test
case costmodel-vect-76b.c expects the cost model to decide that peeling
is not profitable, judging by the commit history, the test case comments
and the way it checks.

For now, the existing loop bound 14 works well for Power7, but it does
not for targets on which the cost of operation vec_perm differs from
Power7, such as Power6: the cost is 1 on Power6 vs. 3 on Power7.  This
in turn changes the minimum iteration count for profitability (10 on
Power6 vs. 12 on Power7) and causes the failure.  To keep the original
test point, this patch tweaks the loop bound to ensure it's not
profitable to vectorize for !vect_no_align with peeling.
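
The arithmetic behind the tweak can be sketched as below.  This is a
simplification, not the real cost model: it assumes the loop runs from
OFF up to N (so N - OFF iterations) and that vectorization is deemed
profitable once the iteration count reaches the minimum, using the 10
(Power6) and 12 (Power7) figures quoted in this thread.  The function
names are hypothetical.

```c
#include <stdbool.h>

#define OFF 4

/* Minimum iteration counts for profitability as quoted in this thread.  */
enum { MIN_ITERS_POWER6 = 10, MIN_ITERS_POWER7 = 12 };

/* Assumed loop shape: for (i = OFF; i < n; i++).  */
int
loop_iterations (int n)
{
  return n - OFF;
}

/* Assumption: profitable when the iteration count reaches the minimum.  */
bool
deemed_profitable (int n, int min_iters)
{
  return loop_iterations (n) >= min_iters;
}
```

Under these assumptions N = 14 gives 10 iterations, which reaches the
Power6 minimum but not the Power7 one, while N = 13 gives 9 iterations
and stays below both.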

OK for trunk (assuming the testings run well on p6/p7 too), thanks!

BR,
Kewen

> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c (N): Tweak.
> ---
>  .../gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c 
> b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> index cbbfbb24658f8..e48b0ab759e75 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> @@ -6,7 +6,7 @@
>  
>  /* On Power7 without misalign vector support, this case is to check it's not
> profitable to perform vectorization by peeling to align the store.  */
> -#define N 14
> +#define N 13
>  #define OFF 4
>  
>  /* Check handling of accesses for which the "initial condition" -
> 
> 



Re: [PATCH] adjust vectorization expectations for ppc costmodel 76b

2024-04-24 Thread Kewen.Lin
Hi,

on 2024/4/22 17:28, Alexandre Oliva wrote:
> Ping?
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566525.html
> 
> 
> This test expects vectorization at power8+ because strict alignment is
> not required for vectors.  For power7, vectorization is not to take
> place because it's not deemed profitable: 12 iterations would be
> required to make it so.
> 
> But for power6 and below, the test's 10 iterations are enough to make
> vectorization profitable, but the test doesn't expect this.  Assuming
> the decision is indeed appropriate, I'm adjusting the expectations.

For the record, the cost difference between power6 and power7 comes from
the cost of vec_perm:

* p6 *

ic[i_23] 2 times vector_stmt costs 2 in prologue
ic[i_23] 1 times vector_stmt costs 1 in prologue
ic[i_23] 1 times vector_load costs 2 in body
ic[i_23] 1 times vec_perm costs 1 in body

vs.

* p7 *

ic[i_23] 2 times vector_stmt costs 2 in prologue
ic[i_23] 1 times vector_stmt costs 1 in prologue
ic[i_23] 1 times vector_load costs 2 in body
ic[i_23] 1 times vec_perm costs 3 in body

This further causes the difference in the minimum iterations for profitability.

> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c: Adjust
>   expectations for cpus below power7.
> ---
>  .../gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c |9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c 
> b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> index cbbfbb24658f8..0dab2c08acdb4 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-76b.c
> @@ -46,9 +46,10 @@ int main (void)
>return 0;
>  }
>  
> -/* Peeling to align the store is used. Overhead of peeling is too high.  */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target 
> { vector_alignment_reachable && {! vect_no_align} } } } } */
> -/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" 
> { target { vector_alignment_reachable && {! vect_hw_misalign} } } } } */
> +/* Peeling to align the store is used. Overhead of peeling is too high
> +   for power7, but acceptable for earlier architectures.  */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { target 
> { has_arch_pwr7 && { vector_alignment_reachable && {! vect_no_align} } } } } 
> } */
> +/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" 
> { target { has_arch_pwr7 && { vector_alignment_reachable && {! 
> vect_hw_misalign} } } } } } */
>  
>  /* Versioning to align the store is used. Overhead of versioning is not too 
> high.  */
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
> { vect_no_align || {! vector_alignment_reachable} } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
> { vect_no_align || { {! vector_alignment_reachable} || {! has_arch_pwr7 } } } 
> } } } */

For the !has_arch_pwr7 case, it still adopts peeling, but as the comment (one
line above) shows, the original intention of this case is to expect peeling
not to be profitable, so it's not expected to be handled here.  Can we just
tweak the loop bound instead, such as:

-#define N 14
+#define N 13
 #define OFF 4 

?  That makes this loop not profitable to vectorize for !vect_no_align with
peeling (on both pwr7 and pwr6) and keeps things consistent.

BR,
Kewen

> 
> 



Re: [PATCH v2] [testsuite] require sqrt_insn effective target where needed

2024-04-23 Thread Kewen.Lin
Hi,

on 2024/4/22 17:56, Alexandre Oliva wrote:
> This patch takes feedback received for 3 earlier patches, and adopts a
> simpler approach to skip the still-failing tests, that I believe to be
> in line with ppc maintainers' expressed preferences.
> https://gcc.gnu.org/pipermail/gcc-patches/2021-February/565939.html
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566617.html
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566521.html
> Ping?-ish :-)
> 
> 
> Some tests fail on ppc and ppc64 when testing a compiler [with options
> for] for a CPU [emulator] that doesn't support the sqrt insn.
> 
> The gcc.dg/cdce3.c is one in which the expected shrink-wrap
> optimization only takes place when the target CPU supports a sqrt
> insn.
> 
> The gcc.target/powerpc/pr46728-1[0-4].c tests use -mpowerpc-gpopt and
> call sqrt(), which involves the sqrt insn that the target CPU under
> test may not support.
> 
> Require a sqrt_insn effective target for all the affected tests.
> 
> Regstrapped on x86_64-linux-gnu and ppc64el-linux-gnu.  Also testing
> with gcc-13 on ppc64-vx7r2 and ppc-vx7r2.  Ok to install?
> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   * gcc.dg/cdce3.c: Require sqrt_insn effective target.
>   * gcc.target/powerpc/pr46728-10.c: Likewise.
>   * gcc.target/powerpc/pr46728-11.c: Likewise.
>   * gcc.target/powerpc/pr46728-13.c: Likewise.
>   * gcc.target/powerpc/pr46728-14.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/cdce3.c  |3 ++-
>  gcc/testsuite/gcc.target/powerpc/pr46728-10.c |1 +
>  gcc/testsuite/gcc.target/powerpc/pr46728-11.c |1 +
>  gcc/testsuite/gcc.target/powerpc/pr46728-13.c |1 +
>  gcc/testsuite/gcc.target/powerpc/pr46728-14.c |1 +
>  5 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/cdce3.c b/gcc/testsuite/gcc.dg/cdce3.c
> index 601ddf055fd71..f759a95972e8b 100644
> --- a/gcc/testsuite/gcc.dg/cdce3.c
> +++ b/gcc/testsuite/gcc.dg/cdce3.c
> @@ -1,7 +1,8 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target hard_float } */
> +/* { dg-require-effective-target sqrt_insn } */
>  /* { dg-options "-O2 -fmath-errno -fdump-tree-cdce-details 
> -fdump-tree-optimized" } */
> -/* { dg-final { scan-tree-dump "cdce3.c:11: \[^\n\r]* function call is 
> shrink-wrapped into error conditions\." "cdce" } } */
> +/* { dg-final { scan-tree-dump "cdce3.c:12: \[^\n\r]* function call is 
> shrink-wrapped into error conditions\." "cdce" } } */
>  /* { dg-final { scan-tree-dump "sqrtf \\(\[^\n\r]*\\); \\\[tail call\\\]" 
> "optimized" } } */
>  /* { dg-skip-if "doesn't have a sqrtf insn" { mmix-*-* } } */
> 

This change needs approval from a global maintainer, as it touches a generic
test case?

> diff --git a/gcc/testsuite/gcc.target/powerpc/pr46728-10.c 
> b/gcc/testsuite/gcc.target/powerpc/pr46728-10.c
> index 3be4728d333a4..7e9bb638106c2 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr46728-10.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr46728-10.c
> @@ -1,6 +1,7 @@
>  /* { dg-do run } */
>  /* { dg-skip-if "-mpowerpc-gpopt not supported" { powerpc*-*-darwin* } } */
>  /* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm 
> -mpowerpc-gpopt" } */
> +/* { dg-require-effective-target sqrt_insn } */

This change looks sensible to me.

Nit: With the proposed change, I'd expect that we can remove the line for
powerpc*-*-darwin*.

CC Iain to confirm.

BR,
Kewen

> 
>  #include 
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr46728-11.c 
> b/gcc/testsuite/gcc.target/powerpc/pr46728-11.c
> index 43b6728a4b812..5bfa25925675a 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr46728-11.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr46728-11.c
> @@ -1,6 +1,7 @@
>  /* { dg-do run } */
>  /* { dg-skip-if "-mpowerpc-gpopt not supported" { powerpc*-*-darwin* } } */
>  /* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm 
> -mpowerpc-gpopt" } */
> +/* { dg-require-effective-target sqrt_insn } */
> 
>  #include 
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr46728-13.c 
> b/gcc/testsuite/gcc.target/powerpc/pr46728-13.c
> index b9fd63973b728..b66d0209a5e54 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr46728-13.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr46728-13.c
> @@ -1,6 +1,7 @@
>  /* { dg-do run } */
>  /* { dg-skip-if "-mpowerpc-gpopt not supported" { powerpc*-*-darwin* } } */
>  /* { dg-options "-O2 -ffast-math -fno-inline -fno-unroll-loops -lm 
> -mpowerpc-gpopt" } */
> +/* { dg-require-effective-target sqrt_insn } */
> 
>  #include 
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr46728-14.c 
> b/gcc/testsuite/gcc.target/powerpc/pr46728-14.c
> index 5a13bdb6c..71a1a70c4e7a2 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr46728-14.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr46728-14.c
> @@ -1,6 +1,7 @@
>  /* { dg-do run } */
>  /* { dg-skip-if "-mpowerpc-gpopt not supported" { powerpc*-*-darwin* } } */
>  /* { dg-options "-O2 -ffast-math -fno-inline 

Re: [PATCH v2] xfail fetestexcept test - ppc always uses fcmpu

2024-04-23 Thread Kewen.Lin
Hi,

on 2024/4/22 18:00, Alexandre Oliva wrote:
> On Mar 10, 2021, Joseph Myers  wrote:
> 
>> On Wed, 10 Mar 2021, Alexandre Oliva wrote:
>>> operand exception for quiet NaN.  I couldn't find any evidence that
>>> the rs6000 backend ever outputs fcmpo.  Therefore, I'm adding the same
>>> execution xfail marker to this test.
> 
>> In my view, such an XFAIL (for a GCC bug as opposed to an environmental 
>> issue) should have a comment pointing to a corresponding open bug in GCC 
>> Bugzilla.  In this case, that's bug 58684.
> 
> Thanks for the suggestion, yeah, that makes sense.  Fixed in v2 below.
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566523.html
> Ping?-ish
> 
> 
> gcc.dg/torture/pr91323.c tests that a compare with NaNf doesn't set an
> exception using builtin compare intrinsics, and that it does when
> using regular compare operators.
> 
> That doesn't seem to be expected to work on powerpc targets.  It fails
> on GNU/Linux, it's marked to be skipped on AIX, and a similar test,
> gcc.dg/torture/pr93133.c, has the execution test xfailed for all of
> powerpc*-*-*.
> 
> In this test, the functions that use intrinsics for the compare end up
> with the same code as the one that uses compare operators, using
> fcmpu, a floating compare that, unlike fcmpo, does not set the invalid
> operand exception for quiet NaN.  I couldn't find any evidence that
> the rs6000 backend ever outputs fcmpo.  Therefore, I'm adding the same
> execution xfail marker to this test.
> 
> Regstrapped on x86_64-linux-gnu and ppc64el-linux-gnu.  Also tested with
> gcc-13 on ppc64-vx7r2 and ppc-vx7r2.  Ok to install?
> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   PR target/58684
>   * gcc.dg/torture/pr91323.c: Expect execution fail on
>   powerpc*-*-*.
> ---
>  gcc/testsuite/gcc.dg/torture/pr91323.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/torture/pr91323.c 
> b/gcc/testsuite/gcc.dg/torture/pr91323.c
> index 1411fcaa3966c..f188faa3ccf47 100644
> --- a/gcc/testsuite/gcc.dg/torture/pr91323.c
> +++ b/gcc/testsuite/gcc.dg/torture/pr91323.c
> @@ -1,4 +1,5 @@
> -/* { dg-do run } */
> +/* { dg-do run { xfail powerpc*-*-* } } */
> +/* The ppc xfail is because of PR target/58684.  */

OK, though the proposed comment is slightly different from what's in
the related commit r8-6445-g86145a19abf39f. :)  Thanks!

BR,
Kewen

>  /* { dg-add-options ieee } */
>  /* { dg-require-effective-target fenv_exceptions } */
>  /* { dg-skip-if "fenv" { powerpc-ibm-aix* } } */
> 
> 



Re: [PATCH] ppc: testsuite: vec-mul requires vsx runtime

2024-04-23 Thread Kewen.Lin
on 2024/4/22 17:35, Alexandre Oliva wrote:
> Ping?
> https://gcc.gnu.org/pipermail/gcc-patches/2022-May/593947.html
> 
> 
> vec-mul is an execution test, but it only requires a powerpc_vsx_ok
> effective target, which is enough only for compile tests.  In order to
> check for runtime and execution environment support, we need to
> require vsx_hw.  Make that a condition for execution, but still
> perform a compile test if the condition is not satisfied.
> 
> Regstrapped on x86_64-linux-gnu and ppc64el-linux-gnu.  Also tested with
> gcc-13 on ppc64-vx7r2 and ppc-vx7r2.  Ok to install?
> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   * gcc.target/powerpc/vec-mul.c: Run on target vsx_hw, just
>   compile otherwise.
> ---
>  gcc/testsuite/gcc.target/powerpc/vec-mul.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-mul.c 
> b/gcc/testsuite/gcc.target/powerpc/vec-mul.c
> index bfcaf80719d1d..11da86159723f 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vec-mul.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-mul.c
> @@ -1,4 +1,5 @@
> -/* { dg-do run } */
> +/* { dg-do compile { target { ! vsx_hw } } } */
> +/* { dg-do run { target vsx_hw } } */
>  /* { dg-require-effective-target powerpc_vsx_ok } */

Nit: Checking powerpc_vsx_ok is redundant once vsx_hw holds, so the
powerpc_vsx_ok check can be moved to go with ! vsx_hw.

OK with this nit tweaked, thanks!
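
One way the header might look with the nit applied, as a sketch only: it
assumes the selector accepts the "&&" combination shown in the first
directive, and it is not taken from the committed change.

```c
/* { dg-do compile { target { { ! vsx_hw } && powerpc_vsx_ok } } } */
/* { dg-do run { target vsx_hw } } */
/* { dg-options "-mvsx -O3" } */
```

The separate dg-require-effective-target line for powerpc_vsx_ok would
then no longer be needed.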

BR,
Kewen

>  /* { dg-options "-mvsx -O3" } */
> 
> 
> 


Re: [PATCH] Request check for hw support in ppc run tests with -maltivec/-mvsx

2024-04-23 Thread Kewen.Lin
on 2024/4/22 17:31, Alexandre Oliva wrote:
> 
> From: Olivier Hainque 
> 
> Regstrapped on x86_64-linux-gnu and ppc64el-linux-gnu.  Also tested with
> gcc-13 on ppc64-vx7r2 and ppc-vx7r2.  Ok to install?

OK, thanks!

BR,
Kewen

> 
> for  gcc/testsuite/ChangeLog
> 
>   * gcc.target/powerpc/swaps-p8-20.c: Change powerpc_altivec_ok
>   require-effective-target test into vmx_hw.
>   * gcc.target/powerpc/vsx-vector-5.c: Change powerpc_vsx_ok
>   require-effective-target test into vsx_hw.
> ---
>  gcc/testsuite/gcc.target/powerpc/swaps-p8-20.c  |2 +-
>  gcc/testsuite/gcc.target/powerpc/vsx-vector-5.c |5 +
>  2 files changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/swaps-p8-20.c 
> b/gcc/testsuite/gcc.target/powerpc/swaps-p8-20.c
> index 564e8acb1f421..755519bfe847d 100644
> --- a/gcc/testsuite/gcc.target/powerpc/swaps-p8-20.c
> +++ b/gcc/testsuite/gcc.target/powerpc/swaps-p8-20.c
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-require-effective-target vmx_hw } */
>  /* { dg-options "-O2 -mdejagnu-cpu=power8 -maltivec" } */
>  
>  /* The expansion for vector character multiply introduces a vperm operation.
> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-5.c 
> b/gcc/testsuite/gcc.target/powerpc/vsx-vector-5.c
> index dcc88b1f3a4c6..37a324b6f897d 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vsx-vector-5.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-5.c
> @@ -1,11 +1,8 @@
>  /* { dg-do run { target lp64 } } */
>  /* { dg-skip-if "" { powerpc*-*-darwin* } } */
> -/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-require-effective-target vsx_hw } */
>  /* { dg-options "-mvsx -O2" } */
>  
> -/* This will run, and someday we should add the support to test whether we 
> are
> -   running on VSX hardware.  */
> -
>  #include 
>  #include 
>  
> 



Re: [PATCH] disable ldist for test, to restore vectorizing-candidate loop

2024-04-23 Thread Kewen.Lin
on 2024/4/22 17:27, Alexandre Oliva wrote:
> Ping?
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566524.html
> 
> The loop we're supposed to try to vectorize in
> gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c is turned into a memset
> before the vectorizer runs.
> 
> Various other tests in this set have already run into this, and the
> solution has been to disable this loop distribution transformation,
> enabled at -O2, so that the vectorizer gets a chance to transform the
> loop and, in this testcase, fail to do so.
> 
> Regstrapped on x86_64-linux-gnu and ppc64el-linux-gnu.  Also tested with
> gcc-13 on ppc64-vx7r2 and ppc-vx7r2.  Ok to install?

OK, thanks!

BR,
Kewen

> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c: Disable
>   ldist.
> ---
>  .../gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c |1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c 
> b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c
> index 454a714a30916..90b5d5a7f400b 100644
> --- a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c
> +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-31a.c
> @@ -1,4 +1,5 @@
>  /* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-fno-tree-loop-distribute-patterns" } */
> 
>  #include 
>  #include "../../tree-vect.h"
> 




Re: [PATCH] [testsuite] [ppc64] expect error on vxworks too

2024-04-23 Thread Kewen.Lin
on 2024/4/22 17:23, Alexandre Oliva wrote:
> 
> These ppc lp64 tests check for errors or warnings on -mno-powerpc64.
> On powerpc64-*-vxworks* we get the same errors as on most other
> covered platforms, but the tests did not mark them as expected for
> this target.  On powerpc-*-vxworks*, the tests are skipped because
> lp64 is not satisfied, so I'm naming powerpc*-*-vxworks* rather than
> something more specific.
> 
> Regstrapped on x86_64-linux-gnu and ppc64el-linux-gnu.  Also tested with
> gcc-13 on ppc64-vx7r2 and ppc-vx7r2.  Ok to install?

OK, thanks!

BR,
Kewen

> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   * gcc.target/powerpc/pr106680-1.c: Error on vxworks too.
>   * gcc.target/powerpc/pr106680-2.c: Likewise.
>   * gcc.target/powerpc/pr106680-3.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/powerpc/pr106680-1.c |2 +-
>  gcc/testsuite/gcc.target/powerpc/pr106680-2.c |2 +-
>  gcc/testsuite/gcc.target/powerpc/pr106680-3.c |2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106680-1.c 
> b/gcc/testsuite/gcc.target/powerpc/pr106680-1.c
> index d624d43230a7a..aadaa614cfeba 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr106680-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106680-1.c
> @@ -8,6 +8,6 @@ int foo ()
>return 1;
>  }
> 
> -/* { dg-error "'-m64' requires a PowerPC64 cpu" "PR106680" { target 
> powerpc*-*-linux* powerpc*-*-freebsd* powerpc-*-rtems* } 0 } */
> +/* { dg-error "'-m64' requires a PowerPC64 cpu" "PR106680" { target 
> powerpc*-*-linux* powerpc*-*-freebsd* powerpc-*-rtems* powerpc*-*-vxworks* } 
> 0 } */
>  /* { dg-warning "'-m64' requires PowerPC64 architecture, enabling" 
> "PR106680" { target powerpc*-*-darwin* } 0 } */
>  /* { dg-warning "'-maix64' requires PowerPC64 architecture remain enabled" 
> "PR106680" { target powerpc*-*-aix* } 0 } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106680-2.c 
> b/gcc/testsuite/gcc.target/powerpc/pr106680-2.c
> index a9ed73726ef0c..f0758e303350a 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr106680-2.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106680-2.c
> @@ -9,6 +9,6 @@ int foo ()
>return 1;
>  }
> 
> -/* { dg-error "'-m64' requires a PowerPC64 cpu" "PR106680" { target 
> powerpc*-*-linux* powerpc*-*-freebsd* powerpc-*-rtems* } 0 } */
> +/* { dg-error "'-m64' requires a PowerPC64 cpu" "PR106680" { target 
> powerpc*-*-linux* powerpc*-*-freebsd* powerpc-*-rtems* powerpc*-*-vxworks* } 
> 0 } */
>  /* { dg-warning "'-m64' requires PowerPC64 architecture, enabling" 
> "PR106680" { target powerpc*-*-darwin* } 0 } */
>  /* { dg-warning "'-maix64' requires PowerPC64 architecture remain enabled" 
> "PR106680" { target powerpc*-*-aix* } 0 } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106680-3.c 
> b/gcc/testsuite/gcc.target/powerpc/pr106680-3.c
> index b642d5c7a008d..bca012e2cf663 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr106680-3.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106680-3.c
> @@ -8,6 +8,6 @@ int foo ()
>return 1;
>  }
> 
> -/* { dg-error "'-m64' requires a PowerPC64 cpu" "PR106680" { target 
> powerpc*-*-linux* powerpc*-*-freebsd* powerpc-*-rtems* } 0 } */
> +/* { dg-error "'-m64' requires a PowerPC64 cpu" "PR106680" { target 
> powerpc*-*-linux* powerpc*-*-freebsd* powerpc-*-rtems* powerpc*-*-vxworks* } 
> 0 } */
>  /* { dg-warning "'-m64' requires PowerPC64 architecture, enabling" 
> "PR106680" { target powerpc*-*-darwin* } 0 } */
>  /* { dg-warning "'-maix64' requires PowerPC64 architecture remain enabled" 
> "PR106680" { target powerpc*-*-aix* } 0 } */
> 


