Re: [PATCH] testsuite: Adjust fam-in-union-alone-in-struct-2.c to support BE [PR116148]

2024-08-01 Thread Kewen.Lin
on 2024/8/1 15:04, Richard Biener wrote:
> On Wed, Jul 31, 2024 at 9:00 PM Qing Zhao  wrote:
>>
>> Hi, Kewen,
>>
>> Thanks a lot for fixing this testing case issue.
>> Yes, the change LGTM though I can’t approve it.
> 
> OK.

Thanks to all, pushed as r15-2658.

BR,
Kewen

> 
> Richard.
> 
>> Qing
>>
>>> On Jul 31, 2024, at 05:22, Kewen.Lin  wrote:
>>>
>>> Hi,
>>>
>>> As Andrew pointed out in PR116148, fam-in-union-alone-in-struct-2.c
>>> was designed for little-endian.  The recent commit r15-2403 caused it
>>> to be run on BE as well, which exposed PR116148.
>>>
>>> This patch adjusts the expected data for members in with_fam_2_v
>>> and with_fam_3_v according to endianness, and also updates
>>> with_fam_3_v.b[1] from 0x5f6f7f7f to 0x5f6f7f8f to avoid two "7f"s.
>>>
>>> Tested on powerpc64-linux-gnu P8/P9 and powerpc64le-linux-gnu P9/P10.
>>>
>>> Is it ok for trunk?
>>>
>>> BR,
>>> Kewen
>>> -
>>>   PR testsuite/116148
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>   * c-c++-common/fam-in-union-alone-in-struct-2.c: Define macros
>>>   WITH_FAM_2_V_B[03] and WITH_FAM_3_V_A[07] as endianness, update the
>>>   checking with these macros and initialize with_fam_3_v.b[1] with
>>>   0x5f6f7f8f instead of 0x5f6f7f7f.
>>> ---
>>> .../fam-in-union-alone-in-struct-2.c  | 22 ++-
>>> 1 file changed, 17 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c b/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c
>>> index 93f9d5128f6..7845a7fbab3 100644
>>> --- a/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c
>>> +++ b/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c
>>> @@ -16,7 +16,7 @@ union with_fam_2 {
>>> union with_fam_3 {
>>>   char a[];
>>>   int b[];
>>> -} with_fam_3_v = {.b = {0x1f2f3f4f, 0x5f6f7f7f}};
>>> +} with_fam_3_v = {.b = {0x1f2f3f4f, 0x5f6f7f8f}};
>>>
>>> struct only_fam {
>>>   int b[];
>>> @@ -28,16 +28,28 @@ struct only_fam_2 {
>>>   int b[];
>>> } only_fam_2_v = {{7, 11}};
>>>
>>> +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
>>> +#define WITH_FAM_2_V_B0 0x4f
>>> +#define WITH_FAM_2_V_B3 0x1f
>>> +#define WITH_FAM_3_V_A0 0x4f
>>> +#define WITH_FAM_3_V_A7 0x5f
>>> +#else
>>> +#define WITH_FAM_2_V_B0 0x1f
>>> +#define WITH_FAM_2_V_B3 0x4f
>>> +#define WITH_FAM_3_V_A0 0x1f
>>> +#define WITH_FAM_3_V_A7 0x8f
>>> +#endif
>>> +
>>> int main ()
>>> {
>>>   if (with_fam_1_v.b[3] != 4
>>>   || with_fam_1_v.b[0] != 1)
>>> __builtin_abort ();
>>> -  if (with_fam_2_v.b[3] != 0x1f
>>> -  || with_fam_2_v.b[0] != 0x4f)
>>> +  if (with_fam_2_v.b[3] != WITH_FAM_2_V_B3
>>> +  || with_fam_2_v.b[0] != WITH_FAM_2_V_B0)
>>> __builtin_abort ();
>>> -  if (with_fam_3_v.a[0] != 0x4f
>>> -  || with_fam_3_v.a[7] != 0x5f)
>>> +  if (with_fam_3_v.a[0] != WITH_FAM_3_V_A0
>>> +  || with_fam_3_v.a[7] != WITH_FAM_3_V_A7)
>>> __builtin_abort ();
>>>   if (only_fam_v.b[0] != 7
>>>   || only_fam_v.b[1] != 11)
>>> --
>>> 2.45.2
>>



Re: [PATCH] rs6000, document built-ins vec_test_lsbb_all_ones and, vec_test_lsbb_all_zeros

2024-07-31 Thread Kewen.Lin
on 2024/8/1 01:52, Carl Love wrote:
> Kewen:
> 
> On 7/31/24 2:12 AM, Kewen.Lin wrote:
>> Hi Carl,
>>
>> on 2024/7/27 06:56, Carl Love wrote:
>>> GCC maintainers:
>>>
>>> Per a report from a user, the existing vec_test_lsbb_all_ones and
>>> vec_test_lsbb_all_zeros built-ins are not documented in the GCC
>>> documentation file.
>>>
>>> The following patch adds missing documentation for the
>>> vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins.
>>>
>>> Please let me know if the patch is acceptable for mainline.  Thanks.
>>>
>>>  Carl
>>>
>>> ---
>>> rs6000, document built-ins vec_test_lsbb_all_ones and 
>>> vec_test_lsbb_all_zeros
>>>
>>> Add documentation for the Power 10 built-ins vec_test_lsbb_all_ones
>>> and vec_test_lsbb_all_zeros.  The vec_test_lsbb_all_ones built-in
>>> returns 1 if the least significant bit in each byte is a 1, returns
>>> 0 otherwise.  Similarly, vec_test_lsbb_all_zeros returns a 1 if
>>> the least significant bit in each byte is a zero and 0 otherwise.
>>>
>>> The test cases for the built-ins are in files:
>>>    gcc/testsuite/gcc.target/powerpc/lsbb.c
>>>    gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c
>>>
>>>
>>> gcc/ChangeLog:
>>>      * doc/extend.texi (vec_test_lsbb_all_ones,
>>>      vec_test_lsbb_all_zeros): Add documentation for the
>>>      existing built-ins.
>>> ---
>>>   gcc/doc/extend.texi | 15 +++
>>>   1 file changed, 15 insertions(+)
>>>
>>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>>> index 83ff168faf6..96e41c9a905 100644
>>> --- a/gcc/doc/extend.texi
>>> +++ b/gcc/doc/extend.texi
>>> @@ -23240,6 +23240,21 @@ signed long long will sign extend the rightmost byte of each doubleword.
>>>   The following additional built-in functions are also available for the
>>>   PowerPC family of processors, starting with ISA 3.1 (@option{-mcpu=power10}):
>>>
>>> +@smallexample
>>> +@exdent int vec_test_lsbb_all_ones (vector char);
>> I think we need to specify "unsigned" char explicitly since we don't actually
>> allow vector "signed" char as the below testing shows:
>>
>> int foo11 (vector signed char va)
>> {
>>    return vec_test_lsbb_all_ones (va);
>> }
>>
>> :17:3: error: invalid parameter combination for AltiVec intrinsic '__builtin_vec_xvtlsbb_all_ones'
>>     17 |   return vec_test_lsbb_all_ones (va);
>>
>>
>> Now we make these two bifs as overload, but there is only one instance respectively,
> Yes, I noticed that the built-ins were defined as overloaded but only had one 
> definition.   Did seem odd to me.
> 
>> either is with "vector unsigned char" as argument type, but the corresponding
>> instance prototype in builtin table is with "vector signed char".  It's
>> inconsistent and weird, I think we can just update the prototype in builtin
>> table with "vector unsigned char" and remove the entries in overload table.
>> It can be a follow up patch.
> 
> I didn't notice that it was signed in the instance prototype but unsigned in 
> the overloaded definition.  That is definitely inconsistent.
> 
> That said, should we just go ahead and support both signed and unsigned 
> argument versions of the all ones and all zeros built-ins?

Good question.  I thought about that but found that openxl only supports the
unsigned version, so I felt it's probably better to stay consistent with it.
But I'm fine with either; if we decide to extend it to cover both signed and
unsigned, we should notify the openxl team to extend it as well.

openxl doc links:

https://www.ibm.com/docs/en/openxl-c-and-cpp-aix/17.1.2?topic=functions-vec-test-lsbb-all-ones
https://www.ibm.com/docs/en/openxl-c-and-cpp-aix/17.1.2?topic=functions-vec-test-lsbb-all-zeros

BR,
Kewen

> 
> For example
> 
> [VEC_TEST_LSBB_ALL_ONES, vec_test_lsbb_all_ones, __builtin_vec_xvtlsbb_all_ones]
>   signed int __builtin_vec_xvtlsbb_all_ones (vsc);
>     XVTLSBB_ONES   LSBB_ALL_ONES_VSC
>   signed int __builtin_vec_xvtlsbb_all_ones (vuc);
>     XVTLSBB_ONES   LSBB_ALL_ONES_VUC
> 
> I tried this with the testcase, I borrowed from you and extended:
> 
> int foo11 (vector char va) <- 
> compi

[PATCH] testsuite: Adjust fam-in-union-alone-in-struct-2.c to support BE [PR116148]

2024-07-31 Thread Kewen.Lin
Hi,

As Andrew pointed out in PR116148, fam-in-union-alone-in-struct-2.c
was designed for little-endian.  The recent commit r15-2403 caused it
to be run on BE as well, which exposed PR116148.

This patch adjusts the expected data for members in with_fam_2_v
and with_fam_3_v according to endianness, and also updates
with_fam_3_v.b[1] from 0x5f6f7f7f to 0x5f6f7f8f to avoid two "7f"s.
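
To illustrate why the expected bytes depend on endianness, here is a minimal
sketch that is not part of the patch.  It copies the same two int values into
a byte buffer and compares bytes 0, 3 and 7 with the values used by the macros
in the diff below.  The helper names byte_at and bytes_match_endianness are
hypothetical, and the sketch assumes a 4-byte int and a compiler that
predefines __BYTE_ORDER__ (as GCC does).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: return byte IDX of the
   object representation of the two-int array V.  */
static unsigned char
byte_at (const int v[2], unsigned int idx)
{
  unsigned char bytes[2 * sizeof (int)];

  assert (sizeof (int) == 4);   /* Assumption: 4-byte int.  */
  assert (idx < sizeof bytes);
  memcpy (bytes, v, sizeof bytes);
  return bytes[idx];
}

/* Return 1 if bytes 0, 3 and 7 match what the patch expects for the
   endianness of the current target, 0 otherwise.  */
int
bytes_match_endianness (void)
{
  const int v[2] = { 0x1f2f3f4f, 0x5f6f7f8f };

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  /* The least significant byte of each int is stored first.  */
  return byte_at (v, 0) == 0x4f
         && byte_at (v, 3) == 0x1f
         && byte_at (v, 7) == 0x5f;
#else
  /* The most significant byte of each int is stored first.  */
  return byte_at (v, 0) == 0x1f
         && byte_at (v, 3) == 0x4f
         && byte_at (v, 7) == 0x8f;
#endif
}
```

On either kind of target bytes_match_endianness () is expected to return 1,
which mirrors the conditional macro definitions in the patch.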

Tested on powerpc64-linux-gnu P8/P9 and powerpc64le-linux-gnu P9/P10.

Is it ok for trunk?

BR,
Kewen
-
PR testsuite/116148

gcc/testsuite/ChangeLog:

* c-c++-common/fam-in-union-alone-in-struct-2.c: Define macros
WITH_FAM_2_V_B[03] and WITH_FAM_3_V_A[07] as endianness, update the
checking with these macros and initialize with_fam_3_v.b[1] with
0x5f6f7f8f instead of 0x5f6f7f7f.
---
 .../fam-in-union-alone-in-struct-2.c  | 22 ++-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c b/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c
index 93f9d5128f6..7845a7fbab3 100644
--- a/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c
+++ b/gcc/testsuite/c-c++-common/fam-in-union-alone-in-struct-2.c
@@ -16,7 +16,7 @@ union with_fam_2 {
 union with_fam_3 {
   char a[];
   int b[];
-} with_fam_3_v = {.b = {0x1f2f3f4f, 0x5f6f7f7f}};
+} with_fam_3_v = {.b = {0x1f2f3f4f, 0x5f6f7f8f}};

 struct only_fam {
   int b[];
@@ -28,16 +28,28 @@ struct only_fam_2 {
   int b[];
 } only_fam_2_v = {{7, 11}};

+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+#define WITH_FAM_2_V_B0 0x4f
+#define WITH_FAM_2_V_B3 0x1f
+#define WITH_FAM_3_V_A0 0x4f
+#define WITH_FAM_3_V_A7 0x5f
+#else
+#define WITH_FAM_2_V_B0 0x1f
+#define WITH_FAM_2_V_B3 0x4f
+#define WITH_FAM_3_V_A0 0x1f
+#define WITH_FAM_3_V_A7 0x8f
+#endif
+
 int main ()
 {
   if (with_fam_1_v.b[3] != 4
   || with_fam_1_v.b[0] != 1)
 __builtin_abort ();
-  if (with_fam_2_v.b[3] != 0x1f
-  || with_fam_2_v.b[0] != 0x4f)
+  if (with_fam_2_v.b[3] != WITH_FAM_2_V_B3
+  || with_fam_2_v.b[0] != WITH_FAM_2_V_B0)
 __builtin_abort ();
-  if (with_fam_3_v.a[0] != 0x4f
-  || with_fam_3_v.a[7] != 0x5f)
+  if (with_fam_3_v.a[0] != WITH_FAM_3_V_A0
+  || with_fam_3_v.a[7] != WITH_FAM_3_V_A7)
 __builtin_abort ();
   if (only_fam_v.b[0] != 7
   || only_fam_v.b[1] != 11)
--
2.45.2


Re: [PATCH] rs6000, document built-ins vec_test_lsbb_all_ones and, vec_test_lsbb_all_zeros

2024-07-31 Thread Kewen.Lin
Hi Carl,

on 2024/7/27 06:56, Carl Love wrote:
> GCC maintainers:
> 
> Per a report from a user, the existing vec_test_lsbb_all_ones and
> vec_test_lsbb_all_zeros built-ins are not documented in the GCC documentation
> file.
> 
> The following patch adds missing documentation for the vec_test_lsbb_all_ones
> and vec_test_lsbb_all_zeros built-ins.
> 
> Please let me know if the patch is acceptable for mainline.  Thanks.
> 
>     Carl
> 
> ---
> rs6000, document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros
> 
> Add documentation for the Power 10 built-ins vec_test_lsbb_all_ones
> and vec_test_lsbb_all_zeros.  The vec_test_lsbb_all_ones built-in
> returns 1 if the least significant bit in each byte is a 1, returns
> 0 otherwise.  Similarly, vec_test_lsbb_all_zeros returns a 1 if
> the least significant bit in each byte is a zero and 0 otherwise.
> 
> The test cases for the built-ins are in files:
>   gcc/testsuite/gcc.target/powerpc/lsbb.c
>   gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c
> 
> 
> gcc/ChangeLog:
>     * doc/extend.texi (vec_test_lsbb_all_ones,
>     vec_test_lsbb_all_zeros): Add documentation for the
>     existing built-ins.
> ---
>  gcc/doc/extend.texi | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 83ff168faf6..96e41c9a905 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -23240,6 +23240,21 @@ signed long long will sign extend the rightmost byte of each doubleword.
>  The following additional built-in functions are also available for the
>  PowerPC family of processors, starting with ISA 3.1 (@option{-mcpu=power10}):
> 
> +@smallexample
> +@exdent int vec_test_lsbb_all_ones (vector char);

I think we need to specify "unsigned" char explicitly since we don't actually
allow vector "signed" char as the below testing shows:

int foo11 (vector signed char va)
{ 
  return vec_test_lsbb_all_ones (va);
}

:17:3: error: invalid parameter combination for AltiVec intrinsic '__builtin_vec_xvtlsbb_all_ones'
   17 |   return vec_test_lsbb_all_ones (va);


Currently these two bifs are defined as overloaded, but each has only one
instance, which takes "vector unsigned char" as its argument type, while the
corresponding instance prototype in the builtin table takes "vector signed
char".  That is inconsistent and odd.  I think we can just update the
prototype in the builtin table to "vector unsigned char" and remove the
entries in the overload table.  It can be a follow-up patch.

> +@end smallexample
> +@findex vec_test_lsbb_all_ones
> +
> +The builtin @code{vec_test_lsbb_all_ones} returns 1 if the least significant
> +bit in each byte is a one.  It returns a zero otherwise.

It may be better to use the wording "equal to 1", as in the ISA, and "returns 0"
to match the preceding "returns 1", like:

“... in each byte is equal to 1.  It returns 0 otherwise.”

> +
> +@smallexample
> +@exdent int vec_test_lsbb_all_zeros (vector char);
> +@end smallexample
> +@findex vec_test_lsbb_all_zeros
> +
> +The builtin @code{vec_test_lsbb_all_zeros} returns 1 if the least significant
> +bit in each byte is a zero.  It returns a zero otherwise.

Likewise, "... in each byte is equal to 0.  It returns 0 otherwise."

OK with these nits tweaked, thanks!

BR,
Kewen
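
For reference, the semantics described in the documentation text above can be
modelled in portable scalar C.  This is only a sketch under stated
assumptions: the names model_lsbb_all_ones and model_lsbb_all_zeros are
hypothetical, a plain 16-byte array stands in for "vector unsigned char", and
the code does not use the actual built-ins or any Power10 instruction.

```c
#include <assert.h>
#include <stddef.h>

#define VEC_BYTES 16  /* A 128-bit vector holds 16 bytes.  */

/* Hypothetical scalar model: return 1 if the least significant bit of
   every byte in V is equal to 1, and 0 otherwise.  */
int
model_lsbb_all_ones (const unsigned char *v)
{
  assert (v != NULL);
  for (size_t i = 0; i < VEC_BYTES; i++)
    if ((v[i] & 1) != 1)
      return 0;
  return 1;
}

/* Hypothetical scalar model: return 1 if the least significant bit of
   every byte in V is equal to 0, and 0 otherwise.  */
int
model_lsbb_all_zeros (const unsigned char *v)
{
  assert (v != NULL);
  for (size_t i = 0; i < VEC_BYTES; i++)
    if ((v[i] & 1) != 0)
      return 0;
  return 1;
}
```

Note that the two results are not complements of each other: for a vector
whose bytes have mixed low bits, both functions in this model return 0.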



Re: [PATCH, rs6000] Add const_vector into any_operand predicate

2024-07-31 Thread Kewen.Lin
Hi Haochen,

on 2024/7/25 11:34, HAO CHEN GUI wrote:
> Hi,
>   This patch adds const_vector into any_operand predicate. From my
> understanding, any_operand should include all kinds of operands.
> The const_vector should be included. As emit_move_insn doesn't check
> the predicate, the const_vector is actually supported by vector mode
> move expand. So it should be added into any_operand in case other gen
> function (for instance, maybe_gen_insn) checks the predicate.
> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is it OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Add const_vector into any_operand predicate
> 
> gcc/
>   * config/rs6000/predicates.md (any_operand): Add const_vector.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index d23ce9a77a3..12600368c43 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -19,7 +19,7 @@
> 
>  ;; Return 1 for anything except PARALLEL.
>  (define_predicate "any_operand"
> -  (match_code "const_int,const_double,const_wide_int,const,symbol_ref,label_ref,subreg,reg,mem"))
> +  (match_code "const_int,const_double,const_wide_int,const,symbol_ref,label_ref,subreg,reg,mem,const_vector"))

CC Mike since he added mov and movmisalign.

From the name, its associated comments and what it currently consists of,
this seems to be an oversight and looks reasonable to fix.  It may read
better to put "const_vector" after "const_wide_int".  OK for trunk
with this tweaked, but please give others two days or so to chime in,
thanks!

BR,
Kewen



[PATCH] testsuite, rs6000: Adjust pr78056-[1357].c and remove pr78056-[246].c

2024-07-31 Thread Kewen.Lin
Hi,

When cleaning up the remaining powerpc_{vsx,altivec}_ok test
cases, I found some issues related to pr78056-*.c.
Firstly, the test points of pr78056-[246].c are no longer
available, since r9-3164 dropped many HAVE_AS_* and the
expected warnings were dropped along with them, so this patch
removes these tests.  Secondly, pr78056-1.c and pr78056-3.c
include altivec.h but don't use any builtins, so checking
powerpc_altivec is enough (there is no need to check
powerpc_vsx).  pr78056-5.c doesn't require any altivec/vsx
feature, so powerpc_vsx_ok can be removed.  Lastly, pr78056-7.c
should just use powerpc_fprs instead of dfp_hw, as it only cares
about the insn fcpsgn.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this next week if no objections.

BR,
Kewen


gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr78056-1.c: Check for powerpc_altivec rather than
powerpc_vsx.
* gcc.target/powerpc/pr78056-3.c: Likewise.
* gcc.target/powerpc/pr78056-5.c: Drop powerpc_vsx_ok check.
* gcc.target/powerpc/pr78056-7.c: Check for powerpc_fprs rather than
dfp_hw.
* gcc.target/powerpc/pr78056-2.c: Remove.
* gcc.target/powerpc/pr78056-4.c: Remove.
* gcc.target/powerpc/pr78056-6.c: Remove.
---
 gcc/testsuite/gcc.target/powerpc/pr78056-1.c |  4 ++--
 gcc/testsuite/gcc.target/powerpc/pr78056-2.c | 18 --
 gcc/testsuite/gcc.target/powerpc/pr78056-3.c |  4 ++--
 gcc/testsuite/gcc.target/powerpc/pr78056-4.c | 19 ---
 gcc/testsuite/gcc.target/powerpc/pr78056-5.c |  2 --
 gcc/testsuite/gcc.target/powerpc/pr78056-6.c | 25 
 gcc/testsuite/gcc.target/powerpc/pr78056-7.c |  2 --
 7 files changed, 4 insertions(+), 70 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/pr78056-2.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/pr78056-4.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/pr78056-6.c

diff --git a/gcc/testsuite/gcc.target/powerpc/pr78056-1.c b/gcc/testsuite/gcc.target/powerpc/pr78056-1.c
index 72640007dbb..49ebafe39b6 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr78056-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr78056-1.c
@@ -1,7 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-aix* } } */
-/* { dg-options "-mdejagnu-cpu=power8 -mvsx" } */
-/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-mdejagnu-cpu=power8" } */
+/* { dg-require-effective-target powerpc_altivec } */

 /* This test should succeed on both 32- and 64-bit configurations.  */
 #include <altivec.h>
diff --git a/gcc/testsuite/gcc.target/powerpc/pr78056-2.c b/gcc/testsuite/gcc.target/powerpc/pr78056-2.c
deleted file mode 100644
index 5cda9d6193b..000
--- a/gcc/testsuite/gcc.target/powerpc/pr78056-2.c
+++ /dev/null
@@ -1,18 +0,0 @@
-/* { dg-do compile { target { powerpc*-*-* } } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
-/* { dg-skip-if "" { powerpc_vsx_ok } } */
-/* { dg-skip-if "" { powerpc*-*-aix* } } */
-/* { dg-options "-mdejagnu-cpu=power8 -mvsx" } */
-
-/* This test should succeed on both 32- and 64-bit configurations.  */
-#include 
-
-/* Though the command line specifies power8 target, this function is
-   to support power9. Expect an error message here because this target
-   does not support power9.  */
-__attribute__((target("cpu=power9")))
-int get_random ()
-{ /* { dg-warning "lacks power9 support" } */
-  return __builtin_darn_32 (); /* { dg-warning "implicit declaration" } */
-}
-
diff --git a/gcc/testsuite/gcc.target/powerpc/pr78056-3.c b/gcc/testsuite/gcc.target/powerpc/pr78056-3.c
index cf57d058e8b..745552b244d 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr78056-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr78056-3.c
@@ -1,7 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
-/* { dg-options "-mdejagnu-cpu=power7" } */
-/* { dg-require-effective-target powerpc_vsx } */
 /* { dg-skip-if "" { powerpc*-*-aix* } } */
+/* { dg-options "-mdejagnu-cpu=power7" } */
+/* { dg-require-effective-target powerpc_altivec } */

 /* This test should succeed on both 32- and 64-bit configurations.  */
 #include <altivec.h>
diff --git a/gcc/testsuite/gcc.target/powerpc/pr78056-4.c b/gcc/testsuite/gcc.target/powerpc/pr78056-4.c
deleted file mode 100644
index 0bea0f895fa..000
--- a/gcc/testsuite/gcc.target/powerpc/pr78056-4.c
+++ /dev/null
@@ -1,19 +0,0 @@
-/* { dg-do compile { target { powerpc*-*-* } } } */
-/* powerpc_vsx_ok represents power7 */
-/* { dg-require-effective-target powerpc_vsx_ok } */
-/* { dg-skip-if "" { powerpc_vsx_ok } } */
-/* { dg-skip-if "" { powerpc*-*-aix* } } */
-/* { dg-options "-mdejagnu-cpu=power7" } */
-
-/* This test should succeed on both 32- and 64-bit configurations.  */
-#include 
-
-/* Though the command line specifies power7 target, this function is
-   to support power8, which will fail because this platform does not
-   support power8.  */

[PATCH] testsuite, rs6000: Replace powerpc_vsx_ok with powerpc_altivec etc.

2024-07-31 Thread Kewen.Lin
Hi,

This is a follow-up to the previous patch replacing
powerpc_vsx_ok with powerpc_vsx.  It focuses on those test cases
which used powerpc_vsx_ok but don't really require the VSX
feature; they actually require some other effective target
check: some of them just require the AltiVec feature, some just
require hard float support, and some just require ISA 2.06, etc.

By the way, ppc-fpconv-4.c is the only one missing powerpc_fprs
among ppc-fpconv-*.c after this replacement, so I also fix it
here.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this next week if no objections.

BR,
Kewen


PR testsuite/114842

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/bswap64-2.c: Replace powerpc_vsx_ok check with
has_arch_pwr7.
* gcc.target/powerpc/ppc-fpconv-2.c: Replace powerpc_vsx_ok check with
powerpc_fprs.
* gcc.target/powerpc/ppc-fpconv-6.c: Likewise.
* gcc.target/powerpc/ppc-pow.c: Likewise.
* gcc.target/powerpc/ppc-target-1.c: Likewise.
* gcc.target/powerpc/ppc-target-2.c: Likewise.
* gcc.target/powerpc/ppc-target-3.c: Likewise.
* gcc.target/powerpc/ppc-target-4.c: Likewise.
* gcc.target/powerpc/ppc-fpconv-4.c: Check for powerpc_fprs.
* gcc.target/powerpc/fold-vec-select-char.c: Replace powerpc_vsx_ok
with powerpc_altivec check and move it after dg-options line.
* gcc.target/powerpc/fold-vec-select-float.c: Likewise.
* gcc.target/powerpc/fold-vec-select-int.c: Likewise.
* gcc.target/powerpc/fold-vec-select-short.c: Likewise.
* gcc.target/powerpc/p9-novsx.c: Likewise.
* gcc.target/powerpc/p9-options-1.c: Likewise.
---
 gcc/testsuite/gcc.target/powerpc/bswap64-2.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/fold-vec-select-char.c  | 2 +-
 gcc/testsuite/gcc.target/powerpc/fold-vec-select-float.c | 6 +++---
 gcc/testsuite/gcc.target/powerpc/fold-vec-select-int.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/fold-vec-select-short.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/p9-novsx.c  | 2 +-
 gcc/testsuite/gcc.target/powerpc/p9-options-1.c  | 2 +-
 gcc/testsuite/gcc.target/powerpc/ppc-fpconv-2.c  | 2 +-
 gcc/testsuite/gcc.target/powerpc/ppc-fpconv-4.c  | 1 +
 gcc/testsuite/gcc.target/powerpc/ppc-fpconv-6.c  | 2 +-
 gcc/testsuite/gcc.target/powerpc/ppc-pow.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/ppc-target-1.c  | 3 ++-
 gcc/testsuite/gcc.target/powerpc/ppc-target-2.c  | 3 ++-
 gcc/testsuite/gcc.target/powerpc/ppc-target-3.c  | 2 +-
 gcc/testsuite/gcc.target/powerpc/ppc-target-4.c  | 2 +-
 15 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/bswap64-2.c b/gcc/testsuite/gcc.target/powerpc/bswap64-2.c
index 6c3d8ca0528..70d872b5e30 100644
--- a/gcc/testsuite/gcc.target/powerpc/bswap64-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/bswap64-2.c
@@ -1,7 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-options "-O2 -mpopcntd" } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target has_arch_pwr7 } */
 /* { dg-final { scan-assembler "ldbrx" } } */
 /* { dg-final { scan-assembler "stdbrx" } } */

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-select-char.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-select-char.c
index e055c017536..17e28914aae 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-select-char.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-select-char.c
@@ -2,8 +2,8 @@
inputs produce the right code.  */

 /* { dg-do compile } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-options "-maltivec -O2" } */
+/* { dg-require-effective-target powerpc_altivec } */

 #include 

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-select-float.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-select-float.c
index 1656fbff2ca..848bd750ff8 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-select-float.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-select-float.c
@@ -1,9 +1,9 @@
-/* Verify that overloaded built-ins for vec_sel with float
-   inputs for VSX produce the right code.  */
+/* Verify that overloaded built-ins for vec_sel with float
+   inputs produce the right code.  */

 /* { dg-do compile } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-options "-maltivec -O2" } */
+/* { dg-require-effective-target powerpc_altivec } */

 #include 

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-select-int.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-select-int.c
index 510fc564370..f51d741d401 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-select-int.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-select-int.c
@@ -2,8 +2,8 @@
inputs produce the right code.  */

 /* 

[PATCH] testsuite, rs6000: Fix some run cases with appropriate *_hw

2024-07-31 Thread Kewen.Lin
Hi,

When cleaning up the remaining powerpc_{vsx,altivec}_ok test
cases, I found some dg-do run test cases which should use the
appropriate {p8vector,vmx}_hw check instead.  This patch
adjusts them accordingly.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this next week if no objections.

BR,
Kewen


gcc/testsuite/ChangeLog:

* gcc.target/powerpc/swaps-p8-46.c: Check for p8vector_hw rather than
powerpc_vsx_ok.
* gcc.target/powerpc/ppc64-abi-2.c: Check for vmx_hw rather than
powerpc_altivec_ok.
* gcc.target/powerpc/pr96139-c.c: Likewise.
---
 gcc/testsuite/gcc.target/powerpc/ppc64-abi-2.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/pr96139-c.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/swaps-p8-46.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/ppc64-abi-2.c b/gcc/testsuite/gcc.target/powerpc/ppc64-abi-2.c
index b490fc3c2fd..2a5a7602004 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc64-abi-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc64-abi-2.c
@@ -1,4 +1,4 @@
-/* { dg-do run { target { { powerpc*-*-linux* && lp64 } && powerpc_altivec_ok } } } */
+/* { dg-do run { target { { powerpc*-*-linux* && lp64 } && vmx_hw } } } */
 /* { dg-options "-O2 -fprofile -mprofile-kernel -maltivec -mabi=altivec -mno-pcrel" } */
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr96139-c.c b/gcc/testsuite/gcc.target/powerpc/pr96139-c.c
index 3ada2603428..b39c559ec0b 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr96139-c.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr96139-c.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -Wall -maltivec" } */
-/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-require-effective-target vmx_hw } */

 /*
  * Based on test created by sjmunroe for pr96139
diff --git a/gcc/testsuite/gcc.target/powerpc/swaps-p8-46.c b/gcc/testsuite/gcc.target/powerpc/swaps-p8-46.c
index 3b5154b1231..d0392f25eee 100644
--- a/gcc/testsuite/gcc.target/powerpc/swaps-p8-46.c
+++ b/gcc/testsuite/gcc.target/powerpc/swaps-p8-46.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target le } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target p8vector_hw } */
 /* { dg-options "-mdejagnu-cpu=power8 -mvsx -O2 " } */

 typedef __attribute__ ((__aligned__ (8))) unsigned long long __m64;
--
2.43.5


[PATCH] testsuite, rs6000: Replace powerpc_vsx_ok with powerpc_vsx

2024-07-31 Thread Kewen.Lin
Hi,

Following up on the previous r15-886, this patch cleans up
the remaining uses of powerpc_vsx_ok which should actually be
powerpc_vsx.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this next week if no objections.

BR,
Kewen


PR testsuite/114842

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/error-1.c: Replace powerpc_vsx_ok check with
powerpc_vsx.
* gcc.target/powerpc/warn-2.c: Likewise.
* gcc.target/powerpc/fold-vec-logical-ors-longlong.c: Likewise.
* gcc.target/powerpc/ppc-fortran/pr80108-1.f90: Replace powerpc_vsx_ok
check with powerpc_vsx and remove useless -mfloat128.
* gcc.target/powerpc/pragma_power8.c: Replace powerpc_vsx_ok check with
powerpc_vsx.
---
 gcc/testsuite/gcc.target/powerpc/error-1.c   | 2 +-
 .../gcc.target/powerpc/fold-vec-logical-ors-longlong.c   | 4 ++--
 gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr80108-1.f90   | 4 ++--
 gcc/testsuite/gcc.target/powerpc/pragma_power8.c | 5 -
 gcc/testsuite/gcc.target/powerpc/warn-2.c| 2 +-
 5 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/error-1.c b/gcc/testsuite/gcc.target/powerpc/error-1.c
index d38eba8bb8a..9327076baf0 100644
--- a/gcc/testsuite/gcc.target/powerpc/error-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/error-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target powerpc_vsx } */
 /* { dg-options "-O -mvsx -mno-altivec" } */

 /* { dg-error "'-mvsx' and '-mno-altivec' are incompatible" "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-longlong.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-longlong.c
index 60af61a7f16..aae4694f551 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-logical-ors-longlong.c
@@ -4,7 +4,7 @@
 /* { dg-do compile } */
 /* { dg-options "-mvsx -O2" } */
 /* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target powerpc_vsx } */

 #include 

@@ -154,7 +154,7 @@ test6_nor (vector unsigned long long x, vector unsigned long long y)

 // The number of xxlor instructions generated varies between 6 and 24 for
 // older systems (power6,power7), as well as for 32-bit versus 64-bit targets.
-// For simplicity, this test now only targets "powerpc_vsx_ok" environments
+// For simplicity, this test now only targets "powerpc_vsx" environments
 // where the answer is expected to be 6.

 /* { dg-final { scan-assembler-times {\mxxlor\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr80108-1.f90 b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr80108-1.f90
index 00392b5fed9..e0e157bd245 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr80108-1.f90
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr80108-1.f90
@@ -1,7 +1,7 @@
 ! Originally contributed by Tobias Burnas.
 ! { dg-do compile { target { powerpc*-*-* } } }
-! { dg-require-effective-target powerpc_vsx_ok }
-! { dg-options "-mdejagnu-cpu=405 -mpower9-minmax -mfloat128" }
+! { dg-require-effective-target powerpc_vsx }
+! { dg-options "-mdejagnu-cpu=405 -mpower9-minmax" }
 ! { dg-excess-errors "expect error due to conflicting target options" }
 ! Since the error message is not associated with a particular line
 ! number, we cannot use the dg-error directive and cannot specify a
diff --git a/gcc/testsuite/gcc.target/powerpc/pragma_power8.c b/gcc/testsuite/gcc.target/powerpc/pragma_power8.c
index 8de815e5a9e..43ea6dd406e 100644
--- a/gcc/testsuite/gcc.target/powerpc/pragma_power8.c
+++ b/gcc/testsuite/gcc.target/powerpc/pragma_power8.c
@@ -1,6 +1,9 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target lp64 } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* Ensure there is no explicit -mno-vsx etc., otherwise
+   the below bif __builtin_vec_vcmpeq_p, which relies on power8
+   vsx, would fail.  */
+/* { dg-require-effective-target powerpc_vsx } */
 /* { dg-options "-mdejagnu-cpu=power6 -maltivec -O2" } */

 #include 
diff --git a/gcc/testsuite/gcc.target/powerpc/warn-2.c b/gcc/testsuite/gcc.target/powerpc/warn-2.c
index 29c6ce50cd7..ba294cb52e5 100644
--- a/gcc/testsuite/gcc.target/powerpc/warn-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/warn-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-require-effective-target powerpc_vsx } */
 /* { dg-options "-O -mdejagnu-cpu=power7 -mno-altivec" } */

 /* { dg-warning "'-mno-altivec' disables vsx" "" { 

[PATCH] testsuite, rs6000: Remove useless powerpc_{altivec,vsx}_ok

2024-07-31 Thread Kewen.Lin
Hi,

Checking the existing powerpc_{altivec,vsx}_ok test cases,
I found that some of them don't need the powerpc_{altivec,vsx}
checks at all: some already have another effective target
check that covers powerpc_{altivec,vsx}, and some don't
actually require the VSX/AltiVec feature.  So this patch
removes such useless checks.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this next week if there are no objections.

BR,
Kewen


PR testsuite/114842

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/amo2.c: Remove powerpc_vsx_ok effective target
check as p9vector_hw already covers it.
* gcc.target/powerpc/p9-sign_extend-runnable.c: Likewise.
* gcc.target/powerpc/clone2.c: Remove powerpc_vsx_ok effective target
check as ppc_cpu_supports_hw already covers it.
* gcc.target/powerpc/pr47251.c: Remove powerpc_vsx_ok effective target
check as it doesn't need VSX.
* gcc.target/powerpc/pr60137.c: Likewise.
* gcc.target/powerpc/pr80098-1.c: Likewise.
* gcc.target/powerpc/pr80098-2.c: Likewise.
* gcc.target/powerpc/pr80098-3.c: Likewise.
* gcc.target/powerpc/sd-pwr6.c: Likewise.
* gcc.target/powerpc/pr57744.c: Remove powerpc_vsx_ok effective target
check and option -mvsx as it doesn't need VSX.
* gcc.target/powerpc/pr69548.c: Remove powerpc_vsx_ok effective target
check as it doesn't need VSX, remove lp64 and use int128 instead.
* gcc.target/powerpc/vec-cmpne-long.c: Remove powerpc_vsx_ok effective
target check as p8vector_hw already covers it.
* gcc.target/powerpc/darwin-save-world-1.c: Remove powerpc_altivec_ok
effective target check as vmx_hw already covers it.
---
 gcc/testsuite/gcc.target/powerpc/amo2.c| 1 -
 gcc/testsuite/gcc.target/powerpc/clone2.c  | 1 -
 gcc/testsuite/gcc.target/powerpc/darwin-save-world-1.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c | 1 -
 gcc/testsuite/gcc.target/powerpc/pr47251.c | 1 -
 gcc/testsuite/gcc.target/powerpc/pr57744.c | 3 +--
 gcc/testsuite/gcc.target/powerpc/pr60137.c | 1 -
 gcc/testsuite/gcc.target/powerpc/pr69548.c | 6 +++---
 gcc/testsuite/gcc.target/powerpc/pr80098-1.c   | 1 -
 gcc/testsuite/gcc.target/powerpc/pr80098-2.c   | 1 -
 gcc/testsuite/gcc.target/powerpc/pr80098-3.c   | 1 -
 gcc/testsuite/gcc.target/powerpc/sd-pwr6.c | 1 -
 gcc/testsuite/gcc.target/powerpc/vec-cmpne-long.c  | 1 -
 13 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/amo2.c b/gcc/testsuite/gcc.target/powerpc/amo2.c
index 9cb493da53e..592f0fb3f92 100644
--- a/gcc/testsuite/gcc.target/powerpc/amo2.c
+++ b/gcc/testsuite/gcc.target/powerpc/amo2.c
@@ -1,5 +1,4 @@
 /* { dg-do run { target { powerpc*-*-linux* && { lp64 && p9vector_hw } } } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-options "-O2 -mvsx -mpower9-misc" } */
 /* { dg-additional-options "-mdejagnu-cpu=power9" { target { ! has_arch_pwr9 } } } */

diff --git a/gcc/testsuite/gcc.target/powerpc/clone2.c b/gcc/testsuite/gcc.target/powerpc/clone2.c
index e64940b7952..4098e878c21 100644
--- a/gcc/testsuite/gcc.target/powerpc/clone2.c
+++ b/gcc/testsuite/gcc.target/powerpc/clone2.c
@@ -1,6 +1,5 @@
 /* { dg-do run { target { powerpc*-*-linux* } } } */
 /* { dg-options "-mvsx -O2" } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-require-effective-target ppc_cpu_supports_hw } */

 #include 
diff --git a/gcc/testsuite/gcc.target/powerpc/darwin-save-world-1.c b/gcc/testsuite/gcc.target/powerpc/darwin-save-world-1.c
index 3326765f4fb..27fc1d30a8b 100644
--- a/gcc/testsuite/gcc.target/powerpc/darwin-save-world-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/darwin-save-world-1.c
@@ -1,7 +1,7 @@
 /* { dg-do run { target powerpc*-*-* } } */
 /* { dg-options "-maltivec" } */
 /* { dg-require-effective-target powerpc_altivec } */
-/* { dg-skip-if "need to be able to execute AltiVec" { ! { powerpc_altivec_ok && vmx_hw } } } */
+/* { dg-skip-if "need to be able to execute AltiVec" { ! vmx_hw } } */

 /* With altivec turned on, Darwin wants to save the world but we did not mark 
lr as being saved any more
as saving the lr is not needed for saving altivec registers.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
index f0514993bc0..595aa4768cc 100644
--- a/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c
@@ -1,5 +1,4 @@
 /* { dg-do run { target { *-*-linux* && { lp64 && p9vector_hw } } } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-options 

[PATCH] testsuite, rs6000: Make {vmx,vsx,p8vector}_hw check for altivec/vsx feature

2024-07-31 Thread Kewen.Lin
Hi,

Unlike p9vector_hw, the vmx_hw/vsx_hw/p8vector_hw checks can
still succeed without Altivec/VSX feature support.  We have
many runnable test cases that only check for these *_hw
targets without additionally checking whether the Altivec/VSX
feature is enabled, so they can fail when tested with
Altivec/VSX explicitly disabled.  So I think it's reasonable
to also check that the Altivec/VSX feature is enabled while
checking that the testing environment is able to execute
some instructions, since these instructions rely on these
features.  So similar to what we test for p9vector_hw, this
patch modifies the C functions used for vmx_hw, vsx_hw and
p8vector_hw to use the corresponding vector types and
constraints.  Besides the VSX feature, p8vector_hw also
requires ISA 2.07 support.  A good thing is that almost all
of the test cases using p8vector_hw already specify
-mdejagnu-cpu=power8, either always or if !has_arch_pwr8.
Since checking _ARCH_PWR8 in p8vector_hw could stop test
cases from being tested even when the test case itself
specifies -mdejagnu-cpu=power8, this patch doesn't force
p8vector_hw to check _ARCH_PWR8; instead it updates all
existing test cases which adopt p8vector_hw but don't have
-mdejagnu-cpu=power8.  By the way, the test cases adopting
p9vector_hw are all fine.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this next week if there are no objections.

BR,
Kewen


gcc/testsuite/ChangeLog:

* lib/target-supports.exp (check_vsx_hw_available): Modify C source
code used for testing with type vector long long and constraint wa
which require VSX feature.
(check_p8vector_hw_available): Likewise.
(check_vmx_hw_available): Modify C source code used for testing with
type vector int and constraint v which require Altivec feature.
* gcc.target/powerpc/divkc3-1.c: Specify -mdejagnu-cpu=power8 for
!has_arch_pwr8 to ensure power8 support.
* gcc.target/powerpc/mulkc3-1.c: Likewise.
* gcc.target/powerpc/pr96264.c: Likewise.
---
 gcc/testsuite/gcc.target/powerpc/divkc3-1.c |  1 +
 gcc/testsuite/gcc.target/powerpc/mulkc3-1.c |  1 +
 gcc/testsuite/gcc.target/powerpc/pr96264.c  |  1 +
 gcc/testsuite/lib/target-supports.exp   | 24 -
 4 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/divkc3-1.c b/gcc/testsuite/gcc.target/powerpc/divkc3-1.c
index 89bf04f12a9..96fb5c21204 100644
--- a/gcc/testsuite/gcc.target/powerpc/divkc3-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/divkc3-1.c
@@ -1,5 +1,6 @@
 /* { dg-do run { target { powerpc64*-*-* && p8vector_hw } } } */
 /* { dg-options "-mfloat128 -mvsx" } */
+/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */

 void abort ();

diff --git a/gcc/testsuite/gcc.target/powerpc/mulkc3-1.c b/gcc/testsuite/gcc.target/powerpc/mulkc3-1.c
index b975a91dbd7..1b0a1e24814 100644
--- a/gcc/testsuite/gcc.target/powerpc/mulkc3-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/mulkc3-1.c
@@ -1,5 +1,6 @@
 /* { dg-do run { target { powerpc64*-*-* && p8vector_hw } } } */
 /* { dg-options "-mfloat128 -mvsx" } */
+/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */

 void abort ();

diff --git a/gcc/testsuite/gcc.target/powerpc/pr96264.c b/gcc/testsuite/gcc.target/powerpc/pr96264.c
index 9f7d885daf2..906720fdcd1 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr96264.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr96264.c
@@ -1,5 +1,6 @@
 /* { dg-do run { target { powerpc64le-*-* } } } */
 /* { dg-options "-Os -fno-forward-propagate -fschedule-insns -fno-tree-ter -Wno-psabi" } */
+/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */
 /* { dg-require-effective-target p8vector_hw } */

 typedef unsigned char __attribute__ ((__vector_size__ (64))) v512u8;
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index daa0c75d2bc..2101e9c9c83 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -2864,11 +2864,9 @@ proc check_p8vector_hw_available { } {
check_runtime_nocache p8vector_hw_available {
int main()
{
-   #ifdef __MACH__
- asm volatile ("xxlorc vs0,vs0,vs0");
-   #else
- asm volatile ("xxlorc 0,0,0");
-   #endif
+ vector long long v1 = {0x1, 0x2};
+ vector long long v2;
+ asm ("xxlorc %0,%1,%1" : "=wa" (v2) : "wa" (v1));
  return 0;
}
} $options
@@ -3165,11 +3163,9 @@ proc check_vsx_hw_available { } {
check_runtime_nocache vsx_hw_available {
int main()
{
-   #ifdef __MACH__
- asm volatile ("xxlor vs0,vs0,vs0");
-   #else
- asm volatile 

Re: [PATCH ver 2] rs6000, Add new overloaded vector shift builtin int128, varients

2024-07-29 Thread Kewen.Lin
on 2024/7/29 23:47, Peter Bergner wrote:
> On 7/29/24 5:21 AM, Kewen.Lin wrote:
>> on 2024/7/27 06:37, Carl Love wrote:
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
>>> @@ -0,0 +1,358 @@
>>> +/* { dg-do run  { target power10_hw } } */
>>> +/* { dg-do link { target { ! power10_hw } } } */
>>> +/* { dg-require-effective-target power10_ok } */
>>
>> As Peter pointed out in another thread, you need int128 effective target 
>> check as well,
>> otherwise it will fail with power10 -m32.
>>
>> Another nit: power10_hw should already guarantee power10_ok, so power10_ok
>> is only required for dg-do link.
> 
> I really dislike those *_ok tests.  The power10_ok test doesn't verify that
> the options being used to compile the test case enables Power10.  It only
> verifies the assembler you're using is Power10 enabled.  I agree that the
> power10_hw test includes the same (useless) assembler check that power10_ok
> includes, so power10_ok isn't needed.
> 
> 
> Those *_ok tests really should be verifying the compiler options that will
> be used to compile the test case enables the features the test case is
> attempting to use.
> 
> 
> 

Yes!

> Maybe the following will work?
> 
> +/* { dg-do run  { target power10_hw } } */
> +/* { dg-do link { target { ! power10_hw } } } */

Maybe we can replace link by compile here, as we care about compilation and
execution result more here.  (IMHO if it's still "link", power10_ok is useful
to stop this being tested on an environment with an assembler not supporting
power10).
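
For illustration only, the directive block could then look roughly like
the sketch below.  It just combines the suggestions made in this thread
(the int128 check from Peter, "compile" in place of "link"); the
dg-options line is an assumption, since the options of the actual test
are not quoted here.

```c
/* Sketch only, not the committed test header.  */
/* { dg-do run { target power10_hw } } */
/* { dg-do compile { target { ! power10_hw } } } */
/* { dg-require-effective-target int128 } */
/* Assumed options; the real test may use different ones.  */
/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
```

With "compile" the assembler is not involved on non-power10 hosts, which
is why power10_ok would no longer be needed in this variant.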

BR,
Kewen

> +/* { dg-require-effective-target int128 } */
> ...
> 
> Carl, can you try testing the above change on ltcd97-lp7 and run the test
> in both 32-bit and 64-bit modes?
> 
> Peter
> 



Re: [PATCH] rs6000, add comment to VEC_IC definition

2024-07-29 Thread Kewen.Lin
Hi Carl,

on 2024/7/27 07:31, Carl Love wrote:
> GCC maintainers:
> 
> This patch adds a comment to the VEC_IC definitions to clarify the V1TI 
> "TARGET_POWER10" mode per the request by Segher in the feedback to patch 
> "https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658156.html".
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658156.html
> 
> Please let me know if this patch is acceptable for mainline.
> 
> Thanks.
> 
>   Carl
> 
> rs6000, add comment to VEC_IC definition
> 
> This patch adds a comment to the VEC_IC definition to clarify
> the V1TI "TARGET_POWER10" mode that was added.
> 
> gcc/ChangeLog:
>     * config/rs6000/vector.md: Add comment for the VEC_IC
>     define_mode_iterator.
> ---
>  gcc/config/rs6000/vector.md | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
> index 0d3e0a24e11..75d95ccfb47 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -26,7 +26,8 @@
>  ;; Vector int modes
>  (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
> 
> -;; Vector int modes for comparison, shift and rotation
> +;; Vector int modes for comparison, shift and rotation.  ISA 3.1 adds the V1TI mode
> +;; for the int128 type.

Maybe s/int128/vector int128/, OK with/without this nit tweaked, thanks!

BR,
Kewen

>  (define_mode_iterator VEC_IC [V16QI V8HI V4SI V2DI (V1TI "TARGET_POWER10")])
> 
>  ;; 128-bit int modes



Re: [PATCH ver 2] rs6000, Add new overloaded vector shift builtin int128, varients

2024-07-29 Thread Kewen.Lin
Hi Carl,

on 2024/7/27 06:37, Carl Love wrote:
> GCC developers:
> 
> Version 2, updated rs6000-overload.def to remove adding additional internal
> names and to change XXSLDWI_Q to XXSLDWI_1TI per comments from Kewen.  Move
> new documentation statement for the PVIPR built-ins per comments from Kewen.
> Updated dg-do-run directive and added comment about the save-temps  in 
> testcase per feedback from Segher.  Retested the patch on Power 10 with no 
> regressions.
> 
> The following patch adds the int128 variants to the existing overloaded
> built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb, vec_srl,
> vec_sro.  These variants were requested by Steve Munroe.
> 
> The patch has been tested on a Power 10 system with no regressions.
> 
> Please let me know if the patch is acceptable for mainline.
> 
>    Carl
> 
> 
> ---
> rs6000, Add new overloaded vector shift builtin int128 variants
> 
> Add the signed __int128 and unsigned __int128 argument types for the
> overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
> vec_srdb, vec_srl, vec_sro.  For each of the new argument types add a
> testcase and update the documentation for the built-in.
> 
> gcc/ChangeLog:
>     * config/rs6000/altivec.md (vsdb_): Change
>     define_insn iterator to VEC_IC.
>     * config/rs6000/rs6000-builtins.def (__builtin_altivec_vsldoi_v1ti,
>     __builtin_vsx_xxsldwi_v1ti, __builtin_altivec_vsldb_v1ti,
>     __builtin_altivec_vsrdb_v1ti): New builtin definitions.
>     * config/rs6000/rs6000-overload.def (vec_sld, vec_sldb, vec_sldw,
>     vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro): New overloaded
>     definitions.
>     * doc/extend.texi (vec_sld, vec_sldb, vec_sldw,    vec_sll, vec_slo,

Nit: s// /

>     vec_srdb, vec_srl, vec_sro): Add documentation for new overloaded
>     built-ins.
> 
> gcc/testsuite/ChangeLog:
>     * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test file.
> ---
>  gcc/config/rs6000/altivec.md  |   6 +-
>  gcc/config/rs6000/rs6000-builtins.def |  12 +
>  gcc/config/rs6000/rs6000-overload.def |  40 ++
>  gcc/doc/extend.texi   |  43 +++
>  .../vec-shift-double-runnable-int128.c    | 358 ++
>  5 files changed, 456 insertions(+), 3 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
> 

snip...

> 
>  [VEC_SRV, vec_srv, __builtin_vec_vsrv]
>    vuc __builtin_vec_vsrv (vuc, vuc);
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 0b572afca72..83ff168faf6 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -23504,6 +23504,10 @@ const unsigned int);
>  vector signed long long, const unsigned int);
>  @exdent vector unsigned long long vec_sldb (vector unsigned long long,
>  vector unsigned long long, const unsigned int);
> +@exdent vector signed __int128 vec_sldb (vector signed __int128,
> +vector signed __int128, const unsigned int);
> +@exdent vector unsigned __int128 vec_sldb (vector unsigned __int128,
> +vector unsigned __int128, const unsigned int);
>  @end smallexample
> 
>  Shift the combined input vectors left by the amount specified by the 
> low-order
> @@ -23531,12 +23535,51 @@ const unsigned int);
>  vector signed long long, const unsigned int);
>  @exdent vector unsigned long long vec_srdb (vector unsigned long long,
>  vector unsigned long long, const unsigned int);
> +@exdent vector signed __int128 vec_srdb (vector signed __int128,
> +vector signed __int128, const unsigned int);
> +@exdent vector unsigned __int128 vec_srdb (vector unsigned __int128,
> +vector unsigned __int128, const unsigned int);
>  @end smallexample
> 
>  Shift the combined input vectors right by the amount specified by the 
> low-order
>  three bits of the third argument, and return the remaining 128 bits.  Code
>  using this built-in must be endian-aware.
> 
> +@smallexample
> +@exdent vector signed __int128 vec_sld (vector signed __int128,
> +vector signed __int128, const unsigned int);
> +@exdent vector unsigned __int128 vec_sld (vector unsigned __int128,
> +vector unsigned __int128, const unsigned int);
> +@exdent vector signed __int128 vec_sldw (vector signed __int128,
> +vector signed __int128, const unsigned int);
> +@exdent vector unsigned __int128 vec_sldw (vector unsigned __int,
> +vector unsigned __int128, const unsigned int);
> +@exdent vector signed __int128 vec_slo (vector signed __int128,
> +vector signed char);
> +@exdent vector signed __int128 vec_slo (vector signed __int128,
> +vector unsigned char);
> +@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
> +vector signed char);
> +@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
> +vector unsigned char);
> +@exdent vector signed __int128 vec_sro (vector signed __int128,
> +vector signed char);
> +@exdent vector signed 

Re: [PATCH ver 2] rs6000, remove __builtin_vsx_xvcmp* built-ins

2024-07-25 Thread Kewen.Lin
Hi Carl,

on 2024/7/24 01:06, Carl Love wrote:
> GCC maintainers:
> 
> version 2, Updated patch comments, added missing ChangeLog.  Fixed unintended 
> line removal.
> 
> The following patch removes the three __builtin_vsx_xvcmp[eq|ge|gt]sp
> builtins as they are similar to the overloaded vec_cmp[eq|ge|gt] built-ins.  The
> difference is the overloaded built-ins return a vector of booleans or a vector
> of long long booleans whereas the removed built-ins returned a vector of
> floats or vector of doubles.
> 
> The tests for __builtin_vsx_xvcmp[eq|ge|gt]sp and 
> __builtin_vsx_xvcmp[eq|ge|gt]dp are updated to use the overloaded 
> vec_cmp[eq|ge|gt] built-in with the required changes for the return type.  
> Note __builtin_vsx_xvcmp[eq|ge|gt]dp are used internally.
> 
> The patches have been tested on a Power 10 LE system with no regressions.
> 
> Please let me know if the patch is acceptable for mainline.  Thanks.
> 
>    Carl
> -
> rs6000, remove __builtin_vsx_xvcmp* built-ins
> 
> This patch removes the built-ins:
>  __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp,
>  __builtin_vsx_xvcmpgtsp.
> 
> which are similar to the recommended PVIPR documented overloaded
> vec_cmpeq, vec_cmpgt and vec_cmpge built-ins.
> 
> The difference is that the overloaded built-ins return a vector of
> 32-bit booleans.  The removed built-ins returned a vector of floats.
> 
> The __builtin_vsx_xvcmpeqdp, __builtin_vsx_xvcmpgedp and
> __builtin_vsx_xvcmpgtdp are not removed as they are used by the
> overloaded vec_cmpeq, vec_cmpgt and vec_cmpge built-ins.
> 
> The test cases for the __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp,
> __builtin_vsx_xvcmpgtsp, __builtin_vsx_xvcmpeqdp,
> __builtin_vsx_xvcmpgedp and __builtin_vsx_xvcmpgtdp  are changed to use
> the overloaded vec_cmpeq, vec_cmpgt, vec_cmpge built-ins.  Use of the
> overloaded built-ins requires the result to be stored in a vector of
> boolean of the appropriate size or the result must be cast to the return
> type used by the original __builtin_vsx_xvcmp* built-ins.
> 
> gcc/ChangeLog:
>     * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp,
>     __builtin_vsx_xvcmpgesp, __builtin_vsx_xvcmpgtsp): Remove
>     definitions.
> 
> gcc/testsuite/ChangeLog:
>     * gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_xvcmpeqdp,
>     __builtin_vsx_xvcmpgtdp, __builtin_vsx_xvcmpgedp,
>     __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgtsp,
>     __builtin_vsx_xvcmpgesp): Remove.
>     (vec_cmpeq, vec_cmpgt, vec_cmpge): Add tests for float
>     arguments that     store result in boolean and cast result to
>     store result in float.  Add tests for double arguments that
>     store the result in long long boolean and cast result to
>     double.

Nit: Normally the one in "()" is the name of the function you changed,
so how about:

(do_cmp): Replace __builtin_vsx_xvcmp{eq,gt,ge}{sp,dp} by vec_cmp{eq,gt,ge}
respectively and add explicit casts to vector {float,double}.  Add more
testing code assigning to vector boolean types.

OK for trunk with this nit tweaked, thanks!
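
To illustrate the return type difference discussed above, here is a
minimal sketch.  Note that it uses GCC's generic vector extensions
rather than the PowerPC vec_cmpeq built-in, so that it can be built on
any target; the type and function names are made up for this example.
The idea is the same: the comparison yields an integer mask per lane,
and getting a float vector back requires an explicit cast that only
reinterprets the bits.

```c
#include <assert.h>
#include <string.h>

/* Generic GCC vector types, standing in for "vector float" and
   "vector bool int".  These names are hypothetical.  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));

/* Element-wise a == b.  Each lane of the result is all ones (-1)
   where the inputs are equal and 0 elsewhere.  */
v4si
cmp_eq_mask (v4sf a, v4sf b)
{
  return a == b;
}

/* Same comparison, with the mask bits reinterpreted as floats.  The
   cast between same-sized vector types does not convert values.  */
v4sf
cmp_eq_as_float (v4sf a, v4sf b)
{
  return (v4sf) (a == b);
}

/* Return the raw 32 bits of lane I of V.  */
unsigned int
lane_bits (v4sf v, int i)
{
  unsigned int lanes[4];

  assert (i >= 0 && i < 4);
  memcpy (lanes, &v, sizeof lanes);
  return lanes[i];
}
```

A lane that compares equal reads back as -1 through cmp_eq_mask and as
the bit pattern 0xffffffff through cmp_eq_as_float, which is a NaN when
interpreted as a float.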

BR,
Kewen

> ---
>  gcc/config/rs6000/rs6000-builtins.def |  9 --
>  .../gcc.target/powerpc/vsx-builtin-3.c    | 28 ++-
>  2 files changed, 21 insertions(+), 16 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 77eb0f7e406..47830b7dcb0 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1579,18 +1579,12 @@
>    const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd);
>  XVCMPEQDP_P vector_eq_v2df_p {pred}
> 
> -  const vf __builtin_vsx_xvcmpeqsp (vf, vf);
> -    XVCMPEQSP vector_eqv4sf {}
> -
>    const vd __builtin_vsx_xvcmpgedp (vd, vd);
>  XVCMPGEDP vector_gev2df {}
> 
>    const signed int __builtin_vsx_xvcmpgedp_p (signed int, vd, vd);
>  XVCMPGEDP_P vector_ge_v2df_p {pred}
> 
> -  const vf __builtin_vsx_xvcmpgesp (vf, vf);
> -    XVCMPGESP vector_gev4sf {}
> -
>    const signed int __builtin_vsx_xvcmpgesp_p (signed int, vf, vf);
>  XVCMPGESP_P vector_ge_v4sf_p {pred}
> 
> @@ -1600,9 +1594,6 @@
>    const signed int __builtin_vsx_xvcmpgtdp_p (signed int, vd, vd);
>  XVCMPGTDP_P vector_gt_v2df_p {pred}
> 
> -  const vf __builtin_vsx_xvcmpgtsp (vf, vf);
> -    XVCMPGTSP vector_gtv4sf {}
> -
>    const signed int __builtin_vsx_xvcmpgtsp_p (signed int, vf, vf);
>  XVCMPGTSP_P vector_gt_v4sf_p {pred}
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> index 60f91aad23c..d67f97c8011 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> @@ -156,13 +156,27 @@ int do_cmp (void)
>  {
>    int i = 0;
> 
> -  d[i][0] = __builtin_vsx_xvcmpeqdp (d[i][1], 

Re: [PATCH 1/2] rs6000, Remove __builtin_vec_set_v1ti,, __builtin_vec_set_v2df, __builtin_vec_set_v2di

2024-07-25 Thread Kewen.Lin
Hi Carl,

on 2024/7/24 01:52, Carl Love wrote:
> 
> GCC maintainers:
> 
> This patch was previously posted.  Per the feedback, it is now the first of 
> two patches to remove the set built-ins.
> 
> This patch removes the __builtin_vec_set_v1ti, __builtin_vec_set_v2df and 
> __builtin_vec_set_v2di built-ins.  The users should just use normal C-code to 
> update the various vector elements.  This change was originally intended to 
> be part of the earlier series of cleanup patches.  It was initially thought 
> that some additional work would be needed to do some gimple generation 
> instead of these built-ins.  However, the existing default code generation 
> does produce the needed code.    For the vec_set bif, the equivalent C code 
> is as good or better than the built-in.  For the vec_insert bif whose 
> resolving previously made use of the vec_set bif, the assembly code 
> generation is as good as before with the -O3 optimization.

This background information will also be mentioned in the commit log, right?
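
As a rough illustration of "normal C-code to update the various vector
elements", something like the sketch below should do.  It uses GCC's
generic vector extensions, so it is not PowerPC specific; the typedef
and function names are hypothetical and are not taken from the patch.

```c
#include <assert.h>

/* Generic 128-bit vector types, standing in for vector double and
   vector long long.  These names are hypothetical.  */
typedef double v2df __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

/* Write VALUE into lane INDEX of VEC and return the updated vector,
   using plain subscripting instead of a vec_set style built-in.  */
v2df
set_v2df (v2df vec, double value, int index)
{
  assert (index >= 0 && index < 2);
  vec[index] = value;
  return vec;
}

/* Same as above for two 64-bit integer lanes.  */
v2di
set_v2di (v2di vec, long long value, int index)
{
  assert (index >= 0 && index < 2);
  vec[index] = value;
  return vec;
}
```

Unlike the removed resolve_vec_insert path, which reduced the selector
modulo the number of elements, this sketch simply asserts that the
index is in range.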

> 
> The patch has been tested on Power 10 LE with no regressions.
> 
> Please let me know if the patch is acceptable for mainline.  Thanks.
> 
>    Carl
> 
> -
> rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, 
> __builtin_vec_set_v2di
> 
> Remove the built-ins, use the default gimple generation instead.

OK for trunk with better commit log like the above paragraph, thanks!

// Assuming testing on BE goes well too. :)

BR,
Kewen

> 
> gcc/ChangeLog:
>     * config/rs6000/rs6000-builtins.def (__builtin_vec_set_v1ti,
>     __builtin_vec_set_v2df, __builtin_vec_set_v2di): Remove built-in
>     definitions.
>     * config/rs6000/rs6000-c.cc (resolve_vec_insert): Remove the
>     handling for constant vec_insert position with
>     VECTOR_UNIT_VSX_P V1TImode, V2DFmode and V2DImode modes.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 13 -
>  gcc/config/rs6000/rs6000-c.cc | 40 ---
>  2 files changed, 53 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 47830b7dcb0..75c33aa9ffc 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1263,19 +1263,6 @@
>    const signed long long __builtin_vec_ext_v2di (vsll, signed int);
>  VEC_EXT_V2DI nothing {extract}
> 
> -;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in
> -;; resolve_vec_insert(), rs6000-c.cc
> -;; TODO: Remove VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI once the uses
> -;; in resolve_vec_insert are replaced by the equivalent gimple statements.
> -  const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>);
> -    VEC_SET_V1TI nothing {set}
> -
> -  const vd __builtin_vec_set_v2df (vd, double, const int<1>);
> -    VEC_SET_V2DF nothing {set}
> -
> -  const vsll __builtin_vec_set_v2di (vsll, signed long long, const int<1>);
> -    VEC_SET_V2DI nothing {set}
> -
>    const vsc __builtin_vsx_cmpge_16qi (vsc, vsc);
>  CMPGE_16QI vector_nltv16qi {}
> 
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index 68519e1397f..04882c396bf 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -1524,46 +1524,6 @@ resolve_vec_insert (resolution *res, vec 
> *arglist,
>    return error_mark_node;
>  }
> 
> -  /* If we can use the VSX xxpermdi instruction, use that for insert.  */
> -  machine_mode mode = TYPE_MODE (arg1_type);
> -
> -  if ((mode == V2DFmode || mode == V2DImode)
> -  && VECTOR_UNIT_VSX_P (mode)
> -  && TREE_CODE (arg2) == INTEGER_CST)
> -    {
> -  wide_int selector = wi::to_wide (arg2);
> -  selector = wi::umod_trunc (selector, 2);
> -  arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
> -
> -  tree call = NULL_TREE;
> -  if (mode == V2DFmode)
> -    call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V2DF];
> -  else if (mode == V2DImode)
> -    call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V2DI];
> -
> -  /* Note, __builtin_vec_insert_ has vector and scalar types
> -     reversed.  */
> -  if (call)
> -    {
> -      *res = resolved;
> -      return build_call_expr (call, 3, arg1, arg0, arg2);
> -    }
> -    }
> -
> -  else if (mode == V1TImode
> -       && VECTOR_UNIT_VSX_P (mode)
> -       && TREE_CODE (arg2) == INTEGER_CST)
> -    {
> -  tree call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V1TI];
> -  wide_int selector = wi::zero(32);
> -  arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
> -
> -  /* Note, __builtin_vec_insert_ has vector and scalar types
> -     reversed.  */
> -  *res = resolved;
> -  return build_call_expr (call, 3, arg1, arg0, arg2);
> -    }
> -
>    /* Build *(((arg1_inner_type*) & (vector type){arg1}) + arg2) = 

Re: [PATCH 2/2] rs6000, remove built-ins __builtin_vsx_set_1ti, __builtin_vsx_set_2df, __builtin_vsx_set_2di

2024-07-25 Thread Kewen.Lin
Hi Carl,

on 2024/7/24 01:52, Carl Love wrote:
> GCC maintainers:
> 
> This patch removes the vsx set built-ins: __builtin_vsx_set_1ti,
> __builtin_vsx_set_2df, __builtin_vsx_set_2di.  With the removal of these
> built-ins, the built-in attribute "set", used in the built-in definition
> file, is no longer needed.  The "set" attribute and its associated code
> are removed.
> 
> The assembly code generated by using C code to set an element of a vector
> versus using the vsx set built-in to set an element was investigated.  With
> -O0 optimization the generated assembly code is comparable in terms of the
> generated assembly instructions and number of instructions.  For the -O3
> optimization level, for the 2DI and 2DF cases the built-ins and the C code
> generate identical assembly code.  The assembly code generated for the 1TI
> case for the C code has one less instruction.  The built-in generates an
> extra load instruction.  Hence, the C code is better as it has fewer load
> instructions.
> 
> The testcase for the __builtin_vsx_set_2df is removed.  The other built-ins 
> do not have testcases.
> 
> The patch has been tested on a Power 10 LE system with no regressions.
> 
> Please let me know if the patch is acceptable for mainline.  Thanks.
> 
>    Carl
> 
> --
> rs6000, remove built-ins __builtin_vsx_set_1ti, __builtin_vsx_set_2df, 
> __builtin_vsx_set_2di
> 
> The built-ins set a value in a vector.  The same operation can be done
> in C-code.  The assembly code generated from the C-code is as good or
> better than the code generated by the built-ins.  With default
> optimization the number of assembly instructions generated by the two
> methods is similar.  With -O3 optimization, the assembly generated for the two
> approaches is identical for the 2DF and 2DI types.  The assembly for
> the C-code version of the 1Ti requres one less assembly instruction.

Nit: s/requres/requires/

> It also only uses one load versus two loads for the built-in.
> 
> With the removal of the built-ins, there are no other uses of the
> set built-in attribute.  The code associated with the set built-in
> attribute is removed.
> 
> Finally, the testcase for the __builtin_vsx_set_2df is removed.  The
> other built-ins do not have testcases.
> 
> gcc/ChangeLog:
>     * config/rs6000/rs6000-builtin.cc (get_element_number,
>     altivec_expand_vec_set_builtin): Remove functions.
>     (rs6000_expand_builtin): Remove the if statement to call
>     altivec_expand_vec_set_builtin.
>     * config/rs6000/rs6000-builtins.def (__builtin_vsx_set_1ti,
>     __builtin_vsx_set_2df, __builtin_vsx_set_2di): Remove the
>     built-in definitions.
>     * config/rs6000/rs6000-gen-builtins.cc (struct attrinfo):
>     Remove the isset variable from the structure.
>     (parse_bif_attrs): Remove the uses of the isset variable.
> 
> gcc/testsuite/ChangeLog:
>     * gcc.target/powerpc/vsx-builtin-3.c: Remove test cases for the
>     __builtin_vsx_set_2df built-in.
> ---
>  gcc/config/rs6000/rs6000-builtin.cc   | 53 ---
>  gcc/config/rs6000/rs6000-builtins.def | 10 
>  gcc/config/rs6000/rs6000-gen-builtins.cc  | 29 --
>  .../gcc.target/powerpc/vsx-builtin-3.c    |  6 ---
>  4 files changed, 11 insertions(+), 87 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
> b/gcc/config/rs6000/rs6000-builtin.cc
> index 117cf0125f8..099cbc82245 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -2313,56 +2313,6 @@ altivec_expand_predicate_builtin (enum insn_code 
> icode, tree exp, rtx target)
>    return target;
>  }
> 
> -/* Return the integer constant in ARG.  Constrain it to be in the range
> -   of the subparts of VEC_TYPE; issue an error if not.  */
> -
> -static int
> -get_element_number (tree vec_type, tree arg)
> -{
> -  unsigned HOST_WIDE_INT elt, max = TYPE_VECTOR_SUBPARTS (vec_type) - 1;
> -
> -  if (!tree_fits_uhwi_p (arg)
> -  || (elt = tree_to_uhwi (arg), elt > max))
> -    {
> -  error ("selector must be an integer constant in the range [0, %wi]", 
> max);
> -  return 0;
> -    }
> -
> -  return elt;
> -}
> -
> -/* Expand vec_set builtin.  */
> -static rtx
> -altivec_expand_vec_set_builtin (tree exp)
> -{
> -  machine_mode tmode, mode1;
> -  tree arg0, arg1, arg2;
> -  int elt;
> -  rtx op0, op1;
> -
> -  arg0 = CALL_EXPR_ARG (exp, 0);
> -  arg1 = CALL_EXPR_ARG (exp, 1);
> -  arg2 = CALL_EXPR_ARG (exp, 2);
> -
> -  tmode = TYPE_MODE (TREE_TYPE (arg0));
> -  mode1 = TYPE_MODE (TREE_TYPE (TREE_TYPE (arg0)));
> -  gcc_assert (VECTOR_MODE_P (tmode));
> -
> -  op0 = expand_expr (arg0, NULL_RTX, tmode, EXPAND_NORMAL);
> -  op1 = expand_expr (arg1, NULL_RTX, mode1, EXPAND_NORMAL);
> -  elt = get_element_number (TREE_TYPE (arg0), arg2);
> -
> -  if (GET_MODE (op1) 

Re: [PATCH 0/2] rs6000, remove vec and vsx set builtins

2024-07-25 Thread Kewen.Lin
Hi Carl,

on 2024/7/24 01:32, Carl Love wrote:
> GCC maintainers:
> 
> The code generated by using C-code to set a vector element versus using a 
> built-in has been investigated.  The assembly code generated from the C-code 
> is as good or better than the assembly code generated for the built-ins for 
> both the -O0 and -O3 levels of optimization.
> 
> For the vec_insert built-in, whose resolution previously made use of the 
> vec_set built-in (now removed), the generated code is as good as before with 
> optimization.
> 
> This two-patch series removes the __builtin_vec_set_v1ti, 
> __builtin_vec_set_v2df and __builtin_vec_set_v2di built-ins and the 
> __builtin_vsx_set_1ti, __builtin_vsx_set_2df and __builtin_vsx_set_2di 
> built-ins in favor of using C-code instead.  The built-ins use the built-in 
> set attribute in the definitions of the built-ins.  With the removal of these 
> 6 built-ins, the set built-in attribute is no longer used and the related 
> code for the attribute is removed.
> 
> The first patch in this series, which removes the __builtin_vec_set_v1ti, 
> __builtin_vec_set_v2df and __builtin_vec_set_v2di built-ins, was previously 
> posted.  The feedback on the patch was that we could also remove the set bif 
> attribute.  
> Removal of the set bif attribute requires also removing the 
> __builtin_vsx_set_1ti,  __builtin_vsx_set_2df, __builtin_vsx_set_2di 
> built-ins.  The second patch removes the vsx set built-ins and the now no 
> longer used set built-in attribute and associated code.
> 
> The patches have been tested on a Power 10 LE system with no regressions.

It would be good to test this on BE as well (both 64-bit and 32-bit).

BR,
Kewen



Re: [PATCH] rs6000, Add new overloaded vector shift builtin int128, varients

2024-07-25 Thread Kewen.Lin
Hi Carl,

Some minor comments are inlined on top of Segher's and Peter's comments.

on 2024/7/20 04:04, Carl Love wrote:
> GCC developers:
> 
> The following patch adds the int128 variants to the existing overloaded 
> built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb, vec_srl, 
> vec_sro.  These variants were requested by Steve Munroe.
> 
> The patch has been tested on a Power 10 system with no regressions.
> 
> Please let me know if the patch is acceptable for mainline.
> 
>    Carl
> 
> 
> ---
>  rs6000, Add new overloaded vector shift builtin int128 varients
> 
> Add the signed __int128 and unsigned __int128 argument types for the
> overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
> vec_srdb, vec_srl, vec_sro.  For each of the new argument types add a
> testcase and update the documentation for the built-in.
> 
> Add the missing internal names for the float and double types for the
> overloaded builtin vec_sld.

This isn't needed; see the explanation below.

> 
> gcc/ChangeLog:
>     * config/rs6000/altivec.md (vsdb_): Change
>     define_insn iterator to VEC_IC.
>     * config/rs6000/rs6000-builtins.def (__builtin_altivec_vsldoi_v1ti,
>     __builtin_vsx_xxsldwi_v1ti, __builtin_altivec_vsldb_v1ti,
>     __builtin_altivec_vsrdb_v1ti): New builtin definitions.
>     * config/rs6000/rs6000-overload.def (vec_sld, vec_sldb, vec_sldw,
>     vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro): New overloaded
>     definitions.
>     (vec_sld): Add missing internal names.
>     * doc/extend.texi (vec_sld, vec_sldb, vec_sldw,    vec_sll, vec_slo,
>     vec_srdb, vec_srl, vec_sro): Add documentation for new overloaded
>     built-ins.
> 
> gcc/testsuite/ChangeLog:
>     * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test
>     file.
> ---
>  gcc/config/rs6000/altivec.md  |   6 +-
>  gcc/config/rs6000/rs6000-builtins.def |  12 +
>  gcc/config/rs6000/rs6000-overload.def |  44 ++-
>  gcc/doc/extend.texi   |  42 +++
>  .../vec-shift-double-runnable-int128.c    | 349 ++
>  5 files changed, 448 insertions(+), 5 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
> 
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 5af9bf920a2..2a18ee44526 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
>  (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])
> 
>  (define_insn "vsdb_"
> - [(set (match_operand:VI2 0 "register_operand" "=v")
> -  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
> -       (match_operand:VI2 2 "register_operand" "v")
> + [(set (match_operand:VEC_IC 0 "register_operand" "=v")
> +  (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
> +       (match_operand:VEC_IC 2 "register_operand" "v")
>     (match_operand:QI 3 "const_0_to_12_operand" "n")]
>    VSHIFT_DBL_LR))]
>    "TARGET_POWER10"
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 77eb0f7e406..fbb6e1ddf85 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -964,6 +964,9 @@
>    const vss __builtin_altivec_vsldoi_8hi (vss, vss, const int<4>);
>  VSLDOI_8HI altivec_vsldoi_v8hi {}
> 
> +  const vsq __builtin_altivec_vsldoi_v1ti (vsq, vsq, const int<4>);
> +    VSLDOI_V1TI altivec_vsldoi_v1ti {}
> +
>    const vss __builtin_altivec_vslh (vss, vus);
>  VSLH vashlv8hi3 {}
> 
> @@ -1831,6 +1834,9 @@
>    const vsll __builtin_vsx_xxsldwi_2di (vsll, vsll, const int<2>);
>  XXSLDWI_2DI vsx_xxsldwi_v2di {}
> 
> +  const vsq __builtin_vsx_xxsldwi_v1ti (vsq, vsq, const int<2>);
> +    XXSLDWI_Q vsx_xxsldwi_v1ti {}
> +
>    const vf __builtin_vsx_xxsldwi_4sf (vf, vf, const int<2>);
>  XXSLDWI_4SF vsx_xxsldwi_v4sf {}
> 
> @@ -3299,6 +3305,9 @@
>    const vss __builtin_altivec_vsldb_v8hi (vss, vss, const int<3>);
>  VSLDB_V8HI vsldb_v8hi {}
> 
> +  const vsq __builtin_altivec_vsldb_v1ti (vsq, vsq, const int<3>);
> +    VSLDB_V1TI vsldb_v1ti {}
> +
>    const vsq __builtin_altivec_vslq (vsq, vuq);
>  VSLQ vashlv1ti3 {}
> 
> @@ -3317,6 +3326,9 @@
>    const vss __builtin_altivec_vsrdb_v8hi (vss, vss, const int<3>);
>  VSRDB_V8HI vsrdb_v8hi {}
> 
> +  const vsq __builtin_altivec_vsrdb_v1ti (vsq, vsq, const int<3>);
> +    VSRDB_V1TI vsrdb_v1ti {}
> +
>    const vsq __builtin_altivec_vsrq (vsq, vuq);
>  VSRQ vlshrv1ti3 {}
> 
> diff --git a/gcc/config/rs6000/rs6000-overload.def 
> b/gcc/config/rs6000/rs6000-overload.def
> index c4ecafc6f7e..302e0232533 100644
> --- a/gcc/config/rs6000/rs6000-overload.def
> +++ 

Re: [RFC/PATCH] isel: Fold more in gimple_expand_vec_cond_expr with andc/iorc

2024-07-24 Thread Kewen.Lin
on 2024/7/24 06:53, Andrew Pinski wrote:
> On Mon, Jul 22, 2024 at 7:41 PM Kewen.Lin  wrote:
>>
>> Hi Andrew,
>>
>> on 2024/7/23 08:09, Andrew Pinski wrote:
>>> On Sun, Jun 30, 2024 at 11:17 PM Kewen.Lin  wrote:
>>>>
>>>> Hi,
>>>>
>>>> As PR115659 shows, assuming c = x CMP y, there are some
>>>> folding chances for patterns r = c ? 0/z : z/-1:
>>>>   - For r = c ? 0 : z, it can be folded into r = ~c & z.
>>>>   - For r = c ? z : -1, it can be folded into r = ~c | z.
>>>>
>>>> But BIT_AND/BIT_IOR applied to one BIT_NOT operand is a
>>>> compound operation, and I'm not sure every target with
>>>> vector capability has a single vector instruction for it.
>>>> If not, it's arguable whether it always beats vector
>>>> selection (e.g. the vector constant gets hoisted or combined
>>>> and selection has the same latency as a normal logical operation).
>>>> So IMHO we probably need to query the target with new optabs.
>>>> This patch introduces new optabs andc and iorc and their
>>>> corresponding internal functions BIT_{ANDC,IORC} (looking
>>>> for suggestions for naming the optabs and ifns); if a target
>>>> defines such optabs for vector modes, it means the target
>>>> supports these hardware insns and they should be no worse than
>>>> vector selection.  btw, the rs6000 changes are meant to
>>>> give an example for a target supporting andc/iorc.
>>>>
>>>> Does this sound reasonable?
>>>
>>> Just a quick FYI (I will be making the change and testing the change).
>>> The optab names `andc` and `iorc` unfortunately do not work with
>>> scalar modes since there are complex modes which start with c and are
>>> combined with the scalar modes. So for an example a pattern named
>>> `andcsi3` is not for the optab `andc` with the mode of si but rather
>>> for `and` optab and for the mode `csi`. The same issue happens for
>>> `iorc` too.
>>
>> ah, thanks for pointing this out!  I guess a "_" can help, that is:
>>
>> OPTAB_D (andc_optab, "andc_$a3")
>> OPTAB_D (iorc_optab, "iorc_$a3")
>>
>> but the downside is that the naming becomes different from "and$a3"
>> and "ior$a3", so it seems better to use different names like what
>> you proposed.
>>
>>> Thinking out loud on what names we should use instead; `andn` and
>>> `iorn` might be ok? Does anyone else have any suggestions?
>>
>> FWIW, they look good to me.
> 
> Just FYI. I also noticed the powerpc backend could define these optabs
> for scalars and would generate better code for the following
> example (after I finish up my patches):
> ```
> long f1(long a, long b)
> {
> a = ~0x4;
> return a | ~b;
> }
> long f2(long a, long b)
> {
> a = ~0x4;
> return a & ~b;
> }
> ```
> 

Yeah, andc/orc would be better, thanks for the heads up.

BR,
Kewen


Re: [PATCH] optabs/rs6000: Rename iorc and andc to iorn and andn

2024-07-24 Thread Kewen.Lin
Hi Andrew,

on 2024/7/24 10:49, Andrew Pinski wrote:
> When I was trying to add a scalar version of iorc and andc, the optab that
> got matched was for and/ior with the mode of csi and cdi instead of iorc and
> andc optabs for si and di modes. Since csi/cdi are the complex integer modes,
> we need to rename the optabs to be without c there. This changes c to n, which
> is neutral and known not to be the first letter of a mode.
> 
> Bootstrapped and tested on x86_64 and powerpc64le.
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000-builtins.def: s/iorc/iorn/. s/andc/andn/
>   for the code.
>   * config/rs6000/rs6000-string.cc (expand_cmp_vec_sequence): Update
>   to andn.

Nit: s/andn/iorn/

>   * config/rs6000/rs6000.md (andc3): Rename to ...
>   (andn3): This.
>   (iorc3): Rename to ...
>   (iorn3): This.

Thanks for doing this; the rs6000 part of the change is OK (in case you need that).

BR,
Kewen

>   * doc/md.texi: Update documentation for the rename.
>   * internal-fn.def (BIT_ANDC): Rename to ...
>   (BIT_ANDN): This.
>   (BIT_IORC): Rename to ...
>   (BIT_IORN): This.
>   * optabs.def (andc_optab): Rename to ...
>   (andn_optab): This.
>   (iorc_optab): Rename to ...
>   (iorn_optab): This.
>   * gimple-isel.cc (gimple_expand_vec_cond_expr): Update for the
>   renamed internal functions, ANDC/IORC to ANDN/IORN.
> 
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 44 +--
>  gcc/config/rs6000/rs6000-string.cc|  2 +-
>  gcc/config/rs6000/rs6000.md   |  4 +--
>  gcc/doc/md.texi   |  8 ++---
>  gcc/gimple-isel.cc| 12 
>  gcc/internal-fn.def   |  4 +--
>  gcc/optabs.def| 10 --
>  7 files changed, 44 insertions(+), 40 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 77eb0f7e406..ffbeff64d6d 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -518,25 +518,25 @@
>  VAND_V8HI_UNS andv8hi3 {}
>  
>const vsc __builtin_altivec_vandc_v16qi (vsc, vsc);
> -VANDC_V16QI andcv16qi3 {}
> +VANDC_V16QI andnv16qi3 {}
>  
>const vuc __builtin_altivec_vandc_v16qi_uns (vuc, vuc);
> -VANDC_V16QI_UNS andcv16qi3 {}
> +VANDC_V16QI_UNS andnv16qi3 {}
>  
>const vf __builtin_altivec_vandc_v4sf (vf, vf);
> -VANDC_V4SF andcv4sf3 {}
> +VANDC_V4SF andnv4sf3 {}
>  
>const vsi __builtin_altivec_vandc_v4si (vsi, vsi);
> -VANDC_V4SI andcv4si3 {}
> +VANDC_V4SI andnv4si3 {}
>  
>const vui __builtin_altivec_vandc_v4si_uns (vui, vui);
> -VANDC_V4SI_UNS andcv4si3 {}
> +VANDC_V4SI_UNS andnv4si3 {}
>  
>const vss __builtin_altivec_vandc_v8hi (vss, vss);
> -VANDC_V8HI andcv8hi3 {}
> +VANDC_V8HI andnv8hi3 {}
>  
>const vus __builtin_altivec_vandc_v8hi_uns (vus, vus);
> -VANDC_V8HI_UNS andcv8hi3 {}
> +VANDC_V8HI_UNS andnv8hi3 {}
>  
>const vsc __builtin_altivec_vavgsb (vsc, vsc);
>  VAVGSB avgv16qi3_ceil {}
> @@ -1189,13 +1189,13 @@
>  VAND_V2DI_UNS andv2di3 {}
>  
>const vd __builtin_altivec_vandc_v2df (vd, vd);
> -VANDC_V2DF andcv2df3 {}
> +VANDC_V2DF andnv2df3 {}
>  
>const vsll __builtin_altivec_vandc_v2di (vsll, vsll);
> -VANDC_V2DI andcv2di3 {}
> +VANDC_V2DI andnv2di3 {}
>  
>const vull __builtin_altivec_vandc_v2di_uns (vull, vull);
> -VANDC_V2DI_UNS andcv2di3 {}
> +VANDC_V2DI_UNS andnv2di3 {}
>  
>const vd __builtin_altivec_vnor_v2df (vd, vd);
>  VNOR_V2DF norv2df3 {}
> @@ -1975,40 +1975,40 @@
>  NEG_V2DI negv2di2 {}
>  
>const vsc __builtin_altivec_orc_v16qi (vsc, vsc);
> -ORC_V16QI iorcv16qi3 {}
> +ORC_V16QI iornv16qi3 {}
>  
>const vuc __builtin_altivec_orc_v16qi_uns (vuc, vuc);
> -ORC_V16QI_UNS iorcv16qi3 {}
> +ORC_V16QI_UNS iornv16qi3 {}
>  
>const vsq __builtin_altivec_orc_v1ti (vsq, vsq);
> -ORC_V1TI iorcv1ti3 {}
> +ORC_V1TI iornv1ti3 {}
>  
>const vuq __builtin_altivec_orc_v1ti_uns (vuq, vuq);
> -ORC_V1TI_UNS iorcv1ti3 {}
> +ORC_V1TI_UNS iornv1ti3 {}
>  
>const vd __builtin_altivec_orc_v2df (vd, vd);
> -ORC_V2DF iorcv2df3 {}
> +ORC_V2DF iornv2df3 {}
>  
>const vsll __builtin_altivec_orc_v2di (vsll, vsll);
> -ORC_V2DI iorcv2di3 {}
> +ORC_V2DI iornv2di3 {}
>  
>const vull __builtin_altivec_orc_v2di_uns (vull, vull);
> -ORC_V2DI_UNS iorcv2di3 {}
> +ORC_V2DI_UNS iornv2di3 {}
>  
>const vf __builtin_altivec_orc_v4sf (vf, vf);
> -ORC_V4SF iorcv4sf3 {}
> +ORC_V4SF iornv4sf3 {}
>  
>const vsi __builtin_altivec_orc_v4si (vsi, vsi);
> -ORC_V4SI iorcv4si3 {}
> +ORC_V4SI iornv4si3 {}
>  
>const vui __builtin_altivec_orc_v4si_uns (vui, vui);
> -ORC_V4SI_UNS iorcv4si3 {}
> +ORC_V4SI_UNS iornv4si3 {}
>  
>const vss __builtin_altivec_orc_v8hi 

Re: [PATCH FYI] [powerpc] [testsuite] reorder dg directives [PR106069]

2024-07-22 Thread Kewen.Lin
Hi Alexandre,

on 2024/7/23 10:32, Alexandre Oliva wrote:
> 
> The dg-do directive appears after dg-require-effective-target in
> g++.target/powerpc/pr106069.C.  That doesn't work the way that was
> presumably intended.  Both of these directives set dg-do-what, but
> dg-do does so fully and unconditionally, overriding any decisions
> recorded there by earlier directives.  Reorder the directives more
> canonically, so that both take effect.

Thanks for catching and fixing!

BR,
Kewen

> 
> Tested with gcc-13 targeting ppc64-vx7r2 on altivec-less hardware.  I'm
> installing it as obvious, and backporting to the branches that have the
> inconsistent testcase.
> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   PR target/106069
>   * g++.target/powerpc/pr106069.C: Reorder dg directives.
> ---
>  gcc/testsuite/g++.target/powerpc/pr106069.C |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C 
> b/gcc/testsuite/g++.target/powerpc/pr106069.C
> index 537207d2fe838..826379a4479a8 100644
> --- a/gcc/testsuite/g++.target/powerpc/pr106069.C
> +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C
> @@ -1,6 +1,6 @@
> +/* { dg-do run } */
>  /* { dg-options "-O -fno-tree-forwprop -maltivec" } */
>  /* { dg-require-effective-target vmx_hw } */
> -/* { dg-do run } */
>  
>  typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
>  
> 
> 



Re: [PATCH] rs6000, Remove __builtin_vec_set_v1ti,, __builtin_vec_set_v2df, __builtin_vec_set_v2di

2024-07-22 Thread Kewen.Lin
Hi Carl,

on 2024/7/23 01:37, Carl Love wrote:
> 
> Kewen:
> 
> On 7/22/24 2:09 AM, Kewen.Lin wrote:
>> Hi Carl,
>>
>> on 2024/7/18 00:01, Carl Love wrote:
>>> GCC maintainers:
>>>
>>> This patch removes the __builtin_vec_set_v1ti, __builtin_vec_set_v2df and 
>>> __builtin_vec_set_v2di built-ins.  The users should just use normal C-code 
>>> to update the various vector elements.  This change was originally intended 
>>> to be part of the earlier series of cleanup patches.  It was initially 
>>> thought that some additional work would be needed to do some gimple 
>>> generation instead of these built-ins.  However, the existing default code 
>>> generation does produce the needed code.  The code generated with normal 
>>> C-code is as good or better than the code generated with these built-ins.
>> I think we need to expand this a bit:
>>    - For vec_set bif, the equivalent C code is as good as or better than it.
>>    - For vec_insert bif, whose resolving previously made use of vec_set bif
>>      (now removed), it's as good as before with optimization.
>>> The patch has been tested on Power 10 LE with no regressions.
>>>
>>> Please let me know if the patch is acceptable for mainline.  Thanks.
>>>
>>>     Carl
>>>
>>> ---
>>> rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, 
>>> __builtin_vec_set_v2di
>>>
>>> Remove the built-ins, use the default gimple generation instead.
>>>
>>> gcc/ChangeLog:
>>>  * config/rs6000/rs6000-builtins.def (__builtin_vec_set_v1ti,
>>>  __builtin_vec_set_v2df, __builtin_vec_set_v2di): Remove built-in
>>>  definitions.
>>>  * config/rs6000/rs6000-c.cc (resolve_vec_insert):  Remove if
>>>  statemnts for mode == V2DFmode, mode == V2DImode and
>> Nit: s/statemnts/statements/
> 
> OK, fixed
>> Maybe a bit more meaningful like: Remove the handling for constant 
>> vec_insert position
>> with VECTOR_UNIT_VSX_P V1TImode, V2DFmode and V2DImode modes.
> OK, changed
>>
>>
>>>  mode == V1TImode that reference RS6000_BIF_VEC_SET_V2DF,
>>>  RS6000_BIF_VEC_SET_V2DI and RS6000_BIF_VEC_SET_V1TI.
>>> ---
>>>   gcc/config/rs6000/rs6000-builtins.def | 13 -
>>>   gcc/config/rs6000/rs6000-c.cc | 40 ---
>>>   2 files changed, 53 deletions(-)
>>>
>>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>>> b/gcc/config/rs6000/rs6000-builtins.def
>>> index 896d9686ac6..0ebc940f395 100644
>>> --- a/gcc/config/rs6000/rs6000-builtins.def
>>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>>> @@ -1263,19 +1263,6 @@
>>>     const signed long long __builtin_vec_ext_v2di (vsll, signed int);
>>>   VEC_EXT_V2DI nothing {extract}
>>>
>>> -;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in
>>> -;; resolve_vec_insert(), rs6000-c.cc
>>> -;; TODO: Remove VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI once the uses
>>> -;; in resolve_vec_insert are replaced by the equivalent gimple statements.
>>> -  const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>);
>>> -    VEC_SET_V1TI nothing {set}
>>> -
>>> -  const vd __builtin_vec_set_v2df (vd, double, const int<1>);
>>> -    VEC_SET_V2DF nothing {set}
>>> -
>>> -  const vsll __builtin_vec_set_v2di (vsll, signed long long, const int<1>);
>>> -    VEC_SET_V2DI nothing {set}
>>> -
>> Unexpected empty line removed.
> ??  I don't remove the blank line before the removed comment, so there is 
> still a single blank line before the next entry. Specifically, the code with 
> the above removed now looks like:
> 
> ...
>   const signed long long __builtin_vec_ext_v2di (vsll, signed int);
>     VEC_EXT_V2DI nothing {extract}
> 
>   const vsc __builtin_vsx_cmpge_16qi (vsc, vsc);
>     CMPGE_16QI vector_nltv16qi {}
> 
>   const vsll __builtin_vsx_cmpge_2di (vsll, vsll);
>     CMPGE_2DI vector_nltv2di {}
> 
> 
> Which looks OK to me?

ah, I missed that, so it's fine, thanks for clarifying!

>>
>> Similar to vec_init removal, we should also get rid of set bif attribute,
>> bif_is_set and altivec_expand_vec_set_builtin etc.
> That will also require removing:
> 
>  const vsq __builtin_vsx_set_1ti (vsq, signed __int128, co

Re: [RFC/PATCH] isel: Fold more in gimple_expand_vec_cond_expr with andc/iorc

2024-07-22 Thread Kewen.Lin
Hi Andrew,

on 2024/7/23 08:09, Andrew Pinski wrote:
> On Sun, Jun 30, 2024 at 11:17 PM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> As PR115659 shows, assuming c = x CMP y, there are some
>> folding chances for patterns r = c ? 0/z : z/-1:
>>   - For r = c ? 0 : z, it can be folded into r = ~c & z.
>>   - For r = c ? z : -1, it can be folded into r = ~c | z.
>>
>> But BIT_AND/BIT_IOR applied to one BIT_NOT operand is a
>> compound operation, and I'm not sure every target with
>> vector capability has a single vector instruction for it.
>> If not, it's arguable whether it always beats vector
>> selection (e.g. the vector constant gets hoisted or combined
>> and selection has the same latency as a normal logical operation).
>> So IMHO we probably need to query the target with new optabs.
>> This patch introduces new optabs andc and iorc and their
>> corresponding internal functions BIT_{ANDC,IORC} (looking
>> for suggestions for naming the optabs and ifns); if a target
>> defines such optabs for vector modes, it means the target
>> supports these hardware insns and they should be no worse than
>> vector selection.  btw, the rs6000 changes are meant to
>> give an example for a target supporting andc/iorc.
>>
>> Does this sound reasonable?
> 
> Just a quick FYI (I will be making the change and testing the change).
> The optab names `andc` and `iorc` unfortunately do not work with
> scalar modes since there are complex modes which start with c and are
> combined with the scalar modes. So for an example a pattern named
> `andcsi3` is not for the optab `andc` with the mode of si but rather
> for `and` optab and for the mode `csi`. The same issue happens for
> `iorc` too.

ah, thanks for pointing this out!  I guess a "_" can help, that is:

OPTAB_D (andc_optab, "andc_$a3")
OPTAB_D (iorc_optab, "iorc_$a3")

but the downside is that the naming becomes different from "and$a3"
and "ior$a3", so it seems better to use different names like what
you proposed.

> Thinking out loud on what names we should use instead; `andn` and
> `iorn` might be ok? Does anyone else have any suggestions?

FWIW, they look good to me.

BR,
Kewen

> 
> Note I will also be adding the cond version of them since at least for
> aarch64 SVE, bic (andc) can be conditional.
> 
> Thanks,
> Andrew Pinski
> 
>>




PING^3 [PATCH] rs6000: Adjust -fpatchable-function-entry* support for dual entry [PR112980]

2024-07-22 Thread Kewen.Lin
Hi,

Gentle ping this patch:

https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651025.html

BR,
Kewen

on 2024/7/12 00:15, Martin Jambor wrote:
> Hi,
> 
> can I add myself to the bunch of people who are pinging this?  Having
> this in will make our life easier.
> 
> Thanks a lot,
> 
> Martin
> 
> 
> On Wed, May 08 2024, Kewen.Lin wrote:
>> Hi,
>>
>> As discussed in PR112980, although the current
>> implementation for -fpatchable-function-entry* conforms
>> with the documentation (making N NOPs be consecutive),
>> it's inefficient for both kernel and userspace livepatching
>> (see comments in PR for the details).
>>
>> So this patch changes the current implementation by
>> emitting the "before" NOPs before the global entry point and
>> the "after" NOPs after the local entry point.  The new behavior
>> does not keep the NOPs consecutive, so the documentation
>> is updated to emphasize this.
>>
>> Bootstrapped and regress-tested on powerpc64-linux-gnu
>> P8/P9 and powerpc64le-linux-gnu P9 and P10.
>>
>> Is it ok for trunk?  And backporting to active branches
>> after burn-in time?  I guess we should also mention this
>> change in changes.html?
>>
>> BR,
>> Kewen
>> -
>>  PR target/112980
>>
>> gcc/ChangeLog:
>>
>>  * config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue):
>>  Adjust the handling on patch area emitting with dual entry, remove
>>  the restriction on "before" NOPs count, not emit "before" NOPs any
>>  more but only emit "after" NOPs.
>>  * config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry):
>>  Adjust by respecting cfun->machine->stop_patch_area_print.
>>  (rs6000_elf_declare_function_name): For ELFv2 with dual entry, set
>>  cfun->machine->stop_patch_area_print as true.
>>  * config/rs6000/rs6000.h (struct machine_function): Remove member
>>  global_entry_emitted, add new member stop_patch_area_print.
>>  * doc/invoke.texi (option -fpatchable-function-entry): Adjust the
>>  documentation for PowerPC ELFv2 dual entry.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * c-c++-common/patchable_function_entry-default.c: Adjust.
>>  * gcc.target/powerpc/pr99888-4.c: Likewise.
>>  * gcc.target/powerpc/pr99888-5.c: Likewise.
>>  * gcc.target/powerpc/pr99888-6.c: Likewise.
>> ---
>>  gcc/config/rs6000/rs6000-logue.cc | 40 +--
>>  gcc/config/rs6000/rs6000.cc   | 15 +--
>>  gcc/config/rs6000/rs6000.h| 10 +++--
>>  gcc/doc/invoke.texi   |  8 ++--
>>  .../patchable_function_entry-default.c|  3 --
>>  gcc/testsuite/gcc.target/powerpc/pr99888-4.c  |  4 +-
>>  gcc/testsuite/gcc.target/powerpc/pr99888-5.c  |  4 +-
>>  gcc/testsuite/gcc.target/powerpc/pr99888-6.c  |  4 +-
>>  8 files changed, 33 insertions(+), 55 deletions(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000-logue.cc 
>> b/gcc/config/rs6000/rs6000-logue.cc
>> index 60ba15a8bc3..0eb019b44b3 100644
>> --- a/gcc/config/rs6000/rs6000-logue.cc
>> +++ b/gcc/config/rs6000/rs6000-logue.cc
>> @@ -4006,43 +4006,21 @@ rs6000_output_function_prologue (FILE *file)
>>fprintf (file, "\tadd 2,2,12\n");
>>  }
>>
>> -  unsigned short patch_area_size = crtl->patch_area_size;
>> -  unsigned short patch_area_entry = crtl->patch_area_entry;
>> -  /* Need to emit the patching area.  */
>> -  if (patch_area_size > 0)
>> -{
>> -  cfun->machine->global_entry_emitted = true;
>> -  /* As ELFv2 ABI shows, the allowable bytes between the global
>> - and local entry points are 0, 4, 8, 16, 32 and 64 when
>> - there is a local entry point.  Considering there are two
>> - non-prefixed instructions for global entry point prologue
>> - (8 bytes), the count for patchable nops before local entry
>> - point would be 2, 6 and 14.  It's possible to support those
>> - other counts of nops by not making a local entry point, but
>> - we don't have clear use cases for them, so leave them
>> - unsupported for now.  */
>> -  if (patch_area_entry > 0)
>> -{
>> -  if (patch_area_entry != 2
>> -  && patch_area_entry != 6
>> -  && patch_area_entry != 14)
>> - 

Re: [PATCH-2v5, rs6000] Implement optab_isfinite for SFDF and IEEE128

2024-07-22 Thread Kewen.Lin
Hi Haochen,

on 2024/7/18 09:45, HAO CHEN GUI wrote:
> Hi,
>   This patch implements optab_isfinite for SFDF and IEEE128 using
> test data class instructions.
> 
>   Compared with the previous version, the main change is to merge
> the patterns of SFDF and IEEE128 into one.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655780.html
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Implement optab_isfinite for SFDF and IEEE128
> 
> gcc/
>   PR target/97786
>   * config/rs6000/vsx.md (isfinite2): New expand.
> 
> gcc/testsuite/
>   PR target/97786
>   * gcc.target/powerpc/pr97786-4.c: New test.
>   * gcc.target/powerpc/pr97786-5.c: New test.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index d30416a53e7..763cd916c8d 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5304,6 +5304,20 @@ (define_expand "isinf2"
>DONE;
>  })
> 
> +(define_expand "isfinite2"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
> +   (use (match_operand:IEEE_FP 1 ""))]
> +  "TARGET_P9_VECTOR
> +   && (!FLOAT128_IEEE_P (mode) || TARGET_FLOAT128_HW)"
> +{
> +  rtx tmp = gen_reg_rtx (SImode);

Nit: maybe add one comment like

/* It is neither infinite nor NAN.  */

OK for trunk with or without this tweak, thanks.

BR,
Kewen

> +  int mask = VSX_TEST_DATA_CLASS_POS_INF | VSX_TEST_DATA_CLASS_NEG_INF
> +  | VSX_TEST_DATA_CLASS_NAN;
> +  emit_insn (gen_xststdc_ (tmp, operands[1], GEN_INT (mask)));
> +  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
> +  DONE;
> +})
> +
>  ;; The VSX Scalar Test Negative Quad-Precision
>  (define_expand "xststdcnegqp_"
>[(set (match_dup 2)
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-4.c 
> b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c
> new file mode 100644
> index 000..9cdde78257d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-4.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> +/* { dg-require-effective-target powerpc_vsx } */
> +
> +int test1 (double x)
> +{
> +  return __builtin_isfinite (x);
> +}
> +
> +int test2 (float x)
> +{
> +  return __builtin_isfinite (x);
> +}
> +
> +/* { dg-final { scan-assembler-not {\mfcmp} } } */
> +/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-5.c 
> b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c
> new file mode 100644
> index 000..0ef8b86f6cb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-5.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target ppc_float128_hw } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } 
> */
> +/* { dg-require-effective-target powerpc_vsx } */
> +
> +int test1 (long double x)
> +{
> +  return __builtin_isfinite (x);
> +}
> +
> +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
> +/* { dg-final { scan-assembler {\mxststdcqp\M} } } */



Re: [PATCH-3v5, rs6000] Implement optab_isnormal for SFDF and IEEE128

2024-07-22 Thread Kewen.Lin
Hi Haochen,

on 2024/7/18 09:45, HAO CHEN GUI wrote:
> Hi,
>   This patch implements optab_isnormal for SFDF and IEEE128 using
> test data class instructions.
> 
>   Compared with the previous version, the main change is to merge
> the patterns of SFDF and IEEE128 into one.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655781.html
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Implement optab_isnormal for SFDF and IEEE128
> 
> gcc/
>   PR target/97786
>   * config/rs6000/vsx.md (isnormal2): New expand.
> 
> gcc/testsuite/
>   PR target/97786
>   * gcc.target/powerpc/pr97786-7.c: New test.
>   * gcc.target/powerpc/pr97786-8.c: New test.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 763cd916c8d..f818aba9e3e 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5318,6 +5318,23 @@ (define_expand "isfinite2"
>DONE;
>  })
> 
> +(define_expand "isnormal2"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
> +   (use (match_operand:IEEE_FP 1 ""))]
> +  "TARGET_P9_VECTOR
> +   && (!FLOAT128_IEEE_P (mode) || TARGET_FLOAT128_HW)"
> +{
> +  rtx tmp = gen_reg_rtx (SImode);

Nit: maybe add one comment like

/* It is neither NAN, infinite, zero, nor denormal.  */

OK for trunk with or without this tweak, thanks.

BR,
Kewen

> +  int mask = VSX_TEST_DATA_CLASS_NAN
> +  | VSX_TEST_DATA_CLASS_POS_INF | VSX_TEST_DATA_CLASS_NEG_INF
> +  | VSX_TEST_DATA_CLASS_POS_ZERO | VSX_TEST_DATA_CLASS_NEG_ZERO
> +  | VSX_TEST_DATA_CLASS_POS_DENORMAL
> +  | VSX_TEST_DATA_CLASS_NEG_DENORMAL;
> +  emit_insn (gen_xststdc_ (tmp, operands[1], GEN_INT (mask)));
> +  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
> +  DONE;
> +})
> +
>  ;; The VSX Scalar Test Negative Quad-Precision
>  (define_expand "xststdcnegqp_"
>[(set (match_dup 2)
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-7.c 
> b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c
> new file mode 100644
> index 000..eb01eed39d3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> +/* { dg-require-effective-target powerpc_vsx } */
> +
> +int test1 (double x)
> +{
> +  return __builtin_isnormal (x);
> +}
> +
> +int test2 (float x)
> +{
> +  return __builtin_isnormal (x);
> +}
> +
> +/* { dg-final { scan-assembler-not {\mfcmp} } } */
> +/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-8.c 
> b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c
> new file mode 100644
> index 000..eba90d3b1b7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target ppc_float128_hw } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } 
> */
> +/* { dg-require-effective-target powerpc_vsx } */
> +
> +int test1 (long double x)
> +{
> +  return __builtin_isnormal (x);
> +}
> +
> +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
> +/* { dg-final { scan-assembler {\mxststdcqp\M} } } */




Re: [PATCH] rs6000, remove __builtin_vsx_xvcmp* built-ins

2024-07-22 Thread Kewen.Lin
Hi Carl,

on 2024/7/17 23:52, Carl Love wrote:
> GCC maintainers:
> 
> The following patch removes the three __builtin_vsx_xvcmp[eq|ge|gt]sp  
> builtins as they similar to the overloaded vec_cmp[eq|ge|gt] built-ins.  The 
> difference is the overloaded built-ins return a vector of boolean or a vector 
> of long long booleans where as the removed built-ins returned a vector of 
> floats or vector of doubles.
> 
> The tests for __builtin_vsx_xvcmp[eq|ge|gt]sp and 
> __builtin_vsx_xvcmp[eq|ge|gt]dp are updated to use the overloaded 
> vec_cmp[eq|ge|gt] built-in with the required changes for the return type.  
> Note __builtin_vsx_xvcmp[eq|ge|gt]dp are used internally.
> 
> The patches have been tested on a Power 10 LE system with no regressions.
> 
> Please let me know if the patch is acceptable for mainline.  Thanks.
> 
>    Carl
> -
> rs6000, remove __builtin_vsx_xvcmp* built-ins
> 
> This patch removes the built-ins:
>  __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp,
>  __builtin_vsx_xvcmpgtsp.
> 
> which are similar to the overloaded vec_cmpeq, vec_cmpgt and vec_cmpge
> built-ins.

The important point is that vec_cmp{eq,ge,gt} are PVIPR (Power Vector Intrinsic
Programming Reference) built-in functions, which are the recommended interface.

> 
> The difference is that the overloaded built-ins return a vector of
> booleans or a vector of long long boolean depending if the inputs were a
> vector of floats or a vector of doubles.  The removed built-ins
> returned a vector of floats or vector of double for the vector float and
> vector double inputs respectively.

This paragraph is a bit confusing: the previous paragraph says only the *sp
built-ins are removed, and for those the overloaded built-ins can only return
a bool int vector (not a bool long long vector).  It needs some rewording.
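
To make the return-type point concrete, here is a minimal sketch using the GNU C generic vector extensions rather than <altivec.h>, so it is not the rs6000 built-in interface itself.  The function names are hypothetical.  It shows that comparing float lanes yields 32-bit integer mask lanes, while comparing double lanes yields 64-bit mask lanes, which is the analogue of "bool int" versus "bool long long".

```c
/* GNU C vector extension types; the names are chosen for this sketch.  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));
typedef double v2df __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

/* Comparing four floats gives four 32-bit lanes: -1 (all ones) where
   the lanes are equal, 0 where they differ.  */
v4si
cmpeq_v4sf (v4sf a, v4sf b)
{
  return a == b;
}

/* Comparing two doubles gives two 64-bit lanes with the same
   -1 / 0 encoding.  */
v2di
cmpeq_v2df (v2df a, v2df b)
{
  return a == b;
}
```

The mask is an integer vector in both cases, never a float or double vector, which is the behavioural difference from the removed built-ins that the commit message is trying to describe.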

> 
> The __builtin_vsx_xvcmpeqdp, __builtin_vsx_xvcmpgedp and
> __builtin_vsx_xvcmpgtdp are not removed as they are used by the
> overloaded vec_cmpeq, vec_cmpgt and vec_cmpge built-ins.
> 
> The test cases for the __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp,
> __builtin_vsx_xvcmpgtsp, __builtin_vsx_xvcmpeqdp,
> __builtin_vsx_xvcmpgedp and __builtin_vsx_xvcmpgtdp  are changed to use
> the overloaded vec_cmpeq, vec_cmpgt, vec_cmpge built-ins.  Use of the
> overloaded built-ins requires the result to be stored in a vector of
> boolean of the appropriate size or the result must be cast to the return
> type used by the original __builtin_vsx_xvcmp* built-ins.

The ChangeLog entry is missing here.

> ---
>  gcc/config/rs6000/rs6000-builtins.def | 10 ---
>  .../gcc.target/powerpc/vsx-builtin-3.c    | 28 ++-
>  2 files changed, 21 insertions(+), 17 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 77eb0f7e406..896d9686ac6 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1579,30 +1579,20 @@
>    const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd);
>  XVCMPEQDP_P vector_eq_v2df_p {pred}
> 
> -  const vf __builtin_vsx_xvcmpeqsp (vf, vf);
> -    XVCMPEQSP vector_eqv4sf {}
> -
>    const vd __builtin_vsx_xvcmpgedp (vd, vd);
>  XVCMPGEDP vector_gev2df {}
> 
>    const signed int __builtin_vsx_xvcmpgedp_p (signed int, vd, vd);
>  XVCMPGEDP_P vector_ge_v2df_p {pred}
> 
> -  const vf __builtin_vsx_xvcmpgesp (vf, vf);
> -    XVCMPGESP vector_gev4sf {}
> -
>    const signed int __builtin_vsx_xvcmpgesp_p (signed int, vf, vf);
>  XVCMPGESP_P vector_ge_v4sf_p {pred}
> 
>    const vd __builtin_vsx_xvcmpgtdp (vd, vd);
>  XVCMPGTDP vector_gtv2df {}
> -

Unexpected removal of the empty line.

>    const signed int __builtin_vsx_xvcmpgtdp_p (signed int, vd, vd);
>  XVCMPGTDP_P vector_gt_v2df_p {pred}
> 
> -  const vf __builtin_vsx_xvcmpgtsp (vf, vf);
> -    XVCMPGTSP vector_gtv4sf {}
> -
>    const signed int __builtin_vsx_xvcmpgtsp_p (signed int, vf, vf);
>  XVCMPGTSP_P vector_gt_v4sf_p {pred}
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
> b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> index 60f91aad23c..d67f97c8011 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
> @@ -156,13 +156,27 @@ int do_cmp (void)
>  {
>    int i = 0;
> 
> -  d[i][0] = __builtin_vsx_xvcmpeqdp (d[i][1], d[i][2]); i++;
> -  d[i][0] = __builtin_vsx_xvcmpgtdp (d[i][1], d[i][2]); i++;
> -  d[i][0] = __builtin_vsx_xvcmpgedp (d[i][1], d[i][2]); i++;
> -
> -  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
> -  f[i][0] = __builtin_vsx_xvcmpgtsp (f[i][1], f[i][2]); i++;
> -  f[i][0] = __builtin_vsx_xvcmpgesp (f[i][1], f[i][2]); i++;
> +  /* The __builtin_vsx_xvcmp[gt|ge|eq]dp and __builtin_vsx_xvcmp[gt|ge|eq]sp
> + have been removed in favor of the overloaded vec_cmpeq, vec_cmpgt and
> + vec_cmpge built-ins.  The __builtin_vsx_xvcmp* 

Re: [PATCH] rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, __builtin_vec_set_v2di

2024-07-22 Thread Kewen.Lin
Hi Carl,

on 2024/7/18 00:01, Carl Love wrote:
> GCC maintainers:
> 
> This patch removes the __builtin_vec_set_v1ti, __builtin_vec_set_v2df and 
> __builtin_vec_set_v2di built-ins.  The users should just use normal C-code to 
> update the various vector elements.  This change was originally intended to 
> be part of the earlier series of cleanup patches.  It was initially thought 
> that some additional work would be needed to do some gimple generation 
> instead of these built-ins.  However, the existing default code generation 
> does produce the needed code.  The code generated with normal C-code is as 
> good or better than the code generated with these built-ins.

I think we need to expand this a bit:
  - For the vec_set bif, the equivalent C code is as good as or better than it.
  - For the vec_insert bif, whose resolution previously made use of the vec_set
    bif (now removed), the generated code is as good as before with optimization.
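
As an illustration of what "normal C code" means here, the following is a minimal sketch using GNU C vector subscripting.  `insert_v2df` is a hypothetical helper, not an rs6000 built-in; it takes the scalar first and the vector second, as vec_insert does, and reduces the position modulo the element count in the same spirit as the removed wi::umod_trunc (selector, 2) handling.

```c
/* GNU C vector extension type; the name is chosen for this sketch.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Hypothetical helper: write SCALAR into element POS of VEC using
   plain subscripting and return the updated vector.  POS is reduced
   modulo 2, the number of elements.  */
v2df
insert_v2df (double scalar, v2df vec, unsigned int pos)
{
  vec[pos % 2] = scalar;
  return vec;
}
```

Whether the code generated for this is as good as the built-in on a given target is exactly the claim the commit message should spell out; the sketch only shows the source-level form.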

> 
> The patch has been tested on Power 10 LE with no regressions.
> 
> Please let me know if the patch is acceptable for mainline.  Thanks.
> 
>    Carl
> 
> ---
> rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, 
> __builtin_vec_set_v2di
> 
> Remove the built-ins, use the default gimple generation instead.
> 
> gcc/ChangeLog:
>     * config/rs6000/rs6000-builtins.def (__builtin_vec_set_v1ti,
>     __builtin_vec_set_v2df, __builtin_vec_set_v2di): Remove built-in
>     definitions.
>     * config/rs6000/rs6000-c.cc (resolve_vec_insert):  Remove if
>     statemnts for mode == V2DFmode, mode == V2DImode and

Nit: s/statemnts/statements/

Maybe something a bit more meaningful, like: Remove the handling for constant
vec_insert position with VECTOR_UNIT_VSX_P V1TImode, V2DFmode and V2DImode modes.


>     mode == V1TImode that reference RS6000_BIF_VEC_SET_V2DF,
>     RS6000_BIF_VEC_SET_V2DI and RS6000_BIF_VEC_SET_V1TI.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 13 -
>  gcc/config/rs6000/rs6000-c.cc | 40 ---
>  2 files changed, 53 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 896d9686ac6..0ebc940f395 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1263,19 +1263,6 @@
>    const signed long long __builtin_vec_ext_v2di (vsll, signed int);
>  VEC_EXT_V2DI nothing {extract}
> 
> -;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in
> -;; resolve_vec_insert(), rs6000-c.cc
> -;; TODO: Remove VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI once the uses
> -;; in resolve_vec_insert are replaced by the equivalent gimple statements.
> -  const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>);
> -    VEC_SET_V1TI nothing {set}
> -
> -  const vd __builtin_vec_set_v2df (vd, double, const int<1>);
> -    VEC_SET_V2DF nothing {set}
> -
> -  const vsll __builtin_vec_set_v2di (vsll, signed long long, const int<1>);
> -    VEC_SET_V2DI nothing {set}
> -

Unexpected empty line removed.

Similar to the vec_init removal, we should also get rid of the set bif attribute,
bif_is_set, altivec_expand_vec_set_builtin, etc.

BR,
Kewen

>    const vsc __builtin_vsx_cmpge_16qi (vsc, vsc);
>  CMPGE_16QI vector_nltv16qi {}
> 
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index 6229c503bd0..c288acc200b 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -1522,46 +1522,6 @@ resolve_vec_insert (resolution *res, vec 
> *arglist,
>    return error_mark_node;
>  }
> 
> -  /* If we can use the VSX xxpermdi instruction, use that for insert.  */
> -  machine_mode mode = TYPE_MODE (arg1_type);
> -
> -  if ((mode == V2DFmode || mode == V2DImode)
> -  && VECTOR_UNIT_VSX_P (mode)
> -  && TREE_CODE (arg2) == INTEGER_CST)
> -    {
> -  wide_int selector = wi::to_wide (arg2);
> -  selector = wi::umod_trunc (selector, 2);
> -  arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
> -
> -  tree call = NULL_TREE;
> -  if (mode == V2DFmode)
> -    call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V2DF];
> -  else if (mode == V2DImode)
> -    call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V2DI];
> -
> -  /* Note, __builtin_vec_insert_ has vector and scalar types
> -     reversed.  */
> -  if (call)
> -    {
> -      *res = resolved;
> -      return build_call_expr (call, 3, arg1, arg0, arg2);
> -    }
> -    }
> -
> -  else if (mode == V1TImode
> -       && VECTOR_UNIT_VSX_P (mode)
> -       && TREE_CODE (arg2) == INTEGER_CST)
> -    {
> -  tree call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V1TI];
> -  wide_int selector = wi::zero(32);
> -  arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
> -
> -  /* Note, __builtin_vec_insert_ has vector and scalar types
> -     

Re: [PATCH V5] report message for operator %a on unaddressible operand

2024-07-22 Thread Kewen.Lin
Hi Jeff,

on 2024/7/16 13:39, Jiufu Guo wrote:
> Hi,
> 
> For PR96866, when printing asm code for modifier "%a", an addressable
> operand is required.  While the constraint "X" allow any kind of
> operand even which is hard to get the address directly. e.g. extern
> symbol whose address is in TOC.
> An error message would be reported to indicate the invalid asm operand.
> 
> Compare with previous version, test case is updated with -mno-pcrel.
> 
> Bootstrap pass on ppc64{,le}.
> Is this ok for trunk?
> 
> BR,
> Jeff(Jiufu Guo)
> 
>   PR target/96866
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (print_operand_address): Emit message for
>   Unsupported operand.

Nit: s/Unsupported/unsupported/

> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/pr96866-1.c: New test.
>   * gcc.target/powerpc/pr96866-2.c: New test.
> 
> ---
>  gcc/config/rs6000/rs6000.cc  |  7 ++-
>  gcc/testsuite/gcc.target/powerpc/pr96866-1.c | 18 ++
>  gcc/testsuite/gcc.target/powerpc/pr96866-2.c | 13 +
>  3 files changed, 37 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-2.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 117999613d8..7e7c36a1bad 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -14664,7 +14664,12 @@ print_operand_address (FILE *file, rtx x)
>   fprintf (file, "@%s(%s)", SMALL_DATA_RELOC,
>reg_names[SMALL_DATA_REG]);
>else
> - gcc_assert (!TARGET_TOC);
> + {
> +   /* Do not support getting address directly from TOC, emit error.
> +  No more work is needed for !TARGET_TOC. */
> +   if (TARGET_TOC)
> + output_operand_lossage ("%%a requires an address of memory");
> + }
>  }
>else if (GET_CODE (x) == PLUS && REG_P (XEXP (x, 0))
>  && REG_P (XEXP (x, 1)))
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-1.c 
> b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
> new file mode 100644
> index 000..bcebbd6e310
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
> @@ -0,0 +1,18 @@
> +/* The "%a" modifier can't get the address of extern symbol directly from TOC
> +   with -fPIC, even the symbol is propgated for "X" constraint under -O2. */

Nit: s/propgated/propagated/

OK with these nits tweaked, thanks!

BR,
Kewen

> +/* { dg-options "-fPIC -O2 -mno-pcrel" } */
> +
> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  */
> +/* { dg-excess-errors "pr96866-1.c" } */
> +
> +int x[2];
> +
> +int __attribute__ ((noipa))
> +f1 (void)
> +{
> +  int n;
> +  int *p = x;
> +  *p++;
> +  __asm__ volatile("ld %0, %a1" : "=r"(n) : "X"(p));
> +  return n;
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-2.c 
> b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
> new file mode 100644
> index 000..0577fd6d588
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
> @@ -0,0 +1,13 @@
> +/* The "%a" modifier can't get the address of extern symbol directly from TOC
> +   with -fPIC. */
> +/* { dg-options "-fPIC -O2 -mno-pcrel" } */
> +
> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  */
> +/* { dg-excess-errors "pr96866-2.c" } */
> +
> +void
> +f (void)
> +{
> +  extern int x;
> +  __asm__ volatile("#%a0" ::"X"());
> +}



Re: [PATCH] testsuite: powerpc: fix dg-do run typo

2024-07-21 Thread Kewen.Lin
Hi Sam,

on 2024/7/20 07:10, Sam James wrote:
> "Kewen.Lin"  writes:
> 
>> Hi Sam,
> 
> Hi Kewen,
> 
>>
>> on 2024/7/19 11:28, Sam James wrote:
>>> 'dg-run' is not a valid dejagnu directive, 'dg-do run' is needed here
>>> for the test to be executed.
>>>
>>> 2024-07-18  Sam James  
>>>
>>> PR target/108699
>>> * gcc.target/powerpc/pr108699.c: Fix 'dg-run' typo.
>>> ---
>>> Kewen, could you check this on powerpc to ensure it doesn't execute 
>>> beforehand
>>> and now it does? I could do it on powerpc but I don't have anything setup
>>> right now.
>>
>> Oops, thanks for catching and fixing this stupid typo!  Yes, I just 
>> confirmed that,
>> w/ this fix pr108699.exe gets generated and executed (# of expected passes 
>> is changed
>> from 1 to 2).
> 
> Many thanks! Could you push for me please?

Sure, pushed as r15-2190.

BR,
Kewen

> 
>>
>> BR,
>> Kewen
> 
> best,
> sam
> 
>>
>>>
>>>  gcc/testsuite/gcc.target/powerpc/pr108699.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108699.c 
>>> b/gcc/testsuite/gcc.target/powerpc/pr108699.c
>>> index f02bac130cc7..beb8b601fd51 100644
>>> --- a/gcc/testsuite/gcc.target/powerpc/pr108699.c
>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr108699.c
>>> @@ -1,4 +1,4 @@
>>> -/* { dg-run } */
>>> +/* { dg-do run } */
>>>  /* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */
>>>  
>>>  #define N 16
>>>



Re: [PATCH] testsuite: powerpc: fix dg-do run typo

2024-07-18 Thread Kewen.Lin
Hi Sam,

on 2024/7/19 11:28, Sam James wrote:
> 'dg-run' is not a valid dejagnu directive, 'dg-do run' is needed here
> for the test to be executed.
> 
> 2024-07-18  Sam James  
> 
>   PR target/108699
>   * gcc.target/powerpc/pr108699.c: Fix 'dg-run' typo.
> ---
> Kewen, could you check this on powerpc to ensure it doesn't execute beforehand
> and now it does? I could do it on powerpc but I don't have anything setup
> right now.

Oops, thanks for catching and fixing this stupid typo!  Yes, I just confirmed
that, w/ this fix, pr108699.exe gets generated and executed (# of expected
passes changes from 1 to 2).

BR,
Kewen

> 
>  gcc/testsuite/gcc.target/powerpc/pr108699.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108699.c 
> b/gcc/testsuite/gcc.target/powerpc/pr108699.c
> index f02bac130cc7..beb8b601fd51 100644
> --- a/gcc/testsuite/gcc.target/powerpc/pr108699.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr108699.c
> @@ -1,4 +1,4 @@
> -/* { dg-run } */
> +/* { dg-do run } */
>  /* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */
>  
>  #define N 16
> 



Re: [PATCH 1/2] PR 115800: Fix libgfortran build using --with-cpu=power5

2024-07-18 Thread Kewen.Lin
Hi Mike,

I guess you should CC fort...@gcc.gnu.org as well.

on 2024/7/11 01:25, Michael Meissner wrote:
> If you build a little endian compiler and select a default CPU of power5
> (i.e. --with-cpu=power5), GCC cannot be built.  The reason is that both the
> libgfortran and libstdc++-v3 libraries assume that all little endian powerpc
> builds support IEEE 128-bit floating point.
> 
> However, if the default cpu does not support the VSX instruction set, then we
> cannot build the IEEE 128-bit libraries.  This patch fixes the libgfortran
> library so if the GCC compiler does not support IEEE 128-bit floating point, 
> the
> IEEE 128-bit floating point libraries are not built.  A companion patch will 
> fix
> the libstdc++-v3 library.
> 
> I have built these patches on a little endian system, doing both normal 
> builds,
> and making a build with a power5 default.  There was no regression in the 
> normal
> builds.  I have also built a big endian GCC compiler and there was no 
> regression
> there.  Can I check this patch into the trunk?
> 
> 2024-07-10  Michael Meissner  
> 
> libgfortran/
> 
>   PR target/115800
>   * configure.ac (powerpc64le*-linux*): Check to see that the compiler
>   uses VSX before enabling IEEE 128-bit support.
>   * configure: Regenerate.
>   * kinds-override.h (GFC_REAL_17): Add check for __VSX__.
>   * libgfortran.h (POWER_IEEE128): Likewise.
> 
> ---
>  libgfortran/configure| 7 +--
>  libgfortran/configure.ac | 3 +++
>  libgfortran/kinds-override.h | 2 +-
>  libgfortran/libgfortran.h| 2 +-
>  4 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/libgfortran/configure b/libgfortran/configure
> index 11a1bc5f070..2708e5c7eca 100755
> --- a/libgfortran/configure
> +++ b/libgfortran/configure
> @@ -5981,6 +5981,9 @@ if test "x$GCC" = "xyes"; then
>  #if __SIZEOF_LONG_DOUBLE__ != 16
>  #error long double is double
>  #endif
> +#if !defined(__VSX__)
> +#error VSX is not available
> +#endif

All the touched code cares about the type _Float128, which is available only if
TARGET_FLOAT128_TYPE is set (TARGET_FLOAT128_TYPE depends on VSX).  I think
we should check for the macro __FLOAT128_TYPE__ instead (we define this macro
when TARGET_FLOAT128_TYPE is set).  IMHO it looks more meaningful, and the
same applies to the libstdc++ sub-patch.
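
A minimal sketch of the suggested guard follows.  It is not the libgfortran code: `HAVE_FLOAT128_SKETCH`, `real17_sketch` and `float128_type_available` are hypothetical names, and the real headers would keep their existing endianness and long double size conditions alongside the macro test.

```c
/* Key the _Float128 typedef on the type-availability macro itself
   rather than on __VSX__.  On targets where the compiler does not
   define __FLOAT128_TYPE__, the typedef is simply skipped.  */
#if defined (__FLOAT128_TYPE__)
# define HAVE_FLOAT128_SKETCH 1
typedef _Float128 real17_sketch;
#else
# define HAVE_FLOAT128_SKETCH 0
#endif

/* Report at run time which branch was taken at compile time.  */
int
float128_type_available (void)
{
  return HAVE_FLOAT128_SKETCH;
}
```

The design point is that the guard names the feature the code actually depends on (the type), not a hardware facility that merely implies it.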

BR,
Kewen

>  int
>  main ()
>  {
> @@ -12847,7 +12850,7 @@ else
>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>lt_status=$lt_dlunknown
>cat > conftest.$ac_ext <<_LT_EOF
> -#line 12850 "configure"
> +#line 12853 "configure"
>  #include "confdefs.h"
>  
>  #if HAVE_DLFCN_H
> @@ -12953,7 +12956,7 @@ else
>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>lt_status=$lt_dlunknown
>cat > conftest.$ac_ext <<_LT_EOF
> -#line 12956 "configure"
> +#line 12959 "configure"
>  #include "confdefs.h"
>  
>  #if HAVE_DLFCN_H
> diff --git a/libgfortran/configure.ac b/libgfortran/configure.ac
> index cca1ea0ea97..cfaeb9717ab 100644
> --- a/libgfortran/configure.ac
> +++ b/libgfortran/configure.ac
> @@ -148,6 +148,9 @@ if test "x$GCC" = "xyes"; then
>AC_PREPROC_IFELSE(
>  [AC_LANG_PROGRAM([[#if __SIZEOF_LONG_DOUBLE__ != 16
>  #error long double is double
> +#endif
> +#if !defined(__VSX__)
> +#error VSX is not available
>  #endif]],
>   [[(void) 0;]])],
>  [AM_FCFLAGS="$AM_FCFLAGS -mabi=ibmlongdouble -mno-gnu-attribute";
> diff --git a/libgfortran/kinds-override.h b/libgfortran/kinds-override.h
> index f6b4956c5ca..51f440e5323 100644
> --- a/libgfortran/kinds-override.h
> +++ b/libgfortran/kinds-override.h
> @@ -30,7 +30,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
>  #endif
>  
>  /* Keep these conditions on one line so grep can filter it out.  */
> -#if defined(__powerpc64__)  && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__  && 
> __SIZEOF_LONG_DOUBLE__ == 16
> +#if defined(__powerpc64__)  && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__  && 
> __SIZEOF_LONG_DOUBLE__ == 16 && defined(__VSX__)
>  typedef _Float128 GFC_REAL_17;
>  typedef _Complex _Float128 GFC_COMPLEX_17;
>  #define HAVE_GFC_REAL_17
> diff --git a/libgfortran/libgfortran.h b/libgfortran/libgfortran.h
> index 5c59ec26e16..23660335243 100644
> --- a/libgfortran/libgfortran.h
> +++ b/libgfortran/libgfortran.h
> @@ -104,7 +104,7 @@ typedef off_t gfc_offset;
>  #endif
>  
>  #if defined(__powerpc64__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ \
> -&& defined __GLIBC_PREREQ
> +&& defined __GLIBC_PREREQ && defined(__VSX__)
>  #if __GLIBC_PREREQ (2, 32)
>  #define POWER_IEEE128 1
>  #endif




[PATCH] rs6000: Use standard name uabd for vabsdu insns

2024-07-18 Thread Kewen.Lin
Hi,

r14-1832 adds a recognition pattern, ifn and optab for ABD
(ABsolute Difference).  We have had some vector absolute
difference unsigned instructions since ISA 3.0, but as the
associated test cases show, they are not exploited well
because we don't define them with a standard name.  So this
patch renames the pattern to the standard name first.  It also
merges the define_expand and define_insn, as a separate
define_expand isn't needed.  Besides, it adjusts the RTL
pattern to use generic umax and umin rather than
UNSPEC_VADU, which is more meaningful and can catch umin/umax
opportunities.
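
For reference, the umax/umin formulation of unsigned absolute difference can be sketched in scalar C as below.  This is an illustration only, not one of the new test cases; `uabd_u8` and `uabd_u8_array` are hypothetical names, and no claim is made about which exact source forms the vectorizer recognizes as ABD.

```c
#include <stddef.h>

/* Unsigned absolute difference written as max (a, b) - min (a, b),
   the same shape as the (minus (umax ...) (umin ...)) RTL pattern.
   The result never wraps because max >= min.  */
unsigned char
uabd_u8 (unsigned char a, unsigned char b)
{
  unsigned char max = a > b ? a : b;
  unsigned char min = a > b ? b : a;
  return (unsigned char) (max - min);
}

/* Element-wise form over arrays, the kind of loop a vectorizer would
   be looking at.  */
void
uabd_u8_array (unsigned char *restrict out, const unsigned char *restrict x,
               const unsigned char *restrict y, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = uabd_u8 (x[i], y[i]);
}
```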

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9
and powerpc64le-linux-gnu P9/P10.

I'm going to push this next week if no objections.

BR,
Kewen
-

gcc/ChangeLog:

* config/rs6000/altivec.md (p9_vadu3): Rename to ...
(uabd3): ... this.  Update RTL pattern with umin and umax rather
than UNSPEC_VADU.
(vadu3): Remove.
(UNSPEC_VADU): Remove.
(usadv16qi): Replace gen_p9_vaduv16qi3 with gen_uabdv16qi3.
(usadv8hi): Replace gen_p9_vaduv8hi3 with gen_uabdv8hi3.
* config/rs6000/rs6000-builtins.def (__builtin_altivec_vadub): Replace
expander with uabdv16qi3.
(__builtin_altivec_vaduh): Adjust expander with uabdv8hi3.
(__builtin_altivec_vaduw): Adjust expander with uabdv4si3.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/abd-vectorize-1.c: New test.
* gcc.target/powerpc/abd-vectorize-2.c: New test.
---
 gcc/config/rs6000/altivec.md  | 25 +
 gcc/config/rs6000/rs6000-builtins.def |  6 +--
 .../gcc.target/powerpc/abd-vectorize-1.c  | 27 ++
 .../gcc.target/powerpc/abd-vectorize-2.c  | 37 +++
 4 files changed, 77 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/abd-vectorize-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/abd-vectorize-2.c

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 5af9bf920a2..aa9d8fffc90 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -119,7 +119,6 @@ (define_c_enum "unspec"
UNSPEC_STVLXL
UNSPEC_STVRX
UNSPEC_STVRXL
-   UNSPEC_VADU
UNSPEC_VSLV
UNSPEC_VSRV
UNSPEC_VMULWHUB
@@ -4323,19 +4322,15 @@ (define_insn "*p8v_clz2"
   [(set_attr "type" "vecsimple")])

 ;; Vector absolute difference unsigned
-(define_expand "vadu3"
-  [(set (match_operand:VI 0 "register_operand")
-(unspec:VI [(match_operand:VI 1 "register_operand")
-   (match_operand:VI 2 "register_operand")]
- UNSPEC_VADU))]
-  "TARGET_P9_VECTOR")
-
-;; Vector absolute difference unsigned
-(define_insn "p9_vadu3"
+(define_insn "uabd3"
   [(set (match_operand:VI 0 "register_operand" "=v")
-(unspec:VI [(match_operand:VI 1 "register_operand" "v")
-   (match_operand:VI 2 "register_operand" "v")]
- UNSPEC_VADU))]
+   (minus:VI
+ (umax:VI
+   (match_operand:VI 1 "register_operand" "v")
+   (match_operand:VI 2 "register_operand" "v"))
+ (umin:VI
+   (match_dup 1)
+   (match_dup 2))))]
   "TARGET_P9_VECTOR"
   "vabsdu %0,%1,%2"
   [(set_attr "type" "vecsimple")])
@@ -4500,7 +4495,7 @@ (define_expand "usadv16qi"
   rtx zero = gen_reg_rtx (V4SImode);
   rtx psum = gen_reg_rtx (V4SImode);

-  emit_insn (gen_p9_vaduv16qi3 (absd, operands[1], operands[2]));
+  emit_insn (gen_uabdv16qi3 (absd, operands[1], operands[2]));
   emit_insn (gen_altivec_vspltisw (zero, const0_rtx));
   emit_insn (gen_altivec_vsum4ubs (psum, absd, zero));
   emit_insn (gen_addv4si3 (operands[0], psum, operands[3]));
@@ -4521,7 +4516,7 @@ (define_expand "usadv8hi"
   rtx zero = gen_reg_rtx (V4SImode);
   rtx psum = gen_reg_rtx (V4SImode);

-  emit_insn (gen_p9_vaduv8hi3 (absd, operands[1], operands[2]));
+  emit_insn (gen_uabdv8hi3 (absd, operands[1], operands[2]));
   emit_insn (gen_altivec_vspltisw (zero, const0_rtx));
   emit_insn (gen_altivec_vsum4shs (psum, absd, zero));
   emit_insn (gen_addv4si3 (operands[0], psum, operands[3]));
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 77eb0f7e406..07d18a8eced 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2377,13 +2377,13 @@
 VFIRSTMISMATCHOREOSINDEX_V4SI first_mismatch_or_eos_index_v4si {}

   const vsc __builtin_altivec_vadub (vsc, vsc);
-VADUB vaduv16qi3 {}
+VADUB uabdv16qi3 {}

   const vss __builtin_altivec_vaduh (vss, vss);
-VADUH vaduv8hi3 {}
+VADUH uabdv8hi3 {}

   const vsi __builtin_altivec_vaduw (vsi, vsi);
-VADUW vaduv4si3 {}
+VADUW uabdv4si3 {}

   const vsll __builtin_altivec_vbpermd (vsll, vsc);
 VBPERMD altivec_vbpermd {}
diff --git a/gcc/testsuite/gcc.target/powerpc/abd-vectorize-1.c 
b/gcc/testsuite/gcc.target/powerpc/abd-vectorize-1.c
new file mode 100644
index 000..d63b887b4b8
--- 

Re: [PATCH v2] rs6000: Fix .machine cpu selection w/ altivec [PR97367]

2024-07-18 Thread Kewen.Lin
on 2024/7/18 22:14, Peter Bergner wrote:
> On 7/18/24 4:14 AM, Kewen.Lin wrote:
>>> +/* { dg-final { scan-assembler {\.\mmachine power4\M} } } */
>>> +/* { dg-final { scan-assembler {\.\mmachine altivec\M} } } */
>>
>> Nit: Both \m looks useless and can be removed.
> 
> Fine with me.  Is that because the \. acts like a \m?

Yes, \m makes sure "machine" starts at a word boundary, which excludes
unexpected matches on words like "timemachine", but the preceding
dot (\.) already guarantees that.
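
The argument can be checked with a small hand-rolled matcher in C.  This is a sketch under stated assumptions, not how DejaGnu or Tcl implements \m: `matches` is a hypothetical function, word characters are taken to be alphanumerics and underscore, and only the "start of word" half of the constraint is modelled.

```c
#include <ctype.h>
#include <string.h>

/* Word characters for this sketch: alphanumerics and underscore.  */
static int
is_word_char (int c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* Return 1 if PREFIX immediately followed by WORD occurs in TEXT.
   When REQUIRE_WORD_START is nonzero, additionally require that WORD
   begins at the start of a word, i.e. the character just before it is
   not a word character (or there is none).  */
int
matches (const char *text, const char *prefix, const char *word,
         int require_word_start)
{
  size_t plen = strlen (prefix);
  size_t wlen = strlen (word);
  size_t tlen = strlen (text);

  for (size_t i = 0; i + plen + wlen <= tlen; i++)
    {
      if (strncmp (text + i, prefix, plen) != 0
          || strncmp (text + i + plen, word, wlen) != 0)
        continue;
      size_t w = i + plen;	/* Index where WORD starts.  */
      if (!require_word_start || w == 0 || !is_word_char (text[w - 1]))
        return 1;
    }
  return 0;
}
```

With an empty prefix, the boundary check is what rejects "timemachine".  With "." as the prefix, the character before "machine" is always the dot, which is not a word character, so the boundary check can never change the outcome.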

BR,
Kewen

> 
> 
> 
>>> Ok for trunk and the release branches after some trunk burn-in time?
>>
>> OK for all with/without the below minor nit tweaked.
> 
> Great, thanks!
> 
> 
> Peter
> 
> 



Re: [PATCH ver 3] rs6000, update effective target for tests builtins-10*.c and vec_perm-runnable-i128.c

2024-07-18 Thread Kewen.Lin
Hi Carl,

on 2024/7/18 00:15, Carl Love wrote:
> GCC maintainers:
> 
> Version 3, in version 2, the ChangeLog didn't get updated to remove the LP64 
> references.  Fixed that and updated the patch description per the feedback 
> from Peter.
> 
> Version 2, removed the lp64 from the target per discussion.  Tested and it is 
> not needed.  The int128 qualifier is sufficient for the test to report as 
> unsupported on a 32-bit Power system.
> 
> The tests:
> 
>   tests builtins-10-runnable.c
>   tests builtins-10.c
>   vec_perm-runnable-i128.c
> 
> generate the following errors when run on a 32-bit BE Power system with GCC 
> configured with multilib enabled.
> 
> FAIL: gcc.target/powerpc/builtins-10-runnable.c (test for excess errors)
> FAIL: gcc.target/powerpc/builtins-10.c (test for excess errors)
> FAIL: gcc.target/powerpc/vec_perm-runnable-i128.c (test for excess errors)

For BE testing, I always test both 32-bit and 64-bit, e.g.:
  make check RUNTESTFLAGS="--target_board=unix'{-m32,-m64}'"

> 
> The tests use the __int128 type which is not supported on 32-bit systems.  
> The test for int128 and lp64 was added to the test cases to disable the test 
> on 32-bit systems and systems that do not support the __int128 type.  The 
> three tests now report "# of unsupported tests 1".

Nit: "... int128 and lp64 was added ..." still mentioned lp64, but I think ...

> 
> The patch has been tested on a Power 9 BE system with multilib enabled for 
> GCC and on a Power 10 LE 64-bit configuration with no regression failures.
> 
> Please let me know if the patch is acceptable for mainline. Thanks.
> 
>    Carl
> --
> rs6000, update effective target for tests builtins-10*.c and 
> vec_perm-runnable-i128.c
> 
> The tests:
> 
>   tests builtins-10-runnable.c
>   tests builtins-10.c
>   vec_perm-runnable-i128.c
> 
> use __int128 types that are not supported on all platforms.  Update the
> tests to check int128 effective target to avoid unsupported type errors
> on unsupported platforms.

... here is the actual content for commit log, so OK for trunk, thanks!

BR,
Kewen
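
For context, the compile-time side of "does this target have __int128" can be sketched in C as follows.  This assumes that testing the predefined macro __SIZEOF_INT128__ is a fair stand-in for what the int128 effective target checks; `int128_supported` is a hypothetical name and not part of the testsuite.

```c
/* Report whether the compiler provides the __int128 type for the
   current target.  GCC predefines __SIZEOF_INT128__ only where the
   type exists, so a 32-bit (-m32) build takes the second branch.  */
int
int128_supported (void)
{
#ifdef __SIZEOF_INT128__
  return sizeof (__int128) == 16;
#else
  return 0;
#endif
}
```

A test that uses __int128 unconditionally fails to compile where this would return 0, which is why gating it on the effective target turns the excess-error FAILs into UNSUPPORTED.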

> 
> gcc/testsuite/ChangeLog:
>     * gcc.target/powerpc/builtins-10-runnable.c: Add
>     target int128.
>     * gcc.target/powerpc/builtins-10.c: Add
>     target int128.
>     * gcc.target/powerpc/vec_perm-runnable-i128: Add
>     target int128.
> ---
>  gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c   | 2 +-
>  gcc/testsuite/gcc.target/powerpc/builtins-10.c    | 2 +-
>  gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c 
> b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
> index dede08358e1..e2d3c990852 100644
> --- a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
> @@ -1,4 +1,4 @@
> -/* { dg-do run } */
> +/* { dg-do run { target int128 } } */
>  /* { dg-require-effective-target vmx_hw } */
>  /* { dg-options "-maltivec -O2 " } */
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10.c 
> b/gcc/testsuite/gcc.target/powerpc/builtins-10.c
> index b00f53cfc62..007892e2731 100644
> --- a/gcc/testsuite/gcc.target/powerpc/builtins-10.c
> +++ b/gcc/testsuite/gcc.target/powerpc/builtins-10.c
> @@ -1,4 +1,4 @@
> -/* { dg-do compile } */
> +/* { dg-do compile { target int128 } } */
>  /* { dg-options "-O2 -maltivec" } */
>  /* { dg-require-effective-target powerpc_altivec } */
>  /* { dg-final { scan-assembler-times "xxsel" 6 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c 
> b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
> index 0e0d77bcb84..df1bf873cfc 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
> @@ -1,4 +1,4 @@
> -/* { dg-do run } */
> +/* { dg-do run { target  int128 } } */
>  /* { dg-require-effective-target vmx_hw } */
>  /* { dg-options "-maltivec -O2 " } */
> 



Re: [PATCH v2] rs6000: Fix .machine cpu selection w/ altivec [PR97367]

2024-07-18 Thread Kewen.Lin
Hi Peter,

on 2024/7/13 05:48, Peter Bergner wrote:
> René's patch seems to have stalled, so here is an updated version of the
> patch with the requested changes to his patch.
> 
> I'll note I have added an additional code change, which is to also emit a
> ".machine altivec" if Altivec is enabled.  The problem this fixes is for
> cpus like the G5, which is basically a power4 plus an Altivec unit, its
> ".machine power4" doesn't enable the assembler to recognize Altivec insns.
> That isn't a problem if you use gcc -mcpu=G5 to assemble the assembler file,
> since gcc passes -maltivec to the assembler.  However, if you try to assemble
> the assembler file with as by hand, you'll get "unrecognized opcode" errors.
> I did not do the same for VSX, since all ".machine " for cpus that
> support VSX already enable VSX insn recognition, so it's not needed.

Sounds great, thanks for improving it.
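
To make the G5 case concrete, here is a minimal standalone C sketch of the behaviour described above. It is not the GCC code: the SKETCH_MASK_* bits and the two sketch_* functions are hypothetical stand-ins for rs6000_isa_flags, rs6000_machine_from_flags and emit_asm_machine, and only two CPU names are modelled.

```c
#include <stdio.h>

/* Hypothetical flag bits; the real OPTION_MASK_* values differ.  */
#define SKETCH_MASK_ALTIVEC (1u << 0)
#define SKETCH_MASK_VSX     (1u << 1)

/* Pick the base .machine name.  Altivec is masked off first so that it
   never influences the choice.  Only two CPU names are modelled here.  */
static const char *
sketch_machine_from_flags (unsigned flags)
{
  flags &= ~SKETCH_MASK_ALTIVEC;
  return (flags & SKETCH_MASK_VSX) ? "power7" : "power4";
}

/* Write the directives into BUF and return the snprintf-style length.
   An extra ".machine altivec" is added whenever Altivec is enabled.  */
static int
sketch_emit_asm_machine (char *buf, size_t len, unsigned flags)
{
  int n = snprintf (buf, len, "\t.machine %s\n",
                    sketch_machine_from_flags (flags));
  if ((flags & SKETCH_MASK_ALTIVEC) && n >= 0 && (size_t) n < len)
    n += snprintf (buf + n, len - (size_t) n, "\t.machine altivec\n");
  return n;
}
```

With only SKETCH_MASK_ALTIVEC set (roughly the G5 situation), the sketch produces ".machine power4" followed by ".machine altivec", which is the pair the new test scans for.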

> 
> 
> rs6000: Fix .machine cpu selection w/ altivec [PR97367]
> 
> There are various non-IBM CPUs with altivec, so we cannot use that
> flag to determine which .machine cpu to use, so ignore it.
> Emit an additional ".machine altivec" if Altivec is enabled so
> that the assembler doesn't require an explicit -maltivec option
> to assemble any Altivec instructions for those targets where
> the ".machine cpu" is insufficient to enable Altivec.  For example,
> -mcpu=G5 emits a ".machine power4".
> 
> This passed bootstrap and regtesting on powerpc64-linux (running the testsuite
> in both 32-bit and 64-bit modes) with no regressions.
> 
> Ok for trunk and the release branches after some trunk burn-in time?

OK for all with/without the below minor nit tweaked.

> 
> Peter
> 
> 
> 2024-07-12  René Rebe  
>   Peter Bergner  
> 
> gcc/
>   PR target/97367
>   * config/rs6000/rs6000.cc (rs6000_machine_from_flags): Do not consider
>   OPTION_MASK_ALTIVEC.
>   (emit_asm_machine): For Altivec compiles, emit a ".machine altivec".
> 
> gcc/testsuite/
>   PR target/97367
>   * gcc.target/powerpc/pr97367.c: New test.
> 
> Signed-off-by: René Rebe
> ---
>  gcc/config/rs6000/rs6000.cc|  5 -
>  gcc/testsuite/gcc.target/powerpc/pr97367.c | 13 +
>  2 files changed, 17 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr97367.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 2cbea6ea2d7..2cb8f35739b 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -5888,7 +5888,8 @@ rs6000_machine_from_flags (void)
>HOST_WIDE_INT flags = rs6000_isa_flags;
>  
>/* Disable the flags that should never influence the .machine selection.  */
> -  flags &= ~(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT | OPTION_MASK_ISEL);
> +  flags &= ~(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT | OPTION_MASK_ISEL
> +  | OPTION_MASK_ALTIVEC);
>  
>if ((flags & (ISA_3_1_MASKS_SERVER & ~ISA_3_0_MASKS_SERVER)) != 0)
>  return "power10";
> @@ -5913,6 +5914,8 @@ void
>  emit_asm_machine (void)
>  {
>fprintf (asm_out_file, "\t.machine %s\n", rs6000_machine);
> +  if (TARGET_ALTIVEC)
> +fprintf (asm_out_file, "\t.machine altivec\n");
>  }
>  #endif
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97367.c b/gcc/testsuite/gcc.target/powerpc/pr97367.c
> new file mode 100644
> index 000..f9118dbcdec
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97367.c
> @@ -0,0 +1,13 @@
> +/* PR target/97367 */
> +/* { dg-options "-mdejagnu-cpu=G5" } */
> +
> +/* Verify we emit a ".machine power4" and ".machine altivec" rather
> +   than a ".machine power7".  */
> +
> +int dummy (void)
> +{
> +  return 0;
> +}
> +
> +/* { dg-final { scan-assembler {\.\mmachine power4\M} } } */
> +/* { dg-final { scan-assembler {\.\mmachine altivec\M} } } */

Nit: Both \m look useless and can be removed.

BR,
Kewen



Re: [PATCHv2, rs6000] Optimize vector construction with two vector doubleword loads [PR103568]

2024-07-18 Thread Kewen.Lin
Hi Haochen,

on 2024/5/31 11:25, HAO CHEN GUI wrote:
> Hi,
>   This patch optimizes vector construction with two vector doubleword loads.
> It generates an optimal insn sequence as "xxlor" has lower latency than
> "mtvsrdd" on Power10.
> 
>   Compared with previous version, the main change is to use "isa" attribute
> to guard "lxsd" and "lxsdx".
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653103.html
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. OK for the trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Optimize vector construction with two vector doubleword loads
> 
> When constructing a vector by two doublewords from memory, originally it
> does
>   ld 10,0(3)
>   ld 9,0(4)
>   mtvsrdd 34,9,10
> 
> An optimal sequence on Power10 should be
>   lxsd 0,0(4)
>   lxvrdx 1,0,3
>   xxlor 34,1,32

Thanks for doing this.  As noted in comment #c0 of the PR, could you also
evaluate whether it actually helps SPEC2017 benchmark 510.parest_r on Power10?

> 
> This patch does this optimization by insn combine and split.
> 
> gcc/
>   PR target/103568
>   * config/rs6000/vsx.md (vsx_ld_lowpart_zero_): New insn
>   pattern.
>   (vsx_ld_highpart_zero_): New insn pattern.
>   (vsx_concat_mem_): New insn_and_split pattern.
> 
> gcc/testsuite/
>   PR target/103568
>   * gcc.target/powerpc/pr103568.c: New test.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index f135fa079bd..f9a2a260e89 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -1395,6 +1395,27 @@ (define_insn "vsx_ld_elemrev_v2di"
>"lxvd2x %x0,%y1"
>[(set_attr "type" "vecload")])
> 
> +(define_insn "vsx_ld_lowpart_zero_"

Nit: Maybe just use mnemonic in the name? 

> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=v,wa")
> + (vec_concat:VSX_D
> +   (match_operand: 1 "memory_operand" "wY,Z")
> +   (match_operand: 2 "zero_constant" "j,j")))]

I think we should consider both BE and LE here.  This pattern only
matches the underlying insn on BE, so we need a new pattern for LE
with operand 1 and operand 2 swapped.

> +  ""
> +  "@
> +   lxsd %0,%1
> +   lxsdx %x0,%y1"
> +  [(set_attr "type" "vecload,vecload")
> +   (set_attr "isa" "p9v,p7v")])

Guarding this semantic with a pre-Power10 isa is wrong here: these two
insns are not guaranteed to zero doubleword 1 on pre-Power10 CPUs such
as Power9.

ISA 3.1

  The contents of doubleword element 1 of VSR[VRT+32]
  are set to 0.

ISA 3.0...2.06

  The contents of doubleword element 1 of VSR[XT] are
  undefined.
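
A minimal C model of why the undefined doubleword matters, assuming the concatenation is built as the OR of two half-populated vectors as in the proposed split.  The sketch_* names are hypothetical, the doubleword numbering simply follows the ISA text quoted above, and no claim is made about which insn maps to which helper.

```c
#include <stdint.h>

/* Two 64-bit doubleword elements, numbered as in the ISA text.  */
typedef struct { uint64_t dw[2]; } sketch_v2dw;

/* Model of a scalar load into doubleword 0.  DW1 is whatever ends up in
   doubleword 1: 0 under ISA 3.1, any value under ISA 3.0...2.06.  */
static sketch_v2dw
sketch_load_dw0 (uint64_t mem, uint64_t dw1)
{
  sketch_v2dw v = { { mem, dw1 } };
  return v;
}

/* Model of a load into doubleword 1 that zeroes doubleword 0.  */
static sketch_v2dw
sketch_load_dw1 (uint64_t mem)
{
  sketch_v2dw v = { { 0, mem } };
  return v;
}

/* Build {a, b} as the OR of the two half-populated vectors.  */
static sketch_v2dw
sketch_concat_by_or (uint64_t a, uint64_t b, uint64_t dw1_after_first_load)
{
  sketch_v2dw x = sketch_load_dw0 (a, dw1_after_first_load);
  sketch_v2dw y = sketch_load_dw1 (b);
  sketch_v2dw r = { { x.dw[0] | y.dw[0], x.dw[1] | y.dw[1] } };
  return r;
}
```

When the third argument is 0 the result is exactly {a, b}; with any other value the stale bits leak into doubleword 1, which is why the OR-based sequence is only safe where the zeroing is architecturally guaranteed.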

> +
> +(define_insn "vsx_ld_highpart_zero_"
> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
> + (vec_concat:VSX_D
> +   (match_operand: 1 "zero_constant" "j")
> +   (match_operand: 2 "memory_operand" "Z")))]

Likewise on the pattern semantic.

> +  "TARGET_POWER10"
> +  "lxvrdx %x0,%y2"
> +  [(set_attr "type" "vecload")])
> +
>  (define_insn "vsx_ld_elemrev_v1ti"
>[(set (match_operand:V1TI 0 "vsx_register_operand" "=wa")
>  (vec_select:V1TI
> @@ -3063,6 +3084,26 @@ (define_insn "vsx_concat_"
>  }
>[(set_attr "type" "vecperm,vecmove")])
> 
> +(define_insn_and_split "vsx_concat_mem_"
> +  [(set (match_operand:VSX_D 0 "vsx_register_operand" "=v,wa")
> + (vec_concat:VSX_D
> +   (match_operand: 1 "memory_operand" "wY,Z")
> +   (match_operand: 2 "memory_operand" "Z,Z")))]
> +  "TARGET_POWER10 && can_create_pseudo_p ()"
> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +{
> +  rtx tmp1 = gen_reg_rtx (mode);
> +  rtx tmp2 = gen_reg_rtx (mode);
> +  emit_insn (gen_vsx_ld_highpart_zero_ (tmp1, CONST0_RTX (mode),
> +   operands[1]));
> +  emit_insn (gen_vsx_ld_lowpart_zero_ (tmp2, operands[2],
> +  CONST0_RTX (mode)));
> +  emit_insn (gen_ior3 (operands[0], tmp1, tmp2));
> +  DONE;
> +})
> +
>  ;; Combiner patterns to allow creating XXPERMDI's to access either double
>  ;; word element in a vector register.
>  (define_insn "*vsx_concat__1"
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103568.c b/gcc/testsuite/gcc.target/powerpc/pr103568.c
> new file mode 100644
> index 000..b2a06fb2162
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr103568.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +vector double test (double *a, double *b)
> +{
> +  return (vector double) {*a, *b};
> +}
> +
> +vector long long test1 (long long *a, long long *b)
> +{
> +  return (vector long long) {*a, *b};
> +}
> +
> +/* { dg-final { scan-assembler-times {\mlxsd} 2 } } */
> +/* { dg-final { scan-assembler-times {\mlxvrdx\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mxxlor\M} 2 } } */
> +

BR,
Kewen



Re: [PATCH] rs6000: Relax some FLOAT128 expander condition for FLOAT128_IEEE_P [PR105359]

2024-07-17 Thread Kewen.Lin
Hi Peter,

on 2024/7/17 21:00, Peter Bergner wrote:
> On 7/17/24 4:09 AM, Kewen.Lin wrote:
>>  * config/rs6000/rs6000.md (@extenddf2): Remove condition
>>  TARGET_LONG_DOUBLE_128 for FLOAT128_IEEE_P modes.
> 
> This all LGTM, except this ChangeLog fragment doesn't match the code changes
> below.  Rather than removing TARGET_LONG_DOUBLE_128, you've added
> FLOAT128_IEEE_P (mode)).

Thanks for the comments!

For FLOAT128_IEEE_P modes, the previous condition is (TARGET_HARD_FLOAT &&
TARGET_LONG_DOUBLE_128); with this change the condition becomes
TARGET_HARD_FLOAT, so from this perspective it still matches.  I guess you
meant that "Remove" is expected to remove some code explicitly and can be
misleading here; if so, how about "Don't check TARGET_LONG_DOUBLE_128 for
FLOAT128_IEEE_P modes"?
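
As a quick check of that reading, here is a small C sketch of the two expander conditions as plain predicates.  The function names and boolean parameters are hypothetical stand-ins for TARGET_HARD_FLOAT, TARGET_LONG_DOUBLE_128 and FLOAT128_IEEE_P (mode).

```c
#include <stdbool.h>

/* Condition before the patch; the mode kind plays no role.  */
static bool
sketch_old_cond (bool hard_float, bool long_double_128, bool ieee_p)
{
  (void) ieee_p;
  return hard_float && long_double_128;
}

/* Condition after the patch.  */
static bool
sketch_new_cond (bool hard_float, bool long_double_128, bool ieee_p)
{
  return hard_float && (long_double_128 || ieee_p);
}
```

For IEEE modes (ieee_p true) the new predicate reduces to hard_float alone, and for non-IEEE modes it is identical to the old one.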

> 
> 
>> -  "TARGET_HARD_FLOAT && TARGET_LONG_DOUBLE_128"
>> +  "TARGET_HARD_FLOAT
>> +   && (TARGET_LONG_DOUBLE_128 || FLOAT128_IEEE_P (mode))"
> 
> 
> Peter
> 

BR,
Kewen



[PATCH] rs6000: Relax some FLOAT128 expander condition for FLOAT128_IEEE_P [PR105359]

2024-07-17 Thread Kewen.Lin
Hi,

As PR105359 shows, we disable some FLOAT128 expanders for
64-bit long double, but in fact IEEE float128 types like
__ieee128 are only guarded with TARGET_FLOAT128_TYPE and
TARGET_LONG_DOUBLE_128 is only checked when determining if
we can reuse long_double_type_node.  So this patch is to
relax all affected FLOAT128 expander conditions for
FLOAT128_IEEE_P.  By the way, currently IBM double double
type __ibm128 is guarded by TARGET_LONG_DOUBLE_128, so we
have to use TARGET_LONG_DOUBLE_128 for it.  IMHO, it's not
necessary and can be enhanced later.

Btw, for all test cases mentioned in PR105359, I removed
the xfails and tested them with explicit -mlong-double-64:
both pr79004.c and float128-hw.c pass, while float128-hw4.c
isn't tested (unsupported, since 64-bit long double
conflicts with -mabi=ieeelongdouble).

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9
and powerpc64le-linux-gnu P9/P10.

I'm going to push this next week if no objections.

BR,
Kewen
-
PR target/105359

gcc/ChangeLog:

* config/rs6000/rs6000.md (@extenddf2): Remove condition
TARGET_LONG_DOUBLE_128 for FLOAT128_IEEE_P modes.
(extendsf2): Likewise.
(truncdf2): Likewise.
(truncsf2): Likewise.
(floatsi2): Likewise.
(fix_truncsi2): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr79004.c: Remove xfails.
---
 gcc/config/rs6000/rs6000.md| 18 --
 gcc/testsuite/gcc.target/powerpc/pr79004.c | 14 ++
 2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 276a5c9cf2d..c79858ba064 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -8845,7 +8845,8 @@ (define_insn_and_split "*mov_softfloat"
 (define_expand "@extenddf2"
   [(set (match_operand:FLOAT128 0 "gpc_reg_operand")
(float_extend:FLOAT128 (match_operand:DF 1 "gpc_reg_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_LONG_DOUBLE_128"
+  "TARGET_HARD_FLOAT
+   && (TARGET_LONG_DOUBLE_128 || FLOAT128_IEEE_P (mode))"
 {
   if (FLOAT128_IEEE_P (mode))
 rs6000_expand_float128_convert (operands[0], operands[1], false);
@@ -8903,7 +8904,8 @@ (define_insn_and_split "@extenddf2_vsx"
 (define_expand "extendsf2"
   [(set (match_operand:FLOAT128 0 "gpc_reg_operand")
(float_extend:FLOAT128 (match_operand:SF 1 "gpc_reg_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_LONG_DOUBLE_128"
+  "TARGET_HARD_FLOAT
+   && (TARGET_LONG_DOUBLE_128 || FLOAT128_IEEE_P (mode))"
 {
   if (FLOAT128_IEEE_P (mode))
 rs6000_expand_float128_convert (operands[0], operands[1], false);
@@ -8919,7 +8921,8 @@ (define_expand "extendsf2"
 (define_expand "truncdf2"
   [(set (match_operand:DF 0 "gpc_reg_operand")
(float_truncate:DF (match_operand:FLOAT128 1 "gpc_reg_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_LONG_DOUBLE_128"
+  "TARGET_HARD_FLOAT
+   && (TARGET_LONG_DOUBLE_128 || FLOAT128_IEEE_P (mode))"
 {
   if (FLOAT128_IEEE_P (mode))
 {
@@ -8956,7 +8959,8 @@ (define_insn "truncdf2_internal2"
 (define_expand "truncsf2"
   [(set (match_operand:SF 0 "gpc_reg_operand")
(float_truncate:SF (match_operand:FLOAT128 1 "gpc_reg_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_LONG_DOUBLE_128"
+  "TARGET_HARD_FLOAT
+   && (TARGET_LONG_DOUBLE_128 || FLOAT128_IEEE_P (mode))"
 {
   if (FLOAT128_IEEE_P (mode))
 rs6000_expand_float128_convert (operands[0], operands[1], false);
@@ -8973,7 +8977,8 @@ (define_expand "floatsi2"
   [(parallel [(set (match_operand:FLOAT128 0 "gpc_reg_operand")
   (float:FLOAT128 (match_operand:SI 1 "gpc_reg_operand")))
  (clobber (match_scratch:DI 2))])]
-  "TARGET_HARD_FLOAT && TARGET_LONG_DOUBLE_128"
+  "TARGET_HARD_FLOAT
+   && (TARGET_LONG_DOUBLE_128 || FLOAT128_IEEE_P (mode))"
 {
   rtx op0 = operands[0];
   rtx op1 = operands[1];
@@ -9009,7 +9014,8 @@ (define_insn "fix_trunc_helper"
 (define_expand "fix_truncsi2"
   [(set (match_operand:SI 0 "gpc_reg_operand")
(fix:SI (match_operand:FLOAT128 1 "gpc_reg_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_LONG_DOUBLE_128"
+  "TARGET_HARD_FLOAT
+   && (TARGET_LONG_DOUBLE_128 || FLOAT128_IEEE_P (mode))"
 {
   rtx op0 = operands[0];
   rtx op1 = operands[1];
diff --git a/gcc/testsuite/gcc.target/powerpc/pr79004.c b/gcc/testsuite/gcc.target/powerpc/pr79004.c
index 60c576cd36b..ac89a4c9f32 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr79004.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr79004.c
@@ -100,12 +100,10 @@ void to_uns_short_store_n (TYPE a, unsigned short *p, long n) { p[n] = (unsigned
 void to_uns_int_store_n (TYPE a, unsigned int *p, long n) { p[n] = (unsigned int)a; }
 void to_uns_long_store_n (TYPE a, unsigned long *p, long n) { p[n] = (unsigned long)a; }

-/* On targets with 64-bit long double, some opcodes to deal with __float128 are
-   disabled, see PR target/105359.  */
-/* { dg-final { scan-assembler-not {\mbl __}   { xfail longdouble64 } 

Re: [PATCH v2] rs6000: Error on CPUs and ABIs that don't support the ROP, protection insns [PR114759]

2024-07-15 Thread Kewen.Lin
Hi Peter,

on 2024/7/16 06:07, Peter Bergner wrote:
> Hi Kewen,
> 
> Here's the updated patch per your review comments, minus your suggestion
> to disable the ROP mask which I mentioned isn't needed in my other reply.
> 
> This passed bootstrap and regtesting with no regressions on powerpc64le-linux.
> Ok for trunk?

OK for trunk, thanks!
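
For readers skimming the thread, a minimal C sketch of the approved behaviour, assuming only what the patch description and diff below state.  The enum, the function names and the error counting are hypothetical; the real code calls error () in rs6000_option_override_internal.

```c
#include <stdbool.h>

/* Hypothetical ABI tags for illustration only.  */
enum sketch_abi { SKETCH_ABI_OTHER, SKETCH_ABI_ELFV2 };

/* Option-override time: return how many errors -mrop-protect triggers.
   An unsupported CPU and an unsupported ABI are both diagnosed.  */
static int
sketch_check_rop_protect (bool rop_protect, bool power8_or_later,
                          enum sketch_abi abi)
{
  int errors = 0;
  if (rop_protect)
    {
      if (!power8_or_later)
        errors++;   /* "requires -mcpu=power8 or later" */
      if (abi != SKETCH_ABI_ELFV2)
        errors++;   /* "requires the ELFv2 ABI" */
    }
  return errors;
}

/* Prologue/epilogue time: with the checks done up front, only the option
   and whether the function makes a call decide the hash slot size.  */
static int
sketch_rop_hash_size (bool rop_protect, bool calls_p)
{
  return (rop_protect && calls_p) ? 8 : 0;
}
```

Doing the validation once at option-override time is what lets the later stack-info, prologue and epilogue code test only the hash slot size.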

BR,
Kewen

> 
> Peter
> 
> 
> Changes from v1:
>   * Moved checks for invalid targets from rs6000_override_options_after_change
> to rs6000_option_override_internal.
> 
> 
> rs6000: Error on CPUs and ABIs that don't support the ROP, protection insns [PR114759]
> 
> We currently silently ignore the -mrop-protect option for old CPUs we don't
> support with the ROP hash insns, but we throw an error for unsupported ABIs.
> This patch treats unsupported CPUs and ABIs similarly by throwing an error
> for both.  This matches clang behavior and allows us to simplify our tests
> in the code that generates our prologue and epilogue code.
> 
> 2024-07-15  Peter Bergner  
> 
> gcc/
>   PR target/114759
>   * config/rs6000/rs6000.cc (rs6000_option_override_internal): Disallow
>   CPUs and ABIs that do not support the ROP protection insns.
>   * config/rs6000/rs6000-logue.cc (rs6000_stack_info): Remove now
>   unneeded tests.
>   (rs6000_emit_prologue): Likewise.
>   Remove unneeded gcc_assert.
>   (rs6000_emit_epilogue): Likewise.
>   * config/rs6000/rs6000.md: Likewise.
> 
> gcc/testsuite/
>   PR target/114759
>   * gcc.target/powerpc/pr114759-3.c: New test.
> ---
>  gcc/config/rs6000/rs6000-logue.cc | 22 +--
>  gcc/config/rs6000/rs6000.cc   | 12 ++
>  gcc/config/rs6000/rs6000.md   |  4 ++--
>  gcc/testsuite/gcc.target/powerpc/pr114759-3.c | 19 
>  4 files changed, 39 insertions(+), 18 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr114759-3.c
> 
> diff --git a/gcc/config/rs6000/rs6000-logue.cc b/gcc/config/rs6000/rs6000-logue.cc
> index bd363b625a4..fdb6414f486 100644
> --- a/gcc/config/rs6000/rs6000-logue.cc
> +++ b/gcc/config/rs6000/rs6000-logue.cc
> @@ -716,17 +716,11 @@ rs6000_stack_info (void)
>info->calls_p = (!crtl->is_leaf || cfun->machine->ra_needs_full_frame);
>info->rop_hash_size = 0;
>  
> -  if (TARGET_POWER8
> -  && info->calls_p
> -  && DEFAULT_ABI == ABI_ELFv2
> -  && rs6000_rop_protect)
> +  /* If we want ROP protection and this function makes a call, indicate
> + we need to create a stack slot to save the hashed return address in.  */
> +  if (rs6000_rop_protect
> +  && info->calls_p)
>  info->rop_hash_size = 8;
> -  else if (rs6000_rop_protect && DEFAULT_ABI != ABI_ELFv2)
> -{
> -  /* We can't check this in rs6000_option_override_internal since
> -  DEFAULT_ABI isn't established yet.  */
> -  error ("%qs requires the ELFv2 ABI", "-mrop-protect");
> -}
>  
>/* Determine if we need to save the condition code registers.  */
>if (save_reg_p (CR2_REGNO)
> @@ -3277,9 +3271,8 @@ rs6000_emit_prologue (void)
>/* NOTE: The hashst isn't needed if we're going to do a sibcall,
>   but there's no way to know that here.  Harmless except for
>   performance, of course.  */
> -  if (TARGET_POWER8 && rs6000_rop_protect && info->rop_hash_size != 0)
> +  if (info->rop_hash_size)
>  {
> -  gcc_assert (DEFAULT_ABI == ABI_ELFv2);
>rtx stack_ptr = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
>rtx addr = gen_rtx_PLUS (Pmode, stack_ptr,
>  GEN_INT (info->rop_hash_save_offset));
> @@ -5056,12 +5049,9 @@ rs6000_emit_epilogue (enum epilogue_type epilogue_type)
>  
>/* The ROP hash check must occur after the stack pointer is restored
>   (since the hash involves r1), and is not performed for a sibcall.  */
> -  if (TARGET_POWER8
> -  && rs6000_rop_protect
> -  && info->rop_hash_size != 0
> +  if (info->rop_hash_size
>&& epilogue_type != EPILOGUE_TYPE_SIBCALL)
>  {
> -  gcc_assert (DEFAULT_ABI == ABI_ELFv2);
>rtx stack_ptr = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
>rtx addr = gen_rtx_PLUS (Pmode, stack_ptr,
>  GEN_INT (info->rop_hash_save_offset));
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index fd6e013c346..1cee9c2011d 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -4825,6 +4825,18 @@ rs6000_option_override_internal (bool global_init_p)
>   }
>  }
>  
> +  /* We only support ROP protection on certain targets.  */
> +  if (rs6000_rop_protect)
> +{
> +  /* Disallow CPU targets we don't support.  */
> +  if (!TARGET_POWER8)
> + error ("%<-mrop-protect%> requires %<-mcpu=power8%> or later");
> +
> +  /* Disallow ABI targets we don't support.  */
> +  if (DEFAULT_ABI != ABI_ELFv2)
> + error ("%<-mrop-protect%> requires the ELFv2 ABI");
> 

Re: [PATCH, rs6000] Remove redundant guard for float128 mode patterns

2024-07-15 Thread Kewen.Lin
Hi Haochen,

on 2024/7/15 10:14, HAO CHEN GUI wrote:
> Hi,
>   This patch removes FLOAT128_IEEE_P guard when the mode of pattern
> is IEEE128 and FLOAT128_IBM_P when the mode of pattern is IBM128.
> The mode iterators already do the checking. So they're redundant.
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for trunk?

Thanks for this clean up, OK for trunk.

BR,
Kewen

> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Remove redundant guard for float128 mode patterns
> 
> gcc/
>   * config/rs6000/rs6000.md (movcc, *movcc_p10,
>   *movcc_invert_p10, *fpmask, *xxsel,
>   @ieee_128bit_vsx_abs2, *ieee_128bit_vsx_nabs2,
>   add3, sub3, mul3, div3, sqrt2,
>   copysign3, copysign3_hard, copysign3_soft,
>   @neg2_hw, @abs2_hw, *nabs2_hw, fma4_hw,
>   *fms4_hw, *nfma4_hw, *nfms4_hw,
>   extend2_hw, truncdf2_hw,
>   truncsf2_hw, fix_2_hw,
>   fix_trunc2,
>   *fix_trunc2_mem,
>   float_di2_hw, float_si2_hw,
>   float2, floatuns_di2_hw,
>   floatuns_si2_hw, floatuns2,
>   floor2, ceil2, btrunc2, round2,
>   add3_odd, sub3_odd, mul3_odd, div3_odd,
>   sqrt2_odd, fma4_odd, *fms4_odd, *nfma4_odd,
>   *nfms4_odd, truncdf2_odd, *cmp_hw for IEEE128):
>   Remove guard FLOAT128_IEEE_P.
>   (@extenddf2_fprs, @extenddf2_vsx,
>   truncdf2_internal1, truncdf2_internal2,
>   fix_trunc_helper, neg2, *cmp_internal1,
>   *cmp_internal2 for IBM128): Remove guard FLOAT128_IBM_P.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index c0f6599c08b..f22b7ed6256 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -5736,7 +5736,7 @@ (define_expand "movcc"
>(if_then_else:IEEE128 (match_operand 1 "comparison_operator")
>  (match_operand:IEEE128 2 "gpc_reg_operand")
>  (match_operand:IEEE128 3 "gpc_reg_operand")))]
> -  "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
> +  "TARGET_POWER10 && TARGET_FLOAT128_HW"
>  {
>if (rs6000_emit_cmove (operands[0], operands[1], operands[2], operands[3]))
>  DONE;
> @@ -5753,7 +5753,7 @@ (define_insn_and_split "*movcc_p10"
>(match_operand:IEEE128 4 "altivec_register_operand" "v,v")
>(match_operand:IEEE128 5 "altivec_register_operand" "v,v")))
> (clobber (match_scratch:V2DI 6 "=0,"))]
> -  "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
> +  "TARGET_POWER10 && TARGET_FLOAT128_HW"
>"#"
>"&& 1"
>[(set (match_dup 6)
> @@ -5785,7 +5785,7 @@ (define_insn_and_split "*movcc_invert_p10"
>(match_operand:IEEE128 4 "altivec_register_operand" "v,v")
>(match_operand:IEEE128 5 "altivec_register_operand" "v,v")))
> (clobber (match_scratch:V2DI 6 "=0,"))]
> -  "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
> +  "TARGET_POWER10 && TARGET_FLOAT128_HW"
>"#"
>"&& 1"
>[(set (match_dup 6)
> @@ -5820,7 +5820,7 @@ (define_insn "*fpmask"
>(match_operand:IEEE128 3 "altivec_register_operand" "v")])
>(match_operand:V2DI 4 "all_ones_constant" "")
>(match_operand:V2DI 5 "zero_constant" "")))]
> -  "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
> +  "TARGET_POWER10 && TARGET_FLOAT128_HW"
>"xscmp%V1qp %0,%2,%3"
>[(set_attr "type" "fpcompare")])
> 
> @@ -5831,7 +5831,7 @@ (define_insn "*xxsel"
>(match_operand:V2DI 2 "zero_constant" ""))
>(match_operand:IEEE128 3 "altivec_register_operand" "v")
>(match_operand:IEEE128 4 "altivec_register_operand" "v")))]
> -  "TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (mode)"
> +  "TARGET_POWER10 && TARGET_FLOAT128_HW"
>"xxsel %x0,%x4,%x3,%x1"
>[(set_attr "type" "vecmove")])
> 
> @@ -8904,7 +8904,7 @@ (define_insn_and_split "@extenddf2_fprs"
>(match_operand:DF 1 "nonimmediate_operand" "d,m,d")))
> (use (match_operand:DF 2 "nonimmediate_operand" "m,m,d"))]
>"!TARGET_VSX && TARGET_HARD_FLOAT
> -   && TARGET_LONG_DOUBLE_128 && FLOAT128_IBM_P (mode)"
> +   && TARGET_LONG_DOUBLE_128"
>"#"
>"&& reload_completed"
>[(set (match_dup 3) (match_dup 1))
> @@ -8921,7 +8921,7 @@ (define_insn_and_split "@extenddf2_vsx"
>[(set (match_operand:IBM128 0 "gpc_reg_operand" "=d,d")
>   (float_extend:IBM128
>(match_operand:DF 1 "nonimmediate_operand" "wa,m")))]
> -  "TARGET_LONG_DOUBLE_128 && TARGET_VSX && FLOAT128_IBM_P (mode)"
> +  "TARGET_LONG_DOUBLE_128 && TARGET_VSX"
>"#"
>"&& reload_completed"
>[(set (match_dup 2) (match_dup 1))
> @@ -8967,7 +8967,7 @@ (define_insn_and_split "truncdf2_internal1"
>[(set (match_operand:DF 0 "gpc_reg_operand" "=d,?d")
>   (float_truncate:DF
>(match_operand:IBM128 1 "gpc_reg_operand" "0,d")))]
> -  "FLOAT128_IBM_P (mode) && !TARGET_XL_COMPAT
> +  "!TARGET_XL_COMPAT
> && TARGET_HARD_FLOAT && TARGET_LONG_DOUBLE_128"
>

Re: [PATCHv2, rs6000] Add TARGET_FLOAT128_HW guard for quad-precision insns

2024-07-15 Thread Kewen.Lin
Hi Haochen,

on 2024/7/15 10:10, HAO CHEN GUI wrote:
> Hi,
>   This patch adds TARGET_FLOAT128_HW into pattern conditions for quad-
> precision insns. Some qp patterns are guarded by TARGET_P9_VECTOR
> originally, so replace it with "TARGET_FLOAT128_HW".
> 
>   For test case float128-cmp2-runnable.c, it should be guarded with
> ppc_float128_hw as it calls qp insns. The p9vector_hw is covered with
> ppc_float128_hw, so it's removed.
> 
>   Compared to previous version, the main change is to split redundant
> FLOAT128_IEEE_P removal to another patch.
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for trunk?

OK for trunk, thanks!

BR,
Kewen

> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Add TARGET_FLOAT128_HW guard for quad-precision insns
> 
> gcc/
>   * config/rs6000/rs6000.md (floatti2, floatunsti2,
>   fix_truncti2): Add guard TARGET_FLOAT128_HW.
>   * config/rs6000/vsx.md (xsxexpqp__,
>   xsxsigqp__, xsiexpqpf_,
>   xsiexpqp__, xscmpexpqp__,
>   *xscmpexpqp, xststdcnegqp_): Replace guard TARGET_P9_VECTOR
>   with TARGET_FLOAT128_HW.
>   (xststdc_, *xststdc_, isinf2): Add guard
>   TARGET_FLOAT128_HW for the IEEE128 modes.
> 
> gcc/testsuite/
>   * testsuite/gcc.target/powerpc/float128-cmp2-runnable.c: Replace
>   ppc_float128_sw with ppc_float128_hw and remove p9vector_hw.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index deffc4b601c..c0f6599c08b 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -6928,7 +6928,7 @@ (define_insn "floatdidf2"
>  (define_insn "floatti2"
>[(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
>   (float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
> -  "TARGET_POWER10"
> +  "TARGET_POWER10 && TARGET_FLOAT128_HW"
>  {
>return  "xscvsqqp %0,%1";
>  }
> @@ -6937,7 +6937,7 @@ (define_insn "floatti2"
>  (define_insn "floatunsti2"
>[(set (match_operand:IEEE128 0 "vsx_register_operand" "=v")
>   (unsigned_float:IEEE128 (match_operand:TI 1 "vsx_register_operand" "v")))]
> -  "TARGET_POWER10"
> +  "TARGET_POWER10 && TARGET_FLOAT128_HW"
>  {
>return  "xscvuqqp %0,%1";
>  }
> @@ -6946,7 +6946,7 @@ (define_insn "floatunsti2"
>  (define_insn "fix_truncti2"
>[(set (match_operand:TI 0 "vsx_register_operand" "=v")
>   (fix:TI (match_operand:IEEE128 1 "vsx_register_operand" "v")))]
> -  "TARGET_POWER10"
> +  "TARGET_POWER10 && TARGET_FLOAT128_HW"
>  {
>return  "xscvqpsqz %0,%1";
>  }
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 1272f8b2080..7dd08895bec 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5157,7 +5157,7 @@ (define_insn "xsxexpqp__"
>   (unspec:V2DI_DI
> [(match_operand:IEEE128 1 "altivec_register_operand" "v")]
>UNSPEC_VSX_SXEXPDP))]
> -  "TARGET_P9_VECTOR"
> +  "TARGET_FLOAT128_HW"
>"xsxexpqp %0,%1"
>[(set_attr "type" "vecmove")])
> 
> @@ -5176,7 +5176,7 @@ (define_insn "xsxsigqp__"
>   (unspec:VEC_TI [(match_operand:IEEE128 1
>   "altivec_register_operand" "v")]
>UNSPEC_VSX_SXSIG))]
> -  "TARGET_P9_VECTOR"
> +  "TARGET_FLOAT128_HW"
>"xsxsigqp %0,%1"
>[(set_attr "type" "vecmove")])
> 
> @@ -5196,7 +5196,7 @@ (define_insn "xsiexpqpf_"
>[(match_operand:IEEE128 1 "altivec_register_operand" "v")
> (match_operand:DI 2 "altivec_register_operand" "v")]
>UNSPEC_VSX_SIEXPQP))]
> -  "TARGET_P9_VECTOR"
> +  "TARGET_FLOAT128_HW"
>"xsiexpqp %0,%1,%2"
>[(set_attr "type" "vecmove")])
> 
> @@ -5208,7 +5208,7 @@ (define_insn "xsiexpqp__"
>(match_operand:V2DI_DI 2
> "altivec_register_operand" "v")]
>UNSPEC_VSX_SIEXPQP))]
> -  "TARGET_P9_VECTOR"
> +  "TARGET_FLOAT128_HW"
>"xsiexpqp %0,%1,%2"
>[(set_attr "type" "vecmove")])
> 
> @@ -5278,7 +5278,7 @@ (define_expand "xscmpexpqp__"
> (set (match_operand:SI 0 "register_operand" "=r")
>   (CMP_TEST:SI (match_dup 3)
>(const_int 0)))]
> -  "TARGET_P9_VECTOR"
> +  "TARGET_FLOAT128_HW"
>  {
>if ( == UNORDERED && !HONOR_NANS (mode))
>  {
> @@ -5296,7 +5296,7 @@ (define_insn "*xscmpexpqp"
> (match_operand:IEEE128 2 "altivec_register_operand" "v")]
> UNSPEC_VSX_SCMPEXPQP)
>(match_operand:SI 3 "zero_constant" "j")))]
> -  "TARGET_P9_VECTOR"
> +  "TARGET_FLOAT128_HW"
>"xscmpexpqp %0,%1,%2"
>[(set_attr "type" "fpcompare")])
> 
> @@ -5315,7 +5315,8 @@ (define_expand "xststdc_"
> (set (match_operand:SI 0 "register_operand" "=r")
>   (eq:SI (match_dup 3)
>  (const_int 0)))]
> -  "TARGET_P9_VECTOR"
> +  "TARGET_P9_VECTOR
> +   && (!FLOAT128_IEEE_P (mode) || TARGET_FLOAT128_HW)"
>  {
>operands[3] = gen_reg_rtx (CCFPmode);
>operands[4] = CONST0_RTX (SImode);
> @@ -5324,7 +5325,8 @@ (define_expand 

Re: [PATCH] rs6000: Error on CPUs and ABIs that don't support the ROP, protection insns [PR114759]

2024-07-15 Thread Kewen.Lin
on 2024/7/16 04:30, Peter Bergner wrote:
> On 7/11/24 1:24 AM, Kewen.Lin wrote:
>> Sorry for the confusion, I meant for most target options when we emit some error
>> message meanwhile we also unset it, such as:
>>
>>   if (TARGET_CRYPTO && !TARGET_ALTIVEC)
>> {
>>   if (rs6000_isa_flags_explicit & OPTION_MASK_CRYPTO)
>>  error ("%qs requires %qs", "-mcrypto", "-maltivec");
>>   rs6000_isa_flags &= ~OPTION_MASK_CRYPTO;
>> }
> 
> That is not what is happening here though.  The code here to disable
> crypto is for the case where crypto is enabled implicitly, via say
> -mcpu=power8, but the user has also explicitly disabled Altivec which
> crypto depends on.  In that case, we do not emit an error or warning and

But if it's only meant for the implicit-enabling case, the unmasking should be
in an else arm rather than applied to both the implicit and explicit cases; the
code as written still unmasks for explicit enabling.  Since it's very unlikely to
cause unexpected behavior in later processing, even in the future, it's your call. :)
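
To spell out the difference, a minimal C sketch of the two shapes, assuming hypothetical mask values and a struct that stands in for rs6000_isa_flags, rs6000_isa_flags_explicit and the error count.  It is only an illustration, not a proposed change.

```c
/* Hypothetical mask bits for illustration.  */
#define SKETCH_MASK_CRYPTO  (1u << 0)
#define SKETCH_MASK_ALTIVEC (1u << 1)

struct sketch_opts
{
  unsigned flags;           /* currently enabled features */
  unsigned explicit_flags;  /* features the user set explicitly */
  int errors;               /* number of diagnostics issued */
};

/* Shape of the existing code: diagnose only the explicit case, but
   clear the bit in both the explicit and the implicit case.  */
static void
sketch_check_crypto_current (struct sketch_opts *o)
{
  if ((o->flags & SKETCH_MASK_CRYPTO) && !(o->flags & SKETCH_MASK_ALTIVEC))
    {
      if (o->explicit_flags & SKETCH_MASK_CRYPTO)
        o->errors++;
      o->flags &= ~SKETCH_MASK_CRYPTO;
    }
}

/* Variant with the unmasking in an else arm: only the implicit case is
   silently disabled, the explicit case just gets the diagnostic.  */
static void
sketch_check_crypto_else_arm (struct sketch_opts *o)
{
  if ((o->flags & SKETCH_MASK_CRYPTO) && !(o->flags & SKETCH_MASK_ALTIVEC))
    {
      if (o->explicit_flags & SKETCH_MASK_CRYPTO)
        o->errors++;
      else
        o->flags &= ~SKETCH_MASK_CRYPTO;
    }
}
```

The two variants behave identically for implicit enabling; they differ only in whether the bit survives after an explicit, already diagnosed request.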

BR,
Kewen

> we silently disable crypto.  This is similar to -mcpu=power8 -mno-altivec
> where we silently disable VSX.  The other cases you showed are of similar
> scenarios.
> 
> In my ROP code, we know ROP was explicitly enabled (it is never turned on
> implicitly with any -mcpu= option) and the target cpu and/or ABI does
> not support it, so there's nothing more to do, other than to emit an error.
> There is no matching implicit use case where we silently disable ROP as
> there was in your case above.  Therefore, I think the code as I showed it
> is correct as is...other than the code snippet location, which I have moved
> and am testing now.
> 
> Peter
> 
> 
> 



Re: [REPOST 0/3] Add support for -mcpu=power11

2024-07-12 Thread Kewen.Lin
Hi Mike,

on 2024/7/11 01:32, Michael Meissner wrote:
> Note, this is a repost of the 3 patches I posted on June 4th.  The first two
> patches are the same.  The third patch modifies the power11 tests to do a
> compile instead of assemble, and I removed the power11 specific target support
> that was posted as suggested by Kewen.Lin.
> 
> The following 3 patches add support for -mcpu=power11 to GCC 15.  Assuming
> these patches are approved and go into GCC 15, I will need to back port them
> to GCC 14.
> 
> The first patch adds the basic support for -mcpu=power11, except for the
> scheduling information.
> 
> The second patch goes through power10.md and adds scheduling support for
> power11, treating -mtune=power11 to be the same as -mtune=power10 at the
> current time.
> 
> The third patch adds some new tests for -mcpu=power11 support.
> 
> In order to use -mcpu=power11, you will need to use a new enough binutils that
> supports the .machine power11 option.
> 
> I have bootstrapped the compiler on both little endian and big endian systems.
> There were no regressions in either case.  Can I check these patches into the
> GCC trunk?  After a waiting period assuming there are no issues, can I check
> these patches into the GCC 14 branch?
> 

Thanks for doing this!  This patch series looks good to me.  As it's still on
Segher's review list, I'd leave the final approval to him, thanks!

BR,
Kewen



[PATCH] rs6000: Change optab for ibm128 and ieee128 conversion

2024-07-12 Thread Kewen.Lin
Hi,

Currently, for conversions between the 128-bit floating-point
formats ibm128 and ieee128, the corresponding libcalls are:
  ibm128 -> ieee128 "__trunctfkf2"
  ieee128 -> ibm128 "__extendkftf2"
Generic code (like convert_mode_scalar) also adopts sext_optab
for ieee128 -> ibm128 and trunc_optab for ibm128 -> ieee128.
But in the rs6000 port, as functions rs6000_expand_float128_convert
and init_float128_ieee show, we adopt sext_optab for
ibm128 -> ieee128 with "__trunctfkf2" and trunc_optab for
ieee128 -> ibm128 with "__extendkftf2".

To make them consistent and avoid some surprises, this patch
is to adjust rs6000 internal handlings by adopting trunc_optab
for ibm128 -> ieee128 with "__trunctfkf2" while sext_optab for
ieee128 -> ibm128 with "__extendkftf2".
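
The intended pairing can be sketched in plain C.  This is only an
illustration: the enums, the struct and pick_conversion are hypothetical
stand-ins, not GCC's real optab machinery; only the two libcall names
come from the description above.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two optabs and the two 128-bit formats.  */
enum conv_optab { CONV_TRUNC, CONV_SEXT };
enum fp128_format { FMT_IBM128, FMT_IEEE128 };

struct conv_choice
{
  enum conv_optab optab;
  const char *libcall;
};

/* Return the optab/libcall pairing described above:
   ibm128 -> ieee128 is treated as a truncation,
   ieee128 -> ibm128 is treated as an extension.
   The two formats must differ.  */
struct conv_choice
pick_conversion (enum fp128_format from, enum fp128_format to)
{
  struct conv_choice c;

  assert (from != to);
  if (from == FMT_IBM128)
    {
      c.optab = CONV_TRUNC;
      c.libcall = "__trunctfkf2";
    }
  else
    {
      c.optab = CONV_SEXT;
      c.libcall = "__extendkftf2";
    }
  return c;
}
```

With this shape the optab kind and the libcall name always agree, which
is the consistency the patch is after.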

Bootstrapped and regtested on powerpc64{,le}-linux-gnu
(ibm128 long double default) and powerpc64le-linux-gnu
(ieee128 long double default).

I'm going to install this next week if no objections.

BR,
Kewen
-

gcc/ChangeLog:

* config/rs6000/rs6000.cc (init_float128_ieee): Use trunc_optab rather
than sext_optab for converting FLOAT128_IBM_P mode to FLOAT128_IEEE_P
mode, and use sext_optab rather than trunc_optab for converting
FLOAT128_IEEE_P mode to FLOAT128_IBM_P mode.
(rs6000_expand_float128_convert): Likewise.
---
 gcc/config/rs6000/rs6000.cc | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 4af1eeb3722..7e30ab5b207 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -11460,13 +11460,13 @@ init_float128_ieee (machine_mode mode)
   set_conv_libfunc (trunc_optab, SFmode, mode, "__trunckfsf2");
   set_conv_libfunc (trunc_optab, DFmode, mode, "__trunckfdf2");

-  set_conv_libfunc (sext_optab, mode, IFmode, "__trunctfkf2");
+  set_conv_libfunc (trunc_optab, mode, IFmode, "__trunctfkf2");
   if (mode != TFmode && FLOAT128_IBM_P (TFmode))
-   set_conv_libfunc (sext_optab, mode, TFmode, "__trunctfkf2");
+   set_conv_libfunc (trunc_optab, mode, TFmode, "__trunctfkf2");

-  set_conv_libfunc (trunc_optab, IFmode, mode, "__extendkftf2");
+  set_conv_libfunc (sext_optab, IFmode, mode, "__extendkftf2");
   if (mode != TFmode && FLOAT128_IBM_P (TFmode))
-   set_conv_libfunc (trunc_optab, TFmode, mode, "__extendkftf2");
+   set_conv_libfunc (sext_optab, TFmode, mode, "__extendkftf2");

   set_conv_libfunc (sext_optab, mode, SDmode, "__dpd_extendsdkf");
   set_conv_libfunc (sext_optab, mode, DDmode, "__dpd_extendddkf");
@@ -15624,7 +15624,7 @@ rs6000_expand_float128_convert (rtx dest, rtx src, bool unsigned_p)
case E_IFmode:
case E_TFmode:
  if (FLOAT128_IBM_P (src_mode))
-   cvt = sext_optab;
+   cvt = trunc_optab;
  else
do_move = true;
  break;
@@ -15686,7 +15686,7 @@ rs6000_expand_float128_convert (rtx dest, rtx src, bool unsigned_p)
case E_IFmode:
case E_TFmode:
  if (FLOAT128_IBM_P (dest_mode))
-   cvt = trunc_optab;
+   cvt = sext_optab;
  else
do_move = true;
  break;
--
2.43.5



[PATCH 2/3 v3] rs6000: Make all 128 bit scalar FP modes have 128 bit precision [PR112993]

2024-07-12 Thread Kewen.Lin
Hi,

On rs6000, there are three 128-bit scalar floating-point
modes: TFmode, IFmode and KFmode.  For historical reasons,
we define them with different mode precisions, namely
KFmode 126, TFmode 127 and IFmode 128.  But in fact all of
them should have the same mode precision, 128.  This
special setting has caused issues, such as the unexpected
failures mentioned in [1], and has forced us to introduce
workarounds, such as the one in build_common_tree_nodes
for KFmode 126 and the one in range_compatible_p for the
same-mode-but-different-precision issue.

This patch makes these three 128-bit scalar floating-point
modes TFmode, IFmode and KFmode all have 128-bit mode
precision, and keeps their order the same as before so that
machine-independent parts of the compiler do not try to
widen IFmode to TFmode.  Besides, build_common_tree_nodes
uses the newly added hook mode_for_floating_type, so we
don't need to worry about an unexpected mode for the long
double type node.

With the proposed change, function convert_mode_scalar
uses sext_optab for converting an ieee128 format mode to an
ibm128 format mode, and trunc_optab for converting an ibm128
format mode to an ieee128 format mode.  Thus this patch removes
the useless extend and trunc optab supports and adds new
define_expands expandkftf2 and trunctfkf2 to align with the
convert_mode_scalar implementation.  It also unnames two
define_insn_and_splits to avoid conflicts and make them
clearer.  Since in the current implementation there is no
chance of a KF <-> IF conversion (either of them would
already be TF), it adds two dummy define_expands to assert
this.

Compared to v2[2], it adjusts the optabs for ibm128 to
ieee128 conversion to use trunc, following the new changes
in 1/3 v2[3].

Bootstrapped and regtested on powerpc64{,le}-linux-gnu
(ibm128 long double default) and powerpc64le-linux-gnu
(ieee128 long double default).

[1] https://inbox.sourceware.org/gcc-patches/718677e7-614d-7977-312d-05a75e1fd...@linux.ibm.com/
[2] https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656371.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656667.html

I'm going to install this next week if no objections.

BR,
Kewen
-
PR target/112993

gcc/ChangeLog:

* config/rs6000/rs6000-modes.def (IFmode, KFmode, TFmode): Define
with FLOAT_MODE instead of FRACTIONAL_FLOAT_MODE, don't use special
precisions any more.
(rs6000-modes.h): Remove include.
* config/rs6000/rs6000-modes.h: Remove.
* config/rs6000/rs6000.h (rs6000-modes.h): Remove include.
* config/rs6000/t-rs6000: Remove rs6000-modes.h include.
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Replace
all uses of FLOAT_PRECISION_TFmode with 128.
(rs6000_c_mode_for_floating_type): Likewise.
* config/rs6000/rs6000.md (define_expand extendiftf2): Remove.
(define_expand extendifkf2): Remove.
(define_expand extendtfkf2): Remove.
(define_expand trunckftf2): Remove.
(define_expand trunctfif2): Remove.
(define_expand extendtfif2): Add new assertion.
(define_expand expandkftf2): New.
(define_expand trunciftf2): Add new assertion.
(define_expand trunctfkf2): New.
(define_expand truncifkf2): Change with gcc_unreachable.
(define_expand expandkfif2): New.
(define_insn_and_split extendkftf2): Rename to  ...
(define_insn_and_split *extendkftf2): ... this.
(define_insn_and_split trunctfkf2): Rename to ...
(define_insn_and_split *extendtfkf2): ... this.
---
 gcc/config/rs6000/rs6000-modes.def | 31 ++
 gcc/config/rs6000/rs6000-modes.h   | 36 
 gcc/config/rs6000/rs6000.cc|  9 +---
 gcc/config/rs6000/rs6000.h |  5 ---
 gcc/config/rs6000/rs6000.md| 67 --
 gcc/config/rs6000/t-rs6000 |  1 -
 6 files changed, 41 insertions(+), 108 deletions(-)
 delete mode 100644 gcc/config/rs6000/rs6000-modes.h

diff --git a/gcc/config/rs6000/rs6000-modes.def b/gcc/config/rs6000/rs6000-modes.def
index 094b246c834..b69593c40a6 100644
--- a/gcc/config/rs6000/rs6000-modes.def
+++ b/gcc/config/rs6000/rs6000-modes.def
@@ -18,12 +18,11 @@
along with GCC; see the file COPYING3.  If not see
.  */

-/* We order the 3 128-bit floating point types so that IFmode (IBM 128-bit
-   floating point) is the 128-bit floating point type with the highest
-   precision (128 bits).  This so that machine independent parts of the
-   compiler do not try to widen IFmode to TFmode on ISA 3.0 (power9) that has
-   hardware support for IEEE 128-bit.  We set TFmode (long double mode) in
-   between, and KFmode (explicit __float128) below it.
+/* We order the 3 128-bit floating point type modes here as KFmode, TFmode and
+   IFmode, it is the same as the previous order, to make machine independent
+   parts of the compiler 

Re: [PATCH, rs6000] Add TARGET_FLOAT128_HW guard for quad-precision insns

2024-07-11 Thread Kewen.Lin
Hi Haochen,

on 2024/7/11 13:50, HAO CHEN GUI wrote:
> Hi,
>   This patch adds TARGET_FLOAT128_HW into pattern conditions for quad-
> precision insns. Also it removes FLOAT128_IEEE_P check from pattern
> conditions if the mode of pattern is IEEE128 as the mode iterator -
> IEEE128 already checks with FLOAT128_IEEE_P.

I noticed that several patterns have a similarly useless
FLOAT128_IBM_P condition.  Could you make a separate patch removing
both the redundant FLOAT128_IBM_P and FLOAT128_IEEE_P checks?  Then it
can be split from this TARGET_FLOAT128_HW change and become purely
an NFC patch.

> 
>   For test case float128-cmp2-runnable.c, it should be guarded with
> ppc_float128_hw as it calls qp insns. The p9vector_hw is covered with
> ppc_float128_hw, so it's removed.
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Add TARGET_FLOAT128_HW guard for quad-precision insns
> 
> gcc/
>   * config/rs6000/rs6000.md (*fpmask, floatdidf2, floatti2,
>   floatunsti2, fix_truncti2): Add guard
>   TARGET_FLOAT128_HW.
>   (add3, sub3, mul3, div3, sqrt2,
>   copysign3_hard, copysign3_soft, @neg2_hw,
>   @abs2_hw, *nabs2_hw, fma4_hw, *fms4_hw,
>   *nfma4_hw, *nfms4_hw,
>   extend2_hw, truncdf2_hw,
>   truncsf2_hw, fix_trunc2,
>   *fix_trunc2_mem,
>   float_si2_hw, floatuns_di2_hw, floor2,
>   ceil2, btrunc2, round2, add3_odd,
>   sub3_odd, mul3_odd, div3_odd, sqrt2_odd,
>   fma4_odd, *fms4_odd, *nfma4_odd,
>   *nfms4_odd, truncdf2_odd, *cmp_hw for IEEE128):
>   Remove guard FLOAT128_IEEE_P.
>   * config/rs6000/vsx.md (xsxexpqp__,
>   xsxsigqp__, xsiexpqpf_,
>   xsiexpqp__, xscmpexpqp__,
>   *xscmpexpqp, xststdcnegqp_): Add guard TARGET_FLOAT128_HW.
>   (xststdc_, *xststdc_, xststdc_): Add guard
>   TARGET_FLOAT128_HW for the IEEE128 mode.
> 
> gcc/testsuite/
>   * testsuite/gcc.target/powerpc/float128-cmp2-runnable.c: Replace
>   ppc_float128_sw with ppc_float128_hw and remove p9vector_hw.
> 
> patch.diff

snip...

> "xscmpuqp %0,%1,%2"
>[(set_attr "type" "veccmp")
> (set_attr "size" "128")])
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 56d1d8c737e..b5c143b1523 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5157,7 +5157,7 @@ (define_insn "xsxexpqp__"
>   (unspec:V2DI_DI
> [(match_operand:IEEE128 1 "altivec_register_operand" "v")]
>UNSPEC_VSX_SXEXPDP))]
> -  "TARGET_P9_VECTOR"
> +  "TARGET_P9_VECTOR && TARGET_FLOAT128_HW"

TARGET_FLOAT128_HW checks ISA_3_0_MASKS_IEEE which has OPTION_MASK_P9_VECTOR,
so I think TARGET_P9_VECTOR is redundant here.

>"xsxexpqp %0,%1"
>[(set_attr "type" "vecmove")])
> 
> @@ -5176,7 +5176,7 @@ (define_insn "xsxsigqp__"
>   (unspec:VEC_TI [(match_operand:IEEE128 1
>   "altivec_register_operand" "v")]
>UNSPEC_VSX_SXSIG))]
> -  "TARGET_P9_VECTOR"
> +  "TARGET_P9_VECTOR && TARGET_FLOAT128_HW"

Ditto.

>"xsxsigqp %0,%1"
>[(set_attr "type" "vecmove")])
> 
> @@ -5196,7 +5196,7 @@ (define_insn "xsiexpqpf_"
>[(match_operand:IEEE128 1 "altivec_register_operand" "v")
> (match_operand:DI 2 "altivec_register_operand" "v")]
>UNSPEC_VSX_SIEXPQP))]
> -  "TARGET_P9_VECTOR"
> +  "TARGET_P9_VECTOR && TARGET_FLOAT128_HW"

Ditto.

>"xsiexpqp %0,%1,%2"
>[(set_attr "type" "vecmove")])
> 
> @@ -5208,7 +5208,7 @@ (define_insn "xsiexpqp__"
>(match_operand:V2DI_DI 2
> "altivec_register_operand" "v")]
>UNSPEC_VSX_SIEXPQP))]
> -  "TARGET_P9_VECTOR"
> +  "TARGET_P9_VECTOR && TARGET_FLOAT128_HW"

Ditto.

>"xsiexpqp %0,%1,%2"
>[(set_attr "type" "vecmove")])
> 
> @@ -5278,7 +5278,7 @@ (define_expand "xscmpexpqp__"
> (set (match_operand:SI 0 "register_operand" "=r")
>   (CMP_TEST:SI (match_dup 3)
>(const_int 0)))]
> -  "TARGET_P9_VECTOR"
> +  "TARGET_P9_VECTOR && TARGET_FLOAT128_HW"

Ditto.

>  {
>if ( == UNORDERED && !HONOR_NANS (mode))
>  {
> @@ -5296,7 +5296,7 @@ (define_insn "*xscmpexpqp"
> (match_operand:IEEE128 2 "altivec_register_operand" "v")]
> UNSPEC_VSX_SCMPEXPQP)
>(match_operand:SI 3 "zero_constant" "j")))]
> -  "TARGET_P9_VECTOR"
> +  "TARGET_P9_VECTOR && TARGET_FLOAT128_HW"

Ditto.

>"xscmpexpqp %0,%1,%2"
>[(set_attr "type" "fpcompare")])
> 
> @@ -5315,7 +5315,8 @@ (define_expand "xststdc_"
> (set (match_operand:SI 0 "register_operand" "=r")
>   (eq:SI (match_dup 3)
>  (const_int 0)))]
> -  "TARGET_P9_VECTOR"
> +  "TARGET_P9_VECTOR
> +   && (!FLOAT128_IEEE_P (mode) || TARGET_FLOAT128_HW)"
>  {
>operands[3] = gen_reg_rtx (CCFPmode);
>operands[4] = CONST0_RTX (SImode);
> @@ -5324,7 +5325,9 @@ (define_expand "xststdc_"
>  (define_expand "isinf2"
>[(use 

Re: [PATCH] rs6000: Error on CPUs and ABIs that don't support the ROP, protection insns [PR114759]

2024-07-11 Thread Kewen.Lin
on 2024/7/11 11:36, Peter Bergner wrote:
> On 7/10/24 1:01 AM, Kewen.Lin wrote:
>>> +  if (rs6000_rop_protect)
>>> +{
>>> +  /* Disallow CPU targets we don't support.  */
>>> +  if (!TARGET_POWER8)
>>> +   error ("%<-mrop-protect%> requires %<-mcpu=power8%> or later");
>>> +
>>> +  /* Disallow ABI targets we don't support.  */
>>> +  if (DEFAULT_ABI != ABI_ELFv2)
>>> +   error ("%<-mrop-protect%> requires the ELFv2 ABI");
>>> +}
>>
>> I wonder if there is some reason to put this hunk here.  IMHO we want the hunk
>> in rs6000_option_override_internal instead since no optimization options can
>> affect cpu type and DEFAULT_ABI?
> 
> So the original code that used to disable shrink-wrapping that looked like:
> 
>   /* If we are inserting ROP-protect instructions, disable shrink wrap.  */
>   if (rs6000_rop_protect)
> flag_shrink_wrap = 0;
> 
> ...used to be in rs6000_option_override_internal, but was moved to
> rs6000_override_options_after_change as part of PR101324 (commit cff7879a381d).

Yeah, I noticed that, but it's for the optimization option flag_shrink_wrap.  If
we just disabled shrink-wrap in rs6000_option_override_internal and some function
used the optimize attribute to enable shrink-wrap, it would be re-enabled, which
is unexpected.

But target options (which this patch cares about) don't have this issue, so I
think function rs6000_option_override_internal is a better fit.

> I guess I just placed this code here since it was the correct location for
> that old usage (it's changed in the patch I'm waiting for Segher to re-review),
> but I will look at moving it to rs6000_option_override_internal.  I think I
> thought we could use a target attribute to enable -mrop-protect, but looking
> closer, we don't actually allow that option there.
> 
> 
> 
>> And we probably want to unset rs6000_rop_protect to align with the handlings
>> on other options?
> 
> I'm not sure I know what you mean?  Why would we unset rs6000_rop_protect?
> Either we've concluded the current target options allow ROP code gen or not
> and for the cases where we don't/can't allow ROP, we want to give the user
> an error to match clang's behavior and how we already handle unsupported
> ABIs.  So what is it you're trying to describe here?

Sorry for the confusion.  I meant that for most target options, when we emit an
error message we also unset the option, such as:

  if (TARGET_CRYPTO && !TARGET_ALTIVEC)
    {
      if (rs6000_isa_flags_explicit & OPTION_MASK_CRYPTO)
        error ("%qs requires %qs", "-mcrypto", "-maltivec");
      rs6000_isa_flags &= ~OPTION_MASK_CRYPTO;
    }

  if (!TARGET_FPRND && TARGET_VSX)
    {
      if (rs6000_isa_flags_explicit & OPTION_MASK_FPRND)
        /* TARGET_VSX = 1 implies Power 7 and newer */
        error ("%qs requires %qs", "-mvsx", "-mfprnd");
      rs6000_isa_flags &= ~OPTION_MASK_FPRND;
    }

...

  if (TARGET_DFP && !TARGET_HARD_FLOAT)
    {
      if (rs6000_isa_flags_explicit & OPTION_MASK_DFP)
        error ("%qs requires %qs", "-mhard-dfp", "-mhard-float");
      rs6000_isa_flags &= ~OPTION_MASK_DFP;
    }

...

I guess the point is to keep the unexpected combination we caught from
propagating into later processing, where it could cause a repeated error or
hit some unexpected assertion.  Although in most cases this won't happen, it
seems harmless to unset it (and it aligns with the others).
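
To illustrate why unsetting helps, here is a small standalone model of
the "diagnose, then clear" pattern.  It is only a sketch: the MASK_*
values and check_crypto are made up for illustration and are not GCC's
real OPTION_MASK_* bits or functions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag bits, not GCC's real OPTION_MASK_* values.  */
#define MASK_ALTIVEC 0x1u
#define MASK_CRYPTO  0x2u

/* Diagnose crypto without altivec.  The error is reported only when the
   user asked for crypto explicitly, but the flag is cleared either way
   so that later checks never see the inconsistent combination.
   Returns the number of errors reported.  */
int
check_crypto (unsigned *flags, unsigned explicit_flags)
{
  int errors = 0;

  assert (flags != NULL);
  if ((*flags & MASK_CRYPTO) && !(*flags & MASK_ALTIVEC))
    {
      if (explicit_flags & MASK_CRYPTO)
        {
          fprintf (stderr, "error: -mcrypto requires -maltivec\n");
          errors++;
        }
      *flags &= ~MASK_CRYPTO;
    }
  return errors;
}
```

Running the check a second time on the same flags reports nothing,
because the offending bit is already gone.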

BR,
Kewen



[PATCH] rs6000: Update option set in rs6000_inner_target_options [PR115713]

2024-07-10 Thread Kewen.Lin
Hi,

When function rs6000_inner_target_options parses target
options, it updates the explicit option set information for
rs6000_opt_masks through rs6000_isa_flags_explicit, but it fails
to update that information for rs6000_opt_vars, which can
result in unexpected consequences as the associated test
case shows.  This patch fixes rs6000_inner_target_options
to update the option set for rs6000_opt_vars as well.
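
The idea can be modelled outside GCC as follows.  This is a sketch under
assumptions: struct options, opt_vars and set_opt_var are hypothetical
miniatures, with a second struct of the same layout standing in for the
"explicitly set" record.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of an options record; field names are made up.  */
struct options
{
  int avoid_xform;
  int rop_protect;
};

struct opt_var
{
  const char *name;
  size_t offset;
};

static const struct opt_var opt_vars[] = {
  { "avoid-indexed-addresses", offsetof (struct options, avoid_xform) },
  { "rop-protect", offsetof (struct options, rop_protect) },
};

/* Store NAME's value in OPTS and mark it as explicitly set in OPTS_SET,
   which mirrors the layout of OPTS.  Returns 1 if NAME is known, else 0.  */
int
set_opt_var (struct options *opts, struct options *opts_set,
             const char *name, int invert)
{
  assert (opts && opts_set && name);
  for (size_t i = 0; i < sizeof opt_vars / sizeof opt_vars[0]; i++)
    if (strcmp (name, opt_vars[i].name) == 0)
      {
        size_t j = opt_vars[i].offset;
        *(int *) ((char *) opts + j) = !invert;
        *(int *) ((char *) opts_set + j) = 1;
        return 1;
      }
  return 0;
}
```

Without the second store, a later check of the "set" record could not
tell an explicit request from a default value.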

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this next week if no objections.

BR,
Kewen
-

PR target/115713

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_inner_target_options): Update option
set information for rs6000_opt_vars.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr115713-2.c: New test.
---
 gcc/config/rs6000/rs6000.cc   |  3 ++-
 gcc/testsuite/gcc.target/powerpc/pr115713-2.c | 22 +++
 2 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr115713-2.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index ed7a9fdeb58..8647aa92fe9 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -24668,7 +24668,8 @@ rs6000_inner_target_options (tree args, bool attr_p)
if (strcmp (r, rs6000_opt_vars[i].name) == 0)
  {
size_t j = rs6000_opt_vars[i].global_offset;
-   *((int *) ((char *)_options + j)) = !invert;
+   *((int *) ((char *) _options + j)) = !invert;
+   *((int *) ((char *) _options_set + j)) = 1;
error_p = false;
not_valid_p = false;
break;
diff --git a/gcc/testsuite/gcc.target/powerpc/pr115713-2.c b/gcc/testsuite/gcc.target/powerpc/pr115713-2.c
new file mode 100644
index 000..47b39c0faba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr115713-2.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* Force power7 to avoid possible error message on AltiVec ABI change.  */
+/* { dg-options "-mdejagnu-cpu=power7" } */
+
+/* Verify there is an error message for -mvsx incompatible with
+   -mavoid-indexed-addresses even when they are specified by
+   target attributes.  */
+
+int __attribute__ ((target ("avoid-indexed-addresses,vsx")))
+test1 (void)
+{
+  /* { dg-error "'-mvsx' and '-mavoid-indexed-addresses' are incompatible" "" { target *-*-* } .-1 } */
+  return 0;
+}
+
+int __attribute__ ((target ("vsx,avoid-indexed-addresses")))
+test2 (void)
+{
+  /* { dg-error "'-mvsx' and '-mavoid-indexed-addresses' are incompatible" "" { target *-*-* } .-1 } */
+  return 0;
+}
+
--
2.45.2


[PATCH] rs6000: Consider explicitly set options in target option parsing [PR115713]

2024-07-10 Thread Kewen.Lin
Hi,

In rs6000_inner_target_options, when enabling VSX we enable
altivec and disable -mavoid-indexed-addresses implicitly,
but this doesn't consider the case where the options altivec
and avoid-indexed-addresses can be explicitly disabled.  As
the test case in PR115713#c1 shows, with target attribute
"no-altivec,vsx", VSX unexpectedly sets the altivec flag and
the expected error is not emitted.

This patch avoids the automatic enablement when they
are explicitly specified.  With this change, the existing
test case ppc-target-4.c also requires an adjustment:
altivec is specified explicitly in the target attribute
(since the test requires the altivec feature and the
command line specifies no-altivec).
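
A tiny standalone model of the new behavior, assuming made-up MASK_*
bits and a hypothetical enable_vsx helper (neither is GCC's real code):

```c
#include <assert.h>

/* Hypothetical flag bits, not GCC's real OPTION_MASK_* values.  */
#define MASK_ALTIVEC 0x1u
#define MASK_VSX     0x2u

/* Enable VSX in FLAGS.  Altivec is pulled in implicitly only when the
   user has not said anything explicit about altivec, as recorded in
   EXPLICIT_FLAGS.  */
void
enable_vsx (unsigned *flags, unsigned explicit_flags)
{
  unsigned mask = MASK_VSX;

  assert (flags != 0);
  if (!(explicit_flags & MASK_ALTIVEC))
    mask |= MASK_ALTIVEC;
  *flags |= mask;
}
```

With "no-altivec,vsx" the altivec bit stays clear, so a later
consistency check still sees the conflict and can report it.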

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this next week if no objections.

BR,
Kewen
-

PR target/115713

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_inner_target_options): Avoid to
enable altivec or disable avoid-indexed-addresses automatically
when they get specified explicitly.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr115713-1.c: New test.
* gcc.target/powerpc/ppc-target-4.c: Adjust by specifying altivec
in target attribute.
---
 gcc/config/rs6000/rs6000.cc   |  7 +--
 .../gcc.target/powerpc/ppc-target-4.c |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr115713-1.c | 20 +++
 3 files changed, 26 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr115713-1.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 3b1ee3a262a..ed7a9fdeb58 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -24643,8 +24643,11 @@ rs6000_inner_target_options (tree args, bool attr_p)
  {
if (mask == OPTION_MASK_VSX)
  {
-   mask |= OPTION_MASK_ALTIVEC;
-   TARGET_AVOID_XFORM = 0;
+   if (!(rs6000_isa_flags_explicit
+ & OPTION_MASK_ALTIVEC))
+ mask |= OPTION_MASK_ALTIVEC;
+   if (!OPTION_SET_P (TARGET_AVOID_XFORM))
+ TARGET_AVOID_XFORM = 0;
  }
  }

diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-target-4.c b/gcc/testsuite/gcc.target/powerpc/ppc-target-4.c
index 43a98b353cf..db9ba500e0e 100644
--- a/gcc/testsuite/gcc.target/powerpc/ppc-target-4.c
+++ b/gcc/testsuite/gcc.target/powerpc/ppc-target-4.c
@@ -18,7 +18,7 @@
 #error "__VSX__ should not be defined."
 #endif

-#pragma GCC target("vsx")
+#pragma GCC target("altivec,vsx")
 #include 
 #pragma GCC reset_options

diff --git a/gcc/testsuite/gcc.target/powerpc/pr115713-1.c b/gcc/testsuite/gcc.target/powerpc/pr115713-1.c
new file mode 100644
index 000..1b93a78682a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr115713-1.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* Force power7 to avoid possible error message on AltiVec ABI change.  */
+/* { dg-options "-mdejagnu-cpu=power7" } */
+
+/* Verify there is an error message for incompatible -maltivec and -mvsx
+   even when they are specified by target attributes.  */
+
+int __attribute__ ((target ("no-altivec,vsx")))
+test1 (void)
+{
+  /* { dg-error "'-mvsx' and '-mno-altivec' are incompatible" "" { target *-*-* } .-1 } */
+  return 0;
+}
+
+int __attribute__ ((target ("vsx,no-altivec")))
+test2 (void)
+{
+  /* { dg-error "'-mvsx' and '-mno-altivec' are incompatible" "" { target *-*-* } .-1 } */
+  return 0;
+}
--
2.45.2


[PATCH] rs6000: Escalate warning to error for VSX with explicit no-altivec etc.

2024-07-10 Thread Kewen.Lin
Hi,

As discussed in PR115688, for now when users specify
-mvsx and -mno-altivec explicitly, the compiler emits a warning
rather than an error, but considering both options are given
explicitly, emitting a hard error should be better.

So this patch escalates some related warnings to errors
when both options are incompatible.
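
The decision for the altivec case can be sketched as below.  This is a
simplified model, assuming hypothetical MASK_* bits, a made-up enum diag
and a made-up check_vsx_altivec function; it only mirrors the shape of
the logic, not the real rs6000_option_override_internal.

```c
#include <assert.h>

/* Hypothetical flag bits, not GCC's real OPTION_MASK_* values.  */
#define MASK_ALTIVEC 0x1u
#define MASK_VSX     0x2u

enum diag { DIAG_NONE, DIAG_WARNING, DIAG_ERROR };

/* VSX is on, altivec is off and altivec was given explicitly: this is an
   error if VSX was explicit too, otherwise only a warning.  In both
   cases VSX is cleared and recorded as explicitly decided.  */
enum diag
check_vsx_altivec (unsigned *flags, unsigned *explicit_flags)
{
  assert (flags && explicit_flags);
  if (!(*flags & MASK_VSX))
    return DIAG_NONE;
  if ((*flags & MASK_ALTIVEC) || !(*explicit_flags & MASK_ALTIVEC))
    return DIAG_NONE;

  enum diag d = (*explicit_flags & MASK_VSX) ? DIAG_ERROR : DIAG_WARNING;
  *flags &= ~MASK_VSX;
  *explicit_flags |= MASK_VSX;
  return d;
}
```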

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this next week if no objections.

BR,
Kewen
-

PR target/115713

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_option_override_internal): Emit error
messages when explicit VSX encounters explicit soft-float, no-altivec
or avoid-indexed-addresses.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/warn-1.c: Move to ...
* gcc.target/powerpc/error-1.c: ... here.  Adjust dg-warning with
dg-error and remove ineffective scan.
---
 gcc/config/rs6000/rs6000.cc   | 41 +++
 .../powerpc/{warn-1.c => error-1.c}   |  3 +-
 2 files changed, 24 insertions(+), 20 deletions(-)
 rename gcc/testsuite/gcc.target/powerpc/{warn-1.c => error-1.c} (70%)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 76bbb3a28ea..3b1ee3a262a 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -3822,32 +3822,37 @@ rs6000_option_override_internal (bool global_init_p)
   /* Add some warnings for VSX.  */
   if (TARGET_VSX)
 {
-  const char *msg = NULL;
+  bool explicit_vsx_p = rs6000_isa_flags_explicit & OPTION_MASK_VSX;
   if (!TARGET_HARD_FLOAT)
{
- if (rs6000_isa_flags_explicit & OPTION_MASK_VSX)
-   msg = N_("%<-mvsx%> requires hardware floating point");
- else
+ if (explicit_vsx_p)
{
- rs6000_isa_flags &= ~ OPTION_MASK_VSX;
- rs6000_isa_flags_explicit |= OPTION_MASK_VSX;
+ if (rs6000_isa_flags_explicit & OPTION_MASK_SOFT_FLOAT)
+   error ("%<-mvsx%> and %<-msoft-float%> are incompatible");
+ else
+   warning (0, N_("%<-mvsx%> requires hardware floating-point"));
}
+ rs6000_isa_flags &= ~OPTION_MASK_VSX;
+ rs6000_isa_flags_explicit |= OPTION_MASK_VSX;
}
   else if (TARGET_AVOID_XFORM > 0)
-   msg = N_("%<-mvsx%> needs indexed addressing");
-  else if (!TARGET_ALTIVEC && (rs6000_isa_flags_explicit
-  & OPTION_MASK_ALTIVEC))
-{
- if (rs6000_isa_flags_explicit & OPTION_MASK_VSX)
-   msg = N_("%<-mvsx%> and %<-mno-altivec%> are incompatible");
+   {
+ if (explicit_vsx_p && OPTION_SET_P (TARGET_AVOID_XFORM))
+   error ("%<-mvsx%> and %<-mavoid-indexed-addresses%>"
+  " are incompatible");
  else
-   msg = N_("%<-mno-altivec%> disables vsx");
-}
-
-  if (msg)
+   warning (0, N_("%<-mvsx%> needs indexed addressing"));
+ rs6000_isa_flags &= ~OPTION_MASK_VSX;
+ rs6000_isa_flags_explicit |= OPTION_MASK_VSX;
+   }
+  else if (!TARGET_ALTIVEC
+  && (rs6000_isa_flags_explicit & OPTION_MASK_ALTIVEC))
{
- warning (0, msg);
- rs6000_isa_flags &= ~ OPTION_MASK_VSX;
+ if (explicit_vsx_p)
+   error ("%<-mvsx%> and %<-mno-altivec%> are incompatible");
+ else
+   warning (0, N_("%<-mno-altivec%> disables vsx"));
+ rs6000_isa_flags &= ~OPTION_MASK_VSX;
  rs6000_isa_flags_explicit |= OPTION_MASK_VSX;
}
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/warn-1.c b/gcc/testsuite/gcc.target/powerpc/error-1.c
similarity index 70%
rename from gcc/testsuite/gcc.target/powerpc/warn-1.c
rename to gcc/testsuite/gcc.target/powerpc/error-1.c
index 76ac0c4e26e..d38eba8bb8a 100644
--- a/gcc/testsuite/gcc.target/powerpc/warn-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/error-1.c
@@ -3,7 +3,7 @@
 /* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-options "-O -mvsx -mno-altivec" } */

-/* { dg-warning "'-mvsx' and '-mno-altivec' are incompatible" "" { target *-*-* } 0 } */
+/* { dg-error "'-mvsx' and '-mno-altivec' are incompatible" "" { target *-*-* } 0 } */

 double
 foo (double *x, double *y)
@@ -16,4 +16,3 @@ foo (double *x, double *y)
   return z[0] * z[1];
 }

-/* { dg-final { scan-assembler-not "xsadddp" } } */
--
2.45.2


Re: [PATCH V4] report message for operator %a on unaddressible operand

2024-07-10 Thread Kewen.Lin
Hi Jeff,

on 2024/6/5 16:30, Jiufu Guo wrote:
> Hi,
> 
> For PR96866, when printing asm code for modifier "%a", an addressable
> operand is required.  But the constraint "X" allows any kind of
> operand, even one whose address is hard to get directly, e.g. an extern
> symbol whose address is in the TOC.
> An error message is reported to indicate the invalid asm operand.
> 
> Compared with the previous version, the changelog and emitted message are updated.
> 
> Bootstrap pass on ppc64{,le}.
> Is this ok for trunk?
> 
> BR,
> Jeff(Jiufu Guo)
> 
>   PR target/96866
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (print_operand_address): Emit message for
>   Unsupported operand.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/pr96866-1.c: New test.
>   * gcc.target/powerpc/pr96866-2.c: New test.
> 
> ---
>  gcc/config/rs6000/rs6000.cc  |  7 ++-
>  gcc/testsuite/gcc.target/powerpc/pr96866-1.c | 18 ++
>  gcc/testsuite/gcc.target/powerpc/pr96866-2.c | 13 +
>  3 files changed, 37 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-2.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 117999613d8..7e7c36a1bad 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -14664,7 +14664,12 @@ print_operand_address (FILE *file, rtx x)
>   fprintf (file, "@%s(%s)", SMALL_DATA_RELOC,
>reg_names[SMALL_DATA_REG]);
>else
> - gcc_assert (!TARGET_TOC);
> + {
> +   /* Do not support getting address directly from TOC, emit error.
> +  No more work is needed for !TARGET_TOC. */
> +   if (TARGET_TOC)
> + output_operand_lossage ("%%a requires an address of memory");
> + }
>  }
>else if (GET_CODE (x) == PLUS && REG_P (XEXP (x, 0))
>  && REG_P (XEXP (x, 1)))
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-1.c b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
> new file mode 100644
> index 000..bcebbd6e310
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
> @@ -0,0 +1,18 @@
> +/* The "%a" modifier can't get the address of extern symbol directly from TOC
> +   with -fPIC, even the symbol is propgated for "X" constraint under -O2. */
> +/* { dg-options "-fPIC -O2" } */
> +
> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  */
> +/* { dg-excess-errors "pr96866-1.c" } */

This seems to XPASS on Power10 with pcrel?  This needs a ! powerpc_pcrel guard
if so.

> +
> +int x[2];
> +
> +int __attribute__ ((noipa))
> +f1 (void)
> +{
> +  int n;
> +  int *p = x;
> +  *p++;
> +  __asm__ volatile("ld %0, %a1" : "=r"(n) : "X"(p));
> +  return n;
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-2.c b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
> new file mode 100644
> index 000..0577fd6d588
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
> @@ -0,0 +1,13 @@
> +/* The "%a" modifier can't get the address of extern symbol directly from TOC
> +   with -fPIC. */
> +/* { dg-options "-fPIC -O2" } */
> +
> +/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  */
> +/* { dg-excess-errors "pr96866-2.c" } */

Ditto.

The others look good to me.

BR,
Kewen

> +
> +void
> +f (void)
> +{
> +  extern int x;
> +  __asm__ volatile("#%a0" ::"X"());
> +}



Re: [PATCH] rs6000: Error on CPUs and ABIs that don't support the ROP, protection insns [PR114759]

2024-07-10 Thread Kewen.Lin
Hi Peter,

on 2024/7/10 05:39, Peter Bergner wrote:
> Hi Kewen,
> 
> Here is that promised cleanup patch we discussed in the previous patch review.
> I'll note this patch is dependent on the previous patch you approved.  I have
> not pushed that yet (in case you looked) since I'm waiting on Segher to approve
> the updated patch for not disabling shrink-wrapping for leaf-functions in the
> presence of -mrop-protect first.
> 
> 
> We currently silently ignore the -mrop-protect option for old CPUs we don't
> support with the ROP hash insns, but we throw an error for unsupported ABIs.
> This patch treats unsupported CPUs and ABIs similarly by throwing an error
> for both.  This matches clang behavior and allows us to simplify our ROP
> tests in the code that generates our prologue and epilogue code.
> 
> This passed bootstrap and regtesting on powerpc64le-linux and powerpc64-linux
> with no regressions.  Ok for trunk?
> 
> I'll note I did not create a test case for unsupported ABIs, since I'll be
> working on adding ROP support for powerpc-linux and powerpc64-linux next.
> 
> Peter
> 
> 
> 
> 2024-06-26  Peter Bergner  
> 
> gcc/
>   PR target/114759
>   * config/rs6000/rs6000.cc (rs6000_override_options_after_change):
>   Disallow CPUs and ABIs that do not support the ROP protection insns.
>   * config/rs6000/rs6000-logue.cc (rs6000_stack_info): Remove now
>   unneeded tests.
>   (rs6000_emit_prologue): Likewise.
>   Remove unneeded gcc_assert.
>   (rs6000_emit_epilogue): Likewise.
>   * config/rs6000/rs6000.md: Likewise.
> 
> gcc/testsuite/
>   PR target/114759
>   * gcc.target/powerpc/pr114759-3.c: New test.
> ---
>  gcc/config/rs6000/rs6000.cc   | 11 ++
>  gcc/config/rs6000/rs6000-logue.cc | 22 +--
>  gcc/config/rs6000/rs6000.md   |  4 ++--
>  gcc/testsuite/gcc.target/powerpc/pr114759-3.c | 19 
>  4 files changed, 38 insertions(+), 18 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr114759-3.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index fd6e013c346..e9642fd5310 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -3427,6 +3427,17 @@ rs6000_override_options_after_change (void)
>  }
>else if (!OPTION_SET_P (flag_cunroll_grow_size))
>  flag_cunroll_grow_size = flag_peel_loops || optimize >= 3;
> +
> +  if (rs6000_rop_protect)
> +{
> +  /* Disallow CPU targets we don't support.  */
> +  if (!TARGET_POWER8)
> + error ("%<-mrop-protect%> requires %<-mcpu=power8%> or later");
> +
> +  /* Disallow ABI targets we don't support.  */
> +  if (DEFAULT_ABI != ABI_ELFv2)
> + error ("%<-mrop-protect%> requires the ELFv2 ABI");
> +}

I wonder if there is some reason to put this hunk here.  IMHO we want the hunk
in rs6000_option_override_internal instead, since no optimization options can
affect cpu type and DEFAULT_ABI?  And we probably want to unset
rs6000_rop_protect to align with the handling of other options?

The others look good to me, thanks!

BR,
Kewen

>  }
>  
>  #ifdef TARGET_USES_LINUX64_OPT
> diff --git a/gcc/config/rs6000/rs6000-logue.cc 
> b/gcc/config/rs6000/rs6000-logue.cc
> index bd363b625a4..fdb6414f486 100644
> --- a/gcc/config/rs6000/rs6000-logue.cc
> +++ b/gcc/config/rs6000/rs6000-logue.cc
> @@ -716,17 +716,11 @@ rs6000_stack_info (void)
>info->calls_p = (!crtl->is_leaf || cfun->machine->ra_needs_full_frame);
>info->rop_hash_size = 0;
>  
> -  if (TARGET_POWER8
> -  && info->calls_p
> -  && DEFAULT_ABI == ABI_ELFv2
> -  && rs6000_rop_protect)
> +  /* If we want ROP protection and this function makes a call, indicate
> + we need to create a stack slot to save the hashed return address in.  */
> +  if (rs6000_rop_protect
> +  && info->calls_p)
>  info->rop_hash_size = 8;
> -  else if (rs6000_rop_protect && DEFAULT_ABI != ABI_ELFv2)
> -{
> -  /* We can't check this in rs6000_option_override_internal since
> -  DEFAULT_ABI isn't established yet.  */
> -  error ("%qs requires the ELFv2 ABI", "-mrop-protect");
> -}
>  
>/* Determine if we need to save the condition code registers.  */
>if (save_reg_p (CR2_REGNO)
> @@ -3277,9 +3271,8 @@ rs6000_emit_prologue (void)
>/* NOTE: The hashst isn't needed if we're going to do a sibcall,
>   but there's no way to know that here.  Harmless except for
>   performance, of course.  */
> -  if (TARGET_POWER8 && rs6000_rop_protect && info->rop_hash_size != 0)
> +  if (info->rop_hash_size)
>  {
> -  gcc_assert (DEFAULT_ABI == ABI_ELFv2);
>rtx stack_ptr = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
>rtx addr = gen_rtx_PLUS (Pmode, stack_ptr,
>  GEN_INT (info->rop_hash_save_offset));
> @@ -5056,12 +5049,9 @@ rs6000_emit_epilogue (enum epilogue_type epilogue_type)
>  
>/* The ROP hash 

Re: [PATCH-1v5, rs6000] Implement optab_isinf for SFDF and IEEE128

2024-07-09 Thread Kewen.Lin
Hi Haochen,

on 2024/7/9 15:18, HAO CHEN GUI wrote:
> Hi,
>   This patch implements optab_isinf for SFDF and IEEE128 using test
> data class instructions.
> 
>   Compared with previous version, the main changes are:
> 1 Define 3 mode attributes which are used for predicate, constraint
> and asm print selection. They help merge sp/dp/qp patterns to one.
> 2 Remove original sp/dp and qp patterns and combine them into one.
> 3 Rename corresponding icode name in rs6000-builtin.cc and
> rs6000-builtins.def.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655779.html
> 
>   The expand "isinf2" and following insn pattern for TF and
> KF mode should be guarded on "TARGET_FLOAT128_HW". It will be
> changed in a subsequent patch, as some other "qp" insn patterns
> also need to be changed.
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Implement optab_isinf for SFDF and IEEE128
> 
> gcc/
>   PR target/97786
>   * config/rs6000/rs6000.md (constant VSX_TEST_DATA_CLASS_NAN,
>   VSX_TEST_DATA_CLASS_POS_INF, VSX_TEST_DATA_CLASS_NEG_INF,
>   VSX_TEST_DATA_CLASS_POS_ZERO, VSX_TEST_DATA_CLASS_NEG_ZERO,
>   VSX_TEST_DATA_CLASS_POS_DENORMAL, VSX_TEST_DATA_CLASS_NEG_DENORMAL):
>   Define.
>   (mode_attr sdq, vsx_altivec, wa_v, x): Define.
>   (mode_iterator IEEE_FP): Define.
>   * config/rs6000/vsx.md (isinf2): New expand.
>   (expand xststdcqp_, xststdcp): Combine into...
>   (expand xststdc_): ...this.
>   (insn *xststdcqp_, *xststdcp): Combine into...
>   (insn *xststdc_): ...this.
>   * config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin): Rename
>   CODE_FOR_xststdcqp_kf as CODE_FOR_xststdc_kf,
>   CODE_FOR_xststdcqp_tf as CODE_FOR_xststdc_tf.
>   * config/rs6000/rs6000-builtins.def: Rename xststdcdp as xststdc_df,
>   xststdcsp as xststdc_sf, xststdcqp_kf as xststdc_kf.
> 
> gcc/testsuite/
>   PR target/97786
>   * gcc.target/powerpc/pr97786-1.c: New test.
>   * gcc.target/powerpc/pr97786-2.c: New test.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
> b/gcc/config/rs6000/rs6000-builtin.cc
> index bb9da68edc7..a62a5d4afa7 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -3357,8 +3357,8 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* 
> subtarget */,
>case CODE_FOR_xsiexpqpf_kf:
>   icode = CODE_FOR_xsiexpqpf_tf;
>   break;
> -  case CODE_FOR_xststdcqp_kf:
> - icode = CODE_FOR_xststdcqp_tf;
> +  case CODE_FOR_xststdc_kf:
> + icode = CODE_FOR_xststdc_tf;
>   break;
>case CODE_FOR_xscmpexpqp_eq_kf:
>   icode = CODE_FOR_xscmpexpqp_eq_tf;
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 3bc7fed6956..8ac4cc200c9 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -2752,11 +2752,11 @@
> 
>const signed int \
>__builtin_vsx_scalar_test_data_class_dp (double, const int<7>);
> -VSTDCDP xststdcdp {}
> +VSTDCDP xststdc_df {}
> 
>const signed int \
>__builtin_vsx_scalar_test_data_class_sp (float, const int<7>);
> -VSTDCSP xststdcsp {}
> +VSTDCSP xststdc_sf {}
> 
>const signed int __builtin_vsx_scalar_test_neg_dp (double);
>  VSTDCNDP xststdcnegdp {}
> @@ -2925,7 +2925,7 @@
> 
>const signed int __builtin_vsx_scalar_test_data_class_qp (_Float128, \
>  const int<7>);
> -VSTDCQP xststdcqp_kf {}
> +VSTDCQP xststdc_kf {}
> 
>const signed int __builtin_vsx_scalar_test_neg_qp (_Float128);
>  VSTDCNQP xststdcnegqp_kf {}
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index a5d20594789..2d7f227e362 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -53,6 +53,20 @@ (define_constants
> (FRAME_POINTER_REGNUM 110)
>])
> 
> +;;
> +;; Test data class mask

Nit: s/mask/mask bits/

> +;;
> +
> +(define_constants
> +  [(VSX_TEST_DATA_CLASS_NAN  0x40)
> +   (VSX_TEST_DATA_CLASS_POS_INF  0x20)
> +   (VSX_TEST_DATA_CLASS_NEG_INF  0x10)

Formatting nit: Move 0x20 and 0x10 to align with the others?

> +   (VSX_TEST_DATA_CLASS_POS_ZERO 0x8)
> +   (VSX_TEST_DATA_CLASS_NEG_ZERO 0x4)
> +   (VSX_TEST_DATA_CLASS_POS_DENORMAL 0x2)
> +   (VSX_TEST_DATA_CLASS_NEG_DENORMAL 0x1)
> +  ])
> +
>  ;;
>  ;; UNSPEC usage
>  ;;
> @@ -605,6 +619,24 @@ (define_mode_iterator SFDF2 [SF DF])
>  (define_mode_attr sd [(SF   "s") (DF   "d")
> (V4SF "s") (V2DF "d")])
> 
> +; A generic s/d/q attribute, for sp/dp/qp for example.
> +(define_mode_attr sdq [(SF "s") (DF "d")
> +(TF "q") (KF "q")])
> +
> +; A predicate attribute, for IEEE floating point
> +(define_mode_attr vsx_altivec [(SF 

Re: [PATCH 1/3] expr: Allow same precision modes conversion between {ibm_extended, ieee_quad}_format [PR112993]

2024-07-09 Thread Kewen.Lin
Hi Richard,

on 2024/7/8 19:14, Richard Sandiford wrote:
> "Kewen.Lin"  writes:[snip...]
>>>
>>> This part looks good to me FWIW, but what's the correct behaviour of:
>>>
>>>   if (GET_MODE_PRECISION (from_mode) == GET_MODE_PRECISION (to_mode))
>>> {
>>>   if (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
>>>   && REAL_MODE_FORMAT (from_mode) == &ieee_half_format)
>>> /* libgcc implements just __trunchfbf2, not __extendhfbf2.  */
>>> tab = trunc_optab;
>>>   else
>>> /* Conversion between decimal float and binary float, same
>>>size.  */
>>> tab = DECIMAL_FLOAT_MODE_P (from_mode) ? trunc_optab : sext_optab;
>>>
>>> for the new pairing?  The intent for bfloat/half seems to be that bfloat
>>> is treated as arbitrarily “lesser than” half, so half->bfloat is a
>>> truncation and bfloat->half is an extension.  It seems like it would be
>>> good to do something similar for the new pair, so that the float modes
>>> still form a total order in terms of extension & truncation.
>>
>> Good question!  If I read it right, this special handling for half->bfloat is
>> to align with the previous implementation in libgcc and to ease backporting,
>> but isn't meant to make the modes form a total order, as in Jakub's comments
>> PR114907 #c6 and #c8.  Similar to half vs. bfloat, neither ibm128 nor ieee128
>> is a subset or superset of the other.  The current behavior for this new
>> pairing is that:
>>   1) define sext_optab for any two of TF/KF/IF (in both directions), since
>>  generic code above adopts sext_optab for same size conversion.
> 
> But before your patch, that code only expected equal precisions if:
> 
> || (DECIMAL_FLOAT_MODE_P (from_mode)
> != DECIMAL_FLOAT_MODE_P (to_mode))
> || (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
> && REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
> || (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
> && REAL_MODE_FORMAT (from_mode) == &ieee_half_format)
> 
> So the effect was:
> 
> binary->decimal: extend
> decimal->binary: truncate
> bfloat->IEEE: extend
> IEEE->bfloat: truncate
> 
> AFAICT there was no X and Y for which X->Y and Y->X are both extensions.

Yeah, this is exactly the status quo, though Jakub's comments implied that
IEEE->bfloat should have been supported with extend.

> 
>>   2) for each define_expand in 1), it actually calls
>>  rs6000_expand_float128_convert, which is an existing helper to handle
>>  conversions to/from __float128 and __ibm128; it will take care of the
>>  remaining work (generate __extend* or __trunc*).  Similar to half vs.
>>  bfloat, which has extend and trunc libgcc functions, we have some extend
>>  and trunc libgcc functions for __float128 vs. __ibm128.
>>
>> We can add some special treatment for the new pairing like what's done for
>> half vs. bfloat, but since extend and truncate can be arbitrary for them, I
>> thought we can just define sext_optab to align with what generic code does
>> and wants, then do the actual handling in the corresponding expander (someone
>> may argue that sext_optab is expected to end with libcall __extend*, so it's
>> surprising to end up with __trunc*, but considering the extend/truncate here
>> is arbitrary, I guess it's acceptable?).
> 
> Yeah, using a trunc libgcc function for sext_optab seems surprising to me.

OK, thanks for the input!  Unfortunately it's even worse in rs6000:

  set_conv_libfunc (sext_optab, mode, IFmode, "__trunctfkf2");
  if (mode != TFmode && FLOAT128_IBM_P (TFmode))
set_conv_libfunc (sext_optab, mode, TFmode, "__trunctfkf2");

  set_conv_libfunc (trunc_optab, IFmode, mode, "__extendkftf2");
  if (mode != TFmode && FLOAT128_IBM_P (TFmode))
set_conv_libfunc (trunc_optab, TFmode, mode, "__extendkftf2");

, the optabs and their corresponding libgcc functions look reversed.
I'll follow up to adjust them as well.

> 
> Like Jakub said in #c8 above, both extension and truncation are conceptually
> wrong.  But given that both are wrong, we can't use the literal meaning to
> choose between them, and need to find something else instead.
> 
> Making the sext/trunc choice mirror the current libgcc functions seems
> cleaner.  It also avoids the (IMO) odd situation that you c

[PATCH] rs6000: Remove vcond{,u} expanders

2024-07-08 Thread Kewen.Lin
Hi,

As PR114189 shows, the middle-end will obsolete the vcond, vcondu
and vcondeq optabs soon.  This patch removes all vcond{,u}
expanders in the rs6000 port and makes the function
rs6000_emit_vector_cond_expr, which was called by those
expanders, static.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this later this week if no objections.

BR,
Kewen
-

gcc/ChangeLog:

* config/rs6000/rs6000-protos.h (rs6000_emit_vector_cond_expr): Remove.
* config/rs6000/rs6000.cc (rs6000_emit_vector_cond_expr): Add static
qualifier as it is only called by rs6000_emit_swsqrt now.
* config/rs6000/vector.md (vcond): Remove.
(vcond): Remove.
(vcondv4sfv4si): Likewise.
(vcondv4siv4sf): Likewise.
(vcondv2dfv2di): Likewise.
(vcondv2div2df): Likewise.
(vcondu): Likewise.
(vconduv4sfv4si): Likewise.
(vconduv2dfv2di): Likewise.
---
 gcc/config/rs6000/rs6000-protos.h |   1 -
 gcc/config/rs6000/rs6000.cc   |   2 +-
 gcc/config/rs6000/vector.md   | 160 --
 3 files changed, 1 insertion(+), 162 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-protos.h 
b/gcc/config/rs6000/rs6000-protos.h
index 09a57a806fa..b40557a8557 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -126,7 +126,6 @@ extern void rs6000_emit_dot_insn (rtx dst, rtx src, int 
dot, rtx ccreg);
 extern bool rs6000_emit_set_const (rtx, rtx);
 extern bool rs6000_emit_cmove (rtx, rtx, rtx, rtx);
 extern bool rs6000_emit_int_cmove (rtx, rtx, rtx, rtx);
-extern int rs6000_emit_vector_cond_expr (rtx, rtx, rtx, rtx, rtx, rtx);
 extern void rs6000_emit_minmax (rtx, enum rtx_code, rtx, rtx);
 extern void rs6000_expand_atomic_compare_and_swap (rtx op[]);
 extern rtx swap_endian_selector_for_mode (machine_mode mode);
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 58553ff66f4..24044f3a558 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -16145,7 +16145,7 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
OP_FALSE are two VEC_COND_EXPR operands.  CC_OP0 and CC_OP1 are the two
operands for the relation operation COND.  */

-int
+static int
 rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, rtx op_false,
  rtx cond, rtx cc_op0, rtx cc_op1)
 {
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 59489e06839..0d3e0a24e11 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -331,166 +331,6 @@ (define_expand "vector_copysign3"
 })



-;; Vector comparisons
-(define_expand "vcond"
-  [(set (match_operand:VEC_F 0 "vfloat_operand")
-   (if_then_else:VEC_F
-(match_operator 3 "comparison_operator"
-[(match_operand:VEC_F 4 "vfloat_operand")
- (match_operand:VEC_F 5 "vfloat_operand")])
-(match_operand:VEC_F 1 "vfloat_operand")
-(match_operand:VEC_F 2 "vfloat_operand")))]
-  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
-{
-  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
-   operands[3], operands[4], operands[5]))
-DONE;
-  else
-gcc_unreachable ();
-})
-
-(define_expand "vcond"
-  [(set (match_operand:VEC_I 0 "vint_operand")
-   (if_then_else:VEC_I
-(match_operator 3 "comparison_operator"
-[(match_operand:VEC_I 4 "vint_operand")
- (match_operand:VEC_I 5 "vint_operand")])
-(match_operand:VEC_I 1 "vector_int_reg_or_same_bit")
-(match_operand:VEC_I 2 "vector_int_reg_or_same_bit")))]
-  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
-{
-  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
-   operands[3], operands[4], operands[5]))
-DONE;
-  else
-gcc_unreachable ();
-})
-
-(define_expand "vcondv4sfv4si"
-  [(set (match_operand:V4SF 0 "vfloat_operand")
-   (if_then_else:V4SF
-(match_operator 3 "comparison_operator"
-[(match_operand:V4SI 4 "vint_operand")
- (match_operand:V4SI 5 "vint_operand")])
-(match_operand:V4SF 1 "vfloat_operand")
-(match_operand:V4SF 2 "vfloat_operand")))]
-  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (V4SFmode)
-   && VECTOR_UNIT_ALTIVEC_P (V4SImode)"
-{
-  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
-   operands[3], operands[4], operands[5]))
-DONE;
-  else
-gcc_unreachable ();
-})
-
-(define_expand "vcondv4siv4sf"
-  [(set (match_operand:V4SI 0 "vint_operand")
-   (if_then_else:V4SI
-(match_operator 3 "comparison_operator"
-[(match_operand:V4SF 4 "vfloat_operand")
- (match_operand:V4SF 5 "vfloat_operand")])
-(match_operand:V4SI 1 

[PATCH] rs6000: Consider explicit VSX when masking off ALTIVEC [PR115688]

2024-07-04 Thread Kewen.Lin
Hi,

PR115688 exposes an inconsistent state in which we have VSX
enabled but ALTIVEC disabled.  There is one hunk:

  if (main_target_opt && !main_target_opt->x_rs6000_altivec_abi)
rs6000_isa_flags &= ~((OPTION_MASK_VSX | OPTION_MASK_ALTIVEC)
  & ~rs6000_isa_flags_explicit);

which disables both VSX and ALTIVEC together, considering only
whether each of them is explicitly set or not.  For the given case,
VSX is explicitly specified and altivec is implicitly enabled as
it's part of the set ISA_2_6_MASKS_SERVER.  When falling into the
above hunk, vsx is kept as it's explicitly enabled but altivec gets
masked off, which is unexpected.

This patch considers explicit VSX when masking off ALTIVEC, that is,
it does not mask off ALTIVEC if TARGET_VSX holds because VSX was
explicitly set.
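
For illustration only, the difference between the old and the new
masking can be sketched in a small standalone form.  The bit values
and function names below are made up for this sketch and are not
GCC's actual OPTION_MASK_* definitions:

```c
#include <stdint.h>

/* Hypothetical bit values, for illustration only; the real
   OPTION_MASK_* constants come from GCC's option machinery.  */
#define SKETCH_MASK_ALTIVEC UINT64_C (1)
#define SKETCH_MASK_VSX     UINT64_C (2)

/* Old behaviour: clear VSX and ALTIVEC together, each one unless it
   was set explicitly.  */
uint64_t
mask_off_old (uint64_t flags, uint64_t explicit_flags)
{
  return flags & ~((SKETCH_MASK_VSX | SKETCH_MASK_ALTIVEC)
		   & ~explicit_flags);
}

/* Patched behaviour: clear VSX first, then clear ALTIVEC only if
   VSX did not survive.  */
uint64_t
mask_off_new (uint64_t flags, uint64_t explicit_flags)
{
  flags &= ~(SKETCH_MASK_VSX & ~explicit_flags);
  if (!(flags & SKETCH_MASK_VSX))
    flags &= ~(SKETCH_MASK_ALTIVEC & ~explicit_flags);
  return flags;
}
```

With both bits enabled and only VSX explicit, mask_off_old leaves
VSX without ALTIVEC (the inconsistent state from the PR), while
mask_off_new keeps both.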

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this next week if no objections.

BR,
Kewen
-
PR target/115688

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_option_override_internal): Consider
explicit VSX when masking off ALTIVEC.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr115688.c: New test.
---
 gcc/config/rs6000/rs6000.cc |  8 ++--
 gcc/testsuite/gcc.target/powerpc/pr115688.c | 14 ++
 2 files changed, 20 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr115688.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 58553ff66f4..2cbea6ea2d7 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -3933,8 +3933,12 @@ rs6000_option_override_internal (bool global_init_p)
  not for 32-bit.  Don't move this before the above code using ignore_masks,
  since it can reset the cleared VSX/ALTIVEC flag again.  */
   if (main_target_opt && !main_target_opt->x_rs6000_altivec_abi)
-rs6000_isa_flags &= ~((OPTION_MASK_VSX | OPTION_MASK_ALTIVEC)
- & ~rs6000_isa_flags_explicit);
+{
+  rs6000_isa_flags &= ~(OPTION_MASK_VSX & ~rs6000_isa_flags_explicit);
+  /* Don't mask off ALTIVEC if it is enabled by an explicit VSX.  */
+  if (!TARGET_VSX)
+   rs6000_isa_flags &= ~(OPTION_MASK_ALTIVEC & ~rs6000_isa_flags_explicit);
+}

   if (TARGET_CRYPTO && !TARGET_ALTIVEC)
 {
diff --git a/gcc/testsuite/gcc.target/powerpc/pr115688.c 
b/gcc/testsuite/gcc.target/powerpc/pr115688.c
new file mode 100644
index 000..5222e66ef17
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr115688.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target powerpc*-*-linux* } } */
+/* { dg-options "-mdejagnu-cpu=power5 -O2" } */
+
+/* Ignore some error messages on "target attribute or
+   pragma changes AltiVec ABI".  */
+/* { dg-excess-errors "pr115688" { target ilp32 } } */
+
+/* Verify there is no ICE under 32 bit env.  */
+
+__attribute__((target("vsx")))
+int test (void)
+{
+  return 0;
+}


Re: [PATCH 1/3] expr: Allow same precision modes conversion between {ibm_extended, ieee_quad}_format [PR112993]

2024-07-04 Thread Kewen.Lin
Hi Richard,

Thanks for the review comments!

on 2024/7/4 23:58, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> Hi,
>>
>> For historical reasons, rs6000 defines KFmode, TFmode
>> and IFmode to have different mode precisions, but this causes
>> some issues and needs some workarounds such as r14-6478 for
>> PR112788.  So we are going to make all rs6000 128 bit scalar
>> FP modes have 128 bit precision.  To prepare for that, this
>> patch makes function convert_mode_scalar allow same
>> precision FP mode conversion if the underlying formats are
>> ibm_extended_format and ieee_quad_format respectively, just
>> like the existing special treatment of arm_bfloat_half_format
>> <-> ieee_half_format.  It also factors out all the relevant
>> checks into a lambda function.
>>
>> Bootstrapped and regtested on x86_64-redhat-linux and
>> powerpc64{,le}-linux-gnu.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -
>>  PR target/112993
>>
>> gcc/ChangeLog:
>>
>>  * expr.cc (convert_mode_scalar): Allow same precision conversion
>>  between scalar floating point modes if whose underlying format is
>>  ibm_extended_format or ieee_quad_format, and refactor assertion
>>  with new lambda function acceptable_same_precision_modes.
>> ---
>>  gcc/expr.cc | 30 --
>>  1 file changed, 24 insertions(+), 6 deletions(-)
>>
>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> index ffbac513692..eac4dcc982e 100644
>> --- a/gcc/expr.cc
>> +++ b/gcc/expr.cc
>> @@ -338,6 +338,29 @@ convert_mode_scalar (rtx to, rtx from, int unsignedp)
>>enum rtx_code equiv_code = (unsignedp < 0 ? UNKNOWN
>>: (unsignedp ? ZERO_EXTEND : SIGN_EXTEND));
>>
>> +  auto acceptable_same_precision_modes
>> += [] (scalar_mode from_mode, scalar_mode to_mode) -> bool
>> +{
>> +  if (DECIMAL_FLOAT_MODE_P (from_mode) != DECIMAL_FLOAT_MODE_P 
>> (to_mode))
>> +return true;
>> +
>> +  /* arm_bfloat_half_format <-> ieee_half_format */
>> +  if ((REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
>> +   && REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
>> +  || (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
>> +  && REAL_MODE_FORMAT (from_mode) == &ieee_half_format))
>> +return true;
>> +
>> +  /* ibm_extended_format <-> ieee_quad_format */
>> +  if ((REAL_MODE_FORMAT (from_mode) == &ibm_extended_format
>> +   && REAL_MODE_FORMAT (to_mode) == &ieee_quad_format)
>> +  || (REAL_MODE_FORMAT (from_mode) == &ieee_quad_format
>> +  && REAL_MODE_FORMAT (to_mode) == &ibm_extended_format))
>> +return true;
>> +
>> +  return false;
>> +};
>> +
>>if (to_real)
>>  {
>>rtx value;
>> @@ -346,12 +369,7 @@ convert_mode_scalar (rtx to, rtx from, int unsignedp)
>>
>>gcc_assert ((GET_MODE_PRECISION (from_mode)
>> != GET_MODE_PRECISION (to_mode))
>> -  || (DECIMAL_FLOAT_MODE_P (from_mode)
>> -  != DECIMAL_FLOAT_MODE_P (to_mode))
>> -  || (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
>> -  && REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
>> -  || (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
>> -  && REAL_MODE_FORMAT (from_mode) == &ieee_half_format));
>> +  || acceptable_same_precision_modes (from_mode, to_mode));
>>
>>if (GET_MODE_PRECISION (from_mode) == GET_MODE_PRECISION (to_mode))
>>  {
>> --
>> 2.39.1
> 
> This part looks good to me FWIW, but what's the correct behaviour of:
> 
>   if (GET_MODE_PRECISION (from_mode) == GET_MODE_PRECISION (to_mode))
>   {
> if (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
> && REAL_MODE_FORMAT (from_mode) == &ieee_half_format)
>   /* libgcc implements just __trunchfbf2, not __extendhfbf2.  */
>   tab = trunc_optab;
> else
>   /* Conversion between decimal float and binary float, same
>  size.  */
>   tab = DECIMAL_FLOAT_MODE_P (from_mode) ? trunc_optab : sext_optab;
> 
> for the new pairing?  The intent for bfloat/half seems to be that bfloat
> is treated as arbitrarily “lesser than” half, so half->bfloat is a
> truncation and bfloat->half is an extension.  It seems like it would be
> good to do something similar for the new pair

[PATCH 3/3] tree: Remove KFmode workaround [PR112993]

2024-07-04 Thread Kewen.Lin
Hi,

The fix for PR112993 will make KFmode have 128 bit mode precision,
so we no longer need this workaround to fix up the type precision
and can just go with the mode precision.  So this patch removes
the KFmode workaround.
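
As a worked check of what the removed workaround computes, here is a
minimal standalone sketch of its formula, precision = p + ceil_log2
(emax - emin).  The helper names are made up for this sketch, and the
format parameters mentioned below (p = 113, emin = -16381,
emax = 16384 for IEEE quad) are quoted from memory of real.cc, so
treat them as an assumption:

```c
/* Smallest k such that 2^k >= x, for x >= 1.  A stand-in for GCC's
   ceil_log2, written out here so the sketch is self-contained.  */
static int
ceil_log2_sketch (unsigned long x)
{
  int k = 0;
  while ((1UL << k) < x)
    k++;
  return k;
}

/* Minimum storage precision implied by a binary format with P
   significand bits and exponent range [EMIN, EMAX].  */
int
min_precision_sketch (int p, int emin, int emax)
{
  return p + ceil_log2_sketch ((unsigned long) (emax - emin));
}
```

For IEEE quad this gives 113 + ceil_log2 (32765) = 113 + 15 = 128,
which is the value the workaround forced for KFmode and which the
mode precision itself provides once KFmode is a 128 bit mode.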

Bootstrapped and regtested on x86_64-redhat-linux,
powerpc64{,le}-linux-gnu (ibm128 long double default)
and powerpc64le-linux-gnu (ieee128 long double default).

Is it OK for trunk if {1,2}/3 in this series get landed?

BR,
Kewen
-

PR target/112993

gcc/ChangeLog:

* tree.cc (build_common_tree_nodes): Drop the workaround for rs6000
KFmode precision adjustment.
---
 gcc/tree.cc | 9 -
 1 file changed, 9 deletions(-)

diff --git a/gcc/tree.cc b/gcc/tree.cc
index f801712c9dd..f730981ec8b 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -9575,15 +9575,6 @@ build_common_tree_nodes (bool signed_char)
   if (!targetm.floatn_mode (n, extended).exists ())
continue;
   int precision = GET_MODE_PRECISION (mode);
-  /* Work around the rs6000 KFmode having precision 113 not
-128.  */
-  const struct real_format *fmt = REAL_MODE_FORMAT (mode);
-  gcc_assert (fmt->b == 2 && fmt->emin + fmt->emax == 3);
-  int min_precision = fmt->p + ceil_log2 (fmt->emax - fmt->emin);
-  if (!extended)
-   gcc_assert (min_precision == n);
-  if (precision < min_precision)
-   precision = min_precision;
   FLOATN_NX_TYPE_NODE (i) = make_node (REAL_TYPE);
   TYPE_PRECISION (FLOATN_NX_TYPE_NODE (i)) = precision;
   layout_type (FLOATN_NX_TYPE_NODE (i));
--
2.39.1


[PATCH 2/3 v2] rs6000: Make all 128 bit scalar FP modes have 128 bit precision [PR112993]

2024-07-04 Thread Kewen.Lin
Hi,

On rs6000, there are three 128 bit scalar floating point
modes: TFmode, IFmode and KFmode.  For historical reasons,
we define them with different mode precisions, that is
KFmode 126, TFmode 127 and IFmode 128.  But in fact all of
them should have the same mode precision 128.  This special
setting has caused some issues, like the unexpected failures
mentioned in [1], and also made us introduce some
workarounds, such as the workaround in
build_common_tree_nodes for KFmode 126 and the workaround in
range_compatible_p for the same mode but different precision
issue.

This patch makes these three 128 bit scalar floating
point modes TFmode, IFmode and KFmode have 128 bit mode
precision, and keeps the same order as before so that
machine independent parts of the compiler do not try
to widen IFmode to TFmode.  Besides, build_common_tree_nodes
adopts the newly added hook mode_for_floating_type so we
don't need to worry about an unexpected mode for the long
double type node.

In function convert_mode_scalar, sext_optab is adopted for
same precision mode conversions if !DECIMAL_FLOAT_MODE_P,
so we only need to support sext_optab for any possible
conversion.  Thus this patch removes some useless trunc
optab support, supplements one new sext_optab which calls
the common handler rs6000_expand_float128_convert, and unnames
two define_insn_and_split patterns to avoid conflicts and make
them clearer.  Considering that in the current implementation
there is no chance to have a KF <-> IF conversion (since
either of them would be TF already), it adds two dummy
define_expands to assert this.

Bootstrapped and regtested on x86_64-redhat-linux,
powerpc64{,le}-linux-gnu (ibm128 long double default)
and powerpc64le-linux-gnu (ieee128 long double default).

Compared to v1 [2], it makes use of the new hook
mode_for_floating_type and factors out the change to the
generic part of the code into 1/3.

I'm going to push this once the generic part 1/3 gets
approved and there is no objection to this one.

btw, two related patches on fortran[3] and ranger[4] have been
approved.

[1] https://inbox.sourceware.org/gcc-patches/718677e7-614d-7977-312d-05a75e1fd...@linux.ibm.com/
[2] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651017.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651018.html
[4] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651019.html

BR,
Kewen
-
PR target/112993

gcc/ChangeLog:

* config/rs6000/rs6000-modes.def (IFmode, KFmode, TFmode): Define
with FLOAT_MODE instead of FRACTIONAL_FLOAT_MODE, don't use special
precisions any more.
(rs6000-modes.h): Remove include.
* config/rs6000/rs6000-modes.h: Remove.
* config/rs6000/rs6000.h (rs6000-modes.h): Remove include.
* config/rs6000/t-rs6000: Remove rs6000-modes.h include.
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Replace
all uses of FLOAT_PRECISION_TFmode with 128.
(rs6000_c_mode_for_floating_type): Likewise.
* config/rs6000/rs6000.md (define_expand trunciftf2): Remove.
(define_expand truncifkf2): Remove.
(define_expand trunckftf2): Remove.
(define_expand trunctfif2): Remove.
(define_expand expandtfkf2, expandtfif2): Merge to ...
(define_expand expandtf2): ... this, new.
(define_expand expandiftf2): Merge to ...
(define_expand expandtf2): ... this, new.
(define_expand expandiftf2): Update with assert.
(define_expand expandkfif2): New.
(define_insn_and_split extendkftf2): Rename to  ...
(define_insn_and_split *extendkftf2): ... this.
(define_insn_and_split trunctfkf2): Rename to ...
(define_insn_and_split *extendtfkf2): ... this.
---
 gcc/config/rs6000/rs6000-modes.def | 31 +
 gcc/config/rs6000/rs6000-modes.h   | 36 ---
 gcc/config/rs6000/rs6000.cc|  9 +---
 gcc/config/rs6000/rs6000.h |  5 ---
 gcc/config/rs6000/rs6000.md| 72 --
 gcc/config/rs6000/t-rs6000 |  1 -
 6 files changed, 32 insertions(+), 122 deletions(-)
 delete mode 100644 gcc/config/rs6000/rs6000-modes.h

diff --git a/gcc/config/rs6000/rs6000-modes.def 
b/gcc/config/rs6000/rs6000-modes.def
index 094b246c834..b69593c40a6 100644
--- a/gcc/config/rs6000/rs6000-modes.def
+++ b/gcc/config/rs6000/rs6000-modes.def
@@ -18,12 +18,11 @@
along with GCC; see the file COPYING3.  If not see
.  */

-/* We order the 3 128-bit floating point types so that IFmode (IBM 128-bit
-   floating point) is the 128-bit floating point type with the highest
-   precision (128 bits).  This so that machine independent parts of the
-   compiler do not try to widen IFmode to TFmode on ISA 3.0 (power9) that has
-   hardware support for IEEE 128-bit.  We set TFmode (long double mode) in
-   between, and KFmode (explicit __float128) below it.
+/* We order the 3 128-bit floating point type modes here as 

[PATCH 1/3] expr: Allow same precision modes conversion between {ibm_extended, ieee_quad}_format [PR112993]

2024-07-04 Thread Kewen.Lin
Hi,

For historical reasons, rs6000 defines KFmode, TFmode
and IFmode to have different mode precisions, but this causes
some issues and needs some workarounds such as r14-6478 for
PR112788.  So we are going to make all rs6000 128 bit scalar
FP modes have 128 bit precision.  To prepare for that, this
patch makes function convert_mode_scalar allow same
precision FP mode conversion if the underlying formats are
ibm_extended_format and ieee_quad_format respectively, just
like the existing special treatment of arm_bfloat_half_format
<-> ieee_half_format.  It also factors out all the relevant
checks into a lambda function.
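
For readers without the GCC tree at hand, the logic of the new check
can be sketched standalone.  The enum, its members and the helper
names below are hypothetical stand-ins for GCC's real_format objects
and macros, not the actual expr.cc code:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the formats involved.  */
enum fmt_sketch
{
  FMT_IEEE_HALF,
  FMT_ARM_BFLOAT_HALF,
  FMT_IEEE_QUAD,
  FMT_IBM_EXTENDED,
  FMT_DECIMAL128
};

/* Stand-in for DECIMAL_FLOAT_MODE_P.  */
static bool
decimal_p (enum fmt_sketch f)
{
  return f == FMT_DECIMAL128;
}

/* True if {A, B} is the unordered pair {X, Y}.  */
static bool
pair_p (enum fmt_sketch a, enum fmt_sketch b,
	enum fmt_sketch x, enum fmt_sketch y)
{
  return (a == x && b == y) || (a == y && b == x);
}

/* Same-precision conversions are acceptable between decimal and
   binary formats, between bfloat and IEEE half, and (new with this
   patch) between IBM extended and IEEE quad.  */
bool
acceptable_same_precision_sketch (enum fmt_sketch from, enum fmt_sketch to)
{
  if (decimal_p (from) != decimal_p (to))
    return true;
  if (pair_p (from, to, FMT_ARM_BFLOAT_HALF, FMT_IEEE_HALF))
    return true;
  if (pair_p (from, to, FMT_IBM_EXTENDED, FMT_IEEE_QUAD))
    return true;
  return false;
}
```

Any other pair of same-precision binary formats, for example IEEE
quad to IEEE quad, is still rejected, matching the assertion the
patch keeps in convert_mode_scalar.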

Bootstrapped and regtested on x86_64-redhat-linux and
powerpc64{,le}-linux-gnu.

Is it ok for trunk?

BR,
Kewen
-
PR target/112993

gcc/ChangeLog:

* expr.cc (convert_mode_scalar): Allow same precision conversion
between scalar floating point modes if whose underlying format is
ibm_extended_format or ieee_quad_format, and refactor assertion
with new lambda function acceptable_same_precision_modes.
---
 gcc/expr.cc | 30 --
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index ffbac513692..eac4dcc982e 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -338,6 +338,29 @@ convert_mode_scalar (rtx to, rtx from, int unsignedp)
   enum rtx_code equiv_code = (unsignedp < 0 ? UNKNOWN
  : (unsignedp ? ZERO_EXTEND : SIGN_EXTEND));

+  auto acceptable_same_precision_modes
+= [] (scalar_mode from_mode, scalar_mode to_mode) -> bool
+{
+  if (DECIMAL_FLOAT_MODE_P (from_mode) != DECIMAL_FLOAT_MODE_P (to_mode))
+   return true;
+
+  /* arm_bfloat_half_format <-> ieee_half_format */
+  if ((REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
+  && REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
+ || (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
+ && REAL_MODE_FORMAT (from_mode) == &ieee_half_format))
+   return true;
+
+  /* ibm_extended_format <-> ieee_quad_format */
+  if ((REAL_MODE_FORMAT (from_mode) == &ibm_extended_format
+  && REAL_MODE_FORMAT (to_mode) == &ieee_quad_format)
+ || (REAL_MODE_FORMAT (from_mode) == &ieee_quad_format
+ && REAL_MODE_FORMAT (to_mode) == &ibm_extended_format))
+   return true;
+
+  return false;
+};
+
   if (to_real)
 {
   rtx value;
@@ -346,12 +369,7 @@ convert_mode_scalar (rtx to, rtx from, int unsignedp)

   gcc_assert ((GET_MODE_PRECISION (from_mode)
   != GET_MODE_PRECISION (to_mode))
- || (DECIMAL_FLOAT_MODE_P (from_mode)
- != DECIMAL_FLOAT_MODE_P (to_mode))
- || (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
- && REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
- || (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
- && REAL_MODE_FORMAT (from_mode) == &ieee_half_format));
+ || acceptable_same_precision_modes (from_mode, to_mode));

   if (GET_MODE_PRECISION (from_mode) == GET_MODE_PRECISION (to_mode))
{
--
2.39.1


Re: [PATCH 13/13 ver5] rs6000, remove vector set and vector init built-ins.

2024-07-04 Thread Kewen.Lin
Hi Carl,

on 2024/7/4 07:51, Carl Love wrote:
>  GCC maintainers:
> 
> The patch has been updated to remove the customized vec_init built-in code.  
> Specifically the init identifier, the related generated code for the init 
> built-in attribute bit, function altivec_expand_vec_init_builtin and calls to 
> the function.
> 
> Please let me know if the patch is acceptable for mainline. Thanks.
> 
>   Carl
> 
> ---
> 
> rs6000, remove vector set and vector init built-ins.
> 
> The vector init built-ins:
> 
>   __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
>   __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
>   __builtin_vec_init_v2di, __builtin_vec_init_v2df,
>   __builtin_vec_init_v1ti
> 
> perform the same operation as initializing the vector in C code. For
> example:
> 
>   result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
>   result_v4si = {1, 2, 3, 4};
> 
> These two constructs were tested and verified they generate identical
> assembly instructions with no optimization and -O3 optimization.
> 
> The vector set built-ins:
> 
>   __builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
>   __builtin_vec_set_v4si, __builtin_vec_set_v4sf,
>   __builtin_vec_set_v1ti, __builtin_vec_set_v2di,
>   __builtin_vec_set_v2df
> 
> perform the same operation as setting a specific element in the vector in
> C code.  For example:
> 
>   src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
>   src_v4si[index] = int_val;
> 
> The built-in actually generates more instructions than the inline C code
> with no optimization but is identical with -O3 optimizations.
> 
> All of the above built-ins that are removed do not have test cases and
> are not documented.
> 
> Built-ins   __builtin_vec_set_v1ti __builtin_vec_set_v2di,
> __builtin_vec_set_v2df are not removed as they are used in function
> resolve_vec_insert() in file rs6000-c.cc.
> 
> The built-ins are removed as they don't provide any benefit over just
> using C code.
> 
> The code to define the bif_init_bit, bif_is_init, as well as their uses
> is removed.  The function altivec_expand_vec_init_builtin is also removed.

Nit: s/is removed/are removed/ ?

> 
> gcc/ChangeLog:
>     * config/rs6000/rs6000-builtin.cc (altivec_expand_vec_init_builtin):
>     Removed the function.

Nit: s/Removed/Remove/, applied for the other changelog entries.

>     (rs6000_expand_builtin): Removed the if bif_is_int check to call
>     the altivec_expand_vec_init_builtin function.


>     * config/rs6000/rs6000-builtins.def: Removed the attribute string
>     comment for init.
>     (__builtin_vec_init_v16qi,
>     __builtin_vec_init_v4sf, __builtin_vec_init_v4si,
>     __builtin_vec_init_v8hi, __builtin_vec_init_v1ti,
>     __builtin_vec_init_v2df, __builtin_vec_init_v2di,
>     __builtin_vec_set_v16qi, __builtin_vec_set_v4sf,
>     __builtin_vec_set_v4si, __builtin_vec_set_v8hi): Remove
>     built-in definitions.
>     * config/rs6000-gen-builtins.cc: Removed comment for init attribute
>     string.
>     (struct attrinfo): Removed isint entry.

Typo: s/isint/isinit/

>     (parse_bif_attrs): Removed the if statement to check for attribute
>     init.
>     (ifdef DEBUG): Removed print for init attribute string.
>     (write_decls): Removed print for define bif_init_bit and
>     define for bif_is_init.
>     (write_bif_static_init): Removed if bifp->attrs.isinit statement.
> ---
>  gcc/config/rs6000/rs6000-builtin.cc  | 40 -
>  gcc/config/rs6000/rs6000-builtins.def    | 45 +++-
>  gcc/config/rs6000/rs6000-gen-builtins.cc | 16 +++--
>  3 files changed, 8 insertions(+), 93 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
> b/gcc/config/rs6000/rs6000-builtin.cc
> index 646e740774e..0a24d20a58c 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -2313,43 +2313,6 @@ altivec_expand_predicate_builtin (enum insn_code 
> icode, tree exp, rtx target)
>    return target;
>  }
> 
> -/* Expand vec_init builtin.  */
> -static rtx
> -altivec_expand_vec_init_builtin (tree type, tree exp, rtx target)
> -{
> -  machine_mode tmode = TYPE_MODE (type);
> -  machine_mode inner_mode = GET_MODE_INNER (tmode);
> -  int i, n_elt = GET_MODE_NUNITS (tmode);
> -
> -  gcc_assert (VECTOR_MODE_P (tmode));
> -  gcc_assert (n_elt == call_expr_nargs (exp));
> -
> -  if (!target || !register_operand (target, tmode))
> -    target = gen_reg_rtx (tmode);
> -
> -  /* If we have a vector compromised of a single element, such as V1TImode, 
> do
> - the initialization directly.  */
> -  if (n_elt == 1 && GET_MODE_SIZE (tmode) == GET_MODE_SIZE (inner_mode))
> -    {
> -  rtx x = expand_normal (CALL_EXPR_ARG (exp, 0));
> -  emit_move_insn (target, gen_lowpart (tmode, x));
> -    }
> -  else
> -    {
> -  rtvec v = rtvec_alloc (n_elt);
> -
> -  for (i = 0; i < n_elt; ++i)
> -    {
> -    
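
To illustrate the equivalence claimed in the commit message, here is a
minimal sketch that uses the generic GCC vector extension instead of the
rs6000 built-ins.  The typedef name v4si and the helpers init_v4si and
set_v4si are made up for this note and are not part of the patch; the sketch
only shows the plain C forms that the message says can replace
__builtin_vec_init_v4si and __builtin_vec_set_v4si.

```c
#include <assert.h>

/* Generic GCC/Clang vector type: four 32-bit ints in 16 bytes.
   The name v4si is an assumption chosen to match the built-in suffix.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Plain C counterpart of __builtin_vec_init_v4si (a, b, c, d).  */
static v4si
init_v4si (int a, int b, int c, int d)
{
  v4si v = {a, b, c, d};
  return v;
}

/* Plain C counterpart of __builtin_vec_set_v4si (v, val, index).  */
static v4si
set_v4si (v4si v, int val, unsigned int index)
{
  assert (index < 4);   /* only four elements exist */
  v[index] = val;       /* element store via vector subscripting */
  return v;
}
```

For example, set_v4si (init_v4si (1, 2, 3, 4), 42, 2) yields a vector whose
elements are 1, 2, 42, 4.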

Re: [PATCH 4/13 ver5] rs6000, extend the current vec_{un, }signed{e, o} built-ins

2024-07-04 Thread Kewen.Lin
Hi,

on 2024/7/4 07:40, Carl Love wrote:
> 
> GCC maintainers:
> 
> I moved the removal of built-ins __builtin_vsx_xvcvdpsxws and 
> __builtin_vsx_xvcvdpuxws from patch 4 to patch 2.
> 
> I fixed various issues with the ChangeLog wording, spaces and descriptions.
> 
> Fixed the comments in file gcc/config/rs6000/vsx.md.
> 
> Updated the built-in description in gcc/doc/extend.texi.
> 
> Please let me know if the patch is acceptable for mainline. Thanks.
> 
> Carl
> 
> 
> 
>  rs6000, extend the current vec_{un,}signed{e,o}  built-ins

Nit: s/  / /

> 
> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
> convert a vector of floats to a vector of signed/unsigned long long ints.
> Extend the existing vec_{un,}signed{e,o} built-ins to handle the argument
> vector of floats to return a vector of even/odd signed/unsigned integers.
> 
> The define expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf,
> vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o}
> built-ins.
> 
> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are
> now for internal use only. They are not documented and they do not
> have test cases.
> 
> Add testcases and update documentation.

OK for trunk, thanks!

BR,
Kewen

> 
> gcc/ChangeLog:
>     (__builtin_vsx_xvcvspsxds, __builtin_vsx_xvcvspuxds): Rename to
>     __builtin_vsignede_v4sf, __builtin_vunsignede_v4sf respectively.
>     (XVCVSPSXDS, XVCVSPUXDS): Rename to VEC_VSIGNEDE_V4SF,
>     VEC_VUNSIGNEDE_V4SF respectively.
>     (__builtin_vsignedo_v4sf, __builtin_vunsignedo_v4sf): New
>     built-in definitions.
>     * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo,
>     vec_unsignede, vec_unsignedo): Add new overloaded specifications.
>     * config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf,
>     vunsignede_v4sf, vunsignedo_v4sf): New define_expands.
>     * doc/extend.texi (vec_signedo, vec_signede, vec_unsignedo,
>     vec_unsignede): Add documentation for new overloaded built-ins to
>     convert vector float to vector {un,}signed long long.
> 
> gcc/testsuite/ChangeLog:
>     * gcc.target/powerpc/builtins-3-runnable.c
>     (test_unsigned_int_result, test_ll_unsigned_int_result): Add
>     new argument.
>     (vec_signede, vec_signedo, vec_unsignede, vec_unsignedo): New
>     tests for the overloaded built-ins.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 14 +++-
>  gcc/config/rs6000/rs6000-overload.def |  8 ++
>  gcc/config/rs6000/vsx.md  | 84 +++
>  gcc/doc/extend.texi   | 10 +++
>  .../gcc.target/powerpc/builtins-3-runnable.c  | 49 +--
>  5 files changed, 154 insertions(+), 11 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 43d5c229dc3..29a9deb3410 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1697,11 +1697,17 @@
>    const vd __builtin_vsx_xvcvspdp (vf);
>  XVCVSPDP vsx_xvcvspdp {}
> 
> -  const vsll __builtin_vsx_xvcvspsxds (vf);
> -    XVCVSPSXDS vsx_xvcvspsxds {}
> +  const vsll __builtin_vsignede_v4sf (vf);
> +    VEC_VSIGNEDE_V4SF vsignede_v4sf {}
> 
> -  const vsll __builtin_vsx_xvcvspuxds (vf);
> -    XVCVSPUXDS vsx_xvcvspuxds {}
> +  const vsll __builtin_vsignedo_v4sf (vf);
> +    VEC_VSIGNEDO_V4SF vsignedo_v4sf {}
> +
> +  const vull __builtin_vunsignede_v4sf (vf);
> +    VEC_VUNSIGNEDE_V4SF vunsignede_v4sf {}
> +
> +  const vull __builtin_vunsignedo_v4sf (vf);
> +    VEC_VUNSIGNEDO_V4SF vunsignedo_v4sf {}
> 
>    const vd __builtin_vsx_xvcvsxddp (vsll);
>  XVCVSXDDP vsx_floatv2div2df2 {}
> diff --git a/gcc/config/rs6000/rs6000-overload.def 
> b/gcc/config/rs6000/rs6000-overload.def
> index 84bd9ae6554..4d857bb1af3 100644
> --- a/gcc/config/rs6000/rs6000-overload.def
> +++ b/gcc/config/rs6000/rs6000-overload.def
> @@ -3307,10 +3307,14 @@
>  [VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede]
>    vsi __builtin_vec_vsignede (vd);
>  VEC_VSIGNEDE_V2DF
> +  vsll __builtin_vec_vsignede (vf);
> +    VEC_VSIGNEDE_V4SF
> 
>  [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo]
>    vsi __builtin_vec_vsignedo (vd);
>  VEC_VSIGNEDO_V2DF
> +  vsll __builtin_vec_vsignedo (vf);
> +    VEC_VSIGNEDO_V4SF
> 
>  [VEC_SIGNEXTI, vec_signexti, __builtin_vec_signexti]
>    vsi __builtin_vec_signexti (vsc);
> @@ -4433,10 +4437,14 @@
>  [VEC_UNSIGNEDE, vec_unsignede, __builtin_vec_vunsignede]
>    vui __builtin_vec_vunsignede (vd);
>  VEC_VUNSIGNEDE_V2DF
> +  vull __builtin_vec_vunsignede (vf);
> +    VEC_VUNSIGNEDE_V4SF
> 
>  [VEC_UNSIGNEDO, vec_unsignedo, __builtin_vec_vunsignedo]
>    vui __builtin_vec_vunsignedo (vd);
>  VEC_VUNSIGNEDO_V2DF
> +  vull __builtin_vec_vunsignedo (vf);
> +    VEC_VUNSIGNEDO_V4SF
> 
>  [VEC_VEE, vec_extract_exp, __builtin_vec_extract_exp]
>    vui __builtin_vec_extract_exp (vf);
> diff --git 

Re: [PATCH 2/13 ver5] rs6000, __builtin_vsx_xvcv{sp{sx,u}ws,dpuxds_uns}

2024-07-04 Thread Kewen.Lin
Hi,

on 2024/7/4 07:33, Carl Love wrote:
> GCC maintainers:
> 
> Per the comments on patch 2 from version 4, I have moved the removal of 
> built-ins __builtin_vsx_xvcvdpsxws and __builtin_vsx_xvcvdpuxws from patch 4 
> to this patch.
> 
> Please let me know if this patch is acceptable.  Thanks.
> 
>     Carl
> 
> 
> 
> rs6000, __builtin_vsx_xvcv{sp{sx,u}ws,dpuxds_uns}

Nit: incomplete subject
rs6000: Remove built-ins __builtin_vsx_xvcv{sp{sx,u}ws,dpuxds_uns}

OK for trunk with this nit fixed, thanks!

BR,
Kewen

> 
> The built-in __builtin_vsx_xvcvspsxws is covered by built-in vec_signed
> built-in that is documented in the PVIPR.  The __builtin_vsx_xvcvspsxws
> built-in is not documented and there are no test cases for it.
> 
> The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by
> vec_unsigned, remove.
> 
> The __builtin_vsx_xvcvspuxws is redundant as it is covered by
> vec_unsigned, remove.
> 
> The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by
> vec_signed{e,o}, remove.
> 
> The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by
> vec_unsigned{e,o}, remove.
> 
> This patch removes the redundant built-ins.
> 
> gcc/ChangeLog:
>     * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxws,
>     __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws,
>     __builtin_vsx_xvcvdpsxws, __builtin_vsx_xvcvdpuxws): Remove
>     built-in definitions.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 15 ---
>  1 file changed, 15 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
> b/gcc/config/rs6000/rs6000-builtins.def
> index 7c36976a089..60ccc5542be 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1688,36 +1688,21 @@
>    const vsll __builtin_vsx_xvcvdpsxds_scale (vd, const int);
>  XVCVDPSXDS_SCALE vsx_xvcvdpsxds_scale {}
> 
> -  const vsi __builtin_vsx_xvcvdpsxws (vd);
> -    XVCVDPSXWS vsx_xvcvdpsxws {}
> -
>    const vsll __builtin_vsx_xvcvdpuxds (vd);
>  XVCVDPUXDS vsx_fixuns_truncv2dfv2di2 {}
> 
>    const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
>  XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
> 
> -  const vull __builtin_vsx_xvcvdpuxds_uns (vd);
> -    XVCVDPUXDS_UNS vsx_fixuns_truncv2dfv2di2 {}
> -
> -  const vsi __builtin_vsx_xvcvdpuxws (vd);
> -    XVCVDPUXWS vsx_xvcvdpuxws {}
> -
>    const vd __builtin_vsx_xvcvspdp (vf);
>  XVCVSPDP vsx_xvcvspdp {}
> 
>    const vsll __builtin_vsx_xvcvspsxds (vf);
>  XVCVSPSXDS vsx_xvcvspsxds {}
> 
> -  const vsi __builtin_vsx_xvcvspsxws (vf);
> -    XVCVSPSXWS vsx_fix_truncv4sfv4si2 {}
> -
>    const vsll __builtin_vsx_xvcvspuxds (vf);
>  XVCVSPUXDS vsx_xvcvspuxds {}
> 
> -  const vsi __builtin_vsx_xvcvspuxws (vf);
> -    XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {}
> -
>    const vd __builtin_vsx_xvcvsxddp (vsll);
>  XVCVSXDDP vsx_floatv2div2df2 {}
> 


Re: [PATCH] rs6000, update vec_ld, vec_lde, vec_st and vec_ste, documentation

2024-07-04 Thread Kewen.Lin
Hi Carl,

on 2024/7/4 01:23, Carl Love wrote:
> 
> On 7/3/24 2:36 AM, Kewen.Lin wrote:
>> Hi Carl,
>>
>> on 2024/6/27 01:05, Carl Love wrote:
>>> GCC maintainers:
>>>
>>> The following patch updates the user documentation for the vec_ld, vec_lde, 
>>> vec_st and vec_ste built-ins to make it clearer that there are data 
>>> alignment requirements for these built-ins.  If the data alignment 
>>> requirements are not followed, the data loaded or stored by these built-ins 
>>> will be wrong.
>>>
>>> Please let me know if this patch is acceptable for mainline.  Thanks.
>>>
>>>    Carl
>>>
>>> 
>>> rs6000, update vec_ld, vec_lde, vec_st and vec_ste documentation
>>>
>>> Use of the vec_ld and vec_st built-ins require that the data be 16-byte
>>> aligned to work properly.  Add some additional text to the existing
>>> documentation to make this clearer to the user.
>>>
>>> Similarly, the vec_lde and vec_ste built-ins also have data alignment
>>> requirements based on the size of the vector element.  Update the
>>> documentation to make this clear to the user.
>>>
>>> gcc/ChangeLog:
>>> * doc/extend.texi: Add clarification for the use of the vec_ld
>>> vec_st, vec_lde and vec_ste built-ins.
>>> ---
>>>   gcc/doc/extend.texi | 15 +++
>>>   1 file changed, 11 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>>> index ee3644a5264..55faded17b9 100644
>>> --- a/gcc/doc/extend.texi
>>> +++ b/gcc/doc/extend.texi
>>> @@ -22644,10 +22644,17 @@ vector unsigned char vec_xxsldi (vector unsigned 
>>> char,
>>>   @end smallexample
>>>     Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
>>> -generate the AltiVec @samp{LVX} and @samp{STVX} instructions even
>>> -if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
>>> -@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
>>> -@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
>>> +generate the AltiVec @samp{LVX}, and @samp{STVX} instructions.  The
>> This change removed "even if the VSX instruction set is available."; I
>> think that's not intentional?  vec_ld and vec_st are well defined in PVIPR,
>> and this paragraph is not meant to document them IMHO.  Since we document
>> vec_vsx_ld and vec_vsx_st here, it aims to note the difference between
>> these two pairs.  But I'm not opposed to adding more words to emphasize the
>> special masking off; I'd prefer to use the same wording as PVIPR: "ignoring
>> the four low-order bits of the calculated address".  And IMHO we should not
>> say "it requires the data to be 16-byte aligned to work properly": some
>> users are well aware of this behavior, have data that is not 16-byte
>> aligned and expect exactly this masking, so it's arguable whether that
>> counts as not working properly.
> 
> Yea, probably should have left "even if the VSX instruction set is available."
> 
> I was looking to make it clear that if the data is not 16-byte aligned you may 
> not get the expected data loaded/stored.
> 
> So how about the following instead:
> 
>    Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
>    generate the AltiVec @samp{LVX}, and @samp{STVX} instructions even
>    if the VSX instruction set is available. The instructions mask off the
>    lower 4-bits of the calculated address. The use of these instructions
>    on data that is not 16-byte aligned may result in unexpected bytes being
>    loaded or stored.

Sorry for nitpicking, but I'd like to avoid implying that "not 16-byte
aligned" always leads to "unexpected bytes".  Even if the given address isn't
16-byte aligned, users who want to leverage this masking off trick can do so,
since the behavior is well defined; the results are still what they expect,
and PVIPR doesn't object to this.  So maybe "... of the calculated address,
so be careful of the alignment of the calculated address when meeting unexpected
load or store data."?
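
A small C model may make the masking behavior concrete.  This is only a
sketch of the address arithmetic discussed above ("ignoring the four
low-order bits of the calculated address"), not the built-in itself; the
names lvx_effective_address and model_vec_ld are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Address that is really accessed: the calculated address with its
   four low-order bits ignored.  */
static uintptr_t
lvx_effective_address (uintptr_t calculated)
{
  return calculated & ~(uintptr_t) 0xF;
}

/* Software model of a 16-byte load with that masking: it copies the
   aligned quadword that contains P, which is not necessarily the
   16 bytes that start at P.  */
static void
model_vec_ld (const unsigned char *p, unsigned char out[16])
{
  const unsigned char *aligned
    = (const unsigned char *) lvx_effective_address ((uintptr_t) p);
  memcpy (out, aligned, 16);
}
```

With a 16-byte aligned buffer buf, model_vec_ld (buf + 5, out) copies
buf[0..15] rather than buf[5..20].  That result is well defined, and it is
only "unexpected" for a caller who assumed an unaligned load.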

> 
>>> +instructions mask off the lower 4 bits of the effective address thus 
>>> requiring
>>> +the data to be 16-byte aligned to work properly.  The @samp{vec_lde} and
>>> +@samp{vec_ste} built-in functions operate 

Re: [PATCH] rs6000: ROP - Emit hashst and hashchk insns on Power8 and later [PR114759]

2024-07-04 Thread Kewen.Lin
on 2024/7/3 23:05, Peter Bergner wrote:
> On 7/3/24 4:01 AM, Kewen.Lin wrote:
>>> -  if (TARGET_POWER10
>>> +  if (TARGET_POWER8
>>>&& info->calls_p
>>>&& DEFAULT_ABI == ABI_ELFv2
>>>&& rs6000_rop_protect)
>>
>> Nit: I noticed that this is the only place to change
>> info->rop_hash_size to non-zero, and ...
>>
>>> @@ -3277,7 +3277,7 @@ rs6000_emit_prologue (void)
>>>/* NOTE: The hashst isn't needed if we're going to do a sibcall,
>>>   but there's no way to know that here.  Harmless except for
>>>   performance, of course.  */
>>> -  if (TARGET_POWER10 && rs6000_rop_protect && info->rop_hash_size != 0)
>>> +  if (TARGET_POWER8 && rs6000_rop_protect && info->rop_hash_size != 0)
>>
>> ... this condition and ...
>>
>>>  {
>>>gcc_assert (DEFAULT_ABI == ABI_ELFv2);
>>>rtx stack_ptr = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
>>> @@ -5056,7 +5056,7 @@ rs6000_emit_epilogue (enum epilogue_type 
>>> epilogue_type)
>>>  
>>>/* The ROP hash check must occur after the stack pointer is restored
>>>   (since the hash involves r1), and is not performed for a sibcall.  */
>>> -  if (TARGET_POWER10
>>> +  if (TARGET_POWER8
>>>&& rs6000_rop_protect
>>>&& info->rop_hash_size != 0
>>
>> ... here both check that info->rop_hash_size isn't zero, so I think we can
>> drop the TARGET_POWER10 (TARGET_POWER8) and rs6000_rop_protect checks from
>> these two conditions?  Instead, just extend the inner gcc_assert (which now
>> checks DEFAULT_ABI == ABI_ELFv2) with extra checks on
>> TARGET_POWER8 && rs6000_rop_protect?
>>
>> The other looks good to me, ok for trunk with this nit tweaked (if you agree
>> with it and re-tested well), thanks!
> 
> I agree with you, because the next patch I haven't submitted yet (waiting
> on this to get in), makes that simplification as part of the adding earlier
> checking of invalid options. :-)  The follow-on patch will not only remove
> the TARGET_* and the 2nd/3rd rs6000_rop_protect usage, but will also remove
> the test and asserts of ELFv2...because we've already verified valid option
> usage earlier in the normal options handling code.
> 
> Therefore, I'd like to keep this patch as simple as possible and limited to
> the TARGET_POWER10 -> TARGET_POWER8 change and the cleanup of those tests is
> coming in the next patch...which has already been tested.

Looking forward to the upcoming patch, then this patch is ok for trunk, thanks!

BR,
Kewen



Re: [PATCH] rs6000, update vec_ld, vec_lde, vec_st and vec_ste, documentation

2024-07-03 Thread Kewen.Lin
Hi Carl,

on 2024/6/27 01:05, Carl Love wrote:
> GCC maintainers:
> 
> The following patch updates the user documentation for the vec_ld, vec_lde, 
> vec_st and vec_ste built-ins to make it clearer that there are data alignment 
> requirements for these built-ins.  If the data alignment requirements are not 
> followed, the data loaded or stored by these built-ins will be wrong.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
>   Carl 
> 
> 
> rs6000, update vec_ld, vec_lde, vec_st and vec_ste documentation
> 
> Use of the vec_ld and vec_st built-ins require that the data be 16-byte
> aligned to work properly.  Add some additional text to the existing
> documentation to make this clearer to the user.
> 
> Similarly, the vec_lde and vec_ste built-ins also have data alignment
> requirements based on the size of the vector element.  Update the
> documentation to make this clear to the user.
> 
> gcc/ChangeLog:
>   * doc/extend.texi: Add clarification for the use of the vec_ld
>   vec_st, vec_lde and vec_ste built-ins.
> ---
>  gcc/doc/extend.texi | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index ee3644a5264..55faded17b9 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -22644,10 +22644,17 @@ vector unsigned char vec_xxsldi (vector unsigned 
> char,
>  @end smallexample
>  
>  Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
> -generate the AltiVec @samp{LVX} and @samp{STVX} instructions even
> -if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
> -@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
> -@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
> +generate the AltiVec @samp{LVX}, and @samp{STVX} instructions.  The

This change removed "even if the VSX instruction set is available."; I think
that's not intentional?  vec_ld and vec_st are well defined in PVIPR, and this
paragraph is not meant to document them IMHO.  Since we document vec_vsx_ld
and vec_vsx_st here, it aims to note the difference between these two pairs.
But I'm not opposed to adding more words to emphasize the special masking off;
I'd prefer to use the same wording as PVIPR: "ignoring the four low-order bits
of the calculated address".  And IMHO we should not say "it requires the data
to be 16-byte aligned to work properly": some users are well aware of this
behavior, have data that is not 16-byte aligned and expect exactly this
masking, so it's arguable whether that counts as not working properly.

> +instructions mask off the lower 4 bits of the effective address thus 
> requiring
> +the data to be 16-byte aligned to work properly.  The @samp{vec_lde} and
> +@samp{vec_ste} built-in functions operate on vectors of bytes, short integer,
> +integer, and float.  The corresponding AltiVec instructions @samp{LVEBX},
> +@samp{LVEHX}, @samp{LVEWX}, @samp{STVEBX}, @samp{STVEHX}, @samp{STVEWX} mask
> +off the lower bits of the effective address based on the size of the data.
> +Thus the data must be aligned to the size of the vector element to work
> +properly.  The @samp{vec_vsx_ld} and @samp{vec_vsx_st} built-in functions
> +always generate the VSX @samp{LXVD2X}, @samp{LXVW4X}, @samp{STXVD2X}, and
> +@samp{STXVW4X} instructions.

As above, there was a reason to mention vec_ld and vec_st here, but not one for
vec_lde and vec_ste IMHO, so let's not mention vec_lde and vec_ste here; users
should read the description in PVIPR instead (it's more recommended).
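
For readers following the quoted text about the element loads and stores, the
per-element masking it describes ("mask off the lower bits of the effective
address based on the size of the data") can be sketched as plain address
arithmetic.  This is an illustrative model only; the helper name
lve_effective_address is hypothetical, and the element sizes 1, 2 and 4 bytes
are assumed to correspond to the byte, halfword and word forms named in the
quoted text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address accessed by an element load/store for an element of
   ELEM_SIZE bytes: the calculated address rounded down to a multiple
   of the element size.  */
static uintptr_t
lve_effective_address (uintptr_t calculated, size_t elem_size)
{
  /* Only byte, halfword and word elements are modeled here.  */
  assert (elem_size == 1 || elem_size == 2 || elem_size == 4);
  return calculated & ~(uintptr_t) (elem_size - 1);
}
```

In this model a byte access keeps the address unchanged, a halfword access
ignores the lowest bit, and a word access ignores the two lowest bits.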

BR,
Kewen

>  
>  @node PowerPC AltiVec Built-in Functions Available on ISA 2.07
>  @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.07



Re: [PATCH-1v4, rs6000] Implement optab_isinf for SFDF and IEEE128

2024-07-03 Thread Kewen.Lin
Hi Haochen,

on 2024/6/27 09:41, HAO CHEN GUI wrote:
> Hi,
>   This patch implemented optab_isinf for SFDF and IEEE128 by test
> data class instructions.
> 
>   Compared with previous version, the main change is to define
> and use the constant mask for test data class insns.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652593.html
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Implement optab_isinf for SFDF and IEEE128
> 
> gcc/
>   PR target/97786
>   * config/rs6000/rs6000.md (ISNAN, ISINF, ISZERO, ISDENORMAL): Define.
>   * config/rs6000/vsx.md (isinf2 for SFDF): New expand.
>   (isinf2 for IEEE128): New expand.
> 
> gcc/testsuite/
>   PR target/97786
>   * gcc.target/powerpc/pr97786-1.c: New test.
>   * gcc.target/powerpc/pr97786-2.c: New test.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index ac5651d7420..e84e6b08f03 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -53,6 +53,17 @@ (define_constants
> (FRAME_POINTER_REGNUM 110)
>])
> 
> +;;
> +;; Test data class mask
> +;;
> +
> +(define_constants
> +  [(ISNAN 0x40)
> +   (ISINF 0x30)
> +   (ISZERO   0xC)
> +   (ISDENORMAL   0x3)

Nit: Maybe it's better to add a prefix for the test data class, such
as TEST_DATA_CLASS_NAN or DATA_CLASS_NAN.

And DATA_CLASS_INF can be split into DATA_CLASS_POS_INF 0x20
and DATA_CLASS_NEG_INF 0x10, with a similar split for DENORM.
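
A sketch of how the split masks could compose, written as a portable C model
rather than as machine description code.  DATA_CLASS_NAN, DATA_CLASS_POS_INF
(0x20) and DATA_CLASS_NEG_INF (0x10) come from the comment above; the zero and
denormal splits and the helpers data_class_of and test_data_class are
assumptions made for this illustration, chosen so that the combined values
match the 0x30, 0xC and 0x3 masks in the patch.

```c
#include <math.h>

enum data_class
{
  DATA_CLASS_NAN = 0x40,
  DATA_CLASS_POS_INF = 0x20,
  DATA_CLASS_NEG_INF = 0x10,
  DATA_CLASS_POS_ZERO = 0x8,    /* assumed split of ISZERO (0xC) */
  DATA_CLASS_NEG_ZERO = 0x4,
  DATA_CLASS_POS_DENORM = 0x2,  /* assumed split of ISDENORMAL (0x3) */
  DATA_CLASS_NEG_DENORM = 0x1,
  DATA_CLASS_INF = DATA_CLASS_POS_INF | DATA_CLASS_NEG_INF
};

/* Return the single class bit that describes X, or 0 for a normal
   number.  */
static int
data_class_of (double x)
{
  int neg = signbit (x) != 0;

  switch (fpclassify (x))
    {
    case FP_NAN:
      return DATA_CLASS_NAN;
    case FP_INFINITE:
      return neg ? DATA_CLASS_NEG_INF : DATA_CLASS_POS_INF;
    case FP_ZERO:
      return neg ? DATA_CLASS_NEG_ZERO : DATA_CLASS_POS_ZERO;
    case FP_SUBNORMAL:
      return neg ? DATA_CLASS_NEG_DENORM : DATA_CLASS_POS_DENORM;
    default:
      return 0;
    }
}

/* Software model of a test-data-class check: 1 if X belongs to any
   class selected by MASK, else 0.  */
static int
test_data_class (double x, int mask)
{
  return (data_class_of (x) & mask) != 0;
}
```

With separate sign bits, an isinf-style check is test_data_class (x,
DATA_CLASS_INF), while a check for positive infinity only would pass
DATA_CLASS_POS_INF.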

> +  ])
> +
>  ;;
>  ;; UNSPEC usage
>  ;;
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index f135fa079bd..67615bae8c0 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5313,6 +5313,24 @@ (define_expand "xststdcp"
>operands[4] = CONST0_RTX (SImode);
>  })
> 
> +(define_expand "isinf2"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
> +   (use (match_operand:SFDF 1 "vsx_register_operand"))]
> +  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
> +{
> +  emit_insn (gen_xststdcp (operands[0], operands[1], GEN_INT (ISINF)));
> +  DONE;
> +})
> +
> +(define_expand "isinf2"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
> +   (use (match_operand:IEEE128 1 "vsx_register_operand"))]

QP insns are special, only altivec regs can be used, so 
s/vsx_register_operand/altivec_register_operand/

Also applied to the other two patches for isnormal and isfinite.

And as discussed offline, let's merge these patterns with mode attribute. :)

> +  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
> +{
> +  emit_insn (gen_xststdcqp_<mode> (operands[0], operands[1], GEN_INT (ISINF)));
> +  DONE;
> +})
> +
>  ;; The VSX Scalar Test Negative Quad-Precision
>  (define_expand "xststdcnegqp_<mode>"
>[(set (match_dup 2)
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-1.c 
> b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
> new file mode 100644
> index 000..c1c4f64ee8b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */

Nit: Not necessary, but it's preferred to put the dg-options line before the
line for powerpc_vsx, as powerpc_vsx considers current_compiler_flags but the
dg-options line isn't processed yet if it's put after it.  This also applies
to the other test cases.
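
For reference, the header of pr97786-1.c with the ordering suggested above
would look roughly as follows.  This is a sketch that only reorders the
directives already present in the patch and keeps one of its functions; the
rest of the test is unchanged.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
/* { dg-require-effective-target powerpc_vsx } */

/* dg-options comes first so that the effective-target check sees the
   flags the test is compiled with.  */
int test1 (double x)
{
  return __builtin_isinf (x);
}
```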

BR,
Kewen

> +
> +int test1 (double x)
> +{
> +  return __builtin_isinf (x);
> +}
> +
> +int test2 (float x)
> +{
> +  return __builtin_isinf (x);
> +}
> +
> +int test3 (float x)
> +{
> +  return __builtin_isinff (x);
> +}
> +
> +/* { dg-final { scan-assembler-not {\mfcmp} } } */
> +/* { dg-final { scan-assembler-times {\mxststdcsp\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-2.c 
> b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
> new file mode 100644
> index 000..ed305e8572e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target ppc_float128_hw } */
> +/* { dg-require-effective-target powerpc_vsx } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } */
> +
> +int test1 (long double x)
> +{
> +  return __builtin_isinf (x);
> +}
> +
> +int test2 (long double x)
> +{
> +  return __builtin_isinfl (x);
> +}
> +
> +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
> +/* { dg-final { scan-assembler-times {\mxststdcqp\M} 2 } } */



Re: [PATCH] rs6000: ROP - Emit hashst and hashchk insns on Power8 and later [PR114759]

2024-07-03 Thread Kewen.Lin
Hi Peter,

on 2024/6/20 05:14, Peter Bergner wrote:
> We currently only emit the ROP-protect hash* insns for Power10, where the
> insns were added to the architecture.  We want to emit them for earlier
> cpus (where they operate as NOPs), so that if those older binaries are
> ever executed on a Power10, then they'll be protected from ROP attacks.
> Binutils accepts hashst and hashchk back to Power8, so change GCC to emit
> them for Power8 and later.  This matches clang's behavior.
> 
> This patch is independent of the ROP shrink-wrap fix submitted earlier.
> This passed bootstrap and regtesting on powerpc64le-linux with no regressions.
> Ok for trunk?  
> 
> Peter
> 
> 
> 
> 2024-06-19  Peter Bergner  
> 
> gcc/
>   PR target/114759
>   * config/rs6000/rs6000-logue.cc (rs6000_stack_info): Use TARGET_POWER8.
>   (rs6000_emit_prologue): Likewise.
>   * config/rs6000/rs6000.md (hashchk): Likewise.
>   (hashst): Likewise.
>   Fix whitespace.
> 
> gcc/testsuite/
>   PR target/114759
>   * gcc.target/powerpc/pr114759-2.c: New test.
>   * lib/target-supports.exp (rop_ok): Use
>   check_effective_target_has_arch_pwr8.
> ---
>  gcc/config/rs6000/rs6000-logue.cc |  6 +++---
>  gcc/config/rs6000/rs6000.md   |  6 +++---
>  gcc/testsuite/gcc.target/powerpc/pr114759-2.c | 17 +
>  gcc/testsuite/lib/target-supports.exp |  2 +-
>  4 files changed, 24 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr114759-2.c
> 
> diff --git a/gcc/config/rs6000/rs6000-logue.cc 
> b/gcc/config/rs6000/rs6000-logue.cc
> index c384e48e378..bd363b625a4 100644
> --- a/gcc/config/rs6000/rs6000-logue.cc
> +++ b/gcc/config/rs6000/rs6000-logue.cc
> @@ -716,7 +716,7 @@ rs6000_stack_info (void)
>info->calls_p = (!crtl->is_leaf || cfun->machine->ra_needs_full_frame);
>info->rop_hash_size = 0;
>  
> -  if (TARGET_POWER10
> +  if (TARGET_POWER8
>&& info->calls_p
>&& DEFAULT_ABI == ABI_ELFv2
>&& rs6000_rop_protect)

Nit: I noticed that this is the only place to change
info->rop_hash_size to non-zero, and ...

> @@ -3277,7 +3277,7 @@ rs6000_emit_prologue (void)
>/* NOTE: The hashst isn't needed if we're going to do a sibcall,
>   but there's no way to know that here.  Harmless except for
>   performance, of course.  */
> -  if (TARGET_POWER10 && rs6000_rop_protect && info->rop_hash_size != 0)
> +  if (TARGET_POWER8 && rs6000_rop_protect && info->rop_hash_size != 0)

... this condition and ...

>  {
>gcc_assert (DEFAULT_ABI == ABI_ELFv2);
>rtx stack_ptr = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
> @@ -5056,7 +5056,7 @@ rs6000_emit_epilogue (enum epilogue_type epilogue_type)
>  
>/* The ROP hash check must occur after the stack pointer is restored
>   (since the hash involves r1), and is not performed for a sibcall.  */
> -  if (TARGET_POWER10
> +  if (TARGET_POWER8
>&& rs6000_rop_protect
>&& info->rop_hash_size != 0

... here both check that info->rop_hash_size isn't zero, so I think we can
drop the TARGET_POWER10 (TARGET_POWER8) and rs6000_rop_protect checks from
these two conditions?  Instead, just extend the inner gcc_assert (which now
checks DEFAULT_ABI == ABI_ELFv2) with extra checks on
TARGET_POWER8 && rs6000_rop_protect?

The other looks good to me, ok for trunk with this nit tweaked (if you agree
with it and re-tested well), thanks!
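
To make the suggestion concrete, here is a stand-alone C model of the control
flow, not the actual rs6000-logue.cc code.  The structs and the functions
compute_stack_info and emit_rop_hash_p are hypothetical stand-ins for the GCC
globals and for rs6000_stack_info and the prologue/epilogue conditions, and
the size 8 is just an illustrative non-zero value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the target state used in the patch.  */
struct target_state
{
  bool power8;       /* models TARGET_POWER8 */
  bool elfv2;        /* models DEFAULT_ABI == ABI_ELFv2 */
  bool rop_protect;  /* models rs6000_rop_protect */
};

struct stack_info
{
  bool calls_p;
  int rop_hash_size;
};

/* Models the stack info computation: the only place where
   rop_hash_size becomes non-zero.  */
static void
compute_stack_info (const struct target_state *t, struct stack_info *info)
{
  info->rop_hash_size = 0;
  if (t->power8 && info->calls_p && t->elfv2 && t->rop_protect)
    info->rop_hash_size = 8;  /* illustrative non-zero size */
}

/* Models the prologue/epilogue test after the suggested tweak: the
   size check alone decides, and the other conditions become an
   assertion because they are implied by a non-zero size.  */
static bool
emit_rop_hash_p (const struct target_state *t, const struct stack_info *info)
{
  if (info->rop_hash_size == 0)
    return false;
  assert (t->power8 && t->elfv2 && t->rop_protect);
  return true;
}
```

The design point is that a non-zero size already encodes all the other
conditions, so repeating them in each consumer adds nothing beyond a sanity
check.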

BR,
Kewen


>&& epilogue_type != EPILOGUE_TYPE_SIBCALL)
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index a5d20594789..694076e311f 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -15808,9 +15808,9 @@ (define_insn "*cmpeqb_internal"
>  
>  (define_insn "hashst"
>[(set (match_operand:DI 0 "simple_offsettable_mem_operand" "=m")
> -(unspec_volatile:DI [(match_operand:DI 1 "int_reg_operand" "r")]
> + (unspec_volatile:DI [(match_operand:DI 1 "int_reg_operand" "r")]
>   UNSPEC_HASHST))]
> -  "TARGET_POWER10 && rs6000_rop_protect"
> +  "TARGET_POWER8 && rs6000_rop_protect"
>  {
>static char templ[32];
>const char *p = rs6000_privileged ? "p" : "";
> @@ -15823,7 +15823,7 @@ (define_insn "hashchk"
>[(unspec_volatile [(match_operand:DI 0 "int_reg_operand" "r")
>(match_operand:DI 1 "simple_offsettable_mem_operand" "m")]
>   UNSPEC_HASHCHK)]
> -  "TARGET_POWER10 && rs6000_rop_protect"
> +  "TARGET_POWER8 && rs6000_rop_protect"
>  {
>static char templ[32];
>const char *p = rs6000_privileged ? "p" : "";
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr114759-2.c 
> b/gcc/testsuite/gcc.target/powerpc/pr114759-2.c
> new file mode 100644
> index 000..3881ebd416e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr114759-2.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8 -mrop-protect" } */
> +/* { dg-require-effective-target 

PING^1 [PATCH] rs6000: Adjust -fpatchable-function-entry* support for dual entry [PR112980]

2024-07-02 Thread Kewen.Lin
Hi,

Gentle ping on this patch:

https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651025.html

BR,
Kewen

on 2024/5/8 13:49, Kewen.Lin wrote:
> Hi,
> 
> As discussed in PR112980, although the current
> implementation of -fpatchable-function-entry* conforms
> to the documentation (making the N NOPs consecutive),
> it's inefficient for both kernel and userspace livepatching
> (see the comments in the PR for details).
> 
> So this patch changes the current implementation by
> emitting the "before" NOPs before the global entry point and
> the "after" NOPs after the local entry point.  The new behavior
> no longer keeps the NOPs consecutive, so the documentation
> is updated to emphasize this.
> 
> Bootstrapped and regress-tested on powerpc64-linux-gnu
> P8/P9 and powerpc64le-linux-gnu P9 and P10.
> 
> Is it ok for trunk?  And backporting to active branches
> after burn-in time?  I guess we should also mention this
> change in changes.html?
> 
> BR,
> Kewen
> -
>   PR target/112980
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue):
>   Adjust the handling on patch area emitting with dual entry, remove
>   the restriction on "before" NOPs count, not emit "before" NOPs any
>   more but only emit "after" NOPs.
>   * config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry):
>   Adjust by respecting cfun->machine->stop_patch_area_print.
>   (rs6000_elf_declare_function_name): For ELFv2 with dual entry, set
>   cfun->machine->stop_patch_area_print as true.
>   * config/rs6000/rs6000.h (struct machine_function): Remove member
>   global_entry_emitted, add new member stop_patch_area_print.
>   * doc/invoke.texi (option -fpatchable-function-entry): Adjust the
>   documentation for PowerPC ELFv2 dual entry.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/patchable_function_entry-default.c: Adjust.
>   * gcc.target/powerpc/pr99888-4.c: Likewise.
>   * gcc.target/powerpc/pr99888-5.c: Likewise.
>   * gcc.target/powerpc/pr99888-6.c: Likewise.
> ---
>  gcc/config/rs6000/rs6000-logue.cc | 40 +--
>  gcc/config/rs6000/rs6000.cc   | 15 +--
>  gcc/config/rs6000/rs6000.h| 10 +++--
>  gcc/doc/invoke.texi   |  8 ++--
>  .../patchable_function_entry-default.c|  3 --
>  gcc/testsuite/gcc.target/powerpc/pr99888-4.c  |  4 +-
>  gcc/testsuite/gcc.target/powerpc/pr99888-5.c  |  4 +-
>  gcc/testsuite/gcc.target/powerpc/pr99888-6.c  |  4 +-
>  8 files changed, 33 insertions(+), 55 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-logue.cc 
> b/gcc/config/rs6000/rs6000-logue.cc
> index 60ba15a8bc3..0eb019b44b3 100644
> --- a/gcc/config/rs6000/rs6000-logue.cc
> +++ b/gcc/config/rs6000/rs6000-logue.cc
> @@ -4006,43 +4006,21 @@ rs6000_output_function_prologue (FILE *file)
> fprintf (file, "\tadd 2,2,12\n");
>   }
> 
> -  unsigned short patch_area_size = crtl->patch_area_size;
> -  unsigned short patch_area_entry = crtl->patch_area_entry;
> -  /* Need to emit the patching area.  */
> -  if (patch_area_size > 0)
> - {
> -   cfun->machine->global_entry_emitted = true;
> -   /* As ELFv2 ABI shows, the allowable bytes between the global
> -  and local entry points are 0, 4, 8, 16, 32 and 64 when
> -  there is a local entry point.  Considering there are two
> -  non-prefixed instructions for global entry point prologue
> -  (8 bytes), the count for patchable nops before local entry
> -  point would be 2, 6 and 14.  It's possible to support those
> -  other counts of nops by not making a local entry point, but
> -  we don't have clear use cases for them, so leave them
> -  unsupported for now.  */
> -   if (patch_area_entry > 0)
> - {
> -   if (patch_area_entry != 2
> -   && patch_area_entry != 6
> -   && patch_area_entry != 14)
> - error ("unsupported number of nops before function entry (%u)",
> -patch_area_entry);
> -   rs6000_print_patchable_function_entry (file, patch_area_entry,
> -  true);
> -   patch_area_size -= patch_area_entry;
> - }
> - }
> -
>fputs ("\t.localentry\t", file);
>assemble_name (file, name);
>fputs (",.-", file);
>assemble_name (file, name);
>fputs ("\n",

Re: [PATCH V2] rs6000: load high and low part of 128bit vector independently [PR110040]

2024-07-02 Thread Kewen.Lin
Hi Jeevitha,

on 2024/6/19 20:39, jeevitha wrote:
> Hi All,
> 
> Updated the patch based on review comments. This patch passed bootstrap
> and regression testing on powerpc64le-linux with no regressions.
> 
> PR110040 exposes an issue concerning moves from vector registers to GPRs.
> There are two moves, one for upper 64 bits and the other for the lower
> 64 bits.  In the problematic test case, we are only interested in storing
> the lower 64 bits.  However, the instruction for copying the upper 64 bits
> is still emitted and is dead code.  This patch adds a splitter that splits
> apart the two move instructions so that DCE can remove the dead code after
> splitting.
> 
> 2024-06-19  Jeevitha Palanisamy  
> 
> gcc/
>   PR target/110040
>   * config/rs6000/vsx.md (split pattern for V1TI to DI move): Defined.

Nit: s/Defined/New define/

> 
> gcc/testsuite/
>   PR target/110040
>   * gcc.target/powerpc/pr110040-1.c: New testcase.
>   * gcc.target/powerpc/pr110040-2.c: New testcase.
> 
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index f135fa079bd..f1979815df6 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -6706,3 +6706,20 @@
>"vmsumcud %0,%1,%2,%3"
>[(set_attr "type" "veccomplex")]
>  )
> +
> +(define_split
> +  [(set (match_operand:V1TI 0 "gpc_reg_operand")
> +   (match_operand:V1TI 1 "vsx_register_operand"))]
> +  "reload_completed
> +   && TARGET_DIRECT_MOVE_64BIT
> +   && int_reg_operand (operands[0], V1TImode)
> +   && vsx_register_operand (operands[1], V1TImode)"
> +   [(pc)]
> +{
> +  rtx src_op = gen_rtx_REG (V2DImode, REGNO (operands[1]));
> +  rtx dest_op0 = gen_rtx_REG (DImode, REGNO (operands[0]));
> +  rtx dest_op1 = gen_rtx_REG (DImode, REGNO (operands[0]) + 1);
> +  emit_insn (gen_vsx_extract_v2di (dest_op0, src_op, const0_rtx));
> +  emit_insn (gen_vsx_extract_v2di (dest_op1, src_op, const1_rtx));
> +  DONE;
> +})
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr110040-1.c 
> b/gcc/testsuite/gcc.target/powerpc/pr110040-1.c
> new file mode 100644
> index 000..0a521e9e51d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr110040-1.c
> @@ -0,0 +1,15 @@
> +/* PR target/110040 */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target int128 } */
> +/* { dg-require-effective-target powerpc_vsx } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> +/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */
> +
> +#include 
> +
> +void
> +foo (signed long *dst, vector signed __int128 src)
> +{
> +  *dst = (signed long) src[0];
> +}
> +
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr110040-2.c 
> b/gcc/testsuite/gcc.target/powerpc/pr110040-2.c
> new file mode 100644
> index 000..d2ef471d666
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr110040-2.c
> @@ -0,0 +1,16 @@
> +/* PR target/110040 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
> +/* { dg-require-effective-target int128 } */
> +/* { dg-require-effective-target powerpc_vsx } */
> +/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */
> +
> +/* Note: __builtin_altivec_tr_stxvrwx requires the -mcpu=power10 option */

It's a bit confusing that the bif actually used is "vec_xst_trunc" while this
comment says "__builtin_altivec_tr_stxvrwx".  I understand it's copied from the
actual error message, but that message also mentions "vec_xst_trunc".

Maybe just say "/* builtin vec_xst_trunc requires power10.  */" to avoid any
confusion here.

OK for trunk with these nits tweaked, thanks!

BR,
Kewen


> +
> +#include 
> +
> +void
> +foo (signed int *dst, vector signed __int128 src)
> +{
> +  __builtin_vec_xst_trunc (src, 0, dst);
> +}
> 
> 


Re: [RFC/PATCH] isel: Fold more in gimple_expand_vec_cond_expr with andc/iorc

2024-07-02 Thread Kewen.Lin
Hi!

on 2024/7/2 04:28, Segher Boessenkool wrote:
> On Mon, Jul 01, 2024 at 04:36:44PM +0200, Richard Biener wrote:
>> On Mon, Jul 1, 2024 at 8:17 AM Kewen.Lin  wrote:
>>> As PR115659 shows, assuming c = x CMP y, there are some
>>> folding chances for patterns r = c ? 0/z : z/-1:
>>>   - For r = c ? 0 : z, it can be folded into r = ~c & z.
>>>   - For r = c ? z : -1, it can be folded into r = ~c | z.
> 
> (!c instead of ~c, right?)

It's meant to indicate bitwise not: the context here is vector, and in each
vector element c has all bits set or all bits clear, so ~c is fine?
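
A small standalone C sketch of why ~c rather than !c is the right spelling
here, looking at one 32-bit lane at a time.  The function names are made up
for illustration, and the lane value c is assumed to be either all ones or
all zeros:

```c
#include <stdint.h>

/* r = c ? 0 : z, written as a plain select on one lane.  */
uint32_t
select_0_z (uint32_t c, uint32_t z)
{
  return c ? 0u : z;
}

/* r = c ? z : -1, written as a plain select on one lane.  */
uint32_t
select_z_m1 (uint32_t c, uint32_t z)
{
  return c ? z : UINT32_MAX;
}

/* Folded form of c ? 0 : z, using bitwise not.  */
uint32_t
fold_andc (uint32_t c, uint32_t z)
{
  return ~c & z;
}

/* Folded form of c ? z : -1, using bitwise not.  */
uint32_t
fold_iorc (uint32_t c, uint32_t z)
{
  return ~c | z;
}

/* What logical not would give instead: !c is 0 or 1, not a lane mask,
   so only the lowest bit of z can survive.  */
uint32_t
fold_with_lognot (uint32_t c, uint32_t z)
{
  return (uint32_t) !c & z;
}
```

For c == 0 and z == 0x12345678, fold_andc returns z like the select does,
while fold_with_lognot returns 0.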

> 
>>> But BIT_AND/BIT_IOR with one BIT_NOT operand is a
>>> compound operation, and I'm not sure every target with
>>> vector capability has a single vector instruction for it.
>>> If not, it's arguable whether it always beats vector
>>> selection (e.g. the vector constant may get hoisted or combined,
>>> and selection may have the same latency as a normal logical operation).
>>> So IMHO we probably need to query the target with new optabs.
>>> This patch therefore introduces new optabs andc and iorc and their
>>> corresponding internal functions BIT_{ANDC,IORC} (suggestions
>>> for naming the optabs and ifns are welcome).  If a target
>>> defines such optabs for vector modes, it means the target
>>> supports these hardware insns and they should be no worse than
>>> vector selection.  btw, the rs6000 changes are meant to
>>> give an example of a target supporting andc/iorc.
>>>
>>> Does this sound reasonable?
>>
>> I think it's reasonable to have andc - there are quite some CPUs
>> that have this op on GPRs as well I think, called andn (but I don't
>> want to get into bike-shedding).
> 
> The usual names are and for a & b, andc for a & ~b, andc1 for ~a & b,
> andcc for ~a & ~b, and an "n" in front of everything to complement the
> result.
> 
>> A corresponding iorc is then
> 
> Sure.  A full complement of *and* insns is equivalent to a full
> complement of *or* insns, of course.
> 
>> a natural extension (likewise xorc).
> 
> xor and nxor (which is called "eqv" on powerpc) are all that can exist
> of course :-)
> 
>> AVX512 has a very powerful
>> vector ternlog (but no scalar andn).
> 
> We have that as well, "xxeval", a Power ISA v3.1 insn.  It just has a
> full 8-bit logic table as part of the opcode.  But to fit that many bits
> it is a prefixed insn.

Yes, I guess we don't exploit it well so far.
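
For readers unfamiliar with the idea of a full 8-bit logic table, here is a
generic C sketch of how eight bits can describe any three-input bitwise
function.  The function name and the bit indexing (a as the most significant
index bit, table bit 0 for a = b = c = 0) are assumptions made for this
sketch; it does not claim to match the exact immediate encoding of xxeval or
of the AVX512 ternlog insns:

```c
#include <stdint.h>

/* Evaluate a three-input bitwise function given as an 8-bit truth table.
   For every bit position, the bits of a, b and c form a 3-bit index
   (a << 2 | b << 1 | c), and the result bit is table bit number index.  */
uint64_t
ternary_logic (uint64_t a, uint64_t b, uint64_t c, uint8_t table)
{
  uint64_t r = 0;
  for (unsigned idx = 0; idx < 8; idx++)
    {
      if (((table >> idx) & 1) == 0)
        continue;
      /* Mask of the bit positions whose (a, b, c) bits equal idx.  */
      uint64_t ma = (idx & 4) ? a : ~a;
      uint64_t mb = (idx & 2) ? b : ~b;
      uint64_t mc = (idx & 1) ? c : ~c;
      r |= ma & mb & mc;
    }
  return r;
}
```

Under this indexing, table 0x30 gives a & ~b (an andc that ignores c), 0x96
gives a ^ b ^ c, and 0xCA gives the bitwise select (a & b) | (~a & c).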

> 
>> So OK from my side in case there are no negative comments or
>> bikeshedding on the name.  I can't approve the rs6000 changes
>> though.
> 
> But I can :-)  I'll reply to just that.  Thanks for handling this!

Thanks to both of you!  I'll wait for two more days or so in case other
people have some comments.

BR,
Kewen



Re: [PATCH] isel: Fold more in gimple_expand_vec_cond_expr [PR115659]

2024-07-02 Thread Kewen.Lin
on 2024/7/1 22:28, Richard Biener wrote:
> On Mon, Jul 1, 2024 at 8:16 AM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> As PR115659 shows, assuming c = x CMP y, there are some
>> folding chances for patterns r = c ? -1/z : z/0.
>>
>> For r = c ? -1 : z, it can be folded into:
>>   - r = c | z (with ior_optab supported)
>>   - or r = c ? c : z
>>
>> while for r = c ? z : 0, it can be folded into:
>>   - r = c & z (with and_optab supported)
>>   - or r = c ? z : c
>>
>> This patch is to teach ISEL to take care of them and also
>> remove the redundant gsi_replace as the caller of function
>> gimple_expand_vec_cond_expr will handle it.
> 
> Yeah, not the nicest API ...
> 
>> Bootstrapped and regtested on x86_64-redhat-linux and
>> powerpc64{,le}-linux-gnu.
>>
>> Is it ok for trunk?
> 
> Minor nit below
> 
>> BR,
>> Kewen
>> -
>> PR tree-optimization/115659
>>
>> gcc/ChangeLog:
>>
>> * gimple-isel.cc (gimple_expand_vec_cond_expr): Add more foldings for
>> patterns x CMP y ? -1 : z and x CMP y ? z : 0.
>> ---
>>  gcc/gimple-isel.cc | 48 +++---
>>  1 file changed, 41 insertions(+), 7 deletions(-)
>>
>> diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
>> index 54c1801038b..71af1a8cd97 100644
>> --- a/gcc/gimple-isel.cc
>> +++ b/gcc/gimple-isel.cc
>> @@ -240,16 +240,50 @@ gimple_expand_vec_cond_expr (struct function *fun, 
>> gimple_stmt_iterator *gsi,
>> can_compute_op0 = expand_vec_cmp_expr_p (op0a_type, op0_type,
>>  tcode);
>>
>> - /* Try to fold x CMP y ? -1 : 0 to x CMP y.  */
>>   if (can_compute_op0
>> - && integer_minus_onep (op1)
>> - && integer_zerop (op2)
>>   && TYPE_MODE (TREE_TYPE (lhs)) == TYPE_MODE (TREE_TYPE (op0)))
>> {
>> - tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), 
>> op0);
>> - gassign *new_stmt = gimple_build_assign (lhs, conv_op);
>> - gsi_replace (gsi, new_stmt, true);
>> - return new_stmt;
>> + /* Assuming c = x CMP y.  */
>> + bool op1_minus_onep = integer_minus_onep (op1);
>> + bool op2_zerop = integer_zerop (op2);
>> + tree vtype = TREE_TYPE (lhs);
>> + machine_mode vmode = TYPE_MODE (vtype);
>> + /* Try to fold r = c ? -1 : 0 to r = c.  */
>> + if (op1_minus_onep && op2_zerop)
>> +   {
>> + tree conv_op = build1 (VIEW_CONVERT_EXPR, vtype, op0);
>> + return gimple_build_assign (lhs, conv_op);
>> +   }
>> + /* Try to fold r = c ? -1 : z to r = c | z, or
>> +r = c ? c : z.  */
>> + if (op1_minus_onep)
>> +   {
>> + tree conv_op = build1 (VIEW_CONVERT_EXPR, vtype, op0);
>> + tree new_op0 = make_ssa_name (vtype);
>> + gassign *new_stmt = gimple_build_assign (new_op0, conv_op);
>> + gsi_insert_seq_before (gsi, new_stmt, GSI_SAME_STMT);
>> + if (optab_handler (ior_optab, vmode) != CODE_FOR_nothing)
>> +   /* r = c | z */
>> +   return gimple_build_assign (lhs, BIT_IOR_EXPR, new_op0,
>> +   op2);
>> + /* r = c ? c : z */
>> + op1 = new_op0;
> 
> maybe better call it new_op1 then?  Or new_op.
> 
>> +   }
>> + /* Try to fold r = c ? z : 0 to r = c & z, or
>> +r = c ? z : c.  */
>> + else if (op2_zerop)
>> +   {
>> + tree conv_op = build1 (VIEW_CONVERT_EXPR, vtype, op0);
>> + tree new_op0 = make_ssa_name (vtype);
>> + gassign *new_stmt = gimple_build_assign (new_op0, conv_op);
>> + gsi_insert_seq_before (gsi, new_stmt, GSI_SAME_STMT);
>> + if (optab_handler (and_optab, vmode) != CODE_FOR_nothing)
>> +   /* r = c & z */
>> +   return gimple_build_assign (lhs, BIT_AND_EXPR, new_op0,
>> +   op1);
>> + /* r = c ? z : c */
>> + op2 = new_op0;
> 
> Likewise (new_op2 or also new_op).
> 
> OK with that nit fixed.

Thanks Richi!  Refined with new_op1/new_op2, re-tested and pushed as r15-1763.

BR,
Kewen

> 
> Thanks,
> Richard.
> 
>> +   }
>> }
>>
>>   /* When the compare has EH we do not want to forward it when
>> --
>> 2.43.0



[PATCH] sparc: define SPARC_LONG_DOUBLE_TYPE_SIZE for vxworks [PR115739]

2024-07-01 Thread Kewen.Lin
Hi,

Commit r15-1594 removed the define of LONG_DOUBLE_TYPE_SIZE in
sparc.cc, based on the assumption that each OS has its
own define (see the comments in sparc.h), but this exposed an
issue on vxworks, which lacks the define.

We could bring back the default SPARC_LONG_DOUBLE_TYPE_SIZE in
sparc.cc, but according to the comments in sparc.h, I think
it's better to define it in vxworks.h.  btw, I also went
through all the sparc supported triples, and vxworks is the only
one that misses this define.

A cross build with --target=sparc-wrs-vxworks works well;
is it ok for trunk?

BR,
Kewen
-

PR target/115739

gcc/ChangeLog:

* config/sparc/vxworks.h (SPARC_LONG_DOUBLE_TYPE_SIZE): New define.
---
 gcc/config/sparc/vxworks.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/sparc/vxworks.h b/gcc/config/sparc/vxworks.h
index c1a9310fb3f..4cdb3b1685d 100644
--- a/gcc/config/sparc/vxworks.h
+++ b/gcc/config/sparc/vxworks.h
@@ -62,3 +62,7 @@ along with GCC; see the file COPYING3.  If not see
 /* This platform supports the probing method of stack checking (RTP mode).
8K is reserved in the stack to propagate exceptions in case of overflow.  */
 #define STACK_CHECK_PROTECT 8192
+
+/* SPARC_LONG_DOUBLE_TYPE_SIZE should be defined per OS.  */
+#undef SPARC_LONG_DOUBLE_TYPE_SIZE
+#define SPARC_LONG_DOUBLE_TYPE_SIZE (BITS_PER_WORD * 2)
--
2.43.0


[PATCH] isel: Fold more in gimple_expand_vec_cond_expr [PR115659]

2024-07-01 Thread Kewen.Lin
Hi,

As PR115659 shows, assuming c = x CMP y, there are some
folding chances for patterns r = c ? -1/z : z/0.

For r = c ? -1 : z, it can be folded into:
  - r = c | z (with ior_optab supported)
  - or r = c ? c : z

while for r = c ? z : 0, it can be folded into:
  - r = c & z (with and_optab supported)
  - or r = c ? z : c
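
The foldings above can be checked with a small standalone C sketch that
models a 4-lane vector as an array.  LANES and all function names are made up
for illustration; the comparison is assumed to produce all ones for true and
all zeros for false in each lane, as vector comparisons do:

```c
#include <stdint.h>

#define LANES 4

/* c = x > y per lane: all ones when true, all zeros when false.  */
void
vec_cmp_gt (const int32_t *x, const int32_t *y, uint32_t *c)
{
  for (int i = 0; i < LANES; i++)
    c[i] = x[i] > y[i] ? UINT32_MAX : 0u;
}

/* r = c ? -1 : z, as a lane-wise select.  */
void
select_m1_z (const uint32_t *c, const uint32_t *z, uint32_t *r)
{
  for (int i = 0; i < LANES; i++)
    r[i] = c[i] ? UINT32_MAX : z[i];
}

/* Folded form: r = c | z.  */
void
fold_ior (const uint32_t *c, const uint32_t *z, uint32_t *r)
{
  for (int i = 0; i < LANES; i++)
    r[i] = c[i] | z[i];
}

/* r = c ? z : 0, as a lane-wise select.  */
void
select_z_0 (const uint32_t *c, const uint32_t *z, uint32_t *r)
{
  for (int i = 0; i < LANES; i++)
    r[i] = c[i] ? z[i] : 0u;
}

/* Folded form: r = c & z.  */
void
fold_and (const uint32_t *c, const uint32_t *z, uint32_t *r)
{
  for (int i = 0; i < LANES; i++)
    r[i] = c[i] & z[i];
}

/* Return 1 if all lanes of a and b are equal, 0 otherwise.  */
int
lanes_equal (const uint32_t *a, const uint32_t *b)
{
  for (int i = 0; i < LANES; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

Both folded forms rely on each lane of c being exactly 0 or -1; with any
other lane value the select and the logical operation would differ.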

This patch is to teach ISEL to take care of them and also
remove the redundant gsi_replace as the caller of function
gimple_expand_vec_cond_expr will handle it.

Bootstrapped and regtested on x86_64-redhat-linux and
powerpc64{,le}-linux-gnu.

Is it ok for trunk?

BR,
Kewen
-
PR tree-optimization/115659

gcc/ChangeLog:

* gimple-isel.cc (gimple_expand_vec_cond_expr): Add more foldings for
patterns x CMP y ? -1 : z and x CMP y ? z : 0.
---
 gcc/gimple-isel.cc | 48 +++---
 1 file changed, 41 insertions(+), 7 deletions(-)

diff --git a/gcc/gimple-isel.cc b/gcc/gimple-isel.cc
index 54c1801038b..71af1a8cd97 100644
--- a/gcc/gimple-isel.cc
+++ b/gcc/gimple-isel.cc
@@ -240,16 +240,50 @@ gimple_expand_vec_cond_expr (struct function *fun, 
gimple_stmt_iterator *gsi,
can_compute_op0 = expand_vec_cmp_expr_p (op0a_type, op0_type,
 tcode);

- /* Try to fold x CMP y ? -1 : 0 to x CMP y.  */
  if (can_compute_op0
- && integer_minus_onep (op1)
- && integer_zerop (op2)
  && TYPE_MODE (TREE_TYPE (lhs)) == TYPE_MODE (TREE_TYPE (op0)))
{
- tree conv_op = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), op0);
- gassign *new_stmt = gimple_build_assign (lhs, conv_op);
- gsi_replace (gsi, new_stmt, true);
- return new_stmt;
+ /* Assuming c = x CMP y.  */
+ bool op1_minus_onep = integer_minus_onep (op1);
+ bool op2_zerop = integer_zerop (op2);
+ tree vtype = TREE_TYPE (lhs);
+ machine_mode vmode = TYPE_MODE (vtype);
+ /* Try to fold r = c ? -1 : 0 to r = c.  */
+ if (op1_minus_onep && op2_zerop)
+   {
+ tree conv_op = build1 (VIEW_CONVERT_EXPR, vtype, op0);
+ return gimple_build_assign (lhs, conv_op);
+   }
+ /* Try to fold r = c ? -1 : z to r = c | z, or
+r = c ? c : z.  */
+ if (op1_minus_onep)
+   {
+ tree conv_op = build1 (VIEW_CONVERT_EXPR, vtype, op0);
+ tree new_op0 = make_ssa_name (vtype);
+ gassign *new_stmt = gimple_build_assign (new_op0, conv_op);
+ gsi_insert_seq_before (gsi, new_stmt, GSI_SAME_STMT);
+ if (optab_handler (ior_optab, vmode) != CODE_FOR_nothing)
+   /* r = c | z */
+   return gimple_build_assign (lhs, BIT_IOR_EXPR, new_op0,
+   op2);
+ /* r = c ? c : z */
+ op1 = new_op0;
+   }
+ /* Try to fold r = c ? z : 0 to r = c & z, or
+r = c ? z : c.  */
+ else if (op2_zerop)
+   {
+ tree conv_op = build1 (VIEW_CONVERT_EXPR, vtype, op0);
+ tree new_op0 = make_ssa_name (vtype);
+ gassign *new_stmt = gimple_build_assign (new_op0, conv_op);
+ gsi_insert_seq_before (gsi, new_stmt, GSI_SAME_STMT);
+ if (optab_handler (and_optab, vmode) != CODE_FOR_nothing)
+   /* r = c & z */
+   return gimple_build_assign (lhs, BIT_AND_EXPR, new_op0,
+   op1);
+ /* r = c ? z : c */
+ op2 = new_op0;
+   }
}

  /* When the compare has EH we do not want to forward it when
--
2.43.0


[RFC/PATCH] isel: Fold more in gimple_expand_vec_cond_expr with andc/iorc

2024-07-01 Thread Kewen.Lin
Hi,

As PR115659 shows, assuming c = x CMP y, there are some
folding chances for patterns r = c ? 0/z : z/-1:
  - For r = c ? 0 : z, it can be folded into r = ~c & z.
  - For r = c ? z : -1, it can be folded into r = ~c | z.

But BIT_AND/BIT_IOR with one BIT_NOT operand is a
compound operation, and I'm not sure every target with
vector capability has a single vector instruction for it.
If not, it's arguable whether it always beats vector
selection (e.g. the vector constant may get hoisted or combined,
and selection may have the same latency as a normal logical operation).
So IMHO we probably need to query the target with new optabs.
This patch therefore introduces new optabs andc and iorc and their
corresponding internal functions BIT_{ANDC,IORC} (suggestions
for naming the optabs and ifns are welcome).  If a target
defines such optabs for vector modes, it means the target
supports these hardware insns and they should be no worse than
vector selection.  btw, the rs6000 changes are meant to
give an example of a target supporting andc/iorc.

Does this sound reasonable?

BR,
Kewen
-

PR tree-optimization/115659

gcc/ChangeLog:

* config/rs6000/rs6000-builtins.def: Update some bif expanders by
replacing orc3 with iorc3.
* config/rs6000/rs6000-string.cc (expand_cmp_vec_sequence): Update gen
function by replacing orc3 with iorc3.
* config/rs6000/rs6000.md (orc3): Rename to ...
(iorc3): ... this.
* doc/md.texi: Document andcm3 and iorcm3.
* gimple-isel.cc (gimple_expand_vec_cond_expr): Add more foldings for
patterns x CMP y ? 0 : z and x CMP y ? z : -1.
* internal-fn.def (BIT_ANDC): New internal function.
(BIT_IORC): Likewise.
* optabs.def (andc, iorc): New optab.
---
 gcc/config/rs6000/rs6000-builtins.def | 24 
 gcc/config/rs6000/rs6000-string.cc|  2 +-
 gcc/config/rs6000/rs6000.md   |  2 +-
 gcc/doc/md.texi   | 10 ++
 gcc/gimple-isel.cc| 24 
 gcc/internal-fn.def   |  4 
 gcc/optabs.def|  2 ++
 7 files changed, 54 insertions(+), 14 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 3bc7fed6956..736890fe6cb 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2147,40 +2147,40 @@
 NEG_V2DI negv2di2 {}

   const vsc __builtin_altivec_orc_v16qi (vsc, vsc);
-ORC_V16QI orcv16qi3 {}
+ORC_V16QI iorcv16qi3 {}

   const vuc __builtin_altivec_orc_v16qi_uns (vuc, vuc);
-ORC_V16QI_UNS orcv16qi3 {}
+ORC_V16QI_UNS iorcv16qi3 {}

   const vsq __builtin_altivec_orc_v1ti (vsq, vsq);
-ORC_V1TI orcv1ti3 {}
+ORC_V1TI iorcv1ti3 {}

   const vuq __builtin_altivec_orc_v1ti_uns (vuq, vuq);
-ORC_V1TI_UNS orcv1ti3 {}
+ORC_V1TI_UNS iorcv1ti3 {}

   const vd __builtin_altivec_orc_v2df (vd, vd);
-ORC_V2DF orcv2df3 {}
+ORC_V2DF iorcv2df3 {}

   const vsll __builtin_altivec_orc_v2di (vsll, vsll);
-ORC_V2DI orcv2di3 {}
+ORC_V2DI iorcv2di3 {}

   const vull __builtin_altivec_orc_v2di_uns (vull, vull);
-ORC_V2DI_UNS orcv2di3 {}
+ORC_V2DI_UNS iorcv2di3 {}

   const vf __builtin_altivec_orc_v4sf (vf, vf);
-ORC_V4SF orcv4sf3 {}
+ORC_V4SF iorcv4sf3 {}

   const vsi __builtin_altivec_orc_v4si (vsi, vsi);
-ORC_V4SI orcv4si3 {}
+ORC_V4SI iorcv4si3 {}

   const vui __builtin_altivec_orc_v4si_uns (vui, vui);
-ORC_V4SI_UNS orcv4si3 {}
+ORC_V4SI_UNS iorcv4si3 {}

   const vss __builtin_altivec_orc_v8hi (vss, vss);
-ORC_V8HI orcv8hi3 {}
+ORC_V8HI iorcv8hi3 {}

   const vus __builtin_altivec_orc_v8hi_uns (vus, vus);
-ORC_V8HI_UNS orcv8hi3 {}
+ORC_V8HI_UNS iorcv8hi3 {}

   const vsc __builtin_altivec_vclzb (vsc);
 VCLZB clzv16qi2 {}
diff --git a/gcc/config/rs6000/rs6000-string.cc 
b/gcc/config/rs6000/rs6000-string.cc
index 917f5572a6d..c4c62e8e2f9 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -743,7 +743,7 @@ expand_cmp_vec_sequence (unsigned HOST_WIDE_INT 
bytes_to_compare,
  rtx cmp_combined = gen_reg_rtx (load_mode);
  emit_insn (gen_altivec_eqv16qi (cmp_res, s1data, s2data));
  emit_insn (gen_altivec_eqv16qi (cmp_zero, s1data, zero_reg));
- emit_insn (gen_orcv16qi3 (vec_result, cmp_zero, cmp_res));
+ emit_insn (gen_iorcv16qi3 (vec_result, cmp_zero, cmp_res));
  emit_insn (gen_altivec_vcmpequb_p (cmp_combined, vec_result, 
zero_reg));
}
}
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index a5d20594789..276a5c9cf2d 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7324,7 +7324,7 @@ (define_expand "nand3"

 ;; The canonical form is to have the negated element first, so we need to
 ;; reverse arguments.
-(define_expand "orc3"
+(define_expand 

Re: [PATCH ver3] rs6000, altivec-1-runnable.c update the, require-effective-target

2024-06-25 Thread Kewen.Lin
Hi,

on 2024/6/25 03:00, Carl Love wrote:
> GCC maintainers:
> 
> version 3, rebased on current mainline tree.  Version 2 of the patch was
> out of sync.  Retested the patch on Power 10 with no regressions.
> 
> version 2, update the dg options per the feedback.  Retested the patch on
> Power 10 with no regressions.
> 
> This patch updates the dg options.
> 
> The patch has been tested on Power 10 with no regression failures.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
> Carl 
> 
> 
> 
> rs6000, altivec-1-runnable.c update the require-effective-target
> 
> Update the dg test directives.

OK with the very minor nit below tweaked, thanks!

> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/altivec-1-runnable.c: Change the
>   require-effective-target for the test.
> ---
>  gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c 
> b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
> index 4e32860a169..6763ff3ff8b 100644
> --- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
> @@ -1,7 +1,9 @@
> -/* { dg-do compile { target powerpc*-*-* } } */
> -/* { dg-options "-maltivec" } */
> +/* { dg-do run { target vmx_hw } } */
> +/* { dg-do compile { target { ! vmx_hw } } } */
> +/* { dg-options "-O2 -maltivec" } */
>  /* { dg-require-effective-target powerpc_altivec } */
>  
> +

Nit: This newline is useless.

BR,
Kewen

>  #include 
>  
>  #ifdef DEBUG



Re: [PATCH 09/52 v2] Replace {FLOAT, {, LONG_}DOUBLE}_TYPE_SIZE with new hook mode_for_floating_type

2024-06-25 Thread Kewen.Lin
Hi all,

I just pushed this 09/52 v2 with its following target changes
as r15-1594, thanks a lot for your comments/reviews/approvals!

BR,
Kewen

> Subject: [PATCH 09/52] Replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE with new hook
>  mode_for_floating_type
> 
> Currently, the mode used for a floating point type is
> determined by calling mode_for_size with the given type
> precision (size) to get the first mode of this size in the
> specified class.  On Powerpc, we have three modes (TF/KF/IF)
> with the same mode precision 128 (see [1]), so this
> processing forces us to place TF first.  That would require
> more adjustments in some generic code to avoid unexpected
> mode conversions, and it would be even worse if we
> eventually get rid of TF one day.  And as Joseph pointed out
> in [2], "floating types should have their mode, not a poorly
> defined precision value".  As Joseph and Richi suggested,
> this patch introduces one hook mode_for_floating_type
> which returns the corresponding mode for type float, double
> or long double.  The default implementation returns SFmode
> for float and DFmode for double or long double.  For ports
> which need special treatment, there are some other patches
> with their own port specific implementation (referring to
> how {,LONG_}DOUBLE_TYPE_SIZE get used there).  For all
> generic uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE, depending
> on the context, some are replaced with TYPE_PRECISION
> of the corresponding type node, and others are replaced with
> GET_MODE_PRECISION on the mode from mode_for_floating_type.
> This patch also poisons {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE,
> so most defines of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE in port
> specific code are removed, but some that are worth keeping
> for readability are renamed with a port specific prefix.
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651017.html
> [2] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
> 
> gcc/ChangeLog:
> 
>   * coretypes.h (enum tree_index): Forward declaration.
>   * defaults.h (FLOAT_TYPE_SIZE): Remove.
>   (DOUBLE_TYPE_SIZE): Likewise.
>   (LONG_DOUBLE_TYPE_SIZE): Likewise.
>   * doc/rtl.texi: Update document by replacing {FLOAT,DOUBLE}_TYPE_SIZE
>   with C type {float,double}.
>   * doc/tm.texi.in: Document new hook mode_for_floating_type, remove
>   document entries for {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE and
>   update document for WIDEST_HARDWARE_FP_SIZE.
>   * doc/tm.texi: Regenerate.
>   * emit-rtl.cc (init_emit_once): Replace DOUBLE_TYPE_SIZE by
>   calling targetm.c.mode_for_floating_type with TI_DOUBLE_TYPE.
>   * real.h (REAL_VALUE_TO_TARGET_LONG_DOUBLE): Use TYPE_PRECISION of
>   long_double_type_node to replace LONG_DOUBLE_TYPE_SIZE.
>   * system.h (FLOAT_TYPE_SIZE): Poison.
>   (DOUBLE_TYPE_SIZE): Likewise.
>   (LONG_DOUBLE_TYPE_SIZE): Likewise.
>   * target.def (mode_for_floating_type): New hook.
>   * targhooks.cc (default_mode_for_floating_type): New function.
>   (default_scalar_mode_supported_p): Update macros
>   {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE by calling
>   targetm.c.mode_for_floating_type with
>   TI_{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE.
>   * targhooks.h (default_mode_for_floating_type): New declaration.
>   * tree-core.h (enum tree_index): Specify underlying type unsigned
>   to sync with forward declaration in coretypes.h.
>   (NUM_FLOATN_TYPES): Explicitly convert to int.
>   (NUM_FLOATNX_TYPES): Likewise.
>   (NUM_FLOATN_NX_TYPES): Likewise.
>   * tree.cc (build_common_tree_nodes): Update macros
>   {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE by calling
>   targetm.c.mode_for_floating_type with
>   TI_{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE and set type mode accordingly.
> ---
>  gcc/coretypes.h|  1 +
>  gcc/defaults.h | 12 
>  gcc/doc/rtl.texi   |  2 +-
>  gcc/doc/tm.texi| 33 +
>  gcc/doc/tm.texi.in | 27 +++
>  gcc/emit-rtl.cc|  3 ++-
>  gcc/real.h |  7 ---
>  gcc/system.h   |  3 ++-
>  gcc/target.def |  9 +
>  gcc/targhooks.cc   | 18 +++---
>  gcc/targhooks.h|  1 +
>  gcc/tree-core.h| 13 +++--
>  gcc/tree.cc| 18 +++---
>  13 files changed, 77 insertions(+), 70 deletions(-)
> 
> diff --git a/gcc/coretypes.h b/gcc/coretypes.h
> index 1ac6f0abea3..00c1c58bd8c 100644
> --- a/gcc/coretypes.h
> +++ b/gcc/coretypes.h
> @@ -100,6 +100,7 @@ struct gimple;
>  typedef gimple *gimple_seq;
>  struct gimple_stmt_iterator;
>  class code_helper;
> +enum tree_index : unsigned;
> 
>  /* Forward declare rtx_code, so that we can use it in target hooks without
> needing to pull in rtl.h.  */
> diff --git a/gcc/defaults.h b/gcc/defaults.h
> index 92f3e07f742..ac2d25852ab 100644
> --- a/gcc/defaults.h
> 

Re: [PATCH 04/52] go: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE

2024-06-25 Thread Kewen.Lin
on 2024/6/12 20:32, Ian Lance Taylor wrote:
> "Kewen.Lin"  writes:
> 
>> Hi,
>>
>> Gentle ping:
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653387.html
>>
>> BR,
>> Kewen
>>
>> on 2024/6/3 11:00, Kewen Lin wrote:
>>> Joseph pointed out "floating types should have their mode,
>>> not a poorly defined precision value" in the discussion[1],
>>> as he and Richi suggested, the existing macros
>>> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
>>> hook mode_for_floating_type.  To be prepared for that, this
>>> patch is to replace use of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
>>> in go with TYPE_PRECISION of {float,{,long_}double}_type_node.
>>>
>>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
>>>
>>> gcc/go/ChangeLog:
>>>
>>> * go-gcc.cc (Gcc_backend::float_type): Use TYPE_PRECISION of
>>> {float,double,long_double}_type_node to replace
>>> {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE.
>>> (Gcc_backend::complex_type): Likewise.
> 
> This is fine if the other parts of the patch are accepted.

Thanks, pushed as r15-1591-gfafd87830937d5.

BR,
Kewen



Re: [PATCH 05/52] rust: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE

2024-06-25 Thread Kewen.Lin
on 2024/6/21 18:36, Kewen.Lin wrote:
> Hi Arthur,
> 
> on 2024/6/21 18:17, Arthur Cohen wrote:
>> Hi,
>>
>> Sorry about the delay in my answer! The patch looks good to me :) Will you 
>> push it as part of your patchset?
>>
> 
> Thanks for the review!  Since this one doesn't necessarily depend on
> "09/52 Replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE with new hook
> mode_for_floating_type", I'm going to push this before that (just like
> the other FE changes, except for the jit one 10/52, which depends on
> the new hook 09/52).  btw, all patches after 09/52 will be merged into
> 09/52 when committing. :)

Pushed as r15-1592-gbcd1b7a097031d, thanks!

BR,
Kewen


[PATCH] rs6000: Fix wrong RTL patterns for vector merge high/low char on LE

2024-06-24 Thread Kewen.Lin
Hi,

Commit r12-4496 changed some define_expands and define_insns
for vector merge high/low char, namely altivec_vmrg[hl]b.
These patterns mainly serve the built-in functions vec_merge{h,l}
and some internal gen function uses.  These functions should
take endianness into account.  Taking vec_mergeh as an example,
PVIPR defines that vec_mergeh "Merges the first halves (in
element order) of two vectors"; note that it says element order.
So it is mapped to vmrghb on BE and to vmrglb on LE.
Although the mapped insns differ, as discussed in PR106069,
the RTL pattern should still be the same.  That held before
commit r12-4496, but since that commit the patterns differ
between BE and LE.  Similar to the 32-bit element case described
in the commit log of r15-1504, this 8-bit element pattern on LE
doesn't actually match what the underlying insn is intended to
represent, so once an optimization such as combine transforms
code based on it, it can cause unexpected results.  The newly
constructed test case pr106069-1.c is a typical example of this
issue.

So this patch fixes the wrong RTL patterns and ensures the
associated RTL patterns are the same as before, with the same
semantics as their mapped insns.  With the proposed patch,
expanders like altivec_vmrghb expand into
altivec_vmrghb_direct_be or altivec_vmrglb_direct_le depending
on endianness.  "direct" makes it easy to see which insn will be
generated, while _be and _le distinguish the RTL patterns used
for each endianness.
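
To make the pattern concrete, here is a minimal C sketch (not part of
the patch) that models what a vec_select over a vec_concat computes for
the merge-high selector 0, 16, 1, 17, ..., 7, 23 used by
altivec_vmrghb_direct_be.  The helper names vec_select_concat and
merge_high_selector are made up for illustration, and the model only
covers the element selection, not register byte order.

```c
#include <assert.h>
#include <stdint.h>

#define NELT 16  /* V16QI: sixteen 8-bit elements */

/* Model of (vec_select (vec_concat op1 op2) (parallel [sel...])):
   concatenate the two inputs, then pick elements by index.  */
void
vec_select_concat (uint8_t *dst, const uint8_t *op1, const uint8_t *op2,
                   const int *sel)
{
  uint8_t cat[2 * NELT];
  for (int i = 0; i < NELT; i++)
    {
      cat[i] = op1[i];
      cat[NELT + i] = op2[i];
    }
  for (int i = 0; i < NELT; i++)
    {
      /* Every selector entry must index into the concatenation.  */
      assert (sel[i] >= 0 && sel[i] < 2 * NELT);
      dst[i] = cat[sel[i]];
    }
}

/* Fill SEL with the merge-high selector 0, 16, 1, 17, ..., 7, 23.  */
void
merge_high_selector (int *sel)
{
  for (int i = 0; i < NELT / 2; i++)
    {
      sel[2 * i] = i;
      sel[2 * i + 1] = NELT + i;
    }
}
```

With op1 = {0, ..., 15} and op2 = {100, ..., 115}, the result starts
with 0, 100, 1, 101 and ends with 7, 107, i.e. the first halves of both
inputs interleaved in element order.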

Following [1], this one is for the 8-bit vector element size.
Bootstrapped and regtested on powerpc64-linux-gnu P8/P9
and powerpc64le-linux-gnu P9 and P10.

I'm going to push this in two days if there are no objections, thanks!

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655239.html

Co-authored-by: Xionghu Luo 

PR target/106069
PR target/115355

gcc/ChangeLog:

* config/rs6000/altivec.md (altivec_vmrghb_direct): Rename to ...
(altivec_vmrghb_direct_be): ... this.  Add condition BYTES_BIG_ENDIAN.
(altivec_vmrghb_direct_le): New define_insn.
(altivec_vmrglb_direct): Rename to ...
(altivec_vmrglb_direct_be): ... this.  Add condition BYTES_BIG_ENDIAN.
(altivec_vmrglb_direct_le): New define_insn.
(altivec_vmrghb): Adjust by calling gen_altivec_vmrghb_direct_be
for BE and gen_altivec_vmrglb_direct_le for LE.
(altivec_vmrglb): Adjust by calling gen_altivec_vmrglb_direct_be
for BE and gen_altivec_vmrghb_direct_le for LE.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace
CODE_FOR_altivec_vmrghb_direct by
CODE_FOR_altivec_vmrghb_direct_be for BE and
CODE_FOR_altivec_vmrghb_direct_le for LE.  And replace
CODE_FOR_altivec_vmrglb_direct by
CODE_FOR_altivec_vmrglb_direct_be for BE and
CODE_FOR_altivec_vmrglb_direct_le for LE.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr106069-1.c: New test.
---
 gcc/config/rs6000/altivec.md  | 66 +++
 gcc/config/rs6000/rs6000.cc   |  8 +--
 gcc/testsuite/gcc.target/powerpc/pr106069-1.c | 39 +++
 3 files changed, 95 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106069-1.c

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index dcc71cc0f52..a0e8a35b843 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1152,15 +1152,16 @@ (define_expand "altivec_vmrghb"
(use (match_operand:V16QI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
-   : gen_altivec_vmrglb_direct;
-  if (!BYTES_BIG_ENDIAN)
-std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+emit_insn (
+  gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
+  else
+emit_insn (
+  gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })

-(define_insn "altivec_vmrghb_direct"
+(define_insn "altivec_vmrghb_direct_be"
   [(set (match_operand:V16QI 0 "register_operand" "=v")
(vec_select:V16QI
  (vec_concat:V32QI
@@ -1174,7 +1175,25 @@ (define_insn "altivec_vmrghb_direct"
 (const_int 5) (const_int 21)
 (const_int 6) (const_int 22)
 (const_int 7) (const_int 23)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrghb %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+   (vec_select:V16QI
+ (vec_concat:V32QI
+   (match_operand:V16QI 2 "register_operand" "v")
+   (match_operand:V16QI 1 "register_operand" "v"))
+ (parallel [(const_int  8) 

Re: [PATCH 10/52] jit: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE

2024-06-24 Thread Kewen.Lin
Hi Dave,

May I ask if you still have any concerns about this patch after my
replies to your previous questions?

BR,
Kewen

on 2024/6/14 10:16, Kewen.Lin wrote:
> Hi David,
> 
> on 2024/6/13 21:44, David Malcolm wrote:
>> On Sun, 2024-06-02 at 22:01 -0500, Kewen Lin wrote:
>>> Joseph pointed out "floating types should have their mode,
>>> not a poorly defined precision value" in the discussion[1],
>>> as he and Richi suggested, the existing macros
>>> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
>>> hook mode_for_floating_type.  Unlike the other FEs, for the
>>> uses in recording::memento_of_get_type::get_size, since
>>> {float,{,long_}double}_type_node haven't been initialized
>>> yet, this is to replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
>>> with calling hook targetm.c.mode_for_floating_type.
>>>
>>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
>>>
>>> gcc/jit/ChangeLog:
>>>
>>> * jit-recording.cc
>>> (recording::memento_of_get_type::get_size): Update
>>> macros {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE by calling
>>> targetm.c.mode_for_floating_type with
>>> TI_{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE.
>>> ---
>>>  gcc/jit/jit-recording.cc | 12 
>>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/gcc/jit/jit-recording.cc b/gcc/jit/jit-recording.cc
>>> index 68a2e860c1f..7719b898e57 100644
>>> --- a/gcc/jit/jit-recording.cc
>>> +++ b/gcc/jit/jit-recording.cc
>>> @@ -21,7 +21,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "config.h"
>>>  #include "system.h"
>>>  #include "coretypes.h"
>>> -#include "tm.h"
>>> +#include "target.h"
>>>  #include "pretty-print.h"
>>>  #include "toplev.h"
>>>  
>>> @@ -2353,6 +2353,7 @@ size_t
>>>  recording::memento_of_get_type::get_size ()
>>>  {
>>>    int size;
>>> +  machine_mode m;
>>>    switch (m_kind)
>>>  {
>>>  case GCC_JIT_TYPE_VOID:
>>> @@ -2399,13 +2400,16 @@ recording::memento_of_get_type::get_size ()
>>>    size = 128;
>>>    break;
>>>  case GCC_JIT_TYPE_FLOAT:
>>> -  size = FLOAT_TYPE_SIZE;
>>> +  m = targetm.c.mode_for_floating_type (TI_FLOAT_TYPE);
>>> +  size = GET_MODE_PRECISION (m).to_constant ();
>>>    break;
>>>  case GCC_JIT_TYPE_DOUBLE:
>>> -  size = DOUBLE_TYPE_SIZE;
>>> +  m = targetm.c.mode_for_floating_type (TI_DOUBLE_TYPE);
>>> +  size = GET_MODE_PRECISION (m).to_constant ();
>>>    break;
>>>  case GCC_JIT_TYPE_LONG_DOUBLE:
>>> -  size = LONG_DOUBLE_TYPE_SIZE;
>>> +  m = targetm.c.mode_for_floating_type (TI_LONG_DOUBLE_TYPE);
>>> +  size = GET_MODE_PRECISION (m).to_constant ();
>>>    break;
>>>  case GCC_JIT_TYPE_SIZE_T:
>>>    size = MAX_BITS_PER_WORD;
>>
>> [CCing jit mailing list]
>>
>> Thanks for the patch; sorry for the delay in responding.
>>
>> Did your testing include jit?  Note that --enable-languages=all does
>> *not* include it (due to it needing --enable-host-shared).
> 
> Thanks for the hints!  Yes, as noted in the cover letter, I did test jit.
> Initially I used TYPE_PRECISION ({float,{long_,}double}_type_node) to
> replace these, just as I proposed for the other FE changes, but the
> testing showed some failures on test-combination.c etc.  Looking into
> them, I realized that this call recording::memento_of_get_type::get_size
> can happen before we set up those type nodes.  So I had to use the
> current approach with the new hook, which made all the failures go away
> (no regressions).  btw, the test result comparison showed some more lines
> with "NA->PASS: test-threads.c.exe"; since that's positive, I didn't look
> into it.
> 
>>
>> The jit::recording code runs *very* early - before toplev::main.  For
>> example, a call to gcc_jit_type_get_size can trigger the above code
>> path before toplev::main has run.
>>
>> target.h says each target should have a:
>>
>>   struct gcc_target targetm = TARGET_INITIALIZER;
>>
>> Has targetm.c.mode_for_floating_type been initialized enough by that
>> static initialization?  
> 
> It depends on how to define "enough".  The hook has been initialized
> as you pointed

[PATCH] rs6000: Fix wrong RTL patterns for vector merge high/low short on LE

2024-06-24 Thread Kewen.Lin
Hi,

Commit r12-4496 changed some define_expands and define_insns
for vector merge high/low short, namely altivec_vmrg[hl]h.
These patterns mainly serve the built-in functions vec_merge{h,l}
and some internal gen function uses.  These functions should
take endianness into account.  Taking vec_mergeh as an example,
PVIPR defines that vec_mergeh "Merges the first halves (in
element order) of two vectors"; note that it says element order.
So it is mapped to vmrghh on BE and to vmrglh on LE.
Although the mapped insns differ, as discussed in PR106069,
the RTL pattern should still be the same.  That held before
commit r12-4496, but since that commit the patterns differ
between BE and LE.  Similar to the 32-bit element case described
in the commit log of r15-1504, this 16-bit element pattern on LE
doesn't actually match what the underlying insn is intended to
represent, so once an optimization such as combine transforms
code based on it, it can cause unexpected results.  The newly
constructed test case pr106069-2.c is a typical example of this
issue for element type short.

So this patch fixes the wrong RTL patterns and ensures the
associated RTL patterns are the same as before, with the same
semantics as their mapped insns.  With the proposed patch,
expanders like altivec_vmrghh expand into
altivec_vmrghh_direct_be or altivec_vmrglh_direct_le depending
on endianness.  "direct" makes it easy to see which insn will be
generated, while _be and _le distinguish the RTL patterns used
for each endianness.
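
As a sanity check of the operand swap in the LE arm of the expander
(gen_altivec_vmrglh_direct_le is called with operands[2] before
operands[1]), here is a minimal C sketch, not part of the patch.  It
assumes that LE element i corresponds to element NELT - 1 - i in the
left-to-right numbering the ISA uses to describe vmrglh; all function
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NELT 8  /* V8HI: eight 16-bit elements */

/* vmrglh with elements numbered from the left (big-endian numbering):
   interleave the right halves of X and Y.  */
static void
vmrglh_be_numbering (uint16_t *r, const uint16_t *x, const uint16_t *y)
{
  for (int j = 0; j < NELT / 2; j++)
    {
      r[2 * j] = x[NELT / 2 + j];
      r[2 * j + 1] = y[NELT / 2 + j];
    }
}

/* vec_mergeh in element order: interleave the first halves of A and B.  */
static void
mergeh_element_order (uint16_t *r, const uint16_t *a, const uint16_t *b)
{
  for (int i = 0; i < NELT / 2; i++)
    {
      r[2 * i] = a[i];
      r[2 * i + 1] = b[i];
    }
}

/* Convert between LE element order and left-to-right numbering.  */
static void
reverse (uint16_t *dst, const uint16_t *src)
{
  assert (dst != src);  /* not written for in-place use */
  for (int i = 0; i < NELT; i++)
    dst[i] = src[NELT - 1 - i];
}

/* Return 1 if mergeh (a, b) in LE element order equals vmrglh applied
   to the same registers, with the operands swapped when SWAP is set.  */
int
le_mergeh_matches_vmrglh (const uint16_t *a_le, const uint16_t *b_le,
                          int swap)
{
  uint16_t expect_le[NELT], got_le[NELT];
  uint16_t a_reg[NELT], b_reg[NELT], r_reg[NELT];

  mergeh_element_order (expect_le, a_le, b_le);
  reverse (a_reg, a_le);
  reverse (b_reg, b_le);
  if (swap)
    vmrglh_be_numbering (r_reg, b_reg, a_reg);
  else
    vmrglh_be_numbering (r_reg, a_reg, b_reg);
  reverse (got_le, r_reg);

  for (int i = 0; i < NELT; i++)
    if (expect_le[i] != got_le[i])
      return 0;
  return 1;
}
```

Under that assumption the two agree only with the operands swapped,
which is consistent with the operand order used in the LE arm above.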

Following [1], this one is for the 16-bit vector element size.
Bootstrapped and regtested on powerpc64-linux-gnu P8/P9
and powerpc64le-linux-gnu P9 and P10.

I'm going to push this in two days if there are no objections, thanks!

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655239.html

Co-authored-by: Xionghu Luo 

PR target/106069
PR target/115355

gcc/ChangeLog:

* config/rs6000/altivec.md (altivec_vmrghh_direct): Rename to ...
(altivec_vmrghh_direct_be): ... this.  Add condition BYTES_BIG_ENDIAN.
(altivec_vmrghh_direct_le): New define_insn.
(altivec_vmrglh_direct): Rename to ...
(altivec_vmrglh_direct_be): ... this.  Add condition BYTES_BIG_ENDIAN.
(altivec_vmrglh_direct_le): New define_insn.
(altivec_vmrghh): Adjust by calling gen_altivec_vmrghh_direct_be
for BE and gen_altivec_vmrglh_direct_le for LE.
(altivec_vmrglh): Adjust by calling gen_altivec_vmrglh_direct_be
for BE and gen_altivec_vmrghh_direct_le for LE.
(vec_widen_umult_hi_v16qi): Adjust the call to
gen_altivec_vmrghh_direct by gen_altivec_vmrghh for BE
and by gen_altivec_vmrglh for LE.
(vec_widen_smult_hi_v16qi): Likewise.
(vec_widen_umult_lo_v16qi): Adjust the call to
gen_altivec_vmrglh_direct by gen_altivec_vmrglh for BE
and by gen_altivec_vmrghh for LE.
(vec_widen_smult_lo_v16qi): Likewise.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace
CODE_FOR_altivec_vmrghh_direct by
CODE_FOR_altivec_vmrghh_direct_be for BE and
CODE_FOR_altivec_vmrghh_direct_le for LE.  And replace
CODE_FOR_altivec_vmrglh_direct by
CODE_FOR_altivec_vmrglh_direct_be for BE and
CODE_FOR_altivec_vmrglh_direct_le for LE.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr106069-2.c: New test.
---
 gcc/config/rs6000/altivec.md  | 76 +--
 gcc/config/rs6000/rs6000.cc   |  8 +-
 gcc/testsuite/gcc.target/powerpc/pr106069-2.c | 37 +
 3 files changed, 94 insertions(+), 27 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106069-2.c

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index a0e8a35b843..5af9bf920a2 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1203,17 +1203,18 @@ (define_expand "altivec_vmrghh"
(use (match_operand:V8HI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
-   : gen_altivec_vmrglh_direct;
-  if (!BYTES_BIG_ENDIAN)
-std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+emit_insn (
+  gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2]));
+  else
+emit_insn (
+  gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1]));
   DONE;
 })

-(define_insn "altivec_vmrghh_direct"
+(define_insn "altivec_vmrghh_direct_be"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
-(vec_select:V8HI
+   (vec_select:V8HI
  (vec_concat:V16HI
(match_operand:V8HI 1 "register_operand" "v")
(match_operand:V8HI 2 "register_operand" "v"))
@@ -1221,7 +1222,21 @@ (define_insn "altivec_vmrghh_direct"
  

Re: [PATCH] rs6000, change altivec*-runnable.c test file names

2024-06-24 Thread Kewen.Lin
Hi,

on 2024/6/22 00:15, Carl Love wrote:
> GCC maintainers:
> 
> Per the discussion of the dg header changes for test files 
> altivec-1-runnable.c and altivec-2-runnable.c it was decided it would be best 
> to rename the two tests to match the naming of the existing tests that 
> they are most closely aligned with.
> 
> This patch is dependent on the two patches to update the dg arguments for 
> test files altivec-1-runnable.c and altivec-2-runnable.c being accepted and 
> committed before this patch.
> 
> The patch has been tested on Power 10 with no regression failures.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.

OK, thanks!

BR,
Kewen

> 
> Carl 
> 
> --
> rs6000, change altivec*-runnable.c test file names
> 
> Changed the names of the test files.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/altivec-1-runnable.c: Change the name to
>   altivec-38.c.
>   * gcc.target/powerpc/altivec-2-runnable.c: Change the name to
>   p8vector-builtin-9.c.
> ---
>  .../gcc.target/powerpc/{altivec-1-runnable.c => altivec-38.c} | 0
>  .../powerpc/{altivec-2-runnable.c => p8vector-builtin-9.c}| 0
>  2 files changed, 0 insertions(+), 0 deletions(-)
>  rename gcc/testsuite/gcc.target/powerpc/{altivec-1-runnable.c => 
> altivec-38.c} (100%)
>  rename gcc/testsuite/gcc.target/powerpc/{altivec-2-runnable.c => 
> p8vector-builtin-9.c} (100%)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c 
> b/gcc/testsuite/gcc.target/powerpc/altivec-38.c
> similarity index 100%
> rename from gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
> rename to gcc/testsuite/gcc.target/powerpc/altivec-38.c
> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c 
> b/gcc/testsuite/gcc.target/powerpc/p8vector-builtin-9.c
> similarity index 100%
> rename from gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> rename to gcc/testsuite/gcc.target/powerpc/p8vector-builtin-9.c



Re: [PATCH version 4] rs6000, altivec-2-runnable.c update the, require-effective-target

2024-06-24 Thread Kewen.Lin
Hi Carl,

on 2024/6/22 00:15, Carl Love wrote:
> GCC maintainers:
> 
> version 4:  Additional dg option updates per the feedback.  Retested the 
> patch on Power 10, no regressions.
> 
> version 3:  Updated per the feedback from Peter, Kewen and Segher.  Note, 
> Peter suggested the -mdejagnu-cpu= value must be power7.  
> The test fails if -mdejagnu-cpu= is set to power7, needs to be power8.  Patch 
> has been retested on a Power 10 box, it succeeds
> with 2 passes and no fails.
> 
> Per the additional feedback after patch: 
> 
>   commit c892525813c94b018464d5a4edc17f79186606b7
>   Author: Carl Love 
>   Date:   Tue Jun 11 14:01:16 2024 -0400
> 
>   rs6000, altivec-2-runnable.c should be a runnable test
> 
>   The test case has "dg-do compile" set not "dg-do run" for a runnable
>   test.  This patch changes the dg-do command argument to run.
> 
>   gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
>   argument to run.
> 
> was approved and committed, I have updated the dg-require-effective-target
> and dg-options as requested so the test will compile with -O2 on a 
> machine that has a minimum support of Power 8 vector hardware.
> 
> The patch has been tested on Power 10 with no regression failures.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.

OK, thanks!

BR,
Kewen

> 
> Carl 
> 
> --
> rs6000, altivec-2-runnable.c update the require-effective-target
> 
> The test requires a minimum of Power8 vector HW and a compile level
> of -O2.  Update the dg test directives.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/altivec-2-runnable.c: Change the
>   require-effective-target for the test.
> ---
>  gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c 
> b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> index 17b23eb9d50..660669f69fd 100644
> --- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> @@ -1,6 +1,6 @@
> -/* { dg-do run } */
> -/* { dg-options "-mvsx" } */
> -/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 
> } } } */
> +/* { dg-do run { target p8vector_hw } } */
> +/* { dg-do compile { target { ! p8vector_hw } } } */
> +/* { dg-options "-O2  -mdejagnu-cpu=power8" } */
>  /* { dg-require-effective-target powerpc_vsx } */
>  
>  #include 



Re: [PATCH version 2] rs6000, altivec-1-runnable.c update the, require-effective-target

2024-06-23 Thread Kewen.Lin
Hi,

on 2024/6/22 00:15, Carl Love wrote:
> GCC maintainers:
> 
> version 2, update the dg options per the feedback.  Retested the patch on 
> Power 10 with no regressions.
> 
> This patch updates the dg options.
> 
> The patch has been tested on Power 10 with no regression failures.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
> Carl 
> 
> -- 
> rs6000, altivec-1-runnable.c update the require-effective-target
> 
> Update the dg test directives.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/altivec-1-runnable.c: Change the
>   require-effective-target for the test.
> ---
>  gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c 
> b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
> index da8ebbc30ba..3f084c91798 100644
> --- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
> @@ -1,6 +1,7 @@
> -/* { dg-do compile { target powerpc*-*-* } } */
> -/* { dg-require-effective-target powerpc_altivec_ok } */
> -/* { dg-options "-maltivec" } */
> +/* { dg-do run { target vmx_hw } } */
> +/* { dg-do compile { target { ! vmx_hw } } } */
> +/* { dg-options "-O2 -maltivec" } */
> +/* { dg-require-effective-target powerpc_altivec } */

This one needs rebasing, "powerpc_altivec" has been adjusted on trunk.

BR,
Kewen

>  
>  #include 
>  


Re: [PATCH-1v3, rs6000] Implement optab_isinf for SFDF and IEEE128

2024-06-21 Thread Kewen.Lin
Hi Haochen,

on 2024/5/24 14:02, HAO CHEN GUI wrote:
> Hi,
>   This patch implements optab_isinf for SFDF and IEEE128 using the test
> data class instructions.
> 
>   Compared with the previous version, the main change is to narrow
> down the predicate for the float operand according to the reviewer's advice.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652128.html
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Implement optab_isinf for SFDF and IEEE128
> 
> gcc/
>   PR target/97786
>   * config/rs6000/vsx.md (isinf2 for SFDF): New expand.
>   (isinf2 for IEEE128): New expand.

I think we can add one new mode iterator IEEE_FP including both SFDF
and IEEE128, then we can merge these two into one.

> 
> gcc/testsuite/
>   PR target/97786
>   * gcc.target/powerpc/pr97786-1.c: New test.
>   * gcc.target/powerpc/pr97786-2.c: New test.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index f135fa079bd..08cce11da60 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5313,6 +5313,24 @@ (define_expand "xststdcp"
>operands[4] = CONST0_RTX (SImode);
>  })
> 
> +(define_expand "isinf2"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
> +   (use (match_operand:SFDF 1 "vsx_register_operand"))]
> +  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
> +{
> +  emit_insn (gen_xststdcp (operands[0], operands[1], GEN_INT (0x30)));

Nit: It would be more readable if we can create some macros
for "Test Data Class" mask bits.
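
A minimal C sketch of what such macros could look like, assuming the
7-bit DCMX layout as I understand it from the ISA (NaN, +Infinity,
-Infinity, +Zero, -Zero, +Denormal, -Denormal from the most significant
bit down).  The macro names and the software model test_data_class are
hypothetical, for illustration only; the only value taken from the
patch is 0x30 for the isinf case.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical names for the "Test Data Class" mask bits.  */
#define DCMX_NAN        0x40
#define DCMX_POS_INF    0x20
#define DCMX_NEG_INF    0x10
#define DCMX_POS_ZERO   0x08
#define DCMX_NEG_ZERO   0x04
#define DCMX_POS_DENORM 0x02
#define DCMX_NEG_DENORM 0x01

/* The mask the isinf expanders pass as GEN_INT (0x30).  */
#define DCMX_INF (DCMX_POS_INF | DCMX_NEG_INF)

/* Software model: return the single class bit X falls into, or 0 for
   normal numbers, which match no class bit.  */
static int
data_class (double x)
{
  int neg = signbit (x) != 0;
  switch (fpclassify (x))
    {
    case FP_NAN:
      return DCMX_NAN;
    case FP_INFINITE:
      return neg ? DCMX_NEG_INF : DCMX_POS_INF;
    case FP_ZERO:
      return neg ? DCMX_NEG_ZERO : DCMX_POS_ZERO;
    case FP_SUBNORMAL:
      return neg ? DCMX_NEG_DENORM : DCMX_POS_DENORM;
    default:
      return 0;
    }
}

/* Return nonzero if X belongs to any class selected by MASK.  */
int
test_data_class (double x, int mask)
{
  assert ((mask & ~0x7f) == 0);  /* the mask is a 7-bit field */
  return (data_class (x) & mask) != 0;
}
```

With names like these, the expander call would read as a mask of the
two infinity bits rather than a bare 0x30.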

The other looks good to me, thanks!

BR,
Kewen

> +  DONE;
> +})
> +
> +(define_expand "isinf2"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
> +   (use (match_operand:IEEE128 1 "vsx_register_operand"))]
> +  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
> +{
> +  emit_insn (gen_xststdcqp_ (operands[0], operands[1], GEN_INT 
> (0x30)));
> +  DONE;
> +})
> +
>  ;; The VSX Scalar Test Negative Quad-Precision
>  (define_expand "xststdcnegqp_"
>[(set (match_dup 2)
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-1.c 
> b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
> new file mode 100644
> index 000..c1c4f64ee8b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> +
> +int test1 (double x)
> +{
> +  return __builtin_isinf (x);
> +}
> +
> +int test2 (float x)
> +{
> +  return __builtin_isinf (x);
> +}
> +
> +int test3 (float x)
> +{
> +  return __builtin_isinff (x);
> +}
> +
> +/* { dg-final { scan-assembler-not {\mfcmp} } } */
> +/* { dg-final { scan-assembler-times {\mxststdcsp\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-2.c 
> b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
> new file mode 100644
> index 000..ed305e8572e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target ppc_float128_hw } */
> +/* { dg-require-effective-target powerpc_vsx } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } 
> */
> +
> +int test1 (long double x)
> +{
> +  return __builtin_isinf (x);
> +}
> +
> +int test2 (long double x)
> +{
> +  return __builtin_isinfl (x);
> +}
> +
> +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
> +/* { dg-final { scan-assembler-times {\mxststdcqp\M} 2 } } */



Re: [PATCH] rs6000, altivec-1-runnable.c update the require-effective-target

2024-06-21 Thread Kewen.Lin
Hi Carl,

on 2024/6/20 00:18, Carl Love wrote:
> GCC maintainers:
> 
> The dg options for this test should be the same as for altivec-2-runnable.c.  
> This patch updates the dg options to match 
> the settings in altivec-2-runnable.c.
> 
> The patch has been tested on Power 10 with no regression failures.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
> Carl 
> 
> --
> From 289e15d215161ad45ae1aae7a5dedd2374737ec4
> rs6000, altivec-1-runnable.c update the require-effective-target
> 
> The test requires a minimum of Power8 vector HW and a compile level
> of -O2.

This is not true, vec_unpackh and vec_unpackl don't require power8;
vupk[hl]s[hb]/vupk[hl]px are all ISA 2.03.

> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/altivec-1-runnable.c: Change the
>   require-effective-target for the test.
> ---
>  gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c 
> b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
> index da8ebbc30ba..c113089c13a 100644
> --- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
> @@ -1,6 +1,7 @@
> -/* { dg-do compile { target powerpc*-*-* } } */
> -/* { dg-require-effective-target powerpc_altivec_ok } */
> -/* { dg-options "-maltivec" } */
> +/* { dg-do run { target vsx_hw } } */

So this line should check for vmx_hw.

> +/* { dg-do compile { target { ! vmx_hw } } } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */

On further thought, I think it's better to use
"-O2 -maltivec" to be consistent with the others.

As mentioned in the other thread, the powerpc_altivec
effective target check should guarantee altivec
feature support; if the default cpu type or a user
specified option disables altivec, this test case
will not be run.  If we specify a particular cpu type
here, it may cause confusion about why it's
different from the other existing ones.  So let's
go without specifying a cpu type.

Besides, similar to the request for altivec-2-runnable.c,
could you also rename this to altivec-38.c?

BR,
Kewen

> +/* { dg-require-effective-target powerpc_altivec } */
>  
>  #include 
>  



Re: [PATCH V5 1/2] split complicate 64bit constant to memory

2024-06-21 Thread Kewen.Lin
Hi Jeff,

on 2024/6/13 10:19, Jiufu Guo wrote:
> Hi,
> 
> Sometimes, a complicated constant is built via 3 (or more)
> instructions.  Generally speaking, that would not be as fast
> as loading it from the constant pool (as discussed in
> PR63281):
> "ld" is one instruction.  If we consider the "address/toc" adjustment,
> we may count it as 2 instructions.  And "pld" may need fewer
> cycles.
> 
> In testing (SPEC2017), we get better/more stable runtime
> if the threshold is set to "> 2" (compared with "> 3").
> 
> Since the constant is loaded from memory with this
> patch, this functionality may affect cache misses.

I wonder if it's a good idea to offer an rs6000 specific
parameter to control this threshold, since this change isn't
always a win.  For example, 5 simple constant building insns can
be well scheduled among the other insns in their bb, while an
equivalent load can suffer from a cache miss or insufficient LSU
resources.  A parameter may be better for users in case they
want to fine tune further for some cases.

> Still, IMHO, this patch does the right thing.
> 
> Compared with the previous version, this version:
> 1. allows assigning a complicated constant to r0 before RA,
> 2. allows more conditions besides TARGET_ELF,
> 3. updates test cases, and removes 2 test cases as the original test
> point is not used any more.

Can they be rewritten with the proposed FORCE_CONST_INTO_REG, as used
in the other test cases?

> 
> Bootstrap & regtest pass on ppc64{,le}.
> Is this ok for trunk?
> 
> BR,
> Jeff (Jiufu Guo)
> 
>   PR target/63281
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (rs6000_emit_set_const): Split constant to
>   memory under -m64.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/powerpc/const_anchors.c: Test final-rtl.
>   * gcc.target/powerpc/pr106550_1.c (FORCE_CONST_INTO_REG): New macro.
>   * gcc.target/powerpc/pr106550_1.c: Use macro FORCE_CONST_INTO_REG.

Curious if git gcc-verify will complain about this (the same changed file
has multiple lines).

>   * gcc.target/powerpc/pr87870.c: Update asm insn checking.
>   * gcc.target/powerpc/pr93012.c: Likewise.
>   * gcc.target/powerpc/parall_5insn_const.c: Removed.
>   * gcc.target/powerpc/pr106550.c: Removed.

Nit: s/Removed/Remove/

>   * gcc.target/powerpc/pr63281.c: New test.
> ---
>  gcc/config/rs6000/rs6000.cc   | 15 +++
>  .../gcc.target/powerpc/const_anchors.c|  5 ++--
>  .../gcc.target/powerpc/parall_5insn_const.c   | 27 ---
>  gcc/testsuite/gcc.target/powerpc/pr106550.c   | 14 --
>  gcc/testsuite/gcc.target/powerpc/pr106550_1.c | 16 ++-
>  gcc/testsuite/gcc.target/powerpc/pr63281.c| 11 
>  gcc/testsuite/gcc.target/powerpc/pr87870.c|  5 +++-
>  gcc/testsuite/gcc.target/powerpc/pr93012.c|  6 -
>  8 files changed, 47 insertions(+), 52 deletions(-)
>  delete mode 100644 gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
>  delete mode 100644 gcc/testsuite/gcc.target/powerpc/pr106550.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr63281.c
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index e4dc629ddcc..bc9d6f5c34f 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10240,6 +10240,21 @@ rs6000_emit_set_const (rtx dest, rtx source)
> c = sext_hwi (c, 32);
> emit_move_insn (lo, GEN_INT (c));
>   }
> +

Nit: Unexpected new line.

> +  else if ((can_create_pseudo_p () || base_reg_operand (dest, mode))

Nit: It's not obvious; maybe add a comment on why we need the
base_reg_operand restriction under !can_create_pseudo_p.

BR,
Kewen

> +&& TARGET_64BIT && num_insns_constant (source, mode) > 2)
> + {
> +   rtx sym = force_const_mem (mode, source);
> +   if (TARGET_TOC && SYMBOL_REF_P (XEXP (sym, 0))
> +   && use_toc_relative_ref (XEXP (sym, 0), mode))
> + {
> +   rtx toc = create_TOC_reference (XEXP (sym, 0), dest);
> +   sym = gen_const_mem (mode, toc);
> +   set_mem_alias_set (sym, get_TOC_alias_set ());
> + }
> +
> +   emit_move_insn (dest, sym);
> + }
>else
>   rs6000_emit_set_long_const (dest, c);
>break;
> diff --git a/gcc/testsuite/gcc.target/powerpc/const_anchors.c 
> b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
> index 542e2674b12..682e773d506 100644
> --- a/gcc/testsuite/gcc.target/powerpc/const_anchors.c
> +++ b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target has_arch_ppc64 } } */
> -/* { dg-options "-O2" } */
> +/* { dg-options "-O2 -fdump-rtl-final" } */
>  
>  #define C1 0x2351847027482577ULL
>  #define C2 0x2351847027482578ULL
> @@ -17,4 +17,5 @@ void __attribute__ ((noinline)) foo1 (long long *a, long 
> long b)
>  *a++ = C2;
>  }
>  
> -/* { dg-final { scan-assembler-times {\maddi\M} 2 } } */
> +/* { dg-final { scan-rtl-dump-times {\madddi3\M} 2 "final" } } */
> +
> 

Re: [PATCH ver2] rs6000, altivec-2-runnable.c update the require-effective-target

2024-06-21 Thread Kewen.Lin
Hi Carl,

on 2024/6/20 00:13, Carl Love wrote:
> GCC maintainers:
> 
> version 2:  Updated per the feedback from Peter, Kewen and Segher.  Note, 
> Peter suggested the -mdejagnu-cpu= value must be power7.  
> The test fails if -mdejagnu-cpu= is set to power7, needs to be power8.  Patch 
> has been retested on a Power 10 box, it succeeds
> with 2 passes and no fails.

IMHO Peter's suggestion on power7 (-mdejagnu-cpu=power7) is mainly for
altivec-1-runnable.c.  Both your testing and the comments in the test
case show this altivec-2-runnable.c requires at least power8.

> 
> Per the additional feedback after patch: 
> 
>   commit c892525813c94b018464d5a4edc17f79186606b7
>   Author: Carl Love 
>   Date:   Tue Jun 11 14:01:16 2024 -0400
> 
>   rs6000, altivec-2-runnable.c should be a runnable test
> 
>   The test case has "dg-do compile" set not "dg-do run" for a runnable
>   test.  This patch changes the dg-do command argument to run.
> 
>   gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
>   argument to run.
> 
> was approved and committed, I have updated the dg-require-effective-target
> and dg-options as requested so the test will compile with -O2 on a 
> machine that has a minimum support of Power 8 vector hardware.
> 
> The patch has been tested on Power 10 with no regression failures.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
> Carl 
> 
> 
> rs6000, altivec-2-runnable.c update the require-effective-target
> 
> The test requires a minimum of Power8 vector HW and a compile level
> of -O2.
> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/altivec-2-runnable.c: Change the
>   require-effective-target for the test.
> ---
>  gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> index 17b23eb9d50..9e7ef89327b 100644
> --- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> @@ -1,7 +1,7 @@
> -/* { dg-do run } */
> -/* { dg-options "-mvsx" } */
> -/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */
> -/* { dg-require-effective-target powerpc_vsx } */
> +/* { dg-do run { target vsx_hw } } */

As this test case requires power8 and up, and dg-options specifies
-mdejagnu-cpu=power8, we should use p8vector_hw instead of vsx_hw here,
otherwise it will fail on power7 env.

> +/* { dg-do compile { target { ! vmx_hw } } } */

This condition should be the negation of the one used for dg-do run,
so ! p8vector_hw.

> +/* { dg-options "-O2  -mdejagnu-cpu=power8" } */
> +/* { dg-require-effective-target powerpc_altivec } */

This should be powerpc_vsx instead, otherwise this case can still be
tested with -mno-vsx -maltivec, then this test case would fail.

Besides, following the discussion on the name of this test case, could
you also rename this to p8vector-builtin-9.c?
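
For reference, a sketch of the header with the suggestions above applied.
This is an untested reading of the review comments, using only the
effective-target names mentioned in this thread (p8vector_hw, powerpc_vsx);
the empty main is a placeholder and not part of the real test.

```c
/* Run only where Power8 vector hardware is available ...  */
/* { dg-do run { target p8vector_hw } } */
/* ... and fall back to a compile-only test elsewhere.  */
/* { dg-do compile { target { ! p8vector_hw } } } */
/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
/* { dg-require-effective-target powerpc_vsx } */

/* Placeholder body; the real test exercises the built-ins.  */
int
main (void)
{
  return 0;
}
```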

BR,
Kewen



Re: [PATCH 05/52] rust: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE

2024-06-21 Thread Kewen.Lin
Hi Arthur,

on 2024/6/21 18:17, Arthur Cohen wrote:
> Hi,
> 
> Sorry about the delay in my answer! The patch looks good to me :) Will you 
> push it as part of your patchset?
> 

Thanks for the review!  Since this one doesn't necessarily depend on
"09/52 Replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE with new hook
mode_for_floating_type", I'm going to push this before that (just like
the other FE changes, except for the jit one 10/52, which depends on
the new hook 09/52).  btw, all patches after 09/52 will be merged into
09/52 when committing. :)

Does it sound good to you?

BR,
Kewen

> Kindly,
> 
> Arthur
> 
> On 6/3/24 05:00, Kewen Lin wrote:
>> Joseph pointed out "floating types should have their mode,
>> not a poorly defined precision value" in the discussion[1],
>> as he and Richi suggested, the existing macros
>> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
>> hook mode_for_floating_type.  To be prepared for that, this
>> patch is to replace use of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
>> in rust with TYPE_PRECISION of {float,{,long_}double}_type_node.
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
>>
>> gcc/rust/ChangeLog:
>>
>> * rust-gcc.cc (float_type): Use TYPE_PRECISION of
>> {float,double,long_double}_type_node to replace
>> {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE.
>> ---
>>   gcc/rust/rust-gcc.cc | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/rust/rust-gcc.cc b/gcc/rust/rust-gcc.cc
>> index f17e19a2dfc..38169c08985 100644
>> --- a/gcc/rust/rust-gcc.cc
>> +++ b/gcc/rust/rust-gcc.cc
>> @@ -411,11 +411,11 @@ tree
>>   float_type (int bits)
>>   {
>>     tree type;
>> -  if (bits == FLOAT_TYPE_SIZE)
>> +  if (bits == TYPE_PRECISION (float_type_node))
>>   type = float_type_node;
>> -  else if (bits == DOUBLE_TYPE_SIZE)
>> +  else if (bits == TYPE_PRECISION (double_type_node))
>>   type = double_type_node;
>> -  else if (bits == LONG_DOUBLE_TYPE_SIZE)
>> +  else if (bits == TYPE_PRECISION (long_double_type_node))
>>   type = long_double_type_node;
>>     else
>>   {
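
To illustrate the shape of this change outside of GCC: the function now asks
each type for its own width instead of consulting a target macro.  The sketch
below is a stand-alone analogy, not GCC code.  The names float_kind,
BITS_OF and float_kind_for_bits are hypothetical, and sizeof (T) * CHAR_BIT
merely stands in for TYPE_PRECISION of the corresponding type node (storage
size and precision can differ on some targets).

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the three floating type nodes.  */
enum float_kind { FK_NONE, FK_FLOAT, FK_DOUBLE, FK_LONG_DOUBLE };

/* Width in bits as reported by the type itself (assumption: this
   plays the role of TYPE_PRECISION in the real patch).  */
#define BITS_OF(T) ((int) (sizeof (T) * CHAR_BIT))

/* Map a requested bit width to a floating type, checking the
   candidates in the same order as the patched float_type.  */
enum float_kind
float_kind_for_bits (int bits)
{
  assert (bits > 0);
  if (bits == BITS_OF (float))
    return FK_FLOAT;
  else if (bits == BITS_OF (double))
    return FK_DOUBLE;
  else if (bits == BITS_OF (long double))
    return FK_LONG_DOUBLE;
  return FK_NONE;
}
```

Because the widths come from the types, the lookup keeps working when a
target changes what long double is, without any separate size macro.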


PING^2 [PATCH 05/52] rust: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE

2024-06-20 Thread Kewen.Lin
Hi!

Gentle ping^2:

https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653339.html

BR,
Kewen

on 2024/6/12 17:35, Kewen.Lin wrote:
> Hi,
> 
> Gentle ping:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653339.html
> 
> BR,
> Kewen
> 
> on 2024/6/3 11:00, Kewen Lin wrote:
>> Joseph pointed out "floating types should have their mode,
>> not a poorly defined precision value" in the discussion[1],
>> as he and Richi suggested, the existing macros
>> {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
>> hook mode_for_floating_type.  To be prepared for that, this
>> patch is to replace use of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
>> in rust with TYPE_PRECISION of {float,{,long_}double}_type_node.
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651209.html
>>
>> gcc/rust/ChangeLog:
>>
>>  * rust-gcc.cc (float_type): Use TYPE_PRECISION of
>>  {float,double,long_double}_type_node to replace
>>  {FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE.
>> ---
>>  gcc/rust/rust-gcc.cc | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/rust/rust-gcc.cc b/gcc/rust/rust-gcc.cc
>> index f17e19a2dfc..38169c08985 100644
>> --- a/gcc/rust/rust-gcc.cc
>> +++ b/gcc/rust/rust-gcc.cc
>> @@ -411,11 +411,11 @@ tree
>>  float_type (int bits)
>>  {
>>tree type;
>> -  if (bits == FLOAT_TYPE_SIZE)
>> +  if (bits == TYPE_PRECISION (float_type_node))
>>  type = float_type_node;
>> -  else if (bits == DOUBLE_TYPE_SIZE)
>> +  else if (bits == TYPE_PRECISION (double_type_node))
>>  type = double_type_node;
>> -  else if (bits == LONG_DOUBLE_TYPE_SIZE)
>> +  else if (bits == TYPE_PRECISION (long_double_type_node))
>>  type = long_double_type_node;
>>else
>>  {
> 



Re: [PATCH] rs6000: Fix wrong RTL patterns for vector merge high/low word on LE

2024-06-20 Thread Kewen.Lin
Hi Segher,

on 2024/6/21 01:20, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Jun 20, 2024 at 06:22:07PM +0800, Kewen.Lin wrote:
>> Following your review comments in [1], this patch is
>> separated from Xionghu's patch v4 [2] and mainly targeted
>> at 32-bit element size; it also switches the
>> vec_widen_[su]mult_{hi,lo}* expanders to the generic
>> altivec_vmrg*w calls.  If this patch looks good to you, I'll post
>> the others for 16-bit and 8-bit element sizes.
> 
> This looks good.  Thank you!  It still is kinda big, but it is
> manageable now :-)
> 
>> btw, there are still some things that can be improved, like
>> the loose predicates pointed out by Peter, or that vsx_xxmrg[hl]w_*
>> can be merged with altivec_vmrg*w etc., but I think we want
>> this one to focus on fixing the regression, and those enhancements
>> can be handled by separate follow-up patches.
> 
> Yeah, separate patches, either after or before these, whatever works
> best for you :-)

OK, I'll make it after these by considering their priority. :)

> 
>> Bootstrapped and regtested on powerpc64-linux-gnu P8/P9
>> and powerpc64le-linux-gnu P9 and P10.
> 
> That also tested -m32 (on BE at least), right?

Yes, BE testing was with unix'{-m64,-m32}'.

> 
> Okay for trunk, thanks for dealing with this!

Thanks!  Pushed as r15-1504!  Will backport after burn-in time.

BR,
Kewen



[PATCH] rs6000: Fix wrong RTL patterns for vector merge high/low word on LE

2024-06-20 Thread Kewen.Lin
Hi Segher,

Following your review comments in [1], this patch is
separated from Xionghu's patch v4 [2] and mainly targeted
at 32-bit element size; it also switches the
vec_widen_[su]mult_{hi,lo}* expanders to the generic
altivec_vmrg*w calls.  If this patch looks good to you, I'll post
the others for 16-bit and 8-bit element sizes.

btw, there are still some things that can be improved, like
the loose predicates pointed out by Peter, or that vsx_xxmrg[hl]w_*
can be merged with altivec_vmrg*w etc., but I think we want
this one to focus on fixing the regression, and those enhancements
can be handled by separate follow-up patches.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9
and powerpc64le-linux-gnu P9 and P10.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655019.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611683.html

Is it ok for trunk?

BR,
Kewen
-

Commit r12-4496 changes some define_expands and define_insns
for vector merge high/low word, which are altivec_vmrg[hl]w and
vsx_xxmrg[hl]w_<mode>.  These defines mainly serve the
built-in functions vec_merge{h,l}, __builtin_vsx_xxmrghw and
__builtin_vsx_xxmrghw_4si, plus some internal gen function
needs.  These functions should consider endianness.  Taking
vec_mergeh as an example, PVIPR defines that vec_mergeh "Merges
the first halves (in element order) of two vectors"; note that
it says element order.  So it is mapped to vmrghw on BE and to
vmrglw on LE.  Although the mapped insns differ, as discussed
in PR106069 the RTL pattern should still be the same.  That
held before commit r12-4496: define_expand altivec_vmrghw got
expanded into:

  (vec_select:VSX_W
 (vec_concat:
(match_operand:VSX_W 1 "register_operand" "wa,v")
(match_operand:VSX_W 2 "register_operand" "wa,v"))
(parallel [(const_int 0) (const_int 4)
   (const_int 1) (const_int 5)])))]

on both BE and LE then.  But commit r12-4496 changed it to
expand into:

  (vec_select:VSX_W
 (vec_concat:
(match_operand:VSX_W 1 "register_operand" "wa,v")
(match_operand:VSX_W 2 "register_operand" "wa,v"))
(parallel [(const_int 0) (const_int 4)
   (const_int 1) (const_int 5)])))]

on BE, and

  (vec_select:VSX_W
 (vec_concat:
(match_operand:VSX_W 1 "register_operand" "wa,v")
(match_operand:VSX_W 2 "register_operand" "wa,v"))
(parallel [(const_int 2) (const_int 6)
   (const_int 3) (const_int 7)])))]

on LE.  Although the mapped insns are still vmrghw on BE and
vmrglw on LE, the associated RTL pattern on LE is completely
wrong and inconsistent with the mapped insn.  If optimization
passes leave this pattern alone, it's still fine even though
the pattern doesn't represent its mapped insn; that's why simple
testing on the bif doesn't expose this issue.  But once some
optimization pass such as combine makes changes based on this
wrong pattern, the result is unexpected, because the pattern
doesn't match the semantics that the expanded insn is intended
to represent.
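
As an aside, the meaning of the two selectors quoted above can be modelled
in plain C.  This is only a sketch for illustration: select_from_concat and
the two selector arrays are hypothetical names, and the model covers just
the selection arithmetic on 4 x 32-bit vectors, nothing target-specific.

```c
#include <assert.h>
#include <stdint.h>

/* Model of (vec_select (vec_concat a b) (parallel [s0 s1 s2 s3]))
   for two 4 x 32-bit vectors: concatenate a and b into 8 elements,
   then pick the 4 elements named by sel.  */
void
select_from_concat (const uint32_t a[4], const uint32_t b[4],
                    const int sel[4], uint32_t out[4])
{
  uint32_t cat[8];
  for (int i = 0; i < 4; i++)
    {
      cat[i] = a[i];
      cat[i + 4] = b[i];
    }
  for (int i = 0; i < 4; i++)
    {
      assert (sel[i] >= 0 && sel[i] < 8);
      out[i] = cat[sel[i]];
    }
}

/* The two selectors that appear in the patterns quoted above.  */
const int sel_first_halves[4] = { 0, 4, 1, 5 };
const int sel_second_halves[4] = { 2, 6, 3, 7 };
```

With { 0, 4, 1, 5 } the result interleaves the first two elements of each
input, which is the "first halves" behaviour quoted from PVIPR for
vec_mergeh; { 2, 6, 3, 7 } interleaves the last two instead.  A pass that
reasons on the pattern sees whichever of these the RTL states, regardless
of which insn is finally printed.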

So this patch fixes the wrong RTL pattern and ensures the
associated RTL patterns become the same as before, so that they
have the same semantics as their mapped insns.  With the
proposed patch, expanders like altivec_vmrghw expand into
altivec_vmrghw_direct_v4si_be or altivec_vmrglw_direct_v4si_le
depending on endianness; "direct" shows which insn would be
generated, while _be and _le distinguish the different RTL
patterns per endianness.

Co-authored-by: Xionghu Luo 

PR target/106069
PR target/115355

gcc/ChangeLog:

* config/rs6000/altivec.md (altivec_vmrghw_direct_<mode>): Rename
to ...
(altivec_vmrghw_direct_<mode>_be): ... this.  Add the condition
BYTES_BIG_ENDIAN.
(altivec_vmrghw_direct_<mode>_le): New define_insn.
(altivec_vmrglw_direct_<mode>): Rename to ...
(altivec_vmrglw_direct_<mode>_be): ... this.  Add the condition
BYTES_BIG_ENDIAN.
(altivec_vmrglw_direct_<mode>_le): New define_insn.
(altivec_vmrghw): Adjust by calling gen_altivec_vmrghw_direct_v4si_be
for BE and gen_altivec_vmrglw_direct_v4si_le for LE.
(altivec_vmrglw): Adjust by calling gen_altivec_vmrglw_direct_v4si_be
for BE and gen_altivec_vmrghw_direct_v4si_le for LE.
(vec_widen_umult_hi_v8hi): Adjust the call to
gen_altivec_vmrghw_direct_v4si by gen_altivec_vmrghw for BE
and by gen_altivec_vmrglw for LE.
(vec_widen_smult_hi_v8hi): Likewise.
(vec_widen_umult_lo_v8hi): Adjust the call to
gen_altivec_vmrglw_direct_v4si by gen_altivec_vmrglw for BE
and by gen_altivec_vmrghw for LE.
(vec_widen_smult_lo_v8hi): Likewise.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace
CODE_FOR_altivec_vmrghw_direct_v4si by
CODE_FOR_altivec_vmrghw_direct_v4si_be for BE and
CODE_FOR_altivec_vmrghw_direct_v4si_le for LE.  And replace
  

Re: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

2024-06-19 Thread Kewen.Lin
Hi Segher,

on 2024/6/19 04:31, Segher Boessenkool wrote:
> On Fri, Feb 10, 2023 at 10:59:52AM +0800, Xionghu Luo via Gcc-patches wrote:
> So, nothing here is obvious at all still.  Could you please split it up
> a bit more, so that every step is either small or simple?

I just chatted with Xionghu off-list; he is busy with some other tasks
and would prefer that I follow up on this.

> 
> So maybe first just split patterns to BE and LE versions, and nothing
> else?
> 
> And one patch per insn, if at all possible.

OK, I'll try to separate them by element type: word, half-word and byte.

> 
> This matters so that a regression search will immediately show the
> culprit pattern, if anything went wrong.
> 
> Most patches will not change anything consequential, but some will, and
> it should be very clear which do!
> 
> And change (or add) comments in the patch so that I don't have to ask
> the same questions as before again! :-)
> 
> Most of this seems clean and good, but there is just too much
> independent stuff going on at the same time.  If your patch series is
> split up correctly writing a changelog for it is very easy (this is a
> good canary to use!), and if we get regressions from this it should be
> trivial to find the problem, too.

Good point.

> 
>> @@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
>>  {
>>emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
>>emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
>> -  emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
>> +  emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
>>  }
>>else
>>  {
>>emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
>>emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
>> -  emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
>> +  emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
>>  }
>>DONE;
>>  })
> 
> Please don't.  Call the generic gen_vmrg* patterns from the widen
> things, don't try to do the compilers job of specialising stuff, it
> only makes things much less readable, and causes more mistakes.  Just do
> like what was there before, essentially.

Before r12-4496 (the culprit commit), this part looks like:

@@ -3795,182 +3708,182 @@ (define_expand "vec_widen_smult_hi_v16qi"
   emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
 }
   else
 {
   emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
   emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
   emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
 }
   DONE;
 })

, its associated gen_altivec_vmrghh_direct looks like:

-(define_insn "altivec_vmrghh_direct"
-  [(set (match_operand:V8HI 0 "register_operand" "=v")
-(unspec:V8HI [(match_operand:V8HI 1 "register_operand" "v")
-  (match_operand:V8HI 2 "register_operand" "v")]
- UNSPEC_VMRGH_DIRECT))]
-  "TARGET_ALTIVEC"
   "vmrghh %0,%1,%2"
   [(set_attr "type" "vecperm")])

, the intention is to emit exactly the insn "vmrghh".

It's doable to call gen_vmrg* here instead, but I'm not sure it's more
readable: this vec_widen_smult_hi_v16qi expander already has different
arms for BE and LE, and with the generic gen_vmrg* it would be
gen_altivec_vmrghh for BE and gen_altivec_vmrglh for LE, so for LE
readers need to be more careful to see that we actually generate vmrghh.
From this perspective, gen_altivec_vmrghh_direct_{be,le} seems more
straightforward.
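
For readers less familiar with these expanders, the BE arm of the quoted
vec_widen_umult_hi_v16qi can be modelled in plain C as below.  This is a
sketch only: the function names are hypothetical, the element numbering is
the BE one, and it models the even multiply, odd multiply and merge-high
steps rather than any actual RTL.

```c
#include <stdint.h>

/* Multiply the even-numbered (start = 0) or odd-numbered (start = 1)
   bytes, widening to 16 bits; models vmuleub / vmuloub in BE element
   numbering.  */
static void
mul_alternate_u8 (const uint8_t a[16], const uint8_t b[16], int start,
                  uint16_t out[8])
{
  for (int k = 0; k < 8; k++)
    out[k] = (uint16_t) (a[2 * k + start] * b[2 * k + start]);
}

/* Interleave the first four halfwords of x and y; models vmrghh in
   BE element numbering.  */
static void
merge_high_u16 (const uint16_t x[8], const uint16_t y[8], uint16_t out[8])
{
  for (int k = 0; k < 4; k++)
    {
      out[2 * k] = x[k];
      out[2 * k + 1] = y[k];
    }
}

/* Even products, odd products, then merge high: the result holds the
   widened products of the first eight byte pairs, in order.  */
void
widen_umult_hi_model (const uint8_t a[16], const uint8_t b[16],
                      uint16_t out[8])
{
  uint16_t ve[8], vo[8];
  mul_alternate_u8 (a, b, 0, ve);
  mul_alternate_u8 (a, b, 1, vo);
  merge_high_u16 (ve, vo, out);
}
```

In this model out[i] equals a[i] * b[i] for i = 0..7, which is the "hi"
half of the widened product; the LE arm reaches the same element-order
result by swapping the even/odd roles and the merge operands.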

BR,
Kewen



Re: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

2024-06-19 Thread Kewen.Lin
on 2024/6/19 03:02, Peter Bergner wrote:
> On 6/12/24 2:50 AM, Kewen.Lin wrote:
>> As the recent PR115355 shows, this issue can also affect the
>> behavior when users are adopting vectorization optimization,
>> IMHO we should get this landed as soon as possible.
> 
> I agree we want this fixed ASAP.
> 
> 
> 
> 
>> As all said above, I believe this patch is a correct fix and
>> considering the impact of the issue, I'd like to get this
>> pushed next week if no objections.
> 
> The only complaint I have on the patch, and I know this existed before
> the patch, is we're using register_operand for the predicate for these
> patterns when we probably should be using altivec_register_operand or
> vsx_register_operand depending on the specific pattern.

Good catch.

> 
> Yes, other pre-existing patterns use that, but those should probably be
> fixed too.  Maybe we go with register_operand for now with this patch
> and then have a follow-on patch (from us) that cleans those all up???

Yes, since this issue existed before and is fairly widespread, I think we
want a separate patch to clean them up.

> 
> Otherwise, LGTM (although I can't approve it).

Thanks!  I noticed Segher posted some more review comments on patch v4,
I'll follow up on them. :)

BR,
Kewen



Re: [PATCH] rs6000, altivec-2-runnable.c update the require-effective-target

2024-06-18 Thread Kewen.Lin
Hi Carl,

>> I'd expect the "-runnable" test case focuses on testing for run.  Normally,
>> the one without "-runnable" would focus on testing for compiling (scan some
>> desired insn), but this altivec-1.c and altivec-1-runnable.c seems to test
>> for different things, maybe we should separate them into different names
>> if they don't test for a same test point.
> 
> The altivec-1-runnable.c and altivec-2-runnable.c tests were added for various
> built-ins that didn't have any test cases.  There wasn't an intention that
> there was any connection to the existing altivec-*.c test files.  I started
> creating runnable tests when I started adding support for built-ins that we
> claimed to support but had never actually been implemented.  I created runnable
> tests to make sure my implementation actually worked.  I continued to add
> runnable tests for built-ins that existed but didn't have a test case.  Adding
> runnable tests did find a couple of issues where the existing implementation
> had a bug.
> 
> That all said, if we want to change the name of altivec-1-runnable.c and
> altivec-2-runnable.c to a different naming scheme that is fine with me.  Perhaps
> we should finish fixing the header for this test file, then do
> altivec-1-runnable, and then a final patch that does all the file renaming?

Yes, that's what I prefer, maybe something like altivec-run-n.c or
altivec-runnable-n.c to avoid the possible confusion.


>>> That said, I don't like not having a -mdejagnu-cpu=... here.
>>> I think for our server cpus, this is fine, but on an embedded system
>>> with an old ISA default for -mcpu=... (so we'd be doing a dg-do compile),
>>> just adding -maltivec to that default may not make much sense for that
>>> default and probably should be an error.  Maybe something like:
>>
>> Yes, for some embedded cpus, there will be some error messages, but since
>> we have powerpc_altivec_ok effective target, the error would make that
>> effective target checking fail so I'd expect it'll stop it being tested
>> (unsupported).
>>
>>>
>>> /* { dg-do run { target vmx_hw } } */
>>> /* { dg-do compile { target { ! vmx_hw } } } */
>>> /* { dg-require-effective-target powerpc_altivec_ok } */
>>> /* { dg-options "-O2 -mdejagnu=power7" } */
>>>

...

> We had -mdejagnu=power8 before, but it looks like we want to go to power7 now.
> 
> It sounds like we want the following:
> 
> /* { dg-do run { target vmx_hw } } */
> /* { dg-do compile { target { ! vmx_hw } } } */
> /* { dg-options "-O2 -mdejagnu=power7" } */
> /* { dg-require-effective-target powerpc_altivec } */

As mentioned above, I'd expect powerpc_altivec to stop this from being tested
without altivec feature support, so IMHO an explicit cpu type isn't necessary
(though I'm not opposed to specifying it).  btw, s/-mdejagnu/-mdejagnu-cpu/.

BR,
Kewen



Re: [PATCH 13/13 ver4] rs6000, remove vector set and vector init built-ins

2024-06-18 Thread Kewen.Lin
Hi Carl,

on 2024/6/14 03:40, Carl Love wrote:
> GCC maintainers:
> 
> The patch has been updated per the feedback from version 3.  Please let me
> know if the patch is acceptable for mainline.
> 
> Thanks.
> 
>   Carl 
> 
> --
> 
> rs6000, remove vector set and vector init built-ins
> 
> The vector init built-ins:
> 
>   __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
>   __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
>   __builtin_vec_init_v2di, __builtin_vec_init_v2df,
>   __builtin_vec_init_v1ti
> 
> perform the same operation as initializing the vector in C code.  For
> example:
> 
>   result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
>   result_v4si = {1, 2, 3, 4};
> 
> These two constructs were tested and verified they generate identical
> assembly instructions with no optimization and -O3 optimization.
> 
> The vector set built-ins:
> 
>   __builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
>   __builtin_vec_set_v4si, __builtin_vec_set_v4sf,
>   __builtin_vec_set_v1ti, __builtin_vec_set_v2di,
>   __builtin_vec_set_v2df
> 
> perform the same operation as setting a specific element in the vector in
> C code.  For example:
> 
>   src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
>   src_v4si[index] = int_val;
> 
> The built-in actually generates more instructions than the inline C code
> with no optimization but is identical with -O3 optimizations.
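
For illustration, the plain-C forms mentioned above can be written with the
GNU C vector extension as in the sketch below.  The v4si typedef and the
two wrapper names are hypothetical and only serve the example; nothing here
is taken from the patch itself.

```c
#include <assert.h>

/* GNU C vector extension: four 32-bit ints in a 16-byte vector.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Plain-C counterpart of __builtin_vec_init_v4si (a, b, c, d).  */
v4si
init_v4si (int a, int b, int c, int d)
{
  v4si v = { a, b, c, d };
  return v;
}

/* Plain-C counterpart of __builtin_vec_set_v4si (v, val, index).  */
v4si
set_v4si (v4si v, int val, int index)
{
  /* Subscripting outside the vector is undefined, so check first.  */
  assert (index >= 0 && index < 4);
  v[index] = val;
  return v;
}
```

Since both forms are ordinary C with this extension, user code does not
need a dedicated built-in for either operation.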
> 
> All of the above built-ins that are removed do not have test cases and
> are not documented.
> 
> Built-ins __builtin_vec_set_v1ti, __builtin_vec_set_v2di and
> __builtin_vec_set_v2df are not removed as they are used in function
> resolve_vec_insert() in file rs6000-c.cc.
> 
> The built-ins are removed as they don't provide any benefit over just
> using C code.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtins.def (__builtin_vec_init_v16qi,
>   __builtin_vec_init_v4sf, __builtin_vec_init_v4si,
>   __builtin_vec_init_v8hi, __builtin_vec_init_v1ti,
>   __builtin_vec_init_v2df, __builtin_vec_init_v2di,
>   __builtin_vec_set_v16qi, __builtin_vec_set_v4sf,
>   __builtin_vec_set_v4si, __builtin_vec_set_v8hi): Remove
>   built-in definitions.
> ---
>  gcc/config/rs6000/rs6000-builtins.def | 44 +++
>  1 file changed, 4 insertions(+), 40 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index 02aa04e5698..053dc0115d2 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1118,37 +1118,6 @@
>const signed short __builtin_vec_ext_v8hi (vss, signed int);
>  VEC_EXT_V8HI nothing {extract}
>  
> -  const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, \
> -signed char, signed char, signed char, signed char, signed char, \
> -signed char, signed char, signed char, signed char, signed char, \
> -signed char, signed char, signed char);
> -VEC_INIT_V16QI nothing {init}

I just realized this {init} is customized for vec_init only, and these removed
vec_init bifs are its only users, so we should remove this attribute as well.
Sorry, I should have found and pointed this out in the previous review.  I
think it means some removals are needed on:

1) comments in rs6000-builtins.def
   ;   init Process as a vec_init function

2) related gen code for this attribute bit, like:

  fprintf (header_file, "#define bif_init_bit\t\t(0x0001)\n");
  fprintf (header_file,
   "#define bif_is_init(x)\t\t((x).bifattrs & bif_init_bit)\n");
  if (bifp->attrs.isinit)
fprintf (init_file, " | bif_init_bit");

The others look good to me!

BR,
Kewen


Re: [PATCH 11/13 ver4] rs6000, extend vec_xxpermdi built-in for __int128 args

2024-06-18 Thread Kewen.Lin
Hi Carl,

on 2024/6/14 03:40, Carl Love wrote:
> 
> GCC maintainers:
> 
> The patch has been updated per the comments from version 3.  Please let me 
> know if the patch is acceptable for mainline.
> 
> Thanks.
> 
>  Carl 
> 
> -
> 
> rs6000, extend vec_xxpermdi built-in for __int128 args
> 
> Add new signed and unsigned overloaded instances for vec_xxpermdi
> 
>__int128 vec_xxpermdi (__int128, __int128, const int);
>__uint128 vec_xxpermdi (__uint128, __uint128, const int);

Nit: I think we need the "vector" keyword here to avoid confusion.
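
As background for readers, the doubleword selection done by the underlying
xxpermdi instruction can be modelled as below.  This is a sketch under
stated assumptions: struct vsr128 and xxpermdi_model are hypothetical names,
dw[0] is doubleword 0 in the ISA's big-endian numbering, and any
element-order adjustment the built-in makes on little-endian is not
modelled.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit register viewed as two doublewords; dw[0] is doubleword 0
   in the ISA's (big-endian) numbering.  */
struct vsr128
{
  uint64_t dw[2];
};

/* Model of xxpermdi: the high bit of the 2-bit immediate picks a
   doubleword of A for result doubleword 0, the low bit picks a
   doubleword of B for result doubleword 1.  */
struct vsr128
xxpermdi_model (struct vsr128 a, struct vsr128 b, int dm)
{
  assert (dm >= 0 && dm <= 3);
  struct vsr128 t;
  t.dw[0] = a.dw[(dm >> 1) & 1];
  t.dw[1] = b.dw[dm & 1];
  return t;
}
```

Because the operation only moves 64-bit halves, it is indifferent to
whether the 128 bits are interpreted as two doublewords or as one
__int128 element, which is why extending the overload to the 128-bit
vector types needs no new instruction.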

> 
> Update the documentation to include a reference to the new built-in
> instances.
> 
> Add test cases for the new overloaded instances.
> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-overload.def (vec_xxpermdi): Add new
>   overloaded built-in instances.

Better to mention something like:  "built-in instances for vector
signed and unsigned int128".

>   * doc/extend.texi:  Add documentation for new overloaded built-in

Nit: One more space before "Add".

>   instances.

... can be extended similarly.

> 
> gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/vec_perm-runnable-i128.c: New test file.
> ---
>  gcc/config/rs6000/rs6000-overload.def |   4 +
>  gcc/doc/extend.texi   |   4 +
>  .../powerpc/vec_perm-runnable-i128.c  | 229 ++
>  3 files changed, 237 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
> 
> diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
> index 6cec1ad4f1a..354f8fabe0f 100644
> --- a/gcc/config/rs6000/rs6000-overload.def
> +++ b/gcc/config/rs6000/rs6000-overload.def
> @@ -4936,6 +4936,10 @@
>  XXPERMDI_2DI  XXPERMDI_VSLL
>vull __builtin_vsx_xxpermdi (vull, vull, const int);
>  XXPERMDI_2DI  XXPERMDI_VULL
> +  vsq __builtin_vsx_xxpermdi (vsq, vsq, const int);
> +XXPERMDI_1TI  XXPERMDI_1SQ
> +  vuq __builtin_vsx_xxpermdi (vuq, vuq, const int);
> +XXPERMDI_1TI  XXPERMDI_1UQ

Nit: XXPERMDI_1SQ -> XXPERMDI_SQ
 XXPERMDI_1UQ -> XXPERMDI_UQ
(removing "1" to align with the above).

>vf __builtin_vsx_xxpermdi (vf, vf, const int);
>  XXPERMDI_4SF  XXPERMDI_VF
>vd __builtin_vsx_xxpermdi (vd, vd, const int);
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index d7d8d149a43..9e45976436b 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -22610,6 +22610,10 @@ void vec_vsx_st (vector bool char, int, signed char *);
>  
>  vector double vec_xxpermdi (vector double, vector double, const int);
>  vector float vec_xxpermdi (vector float, vector float, const int);
> +vector __int128 vec_xxpermdi (vector signed __int128,
> +  vector signed __int128, const int);

Nit: either s/vector __int128/vector signed __int128/
 or s/signed //g
to keep consistent.

> +vector __int128 vec_xxpermdi (vector unsigned __int128,
> +  vector unsigned __int128, const int);

This line misses unsigned for the return type.

OK for trunk with nits above tweaked, thanks!

BR,
Kewen

