[PATCH] rs6000, fix test builtins-1-p10-runnable.c

2024-10-03 Thread Carl Love

GCC maintainers:

The test builtins-1-p10-runnable.c has debugging inadvertently enabled.
The test uses #ifdef to enable/disable the debugging.  Unfortunately,
#define DEBUG was set to 0 in an attempt to disable debugging and enable
the call to abort in case of error.  Because #ifdef only checks whether
the macro is defined, the #define should have been removed to disable
debugging.
Additionally, a change in the expected output which was made for testing
purposes was not removed.  Hence, the test prints that there was an
error instead of calling abort.  The result is that the test does not get
reported as failing.


This patch removes the #define DEBUG to enable the call to abort and 
restores the expected output to the correct value.  The patch was tested 
on a Power 10 without the #define DEBUG to verify that the test does 
fail with the incorrect expected value.  The correct expected value was 
then restored.  The test reports 19 expected passes and no errors.
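
For anyone not familiar with the distinction, here is a minimal sketch of why setting the macro to 0 did not disable the debug path.  Only the DEBUG macro name comes from the test; the function names are hypothetical and exist purely for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

#define DEBUG 0   /* defined, but with the value 0 */

/* Returns 1 when the "#ifdef DEBUG" branch was compiled in.  */
int ifdef_branch_taken (void)
{
#ifdef DEBUG
  return 1;   /* taken: DEBUG is defined, its value is irrelevant */
#else
  return 0;
#endif
}

/* Returns 1 when the "#if DEBUG" branch was compiled in.  */
int if_branch_taken (void)
{
#if DEBUG
  return 1;
#else
  return 0;   /* taken: DEBUG expands to 0 */
#endif
}

/* Hypothetical error handler written in the style of the test.  */
void report_error (const char *msg)
{
#ifdef DEBUG
  printf ("Error: %s\n", msg);   /* reached with "#define DEBUG 0" */
#else
  abort ();                      /* reached only when DEBUG is undefined */
#endif
}
```

With the #define in place, ifdef_branch_taken returns 1 and if_branch_taken returns 0, so a test guarded by #ifdef prints the error and keeps running instead of aborting.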


Please let me know if this patch is acceptable for mainline. Thanks.

   Carl


---

rs6000, fix test builtins-1-p10-runnable.c

The test has two issues:

1) The test should call abort() if an error is found.
However, the test contains a #define DEBUG 0, which actually enables the
error prints instead of executing abort() because the debug code is
protected by an #ifdef, not an #if.  The #define DEBUG needs to be removed
so the test will abort on an error.

2) The vec_i_expected output was tweaked to test that it would fail.
The test value was not removed.

By removing the #define DEBUG, the test fails and reports 1 failure.
Removing the intentionally wrong expected value results in the test
passing with no errors as expected.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/builtins-1-p10-runnable.c: Remove #define
    DEBUG.  Replace vec_i_expected value with correct value.
---
 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
index 222c8b3a409..3e8a1c736e3 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
@@ -25,8 +25,6 @@
 #include 
 #include 

-#define DEBUG 0
-
 #ifdef DEBUG
 #include 
 #endif
@@ -281,8 +279,7 @@ int main()
 /* Signed word multiply high */
 i_arg1 = (vector int){ 2147483648, 2147483648, 2147483648, 2147483648 };
 i_arg2 = (vector int){ 2, 3, 4, 5};
-    //    vec_i_expected = (vector int){-1, -2, -2, -3};
-    vec_i_expected = (vector int){1, -2, -2, -3};
+    vec_i_expected = (vector int){-1, -2, -2, -3};

 vec_i_result = vec_mulh (i_arg1, i_arg2);

--
2.46.0
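
The restored expected value {-1, -2, -2, -3} in the patch above can be cross-checked with a scalar model of a signed word multiply high.  This is only a sketch: mulh_s32 is a hypothetical helper that is not part of the test, and it assumes the literal 2147483648 wraps to -2147483648 when stored in a 32-bit int, as it does with GCC.

```c
#include <stdint.h>

/* Scalar model of a signed 32-bit "multiply high": the upper 32 bits of
   the 64-bit product, i.e. floor (a * b / 2^32).  */
int32_t mulh_s32 (int32_t a, int32_t b)
{
  int64_t prod = (int64_t) a * (int64_t) b;
  int64_t two32 = (int64_t) 1 << 32;
  int64_t q = prod / two32;   /* C division truncates toward zero */

  if (prod % two32 < 0)
    q -= 1;                   /* adjust to floor for negative products */

  return (int32_t) q;
}
```

Calling mulh_s32 with INT32_MIN and the multipliers 2, 3, 4 and 5 yields -1, -2, -2 and -3 respectively, which agrees with the corrected vec_i_expected.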




[PATCH ver2 4/4] rs6000, Add tests and documentation for vector conversions between integer and float

2024-10-01 Thread Carl Love



GCC maintainers:

Version 2, added the argument changes for the
__builtin_vsx_uns_double[e | o | h | l]_v4si built-ins.  Added support to
the vector {un,}signed int to vector float built-ins so they are supported
using Altivec instructions if VSX is not available, per the feedback
comments.


The following patch fixes errors in the definition of the 
__builtin_vsx_uns_floate_v2di, __builtin_vsx_uns_floato_v2di and 
__builtin_vsx_uns_float2_v2di built-ins.  The arguments should be 
unsigned but are listed as signed.
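
To illustrate in plain C why the declared signedness of an integer-to-float conversion argument matters, here is a small sketch.  It does not use the built-ins and makes no claim about the code they generated before the fix; the helper names are hypothetical, and the signed reinterpretation assumes two's complement wrapping as implemented by GCC.

```c
#include <stdint.h>

/* Interpret a 64-bit pattern as signed, then convert to float.  */
float bits_to_float_signed (uint64_t bits)
{
  return (float) (int64_t) bits;
}

/* Interpret the same 64-bit pattern as unsigned, then convert to float.  */
float bits_to_float_unsigned (uint64_t bits)
{
  return (float) bits;
}
```

For small values both helpers agree, but for a pattern with the top bit set (for example all ones) the signed view gives -1.0 while the unsigned view rounds to 2^64, so a prototype that lists the wrong signedness describes a different conversion than the one the built-in name promises.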


Additionally, test cases are missing for a number of instances of the
built-ins, and the documentation for the various built-ins is missing.


This patch adds the missing test cases and documentation.

The patch has been tested on Power 10 LE and BE with no regressions.

Please let me know if it is acceptable for mainline.  Thanks.

    Carl

-
rs6000, Add tests and documentation for vector conversions between 
integer and float


The arguments for the __builtin_vsx_uns_floate_v2di,
__builtin_vsx_uns_floato_v2di, __builtin_vsx_uns_float2_v2di,
__builtin_vsx_xvcvuxwsp, __builtin_vsx_uns_doublee_v4si,
__builtin_vsx_uns_doubleh_v4si, __builtin_vsx_uns_doublel_v4si and
__builtin_vsx_uns_doubleo_v4si built-ins should be unsigned, not signed.

Add tests for the following existing vector integer and vector long long
int to vector float built-ins:
  __builtin_altivec_float_sisf (vsi);
  __builtin_altivec_uns_float_sisf (vui);

Add tests for the vector float to vector int built-ins:
  __builtin_altivec_fix_sfsi
  __builtin_altivec_fixuns_sfsi

The four built-ins are not documented.  The patch adds the missing
documentation for the built-ins.

The vector signed/unsigned integer to vector floating point built-ins
__builtin_vsx_xvcvsxwsp, __builtin_vsx_xvcvuxwsp are extended to generate
Altivec instructions if VSX is not available.  A new test case for these
built-ins with Altivec is added to test the new functionality.

This patch fixes the incorrect __builtin_vsx_uns_float[o|e|2]_v2di
and __builtin_vsx_xvcvuxwsp argument types and adds test cases for each
of the built-ins listed above.

gcc/ChangeLog:
    * config/rs6000/rs6000-builtins.def (__builtin_vsx_uns_floate_v2di,
    __builtin_vsx_uns_floato_v2di, __builtin_vsx_uns_float2_v2di,
    __builtin_vsx_xvcvuxwsp, __builtin_vsx_uns_doublee_v4si,
    __builtin_vsx_uns_doubleh_v4si, __builtin_vsx_uns_doublel_v4si,
    __builtin_vsx_uns_doubleo_v4si): Change argument from signed to
    unsigned.
    (__builtin_vsx_xvcvsxwsp, __builtin_vsx_xvcvuxwsp): Move to
    section Altivec.
    * config/rs6000/vsx.md (vsx_floatv4siv4sf2, vsx_floatunsv4siv4sf2):
    Add define expand to generate VSX instructions if VSX is enabled
    and Altivec instructions otherwise.
    (vsx_float2, vsx_floatuns2): Change
    define_insns to define_insn for vsx_float2_internal and
    vsx_floatuns2_internal.
    (vsx_floatv2div2df2, vsx_floatunsv2div2df2): Add define expands.
    * doc/extend.texi: Add documentation for each of the built-ins.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/builtins-3-altivec-runnable.c: New file.
    * gcc.target/powerpc/builtins-3-runnable.c: Move functions void
    and test_result_sp to file builtins-3-runnable.h. Add include for
    builtins-3-runnable.h.
    * gcc.target/powerpc/builtins-3-runnable.h: New file.
    * gcc.target/powerpc/vsx-int-to-float-runnable.c: New file.
---
 gcc/config/rs6000/rs6000-builtins.def |  28 +-
 gcc/config/rs6000/vsx.md  |  50 +++-
 gcc/doc/extend.texi   |  22 ++
 .../powerpc/builtins-3-altivec-runnable.c |  35 +++
 .../gcc.target/powerpc/builtins-3-runnable.c  |  43 +--
 .../gcc.target/powerpc/builtins-3-runnable.h  |  51 
 .../powerpc/vsx-int-to-float-runnable.c   | 260 ++
 7 files changed, 432 insertions(+), 57 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-3-altivec-runnable.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-int-to-float-runnable.c


diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 7350b913d03..9bda109d955 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1119,6 +1119,14 @@
   const signed short __builtin_vec_ext_v8hi (vss, signed int);
 VEC_EXT_V8HI nothing {extract}

+;; __builtin_vsx_xvcvsxwsp, __builtin_vsx_xvcvuxwsp generate VSX instructions
+;; if VSX enabled and Altivec instructions if VSX is not enabled.
+  const vf __builtin_vsx_xvcvsxwsp (vsi);
+    XVCVSXWSP vsx_floatv4siv4sf2 {}
+
+  const vf __builtin_vsx_xvcvuxwsp (vui);
+    XVCVUXWSP vsx_floatunsv4siv4sf2 {}
+
 ; Cell builtins.
 [cell]

Re: [PATCH ver2 2/4] rs6000, remove built-ins __builtin_vsx_vperm_8hi and __builtin_vsx_vperm_8hi_uns

2024-10-01 Thread Carl Love



GCC maintainers:

version 2, added the reference to the patch where the removal of the 
built-ins was missed.  Note, patch was approved by Kewen with this change.


The following patch removes two redundant built-ins 
__builtin_vsx_vperm_8hi and __builtin_vsx_vperm_8hi_uns.  The built-ins 
are covered by the overloaded vec_perm built-in.


The patch has been tested on Power 10 LE and BE with no regressions.

Please let me know if it is acceptable for mainline.  Thanks.

  Carl


---
rs6000, remove built-ins __builtin_vsx_vperm_8hi and 
__builtin_vsx_vperm_8hi_uns


The two built-ins __builtin_vsx_vperm_8hi and __builtin_vsx_vperm_8hi_uns
are redundant. They are covered by the overloaded vec_perm built-in. The
built-ins are not documented and do not have test cases.

The removal of these built-ins was missed in commit gcc r15-1923 on
7/9/2024.

This patch removes the redundant built-ins.

gcc/ChangeLog:
    * config/rs6000/rs6000-builtins.def (__builtin_vsx_vperm_8hi,
    __builtin_vsx_vperm_8hi_uns): Remove built-in definitions.
---
 gcc/config/rs6000/rs6000-builtins.def | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 0e9dc05dbcf..adb4fe761f3 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1472,12 +1472,6 @@
   const vf __builtin_vsx_uns_floato_v2di (vsll);
 UNS_FLOATO_V2DI unsfloatov2di {}

-  const vss __builtin_vsx_vperm_8hi (vss, vss, vuc);
-    VPERM_8HI_X altivec_vperm_v8hi {}
-
-  const vus __builtin_vsx_vperm_8hi_uns (vus, vus, vuc);
-    VPERM_8HI_UNS_X altivec_vperm_v8hi_uns {}
-
   const vsll __builtin_vsx_vsigned_v2df (vd);
 VEC_VSIGNED_V2DF vsx_xvcvdpsxds {}

--
2.46.0




[PATCH ver2 3/4] rs6000, Remove redundant built-in __builtin_vsx_xvcvuxwdp

2024-10-01 Thread Carl Love




GCC maintainers:

Version 2: Fixed the wording in the changelog per the feedback. With 
this change the patch was approved by Kewen.


The patch removes the built-in __builtin_vsx_xvcvuxwdp as it is covered 
by the overloaded vec_doubleo built-in.


The patch has been tested on Power 10 LE and BE with no regressions.

Please let me know if it is acceptable for mainline.  Thanks.

  Carl



rs6000, Remove redundant built-in __builtin_vsx_xvcvuxwdp

The built-in __builtin_vsx_xvcvuxwdp can be covered with PVIPR
function vec_doubleo on LE and vec_doublee on BE.  There are no test
cases or documentation for __builtin_vsx_xvcvuxwdp.  This patch
removes the redundant built-in.

gcc/ChangeLog:
    * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvuxwdp):
    Remove built-in definition.
---
 gcc/config/rs6000/rs6000-builtins.def | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index adb4fe761f3..7350b913d03 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1616,9 +1616,6 @@
   const vf __builtin_vsx_xvcvuxdsp (vull);
 XVCVUXDSP vsx_xvcvuxdsp {}

-  const vd __builtin_vsx_xvcvuxwdp (vsi);
-    XVCVUXWDP vsx_xvcvuxwdp {}
-
   const vf __builtin_vsx_xvcvuxwsp (vsi);
 XVCVUXWSP vsx_floatunsv4siv4sf2 {}

--
2.46.0




Re: [PATCH ver2 1/4] rs6000, add testcases to the overloaded vec_perm built-in

2024-10-01 Thread Carl Love




GCC maintainers:

Version 2, fixed the changelog, updated the wording in the documentation 
and updated the argument types in the vsx-builtin-3.c test file.


The following patch adds missing test cases for the overloaded vec_perm 
built-in.  It also fixes an issue with printing the 128-bit values in 
the DEBUG section that was noticed when adding the additional test cases.


The patch has been tested on Power 10 LE and BE with no regressions.

Please let me know if it is acceptable for mainline.  Thanks.

  Carl

---
From 4c672e8895107bc1f62e09122e7af157436cb83d Mon Sep 17 00:00:00 2001
From: Carl Love 
Date: Wed, 31 Jul 2024 16:31:34 -0400
Subject: [PATCH 1/4] rs6000, add testcases to the overloaded vec_perm 
built-in


The overloaded vec_perm built-in supports permuting signed and unsigned
vectors of char, bool char, short int, short bool, int, bool, long long
int, long long bool, int128, float and double.  However, not all of the
supported arguments are included in the test cases.  This patch adds
the missing test cases.
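
As background for the new test cases, the byte selection performed by vec_perm can be sketched with a scalar model.  This reflects my reading of the PVIPR description (each result byte is chosen from the concatenation of the two inputs using the low 5 bits of the control byte); model_vec_perm is a hypothetical helper, the arrays are indexed in element order, and the sketch is not a substitute for running the built-in on hardware.

```c
#include <stdint.h>

#define VEC_BYTES 16

/* r[i] = (a || b)[c[i] & 0x1F], where a supplies indexes 0..15 and
   b supplies indexes 16..31.  */
void model_vec_perm (uint8_t r[VEC_BYTES], const uint8_t a[VEC_BYTES],
                     const uint8_t b[VEC_BYTES], const uint8_t c[VEC_BYTES])
{
  for (int i = 0; i < VEC_BYTES; i++)
    {
      unsigned idx = c[i] & 0x1F;   /* only the low 5 bits are used */
      r[i] = (idx < VEC_BYTES) ? a[idx] : b[idx - VEC_BYTES];
    }
}
```

Because only bytes are moved, the same model applies regardless of whether the vector is viewed as char, short, int, long long or a single 128-bit element; the element type only changes how the 16 bytes are interpreted afterwards.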

Additionally, in the 128-bit debug print statements the expected result and
the result need to be cast to unsigned long long to print correctly.  The
patch makes this additional change to the print statements.
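
The idea behind the cast can be sketched as follows.  printf has no conversion for 128-bit integers, so each 64-bit half has to be narrowed explicitly before it is passed as a variadic argument.  The helper name format_int128 and the mask constant are assumptions made for this illustration and are not copied from the test; the sketch relies on the GCC __int128 extension and on arithmetic right shift of negative values.

```c
#include <stdio.h>

/* Formats a signed 128-bit value as "<high half> <low half>".
   Returns the number of characters written, as snprintf does.  */
int format_int128 (char *buf, size_t len, __int128 value)
{
  /* Narrow each half to 64 bits so it matches the %lld/%llu conversions.  */
  unsigned long long hi = (unsigned long long) (value >> 64);
  unsigned long long lo
    = (unsigned long long) (value & (__int128) 0xFFFFFFFFFFFFFFFFULL);

  return snprintf (buf, len, "%lld %llu", (long long) hi, lo);
}
```

Without the narrowing casts, a full 128-bit argument would be passed where printf expects a 64-bit one, which is undefined behavior and typically prints garbage.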

gcc/ChangeLog:
    * doc/extend.texi: Fix spelling mistake in description of the
    vec_sel built-in.  Add documentation of the 128-bit vec_perm
    instance.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/vsx-builtin-3.c: Add vec_perm test cases for
    arguments of type vector signed long long int, long long bool,
    bool, bool short, bool char and pixel, vector unsigned long long
    int, unsigned int, unsigned short int, unsigned char.  Cast
    arguments for debug prints to unsigned long long.
    * gcc.target/powerpc/builtins-4-int128-runnable.c: Add vec_perm
    test cases for signed and unsigned int128 arguments.
---
 gcc/doc/extend.texi   |  12 +-
 .../powerpc/builtins-4-int128-runnable.c  | 108 +++---
 .../gcc.target/powerpc/vsx-builtin-3.c    |  14 ++-
 3 files changed, 116 insertions(+), 18 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 2d795ba7e59..adc4a54c5fa 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21642,9 +21642,19 @@ vector bool __int128 vec_sel (vector bool __int128,
    vector bool __int128, vector unsigned __int128);
 @end smallexample

-The instance is an extension of the exiting overloaded built-in @code{vec_sel}
+The instance is an extension of the existing overloaded built-in @code{vec_sel}
 that is documented in the PVIPR.

+@smallexample
+vector signed __int128 vec_perm (vector signed __int128,
+   vector signed __int128);
+vector unsigned __int128 vec_perm (vector unsigned __int128,
+   vector unsigned __int128);
+@end smallexample
+
+The instance is an extension of the existing overloaded built-in
+@code{vec_perm} that is documented in the PVIPR.
+
 @node Basic PowerPC Built-in Functions Available on ISA 2.06
 @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.06

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-4-int128-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-4-int128-runnable.c
index 62c11132cf3..c61b0ecb854 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-4-int128-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-4-int128-runnable.c
@@ -18,6 +18,16 @@ int main() {
   __uint128_t data_u128[100];
   __int128_t data_128[100];

+#ifdef __BIG_ENDIAN__
+  vector unsigned char vuc = {0xC, 0xD, 0xE, 0xF, 0x8, 0x9, 0xA, 0xB,
+  0x1C, 0x1D, 0x1E, 0x1F, 0x18, 0x19, 0x1A, 0x1B};
+#else
+  vector unsigned char vuc = {0x4, 0x5, 0x6, 0x7, 0x0, 0x1, 0x2, 0x3,
+  0x14, 0x15, 0x16, 0x17, 0x10, 0x11, 0x12, 0x13};
+#endif
+
+  vector __int128_t vec_128_arg1, vec_128_arg2;
+  vector __uint128_t vec_u128_arg1, vec_u128_arg2;
   vector __int128_t vec_128_expected1, vec_128_result1;
   vector __uint128_t vec_u128_expected1, vec_u128_result1;
   signed long long zero = (signed long long) 0;
@@ -37,11 +47,13 @@ int main() {
 {
 #ifdef DEBUG
 printf("Error: vec_xl(), vec_128_result1[0] = %lld %llu; ",
-   vec_128_result1[0] >> 64,
-   vec_128_result1[0] & (__int128_t)0x);
+   (unsigned long long)(vec_128_result1[0] >> 64),
+   (unsigned long long)(vec_128_result1[0]
+    & (__int128_t)0x));
 printf("vec_128_expected1[0] = %lld %llu\n",
-   vec_128_expected1[0] >> 64,
-   vec_128_expected1[0] & (__int128_t)0x);
+   (unsigned long long)(vec_128_expected1[0] >> 64),
+   (unsigned long long)(vec_128_expected1[0]
+    & (__

[PATCH ver2 0/4] rs6000, remove redundant built-ins and add more test cases

2024-10-01 Thread Carl Love



GCC maintainers:

The following version 2 of a series of patches for PowerPC removes some 
built-ins that are covered by existing overloaded built-ins. 
Additionally, there are patches to add missing testcases and 
documentation.  The original version of the patch series was posted on 
8/7/2024.  It was originally reviewed by Kewen.


The patches have been updated per the review.  Note patches 2 and 3 in 
the series were approved with minor changes.  I will post the entire 
series for review for completeness.


The patch series has been re-tested on Power 10 LE and BE with no 
regressions.


Please let me know if the patches are acceptable for mainline. Thanks.

    Carl


Re: [PATCH 2/4] rs6000, remove built-ins __builtin_vsx_vperm_8hi and __builtin_vsx_vperm_8hi_uns

2024-09-30 Thread Carl Love

GCC maintainers:

Here are my responses to the review comments by Kewen.  Unfortunately, 
Kewen is no longer working on GCC power.


I will submit an updated version of the patch with Kewen's suggested 
changes.


 Carl


On 8/9/24 3:11 AM, Kewen.Lin wrote:

Hi Carl,

on 2024/8/8 01:15, Carl Love wrote:

GCC maintainers:

The following patch removes two redundant built-ins __builtin_vsx_vperm_8hi and 
__builtin_vsx_vperm_8hi_uns.  The built-ins are covered by the overloaded 
vec_perm built-in.

The patch has been tested on Power 10 LE and BE with no regressions.

Please let me know if it is acceptable for mainline.  Thanks.

   Carl

-
rs6000, remove built-ins __builtin_vsx_vperm_8hi and __builtin_vsx_vperm_8hi_uns

The two built-ins __builtin_vsx_vperm_8hi and __builtin_vsx_vperm_8hi_uns
are redundant. They are covered by the overloaded vec_perm built-in.  The
built-ins are not documented and do not have test cases.

OK for trunk, maybe also mention this is a follow up of r15-1923, thanks!


Yes, added:

  The removal of these built-ins was missed in commit gcc r15-1923 on 
7/9/2024.


to the patch description.

 Carl


Re: [PATCH 4/4] rs6000, Add tests and documentation for vector conversions between integer and float

2024-09-30 Thread Carl Love

GCC maintainers:

Here are my responses to the review comments by Kewen.  Unfortunately, 
Kewen is no longer working on GCC power.


I will submit an updated version of the patch with Kewen's suggested 
changes.


 Carl


On 8/20/24 12:54 AM, Kewen.Lin wrote:

Hi Carl,

on 2024/8/8 01:15, Carl Love wrote:


  GCC maintainers:

The following patch fixes errors in the definition of the 
__builtin_vsx_uns_floate_v2di, __builtin_vsx_uns_floato_v2di and 
__builtin_vsx_uns_float2_v2di built-ins.  The arguments should be unsigned but 
are listed as signed.

Additionally, there are a number of test cases that are missing for the various 
instances of the built-ins.  Additionally, the documentation for the various 
built-ins is missing.

This patch adds the missing test cases and documentation.

The patch has been tested on Power 10 LE and BE with no regressions.

Please let me know if it is acceptable for mainline.  Thanks.

     Carl
-
rs6000, Add tests and documentation for vector conversions between integer and 
float

The arguments for the __builtin_vsx_uns_floate_v2di,
__builtin_vsx_uns_floato_v2di and __builtin_vsx_uns_float2_v2di built-ins
should be unsigned.

Add tests for the following existing integer and long long int to float
built-ins:
   __builtin_altivec_float_sisf (vsi);
   __builtin_altivec_uns_float_sisf (vui);
   __builtin_vsx_floate_v2di (vsll);
   __builtin_vsx_uns_floate_v2di (vull);
   __builtin_vsx_floato_v2di (vsll);
   __builtin_vsx_uns_floato_v2di (vull);
   __builtin_vsx_float2_v2di (vsll, vsll);
   __builtin_vsx_uns_float2_v2di (vull, vull);

Add tests for the vector float to vector int built-ins:
   __builtin_altivec_fix_sfsi
   __builtin_altivec_fixuns_sfsi

The various built-ins are not documented.  The patch adds the missing
documentation for the various built-ins.

This patch fixes the incorrect __builtin_vsx_uns_float[o|e|2]_v2di
argument types and adds test cases for each of the built-ins listed above.

gcc/ChangeLog:
     * config/rs6000/rs6000-builtins.def (__builtin_vsx_uns_floate_v2di,
     __builtin_vsx_uns_floato_v2di,__builtin_vsx_uns_float2_v2di): Change
     argument from signed to unsigned.
     * doc/extend.texi: Add documentation for each of the built-ins.

gcc/testsuite/ChangeLog:
     * gcc.target/powerpc/vsx-int-to-float-runnable.c: New file.
---
  gcc/config/rs6000/rs6000-builtins.def |   6 +-
  gcc/doc/extend.texi   |  37 +++
  .../powerpc/vsx-int-to-float-runnable.c   | 260 ++
  3 files changed, 300 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-int-to-float-runnable.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index f2bebd299b2..1227daa1555 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1463,10 +1463,10 @@
    const vd __builtin_vsx_uns_doubleo_v4si (vsi);
  UNS_DOUBLEO_V4SI unsdoubleov4si2 {}

I noticed there are extra four that should be updated together:

const vd __builtin_vsx_uns_doublee_v4si (vsi);
  UNS_DOUBLEE_V4SI unsdoubleev4si2 {}

const vd __builtin_vsx_uns_doubleh_v4si (vsi);
  UNS_DOUBLEH_V4SI unsdoublehv4si2 {}

const vd __builtin_vsx_uns_doublel_v4si (vsi);
  UNS_DOUBLEL_V4SI unsdoublelv4si2 {}

const vd __builtin_vsx_uns_doubleo_v4si (vsi);
  UNS_DOUBLEO_V4SI unsdoubleov4si2 {}


Yes, those definitions are also incorrect.  Fixed.


-  const vf __builtin_vsx_uns_floate_v2di (vsll);
+  const vf __builtin_vsx_uns_floate_v2di (vull);
  UNS_FLOATE_V2DI unsfloatev2di {}

-  const vf __builtin_vsx_uns_floato_v2di (vsll);
+  const vf __builtin_vsx_uns_floato_v2di (vull);
  UNS_FLOATO_V2DI unsfloatov2di {}

    const vsll __builtin_vsx_vsigned_v2df (vd);
@@ -2272,7 +2272,7 @@
    const vss __builtin_vsx_revb_v8hi (vss);
  REVB_V8HI revb_v8hi {}

-  const vf __builtin_vsx_uns_float2_v2di (vsll, vsll);
+  const vf __builtin_vsx_uns_float2_v2di (vull, vull);
  UNS_FLOAT2_V2DI uns_float2_v2di {}

    const vsi __builtin_vsx_vsigned2_v2df (vd, vd);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index bf6f4094040..7ec4f19a6bf 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22919,6 +22919,43 @@ but the index value must be 0.

  Only functions excluded from the PVIPR are listed here.

+The following built-ins convert signed and unsigned vectors of ints and
+long long ints to a vector of 32-bit floating point values.
+
+@smallexample
+vector float __builtin_altivec_float_sisf (vector int);
+vector float __builtin_altivec_uns_float_sisf (vector unsigned int);

These functions are to convert vector {un,}signed int to vector float,
PVIPR has defined "vec_float" for this kind of conversion.  For now,
this function onl

Re: [PATCH 3/4] rs6000, Remove redundant built-in __builtin_vsx_xvcvuxwdp

2024-09-30 Thread Carl Love

GCC maintainers:

Here are my responses to the review comments by Kewen.  Unfortunately, 
Kewen is no longer working on GCC power.


I will submit an updated version of the patch with Kewen's suggested 
changes.


 Carl


On 8/9/24 3:11 AM, Kewen.Lin wrote:

rs6000, Remove redundant built-in __builtin_vsx_xvcvuxwdp

The built-in __builtin_vsx_xvcvuxwdp is a duplicate of the overloaded
built-in vec_doubleo.  There are no test cases or documentation for

I think this wording is wrong, __builtin_vsx_xvcvuxwdp is a bif doing
1-1 map to xvcvuxwdp, but vec_doubleo with vector unsigned int is only
mapped to xvcvuxwdp on LE while it's vec_doublee on BE.  So how about
"... __builtin_vsx_xvcvuxwdp can be covered with PVIPR function
vec_doubleo on LE and vec_doublee on BE...".

OK with this wording tweaked, thanks!

Yes, the mapping is LE/BE dependent.  Updated the description as suggested.

    Carl


Re: [PATCH 1/4] rs6000, add testcases to the overloaded vec_perm built-in

2024-09-30 Thread Carl Love

GCC maintainers:

Here are my responses to the review comments by Kewen.  Unfortunately, 
Kewen is no longer working on GCC power.


I will submit an updated version of the patch with Kewen's suggested 
changes.


 Carl

On 8/9/24 3:11 AM, Kewen.Lin wrote:

Hi Carl,

on 2024/8/8 01:15, Carl Love wrote:

GCC maintainers:

The following patch adds missing test cases for the overloaded vec_perm 
built-in.  It also fixes an issue with printing the 128-bit values in the 
DEBUG section that was noticed when adding the additional test cases.

The patch has been tested on Power 10 LE and BE with no regressions.

Please let me know if it is acceptable for mainline.  Thanks.

   Carl

-

rs6000, add testcases to the overloaded vec_perm built-in

The overloaded vec_perm built-in supports permuting signed and unsigned
vectors of char, bool char, short int, short bool, int, bool,
long long int, long long bool, int128, float and double.  However, not all
of the supported arguments are included in the test cases.  This patch adds
the missing test cases.

Additionally, in the 128-bit debug print statements the expected result and
the result need to be cast to unsigned long long to print correctly.  The
patch makes this additional change to the print statements.

gcc/ChangeLog:
     * doc/extend.texi: Fix spelling mistake in description of the
     vec_sel built-in.
     Add documentation of the 128-bit vec_perm instance.

gcc/testsuite/ChangeLog:
     * gcc.target/powerpc/vsx-builtin-3.c: Add vec_perm test cases    for
     arguments of type vector signed long long int, long long bool,
     bool, bool short, bool char and pixel,
     vector unsigned long long int, unsigned int, unsigned short int,
     unsigned char.
     Cast arguments for debug prints to unsigned long long.
     * gcc.target/powerpc/builtins-4-int128-runnable.c: Add vec_perm
     test cases for signed and unsigned int128 arguments.

Nit: Some changelog lines have unnecessary newlines and spaces.


Fixed.

---
  gcc/doc/extend.texi   |  12 +-
  .../powerpc/builtins-4-int128-runnable.c  | 108 +++---
  .../gcc.target/powerpc/vsx-builtin-3.c    |  18 +++
  3 files changed, 121 insertions(+), 17 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 48b27ff9f39..bf6f4094040 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21553,9 +21553,19 @@ vector bool __int128 vec_sel (vector bool __int128,
     vector bool __int128, vector unsigned __int128);
  @end smallexample

-The instance is an extension of the exiting overloaded built-in @code{vec_sel}
+The instance is an extension of the existing overloaded built-in @code{vec_sel}
  that is documented in the PVIPR.

Good catch!


+@smallexample
+vector signed __int128 vec_perm (vector signed __int128,
+   vector signed __int128);
+vector unsigned __int128 vec_perm (vector unsigned __int128,
+   vector unsigned __int128);
+@end smallexample
+
+The 128-bit integer arguments for the @code{vec_perm} built-in are in addition
+to the instances that are documented in the PVIPR.

Nit: Maybe just copy the above wording for @code{vec_sel} but replaced with
@code{vec_perm} to keep them consistent.


OK, made them consistent.





diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
index 67c93be1469..b3b76be34b9 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
@@ -39,10 +39,17 @@

  #include 

+extern __vector long long int sll[][4];
There is a "extern __vector long long sll[][4]" below.


+extern __vector long long bool bll[][4];
  extern __vector int si[][4];
+extern __vector bool int bi[][4];

Similar, having "... __vector __bool int bi[][4]" below.


  extern __vector short ss[][4];
+extern __vector bool short bs[][4];

Similar, having "... __vector __bool short bs[][4]" below.


  extern __vector signed char sc[][4];
+extern __vector bool char bc[][4];

Ditto.


+extern __vector pixel p[][4];

Similar, having "... __vector __pixel p[][4]" below.


  extern __vector float f[][4];
+extern __vector unsigned long long int ull[][4];

As above, I think we only need "bll" and "ull" here.


Yes, it looks like I didn't notice that they were previously defined.  It
looks like all I really needed to add is the bll.  There is a ull definition
already for __VSX__ which I think needs to be moved so it is always there.


Surprised the compiler didn't complain about multiple definitions.

   Carl


Re: [PATCH] rs6000, Fix test builtins-1-p10-runnable.c

2024-09-19 Thread Carl Love



GCC maintainers:

Please ignore this patch.  Attached the wrong patch to the message.   
Sorry for the noise.


 Carl


On 9/19/24 4:40 PM, Carl Love wrote:

GCC maintainers:

This patch removes an expected value change that was made to verify 
the error checking for the test was working.  Apparently, it didn't 
get removed from the final patch.


The patch fixes the single test error in the builtins-1-p10-runnable.c 
test.


The patch was run on a Power 10.

Please let me know if the patch is acceptable for mainline. Thanks.

 Carl Love

-
rs6000, Fix test builtins-1-p10-runnable.c

The first element of the expected result was apparently changed
for testing purposes.  The change didn't get removed before the
commit.

The issue was introduced in commit:

  commit f1ad419ebfdcfaf26117e069b10bd1b154276049
  Author: Carl Love 
  Date:   Fri Sep 4 19:24:22 2020 -0500

  rs6000, vector integer multiply/divide/modulo instructions

Remove the test input.

gcc/testsuite/ChangeLog:

    * gcc.target/powerpc/builtins-1-p10-runnable.c: Remove
    expected value for testing.  Uncomment correct expected
    result.
---
 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
index 222c8b3a409..5402852f82b 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
@@ -281,8 +281,7 @@ int main()
 /* Signed word multiply high */
 i_arg1 = (vector int){ 2147483648, 2147483648, 2147483648, 2147483648 };
 i_arg2 = (vector int){ 2, 3, 4, 5};
-    //    vec_i_expected = (vector int){-1, -2, -2, -3};
-    vec_i_expected = (vector int){1, -2, -2, -3};
+    vec_i_expected = (vector int){-1, -2, -2, -3};

 vec_i_result = vec_mulh (i_arg1, i_arg2);





[PATCH] rs6000, Fix test builtins-1-p10-runnable.c

2024-09-19 Thread Carl Love

GCC maintainers:

This patch removes an expected value change that was made to verify the 
error checking for the test was working.  Apparently, it didn't get 
removed from the final patch.


The patch fixes the single test error in the builtins-1-p10-runnable.c test.

The patch was run on a Power 10.

Please let me know if the patch is acceptable for mainline.  Thanks.

 Carl Love

-
rs6000, Fix test builtins-1-p10-runnable.c

The first element of the expected result was apparently changed
for testing purposes.  The change didn't get removed before the
commit.

The issue was introduced in commit:

  commit f1ad419ebfdcfaf26117e069b10bd1b154276049
  Author: Carl Love 
  Date:   Fri Sep 4 19:24:22 2020 -0500

  rs6000, vector integer multiply/divide/modulo instructions

Remove the test input.

gcc/testsuite/ChangeLog:

    * gcc.target/powerpc/builtins-1-p10-runnable.c: Remove
    expected value for testing.  Uncomment correct expected
    result.
---
 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
index 222c8b3a409..5402852f82b 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
@@ -281,8 +281,7 @@ int main()
 /* Signed word multiply high */
 i_arg1 = (vector int){ 2147483648, 2147483648, 2147483648, 2147483648 };
 i_arg2 = (vector int){ 2, 3, 4, 5};
-    //    vec_i_expected = (vector int){-1, -2, -2, -3};
-    vec_i_expected = (vector int){1, -2, -2, -3};
+    vec_i_expected = (vector int){-1, -2, -2, -3};

 vec_i_result = vec_mulh (i_arg1, i_arg2);

--
2.46.0
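
For readers who want to check the restored expected value by hand, here is a
minimal scalar sketch of a signed word multiply high (the upper 32 bits of
the 64-bit product).  It is not taken from the test; the names
model_mulh_s32 and model_mulh_check are made up for illustration, and it
assumes the constant 2147483648 ends up as INT32_MIN in each vector int lane.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a signed 32-bit multiply-high: the upper 32 bits of the
   64-bit product.  Floor division by 2^32 is used instead of >> so the
   result does not depend on how the compiler shifts negative values.  */
int32_t model_mulh_s32(int32_t a, int32_t b)
{
    const int64_t two32 = INT64_C(4294967296);
    int64_t product = (int64_t)a * (int64_t)b;
    int64_t high = product / two32;     /* C division truncates toward zero */
    if (product % two32 < 0)
        high -= 1;                      /* adjust to floor for negatives */
    return (int32_t)high;
}

/* Hypothetical helper: asserts that every lane matches the expected vector
   { -1, -2, -2, -3 } restored by the patch.  */
void model_mulh_check(void)
{
    /* Assumption: each lane of i_arg1 holds INT32_MIN.  */
    const int32_t arg1[4] = { INT32_MIN, INT32_MIN, INT32_MIN, INT32_MIN };
    const int32_t arg2[4] = { 2, 3, 4, 5 };
    const int32_t expected[4] = { -1, -2, -2, -3 };

    for (int i = 0; i < 4; i++)
        assert(model_mulh_s32(arg1[i], arg2[i]) == expected[i]);
}
```

Under that assumption the first lane is -2^31 * 2 = -2^32, whose upper word
is -1, so the value 1 in the first element could not have matched.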




[PATCH ver 3] rs6000,extend and document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros

2024-08-22 Thread Carl Love

Gcc maintainers:

Version 3, fixed a few typos per Kewen's review.  Fixed the expected 
number of scan-assembler-times for xvtlsbb and setbc.  Retested on Power 
10 LE.


Version 2, based on discussion, additional overloaded instances of the 
vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins have been 
added.  The additional instances are for arguments of vector signed char 
and vector bool char.  The patch has been tested on Power 10 LE and BE 
with no regressions.


Per a report from a user, the existing vec_test_lsbb_all_ones and 
vec_test_lsbb_all_zeros built-ins are not documented in the GCC 
documentation file.


The following patch adds missing documentation for the 
vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins.


Please let me know if the patch is acceptable for mainline.  Thanks.

  Carl



rs6000,extend and document built-ins vec_test_lsbb_all_ones and 
vec_test_lsbb_all_zeros


The built-ins currently support vector unsigned char arguments. Extend the
built-ins to also support vector signed char and vector bool char
arguments.

Add documentation for the Power 10 built-ins vec_test_lsbb_all_ones
and vec_test_lsbb_all_zeros.  The vec_test_lsbb_all_ones built-in
returns 1 if the least significant bit in each byte is a 1, returns
0 otherwise.  Similarly, vec_test_lsbb_all_zeros returns a 1 if
the least significant bit in each byte is a zero and 0 otherwise.
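
The semantics described above can be sketched in portable scalar C.  This is
only a model of the documented behaviour, not the built-ins themselves; the
names model_lsbb_all_ones, model_lsbb_all_zeros and model_lsbb_check are
hypothetical, and a 16-byte array stands in for the vector argument.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_VEC_BYTES 16   /* a 128-bit vector holds 16 bytes */

/* Returns 1 if bit 0 of every byte is set, 0 otherwise.  */
int model_lsbb_all_ones(const unsigned char bytes[MODEL_VEC_BYTES])
{
    for (size_t i = 0; i < MODEL_VEC_BYTES; i++)
        if ((bytes[i] & 1u) == 0)
            return 0;
    return 1;
}

/* Returns 1 if bit 0 of every byte is clear, 0 otherwise.  */
int model_lsbb_all_zeros(const unsigned char bytes[MODEL_VEC_BYTES])
{
    for (size_t i = 0; i < MODEL_VEC_BYTES; i++)
        if ((bytes[i] & 1u) != 0)
            return 0;
    return 1;
}

/* Hypothetical self-check covering the all-odd, all-even and mixed cases.  */
void model_lsbb_check(void)
{
    unsigned char odd[MODEL_VEC_BYTES];
    unsigned char even[MODEL_VEC_BYTES];
    unsigned char mixed[MODEL_VEC_BYTES];

    for (size_t i = 0; i < MODEL_VEC_BYTES; i++) {
        odd[i] = (unsigned char)(2 * i + 1);   /* least significant bit set */
        even[i] = (unsigned char)(2 * i);      /* least significant bit clear */
        mixed[i] = (unsigned char)i;           /* alternating */
    }

    assert(model_lsbb_all_ones(odd) == 1 && model_lsbb_all_zeros(odd) == 0);
    assert(model_lsbb_all_ones(even) == 0 && model_lsbb_all_zeros(even) == 1);
    /* With mixed input neither condition holds, so both return 0.  */
    assert(model_lsbb_all_ones(mixed) == 0 && model_lsbb_all_zeros(mixed) == 0);
}
```

Only bit 0 of each byte is inspected in this model, so the result is the same
whether the bytes are viewed as signed, unsigned or bool char.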

Add additional test cases for the built-ins in files:
  gcc/testsuite/gcc.target/powerpc/lsbb.c
  gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c

gcc/ChangeLog:
    * config/rs6000/rs6000-overload.def (vec_test_lsbb_all_ones,
    vec_test_lsbb_all_zeros): Add built-in instances for vector signed
    char and vector bool char.
    * doc/extend.texi (vec_test_lsbb_all_ones,
    vec_test_lsbb_all_zeros): Add documentation for the
    existing built-ins.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/lsbb-runnable.c: Add test cases for the vector
    signed char and vector bool char instances of
    vec_test_lsbb_all_zeros and vec_test_lsbb_all_ones built-ins.
    * gcc.target/powerpc/lsbb.c: Add compile test cases for the vector
    signed char and vector bool char instances of
    vec_test_lsbb_all_zeros and vec_test_lsbb_all_ones built-ins.
---
 gcc/config/rs6000/rs6000-overload.def |  12 +-
 gcc/doc/extend.texi   |  19 +++
 .../gcc.target/powerpc/lsbb-runnable.c    | 131 ++
 gcc/testsuite/gcc.target/powerpc/lsbb.c   |  28 +++-
 4 files changed, 158 insertions(+), 32 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index 87495aded49..7d9e31c3f9e 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -4403,12 +4403,20 @@
 XXEVAL  XXEVAL_VUQ

 [VEC_TEST_LSBB_ALL_ONES, vec_test_lsbb_all_ones, __builtin_vec_xvtlsbb_all_ones]
+  signed int __builtin_vec_xvtlsbb_all_ones (vsc);
+    XVTLSBB_ONES LSBB_ALL_ONES_VSC
   signed int __builtin_vec_xvtlsbb_all_ones (vuc);
-    XVTLSBB_ONES
+    XVTLSBB_ONES LSBB_ALL_ONES_VUC
+  signed int __builtin_vec_xvtlsbb_all_ones (vbc);
+    XVTLSBB_ONES LSBB_ALL_ONES_VBC

 [VEC_TEST_LSBB_ALL_ZEROS, vec_test_lsbb_all_zeros, __builtin_vec_xvtlsbb_all_zeros]
+  signed int __builtin_vec_xvtlsbb_all_zeros (vsc);
+    XVTLSBB_ZEROS LSBB_ALL_ZEROS_VSC
   signed int __builtin_vec_xvtlsbb_all_zeros (vuc);
-    XVTLSBB_ZEROS
+    XVTLSBB_ZEROS LSBB_ALL_ZEROS_VUC
+  signed int __builtin_vec_xvtlsbb_all_zeros (vbc);
+    XVTLSBB_ZEROS LSBB_ALL_ZEROS_VBC

 [VEC_TRUNC, vec_trunc, __builtin_vec_trunc]
   vf __builtin_vec_trunc (vf);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 89fe5db7aed..8971d9fbf3c 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -23332,6 +23332,25 @@ signed long long will sign extend the rightmost byte of each doubleword.
 The following additional built-in functions are also available for the
 PowerPC family of processors, starting with ISA 3.1 (@option{-mcpu=power10}):

+@smallexample
+@exdent int vec_test_lsbb_all_ones (vector signed char);
+@exdent int vec_test_lsbb_all_ones (vector unsigned char);
+@exdent int vec_test_lsbb_all_ones (vector bool char);
+@end smallexample
+@findex vec_test_lsbb_all_ones
+
+The builtin @code{vec_test_lsbb_all_ones} returns 1 if the least significant
+bit in each byte is equal to 1.  It returns 0 otherwise.
+
+@smallexample
+@exdent int vec_test_lsbb_all_zeros (vector signed char);
+@exdent int vec_test_lsbb_all_zeros (vector unsigned char);
+@exdent int vec_test_lsbb_all_zeros (vector bool char);
+@end smallexample
+@findex vec_test_lsbb_all_zeros
+
+The builtin @code{vec_test_lsbb_all_zeros} returns 1 if the least significant
+bit in each byte is equal to zero.  It returns 0 otherwise.

 @smallexample
 @exdent vector unsi

Re: [PATCH ver 2] rs6000,extend and document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros

2024-08-22 Thread Carl Love



Kewen:

On 8/20/24 12:56 AM, Kewen.Lin wrote:

Hi Carl,

on 2024/8/9 23:57, Carl Love wrote:

Gcc maintainers:

Version 2, based on discussion, additional overloaded instances of the 
vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins have been added.  
The additional instances are for arguments of vector signed char and vector 
bool char.  The patch has been tested on Power 10 LE and BE with no regressions.

Per a report from a user, the existing vec_test_lsbb_all_ones and 
vec_test_lsbb_all_zeros built-ins are not documented in the GCC documentation 
file.

The following patch adds missing documentation for the vec_test_lsbb_all_ones 
and vec_test_lsbb_all_zeros built-ins.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl

rs6000,extend and document built-ins vec_test_lsbb_all_ones and 
vec_test_lsbb_all_zeros

The built-ins currently support unsigned char arguments.  Extend the

Nit: /unsigned char/vector unsigned char/


Fixed.




built-ins to also support vector signed char and vector bool char aruments.

Nit: /aruments/arguments/


Fixed





index 89fe5db7aed..5ca87889831 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -23332,6 +23332,25 @@ signed long long will sign extend the rightmost byte 
of each doubleword.
  The following additional built-in functions are also available for the
  PowerPC family of processors, starting with ISA 3.1 (@option{-mcpu=power10}):

+@smallexample
+@exdent int vec_test_lsbb_all_ones (vector signed char);
+@exdent int vec_test_lsbb_all_ones (vector unsigned char);
+@exdent int vec_test_lsbb_all_ones (vector bool char);
+@end smallexample
+@findex vec_test_lsbb_all_ones
+
+The builtin @code{vec_test_lsbb_all_ones} returns 1 if the least significant
+bit in each byte is equal to 1.  It returns a 0 otherwise.
Nit: s/a 0/0/


Fixed




+
+@smallexample
+@exdent int vec_test_lsbb_all_zeros (vector signed char);
+@exdent int vec_test_lsbb_all_zeros (vector unsigned char);
+@exdent int vec_test_lsbb_all_zeros (vector bool char);
+@end smallexample
+@findex vec_test_lsbb_all_zeros
+
+The builtin @code{vec_test_lsbb_all_zeros} returns 1 if the least significant
+bit in each byte is equal to zero.  It returns a 0 otherwise.

Nit: s/a 0/0/


Fixed




diff --git a/gcc/testsuite/gcc.target/powerpc/lsbb.c 
b/gcc/testsuite/gcc.target/powerpc/lsbb.c
index b5c037094a5..650e944e082 100644
--- a/gcc/testsuite/gcc.target/powerpc/lsbb.c
+++ b/gcc/testsuite/gcc.target/powerpc/lsbb.c
@@ -9,16 +9,32 @@
  /* { dg-require-effective-target power10_ok } */

Nit: This power10_ok isn't needed, could you also remove it together?


OK, removed.




  /* { dg-options "-fno-inline -mdejagnu-cpu=power10 -O2" } */

... and this "-fno-inline".


Removed



-/* { dg-final { scan-assembler-times {\mxvtlsbb\M} 2 } } */
-/* { dg-final { scan-assembler-times {\msetbc\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvtlsbb\M} 3 } } */
+/* { dg-final { scan-assembler-times {\msetbc\M} 3 } } */

I would expect the times are changed to 6 rather than 3, was this test
case really tested?  Or am I missing something?

BR,
Kewen


I retested and yes it fails.  Should be 6.  Not sure why my original 
testing didn't catch that.  Perhaps I looked at the wrong output file???

Changed to

-/* { dg-final { scan-assembler-times {\mxvtlsbb\M} 2 } } */
-/* { dg-final { scan-assembler-times {\msetbc\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxvtlsbb\M} 6 } } */
+/* { dg-final { scan-assembler-times {\msetbc\M} 6 } } */

and retested.  It now passes.

 Carl



Re: [PATCH ver 2] rs6000,extend and document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros

2024-08-16 Thread Carl Love

Ping.

 Carl

On 8/9/24 8:57 AM, Carl Love wrote:


Gcc maintainers:

Version 2, based on discussion, additional overloaded instances of the 
vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins have been 
added.  The additional instances are for arguments of vector signed 
char and vector bool char.  The patch has been tested on Power 10 LE 
and BE with no regressions.


Per a report from a user, the existing vec_test_lsbb_all_ones and 
vec_test_lsbb_all_zeros built-ins are not documented in the GCC 
documentation file.


The following patch adds missing documentation for the 
vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins.


Please let me know if the patch is acceptable for mainline. Thanks.

  Carl

rs6000,extend and document built-ins vec_test_lsbb_all_ones and 
vec_test_lsbb_all_zeros


The built-ins currently support unsigned char arguments.  Extend the
built-ins to also support vector signed char and vector bool char 
aruments.


Add documentation for the Power 10 built-ins vec_test_lsbb_all_ones
and vec_test_lsbb_all_zeros.  The vec_test_lsbb_all_ones built-in
returns 1 if the least significant bit in each byte is a 1, returns
0 otherwise.  Similarly, vec_test_lsbb_all_zeros returns a 1 if
the least significant bit in each byte is a zero and 0 otherwise.

Add additional test cases for the built-ins in files:
  gcc/testsuite/gcc.target/powerpc/lsbb.c
  gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c

gcc/ChangeLog:
    * config/rs6000/rs6000-overload.def (vec_test_lsbb_all_ones,
    vec_test_lsbb_all_zeros): Add built-in instances for vector signed
    char and vector bool char.
    * doc/extend.texi (vec_test_lsbb_all_ones,
    vec_test_lsbb_all_zeros): Add documentation for the
    existing built-ins.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/lsbb-runnable.c: Add test cases for the vector
    signed char and vector bool char instances of
    vec_test_lsbb_all_zeros and vec_test_lsbb_all_ones built-ins.
    * gcc.target/powerpc/lsbb.c: Add compile test cases for the vector
    signed char and vector bool char instances of
    vec_test_lsbb_all_zeros and vec_test_lsbb_all_ones built-ins.
---
 gcc/config/rs6000/rs6000-overload.def |  12 +-
 gcc/doc/extend.texi   |  19 +++
 .../gcc.target/powerpc/lsbb-runnable.c    | 131 ++
 gcc/testsuite/gcc.target/powerpc/lsbb.c   |  24 +++-
 4 files changed, 156 insertions(+), 30 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index 87495aded49..7d9e31c3f9e 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -4403,12 +4403,20 @@
 XXEVAL  XXEVAL_VUQ

 [VEC_TEST_LSBB_ALL_ONES, vec_test_lsbb_all_ones, __builtin_vec_xvtlsbb_all_ones]
+  signed int __builtin_vec_xvtlsbb_all_ones (vsc);
+    XVTLSBB_ONES LSBB_ALL_ONES_VSC
   signed int __builtin_vec_xvtlsbb_all_ones (vuc);
-    XVTLSBB_ONES
+    XVTLSBB_ONES LSBB_ALL_ONES_VUC
+  signed int __builtin_vec_xvtlsbb_all_ones (vbc);
+    XVTLSBB_ONES LSBB_ALL_ONES_VBC

 [VEC_TEST_LSBB_ALL_ZEROS, vec_test_lsbb_all_zeros, __builtin_vec_xvtlsbb_all_zeros]
+  signed int __builtin_vec_xvtlsbb_all_zeros (vsc);
+    XVTLSBB_ZEROS LSBB_ALL_ZEROS_VSC
   signed int __builtin_vec_xvtlsbb_all_zeros (vuc);
-    XVTLSBB_ZEROS
+    XVTLSBB_ZEROS LSBB_ALL_ZEROS_VUC
+  signed int __builtin_vec_xvtlsbb_all_zeros (vbc);
+    XVTLSBB_ZEROS LSBB_ALL_ZEROS_VBC

 [VEC_TRUNC, vec_trunc, __builtin_vec_trunc]
   vf __builtin_vec_trunc (vf);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 89fe5db7aed..5ca87889831 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -23332,6 +23332,25 @@ signed long long will sign extend the rightmost byte of each doubleword.
 The following additional built-in functions are also available for the
 PowerPC family of processors, starting with ISA 3.1 (@option{-mcpu=power10}):

+@smallexample
+@exdent int vec_test_lsbb_all_ones (vector signed char);
+@exdent int vec_test_lsbb_all_ones (vector unsigned char);
+@exdent int vec_test_lsbb_all_ones (vector bool char);
+@end smallexample
+@findex vec_test_lsbb_all_ones
+
+The builtin @code{vec_test_lsbb_all_ones} returns 1 if the least significant
+bit in each byte is equal to 1.  It returns a 0 otherwise.
+
+@smallexample
+@exdent int vec_test_lsbb_all_zeros (vector signed char);
+@exdent int vec_test_lsbb_all_zeros (vector unsigned char);
+@exdent int vec_test_lsbb_all_zeros (vector bool char);
+@end smallexample
+@findex vec_test_lsbb_all_zeros
+
+The builtin @code{vec_test_lsbb_all_zeros} returns 1 if the least significant
+bit in each byte is equal to zero.  It returns a 0 otherwise.

 @smallexample
 @exdent vector unsigned long long int
diff --git a/gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c 
b/gcc/testsuite

Re: [PATCH 4/4] rs6000, Add tests and documentation for vector, conversions between integer and float

2024-08-16 Thread Carl Love

Kewen:

Ping.

  Carl

On 8/7/24 10:15 AM, Carl Love wrote:



 GCC maintainers:

The following patch fixes errors in the definition of the 
__builtin_vsx_uns_floate_v2di, __builtin_vsx_uns_floato_v2di and 
__builtin_vsx_uns_float2_v2di built-ins.  The arguments should be 
unsigned but are listed as signed.


Additionally, a number of test cases are missing for the various 
instances of the built-ins, and the documentation for the various 
built-ins is missing.


This patch adds the missing test cases and documentation.

The patch has been tested on Power 10 LE and BE with no regressions.

Please let me know if it is acceptable for mainline.  Thanks.

    Carl
- 

rs6000, Add tests and documentation for vector conversions between 
integer and float


The arguments for the __builtin_vsx_uns_floate_v2di,
__builtin_vsx_uns_floato_v2di and __builtin_vsx_uns_float2_v2di built-ins
should be unsigned.

Add tests for the following existing integer and long long int to float
built-ins:
  __builtin_altivec_float_sisf (vsi);
  __builtin_altivec_uns_float_sisf (vui);
  __builtin_vsx_floate_v2di (vsll);
  __builtin_vsx_uns_floate_v2di (vull);
  __builtin_vsx_floato_v2di (vsll);
  __builtin_vsx_uns_floato_v2di (vull);
  __builtin_vsx_float2_v2di (vsll, vsll);
  __builtin_vsx_uns_float2_v2di (vull, vull);

Add tests for the vector float to vector int built-ins:
  __builtin_altivec_fix_sfsi
  __builtin_altivec_fixuns_sfsi

The various built-ins are not documented.  The patch adds the missing
documentation for the various built-ins.

This patch fixes the incorrect __builtin_vsx_uns_float[o|e|2]_v2di
argument types and adds test cases for each of the built-ins listed 
above.


gcc/ChangeLog:
    * config/rs6000/rs6000-builtins.def (__builtin_vsx_uns_floate_v2di,
    __builtin_vsx_uns_floato_v2di,__builtin_vsx_uns_float2_v2di): Change
    argument from signed to unsigned.
    * doc/extend.texi: Add documentation for each of the built-ins.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/vsx-int-to-float-runnable.c: New file.
---
 gcc/config/rs6000/rs6000-builtins.def |   6 +-
 gcc/doc/extend.texi   |  37 +++
 .../powerpc/vsx-int-to-float-runnable.c   | 260 ++
 3 files changed, 300 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-int-to-float-runnable.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index f2bebd299b2..1227daa1555 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1463,10 +1463,10 @@
   const vd __builtin_vsx_uns_doubleo_v4si (vsi);
 UNS_DOUBLEO_V4SI unsdoubleov4si2 {}

-  const vf __builtin_vsx_uns_floate_v2di (vsll);
+  const vf __builtin_vsx_uns_floate_v2di (vull);
 UNS_FLOATE_V2DI unsfloatev2di {}

-  const vf __builtin_vsx_uns_floato_v2di (vsll);
+  const vf __builtin_vsx_uns_floato_v2di (vull);
 UNS_FLOATO_V2DI unsfloatov2di {}

   const vsll __builtin_vsx_vsigned_v2df (vd);
@@ -2272,7 +2272,7 @@
   const vss __builtin_vsx_revb_v8hi (vss);
 REVB_V8HI revb_v8hi {}

-  const vf __builtin_vsx_uns_float2_v2di (vsll, vsll);
+  const vf __builtin_vsx_uns_float2_v2di (vull, vull);
 UNS_FLOAT2_V2DI uns_float2_v2di {}

   const vsi __builtin_vsx_vsigned2_v2df (vd, vd);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index bf6f4094040..7ec4f19a6bf 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22919,6 +22919,43 @@ but the index value must be 0.

 Only functions excluded from the PVIPR are listed here.

+The following built-ins convert signed and unsigned vectors of ints and
+long long ints to a vector of 32-bit floating point values.
+
+@smallexample
+vector float __builtin_altivec_float_sisf (vector int);
+vector float __builtin_altivec_uns_float_sisf (vector unsigned int);
+vector float __builtin_vsx_floate_v2di (vector signed long long int);
+vector float __builtin_vsx_uns_floate_v2di (vector unsigned long long int);
+vector float __builtin_vsx_floato_v2di (vector signed long long int);
+vector float __builtin_vsx_uns_floato_v2di (vector unsigned long long int);
+vector float __builtin_vsx_float2_v2di (vector signed long long int,
+    vector signed long long int);
+vector float __builtin_vsx_uns_float2_v2di (vector unsigned long long int,
+    vector unsigned long long int);
+@end smallexample
+
+The @code{__builtin_altivec_float_sisf} and
+@code{__builtin_altivec_uns_float_sisf} built-ins convert signed and
+unsigned vectors of 32-bit integers to a vector of 32-bit floating point
+values.  The @code{__builtin_vsx_floate_v2di} and
+@code{__builtin_vsx_uns_floate_v2di} built-ins convert a vector of
+long long ints to 32-bit floating point values

[PATCH ver 2] rs6000,extend and document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros

2024-08-09 Thread Carl Love



Gcc maintainers:

Version 2, based on discussion, additional overloaded instances of the 
vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins have been 
added.  The additional instances are for arguments of vector signed char 
and vector bool char.  The patch has been tested on Power 10 LE and BE 
with no regressions.


Per a report from a user, the existing vec_test_lsbb_all_ones and 
vec_test_lsbb_all_zeros built-ins are not documented in the GCC 
documentation file.


The following patch adds missing documentation for the 
vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins.


Please let me know if the patch is acceptable for mainline.  Thanks.

  Carl

rs6000,extend and document built-ins vec_test_lsbb_all_ones and 
vec_test_lsbb_all_zeros


The built-ins currently support unsigned char arguments.  Extend the
built-ins to also support vector signed char and vector bool char aruments.

Add documentation for the Power 10 built-ins vec_test_lsbb_all_ones
and vec_test_lsbb_all_zeros.  The vec_test_lsbb_all_ones built-in
returns 1 if the least significant bit in each byte is a 1, returns
0 otherwise.  Similarly, vec_test_lsbb_all_zeros returns a 1 if
the least significant bit in each byte is a zero and 0 otherwise.

Add additional test cases for the built-ins in files:
  gcc/testsuite/gcc.target/powerpc/lsbb.c
  gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c

gcc/ChangeLog:
    * config/rs6000/rs6000-overload.def (vec_test_lsbb_all_ones,
    vec_test_lsbb_all_zeros): Add built-in instances for vector signed
    char and vector bool char.
    * doc/extend.texi (vec_test_lsbb_all_ones,
    vec_test_lsbb_all_zeros): Add documentation for the
    existing built-ins.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/lsbb-runnable.c: Add test cases for the vector
    signed char and vector bool char instances of
    vec_test_lsbb_all_zeros and vec_test_lsbb_all_ones built-ins.
    * gcc.target/powerpc/lsbb.c: Add compile test cases for the vector
    signed char and vector bool char instances of
    vec_test_lsbb_all_zeros and vec_test_lsbb_all_ones built-ins.
---
 gcc/config/rs6000/rs6000-overload.def |  12 +-
 gcc/doc/extend.texi   |  19 +++
 .../gcc.target/powerpc/lsbb-runnable.c    | 131 ++
 gcc/testsuite/gcc.target/powerpc/lsbb.c   |  24 +++-
 4 files changed, 156 insertions(+), 30 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index 87495aded49..7d9e31c3f9e 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -4403,12 +4403,20 @@
 XXEVAL  XXEVAL_VUQ

 [VEC_TEST_LSBB_ALL_ONES, vec_test_lsbb_all_ones, __builtin_vec_xvtlsbb_all_ones]
+  signed int __builtin_vec_xvtlsbb_all_ones (vsc);
+    XVTLSBB_ONES LSBB_ALL_ONES_VSC
   signed int __builtin_vec_xvtlsbb_all_ones (vuc);
-    XVTLSBB_ONES
+    XVTLSBB_ONES LSBB_ALL_ONES_VUC
+  signed int __builtin_vec_xvtlsbb_all_ones (vbc);
+    XVTLSBB_ONES LSBB_ALL_ONES_VBC

 [VEC_TEST_LSBB_ALL_ZEROS, vec_test_lsbb_all_zeros, __builtin_vec_xvtlsbb_all_zeros]
+  signed int __builtin_vec_xvtlsbb_all_zeros (vsc);
+    XVTLSBB_ZEROS LSBB_ALL_ZEROS_VSC
   signed int __builtin_vec_xvtlsbb_all_zeros (vuc);
-    XVTLSBB_ZEROS
+    XVTLSBB_ZEROS LSBB_ALL_ZEROS_VUC
+  signed int __builtin_vec_xvtlsbb_all_zeros (vbc);
+    XVTLSBB_ZEROS LSBB_ALL_ZEROS_VBC

 [VEC_TRUNC, vec_trunc, __builtin_vec_trunc]
   vf __builtin_vec_trunc (vf);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 89fe5db7aed..5ca87889831 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -23332,6 +23332,25 @@ signed long long will sign extend the rightmost byte of each doubleword.
 The following additional built-in functions are also available for the
 PowerPC family of processors, starting with ISA 3.1 (@option{-mcpu=power10}):

+@smallexample
+@exdent int vec_test_lsbb_all_ones (vector signed char);
+@exdent int vec_test_lsbb_all_ones (vector unsigned char);
+@exdent int vec_test_lsbb_all_ones (vector bool char);
+@end smallexample
+@findex vec_test_lsbb_all_ones
+
+The builtin @code{vec_test_lsbb_all_ones} returns 1 if the least significant
+bit in each byte is equal to 1.  It returns a 0 otherwise.
+
+@smallexample
+@exdent int vec_test_lsbb_all_zeros (vector signed char);
+@exdent int vec_test_lsbb_all_zeros (vector unsigned char);
+@exdent int vec_test_lsbb_all_zeros (vector bool char);
+@end smallexample
+@findex vec_test_lsbb_all_zeros
+
+The builtin @code{vec_test_lsbb_all_zeros} returns 1 if the least significant
+bit in each byte is equal to zero.  It returns a 0 otherwise.

 @smallexample
 @exdent vector unsigned long long int
diff --git a/gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c b/gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c
index 2e97cc17b60..3e4f71bed12 100644
-

Re: [PATCH] rs6000, document built-ins vec_test_lsbb_all_ones and, vec_test_lsbb_all_zeros

2024-08-07 Thread Carl Love



Steve, Peter, Kewen:

Yes, it does look like supporting signed and unsigned char would be 
consistent with the vec_cmpeq built-in.  I have played around with 
adding both and it looks to be reasonable.


Per your second response:

On 8/7/24 9:25 AM, Steven Munroe wrote:



Actually for consistency should include vector bool/signed/unsigned char.


I haven't looked at including vector bool but at first glance don't see 
any reason why that couldn't be supported as well.


I will look at adding the additional vector signed char and vector 
bool instances to both of the built-ins as well as adding documentation 
for all of the supported instances.


  Carl


On 8/7/24 9:24 AM, Steven Munroe wrote:


I would compare this case to vec_cmpeq.

It supports both vector unsigned/signed char operands but generates 
the same instruction for either.


So it would seem more consistent with the history of altivec.h to 
support both unsigned/signed char for

vec_test_lsbb_all_ones, vec_test_lsbb_all_zeros

And this might make life a little easier for users.


On Tue, Aug 6, 2024 at 10:12 AM Carl Love  wrote:

Steve:

Agreed the documentation only specifies unsigned char argument for the
two built-ins.

Do you think we should add support for signed char arguments in addition
to the documented unsigned char arguments?

Do you see any situations where a user might want to have both signed
and unsigned char arguments for the two built-ins?

Thanks.

 Carl

On 8/5/24 2:12 PM, Steven Munroe wrote:
> Looking at the latest version of the Power Vector Intrinsic
> Programming Reference (Revision 2.0.0_prd, Bill slipped this to me for
> review), I see that
>
>     vec_test_lsbb_all_ones
>     vec_test_lsbb_all_zeros
>
> both specify vector unsigned char, only.
>
> On Mon, Aug 5, 2024 at 1:15 AM Kewen.Lin wrote:
>
>     on 2024/8/3 05:48, Peter Bergner wrote:
>     > On 7/31/24 10:21 PM, Kewen.Lin wrote:
>     >> on 2024/8/1 01:52, Carl Love wrote:
>     >>> Yes, I noticed that the built-ins were defined as overloaded
>     >>> but only had one definition.   Did seem odd to me.
>     >>>
>     >>>> either is with "vector unsigned char" as argument type, but
>     >>>> the corresponding instance
>     >>>> prototype in builtin table is with "vector signed char".
>     >>>> It's inconsistent and weird,
>     >>>> I think we can just update the prototype in builtin table
>     >>>> with "vector unsigned char"
>     >>>> and remove the entries in overload table.  It can be a follow
>     >>>> up patch.
>     >>>
>     >>> I didn't notice that it was signed in the instance prototype
>     >>> but unsigned in the overloaded definition. That is definitely
>     >>> inconsistent.
>     >>>
>     >>> That said, should we just go ahead and support both signed and
>     >>> unsigned argument versions of the all ones and all zeros built-ins?
>     >>
>     >> Good question, I thought about that but found openxl only
>     >> supports the unsigned version
>     >> so I felt it's probably better to keep consistent with it.  But
>     >> I'm fine for either, if
>     >> we decide to extend it to cover both signed and unsigned, we
>     >> should notify openxl team
>     >> to extend it as well.
>     >>
>     >> openxl doc links:
>     >>
>     >> https://www.ibm.com/docs/en/openxl-c-and-cpp-aix/17.1.2?topic=functions-vec-test-lsbb-all-ones
>     >>
>     >> https://www.ibm.com/docs/en/openxl-c-and-cpp-aix/17.1.2?topic

[PATCH 4/4] rs6000, Add tests and documentation for vector, conversions between integer and float

2024-08-07 Thread Carl Love




 GCC maintainers:

The following patch fixes errors in the definition of the 
__builtin_vsx_uns_floate_v2di, __builtin_vsx_uns_floato_v2di and 
__builtin_vsx_uns_float2_v2di built-ins.  The arguments should be 
unsigned but are listed as signed.
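
As a general illustration of why the declared signedness of a 64-bit lane
matters for an integer to float conversion, here is a small scalar sketch in
plain C.  It does not call or model the built-ins, and it says nothing about
how GCC behaved with the old prototypes; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert one 64-bit lane to float, reading it as a signed value.  */
float model_float_from_s64(int64_t lane)
{
    return (float)lane;
}

/* Convert one 64-bit lane to float, reading it as an unsigned value.  */
float model_float_from_u64(uint64_t lane)
{
    return (float)lane;
}

/* Hypothetical self-check: the same bit pattern converts to very different
   floats depending on whether it is treated as signed or unsigned.  */
void model_signedness_check(void)
{
    int64_t as_signed = -1;                      /* all 64 bits set */
    uint64_t as_unsigned = (uint64_t)as_signed;  /* same bits, 2^64 - 1 */

    assert(model_float_from_s64(as_signed) == -1.0f);
    assert(model_float_from_u64(as_unsigned) > 1.0e19f);

    /* Values that fit in the non-negative signed range convert the same.  */
    assert(model_float_from_s64(5) == model_float_from_u64(5));
}
```

The two readings only diverge once the top bit of the lane is set.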


Additionally, a number of test cases are missing for the various 
instances of the built-ins, and the documentation for the various 
built-ins is missing.


This patch adds the missing test cases and documentation.

The patch has been tested on Power 10 LE and BE with no regressions.

Please let me know if it is acceptable for mainline.  Thanks.

    Carl
-
rs6000, Add tests and documentation for vector conversions between 
integer and float


The arguments for the __builtin_vsx_uns_floate_v2di,
__builtin_vsx_uns_floato_v2di and __builtin_vsx_uns_float2_v2di built-ins
should be unsigned.

Add tests for the following existing integer and long long int to float
built-ins:
  __builtin_altivec_float_sisf (vsi);
  __builtin_altivec_uns_float_sisf (vui);
  __builtin_vsx_floate_v2di (vsll);
  __builtin_vsx_uns_floate_v2di (vull);
  __builtin_vsx_floato_v2di (vsll);
  __builtin_vsx_uns_floato_v2di (vull);
  __builtin_vsx_float2_v2di (vsll, vsll);
  __builtin_vsx_uns_float2_v2di (vull, vull);

Add tests for the vector float to vector int built-ins:
  __builtin_altivec_fix_sfsi
  __builtin_altivec_fixuns_sfsi

The various built-ins are not documented.  The patch adds the missing
documentation for the various built-ins.

This patch fixes the incorrect __builtin_vsx_uns_float[o|e|2]_v2di
argument types and adds test cases for each of the built-ins listed above.

gcc/ChangeLog:
    * config/rs6000/rs6000-builtins.def (__builtin_vsx_uns_floate_v2di,
    __builtin_vsx_uns_floato_v2di,__builtin_vsx_uns_float2_v2di): Change
    argument from signed to unsigned.
    * doc/extend.texi: Add documentation for each of the built-ins.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/vsx-int-to-float-runnable.c: New file.
---
 gcc/config/rs6000/rs6000-builtins.def |   6 +-
 gcc/doc/extend.texi   |  37 +++
 .../powerpc/vsx-int-to-float-runnable.c   | 260 ++
 3 files changed, 300 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-int-to-float-runnable.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index f2bebd299b2..1227daa1555 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1463,10 +1463,10 @@
   const vd __builtin_vsx_uns_doubleo_v4si (vsi);
 UNS_DOUBLEO_V4SI unsdoubleov4si2 {}

-  const vf __builtin_vsx_uns_floate_v2di (vsll);
+  const vf __builtin_vsx_uns_floate_v2di (vull);
 UNS_FLOATE_V2DI unsfloatev2di {}

-  const vf __builtin_vsx_uns_floato_v2di (vsll);
+  const vf __builtin_vsx_uns_floato_v2di (vull);
 UNS_FLOATO_V2DI unsfloatov2di {}

   const vsll __builtin_vsx_vsigned_v2df (vd);
@@ -2272,7 +2272,7 @@
   const vss __builtin_vsx_revb_v8hi (vss);
 REVB_V8HI revb_v8hi {}

-  const vf __builtin_vsx_uns_float2_v2di (vsll, vsll);
+  const vf __builtin_vsx_uns_float2_v2di (vull, vull);
 UNS_FLOAT2_V2DI uns_float2_v2di {}

   const vsi __builtin_vsx_vsigned2_v2df (vd, vd);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index bf6f4094040..7ec4f19a6bf 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22919,6 +22919,43 @@ but the index value must be 0.

 Only functions excluded from the PVIPR are listed here.

+The following built-ins convert signed and unsigned vectors of ints and
+long long ints to a vector of 32-bit floating point values.
+
+@smallexample
+vector float __builtin_altivec_float_sisf (vector int);
+vector float __builtin_altivec_uns_float_sisf (vector unsigned int);
+vector float __builtin_vsx_floate_v2di (vector signed long long int);
+vector float __builtin_vsx_uns_floate_v2di (vector unsigned long long int);
+vector float __builtin_vsx_floato_v2di (vector signed long long int);
+vector float __builtin_vsx_uns_floato_v2di (vector unsigned long long int);
+vector float __builtin_vsx_float2_v2di (vector signed long long int,
+    vector signed long long int);
+vector float __builtin_vsx_uns_float2_v2di (vector unsigned long long int,
+    vector unsigned long long int);
+@end smallexample
+
+The @code{__builtin_altivec_float_sisf} and
+@code{__builtin_altivec_uns_float_sisf} built-ins convert signed and
+unsigned vectors of 32-bit integers to a vector of 32-bit floating point
+values.  The @code{__builtin_vsx_floate_v2di} and
+@code{__builtin_vsx_uns_floate_v2di} built-ins convert a vector of
+long long ints to 32-bit floating point values, storing the results in
+the corresponding even vector element locations.  Similarly,

[PATCH 3/4] rs6000, Remove redundant built-in __builtin_vsx_xvcvuxwdp

2024-08-07 Thread Carl Love



GCC maintainers:

The patch removes the built-in __builtin_vsx_xvcvuxwdp as it is covered
by the overloaded vec_doubleo built-in.


The patch has been tested on Power 10 LE and BE with no regressions.

Please let me know if it is acceptable for mainline.  Thanks.

  Carl


rs6000, Remove redundant built-in __builtin_vsx_xvcvuxwdp

The built-in __builtin_vsx_xvcvuxwdp is a duplicate of the overloaded
built-in vec_doubleo.  There are no test cases or documentation for
__builtin_vsx_xvcvuxwdp.  This patch removes the redundant built-in.
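
For readers unfamiliar with vec_doubleo, the following is a rough scalar
sketch of the conversion the removed built-in duplicated.  It assumes the
odd-numbered words (elements 1 and 3 in element order) are the ones
converted.  The helper name model_doubleo_u32 is hypothetical and is not a
GCC interface.

```c
#include <stdint.h>

/* Hypothetical scalar model (not a GCC interface): convert the odd-numbered
   unsigned 32-bit elements of a 4-element vector to double.  */
static void model_doubleo_u32(const uint32_t in[4], double out[2])
{
  out[0] = (double) in[1];
  out[1] = (double) in[3];
}
```

Every 32-bit unsigned value is exactly representable as a double, so the
model needs no rounding considerations.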

gcc/ChangeLog:
    * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvuxwdp):
    Remove built-in definition.
---
 gcc/config/rs6000/rs6000-builtins.def | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 8bb7686bcc8..f2bebd299b2 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1613,9 +1613,6 @@
   const vf __builtin_vsx_xvcvuxdsp (vull);
 XVCVUXDSP vsx_xvcvuxdsp {}

-  const vd __builtin_vsx_xvcvuxwdp (vsi);
-    XVCVUXWDP vsx_xvcvuxwdp {}
-
   const vf __builtin_vsx_xvcvuxwsp (vsi);
 XVCVUXWSP vsx_floatunsv4siv4sf2 {}

--
2.45.2




[PATCH 2/4] rs6000, remove built-ins __builtin_vsx_vperm_8hi and, __builtin_vsx_vperm_8hi_uns

2024-08-07 Thread Carl Love



GCC maintainers:

The following patch removes two redundant built-ins 
__builtin_vsx_vperm_8hi and __builtin_vsx_vperm_8hi_uns.  The built-ins 
are covered by the overloaded vec_perm built-in.


The patch has been tested on Power 10 LE and BE with no regressions.

Please let me know if it is acceptable for mainline.  Thanks.

  Carl

-
rs6000, remove built-ins __builtin_vsx_vperm_8hi and __builtin_vsx_vperm_8hi_uns


The two built-ins __builtin_vsx_vperm_8hi and __builtin_vsx_vperm_8hi_uns
are redundant.  They are covered by the overloaded vec_perm built-in.  The
built-ins are not documented and do not have test cases.

This patch removes the redundant built-ins.

gcc/ChangeLog:
    * config/rs6000/rs6000-builtins.def (__builtin_vsx_vperm_8hi,
    __builtin_vsx_vperm_8hi_uns): Remove built-in definitions.
---
 gcc/config/rs6000/rs6000-builtins.def | 6 --
 1 file changed, 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 0c3c884c110..8bb7686bcc8 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1469,12 +1469,6 @@
   const vf __builtin_vsx_uns_floato_v2di (vsll);
 UNS_FLOATO_V2DI unsfloatov2di {}

-  const vss __builtin_vsx_vperm_8hi (vss, vss, vuc);
-    VPERM_8HI_X altivec_vperm_v8hi {}
-
-  const vus __builtin_vsx_vperm_8hi_uns (vus, vus, vuc);
-    VPERM_8HI_UNS_X altivec_vperm_v8hi_uns {}
-
   const vsll __builtin_vsx_vsigned_v2df (vd);
 VEC_VSIGNED_V2DF vsx_xvcvdpsxds {}

--
2.45.2




[PATCH 1/4] rs6000, add testcases to the overloaded vec_perm built-in

2024-08-07 Thread Carl Love



GCC maintainers:

The following patch adds missing test cases for the overloaded vec_perm
built-in.  It also fixes an issue with printing the 128-bit values in
the DEBUG section that was noticed when adding the additional test cases.


The patch has been tested on Power 10 LE and BE with no regressions.

Please let me know if it is acceptable for mainline.  Thanks.

  Carl

-

rs6000, add testcases to the overloaded vec_perm built-in

The overloaded vec_perm built-in supports permuting signed and unsigned
vectors of char, bool char, short int, short bool, int, bool,
long long int, long long bool, int128, float and double.  However, not all
of the supported arguments are included in the test cases.  This patch adds
the missing test cases.
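
As a reminder of what the new test cases exercise, here is a minimal scalar
sketch of the byte selection vec_perm performs, written in terms of element
order.  The function name model_vec_perm is hypothetical; it is a model for
illustration only, not the test code from the patch.

```c
#include <stdint.h>

/* Hypothetical scalar model of vec_perm on byte elements (not a GCC
   interface).  Each control byte selects one byte from the 32-byte
   concatenation of a and b; only its low five bits are used.  */
static void model_vec_perm(const uint8_t a[16], const uint8_t b[16],
                           const uint8_t ctl[16], uint8_t result[16])
{
  for (int i = 0; i < 16; i++)
    {
      unsigned sel = ctl[i] & 0x1Fu;
      result[i] = (sel < 16) ? a[sel] : b[sel - 16];
    }
}
```

Selector values 0x00 to 0x0F pick bytes from the first argument and 0x10 to
0x1F pick bytes from the second, which is why the control vectors in the
test mix both ranges.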

Additionally, in the 128-bit debug print statements the expected result and
the result need to be cast to unsigned long long to print correctly.  The
patch makes this additional change to the print statements.
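
The cast matters because printf has no conversion for 128-bit integers, so
each half must be narrowed to a 64-bit type that matches the format
specifier.  A minimal sketch, assuming a target that supports __int128; the
helper name format_int128_halves and the 64-bit mask constant are
illustrative assumptions, not taken from the test:

```c
#include <stdio.h>

/* Hypothetical helper: write the high and low 64-bit halves of a 128-bit
   value into buf as hexadecimal.  Each half is cast to unsigned long long
   so that it matches the %llx conversion.  */
static int format_int128_halves(char *buf, size_t len, __int128 value)
{
  unsigned long long high = (unsigned long long) (value >> 64);
  unsigned long long low
    = (unsigned long long) (value & (__int128) 0xFFFFFFFFFFFFFFFFULL);
  return snprintf(buf, len, "%llx %llx", high, low);
}
```

Passing the uncast 128-bit expressions to a variadic function with a 64-bit
format specifier is undefined behavior, which is the likely cause of the
incorrect debug output.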

gcc/ChangeLog:
    * doc/extend.texi: Fix spelling mistake in description of the
    vec_sel built-in.
    Add documentation of the 128-bit vec_perm instance.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/vsx-builtin-3.c: Add vec_perm test cases for
    arguments of type vector signed long long int, long long bool,
    bool, bool short, bool char and pixel,
    vector unsigned long long int, unsigned int, unsigned short int,
    unsigned char.
    Cast arguments for debug prints to unsigned long long.
    * gcc.target/powerpc/builtins-4-int128-runnable.c: Add vec_perm
    test cases for signed and unsigned int128 arguments.
---
 gcc/doc/extend.texi   |  12 +-
 .../powerpc/builtins-4-int128-runnable.c  | 108 +++---
 .../gcc.target/powerpc/vsx-builtin-3.c    |  18 +++
 3 files changed, 121 insertions(+), 17 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 48b27ff9f39..bf6f4094040 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21553,9 +21553,19 @@ vector bool __int128 vec_sel (vector bool __int128,
    vector bool __int128, vector unsigned __int128);
 @end smallexample

-The instance is an extension of the exiting overloaded built-in @code{vec_sel}
+The instance is an extension of the existing overloaded built-in @code{vec_sel}
 that is documented in the PVIPR.

+@smallexample
+vector signed __int128 vec_perm (vector signed __int128,
+   vector signed __int128);
+vector unsigned __int128 vec_perm (vector unsigned __int128,
+   vector unsigned __int128);
+@end smallexample
+
+The 128-bit integer arguments for the @code{vec_perm} built-in are in addition
+to the instances that are documented in the PVIPR.
+
 @node Basic PowerPC Built-in Functions Available on ISA 2.06
 @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.06

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-4-int128-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-4-int128-runnable.c
index 62c11132cf3..c61b0ecb854 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-4-int128-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-4-int128-runnable.c
@@ -18,6 +18,16 @@ int main() {
   __uint128_t data_u128[100];
   __int128_t data_128[100];

+#ifdef __BIG_ENDIAN__
+  vector unsigned char vuc = {0xC, 0xD, 0xE, 0xF, 0x8, 0x9, 0xA, 0xB,
+              0x1C, 0x1D, 0x1E, 0x1F, 0x18, 0x19, 0x1A, 0x1B};
+#else
+  vector unsigned char vuc = {0x4, 0x5, 0x6, 0x7, 0x0, 0x1, 0x2, 0x3,
+              0x14, 0x15, 0x16, 0x17, 0x10, 0x11, 0x12, 0x13};
+#endif
+
+  vector __int128_t vec_128_arg1, vec_128_arg2;
+  vector __uint128_t vec_u128_arg1, vec_u128_arg2;
   vector __int128_t vec_128_expected1, vec_128_result1;
   vector __uint128_t vec_u128_expected1, vec_u128_result1;
   signed long long zero = (signed long long) 0;
@@ -37,11 +47,13 @@ int main() {
 {
 #ifdef DEBUG
 printf("Error: vec_xl(), vec_128_result1[0] = %lld %llu; ",
-       vec_128_result1[0] >> 64,
-       vec_128_result1[0] & (__int128_t)0x);
+       (unsigned long long)(vec_128_result1[0] >> 64),
+       (unsigned long long)(vec_128_result1[0]
+                    & (__int128_t)0x));
 printf("vec_128_expected1[0] = %lld %llu\n",
-       vec_128_expected1[0] >> 64,
-       vec_128_expected1[0] & (__int128_t)0x);
+       (unsigned long long)(vec_128_expected1[0] >> 64),
+       (unsigned long long)(vec_128_expected1[0]
+                    & (__int128_t)0x));
 #else
 abort ();
 #endif
@@ -53,11 +65,13 @@ int main() {
 {
 #ifdef DEBUG
 printf("Error: vec_xl(), vec_u128_result1[0] = %lld; ",
-       vec_u128_result1[0] >> 64,
-       vec_u128_result1[0] & (__int128_t)0x);
+       (unsigned long long)(vec_u128_result1[0] >> 64),
+      

[PATCH 0/4] rs6000, remove redundant built-ins and add more test cases

2024-08-07 Thread Carl Love



GCC maintainers:

The following series of patches for PowerPC removes some built-ins that 
are covered by existing overloaded built-ins.  Additionally, there are 
patches to add missing testcases and documentation.


The patch series has been tested on Power 10 LE and BE with no regressions.

Please let me know if the patches are acceptable for mainline. Thanks.

    Carl



Re: [PATCH] rs6000, document built-ins vec_test_lsbb_all_ones and, vec_test_lsbb_all_zeros

2024-08-06 Thread Carl Love

Steve:

Agreed the documentation only specifies unsigned char argument for the 
two built-ins.


Do you think we should add support for signed char arguments in addition to
the documented unsigned char arguments?


Do you see any situations where a user might want to have both signed
and unsigned char arguments for the two built-ins?


Thanks.

    Carl

On 8/5/24 2:12 PM, Steven Munroe wrote:
Looking at the latest version of the Power Vector Intrinsic
Programming Reference (Revision 2.0.0_prd, Bill slipped this to me for
review), I see that vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros
both specify vector unsigned char, only.

On Mon, Aug 5, 2024 at 1:15 AM Kewen.Lin  wrote:

on 2024/8/3 05:48, Peter Bergner wrote:
> On 7/31/24 10:21 PM, Kewen.Lin wrote:
>> on 2024/8/1 01:52, Carl Love wrote:
>>> Yes, I noticed that the built-ins were defined as overloaded
>>> but only had one definition.   Did seem odd to me.
>>>
>>>> either is with "vector unsigned char" as argument type, but
>>>> the corresponding instance
>>>> prototype in builtin table is with "vector signed char".
>>>> It's inconsistent and weird,
>>>> I think we can just update the prototype in builtin table
>>>> with "vector unsigned char"
>>>> and remove the entries in overload table.  It can be a follow
>>>> up patch.
>>>
>>> I didn't notice that it was signed in the instance prototype
>>> but unsigned in the overloaded definition. That is definitely
>>> inconsistent.
>>>
>>> That said, should we just go ahead and support both signed and
>>> unsigned argument versions of the all ones and all zeros built-ins?
>>
>> Good question, I thought about that but found openxl only
>> supports the unsigned version
>> so I felt it's probably better to keep consistent with it.  But
>> I'm fine for either, if
>> we decide to extend it to cover both signed and unsigned, we
>> should notify openxl team
>> to extend it as well.
>>
>> openxl doc links:
>>
>> https://www.ibm.com/docs/en/openxl-c-and-cpp-aix/17.1.2?topic=functions-vec-test-lsbb-all-ones
>> https://www.ibm.com/docs/en/openxl-c-and-cpp-aix/17.1.2?topic=functions-vec-test-lsbb-all-zeros
>
> If it makes sense to support vector signed char rather than only
> the vector unsigned char,
> then I'm fine adding support for it.  It almost seems since we
> tried adding an overload
> for it, that that was our intention (to support both signed and
> unsigned) and we just
> had a bug so only unsigned was supported?

Good question but I'm not sure, it could be an oversight without
adding one more instance
for overloading, or adopting some useless code (only for
overloading) for a single instance.
I found it's introduced by r11-2437-gcf5d0fc2d1adcd, CC'ed Will as
he contributed this.

BR,
Kewen

>
> CC'ing Steve since he noticed the missing documentation when we
> was trying to
> use the built-ins.  Steve, do you see a need to also support
> vector signed char
> with these built-ins?
>
> Peter
>
>





Re: [PATCH ver 3] rs6000, Add new overloaded vector shift builtin int128, variants

2024-08-05 Thread Carl Love

Kewen:

On 8/4/24 11:13 PM, Kewen.Lin wrote:

Hi Carl,

on 2024/8/2 03:35, Carl Love wrote:

GCC developers:

Version 3, updated the testcase dg-do link to dg-do compile.  Moved the new 
documentation again.  Retested on Power 10 LE and BE to verify the dg arguments 
disable the test on Power10BE but enable the test for Power10LE.  Reran the 
full regression testsuite.   There were no new regressions for the testsuite.

Version 2, updated rs6000-overload.def to remove adding additional internal
names and to change XXSLDWI_Q to XXSLDWI_1TI per comments from Kewen.  Moved the
new documentation statement for the PVIPR built-ins per comments from Kewen.
Updated dg-do-run directive and added comment about the save-temps in testcase
per feedback from Segher.  Retested the patch on Power 10 with no regressions.

The following patch adds the int128 variants to the existing overloaded
built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb, vec_srl,
vec_sro.  These variants were requested by Steve Munroe.

The patch has been tested on a Power 10 system with no regressions.

OK with the below nits tweaked and tested well on both BE and LE, thanks!


Fixed the various nits, see specifics below.

Rebased on current mainline, rebuilt and retested on LE and BE with no 
regression failures.


Will commit in a day or so assuming I don't receive any additional comments.




Please let me know if the patch is acceptable for mainline.

    Carl

--
rs6000, Add new overloaded vector shift builtin int128 variants

Add the signed __int128 and unsigned __int128 argument types for the
overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
vec_srdb, vec_srl, vec_sro.  For each of the new argument types add a
testcase and update the documentation for the built-in.
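
To illustrate what the int128 variant of vec_sld computes, here is a
value-level sketch that treats each vector as a single 128-bit integer.  It
assumes a target with __int128 support and that the shift count is a byte
count in the range 0 to 15.  The names u128 and model_sld_u128 are
hypothetical and are not part of the patch.

```c
/* Hypothetical value-level model of vec_sld on 128-bit elements (not a GCC
   interface).  Requires a target that supports __int128.  */
typedef unsigned __int128 u128;

/* Concatenate a (high) and b (low), shift left by nbytes bytes, and keep
   the high 128 bits of the result.  */
static u128 model_sld_u128(u128 a, u128 b, unsigned nbytes)
{
  unsigned bits = (nbytes & 0xFu) * 8u;
  if (bits == 0)
    return a;                   /* avoid an undefined shift by 128 bits */
  return (a << bits) | (b >> (128u - bits));
}
```

For example, shifting by one byte moves the top byte of the second argument
into the bottom byte of the result.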

gcc/ChangeLog:
     * config/rs6000/altivec.md (vsdb_): Change
     define_insn iterator to VEC_IC.
     * config/rs6000/rs6000-builtins.def (__builtin_altivec_vsldoi_v1ti,
     __builtin_vsx_xxsldwi_v1ti, __builtin_altivec_vsldb_v1ti,
     __builtin_altivec_vsrdb_v1ti): New builtin definitions.
     * config/rs6000/rs6000-overload.def (vec_sld, vec_sldb, vec_sldw,
     vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro): New overloaded
     definitions.
     * doc/extend.texi (vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
     vec_srdb, vec_srl, vec_sro): Add documentation for new overloaded
     built-ins.

gcc/testsuite/ChangeLog:
     * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test file.
---
  gcc/config/rs6000/altivec.md  |   6 +-
  gcc/config/rs6000/rs6000-builtins.def |  12 +
  gcc/config/rs6000/rs6000-overload.def |  40 ++
  gcc/doc/extend.texi   |  43 ++
  .../vec-shift-double-runnable-int128.c    | 419 ++
  5 files changed, 517 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 5af9bf920a2..2a18ee44526 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
  (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])

  (define_insn "vsdb_"
- [(set (match_operand:VI2 0 "register_operand" "=v")
-  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
-       (match_operand:VI2 2 "register_operand" "v")
+ [(set (match_operand:VEC_IC 0 "register_operand" "=v")
+  (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
+       (match_operand:VEC_IC 2 "register_operand" "v")
     (match_operand:QI 3 "const_0_to_12_operand" "n")]
    VSHIFT_DBL_LR))]
    "TARGET_POWER10"
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 77eb0f7e406..a2b2b729270 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -964,6 +964,9 @@
    const vss __builtin_altivec_vsldoi_8hi (vss, vss, const int<4>);
  VSLDOI_8HI altivec_vsldoi_v8hi {}

+  const vsq __builtin_altivec_vsldoi_v1ti (vsq, vsq, const int<4>);
+    VSLDOI_V1TI altivec_vsldoi_v1ti {}

Nit: s/VSLDOI_V1TI/VSLDOI_1TI/ to align with the other VSLDOI_* (no 'V' in *).


Changed to:

  const vsq __builtin_altivec_vsldoi_v1ti (vsq, vsq, const int<4>);
    VSLDOI_1TI altivec_vsldoi_v1ti {}






+
    const vss __builtin_altivec_vslh (vss, vus);
  VSLH vashlv8hi3 {}

@@ -1831,6 +1834,9 @@
    const vsll __builtin_vsx_xxsldwi_2di (vsll, vsll, const int<2>);
  XXSLDWI_2DI vsx_xxsldwi_v2di {}

+  const vsq __builtin_vsx_xxsldw

[PATCH ver 3] rs6000, Add new overloaded vector shift builtin int128, variants

2024-08-01 Thread Carl Love

GCC developers:

Version 3, updated the testcase dg-do link to dg-do compile.  Moved the 
new documentation again.  Retested on Power 10 LE and BE to verify the 
dg arguments disable the test on Power10BE but enable the test for 
Power10LE.  Reran the full regression testsuite.   There were no new 
regressions for the testsuite.


Version 2, updated rs6000-overload.def to remove adding additional
internal names and to change XXSLDWI_Q to XXSLDWI_1TI per comments from
Kewen.  Moved the new documentation statement for the PVIPR built-ins per
comments from Kewen.  Updated dg-do-run directive and added comment
about the save-temps in testcase per feedback from Segher.  Retested
the patch on Power 10 with no regressions.


The following patch adds the int128 variants to the existing overloaded
built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb,
vec_srl, vec_sro.  These variants were requested by Steve Munroe.


The patch has been tested on a Power 10 system with no regressions.

Please let me know if the patch is acceptable for mainline.

   Carl

--
rs6000, Add new overloaded vector shift builtin int128 variants

Add the signed __int128 and unsigned __int128 argument types for the
overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
vec_srdb, vec_srl, vec_sro.  For each of the new argument types add a
testcase and update the documentation for the built-in.
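
The double-width bit shifts vec_sldb and vec_srdb can be sketched the same
way, treating each vector as one 128-bit integer.  This is a rough model,
assuming a target with __int128 support and a bit count in the range 0 to 7.
The names q128, model_sldb and model_srdb are hypothetical.

```c
/* Hypothetical value-level models (not GCC interfaces).  Requires a target
   that supports __int128.  */
typedef unsigned __int128 q128;

/* Shift the 256-bit concatenation a:b left by nbits and keep the high
   128 bits.  */
static q128 model_sldb(q128 a, q128 b, unsigned nbits)
{
  nbits &= 7u;
  if (nbits == 0)
    return a;                   /* avoid an undefined shift by 128 bits */
  return (a << nbits) | (b >> (128u - nbits));
}

/* Shift the 256-bit concatenation a:b right by nbits and keep the low
   128 bits.  */
static q128 model_srdb(q128 a, q128 b, unsigned nbits)
{
  nbits &= 7u;
  if (nbits == 0)
    return b;                   /* avoid an undefined shift by 128 bits */
  return (b >> nbits) | (a << (128u - nbits));
}
```

The left shift pulls bits in from the top of the second argument, while the
right shift pulls bits in from the bottom of the first.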

gcc/ChangeLog:
    * config/rs6000/altivec.md (vsdb_): Change
    define_insn iterator to VEC_IC.
    * config/rs6000/rs6000-builtins.def (__builtin_altivec_vsldoi_v1ti,
    __builtin_vsx_xxsldwi_v1ti, __builtin_altivec_vsldb_v1ti,
    __builtin_altivec_vsrdb_v1ti): New builtin definitions.
    * config/rs6000/rs6000-overload.def (vec_sld, vec_sldb, vec_sldw,
    vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro): New overloaded
    definitions.
    * doc/extend.texi (vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
    vec_srdb, vec_srl, vec_sro): Add documentation for new overloaded
    built-ins.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test file.
---
 gcc/config/rs6000/altivec.md  |   6 +-
 gcc/config/rs6000/rs6000-builtins.def |  12 +
 gcc/config/rs6000/rs6000-overload.def |  40 ++
 gcc/doc/extend.texi   |  43 ++
 .../vec-shift-double-runnable-int128.c    | 419 ++
 5 files changed, 517 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c


diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 5af9bf920a2..2a18ee44526 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
 (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])

 (define_insn "vsdb_"
- [(set (match_operand:VI2 0 "register_operand" "=v")
-  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
-       (match_operand:VI2 2 "register_operand" "v")
+ [(set (match_operand:VEC_IC 0 "register_operand" "=v")
+  (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
+       (match_operand:VEC_IC 2 "register_operand" "v")
    (match_operand:QI 3 "const_0_to_12_operand" "n")]
   VSHIFT_DBL_LR))]
   "TARGET_POWER10"
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 77eb0f7e406..a2b2b729270 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -964,6 +964,9 @@
   const vss __builtin_altivec_vsldoi_8hi (vss, vss, const int<4>);
 VSLDOI_8HI altivec_vsldoi_v8hi {}

+  const vsq __builtin_altivec_vsldoi_v1ti (vsq, vsq, const int<4>);
+    VSLDOI_V1TI altivec_vsldoi_v1ti {}
+
   const vss __builtin_altivec_vslh (vss, vus);
 VSLH vashlv8hi3 {}

@@ -1831,6 +1834,9 @@
   const vsll __builtin_vsx_xxsldwi_2di (vsll, vsll, const int<2>);
 XXSLDWI_2DI vsx_xxsldwi_v2di {}

+  const vsq __builtin_vsx_xxsldwi_v1ti (vsq, vsq, const int<2>);
+    XXSLDWI_1TI vsx_xxsldwi_v1ti {}
+
   const vf __builtin_vsx_xxsldwi_4sf (vf, vf, const int<2>);
 XXSLDWI_4SF vsx_xxsldwi_v4sf {}

@@ -3299,6 +3305,9 @@
   const vss __builtin_altivec_vsldb_v8hi (vss, vss, const int<3>);
 VSLDB_V8HI vsldb_v8hi {}

+  const vsq __builtin_altivec_vsldb_v1ti (vsq, vsq, const int<3>);
+    VSLDB_V1TI vsldb_v1ti {}
+
   const vsq __builtin_altivec_vslq (vsq, vuq);
 VSLQ vashlv1ti3 {}

@@ -3317,6 +3326,9 @@
   const vss __builtin_altivec_vsrdb_v8hi (vss, vss, const int<3>);
 VSRDB_V8HI vsrdb_v8hi {}

+  const vsq __builtin_altivec_vsrdb_v1ti (vsq, vsq, const int<3>);
+    VSRDB_V1TI vsrdb_v1ti {}
+
   const vsq __builtin_altivec_vsrq (vsq, vuq);
 VSRQ vlshrv1ti3 {}

diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.d

Re: [PATCH ver 2] rs6000, Add new overloaded vector shift builtin int128, varients

2024-07-31 Thread Carl Love



Kewen:

On 7/29/24 3:21 AM, Kewen.Lin wrote:

+@smallexample
+@exdent vector signed __int128 vec_sld (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_sld (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
+@exdent vector signed __int128 vec_sldw (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_sldw (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
+@exdent vector signed __int128 vec_slo (vector signed __int128,
+vector signed char);
+@exdent vector signed __int128 vec_slo (vector signed __int128,
+vector unsigned char);
+@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
+vector signed char);
+@exdent vector unsigned __int128 vec_slo (vector unsigned __int128,
+vector unsigned char);
+@exdent vector signed __int128 vec_sro (vector signed __int128,
+vector signed char);
+@exdent vector signed __int128 vec_sro (vector signed __int128,
+vector unsigned char);
+@exdent vector unsigned __int128 vec_sro (vector unsigned __int128,
+vector signed char);
+@exdent vector unsigned __int128 vec_sro (vector unsigned __int128,
+vector unsigned char);
+@exdent vector signed __int128 vec_srl (vector signed __int128,
+vector unsigned char);
+@exdent vector unsigned __int128 vec_srl (vector unsigned __int128,
+vector unsigned char);
+@end smallexample
+
+The above instances are extensions of the existing overloaded built-ins
+@code{vec_sld}, @code{vec_sldw}, @code{vec_slo}, @code{vec_sro}, @code{vec_srl}
+that are documented in the PVIPR.
+
  @findex vec_srdb

Nit: The above new @smallexample section and its associated description should
be placed after this @findex vec_srdb (otherwise it breaks the connection
between the index and the content of vec_srdb),

Yes, my bad.  I didn't notice I got the findex vec_srdb in the wrong place.


  but personally I preferred it to be placed at
the end of this node, that is: after
"int vec_any_le (vector unsigned __int128, vector unsigned __int128);
@end smallexample
" as what's in your previous version, since most of these beginning entries have
their headings but this @smallexample section doesn't have a heading, it looks a
bit weird.


OK, perhaps I didn't understand where you wanted it in the previous 
email.  I moved it.  Hopefully I have it correct this time.



  Vector Splat
diff --git 
a/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c 
b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
new file mode 100644
index 000..65e8e94ec07
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
@@ -0,0 +1,358 @@
+/* { dg-do run  { target power10_hw } } */
+/* { dg-do link { target { ! power10_hw } } } */
+/* { dg-require-effective-target power10_ok } */

As Peter pointed out in another thread, you need int128 effective target check
as well, otherwise it will fail with power10 -m32.

Another nit: power10_hw should already guarantee power10_ok, so power10_ok
is only required for dg-do link.


Changed to:

+/* { dg-do run  { target power10_hw } } */
+/* { dg-do compile  { target { ! power10_hw } } } */
+/* { dg-require-effective-target int128 } */

per the discussion/feedback from Kewen and Peter.

 Carl


Re: [PATCH] rs6000, document built-ins vec_test_lsbb_all_ones and, vec_test_lsbb_all_zeros

2024-07-31 Thread Carl Love

Kewen:

On 7/31/24 2:12 AM, Kewen.Lin wrote:

Hi Carl,

on 2024/7/27 06:56, Carl Love wrote:

GCC maintainers:

Per a report from a user, the existing vec_test_lsbb_all_ones and
vec_test_lsbb_all_zeros built-ins are not documented in the GCC documentation
file.

The following patch adds missing documentation for the vec_test_lsbb_all_ones
and vec_test_lsbb_all_zeros built-ins.

Please let me know if the patch is acceptable for mainline.  Thanks.

     Carl

---
rs6000, document built-ins vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros

Add documentation for the Power 10 built-ins vec_test_lsbb_all_ones
and vec_test_lsbb_all_zeros.  The vec_test_lsbb_all_ones built-in
returns 1 if the least significant bit in each byte is a 1, returns
0 otherwise.  Similarly, vec_test_lsbb_all_zeros returns a 1 if
the least significant bit in each byte is a zero and 0 otherwise.
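
The behaviour described above can be sketched in scalar C as follows.  This
is only a model of the documented semantics over 16 bytes; the function
names model_lsbb_all_ones and model_lsbb_all_zeros are hypothetical and are
not the built-ins themselves.

```c
#include <stdint.h>

/* Hypothetical scalar model: return 1 if the least significant bit of
   every byte is 1, and 0 otherwise.  */
static int model_lsbb_all_ones(const uint8_t bytes[16])
{
  for (int i = 0; i < 16; i++)
    if ((bytes[i] & 1u) == 0)
      return 0;
  return 1;
}

/* Hypothetical scalar model: return 1 if the least significant bit of
   every byte is 0, and 0 otherwise.  */
static int model_lsbb_all_zeros(const uint8_t bytes[16])
{
  for (int i = 0; i < 16; i++)
    if ((bytes[i] & 1u) != 0)
      return 0;
  return 1;
}
```

Note that both functions return 0 for a vector whose bytes have mixed least
significant bits, so the two results are not complements of each other.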

The test cases for the built-ins are in files:
   gcc/testsuite/gcc.target/powerpc/lsbb.c
   gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c


gcc/ChangeLog:
     * doc/extend.texi (vec_test_lsbb_all_ones,
     vec_test_lsbb_all_zeros): Add documentation for the
     existing built-ins.
---
  gcc/doc/extend.texi | 15 +++
  1 file changed, 15 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 83ff168faf6..96e41c9a905 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -23240,6 +23240,21 @@ signed long long will sign extend the rightmost byte of each doubleword.
  The following additional built-in functions are also available for the
  PowerPC family of processors, starting with ISA 3.1 (@option{-mcpu=power10}):

+@smallexample
+@exdent int vec_test_lsbb_all_ones (vector char);

I think we need to specify "unsigned" char explicitly since we don't actually
allow vector "signed" char as the below testing shows:

int foo11 (vector signed char va)
{
   return vec_test_lsbb_all_ones (va);
}

:17:3: error: invalid parameter combination for AltiVec intrinsic '__builtin_vec_xvtlsbb_all_ones'
17 |   return vec_test_lsbb_all_ones (va);


Now we make these two bifs as overload, but there is only one instance
respectively,

Yes, I noticed that the built-ins were defined as overloaded but only
had one definition.   Did seem odd to me.



either is with "vector unsigned char" as argument type, but the corresponding
instance prototype in builtin table is with "vector signed char".  It's
inconsistent and weird, I think we can just update the prototype in builtin
table with "vector unsigned char" and remove the entries in overload table.
It can be a follow up patch.


I didn't notice that it was signed in the instance prototype but 
unsigned in the overloaded definition.  That is definitely inconsistent.


That said, should we just go ahead and support both signed and unsigned 
argument versions of the all ones and all zeros built-ins?


For example

[VEC_TEST_LSBB_ALL_ONES, vec_test_lsbb_all_ones, __builtin_vec_xvtlsbb_all_ones]

  signed int __builtin_vec_xvtlsbb_all_ones (vsc);
    XVTLSBB_ONES   LSBB_ALL_ONES_VSC
  signed int __builtin_vec_xvtlsbb_all_ones (vuc);
    XVTLSBB_ONES   LSBB_ALL_ONES_VUC

I tried this with the testcase I borrowed from you and extended:

int foo11 (vector char va)  /* compiles fine */
{
  return vec_test_lsbb_all_ones (va);
}

/* Currently fails to compile without the change to the overloaded def;
   compiles with the additional overloaded definition.  */
int sfoo11 (vector signed char sva)
{
  return vec_test_lsbb_all_ones (sva);
}

int ufoo11 (vector unsigned char uva)  /* compiles fine */
{
  return vec_test_lsbb_all_ones (uva);
}

I did a quick test to see that the testcase does compile.  We would need
to add testcases to lsbb.c and lsbb-runnable.c and then update the
documentation to say both are supported.

Thoughts on expanding the scope of the patch from just documentation to 
adding additional overloaded cases and updating the documentation?





+@end smallexample
+@findex vec_test_lsbb_all_ones
+
+The builtin @code{vec_test_lsbb_all_ones} returns 1 if the least significant
+bit in each byte is a one.  It returns a zero otherwise.

May be better to use the wording "equal to 1" referred from ISA and "returns 0"
matches the preceding "returns 1", like:

“... in each byte is equal to 1.  It returns 0 otherwise.”


Changed.

+
+@smallexample
+@exdent int vec_test_lsbb_all_zeros (vector char);
+@end smallexample
+@findex vec_test_lsbb_all_zeros
+
+The builtin @code{vec_test_lsbb_all_zeros} returns 1 if the least significant
+bit in each byte is a zero.  It returns a zero otherwise.

Likewise, "... in each byte is equal to 0.  It returns 0 otherwise."


Changed.

  Carl



Re: [PATCH ver 2] rs6000, Add new overloaded vector shift builtin int128, varients

2024-07-30 Thread Carl Love



Peter, Kewen:

Per Peter's request, I did the following testing on ltcd97-lp7 which is 
a Power 10 running in BE mode.


On 7/29/24 8:47 AM, Peter Bergner wrote:

Maybe the following will work?

+/* { dg-do run  { target power10_hw } } */
+/* { dg-do link { target { ! power10_hw } } } */
+/* { dg-require-effective-target int128 } */
...

Carl, can you try testing the above change on ltcd97-lp7 and run the test
in both 32-bit and 64-bit modes?

I tested with the above specification and -m64 and I get

    # of expected passes    8

I tested the above specification with -m32


/home/carll/GCC/gcc-steve/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c:390:346: warning: overflow in conversion from 'long long int' to 'int' changes value from '8526495043095935640' to '-19088744' [-Woverflow]

/home/carll/GCC/gcc-steve/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c:394:60: error: '__int128' is not supported on this target

FAIL: gcc.target/powerpc/vec-shift-double-runnable-int128.c (test for excess errors)
gcc.target/powerpc/vec-shift-double-runnable-int128.c: \\mvsrdbi\\M found 0 times
FAIL: gcc.target/powerpc/vec-shift-double-runnable-int128.c scan-assembler-times \\mvsrdbi\\M 2
gcc.target/powerpc/vec-shift-double-runnable-int128.c: \\mvsldbi\\M found 0 times
FAIL: gcc.target/powerpc/vec-shift-double-runnable-int128.c scan-assembler-times \\mvsldbi\\M 2
gcc.target/powerpc/vec-shift-double-runnable-int128.c: \\mvsl\\M found 0 times
FAIL: gcc.target/powerpc/vec-shift-double-runnable-int128.c scan-assembler-times \\mvsl\\M 2
gcc.target/powerpc/vec-shift-double-runnable-int128.c: \\mvsr\\M found 0 times
FAIL: gcc.target/powerpc/vec-shift-double-runnable-int128.c scan-assembler-times \\mvsr\\M 2
gcc.target/powerpc/vec-shift-double-runnable-int128.c: \\mvslo\\M found 0 times
FAIL: gcc.target/powerpc/vec-shift-double-runnable-int128.c scan-assembler-times \\mvslo\\M 4
gcc.target/powerpc/vec-shift-double-runnable-int128.c: \\mvsro\\M found 0 times
FAIL: gcc.target/powerpc/vec-shift-double-runnable-int128.c scan-assembler-times \\mvsro\\M 4

# of unexpected failures    7

Basically, the header is not detecting the int128.

But if I put the int128 in the dg-do run line, like vsx-builtin-20d.c:

/* { dg-do run  { target { power10_hw }  && { int128 } } } */
/* { dg-do link { target { ! power10_hw } } } */
/* { dg-require-effective-target vsx_hw  } */

I get the following with -m32:

# of unsupported tests  1


Per the comments from Kewen:

On 7/29/24 7:27 PM, Kewen.Lin wrote:

Maybe the following will work?

+/* { dg-do run  { target power10_hw } } */
+/* { dg-do link { target { ! power10_hw } } } */

Maybe we can replace link by compile here, as we care about compilation and
execution result more here.  (IMHO if it's still "link", power10_ok is useful
to stop this being tested on an environment with an assembler not supporting
power10).

BR,
Kewen



I tried the following with -m32 (I hope I got it right):

/* { dg-do run  { target power10_hw } } */
/* { dg-do compile  { target { ! power10_hw } } } */
/* { dg-require-effective-target int128 } */

This gives:

# of unsupported tests  1

The same header with -m64 I get:

# of expected passes    8

This header seems to give us what we want on Power10 BE with -m32 and 
-m64 (tested on ltcd97-lp7).
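
For reference, the directives discussed in this thread could be assembled into a test header along the following lines.  This is only a sketch pieced together from the messages in this thread, not the committed test file.  The scan-assembler-times pattern and count are taken from the failing -m32 log above; the exact spelling of the dg-final line and the comments are assumptions.

```c
/* Run on Power10 hardware; on anything else only compile.  */
/* { dg-do run { target power10_hw } } */
/* { dg-do compile { target { ! power10_hw } } } */
/* Report UNSUPPORTED where __int128 is not available, e.g. with -m32.  */
/* { dg-require-effective-target int128 } */
/* Need -save-temps for dg-final scan-assembler-times at end of test.  */
/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */

/* ... test body elided ... */

/* Assumed form of one of the instruction-count checks.  */
/* { dg-final { scan-assembler-times {\mvsldbi\M} 2 } } */
```

With this layout the int128 requirement gates the whole file, which matches the "unsupported" result seen with -m32 and the 8 expected passes with -m64.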


 Carl




Re: [PATCH] rs6000, add comment to VEC_IC definition

2024-07-29 Thread Carl Love

Kewen:

On 7/29/24 3:21 AM, Kewen.Lin wrote:

index 0d3e0a24e11..75d95ccfb47 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -26,7 +26,8 @@
  ;; Vector int modes
  (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])

-;; Vector int modes for comparison, shift and rotation
+;; Vector int modes for comparison, shift and rotation.  ISA 3.1 adds the V1TI mode
+;; for the int128 type.

Maybe s/int128/vector int128/, OK with/without this nit tweaked, thanks!

OK, made the change and committed the patch.

Thanks.

   Carl


[PATCH] rs6000, add comment to VEC_IC definition

2024-07-26 Thread Carl Love

GCC maintainers:

This patch adds a comment to the VEC_IC definition to clarify the V1TI 
"TARGET_POWER10" mode per the request by Segher in the feedback to patch:

https://gcc.gnu.org/pipermail/gcc-patches/2024-July/658156.html

Please let me know if this patch is acceptable for mainline.

Thanks.

  Carl

rs6000, add comment to VEC_IC definition

This patch adds a comment to the VEC_IC definition to clarify
the V1TI "TARGET_POWER10" mode that was added.

gcc/ChangeLog:
    * config/rs6000/vector.md: Add comment for the VEC_IC
    define_mode_iterator.
---
 gcc/config/rs6000/vector.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 0d3e0a24e11..75d95ccfb47 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -26,7 +26,8 @@
 ;; Vector int modes
 (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])

-;; Vector int modes for comparison, shift and rotation
+;; Vector int modes for comparison, shift and rotation.  ISA 3.1 adds the V1TI mode
+;; for the int128 type.
 (define_mode_iterator VEC_IC [V16QI V8HI V4SI V2DI (V1TI "TARGET_POWER10")])


 ;; 128-bit int modes
--
2.45.2




[PATCH] rs6000, document built-ins vec_test_lsbb_all_ones and, vec_test_lsbb_all_zeros

2024-07-26 Thread Carl Love

GCC maintainers:

Per a report from a user, the existing vec_test_lsbb_all_ones and 
vec_test_lsbb_all_zeros built-ins are not documented in the GCC 
documentation file.


The following patch adds missing documentation for the 
vec_test_lsbb_all_ones and vec_test_lsbb_all_zeros built-ins.


Please let me know if the patch is acceptable for mainline.  Thanks.

    Carl

---
rs6000, document built-ins vec_test_lsbb_all_ones and 
vec_test_lsbb_all_zeros


Add documentation for the Power 10 built-ins vec_test_lsbb_all_ones
and vec_test_lsbb_all_zeros.  The vec_test_lsbb_all_ones built-in
returns 1 if the least significant bit in each byte is a 1, returns
0 otherwise.  Similarly, vec_test_lsbb_all_zeros returns a 1 if
the least significant bit in each byte is a zero and 0 otherwise.
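
As an illustration of the semantics described above, here is a scalar reference model in portable C.  It is a sketch only and does not use the Power 10 built-ins: the function names model_lsbb_all_ones and model_lsbb_all_zeros are hypothetical, and the 16-byte array stands in for the vector char argument.

```c
#include <stddef.h>

/* Hypothetical scalar model, not the GCC built-in: returns 1 if the least
   significant bit of each of the 16 bytes is 1, and 0 otherwise.  */
int
model_lsbb_all_ones (const unsigned char bytes[16])
{
  for (size_t i = 0; i < 16; i++)
    if ((bytes[i] & 1) == 0)
      return 0;
  return 1;
}

/* Hypothetical scalar model, not the GCC built-in: returns 1 if the least
   significant bit of each of the 16 bytes is 0, and 0 otherwise.  */
int
model_lsbb_all_zeros (const unsigned char bytes[16])
{
  for (size_t i = 0; i < 16; i++)
    if ((bytes[i] & 1) != 0)
      return 0;
  return 1;
}
```

Note that for an input with mixed low bits both models return 0, so the two functions are not complements of each other.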

The test cases for the built-ins are in files:
  gcc/testsuite/gcc.target/powerpc/lsbb.c
  gcc/testsuite/gcc.target/powerpc/lsbb-runnable.c


gcc/ChangeLog:
    * doc/extend.texi (vec_test_lsbb_all_ones,
    vec_test_lsbb_all_zeros): Add documentation for the
    existing built-ins.
---
 gcc/doc/extend.texi | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 83ff168faf6..96e41c9a905 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -23240,6 +23240,21 @@ signed long long will sign extend the rightmost byte of each doubleword.

 The following additional built-in functions are also available for the
 PowerPC family of processors, starting with ISA 3.1 (@option{-mcpu=power10}):


+@smallexample
+@exdent int vec_test_lsbb_all_ones (vector char);
+@end smallexample
+@findex vec_test_lsbb_all_ones
+
+The builtin @code{vec_test_lsbb_all_ones} returns 1 if the least significant
+bit in each byte is a one.  It returns a zero otherwise.
+
+@smallexample
+@exdent int vec_test_lsbb_all_zeros (vector char);
+@end smallexample
+@findex vec_test_lsbb_all_zeros
+
+The builtin @code{vec_test_lsbb_all_zeros} returns 1 if the least significant
+bit in each byte is a zero.  It returns a zero otherwise.

 @smallexample
 @exdent vector unsigned long long int
--
2.45.2




[PATCH ver 2] rs6000, Add new overloaded vector shift builtin int128, varients

2024-07-26 Thread Carl Love

GCC developers:

Version 2: updated rs6000-overload.def to drop the additional 
internal names and to change XXSLDWI_Q to XXSLDWI_1TI per comments from 
Kewen.  Moved the new documentation statement for the PIVPR built-ins per 
comments from Kewen.  Updated the dg-do run directive and added a comment 
about the -save-temps in the testcase per feedback from Segher.  Retested 
the patch on Power 10 with no regressions.


The following patch adds the int128 variants to the existing overloaded 
built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb, 
vec_srl, vec_sro.  These variants were requested by Steve Munroe.
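
To make the intended behavior of two of these variants concrete, here is a scalar sketch of what vec_sldb and vec_srdb compute for a single 128-bit element.  It assumes the documented semantics (shift the concatenation of the two inputs by the low-order three bits of the third argument and keep 128 bits) and a compiler that provides unsigned __int128, which is why it would not build with -m32.  The names u128, model_sldb and model_srdb are hypothetical and this is not the built-in implementation.

```c
/* Requires a target with __int128 support (e.g. a 64-bit GCC target).  */
typedef unsigned __int128 u128;

/* Hypothetical model of vec_sldb: treat a:b as a 256-bit value (a is the
   high half), shift left by the low-order three bits of n, and return the
   leftmost 128 bits.  */
u128
model_sldb (u128 a, u128 b, unsigned int n)
{
  unsigned int sh = n & 7;
  if (sh == 0)
    return a;			/* Avoid an undefined shift by 128.  */
  return (a << sh) | (b >> (128 - sh));
}

/* Hypothetical model of vec_srdb: treat a:b as a 256-bit value, shift right
   by the low-order three bits of n, and return the rightmost 128 bits.  */
u128
model_srdb (u128 a, u128 b, unsigned int n)
{
  unsigned int sh = n & 7;
  if (sh == 0)
    return b;			/* Avoid an undefined shift by 128.  */
  return (a << (128 - sh)) | (b >> sh);
}
```

The explicit check for a zero shift is needed only in the C model, since shifting a 128-bit value by 128 is undefined in C.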


The patch has been tested on a Power 10 system with no regressions.

Please let me know if the patch is acceptable for mainline.

   Carl


---
rs6000, Add new overloaded vector shift builtin int128 varients

Add the signed __int128 and unsigned __int128 argument types for the
overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
vec_srdb, vec_srl, vec_sro.  For each of the new argument types add a
testcase and update the documentation for the built-in.

gcc/ChangeLog:
    * config/rs6000/altivec.md (vsdb_): Change
    define_insn iterator to VEC_IC.
    * config/rs6000/rs6000-builtins.def (__builtin_altivec_vsldoi_v1ti,
    __builtin_vsx_xxsldwi_v1ti, __builtin_altivec_vsldb_v1ti,
    __builtin_altivec_vsrdb_v1ti): New builtin definitions.
    * config/rs6000/rs6000-overload.def (vec_sld, vec_sldb, vec_sldw,
    vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro): New overloaded
    definitions.
    * doc/extend.texi (vec_sld, vec_sldb, vec_sldw,    vec_sll, vec_slo,
    vec_srdb, vec_srl, vec_sro): Add documentation for new overloaded
    built-ins.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test file.
---
 gcc/config/rs6000/altivec.md  |   6 +-
 gcc/config/rs6000/rs6000-builtins.def |  12 +
 gcc/config/rs6000/rs6000-overload.def |  40 ++
 gcc/doc/extend.texi   |  43 +++
 .../vec-shift-double-runnable-int128.c    | 358 ++
 5 files changed, 456 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c


diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 5af9bf920a2..2a18ee44526 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
 (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])

 (define_insn "vsdb_"
- [(set (match_operand:VI2 0 "register_operand" "=v")
-  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
-       (match_operand:VI2 2 "register_operand" "v")
+ [(set (match_operand:VEC_IC 0 "register_operand" "=v")
+  (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
+       (match_operand:VEC_IC 2 "register_operand" "v")
    (match_operand:QI 3 "const_0_to_12_operand" "n")]
   VSHIFT_DBL_LR))]
   "TARGET_POWER10"
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 77eb0f7e406..a2b2b729270 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -964,6 +964,9 @@
   const vss __builtin_altivec_vsldoi_8hi (vss, vss, const int<4>);
 VSLDOI_8HI altivec_vsldoi_v8hi {}

+  const vsq __builtin_altivec_vsldoi_v1ti (vsq, vsq, const int<4>);
+    VSLDOI_V1TI altivec_vsldoi_v1ti {}
+
   const vss __builtin_altivec_vslh (vss, vus);
 VSLH vashlv8hi3 {}

@@ -1831,6 +1834,9 @@
   const vsll __builtin_vsx_xxsldwi_2di (vsll, vsll, const int<2>);
 XXSLDWI_2DI vsx_xxsldwi_v2di {}

+  const vsq __builtin_vsx_xxsldwi_v1ti (vsq, vsq, const int<2>);
+    XXSLDWI_1TI vsx_xxsldwi_v1ti {}
+
   const vf __builtin_vsx_xxsldwi_4sf (vf, vf, const int<2>);
 XXSLDWI_4SF vsx_xxsldwi_v4sf {}

@@ -3299,6 +3305,9 @@
   const vss __builtin_altivec_vsldb_v8hi (vss, vss, const int<3>);
 VSLDB_V8HI vsldb_v8hi {}

+  const vsq __builtin_altivec_vsldb_v1ti (vsq, vsq, const int<3>);
+    VSLDB_V1TI vsldb_v1ti {}
+
   const vsq __builtin_altivec_vslq (vsq, vuq);
 VSLQ vashlv1ti3 {}

@@ -3317,6 +3326,9 @@
   const vss __builtin_altivec_vsrdb_v8hi (vss, vss, const int<3>);
 VSRDB_V8HI vsrdb_v8hi {}

+  const vsq __builtin_altivec_vsrdb_v1ti (vsq, vsq, const int<3>);
+    VSRDB_V1TI vsrdb_v1ti {}
+
   const vsq __builtin_altivec_vsrq (vsq, vuq);
 VSRQ vlshrv1ti3 {}

diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index c4ecafc6f7e..96b0ecbd675 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3399,6 +3399,10 @@
 VSLDOI_4SF
   vd __builtin_vec_sld (vd, vd, const int);
 VSLDOI_2DF
+  vsq __builtin_vec_sld (vsq, vsq, const int);
+    VSLDOI_V1TI  VSLDOI_VSQ
+  vuq __builtin_ve

Re: [PATCH] rs6000, Add new overloaded vector shift builtin int128, varients

2024-07-26 Thread Carl Love

Segher:

On 7/24/24 11:47 AM, Segher Boessenkool wrote:

Hi!

On Wed, Jul 24, 2024 at 11:38:11AM -0700, Carl Love wrote:

On 7/24/24 10:03 AM, Segher Boessenkool wrote:

So much manual stuff needed, sigh.

On Fri, Jul 19, 2024 at 01:04:12PM -0700, Carl Love wrote:

gcc/ChangeLog:
     * config/rs6000/altivec.md (vsdb_): Change
     define_insn iterator to VEC_IC.

 From VI2 (a nothing-saying name) to VEC_IC (also a nonsensical name).

Maybe VEC_IC should have a comment explaining the TARGET_POWER10 thing
at least?  Just something like "ISA 3.1 added 128-bit things" or
whatever, but don't leave the reader second-guessing, a reader will
often guess wrong :-)

I don't disagree that the reader will guess wrong, probably after being
frustrated that it isn't obvious.  :-)
The VEC_IC was an existing definition, this patch does not add it.  Your
comment seems to imply you want a comment on the definition for VEC_IC
in vector.md?  I could add one to the existing definition if you like
but it seems outside the scope of this patch.

Yes, I'm just lamenting the state of things :-)

It would have to be a separate patch, yes.  A trivial patch to add such
a comment is pre-approved :-)


OK, no changes in this patch.  Will do a second patch.  Best that I post 
it for review anyway.





The change log entry could be improved to say "Change define_insn
iterator to VEC_IC which included the V1TI type added in ISA 3.1." Would
that address your concerns?

The current changelog is fine.  Changelogs never can replace comments in
the code.


OK


+/* { dg-do run  { target { int128 } && { power10_hw } } } */

Everything power10 is int128 always.

OK, so don't need the power10_hw.  Changed to just int128 for the target:

No, the other way around: you cannot run the code on machines without
these (ISA 3.1) instructions!


OK, made it /* { dg-do run  { target power10_hw } } */


But p10 always satisfies the int128 predicate.  Although, hrm, how
about -m32 :-)


If I test with -m64 I get 8 passes:

   make -j 1 && make check-gcc RUNTESTFLAGS="-v -v powerpc.exp=vec-shift-double-runnable-int128.c --target_board=unix'{-m64}'"

  # of expected passes    8

If I test with -m32 I get unsupported test,

   make -j 1 && make check-gcc RUNTESTFLAGS="-v -v powerpc.exp=vec-shift-double-runnable-int128.c --target_board=unix'{-m32}'"

    # of unsupported tests  1

Looking further into the output the checks say:
   Checking pattern "hppa*-*-hpux*" with powerpc64le-unknown-linux-gnu
   compiler exited with status 1
   output is:
   cc1: error: '-m32' not supported in this configuration

   check_cached_effective_target power10_hw_available: returning 0 for unix/-m32

   is-effective-target: power10_hw 0

Looks like the power10_hw is sufficient to prevent the test from 
running.  Don't need to explicitly check for int128 as well.






     /* { dg-do run  target int128  } */

+/* { dg-do link { target { ! power10_hw } } } */
+/* { dg-require-effective-target power10_ok } */

So this is enough always.

Often we have two testcases, one for run, one for compiling only.  It's
a bit simpler and cleaner.

Sounds like you would prefer to have a run and a compile test file? I
will create a new file vec-shift-double-int128.c  consisting of a series
of functions to test each built-in definition.

No, I don't prefer that, but it is easier to handle (also for you).
That that results in a bit more files, who cares, I don't anyway :-)


+/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */

Why the -save-temps?  Always document it if you want that for something,
but never put it in the testcase if not.  A leftover from development?

Okay for trunk, thank you!  Well Peter had some comments too, modulo
those I guess, I'll read them now ;-)

So as Peter said, the save-temps is because the runnable case file also
checks for assembler times at the end of the file.

Yup.  A comment would help :-)


OK, added comment.

/* { dg-require-effective-target power10_ok } */

/* Need -save-temps for dg-final scan-assembler-times at end of test.  */
/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */


  Carl


Re: [PATCH] rs6000, Add new overloaded vector shift builtin int128, varients

2024-07-25 Thread Carl Love

Peter, Segher:

On 7/23/24 2:26 PM, Peter Bergner wrote:

On 7/19/24 3:04 PM, Carl Love wrote:

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 5af9bf920a2..2a18ee44526 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
  (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])

  (define_insn "vsdb_"
- [(set (match_operand:VI2 0 "register_operand" "=v")
-  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
-   (match_operand:VI2 2 "register_operand" "v")
+ [(set (match_operand:VEC_IC 0 "register_operand" "=v")
+  (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
+   (match_operand:VEC_IC 2 "register_operand" "v")
 (match_operand:QI 3 "const_0_to_12_operand" "n")]
VSHIFT_DBL_LR))]
"TARGET_POWER10"

I know the old code used the register_operand predicate for the vector
operands, but those really should be changed to altivec_register_operand.

Peter

Segher's response was:

register_operand is just fine usually.  The "v" constraint already makes
sure things end up in a VMX (a lower VSX) register, the predicate
doesn't help here.  register_operand is shorter (and thus, preferred),
and also more likely correct if the code changes later 🙂


To which Peter replied, and Segher responded:

On Wed, Jul 24, 2024 at 12:12:05PM -0500, Peter Bergner wrote:


On 7/24/24 12:06 PM, Segher Boessenkool wrote:
I thought we always wanted the predicate to match the constraint being used?


Predicates and constraints have different purposes, and are used at
different times (typically).  Everything before RA is predicates
only, and RA and everything after it use constraints (as well).

register_operand says it has to be a register.  It allows any
pseudo-register, so before RA, there is no real difference between
register_operand and altivec_register_operand (which allows all pseudos
as well).

The constraint should not demand things that weren't clear earlier,
because that will then cause reloading eventually, often with less
efficient code.  It still will *work* though.

But that is not the case here 🙂


So, I think the final word here is don't change it.

 Carl







Re: [PATCH] rs6000, Add new overloaded vector shift builtin int128, varients

2024-07-25 Thread Carl Love

Kewen:

On 7/25/24 1:21 AM, Kewen.Lin wrote:

Hi Carl,

Some minor comments are inlined on top of Segher's and Peter's comments.

on 2024/7/20 04:04, Carl Love wrote:

GCC developers:

The following patch adds the int128 variants to the existing overloaded 
built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb, vec_srl, 
vec_sro.  These variants were requested by Steve Munroe.

The patch has been tested on a Power 10 system with no regressions.

Please let me know if the patch is acceptable for mainline.

    Carl


---
  rs6000, Add new overloaded vector shift builtin int128 varients

Add the signed __int128 and unsigned __int128 argument types for the
overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
vec_srdb, vec_srl, vec_sro.  For each of the new argument types add a
testcase and update the documentation for the built-in.

Add the missing internal names for the float and double types for
overloaded builtin vec_sld for the float and double types.

This isn't needed, see below explanation.


OK, per comments below, removed the additional internal names.



diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index c4ecafc6f7e..302e0232533 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3396,9 +3396,13 @@
    vull __builtin_vec_sld (vull, vull, const int);
  VSLDOI_2DI  VSLDOI_VULL
    vf __builtin_vec_sld (vf, vf, const int);
-    VSLDOI_4SF
+    VSLDOI_4SF VSLDOI_VF
    vd __builtin_vec_sld (vd, vd, const int);
-    VSLDOI_2DF
+    VSLDOI_2DF VSLDOI_VD

The other instances for vector integer type have multiple uses of 1st token,
such as:

   vsll __builtin_vec_sld (vsll, vsll, const int);
 VSLDOI_2DI  VSLDOI_VSLL
   vbll __builtin_vec_sld (vbll, vbll, const int);
 VSLDOI_2DI  VSLDOI_VBLL
   vull __builtin_vec_sld (vull, vull, const int);
 VSLDOI_2DI  VSLDOI_VULL

, it's unable to use the 1st token VSLDOI_2DI for the overload id (otherwise
it can be ambiguous), but for vector float/double they don't have multiple
variants, VSLDOI_4SF and VSLDOI_2DF are used once respectively so they are
fine here.  I think the existing code is intentional so let's keep them
unchanged (creating more unnecessary ids is slightly worse than before).


OK, removed the additional tokens VSLDOI_VF and VSLDOI_VD.



+  vsq __builtin_vec_sld (vsq, vsq, const int);
+    VSLDOI_V1TI  VSLDOI_VSQ
+  vuq __builtin_vec_sld (vuq, vuq, const int);
+    VSLDOI_V1TI  VSLDOI_VUQ

  [VEC_SLDB, vec_sldb, __builtin_vec_sldb]
    vsc __builtin_vec_sldb (vsc, vsc, const int);
@@ -3417,6 +3421,10 @@
  VSLDB_V2DI  VSLDB_VSLL
    vull __builtin_vec_sldb (vull, vull, const int);
  VSLDB_V2DI  VSLDB_VULL
+  vsq __builtin_vec_sldb (vsq, vsq, const int);
+    VSLDB_V1TI  VSLDB_VSQ
+  vuq __builtin_vec_sldb (vuq, vuq, const int);
+    VSLDB_V1TI  VSLDB_VUQ

  [VEC_SLDW, vec_sldw, __builtin_vec_sldw]
    vsc __builtin_vec_sldw (vsc, vsc, const int);
@@ -3439,6 +3447,10 @@
  XXSLDWI_4SF  XXSLDWI_VF
    vd __builtin_vec_sldw (vd, vd, const int);
  XXSLDWI_2DF  XXSLDWI_VD
+  vsq __builtin_vec_sldw (vsq, vsq, const int);
+    XXSLDWI_Q  XXSLDWI_VSQ
+  vuq __builtin_vec_sldw (vuq, vuq, const int);
+    XXSLDWI_Q  XXSLDWI_VUQ

Nit: s/XXSLDWI_Q/XXSLDWI_1TI/ to keep consistent with the
other XXSLDWI_* as 1st token (XXSLDWI_16QI etc. are used
above rather than XXSLDWI_{SC,UC} etc.)


OK, changed to:

  vsq __builtin_vec_sldw (vsq, vsq, const int);
    XXSLDWI_1TI  XXSLDWI_VSQ
  vuq __builtin_vec_sldw (vuq, vuq, const int);
    XXSLDWI_1TI  XXSLDWI_VUQ




  [VEC_SRV, vec_srv, __builtin_vec_vsrv]
    vuc __builtin_vec_vsrv (vuc, vuc);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0b572afca72..5125a6d9def 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -23504,6 +23504,10 @@ const unsigned int);
  vector signed long long, const unsigned int);
  @exdent vector unsigned long long vec_sldb (vector unsigned long long,
  vector unsigned long long, const unsigned int);
+@exdent vector signed __int128 vec_sldb (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_sldb (vector unsigned __int128,
+vector unsigned __int128, const unsigned int);
  @end smallexample

  Shift the combined input vectors left by the amount specified by the low-order
@@ -23531,6 +23535,10 @@ const unsigned int);
  vector signed long long, const unsigned int);
  @exdent vector unsigned long long vec_srdb (vector unsigned long long,
  vector unsigned long long, const unsigned int);
+@exdent vector signed __int128 vec_srdb (vector signed __int128,
+vector signed __int128, const unsigned int);
+@exdent vector unsigned __int128 vec_srdb (vector unsigned __int128,
+vector unsigned __int128, const unsigned int)

Re: [PATCH 2/2] rs6000, remove built-ins __builtin_vsx_set_1ti, __builtin_vsx_set_2df, __builtin_vsx_set_2di

2024-07-25 Thread Carl Love

Kewen:

On 7/25/24 1:24 AM, Kewen.Lin wrote:




optimization the number of assembly generated for the two methods are
similar.  With -O3 optimization, the assembly generated for the two
approaches is identical for the 2DF and 2DI types.  The assembly for
the C-code version of the 1Ti requres one less assembly instruction.

Nit: s/requres/requires/

Fixed

    fprintf (header_file, "\n");
-  fprintf (header_file,
-       "#define bif_is_set(x)\t\t((x).bifattrs & bif_set_bit)\n");
    fprintf (header_file,
     "#define bif_is_extract(x)\t((x).bifattrs & bif_extract_bit)\n");
    fprintf (header_file,
@@ -2497,10 +2491,9 @@ write_bif_static_init (void)
    fprintf (init_file, "  /* nargs */\t%d,\n",
     bifp->proto.nargs);
    fprintf (init_file, "  /* bifattrs */\t0");
-  if (bifp->attrs.isset)
-    fprintf (init_file, " | bif_set_bit");
    if (bifp->attrs.isextract)
  fprintf (init_file, " | bif_extract_bit");
+

Nit: unnecessary empty line.


Fixed

 Carl


Re: [PATCH 0/2] rs6000, remove vec and vsx set builtins

2024-07-25 Thread Carl Love

Kewen:

On 7/25/24 1:21 AM, Kewen.Lin wrote:

The patch, first patch in this series, to remove the __builtin_vec_set_v1ti, 
__builtin_vec_set_v2df, __builtin_vec_set_v2di was previously posted.  The 
feedback on the patch was that we could also remove set bif attribute.  Removal 
of the set bif attribute requires also removing the __builtin_vsx_set_1ti,  
__builtin_vsx_set_2df, __builtin_vsx_set_2di built-ins.  The second patch 
removes the vsx set built-ins and the now no longer used set built-in attribute 
and associated code.

The patches have been tested on a Power 10 LE system with no regressions.

It would be good to test this on BE as well (both 64-bit and 32-bit).


Yes.  I updated my scripts to test the vec_set and vsx_set code 
generation for the various test cases with -m32 as well as -m64.  The 
code generation for -m32 versus -m64 is slightly different, as expected.  
When comparing the results with and without the patch for -m32, the 
generated assembly is again the same or better for the C code versus 
the built-ins.  So, no surprises with any of the testing with -m32.  
It is consistent with the results for the -m64 testing.


    Carl



Re: [PATCH 1/2] rs6000, Remove __builtin_vec_set_v1ti,, __builtin_vec_set_v2df, __builtin_vec_set_v2di

2024-07-25 Thread Carl Love

Kewen:

On 7/25/24 1:22 AM, Kewen.Lin wrote:

on 2024/7/24 01:52, Carl Love wrote:

GCC maintainers:

This patch was previously posted.  Per the feedback, it is now the first of two 
patches to remove the set built-ins.

This patch removes the __builtin_vec_set_v1ti, __builtin_vec_set_v2df and 
__builtin_vec_set_v2di built-ins.  The users should just use normal C-code to 
update the various vector elements.  This change was originally intended to be 
part of the earlier series of cleanup patches.  It was initially thought that 
some additional work would be needed to do some gimple generation instead of 
these built-ins.  However, the existing default code generation does produce 
the needed code.    For the vec_set bif, the equivalent C code is as good or 
better than the built-in.  For the vec_insert bif whose resolving previously 
made use of the vec_set bif, the assembly code generation is as good as before 
with the -O3 optimization.

This background information will be also mentioned in commit log, right?


Forgot to mention, I added the paragraph to the commit log.

   Carl



Re: [PATCH 1/2] rs6000, Remove __builtin_vec_set_v1ti,, __builtin_vec_set_v2df, __builtin_vec_set_v2di

2024-07-25 Thread Carl Love

Kewen:

On 7/25/24 1:22 AM, Kewen.Lin wrote:

-
rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, 
__builtin_vec_set_v2di

Remove the built-ins, use the default gimple generation instead.

OK for trunk with better commit log like the above paragraph, thanks!

// Assuming testing on BE goes well too. 🙂

Good point, I hadn't double checked things on BE.  I tested the patches 
today on BE.  The patches do not cause any additional regression test failures.


I also investigated the assembly code generation with and without the 
patches for -O0 and -O3 using the same scripts as I used previously on 
LE.  I see the same results.  With -O0 the generated assembly takes 
one extra instruction for the built-in.  With -O3, the code 
generated for the vsx set 2df and 2di cases is identical.  The code for 
the vsx set 1ti case requires fewer assembly instructions for the C code 
versus the built-in.


 Carl


Re: [PATCH] rs6000, Add new overloaded vector shift builtin int128, varients

2024-07-24 Thread Carl Love

Segher:

Thanks for the review, a few questions...

On 7/24/24 10:03 AM, Segher Boessenkool wrote:

Hi!

So much manual stuff needed, sigh.

On Fri, Jul 19, 2024 at 01:04:12PM -0700, Carl Love wrote:

gcc/ChangeLog:
     * config/rs6000/altivec.md (vsdb_): Change
     define_insn iterator to VEC_IC.

 From VI2 (a nothing-saying name) to VEC_IC (also a nonsensical name).

Maybe VEC_IC should have a comment explaining the TARGET_POWER10 thing
at least?  Just something like "ISA 3.1 added 128-bit things" or
whatever, but don't leave the reader second-guessing, a reader will
often guess wrong :-)


I don't disagree that the reader will guess wrong, probably after being 
frustrated that it isn't obvious.  :-)
The VEC_IC was an existing definition, this patch does not add it.  Your 
comment seems to imply you want a comment on the definition for VEC_IC 
in vector.md?  I could add one to the existing definition if you like 
but it seems outside the scope of this patch.


The change log entry could be improved to say "Change define_insn 
iterator to VEC_IC which included the V1TI type added in ISA 3.1." Would 
that address your concerns?



gcc/testsuite/ChangeLog:
     * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test
     file.

Please don't line-wrap where not wanted.  Changelog lines are 80
character positions wide.  (Or 79 if you want, but heh).


Yea, it does look like file will just fit on the same line.  Fixed.



+The above instances are extension of the exiting overloaded built-ins

(existing)

Fixed spelling error.




a/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
new file mode 100644
index 000..bb90f489149
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c
@@ -0,0 +1,349 @@
+/* { dg-do run  { target { int128 } && { power10_hw } } } */

Everything power10 is int128 always.


OK, so don't need the power10_hw.  Changed to just int128 for the target:

    /* { dg-do run  target int128  } */

+/* { dg-do link { target { ! power10_hw } } } */
+/* { dg-require-effective-target power10_ok } */

So this is enough always.

Often we have two testcases, one for run, one for compiling only.  It's
a bit simpler and cleaner.


Sounds like you would prefer to have a run and a compile test file? I 
will create a new file vec-shift-double-int128.c  consisting of a series 
of functions to test each built-in definition.



+/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */

Why the -save-temps?  Always document it if you want that for something,
but never put it in the testcase if not.  A leftover from development?

Okay for trunk, thank you!  Well Peter had some comments too, modulo
those I guess, I'll read them now ;-)

So as Peter said, the save-temps is because the runnable case file also 
checks for assembler times at the end of the file.


I will move the scan-assembler-times checks to the new compile only test.

    Carl


Re: [PATCH] rs6000, Add new overloaded vector shift builtin int128, varients

2024-07-24 Thread Carl Love

Peter:

On 7/23/24 2:26 PM, Peter Bergner wrote:

On 7/19/24 3:04 PM, Carl Love wrote:

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 5af9bf920a2..2a18ee44526 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
  (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])

  (define_insn "vsdb_"
- [(set (match_operand:VI2 0 "register_operand" "=v")
-  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
-   (match_operand:VI2 2 "register_operand" "v")
+ [(set (match_operand:VEC_IC 0 "register_operand" "=v")
+  (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
+   (match_operand:VEC_IC 2 "register_operand" "v")
 (match_operand:QI 3 "const_0_to_12_operand" "n")]
VSHIFT_DBL_LR))]
"TARGET_POWER10"

I know the old code used the register_operand predicate for the vector
operands, but those really should be changed to altivec_register_operand.

Peter


OK, will add that change and retest the patch.  Thanks.

 Carl


Re: [PATCH 1/2] rs6000, Remove __builtin_vec_set_v1ti,, __builtin_vec_set_v2df, __builtin_vec_set_v2di

2024-07-23 Thread Carl Love



GCC maintainers:

This patch was previously posted.  Per the feedback, it is now the first 
of two patches to remove the set built-ins.


This patch removes the __builtin_vec_set_v1ti, __builtin_vec_set_v2df 
and __builtin_vec_set_v2di built-ins.  The users should just use normal 
C-code to update the various vector elements.  This change was 
originally intended to be part of the earlier series of cleanup 
patches.  It was initially thought that some additional work would be 
needed to do some gimple generation instead of these built-ins.  
However, the existing default code generation does produce the needed 
code.  For the vec_set bif, the equivalent C code is as good as or better 
than the built-in.  For the vec_insert bif, whose resolution previously 
made use of the vec_set bif, the generated assembly code is as good as 
before with -O3 optimization.
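
As a rough illustration of what "normal C-code" means here, the sketch 
below writes one element of a two-element vector with ordinary 
subscripting.  It is not code from the patch.  It assumes GCC's generic 
vector_size types instead of the altivec.h types so that it builds on 
any target, and the names v2df, v2di, set_v2df and set_v2di are made up 
for the illustration.

```c
#include <assert.h>

/* Generic GCC vector types (assumption: these stand in for the
   "vector double" and "vector signed long long" types from altivec.h).  */
typedef double v2df __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

/* Plain C replacement for a "set element" built-in: write one lane of a
   two-element double vector and return the updated vector.  */
v2df
set_v2df (v2df vec, double val, int idx)
{
  assert (idx >= 0 && idx < 2);
  vec[idx] = val;
  return vec;
}

/* Same operation for a two-element 64-bit integer vector.  */
v2di
set_v2di (v2di vec, long long val, int idx)
{
  assert (idx >= 0 && idx < 2);
  vec[idx] = val;
  return vec;
}
```

With a constant index the compiler is free to pick whatever insert 
sequence it likes, which is the behavior the patch relies on.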


The patch has been tested on Power 10 LE with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl

-
rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, 
__builtin_vec_set_v2di


Remove the built-ins, use the default gimple generation instead.

gcc/ChangeLog:
    * config/rs6000/rs6000-builtins.def (__builtin_vec_set_v1ti,
    __builtin_vec_set_v2df, __builtin_vec_set_v2di): Remove built-in
    definitions.
    * config/rs6000/rs6000-c.cc (resolve_vec_insert): Remove the
    handling for constant vec_insert position with
    VECTOR_UNIT_VSX_P V1TImode, V2DFmode and V2DImode modes.
---
 gcc/config/rs6000/rs6000-builtins.def | 13 -
 gcc/config/rs6000/rs6000-c.cc | 40 ---
 2 files changed, 53 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def

index 47830b7dcb0..75c33aa9ffc 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1263,19 +1263,6 @@
   const signed long long __builtin_vec_ext_v2di (vsll, signed int);
 VEC_EXT_V2DI nothing {extract}

-;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in
-;; resolve_vec_insert(), rs6000-c.cc
-;; TODO: Remove VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI once the uses
-;; in resolve_vec_insert are replaced by the equivalent gimple statements.
-  const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>);
-    VEC_SET_V1TI nothing {set}
-
-  const vd __builtin_vec_set_v2df (vd, double, const int<1>);
-    VEC_SET_V2DF nothing {set}
-
-  const vsll __builtin_vec_set_v2di (vsll, signed long long, const int<1>);
-    VEC_SET_V2DI nothing {set}
-
   const vsc __builtin_vsx_cmpge_16qi (vsc, vsc);
 CMPGE_16QI vector_nltv16qi {}

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 68519e1397f..04882c396bf 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -1524,46 +1524,6 @@ resolve_vec_insert (resolution *res, vecva_gc> *arglist,

   return error_mark_node;
 }

-  /* If we can use the VSX xxpermdi instruction, use that for insert.  */
-  machine_mode mode = TYPE_MODE (arg1_type);
-
-  if ((mode == V2DFmode || mode == V2DImode)
-  && VECTOR_UNIT_VSX_P (mode)
-  && TREE_CODE (arg2) == INTEGER_CST)
-    {
-  wide_int selector = wi::to_wide (arg2);
-  selector = wi::umod_trunc (selector, 2);
-  arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
-
-  tree call = NULL_TREE;
-  if (mode == V2DFmode)
-    call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V2DF];
-  else if (mode == V2DImode)
-    call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V2DI];
-
-  /* Note, __builtin_vec_insert_ has vector and scalar types
-     reversed.  */
-  if (call)
-    {
-      *res = resolved;
-      return build_call_expr (call, 3, arg1, arg0, arg2);
-    }
-    }
-
-  else if (mode == V1TImode
-       && VECTOR_UNIT_VSX_P (mode)
-       && TREE_CODE (arg2) == INTEGER_CST)
-    {
-  tree call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V1TI];
-  wide_int selector = wi::zero(32);
-  arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
-
-  /* Note, __builtin_vec_insert_ has vector and scalar types
-     reversed.  */
-  *res = resolved;
-  return build_call_expr (call, 3, arg1, arg0, arg2);
-    }
-
   /* Build *(((arg1_inner_type*) & (vector type){arg1}) + arg2) = arg0 
with

  VIEW_CONVERT_EXPR.  i.e.:
    D.3192 = v1;
--
2.45.2




Re: [PATCH 2/2] rs6000, remove built-ins __builtin_vsx_set_1ti, __builtin_vsx_set_2df, __builtin_vsx_set_2di

2024-07-23 Thread Carl Love

GCC maintainers:

This patch removes the vsx set built-ins: __builtin_vsx_set_1ti, 
__builtin_vsx_set_2df, __builtin_vsx_set_2di.  With the removal of 
these built-ins, the built-in attribute "set", used in the built-in 
definition file, is no longer needed.  The "set" attribute and its 
associated code are removed.


The assembly code generated by using C code to set an element of a 
vector versus using the vsx set built-in to set an element was 
investigated.  With -O0 optimization the generated assembly code is 
comparable in terms of the instructions generated and the number of 
instructions.  At the -O3 optimization level, the built-ins and the C 
code generate identical assembly code for the 2DI and 2DF cases.  The 
assembly code generated from the C code for the 1TI case has one fewer 
instruction, because the built-in generates an extra load instruction.  
Hence, the C code is better as it has fewer load instructions.


The testcase for the __builtin_vsx_set_2df is removed.  The other 
built-ins do not have testcases.


The patch has been tested on a Power 10 LE system with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl

--
rs6000, remove built-ins __builtin_vsx_set_1ti, __builtin_vsx_set_2df, 
__builtin_vsx_set_2di


The built-ins set a value in a vector.  The same operation can be done
in C-code.  The assembly code generated from the C-code is as good as or
better than the code generated by the built-ins.  With default
optimization the number of assembly instructions generated by the two
methods is similar.  With -O3 optimization, the assembly generated for
the two approaches is identical for the 2DF and 2DI types.  The assembly
for the C-code version of the 1TI type requires one fewer assembly
instruction.  It also uses only one load versus two loads for the
built-in.

With the removal of the built-ins, there are no other uses of the
set built-in attribute.  The code associated with the set built-in
attribute is removed.

Finally, the testcase for the __builtin_vsx_set_2df is removed.  The
other built-ins do not have testcases.

gcc/ChangeLog:
    * config/rs6000/rs6000-builtin.cc (get_element_number,
    altivec_expand_vec_set_builtin): Remove functions.
    (rs6000_expand_builtin): Remove the if statement to call
    altivec_expand_vec_set_builtin.
    * config/rs6000/rs6000-builtins.def (__builtin_vsx_set_1ti,
    __builtin_vsx_set_2df, __builtin_vsx_set_2di): Remove the
    built-in definitions.
    * config/rs6000/rs6000-gen-builtins.cc (struct attrinfo):
    Remove the isset variable from the structure.
    (parse_bif_attrs): Remove the uses of the isset variable.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/vsx-builtin-3.c: Remove test cases for the
    __builtin_vsx_set_2df built-in.
---
 gcc/config/rs6000/rs6000-builtin.cc   | 53 ---
 gcc/config/rs6000/rs6000-builtins.def | 10 
 gcc/config/rs6000/rs6000-gen-builtins.cc  | 29 --
 .../gcc.target/powerpc/vsx-builtin-3.c    |  6 ---
 4 files changed, 11 insertions(+), 87 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc

index 117cf0125f8..099cbc82245 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -2313,56 +2313,6 @@ altivec_expand_predicate_builtin (enum insn_code 
icode, tree exp, rtx target)

   return target;
 }

-/* Return the integer constant in ARG.  Constrain it to be in the range
-   of the subparts of VEC_TYPE; issue an error if not.  */
-
-static int
-get_element_number (tree vec_type, tree arg)
-{
-  unsigned HOST_WIDE_INT elt, max = TYPE_VECTOR_SUBPARTS (vec_type) - 1;
-
-  if (!tree_fits_uhwi_p (arg)
-  || (elt = tree_to_uhwi (arg), elt > max))
-    {
-  error ("selector must be an integer constant in the range [0, 
%wi]", max);

-  return 0;
-    }
-
-  return elt;
-}
-
-/* Expand vec_set builtin.  */
-static rtx
-altivec_expand_vec_set_builtin (tree exp)
-{
-  machine_mode tmode, mode1;
-  tree arg0, arg1, arg2;
-  int elt;
-  rtx op0, op1;
-
-  arg0 = CALL_EXPR_ARG (exp, 0);
-  arg1 = CALL_EXPR_ARG (exp, 1);
-  arg2 = CALL_EXPR_ARG (exp, 2);
-
-  tmode = TYPE_MODE (TREE_TYPE (arg0));
-  mode1 = TYPE_MODE (TREE_TYPE (TREE_TYPE (arg0)));
-  gcc_assert (VECTOR_MODE_P (tmode));
-
-  op0 = expand_expr (arg0, NULL_RTX, tmode, EXPAND_NORMAL);
-  op1 = expand_expr (arg1, NULL_RTX, mode1, EXPAND_NORMAL);
-  elt = get_element_number (TREE_TYPE (arg0), arg2);
-
-  if (GET_MODE (op1) != mode1 && GET_MODE (op1) != VOIDmode)
-    op1 = convert_modes (mode1, GET_MODE (op1), op1, true);
-
-  op0 = force_reg (tmode, op0);
-  op1 = force_reg (mode1, op1);
-
-  rs6000_expand_vector_set (op0, op1, GEN_INT (elt));
-
-  return op0;
-}
-
 /* Expand vec_ext builtin.  */
 static rtx
 altivec_expan

[PATCH 0/2] rs6000, remove vec and vsx set builtins

2024-07-23 Thread Carl Love

GCC maintainers:

The code generated by using C-code to set a vector element versus using 
a built-in has been investigated.  The assembly code generated from the 
C-code is as good or better than the assembly code generated for the 
built-ins for both the -O0 and -O3 levels of optimization.


For the vec_insert built-in, whose resolution previously made use of the 
vec_set bif (now removed), the generated code is as good as before with 
optimization.


This two-patch series removes the __builtin_vec_set_v1ti, 
__builtin_vec_set_v2df and __builtin_vec_set_v2di built-ins and the 
__builtin_vsx_set_1ti, __builtin_vsx_set_2df and __builtin_vsx_set_2di 
built-ins in favor of using C-code instead.  These built-ins use the 
set built-in attribute in their definitions.  With the removal of these 
6 built-ins, the set built-in attribute is no longer used and the 
related code for the attribute is removed.


The first patch in this series, which removes the 
__builtin_vec_set_v1ti, __builtin_vec_set_v2df and 
__builtin_vec_set_v2di built-ins, was previously posted.  The feedback 
on that patch was that we could also remove the set bif attribute.  
Removing the set bif attribute also requires removing the 
__builtin_vsx_set_1ti, __builtin_vsx_set_2df and __builtin_vsx_set_2di 
built-ins.  The second patch removes the vsx set built-ins and the 
now-unused set built-in attribute and its associated code.


The patches have been tested on a Power 10 LE system with no regressions.

Carl


[PATCH ver 2] rs6000, remove __builtin_vsx_xvcmp* built-ins

2024-07-23 Thread Carl Love

GCC maintainers:

version 2, Updated patch comments, added missing ChangeLog.  Fixed 
unintended line removal.


The following patch removes the three __builtin_vsx_xvcmp[eq|ge|gt]sp 
built-ins as they are similar to the overloaded vec_cmp[eq|ge|gt] 
built-ins.  The difference is that the overloaded built-ins return a 
vector of boolean or a vector of long long booleans, whereas the removed 
built-ins returned a vector of floats or vector of doubles.


The tests for __builtin_vsx_xvcmp[eq|ge|gt]sp and 
__builtin_vsx_xvcmp[eq|ge|gt]dp are updated to use the overloaded 
vec_cmp[eq|ge|gt] built-in with the required changes for the return 
type.  Note __builtin_vsx_xvcmp[eq|ge|gt]dp are used internally.
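
To make the return-type difference concrete, here is a small sketch.  
It is not taken from the patch or the updated test.  It assumes GCC's 
generic vector_size types and the built-in == operator on vectors in 
place of vec_cmpeq so that it builds on any target, and the names 
cmpeq_mask, cmpeq_as_float and lane_is_true are hypothetical.

```c
#include <assert.h>

/* Generic GCC vector types (assumption: stand-ins for "vector float"
   and "vector bool int" from altivec.h).  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));

/* Element-wise equality.  Each result lane is -1 (all bits set) where
   the inputs compare equal and 0 otherwise, i.e. a boolean mask.  */
v4si
cmpeq_mask (v4sf a, v4sf b)
{
  return a == b;
}

/* The same mask bits reinterpreted as a float vector, which is the kind
   of cast a caller needs if it wants to keep storing the result in a
   float vector as the removed built-ins did.  */
v4sf
cmpeq_as_float (v4sf a, v4sf b)
{
  return (v4sf) (a == b);
}

/* Test one lane of a mask.  */
int
lane_is_true (v4si mask, int lane)
{
  assert (lane >= 0 && lane < 4);
  return mask[lane] != 0;
}
```

The cast between the two vector types only reinterprets the bits, so a 
"true" lane viewed as a float is an all-ones pattern rather than 1.0.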


The patches have been tested on a Power 10 LE system with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl
-
rs6000, remove __builtin_vsx_xvcmp* built-ins

This patch removes the built-ins:
 __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp,
 __builtin_vsx_xvcmpgtsp.

which are similar to the recommended PVIPR documented overloaded
vec_cmpeq, vec_cmpgt and vec_cmpge built-ins.

The difference is that the overloaded built-ins return a vector of
32-bit booleans.  The removed built-ins returned a vector of floats.

The __builtin_vsx_xvcmpeqdp, __builtin_vsx_xvcmpgedp and
__builtin_vsx_xvcmpgtdp are not removed as they are used by the
overloaded vec_cmpeq, vec_cmpgt and vec_cmpge built-ins.

The test cases for the __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp,
__builtin_vsx_xvcmpgtsp, __builtin_vsx_xvcmpeqdp,
__builtin_vsx_xvcmpgedp and __builtin_vsx_xvcmpgtdp  are changed to use
the overloaded vec_cmpeq, vec_cmpgt, vec_cmpge built-ins.  Use of the
overloaded built-ins requires the result to be stored in a vector of
boolean of the appropriate size or the result must be cast to the return
type used by the original __builtin_vsx_xvcmp* built-ins.

gcc/ChangeLog:
    * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp,
    __builtin_vsx_xvcmpgesp, __builtin_vsx_xvcmpgtsp): Remove
    definitions.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_xvcmpeqdp,
    __builtin_vsx_xvcmpgtdp, __builtin_vsx_xvcmpgedp,
    __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgtsp,
    __builtin_vsx_xvcmpgesp): Remove.
    (vec_cmpeq, vec_cmpgt, vec_cmpge): Add tests for float
    arguments that store result in boolean and cast result to
    store result in float.  Add tests for double arguments that
    store the result in long long boolean and cast result to
    double.
---
 gcc/config/rs6000/rs6000-builtins.def |  9 --
 .../gcc.target/powerpc/vsx-builtin-3.c    | 28 ++-
 2 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def

index 77eb0f7e406..47830b7dcb0 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1579,18 +1579,12 @@
   const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd);
 XVCMPEQDP_P vector_eq_v2df_p {pred}

-  const vf __builtin_vsx_xvcmpeqsp (vf, vf);
-    XVCMPEQSP vector_eqv4sf {}
-
   const vd __builtin_vsx_xvcmpgedp (vd, vd);
 XVCMPGEDP vector_gev2df {}

   const signed int __builtin_vsx_xvcmpgedp_p (signed int, vd, vd);
 XVCMPGEDP_P vector_ge_v2df_p {pred}

-  const vf __builtin_vsx_xvcmpgesp (vf, vf);
-    XVCMPGESP vector_gev4sf {}
-
   const signed int __builtin_vsx_xvcmpgesp_p (signed int, vf, vf);
 XVCMPGESP_P vector_ge_v4sf_p {pred}

@@ -1600,9 +1594,6 @@
   const signed int __builtin_vsx_xvcmpgtdp_p (signed int, vd, vd);
 XVCMPGTDP_P vector_gt_v2df_p {pred}

-  const vf __builtin_vsx_xvcmpgtsp (vf, vf);
-    XVCMPGTSP vector_gtv4sf {}
-
   const signed int __builtin_vsx_xvcmpgtsp_p (signed int, vf, vf);
 XVCMPGTSP_P vector_gt_v4sf_p {pred}

diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c

index 60f91aad23c..d67f97c8011 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
@@ -156,13 +156,27 @@ int do_cmp (void)
 {
   int i = 0;

-  d[i][0] = __builtin_vsx_xvcmpeqdp (d[i][1], d[i][2]); i++;
-  d[i][0] = __builtin_vsx_xvcmpgtdp (d[i][1], d[i][2]); i++;
-  d[i][0] = __builtin_vsx_xvcmpgedp (d[i][1], d[i][2]); i++;
-
-  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
-  f[i][0] = __builtin_vsx_xvcmpgtsp (f[i][1], f[i][2]); i++;
-  f[i][0] = __builtin_vsx_xvcmpgesp (f[i][1], f[i][2]); i++;
+  /* The __builtin_vsx_xvcmp[gt|ge|eq]dp and 
__builtin_vsx_xvcmp[gt|ge|eq]sp

+ have been removed in favor of the overloaded vec_cmpeq, vec_cmpgt and
+ vec_cmpge built-ins.  The __builtin_vsx_xvcmp* builtins returned a 
vector
+ result of the same type as the 

Re: [PATCH] rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, __builtin_vec_set_v2di

2024-07-22 Thread Carl Love



Kewen:

On 7/22/24 2:09 AM, Kewen.Lin wrote:

Hi Carl,

on 2024/7/18 00:01, Carl Love wrote:

GCC maintainers:

This patch removes the __builtin_vec_set_v1ti, __builtin_vec_set_v2df and 
__builtin_vec_set_v2di built-ins.  The users should just use normal C-code to 
update the various vector elements.  This change was originally intended to be 
part of the earlier series of cleanup patches.  It was initially thought that 
some additional work would be needed to do some gimple generation instead of 
these built-ins.  However, the existing default code generation does produce 
the needed code.  The code generated with normal C-code is as good or better 
than the code generated with these built-ins.

I think we need to expand this a bit:
   - For vec_set bif, the equivalent C code is as good as or better than it.
   - For vec_insert bif whose resolving makes use of vec_set bif previously 
(now get removed),
 it's as good as before with optimization.

The patch has been tested on Power 10 LE with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

    Carl

---
rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, 
__builtin_vec_set_v2di

Remove the built-ins, use the default gimple generation instead.

gcc/ChangeLog:
     * config/rs6000/rs6000-builtins.def (__builtin_vec_set_v1ti,
     __builtin_vec_set_v2df, __builtin_vec_set_v2di): Remove built-in
     definitions.
     * config/rs6000/rs6000-c.cc (resolve_vec_insert):  Remove if
     statemnts for mode == V2DFmode, mode == V2DImode and

Nit: s/statemnts/statements/


OK, fixed

Maybe a bit more meaningful like: Remove the handling for constant vec_insert 
position
with VECTOR_UNIT_VSX_P V1TImode, V2DFmode and V2DImode modes.

OK, changed




     mode == V1TImode that reference RS6000_BIF_VEC_SET_V2DF,
     RS6000_BIF_VEC_SET_V2DI and RS6000_BIF_VEC_SET_V1TI.
---
  gcc/config/rs6000/rs6000-builtins.def | 13 -
  gcc/config/rs6000/rs6000-c.cc | 40 ---
  2 files changed, 53 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 896d9686ac6..0ebc940f395 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1263,19 +1263,6 @@
    const signed long long __builtin_vec_ext_v2di (vsll, signed int);
  VEC_EXT_V2DI nothing {extract}

-;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in
-;; resolve_vec_insert(), rs6000-c.cc
-;; TODO: Remove VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI once the uses
-;; in resolve_vec_insert are replaced by the equivalent gimple statements.
-  const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>);
-    VEC_SET_V1TI nothing {set}
-
-  const vd __builtin_vec_set_v2df (vd, double, const int<1>);
-    VEC_SET_V2DF nothing {set}
-
-  const vsll __builtin_vec_set_v2di (vsll, signed long long, const int<1>);
-    VEC_SET_V2DI nothing {set}
-

Unexpected empty line removed.
??  I don't remove the blank line before the removed comment, so there 
is still a single blank line before the next entry. Specifically, the 
code with the above removed now looks like:


...
  const signed long long __builtin_vec_ext_v2di (vsll, signed int);
    VEC_EXT_V2DI nothing {extract}

  const vsc __builtin_vsx_cmpge_16qi (vsc, vsc);
    CMPGE_16QI vector_nltv16qi {}

  const vsll __builtin_vsx_cmpge_2di (vsll, vsll);
    CMPGE_2DI vector_nltv2di {}


Which looks OK to me?


Similar to vec_init removal, we should also get rid of set bif attribute,
bif_is_set and altivec_expand_vec_set_builtin etc.

That will also require removing:

 const vsq __builtin_vsx_set_1ti (vsq, signed __int128, const int<0,0>);
   SET_1TI vsx_set_v1ti {set}

  const vd __builtin_vsx_set_2df (vd, double, const int<0,1>);
    SET_2DF vsx_set_v2df {set}

 const vsll __builtin_vsx_set_2di (vsll, signed long long, const int<0,1>);
    SET_2DI vsx_set_v2di {set}

I would assume the C-code generation for the above will be as good or 
better than the code generation for the built-ins but will need to 
verify that.  I haven't looked at them specifically.


  Carl


[PATCH] rs6000, Add new overloaded vector shift builtin int128 variants

2024-07-19 Thread Carl Love

GCC developers:

The following patch adds the int128 variants to the existing overloaded 
built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo, vec_srdb, 
vec_srl, vec_sro.  These variants were requested by Steve Munroe.
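
For readers unfamiliar with the double-vector shifts, the following is a 
scalar model of what vec_sldb and vec_srdb compute when each vector 
holds a single 128-bit element.  It is only a sketch based on my reading 
of the documented behavior (shift the concatenated inputs by 0 to 7 bits 
and keep 128 bits), not code from the patch.  The names model_sldb and 
model_srdb are hypothetical, and it needs a target that supports 
__int128.

```c
#include <assert.h>

typedef unsigned __int128 u128;

/* Model of "shift left double by bits": treat a:b as one 256-bit value
   with a as the high half, shift left by sh bits and keep the high
   128 bits.  */
u128
model_sldb (u128 a, u128 b, unsigned sh)
{
  assert (sh <= 7);
  if (sh == 0)
    return a;			/* Avoid the undefined shift by 128.  */
  return (a << sh) | (b >> (128 - sh));
}

/* Model of "shift right double by bits": same 256-bit value, shifted
   right by sh bits, keeping the low 128 bits.  */
u128
model_srdb (u128 a, u128 b, unsigned sh)
{
  assert (sh <= 7);
  if (sh == 0)
    return b;			/* Avoid the undefined shift by 128.  */
  return (b >> sh) | (a << (128 - sh));
}
```

The bits shifted in come from the neighboring operand rather than being 
zeros, which is what distinguishes these from an ordinary 128-bit shift.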


The patch has been tested on a Power 10 system with no regressions.

Please let me know if the patch is acceptable for mainline.

   Carl


---
rs6000, Add new overloaded vector shift builtin int128 variants

Add the signed __int128 and unsigned __int128 argument types for the
overloaded built-ins vec_sld, vec_sldb, vec_sldw, vec_sll, vec_slo,
vec_srdb, vec_srl, vec_sro.  For each of the new argument types add a
testcase and update the documentation for the built-in.

Add the missing internal names for the float and double types of the
overloaded builtin vec_sld.

gcc/ChangeLog:
    * config/rs6000/altivec.md (vsdb_): Change
    define_insn iterator to VEC_IC.
    * config/rs6000/rs6000-builtins.def (__builtin_altivec_vsldoi_v1ti,
    __builtin_vsx_xxsldwi_v1ti, __builtin_altivec_vsldb_v1ti,
    __builtin_altivec_vsrdb_v1ti): New builtin definitions.
    * config/rs6000/rs6000-overload.def (vec_sld, vec_sldb, vec_sldw,
    vec_sll, vec_slo, vec_srdb, vec_srl, vec_sro): New overloaded
    definitions.
    (vec_sld): Add missing internal names.
    * doc/extend.texi (vec_sld, vec_sldb, vec_sldw,    vec_sll, vec_slo,
    vec_srdb, vec_srl, vec_sro): Add documentation for new overloaded
    built-ins.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/vec-shift-double-runnable-int128.c: New test
    file.
---
 gcc/config/rs6000/altivec.md  |   6 +-
 gcc/config/rs6000/rs6000-builtins.def |  12 +
 gcc/config/rs6000/rs6000-overload.def |  44 ++-
 gcc/doc/extend.texi   |  42 +++
 .../vec-shift-double-runnable-int128.c    | 349 ++
 5 files changed, 448 insertions(+), 5 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/powerpc/vec-shift-double-runnable-int128.c


diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 5af9bf920a2..2a18ee44526 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -878,9 +878,9 @@ (define_int_attr SLDB_lr [(UNSPEC_SLDB "l")
 (define_int_iterator VSHIFT_DBL_LR [UNSPEC_SLDB UNSPEC_SRDB])

 (define_insn "vsdb_"
- [(set (match_operand:VI2 0 "register_operand" "=v")
-  (unspec:VI2 [(match_operand:VI2 1 "register_operand" "v")
-       (match_operand:VI2 2 "register_operand" "v")
+ [(set (match_operand:VEC_IC 0 "register_operand" "=v")
+  (unspec:VEC_IC [(match_operand:VEC_IC 1 "register_operand" "v")
+       (match_operand:VEC_IC 2 "register_operand" "v")
    (match_operand:QI 3 "const_0_to_12_operand" "n")]
   VSHIFT_DBL_LR))]
   "TARGET_POWER10"
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def

index 77eb0f7e406..fbb6e1ddf85 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -964,6 +964,9 @@
   const vss __builtin_altivec_vsldoi_8hi (vss, vss, const int<4>);
 VSLDOI_8HI altivec_vsldoi_v8hi {}

+  const vsq __builtin_altivec_vsldoi_v1ti (vsq, vsq, const int<4>);
+    VSLDOI_V1TI altivec_vsldoi_v1ti {}
+
   const vss __builtin_altivec_vslh (vss, vus);
 VSLH vashlv8hi3 {}

@@ -1831,6 +1834,9 @@
   const vsll __builtin_vsx_xxsldwi_2di (vsll, vsll, const int<2>);
 XXSLDWI_2DI vsx_xxsldwi_v2di {}

+  const vsq __builtin_vsx_xxsldwi_v1ti (vsq, vsq, const int<2>);
+    XXSLDWI_Q vsx_xxsldwi_v1ti {}
+
   const vf __builtin_vsx_xxsldwi_4sf (vf, vf, const int<2>);
 XXSLDWI_4SF vsx_xxsldwi_v4sf {}

@@ -3299,6 +3305,9 @@
   const vss __builtin_altivec_vsldb_v8hi (vss, vss, const int<3>);
 VSLDB_V8HI vsldb_v8hi {}

+  const vsq __builtin_altivec_vsldb_v1ti (vsq, vsq, const int<3>);
+    VSLDB_V1TI vsldb_v1ti {}
+
   const vsq __builtin_altivec_vslq (vsq, vuq);
 VSLQ vashlv1ti3 {}

@@ -3317,6 +3326,9 @@
   const vss __builtin_altivec_vsrdb_v8hi (vss, vss, const int<3>);
 VSRDB_V8HI vsrdb_v8hi {}

+  const vsq __builtin_altivec_vsrdb_v1ti (vsq, vsq, const int<3>);
+    VSRDB_V1TI vsrdb_v1ti {}
+
   const vsq __builtin_altivec_vsrq (vsq, vuq);
 VSRQ vlshrv1ti3 {}

diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def

index c4ecafc6f7e..302e0232533 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3396,9 +3396,13 @@
   vull __builtin_vec_sld (vull, vull, const int);
 VSLDOI_2DI  VSLDOI_VULL
   vf __builtin_vec_sld (vf, vf, const int);
-    VSLDOI_4SF
+    VSLDOI_4SF VSLDOI_VF
   vd __builtin_vec_sld (vd, vd, const int);
-    VSLDOI_2DF
+    VSLDOI_2DF VSLDOI_VD
+  vsq __builtin_vec_sld (vsq, vsq, const int);
+    VSLDOI_V1TI  VSLDOI_VSQ
+  vuq __builtin_vec_sld

[PATCH ver 3] rs6000, update effective target for tests builtins-10*.c and vec_perm-runnable-i128.c

2024-07-17 Thread Carl Love

GCC maintainers:

Version 3: in version 2, the ChangeLog was not updated to remove the 
LP64 references.  Fixed that and updated the patch description per the 
feedback from Peter.


Version 2: removed lp64 from the target per discussion.  Testing showed 
it is not needed.  The int128 qualifier is sufficient for the test to 
report as unsupported on a 32-bit Power system.


The tests:

  builtins-10-runnable.c
  builtins-10.c
  vec_perm-runnable-i128.c

generate the following errors when run on a 32-bit BE Power system with 
GCC configured with multilib enabled.


FAIL: gcc.target/powerpc/builtins-10-runnable.c (test for excess errors)
FAIL: gcc.target/powerpc/builtins-10.c (test for excess errors)
FAIL: gcc.target/powerpc/vec_perm-runnable-i128.c (test for excess errors)

The tests use the __int128 type, which is not supported on 32-bit 
systems.  A check for the int128 effective target was added to the test 
cases to disable them on systems that do not support the __int128 type.  
The three tests now report "# of unsupported tests 1".
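
For illustration only, a runnable test guarded this way might start out 
like the sketch below.  The dg-do line uses the same target selector as 
the patch.  Everything else is an assumption made for the example: the 
dg-options value and the make_u128 and check_int128 helpers are 
hypothetical, and a real testsuite file would call the check from main 
and use abort () on failure.

```c
/* { dg-do run { target int128 } } */
/* { dg-options "-O2" } */

#include <assert.h>

/* Build a 128-bit value from two 64-bit halves.  This only compiles on
   targets that support __int128, which is why the directive above is
   needed.  */
unsigned __int128
make_u128 (unsigned long long hi, unsigned long long lo)
{
  return ((unsigned __int128) hi << 64) | lo;
}

/* Self-check: both halves must survive the round trip.  */
void
check_int128 (void)
{
  unsigned __int128 v = make_u128 (1, 2);
  assert ((unsigned long long) (v >> 64) == 1);
  assert ((unsigned long long) v == 2);
}
```

On a target without __int128 support the harness skips the file and 
counts it as unsupported instead of reporting excess errors.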


The patch has been tested on a Power 9 BE system with multilib enabled 
for GCC and on a Power 10 LE 64-bit configuration with no regression 
failures.


Please let me know if the patch is acceptable for mainline. Thanks.

   Carl
--
rs6000, update effective target for tests builtins-10*.c and 
vec_perm-runnable-i128.c


The tests:

  builtins-10-runnable.c
  builtins-10.c
  vec_perm-runnable-i128.c

use __int128 types that are not supported on all platforms.  Update the
tests to check int128 effective target to avoid unsupported type errors
on unsupported platforms.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/builtins-10-runnable.c: Add
    target int128.
    * gcc.target/powerpc/builtins-10.c: Add
    target int128.
    * gcc.target/powerpc/vec_perm-runnable-i128.c: Add
    target int128.
---
 gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/builtins-10.c    | 2 +-
 gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c

index dede08358e1..e2d3c990852 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target int128 } } */
 /* { dg-require-effective-target vmx_hw } */
 /* { dg-options "-maltivec -O2 " } */

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10.c 
b/gcc/testsuite/gcc.target/powerpc/builtins-10.c

index b00f53cfc62..007892e2731 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-10.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-10.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target int128 } } */
 /* { dg-options "-O2 -maltivec" } */
 /* { dg-require-effective-target powerpc_altivec } */
 /* { dg-final { scan-assembler-times "xxsel" 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c 
b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c

index 0e0d77bcb84..df1bf873cfc 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target  int128 } } */
 /* { dg-require-effective-target vmx_hw } */
 /* { dg-options "-maltivec -O2 " } */

--
2.45.2




Re: [PATCH ver 2] rs6000, update effective target for tests builtins-10*.c and vec_perm-runnable-i128.c

2024-07-17 Thread Carl Love




On 7/16/24 6:01 PM, Peter Bergner wrote:

On 7/16/24 6:19 PM, Carl Love wrote:

use __int128 types that are not supported on all platforms.  The
__int128 type is only supported on 64-bit platforms.  Need to check that
the platform is 64-bits and support the __int128 type.  Add the int128 and
lp64 flags to the target test.

The test cases themselves look good, but you need to update your git log entry
to not mention the lp64/64-bits since you removed them.

Yes, I didn't get the lp64 references cleaned up properly.  Sorry about that.

  Yes, currently, only
64-bit targets support __int128, but our hope is that one day, even 32-bit
targets will as well.  So how about the following text instead?


...
use __int128 types that are not supported on all platforms.  Update the
tests to check int128 effective target to avoid unsupported type errors
on unsupported platforms.


OK, changed.

 Carl




[PATCH] rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, __builtin_vec_set_v2di

2024-07-17 Thread Carl Love

GCC maintainers:

This patch removes the __builtin_vec_set_v1ti, __builtin_vec_set_v2df 
and __builtin_vec_set_v2di built-ins.  The users should just use normal 
C-code to update the various vector elements.  This change was 
originally intended to be part of the earlier series of cleanup 
patches.  It was initially thought that some additional work would be 
needed to do some gimple generation instead of these built-ins.  
However, the existing default code generation does produce the needed 
code.  The code generated with normal C-code is as good or better than 
the code generated with these built-ins.


The patch has been tested on Power 10 LE with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl

---
rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, 
__builtin_vec_set_v2di


Remove the built-ins, use the default gimple generation instead.

gcc/ChangeLog:
    * config/rs6000/rs6000-builtins.def (__builtin_vec_set_v1ti,
    __builtin_vec_set_v2df, __builtin_vec_set_v2di): Remove built-in
    definitions.
    * config/rs6000/rs6000-c.cc (resolve_vec_insert):  Remove if
    statemnts for mode == V2DFmode, mode == V2DImode and
    mode == V1TImode that reference RS6000_BIF_VEC_SET_V2DF,
    RS6000_BIF_VEC_SET_V2DI and RS6000_BIF_VEC_SET_V1TI.
---
 gcc/config/rs6000/rs6000-builtins.def | 13 -
 gcc/config/rs6000/rs6000-c.cc | 40 ---
 2 files changed, 53 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def

index 896d9686ac6..0ebc940f395 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1263,19 +1263,6 @@
   const signed long long __builtin_vec_ext_v2di (vsll, signed int);
 VEC_EXT_V2DI nothing {extract}

-;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in
-;; resolve_vec_insert(), rs6000-c.cc
-;; TODO: Remove VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI once the uses
-;; in resolve_vec_insert are replaced by the equivalent gimple statements.
-  const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>);
-    VEC_SET_V1TI nothing {set}
-
-  const vd __builtin_vec_set_v2df (vd, double, const int<1>);
-    VEC_SET_V2DF nothing {set}
-
-  const vsll __builtin_vec_set_v2di (vsll, signed long long, const int<1>);
-    VEC_SET_V2DI nothing {set}
-
   const vsc __builtin_vsx_cmpge_16qi (vsc, vsc);
 CMPGE_16QI vector_nltv16qi {}

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 6229c503bd0..c288acc200b 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -1522,46 +1522,6 @@ resolve_vec_insert (resolution *res, vecva_gc> *arglist,

   return error_mark_node;
 }

-  /* If we can use the VSX xxpermdi instruction, use that for insert.  */
-  machine_mode mode = TYPE_MODE (arg1_type);
-
-  if ((mode == V2DFmode || mode == V2DImode)
-  && VECTOR_UNIT_VSX_P (mode)
-  && TREE_CODE (arg2) == INTEGER_CST)
-    {
-  wide_int selector = wi::to_wide (arg2);
-  selector = wi::umod_trunc (selector, 2);
-  arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
-
-  tree call = NULL_TREE;
-  if (mode == V2DFmode)
-    call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V2DF];
-  else if (mode == V2DImode)
-    call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V2DI];
-
-  /* Note, __builtin_vec_insert_ has vector and scalar types
-     reversed.  */
-  if (call)
-    {
-      *res = resolved;
-      return build_call_expr (call, 3, arg1, arg0, arg2);
-    }
-    }
-
-  else if (mode == V1TImode
-       && VECTOR_UNIT_VSX_P (mode)
-       && TREE_CODE (arg2) == INTEGER_CST)
-    {
-  tree call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V1TI];
-  wide_int selector = wi::zero(32);
-  arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
-
-  /* Note, __builtin_vec_insert_ has vector and scalar types
-     reversed.  */
-  *res = resolved;
-  return build_call_expr (call, 3, arg1, arg0, arg2);
-    }
-
   /* Build *(((arg1_inner_type*) & (vector type){arg1}) + arg2) = arg0 
with

  VIEW_CONVERT_EXPR.  i.e.:
    D.3192 = v1;
--
2.45.2




[PATCH] rs6000, remove __builtin_vsx_xvcmp* built-ins

2024-07-17 Thread Carl Love

GCC maintainers:

The following patch removes the three __builtin_vsx_xvcmp[eq|ge|gt]sp 
built-ins as they are similar to the overloaded vec_cmp[eq|ge|gt] 
built-ins.  The difference is that the overloaded built-ins return a 
vector of booleans or a vector of long long booleans, whereas the 
removed built-ins returned a vector of floats or a vector of doubles.


The tests for __builtin_vsx_xvcmp[eq|ge|gt]sp and 
__builtin_vsx_xvcmp[eq|ge|gt]dp are updated to use the overloaded 
vec_cmp[eq|ge|gt] built-in with the required changes for the return 
type.  Note __builtin_vsx_xvcmp[eq|ge|gt]dp are used internally.


The patches have been tested on a Power 10 LE system with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl
-
rs6000, remove __builtin_vsx_xvcmp* built-ins

This patch removes the built-ins:
 __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp,
 __builtin_vsx_xvcmpgtsp.

which are similar to the overloaded vec_cmpeq, vec_cmpgt and vec_cmpge
built-ins.

The difference is that the overloaded built-ins return a vector of
booleans or a vector of long long booleans depending on whether the
inputs are a vector of floats or a vector of doubles.  The removed
built-ins returned a vector of floats or a vector of doubles for the
vector float and vector double inputs respectively.

The __builtin_vsx_xvcmpeqdp, __builtin_vsx_xvcmpgedp and
__builtin_vsx_xvcmpgtdp are not removed as they are used by the
overloaded vec_cmpeq, vec_cmpgt and vec_cmpge built-ins.

The test cases for the __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp,
__builtin_vsx_xvcmpgtsp, __builtin_vsx_xvcmpeqdp,
__builtin_vsx_xvcmpgedp and __builtin_vsx_xvcmpgtdp  are changed to use
the overloaded vec_cmpeq, vec_cmpgt, vec_cmpge built-ins.  Use of the
overloaded built-ins requires the result to be stored in a vector of
boolean of the appropriate size or the result must be cast to the return
type used by the original __builtin_vsx_xvcmp* built-ins.
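
The return-type difference described above can be sketched with GCC's
generic vector extension, which builds on any target.  This is only an
analogy, not the rs6000 built-ins themselves: the helper names
cmpeq_mask and mask_as_float are hypothetical.  A lane-wise comparison
yields an integer mask (-1 for true, 0 for false), and getting a float
vector back requires an explicit reinterpretation of those bits, much
like the cast now needed in the test case.

```c
#include <string.h>

typedef float v4sf __attribute__ ((vector_size (16)));
typedef int   v4si __attribute__ ((vector_size (16)));

/* Compare lane by lane.  Each lane of the result is -1 (all bits set)
   where the inputs are equal and 0 where they differ.  */
static v4si
cmpeq_mask (v4sf a, v4sf b)
{
  return a == b;
}

/* Reinterpret the mask bits as floats, which is what storing the
   comparison result into a float vector amounts to.  An all-ones lane
   is a NaN bit pattern, an all-zero lane is 0.0f.  */
static v4sf
mask_as_float (v4si mask)
{
  v4sf out;
  memcpy (&out, &mask, sizeof out);
  return out;
}
```

A caller that only wants the truth value should keep the integer mask;
the float view is useful only when the old return type must be kept.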
---
 gcc/config/rs6000/rs6000-builtins.def | 10 ---
 .../gcc.target/powerpc/vsx-builtin-3.c    | 28 ++-
 2 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def

index 77eb0f7e406..896d9686ac6 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1579,30 +1579,20 @@
   const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd);
 XVCMPEQDP_P vector_eq_v2df_p {pred}

-  const vf __builtin_vsx_xvcmpeqsp (vf, vf);
-    XVCMPEQSP vector_eqv4sf {}
-
   const vd __builtin_vsx_xvcmpgedp (vd, vd);
 XVCMPGEDP vector_gev2df {}

   const signed int __builtin_vsx_xvcmpgedp_p (signed int, vd, vd);
 XVCMPGEDP_P vector_ge_v2df_p {pred}

-  const vf __builtin_vsx_xvcmpgesp (vf, vf);
-    XVCMPGESP vector_gev4sf {}
-
   const signed int __builtin_vsx_xvcmpgesp_p (signed int, vf, vf);
 XVCMPGESP_P vector_ge_v4sf_p {pred}

   const vd __builtin_vsx_xvcmpgtdp (vd, vd);
 XVCMPGTDP vector_gtv2df {}
-
   const signed int __builtin_vsx_xvcmpgtdp_p (signed int, vd, vd);
 XVCMPGTDP_P vector_gt_v2df_p {pred}

-  const vf __builtin_vsx_xvcmpgtsp (vf, vf);
-    XVCMPGTSP vector_gtv4sf {}
-
   const signed int __builtin_vsx_xvcmpgtsp_p (signed int, vf, vf);
 XVCMPGTSP_P vector_gt_v4sf_p {pred}

diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c

index 60f91aad23c..d67f97c8011 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
@@ -156,13 +156,27 @@ int do_cmp (void)
 {
   int i = 0;

-  d[i][0] = __builtin_vsx_xvcmpeqdp (d[i][1], d[i][2]); i++;
-  d[i][0] = __builtin_vsx_xvcmpgtdp (d[i][1], d[i][2]); i++;
-  d[i][0] = __builtin_vsx_xvcmpgedp (d[i][1], d[i][2]); i++;
-
-  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
-  f[i][0] = __builtin_vsx_xvcmpgtsp (f[i][1], f[i][2]); i++;
-  f[i][0] = __builtin_vsx_xvcmpgesp (f[i][1], f[i][2]); i++;
+  /* The __builtin_vsx_xvcmp[gt|ge|eq]dp and __builtin_vsx_xvcmp[gt|ge|eq]sp
+ have been removed in favor of the overloaded vec_cmpeq, vec_cmpgt and
+ vec_cmpge built-ins.  The __builtin_vsx_xvcmp* builtins returned a vector
+ result of the same type as the arguments.  The vec_cmp* built-ins return
+ a vector of booleans of the same size as the arguments. Thus the result
+ assignment must be to a boolean or cast to a boolean.  Test both cases.
+  */
+
+  d[i][0] = (vector double) vec_cmpeq (d[i][1], d[i][2]); i++;
+  d[i][0] = (vector double) vec_cmpgt (d[i][1], d[i][2]); i++;
+  d[i][0] = (vector double) vec_cmpge (d[i][1], d[i][2]); i++;
+  bl[i][0] = vec_cmpeq (d[i][1], d[i][2]); i++;
+  bl[i][0] = vec_cmpgt (d[i][1], d[i][2]); i++;
+  bl[i][0] = vec_cmpge (d[i][1], d[i][2]); i++;
+

[PATCH ver 2] rs6000, update effective target for tests builtins-10*.c and, vec_perm-runnable-i128.c

2024-07-16 Thread Carl Love



GCC maintainers:

Version 2, removed the lp64 from the target per discussion.  Tested and 
it is not needed.  The int128 qualifier is sufficient for the test to 
report as unsupported on a 32-bit Power system.
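
For readers outside the testsuite, a rough source-level counterpart of
that check is sketched below.  It relies on the __SIZEOF_INT128__ macro
that GCC predefines when __int128 is available; the testsuite's int128
effective target is a separate mechanism that this sketch does not
reproduce, and the helper names have_int128 and mul_high64 are
hypothetical.

```c
#include <stdint.h>

/* Return 1 when the compiler provides __int128 for this target,
   0 otherwise.  */
static int
have_int128 (void)
{
#ifdef __SIZEOF_INT128__
  return 1;
#else
  return 0;
#endif
}

#ifdef __SIZEOF_INT128__
/* High 64 bits of the full 128-bit product of A and B.  Only compiled
   where __int128 exists, so a 32-bit build skips it instead of
   failing with excess errors.  */
static uint64_t
mul_high64 (uint64_t a, uint64_t b)
{
  unsigned __int128 p = (unsigned __int128) a * b;
  return (uint64_t) (p >> 64);
}
#endif
```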


The tests:

  tests builtins-10-runnable.c
  tests builtins-10.c
  vec_perm-runnable-i128.c

generate the following errors when run on a 32-bit BE Power system with 
GCC configured with multilib enabled.


FAIL: gcc.target/powerpc/builtins-10-runnable.c (test for excess errors)
FAIL: gcc.target/powerpc/builtins-10.c (test for excess errors)
FAIL: gcc.target/powerpc/vec_perm-runnable-i128.c (test for excess errors)

The tests use the __int128 type which is not supported on 32-bit 
systems.  The test for int128 was added to the test cases to disable 
the tests on 32-bit systems and other systems that do not support the 
__int128 type.  The three tests now report "# of unsupported tests 1".


The patch has been tested on a Power 9 BE system with multilib enabled 
for GCC and on a Power 10 LE 64-bit configuration with no regression 
failures.


Please let me know if the patch is acceptable for mainline. Thanks.

   Carl



[PATCH] rs6000, update effective target for tests builtins-10*.c and 
vec_perm-runnable-i128.c


The tests:

  tests builtins-10-runnable.c
  tests builtins-10.c
  vec_perm-runnable-i128.c

use __int128 types that are not supported on all platforms.  The
__int128 type is only supported on 64-bit platforms.  Add the int128
effective target to the dg-do directive so the tests are reported as
unsupported where the __int128 type is not available.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/builtins-10-runnable.c: Add
    target int128.
    * gcc.target/powerpc/builtins-10.c: Add
    target int128.
    * gcc.target/powerpc/vec_perm-runnable-i128.c: Add
    target int128.
---
 gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/builtins-10.c    | 2 +-
 gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c

index dede08358e1..e2d3c990852 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target int128 } } */
 /* { dg-require-effective-target vmx_hw } */
 /* { dg-options "-maltivec -O2 " } */

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10.c 
b/gcc/testsuite/gcc.target/powerpc/builtins-10.c

index b00f53cfc62..007892e2731 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-10.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-10.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target int128 } } */
 /* { dg-options "-O2 -maltivec" } */
 /* { dg-require-effective-target powerpc_altivec } */
 /* { dg-final { scan-assembler-times "xxsel" 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c 
b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c

index 0e0d77bcb84..df1bf873cfc 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target  int128 } } */
 /* { dg-require-effective-target vmx_hw } */
 /* { dg-options "-maltivec -O2 " } */

--
2.45.2




Re: [PATCH] rs6000, update effective target for tests builtins-10*.c and, vec_perm-runnable-i128.c

2024-07-16 Thread Carl Love

Peter:

On 7/15/24 4:14 PM, Peter Bergner wrote:

On 7/15/24 5:43 PM, Carl Love wrote:

-/* { dg-do run } */
+/* { dg-do run { target { lp64 } && { int128 } } } */

Why isn't this just:

   /* { dg-do run { target int128 } } */

???   The int128 test should disable this on 32-bit systems just fine.


I agree it seems like that should work.  I had tried just the int128 
initially but was still getting errors so I added the { lp64 } and that 
fixed it.

That said, I went back and tried dg-do run { target int128 } again on one of 
the files.  Now it seems to work?  Hmm, I guess I must have had a typo or 
something when I first tried it.  I will try fixing the patch for all of the 
test files and retest to see if just int128 works.

Carl



[PATCH] rs6000, update effective target for tests builtins-10*.c and, vec_perm-runnable-i128.c

2024-07-15 Thread Carl Love

GCC maintainers:

The tests:

  tests builtins-10-runnable.c
  tests builtins-10.c
  vec_perm-runnable-i128.c

generate the following errors when run on a 32-bit BE Power system with 
GCC configured with multilib enabled.


FAIL: gcc.target/powerpc/builtins-10-runnable.c (test for excess errors)
FAIL: gcc.target/powerpc/builtins-10.c (test for excess errors)
FAIL: gcc.target/powerpc/vec_perm-runnable-i128.c (test for excess errors)

The tests use the __int128 type which is not supported on 32-bit 
systems.  The test for int128 and lp64 was added to the test cases to 
disable the test on 32-bit systems and systems that do not support the 
__int128 type.  The three tests now report "# of unsupported tests 1".


The patch has been tested on a Power 9 BE system with multilib enabled 
for GCC and on a Power 10 LE 64-bit configuration with no regression 
failures.


Please let me know if the patch is acceptable for mainline. Thanks.

   Carl

--
rs6000, update effective target for tests builtins-10*.c and 
vec_perm-runnable-i128.c


The tests:

  tests builtins-10-runnable.c
  tests builtins-10.c
  vec_perm-runnable-i128.c

use __int128 types that are not supported on all platforms.  The
__int128 type is only supported on 64-bit platforms.  The tests need to
check that the platform is 64-bit and supports the __int128 type.  Add
the int128 and lp64 flags to the target test.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/builtins-10-runnable.c: Add
    target int128 and lp64.
    * gcc.target/powerpc/builtins-10.c: Add
    target int128 and lp64.
    * gcc.target/powerpc/vec_perm-runnable-i128.c: Add
    target int128 and lp64.
---
 gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/builtins-10.c    | 2 +-
 gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c

index dede08358e1..da3011d4c00 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target { lp64 } && { int128 } } } */
 /* { dg-require-effective-target vmx_hw } */
 /* { dg-options "-maltivec -O2 " } */

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10.c 
b/gcc/testsuite/gcc.target/powerpc/builtins-10.c

index b00f53cfc62..bc3cdb69305 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-10.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-10.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target { lp64 } && { int128 } } } */
 /* { dg-options "-O2 -maltivec" } */
 /* { dg-require-effective-target powerpc_altivec } */
 /* { dg-final { scan-assembler-times "xxsel" 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c 
b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c

index 0e0d77bcb84..c9b8a2053b7 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target { lp64 } && { int128 } } } */
 /* { dg-require-effective-target vmx_hw } */
 /* { dg-options "-maltivec -O2 " } */

--
2.45.2




Re: [PATCH 13/13 ver4] rs6000, remove vector set and vector init built-ins

2024-07-03 Thread Carl Love

Kewen:

On 6/18/24 20:04, Kewen.Lin wrote:


Hi Carl,

on 2024/6/14 03:40, Carl Love wrote:

GCC maintainers:

The patch has been updated per the feedback from version 3.  Please let me know 
if the patch is acceptable for mainline.

Thanks.

   Carl

--

rs6000, remove vector set and vector init built-ins

The vector init built-ins:

   __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
   __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
   __builtin_vec_init_v2di, __builtin_vec_init_v2df,
   __builtin_vec_init_v1ti

perform the same operation as initializing the vector in C code.  For
example:

   result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
   result_v4si = {1, 2, 3, 4};

These two constructs were tested and verified to generate identical
assembly instructions both with no optimization and with -O3.

The vector set built-ins:

   __builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
   __builtin_vec_set_v4si, __builtin_vec_set_v4sf,
   __builtin_vec_set_v1ti, __builtin_vec_set_v2di,
   __builtin_vec_set_v2df

perform the same operation as setting a specific element in the vector in
C code.  For example:

   src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
   src_v4si[index] = int_val;

The built-in actually generates more instructions than the inline C code
with no optimization but is identical with -O3 optimizations.

All of the above built-ins that are removed do not have test cases and
are not documented.

Built-ins __builtin_vec_set_v1ti, __builtin_vec_set_v2di and
__builtin_vec_set_v2df are not removed as they are used in function
resolve_vec_insert() in file rs6000-c.cc.

The built-ins are removed as they don't provide any benefit over just
using C code.
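
The plain C forms that replace the removed built-ins can be sketched
with GCC's generic vector extension, which is not specific to rs6000.
The wrapper names init_v4si and set_v4si are hypothetical, and the
index masking is this sketch's own choice to keep the subscript in
range; it is not taken from the patch.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Equivalent of __builtin_vec_init_v4si (a, b, c, d): a brace
   initializer builds the vector directly.  */
static v4si
init_v4si (int a, int b, int c, int d)
{
  v4si v = {a, b, c, d};
  return v;
}

/* Equivalent of __builtin_vec_set_v4si (v, val, index): ordinary
   subscripting assigns one element.  The "& 3" keeps the index within
   the four elements (sketch assumption).  */
static v4si
set_v4si (v4si v, int val, int index)
{
  v[index & 3] = val;
  return v;
}
```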

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vec_init_v16qi,
__builtin_vec_init_v4sf, __builtin_vec_init_v4si,
__builtin_vec_init_v8hi, __builtin_vec_init_v1ti,
__builtin_vec_init_v2df, __builtin_vec_init_v2di,
__builtin_vec_set_v16qi, __builtin_vec_set_v4sf,
__builtin_vec_set_v4si, __builtin_vec_set_v8hi): Remove
built-in definitions.
---
  gcc/config/rs6000/rs6000-builtins.def | 44 +++
  1 file changed, 4 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 02aa04e5698..053dc0115d2 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1118,37 +1118,6 @@
const signed short __builtin_vec_ext_v8hi (vss, signed int);
  VEC_EXT_V8HI nothing {extract}
  
-  const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, \

-signed char, signed char, signed char, signed char, signed char, \
-signed char, signed char, signed char, signed char, signed char, \
-signed char, signed char, signed char);
-VEC_INIT_V16QI nothing {init}

I just realized this {init} is customized for vec_init only, these removed
vec_init bifs are the only users of it, so we should remove this attribute
as well.  Sorry that I should have found and pointed out this in the
previous review.  I think it means some removals are needed on:

 1) comments in rs6000-builtins.def
;   init Process as a vec_init function

 2) related gen code for this attribute bit, like:

   fprintf (header_file, "#define bif_init_bit\t\t(0x0001)\n");
   fprintf (header_file,
   "#define bif_is_init(x)\t\t((x).bifattrs & bif_init_bit)\n");
   if (bifp->attrs.isinit)
fprintf (init_file, " | bif_init_bit");


OK, Yes, we can remove the attribute string for the vec_init built-in.  In 
addition to the code you mentioned, we will need to remove the uses of 
bif_init_bit, bif_is_init and the function altivec_expand_vec_init_builtin.
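
For context, the attribute bit being removed follows the usual
flag-in-a-bitmask pattern.  A minimal sketch is below: the two macros
are copied from the generated-header lines quoted above, while the
struct bif_sketch type is hypothetical and stands in for the real
built-in info record, whose layout is not shown in this thread.

```c
/* Macros as emitted into the generated header (quoted above).  */
#define bif_init_bit    (0x0001)
#define bif_is_init(x)  ((x).bifattrs & bif_init_bit)

/* Hypothetical stand-in for the per-built-in record; only the
   bifattrs field name comes from the quoted macro.  */
struct bif_sketch
{
  const char *name;
  unsigned int bifattrs;
};
```

Once no built-in carries the {init} attribute, nothing sets the bit, so
the macros and every bif_is_init test become dead code.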

  Carl



Re: [PATCH 13/13 ver5] rs6000, remove vector set and vector init built-ins.

2024-07-03 Thread Carl Love

 GCC maintainers:

The patch has been updated to remove the customized vec_init built-in 
code.  Specifically, it removes the init identifier, the related 
generated code for the init built-in attribute bit, the function 
altivec_expand_vec_init_builtin and the calls to that function.


Please let me know if the patch is acceptable for mainline. Thanks.

  Carl

---

rs6000, remove vector set and vector init built-ins.

The vector init built-ins:

  __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
  __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
  __builtin_vec_init_v2di, __builtin_vec_init_v2df,
  __builtin_vec_init_v1ti

perform the same operation as initializing the vector in C code. For
example:

  result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
  result_v4si = {1, 2, 3, 4};

These two constructs were tested and verified to generate identical
assembly instructions both with no optimization and with -O3.

The vector set built-ins:

  __builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
  __builtin_vec_set_v4si, __builtin_vec_set_v4sf,
  __builtin_vec_set_v1ti, __builtin_vec_set_v2di,
  __builtin_vec_set_v2df

perform the same operation as setting a specific element in the vector in
C code.  For example:

  src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
  src_v4si[index] = int_val;

The built-in actually generates more instructions than the inline C code
with no optimization but is identical with -O3 optimizations.

All of the above built-ins that are removed do not have test cases and
are not documented.

Built-ins __builtin_vec_set_v1ti, __builtin_vec_set_v2di and
__builtin_vec_set_v2df are not removed as they are used in function
resolve_vec_insert() in file rs6000-c.cc.

The built-ins are removed as they don't provide any benefit over just
using C code.

The code to define the bif_init_bit, bif_is_init, as well as their uses
is removed.  The function altivec_expand_vec_init_builtin is also removed.

gcc/ChangeLog:
    * config/rs6000/rs6000-builtin.cc (altivec_expand_vec_init_builtin):
    Removed the function.
    (rs6000_expand_builtin): Removed the if bif_is_init check to call
    the altivec_expand_vec_init_builtin function.
    * config/rs6000/rs6000-builtins.def: Removed the attribute string
    comment for init.
    (__builtin_vec_init_v16qi,
    __builtin_vec_init_v4sf, __builtin_vec_init_v4si,
    __builtin_vec_init_v8hi, __builtin_vec_init_v1ti,
    __builtin_vec_init_v2df, __builtin_vec_init_v2di,
    __builtin_vec_set_v16qi, __builtin_vec_set_v4sf,
    __builtin_vec_set_v4si, __builtin_vec_set_v8hi): Remove
    built-in definitions.
    * config/rs6000/rs6000-gen-builtins.cc: Removed comment for init
    attribute string.
    (struct attrinfo): Removed isinit entry.
    (parse_bif_attrs): Removed the if statement to check for attribute
    init.
    (ifdef DEBUG): Removed print for init attribute string.
    (write_decls): Removed print for define bif_init_bit and
    define for bif_is_init.
    (write_bif_static_init): Removed if bifp->attrs.isinit statement.
---
 gcc/config/rs6000/rs6000-builtin.cc  | 40 -
 gcc/config/rs6000/rs6000-builtins.def    | 45 +++-
 gcc/config/rs6000/rs6000-gen-builtins.cc | 16 +++--
 3 files changed, 8 insertions(+), 93 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc

index 646e740774e..0a24d20a58c 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -2313,43 +2313,6 @@ altivec_expand_predicate_builtin (enum insn_code 
icode, tree exp, rtx target)

   return target;
 }

-/* Expand vec_init builtin.  */
-static rtx
-altivec_expand_vec_init_builtin (tree type, tree exp, rtx target)
-{
-  machine_mode tmode = TYPE_MODE (type);
-  machine_mode inner_mode = GET_MODE_INNER (tmode);
-  int i, n_elt = GET_MODE_NUNITS (tmode);
-
-  gcc_assert (VECTOR_MODE_P (tmode));
-  gcc_assert (n_elt == call_expr_nargs (exp));
-
-  if (!target || !register_operand (target, tmode))
-    target = gen_reg_rtx (tmode);
-
-  /* If we have a vector compromised of a single element, such as 
V1TImode, do

- the initialization directly.  */
-  if (n_elt == 1 && GET_MODE_SIZE (tmode) == GET_MODE_SIZE (inner_mode))
-    {
-  rtx x = expand_normal (CALL_EXPR_ARG (exp, 0));
-  emit_move_insn (target, gen_lowpart (tmode, x));
-    }
-  else
-    {
-  rtvec v = rtvec_alloc (n_elt);
-
-  for (i = 0; i < n_elt; ++i)
-    {
-      rtx x = expand_normal (CALL_EXPR_ARG (exp, i));
-      RTVEC_ELT (v, i) = gen_lowpart (inner_mode, x);
-    }
-
-  rs6000_expand_vector_init (target, gen_rtx_PARALLEL (tmode, v));
-    }
-
-  return target;
-}
-
 /* Return the integer constant in ARG.  Constrain it to be in the range
    of the subparts of VEC_TYPE; issue an error if not.  */

@@ -3401,9 +3364,6 @@ rs6000_expand_builtin (tree exp, rtx target, rtx 
/

Re: [PATCH 4/13 ver5] rs6000, extend the current vec_{un, }signed{e, o} built-ins

2024-07-03 Thread Carl Love



GCC maintainers:

I moved the removal of built-ins __builtin_vsx_xvcvdpsxws and 
__builtin_vsx_xvcvdpuxws from patch 4 to patch 2.


I fixed various issues with the ChangeLog wording, spaces and descriptions.

Fixed the comments in file gcc/config/rs6000/vsx.md.

Updated the built-in description in gcc/doc/extend.texi.

Please let me know if the patch is acceptable for mainline. Thanks.

Carl



 rs6000, extend the current vec_{un,}signed{e,o}  built-ins

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
convert a vector of floats to a vector of signed/unsigned long long ints.
Extend the existing vec_{un,}signed{e,o} built-ins to accept a vector of
floats and return a vector of signed/unsigned long long integers
converted from the even/odd elements.
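
As a rough scalar model of the extended built-ins, assuming "even"
means elements 0 and 2 and "odd" means elements 1 and 3 of the float
vector: the function names below are hypothetical, endian-dependent
element numbering is not modeled, and out-of-range or NaN inputs are
not handled (the C cast is undefined for them, unlike the hardware
instruction).

```c
/* Model of vec_signede on a vector float: convert elements 0 and 2
   to signed long long, truncating toward zero.  */
static void
signede_model (const float in[4], long long out[2])
{
  out[0] = (long long) in[0];
  out[1] = (long long) in[2];
}

/* Model of vec_signedo on a vector float: convert elements 1 and 3.  */
static void
signedo_model (const float in[4], long long out[2])
{
  out[0] = (long long) in[1];
  out[1] = (long long) in[3];
}
```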

The define_expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf and
vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o}
built-ins.

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are
now for internal use only. They are not documented and they do not
have test cases.

Add testcases and update documentation.

gcc/ChangeLog:
    * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxds,
    __builtin_vsx_xvcvspuxds): Rename to __builtin_vsignede_v4sf,
    __builtin_vunsignede_v4sf respectively.
    (XVCVSPSXDS, XVCVSPUXDS): Rename to VEC_VSIGNEDE_V4SF,
    VEC_VUNSIGNEDE_V4SF respectively.
    (__builtin_vsignedo_v4sf, __builtin_vunsignedo_v4sf): New
    built-in definitions.
    * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo,
    vec_unsignede, vec_unsignedo): Add new overloaded specifications.
    * config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf,
    vunsignede_v4sf, vunsignedo_v4sf): New define_expands.
    * doc/extend.texi (vec_signedo, vec_signede, vec_unsignedo,
    vec_unsignede): Add documentation for new overloaded built-ins to
    convert vector float to vector {un,}signed long long.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/builtins-3-runnable.c
    (test_unsigned_int_result, test_ll_unsigned_int_result): Add
    new argument.
    (vec_signede, vec_signedo, vec_unsignede, vec_unsignedo): New
    tests for the overloaded built-ins.
---
 gcc/config/rs6000/rs6000-builtins.def | 14 +++-
 gcc/config/rs6000/rs6000-overload.def |  8 ++
 gcc/config/rs6000/vsx.md  | 84 +++
 gcc/doc/extend.texi   | 10 +++
 .../gcc.target/powerpc/builtins-3-runnable.c  | 49 +--
 5 files changed, 154 insertions(+), 11 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def

index 43d5c229dc3..29a9deb3410 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1697,11 +1697,17 @@
   const vd __builtin_vsx_xvcvspdp (vf);
 XVCVSPDP vsx_xvcvspdp {}

-  const vsll __builtin_vsx_xvcvspsxds (vf);
-    XVCVSPSXDS vsx_xvcvspsxds {}
+  const vsll __builtin_vsignede_v4sf (vf);
+    VEC_VSIGNEDE_V4SF vsignede_v4sf {}

-  const vsll __builtin_vsx_xvcvspuxds (vf);
-    XVCVSPUXDS vsx_xvcvspuxds {}
+  const vsll __builtin_vsignedo_v4sf (vf);
+    VEC_VSIGNEDO_V4SF vsignedo_v4sf {}
+
+  const vull __builtin_vunsignede_v4sf (vf);
+    VEC_VUNSIGNEDE_V4SF vunsignede_v4sf {}
+
+  const vull __builtin_vunsignedo_v4sf (vf);
+    VEC_VUNSIGNEDO_V4SF vunsignedo_v4sf {}

   const vd __builtin_vsx_xvcvsxddp (vsll);
 XVCVSXDDP vsx_floatv2div2df2 {}
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def

index 84bd9ae6554..4d857bb1af3 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3307,10 +3307,14 @@
 [VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede]
   vsi __builtin_vec_vsignede (vd);
 VEC_VSIGNEDE_V2DF
+  vsll __builtin_vec_vsignede (vf);
+    VEC_VSIGNEDE_V4SF

 [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo]
   vsi __builtin_vec_vsignedo (vd);
 VEC_VSIGNEDO_V2DF
+  vsll __builtin_vec_vsignedo (vf);
+    VEC_VSIGNEDO_V4SF

 [VEC_SIGNEXTI, vec_signexti, __builtin_vec_signexti]
   vsi __builtin_vec_signexti (vsc);
@@ -4433,10 +4437,14 @@
 [VEC_UNSIGNEDE, vec_unsignede, __builtin_vec_vunsignede]
   vui __builtin_vec_vunsignede (vd);
 VEC_VUNSIGNEDE_V2DF
+  vull __builtin_vec_vunsignede (vf);
+    VEC_VUNSIGNEDE_V4SF

 [VEC_UNSIGNEDO, vec_unsignedo, __builtin_vec_vunsignedo]
   vui __builtin_vec_vunsignedo (vd);
 VEC_VUNSIGNEDO_V2DF
+  vull __builtin_vec_vunsignedo (vf);
+    VEC_VUNSIGNEDO_V4SF

 [VEC_VEE, vec_extract_exp, __builtin_vec_extract_exp]
   vui __builtin_vec_extract_exp (vf);
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 48ba262f7e4..0f0837a1d43 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2704,6 +2704,90 @@
   DONE;
 })

+;; Convert float vector even elements to signed long long vector
+(define_expand "vsignede_v4sf"
+  [(match_operand:V2DI 0 "vsx_register_operand")
+   (match_

Re: [PATCH 2/13 ver5] rs6000, __builtin_vsx_xvcv{sp{sx,u}ws,dpuxds_uns}

2024-07-03 Thread Carl Love

GCC maintainers:

Per the comments on patch 2 from version 4, I have moved the removal of 
built-ins __builtin_vsx_xvcvdpsxws and __builtin_vsx_xvcvdpuxws from patch 4 to 
this patch.

Please let me know if this patch is acceptable.  Thanks.

Carl



rs6000, __builtin_vsx_xvcv{sp{sx,u}ws,dpuxds_uns}

The built-in __builtin_vsx_xvcvspsxws is covered by the vec_signed
built-in that is documented in the PVIPR.  The __builtin_vsx_xvcvspsxws
built-in is not documented and there are no test cases for it.

The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by
vec_unsigned, remove.

The __builtin_vsx_xvcvspuxws is redundant as it is covered by
vec_unsigned, remove.

The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by
vec_signed{e,o}, remove.

The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by
vec_unsigned{e,o}, remove.

This patch removes the redundant built-ins.

gcc/ChangeLog:
    * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxws,
    __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws,
    __builtin_vsx_xvcvdpsxws, __builtin_vsx_xvcvdpuxws): Remove
    built-in definitions.
---
 gcc/config/rs6000/rs6000-builtins.def | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def

index 7c36976a089..60ccc5542be 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1688,36 +1688,21 @@
   const vsll __builtin_vsx_xvcvdpsxds_scale (vd, const int);
 XVCVDPSXDS_SCALE vsx_xvcvdpsxds_scale {}

-  const vsi __builtin_vsx_xvcvdpsxws (vd);
-    XVCVDPSXWS vsx_xvcvdpsxws {}
-
   const vsll __builtin_vsx_xvcvdpuxds (vd);
 XVCVDPUXDS vsx_fixuns_truncv2dfv2di2 {}

   const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
 XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}

-  const vull __builtin_vsx_xvcvdpuxds_uns (vd);
-    XVCVDPUXDS_UNS vsx_fixuns_truncv2dfv2di2 {}
-
-  const vsi __builtin_vsx_xvcvdpuxws (vd);
-    XVCVDPUXWS vsx_xvcvdpuxws {}
-
   const vd __builtin_vsx_xvcvspdp (vf);
 XVCVSPDP vsx_xvcvspdp {}

   const vsll __builtin_vsx_xvcvspsxds (vf);
 XVCVSPSXDS vsx_xvcvspsxds {}

-  const vsi __builtin_vsx_xvcvspsxws (vf);
-    XVCVSPSXWS vsx_fix_truncv4sfv4si2 {}
-
   const vsll __builtin_vsx_xvcvspuxds (vf);
 XVCVSPUXDS vsx_xvcvspuxds {}

-  const vsi __builtin_vsx_xvcvspuxws (vf);
-    XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {}
-
   const vd __builtin_vsx_xvcvsxddp (vsll);
 XVCVSXDDP vsx_floatv2div2df2 {}

--
2.45.0




[PATCH 0/13 ver5] rs6000, built-in cleanup patch series

2024-07-03 Thread Carl Love

GCC maintainers:

The following are the updates to the three patches that have yet to be approved.

Patches 1, 3, 5, 6, 8, 9, 10, and 12 were approved in version 3 or earlier.

Patches 7 and 11 from version 4 were approved with minor nits fixed.

This leaves patches 2, 4 and 13 still to be approved. Only these unapproved 
patches are posted in the version 5 series.

The goal is to commit the entire series all at once as they are all related.  
So I am holding off committing the approved patches.

Thank you for your time and feedback on these patches.  The entire patch series 
has been tested on Power 10 LE as the changes are fairly minor.

Please let me know if the remaining patches are acceptable for mainline.  
Thanks.

 Carl



Re: [PATCH] rs6000, update vec_ld, vec_lde, vec_st and vec_ste, documentation

2024-07-03 Thread Carl Love



On 7/3/24 2:36 AM, Kewen.Lin wrote:

Hi Carl,

on 2024/6/27 01:05, Carl Love wrote:

GCC maintainers:

The following patch updates the user documentation for the vec_ld, vec_lde, 
vec_st and vec_ste built-ins to make it clearer that there are data alignment 
requirements for these built-ins.  If the data alignment requirements are not 
followed, the data loaded or stored by these built-ins will be wrong.

Please let me know if this patch is acceptable for mainline.  Thanks.

   Carl


rs6000, update vec_ld, vec_lde, vec_st and vec_ste documentation

Use of the vec_ld and vec_st built-ins requires that the data be 16-byte
aligned to work properly.  Add some additional text to the existing
documentation to make this clearer to the user.

Similarly, the vec_lde and vec_ste built-ins also have data alignment
requirements based on the size of the vector element.  Update the
documentation to make this clear to the user.
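
The practical effect of the alignment requirement can be sketched in
portable C.  The LVX and STVX instructions ignore the four low-order
bits of the calculated address, so an unaligned pointer silently
accesses the enclosing 16-byte block.  The helper names below are
hypothetical, and treating the element-sized case as the same
rounding-down rule is an assumption made for illustration.

```c
#include <stdint.h>

/* Clear the low-order bits so ADDR becomes a multiple of SIZE.
   SIZE must be a power of two.  */
static uintptr_t
align_down (uintptr_t addr, uintptr_t size)
{
  return addr & ~(size - 1);
}

/* Address a 16-byte vector load or store actually uses when the four
   low-order bits of the calculated address are ignored.  */
static uintptr_t
effective_address_16 (uintptr_t addr)
{
  return align_down (addr, 16);
}

/* Nonzero when P is already 16-byte aligned, so no bits are dropped.  */
static int
is_aligned_16 (const void *p)
{
  return ((uintptr_t) p & 15) == 0;
}
```

Checking is_aligned_16 on a pointer before handing it to vec_ld or
vec_st is one way to catch data that would otherwise be loaded or
stored at the wrong address.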

gcc/ChangeLog:
* doc/extend.texi: Add clarification for the use of the vec_ld,
vec_st, vec_lde and vec_ste built-ins.
---
  gcc/doc/extend.texi | 15 +++
  1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ee3644a5264..55faded17b9 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22644,10 +22644,17 @@ vector unsigned char vec_xxsldi (vector unsigned char,
  @end smallexample
  
  Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always

-generate the AltiVec @samp{LVX} and @samp{STVX} instructions even
-if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
-@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
-@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
+generate the AltiVec @samp{LVX}, and @samp{STVX} instructions.  The

This change removed "even if the VSX instruction set is available."; I think
that is not intentional?  vec_ld and vec_st are well defined in the PVIPR, so this
paragraph is not meant to document them IMHO.  Since we document vec_vsx_ld and
vec_vsx_st here, it aims to note the difference between these two pairs.  But I'm
not opposed to adding more words to emphasize the special masking off; I prefer to
use the same words as the PVIPR: "ignoring the four low-order bits of the
calculated address".  And IMHO we should not say "it requires the data to be
16-byte aligned to work properly", in case users are well aware of this behavior,
have some non-16-byte-aligned data and expect it to behave like that; it's
arguable to define that as not working properly.


Yea, probably should have left "even if the VSX instruction set is available."


I was looking to make it clear that if the data is not 16-byte aligned 
you may not get the expected data loaded/stored.


So how about the following instead:

   Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
   generate the AltiVec @samp{LVX}, and @samp{STVX} instructions even
   if the VSX instruction set is available. The instructions mask off the
   lower 4-bits of the calculated address. The use of these instructions on
   data that is not 16-byte aligned may result in unexpected bytes being
   loaded or stored.
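
For anyone following along, the masking described in the proposed text can
be modelled in portable C.  This is only a sketch of the address arithmetic,
not how GCC or the hardware implements it; the helper names
lvx_effective_address and emulate_lvx are made up for illustration, and the
emulation assumes the buffer starts on a 16-byte boundary and covers the
whole aligned quadword.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the address an LVX/STVX style access would
   actually use, i.e. the calculated address with its four low-order
   bits forced to zero.  */
static uintptr_t
lvx_effective_address (uintptr_t ea)
{
  return ea & ~(uintptr_t) 0xF;
}

/* Hypothetical portable model of a 16-byte load.  BASE is assumed to be
   16-byte aligned, so offsets into it stand in for addresses.  The load
   always starts at the aligned-down offset, whatever OFFSET was asked
   for.  */
static void
emulate_lvx (const unsigned char *base, size_t offset,
             unsigned char out[16])
{
  size_t aligned = offset & ~(size_t) 0xF;
  memcpy (out, base + aligned, 16);
}
```

With this model, asking for offset 5 returns the bytes at offsets 0 through
15, not 5 through 20, which is the "unexpected bytes" case the text warns
about.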


+instructions mask off the lower 4 bits of the effective address thus requiring
+the data to be 16-byte aligned to work properly.  The @samp{vec_lde} and
+@samp{vec_ste} built-in functions operate on vectors of bytes, short integer,
+integer, and float.  The corresponding AltiVec instructions @samp{LVEBX},
+@samp{LVEHX}, @samp{LVEWX}, @samp{STVEBX}, @samp{STVEHX}, @samp{STVEWX} mask
+off the lower bits of the effective address based on the size of the data.
+Thus the data must be aligned to the size of the vector element to work
+properly.  The @samp{vec_vsx_ld} and @samp{vec_vsx_st} built-in functions
+always generate the VSX @samp{LXVD2X}, @samp{LXVW4X}, @samp{STXVD2X}, and
+@samp{STXVW4X} instructions.

As above, there was a reason to mention vec_ld and vec_st here, but not one for
vec_lde and vec_ste IMHO, so let's not mention vec_lde and vec_ste here and
users should read the description in PVIPR instead (it's more recommended).


The goal of mentioning the vec_lde and vec_ste built-ins was to give the 
user a pointer to built-ins that will work as expected on unaligned 
data.  It will probably save them a lot of time and frustration if they 
are given a hint of what built-ins they should look at.  So, how about 
the following:


   See the PVIPR description of the vec_lde and vec_ste for loading and
   storing data that is not 16-byte aligned.

   Carl


[PATCH] rs6000, update vec_ld, vec_lde, vec_st and vec_ste, documentation

2024-06-26 Thread Carl Love
GCC maintainers:

The following patch updates the user documentation for the vec_ld, vec_lde, 
vec_st and vec_ste built-ins to make it clearer that there are data alignment 
requirements for these built-ins.  If the data alignment requirements are not 
followed, the data loaded or stored by these built-ins will be wrong.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 


rs6000, update vec_ld, vec_lde, vec_st and vec_ste documentation

Use of the vec_ld and vec_st built-ins requires that the data be 16-byte
aligned to work properly.  Add some additional text to the existing
documentation to make this clearer to the user.

Similarly, the vec_lde and vec_ste built-ins also have data alignment
requirements based on the size of the vector element.  Update the
documentation to make this clear to the user.

gcc/ChangeLog:
* doc/extend.texi: Add clarification for the use of the vec_ld
vec_st, vec_lde and vec_ste built-ins.
---
 gcc/doc/extend.texi | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ee3644a5264..55faded17b9 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22644,10 +22644,17 @@ vector unsigned char vec_xxsldi (vector unsigned char,
 @end smallexample
 
 Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
-generate the AltiVec @samp{LVX} and @samp{STVX} instructions even
-if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
-@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
-@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
+generate the AltiVec @samp{LVX}, and @samp{STVX} instructions.  The
+instructions mask off the lower 4 bits of the effective address thus requiring
+the data to be 16-byte aligned to work properly.  The @samp{vec_lde} and
+@samp{vec_ste} built-in functions operate on vectors of bytes, short integer,
+integer, and float.  The corresponding AltiVec instructions @samp{LVEBX},
+@samp{LVEHX}, @samp{LVEWX}, @samp{STVEBX}, @samp{STVEHX}, @samp{STVEWX} mask
+off the lower bits of the effective address based on the size of the data.
+Thus the data must be aligned to the size of the vector element to work
+properly.  The @samp{vec_vsx_ld} and @samp{vec_vsx_st} built-in functions
+always generate the VSX @samp{LXVD2X}, @samp{LXVW4X}, @samp{STXVD2X}, and
+@samp{STXVW4X} instructions.
 
 @node PowerPC AltiVec Built-in Functions Available on ISA 2.07
 @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.07
-- 
2.45.0
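
As a rough illustration of the element-size masking the patch text
describes for LVEBX, LVEHX and LVEWX (and the matching stores), here is a
small portable C sketch.  It models only the address arithmetic; the helper
names are invented for this example, and the byte-offset helper assumes the
element lands at the position given by the low four bits of the masked
address, using big-endian byte numbering within the quadword.

```c
#include <stdint.h>

/* Hypothetical helper: address accessed for an element of ELEM_SIZE
   bytes (1, 2 or 4).  The low bits are cleared according to the
   element size, so a byte access is never adjusted.  */
static uintptr_t
element_effective_address (uintptr_t ea, unsigned elem_size)
{
  return ea & ~(uintptr_t) (elem_size - 1);
}

/* Hypothetical helper: byte offset of that element inside its 16-byte
   quadword (assumption: big-endian byte numbering).  */
static unsigned
element_byte_offset (uintptr_t ea, unsigned elem_size)
{
  return (unsigned) (element_effective_address (ea, elem_size) & 0xF);
}
```

For example, a word access requested at an address ending in 0x7 is
modelled as touching the word at the address ending in 0x4, which is why
data that is not aligned to the element size gives surprising results.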



Re: [PATCH version 2] rs6000, altivec-1-runnable.c update the, require-effective-target

2024-06-25 Thread Carl Love
Kewen:

On 6/23/24 19:41, Kewen.Lin wrote:
> Hi,
> 
> on 2024/6/22 00:15, Carl Love wrote:
>> GCC maintainers:
>>
>> version 2, update the dg options per the feedback.  Retested the patch on 
>> Power 10 with no regressions.
>>
>> This patch updates the dg options.
>>
>> The patch has been tested on Power 10 with no regression failures.
>>
>> Please let me know if this patch is acceptable for mainline.  Thanks.
>>
>> Carl 
>>
>> -- 
>> rs6000, altivec-1-runnable.c update the require-effective-target
>>
>> Update the dg test directives.
>>
>> gcc/testsuite/ChangeLog:gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/altivec-1-runnable.c: Change the
>>  require-effective-target for the test.
>> ---
>>  gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c | 7 ---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c 
>> b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
>> index da8ebbc30ba..3f084c91798 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
>> @@ -1,6 +1,7 @@
>> -/* { dg-do compile { target powerpc*-*-* } } */
>> -/* { dg-require-effective-target powerpc_altivec_ok } */
>> -/* { dg-options "-maltivec" } */
>> +/* { dg-do run { target vmx_hw } } */
>> +/* { dg-do compile { target { ! vmx_hw } } } */
>> +/* { dg-options "-O2 -maltivec" } */
>> +/* { dg-require-effective-target powerpc_altivec } */
> 
> This one needs rebasing, "powerpc_altivec" has been adjusted on trunk.

Yes, this seems to be out of sync.  I will rebase on the current upstream tree 
and re-post.

 Carl  


[PATCH ver3] rs6000, altivec-1-runnable.c update the, require-effective-target

2024-06-25 Thread Carl Love
GCC maintainers:

version 3, rebased on current mainline tree.  Version 2 of the patch was out of 
sync.  Retested the patch on Power 10 with no regressions.

version 2, update the dg options per the feedback.  Retested the patch on Power 
10 with no regressions.

This patch updates the dg options.

The patch has been tested on Power 10 with no regression failures.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 



rs6000, altivec-1-runnable.c update the require-effective-target

Update the dg test directives.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-1-runnable.c: Change the
require-effective-target for the test.
---
 gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
index 4e32860a169..6763ff3ff8b 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
@@ -1,7 +1,9 @@
-/* { dg-do compile { target powerpc*-*-* } } */
-/* { dg-options "-maltivec" } */
+/* { dg-do run { target vmx_hw } } */
+/* { dg-do compile { target { ! vmx_hw } } } */
+/* { dg-options "-O2 -maltivec" } */
 /* { dg-require-effective-target powerpc_altivec } */
 
+
 #include 
 
 #ifdef DEBUG
-- 
2.45.0



Re: [PATCH 2/13 ver4] rs6000, Remove __builtin_vsx_xvcvspsxws,, __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws built-ins.

2024-06-24 Thread Carl Love
Kewen:

On 6/18/24 20:03, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2024/6/14 03:40, Carl Love wrote:
>> GCC maintainers:
>>
>> Per the comments on patch 0004 from version 3, the removal of 
>> The built-in __builtin_vsx_xvcvdpuxds_uns and __builtin_vsx_xvcvspuxws was 
>> moved to this patch.  The rest of the patch is unchanged from version 3.  
>> There were no comments on this patch for version 3.
>>
>> Please let me know if this patch is acceptable.  Thanks.
>>
>> Carl 
>>
>>
>> -
>>
>> rs6000, Remove __builtin_vsx_xvcvspsxws,
>>  __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws built-ins.
> 
> Nit: Maybe make it shorter like: Remove built-ins 
> __builtin_vsx_xvcv{sp{sx,u}ws,dpuxds_uns}
> 
>>
>> The built-in __builtin_vsx_xvcvspsxws is a duplicate of the vec_signed
> 
> Nit: Strictly speaking, not a duplicate of vec_signed but covered by it.
> 
>> built-in that is documented in the PVIPR.  The __builtin_vsx_xvcvspsxws
>> built-in is not documented and there are no test cases for it.
>>
>> The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by
>> vec_unsigned, remove.
>>
>> The __builtin_vsx_xvcvspuxws is redundant as it is covered by
>> vec_unsigned, remove.
> 
> As mentioned in the previous review, I'd expect patch 4/13 only focuses on
> extending vec_{un,}signed{e,o} for vector float (aka. __builtin_vsx_xvcvspsxds
> and __builtin_vsx_xvcvspuxds related), and this patch focuses on some built-in
> removals which have been covered by the existing vec_{un,}signed{,e,o}, so
> it can also drop the built-ins:
> 
> "The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by
> vec_signed{e,o}, remove.
> 
> The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by
> vec_unsigned{e,o}, remove."
> 
> // copied from 4/13.

Not sure why I didn't move these two with the other two???  Sorry.

Moved them from patch 4.

  Carl 


Re: [PATCH 4/13 ver4] rs6000, extend the current vec_{un,}signed{e,o}, built-ins

2024-06-24 Thread Carl Love



On 6/18/24 20:03, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2024/6/14 03:40, Carl Love wrote:
>>
>> GCC maintainers:
>>
>> As noted the removal of __builtin_vsx_xvcvdpuxds_uns and 
>> __builtin_vsx_xvcvspuxws was moved to patch 2 in the series.  The patch has 
>> been updated per the comments from version 3.
>>
>> Please let me know if this patch is acceptable for mainline.  
>>
>>  Carl 
>>
>> --
>>
>> rs6000, extend the current vec_{un,}signed{e,o} built-ins
>>
>> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
>> convert a vector of floats to signed/unsigned long long ints.  Extend the
> 
> Nit: s/signed/a vector of signed/

Fixed.

> 
>> existing vec_{un,}signed{e,o} built-ins to handle the argument
>> vector of floats to return the even/odd signed/unsigned integers.
>>
> 
> Likewise.

Fixed.

> 
>> The define expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf,
>> vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o}
>> built-ins.
>>
>> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are
>> now for internal use only. They are not documented and they do not
>> have testcases.
>>
> 
> 
>> The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by
>> vec_signed{e,o}, remove.
>>
>> The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by
>> vec_unsigned{e,o}, remove.
> 
> As the comments in 2/13 v4 and the previous review comments, I preferred
> these two are moved to 2/13 as well (this patch should focus on extending).
> 

Moved to patch 2.

>>
>> Add testcases and update documentation.
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-builtins.def: __builtin_vsx_xvcvdpsxws,
>>  __builtin_vsx_xvcvdpuxws): Removed.
>>  (__builtin_vsx_xvcvspsxds, __builtin_vsx_xvcvspuxds): Renamed
> 
> Nit: s/Renamed/Rename to/

OK, fixed.

> 
>>  __builtin_vsignede_v4sf, __builtin_vunsignede_v4sf respectively.
>>  (XVCVSPSXDS, XVCVSPUXDS): Renamed VEC_VSIGNEDE_V4SF,
>>  VEC_VUNSIGNEDE_V4SF respectively.
> 
> Likewise.

OK, fixed. 

> 
>>  (__builtin_vsignedo_v4sf, __builtin_vunsignedo_v4sf): New
>>  built-in definitions.
>>  * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo,
>>  vec_unsignede,vec_unsignedo):  Add new overloaded specifications.
> 
> Formatting nits: "..,.." -> ".., ..", "  " -> " "

OK, I fixed the various spacing issues.
> 
>>  * config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf,
>>  vunsignede_v4sf, vunsignedo_v4sf): New  define_expands.
> 
> Likewise.

Ditto.

> 
>>  * doc/extend.texi (vec_signedo, vec_signede): Add documentation
>>  for new overloaded built-ins.
> 
> Missing vec_unsignedo and vec_unsignede, may be also mention for which
> types, like "converting vector float to vector {un,}signed long long".
> 

OK, fixed.
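
To make the "vector float to vector {un,}signed long long" conversion
concrete, here is a scalar model in portable C.  It is a sketch of the
intended even/odd selection only, not the GCC implementation: the
model_* names are made up, elements are assumed to be numbered as the
program sees them, and inputs are assumed to be in range for the result
type (a plain C cast is undefined otherwise, whereas the hardware
instructions have their own defined behaviour).

```c
/* Hypothetical scalar models.  Each takes the four floats of a vector
   float and produces the two 64-bit results, truncating toward zero
   as a C cast does.  */
static void
model_signede (const float in[4], long long out[2])
{
  out[0] = (long long) in[0];   /* even elements: 0 and 2 */
  out[1] = (long long) in[2];
}

static void
model_signedo (const float in[4], long long out[2])
{
  out[0] = (long long) in[1];   /* odd elements: 1 and 3 */
  out[1] = (long long) in[3];
}

static void
model_unsignede (const float in[4], unsigned long long out[2])
{
  out[0] = (unsigned long long) in[0];
  out[1] = (unsigned long long) in[2];
}

static void
model_unsignedo (const float in[4], unsigned long long out[2])
{
  out[0] = (unsigned long long) in[1];
  out[1] = (unsigned long long) in[3];
}
```

A model like this can be handy for working out expected values when
writing new cases for builtins-3-runnable.c.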

>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/builtins-3-runnable.c
>>  (test_unsigned_int_result, test_ll_unsigned_int_result): Add
>>  new argument.
>>  (vec_signede, vec_signedo, vec_unsignede, vec_unsignedo): New
>>  tests for the overloaded built-ins.
>> ---
>>  gcc/config/rs6000/rs6000-builtins.def | 20 ++---
>>  gcc/config/rs6000/rs6000-overload.def |  8 ++
>>  gcc/config/rs6000/vsx.md  | 84 +++
>>  gcc/doc/extend.texi   | 10 +++
>>  .../gcc.target/powerpc/builtins-3-runnable.c  | 49 +--
>>  5 files changed, 154 insertions(+), 17 deletions(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index 322d27b7a0d..29a9deb3410 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -1688,26 +1688,26 @@
>>const vsll __builtin_vsx_xvcvdpsxds_scale (vd, const int);
>>  XVCVDPSXDS_SCALE vsx_xvcvdpsxds_scale {}
>>  
>> -  const vsi __builtin_vsx_xvcvdpsxws (vd);
>> -XVCVDPSXWS vsx_xvcvdpsxws {}
>> -
>>const vsll __builtin_vsx_xvcvdpuxds (vd);
>>  XVCVDPUXDS vsx_fixuns_truncv2dfv2di2 {}
>>  
>>const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
>>  XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
>>  
>> -  const vsi __builtin_vsx_xvcvdpuxws (vd);
>> -XVCVDPUXWS

[PATCH version 4] rs6000, altivec-2-runnable.c update the, require-effective-target

2024-06-21 Thread Carl Love
GCC maintainers:

version 4:  Additional dg option updates per the feedback.  Retested the patch 
on Power 10, no regressions.

version 3:  Updated per the feedback from Peter, Kewen and Segher.  Note, Peter 
suggested the -mdejagnu-cpu= value must be power7.  The test fails if 
-mdejagnu-cpu= is set to power7; it needs to be power8.  The patch has been 
retested on a Power 10 box; it succeeds with 2 passes and no fails.

Per the additional feedback after patch: 

  commit c892525813c94b018464d5a4edc17f79186606b7
  Author: Carl Love 
  Date:   Tue Jun 11 14:01:16 2024 -0400

  rs6000, altivec-2-runnable.c should be a runnable test

  The test case has "dg-do compile" set not "dg-do run" for a runnable
  test.  This patch changes the dg-do command argument to run.

  gcc/testsuite/ChangeLog:gcc/testsuite/ChangeLog:
  * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
  argument to run.

was approved and committed, I have updated the dg-require-effective-target
and dg-options as requested so the test will compile with -O2 on a 
machine that has a minimum support of Power 8 vector hardware.

The patch has been tested on Power 10 with no regression failures.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 

--
rs6000, altivec-2-runnable.c update the require-effective-target

The test requires a minimum of Power8 vector HW and a compile level
of -O2.  Update the dg test directives.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-2-runnable.c: Change the
require-effective-target for the test.
---
 gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
index 17b23eb9d50..660669f69fd 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
@@ -1,6 +1,6 @@
-/* { dg-do run } */
-/* { dg-options "-mvsx" } */
-/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */
+/* { dg-do run { target p8vector_hw } } */
+/* { dg-do compile { target { ! p8vector_hw } } } */
+/* { dg-options "-O2  -mdejagnu-cpu=power8" } */
 /* { dg-require-effective-target powerpc_vsx } */
 
 #include 
-- 
2.45.0



[PATCH version 2] rs6000, altivec-1-runnable.c update the, require-effective-target

2024-06-21 Thread Carl Love
GCC maintainers:

version 2, update the dg options per the feedback.  Retested the patch on Power 
10 with no regressions.

This patch updates the dg options.

The patch has been tested on Power 10 with no regression failures.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 

-- 
rs6000, altivec-1-runnable.c update the require-effective-target

Update the dg test directives.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-1-runnable.c: Change the
require-effective-target for the test.
---
 gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
index da8ebbc30ba..3f084c91798 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
@@ -1,6 +1,7 @@
-/* { dg-do compile { target powerpc*-*-* } } */
-/* { dg-require-effective-target powerpc_altivec_ok } */
-/* { dg-options "-maltivec" } */
+/* { dg-do run { target vmx_hw } } */
+/* { dg-do compile { target { ! vmx_hw } } } */
+/* { dg-options "-O2 -maltivec" } */
+/* { dg-require-effective-target powerpc_altivec } */
 
 #include 
 
-- 
2.45.0



Re: [PATCH] rs6000, altivec-1-runnable.c update the require-effective-target

2024-06-21 Thread Carl Love
Kewen:

On 6/21/24 03:37, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2024/6/20 00:18, Carl Love wrote:
>> GCC maintainers:
>>
>> The dg options for this test should be the same as for altivec-2-runnable.c. 
>>  This patch updates the dg options to match 
>> the settings in altivec-2-runnable.c.
>>
>> The patch has been tested on Power 10 with no regression failures.
>>
>> Please let me know if this patch is acceptable for mainline.  Thanks.
>>
>> Carl 
>>
>> --From
>>  289e15d215161ad45ae1aae7a5dedd2374737ec4 rs6000, altivec-1-runnable.c 
>> update the require-effective-target
>>
>> The test requires a minimum of Power8 vector HW and a compile level
>> of -O2.
> 
> This is not true, vec_unpackh and vec_unpackl don't require power8,
> vupk[hl]s[hb]/vupk[hl]px are all ISA 2.03.
> 
>>
>> gcc/testsuite/ChangeLog:gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/altivec-1-runnable.c: Change the
>>  require-effective-target for the test.
>> ---
>>  gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c | 7 ---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c 
>> b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
>> index da8ebbc30ba..c113089c13a 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
>> @@ -1,6 +1,7 @@
>> -/* { dg-do compile { target powerpc*-*-* } } */
>> -/* { dg-require-effective-target powerpc_altivec_ok } */
>> -/* { dg-options "-maltivec" } */
>> +/* { dg-do run { target vsx_hw } } */
> 
> So this line should check for vmx_hw.

OK, fingers are used to typing vsx.  Fixed.

> 
>> +/* { dg-do compile { target { ! vmx_hw } } } */
>> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> 
> With more thinking, I think it's better to use
> "-O2 -maltivec" to be consistent with the others.

OK, changed it back.  We now have:

/* { dg-do run { target vmx_hw } } */
/* { dg-do compile { target { ! vmx_hw } } } */
/* { dg-options "-O2 -maltivec" } */
/* { dg-require-effective-target powerpc_altivec } */

The regression test runs fine with the above.  Two passes, no failures.


> 
> As mentioned in the other thread, powerpc_altivec
> effective target check should guarantee the altivec
> feature support, if any default cpu type or user
> specified option disable altivec, this test case
> will not be tested.  If we specify one cpu type
> specially here, it may cause confusion why it's
> different from the other existing ones.  So let's
> go without any specified cpu type.
> 
> Besides, similar to the request for altivec-1-runnable.c,
> could you also rename this to altivec-38.c?

OK, will change the names for the two test cases at the same time in a separate 
patch.
 
 Carl 


[PATCH] rs6000, change altivec*-runnable.c test file names

2024-06-21 Thread Carl Love
GCC maintainers:

Per the discussion of the dg header changes for test files altivec-1-runnable.c 
and altivec-2-runnable.c, it was decided it would be best to rename the two 
tests so that their names match the existing tests they most closely resemble.

This patch is dependent on the two patches to update the dg arguments for test 
files altivec-1-runnable.c and altivec-2-runnable.c being accepted and 
committed before this patch.

The patch has been tested on Power 10 with no regression failures.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 

--
rs6000, change altivec*-runnable.c test file names

Changed the names of the test files.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-1-runnable.c: Change the name to
altivec-38.c.
* gcc.target/powerpc/altivec-2-runnable.c: Change the name to
p8vector-builtin-9.c.
---
 .../gcc.target/powerpc/{altivec-1-runnable.c => altivec-38.c} | 0
 .../powerpc/{altivec-2-runnable.c => p8vector-builtin-9.c}| 0
 2 files changed, 0 insertions(+), 0 deletions(-)
 rename gcc/testsuite/gcc.target/powerpc/{altivec-1-runnable.c => altivec-38.c} (100%)
 rename gcc/testsuite/gcc.target/powerpc/{altivec-2-runnable.c => p8vector-builtin-9.c} (100%)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-38.c
similarity index 100%
rename from gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
rename to gcc/testsuite/gcc.target/powerpc/altivec-38.c
diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c b/gcc/testsuite/gcc.target/powerpc/p8vector-builtin-9.c
similarity index 100%
rename from gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
rename to gcc/testsuite/gcc.target/powerpc/p8vector-builtin-9.c
-- 
2.45.0



Re: [PATCH ver2] rs6000, altivec-2-runnable.c update the require-effective-target

2024-06-21 Thread Carl Love
Kewen:

On 6/21/24 03:36, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2024/6/20 00:13, Carl Love wrote:
>> GCC maintainers:
>>
>> version 2:  Updated per the feedback from Peter, Kewen and Segher.  Note, 
>> Peter suggested the -mdejagnu-cpu= value must be power7.  
>> The test fails if -mdejagnu-cpu= is set to power7, needs to be power8.  
>> Patch has been retested on a Power 10 box, it succeeds
>> with 2 passes and no fails.
> 
> IMHO Peter's suggestion on power7 (-mdejagnu-cpu=power7) is mainly for
> altivec-1-runnable.c.  Both your testing and the comments in the test
> case show this altivec-2-runnable.c requires at least power8.

OK.  Per other thread changed altivec-1-runnable to power7.

> 
>>
>> Per the additional feedback after patch: 
>>
>>   commit c892525813c94b018464d5a4edc17f79186606b7
>>   Author: Carl Love 
>>   Date:   Tue Jun 11 14:01:16 2024 -0400
>>
>>   rs6000, altivec-2-runnable.c should be a runnable test
>> 
>>   The test case has "dg-do compile" set not "dg-do run" for a runnable
>>   test.  This patch changes the dg-do command argument to run.
>> 
>>   gcc/testsuite/ChangeLog:gcc/testsuite/ChangeLog:
>>   * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
>>   argument to run.
>>
>> was approved and committed, I have updated the dg-require-effective-target
>> and dg-options as requested so the test will compile with -O2 on a 
>> machine that has a minimum support of Power 8 vector hardware.
>>
>> The patch has been tested on Power 10 with no regression failures.
>>
>> Please let me know if this patch is acceptable for mainline.  Thanks.
>>
>> Carl 
>>
>> 
>> rs6000, altivec-2-runnable.c update the require-effective-target
>>
>> The test requires a minimum of Power8 vector HW and a compile level
>> of -O2.
>>
>> gcc/testsuite/ChangeLog:gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/altivec-2-runnable.c: Change the
>>  require-effective-target for the test.
>> ---
>>  gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 8 
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c 
>> b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
>> index 17b23eb9d50..9e7ef89327b 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
>> @@ -1,7 +1,7 @@
>> -/* { dg-do run } */
>> -/* { dg-options "-mvsx" } */
>> -/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */
>> -/* { dg-require-effective-target powerpc_vsx } */
>> +/* { dg-do run { target vsx_hw } } */
> 
> As this test case requires power8 and up, and dg-options specifies
> -mdejagnu-cpu=power8, we should use p8vector_hw instead of vsx_hw here,
> otherwise it will fail on power7 env.

Changed to p8vector_hw

> 
>> +/* { dg-do compile { target { ! vmx_hw } } } */
> 
> This condition should be ! , so ! p8vector_hw.

Changed. 

> 
>> +/* { dg-options "-O2  -mdejagnu-cpu=power8" } */
>> +/* { dg-require-effective-target powerpc_altivec } */
> 
> This should be powerpc_vsx instead, otherwise this case can still be
> tested with -mno-vsx -maltivec, then this test case would fail.

OK
> 
> Besides, as the discussion on the name of this test case, could you also
> rename this to p8vector-builtin-9.c instead?

Put the name change in a separate patch to change both test file names.
 
  Carl 


[PATCH] rs6000, altivec-1-runnable.c update the require-effective-target

2024-06-19 Thread Carl Love
GCC maintainers:

The dg options for this test should be the same as for altivec-2-runnable.c.  
This patch updates the dg options to match the settings in 
altivec-2-runnable.c.

The patch has been tested on Power 10 with no regression failures.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 

--
From 289e15d215161ad45ae1aae7a5dedd2374737ec4
rs6000, altivec-1-runnable.c update the require-effective-target

The test requires a minimum of Power8 vector HW and a compile level
of -O2.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-1-runnable.c: Change the
require-effective-target for the test.
---
 gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
index da8ebbc30ba..c113089c13a 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
@@ -1,6 +1,7 @@
-/* { dg-do compile { target powerpc*-*-* } } */
-/* { dg-require-effective-target powerpc_altivec_ok } */
-/* { dg-options "-maltivec" } */
+/* { dg-do run { target vsx_hw } } */
+/* { dg-do compile { target { ! vmx_hw } } } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
+/* { dg-require-effective-target powerpc_altivec } */
 
 #include 
 
-- 
2.45.0



Re: [PATCH ver3] rs6000, altivec-2-runnable.c update the require-effective-target

2024-06-19 Thread Carl Love
Everyone, Oops, this should be version 3 not 2.  Sorry.

  Carl 

On 6/19/24 09:13, Carl Love wrote:
> GCC maintainers:
> 
> version 2:  Updated per the feedback from Peter, Kewen and Segher.  Note, 
> Peter suggested the -mdejagnu-cpu= value must be power7.  
> The test fails if -mdejagnu-cpu= is set to power7, needs to be power8.  Patch 
> has been retested on a Power 10 box, it succeeds
> with 2 passes and no fails.
> 
> Per the additional feedback after patch: 
> 
>   commit c892525813c94b018464d5a4edc17f79186606b7
>   Author: Carl Love 
>   Date:   Tue Jun 11 14:01:16 2024 -0400
> 
>   rs6000, altivec-2-runnable.c should be a runnable test
> 
>   The test case has "dg-do compile" set not "dg-do run" for a runnable
>   test.  This patch changes the dg-do command argument to run.
> 
>   gcc/testsuite/ChangeLog:gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
>   argument to run.
> 
> was approved and committed, I have updated the dg-require-effective-target
> and dg-options as requested so the test will compile with -O2 on a 
> machine that has a minimum support of Power 8 vector hardware.
> 
> The patch has been tested on Power 10 with no regression failures.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
> Carl 
> 
> 
> rs6000, altivec-2-runnable.c update the require-effective-target
> 
> The test requires a minimum of Power8 vector HW and a compile level
> of -O2.
> 
> gcc/testsuite/ChangeLog:gcc/testsuite/ChangeLog:
>   * gcc.target/powerpc/altivec-2-runnable.c: Change the
>   require-effective-target for the test.
> ---
>  gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c 
> b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> index 17b23eb9d50..9e7ef89327b 100644
> --- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> @@ -1,7 +1,7 @@
> -/* { dg-do run } */
> -/* { dg-options "-mvsx" } */
> -/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */
> -/* { dg-require-effective-target powerpc_vsx } */
> +/* { dg-do run { target vsx_hw } } */
> +/* { dg-do compile { target { ! vmx_hw } } } */
> +/* { dg-options "-O2  -mdejagnu-cpu=power8" } */
> +/* { dg-require-effective-target powerpc_altivec } */
>  
>  #include 
>  


[PATCH ver2] rs6000, altivec-2-runnable.c update the require-effective-target

2024-06-19 Thread Carl Love
GCC maintainers:

Version 2:  Updated per the feedback from Peter, Kewen and Segher.  Note that
Peter suggested setting -mdejagnu-cpu= to power7.  The test fails with
-mdejagnu-cpu=power7, so the value needs to be power8.  The patch has been
retested on a Power 10 box; it succeeds with 2 passes and no failures.

Per the additional feedback after patch: 

  commit c892525813c94b018464d5a4edc17f79186606b7
  Author: Carl Love 
  Date:   Tue Jun 11 14:01:16 2024 -0400

  rs6000, altivec-2-runnable.c should be a runnable test

  The test case has "dg-do compile" set not "dg-do run" for a runnable
  test.  This patch changes the dg-do command argument to run.

  gcc/testsuite/ChangeLog:
  * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
  argument to run.

was approved and committed, I have updated the dg-require-effective-target
and dg-options as requested so the test will compile with -O2 on a 
machine that has a minimum support of Power 8 vector hardware.

The patch has been tested on Power 10 with no regression failures.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 


rs6000, altivec-2-runnable.c update the require-effective-target

The test requires a minimum of Power8 vector HW and a compile level
of -O2.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-2-runnable.c: Change the
require-effective-target for the test.
---
 gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
index 17b23eb9d50..9e7ef89327b 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
@@ -1,7 +1,7 @@
-/* { dg-do run } */
-/* { dg-options "-mvsx" } */
-/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */
-/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-do run { target vsx_hw } } */
+/* { dg-do compile { target { ! vmx_hw } } } */
+/* { dg-options "-O2  -mdejagnu-cpu=power8" } */
+/* { dg-require-effective-target powerpc_altivec } */
 
 #include 
 
-- 
2.45.0



Re: [PATCH] rs6000, altivec-2-runnable.c update the require-effective-target

2024-06-18 Thread Carl Love
Kewen, Peter, Segher:

On 6/17/24 19:56, Kewen.Lin wrote:
> Hi,
> 
> on 2024/6/18 00:08, Peter Bergner wrote:
>> On 6/14/24 1:37 PM, Carl Love wrote:
>>> Per the additional feedback after patch: 
>>>
>>>   commit c892525813c94b018464d5a4edc17f79186606b7
>>>   Author: Carl Love 
>>>   Date:   Tue Jun 11 14:01:16 2024 -0400
>>>
>>>   rs6000, altivec-2-runnable.c should be a runnable test
>>> 
>>>   The test case has "dg-do compile" set not "dg-do run" for a runnable
>>>   test.  This patch changes the dg-do command argument to run.
>>> 
>>>   gcc/testsuite/ChangeLog:
>>>   * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
>>>   argument to run.
>>
>> Test case altivec-1-runnable.c seems to have the same issue, in that it
>> is currently a dg-do compile test case rather than the intended dg-do run.
> 
> Good catch!

OK, will update that as well.  I think it will need the same header as
altivec-2-runnable.c, so once we have a final change for altivec-2-runnable.c,
I will make the header for altivec-1-runnable.c the same.

> 
>> Can you have a look at changing that to dg-do run too?  My guess is that
>> this one will want something similar to some other altivec test cases, ala:
>>
>> /* { dg-do run { target vmx_hw } } */
>> /* { dg-do compile { target { ! vmx_hw } } } */
>> /* { dg-require-effective-target powerpc_altivec_ok } */
>> /* { dg-options "-O2 -maltivec -mabi=altivec" } */
> 
> I'd expect the "-runnable" test case focuses on testing for run.  Normally,
> the one without "-runnable" would focus on testing for compiling (scan some
> desired insn), but this altivec-1.c and altivec-1-runnable.c seems to test
> for different things, maybe we should separate them into different names
> if they don't test for a same test point.

The altivec-1-runnable.c and altivec-2-runnable.c tests were added for various
built-ins that didn't have any test cases.  There was no intended connection
to the existing altivec-*.c test files.  I started creating runnable tests
when I started adding support for built-ins that we claimed to support but
that had never actually been implemented.  I created runnable tests to make
sure my implementation actually worked.  I continued to add runnable tests
for built-ins that existed but didn't have a test case.  Adding runnable
tests did find a couple of issues where the existing implementation had a bug.

That all said, if we want to change altivec-1-runnable.c and
altivec-2-runnable.c to a different naming scheme, that is fine with me.
Perhaps we should finish fixing the header for this test file, then do
altivec-1-runnable.c, and then a final patch that does all the file renaming?

> 
>>
>> That said, I don't like not having a -mdejagnu-cpu=... here.
>> I think for our server cpus, this is fine, but on an embedded system
>> with an old ISA default for -mcpu=... (so we'd be doing a dg-do compile),
>> just adding -maltivec to that default may not make much sense for that
>> default and probably should be an error.  Maybe something like:
> 
> Yes, for some embedded cpus, there will be some error messages, but since
> we have powerpc_altivec_ok effective target, the error would make that
> effective target checking fail so I'd expect it'll stop it being tested
> (unsupported).
> 
>>
>> /* { dg-do run { target vmx_hw } } */
>> /* { dg-do compile { target { ! vmx_hw } } } */
>> /* { dg-require-effective-target powerpc_altivec_ok } */
>> /* { dg-options "-O2 -mdejagnu-cpu=power7" } */
>>
>> ...makes more sense?   Ke Wen & Segher, thoughts on that?
>> Ke Wen, should powerpc_altivec_ok be powerpc_altivec here???
> 
> Yes, I just pushed r15-1390 for this change.
> 
> BR,
> Kewen
> 

We had -mdejagnu-cpu=power8 before, but it looks like we want to go to power7 now.

It sounds like we want the following:

/* { dg-do run { target vmx_hw } } */
/* { dg-do compile { target { ! vmx_hw } } } */
/* { dg-options "-O2 -mdejagnu-cpu=power7" } */
/* { dg-require-effective-target powerpc_altivec } */

 Carl 


[PATCH] rs6000, altivec-2-runnable.c update the require-effective-target

2024-06-14 Thread Carl Love
GCC maintainers:

Per the additional feedback after patch: 

  commit c892525813c94b018464d5a4edc17f79186606b7
  Author: Carl Love 
  Date:   Tue Jun 11 14:01:16 2024 -0400

  rs6000, altivec-2-runnable.c should be a runnable test

  The test case has "dg-do compile" set not "dg-do run" for a runnable
  test.  This patch changes the dg-do command argument to run.

  gcc/testsuite/ChangeLog:
  * gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
  argument to run.

was approved and committed, I have updated the dg-require-effective-target
and dg-options as requested so the test will compile with -O2 on a 
machine that has a minimum support of Power 8 vector hardware.

The patch has been tested on Power 10 with no regression failures.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 

--

rs6000, altivec-2-runnable.c update the require-effective-target

The test requires a minimum of Power8 vector HW and a compile level
of -O2.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-2-runnable.c: Change the
require-effective-target for the test.
---
 gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
index 17b23eb9d50..04c7d1ac70e 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
@@ -1,7 +1,6 @@
 /* { dg-do run } */
-/* { dg-options "-mvsx" } */
-/* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */
-/* { dg-require-effective-target powerpc_vsx } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
+/* { dg-require-effective-target p8vector_hw } */
 
 #include 
 
-- 
2.45.0



Re: [PATCH] rs6000, altivec-2-runnable.c should be a runnable test

2024-06-13 Thread Carl Love
Segher:

On 6/13/24 12:51, Segher Boessenkool wrote:



> 
>> --- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
>> @@ -1,4 +1,4 @@
>> -/* { dg-do compile { target powerpc*-*-* } } */
>> +/* { dg-do run { target powerpc*-*-* } } */
>>  /* { dg-options "-mvsx" } */
>>  /* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */
>>  /* { dg-require-effective-target powerpc_vsx } */
> 
> Everything in gcc.target/powerpc/ is tested for "target powerpc*-*-*"
> already, so you could remove that target clause even (after testing of
> course :-) )
> 
> Okay for trunk with or without that extra tweak.  Thank you!

I updated the patch by removing the target clause as suggested:

-/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-do run } */
 /* { dg-options "-mvsx" } */
 /* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } } } */
 /* { dg-require-effective-target powerpc_vsx } */
 
Retested on Power 10.  Reports 2 passes and no failures.  I will go ahead and 
commit.

Thanks. 

   Carl 


[PATCH 4/13 ver4] rs6000, extend the current vec_{un,}signed{e,o}, built-ins

2024-06-13 Thread Carl Love


GCC maintainers:

As noted, the removal of __builtin_vsx_xvcvdpuxds_uns and
__builtin_vsx_xvcvspuxws was moved to patch 2 in the series.  The patch has been
updated per the comments from version 3.

Please let me know if this patch is acceptable for mainline.  

 Carl 

--

rs6000, extend the current vec_{un,}signed{e,o} built-ins

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
convert a vector of floats to signed/unsigned long long ints.  Extend the
existing vec_{un,}signed{e,o} built-ins to handle the argument
vector of floats to return the even/odd signed/unsigned integers.
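
For illustration only, the even/odd conversion described above can be
modelled with scalar C.  This is a rough sketch, not the rs6000
implementation: the helper names model_signede and model_signedo are
hypothetical, elements are numbered in plain array order (endian
handling is ignored), and inputs are assumed to be in range for
long long.

```c
/* Hypothetical scalar models of the even/odd float-to-integer
   conversion; these are NOT the GCC built-ins themselves.  */

/* Convert elements 0 and 2 of a four-float vector to 64-bit integers.  */
static void model_signede (const float src[4], long long dst[2])
{
  dst[0] = (long long) src[0];   /* C conversion truncates toward zero.  */
  dst[1] = (long long) src[2];
}

/* Convert elements 1 and 3 of a four-float vector to 64-bit integers.  */
static void model_signedo (const float src[4], long long dst[2])
{
  dst[0] = (long long) src[1];
  dst[1] = (long long) src[3];
}
```

The unsigned variants would follow the same pattern with
unsigned long long as the destination type.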

The define_expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf and
vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o}
built-ins.

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are
now for internal use only. They are not documented and they do not
have testcases.

The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by
vec_signed{e,o}, so it is removed.

The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by
vec_unsigned{e,o}, so it is removed.

Add testcases and update documentation.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvdpsxws,
__builtin_vsx_xvcvdpuxws): Removed.
(__builtin_vsx_xvcvspsxds, __builtin_vsx_xvcvspuxds): Renamed to
__builtin_vsignede_v4sf, __builtin_vunsignede_v4sf respectively.
(XVCVSPSXDS, XVCVSPUXDS): Renamed to VEC_VSIGNEDE_V4SF,
VEC_VUNSIGNEDE_V4SF respectively.
(__builtin_vsignedo_v4sf, __builtin_vunsignedo_v4sf): New
built-in definitions.
* config/rs6000/rs6000-overload.def (vec_signede, vec_signedo,
vec_unsignede, vec_unsignedo): Add new overloaded specifications.
* config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf,
vunsignede_v4sf, vunsignedo_v4sf): New define_expands.
* doc/extend.texi (vec_signedo, vec_signede): Add documentation
for new overloaded built-ins.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/builtins-3-runnable.c
(test_unsigned_int_result, test_ll_unsigned_int_result): Add
new argument.
(vec_signede, vec_signedo, vec_unsignede, vec_unsignedo): New
tests for the overloaded built-ins.
---
 gcc/config/rs6000/rs6000-builtins.def | 20 ++---
 gcc/config/rs6000/rs6000-overload.def |  8 ++
 gcc/config/rs6000/vsx.md  | 84 +++
 gcc/doc/extend.texi   | 10 +++
 .../gcc.target/powerpc/builtins-3-runnable.c  | 49 +--
 5 files changed, 154 insertions(+), 17 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 322d27b7a0d..29a9deb3410 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1688,26 +1688,26 @@
   const vsll __builtin_vsx_xvcvdpsxds_scale (vd, const int);
 XVCVDPSXDS_SCALE vsx_xvcvdpsxds_scale {}
 
-  const vsi __builtin_vsx_xvcvdpsxws (vd);
-XVCVDPSXWS vsx_xvcvdpsxws {}
-
   const vsll __builtin_vsx_xvcvdpuxds (vd);
 XVCVDPUXDS vsx_fixuns_truncv2dfv2di2 {}
 
   const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
 XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
 
-  const vsi __builtin_vsx_xvcvdpuxws (vd);
-XVCVDPUXWS vsx_xvcvdpuxws {}
-
   const vd __builtin_vsx_xvcvspdp (vf);
 XVCVSPDP vsx_xvcvspdp {}
 
-  const vsll __builtin_vsx_xvcvspsxds (vf);
-XVCVSPSXDS vsx_xvcvspsxds {}
+  const vsll __builtin_vsignede_v4sf (vf);
+VEC_VSIGNEDE_V4SF vsignede_v4sf {}
+
+  const vsll __builtin_vsignedo_v4sf (vf);
+VEC_VSIGNEDO_V4SF vsignedo_v4sf {}
+
+  const vull __builtin_vunsignede_v4sf (vf);
+VEC_VUNSIGNEDE_V4SF vunsignede_v4sf {}
 
-  const vsll __builtin_vsx_xvcvspuxds (vf);
-XVCVSPUXDS vsx_xvcvspuxds {}
+  const vull __builtin_vunsignedo_v4sf (vf);
+VEC_VUNSIGNEDO_V4SF vunsignedo_v4sf {}
 
   const vd __builtin_vsx_xvcvsxddp (vsll);
 XVCVSXDDP vsx_floatv2div2df2 {}
diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index 84bd9ae6554..4d857bb1af3 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3307,10 +3307,14 @@
 [VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede]
   vsi __builtin_vec_vsignede (vd);
 VEC_VSIGNEDE_V2DF
+  vsll __builtin_vec_vsignede (vf);
+VEC_VSIGNEDE_V4SF
 
 [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo]
   vsi __builtin_vec_vsignedo (vd);
 VEC_VSIGNEDO_V2DF
+  vsll __builtin_vec_vsignedo (vf);
+VEC_VSIGNEDO_V4SF
 
 [VEC_SIGNEXTI, vec_signexti, __builtin_vec_signexti]
   vsi __builtin_vec_signexti (vsc);
@@ -4433,10 +4437,14 @@
 [VEC_UNSIGNEDE, vec_unsignede, __builtin_vec_vunsignede]
   vui __builtin_vec_vunsignede (vd);
 VEC_VUNSIGNEDE_V2DF
+  vull __builtin_vec_vunsignede (vf);
+VEC_VUNSIGNEDE

[PATCH 7/13 ver4] rs6000, add overloaded vec_sel with int128 arguments

2024-06-13 Thread Carl Love


GCC maintainers:

The patch has been updated per the comments from version 3.  Please let me know 
if the patch is acceptable for mainline.

 Carl 

-

rs6000, add overloaded vec_sel with int128 arguments

Extend the vec_sel built-in to take three signed/unsigned/bool int128
arguments and return a signed/unsigned/bool int128 result.

Extending the vec_sel built-in makes the existing built-ins
__builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete.  The
patch removes these built-ins.

The patch adds documentation and test cases for the new overloaded
vec_sel built-ins.
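
As a reminder of what a select computes, here is a minimal scalar
sketch of a bitwise select on a single 128-bit value.  It is an
illustration under stated assumptions rather than the built-in itself:
the name model_sel and the u128 typedef are hypothetical, and it
assumes a 64-bit GCC or Clang target where unsigned __int128 is
available.

```c
/* Hypothetical scalar model of a bitwise select on one 128-bit value.
   Assumes a 64-bit GCC or Clang target that provides unsigned __int128.  */
typedef unsigned __int128 u128;

static u128 model_sel (u128 a, u128 b, u128 mask)
{
  /* For each bit: take the bit from b where mask is 1, otherwise from a.  */
  return (a & ~mask) | (b & mask);
}
```

With an all-zeros mask the result is a, and with an all-ones mask the
result is b.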

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti,
__builtin_vsx_xxsel_1ti_uns): Remove built-in definitions.
* config/rs6000/rs6000-overload.def (vec_sel): Add new
overloaded definitions.
* doc/extend.texi: Add documentation for new vec_sel instances.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/builtins-10-runnable.c: New runnable test
file.
* gcc.target/powerpc/builtins-10.c: New compile only test file.
---
 gcc/config/rs6000/rs6000-builtins.def |   6 -
 gcc/config/rs6000/rs6000-overload.def |  12 +
 gcc/doc/extend.texi   |  20 ++
 .../gcc.target/powerpc/builtins-10-runnable.c | 220 ++
 .../gcc.target/powerpc/builtins-10.c  |  63 +
 5 files changed, 315 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-10.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index b90b3f34167..c969cd0f3f6 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1907,12 +1907,6 @@
   const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
 XXSEL_16QI_UNS vector_select_v16qi_uns {}
 
-  const vsq __builtin_vsx_xxsel_1ti (vsq, vsq, vsq);
-XXSEL_1TI vector_select_v1ti {}
-
-  const vsq __builtin_vsx_xxsel_1ti_uns (vsq, vsq, vsq);
-XXSEL_1TI_UNS vector_select_v1ti_uns {}
-
   const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
 XXSEL_2DF vector_select_v2df {}
 
diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index 4d857bb1af3..6cec1ad4f1a 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3274,6 +3274,18 @@
 VSEL_2DF  VSEL_2DF_B
   vd __builtin_vec_sel (vd, vd, vull);
 VSEL_2DF  VSEL_2DF_U
+  vsq __builtin_vec_sel (vsq, vsq, vbq);
+VSEL_1TI  VSEL_1TI_B
+  vsq __builtin_vec_sel (vsq, vsq, vuq);
+VSEL_1TI  VSEL_1TI_U
+  vuq __builtin_vec_sel (vuq, vuq, vbq);
+VSEL_1TI_UNS  VSEL_1TI_UB
+  vuq __builtin_vec_sel (vuq, vuq, vuq);
+VSEL_1TI_UNS  VSEL_1TI_UU
+  vbq __builtin_vec_sel (vbq, vbq, vbq);
+VSEL_1TI_UNS  VSEL_1TI_BB
+  vbq __builtin_vec_sel (vbq, vbq, vuq);
+VSEL_1TI_UNS  VSEL_1TI_BU
 ; The following variants are deprecated.
   vsll __builtin_vec_sel (vsll, vsll, vsll);
 VSEL_2DI_B  VSEL_2DI_S
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b1620274285..d7d8d149a43 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21420,6 +21420,26 @@ Additional built-in functions are available for the 64-bit PowerPC
 family of processors, for efficient use of 128-bit floating point
 (@code{__float128}) values.
 
+Vector select
+
+@smallexample
+vector signed __int128 vec_sel (vector signed __int128,
+   vector signed __int128, vector bool __int128);
+vector signed __int128 vec_sel (vector signed __int128,
+   vector signed __int128, vector unsigned __int128);
+vector unsigned __int128 vec_sel (vector unsigned __int128,
+   vector unsigned __int128, vector bool __int128);
+vector unsigned __int128 vec_sel (vector unsigned __int128,
+   vector unsigned __int128, vector unsigned __int128);
+vector bool __int128 vec_sel (vector bool __int128,
+   vector bool __int128, vector bool __int128);
+vector bool __int128 vec_sel (vector bool __int128,
+   vector bool __int128, vector unsigned __int128);
+@end smallexample
+
+The instance is an extension of the existing overloaded built-in @code{vec_sel}
+that is documented in the PVIPR.
+
 @node Basic PowerPC Built-in Functions Available on ISA 2.06
 @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.06
 
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
new file mode 100644
index 000..b7b4a95ea0e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
@@ -0,0 +1,220 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-options "-maltivec -O2 " } */
+
+#include 
+
+#define DEBUG 0
+
+#if DEBUG
+#include 
+vo

Re: [PATCH 13/13 ver 3] rs6000, remove vector set and vector init built-ins.

2024-06-13 Thread Carl Love
Kewen:

On 6/3/24 22:59, Kewen.Lin wrote:
> Hi,
> 
> on 2024/5/30 00:16, Carl Love wrote:
>> This was patch 13 from the previous series.  Note the previous series patch 
>> 12 was dropped.  This patch is the same as the previous version.  The 
>> additional work to replace __builtin_vec_set_v1ti, __builtin_vec_set_v2di and
>> __builtin_vec_set_v2df with equivalent gimple code, per the feedback comments,
>> is being deferred to a future patch.  The goal of this series was simply to 
>> remove duplicated built-ins, extending overloaded built-ins as needed.  
>> Adding the needed gimple code to remove the additional built-ins is beyond 
>> the goal of this patch series.
>>
>>  Carl 
>> ---
>>
>> rs6000, remove vector set and vector init built-ins.
>>
>> The vector init built-ins:
>>
>>   __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
>>   __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
>>   __builtin_vec_init_v2di, __builtin_vec_init_v2df,
>>   __builtin_vec_set_v1ti
> 
> Typo here, s/__builtin_vec_set_v1ti/__builtin_vec_init_v1ti/

Fixed.

> 
>>
>> perform the same operation as initializing the vector in C code.  For
>> example:
>>
>>   result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
>>   result_v4si = {1, 2, 3, 4};
>>
>> These two constructs were tested and verified they generate identical
>> assembly instructions with no optimization and -O3 optimization.
>>
>> The vector set built-ins:
>>
>>   __builtin_vec_set_v16qi, __builtin_vec_set_v8hi.
>>   __builtin_vec_set_v4si, __builtin_vec_set_v4sf
> 
> Please also add the reserved ones (...v1ti/v2di/v2df), as they are the 
> same too, temporarily reserving them for the uses in resolve_vec_insert()
> doesn't affect this.

Added the three additional built-ins to the list.

> 
>>
>> perform the same operation as setting a specific element in the vector in
>> C code.  For example:
>>
>>   src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
>>   src_v4si[index] = int_val;
>>
>> The built-in actually generates more instructions than the inline C code
>> with no optimization but is identical with -O3 optimizations.
>>
>> All of the above built-ins that are removed do not have test cases and
>> are not documented.
>>
>> Built-ins   __builtin_vec_set_v1ti __builtin_vec_set_v2di,
>> __builtin_vec_set_v2df are not removed as they are used in function
>> resolve_vec_insert() in file rs6000-c.cc.
>>
>> The built-ins are removed as they don't provide any benefit over just
>> using C code.
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-builtins.def (__builtin_vec_init_v16qi,
>>  __builtin_vec_init_v8hi, __builtin_vec_init_v4si,
>>  __builtin_vec_init_v4sf, __builtin_vec_init_v2di,
>>  __builtin_vec_init_v2df, __builtin_vec_set_v1ti,
> 
> Typo, s/__builtin_vec_set_v1ti/__builtin_vec_init_v1ti/

Fixed

> 
>>  __builtin_vec_set_v16qi, __builtin_vec_set_v8hi.
>>  __builtin_vec_set_v4si, __builtin_vec_set_v4sf,
>>  __builtin_vec_set_v2di, __builtin_vec_set_v2df,
>>  __builtin_vec_set_v1ti): Remove built-in definitions.
> 
> The last three ones are not actually removed.

OK, fixed.

> 
>> ---
>>  gcc/config/rs6000/rs6000-builtins.def | 42 ++-
>>  1 file changed, 2 insertions(+), 40 deletions(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index 48ebc018a8d..8349d45169f 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -1118,37 +1118,6 @@
>>const signed short __builtin_vec_ext_v8hi (vss, signed int);
>>  VEC_EXT_V8HI nothing {extract}
>>  
>> -  const vsc __builtin_vec_init_v16qi (signed char, signed char, signed 
>> char, \
>> -signed char, signed char, signed char, signed char, signed 
>> char, \
>> -signed char, signed char, signed char, signed char, signed 
>> char, \
>> -signed char, signed char, signed char);
>> -VEC_INIT_V16QI nothing {init}
>> -
>> -  const vf __builtin_vec_init_v4sf (float, float, float, float);
>> -VEC_INIT_V4SF nothing {init}
>> -
>> -  const vsi __builtin_vec_init_v4si (signed int, signed int, signed int, \
>> - signed int);
>> -VEC_INIT_V4SI nothing {init}
>> -
>> -  const vss __bu

[PATCH 0/13 ver4] rs6000, built-in cleanup patch series

2024-06-13 Thread Carl Love
GCC maintainers:

I have addressed the comments on the five patches in the series that have not
yet been approved.  The patches that have already been approved are 1, 3, 5,
6, 8, 9, 10, and 12.

The remaining patches all have fairly minor fixes requested.  I will just post
version 4 of these patches here.  The goal is to commit the entire series all
at once as the patches are all related, so I am holding off on committing the
approved patches.

Thank you for your time and feedback on these patches.  The entire patch series
has been tested on Power 10 LE and Power 9 BE with no regression failures.

   Carl 


Re: [PATCH 13/13 ver4] rs6000, remove vector set and vector init built-ins

2024-06-13 Thread Carl Love
GCC maintainers:

The patch has been updated per the feedback from version 3.  Please let me know
if the patch is acceptable for mainline.

Thanks.

  Carl 

--

rs6000, remove vector set and vector init built-ins

The vector init built-ins:

  __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
  __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
  __builtin_vec_init_v2di, __builtin_vec_init_v2df,
  __builtin_vec_init_v1ti

perform the same operation as initializing the vector in C code.  For
example:

  result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
  result_v4si = {1, 2, 3, 4};

These two constructs were tested and verified to generate identical
assembly instructions both with no optimization and with -O3.
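
The plain C form can be sketched on any host with the generic GCC/Clang
vector extension.  This is an illustration only: the v4si typedef below
is a local stand-in for the AltiVec type and make_v4si is a
hypothetical helper name, so the sketch builds on non-Power machines.

```c
/* Generic GCC/Clang vector extension, used so the sketch builds on
   non-Power hosts; v4si here is a local typedef, not the AltiVec type.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: build a four-int vector with an ordinary C
   initializer, the form used in place of __builtin_vec_init_v4si.  */
static v4si make_v4si (int a, int b, int c, int d)
{
  v4si result = { a, b, c, d };
  return result;
}
```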

The vector set built-ins:

  __builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
  __builtin_vec_set_v4si, __builtin_vec_set_v4sf,
  __builtin_vec_set_v1ti, __builtin_vec_set_v2di,
  __builtin_vec_set_v2df

perform the same operation as setting a specific element in the vector in
C code.  For example:

  src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
  src_v4si[index] = int_val;

With no optimization the built-in actually generates more instructions than
the inline C code; with -O3 the generated code is identical.
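
The element assignment form can likewise be sketched with the generic
GCC/Clang vector extension.  As before this is only an illustration:
v4si is a local typedef and set_elem_v4si is a hypothetical helper
name.  Masking the index to two bits is an assumption of the sketch,
chosen to keep the subscript in range for a four-element vector.

```c
/* Generic GCC/Clang vector extension; v4si is a local typedef, not the
   AltiVec type.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: set one element with ordinary subscripting, the
   form used in place of __builtin_vec_set_v4si.  The index is masked to
   0..3 so the subscript always stays inside the vector.  */
static v4si set_elem_v4si (v4si src, int val, int index)
{
  src[index & 3] = val;
  return src;
}
```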

All of the above built-ins that are removed do not have test cases and
are not documented.

Built-ins __builtin_vec_set_v1ti, __builtin_vec_set_v2di and
__builtin_vec_set_v2df are not removed as they are used in function
resolve_vec_insert() in file rs6000-c.cc.

The built-ins are removed as they don't provide any benefit over just
using C code.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vec_init_v16qi,
__builtin_vec_init_v4sf, __builtin_vec_init_v4si,
__builtin_vec_init_v8hi, __builtin_vec_init_v1ti,
__builtin_vec_init_v2df, __builtin_vec_init_v2di,
__builtin_vec_set_v16qi, __builtin_vec_set_v4sf,
__builtin_vec_set_v4si, __builtin_vec_set_v8hi): Remove
built-in definitions.
---
 gcc/config/rs6000/rs6000-builtins.def | 44 +++
 1 file changed, 4 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 02aa04e5698..053dc0115d2 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1118,37 +1118,6 @@
   const signed short __builtin_vec_ext_v8hi (vss, signed int);
 VEC_EXT_V8HI nothing {extract}
 
-  const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, \
-signed char, signed char, signed char, signed char, signed char, \
-signed char, signed char, signed char, signed char, signed char, \
-signed char, signed char, signed char);
-VEC_INIT_V16QI nothing {init}
-
-  const vf __builtin_vec_init_v4sf (float, float, float, float);
-VEC_INIT_V4SF nothing {init}
-
-  const vsi __builtin_vec_init_v4si (signed int, signed int, signed int, \
- signed int);
-VEC_INIT_V4SI nothing {init}
-
-  const vss __builtin_vec_init_v8hi (signed short, signed short, signed short,\
- signed short, signed short, signed short, signed short, \
- signed short);
-VEC_INIT_V8HI nothing {init}
-
-  const vsc __builtin_vec_set_v16qi (vsc, signed char, const int<4>);
-VEC_SET_V16QI nothing {set}
-
-  const vf __builtin_vec_set_v4sf (vf, float, const int<2>);
-VEC_SET_V4SF nothing {set}
-
-  const vsi __builtin_vec_set_v4si (vsi, signed int, const int<2>);
-VEC_SET_V4SI nothing {set}
-
-  const vss __builtin_vec_set_v8hi (vss, signed short, const int<3>);
-VEC_SET_V8HI nothing {set}
-
-
 ; Cell builtins.
 [cell]
   pure vsc __builtin_altivec_lvlx (signed long, const void *);
@@ -1295,15 +1264,10 @@
   const signed long long __builtin_vec_ext_v2di (vsll, signed int);
 VEC_EXT_V2DI nothing {extract}
 
-  const vsq __builtin_vec_init_v1ti (signed __int128);
-VEC_INIT_V1TI nothing {init}
-
-  const vd __builtin_vec_init_v2df (double, double);
-VEC_INIT_V2DF nothing {init}
-
-  const vsll __builtin_vec_init_v2di (signed long long, signed long long);
-VEC_INIT_V2DI nothing {init}
-
+;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in
+;; resolve_vec_insert(), rs6000-c.cc
+;; TODO: Remove VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI once the uses
+;; in resolve_vec_insert are replaced by the equivalent gimple statements.
   const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>);
 VEC_SET_V1TI nothing {set}
 
-- 
2.45.0



[PATCH 11/13 ver4] rs6000, extend vec_xxpermdi built-in for __int128 args

2024-06-13 Thread Carl Love


GCC maintainers:

The patch has been updated per the comments from version 3.  Please let me know 
if the patch is acceptable for mainline.

Thanks.

 Carl 

-

rs6000, extend vec_xxpermdi built-in for __int128 args

Add new signed and unsigned overloaded instances of vec_xxpermdi:

   __int128 vec_xxpermdi (__int128, __int128, const int);
   __uint128 vec_xxpermdi (__uint128, __uint128, const int);

Update the documentation to include a reference to the new built-in
instances.

Add test cases for the new overloaded instances.
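
For readers unfamiliar with the operation, the doubleword permute can
be modelled in scalar C roughly as below.  This is a sketch under
stated assumptions, not the built-in: it follows the xxpermdi
instruction as usually described, with doublewords numbered in the
ISA's big-endian view (doubleword 0 is the high half of the 128-bit
value).  The struct dwpair and the function model_xxpermdi are
hypothetical names, and any little-endian adjustment done by the
compiler is not modelled.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit value as two doublewords, numbered as
   in the big-endian view: dw[0] is the high half, dw[1] the low half.  */
struct dwpair { uint64_t dw[2]; };

/* Result doubleword 0 comes from a, selected by the high bit of the
   2-bit selector dm; result doubleword 1 comes from b, selected by the
   low bit of dm.  */
static struct dwpair model_xxpermdi (struct dwpair a, struct dwpair b, int dm)
{
  struct dwpair r;
  r.dw[0] = (dm & 2) ? a.dw[1] : a.dw[0];
  r.dw[1] = (dm & 1) ? b.dw[1] : b.dw[0];
  return r;
}
```

Under this model a selector of 0 yields { a.dw[0], b.dw[0] } and a
selector of 3 yields { a.dw[1], b.dw[1] }.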

gcc/ChangeLog:
* config/rs6000/rs6000-overload.def (vec_xxpermdi): Add new
overloaded built-in instances.
* doc/extend.texi:  Add documentation for new overloaded built-in
instances.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec_perm-runnable-i128.c: New test file.
---
 gcc/config/rs6000/rs6000-overload.def |   4 +
 gcc/doc/extend.texi   |   4 +
 .../powerpc/vec_perm-runnable-i128.c  | 229 ++
 3 files changed, 237 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c

diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index 6cec1ad4f1a..354f8fabe0f 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -4936,6 +4936,10 @@
 XXPERMDI_2DI  XXPERMDI_VSLL
   vull __builtin_vsx_xxpermdi (vull, vull, const int);
 XXPERMDI_2DI  XXPERMDI_VULL
+  vsq __builtin_vsx_xxpermdi (vsq, vsq, const int);
+XXPERMDI_1TI  XXPERMDI_1SQ
+  vuq __builtin_vsx_xxpermdi (vuq, vuq, const int);
+XXPERMDI_1TI  XXPERMDI_1UQ
   vf __builtin_vsx_xxpermdi (vf, vf, const int);
 XXPERMDI_4SF  XXPERMDI_VF
   vd __builtin_vsx_xxpermdi (vd, vd, const int);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index d7d8d149a43..9e45976436b 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22610,6 +22610,10 @@ void vec_vsx_st (vector bool char, int, signed char *);
 
 vector double vec_xxpermdi (vector double, vector double, const int);
 vector float vec_xxpermdi (vector float, vector float, const int);
+vector __int128 vec_xxpermdi (vector signed __int128,
+  vector signed __int128, const int);
+vector __int128 vec_xxpermdi (vector unsigned __int128,
+  vector unsigned __int128, const int);
 vector long long vec_xxpermdi (vector long long, vector long long, const int);
 vector unsigned long long vec_xxpermdi (vector unsigned long long,
 vector unsigned long long, const int);
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
new file mode 100644
index 000..0e0d77bcb84
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
@@ -0,0 +1,229 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-options "-maltivec -O2 " } */
+
+#include 
+
+#define DEBUG 0
+
+#if DEBUG
+#include 
+void print_i128 (unsigned __int128 val)
+{
+  printf(" 0x%016llx%016llx",
+ (unsigned long long)(val >> 64),
+ (unsigned long long)(val & 0x));
+}
+#endif
+
+extern void abort (void);
+
+union convert_union {
+  vector signed __int128 s128;
+  vector unsigned __int128  u128;
+  char  val[16];
+} convert;
+
+int check_u128_result(vector unsigned __int128 vresult_u128,
+ vector unsigned __int128 expected_vresult_u128)
+{
+  /* Use a for loop to check each byte manually so the test case will
+ run with ISA 2.06.
+
+ Return 1 if they match, 0 otherwise.  */
+
+  int i;
+
+  union convert_union result;
+  union convert_union expected;
+
+  result.u128 = vresult_u128;
+  expected.u128 = expected_vresult_u128;
+
+  /* Check if each byte of the result and expected match. */
+  for (i = 0; i < 16; i++)
+{
+  if (result.val[i] != expected.val[i])
+   return 0;
+}
+  return 1;
+}
+
+int check_s128_result(vector signed __int128 vresult_s128,
+ vector signed __int128 expected_vresult_s128)
+{
+  /* Convert the arguments to unsigned, then check equality.  */
+  union convert_union result;
+  union convert_union expected;
+
+  result.s128 = vresult_s128;
+  expected.s128 = expected_vresult_s128;
+
+  return check_u128_result (result.u128, expected.u128);
+}
+
+
+int
+main (int argc, char *argv [])
+{
+  int i;
+  
+  vector signed __int128 src_va_s128;
+  vector signed __int128 src_vb_s128;
+  vector signed __int128 vresult_s128;
+  vector signed __int128 expected_vresult_s128;
+
+  vector unsigned __int128 src_va_u128;
+  vector unsigned __int128 src_vb_u128;
+  vector unsigned __int128 src_vc_u128;
+  vector unsigned __int128 vresult_u128;
+  vector unsigned __int128 expected_

[PATCH 2/13 ver4] rs6000, Remove __builtin_vsx_xvcvspsxws, __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws built-ins.

2024-06-13 Thread Carl Love
GCC maintainers:

Per the comments on patch 0004 from version 3, the removal of the built-ins 
__builtin_vsx_xvcvdpuxds_uns and __builtin_vsx_xvcvspuxws was moved to this 
patch.  The rest of the patch is unchanged from version 3.  There were no 
comments on this patch for version 3.

Please let me know if this patch is acceptable.  Thanks.

Carl 


-

rs6000, Remove __builtin_vsx_xvcvspsxws,
 __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws built-ins.

The built-in __builtin_vsx_xvcvspsxws is a duplicate of the vec_signed
built-in that is documented in the PVIPR.  The __builtin_vsx_xvcvspsxws
built-in is not documented and there are no test cases for it.

The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by
vec_unsigned, remove.

The __builtin_vsx_xvcvspuxws is redundant as it is covered by
vec_unsigned, remove.

This patch removes the redundant built-ins.
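
For readers unfamiliar with the conversion being discussed, the following is a
minimal scalar sketch of a lane-wise float to signed int conversion with
truncation toward zero, as suggested by the fix_trunc pattern name in the
removed definition.  The function name model_vec_signed_v4sf is hypothetical
and is not part of GCC; the documented interface remains vec_signed.  The
sketch assumes every input fits in a 32-bit signed integer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model, not GCC code: convert each of the four
   float lanes to a signed 32-bit integer, truncating toward zero.
   Assumption: every input is within int32_t range, since an
   out-of-range float to integer cast is undefined in ISO C.  */
void model_vec_signed_v4sf (const float in[4], int32_t out[4])
{
  assert (in != 0 && out != 0);

  for (int i = 0; i < 4; i++)
    out[i] = (int32_t) in[i];
}
```

Code that called the removed built-in directly would instead call vec_signed
on a vector float argument.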

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxws,
__builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws):
Remove built-in definitions.
---
 gcc/config/rs6000/rs6000-builtins.def | 9 -
 1 file changed, 9 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 7c36976a089..8cf0b715898 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1697,9 +1697,6 @@
   const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
 XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
 
-  const vull __builtin_vsx_xvcvdpuxds_uns (vd);
-XVCVDPUXDS_UNS vsx_fixuns_truncv2dfv2di2 {}
-
   const vsi __builtin_vsx_xvcvdpuxws (vd);
 XVCVDPUXWS vsx_xvcvdpuxws {}
 
@@ -1709,15 +1706,9 @@
   const vsll __builtin_vsx_xvcvspsxds (vf);
 XVCVSPSXDS vsx_xvcvspsxds {}
 
-  const vsi __builtin_vsx_xvcvspsxws (vf);
-XVCVSPSXWS vsx_fix_truncv4sfv4si2 {}
-
   const vsll __builtin_vsx_xvcvspuxds (vf);
 XVCVSPUXDS vsx_xvcvspuxds {}
 
-  const vsi __builtin_vsx_xvcvspuxws (vf);
-XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {}
-
   const vd __builtin_vsx_xvcvsxddp (vsll);
 XVCVSXDDP vsx_floatv2div2df2 {}
 
-- 
2.45.0



[PATCH] rs6000, altivec-2-runnable.c should be a runnable test

2024-06-13 Thread Carl Love


GCC maintainers:

The test gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c is supposed to 
be a runnable test to verify the execution of the vec_unpackl and vec_unpackh 
built-ins.  However, the dg-do command is set to compile, not run.  This patch 
fixes the dg-do command argument.

The patch has been verified on a P10.  The test runs without errors.

Please let me know if the patch is acceptable.  Thanks.

Carl 

-

rs6000, altivec-2-runnable.c should be a runnable test

The test case has "dg-do compile" set, not "dg-do run", even though it is a
runnable test.  This patch changes the dg-do command argument to run.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-2-runnable.c: Change dg-do
argument to run.
---
 gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
index 6975ea57e65..3e66435d0d2 100644
--- a/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-do run { target powerpc*-*-* } } */
 /* { dg-options "-mvsx" } */
 /* { dg-additional-options "-mdejagnu-cpu=power8" { target { ! has_arch_pwr8 } 
} } */
 /* { dg-require-effective-target powerpc_vsx } */
-- 
2.45.0



Re: [PATCH 11/13 ver 3] rs6000, extend vec_xxpermdi built-in for __int128 args

2024-06-13 Thread Carl Love
Kewen:

On 6/3/24 22:58, Kewen.Lin wrote:
> Hi,
> 
> on 2024/5/30 00:10, Carl Love wrote:
>>  This was patch 10 from the previous series.  The patch was updated to 
>> address feedback comments.
>>
>> Carl 
>> ---
>>
>> rs6000, extend vec_xxpermdi built-in for __int128 args
>>
>> Add new signed and unsigned overloaded instances for vec_xxpermdi:
>>
>>__int128 vec_xxpermdi (__int128, __int128, const int);
>>__uint128 vec_xxpermdi (__uint128, __uint128, const int);
>>
>> Update the documentation to include a reference to the new built-in
>> instances.
>>
>> Add test cases for the new overloaded instances.
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-overload.def (vec_xxpermdi): Add new
>>  overloaded built-in instances.
>>  * doc/extend.texi:  Add documentation for new overloaded built-in
>>  instances.
>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/vec_perm-runnable-i128.c: New test file.
>> ---
>>  gcc/config/rs6000/rs6000-overload.def |   4 +
>>  gcc/doc/extend.texi   |   2 +
>>  .../powerpc/vec_perm-runnable-i128.c  | 229 ++
>>  3 files changed, 235 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
>>
>> diff --git a/gcc/config/rs6000/rs6000-overload.def 
>> b/gcc/config/rs6000/rs6000-overload.def
>> index a210c5ad10d..45000f161e4 100644
>> --- a/gcc/config/rs6000/rs6000-overload.def
>> +++ b/gcc/config/rs6000/rs6000-overload.def
>> @@ -4932,6 +4932,10 @@
>>  XXPERMDI_4SF  XXPERMDI_VF
>>vd __builtin_vsx_xxpermdi (vd, vd, const int);
>>  XXPERMDI_2DF  XXPERMDI_VD
>> +  vsq __builtin_vsx_xxpermdi (vsq, vsq, const int);
>> +XXPERMDI_1TI  XXPERMDI_1TI
>> +  vuq __builtin_vsx_xxpermdi (vuq, vuq, const int);
>> +XXPERMDI_1TI  XXPERMDI_1TUI
> 
> Nits:
>   - Move them before "vf __builtin_vsx_xxpermdi (vf, vf, const int);" so
> they are close to instances for other integral types.
>   - As the existing name convention, _{SQ,UQ} are better.
> 
> vsq __builtin_vsx_xxpermdi (vsq, vsq, const int);
>XXPERMDI_1TI  XXPERMDI_1SQ
> vuq __builtin_vsx_xxpermdi (vuq, vuq, const int);
>XXPERMDI_1TI  XXPERMDI_1UQ
> 

OK, moved the definitions up and changed the names.
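
For context on what the built-in under discussion does, here is a rough scalar
sketch of the doubleword selection performed by the xxpermdi instruction at the
ISA level.  The type vsr128 and the function model_xxpermdi are hypothetical
names used only for illustration.  The sketch uses the ISA's big-endian
doubleword numbering (dword[0] is the most significant half) and does not
attempt to describe how the built-in's element numbering is handled on
little-endian targets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register; dword[0] is the
   most significant doubleword, following the ISA numbering.  */
struct vsr128 { uint64_t dword[2]; };

/* Hypothetical scalar model of xxpermdi: the high bit of the 2-bit
   immediate picks a doubleword from A, the low bit picks one from B.  */
struct vsr128 model_xxpermdi (struct vsr128 a, struct vsr128 b, int dm)
{
  struct vsr128 r;

  assert (dm >= 0 && dm <= 3);
  r.dword[0] = a.dword[(dm >> 1) & 1];
  r.dword[1] = b.dword[dm & 1];
  return r;
}
```

With the same register passed as both sources, an immediate of 2 swaps the two
doublewords in this model.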

>>  
>>  [VEC_XXSLDWI, vec_xxsldwi, __builtin_vsx_xxsldwi]
>>vsc __builtin_vsx_xxsldwi (vsc, vsc, const int);
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index 0756230b19e..edfef1bdab7 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -22555,6 +22555,8 @@ void vec_vsx_st (vector bool char, int, signed char 
>> *);
>>  vector double vec_xxpermdi (vector double, vector double, const int);
>>  vector float vec_xxpermdi (vector float, vector float, const int);
>>  vector long long vec_xxpermdi (vector long long, vector long long, const 
>> int);
> 
>> +vector __int128 vec_xxpermdi (vector __int128, vector __int128, const int);
>> +vector __int128 vec_xxpermdi (vector __uint128, vector __uint128, const 
>> int);
> 
> Nit: These two lines break the long long and unsigned long long lines, can 
> you move
> them one line upward?  Also using the explicit "signed" and "unsigned" would 
> be
> better than "__{u,}int128".
> 

Yup, I didn't get them in the right place.  Fixed.

>>  vector unsigned long long vec_xxpermdi (vector unsigned long long,
>>  vector unsigned long long, const 
>> int);
>>  vector int vec_xxpermdi (vector int, vector int, const int);
>> diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c 
>> b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
>> new file mode 100644
>> index 000..2d5dce09404
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
>> @@ -0,0 +1,229 @@
>> +/* { dg-do run } */
>> +/* { dg-require-effective-target vmx_hw } */
>> +/* { dg-options "-save-temps" } */
> 
> Nit: dg-options line isn't needed as it doesn't check assembly.

Removed the save-temps.

> 
> BR,
> Kewen
> 
>> +
>> +#include 
>> +
>> +#define DEBUG 0
>> +
>> +#if DEBUG
>> +#include 
>> +void print_i128 (unsigned __int128 val)
>> +{
>> +  printf(" 0x%016llx%016llx",
>> +   

Re: [PATCH 7/13 ver 3] rs6000, add overloaded vec_sel with int128 arguments

2024-06-13 Thread Carl Love
Kewen:

On 6/3/24 22:58, Kewen.Lin wrote:
> Hi,
> 
> on 2024/5/30 00:03, Carl Love wrote:
>> This was patch 6 in the previous series.  Updated the documentation file per 
>> the comments.  No functional changes to the patch.
>>
>>   Carl 
>> 
>>
>> rs6000, add overloaded vec_sel with int128 arguments
>>
>> Extend the vec_sel built-in to take three signed/unsigned int128 arguments
>> and return a signed/unsigned int128 result.
>>
>> Extending the vec_sel built-in makes the existing built-ins
>> __builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete.  The
>> patch removes these built-ins.
>>
>> The patch adds documentation and test cases for the new overloaded vec_sel
>> built-ins.
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti,
>>  __builtin_vsx_xxsel_1ti_uns): Remove built-in definitions.
>>  * config/rs6000/rs6000-overload.def (vec_sel): Add new overloaded
>>  definitions.
>>  * doc/extend.texi: Add documentation for new vec_sel instances.
>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/vec-sel-runnable-i128.c: New test file.
>> ---
>>  gcc/config/rs6000/rs6000-builtins.def |   6 -
>>  gcc/config/rs6000/rs6000-overload.def |   4 +
>>  gcc/doc/extend.texi   |  12 ++
>>  .../powerpc/vec-sel-runnable-i128.c   | 129 ++
>>  4 files changed, 145 insertions(+), 6 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index 13e36df008d..ea0da77f13e 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -1904,12 +1904,6 @@
>>const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
>>  XXSEL_16QI_UNS vector_select_v16qi_uns {}
>>  
>> -  const vsq __builtin_vsx_xxsel_1ti (vsq, vsq, vsq);
>> -XXSEL_1TI vector_select_v1ti {}
>> -
>> -  const vsq __builtin_vsx_xxsel_1ti_uns (vsq, vsq, vsq);
>> -XXSEL_1TI_UNS vector_select_v1ti_uns {}
>> -
>>const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
>>  XXSEL_2DF vector_select_v2df {}
>>  
>> diff --git a/gcc/config/rs6000/rs6000-overload.def 
>> b/gcc/config/rs6000/rs6000-overload.def
>> index 4d857bb1af3..a210c5ad10d 100644
>> --- a/gcc/config/rs6000/rs6000-overload.def
>> +++ b/gcc/config/rs6000/rs6000-overload.def
>> @@ -3274,6 +3274,10 @@
>>  VSEL_2DF  VSEL_2DF_B
>>vd __builtin_vec_sel (vd, vd, vull);
>>  VSEL_2DF  VSEL_2DF_U
>> +  vsq __builtin_vec_sel (vsq, vsq, vsq);
>> +VSEL_1TI  VSEL_1TI_S
>> +  vuq __builtin_vec_sel (vuq, vuq, vuq);
>> +VSEL_1TI_UNS  VSEL_1TI_U
> 
> I just noticed that for integral types, such as: signed/unsigned int, we have 
> six instances:
> 
>   vsi __builtin_vec_sel (vsi, vsi, vbi);
> VSEL_4SI  VSEL_4SI_B
>   vsi __builtin_vec_sel (vsi, vsi, vui);
> VSEL_4SI  VSEL_4SI_U
>   vui __builtin_vec_sel (vui, vui, vbi);
> VSEL_4SI_UNS  VSEL_4SI_UB
>   vui __builtin_vec_sel (vui, vui, vui);
> VSEL_4SI_UNS  VSEL_4SI_UU
>   vbi __builtin_vec_sel (vbi, vbi, vbi);
> VSEL_4SI_UNS  VSEL_4SI_BB
>   vbi __builtin_vec_sel (vbi, vbi, vui);
> 
> It considers the control vector can only have unsigned and bool types, also 
> consider the
> return type can be bool.  It aligns with what PVIPR defines, so here we 
> should have:
> 
> vsq __builtin_vec_sel (vsq, vsq, vbq);
> vsq __builtin_vec_sel (vsq, vsq, vuq);
> vuq __builtin_vec_sel (vuq, vuq, vbq);
> vuq __builtin_vec_sel (vuq, vuq, vuq);
> vbq __builtin_vec_sel (vbq, vbq, vbq);
> vbq __builtin_vec_sel (vbq, vbq, vuq);
> 
> Sorry that I didn't find this in the previous review.

Yea, my bad I missed that as well.  Fixed to add all six instances.
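
As background for the six instances, vec_sel is a bitwise select: each result
bit comes from the second argument where the control bit is 1 and from the
first argument where it is 0.  The following is a minimal scalar sketch on a
single 128-bit value.  The names u128, model_vec_sel_u128 and demo_vec_sel are
hypothetical, and the sketch assumes a compiler that provides
unsigned __int128 (GCC or Clang on a 64-bit target).

```c
#include <assert.h>

/* Assumption: the compiler provides unsigned __int128.  */
typedef unsigned __int128 u128;

/* Hypothetical scalar model of vec_sel on one 128-bit element:
   take bits of B where MASK is 1 and bits of A where MASK is 0.  */
u128 model_vec_sel_u128 (u128 a, u128 b, u128 mask)
{
  return (a & ~mask) | (b & mask);
}

/* Small usage example: an all-ones mask returns B, an all-zeros
   mask returns A.  */
void demo_vec_sel (void)
{
  u128 a = 5;
  u128 b = 9;

  assert (model_vec_sel_u128 (a, b, ~(u128) 0) == b);
  assert (model_vec_sel_u128 (a, b, 0) == a);
}
```

The signed, unsigned and bool instances differ only in the types of the data
arguments and the result; the bit-level operation in this model is the same.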
> 
> 
>>  ; The following variants are deprecated.
>>vsll __builtin_vec_sel (vsll, vsll, vsll);
>>  VSEL_2DI_B  VSEL_2DI_S
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index b88e61641a2..0756230b19e 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -21372,6 +21372,18 @@ Additional built-in functions are available for the 
>> 64-bit PowerPC
>>  family of processors, for efficient use of 128-bit floating point
>>  (@code{__float128}) values.
>>  
>> +Vector select
>> +
>>

Re: [PATCH 4/13 ver 3] rs6000, extend the current vec_{un,}signed{e,o} built-ins

2024-06-13 Thread Carl Love
Kewen:

On 6/4/24 00:19, Kewen.Lin wrote:
> Hi,
> 
> on 2024/5/29 23:58, Carl Love wrote:
>> Updated the patch per the feedback comments from the previous version.
>>
>>  Carl 
>> ---
>>
>> rs6000, extend the current vec_{un,}signed{e,o} built-ins
>>
>> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
>> convert a vector of floats to signed/unsigned long long ints.  Extend the
>> existing vec_{un,}signed{e,o} built-ins to handle the argument
>> vector of floats to return the even/odd signed/unsigned integers.
>>
>> The define expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf,
>> vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o}
>> built-ins.
>>
>> The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are
>> now for internal use only. They are not documented and they do not
>> have testcases.
>>
>> The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by
>> vec_signed{e,o}, remove.
>>
>> The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by
>> vec_unsigned{e,o}, remove.
>>
>> The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by
>> vec_unsigned, remove.
>>
>> The __builtin_vsx_xvcvspuxws is redundant as it is covered by
>> vec_unsigned, remove.
> 
> I prefer to move these removals into sub-patch 2/13 or split them out into
> a new patch, since they don't match the subject of this patch.  Moving it
> to sub-patch 2/13 looks good as they are all about vec_{un,}signed{,e,o}.

Yes, we need to have all of the vec_unsigned in the same patch.  Moved 
__builtin_vsx_xvcvdpuxds_uns and __builtin_vsx_xvcvspuxws to patch 2.
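
To make the even/odd extension described in the quoted commit message more
concrete, here is a rough scalar sketch of converting the even or odd float
elements of a four-element vector to two signed long long values.  The
function names are hypothetical and the element numbering is the logical
(array-order) one; how the built-ins map elements on a given endianness is not
modeled here.  Inputs are assumed to be within long long range.

```c
#include <assert.h>

/* Hypothetical scalar model: convert the even-numbered float elements
   (0 and 2) to signed long long, truncating toward zero.  */
void model_signede_v4sf (const float in[4], long long out[2])
{
  assert (in != 0 && out != 0);
  out[0] = (long long) in[0];
  out[1] = (long long) in[2];
}

/* Hypothetical scalar model: same conversion for the odd-numbered
   elements (1 and 3).  */
void model_signedo_v4sf (const float in[4], long long out[2])
{
  assert (in != 0 && out != 0);
  out[0] = (long long) in[1];
  out[1] = (long long) in[3];
}
```

The unsigned variants would follow the same shape with an unsigned long long
result, assuming non-negative inputs.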
> 
>>
>> Add testcases and update documentation.
>>
>> gcc/ChangeLog:
>>  * config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxds_low,
>>  __builtin_vsx_xvcvspuxds_low): New built-in definitions.
>>  (__builtin_vsx_xvcvspuxds): Fix return type.
>>  (XVCVSPSXDS, XVCVSPUXDS): Renamed VEC_VSIGNEDE_V4SF,
>>  VEC_VUNSIGNEDE_V4SF respectively.
>>  (vsx_xvcvspsxds, vsx_xvcvspuxds): Renamed vsignede_v4sf,
>>  vunsignede_v4sf respectively.
>>  (__builtin_vsx_xvcvdpsxws, __builtin_vsx_xvcvdpuxws,
>>  __builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws): Removed.
>>  * config/rs6000/rs6000-overload.def (vec_signede, vec_signedo,
>>  vec_unsignede,vec_unsignedo):  Add new overloaded specifications.
>>  * config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf,
>>  vunsignede_v4sf, vunsignedo_v4sf): New define_expands.
>>  * doc/extend.texi (vec_signedo, vec_signede): Add documentation.
>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.target/powerpc/builtins-3-runnable.c: New tests for the added
>>  overloaded built-ins.
>> ---
>>  gcc/config/rs6000/rs6000-builtins.def | 25 ++
>>  gcc/config/rs6000/rs6000-overload.def |  8 ++
>>  gcc/config/rs6000/vsx.md  | 88 +++
>>  gcc/doc/extend.texi   | 10 +++
>>  .../gcc.target/powerpc/builtins-3-runnable.c  | 51 +--
>>  5 files changed, 157 insertions(+), 25 deletions(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000-builtins.def 
>> b/gcc/config/rs6000/rs6000-builtins.def
>> index bf9a0ae22fc..cea2649b86c 100644
>> --- a/gcc/config/rs6000/rs6000-builtins.def
>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>> @@ -1688,32 +1688,23 @@
>>const vsll __builtin_vsx_xvcvdpsxds_scale (vd, const int);
>>  XVCVDPSXDS_SCALE vsx_xvcvdpsxds_scale {}
>>  
>> -  const vsi __builtin_vsx_xvcvdpsxws (vd);
>> -XVCVDPSXWS vsx_xvcvdpsxws {}
>> -
>> -  const vsll __builtin_vsx_xvcvdpuxds (vd);
>> -XVCVDPUXDS vsx_fixuns_truncv2dfv2di2 {}
>> -
>>const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
>>  XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
>>  
>> -  const vull __builtin_vsx_xvcvdpuxds_uns (vd);
>> -XVCVDPUXDS_UNS vsx_fixuns_truncv2dfv2di2 {}
>> -
>> -  const vsi __builtin_vsx_xvcvdpuxws (vd);
>> -XVCVDPUXWS vsx_xvcvdpuxws {}
>> -
>>const vd __builtin_vsx_xvcvspdp (vf);
>>  XVCVSPDP vsx_xvcvspdp {}
>>  
>>const vsll __builtin_vsx_xvcvspsxds (vf);
>> -XVCVSPSXDS vsx_xvcvspsxds {}
>> +VEC_VSIGNEDE_V4SF vsignede_v4sf {}
> 
> We should rename __builtin_vsx_xvcvspsxds to
> __builtin_vsx_vsignede_v4sf, one reason is to align with

Re: [PATCH 1/13 ver 3] rs6000, Remove __builtin_vsx_cmple* builtins

2024-06-05 Thread Carl Love
Kewen:

On 6/3/24 23:00, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2024/5/29 23:52, Carl Love wrote:
>> This patch was approved in the previous series.  There are no changes to 
>> this patch.  Reposting for completeness. 
> I guess you can just push the approved ones, as there is no dependency
> between any two of them?  It can help to reduce the size of this series.

The patches do touch some similar files so they are not completely independent 
from a patch standpoint.  Functionally they are all independent.

I tried applying the approved patches only to the current mainline tree.  The 
approved patches were: 1,3,5 (with tweak), 6, 8, 9, 10, 12.  Patch 5 requires a 
little rebasing due to a little fuzz in the lines.  Not a big deal.  Patch 8 
also doesn't apply cleanly with git.  The patch command gets a little confused 
when I tried to use it, so I had to manually "recreate" the patch.  The changes 
are straight forward so that is fairly easy.  The rest of the patches applied 
cleanly with git. I am guessing there will be some rebasing needed for the 
non-approved patches to apply them after the approved patches.

The main reason that I reposted everything was that the patch numbers changed 
and I wanted it to be fairly clear what was going on.  

I toyed with the idea of committing the 8 approved patches and then working on 
the additional 5, but I think that is hard: I would either have to manually 
adjust the patch numbers to keep them lined up with version 3, or version 4 
would have a new numbering of patches 1 to 5 (i.e. a remapping of the version 3 
patch numbers).  Either way I think it would be hard/confusing. 

Given that separating out the approved and non-approved patches causes some 
re-basing issues, it is probably best to just update the 5 patches, posting 
them as version 4 and not re-post the whole series. I will just note in the 
header patch 0/13 the patches that have already been approved.  I hope that is 
ok?

 Carl 


Re: [PATCH 13/13 ver 3] rs6000, remove vector set and vector init built-ins.

2024-05-29 Thread Carl Love
This was patch 13 from the previous series.  Note the previous series patch 12 
was dropped.  This patch is the same as the previous version.  The additional 
work, per the feedback comments, to replace __builtin_vec_set_v1ti, 
__builtin_vec_set_v2di and __builtin_vec_set_v2df with equivalent gimple code 
is being deferred to a future patch.  The goal of this series was simply to 
remove duplicated built-ins, extending overloaded built-ins as needed.  Adding 
the needed gimple code to remove the additional built-ins is beyond the goal of 
this patch series.

 Carl 
---

rs6000, remove vector set and vector init built-ins.

The vector init built-ins:

  __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
  __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
  __builtin_vec_init_v2di, __builtin_vec_init_v2df,
  __builtin_vec_init_v1ti

perform the same operation as initializing the vector in C code.  For
example:

  result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
  result_v4si = {1, 2, 3, 4};

These two constructs were tested and verified to generate identical
assembly instructions both with no optimization and with -O3 optimization.

The vector set built-ins:

  __builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
  __builtin_vec_set_v4si, __builtin_vec_set_v4sf

perform the same operation as setting a specific element in the vector in
C code.  For example:

  src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
  src_v4si[index] = int_val;

The built-in actually generates more instructions than the inline C code
with no optimization, but the generated code is identical with -O3.
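
The plain C forms referred to above can be sketched in a portable way using
GCC's generic vector extension.  The typedef v4si and the helper names
make_v4si and set_element are chosen here for illustration only; on PowerPC
the same constructs apply to the "vector signed int" type from altivec.h.

```c
#include <assert.h>

/* Assumption: GCC (or Clang) generic vector extension; four 32-bit
   ints in a 16-byte vector.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Equivalent of the removed init built-in: a brace initializer.  */
v4si make_v4si (void)
{
  v4si v = {1, 2, 3, 4};
  return v;
}

/* Equivalent of the removed set built-in: subscript assignment.
   The index is checked here only to keep the sketch well defined.  */
v4si set_element (v4si v, int val, unsigned int index)
{
  assert (index < 4);
  v[index] = val;
  return v;
}
```

For example, set_element (make_v4si (), 42, 2) yields a vector whose elements
are 1, 2, 42, 4.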

All of the above built-ins that are removed do not have test cases and
are not documented.

Built-ins __builtin_vec_set_v1ti, __builtin_vec_set_v2di,
__builtin_vec_set_v2df are not removed as they are used in function
resolve_vec_insert() in file rs6000-c.cc.

The built-ins are removed as they don't provide any benefit over just
using C code.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vec_init_v16qi,
__builtin_vec_init_v8hi, __builtin_vec_init_v4si,
__builtin_vec_init_v4sf, __builtin_vec_init_v2di,
__builtin_vec_init_v2df, __builtin_vec_init_v1ti,
__builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
__builtin_vec_set_v4si, __builtin_vec_set_v4sf,
__builtin_vec_set_v2di, __builtin_vec_set_v2df,
__builtin_vec_set_v1ti): Remove built-in definitions.
---
 gcc/config/rs6000/rs6000-builtins.def | 42 ++-
 1 file changed, 2 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 48ebc018a8d..8349d45169f 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1118,37 +1118,6 @@
   const signed short __builtin_vec_ext_v8hi (vss, signed int);
 VEC_EXT_V8HI nothing {extract}
 
-  const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, \
-signed char, signed char, signed char, signed char, signed char, \
-signed char, signed char, signed char, signed char, signed char, \
-signed char, signed char, signed char);
-VEC_INIT_V16QI nothing {init}
-
-  const vf __builtin_vec_init_v4sf (float, float, float, float);
-VEC_INIT_V4SF nothing {init}
-
-  const vsi __builtin_vec_init_v4si (signed int, signed int, signed int, \
- signed int);
-VEC_INIT_V4SI nothing {init}
-
-  const vss __builtin_vec_init_v8hi (signed short, signed short, signed short,\
- signed short, signed short, signed short, signed short, \
- signed short);
-VEC_INIT_V8HI nothing {init}
-
-  const vsc __builtin_vec_set_v16qi (vsc, signed char, const int<4>);
-VEC_SET_V16QI nothing {set}
-
-  const vf __builtin_vec_set_v4sf (vf, float, const int<2>);
-VEC_SET_V4SF nothing {set}
-
-  const vsi __builtin_vec_set_v4si (vsi, signed int, const int<2>);
-VEC_SET_V4SI nothing {set}
-
-  const vss __builtin_vec_set_v8hi (vss, signed short, const int<3>);
-VEC_SET_V8HI nothing {set}
-
-
 ; Cell builtins.
 [cell]
   pure vsc __builtin_altivec_lvlx (signed long, const void *);
@@ -1295,15 +1264,8 @@
   const signed long long __builtin_vec_ext_v2di (vsll, signed int);
 VEC_EXT_V2DI nothing {extract}
 
-  const vsq __builtin_vec_init_v1ti (signed __int128);
-VEC_INIT_V1TI nothing {init}
-
-  const vd __builtin_vec_init_v2df (double, double);
-VEC_INIT_V2DF nothing {init}
-
-  const vsll __builtin_vec_init_v2di (signed long long, signed long long);
-VEC_INIT_V2DI nothing {init}
-
+;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in
+;; resolve_vec_insert(), rs6000-c.cc
   const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>);
 VEC_SET_V1TI nothing {set}
 
-- 
2.45.0



Re: [PATCH 12/13 ver 3] rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in

2024-05-29 Thread Carl Love
This was patch 11 from the previous series.  Patch was updated to address 
feedback comments.

   Carl 
--

rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in

The built-in __builtin_vsx_xvcmpeqsp_p is a duplicate of the overloaded
__builtin_altivec_vcmpeqfp_p built-in.  The built-in is undocumented and
there are no test cases for it.  The patch removes built-in
__builtin_vsx_xvcmpeqsp_p.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp_p):
Remove built-in definition.
---
 gcc/config/rs6000/rs6000-builtins.def | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 64690b9b9b5..48ebc018a8d 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1619,9 +1619,6 @@
   const vf __builtin_vsx_xvcmpeqsp (vf, vf);
 XVCMPEQSP vector_eqv4sf {}
 
-  const signed int __builtin_vsx_xvcmpeqsp_p (signed int, vf, vf);
-XVCMPEQSP_P vector_eq_v4sf_p {pred}
-
   const vd __builtin_vsx_xvcmpgedp (vd, vd);
 XVCMPGEDP vector_gev2df {}
 
-- 
2.45.0



Re: [PATCH 9/13 ver 3] rs6000, remove __builtin_vsx_vperm_* built-ins

2024-05-29 Thread Carl Love
This was patch 8 in the previous series.  Updated patch per the feedback 
comments.

Carl 


rs6000, remove __builtin_vsx_vperm_* built-ins

The undocumented built-ins:
  __builtin_vsx_vperm_16qi_uns,
  __builtin_vsx_vperm_1ti,
  __builtin_vsx_vperm_1ti_uns,
  __builtin_vsx_vperm_2df,
  __builtin_vsx_vperm_2di,
  __builtin_vsx_vperm_2di_uns,
  __builtin_vsx_vperm_4sf,
  __builtin_vsx_vperm_4si,
  __builtin_vsx_vperm_4si_uns

are duplicates of the __builtin_altivec_* builtins that are used by
the overloaded vec_perm built-in that is documented in the PVIPR.
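
As a reminder of what vec_perm computes, the following is a minimal scalar
sketch of the byte permute: each result byte is chosen from the 32-byte
concatenation of the two sources using the low five bits of the corresponding
control byte.  The function name model_vec_perm is hypothetical and the byte
numbering is the logical (array-order) one; it is an illustration of the
operation, not the GCC implementation.

```c
#include <assert.h>

/* Hypothetical scalar model of vec_perm on 16-byte vectors.
   Control values 0..15 select from A, 16..31 select from B;
   only the low five bits of each control byte are used.  */
void model_vec_perm (const unsigned char a[16], const unsigned char b[16],
                     const unsigned char c[16], unsigned char out[16])
{
  assert (a != 0 && b != 0 && c != 0 && out != 0);

  for (int i = 0; i < 16; i++)
    {
      unsigned int idx = c[i] & 0x1F;
      out[i] = idx < 16 ? a[idx] : b[idx - 16];
    }
}
```

The wider element types (for example the 4si or 2df cases) reduce to the same
byte-level selection, which is why a single overloaded built-in can cover
them.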

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_vperm_16qi_uns,
__builtin_vsx_vperm_1ti, __builtin_vsx_vperm_1ti_uns,
__builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di,
__builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf,
__builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns): Remove
built-in definitions and comments.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_vperm_16qi_uns,
__builtin_vsx_vperm_1ti, __builtin_vsx_vperm_1ti_uns,
__builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di,
__builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf,
__builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns,
__builtin_vsx_vperm): Change call to built-in to the  overloaded
built-in vec_perm.
---
 gcc/config/rs6000/rs6000-builtins.def | 33 ---
 .../gcc.target/powerpc/vsx-builtin-3.c| 22 ++---
 2 files changed, 11 insertions(+), 44 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index a78c52183bc..f02a8c4de45 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1529,39 +1529,6 @@
   const vf __builtin_vsx_uns_floato_v2di (vsll);
 UNS_FLOATO_V2DI unsfloatov2di {}
 
-; These are duplicates of __builtin_altivec_* counterparts, and are being
-; kept for backwards compatibility.  The reason for their existence is
-; unclear.  TODO: Consider deprecation/removal at some point.
-  const vsc __builtin_vsx_vperm_16qi (vsc, vsc, vuc);
-VPERM_16QI_X altivec_vperm_v16qi {}
-
-  const vuc __builtin_vsx_vperm_16qi_uns (vuc, vuc, vuc);
-VPERM_16QI_UNS_X altivec_vperm_v16qi_uns {}
-
-  const vsq __builtin_vsx_vperm_1ti (vsq, vsq, vsc);
-VPERM_1TI_X altivec_vperm_v1ti {}
-
-  const vsq __builtin_vsx_vperm_1ti_uns (vsq, vsq, vsc);
-VPERM_1TI_UNS_X altivec_vperm_v1ti_uns {}
-
-  const vd __builtin_vsx_vperm_2df (vd, vd, vuc);
-VPERM_2DF_X altivec_vperm_v2df {}
-
-  const vsll __builtin_vsx_vperm_2di (vsll, vsll, vuc);
-VPERM_2DI_X altivec_vperm_v2di {}
-
-  const vull __builtin_vsx_vperm_2di_uns (vull, vull, vuc);
-VPERM_2DI_UNS_X altivec_vperm_v2di_uns {}
-
-  const vf __builtin_vsx_vperm_4sf (vf, vf, vuc);
-VPERM_4SF_X altivec_vperm_v4sf {}
-
-  const vsi __builtin_vsx_vperm_4si (vsi, vsi, vuc);
-VPERM_4SI_X altivec_vperm_v4si {}
-
-  const vui __builtin_vsx_vperm_4si_uns (vui, vui, vuc);
-VPERM_4SI_UNS_X altivec_vperm_v4si_uns {}
-
   const vss __builtin_vsx_vperm_8hi (vss, vss, vuc);
 VPERM_8HI_X altivec_vperm_v8hi {}
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
index e20d3f03c86..f06d871b6b1 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
@@ -88,17 +88,17 @@ int do_perm(void)
 {
   int i = 0;
 
-  si[i][0] = __builtin_vsx_vperm_4si (si[i][1], si[i][2], uc[i][3]); i++;
-  ss[i][0] = __builtin_vsx_vperm_8hi (ss[i][1], ss[i][2], uc[i][3]); i++;
-  sc[i][0] = __builtin_vsx_vperm_16qi (sc[i][1], sc[i][2], uc[i][3]); i++;
-  f[i][0] = __builtin_vsx_vperm_4sf (f[i][1], f[i][2], uc[i][3]); i++;
-  d[i][0] = __builtin_vsx_vperm_2df (d[i][1], d[i][2], uc[i][3]); i++;
-
-  si[i][0] = __builtin_vsx_vperm (si[i][1], si[i][2], uc[i][3]); i++;
-  ss[i][0] = __builtin_vsx_vperm (ss[i][1], ss[i][2], uc[i][3]); i++;
-  sc[i][0] = __builtin_vsx_vperm (sc[i][1], sc[i][2], uc[i][3]); i++;
-  f[i][0] = __builtin_vsx_vperm (f[i][1], f[i][2], uc[i][3]); i++;
-  d[i][0] = __builtin_vsx_vperm (d[i][1], d[i][2], uc[i][3]); i++;
+  si[i][0] = vec_perm (si[i][1], si[i][2], uc[i][3]); i++;
+  ss[i][0] = vec_perm (ss[i][1], ss[i][2], uc[i][3]); i++;
+  sc[i][0] = vec_perm (sc[i][1], sc[i][2], uc[i][3]); i++;
+  f[i][0] = vec_perm (f[i][1], f[i][2], uc[i][3]); i++;
+  d[i][0] = vec_perm (d[i][1], d[i][2], uc[i][3]); i++;
+
+  si[i][0] = vec_perm (si[i][1], si[i][2], uc[i][3]); i++;
+  ss[i][0] = vec_perm (ss[i][1], ss[i][2], uc[i][3]); i++;
+  sc[i][0] = vec_perm (sc[i][1], sc[i][2], uc[i][3]); i++;
+  f[i][0] = vec_perm (f[i][1], f[i][2], uc[i][3]); i++;
+  d[i][0] = vec_perm (d[i][1], d[i][2], uc[i][3]); i++;
 
   return i;
 }
-- 
2.45.0



Re: [PATCH 11/13 ver 3] rs6000, extend vec_xxpermdi built-in for __int128 args

2024-05-29 Thread Carl Love
 This was patch 10 from the previous series.  The patch was updated to address 
feedback comments.

Carl 
---

rs6000, extend vec_xxpermdi built-in for __int128 args

Add new signed and unsigned overloaded instances for vec_xxpermdi:

   __int128 vec_xxpermdi (__int128, __int128, const int);
   __uint128 vec_xxpermdi (__uint128, __uint128, const int);

Update the documentation to include a reference to the new built-in
instances.

Add test cases for the new overloaded instances.

gcc/ChangeLog:
* config/rs6000/rs6000-overload.def (vec_xxpermdi): Add new
overloaded built-in instances.
* doc/extend.texi:  Add documentation for new overloaded built-in
instances.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec_perm-runnable-i128.c: New test file.
---
 gcc/config/rs6000/rs6000-overload.def |   4 +
 gcc/doc/extend.texi   |   2 +
 .../powerpc/vec_perm-runnable-i128.c  | 229 ++
 3 files changed, 235 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c

diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index a210c5ad10d..45000f161e4 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -4932,6 +4932,10 @@
 XXPERMDI_4SF  XXPERMDI_VF
   vd __builtin_vsx_xxpermdi (vd, vd, const int);
 XXPERMDI_2DF  XXPERMDI_VD
+  vsq __builtin_vsx_xxpermdi (vsq, vsq, const int);
+XXPERMDI_1TI  XXPERMDI_1TI
+  vuq __builtin_vsx_xxpermdi (vuq, vuq, const int);
+XXPERMDI_1TI  XXPERMDI_1TUI
 
 [VEC_XXSLDWI, vec_xxsldwi, __builtin_vsx_xxsldwi]
   vsc __builtin_vsx_xxsldwi (vsc, vsc, const int);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0756230b19e..edfef1bdab7 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22555,6 +22555,8 @@ void vec_vsx_st (vector bool char, int, signed char *);
 vector double vec_xxpermdi (vector double, vector double, const int);
 vector float vec_xxpermdi (vector float, vector float, const int);
 vector long long vec_xxpermdi (vector long long, vector long long, const int);
+vector __int128 vec_xxpermdi (vector __int128, vector __int128, const int);
+vector __int128 vec_xxpermdi (vector __uint128, vector __uint128, const int);
 vector unsigned long long vec_xxpermdi (vector unsigned long long,
 vector unsigned long long, const int);
 vector int vec_xxpermdi (vector int, vector int, const int);
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c 
b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
new file mode 100644
index 000..2d5dce09404
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
@@ -0,0 +1,229 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-options "-save-temps" } */
+
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+void print_i128 (unsigned __int128 val)
+{
+  printf(" 0x%016llx%016llx",
+ (unsigned long long)(val >> 64),
+ (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF));
+}
+#endif
+
+extern void abort (void);
+
+union convert_union {
+  vector signed __int128    s128;
+  vector unsigned __int128  u128;
+  char  val[16];
+} convert;
+
+int check_u128_result(vector unsigned __int128 vresult_u128,
+ vector unsigned __int128 expected_vresult_u128)
+{
+  /* Use a for loop to check each byte manually so the test case will
+ run with ISA 2.06.
+
+ Return 1 if they match, 0 otherwise.  */
+
+  int i;
+
+  union convert_union result;
+  union convert_union expected;
+
+  result.u128 = vresult_u128;
+  expected.u128 = expected_vresult_u128;
+
+  /* Check if each byte of the result and expected match. */
+  for (i = 0; i < 16; i++)
+{
+  if (result.val[i] != expected.val[i])
+   return 0;
+}
+  return 1;
+}
+
+int check_s128_result(vector signed __int128 vresult_s128,
+ vector signed __int128 expected_vresult_s128)
+{
+  /* Convert the arguments to unsigned, then check equality.  */
+  union convert_union result;
+  union convert_union expected;
+
+  result.s128 = vresult_s128;
+  expected.s128 = expected_vresult_s128;
+
+  return check_u128_result (result.u128, expected.u128);
+}
+
+
+int
+main (int argc, char *argv [])
+{
+  int i;
+  
+  vector signed __int128 src_va_s128;
+  vector signed __int128 src_vb_s128;
+  vector signed __int128 vresult_s128;
+  vector signed __int128 expected_vresult_s128;
+
+  vector unsigned __int128 src_va_u128;
+  vector unsigned __int128 src_vb_u128;
+  vector unsigned __int128 src_vc_u128;
+  vector unsigned __int128 vresult_u128;
+  vector unsigned __int128 expected_vresult_u128;
+
+  src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+  src_va_s128 = src_va_s128 << 64; 
+  src_va_s128

Re: [PATCH 10/13 ver 3] rs6000, remove __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp built-ins

2024-05-29 Thread Carl Love
 This was patch 9 in the previous series.  It was previously approved.  
Reposting for completeness.

 Carl
-

rs6000, remove __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp built-ins

The undocumented __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp are
redundant.  The overloaded vec_neg built-in provides the same
functionality.  The two built-ins are not documented nor are there any
test cases for them.

Remove the definitions so users will use the overloaded vec_neg built-in
which is documented in the PVIPR.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvnegdp,
__builtin_vsx_xvnegsp): Remove built-in definitions.
---
 gcc/config/rs6000/rs6000-builtins.def | 6 --
 1 file changed, 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index f02a8c4de45..64690b9b9b5 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1736,12 +1736,6 @@
   const vf __builtin_vsx_xvnabssp (vf);
 XVNABSSP vsx_nabsv4sf2 {}
 
-  const vd __builtin_vsx_xvnegdp (vd);
-XVNEGDP negv2df2 {}
-
-  const vf __builtin_vsx_xvnegsp (vf);
-XVNEGSP negv4sf2 {}
-
   const vd __builtin_vsx_xvnmadddp (vd, vd, vd);
 XVNMADDDP nfmav2df4 {}
 
-- 
2.45.0



Re: [PATCH 8/13 ver 3] rs6000, remove the vec_xxsel built-ins, they are duplicates

2024-05-29 Thread Carl Love
This was patch 7 in the previous series.  Patch was updated to address the 
feedback comments.

Carl 


rs6000, remove the vec_xxsel built-ins, they are duplicates

The following undocumented built-ins are covered by the existing overloaded
vec_sel built-in definitions.

  const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc);
same as vsc __builtin_vec_sel (vsc, vsc, vuc);  (overloaded vec_sel)

  const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
same as vuc __builtin_vec_sel (vuc, vuc, vuc);  (overloaded vec_sel)

  const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
same as  vd __builtin_vec_sel (vd, vd, vull);   (overloaded vec_sel)

  const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll);
same as vsll __builtin_vec_sel (vsll, vsll, vsll);  (overloaded vec_sel)

  const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull);
same as vull __builtin_vec_sel (vull, vull, vsll);  (overloaded vec_sel)

  const vf __builtin_vsx_xxsel_4sf (vf, vf, vf);
same as vf __builtin_vec_sel (vf, vf, vsi)  (overloaded vec_sel)

  const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi);
same as vsi __builtin_vec_sel (vsi, vsi, vbi);  (overloaded vec_sel)

  const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui);
same as vui __builtin_vec_sel (vui, vui, vui);  (overloaded vec_sel)

  const vss __builtin_vsx_xxsel_8hi (vss, vss, vss);
same as vss __builtin_vec_sel (vss, vss, vbs);  (overloaded vec_sel)

  const vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus);
same as vus __builtin_vec_sel (vus, vus, vus);  (overloaded vec_sel)

This patch removes the duplicate built-in definitions so users will only
use the documented vec_sel built-in.  The __builtin_vsx_xxsel_[4si, 8hi,
16qi, 4sf, 2df] tests are also removed.
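
The equivalences listed above hold because vec_sel is a bitwise select, so
the element type of the data operands does not affect the result bits.  The
following is a minimal scalar sketch of one 32-bit lane, assuming the usual
vec_sel rule that a result bit comes from the second operand where the mask
bit is 1 and from the first operand where it is 0.  The names model_sel_u32
and model_sel_f32 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bitwise select on one 32-bit lane: bits of B where MASK is 1,
   bits of A where MASK is 0.  */
uint32_t model_sel_u32 (uint32_t a, uint32_t b, uint32_t mask)
{
  return (a & ~mask) | (b & mask);
}

/* The same select applied to a float lane.  memcpy is used to view the
   float as raw bits without violating aliasing rules.  */
float model_sel_f32 (float a, float b, uint32_t mask)
{
  uint32_t ua, ub, ur;
  float r;

  memcpy (&ua, &a, sizeof ua);
  memcpy (&ub, &b, sizeof ub);
  ur = model_sel_u32 (ua, ub, mask);
  memcpy (&r, &ur, sizeof r);
  return r;
}

/* Returns 1 when every check holds.  */
int model_sel_demo (void)
{
  /* An all-zero mask keeps A, an all-one mask takes B.  */
  assert (model_sel_f32 (1.5f, -2.25f, 0x00000000u) == 1.5f);
  assert (model_sel_f32 (1.5f, -2.25f, 0xFFFFFFFFu) == -2.25f);
  /* A mixed mask merges the upper half of B with the lower half of A.  */
  assert (model_sel_u32 (0xAAAA5555u, 0x12345678u, 0xFFFF0000u)
          == 0x12345555u);
  return 1;
}
```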

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_16qi,
__builtin_vsx_xxsel_16qi_uns, __builtin_vsx_xxsel_2df,
__builtin_vsx_xxsel_2di, __builtin_vsx_xxsel_2di_uns,
__builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_4si,
__builtin_vsx_xxsel_4si_uns, __builtin_vsx_xxsel_8hi,
__builtin_vsx_xxsel_8hi_uns): Remove built-in definitions.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_xxsel_4si,
__builtin_vsx_xxsel_8hi, __builtin_vsx_xxsel_16qi,
__builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_2df,
__builtin_vsx_xxsel): Change built-in call to overloaded built-in
call vec_sel.
---
 gcc/config/rs6000/rs6000-builtins.def | 30 
 .../gcc.target/powerpc/vsx-builtin-3.c| 36 ++-
 2 files changed, 19 insertions(+), 47 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index ea0da77f13e..a78c52183bc 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1898,36 +1898,6 @@
   const vss __builtin_vsx_xxpermdi_8hi (vss, vss, const int<2>);
 XXPERMDI_8HI vsx_xxpermdi_v8hi {}
 
-  const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc);
-XXSEL_16QI vector_select_v16qi {}
-
-  const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
-XXSEL_16QI_UNS vector_select_v16qi_uns {}
-
-  const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
-XXSEL_2DF vector_select_v2df {}
-
-  const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll);
-XXSEL_2DI vector_select_v2di {}
-
-  const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull);
-XXSEL_2DI_UNS vector_select_v2di_uns {}
-
-  const vf __builtin_vsx_xxsel_4sf (vf, vf, vf);
-XXSEL_4SF vector_select_v4sf {}
-
-  const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi);
-XXSEL_4SI vector_select_v4si {}
-
-  const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui);
-XXSEL_4SI_UNS vector_select_v4si_uns {}
-
-  const vss __builtin_vsx_xxsel_8hi (vss, vss, vss);
-XXSEL_8HI vector_select_v8hi {}
-
-  const vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus);
-XXSEL_8HI_UNS vector_select_v8hi_uns {}
-
   const vsc __builtin_vsx_xxsldwi_16qi (vsc, vsc, const int<2>);
 XXSLDWI_16QI vsx_xxsldwi_v16qi {}
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
index ff875c55304..e20d3f03c86 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
@@ -37,6 +37,8 @@
 /* { dg-final { scan-assembler "xvcvsxdsp" } } */
 /* { dg-final { scan-assembler "xvcvuxdsp" } } */
 
+#include <altivec.h>
+
 extern __vector int si[][4];
 extern __vector short ss[][4];
 extern __vector signed char sc[][4];
@@ -61,23 +63,23 @@ int do_sel(void)
 {
   int i = 0;
 
-  si[i][0] = __builtin_vsx_xxsel_4si (si[i][1], si[i][2], si[i][3]); i++;
-  ss[i][0] = __builtin_vsx_xxsel_8hi (ss[i][1], ss[i][2], ss[i][3]); i++;
-  sc[i][0] = __builtin_vsx_xxsel_16qi (sc[i][1], sc[i][2], sc[i][3]); i++;
-  f[i][0] = __built

Re: [PATCH 6/13 ver 3] rs6000, remove duplicated built-ins of vec_mergel and vec_mergeh

2024-05-29 Thread Carl Love
This was patch 5 in the previous series.  It was previously approved.  No 
changes in this version.  Being posted for completeness.

 Carl 


rs6000, remove duplicated built-ins of vec_mergel and vec_mergeh

The following undocumented built-ins are the same as existing documented
overloaded built-ins.

  const vf __builtin_vsx_xxmrghw (vf, vf);
same as  vf __builtin_vec_mergeh (vf, vf);  (overloaded vec_mergeh)

  const vsi __builtin_vsx_xxmrghw_4si (vsi, vsi);
same as vsi __builtin_vec_mergeh (vsi, vsi);   (overloaded vec_mergeh)

  const vf __builtin_vsx_xxmrglw (vf, vf);
same as vf __builtin_vec_mergel (vf, vf);  (overloaded vec_mergel)

  const vsi __builtin_vsx_xxmrglw_4si (vsi, vsi);
same as vsi __builtin_vec_mergel (vsi, vsi);   (overloaded vec_mergel)

This patch removes the duplicate built-in definitions so only the
documented built-ins will be available for use.  The case statements in
rs6000_gimple_fold_builtin are removed as they are no longer needed.  The
patch removes the now unused define_expands for vsx_xxmrghw_<mode> and
vsx_xxmrglw_<mode>.
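
As a reminder of what the surviving vec_mergeh and vec_mergel built-ins
compute on four-word vectors, here is a minimal scalar sketch.  It follows
the element selection in the removed define_expands: indices 0 4 1 5 of the
concatenation of the two inputs for the high merge, and 2 6 3 7 for the low
merge.  The names model_mergeh and model_mergel are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Interleave the first two elements of A and B: {a0, b0, a1, b1}.  */
void model_mergeh (const int a[4], const int b[4], int out[4])
{
  out[0] = a[0];
  out[1] = b[0];
  out[2] = a[1];
  out[3] = b[1];
}

/* Interleave the last two elements of A and B: {a2, b2, a3, b3}.  */
void model_mergel (const int a[4], const int b[4], int out[4])
{
  out[0] = a[2];
  out[1] = b[2];
  out[2] = a[3];
  out[3] = b[3];
}

/* Number the elements of the concatenation A:B from 0 to 7 so the result
   reads directly as the selected indices.  Returns 1 when checks hold.  */
int model_merge_demo (void)
{
  const int a[4] = {0, 1, 2, 3};
  const int b[4] = {4, 5, 6, 7};
  const int expect_h[4] = {0, 4, 1, 5};
  const int expect_l[4] = {2, 6, 3, 7};
  int out[4];

  model_mergeh (a, b, out);
  assert (memcmp (out, expect_h, sizeof out) == 0);

  model_mergel (a, b, out);
  assert (memcmp (out, expect_l, sizeof out) == 0);
  return 1;
}
```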

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxmrghw,
__builtin_vsx_xxmrghw_4si, __builtin_vsx_xxmrglw,
__builtin_vsx_xxmrglw_4si, __builtin_vsx_xxsel_16qi): Remove
built-in definition.
* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin):
remove case entries RS6000_BIF_XXMRGLW_4SI,
RS6000_BIF_XXMRGLW_4SF, RS6000_BIF_XXMRGHW_4SI,
RS6000_BIF_XXMRGHW_4SF.
* config/rs6000/vsx.md (vsx_xxmrghw_<mode>, vsx_xxmrglw_<mode>):
Remove unused define_expands.
---
 gcc/config/rs6000/rs6000-builtin.cc   |  4 ---
 gcc/config/rs6000/rs6000-builtins.def | 12 
 gcc/config/rs6000/vsx.md  | 41 ---
 3 files changed, 57 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index ac9f16fe51a..f83d65b06ef 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -2097,20 +2097,16 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 /* vec_mergel (integrals).  */
 case RS6000_BIF_VMRGLH:
 case RS6000_BIF_VMRGLW:
-case RS6000_BIF_XXMRGLW_4SI:
 case RS6000_BIF_VMRGLB:
 case RS6000_BIF_VEC_MERGEL_V2DI:
-case RS6000_BIF_XXMRGLW_4SF:
 case RS6000_BIF_VEC_MERGEL_V2DF:
   fold_mergehl_helper (gsi, stmt, 1);
   return true;
 /* vec_mergeh (integrals).  */
 case RS6000_BIF_VMRGHH:
 case RS6000_BIF_VMRGHW:
-case RS6000_BIF_XXMRGHW_4SI:
 case RS6000_BIF_VMRGHB:
 case RS6000_BIF_VEC_MERGEH_V2DI:
-case RS6000_BIF_XXMRGHW_4SF:
 case RS6000_BIF_VEC_MERGEH_V2DF:
   fold_mergehl_helper (gsi, stmt, 0);
   return true;
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 6049f3a4599..13e36df008d 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1877,18 +1877,6 @@
   const signed int __builtin_vsx_xvtsqrtsp_fg (vf);
 XVTSQRTSP_FG vsx_tsqrtv4sf2_fg {}
 
-  const vf __builtin_vsx_xxmrghw (vf, vf);
-XXMRGHW_4SF vsx_xxmrghw_v4sf {}
-
-  const vsi __builtin_vsx_xxmrghw_4si (vsi, vsi);
-XXMRGHW_4SI vsx_xxmrghw_v4si {}
-
-  const vf __builtin_vsx_xxmrglw (vf, vf);
-XXMRGLW_4SF vsx_xxmrglw_v4sf {}
-
-  const vsi __builtin_vsx_xxmrglw_4si (vsi, vsi);
-XXMRGLW_4SI vsx_xxmrglw_v4si {}
-
   const vsc __builtin_vsx_xxpermdi_16qi (vsc, vsc, const int<2>);
 XXPERMDI_16QI vsx_xxpermdi_v16qi {}
 
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index a8f3d459232..4402b8b01d5 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4875,47 +4875,6 @@ (define_insn "vsx_xxspltd_"
 }
   [(set_attr "type" "vecperm")])
 
-;; V4SF/V4SI interleave
-(define_expand "vsx_xxmrghw_"
-  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
-(vec_select:VSX_W
- (vec_concat:
-   (match_operand:VSX_W 1 "vsx_register_operand" "wa")
-   (match_operand:VSX_W 2 "vsx_register_operand" "wa"))
- (parallel [(const_int 0) (const_int 4)
-(const_int 1) (const_int 5)])))]
-  "VECTOR_MEM_VSX_P (mode)"
-{
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_
-: gen_altivec_vmrglw_direct_;
-  if (!BYTES_BIG_ENDIAN)
-std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
-  DONE;
-}
-  [(set_attr "type" "vecperm")])
-
-(define_expand "vsx_xxmrglw_"
-  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
-   (vec_select:VSX_W
- (vec_concat:
-   (match_operand:VSX_W 1 "vsx_register_operand" "wa")
-   (match_operand:VSX_W 2 "vsx_register_operand" "wa"))
- (parallel [(const_int 2) (const_int 6)
-(cons

Re: [PATCH 7/13 ver 3] rs6000, add overloaded vec_sel with int128 arguments

2024-05-29 Thread Carl Love
This was patch 6 in the previous series.  Updated the documentation file per 
the comments.  No functional changes to the patch.

  Carl 


rs6000, add overloaded vec_sel with int128 arguments

Extend the vec_sel built-in to take three signed/unsigned int128 arguments
and return a signed/unsigned int128 result.

Extending the vec_sel built-in makes the existing built-ins
__builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete.  The
patch removes these built-ins.

The patch adds documentation and test cases for the new overloaded vec_sel
built-ins.
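
The new instances apply the same bitwise select to a single 128-bit element.
Below is a minimal scalar sketch, assuming the usual vec_sel rule (mask bit 1
takes the second operand, mask bit 0 takes the first).  The names
sel_make_u128, model_sel_u128 and model_sel_s128 are hypothetical; the signed
variant relies on GCC's modulo conversion between signed and unsigned
__int128.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* A 128-bit constant cannot be written as one literal, so build it from
   two 64-bit halves.  */
u128 sel_make_u128 (uint64_t hi, uint64_t lo)
{
  return ((u128) hi << 64) | lo;
}

/* Bitwise select over all 128 bits.  */
u128 model_sel_u128 (u128 a, u128 b, u128 mask)
{
  return (a & ~mask) | (b & mask);
}

/* The signed instance selects the same bits; only the types differ.  */
__int128 model_sel_s128 (__int128 a, __int128 b, __int128 mask)
{
  return (__int128) model_sel_u128 ((u128) a, (u128) b, (u128) mask);
}

/* Returns 1 when every check holds.  */
int model_sel_i128_demo (void)
{
  u128 a = sel_make_u128 (0x1234567890ABCDEFULL, 0x0FEDCBA098765432ULL);
  u128 b = sel_make_u128 (0xAAAAAAAAAAAAAAAAULL, 0xBBBBBBBBBBBBBBBBULL);
  /* Mask selects the upper doubleword from B and the lower from A.  */
  u128 mask = sel_make_u128 (0xFFFFFFFFFFFFFFFFULL, 0x0ULL);

  assert (model_sel_u128 (a, b, mask)
          == sel_make_u128 (0xAAAAAAAAAAAAAAAAULL, 0x0FEDCBA098765432ULL));
  assert (model_sel_s128 (-1, 0, 0) == -1);
  assert (model_sel_s128 (-1, 0, -1) == 0);
  return 1;
}
```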

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti,
__builtin_vsx_xxsel_1ti_uns): Remove built-in definitions.
* config/rs6000/rs6000-overload.def (vec_sel): Add new overloaded
definitions.
* doc/extend.texi: Add documentation for new vec_sel instances.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec-sel-runnable-i128.c: New test file.
---
 gcc/config/rs6000/rs6000-builtins.def |   6 -
 gcc/config/rs6000/rs6000-overload.def |   4 +
 gcc/doc/extend.texi   |  12 ++
 .../powerpc/vec-sel-runnable-i128.c   | 129 ++
 4 files changed, 145 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 13e36df008d..ea0da77f13e 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1904,12 +1904,6 @@
   const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
 XXSEL_16QI_UNS vector_select_v16qi_uns {}
 
-  const vsq __builtin_vsx_xxsel_1ti (vsq, vsq, vsq);
-XXSEL_1TI vector_select_v1ti {}
-
-  const vsq __builtin_vsx_xxsel_1ti_uns (vsq, vsq, vsq);
-XXSEL_1TI_UNS vector_select_v1ti_uns {}
-
   const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
 XXSEL_2DF vector_select_v2df {}
 
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index 4d857bb1af3..a210c5ad10d 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3274,6 +3274,10 @@
 VSEL_2DF  VSEL_2DF_B
   vd __builtin_vec_sel (vd, vd, vull);
 VSEL_2DF  VSEL_2DF_U
+  vsq __builtin_vec_sel (vsq, vsq, vsq);
+VSEL_1TI  VSEL_1TI_S
+  vuq __builtin_vec_sel (vuq, vuq, vuq);
+VSEL_1TI_UNS  VSEL_1TI_U
 ; The following variants are deprecated.
   vsll __builtin_vec_sel (vsll, vsll, vsll);
 VSEL_2DI_B  VSEL_2DI_S
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b88e61641a2..0756230b19e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21372,6 +21372,18 @@ Additional built-in functions are available for the 
64-bit PowerPC
 family of processors, for efficient use of 128-bit floating point
 (@code{__float128}) values.
 
+Vector select
+
+@smallexample
+vector signed __int128 vec_sel (vector signed __int128,
+   vector signed __int128, vector signed __int128);
+vector unsigned __int128 vec_sel (vector unsigned __int128,
+   vector unsigned __int128, vector unsigned __int128);
+@end smallexample
+
+The instance is an extension of the exiting overloaded built-in @code{vec_sel}
+that is documented in the PVIPR.
+
 @node Basic PowerPC Built-in Functions Available on ISA 2.06
 @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.06
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c 
b/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c
new file mode 100644
index 000..d82225cc847
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c
@@ -0,0 +1,129 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-options "-save-temps" } */
+/* { dg-final { scan-assembler-times "xxsel" 2 } } */
+
+#include <altivec.h>
+
+#define DEBUG 0
+
+#if DEBUG
+#include <stdio.h>
+void print_i128 (unsigned __int128 val)
+{
+  printf(" 0x%016llx%016llx",
+ (unsigned long long)(val >> 64),
+ (unsigned long long)(val & 0xFFFFFFFFFFFFFFFF));
+}
+#endif
+
+extern void abort (void);
+
+union convert_union {
+  vector signed __int128    s128;
+  vector unsigned __int128  u128;
+  char  val[16];
+} convert;
+
+int check_u128_result(vector unsigned __int128 vresult_u128,
+ vector unsigned __int128 expected_vresult_u128)
+{
+  /* Use a for loop to check each byte manually so the test case will run
+ with ISA 2.06.
+
+ Return 1 if they match, 0 otherwise.  */
+
+  int i;
+
+  union convert_union result;
+  union convert_union expected;
+
+  result.u128 = vresult_u128;
+  expected.u128 = expected_vresult_u128;
+
+  /* Check if each byte of the result and expected match. */
+  for (i = 0; i < 16; i++)
+{
+  if (result.val[i] != expected.val[i])
+   return 0;
+}
+  return 1;

Re: [PATCH 5/13 ver 3] rs6000, Remove redundant float/double type conversions

2024-05-29 Thread Carl Love
This is a new patch to remove the built-ins that were inadvertently missed in 
the previous series.

  Carl 
--

rs6000, Remove redundant float/double type conversions

The following built-ins are redundant as they are covered by another
overloaded built-in.

  __builtin_vsx_xvcvspdp covered by vec_double{e,o}
  __builtin_vsx_xvcvdpsp covered by vec_float{e,o}
  __builtin_vsx_xvcvsxwdp covered by vec_double{e,o}
  __builtin_vsx_xvcvuxddp_uns covered by  vec_double

Remove the redundant built-ins. They are not documented nor do they have
test cases.
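
For reference, the even/odd widening that vec_doublee and vec_doubleo perform
on a vector of four floats can be sketched in scalar C as follows.  This is
only a model of the element selection, assuming "even" means elements 0 and 2
and "odd" means elements 1 and 3; model_doublee and model_doubleo are
hypothetical names.

```c
#include <assert.h>

/* Widen the even-indexed floats (elements 0 and 2) to doubles.  */
void model_doublee (const float in[4], double out[2])
{
  out[0] = (double) in[0];
  out[1] = (double) in[2];
}

/* Widen the odd-indexed floats (elements 1 and 3) to doubles.  */
void model_doubleo (const float in[4], double out[2])
{
  out[0] = (double) in[1];
  out[1] = (double) in[3];
}

/* The inputs are exactly representable, so the comparisons are exact.
   Returns 1 when every check holds.  */
int model_double_demo (void)
{
  const float in[4] = {1.5f, -2.0f, 3.25f, 4.0f};
  double out[2];

  model_doublee (in, out);
  assert (out[0] == 1.5 && out[1] == 3.25);

  model_doubleo (in, out);
  assert (out[0] == -2.0 && out[1] == 4.0);
  return 1;
}
```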

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspdp,
__builtin_vsx_xvcvdpsp, __builtin_vsx_xvcvsxwdp,
__builtin_vsx_xvcvuxddp_uns): Remove.
---
 gcc/config/rs6000/rs6000-builtins.def | 12 
 1 file changed, 12 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index cea2649b86c..6049f3a4599 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1679,9 +1679,6 @@
   const signed int __builtin_vsx_xvcmpgtsp_p (signed int, vf, vf);
 XVCMPGTSP_P vector_gt_v4sf_p {pred}
 
-  const vf __builtin_vsx_xvcvdpsp (vd);
-XVCVDPSP vsx_xvcvdpsp {}
-
   const vsll __builtin_vsx_xvcvdpsxds (vd);
 XVCVDPSXDS vsx_fix_truncv2dfv2di2 {}
 
@@ -1691,9 +1688,6 @@
   const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
 XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
 
-  const vd __builtin_vsx_xvcvspdp (vf);
-XVCVSPDP vsx_xvcvspdp {}
-
   const vsll __builtin_vsx_xvcvspsxds (vf);
 VEC_VSIGNEDE_V4SF vsignede_v4sf {}
 
@@ -1715,9 +1709,6 @@
   const vf __builtin_vsx_xvcvsxdsp (vsll);
 XVCVSXDSP vsx_xvcvsxdsp {}
 
-  const vd __builtin_vsx_xvcvsxwdp (vsi);
-XVCVSXWDP vsx_xvcvsxwdp {}
-
   const vf __builtin_vsx_xvcvsxwsp (vsi);
 XVCVSXWSP vsx_floatv4siv4sf2 {}
 
@@ -1727,9 +1718,6 @@
   const vd __builtin_vsx_xvcvuxddp_scale (vsll, const int<5>);
 XVCVUXDDP_SCALE vsx_xvcvuxddp_scale {}
 
-  const vd __builtin_vsx_xvcvuxddp_uns (vull);
-XVCVUXDDP_UNS vsx_floatunsv2div2df2 {}
-
   const vf __builtin_vsx_xvcvuxdsp (vull);
 XVCVUXDSP vsx_xvcvuxdsp {}
 
-- 
2.45.0



Re: [PATCH 4/13 ver 3] rs6000, extend the current vec_{un,}signed{e,o} built-ins

2024-05-29 Thread Carl Love
Updated the patch per the feedback comments from the previous version.

 Carl 
---

rs6000, extend the current vec_{un,}signed{e,o} built-ins

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
convert a vector of floats to signed/unsigned long long ints.  Extend the
existing vec_{un,}signed{e,o} built-ins to accept a vector of floats and
return the even/odd elements as signed/unsigned long long integers.

The define_expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf,
vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o}
built-ins.

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are
now for internal use only. They are not documented and they do not
have testcases.

The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by
vec_signed{e,o}, remove.

The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by
vec_unsigned{e,o}, remove.

The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by
vec_unsigned, remove.

The built-in __builtin_vsx_xvcvspuxws is redundant as it is covered by
vec_unsigned, remove.

Add testcases and update documentation.
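
The new float instances of vec_signede and vec_signedo can be pictured with
the scalar sketch below.  It assumes "even" means elements 0 and 2, "odd"
means elements 1 and 3, and that the conversion truncates toward zero like a
C cast; inputs outside the range of long long are not modeled.  The names
model_signede_f32 and model_signedo_f32 are hypothetical.

```c
#include <assert.h>

/* Convert the even-indexed floats (elements 0 and 2) to signed
   long long, truncating toward zero.  */
void model_signede_f32 (const float in[4], long long out[2])
{
  out[0] = (long long) in[0];
  out[1] = (long long) in[2];
}

/* Convert the odd-indexed floats (elements 1 and 3) to signed
   long long, truncating toward zero.  */
void model_signedo_f32 (const float in[4], long long out[2])
{
  out[0] = (long long) in[1];
  out[1] = (long long) in[3];
}

/* Returns 1 when every check holds.  */
int model_signed_demo (void)
{
  const float in[4] = {-1.9f, 2.9f, 3.5f, -4.5f};
  long long out[2];

  model_signede_f32 (in, out);
  assert (out[0] == -1 && out[1] == 3);

  model_signedo_f32 (in, out);
  assert (out[0] == 2 && out[1] == -4);
  return 1;
}
```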

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxds_low,
__builtin_vsx_xvcvspuxds_low): New built-in definitions.
(__builtin_vsx_xvcvspuxds): Fix return type.
(XVCVSPSXDS, XVCVSPUXDS): Renamed VEC_VSIGNEDE_V4SF,
VEC_VUNSIGNEDE_V4SF respectively.
(vsx_xvcvspsxds, vsx_xvcvspuxds): Renamed vsignede_v4sf,
vunsignede_v4sf respectively.
(__builtin_vsx_xvcvdpsxws, __builtin_vsx_xvcvdpuxws,
__builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws): Removed.
* config/rs6000/rs6000-overload.def (vec_signede, vec_signedo,
vec_unsignede,vec_unsignedo):  Add new overloaded specifications.
* config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf,
vunsignede_v4sf, vunsignedo_v4sf): New define_expands.
* doc/extend.texi (vec_signedo, vec_signede): Add documentation.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/builtins-3-runnable.c: New tests for the added
overloaded built-ins.
---
 gcc/config/rs6000/rs6000-builtins.def | 25 ++
 gcc/config/rs6000/rs6000-overload.def |  8 ++
 gcc/config/rs6000/vsx.md  | 88 +++
 gcc/doc/extend.texi   | 10 +++
 .../gcc.target/powerpc/builtins-3-runnable.c  | 51 +--
 5 files changed, 157 insertions(+), 25 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index bf9a0ae22fc..cea2649b86c 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1688,32 +1688,23 @@
   const vsll __builtin_vsx_xvcvdpsxds_scale (vd, const int);
 XVCVDPSXDS_SCALE vsx_xvcvdpsxds_scale {}
 
-  const vsi __builtin_vsx_xvcvdpsxws (vd);
-XVCVDPSXWS vsx_xvcvdpsxws {}
-
-  const vsll __builtin_vsx_xvcvdpuxds (vd);
-XVCVDPUXDS vsx_fixuns_truncv2dfv2di2 {}
-
   const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
 XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
 
-  const vull __builtin_vsx_xvcvdpuxds_uns (vd);
-XVCVDPUXDS_UNS vsx_fixuns_truncv2dfv2di2 {}
-
-  const vsi __builtin_vsx_xvcvdpuxws (vd);
-XVCVDPUXWS vsx_xvcvdpuxws {}
-
   const vd __builtin_vsx_xvcvspdp (vf);
 XVCVSPDP vsx_xvcvspdp {}
 
   const vsll __builtin_vsx_xvcvspsxds (vf);
-XVCVSPSXDS vsx_xvcvspsxds {}
+VEC_VSIGNEDE_V4SF vsignede_v4sf {}
+
+  const vsll __builtin_vsx_xvcvspsxds_low (vf);
+VEC_VSIGNEDO_V4SF vsignedo_v4sf {}
 
-  const vsll __builtin_vsx_xvcvspuxds (vf);
-XVCVSPUXDS vsx_xvcvspuxds {}
+  const vull __builtin_vsx_xvcvspuxds (vf);
+VEC_VUNSIGNEDE_V4SF vunsignede_v4sf {}
 
-  const vsi __builtin_vsx_xvcvspuxws (vf);
-XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {}
+  const vull __builtin_vsx_xvcvspuxds_low (vf);
+VEC_VUNSIGNEDO_V4SF vunsignedo_v4sf {}
 
   const vd __builtin_vsx_xvcvsxddp (vsll);
 XVCVSXDDP vsx_floatv2div2df2 {}
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index 84bd9ae6554..4d857bb1af3 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3307,10 +3307,14 @@
 [VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede]
   vsi __builtin_vec_vsignede (vd);
 VEC_VSIGNEDE_V2DF
+  vsll __builtin_vec_vsignede (vf);
+VEC_VSIGNEDE_V4SF
 
 [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo]
   vsi __builtin_vec_vsignedo (vd);
 VEC_VSIGNEDO_V2DF
+  vsll __builtin_vec_vsignedo (vf);
+VEC_VSIGNEDO_V4SF
 
 [VEC_SIGNEXTI, vec_signexti, __builtin_vec_signexti]
   vsi __builtin_vec_signexti (vsc);
@@ -4433,10 +4437,14 @@
 [VEC_UNSIGNEDE, vec_unsignede, __builtin_vec_vunsignede]
   vui __builtin_vec_vunsignede (vd);
 VEC_VUNSIGNEDE_V2DF
+  vull __builtin_vec_vunsignede 

Re: [PATCH 3/13 ver 3] rs6000, fix error in unsigned vector float to unsigned int built-in definition

2024-05-29 Thread Carl Love
This patch was updated per the feedback comment from the previous version in 
series 2.

 Carl 
---

rs6000, fix error in unsigned vector float to unsigned int built-in definitions

The built-in __builtin_vsx_vunsigned_v2df is supposed to take a vector of
doubles and return a vector of unsigned long long ints.  Similarly
__builtin_vsx_vunsigned_v4sf takes a vector of floats and is supposed to
return a vector of unsigned ints.  The definitions are using the signed
versions of the instructions, not the unsigned versions.
The results should also be unsigned.  The builtins are used by the
overloaded vec_unsigned builtin which has an unsigned result.

Similarly the built-ins __builtin_vsx_vunsignede_v2df and
__builtin_vsx_vunsignedo_v2df are supposed to return an unsigned result.
If the floating point argument is negative, the unsigned result is zero.
The built-ins are used in the overloaded built-in vec_unsignede and
vec_unsignedo respectively.

Add test cases for negative floating point arguments for each of the
above built-ins.
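
The expected values in the new tests follow from a conversion that clamps
rather than wraps.  Here is a minimal scalar sketch of one lane of the float
to unsigned int conversion.  The zero result for negative inputs comes from
the description above; the clamp to the maximum value for large inputs and
the zero result for NaN are assumptions of this sketch.  The name
model_unsigned_f32 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of a float -> unsigned int conversion that
   saturates.  A plain C cast of an out-of-range float is undefined
   behaviour, so the limits are checked explicitly.  */
uint32_t model_unsigned_f32 (float x)
{
  if (!(x > 0.0f))          /* Negative, zero or NaN (assumed) gives 0.  */
    return 0u;
  if (x >= 4294967296.0f)   /* 2^32 and above: assumed to clamp to max.  */
    return UINT32_MAX;
  return (uint32_t) x;      /* In range: truncate toward zero.  */
}

/* Uses the input values from the test cases in this patch.
   Returns 1 when every check holds.  */
int model_unsigned_demo (void)
{
  const float neg[4] = {-14.930f, -834.49f, -3.3f, -5.4f};
  int i;

  for (i = 0; i < 4; i++)
    assert (model_unsigned_f32 (neg[i]) == 0u);

  assert (model_unsigned_f32 (124.930f) == 124u);
  assert (model_unsigned_f32 (8134.49f) == 8134u);
  return 1;
}
```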

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_vunsigned_v2df,
__builtin_vsx_vunsigned_v4sf, __builtin_vsx_vunsignede_v2df,
__builtin_vsx_vunsignedo_v2df): Change the result type to unsigned.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/builtins-3-runnable.c: Add tests for
vec_unsignede and vec_unsignedo with negative arguments.
---
 gcc/config/rs6000/rs6000-builtins.def | 12 
 .../gcc.target/powerpc/builtins-3-runnable.c  | 30 +--
 2 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index c6d2ea1bc39..bf9a0ae22fc 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1580,16 +1580,16 @@
   const vsi __builtin_vsx_vsignedo_v2df (vd);
 VEC_VSIGNEDO_V2DF vsignedo_v2df {}
 
-  const vsll __builtin_vsx_vunsigned_v2df (vd);
-VEC_VUNSIGNED_V2DF vsx_xvcvdpsxds {}
+  const vull __builtin_vsx_vunsigned_v2df (vd);
+VEC_VUNSIGNED_V2DF vsx_xvcvdpuxds {}
 
-  const vsi __builtin_vsx_vunsigned_v4sf (vf);
-VEC_VUNSIGNED_V4SF vsx_xvcvspsxws {}
+  const vui __builtin_vsx_vunsigned_v4sf (vf);
+VEC_VUNSIGNED_V4SF vsx_xvcvspuxws {}
 
-  const vsi __builtin_vsx_vunsignede_v2df (vd);
+  const vui __builtin_vsx_vunsignede_v2df (vd);
 VEC_VUNSIGNEDE_V2DF vunsignede_v2df {}
 
-  const vsi __builtin_vsx_vunsignedo_v2df (vd);
+  const vui __builtin_vsx_vunsignedo_v2df (vd);
 VEC_VUNSIGNEDO_V2DF vunsignedo_v2df {}
 
   const vf __builtin_vsx_xscvdpsp (double);
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
index 0231a1fd086..5dcdfbee791 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
@@ -313,6 +313,14 @@ int main()
test_unsigned_int_result (ALL, vec_uns_int_result,
  vec_uns_int_expected);
 
+   /* Convert single precision float to  unsigned int.  Negative
+  arguments.  */
+   vec_flt0 = (vector float){-14.930, -834.49, -3.3, -5.4};
+   vec_uns_int_expected = (vector unsigned int){0, 0, 0, 0};
+   vec_uns_int_result = vec_unsigned (vec_flt0);
+   test_unsigned_int_result (ALL, vec_uns_int_result,
+ vec_uns_int_expected);
+
/* Convert double precision float to long long unsigned int */
vec_dble0 = (vector double){124.930, 8134.49};
vec_ll_uns_int_expected = (vector long long unsigned int){124, 8134};
@@ -320,10 +328,18 @@ int main()
test_ll_unsigned_int_result (vec_ll_uns_int_result,
 vec_ll_uns_int_expected);
 
+   /* Convert double precision float to long long unsigned int. Negative
+  arguments.  */
+   vec_dble0 = (vector double){-24.93, -134.9};
+   vec_ll_uns_int_expected = (vector long long unsigned int){0, 0};
+   vec_ll_uns_int_result = vec_unsigned (vec_dble0);
+   test_ll_unsigned_int_result (vec_ll_uns_int_result,
+vec_ll_uns_int_expected);
+
/* Convert double precision vector float to vector unsigned int,
-  even words */
-   vec_dble0 = (vector double){3124.930, 8234.49};
-   vec_uns_int_expected = (vector unsigned int){3124, 0, 8234, 0};
+  even words.  Negative arguments */
+   vec_dble0 = (vector double){-124.930, -234.49};
+   vec_uns_int_expected = (vector unsigned int){0, 0, 0, 0};
vec_uns_int_result = vec_unsignede (vec_dble0);
test_unsigned_int_result (EVEN, vec_uns_int_result,
  vec_uns_int_expected);
@@ -335,5 +351,13 @@ int main()
vec_uns_int_resul
