Re: [PATCH] rs6000: Fix __builtin_vec_xst_trunc definition

Kewen.Lin via Gcc-patches Mon, 22 May 2023 20:36:52 -0700

on 2023/5/23 03:50, Carl Love wrote:
> On Mon, 2023-05-22 at 17:04 +0800, Kewen.Lin wrote:
>> Hi Carl,
>>
>> on 2023/5/11 02:06, Carl Love via Gcc-patches wrote:
>>> GCC maintainers:
>>>
>>> The following patch fixes errors in the arguments in the
>>> __builtin_altivec_tr_stxvrhx,       __builtin_altivec_tr_stxvrwx
>>> builtin
>>> definitions.  Note, these builtins are used by the overloaded
>>> __builtin_vec_xst_trunc builtin.
>>>
>>> The patch adds a new overloaded builtin definition for
>>> __builtin_vec_xst_trunc for the third argument to be unsigned and
>>> signed long int.
>>>
>>> A new testcase is added for the various overloaded versions of
>>> __builtin_vec_xst_trunc.
>>>
>>> The patch has been tested on Power 10 with no new regressions.
>>>
>>> Please let me know if the patch is acceptable for
>>> mainline.  Thanks.
>>>
>>>                     Carl
>>>
>>> -------------------------------------------
>>> rs6000: Fix __builtin_vec_xst_trunc definition
>>>
>>> Built-in __builtin_vec_xst_trunc calls __builtin_altivec_tr_stxvrhx
>>> and __builtin_altivec_tr_stxvrwx to handle the short and word
>>> cases.  The
>>> arguments for these two builtins are wrong.  This patch fixes the
>>> wrong
>>> arguments for the builtins.
>>>
>>> Additionally, the patch adds a new __builtin_vec_xst_trunc
>>> overloaded
>>> version for the destination being signed or unsigned long int.
>>>
>>> A runnable test case is added to test each of the overloaded
>>> definitions
>>> of __builtin_vec_xst_tru
>>>
>>> gcc/
>>>     * config/rs6000/builtins.def (__builtin_altivec_tr_stxvrhx,
>>>     __builtin_altivec_tr_stxvrwx): Fix type of second argument.
>>>     Add, definition for send argument to be signed long.
>>>     * config/rs6000/rs6000-overload.def (__builtin_vec_xst_trunc):
>>>     add definition with thrird arument signed and unsigned long.
>>>     * doc/extend.texi (__builtin_vec_xst_trunc): Add documentation
>>> for
>>>     new unsinged long and signed long versions.
>>>
>>> gcc/testsuite/
>>>     * gcc.target/powerpc/vsx-builtin-vec_xst_trunc.c: New test case
>>>     for __builtin_vec_xst_trunc builtin.
>>> ---
>>>  gcc/config/rs6000/rs6000-builtins.def         |   7 +-
>>>  gcc/config/rs6000/rs6000-overload.def         |   4 +
>>>  gcc/doc/extend.texi                           |   2 +
>>>  .../powerpc/vsx-builtin-vec_xst_trunc.c       | 217
>>> ++++++++++++++++++
>>>  4 files changed, 228 insertions(+), 2 deletions(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-
>>> vec_xst_trunc.c
>>>
>>> diff --git a/gcc/config/rs6000/rs6000-builtins.def
>>> b/gcc/config/rs6000/rs6000-builtins.def
>>> index 638d0bc72ca..a378491b358 100644
>>> --- a/gcc/config/rs6000/rs6000-builtins.def
>>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>>> @@ -3161,12 +3161,15 @@
>>>    void __builtin_altivec_tr_stxvrbx (vsq, signed long, signed char
>>> *);
>>>      TR_STXVRBX vsx_stxvrbx {stvec}
>>>  
>>> -  void __builtin_altivec_tr_stxvrhx (vsq, signed long, signed int
>>> *);
>>> +  void __builtin_altivec_tr_stxvrhx (vsq, signed long, signed
>>> short *);
>>>      TR_STXVRHX vsx_stxvrhx {stvec}
>>>  
>>> -  void __builtin_altivec_tr_stxvrwx (vsq, signed long, signed
>>> short *);
>>> +  void __builtin_altivec_tr_stxvrwx (vsq, signed long, signed int
>>> *);
>>>      TR_STXVRWX vsx_stxvrwx {stvec}
>>
>> Good catching!
>>
>>>  
>>> +  void __builtin_altivec_tr_stxvrlx (vsq, signed long, signed long
>>> *);
>>> +    TR_STXVRLX vsx_stxvrdx {stvec}
>>> +
>>
>> This is mapped to the one used for type long long, it's a hard
>> mapping,
>> IMHO it's wrong and not consistent with what the users expect, since
>> on Power
>> the size of type long int is 4 bytes at -m32 while 8 bytes at -m64,
>> this
>> implementation binding to 8 bytes can cause trouble in 32-bit.  I
>> wonder if
>> it's a good idea to add one overloaded version for type long int, for
>> now
>> openxl also emits error message for long int type pointer (see its
>> doc [1]),
>> users can use casting to make it to the acceptable pointer types
>> (long long
>> or int as its size).
>>
>> [1] 
>> https://www.ibm.com/docs/en/openxl-c-and-cpp-lop/17.1.1?topic=functions-vec-xst-trunc
>>
>>
> 
> If I understand this correctly, the "signed long" is mapped to type
> "long long int"?  Just curious, where is the mapping done?


Sorry for the confusion, the mapping here is for your implementation,

>>> +  void __builtin_altivec_tr_stxvrlx (vsq, signed long, signed long
>>> *);
>>> +    TR_STXVRLX vsx_stxvrdx {stvec}

you used the one **vsx_stxvrdx** which is for "signed long long int"
(doubleword, exactly 8 bytes size element), so I said you have a hard
mapping, takes "signed long int" as "signed long long int".

But again type signed long int isn't guaranteed to be with the same size
as type signed long long int, on Power it has the same size as int (4 bytes)
on 32-bit env, while has the same size as long long (8 bytes) on 64-bit
env, since your test case is guarded with power10, you didn't get a chance
to run it on 32-bit env yet, that's why you didn't spot any failures.

> 
> So I believe you would like to have an additional overloaded
> definition:
> 
>  void __builtin_vec_xst_trunc (vuq, signed long long, long *);
> 
> Am I understanding this correctly?  I added the above definition.
> 

No, I questioned that if it's really a good idea to add one overloaded
for signed long, openxl also only supports int and long long types and
emits error message for long type, users can just do the pointer type
casting when they have signed long type pointer.  Do we have any
requirements for this long type, could we just keep supporting int and
long long, but not long (not adding this new overloaded definition)?

BR,
Kewen

>>>    void __builtin_altivec_tr_stxvrdx (vsq, signed long, signed long
>>> long *);
>>>      TR_STXVRDX vsx_stxvrdx {stvec}
>>>  
>>> diff --git a/gcc/config/rs6000/rs6000-overload.def
>>> b/gcc/config/rs6000/rs6000-overload.def
>>> index c582490c084..54b7ae5e51b 100644
>>> --- a/gcc/config/rs6000/rs6000-overload.def
>>> +++ b/gcc/config/rs6000/rs6000-overload.def
>>> @@ -4872,6 +4872,10 @@
>>>      TR_STXVRWX  TR_STXVRWX_S
>>>    void __builtin_vec_xst_trunc (vuq, signed long long, unsigned
>>> int *);
>>>      TR_STXVRWX  TR_STXVRWX_U
>>> +  void __builtin_vec_xst_trunc (vsq, signed long long, signed long
>>> *);
>>> +    TR_STXVRLX  TR_STXVRLX_S
>>> +  void __builtin_vec_xst_trunc (vuq, signed long long, unsigned
>>> long *);
>>> +    TR_STXVRLX  TR_STXVRLX_U
>>>    void __builtin_vec_xst_trunc (vsq, signed long long, signed long
>>> long *);
>>>      TR_STXVRDX  TR_STXVRDX_S
>>>    void __builtin_vec_xst_trunc (vuq, signed long long, unsigned
>>> long long *);
>>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>>> index e426a2eb7d8..7e2ae790ab3 100644
>>> --- a/gcc/doc/extend.texi
>>> +++ b/gcc/doc/extend.texi
>>> @@ -18570,10 +18570,12 @@ instructions.
>>>  @defbuiltin{{void} vec_xst_trunc (vector signed __int128, signed
>>> long long, signed char *)}
>>>  @defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed
>>> long long, signed short *)}
>>>  @defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed
>>> long long, signed int *)}
>>> +@defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed
>>> long long, signed long *)}
>>>  @defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed
>>> long long, signed long long *)}
>>>  @defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128,
>>> signed long long, unsigned char *)}
>>>  @defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128,
>>> signed long long, unsigned short *)}
>>>  @defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128,
>>> signed long long, unsigned int *)}
>>> +@defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128,
>>> signed long long, unsigned long *)}
>>>  @defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128,
>>> signed long long, unsigned long long *)}
>>>  
>>>  Truncate and store the rightmost element of a vector, as if
>>> implemented by the
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-
>>> vec_xst_trunc.c b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-
>>> vec_xst_trunc.c
>>> new file mode 100644
>>> index 00000000000..7108109560d
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-vec_xst_trunc.c
>>> @@ -0,0 +1,217 @@
>>> +/* Test of __builtin_vec_xst_trunc  */
>>> +
>>> +/* { dg-do run { target power10_hw } } */
>>> +/* { dg-require-effective-target power10_ok } */
>>
>> Normally _hw is stricter than _ok, you already check power10_hw,
>> I think power10_ok is useless here.
> 
> OK, removed the  dg-require-effective-target power10_ok line.
> 
>>
>>> +/* { dg-require-effective-target int128 } */
>>> +/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
>>> +
>>> +#include <altivec.h>
>>> +#include <stdio.h>
>>> +#include <inttypes.h>
>>> +#include <string.h>
>>> +#include <stdlib.h>
>>> +
>>> +#define DEBUG 0
>>> +#define TRUE  1
>>> +#define FALSE 0
>>> +#define SIZE  4
>>> +
>>> +vector signed __int128 zero_vsint128 = {0x0};
>>> +
>>> +vector signed __int128 store_data[SIZE] = {
>>> +{ (__int128) 0x79BD000000000000 << 64 | (__int128)
>>> 0x123456789abcdef8ULL},
>>> +{ (__int128) 0x8ACE000000000000 << 64 | (__int128)
>>> 0xfedcba9876543217ULL},
>>> +{ (__int128) 0x1357000000000000 << 64 | (__int128)
>>> 0xccccccccccccccccULL},
>>> +{ (__int128) 0xf000000000000000 << 64 | (__int128)
>>> 0xaaaaaaaaaaaaaaaaULL}
>>> +};
>>> +
>>> +signed char signed_char_expected[SIZE] = {0xF8ULL, 0x17, 0xCC,
>>> 0xAA};
>>> +signed short signed_short_expected[SIZE] = {0xDEF8, 0x3217,
>>> 0xcccc, 0xaaaa, };
>>> +signed int signed_int_expected[SIZE] = {0x9ABCDEF8, 0x76543217,
>>> 0xCCCCCCCC,
>>> +                                   0xAAAAAAAA};
>>> +signed long int signed_long_expected[SIZE] = {0x123456789ABCDEF8,
>>> +                                                   0xFEDCBA9876543
>>> 217ULL,
>>> +                                                   0xCCCCCCCCCCCCC
>>> CCCULL,
>>> +                                                   0xAAAAAAAAAAAAA
>>> AAAULL};
>>> +signed long long int signed_long_long_expected[SIZE] =
>>> {0x123456789ABCDEF8ULL,
>>> +                                                   0xFEDCBA9876543
>>> 217ULL,
>>> +                                                   0xCCCCCCCCCCCCC
>>> CCCULL,
>>> +                                                   0xAAAAAAAAAAAAA
>>> AAAULL};
>>> +
>>> +union conv_t {
>>> +  vector signed __int128 vsi128;
>>> +  unsigned long long ull[2];
>>> +  signed char schar[16];
>>> +  signed __int128 s128;
>>> +} conv;
>>> +
>>> +int check_expected_byte (signed char expected,
>>> +                    signed char actual)
>>> +{
>>> +  /* Return TRUE if expected and actual values all match.  */
>>> +  if (expected != actual)
>>> +    {
>>> +#if DEBUG
>>> +      printf ("ERROR: Expected half values don't match. \n");
>>> +      printf ("  Expected 0x%x & 0xFFFF, actual 0x%x & 0xFFFF\n",
>>> +         expected & 0xFF, actual & 0xFF);
>>> +#endif
>>> +      return FALSE;
>>> +    }
>>> +  return TRUE;
>>> +}
>>> +
>>> +int check_expected_half (signed short int expected,
>>> +                    signed short int  actual)
>>> +{
>>> +  /* Return TRUE if expected and actual values all match.  */
>>> +  if (expected != actual)
>>> +    {
>>> +#if DEBUG
>>> +      printf ("ERROR: Expected short values don't match. \n");
>>> +      printf ("  Expected 0x%x, actual 0x%x\n",
>>> +         expected & 0xFFFF, actual & 0xFFFF);
>>> +#endif
>>> +      return FALSE;
>>> +    }
>>> +  return TRUE;
>>> +}
>>> +
>>> +int check_expected_int (signed int expected,
>>> +                   signed int  actual)
>>> +{
>>> +  /* Return TRUE if expected and actual values all match.  */
>>> +  if (expected != actual)
>>> +    {
>>> +#if DEBUG
>>> +      printf ("ERROR: Expected int values don't match. \n");
>>> +      printf ("  Expected 0x%x, actual 0x%x\n",
>>> +         expected, actual);
>>> +#endif
>>> +      return FALSE;
>>> +    }
>>> +  return TRUE;
>>> +}
>>> +
>>> +int check_expected_long (signed long int expected,
>>> +                    signed long int  actual)
>>> +{
>>> +  /* Return TRUE if expected and actual values all match.  */
>>> +  if (expected != actual)
>>> +    {
>>> +#if DEBUG
>>> +      printf ("ERROR: Expected long values don't match. \n");
>>> +      printf ("  Expected 0x%x, actual 0x%x\n",
>>> +         expected, actual);
>>> +#endif
>>> +      return FALSE;
>>> +    }
>>> +  return TRUE;
>>> +}
>>> +
>>> +int check_expected_long_long (signed long long int expected,
>>> +                         signed long long int  actual)
>>> +{
>>> +  /* Return TRUE if expected and actual values all match.  */
>>> +  if (expected != actual)
>>> +    {
>>> +#if DEBUG
>>> +      printf ("ERROR: Expected long long values don't match. \n");
>>> +      printf ("  Expected 0x%x, actual 0x%x\n",
>>> +         expected, actual);
>>> +#endif
>>> +      return FALSE;
>>> +    }
>>> +  return TRUE;
>>> +}
>>> +
>>> +void print_store_data (vector signed __int128 *store_data, int
>>> size)
>>> +{
>>> +#if DEBUG
>>> +  union conv_t val;
>>> +  int i;
>>> +  
>>> +  for  (i = 0; i < size; i++)
>>> +    {
>>> +      val.vsi128 = store_data[i];
>>> +      printf("Data to store [%d] = 0x%llx %llx\n", i, val.ull[1],
>>> val.ull[0]);
>>> +    }
>>> +#endif
>>> +}
>>> +
>>> +
>>> +void print_raw_buffer (vector signed __int128 *rawbuffer, int
>>> size)
>>> +{
>>> +#if DEBUG
>>> +  union conv_t val;
>>> +  int i;
>>> +  
>>> +  for  (i = 0; i < size; i++)
>>> +    {
>>> +      val.vsi128 = rawbuffer[i];
>>> +      printf ("rawbuffer[%d] = 0x%llx %llx\n", i, val.ull[1],
>>> val.ull[0]);
>>> +    }
>>> +#endif
>>> +}
>>> +
>>> +int
>>> +main () {
>>> +  int i;
>>> +  
>>> +  vector signed __int128 rawbuffer[SIZE];
>>> +  signed char * vsbuffer_char = (signed char *)rawbuffer;
>>> +  signed short int * vsbuffer_short = (signed short int
>>> *)rawbuffer;
>>> +  signed int * vsbuffer_int = (signed int *)rawbuffer;
>>> +  signed long int * vsbuffer_long = (signed long *)rawbuffer;
>>> +  signed long long int * vsbuffer_long_long = (signed long long
>>> *)rawbuffer;
>>> +
>>> +  for  (i = 0; i < SIZE; i++)
>>> +    rawbuffer[i] = zero_vsint128;
>>> +
>>> +  print_store_data (store_data, SIZE);
>>> +
>>> +  for  (i = 0; i < SIZE; i++)
>>> +    {
>>> +      __builtin_vec_xst_trunc (store_data[i], i*sizeof(char),
>>> +                          vsbuffer_char);
>>> +      check_expected_byte (signed_char_expected[i],
>>> vsbuffer_char[i]);
>>> +    }
>>> +  
>>> +  for  (i = 0; i < SIZE; i++)
>>> +    {
>>> +      __builtin_vec_xst_trunc (store_data[i], i*sizeof(short int),
>>> +                          vsbuffer_short);
>>> +      check_expected_half (signed_short_expected[i],
>>> vsbuffer_short[i]);
>>> +    }
>>> +
>>> +  for  (i = 0; i < SIZE; i++)
>>> +    {
>>> +      __builtin_vec_xst_trunc (store_data[i], i*sizeof(int),
>>> +                          vsbuffer_int);
>>> +      check_expected_int (signed_int_expected[i],
>>> vsbuffer_int[i]);
>>> +    }
>>> +
>>> +  for  (i = 0; i < SIZE; i++)
>>> +    {
>>> +      __builtin_vec_xst_trunc (store_data[i], i*sizeof(long int),
>>> +                                  vsbuffer_long);
>>> +      check_expected_long (signed_long_long_expected[i],
>>> +                      vsbuffer_long[i]);
>>> +    }
>>> +
>>> +  for  (i = 0; i < SIZE; i++)
>>> +    {
>>> +      __builtin_vec_xst_trunc (store_data[i], i*sizeof(long long
>>> int),
>>> +                          vsbuffer_long_long);
>>> +      check_expected_long_long (signed_long_long_expected[i],
>>> +                           vsbuffer_long_long[i]);
>>> +    }
>>> +
>>> +  print_raw_buffer (rawbuffer, SIZE);
>>> +  return 0;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler {\mstxvrbx\M} } } */
>>> +/* { dg-final { scan-assembler {\mstxvrhx\M} } } */
>>> +/* { dg-final { scan-assembler {\mstxvrwx\M} } } */
>>> +/* { dg-final { scan-assembler {\mstxvrdx\M} } } */
>>
>> Could you check the exact times instead?
> 
> OK, added check for the instruction counts.
>>
>> BR,
>> Kewen
>>
>

Re: [PATCH] rs6000: Fix __builtin_vec_xst_trunc definition

Reply via email to