Re: [PATCH]rs6000: avoid peeking eof after __vector keyword

2022-03-21 Thread David Edelsohn via Gcc-patches
On Mon, Mar 21, 2022 at 5:13 AM Jiufu Guo  wrote:
>
> Hi!
>
> There is a rare corner case: where __vector is followed only with ";"
> and near the end of the file.
>
> Like the case in PR101168:
> using vdbl =  __vector double;
> #define BREAK 1
>
> For this case, "__vector double" is not followed by a PP_NAME, it is
> followed by CPP_SEMICOLON and then EOF.  In this case, there is no
> more tokens in the file.  Then, do not need to continue to parse the
> file.
>
> This patch pass bootstrap and regtest on ppc64 and ppc64le.

This is okay. Maybe a tweak to the comment, see below.

Thanks, David

>
>
> BR,
> Jiufu
>
>
> PR preprocessor/101168
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000-c.cc (rs6000_macro_to_expand):
> Avoid empty identifier.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/powerpc/pr101168.C: New test.
>
>
> ---
>  gcc/config/rs6000/rs6000-c.cc   | 4 +++-
>  gcc/testsuite/g++.target/powerpc/pr101168.C | 6 ++
>  2 files changed, 9 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.target/powerpc/pr101168.C
>
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index 3b62b499df2..f8cc7bad812 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -282,7 +282,9 @@ rs6000_macro_to_expand (cpp_reader *pfile, const 
> cpp_token *tok)
> expand_bool_pixel = __pixel_keyword;
>   else if (ident == C_CPP_HASHNODE (__bool_keyword))
> expand_bool_pixel = __bool_keyword;
> - else
> +
> + /* If it needs to check tokens continue.  */

Maybe /* If there are more tokens to check.  */ ?

> + else if (ident)
> {
>   /* Try two tokens down, too.  */
>   do
> diff --git a/gcc/testsuite/g++.target/powerpc/pr101168.C 
> b/gcc/testsuite/g++.target/powerpc/pr101168.C
> new file mode 100644
> index 000..284e77fdc88
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/powerpc/pr101168.C
> @@ -0,0 +1,6 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-options "-maltivec" } */
> +
> +using vdbl =  __vector double;
> +#define BREAK 1
> --
> 2.25.1
>


Re: [PATCH] libgcc: allow building float128 libraries on FreeBSD

2022-03-03 Thread David Edelsohn via Gcc-patches
I don't have any objection, but the patch is FreeBSD-specific.  You
are sending the patch from the FreeBSD organization, but I don't know
the authority structure within the organization.   Andreas Tobler is
the FreeBSD maintainer for GCC, but I don't know his current status.

Thanks, David

On Sun, Feb 20, 2022 at 6:38 PM  wrote:
>
> From: Piotr Kubaj 
>
> While FreeBSD currently uses 64-bit long double, there should be no
> problem with adding support for float128.
>
> Signed-off-by: Piotr Kubaj 
> ---
>  libgcc/configure| 22 ++
>  libgcc/configure.ac | 11 +++
>  2 files changed, 33 insertions(+)
>
> diff --git a/libgcc/configure b/libgcc/configure
> index 4919a56f518..334d20d1fb1 100755
> --- a/libgcc/configure
> +++ b/libgcc/configure
> @@ -5300,6 +5300,28 @@ fi
>  { $as_echo "$as_me:${as_lineno-$LINENO}: result: 
> $libgcc_cv_powerpc_3_1_float128_hw" >&5
>  $as_echo "$libgcc_cv_powerpc_3_1_float128_hw" >&6; }
>CFLAGS="$saved_CFLAGS"
> +;;
> +powerpc*-*-freebsd*)
> +  saved_CFLAGS="$CFLAGS"
> +  CFLAGS="$CFLAGS -mabi=altivec -mvsx -mfloat128"
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for PowerPC ISA 2.06 to 
> build __float128 libraries" >&5
> +$as_echo_n "checking for PowerPC ISA 2.06 to build __float128 libraries... " 
> >&6; }
> +if ${libgcc_cv_powerpc_float128+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +/* end confdefs.h.  */
> +vector double dadd (vector double a, vector double b) { return a + b; }
> +_ACEOF
> +if ac_fn_c_try_compile "$LINENO"; then :
> +  libgcc_cv_powerpc_float128=yes
> +else
> +  libgcc_cv_powerpc_float128=no
> +fi
> +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
> +fi
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: 
> $libgcc_cv_powerpc_float128" >&5
> +$as_echo "$libgcc_cv_powerpc_float128" >&6; }
>  esac
>
>  # Collect host-machine-specific information.
> diff --git a/libgcc/configure.ac b/libgcc/configure.ac
> index 13a80b2551b..99ec5d405a4 100644
> --- a/libgcc/configure.ac
> +++ b/libgcc/configure.ac
> @@ -483,6 +483,17 @@ powerpc*-*-linux*)
>  [libgcc_cv_powerpc_3_1_float128_hw=yes],
>  [libgcc_cv_powerpc_3_1_float128_hw=no])])
>CFLAGS="$saved_CFLAGS"
> +;;
> +powerpc*-*-freebsd*)
> +  saved_CFLAGS="$CFLAGS"
> +  CFLAGS="$CFLAGS -mabi=altivec -mvsx -mfloat128"
> +  AC_CACHE_CHECK([for PowerPC ISA 2.06 to build __float128 libraries],
> + [libgcc_cv_powerpc_float128],
> + [AC_COMPILE_IFELSE(
> +[AC_LANG_SOURCE([vector double dadd (vector double a, vector double b) { 
> return a + b; }])],
> +[libgcc_cv_powerpc_float128=yes],
> +[libgcc_cv_powerpc_float128=no])])
> +  CFLAGS="$saved_CFLAGS"
>  esac
>
>  # Collect host-machine-specific information.
> --
> 2.35.1
>


Re: [PATCH v3, rs6000] Enable absolute jump table for PPC AIX and Linux

2022-03-01 Thread David Edelsohn via Gcc-patches
On Tue, Mar 1, 2022 at 12:41 AM HAO CHEN GUI  wrote:
>
> Hi,
>This patch enables absolute jump tables on PPC AIX and Linux. For AIX, the 
> jump
> table is placed in data section. For Linux, it is placed in RELRO section when
> relocation is needed.
>
>Bootstrapped and tested on AIX,Linux BE and LE with no regressions. Is 
> this okay for trunk?
> Any recommendations? Thanks a lot.

Hi, Haochen

Thanks for working on this patch and for revising it.  The patch looks
okay now, but it is not appropriate for the current bug fixing stage
in GCC development -- it needs to wait for GCC Stage 1 later this
Spring.

Also the current code uses the default readonly data section for the
jump table.  The new rs6000_xcoff_function_rodata_section follows the
existing simple behavior, which is correct, but it should support
named data sections instead of placing everything in the default
".data" section, similar to the default ELF code.  I will work on
that.

Thanks, David

>
> ChangeLog
> 2022-03-01 Haochen Gui 
>
> gcc/
> * config/rs6000/aix.h (JUMP_TABLES_IN_TEXT_SECTION): Define.
> * config/rs6000/linux64.h (JUMP_TABLES_IN_TEXT_SECTION): Likewise.
> * config/rs6000/rs6000.cc (rs6000_option_override_internal): Enable
> absolute jump tables for AIX and Linux.
> (rs6000_xcoff_function_rodata_section): Implement.
> * config/rs6000/xcoff.h (TARGET_ASM_FUNCTION_RODATA_SECTION): Define.
>
> gcc/testsuite
> * gcc.target/powerpc/absolute-jump-table-section.c: New.
>
>
> patch.diff
> diff --git a/gcc/config/rs6000/aix.h b/gcc/config/rs6000/aix.h
> index ad3238bf09a..cf0708aa08b 100644
> --- a/gcc/config/rs6000/aix.h
> +++ b/gcc/config/rs6000/aix.h
> @@ -251,9 +251,9 @@
>  #define BLOCK_REG_PADDING(MODE, TYPE, FIRST) \
>(!(FIRST) ? PAD_UPWARD : targetm.calls.function_arg_padding (MODE, TYPE))
>
> -/* Indicate that jump tables go in the text section.  */
> +/* Indicate that jump tables go in the data section.  */
>
> -#define JUMP_TABLES_IN_TEXT_SECTION 1
> +#define JUMP_TABLES_IN_TEXT_SECTION 0
>
>  /* Define any extra SPECS that the compiler needs to generate.  */
>  #undef  SUBTARGET_EXTRA_SPECS
> diff --git a/gcc/config/rs6000/linux64.h b/gcc/config/rs6000/linux64.h
> index b2a7afabc73..440e0fde52b 100644
> --- a/gcc/config/rs6000/linux64.h
> +++ b/gcc/config/rs6000/linux64.h
> @@ -237,9 +237,9 @@ extern int dot_symbols;
>  #define TARGET_ALIGN_NATURAL 1
>  #endif
>
> -/* Indicate that jump tables go in the text section.  */
> +/* Indicate that jump tables go in the rodata or RELRO section.  */
>  #undef  JUMP_TABLES_IN_TEXT_SECTION
> -#define JUMP_TABLES_IN_TEXT_SECTION TARGET_64BIT
> +#define JUMP_TABLES_IN_TEXT_SECTION 0
>
>  /* The linux ppc64 ABI isn't explicit on whether aggregates smaller
> than a doubleword should be padded upward or downward.  You could
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index bc3ef0721a4..07f78d3a05b 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -4954,6 +4954,10 @@ rs6000_option_override_internal (bool global_init_p)
>  warning (0, "%qs is deprecated and not recommended in any circumstances",
>  "-mno-speculate-indirect-jumps");
>
> +  /* Enable absolute jump tables for AIX and Linux.  */
> +  if (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2)
> +rs6000_relative_jumptables = 0;
> +
>return ret;
>  }
>
> @@ -21419,6 +21423,16 @@ rs6000_xcoff_visibility (tree decl)
>enum symbol_visibility vis = DECL_VISIBILITY (decl);
>return visibility_types[vis];
>  }
> +
> +static section *
> +rs6000_xcoff_function_rodata_section (tree decl ATTRIBUTE_UNUSED,
> + bool relocatable)
> +{
> +  if (relocatable)
> +return data_section;
> +  else
> +return readonly_data_section;
> +}
>  #endif
>
>
> diff --git a/gcc/config/rs6000/xcoff.h b/gcc/config/rs6000/xcoff.h
> index cd0f99cb9c6..0dacd86eed9 100644
> --- a/gcc/config/rs6000/xcoff.h
> +++ b/gcc/config/rs6000/xcoff.h
> @@ -98,7 +98,7 @@
>  #define TARGET_ASM_SELECT_SECTION  rs6000_xcoff_select_section
>  #define TARGET_ASM_SELECT_RTX_SECTION  rs6000_xcoff_select_rtx_section
>  #define TARGET_ASM_UNIQUE_SECTION  rs6000_xcoff_unique_section
> -#define TARGET_ASM_FUNCTION_RODATA_SECTION default_no_function_rodata_section
> +#define TARGET_ASM_FUNCTION_RODATA_SECTION 
> rs6000_xcoff_function_rodata_section
>  #define TARGET_STRIP_NAME_ENCODING  rs6000_xcoff_strip_name_encoding
>  #define TARGET_SECTION_TYPE_FLAGS  rs6000_xcoff_section_type_flags
>  #ifdef HAVE_AS_TLS
> diff --git a/gcc/testsuite/gcc.target/powerpc/absolute-jump-table-section.c 
> b/gcc/testsuite/gcc.target/powerpc/absolute-jump-table-section.c
> new file mode 100644
> index 000..688a6f42836
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/absolute-jump-table-section.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile { target { *-*-aix* || *-*-linux* } } } */
> +/* { dg-op

Re: [PATCH, 11 backport] rs6000: Fix LE code gen for vec_cnt[lt]z_lsbb [PR95082]

2022-02-10 Thread David Edelsohn via Gcc-patches
On Thu, Feb 10, 2022 at 4:17 PM Bill Schmidt  wrote:
>
> Hi!
>
> On 2/10/22 2:50 PM, Segher Boessenkool wrote:
> > On Thu, Feb 10, 2022 at 12:22:28PM -0600, Bill Schmidt wrote:
> >> This is a backport from mainline 3f30f2d1dbb3228b8468b26239fe60c2974ce2ac.
> >> These built-ins were misimplemented as always having big-endian semantics.
> >>
> >> Because the built-in infrastructure has changed, the modifications to the
> >> source are different but achieve the same purpose.  The modifications to
> >> the test suite are identical (after fixing the issue with -mbig that David
> >> pointed out with the original patch).
> >>  /* 1 argument vector functions added in ISA 3.0 (power9). */
> >> -BU_P9V_AV_1 (VCLZLSBB_V16QI, "vclzlsbb_v16qi",  CONST,  
> >> vclzlsbb_v16qi)
> >> -BU_P9V_AV_1 (VCLZLSBB_V8HI, "vclzlsbb_v8hi",CONST,  vclzlsbb_v8hi)
> >> -BU_P9V_AV_1 (VCLZLSBB_V4SI, "vclzlsbb_v4si",CONST,  vclzlsbb_v4si)
> >> -BU_P9V_AV_1 (VCTZLSBB_V16QI, "vctzlsbb_v16qi",  CONST,  
> >> vctzlsbb_v16qi)
> >> -BU_P9V_AV_1 (VCTZLSBB_V8HI, "vctzlsbb_v8hi",CONST,  vctzlsbb_v8hi)
> >> -BU_P9V_AV_1 (VCTZLSBB_V4SI, "vctzlsbb_v4si",CONST,  vctzlsbb_v4si)
> >> +BU_P9V_AV_1 (VCLZLSBB_V16QI, "vclzlsbb_v16qi",  CONST,  
> >> vctzlsbb_v16qi)
> >> +BU_P9V_AV_1 (VCLZLSBB_V8HI, "vclzlsbb_v8hi",CONST,  vctzlsbb_v8hi)
> >> +BU_P9V_AV_1 (VCLZLSBB_V4SI, "vclzlsbb_v4si",CONST,  vctzlsbb_v4si)
> >> +BU_P9V_AV_1 (VCTZLSBB_V16QI, "vctzlsbb_v16qi",  CONST,  
> >> vclzlsbb_v16qi)
> >> +BU_P9V_AV_1 (VCTZLSBB_V8HI, "vctzlsbb_v8hi",CONST,  vclzlsbb_v8hi)
> >> +BU_P9V_AV_1 (VCTZLSBB_V4SI, "vctzlsbb_v4si",CONST,  vclzlsbb_v4si)
> > Please change the default to be equal to the builtin name, so, the BE
> > version.  We do that everywhere else as well, and it makes a lot more
> > sense (since everything in Power has BE numbering).
> >
> > The trunk version has this correct afaics?
>
> No, trunk has this, for example:
>
>   const signed int __builtin_altivec_vclzlsbb_v16qi (vsc);
> VCLZLSBB_V16QI vctzlsbb_v16qi {endian}
>
> So the backport matches what is on trunk.
>
> Throughout the new builtin infrastructure, the defaults are set for
> little-endian, and the "endian" flag changes behavior for big-endian.
>
> >
> >> --- a/gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-0.c
> >> +++ b/gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-0.c
> >> @@ -1,6 +1,7 @@
> >>  /* { dg-do compile { target { powerpc*-*-* } } } */
> > (Delete the redundant target clause when modifying any testcase, please).
>
> Okay.
> >
> >>  /* { dg-require-effective-target powerpc_p9vector_ok } */
> >>  /* { dg-options "-mdejagnu-cpu=power9" } */
> >> +/* { dg-additional-options "-mbig" { target powerpc64le-*-* } } */
> > You don't need the target clause, if it already is BE by default it does
> > not do anything to add it redundantly.
> >
> > But this is wrong anyway: the name of the target triple does not say
> > whether we are BE or LE.  Instead you should use the be or le selectors.
> > But again, just add -mbig always.
>
> This was added by David Edelsohn to the trunk version of the patch, because
> -mbig actually is not supported on all subtargets.  (I found that quite
> surprising also.)  Apparently this doesn't work on AIX, for example.  But
> -mlittle works everywhere.  Go figure.

-mbig/-mlittle only applies to Linux, not AIX and not Darwin.

I changed the BE testcases to add "-mbig" for little endian default
targets because the compiler implicitly should be operating in big
endian mode for other targets and the testcases should succeed.

For the LE testcases, I changed the target selector to
"powrpc*-*-linux*" because that is the only PowerPC target that can
operate as little endian.  I could not find a generic "le" target
selector.  powerpc*-*-linux* understands "-mlittle", so I left the
dg-options clause because there is no need to separate out "-mlittle"
for that subset of PowerPC targets.

Thanks, David

>
> That's something that should be fixed, I guess, but it's orthogonal
> to this patch.
>
> Thanks!
> Bill
>
> >
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-3.c
> >> @@ -0,0 +1,15 @@
> >> +/* { dg-do compile { target { powerpc*-*-* } } } */
> >> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> >> +/* { dg-options "-mdejagnu-cpu=power9 -mlittle" } */
> > And here you do it correctly :-)
> >
> > Okay with those fixes (all happen a few times).  Thanks!
> >
> >
> > Segher


Re: [PATCH] rs6000: Fix up vspltis_shifted [PR102140]

2022-02-08 Thread David Edelsohn via Gcc-patches
On Tue, Feb 8, 2022 at 12:25 PM Jakub Jelinek  wrote:
>
> Hi!
>
> The following testcase ICEs, because
> (const_vector:V4SI [
> (const_int 0 [0]) repeated x3
> (const_int -2147483648 [0x8000])
> ])
> is recognized as valid easy_vector_constant in between split1 pass and
> end of RA.
> The problem is that such constants need to be split, and the only
> splitter for that is:
> (define_split
>   [(set (match_operand:VM 0 "altivec_register_operand")
> (match_operand:VM 1 "easy_vector_constant_vsldoi"))]
>   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode) && can_create_pseudo_p ()"
> There is only a single splitting pass before RA, so after that finishes,
> if something gets matched in between that and end of RA (after that
> can_create_pseudo_p () would be no longer true), it will never be
> successfully split and we ICE at final.cc time or earlier.
>
> The i386 backend (and a few others) already use
> (cfun->curr_properties & PROP_rtl_split_insns)
> as a test for split1 pass finished, so that some insns that should be split
> during split1 and shouldn't be matched afterwards are properly guarded.
>
> So, the following patch does that for vspltis_shifted too.
>
> Bootstrapped/regtested on powerpc64le-linux, ok for trunk?

Okay.

Thanks, David

>
> 2022-02-08  Jakub Jelinek  
>
> PR target/102140
> * config/rs6000/rs6000.cc (vspltis_shifted): Return false also if
> split1 pass has finished already.
>
> * gcc.dg/pr102140.c: New test.
>
> --- gcc/config/rs6000/rs6000.cc.jj  2022-02-07 17:38:20.873123915 +0100
> +++ gcc/config/rs6000/rs6000.cc 2022-02-08 14:15:31.619505410 +0100
> @@ -6257,8 +6257,11 @@ vspltis_shifted (rtx op)
>  return false;
>
>/* We need to create pseudo registers to do the shift, so don't recognize
> - shift vector constants after reload.  */
> -  if (!can_create_pseudo_p ())
> + shift vector constants after reload.  Don't match it even before RA
> + after split1 is done, because there won't be further splitting pass
> + before RA to do the splitting.  */
> +  if (!can_create_pseudo_p ()
> +  || (cfun->curr_properties & PROP_rtl_split_insns))
>  return false;
>
>nunits = GET_MODE_NUNITS (mode);
> --- gcc/testsuite/gcc.dg/pr102140.c.jj  2022-02-08 14:24:25.839041166 +0100
> +++ gcc/testsuite/gcc.dg/pr102140.c 2022-02-08 14:24:03.038359745 +0100
> @@ -0,0 +1,23 @@
> +/* PR target/102140 */
> +/* { dg-do compile { target int128 } } */
> +/* { dg-options "-Og -fipa-cp -fno-tree-ccp -fno-tree-ter -Wno-psabi" } */
> +
> +typedef int __attribute__((__vector_size__ (64))) U;
> +typedef __int128 __attribute__((__vector_size__ (64))) V;
> +
> +int a, b;
> +
> +static void
> +bar (char c, V v)
> +{
> +  v *= c;
> +  U u = a + (U) v;
> +  (union { U b; }) { u };
> +  b = 0;
> +}
> +
> +void
> +foo (void)
> +{
> +  bar (1, (V){((__int128) 9223372036854775808ULL) << 64});
> +}
>
> Jakub
>


Re: [PATCH] testsuite: Fix up testsuite/gcc.c-torture/execute/builtins/lib/chk.c for powerpc [PR104380]

2022-02-07 Thread David Edelsohn via Gcc-patches
On Mon, Feb 7, 2022 at 8:20 AM Jakub Jelinek  wrote:
>
> On Fri, Feb 04, 2022 at 12:00:57PM -0500, David Edelsohn via Gcc-patches 
> wrote:
> > > The following testcase FAILs when configured with
> > > --with-long-double-format=ieee .  Only happens in the -std=c* modes, not 
> > > the
> > > GNU modes; while the glibc headers have __asm redirects of
> > > vsnprintf and __vsnprinf_chk to __vsnprintfieee128 and
> > > __vsnprintf_chkieee128, the vsnprintf fortification extern inline 
> > > gnu_inline
> > > always_inline wrapper calls __builtin_vsnprintf_chk and we actually emit
> > > a call to __vsnprinf_chk (i.e. with IBM extended long double) instead of
> > > __vsnprintf_chkieee128.
> > >
> > > rs6000_mangle_decl_assembler_name already had cases for *printf and 
> > > *scanf,
> > > so this just adds another case for *printf_chk.  *scanf_chk doesn't exist.
> > > __ prefixing isn't done because *printf_chk already starts with __.
> > >
> > > Bootstrapped/regtested on powerpc64le-linux, ok for trunk?
> >
> > Okay.
>
> Unfortunately, while I've tested the testcase also with -mabi=ieeelongdouble
> by hand, the full bootstrap/regtest was on GCCFarm where glibc is too old
> to test with --with-long-double-format=ieee.
> I've done full bootstrap/regtest with that option during the weekend and
> the patch regressed:
> FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution,  -O1
> FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution,  -O2
> FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution,  -O2 -flto 
> -fno-use-linker-plugin -flto-partition=none
> FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution,  -O2 -flto 
> -fuse-linker-plugin -fno-fat-lto-objects
> FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution,  -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions
> FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution,  -O3 -g
> FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution,  -Og -g
> FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution,  -Os
> FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -O1
> FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -O2
> FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -O2 -flto 
> -fno-use-linker-plugin -flto-partition=none
> FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -O2 -flto 
> -fuse-linker-plugin -fno-fat-lto-objects
> FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions
> FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -O3 -g
> FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -Og -g
> FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution,  -Os
> FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution,  -O1
> FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution,  -O2
> FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution,  -O2 -flto 
> -fno-use-linker-plugin -flto-partition=none
> FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution,  -O2 -flto 
> -fuse-linker-plugin -fno-fat-lto-objects
> FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution,  -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions
> FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution,  -O3 -g
> FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution,  -Og -g
> FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution,  -Os
> FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution,  -O1
> FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution,  -O2
> FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution,  -O2 -flto 
> -fno-use-linker-plugin -flto-partition=none
> FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution,  -O2 -flto 
> -fuse-linker-plugin -fno-fat-lto-objects
> FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution,  -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions
> FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution,  -O3 -g
> FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution,  -Og -g
> FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution,  -Os
>
> The problem is that the execute/builtins/ testsuite wants to override some
> of the library functions and with the change we (correctly) call
> __*printf_chkieee128 and so lib/chk.c is no longer called but the glibc
> APIs are.
>
> The following patch fixes it.
>
> Tested on powerpc64le-linux, ok fo

Re: [PATCH] rs6000: Fix up -D_FORTIFY_SOURCE* with -mabi=ieeelongdouble [PR104380]

2022-02-04 Thread David Edelsohn via Gcc-patches
On Fri, Feb 4, 2022 at 11:58 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The following testcase FAILs when configured with
> --with-long-double-format=ieee .  Only happens in the -std=c* modes, not the
> GNU modes; while the glibc headers have __asm redirects of
> vsnprintf and __vsnprinf_chk to __vsnprintfieee128 and
> __vsnprintf_chkieee128, the vsnprintf fortification extern inline gnu_inline
> always_inline wrapper calls __builtin_vsnprintf_chk and we actually emit
> a call to __vsnprinf_chk (i.e. with IBM extended long double) instead of
> __vsnprintf_chkieee128.
>
> rs6000_mangle_decl_assembler_name already had cases for *printf and *scanf,
> so this just adds another case for *printf_chk.  *scanf_chk doesn't exist.
> __ prefixing isn't done because *printf_chk already starts with __.
>
> Bootstrapped/regtested on powerpc64le-linux, ok for trunk?

Okay.

Thanks, David

>
> 2022-02-04  Jakub Jelinek  
>
> PR target/104380
> * config/rs6000/rs6000.cc (rs6000_mangle_decl_assembler_name): Also
> adjust mangling of __builtin*printf_chk.
>
> * gcc.dg/pr104380.c: New test.
>
> --- gcc/config/rs6000/rs6000.cc.jj  2022-01-28 10:01:41.224837656 +0100
> +++ gcc/config/rs6000/rs6000.cc 2022-02-04 12:31:27.651715472 +0100
> @@ -28228,6 +28228,7 @@ rs6000_mangle_decl_assembler_name (tree
> {
>   size_t printf_len = strlen ("printf");
>   size_t scanf_len = strlen ("scanf");
> + size_t printf_chk_len = strlen ("printf_chk");
>
>   if (len >= printf_len
>   && strcmp (name + len - printf_len, "printf") == 0)
> @@ -28237,6 +28238,10 @@ rs6000_mangle_decl_assembler_name (tree
>&& strcmp (name + len - scanf_len, "scanf") == 0)
> newname = xasprintf ("__isoc99_%sieee128", name);
>
> + else if (len >= printf_chk_len
> +  && strcmp (name + len - printf_chk_len, "printf_chk") == 0)
> +   newname = xasprintf ("%sieee128", name);
> +
>   else if (name[len - 1] == 'l')
> {
>   bool uses_ieee128_p = false;
> --- gcc/testsuite/gcc.dg/pr104380.c.jj  2022-02-04 12:51:50.152643364 +0100
> +++ gcc/testsuite/gcc.dg/pr104380.c 2022-02-04 12:53:25.092317741 +0100
> @@ -0,0 +1,32 @@
> +/* PR target/104380 */
> +/* This test needs runtime that provides __*_chk functions.  */
> +/* { dg-do run { target *-*-linux* *-*-gnu* *-*-uclinux* } } */
> +/* { dg-options "-O2 -std=c99" } */
> +
> +#define FORTIFY_SOURCE 2
> +#include 
> +#include 
> +
> +static char buf[4096];
> +static char gfmt[] = "%Lg";
> +
> +static int __attribute__ ((noipa))
> +foo (char *str, const char *fmt, ...)
> +{
> +  int ret;
> +  va_list ap;
> +  va_start (ap, fmt);
> +  ret = vsnprintf (str, 4096, fmt, ap);
> +  va_end (ap);
> +  return ret;
> +}
> +
> +int
> +main ()
> +{
> +  long double dval = 128.0L;
> +  int ret = foo (buf, gfmt, dval);
> +  if (ret != 3 || __builtin_strcmp (buf, "128") != 0)
> +__builtin_abort ();
> +  return 0;
> +}
>
> Jakub
>


Re: [PATCH 0/3] Enable pointer_query caching throughout.

2022-02-03 Thread David Edelsohn via Gcc-patches
On Thu, Feb 3, 2022 at 6:09 PM Martin Sebor  wrote:
>
> On 2/3/22 15:56, David Edelsohn wrote:
> > This series of patches has exploded memory usage and I can no longer
> > bootstrap GCC on AIX.
> >
> > As with the Ranger problem exposed by Aldy's patch last September,
> > something is not freeing memory.
> >
> > Even on systems where GCC still bootstrap, this excessive memory usage
> > severely damages GCC compile performance.
>
> Does the change below by any chance make a difference?  (It's just
> a hunch, I haven't tested it beyond quickly building stage 1 and
> running a few tests.)

Hi, Martin

Thanks for the quick response.  Yes, I am able to restore bootstrap on
AIX (32 bit) with the change.

Thanks, David

>
> Martin
>
>
> diff --git a/gcc/pointer-query.h b/gcc/pointer-query.h
> index 4c725eeaf34..801a240c38d 100644
> --- a/gcc/pointer-query.h
> +++ b/gcc/pointer-query.h
> @@ -164,9 +164,9 @@ class pointer_query
> struct cache_type
> {
>   /* 1-based indices into cache.  */
> -vec indices;
> +auto_vec indices;
>   /* The cache itself.  */
> -vec access_refs;
> +auto_vec access_refs;
> };
>
>   public:


Re: [PATCH 0/3] Enable pointer_query caching throughout.

2022-02-03 Thread David Edelsohn via Gcc-patches
This series of patches has exploded memory usage and I can no longer
bootstrap GCC on AIX.

As with the Ranger problem exposed by Aldy's patch last September,
something is not freeing memory.

Even on systems where GCC still bootstrap, this excessive memory usage
severely damages GCC compile performance.

Thanks, David


Re: [PATCH] rs6000: Fix up #include or [PR104239]

2022-01-26 Thread David Edelsohn via Gcc-patches
On Wed, Jan 26, 2022 at 3:45 PM Jakub Jelinek  wrote:
>
> Hi!
>
> r12-4717-g7d37abedf58d66 added immintrin.h and x86gprintrin.h headers
> to rs6000, these headers are on x86 standalone headers that various
> programs include directly rather than including them through
> .
> Unfortunately, for that change the bmiintrin.h and bmi2intrin.h
> headers haven't been adjusted, so the effect is that if one includes them
> (without including also x86intrin.h first) #error will trigger.
> Furthermore, when including such headers conditionally as some real-world
> packages do, this means a regression.
>
> The following patch fixes it and matches what the x86 bmi{,2}intrin.h
> headers do.
>
> Bootstrapped/regtested on powerpc64le-linux, ok for trunk?

Okay.

Thanks for catching this.

- David

>
> 2022-01-26  Jakub Jelinek  
>
> PR target/104239
> * config/rs6000/bmiintrin.h: Test _X86GPRINTRIN_H_INCLUDED instead of
> _X86INTRIN_H_INCLUDED and adjust #error wording.
> * config/rs6000/bmi2intrin.h: Likewise.
>
> * gcc.target/powerpc/pr104239-1.c: New test.
> * gcc.target/powerpc/pr104239-2.c: New test.
>
> --- gcc/config/rs6000/bmiintrin.h.jj2022-01-11 23:11:21.936296534 +0100
> +++ gcc/config/rs6000/bmiintrin.h   2022-01-26 13:35:08.705945170 +0100
> @@ -29,8 +29,8 @@
> standard C or GNU C extensions, which are more portable and better
> optimized across multiple targets.  */
>
> -#if !defined _X86INTRIN_H_INCLUDED
> -# error "Never use  directly; include  instead."
> +#if !defined _X86GPRINTRIN_H_INCLUDED
> +# error "Never use  directly; include  instead."
>  #endif
>
>  #ifndef _BMIINTRIN_H_INCLUDED
> --- gcc/config/rs6000/bmi2intrin.h.jj   2022-01-11 23:11:21.936296534 +0100
> +++ gcc/config/rs6000/bmi2intrin.h  2022-01-26 13:34:53.373162122 +0100
> @@ -29,8 +29,8 @@
> standard C or GNU C extensions, which are more portable and better
> optimized across multiple targets.  */
>
> -#if !defined _X86INTRIN_H_INCLUDED
> -# error "Never use  directly; include  instead."
> +#if !defined _X86GPRINTRIN_H_INCLUDED
> +# error "Never use  directly; include  
> instead."
>  #endif
>
>  #ifndef _BMI2INTRIN_H_INCLUDED
> --- gcc/testsuite/gcc.target/powerpc/pr104239-1.c.jj2022-01-26 
> 13:42:34.103643030 +0100
> +++ gcc/testsuite/gcc.target/powerpc/pr104239-1.c   2022-01-26 
> 13:42:23.101798701 +0100
> @@ -0,0 +1,9 @@
> +/* PR target/104239 */
> +/* { dg-do compile } */
> +/* { dg-options "-DNO_WARN_X86_INTRINSICS" } */
> +
> +#if __has_include()
> +#include 
> +#endif
> +
> +int i;
> --- gcc/testsuite/gcc.target/powerpc/pr104239-2.c.jj2022-01-26 
> 13:42:42.279527345 +0100
> +++ gcc/testsuite/gcc.target/powerpc/pr104239-2.c   2022-01-26 
> 13:42:23.101798701 +0100
> @@ -0,0 +1,9 @@
> +/* PR target/104239 */
> +/* { dg-do compile } */
> +/* { dg-options "-DNO_WARN_X86_INTRINSICS" } */
> +
> +#if __has_include()
> +#include 
> +#endif
> +
> +int i;
>
> Jakub
>


Re: [PATCH] Disable -fsplit-stack support on non-glibc targets

2022-01-25 Thread David Edelsohn via Gcc-patches
This patch broke bootstrap on AIX.  It may have broken Darwin.  I have
applied the following patch.  AIX doesn't need to distinguish between
different Linux libc implementations.

Bootstrapped on powerpc-ibm-aix7.2.3.0

Thanks, David

aix: AIX is not GLIBC.

A recent patch added tests for OPTION_GLIBC that is defined in
linux.h and linux64.h.  This broke bootstrap for non-Linux rs6000
configurations.  This patch defines OPTION_GLIBC as 0.

* config/rs6000/aix.h (OPTION_GLIBC): Define as 0.

diff --git a/gcc/config/rs6000/aix.h b/gcc/config/rs6000/aix.h
index ad3238bf09a..eb7a0c09f72 100644
--- a/gcc/config/rs6000/aix.h
+++ b/gcc/config/rs6000/aix.h
@@ -23,6 +23,7 @@
 #define DEFAULT_ABI ABI_AIX
 #undef  TARGET_AIX
 #define TARGET_AIX 1
+#define OPTION_GLIBC 0


Re: Ping: [PATCH] PR 103763, Fix fold-vec-splat-floatdouble on power10.

2022-01-21 Thread David Edelsohn via Gcc-patches
On Fri, Jan 21, 2022 at 2:56 PM Michael Meissner  wrote:
>
> Ping patch
> https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587924.html
>
> | Date: Fri, 7 Jan 2022 16:05:53 -0500
> | From: Michael Meissner 
> | Subject: [PATCH] PR 103763, Fix fold-vec-splat-floatdouble on power10.
> | Message-ID: 

This patch is okay.

Thanks, David


Re: [PATCH v2, rs6000] Add a combine pattern for CA minus one [PR95737]

2022-01-20 Thread David Edelsohn via Gcc-patches
On Thu, Jan 20, 2022 at 2:36 AM HAO CHEN GUI  wrote:
>
> Hi,
>This patch adds a combine pattern for "CA minus one". As CA only has two
> values (0 or 1), we could convert following pattern
>   (sign_extend:DI (plus:SI (reg:SI 98 ca)
> (const_int -1 [0x]
> to
>(plus:DI (reg:DI 98 ca)
> (const_int -1 [0x])))
>With this patch, one unnecessary sign extend is eliminated.
>
>Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
>
> ChangeLog
> 2022-01-20 Haochen Gui 
>
> gcc/
> * config/rs6000/rs6000.md (extenddi_ca_minus_one): Define.
>
> gcc/testsuite/
> * gcc.target/powerpc/pr95737.c: New.
>
>
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 6ecb0bd6142..1d8b212962f 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -2358,6 +2358,19 @@ (define_insn "subf3_carry_in_xx"
>"subfe %0,%0,%0"
>[(set_attr "type" "add")])
>
> +(define_insn_and_split "*extenddi_ca_minus_one"
> +  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
> +   (sign_extend:DI (plus:SI (reg:SI CA_REGNO)
> +(const_int -1]
> +  ""
> +  "#"
> +  ""
> +  [(parallel [(set (match_dup 0)
> +  (plus:DI (reg:DI CA_REGNO)
> +   (const_int -1)))
> + (clobber (reg:DI CA_REGNO))])]
> +  ""
> +)
>
>  (define_insn "@neg2"
>[(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr95737.c 
> b/gcc/testsuite/gcc.target/powerpc/pr95737.c
> new file mode 100644
> index 000..94320f23423
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr95737.c
> @@ -0,0 +1,10 @@
> +/* PR target/95737 */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */

Why does the testcase force power8? This testcase is not specific to
Power8 or later.

> +/* { dg-final { scan-assembler-not {\mextsw\M} } } */
> +
> +
> +unsigned long long negativeLessThan (unsigned long long a, unsigned long 
> long b)
> +{
> +   return -(a < b);
> +}

If you're only testing for lp64, the testcase could use "long" instead
of "long long".

This is okay with those changes.

Thanks, David


Re: [PATCH, rs6000] Add a combine pattern for CA minus one [PR95737]

2022-01-19 Thread David Edelsohn via Gcc-patches
On Wed, Jan 19, 2022 at 2:12 AM HAO CHEN GUI  wrote:
>
> Hi,
>This patch adds a combine pattern for "CA minus one". As CA only has two
> values (0 or 1), we could convert following pattern
>   (sign_extend:DI (plus:SI (reg:SI 98 ca)
> (const_int -1 [0x]
> to
>(plus:DI (reg:DI 98 ca)
> (const_int -1 [0x])))
> With this patch, it eliminates one unnecessary sign extend. Also in 
> rs6000,
> regclass of CA register is set to NO_REGS. So CA is not in hard register set
> and it can't match register_operand. The patch changes it to any_operand.

Segher changed the class in 2014.

https://gcc.gnu.org/pipermail/gcc-patches/2014-September/399192.html

We need to ensure that it still is the correct decision in light of
these new patterns.

Thanks, David

>
> Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
>
> ChangeLog
> 2022-01-19 Haochen Gui 
>
> gcc/
> * config/rs6000/predicates.md (ca_operand): Match any_operand as CA
> register is not in hard register set.
> * config/rs6000/rs6000.md (extenddi_ca_minus_one): Define.
>
> gcc/testsuite/
> * gcc.target/powerpc/pr95737.c: New.
>
>
> patch.diff
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index c65dfb91f3d..cd2ae1dc8e0 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -188,7 +188,7 @@ (define_predicate "vlogical_operand"
>
>  ;; Return 1 if op is the carry register.
>  (define_predicate "ca_operand"
> -  (match_operand 0 "register_operand")
> +  (match_operand 0 "any_operand")
>  {
>if (SUBREG_P (op))
>  op = SUBREG_REG (op);
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 6ecb0bd6142..f1b09aad3b5 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -2358,6 +2358,21 @@ (define_insn "subf3_carry_in_xx"
>"subfe %0,%0,%0"
>[(set_attr "type" "add")])
>
> +(define_insn_and_split "*extenddi_ca_minus_one"
> +  [(set (match_operand:DI 0 "gpc_reg_operand")
> +   (sign_extend:DI (plus:SI (match_operand:SI 1 "ca_operand")
> +(const_int -1]
> +  ""
> +  "#"
> +  ""
> +  [(parallel [(set (match_dup 0)
> +  (plus:DI (match_dup 2)
> +   (const_int -1)))
> + (clobber (match_dup 2))])]
> +{
> +  operands[2] = copy_rtx (operands[1]);
> +  PUT_MODE (operands[2], DImode);
> +})
>
>  (define_insn "@neg2"
>[(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr95737.c 
> b/gcc/testsuite/gcc.target/powerpc/pr95737.c
> new file mode 100644
> index 000..94320f23423
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr95737.c
> @@ -0,0 +1,10 @@
> +/* PR target/95737 */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> +/* { dg-final { scan-assembler-not {\mextsw\M} } } */
> +
> +
> +unsigned long long negativeLessThan (unsigned long long a, unsigned long 
> long b)
> +{
> +   return -(a < b);
> +}


Re: [PATCH, rs6000] Add a combine pattern for CA minus one [PR95737]

2022-01-19 Thread David Edelsohn via Gcc-patches
On Wed, Jan 19, 2022 at 2:12 AM HAO CHEN GUI  wrote:
>
> Hi,
>This patch adds a combine pattern for "CA minus one". As CA only has two
> values (0 or 1), we could convert following pattern
>   (sign_extend:DI (plus:SI (reg:SI 98 ca)
> (const_int -1 [0x]
> to
>(plus:DI (reg:DI 98 ca)
> (const_int -1 [0x])))
> With this patch, it eliminates one unnecessary sign extend. Also in 
> rs6000,
> regclass of CA register is set to NO_REGS. So CA is not in hard register set
> and it can't match register_operand. The patch changes it to any_operand.

CA_REGNO should be in class CA_REGS, not class NO_REGS.  This seems
like a major, latent bug.

Thanks, David

>
> Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
>
> ChangeLog
> 2022-01-19 Haochen Gui 
>
> gcc/
> * config/rs6000/predicates.md (ca_operand): Match any_operand as CA
> register is not in hard register set.
> * config/rs6000/rs6000.md (extenddi_ca_minus_one): Define.
>
> gcc/testsuite/
> * gcc.target/powerpc/pr95737.c: New.
>
>
> patch.diff
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index c65dfb91f3d..cd2ae1dc8e0 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -188,7 +188,7 @@ (define_predicate "vlogical_operand"
>
>  ;; Return 1 if op is the carry register.
>  (define_predicate "ca_operand"
> -  (match_operand 0 "register_operand")
> +  (match_operand 0 "any_operand")
>  {
>if (SUBREG_P (op))
>  op = SUBREG_REG (op);
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 6ecb0bd6142..f1b09aad3b5 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -2358,6 +2358,21 @@ (define_insn "subf3_carry_in_xx"
>"subfe %0,%0,%0"
>[(set_attr "type" "add")])
>
> +(define_insn_and_split "*extenddi_ca_minus_one"
> +  [(set (match_operand:DI 0 "gpc_reg_operand")
> +   (sign_extend:DI (plus:SI (match_operand:SI 1 "ca_operand")
> +(const_int -1]
> +  ""
> +  "#"
> +  ""
> +  [(parallel [(set (match_dup 0)
> +  (plus:DI (match_dup 2)
> +   (const_int -1)))
> + (clobber (match_dup 2))])]
> +{
> +  operands[2] = copy_rtx (operands[1]);
> +  PUT_MODE (operands[2], DImode);
> +})
>
>  (define_insn "@neg2"
>[(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr95737.c 
> b/gcc/testsuite/gcc.target/powerpc/pr95737.c
> new file mode 100644
> index 000..94320f23423
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr95737.c
> @@ -0,0 +1,10 @@
> +/* PR target/95737 */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> +/* { dg-final { scan-assembler-not {\mextsw\M} } } */
> +
> +
> +unsigned long long negativeLessThan (unsigned long long a, unsigned long 
> long b)
> +{
> +   return -(a < b);
> +}


Re: [PATCH] libstdc++: Implement C++20 atomic and atomic

2022-01-18 Thread David Edelsohn via Gcc-patches
This patch introduced new AIX testsuite failures.

PR libstdc++/104101

Thanks, David


Re: [PATCH] rs6000: Use known constant for GET_MODE_NUNITS and similar

2022-01-14 Thread David Edelsohn via Gcc-patches
On Fri, Jan 14, 2022 at 5:42 AM Kewen.Lin  wrote:
>
> on 2022/1/13 下午11:15, David Edelsohn wrote:
> > On Thu, Jan 13, 2022 at 7:40 AM Kewen.Lin  wrote:
> >>
> >> Hi David,
> >>
> >> on 2022/1/13 上午11:12, David Edelsohn wrote:
> >>> On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin  wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> This patch is to clean up some codes with GET_MODE_UNIT_SIZE or
> >>>> GET_MODE_NUNITS, which can use known constant instead.
> >>>
> >>> I'll let Segher decide, but often the additional code is useful
> >>> self-documentation instead of magic constants.  Or at least the change
> >>> requires comments documenting the derivation of the constants
> >>> currently described by the code itself.
> >>>
> >>
> >> Thanks for the comments, I added some comments as suggested, also removed
> >> the whole "altivec_vreveti2" since I noticed it's useless, it's not used
> >> by any built-in functions and even unused in the commit db042e1603db50573.
> >>
> >> The updated version has been tested as before.
> >
> > As we have discussed offline, the comments need to be clarified and 
> > expanded.
> >
> > And the removal of altivec_vreveti2 should be confirmed with Carl
> > Love, who added the pattern less than a year ago. There may be another
> > patch planning to use it.
> >
>
> Thanks for the suggestions David!  The comments have been updated, and Carl
> also helped to confirm the altivec_vreveti2 pattern is not planned for any
> future work and looks reasonable to remove.
>
> Does this updated version look good to you?

The revised patch is okay.

Thanks, David

>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * config/rs6000/altivec.md (altivec_vreveti2): Remove.
> * config/rs6000/vsx.md (*vsx_extract_si, 
> *vsx_extract_si_float_df,
> *vsx_extract_si_float_, *vsx_insert_extract_v4sf_p9): Use
> known constant values to simplify code.
> ---
>  gcc/config/rs6000/altivec.md | 25 -
>  gcc/config/rs6000/vsx.md | 16 
>  2 files changed, 12 insertions(+), 29 deletions(-)
>
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 950b17862c4..4412175a0dc 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -3950,31 +3950,6 @@
>DONE;
>  })
>
> -;; Vector reverse elements
> -(define_expand "altivec_vreveti2"
> -  [(set (match_operand:TI 0 "register_operand" "=v")
> -   (unspec:TI [(match_operand:TI 1 "register_operand" "v")]
> - UNSPEC_VREVEV))]
> -  "TARGET_ALTIVEC"
> -{
> -  int i, j, size, num_elements;
> -  rtvec v = rtvec_alloc (16);
> -  rtx mask = gen_reg_rtx (V16QImode);
> -
> -  size = GET_MODE_UNIT_SIZE (TImode);
> -  num_elements = GET_MODE_NUNITS (TImode);
> -
> -  for (j = 0; j < num_elements; j++)
> -for (i = 0; i < size; i++)
> -  RTVEC_ELT (v, i + j * size)
> -   = GEN_INT (i + (num_elements - 1 - j) * size);
> -
> -  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> -  emit_insn (gen_altivec_vperm_ti (operands[0], operands[1],
> -operands[1], mask));
> -  DONE;
> -})
> -
>  ;; Vector reverse elements for V16QI V8HI V4SI V4SF
>  (define_expand "altivec_vreve2"
>[(set (match_operand:VEC_K 0 "register_operand" "=v")
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index acd729d1687..ee748ff4ebd 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -3854,8 +3854,10 @@
>rtx vec_tmp = operands[3];
>int value;
>
> +  /* Adjust index for LE element ordering, the below minuend 3 is computed by
> + GET_MODE_NUNITS (V4SImode) - 1.  */
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in the correct position, we can avoid doing the VSPLT
>   instruction.  */
> @@ -4230,8 +4232,10 @@
>rtx v4si_tmp = operands[3];
>int value;
>
> +  /* Adjust index for LE element ordering, the below minuend 3 is computed by
> + GET_MODE_NUNITS (V4SImode) - 1.  */
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in

Re: [PATCH] rs6000: Fix constraint v with rs6000_constraints[RS6000_CONSTRAINT_v]

2022-01-13 Thread David Edelsohn via Gcc-patches
On Thu, Jan 13, 2022 at 7:28 AM Kewen.Lin  wrote:
>
> on 2022/1/13 上午11:56, Kewen.Lin via Gcc-patches wrote:
> > on 2022/1/13 上午11:44, David Edelsohn wrote:
> >> On Wed, Jan 12, 2022 at 10:38 PM Kewen.Lin  wrote:
> >>>
> >>> Hi David,
> >>>
> >>> on 2022/1/13 上午11:07, David Edelsohn wrote:
> >>>> On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin  wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> This patch is to fix register constraint v with
> >>>>> rs6000_constraints[RS6000_CONSTRAINT_v] instead of ALTIVEC_REGS,
> >>>>> just like some other existing register constraints with
> >>>>> RS6000_CONSTRAINT_*.
> >>>>>
> >>>>> I happened to see this and hope it's not intentional and just
> >>>>> got neglected.
> >>>>>
> >>>>> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
> >>>>> powerpc64-linux-gnu P8.
> >>>>>
> >>>>> Is it ok for trunk?
> >>>>
> >>>> Why do you want to make this change?
> >>>>
> >>>> rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS;
> >>>>
> >>>> but all of the patterns that use a "v" constraint are (or should be)
> >>>> protected by TARGET_ALTIVEC, or some final condition that only is
> >>>> active for TARGET_ALTIVEC.  The other constraints are conditionally
> >>>> set because they can be used in a pattern with multiple alternatives
> >>>> where the pattern itself is active but some of the constraints
> >>>> correspond to NO_REGS when some instruction variants for VSX is not
> >>>> enabled.
> >>>>
> >>>
> >>> Good point!  Thanks for the explanation.
> >>>
> >>>> The change isn't wrong, but it doesn't correct a bug and provides no
> >>>> additional benefit nor clarty that I can see.
> >>>>
> >>>
> >>> The original intention is to make it consistent with the other existing
> >>> register constraints with RS6000_CONSTRAINT_*, otherwise it looks a bit
> >>> weird (like was neglected).  After you clarified above, 
> >>> RS6000_CONSTRAINT_v
> >>> seems useless at all in the current framework.  Do you prefer to remove
> >>> it to avoid any confusions instead?
> >>
> >> It's used in the reg_class, so there may be some heuristic in the GCC
> >> register allocator that cares about the number of registers available
> >> for the target.  rs6000_constraints[RS6000_CONSTRAINT_v] is defined
> >> conditionally, so it seems best to leave it as is.
> >>
> >
> > I may miss something, but I didn't find it's used for the above purposes.
> > If it's best to leave it as is, the proposed patch seems to offer better
> > readability.
>
> Two more inputs for maintainers' decision:
>
> 1) the original proposed patch fixed one "bug" that is:
>
> In function rs6000_debug_reg_global, it tries to print the register class
> for the register constraint:
>
>   fprintf (stderr,
>"\n"
>"d  reg_class = %s\n"
>"f  reg_class = %s\n"
>"v  reg_class = %s\n"
>"wa reg_class = %s\n"
>...
>"\n",
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_d]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_f]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_v]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wa]],
>...
>
> It uses rs6000_constraints[RS6000_CONSTRAINT_v] which is conditionally
> set here:
>
>   /* Add conditional constraints based on various options, to allow us to
>  collapse multiple insn patterns.  */
>   if (TARGET_ALTIVEC)
> rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS;
>
> But the actual register class for register constraint is hardcoded as
> ALTIVEC_REGS rather than rs6000_constraints[RS6000_CONSTRAINT_v].

I agree that the information is inaccurate, but it is informal
debugging output.  And if Altivec is disabled, the value of the
constraint is irrelevant / garbage.

>
> 2) Bootstrapped and tested one below patch to remove all the code using
> RS6000_CONSTRAINT_v on powerpc64le-linux-gnu P10 and P9,
> powerpc64-linux-gnu P

Re: [PATCH] rs6000: Use known constant for GET_MODE_NUNITS and similar

2022-01-13 Thread David Edelsohn via Gcc-patches
On Thu, Jan 13, 2022 at 7:40 AM Kewen.Lin  wrote:
>
> Hi David,
>
> on 2022/1/13 上午11:12, David Edelsohn wrote:
> > On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin  wrote:
> >>
> >> Hi,
> >>
> >> This patch is to clean up some codes with GET_MODE_UNIT_SIZE or
> >> GET_MODE_NUNITS, which can use known constant instead.
> >
> > I'll let Segher decide, but often the additional code is useful
> > self-documentation instead of magic constants.  Or at least the change
> > requires comments documenting the derivation of the constants
> > currently described by the code itself.
> >
>
> Thanks for the comments, I added some comments as suggested, also removed
> the whole "altivec_vreveti2" since I noticed it's useless, it's not used
> by any built-in functions and even unused in the commit db042e1603db50573.
>
> The updated version has been tested as before.

As we have discussed offline, the comments need to be clarified and expanded.

And the removal of altivec_vreveti2 should be confirmed with Carl
Love, who added the pattern less than a year ago. There may be another
patch planning to use it.

Thanks, David

>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * config/rs6000/altivec.md (altivec_vreveti2): Remove.
> * config/rs6000/vsx.md (*vsx_extract_si, 
> *vsx_extract_si_float_df,
> *vsx_extract_si_float_, *vsx_insert_extract_v4sf_p9): Use
> known constant values to simplify code.
> ---
>  gcc/config/rs6000/altivec.md | 25 -
>  gcc/config/rs6000/vsx.md | 12 
>  2 files changed, 8 insertions(+), 29 deletions(-)
>
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index c2312cc1e0f..b7f056f8c60 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -3950,31 +3950,6 @@ (define_expand "altivec_negv4sf2"
>DONE;
>  })
>
> -;; Vector reverse elements
> -(define_expand "altivec_vreveti2"
> -  [(set (match_operand:TI 0 "register_operand" "=v")
> -   (unspec:TI [(match_operand:TI 1 "register_operand" "v")]
> - UNSPEC_VREVEV))]
> -  "TARGET_ALTIVEC"
> -{
> -  int i, j, size, num_elements;
> -  rtvec v = rtvec_alloc (16);
> -  rtx mask = gen_reg_rtx (V16QImode);
> -
> -  size = GET_MODE_UNIT_SIZE (TImode);
> -  num_elements = GET_MODE_NUNITS (TImode);
> -
> -  for (j = 0; j < num_elements; j++)
> -for (i = 0; i < size; i++)
> -  RTVEC_ELT (v, i + j * size)
> -   = GEN_INT (i + (num_elements - 1 - j) * size);
> -
> -  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> -  emit_insn (gen_altivec_vperm_ti (operands[0], operands[1],
> -operands[1], mask));
> -  DONE;
> -})
> -
>  ;; Vector reverse elements for V16QI V8HI V4SI V4SF
>  (define_expand "altivec_vreve2"
>[(set (match_operand:VEC_K 0 "register_operand" "=v")
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 802db0d112b..d246410880d 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -3854,8 +3854,9 @@ (define_insn_and_split  "*vsx_extract_si"
>rtx vec_tmp = operands[3];
>int value;
>
> +  /* Adjust index for LE element ordering.  */
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in the correct position, we can avoid doing the VSPLT
>   instruction.  */
> @@ -4230,8 +4231,9 @@ (define_insn_and_split "*vsx_extract_si_float_df"
>rtx v4si_tmp = operands[3];
>int value;
>
> +  /* Adjust index for LE element ordering.  */
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in the correct position, we can avoid doing the VSPLT
>   instruction.  */
> @@ -4273,8 +4275,9 @@ (define_insn_and_split 
> "*vsx_extract_si_float_"
>rtx df_tmp = operands[4];
>int value;
>
> +  /* Adjust index for LE element ordering.  */
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in the correct position, we can avoid doing the VSPLT
>   instruction.  */
> @@ -4466,8 +4469,9 @@ (define_insn "*vsx_insert_extract_v4sf_p9"
>  {
>int ele = INTVAL (operands[4]);
>
> +  /* Adjust index for LE element ordering.  */
>if (!BYTES_BIG_ENDIAN)
> -ele = GET_MODE_NUNITS (V4SFmode) - 1 - ele;
> +ele = 3 - ele;
>
>operands[4] = GEN_INT (GET_MODE_SIZE (SFmode) * ele);
>return "xxinsertw %x0,%x2,%4";
> --
> 2.27.0
>


Re: [PATCH] rs6000: Fix constraint v with rs6000_constraints[RS6000_CONSTRAINT_v]

2022-01-12 Thread David Edelsohn via Gcc-patches
On Wed, Jan 12, 2022 at 10:38 PM Kewen.Lin  wrote:
>
> Hi David,
>
> on 2022/1/13 上午11:07, David Edelsohn wrote:
> > On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin  wrote:
> >>
> >> Hi,
> >>
> >> This patch is to fix register constraint v with
> >> rs6000_constraints[RS6000_CONSTRAINT_v] instead of ALTIVEC_REGS,
> >> just like some other existing register constraints with
> >> RS6000_CONSTRAINT_*.
> >>
> >> I happened to see this and hope it's not intentional and just
> >> got neglected.
> >>
> >> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
> >> powerpc64-linux-gnu P8.
> >>
> >> Is it ok for trunk?
> >
> > Why do you want to make this change?
> >
> > rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS;
> >
> > but all of the patterns that use a "v" constraint are (or should be)
> > protected by TARGET_ALTIVEC, or some final condition that only is
> > active for TARGET_ALTIVEC.  The other constraints are conditionally
> > set because they can be used in a pattern with multiple alternatives
> > where the pattern itself is active but some of the constraints
> > correspond to NO_REGS when some instruction variants for VSX is not
> > enabled.
> >
>
> Good point!  Thanks for the explanation.
>
> > The change isn't wrong, but it doesn't correct a bug and provides no
> > additional benefit nor clarty that I can see.
> >
>
> The original intention is to make it consistent with the other existing
> register constraints with RS6000_CONSTRAINT_*, otherwise it looks a bit
> weird (like was neglected).  After you clarified above, RS6000_CONSTRAINT_v
> seems useless at all in the current framework.  Do you prefer to remove
> it to avoid any confusions instead?

It's used in the reg_class, so there may be some heuristic in the GCC
register allocator that cares about the number of registers available
for the target.  rs6000_constraints[RS6000_CONSTRAINT_v] is defined
conditionally, so it seems best to leave it as is.

Thanks, David


Re: [PATCH] rs6000: Use known constant for GET_MODE_NUNITS and similar

2022-01-12 Thread David Edelsohn via Gcc-patches
On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin  wrote:
>
> Hi,
>
> This patch is to clean up some codes with GET_MODE_UNIT_SIZE or
> GET_MODE_NUNITS, which can use known constant instead.

I'll let Segher decide, but often the additional code is useful
self-documentation instead of magic constants.  Or at least the change
requires comments documenting the derivation of the constants
currently described by the code itself.

Thanks, David

>
> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
> powerpc64-linux-gnu P8.
>
> Is it ok for trunk?
>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * config/rs6000/altivec.md (altivec_vreveti2): Use known constant
> values to simplify code.
> * config/rs6000/vsx.md (*vsx_extract_si, 
> *vsx_extract_si_float_df,
> *vsx_extract_si_float_, *vsx_insert_extract_v4sf_p9):
> Likewise.
> ---
>  gcc/config/rs6000/altivec.md | 11 +++
>  gcc/config/rs6000/vsx.md |  8 
>  2 files changed, 7 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index c2312cc1e0f..d5c4ecfa9b7 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -3957,17 +3957,12 @@ (define_expand "altivec_vreveti2"
>   UNSPEC_VREVEV))]
>"TARGET_ALTIVEC"
>  {
> -  int i, j, size, num_elements;
> +  int i;
>rtvec v = rtvec_alloc (16);
>rtx mask = gen_reg_rtx (V16QImode);
>
> -  size = GET_MODE_UNIT_SIZE (TImode);
> -  num_elements = GET_MODE_NUNITS (TImode);
> -
> -  for (j = 0; j < num_elements; j++)
> -for (i = 0; i < size; i++)
> -  RTVEC_ELT (v, i + j * size)
> -   = GEN_INT (i + (num_elements - 1 - j) * size);
> +  for (i = 0; i < 16; i++)
> +RTVEC_ELT (v, i) = GEN_INT (i);
>
>emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
>emit_insn (gen_altivec_vperm_ti (operands[0], operands[1],
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 802db0d112b..892b99c6d6b 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -3855,7 +3855,7 @@ (define_insn_and_split  "*vsx_extract_si"
>int value;
>
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in the correct position, we can avoid doing the VSPLT
>   instruction.  */
> @@ -4231,7 +4231,7 @@ (define_insn_and_split "*vsx_extract_si_float_df"
>int value;
>
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in the correct position, we can avoid doing the VSPLT
>   instruction.  */
> @@ -4274,7 +4274,7 @@ (define_insn_and_split 
> "*vsx_extract_si_float_"
>int value;
>
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in the correct position, we can avoid doing the VSPLT
>   instruction.  */
> @@ -4467,7 +4467,7 @@ (define_insn "*vsx_insert_extract_v4sf_p9"
>int ele = INTVAL (operands[4]);
>
>if (!BYTES_BIG_ENDIAN)
> -ele = GET_MODE_NUNITS (V4SFmode) - 1 - ele;
> +ele = 3 - ele;
>
>operands[4] = GEN_INT (GET_MODE_SIZE (SFmode) * ele);
>return "xxinsertw %x0,%x2,%4";
> --
> 2.27.0
>


Re: [PATCH] rs6000: Fix constraint v with rs6000_constraints[RS6000_CONSTRAINT_v]

2022-01-12 Thread David Edelsohn via Gcc-patches
On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin  wrote:
>
> Hi,
>
> This patch is to fix register constraint v with
> rs6000_constraints[RS6000_CONSTRAINT_v] instead of ALTIVEC_REGS,
> just like some other existing register constraints with
> RS6000_CONSTRAINT_*.
>
> I happened to see this and hope it's not intentional and just
> got neglected.
>
> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
> powerpc64-linux-gnu P8.
>
> Is it ok for trunk?

Why do you want to make this change?

rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS;

but all of the patterns that use a "v" constraint are (or should be)
protected by TARGET_ALTIVEC, or some final condition that only is
active for TARGET_ALTIVEC.  The other constraints are conditionally
set because they can be used in a pattern with multiple alternatives
where the pattern itself is active but some of the constraints
correspond to NO_REGS when some instruction variants for VSX is not
enabled.

The change isn't wrong, but it doesn't correct a bug and provides no
additional benefit nor clarty that I can see.

Thanks, David


Re: [PATCH, rs6000] Enable absolute jump table by default

2022-01-12 Thread David Edelsohn via Gcc-patches
On Wed, Jan 12, 2022 at 8:40 PM HAO CHEN GUI  wrote:
>
> Hi David,
>
> On 12/1/2022 下午 10:44, David Edelsohn wrote:
> > On Wed, Jan 12, 2022 at 7:22 AM HAO CHEN GUI  wrote:
> >>
> >> Hi,
> >>This patch enables absolute jump table by default on rs6000. The 
> >> relative jump tables are used when
> >>it's explicit set by "rs6000_relative_jumptables",
> >>or jump tables are placed in text section but global relocation is 
> >> required.
> >>
> >>Bootstrapped and tested on powerpc64-linux BE and LE with no 
> >> regressions. Is this okay for trunk?
> >> Any recommendations? Thanks a lot.
> >>
> >> ChangeLog
> >> 2022-01-12 Haochen Gui 
> >>
> >> gcc/
> >> * config/rs6000/linux64.h (JUMP_TABLES_IN_TEXT_SECTION): Define.
> >> * config/rs6000/rs6000.c (rs6000_gen_pic_addr_diff_vec): Return
> >> true when relative jump table is explicit required or jump tables 
> >> have
> >> to be put in text section but global relocation is also required.
> >> * config/rs6000/rs6000.opt (rs6000_relative_jumptables): Disable.
> >>
> >> patch.diff
> >> diff --git a/gcc/config/rs6000/linux64.h b/gcc/config/rs6000/linux64.h
> >> index d617f346f81..2e257c60f8c 100644
> >> --- a/gcc/config/rs6000/linux64.h
> >> +++ b/gcc/config/rs6000/linux64.h
> >> @@ -239,7 +239,7 @@ extern int dot_symbols;
> >>
> >>  /* Indicate that jump tables go in the text section.  */
> >>  #undef  JUMP_TABLES_IN_TEXT_SECTION
> >> -#define JUMP_TABLES_IN_TEXT_SECTION TARGET_64BIT
> >> +#define JUMP_TABLES_IN_TEXT_SECTION 0
> >>
> >>  /* The linux ppc64 ABI isn't explicit on whether aggregates smaller
> >> than a doubleword should be padded upward or downward.  You could
> >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> >> index 319182e94d9..9fba893a27a 100644
> >> --- a/gcc/config/rs6000/rs6000.c
> >> +++ b/gcc/config/rs6000/rs6000.c
> >> @@ -28465,7 +28465,9 @@ rs6000_emit_xxspltidp_v2df (rtx dst, long value)
> >>  static bool
> >>  rs6000_gen_pic_addr_diff_vec (void)
> >>  {
> >> -  return rs6000_relative_jumptables;
> >> +  return rs6000_relative_jumptables
> >> +|| (JUMP_TABLES_IN_TEXT_SECTION
> >> +&& targetm.asm_out.reloc_rw_mask () == 3);
> >>  }
> >
> > This seems like contorted logic and overriding the
> > rs6000_relative_jumptables option change.  The later part of the patch
> > overrides rs6000_relative_jumptables for all rs6000 configurations,
> > and then changes this one use of rs6000_relative_jumptables to add
> > more logic to revert to the old meaning for some targets.
> >
> > What about all of the other uses of rs6000_relative_jumptables in the
> > target?  What about rs6000.md?
> >
> > I highly doubt that this patch is correct.
> >
> > Why not override rs6000_relative_jumptables for PPC64 Linux instead of
> > changing its value globally and then trying to fix it up?
> >
> > Thanks, David
>   Thanks for your comments.
>
>   In this patch, I tried to enable absolute jump table on all rs6000 targets.
> For PPC64 Linux, it supports RELRO section (e.g. .data.rel.ro.local) as
> "JUMP_TABLES_IN_TEXT_SECTION" is set to 0. So, absolute jump tables could be 
> placed
> in RELRO section whatever global relocation is required or not. The absolute 
> jump
> table can't be placed in text section when global relocation is required as 
> text
> section is not writable. So for other rs6000 targets, absolute jump table 
> can't be
> used if the target doesn't support RELRO and global relocation is required 
> also.
>
>   Looking forward to your advice. Thanks again.

Why not override rs6000_relative_jumptables in
rs6000_option_override_internal() for PPC64 Linux subtarget instead of
changing the default value and then trying to fix the behavior for
other configurations in rs6000_gen_pic_addr_diff_vec()?  Or override
it in SUBSUBTARGET_OVERRIDE_OPTIONS in linux64.h, or whichever header
file(s) is appropriate for the subtarget variants.

Your initial patch also changed rs6000_gen_pic_addr_diff_vec but
didn't address the use of rs6000_relative_jumptables in the definition
of CASE_VECTOR_MODE in rs6000.h.  That cannot have been correct.  At
least without the change to the default value of
rs6000_relative_jumptables you don't need to add kludges to all of the
places where that variable is used for other subtarget variants of the
rs6000 target.

Thanks, David


Re: [PATCH, rs6000] Enable absolute jump table by default

2022-01-12 Thread David Edelsohn via Gcc-patches
On Wed, Jan 12, 2022 at 7:22 AM HAO CHEN GUI  wrote:
>
> Hi,
>This patch enables absolute jump table by default on rs6000. The relative 
> jump tables are used when
>it's explicit set by "rs6000_relative_jumptables",
>or jump tables are placed in text section but global relocation is 
> required.
>
>Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. 
> Is this okay for trunk?
> Any recommendations? Thanks a lot.
>
> ChangeLog
> 2022-01-12 Haochen Gui 
>
> gcc/
> * config/rs6000/linux64.h (JUMP_TABLES_IN_TEXT_SECTION): Define.
> * config/rs6000/rs6000.c (rs6000_gen_pic_addr_diff_vec): Return
> true when relative jump table is explicit required or jump tables have
> to be put in text section but global relocation is also required.
> * config/rs6000/rs6000.opt (rs6000_relative_jumptables): Disable.
>
> patch.diff
> diff --git a/gcc/config/rs6000/linux64.h b/gcc/config/rs6000/linux64.h
> index d617f346f81..2e257c60f8c 100644
> --- a/gcc/config/rs6000/linux64.h
> +++ b/gcc/config/rs6000/linux64.h
> @@ -239,7 +239,7 @@ extern int dot_symbols;
>
>  /* Indicate that jump tables go in the text section.  */
>  #undef  JUMP_TABLES_IN_TEXT_SECTION
> -#define JUMP_TABLES_IN_TEXT_SECTION TARGET_64BIT
> +#define JUMP_TABLES_IN_TEXT_SECTION 0
>
>  /* The linux ppc64 ABI isn't explicit on whether aggregates smaller
> than a doubleword should be padded upward or downward.  You could
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 319182e94d9..9fba893a27a 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -28465,7 +28465,9 @@ rs6000_emit_xxspltidp_v2df (rtx dst, long value)
>  static bool
>  rs6000_gen_pic_addr_diff_vec (void)
>  {
> -  return rs6000_relative_jumptables;
> +  return rs6000_relative_jumptables
> +|| (JUMP_TABLES_IN_TEXT_SECTION
> +&& targetm.asm_out.reloc_rw_mask () == 3);
>  }

This seems like contorted logic and overriding the
rs6000_relative_jumptables option change.  The later part of the patch
overrides rs6000_relative_jumptables for all rs6000 configurations,
and then changes this one use of rs6000_relative_jumptables to add
more logic to revert to the old meaning for some targets.

What about all of the other uses of rs6000_relative_jumptables in the
target?  What about rs6000.md?

I highly doubt that this patch is correct.

Why not override rs6000_relative_jumptables for PPC64 Linux instead of
changing its value globally and then trying to fix it up?

Thanks, David

>
>  void
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index c2a77182a9e..75e3fa86829 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -630,7 +630,7 @@ Target Mask(MMA) Var(rs6000_isa_flags)
>  Generate (do not generate) MMA instructions.
>
>  mrelative-jumptables
> -Target Undocumented Var(rs6000_relative_jumptables) Init(1) Save
> +Target Undocumented Var(rs6000_relative_jumptables) Init(0) Save
>
>  mrop-protect
>  Target Var(rs6000_rop_protect) Init(0)


libgfortran bootstrap failure

2022-01-11 Thread David Edelsohn via Gcc-patches
The recent patch to support Power IEEE128 causes a bootstrap failure
on AIX and possibly all non-GLIBC systems.

+#if defined(__powerpc64__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ \
+&& defined __GLIBC_PREREQ && __GLIBC_PREREQ (2, 32)
+#define POWER_IEEE128 1
+#endif

__GLIBC_PREREQ is tested on all systems.

/nasfarm/edelsohn/src/src/libgfortran/libgfortran.h:107:49: error:
missing binary operator before token "("
  107 | && defined __GLIBC_PREREQ && __GLIBC_PREREQ (2, 32)
  | ^


Re: [PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics

2022-01-11 Thread David Edelsohn via Gcc-patches
Suppress exceptions (when specified), by saving, manipulating, and
restoring the FPSCR.  Similarly, save, set, and restore the floating-point
rounding mode when required.

No attempt is made to optimize writing the FPSCR (by checking if the new
value would be the same), other than using lighter weight instructions
when possible. Note that explicit instruction scheduling "barriers" are
added to prevent floating-point computations from being moved before or
after the explicit FPSCR manipulations.  (That these are required has
been reported as an issue in GCC: PR102783.)

The scalar versions naively use the parallel versions to compute the
single scalar result and then construct the remainder of the result.

Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO
are swapped from the corresponding values on x86 so as to match the
corresponding rounding mode values in the Power ISA.

Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
convert _mm_ceil* and _mm_floor* into macros. This matches the current
analogous implementations in config/i386/smmintrin.h.

Function signatures match the analogous functions in config/i386/smmintrin.h.

Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
modeled after the very similar "floor" and "ceil" tests.

Include basic tests, plus tests at the boundaries for floating-point
representation, positive and negative, test all of the parameterized
rounding modes as well as the C99 rounding modes and interactions
between the two.

Exceptions are not explicitly tested.

2021-10-18  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
_mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
_MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF,
_MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC,
_MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC,
_MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
_mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
Convert from function to macro.

gcc/testsuite
* gcc.target/powerpc/sse4_1-round3.h: New.
* gcc.target/powerpc/sse4_1-roundpd.c: New.
* gcc.target/powerpc/sse4_1-roundps.c: New.
* gcc.target/powerpc/sse4_1-roundsd.c: New.
* gcc.target/powerpc/sse4_1-roundss.c: New.

Okay.

Thanks, David


Re: [PATCH] PR 102935, Fix pr101384-1.c code generation test.

2022-01-11 Thread David Edelsohn via Gcc-patches
On Tue, Jan 11, 2022 at 12:06 PM Bill Schmidt  wrote:
>
> Hi Mike,
>
> This looks fine to me.  Maintainers?

Okay.

Thanks, David

>
> Thanks,
> Bill
>
> On 1/7/22 6:33 PM, Michael Meissner wrote:
> > Fix pr101384-1.c code generation test.
> >
> > Add support for the compiler using XXSPLTIB reg,255 to load all 1's into a
> > register on power9 and above instead of using VSPLTI{B,H,W} reg,-1.
> >
> > gcc/testsuite/
> > 2022-01-07  Michael Meissner  
> >
> >   PR testsuite/102935
> >   * gcc.target/powerpc/pr101384-1.c: Update insn regexp for power9
> >   and power10.
> > ---
> >  gcc/testsuite/gcc.target/powerpc/pr101384-1.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/testsuite/gcc.target/powerpc/pr101384-1.c 
> > b/gcc/testsuite/gcc.target/powerpc/pr101384-1.c
> > index 627d7d76721..41cf84bf8bc 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/pr101384-1.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr101384-1.c
> > @@ -2,7 +2,7 @@
> >  /* { dg-do compile { target le } } */
> >  /* { dg-options "-O2 -maltivec" } */
> >  /* { dg-require-effective-target powerpc_altivec_ok } */
> > -/* { dg-final { scan-assembler-times {\mvspltis[whb] [^\n\r]*,-1\M} 9 } } 
> > */
> > +/* { dg-final { scan-assembler-times {\mvspltis[whb] 
> > [^\n\r]*,-1\M|\mxxspltib[^\n\r]*,255\M} 9 } } */
> >  /* { dg-final { scan-assembler-times {\mvslw\M} 3 } } */
> >  /* { dg-final { scan-assembler-times {\mvslh\M} 3 } } */
> >  /* { dg-final { scan-assembler-times {\mvslb\M} 3 } } */


Re: Ping: [PATCH] rs6000: Add split pattern to replace

2022-01-11 Thread David Edelsohn via Gcc-patches
On Tue, Jan 11, 2022 at 2:27 AM Xionghu Luo  wrote:
>
> On 2022/1/11 06:55, David Edelsohn wrote:
> >>> +(define_insn_and_split "sldoi_to_mov_"
> > It would be more consistent with the naming convention to use
> > "sldoi_to_mov" without the final "_".
>
> OK, thanks.
>
> >
> >>> +  [(set (match_operand:VM 0 "altivec_register_operand")
> >>> + (unspec:VM [(match_operand:VM 1 "easy_vector_constant")
> > Should this be "easy_vector_constant_vsldoi"?
>
>
> This doesn't work. easy_vector_constant_vsldoi return false due to
> vspltis_shifted "return 0" as:
>
> vspltis_shifted (rtx op): /* If all elements are equal, we don't need to 
> do VSLDOI.  */
>
>
> (gdb) p op
> $7 = (rtx_def *) (const_vector:V4SI [
> (const_int 0 [0]) repeated x4
> ])
> (gdb) p easy_vector_constant_vsldoi(op, V4SImode)
> $8 = false
> p easy_vector_constant(op, V4SImode)
> $9 = true

Okay, thanks for checking.

>
> >
> >>> + (match_dup 1)
> >>> + (match_operand:VM 2 "u5bit_cint_operand")]
> > This should be match_operand:QI, right?
>
> Yes.

This patch is okay with the other changes.

Thanks, David


Re: Ping: [PATCH] rs6000: powerpc suboptimal boolean test of contiguous bits [PR102239]

2022-01-10 Thread David Edelsohn via Gcc-patches
On Mon, Jan 10, 2022 at 12:37 AM Xionghu Luo  wrote:
>
> Ping, thanks.
>
>
> On 2021/12/13 13:16, Xionghu Luo wrote:
> > Add specialized version to combine two instructions from
> >
> >  9: {r123:CC=cmp(r124:DI&0x6,0);clobber scratch;}
> >REG_DEAD r124:DI
> >  10: pc={(r123:CC==0)?L15:pc}
> >   REG_DEAD r123:CC
> >
> > to:
> >
> >  10: {pc={(r123:DI&0x6==0)?L15:pc};clobber scratch;clobber %0:CC;}
> >
> > then split2 will split it to one rotate dot instruction (to save one
> > rotate back instruction) as shifted result doesn't matter when comparing
> > to 0 in CCEQmode.
> >
> > Bootstrapped and regression tested pass on Power 8/9/10, OK for master?
> >
> > gcc/ChangeLog:
> >
> >   PR target/102239
> >   * config/rs6000/rs6000.md (*anddi3_insn_dot): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   PR target/102239
> >   * gcc.target/powerpc/pr102239.c: New test.
> > ---
> >  gcc/config/rs6000/rs6000-protos.h   |  1 +
> >  gcc/config/rs6000/rs6000.c  |  7 
> >  gcc/config/rs6000/rs6000.md | 38 +
> >  gcc/testsuite/gcc.target/powerpc/pr102239.c | 13 +++
> >  4 files changed, 59 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102239.c
> >
> > diff --git a/gcc/config/rs6000/rs6000-protos.h 
> > b/gcc/config/rs6000/rs6000-protos.h
> > index 14f6b313105..3644c524376 100644
> > --- a/gcc/config/rs6000/rs6000-protos.h
> > +++ b/gcc/config/rs6000/rs6000-protos.h
> > @@ -73,6 +73,7 @@ extern int expand_block_move (rtx[], bool);
> >  extern bool expand_block_compare (rtx[]);
> >  extern bool expand_strn_compare (rtx[], int);
> >  extern bool rs6000_is_valid_mask (rtx, int *, int *, machine_mode);
> > +extern bool rs6000_is_valid_rotate_dot_mask (rtx mask, machine_mode mode);
> >  extern bool rs6000_is_valid_and_mask (rtx, machine_mode);
> >  extern bool rs6000_is_valid_shift_mask (rtx, rtx, machine_mode);
> >  extern bool rs6000_is_valid_insert_mask (rtx, rtx, machine_mode);
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index 5e129986516..57a38cf954a 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -11606,6 +11606,13 @@ rs6000_is_valid_mask (rtx mask, int *b, int *e, 
> > machine_mode mode)
> >return true;
> >  }
> >
> > +bool
> > +rs6000_is_valid_rotate_dot_mask (rtx mask, machine_mode mode)
> > +{
> > +  int nb, ne;
> > +  return rs6000_is_valid_mask (mask, &nb, &ne, mode) && nb >= ne && ne > 0;
> > +}
> > +
> >  /* Return whether MASK (a CONST_INT) is a valid mask for any rlwinm, 
> > rldicl,
> > or rldicr instruction, to implement an AND with it in mode MODE.  */
> >
> > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> > index 6bec2bddbde..014dc9612ea 100644
> > --- a/gcc/config/rs6000/rs6000.md
> > +++ b/gcc/config/rs6000/rs6000.md
> > @@ -3762,6 +3762,44 @@ (define_insn_and_split "*and3_2insn_dot2"
> > (set_attr "dot" "yes")
> > (set_attr "length" "8,12")])
> >
> > +(define_insn_and_split "*anddi3_insn_dot"

This pattern needs a name that better represents its purpose.  The
pattern name implies that it's operating on a combination of AND and
Record Condition bit.  Also "insn" is confusing; I think that you are
using the template from the 2insn_dot names, so this should explicitly
be 1insn. Maybe "branch_anddi3_1insn_dot", or just
"branch_anddi3_dot".

> > + [(set (pc)
> > +(if_then_else (eq (and:DI (match_operand:DI 1 "gpc_reg_operand" "%r,r")
> > +   (match_operand:DI 2 "const_int_operand" "n,n"))
> > +   (const_int 0))
> > +   (label_ref (match_operand 3 ""))
> > +   (pc)))
> > +  (clobber (match_scratch:DI 0 "=r,r"))
> > +  (clobber (reg:CC CR0_REGNO))]
> > +  "rs6000_is_valid_rotate_dot_mask (operands[2], DImode)
> > +  && TARGET_POWERPC64"
> > +  "#"
> > +  "&& reload_completed"
> > +  [(pc)]
> > +{
> > +   int nb, ne;
> > +   if (rs6000_is_valid_mask (operands[2], &nb, &ne, DImode)
> > +   && nb >= ne
> > +   && ne > 0)
> > + {
> > + unsigned HOST_WIDE_INT val = INTVAL (operands[2]);
> > + int shift = 63 - nb;
> > + rtx tmp = gen_rtx_ASHIFT (DImode, operands[1], GEN_INT (shift));
> > + tmp = gen_rtx_AND (DImode, tmp, GEN_INT (val << shift));
> > + rtx cr0 = gen_rtx_REG (CCmode, CR0_REGNO);
> > + rs6000_emit_dot_insn (operands[0], tmp, 1, cr0);
> > + rtx loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands[3]);
> > + rtx cond = gen_rtx_EQ (CCEQmode, cr0, const0_rtx);
> > + rtx ite = gen_rtx_IF_THEN_ELSE (VOIDmode, cond, loc_ref, pc_rtx);
> > + emit_jump_insn (gen_rtx_SET (pc_rtx, ite));
> > + DONE;
> > + }
> > +   else
> > + FAIL;
> > +}
> > +  [(set_attr "type" "shift")
> > +   (set_attr "dot" "yes")
> > +   (set_attr "length" "8,12")])
> >
> >  (define_expand "3"
> >[(set (match_operand:SDI 0 "gpc_reg_operand")
> > diff --git a/gcc/test

Re: [PATCH] rs6000: Remove useless code related to -mno-power10

2022-01-10 Thread David Edelsohn via Gcc-patches
On Wed, Dec 29, 2021 at 4:37 AM Kewen.Lin  wrote:
>
> Hi,
>
> Option -mpower10 was made as "WarnRemoved" since commit r11-2318,
> so -mno-power10 doesn't take effect any more.  This patch is to
> remove one line useless code which still respects it.
>
> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
> powerpc64-linux-gnu P8.
>
> Is it ok for trunk?
>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * config/rs6000/rs6000.c (rs6000_disable_incompatible_switches): 
> Remove
> useless related to option -mno-power10.
> ---
>  gcc/config/rs6000/rs6000.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index e82a47f4c0e..66b01e589b0 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -24825,7 +24825,6 @@ rs6000_disable_incompatible_switches (void)
>  const HOST_WIDE_INT dep_flags; /* flags that depend on this option.  
> */
>  const char *const name;/* name of the switch.  */
>} flags[] = {
> -{ OPTION_MASK_POWER10, OTHER_POWER10_MASKS,"power10"   },
>  { OPTION_MASK_P9_VECTOR,   OTHER_P9_VECTOR_MASKS,  "power9-vector" },
>  { OPTION_MASK_P8_VECTOR,   OTHER_P8_VECTOR_MASKS,  "power8-vector" },
>  { OPTION_MASK_VSX, OTHER_VSX_VECTOR_MASKS, "vsx"   },

Okay.

Thanks, David


Re: Ping^1 [PATCH, rs6000] new split pattern for TI to V1TI move [PR103124]

2022-01-10 Thread David Edelsohn via Gcc-patches
On Sun, Jan 9, 2022 at 10:16 PM HAO CHEN GUI  wrote:
>
> Hi,
>
> Gentle ping this:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587051.html
>
> Thanks
>
> On 17/12/2021 上午 9:55, HAO CHEN GUI wrote:
> > Hi,
> >This patch defines a new split pattern for TI to V1TI move. The pattern 
> > concatenates two subreg:DI of
> > a TI to a V2DI. With the pattern, the subreg pass can do register split for 
> > TI when there is a TI to V1TI
> > move. The patch optimizes one unnecessary "mr" out on P9. The new test case 
> > illustrates it.
> >
> >Bootstrapped and tested on powerpc64-linux BE and LE with no 
> > regressions. Is this okay for trunk?
> > Any recommendations? Thanks a lot.
> >
> > ChangeLog
> > 2021-12-13 Haochen Gui 
> >
> > gcc/
> >   * config/rs6000/vsx.md (split pattern for TI to V1TI move): Defined.
> >
> > gcc/testsuite/
> >   * gcc.target/powerpc/pr103124.c: New testcase.
> >
> >
> > patch.diff
> > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> > index bf033e31c1c..52968eb4609 100644
> > --- a/gcc/config/rs6000/vsx.md
> > +++ b/gcc/config/rs6000/vsx.md
> > @@ -6589,3 +6589,19 @@ (define_insn "xxeval"
> > [(set_attr "type" "vecperm")
> >  (set_attr "prefixed" "yes")])
> >
> > +;; Construct V1TI by vsx_concat_v2di
> > +(define_split
> > +  [(set (match_operand:V1TI 0 "vsx_register_operand")
> > + (subreg:V1TI
> > +   (match_operand:TI 1 "int_reg_operand") 0 ))]
> > +  "TARGET_P9_VECTOR && !reload_completed"
> > +  [(const_int 0)]
> > +{
> > +  rtx tmp1 = simplify_gen_subreg (DImode, operands[1], TImode, 0);
> > +  rtx tmp2 = simplify_gen_subreg (DImode, operands[1], TImode, 8);
> > +  rtx tmp3 = gen_reg_rtx (V2DImode);
> > +  emit_insn (gen_vsx_concat_v2di (tmp3, tmp1, tmp2));
> > +  rtx tmp4 = simplify_gen_subreg (V1TImode, tmp3, V2DImode, 0);
> > +  emit_move_insn (operands[0], tmp4);
> > +  DONE;
> > +})
> > diff --git a/gcc/testsuite/gcc.target/powerpc/pr103124.c 
> > b/gcc/testsuite/gcc.target/powerpc/pr103124.c
> > new file mode 100644
> > index 000..e9072d19b8e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr103124.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target powerpc_p9vector_ok } */
> > +/* { dg-require-effective-target int128 } */
> > +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> > +/* { dg-final { scan-assembler-not "\mmr\M" } } */

Segher probably would prefer {\mmr\M} .

> > +
> > +vector __int128 add (long long a)
> > +{
> > +  vector __int128 b;
> > +  b = (vector __int128) {a};
> > +  return b;
> > +}

This is okay.

Thanks, David


Re: Ping: [PATCH] rs6000: Add split pattern to replace

2022-01-10 Thread David Edelsohn via Gcc-patches
On Mon, Jan 10, 2022 at 12:04 AM Xionghu Luo  wrote:
>
> Gentle ping, thanks.
>
>
> On 2021/12/29 09:27, Xionghu Luo wrote:
> > 7: r120:V4SI=const_vector
> > 8: r121:V4SI=unspec[r120:V4SI,r120:V4SI,0xc] 260
> >
> > with r121:v4SI = r120:V4SI when r120 is a vector with same element.
> >
> > Bootstrapped and regtested pass on powerpc64le-linux-gnu {P10, P9}
> > and powerpc64-linux-gnu {P8, P7}.  OK for master?
> >
> > gcc/ChangeLog:
> >
> >   * config/rs6000/altivec.md (sldoi_to_mov_): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/powerpc/sldoi_to_mov.c: New test.
> > ---
> >  gcc/config/rs6000/altivec.md| 11 +++
> >  gcc/testsuite/gcc.target/powerpc/sldoi_to_mov.c | 15 +++
> >  2 files changed, 26 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sldoi_to_mov.c
> >
> > diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> > index b2909857c34..25f86dbe828 100644
> > --- a/gcc/config/rs6000/altivec.md
> > +++ b/gcc/config/rs6000/altivec.md
> > @@ -383,6 +383,17 @@ (define_split
> >  }
> >  })
> >
> > +(define_insn_and_split "sldoi_to_mov_"

It would be more consistent with the naming convention to use
"sldoi_to_mov" without the final "_".

> > +  [(set (match_operand:VM 0 "altivec_register_operand")
> > + (unspec:VM [(match_operand:VM 1 "easy_vector_constant")

Should this be "easy_vector_constant_vsldoi"?

> > + (match_dup 1)
> > + (match_operand:VM 2 "u5bit_cint_operand")]

This should be match_operand:QI, right?

Thanks, David

> > + UNSPEC_VSLDOI))]
> > +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode) && can_create_pseudo_p ()"
> > +  "#"
> > +  "&& 1"
> > +  [(set (match_dup 0) (match_dup 1))])
> > +
> >  (define_insn "get_vrsave_internal"
> >[(set (match_operand:SI 0 "register_operand" "=r")
> >   (unspec:SI [(reg:SI VRSAVE_REGNO)] UNSPEC_GET_VRSAVE))]
> > diff --git a/gcc/testsuite/gcc.target/powerpc/sldoi_to_mov.c 
> > b/gcc/testsuite/gcc.target/powerpc/sldoi_to_mov.c
> > new file mode 100644
> > index 000..2053243c456
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/sldoi_to_mov.c
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +
> > +#include 
> > +vector signed int foo1 (vector signed int a) {
> > +vector signed int b = {0};
> > +return vec_sum2s(a, b);
> > +}
> > +
> > +vector signed int foo2 (vector signed int a) {
> > +vector signed int b = {0};
> > +return vec_sld(b, b, 4);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {\mvsldoi\M} 1 {target le} } } */
>
> --
> Thanks,
> Xionghu


Re: [power-ieee128] OPEN CONV

2022-01-08 Thread David Edelsohn via Gcc-patches
On Sat, Jan 8, 2022 at 1:59 PM Michael Meissner  wrote:
>
> On Sat, Jan 08, 2022 at 03:18:07PM +0100, Jakub Jelinek wrote:
> > On Sat, Jan 08, 2022 at 03:13:10PM +0100, Thomas Koenig wrote:
> > >
> > > On 08.01.22 15:02, Jakub Jelinek via Fortran wrote:
> > > > Note, as for byteswapping, apparently it wasn't ever working right fox
> > > > the IBM extended real(kind=16) and complex(kind=16).
> > >
> > > The lack of bug reports since the conversion feature was introduced in
> > > 2006, more than 15 years ago, tells us something, I guess...
> >
> > powerpc64le was only introduced in GCC 4.8 in 2013, so slightly less
> > than that, but still.
> > Either nobody interchanges/shares fortran unformatted data between
> > powerpc big and little endian, or if they do, they don't use real(kind=16)
> > or complex(kind=16) in there...
>
> I still wish I had had the forethought when we were setting up the LE ABI to
> change the default 128-bit format to IEEE instead of IBM.  But alas, I didn't.
> You would still need converters between the big endian IBM format and little
> endian IEEE format, but it would have avoided a lot of the problems where GCC
> assumes there is only one floating point format for each size.

Mike,

The LE ABI initial target was Power8 and IEEE128 hardware support was
added to Power9.  The ABI was a conscious decision. IEEE 128 was not a
viable requirement for the LE ABI at the time of the transition.

Thanks, David


Re: [PATCH] rs6000: Add optimizations for _mm_sad_epu8

2022-01-07 Thread David Edelsohn via Gcc-patches
On Fri, Jan 7, 2022 at 3:57 PM Paul A. Clarke  wrote:
>
> On Fri, Jan 07, 2022 at 02:40:51PM -0500, David Edelsohn via Gcc-patches 
> wrote:
> > +#ifdef __LITTLE_ENDIAN__
> > +  /* Sum across four integers with two integer results.  */
> > +  asm ("vsum2sws %0,%1,%2" : "=v" (result) : "v" (vsum), "v" (zero));
> > +  /* Note: vec_sum2s could be used here, but on little-endian, vector
> > + shifts are added that are not needed for this use-case.
> > + A vector shift to correctly position the 32-bit integer results
> > + (currently at [0] and [2]) to [1] and [3] would then need to be
> > + swapped back again since the desired results are two 64-bit
> > + integers ([1]|[0] and [3]|[2]).  Thus, no shift is performed.  */
> > +#else
> >/* Sum across four integers with two integer results.  */
> >result = vec_sum2s (vsum, (__vector signed int) zero);
> >
> > If little-endian adds shifts to correct for the position and
> > big-endian does not, why not use the inline asm without the shifts for
> > both?  It seems confusing to add the inline asm only for LE instead of
> > always using it with the appropriate comment.
> >
> > It's a good and valuable optimization for LE.  Fewer variants are less
> > fragile, easier to test and easier to maintain.  If you're going to go
> > to the trouble of using inline asm for LE, use it for both.
>
> BE (only) _does_ need a shift as seen on the next two lines after the
> code snippet above:
>   /* Sum across four integers with two integer results.  */
>   result = vec_sum2s (vsum, (__vector signed int) zero);
>   /* Rotate the sums into the correct position.  */
>   result = vec_sld (result, result, 6);
>
> So, when using {vec_sum2s;vec_sld}:
> - LE gets an implicit shift in vec_sum2s which just needs to be undone
>   by the vec_sld, and those shifts don't "cancel out" and get removed
>   by GCC.
> - BE does not get any implicit shifts, but needs one that comes from
>   vec_sld.
>
> Are you saying use the asm(vsum2sws) and then conditionally call
> vec_sld on BE only?
>
> I viewed this change as a temporary bandage unless and until GCC can
> remove the unnecessary swaps.  It seems like the preferred code is
> vec_sum2s/vec_sld, not the asm, but that currently is suboptimal for LE.

Nevermind.  I thought that these patches had not been reviewed.

Thanks, David


Re: [PATCH] rs6000: Add Power10 optimization for most _mm_movemask*

2022-01-07 Thread David Edelsohn via Gcc-patches
On Fri, Jan 7, 2022 at 3:35 PM Paul A. Clarke  wrote:
>
> On Fri, Jan 07, 2022 at 02:23:14PM -0500, David Edelsohn wrote:
> > > Power10 ISA added `vextract*` instructions which are realized in the
> > > `vec_extractm` instrinsic.
> > >
> > > Use `vec_extractm` for `_mm_movemask_ps`, `_mm_movemask_pd`, and
> > > `_mm_movemask_epi8` compatibility intrinsics, when `_ARCH_PWR10`.
> > >
> > > 2021-10-21  Paul A. Clarke  
> > >
> > > gcc
> > > * config/rs6000/xmmintrin.h (_mm_movemask_ps): Use vec_extractm
> > > when _ARCH_PWR10.
> > > * config/rs6000/emmintrin.h (_mm_movemask_pd): Likewise.
> > > (_mm_movemask_epi8): Likewise.
> > > ---
> > > Tested on Power10 powerpc64le-linux (compiled with and without
> > > `-mcpu=power10`).
> > >
> > > OK for trunk?
> >
> > This is okay modulo
> >
> > > + return vec_extractm ((__v16qu) __A);
> >
> > Should the above be __v16qi like x86?
>
> That would match x86 better, but we don't have a function signature
> for vec_extractm which accepts a signed type.

Okay, nevermind.  I thought that vec_extractm also allowed signed.

Thanks, David


Re: [PATCH] rs6000: Add Power10 optimization for _mm_blendv*

2022-01-07 Thread David Edelsohn via Gcc-patches
On Fri, Jan 7, 2022 at 3:32 PM Paul A. Clarke  wrote:
>
> On Fri, Jan 07, 2022 at 02:15:22PM -0500, David Edelsohn wrote:
> > > Power10 ISA added `xxblendv*` instructions which are realized in the
> > > `vec_blendv` instrinsic.
> > >
> > > Use `vec_blendv` for `_mm_blendv_epi8`, `_mm_blendv_ps`, and
> > > `_mm_blendv_pd` compatibility intrinsics, when `_ARCH_PWR10`.
> > >
> > > Also, copy a test from i386 for testing `_mm_blendv_ps`.
> > > This should have come with commit 
> > > ed04cf6d73e233c74c4e55c27f1cbd89ae4710e8,
> > > but was inadvertently omitted.
> > >
> > > 2021-10-20  Paul A. Clarke  
> > >
> > > gcc
> > > * config/rs6000/smmintrin.h (_mm_blendv_epi8): Use vec_blendv
> > > when _ARCH_PWR10.
> > > (_mm_blendv_ps): Likewise.
> > > (_mm_blendv_pd): Likewise.
> > >
> > > gcc/testsuite
> > > * gcc.target/powerpc/sse4_1-blendvps.c: Copy from gcc.target/i386,
> > > adjust dg directives to suit.
> > > ---
> > > Tested on Power10 powerpc64le-linux (compiled with and without
> > > `-mcpu=power10`).
> > >
> > > OK for trunk?
> >
> > This is okay modulo
> >
> > > + return (__m128i) vec_blendv ((__v16qu) __A, (__v16qu) __B, (__v16qu) 
> > > __mask);
> >
> > Should the above be __v16qi like x86?
>
> That does arguably match the types involved (epi8) better.
>
> Shall I change the original implementation as well (4 lines later)?
>
> >   return (__m128i) vec_sel ((__v16qi) __A, (__v16qi) __B, __lmask);

vec_blendv supports the signed type, so it seems that the function
should use that type, unless unsigned is preferred because PowerPC
defaults to unsigned char.

I wasn't going to recommend changing the existing code because I don't
know how the signed type interacts with the other builtins.

Thanks, David


Re: [PATCH] rs6000: Add optimizations for _mm_sad_epu8

2022-01-07 Thread David Edelsohn via Gcc-patches
+#ifdef __LITTLE_ENDIAN__
+  /* Sum across four integers with two integer results.  */
+  asm ("vsum2sws %0,%1,%2" : "=v" (result) : "v" (vsum), "v" (zero));
+  /* Note: vec_sum2s could be used here, but on little-endian, vector
+ shifts are added that are not needed for this use-case.
+ A vector shift to correctly position the 32-bit integer results
+ (currently at [0] and [2]) to [1] and [3] would then need to be
+ swapped back again since the desired results are two 64-bit
+ integers ([1]|[0] and [3]|[2]).  Thus, no shift is performed.  */
+#else
   /* Sum across four integers with two integer results.  */
   result = vec_sum2s (vsum, (__vector signed int) zero);

If little-endian adds shifts to correct for the position and
big-endian does not, why not use the inline asm without the shifts for
both?  It seems confusing to add the inline asm only for LE instead of
always using it with the appropriate comment.

It's a good and valuable optimization for LE.  Fewer variants are less
fragile, easier to test and easier to maintain.  If you're going to go
to the trouble of using inline asm for LE, use it for both.

Thanks, David


Re: [PATCH] rs6000: Add Power10 optimization for most _mm_movemask*

2022-01-07 Thread David Edelsohn via Gcc-patches
> Power10 ISA added `vextract*` instructions which are realized in the
> `vec_extractm` instrinsic.
>
> Use `vec_extractm` for `_mm_movemask_ps`, `_mm_movemask_pd`, and
> `_mm_movemask_epi8` compatibility intrinsics, when `_ARCH_PWR10`.
>
> 2021-10-21  Paul A. Clarke  
>
> gcc
> * config/rs6000/xmmintrin.h (_mm_movemask_ps): Use vec_extractm
> when _ARCH_PWR10.
> * config/rs6000/emmintrin.h (_mm_movemask_pd): Likewise.
> (_mm_movemask_epi8): Likewise.
> ---
> Tested on Power10 powerpc64le-linux (compiled with and without
> `-mcpu=power10`).
>
> OK for trunk?

This is okay modulo

> + return vec_extractm ((__v16qu) __A);

Should the above be __v16qi like x86?

Thanks, David


Re: [PATCH] rs6000: Add Power10 optimization for _mm_blendv*

2022-01-07 Thread David Edelsohn via Gcc-patches
> Power10 ISA added `xxblendv*` instructions which are realized in the
> `vec_blendv` instrinsic.
>
> Use `vec_blendv` for `_mm_blendv_epi8`, `_mm_blendv_ps`, and
> `_mm_blendv_pd` compatibility intrinsics, when `_ARCH_PWR10`.
>
> Also, copy a test from i386 for testing `_mm_blendv_ps`.
> This should have come with commit ed04cf6d73e233c74c4e55c27f1cbd89ae4710e8,
> but was inadvertently omitted.
>
> 2021-10-20  Paul A. Clarke  
>
> gcc
> * config/rs6000/smmintrin.h (_mm_blendv_epi8): Use vec_blendv
> when _ARCH_PWR10.
> (_mm_blendv_ps): Likewise.
> (_mm_blendv_pd): Likewise.
>
> gcc/testsuite
> * gcc.target/powerpc/sse4_1-blendvps.c: Copy from gcc.target/i386,
> adjust dg directives to suit.
> ---
> Tested on Power10 powerpc64le-linux (compiled with and without
> `-mcpu=power10`).
>
> OK for trunk?

This is okay modulo

> + return (__m128i) vec_blendv ((__v16qu) __A, (__v16qu) __B, (__v16qu) 
> __mask);

Should the above be __v16qi like x86?

Thanks, David


Re: [PATCH] aix: handle 64bit inodes for include directories

2021-12-30 Thread David Edelsohn via Gcc-patches
Hi, Jeff

Is the revised patch from Clement okay?

Thanks, David

On Tue, Aug 24, 2021 at 3:59 AM CHIGOT, CLEMENT  wrote:
>
> >>> So my worry here is this is really a host property -- ie, this is
> >>> behavior of where GCC runs, not the target for which GCC is generating 
> >>> code.
> >>>
> >>> That implies that the change in aix.h is wrong.  aix.h is for the
> >>> target, not the host -- you don't want to define something like
> >>> HOST_STAT_FOR_64BIT_INODES there.
> >>>
> >>> You'd want to be triggering this behavior via a host fragment, x-aix, or
> >>> better yet via an autoconf test.
> >> Indeed, would this version be better ? I'm not sure about the configure 
> >> test.
> >> But as we are retrieving the size of dev_t and ino_t just above, I'm 
> >> assuming
> >> that the one being used in stat directly. At least, that's the case on 
> >> AIX, and
> >> this test is only made for AIX.
> > It's a clear improvement.  It's still checking for the aix target though:
> >
> > +# Select the right stat being able to handle 64bit inodes, if needed.
> > +if test "$enable_largefile" != no; then
> > +  case "$target" in
> > +*-*-aix*)
> > +  if test "$ac_cv_sizeof_ino_t" == "4" -a "$ac_cv_sizeof_dev_t" ==
> > 4; then
> > +
> > +$as_echo "#define HOST_STAT_FOR_64BIT_INODES stat64x" >>confdefs.h
> > +
> > +  fi;;
> > +  esac
> > +fi
> >
> > Again, we're dealing with a host property.  You might be able to just
> > change $target above to $host.  Hmm, that makes me wonder about canadian
> > crosses where host != build.We may need to do this for both the aix
> > host and aix build.
>
> Yes, my bad, I've updated the case. I don't know if there is a usual way
> to check both $build and $host. I've tried to avoid code duplication so
> tell me if it's okay or if you'd rather have a case for $build and one
> for $host.
>
> Thanks,
> Clément


Re: [PATCH] rs6000: Replace UNSPECS with ss_plus/us_plus and ss_minus/us_minus

2021-12-20 Thread David Edelsohn via Gcc-patches
On Mon, Dec 20, 2021 at 6:55 PM Segher Boessenkool
 wrote:
>
> On Mon, Dec 20, 2021 at 11:45:45AM -0500, David Edelsohn wrote:
> > On Mon, Dec 20, 2021 at 3:24 AM Xionghu Luo  wrote:
> > > These four UNSPECS seems could be replaced with native RTL, and why
> > > "(set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))"
> > > in the RTL pattern, per ISA of VSCR bit 127(VECTOR Saturation, SAT):
> > >
> > >   This bit is sticky; that is, once set to 1 it
> > >   remains set to 1 until it is set to 0 by an
> > >   mtvscr instruction.
> > >
> > > The RTL pattern set it to 0 but final ASM doesn't present it?  And why
> > > not use Clobber VSCR_REGNO instead?
> >
> > The design came from the early implementation of Altivec:
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2002-May/077409.html
> >
> > If one later checks for saturation (reads VSCR), one needs a
> > corresponding SET of the value.  It's set in an architecture-specific
> > manner that isn't described to GCC, but it's set, not just clobbered
> > and in an undefined state.
>
> Well.  RTL clobber and set do exactly the same thing, except with
> clobber it is not specified *what* value is set.  All bits are set, all
> bits are defined.  There is no (direct) way in RTL to say
> "undetermined".
>
> An RTL clobber would work just fine afaics?

I don't know about the original intention from Aldy, but if one were
looking at an RTL dump and the code used the saturation bit from VSCR,
it might be confusing to see a CLOBBER instead of a SET.  The SET
documents that VSCR_REGNO is assigned a specific value; GCC doesn't
know about the semantics, but it's not some undefined bit pattern.
CLOBBER implies a trash value or a value that one will not query
later, i.e., one would want to SET the register to a specific value
before using it.

>
> > The RTL does not describe that VSCR is set to the value 0.  The
> > (const_int 0) is not the value set.  You can think of the (const_int
> > 0) as a dummy RTL argument to the VSCR UNSPEC.  UNSPEC requires at
> > least one argument and the pattern doesn't try to express the
> > argument, so it uses a dummy RTL constant.
>
> Yup.  Traditionally (pc) was used for this.  Nowadays (const_int 0) is
> not really more expensive anymore, and many people find it clearer (but
> not in this case it seems :-) ).
>
> > It's part of a PARALLEL
> > and the plus or minus already expresses the data dependency of the
> > pattern on the input operands.
>
> But they do not describe any dependency on vscr, or output to it.  This
> is the same problem we have with fpscr (most FP insns use some of its
> fields, most set some, but there is no way to cleanly express that).

It describes that VSCR_REGNO is set, an output. It doesn't describe
how it is set nor inform the compiler that the value depends on the
input operands in some complicated way unknown to the compiler, but
the compiler cannot do anything useful with the additional
information.

>
> Explicit clobbers like this help one side of the issue.  For vscr, other
> than the sat bit there is only the nj bit, and we just ignore that :-)
>
> > This patch is okay.  Thanks for updating the machine description and
> > for cleaning up the formatting.
>
> x2.  Thanks!
>
>
> Segher


Re: [PATCH] rs6000: Replace UNSPECS with ss_plus/us_plus and ss_minus/us_minus

2021-12-20 Thread David Edelsohn via Gcc-patches
On Mon, Dec 20, 2021 at 3:24 AM Xionghu Luo  wrote:
>
> These four UNSPECS seems could be replaced with native RTL, and why
> "(set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))"
> in the RTL pattern, per ISA of VSCR bit 127(VECTOR Saturation, SAT):
>
>   This bit is sticky; that is, once set to 1 it
>   remains set to 1 until it is set to 0 by an
>   mtvscr instruction.
>
> The RTL pattern set it to 0 but final ASM doesn't present it?  And why
> not use Clobber VSCR_REGNO instead?

The design came from the early implementation of Altivec:

https://gcc.gnu.org/pipermail/gcc-patches/2002-May/077409.html

If one later checks for saturation (reads VSCR), one needs a
corresponding SET of the value.  It's set in an architecture-specific
manner that isn't described to GCC, but it's set, not just clobbered
and in an undefined state.

The RTL does not describe that VSCR is set to the value 0.  The
(const_int 0) is not the value set.  You can think of the (const_int
0) as a dummy RTL argument to the VSCR UNSPEC.  UNSPEC requires at
least one argument and the pattern doesn't try to express the
argument, so it uses a dummy RTL constant.  It's part of a PARALLEL
and the plus or minus already expresses the data dependency of the
pattern on the input operands.

I'm unsure of the meaning of your question "final ASM doesn't present
it".  The operation on VSCR is implicit and not emitted as an
instruction.  It's in a PARALLEL, which means that the single Altivec
instruction has both effects.  Is that what you were asking?

> Tested pass on P10, OK for master?

This patch is okay.  Thanks for updating the machine description and
for cleaning up the formatting.

> Thanks.
>
> gcc/ChangeLog:
>
> * config/rs6000/altivec.md (altivec_vaddus): Replace
> UNSPEC_VADDU with us_plus.
> (altivec_vaddss): Replace UNSPEC_VADDS with ss_plus.
> (altivec_vsubus): Replace UNSPEC_VSUBU with us_minus.
> (altivec_vsubss): Replace UNSPEC_VSUBS with ss_minus.
> (altivec_abss_): Likewise.
> ---
>  gcc/config/rs6000/altivec.md | 29 ++---
>  1 file changed, 10 insertions(+), 19 deletions(-)
>
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index a057218aa28..b2909857c34 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -29,8 +29,6 @@ (define_c_enum "unspec"
> UNSPEC_VMHADDSHS
> UNSPEC_VMHRADDSHS
> UNSPEC_VADDCUW
> -   UNSPEC_VADDU
> -   UNSPEC_VADDS
> UNSPEC_VAVGU
> UNSPEC_VAVGS
> UNSPEC_VMULEUB
> @@ -61,8 +59,6 @@ (define_c_enum "unspec"
> UNSPEC_VSR
> UNSPEC_VSRO
> UNSPEC_VSUBCUW
> -   UNSPEC_VSUBU
> -   UNSPEC_VSUBS
> UNSPEC_VSUM4UBS
> UNSPEC_VSUM4S
> UNSPEC_VSUM2SWS
> @@ -517,9 +513,8 @@ (define_insn "altivec_vaddcuw"
>
>  (define_insn "altivec_vaddus"
>[(set (match_operand:VI 0 "register_operand" "=v")
> -(unspec:VI [(match_operand:VI 1 "register_operand" "v")
> -   (match_operand:VI 2 "register_operand" "v")]
> -  UNSPEC_VADDU))
> +(us_plus:VI (match_operand:VI 1 "register_operand" "v")
> +   (match_operand:VI 2 "register_operand" "v")))
> (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))]
>""
>"vaddus %0,%1,%2"
> @@ -527,9 +522,8 @@ (define_insn "altivec_vaddus"
>
>  (define_insn "altivec_vaddss"
>[(set (match_operand:VI 0 "register_operand" "=v")
> -(unspec:VI [(match_operand:VI 1 "register_operand" "v")
> -(match_operand:VI 2 "register_operand" "v")]
> -  UNSPEC_VADDS))
> +(ss_plus:VI (match_operand:VI 1 "register_operand" "v")
> +   (match_operand:VI 2 "register_operand" "v")))
> (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))]
>"VECTOR_UNIT_ALTIVEC_P (mode)"
>"vaddss %0,%1,%2"
> @@ -563,9 +557,8 @@ (define_insn "altivec_vsubcuw"
>
>  (define_insn "altivec_vsubus"
>[(set (match_operand:VI 0 "register_operand" "=v")
> -(unspec:VI [(match_operand:VI 1 "register_operand" "v")
> -(match_operand:VI 2 "register_operand" "v")]
> -  UNSPEC_VSUBU))
> +(us_minus:VI (match_operand:VI 1 "register_operand" "v")
> +(match_operand:VI 2 "register_operand" "v")))
> (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))]
>"VECTOR_UNIT_ALTIVEC_P (mode)"
>"vsubus %0,%1,%2"
> @@ -573,9 +566,8 @@ (define_insn "altivec_vsubus"
>
>  (define_insn "altivec_vsubss"
>[(set (match_operand:VI 0 "register_operand" "=v")
> -(unspec:VI [(match_operand:VI 1 "register_operand" "v")
> -(match_operand:VI 2 "register_operand" "v")]
> -  UNSPEC_VSUBS))
> +(ss_minus:VI (match_operand:VI 1 "register_operand" "v")
> +(match_operand:VI 2 "register_operand" "v")))
> (set (reg:SI VSCR_REGNO) (unspec:S

Re: [PATCH, rs6000] Implement mffscrni pattern

2021-12-20 Thread David Edelsohn via Gcc-patches
On Mon, Dec 20, 2021 at 12:56 AM HAO CHEN GUI  wrote:
>
> Hi,
>   I modified the patch according to David and Segher's advice.
>
>   This patch defines a pattern for mffscrni. If the RN is a constant, it can 
> call
> gen_rs6000_mffscrni directly. The "rs6000-builtin-new.def" defines prototype 
> for builtin arguments.
> The pattern "rs6000_set_fpscr_rn" is then broken as the mode of its argument 
> is DI while its
> corresponding builtin has a const int argument. The patch also fixed it.
>
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. 
> Is this okay for trunk?
> Any recommendations? Thanks a lot.
>
> ChangeLog
> 2021-12-17 Haochen Gui 
>
> gcc/
> * config/rs6000/predicates.md (u2bit_cint_operand): Defined.
> * config/rs6000/rs6000-call.c
> (rs6000_expand_set_fpscr_rn_builtin): Not copy argument to a reg if
> it's a constant. The pattern for constant can be recognized now.
> * config/rs6000/rs6000.md (UNSPECV_MFFSCRNI): Defined.
> (rs6000_mffscrni): Defined.
> (rs6000_set_fpscr_rn): Change the type of operand[0] form DI to SI.
> Call gen_rs6000_mffscrni when operand[0] is a const int[0,3].
>
> gcc/testsuite/
> * gcc.target/powerpc/mffscrni_p9.c: New testcase for mffscrni.
> * gcc.target/powerpc/test_fpscr_rn_builtin.c: Modify the test cases to
> test mffscrn and mffscrni separately.

This revised patch is okay.

Thanks, David

>
>
> patch.diff
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index f216ffdf410..b10b4ce6065 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -219,6 +219,11 @@ (define_predicate "u1bit_cint_operand"
>(and (match_code "const_int")
> (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 1")))
>
> +;; Return 1 if op is an unsigned 2-bit constant integer.
> +(define_predicate "u2bit_cint_operand"
> +  (and (match_code "const_int")
> +   (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 3")))
> +
>  ;; Return 1 if op is a unsigned 3-bit constant integer.
>  (define_predicate "u3bit_cint_operand"
>(and (match_code "const_int")
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index d9736eaf21c..81261a0f24d 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -9610,13 +9610,15 @@ rs6000_expand_set_fpscr_rn_builtin (enum insn_code 
> icode, tree exp)
>   compile time if the argument is a variable.  The least significant two
>   bits of the argument, regardless of type, are used to set the rounding
>   mode.  All other bits are ignored.  */
> -  if (CONST_INT_P (op0) && !const_0_to_3_operand(op0, VOIDmode))
> +  if (CONST_INT_P (op0))
>  {
> -  error ("Argument must be a value between 0 and 3.");
> -  return const0_rtx;
> +  if (!const_0_to_3_operand (op0, VOIDmode))
> +   {
> + error ("Argument must be a value between 0 and 3.");
> + return const0_rtx;
> +   }
>  }
> -
> -  if (! (*insn_data[icode].operand[0].predicate) (op0, mode0))
> +  else if (! (*insn_data[icode].operand[0].predicate) (op0, mode0))
>  op0 = copy_to_mode_reg (mode0, op0);
>
>pat = GEN_FCN (icode) (op0);
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 6bec2bddbde..b18746af7ea 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -177,6 +177,7 @@ (define_c_enum "unspecv"
> UNSPECV_MFFS; Move from FPSCR
> UNSPECV_MFFSL   ; Move from FPSCR light instruction version
> UNSPECV_MFFSCRN ; Move from FPSCR float rounding mode
> +   UNSPECV_MFFSCRNI; Move from FPSCR float rounding mode with imm
> UNSPECV_MFFSCDRN; Move from FPSCR decimal float rounding mode
> UNSPECV_MTFSF   ; Move to FPSCR Fields 8 to 15
> UNSPECV_MTFSF_HI; Move to FPSCR Fields 0 to 7
> @@ -6315,6 +6316,14 @@ (define_insn "rs6000_mffscrn"
> "mffscrn %0,%1"
>[(set_attr "type" "fp")])
>
> +(define_insn "rs6000_mffscrni"
> +  [(set (match_operand:DF 0 "gpc_reg_operand" "=d")
> +   (unspec_volatile:DF [(match_operand:SI 1 "u2bit_cint_operand" "n")]
> +   UNSPECV_MFFSCRNI))]
> +   "TARGET_P9_MISC"
> +   "mffscrni %0,%1"
> +  [(set_attr "type" "fp")])
> +
>  (define_insn "rs6000_mffscdrn"
>[(set (match_operand:DF 0 "gpc_reg_operand" "=d")
> (unspec_volatile:DF [(const_int 0)] UNSPECV_MFFSCDRN))
> @@ -6324,7 +6333,7 @@ (define_insn "rs6000_mffscdrn"
>[(set_attr "type" "fp")])
>
>  (define_expand "rs6000_set_fpscr_rn"
> - [(match_operand:DI 0 "reg_or_cint_operand")]
> + [(match_operand:SI 0 "reg_or_cint_operand")]
>"TARGET_HARD_FLOAT"
>  {
>rtx tmp_df = gen_reg_rtx (DFmode);
> @@ -6333,9 +6342,15 @@ (define_expand "rs6000_set_fpscr_rn"
>   new rounding mode bits from operands[0][62:63] into

Re: [PATCH, rs6000] Implement mffscrni pattern

2021-12-17 Thread David Edelsohn via Gcc-patches
On Thu, Dec 16, 2021 at 9:43 PM HAO CHEN GUI  wrote:
>
> Hi,
>This patch defines a pattern for mffscrni. If the RN is a constant, it can 
> call
> gen_rs6000_mffscrni directly. The "rs6000-builtin-new.def" defines prototype 
> for builtin arguments.
> The pattern "rs6000_set_fpscr_rn" is then broken as the mode of its argument 
> is DI while its
> corresponding builtin has a const int argument. The patch also fixed it.
>
>Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. 
> Is this okay for trunk?
> Any recommendations? Thanks a lot.

Hi, Haochen

I have a question about the mode of the input operand in the new pattern below.

>
> ChangeLog
> 2021-12-17 Haochen Gui 
>
> gcc/
> * config/rs6000/predicates.md (u2bit_cint_operand): Defined.
> * config/rs6000/rs6000-call.c
> (rs6000_expand_set_fpscr_rn_builtin): Not copy argument to a reg if
> it's a constant. The pattern for constant can be recognized now.
> * config/rs6000/rs6000.md (UNSPECV_MFFSCRNI): Defined.
> (rs6000_mffscrni): Defined.
> (rs6000_set_fpscr_rn): Change the type of operand[0] form DI to SI.
> Call gen_rs6000_mffscrni when operand[0] is a const int[0,3].
>
> gcc/testsuite/
> * gcc.target/powerpc/mffscrni_p9.c: New testcase for mffscrni.
> * gcc.target/powerpc/test_fpscr_rn_builtin.c: Modify the test cases to
> test mffscrn and mffscrni separately.
>
> patch.diff
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index f216ffd..b10b4ce 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -219,6 +219,11 @@ (define_predicate "u1bit_cint_operand"
>(and (match_code "const_int")
> (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 1")))
>
> +;; Return 1 if op is an unsigned 2-bit constant integer.
> +(define_predicate "u2bit_cint_operand"
> +  (and (match_code "const_int")
> +   (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 3")))
> +
>  ;; Return 1 if op is a unsigned 3-bit constant integer.
>  (define_predicate "u3bit_cint_operand"
>(and (match_code "const_int")
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index d9736ea..81261a0 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -9610,13 +9610,15 @@ rs6000_expand_set_fpscr_rn_builtin (enum insn_code 
> icode, tree exp)
>   compile time if the argument is a variable.  The least significant two
>   bits of the argument, regardless of type, are used to set the rounding
>   mode.  All other bits are ignored.  */
> -  if (CONST_INT_P (op0) && !const_0_to_3_operand(op0, VOIDmode))
> +  if (CONST_INT_P (op0))
>  {
> -  error ("Argument must be a value between 0 and 3.");
> -  return const0_rtx;
> +  if (!const_0_to_3_operand (op0, VOIDmode))
> +   {
> + error ("Argument must be a value between 0 and 3.");
> + return const0_rtx;
> +   }
>  }
> -
> -  if (! (*insn_data[icode].operand[0].predicate) (op0, mode0))
> +  else if (! (*insn_data[icode].operand[0].predicate) (op0, mode0))
>  op0 = copy_to_mode_reg (mode0, op0);
>
>pat = GEN_FCN (icode) (op0);
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 6bec2bd..291396c 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -177,6 +177,7 @@ (define_c_enum "unspecv"
> UNSPECV_MFFS; Move from FPSCR
> UNSPECV_MFFSL   ; Move from FPSCR light instruction version
> UNSPECV_MFFSCRN ; Move from FPSCR float rounding mode
> +   UNSPECV_MFFSCRNI; Move from FPSCR float rounding mode with imm
> UNSPECV_MFFSCDRN; Move from FPSCR decimal float rounding mode
> UNSPECV_MTFSF   ; Move to FPSCR Fields 8 to 15
> UNSPECV_MTFSF_HI; Move to FPSCR Fields 0 to 7
> @@ -6315,6 +6316,14 @@ (define_insn "rs6000_mffscrn"
> "mffscrn %0,%1"
>[(set_attr "type" "fp")])
>
> +(define_insn "rs6000_mffscrni"
> +  [(set (match_operand:DF 0 "gpc_reg_operand" "=d")
> +   (unspec_volatile:DF [(match_operand:DF 1 "u2bit_cint_operand" "n")]

Why is this input operand 1 DFmode?  This is a 2 bit integer value.
This pattern is called from rs6000_set_fpscr_rn with an SImode
operand, and it seems that this should be SImode as well.

Thanks, David

> +   UNSPECV_MFFSCRNI))]
> +   "TARGET_P9_MISC"
> +   "mffscrni %0,%1"
> +  [(set_attr "type" "fp")])
> +
>  (define_insn "rs6000_mffscdrn"
>[(set (match_operand:DF 0 "gpc_reg_operand" "=d")
> (unspec_volatile:DF [(const_int 0)] UNSPECV_MFFSCDRN))
> @@ -6324,7 +6333,7 @@ (define_insn "rs6000_mffscdrn"
>[(set_attr "type" "fp")])
>
>  (define_expand "rs6000_set_fpscr_rn"
> - [(match_operand:DI 0 "reg_or_cint_operand")]
> + [(match_operand:SI 0 "reg_or_cint_operand")]
>"TARGET_HARD_FLOAT"
> 

Re: [PATCH v4 0/6] __builtin_dynamic_object_size

2021-12-17 Thread David Edelsohn via Gcc-patches
Siddhesh,

This patch series seems to have caused testsuite regressions for
memcpy-chk, etc. in 32 bit mode (i386, x86-64 -m32 and -mx32, AIX 32
bit).

I have opened PR 103759.

Thanks, David


Re: [PATCH 6/6] rs6000: Rename arrays to remove temporary _x suffix

2021-12-14 Thread David Edelsohn via Gcc-patches
On Mon, Dec 6, 2021 at 3:49 PM Bill Schmidt  wrote:
>
> Hi!
>
> While we had two sets of built-in infrastructure at once, I added _x as a
> suffix to two arrays to disambiguate the old and new versions.  Time to fix
> that also.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
> okay for trunk?
>
> Thanks!
> Bill
>
> 2021-12-06  Bill Schmidt  
>
> gcc/
> * config/rs6000/rs6000-c.c (altivec_build_resolved_builtin): Rename
> rs6000_builtin_decls_x to rs6000_builtin_decls.
> (altivec_resolve_overloaded_builtin): Likewise.  Also rename
> rs6000_builtin_info_x to rs6000_builtin_info.
> * config/rs6000/rs6000-call.c (rs6000_invalid_builtin): Rename
> rs6000_builtin_info_x to rs6000_builtin_info.
> (rs6000_builtin_is_supported): Likewise.
> (rs6000_gimple_fold_mma_builtin): Likewise.  Also rename
> rs6000_builtin_decls_x to rs6000_builtin_decls.
> (rs6000_gimple_fold_builtin): Rename rs6000_builtin_info_x to
> rs6000_builtin_info.
> (cpu_expand_builtin): Likewise.
> (rs6000_expand_builtin): Likewise.
> (rs6000_init_builtins): Likewise.  Also rename rs6000_builtin_decls_x
> to rs6000_builtin_decls.
> (rs6000_builtin_decl): Rename rs6000_builtin_decls_x to
> rs6000_builtin_decls.
> * config/rs6000/rs6000-gen-builtins.c (write_decls): In generated 
> code,
> rename rs6000_builtin_decls_x to rs6000_builtin_decls, and rename
> rs6000_builtin_info_x to rs6000_builtin_info.
> (write_bif_static_init): In generated code, rename
> rs6000_builtin_info_x to rs6000_builtin_info.
> (write_init_bif_table): In generated code, rename
> rs6000_builtin_decls_x to rs6000_builtin_decls, and rename
> rs6000_builtin_info_x to rs6000_builtin_info.
> (write_init_ovld_table): In generated code, rename
> rs6000_builtin_decls_x to rs6000_builtin_decls.
> (write_init_file): Likewise.
> * config/rs6000/rs6000.c (rs6000_builtin_vectorized_function):
> Likewise.
> (rs6000_builtin_md_vectorized_function): Likewise.
> (rs6000_builtin_reciprocal): Likewise.
> (add_condition_to_bb): Likewise.
> (rs6000_atomic_assign_expand_fenv): Likewise.

Okay.

Thanks, David


Re: [PATCH 5/6] rs6000: Rename functions with "new" in their names

2021-12-14 Thread David Edelsohn via Gcc-patches
On Mon, Dec 6, 2021 at 3:49 PM Bill Schmidt  wrote:
>
> Hi!
>
> While we had two sets of built-in functionality at the same time, I put "new"
> in the names of quite a few functions.  Time to undo that.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
> okay for trunk?
>
> Thanks!
> Bill
>
> 2021-12-02  Bill Schmidt  
>
> gcc/
> * config/rs6000/rs6000-c.c (altivec_resolve_new_overloaded_builtin):
> Remove forward declaration.
> (rs6000_new_builtin_type_compatible): Rename to
> rs6000_builtin_type_compatible.
> (rs6000_builtin_type_compatible): Remove.
> (altivec_resolve_overloaded_builtin): Remove.
> (altivec_build_new_resolved_builtin): Rename to
> altivec_build_resolved_builtin.
> (altivec_resolve_new_overloaded_builtin): Rename to
> altivec_resolve_overloaded_builtin.  Remove static keyword.  Adjust
> called function names.
> * config/rs6000/rs6000-call.c (rs6000_expand_new_builtin): Remove
> forward declaration.
> (rs6000_gimple_fold_new_builtin): Likewise.
> (rs6000_invalid_new_builtin): Rename to rs6000_invalid_builtin.
> (rs6000_gimple_fold_builtin): Remove.
> (rs6000_new_builtin_valid_without_lhs): Rename to
> rs6000_builtin_valid_without_lhs.
> (rs6000_new_builtin_is_supported): Rename to
> rs6000_builtin_is_supported.
> (rs6000_gimple_fold_new_mma_builtin): Rename to
> rs6000_gimple_fold_mma_builtin.
> (rs6000_gimple_fold_new_builtin): Rename to
> rs6000_gimple_fold_builtin.  Remove static keyword.  Adjust called
> function names.
> (rs6000_expand_builtin): Remove.
> (new_cpu_expand_builtin): Rename to cpu_expand_builtin.
> (new_mma_expand_builtin): Rename to mma_expand_builtin.
> (new_htm_spr_num): Rename to htm_spr_num.
> (new_htm_expand_builtin): Rename to htm_expand_builtin.  Change name
> of called function.
> (rs6000_expand_new_builtin): Rename to rs6000_expand_builtin.  Remove
> static keyword.  Adjust called function names.
> (rs6000_new_builtin_decl): Rename to rs6000_builtin_decl.  Remove
> static keyword.
> (rs6000_builtin_decl): Remove.
> * config/rs6000/rs6000-gen-builtins.c (write_decls): In gnerated code,
> rename rs6000_new_builtin_is_supported to rs6000_builtin_is_supported.
> * config/rs6000/rs6000-internal.h (rs6000_invalid_new_builtin): Rename
> to rs6000_invalid_builtin.
> * config/rs6000/rs6000.c (rs6000_new_builtin_vectorized_function):
> Rename to rs6000_builtin_vectorized_function.
> (rs6000_new_builtin_md_vectorized_function): Rename to
> rs6000_builtin_md_vectorized_function.
> (rs6000_builtin_vectorized_function): Remove.
> (rs6000_builtin_md_vectorized_function): Remove.

Okay.

Thanks, David


Re: [PATCH 4/6] rs6000: Remove rs6000-builtin.def and associated data and functions

2021-12-14 Thread David Edelsohn via Gcc-patches
On Mon, Dec 6, 2021 at 3:49 PM Bill Schmidt  wrote:
>
> Hi!
>
> The old rs6000-builtin.def file is no longer needed.  Remove it and the code
> that depends on it.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
> okay for trunk?
>
> Thanks!
> Bill
>
> 2021-12-02  Bill Schmidt  
>
> gcc/
> * config/rs6000/rs6000-builtin.def: Delete.
> * config/rs6000/rs6000-call.c (builtin_compatibility): Delete.
> (builtin_description): Delete.
> (builtin_hash_struct): Delete.
> (builtin_hasher): Delete.
> (builtin_hash_table): Delete.
> (builtin_hasher::hash): Delete.
> (builtin_hasher::equal): Delete.
> (rs6000_builtin_info_type): Delete.
> (rs6000_builtin_info): Delete.
> (bdesc_compat): Delete.
> (bdesc_3arg): Delete.
> (bdesc_4arg): Delete.
> (bdesc_dst): Delete.
> (bdesc_2arg): Delete.
> (bdesc_altivec_preds): Delete.
> (bdesc_abs): Delete.
> (bdesc_1arg): Delete.
> (bdesc_0arg): Delete.
> (bdesc_htm): Delete.
> (bdesc_mma): Delete.
> (rs6000_overloaded_builtin_p): Delete.
> (rs6000_overloaded_builtin_name): Delete.
> (htm_spr_num): Delete.
> (rs6000_builtin_is_supported_p): Delete.
> (rs6000_gimple_fold_mma_builtin): Delete.
> (gt-rs6000-call.h): Remove include directive.
> * config/rs6000/rs6000-protos.h (rs6000_overloaded_builtin_p): Delete.
> (rs6000_builtin_is_supported_p): Delete.
> (rs6000_overloaded_builtin_name): Delete.
> * config/rs6000/rs6000.c (rs6000_builtin_decls): Delete.
> (rs6000_debug_reg_global): Remove reference to RS6000_BUILTIN_COUNT.
> * config/rs6000/rs6000.h (rs6000_builtins): Delete.
> (altivec_builtin_types): Delete.
> (rs6000_builtin_decls): Delete.
> * config/rs6000/t-rs6000 (TM_H): Don't add rs6000-builtin.def.

Okay.

Thanks, David


Re: [PATCH 3/6] rs6000: Rename rs6000-builtin-new.def to rs6000-builtins.def

2021-12-14 Thread David Edelsohn via Gcc-patches
On Mon, Dec 6, 2021 at 3:49 PM Bill Schmidt  wrote:
>
> Hi!
>
> This patch just renames a file and updates the build machinery accordingly.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
> okay for trunk?
>
> Thanks!
> Bill
>
> 2021-12-02  Bill Schmidt  
>
> gcc/
> * config/rs6000/rs6000-builtin-new.def: Rename to...
> * config/rs6000/rs6000-builtins.def: ...this.
> * config/rs6000/rs6000-gen-builtins.c: Adjust header commentary.
> * config/rs6000/t-rs6000 (EXTRA_GTYPE_DEPS): Rename
> rs6000-builtin-new.def to rs6000-builtins.def.
> (rs6000-builtins.c): Likewise.

Okay.

Thanks, David


Re: [PATCH 2/6] rs6000: Remove altivec_overloaded_builtins array and initialization

2021-12-14 Thread David Edelsohn via Gcc-patches
On Mon, Dec 6, 2021 at 3:49 PM Bill Schmidt  wrote:
>
> Hi!
>
> This patch just removes the huge altivec_overloaded_builtins array.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
> okay for trunk?
>
> Thanks!
> Bill
>
> 2021-12-02  Bill Schmidt  
>
> gcc/
> * config/rs6000/rs6000-call.c (altivec_overloaded_builtins): Remove.
> * config/rs6000/rs6000.h (altivec_overloaded_builtins): Remove.

Okay.

Thanks, David


Re: [PATCH 5/5] Add Power10 XXSPLTIDP for SFmode/DFmode constants.

2021-12-14 Thread David Edelsohn via Gcc-patches
On Fri, Nov 5, 2021 at 3:38 PM will schmidt  wrote:
>
> On Fri, 2021-11-05 at 00:11 -0400, Michael Meissner wrote:
> > Generate XXSPLTIDP for scalars on power10.
> >
> > This patch implements XXSPLTIDP support for SF, and DF scalar constants.
> > The previous patch added support for vector constants.  This patch adds
> > the support for SFmode and DFmode scalar constants.
> >
> > I added 2 new tests to test loading up SF and DF scalar constants.
>
>
> ok
>
> >
> > 2021-11-05  Michael Meissner  
> >
> > gcc/
> >
> >   * config/rs6000/rs6000.md (UNSPEC_XXSPLTIDP_CONST): New unspec.
> >   (UNSPEC_XXSPLTIW_CONST): New unspec.
> >   (movsf_hardfloat): Add support for generating XXSPLTIDP.
> >   (mov_hardfloat32): Likewise.
> >   (mov_hardfloat64): Likewise.
> >   (xxspltidp__internal): New insns.
> >   (xxspltiw__internal): New insns.
> >   (splitters for SF/DFmode): Add new splitters for XXSPLTIDP.
> >
> > gcc/testsuite/
> >
> >   * gcc.target/powerpc/vec-splat-constant-df.c: New test.
> >   * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
> > ---
>
> ok
>
>
> >  gcc/config/rs6000/rs6000.md   | 97 +++
> >  .../powerpc/vec-splat-constant-df.c   | 60 
> >  .../powerpc/vec-splat-constant-sf.c   | 60 
> >  3 files changed, 199 insertions(+), 18 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
> >
> > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> > index 3a7bcd2426e..4122acb98cf 100644
> > --- a/gcc/config/rs6000/rs6000.md
> > +++ b/gcc/config/rs6000/rs6000.md
> > @@ -156,6 +156,8 @@ (define_c_enum "unspec"
> > UNSPEC_PEXTD
> > UNSPEC_HASHST
> > UNSPEC_HASHCHK
> > +   UNSPEC_XXSPLTIDP_CONST
> > +   UNSPEC_XXSPLTIW_CONST
> >])
> >
> >  ;;
> > @@ -7764,17 +7766,17 @@ (define_split
> >  ;;
> >  ;;   LWZ  LFSLXSSP   LXSSPX STFS   STXSSP
> >  ;;   STXSSPX  STWXXLXOR  LI FMRXSCPSGNDP
> > -;;   MR   MT  MF   NOP
> > +;;   MR   MT  MF   NOPXXSPLTIDP
> >
> >  (define_insn "movsf_hardfloat"
> >[(set (match_operand:SF 0 "nonimmediate_operand"
> >"=!r,   f, v,  wa,m, wY,
> > Z, m, wa, !r,f, wa,
> > -   !r,*c*l,  !r, *h")
> > +   !r,*c*l,  !r, *h,wa")
> >   (match_operand:SF 1 "input_operand"
> >"m, m, wY, Z, f, v,
> > wa,r, j,  j, f, wa,
> > -   r, r, *h, 0"))]
> > +   r, r, *h, 0, eP"))]
> >"(register_operand (operands[0], SFmode)
> > || register_operand (operands[1], SFmode))
> > && TARGET_HARD_FLOAT
> > @@ -7796,15 +7798,16 @@ (define_insn "movsf_hardfloat"
> > mr %0,%1
> > mt%0 %1
> > mf%1 %0
> > -   nop"
> > +   nop
> > +   #"
> >[(set_attr "type"
> >   "load,   fpload,fpload, fpload,fpstore,   fpstore,
> >fpstore,store, veclogical, integer,   fpsimple,  fpsimple,
> > -  *,  mtjmpr,mfjmpr, *")
> > +  *,  mtjmpr,mfjmpr, *, vecperm")
> > (set_attr "isa"
> >   "*,  *, p9v,p8v,   *, p9v,
> >p8v,*, *,  *, *, *,
> > -  *,  *, *,  *")])
> > +  *,  *, *,  *, p10")])
> >
> >  ;;   LWZ  LFIWZX STWSTFIWX MTVSRWZMFVSRWZ
> >  ;;   FMR  MR MT%0   MF%1   NOP
> > @@ -8064,18 +8067,18 @@ (define_split
> >
> >  ;;   STFD LFD FMR LXSDSTXSD
> >  ;;   LXSD STXSD   XXLOR   XXLXOR  GPR<-0
> > -;;   LWZ  STW MR
> > +;;   LWZ  STW MR  XXSPLTIDP
> >
> >
> >  (define_insn "*mov_hardfloat32"
> >[(set (match_operand:FMOVE64 0 "nonimmediate_operand"
> >  "=m,  d,  d,  ,   wY,
> >,   Z,  ,  ,  !r,
> > -  Y,  r,  !r")
> > +  Y,  r,  !r, wa")
> >   (match_operand:FMOVE64 1 "input_operand"
> >   "d,  m,  d,  wY, ,
> >Z,  ,   ,  ,  ,
> > -  r,  Y,  r"))]
> > +  r,  Y,  r,  eP"))]
> >"! TARGET_POWERPC64 && TARGET_HARD_FLOAT
> > && (gpc_reg_operand (operands[0], mode)
> > || gpc_reg_operand (operands[1], mode))"
> > @@ -8092,20 +8095,21 @@ (define_insn "*mov

Re: [PATCH 4/5] Add Power10 XXSPLTIDP for vector constants

2021-12-14 Thread David Edelsohn via Gcc-patches
On Fri, Nov 5, 2021 at 3:24 PM will schmidt  wrote:
>
> On Fri, 2021-11-05 at 00:10 -0400, Michael Meissner wrote:
> > Generate XXSPLTIDP for vectors on power10.
> >
> > This patch implements XXSPLTIDP support for all vector constants.  The
> > XXSPLTIDP instruction is given a 32-bit immediate that is converted to a 
> > vector
> > of two DFmode constants.  The immediate is in SFmode format, so only 
> > constants
> > that fit as SFmode values can be loaded with XXSPLTIDP.
> >
> > The constraint (eP) added in the previous patch for XXSPLTIW is also used
> > for XXSPLTIDP.
> >
>
> ok
>
>
> > DImode scalar constants are not handled.  This is due to the majority of 
> > DImode
> > constants will be in the GPR registers.  With vector registers, you have the
> > problem that XXSPLTIDP splats the double word into both elements of the
> > vector.  However, if TImode is loaded with an integer constant, it wants a 
> > full
> > 128-bit constant.
>
> This may be worth as adding to a todo somewhere in the code.
>
> >
> > SFmode and DFmode scalar constants are not handled in this patch.  The
> > support for for those constants will be in the next patch.
>
> ok
>
> >
> > I have added a temporary switch (-msplat-float-constant) to control whether 
> > or
> > not the XXSPLTIDP instruction is generated.
> >
> > I added 2 new tests to test loading up V2DI and V2DF vector constants.
>
>
>
>
> >
> > 2021-11-05  Michael Meissner  
> >
> > gcc/
> >
> >   * config/rs6000/predicates.md (easy_fp_constant): Add support for
> >   generating XXSPLTIDP.
> >   (vsx_prefixed_constant): Likewise.
> >   (easy_vector_constant): Likewise.
> >   * config/rs6000/rs6000-protos.h (constant_generates_xxspltidp):
> >   New declaration.
> >   * config/rs6000/rs6000.c (output_vec_const_move): Add support for
> >   generating XXSPLTIDP.
> >   (prefixed_xxsplti_p): Likewise.
> >   (constant_generates_xxspltidp): New function.
> >   * config/rs6000/rs6000.opt (-msplat-float-constant): New debug option.
> >
> > gcc/testsuite/
> >
> >   * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
> >   regex for power10.
> >   * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
> >   * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.
> > ---
>
>
> ok
>
> >  gcc/config/rs6000/predicates.md   |   9 ++
> >  gcc/config/rs6000/rs6000-protos.h |   1 +
> >  gcc/config/rs6000/rs6000.c| 108 ++
> >  gcc/config/rs6000/rs6000.opt  |   4 +
> >  .../powerpc/pr86731-fwrapv-longlong.c |   9 +-
> >  .../powerpc/vec-splat-constant-v2df.c |  64 +++
> >  .../powerpc/vec-splat-constant-v2di.c |  50 
> >  7 files changed, 241 insertions(+), 4 deletions(-)
> >  create mode 100644 
> > gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
> >  create mode 100644 
> > gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
> >
> > diff --git a/gcc/config/rs6000/predicates.md 
> > b/gcc/config/rs6000/predicates.md
> > index ed6252bd0c4..d748b11857c 100644
> > --- a/gcc/config/rs6000/predicates.md
> > +++ b/gcc/config/rs6000/predicates.md
> > @@ -610,6 +610,9 @@ (define_predicate "easy_fp_constant"
> >
> >if (constant_generates_xxspltiw (&vsx_const))
> >   return true;
> > +
> > +  if (constant_generates_xxspltidp (&vsx_const))
> > + return true;
> >  }
> >
> >/* Otherwise consider floating point constants hard, so that the
> > @@ -653,6 +656,9 @@ (define_predicate "vsx_prefixed_constant"
> >if (constant_generates_xxspltiw (&vsx_const))
> >  return true;
> >
> > +  if (constant_generates_xxspltidp (&vsx_const))
> > +return true;
> > +
> >return false;
> >  })
> >
> > @@ -727,6 +733,9 @@ (define_predicate "easy_vector_constant"
> >
> > if (constant_generates_xxspltiw (&vsx_const))
> >   return true;
> > +
> > +   if (constant_generates_xxspltidp (&vsx_const))
> > + return true;
> >   }
>
>
> ok
>
> >
> >if (TARGET_P9_VECTOR
> > diff --git a/gcc/config/rs6000/rs6000-protos.h 
> > b/gcc/config/rs6000/rs6000-protos.h
> > index 99c6a671289..2d28df7442d 100644
> > --- a/gcc/config/rs6000/rs6000-protos.h
> > +++ b/gcc/config/rs6000/rs6000-protos.h
> > @@ -253,6 +253,7 @@ extern bool vec_const_128bit_to_bytes (rtx, 
> > machine_mode,
> >  vec_const_128bit_type *);
> >  extern unsigned constant_generates_lxvkq (vec_const_128bit_type *);
> >  extern unsigned constant_generates_xxspltiw (vec_const_128bit_type *);
> > +extern unsigned constant_generates_xxspltidp (vec_const_128bit_type *);
> >  #endif /* RTX_CODE */
> >
> >  #ifdef TREE_CODE
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index be24f56eb31..8fde48cf2b3 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -7012,6 +7012,13 @@ output_vec_const_move (rtx 

Re: [PATCH 3/5] Add Power10 XXSPLTIW

2021-12-14 Thread David Edelsohn via Gcc-patches
On Fri, Nov 5, 2021 at 2:50 PM will schmidt  wrote:
>
> On Fri, 2021-11-05 at 00:09 -0400, Michael Meissner wrote:
> > Generate XXSPLTIW on power10.
> >
>
> Hi,
>
>
> > This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
> > instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
> > adding support for vector constants that can be used, and adding a
> > VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
> >
> > The eP constraint was added to recognize constants that can be loaded into
> > vector registers with a single prefixed instruction.
>
> Perhaps Swap "... the eP constraint was added ..."  for "Add the eP
> constraint to ..."
>
>
> >
> > I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
> > constants.
>
>
> >
> > 2021-11-05  Michael Meissner  
> >
> > gcc/
> >
> >   * config/rs6000/constraints.md (eP): Update comment.
> >   * config/rs6000/predicates.md (easy_fp_constant): Add support for
> >   generating XXSPLTIW.
> >   (vsx_prefixed_constant): New predicate.
> >   (easy_vector_constant): Add support for
> >   generating XXSPLTIW.
> >   * config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): New
> >   declaration.
> >   (constant_generates_xxspltiw): Likewise.
> >   * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
> >   XXSPLTIW, don't do XXSPLTIB and sign extend.
>
> Perhaps just 'generate XXSPLTIW if possible'.
>
> >   (output_vec_const_move): Add support for XXSPLTIW.
> >   (prefixed_xxsplti_p): New function.
> >   (constant_generates_xxspltiw): New function.
> >   * config/rs6000/rs6000.md (prefixed attribute): Add support to
> >   mark XXSPLTI* instructions as being prefixed.
> >   * config/rs6000/rs6000.opt (-msplat-word-constant): New debug
> >   switch.
> >   * config/rs6000/vsx.md (vsx_mov_64bit): Add support for
> >   generating XXSPLTIW or XXSPLTIDP.
> >   (vsx_mov_32bit): Likewise.
> >   * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
> >   eP constraint.
> >
> > gcc/testsuite/
> >
> >   * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
> >   * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
> >   * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
> >   * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
> >   * gcc.target/powerpc/vec-splati-runnable.c: Update insn count.
> > ---
> >  gcc/config/rs6000/constraints.md  |  6 ++
> >  gcc/config/rs6000/predicates.md   | 46 ++-
> >  gcc/config/rs6000/rs6000-protos.h |  2 +
> >  gcc/config/rs6000/rs6000.c| 81 +++
> >  gcc/config/rs6000/rs6000.md   |  5 ++
> >  gcc/config/rs6000/rs6000.opt  |  4 +
> >  gcc/config/rs6000/vsx.md  | 28 +++
> >  gcc/doc/md.texi   |  4 +
> >  .../powerpc/vec-splat-constant-v16qi.c| 27 +++
> >  .../powerpc/vec-splat-constant-v4sf.c | 67 +++
> >  .../powerpc/vec-splat-constant-v4si.c | 51 
> >  .../powerpc/vec-splat-constant-v8hi.c | 62 ++
> >  .../gcc.target/powerpc/vec-splati-runnable.c  |  4 +-
> >  13 files changed, 369 insertions(+), 18 deletions(-)
> >  create mode 100644 
> > gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
> >  create mode 100644 
> > gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
> >  create mode 100644 
> > gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
> >  create mode 100644 
> > gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
> >
> > diff --git a/gcc/config/rs6000/constraints.md 
> > b/gcc/config/rs6000/constraints.md
> > index e72132b4c28..a4b05837fa6 100644
> > --- a/gcc/config/rs6000/constraints.md
> > +++ b/gcc/config/rs6000/constraints.md
> > @@ -213,6 +213,12 @@ (define_constraint "eI"
> >"A signed 34-bit integer constant if prefixed instructions are 
> > supported."
> >(match_operand 0 "cint34_operand"))
> >
> > +;; A SF/DF scalar constant or a vector constant that can be loaded into 
> > vector
> > +;; registers with one prefixed instruction such as XXSPLTIDP or XXSPLTIW.
> > +(define_constraint "eP"
> > +  "A constant that can be loaded into a VSX register with one prefixed 
> > insn."
> > +  (match_operand 0 "vsx_prefixed_constant"))
> > +
> >  ;; A TF/KF scalar constant or a vector constant that can load certain IEEE
> >  ;; 128-bit constants into vector registers using LXVKQ.
> >  (define_constraint "eQ"
> > diff --git a/gcc/config/rs6000/predicates.md 
> > b/gcc/config/rs6000/predicates.md
> > index e0d1c718e9f..ed6252bd0c4 100644
> > --- a/gcc/config/rs6000/predicates.md
> > +++ b/gcc/config/rs6000/predicates.md
> > @@ -605,7 +605,10 @@ (define_predicate "easy_fp_constant"
> >vec_const_128bit_type vsx_const;
> >if (TARGET_POWER10 && v

Re: [PATCH 1/5] Add XXSPLTI* and LXVKQ instructions (new data structure and function)

2021-12-14 Thread David Edelsohn via Gcc-patches
On Fri, Nov 5, 2021 at 2:13 PM Michael Meissner  wrote:
>
> On Fri, Nov 05, 2021 at 12:01:43PM -0500, will schmidt wrote:
> > On Fri, 2021-11-05 at 00:04 -0400, Michael Meissner wrote:
> > > Add new constant data structure.
> > >
> > > This patch provides the data structure and function to convert a
> > > CONST_INT, CONST_DOUBLE, CONST_VECTOR, or VEC_DUPLICATE of a constant) to
> > > an array of bytes, half-words, words, and  double words that can be loaded
> > > into a 128-bit vector register.
> > >
> > > The next patches will use this data structure to generate code that
> > > generates load of the vector/floating point registers using the XXSPLTIDP,
> > > XXSPLTIW, and LXVKQ instructions that were added in power10.
> > >
> > > 2021-11-05  Michael Meissner  
> > >
>
> Whoops, it should be meiss...@linux.ibm.com.
>
> > comment to be explicit on the structure name being copied to/from.
> > (vec_const_128bit_type is easy to search for, vector or constant or
> > structure are not as unique)
>
> Yes, the original name was more generic (rs6000_const).  Originally it could
> potentially handle vector constants that were greater than 128-bits if we ever
> have support for larger vectors.  But I thought that extra generallity 
> hindered
> the code (since you had to check whether the size was exactly 128-bits, etc.).
> So I made the data structure tailored to the problem at hand.
>
> > > +
> > > +/* Copy an floating point constant to the vector constant structure.  */
> > > +
> >
> > s/an/a/
>
> Ok.
>
> > > +static void
> > > +constant_fp_to_128bit_vector (rtx op,
> > > + machine_mode mode,
> > > + size_t byte_num,
> > > + vec_const_128bit_type *info)
> > > +{
> > > +  unsigned bitsize = GET_MODE_BITSIZE (mode);
> > > +  unsigned num_words = bitsize / 32;
> > > +  const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op);
> > > +  long real_words[VECTOR_128BIT_WORDS];
> > > +
> > > +  /* Make sure we don't overflow the real_words array and that it is
> > > + filled completely.  */
> > > +  gcc_assert (num_words <= VECTOR_128BIT_WORDS && (bitsize % 32) == 0);
> >
> > Not clear to me on the potential to partially fill the real_words
> > array.
>
> At the moment we don't support a 16-bit floating point type in the compiler
> (the Power10 has limited 16-bit floating point support, but we don't make a
> special type for it).  If/when we add the 16-bit floating point, we will
> possibly need to revisit this.
>
> > > +
> > > +  real_to_target (real_words, rtype, mode);
> > > +
> > > +  /* Iterate over each 32-bit word in the floating point constant.  The
> > > + real_to_target function puts out words in endian fashion.  We need
> >
> > Meaning host-endian fashion, or is that meant to be big-endian ?
>
> Real_to_target puts out the 32-bit values in endian fashion.  This data
> structure wants to hold everything in big endian fashion to make checking
> things simpler.
>
> > Perhaps also rephrase or move the comment up to indicate that
> > real_to_target will have placed or has already placed the words in
> >  endian fashion.
> > As stated I was expecting to see a call to real_to_target() below the
> > comment.
>
> Yes, I probably should move the real_to_target call after the comment.
>
> > > +
> > > +  /* Possibly splat the constant to fill a vector size.  */
> >
> >
> > Suggest "Splat the constant to fill a vector size if ..."
>
> Ok.

Okay.

Thanks, David


Re: [PATCH 2/5] Add Power10 XXSPLTI* and LXVKQ instructions (LXVKQ)

2021-12-14 Thread David Edelsohn via Gcc-patches
On Fri, Nov 5, 2021 at 2:01 PM Michael Meissner  wrote:
>
> On Fri, Nov 05, 2021 at 12:52:51PM -0500, will schmidt wrote:
> > > diff --git a/gcc/config/rs6000/predicates.md 
> > > b/gcc/config/rs6000/predicates.md
> > > index 956e42bc514..e0d1c718e9f 100644
> > > --- a/gcc/config/rs6000/predicates.md
> > > +++ b/gcc/config/rs6000/predicates.md
> > > @@ -601,6 +601,14 @@ (define_predicate "easy_fp_constant"
> > >if (TARGET_VSX && op == CONST0_RTX (mode))
> > >  return 1;
> > >
> > > +  /* Constants that can be generated with ISA 3.1 instructions are easy. 
> > >  */
> >
> > Easy is relative, but OK.
>
> The names of the function is easy_fp_constant.
>
> > > +  vec_const_128bit_type vsx_const;
> > > +  if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const))
> > > +{
> > > +  if (constant_generates_lxvkq (&vsx_const) != 0)
> > > +   return true;
> > > +}
> > > +
> > >/* Otherwise consider floating point constants hard, so that the
> > >   constant gets pushed to memory during the early RTL phases.  This
> > >   has the advantage that double precision constants that can be
> > > @@ -609,6 +617,23 @@ (define_predicate "easy_fp_constant"
> > > return 0;
> > >  })
> > >
> > > +;; Return 1 if the operand is a special IEEE 128-bit value that can be 
> > > loaded
> > > +;; via the LXVKQ instruction.
> > > +
> > > +(define_predicate "easy_vector_constant_ieee128"
> > > +  (match_code "const_vector,const_double")
> > > +{
> > > +  vec_const_128bit_type vsx_const;
> > > +
> > > +  /* Can we generate the LXVKQ instruction?  */
> > > +  if (!TARGET_IEEE128_CONSTANT || !TARGET_FLOAT128_HW || !TARGET_POWER10
> > > +  || !TARGET_VSX)
> > > +return false;
> >
> > Presumably all of the checks there are valid.  (Can we have power10
> > without float128_hw or ieee128_constant flags set?)I do notice the
> > addition of an ieee128_constant flag below.
>
> Yes, we can have power10 without float128_hw.  At the moment, 32-bit big 
> endian
> does not enable the 128-bit IEEE instructions.  Also when we are building the
> bits in libgcc that can switch between compiling the software routines and the
> routines used for IEEE hardware, and when we are building the IEEE 128-bit
> software emulation functions we need to explicitly turn off IEEE 128-bit
> hardware support.
>
> Similarly for VSX, if the user explicitly says -mno-vsx, then we can't enable
> this instruction.
>
> > Ok.  I did look at this a bit before it clicked, so would suggest a
> > comment stl "All of the constants that can be loaded by lxvkq will have
> > zero in the bottom 3 words, so ensure those are zero before we use a
> > switch based on the nonzero portion of the constant."
> >
> > It would be fine as-is too.  :-)
>
> Ok.

Okay.

Thanks, David


Re: [PATCH take #2] PR target/43892: Some carry flag (CA) optimizations on PowerPC.

2021-12-14 Thread David Edelsohn via Gcc-patches
Hi, Roger!

Thanks very much for investigating this issue and developing a patch
to leverage this feature of the PowerPC architecture.

2021-12-03  Roger Sayle  

gcc/ChangeLog
PR target/43892
* config/rs6000/rs6000.md (*add3_carry_in_0_2): New
define_insn to recognize commutative form of add3_carry_in_0.
(*add3_geu, *add3_leu, *subf3_carry_in_xx_subf,
*add3_carry_in_addc): New define_insn_and_split patterns.

It might be easier to read if each of the define_insn_and_split
ChangeLog entries were on a separate line and the latter ones said
"Same" or "Likewise", but up to you. Segher can be more pedantic.

gcc/testsuite/ChangeLog
PR target/43892
* gcc.target/powerpc/addcmp.c: New test case.
* gcc.target/powerpc/pr43892.c: New test case.

This patch is okay.

Thanks, David


Re: [PATCH, rs6000] new split pattern for TI to V1TI move [PR103124]

2021-12-13 Thread David Edelsohn via Gcc-patches
On Sun, Dec 12, 2021 at 10:00 PM HAO CHEN GUI  wrote:
>
> Hi,
>This patch defines a new split pattern for TI to V1TI move. The pattern 
> concatenates two subreg:DI of
> a TI to a V2DI, then move the V2DI to V1TI. With the pattern, the subreg pass 
> can do register split for
> TI when there is a TI to V1TI move. The patch optimizes one unnecessary "mr" 
> out on P9. The new
> test case illustrates it.
>
>Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. 
> Is this okay for trunk?
> Any recommendations? Thanks a lot.
>
> ChangeLog
> 2021-12-13 Haochen Gui 
>
> gcc/
> * config/rs6000/vsx.md (split pattern for TI to V1TI move): Defined.
>
> gcc/testsuite/
> * gcc.target/powerpc/pr103124.c: New testcase.
>
>
> patch.diff
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index bf033e31c1c..7bca7780735 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -6589,3 +6589,19 @@ (define_insn "xxeval"
> [(set_attr "type" "vecperm")
>  (set_attr "prefixed" "yes")])
>
> +;; split TI to V1TI move
> +(define_split
> +  [(set (match_operand:V1TI 0 "vsx_register_operand")
> +   (subreg:V1TI
> + (match_operand:TI 1 "int_reg_operand") 0 ))]
> +  "TARGET_P9_VECTOR && !reload_completed"
> +  [(const_int 0)]
> +{
> +  rtx tmp1 = simplify_gen_subreg (DImode, operands[1], TImode, 0);
> +  rtx tmp2 = simplify_gen_subreg (DImode, operands[1], TImode, 8);
> +  rtx tmp3 = gen_reg_rtx (V2DImode);
> +  emit_insn (gen_vsx_concat_v2di (tmp3, tmp1, tmp2));
> +  rtx tmp4 = simplify_gen_subreg (V1TImode, tmp3, V2DImode, 0);
> +  emit_move_insn (operands[0], tmp4);
> +  DONE;
> +})
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103124.c 
> b/gcc/testsuite/gcc.target/powerpc/pr103124.c
> new file mode 100644
> index 000..724492dbcd2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr103124.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target { powerpc*-*-* && lp64 } } */

Please don't include the "powerpc" target selector in the
gcc.target/powerpc directory.  Just use lp64.

> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> +/* { dg-final { scan-assembler-not "\mmr\M" } } */
> +
> +vector __int128 add (long long a)
> +{
> +  vector __int128 b;
> +  b = (vector __int128) {a};
> +  return b;
> +}

Okay with that change.

Thanks, David


Re: [PATCH] rs6000: __builtin_darn[_raw] should be in [power9-64] (PR103624)

2021-12-13 Thread David Edelsohn via Gcc-patches
On Mon, Dec 13, 2021 at 10:48 AM Bill Schmidt  wrote:
>
> Hi!
>
> PR103624 observes that we get segfaults for the 64-bit darn builtins when 
> compiled
> on a 32-bit architecture.  The old built-in infrastructure requires 
> TARGET_64BIT, and
> this was missed in the new support.  Moving these two builtins from the 
> [power9]
> stanza to the [power9-64] stanza solves the problem.
>
> Tested the fix on a powerpc-e300c3-linux-gnu cross.  Bootstrapped and tested 
> on
> powerpc64le-linux-gnu with no regressions.  Is this okay for trunk?

Okay.

Thanks, David

>
> Thanks!
> Bill
>
>
> 2021-12-13  Bill Schmidt  
>
> gcc/
> PR target/103624
> * config/rs6000/rs6000-builtin-new.def (__builtin_darn): Move to
> [power9-64] stanza.
> (__builtin_darn_raw): Likewise.
> ---
>  gcc/config/rs6000/rs6000-builtin-new.def | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000-builtin-new.def 
> b/gcc/config/rs6000/rs6000-builtin-new.def
> index 30556e5c7f2..2becd96a36c 100644
> --- a/gcc/config/rs6000/rs6000-builtin-new.def
> +++ b/gcc/config/rs6000/rs6000-builtin-new.def
> @@ -2799,15 +2799,9 @@
>
>  ; Miscellaneous P9 functions
>  [power9]
> -  signed long long __builtin_darn ();
> -DARN darn {}
> -
>signed int __builtin_darn_32 ();
>  DARN_32 darn_32 {}
>
> -  signed long long __builtin_darn_raw ();
> -DARN_RAW darn_raw {}
> -
>const signed int __builtin_dtstsfi_eq_dd (const int<6>, _Decimal64);
>  TSTSFI_EQ_DD dfptstsfi_eq_dd {}
>
> @@ -2840,6 +2834,12 @@
>void __builtin_altivec_stxvl (vsc, void *, long);
>  STXVL stxvl {}
>
> +  signed long long __builtin_darn ();
> +DARN darn {}
> +
> +  signed long long __builtin_darn_raw ();
> +DARN_RAW darn_raw {}
> +
>const signed int __builtin_scalar_byte_in_set (signed int, signed long 
> long);
>  CMPEQB cmpeqb {}
>
> --
> 2.27.0
>
>


Re: [PATCH] rs6000: Builtins for doubleword compare should be in [power8-vector] (PR103625)

2021-12-13 Thread David Edelsohn via Gcc-patches
On Mon, Dec 13, 2021 at 11:02 AM Bill Schmidt  wrote:
>
> Hi!
>
> PR103625 observes that we ICE when doing vector compares on doublewords.
> The original built-in function support requires Power8 vector support for
> these, but this was missed in the new built-in support.  Moving these
> functions to the [power8-vector] stanza solves the problem.
>
> Tested the fix on a powerpc-e300c3-linux-gnu cross.  Bootstrapped and tested 
> on
> powerpc64le-linux-gnu with no regressions.  Is this okay for trunk?

Okay.

Thanks, David

>
> Thanks!
> Bill
>
>
> 2021-12-13  Bill Schmidt  
>
> gcc/
> PR target/103625
> * config/rs6000/rs6000-builtin-new.def (__builtin_altivec_vcmpequd):
> Move to power8-vector stanza.
> (__builtin_altivec_vcmpequd_p): Likewise.
> (__builtin_altivec_vcmpgtsd): Likewise.
> (__builtin_altivec_vcmpgtsd_p): Likewise.
> (__builtin_altivec_vcmpgtud): Likewise.
> (__builtin_altivec_vcmpgtud_p): Likewise.
> ---
>  gcc/config/rs6000/rs6000-builtin-new.def | 36 
>  1 file changed, 18 insertions(+), 18 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000-builtin-new.def 
> b/gcc/config/rs6000/rs6000-builtin-new.def
> index a020dbbe2fb..bd950f8db36 100644
> --- a/gcc/config/rs6000/rs6000-builtin-new.def
> +++ b/gcc/config/rs6000/rs6000-builtin-new.def
> @@ -1200,24 +1200,6 @@
>const vull __builtin_altivec_vandc_v2di_uns (vull, vull);
>  VANDC_V2DI_UNS andcv2di3 {}
>
> -  const vsll __builtin_altivec_vcmpequd (vull, vull);
> -VCMPEQUD vector_eqv2di {}
> -
> -  const int __builtin_altivec_vcmpequd_p (int, vsll, vsll);
> -VCMPEQUD_P vector_eq_v2di_p {pred}
> -
> -  const vsll __builtin_altivec_vcmpgtsd (vsll, vsll);
> -VCMPGTSD vector_gtv2di {}
> -
> -  const int __builtin_altivec_vcmpgtsd_p (int, vsll, vsll);
> -VCMPGTSD_P vector_gt_v2di_p {pred}
> -
> -  const vsll __builtin_altivec_vcmpgtud (vull, vull);
> -VCMPGTUD vector_gtuv2di {}
> -
> -  const int __builtin_altivec_vcmpgtud_p (int, vsll, vsll);
> -VCMPGTUD_P vector_gtu_v2di_p {pred}
> -
>const vd __builtin_altivec_vnor_v2df (vd, vd);
>  VNOR_V2DF norv2df3 {}
>
> @@ -2221,6 +2203,24 @@
>const vsc __builtin_altivec_vbpermq2 (vsc, vsc);
>  VBPERMQ2 altivec_vbpermq2 {}
>
> +  const vsll __builtin_altivec_vcmpequd (vull, vull);
> +VCMPEQUD vector_eqv2di {}
> +
> +  const int __builtin_altivec_vcmpequd_p (int, vsll, vsll);
> +VCMPEQUD_P vector_eq_v2di_p {pred}
> +
> +  const vsll __builtin_altivec_vcmpgtsd (vsll, vsll);
> +VCMPGTSD vector_gtv2di {}
> +
> +  const int __builtin_altivec_vcmpgtsd_p (int, vsll, vsll);
> +VCMPGTSD_P vector_gt_v2di_p {pred}
> +
> +  const vsll __builtin_altivec_vcmpgtud (vull, vull);
> +VCMPGTUD vector_gtuv2di {}
> +
> +  const int __builtin_altivec_vcmpgtud_p (int, vsll, vsll);
> +VCMPGTUD_P vector_gtu_v2di_p {pred}
> +
>const vsll __builtin_altivec_vmaxsd (vsll, vsll);
>  VMAXSD smaxv2di3 {}
>
> --
> 2.27.0
>
>


Re: [PATCH] Modify combine pattern by anding a pseudo with its nonzero bits

2021-11-30 Thread David Edelsohn via Gcc-patches
On Tue, Nov 30, 2021 at 3:46 AM HAO CHEN GUI  wrote:
>
> Hi,
>
> This patch modifies the combine pattern with a helper - 
> change_pseudo_and_mask when recog fails. The helper converts a single pseudo 
> to the pseudo and with a mask if the outer operator is IOR/XOR/PLUS and the 
> inner operator is ASHIFT/LSHIFTRT/AND. The conversion helps match shift + ior 
> pattern.
>
> Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. 
> Is this okay for trunk? Any recommendations? Thanks a lot.
>
> ChangeLog
>
> 2021-11-30 Haochen Gui 
>
> gcc/
> * combine.c (change_pseudo_and_mask): New.
> (recog_for_combine): If recog fails, try again with the pattern
> modified by change_pseudo_and_mask.
>
> gcc/testsuite/
> * gcc.target/powerpc/20050603-3.c: Modify the dump check conditions.
> * gcc.target/powerpc/rlwimi-2.c: Likewise.
>
> patch.diff
>
> diff --git a/gcc/combine.c b/gcc/combine.c
> index 03e9a780919..c83c0aceb57 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -11539,6 +11539,42 @@ change_zero_ext (rtx pat)
>return changed;
>  }
>
> +/* When the outer code of set_src is IOR/XOR/PLUS and the inner code is
> +   ASHIFT/LSHIFTRT/AND, convert a psuedo to psuedo AND with a mask if its

^^^ spelling mistake in comment: pseudo not psuedo

Thanks, David

> +   nonzero_bits is less than its mode mask.  */
> +static bool
> +change_pseudo_and_mask (rtx pat)
> +{
> +  bool changed = false;
> +
> +  rtx src = SET_SRC (pat);
> +  if ((GET_CODE (src) == IOR
> +   || GET_CODE (src) == XOR
> +   || GET_CODE (src) == PLUS)
> +  && (((GET_CODE (XEXP (src, 0)) == ASHIFT
> +   || GET_CODE (XEXP (src, 0)) == LSHIFTRT
> +   || GET_CODE (XEXP (src, 0)) == AND)
> +  && REG_P (XEXP (src, 1)))
> + || ((GET_CODE (XEXP (src, 1)) == ASHIFT
> +  || GET_CODE (XEXP (src, 1)) == LSHIFTRT
> +  || GET_CODE (XEXP (src, 1)) == AND)
> + && REG_P (XEXP (src, 0)
> +{
> +  rtx *reg = REG_P (XEXP (src, 0))
> +? &XEXP (SET_SRC (pat), 0)
> +: &XEXP (SET_SRC (pat), 1);
> +  machine_mode mode = GET_MODE (*reg);
> +  unsigned HOST_WIDE_INT nonzero = nonzero_bits (*reg, mode);
> +  if (nonzero < GET_MODE_MASK (mode))
> +   {
> + rtx x = gen_rtx_AND (mode, *reg, GEN_INT (nonzero));
> + SUBST (*reg, x);
> + changed = true;
> +   }
> + }
> +  return changed;
> +}
> +
>  /* Like recog, but we receive the address of a pointer to a new pattern.
> We try to match the rtx that the pointer points to.
> If that fails, we may try to modify or replace the pattern,
> @@ -11586,7 +11622,14 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx 
> *pnotes)
> }
> }
>else
> -   changed = change_zero_ext (pat);
> +   {
> + if (change_pseudo_and_mask (pat))
> +   {
> + maybe_swap_commutative_operands (SET_SRC (pat));
> + changed = true;
> +   }
> + changed |= change_zero_ext (pat);
> +   }
>  }
>else if (GET_CODE (pat) == PARALLEL)
>  {
> diff --git a/gcc/testsuite/gcc.target/powerpc/20050603-3.c 
> b/gcc/testsuite/gcc.target/powerpc/20050603-3.c
> index 4017d34f429..e628be11532 100644
> --- a/gcc/testsuite/gcc.target/powerpc/20050603-3.c
> +++ b/gcc/testsuite/gcc.target/powerpc/20050603-3.c
> @@ -12,7 +12,7 @@ void rotins (unsigned int x)
>b.y = (x<<12) | (x>>20);
>  }
>
> -/* { dg-final { scan-assembler-not {\mrlwinm} } } */
> +/* { dg-final { scan-assembler-not {\mrlwinm} { target ilp32 } } } */
>  /* { dg-final { scan-assembler-not {\mrldic} } } */
>  /* { dg-final { scan-assembler-not {\mrot[lr]} } } */
>  /* { dg-final { scan-assembler-not {\ms[lr][wd]} } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c 
> b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
> index bafa371db73..ffb5f9e450f 100644
> --- a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
> +++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
> @@ -2,14 +2,14 @@
>  /* { dg-options "-O2" } */
>
>  /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 14121 { target ilp32 } 
> } } */
> -/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 20217 { target lp64 } } 
> } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 21279 { target lp64 } } 
> } */
>  /* { dg-final { scan-assembler-times {(?n)^\s+blr} 6750 } } */
>  /* { dg-final { scan-assembler-times {(?n)^\s+mr} 643 { target ilp32 } } } */
>  /* { dg-final { scan-assembler-times {(?n)^\s+mr} 11 { target lp64 } } } */
>  /* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 7790 { target lp64 } } 
> } */
>
>  /* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target ilp32 } 
> } } */
> -/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1666 { target lp64 } } 
> } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target lp64 } } 
> } */
>
>  /* { dg-fina

Re: aix: adjust installation directories for GCC64

2021-11-24 Thread David Edelsohn via Gcc-patches
On Wed, Sep 15, 2021 at 4:12 AM CHIGOT, CLEMENT  wrote:
>
> As gcc on 64bit for AIX is built with "MULTILIB_MATCHES= .=maix32",
> "-print-multi-directory" and similar flags aren't returning the
> correct directory when used with -maix32: "." is returned instead
> of "ppc32".
> Libgcc installation script needs to be adjust to bypass this
> problem and correctly install 32bit files in a ppc32 subdirectory.
>
> libgcc/ChangeLog:
> 2021-09-03  Clément Chigot  
>
> * config/rs6000/t-slibgcc-aix (SHLIB_INSTALL): Replace
> "$(slibdir)@shlib_slibdir_qual@" by $(inst_libdir).

Hi, Clement

Sorry for the delay.

I believe that this is a change in behavior.  Maybe you don't see it
because you use --enable-version-specific-runtime-libs?  If one uses
that configure option, your patch produces the same results -- it's
explicitly using that version-specific directory.  If one does not use
that configure option, your patch enforces that behavior.  The patch
should not change that behavior.

Based on your question in another email message, I infer that
MULTIOSSUBDIR and shlib_slibdir_qual are defined incorrectly.

Thanks, David


Re: [PATCH 0/3] Add zero cycle move support

2021-11-22 Thread David Edelsohn via Gcc-patches
On Mon, Nov 22, 2021 at 10:58 AM Bill Schmidt  wrote:
>
> Hi!
>
> On 11/19/21 8:49 AM, Michael Meissner wrote:
> > The next set of 3 patches add zero cycle move support to the Power10.  Zero
> > cycle moves are where the move to LR/CTR/TAR register that is adjacent to 
> > the
> > jump to LR/CTR/TAR register can be fused together.
> >
> > At the moment, these set of three patches add support for zero cycle moves 
> > for
> > indirect jumps and switch tables using the CTR register.  Potential zero 
> > cycle
> > moves for doing returns are not currently handled.
> >
> > In looking at the code, I discovered that just using zero cycle moves isn't 
> > as
> > helpful unless we can eliminate the add instruction before doing the jump.  
> > I
> > also noticed that the various power10 fusion options are only done if
> > -mcpu=power10.  I added a patch to do the fusion for -mtune=power10 as well.
> >
> > I have done bootstraps and make check with these patches installed on both
> > little endian power9 and little endian power10 systems.  Can I install these
> > patches?
> >
> > The following patches will be posted:
> >
> > 1) Patch to add zero cycle move for indirect jumps and switches.
> >
> > 2) Patch to enable p10 fusion for -mtune=power10 in addition to 
> > -mcpu=power10.
> >
> > 3) Patch to use absolute addresses for switch tables instead of relative
> >addresses if zero cycle fusion is enabled.
> >
> For this last point, I had thought that the plan was to always switch over to
> absolute addresses for switch tables, following the work that Hao Chen did in
> this area.  Am I misremembering?  Hao Chen, can you please remind me where we
> ended up here?

And do the absolute addressing for switch tables changes work on AIX?
I thought that Hao Chen only had done the work for PPC64 Linux ELF
syntax with promises of future changes to accommodate AIX as well.

Thanks, David


Re: [PATCH, rs6000] optimization for vec_reve builtin [PR100868]

2021-11-21 Thread David Edelsohn via Gcc-patches
On Wed, Nov 17, 2021 at 3:28 AM HAO CHEN GUI  wrote:
>
> Hi,
>
>   The patch optimized for vec_reve builtin on rs6000. For V2DI and V2DF, it 
> is implemented by xxswapd on all targets. For V16QI, V8HI, V4SI and V4SF, it 
> is implemented by quadword byte reverse plus halfword/word byte reverse when 
> p9_vector is set.
>
>   Bootstrapped and tested on powerpc64le-linux with no regressions. Is this 
> okay for trunk? Any recommendations? Thanks a lot.
>
> ChangeLog
> 2021-11-17 Haochen Gui 
>
> gcc/
> * config/rs6000/altivec.md (altivec_vreve2 for VEC_K): Use
> xxbrq for v16qi, xxbrq + xxbrh for v8hi and xxbrq + xxbrw for v4si
> or v4sf when p9_vector is set.
> (altivec_vreve2 for VEC_64): Defined. Implemented by xxswapd.
>
> gcc/testsuite/
> * gcc.target/powerpc/vec_reve_1.c: New test.
> * gcc.target/powerpc/vec_reve_2.c: Likewise.

This is okay.

Please don't send a message that contains the patch as both an inline
message and as an attachment.

Thanks, David


Re: [PATCH] Fix rs6000 predicates.md use of decl_replaceable_p

2021-11-18 Thread David Edelsohn via Gcc-patches
On Thu, Nov 18, 2021 at 2:07 PM Jan Hubicka  wrote:
>
> > --- a/gcc/config/rs6000/predicates.md
> > +++ b/gcc/config/rs6000/predicates.md
> > @@ -1086,7 +1086,9 @@ (define_predicate "current_file_function_operand"
> > (match_test "(DEFAULT_ABI != ABI_AIX || SYMBOL_REF_FUNCTION_P (op))
> > && (SYMBOL_REF_LOCAL_P (op)
> > || (op == XEXP (DECL_RTL (current_function_decl), 0)
> > -   && !decl_replaceable_p (current_function_decl)))
> > +   && !decl_replaceable_p (current_function_decl,
> > +   opt_for_fn
> > (current_function_decl,
> > +
> > flag_semantic_interposition
>
> Thanks, missed the use of the predicate here.
> However it is not clear to me why one would care about semantic
> interposition at this level.  It seems to me that one more cares whether
> the symbol must be always resolved locally.  In this case
> cgraph_node::get (current_function_decl)->binds_to_current_def_p 
> (cgraph_node::get (current_function_decl))
> would give right answer (and work for cases like functions in comdat groups)

Hi, Honza

I was trying to fix bootstrap as quickly as possible and used the best
example that I could find.  It definitely can be refined.

Thanks for suggesting a better design. I'll test it.

Thanks, David

>
> Honza
> > && !((DEFAULT_ABI == ABI_AIX
> >   || DEFAULT_ABI == ABI_ELFv2)
> >  && (SYMBOL_REF_EXTERNAL_P (op)


[PATCH] Fix rs6000 predicates.md use of decl_replaceable_p

2021-11-18 Thread David Edelsohn via Gcc-patches
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1086,7 +1086,9 @@ (define_predicate "current_file_function_operand"
(match_test "(DEFAULT_ABI != ABI_AIX || SYMBOL_REF_FUNCTION_P (op))
&& (SYMBOL_REF_LOCAL_P (op)
|| (op == XEXP (DECL_RTL (current_function_decl), 0)
-   && !decl_replaceable_p (current_function_decl)))
+   && !decl_replaceable_p (current_function_decl,
+   opt_for_fn
(current_function_decl,
+
flag_semantic_interposition
&& !((DEFAULT_ABI == ABI_AIX
  || DEFAULT_ABI == ABI_ELFv2)
 && (SYMBOL_REF_EXTERNAL_P (op)


[PATCH] Detect Power10 CPU on AIX

2021-11-18 Thread David Edelsohn via Gcc-patches
For -mcpu=native, GCC needs to detect the processor.  This
patch adds the processor value for Power10.

Suggested by Kevin Alder.

* config/rs6000/driver-rs6000.c (detect_processor_aix): Add
power10.

Bootstrapped on powerpc-ibm-aix7.2.3.0

diff --git a/gcc/config/rs6000/driver-rs6000.c
b/gcc/config/rs6000/driver-rs6000.c
index e23c445b198..4de373da95a 100644
--- a/gcc/config/rs6000/driver-rs6000.c
+++ b/gcc/config/rs6000/driver-rs6000.c
@@ -418,6 +418,9 @@ detect_processor_aix (void)
 case 0x2:
   return "power9";

+case 0x4:
+  return "power10";
+
 default:
   return "powerpc";
 }


Re: [PATCH] rs6000: Better error messages for power8/9-vector builtins

2021-11-17 Thread David Edelsohn via Gcc-patches
On Wed, Nov 17, 2021 at 3:02 PM Segher Boessenkool
 wrote:
>
> > It's not a strong objection, since specifying "-mno-vsx" should be
> > uncommon.  (Right?)  And, specifying "-mcpu=power8 -mvsx" is harmless.
>
> Maybe the warning could say "requires -mcpu=power8 (and -mvsx)"?  Is
> that clearer, to your eye?

Maybe "requires -mcpu=power8 with VSX" or "requires -mcpu=power8 with
VSX enabled"?

Thanks, David


Re: [PATCH] rs6000: Match recent builtins changes in new builtins support

2021-11-09 Thread David Edelsohn via Gcc-patches
On Tue, Nov 9, 2021 at 4:40 PM Bill Schmidt  wrote:
>
> Hi!  Over the last month or so, Haochen made a couple of changes to the 
> builtins
> support that need to be reflected into the new builtin support:
>
>   14e355df   Disable gimple folding for vector min/max without fast-math
>   91419baf   Optimize vec_xl_sext
>
> In both cases these require almost identical changes in the new builtins 
> support
> to what was done for the old support.  While testing, I also observed that 
> Haochen
> made a small mistake in the test case gcc.target/powerpc/p10_vec_xl_sext.c, 
> using
> "vector long" when "vector long long" is the officially supported interface.  
> I
> corrected that as part of this patch.
>
> Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
> okay for trunk?
>
> Thanks!
> Bill
>
>
> 2021-11-09  Bill Schmidt  
>
> gcc/
> * config/rs6000/rs6000-call.c (rs6000_gimple_fold_new_builtin):
> Disable gimple fold for RS6000_BIF_{XVMINDP,XVMINSP,VMINFP} and
> RS6000_BIF_{XVMAXDP,XVMAXSP,VMAXFP} when fast-math is not set.
> (lxvrse_expand_builtin): Modify the expansion for sign extension.
> All extensions are done within VSX registers.
>
> gcc/testsuite/
> * gcc.target/powerpc/p10_vec_xl_sext.c: Fix long long case.

Okay.

Thanks, David


[PATCH] Fix vsx_splat_v4si in 32 bit mode

2021-11-06 Thread David Edelsohn via Gcc-patches
powerpc: Fix vsx_splat_v4si in 32 bit mode

Tamar's recent patch to teach CSE to perform vector extract exercises
VSX splat more frequently, which exposed a constraint error for the
vsx_splat patterns.  The pattern could be created for Power9, but
the "we constraint only provided alternatives in 64 bit mode. The
instructions are valid in 32 bit mode and SImode is allowed in VSX
registers.  This patch updates the constraints from "we" to "wa" to
allow the pattern and fix the failing testcases.

Bootstrapped on powerpc-ibm-aix7.2.3.0.

gcc/ChangeLog:

* config/rs6000/vsx.md (vsx_splat_v4si): Change constraints to "wa".
(vsx_splat_v4si_di): Change constraint to "wa"

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0bf04feb6c4..a97f7f2a680 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4565,7 +4565,7 @@ (define_insn "vsx_splat__mem"

 ;; V4SI splat support
 (define_insn "vsx_splat_v4si"
-  [(set (match_operand:V4SI 0 "vsx_register_operand" "=we,we")
+  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa")
(vec_duplicate:V4SI
 (match_operand:SI 1 "splat_input_operand" "r,Z")))]
   "TARGET_P9_VECTOR"
@@ -4578,7 +4578,7 @@ (define_insn "vsx_splat_v4si"
 ;; allows us to use direct move to get the value in a vector register
 ;; so that we can use XXSPLTW
 (define_insn "vsx_splat_v4si_di"
-  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,we")
+  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa")
(vec_duplicate:V4SI
 (truncate:SI
  (match_operand:DI 1 "gpc_reg_operand" "wa,r"]


Re: [PATCH] Add TSVC tests.

2021-11-05 Thread David Edelsohn via Gcc-patches
I just noticed that Iain adjusted the tsvc.h for Darwin in the same
way that I need to adjust it for AIX.  Are we trying to keep the
testcase directory pristine and in sync with its upstream source or
can we fix it locally?

Thanks, David

On Fri, Nov 5, 2021 at 8:24 PM David Edelsohn  wrote:
>
> Hi, Martin
>
> These testcases rely on memalign in tsvc.h.  memalign is provided in
> Linux and Solaris, but is not part of Posix, and it is not available
> in AIX.  Posix defines posix_memalign, which also is available in AIX.
>
> Should the tsvc.h use posix_memalign?  Always?  Only when memalign is
> not available?
>
> Thanks, David


Re: [PATCH] Add TSVC tests.

2021-11-05 Thread David Edelsohn via Gcc-patches
Hi, Martin

These testcases rely on memalign in tsvc.h.  memalign is provided in
Linux and Solaris, but is not part of Posix, and it is not available
in AIX.  Posix defines posix_memalign, which also is available in AIX.

Should the tsvc.h use posix_memalign?  Always?  Only when memalign is
not available?

Thanks, David


Re: [PATCH] rs6000: Fix incorrect fusion constraint [PR102991]

2021-11-04 Thread David Edelsohn via Gcc-patches
On Thu, Nov 4, 2021 at 8:50 PM Xionghu Luo  wrote:

> [PATCH] rs6000: Fix incorrect fusion constraint [PR102991]
>
> gcc/ChangeLog:
>
> * config/rs6000/fusion.md: Regenerate.
> * config/rs6000/genfusion.pl: Fix incorrect clobber constraint.

Okay.

Thanks, David


Re: [PATCH] rs6000: Fix incorrect fusion constraint [PR102991]

2021-11-03 Thread David Edelsohn via Gcc-patches
On Wed, Nov 3, 2021 at 9:46 PM Xionghu Luo  wrote:
>
> On 2021/11/3 23:13, David Edelsohn wrote:
> > Did you manually change fusion.md or did you regenerate it after
> > fixing genfusion.pl?
> >
> > If you regenerated it, the ChangeLog entry should be "Regenerated" and
> > the "Fix incorrect clobber constraint." should refer to the
> > genfusion.pl change.
> >
> > I want to ensure that genfusion.pl generates the correct constraint
> > the next time it is used.
> >
>
> Aaron mentioned he disabled the auto generation here[1], but before
> than that, Segher suggested to enable it in stage1.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-February/564652.html
> [2] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564244.html
>
>
> Thus re-enable it with the followed v2 patch(Confirmed the fusion.md is
> exactly same with v1 patch.)
>
>
> [PATCH v2] rs6000: Fix incorrect fusion constraint [PR102991]
>
>
> gcc/ChangeLog:
>
> * config/rs6000/fusion.md: Regenerate.
> * config/rs6000/genfusion.pl: Fix incorrect clobber constraint.
> * config/rs6000/t-rs6000: Uncomment regeneration of fusion.md.

I believe that there is some confusion about my request. I am not
requesting that the patch enable genfusion.pl . The Makefile fragment
rule to generate fusion.md is disabled for a reason and normally
should not be enabled.  But fusion.md should be generated by
genfusion.pl when there is a change, and any changes should be made in
genfusion.pl. In other words, change genfusion.pl, temporarily enable
the Makefile fragment rule, generate fusion.md, disable genfusion.pl.
My request was an effort to ensure that genfusion.pl correctly
regenerates the new, corrected fusion.md file.  I don't want a manual
change to fusion.md that differs from the automatically generated
file. Only the updated fusion.md and genfusion.pl should be checked
in.

Has Aaron reviewed and confirmed the change to genfusion.pl?

Thanks, David


Re: [PATCH] rs6000: Fix incorrect fusion constraint [PR102991]

2021-11-03 Thread David Edelsohn via Gcc-patches
Did you manually change fusion.md or did you regenerate it after
fixing genfusion.pl?

If you regenerated it, the ChangeLog entry should be "Regenerated" and
the "Fix incorrect clobber constraint." should refer to the
genfusion.pl change.

I want to ensure that genfusion.pl generates the correct constraint
the next time it is used.

Thanks, David

On Wed, Nov 3, 2021 at 9:32 AM Xionghu Luo  wrote:
>
> The clobber constraint should match operand's constraint.  fusion.md was
> generated by genfusion.pl, but it is disabled now, update both places with
> correct clobber constraint.
>
> gcc/ChangeLog:
>
> * config/rs6000/fusion.md: Fix incorrect clobber constraint.
> * config/rs6000/genfusion.pl: Likewise.
> ---
>  gcc/config/rs6000/fusion.md| 128 -
>  gcc/config/rs6000/genfusion.pl |   2 +-
>  2 files changed, 65 insertions(+), 65 deletions(-)
>
> diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
> index 516baa0bb0b..d11cecb11ee 100644
> --- a/gcc/config/rs6000/fusion.md
> +++ b/gcc/config/rs6000/fusion.md
> @@ -1874,7 +1874,7 @@ (define_insn "*fuse_vand_vand"
>  (and:VM (and:VM (match_operand:VM 0 "altivec_register_operand" 
> "v,v,v,v")
>(match_operand:VM 1 "altivec_register_operand" 
> "%v,v,v,v"))
>   (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
> -   (clobber (match_scratch:VM 4 "=X,X,X,&r"))]
> +   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
>"(TARGET_P10_FUSION && TARGET_P10_FUSION_2LOGICAL)"
>"@
> vand %3,%1,%0\;vand %3,%3,%2
> @@ -1892,7 +1892,7 @@ (define_insn "*fuse_vandc_vand"
>  (and:VM (and:VM (not:VM (match_operand:VM 0 
> "altivec_register_operand" "v,v,v,v"))
>(match_operand:VM 1 "altivec_register_operand" 
> "v,v,v,v"))
>   (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
> -   (clobber (match_scratch:VM 4 "=X,X,X,&r"))]
> +   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
>"(TARGET_P10_FUSION && TARGET_P10_FUSION_2LOGICAL)"
>"@
> vandc %3,%1,%0\;vand %3,%3,%2
> @@ -1910,7 +1910,7 @@ (define_insn "*fuse_veqv_vand"
>  (and:VM (not:VM (xor:VM (match_operand:VM 0 
> "altivec_register_operand" "v,v,v,v")
>(match_operand:VM 1 "altivec_register_operand" 
> "v,v,v,v")))
>   (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
> -   (clobber (match_scratch:VM 4 "=X,X,X,&r"))]
> +   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
>"(TARGET_P10_FUSION && TARGET_P10_FUSION_2LOGICAL)"
>"@
> veqv %3,%1,%0\;vand %3,%3,%2
> @@ -1928,7 +1928,7 @@ (define_insn "*fuse_vnand_vand"
>  (and:VM (ior:VM (not:VM (match_operand:VM 0 
> "altivec_register_operand" "v,v,v,v"))
>(not:VM (match_operand:VM 1 
> "altivec_register_operand" "v,v,v,v")))
>   (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
> -   (clobber (match_scratch:VM 4 "=X,X,X,&r"))]
> +   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
>"(TARGET_P10_FUSION && TARGET_P10_FUSION_2LOGICAL)"
>"@
> vnand %3,%1,%0\;vand %3,%3,%2
> @@ -1946,7 +1946,7 @@ (define_insn "*fuse_vnor_vand"
>  (and:VM (and:VM (not:VM (match_operand:VM 0 
> "altivec_register_operand" "v,v,v,v"))
>(not:VM (match_operand:VM 1 
> "altivec_register_operand" "v,v,v,v")))
>   (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
> -   (clobber (match_scratch:VM 4 "=X,X,X,&r"))]
> +   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
>"(TARGET_P10_FUSION && TARGET_P10_FUSION_2LOGICAL)"
>"@
> vnor %3,%1,%0\;vand %3,%3,%2
> @@ -1964,7 +1964,7 @@ (define_insn "*fuse_vor_vand"
>  (and:VM (ior:VM (match_operand:VM 0 "altivec_register_operand" 
> "v,v,v,v")
>(match_operand:VM 1 "altivec_register_operand" 
> "v,v,v,v"))
>   (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
> -   (clobber (match_scratch:VM 4 "=X,X,X,&r"))]
> +   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
>"(TARGET_P10_FUSION && TARGET_P10_FUSION_2LOGICAL)"
>"@
> vor %3,%1,%0\;vand %3,%3,%2
> @@ -1982,7 +1982,7 @@ (define_insn "*fuse_vorc_vand"
>  (and:VM (ior:VM (not:VM (match_operand:VM 0 
> "altivec_register_operand" "v,v,v,v"))
>(match_operand:VM 1 "altivec_register_operand" 
> "v,v,v,v"))
>   (match_operand:VM 2 "altivec_register_operand" "v,v,v,v")))
> -   (clobber (match_scratch:VM 4 "=X,X,X,&r"))]
> +   (clobber (match_scratch:VM 4 "=X,X,X,&v"))]
>"(TARGET_P10_FUSION && TARGET_P10_FUSION_2LOGICAL)"
>"@
> vorc %3,%1,%0\;vand %3,%3,%2
> @@ -2000,7 +2000,7 @@ (define_insn "*fuse_vxor_vand"
>  (and:VM (xor:VM (match_operand:VM 0 "altivec_register_operand" 
> "v,v,v,v")
>(match_operand:VM 1 "altivec_register_operand" 
> "v,v,v,v"))
> 

Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set

2021-11-02 Thread David Edelsohn via Gcc-patches
On Mon, Nov 1, 2021 at 10:40 PM HAO CHEN GUI  wrote:
>
> David,
>
> My patch file was broken. I am sorry for it.  Here is the correct one. 
> Thanks a lot.
>
> ChangeLog
>
> 2021-11-01 Haochen Gui 
>
> gcc/
> * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin): Disable
> gimple fold for VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP,
> VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP when fast-math is not
> set.
>
> gcc/testsuite/
> * gcc.target/powerpc/vec-minmax-1.c: New test.
> * gcc.target/powerpc/vec-minmax-2.c: Likewise.

This is okay.

The default DejaGNU test action is compile, but it's a good idea to
include the dg-do line to be clear and document the intention.

Thanks, David


Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set

2021-11-01 Thread David Edelsohn via Gcc-patches
Hi, Hao

Neither the inlined patch nor the attached patch seem to contain the
change to rs6000-call.c.  I only see the new testcases.

Please resend the complete patch.

Thanks David

On Mon, Nov 1, 2021 at 2:48 AM HAO CHEN GUI  wrote:
>
> Hi,
>
>   This patch disables gimple folding for VSX_BUILTIN_XVMINDP, 
> VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMINFP and  ALTIVEC_BUILTIN_VMAXFP when 
> fast-math is not set.  With the gimple folding is enabled, the four built-ins 
> will be implemented by c-type instructions - xs[min|max]cdp on P9 and P10 if 
> they can be converted to scalar comparisons.  While they are implemented by 
> xv[min|max][s|d]p on P8 and P7 as P8 and P7 don't have corresponding scalar 
> comparison instructions.  The patch binds these four built-ins to 
> xv[min|max][s|d]p when fast-math is not set. The two new test cases 
> illustrate it.
>
>   ALTIVEC_BUILTIN_VMINFP and  ALTIVEC_BUILTIN_VMAXFP are not implemented by 
> vminfp or vmaxfp.
>
> rs6000-builtin.def:BU_ALTIVEC_2 (VMAXFP,  "vmaxfp", 
> CONST, smaxv4sf3)
>
> rs6000-builtin.def:BU_ALTIVEC_2 (VMINFP,  "vminfp", 
> CONST, sminv4sf3)
>
> Bootstrapped and tested on powerpc64le-linux with no regressions. Is this 
> okay for trunk? Any recommendations? Thanks a lot.
>
>
> ChangeLog
>
> 2021-11-01 Haochen Gui 
>
> gcc/
> * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin): Disable
> gimple fold for VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP,
> VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP when fast-math is not
> set.
>
> gcc/testsuite/
> * gcc.target/powerpc/vec-minmax-1.c: New test.
> * gcc.target/powerpc/vec-minmax-2.c: Likewise.
>
>
> patch.diff
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c 
> b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c
> new file mode 100644
> index 000..e238659c9be
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c
> @@ -0,0 +1,52 @@
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> +/* { dg-final { scan-assembler-times {\mxvmaxdp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvmaxsp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvmindp\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxvminsp\M} 1 } } */
> +
> +/* This test verifies that float or double vec_min/max are bound to
> +   xv[min|max][d|s]p instructions when fast-math is not set.  */
> +
> +
> +#include 
> +
> +#ifdef _BIG_ENDIAN
> +   const int PREF_D = 0;
> +#else
> +   const int PREF_D = 1;
> +#endif
> +
> +double vmaxd (double a, double b)
> +{
> +  vector double va = vec_promote (a, PREF_D);
> +  vector double vb = vec_promote (b, PREF_D);
> +  return vec_extract (vec_max (va, vb), PREF_D);
> +}
> +
> +double vmind (double a, double b)
> +{
> +  vector double va = vec_promote (a, PREF_D);
> +  vector double vb = vec_promote (b, PREF_D);
> +  return vec_extract (vec_min (va, vb), PREF_D);
> +}
> +
> +#ifdef _BIG_ENDIAN
> +   const int PREF_F = 0;
> +#else
> +   const int PREF_F = 3;
> +#endif
> +
> +float vmaxf (float a, float b)
> +{
> +  vector float va = vec_promote (a, PREF_F);
> +  vector float vb = vec_promote (b, PREF_F);
> +  return vec_extract (vec_max (va, vb), PREF_F);
> +}
> +
> +float vminf (float a, float b)
> +{
> +  vector float va = vec_promote (a, PREF_F);
> +  vector float vb = vec_promote (b, PREF_F);
> +  return vec_extract (vec_min (va, vb), PREF_F);
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c 
> b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c
> new file mode 100644
> index 000..149275d8709
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c
> @@ -0,0 +1,50 @@
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -ffast-math" } */
> +/* { dg-final { scan-assembler-times {\mxsmaxcdp\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mxsmincdp\M} 2 } } */
> +
> +/* This test verifies that float or double vec_min/max can be converted
> +   to scalar comparison when fast-math is set.  */
> +
> +
> +#include 
> +
> +#ifdef _BIG_ENDIAN
> +   const int PREF_D = 0;
> +#else
> +   const int PREF_D = 1;
> +#endif
> +
> +double vmaxd (double a, double b)
> +{
> +  vector double va = vec_promote (a, PREF_D);
> +  vector double vb = vec_promote (b, PREF_D);
> +  return vec_extract (vec_max (va, vb), PREF_D);
> +}
> +
> +double vmind (double a, double b)
> +{
> +  vector double va = vec_promote (a, PREF_D);
> +  vector double vb = vec_promote (b, PREF_D);
> +  return vec_extract (vec_min (va, vb), PREF_D);
> +}
> +
> +#ifdef _BIG_ENDIAN
> +   const int PREF_F = 0;
> +#else
> +   const int PREF_F = 3;
> +#endif
> +
> +float vmaxf (float a, float b)
> +{
> +  vector float va = vec_promote (a, PREF_F);
> +  vector float vb = vec_promote (b, PREF_F);
> +  return vec_extract (vec_max (va, vb), PREF_F);
> +}
> +

Re: [PATCH v2] rs6000: Optimize __builtin_shuffle when it's used to zero the upper bits [PR102868]

2021-10-28 Thread David Edelsohn via Gcc-patches
On Thu, Oct 28, 2021 at 1:39 AM Xionghu Luo  wrote:
>
> On 2021/10/27 21:24, David Edelsohn wrote:
> > On Sun, Oct 24, 2021 at 10:51 PM Xionghu Luo  wrote:
> >>
> >> If the second operand of __builtin_shuffle is const vector 0, and with
> >> specific mask, it can be optimized to vspltisw+xxpermdi instead of lxv.
> >>
> >> gcc/ChangeLog:
> >>
> >> * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Add
> >> patterns match and emit for VSX xxpermdi.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.target/powerpc/pr102868.c: New test.
> >> ---
> >>  gcc/config/rs6000/rs6000.c  | 47 --
> >>  gcc/testsuite/gcc.target/powerpc/pr102868.c | 53 +
> >>  2 files changed, 97 insertions(+), 3 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102868.c
> >>
> >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> >> index d0730253bcc..5d802c1fa96 100644
> >> --- a/gcc/config/rs6000/rs6000.c
> >> +++ b/gcc/config/rs6000/rs6000.c
> >> @@ -23046,7 +23046,23 @@ altivec_expand_vec_perm_const (rtx target, rtx 
> >> op0, rtx op1,
> >>  {OPTION_MASK_P8_VECTOR,
> >>   BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgow_v4sf_direct
> >>   : CODE_FOR_p8_vmrgew_v4sf_direct,
> >> - {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}}};
> >> + {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}},
> >> +{OPTION_MASK_VSX,
> >> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
> >> +  : CODE_FOR_vsx_xxpermdi_v16qi),
> >> + {0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23}},
> >> +{OPTION_MASK_VSX,
> >> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
> >> +  : CODE_FOR_vsx_xxpermdi_v16qi),
> >> + {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23}},
> >> +{OPTION_MASK_VSX,
> >> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
> >> +  : CODE_FOR_vsx_xxpermdi_v16qi),
> >> + {0, 1, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31}},
> >> +{OPTION_MASK_VSX,
> >> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
> >> +  : CODE_FOR_vsx_xxpermdi_v16qi),
> >> + {8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31}}};
> >
> > If the insn_code is the same for big endian and little endian, why
> > does the new code test BYTES_BIG_ENDIAN to set the same value
> > (CODE_FOR_vsx_xxpermdi_v16qi)?
> >
>
> Thanks for the catch, updated the patch as below:
>
> [PATCH v2] rs6000: Optimize __builtin_shuffle when it's used to zero the 
> upper bits [PR102868]
>
> If the second operand of __builtin_shuffle is const vector 0, and with
> specific mask, it can be optimized to vspltisw+xxpermdi instead of lxv.
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Add
> patterns match and emit for VSX xxpermdi.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/powerpc/pr102868.c: New test.

Okay.

Thanks, David


Re: [PATCH] rs6000: Fix ICE of vect cost related to V1TI [PR102767]

2021-10-27 Thread David Edelsohn via Gcc-patches
On Wed, Oct 27, 2021 at 9:30 PM Kewen.Lin  wrote:
>
> Hi David,
>
> Thanks for the review!
>
> on 2021/10/27 下午9:12, David Edelsohn wrote:
> > On Sun, Oct 24, 2021 at 11:04 PM Kewen.Lin  wrote:
> >>
> >> Hi,
> >>
> >> As PR102767 shows, the commit r12-3482 exposed one ICE in function
> >> rs6000_builtin_vectorization_cost.  We claims V1TI supports movmisalign
> >> on rs6000 (See define_expand "movmisalign"), so it return true in
> >> rs6000_builtin_support_vector_misalignment for misalign 8.  Later in
> >> the cost querying rs6000_builtin_vectorization_cost, we don't have
> >> the arms to handle the V1TI input under (TARGET_VSX &&
> >> TARGET_ALLOW_MOVMISALIGN).
> >>
> >> The proposed fix is to add the consideration for V1TI, simply make it
> >> as the cost for doubleword which is apparently bigger than the cost of
> >> scalar, won't have the vectorization to happen, just to keep consistency
> >> and avoid ICE.  Another thought is to not support movmisalign for V1TI,
> >> but it sounds like a bad idea since it doesn't match the reality.
> >>
> >> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
> >> powerpc64-linux-gnu P8.
> >>
> >> Is it ok for trunk?
> >>
> >> BR,
> >> Kewen
> >> -
> >> gcc/ChangeLog:
> >>
> >> PR target/102767
> >> * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): 
> >> Consider
> >> V1T1 mode for unaligned load and store.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> PR target/102767
> >> * gcc.target/powerpc/ppc-fortran/pr102767.f90: New file.
> >>
> >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> >> index b7ea1483da5..73d3e06c3fc 100644
> >> --- a/gcc/config/rs6000/rs6000.c
> >> +++ b/gcc/config/rs6000/rs6000.c
> >> @@ -5145,7 +5145,8 @@ rs6000_builtin_vectorization_cost (enum 
> >> vect_cost_for_stmt type_of_cost,
> >> if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN)
> >>   {
> >> elements = TYPE_VECTOR_SUBPARTS (vectype);
> >> -   if (elements == 2)
> >> +   /* See PR102767, consider V1TI to keep consistency.  */
> >> +   if (elements == 2 || elements == 1)
> >>   /* Double word aligned.  */
> >>   return 4;
> >>
> >> @@ -5184,10 +5185,11 @@ rs6000_builtin_vectorization_cost (enum 
> >> vect_cost_for_stmt type_of_cost,
> >>
> >>  if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN)
> >>{
> >> -elements = TYPE_VECTOR_SUBPARTS (vectype);
> >> -if (elements == 2)
> >> -  /* Double word aligned.  */
> >> -  return 2;
> >> +   elements = TYPE_VECTOR_SUBPARTS (vectype);
> >> +   /* See PR102767, consider V1TI to keep consistency.  */
> >> +   if (elements == 2 || elements == 1)
> >> + /* Double word aligned.  */
> >> + return 2;
> >
> > This section of the patch incorrectly changes the indentation.  Please
> > use the correct indentation.
> >
>
> The indentation change is intentional since the original identation is
> wrong (more than 8 spaces leading the lines), there are more wrong
> identation lines above the first changed line, but I thought it seems a
> bad idea to fix them too when they are unrelated to what this patch
> wants to fix, so I left them alone.
>
> With the above clarification, may I push this patch without any updates
> for the mentioned indentation issue?

If you correct the indentation, you should adjust it for the entire
block, not just the lines that you change.  If you want to fix the
entire block to TAB+spaces as well, okay.  You didn't mention that you
were fixing the indentation in the explanation of the patch.

Thank, David

>
> >>
> >>  if (elements == 4)
> >>{
> >> diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90 
> >> b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90
> >> new file mode 100644
> >> index 000..a4122482989
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90
> >> @@ -0,0 +1,21 @@
> >> +! { dg-require-effective-target powerpc_vsx_ok }
> >> +! { dg-options "-mvsx -O2 -ftree-vectorize -mno-efficient-unaligned-vsx" }
> >> +
> >> +INTERFACE
> >> +  FUNCTION elemental_mult (a, b, c)
> >> +type(*), DIMENSION(..) :: a, b, c
> >> +  END
> >> +END INTERFACE
> >> +
> >> +allocatable  z
> >> +integer, dimension(2,2) :: a, b
> >> +call test_CFI_address
> >> +contains
> >> +  subroutine test_CFI_address
> >> +if (elemental_mult (z, x, y) .ne. 0) stop
> >> +a = reshape ([4,3,2,1], [2,2])
> >> +b = reshape ([2,3,4,5], [2,2])
> >> +if (elemental_mult (i, a, b) .ne. 0) stop
> >> +  end
> >> +end
> >> +
> >>
> >
> > The patch is okay with the indentation correction.
> >
> > Thanks, David
> >
>
> Thanks!
>
> BR,
> Kewen


Re: [PATCH] rs6000: Optimize __builtin_shuffle when it's used to zero the upper bits [PR102868]

2021-10-27 Thread David Edelsohn via Gcc-patches
On Sun, Oct 24, 2021 at 10:51 PM Xionghu Luo  wrote:
>
> If the second operand of __builtin_shuffle is const vector 0, and with
> specific mask, it can be optimized to vspltisw+xxpermdi instead of lxv.
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Add
> patterns match and emit for VSX xxpermdi.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/powerpc/pr102868.c: New test.
> ---
>  gcc/config/rs6000/rs6000.c  | 47 --
>  gcc/testsuite/gcc.target/powerpc/pr102868.c | 53 +
>  2 files changed, 97 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102868.c
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index d0730253bcc..5d802c1fa96 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -23046,7 +23046,23 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, 
> rtx op1,
>  {OPTION_MASK_P8_VECTOR,
>   BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgow_v4sf_direct
>   : CODE_FOR_p8_vmrgew_v4sf_direct,
> - {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}}};
> + {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}},
> +{OPTION_MASK_VSX,
> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
> +  : CODE_FOR_vsx_xxpermdi_v16qi),
> + {0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23}},
> +{OPTION_MASK_VSX,
> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
> +  : CODE_FOR_vsx_xxpermdi_v16qi),
> + {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23}},
> +{OPTION_MASK_VSX,
> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
> +  : CODE_FOR_vsx_xxpermdi_v16qi),
> + {0, 1, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31}},
> +{OPTION_MASK_VSX,
> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi
> +  : CODE_FOR_vsx_xxpermdi_v16qi),
> + {8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31}}};

If the insn_code is the same for big endian and little endian, why
does the new code test BYTES_BIG_ENDIAN to set the same value
(CODE_FOR_vsx_xxpermdi_v16qi)?

Thanks, David

>
>unsigned int i, j, elt, which;
>unsigned char perm[16];
> @@ -23169,6 +23185,27 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, 
> rtx op1,
>   machine_mode omode = insn_data[icode].operand[0].mode;
>   machine_mode imode = insn_data[icode].operand[1].mode;
>
> + rtx perm_idx = GEN_INT (0);
> + if (icode == CODE_FOR_vsx_xxpermdi_v16qi)
> +   {
> + int perm_val = 0;
> + if (one_vec)
> +   {
> + if (perm[0] == 8)
> +   perm_val |= 2;
> + if (perm[8] == 8)
> +   perm_val |= 1;
> +   }
> + else
> +   {
> + if (perm[0] != 0)
> +   perm_val |= 2;
> + if (perm[8] != 16)
> +   perm_val |= 1;
> +   }
> + perm_idx = GEN_INT (perm_val);
> +   }
> +
>   /* For little-endian, don't use vpkuwum and vpkuhum if the
>  underlying vector type is not V4SI and V8HI, respectively.
>  For example, using vpkuwum with a V8HI picks up the even
> @@ -23192,7 +23229,8 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, 
> rtx op1,
>/* For little-endian, the two input operands must be swapped
>   (or swapped back) to ensure proper right-to-left numbering
>   from 0 to 2N-1.  */
> - if (swapped ^ !BYTES_BIG_ENDIAN)
> + if (swapped ^ !BYTES_BIG_ENDIAN
> + && icode != CODE_FOR_vsx_xxpermdi_v16qi)
> std::swap (op0, op1);
>   if (imode != V16QImode)
> {
> @@ -23203,7 +23241,10 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, 
> rtx op1,
> x = target;
>   else
> x = gen_reg_rtx (omode);
> - emit_insn (GEN_FCN (icode) (x, op0, op1));
> + if (icode == CODE_FOR_vsx_xxpermdi_v16qi)
> +   emit_insn (GEN_FCN (icode) (x, op0, op1, perm_idx));
> + else
> +   emit_insn (GEN_FCN (icode) (x, op0, op1));
>   if (omode != V16QImode)
> emit_move_insn (target, gen_lowpart (V16QImode, x));
>   return true;
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr102868.c 
> b/gcc/testsuite/gcc.target/powerpc/pr102868.c
> new file mode 100644
> index 000..eb45d193f66
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr102868.c
> @@ -0,0 +1,53 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx" } */
> +
> +#include 
> +vector float b = {0.0f, 0.0f, 0.0f, 0.0f};
> +
> +
> +vector float foo1 (vector float x)
> +{
> +  vector int c = {0, 1, 

Re: Ping^3: [PATCH v2 0/2] Fix vec_sel code generation and merge xxsel to vsel

2021-10-27 Thread David Edelsohn via Gcc-patches
This patch series is okay.

Thanks, David

On Thu, Oct 21, 2021 at 11:25 PM Xionghu Luo  wrote:
>
> Ping^3, thanks.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579637.html
>
>
> On 2021/10/15 14:28, Xionghu Luo via Gcc-patches wrote:
> > Ping^2, thanks.
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579637.html
> >
> >
> > On 2021/10/8 09:17, Xionghu Luo via Gcc-patches wrote:
> >> Ping, thanks.
> >>
> >>
> >> On 2021/9/17 13:25, Xionghu Luo wrote:
> >>> These two patches are updated version from:
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579490.html
> >>>
> >>> Changes:
> >>> 1. Fix alignment error in md files.
> >>> 2. Replace rtx_equal_p with match_dup.
> >>> 3. Use register_operand instead of gpc_reg_operand to align with
> >>>vperm/xxperm.
> >>> 4. Regression tested pass on P8LE.
> >>>
> >>> Xionghu Luo (2):
> >>>   rs6000: Fix wrong code generation for vec_sel [PR94613]
> >>>   rs6000: Fold xxsel to vsel since they have same semantics
> >>>
> >>>  gcc/config/rs6000/altivec.md  | 84 ++-
> >>>  gcc/config/rs6000/rs6000-call.c   | 62 ++
> >>>  gcc/config/rs6000/rs6000.c| 19 ++---
> >>>  gcc/config/rs6000/vector.md   | 26 +++---
> >>>  gcc/config/rs6000/vsx.md  | 25 --
> >>>  gcc/testsuite/gcc.target/powerpc/builtins-1.c |  2 +-
> >>>  gcc/testsuite/gcc.target/powerpc/pr94613.c| 47 +++
> >>>  7 files changed, 193 insertions(+), 72 deletions(-)
> >>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr94613.c
> >>>
> >>
> >
>
> --
> Thanks,
> Xionghu


Re: [PATCH] rs6000: Fix ICE of vect cost related to V1TI [PR102767]

2021-10-27 Thread David Edelsohn via Gcc-patches
On Sun, Oct 24, 2021 at 11:04 PM Kewen.Lin  wrote:
>
> Hi,
>
> As PR102767 shows, the commit r12-3482 exposed one ICE in function
> rs6000_builtin_vectorization_cost.  We claims V1TI supports movmisalign
> on rs6000 (See define_expand "movmisalign"), so it return true in
> rs6000_builtin_support_vector_misalignment for misalign 8.  Later in
> the cost querying rs6000_builtin_vectorization_cost, we don't have
> the arms to handle the V1TI input under (TARGET_VSX &&
> TARGET_ALLOW_MOVMISALIGN).
>
> The proposed fix is to add the consideration for V1TI, simply make it
> as the cost for doubleword which is apparently bigger than the cost of
> scalar, won't have the vectorization to happen, just to keep consistency
> and avoid ICE.  Another thought is to not support movmisalign for V1TI,
> but it sounds like a bad idea since it doesn't match the reality.
>
> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
> powerpc64-linux-gnu P8.
>
> Is it ok for trunk?
>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> PR target/102767
> * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Consider
> V1T1 mode for unaligned load and store.
>
> gcc/testsuite/ChangeLog:
>
> PR target/102767
> * gcc.target/powerpc/ppc-fortran/pr102767.f90: New file.
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index b7ea1483da5..73d3e06c3fc 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -5145,7 +5145,8 @@ rs6000_builtin_vectorization_cost (enum 
> vect_cost_for_stmt type_of_cost,
> if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN)
>   {
> elements = TYPE_VECTOR_SUBPARTS (vectype);
> -   if (elements == 2)
> +   /* See PR102767, consider V1TI to keep consistency.  */
> +   if (elements == 2 || elements == 1)
>   /* Double word aligned.  */
>   return 4;
>
> @@ -5184,10 +5185,11 @@ rs6000_builtin_vectorization_cost (enum 
> vect_cost_for_stmt type_of_cost,
>
>  if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN)
>{
> -elements = TYPE_VECTOR_SUBPARTS (vectype);
> -if (elements == 2)
> -  /* Double word aligned.  */
> -  return 2;
> +   elements = TYPE_VECTOR_SUBPARTS (vectype);
> +   /* See PR102767, consider V1TI to keep consistency.  */
> +   if (elements == 2 || elements == 1)
> + /* Double word aligned.  */
> + return 2;

This section of the patch incorrectly changes the indentation.  Please
use the correct indentation.

>
>  if (elements == 4)
>{
> diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90 
> b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90
> new file mode 100644
> index 000..a4122482989
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90
> @@ -0,0 +1,21 @@
> +! { dg-require-effective-target powerpc_vsx_ok }
> +! { dg-options "-mvsx -O2 -ftree-vectorize -mno-efficient-unaligned-vsx" }
> +
> +INTERFACE
> +  FUNCTION elemental_mult (a, b, c)
> +type(*), DIMENSION(..) :: a, b, c
> +  END
> +END INTERFACE
> +
> +allocatable  z
> +integer, dimension(2,2) :: a, b
> +call test_CFI_address
> +contains
> +  subroutine test_CFI_address
> +if (elemental_mult (z, x, y) .ne. 0) stop
> +a = reshape ([4,3,2,1], [2,2])
> +b = reshape ([2,3,4,5], [2,2])
> +if (elemental_mult (i, a, b) .ne. 0) stop
> +  end
> +end
> +
>

The patch is okay with the indentation correction.

Thanks, David


Re: [PATCH] gcc: implement AIX-style constructors

2021-10-21 Thread David Edelsohn via Gcc-patches
On Thu, Oct 21, 2021 at 8:39 AM CHIGOT, CLEMENT  wrote:
>
> Hi David,
>
> The problem is that cdtors is created by the linker only when the -bcdtors
> flag is provided. Thus, if we add "extern void (* _cdtors[]) (void);" to
> the "crtcxa.c", we can't use it without using the new constructor types.
> One solution would be to create another crtcxa (eg crtcxa_cdtors.o) which will
> be replacing the default crtcxa.o when needed. I didn't think about that
> when creating the patch. But it might be a better approach. What do you think 
> ?

Hi, Clement

Another, optional object file in libgcc seems like a cleaner solution
than collect2.c emitting all of the code.

The file should not be called crtcxa_cdtors.o because "cxa" came from
the name of the libstdc++ support functions.  Maybe crtcdtors.o or
crt_cdtors.o or aixcdtors.o or aix_cdtors.o .

Thanks, David


Re: [PATCH] gcc: implement AIX-style constructors

2021-10-19 Thread David Edelsohn via Gcc-patches
Clement,

+  /* Use __C_runtime_pstartup to run ctors and register dtors.
+ This whole part should normally be in libgcc but as
+ AIX cdtors format is currently not the default, managed
+ that in collect2.  */

Why are you emitting the special startup function call in collect2.c
instead of placing it in libgcc.  The comment mentions that the
special startup function should be defined in libgcc.

Yes, the AIX ld bcdtors mechanism is not the default, but what is the
harm? The symbol will be defined and exported by libgcc. If the AIX
linker -bcdtors functionality is not invoked, the symbol is not used.
And if a user does invoke the AIX linker with -bcdtors, the behavior
will be the same (either the program was compiled to use AIX cdtors or
not, which is the same if the startup function is emitted by
collect2.c.

Also, the patch should include documentation of the option.  The
documentation should mention that this is for interoperability with
IBM XL Compiler, and the option will not operate correctly unless the
application and the GCC runtime are built with the option.

Thanks, David

On Mon, Oct 18, 2021 at 3:55 AM CHIGOT, CLEMENT  wrote:
>
> AIX linker now supports constructors and destructors detection. For such
> functions to be detected, their name must starts with __sinit or __sterm.
> and -bcdtors must be passed to linker calls. It will create "_cdtors"
> symbol which can be used to launch the initialization.
>
> This patch creates a new RS6000 flag "-mcdtors=".
> With "-mcdtors=aix", gcc will generate these new constructors/destructors.
> With "-mcdtors=gcc", which is currently the default, gcc will continue
> to generate "gcc" format for constructors (ie _GLOBAL__I and _GLOBAL__D
> symbols).
> Ideally, it would have been better to enable the AIX format by default
> instead of using collect2. However, the compatibility between the
> previously-built binaries and the new ones is too complex to be done.
>
> gcc/ChangeLog:
> 2021-10-04  Clément Chigot  
>
> * collect2.c (aixbcdtors_flags): New variable.
> (main): Use it to detect -bcdtors and remove -binitfini flag.
> (write_c_file_stat): Adapt to new AIX format.
> * config/rs6000/aix.h (FILE_SINIT_FORMAT): New define.
> (FILE_STERM_FORMAT): New define.
> (TARGET_FILE_FUNCTION_FORMAT): New define.
> * config/rs6000/aix64.opt: Add -mcdtors flag.
> * config/rs6000/aix71.h (LINK_SPEC_COMMON): Pass -bcdtors when
>   -mcdtors=aix is passed.
> * config/rs6000/aix72.h (LINK_SPEC_COMMON): Likewise.
> * config/rs6000/aix73.h (LINK_SPEC_COMMON): Likewise.
> * config/rs6000/rs6000-opts.h (enum rs6000_cdtors): New enum.
> * tree.c (get_file_function_name): Add
>   TARGET_FILE_FUNCTION_FORMAT support.
>
> gcc/testsuite/ChangeLog:
> 2021-10-04  Clément Chigot  
>
> * gcc.target/powerpc/constructor-aix.c: New test.
>
>


Re: [PATCH v2 0/4] libffi: Sync with upstream

2021-10-19 Thread David Edelsohn via Gcc-patches
Hi, H.J.

My colleague built GCC, including GCC Go, with your patch:

"I was able to build libgo and test it partially.  The results are
similar to the current master without libffi updates. But 64bit tests
aren't working in both cases. It's related to LIBPATH issues..."

- David

On Mon, Oct 18, 2021 at 11:09 AM H.J. Lu  wrote:
>
> On Mon, Oct 18, 2021 at 8:04 AM David Edelsohn  wrote:
> >
> > Hi, H.J.
> >
> > My colleague responded that GCC Go builds and works on AIX, but it
> > currently requires a special, custom version of GNU objcopy that adds
> > support for the types of features that Go requires to operate on AIX
> > XCOFF files.  Those changes have not yet been updated and contributed
> > to GNU Binutils.
> >
> > I will see if I can install that version of objcopy standalone.  We
> > also can ask Clement and ATOS to test GCC Go build with your proposed
> > libffi patch, or is it vanilla libffi trunk?
>
> My libffi branch:
>
> https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/libffi/master
>
> synced with libffi v3.4.2, not master.
>
> BTW, the current master branch won't build libgo:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102796
>
> Thanks.
>
> > Thanks, David
> >
> >
> > On Sat, Oct 16, 2021 at 3:59 PM H.J. Lu  wrote:
> > >
> > > On Sat, Oct 16, 2021 at 12:53 PM David Edelsohn  wrote:
> > > >
> > > > On Sat, Oct 16, 2021 at 1:13 PM H.J. Lu  wrote:
> > > > >
> > > > > On Sat, Oct 16, 2021 at 10:04 AM David Edelsohn  
> > > > > wrote:
> > > > > >
> > > > > > On Sat, Oct 16, 2021 at 7:48 AM H.J. Lu  wrote:
> > > > > > >
> > > > > > > On Fri, Oct 15, 2021 at 5:22 PM David Edelsohn 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Fri, Oct 15, 2021 at 8:06 PM H.J. Lu  
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Oct 13, 2021 at 6:42 AM H.J. Lu  
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, Oct 13, 2021 at 6:03 AM Richard Biener
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Oct 13, 2021 at 2:56 PM H.J. Lu 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Oct 13, 2021 at 5:45 AM Richard Biener
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Sep 2, 2021 at 5:50 PM H.J. Lu 
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Change in the v2 patch:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. Disable static trampolines by default.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > GCC maintained a copy of libffi snapshot from 2009 
> > > > > > > > > > > > > > and cherry-picked fixes
> > > > > > > > > > > > > > from upstream over the last 10+ years.  In the 
> > > > > > > > > > > > > > meantime, libffi upstream
> > > > > > > > > > > > > > has been changed significantly with new features, 
> > > > > > > > > > > > > > bug fixes and new target
> > > > > > > > > > > > > > support.  Here is a set of patches to sync with 
> > > > > > > > > > > > > > libffi 3.4.2 release and
> > > > > > > > > > > > > > make it easier to sync with libffi upstream:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. Document how to sync with upstream.
> > > > > > > > > > > > > > 2. Add scripts to help sync with up

Re: [PATCH v2] rs6000: Remove unspecs for vec_mrghl[bhw]

2021-10-18 Thread David Edelsohn via Gcc-patches
On Tue, Oct 12, 2021 at 9:50 PM Xionghu Luo  wrote:
>
> Resend this patch.  Previous discussion is:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html
>
> vmrghb only accepts permute index {0, 16, 1, 17, 2, 18, 3, 19, 4, 20,
> 5, 21, 6, 22, 7, 23} no matter for BE or LE in ISA, similarly for vmrglb.
> Remove UNSPEC_VMRGH_DIRECT/UNSPEC_VMRGL_DIRECT pattern as vec_select
> + vec_concat as normal RTL.
>
> Tested pass on P8LE, P9LE and P8BE{m32}, ok for trunk?
>
> gcc/ChangeLog:
>
> * config/rs6000/altivec.md (*altivec_vmrghb_internal): Delete.
> (altivec_vmrghb_direct): New.
> (*altivec_vmrghh_internal): Delete.
> (altivec_vmrghh_direct): New.
> (*altivec_vmrghw_internal): Delete.
> (altivec_vmrghw_direct_): New.
> (altivec_vmrghw_direct): Delete.
> (*altivec_vmrglb_internal): Delete.
> (altivec_vmrglb_direct): New.
> (*altivec_vmrglh_internal): Delete.
> (altivec_vmrglh_direct): New.
> (*altivec_vmrglw_internal): Delete.
> (altivec_vmrglw_direct_): New.
> (altivec_vmrglw_direct): Delete.
> * config/rs6000/rs6000-p8swap.c (rtx_is_swappable_p): Adjust.
> * config/rs6000/rs6000.c (altivec_expand_vec_perm_const):
> Adjust.
> * config/rs6000/vsx.md (vsx_xxmrghw_): Adjust.
> (vsx_xxmrglw_): Adjust.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/powerpc/builtins-1.c: Update instruction counts.

This patch is okay.

Thanks, David


Re: [PATCH v2 0/4] libffi: Sync with upstream

2021-10-18 Thread David Edelsohn via Gcc-patches
Hi, H.J.

My colleague responded that GCC Go builds and works on AIX, but it
currently requires a special, custom version of GNU objcopy that adds
support for the types of features that Go requires to operate on AIX
XCOFF files.  Those changes have not yet been updated and contributed
to GNU Binutils.

I will see if I can install that version of objcopy standalone.  We
also can ask Clement and ATOS to test GCC Go build with your proposed
libffi patch, or is it vanilla libffi trunk?

Thanks, David


On Sat, Oct 16, 2021 at 3:59 PM H.J. Lu  wrote:
>
> On Sat, Oct 16, 2021 at 12:53 PM David Edelsohn  wrote:
> >
> > On Sat, Oct 16, 2021 at 1:13 PM H.J. Lu  wrote:
> > >
> > > On Sat, Oct 16, 2021 at 10:04 AM David Edelsohn  wrote:
> > > >
> > > > On Sat, Oct 16, 2021 at 7:48 AM H.J. Lu  wrote:
> > > > >
> > > > > On Fri, Oct 15, 2021 at 5:22 PM David Edelsohn  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, Oct 15, 2021 at 8:06 PM H.J. Lu  wrote:
> > > > > > >
> > > > > > > On Wed, Oct 13, 2021 at 6:42 AM H.J. Lu  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Wed, Oct 13, 2021 at 6:03 AM Richard Biener
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Oct 13, 2021 at 2:56 PM H.J. Lu  
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, Oct 13, 2021 at 5:45 AM Richard Biener
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Sep 2, 2021 at 5:50 PM H.J. Lu 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Change in the v2 patch:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Disable static trampolines by default.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > GCC maintained a copy of libffi snapshot from 2009 and 
> > > > > > > > > > > > cherry-picked fixes
> > > > > > > > > > > > from upstream over the last 10+ years.  In the 
> > > > > > > > > > > > meantime, libffi upstream
> > > > > > > > > > > > has been changed significantly with new features, bug 
> > > > > > > > > > > > fixes and new target
> > > > > > > > > > > > support.  Here is a set of patches to sync with libffi 
> > > > > > > > > > > > 3.4.2 release and
> > > > > > > > > > > > make it easier to sync with libffi upstream:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Document how to sync with upstream.
> > > > > > > > > > > > 2. Add scripts to help sync with upstream.
> > > > > > > > > > > > 3. Sync with libffi 3.4.2. This patch is quite big.  It 
> > > > > > > > > > > > is availale at
> > > > > > > > > > > >
> > > > > > > > > > > > https://gitlab.com/x86-gcc/gcc/-/commit/15e80c879c571f79a0e57702848a9df5fba5be2f
> > > > > > > > > > > > 4. Integrate libffi build and testsuite with GCC.
> > > > > > > > > > >
> > > > > > > > > > > How did you test this?  It looks like libgo is the only 
> > > > > > > > > > > consumer of
> > > > > > > > > > > libffi these days.
> > > > > > > > > > > In particular go/libgo seems to be supported on almost 
> > > > > > > > > > > all targets besides
> > > > > > > > > > > darwin/windows - did you test cross and canadian 
> > > > > > > > > > > configurations?
> > > > > > > > > >
> > > > > > > > > > I only tested it on Linux/i686 and Linux/x86-64.   My 
> > > > > > > > > > understanding is

Re: [PATCH v2 0/4] libffi: Sync with upstream

2021-10-16 Thread David Edelsohn via Gcc-patches
On Sat, Oct 16, 2021 at 3:59 PM H.J. Lu  wrote:
>
> On Sat, Oct 16, 2021 at 12:53 PM David Edelsohn  wrote:
> >
> > On Sat, Oct 16, 2021 at 1:13 PM H.J. Lu  wrote:
> > >
> > > On Sat, Oct 16, 2021 at 10:04 AM David Edelsohn  wrote:
> > > >
> > > > On Sat, Oct 16, 2021 at 7:48 AM H.J. Lu  wrote:
> > > > >
> > > > > On Fri, Oct 15, 2021 at 5:22 PM David Edelsohn  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, Oct 15, 2021 at 8:06 PM H.J. Lu  wrote:
> > > > > > >
> > > > > > > On Wed, Oct 13, 2021 at 6:42 AM H.J. Lu  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Wed, Oct 13, 2021 at 6:03 AM Richard Biener
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Oct 13, 2021 at 2:56 PM H.J. Lu  
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, Oct 13, 2021 at 5:45 AM Richard Biener
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Sep 2, 2021 at 5:50 PM H.J. Lu 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Change in the v2 patch:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Disable static trampolines by default.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > GCC maintained a copy of libffi snapshot from 2009 and 
> > > > > > > > > > > > cherry-picked fixes
> > > > > > > > > > > > from upstream over the last 10+ years.  In the 
> > > > > > > > > > > > meantime, libffi upstream
> > > > > > > > > > > > has been changed significantly with new features, bug 
> > > > > > > > > > > > fixes and new target
> > > > > > > > > > > > support.  Here is a set of patches to sync with libffi 
> > > > > > > > > > > > 3.4.2 release and
> > > > > > > > > > > > make it easier to sync with libffi upstream:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Document how to sync with upstream.
> > > > > > > > > > > > 2. Add scripts to help sync with upstream.
> > > > > > > > > > > > 3. Sync with libffi 3.4.2. This patch is quite big.  It 
> > > > > > > > > > > > is availale at
> > > > > > > > > > > >
> > > > > > > > > > > > https://gitlab.com/x86-gcc/gcc/-/commit/15e80c879c571f79a0e57702848a9df5fba5be2f
> > > > > > > > > > > > 4. Integrate libffi build and testsuite with GCC.
> > > > > > > > > > >
> > > > > > > > > > > How did you test this?  It looks like libgo is the only 
> > > > > > > > > > > consumer of
> > > > > > > > > > > libffi these days.
> > > > > > > > > > > In particular go/libgo seems to be supported on almost 
> > > > > > > > > > > all targets besides
> > > > > > > > > > > darwin/windows - did you test cross and canadian 
> > > > > > > > > > > configurations?
> > > > > > > > > >
> > > > > > > > > > I only tested it on Linux/i686 and Linux/x86-64.   My 
> > > > > > > > > > understanding is that
> > > > > > > > > > the upstream libffi works on Darwin and Windows.
> > > > > > > > > >
> > > > > > > > > > > I applaud the attempt to sync to upsteam but I fear you 
> > > > > > > > > > > won't get any "review"
> > > > > > > > > > > of this massive diff.
> > > > > > > > > >
> > &

Re: [PATCH v2 0/4] libffi: Sync with upstream

2021-10-16 Thread David Edelsohn via Gcc-patches
On Sat, Oct 16, 2021 at 1:13 PM H.J. Lu  wrote:
>
> On Sat, Oct 16, 2021 at 10:04 AM David Edelsohn  wrote:
> >
> > On Sat, Oct 16, 2021 at 7:48 AM H.J. Lu  wrote:
> > >
> > > On Fri, Oct 15, 2021 at 5:22 PM David Edelsohn  wrote:
> > > >
> > > > On Fri, Oct 15, 2021 at 8:06 PM H.J. Lu  wrote:
> > > > >
> > > > > On Wed, Oct 13, 2021 at 6:42 AM H.J. Lu  wrote:
> > > > > >
> > > > > > On Wed, Oct 13, 2021 at 6:03 AM Richard Biener
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Wed, Oct 13, 2021 at 2:56 PM H.J. Lu  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Wed, Oct 13, 2021 at 5:45 AM Richard Biener
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Thu, Sep 2, 2021 at 5:50 PM H.J. Lu  
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Change in the v2 patch:
> > > > > > > > > >
> > > > > > > > > > 1. Disable static trampolines by default.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > GCC maintained a copy of libffi snapshot from 2009 and 
> > > > > > > > > > cherry-picked fixes
> > > > > > > > > > from upstream over the last 10+ years.  In the meantime, 
> > > > > > > > > > libffi upstream
> > > > > > > > > > has been changed significantly with new features, bug fixes 
> > > > > > > > > > and new target
> > > > > > > > > > support.  Here is a set of patches to sync with libffi 
> > > > > > > > > > 3.4.2 release and
> > > > > > > > > > make it easier to sync with libffi upstream:
> > > > > > > > > >
> > > > > > > > > > 1. Document how to sync with upstream.
> > > > > > > > > > 2. Add scripts to help sync with upstream.
> > > > > > > > > > 3. Sync with libffi 3.4.2. This patch is quite big.  It is 
> > > > > > > > > > availale at
> > > > > > > > > >
> > > > > > > > > > https://gitlab.com/x86-gcc/gcc/-/commit/15e80c879c571f79a0e57702848a9df5fba5be2f
> > > > > > > > > > 4. Integrate libffi build and testsuite with GCC.
> > > > > > > > >
> > > > > > > > > How did you test this?  It looks like libgo is the only 
> > > > > > > > > consumer of
> > > > > > > > > libffi these days.
> > > > > > > > > In particular go/libgo seems to be supported on almost all 
> > > > > > > > > targets besides
> > > > > > > > > darwin/windows - did you test cross and canadian 
> > > > > > > > > configurations?
> > > > > > > >
> > > > > > > > I only tested it on Linux/i686 and Linux/x86-64.   My 
> > > > > > > > understanding is that
> > > > > > > > the upstream libffi works on Darwin and Windows.
> > > > > > > >
> > > > > > > > > I applaud the attempt to sync to upsteam but I fear you won't 
> > > > > > > > > get any "review"
> > > > > > > > > of this massive diff.
> > > > > > > >
> > > > > > > > I believe that it should just work.  Our libffi is very much 
> > > > > > > > out of date.
> > > > > > >
> > > > > > > Yes, you can hope.  And yes, our libffi is out of date.
> > > > > > >
> > > > > > > Can you please do the extra step to test one weird architecture, 
> > > > > > > namely
> > > > > > > powerpc64-aix which is available on the compile-farm?
> > > > > >
> > > > > > I will give it a try and report back.
> > > > > >
> > > > > > > If that goes well I think it's good to "hope" at this point (and 
> > > > > > > 

Re: [PATCH v2 0/4] libffi: Sync with upstream

2021-10-16 Thread David Edelsohn via Gcc-patches
On Sat, Oct 16, 2021 at 7:48 AM H.J. Lu  wrote:
>
> On Fri, Oct 15, 2021 at 5:22 PM David Edelsohn  wrote:
> >
> > On Fri, Oct 15, 2021 at 8:06 PM H.J. Lu  wrote:
> > >
> > > On Wed, Oct 13, 2021 at 6:42 AM H.J. Lu  wrote:
> > > >
> > > > On Wed, Oct 13, 2021 at 6:03 AM Richard Biener
> > > >  wrote:
> > > > >
> > > > > On Wed, Oct 13, 2021 at 2:56 PM H.J. Lu  wrote:
> > > > > >
> > > > > > On Wed, Oct 13, 2021 at 5:45 AM Richard Biener
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Thu, Sep 2, 2021 at 5:50 PM H.J. Lu  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Change in the v2 patch:
> > > > > > > >
> > > > > > > > 1. Disable static trampolines by default.
> > > > > > > >
> > > > > > > >
> > > > > > > > GCC maintained a copy of libffi snapshot from 2009 and 
> > > > > > > > cherry-picked fixes
> > > > > > > > from upstream over the last 10+ years.  In the meantime, libffi 
> > > > > > > > upstream
> > > > > > > > has been changed significantly with new features, bug fixes and 
> > > > > > > > new target
> > > > > > > > support.  Here is a set of patches to sync with libffi 3.4.2 
> > > > > > > > release and
> > > > > > > > make it easier to sync with libffi upstream:
> > > > > > > >
> > > > > > > > 1. Document how to sync with upstream.
> > > > > > > > 2. Add scripts to help sync with upstream.
> > > > > > > > 3. Sync with libffi 3.4.2. This patch is quite big.  It is 
> > > > > > > > availale at
> > > > > > > >
> > > > > > > > https://gitlab.com/x86-gcc/gcc/-/commit/15e80c879c571f79a0e57702848a9df5fba5be2f
> > > > > > > > 4. Integrate libffi build and testsuite with GCC.
> > > > > > >
> > > > > > > How did you test this?  It looks like libgo is the only consumer 
> > > > > > > of
> > > > > > > libffi these days.
> > > > > > > In particular go/libgo seems to be supported on almost all 
> > > > > > > targets besides
> > > > > > > darwin/windows - did you test cross and canadian configurations?
> > > > > >
> > > > > > I only tested it on Linux/i686 and Linux/x86-64.   My understanding 
> > > > > > is that
> > > > > > the upstream libffi works on Darwin and Windows.
> > > > > >
> > > > > > > I applaud the attempt to sync to upsteam but I fear you won't get 
> > > > > > > any "review"
> > > > > > > of this massive diff.
> > > > > >
> > > > > > I believe that it should just work.  Our libffi is very much out of 
> > > > > > date.
> > > > >
> > > > > Yes, you can hope.  And yes, our libffi is out of date.
> > > > >
> > > > > Can you please do the extra step to test one weird architecture, 
> > > > > namely
> > > > > powerpc64-aix which is available on the compile-farm?
> > > >
> > > > I will give it a try and report back.
> > > >
> > > > > If that goes well I think it's good to "hope" at this point (and 
> > > > > plenty of
> > > > > time to fix fallout until the GCC 12 release).
> > > > >
> > > > > Thus OK after the extra testing dance and waiting until early next
> > > > > week so others can throw in a veto.
> > >
> > > I tried to bootstrap GCC master branch on  gcc119.fsffrance.org:
> > >
> > > *  MT/MODEL: 8284-22A 
> > > *
> > > * Partition: gcc119   
> > > *
> > > *System: power8-aix.osuosl.org
> > > *
> > > *   O/S: AIX V7.2 7200-04-03-2038
> > >
> > > I configured GCC with
> > >
> > > --with-as=/usr/bin/as --with-ld=/usr/bin/ld
> > > -

Re: [PATCH v2 0/4] libffi: Sync with upstream

2021-10-15 Thread David Edelsohn via Gcc-patches
On Fri, Oct 15, 2021 at 8:06 PM H.J. Lu  wrote:
>
> On Wed, Oct 13, 2021 at 6:42 AM H.J. Lu  wrote:
> >
> > On Wed, Oct 13, 2021 at 6:03 AM Richard Biener
> >  wrote:
> > >
> > > On Wed, Oct 13, 2021 at 2:56 PM H.J. Lu  wrote:
> > > >
> > > > On Wed, Oct 13, 2021 at 5:45 AM Richard Biener
> > > >  wrote:
> > > > >
> > > > > On Thu, Sep 2, 2021 at 5:50 PM H.J. Lu  wrote:
> > > > > >
> > > > > > Change in the v2 patch:
> > > > > >
> > > > > > 1. Disable static trampolines by default.
> > > > > >
> > > > > >
> > > > > > GCC maintained a copy of libffi snapshot from 2009 and 
> > > > > > cherry-picked fixes
> > > > > > from upstream over the last 10+ years.  In the meantime, libffi 
> > > > > > upstream
> > > > > > has been changed significantly with new features, bug fixes and new 
> > > > > > target
> > > > > > support.  Here is a set of patches to sync with libffi 3.4.2 
> > > > > > release and
> > > > > > make it easier to sync with libffi upstream:
> > > > > >
> > > > > > 1. Document how to sync with upstream.
> > > > > > 2. Add scripts to help sync with upstream.
> > > > > > 3. Sync with libffi 3.4.2. This patch is quite big.  It is availale 
> > > > > > at
> > > > > >
> > > > > > https://gitlab.com/x86-gcc/gcc/-/commit/15e80c879c571f79a0e57702848a9df5fba5be2f
> > > > > > 4. Integrate libffi build and testsuite with GCC.
> > > > >
> > > > > How did you test this?  It looks like libgo is the only consumer of
> > > > > libffi these days.
> > > > > In particular go/libgo seems to be supported on almost all targets 
> > > > > besides
> > > > > darwin/windows - did you test cross and canadian configurations?
> > > >
> > > > I only tested it on Linux/i686 and Linux/x86-64.   My understanding is 
> > > > that
> > > > the upstream libffi works on Darwin and Windows.
> > > >
> > > > > I applaud the attempt to sync to upsteam but I fear you won't get any 
> > > > > "review"
> > > > > of this massive diff.
> > > >
> > > > I believe that it should just work.  Our libffi is very much out of 
> > > > date.
> > >
> > > Yes, you can hope.  And yes, our libffi is out of date.
> > >
> > > Can you please do the extra step to test one weird architecture, namely
> > > powerpc64-aix which is available on the compile-farm?
> >
> > I will give it a try and report back.
> >
> > > If that goes well I think it's good to "hope" at this point (and plenty of
> > > time to fix fallout until the GCC 12 release).
> > >
> > > Thus OK after the extra testing dance and waiting until early next
> > > week so others can throw in a veto.
>
> I tried to bootstrap GCC master branch on  gcc119.fsffrance.org:
>
> *  MT/MODEL: 8284-22A 
> *
> * Partition: gcc119   
> *
> *System: power8-aix.osuosl.org
> *
> *   O/S: AIX V7.2 7200-04-03-2038
>
> I configured GCC with
>
> --with-as=/usr/bin/as --with-ld=/usr/bin/ld
> --enable-version-specific-runtime-libs --disable-nls
> --enable-decimal-float=dpd --disable-libstdcxx-pch --disable-werror
> --enable-__cxa_atexit --with-gmp=/opt/cfarm --with-mpfr=/opt/cfarm
> --with-mpc=/opt/cfarm --with-isl=/opt/cfarm --prefix=/opt/freeware
> --with-local-prefix=/opt/freeware --enable-languages=c,c++,go
>
> I got
>
> g++   -g -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables 
> -W
> -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format 
> -Wmissing-format-at
> tribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
> -Wno-
> overlength-strings -fno-common  -DHAVE_CONFIG_H  -DGENERATOR_FILE 
> -static-libstd
> c++ -static-libgcc -Wl,-bbigtoc -Wl,-bmaxdata:0x4000 -o build/genenums \
> build/genenums.o build/read-md.o build/errors.o 
> ../build-powerpc-ibm-aix7.2.
> 4.0/libiberty/libiberty.a
> ld: 0711-317 ERROR: Undefined symbol: lexer_line
> ld: 0711-317 ERROR: Undefined symbol: .yylex(char const**)
> ld: 0711-317 ERROR: Undefined symbol: .yybegin(char const*)
> ld: 0711-317 ERROR: Undefined symbol: lexer_toplevel_done
> ld: 0711-317 ERROR: Undefined symbol: .yyend()
> ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more information.
> collect2: error: ld returned 8 exit status
> Makefile:3000: recipe for target 'build/gengtype' failed
> gmake[5]: *** [build/gengtype] Error 1
>
> David, is there an instruction to bootstrap GCC on AIX?

The CompileFarm page in the GCC wiki has instructions under "build tips":

https://gcc.gnu.org/wiki/CompileFarm#Services_and_software_installed_on_farm_machines

The error that you show might be due to not having /opt/freeware/bin
first in your path and the bootstrap used the AIX version of lex or
sed or some other command.

Thanks, David


Re: PATCH, rs6000] Optimization for vec_xl_sext

2021-10-14 Thread David Edelsohn via Gcc-patches
On Thu, Oct 14, 2021 at 2:17 AM HAO CHEN GUI  wrote:
>
> Hi,
>
>The patch optimizes the code generation for vec_xl_sext builtin. Now all 
> the sign extensions are done on VSX registers directly.
>
>Bootstrapped and tested on powerpc64le-linux with no regressions. Is this 
> okay for trunk? Any recommendations? Thanks a lot.
>
>I refined the patch according to Bill and David's advice. I put the 
> patch.diff and ChangeLog in attachment also in case the indentation doesn't 
> show correctly in email body.
>
>
> ChangeLog
>
> 2021-10-11 Haochen Gui 
>
>
> gcc/
>
> * config/rs6000/rs6000-call.c (altivec_expand_lxvr_builtin):
>
> Modify the expansion for sign extension. All extensions are done
>
> within VSX registers.
>
>
> gcc/testsuite/
>
> * gcc.target/powerpc/p10_vec_xl_sext.c: New test.

This is okay.

Thanks, David


Re: [PATCH] aix: ensure reference to __tls_get_addr is in text section.

2021-10-14 Thread David Edelsohn via Gcc-patches
On Thu, Oct 14, 2021 at 10:10 AM CHIGOT, CLEMENT
 wrote:
>
> Hi David,
>
> The fact that csect .data is referencing csect .text doesn't mean that
> if .text is kept, .data is kept too. It means the opposite. if .data is kept
> then .text must be kept.

Yes, we are in agreement about the purpose of the other reference
between text and data.

The __tls_get_addr reference is an effort to artificially elicit an
error if pthread is not linked in a program that uses TLS.  It is not
truly necessary for correct TLS code generation from GCC.

Your patch moves the __tls_get_addr referenced from the data section
to the text section.  As we agree, the logic of the other code implies
that the data section is used to pull in the text section, so moving
__tls_get_addr to the text section seems more fragile.  It's a game of
which section will be preserved to ensure that __tls_get_addr is
referenced to ensure that an artificial error is generated.  And it's
possible that neither text nor data sections will be referenced if
fine-grained CSECTs are used.  There is no way to make this logic
perfect.

Thanks, David


>
> That's actually what is being done by the linker with the TLS support test
> in configure.
> $ cat test.c
> __thread int a; int b; int main() { return a = b; }
>
> With ".ref __tls_get_addr" in .data:
> $ gcc -maix64 test.c -S -o test.s
> $ cat test.s
> ...
> _section_.text:
> .csect .data[RW],4
> .llong _section_.text
> .extern __tls_get_addr
> .ref __tls_get_addr
> $ gcc -maix64 test.s -o test
> $ dump -X64 -tv test
> ...
> [142]   m   0x0097 debug 0FILEC:PPC64 test.c
> [143]   m   0x16c0 .text 1  unamex.text
> [144]   a4  0x005c   00 SD   PR--
> [147]   m   0x20001298  .bss 1  externb
> [148]   a4  0x0004   00 CM   BS--
> [149]   m   0x8800 .tbss 1  externa
> [150]   a4  0x0004   00 CM   UL--
> ...
>
> Csect .data is garbage-collected by the linker. Thus the .ref doesn't matter.
>
> With ".ref __tls_get_addr" in .text:
> $ cat test.s
> _section_.text:
> .csect .data[RW],4
> .llong _section_.text
> .csect .text[PR],5
> .extern __tls_get_addr
> .ref __tls_get_addr
> $ gcc  -maix64 test.s -o test
> ld: 0711-317 ERROR: Undefined symbol: __tls_get_addr
> ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more information.
> collect2: error: ld returned 8 exit status
>
> As csect .text is kept (because of main function), the .ref is still there 
> and the error
> is raise correctly. As "-pthread" isn't passed, __tls_get_addr is not 
> available.
>
> However, writing this mail, I'm wondering if we don't want to always keep both
> csects. If .data is kept, then .text is and if .text is kept, then .data is.
> Or always keeping .data would have too much side effects ?
>
> Thanks,
> Clément
>
> 
> From: David Edelsohn 
> Sent: Thursday, October 14, 2021 3:42 PM
> To: CHIGOT, CLEMENT 
> Cc: gcc-patches@gcc.gnu.org 
> Subject: Re: [PATCH] aix: ensure reference to __tls_get_addr is in text 
> section.
>
> Caution! External email. Do not open attachments or click links, unless this 
> email comes from a known sender and you know the content is safe.
>
> The reference to __tls_get_addr is in the data section.  And the code
> just above creates a symbol in the text section referenced from the
> data section to ensure the text section is retained.  So this change
> doesn't make sense.  You're essentially saying that the data section
> is not used, which makes the other code useless to ensure that the
> text section is referenced.
>
> Thanks, David
>
> On Thu, Oct 14, 2021 at 3:06 AM CHIGOT, CLEMENT  
> wrote:
> >
> > The garbage collector of AIX linker might remove the reference to
> > __tls_get_addr if it's added inside an unused csect.
> >
> >
> > Clément Chigot
> > ATOS Bull SAS
> > 1 rue de Provence - 38432 Échirolles - France
> >


Re: [PATCH] rs6000: Fix memory leak in rs6000_density_test

2021-10-14 Thread David Edelsohn via Gcc-patches
On Thu, Oct 14, 2021 at 9:00 AM Richard Sandiford
 wrote:
>
> rs6000_density_test has an early exit test between a call
> to get_loop_body and the corresponding free.  This would
> lead to a memory leak if the early exit is taken.
>
> Tested on powerpc64le-linux-gnu.  It's obvious that moving the
> test avoids the leak, but there are multiple ways to write it,
> so: OK to install?

Thanks for noticing this and creating a patch.

This is okay.

Thanks, David

>
> Richard
>
>
> gcc/
> * config/rs6000/rs6000.c (rs6000_density_test): Move early
> exit test further up the function.
> ---
>  gcc/config/rs6000/rs6000.c | 17 -
>  1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index acba4d9f26c..01a95591a5d 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -5289,20 +5289,19 @@ struct rs6000_cost_data
>  static void
>  rs6000_density_test (rs6000_cost_data *data)
>  {
> -  struct loop *loop = data->loop_info;
> -  basic_block *bbs = get_loop_body (loop);
> -  int nbbs = loop->num_nodes;
> -  loop_vec_info loop_vinfo = loop_vec_info_for_loop (data->loop_info);
> -  int vec_cost = data->cost[vect_body], not_vec_cost = 0;
> -  int i, density_pct;
> -
>/* This density test only cares about the cost of vector version of the
>   loop, so immediately return if we are passed costing for the scalar
>   version (namely computing single scalar iteration cost).  */
>if (data->costing_for_scalar)
>  return;
>
> -  for (i = 0; i < nbbs; i++)
> +  struct loop *loop = data->loop_info;
> +  basic_block *bbs = get_loop_body (loop);
> +  int nbbs = loop->num_nodes;
> +  loop_vec_info loop_vinfo = loop_vec_info_for_loop (data->loop_info);
> +  int vec_cost = data->cost[vect_body], not_vec_cost = 0;
> +
> +  for (int i = 0; i < nbbs; i++)
>  {
>basic_block bb = bbs[i];
>gimple_stmt_iterator gsi;
> @@ -5322,7 +5321,7 @@ rs6000_density_test (rs6000_cost_data *data)
>  }
>
>free (bbs);
> -  density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost);
> +  int density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost);
>
>if (density_pct > rs6000_density_pct_threshold
>&& vec_cost + not_vec_cost > rs6000_density_size_threshold)


Re: [PATCH] aix: ensure reference to __tls_get_addr is in text section.

2021-10-14 Thread David Edelsohn via Gcc-patches
The reference to __tls_get_addr is in the data section.  And the code
just above creates a symbol in the text section referenced from the
data section to ensure the text section is retained.  So this change
doesn't make sense.  You're essentially saying that the data section
is not used, which makes the other code useless to ensure that the
text section is referenced.

Thanks, David

On Thu, Oct 14, 2021 at 3:06 AM CHIGOT, CLEMENT  wrote:
>
> The garbage collector of AIX linker might remove the reference to
> __tls_get_addr if it's added inside an unused csect.
>
>
> Clément Chigot
> ATOS Bull SAS
> 1 rue de Provence - 38432 Échirolles - France
>


Re: [PATCH, rs6000] Optimization for vec_xl_sext

2021-10-13 Thread David Edelsohn via Gcc-patches
>> gcc/
>> * config/rs6000/rs6000-call.c (altivec_expand_lxvr_builtin):
>> Modify the expansion for sign extension. All extentions are done
>> within VSX resgisters.
>
> Two typos here:  extentions => extensions, resgisters => registers.

This is okay with Bill's comments addressed.

Thanks, David


Re: [PATCH, rs6000] punish reload of lfiwzx when loading an int variable [PR102169, PR102146]

2021-10-13 Thread David Edelsohn via Gcc-patches
>   The patch punishes reload of alternative pair of "d, Z" for 
> movsi_internal1. The reload occurs if 'Z' doesn't match and generates an 
> additional insn. So the memory reload should be punished.
>
>   Bootstrapped and tested on powerpc64le-linux with no regressions. Is this 
> okay for trunk? Any recommendations? Thanks a lot.
>
>
> ChangeLog
>
> 2021-09-29 Haochen Gui 
>
> gcc/
> * gcc/config/rs6000/rs6000.md (movsi_internal1): disparages
> slightly the alternative 'Z' of "lfiwzx" when reload is needed.

Capitalize "D" of disparages.

Should this disparage "stfiwzx" also?  Carl Love saw poor code
generation at -O0 for a trivial example where a stack store moved the
value to an FPR and used stfiwx.

Thanks, David


Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set

2021-10-13 Thread David Edelsohn via Gcc-patches
2021-08-25 Haochen Gui 

gcc/
 * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin):
 Modify the VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP,
 VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP expansions.

Please write something more than "modify".  The ChangeLog should be
more like the email subject line for this patch.

gcc/testsuite/
 * gcc.target/powerpc/vec-minmax-1.c: New test.
 * gcc.target/powerpc/vec-minmax-2.c: Likewise.

Please ensure that the indentation is correct for the case statements;
it was unclear from the text pasted into the email.

Okay with those clarifications.

Thanks, David


Re: Ping ^ 2: [PATCH] rs6000: Remove unspecs for vec_mrghl[bhw]

2021-10-12 Thread David Edelsohn via Gcc-patches
Hi, Xionghu

What's the status of the \M and \m testcase beautification requested
by Segher?  Did you send an updated patch? Your messages ping the
version prior to Segher's additional comments.

It seems that the changes to the patterns are complete, but there are
remaining questions about the testcase style and if the instruction
counts are ideal. I trust that the instruction counts match the
behavior after the patch, but it seemed that Segher wanted to confirm
that the counts are the values desired / expected from optimal code
generation.  The counts are the total for the file, which doesn't
communicate if the sequences themselves are optimal.

Thanks, David

On Sun, Sep 5, 2021 at 8:54 PM Xionghu Luo  wrote:
>
> Ping^2, thanks.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html
>
>
> On 2021/6/30 09:47, Xionghu Luo via Gcc-patches wrote:
> > Gentle ping, thanks.
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html
> >
> >
> > On 2021/6/9 16:03, Xionghu Luo via Gcc-patches wrote:
> >> Hi,
> >>
> >> On 2021/6/9 07:25, Segher Boessenkool wrote:
> >>> On Mon, May 24, 2021 at 04:02:13AM -0500, Xionghu Luo wrote:
>  vmrghb only accepts permute index {0, 16, 1, 17, 2, 18, 3, 19, 4, 20,
>  5, 21, 6, 22, 7, 23} no matter for BE or LE in ISA, similarly for
>  vmrghlb.
> >>>
> >>> (vmrglb)
> >>>
>  +  if (BYTES_BIG_ENDIAN)
>  +emit_insn (
>  +  gen_altivec_vmrghb_direct (operands[0], operands[1],
>  operands[2]));
>  +  else
>  +emit_insn (
>  +  gen_altivec_vmrglb_direct (operands[0], operands[2],
>  operands[1]));
> >>>
> >>> Please don't indent like that, it doesn't match what we do elsewhere.
> >>> For better or for worse (for worse imo), we use deep hanging indents.
> >>> If you have to, you can do something like
> >>>
> >>>rtx insn;
> >>>if (BYTES_BIG_ENDIAN)
> >>>  insn = gen_altivec_vmrghb_direct (operands[0], operands[1],
> >>> operands[2]);
> >>>else
> >>>  insn = gen_altivec_vmrglb_direct (operands[0], operands[2],
> >>> operands[1]);
> >>>emit_insn (insn);
> >>>
> >>> (this is better even, in that it has only one emit_insn), or even
> >>>
> >>>rtx (*fun) () = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
> >>>: gen_altivec_vmrglb_direct;
> >>>if (!BYTES_BIG_ENDIAN)
> >>>  std::swap (operands[1], operands[2]);
> >>>emit_insn (fun (operands[0], operands[1], operands[2]));
> >>>
> >>> Well, C++ does not allow that last example like that, sigh, so
> >>>rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ?
> >>> gen_altivec_vmrghb_direct
> >>> : gen_altivec_vmrglb_direct;
> >>>
> >>> This is shorter than the other two options ;-)
> >>
> >> Changed.
> >>
> >>>
>  +(define_insn "altivec_vmrghb_direct"
>  [(set (match_operand:V16QI 0 "register_operand" "=v")
>  +(vec_select:V16QI
> >>>
> >>> This should be indented one space more.
> >>>
>  "TARGET_ALTIVEC"
>  "@
>  -   xxmrghw %x0,%x1,%x2
>  -   vmrghw %0,%1,%2"
>  +  xxmrghw %x0,%x1,%x2
>  +  vmrghw %0,%1,%2"
> >>>
> >>> The original indent was correct, please restore.
> >>>
>  -  emit_insn (gen_altivec_vmrghw_direct (operands[0], ve, vo));
>  +  emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve,
>  vo));
> >>>
> >>> When you see a mode as part of a pattern name, chances are that it will
> >>> be a good candidate for using parameterized names with.  (But don't do
> >>> that now, just keep it in mind as a nice cleanup to do).
> >>
> >> OK.
> >>
> >>>
>  @@ -23022,8 +23022,8 @@ altivec_expand_vec_perm_const (rtx target,
>  rtx op0, rtx op1,
>   : CODE_FOR_altivec_vmrglh_direct),
>  {  0,  1, 16, 17,  2,  3, 18, 19,  4,  5, 20, 21,  6,  7,
>  22, 23 } },
>    { OPTION_MASK_ALTIVEC,
>  -  (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct
>  -   : CODE_FOR_altivec_vmrglw_direct),
>  +  (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
>  +   : CODE_FOR_altivec_vmrglw_direct_v4si),
> >>>
> >>> The correct way is to align the ? and the : (or put everything on one
> >>> line of course, if that fits)
> >>>
> >>> The parens around this are not needed btw, and are a distraction.
> >>
> >> Changed.
> >>
> >>>
>  --- a/gcc/testsuite/gcc.target/powerpc/builtins-1.c
>  +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.c
>  @@ -317,10 +317,10 @@ int main ()
>    /* { dg-final { scan-assembler-times "vctuxs" 2 } } */
>    /* { dg-final { scan-assembler-times "vmrghb" 4 { target be } } } */
>  -/* { dg-final { scan-assembler-times "vmrghb" 5 { target le } } } */
>  +/* { dg-final { scan-assembler-times "vmrghb" 6 { target le } } } */
>    /* { dg-final { scan-assembler-times "vmrghh" 8 } } */
>  -/* { dg-final { scan-assembler-times "xxmrghw" 8 } } */
>  -/* { dg-final { scan-assembler-times "xxmrglw" 

Re: [PATCH] rs6000: Fix vec_cpsgn parameter order (PR101985)

2021-10-12 Thread David Edelsohn via Gcc-patches
On Fri, Sep 24, 2021 at 11:20 AM Bill Schmidt  wrote:
>
> Hi!
>
> This fixes a bug reported in 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101985.
>
> The vec_cpsgn built-in function API differs in argument order from the
> copysign3 convention.  Currently that pattern is incorrectly used to
> implement vec_cpsgn.  Fix that while leaving the existing pattern in place
> to implement copysignf for vector modes.

It's a little confusing what "that" is.  Maybe clarify that the patch
is changing the PowerPC VSX function to invoke the GCC built-in with
the argument in the correct order.

>
> Part of the fix when using the new built-in support requires an adjustment
> to a pending patch that replaces much of altivec.h with an automatically
> generated file.  So that adjustment will be coming later...
>
> Also fix a bug in the new built-in overload infrastructure where we were
> using the VSX form of the VEC_COPYSIGN built-in when we should default to
> the VMX form.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
> Is this okay for trunk?
>
> Thanks!
> Bill
>
>
> 2021-09-24  Bill Schmidt  
>
> gcc/
> PR target/101985
> * config/rs6000/altivec.h (vec_cpsgn): Adjust.

Maybe a little more information than "Adjust"?  Swap arguments?

> * config/rs6000/rs6000-overload.def (VEC_COPYSIGN): Use SKIP to
> avoid generating an automatic #define of vec_cpsgn.  Use the
> correct built-in for V4SFmode that doesn't depend on VSX.
>
> gcc/testsuite/
> PR target/101985
> * gcc.target/powerpc/pr101985.c: New.

Okay.

Thanks, David


Re: [PATCH][OBVIOUS] rs6000: fix symtab_node::get == NULL issue

2021-09-15 Thread David Edelsohn via Gcc-patches
This needs an additional adjustment.  The encoding decoration needs to
be applied if the decl isn't an alias.  That means both a null summary
*OR* the decl is not explicitly an alias.

I'm proposing the following:

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index d0830a95027..ad81dfb316d 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -21728,8 +21728,8 @@ rs6000_xcoff_encode_section_info (tree decl, rtx rtl, in
t first)
   if (decl
   && DECL_P (decl)
   && VAR_OR_FUNCTION_DECL_P (decl)
-  && symtab_node::get (decl) != NULL
-  && symtab_node::get (decl)->alias == 0
+  && (symtab_node::get (decl) == NULL
+ || symtab_node::get (decl)->alias == 0)
   && symname[strlen (symname) - 1] != ']')
 {
   const char *smclass = NULL;


On Wed, Sep 15, 2021 at 11:21 AM Martin Liška  wrote:
>
> Hello.
>
> The patch is approved by David and fixes the issue described in the PR.
>
> Martin
>
> PR target/102349
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000.c (rs6000_xcoff_encode_section_info):
> Check that we have a symbol summary for a symbol.
> ---
>   gcc/config/rs6000/rs6000.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index b0ec8108007..d0830a95027 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -21728,6 +21728,7 @@ rs6000_xcoff_encode_section_info (tree decl, rtx rtl, 
> int first)
> if (decl
> && DECL_P (decl)
> && VAR_OR_FUNCTION_DECL_P (decl)
> +  && symtab_node::get (decl) != NULL
> && symtab_node::get (decl)->alias == 0
> && symname[strlen (symname) - 1] != ']')
>   {
> --
> 2.33.0
>


<    1   2   3   4   5   6   7   8   9   10   >