Re: [PATCH]rs6000: avoid peeking eof after __vector keyword
On Mon, Mar 21, 2022 at 5:13 AM Jiufu Guo wrote: > > Hi! > > There is a rare corner case: where __vector is followed only with ";" > and near the end of the file. > > Like the case in PR101168: > using vdbl = __vector double; > #define BREAK 1 > > For this case, "__vector double" is not followed by a PP_NAME, it is > followed by CPP_SEMICOLON and then EOF. In this case, there is no > more tokens in the file. Then, do not need to continue to parse the > file. > > This patch pass bootstrap and regtest on ppc64 and ppc64le. This is okay. Maybe a tweak to the comment, see below. Thanks, David > > > BR, > Jiufu > > > PR preprocessor/101168 > > gcc/ChangeLog: > > * config/rs6000/rs6000-c.cc (rs6000_macro_to_expand): > Avoid empty identifier. > > gcc/testsuite/ChangeLog: > > * g++.target/powerpc/pr101168.C: New test. > > > --- > gcc/config/rs6000/rs6000-c.cc | 4 +++- > gcc/testsuite/g++.target/powerpc/pr101168.C | 6 ++ > 2 files changed, 9 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/g++.target/powerpc/pr101168.C > > diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc > index 3b62b499df2..f8cc7bad812 100644 > --- a/gcc/config/rs6000/rs6000-c.cc > +++ b/gcc/config/rs6000/rs6000-c.cc > @@ -282,7 +282,9 @@ rs6000_macro_to_expand (cpp_reader *pfile, const > cpp_token *tok) > expand_bool_pixel = __pixel_keyword; > else if (ident == C_CPP_HASHNODE (__bool_keyword)) > expand_bool_pixel = __bool_keyword; > - else > + > + /* If it needs to check tokens continue. */ Maybe /* If there are more tokens to check. */ ? > + else if (ident) > { > /* Try two tokens down, too. */ > do > diff --git a/gcc/testsuite/g++.target/powerpc/pr101168.C > b/gcc/testsuite/g++.target/powerpc/pr101168.C > new file mode 100644 > index 000..284e77fdc88 > --- /dev/null > +++ b/gcc/testsuite/g++.target/powerpc/pr101168.C > @@ -0,0 +1,6 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target powerpc_altivec_ok } */ > +/* { dg-options "-maltivec" } */ > + > +using vdbl = __vector double; > +#define BREAK 1 > -- > 2.25.1 >
Re: [PATCH] libgcc: allow building float128 libraries on FreeBSD
I don't have any objection, but the patch is FreeBSD-specific. You are sending the patch from the FreeBSD organization, but I don't know the authority structure within the organization. Andreas Tobler is the FreeBSD maintainer for GCC, but I don't know his current status. Thanks, David On Sun, Feb 20, 2022 at 6:38 PM wrote: > > From: Piotr Kubaj > > While FreeBSD currently uses 64-bit long double, there should be no > problem with adding support for float128. > > Signed-off-by: Piotr Kubaj > --- > libgcc/configure| 22 ++ > libgcc/configure.ac | 11 +++ > 2 files changed, 33 insertions(+) > > diff --git a/libgcc/configure b/libgcc/configure > index 4919a56f518..334d20d1fb1 100755 > --- a/libgcc/configure > +++ b/libgcc/configure > @@ -5300,6 +5300,28 @@ fi > { $as_echo "$as_me:${as_lineno-$LINENO}: result: > $libgcc_cv_powerpc_3_1_float128_hw" >&5 > $as_echo "$libgcc_cv_powerpc_3_1_float128_hw" >&6; } >CFLAGS="$saved_CFLAGS" > +;; > +powerpc*-*-freebsd*) > + saved_CFLAGS="$CFLAGS" > + CFLAGS="$CFLAGS -mabi=altivec -mvsx -mfloat128" > + { $as_echo "$as_me:${as_lineno-$LINENO}: checking for PowerPC ISA 2.06 to > build __float128 libraries" >&5 > +$as_echo_n "checking for PowerPC ISA 2.06 to build __float128 libraries... " > >&6; } > +if ${libgcc_cv_powerpc_float128+:} false; then : > + $as_echo_n "(cached) " >&6 > +else > + cat confdefs.h - <<_ACEOF >conftest.$ac_ext > +/* end confdefs.h. */ > +vector double dadd (vector double a, vector double b) { return a + b; } > +_ACEOF > +if ac_fn_c_try_compile "$LINENO"; then : > + libgcc_cv_powerpc_float128=yes > +else > + libgcc_cv_powerpc_float128=no > +fi > +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext > +fi > +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: > $libgcc_cv_powerpc_float128" >&5 > +$as_echo "$libgcc_cv_powerpc_float128" >&6; } > esac > > # Collect host-machine-specific information. > diff --git a/libgcc/configure.ac b/libgcc/configure.ac > index 13a80b2551b..99ec5d405a4 100644 > --- a/libgcc/configure.ac > +++ b/libgcc/configure.ac > @@ -483,6 +483,17 @@ powerpc*-*-linux*) > [libgcc_cv_powerpc_3_1_float128_hw=yes], > [libgcc_cv_powerpc_3_1_float128_hw=no])]) >CFLAGS="$saved_CFLAGS" > +;; > +powerpc*-*-freebsd*) > + saved_CFLAGS="$CFLAGS" > + CFLAGS="$CFLAGS -mabi=altivec -mvsx -mfloat128" > + AC_CACHE_CHECK([for PowerPC ISA 2.06 to build __float128 libraries], > + [libgcc_cv_powerpc_float128], > + [AC_COMPILE_IFELSE( > +[AC_LANG_SOURCE([vector double dadd (vector double a, vector double b) { > return a + b; }])], > +[libgcc_cv_powerpc_float128=yes], > +[libgcc_cv_powerpc_float128=no])]) > + CFLAGS="$saved_CFLAGS" > esac > > # Collect host-machine-specific information. > -- > 2.35.1 >
Re: [PATCH v3, rs6000] Enable absolute jump table for PPC AIX and Linux
On Tue, Mar 1, 2022 at 12:41 AM HAO CHEN GUI wrote: > > Hi, >This patch enables absolute jump tables on PPC AIX and Linux. For AIX, the > jump > table is placed in data section. For Linux, it is placed in RELRO section when > relocation is needed. > >Bootstrapped and tested on AIX,Linux BE and LE with no regressions. Is > this okay for trunk? > Any recommendations? Thanks a lot. Hi, Haochen Thanks for working on this patch and for revising it. The patch looks okay now, but it is not appropriate for the current bug fixing stage in GCC development -- it needs to wait for GCC Stage 1 later this Spring. Also the current code uses the default readonly data section for the jump table. The new rs6000_xcoff_function_rodata_section follows the existing simple behavior, which is correct, but it should support named data sections instead of placing everything in the default ".data" section, similar to the default ELF code. I will work on that. Thanks, David > > ChangeLog > 2022-03-01 Haochen Gui > > gcc/ > * config/rs6000/aix.h (JUMP_TABLES_IN_TEXT_SECTION): Define. > * config/rs6000/linux64.h (JUMP_TABLES_IN_TEXT_SECTION): Likewise. > * config/rs6000/rs6000.cc (rs6000_option_override_internal): Enable > absolute jump tables for AIX and Linux. > (rs6000_xcoff_function_rodata_section): Implement. > * config/rs6000/xcoff.h (TARGET_ASM_FUNCTION_RODATA_SECTION): Define. > > gcc/testsuite > * gcc.target/powerpc/absolute-jump-table-section.c: New. > > > patch.diff > diff --git a/gcc/config/rs6000/aix.h b/gcc/config/rs6000/aix.h > index ad3238bf09a..cf0708aa08b 100644 > --- a/gcc/config/rs6000/aix.h > +++ b/gcc/config/rs6000/aix.h > @@ -251,9 +251,9 @@ > #define BLOCK_REG_PADDING(MODE, TYPE, FIRST) \ >(!(FIRST) ? PAD_UPWARD : targetm.calls.function_arg_padding (MODE, TYPE)) > > -/* Indicate that jump tables go in the text section. */ > +/* Indicate that jump tables go in the data section. */ > > -#define JUMP_TABLES_IN_TEXT_SECTION 1 > +#define JUMP_TABLES_IN_TEXT_SECTION 0 > > /* Define any extra SPECS that the compiler needs to generate. */ > #undef SUBTARGET_EXTRA_SPECS > diff --git a/gcc/config/rs6000/linux64.h b/gcc/config/rs6000/linux64.h > index b2a7afabc73..440e0fde52b 100644 > --- a/gcc/config/rs6000/linux64.h > +++ b/gcc/config/rs6000/linux64.h > @@ -237,9 +237,9 @@ extern int dot_symbols; > #define TARGET_ALIGN_NATURAL 1 > #endif > > -/* Indicate that jump tables go in the text section. */ > +/* Indicate that jump tables go in the rodata or RELRO section. */ > #undef JUMP_TABLES_IN_TEXT_SECTION > -#define JUMP_TABLES_IN_TEXT_SECTION TARGET_64BIT > +#define JUMP_TABLES_IN_TEXT_SECTION 0 > > /* The linux ppc64 ABI isn't explicit on whether aggregates smaller > than a doubleword should be padded upward or downward. You could > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > index bc3ef0721a4..07f78d3a05b 100644 > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -4954,6 +4954,10 @@ rs6000_option_override_internal (bool global_init_p) > warning (0, "%qs is deprecated and not recommended in any circumstances", > "-mno-speculate-indirect-jumps"); > > + /* Enable absolute jump tables for AIX and Linux. */ > + if (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2) > +rs6000_relative_jumptables = 0; > + >return ret; > } > > @@ -21419,6 +21423,16 @@ rs6000_xcoff_visibility (tree decl) >enum symbol_visibility vis = DECL_VISIBILITY (decl); >return visibility_types[vis]; > } > + > +static section * > +rs6000_xcoff_function_rodata_section (tree decl ATTRIBUTE_UNUSED, > + bool relocatable) > +{ > + if (relocatable) > +return data_section; > + else > +return readonly_data_section; > +} > #endif > > > diff --git a/gcc/config/rs6000/xcoff.h b/gcc/config/rs6000/xcoff.h > index cd0f99cb9c6..0dacd86eed9 100644 > --- a/gcc/config/rs6000/xcoff.h > +++ b/gcc/config/rs6000/xcoff.h > @@ -98,7 +98,7 @@ > #define TARGET_ASM_SELECT_SECTION rs6000_xcoff_select_section > #define TARGET_ASM_SELECT_RTX_SECTION rs6000_xcoff_select_rtx_section > #define TARGET_ASM_UNIQUE_SECTION rs6000_xcoff_unique_section > -#define TARGET_ASM_FUNCTION_RODATA_SECTION default_no_function_rodata_section > +#define TARGET_ASM_FUNCTION_RODATA_SECTION > rs6000_xcoff_function_rodata_section > #define TARGET_STRIP_NAME_ENCODING rs6000_xcoff_strip_name_encoding > #define TARGET_SECTION_TYPE_FLAGS rs6000_xcoff_section_type_flags > #ifdef HAVE_AS_TLS > diff --git a/gcc/testsuite/gcc.target/powerpc/absolute-jump-table-section.c > b/gcc/testsuite/gcc.target/powerpc/absolute-jump-table-section.c > new file mode 100644 > index 000..688a6f42836 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/absolute-jump-table-section.c > @@ -0,0 +1,29 @@ > +/* { dg-do compile { target { *-*-aix* || *-*-linux* } } } */ > +/* { dg-op
Re: [PATCH, 11 backport] rs6000: Fix LE code gen for vec_cnt[lt]z_lsbb [PR95082]
On Thu, Feb 10, 2022 at 4:17 PM Bill Schmidt wrote: > > Hi! > > On 2/10/22 2:50 PM, Segher Boessenkool wrote: > > On Thu, Feb 10, 2022 at 12:22:28PM -0600, Bill Schmidt wrote: > >> This is a backport from mainline 3f30f2d1dbb3228b8468b26239fe60c2974ce2ac. > >> These built-ins were misimplemented as always having big-endian semantics. > >> > >> Because the built-in infrastructure has changed, the modifications to the > >> source are different but achieve the same purpose. The modifications to > >> the test suite are identical (after fixing the issue with -mbig that David > >> pointed out with the original patch). > >> /* 1 argument vector functions added in ISA 3.0 (power9). */ > >> -BU_P9V_AV_1 (VCLZLSBB_V16QI, "vclzlsbb_v16qi", CONST, > >> vclzlsbb_v16qi) > >> -BU_P9V_AV_1 (VCLZLSBB_V8HI, "vclzlsbb_v8hi",CONST, vclzlsbb_v8hi) > >> -BU_P9V_AV_1 (VCLZLSBB_V4SI, "vclzlsbb_v4si",CONST, vclzlsbb_v4si) > >> -BU_P9V_AV_1 (VCTZLSBB_V16QI, "vctzlsbb_v16qi", CONST, > >> vctzlsbb_v16qi) > >> -BU_P9V_AV_1 (VCTZLSBB_V8HI, "vctzlsbb_v8hi",CONST, vctzlsbb_v8hi) > >> -BU_P9V_AV_1 (VCTZLSBB_V4SI, "vctzlsbb_v4si",CONST, vctzlsbb_v4si) > >> +BU_P9V_AV_1 (VCLZLSBB_V16QI, "vclzlsbb_v16qi", CONST, > >> vctzlsbb_v16qi) > >> +BU_P9V_AV_1 (VCLZLSBB_V8HI, "vclzlsbb_v8hi",CONST, vctzlsbb_v8hi) > >> +BU_P9V_AV_1 (VCLZLSBB_V4SI, "vclzlsbb_v4si",CONST, vctzlsbb_v4si) > >> +BU_P9V_AV_1 (VCTZLSBB_V16QI, "vctzlsbb_v16qi", CONST, > >> vclzlsbb_v16qi) > >> +BU_P9V_AV_1 (VCTZLSBB_V8HI, "vctzlsbb_v8hi",CONST, vclzlsbb_v8hi) > >> +BU_P9V_AV_1 (VCTZLSBB_V4SI, "vctzlsbb_v4si",CONST, vclzlsbb_v4si) > > Please change the default to be equal to the builtin name, so, the BE > > version. We do that everywhere else as well, and it makes a lot more > > sense (since everything in Power has BE numbering). > > > > The trunk version has this correct afaics? > > No, trunk has this, for example: > > const signed int __builtin_altivec_vclzlsbb_v16qi (vsc); > VCLZLSBB_V16QI vctzlsbb_v16qi {endian} > > So the backport matches what is on trunk. > > Throughout the new builtin infrastructure, the defaults are set for > little-endian, and the "endian" flag changes behavior for big-endian. > > > > >> --- a/gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-0.c > >> +++ b/gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-0.c > >> @@ -1,6 +1,7 @@ > >> /* { dg-do compile { target { powerpc*-*-* } } } */ > > (Delete the redundant target clause when modifying any testcase, please). > > Okay. > > > >> /* { dg-require-effective-target powerpc_p9vector_ok } */ > >> /* { dg-options "-mdejagnu-cpu=power9" } */ > >> +/* { dg-additional-options "-mbig" { target powerpc64le-*-* } } */ > > You don't need the target clause, if it already is BE by default it does > > not do anything to add it redundantly. > > > > But this is wrong anyway: the name of the target triple does not say > > whether we are BE or LE. Instead you should use the be or le selectors. > > But again, just add -mbig always. > > This was added by David Edelsohn to the trunk version of the patch, because > -mbig actually is not supported on all subtargets. (I found that quite > surprising also.) Apparently this doesn't work on AIX, for example. But > -mlittle works everywhere. Go figure. -mbig/-mlittle only applies to Linux, not AIX and not Darwin. I changed the BE testcases to add "-mbig" for little endian default targets because the compiler implicitly should be operating in big endian mode for other targets and the testcases should succeed. For the LE testcases, I changed the target selector to "powrpc*-*-linux*" because that is the only PowerPC target that can operate as little endian. I could not find a generic "le" target selector. powerpc*-*-linux* understands "-mlittle", so I left the dg-options clause because there is no need to separate out "-mlittle" for that subset of PowerPC targets. Thanks, David > > That's something that should be fixed, I guess, but it's orthogonal > to this patch. > > Thanks! > Bill > > > > >> --- /dev/null > >> +++ b/gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-3.c > >> @@ -0,0 +1,15 @@ > >> +/* { dg-do compile { target { powerpc*-*-* } } } */ > >> +/* { dg-require-effective-target powerpc_p9vector_ok } */ > >> +/* { dg-options "-mdejagnu-cpu=power9 -mlittle" } */ > > And here you do it correctly :-) > > > > Okay with those fixes (all happen a few times). Thanks! > > > > > > Segher
Re: [PATCH] rs6000: Fix up vspltis_shifted [PR102140]
On Tue, Feb 8, 2022 at 12:25 PM Jakub Jelinek wrote: > > Hi! > > The following testcase ICEs, because > (const_vector:V4SI [ > (const_int 0 [0]) repeated x3 > (const_int -2147483648 [0x8000]) > ]) > is recognized as valid easy_vector_constant in between split1 pass and > end of RA. > The problem is that such constants need to be split, and the only > splitter for that is: > (define_split > [(set (match_operand:VM 0 "altivec_register_operand") > (match_operand:VM 1 "easy_vector_constant_vsldoi"))] > "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode) && can_create_pseudo_p ()" > There is only a single splitting pass before RA, so after that finishes, > if something gets matched in between that and end of RA (after that > can_create_pseudo_p () would be no longer true), it will never be > successfully split and we ICE at final.cc time or earlier. > > The i386 backend (and a few others) already use > (cfun->curr_properties & PROP_rtl_split_insns) > as a test for split1 pass finished, so that some insns that should be split > during split1 and shouldn't be matched afterwards are properly guarded. > > So, the following patch does that for vspltis_shifted too. > > Bootstrapped/regtested on powerpc64le-linux, ok for trunk? Okay. Thanks, David > > 2022-02-08 Jakub Jelinek > > PR target/102140 > * config/rs6000/rs6000.cc (vspltis_shifted): Return false also if > split1 pass has finished already. > > * gcc.dg/pr102140.c: New test. > > --- gcc/config/rs6000/rs6000.cc.jj 2022-02-07 17:38:20.873123915 +0100 > +++ gcc/config/rs6000/rs6000.cc 2022-02-08 14:15:31.619505410 +0100 > @@ -6257,8 +6257,11 @@ vspltis_shifted (rtx op) > return false; > >/* We need to create pseudo registers to do the shift, so don't recognize > - shift vector constants after reload. */ > - if (!can_create_pseudo_p ()) > + shift vector constants after reload. Don't match it even before RA > + after split1 is done, because there won't be further splitting pass > + before RA to do the splitting. */ > + if (!can_create_pseudo_p () > + || (cfun->curr_properties & PROP_rtl_split_insns)) > return false; > >nunits = GET_MODE_NUNITS (mode); > --- gcc/testsuite/gcc.dg/pr102140.c.jj 2022-02-08 14:24:25.839041166 +0100 > +++ gcc/testsuite/gcc.dg/pr102140.c 2022-02-08 14:24:03.038359745 +0100 > @@ -0,0 +1,23 @@ > +/* PR target/102140 */ > +/* { dg-do compile { target int128 } } */ > +/* { dg-options "-Og -fipa-cp -fno-tree-ccp -fno-tree-ter -Wno-psabi" } */ > + > +typedef int __attribute__((__vector_size__ (64))) U; > +typedef __int128 __attribute__((__vector_size__ (64))) V; > + > +int a, b; > + > +static void > +bar (char c, V v) > +{ > + v *= c; > + U u = a + (U) v; > + (union { U b; }) { u }; > + b = 0; > +} > + > +void > +foo (void) > +{ > + bar (1, (V){((__int128) 9223372036854775808ULL) << 64}); > +} > > Jakub >
Re: [PATCH] testsuite: Fix up testsuite/gcc.c-torture/execute/builtins/lib/chk.c for powerpc [PR104380]
On Mon, Feb 7, 2022 at 8:20 AM Jakub Jelinek wrote: > > On Fri, Feb 04, 2022 at 12:00:57PM -0500, David Edelsohn via Gcc-patches > wrote: > > > The following testcase FAILs when configured with > > > --with-long-double-format=ieee . Only happens in the -std=c* modes, not > > > the > > > GNU modes; while the glibc headers have __asm redirects of > > > vsnprintf and __vsnprinf_chk to __vsnprintfieee128 and > > > __vsnprintf_chkieee128, the vsnprintf fortification extern inline > > > gnu_inline > > > always_inline wrapper calls __builtin_vsnprintf_chk and we actually emit > > > a call to __vsnprinf_chk (i.e. with IBM extended long double) instead of > > > __vsnprintf_chkieee128. > > > > > > rs6000_mangle_decl_assembler_name already had cases for *printf and > > > *scanf, > > > so this just adds another case for *printf_chk. *scanf_chk doesn't exist. > > > __ prefixing isn't done because *printf_chk already starts with __. > > > > > > Bootstrapped/regtested on powerpc64le-linux, ok for trunk? > > > > Okay. > > Unfortunately, while I've tested the testcase also with -mabi=ieeelongdouble > by hand, the full bootstrap/regtest was on GCCFarm where glibc is too old > to test with --with-long-double-format=ieee. > I've done full bootstrap/regtest with that option during the weekend and > the patch regressed: > FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -O1 > FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -O2 > FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -O2 -flto > -fno-use-linker-plugin -flto-partition=none > FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -O2 -flto > -fuse-linker-plugin -fno-fat-lto-objects > FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -O3 > -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions > FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -O3 -g > FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -Og -g > FAIL: gcc.c-torture/execute/builtins/snprintf-chk.c execution, -Os > FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution, -O1 > FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution, -O2 > FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution, -O2 -flto > -fno-use-linker-plugin -flto-partition=none > FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution, -O2 -flto > -fuse-linker-plugin -fno-fat-lto-objects > FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution, -O3 > -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions > FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution, -O3 -g > FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution, -Og -g > FAIL: gcc.c-torture/execute/builtins/sprintf-chk.c execution, -Os > FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution, -O1 > FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution, -O2 > FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution, -O2 -flto > -fno-use-linker-plugin -flto-partition=none > FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution, -O2 -flto > -fuse-linker-plugin -fno-fat-lto-objects > FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution, -O3 > -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions > FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution, -O3 -g > FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution, -Og -g > FAIL: gcc.c-torture/execute/builtins/vsnprintf-chk.c execution, -Os > FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution, -O1 > FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution, -O2 > FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution, -O2 -flto > -fno-use-linker-plugin -flto-partition=none > FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution, -O2 -flto > -fuse-linker-plugin -fno-fat-lto-objects > FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution, -O3 > -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions > FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution, -O3 -g > FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution, -Og -g > FAIL: gcc.c-torture/execute/builtins/vsprintf-chk.c execution, -Os > > The problem is that the execute/builtins/ testsuite wants to override some > of the library functions and with the change we (correctly) call > __*printf_chkieee128 and so lib/chk.c is no longer called but the glibc > APIs are. > > The following patch fixes it. > > Tested on powerpc64le-linux, ok fo
Re: [PATCH] rs6000: Fix up -D_FORTIFY_SOURCE* with -mabi=ieeelongdouble [PR104380]
On Fri, Feb 4, 2022 at 11:58 AM Jakub Jelinek wrote: > > Hi! > > The following testcase FAILs when configured with > --with-long-double-format=ieee . Only happens in the -std=c* modes, not the > GNU modes; while the glibc headers have __asm redirects of > vsnprintf and __vsnprinf_chk to __vsnprintfieee128 and > __vsnprintf_chkieee128, the vsnprintf fortification extern inline gnu_inline > always_inline wrapper calls __builtin_vsnprintf_chk and we actually emit > a call to __vsnprinf_chk (i.e. with IBM extended long double) instead of > __vsnprintf_chkieee128. > > rs6000_mangle_decl_assembler_name already had cases for *printf and *scanf, > so this just adds another case for *printf_chk. *scanf_chk doesn't exist. > __ prefixing isn't done because *printf_chk already starts with __. > > Bootstrapped/regtested on powerpc64le-linux, ok for trunk? Okay. Thanks, David > > 2022-02-04 Jakub Jelinek > > PR target/104380 > * config/rs6000/rs6000.cc (rs6000_mangle_decl_assembler_name): Also > adjust mangling of __builtin*printf_chk. > > * gcc.dg/pr104380.c: New test. > > --- gcc/config/rs6000/rs6000.cc.jj 2022-01-28 10:01:41.224837656 +0100 > +++ gcc/config/rs6000/rs6000.cc 2022-02-04 12:31:27.651715472 +0100 > @@ -28228,6 +28228,7 @@ rs6000_mangle_decl_assembler_name (tree > { > size_t printf_len = strlen ("printf"); > size_t scanf_len = strlen ("scanf"); > + size_t printf_chk_len = strlen ("printf_chk"); > > if (len >= printf_len > && strcmp (name + len - printf_len, "printf") == 0) > @@ -28237,6 +28238,10 @@ rs6000_mangle_decl_assembler_name (tree >&& strcmp (name + len - scanf_len, "scanf") == 0) > newname = xasprintf ("__isoc99_%sieee128", name); > > + else if (len >= printf_chk_len > + && strcmp (name + len - printf_chk_len, "printf_chk") == 0) > + newname = xasprintf ("%sieee128", name); > + > else if (name[len - 1] == 'l') > { > bool uses_ieee128_p = false; > --- gcc/testsuite/gcc.dg/pr104380.c.jj 2022-02-04 12:51:50.152643364 +0100 > +++ gcc/testsuite/gcc.dg/pr104380.c 2022-02-04 12:53:25.092317741 +0100 > @@ -0,0 +1,32 @@ > +/* PR target/104380 */ > +/* This test needs runtime that provides __*_chk functions. */ > +/* { dg-do run { target *-*-linux* *-*-gnu* *-*-uclinux* } } */ > +/* { dg-options "-O2 -std=c99" } */ > + > +#define FORTIFY_SOURCE 2 > +#include > +#include > + > +static char buf[4096]; > +static char gfmt[] = "%Lg"; > + > +static int __attribute__ ((noipa)) > +foo (char *str, const char *fmt, ...) > +{ > + int ret; > + va_list ap; > + va_start (ap, fmt); > + ret = vsnprintf (str, 4096, fmt, ap); > + va_end (ap); > + return ret; > +} > + > +int > +main () > +{ > + long double dval = 128.0L; > + int ret = foo (buf, gfmt, dval); > + if (ret != 3 || __builtin_strcmp (buf, "128") != 0) > +__builtin_abort (); > + return 0; > +} > > Jakub >
Re: [PATCH 0/3] Enable pointer_query caching throughout.
On Thu, Feb 3, 2022 at 6:09 PM Martin Sebor wrote: > > On 2/3/22 15:56, David Edelsohn wrote: > > This series of patches has exploded memory usage and I can no longer > > bootstrap GCC on AIX. > > > > As with the Ranger problem exposed by Aldy's patch last September, > > something is not freeing memory. > > > > Even on systems where GCC still bootstrap, this excessive memory usage > > severely damages GCC compile performance. > > Does the change below by any chance make a difference? (It's just > a hunch, I haven't tested it beyond quickly building stage 1 and > running a few tests.) Hi, Martin Thanks for the quick response. Yes, I am able to restore bootstrap on AIX (32 bit) with the change. Thanks, David > > Martin > > > diff --git a/gcc/pointer-query.h b/gcc/pointer-query.h > index 4c725eeaf34..801a240c38d 100644 > --- a/gcc/pointer-query.h > +++ b/gcc/pointer-query.h > @@ -164,9 +164,9 @@ class pointer_query > struct cache_type > { > /* 1-based indices into cache. */ > -vec indices; > +auto_vec indices; > /* The cache itself. */ > -vec access_refs; > +auto_vec access_refs; > }; > > public:
Re: [PATCH 0/3] Enable pointer_query caching throughout.
This series of patches has exploded memory usage and I can no longer bootstrap GCC on AIX. As with the Ranger problem exposed by Aldy's patch last September, something is not freeing memory. Even on systems where GCC still bootstrap, this excessive memory usage severely damages GCC compile performance. Thanks, David
Re: [PATCH] rs6000: Fix up #include or [PR104239]
On Wed, Jan 26, 2022 at 3:45 PM Jakub Jelinek wrote: > > Hi! > > r12-4717-g7d37abedf58d66 added immintrin.h and x86gprintrin.h headers > to rs6000, these headers are on x86 standalone headers that various > programs include directly rather than including them through > . > Unfortunately, for that change the bmiintrin.h and bmi2intrin.h > headers haven't been adjusted, so the effect is that if one includes them > (without including also x86intrin.h first) #error will trigger. > Furthermore, when including such headers conditionally as some real-world > packages do, this means a regression. > > The following patch fixes it and matches what the x86 bmi{,2}intrin.h > headers do. > > Bootstrapped/regtested on powerpc64le-linux, ok for trunk? Okay. Thanks for catching this. - David > > 2022-01-26 Jakub Jelinek > > PR target/104239 > * config/rs6000/bmiintrin.h: Test _X86GPRINTRIN_H_INCLUDED instead of > _X86INTRIN_H_INCLUDED and adjust #error wording. > * config/rs6000/bmi2intrin.h: Likewise. > > * gcc.target/powerpc/pr104239-1.c: New test. > * gcc.target/powerpc/pr104239-2.c: New test. > > --- gcc/config/rs6000/bmiintrin.h.jj2022-01-11 23:11:21.936296534 +0100 > +++ gcc/config/rs6000/bmiintrin.h 2022-01-26 13:35:08.705945170 +0100 > @@ -29,8 +29,8 @@ > standard C or GNU C extensions, which are more portable and better > optimized across multiple targets. */ > > -#if !defined _X86INTRIN_H_INCLUDED > -# error "Never use directly; include instead." > +#if !defined _X86GPRINTRIN_H_INCLUDED > +# error "Never use directly; include instead." > #endif > > #ifndef _BMIINTRIN_H_INCLUDED > --- gcc/config/rs6000/bmi2intrin.h.jj 2022-01-11 23:11:21.936296534 +0100 > +++ gcc/config/rs6000/bmi2intrin.h 2022-01-26 13:34:53.373162122 +0100 > @@ -29,8 +29,8 @@ > standard C or GNU C extensions, which are more portable and better > optimized across multiple targets. */ > > -#if !defined _X86INTRIN_H_INCLUDED > -# error "Never use directly; include instead." > +#if !defined _X86GPRINTRIN_H_INCLUDED > +# error "Never use directly; include > instead." > #endif > > #ifndef _BMI2INTRIN_H_INCLUDED > --- gcc/testsuite/gcc.target/powerpc/pr104239-1.c.jj2022-01-26 > 13:42:34.103643030 +0100 > +++ gcc/testsuite/gcc.target/powerpc/pr104239-1.c 2022-01-26 > 13:42:23.101798701 +0100 > @@ -0,0 +1,9 @@ > +/* PR target/104239 */ > +/* { dg-do compile } */ > +/* { dg-options "-DNO_WARN_X86_INTRINSICS" } */ > + > +#if __has_include() > +#include > +#endif > + > +int i; > --- gcc/testsuite/gcc.target/powerpc/pr104239-2.c.jj2022-01-26 > 13:42:42.279527345 +0100 > +++ gcc/testsuite/gcc.target/powerpc/pr104239-2.c 2022-01-26 > 13:42:23.101798701 +0100 > @@ -0,0 +1,9 @@ > +/* PR target/104239 */ > +/* { dg-do compile } */ > +/* { dg-options "-DNO_WARN_X86_INTRINSICS" } */ > + > +#if __has_include() > +#include > +#endif > + > +int i; > > Jakub >
Re: [PATCH] Disable -fsplit-stack support on non-glibc targets
This patch broke bootstrap on AIX. It may have broken Darwin. I have applied the following patch. AIX doesn't need to distinguish between different Linux libc implementations. Bootstrapped on powerpc-ibm-aix7.2.3.0 Thanks, David aix: AIX is not GLIBC. A recent patch added tests for OPTION_GLIBC that is defined in linux.h and linux64.h. This broke bootstrap for non-Linux rs6000 configurations. This patch defines OPTION_GLIBC as 0. * config/rs6000/aix.h (OPTION_GLIBC): Define as 0. diff --git a/gcc/config/rs6000/aix.h b/gcc/config/rs6000/aix.h index ad3238bf09a..eb7a0c09f72 100644 --- a/gcc/config/rs6000/aix.h +++ b/gcc/config/rs6000/aix.h @@ -23,6 +23,7 @@ #define DEFAULT_ABI ABI_AIX #undef TARGET_AIX #define TARGET_AIX 1 +#define OPTION_GLIBC 0
Re: Ping: [PATCH] PR 103763, Fix fold-vec-splat-floatdouble on power10.
On Fri, Jan 21, 2022 at 2:56 PM Michael Meissner wrote: > > Ping patch > https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587924.html > > | Date: Fri, 7 Jan 2022 16:05:53 -0500 > | From: Michael Meissner > | Subject: [PATCH] PR 103763, Fix fold-vec-splat-floatdouble on power10. > | Message-ID: This patch is okay. Thanks, David
Re: [PATCH v2, rs6000] Add a combine pattern for CA minus one [PR95737]
On Thu, Jan 20, 2022 at 2:36 AM HAO CHEN GUI wrote: > > Hi, >This patch adds a combine pattern for "CA minus one". As CA only has two > values (0 or 1), we could convert following pattern > (sign_extend:DI (plus:SI (reg:SI 98 ca) > (const_int -1 [0x] > to >(plus:DI (reg:DI 98 ca) > (const_int -1 [0x]))) >With this patch, one unnecessary sign extend is eliminated. > >Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. > Is this okay for trunk? Any recommendations? Thanks a lot. > > ChangeLog > 2022-01-20 Haochen Gui > > gcc/ > * config/rs6000/rs6000.md (extenddi_ca_minus_one): Define. > > gcc/testsuite/ > * gcc.target/powerpc/pr95737.c: New. > > > patch.diff > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index 6ecb0bd6142..1d8b212962f 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -2358,6 +2358,19 @@ (define_insn "subf3_carry_in_xx" >"subfe %0,%0,%0" >[(set_attr "type" "add")]) > > +(define_insn_and_split "*extenddi_ca_minus_one" > + [(set (match_operand:DI 0 "gpc_reg_operand" "=r") > + (sign_extend:DI (plus:SI (reg:SI CA_REGNO) > +(const_int -1] > + "" > + "#" > + "" > + [(parallel [(set (match_dup 0) > + (plus:DI (reg:DI CA_REGNO) > + (const_int -1))) > + (clobber (reg:DI CA_REGNO))])] > + "" > +) > > (define_insn "@neg2" >[(set (match_operand:GPR 0 "gpc_reg_operand" "=r") > diff --git a/gcc/testsuite/gcc.target/powerpc/pr95737.c > b/gcc/testsuite/gcc.target/powerpc/pr95737.c > new file mode 100644 > index 000..94320f23423 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr95737.c > @@ -0,0 +1,10 @@ > +/* PR target/95737 */ > +/* { dg-do compile { target lp64 } } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */ Why does the testcase force power8? This testcase is not specific to Power8 or later. > +/* { dg-final { scan-assembler-not {\mextsw\M} } } */ > + > + > +unsigned long long negativeLessThan (unsigned long long a, unsigned long > long b) > +{ > + return -(a < b); > +} If you're only testing for lp64, the testcase could use "long" instead of "long long". This is okay with those changes. Thanks, David
Re: [PATCH, rs6000] Add a combine pattern for CA minus one [PR95737]
On Wed, Jan 19, 2022 at 2:12 AM HAO CHEN GUI wrote: > > Hi, >This patch adds a combine pattern for "CA minus one". As CA only has two > values (0 or 1), we could convert following pattern > (sign_extend:DI (plus:SI (reg:SI 98 ca) > (const_int -1 [0x] > to >(plus:DI (reg:DI 98 ca) > (const_int -1 [0x]))) > With this patch, it eliminates one unnecessary sign extend. Also in > rs6000, > regclass of CA register is set to NO_REGS. So CA is not in hard register set > and it can't match register_operand. The patch changes it to any_operand. Segher changed the class in 2014. https://gcc.gnu.org/pipermail/gcc-patches/2014-September/399192.html We need to ensure that it still is the correct decision in light of these new patterns. Thanks, David > > Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. > Is this okay for trunk? Any recommendations? Thanks a lot. > > ChangeLog > 2022-01-19 Haochen Gui > > gcc/ > * config/rs6000/predicates.md (ca_operand): Match any_operand as CA > register is not in hard register set. > * config/rs6000/rs6000.md (extenddi_ca_minus_one): Define. > > gcc/testsuite/ > * gcc.target/powerpc/pr95737.c: New. > > > patch.diff > diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md > index c65dfb91f3d..cd2ae1dc8e0 100644 > --- a/gcc/config/rs6000/predicates.md > +++ b/gcc/config/rs6000/predicates.md > @@ -188,7 +188,7 @@ (define_predicate "vlogical_operand" > > ;; Return 1 if op is the carry register. > (define_predicate "ca_operand" > - (match_operand 0 "register_operand") > + (match_operand 0 "any_operand") > { >if (SUBREG_P (op)) > op = SUBREG_REG (op); > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index 6ecb0bd6142..f1b09aad3b5 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -2358,6 +2358,21 @@ (define_insn "subf3_carry_in_xx" >"subfe %0,%0,%0" >[(set_attr "type" "add")]) > > +(define_insn_and_split "*extenddi_ca_minus_one" > + [(set (match_operand:DI 0 "gpc_reg_operand") > + (sign_extend:DI (plus:SI (match_operand:SI 1 "ca_operand") > +(const_int -1] > + "" > + "#" > + "" > + [(parallel [(set (match_dup 0) > + (plus:DI (match_dup 2) > + (const_int -1))) > + (clobber (match_dup 2))])] > +{ > + operands[2] = copy_rtx (operands[1]); > + PUT_MODE (operands[2], DImode); > +}) > > (define_insn "@neg2" >[(set (match_operand:GPR 0 "gpc_reg_operand" "=r") > diff --git a/gcc/testsuite/gcc.target/powerpc/pr95737.c > b/gcc/testsuite/gcc.target/powerpc/pr95737.c > new file mode 100644 > index 000..94320f23423 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr95737.c > @@ -0,0 +1,10 @@ > +/* PR target/95737 */ > +/* { dg-do compile { target lp64 } } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */ > +/* { dg-final { scan-assembler-not {\mextsw\M} } } */ > + > + > +unsigned long long negativeLessThan (unsigned long long a, unsigned long > long b) > +{ > + return -(a < b); > +}
Re: [PATCH, rs6000] Add a combine pattern for CA minus one [PR95737]
On Wed, Jan 19, 2022 at 2:12 AM HAO CHEN GUI wrote: > > Hi, >This patch adds a combine pattern for "CA minus one". As CA only has two > values (0 or 1), we could convert following pattern > (sign_extend:DI (plus:SI (reg:SI 98 ca) > (const_int -1 [0x] > to >(plus:DI (reg:DI 98 ca) > (const_int -1 [0x]))) > With this patch, it eliminates one unnecessary sign extend. Also in > rs6000, > regclass of CA register is set to NO_REGS. So CA is not in hard register set > and it can't match register_operand. The patch changes it to any_operand. CA_REGNO should be in class CA_REGS, not class NO_REGS. This seems like a major, latent bug. Thanks, David > > Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. > Is this okay for trunk? Any recommendations? Thanks a lot. > > ChangeLog > 2022-01-19 Haochen Gui > > gcc/ > * config/rs6000/predicates.md (ca_operand): Match any_operand as CA > register is not in hard register set. > * config/rs6000/rs6000.md (extenddi_ca_minus_one): Define. > > gcc/testsuite/ > * gcc.target/powerpc/pr95737.c: New. > > > patch.diff > diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md > index c65dfb91f3d..cd2ae1dc8e0 100644 > --- a/gcc/config/rs6000/predicates.md > +++ b/gcc/config/rs6000/predicates.md > @@ -188,7 +188,7 @@ (define_predicate "vlogical_operand" > > ;; Return 1 if op is the carry register. > (define_predicate "ca_operand" > - (match_operand 0 "register_operand") > + (match_operand 0 "any_operand") > { >if (SUBREG_P (op)) > op = SUBREG_REG (op); > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index 6ecb0bd6142..f1b09aad3b5 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -2358,6 +2358,21 @@ (define_insn "subf3_carry_in_xx" >"subfe %0,%0,%0" >[(set_attr "type" "add")]) > > +(define_insn_and_split "*extenddi_ca_minus_one" > + [(set (match_operand:DI 0 "gpc_reg_operand") > + (sign_extend:DI (plus:SI (match_operand:SI 1 "ca_operand") > +(const_int -1] > + "" > + "#" > + "" > + [(parallel [(set (match_dup 0) > + (plus:DI (match_dup 2) > + (const_int -1))) > + (clobber (match_dup 2))])] > +{ > + operands[2] = copy_rtx (operands[1]); > + PUT_MODE (operands[2], DImode); > +}) > > (define_insn "@neg2" >[(set (match_operand:GPR 0 "gpc_reg_operand" "=r") > diff --git a/gcc/testsuite/gcc.target/powerpc/pr95737.c > b/gcc/testsuite/gcc.target/powerpc/pr95737.c > new file mode 100644 > index 000..94320f23423 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr95737.c > @@ -0,0 +1,10 @@ > +/* PR target/95737 */ > +/* { dg-do compile { target lp64 } } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */ > +/* { dg-final { scan-assembler-not {\mextsw\M} } } */ > + > + > +unsigned long long negativeLessThan (unsigned long long a, unsigned long > long b) > +{ > + return -(a < b); > +}
Re: [PATCH] libstdc++: Implement C++20 atomic and atomic
This patch introduced new AIX testsuite failures. PR libstdc++/104101 Thanks, David
Re: [PATCH] rs6000: Use known constant for GET_MODE_NUNITS and similar
On Fri, Jan 14, 2022 at 5:42 AM Kewen.Lin wrote: > > on 2022/1/13 下午11:15, David Edelsohn wrote: > > On Thu, Jan 13, 2022 at 7:40 AM Kewen.Lin wrote: > >> > >> Hi David, > >> > >> on 2022/1/13 上午11:12, David Edelsohn wrote: > >>> On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin wrote: > >>>> > >>>> Hi, > >>>> > >>>> This patch is to clean up some codes with GET_MODE_UNIT_SIZE or > >>>> GET_MODE_NUNITS, which can use known constant instead. > >>> > >>> I'll let Segher decide, but often the additional code is useful > >>> self-documentation instead of magic constants. Or at least the change > >>> requires comments documenting the derivation of the constants > >>> currently described by the code itself. > >>> > >> > >> Thanks for the comments, I added some comments as suggested, also removed > >> the whole "altivec_vreveti2" since I noticed it's useless, it's not used > >> by any built-in functions and even unused in the commit db042e1603db50573. > >> > >> The updated version has been tested as before. > > > > As we have discussed offline, the comments need to be clarified and > > expanded. > > > > And the removal of altivec_vreveti2 should be confirmed with Carl > > Love, who added the pattern less than a year ago. There may be another > > patch planning to use it. > > > > Thanks for the suggestions David! The comments have been updated, and Carl > also helped to confirm the altivec_vreveti2 pattern is not planned for any > future work and looks reasonable to remove. > > Does this updated version look good to you? The revised patch is okay. Thanks, David > > BR, > Kewen > - > gcc/ChangeLog: > > * config/rs6000/altivec.md (altivec_vreveti2): Remove. > * config/rs6000/vsx.md (*vsx_extract_si, > *vsx_extract_si_float_df, > *vsx_extract_si_float_, *vsx_insert_extract_v4sf_p9): Use > known constant values to simplify code. > --- > gcc/config/rs6000/altivec.md | 25 - > gcc/config/rs6000/vsx.md | 16 > 2 files changed, 12 insertions(+), 29 deletions(-) > > diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md > index 950b17862c4..4412175a0dc 100644 > --- a/gcc/config/rs6000/altivec.md > +++ b/gcc/config/rs6000/altivec.md > @@ -3950,31 +3950,6 @@ >DONE; > }) > > -;; Vector reverse elements > -(define_expand "altivec_vreveti2" > - [(set (match_operand:TI 0 "register_operand" "=v") > - (unspec:TI [(match_operand:TI 1 "register_operand" "v")] > - UNSPEC_VREVEV))] > - "TARGET_ALTIVEC" > -{ > - int i, j, size, num_elements; > - rtvec v = rtvec_alloc (16); > - rtx mask = gen_reg_rtx (V16QImode); > - > - size = GET_MODE_UNIT_SIZE (TImode); > - num_elements = GET_MODE_NUNITS (TImode); > - > - for (j = 0; j < num_elements; j++) > -for (i = 0; i < size; i++) > - RTVEC_ELT (v, i + j * size) > - = GEN_INT (i + (num_elements - 1 - j) * size); > - > - emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v))); > - emit_insn (gen_altivec_vperm_ti (operands[0], operands[1], > -operands[1], mask)); > - DONE; > -}) > - > ;; Vector reverse elements for V16QI V8HI V4SI V4SF > (define_expand "altivec_vreve2" >[(set (match_operand:VEC_K 0 "register_operand" "=v") > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md > index acd729d1687..ee748ff4ebd 100644 > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -3854,8 +3854,10 @@ >rtx vec_tmp = operands[3]; >int value; > > + /* Adjust index for LE element ordering, the below minuend 3 is computed by > + GET_MODE_NUNITS (V4SImode) - 1. */ >if (!BYTES_BIG_ENDIAN) > -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element)); > +element = GEN_INT (3 - INTVAL (element)); > >/* If the value is in the correct position, we can avoid doing the VSPLT > instruction. */ > @@ -4230,8 +4232,10 @@ >rtx v4si_tmp = operands[3]; >int value; > > + /* Adjust index for LE element ordering, the below minuend 3 is computed by > + GET_MODE_NUNITS (V4SImode) - 1. */ >if (!BYTES_BIG_ENDIAN) > -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element)); > +element = GEN_INT (3 - INTVAL (element)); > >/* If the value is in
Re: [PATCH] rs6000: Fix constraint v with rs6000_constraints[RS6000_CONSTRAINT_v]
On Thu, Jan 13, 2022 at 7:28 AM Kewen.Lin wrote: > > on 2022/1/13 上午11:56, Kewen.Lin via Gcc-patches wrote: > > on 2022/1/13 上午11:44, David Edelsohn wrote: > >> On Wed, Jan 12, 2022 at 10:38 PM Kewen.Lin wrote: > >>> > >>> Hi David, > >>> > >>> on 2022/1/13 上午11:07, David Edelsohn wrote: > >>>> On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin wrote: > >>>>> > >>>>> Hi, > >>>>> > >>>>> This patch is to fix register constraint v with > >>>>> rs6000_constraints[RS6000_CONSTRAINT_v] instead of ALTIVEC_REGS, > >>>>> just like some other existing register constraints with > >>>>> RS6000_CONSTRAINT_*. > >>>>> > >>>>> I happened to see this and hope it's not intentional and just > >>>>> got neglected. > >>>>> > >>>>> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and > >>>>> powerpc64-linux-gnu P8. > >>>>> > >>>>> Is it ok for trunk? > >>>> > >>>> Why do you want to make this change? > >>>> > >>>> rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS; > >>>> > >>>> but all of the patterns that use a "v" constraint are (or should be) > >>>> protected by TARGET_ALTIVEC, or some final condition that only is > >>>> active for TARGET_ALTIVEC. The other constraints are conditionally > >>>> set because they can be used in a pattern with multiple alternatives > >>>> where the pattern itself is active but some of the constraints > >>>> correspond to NO_REGS when some instruction variants for VSX is not > >>>> enabled. > >>>> > >>> > >>> Good point! Thanks for the explanation. > >>> > >>>> The change isn't wrong, but it doesn't correct a bug and provides no > >>>> additional benefit nor clarty that I can see. > >>>> > >>> > >>> The original intention is to make it consistent with the other existing > >>> register constraints with RS6000_CONSTRAINT_*, otherwise it looks a bit > >>> weird (like was neglected). After you clarified above, > >>> RS6000_CONSTRAINT_v > >>> seems useless at all in the current framework. Do you prefer to remove > >>> it to avoid any confusions instead? > >> > >> It's used in the reg_class, so there may be some heuristic in the GCC > >> register allocator that cares about the number of registers available > >> for the target. rs6000_constraints[RS6000_CONSTRAINT_v] is defined > >> conditionally, so it seems best to leave it as is. > >> > > > > I may miss something, but I didn't find it's used for the above purposes. > > If it's best to leave it as is, the proposed patch seems to offer better > > readability. > > Two more inputs for maintainers' decision: > > 1) the original proposed patch fixed one "bug" that is: > > In function rs6000_debug_reg_global, it tries to print the register class > for the register constraint: > > fprintf (stderr, >"\n" >"d reg_class = %s\n" >"f reg_class = %s\n" >"v reg_class = %s\n" >"wa reg_class = %s\n" >... >"\n", >reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_d]], >reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_f]], >reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_v]], >reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wa]], >... > > It uses rs6000_constraints[RS6000_CONSTRAINT_v] which is conditionally > set here: > > /* Add conditional constraints based on various options, to allow us to > collapse multiple insn patterns. */ > if (TARGET_ALTIVEC) > rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS; > > But the actual register class for register constraint is hardcoded as > ALTIVEC_REGS rather than rs6000_constraints[RS6000_CONSTRAINT_v]. I agree that the information is inaccurate, but it is informal debugging output. And if Altivec is disabled, the value of the constraint is irrelevant / garbage. > > 2) Bootstrapped and tested one below patch to remove all the code using > RS6000_CONSTRAINT_v on powerpc64le-linux-gnu P10 and P9, > powerpc64-linux-gnu P
Re: [PATCH] rs6000: Use known constant for GET_MODE_NUNITS and similar
On Thu, Jan 13, 2022 at 7:40 AM Kewen.Lin wrote: > > Hi David, > > on 2022/1/13 上午11:12, David Edelsohn wrote: > > On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin wrote: > >> > >> Hi, > >> > >> This patch is to clean up some codes with GET_MODE_UNIT_SIZE or > >> GET_MODE_NUNITS, which can use known constant instead. > > > > I'll let Segher decide, but often the additional code is useful > > self-documentation instead of magic constants. Or at least the change > > requires comments documenting the derivation of the constants > > currently described by the code itself. > > > > Thanks for the comments, I added some comments as suggested, also removed > the whole "altivec_vreveti2" since I noticed it's useless, it's not used > by any built-in functions and even unused in the commit db042e1603db50573. > > The updated version has been tested as before. As we have discussed offline, the comments need to be clarified and expanded. And the removal of altivec_vreveti2 should be confirmed with Carl Love, who added the pattern less than a year ago. There may be another patch planning to use it. Thanks, David > > BR, > Kewen > - > gcc/ChangeLog: > > * config/rs6000/altivec.md (altivec_vreveti2): Remove. > * config/rs6000/vsx.md (*vsx_extract_si, > *vsx_extract_si_float_df, > *vsx_extract_si_float_, *vsx_insert_extract_v4sf_p9): Use > known constant values to simplify code. > --- > gcc/config/rs6000/altivec.md | 25 - > gcc/config/rs6000/vsx.md | 12 > 2 files changed, 8 insertions(+), 29 deletions(-) > > diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md > index c2312cc1e0f..b7f056f8c60 100644 > --- a/gcc/config/rs6000/altivec.md > +++ b/gcc/config/rs6000/altivec.md > @@ -3950,31 +3950,6 @@ (define_expand "altivec_negv4sf2" >DONE; > }) > > -;; Vector reverse elements > -(define_expand "altivec_vreveti2" > - [(set (match_operand:TI 0 "register_operand" "=v") > - (unspec:TI [(match_operand:TI 1 "register_operand" "v")] > - UNSPEC_VREVEV))] > - "TARGET_ALTIVEC" > -{ > - int i, j, size, num_elements; > - rtvec v = rtvec_alloc (16); > - rtx mask = gen_reg_rtx (V16QImode); > - > - size = GET_MODE_UNIT_SIZE (TImode); > - num_elements = GET_MODE_NUNITS (TImode); > - > - for (j = 0; j < num_elements; j++) > -for (i = 0; i < size; i++) > - RTVEC_ELT (v, i + j * size) > - = GEN_INT (i + (num_elements - 1 - j) * size); > - > - emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v))); > - emit_insn (gen_altivec_vperm_ti (operands[0], operands[1], > -operands[1], mask)); > - DONE; > -}) > - > ;; Vector reverse elements for V16QI V8HI V4SI V4SF > (define_expand "altivec_vreve2" >[(set (match_operand:VEC_K 0 "register_operand" "=v") > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md > index 802db0d112b..d246410880d 100644 > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -3854,8 +3854,9 @@ (define_insn_and_split "*vsx_extract_si" >rtx vec_tmp = operands[3]; >int value; > > + /* Adjust index for LE element ordering. */ >if (!BYTES_BIG_ENDIAN) > -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element)); > +element = GEN_INT (3 - INTVAL (element)); > >/* If the value is in the correct position, we can avoid doing the VSPLT > instruction. */ > @@ -4230,8 +4231,9 @@ (define_insn_and_split "*vsx_extract_si_float_df" >rtx v4si_tmp = operands[3]; >int value; > > + /* Adjust index for LE element ordering. */ >if (!BYTES_BIG_ENDIAN) > -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element)); > +element = GEN_INT (3 - INTVAL (element)); > >/* If the value is in the correct position, we can avoid doing the VSPLT > instruction. */ > @@ -4273,8 +4275,9 @@ (define_insn_and_split > "*vsx_extract_si_float_" >rtx df_tmp = operands[4]; >int value; > > + /* Adjust index for LE element ordering. */ >if (!BYTES_BIG_ENDIAN) > -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element)); > +element = GEN_INT (3 - INTVAL (element)); > >/* If the value is in the correct position, we can avoid doing the VSPLT > instruction. */ > @@ -4466,8 +4469,9 @@ (define_insn "*vsx_insert_extract_v4sf_p9" > { >int ele = INTVAL (operands[4]); > > + /* Adjust index for LE element ordering. */ >if (!BYTES_BIG_ENDIAN) > -ele = GET_MODE_NUNITS (V4SFmode) - 1 - ele; > +ele = 3 - ele; > >operands[4] = GEN_INT (GET_MODE_SIZE (SFmode) * ele); >return "xxinsertw %x0,%x2,%4"; > -- > 2.27.0 >
Re: [PATCH] rs6000: Fix constraint v with rs6000_constraints[RS6000_CONSTRAINT_v]
On Wed, Jan 12, 2022 at 10:38 PM Kewen.Lin wrote: > > Hi David, > > on 2022/1/13 上午11:07, David Edelsohn wrote: > > On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin wrote: > >> > >> Hi, > >> > >> This patch is to fix register constraint v with > >> rs6000_constraints[RS6000_CONSTRAINT_v] instead of ALTIVEC_REGS, > >> just like some other existing register constraints with > >> RS6000_CONSTRAINT_*. > >> > >> I happened to see this and hope it's not intentional and just > >> got neglected. > >> > >> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and > >> powerpc64-linux-gnu P8. > >> > >> Is it ok for trunk? > > > > Why do you want to make this change? > > > > rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS; > > > > but all of the patterns that use a "v" constraint are (or should be) > > protected by TARGET_ALTIVEC, or some final condition that only is > > active for TARGET_ALTIVEC. The other constraints are conditionally > > set because they can be used in a pattern with multiple alternatives > > where the pattern itself is active but some of the constraints > > correspond to NO_REGS when some instruction variants for VSX is not > > enabled. > > > > Good point! Thanks for the explanation. > > > The change isn't wrong, but it doesn't correct a bug and provides no > > additional benefit nor clarty that I can see. > > > > The original intention is to make it consistent with the other existing > register constraints with RS6000_CONSTRAINT_*, otherwise it looks a bit > weird (like was neglected). After you clarified above, RS6000_CONSTRAINT_v > seems useless at all in the current framework. Do you prefer to remove > it to avoid any confusions instead? It's used in the reg_class, so there may be some heuristic in the GCC register allocator that cares about the number of registers available for the target. rs6000_constraints[RS6000_CONSTRAINT_v] is defined conditionally, so it seems best to leave it as is. Thanks, David
Re: [PATCH] rs6000: Use known constant for GET_MODE_NUNITS and similar
On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin wrote: > > Hi, > > This patch is to clean up some codes with GET_MODE_UNIT_SIZE or > GET_MODE_NUNITS, which can use known constant instead. I'll let Segher decide, but often the additional code is useful self-documentation instead of magic constants. Or at least the change requires comments documenting the derivation of the constants currently described by the code itself. Thanks, David > > Bootstrapped and regtested on powerpc64le-linux-gnu P9 and > powerpc64-linux-gnu P8. > > Is it ok for trunk? > > BR, > Kewen > - > gcc/ChangeLog: > > * config/rs6000/altivec.md (altivec_vreveti2): Use known constant > values to simplify code. > * config/rs6000/vsx.md (*vsx_extract_si, > *vsx_extract_si_float_df, > *vsx_extract_si_float_, *vsx_insert_extract_v4sf_p9): > Likewise. > --- > gcc/config/rs6000/altivec.md | 11 +++ > gcc/config/rs6000/vsx.md | 8 > 2 files changed, 7 insertions(+), 12 deletions(-) > > diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md > index c2312cc1e0f..d5c4ecfa9b7 100644 > --- a/gcc/config/rs6000/altivec.md > +++ b/gcc/config/rs6000/altivec.md > @@ -3957,17 +3957,12 @@ (define_expand "altivec_vreveti2" > UNSPEC_VREVEV))] >"TARGET_ALTIVEC" > { > - int i, j, size, num_elements; > + int i; >rtvec v = rtvec_alloc (16); >rtx mask = gen_reg_rtx (V16QImode); > > - size = GET_MODE_UNIT_SIZE (TImode); > - num_elements = GET_MODE_NUNITS (TImode); > - > - for (j = 0; j < num_elements; j++) > -for (i = 0; i < size; i++) > - RTVEC_ELT (v, i + j * size) > - = GEN_INT (i + (num_elements - 1 - j) * size); > + for (i = 0; i < 16; i++) > +RTVEC_ELT (v, i) = GEN_INT (i); > >emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v))); >emit_insn (gen_altivec_vperm_ti (operands[0], operands[1], > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md > index 802db0d112b..892b99c6d6b 100644 > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -3855,7 +3855,7 @@ (define_insn_and_split "*vsx_extract_si" >int value; > >if (!BYTES_BIG_ENDIAN) > -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element)); > +element = GEN_INT (3 - INTVAL (element)); > >/* If the value is in the correct position, we can avoid doing the VSPLT > instruction. */ > @@ -4231,7 +4231,7 @@ (define_insn_and_split "*vsx_extract_si_float_df" >int value; > >if (!BYTES_BIG_ENDIAN) > -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element)); > +element = GEN_INT (3 - INTVAL (element)); > >/* If the value is in the correct position, we can avoid doing the VSPLT > instruction. */ > @@ -4274,7 +4274,7 @@ (define_insn_and_split > "*vsx_extract_si_float_" >int value; > >if (!BYTES_BIG_ENDIAN) > -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element)); > +element = GEN_INT (3 - INTVAL (element)); > >/* If the value is in the correct position, we can avoid doing the VSPLT > instruction. */ > @@ -4467,7 +4467,7 @@ (define_insn "*vsx_insert_extract_v4sf_p9" >int ele = INTVAL (operands[4]); > >if (!BYTES_BIG_ENDIAN) > -ele = GET_MODE_NUNITS (V4SFmode) - 1 - ele; > +ele = 3 - ele; > >operands[4] = GEN_INT (GET_MODE_SIZE (SFmode) * ele); >return "xxinsertw %x0,%x2,%4"; > -- > 2.27.0 >
Re: [PATCH] rs6000: Fix constraint v with rs6000_constraints[RS6000_CONSTRAINT_v]
On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin wrote: > > Hi, > > This patch is to fix register constraint v with > rs6000_constraints[RS6000_CONSTRAINT_v] instead of ALTIVEC_REGS, > just like some other existing register constraints with > RS6000_CONSTRAINT_*. > > I happened to see this and hope it's not intentional and just > got neglected. > > Bootstrapped and regtested on powerpc64le-linux-gnu P9 and > powerpc64-linux-gnu P8. > > Is it ok for trunk? Why do you want to make this change? rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS; but all of the patterns that use a "v" constraint are (or should be) protected by TARGET_ALTIVEC, or some final condition that only is active for TARGET_ALTIVEC. The other constraints are conditionally set because they can be used in a pattern with multiple alternatives where the pattern itself is active but some of the constraints correspond to NO_REGS when some instruction variants for VSX is not enabled. The change isn't wrong, but it doesn't correct a bug and provides no additional benefit nor clarty that I can see. Thanks, David
Re: [PATCH, rs6000] Enable absolute jump table by default
On Wed, Jan 12, 2022 at 8:40 PM HAO CHEN GUI wrote: > > Hi David, > > On 12/1/2022 下午 10:44, David Edelsohn wrote: > > On Wed, Jan 12, 2022 at 7:22 AM HAO CHEN GUI wrote: > >> > >> Hi, > >>This patch enables absolute jump table by default on rs6000. The > >> relative jump tables are used when > >>it's explicit set by "rs6000_relative_jumptables", > >>or jump tables are placed in text section but global relocation is > >> required. > >> > >>Bootstrapped and tested on powerpc64-linux BE and LE with no > >> regressions. Is this okay for trunk? > >> Any recommendations? Thanks a lot. > >> > >> ChangeLog > >> 2022-01-12 Haochen Gui > >> > >> gcc/ > >> * config/rs6000/linux64.h (JUMP_TABLES_IN_TEXT_SECTION): Define. > >> * config/rs6000/rs6000.c (rs6000_gen_pic_addr_diff_vec): Return > >> true when relative jump table is explicit required or jump tables > >> have > >> to be put in text section but global relocation is also required. > >> * config/rs6000/rs6000.opt (rs6000_relative_jumptables): Disable. > >> > >> patch.diff > >> diff --git a/gcc/config/rs6000/linux64.h b/gcc/config/rs6000/linux64.h > >> index d617f346f81..2e257c60f8c 100644 > >> --- a/gcc/config/rs6000/linux64.h > >> +++ b/gcc/config/rs6000/linux64.h > >> @@ -239,7 +239,7 @@ extern int dot_symbols; > >> > >> /* Indicate that jump tables go in the text section. */ > >> #undef JUMP_TABLES_IN_TEXT_SECTION > >> -#define JUMP_TABLES_IN_TEXT_SECTION TARGET_64BIT > >> +#define JUMP_TABLES_IN_TEXT_SECTION 0 > >> > >> /* The linux ppc64 ABI isn't explicit on whether aggregates smaller > >> than a doubleword should be padded upward or downward. You could > >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > >> index 319182e94d9..9fba893a27a 100644 > >> --- a/gcc/config/rs6000/rs6000.c > >> +++ b/gcc/config/rs6000/rs6000.c > >> @@ -28465,7 +28465,9 @@ rs6000_emit_xxspltidp_v2df (rtx dst, long value) > >> static bool > >> rs6000_gen_pic_addr_diff_vec (void) > >> { > >> - return rs6000_relative_jumptables; > >> + return rs6000_relative_jumptables > >> +|| (JUMP_TABLES_IN_TEXT_SECTION > >> +&& targetm.asm_out.reloc_rw_mask () == 3); > >> } > > > > This seems like contorted logic and overriding the > > rs6000_relative_jumptables option change. The later part of the patch > > overrides rs6000_relative_jumptables for all rs6000 configurations, > > and then changes this one use of rs6000_relative_jumptables to add > > more logic to revert to the old meaning for some targets. > > > > What about all of the other uses of rs6000_relative_jumptables in the > > target? What about rs6000.md? > > > > I highly doubt that this patch is correct. > > > > Why not override rs6000_relative_jumptables for PPC64 Linux instead of > > changing its value globally and then trying to fix it up? > > > > Thanks, David > Thanks for your comments. > > In this patch, I tried to enable absolute jump table on all rs6000 targets. > For PPC64 Linux, it supports RELRO section (e.g. .data.rel.ro.local) as > "JUMP_TABLES_IN_TEXT_SECTION" is set to 0. So, absolute jump tables could be > placed > in RELRO section whatever global relocation is required or not. The absolute > jump > table can't be placed in text section when global relocation is required as > text > section is not writable. So for other rs6000 targets, absolute jump table > can't be > used if the target doesn't support RELRO and global relocation is required > also. > > Looking forward to your advice. Thanks again. Why not override rs6000_relative_jumptables in rs6000_option_override_internal() for PPC64 Linux subtarget instead of changing the default value and then trying to fix the behavior for other configurations in rs6000_gen_pic_addr_diff_vec()? Or override it in SUBSUBTARGET_OVERRIDE_OPTIONS in linux64.h, or whichever header file(s) is appropriate for the subtarget variants. Your initial patch also changed rs6000_gen_pic_addr_diff_vec but didn't address the use of rs6000_relative_jumptables in the definition of CASE_VECTOR_MODE in rs6000.h. That cannot have been correct. At least without the change to the default value of rs6000_relative_jumptables you don't need to add kludges to all of the places where that variable is used for other subtarget variants of the rs6000 target. Thanks, David
Re: [PATCH, rs6000] Enable absolute jump table by default
On Wed, Jan 12, 2022 at 7:22 AM HAO CHEN GUI wrote: > > Hi, >This patch enables absolute jump table by default on rs6000. The relative > jump tables are used when >it's explicit set by "rs6000_relative_jumptables", >or jump tables are placed in text section but global relocation is > required. > >Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. > Is this okay for trunk? > Any recommendations? Thanks a lot. > > ChangeLog > 2022-01-12 Haochen Gui > > gcc/ > * config/rs6000/linux64.h (JUMP_TABLES_IN_TEXT_SECTION): Define. > * config/rs6000/rs6000.c (rs6000_gen_pic_addr_diff_vec): Return > true when relative jump table is explicit required or jump tables have > to be put in text section but global relocation is also required. > * config/rs6000/rs6000.opt (rs6000_relative_jumptables): Disable. > > patch.diff > diff --git a/gcc/config/rs6000/linux64.h b/gcc/config/rs6000/linux64.h > index d617f346f81..2e257c60f8c 100644 > --- a/gcc/config/rs6000/linux64.h > +++ b/gcc/config/rs6000/linux64.h > @@ -239,7 +239,7 @@ extern int dot_symbols; > > /* Indicate that jump tables go in the text section. */ > #undef JUMP_TABLES_IN_TEXT_SECTION > -#define JUMP_TABLES_IN_TEXT_SECTION TARGET_64BIT > +#define JUMP_TABLES_IN_TEXT_SECTION 0 > > /* The linux ppc64 ABI isn't explicit on whether aggregates smaller > than a doubleword should be padded upward or downward. You could > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index 319182e94d9..9fba893a27a 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -28465,7 +28465,9 @@ rs6000_emit_xxspltidp_v2df (rtx dst, long value) > static bool > rs6000_gen_pic_addr_diff_vec (void) > { > - return rs6000_relative_jumptables; > + return rs6000_relative_jumptables > +|| (JUMP_TABLES_IN_TEXT_SECTION > +&& targetm.asm_out.reloc_rw_mask () == 3); > } This seems like contorted logic and overriding the rs6000_relative_jumptables option change. The later part of the patch overrides rs6000_relative_jumptables for all rs6000 configurations, and then changes this one use of rs6000_relative_jumptables to add more logic to revert to the old meaning for some targets. What about all of the other uses of rs6000_relative_jumptables in the target? What about rs6000.md? I highly doubt that this patch is correct. Why not override rs6000_relative_jumptables for PPC64 Linux instead of changing its value globally and then trying to fix it up? Thanks, David > > void > diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt > index c2a77182a9e..75e3fa86829 100644 > --- a/gcc/config/rs6000/rs6000.opt > +++ b/gcc/config/rs6000/rs6000.opt > @@ -630,7 +630,7 @@ Target Mask(MMA) Var(rs6000_isa_flags) > Generate (do not generate) MMA instructions. > > mrelative-jumptables > -Target Undocumented Var(rs6000_relative_jumptables) Init(1) Save > +Target Undocumented Var(rs6000_relative_jumptables) Init(0) Save > > mrop-protect > Target Var(rs6000_rop_protect) Init(0)
libgfortran bootstrap failure
The recent patch to support Power IEEE128 causes a bootstrap failure on AIX and possibly all non-GLIBC systems. +#if defined(__powerpc64__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ \ +&& defined __GLIBC_PREREQ && __GLIBC_PREREQ (2, 32) +#define POWER_IEEE128 1 +#endif __GLIBC_PREREQ is tested on all systems. /nasfarm/edelsohn/src/src/libgfortran/libgfortran.h:107:49: error: missing binary operator before token "(" 107 | && defined __GLIBC_PREREQ && __GLIBC_PREREQ (2, 32) | ^
Re: [PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics
Suppress exceptions (when specified), by saving, manipulating, and restoring the FPSCR. Similarly, save, set, and restore the floating-point rounding mode when required. No attempt is made to optimize writing the FPSCR (by checking if the new value would be the same), other than using lighter weight instructions when possible. Note that explicit instruction scheduling "barriers" are added to prevent floating-point computations from being moved before or after the explicit FPSCR manipulations. (That these are required has been reported as an issue in GCC: PR102783.) The scalar versions naively use the parallel versions to compute the single scalar result and then construct the remainder of the result. Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO are swapped from the corresponding values on x86 so as to match the corresponding rounding mode values in the Power ISA. Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and convert _mm_ceil* and _mm_floor* into macros. This matches the current analogous implementations in config/i386/smmintrin.h. Function signatures match the analogous functions in config/i386/smmintrin.h. Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss, modeled after the very similar "floor" and "ceil" tests. Include basic tests, plus tests at the boundaries for floating-point representation, positive and negative, test all of the parameterized rounding modes as well as the C99 rounding modes and interactions between the two. Exceptions are not explicitly tested. 2021-10-18 Paul A. Clarke gcc * config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT, _MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF, _MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC, _MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC, _MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New. * config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd, _mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss): Convert from function to macro. gcc/testsuite * gcc.target/powerpc/sse4_1-round3.h: New. * gcc.target/powerpc/sse4_1-roundpd.c: New. * gcc.target/powerpc/sse4_1-roundps.c: New. * gcc.target/powerpc/sse4_1-roundsd.c: New. * gcc.target/powerpc/sse4_1-roundss.c: New. Okay. Thanks, David
Re: [PATCH] PR 102935, Fix pr101384-1.c code generation test.
On Tue, Jan 11, 2022 at 12:06 PM Bill Schmidt wrote: > > Hi Mike, > > This looks fine to me. Maintainers? Okay. Thanks, David > > Thanks, > Bill > > On 1/7/22 6:33 PM, Michael Meissner wrote: > > Fix pr101384-1.c code generation test. > > > > Add support for the compiler using XXSPLTIB reg,255 to load all 1's into a > > register on power9 and above instead of using VSPLTI{B,H,W} reg,-1. > > > > gcc/testsuite/ > > 2022-01-07 Michael Meissner > > > > PR testsuite/102935 > > * gcc.target/powerpc/pr101384-1.c: Update insn regexp for power9 > > and power10. > > --- > > gcc/testsuite/gcc.target/powerpc/pr101384-1.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/gcc/testsuite/gcc.target/powerpc/pr101384-1.c > > b/gcc/testsuite/gcc.target/powerpc/pr101384-1.c > > index 627d7d76721..41cf84bf8bc 100644 > > --- a/gcc/testsuite/gcc.target/powerpc/pr101384-1.c > > +++ b/gcc/testsuite/gcc.target/powerpc/pr101384-1.c > > @@ -2,7 +2,7 @@ > > /* { dg-do compile { target le } } */ > > /* { dg-options "-O2 -maltivec" } */ > > /* { dg-require-effective-target powerpc_altivec_ok } */ > > -/* { dg-final { scan-assembler-times {\mvspltis[whb] [^\n\r]*,-1\M} 9 } } > > */ > > +/* { dg-final { scan-assembler-times {\mvspltis[whb] > > [^\n\r]*,-1\M|\mxxspltib[^\n\r]*,255\M} 9 } } */ > > /* { dg-final { scan-assembler-times {\mvslw\M} 3 } } */ > > /* { dg-final { scan-assembler-times {\mvslh\M} 3 } } */ > > /* { dg-final { scan-assembler-times {\mvslb\M} 3 } } */
Re: Ping: [PATCH] rs6000: Add split pattern to replace
On Tue, Jan 11, 2022 at 2:27 AM Xionghu Luo wrote: > > On 2022/1/11 06:55, David Edelsohn wrote: > >>> +(define_insn_and_split "sldoi_to_mov_" > > It would be more consistent with the naming convention to use > > "sldoi_to_mov" without the final "_". > > OK, thanks. > > > > >>> + [(set (match_operand:VM 0 "altivec_register_operand") > >>> + (unspec:VM [(match_operand:VM 1 "easy_vector_constant") > > Should this be "easy_vector_constant_vsldoi"? > > > This doesn't work. easy_vector_constant_vsldoi return false due to > vspltis_shifted "return 0" as: > > vspltis_shifted (rtx op): /* If all elements are equal, we don't need to > do VSLDOI. */ > > > (gdb) p op > $7 = (rtx_def *) (const_vector:V4SI [ > (const_int 0 [0]) repeated x4 > ]) > (gdb) p easy_vector_constant_vsldoi(op, V4SImode) > $8 = false > p easy_vector_constant(op, V4SImode) > $9 = true Okay, thanks for checking. > > > > >>> + (match_dup 1) > >>> + (match_operand:VM 2 "u5bit_cint_operand")] > > This should be match_operand:QI, right? > > Yes. This patch is okay with the other changes. Thanks, David
Re: Ping: [PATCH] rs6000: powerpc suboptimal boolean test of contiguous bits [PR102239]
On Mon, Jan 10, 2022 at 12:37 AM Xionghu Luo wrote: > > Ping, thanks. > > > On 2021/12/13 13:16, Xionghu Luo wrote: > > Add specialized version to combine two instructions from > > > > 9: {r123:CC=cmp(r124:DI&0x6,0);clobber scratch;} > >REG_DEAD r124:DI > > 10: pc={(r123:CC==0)?L15:pc} > > REG_DEAD r123:CC > > > > to: > > > > 10: {pc={(r123:DI&0x6==0)?L15:pc};clobber scratch;clobber %0:CC;} > > > > then split2 will split it to one rotate dot instruction (to save one > > rotate back instruction) as shifted result doesn't matter when comparing > > to 0 in CCEQmode. > > > > Bootstrapped and regression tested pass on Power 8/9/10, OK for master? > > > > gcc/ChangeLog: > > > > PR target/102239 > > * config/rs6000/rs6000.md (*anddi3_insn_dot): New. > > > > gcc/testsuite/ChangeLog: > > > > PR target/102239 > > * gcc.target/powerpc/pr102239.c: New test. > > --- > > gcc/config/rs6000/rs6000-protos.h | 1 + > > gcc/config/rs6000/rs6000.c | 7 > > gcc/config/rs6000/rs6000.md | 38 + > > gcc/testsuite/gcc.target/powerpc/pr102239.c | 13 +++ > > 4 files changed, 59 insertions(+) > > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102239.c > > > > diff --git a/gcc/config/rs6000/rs6000-protos.h > > b/gcc/config/rs6000/rs6000-protos.h > > index 14f6b313105..3644c524376 100644 > > --- a/gcc/config/rs6000/rs6000-protos.h > > +++ b/gcc/config/rs6000/rs6000-protos.h > > @@ -73,6 +73,7 @@ extern int expand_block_move (rtx[], bool); > > extern bool expand_block_compare (rtx[]); > > extern bool expand_strn_compare (rtx[], int); > > extern bool rs6000_is_valid_mask (rtx, int *, int *, machine_mode); > > +extern bool rs6000_is_valid_rotate_dot_mask (rtx mask, machine_mode mode); > > extern bool rs6000_is_valid_and_mask (rtx, machine_mode); > > extern bool rs6000_is_valid_shift_mask (rtx, rtx, machine_mode); > > extern bool rs6000_is_valid_insert_mask (rtx, rtx, machine_mode); > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > > index 5e129986516..57a38cf954a 100644 > > --- a/gcc/config/rs6000/rs6000.c > > +++ b/gcc/config/rs6000/rs6000.c > > @@ -11606,6 +11606,13 @@ rs6000_is_valid_mask (rtx mask, int *b, int *e, > > machine_mode mode) > >return true; > > } > > > > +bool > > +rs6000_is_valid_rotate_dot_mask (rtx mask, machine_mode mode) > > +{ > > + int nb, ne; > > + return rs6000_is_valid_mask (mask, &nb, &ne, mode) && nb >= ne && ne > 0; > > +} > > + > > /* Return whether MASK (a CONST_INT) is a valid mask for any rlwinm, > > rldicl, > > or rldicr instruction, to implement an AND with it in mode MODE. */ > > > > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > > index 6bec2bddbde..014dc9612ea 100644 > > --- a/gcc/config/rs6000/rs6000.md > > +++ b/gcc/config/rs6000/rs6000.md > > @@ -3762,6 +3762,44 @@ (define_insn_and_split "*and3_2insn_dot2" > > (set_attr "dot" "yes") > > (set_attr "length" "8,12")]) > > > > +(define_insn_and_split "*anddi3_insn_dot" This pattern needs a name that better represents its purpose. The pattern name implies that it's operating on a combination of AND and Record Condition bit. Also "insn" is confusing; I think that you are using the template from the 2insn_dot names, so this should explicitly be 1insn. Maybe "branch_anddi3_1insn_dot", or just "branch_anddi3_dot". > > + [(set (pc) > > +(if_then_else (eq (and:DI (match_operand:DI 1 "gpc_reg_operand" "%r,r") > > + (match_operand:DI 2 "const_int_operand" "n,n")) > > + (const_int 0)) > > + (label_ref (match_operand 3 "")) > > + (pc))) > > + (clobber (match_scratch:DI 0 "=r,r")) > > + (clobber (reg:CC CR0_REGNO))] > > + "rs6000_is_valid_rotate_dot_mask (operands[2], DImode) > > + && TARGET_POWERPC64" > > + "#" > > + "&& reload_completed" > > + [(pc)] > > +{ > > + int nb, ne; > > + if (rs6000_is_valid_mask (operands[2], &nb, &ne, DImode) > > + && nb >= ne > > + && ne > 0) > > + { > > + unsigned HOST_WIDE_INT val = INTVAL (operands[2]); > > + int shift = 63 - nb; > > + rtx tmp = gen_rtx_ASHIFT (DImode, operands[1], GEN_INT (shift)); > > + tmp = gen_rtx_AND (DImode, tmp, GEN_INT (val << shift)); > > + rtx cr0 = gen_rtx_REG (CCmode, CR0_REGNO); > > + rs6000_emit_dot_insn (operands[0], tmp, 1, cr0); > > + rtx loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands[3]); > > + rtx cond = gen_rtx_EQ (CCEQmode, cr0, const0_rtx); > > + rtx ite = gen_rtx_IF_THEN_ELSE (VOIDmode, cond, loc_ref, pc_rtx); > > + emit_jump_insn (gen_rtx_SET (pc_rtx, ite)); > > + DONE; > > + } > > + else > > + FAIL; > > +} > > + [(set_attr "type" "shift") > > + (set_attr "dot" "yes") > > + (set_attr "length" "8,12")]) > > > > (define_expand "3" > >[(set (match_operand:SDI 0 "gpc_reg_operand") > > diff --git a/gcc/test
Re: [PATCH] rs6000: Remove useless code related to -mno-power10
On Wed, Dec 29, 2021 at 4:37 AM Kewen.Lin wrote: > > Hi, > > Option -mpower10 was made as "WarnRemoved" since commit r11-2318, > so -mno-power10 doesn't take effect any more. This patch is to > remove one line useless code which still respects it. > > Bootstrapped and regtested on powerpc64le-linux-gnu P9 and > powerpc64-linux-gnu P8. > > Is it ok for trunk? > > BR, > Kewen > - > gcc/ChangeLog: > > * config/rs6000/rs6000.c (rs6000_disable_incompatible_switches): > Remove > useless related to option -mno-power10. > --- > gcc/config/rs6000/rs6000.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index e82a47f4c0e..66b01e589b0 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -24825,7 +24825,6 @@ rs6000_disable_incompatible_switches (void) > const HOST_WIDE_INT dep_flags; /* flags that depend on this option. > */ > const char *const name;/* name of the switch. */ >} flags[] = { > -{ OPTION_MASK_POWER10, OTHER_POWER10_MASKS,"power10" }, > { OPTION_MASK_P9_VECTOR, OTHER_P9_VECTOR_MASKS, "power9-vector" }, > { OPTION_MASK_P8_VECTOR, OTHER_P8_VECTOR_MASKS, "power8-vector" }, > { OPTION_MASK_VSX, OTHER_VSX_VECTOR_MASKS, "vsx" }, Okay. Thanks, David
Re: Ping^1 [PATCH, rs6000] new split pattern for TI to V1TI move [PR103124]
On Sun, Jan 9, 2022 at 10:16 PM HAO CHEN GUI wrote: > > Hi, > > Gentle ping this: > https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587051.html > > Thanks > > On 17/12/2021 上午 9:55, HAO CHEN GUI wrote: > > Hi, > >This patch defines a new split pattern for TI to V1TI move. The pattern > > concatenates two subreg:DI of > > a TI to a V2DI. With the pattern, the subreg pass can do register split for > > TI when there is a TI to V1TI > > move. The patch optimizes one unnecessary "mr" out on P9. The new test case > > illustrates it. > > > >Bootstrapped and tested on powerpc64-linux BE and LE with no > > regressions. Is this okay for trunk? > > Any recommendations? Thanks a lot. > > > > ChangeLog > > 2021-12-13 Haochen Gui > > > > gcc/ > > * config/rs6000/vsx.md (split pattern for TI to V1TI move): Defined. > > > > gcc/testsuite/ > > * gcc.target/powerpc/pr103124.c: New testcase. > > > > > > patch.diff > > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md > > index bf033e31c1c..52968eb4609 100644 > > --- a/gcc/config/rs6000/vsx.md > > +++ b/gcc/config/rs6000/vsx.md > > @@ -6589,3 +6589,19 @@ (define_insn "xxeval" > > [(set_attr "type" "vecperm") > > (set_attr "prefixed" "yes")]) > > > > +;; Construct V1TI by vsx_concat_v2di > > +(define_split > > + [(set (match_operand:V1TI 0 "vsx_register_operand") > > + (subreg:V1TI > > + (match_operand:TI 1 "int_reg_operand") 0 ))] > > + "TARGET_P9_VECTOR && !reload_completed" > > + [(const_int 0)] > > +{ > > + rtx tmp1 = simplify_gen_subreg (DImode, operands[1], TImode, 0); > > + rtx tmp2 = simplify_gen_subreg (DImode, operands[1], TImode, 8); > > + rtx tmp3 = gen_reg_rtx (V2DImode); > > + emit_insn (gen_vsx_concat_v2di (tmp3, tmp1, tmp2)); > > + rtx tmp4 = simplify_gen_subreg (V1TImode, tmp3, V2DImode, 0); > > + emit_move_insn (operands[0], tmp4); > > + DONE; > > +}) > > diff --git a/gcc/testsuite/gcc.target/powerpc/pr103124.c > > b/gcc/testsuite/gcc.target/powerpc/pr103124.c > > new file mode 100644 > > index 000..e9072d19b8e > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/powerpc/pr103124.c > > @@ -0,0 +1,12 @@ > > +/* { dg-do compile } */ > > +/* { dg-require-effective-target powerpc_p9vector_ok } */ > > +/* { dg-require-effective-target int128 } */ > > +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */ > > +/* { dg-final { scan-assembler-not "\mmr\M" } } */ Segher probably would prefer {\mmr\M} . > > + > > +vector __int128 add (long long a) > > +{ > > + vector __int128 b; > > + b = (vector __int128) {a}; > > + return b; > > +} This is okay. Thanks, David
Re: Ping: [PATCH] rs6000: Add split pattern to replace
On Mon, Jan 10, 2022 at 12:04 AM Xionghu Luo wrote: > > Gentle ping, thanks. > > > On 2021/12/29 09:27, Xionghu Luo wrote: > > 7: r120:V4SI=const_vector > > 8: r121:V4SI=unspec[r120:V4SI,r120:V4SI,0xc] 260 > > > > with r121:v4SI = r120:V4SI when r120 is a vector with same element. > > > > Bootstrapped and regtested pass on powerpc64le-linux-gnu {P10, P9} > > and powerpc64-linux-gnu {P8, P7}. OK for master? > > > > gcc/ChangeLog: > > > > * config/rs6000/altivec.md (sldoi_to_mov_): New. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/powerpc/sldoi_to_mov.c: New test. > > --- > > gcc/config/rs6000/altivec.md| 11 +++ > > gcc/testsuite/gcc.target/powerpc/sldoi_to_mov.c | 15 +++ > > 2 files changed, 26 insertions(+) > > create mode 100644 gcc/testsuite/gcc.target/powerpc/sldoi_to_mov.c > > > > diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md > > index b2909857c34..25f86dbe828 100644 > > --- a/gcc/config/rs6000/altivec.md > > +++ b/gcc/config/rs6000/altivec.md > > @@ -383,6 +383,17 @@ (define_split > > } > > }) > > > > +(define_insn_and_split "sldoi_to_mov_" It would be more consistent with the naming convention to use "sldoi_to_mov" without the final "_". > > + [(set (match_operand:VM 0 "altivec_register_operand") > > + (unspec:VM [(match_operand:VM 1 "easy_vector_constant") Should this be "easy_vector_constant_vsldoi"? > > + (match_dup 1) > > + (match_operand:VM 2 "u5bit_cint_operand")] This should be match_operand:QI, right? Thanks, David > > + UNSPEC_VSLDOI))] > > + "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode) && can_create_pseudo_p ()" > > + "#" > > + "&& 1" > > + [(set (match_dup 0) (match_dup 1))]) > > + > > (define_insn "get_vrsave_internal" > >[(set (match_operand:SI 0 "register_operand" "=r") > > (unspec:SI [(reg:SI VRSAVE_REGNO)] UNSPEC_GET_VRSAVE))] > > diff --git a/gcc/testsuite/gcc.target/powerpc/sldoi_to_mov.c > > b/gcc/testsuite/gcc.target/powerpc/sldoi_to_mov.c > > new file mode 100644 > > index 000..2053243c456 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/powerpc/sldoi_to_mov.c > > @@ -0,0 +1,15 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2" } */ > > + > > +#include > > +vector signed int foo1 (vector signed int a) { > > +vector signed int b = {0}; > > +return vec_sum2s(a, b); > > +} > > + > > +vector signed int foo2 (vector signed int a) { > > +vector signed int b = {0}; > > +return vec_sld(b, b, 4); > > +} > > + > > +/* { dg-final { scan-assembler-times {\mvsldoi\M} 1 {target le} } } */ > > -- > Thanks, > Xionghu
Re: [power-ieee128] OPEN CONV
On Sat, Jan 8, 2022 at 1:59 PM Michael Meissner wrote: > > On Sat, Jan 08, 2022 at 03:18:07PM +0100, Jakub Jelinek wrote: > > On Sat, Jan 08, 2022 at 03:13:10PM +0100, Thomas Koenig wrote: > > > > > > On 08.01.22 15:02, Jakub Jelinek via Fortran wrote: > > > > Note, as for byteswapping, apparently it wasn't ever working right fox > > > > the IBM extended real(kind=16) and complex(kind=16). > > > > > > The lack of bug reports since the conversion feature was introduced in > > > 2006, more than 15 years ago, tells us something, I guess... > > > > powerpc64le was only introduced in GCC 4.8 in 2013, so slightly less > > than that, but still. > > Either nobody interchanges/shares fortran unformatted data between > > powerpc big and little endian, or if they do, they don't use real(kind=16) > > or complex(kind=16) in there... > > I still wish I had had the forethought when we were setting up the LE ABI to > change the default 128-bit format to IEEE instead of IBM. But alas, I didn't. > You would still need converters between the big endian IBM format and little > endian IEEE format, but it would have avoided a lot of the problems where GCC > assumes there is only one floating point format for each size. Mike, The LE ABI initial target was Power8 and IEEE128 hardware support was added to Power9. The ABI was a conscious decision. IEEE 128 was not a viable requirement for the LE ABI at the time of the transition. Thanks, David
Re: [PATCH] rs6000: Add optimizations for _mm_sad_epu8
On Fri, Jan 7, 2022 at 3:57 PM Paul A. Clarke wrote: > > On Fri, Jan 07, 2022 at 02:40:51PM -0500, David Edelsohn via Gcc-patches > wrote: > > +#ifdef __LITTLE_ENDIAN__ > > + /* Sum across four integers with two integer results. */ > > + asm ("vsum2sws %0,%1,%2" : "=v" (result) : "v" (vsum), "v" (zero)); > > + /* Note: vec_sum2s could be used here, but on little-endian, vector > > + shifts are added that are not needed for this use-case. > > + A vector shift to correctly position the 32-bit integer results > > + (currently at [0] and [2]) to [1] and [3] would then need to be > > + swapped back again since the desired results are two 64-bit > > + integers ([1]|[0] and [3]|[2]). Thus, no shift is performed. */ > > +#else > >/* Sum across four integers with two integer results. */ > >result = vec_sum2s (vsum, (__vector signed int) zero); > > > > If little-endian adds shifts to correct for the position and > > big-endian does not, why not use the inline asm without the shifts for > > both? It seems confusing to add the inline asm only for LE instead of > > always using it with the appropriate comment. > > > > It's a good and valuable optimization for LE. Fewer variants are less > > fragile, easier to test and easier to maintain. If you're going to go > > to the trouble of using inline asm for LE, use it for both. > > BE (only) _does_ need a shift as seen on the next two lines after the > code snippet above: > /* Sum across four integers with two integer results. */ > result = vec_sum2s (vsum, (__vector signed int) zero); > /* Rotate the sums into the correct position. */ > result = vec_sld (result, result, 6); > > So, when using {vec_sum2s;vec_sld}: > - LE gets an implicit shift in vec_sum2s which just needs to be undone > by the vec_sld, and those shifts don't "cancel out" and get removed > by GCC. > - BE does not get any implicit shifts, but needs one that comes from > vec_sld. > > Are you saying use the asm(vsum2sws) and then conditionally call > vec_sld on BE only? > > I viewed this change as a temporary bandage unless and until GCC can > remove the unnecessary swaps. It seems like the preferred code is > vec_sum2s/vec_sld, not the asm, but that currently is suboptimal for LE. Nevermind. I thought that these patches had not been reviewed. Thanks, David
Re: [PATCH] rs6000: Add Power10 optimization for most _mm_movemask*
On Fri, Jan 7, 2022 at 3:35 PM Paul A. Clarke wrote: > > On Fri, Jan 07, 2022 at 02:23:14PM -0500, David Edelsohn wrote: > > > Power10 ISA added `vextract*` instructions which are realized in the > > > `vec_extractm` instrinsic. > > > > > > Use `vec_extractm` for `_mm_movemask_ps`, `_mm_movemask_pd`, and > > > `_mm_movemask_epi8` compatibility intrinsics, when `_ARCH_PWR10`. > > > > > > 2021-10-21 Paul A. Clarke > > > > > > gcc > > > * config/rs6000/xmmintrin.h (_mm_movemask_ps): Use vec_extractm > > > when _ARCH_PWR10. > > > * config/rs6000/emmintrin.h (_mm_movemask_pd): Likewise. > > > (_mm_movemask_epi8): Likewise. > > > --- > > > Tested on Power10 powerpc64le-linux (compiled with and without > > > `-mcpu=power10`). > > > > > > OK for trunk? > > > > This is okay modulo > > > > > + return vec_extractm ((__v16qu) __A); > > > > Should the above be __v16qi like x86? > > That would match x86 better, but we don't have a function signature > for vec_extractm which accepts a signed type. Okay, nevermind. I thought that vec_extractm also allowed signed. Thanks, David
Re: [PATCH] rs6000: Add Power10 optimization for _mm_blendv*
On Fri, Jan 7, 2022 at 3:32 PM Paul A. Clarke wrote: > > On Fri, Jan 07, 2022 at 02:15:22PM -0500, David Edelsohn wrote: > > > Power10 ISA added `xxblendv*` instructions which are realized in the > > > `vec_blendv` instrinsic. > > > > > > Use `vec_blendv` for `_mm_blendv_epi8`, `_mm_blendv_ps`, and > > > `_mm_blendv_pd` compatibility intrinsics, when `_ARCH_PWR10`. > > > > > > Also, copy a test from i386 for testing `_mm_blendv_ps`. > > > This should have come with commit > > > ed04cf6d73e233c74c4e55c27f1cbd89ae4710e8, > > > but was inadvertently omitted. > > > > > > 2021-10-20 Paul A. Clarke > > > > > > gcc > > > * config/rs6000/smmintrin.h (_mm_blendv_epi8): Use vec_blendv > > > when _ARCH_PWR10. > > > (_mm_blendv_ps): Likewise. > > > (_mm_blendv_pd): Likewise. > > > > > > gcc/testsuite > > > * gcc.target/powerpc/sse4_1-blendvps.c: Copy from gcc.target/i386, > > > adjust dg directives to suit. > > > --- > > > Tested on Power10 powerpc64le-linux (compiled with and without > > > `-mcpu=power10`). > > > > > > OK for trunk? > > > > This is okay modulo > > > > > + return (__m128i) vec_blendv ((__v16qu) __A, (__v16qu) __B, (__v16qu) > > > __mask); > > > > Should the above be __v16qi like x86? > > That does arguably match the types involved (epi8) better. > > Shall I change the original implementation as well (4 lines later)? > > > return (__m128i) vec_sel ((__v16qi) __A, (__v16qi) __B, __lmask); vec_blendv supports the signed type, so it seems that the function should use that type, unless unsigned is preferred because PowerPC defaults to unsigned char. I wasn't going to recommend changing the existing code because I don't know how the signed type interacts with the other builtins. Thanks, David
Re: [PATCH] rs6000: Add optimizations for _mm_sad_epu8
+#ifdef __LITTLE_ENDIAN__ + /* Sum across four integers with two integer results. */ + asm ("vsum2sws %0,%1,%2" : "=v" (result) : "v" (vsum), "v" (zero)); + /* Note: vec_sum2s could be used here, but on little-endian, vector + shifts are added that are not needed for this use-case. + A vector shift to correctly position the 32-bit integer results + (currently at [0] and [2]) to [1] and [3] would then need to be + swapped back again since the desired results are two 64-bit + integers ([1]|[0] and [3]|[2]). Thus, no shift is performed. */ +#else /* Sum across four integers with two integer results. */ result = vec_sum2s (vsum, (__vector signed int) zero); If little-endian adds shifts to correct for the position and big-endian does not, why not use the inline asm without the shifts for both? It seems confusing to add the inline asm only for LE instead of always using it with the appropriate comment. It's a good and valuable optimization for LE. Fewer variants are less fragile, easier to test and easier to maintain. If you're going to go to the trouble of using inline asm for LE, use it for both. Thanks, David
Re: [PATCH] rs6000: Add Power10 optimization for most _mm_movemask*
> Power10 ISA added `vextract*` instructions which are realized in the > `vec_extractm` instrinsic. > > Use `vec_extractm` for `_mm_movemask_ps`, `_mm_movemask_pd`, and > `_mm_movemask_epi8` compatibility intrinsics, when `_ARCH_PWR10`. > > 2021-10-21 Paul A. Clarke > > gcc > * config/rs6000/xmmintrin.h (_mm_movemask_ps): Use vec_extractm > when _ARCH_PWR10. > * config/rs6000/emmintrin.h (_mm_movemask_pd): Likewise. > (_mm_movemask_epi8): Likewise. > --- > Tested on Power10 powerpc64le-linux (compiled with and without > `-mcpu=power10`). > > OK for trunk? This is okay modulo > + return vec_extractm ((__v16qu) __A); Should the above be __v16qi like x86? Thanks, David
Re: [PATCH] rs6000: Add Power10 optimization for _mm_blendv*
> Power10 ISA added `xxblendv*` instructions which are realized in the > `vec_blendv` instrinsic. > > Use `vec_blendv` for `_mm_blendv_epi8`, `_mm_blendv_ps`, and > `_mm_blendv_pd` compatibility intrinsics, when `_ARCH_PWR10`. > > Also, copy a test from i386 for testing `_mm_blendv_ps`. > This should have come with commit ed04cf6d73e233c74c4e55c27f1cbd89ae4710e8, > but was inadvertently omitted. > > 2021-10-20 Paul A. Clarke > > gcc > * config/rs6000/smmintrin.h (_mm_blendv_epi8): Use vec_blendv > when _ARCH_PWR10. > (_mm_blendv_ps): Likewise. > (_mm_blendv_pd): Likewise. > > gcc/testsuite > * gcc.target/powerpc/sse4_1-blendvps.c: Copy from gcc.target/i386, > adjust dg directives to suit. > --- > Tested on Power10 powerpc64le-linux (compiled with and without > `-mcpu=power10`). > > OK for trunk? This is okay modulo > + return (__m128i) vec_blendv ((__v16qu) __A, (__v16qu) __B, (__v16qu) > __mask); Should the above be __v16qi like x86? Thanks, David
Re: [PATCH] aix: handle 64bit inodes for include directories
Hi, Jeff Is the revised patch from Clement okay? Thanks, David On Tue, Aug 24, 2021 at 3:59 AM CHIGOT, CLEMENT wrote: > > >>> So my worry here is this is really a host property -- ie, this is > >>> behavior of where GCC runs, not the target for which GCC is generating > >>> code. > >>> > >>> That implies that the change in aix.h is wrong. aix.h is for the > >>> target, not the host -- you don't want to define something like > >>> HOST_STAT_FOR_64BIT_INODES there. > >>> > >>> You'd want to be triggering this behavior via a host fragment, x-aix, or > >>> better yet via an autoconf test. > >> Indeed, would this version be better ? I'm not sure about the configure > >> test. > >> But as we are retrieving the size of dev_t and ino_t just above, I'm > >> assuming > >> that the one being used in stat directly. At least, that's the case on > >> AIX, and > >> this test is only made for AIX. > > It's a clear improvement. It's still checking for the aix target though: > > > > +# Select the right stat being able to handle 64bit inodes, if needed. > > +if test "$enable_largefile" != no; then > > + case "$target" in > > +*-*-aix*) > > + if test "$ac_cv_sizeof_ino_t" == "4" -a "$ac_cv_sizeof_dev_t" == > > 4; then > > + > > +$as_echo "#define HOST_STAT_FOR_64BIT_INODES stat64x" >>confdefs.h > > + > > + fi;; > > + esac > > +fi > > > > Again, we're dealing with a host property. You might be able to just > > change $target above to $host. Hmm, that makes me wonder about canadian > > crosses where host != build.We may need to do this for both the aix > > host and aix build. > > Yes, my bad, I've updated the case. I don't know if there is a usual way > to check both $build and $host. I've tried to avoid code duplication so > tell me if it's okay or if you'd rather have a case for $build and one > for $host. > > Thanks, > Clément
Re: [PATCH] rs6000: Replace UNSPECS with ss_plus/us_plus and ss_minus/us_minus
On Mon, Dec 20, 2021 at 6:55 PM Segher Boessenkool wrote: > > On Mon, Dec 20, 2021 at 11:45:45AM -0500, David Edelsohn wrote: > > On Mon, Dec 20, 2021 at 3:24 AM Xionghu Luo wrote: > > > These four UNSPECS seems could be replaced with native RTL, and why > > > "(set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))" > > > in the RTL pattern, per ISA of VSCR bit 127(VECTOR Saturation, SAT): > > > > > > This bit is sticky; that is, once set to 1 it > > > remains set to 1 until it is set to 0 by an > > > mtvscr instruction. > > > > > > The RTL pattern set it to 0 but final ASM doesn't present it? And why > > > not use Clobber VSCR_REGNO instead? > > > > The design came from the early implementation of Altivec: > > > > https://gcc.gnu.org/pipermail/gcc-patches/2002-May/077409.html > > > > If one later checks for saturation (reads VSCR), one needs a > > corresponding SET of the value. It's set in an architecture-specific > > manner that isn't described to GCC, but it's set, not just clobbered > > and in an undefined state. > > Well. RTL clobber and set do exactly the same thing, except with > clobber it is not specified *what* value is set. All bits are set, all > bits are defined. There is no (direct) way in RTL to say > "undetermined". > > An RTL clobber would work just fine afaics? I don't know about the original intention from Aldy, but if one were looking at an RTL dump and the code used the saturation bit from VSCR, it might be confusing to see a CLOBBER instead of a SET. The SET documents that VSCR_REGNO is assigned a specific value; GCC doesn't know about the semantics, but it's not some undefined bit pattern. CLOBBER implies a trash value or a value that one will not query later, i.e., one would want to SET the register to a specific value before using it. > > > The RTL does not describe that VSCR is set to the value 0. The > > (const_int 0) is not the value set. You can think of the (const_int > > 0) as a dummy RTL argument to the VSCR UNSPEC. UNSPEC requires at > > least one argument and the pattern doesn't try to express the > > argument, so it uses a dummy RTL constant. > > Yup. Traditionally (pc) was used for this. Nowadays (const_int 0) is > not really more expensive anymore, and many people find it clearer (but > not in this case it seems :-) ). > > > It's part of a PARALLEL > > and the plus or minus already expresses the data dependency of the > > pattern on the input operands. > > But they do not describe any dependency on vscr, or output to it. This > is the same problem we have with fpscr (most FP insns use some of its > fields, most set some, but there is no way to cleanly express that). It describes that VSCR_REGNO is set, an output. It doesn't describe how it is set nor inform the compiler that the value depends on the input operands in some complicated way unknown to the compiler, but the compiler cannot do anything useful with the additional information. > > Explicit clobbers like this help one side of the issue. For vscr, other > than the sat bit there is only the nj bit, and we just ignore that :-) > > > This patch is okay. Thanks for updating the machine description and > > for cleaning up the formatting. > > x2. Thanks! > > > Segher
Re: [PATCH] rs6000: Replace UNSPECS with ss_plus/us_plus and ss_minus/us_minus
On Mon, Dec 20, 2021 at 3:24 AM Xionghu Luo wrote: > > These four UNSPECS seems could be replaced with native RTL, and why > "(set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))" > in the RTL pattern, per ISA of VSCR bit 127(VECTOR Saturation, SAT): > > This bit is sticky; that is, once set to 1 it > remains set to 1 until it is set to 0 by an > mtvscr instruction. > > The RTL pattern set it to 0 but final ASM doesn't present it? And why > not use Clobber VSCR_REGNO instead? The design came from the early implementation of Altivec: https://gcc.gnu.org/pipermail/gcc-patches/2002-May/077409.html If one later checks for saturation (reads VSCR), one needs a corresponding SET of the value. It's set in an architecture-specific manner that isn't described to GCC, but it's set, not just clobbered and in an undefined state. The RTL does not describe that VSCR is set to the value 0. The (const_int 0) is not the value set. You can think of the (const_int 0) as a dummy RTL argument to the VSCR UNSPEC. UNSPEC requires at least one argument and the pattern doesn't try to express the argument, so it uses a dummy RTL constant. It's part of a PARALLEL and the plus or minus already expresses the data dependency of the pattern on the input operands. I'm unsure of the meaning of your question "final ASM doesn't present it". The operation on VSCR is implicit and not emitted as an instruction. It's in a PARALLEL, which means that the single Altivec instruction has both effects. Is that what you were asking? > Tested pass on P10, OK for master? This patch is okay. Thanks for updating the machine description and for cleaning up the formatting. > Thanks. > > gcc/ChangeLog: > > * config/rs6000/altivec.md (altivec_vaddus): Replace > UNSPEC_VADDU with us_plus. > (altivec_vaddss): Replace UNSPEC_VADDS with ss_plus. > (altivec_vsubus): Replace UNSPEC_VSUBU with us_minus. > (altivec_vsubss): Replace UNSPEC_VSUBS with ss_minus. > (altivec_abss_): Likewise. > --- > gcc/config/rs6000/altivec.md | 29 ++--- > 1 file changed, 10 insertions(+), 19 deletions(-) > > diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md > index a057218aa28..b2909857c34 100644 > --- a/gcc/config/rs6000/altivec.md > +++ b/gcc/config/rs6000/altivec.md > @@ -29,8 +29,6 @@ (define_c_enum "unspec" > UNSPEC_VMHADDSHS > UNSPEC_VMHRADDSHS > UNSPEC_VADDCUW > - UNSPEC_VADDU > - UNSPEC_VADDS > UNSPEC_VAVGU > UNSPEC_VAVGS > UNSPEC_VMULEUB > @@ -61,8 +59,6 @@ (define_c_enum "unspec" > UNSPEC_VSR > UNSPEC_VSRO > UNSPEC_VSUBCUW > - UNSPEC_VSUBU > - UNSPEC_VSUBS > UNSPEC_VSUM4UBS > UNSPEC_VSUM4S > UNSPEC_VSUM2SWS > @@ -517,9 +513,8 @@ (define_insn "altivec_vaddcuw" > > (define_insn "altivec_vaddus" >[(set (match_operand:VI 0 "register_operand" "=v") > -(unspec:VI [(match_operand:VI 1 "register_operand" "v") > - (match_operand:VI 2 "register_operand" "v")] > - UNSPEC_VADDU)) > +(us_plus:VI (match_operand:VI 1 "register_operand" "v") > + (match_operand:VI 2 "register_operand" "v"))) > (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))] >"" >"vaddus %0,%1,%2" > @@ -527,9 +522,8 @@ (define_insn "altivec_vaddus" > > (define_insn "altivec_vaddss" >[(set (match_operand:VI 0 "register_operand" "=v") > -(unspec:VI [(match_operand:VI 1 "register_operand" "v") > -(match_operand:VI 2 "register_operand" "v")] > - UNSPEC_VADDS)) > +(ss_plus:VI (match_operand:VI 1 "register_operand" "v") > + (match_operand:VI 2 "register_operand" "v"))) > (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))] >"VECTOR_UNIT_ALTIVEC_P (mode)" >"vaddss %0,%1,%2" > @@ -563,9 +557,8 @@ (define_insn "altivec_vsubcuw" > > (define_insn "altivec_vsubus" >[(set (match_operand:VI 0 "register_operand" "=v") > -(unspec:VI [(match_operand:VI 1 "register_operand" "v") > -(match_operand:VI 2 "register_operand" "v")] > - UNSPEC_VSUBU)) > +(us_minus:VI (match_operand:VI 1 "register_operand" "v") > +(match_operand:VI 2 "register_operand" "v"))) > (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))] >"VECTOR_UNIT_ALTIVEC_P (mode)" >"vsubus %0,%1,%2" > @@ -573,9 +566,8 @@ (define_insn "altivec_vsubus" > > (define_insn "altivec_vsubss" >[(set (match_operand:VI 0 "register_operand" "=v") > -(unspec:VI [(match_operand:VI 1 "register_operand" "v") > -(match_operand:VI 2 "register_operand" "v")] > - UNSPEC_VSUBS)) > +(ss_minus:VI (match_operand:VI 1 "register_operand" "v") > +(match_operand:VI 2 "register_operand" "v"))) > (set (reg:SI VSCR_REGNO) (unspec:S
Re: [PATCH, rs6000] Implement mffscrni pattern
On Mon, Dec 20, 2021 at 12:56 AM HAO CHEN GUI wrote: > > Hi, > I modified the patch according to David and Segher's advice. > > This patch defines a pattern for mffscrni. If the RN is a constant, it can > call > gen_rs6000_mffscrni directly. The "rs6000-builtin-new.def" defines prototype > for builtin arguments. > The pattern "rs6000_set_fpscr_rn" is then broken as the mode of its argument > is DI while its > corresponding builtin has a const int argument. The patch also fixed it. > > Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. > Is this okay for trunk? > Any recommendations? Thanks a lot. > > ChangeLog > 2021-12-17 Haochen Gui > > gcc/ > * config/rs6000/predicates.md (u2bit_cint_operand): Defined. > * config/rs6000/rs6000-call.c > (rs6000_expand_set_fpscr_rn_builtin): Not copy argument to a reg if > it's a constant. The pattern for constant can be recognized now. > * config/rs6000/rs6000.md (UNSPECV_MFFSCRNI): Defined. > (rs6000_mffscrni): Defined. > (rs6000_set_fpscr_rn): Change the type of operand[0] form DI to SI. > Call gen_rs6000_mffscrni when operand[0] is a const int[0,3]. > > gcc/testsuite/ > * gcc.target/powerpc/mffscrni_p9.c: New testcase for mffscrni. > * gcc.target/powerpc/test_fpscr_rn_builtin.c: Modify the test cases to > test mffscrn and mffscrni separately. This revised patch is okay. Thanks, David > > > patch.diff > diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md > index f216ffdf410..b10b4ce6065 100644 > --- a/gcc/config/rs6000/predicates.md > +++ b/gcc/config/rs6000/predicates.md > @@ -219,6 +219,11 @@ (define_predicate "u1bit_cint_operand" >(and (match_code "const_int") > (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 1"))) > > +;; Return 1 if op is an unsigned 2-bit constant integer. > +(define_predicate "u2bit_cint_operand" > + (and (match_code "const_int") > + (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 3"))) > + > ;; Return 1 if op is a unsigned 3-bit constant integer. > (define_predicate "u3bit_cint_operand" >(and (match_code "const_int") > diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c > index d9736eaf21c..81261a0f24d 100644 > --- a/gcc/config/rs6000/rs6000-call.c > +++ b/gcc/config/rs6000/rs6000-call.c > @@ -9610,13 +9610,15 @@ rs6000_expand_set_fpscr_rn_builtin (enum insn_code > icode, tree exp) > compile time if the argument is a variable. The least significant two > bits of the argument, regardless of type, are used to set the rounding > mode. All other bits are ignored. */ > - if (CONST_INT_P (op0) && !const_0_to_3_operand(op0, VOIDmode)) > + if (CONST_INT_P (op0)) > { > - error ("Argument must be a value between 0 and 3."); > - return const0_rtx; > + if (!const_0_to_3_operand (op0, VOIDmode)) > + { > + error ("Argument must be a value between 0 and 3."); > + return const0_rtx; > + } > } > - > - if (! (*insn_data[icode].operand[0].predicate) (op0, mode0)) > + else if (! (*insn_data[icode].operand[0].predicate) (op0, mode0)) > op0 = copy_to_mode_reg (mode0, op0); > >pat = GEN_FCN (icode) (op0); > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index 6bec2bddbde..b18746af7ea 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -177,6 +177,7 @@ (define_c_enum "unspecv" > UNSPECV_MFFS; Move from FPSCR > UNSPECV_MFFSL ; Move from FPSCR light instruction version > UNSPECV_MFFSCRN ; Move from FPSCR float rounding mode > + UNSPECV_MFFSCRNI; Move from FPSCR float rounding mode with imm > UNSPECV_MFFSCDRN; Move from FPSCR decimal float rounding mode > UNSPECV_MTFSF ; Move to FPSCR Fields 8 to 15 > UNSPECV_MTFSF_HI; Move to FPSCR Fields 0 to 7 > @@ -6315,6 +6316,14 @@ (define_insn "rs6000_mffscrn" > "mffscrn %0,%1" >[(set_attr "type" "fp")]) > > +(define_insn "rs6000_mffscrni" > + [(set (match_operand:DF 0 "gpc_reg_operand" "=d") > + (unspec_volatile:DF [(match_operand:SI 1 "u2bit_cint_operand" "n")] > + UNSPECV_MFFSCRNI))] > + "TARGET_P9_MISC" > + "mffscrni %0,%1" > + [(set_attr "type" "fp")]) > + > (define_insn "rs6000_mffscdrn" >[(set (match_operand:DF 0 "gpc_reg_operand" "=d") > (unspec_volatile:DF [(const_int 0)] UNSPECV_MFFSCDRN)) > @@ -6324,7 +6333,7 @@ (define_insn "rs6000_mffscdrn" >[(set_attr "type" "fp")]) > > (define_expand "rs6000_set_fpscr_rn" > - [(match_operand:DI 0 "reg_or_cint_operand")] > + [(match_operand:SI 0 "reg_or_cint_operand")] >"TARGET_HARD_FLOAT" > { >rtx tmp_df = gen_reg_rtx (DFmode); > @@ -6333,9 +6342,15 @@ (define_expand "rs6000_set_fpscr_rn" > new rounding mode bits from operands[0][62:63] into
Re: [PATCH, rs6000] Implement mffscrni pattern
On Thu, Dec 16, 2021 at 9:43 PM HAO CHEN GUI wrote: > > Hi, >This patch defines a pattern for mffscrni. If the RN is a constant, it can > call > gen_rs6000_mffscrni directly. The "rs6000-builtin-new.def" defines prototype > for builtin arguments. > The pattern "rs6000_set_fpscr_rn" is then broken as the mode of its argument > is DI while its > corresponding builtin has a const int argument. The patch also fixed it. > >Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. > Is this okay for trunk? > Any recommendations? Thanks a lot. Hi, Haochen I have a question about the mode of the input operand in the new pattern below. > > ChangeLog > 2021-12-17 Haochen Gui > > gcc/ > * config/rs6000/predicates.md (u2bit_cint_operand): Defined. > * config/rs6000/rs6000-call.c > (rs6000_expand_set_fpscr_rn_builtin): Not copy argument to a reg if > it's a constant. The pattern for constant can be recognized now. > * config/rs6000/rs6000.md (UNSPECV_MFFSCRNI): Defined. > (rs6000_mffscrni): Defined. > (rs6000_set_fpscr_rn): Change the type of operand[0] form DI to SI. > Call gen_rs6000_mffscrni when operand[0] is a const int[0,3]. > > gcc/testsuite/ > * gcc.target/powerpc/mffscrni_p9.c: New testcase for mffscrni. > * gcc.target/powerpc/test_fpscr_rn_builtin.c: Modify the test cases to > test mffscrn and mffscrni separately. > > patch.diff > diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md > index f216ffd..b10b4ce 100644 > --- a/gcc/config/rs6000/predicates.md > +++ b/gcc/config/rs6000/predicates.md > @@ -219,6 +219,11 @@ (define_predicate "u1bit_cint_operand" >(and (match_code "const_int") > (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 1"))) > > +;; Return 1 if op is an unsigned 2-bit constant integer. > +(define_predicate "u2bit_cint_operand" > + (and (match_code "const_int") > + (match_test "INTVAL (op) >= 0 && INTVAL (op) <= 3"))) > + > ;; Return 1 if op is a unsigned 3-bit constant integer. > (define_predicate "u3bit_cint_operand" >(and (match_code "const_int") > diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c > index d9736ea..81261a0 100644 > --- a/gcc/config/rs6000/rs6000-call.c > +++ b/gcc/config/rs6000/rs6000-call.c > @@ -9610,13 +9610,15 @@ rs6000_expand_set_fpscr_rn_builtin (enum insn_code > icode, tree exp) > compile time if the argument is a variable. The least significant two > bits of the argument, regardless of type, are used to set the rounding > mode. All other bits are ignored. */ > - if (CONST_INT_P (op0) && !const_0_to_3_operand(op0, VOIDmode)) > + if (CONST_INT_P (op0)) > { > - error ("Argument must be a value between 0 and 3."); > - return const0_rtx; > + if (!const_0_to_3_operand (op0, VOIDmode)) > + { > + error ("Argument must be a value between 0 and 3."); > + return const0_rtx; > + } > } > - > - if (! (*insn_data[icode].operand[0].predicate) (op0, mode0)) > + else if (! (*insn_data[icode].operand[0].predicate) (op0, mode0)) > op0 = copy_to_mode_reg (mode0, op0); > >pat = GEN_FCN (icode) (op0); > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index 6bec2bd..291396c 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -177,6 +177,7 @@ (define_c_enum "unspecv" > UNSPECV_MFFS; Move from FPSCR > UNSPECV_MFFSL ; Move from FPSCR light instruction version > UNSPECV_MFFSCRN ; Move from FPSCR float rounding mode > + UNSPECV_MFFSCRNI; Move from FPSCR float rounding mode with imm > UNSPECV_MFFSCDRN; Move from FPSCR decimal float rounding mode > UNSPECV_MTFSF ; Move to FPSCR Fields 8 to 15 > UNSPECV_MTFSF_HI; Move to FPSCR Fields 0 to 7 > @@ -6315,6 +6316,14 @@ (define_insn "rs6000_mffscrn" > "mffscrn %0,%1" >[(set_attr "type" "fp")]) > > +(define_insn "rs6000_mffscrni" > + [(set (match_operand:DF 0 "gpc_reg_operand" "=d") > + (unspec_volatile:DF [(match_operand:DF 1 "u2bit_cint_operand" "n")] Why is this input operand 1 DFmode? This is a 2 bit integer value. This pattern is called from rs6000_set_fpscr_rn with an SImode operand, and it seems that this should be SImode as well. Thanks, David > + UNSPECV_MFFSCRNI))] > + "TARGET_P9_MISC" > + "mffscrni %0,%1" > + [(set_attr "type" "fp")]) > + > (define_insn "rs6000_mffscdrn" >[(set (match_operand:DF 0 "gpc_reg_operand" "=d") > (unspec_volatile:DF [(const_int 0)] UNSPECV_MFFSCDRN)) > @@ -6324,7 +6333,7 @@ (define_insn "rs6000_mffscdrn" >[(set_attr "type" "fp")]) > > (define_expand "rs6000_set_fpscr_rn" > - [(match_operand:DI 0 "reg_or_cint_operand")] > + [(match_operand:SI 0 "reg_or_cint_operand")] >"TARGET_HARD_FLOAT" >
Re: [PATCH v4 0/6] __builtin_dynamic_object_size
Siddhesh, This patch series seems to have caused testsuite regressions for memcpy-chk, etc. in 32 bit mode (i386, x86-64 -m32 and -mx32, AIX 32 bit). I have opened PR 103759. Thanks, David
Re: [PATCH 6/6] rs6000: Rename arrays to remove temporary _x suffix
On Mon, Dec 6, 2021 at 3:49 PM Bill Schmidt wrote: > > Hi! > > While we had two sets of built-in infrastructure at once, I added _x as a > suffix to two arrays to disambiguate the old and new versions. Time to fix > that also. > > Bootstrapped and tested on powerpc64le-linux-gnu with no regressions. Is this > okay for trunk? > > Thanks! > Bill > > 2021-12-06 Bill Schmidt > > gcc/ > * config/rs6000/rs6000-c.c (altivec_build_resolved_builtin): Rename > rs6000_builtin_decls_x to rs6000_builtin_decls. > (altivec_resolve_overloaded_builtin): Likewise. Also rename > rs6000_builtin_info_x to rs6000_builtin_info. > * config/rs6000/rs6000-call.c (rs6000_invalid_builtin): Rename > rs6000_builtin_info_x to rs6000_builtin_info. > (rs6000_builtin_is_supported): Likewise. > (rs6000_gimple_fold_mma_builtin): Likewise. Also rename > rs6000_builtin_decls_x to rs6000_builtin_decls. > (rs6000_gimple_fold_builtin): Rename rs6000_builtin_info_x to > rs6000_builtin_info. > (cpu_expand_builtin): Likewise. > (rs6000_expand_builtin): Likewise. > (rs6000_init_builtins): Likewise. Also rename rs6000_builtin_decls_x > to rs6000_builtin_decls. > (rs6000_builtin_decl): Rename rs6000_builtin_decls_x to > rs6000_builtin_decls. > * config/rs6000/rs6000-gen-builtins.c (write_decls): In generated > code, > rename rs6000_builtin_decls_x to rs6000_builtin_decls, and rename > rs6000_builtin_info_x to rs6000_builtin_info. > (write_bif_static_init): In generated code, rename > rs6000_builtin_info_x to rs6000_builtin_info. > (write_init_bif_table): In generated code, rename > rs6000_builtin_decls_x to rs6000_builtin_decls, and rename > rs6000_builtin_info_x to rs6000_builtin_info. > (write_init_ovld_table): In generated code, rename > rs6000_builtin_decls_x to rs6000_builtin_decls. > (write_init_file): Likewise. > * config/rs6000/rs6000.c (rs6000_builtin_vectorized_function): > Likewise. > (rs6000_builtin_md_vectorized_function): Likewise. > (rs6000_builtin_reciprocal): Likewise. > (add_condition_to_bb): Likewise. > (rs6000_atomic_assign_expand_fenv): Likewise. Okay. Thanks, David
Re: [PATCH 5/6] rs6000: Rename functions with "new" in their names
On Mon, Dec 6, 2021 at 3:49 PM Bill Schmidt wrote: > > Hi! > > While we had two sets of built-in functionality at the same time, I put "new" > in the names of quite a few functions. Time to undo that. > > Bootstrapped and tested on powerpc64le-linux-gnu with no regressions. Is this > okay for trunk? > > Thanks! > Bill > > 2021-12-02 Bill Schmidt > > gcc/ > * config/rs6000/rs6000-c.c (altivec_resolve_new_overloaded_builtin): > Remove forward declaration. > (rs6000_new_builtin_type_compatible): Rename to > rs6000_builtin_type_compatible. > (rs6000_builtin_type_compatible): Remove. > (altivec_resolve_overloaded_builtin): Remove. > (altivec_build_new_resolved_builtin): Rename to > altivec_build_resolved_builtin. > (altivec_resolve_new_overloaded_builtin): Rename to > altivec_resolve_overloaded_builtin. Remove static keyword. Adjust > called function names. > * config/rs6000/rs6000-call.c (rs6000_expand_new_builtin): Remove > forward declaration. > (rs6000_gimple_fold_new_builtin): Likewise. > (rs6000_invalid_new_builtin): Rename to rs6000_invalid_builtin. > (rs6000_gimple_fold_builtin): Remove. > (rs6000_new_builtin_valid_without_lhs): Rename to > rs6000_builtin_valid_without_lhs. > (rs6000_new_builtin_is_supported): Rename to > rs6000_builtin_is_supported. > (rs6000_gimple_fold_new_mma_builtin): Rename to > rs6000_gimple_fold_mma_builtin. > (rs6000_gimple_fold_new_builtin): Rename to > rs6000_gimple_fold_builtin. Remove static keyword. Adjust called > function names. > (rs6000_expand_builtin): Remove. > (new_cpu_expand_builtin): Rename to cpu_expand_builtin. > (new_mma_expand_builtin): Rename to mma_expand_builtin. > (new_htm_spr_num): Rename to htm_spr_num. > (new_htm_expand_builtin): Rename to htm_expand_builtin. Change name > of called function. > (rs6000_expand_new_builtin): Rename to rs6000_expand_builtin. Remove > static keyword. Adjust called function names. > (rs6000_new_builtin_decl): Rename to rs6000_builtin_decl. Remove > static keyword. > (rs6000_builtin_decl): Remove. > * config/rs6000/rs6000-gen-builtins.c (write_decls): In gnerated code, > rename rs6000_new_builtin_is_supported to rs6000_builtin_is_supported. > * config/rs6000/rs6000-internal.h (rs6000_invalid_new_builtin): Rename > to rs6000_invalid_builtin. > * config/rs6000/rs6000.c (rs6000_new_builtin_vectorized_function): > Rename to rs6000_builtin_vectorized_function. > (rs6000_new_builtin_md_vectorized_function): Rename to > rs6000_builtin_md_vectorized_function. > (rs6000_builtin_vectorized_function): Remove. > (rs6000_builtin_md_vectorized_function): Remove. Okay. Thanks, David
Re: [PATCH 4/6] rs6000: Remove rs6000-builtin.def and associated data and functions
On Mon, Dec 6, 2021 at 3:49 PM Bill Schmidt wrote: > > Hi! > > The old rs6000-builtin.def file is no longer needed. Remove it and the code > that depends on it. > > Bootstrapped and tested on powerpc64le-linux-gnu with no regressions. Is this > okay for trunk? > > Thanks! > Bill > > 2021-12-02 Bill Schmidt > > gcc/ > * config/rs6000/rs6000-builtin.def: Delete. > * config/rs6000/rs6000-call.c (builtin_compatibility): Delete. > (builtin_description): Delete. > (builtin_hash_struct): Delete. > (builtin_hasher): Delete. > (builtin_hash_table): Delete. > (builtin_hasher::hash): Delete. > (builtin_hasher::equal): Delete. > (rs6000_builtin_info_type): Delete. > (rs6000_builtin_info): Delete. > (bdesc_compat): Delete. > (bdesc_3arg): Delete. > (bdesc_4arg): Delete. > (bdesc_dst): Delete. > (bdesc_2arg): Delete. > (bdesc_altivec_preds): Delete. > (bdesc_abs): Delete. > (bdesc_1arg): Delete. > (bdesc_0arg): Delete. > (bdesc_htm): Delete. > (bdesc_mma): Delete. > (rs6000_overloaded_builtin_p): Delete. > (rs6000_overloaded_builtin_name): Delete. > (htm_spr_num): Delete. > (rs6000_builtin_is_supported_p): Delete. > (rs6000_gimple_fold_mma_builtin): Delete. > (gt-rs6000-call.h): Remove include directive. > * config/rs6000/rs6000-protos.h (rs6000_overloaded_builtin_p): Delete. > (rs6000_builtin_is_supported_p): Delete. > (rs6000_overloaded_builtin_name): Delete. > * config/rs6000/rs6000.c (rs6000_builtin_decls): Delete. > (rs6000_debug_reg_global): Remove reference to RS6000_BUILTIN_COUNT. > * config/rs6000/rs6000.h (rs6000_builtins): Delete. > (altivec_builtin_types): Delete. > (rs6000_builtin_decls): Delete. > * config/rs6000/t-rs6000 (TM_H): Don't add rs6000-builtin.def. Okay. Thanks, David
Re: [PATCH 3/6] rs6000: Rename rs6000-builtin-new.def to rs6000-builtins.def
On Mon, Dec 6, 2021 at 3:49 PM Bill Schmidt wrote: > > Hi! > > This patch just renames a file and updates the build machinery accordingly. > > Bootstrapped and tested on powerpc64le-linux-gnu with no regressions. Is this > okay for trunk? > > Thanks! > Bill > > 2021-12-02 Bill Schmidt > > gcc/ > * config/rs6000/rs6000-builtin-new.def: Rename to... > * config/rs6000/rs6000-builtins.def: ...this. > * config/rs6000/rs6000-gen-builtins.c: Adjust header commentary. > * config/rs6000/t-rs6000 (EXTRA_GTYPE_DEPS): Rename > rs6000-builtin-new.def to rs6000-builtins.def. > (rs6000-builtins.c): Likewise. Okay. Thanks, David
Re: [PATCH 2/6] rs6000: Remove altivec_overloaded_builtins array and initialization
On Mon, Dec 6, 2021 at 3:49 PM Bill Schmidt wrote: > > Hi! > > This patch just removes the huge altivec_overloaded_builtins array. > > Bootstrapped and tested on powerpc64le-linux-gnu with no regressions. Is this > okay for trunk? > > Thanks! > Bill > > 2021-12-02 Bill Schmidt > > gcc/ > * config/rs6000/rs6000-call.c (altivec_overloaded_builtins): Remove. > * config/rs6000/rs6000.h (altivec_overloaded_builtins): Remove. Okay. Thanks, David
Re: [PATCH 5/5] Add Power10 XXSPLTIDP for SFmode/DFmode constants.
On Fri, Nov 5, 2021 at 3:38 PM will schmidt wrote: > > On Fri, 2021-11-05 at 00:11 -0400, Michael Meissner wrote: > > Generate XXSPLTIDP for scalars on power10. > > > > This patch implements XXSPLTIDP support for SF, and DF scalar constants. > > The previous patch added support for vector constants. This patch adds > > the support for SFmode and DFmode scalar constants. > > > > I added 2 new tests to test loading up SF and DF scalar constants. > > > ok > > > > > 2021-11-05 Michael Meissner > > > > gcc/ > > > > * config/rs6000/rs6000.md (UNSPEC_XXSPLTIDP_CONST): New unspec. > > (UNSPEC_XXSPLTIW_CONST): New unspec. > > (movsf_hardfloat): Add support for generating XXSPLTIDP. > > (mov_hardfloat32): Likewise. > > (mov_hardfloat64): Likewise. > > (xxspltidp__internal): New insns. > > (xxspltiw__internal): New insns. > > (splitters for SF/DFmode): Add new splitters for XXSPLTIDP. > > > > gcc/testsuite/ > > > > * gcc.target/powerpc/vec-splat-constant-df.c: New test. > > * gcc.target/powerpc/vec-splat-constant-sf.c: New test. > > --- > > ok > > > > gcc/config/rs6000/rs6000.md | 97 +++ > > .../powerpc/vec-splat-constant-df.c | 60 > > .../powerpc/vec-splat-constant-sf.c | 60 > > 3 files changed, 199 insertions(+), 18 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c > > create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c > > > > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > > index 3a7bcd2426e..4122acb98cf 100644 > > --- a/gcc/config/rs6000/rs6000.md > > +++ b/gcc/config/rs6000/rs6000.md > > @@ -156,6 +156,8 @@ (define_c_enum "unspec" > > UNSPEC_PEXTD > > UNSPEC_HASHST > > UNSPEC_HASHCHK > > + UNSPEC_XXSPLTIDP_CONST > > + UNSPEC_XXSPLTIW_CONST > >]) > > > > ;; > > @@ -7764,17 +7766,17 @@ (define_split > > ;; > > ;; LWZ LFSLXSSP LXSSPX STFS STXSSP > > ;; STXSSPX STWXXLXOR LI FMRXSCPSGNDP > > -;; MR MT MF NOP > > +;; MR MT MF NOPXXSPLTIDP > > > > (define_insn "movsf_hardfloat" > >[(set (match_operand:SF 0 "nonimmediate_operand" > >"=!r, f, v, wa,m, wY, > > Z, m, wa, !r,f, wa, > > - !r,*c*l, !r, *h") > > + !r,*c*l, !r, *h,wa") > > (match_operand:SF 1 "input_operand" > >"m, m, wY, Z, f, v, > > wa,r, j, j, f, wa, > > - r, r, *h, 0"))] > > + r, r, *h, 0, eP"))] > >"(register_operand (operands[0], SFmode) > > || register_operand (operands[1], SFmode)) > > && TARGET_HARD_FLOAT > > @@ -7796,15 +7798,16 @@ (define_insn "movsf_hardfloat" > > mr %0,%1 > > mt%0 %1 > > mf%1 %0 > > - nop" > > + nop > > + #" > >[(set_attr "type" > > "load, fpload,fpload, fpload,fpstore, fpstore, > >fpstore,store, veclogical, integer, fpsimple, fpsimple, > > - *, mtjmpr,mfjmpr, *") > > + *, mtjmpr,mfjmpr, *, vecperm") > > (set_attr "isa" > > "*, *, p9v,p8v, *, p9v, > >p8v,*, *, *, *, *, > > - *, *, *, *")]) > > + *, *, *, *, p10")]) > > > > ;; LWZ LFIWZX STWSTFIWX MTVSRWZMFVSRWZ > > ;; FMR MR MT%0 MF%1 NOP > > @@ -8064,18 +8067,18 @@ (define_split > > > > ;; STFD LFD FMR LXSDSTXSD > > ;; LXSD STXSD XXLOR XXLXOR GPR<-0 > > -;; LWZ STW MR > > +;; LWZ STW MR XXSPLTIDP > > > > > > (define_insn "*mov_hardfloat32" > >[(set (match_operand:FMOVE64 0 "nonimmediate_operand" > > "=m, d, d, , wY, > >, Z, , , !r, > > - Y, r, !r") > > + Y, r, !r, wa") > > (match_operand:FMOVE64 1 "input_operand" > > "d, m, d, wY, , > >Z, , , , , > > - r, Y, r"))] > > + r, Y, r, eP"))] > >"! TARGET_POWERPC64 && TARGET_HARD_FLOAT > > && (gpc_reg_operand (operands[0], mode) > > || gpc_reg_operand (operands[1], mode))" > > @@ -8092,20 +8095,21 @@ (define_insn "*mov
Re: [PATCH 4/5] Add Power10 XXSPLTIDP for vector constants
On Fri, Nov 5, 2021 at 3:24 PM will schmidt wrote: > > On Fri, 2021-11-05 at 00:10 -0400, Michael Meissner wrote: > > Generate XXSPLTIDP for vectors on power10. > > > > This patch implements XXSPLTIDP support for all vector constants. The > > XXSPLTIDP instruction is given a 32-bit immediate that is converted to a > > vector > > of two DFmode constants. The immediate is in SFmode format, so only > > constants > > that fit as SFmode values can be loaded with XXSPLTIDP. > > > > The constraint (eP) added in the previous patch for XXSPLTIW is also used > > for XXSPLTIDP. > > > > ok > > > > DImode scalar constants are not handled. This is due to the majority of > > DImode > > constants will be in the GPR registers. With vector registers, you have the > > problem that XXSPLTIDP splats the double word into both elements of the > > vector. However, if TImode is loaded with an integer constant, it wants a > > full > > 128-bit constant. > > This may be worth as adding to a todo somewhere in the code. > > > > > SFmode and DFmode scalar constants are not handled in this patch. The > > support for for those constants will be in the next patch. > > ok > > > > > I have added a temporary switch (-msplat-float-constant) to control whether > > or > > not the XXSPLTIDP instruction is generated. > > > > I added 2 new tests to test loading up V2DI and V2DF vector constants. > > > > > > > > 2021-11-05 Michael Meissner > > > > gcc/ > > > > * config/rs6000/predicates.md (easy_fp_constant): Add support for > > generating XXSPLTIDP. > > (vsx_prefixed_constant): Likewise. > > (easy_vector_constant): Likewise. > > * config/rs6000/rs6000-protos.h (constant_generates_xxspltidp): > > New declaration. > > * config/rs6000/rs6000.c (output_vec_const_move): Add support for > > generating XXSPLTIDP. > > (prefixed_xxsplti_p): Likewise. > > (constant_generates_xxspltidp): New function. > > * config/rs6000/rs6000.opt (-msplat-float-constant): New debug option. > > > > gcc/testsuite/ > > > > * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn > > regex for power10. > > * gcc.target/powerpc/vec-splat-constant-v2df.c: New test. > > * gcc.target/powerpc/vec-splat-constant-v2di.c: New test. > > --- > > > ok > > > gcc/config/rs6000/predicates.md | 9 ++ > > gcc/config/rs6000/rs6000-protos.h | 1 + > > gcc/config/rs6000/rs6000.c| 108 ++ > > gcc/config/rs6000/rs6000.opt | 4 + > > .../powerpc/pr86731-fwrapv-longlong.c | 9 +- > > .../powerpc/vec-splat-constant-v2df.c | 64 +++ > > .../powerpc/vec-splat-constant-v2di.c | 50 > > 7 files changed, 241 insertions(+), 4 deletions(-) > > create mode 100644 > > gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c > > create mode 100644 > > gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c > > > > diff --git a/gcc/config/rs6000/predicates.md > > b/gcc/config/rs6000/predicates.md > > index ed6252bd0c4..d748b11857c 100644 > > --- a/gcc/config/rs6000/predicates.md > > +++ b/gcc/config/rs6000/predicates.md > > @@ -610,6 +610,9 @@ (define_predicate "easy_fp_constant" > > > >if (constant_generates_xxspltiw (&vsx_const)) > > return true; > > + > > + if (constant_generates_xxspltidp (&vsx_const)) > > + return true; > > } > > > >/* Otherwise consider floating point constants hard, so that the > > @@ -653,6 +656,9 @@ (define_predicate "vsx_prefixed_constant" > >if (constant_generates_xxspltiw (&vsx_const)) > > return true; > > > > + if (constant_generates_xxspltidp (&vsx_const)) > > +return true; > > + > >return false; > > }) > > > > @@ -727,6 +733,9 @@ (define_predicate "easy_vector_constant" > > > > if (constant_generates_xxspltiw (&vsx_const)) > > return true; > > + > > + if (constant_generates_xxspltidp (&vsx_const)) > > + return true; > > } > > > ok > > > > >if (TARGET_P9_VECTOR > > diff --git a/gcc/config/rs6000/rs6000-protos.h > > b/gcc/config/rs6000/rs6000-protos.h > > index 99c6a671289..2d28df7442d 100644 > > --- a/gcc/config/rs6000/rs6000-protos.h > > +++ b/gcc/config/rs6000/rs6000-protos.h > > @@ -253,6 +253,7 @@ extern bool vec_const_128bit_to_bytes (rtx, > > machine_mode, > > vec_const_128bit_type *); > > extern unsigned constant_generates_lxvkq (vec_const_128bit_type *); > > extern unsigned constant_generates_xxspltiw (vec_const_128bit_type *); > > +extern unsigned constant_generates_xxspltidp (vec_const_128bit_type *); > > #endif /* RTX_CODE */ > > > > #ifdef TREE_CODE > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > > index be24f56eb31..8fde48cf2b3 100644 > > --- a/gcc/config/rs6000/rs6000.c > > +++ b/gcc/config/rs6000/rs6000.c > > @@ -7012,6 +7012,13 @@ output_vec_const_move (rtx
Re: [PATCH 3/5] Add Power10 XXSPLTIW
On Fri, Nov 5, 2021 at 2:50 PM will schmidt wrote: > > On Fri, 2021-11-05 at 00:09 -0400, Michael Meissner wrote: > > Generate XXSPLTIW on power10. > > > > Hi, > > > > This patch adds support to automatically generate the ISA 3.1 XXSPLTIW > > instruction for V8HImode, V4SImode, and V4SFmode vectors. It does this by > > adding support for vector constants that can be used, and adding a > > VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction. > > > > The eP constraint was added to recognize constants that can be loaded into > > vector registers with a single prefixed instruction. > > Perhaps Swap "... the eP constraint was added ..." for "Add the eP > constraint to ..." > > > > > > I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector > > constants. > > > > > > 2021-11-05 Michael Meissner > > > > gcc/ > > > > * config/rs6000/constraints.md (eP): Update comment. > > * config/rs6000/predicates.md (easy_fp_constant): Add support for > > generating XXSPLTIW. > > (vsx_prefixed_constant): New predicate. > > (easy_vector_constant): Add support for > > generating XXSPLTIW. > > * config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): New > > declaration. > > (constant_generates_xxspltiw): Likewise. > > * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate > > XXSPLTIW, don't do XXSPLTIB and sign extend. > > Perhaps just 'generate XXSPLTIW if possible'. > > > (output_vec_const_move): Add support for XXSPLTIW. > > (prefixed_xxsplti_p): New function. > > (constant_generates_xxspltiw): New function. > > * config/rs6000/rs6000.md (prefixed attribute): Add support to > > mark XXSPLTI* instructions as being prefixed. > > * config/rs6000/rs6000.opt (-msplat-word-constant): New debug > > switch. > > * config/rs6000/vsx.md (vsx_mov_64bit): Add support for > > generating XXSPLTIW or XXSPLTIDP. > > (vsx_mov_32bit): Likewise. > > * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the > > eP constraint. > > > > gcc/testsuite/ > > > > * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test. > > * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test. > > * gcc.target/powerpc/vec-splat-constant-v4si.c: New test. > > * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test. > > * gcc.target/powerpc/vec-splati-runnable.c: Update insn count. > > --- > > gcc/config/rs6000/constraints.md | 6 ++ > > gcc/config/rs6000/predicates.md | 46 ++- > > gcc/config/rs6000/rs6000-protos.h | 2 + > > gcc/config/rs6000/rs6000.c| 81 +++ > > gcc/config/rs6000/rs6000.md | 5 ++ > > gcc/config/rs6000/rs6000.opt | 4 + > > gcc/config/rs6000/vsx.md | 28 +++ > > gcc/doc/md.texi | 4 + > > .../powerpc/vec-splat-constant-v16qi.c| 27 +++ > > .../powerpc/vec-splat-constant-v4sf.c | 67 +++ > > .../powerpc/vec-splat-constant-v4si.c | 51 > > .../powerpc/vec-splat-constant-v8hi.c | 62 ++ > > .../gcc.target/powerpc/vec-splati-runnable.c | 4 +- > > 13 files changed, 369 insertions(+), 18 deletions(-) > > create mode 100644 > > gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c > > create mode 100644 > > gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c > > create mode 100644 > > gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c > > create mode 100644 > > gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c > > > > diff --git a/gcc/config/rs6000/constraints.md > > b/gcc/config/rs6000/constraints.md > > index e72132b4c28..a4b05837fa6 100644 > > --- a/gcc/config/rs6000/constraints.md > > +++ b/gcc/config/rs6000/constraints.md > > @@ -213,6 +213,12 @@ (define_constraint "eI" > >"A signed 34-bit integer constant if prefixed instructions are > > supported." > >(match_operand 0 "cint34_operand")) > > > > +;; A SF/DF scalar constant or a vector constant that can be loaded into > > vector > > +;; registers with one prefixed instruction such as XXSPLTIDP or XXSPLTIW. > > +(define_constraint "eP" > > + "A constant that can be loaded into a VSX register with one prefixed > > insn." > > + (match_operand 0 "vsx_prefixed_constant")) > > + > > ;; A TF/KF scalar constant or a vector constant that can load certain IEEE > > ;; 128-bit constants into vector registers using LXVKQ. > > (define_constraint "eQ" > > diff --git a/gcc/config/rs6000/predicates.md > > b/gcc/config/rs6000/predicates.md > > index e0d1c718e9f..ed6252bd0c4 100644 > > --- a/gcc/config/rs6000/predicates.md > > +++ b/gcc/config/rs6000/predicates.md > > @@ -605,7 +605,10 @@ (define_predicate "easy_fp_constant" > >vec_const_128bit_type vsx_const; > >if (TARGET_POWER10 && v
Re: [PATCH 1/5] Add XXSPLTI* and LXVKQ instructions (new data structure and function)
On Fri, Nov 5, 2021 at 2:13 PM Michael Meissner wrote: > > On Fri, Nov 05, 2021 at 12:01:43PM -0500, will schmidt wrote: > > On Fri, 2021-11-05 at 00:04 -0400, Michael Meissner wrote: > > > Add new constant data structure. > > > > > > This patch provides the data structure and function to convert a > > > CONST_INT, CONST_DOUBLE, CONST_VECTOR, or VEC_DUPLICATE of a constant) to > > > an array of bytes, half-words, words, and double words that can be loaded > > > into a 128-bit vector register. > > > > > > The next patches will use this data structure to generate code that > > > generates load of the vector/floating point registers using the XXSPLTIDP, > > > XXSPLTIW, and LXVKQ instructions that were added in power10. > > > > > > 2021-11-05 Michael Meissner > > > > > Whoops, it should be meiss...@linux.ibm.com. > > > comment to be explicit on the structure name being copied to/from. > > (vec_const_128bit_type is easy to search for, vector or constant or > > structure are not as unique) > > Yes, the original name was more generic (rs6000_const). Originally it could > potentially handle vector constants that were greater than 128-bits if we ever > have support for larger vectors. But I thought that extra generallity > hindered > the code (since you had to check whether the size was exactly 128-bits, etc.). > So I made the data structure tailored to the problem at hand. > > > > + > > > +/* Copy an floating point constant to the vector constant structure. */ > > > + > > > > s/an/a/ > > Ok. > > > > +static void > > > +constant_fp_to_128bit_vector (rtx op, > > > + machine_mode mode, > > > + size_t byte_num, > > > + vec_const_128bit_type *info) > > > +{ > > > + unsigned bitsize = GET_MODE_BITSIZE (mode); > > > + unsigned num_words = bitsize / 32; > > > + const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op); > > > + long real_words[VECTOR_128BIT_WORDS]; > > > + > > > + /* Make sure we don't overflow the real_words array and that it is > > > + filled completely. */ > > > + gcc_assert (num_words <= VECTOR_128BIT_WORDS && (bitsize % 32) == 0); > > > > Not clear to me on the potential to partially fill the real_words > > array. > > At the moment we don't support a 16-bit floating point type in the compiler > (the Power10 has limited 16-bit floating point support, but we don't make a > special type for it). If/when we add the 16-bit floating point, we will > possibly need to revisit this. > > > > + > > > + real_to_target (real_words, rtype, mode); > > > + > > > + /* Iterate over each 32-bit word in the floating point constant. The > > > + real_to_target function puts out words in endian fashion. We need > > > > Meaning host-endian fashion, or is that meant to be big-endian ? > > Real_to_target puts out the 32-bit values in endian fashion. This data > structure wants to hold everything in big endian fashion to make checking > things simpler. > > > Perhaps also rephrase or move the comment up to indicate that > > real_to_target will have placed or has already placed the words in > > endian fashion. > > As stated I was expecting to see a call to real_to_target() below the > > comment. > > Yes, I probably should move the real_to_target call after the comment. > > > > + > > > + /* Possibly splat the constant to fill a vector size. */ > > > > > > Suggest "Splat the constant to fill a vector size if ..." > > Ok. Okay. Thanks, David
Re: [PATCH 2/5] Add Power10 XXSPLTI* and LXVKQ instructions (LXVKQ)
On Fri, Nov 5, 2021 at 2:01 PM Michael Meissner wrote: > > On Fri, Nov 05, 2021 at 12:52:51PM -0500, will schmidt wrote: > > > diff --git a/gcc/config/rs6000/predicates.md > > > b/gcc/config/rs6000/predicates.md > > > index 956e42bc514..e0d1c718e9f 100644 > > > --- a/gcc/config/rs6000/predicates.md > > > +++ b/gcc/config/rs6000/predicates.md > > > @@ -601,6 +601,14 @@ (define_predicate "easy_fp_constant" > > >if (TARGET_VSX && op == CONST0_RTX (mode)) > > > return 1; > > > > > > + /* Constants that can be generated with ISA 3.1 instructions are easy. > > > */ > > > > Easy is relative, but OK. > > The names of the function is easy_fp_constant. > > > > + vec_const_128bit_type vsx_const; > > > + if (TARGET_POWER10 && vec_const_128bit_to_bytes (op, mode, &vsx_const)) > > > +{ > > > + if (constant_generates_lxvkq (&vsx_const) != 0) > > > + return true; > > > +} > > > + > > >/* Otherwise consider floating point constants hard, so that the > > > constant gets pushed to memory during the early RTL phases. This > > > has the advantage that double precision constants that can be > > > @@ -609,6 +617,23 @@ (define_predicate "easy_fp_constant" > > > return 0; > > > }) > > > > > > +;; Return 1 if the operand is a special IEEE 128-bit value that can be > > > loaded > > > +;; via the LXVKQ instruction. > > > + > > > +(define_predicate "easy_vector_constant_ieee128" > > > + (match_code "const_vector,const_double") > > > +{ > > > + vec_const_128bit_type vsx_const; > > > + > > > + /* Can we generate the LXVKQ instruction? */ > > > + if (!TARGET_IEEE128_CONSTANT || !TARGET_FLOAT128_HW || !TARGET_POWER10 > > > + || !TARGET_VSX) > > > +return false; > > > > Presumably all of the checks there are valid. (Can we have power10 > > without float128_hw or ieee128_constant flags set?)I do notice the > > addition of an ieee128_constant flag below. > > Yes, we can have power10 without float128_hw. At the moment, 32-bit big > endian > does not enable the 128-bit IEEE instructions. Also when we are building the > bits in libgcc that can switch between compiling the software routines and the > routines used for IEEE hardware, and when we are building the IEEE 128-bit > software emulation functions we need to explicitly turn off IEEE 128-bit > hardware support. > > Similarly for VSX, if the user explicitly says -mno-vsx, then we can't enable > this instruction. > > > Ok. I did look at this a bit before it clicked, so would suggest a > > comment stl "All of the constants that can be loaded by lxvkq will have > > zero in the bottom 3 words, so ensure those are zero before we use a > > switch based on the nonzero portion of the constant." > > > > It would be fine as-is too. :-) > > Ok. Okay. Thanks, David
Re: [PATCH take #2] PR target/43892: Some carry flag (CA) optimizations on PowerPC.
Hi, Roger! Thanks very much for investigating this issue and developing a patch to leverage this feature of the PowerPC architecture. 2021-12-03 Roger Sayle gcc/ChangeLog PR target/43892 * config/rs6000/rs6000.md (*add3_carry_in_0_2): New define_insn to recognize commutative form of add3_carry_in_0. (*add3_geu, *add3_leu, *subf3_carry_in_xx_subf, *add3_carry_in_addc): New define_insn_and_split patterns. It might be easier to read if each of the define_insn_and_split ChangeLog entries were on a separate line and the latter ones said "Same" or "Likewise", but up to you. Segher can be more pedantic. gcc/testsuite/ChangeLog PR target/43892 * gcc.target/powerpc/addcmp.c: New test case. * gcc.target/powerpc/pr43892.c: New test case. This patch is okay. Thanks, David
Re: [PATCH, rs6000] new split pattern for TI to V1TI move [PR103124]
On Sun, Dec 12, 2021 at 10:00 PM HAO CHEN GUI wrote: > > Hi, >This patch defines a new split pattern for TI to V1TI move. The pattern > concatenates two subreg:DI of > a TI to a V2DI, then move the V2DI to V1TI. With the pattern, the subreg pass > can do register split for > TI when there is a TI to V1TI move. The patch optimizes one unnecessary "mr" > out on P9. The new > test case illustrates it. > >Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. > Is this okay for trunk? > Any recommendations? Thanks a lot. > > ChangeLog > 2021-12-13 Haochen Gui > > gcc/ > * config/rs6000/vsx.md (split pattern for TI to V1TI move): Defined. > > gcc/testsuite/ > * gcc.target/powerpc/pr103124.c: New testcase. > > > patch.diff > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md > index bf033e31c1c..7bca7780735 100644 > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -6589,3 +6589,19 @@ (define_insn "xxeval" > [(set_attr "type" "vecperm") > (set_attr "prefixed" "yes")]) > > +;; split TI to V1TI move > +(define_split > + [(set (match_operand:V1TI 0 "vsx_register_operand") > + (subreg:V1TI > + (match_operand:TI 1 "int_reg_operand") 0 ))] > + "TARGET_P9_VECTOR && !reload_completed" > + [(const_int 0)] > +{ > + rtx tmp1 = simplify_gen_subreg (DImode, operands[1], TImode, 0); > + rtx tmp2 = simplify_gen_subreg (DImode, operands[1], TImode, 8); > + rtx tmp3 = gen_reg_rtx (V2DImode); > + emit_insn (gen_vsx_concat_v2di (tmp3, tmp1, tmp2)); > + rtx tmp4 = simplify_gen_subreg (V1TImode, tmp3, V2DImode, 0); > + emit_move_insn (operands[0], tmp4); > + DONE; > +}) > diff --git a/gcc/testsuite/gcc.target/powerpc/pr103124.c > b/gcc/testsuite/gcc.target/powerpc/pr103124.c > new file mode 100644 > index 000..724492dbcd2 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr103124.c > @@ -0,0 +1,11 @@ > +/* { dg-do compile { target { powerpc*-*-* && lp64 } } */ Please don't include the "powerpc" target selector in the gcc.target/powerpc directory. Just use lp64. > +/* { dg-require-effective-target powerpc_p9vector_ok } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */ > +/* { dg-final { scan-assembler-not "\mmr\M" } } */ > + > +vector __int128 add (long long a) > +{ > + vector __int128 b; > + b = (vector __int128) {a}; > + return b; > +} Okay with that change. Thanks, David
Re: [PATCH] rs6000: __builtin_darn[_raw] should be in [power9-64] (PR103624)
On Mon, Dec 13, 2021 at 10:48 AM Bill Schmidt wrote: > > Hi! > > PR103624 observes that we get segfaults for the 64-bit darn builtins when > compiled > on a 32-bit architecture. The old built-in infrastructure requires > TARGET_64BIT, and > this was missed in the new support. Moving these two builtins from the > [power9] > stanza to the [power9-64] stanza solves the problem. > > Tested the fix on a powerpc-e300c3-linux-gnu cross. Bootstrapped and tested > on > powerpc64le-linux-gnu with no regressions. Is this okay for trunk? Okay. Thanks, David > > Thanks! > Bill > > > 2021-12-13 Bill Schmidt > > gcc/ > PR target/103624 > * config/rs6000/rs6000-builtin-new.def (__builtin_darn): Move to > [power9-64] stanza. > (__builtin_darn_raw): Likewise. > --- > gcc/config/rs6000/rs6000-builtin-new.def | 12 ++-- > 1 file changed, 6 insertions(+), 6 deletions(-) > > diff --git a/gcc/config/rs6000/rs6000-builtin-new.def > b/gcc/config/rs6000/rs6000-builtin-new.def > index 30556e5c7f2..2becd96a36c 100644 > --- a/gcc/config/rs6000/rs6000-builtin-new.def > +++ b/gcc/config/rs6000/rs6000-builtin-new.def > @@ -2799,15 +2799,9 @@ > > ; Miscellaneous P9 functions > [power9] > - signed long long __builtin_darn (); > -DARN darn {} > - >signed int __builtin_darn_32 (); > DARN_32 darn_32 {} > > - signed long long __builtin_darn_raw (); > -DARN_RAW darn_raw {} > - >const signed int __builtin_dtstsfi_eq_dd (const int<6>, _Decimal64); > TSTSFI_EQ_DD dfptstsfi_eq_dd {} > > @@ -2840,6 +2834,12 @@ >void __builtin_altivec_stxvl (vsc, void *, long); > STXVL stxvl {} > > + signed long long __builtin_darn (); > +DARN darn {} > + > + signed long long __builtin_darn_raw (); > +DARN_RAW darn_raw {} > + >const signed int __builtin_scalar_byte_in_set (signed int, signed long > long); > CMPEQB cmpeqb {} > > -- > 2.27.0 > >
Re: [PATCH] rs6000: Builtins for doubleword compare should be in [power8-vector] (PR103625)
On Mon, Dec 13, 2021 at 11:02 AM Bill Schmidt wrote: > > Hi! > > PR103625 observes that we ICE when doing vector compares on doublewords. > The original built-in function support requires Power8 vector support for > these, but this was missed in the new built-in support. Moving these > functions to the [power8-vector] stanza solves the problem. > > Tested the fix on a powerpc-e300c3-linux-gnu cross. Bootstrapped and tested > on > powerpc64le-linux-gnu with no regressions. Is this okay for trunk? Okay. Thanks, David > > Thanks! > Bill > > > 2021-12-13 Bill Schmidt > > gcc/ > PR target/103625 > * config/rs6000/rs6000-builtin-new.def (__builtin_altivec_vcmpequd): > Move to power8-vector stanza. > (__builtin_altivec_vcmpequd_p): Likewise. > (__builtin_altivec_vcmpgtsd): Likewise. > (__builtin_altivec_vcmpgtsd_p): Likewise. > (__builtin_altivec_vcmpgtud): Likewise. > (__builtin_altivec_vcmpgtud_p): Likewise. > --- > gcc/config/rs6000/rs6000-builtin-new.def | 36 > 1 file changed, 18 insertions(+), 18 deletions(-) > > diff --git a/gcc/config/rs6000/rs6000-builtin-new.def > b/gcc/config/rs6000/rs6000-builtin-new.def > index a020dbbe2fb..bd950f8db36 100644 > --- a/gcc/config/rs6000/rs6000-builtin-new.def > +++ b/gcc/config/rs6000/rs6000-builtin-new.def > @@ -1200,24 +1200,6 @@ >const vull __builtin_altivec_vandc_v2di_uns (vull, vull); > VANDC_V2DI_UNS andcv2di3 {} > > - const vsll __builtin_altivec_vcmpequd (vull, vull); > -VCMPEQUD vector_eqv2di {} > - > - const int __builtin_altivec_vcmpequd_p (int, vsll, vsll); > -VCMPEQUD_P vector_eq_v2di_p {pred} > - > - const vsll __builtin_altivec_vcmpgtsd (vsll, vsll); > -VCMPGTSD vector_gtv2di {} > - > - const int __builtin_altivec_vcmpgtsd_p (int, vsll, vsll); > -VCMPGTSD_P vector_gt_v2di_p {pred} > - > - const vsll __builtin_altivec_vcmpgtud (vull, vull); > -VCMPGTUD vector_gtuv2di {} > - > - const int __builtin_altivec_vcmpgtud_p (int, vsll, vsll); > -VCMPGTUD_P vector_gtu_v2di_p {pred} > - >const vd __builtin_altivec_vnor_v2df (vd, vd); > VNOR_V2DF norv2df3 {} > > @@ -2221,6 +2203,24 @@ >const vsc __builtin_altivec_vbpermq2 (vsc, vsc); > VBPERMQ2 altivec_vbpermq2 {} > > + const vsll __builtin_altivec_vcmpequd (vull, vull); > +VCMPEQUD vector_eqv2di {} > + > + const int __builtin_altivec_vcmpequd_p (int, vsll, vsll); > +VCMPEQUD_P vector_eq_v2di_p {pred} > + > + const vsll __builtin_altivec_vcmpgtsd (vsll, vsll); > +VCMPGTSD vector_gtv2di {} > + > + const int __builtin_altivec_vcmpgtsd_p (int, vsll, vsll); > +VCMPGTSD_P vector_gt_v2di_p {pred} > + > + const vsll __builtin_altivec_vcmpgtud (vull, vull); > +VCMPGTUD vector_gtuv2di {} > + > + const int __builtin_altivec_vcmpgtud_p (int, vsll, vsll); > +VCMPGTUD_P vector_gtu_v2di_p {pred} > + >const vsll __builtin_altivec_vmaxsd (vsll, vsll); > VMAXSD smaxv2di3 {} > > -- > 2.27.0 > >
Re: [PATCH] Modify combine pattern by anding a pseudo with its nonzero bits
On Tue, Nov 30, 2021 at 3:46 AM HAO CHEN GUI wrote: > > Hi, > > This patch modifies the combine pattern with a helper - > change_pseudo_and_mask when recog fails. The helper converts a single pseudo > to the pseudo and with a mask if the outer operator is IOR/XOR/PLUS and the > inner operator is ASHIFT/LSHIFTRT/AND. The conversion helps match shift + ior > pattern. > > Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. > Is this okay for trunk? Any recommendations? Thanks a lot. > > ChangeLog > > 2021-11-30 Haochen Gui > > gcc/ > * combine.c (change_pseudo_and_mask): New. > (recog_for_combine): If recog fails, try again with the pattern > modified by change_pseudo_and_mask. > > gcc/testsuite/ > * gcc.target/powerpc/20050603-3.c: Modify the dump check conditions. > * gcc.target/powerpc/rlwimi-2.c: Likewise. > > patch.diff > > diff --git a/gcc/combine.c b/gcc/combine.c > index 03e9a780919..c83c0aceb57 100644 > --- a/gcc/combine.c > +++ b/gcc/combine.c > @@ -11539,6 +11539,42 @@ change_zero_ext (rtx pat) >return changed; > } > > +/* When the outer code of set_src is IOR/XOR/PLUS and the inner code is > + ASHIFT/LSHIFTRT/AND, convert a psuedo to psuedo AND with a mask if its ^^^ spelling mistake in comment: pseudo not psuedo Thanks, David > + nonzero_bits is less than its mode mask. */ > +static bool > +change_pseudo_and_mask (rtx pat) > +{ > + bool changed = false; > + > + rtx src = SET_SRC (pat); > + if ((GET_CODE (src) == IOR > + || GET_CODE (src) == XOR > + || GET_CODE (src) == PLUS) > + && (((GET_CODE (XEXP (src, 0)) == ASHIFT > + || GET_CODE (XEXP (src, 0)) == LSHIFTRT > + || GET_CODE (XEXP (src, 0)) == AND) > + && REG_P (XEXP (src, 1))) > + || ((GET_CODE (XEXP (src, 1)) == ASHIFT > + || GET_CODE (XEXP (src, 1)) == LSHIFTRT > + || GET_CODE (XEXP (src, 1)) == AND) > + && REG_P (XEXP (src, 0) > +{ > + rtx *reg = REG_P (XEXP (src, 0)) > +? &XEXP (SET_SRC (pat), 0) > +: &XEXP (SET_SRC (pat), 1); > + machine_mode mode = GET_MODE (*reg); > + unsigned HOST_WIDE_INT nonzero = nonzero_bits (*reg, mode); > + if (nonzero < GET_MODE_MASK (mode)) > + { > + rtx x = gen_rtx_AND (mode, *reg, GEN_INT (nonzero)); > + SUBST (*reg, x); > + changed = true; > + } > + } > + return changed; > +} > + > /* Like recog, but we receive the address of a pointer to a new pattern. > We try to match the rtx that the pointer points to. > If that fails, we may try to modify or replace the pattern, > @@ -11586,7 +11622,14 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx > *pnotes) > } > } >else > - changed = change_zero_ext (pat); > + { > + if (change_pseudo_and_mask (pat)) > + { > + maybe_swap_commutative_operands (SET_SRC (pat)); > + changed = true; > + } > + changed |= change_zero_ext (pat); > + } > } >else if (GET_CODE (pat) == PARALLEL) > { > diff --git a/gcc/testsuite/gcc.target/powerpc/20050603-3.c > b/gcc/testsuite/gcc.target/powerpc/20050603-3.c > index 4017d34f429..e628be11532 100644 > --- a/gcc/testsuite/gcc.target/powerpc/20050603-3.c > +++ b/gcc/testsuite/gcc.target/powerpc/20050603-3.c > @@ -12,7 +12,7 @@ void rotins (unsigned int x) >b.y = (x<<12) | (x>>20); > } > > -/* { dg-final { scan-assembler-not {\mrlwinm} } } */ > +/* { dg-final { scan-assembler-not {\mrlwinm} { target ilp32 } } } */ > /* { dg-final { scan-assembler-not {\mrldic} } } */ > /* { dg-final { scan-assembler-not {\mrot[lr]} } } */ > /* { dg-final { scan-assembler-not {\ms[lr][wd]} } } */ > diff --git a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c > b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c > index bafa371db73..ffb5f9e450f 100644 > --- a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c > +++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c > @@ -2,14 +2,14 @@ > /* { dg-options "-O2" } */ > > /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 14121 { target ilp32 } > } } */ > -/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 20217 { target lp64 } } > } */ > +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 21279 { target lp64 } } > } */ > /* { dg-final { scan-assembler-times {(?n)^\s+blr} 6750 } } */ > /* { dg-final { scan-assembler-times {(?n)^\s+mr} 643 { target ilp32 } } } */ > /* { dg-final { scan-assembler-times {(?n)^\s+mr} 11 { target lp64 } } } */ > /* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 7790 { target lp64 } } > } */ > > /* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target ilp32 } > } } */ > -/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1666 { target lp64 } } > } */ > +/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target lp64 } } > } */ > > /* { dg-fina
Re: aix: adjust installation directories for GCC64
On Wed, Sep 15, 2021 at 4:12 AM CHIGOT, CLEMENT wrote: > > As gcc on 64bit for AIX is built with "MULTILIB_MATCHES= .=maix32", > "-print-multi-directory" and similar flags aren't returning the > correct directory when used with -maix32: "." is returned instead > of "ppc32". > Libgcc installation script needs to be adjust to bypass this > problem and correctly install 32bit files in a ppc32 subdirectory. > > libgcc/ChangeLog: > 2021-09-03 Clément Chigot > > * config/rs6000/t-slibgcc-aix (SHLIB_INSTALL): Replace > "$(slibdir)@shlib_slibdir_qual@" by $(inst_libdir). Hi, Clement Sorry for the delay. I believe that this is a change in behavior. Maybe you don't see it because you use --enable-version-specific-runtime-libs? If one uses that configure option, your patch produces the same results -- it's explicitly using that version-specific directory. If one does not use that configure option, your patch enforces that behavior. The patch should not change that behavior. Based on your question in another email message, I infer that MULTIOSSUBDIR and shlib_slibdir_qual are defined incorrectly. Thanks, David
Re: [PATCH 0/3] Add zero cycle move support
On Mon, Nov 22, 2021 at 10:58 AM Bill Schmidt wrote: > > Hi! > > On 11/19/21 8:49 AM, Michael Meissner wrote: > > The next set of 3 patches add zero cycle move support to the Power10. Zero > > cycle moves are where the move to LR/CTR/TAR register that is adjacent to > > the > > jump to LR/CTR/TAR register can be fused together. > > > > At the moment, these set of three patches add support for zero cycle moves > > for > > indirect jumps and switch tables using the CTR register. Potential zero > > cycle > > moves for doing returns are not currently handled. > > > > In looking at the code, I discovered that just using zero cycle moves isn't > > as > > helpful unless we can eliminate the add instruction before doing the jump. > > I > > also noticed that the various power10 fusion options are only done if > > -mcpu=power10. I added a patch to do the fusion for -mtune=power10 as well. > > > > I have done bootstraps and make check with these patches installed on both > > little endian power9 and little endian power10 systems. Can I install these > > patches? > > > > The following patches will be posted: > > > > 1) Patch to add zero cycle move for indirect jumps and switches. > > > > 2) Patch to enable p10 fusion for -mtune=power10 in addition to > > -mcpu=power10. > > > > 3) Patch to use absolute addresses for switch tables instead of relative > >addresses if zero cycle fusion is enabled. > > > For this last point, I had thought that the plan was to always switch over to > absolute addresses for switch tables, following the work that Hao Chen did in > this area. Am I misremembering? Hao Chen, can you please remind me where we > ended up here? And do the absolute addressing for switch tables changes work on AIX? I thought that Hao Chen only had done the work for PPC64 Linux ELF syntax with promises of future changes to accommodate AIX as well. Thanks, David
Re: [PATCH, rs6000] optimization for vec_reve builtin [PR100868]
On Wed, Nov 17, 2021 at 3:28 AM HAO CHEN GUI wrote: > > Hi, > > The patch optimized for vec_reve builtin on rs6000. For V2DI and V2DF, it > is implemented by xxswapd on all targets. For V16QI, V8HI, V4SI and V4SF, it > is implemented by quadword byte reverse plus halfword/word byte reverse when > p9_vector is set. > > Bootstrapped and tested on powerpc64le-linux with no regressions. Is this > okay for trunk? Any recommendations? Thanks a lot. > > ChangeLog > 2021-11-17 Haochen Gui > > gcc/ > * config/rs6000/altivec.md (altivec_vreve2 for VEC_K): Use > xxbrq for v16qi, xxbrq + xxbrh for v8hi and xxbrq + xxbrw for v4si > or v4sf when p9_vector is set. > (altivec_vreve2 for VEC_64): Defined. Implemented by xxswapd. > > gcc/testsuite/ > * gcc.target/powerpc/vec_reve_1.c: New test. > * gcc.target/powerpc/vec_reve_2.c: Likewise. This is okay. Please don't send a message that contains the patch as both an inline message and as an attachment. Thanks, David
Re: [PATCH] Fix rs6000 predicates.md use of decl_replaceable_p
On Thu, Nov 18, 2021 at 2:07 PM Jan Hubicka wrote: > > > --- a/gcc/config/rs6000/predicates.md > > +++ b/gcc/config/rs6000/predicates.md > > @@ -1086,7 +1086,9 @@ (define_predicate "current_file_function_operand" > > (match_test "(DEFAULT_ABI != ABI_AIX || SYMBOL_REF_FUNCTION_P (op)) > > && (SYMBOL_REF_LOCAL_P (op) > > || (op == XEXP (DECL_RTL (current_function_decl), 0) > > - && !decl_replaceable_p (current_function_decl))) > > + && !decl_replaceable_p (current_function_decl, > > + opt_for_fn > > (current_function_decl, > > + > > flag_semantic_interposition > > Thanks, missed the use of the predicate here. > However it is not clear to me why one would care about semantic > interposition at this level. It seems to me that one more cares whether > the symbol must be always resolved locally. In this case > cgraph_node::get (current_function_decl)->binds_to_current_def_p > (cgraph_node::get (current_function_decl)) > would give right answer (and work for cases like functions in comdat groups) Hi, Honza I was trying to fix bootstrap as quickly as possible and used the best example that I could find. It definitely can be refined. Thanks for suggesting a better design. I'll test it. Thanks, David > > Honza > > && !((DEFAULT_ABI == ABI_AIX > > || DEFAULT_ABI == ABI_ELFv2) > > && (SYMBOL_REF_EXTERNAL_P (op)
[PATCH] Fix rs6000 predicates.md use of decl_replaceable_p
--- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -1086,7 +1086,9 @@ (define_predicate "current_file_function_operand" (match_test "(DEFAULT_ABI != ABI_AIX || SYMBOL_REF_FUNCTION_P (op)) && (SYMBOL_REF_LOCAL_P (op) || (op == XEXP (DECL_RTL (current_function_decl), 0) - && !decl_replaceable_p (current_function_decl))) + && !decl_replaceable_p (current_function_decl, + opt_for_fn (current_function_decl, + flag_semantic_interposition && !((DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == ABI_ELFv2) && (SYMBOL_REF_EXTERNAL_P (op)
[PATCH] Detect Power10 CPU on AIX
For -mcpu=native, GCC needs to detect the processor. This patch adds the processor value for Power10. Suggested by Kevin Alder. * config/rs6000/driver-rs6000.c (detect_processor_aix): Add power10. Bootstrapped on powerpc-ibm-aix7.2.3.0 diff --git a/gcc/config/rs6000/driver-rs6000.c b/gcc/config/rs6000/driver-rs6000.c index e23c445b198..4de373da95a 100644 --- a/gcc/config/rs6000/driver-rs6000.c +++ b/gcc/config/rs6000/driver-rs6000.c @@ -418,6 +418,9 @@ detect_processor_aix (void) case 0x2: return "power9"; +case 0x4: + return "power10"; + default: return "powerpc"; }
Re: [PATCH] rs6000: Better error messages for power8/9-vector builtins
On Wed, Nov 17, 2021 at 3:02 PM Segher Boessenkool wrote: > > > It's not a strong objection, since specifying "-mno-vsx" should be > > uncommon. (Right?) And, specifying "-mcpu=power8 -mvsx" is harmless. > > Maybe the warning could say "requires -mcpu=power8 (and -mvsx)"? Is > that clearer, to your eye? Maybe "requires -mcpu=power8 with VSX" or "requires -mcpu=power8 with VSX enabled"? Thanks, David
Re: [PATCH] rs6000: Match recent builtins changes in new builtins support
On Tue, Nov 9, 2021 at 4:40 PM Bill Schmidt wrote: > > Hi! Over the last month or so, Haochen made a couple of changes to the > builtins > support that need to be reflected into the new builtin support: > > 14e355df Disable gimple folding for vector min/max without fast-math > 91419baf Optimize vec_xl_sext > > In both cases these require almost identical changes in the new builtins > support > to what was done for the old support. While testing, I also observed that > Haochen > made a small mistake in the test case gcc.target/powerpc/p10_vec_xl_sext.c, > using > "vector long" when "vector long long" is the officially supported interface. > I > corrected that as part of this patch. > > Bootstrapped and tested on powerpc64le-linux-gnu with no regressions. Is this > okay for trunk? > > Thanks! > Bill > > > 2021-11-09 Bill Schmidt > > gcc/ > * config/rs6000/rs6000-call.c (rs6000_gimple_fold_new_builtin): > Disable gimple fold for RS6000_BIF_{XVMINDP,XVMINSP,VMINFP} and > RS6000_BIF_{XVMAXDP,XVMAXSP,VMAXFP} when fast-math is not set. > (lxvrse_expand_builtin): Modify the expansion for sign extension. > All extensions are done within VSX registers. > > gcc/testsuite/ > * gcc.target/powerpc/p10_vec_xl_sext.c: Fix long long case. Okay. Thanks, David
[PATCH] Fix vsx_splat_v4si in 32 bit mode
powerpc: Fix vsx_splat_v4si in 32 bit mode Tamar's recent patch to teach CSE to perform vector extract exercises VSX splat more frequently, which exposed a constraint error for the vsx_splat patterns. The pattern could be created for Power9, but the "we constraint only provided alternatives in 64 bit mode. The instructions are valid in 32 bit mode and SImode is allowed in VSX registers. This patch updates the constraints from "we" to "wa" to allow the pattern and fix the failing testcases. Bootstrapped on powerpc-ibm-aix7.2.3.0. gcc/ChangeLog: * config/rs6000/vsx.md (vsx_splat_v4si): Change constraints to "wa". (vsx_splat_v4si_di): Change constraint to "wa" diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 0bf04feb6c4..a97f7f2a680 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -4565,7 +4565,7 @@ (define_insn "vsx_splat__mem" ;; V4SI splat support (define_insn "vsx_splat_v4si" - [(set (match_operand:V4SI 0 "vsx_register_operand" "=we,we") + [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa") (vec_duplicate:V4SI (match_operand:SI 1 "splat_input_operand" "r,Z")))] "TARGET_P9_VECTOR" @@ -4578,7 +4578,7 @@ (define_insn "vsx_splat_v4si" ;; allows us to use direct move to get the value in a vector register ;; so that we can use XXSPLTW (define_insn "vsx_splat_v4si_di" - [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,we") + [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa") (vec_duplicate:V4SI (truncate:SI (match_operand:DI 1 "gpc_reg_operand" "wa,r"]
Re: [PATCH] Add TSVC tests.
I just noticed that Iain adjusted the tsvc.h for Darwin in the same way that I need to adjust it for AIX. Are we trying to keep the testcase directory pristine and in sync with its upstream source or can we fix it locally? Thanks, David On Fri, Nov 5, 2021 at 8:24 PM David Edelsohn wrote: > > Hi, Martin > > These testcases rely on memalign in tsvc.h. memalign is provided in > Linux and Solaris, but is not part of Posix, and it is not available > in AIX. Posix defines posix_memalign, which also is available in AIX. > > Should the tsvc.h use posix_memalign? Always? Only when memalign is > not available? > > Thanks, David
Re: [PATCH] Add TSVC tests.
Hi, Martin These testcases rely on memalign in tsvc.h. memalign is provided in Linux and Solaris, but is not part of Posix, and it is not available in AIX. Posix defines posix_memalign, which also is available in AIX. Should the tsvc.h use posix_memalign? Always? Only when memalign is not available? Thanks, David
Re: [PATCH] rs6000: Fix incorrect fusion constraint [PR102991]
On Thu, Nov 4, 2021 at 8:50 PM Xionghu Luo wrote: > [PATCH] rs6000: Fix incorrect fusion constraint [PR102991] > > gcc/ChangeLog: > > * config/rs6000/fusion.md: Regenerate. > * config/rs6000/genfusion.pl: Fix incorrect clobber constraint. Okay. Thanks, David
Re: [PATCH] rs6000: Fix incorrect fusion constraint [PR102991]
On Wed, Nov 3, 2021 at 9:46 PM Xionghu Luo wrote: > > On 2021/11/3 23:13, David Edelsohn wrote: > > Did you manually change fusion.md or did you regenerate it after > > fixing genfusion.pl? > > > > If you regenerated it, the ChangeLog entry should be "Regenerated" and > > the "Fix incorrect clobber constraint." should refer to the > > genfusion.pl change. > > > > I want to ensure that genfusion.pl generates the correct constraint > > the next time it is used. > > > > Aaron mentioned he disabled the auto generation here[1], but before > than that, Segher suggested to enable it in stage1. > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-February/564652.html > [2] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564244.html > > > Thus re-enable it with the followed v2 patch(Confirmed the fusion.md is > exactly same with v1 patch.) > > > [PATCH v2] rs6000: Fix incorrect fusion constraint [PR102991] > > > gcc/ChangeLog: > > * config/rs6000/fusion.md: Regenerate. > * config/rs6000/genfusion.pl: Fix incorrect clobber constraint. > * config/rs6000/t-rs6000: Uncomment regeneration of fusion.md. I believe that there is some confusion about my request. I am not requesting that the patch enable genfusion.pl . The Makefile fragment rule to generate fusion.md is disabled for a reason and normally should not be enabled. But fusion.md should be generated by genfusion.pl when there is a change, and any changes should be made in genfusion.pl. In other words, change genfusion.pl, temporarily enable the Makefile fragment rule, generate fusion.md, disable genfusion.pl. My request was an effort to ensure that genfusion.pl correctly regenerates the new, corrected fusion.md file. I don't want a manual change to fusion.md that differs from the automatically generated file. Only the updated fusion.md and genfusion.pl should be checked in. Has Aaron reviewed and confirmed the change to genfusion.pl? Thanks, David
Re: [PATCH] rs6000: Fix incorrect fusion constraint [PR102991]
Did you manually change fusion.md or did you regenerate it after fixing genfusion.pl? If you regenerated it, the ChangeLog entry should be "Regenerated" and the "Fix incorrect clobber constraint." should refer to the genfusion.pl change. I want to ensure that genfusion.pl generates the correct constraint the next time it is used. Thanks, David On Wed, Nov 3, 2021 at 9:32 AM Xionghu Luo wrote: > > The clobber constraint should match operand's constraint. fusion.md was > generated by genfusion.pl, but it is disabled now, update both places with > correct clobber constraint. > > gcc/ChangeLog: > > * config/rs6000/fusion.md: Fix incorrect clobber constraint. > * config/rs6000/genfusion.pl: Likewise. > --- > gcc/config/rs6000/fusion.md| 128 - > gcc/config/rs6000/genfusion.pl | 2 +- > 2 files changed, 65 insertions(+), 65 deletions(-) > > diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md > index 516baa0bb0b..d11cecb11ee 100644 > --- a/gcc/config/rs6000/fusion.md > +++ b/gcc/config/rs6000/fusion.md > @@ -1874,7 +1874,7 @@ (define_insn "*fuse_vand_vand" > (and:VM (and:VM (match_operand:VM 0 "altivec_register_operand" > "v,v,v,v") >(match_operand:VM 1 "altivec_register_operand" > "%v,v,v,v")) > (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))) > - (clobber (match_scratch:VM 4 "=X,X,X,&r"))] > + (clobber (match_scratch:VM 4 "=X,X,X,&v"))] >"(TARGET_P10_FUSION && TARGET_P10_FUSION_2LOGICAL)" >"@ > vand %3,%1,%0\;vand %3,%3,%2 > @@ -1892,7 +1892,7 @@ (define_insn "*fuse_vandc_vand" > (and:VM (and:VM (not:VM (match_operand:VM 0 > "altivec_register_operand" "v,v,v,v")) >(match_operand:VM 1 "altivec_register_operand" > "v,v,v,v")) > (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))) > - (clobber (match_scratch:VM 4 "=X,X,X,&r"))] > + (clobber (match_scratch:VM 4 "=X,X,X,&v"))] >"(TARGET_P10_FUSION && TARGET_P10_FUSION_2LOGICAL)" >"@ > vandc %3,%1,%0\;vand %3,%3,%2 > @@ -1910,7 +1910,7 @@ (define_insn "*fuse_veqv_vand" > (and:VM (not:VM (xor:VM (match_operand:VM 0 > "altivec_register_operand" "v,v,v,v") >(match_operand:VM 1 "altivec_register_operand" > "v,v,v,v"))) > (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))) > - (clobber (match_scratch:VM 4 "=X,X,X,&r"))] > + (clobber (match_scratch:VM 4 "=X,X,X,&v"))] >"(TARGET_P10_FUSION && TARGET_P10_FUSION_2LOGICAL)" >"@ > veqv %3,%1,%0\;vand %3,%3,%2 > @@ -1928,7 +1928,7 @@ (define_insn "*fuse_vnand_vand" > (and:VM (ior:VM (not:VM (match_operand:VM 0 > "altivec_register_operand" "v,v,v,v")) >(not:VM (match_operand:VM 1 > "altivec_register_operand" "v,v,v,v"))) > (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))) > - (clobber (match_scratch:VM 4 "=X,X,X,&r"))] > + (clobber (match_scratch:VM 4 "=X,X,X,&v"))] >"(TARGET_P10_FUSION && TARGET_P10_FUSION_2LOGICAL)" >"@ > vnand %3,%1,%0\;vand %3,%3,%2 > @@ -1946,7 +1946,7 @@ (define_insn "*fuse_vnor_vand" > (and:VM (and:VM (not:VM (match_operand:VM 0 > "altivec_register_operand" "v,v,v,v")) >(not:VM (match_operand:VM 1 > "altivec_register_operand" "v,v,v,v"))) > (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))) > - (clobber (match_scratch:VM 4 "=X,X,X,&r"))] > + (clobber (match_scratch:VM 4 "=X,X,X,&v"))] >"(TARGET_P10_FUSION && TARGET_P10_FUSION_2LOGICAL)" >"@ > vnor %3,%1,%0\;vand %3,%3,%2 > @@ -1964,7 +1964,7 @@ (define_insn "*fuse_vor_vand" > (and:VM (ior:VM (match_operand:VM 0 "altivec_register_operand" > "v,v,v,v") >(match_operand:VM 1 "altivec_register_operand" > "v,v,v,v")) > (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))) > - (clobber (match_scratch:VM 4 "=X,X,X,&r"))] > + (clobber (match_scratch:VM 4 "=X,X,X,&v"))] >"(TARGET_P10_FUSION && TARGET_P10_FUSION_2LOGICAL)" >"@ > vor %3,%1,%0\;vand %3,%3,%2 > @@ -1982,7 +1982,7 @@ (define_insn "*fuse_vorc_vand" > (and:VM (ior:VM (not:VM (match_operand:VM 0 > "altivec_register_operand" "v,v,v,v")) >(match_operand:VM 1 "altivec_register_operand" > "v,v,v,v")) > (match_operand:VM 2 "altivec_register_operand" "v,v,v,v"))) > - (clobber (match_scratch:VM 4 "=X,X,X,&r"))] > + (clobber (match_scratch:VM 4 "=X,X,X,&v"))] >"(TARGET_P10_FUSION && TARGET_P10_FUSION_2LOGICAL)" >"@ > vorc %3,%1,%0\;vand %3,%3,%2 > @@ -2000,7 +2000,7 @@ (define_insn "*fuse_vxor_vand" > (and:VM (xor:VM (match_operand:VM 0 "altivec_register_operand" > "v,v,v,v") >(match_operand:VM 1 "altivec_register_operand" > "v,v,v,v")) >
Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set
On Mon, Nov 1, 2021 at 10:40 PM HAO CHEN GUI wrote: > > David, > > My patch file was broken. I am sorry for it. Here is the correct one. > Thanks a lot. > > ChangeLog > > 2021-11-01 Haochen Gui > > gcc/ > * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin): Disable > gimple fold for VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP, > VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP when fast-math is not > set. > > gcc/testsuite/ > * gcc.target/powerpc/vec-minmax-1.c: New test. > * gcc.target/powerpc/vec-minmax-2.c: Likewise. This is okay. The default DejaGNU test action is compile, but it's a good idea to include the dg-do line to be clear and document the intention. Thanks, David
Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set
Hi, Hao Neither the inlined patch nor the attached patch seem to contain the change to rs6000-call.c. I only see the new testcases. Please resend the complete patch. Thanks David On Mon, Nov 1, 2021 at 2:48 AM HAO CHEN GUI wrote: > > Hi, > > This patch disables gimple folding for VSX_BUILTIN_XVMINDP, > VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMINFP and ALTIVEC_BUILTIN_VMAXFP when > fast-math is not set. With the gimple folding is enabled, the four built-ins > will be implemented by c-type instructions - xs[min|max]cdp on P9 and P10 if > they can be converted to scalar comparisons. While they are implemented by > xv[min|max][s|d]p on P8 and P7 as P8 and P7 don't have corresponding scalar > comparison instructions. The patch binds these four built-ins to > xv[min|max][s|d]p when fast-math is not set. The two new test cases > illustrate it. > > ALTIVEC_BUILTIN_VMINFP and ALTIVEC_BUILTIN_VMAXFP are not implemented by > vminfp or vmaxfp. > > rs6000-builtin.def:BU_ALTIVEC_2 (VMAXFP, "vmaxfp", > CONST, smaxv4sf3) > > rs6000-builtin.def:BU_ALTIVEC_2 (VMINFP, "vminfp", > CONST, sminv4sf3) > > Bootstrapped and tested on powerpc64le-linux with no regressions. Is this > okay for trunk? Any recommendations? Thanks a lot. > > > ChangeLog > > 2021-11-01 Haochen Gui > > gcc/ > * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin): Disable > gimple fold for VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP, > VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP when fast-math is not > set. > > gcc/testsuite/ > * gcc.target/powerpc/vec-minmax-1.c: New test. > * gcc.target/powerpc/vec-minmax-2.c: Likewise. > > > patch.diff > > diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c > b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c > new file mode 100644 > index 000..e238659c9be > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-1.c > @@ -0,0 +1,52 @@ > +/* { dg-require-effective-target powerpc_p9vector_ok } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */ > +/* { dg-final { scan-assembler-times {\mxvmaxdp\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mxvmaxsp\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mxvmindp\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\mxvminsp\M} 1 } } */ > + > +/* This test verifies that float or double vec_min/max are bound to > + xv[min|max][d|s]p instructions when fast-math is not set. */ > + > + > +#include > + > +#ifdef _BIG_ENDIAN > + const int PREF_D = 0; > +#else > + const int PREF_D = 1; > +#endif > + > +double vmaxd (double a, double b) > +{ > + vector double va = vec_promote (a, PREF_D); > + vector double vb = vec_promote (b, PREF_D); > + return vec_extract (vec_max (va, vb), PREF_D); > +} > + > +double vmind (double a, double b) > +{ > + vector double va = vec_promote (a, PREF_D); > + vector double vb = vec_promote (b, PREF_D); > + return vec_extract (vec_min (va, vb), PREF_D); > +} > + > +#ifdef _BIG_ENDIAN > + const int PREF_F = 0; > +#else > + const int PREF_F = 3; > +#endif > + > +float vmaxf (float a, float b) > +{ > + vector float va = vec_promote (a, PREF_F); > + vector float vb = vec_promote (b, PREF_F); > + return vec_extract (vec_max (va, vb), PREF_F); > +} > + > +float vminf (float a, float b) > +{ > + vector float va = vec_promote (a, PREF_F); > + vector float vb = vec_promote (b, PREF_F); > + return vec_extract (vec_min (va, vb), PREF_F); > +} > diff --git a/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c > b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c > new file mode 100644 > index 000..149275d8709 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/vec-minmax-2.c > @@ -0,0 +1,50 @@ > +/* { dg-require-effective-target powerpc_p9vector_ok } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power9 -ffast-math" } */ > +/* { dg-final { scan-assembler-times {\mxsmaxcdp\M} 2 } } */ > +/* { dg-final { scan-assembler-times {\mxsmincdp\M} 2 } } */ > + > +/* This test verifies that float or double vec_min/max can be converted > + to scalar comparison when fast-math is set. */ > + > + > +#include > + > +#ifdef _BIG_ENDIAN > + const int PREF_D = 0; > +#else > + const int PREF_D = 1; > +#endif > + > +double vmaxd (double a, double b) > +{ > + vector double va = vec_promote (a, PREF_D); > + vector double vb = vec_promote (b, PREF_D); > + return vec_extract (vec_max (va, vb), PREF_D); > +} > + > +double vmind (double a, double b) > +{ > + vector double va = vec_promote (a, PREF_D); > + vector double vb = vec_promote (b, PREF_D); > + return vec_extract (vec_min (va, vb), PREF_D); > +} > + > +#ifdef _BIG_ENDIAN > + const int PREF_F = 0; > +#else > + const int PREF_F = 3; > +#endif > + > +float vmaxf (float a, float b) > +{ > + vector float va = vec_promote (a, PREF_F); > + vector float vb = vec_promote (b, PREF_F); > + return vec_extract (vec_max (va, vb), PREF_F); > +} > +
Re: [PATCH v2] rs6000: Optimize __builtin_shuffle when it's used to zero the upper bits [PR102868]
On Thu, Oct 28, 2021 at 1:39 AM Xionghu Luo wrote: > > On 2021/10/27 21:24, David Edelsohn wrote: > > On Sun, Oct 24, 2021 at 10:51 PM Xionghu Luo wrote: > >> > >> If the second operand of __builtin_shuffle is const vector 0, and with > >> specific mask, it can be optimized to vspltisw+xxpermdi instead of lxv. > >> > >> gcc/ChangeLog: > >> > >> * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Add > >> patterns match and emit for VSX xxpermdi. > >> > >> gcc/testsuite/ChangeLog: > >> > >> * gcc.target/powerpc/pr102868.c: New test. > >> --- > >> gcc/config/rs6000/rs6000.c | 47 -- > >> gcc/testsuite/gcc.target/powerpc/pr102868.c | 53 + > >> 2 files changed, 97 insertions(+), 3 deletions(-) > >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102868.c > >> > >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > >> index d0730253bcc..5d802c1fa96 100644 > >> --- a/gcc/config/rs6000/rs6000.c > >> +++ b/gcc/config/rs6000/rs6000.c > >> @@ -23046,7 +23046,23 @@ altivec_expand_vec_perm_const (rtx target, rtx > >> op0, rtx op1, > >> {OPTION_MASK_P8_VECTOR, > >> BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgow_v4sf_direct > >> : CODE_FOR_p8_vmrgew_v4sf_direct, > >> - {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}}}; > >> + {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}}, > >> +{OPTION_MASK_VSX, > >> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi > >> + : CODE_FOR_vsx_xxpermdi_v16qi), > >> + {0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23}}, > >> +{OPTION_MASK_VSX, > >> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi > >> + : CODE_FOR_vsx_xxpermdi_v16qi), > >> + {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23}}, > >> +{OPTION_MASK_VSX, > >> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi > >> + : CODE_FOR_vsx_xxpermdi_v16qi), > >> + {0, 1, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31}}, > >> +{OPTION_MASK_VSX, > >> + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi > >> + : CODE_FOR_vsx_xxpermdi_v16qi), > >> + {8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31}}}; > > > > If the insn_code is the same for big endian and little endian, why > > does the new code test BYTES_BIG_ENDIAN to set the same value > > (CODE_FOR_vsx_xxpermdi_v16qi)? > > > > Thanks for the catch, updated the patch as below: > > [PATCH v2] rs6000: Optimize __builtin_shuffle when it's used to zero the > upper bits [PR102868] > > If the second operand of __builtin_shuffle is const vector 0, and with > specific mask, it can be optimized to vspltisw+xxpermdi instead of lxv. > > gcc/ChangeLog: > > * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Add > patterns match and emit for VSX xxpermdi. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/pr102868.c: New test. Okay. Thanks, David
Re: [PATCH] rs6000: Fix ICE of vect cost related to V1TI [PR102767]
On Wed, Oct 27, 2021 at 9:30 PM Kewen.Lin wrote: > > Hi David, > > Thanks for the review! > > on 2021/10/27 下午9:12, David Edelsohn wrote: > > On Sun, Oct 24, 2021 at 11:04 PM Kewen.Lin wrote: > >> > >> Hi, > >> > >> As PR102767 shows, the commit r12-3482 exposed one ICE in function > >> rs6000_builtin_vectorization_cost. We claims V1TI supports movmisalign > >> on rs6000 (See define_expand "movmisalign"), so it return true in > >> rs6000_builtin_support_vector_misalignment for misalign 8. Later in > >> the cost querying rs6000_builtin_vectorization_cost, we don't have > >> the arms to handle the V1TI input under (TARGET_VSX && > >> TARGET_ALLOW_MOVMISALIGN). > >> > >> The proposed fix is to add the consideration for V1TI, simply make it > >> as the cost for doubleword which is apparently bigger than the cost of > >> scalar, won't have the vectorization to happen, just to keep consistency > >> and avoid ICE. Another thought is to not support movmisalign for V1TI, > >> but it sounds like a bad idea since it doesn't match the reality. > >> > >> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and > >> powerpc64-linux-gnu P8. > >> > >> Is it ok for trunk? > >> > >> BR, > >> Kewen > >> - > >> gcc/ChangeLog: > >> > >> PR target/102767 > >> * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): > >> Consider > >> V1T1 mode for unaligned load and store. > >> > >> gcc/testsuite/ChangeLog: > >> > >> PR target/102767 > >> * gcc.target/powerpc/ppc-fortran/pr102767.f90: New file. > >> > >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > >> index b7ea1483da5..73d3e06c3fc 100644 > >> --- a/gcc/config/rs6000/rs6000.c > >> +++ b/gcc/config/rs6000/rs6000.c > >> @@ -5145,7 +5145,8 @@ rs6000_builtin_vectorization_cost (enum > >> vect_cost_for_stmt type_of_cost, > >> if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN) > >> { > >> elements = TYPE_VECTOR_SUBPARTS (vectype); > >> - if (elements == 2) > >> + /* See PR102767, consider V1TI to keep consistency. */ > >> + if (elements == 2 || elements == 1) > >> /* Double word aligned. */ > >> return 4; > >> > >> @@ -5184,10 +5185,11 @@ rs6000_builtin_vectorization_cost (enum > >> vect_cost_for_stmt type_of_cost, > >> > >> if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN) > >>{ > >> -elements = TYPE_VECTOR_SUBPARTS (vectype); > >> -if (elements == 2) > >> - /* Double word aligned. */ > >> - return 2; > >> + elements = TYPE_VECTOR_SUBPARTS (vectype); > >> + /* See PR102767, consider V1TI to keep consistency. */ > >> + if (elements == 2 || elements == 1) > >> + /* Double word aligned. */ > >> + return 2; > > > > This section of the patch incorrectly changes the indentation. Please > > use the correct indentation. > > > > The indentation change is intentional since the original identation is > wrong (more than 8 spaces leading the lines), there are more wrong > identation lines above the first changed line, but I thought it seems a > bad idea to fix them too when they are unrelated to what this patch > wants to fix, so I left them alone. > > With the above clarification, may I push this patch without any updates > for the mentioned indentation issue? If you correct the indentation, you should adjust it for the entire block, not just the lines that you change. If you want to fix the entire block to TAB+spaces as well, okay. You didn't mention that you were fixing the indentation in the explanation of the patch. Thank, David > > >> > >> if (elements == 4) > >>{ > >> diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90 > >> b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90 > >> new file mode 100644 > >> index 000..a4122482989 > >> --- /dev/null > >> +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90 > >> @@ -0,0 +1,21 @@ > >> +! { dg-require-effective-target powerpc_vsx_ok } > >> +! { dg-options "-mvsx -O2 -ftree-vectorize -mno-efficient-unaligned-vsx" } > >> + > >> +INTERFACE > >> + FUNCTION elemental_mult (a, b, c) > >> +type(*), DIMENSION(..) :: a, b, c > >> + END > >> +END INTERFACE > >> + > >> +allocatable z > >> +integer, dimension(2,2) :: a, b > >> +call test_CFI_address > >> +contains > >> + subroutine test_CFI_address > >> +if (elemental_mult (z, x, y) .ne. 0) stop > >> +a = reshape ([4,3,2,1], [2,2]) > >> +b = reshape ([2,3,4,5], [2,2]) > >> +if (elemental_mult (i, a, b) .ne. 0) stop > >> + end > >> +end > >> + > >> > > > > The patch is okay with the indentation correction. > > > > Thanks, David > > > > Thanks! > > BR, > Kewen
Re: [PATCH] rs6000: Optimize __builtin_shuffle when it's used to zero the upper bits [PR102868]
On Sun, Oct 24, 2021 at 10:51 PM Xionghu Luo wrote: > > If the second operand of __builtin_shuffle is const vector 0, and with > specific mask, it can be optimized to vspltisw+xxpermdi instead of lxv. > > gcc/ChangeLog: > > * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Add > patterns match and emit for VSX xxpermdi. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/pr102868.c: New test. > --- > gcc/config/rs6000/rs6000.c | 47 -- > gcc/testsuite/gcc.target/powerpc/pr102868.c | 53 + > 2 files changed, 97 insertions(+), 3 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102868.c > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index d0730253bcc..5d802c1fa96 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -23046,7 +23046,23 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, > rtx op1, > {OPTION_MASK_P8_VECTOR, > BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgow_v4sf_direct > : CODE_FOR_p8_vmrgew_v4sf_direct, > - {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}}}; > + {4, 5, 6, 7, 20, 21, 22, 23, 12, 13, 14, 15, 28, 29, 30, 31}}, > +{OPTION_MASK_VSX, > + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi > + : CODE_FOR_vsx_xxpermdi_v16qi), > + {0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23}}, > +{OPTION_MASK_VSX, > + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi > + : CODE_FOR_vsx_xxpermdi_v16qi), > + {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23}}, > +{OPTION_MASK_VSX, > + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi > + : CODE_FOR_vsx_xxpermdi_v16qi), > + {0, 1, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31}}, > +{OPTION_MASK_VSX, > + (BYTES_BIG_ENDIAN ? CODE_FOR_vsx_xxpermdi_v16qi > + : CODE_FOR_vsx_xxpermdi_v16qi), > + {8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31}}}; If the insn_code is the same for big endian and little endian, why does the new code test BYTES_BIG_ENDIAN to set the same value (CODE_FOR_vsx_xxpermdi_v16qi)? Thanks, David > >unsigned int i, j, elt, which; >unsigned char perm[16]; > @@ -23169,6 +23185,27 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, > rtx op1, > machine_mode omode = insn_data[icode].operand[0].mode; > machine_mode imode = insn_data[icode].operand[1].mode; > > + rtx perm_idx = GEN_INT (0); > + if (icode == CODE_FOR_vsx_xxpermdi_v16qi) > + { > + int perm_val = 0; > + if (one_vec) > + { > + if (perm[0] == 8) > + perm_val |= 2; > + if (perm[8] == 8) > + perm_val |= 1; > + } > + else > + { > + if (perm[0] != 0) > + perm_val |= 2; > + if (perm[8] != 16) > + perm_val |= 1; > + } > + perm_idx = GEN_INT (perm_val); > + } > + > /* For little-endian, don't use vpkuwum and vpkuhum if the > underlying vector type is not V4SI and V8HI, respectively. > For example, using vpkuwum with a V8HI picks up the even > @@ -23192,7 +23229,8 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, > rtx op1, >/* For little-endian, the two input operands must be swapped > (or swapped back) to ensure proper right-to-left numbering > from 0 to 2N-1. */ > - if (swapped ^ !BYTES_BIG_ENDIAN) > + if (swapped ^ !BYTES_BIG_ENDIAN > + && icode != CODE_FOR_vsx_xxpermdi_v16qi) > std::swap (op0, op1); > if (imode != V16QImode) > { > @@ -23203,7 +23241,10 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, > rtx op1, > x = target; > else > x = gen_reg_rtx (omode); > - emit_insn (GEN_FCN (icode) (x, op0, op1)); > + if (icode == CODE_FOR_vsx_xxpermdi_v16qi) > + emit_insn (GEN_FCN (icode) (x, op0, op1, perm_idx)); > + else > + emit_insn (GEN_FCN (icode) (x, op0, op1)); > if (omode != V16QImode) > emit_move_insn (target, gen_lowpart (V16QImode, x)); > return true; > diff --git a/gcc/testsuite/gcc.target/powerpc/pr102868.c > b/gcc/testsuite/gcc.target/powerpc/pr102868.c > new file mode 100644 > index 000..eb45d193f66 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr102868.c > @@ -0,0 +1,53 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target powerpc_vsx_ok } */ > +/* { dg-options "-O2 -mvsx" } */ > + > +#include > +vector float b = {0.0f, 0.0f, 0.0f, 0.0f}; > + > + > +vector float foo1 (vector float x) > +{ > + vector int c = {0, 1,
Re: Ping^3: [PATCH v2 0/2] Fix vec_sel code generation and merge xxsel to vsel
This patch series is okay. Thanks, David On Thu, Oct 21, 2021 at 11:25 PM Xionghu Luo wrote: > > Ping^3, thanks. > > https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579637.html > > > On 2021/10/15 14:28, Xionghu Luo via Gcc-patches wrote: > > Ping^2, thanks. > > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579637.html > > > > > > On 2021/10/8 09:17, Xionghu Luo via Gcc-patches wrote: > >> Ping, thanks. > >> > >> > >> On 2021/9/17 13:25, Xionghu Luo wrote: > >>> These two patches are updated version from: > >>> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579490.html > >>> > >>> Changes: > >>> 1. Fix alignment error in md files. > >>> 2. Replace rtx_equal_p with match_dup. > >>> 3. Use register_operand instead of gpc_reg_operand to align with > >>>vperm/xxperm. > >>> 4. Regression tested pass on P8LE. > >>> > >>> Xionghu Luo (2): > >>> rs6000: Fix wrong code generation for vec_sel [PR94613] > >>> rs6000: Fold xxsel to vsel since they have same semantics > >>> > >>> gcc/config/rs6000/altivec.md | 84 ++- > >>> gcc/config/rs6000/rs6000-call.c | 62 ++ > >>> gcc/config/rs6000/rs6000.c| 19 ++--- > >>> gcc/config/rs6000/vector.md | 26 +++--- > >>> gcc/config/rs6000/vsx.md | 25 -- > >>> gcc/testsuite/gcc.target/powerpc/builtins-1.c | 2 +- > >>> gcc/testsuite/gcc.target/powerpc/pr94613.c| 47 +++ > >>> 7 files changed, 193 insertions(+), 72 deletions(-) > >>> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr94613.c > >>> > >> > > > > -- > Thanks, > Xionghu
Re: [PATCH] rs6000: Fix ICE of vect cost related to V1TI [PR102767]
On Sun, Oct 24, 2021 at 11:04 PM Kewen.Lin wrote: > > Hi, > > As PR102767 shows, the commit r12-3482 exposed one ICE in function > rs6000_builtin_vectorization_cost. We claims V1TI supports movmisalign > on rs6000 (See define_expand "movmisalign"), so it return true in > rs6000_builtin_support_vector_misalignment for misalign 8. Later in > the cost querying rs6000_builtin_vectorization_cost, we don't have > the arms to handle the V1TI input under (TARGET_VSX && > TARGET_ALLOW_MOVMISALIGN). > > The proposed fix is to add the consideration for V1TI, simply make it > as the cost for doubleword which is apparently bigger than the cost of > scalar, won't have the vectorization to happen, just to keep consistency > and avoid ICE. Another thought is to not support movmisalign for V1TI, > but it sounds like a bad idea since it doesn't match the reality. > > Bootstrapped and regtested on powerpc64le-linux-gnu P9 and > powerpc64-linux-gnu P8. > > Is it ok for trunk? > > BR, > Kewen > - > gcc/ChangeLog: > > PR target/102767 > * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Consider > V1T1 mode for unaligned load and store. > > gcc/testsuite/ChangeLog: > > PR target/102767 > * gcc.target/powerpc/ppc-fortran/pr102767.f90: New file. > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index b7ea1483da5..73d3e06c3fc 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -5145,7 +5145,8 @@ rs6000_builtin_vectorization_cost (enum > vect_cost_for_stmt type_of_cost, > if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN) > { > elements = TYPE_VECTOR_SUBPARTS (vectype); > - if (elements == 2) > + /* See PR102767, consider V1TI to keep consistency. */ > + if (elements == 2 || elements == 1) > /* Double word aligned. */ > return 4; > > @@ -5184,10 +5185,11 @@ rs6000_builtin_vectorization_cost (enum > vect_cost_for_stmt type_of_cost, > > if (TARGET_VSX && TARGET_ALLOW_MOVMISALIGN) >{ > -elements = TYPE_VECTOR_SUBPARTS (vectype); > -if (elements == 2) > - /* Double word aligned. */ > - return 2; > + elements = TYPE_VECTOR_SUBPARTS (vectype); > + /* See PR102767, consider V1TI to keep consistency. */ > + if (elements == 2 || elements == 1) > + /* Double word aligned. */ > + return 2; This section of the patch incorrectly changes the indentation. Please use the correct indentation. > > if (elements == 4) >{ > diff --git a/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90 > b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90 > new file mode 100644 > index 000..a4122482989 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/ppc-fortran/pr102767.f90 > @@ -0,0 +1,21 @@ > +! { dg-require-effective-target powerpc_vsx_ok } > +! { dg-options "-mvsx -O2 -ftree-vectorize -mno-efficient-unaligned-vsx" } > + > +INTERFACE > + FUNCTION elemental_mult (a, b, c) > +type(*), DIMENSION(..) :: a, b, c > + END > +END INTERFACE > + > +allocatable z > +integer, dimension(2,2) :: a, b > +call test_CFI_address > +contains > + subroutine test_CFI_address > +if (elemental_mult (z, x, y) .ne. 0) stop > +a = reshape ([4,3,2,1], [2,2]) > +b = reshape ([2,3,4,5], [2,2]) > +if (elemental_mult (i, a, b) .ne. 0) stop > + end > +end > + > The patch is okay with the indentation correction. Thanks, David
Re: [PATCH] gcc: implement AIX-style constructors
On Thu, Oct 21, 2021 at 8:39 AM CHIGOT, CLEMENT wrote: > > Hi David, > > The problem is that cdtors is created by the linker only when the -bcdtors > flag is provided. Thus, if we add "extern void (* _cdtors[]) (void);" to > the "crtcxa.c", we can't use it without using the new constructor types. > One solution would be to create another crtcxa (eg crtcxa_cdtors.o) which will > be replacing the default crtcxa.o when needed. I didn't think about that > when creating the patch. But it might be a better approach. What do you think > ? Hi, Clement Another, optional object file in libgcc seems like a cleaner solution than collect2.c emitting all of the code. The file should not be called crtcxa_cdtors.o because "cxa" came from the name of the libstdc++ support functions. Maybe crtcdtors.o or crt_cdtors.o or aixcdtors.o or aix_cdtors.o . Thanks, David
Re: [PATCH] gcc: implement AIX-style constructors
Clement, + /* Use __C_runtime_pstartup to run ctors and register dtors. + This whole part should normally be in libgcc but as + AIX cdtors format is currently not the default, managed + that in collect2. */ Why are you emitting the special startup function call in collect2.c instead of placing it in libgcc. The comment mentions that the special startup function should be defined in libgcc. Yes, the AIX ld bcdtors mechanism is not the default, but what is the harm? The symbol will be defined and exported by libgcc. If the AIX linker -bcdtors functionality is not invoked, the symbol is not used. And if a user does invoke the AIX linker with -bcdtors, the behavior will be the same (either the program was compiled to use AIX cdtors or not, which is the same if the startup function is emitted by collect2.c. Also, the patch should include documentation of the option. The documentation should mention that this is for interoperability with IBM XL Compiler, and the option will not operate correctly unless the application and the GCC runtime are built with the option. Thanks, David On Mon, Oct 18, 2021 at 3:55 AM CHIGOT, CLEMENT wrote: > > AIX linker now supports constructors and destructors detection. For such > functions to be detected, their name must starts with __sinit or __sterm. > and -bcdtors must be passed to linker calls. It will create "_cdtors" > symbol which can be used to launch the initialization. > > This patch creates a new RS6000 flag "-mcdtors=". > With "-mcdtors=aix", gcc will generate these new constructors/destructors. > With "-mcdtors=gcc", which is currently the default, gcc will continue > to generate "gcc" format for constructors (ie _GLOBAL__I and _GLOBAL__D > symbols). > Ideally, it would have been better to enable the AIX format by default > instead of using collect2. However, the compatibility between the > previously-built binaries and the new ones is too complex to be done. > > gcc/ChangeLog: > 2021-10-04 Clément Chigot > > * collect2.c (aixbcdtors_flags): New variable. > (main): Use it to detect -bcdtors and remove -binitfini flag. > (write_c_file_stat): Adapt to new AIX format. > * config/rs6000/aix.h (FILE_SINIT_FORMAT): New define. > (FILE_STERM_FORMAT): New define. > (TARGET_FILE_FUNCTION_FORMAT): New define. > * config/rs6000/aix64.opt: Add -mcdtors flag. > * config/rs6000/aix71.h (LINK_SPEC_COMMON): Pass -bcdtors when > -mcdtors=aix is passed. > * config/rs6000/aix72.h (LINK_SPEC_COMMON): Likewise. > * config/rs6000/aix73.h (LINK_SPEC_COMMON): Likewise. > * config/rs6000/rs6000-opts.h (enum rs6000_cdtors): New enum. > * tree.c (get_file_function_name): Add > TARGET_FILE_FUNCTION_FORMAT support. > > gcc/testsuite/ChangeLog: > 2021-10-04 Clément Chigot > > * gcc.target/powerpc/constructor-aix.c: New test. > >
Re: [PATCH v2 0/4] libffi: Sync with upstream
Hi, H.J. My colleague built GCC, including GCC Go, with your patch: "I was able to build libgo and test it partially. The results are similar to the current master without libffi updates. But 64bit tests aren't working in both cases. It's related to LIBPATH issues..." - David On Mon, Oct 18, 2021 at 11:09 AM H.J. Lu wrote: > > On Mon, Oct 18, 2021 at 8:04 AM David Edelsohn wrote: > > > > Hi, H.J. > > > > My colleague responded that GCC Go builds and works on AIX, but it > > currently requires a special, custom version of GNU objcopy that adds > > support for the types of features that Go requires to operate on AIX > > XCOFF files. Those changes have not yet been updated and contributed > > to GNU Binutils. > > > > I will see if I can install that version of objcopy standalone. We > > also can ask Clement and ATOS to test GCC Go build with your proposed > > libffi patch, or is it vanilla libffi trunk? > > My libffi branch: > > https://gitlab.com/x86-gcc/gcc/-/tree/users/hjl/libffi/master > > synced with libffi v3.4.2, not master. > > BTW, the current master branch won't build libgo: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102796 > > Thanks. > > > Thanks, David > > > > > > On Sat, Oct 16, 2021 at 3:59 PM H.J. Lu wrote: > > > > > > On Sat, Oct 16, 2021 at 12:53 PM David Edelsohn wrote: > > > > > > > > On Sat, Oct 16, 2021 at 1:13 PM H.J. Lu wrote: > > > > > > > > > > On Sat, Oct 16, 2021 at 10:04 AM David Edelsohn > > > > > wrote: > > > > > > > > > > > > On Sat, Oct 16, 2021 at 7:48 AM H.J. Lu wrote: > > > > > > > > > > > > > > On Fri, Oct 15, 2021 at 5:22 PM David Edelsohn > > > > > > > wrote: > > > > > > > > > > > > > > > > On Fri, Oct 15, 2021 at 8:06 PM H.J. Lu > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > On Wed, Oct 13, 2021 at 6:42 AM H.J. Lu > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > On Wed, Oct 13, 2021 at 6:03 AM Richard Biener > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 13, 2021 at 2:56 PM H.J. Lu > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Oct 13, 2021 at 5:45 AM Richard Biener > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Sep 2, 2021 at 5:50 PM H.J. Lu > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > Change in the v2 patch: > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. Disable static trampolines by default. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > GCC maintained a copy of libffi snapshot from 2009 > > > > > > > > > > > > > > and cherry-picked fixes > > > > > > > > > > > > > > from upstream over the last 10+ years. In the > > > > > > > > > > > > > > meantime, libffi upstream > > > > > > > > > > > > > > has been changed significantly with new features, > > > > > > > > > > > > > > bug fixes and new target > > > > > > > > > > > > > > support. Here is a set of patches to sync with > > > > > > > > > > > > > > libffi 3.4.2 release and > > > > > > > > > > > > > > make it easier to sync with libffi upstream: > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. Document how to sync with upstream. > > > > > > > > > > > > > > 2. Add scripts to help sync with up
Re: [PATCH v2] rs6000: Remove unspecs for vec_mrghl[bhw]
On Tue, Oct 12, 2021 at 9:50 PM Xionghu Luo wrote: > > Resend this patch. Previous discussion is: > > https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html > > vmrghb only accepts permute index {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, > 5, 21, 6, 22, 7, 23} no matter for BE or LE in ISA, similarly for vmrglb. > Remove UNSPEC_VMRGH_DIRECT/UNSPEC_VMRGL_DIRECT pattern as vec_select > + vec_concat as normal RTL. > > Tested pass on P8LE, P9LE and P8BE{m32}, ok for trunk? > > gcc/ChangeLog: > > * config/rs6000/altivec.md (*altivec_vmrghb_internal): Delete. > (altivec_vmrghb_direct): New. > (*altivec_vmrghh_internal): Delete. > (altivec_vmrghh_direct): New. > (*altivec_vmrghw_internal): Delete. > (altivec_vmrghw_direct_): New. > (altivec_vmrghw_direct): Delete. > (*altivec_vmrglb_internal): Delete. > (altivec_vmrglb_direct): New. > (*altivec_vmrglh_internal): Delete. > (altivec_vmrglh_direct): New. > (*altivec_vmrglw_internal): Delete. > (altivec_vmrglw_direct_): New. > (altivec_vmrglw_direct): Delete. > * config/rs6000/rs6000-p8swap.c (rtx_is_swappable_p): Adjust. > * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): > Adjust. > * config/rs6000/vsx.md (vsx_xxmrghw_): Adjust. > (vsx_xxmrglw_): Adjust. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/builtins-1.c: Update instruction counts. This patch is okay. Thanks, David
Re: [PATCH v2 0/4] libffi: Sync with upstream
Hi, H.J. My colleague responded that GCC Go builds and works on AIX, but it currently requires a special, custom version of GNU objcopy that adds support for the types of features that Go requires to operate on AIX XCOFF files. Those changes have not yet been updated and contributed to GNU Binutils. I will see if I can install that version of objcopy standalone. We also can ask Clement and ATOS to test GCC Go build with your proposed libffi patch, or is it vanilla libffi trunk? Thanks, David On Sat, Oct 16, 2021 at 3:59 PM H.J. Lu wrote: > > On Sat, Oct 16, 2021 at 12:53 PM David Edelsohn wrote: > > > > On Sat, Oct 16, 2021 at 1:13 PM H.J. Lu wrote: > > > > > > On Sat, Oct 16, 2021 at 10:04 AM David Edelsohn wrote: > > > > > > > > On Sat, Oct 16, 2021 at 7:48 AM H.J. Lu wrote: > > > > > > > > > > On Fri, Oct 15, 2021 at 5:22 PM David Edelsohn > > > > > wrote: > > > > > > > > > > > > On Fri, Oct 15, 2021 at 8:06 PM H.J. Lu wrote: > > > > > > > > > > > > > > On Wed, Oct 13, 2021 at 6:42 AM H.J. Lu > > > > > > > wrote: > > > > > > > > > > > > > > > > On Wed, Oct 13, 2021 at 6:03 AM Richard Biener > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > On Wed, Oct 13, 2021 at 2:56 PM H.J. Lu > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > On Wed, Oct 13, 2021 at 5:45 AM Richard Biener > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > On Thu, Sep 2, 2021 at 5:50 PM H.J. Lu > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > Change in the v2 patch: > > > > > > > > > > > > > > > > > > > > > > > > 1. Disable static trampolines by default. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > GCC maintained a copy of libffi snapshot from 2009 and > > > > > > > > > > > > cherry-picked fixes > > > > > > > > > > > > from upstream over the last 10+ years. In the > > > > > > > > > > > > meantime, libffi upstream > > > > > > > > > > > > has been changed significantly with new features, bug > > > > > > > > > > > > fixes and new target > > > > > > > > > > > > support. Here is a set of patches to sync with libffi > > > > > > > > > > > > 3.4.2 release and > > > > > > > > > > > > make it easier to sync with libffi upstream: > > > > > > > > > > > > > > > > > > > > > > > > 1. Document how to sync with upstream. > > > > > > > > > > > > 2. Add scripts to help sync with upstream. > > > > > > > > > > > > 3. Sync with libffi 3.4.2. This patch is quite big. It > > > > > > > > > > > > is availale at > > > > > > > > > > > > > > > > > > > > > > > > https://gitlab.com/x86-gcc/gcc/-/commit/15e80c879c571f79a0e57702848a9df5fba5be2f > > > > > > > > > > > > 4. Integrate libffi build and testsuite with GCC. > > > > > > > > > > > > > > > > > > > > > > How did you test this? It looks like libgo is the only > > > > > > > > > > > consumer of > > > > > > > > > > > libffi these days. > > > > > > > > > > > In particular go/libgo seems to be supported on almost > > > > > > > > > > > all targets besides > > > > > > > > > > > darwin/windows - did you test cross and canadian > > > > > > > > > > > configurations? > > > > > > > > > > > > > > > > > > > > I only tested it on Linux/i686 and Linux/x86-64. My > > > > > > > > > > understanding is
Re: [PATCH v2 0/4] libffi: Sync with upstream
On Sat, Oct 16, 2021 at 3:59 PM H.J. Lu wrote: > > On Sat, Oct 16, 2021 at 12:53 PM David Edelsohn wrote: > > > > On Sat, Oct 16, 2021 at 1:13 PM H.J. Lu wrote: > > > > > > On Sat, Oct 16, 2021 at 10:04 AM David Edelsohn wrote: > > > > > > > > On Sat, Oct 16, 2021 at 7:48 AM H.J. Lu wrote: > > > > > > > > > > On Fri, Oct 15, 2021 at 5:22 PM David Edelsohn > > > > > wrote: > > > > > > > > > > > > On Fri, Oct 15, 2021 at 8:06 PM H.J. Lu wrote: > > > > > > > > > > > > > > On Wed, Oct 13, 2021 at 6:42 AM H.J. Lu > > > > > > > wrote: > > > > > > > > > > > > > > > > On Wed, Oct 13, 2021 at 6:03 AM Richard Biener > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > On Wed, Oct 13, 2021 at 2:56 PM H.J. Lu > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > On Wed, Oct 13, 2021 at 5:45 AM Richard Biener > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > On Thu, Sep 2, 2021 at 5:50 PM H.J. Lu > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > Change in the v2 patch: > > > > > > > > > > > > > > > > > > > > > > > > 1. Disable static trampolines by default. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > GCC maintained a copy of libffi snapshot from 2009 and > > > > > > > > > > > > cherry-picked fixes > > > > > > > > > > > > from upstream over the last 10+ years. In the > > > > > > > > > > > > meantime, libffi upstream > > > > > > > > > > > > has been changed significantly with new features, bug > > > > > > > > > > > > fixes and new target > > > > > > > > > > > > support. Here is a set of patches to sync with libffi > > > > > > > > > > > > 3.4.2 release and > > > > > > > > > > > > make it easier to sync with libffi upstream: > > > > > > > > > > > > > > > > > > > > > > > > 1. Document how to sync with upstream. > > > > > > > > > > > > 2. Add scripts to help sync with upstream. > > > > > > > > > > > > 3. Sync with libffi 3.4.2. This patch is quite big. It > > > > > > > > > > > > is availale at > > > > > > > > > > > > > > > > > > > > > > > > https://gitlab.com/x86-gcc/gcc/-/commit/15e80c879c571f79a0e57702848a9df5fba5be2f > > > > > > > > > > > > 4. Integrate libffi build and testsuite with GCC. > > > > > > > > > > > > > > > > > > > > > > How did you test this? It looks like libgo is the only > > > > > > > > > > > consumer of > > > > > > > > > > > libffi these days. > > > > > > > > > > > In particular go/libgo seems to be supported on almost > > > > > > > > > > > all targets besides > > > > > > > > > > > darwin/windows - did you test cross and canadian > > > > > > > > > > > configurations? > > > > > > > > > > > > > > > > > > > > I only tested it on Linux/i686 and Linux/x86-64. My > > > > > > > > > > understanding is that > > > > > > > > > > the upstream libffi works on Darwin and Windows. > > > > > > > > > > > > > > > > > > > > > I applaud the attempt to sync to upsteam but I fear you > > > > > > > > > > > won't get any "review" > > > > > > > > > > > of this massive diff. > > > > > > > > > > > > &
Re: [PATCH v2 0/4] libffi: Sync with upstream
On Sat, Oct 16, 2021 at 1:13 PM H.J. Lu wrote: > > On Sat, Oct 16, 2021 at 10:04 AM David Edelsohn wrote: > > > > On Sat, Oct 16, 2021 at 7:48 AM H.J. Lu wrote: > > > > > > On Fri, Oct 15, 2021 at 5:22 PM David Edelsohn wrote: > > > > > > > > On Fri, Oct 15, 2021 at 8:06 PM H.J. Lu wrote: > > > > > > > > > > On Wed, Oct 13, 2021 at 6:42 AM H.J. Lu wrote: > > > > > > > > > > > > On Wed, Oct 13, 2021 at 6:03 AM Richard Biener > > > > > > wrote: > > > > > > > > > > > > > > On Wed, Oct 13, 2021 at 2:56 PM H.J. Lu > > > > > > > wrote: > > > > > > > > > > > > > > > > On Wed, Oct 13, 2021 at 5:45 AM Richard Biener > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > On Thu, Sep 2, 2021 at 5:50 PM H.J. Lu > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > Change in the v2 patch: > > > > > > > > > > > > > > > > > > > > 1. Disable static trampolines by default. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > GCC maintained a copy of libffi snapshot from 2009 and > > > > > > > > > > cherry-picked fixes > > > > > > > > > > from upstream over the last 10+ years. In the meantime, > > > > > > > > > > libffi upstream > > > > > > > > > > has been changed significantly with new features, bug fixes > > > > > > > > > > and new target > > > > > > > > > > support. Here is a set of patches to sync with libffi > > > > > > > > > > 3.4.2 release and > > > > > > > > > > make it easier to sync with libffi upstream: > > > > > > > > > > > > > > > > > > > > 1. Document how to sync with upstream. > > > > > > > > > > 2. Add scripts to help sync with upstream. > > > > > > > > > > 3. Sync with libffi 3.4.2. This patch is quite big. It is > > > > > > > > > > availale at > > > > > > > > > > > > > > > > > > > > https://gitlab.com/x86-gcc/gcc/-/commit/15e80c879c571f79a0e57702848a9df5fba5be2f > > > > > > > > > > 4. Integrate libffi build and testsuite with GCC. > > > > > > > > > > > > > > > > > > How did you test this? It looks like libgo is the only > > > > > > > > > consumer of > > > > > > > > > libffi these days. > > > > > > > > > In particular go/libgo seems to be supported on almost all > > > > > > > > > targets besides > > > > > > > > > darwin/windows - did you test cross and canadian > > > > > > > > > configurations? > > > > > > > > > > > > > > > > I only tested it on Linux/i686 and Linux/x86-64. My > > > > > > > > understanding is that > > > > > > > > the upstream libffi works on Darwin and Windows. > > > > > > > > > > > > > > > > > I applaud the attempt to sync to upsteam but I fear you won't > > > > > > > > > get any "review" > > > > > > > > > of this massive diff. > > > > > > > > > > > > > > > > I believe that it should just work. Our libffi is very much > > > > > > > > out of date. > > > > > > > > > > > > > > Yes, you can hope. And yes, our libffi is out of date. > > > > > > > > > > > > > > Can you please do the extra step to test one weird architecture, > > > > > > > namely > > > > > > > powerpc64-aix which is available on the compile-farm? > > > > > > > > > > > > I will give it a try and report back. > > > > > > > > > > > > > If that goes well I think it's good to "hope" at this point (and > > > > > > >
Re: [PATCH v2 0/4] libffi: Sync with upstream
On Sat, Oct 16, 2021 at 7:48 AM H.J. Lu wrote: > > On Fri, Oct 15, 2021 at 5:22 PM David Edelsohn wrote: > > > > On Fri, Oct 15, 2021 at 8:06 PM H.J. Lu wrote: > > > > > > On Wed, Oct 13, 2021 at 6:42 AM H.J. Lu wrote: > > > > > > > > On Wed, Oct 13, 2021 at 6:03 AM Richard Biener > > > > wrote: > > > > > > > > > > On Wed, Oct 13, 2021 at 2:56 PM H.J. Lu wrote: > > > > > > > > > > > > On Wed, Oct 13, 2021 at 5:45 AM Richard Biener > > > > > > wrote: > > > > > > > > > > > > > > On Thu, Sep 2, 2021 at 5:50 PM H.J. Lu > > > > > > > wrote: > > > > > > > > > > > > > > > > Change in the v2 patch: > > > > > > > > > > > > > > > > 1. Disable static trampolines by default. > > > > > > > > > > > > > > > > > > > > > > > > GCC maintained a copy of libffi snapshot from 2009 and > > > > > > > > cherry-picked fixes > > > > > > > > from upstream over the last 10+ years. In the meantime, libffi > > > > > > > > upstream > > > > > > > > has been changed significantly with new features, bug fixes and > > > > > > > > new target > > > > > > > > support. Here is a set of patches to sync with libffi 3.4.2 > > > > > > > > release and > > > > > > > > make it easier to sync with libffi upstream: > > > > > > > > > > > > > > > > 1. Document how to sync with upstream. > > > > > > > > 2. Add scripts to help sync with upstream. > > > > > > > > 3. Sync with libffi 3.4.2. This patch is quite big. It is > > > > > > > > availale at > > > > > > > > > > > > > > > > https://gitlab.com/x86-gcc/gcc/-/commit/15e80c879c571f79a0e57702848a9df5fba5be2f > > > > > > > > 4. Integrate libffi build and testsuite with GCC. > > > > > > > > > > > > > > How did you test this? It looks like libgo is the only consumer > > > > > > > of > > > > > > > libffi these days. > > > > > > > In particular go/libgo seems to be supported on almost all > > > > > > > targets besides > > > > > > > darwin/windows - did you test cross and canadian configurations? > > > > > > > > > > > > I only tested it on Linux/i686 and Linux/x86-64. My understanding > > > > > > is that > > > > > > the upstream libffi works on Darwin and Windows. > > > > > > > > > > > > > I applaud the attempt to sync to upsteam but I fear you won't get > > > > > > > any "review" > > > > > > > of this massive diff. > > > > > > > > > > > > I believe that it should just work. Our libffi is very much out of > > > > > > date. > > > > > > > > > > Yes, you can hope. And yes, our libffi is out of date. > > > > > > > > > > Can you please do the extra step to test one weird architecture, > > > > > namely > > > > > powerpc64-aix which is available on the compile-farm? > > > > > > > > I will give it a try and report back. > > > > > > > > > If that goes well I think it's good to "hope" at this point (and > > > > > plenty of > > > > > time to fix fallout until the GCC 12 release). > > > > > > > > > > Thus OK after the extra testing dance and waiting until early next > > > > > week so others can throw in a veto. > > > > > > I tried to bootstrap GCC master branch on gcc119.fsffrance.org: > > > > > > * MT/MODEL: 8284-22A > > > * > > > * Partition: gcc119 > > > * > > > *System: power8-aix.osuosl.org > > > * > > > * O/S: AIX V7.2 7200-04-03-2038 > > > > > > I configured GCC with > > > > > > --with-as=/usr/bin/as --with-ld=/usr/bin/ld > > > -
Re: [PATCH v2 0/4] libffi: Sync with upstream
On Fri, Oct 15, 2021 at 8:06 PM H.J. Lu wrote: > > On Wed, Oct 13, 2021 at 6:42 AM H.J. Lu wrote: > > > > On Wed, Oct 13, 2021 at 6:03 AM Richard Biener > > wrote: > > > > > > On Wed, Oct 13, 2021 at 2:56 PM H.J. Lu wrote: > > > > > > > > On Wed, Oct 13, 2021 at 5:45 AM Richard Biener > > > > wrote: > > > > > > > > > > On Thu, Sep 2, 2021 at 5:50 PM H.J. Lu wrote: > > > > > > > > > > > > Change in the v2 patch: > > > > > > > > > > > > 1. Disable static trampolines by default. > > > > > > > > > > > > > > > > > > GCC maintained a copy of libffi snapshot from 2009 and > > > > > > cherry-picked fixes > > > > > > from upstream over the last 10+ years. In the meantime, libffi > > > > > > upstream > > > > > > has been changed significantly with new features, bug fixes and new > > > > > > target > > > > > > support. Here is a set of patches to sync with libffi 3.4.2 > > > > > > release and > > > > > > make it easier to sync with libffi upstream: > > > > > > > > > > > > 1. Document how to sync with upstream. > > > > > > 2. Add scripts to help sync with upstream. > > > > > > 3. Sync with libffi 3.4.2. This patch is quite big. It is availale > > > > > > at > > > > > > > > > > > > https://gitlab.com/x86-gcc/gcc/-/commit/15e80c879c571f79a0e57702848a9df5fba5be2f > > > > > > 4. Integrate libffi build and testsuite with GCC. > > > > > > > > > > How did you test this? It looks like libgo is the only consumer of > > > > > libffi these days. > > > > > In particular go/libgo seems to be supported on almost all targets > > > > > besides > > > > > darwin/windows - did you test cross and canadian configurations? > > > > > > > > I only tested it on Linux/i686 and Linux/x86-64. My understanding is > > > > that > > > > the upstream libffi works on Darwin and Windows. > > > > > > > > > I applaud the attempt to sync to upsteam but I fear you won't get any > > > > > "review" > > > > > of this massive diff. > > > > > > > > I believe that it should just work. Our libffi is very much out of > > > > date. > > > > > > Yes, you can hope. And yes, our libffi is out of date. > > > > > > Can you please do the extra step to test one weird architecture, namely > > > powerpc64-aix which is available on the compile-farm? > > > > I will give it a try and report back. > > > > > If that goes well I think it's good to "hope" at this point (and plenty of > > > time to fix fallout until the GCC 12 release). > > > > > > Thus OK after the extra testing dance and waiting until early next > > > week so others can throw in a veto. > > I tried to bootstrap GCC master branch on gcc119.fsffrance.org: > > * MT/MODEL: 8284-22A > * > * Partition: gcc119 > * > *System: power8-aix.osuosl.org > * > * O/S: AIX V7.2 7200-04-03-2038 > > I configured GCC with > > --with-as=/usr/bin/as --with-ld=/usr/bin/ld > --enable-version-specific-runtime-libs --disable-nls > --enable-decimal-float=dpd --disable-libstdcxx-pch --disable-werror > --enable-__cxa_atexit --with-gmp=/opt/cfarm --with-mpfr=/opt/cfarm > --with-mpc=/opt/cfarm --with-isl=/opt/cfarm --prefix=/opt/freeware > --with-local-prefix=/opt/freeware --enable-languages=c,c++,go > > I got > > g++ -g -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables > -W > -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format > -Wmissing-format-at > tribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros > -Wno- > overlength-strings -fno-common -DHAVE_CONFIG_H -DGENERATOR_FILE > -static-libstd > c++ -static-libgcc -Wl,-bbigtoc -Wl,-bmaxdata:0x4000 -o build/genenums \ > build/genenums.o build/read-md.o build/errors.o > ../build-powerpc-ibm-aix7.2. > 4.0/libiberty/libiberty.a > ld: 0711-317 ERROR: Undefined symbol: lexer_line > ld: 0711-317 ERROR: Undefined symbol: .yylex(char const**) > ld: 0711-317 ERROR: Undefined symbol: .yybegin(char const*) > ld: 0711-317 ERROR: Undefined symbol: lexer_toplevel_done > ld: 0711-317 ERROR: Undefined symbol: .yyend() > ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more information. > collect2: error: ld returned 8 exit status > Makefile:3000: recipe for target 'build/gengtype' failed > gmake[5]: *** [build/gengtype] Error 1 > > David, is there an instruction to bootstrap GCC on AIX? The CompileFarm page in the GCC wiki has instructions under "build tips": https://gcc.gnu.org/wiki/CompileFarm#Services_and_software_installed_on_farm_machines The error that you show might be due to not having /opt/freeware/bin first in your path and the bootstrap used the AIX version of lex or sed or some other command. Thanks, David
Re: PATCH, rs6000] Optimization for vec_xl_sext
On Thu, Oct 14, 2021 at 2:17 AM HAO CHEN GUI wrote: > > Hi, > >The patch optimizes the code generation for vec_xl_sext builtin. Now all > the sign extensions are done on VSX registers directly. > >Bootstrapped and tested on powerpc64le-linux with no regressions. Is this > okay for trunk? Any recommendations? Thanks a lot. > >I refined the patch according to Bill and David's advice. I put the > patch.diff and ChangeLog in attachment also in case the indentation doesn't > show correctly in email body. > > > ChangeLog > > 2021-10-11 Haochen Gui > > > gcc/ > > * config/rs6000/rs6000-call.c (altivec_expand_lxvr_builtin): > > Modify the expansion for sign extension. All extensions are done > > within VSX registers. > > > gcc/testsuite/ > > * gcc.target/powerpc/p10_vec_xl_sext.c: New test. This is okay. Thanks, David
Re: [PATCH] aix: ensure reference to __tls_get_addr is in text section.
On Thu, Oct 14, 2021 at 10:10 AM CHIGOT, CLEMENT wrote: > > Hi David, > > The fact that csect .data is referencing csect .text doesn't mean that > if .text is kept, .data is kept too. It means the opposite. if .data is kept > then .text must be kept. Yes, we are in agreement about the purpose of the other reference between text and data. The __tls_get_addr reference is an effort to artificially elicit an error if pthread is not linked in a program that uses TLS. It is not truly necessary for correct TLS code generation from GCC. Your patch moves the __tls_get_addr referenced from the data section to the text section. As we agree, the logic of the other code implies that the data section is used to pull in the text section, so moving __tls_get_addr to the text section seems more fragile. It's a game of which section will be preserved to ensure that __tls_get_addr is referenced to ensure that an artificial error is generated. And it's possible that neither text nor data sections will be referenced if fine-grained CSECTs are used. There is no way to make this logic perfect. Thanks, David > > That's actually what is being done by the linker with the TLS support test > in configure. > $ cat test.c > __thread int a; int b; int main() { return a = b; } > > With ".ref __tls_get_addr" in .data: > $ gcc -maix64 test.c -S -o test.s > $ cat test.s > ... > _section_.text: > .csect .data[RW],4 > .llong _section_.text > .extern __tls_get_addr > .ref __tls_get_addr > $ gcc -maix64 test.s -o test > $ dump -X64 -tv test > ... > [142] m 0x0097 debug 0FILEC:PPC64 test.c > [143] m 0x16c0 .text 1 unamex.text > [144] a4 0x005c 00 SD PR-- > [147] m 0x20001298 .bss 1 externb > [148] a4 0x0004 00 CM BS-- > [149] m 0x8800 .tbss 1 externa > [150] a4 0x0004 00 CM UL-- > ... > > Csect .data is garbage-collected by the linker. Thus the .ref doesn't matter. > > With ".ref __tls_get_addr" in .text: > $ cat test.s > _section_.text: > .csect .data[RW],4 > .llong _section_.text > .csect .text[PR],5 > .extern __tls_get_addr > .ref __tls_get_addr > $ gcc -maix64 test.s -o test > ld: 0711-317 ERROR: Undefined symbol: __tls_get_addr > ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more information. > collect2: error: ld returned 8 exit status > > As csect .text is kept (because of main function), the .ref is still there > and the error > is raise correctly. As "-pthread" isn't passed, __tls_get_addr is not > available. > > However, writing this mail, I'm wondering if we don't want to always keep both > csects. If .data is kept, then .text is and if .text is kept, then .data is. > Or always keeping .data would have too much side effects ? > > Thanks, > Clément > > > From: David Edelsohn > Sent: Thursday, October 14, 2021 3:42 PM > To: CHIGOT, CLEMENT > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] aix: ensure reference to __tls_get_addr is in text > section. > > Caution! External email. Do not open attachments or click links, unless this > email comes from a known sender and you know the content is safe. > > The reference to __tls_get_addr is in the data section. And the code > just above creates a symbol in the text section referenced from the > data section to ensure the text section is retained. So this change > doesn't make sense. You're essentially saying that the data section > is not used, which makes the other code useless to ensure that the > text section is referenced. > > Thanks, David > > On Thu, Oct 14, 2021 at 3:06 AM CHIGOT, CLEMENT > wrote: > > > > The garbage collector of AIX linker might remove the reference to > > __tls_get_addr if it's added inside an unused csect. > > > > > > Clément Chigot > > ATOS Bull SAS > > 1 rue de Provence - 38432 Échirolles - France > >
Re: [PATCH] rs6000: Fix memory leak in rs6000_density_test
On Thu, Oct 14, 2021 at 9:00 AM Richard Sandiford wrote: > > rs6000_density_test has an early exit test between a call > to get_loop_body and the corresponding free. This would > lead to a memory leak if the early exit is taken. > > Tested on powerpc64le-linux-gnu. It's obvious that moving the > test avoids the leak, but there are multiple ways to write it, > so: OK to install? Thanks for noticing this and creating a patch. This is okay. Thanks, David > > Richard > > > gcc/ > * config/rs6000/rs6000.c (rs6000_density_test): Move early > exit test further up the function. > --- > gcc/config/rs6000/rs6000.c | 17 - > 1 file changed, 8 insertions(+), 9 deletions(-) > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index acba4d9f26c..01a95591a5d 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -5289,20 +5289,19 @@ struct rs6000_cost_data > static void > rs6000_density_test (rs6000_cost_data *data) > { > - struct loop *loop = data->loop_info; > - basic_block *bbs = get_loop_body (loop); > - int nbbs = loop->num_nodes; > - loop_vec_info loop_vinfo = loop_vec_info_for_loop (data->loop_info); > - int vec_cost = data->cost[vect_body], not_vec_cost = 0; > - int i, density_pct; > - >/* This density test only cares about the cost of vector version of the > loop, so immediately return if we are passed costing for the scalar > version (namely computing single scalar iteration cost). */ >if (data->costing_for_scalar) > return; > > - for (i = 0; i < nbbs; i++) > + struct loop *loop = data->loop_info; > + basic_block *bbs = get_loop_body (loop); > + int nbbs = loop->num_nodes; > + loop_vec_info loop_vinfo = loop_vec_info_for_loop (data->loop_info); > + int vec_cost = data->cost[vect_body], not_vec_cost = 0; > + > + for (int i = 0; i < nbbs; i++) > { >basic_block bb = bbs[i]; >gimple_stmt_iterator gsi; > @@ -5322,7 +5321,7 @@ rs6000_density_test (rs6000_cost_data *data) > } > >free (bbs); > - density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost); > + int density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost); > >if (density_pct > rs6000_density_pct_threshold >&& vec_cost + not_vec_cost > rs6000_density_size_threshold)
Re: [PATCH] aix: ensure reference to __tls_get_addr is in text section.
The reference to __tls_get_addr is in the data section. And the code just above creates a symbol in the text section referenced from the data section to ensure the text section is retained. So this change doesn't make sense. You're essentially saying that the data section is not used, which makes the other code useless to ensure that the text section is referenced. Thanks, David On Thu, Oct 14, 2021 at 3:06 AM CHIGOT, CLEMENT wrote: > > The garbage collector of AIX linker might remove the reference to > __tls_get_addr if it's added inside an unused csect. > > > Clément Chigot > ATOS Bull SAS > 1 rue de Provence - 38432 Échirolles - France >
Re: [PATCH, rs6000] Optimization for vec_xl_sext
>> gcc/ >> * config/rs6000/rs6000-call.c (altivec_expand_lxvr_builtin): >> Modify the expansion for sign extension. All extentions are done >> within VSX resgisters. > > Two typos here: extentions => extensions, resgisters => registers. This is okay with Bill's comments addressed. Thanks, David
Re: [PATCH, rs6000] punish reload of lfiwzx when loading an int variable [PR102169, PR102146]
> The patch punishes reload of alternative pair of "d, Z" for > movsi_internal1. The reload occurs if 'Z' doesn't match and generates an > additional insn. So the memory reload should be punished. > > Bootstrapped and tested on powerpc64le-linux with no regressions. Is this > okay for trunk? Any recommendations? Thanks a lot. > > > ChangeLog > > 2021-09-29 Haochen Gui > > gcc/ > * gcc/config/rs6000/rs6000.md (movsi_internal1): disparages > slightly the alternative 'Z' of "lfiwzx" when reload is needed. Capitalize "D" of disparages. Should this disparage "stfiwzx" also? Carl Love saw poor code generation at -O0 for a trivial example where a stack store moved the value to an FPR and used stfiwx. Thanks, David
Re: [PATCH, rs6000] Disable gimple fold for float or double vec_minmax when fast-math is not set
2021-08-25 Haochen Gui gcc/ * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin): Modify the VSX_BUILTIN_XVMINDP, ALTIVEC_BUILTIN_VMINFP, VSX_BUILTIN_XVMAXDP, ALTIVEC_BUILTIN_VMAXFP expansions. Please write something more than "modify". The ChangeLog should be more like the email subject line for this patch. gcc/testsuite/ * gcc.target/powerpc/vec-minmax-1.c: New test. * gcc.target/powerpc/vec-minmax-2.c: Likewise. Please ensure that the indentation is correct for the case statements; it was unclear from the text pasted into the email. Okay with those clarifications. Thanks, David
Re: Ping ^ 2: [PATCH] rs6000: Remove unspecs for vec_mrghl[bhw]
Hi, Xionghu What's the status of the \M and \m testcase beautification requested by Segher? Did you send an updated patch? Your messages ping the version prior to Segher's additional comments. It seems that the changes to the patterns are complete, but there are remaining questions about the testcase style and if the instruction counts are ideal. I trust that the instruction counts match the behavior after the patch, but it seemed that Segher wanted to confirm that the counts are the values desired / expected from optimal code generation. The counts are the total for the file, which doesn't communicate if the sequences themselves are optimal. Thanks, David On Sun, Sep 5, 2021 at 8:54 PM Xionghu Luo wrote: > > Ping^2, thanks. > > https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html > > > On 2021/6/30 09:47, Xionghu Luo via Gcc-patches wrote: > > Gentle ping, thanks. > > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572330.html > > > > > > On 2021/6/9 16:03, Xionghu Luo via Gcc-patches wrote: > >> Hi, > >> > >> On 2021/6/9 07:25, Segher Boessenkool wrote: > >>> On Mon, May 24, 2021 at 04:02:13AM -0500, Xionghu Luo wrote: > vmrghb only accepts permute index {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, > 5, 21, 6, 22, 7, 23} no matter for BE or LE in ISA, similarly for > vmrghlb. > >>> > >>> (vmrglb) > >>> > + if (BYTES_BIG_ENDIAN) > +emit_insn ( > + gen_altivec_vmrghb_direct (operands[0], operands[1], > operands[2])); > + else > +emit_insn ( > + gen_altivec_vmrglb_direct (operands[0], operands[2], > operands[1])); > >>> > >>> Please don't indent like that, it doesn't match what we do elsewhere. > >>> For better or for worse (for worse imo), we use deep hanging indents. > >>> If you have to, you can do something like > >>> > >>>rtx insn; > >>>if (BYTES_BIG_ENDIAN) > >>> insn = gen_altivec_vmrghb_direct (operands[0], operands[1], > >>> operands[2]); > >>>else > >>> insn = gen_altivec_vmrglb_direct (operands[0], operands[2], > >>> operands[1]); > >>>emit_insn (insn); > >>> > >>> (this is better even, in that it has only one emit_insn), or even > >>> > >>>rtx (*fun) () = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct > >>>: gen_altivec_vmrglb_direct; > >>>if (!BYTES_BIG_ENDIAN) > >>> std::swap (operands[1], operands[2]); > >>>emit_insn (fun (operands[0], operands[1], operands[2])); > >>> > >>> Well, C++ does not allow that last example like that, sigh, so > >>>rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? > >>> gen_altivec_vmrghb_direct > >>> : gen_altivec_vmrglb_direct; > >>> > >>> This is shorter than the other two options ;-) > >> > >> Changed. > >> > >>> > +(define_insn "altivec_vmrghb_direct" > [(set (match_operand:V16QI 0 "register_operand" "=v") > +(vec_select:V16QI > >>> > >>> This should be indented one space more. > >>> > "TARGET_ALTIVEC" > "@ > - xxmrghw %x0,%x1,%x2 > - vmrghw %0,%1,%2" > + xxmrghw %x0,%x1,%x2 > + vmrghw %0,%1,%2" > >>> > >>> The original indent was correct, please restore. > >>> > - emit_insn (gen_altivec_vmrghw_direct (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, > vo)); > >>> > >>> When you see a mode as part of a pattern name, chances are that it will > >>> be a good candidate for using parameterized names with. (But don't do > >>> that now, just keep it in mind as a nice cleanup to do). > >> > >> OK. > >> > >>> > @@ -23022,8 +23022,8 @@ altivec_expand_vec_perm_const (rtx target, > rtx op0, rtx op1, > : CODE_FOR_altivec_vmrglh_direct), > { 0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, > 22, 23 } }, > { OPTION_MASK_ALTIVEC, > - (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct > - : CODE_FOR_altivec_vmrglw_direct), > + (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si > + : CODE_FOR_altivec_vmrglw_direct_v4si), > >>> > >>> The correct way is to align the ? and the : (or put everything on one > >>> line of course, if that fits) > >>> > >>> The parens around this are not needed btw, and are a distraction. > >> > >> Changed. > >> > >>> > --- a/gcc/testsuite/gcc.target/powerpc/builtins-1.c > +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1.c > @@ -317,10 +317,10 @@ int main () > /* { dg-final { scan-assembler-times "vctuxs" 2 } } */ > /* { dg-final { scan-assembler-times "vmrghb" 4 { target be } } } */ > -/* { dg-final { scan-assembler-times "vmrghb" 5 { target le } } } */ > +/* { dg-final { scan-assembler-times "vmrghb" 6 { target le } } } */ > /* { dg-final { scan-assembler-times "vmrghh" 8 } } */ > -/* { dg-final { scan-assembler-times "xxmrghw" 8 } } */ > -/* { dg-final { scan-assembler-times "xxmrglw"
Re: [PATCH] rs6000: Fix vec_cpsgn parameter order (PR101985)
On Fri, Sep 24, 2021 at 11:20 AM Bill Schmidt wrote: > > Hi! > > This fixes a bug reported in > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101985. > > The vec_cpsgn built-in function API differs in argument order from the > copysign3 convention. Currently that pattern is incorrectly used to > implement vec_cpsgn. Fix that while leaving the existing pattern in place > to implement copysignf for vector modes. It's a little confusing what "that" is. Maybe clarify that the patch is changing the PowerPC VSX function to invoke the GCC built-in with the argument in the correct order. > > Part of the fix when using the new built-in support requires an adjustment > to a pending patch that replaces much of altivec.h with an automatically > generated file. So that adjustment will be coming later... > > Also fix a bug in the new built-in overload infrastructure where we were > using the VSX form of the VEC_COPYSIGN built-in when we should default to > the VMX form. > > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. > Is this okay for trunk? > > Thanks! > Bill > > > 2021-09-24 Bill Schmidt > > gcc/ > PR target/101985 > * config/rs6000/altivec.h (vec_cpsgn): Adjust. Maybe a little more information than "Adjust"? Swap arguments? > * config/rs6000/rs6000-overload.def (VEC_COPYSIGN): Use SKIP to > avoid generating an automatic #define of vec_cpsgn. Use the > correct built-in for V4SFmode that doesn't depend on VSX. > > gcc/testsuite/ > PR target/101985 > * gcc.target/powerpc/pr101985.c: New. Okay. Thanks, David
Re: [PATCH][OBVIOUS] rs6000: fix symtab_node::get == NULL issue
This needs an additional adjustment. The encoding decoration needs to be applied if the decl isn't an alias. That means both a null summary *OR* the decl is not explicitly an alias. I'm proposing the following: diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index d0830a95027..ad81dfb316d 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -21728,8 +21728,8 @@ rs6000_xcoff_encode_section_info (tree decl, rtx rtl, in t first) if (decl && DECL_P (decl) && VAR_OR_FUNCTION_DECL_P (decl) - && symtab_node::get (decl) != NULL - && symtab_node::get (decl)->alias == 0 + && (symtab_node::get (decl) == NULL + || symtab_node::get (decl)->alias == 0) && symname[strlen (symname) - 1] != ']') { const char *smclass = NULL; On Wed, Sep 15, 2021 at 11:21 AM Martin Liška wrote: > > Hello. > > The patch is approved by David and fixes the issue described in the PR. > > Martin > > PR target/102349 > > gcc/ChangeLog: > > * config/rs6000/rs6000.c (rs6000_xcoff_encode_section_info): > Check that we have a symbol summary for a symbol. > --- > gcc/config/rs6000/rs6000.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index b0ec8108007..d0830a95027 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -21728,6 +21728,7 @@ rs6000_xcoff_encode_section_info (tree decl, rtx rtl, > int first) > if (decl > && DECL_P (decl) > && VAR_OR_FUNCTION_DECL_P (decl) > + && symtab_node::get (decl) != NULL > && symtab_node::get (decl)->alias == 0 > && symname[strlen (symname) - 1] != ']') > { > -- > 2.33.0 >