PING [PATCH v5 0/2] IBM zSystems: Improve storing asan frame_pc
On Tue, 2022-09-27 at 02:23 +0200, Ilya Leoshkevich wrote:
> Hi,
>
> This is a resend of v4 with slightly adjusted commit messages:
>
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525016.html
> v2: https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525069.html
> v3: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548338.html
> v4: https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549252.html
>
> It still survives the bootstrap and the regtest on
> x86_64-redhat-linux, s390x-redhat-linux and ppc64le-redhat-linux.
> It also fixes [1].
>
> I also tried the approach with moving .LASANPC closer to the function
> label and using FUNCTION_BOUNDARY instead of introducing
> CODE_LABEL_BOUNDARY, but the problem there is that it's hard to catch
> the moment where the function label is written. Architectures can do
> it by calling ASM_OUTPUT_LABEL() or assemble_name() in
> ASM_DECLARE_FUNCTION_NAME(), ASM_OUTPUT_FUNCTION_LABEL() or
> TARGET_ASM_FUNCTION_PROLOGUE(). epiphany_start_function() does that
> twice, but passes the same decl to both calls. Note that simply
> moving asan_function_start() to final_start_function_1() is not
> enough, since an architecture can write something after the function
> label. This all means that for this approach to work, all the
> architectures need to be adjusted, which looks like an overkill to me.
>
> Best regards,
> Ilya
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593666.html
>
> Ilya Leoshkevich (2):
>   asan: specify alignment for LASANPC labels
>   IBM zSystems: Define CODE_LABEL_BOUNDARY
>
>  gcc/asan.cc                                    |  1 +
>  gcc/config/s390/s390.h                         |  3 +++
>  gcc/defaults.h                                 |  5 +++++
>  gcc/doc/tm.texi                                |  4 ++++
>  gcc/doc/tm.texi.in                             |  4 ++++
>  gcc/testsuite/gcc.target/s390/asan-no-gotoff.c | 15 +++++++++++++++
>  6 files changed, 32 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c
[PATCH v5 2/2] IBM zSystems: Define CODE_LABEL_BOUNDARY
Currently s390 emits the following sequence to store a frame_pc:

    a:
    .LASANPC0:
            lg      %r1,.L5-.L4(%r13)
            la      %r1,0(%r1,%r12)
            stg     %r1,176(%r11)
    .L5:
            .quad   .LASANPC0@GOTOFF

The reason GOT indirection is used instead of larl is that gcc does not
know that .LASANPC0, being a code label, is aligned on a 2-byte
boundary, and larl can load only even addresses.

Define CODE_LABEL_BOUNDARY in order to get rid of GOT indirection:

            larl    %r1,.LASANPC0
            stg     %r1,176(%r11)

gcc/ChangeLog:

2020-06-30  Ilya Leoshkevich

        * config/s390/s390.h (CODE_LABEL_BOUNDARY): Specify that s390
        requires code labels to be aligned on a 2-byte boundary.

gcc/testsuite/ChangeLog:

2019-06-30  Ilya Leoshkevich

        * gcc.target/s390/asan-no-gotoff.c: New test.
---
 gcc/config/s390/s390.h                         |  3 +++
 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c | 15 +++++++++++++++
 2 files changed, 18 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c

diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index be566215df2..7d078ce6868 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -368,6 +368,9 @@ extern const char *s390_host_detect_local_cpu (int argc, const char **argv);
 /* Allocation boundary (in *bits*) for the code of a function.  */
 #define FUNCTION_BOUNDARY 64
 
+/* Alignment required for a code label, in bits.  */
+#define CODE_LABEL_BOUNDARY 16
+
 /* There is no point aligning anything to a rounder boundary than this.  */
 #define BIGGEST_ALIGNMENT 64
 
diff --git a/gcc/testsuite/gcc.target/s390/asan-no-gotoff.c b/gcc/testsuite/gcc.target/s390/asan-no-gotoff.c
new file mode 100644
index 000..f555e4e96f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/asan-no-gotoff.c
@@ -0,0 +1,15 @@
+/* Test that ASAN labels are referenced without unnecessary indirections.  */
+
+/* { dg-do compile } */
+/* { dg-options "-fPIE -O2 -fsanitize=kernel-address --param asan-stack=1" } */
+
+extern void c (int *);
+
+void a ()
+{
+  int b;
+  c (&b);
+}
+
+/* { dg-final { scan-assembler {\tlarl\t%r\d+,\.LASANPC\d+} } } */
+/* { dg-final { scan-assembler-not {\.LASANPC\d+@GOTOFF} } } */
-- 
2.37.2
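To illustrate the commit message above: alignment macros such as FUNCTION_BOUNDARY and CODE_LABEL_BOUNDARY are expressed in bits, and larl can only be used when the alignment guarantees an even address. The following is a minimal sketch of that arithmetic, not code from the patch; the helper names are hypothetical and 8-bit bytes are assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if ADDR satisfies an alignment given in
   bits (the unit used by the *_BOUNDARY macros).  Assumes 8-bit bytes
   and a power-of-two alignment of at least 8 bits.  */
static bool
is_aligned (uint64_t addr, unsigned align_bits)
{
  uint64_t align_bytes = align_bits / 8;
  return (addr & (align_bytes - 1)) == 0;
}

/* Hypothetical helper: true if every address with this alignment is
   even, which is what an instruction limited to even addresses (such
   as larl) needs to know about its operand.  */
static bool
alignment_guarantees_even (unsigned align_bits)
{
  return align_bits >= 16;
}
```

With a 16-bit boundary every label address is even, so the direct form is usable; with the 8-bit default nothing is guaranteed and the compiler has to fall back to something more conservative, such as the GOT-relative sequence shown in the commit message.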
[PATCH v5 1/2] asan: specify alignment for LASANPC labels
gcc/ChangeLog:

2020-06-30  Ilya Leoshkevich

        * asan.cc (asan_emit_stack_protection): Use CODE_LABEL_BOUNDARY.
        * defaults.h (CODE_LABEL_BOUNDARY): New macro.
        * doc/tm.texi: Document CODE_LABEL_BOUNDARY.
        * doc/tm.texi.in: Likewise.
---
 gcc/asan.cc        | 1 +
 gcc/defaults.h     | 5 +++++
 gcc/doc/tm.texi    | 4 ++++
 gcc/doc/tm.texi.in | 4 ++++
 4 files changed, 14 insertions(+)

diff --git a/gcc/asan.cc b/gcc/asan.cc
index 8276f12cc69..62f50ee769b 100644
--- a/gcc/asan.cc
+++ b/gcc/asan.cc
@@ -1960,6 +1960,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
   DECL_INITIAL (decl) = decl;
   TREE_ASM_WRITTEN (decl) = 1;
   TREE_ASM_WRITTEN (id) = 1;
+  SET_DECL_ALIGN (decl, CODE_LABEL_BOUNDARY);
   emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
   shadow_base = expand_binop (Pmode, lshr_optab, base,
                               gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
diff --git a/gcc/defaults.h b/gcc/defaults.h
index 953605c1627..52a471cf08e 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -1455,4 +1455,9 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 typedef TARGET_UNIT target_unit;
 #endif
 
+/* Alignment required for a code label, in bits.  */
+#ifndef CODE_LABEL_BOUNDARY
+#define CODE_LABEL_BOUNDARY BITS_PER_UNIT
+#endif
+
 #endif /* ! GCC_DEFAULTS_H */
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 858bfb80cec..cc588ee23b5 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -1075,6 +1075,10 @@ to a value equal to or larger than @code{STACK_BOUNDARY}.
 Alignment required for a function entry point, in bits.
 @end defmac
 
+@defmac CODE_LABEL_BOUNDARY
+Alignment required for a code label, in bits.
+@end defmac
+
 @defmac BIGGEST_ALIGNMENT
 Biggest alignment that any data type can require on this machine, in bits.
 Note that this is not the biggest alignment that is supported,
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 21b849ea32a..a0b725b0685 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -971,6 +971,10 @@ to a value equal to or larger than @code{STACK_BOUNDARY}.
 Alignment required for a function entry point, in bits.
 @end defmac
 
+@defmac CODE_LABEL_BOUNDARY
+Alignment required for a code label, in bits.
+@end defmac
+
 @defmac BIGGEST_ALIGNMENT
 Biggest alignment that any data type can require on this machine, in bits.
 Note that this is not the biggest alignment that is supported,
-- 
2.37.2
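The defaults.h hunk above relies on the usual "target header first, fallback second" macro pattern. The standalone sketch below mimics it outside of GCC; the BITS_PER_UNIT stand-in and the EXAMPLE_TARGET_S390 switch are assumptions made purely for illustration and are not part of the patch.

```c
/* Stand-in for GCC's BITS_PER_UNIT; assumed to be 8 here.  */
#ifndef BITS_PER_UNIT
#define BITS_PER_UNIT 8
#endif

/* A target header would be included first and may define the macro.
   EXAMPLE_TARGET_S390 is a hypothetical switch for this sketch.  */
#ifdef EXAMPLE_TARGET_S390
#define CODE_LABEL_BOUNDARY 16
#endif

/* Fallback in the style of defaults.h: targets that say nothing get
   byte alignment, i.e. no extra guarantee about code labels.  */
#ifndef CODE_LABEL_BOUNDARY
#define CODE_LABEL_BOUNDARY BITS_PER_UNIT
#endif

/* Hypothetical helper: the boundary expressed in bytes.  */
static unsigned
code_label_boundary_bytes (void)
{
  return CODE_LABEL_BOUNDARY / BITS_PER_UNIT;
}
```

Compiled as is, the helper reports a 1-byte boundary; compiled with -DEXAMPLE_TARGET_S390 it reports 2 bytes, matching the value the second patch of the series picks for s390.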
[PATCH v5 0/2] IBM zSystems: Improve storing asan frame_pc
Hi,

This is a resend of v4 with slightly adjusted commit messages:

v1: https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525016.html
v2: https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525069.html
v3: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548338.html
v4: https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549252.html

It still survives the bootstrap and the regtest on x86_64-redhat-linux,
s390x-redhat-linux and ppc64le-redhat-linux. It also fixes [1].

I also tried the approach with moving .LASANPC closer to the function
label and using FUNCTION_BOUNDARY instead of introducing
CODE_LABEL_BOUNDARY, but the problem there is that it's hard to catch
the moment where the function label is written. Architectures can do
it by calling ASM_OUTPUT_LABEL() or assemble_name() in
ASM_DECLARE_FUNCTION_NAME(), ASM_OUTPUT_FUNCTION_LABEL() or
TARGET_ASM_FUNCTION_PROLOGUE(). epiphany_start_function() does that
twice, but passes the same decl to both calls. Note that simply
moving asan_function_start() to final_start_function_1() is not enough,
since an architecture can write something after the function label.
This all means that for this approach to work, all the architectures
need to be adjusted, which looks like an overkill to me.

Best regards,
Ilya

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593666.html

Ilya Leoshkevich (2):
  asan: specify alignment for LASANPC labels
  IBM zSystems: Define CODE_LABEL_BOUNDARY

 gcc/asan.cc                                    |  1 +
 gcc/config/s390/s390.h                         |  3 +++
 gcc/defaults.h                                 |  5 +++++
 gcc/doc/tm.texi                                |  4 ++++
 gcc/doc/tm.texi.in                             |  4 ++++
 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c | 15 +++++++++++++++
 6 files changed, 32 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c

-- 
2.37.2
Re: [PATCH] PR106342 - IBM zSystems: Provide vsel for all vector modes
On Thu, 2022-08-11 at 07:45 +0200, Andreas Krebbel wrote: > On 8/10/22 13:42, Ilya Leoshkevich wrote: > > On Wed, 2022-08-03 at 12:20 +0200, Ilya Leoshkevich wrote: > > > Bootstrapped and regtested on s390x-redhat-linux. Ok for master? > > > > > > > > > > > > dg.exp=pr104612.c fails with an ICE on s390x, because > > > copysignv2sf3 > > > produces an insn that vsel is supposed to recognize, but > > > can't, > > > because it's not defined for V2SF. Fix by defining it for all > > > vector > > > modes supported by copysign3. > > > > > > gcc/ChangeLog: > > > > > > * config/s390/vector.md (V_HW_FT): New iterator. > > > * config/s390/vx-builtins.md (vsel): Use V instead > > > of > > > V_HW. > > > --- > > > gcc/config/s390/vector.md | 6 ++ > > > gcc/config/s390/vx-builtins.md | 12 ++-- > > > 2 files changed, 12 insertions(+), 6 deletions(-) > > > > Jakub pointed out that this is broken in gcc-12 as well. > > The patch applies cleanly, and I started a bootstrap/regtest. > > Ok for gcc-12? > > Yes. Thanks! > > Andreas Hi, I've committed this today without realizing that gcc-12 branch is closed. Sorry! Please let me know if I should revert this. Best regards, Ilya
Re: [PATCH] PR106342 - IBM zSystems: Provide vsel for all vector modes
On Wed, 2022-08-03 at 12:20 +0200, Ilya Leoshkevich wrote: > Bootstrapped and regtested on s390x-redhat-linux. Ok for master? > > > > dg.exp=pr104612.c fails with an ICE on s390x, because copysignv2sf3 > produces an insn that vsel is supposed to recognize, but can't, > because it's not defined for V2SF. Fix by defining it for all vector > modes supported by copysign3. > > gcc/ChangeLog: > > * config/s390/vector.md (V_HW_FT): New iterator. > * config/s390/vx-builtins.md (vsel): Use V instead of > V_HW. > --- > gcc/config/s390/vector.md | 6 ++ > gcc/config/s390/vx-builtins.md | 12 ++-- > 2 files changed, 12 insertions(+), 6 deletions(-) Jakub pointed out that this is broken in gcc-12 as well. The patch applies cleanly, and I started a bootstrap/regtest. Ok for gcc-12?
[PATCH] PR106342 - IBM zSystems: Provide vsel for all vector modes
Bootstrapped and regtested on s390x-redhat-linux. Ok for master?

dg.exp=pr104612.c fails with an ICE on s390x, because copysignv2sf3
produces an insn that vsel is supposed to recognize, but can't,
because it's not defined for V2SF. Fix by defining it for all vector
modes supported by copysign3.

gcc/ChangeLog:

        * config/s390/vector.md (V_HW_FT): New iterator.
        * config/s390/vx-builtins.md (vsel): Use V instead of V_HW.
---
 gcc/config/s390/vector.md      |  6 ++++++
 gcc/config/s390/vx-builtins.md | 12 ++++++------
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index a6c4b4eb974..624729814af 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -63,6 +63,12 @@
                            V1DF V2DF
                            (V1TF "TARGET_VXE") (TF "TARGET_VXE")])
 
+; All modes present in V_HW and VFT.
+(define_mode_iterator V_HW_FT [V16QI V8HI V4SI V2DI (V1TI "TARGET_VXE") V1DF
+                               V2DF (V1SF "TARGET_VXE") (V2SF "TARGET_VXE")
+                               (V4SF "TARGET_VXE") (V1TF "TARGET_VXE")
+                               (TF "TARGET_VXE")])
+
 ; FP vector modes directly supported by the HW.  This does not include
 ; vector modes using only part of a vector register and should be used
 ; for instructions which might trigger IEEE exceptions.
diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index d5130799804..98ee08b2683 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -517,12 +517,12 @@
 ; swapped in s390-c.cc when we get here.
 
 (define_insn "vsel"
-  [(set (match_operand:V_HW 0 "register_operand" "=v")
-        (ior:V_HW
-         (and:V_HW (match_operand:V_HW 1 "register_operand" "v")
-                   (match_operand:V_HW 3 "register_operand" "v"))
-         (and:V_HW (not:V_HW (match_dup 3))
-                   (match_operand:V_HW 2 "register_operand" "v"))))]
+  [(set (match_operand:V_HW_FT 0 "register_operand" "=v")
+        (ior:V_HW_FT
+         (and:V_HW_FT (match_operand:V_HW_FT 1 "register_operand" "v")
+                      (match_operand:V_HW_FT 3 "register_operand" "v"))
+         (and:V_HW_FT (not:V_HW_FT (match_dup 3))
+                      (match_operand:V_HW_FT 2 "register_operand" "v"))))]
   "TARGET_VX"
   "vsel\t%v0,%1,%2,%3"
   [(set_attr "op_type" "VRR")])
-- 
2.35.3
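For readers less used to RTL: the pattern above computes (op1 AND op3) OR (NOT op3 AND op2), i.e. a bitwise select where op3 is the mask. The scalar sketch below shows that operation and how a copysign can be phrased in terms of it. It is an illustration only: the function names are hypothetical, and it makes no claim about how the s390 backend actually expands copysign.

```c
#include <stdint.h>
#include <string.h>

/* Bitwise select as described by the RTL pattern: bits of A where
   MASK is set, bits of B where MASK is clear.  */
static uint32_t
bit_select32 (uint32_t a, uint32_t b, uint32_t mask)
{
  return (a & mask) | (~mask & b);
}

/* Illustrative copysign for IEEE single precision: take the sign bit
   from SGN and all remaining bits from MAG.  */
static float
copysign_via_select (float mag, float sgn)
{
  uint32_t m, s, r;
  float result;

  memcpy (&m, &mag, sizeof m);
  memcpy (&s, &sgn, sizeof s);
  r = bit_select32 (s, m, UINT32_C (0x80000000));
  memcpy (&result, &r, sizeof result);
  return result;
}
```

Since the select is purely bitwise, it is equally meaningful for integer and floating-point element types, which is consistent with widening the pattern's mode iterator to cover modes such as V2SF.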
Re: [PATCH] Honor COMDAT for mergeable constant sections
On Fri, 2022-04-29 at 13:56 +0200, Jakub Jelinek wrote: > On Fri, Apr 29, 2022 at 01:52:49PM +0200, Ilya Leoshkevich wrote: > > > This doesn't resolve the problem, unfortunately, because > > > references to discarded comdat symbols are still kept in .rodata: > > > > > > `.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_' referenced > > > in > > > section `.rodata' of ../lib/libgtest.a(gtest-all.cc.o): defined > > > in > > > discarded section > > > `.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_[_ZN7testing15 > > > Asse > > > rt > > > ionResultlsIPKcEERS0_RKT_]' of ../lib/libgtest.a(gtest-all.cc.o) > > > > > > (That's from building zlib-ng with ASan and your patch on s390). > > > > > > So I was rather thinking about adding a reloc parameter to > > > mergeable_constant_section () and slightly changing the section > > > name when it's nonzero, e.g. from .cst to .cstrel. > > > > After some experimenting, I don't think that what I propose here > > is a good solution anymore, since it won't work with > > -fno-merge-constants. > > > > What do you think about something like this? > > > > --- a/gcc/varasm.cc > > +++ b/gcc/varasm.cc > > @@ -7326,6 +7326,10 @@ default_elf_select_rtx_section (machine_mode > > mode, rtx x, > > return get_named_section (NULL, ".data.rel.ro", 3); > > } > > > > + if (reloc) > > + return targetm.asm_out.function_rodata_section > > (current_function_decl, > > + false); > > + > > return mergeable_constant_section (mode, align, 0); > > } > > > > This would put constants with relocations into .rodata.. > > default_function_rodata_section () already ensures that these > > sections > > are in the right comdat group. > > We don't really know if the emitted constant is purely for the > current > function, or also other functions (say emitted in as constant pool > constant > where constant pool constants are shared across the whole TU). > For the former, putting it into current function's comdat is fine, > for the > latter certainly isn't. 
mergeable_constant_section (), that the existing code calls in the same context, already relies on this being known and calls function_rodata_section () with exactly the same arguments. If !current_function_decl && !relocatable, we get readonly_data_section. Of course, mergeable_constant_section () does not handle comdat currently, so this point might be moot. However, looking at the callers of output_constant_pool_contents (), it seems that !current_function_decl happens in and only in the shared_constant_pool case, so it looks as if we know whether the constant is tied to a single function or not.
Re: [PATCH] Honor COMDAT for mergeable constant sections
On Thu, 2022-04-28 at 14:05 +0200, Ilya Leoshkevich wrote: > On Thu, 2022-04-28 at 13:27 +0200, Jakub Jelinek wrote: > > On Thu, Apr 28, 2022 at 01:03:26PM +0200, Ilya Leoshkevich wrote: > > > This is determined by default_elf_select_rtx_section (). If we > > > don't > > > want to mix non-reloc and reloc constants, we need to define a > > > special > > > section there. > > > > > > It seems to me, however, that this all would be made purely for > > > the > > > sake of .LASANPC, which is quite special: it's local, but at the > > > same > > > time it might need to be comdat. I don't think anything like > > > this > > > can > > > appear from compiling C/C++ code. > > > > > > Therefore I wonder if we could just drop it altogether like this? > > > > > > @@ -1928,22 +1919,7 @@ asan_emit_stack_protection (rtx base, rtx > > > pbase, > > > unsigned int alignb, > > > ... > > > - emit_move_insn (mem, expand_normal (build_fold_addr_expr > > > (decl))); > > > + emit_move_insn (mem, expand_normal (build_fold_addr_expr > > > (current_function_decl))); > > > ... > > > > > > That's what LLVM is already doing. This will also solve the > > > alignment > > > problem I referred to earlier. > > > > LLVM is doing a wrong thing here. The global symbol can be > > overridden by > > a symbol in another shared library, that is definitely not what we > > want, > > because the ASAN record is for the particular implementation, not > > the > > other > > one which could be quite different. > > I see; this must be relevant when the overriding library calls > the original one through dlsym (RTLD_NEXT). 
> > > I think the right fix would be: > > --- gcc/varasm.cc.jj2022-03-07 15:00:17.255592497 +0100 > > +++ gcc/varasm.cc 2022-04-28 13:22:44.463147066 +0200 > > @@ -7326,6 +7326,9 @@ default_elf_select_rtx_section (machine_ > > return get_named_section (NULL, ".data.rel.ro", 3); > > } > > > > + if (reloc) > > + return readonly_data_section; > > + > > return mergeable_constant_section (mode, align, 0); > > } > > > > which matches what we do in categorize_decl_for_section: > > else if (reloc & targetm.asm_out.reloc_rw_mask ()) > > ret = reloc == 1 ? SECCAT_DATA_REL_RO_LOCAL : > > SECCAT_DATA_REL_RO; > > else if (reloc || flag_merge_constants < 2 > > ... > > /* C and C++ don't allow different variables to share the > > same > > location. -fmerge-all-constants allows even that (at > > the > > expense of not conforming). */ > > ret = SECCAT_RODATA; > > else if (DECL_INITIAL (decl) > > && TREE_CODE (DECL_INITIAL (decl)) == STRING_CST) > > ret = SECCAT_RODATA_MERGE_STR_INIT; > > else > > ret = SECCAT_RODATA_MERGE_CONST; > > i.e. if reloc is true, it goes into .data.rel.ro* for -fpic and > > .rodata > > for non-pic, and mergeable sections are only used if there are no > > relocations. > > This doesn't resolve the problem, unfortunately, because > references to discarded comdat symbols are still kept in .rodata: > > `.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_' referenced in > section `.rodata' of ../lib/libgtest.a(gtest-all.cc.o): defined in > discarded section > `.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_[_ZN7testing15Asse > rt > ionResultlsIPKcEERS0_RKT_]' of ../lib/libgtest.a(gtest-all.cc.o) > > (That's from building zlib-ng with ASan and your patch on s390). > > So I was rather thinking about adding a reloc parameter to > mergeable_constant_section () and slightly changing the section > name when it's nonzero, e.g. from .cst to .cstrel. 
After some experimenting, I don't think that what I propose here is a good solution anymore, since it won't work with -fno-merge-constants. What do you think about something like this? --- a/gcc/varasm.cc +++ b/gcc/varasm.cc @@ -7326,6 +7326,10 @@ default_elf_select_rtx_section (machine_mode mode, rtx x, return get_named_section (NULL, ".data.rel.ro", 3); } + if (reloc) +return targetm.asm_out.function_rodata_section (current_function_decl, + false); + return mergeable_constant_section (mode, align, 0); } This would put constants with relocations into .rodata.. default_function_rodata_section () already ensures that these sections are in the right comdat group. >
Re: [PATCH] Honor COMDAT for mergeable constant sections
On Thu, 2022-04-28 at 13:27 +0200, Jakub Jelinek wrote: > On Thu, Apr 28, 2022 at 01:03:26PM +0200, Ilya Leoshkevich wrote: > > This is determined by default_elf_select_rtx_section (). If we > > don't > > want to mix non-reloc and reloc constants, we need to define a > > special > > section there. > > > > It seems to me, however, that this all would be made purely for the > > sake of .LASANPC, which is quite special: it's local, but at the > > same > > time it might need to be comdat. I don't think anything like this > > can > > appear from compiling C/C++ code. > > > > Therefore I wonder if we could just drop it altogether like this? > > > > @@ -1928,22 +1919,7 @@ asan_emit_stack_protection (rtx base, rtx > > pbase, > > unsigned int alignb, > > ... > > - emit_move_insn (mem, expand_normal (build_fold_addr_expr > > (decl))); > > + emit_move_insn (mem, expand_normal (build_fold_addr_expr > > (current_function_decl))); > > ... > > > > That's what LLVM is already doing. This will also solve the > > alignment > > problem I referred to earlier. > > LLVM is doing a wrong thing here. The global symbol can be > overridden by > a symbol in another shared library, that is definitely not what we > want, > because the ASAN record is for the particular implementation, not the > other > one which could be quite different. I see; this must be relevant when the overriding library calls the original one through dlsym (RTLD_NEXT). > I think the right fix would be: > --- gcc/varasm.cc.jj2022-03-07 15:00:17.255592497 +0100 > +++ gcc/varasm.cc 2022-04-28 13:22:44.463147066 +0200 > @@ -7326,6 +7326,9 @@ default_elf_select_rtx_section (machine_ > return get_named_section (NULL, ".data.rel.ro", 3); > } > > + if (reloc) > + return readonly_data_section; > + > return mergeable_constant_section (mode, align, 0); > } > > which matches what we do in categorize_decl_for_section: > else if (reloc & targetm.asm_out.reloc_rw_mask ()) > ret = reloc == 1 ? 
SECCAT_DATA_REL_RO_LOCAL : > SECCAT_DATA_REL_RO; > else if (reloc || flag_merge_constants < 2 > ... > /* C and C++ don't allow different variables to share the > same > location. -fmerge-all-constants allows even that (at the > expense of not conforming). */ > ret = SECCAT_RODATA; > else if (DECL_INITIAL (decl) > && TREE_CODE (DECL_INITIAL (decl)) == STRING_CST) > ret = SECCAT_RODATA_MERGE_STR_INIT; > else > ret = SECCAT_RODATA_MERGE_CONST; > i.e. if reloc is true, it goes into .data.rel.ro* for -fpic and > .rodata > for non-pic, and mergeable sections are only used if there are no > relocations. This doesn't resolve the problem, unfortunately, because references to discarded comdat symbols are still kept in .rodata: `.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_' referenced in section `.rodata' of ../lib/libgtest.a(gtest-all.cc.o): defined in discarded section `.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_[_ZN7testing15Assert ionResultlsIPKcEERS0_RKT_]' of ../lib/libgtest.a(gtest-all.cc.o) (That's from building zlib-ng with ASan and your patch on s390). So I was rather thinking about adding a reloc parameter to mergeable_constant_section () and slightly changing the section name when it's nonzero, e.g. from .cst to .cstrel. > Anyway, I'd feel much safer to change it only in GCC 13, at least > initially. That's fine with me. > Or are some linkers (say lld or mold, fod ld.bfd I'm pretty sure it > doesn't, > for gold no idea but unlikely) able to merge even constants with > relocations against them? I'm not sure, but putting constants with relocations into a separate mergeable section shouldn't hurt too much. And if such a linker is implemented some day, there would be no need to tweak gcc.
Re: [PATCH] Honor COMDAT for mergeable constant sections
On Wed, 2022-04-27 at 14:46 +0200, Jakub Jelinek wrote:
> On Wed, Apr 27, 2022 at 02:23:00PM +0200, Jakub Jelinek wrote:
> > On Wed, Apr 27, 2022 at 11:59:49AM +0200, Ilya Leoshkevich wrote:
> > > I get a .LASANPC reloc there in the first place because of
> > > https://patchwork.ozlabs.org/project/gcc/patch/20190702085154.26981-1-...@linux.ibm.com/
> > > but of course it may happen for other reasons as well.
> >
> > In that case I don't see any benefit to put that into a mergeable
> > section.
> > Why does that happen?
>
> Because, when a mergeable section doesn't contain any relocations, I
> don't see any point in making it comdat. Because mergeable sections
> themselves are garbage collected, if some constant isn't referenced
> at all, it isn't emitted, or if referenced, multiple copies of the
> constant are merged (or for mergeable strings even string tail
> merging is performed).
>
>         Jakub

This is determined by default_elf_select_rtx_section (). If we don't
want to mix non-reloc and reloc constants, we need to define a special
section there.

It seems to me, however, that this all would be made purely for the
sake of .LASANPC, which is quite special: it's local, but at the same
time it might need to be comdat. I don't think anything like this can
appear from compiling C/C++ code.

Therefore I wonder if we could just drop it altogether like this?

@@ -1928,22 +1919,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
...
-  emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
+  emit_move_insn (mem, expand_normal (build_fold_addr_expr (current_function_decl)));
...

That's what LLVM is already doing. This will also solve the alignment
problem I referred to earlier.
Re: [PATCH] Honor COMDAT for mergeable constant sections
On Wed, 2022-04-27 at 11:59 +0200, Ilya Leoshkevich via Gcc-patches wrote:
> On Wed, 2022-04-27 at 11:33 +0200, Jakub Jelinek wrote:
> > On Wed, Apr 27, 2022 at 11:27:49AM +0200, Ilya Leoshkevich via
> > Gcc-patches wrote:
> > > Bootstrapped and regtested on x86_64-redhat-linux and
> > > s390x-redhat-linux. Ok for master (or GCC 13 in case this
> > > doesn't fit stage4 criteria)?
> >
> > I'd prefer to defer this to GCC 13 at this point.
> > Furthermore, does the linker then actually merge the constants with
> > the same constants from other mergeable linkonce sections or other
> > mergeable sections? I'm afraid it would only merge constants
> > within each comdat group and not across the whole ELF object.
> >
> >         Jakub
> >
>
> I experimented with this a little, and actually having a reloc
> prevents merging altogether (the check happens in
> _bfd_add_merge_section).
>
> I get a .LASANPC reloc there in the first place because of
> https://patchwork.ozlabs.org/project/gcc/patch/20190702085154.26981-1-...@linux.ibm.com/
> but of course it may happen for other reasons as well.

I just realized I forgot to mention the "normal" case. There, "aMG"
seems to work fine with the whole ELF:

$ cat 1.s
.globl _start
_start: ret
.section .rodata.xxx,"aMG",@progbits,8,.xxx,comdat
.quad 42

$ cat 2.s
.section .rodata.yyy,"aMG",@progbits,8,.yyy,comdat
.quad 42
.quad 43
.section .rodata.xxx,"aMG",@progbits,8,.xxx,comdat
.quad 42

$ gcc -nostartfiles -fPIE 1.s 2.s
$ objdump -D a.out

2000 <.rodata>:
    2000:       2a 00                   sub    (%rax),%al
    2002:       00 00                   add    %al,(%rax)
    2004:       00 00                   add    %al,(%rax)
    2006:       00 00                   add    %al,(%rax)
    2008:       2b 00                   sub    (%rax),%eax
    200a:       00 00                   add    %al,(%rax)
    200c:       00 00                   add    %al,(%rax)
    ...
Re: [PATCH] Honor COMDAT for mergeable constant sections
On Wed, 2022-04-27 at 11:33 +0200, Jakub Jelinek wrote:
> On Wed, Apr 27, 2022 at 11:27:49AM +0200, Ilya Leoshkevich via
> Gcc-patches wrote:
> > Bootstrapped and regtested on x86_64-redhat-linux and
> > s390x-redhat-linux. Ok for master (or GCC 13 in case this doesn't
> > fit stage4 criteria)?
>
> I'd prefer to defer this to GCC 13 at this point.
> Furthermore, does the linker then actually merge the constants with
> the same constants from other mergeable linkonce sections or other
> mergeable sections? I'm afraid it would only merge constants within
> each comdat group and not across the whole ELF object.
>
>         Jakub
>

I experimented with this a little, and actually having a reloc prevents
merging altogether (the check happens in _bfd_add_merge_section).

I get a .LASANPC reloc there in the first place because of
https://patchwork.ozlabs.org/project/gcc/patch/20190702085154.26981-1-...@linux.ibm.com/
but of course it may happen for other reasons as well.
[PATCH] Honor COMDAT for mergeable constant sections
Bootstrapped and regtested on x86_64-redhat-linux and
s390x-redhat-linux. Ok for master (or GCC 13 in case this doesn't fit
stage4 criteria)?

Building C++ template-heavy code with ASan sometimes leads to bogus
"defined in discarded section" linker errors. The reason is that
.rodata.FUNC.cstN sections are not placed into COMDAT group sections
FUNC. This is important, because ASan puts references to .LASANPC
labels into these sections. Discarding the respective .text.FUNC
section causes the linker error.

Fix by adding SECTION_LINKONCE to .rodata.FUNC.cstN sections in
mergeable_constant_section () if the current function has an
associated COMDAT group. This is similar to what
switch_to_exception_section () is currently doing with
.gcc_except_table.FUNC sections.

gcc/ChangeLog:

        * varasm.cc (mergeable_constant_section): Honor COMDAT.

gcc/testsuite/ChangeLog:

        * g++.dg/asan/comdat.C: New test.
---
 gcc/testsuite/g++.dg/asan/comdat.C | 35 ++++++++++++++++++++++++++++++
 gcc/varasm.cc                      |  6 ++++-
 2 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/asan/comdat.C

diff --git a/gcc/testsuite/g++.dg/asan/comdat.C b/gcc/testsuite/g++.dg/asan/comdat.C
new file mode 100644
index 000..cd4f3f830a8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/asan/comdat.C
@@ -0,0 +1,35 @@
+/* Check that we don't emit non-COMDAT rodata.  */
+
+/* { dg-do compile } */
+/* { dg-final { scan-assembler-not {\.section\t\.rodata\._ZN1hlsIPKcEERS_RKT_\.cst[48],"[^"]*",@progbits,[48]\n} } } */
+
+const char *a;
+
+class b
+{
+public:
+  b ();
+};
+
+class h
+{
+public:
+  template
+  h &
+  operator<< (const c &)
+  {
+    d (b ());
+    return *this;
+  }
+
+  void d (b);
+};
+
+h e ();
+
+h
+g ()
+{
+  e () << a << a << a;
+  throw;
+}
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index c41f17d64f7..f2614f0ee39 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -938,7 +938,11 @@ mergeable_constant_section (machine_mode mode ATTRIBUTE_UNUSED,
       sprintf (name, "%s.cst%d", prefix, (int) (align / 8));
       flags |= (align / 8) | SECTION_MERGE;
-      return get_section (name, flags, NULL);
+      if (current_function_decl
+          && DECL_COMDAT_GROUP (current_function_decl)
+          && HAVE_COMDAT_GROUP)
+        flags |= SECTION_LINKONCE;
+      return get_section (name, flags, current_function_decl);
     }
   return readonly_data_section;
 }
-- 
2.35.1
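The varasm.cc hunk above boils down to two small computations: building the ".cstN" section name from the alignment, and OR-ing a link-once flag into the section flags when the current function lives in a COMDAT group. A rough standalone model follows. The EX_* flag values and the helper names are made up for this sketch (GCC's real SECTION_* constants differ), and the flags start from zero here, whereas the real function starts from whatever the caller has already set.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag bits; the low bits carry the entity size in bytes,
   loosely mirroring how the patch ORs (align / 8) into the flags.  */
enum
{
  EX_SECTION_MERGE = 0x1000,
  EX_SECTION_LINKONCE = 0x2000
};

/* Model of the flag computation: always mergeable, and additionally
   link-once when the function is in a COMDAT group and the target
   supports COMDAT groups.  ALIGN_BITS is the alignment in bits.  */
static unsigned
ex_mergeable_flags (unsigned align_bits, bool function_in_comdat,
                    bool have_comdat_group)
{
  unsigned flags = (align_bits / 8) | EX_SECTION_MERGE;

  if (function_in_comdat && have_comdat_group)
    flags |= EX_SECTION_LINKONCE;
  return flags;
}

/* Model of the name computation, e.g. ".rodata.FUNC" + ".cst8".  */
static void
ex_mergeable_name (char *buf, size_t size, const char *prefix,
                   unsigned align_bits)
{
  snprintf (buf, size, "%s.cst%d", prefix, (int) (align_bits / 8));
}
```

For a 64-bit aligned constant in a COMDAT function this yields a name ending in ".cst8" and flags with both the merge and link-once bits set, which is the combination the new test expects to see instead of a plain mergeable .rodata.FUNC.cst8 section.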
[PATCH][GCC11] IBM Z: fix `section type conflict` with -mindirect-branch-table
Bootstrapped and regtested on s390x-redhat-linux. Ok for
releases/gcc-11?

s390_code_end () puts indirect branch tables into separate sections and
tries to switch back to wherever it was in the beginning by calling
switch_to_section (current_function_section ()).

First of all, this is unnecessary - the other backends don't do it.

Furthermore, at this time there is no current function, but if the
last processed function was cold, in_cold_section_p remains set. This
causes targetm.asm_out.function_section () to call
targetm.section_type_flags (), which in absence of current function
decl classifies the section as SECTION_WRITE. This causes a section
type conflict with the existing SECTION_CODE.

gcc/ChangeLog:

        * config/s390/s390.c (s390_code_end): Do not switch back to
        code section.

gcc/testsuite/ChangeLog:

        * gcc.target/s390/nobp-section-type-conflict.c: New test.

(cherry picked from commit 8753b13a31c777cdab0265dae0b68534247908f7)
---
 gcc/config/s390/s390.c                        |  1 -
 .../s390/nobp-section-type-conflict.c         | 22 +++++++++++++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 8895dd7cc76..2d2e6522eb4 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16700,7 +16700,6 @@ s390_code_end (void)
               assemble_name_raw (asm_out_file, label_start);
               fputs ("-.\n", asm_out_file);
             }
-          switch_to_section (current_function_section ());
         }
     }
 }
diff --git a/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c b/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c
new file mode 100644
index 000..5d78bc99bb5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c
@@ -0,0 +1,22 @@
+/* Checks that we don't get error: section type conflict with ‘put_page’.  */
+
+/* { dg-do compile } */
+/* { dg-options "-mindirect-branch=thunk-extern -mfunction-return=thunk-extern -mindirect-branch-table -O2" } */
+
+int a;
+int b (void);
+void c (int);
+
+static void
+put_page (void)
+{
+  if (b ())
+    c (a);
+}
+
+__attribute__ ((__section__ (".init.text"), __cold__)) void
+d (void)
+{
+  put_page ();
+  put_page ();
+}
-- 
2.34.1
[PATCH] IBM Z: fix `section type conflict` with -mindirect-branch-table
Bootstrapped and regtested on s390x-redhat-linux. Ok for master? s390_code_end () puts indirect branch tables into separate sections and tries to switch back to wherever it was in the beginning by calling switch_to_section (current_function_section ()). First of all, this is unnecessary - the other backends don't do it. Furthermore, at this time there is no current function, but if the last processed function was cold, in_cold_section_p remains set. This causes targetm.asm_out.function_section () to call targetm.section_type_flags (), which in absence of current function decl classifies the section as SECTION_WRITE. This causes a section type conflict with the existing SECTION_CODE. gcc/ChangeLog: * config/s390/s390.cc (s390_code_end): Do not switch back to code section. gcc/testsuite/ChangeLog: * gcc.target/s390/nobp-section-type-conflict.c: New test. --- gcc/config/s390/s390.cc | 1 - .../s390/nobp-section-type-conflict.c | 22 +++ 2 files changed, 22 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc index 43c5c72554a..2db12d4ba4b 100644 --- a/gcc/config/s390/s390.cc +++ b/gcc/config/s390/s390.cc @@ -16809,7 +16809,6 @@ s390_code_end (void) assemble_name_raw (asm_out_file, label_start); fputs ("-.\n", asm_out_file); } - switch_to_section (current_function_section ()); } } } diff --git a/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c b/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c new file mode 100644 index 000..5d78bc99bb5 --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c @@ -0,0 +1,22 @@ +/* Checks that we don't get error: section type conflict with ‘put_page’. 
*/ + +/* { dg-do compile } */ +/* { dg-options "-mindirect-branch=thunk-extern -mfunction-return=thunk-extern -mindirect-branch-table -O2" } */ + +int a; +int b (void); +void c (int); + +static void +put_page (void) +{ + if (b ()) +c (a); +} + +__attribute__ ((__section__ (".init.text"), __cold__)) void +d (void) +{ + put_page (); + put_page (); +} -- 2.34.1
[PATCH gcc-11 2/2] IBM Z: Use @PLT symbols for local functions in 64-bit mode
This helps with generating code for kernel hotpatches, which contain individual functions and are loaded more than 2G away from vmlinux. This should not create performance regressions for the normal use cases, because for local functions ld replaces @PLT calls with direct calls. gcc/ChangeLog: * config/s390/predicates.md (bras_sym_operand): Accept all functions in 64-bit mode, use UNSPEC_PLT31. (larl_operand): Use UNSPEC_PLT31. * config/s390/s390.c (s390_loadrelative_operand_p): Likewise. (legitimize_pic_address): Likewise. (s390_emit_tls_call_insn): Mark __tls_get_offset as function, use UNSPEC_PLT31. (s390_delegitimize_address): Use UNSPEC_PLT31. (s390_output_addr_const_extra): Likewise. (print_operand): Add @PLT to TLS calls, handle %K. (s390_function_profiler): Mark __fentry__/_mcount as function, use %K, use UNSPEC_PLT31. (s390_output_mi_thunk): Use only UNSPEC_GOT, use %K. (s390_emit_call): Use UNSPEC_PLT31. (s390_emit_tpf_eh_return): Mark __tpf_eh_return as function. * config/s390/s390.md (UNSPEC_PLT31): Rename from UNSPEC_PLT. (*movdi_64): Use %K. (reload_base_64): Likewise. (*sibcall_brc): Likewise. (*sibcall_brcl): Likewise. (*sibcall_value_brc): Likewise. (*sibcall_value_brcl): Likewise. (*bras): Likewise. (*brasl): Likewise. (*bras_r): Likewise. (*brasl_r): Likewise. (*bras_tls): Likewise. (*brasl_tls): Likewise. (main_base_64): Likewise. (reload_base_64): Likewise. (@split_stack_call): Likewise. gcc/testsuite/ChangeLog: * g++.dg/ext/visibility/noPLT.C: Skip on s390x. * g++.target/s390/mi-thunk.C: New test. * gcc.target/s390/nodatarel-1.c: Move foostatic to the new tests. * gcc.target/s390/pr80080-4.c: Allow @PLT suffix. * gcc.target/s390/risbg-ll-3.c: Likewise. * gcc.target/s390/call.h: Common code for the new tests. * gcc.target/s390/call-z10-pic-nodatarel.c: New test. * gcc.target/s390/call-z10-pic.c: New test. * gcc.target/s390/call-z10.c: New test. * gcc.target/s390/call-z9-pic-nodatarel.c: New test. * gcc.target/s390/call-z9-pic.c: New test. 
* gcc.target/s390/call-z9.c: New test. * gcc.target/s390/mfentry-m64-pic.c: New test. * gcc.target/s390/tls.h: Common code for the new TLS tests. * gcc.target/s390/tls-pic.c: New test. * gcc.target/s390/tls.c: New test. (cherry picked from commit 0990d93dd8a) --- gcc/config/s390/predicates.md | 9 ++- gcc/config/s390/s390.c| 81 +-- gcc/config/s390/s390.md | 32 gcc/testsuite/g++.dg/ext/visibility/noPLT.C | 2 +- gcc/testsuite/g++.target/s390/mi-thunk.C | 23 ++ .../gcc.target/s390/call-z10-pic-nodatarel.c | 20 + gcc/testsuite/gcc.target/s390/call-z10-pic.c | 20 + gcc/testsuite/gcc.target/s390/call-z10.c | 20 + .../gcc.target/s390/call-z9-pic-nodatarel.c | 18 + gcc/testsuite/gcc.target/s390/call-z9-pic.c | 18 + gcc/testsuite/gcc.target/s390/call-z9.c | 20 + gcc/testsuite/gcc.target/s390/call.h | 40 + .../gcc.target/s390/mfentry-m64-pic.c | 9 +++ gcc/testsuite/gcc.target/s390/nodatarel-1.c | 26 +- gcc/testsuite/gcc.target/s390/pr80080-4.c | 2 +- gcc/testsuite/gcc.target/s390/risbg-ll-3.c| 6 +- gcc/testsuite/gcc.target/s390/tls-pic.c | 14 gcc/testsuite/gcc.target/s390/tls.c | 10 +++ gcc/testsuite/gcc.target/s390/tls.h | 23 ++ 19 files changed, 320 insertions(+), 73 deletions(-) create mode 100644 gcc/testsuite/g++.target/s390/mi-thunk.C create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic-nodatarel.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z10.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic-nodatarel.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z9.c create mode 100644 gcc/testsuite/gcc.target/s390/call.h create mode 100644 gcc/testsuite/gcc.target/s390/mfentry-m64-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/tls-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/tls.c create mode 100644 gcc/testsuite/gcc.target/s390/tls.h diff --git a/gcc/config/s390/predicates.md 
b/gcc/config/s390/predicates.md index 15093cb4b30..99c343aa32c 100644 --- a/gcc/config/s390/predicates.md +++ b/gcc/config/s390/predicates.md @@ -101,10 +101,13 @@ (define_special_predicate "bras_sym_operand" (ior (and (match_code "symbol_ref") - (match_test "!flag_pic || SYMBOL_REF_LOCAL_P (op)")) + (ior (match_test "!flag_pic") +(match_test
[PATCH gcc-11 1/2] IBM Z: Define NO_PROFILE_COUNTERS
s390 glibc does not need counters in the .data section, since it stores edge hits in its own data structure. Therefore counters only waste space and confuse diffing tools (e.g. kpatch), so don't generate them. gcc/ChangeLog: * config/s390/s390.c (s390_function_profiler): Ignore labelno parameter. * config/s390/s390.h (NO_PROFILE_COUNTERS): Define. gcc/testsuite/ChangeLog: * gcc.target/s390/mnop-mcount-m31-mzarch.c: Adapt to the new prologue size. * gcc.target/s390/mnop-mcount-m64.c: Likewise. (cherry picked from commit a1c1b7a888a) --- gcc/config/s390/s390.c| 42 +++ gcc/config/s390/s390.h| 2 + .../gcc.target/s390/mnop-mcount-m31-mzarch.c | 2 +- .../gcc.target/s390/mnop-mcount-m64.c | 2 +- 4 files changed, 20 insertions(+), 28 deletions(-) diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index c5d4c439bcc..a863dfce9a2 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -13120,33 +13120,25 @@ output_asm_nops (const char *user, int hw) } } -/* Output assembler code to FILE to increment profiler label # LABELNO - for profiling a function entry. */ +/* Output assembler code to FILE to call a profiler hook. */ void -s390_function_profiler (FILE *file, int labelno) +s390_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED) { - rtx op[8]; - - char label[128]; - ASM_GENERATE_INTERNAL_LABEL (label, "LP", labelno); + rtx op[4]; fprintf (file, "# function profiler \n"); op[0] = gen_rtx_REG (Pmode, RETURN_REGNUM); op[1] = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM); op[1] = gen_rtx_MEM (Pmode, plus_constant (Pmode, op[1], UNITS_PER_LONG)); - op[7] = GEN_INT (UNITS_PER_LONG); - - op[2] = gen_rtx_REG (Pmode, 1); - op[3] = gen_rtx_SYMBOL_REF (Pmode, label); - SYMBOL_REF_FLAGS (op[3]) = SYMBOL_FLAG_LOCAL; + op[3] = GEN_INT (UNITS_PER_LONG); - op[4] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? "__fentry__" : "_mcount"); + op[2] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? 
"__fentry__" : "_mcount"); if (flag_pic) { - op[4] = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op[4]), UNSPEC_PLT); - op[4] = gen_rtx_CONST (Pmode, op[4]); + op[2] = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op[2]), UNSPEC_PLT); + op[2] = gen_rtx_CONST (Pmode, op[2]); } if (flag_record_mcount) @@ -13160,20 +13152,19 @@ s390_function_profiler (FILE *file, int labelno) warning (OPT_Wcannot_profile, "nested functions cannot be profiled " "with %<-mfentry%> on s390"); else - output_asm_insn ("brasl\t0,%4", op); + output_asm_insn ("brasl\t0,%2", op); } else if (TARGET_64BIT) { if (flag_nop_mcount) - output_asm_nops ("-mnop-mcount", /* stg */ 3 + /* larl */ 3 + -/* brasl */ 3 + /* lg */ 3); + output_asm_nops ("-mnop-mcount", /* stg */ 3 + /* brasl */ 3 + +/* lg */ 3); else { output_asm_insn ("stg\t%0,%1", op); if (flag_dwarf2_cfi_asm) - output_asm_insn (".cfi_rel_offset\t%0,%7", op); - output_asm_insn ("larl\t%2,%3", op); - output_asm_insn ("brasl\t%0,%4", op); + output_asm_insn (".cfi_rel_offset\t%0,%3", op); + output_asm_insn ("brasl\t%0,%2", op); output_asm_insn ("lg\t%0,%1", op); if (flag_dwarf2_cfi_asm) output_asm_insn (".cfi_restore\t%0", op); @@ -13182,15 +13173,14 @@ s390_function_profiler (FILE *file, int labelno) else { if (flag_nop_mcount) - output_asm_nops ("-mnop-mcount", /* st */ 2 + /* larl */ 3 + -/* brasl */ 3 + /* l */ 2); + output_asm_nops ("-mnop-mcount", /* st */ 2 + /* brasl */ 3 + +/* l */ 2); else { output_asm_insn ("st\t%0,%1", op); if (flag_dwarf2_cfi_asm) - output_asm_insn (".cfi_rel_offset\t%0,%7", op); - output_asm_insn ("larl\t%2,%3", op); - output_asm_insn ("brasl\t%0,%4", op); + output_asm_insn (".cfi_rel_offset\t%0,%3", op); + output_asm_insn ("brasl\t%0,%2", op); output_asm_insn ("l\t%0,%1", op); if (flag_dwarf2_cfi_asm) output_asm_insn (".cfi_restore\t%0", op); diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h index 3b876160420..fb16a455a03 100644 --- a/gcc/config/s390/s390.h +++ b/gcc/config/s390/s390.h @@ -787,6 +787,8 @@ 
CUMULATIVE_ARGS; #define PROFILE_BEFORE_PROLOGUE 1 +#define NO_PROFILE_COUNTERS 1 + /* Trampolines for nested functions. */ diff --git a/gcc/testsuite/gcc.target/s390/mnop-mcount-m31-mzarch.c b/gcc/testsuite/gcc.target/s390/mnop-mcount-m31-mzarch.c index b2ad9f5bced..874ceb96fe8 100644 --- a/gcc/testsuite/gcc.target/s390/mnop-mcount-m31-mzarch.c +++ b/gcc/testsuite/gcc.target/s390/mnop-mcount-m31-mzarch.c @@ -4,5 +4,5 @@ void profileme
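The "Adapt to the new prologue size" ChangeLog entries above follow from simple arithmetic on the instruction lengths annotated in the output_asm_nops () calls. Below is a minimal sketch of that arithmetic, assuming the hw argument counts 2-byte halfwords (which matches the annotated lengths). The function name mcount_nop_halfwords and the enum names are made up for illustration and do not exist in GCC; the -mfentry path, which emits only a brasl, is not modelled.

```c
#include <assert.h>

/* Instruction lengths in halfwords, copied from the comments in the
   output_asm_nops () calls of the patch.  */
enum { STG_HW = 3, LG_HW = 3, LARL_HW = 3, BRASL_HW = 3, ST_HW = 2, L_HW = 2 };

/* Assumption: one halfword is 2 bytes.  */
enum { BYTES_PER_HW = 2 };

/* Hypothetical helper: number of halfwords occupied by the non-fentry
   profiler sequence.  WITH_LARL selects the old sequence, which loaded
   the address of the per-function counter label with larl.  */
int
mcount_nop_halfwords (int is_64bit, int with_larl)
{
  int hw = BRASL_HW + (with_larl ? LARL_HW : 0);

  /* Save and restore of the return register: stg/lg in 64-bit mode,
     st/l in 31-bit mode.  */
  hw += is_64bit ? STG_HW + LG_HW : ST_HW + L_HW;
  return hw;
}
```

Under these assumptions the 64-bit sequence shrinks from 12 to 9 halfwords (24 to 18 bytes) and the 31-bit sequence from 10 to 7 halfwords (20 to 14 bytes), in both cases by the 6 bytes of the removed larl.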
[PATCH gcc-11 0/2] Backport kpatch changes
Hi,

This series contains a backport of kpatch changes needed to support
https://github.com/dynup/kpatch/pull/1203 so that it could be used in
RHEL 9. The patches have been in master for 4 months now without
issues.

Bootstrapped and regtested on s390x-redhat-linux. Ok for gcc-11?

Best regards,
Ilya

Ilya Leoshkevich (2):
  IBM Z: Define NO_PROFILE_COUNTERS
  IBM Z: Use @PLT symbols for local functions in 64-bit mode

 gcc/config/s390/predicates.md | 9 +-
 gcc/config/s390/s390.c | 115 +++---
 gcc/config/s390/s390.h | 2 +
 gcc/config/s390/s390.md | 32 ++---
 gcc/testsuite/g++.dg/ext/visibility/noPLT.C | 2 +-
 gcc/testsuite/g++.target/s390/mi-thunk.C | 23
 .../gcc.target/s390/call-z10-pic-nodatarel.c | 20 +++
 gcc/testsuite/gcc.target/s390/call-z10-pic.c | 20 +++
 gcc/testsuite/gcc.target/s390/call-z10.c | 20 +++
 .../gcc.target/s390/call-z9-pic-nodatarel.c | 18 +++
 gcc/testsuite/gcc.target/s390/call-z9-pic.c | 18 +++
 gcc/testsuite/gcc.target/s390/call-z9.c | 20 +++
 gcc/testsuite/gcc.target/s390/call.h | 40 ++
 .../gcc.target/s390/mfentry-m64-pic.c | 9 ++
 .../gcc.target/s390/mnop-mcount-m31-mzarch.c | 2 +-
 .../gcc.target/s390/mnop-mcount-m64.c | 2 +-
 gcc/testsuite/gcc.target/s390/nodatarel-1.c | 26 +---
 gcc/testsuite/gcc.target/s390/pr80080-4.c | 2 +-
 gcc/testsuite/gcc.target/s390/risbg-ll-3.c | 6 +-
 gcc/testsuite/gcc.target/s390/tls-pic.c | 14 +++
 gcc/testsuite/gcc.target/s390/tls.c | 10 ++
 gcc/testsuite/gcc.target/s390/tls.h | 23
 22 files changed, 336 insertions(+), 97 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/s390/mi-thunk.C
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call.h
 create mode 100644 gcc/testsuite/gcc.target/s390/mfentry-m64-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/tls-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/tls.c
 create mode 100644 gcc/testsuite/gcc.target/s390/tls.h

--
2.31.1
Re: [PATCH v3 3/3] reassoc: Test rank biasing
On Tue, 2021-09-28 at 13:28 +0200, Richard Biener wrote:
> On Sun, 26 Sep 2021, Ilya Leoshkevich wrote:
>
> > Add both positive and negative tests.
>
> The tests will likely be quite fragile with respect to what is
> actually vectorized on which target. If you move the tests
> to gcc.dg/vect/ you could at least do
>
> /* { dg-require-effective-target vect_int } */
>
> do you need to look for the exact GIMPLE IL or is it enough to
> verify we are vectorizing the reduction?

Actually I don't think vectorization is that important here, and I only
check how many times sum_x = sum_y + _z appears. So I use (?:vect_)?,
which may or may not be there.

An alternative I considered was to use -fno-tree-vectorize to get
smaller regexes, but I thought it would be nice to know that
vectorization does not mess up reassociation results.

Best regards,
Ilya
[PATCH v3 3/3] reassoc: Test rank biasing
Add both positive and negative tests. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/reassoc-46.c: New test. * gcc.dg/tree-ssa/reassoc-46.h: Common code for new tests. * gcc.dg/tree-ssa/reassoc-47.c: New test. * gcc.dg/tree-ssa/reassoc-48.c: New test. * gcc.dg/tree-ssa/reassoc-49.c: New test. * gcc.dg/tree-ssa/reassoc-50.c: New test. * gcc.dg/tree-ssa/reassoc-51.c: New test. --- gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c | 7 + gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h | 33 ++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c | 9 ++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c | 9 ++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c | 11 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c | 10 +++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c | 11 7 files changed, 90 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c new file mode 100644 index 000..97563dd929f --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c @@ -0,0 +1,7 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */ + +#include "reassoc-46.h" + +/* Check that the loop accumulator is added last. 
*/ +/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d._]+ \+ (?:vect_)?_[\d._]+)} 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h new file mode 100644 index 000..e60b490ea0d --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h @@ -0,0 +1,33 @@ +#define M 1024 +unsigned int arr1[M]; +unsigned int arr2[M]; +volatile unsigned int sink; + +unsigned int +test (void) +{ + unsigned int sum = 0; + for (int i = 0; i < M; i++) +{ +#ifdef MODIFY + /* Modify the loop accumulator using a chain of operations - this should + not affect its rank biasing. */ + sum |= 1; + sum ^= 2; +#endif +#ifdef STORE + /* Save the loop accumulator into a global variable - this should not + affect its rank biasing. */ + sink = sum; +#endif +#ifdef USE + /* Add a tricky use of the loop accumulator - this should prevent its + rank biasing. */ + i = (i + sum) % M; +#endif + /* Use addends with different ranks. */ + sum += arr1[i]; + sum += arr2[((i ^ 1) + 1) % M]; +} + return sum; +} diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c new file mode 100644 index 000..1b0f0fdabe1 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */ + +#define MODIFY +#include "reassoc-46.h" + +/* Check that if the loop accumulator is saved into a global variable, it's + still added last. 
*/ +/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d._]+ \+ (?:vect_)?_[\d._]+)} 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c new file mode 100644 index 000..13836ebe8e6 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */ + +#define STORE +#include "reassoc-46.h" + +/* Check that if the loop accumulator is modified using a chain of operations + other than addition, its new value is still added last. */ +/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d._]+ \+ (?:vect_)?_[\d._]+)} 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c new file mode 100644 index 000..c1136a447a2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */ + +#define MODIFY +#define STORE +#include "reassoc-46.h" + +/* Check that if the loop accumulator is both modified using a chain of + operations other than addition and stored into a global variable, its new + value is still added last. */ +/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d
[PATCH v3 2/3] reassoc: Propagate PHI_LOOP_BIAS along single uses
PR tree-optimization/49749 introduced code that shortens dependency
chains containing loop accumulators by placing them last on operand
lists of associative operations.

The 456.hmmer benchmark on s390 could benefit from this. However, the
code that needs it modifies the loop accumulator before using it, and
since only so-called loop-carried phis are treated as loop
accumulators, the code in its present form doesn't really help.
According to Bill Schmidt - the original author - such a conservative
approach was chosen so as to avoid unnecessarily swapping operands,
which might cause unpredictable effects. However, giving special
treatment to forms of loop accumulators is acceptable.

The definition of a loop-carried phi is: it's a single-use phi, which
is used in the same innermost loop it's defined in, at least one
argument of which is defined in the same innermost loop as the phi
itself. Given this, it seems natural to treat single uses of such phis
as phis themselves.

gcc/ChangeLog:

	* tree-ssa-reassoc.c (biased_names): New global.
	(propagate_bias_p): New function.
	(loop_carried_phi): Remove.
	(propagate_rank): Propagate bias along single uses.
	(get_rank): Update biased_names when needed.
---
 gcc/tree-ssa-reassoc.c | 109 -
 1 file changed, 74 insertions(+), 35 deletions(-)

diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 420c14e8cf5..db9fb4e1cac 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -211,6 +211,10 @@ static int64_t *bb_rank;
/* Operand->rank hashtable. */ static hash_map *operand_rank; +/* SSA_NAMEs that are forms of loop accumulators and whose ranks need to be + biased. */ +static auto_bitmap biased_names; + /* Vector of SSA_NAMEs on which after reassociate_bb is done with all basic blocks the CFG should be adjusted - basic blocks split right after that SSA_NAME's definition statement and before
@@ -256,6 +260,53 @@ reassoc_remove_stmt (gimple_stmt_iterator *gsi)
the rank difference between two blocks. 
*/ #define PHI_LOOP_BIAS (1 << 15) +/* Return TRUE iff PHI_LOOP_BIAS should be propagated from one of the STMT's + operands to the STMT's left-hand side. The goal is to preserve bias in code + like this: + + x_1 = phi(x_0, x_2) + a = x_1 | 1 + b = a ^ 2 + .MEM = b + c = b + d + x_2 = c + e + + That is, we need to preserve bias along single-use chains originating from + loop-carried phis. Only GIMPLE_ASSIGNs to SSA_NAMEs are considered to be + uses, because only they participate in rank propagation. */ +static bool +propagate_bias_p (gimple *stmt) +{ + use_operand_p use; + imm_use_iterator use_iter; + gimple *single_use_stmt = NULL; + + if (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_reference) +return false; + + FOR_EACH_IMM_USE_FAST (use, use_iter, gimple_assign_lhs (stmt)) +{ + gimple *current_use_stmt = USE_STMT (use); + + if (is_gimple_assign (current_use_stmt) + && TREE_CODE (gimple_assign_lhs (current_use_stmt)) == SSA_NAME) + { + if (single_use_stmt != NULL && single_use_stmt != current_use_stmt) + return false; + single_use_stmt = current_use_stmt; + } +} + + if (single_use_stmt == NULL) +return false; + + if (gimple_bb (stmt)->loop_father + != gimple_bb (single_use_stmt)->loop_father) +return false; + + return true; +} + /* Rank assigned to a phi statement. If STMT is a loop-carried phi of an innermost loop, and the phi has only a single use which is inside the loop, then the rank is the block rank of the loop latch plus an @@ -313,49 +364,27 @@ phi_rank (gimple *stmt) return bb_rank[bb->index]; } -/* If EXP is an SSA_NAME defined by a PHI statement that represents a - loop-carried dependence of an innermost loop, return TRUE; else - return FALSE. 
*/ -static bool -loop_carried_phi (tree exp) -{ - gimple *phi_stmt; - int64_t block_rank; - - if (TREE_CODE (exp) != SSA_NAME - || SSA_NAME_IS_DEFAULT_DEF (exp)) -return false; - - phi_stmt = SSA_NAME_DEF_STMT (exp); - - if (gimple_code (SSA_NAME_DEF_STMT (exp)) != GIMPLE_PHI) -return false; - - /* Non-loop-carried phis have block rank. Loop-carried phis have - an additional bias added in. If this phi doesn't have block rank, - it's biased and should not be propagated. */ - block_rank = bb_rank[gimple_bb (phi_stmt)->index]; - - if (phi_rank (phi_stmt) != block_rank) -return true; - - return false; -} - /* Return the maximum of RANK and the rank that should be propagated from expression OP. For most operands, this is just the rank of OP. For loop-carried phis, the value is zero to avoid undoing the bias in favor of the phi. */ static int64_t -propagate_rank (int64_t rank, tree op) +propagate_rank (int64_t rank, tree op, bool *maybe_biased_p) { int64_t op_rank; - if (loop_carried_phi (op)) -
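As a reading aid for the commit message of patch 2/3: a minimal sketch, not taken from the patch, of what "placing the loop accumulator last" means at the source level. The function names are made up for illustration. Both functions return the same value because unsigned addition wraps and is associative; they differ only in how many additions sit on the loop-carried dependency chain. Whether a given compiler keeps either form as written is not something this sketch claims.

```c
#include <assert.h>
#include <stddef.h>

/* Accumulator added first: both additions of an iteration depend on the
   previous value of SUM, so the loop-carried chain is two additions long
   per iteration.  */
unsigned int
sum_acc_first (const unsigned int *a, const unsigned int *b, size_t n)
{
  unsigned int sum = 0;
  for (size_t i = 0; i < n; i++)
    sum = (sum + a[i]) + b[i];
  return sum;
}

/* Accumulator added last: a[i] + b[i] does not depend on SUM, so only one
   addition per iteration remains on the loop-carried chain.  */
unsigned int
sum_acc_last (const unsigned int *a, const unsigned int *b, size_t n)
{
  unsigned int sum = 0;
  for (size_t i = 0; i < n; i++)
    sum = sum + (a[i] + b[i]);
  return sum;
}

/* Both orders agree, including when intermediate sums wrap around.  */
void
check_same_result (void)
{
  unsigned int a[4] = { 1u, 2u, 3u, ~0u };
  unsigned int b[4] = { 10u, 20u, 30u, 5u };
  assert (sum_acc_first (a, b, 4) == sum_acc_last (a, b, 4));
}
```

The second form corresponds to the statement shape the tests in patch 3/3 count in the optimized dump, where the accumulator appears as one operand of the final addition.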
[PATCH v3 1/3] reassoc: Do not bias loop-carried PHIs early
Biasing loop-carried PHIs during the 1st reassociation pass interferes with reduction chains and does not bring measurable benefits, so do it only during the 2nd reassociation pass. gcc/ChangeLog: * passes.def (pass_reassoc): Rename parameter to early_p. * tree-ssa-reassoc.c (reassoc_bias_loop_carried_phi_ranks_p): New variable. (phi_rank): Don't bias loop-carried phi ranks before vectorization pass. (execute_reassoc): Add bias_loop_carried_phi_ranks_p parameter. (pass_reassoc::pass_reassoc): Add bias_loop_carried_phi_ranks_p initializer. (pass_reassoc::set_param): Set bias_loop_carried_phi_ranks_p value. (pass_reassoc::execute): Pass bias_loop_carried_phi_ranks_p to execute_reassoc. (pass_reassoc::bias_loop_carried_phi_ranks_p): New member. --- gcc/passes.def | 4 ++-- gcc/tree-ssa-reassoc.c | 16 ++-- 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/gcc/passes.def b/gcc/passes.def index d7a1f8c97a6..c5f915d04c6 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -242,7 +242,7 @@ along with GCC; see the file COPYING3. If not see /* Identify paths that should never be executed in a conforming program and isolate those paths. */ NEXT_PASS (pass_isolate_erroneous_paths); - NEXT_PASS (pass_reassoc, true /* insert_powi_p */); + NEXT_PASS (pass_reassoc, true /* early_p */); NEXT_PASS (pass_dce); NEXT_PASS (pass_forwprop); NEXT_PASS (pass_phiopt, false /* early_p */); @@ -325,7 +325,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_lower_vector_ssa); NEXT_PASS (pass_lower_switch); NEXT_PASS (pass_cse_reciprocals); - NEXT_PASS (pass_reassoc, false /* insert_powi_p */); + NEXT_PASS (pass_reassoc, false /* early_p */); NEXT_PASS (pass_strength_reduction); NEXT_PASS (pass_split_paths); NEXT_PASS (pass_tracer); diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c index 8498cfc7aa8..420c14e8cf5 100644 --- a/gcc/tree-ssa-reassoc.c +++ b/gcc/tree-ssa-reassoc.c @@ -180,6 +180,10 @@ along with GCC; see the file COPYING3. 
If not see point 3a in the pass header comment. */ static bool reassoc_insert_powi_p; +/* Enable biasing ranks of loop accumulators. We don't want this before + vectorization, since it interferes with reduction chains. */ +static bool reassoc_bias_loop_carried_phi_ranks_p; + /* Statistics */ static struct { @@ -269,6 +273,9 @@ phi_rank (gimple *stmt) use_operand_p use; gimple *use_stmt; + if (!reassoc_bias_loop_carried_phi_ranks_p) +return bb_rank[bb->index]; + /* We only care about real loops (those with a latch). */ if (!father->latch) return bb_rank[bb->index]; @@ -6940,9 +6947,10 @@ fini_reassoc (void) optimization of a gimple conditional. Otherwise returns zero. */ static unsigned int -execute_reassoc (bool insert_powi_p) +execute_reassoc (bool insert_powi_p, bool bias_loop_carried_phi_ranks_p) { reassoc_insert_powi_p = insert_powi_p; + reassoc_bias_loop_carried_phi_ranks_p = bias_loop_carried_phi_ranks_p; init_reassoc (); @@ -6983,15 +6991,19 @@ public: { gcc_assert (n == 0); insert_powi_p = param; + bias_loop_carried_phi_ranks_p = !param; } virtual bool gate (function *) { return flag_tree_reassoc != 0; } virtual unsigned int execute (function *) -{ return execute_reassoc (insert_powi_p); } + { +return execute_reassoc (insert_powi_p, bias_loop_carried_phi_ranks_p); + } private: /* Enable insertion of __builtin_powi calls during execute_reassoc. See point 3a in the pass header comment. */ bool insert_powi_p; + bool bias_loop_carried_phi_ranks_p; }; // class pass_reassoc } // anon namespace -- 2.31.1
[PATCH v3 0/3] reassoc: Propagate PHI_LOOP_BIAS along single uses
v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579976.html

Changes in v3:
* Do not propagate bias along tcc_references.
* Call get_rank () before checking biased_names.
* Add loop-carried phis to biased_names.
* Move the propagate_bias_p () call outside of the loop.
* Test with -ftree-vectorize, adjust expectations.

Ilya Leoshkevich (3):
  reassoc: Do not bias loop-carried PHIs early
  reassoc: Propagate PHI_LOOP_BIAS along single uses
  reassoc: Test rank biasing

 gcc/passes.def | 4 +-
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c | 7 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h | 33 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c | 9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c | 9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c | 11 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c | 10 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c | 11 ++
 gcc/tree-ssa-reassoc.c | 125 +++--
 9 files changed, 180 insertions(+), 39 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c

--
2.31.1
Re: [PATCH v2 2/3] reassoc: Propagate PHI_LOOP_BIAS along single uses
On Thu, 2021-09-23 at 13:55 +0200, Richard Biener wrote: > On Wed, 22 Sep 2021, Ilya Leoshkevich wrote: > > > PR tree-optimization/49749 introduced code that shortens dependency > > chains containing loop accumulators by placing them last on operand > > lists of associative operations. > > > > 456.hmmer benchmark on s390 could benefit from this, however, the > > code > > that needs it modifies loop accumulator before using it, and since > > only > > so-called loop-carried phis are are treated as loop accumulators, > > the > > code in the present form doesn't really help. According to Bill > > Schmidt - the original author - such a conservative approach was > > chosen > > so as to avoid unnecessarily swapping operands, which might cause > > unpredictable effects. However, giving special treatment to forms > > of > > loop accumulators is acceptable. > > > > The definition of loop-carried phi is: it's a single-use phi, which > > is > > used in the same innermost loop it's defined in, at least one > > argument > > of which is defined in the same innermost loop as the phi itself. > > Given this, it seems natural to treat single uses of such phis as > > phis > > themselves. > > > > gcc/ChangeLog: > > > > * tree-ssa-reassoc.c (biased_names): New global. > > (propagate_bias_p): New function. > > (loop_carried_phi): Remove. > > (propagate_rank): Propagate bias along single uses. > > (get_rank): Update biased_names when needed. > > --- > > gcc/tree-ssa-reassoc.c | 97 -- > > > > 1 file changed, 64 insertions(+), 33 deletions(-) > > > > diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c > > index 420c14e8cf5..2f7a8882aac 100644 > > --- a/gcc/tree-ssa-reassoc.c > > +++ b/gcc/tree-ssa-reassoc.c > > @@ -211,6 +211,10 @@ static int64_t *bb_rank; > > /* Operand->rank hashtable. */ > > static hash_map *operand_rank; > > > > +/* SSA_NAMEs that are forms of loop accumulators and whose ranks > > need to be > > + biased. 
*/ > > +static auto_bitmap biased_names; > > + > > /* Vector of SSA_NAMEs on which after reassociate_bb is done with > > all basic blocks the CFG should be adjusted - basic blocks > > split right after that SSA_NAME's definition statement and > > before > > @@ -256,6 +260,50 @@ reassoc_remove_stmt (gimple_stmt_iterator > > *gsi) > > the rank difference between two blocks. */ > > #define PHI_LOOP_BIAS (1 << 15) > > > > +/* Return TRUE iff PHI_LOOP_BIAS should be propagated from one of > > the STMT's > > + operands to the STMT's left-hand side. The goal is to preserve > > bias in code > > + like this: > > + > > + x_1 = phi(x_0, x_2) > > + a = x_1 | 1 > > + b = a ^ 2 > > + .MEM = b > > + c = b + d > > + x_2 = c + e > > + > > + That is, we need to preserve bias along single-use chains > > originating from > > + loop-carried phis. Only GIMPLE_ASSIGNs to SSA_NAMEs are > > considered to be > > + uses, because only they participate in rank propagation. */ > > +static bool > > +propagate_bias_p (gimple *stmt) > > +{ > > + use_operand_p use; > > + imm_use_iterator use_iter; > > + gimple *single_use_stmt = NULL; > > + > > + FOR_EACH_IMM_USE_FAST (use, use_iter, gimple_assign_lhs (stmt)) > > + { > > + gimple *current_use_stmt = USE_STMT (use); > > + > > + if (is_gimple_assign (current_use_stmt) > > + && TREE_CODE (gimple_assign_lhs (current_use_stmt)) == > > SSA_NAME) > > + { > > + if (single_use_stmt != NULL) > > what if single_use_stmt == current_use_stmt? We might have two > uses on a stmt after all - should that still be biased? I guess not > and thus the check is correct? Come to think of it, it should be ok to bias it. Things like x = x + x are fine (this particular case can be transformed into something else earlier, but I think the overall point still holds). 
> > > + return false; > > + single_use_stmt = current_use_stmt; > > + } > > + } > > + > > + if (single_use_stmt == NULL) > > + return false; > > + > > + if (gimple_bb (stmt)->loop_father > > + != gimple_bb (single_use_stmt)->loop_father) > > + return false; > > + > > + return true; > > +} > > + > > /* Rank assigned to a phi statement. If STMT is a loop-carried > > phi of > > an innermost loop, and the phi has only a single use which is > > inside > > the loop, then the rank is the block rank of the loop latch > > plus an > > @@ -313,46 +361,23 @@ phi_rank (gimple *stmt) > > return bb_rank[bb->index]; > > } > > > > -/* If EXP is an SSA_NAME defined by a PHI statement that > > represents a > > - loop-carried dependence of an innermost loop, return TRUE; else > > - return FALSE. */ > > -static bool > > -loop_carried_phi (tree exp) > > -{ > > - gimple *phi_stmt; > > - int64_t block_rank; > > - > > - if (TREE_CODE (exp) != SSA_NAME > > - || SSA_NAME_IS_DEFAULT_DEF (exp)) > > - return fals
[PATCH v2 3/3] reassoc: Test rank biasing
Add both positive and negative tests. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/reassoc-46.c: New test. * gcc.dg/tree-ssa/reassoc-46.h: Common code for new tests. * gcc.dg/tree-ssa/reassoc-47.c: New test. * gcc.dg/tree-ssa/reassoc-48.c: New test. * gcc.dg/tree-ssa/reassoc-49.c: New test. * gcc.dg/tree-ssa/reassoc-50.c: New test. * gcc.dg/tree-ssa/reassoc-51.c: New test. --- gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c | 7 + gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h | 33 ++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c | 9 ++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c | 9 ++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c | 11 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c | 10 +++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c | 11 7 files changed, 90 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c new file mode 100644 index 000..69e02bc4d4a --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c @@ -0,0 +1,7 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#include "reassoc-46.h" + +/* Check that the loop accumulator is added last. 
*/ +/* { dg-final { scan-tree-dump-times {sum_\d+ = (?:_\d+ \+ sum_\d+|sum_\d+ \+ _\d+)} 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h new file mode 100644 index 000..e60b490ea0d --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h @@ -0,0 +1,33 @@ +#define M 1024 +unsigned int arr1[M]; +unsigned int arr2[M]; +volatile unsigned int sink; + +unsigned int +test (void) +{ + unsigned int sum = 0; + for (int i = 0; i < M; i++) +{ +#ifdef MODIFY + /* Modify the loop accumulator using a chain of operations - this should + not affect its rank biasing. */ + sum |= 1; + sum ^= 2; +#endif +#ifdef STORE + /* Save the loop accumulator into a global variable - this should not + affect its rank biasing. */ + sink = sum; +#endif +#ifdef USE + /* Add a tricky use of the loop accumulator - this should prevent its + rank biasing. */ + i = (i + sum) % M; +#endif + /* Use addends with different ranks. */ + sum += arr1[i]; + sum += arr2[((i ^ 1) + 1) % M]; +} + return sum; +} diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c new file mode 100644 index 000..84b51ccddb0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#define MODIFY +#include "reassoc-46.h" + +/* Check that if the loop accumulator is saved into a global variable, it's + still added last. 
*/ +/* { dg-final { scan-tree-dump-times {sum_\d+ = (?:_\d+ \+ sum_\d+|sum_\d+ \+ _\d+)} 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c new file mode 100644 index 000..53ae8820281 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#define STORE +#include "reassoc-46.h" + +/* Check that if the loop accumulator is modified using a chain of operations + other than addition, its new value is still added last. */ +/* { dg-final { scan-tree-dump-times {sum_\d+ = (?:_\d+ \+ sum_\d+|sum_\d+ \+ _\d+)} 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c new file mode 100644 index 000..a6941d5ac2b --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#define MODIFY +#define STORE +#include "reassoc-46.h" + +/* Check that if the loop accumulator is both modified using a chain of + operations other than addition and stored into a global variable, its new + value is still added last. */ +/* { dg-final { scan-tree-dump-times {sum_\d+ = (?:_\d+ \+ sum_\d+|sum_\d+ \+ _\d+)} 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c new file mode 100644 index 000..68cd308c4f1 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimize
[PATCH v2 2/3] reassoc: Propagate PHI_LOOP_BIAS along single uses
PR tree-optimization/49749 introduced code that shortens dependency chains containing loop accumulators by placing them last on operand lists of associative operations.

The 456.hmmer benchmark on s390 could benefit from this; however, the code that needs it modifies the loop accumulator before using it, and since only so-called loop-carried phis are treated as loop accumulators, the code in the present form doesn't really help. According to Bill Schmidt - the original author - such a conservative approach was chosen so as to avoid unnecessarily swapping operands, which might cause unpredictable effects. However, giving special treatment to forms of loop accumulators is acceptable.

The definition of loop-carried phi is: it's a single-use phi, which is used in the same innermost loop it's defined in, at least one argument of which is defined in the same innermost loop as the phi itself. Given this, it seems natural to treat single uses of such phis as phis themselves.

gcc/ChangeLog:

        * tree-ssa-reassoc.c (biased_names): New global.
        (propagate_bias_p): New function.
        (loop_carried_phi): Remove.
        (propagate_rank): Propagate bias along single uses.
        (get_rank): Update biased_names when needed.
---
 gcc/tree-ssa-reassoc.c | 97 --
 1 file changed, 64 insertions(+), 33 deletions(-)

diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 420c14e8cf5..2f7a8882aac 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -211,6 +211,10 @@ static int64_t *bb_rank;
 /* Operand->rank hashtable.  */
 static hash_map *operand_rank;
 
+/* SSA_NAMEs that are forms of loop accumulators and whose ranks need to be
+   biased.  */
+static auto_bitmap biased_names;
+
 /* Vector of SSA_NAMEs on which after reassociate_bb is done with
    all basic blocks the CFG should be adjusted - basic blocks
    split right after that SSA_NAME's definition statement and before
@@ -256,6 +260,50 @@ reassoc_remove_stmt (gimple_stmt_iterator *gsi)
    the rank difference between two blocks.
*/ #define PHI_LOOP_BIAS (1 << 15) +/* Return TRUE iff PHI_LOOP_BIAS should be propagated from one of the STMT's + operands to the STMT's left-hand side. The goal is to preserve bias in code + like this: + + x_1 = phi(x_0, x_2) + a = x_1 | 1 + b = a ^ 2 + .MEM = b + c = b + d + x_2 = c + e + + That is, we need to preserve bias along single-use chains originating from + loop-carried phis. Only GIMPLE_ASSIGNs to SSA_NAMEs are considered to be + uses, because only they participate in rank propagation. */ +static bool +propagate_bias_p (gimple *stmt) +{ + use_operand_p use; + imm_use_iterator use_iter; + gimple *single_use_stmt = NULL; + + FOR_EACH_IMM_USE_FAST (use, use_iter, gimple_assign_lhs (stmt)) +{ + gimple *current_use_stmt = USE_STMT (use); + + if (is_gimple_assign (current_use_stmt) + && TREE_CODE (gimple_assign_lhs (current_use_stmt)) == SSA_NAME) + { + if (single_use_stmt != NULL) + return false; + single_use_stmt = current_use_stmt; + } +} + + if (single_use_stmt == NULL) +return false; + + if (gimple_bb (stmt)->loop_father + != gimple_bb (single_use_stmt)->loop_father) +return false; + + return true; +} + /* Rank assigned to a phi statement. If STMT is a loop-carried phi of an innermost loop, and the phi has only a single use which is inside the loop, then the rank is the block rank of the loop latch plus an @@ -313,46 +361,23 @@ phi_rank (gimple *stmt) return bb_rank[bb->index]; } -/* If EXP is an SSA_NAME defined by a PHI statement that represents a - loop-carried dependence of an innermost loop, return TRUE; else - return FALSE. */ -static bool -loop_carried_phi (tree exp) -{ - gimple *phi_stmt; - int64_t block_rank; - - if (TREE_CODE (exp) != SSA_NAME - || SSA_NAME_IS_DEFAULT_DEF (exp)) -return false; - - phi_stmt = SSA_NAME_DEF_STMT (exp); - - if (gimple_code (SSA_NAME_DEF_STMT (exp)) != GIMPLE_PHI) -return false; - - /* Non-loop-carried phis have block rank. Loop-carried phis have - an additional bias added in. 
If this phi doesn't have block rank, - it's biased and should not be propagated. */ - block_rank = bb_rank[gimple_bb (phi_stmt)->index]; - - if (phi_rank (phi_stmt) != block_rank) -return true; - - return false; -} - /* Return the maximum of RANK and the rank that should be propagated from expression OP. For most operands, this is just the rank of OP. For loop-carried phis, the value is zero to avoid undoing the bias in favor of the phi. */ static int64_t -propagate_rank (int64_t rank, tree op) +propagate_rank (int64_t rank, tree op, gimple *stmt, bool *bias_p) { int64_t op_rank; - if (loop_carried_phi (op)) -return rank; + if (TREE_CODE (op) == SSA_NAME + && bitmap_bit_p (biased_names, SSA_NAME_VERSION (op))) +{ + i
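The motivation given in the commit message, placing the loop accumulator last in an operand list so that the loop-carried dependency chain gets shorter, can be sketched in plain C. This is an illustrative model and not compiler output: sum_acc_first and sum_acc_last are hypothetical names, and the operand order that actually appears in the dumps depends on the ranks reassoc computes.

```c
#include <stddef.h>

/* Accumulator first: both additions of an iteration depend on the
   previous value of SUM, so two additions sit on the loop-carried chain.  */
unsigned int
sum_acc_first (const unsigned int *a, const unsigned int *b, size_t n)
{
  unsigned int sum = 0;
  for (size_t i = 0; i < n; i++)
    sum = (sum + a[i]) + b[i];
  return sum;
}

/* Accumulator last: A[I] + B[I] does not depend on SUM, so only one
   addition per iteration remains on the loop-carried chain.  */
unsigned int
sum_acc_last (const unsigned int *a, const unsigned int *b, size_t n)
{
  unsigned int sum = 0;
  for (size_t i = 0; i < n; i++)
    sum = (a[i] + b[i]) + sum;
  return sum;
}
```

Unsigned addition wraps and is associative, so both functions return the same value; they differ only in how many operations lie on the chain through sum.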
[PATCH v2 1/3] reassoc: Do not bias loop-carried PHIs early
Biasing loop-carried PHIs during the 1st reassociation pass interferes with reduction chains and does not bring measurable benefits, so do it only during the 2nd reassociation pass. gcc/ChangeLog: * passes.def (pass_reassoc): Rename parameter to early_p. * tree-ssa-reassoc.c (reassoc_bias_loop_carried_phi_ranks_p): New variable. (phi_rank): Don't bias loop-carried phi ranks before vectorization pass. (execute_reassoc): Add bias_loop_carried_phi_ranks_p parameter. (pass_reassoc::pass_reassoc): Add bias_loop_carried_phi_ranks_p initializer. (pass_reassoc::set_param): Set bias_loop_carried_phi_ranks_p value. (pass_reassoc::execute): Pass bias_loop_carried_phi_ranks_p to execute_reassoc. (pass_reassoc::bias_loop_carried_phi_ranks_p): New member. --- gcc/passes.def | 4 ++-- gcc/tree-ssa-reassoc.c | 16 ++-- 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/gcc/passes.def b/gcc/passes.def index d7a1f8c97a6..c5f915d04c6 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -242,7 +242,7 @@ along with GCC; see the file COPYING3. If not see /* Identify paths that should never be executed in a conforming program and isolate those paths. */ NEXT_PASS (pass_isolate_erroneous_paths); - NEXT_PASS (pass_reassoc, true /* insert_powi_p */); + NEXT_PASS (pass_reassoc, true /* early_p */); NEXT_PASS (pass_dce); NEXT_PASS (pass_forwprop); NEXT_PASS (pass_phiopt, false /* early_p */); @@ -325,7 +325,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_lower_vector_ssa); NEXT_PASS (pass_lower_switch); NEXT_PASS (pass_cse_reciprocals); - NEXT_PASS (pass_reassoc, false /* insert_powi_p */); + NEXT_PASS (pass_reassoc, false /* early_p */); NEXT_PASS (pass_strength_reduction); NEXT_PASS (pass_split_paths); NEXT_PASS (pass_tracer); diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c index 8498cfc7aa8..420c14e8cf5 100644 --- a/gcc/tree-ssa-reassoc.c +++ b/gcc/tree-ssa-reassoc.c @@ -180,6 +180,10 @@ along with GCC; see the file COPYING3. 
If not see point 3a in the pass header comment. */ static bool reassoc_insert_powi_p; +/* Enable biasing ranks of loop accumulators. We don't want this before + vectorization, since it interferes with reduction chains. */ +static bool reassoc_bias_loop_carried_phi_ranks_p; + /* Statistics */ static struct { @@ -269,6 +273,9 @@ phi_rank (gimple *stmt) use_operand_p use; gimple *use_stmt; + if (!reassoc_bias_loop_carried_phi_ranks_p) +return bb_rank[bb->index]; + /* We only care about real loops (those with a latch). */ if (!father->latch) return bb_rank[bb->index]; @@ -6940,9 +6947,10 @@ fini_reassoc (void) optimization of a gimple conditional. Otherwise returns zero. */ static unsigned int -execute_reassoc (bool insert_powi_p) +execute_reassoc (bool insert_powi_p, bool bias_loop_carried_phi_ranks_p) { reassoc_insert_powi_p = insert_powi_p; + reassoc_bias_loop_carried_phi_ranks_p = bias_loop_carried_phi_ranks_p; init_reassoc (); @@ -6983,15 +6991,19 @@ public: { gcc_assert (n == 0); insert_powi_p = param; + bias_loop_carried_phi_ranks_p = !param; } virtual bool gate (function *) { return flag_tree_reassoc != 0; } virtual unsigned int execute (function *) -{ return execute_reassoc (insert_powi_p); } + { +return execute_reassoc (insert_powi_p, bias_loop_carried_phi_ranks_p); + } private: /* Enable insertion of __builtin_powi calls during execute_reassoc. See point 3a in the pass header comment. */ bool insert_powi_p; + bool bias_loop_carried_phi_ranks_p; }; // class pass_reassoc } // anon namespace -- 2.31.1
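The way a single pass parameter drives both flags in this patch can be modelled in a few lines of C. This is a sketch, not the GCC code: toy_reassoc_params and toy_set_param are hypothetical names that mirror what pass_reassoc::set_param does in the diff above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two pass_reassoc members.  */
struct toy_reassoc_params
{
  bool insert_powi_p;
  bool bias_loop_carried_phi_ranks_p;
};

/* Mirrors set_param from the patch: the early instance inserts
   __builtin_powi calls and does not bias loop-carried phi ranks,
   the late instance does the opposite.  */
static struct toy_reassoc_params
toy_set_param (bool early_p)
{
  struct toy_reassoc_params params;
  params.insert_powi_p = early_p;
  params.bias_loop_carried_phi_ranks_p = !early_p;
  return params;
}
```

With this arrangement the two NEXT_PASS lines in passes.def need no second argument; the comment rename to early_p documents that the one boolean now carries both meanings.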
[PATCH v2 0/3] reassoc: Propagate PHI_LOOP_BIAS along single uses
This is an update to my very old patch with the review comments addressed. Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux and s390x-redhat-linux.

v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548785.html

Changes in v2:

* Disable PHI biasing in the early pass instance in a separate patch.
* Replace s390-specific tests with the generic tree-ssa ones.
* Replace the fragile (op_rank & PHI_LOOP_BIAS) test with auto_bitmap biased_names. The review suggestion was to rather check whether op is defined by a loop-carried phi, but this would allow detecting only single assignments, and not assignment chains. Another alternative that would make the check less fragile was to use saturating addition in order to prevent overflows into the PHI_LOOP_BIAS bit, but auto_bitmap of SSA_NAMEs allows graceful processing of large basic blocks, and its memory overhead looks acceptable.
* Restructure the code to make it a bit more readable. The overall logic is the same as in v1.

I considered implementing an idea from [1], more specifically, detecting single-use chains in is_phi_for_stmt() so that swap_ops_for_binary_stmt() shifts the corresponding operand towards the end. These two functions actually seem to serve a very related purpose. However, for single-use chain detection we would still need to recursively traverse SSA_NAME_DEF_STMTs of operands, which propagate_rank() and friends already do. So this would not have resulted in a significant code simplification.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-June/549149.html Ilya Leoshkevich (3): reassoc: Do not bias loop-carried PHIs early reassoc: Propagate PHI_LOOP_BIAS along single uses reassoc: Test rank biasing gcc/passes.def | 4 +- gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c | 7 ++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h | 33 ++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c | 9 ++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c | 9 ++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c | 11 ++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c | 10 ++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c | 11 ++ gcc/tree-ssa-reassoc.c | 113 ++--- 9 files changed, 170 insertions(+), 37 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c -- 2.31.1
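Why the (op_rank & PHI_LOOP_BIAS) test is described as fragile in the changes list can be shown with a small model. This is a sketch under assumptions: only PHI_LOOP_BIAS and its value come from the patch, while looks_biased, toy_biased, toy_mark_biased and toy_is_biased are hypothetical, with a fixed-size array standing in for auto_bitmap biased_names indexed by SSA_NAME_VERSION.

```c
#include <stdbool.h>
#include <stdint.h>

#define PHI_LOOP_BIAS (1 << 15)

/* Fragile check: infers "was biased" from the rank value itself.  A rank
   that grows past the bias bit for unrelated reasons also passes.  */
static bool
looks_biased (int64_t rank)
{
  return (rank & PHI_LOOP_BIAS) != 0;
}

/* Explicit bookkeeping: remember which names were biased, independently
   of how large their ranks become.  VERSION must be below TOY_MAX_NAMES.  */
#define TOY_MAX_NAMES 64
static bool toy_biased[TOY_MAX_NAMES];

static void
toy_mark_biased (unsigned int version)
{
  toy_biased[version] = true;
}

static bool
toy_is_biased (unsigned int version)
{
  return toy_biased[version];
}
```

A rank that merely reaches 1 << 15, which could happen in a large basic block, satisfies looks_biased even though no bias was ever added, whereas toy_is_biased only reports names that were marked explicitly.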
[PATCH] IBM Z: Enable LSan and TSan
Bootstrapped and regtested on s390x-redhat-linux. Ok for master?

libsanitizer/ChangeLog:

        * configure.tgt (s390*-*-linux*): Enable LSan and TSan for s390x.
---
 libsanitizer/configure.tgt | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/libsanitizer/configure.tgt b/libsanitizer/configure.tgt
index 0ca5d9fd924..f635e412bdc 100644
--- a/libsanitizer/configure.tgt
+++ b/libsanitizer/configure.tgt
@@ -41,6 +41,11 @@ case "${target}" in
   sparc*-*-linux*)
        ;;
   s390*-*-linux*)
+       if test x$ac_cv_sizeof_void_p = x8; then
+               TSAN_SUPPORTED=yes
+               LSAN_SUPPORTED=yes
+               TSAN_TARGET_DEPENDENT_OBJECTS=tsan_rtl_s390x.lo
+       fi
        ;;
   sparc*-*-solaris2.11*)
        ;;
--
2.31.1
[PATCH] IBM Z: Fix 5 tests in 31-bit mode
Bootstrapped and regtested on s390x-redhat-linux. Ok for master? gcc/testsuite/ChangeLog: * gcc.target/s390/global-array-element-pic2.c: Add -mzarch, add an expectation for 31-bit mode. * gcc.target/s390/load-imm64-1.c: Use unsigned long long. * gcc.target/s390/load-imm64-2.c: Likewise. * gcc.target/s390/vector/long-double-vx-macro-off-on.c: Use -mzarch. * gcc.target/s390/vector/long-double-vx-macro-on-off.c: Likewise. --- gcc/testsuite/gcc.target/s390/global-array-element-pic2.c| 5 +++-- gcc/testsuite/gcc.target/s390/load-imm64-1.c | 4 ++-- gcc/testsuite/gcc.target/s390/load-imm64-2.c | 4 ++-- .../gcc.target/s390/vector/long-double-vx-macro-off-on.c | 2 +- .../gcc.target/s390/vector/long-double-vx-macro-on-off.c | 2 +- 5 files changed, 9 insertions(+), 8 deletions(-) diff --git a/gcc/testsuite/gcc.target/s390/global-array-element-pic2.c b/gcc/testsuite/gcc.target/s390/global-array-element-pic2.c index 72b87d40b85..0ee10841cac 100644 --- a/gcc/testsuite/gcc.target/s390/global-array-element-pic2.c +++ b/gcc/testsuite/gcc.target/s390/global-array-element-pic2.c @@ -1,6 +1,6 @@ /* Test accesses to global array elements in PIC code. */ /* { dg-do compile } */ -/* { dg-options "-O1 -march=z10 -fPIC" } */ +/* { dg-options "-O1 -march=z10 -mzarch -fPIC" } */ extern char a[] __attribute__ ((aligned (2))); extern char *b; @@ -8,6 +8,7 @@ extern char *b; void c() { b = a + 4; - /* { dg-final { scan-assembler "(?n)\n\tlgrl\t%r\\d+,a@GOTENT\n" } } */ + /* { dg-final { scan-assembler "(?n)\n\tlgrl\t%r\\d+,a@GOTENT\n" { target lp64 } } } */ + /* { dg-final { scan-assembler "(?n)\n\tlrl\t%r\\d+,a@GOTENT\n" { target { ! 
lp64 } } } } */ /* { dg-final { scan-assembler-not "(?n)\n\tlarl\t%r\\d+,a\[^@\]" } } */ } diff --git a/gcc/testsuite/gcc.target/s390/load-imm64-1.c b/gcc/testsuite/gcc.target/s390/load-imm64-1.c index 03d17f59096..8e812f2f01d 100644 --- a/gcc/testsuite/gcc.target/s390/load-imm64-1.c +++ b/gcc/testsuite/gcc.target/s390/load-imm64-1.c @@ -4,10 +4,10 @@ /* { dg-do compile } */ /* { dg-options "-O3 -march=z9-109" } */ -unsigned long +unsigned long long magic (void) { - return 0x3f08c5392f756cd; + return 0x3f08c5392f756cdULL; } /* { dg-final { scan-assembler-times {\n\tllihf\t} 1 { target lp64 } } } */ diff --git a/gcc/testsuite/gcc.target/s390/load-imm64-2.c b/gcc/testsuite/gcc.target/s390/load-imm64-2.c index ee0ff3b0a91..c3536b4d031 100644 --- a/gcc/testsuite/gcc.target/s390/load-imm64-2.c +++ b/gcc/testsuite/gcc.target/s390/load-imm64-2.c @@ -4,10 +4,10 @@ /* { dg-do compile } */ /* { dg-options "-O3 -march=z10" } */ -unsigned long +unsigned long long magic (void) { - return 0x3f08c5392f756cd; + return 0x3f08c5392f756cdULL; } /* { dg-final { scan-assembler-times {\n\tllihf\t} 1 { target lp64 } } } */ diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c b/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c index 2d67679bb11..513912e669d 100644 --- a/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c +++ b/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target target_attribute } */ -/* { dg-options "-march=z14" } */ +/* { dg-options "-march=z14 -mzarch" } */ #if !defined(__LONG_DOUBLE_VX__) #error #endif diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c b/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c index 6f264313408..6b3cb321338 100644 --- a/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c +++ 
b/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target target_attribute } */ -/* { dg-options "-march=z13" } */ +/* { dg-options "-march=z13 -mzarch" } */ #if defined(__LONG_DOUBLE_VX__) #error #endif -- 2.31.1
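The load-imm64 change in the patch above can be motivated with a short C sketch. It assumes that unsigned long is 32 bits wide in 31-bit mode, as in the ILP32 s390 ABI; magic matches the test, while low32 is a hypothetical helper that models what a 32-bit type would retain of the constant.

```c
#include <stdint.h>

/* unsigned long long is at least 64 bits wide in every ABI, so the
   constant is preserved in both 31-bit and 64-bit mode.  */
unsigned long long
magic (void)
{
  return 0x3f08c5392f756cdULL;
}

/* Hypothetical helper: the part of VALUE that fits into 32 bits.  */
uint32_t
low32 (unsigned long long value)
{
  return (uint32_t) value;
}
```

The constant needs more than 32 bits, so a function returning unsigned long could not return it unchanged in 31-bit mode; switching the return type and adding the ULL suffix keeps the test meaningful in both modes.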
[PATCH v3] IBM Z: Use @PLT symbols for local functions in 64-bit mode
Bootstrapped and regtested on s390x-redhat-linux. Ok for master? v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573614.html v1 -> v2: Do not use UNSPEC_PLT in 64-bit code and rename it to UNSPEC_PLT31 (Ulrich, Andreas). Do not append @PLT only to weak symbols in non-PIC code (Ulrich). Add TLS tests. v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574646.html v2 -> v3: Use %K in function_profiler() and s390_output_mi_thunk(), add tests for these cases. This helps with generating code for kernel hotpatches, which contain individual functions and are loaded more than 2G away from vmlinux. This should not create performance regressions for the normal use cases, because for local functions ld replaces @PLT calls with direct calls. gcc/ChangeLog: * config/s390/predicates.md (bras_sym_operand): Accept all functions in 64-bit mode, use UNSPEC_PLT31. (larl_operand): Use UNSPEC_PLT31. * config/s390/s390.c (s390_loadrelative_operand_p): Likewise. (legitimize_pic_address): Likewise. (s390_emit_tls_call_insn): Mark __tls_get_offset as function, use UNSPEC_PLT31. (s390_delegitimize_address): Use UNSPEC_PLT31. (s390_output_addr_const_extra): Likewise. (print_operand): Add @PLT to TLS calls, handle %K. (s390_function_profiler): Mark __fentry__/_mcount as function, use %K, use UNSPEC_PLT31. (s390_output_mi_thunk): Use only UNSPEC_GOT, use %K. (s390_emit_call): Use UNSPEC_PLT31. (s390_emit_tpf_eh_return): Mark __tpf_eh_return as function. * config/s390/s390.md (UNSPEC_PLT31): Rename from UNSPEC_PLT. (*movdi_64): Use %K. (reload_base_64): Likewise. (*sibcall_brc): Likewise. (*sibcall_brcl): Likewise. (*sibcall_value_brc): Likewise. (*sibcall_value_brcl): Likewise. (*bras): Likewise. (*brasl): Likewise. (*bras_r): Likewise. (*brasl_r): Likewise. (*bras_tls): Likewise. (*brasl_tls): Likewise. (main_base_64): Likewise. (reload_base_64): Likewise. (@split_stack_call): Likewise. gcc/testsuite/ChangeLog: * g++.dg/ext/visibility/noPLT.C: Skip on s390x. 
* g++.target/s390/mi-thunk.C: New test. * gcc.target/s390/nodatarel-1.c: Move foostatic to the new tests. * gcc.target/s390/pr80080-4.c: Allow @PLT suffix. * gcc.target/s390/risbg-ll-3.c: Likewise. * gcc.target/s390/call.h: Common code for the new tests. * gcc.target/s390/call-z10-pic-nodatarel.c: New test. * gcc.target/s390/call-z10-pic.c: New test. * gcc.target/s390/call-z10.c: New test. * gcc.target/s390/call-z9-pic-nodatarel.c: New test. * gcc.target/s390/call-z9-pic.c: New test. * gcc.target/s390/call-z9.c: New test. * gcc.target/s390/mfentry-m64-pic.c: New test. * gcc.target/s390/tls.h: Common code for the new TLS tests. * gcc.target/s390/tls-pic.c: New test. * gcc.target/s390/tls.c: New test. --- gcc/config/s390/predicates.md | 9 ++- gcc/config/s390/s390.c| 81 +-- gcc/config/s390/s390.md | 32 gcc/testsuite/g++.dg/ext/visibility/noPLT.C | 2 +- gcc/testsuite/g++.target/s390/mi-thunk.C | 23 ++ .../gcc.target/s390/call-z10-pic-nodatarel.c | 20 + gcc/testsuite/gcc.target/s390/call-z10-pic.c | 20 + gcc/testsuite/gcc.target/s390/call-z10.c | 20 + .../gcc.target/s390/call-z9-pic-nodatarel.c | 18 + gcc/testsuite/gcc.target/s390/call-z9-pic.c | 18 + gcc/testsuite/gcc.target/s390/call-z9.c | 20 + gcc/testsuite/gcc.target/s390/call.h | 40 + .../gcc.target/s390/mfentry-m64-pic.c | 9 +++ gcc/testsuite/gcc.target/s390/nodatarel-1.c | 26 +- gcc/testsuite/gcc.target/s390/pr80080-4.c | 2 +- gcc/testsuite/gcc.target/s390/risbg-ll-3.c| 6 +- gcc/testsuite/gcc.target/s390/tls-pic.c | 14 gcc/testsuite/gcc.target/s390/tls.c | 10 +++ gcc/testsuite/gcc.target/s390/tls.h | 23 ++ 19 files changed, 320 insertions(+), 73 deletions(-) create mode 100644 gcc/testsuite/g++.target/s390/mi-thunk.C create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic-nodatarel.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z10.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic-nodatarel.c create mode 100644 
gcc/testsuite/gcc.target/s390/call-z9-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z9.c create mode 100644 gcc/testsuite/gcc.target/s390/call.h create mode 100644 gcc/testsuite/gcc.target/s390/mfentry-m64-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/tls-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/tls.c create mode 1006
Re: [PATCH v2] IBM Z: Use @PLT symbols for local functions in 64-bit mode
On Wed, 2021-07-07 at 21:03 +0200, Ilya Leoshkevich wrote:
> Bootstrapped and regtested on s390x-redhat-linux. Ok for master?
>
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573614.html
> v1 -> v2: Do not use UNSPEC_PLT in 64-bit code and rename it to
> UNSPEC_PLT31 (Ulrich, Andreas). Do not append @PLT only to
> weak symbols in non-PIC code (Ulrich). Add TLS tests.
>
> This helps with generating code for kernel hotpatches, which contain
> individual functions and are loaded more than 2G away from vmlinux.
> This should not create performance regressions for the normal use
> cases, because for local functions ld replaces @PLT calls with direct
> calls.

Please disregard this patch: I just realized I missed two output_asm_insn () calls in s390.c, one in function_profiler () and one in s390_output_mi_thunk (). I'll send a v3.
[PATCH v2] IBM Z: Use @PLT symbols for local functions in 64-bit mode
Bootstrapped and regtested on s390x-redhat-linux. Ok for master? v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573614.html v1 -> v2: Do not use UNSPEC_PLT in 64-bit code and rename it to UNSPEC_PLT31 (Ulrich, Andreas). Do not append @PLT only to weak symbols in non-PIC code (Ulrich). Add TLS tests. This helps with generating code for kernel hotpatches, which contain individual functions and are loaded more than 2G away from vmlinux. This should not create performance regressions for the normal use cases, because for local functions ld replaces @PLT calls with direct calls. gcc/ChangeLog: * config/s390/predicates.md (bras_sym_operand): Accept all functions in 64-bit mode, use UNSPEC_PLT31. (larl_operand): Use UNSPEC_PLT31. * config/s390/s390.c (s390_loadrelative_operand_p): Likewise. (legitimize_pic_address): Likewise. (s390_emit_tls_call_insn): Mark __tls_get_offset as function, use UNSPEC_PLT31. (s390_delegitimize_address): Use UNSPEC_PLT31. (s390_output_addr_const_extra): Likewise. (print_operand): Add @PLT to TLS calls, handle %K. (s390_function_profiler): Mark __fentry__/_mcount as function, use UNSPEC_PLT31. (s390_output_mi_thunk): Use only UNSPEC_GOT. (s390_emit_call): Use UNSPEC_PLT31. (s390_emit_tpf_eh_return): Mark __tpf_eh_return as function. * config/s390/s390.md (UNSPEC_PLT31): Rename from UNSPEC_PLT. (*movdi_64): Use %K. (reload_base_64): Likewise. (*sibcall_brc): Likewise. (*sibcall_brcl): Likewise. (*sibcall_value_brc): Likewise. (*sibcall_value_brcl): Likewise. (*bras): Likewise. (*brasl): Likewise. (*bras_r): Likewise. (*brasl_r): Likewise. (*bras_tls): Likewise. (*brasl_tls): Likewise. (main_base_64): Likewise. (reload_base_64): Likewise. (@split_stack_call): Likewise. gcc/testsuite/ChangeLog: * g++.dg/ext/visibility/noPLT.C: Skip on s390x. * gcc.target/s390/nodatarel-1.c: Move foostatic to the new tests. * gcc.target/s390/pr80080-4.c: Allow @PLT suffix. * gcc.target/s390/risbg-ll-3.c: Likewise. 
* gcc.target/s390/call.h: Common code for the new tests. * gcc.target/s390/call31-z10-pic-nodatarel.c: New test. * gcc.target/s390/call31-z10-pic.c: New test. * gcc.target/s390/call31-z10.c: New test. * gcc.target/s390/call31-z9-pic-nodatarel.c: New test. * gcc.target/s390/call31-z9-pic.c: New test. * gcc.target/s390/call31-z9.c: New test. * gcc.target/s390/call64-z10-pic-nodatarel.c: New test. * gcc.target/s390/call64-z10-pic.c: New test. * gcc.target/s390/call64-z10.c: New test. * gcc.target/s390/call64-z9-pic-nodatarel.c: New test. * gcc.target/s390/call64-z9-pic.c: New test. * gcc.target/s390/call64-z9.c: New test. * gcc.target/s390/tls.h: Common code for the new TLS tests. * gcc.target/s390/tls31-pic.c: New test. * gcc.target/s390/tls31.c: New test. * gcc.target/s390/tls64-pic.c: New test. * gcc.target/s390/tls64.c: New test. --- gcc/config/s390/predicates.md | 9 ++- gcc/config/s390/s390.c| 73 ++- gcc/config/s390/s390.md | 32 gcc/testsuite/g++.dg/ext/visibility/noPLT.C | 2 +- gcc/testsuite/gcc.target/s390/call.h | 40 ++ .../s390/call31-z10-pic-nodatarel.c | 16 .../gcc.target/s390/call31-z10-pic.c | 16 gcc/testsuite/gcc.target/s390/call31-z10.c| 15 .../gcc.target/s390/call31-z9-pic-nodatarel.c | 16 gcc/testsuite/gcc.target/s390/call31-z9-pic.c | 16 gcc/testsuite/gcc.target/s390/call31-z9.c | 15 .../s390/call64-z10-pic-nodatarel.c | 17 + .../gcc.target/s390/call64-z10-pic.c | 17 + gcc/testsuite/gcc.target/s390/call64-z10.c| 15 .../gcc.target/s390/call64-z9-pic-nodatarel.c | 17 + gcc/testsuite/gcc.target/s390/call64-z9-pic.c | 17 + gcc/testsuite/gcc.target/s390/call64-z9.c | 15 gcc/testsuite/gcc.target/s390/nodatarel-1.c | 26 +-- gcc/testsuite/gcc.target/s390/pr80080-4.c | 2 +- gcc/testsuite/gcc.target/s390/risbg-ll-3.c| 6 +- gcc/testsuite/gcc.target/s390/tls.h | 23 ++ gcc/testsuite/gcc.target/s390/tls31-pic.c | 14 gcc/testsuite/gcc.target/s390/tls31.c | 9 +++ gcc/testsuite/gcc.target/s390/tls64-pic.c | 14 gcc/testsuite/gcc.target/s390/tls64.c | 9 +++ 25 files 
changed, 382 insertions(+), 69 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/call.h create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10-pic-nodatarel.c create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10-pic.c create mode 100644 gcc/tes
[PATCH] IBM Z: Use @PLT symbols for local functions in 64-bit mode
Bootstrapped and regtested on s390x-redhat-linux. Ok for master?

This helps with generating the code for kernel hotpatches, which contain individual functions and are loaded more than 2G away from vmlinux. This should not create performance regressions for the normal use cases, because for local functions ld replaces @PLT calls with direct calls.

gcc/ChangeLog:

        * config/s390/s390.c (print_operand): Handle %K.
        * config/s390/s390.md (*movdi_64): Use %K for larl.
        (reload_base_64): Likewise.
        (*sibcall_brc): Use %K for j.
        (*sibcall_brcl): Use %K for jg.
        (*sibcall_value_brc): Use %K for j.
        (*sibcall_value_brcl): Use %K for jg.
        (*bras): Use %K.
        (*brasl): Likewise.
        (*bras_r): Likewise.
        (*brasl_r): Likewise.
        (main_base_64): Use %K for larl.
        (reload_base_64): Likewise.
        (@split_stack_call): Use %K for jg.

gcc/testsuite/ChangeLog:

        * g++.dg/ext/visibility/noPLT.C: Skip on s390x.
        * gcc.target/s390/nodatarel-1.c: Move foostatic to the new tests.
        * gcc.target/s390/pr80080-4.c: Allow @PLT suffix.
        * gcc.target/s390/risbg-ll-3.c: Likewise.
        * gcc.target/s390/call.h: Common code for the new tests.
        * gcc.target/s390/call31-z10-pic-nodatarel.c: New test.
        * gcc.target/s390/call31-z10-pic.c: New test.
        * gcc.target/s390/call31-z10.c: New test.
        * gcc.target/s390/call31-z9-pic-nodatarel.c: New test.
        * gcc.target/s390/call31-z9-pic.c: New test.
        * gcc.target/s390/call31-z9.c: New test.
        * gcc.target/s390/call64-z10-pic-nodatarel.c: New test.
        * gcc.target/s390/call64-z10-pic.c: New test.
        * gcc.target/s390/call64-z10.c: New test.
        * gcc.target/s390/call64-z9-pic-nodatarel.c: New test.
        * gcc.target/s390/call64-z9-pic.c: New test.
        * gcc.target/s390/call64-z9.c: New test.
--- gcc/config/s390/s390.c| 9 + gcc/config/s390/s390.md | 26 ++--- gcc/testsuite/g++.dg/ext/visibility/noPLT.C | 2 +- gcc/testsuite/gcc.target/s390/call.h | 38 +++ .../s390/call31-z10-pic-nodatarel.c | 16 .../gcc.target/s390/call31-z10-pic.c | 16 gcc/testsuite/gcc.target/s390/call31-z10.c| 15 .../gcc.target/s390/call31-z9-pic-nodatarel.c | 16 gcc/testsuite/gcc.target/s390/call31-z9-pic.c | 16 gcc/testsuite/gcc.target/s390/call31-z9.c | 15 .../s390/call64-z10-pic-nodatarel.c | 17 + .../gcc.target/s390/call64-z10-pic.c | 17 + gcc/testsuite/gcc.target/s390/call64-z10.c| 15 .../gcc.target/s390/call64-z9-pic-nodatarel.c | 17 + gcc/testsuite/gcc.target/s390/call64-z9-pic.c | 17 + gcc/testsuite/gcc.target/s390/call64-z9.c | 15 gcc/testsuite/gcc.target/s390/nodatarel-1.c | 26 + gcc/testsuite/gcc.target/s390/pr80080-4.c | 2 +- gcc/testsuite/gcc.target/s390/risbg-ll-3.c| 6 +-- 19 files changed, 258 insertions(+), 43 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/call.h create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10-pic-nodatarel.c create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10.c create mode 100644 gcc/testsuite/gcc.target/s390/call31-z9-pic-nodatarel.c create mode 100644 gcc/testsuite/gcc.target/s390/call31-z9-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/call31-z9.c create mode 100644 gcc/testsuite/gcc.target/s390/call64-z10-pic-nodatarel.c create mode 100644 gcc/testsuite/gcc.target/s390/call64-z10-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/call64-z10.c create mode 100644 gcc/testsuite/gcc.target/s390/call64-z9-pic-nodatarel.c create mode 100644 gcc/testsuite/gcc.target/s390/call64-z9-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/call64-z9.c diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 6bbeb640e1f..e7839044a40 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -7943,6 +7943,7 @@ print_operand_address 
(FILE *file, rtx addr) 'E': print opcode suffix for branch on index instruction. 'G': print the size of the operand in bytes. 'J': print tls_load/tls_gdcall/tls_ldcall suffix +'K': print @PLT suffix for call targets and load address values. 'M': print the second word of a TImode operand. 'N': print the second word of a DImode operand. 'O': print only the displacement of a memory reference or address. @@ -8129,6 +8130,14 @@ print_operand (FILE *file, rtx x, int code) case 'Y': print_shift_count_operand (file, x); return; + +case 'K': + if (TARGET_64BIT + && flag_pic + && GET_CODE (x) == SYMBOL_REF + && SYMBOL_REF_FUNCTION_P (x)) + fprintf (file, "@PLT"); + return
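The decision made by the new 'K' operand code can be modelled outside of GCC. The following is only a minimal sketch, not GCC code: `Operand`, `Target`, `plt_suffix` and `format_call` are hypothetical names that stand in for the RTL predicates and target flags tested in the hunk above, and the `brasl` string is purely illustrative.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the properties print_operand () checks on X.
struct Operand {
  std::string name;
  bool is_symbol_ref;  // models GET_CODE (x) == SYMBOL_REF
  bool is_function;    // models SYMBOL_REF_FUNCTION_P (x)
};

// Hypothetical stand-in for the target and option state.
struct Target {
  bool is_64bit;  // models TARGET_64BIT
  bool pic;       // models flag_pic
};

// Suffix that %K would print under the condition written in the hunk.
std::string plt_suffix(const Target &t, const Operand &x) {
  if (t.is_64bit && t.pic && x.is_symbol_ref && x.is_function)
    return "@PLT";
  return "";
}

// Illustrative call formatting; the register and mnemonic are assumptions.
std::string format_call(const Target &t, const Operand &x) {
  assert(!x.name.empty());
  return "brasl\t%r14," + x.name + plt_suffix(t, x);
}
```

In this model only function symbols receive the suffix; a data symbol, or any operand when one of the two flags is off, is printed unchanged.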
[PATCH v2] IBM Z: Define NO_PROFILE_COUNTERS
Bootstrapped and regtested on s390x-redhat-linux. Ok for master? v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573348.html v1 -> v2: Use ATTRIBUTE_UNUSED, compact op[] array (Andreas). I've also noticed that one of the nops that we generate for -mnop-mcount is not needed now and removed it. A couple tests needed to be adjusted after that. s390 glibc does not need counters in the .data section, since it stores edge hits in its own data structure. Therefore counters only waste space and confuse diffing tools (e.g. kpatch), so don't generate them. gcc/ChangeLog: * config/s390/s390.c (s390_function_profiler): Ignore labelno parameter. * config/s390/s390.h (NO_PROFILE_COUNTERS): Define. gcc/testsuite/ChangeLog: * gcc.target/s390/mnop-mcount-m31-mzarch.c: Adapt to the new prologue size. * gcc.target/s390/mnop-mcount-m64.c: Likewise. --- gcc/config/s390/s390.c| 42 +++ gcc/config/s390/s390.h| 2 + .../gcc.target/s390/mnop-mcount-m31-mzarch.c | 2 +- .../gcc.target/s390/mnop-mcount-m64.c | 2 +- 4 files changed, 20 insertions(+), 28 deletions(-) diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 6bbeb640e1f..590dd8f35bc 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -13110,33 +13110,25 @@ output_asm_nops (const char *user, int hw) } } -/* Output assembler code to FILE to increment profiler label # LABELNO - for profiling a function entry. */ +/* Output assembler code to FILE to call a profiler hook. 
*/ void -s390_function_profiler (FILE *file, int labelno) +s390_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED) { - rtx op[8]; - - char label[128]; - ASM_GENERATE_INTERNAL_LABEL (label, "LP", labelno); + rtx op[4]; fprintf (file, "# function profiler \n"); op[0] = gen_rtx_REG (Pmode, RETURN_REGNUM); op[1] = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM); op[1] = gen_rtx_MEM (Pmode, plus_constant (Pmode, op[1], UNITS_PER_LONG)); - op[7] = GEN_INT (UNITS_PER_LONG); - - op[2] = gen_rtx_REG (Pmode, 1); - op[3] = gen_rtx_SYMBOL_REF (Pmode, label); - SYMBOL_REF_FLAGS (op[3]) = SYMBOL_FLAG_LOCAL; + op[3] = GEN_INT (UNITS_PER_LONG); - op[4] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? "__fentry__" : "_mcount"); + op[2] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? "__fentry__" : "_mcount"); if (flag_pic) { - op[4] = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op[4]), UNSPEC_PLT); - op[4] = gen_rtx_CONST (Pmode, op[4]); + op[2] = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op[2]), UNSPEC_PLT); + op[2] = gen_rtx_CONST (Pmode, op[2]); } if (flag_record_mcount) @@ -13150,20 +13142,19 @@ s390_function_profiler (FILE *file, int labelno) warning (OPT_Wcannot_profile, "nested functions cannot be profiled " "with %<-mfentry%> on s390"); else - output_asm_insn ("brasl\t0,%4", op); + output_asm_insn ("brasl\t0,%2", op); } else if (TARGET_64BIT) { if (flag_nop_mcount) - output_asm_nops ("-mnop-mcount", /* stg */ 3 + /* larl */ 3 + -/* brasl */ 3 + /* lg */ 3); + output_asm_nops ("-mnop-mcount", /* stg */ 3 + /* brasl */ 3 + +/* lg */ 3); else { output_asm_insn ("stg\t%0,%1", op); if (flag_dwarf2_cfi_asm) - output_asm_insn (".cfi_rel_offset\t%0,%7", op); - output_asm_insn ("larl\t%2,%3", op); - output_asm_insn ("brasl\t%0,%4", op); + output_asm_insn (".cfi_rel_offset\t%0,%3", op); + output_asm_insn ("brasl\t%0,%2", op); output_asm_insn ("lg\t%0,%1", op); if (flag_dwarf2_cfi_asm) output_asm_insn (".cfi_restore\t%0", op); @@ -13172,15 +13163,14 @@ s390_function_profiler (FILE *file, int labelno) 
else { if (flag_nop_mcount) - output_asm_nops ("-mnop-mcount", /* st */ 2 + /* larl */ 3 + -/* brasl */ 3 + /* l */ 2); + output_asm_nops ("-mnop-mcount", /* st */ 2 + /* brasl */ 3 + +/* l */ 2); else { output_asm_insn ("st\t%0,%1", op); if (flag_dwarf2_cfi_asm) - output_asm_insn (".cfi_rel_offset\t%0,%7", op); - output_asm_insn ("larl\t%2,%3", op); - output_asm_insn ("brasl\t%0,%4", op); + output_asm_insn (".cfi_rel_offset\t%0,%3", op); + output_asm_insn ("brasl\t%0,%2", op); output_asm_insn ("l\t%0,%1", op); if (flag_dwarf2_cfi_asm) output_asm_insn (".cfi_restore\t%0", op); diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h index 3b876160420..fb16a455a03 100644 --- a/gcc/config/s390/s390.h +++ b/gcc/config/s390/s390.h @@ -787,6 +787,8 @@ CUMULATIVE_ARGS; #define PROFILE_BEFORE_PROLOGUE 1 +#define NO_PROFILE_COUNTERS 1 + /* Trampolines
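The new -mnop-mcount padding sizes follow from the per-instruction sizes annotated in the output_asm_nops () calls above, which are given in halfwords (2 bytes each). A minimal sketch of that arithmetic, with hypothetical constant and function names that are not part of GCC:

```cpp
#include <cassert>

// Instruction lengths in halfwords, as annotated in the patch.
constexpr int kStg = 3, kLg = 3;  // 64-bit save/restore of the return register
constexpr int kSt = 2, kL = 2;    // 31-bit save/restore of the return register
constexpr int kLarl = 3;          // counter address load, removed by the patch
constexpr int kBrasl = 3;         // call to the profiler hook
constexpr int kBytesPerHalfword = 2;

// Halfwords of nops needed to cover the profiling sequence.
constexpr int nop_halfwords(bool is_64bit, bool with_larl) {
  return (is_64bit ? kStg + kLg : kSt + kL) + kBrasl + (with_larl ? kLarl : 0);
}

// The same size expressed in bytes.
int nop_bytes(bool is_64bit, bool with_larl) {
  const int hw = nop_halfwords(is_64bit, with_larl);
  assert(hw > 0);
  return hw * kBytesPerHalfword;
}

// Before the patch: 12 and 10 halfwords; after it: 9 and 7.
static_assert(nop_halfwords(true, true) == 12, "64-bit, with larl");
static_assert(nop_halfwords(true, false) == 9, "64-bit, without larl");
static_assert(nop_halfwords(false, true) == 10, "31-bit, with larl");
static_assert(nop_halfwords(false, false) == 7, "31-bit, without larl");
```

Dropping the larl therefore shrinks the padded sequence by 6 bytes in both modes, which is presumably why the two mnop-mcount tests had to be adapted.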
[PATCH] IBM Z: Define NO_PROFILE_COUNTERS
Bootstrapped and regtested on s390x-redhat-linux. Ok for master? s390 glibc does not need counters in the .data section, since it stores edge hits in its own data structure. Therefore counters only waste space and confuse diffing tools (e.g. kpatch), so don't generate them. gcc/ChangeLog: * config/s390/s390.c (s390_function_profiler): Ignore labelno parameter. * config/s390/s390.h (NO_PROFILE_COUNTERS): Define. --- gcc/config/s390/s390.c | 14 ++ gcc/config/s390/s390.h | 2 ++ 2 files changed, 4 insertions(+), 12 deletions(-) diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 6bbeb640e1f..96c9a9db53b 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -13110,17 +13110,13 @@ output_asm_nops (const char *user, int hw) } } -/* Output assembler code to FILE to increment profiler label # LABELNO - for profiling a function entry. */ +/* Output assembler code to FILE to call a profiler hook. */ void -s390_function_profiler (FILE *file, int labelno) +s390_function_profiler (FILE *file, int /* labelno */) { rtx op[8]; - char label[128]; - ASM_GENERATE_INTERNAL_LABEL (label, "LP", labelno); - fprintf (file, "# function profiler \n"); op[0] = gen_rtx_REG (Pmode, RETURN_REGNUM); @@ -13128,10 +13124,6 @@ s390_function_profiler (FILE *file, int labelno) op[1] = gen_rtx_MEM (Pmode, plus_constant (Pmode, op[1], UNITS_PER_LONG)); op[7] = GEN_INT (UNITS_PER_LONG); - op[2] = gen_rtx_REG (Pmode, 1); - op[3] = gen_rtx_SYMBOL_REF (Pmode, label); - SYMBOL_REF_FLAGS (op[3]) = SYMBOL_FLAG_LOCAL; - op[4] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? 
"__fentry__" : "_mcount"); if (flag_pic) { @@ -13162,7 +13154,6 @@ s390_function_profiler (FILE *file, int labelno) output_asm_insn ("stg\t%0,%1", op); if (flag_dwarf2_cfi_asm) output_asm_insn (".cfi_rel_offset\t%0,%7", op); - output_asm_insn ("larl\t%2,%3", op); output_asm_insn ("brasl\t%0,%4", op); output_asm_insn ("lg\t%0,%1", op); if (flag_dwarf2_cfi_asm) @@ -13179,7 +13170,6 @@ s390_function_profiler (FILE *file, int labelno) output_asm_insn ("st\t%0,%1", op); if (flag_dwarf2_cfi_asm) output_asm_insn (".cfi_rel_offset\t%0,%7", op); - output_asm_insn ("larl\t%2,%3", op); output_asm_insn ("brasl\t%0,%4", op); output_asm_insn ("l\t%0,%1", op); if (flag_dwarf2_cfi_asm) diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h index 3b876160420..fb16a455a03 100644 --- a/gcc/config/s390/s390.h +++ b/gcc/config/s390/s390.h @@ -787,6 +787,8 @@ CUMULATIVE_ARGS; #define PROFILE_BEFORE_PROLOGUE 1 +#define NO_PROFILE_COUNTERS 1 + /* Trampolines for nested functions. */ -- 2.31.1
[PATCH] IBM Z: Remove match_scratch workaround
Bootstrapped and regtested on s390x-redhat-linux. Ok for master?

Since commit dd1ef00c45ba ("Fix bug in the define_subst handling that
made match_scratch unusable for multi-alternative patterns.") the
workaround for that bug in *ashrdi3_31 is not only no longer necessary,
but actually breaks the build.

Get rid of it by using only one alternative in (match_scratch). It
will be replicated as many times as needed in order to match the
pattern with which (define_subst) is used.

gcc/ChangeLog:

	* config/s390/s390.md (*ashrdi3_31): Use a single constraint.
	* config/s390/subst.md (cconly_subst): Use a single constraint
	in (match_scratch).

gcc/testsuite/ChangeLog:

	* gcc.target/s390/ashr.c: New test.
---
 gcc/config/s390/s390.md | 14 --
 gcc/config/s390/subst.md | 2 +-
 gcc/testsuite/gcc.target/s390/ashr.c | 11 +++
 3 files changed, 16 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/ashr.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 7faf775fbf2..0c5b4dc9029 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -9328,19 +9328,13 @@
   ""
   "")
 
-; FIXME: The number of alternatives is doubled here to match the fix
-; number of 2 in the subst pattern for the (clobber (match_scratch...
-; The right fix should be to support match_scratch in the output
-; pattern of a define_subst.
 (define_insn "*ashrdi3_31"
-  [(set (match_operand:DI 0 "register_operand" "=d, d")
-        (ashiftrt:DI (match_operand:DI 1 "register_operand" "0, 0")
-                     (match_operand:QI 2 "shift_count_operand" "jsc,jsc")))
+  [(set (match_operand:DI 0 "register_operand" "=d")
+        (ashiftrt:DI (match_operand:DI 1 "register_operand" "0")
+                     (match_operand:QI 2 "shift_count_operand" "jsc")))
    (clobber (reg:CC CC_REGNUM))]
   "!TARGET_ZARCH"
-  "@
-   srda\t%0,%Y2
-   srda\t%0,%Y2"
+  "srda\t%0,%Y2"
   [(set_attr "op_type" "RS")
    (set_attr "atype" "reg")])
 
diff --git a/gcc/config/s390/subst.md b/gcc/config/s390/subst.md
index 384af11c198..3ea6fc40ba8 100644
--- a/gcc/config/s390/subst.md
+++ b/gcc/config/s390/subst.md
@@ -45,7 +45,7 @@
   "s390_match_ccmode(insn, CCSmode)"
   [(set (reg CC_REGNUM)
         (compare (match_dup 1) (const_int 0)))
-   (clobber (match_scratch:DSI 0 "=d,d"))])
+   (clobber (match_scratch:DSI 0 "=d"))])
 
 (define_subst_attr "cconly" "cconly_subst" "" "_cconly")
 
diff --git a/gcc/testsuite/gcc.target/s390/ashr.c b/gcc/testsuite/gcc.target/s390/ashr.c
new file mode 100644
index 000..8cffdfa9a1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/ashr.c
@@ -0,0 +1,11 @@
+/* Test the arithmetic shift right pattern. */
+
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int e(void);
+
+int f (long c, int b)
+{
+  return (c >> b) && e ();
+}
-- 
2.31.1
Re: [PATCH v2] IBM Z: Handle hard registers in s390_md_asm_adjust()
On Fri, 2021-04-30 at 08:49 +0200, Andreas Krebbel wrote:
> On 4/28/21 3:48 AM, Ilya Leoshkevich wrote:
> > Bootstrapped and regtested on s390x-redhat-linux. Tested with valgrind
> > too (PR 100278 is now fixed). Ok for master?
> >
> > v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568771.html
> > v1 -> v2: Use the UNSPEC pattern, which is less efficient, but is more
> > on the "obviously correct" side than gen_raw_SUBREG().
> >
> > gen_fprx2_to_tf() and gen_tf_to_fprx2() cannot handle hard registers,
> > since the subregs they create do not pass validation. Change
> > s390_md_asm_adjust() to manually copy between hard VRs and FPRs instead
> > of using these two functions.
> >
> > gcc/ChangeLog:
> >
> > 	PR target/100217
> > 	* config/s390/s390.c (s390_hard_fp_reg_p): New function.
> > 	(s390_md_asm_adjust): Handle hard registers.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	PR target/100217
> > 	* gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c: New test.
> > 	* gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c: New test.
>
> Ok. Thanks!
>
> Andreas

Thanks! I forgot to ask: ok for gcc-11 branch?
[PATCH v2] IBM Z: Handle hard registers in s390_md_asm_adjust()
Bootstrapped and regtested on s390x-redhat-linux. Tested with valgrind too (PR 100278 is now fixed). Ok for master? v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568771.html v1 -> v2: Use the UNSPEC pattern, which is less efficient, but is more on the "obviously correct" side than gen_raw_SUBREG(). gen_fprx2_to_tf() and gen_tf_to_fprx2() cannot handle hard registers, since the subregs they create do not pass validation. Change s390_md_asm_adjust() to manually copy between hard VRs and FPRs instead of using these two functions. gcc/ChangeLog: PR target/100217 * config/s390/s390.c (s390_hard_fp_reg_p): New function. (s390_md_asm_adjust): Handle hard registers. gcc/testsuite/ChangeLog: PR target/100217 * gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c: New test. * gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c: New test. --- gcc/config/s390/s390.c| 52 +-- .../long-double-asm-in-out-hard-fp-reg.c | 33 .../long-double-asm-inout-hard-fp-reg.c | 31 +++ 3 files changed, 112 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index a9c945c5ee9..88361f98c7e 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -16754,6 +16754,23 @@ f_constraint_p (const char *constraint) return seen_f_p && !seen_v_p; } +/* Return TRUE iff X is a hard floating-point (and not a vector) register. 
*/ + +static bool +s390_hard_fp_reg_p (rtx x) +{ + if (!(REG_P (x) && HARD_REGISTER_P (x) && REG_ATTRS (x))) +return false; + + tree decl = REG_EXPR (x); + if (!(HAS_DECL_ASSEMBLER_NAME_P (decl) && DECL_ASSEMBLER_NAME_SET_P (decl))) +return false; + + const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)); + + return name[0] == '*' && name[1] == 'f'; +} + /* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f" constraints when long doubles are stored in vector registers. */ @@ -16787,9 +16804,24 @@ s390_md_asm_adjust (vec &outputs, vec &inputs, gcc_assert (allows_reg); gcc_assert (!is_inout); /* Copy output value from a FPR pair into a vector register. */ - rtx fprx2 = gen_reg_rtx (FPRX2mode); + rtx fprx2; push_to_sequence2 (after_md_seq, after_md_end); - emit_insn (gen_fprx2_to_tf (outputs[i], fprx2)); + if (s390_hard_fp_reg_p (outputs[i])) + { + fprx2 = gen_rtx_REG (FPRX2mode, REGNO (outputs[i])); + /* The first half is already at the correct location, copy only the + * second one. Use the UNSPEC pattern instead of the SUBREG one, + * since s390_can_change_mode_class() rejects + * (subreg:DF (reg:TF %fN) 8) and thus subreg validation fails. */ + rtx v1 = gen_rtx_REG (V2DFmode, REGNO (outputs[i])); + rtx v3 = gen_rtx_REG (V2DFmode, REGNO (outputs[i]) + 1); + emit_insn (gen_vec_permiv2df (v1, v1, v3, const0_rtx)); + } + else + { + fprx2 = gen_reg_rtx (FPRX2mode); + emit_insn (gen_fprx2_to_tf (outputs[i], fprx2)); + } after_md_seq = get_insns (); after_md_end = get_last_insn (); end_sequence (); @@ -16813,8 +16845,20 @@ s390_md_asm_adjust (vec &outputs, vec &inputs, continue; gcc_assert (allows_reg); /* Copy input value from a vector register into a FPR pair. */ - rtx fprx2 = gen_reg_rtx (FPRX2mode); - emit_insn (gen_tf_to_fprx2 (fprx2, inputs[i])); + rtx fprx2; + if (s390_hard_fp_reg_p (inputs[i])) + { + fprx2 = gen_rtx_REG (FPRX2mode, REGNO (inputs[i])); + /* Copy only the second half. 
*/ + rtx v1 = gen_rtx_REG (V2DFmode, REGNO (inputs[i]) + 1); + rtx v2 = gen_rtx_REG (V2DFmode, REGNO (inputs[i])); + emit_insn (gen_vec_permiv2df (v1, v2, v1, GEN_INT (3))); + } + else + { + fprx2 = gen_reg_rtx (FPRX2mode); + emit_insn (gen_tf_to_fprx2 (fprx2, inputs[i])); + } inputs[i] = fprx2; input_modes[i] = FPRX2mode; } diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c b/gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c new file mode 100644 index 000..2dcaf08f00b --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c @@ -0,0 +1,33 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -march=z14 -mzarch --save-temps" } */ +/* { dg-do run { target { s390_z14_hw } } } */ +#include +#include + +__attribute__ ((noipa)) static long double +sqxbr (long double x) +{ + register long double in asm("f0") = x; + register long double out asm("f1"); + + asm("sqxbr\t%0,%1" :
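Two pieces of the v2 logic can be modelled in isolation. The sketch below is not GCC code and rests on labelled assumptions: `is_fpr_asm_name`, `permi`, `fpr_pair_to_vr` and `vr_to_second_fpr` are hypothetical names; the leading '*' is assumed to mark a user-specified assembler name such as the one produced by `asm("f0")`; an FPR is assumed to alias element 0 of its vector register, as the "first half is already at the correct location" comment implies; and the selector semantics of the permute are inferred from the constants 0 and 3 used in the patch.

```cpp
#include <array>
#include <cassert>

// Mirrors the final test in s390_hard_fp_reg_p (): a leading '*' followed
// by 'f'.  name[1] is read only when name[0] is '*', so "" is safe.
bool is_fpr_asm_name(const char *name) {
  assert(name != nullptr);
  return name[0] == '*' && name[1] == 'f';
}

using V2DF = std::array<double, 2>;

// Assumed model of the two-element permute: bit 1 of SEL picks the element
// taken from A (result element 0), bit 0 picks the element taken from B
// (result element 1).
V2DF permi(const V2DF &a, const V2DF &b, int sel) {
  assert(sel >= 0 && sel <= 3);
  return {a[(sel >> 1) & 1], b[sel & 1]};
}

// Output direction (selector 0): the high half already sits in element 0
// of FIRST; only the low half is fetched from element 0 of SECOND.
V2DF fpr_pair_to_vr(const V2DF &first, const V2DF &second) {
  return permi(first, second, 0);
}

// Input direction (selector 3): returns the new contents of SECOND, whose
// element 0 receives the low half stored in element 1 of VR.
V2DF vr_to_second_fpr(const V2DF &vr, const V2DF &second) {
  return permi(vr, second, 3);
}
```

Under these assumptions each direction needs a single permute, because one half of the long double is already where it has to be.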
[PATCH] IBM Z: Handle hard registers in s390_md_asm_adjust()
Bootstrapped and regtested on s390x-redhat-linux. Tested with valgrind on top of 52a5515ed (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100278). Ok for master? gen_fprx2_to_tf() and gen_tf_to_fprx2() cannot handle hard registers, since the subregs they create do not pass validation. Change s390_md_asm_adjust() to manually copy between hard VRs and FPRs instead of using these two functions. gcc/ChangeLog: PR target/100217 * config/s390/s390.c (s390_hard_fp_reg_p): New function. (s390_md_asm_adjust): Handle hard registers. * config/s390/vector.md (*df_to_tf_1): New pattern. gcc/testsuite/ChangeLog: PR target/100217 * gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c: New test. * gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c: New test. --- gcc/config/s390/s390.c| 50 +-- gcc/config/s390/vector.md | 8 +++ .../long-double-asm-in-out-hard-fp-reg.c | 28 +++ .../long-double-asm-inout-hard-fp-reg.c | 27 ++ 4 files changed, 109 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index a9c945c5ee9..ed6cea9b1f7 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -16754,6 +16754,23 @@ f_constraint_p (const char *constraint) return seen_f_p && !seen_v_p; } +/* Return TRUE iff X is a hard floating-point (and not a vector) register. */ + +static bool +s390_hard_fp_reg_p (rtx x) +{ + if (!(REG_P (x) && HARD_REGISTER_P (x) && REG_ATTRS (x))) +return false; + + tree decl = REG_EXPR (x); + if (!(HAS_DECL_ASSEMBLER_NAME_P (decl) && DECL_ASSEMBLER_NAME_SET_P (decl))) +return false; + + const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)); + + return name[0] == '*' && name[1] == 'f'; +} + /* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f" constraints when long doubles are stored in vector registers. 
*/ @@ -16787,9 +16804,23 @@ s390_md_asm_adjust (vec &outputs, vec &inputs, gcc_assert (allows_reg); gcc_assert (!is_inout); /* Copy output value from a FPR pair into a vector register. */ - rtx fprx2 = gen_reg_rtx (FPRX2mode); + rtx fprx2; push_to_sequence2 (after_md_seq, after_md_end); - emit_insn (gen_fprx2_to_tf (outputs[i], fprx2)); + if (s390_hard_fp_reg_p (outputs[i])) + { + fprx2 = gen_rtx_REG (FPRX2mode, REGNO (outputs[i])); + /* The first half is already at the correct location, copy only the + * second one. Use gen_rtx_raw_SUBREG() in order to skip subreg + * validation - we need to build (subreg:DF (reg:TF %fN) 8), which + * will otherwise be rejected by s390_can_change_mode_class(). */ + emit_move_insn (gen_rtx_raw_SUBREG (DFmode, outputs[i], 8), + simplify_gen_subreg (DFmode, fprx2, FPRX2mode, 8)); + } + else + { + fprx2 = gen_reg_rtx (FPRX2mode); + emit_insn (gen_fprx2_to_tf (outputs[i], fprx2)); + } after_md_seq = get_insns (); after_md_end = get_last_insn (); end_sequence (); @@ -16813,8 +16844,19 @@ s390_md_asm_adjust (vec &outputs, vec &inputs, continue; gcc_assert (allows_reg); /* Copy input value from a vector register into a FPR pair. */ - rtx fprx2 = gen_reg_rtx (FPRX2mode); - emit_insn (gen_tf_to_fprx2 (fprx2, inputs[i])); + rtx fprx2; + if (s390_hard_fp_reg_p (inputs[i])) + { + fprx2 = gen_rtx_REG (FPRX2mode, REGNO (inputs[i])); + /* Copy only the second half. 
*/ + emit_move_insn (gen_rtx_raw_SUBREG (DFmode, fprx2, 8), + gen_rtx_raw_SUBREG (DFmode, inputs[i], 8)); + } + else + { + fprx2 = gen_reg_rtx (FPRX2mode); + emit_insn (gen_tf_to_fprx2 (fprx2, inputs[i])); + } inputs[i] = fprx2; input_modes[i] = FPRX2mode; } diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md index c80d582a300..648e00625e1 100644 --- a/gcc/config/s390/vector.md +++ b/gcc/config/s390/vector.md @@ -634,6 +634,14 @@ } [(set_attr "op_type" "VRR,*")]) +(define_insn "*df_to_tf_1" + [(set (subreg:DF (match_operand:TF 0 "nonimmediate_operand" "+v") 8) + (match_operand:DF1 "general_operand" "f"))] + "TARGET_VXE" + ; M4 == 0 corresponds to %v0[0] = %v0[0]; %v0[1] = %v1[0]; + "vpdi\t%v0,%v0,%v1,0" + [(set_attr "op_type" "VRR")]) + (define_insn "*vec_ti_to_v1ti" [(set (match_operand:V1TI 0 "nonimmediate_operand" "=v,v,R, v, v,v") (vec_duplicate:V1TI (match_operand:TI 1 "general_operand" "v,R,v,j00,jm1,d")))] diff --git a/gcc/testsuite/gcc.tar
Re: [PATCH v3] fwprop: Fix single_use_p calculation
On Tue, 2021-03-23 at 12:48 +, Richard Sandiford wrote:
> Ilya Leoshkevich writes:
> > +inline use_info *
> > +set_info::single_nondebug_use () const
> > +{
> > +  use_info *nondebug_insn = single_nondebug_insn_use ();
> > +  if (nondebug_insn)
> > +    return has_phi_uses () ? nullptr : nondebug_insn;
> > +  use_info *phi = single_phi_use ();
> > +  if (phi)
> > +    return has_nondebug_insn_uses() ? nullptr : phi;
> > +  return nullptr;
>
> Very minor, but I think this is simpler as:
>
>   if (!has_phi_uses ())
>     return single_nondebug_insn_use ();
>   if (!has_nondebug_insn_uses ())
>     return single_phi_use ();
>   return nullptr;
>
> OK with that change (or without if you prefer the original).
> Thanks for the fix and for your patience. :-)
>
> Richard

Retested with the change above and pushed as:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b61461ac7f9bdd0e98145be79423d19b933afaa0

Thanks for all the suggestions!

Best regards,
Ilya
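That the two formulations of single_nondebug_use () agree can be checked on a toy model. The following is a minimal sketch under stated assumptions, not the rtl-ssa implementation: `Use`, `SetModel` and `variants_agree` are hypothetical, and the real set_info keeps its uses in a different data structure. Only the control flow of the two bodies is taken from the message above.

```cpp
#include <cassert>
#include <vector>

struct Use {
  bool in_phi;
};

// Toy stand-in for set_info: two plain lists of uses.
class SetModel {
public:
  SetModel(int n_insn, int n_phi)
      : insn_uses_(n_insn, Use{false}), phi_uses_(n_phi, Use{true}) {}

  bool has_nondebug_insn_uses() const { return !insn_uses_.empty(); }
  bool has_phi_uses() const { return !phi_uses_.empty(); }

  const Use *single_nondebug_insn_use() const {
    return insn_uses_.size() == 1 ? &insn_uses_[0] : nullptr;
  }
  const Use *single_phi_use() const {
    return phi_uses_.size() == 1 ? &phi_uses_[0] : nullptr;
  }

  // Control flow of the body as posted in v3.
  const Use *single_nondebug_use_v3() const {
    const Use *nondebug_insn = single_nondebug_insn_use();
    if (nondebug_insn)
      return has_phi_uses() ? nullptr : nondebug_insn;
    const Use *phi = single_phi_use();
    if (phi)
      return has_nondebug_insn_uses() ? nullptr : phi;
    return nullptr;
  }

  // Control flow of the simplification suggested in review.
  const Use *single_nondebug_use_simplified() const {
    if (!has_phi_uses())
      return single_nondebug_insn_use();
    if (!has_nondebug_insn_uses())
      return single_phi_use();
    return nullptr;
  }

private:
  std::vector<Use> insn_uses_;
  std::vector<Use> phi_uses_;
};

// True if both variants return the same use for the given use counts.
bool variants_agree(int n_insn, int n_phi) {
  assert(n_insn >= 0 && n_phi >= 0);
  SetModel s(n_insn, n_phi);
  return s.single_nondebug_use_v3() == s.single_nondebug_use_simplified();
}
```

In this model both variants return a use exactly when the total number of nondebug uses, instruction uses plus phi uses, is one.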
[PATCH v3] fwprop: Fix single_use_p calculation
Bootstrap and regtest running on x86_64-redhat-linux,
ppc64le-redhat-linux and s390x-redhat-linux. Ok for master?

v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566127.html
v1 -> v2: Pass a set_info instead of a def_info around. Add
          single_nondebug_insn_use () - maybe this could be improved
          further? [1] Simplify def->insn ()->ebb (). Improve
          formatting.
v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567121.html
v2 -> v3: Introduce single_nondebug_use and single_phi_use methods.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567118.html

---

Commit efb6bc55a93a ("fwprop: Allow (subreg (mem)) simplifications")
introduced a check that was supposed to look at the propagated def's
number of uses. It uses insn_info::num_uses (), which in reality
returns the number of uses def's insn has. The whole change therefore
works only by accident.

Fix by looking at set_info's uses instead of insn_info's uses. This
requires passing around set_info instead of insn_info.

gcc/ChangeLog:

2021-03-02 Ilya Leoshkevich

	* fwprop.c (fwprop_propagation::fwprop_propagation): Look at
	set_info's uses.
	(try_fwprop_subst_note): Use set_info instead of insn_info.
	(try_fwprop_subst_pattern): Likewise.
	(try_fwprop_subst_notes): Likewise.
	(try_fwprop_subst): Likewise.
	(forward_propagate_subreg): Likewise.
	(forward_propagate_and_simplify): Likewise.
	(forward_propagate_into): Likewise.
	* rtl-ssa/accesses.h (set_info::single_nondebug_use): New method.
	(set_info::single_nondebug_insn_use): Likewise.
	(set_info::single_phi_use): Likewise.
	* rtl-ssa/member-fns.inl (set_info::single_nondebug_use): New method.
	(set_info::single_nondebug_insn_use): Likewise.
	(set_info::single_phi_use): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/vector/long-double-asm-abi.c: New test.
--- gcc/fwprop.c | 81 +-- gcc/rtl-ssa/accesses.h| 13 +++ gcc/rtl-ssa/member-fns.inl| 30 +++ .../s390/vector/long-double-asm-abi.c | 26 ++ 4 files changed, 109 insertions(+), 41 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-abi.c diff --git a/gcc/fwprop.c b/gcc/fwprop.c index 4b8a554e823..d7203672886 100644 --- a/gcc/fwprop.c +++ b/gcc/fwprop.c @@ -175,7 +175,7 @@ namespace static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1; static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2; -fwprop_propagation (insn_info *, insn_info *, rtx, rtx); +fwprop_propagation (insn_info *, set_info *, rtx, rtx); bool changed_mem_p () const { return result_flags & CHANGED_MEM; } bool folded_to_constants_p () const; @@ -191,13 +191,13 @@ namespace }; } -/* Prepare to replace FROM with TO in INSN. */ +/* Prepare to replace FROM with TO in USE_INSN. */ fwprop_propagation::fwprop_propagation (insn_info *use_insn, - insn_info *def_insn, rtx from, rtx to) + set_info *def, rtx from, rtx to) : insn_propagation (use_insn->rtl (), from, to), -single_use_p (def_insn->num_uses () == 1), -single_ebb_p (use_insn->ebb () == def_insn->ebb ()) +single_use_p (def->single_nondebug_use ()), +single_ebb_p (use_insn->ebb () == def->ebb ()) { should_check_mems = true; should_note_simplifications = true; @@ -368,24 +368,25 @@ contains_paradoxical_subreg_p (rtx x) return false; } -/* Try to substitute (set DEST SRC) from DEF_INSN into note NOTE of USE_INSN. - Return the number of substitutions on success, otherwise return -1 and - leave USE_INSN unchanged. +/* Try to substitute (set DEST SRC), which defines DEF, into note NOTE of + USE_INSN. Return the number of substitutions on success, otherwise return + -1 and leave USE_INSN unchanged. 
- If REQUIRE_CONSTANT is true, require all substituted occurences of SRC + If REQUIRE_CONSTANT is true, require all substituted occurrences of SRC to fold to a constant, so that the note does not use any more registers than it did previously. If REQUIRE_CONSTANT is false, also allow the substitution if it's something we'd normally allow for the main instruction pattern. */ static int -try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn, +try_fwprop_subst_note (insn_info *use_insn, set_info *def, rtx note, rtx dest, rtx src, bool require_constant) { rtx_insn *use_rtl = use_insn->rtl (); + insn_info *def_insn = def->insn (); insn_change_watermark watermark; - fwprop_propagation prop (use_insn, def_insn, dest, src); + fwprop_propagation prop (use_insn, def, dest, src); if (!prop.apply_to_rvalue (&XEXP (note, 0))) { if (dump_file && (dump_flags & TDF_DETAILS)) @@ -
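The difference between the old and the new single_use_p computation can be illustrated with a toy model. This is only a sketch of the commit message's description, not fwprop code: `DefModel`, `single_use_p_old` and `single_use_p_new` are hypothetical names, and "number of uses def's insn has" is modelled as the number of values the defining instruction itself reads.

```cpp
#include <cassert>
#include <cstddef>

// Toy description of one definition and its defining instruction.
struct DefModel {
  std::size_t insn_input_count;    // values read by the defining insn;
                                   // what the old check looked at
  std::size_t nondebug_insn_uses;  // insns reading the defined value
  std::size_t phi_uses;            // phi nodes reading the defined value
};

// Old behaviour as described in the commit message: counts the wrong thing.
bool single_use_p_old(const DefModel &d) {
  return d.insn_input_count == 1;
}

// New behaviour: exactly one nondebug use of the defined value.
bool single_use_p_new(const DefModel &d) {
  return d.nondebug_insn_uses + d.phi_uses == 1;
}
```

For a definition such as "r3 = r1 + r2" whose result is read once, the old check reports false and the new one true; for "r3 = r1" read three times the answers are reversed. The two agree only when the defining instruction happens to read exactly one value and its result happens to be used once, which matches the "works only by accident" remark.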
Re: [PATCH] fwprop: Fix single_use_p calculation
On Mon, 2021-03-22 at 22:55 +, Richard Sandiford wrote:
> Ilya Leoshkevich writes:
> > On Mon, 2021-03-22 at 18:23 +, Richard Sandiford wrote:
> > > Ilya Leoshkevich writes:
> >
> > [...]
> >
> > > > Do you still want me to add single_nondebug_use() for completeness
> > > > in this patch, or would it be better to add it later when it's
> > > > actually needed?
> > >
> > > I was thinking that the fwprop.c code would use
> > > def->single_nondebug_use () instead of
> > > def->single_nondebug_insn_use () && !def->has_phi_uses ().
> >
> > But these two are not equivalent, are they? single_nondebug_use()
> > that you proposed explicitly allows phis:
> >
> >   // If there is exactly one nondebug use of the set's result,
> >   // return that use, otherwise return null. The use might be in
> >   // instruction or a phi node.
> >   use_info *single_nondebug_use () const;
> >
> > but I don't think we want to propagate into phis here.
> > Or should the check be a bit bigger, like the following?
>
> But we're in the process of substituting the definition into an
> insn use. So we know that an insn use exists. I think the
> question we're trying to answer is: is this insn use the only
> nondebug use? I'd rather test that with a single accessor rather
> than break it down into individual data structure tests.

Ah, you are absolutely right - now I get it. Please ignore the v2 then,
I will send a v3.
[PATCH] fwprop: Fix single_use_p calculation
Bootstrapped and regtested on x86_64-redhat-linux,
ppc64le-redhat-linux and s390x-redhat-linux. Ok for master?

v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566127.html
v1 -> v2: Pass a set_info instead of a def_info around. Add
          single_nondebug_insn_use () - maybe this could be improved
          further? [1] Simplify def->insn ()->ebb (). Improve
          formatting.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567118.html

---

Commit efb6bc55a93a ("fwprop: Allow (subreg (mem)) simplifications")
introduced a check that was supposed to look at the propagated def's
number of uses. It uses insn_info::num_uses (), which in reality
returns the number of uses def's insn has. The whole change therefore
works only by accident.

Fix by looking at set_info's uses instead of insn_info's uses. This
requires passing around set_info instead of insn_info.

gcc/ChangeLog:

2021-03-02 Ilya Leoshkevich

	* fwprop.c (fwprop_propagation::fwprop_propagation): Look at
	set_info's uses.
	(try_fwprop_subst_note): Use set_info instead of insn_info.
	(try_fwprop_subst_pattern): Likewise.
	(try_fwprop_subst_notes): Likewise.
	(try_fwprop_subst): Likewise.
	(forward_propagate_subreg): Likewise.
	(forward_propagate_and_simplify): Likewise.
	(forward_propagate_into): Likewise.
	* rtl-ssa/accesses.h (set_info::single_nondebug_insn_use): New
	method.
	* rtl-ssa/member-fns.inl (set_info::single_nondebug_insn_use):
	Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/vector/long-double-asm-abi.c: New test.
--- gcc/fwprop.c | 79 +-- gcc/rtl-ssa/accesses.h| 4 + gcc/rtl-ssa/member-fns.inl| 9 +++ .../s390/vector/long-double-asm-abi.c | 26 ++ 4 files changed, 78 insertions(+), 40 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-abi.c diff --git a/gcc/fwprop.c b/gcc/fwprop.c index 4b8a554e823..6173c9248eb 100644 --- a/gcc/fwprop.c +++ b/gcc/fwprop.c @@ -175,7 +175,7 @@ namespace static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1; static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2; -fwprop_propagation (insn_info *, insn_info *, rtx, rtx); +fwprop_propagation (insn_info *, set_info *, rtx, rtx); bool changed_mem_p () const { return result_flags & CHANGED_MEM; } bool folded_to_constants_p () const; @@ -191,13 +191,13 @@ namespace }; } -/* Prepare to replace FROM with TO in INSN. */ +/* Prepare to replace FROM with TO in USE_INSN. */ fwprop_propagation::fwprop_propagation (insn_info *use_insn, - insn_info *def_insn, rtx from, rtx to) + set_info *def, rtx from, rtx to) : insn_propagation (use_insn->rtl (), from, to), -single_use_p (def_insn->num_uses () == 1), -single_ebb_p (use_insn->ebb () == def_insn->ebb ()) +single_use_p (def->single_nondebug_insn_use () && !def->has_phi_uses ()), +single_ebb_p (use_insn->ebb () == def->ebb ()) { should_check_mems = true; should_note_simplifications = true; @@ -368,9 +368,9 @@ contains_paradoxical_subreg_p (rtx x) return false; } -/* Try to substitute (set DEST SRC) from DEF_INSN into note NOTE of USE_INSN. - Return the number of substitutions on success, otherwise return -1 and - leave USE_INSN unchanged. +/* Try to substitute (set DEST SRC), which defines DEF, into note NOTE of + USE_INSN. Return the number of substitutions on success, otherwise return + -1 and leave USE_INSN unchanged. 
If REQUIRE_CONSTANT is true, require all substituted occurences of SRC to fold to a constant, so that the note does not use any more registers @@ -379,13 +379,14 @@ contains_paradoxical_subreg_p (rtx x) instruction pattern. */ static int -try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn, +try_fwprop_subst_note (insn_info *use_insn, set_info *def, rtx note, rtx dest, rtx src, bool require_constant) { rtx_insn *use_rtl = use_insn->rtl (); + insn_info *def_insn = def->insn (); insn_change_watermark watermark; - fwprop_propagation prop (use_insn, def_insn, dest, src); + fwprop_propagation prop (use_insn, def, dest, src); if (!prop.apply_to_rvalue (&XEXP (note, 0))) { if (dump_file && (dump_flags & TDF_DETAILS)) @@ -436,19 +437,20 @@ try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn, return prop.num_replacements; } -/* Try to substitute (set DEST SRC) from DEF_INSN into location LOC of +/* Try to substitute (set DEST SRC), which defines DEF, into location LOC of USE_INSN's pattern. Return true on success, otherwise leave USE_INSN unchanged. */ static bool try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change, -
Re: [PATCH] fwprop: Fix single_use_p calculation
On Mon, 2021-03-22 at 18:23 +, Richard Sandiford wrote:
> Ilya Leoshkevich writes:

[...]

> > Do you still want me to add single_nondebug_use() for completeness
> > in this patch, or would it be better to add it later when it's
> > actually needed?
>
> I was thinking that the fwprop.c code would use
> def->single_nondebug_use () instead of
> def->single_nondebug_insn_use () && !def->has_phi_uses ().

But these two are not equivalent, are they? single_nondebug_use() that
you proposed explicitly allows phis:

    // If there is exactly one nondebug use of the set's result,
    // return that use, otherwise return null.  The use might be in
    // instruction or a phi node.
    use_info *single_nondebug_use () const;

but I don't think we want to propagate into phis here. Or should the
check be a bit bigger, like the following?

    use_info *single = def->single_nondebug_use ();
    single_use_p = single && !single->is_in_phi ();

[...]

Best regards,
Ilya
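To make the question above concrete, here is a minimal, self-contained sketch of the three candidate checks. It is only a toy model: `UseKind`, `ToySet` and the three functions are hypothetical stand-ins invented for illustration, not the rtl-ssa classes, and the model assumes that a phi use counts as a nondebug use.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the kinds of uses a definition can have.
enum class UseKind { Insn, DebugInsn, Phi };

// Hypothetical stand-in for a set_info: just the list of its uses.
struct ToySet {
  std::vector<UseKind> uses;
};

// Models "single_nondebug_use () != nullptr": exactly one nondebug use,
// which may be in an instruction or in a phi node.
bool single_nondebug_use(const ToySet &s) {
  int nondebug = 0;
  for (UseKind k : s.uses)
    if (k != UseKind::DebugInsn)
      ++nondebug;
  return nondebug == 1;
}

// Models "single_nondebug_insn_use () && !has_phi_uses ()".
bool single_insn_use_and_no_phis(const ToySet &s) {
  int insn_uses = 0;
  bool has_phi = false;
  for (UseKind k : s.uses) {
    if (k == UseKind::Phi)
      has_phi = true;
    else if (k == UseKind::Insn)
      ++insn_uses;
  }
  return insn_uses == 1 && !has_phi;
}

// Models "single = single_nondebug_use (); single && !single->is_in_phi ()".
bool single_use_not_in_phi(const ToySet &s) {
  int nondebug = 0;
  UseKind last = UseKind::DebugInsn;
  for (UseKind k : s.uses) {
    if (k == UseKind::DebugInsn)
      continue;
    ++nondebug;
    last = k;
  }
  return nondebug == 1 && last != UseKind::Phi;
}
```

In this toy model the plain `single_nondebug_use` check accepts a definition whose only use is a phi, while the other two checks reject it and agree with each other on the cases tried. Whether that equivalence carries over to the real rtl-ssa data structures is not something this sketch can establish.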
Re: [PATCH] fwprop: Fix single_use_p calculation
On Sun, 2021-03-21 at 13:19 +, Richard Sandiford wrote: > Ilya Leoshkevich writes: > > Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat- > > linux > > and s390x-redhat-linux. Ok for master? > > Given what was said downthread, I agree we should fix this for GCC > 11. > Sorry for missing this problem in the initial review. > > > Commit efb6bc55a93a ("fwprop: Allow (subreg (mem)) > > simplifications") > > introduced a check that was supposed to look at the propagated > > def's > > number of uses. It uses insn_info::num_uses (), which in reality > > returns the number of uses def's insn has. The whole change > > therefore > > works only by accident. > > > > Fix by looking at def_info's uses instead of insn_info's uses. > > This > > requires passing around def_info instead of insn_info. > > > > gcc/ChangeLog: > > > > 2021-03-02 Ilya Leoshkevich > > > > * fwprop.c (def_has_single_use_p): New function. > > (fwprop_propagation::fwprop_propagation): Look at > > def_info's uses. > > (try_fwprop_subst_note): Use def_info instead of insn_info. > > (try_fwprop_subst_pattern): Likewise. > > (try_fwprop_subst_notes): Likewise. > > (try_fwprop_subst): Likewise. > > (forward_propagate_subreg): Likewise. > > (forward_propagate_and_simplify): Likewise. > > (forward_propagate_into): Likewise. > > * iterator-utils.h (single_element_p): New function. 
> > --- > > gcc/fwprop.c | 89 ++-- > > > > gcc/iterator-utils.h | 10 + > > 2 files changed, 62 insertions(+), 37 deletions(-) > > > > diff --git a/gcc/fwprop.c b/gcc/fwprop.c > > index 4b8a554e823..478dcdd96cc 100644 > > --- a/gcc/fwprop.c > > +++ b/gcc/fwprop.c > > @@ -175,7 +175,7 @@ namespace > > static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1; > > static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2; > > > > - fwprop_propagation (insn_info *, insn_info *, rtx, rtx); > > + fwprop_propagation (insn_info *, def_info *, rtx, rtx); > > use->def () returns a set_info *, and since you want set_info stuff, > I think it would probably be better to pass around a set_info * > instead. > (Let's keep the variable names the same though. “def” is still > accurate > and IMO the natural choice.) > > > @@ -191,13 +191,27 @@ namespace > > }; > > } > > > > -/* Prepare to replace FROM with TO in INSN. */ > > +/* Return true if DEF has a single non-debug non-phi use. */ > > + > > +static bool > > +def_has_single_use_p (def_info *def) > > +{ > > + if (!is_a (def)) > > + return false; > > + > > + set_info *set = as_a (def); > > + > > + return single_element_p (set->nondebug_insn_uses ()) > > + && !set->has_phi_uses (); > > I think instead we should add: > > // If exactly one nondebug instruction uses the set's result, > return > // the use by that instruction, otherwise return null. > use_info *single_nondebug_insn_use () const; > > // If there is exactly one nondebug use of the set's result, > // return that use, otherwise return null. The use might be in > // instruction or a phi node. > use_info *single_nondebug_use () const; > > before the declaration of set_info::is_local_to_ebb. > > > +} > > + > > +/* Prepare to replace FROM with TO in USE_INSN. 
*/ > > > > fwprop_propagation::fwprop_propagation (insn_info *use_insn, > > - insn_info *def_insn, rtx > > from, rtx to) > > + def_info *def, rtx from, > > rtx to) > > : insn_propagation (use_insn->rtl (), from, to), > > - single_use_p (def_insn->num_uses () == 1), > > - single_ebb_p (use_insn->ebb () == def_insn->ebb ()) > > + single_use_p (def_has_single_use_p (def)), > > + single_ebb_p (use_insn->ebb () == def->insn ()->ebb ()) > > Just def->ebb () > > > @@ -538,7 +554,7 @@ try_fwprop_subst_pattern (obstack_watermark > > &attempt, insn_change &use_change, > > { > > if ((REG_NOTE_KIND (note) == REG_EQUAL > > || REG_NOTE_KIND (note) == REG_EQUIV) > > - && try_fwprop_subst_note (use_insn, def_insn, note, > > + && try_fwprop_subst_note (use_insn, def, note, > > dest, src, false) < 0) > > Very minor, sorry, but this now fits on one line. > > > @@ -584,10 +600,11 @@ try_fwprop_subst_notes (insn_info *use_insn, > > insn_info *def_insn, > > Return true on success, otherwise leave USE_INSN unchanged. */ > > > > static bool > > -try_fwprop_subst (use_info *use, insn_info *def_insn, > > +try_fwprop_subst (use_info *use, def_info *def, > > rtx *loc, rtx dest, rtx src) > > Same here. > > Thanks, > Richard Thanks for reviewing! I'm currently regtesting a v2. One thing though: I don't think we need single_nondebug_use() for this fix, only single_nondebug_insn_use() - the fwprop check that I'm now using is def->si
[PATCH] IBM Z: Fix "+fvm" constraint with long doubles
Bootstrapped and regtested on s390x-redhat-linux. Ok for master? When a long double is passed to an asm statement with a "+fvm" constraint, a LRA loop occurs. This happens, because LRA chooses the widest register class in this case (VEC_REGS), but the code generated by s390_md_asm_adjust() always wants FP_REGS. Mismatching register classes cause infinite reloading. Fix by treating "fv" constraints as "v" in s390_md_asm_adjust(). gcc/ChangeLog: * config/s390/s390.c (f_constraint_p): Treat "fv" constraints as "v". gcc/testsuite/ChangeLog: * gcc.target/s390/vector/long-double-asm-fprvrmem.c: New test. --- gcc/config/s390/s390.c | 12 ++-- .../s390/vector/long-double-asm-fprvrmem.c | 11 +++ 2 files changed, 21 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 151136bedbc..f7b1c03561e 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -16714,13 +16714,21 @@ s390_shift_truncation_mask (machine_mode mode) static bool f_constraint_p (const char *constraint) { + bool seen_f_p = false; + bool seen_v_p = false; + for (size_t i = 0, c_len = strlen (constraint); i < c_len; i += CONSTRAINT_LEN (constraint[i], constraint + i)) { if (constraint[i] == 'f') - return true; + seen_f_p = true; + if (constraint[i] == 'v') + seen_v_p = true; } - return false; + + /* Treat "fv" constraints as "v", because LRA will choose the widest register + * class. 
*/ + return seen_f_p && !seen_v_p; } /* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f" diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c b/gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c new file mode 100644 index 000..f95656c5723 --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -march=z14 -mzarch" } */ + +long double +foo (long double x) +{ + x = x * x; + asm("# %0" : "+fvm"(x)); + x = x + x; + return x; +} -- 2.29.2
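The behavioural change in f_constraint_p () can be illustrated outside of GCC with a small sketch. It is a simplification under a stated assumption: every constraint letter is treated as one character long, whereas the real function advances by CONSTRAINT_LEN. The names `f_constraint_before` and `f_constraint_after` are hypothetical and exist only for this comparison.

```cpp
#include <cassert>
#include <cstring>

// Behaviour before the patch: any 'f' in the constraint string counts.
// Assumption: single-character constraints only (no CONSTRAINT_LEN).
static bool f_constraint_before(const char *constraint) {
  return std::strchr(constraint, 'f') != nullptr;
}

// Behaviour after the patch: an 'f' counts only if no 'v' is present,
// so "fv" combinations are treated like plain "v".
static bool f_constraint_after(const char *constraint) {
  bool seen_f = false;
  bool seen_v = false;
  for (const char *p = constraint; *p != '\0'; ++p) {
    if (*p == 'f')
      seen_f = true;
    if (*p == 'v')
      seen_v = true;
  }
  return seen_f && !seen_v;
}
```

With this model, "+fvm" is classified as an "f" constraint before the change and not after it, while plain "=f" and "=&f" are still recognized, which matches the intent stated in the commit message.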
[PATCH v3] IBM Z: Fix usage of "f" constraint with long doubles
v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html v1 -> v2: - Handle constraint modifiers, use AR constraint instead of R, add testcases for & and %. v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564380.html v2 -> v3: - The main prereq is now committed: https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566237.html - Dropped long-double-asm-abi.c test, because its prereq is not approved (yet): https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566218.html - Removed superfluous constraint pointer increment. After switching the s390 backend to store long doubles in vector registers, "f" constraint broke when used with the former: long doubles correspond to TFmode, which in combination with "f" corresponds to hard regs %v0-%v15, however, asm users expect a %f0-%f15 pair. Fix by using TARGET_MD_ASM_ADJUST hook to convert TFmode values to FPRX2mode and back. gcc/ChangeLog: 2020-12-14 Ilya Leoshkevich * config/s390/s390.c (f_constraint_p): New function. (s390_md_asm_adjust): Implement TARGET_MD_ASM_ADJUST. (TARGET_MD_ASM_ADJUST): Likewise. * config/s390/vector.md (fprx2_to_tf): Rename from *fprx2_to_tf, add memory alternative. (tf_to_fprx2): New pattern. gcc/testsuite/ChangeLog: 2020-12-14 Ilya Leoshkevich * gcc.target/s390/vector/long-double-asm-commutative.c: New test. * gcc.target/s390/vector/long-double-asm-earlyclobber.c: New test. * gcc.target/s390/vector/long-double-asm-in-out.c: New test. * gcc.target/s390/vector/long-double-asm-inout.c: New test. * gcc.target/s390/vector/long-double-asm-matching.c: New test. * gcc.target/s390/vector/long-double-asm-regmem.c: New test. * gcc.target/s390/vector/long-double-volatile-from-i64.c: New test. 
--- gcc/config/s390/s390.c| 86 +++ .../s390/vector/long-double-asm-commutative.c | 16 .../vector/long-double-asm-earlyclobber.c | 17 .../s390/vector/long-double-asm-in-out.c | 14 +++ .../s390/vector/long-double-asm-inout.c | 14 +++ .../s390/vector/long-double-asm-matching.c| 13 +++ .../s390/vector/long-double-asm-regmem.c | 8 ++ .../vector/long-double-volatile-from-i64.c| 22 + 8 files changed, 190 insertions(+) create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-commutative.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-earlyclobber.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-inout.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-matching.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-regmem.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-volatile-from-i64.c diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index f3d0d1ba596..68dc3c58c1b 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -16698,6 +16698,89 @@ s390_shift_truncation_mask (machine_mode mode) return mode == DImode || mode == SImode ? 63 : 0; } +/* Return TRUE iff CONSTRAINT is an "f" constraint, possibly with additional + modifiers. */ + +static bool +f_constraint_p (const char *constraint) +{ + for (size_t i = 0, c_len = strlen (constraint); i < c_len; + i += CONSTRAINT_LEN (constraint[i], constraint + i)) +{ + if (constraint[i] == 'f') + return true; +} + return false; +} + +/* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f" + constraints when long doubles are stored in vector registers. */ + +static rtx_insn * +s390_md_asm_adjust (vec &outputs, vec &inputs, + vec &input_modes, + vec &constraints, vec & /*clobbers*/, + HARD_REG_SET & /*clobbered_regs*/) +{ + if (!TARGET_VXE) +/* Long doubles are stored in FPR pairs - nothing to do. 
*/ +return NULL; + + rtx_insn *after_md_seq = NULL, *after_md_end = NULL; + + unsigned ninputs = inputs.length (); + unsigned noutputs = outputs.length (); + for (unsigned i = 0; i < noutputs; i++) +{ + if (GET_MODE (outputs[i]) != TFmode) + /* Not a long double - nothing to do. */ + continue; + const char *constraint = constraints[i]; + bool allows_mem, allows_reg, is_inout; + bool ok = parse_output_constraint (&constraint, i, ninputs, noutputs, +&allows_mem, &allows_reg, &is_inout); + gcc_assert (ok); + if (!f_constraint_p (constraint)) + /* Long double with a constraint other than "=f" - nothing to do. */ + continue; + gcc_assert (allows_reg); + gcc_assert (!is_inout); + /* Copy output value from a FPR pair into a vector register. */ + rtx fprx2 = gen_reg_rtx (FPRX2mode); + push_to_sequence2
Re: [PATCH PING^3] Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
On Wed, 2021-03-03 at 21:26 +0100, Ilya Leoshkevich via Gcc-patches wrote: > On Wed, 2021-03-03 at 13:02 -0700, Jeff Law wrote: > > > > > > On 3/2/21 4:50 PM, Ilya Leoshkevich via Gcc-patches wrote: > > > Hello, > > > > > > I would like to ping the following patch: > > > > > > Add input_modes parameter to TARGET_MD_ASM_ADJUST hook > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html > > > > > > It is needed for the following regression fix: > > > > > > IBM Z: Fix usage of "f" constraint with long doubles > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564380.html > > > > > > > > > Jakub, who would be the right person to review this change? I've > > > decided to ask you, since `git shortlog -ns gcc/cfgexpand.c` shows > > > that > > > you deal with this code a lot. > > > > > > Best regards, > > > Ilya > > > > > > > > > > > > > > > If TARGET_MD_ASM_ADJUST changes a mode of an input operand (which > > > should be ok as long as the hook itself as well as after_md_seq > > > make up > > > for it), input_mode will contain stale information. > > > > > > It might be tempting to fix this by removing input_mode altogether > > > and > > > just using GET_MODE (), but this will not work correctly with > > > constants. > > > So add input_modes parameter and document that it should be updated > > > whenever inputs parameter is updated. > > > > > > gcc/ChangeLog: > > > > > > 2021-01-05 Ilya Leoshkevich > > > > > > * cfgexpand.c (expand_asm_loc): Pass new parameter. > > > (expand_asm_stmt): Likewise. > > > * config/arm/aarch-common-protos.h (arm_md_asm_adjust): Add > > > new > > > parameter. > > > * config/arm/aarch-common.c (arm_md_asm_adjust): Likewise. > > > * config/arm/arm.c (thumb1_md_asm_adjust): Likewise. > > > * config/cris/cris.c (cris_md_asm_adjust): Likewise. > > > * config/i386/i386.c (ix86_md_asm_adjust): Likewise. > > > * config/mn10300/mn10300.c (mn10300_md_asm_adjust): > > > Likewise. 
> > > 	* config/nds32/nds32.c (nds32_md_asm_adjust): Likewise.
> > > 	* config/pdp11/pdp11.c (pdp11_md_asm_adjust): Likewise.
> > > 	* config/rs6000/rs6000.c (rs6000_md_asm_adjust): Likewise.
> > > 	* config/vax/vax.c (vax_md_asm_adjust): Likewise.
> > > 	* config/visium/visium.c (visium_md_asm_adjust): Likewise.
> > > 	* target.def (md_asm_adjust): Likewise.
> > Ugh. A couple questions
> >
> > Are there any cases where you're going to want to change modes for
> > arguments that were constants? I'm a bit surprised that we don't
> > have a mode for constants for the cases that we care about.
> > Presumably we can get a (modeless) CONST_INT here and we're not
> > restricted to CONST_DOUBLE and friends (which have modes).
>
> Yes, this might happen. For example, here:
>
>     asm("sqxbr\t%0,%1" : "=f"(res) : "f"(0x1.1p+0L));
>
> the (const_double) and the corresponding operand will initially have
> the mode TFmode. s390_md_asm_adjust () will add a conversion from
> TFmode to FPRX2mode and change the argument accordingly.

Just to be more precise: the mode of the (const_double) itself will not
change. Here is the resulting RTL for the asm statement above:

    # s390_md_asm_adjust () step 1: put the (const_double) operand into a
    # new (reg) with the same mode
    (insn (set (reg:TF 63)
               (const_double:TF ...)))

    # s390_md_asm_adjust () step 2: convert a reg from TFmode to FPRX2mode
    (insn (set (reg:FPRX2 65)
               (subreg:FPRX2 (reg:TF 63) 0)))

    # s390_md_asm_adjust () step 3: replace the original operand with the
    # resulting (reg), adjust (asm_input) accordingly
    (insn (set (reg:FPRX2 64)
               (asm_operands:FPRX2 ("sqxbr %0,%1") ("=f") 0
                                   [(reg:FPRX2 65)]
                                   [(asm_input:FPRX2 ("f"))])))
Re: [PATCH PING^3] Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
On Wed, 2021-03-03 at 13:02 -0700, Jeff Law wrote: > > > On 3/2/21 4:50 PM, Ilya Leoshkevich via Gcc-patches wrote: > > Hello, > > > > I would like to ping the following patch: > > > > Add input_modes parameter to TARGET_MD_ASM_ADJUST hook > > https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html > > > > It is needed for the following regression fix: > > > > IBM Z: Fix usage of "f" constraint with long doubles > > https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564380.html > > > > > > Jakub, who would be the right person to review this change? I've > > decided to ask you, since `git shortlog -ns gcc/cfgexpand.c` shows > > that > > you deal with this code a lot. > > > > Best regards, > > Ilya > > > > > > > > > > If TARGET_MD_ASM_ADJUST changes a mode of an input operand (which > > should be ok as long as the hook itself as well as after_md_seq > > make up > > for it), input_mode will contain stale information. > > > > It might be tempting to fix this by removing input_mode altogether > > and > > just using GET_MODE (), but this will not work correctly with > > constants. > > So add input_modes parameter and document that it should be updated > > whenever inputs parameter is updated. > > > > gcc/ChangeLog: > > > > 2021-01-05 Ilya Leoshkevich > > > > * cfgexpand.c (expand_asm_loc): Pass new parameter. > > (expand_asm_stmt): Likewise. > > * config/arm/aarch-common-protos.h (arm_md_asm_adjust): Add > > new > > parameter. > > * config/arm/aarch-common.c (arm_md_asm_adjust): Likewise. > > * config/arm/arm.c (thumb1_md_asm_adjust): Likewise. > > * config/cris/cris.c (cris_md_asm_adjust): Likewise. > > * config/i386/i386.c (ix86_md_asm_adjust): Likewise. > > * config/mn10300/mn10300.c (mn10300_md_asm_adjust): > > Likewise. > > * config/nds32/nds32.c (nds32_md_asm_adjust): Likewise. > > * config/pdp11/pdp11.c (pdp11_md_asm_adjust): Likewise. > > * config/rs6000/rs6000.c (rs6000_md_asm_adjust): Likewise. 
> > 	* config/vax/vax.c (vax_md_asm_adjust): Likewise.
> > 	* config/visium/visium.c (visium_md_asm_adjust): Likewise.
> > 	* target.def (md_asm_adjust): Likewise.
> Ugh. A couple questions
>
> Are there any cases where you're going to want to change modes for
> arguments that were constants? I'm a bit surprised that we don't
> have a mode for constants for the cases that we care about.
> Presumably we can get a (modeless) CONST_INT here and we're not
> restricted to CONST_DOUBLE and friends (which have modes).

Yes, this might happen. For example, here:

    asm("sqxbr\t%0,%1" : "=f"(res) : "f"(0x1.1p+0L));

the (const_double) and the corresponding operand will initially have
the mode TFmode. s390_md_asm_adjust () will add a conversion from
TFmode to FPRX2mode and change the argument accordingly.

However, this is not the problematic case that I refer to in the commit
message: I caught some failures in the testsuite that I tracked down to
(const_int)s, which, like you mentioned, don't have a mode.

> Is input_modes read after the call to md_asm_adjust? I'm trying to
> figure out why we'd need to update it.

Yes, its contents go into (asm_operand)'s (asm_input). If we don't
adjust it, (asm_input)s will no longer be consistent with input operand
RTXes.

> Not acking or naking at this point, I just want to make sure I
> understand what's going on.
>
> jeff
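The consistency requirement discussed in this reply can be sketched with a toy model of the two parallel vectors. Everything here is hypothetical and invented for illustration: `Mode`, `Operand`, `adjust_long_double_inputs` and `modes_consistent` are not GCC types or functions, and `Mode::Void` merely models a modeless (const_int).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical machine modes; Void models an operand that carries no mode.
enum class Mode { Void, SI, TF, FPRX2 };

// Hypothetical stand-in for an input operand RTX.
struct Operand {
  Mode mode;         // what a GET_MODE-like query would report
  bool is_constant;  // true for constants, false for registers
};

// Toy hook: replace every TFmode input with a fresh FPRX2mode register and
// update the recorded mode in the same step, so both vectors stay in sync.
void adjust_long_double_inputs(std::vector<Operand> &inputs,
                               std::vector<Mode> &input_modes) {
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    if (input_modes[i] != Mode::TF)
      continue;
    inputs[i] = Operand{Mode::FPRX2, false};
    input_modes[i] = Mode::FPRX2;
  }
}

// The recorded mode must match the operand's own mode whenever the operand
// has one; modeless constants rely entirely on the recorded mode.
bool modes_consistent(const std::vector<Operand> &inputs,
                      const std::vector<Mode> &input_modes) {
  if (inputs.size() != input_modes.size())
    return false;
  for (std::size_t i = 0; i < inputs.size(); ++i)
    if (inputs[i].mode != Mode::Void && inputs[i].mode != input_modes[i])
      return false;
  return true;
}
```

The modeless entry shows why querying the operand alone is not enough in this model: its mode is only recoverable from the side vector, so a hook that rewrites operands has to keep that vector up to date.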
Re: [PATCH] fwprop: Fix single_use_p calculation
On Wed, 2021-03-03 at 11:34 -0700, Jeff Law wrote:
>
>
> On 3/2/21 3:37 PM, Ilya Leoshkevich via Gcc-patches wrote:
> > Bootstrapped and regtested on x86_64-redhat-linux,
> > ppc64le-redhat-linux and s390x-redhat-linux. Ok for master?
> >
> >
> >
> > Commit efb6bc55a93a ("fwprop: Allow (subreg (mem))
> > simplifications") introduced a check that was supposed to look at
> > the propagated def's number of uses. It uses
> > insn_info::num_uses (), which in reality returns the number of uses
> > def's insn has. The whole change therefore works only by accident.
> >
> > Fix by looking at def_info's uses instead of insn_info's uses.
> > This requires passing around def_info instead of insn_info.
> >
> > gcc/ChangeLog:
> >
> > 2021-03-02  Ilya Leoshkevich
> >
> > 	* fwprop.c (def_has_single_use_p): New function.
> > 	(fwprop_propagation::fwprop_propagation): Look at
> > 	def_info's uses.
> > 	(try_fwprop_subst_note): Use def_info instead of insn_info.
> > 	(try_fwprop_subst_pattern): Likewise.
> > 	(try_fwprop_subst_notes): Likewise.
> > 	(try_fwprop_subst): Likewise.
> > 	(forward_propagate_subreg): Likewise.
> > 	(forward_propagate_and_simplify): Likewise.
> > 	(forward_propagate_into): Likewise.
> > 	* iterator-utils.h (single_element_p): New function.
> Given we're well into stage4, I'd recommend deferring to gcc-12
> unless this fixes a code correctness issue.
>
> Jeff
>

Fortunately the issue here is not a miscompilation, but it's still a
regression: on s390, small functions that use long doubles get a number
of useless loads/stores as well as a stack frame, where none was
required before. Basically, this is the same issue that efb6bc55a93a
failed to fully fix due to the num_uses() / nondebug_insn_uses() mixup.
Re: [PATCH] IBM Z: Run mul-signed-overflow-*.c only on z14+
On Wed, 2021-03-03 at 07:50 +0100, Andreas Krebbel wrote:
> On 3/2/21 11:59 PM, Ilya Leoshkevich wrote:
> > mul-signed-overflow-*.c execution tests fail on z13, because they
> > contain z14-specific instructions. Fix by requiring s390_z14_hw
> > target.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	* gcc.target/s390/mul-signed-overflow-1.c: Run only on
> > 	z14+.
> > 	* gcc.target/s390/mul-signed-overflow-2.c: Likewise.
>
> I did that change yesterday already.

Ah, I hadn't noticed that. One difference between our patches, though,
is that mine also has `dg-do compile` - this way, the compile tests
still run on z13.

[...]
[PATCH PING^3] Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
Hello, I would like to ping the following patch: Add input_modes parameter to TARGET_MD_ASM_ADJUST hook https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html It is needed for the following regression fix: IBM Z: Fix usage of "f" constraint with long doubles https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564380.html Jakub, who would be the right person to review this change? I've decided to ask you, since `git shortlog -ns gcc/cfgexpand.c` shows that you deal with this code a lot. Best regards, Ilya If TARGET_MD_ASM_ADJUST changes a mode of an input operand (which should be ok as long as the hook itself as well as after_md_seq make up for it), input_mode will contain stale information. It might be tempting to fix this by removing input_mode altogether and just using GET_MODE (), but this will not work correctly with constants. So add input_modes parameter and document that it should be updated whenever inputs parameter is updated. gcc/ChangeLog: 2021-01-05 Ilya Leoshkevich * cfgexpand.c (expand_asm_loc): Pass new parameter. (expand_asm_stmt): Likewise. * config/arm/aarch-common-protos.h (arm_md_asm_adjust): Add new parameter. * config/arm/aarch-common.c (arm_md_asm_adjust): Likewise. * config/arm/arm.c (thumb1_md_asm_adjust): Likewise. * config/cris/cris.c (cris_md_asm_adjust): Likewise. * config/i386/i386.c (ix86_md_asm_adjust): Likewise. * config/mn10300/mn10300.c (mn10300_md_asm_adjust): Likewise. * config/nds32/nds32.c (nds32_md_asm_adjust): Likewise. * config/pdp11/pdp11.c (pdp11_md_asm_adjust): Likewise. * config/rs6000/rs6000.c (rs6000_md_asm_adjust): Likewise. * config/vax/vax.c (vax_md_asm_adjust): Likewise. * config/visium/visium.c (visium_md_asm_adjust): Likewise. * target.def (md_asm_adjust): Likewise. 
--- gcc/cfgexpand.c | 16 gcc/config/arm/aarch-common-protos.h | 8 gcc/config/arm/aarch-common.c| 7 --- gcc/config/arm/arm.c | 14 -- gcc/config/cris/cris.c | 7 --- gcc/config/i386/i386.c | 7 --- gcc/config/mn10300/mn10300.c | 7 --- gcc/config/nds32/nds32.c | 1 + gcc/config/pdp11/pdp11.c | 9 + gcc/config/rs6000/rs6000.c | 7 --- gcc/config/vax/vax.c | 3 ++- gcc/config/visium/visium.c | 12 +++- gcc/doc/tm.texi | 10 ++ gcc/target.def | 13 - 14 files changed, 69 insertions(+), 52 deletions(-) diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index aef9e916fcd..a6b48d3e48f 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -2880,6 +2880,7 @@ expand_asm_loc (tree string, int vol, location_t locus) rtx asm_op, clob; unsigned i, nclobbers; auto_vec input_rvec, output_rvec; + auto_vec input_mode; auto_vec constraints; auto_vec clobber_rvec; HARD_REG_SET clobbered_regs; @@ -2889,9 +2890,8 @@ expand_asm_loc (tree string, int vol, location_t locus) clobber_rvec.safe_push (clob); if (targetm.md_asm_adjust) - targetm.md_asm_adjust (output_rvec, input_rvec, - constraints, clobber_rvec, - clobbered_regs); + targetm.md_asm_adjust (output_rvec, input_rvec, input_mode, + constraints, clobber_rvec, clobbered_regs); asm_op = body; nclobbers = clobber_rvec.length (); @@ -3068,8 +3068,8 @@ expand_asm_stmt (gasm *stmt) return; } - /* There are some legacy diagnostics in here, and also avoids a - sixth parameger to targetm.md_asm_adjust. */ + /* There are some legacy diagnostics in here, and also avoids an extra + parameter to targetm.md_asm_adjust. */ save_input_location s_i_l(locus); unsigned noutputs = gimple_asm_noutputs (stmt); @@ -3420,9 +3420,9 @@ expand_asm_stmt (gasm *stmt) the flags register. 
*/ rtx_insn *after_md_seq = NULL; if (targetm.md_asm_adjust) -after_md_seq = targetm.md_asm_adjust (output_rvec, input_rvec, - constraints, clobber_rvec, - clobbered_regs); +after_md_seq + = targetm.md_asm_adjust (output_rvec, input_rvec, input_mode, +constraints, clobber_rvec, clobbered_regs); /* Do not allow the hook to change the output and input count, lest it mess up the operand numbering. */ diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h index 7a9cf3d324c..b6171e8668d 100644 --- a/gcc/config/arm/aarch-common-protos.h +++ b/gcc/config/arm/aarch-common-protos.h @@ -144,9 +144,9 @@ struct cpu_cost_table const struct vector_cost_table vect; }; -rtx_insn * -arm_md_as
[PATCH] IBM Z: Run mul-signed-overflow-*.c only on z14+
mul-signed-overflow-*.c execution tests fail on z13, because they contain z14-specific instructions. Fix by requiring s390_z14_hw target. gcc/testsuite/ChangeLog: * gcc.target/s390/mul-signed-overflow-1.c: Run only on z14+. * gcc.target/s390/mul-signed-overflow-2.c: Likewise. --- gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c | 3 ++- gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c b/gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c index fdf56d6e695..e8b1938dab7 100644 --- a/gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c +++ b/gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c @@ -1,4 +1,5 @@ -/* { dg-do run } */ +/* { dg-do compile } */ +/* { dg-do run { target { s390_z14_hw } } } */ /* z14 only because we need msrkc, msc, msgrkc, msgc */ /* { dg-options "-O3 -march=z14 -mzarch --save-temps" } */ diff --git a/gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c b/gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c index d0088188aa2..01328e1d286 100644 --- a/gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c +++ b/gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c @@ -1,4 +1,5 @@ -/* { dg-do run } */ +/* { dg-do compile } */ +/* { dg-do run { target { s390_z14_hw } } } */ /* z14 only because we need msrkc, msc, msgrkc, msgc */ /* { dg-options "-O3 -march=z14 -mzarch --save-temps" } */ -- 2.29.2
[PATCH] fwprop: Fix single_use_p calculation
Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux and s390x-redhat-linux. Ok for master? Commit efb6bc55a93a ("fwprop: Allow (subreg (mem)) simplifications") introduced a check that was supposed to look at the propagated def's number of uses. It uses insn_info::num_uses (), which in reality returns the number of uses def's insn has. The whole change therefore works only by accident. Fix by looking at def_info's uses instead of insn_info's uses. This requires passing around def_info instead of insn_info. gcc/ChangeLog: 2021-03-02 Ilya Leoshkevich * fwprop.c (def_has_single_use_p): New function. (fwprop_propagation::fwprop_propagation): Look at def_info's uses. (try_fwprop_subst_note): Use def_info instead of insn_info. (try_fwprop_subst_pattern): Likewise. (try_fwprop_subst_notes): Likewise. (try_fwprop_subst): Likewise. (forward_propagate_subreg): Likewise. (forward_propagate_and_simplify): Likewise. (forward_propagate_into): Likewise. * iterator-utils.h (single_element_p): New function. --- gcc/fwprop.c | 89 ++-- gcc/iterator-utils.h | 10 + 2 files changed, 62 insertions(+), 37 deletions(-) diff --git a/gcc/fwprop.c b/gcc/fwprop.c index 4b8a554e823..478dcdd96cc 100644 --- a/gcc/fwprop.c +++ b/gcc/fwprop.c @@ -175,7 +175,7 @@ namespace static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1; static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2; -fwprop_propagation (insn_info *, insn_info *, rtx, rtx); +fwprop_propagation (insn_info *, def_info *, rtx, rtx); bool changed_mem_p () const { return result_flags & CHANGED_MEM; } bool folded_to_constants_p () const; @@ -191,13 +191,27 @@ namespace }; } -/* Prepare to replace FROM with TO in INSN. */ +/* Return true if DEF has a single non-debug non-phi use. 
*/ + +static bool +def_has_single_use_p (def_info *def) +{ + if (!is_a (def)) +return false; + + set_info *set = as_a (def); + + return single_element_p (set->nondebug_insn_uses ()) +&& !set->has_phi_uses (); +} + +/* Prepare to replace FROM with TO in USE_INSN. */ fwprop_propagation::fwprop_propagation (insn_info *use_insn, - insn_info *def_insn, rtx from, rtx to) + def_info *def, rtx from, rtx to) : insn_propagation (use_insn->rtl (), from, to), -single_use_p (def_insn->num_uses () == 1), -single_ebb_p (use_insn->ebb () == def_insn->ebb ()) +single_use_p (def_has_single_use_p (def)), +single_ebb_p (use_insn->ebb () == def->insn ()->ebb ()) { should_check_mems = true; should_note_simplifications = true; @@ -368,9 +382,9 @@ contains_paradoxical_subreg_p (rtx x) return false; } -/* Try to substitute (set DEST SRC) from DEF_INSN into note NOTE of USE_INSN. - Return the number of substitutions on success, otherwise return -1 and - leave USE_INSN unchanged. +/* Try to substitute (set DEST SRC), which defines DEF, into note NOTE of + USE_INSN. Return the number of substitutions on success, otherwise return + -1 and leave USE_INSN unchanged. If REQUIRE_CONSTANT is true, require all substituted occurences of SRC to fold to a constant, so that the note does not use any more registers @@ -379,13 +393,14 @@ contains_paradoxical_subreg_p (rtx x) instruction pattern. 
*/ static int -try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn, +try_fwprop_subst_note (insn_info *use_insn, def_info *def, rtx note, rtx dest, rtx src, bool require_constant) { rtx_insn *use_rtl = use_insn->rtl (); + insn_info *def_insn = def->insn (); insn_change_watermark watermark; - fwprop_propagation prop (use_insn, def_insn, dest, src); + fwprop_propagation prop (use_insn, def, dest, src); if (!prop.apply_to_rvalue (&XEXP (note, 0))) { if (dump_file && (dump_flags & TDF_DETAILS)) @@ -436,19 +451,20 @@ try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn, return prop.num_replacements; } -/* Try to substitute (set DEST SRC) from DEF_INSN into location LOC of +/* Try to substitute (set DEST SRC), which defines DEF, into location LOC of USE_INSN's pattern. Return true on success, otherwise leave USE_INSN unchanged. */ static bool try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change, - insn_info *def_insn, rtx *loc, rtx dest, rtx src) + def_info *def, rtx *loc, rtx dest, rtx src) { insn_info *use_insn = use_change.insn (); rtx_insn *use_rtl = use_insn->rtl (); + insn_info *def_insn = def->insn (); insn_change_watermark watermark; - fwprop_propagation prop (use_insn, def_insn, dest, src); + fwprop_propagation prop (use_insn, def, dest, src); if (!prop.apply_to_pattern (loc)) { if (dump_f
[PATCH 2/2] IBM Z: Fix long double <-> DFP conversions
When switching the s390 backend to store long doubles in vector registers, the patterns for long double <-> DFP conversions were forgotten. This did not cause observable problems so far, because libdfp calls are emitted instead of pfpo. However, when building libdfp itself, this leads to infinite recursion. gcc/ChangeLog: * config/s390/vector.md (trunctf2_vr): New pattern. (trunctf2): Likewise. (trunctdtf2_vr): Likewise. (trunctdtf2): Likewise. (extendtf2_vr): Likewise. (extendtf2): Likewise. (extendtftd2_vr): Likewise. (extendtftd2): Likewise. gcc/testsuite/ChangeLog: * gcc.target/s390/vector/long-double-from-decimal128.c: New test. * gcc.target/s390/vector/long-double-from-decimal32.c: New test. * gcc.target/s390/vector/long-double-from-decimal64.c: New test. * gcc.target/s390/vector/long-double-to-decimal128.c: New test. * gcc.target/s390/vector/long-double-to-decimal32.c: New test. * gcc.target/s390/vector/long-double-to-decimal64.c: New test. --- gcc/config/s390/vector.md | 72 +++ .../s390/vector/long-double-from-decimal128.c | 20 ++ .../s390/vector/long-double-from-decimal32.c | 20 ++ .../s390/vector/long-double-from-decimal64.c | 20 ++ .../s390/vector/long-double-to-decimal128.c | 19 + .../s390/vector/long-double-to-decimal32.c| 19 + .../s390/vector/long-double-to-decimal64.c| 19 + 7 files changed, 189 insertions(+) create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal128.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal32.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal64.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal128.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal32.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal64.c diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md index e48c965db00..bc52211c55e 100644 --- a/gcc/config/s390/vector.md +++ 
b/gcc/config/s390/vector.md @@ -2480,6 +2480,42 @@ "HAVE_TF (trunctfsf2)" { EXPAND_TF (trunctfsf2, 2); }) +(define_expand "trunctf2_vr" + [(match_operand:DFP_ALL 0 "nonimmediate_operand" "") + (match_operand:TF 1 "nonimmediate_operand" "")] + "TARGET_HARD_DFP + && GET_MODE_SIZE (TFmode) > GET_MODE_SIZE (mode) + && TARGET_VXE" +{ + rtx fprx2 = gen_reg_rtx (FPRX2mode); + emit_insn (gen_tf_to_fprx2 (fprx2, operands[1])); + emit_insn (gen_truncfprx22 (operands[0], fprx2)); + DONE; +}) + +(define_expand "trunctf2" + [(match_operand:DFP_ALL 0 "nonimmediate_operand" "") + (match_operand:TF 1 "nonimmediate_operand" "")] + "HAVE_TF (trunctf2)" + { EXPAND_TF (trunctf2, 2); }) + +(define_expand "trunctdtf2_vr" + [(match_operand:TF 0 "nonimmediate_operand" "") + (match_operand:TD 1 "nonimmediate_operand" "")] + "TARGET_HARD_DFP && TARGET_VXE" +{ + rtx fprx2 = gen_reg_rtx (FPRX2mode); + emit_insn (gen_trunctdfprx22 (fprx2, operands[1])); + emit_insn (gen_fprx2_to_tf (operands[0], fprx2)); + DONE; +}) + +(define_expand "trunctdtf2" + [(match_operand:TF 0 "nonimmediate_operand" "") + (match_operand:TD 1 "nonimmediate_operand" "")] + "HAVE_TF (trunctdtf2)" + { EXPAND_TF (trunctdtf2, 2); }) + ; load lengthened (define_insn "extenddftf2_vr" @@ -2511,6 +2547,42 @@ "HAVE_TF (extendsftf2)" { EXPAND_TF (extendsftf2, 2); }) +(define_expand "extendtf2_vr" + [(match_operand:TF 0 "nonimmediate_operand" "") + (match_operand:DFP_ALL 1 "nonimmediate_operand" "")] + "TARGET_HARD_DFP + && GET_MODE_SIZE (mode) < GET_MODE_SIZE (TFmode) + && TARGET_VXE" +{ + rtx fprx2 = gen_reg_rtx (FPRX2mode); + emit_insn (gen_extendfprx22 (fprx2, operands[1])); + emit_insn (gen_fprx2_to_tf (operands[0], fprx2)); + DONE; +}) + +(define_expand "extendtf2" + [(match_operand:TF 0 "nonimmediate_operand" "") + (match_operand:DFP_ALL 1 "nonimmediate_operand" "")] + "HAVE_TF (extendtf2)" + { EXPAND_TF (extendtf2, 2); }) + +(define_expand "extendtftd2_vr" + [(match_operand:TD 0 "nonimmediate_operand" "") + 
(match_operand:TF 1 "nonimmediate_operand" "")] + "TARGET_HARD_DFP && TARGET_VXE" +{ + rtx fprx2 = gen_reg_rtx (FPRX2mode); + emit_insn (gen_tf_to_fprx2 (fprx2, operands[1])); + emit_insn (gen_extendfprx2td2 (operands[0], fprx2)); + DONE; +}) + +(define_expand "extendtftd2" + [(match_operand:TD 0 "nonimmediate_operand" "") + (match_operand:TF 1 "nonimmediate_operand" "")] + "HAVE_TF (extendtftd2)" + { EXPAND_TF (extendtftd2, 2); }) + ; test data class (define_expand "signbittf2_vr" diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal128.c b/gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal128.c new file mode 100644 index 000..3cd2c68f5c6 --- /dev/null +++ b/gcc/testsui
[PATCH 1/2] IBM Z: Improve FPRX2 <-> TF conversions
gcc/ChangeLog: * config/s390/vector.md (*fprx2_to_tf): Rename to fprx2_to_tf, add memory alternative. (tf_to_fprx2): New pattern. --- gcc/config/s390/vector.md | 36 +++- 1 file changed, 31 insertions(+), 5 deletions(-) diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md index 0e3c31f5d4f..e48c965db00 100644 --- a/gcc/config/s390/vector.md +++ b/gcc/config/s390/vector.md @@ -616,12 +616,23 @@ vlvgp\t%v0,%1,%N1" [(set_attr "op_type" "VRR,VRX,VRX,VRI,VRR")]) -(define_insn "*fprx2_to_tf" - [(set (match_operand:TF 0 "nonimmediate_operand" "=v") - (subreg:TF (match_operand:FPRX2 1 "general_operand" "f") 0))] +(define_insn_and_split "fprx2_to_tf" + [(set (match_operand:TF 0 "nonimmediate_operand" "=v,AR") + (subreg:TF (match_operand:FPRX2 1 "general_operand" "f,f") 0))] "TARGET_VXE" - "vmrhg\t%v0,%1,%N1" - [(set_attr "op_type" "VRR")]) + "@ + vmrhg\t%v0,%1,%N1 + #" + "!(MEM_P (operands[0]) && MEM_VOLATILE_P (operands[0]))" + [(set (match_dup 2) (match_dup 3)) + (set (match_dup 4) (match_dup 5))] +{ + operands[2] = simplify_gen_subreg (DFmode, operands[0], TFmode, 0); + operands[3] = simplify_gen_subreg (DFmode, operands[1], FPRX2mode, 0); + operands[4] = simplify_gen_subreg (DFmode, operands[0], TFmode, 8); + operands[5] = simplify_gen_subreg (DFmode, operands[1], FPRX2mode, 8); +} + [(set_attr "op_type" "VRR,*")]) (define_insn "*vec_ti_to_v1ti" [(set (match_operand:V1TI 0 "nonimmediate_operand" "=v,v,R, v, v,v") @@ -753,6 +764,21 @@ "vpdi\t%V0,%v1,%V0,5" [(set_attr "op_type" "VRR")]) +(define_insn_and_split "tf_to_fprx2" + [(set (match_operand:FPRX20 "nonimmediate_operand" "=f,f") + (subreg:FPRX2 (match_operand:TF 1 "general_operand" "v,AR") 0))] + "TARGET_VXE" + "#" + "!(MEM_P (operands[1]) && MEM_VOLATILE_P (operands[1]))" + [(set (match_dup 2) (match_dup 3)) + (set (match_dup 4) (match_dup 5))] +{ + operands[2] = simplify_gen_subreg (DFmode, operands[0], FPRX2mode, 0); + operands[3] = simplify_gen_subreg (DFmode, operands[1], TFmode, 0); + operands[4] = 
simplify_gen_subreg (DFmode, operands[0], FPRX2mode, 8); + operands[5] = simplify_gen_subreg (DFmode, operands[1], TFmode, 8); +}) + ; vec_perm_const for V2DI using vpdi? ;; -- 2.29.2
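Both splitters above move a 16-byte value as two 8-byte halves at byte offsets 0 and 8, so a memory operand must stay addressable after 8 is added to its displacement. The following is only a rough sketch of that condition, assuming an unsigned 12-bit short displacement (0 to 4095); the helper names are hypothetical and are not part of GCC.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: a short displacement is an unsigned 12-bit field.  */
#define SHORT_DISP_MAX 4095

/* Hypothetical helper: does DISP fit into a short displacement?  */
static bool
short_disp_ok (long disp)
{
  return disp >= 0 && disp <= SHORT_DISP_MAX;
}

/* Hypothetical helper: displacement of the second 8-byte half.
   The displacement of the first half must itself be valid.  */
static long
second_half_disp (long disp)
{
  assert (short_disp_ok (disp));
  return disp + 8;
}

/* Hypothetical helper: true if accessing the second half would need
   an extra register to form a valid address.  */
static bool
split_needs_extra_reg (long disp)
{
  return !short_disp_ok (second_half_disp (disp));
}
```

Under these assumptions a displacement of 4088 is valid for the first half but not for the second one, which is the situation an offsettable memory constraint is meant to rule out.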
[PATCH 0/2] IBM Z: Fix long double <-> DFP conversions
This series fixes PR99134. Patch 1 is factored out from the pending [1], patch 2 is the actual fix. Bootstrapped and regtested on s390x-redhat-linux. Ok for master? [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564380.html Ilya Leoshkevich (2): IBM Z: Improve FPRX2 <-> TF conversions IBM Z: Fix long double <-> DFP conversions gcc/config/s390/vector.md | 108 +- .../s390/vector/long-double-from-decimal128.c | 20 .../s390/vector/long-double-from-decimal32.c | 20 .../s390/vector/long-double-from-decimal64.c | 20 .../s390/vector/long-double-to-decimal128.c | 19 +++ .../s390/vector/long-double-to-decimal32.c| 19 +++ .../s390/vector/long-double-to-decimal64.c| 19 +++ 7 files changed, 220 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal128.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal32.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal64.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal128.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal32.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal64.c -- 2.29.2
[PATCH] PING^2 Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
Hello, I would like to ping the following patch: Add input_modes parameter to TARGET_MD_ASM_ADJUST hook https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html It is needed for the following regression fix: IBM Z: Fix usage of "f" constraint with long doubles https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html Best regards, Ilya
[PATCH] PING lra: clear lra_insn_recog_data after simplifying a mem subreg
Hello, I would like to ping the following patch: lra: clear lra_insn_recog_data after simplifying a mem subreg https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563428.html Best regards, Ilya
[PATCH v2] IBM Z: Fix usage of "f" constraint with long doubles
v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html v1 -> v2: Handle constraint modifiers, use AR constraint instead of R, add testcases for & and %. After switching the s390 backend to store long doubles in vector registers, "f" constraint broke when used with the former: long doubles correspond to TFmode, which in combination with "f" corresponds to hard regs %v0-%v15, however, asm users expect a %f0-%f15 pair. Fix by using TARGET_MD_ASM_ADJUST hook to convert TFmode values to FPRX2mode and back. gcc/ChangeLog: 2020-12-14 Ilya Leoshkevich * config/s390/s390.c (f_constraint_p): New function. (s390_md_asm_adjust): Implement TARGET_MD_ASM_ADJUST. (TARGET_MD_ASM_ADJUST): Likewise. * config/s390/vector.md (fprx2_to_tf): Rename from *fprx2_to_tf, add memory alternative. (tf_to_fprx2): New pattern. gcc/testsuite/ChangeLog: 2020-12-14 Ilya Leoshkevich * gcc.target/s390/vector/long-double-asm-abi.c: New test. * gcc.target/s390/vector/long-double-asm-commutative.c: New test. * gcc.target/s390/vector/long-double-asm-earlyclobber.c: New test. * gcc.target/s390/vector/long-double-asm-in-out.c: New test. * gcc.target/s390/vector/long-double-asm-inout.c: New test. * gcc.target/s390/vector/long-double-volatile-from-i64.c: New test. 
--- gcc/config/s390/s390.c| 88 +++ gcc/config/s390/vector.md | 36 ++-- .../s390/vector/long-double-asm-abi.c | 26 ++ .../s390/vector/long-double-asm-commutative.c | 16 .../vector/long-double-asm-earlyclobber.c | 17 .../s390/vector/long-double-asm-in-out.c | 14 +++ .../s390/vector/long-double-asm-inout.c | 14 +++ .../s390/vector/long-double-asm-matching.c| 13 +++ .../vector/long-double-volatile-from-i64.c| 22 + 9 files changed, 241 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-abi.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-commutative.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-earlyclobber.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-inout.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-matching.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-volatile-from-i64.c diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 9d2cee950d0..d4b098325e8 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -16688,6 +16688,91 @@ s390_shift_truncation_mask (machine_mode mode) return mode == DImode || mode == SImode ? 63 : 0; } +/* Return TRUE iff CONSTRAINT is an "f" constraint, possibly with additional + modifiers. */ + +static bool +f_constraint_p (const char *constraint) +{ + for (size_t i = 0, c_len = strlen (constraint); i < c_len; + i += CONSTRAINT_LEN (constraint[i], constraint + i)) +{ + if (constraint[i] == 'f') + return true; +} + return false; +} + +/* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f" + constraints when long doubles are stored in vector registers. 
*/ + +static rtx_insn * +s390_md_asm_adjust (vec &outputs, vec &inputs, + vec &input_modes, + vec &constraints, vec & /*clobbers*/, + HARD_REG_SET & /*clobbered_regs*/) +{ + if (!TARGET_VXE) +/* Long doubles are stored in FPR pairs - nothing to do. */ +return NULL; + + rtx_insn *after_md_seq = NULL, *after_md_end = NULL; + + unsigned ninputs = inputs.length (); + unsigned noutputs = outputs.length (); + for (unsigned i = 0; i < noutputs; i++) +{ + if (GET_MODE (outputs[i]) != TFmode) + /* Not a long double - nothing to do. */ + continue; + const char *constraint = constraints[i]; + bool allows_mem, allows_reg, is_inout; + bool ok = parse_output_constraint (&constraint, i, ninputs, noutputs, +&allows_mem, &allows_reg, &is_inout); + gcc_assert (ok); + if (!f_constraint_p (constraint + 1)) + /* Long double with a constraint other than "=f" - nothing to do. */ + continue; + gcc_assert (allows_reg); + gcc_assert (!allows_mem); + gcc_assert (!is_inout); + /* Copy output value from a FPR pair into a vector register. */ + rtx fprx2 = gen_reg_rtx (FPRX2mode); + push_to_sequence2 (after_md_seq, after_md_end); + emit_insn (gen_fprx2_to_tf (outputs[i], fprx2)); + after_md_seq = get_insns (); + after_md_end = get_last_insn (); + end_sequence (); + outputs[i] = fprx2; +} + + for (unsigned i = 0; i < ninputs; i++) +{ + if (GET_MODE (inputs[i]) != TFmode) + /* Not a long double - not
Re: [PATCH] IBM Z: Fix usage of "f" constraint with long doubles
On Wed, 2021-01-27 at 08:58 +0100, Andreas Krebbel wrote:
> On 1/18/21 10:54 PM, Ilya Leoshkevich wrote:
> ...
>
> > +static rtx_insn *
> > +s390_md_asm_adjust (vec &outputs, vec &inputs,
> > +                    vec &input_modes,
> > +                    vec &constraints, vec & /*clobbers*/,
> > +                    HARD_REG_SET & /*clobbered_regs*/)
> > +{
> > +  if (!TARGET_VXE)
> > +    /* Long doubles are stored in FPR pairs - nothing to do.  */
> > +    return NULL;
> > +
> > +  rtx_insn *after_md_seq = NULL, *after_md_end = NULL;
> > +
> > +  unsigned ninputs = inputs.length ();
> > +  unsigned noutputs = outputs.length ();
> > +  for (unsigned i = 0; i < noutputs; i++)
> > +    {
> > +      if (GET_MODE (outputs[i]) != TFmode)
> > +        /* Not a long double - nothing to do.  */
> > +        continue;
> > +      const char *constraint = constraints[i];
> > +      bool allows_mem, allows_reg, is_inout;
> > +      bool ok = parse_output_constraint (&constraint, i, ninputs, noutputs,
> > +                                         &allows_mem, &allows_reg, &is_inout);
> > +      gcc_assert (ok);
> > +      if (strcmp (constraint, "=f") != 0)
> > +        /* Long double with a constraint other than "=f" - nothing to do.  */
> > +        continue;
>
> What about other constraint modifiers like & and %? Don't we need to
> handle matching constraints as well here?

Oh, right - we need to account for %?!* and maybe some others. I'll
just copy the code from parse_output_constraint() that skips over all
of them, because I don't think they need any special handling - we just
need to make sure they don't mess up the recognition of "=f".

I don't think we need to explicitly support matching constraints,
because parse_input_constraint() will resolve them for us. I'll add a
test for this just in case.

Do we make use of multi-alternative constraints on s390? I think not,
because our instructions are fairly rigid, but maybe I'm missing
something?

...
> > diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
> > index 0e3c31f5d4f..1332a65a1d1 100644
> > --- a/gcc/config/s390/vector.md
> > +++ b/gcc/config/s390/vector.md
> > @@ -616,12 +616,23 @@ (define_insn "*vec_tf_to_v1tf_vr"
> >     vlvgp\t%v0,%1,%N1"
> >    [(set_attr "op_type" "VRR,VRX,VRX,VRI,VRR")])
> >
> > -(define_insn "*fprx2_to_tf"
> > -  [(set (match_operand:TF 0 "nonimmediate_operand" "=v")
> > -        (subreg:TF (match_operand:FPRX2 1 "general_operand" "f") 0))]
> > +(define_insn_and_split "fprx2_to_tf"
> > +  [(set (match_operand:TF 0 "nonimmediate_operand" "=v,R")
> > +        (subreg:TF (match_operand:FPRX2 1 "general_operand" "f,f") 0))]
> >    "TARGET_VXE"
> > -  "vmrhg\t%v0,%1,%N1"
> > -  [(set_attr "op_type" "VRR")])
> > +  "@
> > +   vmrhg\t%v0,%1,%N1
> > +   #"
> > +  "!(MEM_P (operands[0]) && MEM_VOLATILE_P (operands[0]))"
> > +  [(set (match_dup 2) (match_dup 3))
> > +   (set (match_dup 4) (match_dup 5))]
> > +{
> > +  operands[2] = simplify_gen_subreg (DFmode, operands[0], TFmode, 0);
> > +  operands[3] = simplify_gen_subreg (DFmode, operands[1], FPRX2mode, 0);
> > +  operands[4] = simplify_gen_subreg (DFmode, operands[0], TFmode, 8);
> > +  operands[5] = simplify_gen_subreg (DFmode, operands[1], FPRX2mode, 8);
> > +}
> > +  [(set_attr "op_type" "VRR,*")])
>
> Splitting an address like this might cause the displacement to
> overflow in the second part. This would require an additional reg to
> make the address valid again. Which in turn will be a problem after
> reload. You can use the 'AR' constraint for the memory alternative.
> That way reload will make sure the address is offsettable.

Ok, thanks for the hint!
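The modifier handling discussed in this reply comes down to scanning the constraint string one constraint at a time instead of comparing it with "=f". Below is a minimal, stand-alone sketch of that idea. The `constraint_len` helper is a hypothetical stand-in for GCC's CONSTRAINT_LEN macro and assumes that only constraints starting with 'A' (such as "AR") are two characters long; `has_f_constraint` is likewise an illustrative name, not the function from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CONSTRAINT_LEN.  Assumption: constraints
   starting with 'A' take two characters, everything else takes one.  */
static size_t
constraint_len (const char *p)
{
  return (p[0] == 'A' && p[1] != '\0') ? 2 : 1;
}

/* Return true if CONSTRAINT contains an "f" constraint.  Modifiers
   such as '=', '+', '&' and '%' are single characters that are not
   'f', so the scan simply steps over them.  */
static bool
has_f_constraint (const char *constraint)
{
  assert (constraint != NULL);
  size_t len = strlen (constraint);
  for (size_t i = 0; i < len; i += constraint_len (constraint + i))
    if (constraint[i] == 'f')
      return true;
  return false;
}
```

With this shape "=&f" and "%f" are recognized in the same way as "=f", while "=v" and "=AR" are not.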
[PATCH v3] fwprop: Allow (subreg (mem)) simplifications
On Thu, 2021-01-21 at 12:29 +0000, Richard Sandiford wrote:
> Given what you said in the other message about combine, I agree this
> is a reasonable workaround. I don't know whether it's suitable for
> stage 4 or whether it would need to wait for stage 1.

Thanks for reviewing! I've implemented your suggestions in the patch
below.

Regarding stage 4, this can be seen as a part of the IBM Z
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html
regression fix - before moving long doubles to vector registers and
fixing up "f" constraints on RTL level, code generation for small glibc
functions like __ieee754_sqrtl was fairly efficient. Not sure if that
issue is big enough to justify this common code change at this point,
but still..

v2 -> v3: Added single_ebb_p, added paradoxical subreg check, fixed
formatting.

Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux
and s390x-redhat-linux.

Suppose we have:

    (set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
    (set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))

It is clearly profitable to propagate the first insn into the second
one and get:

    (set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))

fwprop actually manages to perform this, but doesn't think the result is
worth it, which results in unnecessary store/load sequences on s390.
Improve the situation by classifying SUBREG -> MEM changes as
profitable.

gcc/ChangeLog:

2021-01-15  Ilya Leoshkevich

        * fwprop.c (fwprop_propagation::classify_result): Allow
        (subreg (mem)) simplifications.
--- gcc/fwprop.c | 33 - 1 file changed, 28 insertions(+), 5 deletions(-) diff --git a/gcc/fwprop.c b/gcc/fwprop.c index eff8f7cc141..123cc228630 100644 --- a/gcc/fwprop.c +++ b/gcc/fwprop.c @@ -176,7 +176,7 @@ namespace static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1; static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2; -fwprop_propagation (rtx_insn *, rtx, rtx); +fwprop_propagation (insn_info *, insn_info *, rtx, rtx); bool changed_mem_p () const { return result_flags & CHANGED_MEM; } bool folded_to_constants_p () const; @@ -185,13 +185,20 @@ namespace bool check_mem (int, rtx) final override; void note_simplification (int, uint16_t, rtx, rtx) final override; uint16_t classify_result (rtx, rtx); + + private: +const bool single_use_p; +const bool single_ebb_p; }; } /* Prepare to replace FROM with TO in INSN. */ -fwprop_propagation::fwprop_propagation (rtx_insn *insn, rtx from, rtx to) - : insn_propagation (insn, from, to) +fwprop_propagation::fwprop_propagation (insn_info *use_insn, + insn_info *def_insn, rtx from, rtx to) + : insn_propagation (use_insn->rtl (), from, to), +single_use_p (def_insn->num_uses () == 1), +single_ebb_p (use_insn->ebb () == def_insn->ebb ()) { should_check_mems = true; should_note_simplifications = true; @@ -262,6 +269,22 @@ fwprop_propagation::classify_result (rtx old_rtx, rtx new_rtx) && GET_MODE (new_rtx) == GET_MODE_INNER (GET_MODE (from))) return PROFITABLE; + /* Allow (subreg (mem)) -> (mem) simplifications with the following + exceptions: + 1) Propagating (mem)s into multiple uses is not profitable. + 2) Propagating (mem)s across EBBs may not be profitable if the source EBB + runs less frequently. + 3) Propagating (mem)s into paradoxical (subreg)s is not profitable. + 4) Creating new (mem/v)s is not correct, since DCE will not remove the old + ones. 
*/ + if (single_use_p + && single_ebb_p + && SUBREG_P (old_rtx) + && !paradoxical_subreg_p (old_rtx) + && MEM_P (new_rtx) + && !MEM_VOLATILE_P (new_rtx)) +return PROFITABLE; + return 0; } @@ -363,7 +386,7 @@ try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn, rtx_insn *use_rtl = use_insn->rtl (); insn_change_watermark watermark; - fwprop_propagation prop (use_rtl, dest, src); + fwprop_propagation prop (use_insn, def_insn, dest, src); if (!prop.apply_to_rvalue (&XEXP (note, 0))) { if (dump_file && (dump_flags & TDF_DETAILS)) @@ -426,7 +449,7 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change, rtx_insn *use_rtl = use_insn->rtl (); insn_change_watermark watermark; - fwprop_propagation prop (use_rtl, dest, src); + fwprop_propagation prop (use_insn, def_insn, dest, src); if (!prop.apply_to_pattern (loc)) { if (dump_file && (dump_flags & TDF_DETAILS)) -- 2.26.2
Re: [PATCH] fwprop: Allow (subreg (mem)) simplifications
On Thu, 2021-01-21 at 10:49 +0000, Richard Sandiford wrote:
> Ilya Leoshkevich via Gcc-patches writes:
> > On Tue, 2021-01-19 at 09:41 +0100, Richard Biener wrote:
> > > On Mon, Jan 18, 2021 at 11:04 PM Ilya Leoshkevich via Gcc-patches
> > > wrote:
> > > > Suppose we have:
> > > >
> > > >     (set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
> > > >     (set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))
> > > >
> > > > It is clearly profitable to propagate the first insn into the
> > > > second one and get:
> > > >
> > > >     (set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))
> > > >
> > > > fwprop actually manages to perform this, but doesn't think the
> > > > result is worth it, which results in unnecessary store/load
> > > > sequences on s390. Improve the situation by classifying
> > > > SUBREG -> MEM changes as profitable.
> > >
> > > IIRC fwprop also propagates into multiple uses and replacing a
> > > non-MEM with a MEM is only good when the original MEM goes away -
> > > is that properly dealt with here?
> >
> > This is because of efficiency and not correctness reasons, right? For
> > correctness I already check MEM_VOLATILE_P (new_rtx). For efficiency I
> > think it would be reasonable to add def_insn->num_uses () == 1 check
> > (this passes my tests, I'm yet to do a full regtest though).
>
> That sounds plausible, but I think there's also the issue that the
> mem could be in a less frequently executed block.
>
> A potential problem with checking num_uses is that it might make the
> boundary between fwprop and combine more fuzzy. If the propagation
> makes the original instruction redundant then we should remove it
> and take the cost of the removal into account when costing the
> propagation (as combine does). fwprop is instead set up for cases
> in which propagations are profitable even if the original instruction
> is kept.
>
> What prevents combine from handling this? Are the instructions in
> different blocks?

I wanted to do this before combine, because in __ieee754_sqrtl case
fwprop turns this (example from the commit message + the insn after it):

    (set (reg:TF 63) (mem:TF (reg:DI 62)))
    (set (reg:FPRX2 66) (subreg:FPRX2 (reg:TF 63) 0))
    (set (reg:FPRX2 65)
         (asm_operands:FPRX2 ("sqxbr %0,%1") ("=f") 0
                             [(reg:FPRX2 66)]
                             [(asm_input:FPRX2 ("f"))]
                             []))

into this:

    (set (reg:TF 63) (mem:TF (reg:DI 62)))
    (set (reg:FPRX2 65)
         (asm_operands:FPRX2 ("sqxbr %0,%1") ("=f") 0
                             [(subreg:FPRX2 (reg:TF 63) 0)]
                             [(asm_input:FPRX2 ("f"))]
                             []))

by propagating (reg:FPRX2 66), and there is not much combine can do about
this anymore:

    (set (reg:FPRX2 65)
         (asm_operands:FPRX2 ("sqxbr %0,%1") ("=f") 0
                             [(mem:FPRX2 (reg:DI 62))]
                             [(asm_input:FPRX2 ("f"))]
                             []))

is not a valid insn.
[PATCH] PING Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
Hello, I would like to ping the following patch: Add input_modes parameter to TARGET_MD_ASM_ADJUST hook https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html It is needed for the following regression fix: IBM Z: Fix usage of "f" constraint with long doubles https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html Best regards, Ilya
[PATCH v2] fwprop: Allow (subreg (mem)) simplifications
v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563800.html

v1 -> v2: Allow (mem) -> (subreg) propagation only for single uses.

Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux
and s390x-redhat-linux. Ok for master?

Suppose we have:

    (set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
    (set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))

It is clearly profitable to propagate the first insn into the second
one and get:

    (set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))

fwprop actually manages to perform this, but doesn't think the result is
worth it, which results in unnecessary store/load sequences on s390.
Improve the situation by classifying SUBREG -> MEM changes as
profitable.

gcc/ChangeLog:

2021-01-15  Ilya Leoshkevich

        * fwprop.c (fwprop_propagation::classify_result): Allow
        (subreg (mem)) simplifications.
--- gcc/fwprop.c | 22 +- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/gcc/fwprop.c b/gcc/fwprop.c index eff8f7cc141..02d3d507cbc 100644 --- a/gcc/fwprop.c +++ b/gcc/fwprop.c @@ -176,7 +176,7 @@ namespace static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1; static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2; -fwprop_propagation (rtx_insn *, rtx, rtx); +fwprop_propagation (rtx_insn *, insn_info *, rtx, rtx); bool changed_mem_p () const { return result_flags & CHANGED_MEM; } bool folded_to_constants_p () const; @@ -185,13 +185,18 @@ namespace bool check_mem (int, rtx) final override; void note_simplification (int, uint16_t, rtx, rtx) final override; uint16_t classify_result (rtx, rtx); + + private: +const bool single_use_p; }; } /* Prepare to replace FROM with TO in INSN. 
*/ -fwprop_propagation::fwprop_propagation (rtx_insn *insn, rtx from, rtx to) - : insn_propagation (insn, from, to) +fwprop_propagation::fwprop_propagation (rtx_insn *insn, insn_info *def_insn, + rtx from, rtx to) +: insn_propagation (insn, from, to), + single_use_p (def_insn->num_uses () == 1) { should_check_mems = true; should_note_simplifications = true; @@ -262,6 +267,13 @@ fwprop_propagation::classify_result (rtx old_rtx, rtx new_rtx) && GET_MODE (new_rtx) == GET_MODE_INNER (GET_MODE (from))) return PROFITABLE; + /* Allow (subreg (mem)) -> (mem) simplifications. Do not allow propagation + of (mem)s into multiple uses, since those are not profitable, as well as + creating new (mem/v)s, since DCE will not remove the old ones. */ + if (single_use_p && SUBREG_P (old_rtx) && MEM_P (new_rtx) + && !MEM_VOLATILE_P (new_rtx)) +return PROFITABLE; + return 0; } @@ -363,7 +375,7 @@ try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn, rtx_insn *use_rtl = use_insn->rtl (); insn_change_watermark watermark; - fwprop_propagation prop (use_rtl, dest, src); + fwprop_propagation prop (use_rtl, def_insn, dest, src); if (!prop.apply_to_rvalue (&XEXP (note, 0))) { if (dump_file && (dump_flags & TDF_DETAILS)) @@ -426,7 +438,7 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change, rtx_insn *use_rtl = use_insn->rtl (); insn_change_watermark watermark; - fwprop_propagation prop (use_rtl, dest, src); + fwprop_propagation prop (use_rtl, def_insn, dest, src); if (!prop.apply_to_pattern (loc)) { if (dump_file && (dump_flags & TDF_DETAILS)) -- 2.26.2
Re: [PATCH] fwprop: Allow (subreg (mem)) simplifications
On Tue, 2021-01-19 at 09:41 +0100, Richard Biener wrote:
> On Mon, Jan 18, 2021 at 11:04 PM Ilya Leoshkevich via Gcc-patches
> wrote:
>
> > Suppose we have:
> >
> >     (set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
> >     (set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))
> >
> > It is clearly profitable to propagate the first insn into the
> > second one and get:
> >
> >     (set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))
> >
> > fwprop actually manages to perform this, but doesn't think the
> > result is worth it, which results in unnecessary store/load
> > sequences on s390. Improve the situation by classifying
> > SUBREG -> MEM changes as profitable.
>
> IIRC fwprop also propagates into multiple uses and replacing a non-MEM
> with a MEM is only good when the original MEM goes away - is that
> properly dealt with here?

This is because of efficiency and not correctness reasons, right? For
correctness I already check MEM_VOLATILE_P (new_rtx). For efficiency I
think it would be reasonable to add def_insn->num_uses () == 1 check
(this passes my tests, I'm yet to do a full regtest though). What do you
think about this?
[PATCH] fwprop: Allow (subreg (mem)) simplifications
Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux
and s390x-redhat-linux. I realize it might be too late for a change
like this, but it's desirable to have this in conjunction with the
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html s390
regression fix, which otherwise produces unnecessary store/load
sequences in certain glibc routines, e.g. __ieee754_sqrtl. Ok for
master?

Suppose we have:

    (set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
    (set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))

It is clearly profitable to propagate the first insn into the second
one and get:

    (set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))

fwprop actually manages to perform this, but doesn't think the result is
worth it, which results in unnecessary store/load sequences on s390.
Improve the situation by classifying SUBREG -> MEM changes as
profitable.

gcc/ChangeLog:

2021-01-15  Ilya Leoshkevich

        * fwprop.c (fwprop_propagation::classify_result): Allow
        (subreg (mem)) simplifications.

gcc/testsuite/ChangeLog:

2021-01-15  Ilya Leoshkevich

        * gcc.target/s390/vector/long-double-to-i64.c: Expect that
        float-vector moves do *not* happen.
--- gcc/fwprop.c | 5 + gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c | 3 +-- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/gcc/fwprop.c b/gcc/fwprop.c index eff8f7cc141..46b8ec7eccf 100644 --- a/gcc/fwprop.c +++ b/gcc/fwprop.c @@ -262,6 +262,11 @@ fwprop_propagation::classify_result (rtx old_rtx, rtx new_rtx) && GET_MODE (new_rtx) == GET_MODE_INNER (GET_MODE (from))) return PROFITABLE; + /* Allow (subreg (mem)) -> (mem) simplifications. However, do not allow + creating new (mem/v)s, since DCE will not remove the old ones. 
*/ + if (SUBREG_P (old_rtx) && MEM_P (new_rtx) && !MEM_VOLATILE_P (new_rtx)) +return PROFITABLE; + return 0; } diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c b/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c index 2dbbb5d1c03..8f4e377ed72 100644 --- a/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c +++ b/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c @@ -10,8 +10,7 @@ long_double_to_i64 (long double x) return x; } -/* { dg-final { scan-assembler-times {\n\tvpdi\t%v\d+,%v\d+,%v\d+,1\n} 1 } } */ -/* { dg-final { scan-assembler-times {\n\tvpdi\t%v\d+,%v\d+,%v\d+,5\n} 1 } } */ +/* { dg-final { scan-assembler-not {\n\tvpdi\t} } } */ /* { dg-final { scan-assembler-times {\n\tcgxbr\t} 1 } } */ int -- 2.26.2
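The rule added by this patch, together with the single-use check floated in
the review reply, can be pictured with a small toy model.  This is only a
sketch: the toy_* names and the struct are made up for illustration and are
not GCC's rtx representation or fwprop's real interface.

```c
#include <stdbool.h>

/* Toy stand-ins for RTL codes; hypothetical, not GCC's real rtx.  */
enum toy_code { TOY_REG, TOY_SUBREG, TOY_MEM };

struct toy_rtx
{
  enum toy_code code;
  bool is_volatile;   /* models MEM_VOLATILE_P */
};

/* Return true if replacing OLD_RTX with NEW_RTX counts as profitable
   under the SUBREG -> MEM rule.  If REQUIRE_SINGLE_USE is set, also
   insist that the propagated definition has exactly one use, which is
   the extra efficiency check discussed in the review.  */
static bool
toy_subreg_to_mem_profitable (const struct toy_rtx *old_rtx,
                              const struct toy_rtx *new_rtx,
                              unsigned num_uses, bool require_single_use)
{
  if (old_rtx->code != TOY_SUBREG || new_rtx->code != TOY_MEM)
    return false;
  /* Never create a new volatile access.  */
  if (new_rtx->is_volatile)
    return false;
  if (require_single_use && num_uses != 1)
    return false;
  return true;
}
```

With the single-use requirement switched on, a definition that feeds two
insns is left alone, so the original load cannot end up duplicated.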
[PATCH] IBM Z: Fix usage of "f" constraint with long doubles
Bootstrapped and regtested on s390x-redhat-linux.  Depends on
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html; ok
for master once the dependency is committed?

After switching the s390 backend to store long doubles in vector
registers, the "f" constraint broke when used with long doubles: they
correspond to TFmode, which in combination with "f" corresponds to
hard regs %v0-%v15, whereas asm users expect a %f0-%f15 pair.

Fix by using the TARGET_MD_ASM_ADJUST hook to convert TFmode values to
FPRX2mode and back.

gcc/ChangeLog:

2020-12-14  Ilya Leoshkevich

        * config/s390/s390.c (s390_md_asm_adjust): Implement
        TARGET_MD_ASM_ADJUST.
        (TARGET_MD_ASM_ADJUST): Likewise.
        * config/s390/vector.md (fprx2_to_tf): Rename from *fprx2_to_tf,
        add memory alternative.
        (tf_to_fprx2): New pattern.

gcc/testsuite/ChangeLog:

2020-12-14  Ilya Leoshkevich

        * gcc.target/s390/vector/long-double-asm-abi.c: New test.
        * gcc.target/s390/vector/long-double-asm-in-out.c: New test.
        * gcc.target/s390/vector/long-double-asm-inout.c: New test.
        * gcc.target/s390/vector/long-double-volatile-from-i64.c: New test.
---
 gcc/config/s390/s390.c                        | 73 +++
 gcc/config/s390/vector.md                     | 36 +++--
 .../s390/vector/long-double-asm-abi.c         | 26 +++
 .../s390/vector/long-double-asm-in-out.c      | 14
 .../s390/vector/long-double-asm-inout.c       | 14
 .../vector/long-double-volatile-from-i64.c    | 22 ++
 6 files changed, 180 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-abi.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-inout.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-volatile-from-i64.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 9d2cee950d0..a22fd9fe391 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16688,6 +16688,76 @@ s390_shift_truncation_mask (machine_mode mode)
   return mode == DImode || mode == SImode ? 63 : 0;
 }
 
+/* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f"
+   constraints when long doubles are stored in vector registers.  */
+
+static rtx_insn *
+s390_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &inputs,
+                    vec<machine_mode> &input_modes,
+                    vec<const char *> &constraints, vec<rtx> & /*clobbers*/,
+                    HARD_REG_SET & /*clobbered_regs*/)
+{
+  if (!TARGET_VXE)
+    /* Long doubles are stored in FPR pairs - nothing to do.  */
+    return NULL;
+
+  rtx_insn *after_md_seq = NULL, *after_md_end = NULL;
+
+  unsigned ninputs = inputs.length ();
+  unsigned noutputs = outputs.length ();
+  for (unsigned i = 0; i < noutputs; i++)
+    {
+      if (GET_MODE (outputs[i]) != TFmode)
+        /* Not a long double - nothing to do.  */
+        continue;
+      const char *constraint = constraints[i];
+      bool allows_mem, allows_reg, is_inout;
+      bool ok = parse_output_constraint (&constraint, i, ninputs, noutputs,
+                                         &allows_mem, &allows_reg, &is_inout);
+      gcc_assert (ok);
+      if (strcmp (constraint, "=f") != 0)
+        /* Long double with a constraint other than "=f" - nothing to do.  */
+        continue;
+      gcc_assert (allows_reg);
+      gcc_assert (!allows_mem);
+      gcc_assert (!is_inout);
+      /* Copy output value from a FPR pair into a vector register.  */
+      rtx fprx2 = gen_reg_rtx (FPRX2mode);
+      push_to_sequence2 (after_md_seq, after_md_end);
+      emit_insn (gen_fprx2_to_tf (outputs[i], fprx2));
+      after_md_seq = get_insns ();
+      after_md_end = get_last_insn ();
+      end_sequence ();
+      outputs[i] = fprx2;
+    }
+
+  for (unsigned i = 0; i < ninputs; i++)
+    {
+      if (GET_MODE (inputs[i]) != TFmode)
+        /* Not a long double - nothing to do.  */
+        continue;
+      const char *constraint = constraints[noutputs + i];
+      bool allows_mem, allows_reg;
+      bool ok = parse_input_constraint (&constraint, i, ninputs, noutputs, 0,
+                                        constraints.address (), &allows_mem,
+                                        &allows_reg);
+      gcc_assert (ok);
+      if (strcmp (constraint, "f") != 0 && strcmp (constraint, "=f") != 0)
+        /* Long double with a constraint other than "f" (or "=f" for inout
+           operands) - nothing to do.  */
+        continue;
+      gcc_assert (allows_reg);
+      gcc_assert (!allows_mem);
+      /* Copy input value from a vector register into a FPR pair.  */
+      rtx fprx2 = gen_reg_rtx (FPRX2mode);
+      emit_insn (gen_tf_to_fprx2 (fprx2, inputs[i]));
+      inputs[i] = fprx2;
+      input_modes[i] = FPRX2mode;
+    }
+
+  return after_md_seq;
+}
+
 /* Initialize GCC target structure.  */
 
 #undef TARGET_ASM_ALIGNED_HI_OP
@@
[PATCH] lra: clear lra_insn_recog_data after simplifying a mem subreg
Hello,

I ran into this problem when writing new patterns for s390.  I'm not
100% sure this fix is correct, but it resolves my issue and survives
bootstrap and regtest on x86_64-redhat-linux, ppc64le-redhat-linux and
s390x-redhat-linux.

Could you please take a look?

Best regards,
Ilya

Suppose we have:

    (insn (set (reg:FPRX2 70)
               (subreg:FPRX2 (reg/v:TF 63) 0)))

where operand_loc[0] points to r70 and operand_loc[1] points to r63.  If
r63 is spilled, remove_pseudos() will change this insn to:

    (insn (set (reg:FPRX2 70)
               (subreg:FPRX2 (mem/c:TF (plus:DI (reg:DI %fp)
                                                (const_int 144))))))

This is fine so far: the rtx pointed to by operand_loc[1] has been
changed from (reg) to (mem), but its slot is still under (subreg).
However, alter_subreg() will simplify this insn to:

    (insn (set (reg:FPRX2 70)
               (mem/c:FPRX2 (plus:DI (reg:DI %fp)
                                     (const_int 144)))))

The (subreg) is gone, and therefore operand_loc[1] is no longer valid.
This will prevent process_insn_for_elimination() from updating the spill
slot offset, causing miscompilation: different instructions will refer
to the same spill slot using different offsets.

Fix by clearing all the cached data, and not just
used_insn_alternative.

gcc/ChangeLog:

2021-01-13  Ilya Leoshkevich

        * lra-spills.c (remove_pseudos): Call
        lra_update_insn_recog_data() after calling alter_subreg() on a
        (mem).
---
 gcc/lra-spills.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/lra-spills.c b/gcc/lra-spills.c
index 26f56b2df02..01bd82574e7 100644
--- a/gcc/lra-spills.c
+++ b/gcc/lra-spills.c
@@ -431,7 +431,7 @@ remove_pseudos (rtx *loc, rtx_insn *insn)
       alter_subreg (loc, false);
       if (GET_CODE (*loc) == MEM)
         {
-          lra_get_insn_recog_data (insn)->used_insn_alternative = -1;
+          lra_update_insn_recog_data (insn);
           if (lra_dump_file != NULL)
             fprintf (lra_dump_file,
                      "Memory subreg was simplified in insn #%u\n",
-- 
2.26.2
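The stale operand_loc problem described in this message can be modelled
outside of GCC.  The following is only a toy sketch under loud assumptions:
the toy_* types and functions are invented for illustration, and the
detached wrapper is deliberately kept alive so that looking at the stale
slot stays well-defined.

```c
#include <stddef.h>

/* Toy stand-ins for RTL; hypothetical names, not GCC's real types.  */
enum toy_code { TOY_REG, TOY_MEM, TOY_SUBREG };

struct toy_rtx
{
  enum toy_code code;
  struct toy_rtx *inner;   /* operand of a TOY_SUBREG, else NULL */
};

struct toy_insn
{
  struct toy_rtx *src;            /* source of the (set ...) */
  struct toy_rtx **operand_loc;   /* cached slot that holds the operand */
};

/* Recompute the cached slot from the current shape of the insn.  This
   plays the role of refreshing all cached recog data.  */
static void
toy_update_recog_data (struct toy_insn *insn)
{
  insn->operand_loc = (insn->src->code == TOY_SUBREG
                       ? &insn->src->inner : &insn->src);
}

/* Fold (subreg (mem)) into (mem).  The old wrapper is only detached,
   not freed, so the stale slot can still be inspected safely here.  */
static void
toy_alter_subreg (struct toy_insn *insn)
{
  struct toy_rtx *wrapper = insn->src;
  if (wrapper->code == TOY_SUBREG && wrapper->inner->code == TOY_MEM)
    {
      insn->src = wrapper->inner;
      wrapper->inner = NULL;
    }
}

/* Return nonzero if the cached slot still refers to the live operand.  */
static int
toy_cache_valid_p (const struct toy_insn *insn)
{
  return *insn->operand_loc != NULL
         && (*insn->operand_loc == insn->src
             || (insn->src->code == TOY_SUBREG
                 && *insn->operand_loc == insn->src->inner));
}
```

In this model the cache is valid before toy_alter_subreg (), invalid right
after it, and valid again once toy_update_recog_data () has run, which
mirrors why resetting a single field is not enough.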
[PATCH] IBM Z: Fix constraints in vpdi patterns
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?

The destination register is only partially overwritten, so + should be
used instead of =.

gcc/ChangeLog:

2021-01-08  Ilya Leoshkevich

        * config/s390/vector.md (*tf_to_fprx2_0): Rename from
        *mov_tf_to_fprx2_0 for consistency, fix constraint.
        (*tf_to_fprx2_1): Rename from *mov_tf_to_fprx2_1 for
        consistency, fix constraint.
---
 gcc/config/s390/vector.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 5b8d75f18f0..0e3c31f5d4f 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -737,16 +737,16 @@ (define_insn "*vec_perm"
   "vperm\t%v0,%v1,%v2,%v3"
   [(set_attr "op_type" "VRR")])
 
-(define_insn "*mov_tf_to_fprx2_0"
-  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 0)
+(define_insn "*tf_to_fprx2_0"
+  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
         (subreg:DF (match_operand:TF 1 "general_operand" "v") 0))]
   "TARGET_VXE"
   ; M4 == 1 corresponds to %v0[0] = %v1[0]; %v0[1] = %v0[1];
   "vpdi\t%v0,%v1,%v0,1"
   [(set_attr "op_type" "VRR")])
 
-(define_insn "*mov_tf_to_fprx2_1"
-  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 8)
+(define_insn "*tf_to_fprx2_1"
+  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
         (subreg:DF (match_operand:TF 1 "general_operand" "v") 8))]
   "TARGET_VXE"
   ; M4 == 5 corresponds to %V0[0] = %v1[1]; %V0[1] = %V0[1];
-- 
2.26.2
[PATCH v2] IBM Z: Introduce __LONG_DOUBLE_VX__ macro
v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563034.html
v1 -> v2: Use TARGET_VXE_P instead of TARGET_Z14_P.

Give end users the opportunity to find out whether long doubles are
stored in floating-point register pairs or in vector registers, so that
they can fine-tune their asm statements.

gcc/ChangeLog:

2020-12-14  Ilya Leoshkevich

        * config/s390/s390-c.c (s390_def_or_undef_macro): Accept
        callables instead of mask values.
        (struct target_flag_set_p): New predicate.
        (s390_cpu_cpp_builtins_internal): Define or undefine
        __LONG_DOUBLE_VX__ macro.

gcc/testsuite/ChangeLog:

2020-12-14  Ilya Leoshkevich

        * gcc.target/s390/vector/long-double-vx-macro-off.c: New test.
        * gcc.target/s390/vector/long-double-vx-macro-on.c: New test.
---
 gcc/config/s390/s390-c.c                      | 59 ---
 .../s390/vector/long-double-vx-macro-off-on.c | 11
 .../s390/vector/long-double-vx-macro-on-off.c | 11
 3 files changed, 60 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c

diff --git a/gcc/config/s390/s390-c.c b/gcc/config/s390/s390-c.c
index 95cd2df505d..a5f5f56311a 100644
--- a/gcc/config/s390/s390-c.c
+++ b/gcc/config/s390/s390-c.c
@@ -294,9 +294,9 @@ s390_macro_to_expand (cpp_reader *pfile, const cpp_token *tok)
 /* Helper function that defines or undefines macros.  If SET is true, the
    macro MACRO_DEF is defined.  If SET is false, the macro MACRO_UNDEF is
    undefined.  Nothing is done if SET and WAS_SET have the same value.  */
+template <typename F>
 static void
-s390_def_or_undef_macro (cpp_reader *pfile,
-                         unsigned int mask,
+s390_def_or_undef_macro (cpp_reader *pfile, F is_set,
                          const struct cl_target_option *old_opts,
                          const struct cl_target_option *new_opts,
                          const char *macro_def, const char *macro_undef)
@@ -304,8 +304,8 @@ s390_def_or_undef_macro (cpp_reader *pfile,
   bool was_set;
   bool set;
 
-  was_set = (!old_opts) ? false : old_opts->x_target_flags & mask;
-  set = new_opts->x_target_flags & mask;
+  was_set = (!old_opts) ? false : is_set (old_opts);
+  set = is_set (new_opts);
   if (was_set == set)
     return;
   if (set)
@@ -314,6 +314,19 @@ s390_def_or_undef_macro (cpp_reader *pfile,
     cpp_undef (pfile, macro_undef);
 }
 
+struct target_flag_set_p
+{
+  target_flag_set_p (unsigned int mask) : m_mask (mask) {}
+
+  bool
+  operator() (const struct cl_target_option *opts) const
+  {
+    return opts->x_target_flags & m_mask;
+  }
+
+  unsigned int m_mask;
+};
+
 /* Internal function to either define or undef the appropriate system
    macros.  */
 static void
@@ -321,18 +334,18 @@ s390_cpu_cpp_builtins_internal (cpp_reader *pfile,
                                 struct cl_target_option *opts,
                                 const struct cl_target_option *old_opts)
 {
-  s390_def_or_undef_macro (pfile, MASK_OPT_HTM, old_opts, opts,
-                           "__HTM__", "__HTM__");
-  s390_def_or_undef_macro (pfile, MASK_OPT_VX, old_opts, opts,
-                           "__VX__", "__VX__");
-  s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
-                           "__VEC__=10303", "__VEC__");
-  s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
-                           "__vector=__attribute__((vector_size(16)))",
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_OPT_HTM), old_opts,
+                           opts, "__HTM__", "__HTM__");
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_OPT_VX), old_opts,
+                           opts, "__VX__", "__VX__");
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_ZVECTOR), old_opts,
+                           opts, "__VEC__=10303", "__VEC__");
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_ZVECTOR), old_opts,
+                           opts, "__vector=__attribute__((vector_size(16)))",
                            "__vector__");
-  s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
-                           "__bool=__attribute__((s390_vector_bool)) unsigned",
-                           "__bool");
+  s390_def_or_undef_macro (
+    pfile, target_flag_set_p (MASK_ZVECTOR), old_opts, opts,
+    "__bool=__attribute__((s390_vector_bool)) unsigned", "__bool");
   {
     char macro_def[64];
     gcc_assert (s390_arch != PROCESSOR_NATIVE);
@@ -340,16 +353,20 @@ s390_cpu_cpp_builtins_internal (cpp_reader *pfile,
     cpp_undef (pfile, "__ARCH__");
     cpp_define (pfile, macro_def);
   }
+  s390_def_or_undef_macro (
+    pfile,
+    [] (const struct cl_target_option *opts) { return TARGET_VXE_P (opts); },
+    old_opts, opts, "__LONG_DOUBLE_VX__", "__LONG_DOUBLE_VX__");
 
   if (!flag_iso)
     {
-      s390_def_or_undef_macro (pfile,
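On the user side, the new macro could be consumed roughly as follows.  This
is a minimal sketch: only __LONG_DOUBLE_VX__ comes from the patch, the
__s390x__ and __s390__ checks are the usual GCC target macros, and the
function name and returned strings are made up for illustration.

```c
/* Report where this compiler keeps long double values.  Only
   __LONG_DOUBLE_VX__ is taken from the patch; everything else here is
   an illustrative assumption.  */
static const char *
long_double_location (void)
{
#if defined (__LONG_DOUBLE_VX__)
  return "vector register";
#elif defined (__s390x__) || defined (__s390__)
  return "floating-point register pair";
#else
  return "not an s390 target";
#endif
}
```

Real code would more likely use the same preprocessor test to select
between two variants of an asm statement than to return a string.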
[PATCH] IBM Z: Introduce __LONG_DOUBLE_VX__ macro
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?

Give end users the opportunity to find out whether long doubles are
stored in floating-point register pairs or in vector registers, so that
they can fine-tune their asm statements.

gcc/ChangeLog:

2020-12-14  Ilya Leoshkevich

        * config/s390/s390-c.c (s390_def_or_undef_macro): Accept
        callables instead of mask values.
        (struct target_flag_set_p): New predicate.
        (s390_cpu_cpp_builtins_internal): Define or undefine
        __LONG_DOUBLE_VX__ macro.

gcc/testsuite/ChangeLog:

2020-12-14  Ilya Leoshkevich

        * gcc.target/s390/vector/long-double-vx-macro-off.c: New test.
        * gcc.target/s390/vector/long-double-vx-macro-on.c: New test.
---
 gcc/config/s390/s390-c.c                      | 59 ---
 .../s390/vector/long-double-vx-macro-off-on.c | 11
 .../s390/vector/long-double-vx-macro-on-off.c | 11
 3 files changed, 60 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c

diff --git a/gcc/config/s390/s390-c.c b/gcc/config/s390/s390-c.c
index 95cd2df505d..29b87d76ab1 100644
--- a/gcc/config/s390/s390-c.c
+++ b/gcc/config/s390/s390-c.c
@@ -294,9 +294,9 @@ s390_macro_to_expand (cpp_reader *pfile, const cpp_token *tok)
 /* Helper function that defines or undefines macros.  If SET is true, the
    macro MACRO_DEF is defined.  If SET is false, the macro MACRO_UNDEF is
    undefined.  Nothing is done if SET and WAS_SET have the same value.  */
+template <typename F>
 static void
-s390_def_or_undef_macro (cpp_reader *pfile,
-                         unsigned int mask,
+s390_def_or_undef_macro (cpp_reader *pfile, F is_set,
                          const struct cl_target_option *old_opts,
                          const struct cl_target_option *new_opts,
                          const char *macro_def, const char *macro_undef)
@@ -304,8 +304,8 @@ s390_def_or_undef_macro (cpp_reader *pfile,
   bool was_set;
   bool set;
 
-  was_set = (!old_opts) ? false : old_opts->x_target_flags & mask;
-  set = new_opts->x_target_flags & mask;
+  was_set = (!old_opts) ? false : is_set (old_opts);
+  set = is_set (new_opts);
   if (was_set == set)
     return;
   if (set)
@@ -314,6 +314,19 @@ s390_def_or_undef_macro (cpp_reader *pfile,
     cpp_undef (pfile, macro_undef);
 }
 
+struct target_flag_set_p
+{
+  target_flag_set_p (unsigned int mask) : m_mask (mask) {}
+
+  bool
+  operator() (const struct cl_target_option *opts) const
+  {
+    return opts->x_target_flags & m_mask;
+  }
+
+  unsigned int m_mask;
+};
+
 /* Internal function to either define or undef the appropriate system
    macros.  */
 static void
@@ -321,18 +334,18 @@ s390_cpu_cpp_builtins_internal (cpp_reader *pfile,
                                 struct cl_target_option *opts,
                                 const struct cl_target_option *old_opts)
 {
-  s390_def_or_undef_macro (pfile, MASK_OPT_HTM, old_opts, opts,
-                           "__HTM__", "__HTM__");
-  s390_def_or_undef_macro (pfile, MASK_OPT_VX, old_opts, opts,
-                           "__VX__", "__VX__");
-  s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
-                           "__VEC__=10303", "__VEC__");
-  s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
-                           "__vector=__attribute__((vector_size(16)))",
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_OPT_HTM), old_opts,
+                           opts, "__HTM__", "__HTM__");
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_OPT_VX), old_opts,
+                           opts, "__VX__", "__VX__");
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_ZVECTOR), old_opts,
+                           opts, "__VEC__=10303", "__VEC__");
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_ZVECTOR), old_opts,
+                           opts, "__vector=__attribute__((vector_size(16)))",
                            "__vector__");
-  s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
-                           "__bool=__attribute__((s390_vector_bool)) unsigned",
-                           "__bool");
+  s390_def_or_undef_macro (
+    pfile, target_flag_set_p (MASK_ZVECTOR), old_opts, opts,
+    "__bool=__attribute__((s390_vector_bool)) unsigned", "__bool");
   {
     char macro_def[64];
     gcc_assert (s390_arch != PROCESSOR_NATIVE);
@@ -340,16 +353,20 @@ s390_cpu_cpp_builtins_internal (cpp_reader *pfile,
     cpp_undef (pfile, "__ARCH__");
     cpp_define (pfile, macro_def);
   }
+  s390_def_or_undef_macro (
+    pfile,
+    [] (const struct cl_target_option *opts) { return TARGET_Z14_P (opts); },
+    old_opts, opts, "__LONG_DOUBLE_VX__", "__LONG_DOUBLE_VX__");
 
   if (!flag_iso)
     {
-      s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
-
[PATCH] Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
Bootstrapped and regtested on x86_64-redhat-linux.  I also built
cross-compilers for arm-linux-gnueabi, cris-elf, mn10300-elf,
nds32-linux-gnu, pdp11-aout (didn't fully work due to
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg251887.html,
but the changed code compiled fine), powerpc-linux-gnu, vax-linux-gnu
and visium-elf, but didn't test them.

I ran into this issue while implementing TARGET_MD_ASM_ADJUST for s390.
Ok for master?

If TARGET_MD_ASM_ADJUST changes the mode of an input operand (which
should be ok as long as the hook itself as well as after_md_seq make up
for it), input_mode will contain stale information.

It might be tempting to fix this by removing input_mode altogether and
just using GET_MODE (), but this will not work correctly with constants.
So add an input_modes parameter and document that it should be updated
whenever the inputs parameter is updated.

gcc/ChangeLog:

2021-01-05  Ilya Leoshkevich

        * cfgexpand.c (expand_asm_loc): Pass new parameter.
        (expand_asm_stmt): Likewise.
        * config/arm/aarch-common-protos.h (arm_md_asm_adjust): Add new
        parameter.
        * config/arm/aarch-common.c (arm_md_asm_adjust): Likewise.
        * config/arm/arm.c (thumb1_md_asm_adjust): Likewise.
        * config/cris/cris.c (cris_md_asm_adjust): Likewise.
        * config/i386/i386.c (ix86_md_asm_adjust): Likewise.
        * config/mn10300/mn10300.c (mn10300_md_asm_adjust): Likewise.
        * config/nds32/nds32.c (nds32_md_asm_adjust): Likewise.
        * config/pdp11/pdp11.c (pdp11_md_asm_adjust): Likewise.
        * config/rs6000/rs6000.c (rs6000_md_asm_adjust): Likewise.
        * config/vax/vax.c (vax_md_asm_adjust): Likewise.
        * config/visium/visium.c (visium_md_asm_adjust): Likewise.
        * target.def (md_asm_adjust): Likewise.
---
 gcc/cfgexpand.c                      | 16
 gcc/config/arm/aarch-common-protos.h |  8
 gcc/config/arm/aarch-common.c        |  7 ---
 gcc/config/arm/arm.c                 | 14 --
 gcc/config/cris/cris.c               |  7 ---
 gcc/config/i386/i386.c               |  7 ---
 gcc/config/mn10300/mn10300.c         |  7 ---
 gcc/config/nds32/nds32.c             |  1 +
 gcc/config/pdp11/pdp11.c             |  9 +
 gcc/config/rs6000/rs6000.c           |  7 ---
 gcc/config/vax/vax.c                 |  3 ++-
 gcc/config/visium/visium.c           | 12 +++-
 gcc/doc/tm.texi                      | 10 ++
 gcc/target.def                       | 13 -
 14 files changed, 69 insertions(+), 52 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index b73019b241f..e25528261a0 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -2879,6 +2879,7 @@ expand_asm_loc (tree string, int vol, location_t locus)
       rtx asm_op, clob;
       unsigned i, nclobbers;
       auto_vec<rtx> input_rvec, output_rvec;
+      auto_vec<machine_mode> input_mode;
       auto_vec<const char *> constraints;
       auto_vec<rtx> clobber_rvec;
       HARD_REG_SET clobbered_regs;
@@ -2888,9 +2889,8 @@ expand_asm_loc (tree string, int vol, location_t locus)
       clobber_rvec.safe_push (clob);
 
       if (targetm.md_asm_adjust)
-        targetm.md_asm_adjust (output_rvec, input_rvec,
-                               constraints, clobber_rvec,
-                               clobbered_regs);
+        targetm.md_asm_adjust (output_rvec, input_rvec, input_mode,
+                               constraints, clobber_rvec, clobbered_regs);
 
       asm_op = body;
       nclobbers = clobber_rvec.length ();
@@ -3067,8 +3067,8 @@ expand_asm_stmt (gasm *stmt)
       return;
     }
 
-  /* There are some legacy diagnostics in here, and also avoids a
-     sixth parameger to targetm.md_asm_adjust.  */
+  /* There are some legacy diagnostics in here, and also avoids an extra
+     parameter to targetm.md_asm_adjust.  */
   save_input_location s_i_l(locus);
 
   unsigned noutputs = gimple_asm_noutputs (stmt);
@@ -3419,9 +3419,9 @@ expand_asm_stmt (gasm *stmt)
      the flags register.  */
   rtx_insn *after_md_seq = NULL;
   if (targetm.md_asm_adjust)
-    after_md_seq = targetm.md_asm_adjust (output_rvec, input_rvec,
-                                          constraints, clobber_rvec,
-                                          clobbered_regs);
+    after_md_seq
+        = targetm.md_asm_adjust (output_rvec, input_rvec, input_mode,
+                                 constraints, clobber_rvec, clobbered_regs);
 
   /* Do not allow the hook to change the output and input count,
      lest it mess up the operand numbering.  */
diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h
index 251de3d61a8..cbef50dde71 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -143,9 +143,9 @@ struct cpu_cost_table
   const struct vector_cost_table vect;
 };
 
-rtx_insn *
-arm_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &/*inputs*/,
-                   vec<const char *> &constraints,
-
[PATCH] IBM Z: Fix check_effective_target_s390_z14_hw
Bootstrapped and regtested on z14.  Ok for master?

Commit 2f473f4b065d ("IBM Z: Do not run long double tests on old
machines") introduced a predicate for tests that must run only on z14+.
However, due to a syntax error, the predicate always returns false.

gcc/testsuite/ChangeLog:

2020-12-10  Ilya Leoshkevich

        * gcc.target/s390/s390.exp: Replace %% with %.
---
 gcc/testsuite/gcc.target/s390/s390.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/s390/s390.exp b/gcc/testsuite/gcc.target/s390/s390.exp
index ba493de9f95..57b2690f8ab 100644
--- a/gcc/testsuite/gcc.target/s390/s390.exp
+++ b/gcc/testsuite/gcc.target/s390/s390.exp
@@ -197,7 +197,7 @@ proc check_effective_target_s390_z14_hw { } {
         int main (void)
         {
             int x = 0;
-            asm ("msgrkc %%0,%%0,%%0" : "+r" (x) : );
+            asm ("msgrkc %0,%0,%0" : "+r" (x) : );
             return x;
         }
     }] "-march=z14 -m64 -mzarch" ] } { return 0 } else { return 1 }
-- 
2.26.2
[PATCH v2] aix: Fixinclude updates [PR98208]
On Fri, 2020-12-11 at 07:51 -0500, Nathan Sidwell wrote:
>
> I'm pretty sure this is wrong.  I think the test_text in inclhack.def
> should be a pre-fixed string that the testsuite presumably checks is
> converted.

You're right; I've added your change from the Bugzilla and updated the
expectation.  Does the following look better?

After 92648faa1cb2 ("aix: Fixinclude") make check-fixincludes began to
fail (at least on gcc121 machine).  Fix by updating fixincludes/tests
and rerunning genfixes.

Co-developed-by: Nathan Sidwell

fixincludes/ChangeLog:

2020-12-11  Ilya Leoshkevich

        * fixincl.x: Rerun genfixes.
        * inclhack.def (aix_physadr_t): Change test_text to something
        that needs to be replaced.
        * tests/base/sys/types.h (aix_physadr_t): Add expectation.
---
 fixincludes/fixincl.x              | 4 ++--
 fixincludes/inclhack.def           | 2 +-
 fixincludes/tests/base/sys/types.h | 5 +++++
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/fixincludes/fixincl.x b/fixincludes/fixincl.x
index 21439652bce..cc17edfba0b 100644
--- a/fixincludes/fixincl.x
+++ b/fixincludes/fixincl.x
@@ -2,11 +2,11 @@
  *
  * DO NOT EDIT THIS FILE (fixincl.x)
  *
- * It has been AutoGen-ed October 21, 2020 at 10:43:22 AM by AutoGen 5.18.16
+ * It has been AutoGen-ed December 9, 2020 at 11:16:08 AM by AutoGen 5.18.16
  * From the definitions inclhack.def
  * and the template file fixincl
  */
-/* DO NOT SVN-MERGE THIS FILE, EITHER Wed Oct 21 10:43:22 EDT 2020
+/* DO NOT SVN-MERGE THIS FILE, EITHER Wed Dec 9 11:16:08 EST 2020
  *
  * You must regenerate it.  Use the ./genfixes script.
  *
diff --git a/fixincludes/inclhack.def b/fixincludes/inclhack.def
index 80c9adfb07c..3a4cfe06542 100644
--- a/fixincludes/inclhack.def
+++ b/fixincludes/inclhack.def
@@ -731,7 +731,7 @@ fix = {
     select = "typedef[ \t]*struct[ \t]*([{][^}]*[}][ \t]*\\*[ \t]*physadr_t;)";
     c_fix = format;
     c_fix_arg = "typedef struct __physadr_s %1";
-    test_text = "typedef struct __physadr_s {";
+    test_text = "typedef struct { random stuff } * physadr_t;";
 };
 
 /*
diff --git a/fixincludes/tests/base/sys/types.h b/fixincludes/tests/base/sys/types.h
index 683b5e93ecd..7340e76b175 100644
--- a/fixincludes/tests/base/sys/types.h
+++ b/fixincludes/tests/base/sys/types.h
@@ -9,6 +9,11 @@
 
 
 
+#if defined( AIX_PHYSADR_T_CHECK )
+typedef struct __physadr_s { random stuff } * physadr_t;
+#endif /* AIX_PHYSADR_T_CHECK */
+
+
 #if defined( GNU_TYPES_CHECK )
 #if !defined(_GCC_PTRDIFF_T)
 #define _GCC_PTRDIFF_T
-- 
2.25.4
[PATCH] aix: Fixinclude updates [PR98208]
Tested on gcc121 (x86_64 CentOS Linux 7).  Ok for master?

After 92648faa1cb2 ("aix: Fixinclude") make check-fixincludes began to
fail (at least on gcc121 machine).  Fix by updating fixincludes/tests
and rerunning genfixes.

fixincludes/ChangeLog:

2020-12-11  Ilya Leoshkevich

        * fixincl.x: Rerun genfixes.
        * tests/base/sys/types.h: Add AIX_PHYSADR_T_CHECK.
---
 fixincludes/fixincl.x              | 4 ++--
 fixincludes/tests/base/sys/types.h | 5 +++++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fixincludes/fixincl.x b/fixincludes/fixincl.x
index 21439652bce..cc17edfba0b 100644
--- a/fixincludes/fixincl.x
+++ b/fixincludes/fixincl.x
@@ -2,11 +2,11 @@
  *
  * DO NOT EDIT THIS FILE (fixincl.x)
  *
- * It has been AutoGen-ed October 21, 2020 at 10:43:22 AM by AutoGen 5.18.16
+ * It has been AutoGen-ed December 9, 2020 at 11:16:08 AM by AutoGen 5.18.16
  * From the definitions inclhack.def
  * and the template file fixincl
  */
-/* DO NOT SVN-MERGE THIS FILE, EITHER Wed Oct 21 10:43:22 EDT 2020
+/* DO NOT SVN-MERGE THIS FILE, EITHER Wed Dec 9 11:16:08 EST 2020
  *
  * You must regenerate it.  Use the ./genfixes script.
  *
diff --git a/fixincludes/tests/base/sys/types.h b/fixincludes/tests/base/sys/types.h
index 683b5e93ecd..a318f9b713b 100644
--- a/fixincludes/tests/base/sys/types.h
+++ b/fixincludes/tests/base/sys/types.h
@@ -9,6 +9,11 @@
 
 
 
+#if defined( AIX_PHYSADR_T_CHECK )
+typedef struct __physadr_s {
+#endif /* AIX_PHYSADR_T_CHECK */
+
+
 #if defined( GNU_TYPES_CHECK )
 #if !defined(_GCC_PTRDIFF_T)
 #define _GCC_PTRDIFF_T
-- 
2.25.4
[PATCH] Limit perf data buffer during feature checking
Bootstrapped and regtested on x86_64-redhat-linux.  Ok for master?

Commit 2ead1ab91123 ("Limit perf data buffer during profiling") added
-m8 to perf invocations during running tests, but the same problem
exists for checking whether perf is working in the first place.

gcc/testsuite/ChangeLog:

2020-12-08  Ilya Leoshkevich

        * lib/target-supports.exp (check_profiling_available): Limit
        perf data buffer.
---
 gcc/testsuite/lib/target-supports.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 89c4f67554f..75b4f5d0e85 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -654,7 +654,7 @@ proc check_profiling_available { test_what } {
             return 0
         }
         global srcdir
-        set status [remote_exec host "$srcdir/../config/i386/gcc-auto-profile" "true -v >/dev/null"]
+        set status [remote_exec host "$srcdir/../config/i386/gcc-auto-profile" "-m8 true -v >/dev/null"]
         if { [lindex $status 0] != 0 } {
             verbose "autofdo not supported because perf does not work"
             return 0
-- 
2.25.4
Re: [PATCH v4 1/2] asan: specify alignment for LASANPC labels
On Thu, 2020-07-09 at 14:07 +0200, Ilya Leoshkevich wrote:
> On Wed, 2020-07-01 at 21:48 +0200, Ilya Leoshkevich wrote:
> > On Wed, 2020-07-01 at 11:57 -0600, Jeff Law wrote:
> > > On Wed, 2020-07-01 at 14:29 +0200, Ilya Leoshkevich via
> > > Gcc-patches wrote:
> > > > gcc/ChangeLog:
> > > >
> > > > 2020-06-30  Ilya Leoshkevich
> > > >
> > > >         * asan.c (asan_emit_stack_protection): Use
> > > >         CODE_LABEL_BOUNDARY.
> > > >         * defaults.h (CODE_LABEL_BOUNDARY): New macro.
> > > >         * doc/tm.texi: Document CODE_LABEL_BOUNDARY.
> > > >         * doc/tm.texi.in: Likewise.
> > > Don't we already have the ability to set label alignments?  See
> > > LABEL_ALIGN.
> >
> > The following works with -falign-labels=2:
> >
> > --- a/gcc/asan.c
> > +++ b/gcc/asan.c
> > @@ -1524,7 +1524,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
> >    DECL_INITIAL (decl) = decl;
> >    TREE_ASM_WRITTEN (decl) = 1;
> >    TREE_ASM_WRITTEN (id) = 1;
> > -  SET_DECL_ALIGN (decl, CODE_LABEL_BOUNDARY);
> > +  SET_DECL_ALIGN (decl, (1 << LABEL_ALIGN (gen_label_rtx ())) * BITS_PER_UNIT);
> >    emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
> >    shadow_base = expand_binop (Pmode, lshr_optab, base,
> >                                gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
> >
> > In order to go this way, we would need to raise `-falign-labels=`
> > default to 2 for s390, which is not incorrect, but would
> > unnecessarily clutter asm with `.align 2` before each label.  So
> > IMHO it would be nicer to simply ask the backend "what is your
> > target's instruction alignment?".
>
> Besides that it would clutter asm with .align 2, another argument
> against using LABEL_ALIGN here is that it's semantically different
> from what is needed: -falign-labels value, which it returns, is
> specified by user for optimization purposes, whereas here we need to
> query the architecture's property.
>
> In practical terms, if user specifies -falign-labels=4096, this would
> affect how the code is generated here.  However, this would be
> completely unnecessary: we never jump to decl, its address is only
> saved for reporting.

Hi Jeff,

Could you please have another look at this one?

Best regards,
Ilya
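The arithmetic behind the LABEL_ALIGN-based expression quoted above can be
spelled out as a small worked example.  This is only a sketch: it assumes
8-bit units, and the toy_* names are made up rather than taken from GCC.

```c
/* Assumption: 8-bit bytes, i.e. the value BITS_PER_UNIT would have.  */
enum { TOY_BITS_PER_UNIT = 8 };

/* Convert a log2 byte alignment, which is what the quoted expression
   shifts by, into an alignment in bits.  */
static unsigned
toy_label_align_bits (unsigned align_log)
{
  return (1u << align_log) * TOY_BITS_PER_UNIT;
}
```

With -falign-labels=2 the log2 value is 1, giving 16 bits, which matches the
2-byte instruction alignment needed on s390.  With -falign-labels=4096 the
log2 value is 12, giving 32768 bits, far more than a label that is never
jumped to requires.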
Re: [PATCH RESEND] tree-ssa-threadbackward.c (profitable_jump_thread_path): Do not allow __builtin_constant_p.
On Wed, 2020-12-02 at 11:42 -0700, Jeff Law wrote:
>
> On 12/1/20 7:09 PM, Ilya Leoshkevich wrote:
> > On Tue, 2020-12-01 at 15:34 -0700, Jeff Law wrote:
> > > No strong opinions.  I think whichever is less invasive in terms
> > > of code quality is probably the way to go.  What we want to avoid
> > > is suppressing threading unnecessarily as that often leads to
> > > false positives from middle-end based warnings.  Suppressing
> > > threading can also lead to build failures in the kernel due to
> > > the way they use b_c_p.
> > I think v1 is better then.  Would you mind approving the following?
> > That's the same code as in v1, but with the improved commit message
> > and comments.
> >
> >
> >
> > Linux Kernel (specifically, drivers/leds/trigger/ledtrig-cpu.c)
> > build with GCC 10 fails on s390 with "impossible constraint".
> >
> > Explanation by Jeff Law:
> >
> > ```
> > So what we have is a b_c_p at the start of an if-else chain.
> > Subsequent tests on the "true" arm of the b_c_p test may throw us
> > off the constant path (because the constants are out of range).
> > Once all the tests are passed (it's constant and the constant is in
> > range) the true arm's terminal block has a special asm that
> > requires a constant argument.  In the case where we get to the
> > terminal block on the true arm, the argument to the b_c_p is used
> > as the constant argument to the special asm.
> >
> > At first glance jump threading seems to be doing the right thing.
> > Except that we end up with two paths to that terminal block with
> > the special asm, one for each of the two constant arguments to the
> > b_c_p call.  Naturally since that same value is used in the asm, we
> > have to introduce a PHI to select between them at the head of the
> > terminal block.  Now the argument in the asm is no longer constant
> > and boom we fail.
> > ```
> >
> > Fix by disallowing __builtin_constant_p on threading paths.
> >
> > gcc/ChangeLog:
> >
> > 2020-06-03  Ilya Leoshkevich
> >
> >         * tree-ssa-threadbackward.c
> >         (thread_jumps::profitable_jump_thread_path): Do not allow
> >         __builtin_constant_p on a threading path.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2020-06-03  Ilya Leoshkevich
> >
> >         * gcc.target/s390/builtin-constant-p-threading.c: New test.
> OK.  I think the old forward threader has the same problem.  Which I
> think can be fixed by returning NULL from
> record_temporary_equivalences_from_stmts_at_dest when we see the
> B_C_P call.  Fixing that in the obvious way is pre-approved once
> it's gone through the usual testing.

Thanks!  I've committed both:

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=70a62009181f66d1d1c90d3c74de38e153c96eb0
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=614aff0adf8fba5d843ec894603160151c20f0aa

Best regards,
Ilya
[PATCH] IBM Z: Build autovec-*-signaling-eq.c tests with exceptions
According to https://gcc.gnu.org/pipermail/gcc/2020-November/234344.html, GCC is allowed to perform optimizations that remove floating point traps, since they do not affect the modeled control flow. This interferes with two signaling comparison tests, where (a <= b && a >= b) is turned into (a <= b && a == b) by test_for_singularity, then into ((a <= b) & (a == b)) by the vectorizer, and finally into (a == b) by eliminate_redundant_comparison. Fix by turning traps into exceptions, which makes them affect the control flow. gcc/testsuite/ChangeLog: 2020-12-03 Ilya Leoshkevich * gcc.target/s390/zvector/autovec-double-signaling-eq.c: Build with exceptions. * gcc.target/s390/zvector/autovec-float-signaling-eq.c: Likewise. --- .../gcc.target/s390/zvector/autovec-double-signaling-eq.c | 2 +- .../gcc.target/s390/zvector/autovec-float-signaling-eq.c| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c index a8402b9f705..3645d3cc393 100644 --- a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c +++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -march=z14 -mzvector -mzarch" } */ +/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions -fnon-call-exceptions" } */ #include "autovec.h" diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c index 7dd91a5e6f3..d98aa0c494e 100644 --- a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c +++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -march=z14 -mzvector -mzarch" } */ +/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions -fnon-call-exceptions" } */ #include "autovec.h" -- 2.25.4
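For readers who want to see why the folding described above is value-preserving, here is a minimal sketch in plain C. It is not taken from the patch or from autovec.h; the function names are hypothetical. It only shows that (a <= b && a >= b) and (a == b) return the same result for every input, NaNs included, so the only observable difference is in the floating point exception flags, which this sketch deliberately does not inspect.

```c
#include <math.h>
#include <stdbool.h>

/* The form used by the signaling-eq tests: both relational operators are
   "signaling", i.e. IEEE 754 says they raise the invalid exception when an
   operand is a NaN.  */
bool
signaling_eq (double a, double b)
{
  return a <= b && a >= b;
}

/* The form the optimizers arrive at: == is a quiet comparison and raises
   nothing for quiet NaNs.  */
bool
quiet_eq (double a, double b)
{
  return a == b;
}

/* The two forms always agree on the returned value; only the exception
   flags can differ.  */
bool
same_value_p (double a, double b)
{
  return signaling_eq (a, b) == quiet_eq (a, b);
}

/* Check the NaN cases, where the difference in trapping behaviour would
   show up on hardware that distinguishes the two comparisons.  */
bool
same_value_for_nan_p (void)
{
  return same_value_p (NAN, 1.0)
	 && same_value_p (1.0, NAN)
	 && same_value_p (NAN, NAN);
}
```

Since the returned values are identical, a compiler that does not model traps as control flow sees no reason to keep the signaling form, which is the situation the -fexceptions -fnon-call-exceptions options are meant to change.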
[PATCH] Fix division by 0 in printf_strlen_execute when dumping
Bootstrap and regtest are running on x86_64-redhat-linux. Ok for master? gcc/ChangeLog: 2020-12-03 Ilya Leoshkevich * tree-ssa-strlen.c (printf_strlen_execute): Avoid division by 0. --- gcc/tree-ssa-strlen.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c index 741b47bca4a..522b2d45b3a 100644 --- a/gcc/tree-ssa-strlen.c +++ b/gcc/tree-ssa-strlen.c @@ -5684,7 +5684,7 @@ printf_strlen_execute (function *fun, bool warn_only) " failures: %u\n" " max_depth: %u\n", nidxs, - (nused * 100) / nidxs, + nidxs == 0 ? 0 : (nused * 100) / nidxs, walker.ptr_qry.var_cache->access_refs.length (), walker.ptr_qry.hits, walker.ptr_qry.misses, walker.ptr_qry.failures, walker.ptr_qry.max_depth); -- 2.25.4
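The guard in the hunk above can be looked at in isolation. The following is a small stand-alone sketch, assuming nothing about tree-ssa-strlen.c beyond the expression shown in the diff; percent_used and dump_stats are hypothetical names introduced only for illustration.

```c
#include <stdio.h>

/* Percentage of used entries.  Returns 0 for an empty table instead of
   dividing by zero, using the same conditional as the patch.  */
unsigned
percent_used (unsigned nused, unsigned nidxs)
{
  return nidxs == 0 ? 0 : (nused * 100) / nidxs;
}

/* Hypothetical dump helper that is safe to call even when no index has
   been recorded yet.  */
void
dump_stats (FILE *out, unsigned nused, unsigned nidxs)
{
  fprintf (out, "  indices: %u\n  used: %u%%\n", nidxs,
	   percent_used (nused, nidxs));
}
```

Reporting 0% for an empty table is a presentation choice; the point is only that the dump path must not fault when there is nothing to report.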
[PATCH v2] IBM Z: Use llihf and oilf to load large immediates into GPRs
Bootstrapped and regtested on s390x-redhat-linux. Ok for master? v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-December/560822.html v1 -> v2: - Use SYMBOL_REF_P. - Fix usage of gcc_assert. - Use GEN_INT. Currently GCC loads large immediates into GPRs from the literal pool, which is not as efficient as loading two halves with llihf and oilf. gcc/ChangeLog: 2020-11-30 Ilya Leoshkevich * config/s390/s390-protos.h (s390_const_int_pool_entry_p): New function. * config/s390/s390.c (s390_const_int_pool_entry_p): New function. * config/s390/s390.md: Add define_peephole2 that produces llihf and oilf. gcc/testsuite/ChangeLog: 2020-11-30 Ilya Leoshkevich * gcc.target/s390/load-imm64-1.c: New test. * gcc.target/s390/load-imm64-2.c: New test. --- gcc/config/s390/s390-protos.h| 1 + gcc/config/s390/s390.c | 31 gcc/config/s390/s390.md | 23 +++ gcc/testsuite/gcc.target/s390/load-imm64-1.c | 14 + gcc/testsuite/gcc.target/s390/load-imm64-2.c | 14 + 5 files changed, 83 insertions(+) create mode 100644 gcc/testsuite/gcc.target/s390/load-imm64-1.c create mode 100644 gcc/testsuite/gcc.target/s390/load-imm64-2.c diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h index ad2f7f77c18..eb10c3f4bbb 100644 --- a/gcc/config/s390/s390-protos.h +++ b/gcc/config/s390/s390-protos.h @@ -135,6 +135,7 @@ extern void s390_split_access_reg (rtx, rtx *, rtx *); extern void print_operand_address (FILE *, rtx); extern void print_operand (FILE *, rtx, int); extern void s390_output_pool_entry (rtx, machine_mode, unsigned int); +extern bool s390_const_int_pool_entry_p (rtx, HOST_WIDE_INT *); extern int s390_label_align (rtx_insn *); extern int s390_agen_dep_p (rtx_insn *, rtx_insn *); extern rtx_insn *s390_load_got (void); diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 02f18366aa1..fb48102559d 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -9400,6 +9400,37 @@ s390_output_pool_entry (rtx exp, machine_mode mode, unsigned int align) } } +/* 
Return true if MEM refers to an integer constant in the literal pool. If + VAL is not nullptr, then also fill it with the constant's value. */ + +bool +s390_const_int_pool_entry_p (rtx mem, HOST_WIDE_INT *val) +{ + /* Try to match the following: + - (mem (unspec [(symbol_ref) (reg)] UNSPEC_LTREF)). + - (mem (symbol_ref)). */ + + if (!MEM_P (mem)) +return false; + + rtx addr = XEXP (mem, 0); + rtx sym; + if (GET_CODE (addr) == UNSPEC && XINT (addr, 1) == UNSPEC_LTREF) +sym = XVECEXP (addr, 0, 0); + else +sym = addr; + + if (!SYMBOL_REF_P (sym) || !CONSTANT_POOL_ADDRESS_P (sym)) +return false; + + rtx val_rtx = get_pool_constant (sym); + if (!CONST_INT_P (val_rtx)) +return false; + + if (val != nullptr) +*val = INTVAL (val_rtx); + return true; +} /* Return an RTL expression representing the value of the return address for the frame COUNT steps up from the current frame. FRAME is the diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 910415a5974..d4cfbdf6732 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -2116,6 +2116,29 @@ (define_peephole2 [(set (match_dup 0) (plus:DI (match_dup 1) (match_dup 2)))] "") +; Split loading of 64-bit constants into GPRs into llihf + oilf - +; counterintuitively, using oilf is faster than iilf. oilf clobbers +; cc, so cc must be dead. +(define_peephole2 + [(set (match_operand:DI 0 "register_operand" "") + (match_operand:DI 1 "memory_operand" ""))] + "TARGET_64BIT + && TARGET_EXTIMM + && GENERAL_REG_P (operands[0]) + && s390_const_int_pool_entry_p (operands[1], nullptr) + && peep2_reg_dead_p (1, gen_rtx_REG (CCmode, CC_REGNUM))" + [(set (match_dup 0) (match_dup 2)) + (parallel +[(set (match_dup 0) (ior:DI (match_dup 0) (match_dup 3))) + (clobber (reg:CC CC_REGNUM))])] +{ + HOST_WIDE_INT val; + bool ok = s390_const_int_pool_entry_p (operands[1], &val); + gcc_assert (ok); + operands[2] = GEN_INT (val & 0xULL); + operands[3] = GEN_INT (val & 0xULL); +}) + ; ; movsi instruction pattern(s). 
; diff --git a/gcc/testsuite/gcc.target/s390/load-imm64-1.c b/gcc/testsuite/gcc.target/s390/load-imm64-1.c new file mode 100644 index 000..03d17f59096 --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/load-imm64-1.c @@ -0,0 +1,14 @@ +/* Test that large 64-bit constants are loaded with llihf + oilf when lgrl is + not available. */ + +/* { dg-do compile } */ +/* { dg-options "-O3 -march=z9-109" } */ + +unsigned long +magic (void) +{ + return 0x3f08c5392f756cd; +} + +/* { dg-final { scan-assembler-times {\n\tllihf\t} 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times {\n\toilf\t} 1 { target lp64 } } } */ diff --git a/gcc/testsuite/gcc.target
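As an illustration of the idea behind the peephole, the sketch below models in plain C how a 64-bit constant can be rebuilt from two 32-bit immediates. It is an assumption-laden model, not the patch's code: the mask values are the ones implied by "loading two halves", and the function names are hypothetical. The first step stands in for llihf (set the high 32 bits, clear the low 32 bits), the second for oilf (OR an immediate into the low 32 bits).

```c
#include <stdint.h>

/* High 32 bits kept in place, low 32 bits cleared: the value an
   llihf-style "load high half" step would leave in the register.  */
uint64_t
imm64_high_part (uint64_t val)
{
  return val & UINT64_C (0xffffffff00000000);
}

/* Low 32 bits only: the immediate an oilf-style OR step would add.  */
uint64_t
imm64_low_part (uint64_t val)
{
  return val & UINT64_C (0x00000000ffffffff);
}

/* Model of the two-step sequence: load the high half, then OR in the low
   half.  The result must equal the original constant.  */
uint64_t
imm64_rebuild (uint64_t val)
{
  uint64_t reg = imm64_high_part (val);	/* models llihf */
  reg |= imm64_low_part (val);		/* models oilf */
  return reg;
}
```

For the constant used in load-imm64-1.c, 0x3f08c5392f756cd, this gives a high part of 0x03f08c5300000000 and a low part of 0x92f756cd. Because the second step is an OR, it also sets the condition code on the real hardware, which is why the peephole requires cc to be dead.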
Re: [PATCH] IBM Z: Use llihf and oilf to load large immediates into GPRs
On Wed, 2020-12-02 at 08:15 +0100, Andreas Krebbel wrote: > On 12/2/20 2:34 AM, Ilya Leoshkevich wrote: > > Bootstrapped and regtesed on s390x-redhat-linux. There are slight > > improvements in all SPEC benchmarks, no regressions that could not > > be > > "fixed" by adding nops. Ok for master? > > > > > > > > Currently GCC loads large immediates into GPRs from the literal > > pool, > > which is not as efficient as loading two halves with llihf and > > oilf. > > > > gcc/ChangeLog: > > > > 2020-11-30 Ilya Leoshkevich > > > > * config/s390/s390-protos.h (s390_const_int_pool_entry_p): New > > function. > > * config/s390/s390.c (s390_const_int_pool_entry_p): New > > function. > > * config/s390/s390.md: Add define_peephole2 that produces llihf > > and oilf. > > > > gcc/testsuite/ChangeLog: > > > > 2020-11-30 Ilya Leoshkevich > > > > * gcc.target/s390/load-imm64-1.c: New test. > > * gcc.target/s390/load-imm64-2.c: New test. > > --- > > gcc/config/s390/s390-protos.h| 1 + > > gcc/config/s390/s390.c | 31 > > > > gcc/config/s390/s390.md | 22 ++ > > gcc/testsuite/gcc.target/s390/load-imm64-1.c | 10 +++ > > gcc/testsuite/gcc.target/s390/load-imm64-2.c | 10 +++ > > 5 files changed, 74 insertions(+) > > create mode 100644 gcc/testsuite/gcc.target/s390/load-imm64-1.c > > create mode 100644 gcc/testsuite/gcc.target/s390/load-imm64-2.c > > > > diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390- > > protos.h > > index ad2f7f77c18..eb10c3f4bbb 100644 > > --- a/gcc/config/s390/s390-protos.h > > +++ b/gcc/config/s390/s390-protos.h > > @@ -135,6 +135,7 @@ extern void s390_split_access_reg (rtx, rtx *, > > rtx *); > > extern void print_operand_address (FILE *, rtx); > > extern void print_operand (FILE *, rtx, int); > > extern void s390_output_pool_entry (rtx, machine_mode, unsigned > > int); > > +extern bool s390_const_int_pool_entry_p (rtx, HOST_WIDE_INT *); > > extern int s390_label_align (rtx_insn *); > > extern int s390_agen_dep_p (rtx_insn *, rtx_insn *); > > extern 
rtx_insn *s390_load_got (void); > > diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c > > index 02f18366aa1..e3d68d3543b 100644 > > --- a/gcc/config/s390/s390.c > > +++ b/gcc/config/s390/s390.c > > @@ -9400,6 +9400,37 @@ s390_output_pool_entry (rtx exp, > > machine_mode mode, unsigned int align) > > } > > } > > > > +/* Return true if MEM refers to an integer constant in the literal > > pool. If > > + VAL is not nullptr, then also fill it with the constant's > > value. */ > > + > > +bool > > +s390_const_int_pool_entry_p (rtx mem, HOST_WIDE_INT *val) > > +{ > > + /* Try to match the following: > > + - (mem (unspec [(symbol_ref) (reg)] UNSPEC_LTREF)). > > + - (mem (symbol_ref)). */ > > + > > + if (!MEM_P (mem)) > > +return false; > > + > > + rtx addr = XEXP (mem, 0); > > + rtx sym; > > + if (GET_CODE (addr) == UNSPEC && XINT (addr, 1) == UNSPEC_LTREF) > > +sym = XVECEXP (addr, 0, 0); > > + else > > +sym = addr; > > + > > + if (GET_CODE (sym) != SYMBOL_REF || !CONSTANT_POOL_ADDRESS_P > > (sym)) > !SYMBOL_REF_P (sym) Ok. > > > +return false; > > + > > + rtx val_rtx = get_pool_constant (sym); > > + if (!CONST_INT_P (val_rtx)) > > +return false; > > + > > + if (val != nullptr) > > +*val = INTVAL (val_rtx); > > + return true; > > +} > Alternatively you probably could have returned the RTX instead and > use gen_highpart / gen_lowpart in > the peephole. But no need to change that. I'll give it a try and see if the code looks better. > > > > > /* Return an RTL expression representing the value of the return > > address > > for the frame COUNT steps up from the current frame. 
FRAME is > > the > > diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md > > index 910415a5974..79e9a75ba2f 100644 > > --- a/gcc/config/s390/s390.md > > +++ b/gcc/config/s390/s390.md > > @@ -2116,6 +2116,28 @@ (define_peephole2 > >[(set (match_dup 0) (plus:DI (match_dup 1) (match_dup 2)))] > >"") > > > > +; Split loading of 64-bit constants into GPRs into llihf + oilf - > > +; counterintuitively, using oilf is faster than iilf. oilf > > clobbers > > +; cc, so cc must be dead. > > +(define_peephole2 > > + [(set (match_operand:DI 0 "register_operand" "") > > +(match_operand:DI 1 "memory_operand" ""))] > > + "TARGET_64BIT > > + && TARGET_EXTIMM > > + && GENERAL_REG_P (operands[0]) > > + && s390_const_int_pool_entry_p (operands[1], nullptr) > > + && peep2_reg_dead_p (1, gen_rtx_REG (CCmode, CC_REGNUM))" > > + [(set (match_dup 0) (match_dup 2)) > > + (parallel > > +[(set (match_dup 0) (ior:DI (match_dup 0) (match_dup 3))) > > + (clobber (reg:CC CC_REGNUM))])] > > +{ > > + HOST_WIDE_INT val; > > + gcc_assert (s390_const_int_pool_entry_p (operands[1], &val)); > > This probably breaks with checking dis
[PATCH RESEND] tree-ssa-threadbackward.c (profitable_jump_thread_path): Do not allow __builtin_constant_p.
On Tue, 2020-12-01 at 15:34 -0700, Jeff Law wrote:
> No strong opinions. I think whichever is less invasive in terms of code quality is probably the way to go. What we want to avoid is suppressing threading unnecessarily as that often leads to false positives from middle-end based warnings. Suppressing threading can also lead to build failures in the kernel due to the way they use b_c_p.
I think v1 is better then. Would you mind approving the following? That's the same code as in v1, but with the improved commit message and comments. Linux Kernel (specifically, drivers/leds/trigger/ledtrig-cpu.c) build with GCC 10 fails on s390 with "impossible constraint". Explanation by Jeff Law: ``` So what we have is a b_c_p at the start of an if-else chain. Subsequent tests on the "true" arm of the b_c_p test may throw us off the constant path (because the constants are out of range). Once all the tests are passed (it's constant and the constant is in range) the true arm's terminal block has a special asm that requires a constant argument. In the case where we get to the terminal block on the true arm, the argument to the b_c_p is used as the constant argument to the special asm. At first glance jump threading seems to be doing the right thing. Except that we end up with two paths to that terminal block with the special asm, one for each of the two constant arguments to the b_c_p call. Naturally since that same value is used in the asm, we have to introduce a PHI to select between them at the head of the terminal block. Now the argument in the asm is no longer constant and boom we fail. ``` Fix by disallowing __builtin_constant_p on threading paths. gcc/ChangeLog: 2020-06-03 Ilya Leoshkevich * tree-ssa-threadbackward.c (thread_jumps::profitable_jump_thread_path): Do not allow __builtin_constant_p on a threading path. gcc/testsuite/ChangeLog: 2020-06-03 Ilya Leoshkevich * gcc.target/s390/builtin-constant-p-threading.c: New test. 
--- .../s390/builtin-constant-p-threading.c | 46 +++ gcc/tree-ssa-threadbackward.c | 7 ++- 2 files changed, 52 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/s390/builtin-constant-p-threading.c diff --git a/gcc/testsuite/gcc.target/s390/builtin-constant-p-threading.c b/gcc/testsuite/gcc.target/s390/builtin-constant-p-threading.c new file mode 100644 index 000..5f0acdce0b0 --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/builtin-constant-p-threading.c @@ -0,0 +1,46 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=z196 -mzarch" } */ + +typedef struct +{ + int counter; +} atomic_t; + +static inline __attribute__ ((__gnu_inline__)) int +__atomic_add (int val, int *ptr) +{ + int old; + asm volatile("laa %[old],%[val],%[ptr]\n" + : [old] "=d" (old), [ptr] "+Q"(*ptr) + : [val] "d" (val) + : "cc", "memory"); + return old; +} + +static inline __attribute__ ((__gnu_inline__)) void +__atomic_add_const (int val, int *ptr) +{ + asm volatile("asi %[ptr],%[val]\n" + : [ptr] "+Q" (*ptr) + : [val] "i" (val) + : "cc", "memory"); +} + +static inline __attribute__ ((__gnu_inline__)) void +atomic_add (int i, atomic_t *v) +{ + if (__builtin_constant_p (i) && (i > -129) && (i < 128)) +{ + __atomic_add_const (i, &v->counter); + return; +} + __atomic_add (i, &v->counter); +} + +static atomic_t num_active_cpus = { (0) }; + +void +ledtrig_cpu (_Bool is_active) +{ + atomic_add (is_active ? 1 : -1, &num_active_cpus); +} diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c index 327628f1662..30f692672d9 100644 --- a/gcc/tree-ssa-threadbackward.c +++ b/gcc/tree-ssa-threadbackward.c @@ -259,8 +259,13 @@ thread_jumps::profitable_jump_thread_path (basic_block bbi, tree name, !gsi_end_p (gsi); gsi_next_nondebug (&gsi)) { + /* Do not allow OpenACC loop markers and __builtin_constant_p on +threading paths. 
The latter is disallowed, because an +expression might be constant on two threading paths, and +become non-constant (i.e.: phi) when they merge. */ gimple *stmt = gsi_stmt (gsi); - if (gimple_call_internal_p (stmt, IFN_UNIQUE)) + if (gimple_call_internal_p (stmt, IFN_UNIQUE) + || gimple_call_builtin_p (stmt, BUILT_IN_CONSTANT_P)) { m_path.pop (); return NULL; -- 2.25.4
[PATCH] IBM Z: Use llihf and oilf to load large immediates into GPRs
Bootstrapped and regtested on s390x-redhat-linux. There are slight improvements in all SPEC benchmarks, no regressions that could not be "fixed" by adding nops. Ok for master? Currently GCC loads large immediates into GPRs from the literal pool, which is not as efficient as loading two halves with llihf and oilf. gcc/ChangeLog: 2020-11-30 Ilya Leoshkevich * config/s390/s390-protos.h (s390_const_int_pool_entry_p): New function. * config/s390/s390.c (s390_const_int_pool_entry_p): New function. * config/s390/s390.md: Add define_peephole2 that produces llihf and oilf. gcc/testsuite/ChangeLog: 2020-11-30 Ilya Leoshkevich * gcc.target/s390/load-imm64-1.c: New test. * gcc.target/s390/load-imm64-2.c: New test. --- gcc/config/s390/s390-protos.h| 1 + gcc/config/s390/s390.c | 31 gcc/config/s390/s390.md | 22 ++ gcc/testsuite/gcc.target/s390/load-imm64-1.c | 10 +++ gcc/testsuite/gcc.target/s390/load-imm64-2.c | 10 +++ 5 files changed, 74 insertions(+) create mode 100644 gcc/testsuite/gcc.target/s390/load-imm64-1.c create mode 100644 gcc/testsuite/gcc.target/s390/load-imm64-2.c diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h index ad2f7f77c18..eb10c3f4bbb 100644 --- a/gcc/config/s390/s390-protos.h +++ b/gcc/config/s390/s390-protos.h @@ -135,6 +135,7 @@ extern void s390_split_access_reg (rtx, rtx *, rtx *); extern void print_operand_address (FILE *, rtx); extern void print_operand (FILE *, rtx, int); extern void s390_output_pool_entry (rtx, machine_mode, unsigned int); +extern bool s390_const_int_pool_entry_p (rtx, HOST_WIDE_INT *); extern int s390_label_align (rtx_insn *); extern int s390_agen_dep_p (rtx_insn *, rtx_insn *); extern rtx_insn *s390_load_got (void); diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 02f18366aa1..e3d68d3543b 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -9400,6 +9400,37 @@ s390_output_pool_entry (rtx exp, machine_mode mode, unsigned int align) } } +/* Return true if MEM refers to an 
integer constant in the literal pool. If + VAL is not nullptr, then also fill it with the constant's value. */ + +bool +s390_const_int_pool_entry_p (rtx mem, HOST_WIDE_INT *val) +{ + /* Try to match the following: + - (mem (unspec [(symbol_ref) (reg)] UNSPEC_LTREF)). + - (mem (symbol_ref)). */ + + if (!MEM_P (mem)) +return false; + + rtx addr = XEXP (mem, 0); + rtx sym; + if (GET_CODE (addr) == UNSPEC && XINT (addr, 1) == UNSPEC_LTREF) +sym = XVECEXP (addr, 0, 0); + else +sym = addr; + + if (GET_CODE (sym) != SYMBOL_REF || !CONSTANT_POOL_ADDRESS_P (sym)) +return false; + + rtx val_rtx = get_pool_constant (sym); + if (!CONST_INT_P (val_rtx)) +return false; + + if (val != nullptr) +*val = INTVAL (val_rtx); + return true; +} /* Return an RTL expression representing the value of the return address for the frame COUNT steps up from the current frame. FRAME is the diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index 910415a5974..79e9a75ba2f 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -2116,6 +2116,28 @@ (define_peephole2 [(set (match_dup 0) (plus:DI (match_dup 1) (match_dup 2)))] "") +; Split loading of 64-bit constants into GPRs into llihf + oilf - +; counterintuitively, using oilf is faster than iilf. oilf clobbers +; cc, so cc must be dead. +(define_peephole2 + [(set (match_operand:DI 0 "register_operand" "") +(match_operand:DI 1 "memory_operand" ""))] + "TARGET_64BIT + && TARGET_EXTIMM + && GENERAL_REG_P (operands[0]) + && s390_const_int_pool_entry_p (operands[1], nullptr) + && peep2_reg_dead_p (1, gen_rtx_REG (CCmode, CC_REGNUM))" + [(set (match_dup 0) (match_dup 2)) + (parallel +[(set (match_dup 0) (ior:DI (match_dup 0) (match_dup 3))) + (clobber (reg:CC CC_REGNUM))])] +{ + HOST_WIDE_INT val; + gcc_assert (s390_const_int_pool_entry_p (operands[1], &val)); + operands[2] = gen_rtx_CONST_INT (DImode, val & 0x); + operands[3] = gen_rtx_CONST_INT (DImode, val & 0x); +}) + ; ; movsi instruction pattern(s). 
; diff --git a/gcc/testsuite/gcc.target/s390/load-imm64-1.c b/gcc/testsuite/gcc.target/s390/load-imm64-1.c new file mode 100644 index 000..db0a89395aa --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/load-imm64-1.c @@ -0,0 +1,10 @@ +/* Test that large 64-bit constants are loaded with llihf + oilf when lgrl is + not available. */ + +/* { dg-do compile } */ +/* { dg-options "-O3 -march=z9-109" } */ + +unsigned long magic (void) { return 0x3f08c5392f756cd; } + +/* { dg-final { scan-assembler-times {\n\tllihf\t} 1 { target lp64 } } } */ +/* { dg-final { scan-assembler-times {\n\toilf\t} 1 { target lp64 } } } */ diff --git a/gcc/testsuite/gcc.target/s390/load-imm64-2.c b
[PATCH] Introduce can_vec_cmp_compare_p
Bootstrapped and regtested on x86_64-redhat-linux and s390x-redhat-linux. Ok for master? This is the same as dcd2ca63ec5c ("Introduce can_vcond_compare_p function"), but for vec_cmp. The reason it's needed is that since 5d9ade39b872 ("IBM Z: Fix PR97326: Enable fp compares in vec_cmp") and 4acba4859013 ("IBM Z: Restrict vec_cmp on z13") s390's vec_cmp expander advertises that it supports floating point comparisons except signaling ones on z13, but the common code ignores the latter restriction. gcc/ChangeLog: 2020-11-25 Ilya Leoshkevich * optabs-tree.c (vec_cmp_icode_p): New function. (vec_cmp_eq_icode_p): New function. (expand_vec_cmp_expr_p): Use vec_cmp_icode_p and vec_cmp_eq_icode_p. (vcond_icode_p): Use get_rtx_code_1, just to be uniform with vec_cmp_icode_p. * optabs.c (unsigned_optab_p): New function. (insn_predicate_matches_p): New function. (can_vec_cmp_compare_p): New function. (can_vcond_compare_p): Use unsigned_optab_p and insn_predicate_matches_p. (get_rtx_code): Use get_rtx_code_1. (get_rtx_code_1): Version of get_rtx_code that returns UNKNOWN instead of asserting. * optabs.h (can_vec_cmp_compare_p): New function. (get_rtx_code_1): New function. --- gcc/optabs-tree.c | 47 ++-- gcc/optabs.c | 78 ++- gcc/optabs.h | 12 ++-- 3 files changed, 109 insertions(+), 28 deletions(-) diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c index b797d018c84..a8968f3dd1a 100644 --- a/gcc/optabs-tree.c +++ b/gcc/optabs-tree.c @@ -337,6 +337,35 @@ supportable_convert_operation (enum tree_code code, return false; } +/* Return true iff vec_cmp_optab/vec_cmpu_optab can handle a vector comparison + for code CODE, comparing operands of type VALUE_TYPE and producing a result + of type MASK_TYPE. 
*/ + +static bool +vec_cmp_icode_p (tree value_type, tree mask_type, enum tree_code code) +{ + enum rtx_code rcode = get_rtx_code_1 (code, TYPE_UNSIGNED (value_type)); + if (rcode == UNKNOWN) +return false; + + return can_vec_cmp_compare_p (rcode, TYPE_MODE (value_type), + TYPE_MODE (mask_type)); +} + +/* Return true iff vec_cmpeq_optab can handle a vector comparison for code + CODE, comparing operands of type VALUE_TYPE and producing a result of type + MASK_TYPE. */ + +static bool +vec_cmp_eq_icode_p (tree value_type, tree mask_type, enum tree_code code) +{ + if (code != EQ_EXPR && code != NE_EXPR) +return false; + + return get_vec_cmp_eq_icode (TYPE_MODE (value_type), TYPE_MODE (mask_type)) +!= CODE_FOR_nothing; +} + /* Return TRUE if appropriate vector insn is available for vector comparison expr with vector type VALUE_TYPE and resulting mask with MASK_TYPE. */ @@ -344,14 +373,8 @@ supportable_convert_operation (enum tree_code code, bool expand_vec_cmp_expr_p (tree value_type, tree mask_type, enum tree_code code) { - if (get_vec_cmp_icode (TYPE_MODE (value_type), TYPE_MODE (mask_type), -TYPE_UNSIGNED (value_type)) != CODE_FOR_nothing) -return true; - if ((code == EQ_EXPR || code == NE_EXPR) - && (get_vec_cmp_eq_icode (TYPE_MODE (value_type), TYPE_MODE (mask_type)) - != CODE_FOR_nothing)) -return true; - return false; + return vec_cmp_icode_p (value_type, mask_type, code) +|| vec_cmp_eq_icode_p (value_type, mask_type, code); } /* Return true iff vcond_optab/vcondu_optab can handle a vector @@ -361,8 +384,12 @@ expand_vec_cmp_expr_p (tree value_type, tree mask_type, enum tree_code code) static bool vcond_icode_p (tree value_type, tree cmp_op_type, enum tree_code code) { - return can_vcond_compare_p (get_rtx_code (code, TYPE_UNSIGNED (cmp_op_type)), - TYPE_MODE (value_type), TYPE_MODE (cmp_op_type)); + enum rtx_code rcode = get_rtx_code_1 (code, TYPE_UNSIGNED (cmp_op_type)); + if (rcode == UNKNOWN) +return false; + + return can_vcond_compare_p (rcode, TYPE_MODE 
(value_type), + TYPE_MODE (cmp_op_type)); } /* Return true iff vcondeq_optab can handle a vector comparison for code CODE, diff --git a/gcc/optabs.c b/gcc/optabs.c index 1820b91877a..76045596980 100644 --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -3834,23 +3834,59 @@ can_compare_p (enum rtx_code code, machine_mode mode, return 0; } -/* Return whether the backend can emit a vector comparison for code CODE, - comparing operands of mode CMP_OP_MODE and producing a result with - VALUE_MODE. */ +/* Return whether RTL code CODE corresponds to an unsigned optab. */ + +static bool +unsigned_optab_p (enum rtx_code code) +{ + return code == LTU || code == LEU || code == GTU || code == GEU; +} + +/* Return whether the backend-emitted comparison for code CODE, comparing + operands of mode VALUE_MODE and producing a result with MASK_MODE, mat
[PATCH] rtl_dump_bb: fix segfault when reporting internal error
Bootstrapped and regtested on x86_64-redhat-linux and s390x-redhat-linux. Ok for master? During ICE reporting, sometimes rtl_dump_bb is called on partially initialized basic blocks. This produces another ICE, obscuring the original problem. Fix by checking that basic blocks are initialized before touching their bb_infos. gcc/ChangeLog: 2020-11-25 Ilya Leoshkevich * cfgrtl.c (rtl_bb_info_initialized_p): New function. (rtl_dump_bb): Use rtl_bb_info_initialized_p before accessing bb insns. --- gcc/cfgrtl.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c index 45d84d39b22..5e909e25882 100644 --- a/gcc/cfgrtl.c +++ b/gcc/cfgrtl.c @@ -97,6 +97,7 @@ static basic_block rtl_split_block (basic_block, void *); static void rtl_dump_bb (FILE *, basic_block, int, dump_flags_t); static int rtl_verify_flow_info_1 (void); static void rtl_make_forwarder_block (edge); +static bool rtl_bb_info_initialized_p (basic_block bb); /* Return true if NOTE is not one of the ones that must be kept paired, so that we may simply delete it. */ @@ -2149,7 +2150,8 @@ rtl_dump_bb (FILE *outf, basic_block bb, int indent, dump_flags_t flags) putc ('\n', outf); } - if (bb->index != ENTRY_BLOCK && bb->index != EXIT_BLOCK) + if (bb->index != ENTRY_BLOCK && bb->index != EXIT_BLOCK + && rtl_bb_info_initialized_p (bb)) { rtx_insn *last = BB_END (bb); if (last) @@ -5135,6 +5137,12 @@ init_rtl_bb_info (basic_block bb) bb->il.x.rtl = ggc_cleared_alloc (); } +static bool +rtl_bb_info_initialized_p (basic_block bb) +{ + return bb->il.x.rtl; +} + /* Returns true if it is possible to remove edge E by redirecting it to the destination of the other edge from E->src. */ -- 2.25.4
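The pattern in this patch, refusing to follow a pointer that may not have been set up yet while producing diagnostics, can be sketched outside of GCC as follows. All types and names here are hypothetical stand-ins, not the real basic_block or rtl_bb_info; the sketch assumes only that the per-block info pointer is null until it is initialized.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the per-block RTL info and the block.  */
struct block_info
{
  int first_insn;
  int last_insn;
};

struct block
{
  int index;
  struct block_info *info;	/* NULL until the block is fully set up.  */
};

/* Counterpart of the new predicate: true once the info is allocated.  */
bool
block_info_initialized_p (const struct block *bb)
{
  return bb->info != NULL;
}

/* Dump what is known about BB.  Returns true if the insn range was
   printed, false if only the header could be printed safely.  */
bool
dump_block (FILE *out, const struct block *bb)
{
  fprintf (out, "block %d\n", bb->index);
  if (!block_info_initialized_p (bb))
    return false;

  fprintf (out, "  insns %d..%d\n", bb->info->first_insn,
	   bb->info->last_insn);
  return true;
}
```

A dump routine that may run during error reporting benefits from degrading gracefully like this, since a second failure would hide the first one.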
[PATCH] profopt-execute: unset testname_with_flags if create_gcov fails
Bootstrapped and regtested on x86_64-redhat-linux and s390x-redhat-linux. Ok for master? When diffing test results, spurious "New tests that PASS" / "Old tests that passed, that have disappeared" messages sometimes appear. The reason is that if create_gcov is not installed, the cached testname_with_flags is not cleared and is carried over to the next test. gcc/testsuite/ChangeLog: 2020-11-26 Ilya Leoshkevich * lib/profopt.exp: Unset testname_with_flags if create_gcov fails. --- gcc/testsuite/lib/profopt.exp | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/testsuite/lib/profopt.exp b/gcc/testsuite/lib/profopt.exp index d6863439d04..98e1a0e63af 100644 --- a/gcc/testsuite/lib/profopt.exp +++ b/gcc/testsuite/lib/profopt.exp @@ -456,6 +456,7 @@ proc profopt-execute { src } { set id [remote_spawn "" $cmd] if { $id < 0 } { unsupported "$testcase -fauto-profile: cannot run create_gcov" + unset testname_with_flags set status "fail" return } -- 2.25.4
[PATCH] IBM Z: Restrict vec_cmp on z13
Bootstrapped and regtested on s390x-redhat-linux. Ok for master? Commit 5d9ade39b872 ("IBM Z: Fix PR97326: Enable fp compares in vec_cmp") made it possible to create rtxes that describe signaling comparisons on z13, which are not supported by the hardware. Restrict this by using vcond_comparison_operator predicate. gcc/ChangeLog: 2020-11-24 Ilya Leoshkevich * config/s390/vector.md: Use vcond_comparison_operator predicate. --- gcc/config/s390/vector.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md index fef68644625..029ee0886c2 100644 --- a/gcc/config/s390/vector.md +++ b/gcc/config/s390/vector.md @@ -1561,7 +1561,7 @@ (define_expand "copysign3" (define_expand "vec_cmp" [(set (match_operand: 0 "register_operand" "") - (match_operator: 1 "" + (match_operator: 1 "vcond_comparison_operator" [(match_operand:V_HW 2 "register_operand" "") (match_operand:V_HW 3 "register_operand" "")]))] "TARGET_VX" -- 2.25.4
[PATCH] IBM Z: Update autovec-*-quiet-uneq expectations
Commit 229752afe315 ("VEC_COND_EXPR optimizations") has improved code generation: we no longer need "vx x,x,-1", which turned out to be superfluous. Instead, we simply swap 0 and -1 arguments of the preceding "vsel". gcc/testsuite/ChangeLog: 2020-11-23 Ilya Leoshkevich * gcc.target/s390/zvector/autovec-double-quiet-uneq.c: Expect that "vx" is not emitted. * gcc.target/s390/zvector/autovec-float-quiet-uneq.c: Likewise. --- .../gcc.target/s390/zvector/autovec-double-quiet-uneq.c | 5 - .../gcc.target/s390/zvector/autovec-float-quiet-uneq.c | 5 - 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-uneq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-uneq.c index 3d6da30beac..7c9b20fd2e0 100644 --- a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-uneq.c +++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-uneq.c @@ -5,6 +5,9 @@ AUTOVEC_DOUBLE (QUIET_UNEQ); +/* { dg-final { scan-assembler {\n\tvzero\t} } } */ +/* { dg-final { scan-assembler {\n\tvgmg\t} } } */ /* { dg-final { scan-assembler-times {\n\tvfchdb\t} 2 } } */ /* { dg-final { scan-assembler {\n\tvo\t} } } */ -/* { dg-final { scan-assembler {\n\tvx\t} } } */ +/* { dg-final { scan-assembler {\n\tvsel\t} } } */ +/* { dg-final { scan-assembler-not {\n\tvx\t} } } */ diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-uneq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-uneq.c index 1df53a99bc8..5ab9337880d 100644 --- a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-uneq.c +++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-uneq.c @@ -5,6 +5,9 @@ AUTOVEC_FLOAT (QUIET_UNEQ); +/* { dg-final { scan-assembler {\n\tvzero\t} } } */ +/* { dg-final { scan-assembler {\n\tvgmf\t} } } */ /* { dg-final { scan-assembler-times {\n\tvfchsb\t} 2 } } */ /* { dg-final { scan-assembler {\n\tvo\t} } } */ -/* { dg-final { scan-assembler {\n\tvx\t} } } */ +/* { dg-final { 
scan-assembler {\n\tvsel\t} } } */ +/* { dg-final { scan-assembler-not {\n\tvx\t} } } */ -- 2.25.4
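The claim that the "vx" is superfluous once the 0 and -1 arguments of the select are swapped can be checked with plain bit arithmetic. The following is only a scalar sketch under stated assumptions: bitsel() is a hypothetical helper that models a bitwise select, and it makes no claim about the operand order of the real "vsel" instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a bitwise select: take the bits of A where MASK
   is 1 and the bits of B where MASK is 0.  */
static uint64_t
bitsel (uint64_t mask, uint64_t a, uint64_t b)
{
  return (a & mask) | (b & ~mask);
}

/* Old sequence: select between 0 and -1, then XOR with -1 (the "vx").  */
static uint64_t
select_then_invert (uint64_t mask)
{
  return bitsel (mask, 0, UINT64_MAX) ^ UINT64_MAX;
}

/* New sequence: swap the 0 and -1 arguments, no XOR needed.  */
static uint64_t
select_swapped (uint64_t mask)
{
  return bitsel (mask, UINT64_MAX, 0);
}

/* Check that both sequences agree for a few sample masks.  */
static void
check_equivalence (void)
{
  const uint64_t masks[] = {
    0, UINT64_MAX, UINT64_C (0xffffffff00000000),
    UINT64_C (0x5a5a5a5a5a5a5a5a)
  };
  for (unsigned i = 0; i < sizeof masks / sizeof masks[0]; i++)
    assert (select_then_invert (masks[i]) == select_swapped (masks[i]));
}
```

Both helpers reduce to the mask itself, which is why the extra XOR can go away.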
Re: [PATCH v2] tree-ssa-threadbackward.c (profitable_jump_thread_path): Do not allow __builtin_constant_p () before IPA.
On Fri, 2020-11-20 at 12:14 -0700, Jeff Law wrote: > > On 6/30/20 12:46 PM, Ilya Leoshkevich wrote: > > v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html > > > > This is the implementation of Jakub's suggestion: allow > > __builtin_constant_p () after IPA, but fold it into 0. Smoke test > > passed on s390x-redhat-linux, full regtest and bootstrap are > > running on > > x86_64-redhat-linux. > > > > --- > > > > Linux Kernel (specifically, drivers/leds/trigger/ledtrig-cpu.c) > > build > > with GCC 10 fails on s390 with "impossible constraint". > > > > The problem is that jump threading makes __builtin_constant_p () > > lie > > when it splits a path containing a non-constant expression in a way > > that on each of the resulting paths this expression is constant. > > > > Fix by disallowing __builtin_constant_p () on threading paths > > before > > IPA and fold it into 0 after IPA. > > > > gcc/ChangeLog: > > > > 2020-06-30 Ilya Leoshkevich > > > > * tree-ssa-threadbackward.c (thread_jumps::m_allow_bcp_p): New > > member. > > (thread_jumps::profitable_jump_thread_path): Do not allow > > __builtin_constant_p () on threading paths unless m_allow_bcp_p > > is set. > > (thread_jumps::find_jump_threads_backwards): Set m_allow_bcp_p. > > (pass_thread_jumps::execute): Allow __builtin_constant_p () on > > threading paths after IPA. > > (pass_early_thread_jumps::execute): Do not allow > > __builtin_constant_p () on threading paths before IPA. > > * tree-ssa-threadupdate.c (duplicate_thread_path): Fold > > __builtin_constant_p () on threading paths into 0. > > > > gcc/testsuite/ChangeLog: > > > > 2020-06-30 Ilya Leoshkevich > > > > * gcc.target/s390/builtin-constant-p-threading.c: New test. > So I'm finally getting back to this. Thanks for your patience. > > It's a nasty little problem, and I suspect there's actually some > deeper > issues here. While I'd like to claim its a bad use of b_c_p, I don't > think I can reasonably make that argument. 
> > So what we have is a b_c_p at the start of an if-else chain. > Subsequent > tests on the "true" arm of the b_c_p test may throw us off the > constant path (because the constants are out of range). Once all the > tests are passed (it's constant and the constant is in range) the > true > arm's terminal block has a special asm that requires a constant > argument. In the case where we get to the terminal block on the > true > arm, the argument to the b_c_p is used as the constant argument to > the > special asm. > > At first glance jump threading seems to be doing the right thing. > Except > that we end up with two paths to that terminal block with the special > asm, one for each of the two constant arguments to the b_c_p call. > Naturally since that same value is used in the asm, we have to > introduce > a PHI to select between them at the head of the terminal block. Now > the argument in the asm is no longer constant and boom we fail. > > I briefly pondered if we should only throttle when the argument to > the > b_c_p is not used elsewhere. But I think that just hides the problem > and with a little work I could probably extend the testcase to still > fail in that scenario. > > I also briefly pondered if we should isolate the terminal block as > well > (essentially creating one for each unique PHI argument). We'd likely > only need to do that when there's an ASM in the terminal block, but > that > likely just papers over the problem as well since the ASM could be in > a > successor of the terminal block. > > I haven't thought real deeply about it, but I wouldn't be surprised > if > there's other passes that can trigger similar problems. Aggressive > cross-jumping would be the most obvious, but some of the > hoisting/sinking > of operations past PHIs would seem potentially problematical as well. > > Jakub's suggestion might be the best one in this space. I don't have > anything better right now. 
The deeper questions about other passes > setting up similar scenarios can probably be punted, I'd expect > threading to be far and above the most common way for this to happen > and > I'd be comfortable faulting in investigation of other cases if/when > they > happen. > > So I retract my initial objections. Let's go with the V2 patch. > > > jeff Hi Jeff, Thanks for having another look! I did x86_64 builds of SPEC and vmlinux, and it seems that in practice v2 does not have any benefit over v1. What do you think about going with the v1, which is less complex? Best regards, Ilya
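The shape of the problem discussed in this thread can be pictured with a reduced example. This is a sketch, not the code from drivers/leds/trigger/ledtrig-cpu.c: emit_const(), emit_generic(), emit() and caller() are hypothetical names, and the comment only marks where the kernel would have an asm with an immediate-only operand. Both arms compute the same value, so the result does not depend on how the compiler folds __builtin_constant_p ().

```c
#include <assert.h>

/* Hypothetical slow path that accepts any run-time value.  */
static int
emit_generic (int x)
{
  return x * 2;
}

/* Hypothetical fast path.  In the kernel case an asm with an
   immediate-only ("i") operand would sit here, which is why X must
   still be a compile-time constant when this block is reached.  */
static int
emit_const (int x)
{
  return x * 2;
}

static inline int
emit (int x)
{
  if (__builtin_constant_p (x))
    {
      /* Range checks may leave the constant path again.  */
      if (x < 0 || x > 1)
        return emit_generic (x);
      return emit_const (x);
    }
  return emit_generic (x);
}

/* X is 1 on one incoming path and 0 on the other.  If jump threading
   duplicated the __builtin_constant_p () test per path, each copy would
   see a constant, yet the shared fast-path block would receive
   PHI <0, 1>, which is not a constant.  */
int
caller (int flag)
{
  int x = flag ? 1 : 0;
  return emit (x);
}

/* Whichever arm is taken, the observable result is the same.  */
static void
self_check (void)
{
  assert (caller (0) == 0);
  assert (caller (1) == 2);
}
```

Keeping the two arms semantically identical is what makes __builtin_constant_p () a pure optimization hint; the build failure arises only because the real fast path has a hard requirement for an immediate operand.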
[PATCH] IBM Z: Do not run long double tests on old machines
Bootstrapped and regtested on z13 s390x-redhat-linux. Ok for master? gcc/testsuite/ChangeLog: 2020-11-12 Ilya Leoshkevich * gcc.target/s390/s390.exp (check_effective_target_s390_z14_hw): New predicate. * gcc.target/s390/vector/long-double-caller-abi-run.c: Use the new predicate. * gcc.target/s390/vector/long-double-copysign.c: Likewise. * gcc.target/s390/vector/long-double-from-double.c: Likewise. * gcc.target/s390/vector/long-double-from-float.c: Likewise. * gcc.target/s390/vector/long-double-from-i16.c: Likewise. * gcc.target/s390/vector/long-double-from-i32.c: Likewise. * gcc.target/s390/vector/long-double-from-i64.c: Likewise. * gcc.target/s390/vector/long-double-from-i8.c: Likewise. * gcc.target/s390/vector/long-double-from-u16.c: Likewise. * gcc.target/s390/vector/long-double-from-u32.c: Likewise. * gcc.target/s390/vector/long-double-from-u64.c: Likewise. * gcc.target/s390/vector/long-double-from-u8.c: Likewise. * gcc.target/s390/vector/long-double-to-double.c: Likewise. * gcc.target/s390/vector/long-double-to-float.c: Likewise. * gcc.target/s390/vector/long-double-to-i16.c: Likewise. * gcc.target/s390/vector/long-double-to-i32.c: Likewise. * gcc.target/s390/vector/long-double-to-i64.c: Likewise. * gcc.target/s390/vector/long-double-to-i8.c: Likewise. * gcc.target/s390/vector/long-double-to-u16.c: Likewise. * gcc.target/s390/vector/long-double-to-u32.c: Likewise. * gcc.target/s390/vector/long-double-to-u64.c: Likewise. * gcc.target/s390/vector/long-double-to-u8.c: Likewise. * gcc.target/s390/vector/long-double-wfaxb.c: Likewise. * gcc.target/s390/vector/long-double-wfdxb.c: Likewise. * gcc.target/s390/vector/long-double-wfsxb-1.c: Likewise. 
--- gcc/testsuite/gcc.target/s390/s390.exp | 10 ++ .../s390/vector/long-double-caller-abi-run.c | 3 ++- .../gcc.target/s390/vector/long-double-copysign.c | 3 ++- .../gcc.target/s390/vector/long-double-from-double.c | 3 ++- .../gcc.target/s390/vector/long-double-from-float.c| 3 ++- .../gcc.target/s390/vector/long-double-from-i16.c | 3 ++- .../gcc.target/s390/vector/long-double-from-i32.c | 3 ++- .../gcc.target/s390/vector/long-double-from-i64.c | 3 ++- .../gcc.target/s390/vector/long-double-from-i8.c | 3 ++- .../gcc.target/s390/vector/long-double-from-u16.c | 3 ++- .../gcc.target/s390/vector/long-double-from-u32.c | 3 ++- .../gcc.target/s390/vector/long-double-from-u64.c | 3 ++- .../gcc.target/s390/vector/long-double-from-u8.c | 3 ++- .../gcc.target/s390/vector/long-double-to-double.c | 3 ++- .../gcc.target/s390/vector/long-double-to-float.c | 3 ++- .../gcc.target/s390/vector/long-double-to-i16.c| 3 ++- .../gcc.target/s390/vector/long-double-to-i32.c| 3 ++- .../gcc.target/s390/vector/long-double-to-i64.c| 3 ++- .../gcc.target/s390/vector/long-double-to-i8.c | 3 ++- .../gcc.target/s390/vector/long-double-to-u16.c| 3 ++- .../gcc.target/s390/vector/long-double-to-u32.c| 3 ++- .../gcc.target/s390/vector/long-double-to-u64.c| 3 ++- .../gcc.target/s390/vector/long-double-to-u8.c | 3 ++- .../gcc.target/s390/vector/long-double-wfaxb.c | 3 ++- .../gcc.target/s390/vector/long-double-wfdxb.c | 3 ++- .../gcc.target/s390/vector/long-double-wfsxb-1.c | 3 ++- 26 files changed, 60 insertions(+), 25 deletions(-) diff --git a/gcc/testsuite/gcc.target/s390/s390.exp b/gcc/testsuite/gcc.target/s390/s390.exp index 387a720b8e3..00e0555d55c 100644 --- a/gcc/testsuite/gcc.target/s390/s390.exp +++ b/gcc/testsuite/gcc.target/s390/s390.exp @@ -192,6 +192,16 @@ proc check_effective_target_s390_z13_hw { } { } }] "-march=z13 -m64 -mzarch" ] } { return 0 } else { return 1 } } +proc check_effective_target_s390_z14_hw { } { +if { ![check_runtime s390_check_s390_z14_hw [subst { + int main (void) + { 
+ int x = 0; + asm ("msgrkc %%0,%%0,%%0" : "+r" (x) : ); + return x; + } +}] "-march=z14 -m64 -mzarch" ] } { return 0 } else { return 1 } +} # If a testcase doesn't have special options, use these. global DEFAULT_CFLAGS diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-caller-abi-run.c b/gcc/testsuite/gcc.target/s390/vector/long-double-caller-abi-run.c index f3a41bacc2f..f7315f6c2e9 100644 --- a/gcc/testsuite/gcc.target/s390/vector/long-double-caller-abi-run.c +++ b/gcc/testsuite/gcc.target/s390/vector/long-double-caller-abi-run.c @@ -1,4 +1,5 @@ -/* { dg-do run } */ +/* { dg-do compile } */ /* { dg-options "-O3 -march=z14 -mzarch" } */ +/* { dg-do run { target { s390_z14_hw } } } */ #include "long-double-callee-abi-scan.c" #include "long-double-caller-abi-s
[PATCH] IBM Z: Fix bootstrap breakage due to HAVE_TF macro
Bootstrap and regtest running on s390x-redhat-linux with --enable-shared --with-system-zlib --enable-threads=posix --enable-__cxa_atexit --enable-checking=yes,rtl --enable-gnu-indirect-function --disable-werror --enable-languages=c,c++,fortran,objc,obj-c++ --with-arch=arch13. Ok for master? Commit e627cda56865 ("IBM Z: Store long doubles in vector registers when possible") introduced HAVE_TF macro which expands to a logical "or" of HAVE_ constants. Not all of these constants are available in GENERATOR_FILE context, so a hack was used: simply expand to true in this case, because the actual value matters only during compiler runtime and not during generation. However, one aspect of this value matters during generation after all: whether or not it's a constant, which in this case it appears to be. This results in incorrect values in insn-flags.h and broken bootstrap for some configurations. Fix by using a dummy value that is not a constant. gcc/ChangeLog: 2020-11-10 Ilya Leoshkevich * config/s390/s390.h (HAVE_TF): Use opaque value when GENERATOR_FILE is defined. --- gcc/config/s390/s390.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h index 8c028317b6b..bc579a3dadd 100644 --- a/gcc/config/s390/s390.h +++ b/gcc/config/s390/s390.h @@ -1187,8 +1187,9 @@ struct GTY(()) machine_function #define TARGET_INDIRECT_BRANCH_TABLE s390_indirect_branch_table #ifdef GENERATOR_FILE -/* gencondmd.c is built before insn-flags.h. */ -#define HAVE_TF(icode) true +/* gencondmd.c is built before insn-flags.h. Use an arbitrary opaque value + that cannot be optimized away by gen_insn. */ +#define HAVE_TF(icode) TARGET_HARD_FLOAT #else #define HAVE_TF(icode) (HAVE_##icode##_fpr || HAVE_##icode##_vr) #endif -- 2.25.4
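Why it matters whether the placeholder is a constant can be shown with a small stand-alone sketch. It is loosely modelled on how a generator program might classify a condition as "known now" versus "known only when the compiler runs"; MAYBE_EVAL, HAVE_TF_OLD, HAVE_TF_NEW and target_hard_float_sketch are illustrative names introduced here as assumptions, not GCC's actual generator code.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a target flag whose value is only known at run time.  */
volatile int target_hard_float_sketch = 1;

/* Evaluate EXPR if the compiler can prove it constant, otherwise
   report -1, meaning "unknown until the compiler runs".  */
#define MAYBE_EVAL(expr) (__builtin_constant_p (expr) ? (int) (expr) : -1)

/* Old placeholder: a literal constant, so it is classified as "always
   true" at generation time.  */
#define HAVE_TF_OLD(icode) true

/* New placeholder: an opaque value, so the condition stays undecided.  */
#define HAVE_TF_NEW(icode) target_hard_float_sketch

static int
classify_old (void)
{
  return MAYBE_EVAL (HAVE_TF_OLD (addtf3));
}

static int
classify_new (void)
{
  return MAYBE_EVAL (HAVE_TF_NEW (addtf3));
}

/* The old placeholder is decided early, the new one is left open.  */
static void
self_check (void)
{
  assert (classify_old () == 1);
  assert (classify_new () == -1);
}
```

With the literal, the condition is decided during generation; with the opaque value it is deferred, which is the property the fix relies on.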
[PATCH 2/2] IBM Z: Test long doubles in vector registers
gcc/testsuite/ChangeLog: 2020-11-05 Ilya Leoshkevich * gcc.target/s390/vector/long-double-callee-abi-scan.c: New test. * gcc.target/s390/vector/long-double-caller-abi-run.c: New test. * gcc.target/s390/vector/long-double-caller-abi-scan.c: New test. * gcc.target/s390/vector/long-double-copysign.c: New test. * gcc.target/s390/vector/long-double-fprx2-constant.c: New test. * gcc.target/s390/vector/long-double-from-double.c: New test. * gcc.target/s390/vector/long-double-from-float.c: New test. * gcc.target/s390/vector/long-double-from-i16.c: New test. * gcc.target/s390/vector/long-double-from-i32.c: New test. * gcc.target/s390/vector/long-double-from-i64.c: New test. * gcc.target/s390/vector/long-double-from-i8.c: New test. * gcc.target/s390/vector/long-double-from-u16.c: New test. * gcc.target/s390/vector/long-double-from-u32.c: New test. * gcc.target/s390/vector/long-double-from-u64.c: New test. * gcc.target/s390/vector/long-double-from-u8.c: New test. * gcc.target/s390/vector/long-double-to-double.c: New test. * gcc.target/s390/vector/long-double-to-float.c: New test. * gcc.target/s390/vector/long-double-to-i16.c: New test. * gcc.target/s390/vector/long-double-to-i32.c: New test. * gcc.target/s390/vector/long-double-to-i64.c: New test. * gcc.target/s390/vector/long-double-to-i8.c: New test. * gcc.target/s390/vector/long-double-to-u16.c: New test. * gcc.target/s390/vector/long-double-to-u32.c: New test. * gcc.target/s390/vector/long-double-to-u64.c: New test. * gcc.target/s390/vector/long-double-to-u8.c: New test. * gcc.target/s390/vector/long-double-vec-duplicate.c: New test. * gcc.target/s390/vector/long-double-wf.h: New test. * gcc.target/s390/vector/long-double-wfaxb.c: New test. * gcc.target/s390/vector/long-double-wfcxb-0001.c: New test. * gcc.target/s390/vector/long-double-wfcxb-0111.c: New test. * gcc.target/s390/vector/long-double-wfcxb-1011.c: New test. * gcc.target/s390/vector/long-double-wfcxb-1101.c: New test. 
* gcc.target/s390/vector/long-double-wfdxb.c: New test. * gcc.target/s390/vector/long-double-wfixb.c: New test. * gcc.target/s390/vector/long-double-wfkxb-0111.c: New test. * gcc.target/s390/vector/long-double-wfkxb-1011.c: New test. * gcc.target/s390/vector/long-double-wfkxb-1101.c: New test. * gcc.target/s390/vector/long-double-wflcxb.c: New test. * gcc.target/s390/vector/long-double-wflpxb.c: New test. * gcc.target/s390/vector/long-double-wfmaxb-2.c: New test. * gcc.target/s390/vector/long-double-wfmaxb-3.c: New test. * gcc.target/s390/vector/long-double-wfmaxb-disabled.c: New test. * gcc.target/s390/vector/long-double-wfmaxb.c: New test. * gcc.target/s390/vector/long-double-wfmsxb-disabled.c: New test. * gcc.target/s390/vector/long-double-wfmsxb.c: New test. * gcc.target/s390/vector/long-double-wfmxb.c: New test. * gcc.target/s390/vector/long-double-wfnmaxb-disabled.c: New test. * gcc.target/s390/vector/long-double-wfnmaxb.c: New test. * gcc.target/s390/vector/long-double-wfnmsxb-disabled.c: New test. * gcc.target/s390/vector/long-double-wfnmsxb.c: New test. * gcc.target/s390/vector/long-double-wfsqxb.c: New test. * gcc.target/s390/vector/long-double-wfsxb-1.c: New test. * gcc.target/s390/vector/long-double-wfsxb.c: New test. * gcc.target/s390/vector/long-double-wftcixb-1.c: New test. * gcc.target/s390/vector/long-double-wftcixb.c: New test. 
--- .../s390/vector/long-double-callee-abi-scan.c | 20 +++ .../s390/vector/long-double-caller-abi-run.c | 4 ++ .../s390/vector/long-double-caller-abi-scan.c | 13 .../s390/vector/long-double-copysign.c| 21 +++ .../s390/vector/long-double-fprx2-constant.c | 11 .../s390/vector/long-double-from-double.c | 18 ++ .../s390/vector/long-double-from-float.c | 19 ++ .../s390/vector/long-double-from-i16.c| 19 ++ .../s390/vector/long-double-from-i32.c| 19 ++ .../s390/vector/long-double-from-i64.c| 19 ++ .../s390/vector/long-double-from-i8.c | 19 ++ .../s390/vector/long-double-from-u16.c| 19 ++ .../s390/vector/long-double-from-u32.c| 19 ++ .../s390/vector/long-double-from-u64.c| 19 ++ .../s390/vector/long-double-from-u8.c | 19 ++ .../s390/vector/long-double-to-double.c | 18 ++ .../s390/vector/long-double-to-float.c| 19 ++ .../s390/vector/long-double-to-i16.c | 19 ++ .../s390/vector/long-double-to-i32.c | 19 ++ .../s390/vector/long-double-to-i64.c | 21 +++ .../s390/vector/long-double-to-i8.c | 1
[PATCH 1/2] IBM Z: Store long doubles in vector registers when possible
On z14+, there are instructions for working with 128-bit floats (long doubles) in vector registers. It's beneficial to use them instead of instructions that operate on floating point register pairs, because it allows storing 4 times more data in registers at a time, relieving register pressure. The raw performance of the new instructions is almost the same as that of the old ones. Implement by storing TFmode values in vector registers on z14+. Since not all operations are available with the new instructions, keep the old ones available using the new FPRX2 mode, and convert between it and TFmode when necessary (these are called "forwarder" expanders below). Change the existing TFmode expanders to call either new- or old-style ones depending on whether we are on z14+ or older machines ("dispatcher" expanders). gcc/ChangeLog: 2020-11-03 Ilya Leoshkevich * config/s390/s390-modes.def (FPRX2): New mode. * config/s390/s390-protos.h (s390_fma_allowed_p): New function. * config/s390/s390.c (s390_fma_allowed_p): Likewise. (s390_build_signbit_mask): Support 128-bit masks. (print_operand): Support printing the second word of a TFmode operand as vector register. (constant_modes): Add FPRX2mode. (s390_class_max_nregs): Return 1 for TFmode on z14+. (s390_is_fpr128): New function. (s390_is_vr128): Likewise. (s390_can_change_mode_class): Use s390_is_fpr128 and s390_is_vr128 in order to determine whether mode refers to a FPR pair or to a VR. (s390_emit_compare): Force TFmode operands into registers on z14+. * config/s390/s390.h (HAVE_TF): New macro. (EXPAND_MOVTF): New macro. (EXPAND_TF): Likewise. * config/s390/s390.md (PFPO_OP_TYPE_FPRX2): PFPO_OP_TYPE_TF alias. (ALL): Add FPRX2. (FP_ALL): Add FPRX2 for z14+, restrict TFmode to z13-. (FP): Likewise. (FP_ANYTF): New mode iterator. (BFP): Add FPRX2 for z14+, restrict TFmode to z13-. (TD_TF): Likewise. (xde): Add FPRX2. (nBFP): Likewise. (nDFP): Likewise. (DSF): Likewise. (DFDI): Likewise. (SFSI): Likewise. (DF): Likewise. 
(SF): Likewise. (fT0): Likewise. (bt): Likewise. (_d): Likewise. (HALF_TMODE): Likewise. (tf_fpr): New mode_attr. (type): New mode_attr. (*cmp_ccz_0): Use type instead of mode with fsimp. (*cmp_ccs_0_fastmath): Likewise. (*cmptf_ccs): New pattern for wfcxb. (*cmptf_ccsfps): New pattern for wfkxb. (mov): Rename to mov. (signbit2): Rename to signbit2. (isinf2): Rename to isinf2. (*TDC_insn_): Use type instead of mode with fsimp. (fixuns_trunc2): Rename to fixuns_trunc2. (fix_trunctf2): Rename to fix_trunctf2_fpr. (floatdi2): Rename to floatdi2, use type instead of mode with itof. (floatsi2): Rename to floatsi2, use type instead of mode with itof. (*floatuns2): Use type instead of mode for itof. (floatuns2): Rename to floatuns2. (trunctf2): Rename to trunctf2_fpr, use type instead of mode with fsimp. (extend2): Rename to extend2. (2): Rename to 2, use type instead of mode with fsimp. (rint2): Rename to rint2, use type instead of mode with fsimp. (2): Use type instead of mode for fsimp. (rint2): Likewise. (trunc2): Rename to trunc2. (trunc2): Rename to trunc2. (extend2): Rename to extend2. (extend2): Rename to extend2. (add3): Rename to add3, use type instead of mode with fsimp. (*add3_cc): Use type instead of mode with fsimp. (*add3_cconly): Likewise. (sub3): Rename to sub3, use type instead of mode with fsimp. (*sub3_cc): Use type instead of mode with fsimp. (*sub3_cconly): Likewise. (mul3): Rename to mul3, use type instead of mode with fsimp. (fma4): Restrict using s390_fma_allowed_p. (fms4): Restrict using s390_fma_allowed_p. (div3): Rename to div3, use type instead of mode with fdiv. (neg2): Rename to neg2. (*neg2_cc): Use type instead of mode with fsimp. (*neg2_cconly): Likewise. (*neg2_nocc): Likewise. (*neg2): Likewise. (abs2): Rename to abs2, use type instead of mode with fdiv. (*abs2_cc): Use type instead of mode with fsimp. (*abs2_cconly): Likewise. (*abs2_nocc): Likewise. (*abs2): Likewise. (*negabs2_cc): Likewise. (*negabs2_cconly): Likewise. 
(*negabs2_nocc): Likewise. (*negabs2): Likewise. (sqrt2): Rename to sqrt2, use type instead of mode with fsqrt. (cbranch4): Us
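The "4 times more data" figure in the description of this patch follows from simple register arithmetic. The sketch below spells it out; the register counts (16 floating point registers, 32 vector registers) are an assumption taken from the z/Architecture register file rather than from the patch itself, and all names are illustrative.

```c
#include <assert.h>

enum
{
  NUM_FPRS = 16,            /* floating point registers (assumed) */
  NUM_VRS = 32,             /* 128-bit vector registers (assumed) */
  FPRS_PER_LONG_DOUBLE = 2, /* a 128-bit value occupies a FPR pair */
  VRS_PER_LONG_DOUBLE = 1   /* ... but fits into a single VR */
};

/* How many long doubles can live in FPR pairs at once.  */
static int
long_doubles_in_fprs (void)
{
  return NUM_FPRS / FPRS_PER_LONG_DOUBLE;
}

/* How many long doubles can live in vector registers at once.  */
static int
long_doubles_in_vrs (void)
{
  return NUM_VRS / VRS_PER_LONG_DOUBLE;
}

/* The capacity ratio quoted in the patch description.  */
static void
self_check (void)
{
  assert (long_doubles_in_vrs () / long_doubles_in_fprs () == 4);
}
```

Under these assumptions 8 long doubles fit into FPR pairs and 32 fit into vector registers, which matches the factor of 4 and the "Return 1 for TFmode on z14+" entry for s390_class_max_nregs.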
[PATCH 0/2] IBM Z: Store long doubles in vector registers when possible
Bootstrapped and regtested on s390x-redhat-linux with --with-arch=z15. Ok for master? This patch series implements storing long doubles in vector registers on z14+. Patch 1 is the actual implementation, patch 2 adds tests. v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557968.html v1 -> v2: * Committed cleanups. * Do not use general_operand for *cmptf_ccs. * Fix expander condition mismatches. * Move tests from zvector to vector, do not use -mzvector. * Merge scan and run tests where possible. Ilya Leoshkevich (2): IBM Z: Store long doubles in vector registers when possible IBM Z: Test long doubles in vector registers gcc/config/s390/s390-modes.def| 5 +- gcc/config/s390/s390-protos.h | 1 + gcc/config/s390/s390.c| 57 ++- gcc/config/s390/s390.h| 35 ++ gcc/config/s390/s390.md | 209 ++ gcc/config/s390/s390.opt | 11 + gcc/config/s390/vector.md | 382 -- gcc/config/s390/vx-builtins.md| 38 +- .../s390/vector/long-double-callee-abi-scan.c | 20 + .../s390/vector/long-double-caller-abi-run.c | 4 + .../s390/vector/long-double-caller-abi-scan.c | 13 + .../s390/vector/long-double-copysign.c| 21 + .../s390/vector/long-double-fprx2-constant.c | 11 + .../s390/vector/long-double-from-double.c | 18 + .../s390/vector/long-double-from-float.c | 19 + .../s390/vector/long-double-from-i16.c| 19 + .../s390/vector/long-double-from-i32.c| 19 + .../s390/vector/long-double-from-i64.c| 19 + .../s390/vector/long-double-from-i8.c | 19 + .../s390/vector/long-double-from-u16.c| 19 + .../s390/vector/long-double-from-u32.c| 19 + .../s390/vector/long-double-from-u64.c| 19 + .../s390/vector/long-double-from-u8.c | 19 + .../s390/vector/long-double-to-double.c | 18 + .../s390/vector/long-double-to-float.c| 19 + .../s390/vector/long-double-to-i16.c | 19 + .../s390/vector/long-double-to-i32.c | 19 + .../s390/vector/long-double-to-i64.c | 21 + .../s390/vector/long-double-to-i8.c | 19 + .../s390/vector/long-double-to-u16.c | 20 + .../s390/vector/long-double-to-u32.c | 20 + 
.../s390/vector/long-double-to-u64.c | 20 + .../s390/vector/long-double-to-u8.c | 20 + .../s390/vector/long-double-vec-duplicate.c | 13 + .../gcc.target/s390/vector/long-double-wf.h | 60 +++ .../s390/vector/long-double-wfaxb.c | 17 + .../s390/vector/long-double-wfcxb-0001.c | 9 + .../s390/vector/long-double-wfcxb-0111.c | 9 + .../s390/vector/long-double-wfcxb-1011.c | 9 + .../s390/vector/long-double-wfcxb-1101.c | 9 + .../s390/vector/long-double-wfdxb.c | 17 + .../s390/vector/long-double-wfixb.c | 7 + .../s390/vector/long-double-wfkxb-0111.c | 9 + .../s390/vector/long-double-wfkxb-1011.c | 9 + .../s390/vector/long-double-wfkxb-1101.c | 9 + .../s390/vector/long-double-wflcxb.c | 7 + .../s390/vector/long-double-wflpxb.c | 7 + .../s390/vector/long-double-wfmaxb-2.c| 24 ++ .../s390/vector/long-double-wfmaxb-3.c| 14 + .../s390/vector/long-double-wfmaxb-disabled.c | 8 + .../s390/vector/long-double-wfmaxb.c | 7 + .../s390/vector/long-double-wfmsxb-disabled.c | 8 + .../s390/vector/long-double-wfmsxb.c | 7 + .../s390/vector/long-double-wfmxb.c | 7 + .../vector/long-double-wfnmaxb-disabled.c | 9 + .../s390/vector/long-double-wfnmaxb.c | 7 + .../vector/long-double-wfnmsxb-disabled.c | 9 + .../s390/vector/long-double-wfnmsxb.c | 7 + .../s390/vector/long-double-wfsqxb.c | 7 + .../s390/vector/long-double-wfsxb-1.c | 21 + .../s390/vector/long-double-wfsxb.c | 7 + .../s390/vector/long-double-wftcixb-1.c | 15 + .../s390/vector/long-double-wftcixb.c | 7 + 63 files changed, 1412 insertions(+), 134 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-callee-abi-scan.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-caller-abi-run.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-caller-abi-scan.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-copysign.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-fprx2-constant.c create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-from-double.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-float.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-i16.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-i32.c create mode 100644 gcc/testsuite/gcc
Re: [PATCH 4/4] IBM Z: Test long doubles in vector registers
On Wed, 2020-11-04 at 18:28 +0100, Andreas Krebbel wrote: > These tests all use the -mzvector option but do not appear to make > use of the z vector languages > extensions. I think that option could be removed. Then these tests > should be moved to the vector subdir. Will change, thanks! > You could do the asm scanning also in dg-do run tests. This doesn't seem to work. For example, if I add /* { dg-final { scan-assembler-times {aaa} 999 } } */ to long-double-from-double-run.c, it won't fail. > > Andreas > > > On 03.11.20 22:46, Ilya Leoshkevich wrote: > > gcc/testsuite/ChangeLog: > > > > 2020-11-03 Ilya Leoshkevich > > > > * gcc.target/s390/zvector/long-double-callee-abi-scan.c: New > > test. > > * gcc.target/s390/zvector/long-double-caller-abi-run.c: New > > test. > > * gcc.target/s390/zvector/long-double-caller-abi-scan.c: New > > test. > > * gcc.target/s390/zvector/long-double-copysign-run.c: New test. > > * gcc.target/s390/zvector/long-double-copysign-scan.c: New > > test. > > * gcc.target/s390/zvector/long-double-fprx2-constant.c: New > > test. > > * gcc.target/s390/zvector/long-double-from-double-run.c: New > > test. > > * gcc.target/s390/zvector/long-double-from-double-scan.c: New > > test. > > * gcc.target/s390/zvector/long-double-from-float-run.c: New > > test. > > * gcc.target/s390/zvector/long-double-from-float-scan.c: New > > test. > > * gcc.target/s390/zvector/long-double-from-i16-run.c: New test. > > * gcc.target/s390/zvector/long-double-from-i16-scan.c: New > > test. > > * gcc.target/s390/zvector/long-double-from-i32-run.c: New test. > > * gcc.target/s390/zvector/long-double-from-i32-scan.c: New > > test. > > * gcc.target/s390/zvector/long-double-from-i64-run.c: New test. > > * gcc.target/s390/zvector/long-double-from-i64-scan.c: New > > test. > > * gcc.target/s390/zvector/long-double-from-i8-run.c: New test. > > * gcc.target/s390/zvector/long-double-from-i8-scan.c: New test. > > * gcc.target/s390/zvector/long-double-from-u16-run.c: New test. 
> > * gcc.target/s390/zvector/long-double-from-u16-scan.c: New > > test. > > * gcc.target/s390/zvector/long-double-from-u32-run.c: New test. > > * gcc.target/s390/zvector/long-double-from-u32-scan.c: New > > test. > > * gcc.target/s390/zvector/long-double-from-u64-run.c: New test. > > * gcc.target/s390/zvector/long-double-from-u64-scan.c: New > > test. > > * gcc.target/s390/zvector/long-double-from-u8-run.c: New test. > > * gcc.target/s390/zvector/long-double-from-u8-scan.c: New test. > > * gcc.target/s390/zvector/long-double-to-double-run.c: New > > test. > > * gcc.target/s390/zvector/long-double-to-double-scan.c: New > > test. > > * gcc.target/s390/zvector/long-double-to-float-run.c: New test. > > * gcc.target/s390/zvector/long-double-to-float-scan.c: New > > test. > > * gcc.target/s390/zvector/long-double-to-i16-run.c: New test. > > * gcc.target/s390/zvector/long-double-to-i16-scan.c: New test. > > * gcc.target/s390/zvector/long-double-to-i32-run.c: New test. > > * gcc.target/s390/zvector/long-double-to-i32-scan.c: New test. > > * gcc.target/s390/zvector/long-double-to-i64-run.c: New test. > > * gcc.target/s390/zvector/long-double-to-i64-scan.c: New test. > > * gcc.target/s390/zvector/long-double-to-i8-run.c: New test. > > * gcc.target/s390/zvector/long-double-to-i8-scan.c: New test. > > * gcc.target/s390/zvector/long-double-to-u16-run.c: New test. > > * gcc.target/s390/zvector/long-double-to-u16-scan.c: New test. > > * gcc.target/s390/zvector/long-double-to-u32-run.c: New test. > > * gcc.target/s390/zvector/long-double-to-u32-scan.c: New test. > > * gcc.target/s390/zvector/long-double-to-u64-run.c: New test. > > * gcc.target/s390/zvector/long-double-to-u64-scan.c: New test. > > * gcc.target/s390/zvector/long-double-to-u8-run.c: New test. > > * gcc.target/s390/zvector/long-double-to-u8-scan.c: New test. > > * gcc.target/s390/zvector/long-double-vec-duplicate.c: New > > test. > > * gcc.target/s390/zvector/long-double-wf.h: New test. 
> > * gcc.target/s390/zvector/long-double-wfaxb-run.c: New test. > > * gcc.target/s390/zvector/long-double-wfaxb-scan.c: New test. > > * gcc.target/s390/zvector/long-double-wfaxb.c: New test. > > * gcc.target/s390/zvector/long-double-wfcxb-0001.c: New test. > > * gcc.target/s390/zvector/long-double-wfcxb-0111.c: New test. > > * gcc.target/s390/zvector/long-double-wfcxb-1011.c: New test. > > * gcc.target/s390/zvector/long-double-wfcxb-1101.c: New test. > > * gcc.target/s390/zvector/long-double-wfdxb-run.c: New test. > > * gcc.target/s390/zvector/long-double-wfdxb-scan.c: New test. > > * gcc.target/s390/zvector/long-double-wfdxb.c: New test. > > * gcc.target/s390/zvector/long-double-wfixb.c: New test. > > * gcc.target/s390/zvector/long-double-wfkxb-0111.c: New test. > >
Re: [PATCH 3/4] IBM Z: Store long doubles in vector registers when possible
On Wed, 2020-11-04 at 18:16 +0100, Andreas Krebbel wrote: > On 03.11.20 22:45, Ilya Leoshkevich wrote: > > On z14+, there are instructions for working with 128-bit floats > > (long > > doubles) in vector registers. It's beneficial to use them instead > > of > > instructions that operate on floating point register pairs, because > > it > > allows storing 4 times more data in registers at a time, > > relieving > > register pressure. The performance of new instructions is almost > > the > > same. > > > > Implement by storing TFmode values in vector registers on > > z14+. Since > > not all operations are available with the new instructions, keep > > the old > > ones using the new FPRX2 mode, and convert between it and TFmode > > when > > necessary (this is called "forwarder" expanders below). Change the > > existing TFmode expanders to call either new- or old-style ones > > depending on whether we are on z14+ or older machines ("dispatcher" > > expanders). > > > > gcc/ChangeLog: > > > > 2020-11-03 Ilya Leoshkevich > > > > * config/s390/s390-modes.def (FPRX2): New mode. > > * config/s390/s390-protos.h (s390_fma_allowed_p): New function. > > * config/s390/s390.c (s390_fma_allowed_p): Likewise. > > (s390_build_signbit_mask): Support 128-bit masks. > > (print_operand): Support printing the second word of a TFmode > > operand as vector register. > > (constant_modes): Add FPRX2mode. > > (s390_class_max_nregs): Return 1 for TFmode on z14+. > > (s390_is_fpr128): New function. > > (s390_is_vr128): Likewise. > > (s390_can_change_mode_class): Use s390_is_fpr128 and > > s390_is_vr128 in order to determine whether mode refers to a > > FPR > > pair or to a VR. > > * config/s390/s390.h (EXPAND_MOVTF): New macro. > > (EXPAND_TF): Likewise. > > * config/s390/s390.md (PFPO_OP_TYPE_FPRX2): PFPO_OP_TYPE_TF > > alias. > > (ALL): Add FPRX2. > > (FP_ALL): Add FPRX2 for z14+, restrict TFmode to z13-. > > (FP): Likewise. > > (FP_ANYTF): New mode iterator. 
> > (BFP): Add FPRX2 for z14+, restrict TFmode to z13-. > > (TD_TF): Likewise. > > (xde): Add FPRX2. > > (nBFP): Likewise. > > (nDFP): Likewise. > > (DSF): Likewise. > > (DFDI): Likewise. > > (SFSI): Likewise. > > (DF): Likewise. > > (SF): Likewise. > > (fT0): Likewise. > > (bt): Likewise. > > (_d): Likewise. > > (HALF_TMODE): Likewise. > > (tf_fpr): New mode_attr. > > (type): New mode_attr. > > (*cmp_ccz_0): Use type instead of mode with fsimp. > > (*cmp_ccs_0_fastmath): Likewise. > > (*cmptf_ccs): New pattern for wfcxb. > > (*cmptf_ccsfps): New pattern for wfkxb. > > (mov): Rename to mov. > > (signbit2): Rename to signbit2. > > (isinf2): Renamed to isinf2. > > (*TDC_insn_): Use type instead of mode with fsimp. > > (fixuns_trunc2): Rename to > > fixuns_trunc2. > > (fix_trunctf2): Rename to fix_trunctf2_fpr. > > (floatdi2): Rename to floatdi2, use type > > instead of mode with itof. > > (floatsi2): Rename to floatsi2, use type > > instead of mode with itof. > > (*floatuns2): Use type instead of mode for > > itof. > > (floatuns2): Rename to > > floatuns2. > > (trunctf2): Rename to trunctf2_fpr, use type > > instead > > of mode with fsimp. > > (extend2): Rename to > > extend2. > > (2): Rename to > > 2, use type instead of > > mode with fsimp. > > (rint2): Rename to rint2, use > > type instead of mode with fsimp. > > (2): Use type instead of mode for > > fsimp. > > (rint2): Likewise. > > (trunc2): Rename to > > trunc2. > > (trunc2): Rename to > > trunc2. > > (extend2): Rename to > > extend2. > > (extend2): Rename to > > extend2. > > (add3): Rename to add3, use type instead of > > mode with fsimp. > > (*add3_cc): Use type instead of mode with fsimp. > > (*add3_cconly): Likewise. > > (sub3): Rename to sub3, use type instead of > > mode with fsimp. > > (*sub3_cc): Use type instead of mode with fsimp. > > (*sub3_cconly): Likewise. > > (mul3): Rename to mul3, use type instead of > > mode with fsimp. > > (fma4): Restrict using s390_fma_allowed_p. 
> > (fms4): Restrict using s390_fma_allowed_p. > > (div3): Rename to div3, use type instead of > > mode with fdiv. > > (neg2): Rename to neg2. > > (*neg2_cc): Use type instead of mode with fsimp. > > (*neg2_cconly): Likewise. > > (*neg2_nocc): Likewise. > > (*neg2): Likewise. > > (abs2): Rename to abs2, use type instead of > > mode with fdiv. > > (*abs2_cc): Use type instead of mode with fsimp. > > (*abs2_cconly): Likewise. > > (*abs2_nocc): Likewise. > > (*abs2): Likewise. > > (*negabs2_cc): Likewise. > > (*negabs2_cconly): Likewise. > > (*negabs2_nocc): Likewise. > > (*negabs2): Likewise. > > (sqrt2): Rename to sqrt2, u