Re: [PATCH] s390: Fix TF to FPRX2 conversion [PR115860]
On Wed, 2024-09-11 at 16:44 +0200, Stefan Schulze Frielinghaus wrote:
> On Wed, Sep 11, 2024 at 01:59:48PM +0200, Ilya Leoshkevich wrote:
> > On Wed, 2024-09-11 at 13:34 +0200, Stefan Schulze Frielinghaus wrote:
> > > On Wed, Sep 11, 2024 at 01:22:30PM +0200, Ilya Leoshkevich wrote:
> > > > On Wed, 2024-09-11 at 12:35 +0200, Stefan Schulze Frielinghaus wrote:
> > > > > On Wed, Sep 11, 2024 at 11:47:54AM +0200, Ilya Leoshkevich wrote:
> > > > > > On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze Frielinghaus wrote:
> > > > > > > [...]
Re: [PATCH] s390: Fix TF to FPRX2 conversion [PR115860]
On Wed, 2024-09-11 at 13:34 +0200, Stefan Schulze Frielinghaus wrote:
> On Wed, Sep 11, 2024 at 01:22:30PM +0200, Ilya Leoshkevich wrote:
> > On Wed, 2024-09-11 at 12:35 +0200, Stefan Schulze Frielinghaus wrote:
> > > On Wed, Sep 11, 2024 at 11:47:54AM +0200, Ilya Leoshkevich wrote:
> > > > On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze Frielinghaus wrote:
> > > > > [...]
Re: [PATCH] s390: Fix TF to FPRX2 conversion [PR115860]
On Wed, 2024-09-11 at 12:35 +0200, Stefan Schulze Frielinghaus wrote:
> On Wed, Sep 11, 2024 at 11:47:54AM +0200, Ilya Leoshkevich wrote:
> > On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze Frielinghaus wrote:
> > > Currently subregs originating from *tf_to_fprx2_0 and *tf_to_fprx2_1
> > > survive register allocation. This in turn leads to wrong register
> > > renaming. Keeping the current approach would mean we need two insns
> > > for *tf_to_fprx2_0 and *tf_to_fprx2_1, respectively. Something along
> > > the lines
> > >
> > > (define_insn "*tf_to_fprx2_0"
> > >   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 0)
> > >         (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
> > >                    UNSPEC_TF_TO_FPRX2_0))]
> > >   "TARGET_VXE"
> > >   "#")
> > >
> > > (define_insn "*tf_to_fprx2_0"
> > >   [(set (match_operand:DF 0 "nonimmediate_operand" "=f")
> > >         (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
> > >                    UNSPEC_TF_TO_FPRX2_0))]
> > >   "TARGET_VXE"
> > >   "vpdi\t%v0,%v1,%v0,1"
> > >   [(set_attr "op_type" "VRR")])
> > >
> > > and similar for *tf_to_fprx2_1. Note, pre register allocation
> > > operand 0 has mode FPRX2 and afterwards DF once subregs have been
> > > eliminated.
> > >
> > > Since we always copy a whole vector register into a floating-point
> > > register pair, another way to fix this is to merge *tf_to_fprx2_0
> > > and *tf_to_fprx2_1 into a single insn which means we don't have to
> > > use subregs at all. The downside of this is that the assembler
> > > template contains two instructions, now. The upside is that we don't
> > > have to come up with some artificial insn before RA which might be
> > > more readable/maintainable. That is implemented by this patch.
> > >
> > > In commit r11-4872-ge627cda5686592, the output operand specifier %V
> > > was introduced which is used in tf_to_fprx2 only, now. I didn't come
> > > up with its counterpart like %F for floating-point registers.
> > > Instead I printed the register pair in the output function directly.
> > > This spares us a new and "rare" format specifier for a single insn.
> > > I don't have a strong opinion which option to choose, however, we
> > > should either add %F in order to mimic the same behaviour as %V or
> > > get rid of %V and inline the logic in the output function. I lean
> > > towards the latter. Any preferences?
> > > ---
> > >  gcc/config/s390/s390.md                    |  2 +
> > >  gcc/config/s390/vector.md                  | 66 +++---
> > >  gcc/testsuite/gcc.target/s390/pr115860-1.c | 26 +
> > >  3 files changed, 60 insertions(+), 34 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/s390/pr115860-1.c
> >
> > [...]
> >
> > > +  char buf[64];
> > > +  switch (which_alternative)
> > > +    {
> > > +    case 0:
> > > +      if (REGNO (operands[0]) == REGNO (operands[1]))
> > > +        return "vpdi\t%V0,%v1,%V0,5";
> > > +      else
> > > +        return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5";
> > > +    case 1:
> > > +      {
> > > +        const char *reg_pair = reg_names[REGNO (operands[0]) + 1];
> > > +        snprintf (buf, sizeof (buf), "ld\t%%f0,%%1;ld\t%%%s,8+%%1", reg_pair);
> >
> > I wonder if there is a corner case where 8+ does not fit into short
> > displacement?
>
> That is covered by constraint AR, i.e., for short displacement, and AT
> for long displacement.

Don't they cover only %1, and not 8+%1?
Can't there be a situation where %1 barely fits and 8+%1 doesn't fit?

A quick glance shows that the code doesn't leave any allowance for this:

"AR"
  s390_mem_constraint("AR")
    s390_check_qrst_address('R')
      s390_short_displacement()
        INTVAL (disp) >= 0 && INTVAL (disp) < 4096
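To make the concern concrete, here is a minimal sketch. It assumes only the range check quoted above (0 <= disp < 4096); both helper names are hypothetical and are not GCC functions. The second helper shows what a check with an allowance for the 8+%1 access might look like.

```c
#include <stdbool.h>

/* Mirrors the quoted range check: a short displacement must lie in
   0..4095.  The function name is hypothetical, not the GCC one.  */
static bool
short_displacement_p (long disp)
{
  return disp >= 0 && disp < 4096;
}

/* Hypothetical stricter check for a template that accesses both DISP
   and DISP + 8, as the two-ld sequence does.  */
static bool
pair_fits_short_displacement_p (long disp)
{
  return short_displacement_p (disp) && short_displacement_p (disp + 8);
}
```

With disp = 4088 the first access passes the quoted check, but the second one would need displacement 4096, which is out of range. Only displacements up to 4087 are safe for both accesses.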
Re: [PATCH] s390: Fix TF to FPRX2 conversion [PR115860]
On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze Frielinghaus wrote:
> Currently subregs originating from *tf_to_fprx2_0 and *tf_to_fprx2_1
> survive register allocation. This in turn leads to wrong register
> renaming. Keeping the current approach would mean we need two insns
> for *tf_to_fprx2_0 and *tf_to_fprx2_1, respectively. Something along
> the lines
>
> (define_insn "*tf_to_fprx2_0"
>   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 0)
>         (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
>                    UNSPEC_TF_TO_FPRX2_0))]
>   "TARGET_VXE"
>   "#")
>
> (define_insn "*tf_to_fprx2_0"
>   [(set (match_operand:DF 0 "nonimmediate_operand" "=f")
>         (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
>                    UNSPEC_TF_TO_FPRX2_0))]
>   "TARGET_VXE"
>   "vpdi\t%v0,%v1,%v0,1"
>   [(set_attr "op_type" "VRR")])
>
> and similar for *tf_to_fprx2_1. Note, pre register allocation
> operand 0 has mode FPRX2 and afterwards DF once subregs have been
> eliminated.
>
> Since we always copy a whole vector register into a floating-point
> register pair, another way to fix this is to merge *tf_to_fprx2_0 and
> *tf_to_fprx2_1 into a single insn which means we don't have to use
> subregs at all. The downside of this is that the assembler template
> contains two instructions, now. The upside is that we don't have to
> come up with some artificial insn before RA which might be more
> readable/maintainable. That is implemented by this patch.
>
> In commit r11-4872-ge627cda5686592, the output operand specifier %V
> was introduced which is used in tf_to_fprx2 only, now. I didn't come
> up with its counterpart like %F for floating-point registers. Instead
> I printed the register pair in the output function directly. This
> spares us a new and "rare" format specifier for a single insn. I don't
> have a strong opinion which option to choose, however, we should
> either add %F in order to mimic the same behaviour as %V or get rid of
> %V and inline the logic in the output function. I lean towards the
> latter. Any preferences?
> ---
>  gcc/config/s390/s390.md                    |  2 +
>  gcc/config/s390/vector.md                  | 66 +++---
>  gcc/testsuite/gcc.target/s390/pr115860-1.c | 26 +
>  3 files changed, 60 insertions(+), 34 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/s390/pr115860-1.c

[...]

> +  char buf[64];
> +  switch (which_alternative)
> +    {
> +    case 0:
> +      if (REGNO (operands[0]) == REGNO (operands[1]))
> +        return "vpdi\t%V0,%v1,%V0,5";
> +      else
> +        return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5";
> +    case 1:
> +      {
> +        const char *reg_pair = reg_names[REGNO (operands[0]) + 1];
> +        snprintf (buf, sizeof (buf), "ld\t%%f0,%%1;ld\t%%%s,8+%%1", reg_pair);

I wonder if there is a corner case where 8+ does not fit into short
displacement?

[...]
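For readers unfamiliar with the escaping in the quoted snprintf call, the following sketch shows what the format string expands to. The wrapper function and the register name passed to it are assumptions for illustration; only the format string is taken from the hunk above.

```c
#include <stdio.h>

/* Build the two-load template for a register pair.  SECOND_REG stands
   in for reg_names[REGNO (operands[0]) + 1]; the concrete value used by
   callers of this sketch is an assumption.  Every "%%" in the format
   yields a single '%' in the result.  */
static int
build_pair_template (char *buf, size_t size, const char *second_reg)
{
  return snprintf (buf, size, "ld\t%%f0,%%1;ld\t%%%s,8+%%1", second_reg);
}
```

Called with a second_reg of "%f2", the buffer holds "ld\t%f0,%1;ld\t%%f2,8+%1": the operand references %f0 and %1 are left for the later template expansion, while the explicitly printed register name ends up behind a doubled percent sign.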
Re: [PATCH] s390: Fix s390_const_int_pool_entry_p and movdi peephole2 [PR114605]
On Sat, 2024-04-06 at 18:58 +0200, Jakub Jelinek wrote:
> Hi!
>
> The following testcase is miscompiled, because we have initially
> a movti which loads the 0x3f803f80ULL TImode constant
> from constant pool. Later on we split it into a pair of DImode
> loads. Now, for the first load (why just that?, though not stage4
> material) we trigger the peephole2 which uses
> s390_const_int_pool_entry_p.
> That function doesn't check at all the constant pool mode though, sees
> the constant pool at that address has a CONST_INT value and just
> assumes that is the value to return, which is especially wrong for
> big-endian, if it is a DImode load from offset 0, it should be loading
> 0 rather than 0x3f803f80ULL.
> The following patch adds checks if we are extracting a MODE_INT mode,
> if the constant pool has MODE_INT mode as well, punts if constant pool
> has smaller mode size than the extraction one (then it would be UB),
> if it has the same mode as before keeps using what it did before,
> if constant pool has a larger mode than the one being extracted, uses
> simplify_subreg. I'd have used avoid_constant_pool_reference
> instead which can handle also offsets into the constant pool
> constants, but it can't handle UNSPEC_LTREF.
>
> Another thing is that once that is fixed, we ICE when we extract
> constant like 0, ior insn predicate require non-0 constant. So, the
> patch also fixes the peephole2 so that if either 32-bit half is zero,
> it uses a mere load of the constant into register rather than a pair
> of such load and ior.
>
> Bootstrapped/regtested on s390x-linux, ok for trunk?

Hi Jakub,

thanks for the patch, it looks good to me.
Since I'm not a maintainer, we need to wait for Andreas' opinion.

>
> 2024-04-06  Jakub Jelinek
>
> 	PR target/114605
> 	* config/s390/s390.cc (s390_const_int_pool_entry_p): Punt
> 	if mem doesn't have MODE_INT mode, or pool constant doesn't
> 	have MODE_INT mode, or if pool constant mode is smaller than
> 	mem mode. If mem mode is different from pool constant mode,
> 	try to simplify subreg. If that doesn't work, punt, if it
> 	does, use the simplified constant instead of the constant pool
> 	constant.
> 	* config/s390/s390.md (movdi from const pool peephole): If
> 	either low or high 32-bit part is zero, just emit move insn
> 	instead of move + ior.
>
> 	* gcc.dg/pr114605.c: New test.
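The big-endian point in the description can be illustrated with a small standalone sketch. It does not use any GCC internals; the helper names are hypothetical, and the 128-bit constant is modelled as two 64-bit halves with the upper half assumed to be zero, as the description implies.

```c
#include <stdint.h>

/* Store a 128-bit value, given as two 64-bit halves, in big-endian byte
   order, roughly the way a TImode constant pool entry would be laid out
   on a big-endian target.  */
static void
store_ti_big_endian (unsigned char mem[16], uint64_t hi, uint64_t lo)
{
  for (int i = 0; i < 8; i++)
    {
      mem[i] = (unsigned char) (hi >> (56 - 8 * i));
      mem[8 + i] = (unsigned char) (lo >> (56 - 8 * i));
    }
}

/* Load a big-endian 64-bit value from MEM + OFFSET, modelling a DImode
   load from the pool entry.  */
static uint64_t
load_di_big_endian (const unsigned char *mem, int offset)
{
  uint64_t value = 0;
  for (int i = 0; i < 8; i++)
    value = (value << 8) | mem[offset + i];
  return value;
}
```

Storing (0, 0x3f803f80) and loading eight bytes from offset 0 gives 0, while offset 8 gives 0x3f803f80. Returning the whole pool constant for the offset-0 load, regardless of mode, is therefore the wrong value.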
[PATCH] libsanitizer: Do not mention MSan and DFSan in an error message
Bootstrapped and regtested on s390x-redhat-linux. Ok for master?

libsanitizer/ChangeLog:

	* sanitizer_common/sanitizer_linux_s390.cpp (AvoidCVE_2016_2143):
	Do not mention MSan and DFSan, which are not supported by GCC.
---
 libsanitizer/sanitizer_common/sanitizer_linux_s390.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libsanitizer/sanitizer_common/sanitizer_linux_s390.cpp b/libsanitizer/sanitizer_common/sanitizer_linux_s390.cpp
index 74db831b0aa..65ba825fa97 100644
--- a/libsanitizer/sanitizer_common/sanitizer_linux_s390.cpp
+++ b/libsanitizer/sanitizer_common/sanitizer_linux_s390.cpp
@@ -212,7 +212,7 @@ void AvoidCVE_2016_2143() {
     return;
   Report(
     "ERROR: Your kernel seems to be vulnerable to CVE-2016-2143. Using ASan,\n"
-"MSan, TSan, DFSan or LSan with such kernel can and will crash your\n"
+"TSan or LSan with such kernel can and will crash your\n"
     "machine, or worse.\n"
     "\n"
     "If you are certain your kernel is not vulnerable (you have compiled it\n"
-- 
2.44.0
[PATCH] IBM Z: Preserve exceptions in autovec-*-signaling-eq.c tests
DSE, DCE, and other passes are removing redundant signaling comparisons
from these tests, but the whole point is to check that GCC knows how to
emit them. Use -fno-delete-dead-exceptions to prevent that.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/zvector/autovec-double-signaling-eq.c: Preserve
	exceptions.
	* gcc.target/s390/zvector/autovec-float-signaling-eq.c: Likewise.
---
 .../gcc.target/s390/zvector/autovec-double-signaling-eq.c | 2 +-
 .../gcc.target/s390/zvector/autovec-float-signaling-eq.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
index 3645d3cc393..b23568e06b4 100644
--- a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions -fnon-call-exceptions" } */
+/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions -fnon-call-exceptions -fno-delete-dead-exceptions" } */
 
 #include "autovec.h"
 
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
index d98aa0c494e..cd25d10c577 100644
--- a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions -fnon-call-exceptions" } */
+/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions -fnon-call-exceptions -fno-delete-dead-exceptions" } */
 
 #include "autovec.h"
 
-- 
2.43.2
[PATCH] Mark ASM_OUTPUT_FUNCTION_LABEL ()'s DECL argument as used
Compile tested for the ia64-elf target; bootstrap and regtest running on
x86_64-redhat-linux. Ok for trunk when successful?

ia64-elf build fails with the following warning:

[all 2024-01-12 16:32:34] ../../gcc/gcc/config/ia64/ia64.cc:3889:59: error: unused parameter 'decl' [-Werror=unused-parameter]
[all 2024-01-12 16:32:34]  3889 | ia64_start_function (FILE *file, const char *fnname, tree decl)

decl is passed to ASM_OUTPUT_FUNCTION_LABEL (), whose default
implementation does not use it. Mark it as used in order to avoid the
warning.

Reported-by: Jan-Benedict Glaw
Suggested-by: Jan-Benedict Glaw
Fixes: c659dd8bfb55 ("Implement ASM_DECLARE_FUNCTION_NAME using ASM_OUTPUT_FUNCTION_LABEL")
Signed-off-by: Ilya Leoshkevich

gcc/ChangeLog:

	* defaults.h (ASM_OUTPUT_FUNCTION_LABEL): Mark DECL as used.
---
 gcc/defaults.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/defaults.h b/gcc/defaults.h
index 92f3e07f742..1a2ea68a543 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -149,8 +149,11 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
    NAME, such as the label on a function. */
 
 #ifndef ASM_OUTPUT_FUNCTION_LABEL
-#define ASM_OUTPUT_FUNCTION_LABEL(FILE, NAME, DECL) \
-  assemble_function_label_raw ((FILE), (NAME))
+#define ASM_OUTPUT_FUNCTION_LABEL(FILE, NAME, DECL)	\
+  do {							\
+    (void) (DECL);					\
+    assemble_function_label_raw ((FILE), (NAME));	\
+  } while (0)
 #endif
 
 /* Output the definition of a compiler-generated label named NAME. */
-- 
2.43.0
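The idiom used by the patch can be shown outside of GCC. The sketch below is a reduced model: emit_label_raw (), EMIT_FUNCTION_LABEL and start_function () are hypothetical stand-ins for assemble_function_label_raw (), ASM_OUTPUT_FUNCTION_LABEL and a target's start-function hook, and the label is written to a buffer rather than to the assembly file.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in for assemble_function_label_raw (): writes "NAME:\n" to OUT.  */
static void
emit_label_raw (char *out, size_t size, const char *name)
{
  snprintf (out, size, "%s:\n", name);
}

/* The macro ignores DECL, but evaluating it in a void cast counts as a
   use, so callers that only forward a parameter to the macro no longer
   trigger -Wunused-parameter.  The do/while (0) wrapper keeps the two
   statements usable wherever a single statement is expected.  */
#define EMIT_FUNCTION_LABEL(OUT, SIZE, NAME, DECL)	\
  do {							\
    (void) (DECL);					\
    emit_label_raw ((OUT), (SIZE), (NAME));		\
  } while (0)

/* Without the void cast, DECL would be unused here after expansion.  */
static void
start_function (char *out, size_t size, const char *fnname, int decl)
{
  EMIT_FUNCTION_LABEL (out, size, fnname, decl);
}
```

Calling start_function (buf, sizeof (buf), "foo", 42) leaves "foo:\n" in buf; the decl argument has no effect on the output.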
[PATCH v2] rs6000: Fix ASAN linker errors for Power ELF V1 ABI [PR113284]
v1: https://inbox.sourceware.org/gcc-patches/20240109105253.332676-1-...@linux.ibm.com/
v1 -> v2: Move the .LASANPC label to the .text section (Jakub).
          Jakub okay-ed this version in the GCC Bugzilla.

Bootstrap and regtest running on ppc64le-redhat-linux and
powerpc64-linux-gnu. Ok for trunk when successful?

rs6000_elf_declare_function_name () outputs Power ELF V1 ABI function
entry labels without using ASM_OUTPUT_FUNCTION_LABEL (). As a result,
.LASANPC labels are not emitted, causing linker errors.

In theory, it is possible to reuse ASM_OUTPUT_FUNCTION_LABEL () by
changing rs6000_output_function_entry () to generate label names
without outputting them, but this would be quite a large change.
Instead, factor out the .LASANPC emitting code from
ASM_OUTPUT_FUNCTION_LABEL () and call it manually.

Fixes: c659dd8bfb55 ("Implement ASM_DECLARE_FUNCTION_NAME using ASM_OUTPUT_FUNCTION_LABEL")
Suggested-by: Jakub Jelinek
Signed-off-by: Ilya Leoshkevich

gcc/ChangeLog:

	PR sanitizer/113284
	* config/rs6000/rs6000.cc (rs6000_elf_declare_function_name):
	Use assemble_function_label_final () for Power ELF V1 ABI.
	* output.h (assemble_function_label_final): New function.
	* varasm.cc (assemble_function_label_raw): Use
	assemble_function_label_final ().
	(assemble_function_label_final): New function.
---
 gcc/config/rs6000/rs6000.cc | 1 +
 gcc/output.h                | 4 ++++
 gcc/varasm.cc               | 9 +++++++++
 3 files changed, 14 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 94fbf46f2b6..5d975dab921 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -21357,6 +21357,7 @@ rs6000_elf_declare_function_name (FILE *file, const char *name, tree decl)
       ASM_DECLARE_RESULT (file, DECL_RESULT (decl));
       rs6000_output_function_entry (file, name);
       fputs (":\n", file);
+      assemble_function_label_final ();
       return;
     }
 
diff --git a/gcc/output.h b/gcc/output.h
index c8fe1d2643d..46b0033b221 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -182,6 +182,10 @@ extern const char *get_fnname_from_decl (tree);
    code or data is output after the label. */
 extern void assemble_function_label_raw (FILE *, const char *);
 
+/* Finish outputting function label. Needs to be called when outputting
+   function label without using assemble_function_label_raw (). */
+extern void assemble_function_label_final (void);
+
 /* Output assembler code for the constant pool of a function and associated
    with defining the name of the function. DECL describes the function.
    NAME is the function's name. For the constant pool, we use the current
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 1a869ae458a..2b633822434 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -1843,6 +1843,15 @@ void
 assemble_function_label_raw (FILE *file, const char *name)
 {
   ASM_OUTPUT_LABEL (file, name);
+  assemble_function_label_final ();
+}
+
+/* Finish outputting function label. Needs to be called when outputting
+   function label without using assemble_function_label_raw (). */
+
+void
+assemble_function_label_final (void)
+{
   if ((flag_sanitize & SANITIZE_ADDRESS)
       /* Notify ASAN only about the first function label. */
       && (in_cold_section_p == first_function_block_is_cold)
-- 
2.43.0
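The shape of the refactoring can be sketched in isolation. Everything below is a simplified model with hypothetical names: struct asm_out replaces the assembly output file, label_final () plays the role of assemble_function_label_final (), and the ".LASANPC:" text and the ".L." prefix are placeholders rather than what GCC prints.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Tiny in-memory replacement for the assembly output stream.  */
struct asm_out
{
  char text[128];
  size_t len;
};

/* Append S to OUT if it fits.  */
static void
emit (struct asm_out *out, const char *s)
{
  size_t n = strlen (s);
  if (out->len + n < sizeof (out->text))
    {
      memcpy (out->text + out->len, s, n + 1);
      out->len += n;
    }
}

/* Stand-in for the sanitizer flag check.  */
static bool asan_enabled = true;

/* Work that must follow every function label, however it was printed.  */
static void
label_final (struct asm_out *out)
{
  if (asan_enabled)
    emit (out, ".LASANPC:\n");
}

/* Generic path: print the label, then run the shared tail.  */
static void
label_raw (struct asm_out *out, const char *name)
{
  emit (out, name);
  emit (out, ":\n");
  label_final (out);
}

/* A target that prints its own entry label cannot go through
   label_raw (), so it calls the shared tail by hand.  */
static void
custom_entry_label (struct asm_out *out, const char *name)
{
  emit (out, ".L.");
  emit (out, name);
  emit (out, ":\n");
  label_final (out);
}
```

Both paths end with the same marker, which is the property the patch restores for the Power ELF V1 ABI branch.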
Re: [PATCH v2 2/2] asan: Align .LASANPC on function boundary
On Tue, 2024-01-09 at 11:55 -0700, Jeff Law wrote:
>
> On 1/2/24 12:41, Ilya Leoshkevich wrote:
> > GCC can emit code between the function label and the .LASANPC label,
> > making the latter unaligned. Some architectures cannot load unaligned
> > labels directly and require literal pool entries, which is
> > inefficient.
> >
> > Move the invocation of asan_function_start to
> > ASM_OUTPUT_FUNCTION_LABEL, which guarantees that no additional code is
> > emitted. This allows setting the .LASANPC label alignment to the
> > respective function alignment.
> > ---
> >  gcc/asan.cc             |  6 ++----
> >  gcc/config/i386/i386.cc |  2 +-
> >  gcc/config/s390/s390.cc |  2 +-
> >  gcc/defaults.h          |  2 +-
> >  gcc/final.cc            |  3 ---
> >  gcc/output.h            |  4 ++++
> >  gcc/varasm.cc           | 14 ++++++++++++++
> >  7 files changed, 23 insertions(+), 10 deletions(-)
> So this needs a ChangeLog obviously. I assume you've tested on s390[x].
> It should also be tested on x86 since it's the only other platform
> that redefined ASM_OUTPUT_FUNCTION_LABEL.
>
> Assuming those tests pass without regression, then this is fine for the
> trunk.
>
> Thanks,
> Jeff

Hi Jeff,

Since Jakub already approved this 2/2, you approved 1/2, and
x86_64/ppc64le/s390x regtests were successful, I've already pushed
this series (with ChangeLogs).

Unfortunately people discovered two regressions on i686 [1] and
ppc64be [2]. The first one is already sorted out, I'm currently
regtesting the fix for the second one and will push it as soon as
it's done.

Best regards,
Ilya

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113251
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113284
[PATCH] rs6000: Fix ASAN linker errors for Power ELF V1 ABI [PR113284]
Bootstrap and regtest running on ppc64le-redhat-linux and
powerpc64-linux-gnu. Ok for trunk when successful?

Use ASM_OUTPUT_FUNCTION_LABEL () instead of ASM_OUTPUT_LABEL () in the
Power ELF V1 ABI branch of rs6000_elf_declare_function_name () to
ensure that the .LASANPC label is emitted. The other branches already
use the correct macro.

Fixes: c659dd8bfb55 ("Implement ASM_DECLARE_FUNCTION_NAME using ASM_OUTPUT_FUNCTION_LABEL")
Signed-off-by: Ilya Leoshkevich

gcc/ChangeLog:

	PR sanitizer/113284
	* config/rs6000/rs6000.cc (rs6000_elf_declare_function_name):
	Use ASM_OUTPUT_FUNCTION_LABEL () for Power ELF V1 ABI.
---
 gcc/config/rs6000/rs6000.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 94fbf46f2b6..fd9bb807957 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -21334,7 +21334,7 @@ rs6000_elf_declare_function_name (FILE *file, const char *name, tree decl)
   if (TARGET_64BIT && DEFAULT_ABI != ABI_ELFv2)
     {
       fputs ("\t.section\t\".opd\",\"aw\"\n\t.align 3\n", file);
-      ASM_OUTPUT_LABEL (file, name);
+      ASM_OUTPUT_FUNCTION_LABEL (file, name, decl);
       fputs (DOUBLE_INT_ASM_OP, file);
       rs6000_output_function_entry (file, name);
       fputs (",.TOC.@tocbase,0\n\t.previous\n", file);
-- 
2.43.0
[PATCH] asan: Do not call asan_function_start () without the current function [PR113251]
Bootstrap and regtest running on x86_64-redhat-linux,
ppc64le-redhat-linux and s390x-redhat-linux. Ok for trunk when
successful?

Using ASAN on i686-linux with -fPIC causes an ICE, because when
pc_thunks are generated, there is no current function anymore, but
asan_function_start () expects one.

Fix by not calling asan_function_start () without one.

A narrower fix would be to temporarily disable ASAN around pc_thunk
generation. However, the issue looks generic enough, and may affect
less often tested configurations, so go for a broader fix.

Fixes: e66dc37b299c ("asan: Align .LASANPC on function boundary")
Suggested-by: Jakub Jelinek
Signed-off-by: Ilya Leoshkevich

gcc/ChangeLog:

	PR sanitizer/113251
	* varasm.cc (assemble_function_label_raw): Do not call
	asan_function_start () without the current function.
---
 gcc/varasm.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 25c1e05628d..1a869ae458a 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -1845,7 +1845,9 @@ assemble_function_label_raw (FILE *file, const char *name)
   ASM_OUTPUT_LABEL (file, name);
   if ((flag_sanitize & SANITIZE_ADDRESS)
       /* Notify ASAN only about the first function label. */
-      && (in_cold_section_p == first_function_block_is_cold))
+      && (in_cold_section_p == first_function_block_is_cold)
+      /* Do not notify ASAN when called from, e.g., code_end (). */
+      && cfun)
     asan_function_start ();
 }
 
-- 
2.43.0
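The guard added by the patch follows a common pattern: a hook that depends on per-function state is only run while such state exists. The sketch below models this with hypothetical names (current_fn for cfun, function_start_hook () for asan_function_start ()); it is an illustration of the pattern, not GCC code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal per-function state.  */
struct function_ctx
{
  int funcdef_no;
};

/* Stand-ins for GCC globals: current_fn is NULL when labels are emitted
   outside of any function, e.g. for thunks generated at the very end.  */
static struct function_ctx *current_fn;
static bool sanitize_address = true;
static int hook_calls;
static int last_funcdef_no = -1;

/* Hook that needs the current function; calling it with current_fn ==
   NULL would dereference a null pointer.  */
static void
function_start_hook (void)
{
  last_funcdef_no = current_fn->funcdef_no;
  hook_calls++;
}

/* Label emission with the guard: skip the hook when there is no
   current function.  */
static void
emit_function_label (void)
{
  if (sanitize_address && current_fn)
    function_start_hook ();
}
```

With current_fn unset the call is a no-op; once it points at a function context, the hook runs and records that function's number.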
[PATCH v2 1/2] Implement ASM_DECLARE_FUNCTION_NAME using ASM_OUTPUT_FUNCTION_LABEL
gccint recommends using ASM_OUTPUT_FUNCTION_LABEL in
ASM_DECLARE_FUNCTION_NAME, but many implementations use
ASM_OUTPUT_LABEL instead. It's inconsistent and prevents changes to
ASM_OUTPUT_FUNCTION_LABEL from affecting the respective targets.
---
 gcc/config/aarch64/aarch64.cc       |  2 +-
 gcc/config/alpha/alpha.cc           |  5 ++---
 gcc/config/arm/aout.h               |  2 +-
 gcc/config/arm/arm.cc               |  2 +-
 gcc/config/bfin/bfin.h              | 16 ++++++++--------
 gcc/config/c6x/c6x.h                |  2 +-
 gcc/config/gcn/gcn.cc               |  5 ++---
 gcc/config/h8300/h8300.h            |  2 +-
 gcc/config/ia64/ia64.cc             |  5 ++---
 gcc/config/mcore/mcore-elf.h        |  2 +-
 gcc/config/microblaze/microblaze.cc |  3 +--
 gcc/config/mips/mips.cc             | 19 ++++++++++---------
 gcc/config/pa/pa.cc                 |  3 ++-
 gcc/config/riscv/riscv.cc           |  2 +-
 gcc/config/rs6000/rs6000.cc         |  4 ++--
 15 files changed, 36 insertions(+), 38 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 298477d88bb..e3c72f60d4e 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -24207,7 +24207,7 @@ aarch64_declare_function_name (FILE *stream, const char* name,
 
   /* Don't forget the type directive for ELF. */
   ASM_OUTPUT_TYPE_DIRECTIVE (stream, name, "function");
-  ASM_OUTPUT_LABEL (stream, name);
+  ASM_OUTPUT_FUNCTION_LABEL (stream, name, fndecl);
 
   cfun->machine->label_is_assembled = true;
 }
diff --git a/gcc/config/alpha/alpha.cc b/gcc/config/alpha/alpha.cc
index 6aa93783226..8118255e737 100644
--- a/gcc/config/alpha/alpha.cc
+++ b/gcc/config/alpha/alpha.cc
@@ -7986,8 +7986,7 @@ int num_source_filenames = 0;
 /* Output the textual info surrounding the prologue. */
 
 void
-alpha_start_function (FILE *file, const char *fnname,
-                      tree decl ATTRIBUTE_UNUSED)
+alpha_start_function (FILE *file, const char *fnname, tree decl)
 {
   unsigned long imask, fmask;
   /* Complete stack size needed. */
@@ -8052,7 +8051,7 @@ alpha_start_function (FILE *file, const char *fnname,
 
   if (TARGET_ABI_OPEN_VMS)
     strcat (entry_label, "..en");
-  ASM_OUTPUT_LABEL (file, entry_label);
+  ASM_OUTPUT_FUNCTION_LABEL (file, entry_label, decl);
   inside_function = TRUE;
 
   if (TARGET_ABI_OPEN_VMS)
diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h
index 49896bb9620..380147aed7d 100644
--- a/gcc/config/arm/aout.h
+++ b/gcc/config/arm/aout.h
@@ -152,7 +152,7 @@
   do                                                    \
     {                                                   \
       ARM_DECLARE_FUNCTION_NAME (STREAM, NAME, DECL);   \
-      ASM_OUTPUT_LABEL (STREAM, NAME);                  \
+      ASM_OUTPUT_FUNCTION_LABEL (STREAM, NAME, DECL);   \
     }                                                   \
   while (0)
 #endif
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 0c0cb14a8a4..7ca607b3de1 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -21800,7 +21800,7 @@ arm_asm_declare_function_name (FILE *file, const char *name, tree decl)
   ARM_DECLARE_FUNCTION_NAME (file, name, decl);
   ASM_OUTPUT_TYPE_DIRECTIVE (file, name, "function");
   ASM_DECLARE_RESULT (file, DECL_RESULT (decl));
-  ASM_OUTPUT_LABEL (file, name);
+  ASM_OUTPUT_FUNCTION_LABEL (file, name, decl);
 
   if (cmse_name)
     ASM_OUTPUT_LABEL (file, cmse_name);
diff --git a/gcc/config/bfin/bfin.h b/gcc/config/bfin/bfin.h
index c25f41f6839..60a8d716819 100644
--- a/gcc/config/bfin/bfin.h
+++ b/gcc/config/bfin/bfin.h
@@ -995,14 +995,14 @@ typedef enum directives {
     fputc ('\n',FILE);                  \
   } while (0)
 
-#define ASM_DECLARE_FUNCTION_NAME(FILE,NAME,DECL) \
-  do {                                  \
-    fputs (".type ", FILE);             \
-    assemble_name (FILE, NAME);         \
-    fputs (", STT_FUNC", FILE);         \
-    fputc (';',FILE);                   \
-    fputc ('\n',FILE);                  \
-    ASM_OUTPUT_LABEL(FILE, NAME);       \
+#define ASM_DECLARE_FUNCTION_NAME(FILE, NAME, DECL)     \
+  do {                                                  \
+    fputs (".type ", FILE);                             \
+    assemble_name (FILE, NAME);                         \
+    fputs (", STT_FUNC", FILE);                         \
+    fputc (';', FILE);                                  \
+    fputc ('\n', FILE);                                 \
+    ASM_OUTPUT_FUNCTION_LABEL (FILE, NAME, DECL);       \
   } while (0)
 
 #define ASM_OUTPUT_LABEL(FILE, NAME)    \
diff --git a/gcc/config/c6x/c6x.h b/gcc/config/c6x/c6x.h
index 26b2f2f0700..790b9627ebe 100644
--- a/gcc/config/c6x/c6x.h
+++ b/gcc/config/c6x/c6x.h
@@ -459,7 +459,7 @@ struct GTY(()) machine_function
     c6x_output_file_unwind (FILE);                      \
     ASM_OUTPUT_TYPE_DIRECTIVE (FILE, NAME, "function"); \
     ASM_DECLARE_RESULT (FILE, D
[PATCH v2 2/2] asan: Align .LASANPC on function boundary
GCC can emit code between the function label and the .LASANPC label,
making the latter unaligned. Some architectures cannot load unaligned
labels directly and require literal pool entries, which is
inefficient.

Move the invocation of asan_function_start to
ASM_OUTPUT_FUNCTION_LABEL, which guarantees that no additional code is
emitted. This allows setting the .LASANPC label alignment to the
respective function alignment.
---
 gcc/asan.cc             |  6 ++----
 gcc/config/i386/i386.cc |  2 +-
 gcc/config/s390/s390.cc |  2 +-
 gcc/defaults.h          |  2 +-
 gcc/final.cc            |  3 ---
 gcc/output.h            |  4 ++++
 gcc/varasm.cc           | 14 ++++++++++++++
 7 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/gcc/asan.cc b/gcc/asan.cc
index 8d0ffb497cc..48738244aba 100644
--- a/gcc/asan.cc
+++ b/gcc/asan.cc
@@ -1481,10 +1481,7 @@ asan_clear_shadow (rtx shadow_mem, HOST_WIDE_INT len)
 void
 asan_function_start (void)
 {
-  section *fnsec = function_section (current_function_decl);
-  switch_to_section (fnsec);
-  ASM_OUTPUT_DEBUG_LABEL (asm_out_file, "LASANPC",
-                          current_function_funcdef_no);
+  ASM_OUTPUT_DEBUG_LABEL (asm_out_file, "LASANPC", current_function_funcdef_no);
 }
 
 /* Return number of shadow bytes that are occupied by a local variable
@@ -2006,6 +2003,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
   DECL_INITIAL (decl) = decl;
   TREE_ASM_WRITTEN (decl) = 1;
   TREE_ASM_WRITTEN (id) = 1;
+  DECL_ALIGN_RAW (decl) = DECL_ALIGN_RAW (current_function_decl);
   emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
   shadow_base = expand_binop (Pmode, lshr_optab, base,
                               gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 38d515dac04..09fc2b63ee3 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -1640,7 +1640,7 @@ ix86_asm_output_function_label (FILE *out_file, const char *fname,
   SUBTARGET_ASM_UNWIND_INIT (out_file);
 #endif
 
-  ASM_OUTPUT_LABEL (out_file, fname);
+  assemble_function_label_raw (out_file, fname);
 
   /* Output magic byte marker, if hot-patch attribute is set. */
   if (is_ms_hook)
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index a5c36b43972..c871a10506a 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -8323,7 +8323,7 @@ s390_asm_output_function_label (FILE *out_file, const char *fname,
       asm_fprintf (out_file, "\t# fn:%s wd%d\n", fname,
                    s390_warn_dynamicstack_p);
     }
-  ASM_OUTPUT_LABEL (out_file, fname);
+  assemble_function_label_raw (out_file, fname);
   if (hw_after > 0)
     asm_fprintf (out_file,
                  "\t# post-label NOPs for hotpatch (%d halfwords)\n",
diff --git a/gcc/defaults.h b/gcc/defaults.h
index 6f095969410..b76734908cd 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -150,7 +150,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
 
 #ifndef ASM_OUTPUT_FUNCTION_LABEL
 #define ASM_OUTPUT_FUNCTION_LABEL(FILE, NAME, DECL) \
-  ASM_OUTPUT_LABEL ((FILE), (NAME))
+  assemble_function_label_raw ((FILE), (NAME))
 #endif
 
 /* Output the definition of a compiler-generated label named NAME. */
diff --git a/gcc/final.cc b/gcc/final.cc
index e6f1b1e166b..5e21aedf8ed 100644
--- a/gcc/final.cc
+++ b/gcc/final.cc
@@ -1686,9 +1686,6 @@ final_start_function_1 (rtx_insn **firstp, FILE *file, int *seen,
 
   high_block_linenum = high_function_linenum = last_linenum;
 
-  if (flag_sanitize & SANITIZE_ADDRESS)
-    asan_function_start ();
-
   rtx_insn *first = *firstp;
   if (in_initial_view_p (first))
     {
diff --git a/gcc/output.h b/gcc/output.h
index 76cfd58c1e6..bfdecc5ea74 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -178,6 +178,10 @@ extern void assemble_asm (tree);
 /* Get the function's name from a decl, as described by its RTL. */
 extern const char *get_fnname_from_decl (tree);
 
+/* Output function label, possibly with accompanying metadata. No additional
+   code or data is output after the label. */
+extern void assemble_function_label_raw (FILE *, const char *);
+
 /* Output assembler code for the constant pool of a function and associated
    with defining the name of the function. DECL describes the function.
    NAME is the function's name. For the constant pool, we use the current
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 69f8f8ee018..d0d670d009c 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -61,6 +61,7 @@ along with GCC; see the file COPYING3. If not see
 #include "alloc-pool.h"
 #include "toplev.h"
 #include "opts.h"
+#include "asan.h"
 
 /* The (assembler) name of the first globally-visible object output. */
 extern GTY(()) const char *first_global_object_name;
@@ -1835,6 +1836,19 @@ get_fnname_from_decl (tree decl)
   return XSTR (x, 0);
 }
 
+/* Output function label, possibly with accompanying metadata. No additional
+
[PATCH v2 0/2] asan: Align .LASANPC on function boundary
v1: https://inbox.sourceware.org/gcc-patches/20231207121005.3425208-1-...@linux.ibm.com/
v1 -> v2: Fix style issues (Jakub).
          Jakub has reviewed patch 2 and mentioned that he'd defer the
          patch 1 review to Jeff.

Hi,

this is another attempt to fix the .LASANPC alignment on s390x.
Currently it's not only inefficient ([1]-[5]), but also causes linker
errors in template-heavy code ([6]).

The previous attempts to add a new constant for minimum code alignment
value ([1]-[5]) did not arouse considerable enthusiasm, and fixing the
fallout ([6]) is probably just a wrong thing to do.

So here I'm taking another approach: making sure that .LASANPC is
aligned on function boundary in the first place.  This requires moving
the asan_function_start() invocation to ASM_OUTPUT_FUNCTION_LABEL().

Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux
and s390x-redhat-linux.  Compile tested for platforms listed in [7].

Best regards,
Ilya

[1] https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525016.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525069.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548338.html
[4] https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549252.html
[5] https://patchwork.ozlabs.org/project/gcc/list/?series=320223
[6] https://patchwork.ozlabs.org/project/gcc/list/?series=297132
[7] http://toolchain.lug-owl.de/laminar/jobs

Ilya Leoshkevich (2):
  Implement ASM_DECLARE_FUNCTION_NAME using ASM_OUTPUT_FUNCTION_LABEL
  asan: Align .LASANPC on function boundary

 gcc/asan.cc                         |  6 ++
 gcc/config/aarch64/aarch64.cc       |  2 +-
 gcc/config/alpha/alpha.cc           |  5 ++---
 gcc/config/arm/aout.h               |  2 +-
 gcc/config/arm/arm.cc               |  2 +-
 gcc/config/bfin/bfin.h              | 16
 gcc/config/c6x/c6x.h                |  2 +-
 gcc/config/gcn/gcn.cc               |  5 ++---
 gcc/config/h8300/h8300.h            |  2 +-
 gcc/config/i386/i386.cc             |  2 +-
 gcc/config/ia64/ia64.cc             |  5 ++---
 gcc/config/mcore/mcore-elf.h        |  2 +-
 gcc/config/microblaze/microblaze.cc |  3 +--
 gcc/config/mips/mips.cc             | 19 ++-
 gcc/config/pa/pa.cc                 |  3 ++-
 gcc/config/riscv/riscv.cc           |  2 +-
 gcc/config/rs6000/rs6000.cc         |  4 ++--
 gcc/config/s390/s390.cc             |  2 +-
 gcc/defaults.h                      |  2 +-
 gcc/final.cc                        |  3 ---
 gcc/output.h                        |  4
 gcc/varasm.cc                       | 14 ++
 22 files changed, 59 insertions(+), 48 deletions(-)

--
2.43.0
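The approach described in the cover letter can be illustrated with a small standalone sketch in C. This is not GCC code: the helper name emit_function_label_sketch, its parameters and the buffer-based output are assumptions made purely for illustration (the series itself adds assemble_function_label_raw() to gcc/varasm.cc). The point it shows is that when the .LASANPC label is written immediately after the function label, with nothing in between, both labels name the same address and therefore share the function's alignment.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, for illustration only: write the function label
   and, when ASan is enabled, the .LASANPC<N> label directly after it.
   Nothing is emitted between the two labels, so both refer to the same
   (function-aligned) address.  Returns the number of characters that
   would have been written, as snprintf does.  */
int
emit_function_label_sketch (char *buf, size_t size, const char *name,
                            int funcdef_no, int asan_enabled)
{
  if (asan_enabled)
    return snprintf (buf, size, "%s:\n.LASANPC%d:\n", name, funcdef_no);
  return snprintf (buf, size, "%s:\n", name);
}
```

If any code or data were emitted between the two labels (for example hotpatch NOPs), the second label would no longer be guaranteed to sit on the function boundary, which appears to be the situation the series wants to rule out.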
[PATCH 2/2] asan: Align .LASANPC on function boundary
GCC can emit code between the function label and the .LASANPC label, making the latter unaligned. Some architectures cannot load unaligned labels directly and require literal pool entries, which is inefficient. Move the invocation of asan_function_start to ASM_OUTPUT_FUNCTION_LABEL, which guarantees that no additional code is emitted. This allows setting the .LASANPC label alignment to the respective function alignment. --- gcc/asan.cc | 6 ++ gcc/config/i386/i386.cc | 2 +- gcc/config/s390/s390.cc | 2 +- gcc/defaults.h | 2 +- gcc/final.cc| 3 --- gcc/output.h| 4 gcc/varasm.cc | 10 ++ 7 files changed, 19 insertions(+), 10 deletions(-) diff --git a/gcc/asan.cc b/gcc/asan.cc index 8d0ffb497cc..48738244aba 100644 --- a/gcc/asan.cc +++ b/gcc/asan.cc @@ -1481,10 +1481,7 @@ asan_clear_shadow (rtx shadow_mem, HOST_WIDE_INT len) void asan_function_start (void) { - section *fnsec = function_section (current_function_decl); - switch_to_section (fnsec); - ASM_OUTPUT_DEBUG_LABEL (asm_out_file, "LASANPC", -current_function_funcdef_no); + ASM_OUTPUT_DEBUG_LABEL (asm_out_file, "LASANPC", current_function_funcdef_no); } /* Return number of shadow bytes that are occupied by a local variable @@ -2006,6 +2003,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb, DECL_INITIAL (decl) = decl; TREE_ASM_WRITTEN (decl) = 1; TREE_ASM_WRITTEN (id) = 1; + DECL_ALIGN_RAW (decl) = DECL_ALIGN_RAW (current_function_decl); emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl))); shadow_base = expand_binop (Pmode, lshr_optab, base, gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT), diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 7c5cab4e2c6..a552a300b69 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -1640,7 +1640,7 @@ ix86_asm_output_function_label (FILE *out_file, const char *fname, SUBTARGET_ASM_UNWIND_INIT (out_file); #endif - ASM_OUTPUT_LABEL (out_file, fname); + assemble_function_label_raw (out_file, fname); /* Output magic 
byte marker, if hot-patch attribute is set. */ if (is_ms_hook) diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc index 044de874590..a022db230db 100644 --- a/gcc/config/s390/s390.cc +++ b/gcc/config/s390/s390.cc @@ -8323,7 +8323,7 @@ s390_asm_output_function_label (FILE *out_file, const char *fname, asm_fprintf (out_file, "\t# fn:%s wd%d\n", fname, s390_warn_dynamicstack_p); } - ASM_OUTPUT_LABEL (out_file, fname); + assemble_function_label_raw (out_file, fname); if (hw_after > 0) asm_fprintf (out_file, "\t# post-label NOPs for hotpatch (%d halfwords)\n", diff --git a/gcc/defaults.h b/gcc/defaults.h index dc6f09cacae..153d3cd32c0 100644 --- a/gcc/defaults.h +++ b/gcc/defaults.h @@ -150,7 +150,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #ifndef ASM_OUTPUT_FUNCTION_LABEL #define ASM_OUTPUT_FUNCTION_LABEL(FILE, NAME, DECL) \ - ASM_OUTPUT_LABEL ((FILE), (NAME)) + assemble_function_label_raw ((FILE), (NAME)) #endif /* Output the definition of a compiler-generated label named NAME. */ diff --git a/gcc/final.cc b/gcc/final.cc index e6f1b1e166b..5e21aedf8ed 100644 --- a/gcc/final.cc +++ b/gcc/final.cc @@ -1686,9 +1686,6 @@ final_start_function_1 (rtx_insn **firstp, FILE *file, int *seen, high_block_linenum = high_function_linenum = last_linenum; - if (flag_sanitize & SANITIZE_ADDRESS) -asan_function_start (); - rtx_insn *first = *firstp; if (in_initial_view_p (first)) { diff --git a/gcc/output.h b/gcc/output.h index 76cfd58c1e6..bfdecc5ea74 100644 --- a/gcc/output.h +++ b/gcc/output.h @@ -178,6 +178,10 @@ extern void assemble_asm (tree); /* Get the function's name from a decl, as described by its RTL. */ extern const char *get_fnname_from_decl (tree); +/* Output function label, possibly with accompanying metadata. No additional + code or data is output after the label. 
*/ +extern void assemble_function_label_raw (FILE *, const char *); + /* Output assembler code for the constant pool of a function and associated with defining the name of the function. DECL describes the function. NAME is the function's name. For the constant pool, we use the current diff --git a/gcc/varasm.cc b/gcc/varasm.cc index 167aea87091..28c29883df9 100644 --- a/gcc/varasm.cc +++ b/gcc/varasm.cc @@ -61,6 +61,7 @@ along with GCC; see the file COPYING3. If not see #include "alloc-pool.h" #include "toplev.h" #include "opts.h" +#include "asan.h" /* The (assembler) name of the first globally-visible object output. */ extern GTY(()) const char *first_global_object_name; @@ -1835,6 +1836,15 @@ get_fnname_from_decl (tree decl) return XSTR (x, 0); } +void assemble_function_label_raw (FILE *file, const char *name) +{ + ASM_OUTPUT_LABE
[PATCH 1/2] Implement ASM_DECLARE_FUNCTION_NAME using ASM_OUTPUT_FUNCTION_LABEL
gccint recommends using ASM_OUTPUT_FUNCTION_LABEL in ASM_DECLARE_FUNCTION_NAME, but many implementations use ASM_OUTPUT_LABEL instead. It's inconsistent and prevents changes to ASM_OUTPUT_FUNCTION_LABEL from affecting the respective targets. --- gcc/config/aarch64/aarch64.cc | 2 +- gcc/config/alpha/alpha.cc | 5 ++--- gcc/config/arm/aout.h | 2 +- gcc/config/arm/arm.cc | 2 +- gcc/config/bfin/bfin.h | 16 gcc/config/c6x/c6x.h| 2 +- gcc/config/gcn/gcn.cc | 5 ++--- gcc/config/h8300/h8300.h| 2 +- gcc/config/ia64/ia64.cc | 5 ++--- gcc/config/mcore/mcore-elf.h| 2 +- gcc/config/microblaze/microblaze.cc | 3 +-- gcc/config/mips/mips.cc | 19 ++- gcc/config/pa/pa.cc | 3 ++- gcc/config/riscv/riscv.cc | 2 +- gcc/config/rs6000/rs6000.cc | 4 ++-- 15 files changed, 36 insertions(+), 38 deletions(-) diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 8f50a70083d..bf247a8fd17 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -23285,7 +23285,7 @@ aarch64_declare_function_name (FILE *stream, const char* name, /* Don't forget the type directive for ELF. */ ASM_OUTPUT_TYPE_DIRECTIVE (stream, name, "function"); - ASM_OUTPUT_LABEL (stream, name); + ASM_OUTPUT_FUNCTION_LABEL (stream, name, fndecl); cfun->machine->label_is_assembled = true; } diff --git a/gcc/config/alpha/alpha.cc b/gcc/config/alpha/alpha.cc index 6aa93783226..8118255e737 100644 --- a/gcc/config/alpha/alpha.cc +++ b/gcc/config/alpha/alpha.cc @@ -7986,8 +7986,7 @@ int num_source_filenames = 0; /* Output the textual info surrounding the prologue. */ void -alpha_start_function (FILE *file, const char *fnname, - tree decl ATTRIBUTE_UNUSED) +alpha_start_function (FILE *file, const char *fnname, tree decl) { unsigned long imask, fmask; /* Complete stack size needed. 
*/ @@ -8052,7 +8051,7 @@ alpha_start_function (FILE *file, const char *fnname, if (TARGET_ABI_OPEN_VMS) strcat (entry_label, "..en"); - ASM_OUTPUT_LABEL (file, entry_label); + ASM_OUTPUT_FUNCTION_LABEL (file, entry_label, decl); inside_function = TRUE; if (TARGET_ABI_OPEN_VMS) diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h index 49896bb9620..380147aed7d 100644 --- a/gcc/config/arm/aout.h +++ b/gcc/config/arm/aout.h @@ -152,7 +152,7 @@ do \ { \ ARM_DECLARE_FUNCTION_NAME (STREAM, NAME, DECL); \ - ASM_OUTPUT_LABEL (STREAM, NAME); \ + ASM_OUTPUT_FUNCTION_LABEL (STREAM, NAME, DECL); \ } \ while (0) #endif diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc index 6e3e2e8fb1b..7fd9bc19882 100644 --- a/gcc/config/arm/arm.cc +++ b/gcc/config/arm/arm.cc @@ -21801,7 +21801,7 @@ arm_asm_declare_function_name (FILE *file, const char *name, tree decl) ARM_DECLARE_FUNCTION_NAME (file, name, decl); ASM_OUTPUT_TYPE_DIRECTIVE (file, name, "function"); ASM_DECLARE_RESULT (file, DECL_RESULT (decl)); - ASM_OUTPUT_LABEL (file, name); + ASM_OUTPUT_FUNCTION_LABEL (file, name, decl); if (cmse_name) ASM_OUTPUT_LABEL (file, cmse_name); diff --git a/gcc/config/bfin/bfin.h b/gcc/config/bfin/bfin.h index c25f41f6839..60a8d716819 100644 --- a/gcc/config/bfin/bfin.h +++ b/gcc/config/bfin/bfin.h @@ -995,14 +995,14 @@ typedef enum directives { fputc ('\n',FILE); \ } while (0) -#define ASM_DECLARE_FUNCTION_NAME(FILE,NAME,DECL) \ - do { \ -fputs (".type ", FILE);\ -assemble_name (FILE, NAME); \ -fputs (", STT_FUNC", FILE); \ -fputc (';',FILE); \ -fputc ('\n',FILE); \ -ASM_OUTPUT_LABEL(FILE, NAME); \ +#define ASM_DECLARE_FUNCTION_NAME(FILE, NAME, DECL)\ + do { \ +fputs (".type ", FILE);\ +assemble_name (FILE, NAME);\ +fputs (", STT_FUNC", FILE);\ +fputc (';', FILE); \ +fputc ('\n', FILE);\ +ASM_OUTPUT_FUNCTION_LABEL (FILE, NAME, DECL); \ } while (0) #define ASM_OUTPUT_LABEL(FILE, NAME)\ diff --git a/gcc/config/c6x/c6x.h b/gcc/config/c6x/c6x.h index 26b2f2f0700..790b9627ebe 100644 
--- a/gcc/config/c6x/c6x.h +++ b/gcc/config/c6x/c6x.h @@ -459,7 +459,7 @@ struct GTY(()) machine_function c6x_output_file_unwind (FILE); \ ASM_OUTPUT_TYPE_DIRECTIVE (FILE, NAME, "function"); \ ASM_DECLARE_RESULT (FILE, D
[PATCH 0/2] asan: Align .LASANPC on function boundary
Hi,

this is another attempt to fix the .LASANPC alignment on s390x.
Currently it's not only inefficient ([1]-[5]), but also causes linker
errors in template-heavy code ([6]).

The previous attempts to add a new constant for minimum code alignment
value ([1]-[5]) did not arouse considerable enthusiasm, and fixing the
fallout ([6]) is probably just a wrong thing to do.

So here I'm taking another approach: making sure that .LASANPC is
aligned on function boundary in the first place.  This requires moving
the asan_function_start() invocation to ASM_OUTPUT_FUNCTION_LABEL().

Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux
and s390x-redhat-linux.  Compile tested for platforms listed in [7].

Best regards,
Ilya

[1] https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525016.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525069.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548338.html
[4] https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549252.html
[5] https://patchwork.ozlabs.org/project/gcc/list/?series=320223
[6] https://patchwork.ozlabs.org/project/gcc/list/?series=297132
[7] http://toolchain.lug-owl.de/laminar/jobs

Ilya Leoshkevich (2):
  Implement ASM_DECLARE_FUNCTION_NAME using ASM_OUTPUT_FUNCTION_LABEL
  asan: Align .LASANPC on function boundary

 gcc/asan.cc                         |  6 ++
 gcc/config/aarch64/aarch64.cc       |  2 +-
 gcc/config/alpha/alpha.cc           |  5 ++---
 gcc/config/arm/aout.h               |  2 +-
 gcc/config/arm/arm.cc               |  2 +-
 gcc/config/bfin/bfin.h              | 16
 gcc/config/c6x/c6x.h                |  2 +-
 gcc/config/gcn/gcn.cc               |  5 ++---
 gcc/config/h8300/h8300.h            |  2 +-
 gcc/config/i386/i386.cc             |  2 +-
 gcc/config/ia64/ia64.cc             |  5 ++---
 gcc/config/mcore/mcore-elf.h        |  2 +-
 gcc/config/microblaze/microblaze.cc |  3 +--
 gcc/config/mips/mips.cc             | 19 ++-
 gcc/config/pa/pa.cc                 |  3 ++-
 gcc/config/riscv/riscv.cc           |  2 +-
 gcc/config/rs6000/rs6000.cc         |  4 ++--
 gcc/config/s390/s390.cc             |  2 +-
 gcc/defaults.h                      |  2 +-
 gcc/final.cc                        |  3 ---
 gcc/output.h                        |  4
 gcc/varasm.cc                       | 10 ++
 22 files changed, 55 insertions(+), 48 deletions(-)

--
2.43.0
PING [PATCH v5 0/2] IBM zSystems: Improve storing asan frame_pc
On Tue, 2022-09-27 at 02:23 +0200, Ilya Leoshkevich wrote: > Hi, > > This is a resend of v4 with slightly adjusted commit messages: > > v1: https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525016.html > v2: https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525069.html > v3: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548338.html > v4: https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549252.html > > It still survives the bootstrap and the regtest on x86_64-redhat- > linux, > s390x-redhat-linux and ppc64le-redhat-linux. It also fixes [1]. > > I also tried the approach with moving .LASANPC closer to the function > label and using FUNCTION_BOUNDARY instead of introducing > CODE_LABEL_BOUNDARY, but the problem there is that it's hard to catch > the moment where the function label is written. Architectures can do > it by calling ASM_OUTPUT_LABEL() or assemble_name() in > ASM_DECLARE_FUNCTION_NAME(), ASM_OUTPUT_FUNCTION_LABEL() or > TARGET_ASM_FUNCTION_PROLOGUE(). epiphany_start_function() does that > twice, but passes the same decl to both calls. Note that simply > moving asan_function_start() to final_start_function_1() is not > enough, > since an architecture can write something after the function label. > This all means that for this approach to work, all the architectures > need to be adjusted, which looks like an overkill to me. > > Best regards, > Ilya > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593666.html > > > Ilya Leoshkevich (2): > asan: specify alignment for LASANPC labels > IBM zSystems: Define CODE_LABEL_BOUNDARY > > gcc/asan.cc | 1 + > gcc/config/s390/s390.h | 3 +++ > gcc/defaults.h | 5 + > gcc/doc/tm.texi | 4 > gcc/doc/tm.texi.in | 4 > gcc/testsuite/gcc.target/s390/asan-no-gotoff.c | 15 +++ > 6 files changed, 32 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c >
[PATCH v5 2/2] IBM zSystems: Define CODE_LABEL_BOUNDARY
Currently s390 emits the following sequence to store a frame_pc:

        a:
        .LASANPC0:

                lg      %r1,.L5-.L4(%r13)
                la      %r1,0(%r1,%r12)
                stg     %r1,176(%r11)

        .L5:
                .quad   .LASANPC0@GOTOFF

The reason GOT indirection is used instead of larl is that gcc does not
know that .LASANPC0, being a code label, is aligned on a 2-byte
boundary, and larl can load only even addresses.

Define CODE_LABEL_BOUNDARY in order to get rid of GOT indirection:

                larl    %r1,.LASANPC0
                stg     %r1,176(%r11)

gcc/ChangeLog:

2020-06-30  Ilya Leoshkevich

        * config/s390/s390.h (CODE_LABEL_BOUNDARY): Specify that s390
        requires code labels to be aligned on a 2-byte boundary.

gcc/testsuite/ChangeLog:

2019-06-30  Ilya Leoshkevich

        * gcc.target/s390/asan-no-gotoff.c: New test.
---
 gcc/config/s390/s390.h                         |  3 +++
 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c | 15 +++
 2 files changed, 18 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c

diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index be566215df2..7d078ce6868 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -368,6 +368,9 @@ extern const char *s390_host_detect_local_cpu (int argc, const char **argv);
 /* Allocation boundary (in *bits*) for the code of a function.  */
 #define FUNCTION_BOUNDARY 64
 
+/* Alignment required for a code label, in bits.  */
+#define CODE_LABEL_BOUNDARY 16
+
 /* There is no point aligning anything to a rounder boundary than this.  */
 #define BIGGEST_ALIGNMENT 64
 
diff --git a/gcc/testsuite/gcc.target/s390/asan-no-gotoff.c b/gcc/testsuite/gcc.target/s390/asan-no-gotoff.c
new file mode 100644
index 000..f555e4e96f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/asan-no-gotoff.c
@@ -0,0 +1,15 @@
+/* Test that ASAN labels are referenced without unnecessary indirections.  */
+
+/* { dg-do compile } */
+/* { dg-options "-fPIE -O2 -fsanitize=kernel-address --param asan-stack=1" } */
+
+extern void c (int *);
+
+void a ()
+{
+  int b;
+  c (&b);
+}
+
+/* { dg-final { scan-assembler {\tlarl\t%r\d+,\.LASANPC\d+} } } */
+/* { dg-final { scan-assembler-not {\.LASANPC\d+@GOTOFF} } } */
--
2.37.2
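The alignment argument in the commit message above can be made concrete with a short C sketch. Both helper names (is_aligned_to_boundary and loadable_with_larl_sketch) are hypothetical and are not part of GCC; the only facts taken from the patch are that the boundaries are expressed in bits (FUNCTION_BOUNDARY 64, CODE_LABEL_BOUNDARY 16 on s390) and that larl can load only even addresses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Alignment values are given in bits, as in FUNCTION_BOUNDARY and the
   proposed CODE_LABEL_BOUNDARY.  Returns true when ADDR is a multiple
   of BOUNDARY_BITS / 8 bytes.  BOUNDARY_BITS is assumed to be a
   nonzero multiple of 8.  */
bool
is_aligned_to_boundary (uintptr_t addr, unsigned boundary_bits)
{
  uintptr_t boundary_bytes = boundary_bits / 8;
  return addr % boundary_bytes == 0;
}

/* Illustration only: larl can produce only even addresses, i.e. ones
   aligned to 16 bits (2 bytes).  */
bool
loadable_with_larl_sketch (uintptr_t addr)
{
  return is_aligned_to_boundary (addr, 16);
}
```

With the default of BITS_PER_UNIT (8 bits on the usual targets), every address counts as aligned, so the compiler has to assume a label might be odd; declaring 16 bits is what lets it pick larl.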
[PATCH v5 1/2] asan: specify alignment for LASANPC labels
gcc/ChangeLog: 2020-06-30 Ilya Leoshkevich * asan.cc (asan_emit_stack_protection): Use CODE_LABEL_BOUNDARY. * defaults.h (CODE_LABEL_BOUNDARY): New macro. * doc/tm.texi: Document CODE_LABEL_BOUNDARY. * doc/tm.texi.in: Likewise. --- gcc/asan.cc| 1 + gcc/defaults.h | 5 + gcc/doc/tm.texi| 4 gcc/doc/tm.texi.in | 4 4 files changed, 14 insertions(+) diff --git a/gcc/asan.cc b/gcc/asan.cc index 8276f12cc69..62f50ee769b 100644 --- a/gcc/asan.cc +++ b/gcc/asan.cc @@ -1960,6 +1960,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb, DECL_INITIAL (decl) = decl; TREE_ASM_WRITTEN (decl) = 1; TREE_ASM_WRITTEN (id) = 1; + SET_DECL_ALIGN (decl, CODE_LABEL_BOUNDARY); emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl))); shadow_base = expand_binop (Pmode, lshr_optab, base, gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT), diff --git a/gcc/defaults.h b/gcc/defaults.h index 953605c1627..52a471cf08e 100644 --- a/gcc/defaults.h +++ b/gcc/defaults.h @@ -1455,4 +1455,9 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see typedef TARGET_UNIT target_unit; #endif +/* Alignment required for a code label, in bits. */ +#ifndef CODE_LABEL_BOUNDARY +#define CODE_LABEL_BOUNDARY BITS_PER_UNIT +#endif + #endif /* ! GCC_DEFAULTS_H */ diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 858bfb80cec..cc588ee23b5 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -1075,6 +1075,10 @@ to a value equal to or larger than @code{STACK_BOUNDARY}. Alignment required for a function entry point, in bits. @end defmac +@defmac CODE_LABEL_BOUNDARY +Alignment required for a code label, in bits. +@end defmac + @defmac BIGGEST_ALIGNMENT Biggest alignment that any data type can require on this machine, in bits. 
Note that this is not the biggest alignment that is supported, diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 21b849ea32a..a0b725b0685 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -971,6 +971,10 @@ to a value equal to or larger than @code{STACK_BOUNDARY}. Alignment required for a function entry point, in bits. @end defmac +@defmac CODE_LABEL_BOUNDARY +Alignment required for a code label, in bits. +@end defmac + @defmac BIGGEST_ALIGNMENT Biggest alignment that any data type can require on this machine, in bits. Note that this is not the biggest alignment that is supported, -- 2.37.2
[PATCH v5 0/2] IBM zSystems: Improve storing asan frame_pc
Hi,

This is a resend of v4 with slightly adjusted commit messages:

v1: https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525016.html
v2: https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525069.html
v3: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548338.html
v4: https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549252.html

It still survives the bootstrap and the regtest on x86_64-redhat-linux,
s390x-redhat-linux and ppc64le-redhat-linux.  It also fixes [1].

I also tried the approach with moving .LASANPC closer to the function
label and using FUNCTION_BOUNDARY instead of introducing
CODE_LABEL_BOUNDARY, but the problem there is that it's hard to catch
the moment where the function label is written.  Architectures can do
it by calling ASM_OUTPUT_LABEL() or assemble_name() in
ASM_DECLARE_FUNCTION_NAME(), ASM_OUTPUT_FUNCTION_LABEL() or
TARGET_ASM_FUNCTION_PROLOGUE().  epiphany_start_function() does that
twice, but passes the same decl to both calls.  Note that simply
moving asan_function_start() to final_start_function_1() is not enough,
since an architecture can write something after the function label.
This all means that for this approach to work, all the architectures
need to be adjusted, which looks like an overkill to me.

Best regards,
Ilya

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593666.html

Ilya Leoshkevich (2):
  asan: specify alignment for LASANPC labels
  IBM zSystems: Define CODE_LABEL_BOUNDARY

 gcc/asan.cc                                    |  1 +
 gcc/config/s390/s390.h                         |  3 +++
 gcc/defaults.h                                 |  5 +
 gcc/doc/tm.texi                                |  4
 gcc/doc/tm.texi.in                             |  4
 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c | 15 +++
 6 files changed, 32 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c

--
2.37.2
Re: [PATCH] PR106342 - IBM zSystems: Provide vsel for all vector modes
On Thu, 2022-08-11 at 07:45 +0200, Andreas Krebbel wrote: > On 8/10/22 13:42, Ilya Leoshkevich wrote: > > On Wed, 2022-08-03 at 12:20 +0200, Ilya Leoshkevich wrote: > > > Bootstrapped and regtested on s390x-redhat-linux. Ok for master? > > > > > > > > > > > > dg.exp=pr104612.c fails with an ICE on s390x, because > > > copysignv2sf3 > > > produces an insn that vsel is supposed to recognize, but > > > can't, > > > because it's not defined for V2SF. Fix by defining it for all > > > vector > > > modes supported by copysign3. > > > > > > gcc/ChangeLog: > > > > > > * config/s390/vector.md (V_HW_FT): New iterator. > > > * config/s390/vx-builtins.md (vsel): Use V instead > > > of > > > V_HW. > > > --- > > > gcc/config/s390/vector.md | 6 ++ > > > gcc/config/s390/vx-builtins.md | 12 ++-- > > > 2 files changed, 12 insertions(+), 6 deletions(-) > > > > Jakub pointed out that this is broken in gcc-12 as well. > > The patch applies cleanly, and I started a bootstrap/regtest. > > Ok for gcc-12? > > Yes. Thanks! > > Andreas Hi, I've committed this today without realizing that gcc-12 branch is closed. Sorry! Please let me know if I should revert this. Best regards, Ilya
Re: [PATCH] PR106342 - IBM zSystems: Provide vsel for all vector modes
On Wed, 2022-08-03 at 12:20 +0200, Ilya Leoshkevich wrote:
> Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?
>
>
>
> dg.exp=pr104612.c fails with an ICE on s390x, because copysignv2sf3
> produces an insn that vsel is supposed to recognize, but can't,
> because it's not defined for V2SF.  Fix by defining it for all vector
> modes supported by copysign3.
>
> gcc/ChangeLog:
>
>         * config/s390/vector.md (V_HW_FT): New iterator.
>         * config/s390/vx-builtins.md (vsel): Use V instead of
>         V_HW.
> ---
>  gcc/config/s390/vector.md      |  6 ++
>  gcc/config/s390/vx-builtins.md | 12 ++--
>  2 files changed, 12 insertions(+), 6 deletions(-)

Jakub pointed out that this is broken in gcc-12 as well.
The patch applies cleanly, and I started a bootstrap/regtest.
Ok for gcc-12?
[PATCH] PR106342 - IBM zSystems: Provide vsel for all vector modes
Bootstrapped and regtested on s390x-redhat-linux. Ok for master? dg.exp=pr104612.c fails with an ICE on s390x, because copysignv2sf3 produces an insn that vsel is supposed to recognize, but can't, because it's not defined for V2SF. Fix by defining it for all vector modes supported by copysign3. gcc/ChangeLog: * config/s390/vector.md (V_HW_FT): New iterator. * config/s390/vx-builtins.md (vsel): Use V instead of V_HW. --- gcc/config/s390/vector.md | 6 ++ gcc/config/s390/vx-builtins.md | 12 ++-- 2 files changed, 12 insertions(+), 6 deletions(-) diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md index a6c4b4eb974..624729814af 100644 --- a/gcc/config/s390/vector.md +++ b/gcc/config/s390/vector.md @@ -63,6 +63,12 @@ V1DF V2DF (V1TF "TARGET_VXE") (TF "TARGET_VXE")]) +; All modes present in V_HW and VFT. +(define_mode_iterator V_HW_FT [V16QI V8HI V4SI V2DI (V1TI "TARGET_VXE") V1DF + V2DF (V1SF "TARGET_VXE") (V2SF "TARGET_VXE") + (V4SF "TARGET_VXE") (V1TF "TARGET_VXE") + (TF "TARGET_VXE")]) + ; FP vector modes directly supported by the HW. This does not include ; vector modes using only part of a vector register and should be used ; for instructions which might trigger IEEE exceptions. diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md index d5130799804..98ee08b2683 100644 --- a/gcc/config/s390/vx-builtins.md +++ b/gcc/config/s390/vx-builtins.md @@ -517,12 +517,12 @@ ; swapped in s390-c.cc when we get here. 
(define_insn "vsel" - [(set (match_operand:V_HW 0 "register_operand" "=v") - (ior:V_HW -(and:V_HW (match_operand:V_HW 1 "register_operand" "v") - (match_operand:V_HW 3 "register_operand" "v")) -(and:V_HW (not:V_HW (match_dup 3)) - (match_operand:V_HW 2 "register_operand" "v"] + [(set (match_operand:V_HW_FT 0 "register_operand" "=v") + (ior:V_HW_FT +(and:V_HW_FT (match_operand:V_HW_FT 1 "register_operand" "v") + (match_operand:V_HW_FT 3 "register_operand" "v")) +(and:V_HW_FT (not:V_HW_FT (match_dup 3)) + (match_operand:V_HW_FT 2 "register_operand" "v"] "TARGET_VX" "vsel\t%v0,%1,%2,%3" [(set_attr "op_type" "VRR")]) -- 2.35.3
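For readers less familiar with the RTL in the vsel pattern, the operation it matches is a bitwise select: (ior (and op1 op3) (and (not op3) op2)). The scalar C sketch below models a single 32-bit lane and shows how a copysign can be expressed through such a select, which is presumably why the copysign expander ends up producing this shape. The function names are hypothetical, and the sketch assumes float is a 32-bit IEEE 754 type; it is not the code GCC generates.

```c
#include <stdint.h>
#include <string.h>

/* Bitwise select: take bits of A where MASK is 1, bits of B elsewhere.
   This mirrors (ior (and A MASK) (and (not MASK) B)).  */
uint32_t
bit_select (uint32_t a, uint32_t b, uint32_t mask)
{
  return (a & mask) | (b & ~mask);
}

/* copysign for one float lane: magnitude bits from X, sign bit from Y.
   Assumes a 32-bit IEEE 754 float; memcpy is used for the bit casts to
   stay within defined behavior.  */
float
copysign_sketch (float x, float y)
{
  uint32_t xb, yb, rb;
  float r;

  memcpy (&xb, &x, sizeof xb);
  memcpy (&yb, &y, sizeof yb);
  rb = bit_select (xb, yb, UINT32_C (0x7fffffff));
  memcpy (&r, &rb, sizeof r);
  return r;
}
```

Since the select is purely bitwise, it does not depend on the element type, which is consistent with the patch extending the pattern to further vector modes rather than changing the instruction.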
Re: [PATCH] Honor COMDAT for mergeable constant sections
On Fri, 2022-04-29 at 13:56 +0200, Jakub Jelinek wrote: > On Fri, Apr 29, 2022 at 01:52:49PM +0200, Ilya Leoshkevich wrote: > > > This doesn't resolve the problem, unfortunately, because > > > references to discarded comdat symbols are still kept in .rodata: > > > > > > `.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_' referenced > > > in > > > section `.rodata' of ../lib/libgtest.a(gtest-all.cc.o): defined > > > in > > > discarded section > > > `.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_[_ZN7testing15 > > > Asse > > > rt > > > ionResultlsIPKcEERS0_RKT_]' of ../lib/libgtest.a(gtest-all.cc.o) > > > > > > (That's from building zlib-ng with ASan and your patch on s390). > > > > > > So I was rather thinking about adding a reloc parameter to > > > mergeable_constant_section () and slightly changing the section > > > name when it's nonzero, e.g. from .cst to .cstrel. > > > > After some experimenting, I don't think that what I propose here > > is a good solution anymore, since it won't work with > > -fno-merge-constants. > > > > What do you think about something like this? > > > > --- a/gcc/varasm.cc > > +++ b/gcc/varasm.cc > > @@ -7326,6 +7326,10 @@ default_elf_select_rtx_section (machine_mode > > mode, rtx x, > > return get_named_section (NULL, ".data.rel.ro", 3); > > } > > > > + if (reloc) > > + return targetm.asm_out.function_rodata_section > > (current_function_decl, > > + false); > > + > > return mergeable_constant_section (mode, align, 0); > > } > > > > This would put constants with relocations into .rodata.. > > default_function_rodata_section () already ensures that these > > sections > > are in the right comdat group. > > We don't really know if the emitted constant is purely for the > current > function, or also other functions (say emitted in as constant pool > constant > where constant pool constants are shared across the whole TU). > For the former, putting it into current function's comdat is fine, > for the > latter certainly isn't. 
mergeable_constant_section (), that the existing code calls in the same context, already relies on this being known and calls function_rodata_section () with exactly the same arguments. If !current_function_decl && !relocatable, we get readonly_data_section. Of course, mergeable_constant_section () does not handle comdat currently, so this point might be moot. However, looking at the callers of output_constant_pool_contents (), it seems that !current_function_decl happens in and only in the shared_constant_pool case, so it looks as if we know whether the constant is tied to a single function or not.
Re: [PATCH] Honor COMDAT for mergeable constant sections
On Thu, 2022-04-28 at 14:05 +0200, Ilya Leoshkevich wrote: > On Thu, 2022-04-28 at 13:27 +0200, Jakub Jelinek wrote: > > On Thu, Apr 28, 2022 at 01:03:26PM +0200, Ilya Leoshkevich wrote: > > > This is determined by default_elf_select_rtx_section (). If we > > > don't > > > want to mix non-reloc and reloc constants, we need to define a > > > special > > > section there. > > > > > > It seems to me, however, that this all would be made purely for > > > the > > > sake of .LASANPC, which is quite special: it's local, but at the > > > same > > > time it might need to be comdat. I don't think anything like > > > this > > > can > > > appear from compiling C/C++ code. > > > > > > Therefore I wonder if we could just drop it altogether like this? > > > > > > @@ -1928,22 +1919,7 @@ asan_emit_stack_protection (rtx base, rtx > > > pbase, > > > unsigned int alignb, > > > ... > > > - emit_move_insn (mem, expand_normal (build_fold_addr_expr > > > (decl))); > > > + emit_move_insn (mem, expand_normal (build_fold_addr_expr > > > (current_function_decl))); > > > ... > > > > > > That's what LLVM is already doing. This will also solve the > > > alignment > > > problem I referred to earlier. > > > > LLVM is doing a wrong thing here. The global symbol can be > > overridden by > > a symbol in another shared library, that is definitely not what we > > want, > > because the ASAN record is for the particular implementation, not > > the > > other > > one which could be quite different. > > I see; this must be relevant when the overriding library calls > the original one through dlsym (RTLD_NEXT). 
> > > I think the right fix would be: > > --- gcc/varasm.cc.jj2022-03-07 15:00:17.255592497 +0100 > > +++ gcc/varasm.cc 2022-04-28 13:22:44.463147066 +0200 > > @@ -7326,6 +7326,9 @@ default_elf_select_rtx_section (machine_ > > return get_named_section (NULL, ".data.rel.ro", 3); > > } > > > > + if (reloc) > > + return readonly_data_section; > > + > > return mergeable_constant_section (mode, align, 0); > > } > > > > which matches what we do in categorize_decl_for_section: > > else if (reloc & targetm.asm_out.reloc_rw_mask ()) > > ret = reloc == 1 ? SECCAT_DATA_REL_RO_LOCAL : > > SECCAT_DATA_REL_RO; > > else if (reloc || flag_merge_constants < 2 > > ... > > /* C and C++ don't allow different variables to share the > > same > > location. -fmerge-all-constants allows even that (at > > the > > expense of not conforming). */ > > ret = SECCAT_RODATA; > > else if (DECL_INITIAL (decl) > > && TREE_CODE (DECL_INITIAL (decl)) == STRING_CST) > > ret = SECCAT_RODATA_MERGE_STR_INIT; > > else > > ret = SECCAT_RODATA_MERGE_CONST; > > i.e. if reloc is true, it goes into .data.rel.ro* for -fpic and > > .rodata > > for non-pic, and mergeable sections are only used if there are no > > relocations. > > This doesn't resolve the problem, unfortunately, because > references to discarded comdat symbols are still kept in .rodata: > > `.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_' referenced in > section `.rodata' of ../lib/libgtest.a(gtest-all.cc.o): defined in > discarded section > `.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_[_ZN7testing15Asse > rt > ionResultlsIPKcEERS0_RKT_]' of ../lib/libgtest.a(gtest-all.cc.o) > > (That's from building zlib-ng with ASan and your patch on s390). > > So I was rather thinking about adding a reloc parameter to > mergeable_constant_section () and slightly changing the section > name when it's nonzero, e.g. from .cst to .cstrel. 
After some experimenting, I don't think that what I propose here is a
good solution anymore, since it won't work with -fno-merge-constants.

What do you think about something like this?

--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -7326,6 +7326,10 @@ default_elf_select_rtx_section (machine_mode mode, rtx x,
       return get_named_section (NULL, ".data.rel.ro", 3);
     }
 
+  if (reloc)
+    return targetm.asm_out.function_rodata_section (current_function_decl,
+                                                    false);
+
   return mergeable_constant_section (mode, align, 0);
 }

This would put constants with relocations into .rodata..
default_function_rodata_section () already ensures that these sections
are in the right comdat group.
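The selection order that results from the hunk above can be sketched as a toy model in C. This is not GCC code: toy_select_rtx_section, the toy_kind enum and the rw_mask parameter are hypothetical names, the .data.rel.ro.local case is left out, and the sketch assumes that the mask returned by reloc_rw_mask is nonzero only for PIC code.

```c
#include <assert.h>

/* Hypothetical section kinds, for illustration only.  */
enum toy_kind
{
  TOY_DATA_REL_RO,     /* .data.rel.ro* */
  TOY_FUNCTION_RODATA, /* per-function read-only section */
  TOY_MERGEABLE        /* mergeable .cstN section */
};

/* Toy model of the proposed order of checks.  RELOC is nonzero if the
   constant needs relocations; RW_MASK stands in for the target's
   reloc_rw_mask value (assumption: 0 for non-PIC, nonzero for PIC).  */
static enum toy_kind
toy_select_rtx_section (int reloc, int rw_mask)
{
  if (reloc & rw_mask)
    return TOY_DATA_REL_RO;
  if (reloc)
    return TOY_FUNCTION_RODATA;
  return TOY_MERGEABLE;
}

/* Small self-check showing the three outcomes.  */
static void
toy_select_demo (void)
{
  assert (toy_select_rtx_section (1, 3) == TOY_DATA_REL_RO);
  assert (toy_select_rtx_section (1, 0) == TOY_FUNCTION_RODATA);
  assert (toy_select_rtx_section (0, 3) == TOY_MERGEABLE);
}
```

The point of the middle branch is that only constants without relocations still reach the mergeable sections.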
Re: [PATCH] Honor COMDAT for mergeable constant sections
On Thu, 2022-04-28 at 13:27 +0200, Jakub Jelinek wrote: > On Thu, Apr 28, 2022 at 01:03:26PM +0200, Ilya Leoshkevich wrote: > > This is determined by default_elf_select_rtx_section (). If we > > don't > > want to mix non-reloc and reloc constants, we need to define a > > special > > section there. > > > > It seems to me, however, that this all would be made purely for the > > sake of .LASANPC, which is quite special: it's local, but at the > > same > > time it might need to be comdat. I don't think anything like this > > can > > appear from compiling C/C++ code. > > > > Therefore I wonder if we could just drop it altogether like this? > > > > @@ -1928,22 +1919,7 @@ asan_emit_stack_protection (rtx base, rtx > > pbase, > > unsigned int alignb, > > ... > > - emit_move_insn (mem, expand_normal (build_fold_addr_expr > > (decl))); > > + emit_move_insn (mem, expand_normal (build_fold_addr_expr > > (current_function_decl))); > > ... > > > > That's what LLVM is already doing. This will also solve the > > alignment > > problem I referred to earlier. > > LLVM is doing a wrong thing here. The global symbol can be > overridden by > a symbol in another shared library, that is definitely not what we > want, > because the ASAN record is for the particular implementation, not the > other > one which could be quite different. I see; this must be relevant when the overriding library calls the original one through dlsym (RTLD_NEXT). > I think the right fix would be: > --- gcc/varasm.cc.jj2022-03-07 15:00:17.255592497 +0100 > +++ gcc/varasm.cc 2022-04-28 13:22:44.463147066 +0200 > @@ -7326,6 +7326,9 @@ default_elf_select_rtx_section (machine_ > return get_named_section (NULL, ".data.rel.ro", 3); > } > > + if (reloc) > + return readonly_data_section; > + > return mergeable_constant_section (mode, align, 0); > } > > which matches what we do in categorize_decl_for_section: > else if (reloc & targetm.asm_out.reloc_rw_mask ()) > ret = reloc == 1 ? 
SECCAT_DATA_REL_RO_LOCAL : > SECCAT_DATA_REL_RO; > else if (reloc || flag_merge_constants < 2 > ... > /* C and C++ don't allow different variables to share the > same > location. -fmerge-all-constants allows even that (at the > expense of not conforming). */ > ret = SECCAT_RODATA; > else if (DECL_INITIAL (decl) > && TREE_CODE (DECL_INITIAL (decl)) == STRING_CST) > ret = SECCAT_RODATA_MERGE_STR_INIT; > else > ret = SECCAT_RODATA_MERGE_CONST; > i.e. if reloc is true, it goes into .data.rel.ro* for -fpic and > .rodata > for non-pic, and mergeable sections are only used if there are no > relocations. This doesn't resolve the problem, unfortunately, because references to discarded comdat symbols are still kept in .rodata: `.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_' referenced in section `.rodata' of ../lib/libgtest.a(gtest-all.cc.o): defined in discarded section `.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_[_ZN7testing15Assert ionResultlsIPKcEERS0_RKT_]' of ../lib/libgtest.a(gtest-all.cc.o) (That's from building zlib-ng with ASan and your patch on s390). So I was rather thinking about adding a reloc parameter to mergeable_constant_section () and slightly changing the section name when it's nonzero, e.g. from .cst to .cstrel. > Anyway, I'd feel much safer to change it only in GCC 13, at least > initially. That's fine with me. > Or are some linkers (say lld or mold, fod ld.bfd I'm pretty sure it > doesn't, > for gold no idea but unlikely) able to merge even constants with > relocations against them? I'm not sure, but putting constants with relocations into a separate mergeable section shouldn't hurt too much. And if such a linker is implemented some day, there would be no need to tweak gcc.
Re: [PATCH] Honor COMDAT for mergeable constant sections
On Wed, 2022-04-27 at 14:46 +0200, Jakub Jelinek wrote: > On Wed, Apr 27, 2022 at 02:23:00PM +0200, Jakub Jelinek wrote: > > On Wed, Apr 27, 2022 at 11:59:49AM +0200, Ilya Leoshkevich wrote: > > > I get a .LASANPC reloc there in the first place because of > > > https://patchwork.ozlabs.org/project/gcc/patch/20190702085154.26981-1-...@linux.ibm.com/ > > > but of course it may happen for other reasons as well. > > > > In that case I don't see any benefit to put that into a mergeable > > section. > > Why does that happen? > > Because, when a mergeable section doesn't contain any relocations, I > don't > see any point in making it comdat. Because mergeable sections > themselves > are garbage collected, if some constant isn't referenced at all, it > isn't > emitted, or if referenced, multiple copies of the constant are merged > (or > for mergeable strings even string tail merging is performed). > > Jakub > This is determined by default_elf_select_rtx_section (). If we don't want to mix non-reloc and reloc constants, we need to define a special section there. It seems to me, however, that this all would be made purely for the sake of .LASANPC, which is quite special: it's local, but at the same time it might need to be comdat. I don't think anything like this can appear from compiling C/C++ code. Therefore I wonder if we could just drop it altogether like this? @@ -1928,22 +1919,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb, ... - emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl))); + emit_move_insn (mem, expand_normal (build_fold_addr_expr (current_function_decl))); ... That's what LLVM is already doing. This will also solve the alignment problem I referred to earlier.
Re: [PATCH] Honor COMDAT for mergeable constant sections
On Wed, 2022-04-27 at 11:59 +0200, Ilya Leoshkevich via Gcc-patches wrote:
> On Wed, 2022-04-27 at 11:33 +0200, Jakub Jelinek wrote:
> > On Wed, Apr 27, 2022 at 11:27:49AM +0200, Ilya Leoshkevich via
> > Gcc-patches wrote:
> > > Bootstrapped and regtested on x86_64-redhat-linux and
> > > s390x-redhat-linux.  Ok for master (or GCC 13 in case this
> > > doesn't fit stage4 criteria)?
> >
> > I'd prefer to defer this to GCC 13 at this point.
> > Furthermore, does the linker then actually merge the constants with
> > the same constants from other mergeable linkonce sections or other
> > mergeable sections?  I'm afraid it would only merge constants
> > within each comdat group and not across the whole ELF object.
> >
> > Jakub
> >
>
> I experimented with this a little, and actually having a reloc
> prevents merging altogether (the check happens in
> _bfd_add_merge_section).
>
> I get a .LASANPC reloc there in the first place because of
> https://patchwork.ozlabs.org/project/gcc/patch/20190702085154.26981-1-...@linux.ibm.com/
> but of course it may happen for other reasons as well.

I just realized I forgot to mention the "normal" case.  There, "aMG"
seems to work fine with the whole ELF:

$ cat 1.s
        .globl _start
_start:
        ret
        .section .rodata.xxx,"aMG",@progbits,8,.xxx,comdat
        .quad 42
$ cat 2.s
        .section .rodata.yyy,"aMG",@progbits,8,.yyy,comdat
        .quad 42
        .quad 43
        .section .rodata.xxx,"aMG",@progbits,8,.xxx,comdat
        .quad 42
$ gcc -nostartfiles -fPIE 1.s 2.s
$ objdump -D a.out
2000 <.rodata>:
    2000:  2a 00    sub    (%rax),%al
    2002:  00 00    add    %al,(%rax)
    2004:  00 00    add    %al,(%rax)
    2006:  00 00    add    %al,(%rax)
    2008:  2b 00    sub    (%rax),%eax
    200a:  00 00    add    %al,(%rax)
    200c:  00 00    add    %al,(%rax)
    ...
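The result in the dump (one copy of 42 at 0x2000 followed by 43 at 0x2008) can be modelled with a short C sketch. It only illustrates the effect of merging fixed-size entries and is not how ld.bfd is implemented; toy_merge_entries and toy_merge_example are hypothetical names, and the sketch assumes that the second .rodata.xxx copy is already dropped as a duplicate comdat group.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: append the 8-byte entries of one input section
   to OUT, skipping values that are already present.  Returns the new
   number of entries in OUT.  */
static size_t
toy_merge_entries (uint64_t *out, size_t n_out,
                   const uint64_t *in, size_t n_in)
{
  for (size_t i = 0; i < n_in; i++)
    {
      int seen = 0;
      for (size_t j = 0; j < n_out; j++)
        if (out[j] == in[i])
          seen = 1;
      if (!seen)
        out[n_out++] = in[i];
    }
  return n_out;
}

/* Mirrors 1.s and 2.s: .rodata.xxx holds 42, .rodata.yyy holds 42 and
   43.  OUT must have room for at least three entries.  */
static size_t
toy_merge_example (uint64_t *out)
{
  static const uint64_t xxx[] = { 42 };
  static const uint64_t yyy[] = { 42, 43 };
  size_t n = toy_merge_entries (out, 0, xxx, 1);

  n = toy_merge_entries (out, n, yyy, 2);
  assert (n <= 3);
  return n;
}
```

Calling toy_merge_example leaves two entries, 42 and 43, which matches the 0x2a and 0x2b quads in the merged .rodata.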
Re: [PATCH] Honor COMDAT for mergeable constant sections
On Wed, 2022-04-27 at 11:33 +0200, Jakub Jelinek wrote:
> On Wed, Apr 27, 2022 at 11:27:49AM +0200, Ilya Leoshkevich via
> Gcc-patches wrote:
> > Bootstrapped and regtested on x86_64-redhat-linux and
> > s390x-redhat-linux.  Ok for master (or GCC 13 in case this doesn't
> > fit stage4 criteria)?
>
> I'd prefer to defer this to GCC 13 at this point.
> Furthermore, does the linker then actually merge the constants with
> the same constants from other mergeable linkonce sections or other
> mergeable sections?  I'm afraid it would only merge constants within
> each comdat group and not across the whole ELF object.
>
> Jakub
>

I experimented with this a little, and actually having a reloc prevents
merging altogether (the check happens in _bfd_add_merge_section).

I get a .LASANPC reloc there in the first place because of
https://patchwork.ozlabs.org/project/gcc/patch/20190702085154.26981-1-...@linux.ibm.com/
but of course it may happen for other reasons as well.
[PATCH] Honor COMDAT for mergeable constant sections
Bootstrapped and regtested on x86_64-redhat-linux and
s390x-redhat-linux.  Ok for master (or GCC 13 in case this doesn't fit
stage4 criteria)?



Building C++ template-heavy code with ASan sometimes leads to bogus
"defined in discarded section" linker errors.  The reason is that
.rodata.FUNC.cstN sections are not placed into COMDAT group sections
FUNC.  This is important, because ASan puts references to .LASANPC
labels into these sections.  Discarding the respective .text.FUNC
section causes the linker error.

Fix by adding SECTION_LINKONCE to .rodata.FUNC.cstN sections in
mergeable_constant_section () if the current function has an associated
COMDAT group.  This is similar to what switch_to_exception_section ()
is currently doing with .gcc_except_table.FUNC sections.

gcc/ChangeLog:

        * varasm.cc (mergeable_constant_section): Honor COMDAT.

gcc/testsuite/ChangeLog:

        * g++.dg/asan/comdat.C: New test.
---
 gcc/testsuite/g++.dg/asan/comdat.C | 35 ++
 gcc/varasm.cc                      |  6 -
 2 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/asan/comdat.C

diff --git a/gcc/testsuite/g++.dg/asan/comdat.C b/gcc/testsuite/g++.dg/asan/comdat.C
new file mode 100644
index 000..cd4f3f830a8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/asan/comdat.C
@@ -0,0 +1,35 @@
+/* Check that we don't emit non-COMDAT rodata.  */
+
+/* { dg-do compile } */
+/* { dg-final { scan-assembler-not {\.section\t\.rodata\._ZN1hlsIPKcEERS_RKT_\.cst[48],"[^"]*",@progbits,[48]\n} } } */
+
+const char *a;
+
+class b
+{
+public:
+  b ();
+};
+
+class h
+{
+public:
+  template <typename c>
+  h &
+  operator<< (const c &)
+  {
+    d (b ());
+    return *this;
+  }
+
+  void d (b);
+};
+
+h e ();
+
+h
+g ()
+{
+  e () << a << a << a;
+  throw;
+}
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index c41f17d64f7..f2614f0ee39 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -938,7 +938,11 @@ mergeable_constant_section (machine_mode mode ATTRIBUTE_UNUSED,
 
       sprintf (name, "%s.cst%d", prefix, (int) (align / 8));
       flags |= (align / 8) | SECTION_MERGE;
-      return get_section (name, flags, NULL);
+      if (current_function_decl
+          && DECL_COMDAT_GROUP (current_function_decl)
+          && HAVE_COMDAT_GROUP)
+        flags |= SECTION_LINKONCE;
+      return get_section (name, flags, current_function_decl);
     }
   return readonly_data_section;
 }
-- 
2.35.1
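The condition added to mergeable_constant_section () can be restated as a self-contained toy model in C. Everything here is hypothetical and only meant for illustration: the flag values are made up rather than taken from GCC, struct toy_fn stands in for a function declaration, and have_comdat_group stands in for HAVE_COMDAT_GROUP.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag bits; the values are made up, not GCC's.  */
enum
{
  TOY_SECTION_MERGE = 1 << 0,
  TOY_SECTION_LINKONCE = 1 << 1
};

/* Hypothetical stand-in for a function declaration.  */
struct toy_fn
{
  const char *comdat_group; /* NULL if the function is not in a group */
};

/* Toy version of the flag computation: a mergeable constant section
   becomes link-once only when there is a current function, it belongs
   to a COMDAT group, and the target supports COMDAT groups.  */
static unsigned
toy_mergeable_flags (const struct toy_fn *current_fn, int have_comdat_group)
{
  unsigned flags = TOY_SECTION_MERGE;

  if (current_fn && current_fn->comdat_group && have_comdat_group)
    flags |= TOY_SECTION_LINKONCE;
  return flags;
}

/* Small self-check covering the template and non-template cases.  */
static void
toy_mergeable_flags_demo (void)
{
  struct toy_fn in_group = { "_ZN1hlsIPKcEERS_RKT_" };
  struct toy_fn plain = { NULL };

  assert (toy_mergeable_flags (&in_group, 1) & TOY_SECTION_LINKONCE);
  assert (!(toy_mergeable_flags (&plain, 1) & TOY_SECTION_LINKONCE));
  assert (!(toy_mergeable_flags (NULL, 1) & TOY_SECTION_LINKONCE));
}
```

In this model a constant emitted outside of any function, or for a function without a group, keeps the old non-COMDAT behaviour.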
[PATCH][GCC11] IBM Z: fix `section type conflict` with -mindirect-branch-table
Bootstrapped and regtested on s390x-redhat-linux.  Ok for releases/gcc-11?



s390_code_end () puts indirect branch tables into separate sections and
tries to switch back to wherever it was in the beginning by calling
switch_to_section (current_function_section ()).

First of all, this is unnecessary - the other backends don't do it.

Furthermore, at this time there is no current function, but if the
last processed function was cold, in_cold_section_p remains set.  This
causes targetm.asm_out.function_section () to call
targetm.section_type_flags (), which in absence of current function
decl classifies the section as SECTION_WRITE.  This causes a section
type conflict with the existing SECTION_CODE.

gcc/ChangeLog:

        * config/s390/s390.c (s390_code_end): Do not switch back to
        code section.

gcc/testsuite/ChangeLog:

        * gcc.target/s390/nobp-section-type-conflict.c: New test.

(cherry picked from commit 8753b13a31c777cdab0265dae0b68534247908f7)
---
 gcc/config/s390/s390.c                        |  1 -
 .../s390/nobp-section-type-conflict.c         | 22 +++
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 8895dd7cc76..2d2e6522eb4 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16700,7 +16700,6 @@ s390_code_end (void)
               assemble_name_raw (asm_out_file, label_start);
               fputs ("-.\n", asm_out_file);
             }
-          switch_to_section (current_function_section ());
         }
     }
 }
diff --git a/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c b/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c
new file mode 100644
index 000..5d78bc99bb5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c
@@ -0,0 +1,22 @@
+/* Checks that we don't get error: section type conflict with ‘put_page’.  */
+
+/* { dg-do compile } */
+/* { dg-options "-mindirect-branch=thunk-extern -mfunction-return=thunk-extern -mindirect-branch-table -O2" } */
+
+int a;
+int b (void);
+void c (int);
+
+static void
+put_page (void)
+{
+  if (b ())
+    c (a);
+}
+
+__attribute__ ((__section__ (".init.text"), __cold__)) void
+d (void)
+{
+  put_page ();
+  put_page ();
+}
-- 
2.34.1
[PATCH] IBM Z: fix `section type conflict` with -mindirect-branch-table
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?



s390_code_end () puts indirect branch tables into separate sections and
tries to switch back to wherever it was in the beginning by calling
switch_to_section (current_function_section ()).

First of all, this is unnecessary - the other backends don't do it.

Furthermore, at this time there is no current function, but if the
last processed function was cold, in_cold_section_p remains set.  This
causes targetm.asm_out.function_section () to call
targetm.section_type_flags (), which in absence of current function
decl classifies the section as SECTION_WRITE.  This causes a section
type conflict with the existing SECTION_CODE.

gcc/ChangeLog:

        * config/s390/s390.cc (s390_code_end): Do not switch back to
        code section.

gcc/testsuite/ChangeLog:

        * gcc.target/s390/nobp-section-type-conflict.c: New test.
---
 gcc/config/s390/s390.cc                       |  1 -
 .../s390/nobp-section-type-conflict.c         | 22 +++
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 43c5c72554a..2db12d4ba4b 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -16809,7 +16809,6 @@ s390_code_end (void)
               assemble_name_raw (asm_out_file, label_start);
               fputs ("-.\n", asm_out_file);
             }
-          switch_to_section (current_function_section ());
         }
     }
 }
diff --git a/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c b/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c
new file mode 100644
index 000..5d78bc99bb5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c
@@ -0,0 +1,22 @@
+/* Checks that we don't get error: section type conflict with ‘put_page’.  */
+
+/* { dg-do compile } */
+/* { dg-options "-mindirect-branch=thunk-extern -mfunction-return=thunk-extern -mindirect-branch-table -O2" } */
+
+int a;
+int b (void);
+void c (int);
+
+static void
+put_page (void)
+{
+  if (b ())
+    c (a);
+}
+
+__attribute__ ((__section__ (".init.text"), __cold__)) void
+d (void)
+{
+  put_page ();
+  put_page ();
+}
-- 
2.34.1
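The kind of conflict described in the commit message can be illustrated with a toy section registry in C. This is only a sketch under stated assumptions, not GCC's get_section (): toy_get_section, the flag values and the fixed-size table are all hypothetical. It shows why asking for an already known section name with different flags has to be rejected.

```c
#include <assert.h>
#include <string.h>

/* Illustrative flag bits; the values are made up, not GCC's.  */
enum { TOY_CODE = 1 << 0, TOY_WRITE = 1 << 1 };

struct toy_section
{
  const char *name;
  unsigned flags;
};

static struct toy_section toy_sections[8];
static int toy_n_sections;

/* Look up NAME, registering it on first use.  Returns 0 on success,
   -1 if NAME already exists with different flags (a "section type
   conflict" in this model), and -2 if the toy table is full.  */
static int
toy_get_section (const char *name, unsigned flags)
{
  for (int i = 0; i < toy_n_sections; i++)
    if (strcmp (toy_sections[i].name, name) == 0)
      return toy_sections[i].flags == flags ? 0 : -1;

  if (toy_n_sections == 8)
    return -2;
  toy_sections[toy_n_sections].name = name;
  toy_sections[toy_n_sections].flags = flags;
  toy_n_sections++;
  return 0;
}

/* The function body registers .init.text as code; a later request that
   misclassifies the same section as writable is rejected.  */
static void
toy_conflict_demo (void)
{
  assert (toy_get_section (".init.text", TOY_CODE) == 0);
  assert (toy_get_section (".init.text", TOY_WRITE) == -1);
}
```

In this model, not issuing the second request at all, which is what the patch does by dropping the switch back, avoids the conflict.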
[PATCH gcc-11 2/2] IBM Z: Use @PLT symbols for local functions in 64-bit mode
This helps with generating code for kernel hotpatches, which contain individual functions and are loaded more than 2G away from vmlinux. This should not create performance regressions for the normal use cases, because for local functions ld replaces @PLT calls with direct calls. gcc/ChangeLog: * config/s390/predicates.md (bras_sym_operand): Accept all functions in 64-bit mode, use UNSPEC_PLT31. (larl_operand): Use UNSPEC_PLT31. * config/s390/s390.c (s390_loadrelative_operand_p): Likewise. (legitimize_pic_address): Likewise. (s390_emit_tls_call_insn): Mark __tls_get_offset as function, use UNSPEC_PLT31. (s390_delegitimize_address): Use UNSPEC_PLT31. (s390_output_addr_const_extra): Likewise. (print_operand): Add @PLT to TLS calls, handle %K. (s390_function_profiler): Mark __fentry__/_mcount as function, use %K, use UNSPEC_PLT31. (s390_output_mi_thunk): Use only UNSPEC_GOT, use %K. (s390_emit_call): Use UNSPEC_PLT31. (s390_emit_tpf_eh_return): Mark __tpf_eh_return as function. * config/s390/s390.md (UNSPEC_PLT31): Rename from UNSPEC_PLT. (*movdi_64): Use %K. (reload_base_64): Likewise. (*sibcall_brc): Likewise. (*sibcall_brcl): Likewise. (*sibcall_value_brc): Likewise. (*sibcall_value_brcl): Likewise. (*bras): Likewise. (*brasl): Likewise. (*bras_r): Likewise. (*brasl_r): Likewise. (*bras_tls): Likewise. (*brasl_tls): Likewise. (main_base_64): Likewise. (reload_base_64): Likewise. (@split_stack_call): Likewise. gcc/testsuite/ChangeLog: * g++.dg/ext/visibility/noPLT.C: Skip on s390x. * g++.target/s390/mi-thunk.C: New test. * gcc.target/s390/nodatarel-1.c: Move foostatic to the new tests. * gcc.target/s390/pr80080-4.c: Allow @PLT suffix. * gcc.target/s390/risbg-ll-3.c: Likewise. * gcc.target/s390/call.h: Common code for the new tests. * gcc.target/s390/call-z10-pic-nodatarel.c: New test. * gcc.target/s390/call-z10-pic.c: New test. * gcc.target/s390/call-z10.c: New test. * gcc.target/s390/call-z9-pic-nodatarel.c: New test. * gcc.target/s390/call-z9-pic.c: New test. 
* gcc.target/s390/call-z9.c: New test. * gcc.target/s390/mfentry-m64-pic.c: New test. * gcc.target/s390/tls.h: Common code for the new TLS tests. * gcc.target/s390/tls-pic.c: New test. * gcc.target/s390/tls.c: New test. (cherry picked from commit 0990d93dd8a) --- gcc/config/s390/predicates.md | 9 ++- gcc/config/s390/s390.c| 81 +-- gcc/config/s390/s390.md | 32 gcc/testsuite/g++.dg/ext/visibility/noPLT.C | 2 +- gcc/testsuite/g++.target/s390/mi-thunk.C | 23 ++ .../gcc.target/s390/call-z10-pic-nodatarel.c | 20 + gcc/testsuite/gcc.target/s390/call-z10-pic.c | 20 + gcc/testsuite/gcc.target/s390/call-z10.c | 20 + .../gcc.target/s390/call-z9-pic-nodatarel.c | 18 + gcc/testsuite/gcc.target/s390/call-z9-pic.c | 18 + gcc/testsuite/gcc.target/s390/call-z9.c | 20 + gcc/testsuite/gcc.target/s390/call.h | 40 + .../gcc.target/s390/mfentry-m64-pic.c | 9 +++ gcc/testsuite/gcc.target/s390/nodatarel-1.c | 26 +- gcc/testsuite/gcc.target/s390/pr80080-4.c | 2 +- gcc/testsuite/gcc.target/s390/risbg-ll-3.c| 6 +- gcc/testsuite/gcc.target/s390/tls-pic.c | 14 gcc/testsuite/gcc.target/s390/tls.c | 10 +++ gcc/testsuite/gcc.target/s390/tls.h | 23 ++ 19 files changed, 320 insertions(+), 73 deletions(-) create mode 100644 gcc/testsuite/g++.target/s390/mi-thunk.C create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic-nodatarel.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z10.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic-nodatarel.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z9.c create mode 100644 gcc/testsuite/gcc.target/s390/call.h create mode 100644 gcc/testsuite/gcc.target/s390/mfentry-m64-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/tls-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/tls.c create mode 100644 gcc/testsuite/gcc.target/s390/tls.h diff --git a/gcc/config/s390/predicates.md 
b/gcc/config/s390/predicates.md index 15093cb4b30..99c343aa32c 100644 --- a/gcc/config/s390/predicates.md +++ b/gcc/config/s390/predicates.md @@ -101,10 +101,13 @@ (define_special_predicate "bras_sym_operand" (ior (and (match_code "symbol_ref") - (match_test "!flag_pic || SYMBOL_REF_LOCAL_P (op)")) + (ior (match_test "!flag_pic") +(match_test
[PATCH gcc-11 1/2] IBM Z: Define NO_PROFILE_COUNTERS
s390 glibc does not need counters in the .data section, since it stores edge hits in its own data structure. Therefore counters only waste space and confuse diffing tools (e.g. kpatch), so don't generate them. gcc/ChangeLog: * config/s390/s390.c (s390_function_profiler): Ignore labelno parameter. * config/s390/s390.h (NO_PROFILE_COUNTERS): Define. gcc/testsuite/ChangeLog: * gcc.target/s390/mnop-mcount-m31-mzarch.c: Adapt to the new prologue size. * gcc.target/s390/mnop-mcount-m64.c: Likewise. (cherry picked from commit a1c1b7a888a) --- gcc/config/s390/s390.c| 42 +++ gcc/config/s390/s390.h| 2 + .../gcc.target/s390/mnop-mcount-m31-mzarch.c | 2 +- .../gcc.target/s390/mnop-mcount-m64.c | 2 +- 4 files changed, 20 insertions(+), 28 deletions(-) diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index c5d4c439bcc..a863dfce9a2 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -13120,33 +13120,25 @@ output_asm_nops (const char *user, int hw) } } -/* Output assembler code to FILE to increment profiler label # LABELNO - for profiling a function entry. */ +/* Output assembler code to FILE to call a profiler hook. */ void -s390_function_profiler (FILE *file, int labelno) +s390_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED) { - rtx op[8]; - - char label[128]; - ASM_GENERATE_INTERNAL_LABEL (label, "LP", labelno); + rtx op[4]; fprintf (file, "# function profiler \n"); op[0] = gen_rtx_REG (Pmode, RETURN_REGNUM); op[1] = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM); op[1] = gen_rtx_MEM (Pmode, plus_constant (Pmode, op[1], UNITS_PER_LONG)); - op[7] = GEN_INT (UNITS_PER_LONG); - - op[2] = gen_rtx_REG (Pmode, 1); - op[3] = gen_rtx_SYMBOL_REF (Pmode, label); - SYMBOL_REF_FLAGS (op[3]) = SYMBOL_FLAG_LOCAL; + op[3] = GEN_INT (UNITS_PER_LONG); - op[4] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? "__fentry__" : "_mcount"); + op[2] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? 
"__fentry__" : "_mcount"); if (flag_pic) { - op[4] = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op[4]), UNSPEC_PLT); - op[4] = gen_rtx_CONST (Pmode, op[4]); + op[2] = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op[2]), UNSPEC_PLT); + op[2] = gen_rtx_CONST (Pmode, op[2]); } if (flag_record_mcount) @@ -13160,20 +13152,19 @@ s390_function_profiler (FILE *file, int labelno) warning (OPT_Wcannot_profile, "nested functions cannot be profiled " "with %<-mfentry%> on s390"); else - output_asm_insn ("brasl\t0,%4", op); + output_asm_insn ("brasl\t0,%2", op); } else if (TARGET_64BIT) { if (flag_nop_mcount) - output_asm_nops ("-mnop-mcount", /* stg */ 3 + /* larl */ 3 + -/* brasl */ 3 + /* lg */ 3); + output_asm_nops ("-mnop-mcount", /* stg */ 3 + /* brasl */ 3 + +/* lg */ 3); else { output_asm_insn ("stg\t%0,%1", op); if (flag_dwarf2_cfi_asm) - output_asm_insn (".cfi_rel_offset\t%0,%7", op); - output_asm_insn ("larl\t%2,%3", op); - output_asm_insn ("brasl\t%0,%4", op); + output_asm_insn (".cfi_rel_offset\t%0,%3", op); + output_asm_insn ("brasl\t%0,%2", op); output_asm_insn ("lg\t%0,%1", op); if (flag_dwarf2_cfi_asm) output_asm_insn (".cfi_restore\t%0", op); @@ -13182,15 +13173,14 @@ s390_function_profiler (FILE *file, int labelno) else { if (flag_nop_mcount) - output_asm_nops ("-mnop-mcount", /* st */ 2 + /* larl */ 3 + -/* brasl */ 3 + /* l */ 2); + output_asm_nops ("-mnop-mcount", /* st */ 2 + /* brasl */ 3 + +/* l */ 2); else { output_asm_insn ("st\t%0,%1", op); if (flag_dwarf2_cfi_asm) - output_asm_insn (".cfi_rel_offset\t%0,%7", op); - output_asm_insn ("larl\t%2,%3", op); - output_asm_insn ("brasl\t%0,%4", op); + output_asm_insn (".cfi_rel_offset\t%0,%3", op); + output_asm_insn ("brasl\t%0,%2", op); output_asm_insn ("l\t%0,%1", op); if (flag_dwarf2_cfi_asm) output_asm_insn (".cfi_restore\t%0", op); diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h index 3b876160420..fb16a455a03 100644 --- a/gcc/config/s390/s390.h +++ b/gcc/config/s390/s390.h @@ -787,6 +787,8 @@ 
CUMULATIVE_ARGS; #define PROFILE_BEFORE_PROLOGUE 1 +#define NO_PROFILE_COUNTERS 1 + /* Trampolines for nested functions. */ diff --git a/gcc/testsuite/gcc.target/s390/mnop-mcount-m31-mzarch.c b/gcc/testsuite/gcc.target/s390/mnop-mcount-m31-mzarch.c index b2ad9f5bced..874ceb96fe8 100644 --- a/gcc/testsuite/gcc.target/s390/mnop-mcount-m31-mzarch.c +++ b/gcc/testsuite/gcc.target/s390/mnop-mcount-m31-mzarch.c @@ -4,5 +4,5 @@ void profileme
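The effect of dropping larl on the -mnop-mcount padding can be checked with a little arithmetic. The sketch below is a hedged illustration in C, not part of the patch: toy_nop_bytes is a hypothetical helper, and it assumes that the counts passed to output_asm_nops () are 2-byte halfwords, as the per-instruction comments in the diff suggest.

```c
#include <assert.h>

/* Assumed size of one halfword in bytes.  */
enum { TOY_HALFWORD_BYTES = 2 };

/* Total padding in bytes for N instructions whose lengths are given
   in halfwords.  */
static int
toy_nop_bytes (const int *hw, int n)
{
  int total = 0;

  for (int i = 0; i < n; i++)
    total += hw[i] * TOY_HALFWORD_BYTES;
  return total;
}

/* Halfword counts copied from the comments in the diff.  */
static void
toy_nop_bytes_demo (void)
{
  /* 64-bit: stg, larl, brasl, lg before; stg, brasl, lg after.  */
  const int old64[] = { 3, 3, 3, 3 };
  const int new64[] = { 3, 3, 3 };
  /* 31-bit: st, larl, brasl, l before; st, brasl, l after.  */
  const int old31[] = { 2, 3, 3, 2 };
  const int new31[] = { 2, 3, 2 };

  assert (toy_nop_bytes (old64, 4) == 24);
  assert (toy_nop_bytes (new64, 3) == 18);
  assert (toy_nop_bytes (old31, 4) == 20);
  assert (toy_nop_bytes (new31, 3) == 14);
}
```

Under that assumption the prologue shrinks by 6 bytes in both modes, which is consistent with the mnop-mcount tests being adapted to a new prologue size.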
[PATCH gcc-11 0/2] Backport kpatch changes
Hi, This series contains a backport of kpatch changes needed to support https://github.com/dynup/kpatch/pull/1203 so that it could be used in RHEL 9. The patches have been in master for 4 months now without issues. Bootstrapped and regtested on s390x-redhat-linux. Ok for gcc-11? Best regards, Ilya Ilya Leoshkevich (2): IBM Z: Define NO_PROFILE_COUNTERS IBM Z: Use @PLT symbols for local functions in 64-bit mode gcc/config/s390/predicates.md | 9 +- gcc/config/s390/s390.c| 115 +++--- gcc/config/s390/s390.h| 2 + gcc/config/s390/s390.md | 32 ++--- gcc/testsuite/g++.dg/ext/visibility/noPLT.C | 2 +- gcc/testsuite/g++.target/s390/mi-thunk.C | 23 .../gcc.target/s390/call-z10-pic-nodatarel.c | 20 +++ gcc/testsuite/gcc.target/s390/call-z10-pic.c | 20 +++ gcc/testsuite/gcc.target/s390/call-z10.c | 20 +++ .../gcc.target/s390/call-z9-pic-nodatarel.c | 18 +++ gcc/testsuite/gcc.target/s390/call-z9-pic.c | 18 +++ gcc/testsuite/gcc.target/s390/call-z9.c | 20 +++ gcc/testsuite/gcc.target/s390/call.h | 40 ++ .../gcc.target/s390/mfentry-m64-pic.c | 9 ++ .../gcc.target/s390/mnop-mcount-m31-mzarch.c | 2 +- .../gcc.target/s390/mnop-mcount-m64.c | 2 +- gcc/testsuite/gcc.target/s390/nodatarel-1.c | 26 +--- gcc/testsuite/gcc.target/s390/pr80080-4.c | 2 +- gcc/testsuite/gcc.target/s390/risbg-ll-3.c| 6 +- gcc/testsuite/gcc.target/s390/tls-pic.c | 14 +++ gcc/testsuite/gcc.target/s390/tls.c | 10 ++ gcc/testsuite/gcc.target/s390/tls.h | 23 22 files changed, 336 insertions(+), 97 deletions(-) create mode 100644 gcc/testsuite/g++.target/s390/mi-thunk.C create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic-nodatarel.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z10.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic-nodatarel.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z9.c create mode 100644 gcc/testsuite/gcc.target/s390/call.h 
create mode 100644 gcc/testsuite/gcc.target/s390/mfentry-m64-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/tls-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/tls.c create mode 100644 gcc/testsuite/gcc.target/s390/tls.h -- 2.31.1
Re: [PATCH v3 3/3] reassoc: Test rank biasing
On Tue, 2021-09-28 at 13:28 +0200, Richard Biener wrote:
> On Sun, 26 Sep 2021, Ilya Leoshkevich wrote:
>
> > Add both positive and negative tests.
>
> The tests will likely be quite fragile with respect to what is
> actually vectorized on which target.  If you move the tests
> to gcc.dg/vect/ you could at least do
>
> /* { dg-require-effective-target vect_int } */
>
> do you need to look for the exact GIMPLE IL or is it enough to
> verify we are vectorizing the reduction?

Actually I don't think vectorization is that important here, and I only
check how many times

    sum_x = sum_y + _z

appears.  So I use (?:vect_)?, which may or may not be there.

An alternative I considered was to use -fno-tree-vectorize to get
smaller regexes, but I thought it would be nice to know that
vectorization does not mess up reassociation results.

Best regards,
Ilya
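For readers who want to see why the position of the accumulator matters, here is a minimal sketch in plain C rather than GIMPLE. The function names are hypothetical and the code is not taken from the tests. Both loops compute the same value, because unsigned addition wraps and is associative, but in the second one the sum of the two addends does not depend on the previous iteration, so only one addition sits on the loop-carried chain.

```c
#include <assert.h>
#include <stddef.h>

/* Accumulator first: both additions wait for the previous sum.  */
static unsigned
toy_sum_acc_first (const unsigned *a, const unsigned *b, size_t n)
{
  unsigned sum = 0;

  for (size_t i = 0; i < n; i++)
    sum = (sum + a[i]) + b[i];
  return sum;
}

/* Accumulator last: a[i] + b[i] is independent of sum, which matches
   the sum_x = sum_y + _z shape the tests count.  */
static unsigned
toy_sum_acc_last (const unsigned *a, const unsigned *b, size_t n)
{
  unsigned sum = 0;

  for (size_t i = 0; i < n; i++)
    {
      unsigned t = a[i] + b[i];
      sum = sum + t;
    }
  return sum;
}

/* Self-check with values that wrap around.  */
static void
toy_sum_demo (void)
{
  const unsigned a[] = { 1u, 4000000000u, 7u };
  const unsigned b[] = { 2u, 4000000000u, 9u };

  assert (toy_sum_acc_first (a, b, 3) == toy_sum_acc_last (a, b, 3));
}
```

The compiler is free to pick either form for unsigned operands; the tests only check which one reassociation ends up with.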
[PATCH v3 3/3] reassoc: Test rank biasing
Add both positive and negative tests. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/reassoc-46.c: New test. * gcc.dg/tree-ssa/reassoc-46.h: Common code for new tests. * gcc.dg/tree-ssa/reassoc-47.c: New test. * gcc.dg/tree-ssa/reassoc-48.c: New test. * gcc.dg/tree-ssa/reassoc-49.c: New test. * gcc.dg/tree-ssa/reassoc-50.c: New test. * gcc.dg/tree-ssa/reassoc-51.c: New test. --- gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c | 7 + gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h | 33 ++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c | 9 ++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c | 9 ++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c | 11 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c | 10 +++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c | 11 7 files changed, 90 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c new file mode 100644 index 000..97563dd929f --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c @@ -0,0 +1,7 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */ + +#include "reassoc-46.h" + +/* Check that the loop accumulator is added last. 
*/ +/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d._]+ \+ (?:vect_)?_[\d._]+)} 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h new file mode 100644 index 000..e60b490ea0d --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h @@ -0,0 +1,33 @@ +#define M 1024 +unsigned int arr1[M]; +unsigned int arr2[M]; +volatile unsigned int sink; + +unsigned int +test (void) +{ + unsigned int sum = 0; + for (int i = 0; i < M; i++) +{ +#ifdef MODIFY + /* Modify the loop accumulator using a chain of operations - this should + not affect its rank biasing. */ + sum |= 1; + sum ^= 2; +#endif +#ifdef STORE + /* Save the loop accumulator into a global variable - this should not + affect its rank biasing. */ + sink = sum; +#endif +#ifdef USE + /* Add a tricky use of the loop accumulator - this should prevent its + rank biasing. */ + i = (i + sum) % M; +#endif + /* Use addends with different ranks. */ + sum += arr1[i]; + sum += arr2[((i ^ 1) + 1) % M]; +} + return sum; +} diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c new file mode 100644 index 000..1b0f0fdabe1 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */ + +#define MODIFY +#include "reassoc-46.h" + +/* Check that if the loop accumulator is saved into a global variable, it's + still added last. 
*/ +/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d._]+ \+ (?:vect_)?_[\d._]+)} 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c new file mode 100644 index 000..13836ebe8e6 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */ + +#define STORE +#include "reassoc-46.h" + +/* Check that if the loop accumulator is modified using a chain of operations + other than addition, its new value is still added last. */ +/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d._]+ \+ (?:vect_)?_[\d._]+)} 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c new file mode 100644 index 000..c1136a447a2 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */ + +#define MODIFY +#define STORE +#include "reassoc-46.h" + +/* Check that if the loop accumulator is both modified using a chain of + operations other than addition and stored into a global variable, its new + value is still added last. */ +/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d
[PATCH v3 2/3] reassoc: Propagate PHI_LOOP_BIAS along single uses
PR tree-optimization/49749 introduced code that shortens dependency chains containing loop accumulators by placing them last on operand lists of associative operations. 456.hmmer benchmark on s390 could benefit from this, however, the code that needs it modifies loop accumulator before using it, and since only so-called loop-carried phis are treated as loop accumulators, the code in the present form doesn't really help. According to Bill Schmidt - the original author - such a conservative approach was chosen so as to avoid unnecessarily swapping operands, which might cause unpredictable effects. However, giving special treatment to forms of loop accumulators is acceptable. The definition of loop-carried phi is: it's a single-use phi, which is used in the same innermost loop it's defined in, at least one argument of which is defined in the same innermost loop as the phi itself. Given this, it seems natural to treat single uses of such phis as phis themselves. gcc/ChangeLog: * tree-ssa-reassoc.c (biased_names): New global. (propagate_bias_p): New function. (loop_carried_phi): Remove. (propagate_rank): Propagate bias along single uses. (get_rank): Update biased_names when needed. --- gcc/tree-ssa-reassoc.c | 109 - 1 file changed, 74 insertions(+), 35 deletions(-) diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c index 420c14e8cf5..db9fb4e1cac 100644 --- a/gcc/tree-ssa-reassoc.c +++ b/gcc/tree-ssa-reassoc.c @@ -211,6 +211,10 @@ static int64_t *bb_rank; /* Operand->rank hashtable. */ static hash_map *operand_rank; +/* SSA_NAMEs that are forms of loop accumulators and whose ranks need to be + biased. */ +static auto_bitmap biased_names; + /* Vector of SSA_NAMEs on which after reassociate_bb is done with all basic blocks the CFG should be adjusted - basic blocks split right after that SSA_NAME's definition statement and before @@ -256,6 +260,53 @@ reassoc_remove_stmt (gimple_stmt_iterator *gsi) the rank difference between two blocks. 
*/ #define PHI_LOOP_BIAS (1 << 15) +/* Return TRUE iff PHI_LOOP_BIAS should be propagated from one of the STMT's + operands to the STMT's left-hand side. The goal is to preserve bias in code + like this: + + x_1 = phi(x_0, x_2) + a = x_1 | 1 + b = a ^ 2 + .MEM = b + c = b + d + x_2 = c + e + + That is, we need to preserve bias along single-use chains originating from + loop-carried phis. Only GIMPLE_ASSIGNs to SSA_NAMEs are considered to be + uses, because only they participate in rank propagation. */ +static bool +propagate_bias_p (gimple *stmt) +{ + use_operand_p use; + imm_use_iterator use_iter; + gimple *single_use_stmt = NULL; + + if (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_reference) +return false; + + FOR_EACH_IMM_USE_FAST (use, use_iter, gimple_assign_lhs (stmt)) +{ + gimple *current_use_stmt = USE_STMT (use); + + if (is_gimple_assign (current_use_stmt) + && TREE_CODE (gimple_assign_lhs (current_use_stmt)) == SSA_NAME) + { + if (single_use_stmt != NULL && single_use_stmt != current_use_stmt) + return false; + single_use_stmt = current_use_stmt; + } +} + + if (single_use_stmt == NULL) +return false; + + if (gimple_bb (stmt)->loop_father + != gimple_bb (single_use_stmt)->loop_father) +return false; + + return true; +} + /* Rank assigned to a phi statement. If STMT is a loop-carried phi of an innermost loop, and the phi has only a single use which is inside the loop, then the rank is the block rank of the loop latch plus an @@ -313,49 +364,27 @@ phi_rank (gimple *stmt) return bb_rank[bb->index]; } -/* If EXP is an SSA_NAME defined by a PHI statement that represents a - loop-carried dependence of an innermost loop, return TRUE; else - return FALSE. 
*/ -static bool -loop_carried_phi (tree exp) -{ - gimple *phi_stmt; - int64_t block_rank; - - if (TREE_CODE (exp) != SSA_NAME - || SSA_NAME_IS_DEFAULT_DEF (exp)) -return false; - - phi_stmt = SSA_NAME_DEF_STMT (exp); - - if (gimple_code (SSA_NAME_DEF_STMT (exp)) != GIMPLE_PHI) -return false; - - /* Non-loop-carried phis have block rank. Loop-carried phis have - an additional bias added in. If this phi doesn't have block rank, - it's biased and should not be propagated. */ - block_rank = bb_rank[gimple_bb (phi_stmt)->index]; - - if (phi_rank (phi_stmt) != block_rank) -return true; - - return false; -} - /* Return the maximum of RANK and the rank that should be propagated from expression OP. For most operands, this is just the rank of OP. For loop-carried phis, the value is zero to avoid undoing the bias in favor of the phi. */ static int64_t -propagate_rank (int64_t rank, tree op) +propagate_rank (int64_t rank, tree op, bool *maybe_biased_p) { int64_t op_rank; - if (loop_carried_phi (op)) -
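The dependency-chain argument in the commit message above can be illustrated outside of GCC. The following is only a minimal sketch in plain C, not code from the patch; the function names are hypothetical. Both functions compute the same unsigned sum, but in the second one the accumulator is added last, so `a[i] + b[i]` does not depend on the previous iteration and only one addition remains on the loop-carried chain.

```c
#include <assert.h>
#include <stddef.h>

/* Accumulator first: every iteration performs two additions that both
   depend on the previous value of SUM.  */
unsigned int
sum_acc_first (const unsigned int *a, const unsigned int *b, size_t n)
{
  assert (n == 0 || (a != NULL && b != NULL));
  unsigned int sum = 0;
  for (size_t i = 0; i < n; i++)
    sum = (sum + a[i]) + b[i];
  return sum;
}

/* Accumulator last: a[i] + b[i] is independent of SUM, so only a single
   addition per iteration depends on the previous iteration.  Unsigned
   arithmetic wraps, so the result is identical.  */
unsigned int
sum_acc_last (const unsigned int *a, const unsigned int *b, size_t n)
{
  assert (n == 0 || (a != NULL && b != NULL));
  unsigned int sum = 0;
  for (size_t i = 0; i < n; i++)
    sum = (a[i] + b[i]) + sum;
  return sum;
}
```

Whether the second form is actually faster depends on the target; the sketch only shows the operand order that rank biasing aims for.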
[PATCH v3 1/3] reassoc: Do not bias loop-carried PHIs early
Biasing loop-carried PHIs during the 1st reassociation pass interferes with reduction chains and does not bring measurable benefits, so do it only during the 2nd reassociation pass. gcc/ChangeLog: * passes.def (pass_reassoc): Rename parameter to early_p. * tree-ssa-reassoc.c (reassoc_bias_loop_carried_phi_ranks_p): New variable. (phi_rank): Don't bias loop-carried phi ranks before vectorization pass. (execute_reassoc): Add bias_loop_carried_phi_ranks_p parameter. (pass_reassoc::pass_reassoc): Add bias_loop_carried_phi_ranks_p initializer. (pass_reassoc::set_param): Set bias_loop_carried_phi_ranks_p value. (pass_reassoc::execute): Pass bias_loop_carried_phi_ranks_p to execute_reassoc. (pass_reassoc::bias_loop_carried_phi_ranks_p): New member. --- gcc/passes.def | 4 ++-- gcc/tree-ssa-reassoc.c | 16 ++-- 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/gcc/passes.def b/gcc/passes.def index d7a1f8c97a6..c5f915d04c6 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -242,7 +242,7 @@ along with GCC; see the file COPYING3. If not see /* Identify paths that should never be executed in a conforming program and isolate those paths. */ NEXT_PASS (pass_isolate_erroneous_paths); - NEXT_PASS (pass_reassoc, true /* insert_powi_p */); + NEXT_PASS (pass_reassoc, true /* early_p */); NEXT_PASS (pass_dce); NEXT_PASS (pass_forwprop); NEXT_PASS (pass_phiopt, false /* early_p */); @@ -325,7 +325,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_lower_vector_ssa); NEXT_PASS (pass_lower_switch); NEXT_PASS (pass_cse_reciprocals); - NEXT_PASS (pass_reassoc, false /* insert_powi_p */); + NEXT_PASS (pass_reassoc, false /* early_p */); NEXT_PASS (pass_strength_reduction); NEXT_PASS (pass_split_paths); NEXT_PASS (pass_tracer); diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c index 8498cfc7aa8..420c14e8cf5 100644 --- a/gcc/tree-ssa-reassoc.c +++ b/gcc/tree-ssa-reassoc.c @@ -180,6 +180,10 @@ along with GCC; see the file COPYING3. 
If not see point 3a in the pass header comment. */ static bool reassoc_insert_powi_p; +/* Enable biasing ranks of loop accumulators. We don't want this before + vectorization, since it interferes with reduction chains. */ +static bool reassoc_bias_loop_carried_phi_ranks_p; + /* Statistics */ static struct { @@ -269,6 +273,9 @@ phi_rank (gimple *stmt) use_operand_p use; gimple *use_stmt; + if (!reassoc_bias_loop_carried_phi_ranks_p) +return bb_rank[bb->index]; + /* We only care about real loops (those with a latch). */ if (!father->latch) return bb_rank[bb->index]; @@ -6940,9 +6947,10 @@ fini_reassoc (void) optimization of a gimple conditional. Otherwise returns zero. */ static unsigned int -execute_reassoc (bool insert_powi_p) +execute_reassoc (bool insert_powi_p, bool bias_loop_carried_phi_ranks_p) { reassoc_insert_powi_p = insert_powi_p; + reassoc_bias_loop_carried_phi_ranks_p = bias_loop_carried_phi_ranks_p; init_reassoc (); @@ -6983,15 +6991,19 @@ public: { gcc_assert (n == 0); insert_powi_p = param; + bias_loop_carried_phi_ranks_p = !param; } virtual bool gate (function *) { return flag_tree_reassoc != 0; } virtual unsigned int execute (function *) -{ return execute_reassoc (insert_powi_p); } + { +return execute_reassoc (insert_powi_p, bias_loop_carried_phi_ranks_p); + } private: /* Enable insertion of __builtin_powi calls during execute_reassoc. See point 3a in the pass header comment. */ bool insert_powi_p; + bool bias_loop_carried_phi_ranks_p; }; // class pass_reassoc } // anon namespace -- 2.31.1
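The way the single `early_p` pass parameter drives both flags in the patch above can be summarized in a small standalone sketch. This is not the GCC pass class; the struct and function names are hypothetical, and only the two assignments mirror what `set_param` does in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two per-instance flags of the pass.  */
struct reassoc_flags
{
  bool insert_powi_p;
  bool bias_loop_carried_phi_ranks_p;
};

/* Mirror of the assignments in set_param: the early instance inserts
   __builtin_powi calls and does not bias, the late instance does the
   opposite.  */
struct reassoc_flags
reassoc_flags_from_early_p (bool early_p)
{
  struct reassoc_flags flags;
  flags.insert_powi_p = early_p;
  flags.bias_loop_carried_phi_ranks_p = !early_p;
  /* With this scheme the two flags are always complementary.  */
  assert (flags.insert_powi_p != flags.bias_loop_carried_phi_ranks_p);
  return flags;
}
```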
[PATCH v3 0/3] reassoc: Propagate PHI_LOOP_BIAS along single uses
v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579976.html

Changes in v3:
* Do not propagate bias along tcc_references.
* Call get_rank () before checking biased_names.
* Add loop-carried phis to biased_names.
* Move the propagate_bias_p () call outside of the loop.
* Test with -ftree-vectorize, adjust expectations.

Ilya Leoshkevich (3):
  reassoc: Do not bias loop-carried PHIs early
  reassoc: Propagate PHI_LOOP_BIAS along single uses
  reassoc: Test rank biasing

 gcc/passes.def | 4 +-
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c | 7 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h | 33 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c | 9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c | 9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c | 11 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c | 10 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c | 11 ++
 gcc/tree-ssa-reassoc.c | 125 +++--
 9 files changed, 180 insertions(+), 39 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c

--
2.31.1
Re: [PATCH v2 2/3] reassoc: Propagate PHI_LOOP_BIAS along single uses
On Thu, 2021-09-23 at 13:55 +0200, Richard Biener wrote: > On Wed, 22 Sep 2021, Ilya Leoshkevich wrote: > > > PR tree-optimization/49749 introduced code that shortens dependency > > chains containing loop accumulators by placing them last on operand > > lists of associative operations. > > > > 456.hmmer benchmark on s390 could benefit from this, however, the > > code > > that needs it modifies loop accumulator before using it, and since > > only > > so-called loop-carried phis are are treated as loop accumulators, > > the > > code in the present form doesn't really help. According to Bill > > Schmidt - the original author - such a conservative approach was > > chosen > > so as to avoid unnecessarily swapping operands, which might cause > > unpredictable effects. However, giving special treatment to forms > > of > > loop accumulators is acceptable. > > > > The definition of loop-carried phi is: it's a single-use phi, which > > is > > used in the same innermost loop it's defined in, at least one > > argument > > of which is defined in the same innermost loop as the phi itself. > > Given this, it seems natural to treat single uses of such phis as > > phis > > themselves. > > > > gcc/ChangeLog: > > > > * tree-ssa-reassoc.c (biased_names): New global. > > (propagate_bias_p): New function. > > (loop_carried_phi): Remove. > > (propagate_rank): Propagate bias along single uses. > > (get_rank): Update biased_names when needed. > > --- > > gcc/tree-ssa-reassoc.c | 97 -- > > > > 1 file changed, 64 insertions(+), 33 deletions(-) > > > > diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c > > index 420c14e8cf5..2f7a8882aac 100644 > > --- a/gcc/tree-ssa-reassoc.c > > +++ b/gcc/tree-ssa-reassoc.c > > @@ -211,6 +211,10 @@ static int64_t *bb_rank; > > /* Operand->rank hashtable. */ > > static hash_map *operand_rank; > > > > +/* SSA_NAMEs that are forms of loop accumulators and whose ranks > > need to be > > + biased. 
*/ > > +static auto_bitmap biased_names; > > + > > /* Vector of SSA_NAMEs on which after reassociate_bb is done with > > all basic blocks the CFG should be adjusted - basic blocks > > split right after that SSA_NAME's definition statement and > > before > > @@ -256,6 +260,50 @@ reassoc_remove_stmt (gimple_stmt_iterator > > *gsi) > > the rank difference between two blocks. */ > > #define PHI_LOOP_BIAS (1 << 15) > > > > +/* Return TRUE iff PHI_LOOP_BIAS should be propagated from one of > > the STMT's > > + operands to the STMT's left-hand side. The goal is to preserve > > bias in code > > + like this: > > + > > + x_1 = phi(x_0, x_2) > > + a = x_1 | 1 > > + b = a ^ 2 > > + .MEM = b > > + c = b + d > > + x_2 = c + e > > + > > + That is, we need to preserve bias along single-use chains > > originating from > > + loop-carried phis. Only GIMPLE_ASSIGNs to SSA_NAMEs are > > considered to be > > + uses, because only they participate in rank propagation. */ > > +static bool > > +propagate_bias_p (gimple *stmt) > > +{ > > + use_operand_p use; > > + imm_use_iterator use_iter; > > + gimple *single_use_stmt = NULL; > > + > > + FOR_EACH_IMM_USE_FAST (use, use_iter, gimple_assign_lhs (stmt)) > > + { > > + gimple *current_use_stmt = USE_STMT (use); > > + > > + if (is_gimple_assign (current_use_stmt) > > + && TREE_CODE (gimple_assign_lhs (current_use_stmt)) == > > SSA_NAME) > > + { > > + if (single_use_stmt != NULL) > > what if single_use_stmt == current_use_stmt? We might have two > uses on a stmt after all - should that still be biased? I guess not > and thus the check is correct? Come to think of it, it should be ok to bias it. Things like x = x + x are fine (this particular case can be transformed into something else earlier, but I think the overall point still holds). 
> > > + return false; > > + single_use_stmt = current_use_stmt; > > + } > > + } > > + > > + if (single_use_stmt == NULL) > > + return false; > > + > > + if (gimple_bb (stmt)->loop_father > > + != gimple_bb (single_use_stmt)->loop_father) > > + return false; > > + > > + return true; > > +}
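The point discussed in this reply, that several uses inside one and the same statement should still count as a single use, can be sketched with a toy model. This is not GIMPLE code: `struct use` and `single_use_stmt_p` are hypothetical stand-ins, statements are identified by plain integers, and the loop_father comparison from the patch is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one immediate use of an SSA name.  */
struct use
{
  int stmt_id;            /* Statement containing the use (>= 0).  */
  bool assigns_ssa_name;  /* True if that statement is an assignment
                             whose left-hand side is an SSA name.  */
};

/* Return true iff all qualifying uses sit in exactly one statement.
   Uses in other kinds of statements (e.g. stores) are ignored, and two
   uses in the same statement, as in x = x + x, are accepted.  */
bool
single_use_stmt_p (const struct use *uses, size_t n)
{
  int single = -1;  /* No qualifying statement seen yet.  */
  for (size_t i = 0; i < n; i++)
    {
      if (!uses[i].assigns_ssa_name)
        continue;
      assert (uses[i].stmt_id >= 0);
      if (single != -1 && single != uses[i].stmt_id)
        return false;
      single = uses[i].stmt_id;
    }
  return single != -1;
}
```

Comparing against the statement already recorded, rather than rejecting any second use, is what makes the `x = x + x` case pass.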
[PATCH v2 3/3] reassoc: Test rank biasing
Add both positive and negative tests. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/reassoc-46.c: New test. * gcc.dg/tree-ssa/reassoc-46.h: Common code for new tests. * gcc.dg/tree-ssa/reassoc-47.c: New test. * gcc.dg/tree-ssa/reassoc-48.c: New test. * gcc.dg/tree-ssa/reassoc-49.c: New test. * gcc.dg/tree-ssa/reassoc-50.c: New test. * gcc.dg/tree-ssa/reassoc-51.c: New test. --- gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c | 7 + gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h | 33 ++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c | 9 ++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c | 9 ++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c | 11 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c | 10 +++ gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c | 11 7 files changed, 90 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c new file mode 100644 index 000..69e02bc4d4a --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c @@ -0,0 +1,7 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#include "reassoc-46.h" + +/* Check that the loop accumulator is added last. 
*/ +/* { dg-final { scan-tree-dump-times {sum_\d+ = (?:_\d+ \+ sum_\d+|sum_\d+ \+ _\d+)} 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h new file mode 100644 index 000..e60b490ea0d --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h @@ -0,0 +1,33 @@ +#define M 1024 +unsigned int arr1[M]; +unsigned int arr2[M]; +volatile unsigned int sink; + +unsigned int +test (void) +{ + unsigned int sum = 0; + for (int i = 0; i < M; i++) +{ +#ifdef MODIFY + /* Modify the loop accumulator using a chain of operations - this should + not affect its rank biasing. */ + sum |= 1; + sum ^= 2; +#endif +#ifdef STORE + /* Save the loop accumulator into a global variable - this should not + affect its rank biasing. */ + sink = sum; +#endif +#ifdef USE + /* Add a tricky use of the loop accumulator - this should prevent its + rank biasing. */ + i = (i + sum) % M; +#endif + /* Use addends with different ranks. */ + sum += arr1[i]; + sum += arr2[((i ^ 1) + 1) % M]; +} + return sum; +} diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c new file mode 100644 index 000..84b51ccddb0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#define MODIFY +#include "reassoc-46.h" + +/* Check that if the loop accumulator is saved into a global variable, it's + still added last. 
*/ +/* { dg-final { scan-tree-dump-times {sum_\d+ = (?:_\d+ \+ sum_\d+|sum_\d+ \+ _\d+)} 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c new file mode 100644 index 000..53ae8820281 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#define STORE +#include "reassoc-46.h" + +/* Check that if the loop accumulator is modified using a chain of operations + other than addition, its new value is still added last. */ +/* { dg-final { scan-tree-dump-times {sum_\d+ = (?:_\d+ \+ sum_\d+|sum_\d+ \+ _\d+)} 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c new file mode 100644 index 000..a6941d5ac2b --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#define MODIFY +#define STORE +#include "reassoc-46.h" + +/* Check that if the loop accumulator is both modified using a chain of + operations other than addition and stored into a global variable, its new + value is still added last. */ +/* { dg-final { scan-tree-dump-times {sum_\d+ = (?:_\d+ \+ sum_\d+|sum_\d+ \+ _\d+)} 1 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c new file mode 100644 index 000..68cd308c4f1 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimize
[PATCH v2 2/3] reassoc: Propagate PHI_LOOP_BIAS along single uses
PR tree-optimization/49749 introduced code that shortens dependency chains containing loop accumulators by placing them last on operand lists of associative operations. 456.hmmer benchmark on s390 could benefit from this, however, the code that needs it modifies loop accumulator before using it, and since only so-called loop-carried phis are treated as loop accumulators, the code in the present form doesn't really help. According to Bill Schmidt - the original author - such a conservative approach was chosen so as to avoid unnecessarily swapping operands, which might cause unpredictable effects. However, giving special treatment to forms of loop accumulators is acceptable. The definition of loop-carried phi is: it's a single-use phi, which is used in the same innermost loop it's defined in, at least one argument of which is defined in the same innermost loop as the phi itself. Given this, it seems natural to treat single uses of such phis as phis themselves. gcc/ChangeLog: * tree-ssa-reassoc.c (biased_names): New global. (propagate_bias_p): New function. (loop_carried_phi): Remove. (propagate_rank): Propagate bias along single uses. (get_rank): Update biased_names when needed. --- gcc/tree-ssa-reassoc.c | 97 -- 1 file changed, 64 insertions(+), 33 deletions(-) diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c index 420c14e8cf5..2f7a8882aac 100644 --- a/gcc/tree-ssa-reassoc.c +++ b/gcc/tree-ssa-reassoc.c @@ -211,6 +211,10 @@ static int64_t *bb_rank; /* Operand->rank hashtable. */ static hash_map *operand_rank; +/* SSA_NAMEs that are forms of loop accumulators and whose ranks need to be + biased. */ +static auto_bitmap biased_names; + /* Vector of SSA_NAMEs on which after reassociate_bb is done with all basic blocks the CFG should be adjusted - basic blocks split right after that SSA_NAME's definition statement and before @@ -256,6 +260,50 @@ reassoc_remove_stmt (gimple_stmt_iterator *gsi) the rank difference between two blocks. 
*/ #define PHI_LOOP_BIAS (1 << 15) +/* Return TRUE iff PHI_LOOP_BIAS should be propagated from one of the STMT's + operands to the STMT's left-hand side. The goal is to preserve bias in code + like this: + + x_1 = phi(x_0, x_2) + a = x_1 | 1 + b = a ^ 2 + .MEM = b + c = b + d + x_2 = c + e + + That is, we need to preserve bias along single-use chains originating from + loop-carried phis. Only GIMPLE_ASSIGNs to SSA_NAMEs are considered to be + uses, because only they participate in rank propagation. */ +static bool +propagate_bias_p (gimple *stmt) +{ + use_operand_p use; + imm_use_iterator use_iter; + gimple *single_use_stmt = NULL; + + FOR_EACH_IMM_USE_FAST (use, use_iter, gimple_assign_lhs (stmt)) +{ + gimple *current_use_stmt = USE_STMT (use); + + if (is_gimple_assign (current_use_stmt) + && TREE_CODE (gimple_assign_lhs (current_use_stmt)) == SSA_NAME) + { + if (single_use_stmt != NULL) + return false; + single_use_stmt = current_use_stmt; + } +} + + if (single_use_stmt == NULL) +return false; + + if (gimple_bb (stmt)->loop_father + != gimple_bb (single_use_stmt)->loop_father) +return false; + + return true; +} + /* Rank assigned to a phi statement. If STMT is a loop-carried phi of an innermost loop, and the phi has only a single use which is inside the loop, then the rank is the block rank of the loop latch plus an @@ -313,46 +361,23 @@ phi_rank (gimple *stmt) return bb_rank[bb->index]; } -/* If EXP is an SSA_NAME defined by a PHI statement that represents a - loop-carried dependence of an innermost loop, return TRUE; else - return FALSE. */ -static bool -loop_carried_phi (tree exp) -{ - gimple *phi_stmt; - int64_t block_rank; - - if (TREE_CODE (exp) != SSA_NAME - || SSA_NAME_IS_DEFAULT_DEF (exp)) -return false; - - phi_stmt = SSA_NAME_DEF_STMT (exp); - - if (gimple_code (SSA_NAME_DEF_STMT (exp)) != GIMPLE_PHI) -return false; - - /* Non-loop-carried phis have block rank. Loop-carried phis have - an additional bias added in. 
If this phi doesn't have block rank, - it's biased and should not be propagated. */ - block_rank = bb_rank[gimple_bb (phi_stmt)->index]; - - if (phi_rank (phi_stmt) != block_rank) -return true; - - return false; -} - /* Return the maximum of RANK and the rank that should be propagated from expression OP. For most operands, this is just the rank of OP. For loop-carried phis, the value is zero to avoid undoing the bias in favor of the phi. */ static int64_t -propagate_rank (int64_t rank, tree op) +propagate_rank (int64_t rank, tree op, gimple *stmt, bool *bias_p) { int64_t op_rank; - if (loop_carried_phi (op)) -return rank; + if (TREE_CODE (op) == SSA_NAME + && bitmap_bit_p (biased_names, SSA_NAME_VERSION (op))) +{ + i
[PATCH v2 1/3] reassoc: Do not bias loop-carried PHIs early
Biasing loop-carried PHIs during the 1st reassociation pass interferes with reduction chains and does not bring measurable benefits, so do it only during the 2nd reassociation pass. gcc/ChangeLog: * passes.def (pass_reassoc): Rename parameter to early_p. * tree-ssa-reassoc.c (reassoc_bias_loop_carried_phi_ranks_p): New variable. (phi_rank): Don't bias loop-carried phi ranks before vectorization pass. (execute_reassoc): Add bias_loop_carried_phi_ranks_p parameter. (pass_reassoc::pass_reassoc): Add bias_loop_carried_phi_ranks_p initializer. (pass_reassoc::set_param): Set bias_loop_carried_phi_ranks_p value. (pass_reassoc::execute): Pass bias_loop_carried_phi_ranks_p to execute_reassoc. (pass_reassoc::bias_loop_carried_phi_ranks_p): New member. --- gcc/passes.def | 4 ++-- gcc/tree-ssa-reassoc.c | 16 ++-- 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/gcc/passes.def b/gcc/passes.def index d7a1f8c97a6..c5f915d04c6 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -242,7 +242,7 @@ along with GCC; see the file COPYING3. If not see /* Identify paths that should never be executed in a conforming program and isolate those paths. */ NEXT_PASS (pass_isolate_erroneous_paths); - NEXT_PASS (pass_reassoc, true /* insert_powi_p */); + NEXT_PASS (pass_reassoc, true /* early_p */); NEXT_PASS (pass_dce); NEXT_PASS (pass_forwprop); NEXT_PASS (pass_phiopt, false /* early_p */); @@ -325,7 +325,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_lower_vector_ssa); NEXT_PASS (pass_lower_switch); NEXT_PASS (pass_cse_reciprocals); - NEXT_PASS (pass_reassoc, false /* insert_powi_p */); + NEXT_PASS (pass_reassoc, false /* early_p */); NEXT_PASS (pass_strength_reduction); NEXT_PASS (pass_split_paths); NEXT_PASS (pass_tracer); diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c index 8498cfc7aa8..420c14e8cf5 100644 --- a/gcc/tree-ssa-reassoc.c +++ b/gcc/tree-ssa-reassoc.c @@ -180,6 +180,10 @@ along with GCC; see the file COPYING3. 
If not see point 3a in the pass header comment. */ static bool reassoc_insert_powi_p; +/* Enable biasing ranks of loop accumulators. We don't want this before + vectorization, since it interferes with reduction chains. */ +static bool reassoc_bias_loop_carried_phi_ranks_p; + /* Statistics */ static struct { @@ -269,6 +273,9 @@ phi_rank (gimple *stmt) use_operand_p use; gimple *use_stmt; + if (!reassoc_bias_loop_carried_phi_ranks_p) +return bb_rank[bb->index]; + /* We only care about real loops (those with a latch). */ if (!father->latch) return bb_rank[bb->index]; @@ -6940,9 +6947,10 @@ fini_reassoc (void) optimization of a gimple conditional. Otherwise returns zero. */ static unsigned int -execute_reassoc (bool insert_powi_p) +execute_reassoc (bool insert_powi_p, bool bias_loop_carried_phi_ranks_p) { reassoc_insert_powi_p = insert_powi_p; + reassoc_bias_loop_carried_phi_ranks_p = bias_loop_carried_phi_ranks_p; init_reassoc (); @@ -6983,15 +6991,19 @@ public: { gcc_assert (n == 0); insert_powi_p = param; + bias_loop_carried_phi_ranks_p = !param; } virtual bool gate (function *) { return flag_tree_reassoc != 0; } virtual unsigned int execute (function *) -{ return execute_reassoc (insert_powi_p); } + { +return execute_reassoc (insert_powi_p, bias_loop_carried_phi_ranks_p); + } private: /* Enable insertion of __builtin_powi calls during execute_reassoc. See point 3a in the pass header comment. */ bool insert_powi_p; + bool bias_loop_carried_phi_ranks_p; }; // class pass_reassoc } // anon namespace -- 2.31.1
[PATCH v2 0/3] reassoc: Propagate PHI_LOOP_BIAS along single uses
This is an update to my very old patch with the review comments addressed. Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux and s390x-redhat-linux.

v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548785.html

Changes in v2:
* Disable PHI biasing in the early pass instance in a separate patch.
* Replace s390-specific tests with the generic tree-ssa ones.
* Replace the fragile (op_rank & PHI_LOOP_BIAS) test with auto_bitmap biased_names. The review suggestion was instead to check whether op is defined by a loop-carried phi, but this would allow detecting only single assignments, and not assignment chains. Another alternative that would make the check less fragile was to use saturating addition in order to prevent overflows into the PHI_LOOP_BIAS bit, but auto_bitmap of SSA_NAMEs allows graceful processing of large basic blocks, and its memory overhead looks acceptable.
* Restructure the code to make it a bit more readable.

The overall logic is the same as in v1.

I considered implementing an idea from [1], more specifically, detecting single-use chains in is_phi_for_stmt() so that swap_ops_for_binary_stmt() shifts the corresponding operand towards the end. These two functions actually seem to serve a very related purpose. However, for single-use chain detection we would still need to recursively traverse SSA_NAME_DEF_STMTs of operands, which propagate_rank() and friends already do. So this would not have resulted in a significant code simplification.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-June/549149.html

Ilya Leoshkevich (3):
  reassoc: Do not bias loop-carried PHIs early
  reassoc: Propagate PHI_LOOP_BIAS along single uses
  reassoc: Test rank biasing

 gcc/passes.def | 4 +-
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c | 7 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h | 33 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c | 9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c | 9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c | 11 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c | 10 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c | 11 ++
 gcc/tree-ssa-reassoc.c | 113 ++---
 9 files changed, 170 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c

--
2.31.1
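To make the "fragile (op_rank & PHI_LOOP_BIAS) test" remark in this cover letter concrete, here is a minimal standalone sketch. It is not the GCC implementation: the fixed-size boolean array is only a stand-in for auto_bitmap, `MAX_NAMES` and all function names are hypothetical, and the large rank value is an assumed example of a rank that grew into the bias bit, for instance in a very large basic block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PHI_LOOP_BIAS (1 << 15)
#define MAX_NAMES 64  /* Hypothetical limit, for the sketch only.  */

/* Fragile variant: infer "biased" from the rank value itself.  Any rank
   that happens to have bit 15 set is reported as biased.  */
bool
biased_by_bit_p (int64_t rank)
{
  return (rank & PHI_LOOP_BIAS) != 0;
}

/* Explicit variant: remember biased names in a set indexed by a
   (hypothetical) SSA name version.  */
static bool biased_names[MAX_NAMES];

void
mark_biased (unsigned int version)
{
  assert (version < MAX_NAMES);
  biased_names[version] = true;
}

bool
biased_by_set_p (unsigned int version)
{
  assert (version < MAX_NAMES);
  return biased_names[version];
}
```

With the explicit set, the answer no longer depends on how large an unbiased rank becomes, which is the property the cover letter is after.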
[PATCH] IBM Z: Enable LSan and TSan
Bootstrapped and regtested on s390x-redhat-linux. Ok for master?

libsanitizer/ChangeLog:

	* configure.tgt (s390*-*-linux*): Enable LSan and TSan for s390x.
---
 libsanitizer/configure.tgt | 5 +
 1 file changed, 5 insertions(+)

diff --git a/libsanitizer/configure.tgt b/libsanitizer/configure.tgt
index 0ca5d9fd924..f635e412bdc 100644
--- a/libsanitizer/configure.tgt
+++ b/libsanitizer/configure.tgt
@@ -41,6 +41,11 @@ case "${target}" in
   sparc*-*-linux*)
 	;;
   s390*-*-linux*)
+	if test x$ac_cv_sizeof_void_p = x8; then
+		TSAN_SUPPORTED=yes
+		LSAN_SUPPORTED=yes
+		TSAN_TARGET_DEPENDENT_OBJECTS=tsan_rtl_s390x.lo
+	fi
 	;;
   sparc*-*-solaris2.11*)
 	;;
--
2.31.1
[PATCH] IBM Z: Fix 5 tests in 31-bit mode
Bootstrapped and regtested on s390x-redhat-linux. Ok for master?

gcc/testsuite/ChangeLog:

        * gcc.target/s390/global-array-element-pic2.c: Add -mzarch, add
        an expectation for 31-bit mode.
        * gcc.target/s390/load-imm64-1.c: Use unsigned long long.
        * gcc.target/s390/load-imm64-2.c: Likewise.
        * gcc.target/s390/vector/long-double-vx-macro-off-on.c: Use
        -mzarch.
        * gcc.target/s390/vector/long-double-vx-macro-on-off.c: Likewise.
---
 gcc/testsuite/gcc.target/s390/global-array-element-pic2.c    | 5 +++--
 gcc/testsuite/gcc.target/s390/load-imm64-1.c                 | 4 ++--
 gcc/testsuite/gcc.target/s390/load-imm64-2.c                 | 4 ++--
 .../gcc.target/s390/vector/long-double-vx-macro-off-on.c     | 2 +-
 .../gcc.target/s390/vector/long-double-vx-macro-on-off.c     | 2 +-
 5 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/global-array-element-pic2.c b/gcc/testsuite/gcc.target/s390/global-array-element-pic2.c
index 72b87d40b85..0ee10841cac 100644
--- a/gcc/testsuite/gcc.target/s390/global-array-element-pic2.c
+++ b/gcc/testsuite/gcc.target/s390/global-array-element-pic2.c
@@ -1,6 +1,6 @@
 /* Test accesses to global array elements in PIC code. */
 /* { dg-do compile } */
-/* { dg-options "-O1 -march=z10 -fPIC" } */
+/* { dg-options "-O1 -march=z10 -mzarch -fPIC" } */

 extern char a[] __attribute__ ((aligned (2)));
 extern char *b;
@@ -8,6 +8,7 @@ extern char *b;
 void c()
 {
   b = a + 4;
-  /* { dg-final { scan-assembler "(?n)\n\tlgrl\t%r\\d+,a@GOTENT\n" } } */
+  /* { dg-final { scan-assembler "(?n)\n\tlgrl\t%r\\d+,a@GOTENT\n" { target lp64 } } } */
+  /* { dg-final { scan-assembler "(?n)\n\tlrl\t%r\\d+,a@GOTENT\n" { target { ! lp64 } } } } */
   /* { dg-final { scan-assembler-not "(?n)\n\tlarl\t%r\\d+,a\[^@\]" } } */
 }
diff --git a/gcc/testsuite/gcc.target/s390/load-imm64-1.c b/gcc/testsuite/gcc.target/s390/load-imm64-1.c
index 03d17f59096..8e812f2f01d 100644
--- a/gcc/testsuite/gcc.target/s390/load-imm64-1.c
+++ b/gcc/testsuite/gcc.target/s390/load-imm64-1.c
@@ -4,10 +4,10 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -march=z9-109" } */

-unsigned long
+unsigned long long
 magic (void)
 {
-  return 0x3f08c5392f756cd;
+  return 0x3f08c5392f756cdULL;
 }

 /* { dg-final { scan-assembler-times {\n\tllihf\t} 1 { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target/s390/load-imm64-2.c b/gcc/testsuite/gcc.target/s390/load-imm64-2.c
index ee0ff3b0a91..c3536b4d031 100644
--- a/gcc/testsuite/gcc.target/s390/load-imm64-2.c
+++ b/gcc/testsuite/gcc.target/s390/load-imm64-2.c
@@ -4,10 +4,10 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -march=z10" } */

-unsigned long
+unsigned long long
 magic (void)
 {
-  return 0x3f08c5392f756cd;
+  return 0x3f08c5392f756cdULL;
 }

 /* { dg-final { scan-assembler-times {\n\tllihf\t} 1 { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c b/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c
index 2d67679bb11..513912e669d 100644
--- a/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c
+++ b/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target target_attribute } */
-/* { dg-options "-march=z14" } */
+/* { dg-options "-march=z14 -mzarch" } */
 #if !defined(__LONG_DOUBLE_VX__)
 #error
 #endif
diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c b/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c
index 6f264313408..6b3cb321338 100644
--- a/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c
+++ b/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target target_attribute } */
-/* { dg-options "-march=z13" } */
+/* { dg-options "-march=z13 -mzarch" } */
 #if defined(__LONG_DOUBLE_VX__)
 #error
 #endif
-- 
2.31.1
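A note on why the load-imm64 tests switch to unsigned long long: in 31-bit mode unsigned long is 32 bits wide, so it cannot hold the test constant. The following is only a minimal sketch of that truncation; it uses fixed-width types as a stand-in for the two ABIs, and the helper names are illustrative, not part of the patch.

```c
#include <stdint.h>

/* Constant returned by magic () in load-imm64-1.c and load-imm64-2.c.  */
#define MAGIC 0x3f08c5392f756cdULL

/* Models `unsigned long' in 31-bit mode (assumed 32 bits wide here):
   the upper half of the constant is dropped by the conversion.  */
static uint32_t
magic_as_32bit_long (void)
{
  return (uint32_t) MAGIC;
}

/* Models `unsigned long long', which is 64 bits wide in both modes,
   so the whole constant survives.  */
static uint64_t
magic_as_long_long (void)
{
  return MAGIC;
}
```

With the wider type the function returns the same 64-bit value regardless of whether the test is compiled with -m31 or -m64.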
[PATCH v3] IBM Z: Use @PLT symbols for local functions in 64-bit mode
Bootstrapped and regtested on s390x-redhat-linux. Ok for master? v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573614.html v1 -> v2: Do not use UNSPEC_PLT in 64-bit code and rename it to UNSPEC_PLT31 (Ulrich, Andreas). Do not append @PLT only to weak symbols in non-PIC code (Ulrich). Add TLS tests. v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574646.html v2 -> v3: Use %K in function_profiler() and s390_output_mi_thunk(), add tests for these cases. This helps with generating code for kernel hotpatches, which contain individual functions and are loaded more than 2G away from vmlinux. This should not create performance regressions for the normal use cases, because for local functions ld replaces @PLT calls with direct calls. gcc/ChangeLog: * config/s390/predicates.md (bras_sym_operand): Accept all functions in 64-bit mode, use UNSPEC_PLT31. (larl_operand): Use UNSPEC_PLT31. * config/s390/s390.c (s390_loadrelative_operand_p): Likewise. (legitimize_pic_address): Likewise. (s390_emit_tls_call_insn): Mark __tls_get_offset as function, use UNSPEC_PLT31. (s390_delegitimize_address): Use UNSPEC_PLT31. (s390_output_addr_const_extra): Likewise. (print_operand): Add @PLT to TLS calls, handle %K. (s390_function_profiler): Mark __fentry__/_mcount as function, use %K, use UNSPEC_PLT31. (s390_output_mi_thunk): Use only UNSPEC_GOT, use %K. (s390_emit_call): Use UNSPEC_PLT31. (s390_emit_tpf_eh_return): Mark __tpf_eh_return as function. * config/s390/s390.md (UNSPEC_PLT31): Rename from UNSPEC_PLT. (*movdi_64): Use %K. (reload_base_64): Likewise. (*sibcall_brc): Likewise. (*sibcall_brcl): Likewise. (*sibcall_value_brc): Likewise. (*sibcall_value_brcl): Likewise. (*bras): Likewise. (*brasl): Likewise. (*bras_r): Likewise. (*brasl_r): Likewise. (*bras_tls): Likewise. (*brasl_tls): Likewise. (main_base_64): Likewise. (reload_base_64): Likewise. (@split_stack_call): Likewise. gcc/testsuite/ChangeLog: * g++.dg/ext/visibility/noPLT.C: Skip on s390x. 
* g++.target/s390/mi-thunk.C: New test. * gcc.target/s390/nodatarel-1.c: Move foostatic to the new tests. * gcc.target/s390/pr80080-4.c: Allow @PLT suffix. * gcc.target/s390/risbg-ll-3.c: Likewise. * gcc.target/s390/call.h: Common code for the new tests. * gcc.target/s390/call-z10-pic-nodatarel.c: New test. * gcc.target/s390/call-z10-pic.c: New test. * gcc.target/s390/call-z10.c: New test. * gcc.target/s390/call-z9-pic-nodatarel.c: New test. * gcc.target/s390/call-z9-pic.c: New test. * gcc.target/s390/call-z9.c: New test. * gcc.target/s390/mfentry-m64-pic.c: New test. * gcc.target/s390/tls.h: Common code for the new TLS tests. * gcc.target/s390/tls-pic.c: New test. * gcc.target/s390/tls.c: New test. --- gcc/config/s390/predicates.md | 9 ++- gcc/config/s390/s390.c| 81 +-- gcc/config/s390/s390.md | 32 gcc/testsuite/g++.dg/ext/visibility/noPLT.C | 2 +- gcc/testsuite/g++.target/s390/mi-thunk.C | 23 ++ .../gcc.target/s390/call-z10-pic-nodatarel.c | 20 + gcc/testsuite/gcc.target/s390/call-z10-pic.c | 20 + gcc/testsuite/gcc.target/s390/call-z10.c | 20 + .../gcc.target/s390/call-z9-pic-nodatarel.c | 18 + gcc/testsuite/gcc.target/s390/call-z9-pic.c | 18 + gcc/testsuite/gcc.target/s390/call-z9.c | 20 + gcc/testsuite/gcc.target/s390/call.h | 40 + .../gcc.target/s390/mfentry-m64-pic.c | 9 +++ gcc/testsuite/gcc.target/s390/nodatarel-1.c | 26 +- gcc/testsuite/gcc.target/s390/pr80080-4.c | 2 +- gcc/testsuite/gcc.target/s390/risbg-ll-3.c| 6 +- gcc/testsuite/gcc.target/s390/tls-pic.c | 14 gcc/testsuite/gcc.target/s390/tls.c | 10 +++ gcc/testsuite/gcc.target/s390/tls.h | 23 ++ 19 files changed, 320 insertions(+), 73 deletions(-) create mode 100644 gcc/testsuite/g++.target/s390/mi-thunk.C create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic-nodatarel.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z10.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic-nodatarel.c create mode 100644 
gcc/testsuite/gcc.target/s390/call-z9-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/call-z9.c create mode 100644 gcc/testsuite/gcc.target/s390/call.h create mode 100644 gcc/testsuite/gcc.target/s390/mfentry-m64-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/tls-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/tls.c create mode 1006
Re: [PATCH v2] IBM Z: Use @PLT symbols for local functions in 64-bit mode
On Wed, 2021-07-07 at 21:03 +0200, Ilya Leoshkevich wrote:
> Bootstrapped and regtested on s390x-redhat-linux. Ok for master?
>
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573614.html
> v1 -> v2: Do not use UNSPEC_PLT in 64-bit code and rename it to
>           UNSPEC_PLT31 (Ulrich, Andreas). Do not append @PLT only to
>           weak symbols in non-PIC code (Ulrich). Add TLS tests.
>
> This helps with generating code for kernel hotpatches, which contain
> individual functions and are loaded more than 2G away from vmlinux.
> This should not create performance regressions for the normal use
> cases, because for local functions ld replaces @PLT calls with direct
> calls.

Please disregard this patch, I just realized I missed two
output_asm_insn () calls in s390.c: one in function_profiler () and one
in s390_output_mi_thunk (). I'll send a v3.
[PATCH v2] IBM Z: Use @PLT symbols for local functions in 64-bit mode
Bootstrapped and regtested on s390x-redhat-linux. Ok for master? v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573614.html v1 -> v2: Do not use UNSPEC_PLT in 64-bit code and rename it to UNSPEC_PLT31 (Ulrich, Andreas). Do not append @PLT only to weak symbols in non-PIC code (Ulrich). Add TLS tests. This helps with generating code for kernel hotpatches, which contain individual functions and are loaded more than 2G away from vmlinux. This should not create performance regressions for the normal use cases, because for local functions ld replaces @PLT calls with direct calls. gcc/ChangeLog: * config/s390/predicates.md (bras_sym_operand): Accept all functions in 64-bit mode, use UNSPEC_PLT31. (larl_operand): Use UNSPEC_PLT31. * config/s390/s390.c (s390_loadrelative_operand_p): Likewise. (legitimize_pic_address): Likewise. (s390_emit_tls_call_insn): Mark __tls_get_offset as function, use UNSPEC_PLT31. (s390_delegitimize_address): Use UNSPEC_PLT31. (s390_output_addr_const_extra): Likewise. (print_operand): Add @PLT to TLS calls, handle %K. (s390_function_profiler): Mark __fentry__/_mcount as function, use UNSPEC_PLT31. (s390_output_mi_thunk): Use only UNSPEC_GOT. (s390_emit_call): Use UNSPEC_PLT31. (s390_emit_tpf_eh_return): Mark __tpf_eh_return as function. * config/s390/s390.md (UNSPEC_PLT31): Rename from UNSPEC_PLT. (*movdi_64): Use %K. (reload_base_64): Likewise. (*sibcall_brc): Likewise. (*sibcall_brcl): Likewise. (*sibcall_value_brc): Likewise. (*sibcall_value_brcl): Likewise. (*bras): Likewise. (*brasl): Likewise. (*bras_r): Likewise. (*brasl_r): Likewise. (*bras_tls): Likewise. (*brasl_tls): Likewise. (main_base_64): Likewise. (reload_base_64): Likewise. (@split_stack_call): Likewise. gcc/testsuite/ChangeLog: * g++.dg/ext/visibility/noPLT.C: Skip on s390x. * gcc.target/s390/nodatarel-1.c: Move foostatic to the new tests. * gcc.target/s390/pr80080-4.c: Allow @PLT suffix. * gcc.target/s390/risbg-ll-3.c: Likewise. 
* gcc.target/s390/call.h: Common code for the new tests. * gcc.target/s390/call31-z10-pic-nodatarel.c: New test. * gcc.target/s390/call31-z10-pic.c: New test. * gcc.target/s390/call31-z10.c: New test. * gcc.target/s390/call31-z9-pic-nodatarel.c: New test. * gcc.target/s390/call31-z9-pic.c: New test. * gcc.target/s390/call31-z9.c: New test. * gcc.target/s390/call64-z10-pic-nodatarel.c: New test. * gcc.target/s390/call64-z10-pic.c: New test. * gcc.target/s390/call64-z10.c: New test. * gcc.target/s390/call64-z9-pic-nodatarel.c: New test. * gcc.target/s390/call64-z9-pic.c: New test. * gcc.target/s390/call64-z9.c: New test. * gcc.target/s390/tls.h: Common code for the new TLS tests. * gcc.target/s390/tls31-pic.c: New test. * gcc.target/s390/tls31.c: New test. * gcc.target/s390/tls64-pic.c: New test. * gcc.target/s390/tls64.c: New test. --- gcc/config/s390/predicates.md | 9 ++- gcc/config/s390/s390.c| 73 ++- gcc/config/s390/s390.md | 32 gcc/testsuite/g++.dg/ext/visibility/noPLT.C | 2 +- gcc/testsuite/gcc.target/s390/call.h | 40 ++ .../s390/call31-z10-pic-nodatarel.c | 16 .../gcc.target/s390/call31-z10-pic.c | 16 gcc/testsuite/gcc.target/s390/call31-z10.c| 15 .../gcc.target/s390/call31-z9-pic-nodatarel.c | 16 gcc/testsuite/gcc.target/s390/call31-z9-pic.c | 16 gcc/testsuite/gcc.target/s390/call31-z9.c | 15 .../s390/call64-z10-pic-nodatarel.c | 17 + .../gcc.target/s390/call64-z10-pic.c | 17 + gcc/testsuite/gcc.target/s390/call64-z10.c| 15 .../gcc.target/s390/call64-z9-pic-nodatarel.c | 17 + gcc/testsuite/gcc.target/s390/call64-z9-pic.c | 17 + gcc/testsuite/gcc.target/s390/call64-z9.c | 15 gcc/testsuite/gcc.target/s390/nodatarel-1.c | 26 +-- gcc/testsuite/gcc.target/s390/pr80080-4.c | 2 +- gcc/testsuite/gcc.target/s390/risbg-ll-3.c| 6 +- gcc/testsuite/gcc.target/s390/tls.h | 23 ++ gcc/testsuite/gcc.target/s390/tls31-pic.c | 14 gcc/testsuite/gcc.target/s390/tls31.c | 9 +++ gcc/testsuite/gcc.target/s390/tls64-pic.c | 14 gcc/testsuite/gcc.target/s390/tls64.c | 9 +++ 25 files 
changed, 382 insertions(+), 69 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/call.h create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10-pic-nodatarel.c create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10-pic.c create mode 100644 gcc/tes
[PATCH] IBM Z: Use @PLT symbols for local functions in 64-bit mode
Bootstrapped and regtested on s390x-redhat-linux. Ok for master? This helps with generating the code for kernel hotpatches, which contain individual functions and are loaded more than 2G away from vmlinux. This should not create performance regressions for the normal use cases, because for local functions ld replaces @PLT calls with direct calls. gcc/ChangeLog: * config/s390/s390.c (print_operand): Handle %K. * config/s390/s390.md (*movdi_64): Use %K for larl. (reload_base_64): Likewise. (*sibcall_brc): Use %K for j. (*sibcall_brcl): Use %K for jg. (*sibcall_value_brc): Use %K for j. (*sibcall_value_brcl): Use %K for jg. (*bras): Use %K. (*brasl): Likewise. (*bras_r): Likewise. (*brasl_r): Likewise. (main_base_64): Use %K for larl. (reload_base_64): Likewise. (@split_stack_call): Use %K for jg. gcc/testsuite/ChangeLog: * g++.dg/ext/visibility/noPLT.C: Skip on s390x. * gcc.target/s390/nodatarel-1.c: Move foostatic to the new tests. * gcc.target/s390/pr80080-4.c: Allow @PLT suffix. * gcc.target/s390/risbg-ll-3.c: Likewise. * gcc.target/s390/call.h: Common code for the new tests. * gcc.target/s390/call31-z10-pic-nodatarel.c: New test. * gcc.target/s390/call31-z10-pic.c: New test. * gcc.target/s390/call31-z10.c: New test. * gcc.target/s390/call31-z9-pic-nodatarel.c: New test. * gcc.target/s390/call31-z9-pic.c: New test. * gcc.target/s390/call31-z9.c: New test. * gcc.target/s390/call64-z10-pic-nodatarel.c: New test. * gcc.target/s390/call64-z10-pic.c: New test. * gcc.target/s390/call64-z10.c: New test. * gcc.target/s390/call64-z9-pic-nodatarel.c: New test. * gcc.target/s390/call64-z9-pic.c: New test. * gcc.target/s390/call64-z9.c: New test. 
--- gcc/config/s390/s390.c| 9 + gcc/config/s390/s390.md | 26 ++--- gcc/testsuite/g++.dg/ext/visibility/noPLT.C | 2 +- gcc/testsuite/gcc.target/s390/call.h | 38 +++ .../s390/call31-z10-pic-nodatarel.c | 16 .../gcc.target/s390/call31-z10-pic.c | 16 gcc/testsuite/gcc.target/s390/call31-z10.c| 15 .../gcc.target/s390/call31-z9-pic-nodatarel.c | 16 gcc/testsuite/gcc.target/s390/call31-z9-pic.c | 16 gcc/testsuite/gcc.target/s390/call31-z9.c | 15 .../s390/call64-z10-pic-nodatarel.c | 17 + .../gcc.target/s390/call64-z10-pic.c | 17 + gcc/testsuite/gcc.target/s390/call64-z10.c| 15 .../gcc.target/s390/call64-z9-pic-nodatarel.c | 17 + gcc/testsuite/gcc.target/s390/call64-z9-pic.c | 17 + gcc/testsuite/gcc.target/s390/call64-z9.c | 15 gcc/testsuite/gcc.target/s390/nodatarel-1.c | 26 + gcc/testsuite/gcc.target/s390/pr80080-4.c | 2 +- gcc/testsuite/gcc.target/s390/risbg-ll-3.c| 6 +-- 19 files changed, 258 insertions(+), 43 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/call.h create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10-pic-nodatarel.c create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10.c create mode 100644 gcc/testsuite/gcc.target/s390/call31-z9-pic-nodatarel.c create mode 100644 gcc/testsuite/gcc.target/s390/call31-z9-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/call31-z9.c create mode 100644 gcc/testsuite/gcc.target/s390/call64-z10-pic-nodatarel.c create mode 100644 gcc/testsuite/gcc.target/s390/call64-z10-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/call64-z10.c create mode 100644 gcc/testsuite/gcc.target/s390/call64-z9-pic-nodatarel.c create mode 100644 gcc/testsuite/gcc.target/s390/call64-z9-pic.c create mode 100644 gcc/testsuite/gcc.target/s390/call64-z9.c diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 6bbeb640e1f..e7839044a40 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -7943,6 +7943,7 @@ print_operand_address 
(FILE *file, rtx addr) 'E': print opcode suffix for branch on index instruction. 'G': print the size of the operand in bytes. 'J': print tls_load/tls_gdcall/tls_ldcall suffix +'K': print @PLT suffix for call targets and load address values. 'M': print the second word of a TImode operand. 'N': print the second word of a DImode operand. 'O': print only the displacement of a memory reference or address. @@ -8129,6 +8130,14 @@ print_operand (FILE *file, rtx x, int code) case 'Y': print_shift_count_operand (file, x); return; + +case 'K': + if (TARGET_64BIT + && flag_pic + && GET_CODE (x) == SYMBOL_REF + && SYMBOL_REF_FUNCTION_P (x)) + fprintf (file, "@PLT"); + return
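For readers skimming the thread, the condition in the v1 'K' case above can be restated outside of GCC. This is a minimal sketch under stated assumptions: struct call_target and k_suffix () are hypothetical stand-ins for the rtx operand and the print_operand () case, and they model only the v1 hunk shown here, not the later revisions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the operand properties that the v1
   print_operand () hunk inspects.  */
struct call_target
{
  bool is_symbol_ref; /* models GET_CODE (x) == SYMBOL_REF */
  bool is_function;   /* models SYMBOL_REF_FUNCTION_P (x) */
};

/* Return the suffix that the v1 %K case would print: "@PLT" only for
   function symbols in 64-bit PIC code, otherwise nothing.  */
static const char *
k_suffix (bool target_64bit, bool pic, struct call_target x)
{
  if (target_64bit && pic && x.is_symbol_ref && x.is_function)
    return "@PLT";
  return "";
}
```

In this model a data symbol, a non-PIC compilation, or a 31-bit target all yield an empty suffix, so the existing assembler output for those cases is unchanged.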
[PATCH v2] IBM Z: Define NO_PROFILE_COUNTERS
Bootstrapped and regtested on s390x-redhat-linux. Ok for master? v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573348.html v1 -> v2: Use ATTRIBUTE_UNUSED, compact op[] array (Andreas). I've also noticed that one of the nops that we generate for -mnop-mcount is not needed now and removed it. A couple tests needed to be adjusted after that. s390 glibc does not need counters in the .data section, since it stores edge hits in its own data structure. Therefore counters only waste space and confuse diffing tools (e.g. kpatch), so don't generate them. gcc/ChangeLog: * config/s390/s390.c (s390_function_profiler): Ignore labelno parameter. * config/s390/s390.h (NO_PROFILE_COUNTERS): Define. gcc/testsuite/ChangeLog: * gcc.target/s390/mnop-mcount-m31-mzarch.c: Adapt to the new prologue size. * gcc.target/s390/mnop-mcount-m64.c: Likewise. --- gcc/config/s390/s390.c| 42 +++ gcc/config/s390/s390.h| 2 + .../gcc.target/s390/mnop-mcount-m31-mzarch.c | 2 +- .../gcc.target/s390/mnop-mcount-m64.c | 2 +- 4 files changed, 20 insertions(+), 28 deletions(-) diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 6bbeb640e1f..590dd8f35bc 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -13110,33 +13110,25 @@ output_asm_nops (const char *user, int hw) } } -/* Output assembler code to FILE to increment profiler label # LABELNO - for profiling a function entry. */ +/* Output assembler code to FILE to call a profiler hook. 
*/ void -s390_function_profiler (FILE *file, int labelno) +s390_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED) { - rtx op[8]; - - char label[128]; - ASM_GENERATE_INTERNAL_LABEL (label, "LP", labelno); + rtx op[4]; fprintf (file, "# function profiler \n"); op[0] = gen_rtx_REG (Pmode, RETURN_REGNUM); op[1] = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM); op[1] = gen_rtx_MEM (Pmode, plus_constant (Pmode, op[1], UNITS_PER_LONG)); - op[7] = GEN_INT (UNITS_PER_LONG); - - op[2] = gen_rtx_REG (Pmode, 1); - op[3] = gen_rtx_SYMBOL_REF (Pmode, label); - SYMBOL_REF_FLAGS (op[3]) = SYMBOL_FLAG_LOCAL; + op[3] = GEN_INT (UNITS_PER_LONG); - op[4] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? "__fentry__" : "_mcount"); + op[2] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? "__fentry__" : "_mcount"); if (flag_pic) { - op[4] = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op[4]), UNSPEC_PLT); - op[4] = gen_rtx_CONST (Pmode, op[4]); + op[2] = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op[2]), UNSPEC_PLT); + op[2] = gen_rtx_CONST (Pmode, op[2]); } if (flag_record_mcount) @@ -13150,20 +13142,19 @@ s390_function_profiler (FILE *file, int labelno) warning (OPT_Wcannot_profile, "nested functions cannot be profiled " "with %<-mfentry%> on s390"); else - output_asm_insn ("brasl\t0,%4", op); + output_asm_insn ("brasl\t0,%2", op); } else if (TARGET_64BIT) { if (flag_nop_mcount) - output_asm_nops ("-mnop-mcount", /* stg */ 3 + /* larl */ 3 + -/* brasl */ 3 + /* lg */ 3); + output_asm_nops ("-mnop-mcount", /* stg */ 3 + /* brasl */ 3 + +/* lg */ 3); else { output_asm_insn ("stg\t%0,%1", op); if (flag_dwarf2_cfi_asm) - output_asm_insn (".cfi_rel_offset\t%0,%7", op); - output_asm_insn ("larl\t%2,%3", op); - output_asm_insn ("brasl\t%0,%4", op); + output_asm_insn (".cfi_rel_offset\t%0,%3", op); + output_asm_insn ("brasl\t%0,%2", op); output_asm_insn ("lg\t%0,%1", op); if (flag_dwarf2_cfi_asm) output_asm_insn (".cfi_restore\t%0", op); @@ -13172,15 +13163,14 @@ s390_function_profiler (FILE *file, int labelno) 
else { if (flag_nop_mcount) - output_asm_nops ("-mnop-mcount", /* st */ 2 + /* larl */ 3 + -/* brasl */ 3 + /* l */ 2); + output_asm_nops ("-mnop-mcount", /* st */ 2 + /* brasl */ 3 + +/* l */ 2); else { output_asm_insn ("st\t%0,%1", op); if (flag_dwarf2_cfi_asm) - output_asm_insn (".cfi_rel_offset\t%0,%7", op); - output_asm_insn ("larl\t%2,%3", op); - output_asm_insn ("brasl\t%0,%4", op); + output_asm_insn (".cfi_rel_offset\t%0,%3", op); + output_asm_insn ("brasl\t%0,%2", op); output_asm_insn ("l\t%0,%1", op); if (flag_dwarf2_cfi_asm) output_asm_insn (".cfi_restore\t%0", op); diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h index 3b876160420..fb16a455a03 100644 --- a/gcc/config/s390/s390.h +++ b/gcc/config/s390/s390.h @@ -787,6 +787,8 @@ CUMULATIVE_ARGS; #define PROFILE_BEFORE_PROLOGUE 1 +#define NO_PROFILE_COUNTERS 1 + /* Trampolines
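The test adjustments for -mnop-mcount follow from the arithmetic in the output_asm_nops () calls: dropping larl shrinks the nop area by one 6-byte instruction. A small sketch of that arithmetic, assuming the hw argument counts 2-byte halfwords as the inline comments in the patch suggest; the enum and function names are made up for illustration.

```c
/* Instruction lengths in halfwords, taken from the comments in the
   output_asm_nops () calls of the patch.  */
enum
{
  HW_STG = 3,
  HW_LG = 3,
  HW_ST = 2,
  HW_L = 2,
  HW_LARL = 3,
  HW_BRASL = 3
};

/* Halfwords of nops for -mnop-mcount in 64-bit mode, with or without
   the larl that the patch removes.  */
static int
nop_hw_64 (int with_larl)
{
  return HW_STG + (with_larl ? HW_LARL : 0) + HW_BRASL + HW_LG;
}

/* The same for 31-bit mode, which uses st/l instead of stg/lg.  */
static int
nop_hw_31 (int with_larl)
{
  return HW_ST + (with_larl ? HW_LARL : 0) + HW_BRASL + HW_L;
}

/* Convert halfwords to bytes (assumption: one halfword is 2 bytes).  */
static int
hw_to_bytes (int hw)
{
  return hw * 2;
}
```

Under these assumptions the 64-bit nop area goes from 12 to 9 halfwords and the 31-bit one from 10 to 7, which is what the adjusted tests would need to expect.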
[PATCH] IBM Z: Define NO_PROFILE_COUNTERS
Bootstrapped and regtested on s390x-redhat-linux. Ok for master? s390 glibc does not need counters in the .data section, since it stores edge hits in its own data structure. Therefore counters only waste space and confuse diffing tools (e.g. kpatch), so don't generate them. gcc/ChangeLog: * config/s390/s390.c (s390_function_profiler): Ignore labelno parameter. * config/s390/s390.h (NO_PROFILE_COUNTERS): Define. --- gcc/config/s390/s390.c | 14 ++ gcc/config/s390/s390.h | 2 ++ 2 files changed, 4 insertions(+), 12 deletions(-) diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 6bbeb640e1f..96c9a9db53b 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -13110,17 +13110,13 @@ output_asm_nops (const char *user, int hw) } } -/* Output assembler code to FILE to increment profiler label # LABELNO - for profiling a function entry. */ +/* Output assembler code to FILE to call a profiler hook. */ void -s390_function_profiler (FILE *file, int labelno) +s390_function_profiler (FILE *file, int /* labelno */) { rtx op[8]; - char label[128]; - ASM_GENERATE_INTERNAL_LABEL (label, "LP", labelno); - fprintf (file, "# function profiler \n"); op[0] = gen_rtx_REG (Pmode, RETURN_REGNUM); @@ -13128,10 +13124,6 @@ s390_function_profiler (FILE *file, int labelno) op[1] = gen_rtx_MEM (Pmode, plus_constant (Pmode, op[1], UNITS_PER_LONG)); op[7] = GEN_INT (UNITS_PER_LONG); - op[2] = gen_rtx_REG (Pmode, 1); - op[3] = gen_rtx_SYMBOL_REF (Pmode, label); - SYMBOL_REF_FLAGS (op[3]) = SYMBOL_FLAG_LOCAL; - op[4] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? 
"__fentry__" : "_mcount"); if (flag_pic) { @@ -13162,7 +13154,6 @@ s390_function_profiler (FILE *file, int labelno) output_asm_insn ("stg\t%0,%1", op); if (flag_dwarf2_cfi_asm) output_asm_insn (".cfi_rel_offset\t%0,%7", op); - output_asm_insn ("larl\t%2,%3", op); output_asm_insn ("brasl\t%0,%4", op); output_asm_insn ("lg\t%0,%1", op); if (flag_dwarf2_cfi_asm) @@ -13179,7 +13170,6 @@ s390_function_profiler (FILE *file, int labelno) output_asm_insn ("st\t%0,%1", op); if (flag_dwarf2_cfi_asm) output_asm_insn (".cfi_rel_offset\t%0,%7", op); - output_asm_insn ("larl\t%2,%3", op); output_asm_insn ("brasl\t%0,%4", op); output_asm_insn ("l\t%0,%1", op); if (flag_dwarf2_cfi_asm) diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h index 3b876160420..fb16a455a03 100644 --- a/gcc/config/s390/s390.h +++ b/gcc/config/s390/s390.h @@ -787,6 +787,8 @@ CUMULATIVE_ARGS; #define PROFILE_BEFORE_PROLOGUE 1 +#define NO_PROFILE_COUNTERS 1 + /* Trampolines for nested functions. */ -- 2.31.1
[PATCH] IBM Z: Remove match_scratch workaround
Bootstrapped and regtested on s390x-redhat-linux. Ok for master?

Since commit dd1ef00c45ba ("Fix bug in the define_subst handling that
made match_scratch unusable for multi-alternative patterns.") the
workaround for that bug in *ashrdi3_31 is not only no longer necessary,
but actually breaks the build. Get rid of it by using only one
alternative in (match_scratch). It will be replicated as many times as
needed in order to match the pattern with which (define_subst) is used.

gcc/ChangeLog:

        * config/s390/s390.md (*ashrdi3_31): Use a single constraint.
        * config/s390/subst.md (cconly_subst): Use a single constraint
        in (match_scratch).

gcc/testsuite/ChangeLog:

        * gcc.target/s390/ashr.c: New test.
---
 gcc/config/s390/s390.md              | 14 --
 gcc/config/s390/subst.md             | 2 +-
 gcc/testsuite/gcc.target/s390/ashr.c | 11 +++
 3 files changed, 16 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/ashr.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 7faf775fbf2..0c5b4dc9029 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -9328,19 +9328,13 @@
   ""
   "")

-; FIXME: The number of alternatives is doubled here to match the fix
-; number of 2 in the subst pattern for the (clobber (match_scratch...
-; The right fix should be to support match_scratch in the output
-; pattern of a define_subst.
 (define_insn "*ashrdi3_31"
-  [(set (match_operand:DI 0 "register_operand" "=d, d")
-        (ashiftrt:DI (match_operand:DI 1 "register_operand" "0, 0")
-                     (match_operand:QI 2 "shift_count_operand" "jsc,jsc")))
+  [(set (match_operand:DI 0 "register_operand" "=d")
+        (ashiftrt:DI (match_operand:DI 1 "register_operand" "0")
+                     (match_operand:QI 2 "shift_count_operand" "jsc")))
    (clobber (reg:CC CC_REGNUM))]
   "!TARGET_ZARCH"
-  "@
-   srda\t%0,%Y2
-   srda\t%0,%Y2"
+  "srda\t%0,%Y2"
   [(set_attr "op_type" "RS")
    (set_attr "atype" "reg")])

diff --git a/gcc/config/s390/subst.md b/gcc/config/s390/subst.md
index 384af11c198..3ea6fc40ba8 100644
--- a/gcc/config/s390/subst.md
+++ b/gcc/config/s390/subst.md
@@ -45,7 +45,7 @@
   "s390_match_ccmode(insn, CCSmode)"
   [(set (reg CC_REGNUM)
         (compare (match_dup 1) (const_int 0)))
-   (clobber (match_scratch:DSI 0 "=d,d"))])
+   (clobber (match_scratch:DSI 0 "=d"))])

 (define_subst_attr "cconly" "cconly_subst" "" "_cconly")

diff --git a/gcc/testsuite/gcc.target/s390/ashr.c b/gcc/testsuite/gcc.target/s390/ashr.c
new file mode 100644
index 000..8cffdfa9a1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/ashr.c
@@ -0,0 +1,11 @@
+/* Test the arithmetic shift right pattern. */
+
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int e(void);
+
+int f (long c, int b)
+{
+  return (c >> b) && e ();
+}
-- 
2.31.1
Re: [PATCH v2] IBM Z: Handle hard registers in s390_md_asm_adjust()
On Fri, 2021-04-30 at 08:49 +0200, Andreas Krebbel wrote:
> On 4/28/21 3:48 AM, Ilya Leoshkevich wrote:
> > Bootstrapped and regtested on s390x-redhat-linux. Tested with valgrind
> > too (PR 100278 is now fixed). Ok for master?
> >
> > v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568771.html
> > v1 -> v2: Use the UNSPEC pattern, which is less efficient, but is more
> >           on the "obviously correct" side than gen_raw_SUBREG().
> >
> > gen_fprx2_to_tf() and gen_tf_to_fprx2() cannot handle hard registers,
> > since the subregs they create do not pass validation. Change
> > s390_md_asm_adjust() to manually copy between hard VRs and FPRs instead
> > of using these two functions.
> >
> > gcc/ChangeLog:
> >
> >         PR target/100217
> >         * config/s390/s390.c (s390_hard_fp_reg_p): New function.
> >         (s390_md_asm_adjust): Handle hard registers.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         PR target/100217
> >         * gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c: New test.
> >         * gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c: New test.
>
> Ok. Thanks!
>
> Andreas

Thanks! I forgot to ask: ok for gcc-11 branch?
[PATCH v2] IBM Z: Handle hard registers in s390_md_asm_adjust()
Bootstrapped and regtested on s390x-redhat-linux. Tested with valgrind too (PR 100278 is now fixed). Ok for master? v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568771.html v1 -> v2: Use the UNSPEC pattern, which is less efficient, but is more on the "obviously correct" side than gen_raw_SUBREG(). gen_fprx2_to_tf() and gen_tf_to_fprx2() cannot handle hard registers, since the subregs they create do not pass validation. Change s390_md_asm_adjust() to manually copy between hard VRs and FPRs instead of using these two functions. gcc/ChangeLog: PR target/100217 * config/s390/s390.c (s390_hard_fp_reg_p): New function. (s390_md_asm_adjust): Handle hard registers. gcc/testsuite/ChangeLog: PR target/100217 * gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c: New test. * gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c: New test. --- gcc/config/s390/s390.c| 52 +-- .../long-double-asm-in-out-hard-fp-reg.c | 33 .../long-double-asm-inout-hard-fp-reg.c | 31 +++ 3 files changed, 112 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index a9c945c5ee9..88361f98c7e 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -16754,6 +16754,23 @@ f_constraint_p (const char *constraint) return seen_f_p && !seen_v_p; } +/* Return TRUE iff X is a hard floating-point (and not a vector) register. 
*/ + +static bool +s390_hard_fp_reg_p (rtx x) +{ + if (!(REG_P (x) && HARD_REGISTER_P (x) && REG_ATTRS (x))) +return false; + + tree decl = REG_EXPR (x); + if (!(HAS_DECL_ASSEMBLER_NAME_P (decl) && DECL_ASSEMBLER_NAME_SET_P (decl))) +return false; + + const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)); + + return name[0] == '*' && name[1] == 'f'; +} + /* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f" constraints when long doubles are stored in vector registers. */ @@ -16787,9 +16804,24 @@ s390_md_asm_adjust (vec &outputs, vec &inputs, gcc_assert (allows_reg); gcc_assert (!is_inout); /* Copy output value from a FPR pair into a vector register. */ - rtx fprx2 = gen_reg_rtx (FPRX2mode); + rtx fprx2; push_to_sequence2 (after_md_seq, after_md_end); - emit_insn (gen_fprx2_to_tf (outputs[i], fprx2)); + if (s390_hard_fp_reg_p (outputs[i])) + { + fprx2 = gen_rtx_REG (FPRX2mode, REGNO (outputs[i])); + /* The first half is already at the correct location, copy only the + * second one. Use the UNSPEC pattern instead of the SUBREG one, + * since s390_can_change_mode_class() rejects + * (subreg:DF (reg:TF %fN) 8) and thus subreg validation fails. */ + rtx v1 = gen_rtx_REG (V2DFmode, REGNO (outputs[i])); + rtx v3 = gen_rtx_REG (V2DFmode, REGNO (outputs[i]) + 1); + emit_insn (gen_vec_permiv2df (v1, v1, v3, const0_rtx)); + } + else + { + fprx2 = gen_reg_rtx (FPRX2mode); + emit_insn (gen_fprx2_to_tf (outputs[i], fprx2)); + } after_md_seq = get_insns (); after_md_end = get_last_insn (); end_sequence (); @@ -16813,8 +16845,20 @@ s390_md_asm_adjust (vec &outputs, vec &inputs, continue; gcc_assert (allows_reg); /* Copy input value from a vector register into a FPR pair. */ - rtx fprx2 = gen_reg_rtx (FPRX2mode); - emit_insn (gen_tf_to_fprx2 (fprx2, inputs[i])); + rtx fprx2; + if (s390_hard_fp_reg_p (inputs[i])) + { + fprx2 = gen_rtx_REG (FPRX2mode, REGNO (inputs[i])); + /* Copy only the second half. 
*/ + rtx v1 = gen_rtx_REG (V2DFmode, REGNO (inputs[i]) + 1); + rtx v2 = gen_rtx_REG (V2DFmode, REGNO (inputs[i])); + emit_insn (gen_vec_permiv2df (v1, v2, v1, GEN_INT (3))); + } + else + { + fprx2 = gen_reg_rtx (FPRX2mode); + emit_insn (gen_tf_to_fprx2 (fprx2, inputs[i])); + } inputs[i] = fprx2; input_modes[i] = FPRX2mode; } diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c b/gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c new file mode 100644 index 000..2dcaf08f00b --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c @@ -0,0 +1,33 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -march=z14 -mzarch --save-temps" } */ +/* { dg-do run { target { s390_z14_hw } } } */ +#include +#include + +__attribute__ ((noipa)) static long double +sqxbr (long double x) +{ + register long double in asm("f0") = x; + register long double out asm("f1"); + + asm("sqxbr\t%0,%1" :
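The last step of s390_hard_fp_reg_p () in the patch is a plain string test on the assembler name. A standalone sketch of just that test follows; asm_name_is_fpr () is a hypothetical name, the NULL check is an addition for the standalone version, and the "*f0" style input assumes that GCC stores user-specified register names with a leading '*', as the patch's own check implies.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if NAME looks like a user-specified floating-point
   register name as tested by the patch: a '*' marker followed by 'f'.
   Reading name[1] is safe whenever name[0] is '*', because the string
   then has at least a terminating NUL at index 1.  */
static bool
asm_name_is_fpr (const char *name)
{
  return name != NULL && name[0] == '*' && name[1] == 'f';
}
```

A name such as "*v16" fails the test, which matches the function comment: floating-point, and not vector, hard registers.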
[PATCH] IBM Z: Handle hard registers in s390_md_asm_adjust()
Bootstrapped and regtested on s390x-redhat-linux. Tested with valgrind on top of 52a5515ed (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100278). Ok for master? gen_fprx2_to_tf() and gen_tf_to_fprx2() cannot handle hard registers, since the subregs they create do not pass validation. Change s390_md_asm_adjust() to manually copy between hard VRs and FPRs instead of using these two functions. gcc/ChangeLog: PR target/100217 * config/s390/s390.c (s390_hard_fp_reg_p): New function. (s390_md_asm_adjust): Handle hard registers. * config/s390/vector.md (*df_to_tf_1): New pattern. gcc/testsuite/ChangeLog: PR target/100217 * gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c: New test. * gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c: New test. --- gcc/config/s390/s390.c| 50 +-- gcc/config/s390/vector.md | 8 +++ .../long-double-asm-in-out-hard-fp-reg.c | 28 +++ .../long-double-asm-inout-hard-fp-reg.c | 27 ++ 4 files changed, 109 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index a9c945c5ee9..ed6cea9b1f7 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -16754,6 +16754,23 @@ f_constraint_p (const char *constraint) return seen_f_p && !seen_v_p; } +/* Return TRUE iff X is a hard floating-point (and not a vector) register. */ + +static bool +s390_hard_fp_reg_p (rtx x) +{ + if (!(REG_P (x) && HARD_REGISTER_P (x) && REG_ATTRS (x))) +return false; + + tree decl = REG_EXPR (x); + if (!(HAS_DECL_ASSEMBLER_NAME_P (decl) && DECL_ASSEMBLER_NAME_SET_P (decl))) +return false; + + const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)); + + return name[0] == '*' && name[1] == 'f'; +} + /* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f" constraints when long doubles are stored in vector registers. 
*/ @@ -16787,9 +16804,23 @@ s390_md_asm_adjust (vec &outputs, vec &inputs, gcc_assert (allows_reg); gcc_assert (!is_inout); /* Copy output value from a FPR pair into a vector register. */ - rtx fprx2 = gen_reg_rtx (FPRX2mode); + rtx fprx2; push_to_sequence2 (after_md_seq, after_md_end); - emit_insn (gen_fprx2_to_tf (outputs[i], fprx2)); + if (s390_hard_fp_reg_p (outputs[i])) + { + fprx2 = gen_rtx_REG (FPRX2mode, REGNO (outputs[i])); + /* The first half is already at the correct location, copy only the + * second one. Use gen_rtx_raw_SUBREG() in order to skip subreg + * validation - we need to build (subreg:DF (reg:TF %fN) 8), which + * will otherwise be rejected by s390_can_change_mode_class(). */ + emit_move_insn (gen_rtx_raw_SUBREG (DFmode, outputs[i], 8), + simplify_gen_subreg (DFmode, fprx2, FPRX2mode, 8)); + } + else + { + fprx2 = gen_reg_rtx (FPRX2mode); + emit_insn (gen_fprx2_to_tf (outputs[i], fprx2)); + } after_md_seq = get_insns (); after_md_end = get_last_insn (); end_sequence (); @@ -16813,8 +16844,19 @@ s390_md_asm_adjust (vec &outputs, vec &inputs, continue; gcc_assert (allows_reg); /* Copy input value from a vector register into a FPR pair. */ - rtx fprx2 = gen_reg_rtx (FPRX2mode); - emit_insn (gen_tf_to_fprx2 (fprx2, inputs[i])); + rtx fprx2; + if (s390_hard_fp_reg_p (inputs[i])) + { + fprx2 = gen_rtx_REG (FPRX2mode, REGNO (inputs[i])); + /* Copy only the second half. 
*/ + emit_move_insn (gen_rtx_raw_SUBREG (DFmode, fprx2, 8), + gen_rtx_raw_SUBREG (DFmode, inputs[i], 8)); + } + else + { + fprx2 = gen_reg_rtx (FPRX2mode); + emit_insn (gen_tf_to_fprx2 (fprx2, inputs[i])); + } inputs[i] = fprx2; input_modes[i] = FPRX2mode; } diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md index c80d582a300..648e00625e1 100644 --- a/gcc/config/s390/vector.md +++ b/gcc/config/s390/vector.md @@ -634,6 +634,14 @@ } [(set_attr "op_type" "VRR,*")]) +(define_insn "*df_to_tf_1" + [(set (subreg:DF (match_operand:TF 0 "nonimmediate_operand" "+v") 8) + (match_operand:DF1 "general_operand" "f"))] + "TARGET_VXE" + ; M4 == 0 corresponds to %v0[0] = %v0[0]; %v0[1] = %v1[0]; + "vpdi\t%v0,%v0,%v1,0" + [(set_attr "op_type" "VRR")]) + (define_insn "*vec_ti_to_v1ti" [(set (match_operand:V1TI 0 "nonimmediate_operand" "=v,v,R, v, v,v") (vec_duplicate:V1TI (match_operand:TI 1 "general_operand" "v,R,v,j00,jm1,d")))] diff --git a/gcc/testsuite/gcc.tar
Re: [PATCH v3] fwprop: Fix single_use_p calculation
On Tue, 2021-03-23 at 12:48 +, Richard Sandiford wrote:
> Ilya Leoshkevich writes:
> > +inline use_info *
> > +set_info::single_nondebug_use () const
> > +{
> > +  use_info *nondebug_insn = single_nondebug_insn_use ();
> > +  if (nondebug_insn)
> > +    return has_phi_uses () ? nullptr : nondebug_insn;
> > +  use_info *phi = single_phi_use ();
> > +  if (phi)
> > +    return has_nondebug_insn_uses() ? nullptr : phi;
> > +  return nullptr;
>
> Very minor, but I think this is simpler as:
>
>   if (!has_phi_uses ())
>     return single_nondebug_insn_use ();
>   if (!has_nondebug_insn_uses ())
>     return single_phi_use ();
>   return nullptr;
>
> OK with that change (or without if you prefer the original).
> Thanks for the fix and for your patience. :-)
>
> Richard

Retested with the change above and pushed as:

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b61461ac7f9bdd0e98145be79423d19b933afaa0

Thanks for all the suggestions!

Best regards,
Ilya
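
The two formulations of single_nondebug_use () quoted above can be compared with a small stand-alone model. This is only a sketch under stated assumptions: toy_set, its two counters and the bool return values (standing in for "a use" versus "null") are hypothetical and are not the rtl-ssa data structures.

```cpp
#include <cassert>

// Hypothetical stand-in for rtl-ssa's set_info: it only records how many
// nondebug instruction uses and how many phi uses a definition has.
struct toy_set
{
  int insn_uses;
  int phi_uses;

  bool has_nondebug_insn_uses () const { return insn_uses > 0; }
  bool has_phi_uses () const { return phi_uses > 0; }

  // "The single use, or null" is modelled as true / false.
  bool single_nondebug_insn_use () const { return insn_uses == 1; }
  bool single_phi_use () const { return phi_uses == 1; }

  // Shape of the version that was originally posted.
  bool single_nondebug_use_v1 () const
  {
    if (single_nondebug_insn_use ())
      return !has_phi_uses ();
    if (single_phi_use ())
      return !has_nondebug_insn_uses ();
    return false;
  }

  // Shape of the simplified version suggested in the review.
  bool single_nondebug_use_v2 () const
  {
    if (!has_phi_uses ())
      return single_nondebug_insn_use ();
    if (!has_nondebug_insn_uses ())
      return single_phi_use ();
    return false;
  }
};

// Return true if both versions agree for every combination of up to
// MAX_USES instruction uses and MAX_USES phi uses.
static bool
versions_agree (int max_uses)
{
  for (int i = 0; i <= max_uses; i++)
    for (int p = 0; p <= max_uses; p++)
      {
        toy_set s = { i, p };
        if (s.single_nondebug_use_v1 () != s.single_nondebug_use_v2 ())
          return false;
      }
  return true;
}
```

In this model both versions answer "yes" only when the total number of nondebug uses is exactly one, regardless of whether that use is in an instruction or in a phi.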
[PATCH v3] fwprop: Fix single_use_p calculation
Bootstrap and regtest running on x86_64-redhat-linux,
ppc64le-redhat-linux and s390x-redhat-linux. Ok for master?

v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566127.html
v1 -> v2: Pass a set_info instead of a def_info around.
          Add single_nondebug_insn_use () - maybe this could be
          improved further? [1]
          Simplify def->insn ()->ebb ().
          Improve formatting.

v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567121.html
v2 -> v3: Introduce single_nondebug_use and single_phi_use methods.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567118.html

---

Commit efb6bc55a93a ("fwprop: Allow (subreg (mem)) simplifications")
introduced a check that was supposed to look at the propagated def's
number of uses. It uses insn_info::num_uses (), which in reality
returns the number of uses def's insn has. The whole change therefore
works only by accident.

Fix by looking at set_info's uses instead of insn_info's uses. This
requires passing around set_info instead of insn_info.

gcc/ChangeLog:

2021-03-02  Ilya Leoshkevich

	* fwprop.c (fwprop_propagation::fwprop_propagation): Look at
	set_info's uses.
	(try_fwprop_subst_note): Use set_info instead of insn_info.
	(try_fwprop_subst_pattern): Likewise.
	(try_fwprop_subst_notes): Likewise.
	(try_fwprop_subst): Likewise.
	(forward_propagate_subreg): Likewise.
	(forward_propagate_and_simplify): Likewise.
	(forward_propagate_into): Likewise.
	* rtl-ssa/accesses.h (set_info::single_nondebug_use): New method.
	(set_info::single_nondebug_insn_use): Likewise.
	(set_info::single_phi_use): Likewise.
	* rtl-ssa/member-fns.inl (set_info::single_nondebug_use): New method.
	(set_info::single_nondebug_insn_use): Likewise.
	(set_info::single_phi_use): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/vector/long-double-asm-abi.c: New test.
--- gcc/fwprop.c | 81 +-- gcc/rtl-ssa/accesses.h| 13 +++ gcc/rtl-ssa/member-fns.inl| 30 +++ .../s390/vector/long-double-asm-abi.c | 26 ++ 4 files changed, 109 insertions(+), 41 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-abi.c diff --git a/gcc/fwprop.c b/gcc/fwprop.c index 4b8a554e823..d7203672886 100644 --- a/gcc/fwprop.c +++ b/gcc/fwprop.c @@ -175,7 +175,7 @@ namespace static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1; static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2; -fwprop_propagation (insn_info *, insn_info *, rtx, rtx); +fwprop_propagation (insn_info *, set_info *, rtx, rtx); bool changed_mem_p () const { return result_flags & CHANGED_MEM; } bool folded_to_constants_p () const; @@ -191,13 +191,13 @@ namespace }; } -/* Prepare to replace FROM with TO in INSN. */ +/* Prepare to replace FROM with TO in USE_INSN. */ fwprop_propagation::fwprop_propagation (insn_info *use_insn, - insn_info *def_insn, rtx from, rtx to) + set_info *def, rtx from, rtx to) : insn_propagation (use_insn->rtl (), from, to), -single_use_p (def_insn->num_uses () == 1), -single_ebb_p (use_insn->ebb () == def_insn->ebb ()) +single_use_p (def->single_nondebug_use ()), +single_ebb_p (use_insn->ebb () == def->ebb ()) { should_check_mems = true; should_note_simplifications = true; @@ -368,24 +368,25 @@ contains_paradoxical_subreg_p (rtx x) return false; } -/* Try to substitute (set DEST SRC) from DEF_INSN into note NOTE of USE_INSN. - Return the number of substitutions on success, otherwise return -1 and - leave USE_INSN unchanged. +/* Try to substitute (set DEST SRC), which defines DEF, into note NOTE of + USE_INSN. Return the number of substitutions on success, otherwise return + -1 and leave USE_INSN unchanged. 
- If REQUIRE_CONSTANT is true, require all substituted occurences of SRC + If REQUIRE_CONSTANT is true, require all substituted occurrences of SRC to fold to a constant, so that the note does not use any more registers than it did previously. If REQUIRE_CONSTANT is false, also allow the substitution if it's something we'd normally allow for the main instruction pattern. */ static int -try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn, +try_fwprop_subst_note (insn_info *use_insn, set_info *def, rtx note, rtx dest, rtx src, bool require_constant) { rtx_insn *use_rtl = use_insn->rtl (); + insn_info *def_insn = def->insn (); insn_change_watermark watermark; - fwprop_propagation prop (use_insn, def_insn, dest, src); + fwprop_propagation prop (use_insn, def, dest, src); i
Re: [PATCH] fwprop: Fix single_use_p calculation
On Mon, 2021-03-22 at 22:55 +, Richard Sandiford wrote: > Ilya Leoshkevich writes: > > On Mon, 2021-03-22 at 18:23 +, Richard Sandiford wrote: > > > Ilya Leoshkevich writes: > > > > [...] > > > > > > Do you still want me to add single_nondebug_use() for > > > > completeness > > > > in > > > > this patch, or would it be better to add it later when it's > > > > actually > > > > needed? > > > > > > I was thinking that the fwprop.c code would use > > > def->single_nondebug_use () instead of > > > def->single_nondebug_insn_use () && !def->has_phi_uses (). > > > > But these two are not equivalent, are they? single_nondebug_use() > > that you proposed explicitly allows phis: > > > > // If there is exactly one nondebug use of the set's result, > > // return that use, otherwise return null. The use might be in > > // instruction or a phi node. > > use_info *single_nondebug_use () const; > > > > but I don't think we want to propagate into phis here. > > Or should the check be a bit bigger, like the following? > > But we're in the process of substituting the definition into an > insn use. So we know that an insn use exists. I think the > question we're trying to answer is: is this insn use the only > nondebug use? I'd rather test that with a single accessor rather > than break it down into individual data structure tests. Ah, you are absolutely right - now I get it. Please ignore the v2 then, I will send a v3.
[PATCH] fwprop: Fix single_use_p calculation
Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux and s390x-redhat-linux. Ok for master? v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566127.html v1 -> v2: Pass a set_info instead of a def_info around. Add single_nondebug_insn_use () - maybe this could be improved further? [1] Simplify def->insn ()->ebb (). Improve formatting. [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567118.html --- Commit efb6bc55a93a ("fwprop: Allow (subreg (mem)) simplifications") introduced a check that was supposed to look at the propagated def's number of uses. It uses insn_info::num_uses (), which in reality returns the number of uses def's insn has. The whole change therefore works only by accident. Fix by looking at set_info's uses instead of insn_info's uses. This requires passing around set_info instead of insn_info. gcc/ChangeLog: 2021-03-02 Ilya Leoshkevich * fwprop.c (fwprop_propagation::fwprop_propagation): Look at set_info's uses. (try_fwprop_subst_note): Use set_info instead of insn_info. (try_fwprop_subst_pattern): Likewise. (try_fwprop_subst_notes): Likewise. (try_fwprop_subst): Likewise. (forward_propagate_subreg): Likewise. (forward_propagate_and_simplify): Likewise. (forward_propagate_into): Likewise. * rtl-ssa/accesses.h (set_info::single_nondebug_insn_use): New method. * rtl-ssa/member-fns.inl (set_info::single_nondebug_insn_use): Likewise. gcc/testsuite/ChangeLog: * gcc.target/s390/vector/long-double-asm-abi.c: New test. 
--- gcc/fwprop.c | 79 +-- gcc/rtl-ssa/accesses.h| 4 + gcc/rtl-ssa/member-fns.inl| 9 +++ .../s390/vector/long-double-asm-abi.c | 26 ++ 4 files changed, 78 insertions(+), 40 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-abi.c diff --git a/gcc/fwprop.c b/gcc/fwprop.c index 4b8a554e823..6173c9248eb 100644 --- a/gcc/fwprop.c +++ b/gcc/fwprop.c @@ -175,7 +175,7 @@ namespace static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1; static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2; -fwprop_propagation (insn_info *, insn_info *, rtx, rtx); +fwprop_propagation (insn_info *, set_info *, rtx, rtx); bool changed_mem_p () const { return result_flags & CHANGED_MEM; } bool folded_to_constants_p () const; @@ -191,13 +191,13 @@ namespace }; } -/* Prepare to replace FROM with TO in INSN. */ +/* Prepare to replace FROM with TO in USE_INSN. */ fwprop_propagation::fwprop_propagation (insn_info *use_insn, - insn_info *def_insn, rtx from, rtx to) + set_info *def, rtx from, rtx to) : insn_propagation (use_insn->rtl (), from, to), -single_use_p (def_insn->num_uses () == 1), -single_ebb_p (use_insn->ebb () == def_insn->ebb ()) +single_use_p (def->single_nondebug_insn_use () && !def->has_phi_uses ()), +single_ebb_p (use_insn->ebb () == def->ebb ()) { should_check_mems = true; should_note_simplifications = true; @@ -368,9 +368,9 @@ contains_paradoxical_subreg_p (rtx x) return false; } -/* Try to substitute (set DEST SRC) from DEF_INSN into note NOTE of USE_INSN. - Return the number of substitutions on success, otherwise return -1 and - leave USE_INSN unchanged. +/* Try to substitute (set DEST SRC), which defines DEF, into note NOTE of + USE_INSN. Return the number of substitutions on success, otherwise return + -1 and leave USE_INSN unchanged. 
If REQUIRE_CONSTANT is true, require all substituted occurences of SRC to fold to a constant, so that the note does not use any more registers @@ -379,13 +379,14 @@ contains_paradoxical_subreg_p (rtx x) instruction pattern. */ static int -try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn, +try_fwprop_subst_note (insn_info *use_insn, set_info *def, rtx note, rtx dest, rtx src, bool require_constant) { rtx_insn *use_rtl = use_insn->rtl (); + insn_info *def_insn = def->insn (); insn_change_watermark watermark; - fwprop_propagation prop (use_insn, def_insn, dest, src); + fwprop_propagation prop (use_insn, def, dest, src); if (!prop.apply_to_rvalue (&XEXP (note, 0))) { if (dump_file && (dump_flags & TDF_DETAILS)) @@ -436,19 +437,20 @@ try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn, return prop.num_replacements; } -/* Try to substitute (set DEST SRC) from DEF_INSN into location LOC of +/* Try to substitute (set DEST SRC), which defines DEF, into location LOC of USE_INSN's pattern. Return true on success, otherwise leave US
Re: [PATCH] fwprop: Fix single_use_p calculation
On Mon, 2021-03-22 at 18:23 +, Richard Sandiford wrote:
> Ilya Leoshkevich writes:

[...]

> > Do you still want me to add single_nondebug_use() for completeness
> > in this patch, or would it be better to add it later when it's
> > actually needed?
>
> I was thinking that the fwprop.c code would use
> def->single_nondebug_use () instead of
> def->single_nondebug_insn_use () && !def->has_phi_uses ().

But these two are not equivalent, are they? single_nondebug_use()
that you proposed explicitly allows phis:

  // If there is exactly one nondebug use of the set's result,
  // return that use, otherwise return null. The use might be in
  // instruction or a phi node.
  use_info *single_nondebug_use () const;

but I don't think we want to propagate into phis here.
Or should the check be a bit bigger, like the following?

  use_info *single = def->single_nondebug_use ();
  single_use_p = single && !single->is_in_phi ();

[...]

Best regards,
Ilya
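
The difference between the accessors discussed in this message can be sketched with a toy model. Everything here is an assumption made for illustration: toy_use and toy_set are hypothetical types that only imitate the names used in the thread, and the real rtl-ssa classes store their uses differently.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified use: it only knows whether it lives in a phi
// node or in a (nondebug) instruction.
struct toy_use
{
  bool in_phi;
  bool is_in_phi () const { return in_phi; }
};

// Hypothetical stand-in for set_info, holding all nondebug uses.
struct toy_set
{
  std::vector<toy_use> uses;

  bool has_phi_uses () const
  {
    for (const toy_use &u : uses)
      if (u.in_phi)
        return true;
    return false;
  }

  // The only nondebug instruction use, or null.
  const toy_use *single_nondebug_insn_use () const
  {
    const toy_use *found = nullptr;
    for (const toy_use &u : uses)
      if (!u.in_phi)
        {
          if (found)
            return nullptr;
          found = &u;
        }
    return found;
  }

  // The only nondebug use of any kind (instruction or phi), or null.
  const toy_use *single_nondebug_use () const
  {
    return uses.size () == 1 ? &uses[0] : nullptr;
  }
};

// The check used in v2 of the patch.
static bool
single_use_v2 (const toy_set &def)
{
  return def.single_nondebug_insn_use () && !def.has_phi_uses ();
}

// The "bigger" check proposed in the message above.
static bool
single_use_proposed (const toy_set &def)
{
  const toy_use *single = def.single_nondebug_use ();
  return single && !single->is_in_phi ();
}
```

In this model a definition whose only use is a phi makes single_nondebug_use () return non-null, which is the non-equivalence the message points out, while the v2 check and the proposed check give the same answer for every input.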
Re: [PATCH] fwprop: Fix single_use_p calculation
On Sun, 2021-03-21 at 13:19 +, Richard Sandiford wrote: > Ilya Leoshkevich writes: > > Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat- > > linux > > and s390x-redhat-linux. Ok for master? > > Given what was said downthread, I agree we should fix this for GCC > 11. > Sorry for missing this problem in the initial review. > > > Commit efb6bc55a93a ("fwprop: Allow (subreg (mem)) > > simplifications") > > introduced a check that was supposed to look at the propagated > > def's > > number of uses. It uses insn_info::num_uses (), which in reality > > returns the number of uses def's insn has. The whole change > > therefore > > works only by accident. > > > > Fix by looking at def_info's uses instead of insn_info's uses. > > This > > requires passing around def_info instead of insn_info. > > > > gcc/ChangeLog: > > > > 2021-03-02 Ilya Leoshkevich > > > > * fwprop.c (def_has_single_use_p): New function. > > (fwprop_propagation::fwprop_propagation): Look at > > def_info's uses. > > (try_fwprop_subst_note): Use def_info instead of insn_info. > > (try_fwprop_subst_pattern): Likewise. > > (try_fwprop_subst_notes): Likewise. > > (try_fwprop_subst): Likewise. > > (forward_propagate_subreg): Likewise. > > (forward_propagate_and_simplify): Likewise. > > (forward_propagate_into): Likewise. > > * iterator-utils.h (single_element_p): New function. 
> > --- > > gcc/fwprop.c | 89 ++-- > > > > gcc/iterator-utils.h | 10 + > > 2 files changed, 62 insertions(+), 37 deletions(-) > > > > diff --git a/gcc/fwprop.c b/gcc/fwprop.c > > index 4b8a554e823..478dcdd96cc 100644 > > --- a/gcc/fwprop.c > > +++ b/gcc/fwprop.c > > @@ -175,7 +175,7 @@ namespace > > static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1; > > static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2; > > > > - fwprop_propagation (insn_info *, insn_info *, rtx, rtx); > > + fwprop_propagation (insn_info *, def_info *, rtx, rtx); > > use->def () returns a set_info *, and since you want set_info stuff, > I think it would probably be better to pass around a set_info * > instead. > (Let's keep the variable names the same though. “def” is still > accurate > and IMO the natural choice.) > > > @@ -191,13 +191,27 @@ namespace > > }; > > } > > > > -/* Prepare to replace FROM with TO in INSN. */ > > +/* Return true if DEF has a single non-debug non-phi use. */ > > + > > +static bool > > +def_has_single_use_p (def_info *def) > > +{ > > + if (!is_a (def)) > > + return false; > > + > > + set_info *set = as_a (def); > > + > > + return single_element_p (set->nondebug_insn_uses ()) > > + && !set->has_phi_uses (); > > I think instead we should add: > > // If exactly one nondebug instruction uses the set's result, > return > // the use by that instruction, otherwise return null. > use_info *single_nondebug_insn_use () const; > > // If there is exactly one nondebug use of the set's result, > // return that use, otherwise return null. The use might be in > // instruction or a phi node. > use_info *single_nondebug_use () const; > > before the declaration of set_info::is_local_to_ebb. > > > +} > > + > > +/* Prepare to replace FROM with TO in USE_INSN. 
*/ > > > > fwprop_propagation::fwprop_propagation (insn_info *use_insn, > > - insn_info *def_insn, rtx > > from, rtx to) > > + def_info *def, rtx from, > > rtx to) > > : insn_propagation (use_insn->rtl (), from, to), > > - single_use_p (def_insn->num_uses () == 1), > > - single_ebb_p (use_insn->ebb () == def_insn->ebb ()) > > + single_use_p (def_has_single_use_p (def)), > > + single_ebb_p (use_insn->ebb () == def->insn ()->ebb ()) > > Just def->ebb () > > > @@ -538,7 +554,7 @@ try_fwprop_subst_pattern (obstack_watermark > > &attempt, insn_change &use_change, > > { > > if ((REG_NOTE_KIND (note) == REG_EQUAL > > || REG_NOTE_KIND (note) == REG_EQUIV) > > - && try_fwprop_subst_note (use_insn, def_insn, note, > > + && try_fwprop_subst_note (use_insn,
[PATCH] IBM Z: Fix "+fvm" constraint with long doubles
Bootstrapped and regtested on s390x-redhat-linux. Ok for master?

When a long double is passed to an asm statement with a "+fvm"
constraint, an LRA loop occurs. This happens because LRA chooses the
widest register class in this case (VEC_REGS), but the code generated
by s390_md_asm_adjust() always wants FP_REGS. Mismatching register
classes cause infinite reloading.

Fix by treating "fv" constraints as "v" in s390_md_asm_adjust().

gcc/ChangeLog:

	* config/s390/s390.c (f_constraint_p): Treat "fv" constraints
	as "v".

gcc/testsuite/ChangeLog:

	* gcc.target/s390/vector/long-double-asm-fprvrmem.c: New test.
---
 gcc/config/s390/s390.c                              | 12 ++--
 .../s390/vector/long-double-asm-fprvrmem.c          | 11 +++
 2 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 151136bedbc..f7b1c03561e 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16714,13 +16714,21 @@ s390_shift_truncation_mask (machine_mode mode)
 static bool
 f_constraint_p (const char *constraint)
 {
+  bool seen_f_p = false;
+  bool seen_v_p = false;
+
   for (size_t i = 0, c_len = strlen (constraint); i < c_len;
        i += CONSTRAINT_LEN (constraint[i], constraint + i))
     {
       if (constraint[i] == 'f')
-	return true;
+	seen_f_p = true;
+      if (constraint[i] == 'v')
+	seen_v_p = true;
     }
-  return false;
+
+  /* Treat "fv" constraints as "v", because LRA will choose the widest register
+   * class.  */
+  return seen_f_p && !seen_v_p;
 }
 
 /* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f"
diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c b/gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c
new file mode 100644
index 000..f95656c5723
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzarch" } */
+
+long double
+foo (long double x)
+{
+  x = x * x;
+  asm("# %0" : "+fvm"(x));
+  x = x + x;
+  return x;
+}
-- 
2.29.2
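
The effect of the changed predicate can be tried outside of GCC with a simplified copy. This is a sketch, not the GCC function: toy_f_constraint_p is a hypothetical name, and it assumes that every constraint character has length 1, whereas the real code advances by CONSTRAINT_LEN ().

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Simplified model of f_constraint_p () after the patch.  Assumption:
// each constraint letter occupies exactly one character.
static bool
toy_f_constraint_p (const char *constraint)
{
  bool seen_f_p = false;
  bool seen_v_p = false;

  for (std::size_t i = 0, c_len = std::strlen (constraint); i < c_len; i++)
    {
      if (constraint[i] == 'f')
        seen_f_p = true;
      if (constraint[i] == 'v')
        seen_v_p = true;
    }

  // An "f" that is accompanied by a "v" is treated like a plain "v".
  return seen_f_p && !seen_v_p;
}
```

With this model "=f" and "=&f" are still recognized as "f" constraints, while "+fvm" from the new test is not, so no TFmode to FPRX2mode conversion would be requested for it.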
[PATCH v3] IBM Z: Fix usage of "f" constraint with long doubles
v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html
v1 -> v2:
- Handle constraint modifiers, use AR constraint instead of R, add
  testcases for & and %.

v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564380.html
v2 -> v3:
- The main prereq is now committed:
  https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566237.html
- Dropped long-double-asm-abi.c test, because its prereq is not
  approved (yet):
  https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566218.html
- Removed superfluous constraint pointer increment.

After switching the s390 backend to store long doubles in vector
registers, the "f" constraint broke when used with long doubles: long
doubles correspond to TFmode, which in combination with "f" corresponds
to hard regs %v0-%v15, whereas asm users expect a %f0-%f15 pair.

Fix by using TARGET_MD_ASM_ADJUST hook to convert TFmode values to
FPRX2mode and back.

gcc/ChangeLog:

2020-12-14  Ilya Leoshkevich

	* config/s390/s390.c (f_constraint_p): New function.
	(s390_md_asm_adjust): Implement TARGET_MD_ASM_ADJUST.
	(TARGET_MD_ASM_ADJUST): Likewise.
	* config/s390/vector.md (fprx2_to_tf): Rename from *fprx2_to_tf, add
	memory alternative.
	(tf_to_fprx2): New pattern.

gcc/testsuite/ChangeLog:

2020-12-14  Ilya Leoshkevich

	* gcc.target/s390/vector/long-double-asm-commutative.c: New test.
	* gcc.target/s390/vector/long-double-asm-earlyclobber.c: New test.
	* gcc.target/s390/vector/long-double-asm-in-out.c: New test.
	* gcc.target/s390/vector/long-double-asm-inout.c: New test.
	* gcc.target/s390/vector/long-double-asm-matching.c: New test.
	* gcc.target/s390/vector/long-double-asm-regmem.c: New test.
	* gcc.target/s390/vector/long-double-volatile-from-i64.c: New test.
--- gcc/config/s390/s390.c| 86 +++ .../s390/vector/long-double-asm-commutative.c | 16 .../vector/long-double-asm-earlyclobber.c | 17 .../s390/vector/long-double-asm-in-out.c | 14 +++ .../s390/vector/long-double-asm-inout.c | 14 +++ .../s390/vector/long-double-asm-matching.c| 13 +++ .../s390/vector/long-double-asm-regmem.c | 8 ++ .../vector/long-double-volatile-from-i64.c| 22 + 8 files changed, 190 insertions(+) create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-commutative.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-earlyclobber.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-inout.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-matching.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-regmem.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-volatile-from-i64.c diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index f3d0d1ba596..68dc3c58c1b 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -16698,6 +16698,89 @@ s390_shift_truncation_mask (machine_mode mode) return mode == DImode || mode == SImode ? 63 : 0; } +/* Return TRUE iff CONSTRAINT is an "f" constraint, possibly with additional + modifiers. */ + +static bool +f_constraint_p (const char *constraint) +{ + for (size_t i = 0, c_len = strlen (constraint); i < c_len; + i += CONSTRAINT_LEN (constraint[i], constraint + i)) +{ + if (constraint[i] == 'f') + return true; +} + return false; +} + +/* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f" + constraints when long doubles are stored in vector registers. */ + +static rtx_insn * +s390_md_asm_adjust (vec &outputs, vec &inputs, + vec &input_modes, + vec &constraints, vec & /*clobbers*/, + HARD_REG_SET & /*clobbered_regs*/) +{ + if (!TARGET_VXE) +/* Long doubles are stored in FPR pairs - nothing to do. 
*/ +return NULL; + + rtx_insn *after_md_seq = NULL, *after_md_end = NULL; + + unsigned ninputs = inputs.length (); + unsigned noutputs = outputs.length (); + for (unsigned i = 0; i < noutputs; i++) +{ + if (GET_MODE (outputs[i]) != TFmode) + /* Not a long double - nothing to do. */ + continue; + const char *constraint = constraints[i]; + bool allows_mem, allows_reg, is_inout; + bool ok = parse_output_constraint (&constraint, i, ninputs, noutputs, +&allows_mem, &allows_reg, &is_inout); + gcc_assert (ok); + if (!f_constraint_p (constraint)) + /* Long double with a constraint other than "=f" - nothing to do. */ + continue; + gcc_assert (allows_reg); + gcc_assert (!is_inout); + /* Copy output va
Re: [PATCH PING^3] Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
On Wed, 2021-03-03 at 21:26 +0100, Ilya Leoshkevich via Gcc-patches wrote: > On Wed, 2021-03-03 at 13:02 -0700, Jeff Law wrote: > > > > > > On 3/2/21 4:50 PM, Ilya Leoshkevich via Gcc-patches wrote: > > > Hello, > > > > > > I would like to ping the following patch: > > > > > > Add input_modes parameter to TARGET_MD_ASM_ADJUST hook > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html > > > > > > It is needed for the following regression fix: > > > > > > IBM Z: Fix usage of "f" constraint with long doubles > > > https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564380.html > > > > > > > > > Jakub, who would be the right person to review this change? I've > > > decided to ask you, since `git shortlog -ns gcc/cfgexpand.c` shows > > > that > > > you deal with this code a lot. > > > > > > Best regards, > > > Ilya > > > > > > > > > > > > > > > If TARGET_MD_ASM_ADJUST changes a mode of an input operand (which > > > should be ok as long as the hook itself as well as after_md_seq > > > make up > > > for it), input_mode will contain stale information. > > > > > > It might be tempting to fix this by removing input_mode altogether > > > and > > > just using GET_MODE (), but this will not work correctly with > > > constants. > > > So add input_modes parameter and document that it should be updated > > > whenever inputs parameter is updated. > > > > > > gcc/ChangeLog: > > > > > > 2021-01-05 Ilya Leoshkevich > > > > > > * cfgexpand.c (expand_asm_loc): Pass new parameter. > > > (expand_asm_stmt): Likewise. > > > * config/arm/aarch-common-protos.h (arm_md_asm_adjust): Add > > > new > > > parameter. > > > * config/arm/aarch-common.c (arm_md_asm_adjust): Likewise. > > > * config/arm/arm.c (thumb1_md_asm_adjust): Likewise. > > > * config/cris/cris.c (cris_md_asm_adjust): Likewise. > > > * config/i386/i386.c (ix86_md_asm_adjust): Likewise. > > > * config/mn10300/mn10300.c (mn10300_md_asm_adjust): > > > Likewise. 
> > > * config/nds32/nds32.c (nds32_md_asm_adjust): Likewise. > > > * config/pdp11/pdp11.c (pdp11_md_asm_adjust): Likewise. > > > * config/rs6000/rs6000.c (rs6000_md_asm_adjust): Likewise. > > > * config/vax/vax.c (vax_md_asm_adjust): Likewise. > > > * config/visium/visium.c (visium_md_asm_adjust): Likewise. > > > * target.def (md_asm_adjust): Likewise. > > Ugh. A couple questions > > Are there any cases where you're going to want to change modes for > > arguments that were constants? I'm a bit surprised that we don't > > have > > a mode for constants for the cases that we care about. Presumably we > > can get a (modeless) CONST_INT here and we're not restricted to > > CONST_DOUBLE and friends (which have modes). > > Yes, this might happen. For example, here: > > asm("sqxbr\t%0,%1" : "=f"(res) : "f"(0x1.1p+0L)); > > the (const_double) and the corresponding operand will initially have > the mode TFmode. s390_md_asm_adjust () will add a conversion from > TFmode to FPRX2mode and change the argument accordingly. Just to be more precise: the mode of the (const_double) itself will not change. Here is the resulting RTL for the asm statement above: # s390_md_asm_adjust () step 1: put the (const_double) operand into a # new (reg) with the same mode (insn (set (reg:TF 63) (const_double:TF ...))) # s390_md_asm_adjust () step 2: convert a reg from TFmode to FPRX2mode (insn (set (reg:FPRX2 65) (subreg:FPRX2 (reg:TF 63) 0))) # s390_md_asm_adjust () step 3: replace the original operand with the # resulting (reg), adjust (asm_input) accordingly (insn (set (reg:FPRX2 64) (asm_operands:FPRX2 ("sqxbr %0,%1") ("=f") 0 [(reg:FPRX2 65)] [(asm_input:FPRX2 ("f"))])))
Re: [PATCH PING^3] Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
On Wed, 2021-03-03 at 13:02 -0700, Jeff Law wrote: > > > On 3/2/21 4:50 PM, Ilya Leoshkevich via Gcc-patches wrote: > > Hello, > > > > I would like to ping the following patch: > > > > Add input_modes parameter to TARGET_MD_ASM_ADJUST hook > > https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html > > > > It is needed for the following regression fix: > > > > IBM Z: Fix usage of "f" constraint with long doubles > > https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564380.html > > > > > > Jakub, who would be the right person to review this change? I've > > decided to ask you, since `git shortlog -ns gcc/cfgexpand.c` shows > > that > > you deal with this code a lot. > > > > Best regards, > > Ilya > > > > > > > > > > If TARGET_MD_ASM_ADJUST changes a mode of an input operand (which > > should be ok as long as the hook itself as well as after_md_seq > > make up > > for it), input_mode will contain stale information. > > > > It might be tempting to fix this by removing input_mode altogether > > and > > just using GET_MODE (), but this will not work correctly with > > constants. > > So add input_modes parameter and document that it should be updated > > whenever inputs parameter is updated. > > > > gcc/ChangeLog: > > > > 2021-01-05 Ilya Leoshkevich > > > > * cfgexpand.c (expand_asm_loc): Pass new parameter. > > (expand_asm_stmt): Likewise. > > * config/arm/aarch-common-protos.h (arm_md_asm_adjust): Add > > new > > parameter. > > * config/arm/aarch-common.c (arm_md_asm_adjust): Likewise. > > * config/arm/arm.c (thumb1_md_asm_adjust): Likewise. > > * config/cris/cris.c (cris_md_asm_adjust): Likewise. > > * config/i386/i386.c (ix86_md_asm_adjust): Likewise. > > * config/mn10300/mn10300.c (mn10300_md_asm_adjust): > > Likewise. > > * config/nds32/nds32.c (nds32_md_asm_adjust): Likewise. > > * config/pdp11/pdp11.c (pdp11_md_asm_adjust): Likewise. > > * config/rs6000/rs6000.c (rs6000_md_asm_adjust): Likewise. 
> > * config/vax/vax.c (vax_md_asm_adjust): Likewise. > > * config/visium/visium.c (visium_md_asm_adjust): Likewise. > > * target.def (md_asm_adjust): Likewise. > Ugh. A couple questions > Are there any cases where you're going to want to change modes for > arguments that were constants? I'm a bit surprised that we don't > have > a mode for constants for the cases that we care about. Presumably we > can get a (modeless) CONST_INT here and we're not restricted to > CONST_DOUBLE and friends (which have modes). Yes, this might happen. For example, here: asm("sqxbr\t%0,%1" : "=f"(res) : "f"(0x1.1p+0L)); the (const_double) and the corresponding operand will initially have the mode TFmode. s390_md_asm_adjust () will add a conversion from TFmode to FPRX2mode and change the argument accordingly. However, this is not the problematic case that I refer to in the commit message: I caught some failures in the testsuite that I tracked down to (const_int)s, which, like you mentioned, don't have a mode. > Is input_modes read after the call to md_asm_adjust? I'm trying to > figure out why we'd need to update it. Yes, its contents goes into (asm_operand)'s (asm_input). If we don't adjust it, (asm_input)s will no longer be consistent with input operand RTXes. > Not acking or naking at this point, I just want to make sure I > understand what's going on. > > jeff
Re: [PATCH] fwprop: Fix single_use_p calculation
On Wed, 2021-03-03 at 11:34 -0700, Jeff Law wrote: > > > On 3/2/21 3:37 PM, Ilya Leoshkevich via Gcc-patches wrote: > > Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat- > > linux > > and s390x-redhat-linux. Ok for master? > > > > > > > > Commit efb6bc55a93a ("fwprop: Allow (subreg (mem)) > > simplifications") > > introduced a check that was supposed to look at the propagated > > def's > > number of uses. It uses insn_info::num_uses (), which in reality > > returns the number of uses def's insn has. The whole change > > therefore > > works only by accident. > > > > Fix by looking at def_info's uses instead of insn_info's uses. > > This > > requires passing around def_info instead of insn_info. > > > > gcc/ChangeLog: > > > > 2021-03-02 Ilya Leoshkevich > > > > * fwprop.c (def_has_single_use_p): New function. > > (fwprop_propagation::fwprop_propagation): Look at > > def_info's uses. > > (try_fwprop_subst_note): Use def_info instead of insn_info. > > (try_fwprop_subst_pattern): Likewise. > > (try_fwprop_subst_notes): Likewise. > > (try_fwprop_subst): Likewise. > > (forward_propagate_subreg): Likewise. > > (forward_propagate_and_simplify): Likewise. > > (forward_propagate_into): Likewise. > > * iterator-utils.h (single_element_p): New function. > Given we're well into stage4, I'd recommend deferring to gcc-12 > unless > this fixes a code correctness issue. > > Jeff > Fortunately the issue here is not a miscompilation, but it's still a regression: on s390 small functions that use long doubles get a number of useless load/stores as well as a stack frame, where none was required before. Basically, the same issue efb6bc55a93a failed to fully fix due to the num_uses() / nondebug_insn_uses() mixup.
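The mixup described above (counting the uses that the defining insn has, instead of the uses of the value it defines) can be shown with a small toy model. This is a sketch in plain C++ and does not use the RTL-SSA API; `ToyInsn`, `insn_num_uses` and `def_num_uses` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction: one destination register, some sources.
struct ToyInsn {
  std::string dest;               // register this insn defines
  std::vector<std::string> srcs;  // registers this insn reads
};

// Number of registers the insn itself reads.  Per the message above,
// this is the quantity the original check looked at by accident.
std::size_t insn_num_uses(const ToyInsn &insn) {
  return insn.srcs.size();
}

// Number of reads of REG anywhere in PROGRAM.  This is the quantity a
// "does this def have a single use?" check actually needs.
std::size_t def_num_uses(const std::vector<ToyInsn> &program,
                         const std::string &reg) {
  std::size_t n = 0;
  for (const ToyInsn &insn : program)
    for (const std::string &src : insn.srcs)
      if (src == reg)
        n++;
  return n;
}
```

For a program in which `r3 = f(r1)` is followed by two readers of `r3`, the first count is 1 while the second is 2, so a check based on the first would wrongly treat `r3` as single-use.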
Re: [PATCH] IBM Z: Run mul-signed-overflow-*.c only on z14+
On Wed, 2021-03-03 at 07:50 +0100, Andreas Krebbel wrote:
> On 3/2/21 11:59 PM, Ilya Leoshkevich wrote:
> > mul-signed-overflow-*.c execution tests fail on z13, because they
> > contain z14-specific instructions.  Fix by requiring s390_z14_hw
> > target.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/s390/mul-signed-overflow-1.c: Run only on z14+.
> >         * gcc.target/s390/mul-signed-overflow-2.c: Likewise.
>
> I did that change yesterday already.

Ah, I hadn't noticed.  One difference between our patches, though, is
that mine also has a `dg-do compile` directive, so the compile tests
still run on z13.

[...]
[PATCH PING^3] Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
Hello, I would like to ping the following patch: Add input_modes parameter to TARGET_MD_ASM_ADJUST hook https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html It is needed for the following regression fix: IBM Z: Fix usage of "f" constraint with long doubles https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564380.html Jakub, who would be the right person to review this change? I've decided to ask you, since `git shortlog -ns gcc/cfgexpand.c` shows that you deal with this code a lot. Best regards, Ilya If TARGET_MD_ASM_ADJUST changes a mode of an input operand (which should be ok as long as the hook itself as well as after_md_seq make up for it), input_mode will contain stale information. It might be tempting to fix this by removing input_mode altogether and just using GET_MODE (), but this will not work correctly with constants. So add input_modes parameter and document that it should be updated whenever inputs parameter is updated. gcc/ChangeLog: 2021-01-05 Ilya Leoshkevich * cfgexpand.c (expand_asm_loc): Pass new parameter. (expand_asm_stmt): Likewise. * config/arm/aarch-common-protos.h (arm_md_asm_adjust): Add new parameter. * config/arm/aarch-common.c (arm_md_asm_adjust): Likewise. * config/arm/arm.c (thumb1_md_asm_adjust): Likewise. * config/cris/cris.c (cris_md_asm_adjust): Likewise. * config/i386/i386.c (ix86_md_asm_adjust): Likewise. * config/mn10300/mn10300.c (mn10300_md_asm_adjust): Likewise. * config/nds32/nds32.c (nds32_md_asm_adjust): Likewise. * config/pdp11/pdp11.c (pdp11_md_asm_adjust): Likewise. * config/rs6000/rs6000.c (rs6000_md_asm_adjust): Likewise. * config/vax/vax.c (vax_md_asm_adjust): Likewise. * config/visium/visium.c (visium_md_asm_adjust): Likewise. * target.def (md_asm_adjust): Likewise. 
--- gcc/cfgexpand.c | 16 gcc/config/arm/aarch-common-protos.h | 8 gcc/config/arm/aarch-common.c| 7 --- gcc/config/arm/arm.c | 14 -- gcc/config/cris/cris.c | 7 --- gcc/config/i386/i386.c | 7 --- gcc/config/mn10300/mn10300.c | 7 --- gcc/config/nds32/nds32.c | 1 + gcc/config/pdp11/pdp11.c | 9 + gcc/config/rs6000/rs6000.c | 7 --- gcc/config/vax/vax.c | 3 ++- gcc/config/visium/visium.c | 12 +++- gcc/doc/tm.texi | 10 ++ gcc/target.def | 13 - 14 files changed, 69 insertions(+), 52 deletions(-) diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index aef9e916fcd..a6b48d3e48f 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -2880,6 +2880,7 @@ expand_asm_loc (tree string, int vol, location_t locus) rtx asm_op, clob; unsigned i, nclobbers; auto_vec input_rvec, output_rvec; + auto_vec input_mode; auto_vec constraints; auto_vec clobber_rvec; HARD_REG_SET clobbered_regs; @@ -2889,9 +2890,8 @@ expand_asm_loc (tree string, int vol, location_t locus) clobber_rvec.safe_push (clob); if (targetm.md_asm_adjust) - targetm.md_asm_adjust (output_rvec, input_rvec, - constraints, clobber_rvec, - clobbered_regs); + targetm.md_asm_adjust (output_rvec, input_rvec, input_mode, + constraints, clobber_rvec, clobbered_regs); asm_op = body; nclobbers = clobber_rvec.length (); @@ -3068,8 +3068,8 @@ expand_asm_stmt (gasm *stmt) return; } - /* There are some legacy diagnostics in here, and also avoids a - sixth parameger to targetm.md_asm_adjust. */ + /* There are some legacy diagnostics in here, and also avoids an extra + parameter to targetm.md_asm_adjust. */ save_input_location s_i_l(locus); unsigned noutputs = gimple_asm_noutputs (stmt); @@ -3420,9 +3420,9 @@ expand_asm_stmt (gasm *stmt) the flags register. 
*/ rtx_insn *after_md_seq = NULL; if (targetm.md_asm_adjust) -after_md_seq = targetm.md_asm_adjust (output_rvec, input_rvec, - constraints, clobber_rvec, - clobbered_regs); +after_md_seq + = targetm.md_asm_adjust (output_rvec, input_rvec, input_mode, +constraints, clobber_rvec, clobbered_regs); /* Do not allow the hook to change the output and input count, lest it mess up the operand numbering. */ diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h index 7a9cf3d324c..b6171e8668d 100644 --- a/gcc/config/arm/aarch-common-protos.h +++ b/gcc/config/arm/aarch-common-protos.h @@ -144,9 +144,9 @@ struct cpu_cost_table const struct vector_cost_table vect;
[PATCH] IBM Z: Run mul-signed-overflow-*.c only on z14+
mul-signed-overflow-*.c execution tests fail on z13, because they contain z14-specific instructions. Fix by requiring s390_z14_hw target. gcc/testsuite/ChangeLog: * gcc.target/s390/mul-signed-overflow-1.c: Run only on z14+. * gcc.target/s390/mul-signed-overflow-2.c: Likewise. --- gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c | 3 ++- gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c b/gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c index fdf56d6e695..e8b1938dab7 100644 --- a/gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c +++ b/gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c @@ -1,4 +1,5 @@ -/* { dg-do run } */ +/* { dg-do compile } */ +/* { dg-do run { target { s390_z14_hw } } } */ /* z14 only because we need msrkc, msc, msgrkc, msgc */ /* { dg-options "-O3 -march=z14 -mzarch --save-temps" } */ diff --git a/gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c b/gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c index d0088188aa2..01328e1d286 100644 --- a/gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c +++ b/gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c @@ -1,4 +1,5 @@ -/* { dg-do run } */ +/* { dg-do compile } */ +/* { dg-do run { target { s390_z14_hw } } } */ /* z14 only because we need msrkc, msc, msgrkc, msgc */ /* { dg-options "-O3 -march=z14 -mzarch --save-temps" } */ -- 2.29.2
[PATCH] fwprop: Fix single_use_p calculation
Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux and s390x-redhat-linux. Ok for master? Commit efb6bc55a93a ("fwprop: Allow (subreg (mem)) simplifications") introduced a check that was supposed to look at the propagated def's number of uses. It uses insn_info::num_uses (), which in reality returns the number of uses def's insn has. The whole change therefore works only by accident. Fix by looking at def_info's uses instead of insn_info's uses. This requires passing around def_info instead of insn_info. gcc/ChangeLog: 2021-03-02 Ilya Leoshkevich * fwprop.c (def_has_single_use_p): New function. (fwprop_propagation::fwprop_propagation): Look at def_info's uses. (try_fwprop_subst_note): Use def_info instead of insn_info. (try_fwprop_subst_pattern): Likewise. (try_fwprop_subst_notes): Likewise. (try_fwprop_subst): Likewise. (forward_propagate_subreg): Likewise. (forward_propagate_and_simplify): Likewise. (forward_propagate_into): Likewise. * iterator-utils.h (single_element_p): New function. --- gcc/fwprop.c | 89 ++-- gcc/iterator-utils.h | 10 + 2 files changed, 62 insertions(+), 37 deletions(-) diff --git a/gcc/fwprop.c b/gcc/fwprop.c index 4b8a554e823..478dcdd96cc 100644 --- a/gcc/fwprop.c +++ b/gcc/fwprop.c @@ -175,7 +175,7 @@ namespace static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1; static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2; -fwprop_propagation (insn_info *, insn_info *, rtx, rtx); +fwprop_propagation (insn_info *, def_info *, rtx, rtx); bool changed_mem_p () const { return result_flags & CHANGED_MEM; } bool folded_to_constants_p () const; @@ -191,13 +191,27 @@ namespace }; } -/* Prepare to replace FROM with TO in INSN. */ +/* Return true if DEF has a single non-debug non-phi use. 
*/ + +static bool +def_has_single_use_p (def_info *def) +{ + if (!is_a (def)) +return false; + + set_info *set = as_a (def); + + return single_element_p (set->nondebug_insn_uses ()) +&& !set->has_phi_uses (); +} + +/* Prepare to replace FROM with TO in USE_INSN. */ fwprop_propagation::fwprop_propagation (insn_info *use_insn, - insn_info *def_insn, rtx from, rtx to) + def_info *def, rtx from, rtx to) : insn_propagation (use_insn->rtl (), from, to), -single_use_p (def_insn->num_uses () == 1), -single_ebb_p (use_insn->ebb () == def_insn->ebb ()) +single_use_p (def_has_single_use_p (def)), +single_ebb_p (use_insn->ebb () == def->insn ()->ebb ()) { should_check_mems = true; should_note_simplifications = true; @@ -368,9 +382,9 @@ contains_paradoxical_subreg_p (rtx x) return false; } -/* Try to substitute (set DEST SRC) from DEF_INSN into note NOTE of USE_INSN. - Return the number of substitutions on success, otherwise return -1 and - leave USE_INSN unchanged. +/* Try to substitute (set DEST SRC), which defines DEF, into note NOTE of + USE_INSN. Return the number of substitutions on success, otherwise return + -1 and leave USE_INSN unchanged. If REQUIRE_CONSTANT is true, require all substituted occurences of SRC to fold to a constant, so that the note does not use any more registers @@ -379,13 +393,14 @@ contains_paradoxical_subreg_p (rtx x) instruction pattern. 
*/ static int -try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn, +try_fwprop_subst_note (insn_info *use_insn, def_info *def, rtx note, rtx dest, rtx src, bool require_constant) { rtx_insn *use_rtl = use_insn->rtl (); + insn_info *def_insn = def->insn (); insn_change_watermark watermark; - fwprop_propagation prop (use_insn, def_insn, dest, src); + fwprop_propagation prop (use_insn, def, dest, src); if (!prop.apply_to_rvalue (&XEXP (note, 0))) { if (dump_file && (dump_flags & TDF_DETAILS)) @@ -436,19 +451,20 @@ try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn, return prop.num_replacements; } -/* Try to substitute (set DEST SRC) from DEF_INSN into location LOC of +/* Try to substitute (set DEST SRC), which defines DEF, into location LOC of USE_INSN's pattern. Return true on success, otherwise leave USE_INSN unchanged. */ static bool try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change, - insn_info *def_insn, rtx *loc, rtx dest, rtx src) + def_info *def, rtx *loc, rtx dest, rtx src) { insn_info *use_insn = use_change.insn (); rtx_insn *use_rtl = use_insn->rtl (); + insn_info *def_insn = def->insn (); insn_change_watermark watermark; - fwprop_propagation prop (use_insn, def
[PATCH 2/2] IBM Z: Fix long double <-> DFP conversions
When switching the s390 backend to store long doubles in vector registers, the patterns for long double <-> DFP conversions were forgotten. This did not cause observable problems so far, because libdfp calls are emitted instead of pfpo. However, when building libdfp itself, this leads to infinite recursion. gcc/ChangeLog: * config/s390/vector.md (trunctf2_vr): New pattern. (trunctf2): Likewise. (trunctdtf2_vr): Likewise. (trunctdtf2): Likewise. (extendtf2_vr): Likewise. (extendtf2): Likewise. (extendtftd2_vr): Likewise. (extendtftd2): Likewise. gcc/testsuite/ChangeLog: * gcc.target/s390/vector/long-double-from-decimal128.c: New test. * gcc.target/s390/vector/long-double-from-decimal32.c: New test. * gcc.target/s390/vector/long-double-from-decimal64.c: New test. * gcc.target/s390/vector/long-double-to-decimal128.c: New test. * gcc.target/s390/vector/long-double-to-decimal32.c: New test. * gcc.target/s390/vector/long-double-to-decimal64.c: New test. --- gcc/config/s390/vector.md | 72 +++ .../s390/vector/long-double-from-decimal128.c | 20 ++ .../s390/vector/long-double-from-decimal32.c | 20 ++ .../s390/vector/long-double-from-decimal64.c | 20 ++ .../s390/vector/long-double-to-decimal128.c | 19 + .../s390/vector/long-double-to-decimal32.c| 19 + .../s390/vector/long-double-to-decimal64.c| 19 + 7 files changed, 189 insertions(+) create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal128.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal32.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal64.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal128.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal32.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal64.c diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md index e48c965db00..bc52211c55e 100644 --- a/gcc/config/s390/vector.md +++ 
b/gcc/config/s390/vector.md @@ -2480,6 +2480,42 @@ "HAVE_TF (trunctfsf2)" { EXPAND_TF (trunctfsf2, 2); }) +(define_expand "trunctf2_vr" + [(match_operand:DFP_ALL 0 "nonimmediate_operand" "") + (match_operand:TF 1 "nonimmediate_operand" "")] + "TARGET_HARD_DFP + && GET_MODE_SIZE (TFmode) > GET_MODE_SIZE (mode) + && TARGET_VXE" +{ + rtx fprx2 = gen_reg_rtx (FPRX2mode); + emit_insn (gen_tf_to_fprx2 (fprx2, operands[1])); + emit_insn (gen_truncfprx22 (operands[0], fprx2)); + DONE; +}) + +(define_expand "trunctf2" + [(match_operand:DFP_ALL 0 "nonimmediate_operand" "") + (match_operand:TF 1 "nonimmediate_operand" "")] + "HAVE_TF (trunctf2)" + { EXPAND_TF (trunctf2, 2); }) + +(define_expand "trunctdtf2_vr" + [(match_operand:TF 0 "nonimmediate_operand" "") + (match_operand:TD 1 "nonimmediate_operand" "")] + "TARGET_HARD_DFP && TARGET_VXE" +{ + rtx fprx2 = gen_reg_rtx (FPRX2mode); + emit_insn (gen_trunctdfprx22 (fprx2, operands[1])); + emit_insn (gen_fprx2_to_tf (operands[0], fprx2)); + DONE; +}) + +(define_expand "trunctdtf2" + [(match_operand:TF 0 "nonimmediate_operand" "") + (match_operand:TD 1 "nonimmediate_operand" "")] + "HAVE_TF (trunctdtf2)" + { EXPAND_TF (trunctdtf2, 2); }) + ; load lengthened (define_insn "extenddftf2_vr" @@ -2511,6 +2547,42 @@ "HAVE_TF (extendsftf2)" { EXPAND_TF (extendsftf2, 2); }) +(define_expand "extendtf2_vr" + [(match_operand:TF 0 "nonimmediate_operand" "") + (match_operand:DFP_ALL 1 "nonimmediate_operand" "")] + "TARGET_HARD_DFP + && GET_MODE_SIZE (mode) < GET_MODE_SIZE (TFmode) + && TARGET_VXE" +{ + rtx fprx2 = gen_reg_rtx (FPRX2mode); + emit_insn (gen_extendfprx22 (fprx2, operands[1])); + emit_insn (gen_fprx2_to_tf (operands[0], fprx2)); + DONE; +}) + +(define_expand "extendtf2" + [(match_operand:TF 0 "nonimmediate_operand" "") + (match_operand:DFP_ALL 1 "nonimmediate_operand" "")] + "HAVE_TF (extendtf2)" + { EXPAND_TF (extendtf2, 2); }) + +(define_expand "extendtftd2_vr" + [(match_operand:TD 0 "nonimmediate_operand" "") + 
(match_operand:TF 1 "nonimmediate_operand" "")] + "TARGET_HARD_DFP && TARGET_VXE" +{ + rtx fprx2 = gen_reg_rtx (FPRX2mode); + emit_insn (gen_tf_to_fprx2 (fprx2, operands[1])); + emit_insn (gen_extendfprx2td2 (operands[0], fprx2)); + DONE; +}) + +(define_expand "extendtftd2" + [(match_operand:TD 0 "nonimmediate_operand" "") + (match_operand:TF 1 "nonimmediate_operand" "")] + "HAVE_TF (extendtftd2)" + { EXPAND_TF (extendtftd2, 2); }) + ; test data class (define_expand "signbittf2_vr" diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal128.c b/gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal128.c new file mode 100644 index 000..3cd2c68f5c6 --- /dev/null +++ b/gcc/testsui
[PATCH 1/2] IBM Z: Improve FPRX2 <-> TF conversions
gcc/ChangeLog:

        * config/s390/vector.md (*fprx2_to_tf): Rename to fprx2_to_tf,
        add memory alternative.
        (tf_to_fprx2): New pattern.
---
 gcc/config/s390/vector.md | 36 +++-
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 0e3c31f5d4f..e48c965db00 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -616,12 +616,23 @@
    vlvgp\t%v0,%1,%N1"
   [(set_attr "op_type" "VRR,VRX,VRX,VRI,VRR")])
 
-(define_insn "*fprx2_to_tf"
-  [(set (match_operand:TF 0 "nonimmediate_operand" "=v")
-        (subreg:TF (match_operand:FPRX2 1 "general_operand" "f") 0))]
+(define_insn_and_split "fprx2_to_tf"
+  [(set (match_operand:TF 0 "nonimmediate_operand" "=v,AR")
+        (subreg:TF (match_operand:FPRX2 1 "general_operand" "f,f") 0))]
   "TARGET_VXE"
-  "vmrhg\t%v0,%1,%N1"
-  [(set_attr "op_type" "VRR")])
+  "@
+   vmrhg\t%v0,%1,%N1
+   #"
+  "!(MEM_P (operands[0]) && MEM_VOLATILE_P (operands[0]))"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 4) (match_dup 5))]
+{
+  operands[2] = simplify_gen_subreg (DFmode, operands[0], TFmode, 0);
+  operands[3] = simplify_gen_subreg (DFmode, operands[1], FPRX2mode, 0);
+  operands[4] = simplify_gen_subreg (DFmode, operands[0], TFmode, 8);
+  operands[5] = simplify_gen_subreg (DFmode, operands[1], FPRX2mode, 8);
+}
+  [(set_attr "op_type" "VRR,*")])
 
 (define_insn "*vec_ti_to_v1ti"
   [(set (match_operand:V1TI 0 "nonimmediate_operand" "=v,v,R, v, v,v")
@@ -753,6 +764,21 @@
   "vpdi\t%V0,%v1,%V0,5"
   [(set_attr "op_type" "VRR")])
 
+(define_insn_and_split "tf_to_fprx2"
+  [(set (match_operand:FPRX2 0 "nonimmediate_operand" "=f,f")
+        (subreg:FPRX2 (match_operand:TF 1 "general_operand" "v,AR") 0))]
+  "TARGET_VXE"
+  "#"
+  "!(MEM_P (operands[1]) && MEM_VOLATILE_P (operands[1]))"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 4) (match_dup 5))]
+{
+  operands[2] = simplify_gen_subreg (DFmode, operands[0], FPRX2mode, 0);
+  operands[3] = simplify_gen_subreg (DFmode, operands[1], TFmode, 0);
+  operands[4] = simplify_gen_subreg (DFmode, operands[0], FPRX2mode, 8);
+  operands[5] = simplify_gen_subreg (DFmode, operands[1], TFmode, 8);
+})
+
 ; vec_perm_const for V2DI using vpdi?
 ;;
 
-- 
2.29.2
[PATCH 0/2] IBM Z: Fix long double <-> DFP conversions
This series fixes PR99134. Patch 1 is factored out from the pending [1], patch 2 is the actual fix. Bootstrapped and regtested on s390x-redhat-linux. Ok for master? [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564380.html Ilya Leoshkevich (2): IBM Z: Improve FPRX2 <-> TF conversions IBM Z: Fix long double <-> DFP conversions gcc/config/s390/vector.md | 108 +- .../s390/vector/long-double-from-decimal128.c | 20 .../s390/vector/long-double-from-decimal32.c | 20 .../s390/vector/long-double-from-decimal64.c | 20 .../s390/vector/long-double-to-decimal128.c | 19 +++ .../s390/vector/long-double-to-decimal32.c| 19 +++ .../s390/vector/long-double-to-decimal64.c| 19 +++ 7 files changed, 220 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal128.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal32.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal64.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal128.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal32.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal64.c -- 2.29.2
[PATCH] PING^2 Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
Hello, I would like to ping the following patch: Add input_modes parameter to TARGET_MD_ASM_ADJUST hook https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html It is needed for the following regression fix: IBM Z: Fix usage of "f" constraint with long doubles https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html Best regards, Ilya
[PATCH] PING lra: clear lra_insn_recog_data after simplifying a mem subreg
Hello, I would like to ping the following patch: lra: clear lra_insn_recog_data after simplifying a mem subreg https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563428.html Best regards, Ilya
[PATCH v2] IBM Z: Fix usage of "f" constraint with long doubles
v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html
v1 -> v2: Handle constraint modifiers, use AR constraint instead of R,
          add testcases for & and %.

After the s390 backend was switched to store long doubles in vector
registers, the "f" constraint broke when used with them: long doubles
correspond to TFmode, which in combination with "f" selects hard regs
%v0-%v15, whereas asm users expect a %f0-%f15 pair.

Fix by using the TARGET_MD_ASM_ADJUST hook to convert TFmode values to
FPRX2mode and back.

gcc/ChangeLog:

2020-12-14  Ilya Leoshkevich

        * config/s390/s390.c (f_constraint_p): New function.
        (s390_md_asm_adjust): Implement TARGET_MD_ASM_ADJUST.
        (TARGET_MD_ASM_ADJUST): Likewise.
        * config/s390/vector.md (fprx2_to_tf): Rename from
        *fprx2_to_tf, add memory alternative.
        (tf_to_fprx2): New pattern.

gcc/testsuite/ChangeLog:

2020-12-14  Ilya Leoshkevich

        * gcc.target/s390/vector/long-double-asm-abi.c: New test.
        * gcc.target/s390/vector/long-double-asm-commutative.c: New test.
        * gcc.target/s390/vector/long-double-asm-earlyclobber.c: New test.
        * gcc.target/s390/vector/long-double-asm-in-out.c: New test.
        * gcc.target/s390/vector/long-double-asm-inout.c: New test.
        * gcc.target/s390/vector/long-double-volatile-from-i64.c: New test.
--- gcc/config/s390/s390.c| 88 +++ gcc/config/s390/vector.md | 36 ++-- .../s390/vector/long-double-asm-abi.c | 26 ++ .../s390/vector/long-double-asm-commutative.c | 16 .../vector/long-double-asm-earlyclobber.c | 17 .../s390/vector/long-double-asm-in-out.c | 14 +++ .../s390/vector/long-double-asm-inout.c | 14 +++ .../s390/vector/long-double-asm-matching.c| 13 +++ .../vector/long-double-volatile-from-i64.c| 22 + 9 files changed, 241 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-abi.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-commutative.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-earlyclobber.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-inout.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-matching.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-volatile-from-i64.c diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 9d2cee950d0..d4b098325e8 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -16688,6 +16688,91 @@ s390_shift_truncation_mask (machine_mode mode) return mode == DImode || mode == SImode ? 63 : 0; } +/* Return TRUE iff CONSTRAINT is an "f" constraint, possibly with additional + modifiers. */ + +static bool +f_constraint_p (const char *constraint) +{ + for (size_t i = 0, c_len = strlen (constraint); i < c_len; + i += CONSTRAINT_LEN (constraint[i], constraint + i)) +{ + if (constraint[i] == 'f') + return true; +} + return false; +} + +/* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f" + constraints when long doubles are stored in vector registers. 
*/ + +static rtx_insn * +s390_md_asm_adjust (vec &outputs, vec &inputs, + vec &input_modes, + vec &constraints, vec & /*clobbers*/, + HARD_REG_SET & /*clobbered_regs*/) +{ + if (!TARGET_VXE) +/* Long doubles are stored in FPR pairs - nothing to do. */ +return NULL; + + rtx_insn *after_md_seq = NULL, *after_md_end = NULL; + + unsigned ninputs = inputs.length (); + unsigned noutputs = outputs.length (); + for (unsigned i = 0; i < noutputs; i++) +{ + if (GET_MODE (outputs[i]) != TFmode) + /* Not a long double - nothing to do. */ + continue; + const char *constraint = constraints[i]; + bool allows_mem, allows_reg, is_inout; + bool ok = parse_output_constraint (&constraint, i, ninputs, noutputs, +&allows_mem, &allows_reg, &is_inout); + gcc_assert (ok); + if (!f_constraint_p (constraint + 1)) + /* Long double with a constraint other than "=f" - nothing to do. */ + continue; + gcc_assert (allows_reg); + gcc_assert (!allows_mem); + gcc_assert (!is_inout); + /* Copy output value from a FPR pair into a vector register. */ + rtx fprx2 = gen_reg_rtx (FPRX2mode); + push_to_sequence2 (after_md_seq, after_md_end); + emit_insn (gen_fprx2_to_tf (outputs[i], fprx2)); + after_md_seq = get_insns (); + after_md_end = get_last_insn (); + end_sequence (); + outputs[i] = fprx2; +} + + for
Re: [PATCH] IBM Z: Fix usage of "f" constraint with long doubles
On Wed, 2021-01-27 at 08:58 +0100, Andreas Krebbel wrote:
> On 1/18/21 10:54 PM, Ilya Leoshkevich wrote:
> ...
>
> > +static rtx_insn *
> > +s390_md_asm_adjust (vec &outputs, vec &inputs,
> > + vec &input_modes,
> > + vec &constraints, vec &
> > /*clobbers*/,
> > + HARD_REG_SET & /*clobbered_regs*/)
> > +{
> > + if (!TARGET_VXE)
> > +/* Long doubles are stored in FPR pairs - nothing to do. */
> > +return NULL;
> > +
> > + rtx_insn *after_md_seq = NULL, *after_md_end = NULL;
> > +
> > + unsigned ninputs = inputs.length ();
> > + unsigned noutputs = outputs.length ();
> > + for (unsigned i = 0; i < noutputs; i++)
> > +{
> > + if (GET_MODE (outputs[i]) != TFmode)
> > + /* Not a long double - nothing to do. */
> > + continue;
> > + const char *constraint = constraints[i];
> > + bool allows_mem, allows_reg, is_inout;
> > + bool ok = parse_output_constraint (&constraint, i, ninputs,
> > noutputs,
> > +&allows_mem, &allows_reg,
> > &is_inout);
> > + gcc_assert (ok);
> > + if (strcmp (constraint, "=f") != 0)
> > + /* Long double with a constraint other than "=f" - nothing to
> > do. */
> > + continue;
>
> What about other constraint modifiers like & and %? Don't we need to
> handle matching constraints as
> well here?

Oh, right - we need to account for %?!*&# and maybe some others.  I'll
just copy the code from parse_output_constraint() that skips over all
of them, because I don't think they need any special handling - we just
need to make sure they don't mess up the recognition of "=f".

I don't think we need to explicitly support matching constraints,
because parse_input_constraint() will resolve them for us.  I'll add a
test for this just in case.

Do we make use of multi-alternative constraints on s390?  I think not,
because our instructions are fairly rigid, but maybe I'm missing
something?

...
> > diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md > > index 0e3c31f5d4f..1332a65a1d1 100644 > > --- a/gcc/config/s390/vector.md > > +++ b/gcc/config/s390/vector.md > > @@ -616,12 +616,23 @@ (define_insn "*vec_tf_to_v1tf_vr" > > vlvgp\t%v0,%1,%N1" > >[(set_attr "op_type" "VRR,VRX,VRX,VRI,VRR")]) > > > > -(define_insn "*fprx2_to_tf" > > - [(set (match_operand:TF 0 "nonimmediate_operand" > > "=v") > > - (subreg:TF (match_operand:FPRX2 1 "general_operand" "f") > > 0))] > > +(define_insn_and_split "fprx2_to_tf" > > + [(set (match_operand:TF 0 "nonimmediate_operand" > > "=v,R") > > + (subreg:TF (match_operand:FPRX2 1 > > "general_operand" "f,f") 0))] > >"TARGET_VXE" > > - "vmrhg\t%v0,%1,%N1" > > - [(set_attr "op_type" "VRR")]) > > + "@ > > + vmrhg\t%v0,%1,%N1 > > + #" > > + "!(MEM_P (operands[0]) && MEM_VOLATILE_P (operands[0]))" > > + [(set (match_dup 2) (match_dup 3)) > > + (set (match_dup 4) (match_dup 5))] > > +{ > > + operands[2] = simplify_gen_subreg (DFmode, operands[0], TFmode, > > 0); > > + operands[3] = simplify_gen_subreg (DFmode, operands[1], > > FPRX2mode, 0); > > + operands[4] = simplify_gen_subreg (DFmode, operands[0], TFmode, > > 8); > > + operands[5] = simplify_gen_subreg (DFmode, operands[1], > > FPRX2mode, 8); > > +} > > + [(set_attr "op_type" "VRR,*")]) > > Splitting an address like this might cause the displacement to > overflow in the second part. This > would require an additional reg to make the address valid again. > Which in turn will be a problem > after reload. You can use the 'AR' constraint for the memory > alternative. That way reload will make > sure the address is offsetable. Ok, thanks for the hint!
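The displacement concern raised in the review above can be made concrete with a little arithmetic. The sketch below is plain C++, not GCC code, and rests on one labeled assumption: that a "short" s390 displacement is a 12-bit unsigned field, i.e. 0..4095. The helper names `short_disp_ok` and `splittable_in_place` are hypothetical. The idea is that a 16-byte access at displacement D is split into two 8-byte accesses at D and D + 8, and the second one must still fit in the field.

```cpp
#include <cassert>

// Assumption for this sketch: short displacements are 12-bit unsigned.
constexpr long kMaxShortDisp = (1L << 12) - 1;  // 4095

// Does DISP fit into a short displacement field?
bool short_disp_ok(long disp) {
  return disp >= 0 && disp <= kMaxShortDisp;
}

// Hypothetical helper: can a 16-byte access at DISP be split into two
// 8-byte accesses at DISP and DISP + 8 without needing an extra
// register to rebuild the address?
bool splittable_in_place(long disp) {
  return short_disp_ok(disp) && short_disp_ok(disp + 8);
}
```

Under that assumption a displacement of 4088 is valid for the full access but not for its second half, which is the situation an offsettable-address constraint such as AR is meant to exclude before the split happens.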
[PATCH v3] fwprop: Allow (subreg (mem)) simplifications
On Thu, 2021-01-21 at 12:29 +, Richard Sandiford wrote:
> Given what you said in the other message about combine, I agree this
> is a reasonable workaround.  I don't know whether it's suitable for
> stage 4 or whether it would need to wait for stage 1.

Thanks for reviewing!  I've implemented your suggestions in the patch
below.

Regarding stage 4, this can be seen as a part of the IBM Z regression
fix

https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html

- before long doubles were moved to vector registers and "f"
constraints were fixed up at the RTL level, code generation for small
glibc functions like __ieee754_sqrtl was fairly efficient.  I'm not
sure whether that issue is big enough to justify this common code
change at this point, but still.

v2 -> v3: Added single_ebb_p, added paradoxical subreg check, fixed
          formatting.

Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux
and s390x-redhat-linux.

Suppose we have:

    (set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
    (set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))

It is clearly profitable to propagate the first insn into the second
one and get:

    (set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))

fwprop actually manages to perform this, but doesn't think the result
is worth it, which results in unnecessary store/load sequences on s390.
Improve the situation by classifying SUBREG -> MEM changes as
profitable.

gcc/ChangeLog:

2021-01-15  Ilya Leoshkevich

        * fwprop.c (fwprop_propagation::classify_result): Allow
        (subreg (mem)) simplifications.
---
 gcc/fwprop.c | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index eff8f7cc141..123cc228630 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -176,7 +176,7 @@ namespace
     static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1;
     static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2;
 
-    fwprop_propagation (rtx_insn *, rtx, rtx);
+    fwprop_propagation (insn_info *, insn_info *, rtx, rtx);
 
     bool changed_mem_p () const { return result_flags & CHANGED_MEM; }
     bool folded_to_constants_p () const;
@@ -185,13 +185,20 @@ namespace
     bool check_mem (int, rtx) final override;
     void note_simplification (int, uint16_t, rtx, rtx) final override;
     uint16_t classify_result (rtx, rtx);
+
+  private:
+    const bool single_use_p;
+    const bool single_ebb_p;
   };
 }
 
 /* Prepare to replace FROM with TO in INSN.  */
 
-fwprop_propagation::fwprop_propagation (rtx_insn *insn, rtx from, rtx to)
-  : insn_propagation (insn, from, to)
+fwprop_propagation::fwprop_propagation (insn_info *use_insn,
+                                        insn_info *def_insn, rtx from, rtx to)
+  : insn_propagation (use_insn->rtl (), from, to),
+    single_use_p (def_insn->num_uses () == 1),
+    single_ebb_p (use_insn->ebb () == def_insn->ebb ())
 {
   should_check_mems = true;
   should_note_simplifications = true;
@@ -262,6 +269,22 @@ fwprop_propagation::classify_result (rtx old_rtx, rtx new_rtx)
       && GET_MODE (new_rtx) == GET_MODE_INNER (GET_MODE (from)))
     return PROFITABLE;
 
+  /* Allow (subreg (mem)) -> (mem) simplifications with the following
+     exceptions:
+     1) Propagating (mem)s into multiple uses is not profitable.
+     2) Propagating (mem)s across EBBs may not be profitable if the source EBB
+        runs less frequently.
+     3) Propagating (mem)s into paradoxical (subreg)s is not profitable.
+     4) Creating new (mem/v)s is not correct, since DCE will not remove the old
+        ones.  */
+  if (single_use_p
+      && single_ebb_p
+      && SUBREG_P (old_rtx)
+      && !paradoxical_subreg_p (old_rtx)
+      && MEM_P (new_rtx)
+      && !MEM_VOLATILE_P (new_rtx))
+    return PROFITABLE;
+
   return 0;
 }
 
@@ -363,7 +386,7 @@ try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn,
   rtx_insn *use_rtl = use_insn->rtl ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_rtl, dest, src);
+  fwprop_propagation prop (use_insn, def_insn, dest, src);
   if (!prop.apply_to_rvalue (&XEXP (note, 0)))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
@@ -426,7 +449,7 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change,
   rtx_insn *use_rtl = use_insn->rtl ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_rtl, dest, src);
+  fwprop_propagation prop (use_insn, def_insn, dest, src);
   if (!prop.apply_to_pattern (loc))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
-- 
2.26.2
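The condition added to classify_result can be read as a plain predicate over six facts. The following is only a sketch of that reading: RtxKind, RtxModel, PropagationContext and subreg_to_mem_profitable_p are hypothetical names made up for illustration and do not exist in GCC; only the six conjuncts are taken from the patch.

```cpp
#include <cassert>

// Hypothetical stand-ins for the properties the patch inspects.  None of
// these types exist in GCC; they only model the decision.
enum class RtxKind { Reg, Subreg, Mem };

struct RtxModel
{
  RtxKind kind;
  bool paradoxical; // only meaningful for Subreg
  bool is_volatile; // only meaningful for Mem
};

struct PropagationContext
{
  unsigned def_num_uses; // models def_insn->num_uses ()
  int def_ebb;           // models def_insn->ebb ()
  int use_ebb;           // models use_insn->ebb ()
};

// Returns true when the modelled (subreg) -> (mem) change passes all the
// exceptions listed in the patch comment.
bool
subreg_to_mem_profitable_p (const PropagationContext &ctx,
                            const RtxModel &old_rtx, const RtxModel &new_rtx)
{
  // A definition that is being propagated has at least one use.
  assert (ctx.def_num_uses >= 1);

  const bool single_use_p = ctx.def_num_uses == 1;
  const bool single_ebb_p = ctx.def_ebb == ctx.use_ebb;
  return single_use_p                       // exception 1
         && single_ebb_p                    // exception 2
         && old_rtx.kind == RtxKind::Subreg
         && !old_rtx.paradoxical            // exception 3
         && new_rtx.kind == RtxKind::Mem
         && !new_rtx.is_volatile;           // exception 4
}
```

In this model the example from the commit message (one use, same EBB, non-paradoxical subreg, non-volatile mem) is accepted, and flipping any single fact rejects it.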
Re: [PATCH] fwprop: Allow (subreg (mem)) simplifications
On Thu, 2021-01-21 at 10:49 +0000, Richard Sandiford wrote:
> Ilya Leoshkevich via Gcc-patches writes:
> > On Tue, 2021-01-19 at 09:41 +0100, Richard Biener wrote:
> > > On Mon, Jan 18, 2021 at 11:04 PM Ilya Leoshkevich via Gcc-patches
> > > wrote:
> > > > Suppose we have:
> > > >
> > > >     (set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
> > > >     (set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))
> > > >
> > > > It is clearly profitable to propagate the first insn into the
> > > > second one and get:
> > > >
> > > >     (set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))
> > > >
> > > > fwprop actually manages to perform this, but doesn't think the
> > > > result is worth it, which results in unnecessary store/load
> > > > sequences on s390. Improve the situation by classifying
> > > > SUBREG -> MEM changes as profitable.
> > >
> > > IIRC fwprop also propagates into multiple uses and replacing a
> > > non-MEM with a MEM is only good when the original MEM goes away -
> > > is that properly dealt with here?
> >
> > This is because of efficiency and not correctness reasons, right?
> > For correctness I already check MEM_VOLATILE_P (new_rtx). For
> > efficiency I think it would be reasonable to add
> > def_insn->num_uses () == 1 check (this passes my tests, I'm yet to
> > do a full regtest though).
>
> That sounds plausible, but I think there's also the issue that the
> mem could be in a less frequently executed block.
>
> A potential problem with checking num_uses is that it might make the
> boundary between fwprop and combine more fuzzy. If the propagation
> makes the original instruction redundant then we should remove it
> and take the cost of the removal into account when costing the
> propagation (as combine does). fwprop is instead set up for cases
> in which propagations are profitable even if the original instruction
> is kept.
>
> What prevents combine from handling this? Are the instructions in
> different blocks?

I wanted to do this before combine, because in __ieee754_sqrtl case
fwprop turns this (example from the commit message + the insn after
it):

    (set (reg:TF 63) (mem:TF (reg:DI 62)))
    (set (reg:FPRX2 66) (subreg:FPRX2 (reg:TF 63) 0))
    (set (reg:FPRX2 65)
         (asm_operands:FPRX2 ("sqxbr %0,%1") ("=f") 0
                             [(reg:FPRX2 66)]
                             [(asm_input:FPRX2 ("f"))]
                             []))

into this:

    (set (reg:TF 63) (mem:TF (reg:DI 62)))
    (set (reg:FPRX2 65)
         (asm_operands:FPRX2 ("sqxbr %0,%1") ("=f") 0
                             [(subreg:FPRX2 (reg:TF 63) 0)]
                             [(asm_input:FPRX2 ("f"))]
                             []))

by propagating (reg:FPRX2 66), and there is not much combine can do
about this anymore:

    (set (reg:FPRX2 65)
         (asm_operands:FPRX2 ("sqxbr %0,%1") ("=f") 0
                             [(mem:FPRX2 (reg:DI 62))]
                             [(asm_input:FPRX2 ("f"))]
                             []))

is not a valid insn.
[PATCH] PING Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
Hello,

I would like to ping the following patch:

Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html

It is needed for the following regression fix:

IBM Z: Fix usage of "f" constraint with long doubles
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html

Best regards,
Ilya
[PATCH v2] fwprop: Allow (subreg (mem)) simplifications
v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563800.html

v1 -> v2: Allow (mem) -> (subreg) propagation only for single uses.

Bootstrapped and regtested on x86_64-redhat-linux,
ppc64le-redhat-linux and s390x-redhat-linux. Ok for master?



Suppose we have:

    (set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
    (set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))

It is clearly profitable to propagate the first insn into the second
one and get:

    (set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))

fwprop actually manages to perform this, but doesn't think the result
is worth it, which results in unnecessary store/load sequences on s390.
Improve the situation by classifying SUBREG -> MEM changes as
profitable.

gcc/ChangeLog:

2021-01-15  Ilya Leoshkevich

        * fwprop.c (fwprop_propagation::classify_result): Allow
        (subreg (mem)) simplifications.
---
 gcc/fwprop.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index eff8f7cc141..02d3d507cbc 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -176,7 +176,7 @@ namespace
     static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1;
     static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2;
 
-    fwprop_propagation (rtx_insn *, rtx, rtx);
+    fwprop_propagation (rtx_insn *, insn_info *, rtx, rtx);
 
     bool changed_mem_p () const { return result_flags & CHANGED_MEM; }
     bool folded_to_constants_p () const;
@@ -185,13 +185,18 @@ namespace
     bool check_mem (int, rtx) final override;
     void note_simplification (int, uint16_t, rtx, rtx) final override;
     uint16_t classify_result (rtx, rtx);
+
+  private:
+    const bool single_use_p;
   };
 }
 
 /* Prepare to replace FROM with TO in INSN.  */
 
-fwprop_propagation::fwprop_propagation (rtx_insn *insn, rtx from, rtx to)
-  : insn_propagation (insn, from, to)
+fwprop_propagation::fwprop_propagation (rtx_insn *insn, insn_info *def_insn,
+                                        rtx from, rtx to)
+  : insn_propagation (insn, from, to),
+    single_use_p (def_insn->num_uses () == 1)
 {
   should_check_mems = true;
   should_note_simplifications = true;
@@ -262,6 +267,13 @@ fwprop_propagation::classify_result (rtx old_rtx, rtx new_rtx)
       && GET_MODE (new_rtx) == GET_MODE_INNER (GET_MODE (from)))
     return PROFITABLE;
 
+  /* Allow (subreg (mem)) -> (mem) simplifications.  Do not allow propagation
+     of (mem)s into multiple uses, since those are not profitable, as well as
+     creating new (mem/v)s, since DCE will not remove the old ones.  */
+  if (single_use_p && SUBREG_P (old_rtx) && MEM_P (new_rtx)
+      && !MEM_VOLATILE_P (new_rtx))
+    return PROFITABLE;
+
   return 0;
 }
 
@@ -363,7 +375,7 @@ try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn,
   rtx_insn *use_rtl = use_insn->rtl ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_rtl, dest, src);
+  fwprop_propagation prop (use_rtl, def_insn, dest, src);
   if (!prop.apply_to_rvalue (&XEXP (note, 0)))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
@@ -426,7 +438,7 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change,
   rtx_insn *use_rtl = use_insn->rtl ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_rtl, dest, src);
+  fwprop_propagation prop (use_rtl, def_insn, dest, src);
   if (!prop.apply_to_pattern (loc))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
-- 
2.26.2
Re: [PATCH] fwprop: Allow (subreg (mem)) simplifications
On Tue, 2021-01-19 at 09:41 +0100, Richard Biener wrote:
> On Mon, Jan 18, 2021 at 11:04 PM Ilya Leoshkevich via Gcc-patches
> wrote:
> >
> > Suppose we have:
> >
> >     (set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
> >     (set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))
> >
> > It is clearly profitable to propagate the first insn into the
> > second one and get:
> >
> >     (set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))
> >
> > fwprop actually manages to perform this, but doesn't think the
> > result is worth it, which results in unnecessary store/load
> > sequences on s390. Improve the situation by classifying
> > SUBREG -> MEM changes as profitable.
>
> IIRC fwprop also propagates into multiple uses and replacing a
> non-MEM with a MEM is only good when the original MEM goes away - is
> that properly dealt with here?

This is because of efficiency and not correctness reasons, right? For
correctness I already check MEM_VOLATILE_P (new_rtx). For efficiency I
think it would be reasonable to add def_insn->num_uses () == 1 check
(this passes my tests, I'm yet to do a full regtest though).

What do you think about this?
[PATCH] fwprop: Allow (subreg (mem)) simplifications
Bootstrapped and regtested on x86_64-redhat-linux,
ppc64le-redhat-linux and s390x-redhat-linux. I realize it might be too
late for a change like this, but it's desirable to have this in
conjunction with the
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html s390
regression fix, which otherwise produces unnecessary store/load
sequences in certain glibc routines, e.g. __ieee754_sqrtl. Ok for
master?



Suppose we have:

    (set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
    (set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))

It is clearly profitable to propagate the first insn into the second
one and get:

    (set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))

fwprop actually manages to perform this, but doesn't think the result
is worth it, which results in unnecessary store/load sequences on s390.
Improve the situation by classifying SUBREG -> MEM changes as
profitable.

gcc/ChangeLog:

2021-01-15  Ilya Leoshkevich

        * fwprop.c (fwprop_propagation::classify_result): Allow
        (subreg (mem)) simplifications.

gcc/testsuite/ChangeLog:

2021-01-15  Ilya Leoshkevich

        * gcc.target/s390/vector/long-double-to-i64.c: Expect that
        float-vector moves do *not* happen.
---
 gcc/fwprop.c                                              | 5 +++++
 gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c | 3 +--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index eff8f7cc141..46b8ec7eccf 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -262,6 +262,11 @@ fwprop_propagation::classify_result (rtx old_rtx, rtx new_rtx)
       && GET_MODE (new_rtx) == GET_MODE_INNER (GET_MODE (from)))
     return PROFITABLE;
 
+  /* Allow (subreg (mem)) -> (mem) simplifications.  However, do not allow
+     creating new (mem/v)s, since DCE will not remove the old ones.  */
+  if (SUBREG_P (old_rtx) && MEM_P (new_rtx) && !MEM_VOLATILE_P (new_rtx))
+    return PROFITABLE;
+
   return 0;
 }
 
diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c b/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c
index 2dbbb5d1c03..8f4e377ed72 100644
--- a/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c
+++ b/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c
@@ -10,8 +10,7 @@ long_double_to_i64 (long double x)
   return x;
 }
 
-/* { dg-final { scan-assembler-times {\n\tvpdi\t%v\d+,%v\d+,%v\d+,1\n} 1 } } */
-/* { dg-final { scan-assembler-times {\n\tvpdi\t%v\d+,%v\d+,%v\d+,5\n} 1 } } */
+/* { dg-final { scan-assembler-not {\n\tvpdi\t} } } */
 /* { dg-final { scan-assembler-times {\n\tcgxbr\t} 1 } } */
 
 int
-- 
2.26.2
[PATCH] IBM Z: Fix usage of "f" constraint with long doubles
Bootstrapped and regtested on s390x-redhat-linux. Depends on
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html; ok
for master once the dependency is committed?



After switching the s390 backend to store long doubles in vector
registers, "f" constraint broke when used with the former: long doubles
correspond to TFmode, which in combination with "f" corresponds to hard
regs %v0-%v15, however, asm users expect a %f0-%f15 pair.

Fix by using TARGET_MD_ASM_ADJUST hook to convert TFmode values to
FPRX2mode and back.

gcc/ChangeLog:

2020-12-14  Ilya Leoshkevich

        * config/s390/s390.c (s390_md_asm_adjust): Implement
        TARGET_MD_ASM_ADJUST.
        (TARGET_MD_ASM_ADJUST): Likewise.
        * config/s390/vector.md (fprx2_to_tf): Rename from *fprx2_to_tf,
        add memory alternative.
        (tf_to_fprx2): New pattern.

gcc/testsuite/ChangeLog:

2020-12-14  Ilya Leoshkevich

        * gcc.target/s390/vector/long-double-asm-abi.c: New test.
        * gcc.target/s390/vector/long-double-asm-in-out.c: New test.
        * gcc.target/s390/vector/long-double-asm-inout.c: New test.
        * gcc.target/s390/vector/long-double-volatile-from-i64.c: New test.
--- gcc/config/s390/s390.c| 73 +++ gcc/config/s390/vector.md | 36 +++-- .../s390/vector/long-double-asm-abi.c | 26 +++ .../s390/vector/long-double-asm-in-out.c | 14 .../s390/vector/long-double-asm-inout.c | 14 .../vector/long-double-volatile-from-i64.c| 22 ++ 6 files changed, 180 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-abi.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-inout.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-volatile-from-i64.c diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 9d2cee950d0..a22fd9fe391 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -16688,6 +16688,76 @@ s390_shift_truncation_mask (machine_mode mode) return mode == DImode || mode == SImode ? 63 : 0; } +/* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f" + constraints when long doubles are stored in vector registers. */ + +static rtx_insn * +s390_md_asm_adjust (vec &outputs, vec &inputs, + vec &input_modes, + vec &constraints, vec & /*clobbers*/, + HARD_REG_SET & /*clobbered_regs*/) +{ + if (!TARGET_VXE) +/* Long doubles are stored in FPR pairs - nothing to do. */ +return NULL; + + rtx_insn *after_md_seq = NULL, *after_md_end = NULL; + + unsigned ninputs = inputs.length (); + unsigned noutputs = outputs.length (); + for (unsigned i = 0; i < noutputs; i++) +{ + if (GET_MODE (outputs[i]) != TFmode) + /* Not a long double - nothing to do. */ + continue; + const char *constraint = constraints[i]; + bool allows_mem, allows_reg, is_inout; + bool ok = parse_output_constraint (&constraint, i, ninputs, noutputs, +&allows_mem, &allows_reg, &is_inout); + gcc_assert (ok); + if (strcmp (constraint, "=f") != 0) + /* Long double with a constraint other than "=f" - nothing to do. 
*/ + continue; + gcc_assert (allows_reg); + gcc_assert (!allows_mem); + gcc_assert (!is_inout); + /* Copy output value from a FPR pair into a vector register. */ + rtx fprx2 = gen_reg_rtx (FPRX2mode); + push_to_sequence2 (after_md_seq, after_md_end); + emit_insn (gen_fprx2_to_tf (outputs[i], fprx2)); + after_md_seq = get_insns (); + after_md_end = get_last_insn (); + end_sequence (); + outputs[i] = fprx2; +} + + for (unsigned i = 0; i < ninputs; i++) +{ + if (GET_MODE (inputs[i]) != TFmode) + /* Not a long double - nothing to do. */ + continue; + const char *constraint = constraints[noutputs + i]; + bool allows_mem, allows_reg; + bool ok = parse_input_constraint (&constraint, i, ninputs, noutputs, 0, + constraints.address (), &allows_mem, + &allows_reg); + gcc_assert (ok); + if (strcmp (constraint, "f") != 0 && strcmp (constraint, "=f") != 0) + /* Long double with a constraint other than "f" (or "=f" for inout + operands) - nothing to do. */ + continue; + gcc_assert (allows_reg); + gcc_assert (!allows_mem); + /* Copy input value from a vector register into a FPR pair. */ + rtx fprx2 = gen_reg_rtx (FPRX2mode); + emit_insn (gen_tf_to_fprx2 (fprx2, inputs[i])); + inputs[i] =
[PATCH] lra: clear lra_insn_recog_data after simplifying a mem subreg
Hello,

I ran into this problem when writing new patterns for s390. I'm not
100% sure this fix is correct, but it resolves my issue and survives
bootstrap and regtest on x86_64-redhat-linux, ppc64le-redhat-linux and
s390x-redhat-linux.

Could you please take a look?

Best regards,
Ilya



Suppose we have:

    (insn (set (reg:FPRX2 70) (subreg:FPRX2 (reg/v:TF 63) 0)))

where operand_loc[0] points to r70 and operand_loc[1] points to r63.
If r63 is spilled, remove_pseudos() will change this insn to:

    (insn (set (reg:FPRX2 70)
               (subreg:FPRX2 (mem/c:TF (plus:DI (reg:DI %fp)
                                                (const_int 144))) 0)))

This is fine so far: rtx pointed to by operand_loc[1] has been changed
from (reg) to (mem), but its slot is still under (subreg). However,
alter_subreg() will simplify this insn to:

    (insn (set (reg:FPRX2 70)
               (mem/c:FPRX2 (plus:DI (reg:DI %fp) (const_int 144)))))

The (subreg) is gone, and therefore operand_loc[1] is no longer valid.
This will prevent process_insn_for_elimination() from updating the
spill slot offset, causing miscompilation: different instructions will
refer to the same spill slot using different offsets.

Fix by clearing all the cached data, and not just
used_insn_alternative.

gcc/ChangeLog:

2021-01-13  Ilya Leoshkevich

        * lra-spills.c (remove_pseudos): Call
        lra_update_insn_recog_data() after calling alter_subreg() on a
        (mem).
---
 gcc/lra-spills.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/lra-spills.c b/gcc/lra-spills.c
index 26f56b2df02..01bd82574e7 100644
--- a/gcc/lra-spills.c
+++ b/gcc/lra-spills.c
@@ -431,7 +431,7 @@ remove_pseudos (rtx *loc, rtx_insn *insn)
           alter_subreg (loc, false);
           if (GET_CODE (*loc) == MEM)
             {
-              lra_get_insn_recog_data (insn)->used_insn_alternative = -1;
+              lra_update_insn_recog_data (insn);
               if (lra_dump_file != NULL)
                 fprintf (lra_dump_file,
                          "Memory subreg was simplified in insn #%u\n",
-- 
2.26.2
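The stale operand_loc problem described above can be reproduced in miniature. This is a toy model, not LRA code: Node, simplify_subreg_of_mem and recompute_operand_loc are hypothetical names, and shared_ptr stands in for GCC's garbage-collected rtx. It only shows why a cached pointer to the slot under the (subreg) stops affecting the insn once the (subreg) has been replaced.

```cpp
#include <cassert>
#include <memory>

// Hypothetical miniature of an RTL expression: a code, an offset (used
// for MEM only) and at most one operand.
struct Node
{
  enum Code { REG, SUBREG, MEM } code;
  long offset;
  std::shared_ptr<Node> operand;
};

using NodePtr = std::shared_ptr<Node>;

// Models alter_subreg () on (subreg (mem)): the slot that held the SUBREG
// is overwritten with a fresh MEM, so the SUBREG's operand slot is no
// longer part of the insn.
void
simplify_subreg_of_mem (NodePtr *loc)
{
  assert ((*loc)->code == Node::SUBREG);
  assert ((*loc)->operand && (*loc)->operand->code == Node::MEM);
  const long offset = (*loc)->operand->offset;
  *loc = std::make_shared<Node> (Node{Node::MEM, offset, nullptr});
}

// Models refreshing the cached operand location after the insn changed:
// under a SUBREG the operand is one level down, otherwise it is the slot
// itself.
NodePtr *
recompute_operand_loc (NodePtr *insn_src)
{
  return (*insn_src)->code == Node::SUBREG ? &(*insn_src)->operand : insn_src;
}
```

In a scenario where the cached location is taken before the simplification and an offset update is then applied through it, the write lands in the detached old (mem) and the insn keeps its old offset. Going through recompute_operand_loc first makes the same write reach the insn, which is the effect the patch gets from refreshing all cached data.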
[PATCH] IBM Z: Fix constraints in vpdi patterns
Bootstrapped and regtested on s390x-redhat-linux. Ok for master?



The destination register is only partially overwritten, so + should be
used instead of =.

gcc/ChangeLog:

2021-01-08  Ilya Leoshkevich

        * config/s390/vector.md (*tf_to_fprx2_0): Rename from
        *mov_tf_to_fprx2_0 for consistency, fix constraint.
        (*tf_to_fprx2_1): Rename from *mov_tf_to_fprx2_1 for
        consistency, fix constraint.
---
 gcc/config/s390/vector.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 5b8d75f18f0..0e3c31f5d4f 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -737,16 +737,16 @@ (define_insn "*vec_perm"
   "vperm\t%v0,%v1,%v2,%v3"
   [(set_attr "op_type" "VRR")])
 
-(define_insn "*mov_tf_to_fprx2_0"
-  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 0)
+(define_insn "*tf_to_fprx2_0"
+  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
         (subreg:DF (match_operand:TF 1 "general_operand" "v") 0))]
   "TARGET_VXE"
   ; M4 == 1 corresponds to %v0[0] = %v1[0]; %v0[1] = %v0[1];
   "vpdi\t%v0,%v1,%v0,1"
   [(set_attr "op_type" "VRR")])
 
-(define_insn "*mov_tf_to_fprx2_1"
-  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 8)
+(define_insn "*tf_to_fprx2_1"
+  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
         (subreg:DF (match_operand:TF 1 "general_operand" "v") 8))]
   "TARGET_VXE"
   ; M4 == 5 corresponds to %V0[0] = %v1[1]; %V0[1] = %V0[1];
-- 
2.26.2
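To see why "+" is the right constraint, it may help to model what the two vpdi forms in these patterns compute. The sketch below is an emulation inferred only from the two "M4 == ..." comments in the patch; vpdi_model and V2DI are hypothetical names, and only the selector values 1 and 5 used by the patterns (plus 0 and 4, which follow from the same two bits) are covered.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V2DI = std::array<std::uint64_t, 2>;

// Simplified model of "vpdi v1,v2,v3,m4": the result's first doubleword
// comes from v2 (selected by bit value 4), the second one from v3
// (selected by bit value 1).
V2DI
vpdi_model (const V2DI &v2, const V2DI &v3, unsigned m4)
{
  assert ((m4 & ~5u) == 0); // only the two selector bits are modelled
  return V2DI{{ v2[(m4 & 4) ? 1 : 0], v3[(m4 & 1) ? 1 : 0] }};
}
```

Both patterns pass the destination as the third operand ("vpdi\t%v0,%v1,%v0,..."), so in this model the second doubleword of the result is always read from the old destination. A "=" constraint would tell the register allocator that the old value is dead, which is what the patch corrects.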
[PATCH v2] IBM Z: Introduce __LONG_DOUBLE_VX__ macro
v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563034.html v1 -> v2: Use TARGET_VXE_P instead of TARGET_Z14_P. Give end users the opportunity to find out whether long doubles are stored in floating-point register pairs or in vector registers, so that they could fine-tune their asm statements. gcc/ChangeLog: 2020-12-14 Ilya Leoshkevich * config/s390/s390-c.c (s390_def_or_undef_macro): Accept callables instead of mask values. (struct target_flag_set_p): New predicate. (s390_cpu_cpp_builtins_internal): Define or undefine __LONG_DOUBLE_VX__ macro. gcc/testsuite/ChangeLog: 2020-12-14 Ilya Leoshkevich * gcc.target/s390/vector/long-double-vx-macro-off.c: New test. * gcc.target/s390/vector/long-double-vx-macro-on.c: New test. --- gcc/config/s390/s390-c.c | 59 --- .../s390/vector/long-double-vx-macro-off-on.c | 11 .../s390/vector/long-double-vx-macro-on-off.c | 11 3 files changed, 60 insertions(+), 21 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c diff --git a/gcc/config/s390/s390-c.c b/gcc/config/s390/s390-c.c index 95cd2df505d..a5f5f56311a 100644 --- a/gcc/config/s390/s390-c.c +++ b/gcc/config/s390/s390-c.c @@ -294,9 +294,9 @@ s390_macro_to_expand (cpp_reader *pfile, const cpp_token *tok) /* Helper function that defines or undefines macros. If SET is true, the macro MACRO_DEF is defined. If SET is false, the macro MACRO_UNDEF is undefined. Nothing is done if SET and WAS_SET have the same value. */ +template static void -s390_def_or_undef_macro (cpp_reader *pfile, -unsigned int mask, +s390_def_or_undef_macro (cpp_reader *pfile, F is_set, const struct cl_target_option *old_opts, const struct cl_target_option *new_opts, const char *macro_def, const char *macro_undef) @@ -304,8 +304,8 @@ s390_def_or_undef_macro (cpp_reader *pfile, bool was_set; bool set; - was_set = (!old_opts) ? 
false : old_opts->x_target_flags & mask; - set = new_opts->x_target_flags & mask; + was_set = (!old_opts) ? false : is_set (old_opts); + set = is_set (new_opts); if (was_set == set) return; if (set) @@ -314,6 +314,19 @@ s390_def_or_undef_macro (cpp_reader *pfile, cpp_undef (pfile, macro_undef); } +struct target_flag_set_p +{ + target_flag_set_p (unsigned int mask) : m_mask (mask) {} + + bool + operator() (const struct cl_target_option *opts) const + { +return opts->x_target_flags & m_mask; + } + + unsigned int m_mask; +}; + /* Internal function to either define or undef the appropriate system macros. */ static void @@ -321,18 +334,18 @@ s390_cpu_cpp_builtins_internal (cpp_reader *pfile, struct cl_target_option *opts, const struct cl_target_option *old_opts) { - s390_def_or_undef_macro (pfile, MASK_OPT_HTM, old_opts, opts, - "__HTM__", "__HTM__"); - s390_def_or_undef_macro (pfile, MASK_OPT_VX, old_opts, opts, - "__VX__", "__VX__"); - s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts, - "__VEC__=10303", "__VEC__"); - s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts, - "__vector=__attribute__((vector_size(16)))", + s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_OPT_HTM), old_opts, + opts, "__HTM__", "__HTM__"); + s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_OPT_VX), old_opts, + opts, "__VX__", "__VX__"); + s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_ZVECTOR), old_opts, + opts, "__VEC__=10303", "__VEC__"); + s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_ZVECTOR), old_opts, + opts, "__vector=__attribute__((vector_size(16)))", "__vector__"); - s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts, - "__bool=__attribute__((s390_vector_bool)) unsigned", - "__bool"); + s390_def_or_undef_macro ( + pfile, target_flag_set_p (MASK_ZVECTOR), old_opts, opts, + "__bool=__attribute__((s390_vector_bool)) unsigned", "__bool"); { char macro_def[64]; gcc_assert (s390_arch != PROCESSOR_NATIVE); @@ -340,16 +353,20 @@ 
s390_cpu_cpp_builtins_internal (cpp_reader *pfile, cpp_undef (pfile, "__ARCH__"); cpp_define (pfile, macro_def); } + s390_def_or_undef_macro ( +
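As a rough illustration of how user code might consume the new macro, here is a sketch that branches on __LONG_DOUBLE_VX__ at compile time. Only the macro name comes from the patch; LongDoubleHome, long_double_home and describe are hypothetical helpers, and the use of __s390__ / __s390x__ to detect the target is an assumption of this example, not something the patch touches.

```cpp
#include <cassert>
#include <string>

// Where a long double value is expected to live on the current target.
enum class LongDoubleHome { NotS390, FprPair, VectorRegister };

// Compile-time query; the preprocessor picks exactly one return statement.
constexpr LongDoubleHome
long_double_home ()
{
#if defined (__LONG_DOUBLE_VX__)
  return LongDoubleHome::VectorRegister;
#elif defined (__s390__) || defined (__s390x__)
  return LongDoubleHome::FprPair;
#else
  return LongDoubleHome::NotS390;
#endif
}

// Human-readable description, e.g. for a build-time diagnostic.
std::string
describe (LongDoubleHome home)
{
  switch (home)
    {
    case LongDoubleHome::VectorRegister:
      return "long doubles live in vector registers";
    case LongDoubleHome::FprPair:
      return "long doubles live in floating-point register pairs";
    case LongDoubleHome::NotS390:
      return "not an s390 target";
    }
  assert (false && "unknown LongDoubleHome");
  return "";
}
```

Real users would more likely put the #if directly around two variants of an asm statement; the enum is used here only so the example stays portable.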
[PATCH] IBM Z: Introduce __LONG_DOUBLE_VX__ macro
Bootstrapped and regtested on s390x-redhat-linux. Ok for master? Give end users the opportunity to find out whether long doubles are stored in floating-point register pairs or in vector registers, so that they could fine-tune their asm statements. gcc/ChangeLog: 2020-12-14 Ilya Leoshkevich * config/s390/s390-c.c (s390_def_or_undef_macro): Accept callables instead of mask values. (struct target_flag_set_p): New predicate. (s390_cpu_cpp_builtins_internal): Define or undefine __LONG_DOUBLE_VX__ macro. gcc/testsuite/ChangeLog: 2020-12-14 Ilya Leoshkevich * gcc.target/s390/vector/long-double-vx-macro-off.c: New test. * gcc.target/s390/vector/long-double-vx-macro-on.c: New test. --- gcc/config/s390/s390-c.c | 59 --- .../s390/vector/long-double-vx-macro-off-on.c | 11 .../s390/vector/long-double-vx-macro-on-off.c | 11 3 files changed, 60 insertions(+), 21 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c diff --git a/gcc/config/s390/s390-c.c b/gcc/config/s390/s390-c.c index 95cd2df505d..29b87d76ab1 100644 --- a/gcc/config/s390/s390-c.c +++ b/gcc/config/s390/s390-c.c @@ -294,9 +294,9 @@ s390_macro_to_expand (cpp_reader *pfile, const cpp_token *tok) /* Helper function that defines or undefines macros. If SET is true, the macro MACRO_DEF is defined. If SET is false, the macro MACRO_UNDEF is undefined. Nothing is done if SET and WAS_SET have the same value. */ +template static void -s390_def_or_undef_macro (cpp_reader *pfile, -unsigned int mask, +s390_def_or_undef_macro (cpp_reader *pfile, F is_set, const struct cl_target_option *old_opts, const struct cl_target_option *new_opts, const char *macro_def, const char *macro_undef) @@ -304,8 +304,8 @@ s390_def_or_undef_macro (cpp_reader *pfile, bool was_set; bool set; - was_set = (!old_opts) ? false : old_opts->x_target_flags & mask; - set = new_opts->x_target_flags & mask; + was_set = (!old_opts) ? 
false : is_set (old_opts); + set = is_set (new_opts); if (was_set == set) return; if (set) @@ -314,6 +314,19 @@ s390_def_or_undef_macro (cpp_reader *pfile, cpp_undef (pfile, macro_undef); } +struct target_flag_set_p +{ + target_flag_set_p (unsigned int mask) : m_mask (mask) {} + + bool + operator() (const struct cl_target_option *opts) const + { +return opts->x_target_flags & m_mask; + } + + unsigned int m_mask; +}; + /* Internal function to either define or undef the appropriate system macros. */ static void @@ -321,18 +334,18 @@ s390_cpu_cpp_builtins_internal (cpp_reader *pfile, struct cl_target_option *opts, const struct cl_target_option *old_opts) { - s390_def_or_undef_macro (pfile, MASK_OPT_HTM, old_opts, opts, - "__HTM__", "__HTM__"); - s390_def_or_undef_macro (pfile, MASK_OPT_VX, old_opts, opts, - "__VX__", "__VX__"); - s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts, - "__VEC__=10303", "__VEC__"); - s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts, - "__vector=__attribute__((vector_size(16)))", + s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_OPT_HTM), old_opts, + opts, "__HTM__", "__HTM__"); + s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_OPT_VX), old_opts, + opts, "__VX__", "__VX__"); + s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_ZVECTOR), old_opts, + opts, "__VEC__=10303", "__VEC__"); + s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_ZVECTOR), old_opts, + opts, "__vector=__attribute__((vector_size(16)))", "__vector__"); - s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts, - "__bool=__attribute__((s390_vector_bool)) unsigned", - "__bool"); + s390_def_or_undef_macro ( + pfile, target_flag_set_p (MASK_ZVECTOR), old_opts, opts, + "__bool=__attribute__((s390_vector_bool)) unsigned", "__bool"); { char macro_def[64]; gcc_assert (s390_arch != PROCESSOR_NATIVE); @@ -340,16 +353,20 @@ s390_cpu_cpp_builtins_internal (cpp_reader *pfile, cpp_undef (pfile, "__ARCH__"); cpp_define (pfile, 
macro_def); } + s390_def_or_undef_macro ( + pfile, + [] (const struc
[PATCH] Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
Bootstrapped and regtested on x86_64-redhat-linux. I also built
cross-compilers for arm-linux-gnueabi, cris-elf, mn10300-elf,
nds32-linux-gnu, pdp11-aout (didn't fully work due to
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg251887.html,
but the changed code compiled fine), powerpc-linux-gnu, vax-linux-gnu
and visium-elf, but didn't test them. I ran into this issue while
implementing TARGET_MD_ASM_ADJUST for s390. Ok for master?



If TARGET_MD_ASM_ADJUST changes a mode of an input operand (which
should be ok as long as the hook itself as well as after_md_seq make up
for it), input_mode will contain stale information.

It might be tempting to fix this by removing input_mode altogether and
just using GET_MODE (), but this will not work correctly with
constants. So add input_modes parameter and document that it should be
updated whenever inputs parameter is updated.

gcc/ChangeLog:

2021-01-05  Ilya Leoshkevich

        * cfgexpand.c (expand_asm_loc): Pass new parameter.
        (expand_asm_stmt): Likewise.
        * config/arm/aarch-common-protos.h (arm_md_asm_adjust): Add new
        parameter.
        * config/arm/aarch-common.c (arm_md_asm_adjust): Likewise.
        * config/arm/arm.c (thumb1_md_asm_adjust): Likewise.
        * config/cris/cris.c (cris_md_asm_adjust): Likewise.
        * config/i386/i386.c (ix86_md_asm_adjust): Likewise.
        * config/mn10300/mn10300.c (mn10300_md_asm_adjust): Likewise.
        * config/nds32/nds32.c (nds32_md_asm_adjust): Likewise.
        * config/pdp11/pdp11.c (pdp11_md_asm_adjust): Likewise.
        * config/rs6000/rs6000.c (rs6000_md_asm_adjust): Likewise.
        * config/vax/vax.c (vax_md_asm_adjust): Likewise.
        * config/visium/visium.c (visium_md_asm_adjust): Likewise.
        * target.def (md_asm_adjust): Likewise.
--- gcc/cfgexpand.c | 16 gcc/config/arm/aarch-common-protos.h | 8 gcc/config/arm/aarch-common.c| 7 --- gcc/config/arm/arm.c | 14 -- gcc/config/cris/cris.c | 7 --- gcc/config/i386/i386.c | 7 --- gcc/config/mn10300/mn10300.c | 7 --- gcc/config/nds32/nds32.c | 1 + gcc/config/pdp11/pdp11.c | 9 + gcc/config/rs6000/rs6000.c | 7 --- gcc/config/vax/vax.c | 3 ++- gcc/config/visium/visium.c | 12 +++- gcc/doc/tm.texi | 10 ++ gcc/target.def | 13 - 14 files changed, 69 insertions(+), 52 deletions(-) diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index b73019b241f..e25528261a0 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -2879,6 +2879,7 @@ expand_asm_loc (tree string, int vol, location_t locus) rtx asm_op, clob; unsigned i, nclobbers; auto_vec input_rvec, output_rvec; + auto_vec input_mode; auto_vec constraints; auto_vec clobber_rvec; HARD_REG_SET clobbered_regs; @@ -2888,9 +2889,8 @@ expand_asm_loc (tree string, int vol, location_t locus) clobber_rvec.safe_push (clob); if (targetm.md_asm_adjust) - targetm.md_asm_adjust (output_rvec, input_rvec, - constraints, clobber_rvec, - clobbered_regs); + targetm.md_asm_adjust (output_rvec, input_rvec, input_mode, + constraints, clobber_rvec, clobbered_regs); asm_op = body; nclobbers = clobber_rvec.length (); @@ -3067,8 +3067,8 @@ expand_asm_stmt (gasm *stmt) return; } - /* There are some legacy diagnostics in here, and also avoids a - sixth parameger to targetm.md_asm_adjust. */ + /* There are some legacy diagnostics in here, and also avoids an extra + parameter to targetm.md_asm_adjust. */ save_input_location s_i_l(locus); unsigned noutputs = gimple_asm_noutputs (stmt); @@ -3419,9 +3419,9 @@ expand_asm_stmt (gasm *stmt) the flags register. 
*/ rtx_insn *after_md_seq = NULL; if (targetm.md_asm_adjust) -after_md_seq = targetm.md_asm_adjust (output_rvec, input_rvec, - constraints, clobber_rvec, - clobbered_regs); +after_md_seq + = targetm.md_asm_adjust (output_rvec, input_rvec, input_mode, +constraints, clobber_rvec, clobbered_regs); /* Do not allow the hook to change the output and input count, lest it mess up the operand numbering. */ diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h index 251de3d61a8..cbef50dde71 100644 --- a/gcc/config/arm/aarch-common-protos.h +++ b/gcc/config/arm/aarch-common-protos.h @@ -143,9 +143,9 @@ struct cpu_cost_table const struct vector_cost_table vect; }; -rtx_insn * -arm_md_asm_adjust (vec &outputs, vec &/*inputs*/, -
[PATCH] IBM Z: Fix check_effective_target_s390_z14_hw
Bootstrapped and regtested on z14. Ok for master?



Commit 2f473f4b065d ("IBM Z: Do not run long double tests on old
machines") introduced a predicate for tests that must run only on z14+.
However, due to a syntax error, the predicate always returns false.

gcc/testsuite/ChangeLog:

2020-12-10  Ilya Leoshkevich

        * gcc.target/s390/s390.exp: Replace %% with %.
---
 gcc/testsuite/gcc.target/s390/s390.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/s390/s390.exp b/gcc/testsuite/gcc.target/s390/s390.exp
index ba493de9f95..57b2690f8ab 100644
--- a/gcc/testsuite/gcc.target/s390/s390.exp
+++ b/gcc/testsuite/gcc.target/s390/s390.exp
@@ -197,7 +197,7 @@ proc check_effective_target_s390_z14_hw { } {
         int main (void)
         {
             int x = 0;
-            asm ("msgrkc %%0,%%0,%%0" : "+r" (x) : );
+            asm ("msgrkc %0,%0,%0" : "+r" (x) : );
             return x;
         }
     }] "-march=z14 -m64 -mzarch" ] } { return 0 } else { return 1 }
-- 
2.26.2
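The effect of the doubled percent signs can be illustrated with a small model of how an extended asm template is expanded. This is a deliberately simplified sketch, not GCC's implementation: expand_asm_template is a hypothetical helper that only understands "%%" (a literal percent sign) and single-digit "%N" operand references, and the register name used for the operand is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Expands a template the way extended asm treats '%': "%%" yields a
// literal '%', "%N" yields the text of operand N.  Everything else is
// copied through unchanged.
std::string
expand_asm_template (const std::string &tmpl,
                     const std::vector<std::string> &operands)
{
  std::string out;
  for (std::size_t i = 0; i < tmpl.size (); ++i)
    {
      if (tmpl[i] != '%' || i + 1 == tmpl.size ())
        {
          out += tmpl[i];
          continue;
        }
      const char next = tmpl[++i];
      if (next == '%')
        out += '%';
      else
        {
          // Only single-digit operand numbers are modelled.
          assert (next >= '0' && next <= '9');
          const std::size_t n = static_cast<std::size_t> (next - '0');
          assert (n < operands.size ());
          out += operands[n];
        }
    }
  return out;
}
```

Under this model the old template never references operand 0: it expands to the literal text "msgrkc %0,%0,%0", which the assembler then has to reject, so the effective-target check cannot succeed. The fixed template substitutes the allocated register.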
[PATCH v2] aix: Fixinclude updates [PR98208]
On Fri, 2020-12-11 at 07:51 -0500, Nathan Sidwell wrote:
> 
> I'm pretty sure this is wrong.  I think the test_text in inclhack.def
> should be a pre-fixed string that the testsuite presumably checks is
> converted.

You're right; I've added your change from the Bugzilla and updated the
expectation.  Does the following look better?



After 92648faa1cb2 ("aix: Fixinclude") make check-fixincludes began to
fail (at least on gcc121 machine).  Fix by updating fixincludes/tests
and rerunning genfixes.

Co-developed-by: Nathan Sidwell

fixincludes/ChangeLog:

2020-12-11  Ilya Leoshkevich

	* fixincl.x: Rerun genfixes.
	* inclhack.def(aix_physadr_t): Change test_text to something
	that needs to be replaced.
	* tests/base/sys/types.h(aix_physadr_t): Add expectation.
---
 fixincludes/fixincl.x              | 4 ++--
 fixincludes/inclhack.def           | 2 +-
 fixincludes/tests/base/sys/types.h | 5 +
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/fixincludes/fixincl.x b/fixincludes/fixincl.x
index 21439652bce..cc17edfba0b 100644
--- a/fixincludes/fixincl.x
+++ b/fixincludes/fixincl.x
@@ -2,11 +2,11 @@
  *
  * DO NOT EDIT THIS FILE (fixincl.x)
  *
- * It has been AutoGen-ed October 21, 2020 at 10:43:22 AM by AutoGen 5.18.16
+ * It has been AutoGen-ed December 9, 2020 at 11:16:08 AM by AutoGen 5.18.16
  * From the definitions    inclhack.def
  * and the template file fixincl
  */
-/* DO NOT SVN-MERGE THIS FILE, EITHER Wed Oct 21 10:43:22 EDT 2020
+/* DO NOT SVN-MERGE THIS FILE, EITHER Wed Dec 9 11:16:08 EST 2020
  *
  * You must regenerate it.  Use the ./genfixes script.
  *
diff --git a/fixincludes/inclhack.def b/fixincludes/inclhack.def
index 80c9adfb07c..3a4cfe06542 100644
--- a/fixincludes/inclhack.def
+++ b/fixincludes/inclhack.def
@@ -731,7 +731,7 @@ fix = {
     select    = "typedef[ \t]*struct[ \t]*([{][^}]*[}][ \t]*\\*[ \t]*physadr_t;)";
     c_fix     = format;
     c_fix_arg = "typedef struct __physadr_s %1";
-    test_text = "typedef struct __physadr_s {";
+    test_text = "typedef struct { random stuff } * physadr_t;";
 };
 
 /*
diff --git a/fixincludes/tests/base/sys/types.h b/fixincludes/tests/base/sys/types.h
index 683b5e93ecd..7340e76b175 100644
--- a/fixincludes/tests/base/sys/types.h
+++ b/fixincludes/tests/base/sys/types.h
@@ -9,6 +9,11 @@
 
 
 
+#if defined( AIX_PHYSADR_T_CHECK )
+typedef struct __physadr_s { random stuff } * physadr_t;
+#endif  /* AIX_PHYSADR_T_CHECK */
+
+
 #if defined( GNU_TYPES_CHECK )
 #if !defined(_GCC_PTRDIFF_T)
 #define _GCC_PTRDIFF_T
-- 
2.25.4
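To see why the old test_text could never exercise the fix, the select
expression can be tried outside fixincludes.  The following is only a
rough sketch using POSIX regcomp/regexec; apply_physadr_fix is a
hypothetical helper, and it assumes that the "format" c_fix replaces the
matched text with c_fix_arg, substituting %1 with the first group.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Same extended regex as the "select" line of the aix_physadr_t fix.  */
static const char select_re[] =
  "typedef[ \t]*struct[ \t]*([{][^}]*[}][ \t]*\\*[ \t]*physadr_t;)";

/* Hypothetical helper (not part of fixincludes): emulate
   c_fix_arg = "typedef struct __physadr_s %1".
   Returns 1 and fills OUT if TEXT matches, 0 otherwise.  */
static int
apply_physadr_fix (const char *text, char *out, size_t out_size)
{
  regex_t re;
  regmatch_t m[2];
  int matched;

  if (regcomp (&re, select_re, REG_EXTENDED) != 0)
    return 0;
  matched = regexec (&re, text, 2, m, 0) == 0;
  regfree (&re);
  if (!matched)
    return 0;

  /* Text before the match, the replacement with group 1 in place of
     %1, then the text after the match.  */
  snprintf (out, out_size, "%.*stypedef struct __physadr_s %.*s%s",
	    (int) m[0].rm_so, text,
	    (int) (m[1].rm_eo - m[1].rm_so), text + m[1].rm_so,
	    text + m[0].rm_eo);
  return 1;
}
```

With the new test_text as input this should produce the line added to
tests/base/sys/types.h, whereas the old test_text does not match the
select expression at all, since the regex expects "{" right after
"struct".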
[PATCH] aix: Fixinclude updates [PR98208]
Tested on gcc121 (x86_64 CentOS Linux 7).  Ok for master?

After 92648faa1cb2 ("aix: Fixinclude") make check-fixincludes began to
fail (at least on gcc121 machine).  Fix by updating fixincludes/tests
and rerunning genfixes.

fixincludes/ChangeLog:

2020-12-11  Ilya Leoshkevich

	* fixincl.x: Rerun genfixes.
	* tests/base/sys/types.h: Add AIX_PHYSADR_T_CHECK.
---
 fixincludes/fixincl.x              | 4 ++--
 fixincludes/tests/base/sys/types.h | 5 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fixincludes/fixincl.x b/fixincludes/fixincl.x
index 21439652bce..cc17edfba0b 100644
--- a/fixincludes/fixincl.x
+++ b/fixincludes/fixincl.x
@@ -2,11 +2,11 @@
  *
  * DO NOT EDIT THIS FILE (fixincl.x)
  *
- * It has been AutoGen-ed October 21, 2020 at 10:43:22 AM by AutoGen 5.18.16
+ * It has been AutoGen-ed December 9, 2020 at 11:16:08 AM by AutoGen 5.18.16
  * From the definitions    inclhack.def
  * and the template file fixincl
  */
-/* DO NOT SVN-MERGE THIS FILE, EITHER Wed Oct 21 10:43:22 EDT 2020
+/* DO NOT SVN-MERGE THIS FILE, EITHER Wed Dec 9 11:16:08 EST 2020
  *
  * You must regenerate it.  Use the ./genfixes script.
  *
diff --git a/fixincludes/tests/base/sys/types.h b/fixincludes/tests/base/sys/types.h
index 683b5e93ecd..a318f9b713b 100644
--- a/fixincludes/tests/base/sys/types.h
+++ b/fixincludes/tests/base/sys/types.h
@@ -9,6 +9,11 @@
 
 
 
+#if defined( AIX_PHYSADR_T_CHECK )
+typedef struct __physadr_s {
+#endif  /* AIX_PHYSADR_T_CHECK */
+
+
 #if defined( GNU_TYPES_CHECK )
 #if !defined(_GCC_PTRDIFF_T)
 #define _GCC_PTRDIFF_T
-- 
2.25.4
[PATCH] Limit perf data buffer during feature checking
Bootstrapped and regtested on x86_64-redhat-linux.  Ok for master?

Commit 2ead1ab91123 ("Limit perf data buffer during profiling") added
-m8 to perf invocations while running tests, but the same problem
exists when checking whether perf works in the first place.

gcc/testsuite/ChangeLog:

2020-12-08  Ilya Leoshkevich

	* lib/target-supports.exp(check_profiling_available): Limit
	perf data buffer.
---
 gcc/testsuite/lib/target-supports.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 89c4f67554f..75b4f5d0e85 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -654,7 +654,7 @@ proc check_profiling_available { test_what } {
 	    return 0
 	}
 	global srcdir
-	set status [remote_exec host "$srcdir/../config/i386/gcc-auto-profile" "true -v >/dev/null"]
+	set status [remote_exec host "$srcdir/../config/i386/gcc-auto-profile" "-m8 true -v >/dev/null"]
 	if { [lindex $status 0] != 0 } {
 	    verbose "autofdo not supported because perf does not work"
 	    return 0
-- 
2.25.4
Re: [PATCH v4 1/2] asan: specify alignment for LASANPC labels
On Thu, 2020-07-09 at 14:07 +0200, Ilya Leoshkevich wrote:
> On Wed, 2020-07-01 at 21:48 +0200, Ilya Leoshkevich wrote:
> > On Wed, 2020-07-01 at 11:57 -0600, Jeff Law wrote:
> > > On Wed, 2020-07-01 at 14:29 +0200, Ilya Leoshkevich via
> > > Gcc-patches wrote:
> > > > gcc/ChangeLog:
> > > > 
> > > > 2020-06-30  Ilya Leoshkevich
> > > > 
> > > > 	* asan.c (asan_emit_stack_protection): Use
> > > > 	CODE_LABEL_BOUNDARY.
> > > > 	* defaults.h (CODE_LABEL_BOUNDARY): New macro.
> > > > 	* doc/tm.texi: Document CODE_LABEL_BOUNDARY.
> > > > 	* doc/tm.texi.in: Likewise.
> > > Don't we already have the ability to set label alignments?  See
> > > LABEL_ALIGN.
> > 
> > The following works with -falign-labels=2:
> > 
> > --- a/gcc/asan.c
> > +++ b/gcc/asan.c
> > @@ -1524,7 +1524,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
> >    DECL_INITIAL (decl) = decl;
> >    TREE_ASM_WRITTEN (decl) = 1;
> >    TREE_ASM_WRITTEN (id) = 1;
> > -  SET_DECL_ALIGN (decl, CODE_LABEL_BOUNDARY);
> > +  SET_DECL_ALIGN (decl, (1 << LABEL_ALIGN (gen_label_rtx ())) * BITS_PER_UNIT);
> >    emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
> >    shadow_base = expand_binop (Pmode, lshr_optab, base,
> >                                gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
> > 
> > In order to go this way, we would need to raise `-falign-labels=`
> > default to 2 for s390, which is not incorrect, but would
> > unnecessarily clutter asm with `.align 2` before each label.  So
> > IMHO it would be nicer to simply ask the backend "what is your
> > target's instruction alignment?".
> 
> Besides that it would clutter asm with .align 2, another argument
> against using LABEL_ALIGN here is that it's semantically different
> from what is needed: -falign-labels value, which it returns, is
> specified by user for optimization purposes, whereas here we need to
> query the architecture's property.
> 
> In practical terms, if user specifies -falign-labels=4096, this would
> affect how the code is generated here.  However, this would be
> completely unnecessary: we never jump to decl, its address is only
> saved for reporting.

Hi Jeff,

Could you please have another look at this one?

Best regards,
Ilya
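For readers following the arithmetic in the quoted snippet, here is a
small standalone sketch of the conversion it performs.  It assumes, as
the snippet suggests, that the label alignment is given as a base-2
logarithm of a byte count; log2_bytes_to_bits is a made-up name, and
CHAR_BIT merely stands in for GCC's BITS_PER_UNIT.

```c
#include <limits.h>

/* Assumption: LOG2_BYTES is log2 of an alignment in bytes, which is
   how the quoted snippet treats the LABEL_ALIGN result.  CHAR_BIT is
   a stand-in for BITS_PER_UNIT.  Returns the alignment in bits, the
   unit the snippet passes to SET_DECL_ALIGN.  */
static unsigned int
log2_bytes_to_bits (unsigned int log2_bytes)
{
  return (1u << log2_bytes) * CHAR_BIT;
}
```

Under these assumptions and with 8-bit bytes, -falign-labels=2
corresponds to log2_bytes_to_bits (1), i.e. 16 bits, while
-falign-labels=4096 would correspond to log2_bytes_to_bits (12), i.e.
32768 bits, which is the unnecessary over-alignment mentioned above.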
Re: [PATCH RESEND] tree-ssa-threadbackward.c (profitable_jump_thread_path): Do not allow __builtin_constant_p.
On Wed, 2020-12-02 at 11:42 -0700, Jeff Law wrote:
> 
> On 12/1/20 7:09 PM, Ilya Leoshkevich wrote:
> > On Tue, 2020-12-01 at 15:34 -0700, Jeff Law wrote:
> > > No strong opinions.  I think whichever is less invasive in terms
> > > of code quality is probably the way to go.  What we want to avoid
> > > is suppressing threading unnecessarily as that often leads to
> > > false positives from middle-end based warnings.  Suppressing
> > > threading can also lead to build failures in the kernel due to
> > > the way they use b_c_p.
> > I think v1 is better then.  Would you mind approving the following?
> > That's the same code as in v1, but with the improved commit message
> > and comments.
> > 
> > 
> > 
> > Linux Kernel (specifically, drivers/leds/trigger/ledtrig-cpu.c)
> > build with GCC 10 fails on s390 with "impossible constraint".
> > 
> > Explanation by Jeff Law:
> > 
> > ```
> > So what we have is a b_c_p at the start of an if-else chain.
> > Subsequent tests on the "true" arm of the b_c_p test may throw us
> > off the constant path (because the constants are out of range).
> > Once all the tests are passed (it's constant and the constant is in
> > range) the true arm's terminal block has a special asm that
> > requires a constant argument.  In the case where we get to the
> > terminal block on the true arm, the argument to the b_c_p is used
> > as the constant argument to the special asm.
> > 
> > At first glance jump threading seems to be doing the right thing.
> > Except that we end up with two paths to that terminal block with
> > the special asm, one for each of the two constant arguments to the
> > b_c_p call.  Naturally since that same value is used in the asm, we
> > have to introduce a PHI to select between them at the head of the
> > terminal block.  Now the argument in the asm is no longer constant
> > and boom we fail.
> > ```
> > 
> > Fix by disallowing __builtin_constant_p on threading paths.
> > 
> > gcc/ChangeLog:
> > 
> > 2020-06-03  Ilya Leoshkevich
> > 
> > 	* tree-ssa-threadbackward.c
> > 	(thread_jumps::profitable_jump_thread_path): Do not allow
> > 	__builtin_constant_p on a threading path.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > 2020-06-03  Ilya Leoshkevich
> > 
> > 	* gcc.target/s390/builtin-constant-p-threading.c: New test.
> OK.  I think the old forward threader has the same problem.  Which I
> think can be fixed by returning NULL from
> record_temporary_equivalences_from_stmts_at_dest when we see the
> B_C_P call.  Fixing that in the obvious way is pre-approved once it's
> gone through the usual testing.

Thanks! I've committed both:

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=70a62009181f66d1d1c90d3c74de38e153c96eb0
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=614aff0adf8fba5d843ec894603160151c20f0aa

Best regards,
Ilya
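The shape of code described in the explanation above can be sketched
roughly as follows.  This is not the kernel source and not the new
test; scale and scale_generic are made-up names, and plain arithmetic
stands in for the asm that requires a constant operand.

```c
/* Out-of-line fallback for values not known at compile time.
   Hypothetical; stands in for the generic kernel path.  */
static int
scale_generic (int v)
{
  return v * 4;
}

/* A b_c_p test at the head of an if-else chain.  */
static inline __attribute__ ((always_inline)) int
scale (int v)
{
  if (__builtin_constant_p (v))
    {
      /* A range test that may throw us off the constant path.  */
      if (v >= 0 && v < 16)
	/* Terminal block of the "true" arm.  In the kernel this is an
	   asm whose operand must be a compile-time constant; here
	   ordinary arithmetic stands in for it.  */
	return v * 4;
    }
  return scale_generic (v);
}
```

Both arms compute the same value, so a call such as scale (3) yields 12
regardless of which path the compiler picks.  The problem discussed in
this thread only shows up when the terminal block really does need a
constant and threading merges two constant paths into it through a PHI.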
[PATCH] IBM Z: Build autovec-*-signaling-eq.c tests with exceptions
According to https://gcc.gnu.org/pipermail/gcc/2020-November/234344.html,
GCC is allowed to perform optimizations that remove floating point
traps, since they do not affect the modeled control flow.  This
interferes with two signaling comparison tests, where
(a <= b && a >= b) is turned into (a <= b && a == b) by
test_for_singularity, into ((a <= b) & (a == b)) by the vectorizer and
then into (a == b) by eliminate_redundant_comparison.

Fix by making traps affect the control flow by turning them into
exceptions.

gcc/testsuite/ChangeLog:

2020-12-03  Ilya Leoshkevich

	* gcc.target/s390/zvector/autovec-double-signaling-eq.c: Build
	with exceptions.
	* gcc.target/s390/zvector/autovec-float-signaling-eq.c: Likewise.
---
 .../gcc.target/s390/zvector/autovec-double-signaling-eq.c | 2 +-
 .../gcc.target/s390/zvector/autovec-float-signaling-eq.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
index a8402b9f705..3645d3cc393 100644
--- a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -march=z14 -mzvector -mzarch" } */
+/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions -fnon-call-exceptions" } */
 
 #include "autovec.h"
 
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
index 7dd91a5e6f3..d98aa0c494e 100644
--- a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -march=z14 -mzvector -mzarch" } */
+/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions -fnon-call-exceptions" } */
 
 #include "autovec.h"
 
-- 
2.25.4