PING [PATCH v5 0/2] IBM zSystems: Improve storing asan frame_pc

2022-10-17 Thread Ilya Leoshkevich via Gcc-patches
On Tue, 2022-09-27 at 02:23 +0200, Ilya Leoshkevich wrote:
> Hi,
> 
> This is a resend of v4 with slightly adjusted commit messages:
> 
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525016.html
> v2: https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525069.html
> v3: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548338.html
> v4: https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549252.html
> 
> It still survives the bootstrap and the regtest on x86_64-redhat-linux,
> s390x-redhat-linux and ppc64le-redhat-linux.  It also fixes [1].
> 
> I also tried the approach with moving .LASANPC closer to the function
> label and using FUNCTION_BOUNDARY instead of introducing
> CODE_LABEL_BOUNDARY, but the problem there is that it's hard to catch
> the moment where the function label is written.  Architectures can do
> it by calling ASM_OUTPUT_LABEL() or assemble_name() in
> ASM_DECLARE_FUNCTION_NAME(), ASM_OUTPUT_FUNCTION_LABEL() or
> TARGET_ASM_FUNCTION_PROLOGUE().  epiphany_start_function() does that
> twice, but passes the same decl to both calls.  Note that simply
> moving asan_function_start() to final_start_function_1() is not enough,
> since an architecture can write something after the function label.
> This all means that for this approach to work, all the architectures
> need to be adjusted, which looks like an overkill to me.
> 
> Best regards,
> Ilya
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593666.html
> 
> 
> Ilya Leoshkevich (2):
>   asan: specify alignment for LASANPC labels
>   IBM zSystems: Define CODE_LABEL_BOUNDARY
> 
>  gcc/asan.cc                                    |  1 +
>  gcc/config/s390/s390.h                         |  3 +++
>  gcc/defaults.h                                 |  5 +++++
>  gcc/doc/tm.texi                                |  4 ++++
>  gcc/doc/tm.texi.in                             |  4 ++++
>  gcc/testsuite/gcc.target/s390/asan-no-gotoff.c | 15 +++++++++++++++
>  6 files changed, 32 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c
> 



[PATCH v5 2/2] IBM zSystems: Define CODE_LABEL_BOUNDARY

2022-09-26 Thread Ilya Leoshkevich via Gcc-patches
Currently s390 emits the following sequence to store a frame_pc:

a:
.LASANPC0:

lg  %r1,.L5-.L4(%r13)
la  %r1,0(%r1,%r12)
stg %r1,176(%r11)

.L5:
.quad   .LASANPC0@GOTOFF

The reason GOT indirection is used instead of larl is that gcc does not
know that .LASANPC0, being a code label, is aligned on a 2-byte
boundary, and larl can load only even addresses.

Define CODE_LABEL_BOUNDARY in order to get rid of GOT indirection:

larl %r1,.LASANPC0
stg %r1,176(%r11)
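
For illustration only (not part of the patch), the alignment argument above
can be sketched in C.  The helper names are hypothetical; the only fact the
sketch encodes is the one stated above, namely that larl can load only even
addresses, so a label qualifies once its known alignment is at least 16 bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Bits per addressable unit; 8 is assumed here, as on s390.  */
#define BITS_PER_UNIT 8

/* Hypothetical helper: convert an alignment in bits to bytes.  */
static unsigned
align_in_bytes (unsigned align_bits)
{
  assert (align_bits % BITS_PER_UNIT == 0);
  return align_bits / BITS_PER_UNIT;
}

/* Hypothetical predicate: a label whose known alignment is a multiple
   of 2 bytes always has an even address, so larl can load it.  */
static bool
larl_can_reach (unsigned align_bits)
{
  return align_in_bytes (align_bits) % 2 == 0;
}
```

With the default of BITS_PER_UNIT the predicate is false, which corresponds
to the GOT indirection shown above; with CODE_LABEL_BOUNDARY set to 16 it
becomes true.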

gcc/ChangeLog:

2020-06-30  Ilya Leoshkevich  

* config/s390/s390.h (CODE_LABEL_BOUNDARY): Specify that s390
requires code labels to be aligned on a 2-byte boundary.

gcc/testsuite/ChangeLog:

2019-06-30  Ilya Leoshkevich  

* gcc.target/s390/asan-no-gotoff.c: New test.
---
 gcc/config/s390/s390.h                         |  3 +++
 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c | 15 +++++++++++++++
 2 files changed, 18 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c

diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index be566215df2..7d078ce6868 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -368,6 +368,9 @@ extern const char *s390_host_detect_local_cpu (int argc, const char **argv);
 /* Allocation boundary (in *bits*) for the code of a function.  */
 #define FUNCTION_BOUNDARY 64
 
+/* Alignment required for a code label, in bits.  */
+#define CODE_LABEL_BOUNDARY 16
+
 /* There is no point aligning anything to a rounder boundary than this.  */
 #define BIGGEST_ALIGNMENT 64
 
diff --git a/gcc/testsuite/gcc.target/s390/asan-no-gotoff.c b/gcc/testsuite/gcc.target/s390/asan-no-gotoff.c
new file mode 100644
index 00000000000..f555e4e96f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/asan-no-gotoff.c
@@ -0,0 +1,15 @@
+/* Test that ASAN labels are referenced without unnecessary indirections.  */
+
+/* { dg-do compile } */
+/* { dg-options "-fPIE -O2 -fsanitize=kernel-address --param asan-stack=1" } */
+
+extern void c (int *);
+
+void a ()
+{
+  int b;
+  c (&b);
+}
+
+/* { dg-final { scan-assembler {\tlarl\t%r\d+,\.LASANPC\d+} } } */
+/* { dg-final { scan-assembler-not {\.LASANPC\d+@GOTOFF} } } */
-- 
2.37.2



[PATCH v5 1/2] asan: specify alignment for LASANPC labels

2022-09-26 Thread Ilya Leoshkevich via Gcc-patches
gcc/ChangeLog:

2020-06-30  Ilya Leoshkevich  

* asan.cc (asan_emit_stack_protection): Use CODE_LABEL_BOUNDARY.
* defaults.h (CODE_LABEL_BOUNDARY): New macro.
* doc/tm.texi: Document CODE_LABEL_BOUNDARY.
* doc/tm.texi.in: Likewise.
---
 gcc/asan.cc        | 1 +
 gcc/defaults.h     | 5 +++++
 gcc/doc/tm.texi    | 4 ++++
 gcc/doc/tm.texi.in | 4 ++++
 4 files changed, 14 insertions(+)

diff --git a/gcc/asan.cc b/gcc/asan.cc
index 8276f12cc69..62f50ee769b 100644
--- a/gcc/asan.cc
+++ b/gcc/asan.cc
@@ -1960,6 +1960,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
   DECL_INITIAL (decl) = decl;
   TREE_ASM_WRITTEN (decl) = 1;
   TREE_ASM_WRITTEN (id) = 1;
+  SET_DECL_ALIGN (decl, CODE_LABEL_BOUNDARY);
   emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
   shadow_base = expand_binop (Pmode, lshr_optab, base,
  gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
diff --git a/gcc/defaults.h b/gcc/defaults.h
index 953605c1627..52a471cf08e 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -1455,4 +1455,9 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 typedef TARGET_UNIT target_unit;
 #endif
 
+/* Alignment required for a code label, in bits.  */
+#ifndef CODE_LABEL_BOUNDARY
+#define CODE_LABEL_BOUNDARY BITS_PER_UNIT
+#endif
+
 #endif  /* ! GCC_DEFAULTS_H */
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 858bfb80cec..cc588ee23b5 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -1075,6 +1075,10 @@ to a value equal to or larger than @code{STACK_BOUNDARY}.
 Alignment required for a function entry point, in bits.
 @end defmac
 
+@defmac CODE_LABEL_BOUNDARY
+Alignment required for a code label, in bits.
+@end defmac
+
 @defmac BIGGEST_ALIGNMENT
 Biggest alignment that any data type can require on this machine, in
 bits.  Note that this is not the biggest alignment that is supported,
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 21b849ea32a..a0b725b0685 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -971,6 +971,10 @@ to a value equal to or larger than @code{STACK_BOUNDARY}.
 Alignment required for a function entry point, in bits.
 @end defmac
 
+@defmac CODE_LABEL_BOUNDARY
+Alignment required for a code label, in bits.
+@end defmac
+
 @defmac BIGGEST_ALIGNMENT
 Biggest alignment that any data type can require on this machine, in
 bits.  Note that this is not the biggest alignment that is supported,
-- 
2.37.2
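
As a rough sketch of how the pieces of this patch fit together (an
illustration, not GCC code): a target header may define CODE_LABEL_BOUNDARY,
defaults.h falls back to BITS_PER_UNIT otherwise, and the asan code stores
the value as the label's alignment.  `struct label_decl` and
`set_label_align` are hypothetical stand-ins for the tree decl and for
SET_DECL_ALIGN; BITS_PER_UNIT is assumed to be 8.

```c
#include <assert.h>

#define BITS_PER_UNIT 8

/* A target header (s390.h in this series) may define the macro first:
   #define CODE_LABEL_BOUNDARY 16  */

/* Fallback mirroring the defaults.h hunk: no more than unit alignment.  */
#ifndef CODE_LABEL_BOUNDARY
#define CODE_LABEL_BOUNDARY BITS_PER_UNIT
#endif

/* Hypothetical stand-in for a label decl; only the alignment matters.  */
struct label_decl
{
  unsigned align_bits;
};

/* Models SET_DECL_ALIGN (decl, CODE_LABEL_BOUNDARY) from the asan.cc hunk.  */
static void
set_label_align (struct label_decl *decl)
{
  assert (decl != 0);
  decl->align_bits = CODE_LABEL_BOUNDARY;
}
```

Since no target definition is present in this sketch, the label ends up with
the default alignment of 8 bits.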



[PATCH v5 0/2] IBM zSystems: Improve storing asan frame_pc

2022-09-26 Thread Ilya Leoshkevich via Gcc-patches
Hi,

This is a resend of v4 with slightly adjusted commit messages:

v1: https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525016.html
v2: https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525069.html
v3: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548338.html
v4: https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549252.html

It still survives the bootstrap and the regtest on x86_64-redhat-linux,
s390x-redhat-linux and ppc64le-redhat-linux.  It also fixes [1].

I also tried the approach with moving .LASANPC closer to the function
label and using FUNCTION_BOUNDARY instead of introducing
CODE_LABEL_BOUNDARY, but the problem there is that it's hard to catch
the moment where the function label is written.  Architectures can do
it by calling ASM_OUTPUT_LABEL() or assemble_name() in
ASM_DECLARE_FUNCTION_NAME(), ASM_OUTPUT_FUNCTION_LABEL() or
TARGET_ASM_FUNCTION_PROLOGUE().  epiphany_start_function() does that
twice, but passes the same decl to both calls.  Note that simply
moving asan_function_start() to final_start_function_1() is not enough,
since an architecture can write something after the function label.
This all means that for this approach to work, all the architectures
need to be adjusted, which looks like an overkill to me.

Best regards,
Ilya

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593666.html


Ilya Leoshkevich (2):
  asan: specify alignment for LASANPC labels
  IBM zSystems: Define CODE_LABEL_BOUNDARY

 gcc/asan.cc                                    |  1 +
 gcc/config/s390/s390.h                         |  3 +++
 gcc/defaults.h                                 |  5 +++++
 gcc/doc/tm.texi                                |  4 ++++
 gcc/doc/tm.texi.in                             |  4 ++++
 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c | 15 +++++++++++++++
 6 files changed, 32 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c

-- 
2.37.2



Re: [PATCH] PR106342 - IBM zSystems: Provide vsel for all vector modes

2022-08-17 Thread Ilya Leoshkevich via Gcc-patches
On Thu, 2022-08-11 at 07:45 +0200, Andreas Krebbel wrote:
> On 8/10/22 13:42, Ilya Leoshkevich wrote:
> > On Wed, 2022-08-03 at 12:20 +0200, Ilya Leoshkevich wrote:
> > > Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?
> > > 
> > > 
> > > 
> > > dg.exp=pr104612.c fails with an ICE on s390x, because
> > > copysignv2sf3
> > > produces an insn that vsel is supposed to recognize, but
> > > can't,
> > > > because it's not defined for V2SF.  Fix by defining it for all
> > > > vector modes supported by copysign<mode>3.
> > > 
> > > gcc/ChangeLog:
> > > 
> > > * config/s390/vector.md (V_HW_FT): New iterator.
> > > * config/s390/vx-builtins.md (vsel): Use V instead
> > > of
> > > V_HW.
> > > ---
> > >  gcc/config/s390/vector.md      |  6 ++++++
> > >  gcc/config/s390/vx-builtins.md | 12 ++++++------
> > >  2 files changed, 12 insertions(+), 6 deletions(-)
> > 
> > Jakub pointed out that this is broken in gcc-12 as well.
> > The patch applies cleanly, and I started a bootstrap/regtest.
> > Ok for gcc-12?
> 
> Yes. Thanks!
> 
> Andreas

Hi,

I've committed this today without realizing that gcc-12 branch is
closed.  Sorry!  Please let me know if I should revert this.

Best regards,
Ilya


Re: [PATCH] PR106342 - IBM zSystems: Provide vsel for all vector modes

2022-08-10 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2022-08-03 at 12:20 +0200, Ilya Leoshkevich wrote:
> Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?
> 
> 
> 
> dg.exp=pr104612.c fails with an ICE on s390x, because copysignv2sf3
> produces an insn that vsel is supposed to recognize, but can't,
> because it's not defined for V2SF.  Fix by defining it for all vector
> modes supported by copysign<mode>3.
> 
> gcc/ChangeLog:
> 
> * config/s390/vector.md (V_HW_FT): New iterator.
> * config/s390/vx-builtins.md (vsel): Use V instead of
> V_HW.
> ---
>  gcc/config/s390/vector.md      |  6 ++++++
>  gcc/config/s390/vx-builtins.md | 12 ++++++------
>  2 files changed, 12 insertions(+), 6 deletions(-)

Jakub pointed out that this is broken in gcc-12 as well.
The patch applies cleanly, and I started a bootstrap/regtest.
Ok for gcc-12?


[PATCH] PR106342 - IBM zSystems: Provide vsel for all vector modes

2022-08-03 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?



dg.exp=pr104612.c fails with an ICE on s390x, because copysignv2sf3
produces an insn that vsel is supposed to recognize, but can't,
because it's not defined for V2SF.  Fix by defining it for all vector
modes supported by copysign<mode>3.
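
For readers unfamiliar with the pattern, here is a scalar C sketch of what
the vsel RTL in the diff below computes, (op1 & op3) | (~op3 & op2), and of
how a copysign can be expressed with it.  This is an illustration under
assumptions: the function names are hypothetical, a 32-bit IEEE float is
assumed, and the actual expander in vector.md is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Per-bit select with the same shape as the vsel pattern:
   take bits of OP1 where OP3 is 1, bits of OP2 where OP3 is 0.  */
static uint32_t
bit_select (uint32_t op1, uint32_t op2, uint32_t op3)
{
  return (op1 & op3) | (~op3 & op2);
}

/* Hypothetical scalar model of copysign built on the select.  */
static float
copysign_via_select (float magnitude, float sign)
{
  const uint32_t sign_mask = UINT32_C (0x80000000);
  uint32_t m, s, r;

  assert (sizeof (float) == sizeof (uint32_t));
  memcpy (&m, &magnitude, sizeof m);
  memcpy (&s, &sign, sizeof s);
  /* Sign bit from S, all remaining bits from M.  */
  r = bit_select (s, m, sign_mask);
  memcpy (&magnitude, &r, sizeof r);
  return magnitude;
}
```

The same select applied lane-wise to a V2SF value is the insn that, per the
description above, vsel could not recognize before this patch.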

gcc/ChangeLog:

* config/s390/vector.md (V_HW_FT): New iterator.
* config/s390/vx-builtins.md (vsel): Use V instead of
V_HW.
---
 gcc/config/s390/vector.md      |  6 ++++++
 gcc/config/s390/vx-builtins.md | 12 ++++++------
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index a6c4b4eb974..624729814af 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -63,6 +63,12 @@
   V1DF V2DF
   (V1TF "TARGET_VXE") (TF "TARGET_VXE")])
 
+; All modes present in V_HW and VFT.
+(define_mode_iterator V_HW_FT [V16QI V8HI V4SI V2DI (V1TI "TARGET_VXE") V1DF
+  V2DF (V1SF "TARGET_VXE") (V2SF "TARGET_VXE")
+  (V4SF "TARGET_VXE") (V1TF "TARGET_VXE")
+  (TF "TARGET_VXE")])
+
 ; FP vector modes directly supported by the HW.  This does not include
 ; vector modes using only part of a vector register and should be used
 ; for instructions which might trigger IEEE exceptions.
diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index d5130799804..98ee08b2683 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -517,12 +517,12 @@
 ; swapped in s390-c.cc when we get here.
 
 (define_insn "vsel<mode>"
-  [(set (match_operand:V_HW               0 "register_operand" "=v")
-        (ior:V_HW
-         (and:V_HW (match_operand:V_HW   1 "register_operand"  "v")
-                   (match_operand:V_HW   3 "register_operand"  "v"))
-         (and:V_HW (not:V_HW (match_dup 3))
-                   (match_operand:V_HW   2 "register_operand"  "v"))))]
+  [(set (match_operand:V_HW_FT               0 "register_operand" "=v")
+        (ior:V_HW_FT
+         (and:V_HW_FT (match_operand:V_HW_FT 1 "register_operand"  "v")
+                      (match_operand:V_HW_FT 3 "register_operand"  "v"))
+         (and:V_HW_FT (not:V_HW_FT (match_dup 3))
+                      (match_operand:V_HW_FT 2 "register_operand"  "v"))))]
   "TARGET_VX"
   "vsel\t%v0,%1,%2,%3"
   [(set_attr "op_type" "VRR")])
-- 
2.35.3



Re: [PATCH] Honor COMDAT for mergeable constant sections

2022-04-29 Thread Ilya Leoshkevich via Gcc-patches
On Fri, 2022-04-29 at 13:56 +0200, Jakub Jelinek wrote:
> On Fri, Apr 29, 2022 at 01:52:49PM +0200, Ilya Leoshkevich wrote:
> > > This doesn't resolve the problem, unfortunately, because
> > > references to discarded comdat symbols are still kept in .rodata:
> > > 
> > > `.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_' referenced
> > > in
> > > section `.rodata' of ../lib/libgtest.a(gtest-all.cc.o): defined
> > > in
> > > discarded section
> > > `.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_[_ZN7testing15
> > > Asse
> > > rt
> > > ionResultlsIPKcEERS0_RKT_]' of ../lib/libgtest.a(gtest-all.cc.o)
> > > 
> > > (That's from building zlib-ng with ASan and your patch on s390).
> > > 
> > > So I was rather thinking about adding a reloc parameter to
> > > mergeable_constant_section () and slightly changing the section
> > > name when it's nonzero, e.g. from .cst to .cstrel.
> > 
> > After some experimenting, I don't think that what I propose here
> > is a good solution anymore, since it won't work with
> > -fno-merge-constants.
> > 
> > What do you think about something like this?
> > 
> > --- a/gcc/varasm.cc
> > +++ b/gcc/varasm.cc
> > @@ -7326,6 +7326,10 @@ default_elf_select_rtx_section (machine_mode
> > mode, rtx x,
> >     return get_named_section (NULL, ".data.rel.ro", 3);
> >  }
> >  
> > +  if (reloc)
> > +    return targetm.asm_out.function_rodata_section
> > (current_function_decl,
> > +   false);
> > +
> >    return mergeable_constant_section (mode, align, 0);
> >  }
> > 
> > This would put constants with relocations into .rodata..
> > default_function_rodata_section () already ensures that these
> > sections
> > are in the right comdat group.
> 
> We don't really know if the emitted constant is purely for the
> current
> function, or also other functions (say emitted in as constant pool
> constant
> where constant pool constants are shared across the whole TU).
> For the former, putting it into current function's comdat is fine,
> for the
> latter certainly isn't.

mergeable_constant_section (), that the existing code calls in the same
context, already relies on this being known and calls
function_rodata_section () with exactly the same arguments.  If
!current_function_decl && !relocatable, we get readonly_data_section.
Of course, mergeable_constant_section () does not handle comdat
currently, so this point might be moot.

However, looking at the callers of output_constant_pool_contents (), it
seems that !current_function_decl happens in and only in the
shared_constant_pool case, so it looks as if we know whether the
constant is tied to a single function or not.


Re: [PATCH] Honor COMDAT for mergeable constant sections

2022-04-29 Thread Ilya Leoshkevich via Gcc-patches
On Thu, 2022-04-28 at 14:05 +0200, Ilya Leoshkevich wrote:
> On Thu, 2022-04-28 at 13:27 +0200, Jakub Jelinek wrote:
> > On Thu, Apr 28, 2022 at 01:03:26PM +0200, Ilya Leoshkevich wrote:
> > > This is determined by default_elf_select_rtx_section ().  If we
> > > don't
> > > want to mix non-reloc and reloc constants, we need to define a
> > > special
> > > section there.
> > > 
> > > It seems to me, however, that this all would be made purely for
> > > the
> > > sake of .LASANPC, which is quite special: it's local, but at the
> > > same
> > > time it might need to be comdat.  I don't think anything like
> > > this
> > > can
> > > appear from compiling C/C++ code.
> > > 
> > > Therefore I wonder if we could just drop it altogether like this?
> > > 
> > > @@ -1928,22 +1919,7 @@ asan_emit_stack_protection (rtx base, rtx
> > > pbase,
> > > unsigned int alignb,
> > > ...
> > > -  emit_move_insn (mem, expand_normal (build_fold_addr_expr
> > > (decl)));
> > > +  emit_move_insn (mem, expand_normal (build_fold_addr_expr
> > > (current_function_decl)));
> > > ...
> > > 
> > > That's what LLVM is already doing.  This will also solve the
> > > alignment
> > > problem I referred to earlier.
> > 
> > LLVM is doing a wrong thing here.  The global symbol can be
> > overridden by
> > a symbol in another shared library, that is definitely not what we
> > want,
> > because the ASAN record is for the particular implementation, not
> > the
> > other
> > one which could be quite different.
> 
> I see; this must be relevant when the overriding library calls
> the original one through dlsym (RTLD_NEXT).
> 
> > I think the right fix would be:
> > > --- gcc/varasm.cc.jj   2022-03-07 15:00:17.255592497 +0100
> > +++ gcc/varasm.cc   2022-04-28 13:22:44.463147066 +0200
> > @@ -7326,6 +7326,9 @@ default_elf_select_rtx_section (machine_
> > return get_named_section (NULL, ".data.rel.ro", 3);
> >  }
> >  
> > +  if (reloc)
> > +    return readonly_data_section;
> > +
> >    return mergeable_constant_section (mode, align, 0);
> >  }
> >  
> > which matches what we do in categorize_decl_for_section:
> >   else if (reloc & targetm.asm_out.reloc_rw_mask ())
> >     ret = reloc == 1 ? SECCAT_DATA_REL_RO_LOCAL :
> > SECCAT_DATA_REL_RO;
> >   else if (reloc || flag_merge_constants < 2
> > ...
> >     /* C and C++ don't allow different variables to share the
> > same
> >    location.  -fmerge-all-constants allows even that (at
> > the
> >    expense of not conforming).  */
> >     ret = SECCAT_RODATA;
> >   else if (DECL_INITIAL (decl)
> >    && TREE_CODE (DECL_INITIAL (decl)) == STRING_CST)
> >     ret = SECCAT_RODATA_MERGE_STR_INIT;
> >   else
> >     ret = SECCAT_RODATA_MERGE_CONST;
> > i.e. if reloc is true, it goes into .data.rel.ro* for -fpic and
> > .rodata
> > for non-pic, and mergeable sections are only used if there are no
> > relocations.
> 
> This doesn't resolve the problem, unfortunately, because
> references to discarded comdat symbols are still kept in .rodata:
> 
> `.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_' referenced in
> section `.rodata' of ../lib/libgtest.a(gtest-all.cc.o): defined in
> discarded section
> `.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_[_ZN7testing15Asse
> rt
> ionResultlsIPKcEERS0_RKT_]' of ../lib/libgtest.a(gtest-all.cc.o)
> 
> (That's from building zlib-ng with ASan and your patch on s390).
> 
> So I was rather thinking about adding a reloc parameter to
> mergeable_constant_section () and slightly changing the section
> name when it's nonzero, e.g. from .cst to .cstrel.

After some experimenting, I don't think that what I propose here
is a good solution anymore, since it won't work with
-fno-merge-constants.

What do you think about something like this?

--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -7326,6 +7326,10 @@ default_elf_select_rtx_section (machine_mode mode, rtx x,
         return get_named_section (NULL, ".data.rel.ro", 3);
     }
 
+  if (reloc)
+    return targetm.asm_out.function_rodata_section (current_function_decl,
+                                                    false);
+
   return mergeable_constant_section (mode, align, 0);
 }

This would put constants with relocations into .rodata..
default_function_rodata_section () already ensures that these sections
are in the right comdat group.
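
To make the proposed order of checks explicit, here is a simplified C model
(an illustration, not GCC code).  The enum and function names are
hypothetical, and the first test merely stands in for the existing branch
that returns .data.rel.ro, whose full condition is not visible in the hunk
above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical section kinds; GCC returns section objects instead.  */
enum section_kind
{
  SEC_DATA_REL_RO,      /* .data.rel.ro */
  SEC_FUNCTION_RODATA,  /* per-function read-only data, same COMDAT group */
  SEC_MERGEABLE         /* mergeable constant section */
};

/* Model of the decision order with the proposed change applied.  */
static enum section_kind
select_rtx_section_model (bool needs_rw_reloc, bool has_reloc)
{
  if (needs_rw_reloc)
    return SEC_DATA_REL_RO;      /* existing branch, unchanged */
  if (has_reloc)
    return SEC_FUNCTION_RODATA;  /* proposed new branch */
  return SEC_MERGEABLE;          /* only relocation-free constants merge */
}
```

In this model a constant that references .LASANPC has a relocation and so
never reaches the mergeable section.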


Re: [PATCH] Honor COMDAT for mergeable constant sections

2022-04-28 Thread Ilya Leoshkevich via Gcc-patches
On Thu, 2022-04-28 at 13:27 +0200, Jakub Jelinek wrote:
> On Thu, Apr 28, 2022 at 01:03:26PM +0200, Ilya Leoshkevich wrote:
> > This is determined by default_elf_select_rtx_section ().  If we
> > don't
> > want to mix non-reloc and reloc constants, we need to define a
> > special
> > section there.
> > 
> > It seems to me, however, that this all would be made purely for the
> > sake of .LASANPC, which is quite special: it's local, but at the
> > same
> > time it might need to be comdat.  I don't think anything like this
> > can
> > appear from compiling C/C++ code.
> > 
> > Therefore I wonder if we could just drop it altogether like this?
> > 
> > @@ -1928,22 +1919,7 @@ asan_emit_stack_protection (rtx base, rtx
> > pbase,
> > unsigned int alignb,
> > ...
> > -  emit_move_insn (mem, expand_normal (build_fold_addr_expr
> > (decl)));
> > +  emit_move_insn (mem, expand_normal (build_fold_addr_expr
> > (current_function_decl)));
> > ...
> > 
> > That's what LLVM is already doing.  This will also solve the
> > alignment
> > problem I referred to earlier.
> 
> LLVM is doing a wrong thing here.  The global symbol can be
> overridden by
> a symbol in another shared library, that is definitely not what we
> want,
> because the ASAN record is for the particular implementation, not the
> other
> one which could be quite different.

I see; this must be relevant when the overriding library calls
the original one through dlsym (RTLD_NEXT).

> I think the right fix would be:
> --- gcc/varasm.cc.jj   2022-03-07 15:00:17.255592497 +0100
> +++ gcc/varasm.cc   2022-04-28 13:22:44.463147066 +0200
> @@ -7326,6 +7326,9 @@ default_elf_select_rtx_section (machine_
> return get_named_section (NULL, ".data.rel.ro", 3);
>  }
>  
> +  if (reloc)
> +    return readonly_data_section;
> +
>    return mergeable_constant_section (mode, align, 0);
>  }
>  
> which matches what we do in categorize_decl_for_section:
>   else if (reloc & targetm.asm_out.reloc_rw_mask ())
>     ret = reloc == 1 ? SECCAT_DATA_REL_RO_LOCAL :
> SECCAT_DATA_REL_RO;
>   else if (reloc || flag_merge_constants < 2
> ...
>     /* C and C++ don't allow different variables to share the
> same
>    location.  -fmerge-all-constants allows even that (at the
>    expense of not conforming).  */
>     ret = SECCAT_RODATA;
>   else if (DECL_INITIAL (decl)
>    && TREE_CODE (DECL_INITIAL (decl)) == STRING_CST)
>     ret = SECCAT_RODATA_MERGE_STR_INIT;
>   else
>     ret = SECCAT_RODATA_MERGE_CONST;
> i.e. if reloc is true, it goes into .data.rel.ro* for -fpic and
> .rodata
> for non-pic, and mergeable sections are only used if there are no
> relocations.

This doesn't resolve the problem, unfortunately, because
references to discarded comdat symbols are still kept in .rodata:

`.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_' referenced in
section `.rodata' of ../lib/libgtest.a(gtest-all.cc.o): defined in
discarded section
`.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_[_ZN7testing15AssertionResultlsIPKcEERS0_RKT_]'
of ../lib/libgtest.a(gtest-all.cc.o)

(That's from building zlib-ng with ASan and your patch on s390).

So I was rather thinking about adding a reloc parameter to
mergeable_constant_section () and slightly changing the section
name when it's nonzero, e.g. from .cst to .cstrel.

> Anyway, I'd feel much safer to change it only in GCC 13, at least
> initially.

That's fine with me.

> Or are some linkers (say lld or mold, for ld.bfd I'm pretty sure it
> doesn't,
> for gold no idea but unlikely) able to merge even constants with
> relocations against them?

I'm not sure, but putting constants with relocations into a separate
mergeable section shouldn't hurt too much.  And if such a linker is
implemented some day, there would be no need to tweak gcc.


Re: [PATCH] Honor COMDAT for mergeable constant sections

2022-04-28 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2022-04-27 at 14:46 +0200, Jakub Jelinek wrote:
> On Wed, Apr 27, 2022 at 02:23:00PM +0200, Jakub Jelinek wrote:
> > On Wed, Apr 27, 2022 at 11:59:49AM +0200, Ilya Leoshkevich wrote:
> > > I get a .LASANPC reloc there in the first place because of
> > > https://patchwork.ozlabs.org/project/gcc/patch/20190702085154.26981-1-...@linux.ibm.com/
> > > but of course it may happen for other reasons as well.
> > 
> > In that case I don't see any benefit to put that into a mergeable
> > section.
> > Why does that happen?
> 
> Because, when a mergeable section doesn't contain any relocations, I
> don't
> see any point in making it comdat.  Because mergeable sections
> themselves
> are garbage collected, if some constant isn't referenced at all, it
> isn't
> emitted, or if referenced, multiple copies of the constant are merged
> (or
> for mergeable strings even string tail merging is performed).
> 
> Jakub
> 

This is determined by default_elf_select_rtx_section ().  If we don't
want to mix non-reloc and reloc constants, we need to define a special
section there.

It seems to me, however, that this all would be made purely for the
sake of .LASANPC, which is quite special: it's local, but at the same
time it might need to be comdat.  I don't think anything like this can
appear from compiling C/C++ code.

Therefore I wonder if we could just drop it altogether like this?

@@ -1928,22 +1919,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
...
-  emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
+  emit_move_insn (mem, expand_normal (build_fold_addr_expr (current_function_decl)));
...

That's what LLVM is already doing.  This will also solve the alignment
problem I referred to earlier.


Re: [PATCH] Honor COMDAT for mergeable constant sections

2022-04-27 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2022-04-27 at 11:59 +0200, Ilya Leoshkevich via Gcc-patches
wrote:
> On Wed, 2022-04-27 at 11:33 +0200, Jakub Jelinek wrote:
> > On Wed, Apr 27, 2022 at 11:27:49AM +0200, Ilya Leoshkevich via Gcc-
> > patches wrote:
> > > Bootstrapped and regtested on x86_64-redhat-linux and
> > > s390x-redhat-linux.  Ok for master (or GCC 13 in case this
> > > doesn't
> > > fit
> > > stage4 criteria)?
> > 
> > I'd prefer to defer this to GCC 13 at this point.
> > Furthermore, does the linker then actually merge the constants with
> > the same constants from other mergeable linkonce sections or other
> > mergeable sections?  I'm afraid it would only merge constants
> > within
> > each comdat group and not across the whole ELF object.
> > 
> > Jakub
> > 
> 
> I experimented with this a little, and actually having a reloc
> prevents
> merging altogether (the check happens in _bfd_add_merge_section).
> 
> I get a .LASANPC reloc there in the first place because of
> https://patchwork.ozlabs.org/project/gcc/patch/20190702085154.26981-1-...@linux.ibm.com/
> but of course it may happen for other reasons as well.

I just realized I forgot to mention the "normal" case.
There, "aMG" seems to work fine with the whole ELF:

$ cat 1.s
.globl _start
_start:
ret
.section .rodata.xxx,"aMG",@progbits,8,.xxx,comdat
.quad 42

$ cat 2.s
.section .rodata.yyy,"aMG",@progbits,8,.yyy,comdat
.quad 42
.quad 43
.section .rodata.xxx,"aMG",@progbits,8,.xxx,comdat
.quad 42

$ gcc -nostartfiles -fPIE 1.s 2.s
$ objdump -D a.out

2000 <.rodata>:
2000:   2a 00   sub    (%rax),%al
2002:   00 00   add    %al,(%rax)
2004:   00 00   add    %al,(%rax)
2006:   00 00   add    %al,(%rax)
2008:   2b 00   sub    (%rax),%eax
200a:   00 00   add    %al,(%rax)
200c:   00 00   add    %al,(%rax)
...
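
The bytes above are the constants 42 (0x2a) and 43 (0x2b), each stored once.
A toy C model of that outcome, for illustration only: `merge_constants` is a
hypothetical name, the quadratic scan is chosen for brevity, and it does not
distinguish between COMDAT deduplication and SHF_MERGE merging, which are
separate linker mechanisms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy each entry of IN to OUT unless an equal entry is already there.
   OUT must have room for N_IN entries.  Returns the number kept.  */
static size_t
merge_constants (const uint64_t *in, size_t n_in, uint64_t *out)
{
  size_t n_out = 0;

  assert (in != NULL && out != NULL);
  for (size_t i = 0; i < n_in; i++)
    {
      size_t j;

      /* Look for an identical constant that was already emitted.  */
      for (j = 0; j < n_out; j++)
        if (out[j] == in[i])
          break;
      if (j == n_out)
        out[n_out++] = in[i];
    }
  return n_out;
}
```

Feeding it the four .quad values from 1.s and 2.s in order (42, 42, 43, 42)
leaves two entries, matching the 16 bytes of .rodata shown above.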



Re: [PATCH] Honor COMDAT for mergeable constant sections

2022-04-27 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2022-04-27 at 11:33 +0200, Jakub Jelinek wrote:
> On Wed, Apr 27, 2022 at 11:27:49AM +0200, Ilya Leoshkevich via Gcc-
> patches wrote:
> > Bootstrapped and regtested on x86_64-redhat-linux and
> > s390x-redhat-linux.  Ok for master (or GCC 13 in case this doesn't
> > fit
> > stage4 criteria)?
> 
> I'd prefer to defer this to GCC 13 at this point.
> Furthermore, does the linker then actually merge the constants with
> the same constants from other mergeable linkonce sections or other
> mergeable sections?  I'm afraid it would only merge constants within
> each comdat group and not across the whole ELF object.
> 
> Jakub
> 

I experimented with this a little, and actually having a reloc prevents
merging altogether (the check happens in _bfd_add_merge_section).

I get a .LASANPC reloc there in the first place because of
https://patchwork.ozlabs.org/project/gcc/patch/20190702085154.26981-1-...@linux.ibm.com/
but of course it may happen for other reasons as well.


[PATCH] Honor COMDAT for mergeable constant sections

2022-04-27 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on x86_64-redhat-linux and
s390x-redhat-linux.  Ok for master (or GCC 13 in case this doesn't fit
stage4 criteria)?



Building C++ template-heavy code with ASan sometimes leads to bogus
"defined in discarded section" linker errors.

The reason is that .rodata.FUNC.cstN sections are not placed into
COMDAT group sections FUNC.  This is important, because ASan puts
references to .LASANPC labels into these sections.  Discarding the
respective .text.FUNC section causes the linker error.

Fix by adding SECTION_LINKONCE to .rodata.FUNC.cstN sections in
mergeable_constant_section () if the current function has an associated
COMDAT group.  This is similar to what switch_to_exception_section ()
is currently doing with .gcc_except_table.FUNC sections.
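
A standalone C sketch of the naming and flag logic described above, for
illustration only.  The flag values and the function name are hypothetical
(GCC's real SECTION_* constants differ); the name format and the entry size
stored in the low flag bits follow the varasm.cc hunk below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag values; GCC's real SECTION_* constants differ.  */
#define MODEL_SECTION_MERGE    0x1000u
#define MODEL_SECTION_LINKONCE 0x2000u

/* Build the "PREFIX.cstN" name into NAME and return the section flags.
   ALIGN_BITS is the constant's alignment in bits.  */
static unsigned
mergeable_section_model (const char *prefix, unsigned align_bits,
                         bool function_in_comdat,
                         char *name, size_t name_size)
{
  unsigned flags = (align_bits / 8) | MODEL_SECTION_MERGE;
  int len;

  assert (strlen (prefix) > 0);
  len = snprintf (name, name_size, "%s.cst%u", prefix, align_bits / 8);
  assert (len > 0 && (size_t) len < name_size);

  /* The fix: follow the function into its COMDAT group.  */
  if (function_in_comdat)
    flags |= MODEL_SECTION_LINKONCE;
  return flags;
}
```

For a 64-bit aligned constant of a COMDAT function FUNC this yields the name
.rodata.FUNC.cst8 with the linkonce flag set.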

gcc/ChangeLog:

* varasm.cc (mergeable_constant_section): Honor COMDAT.

gcc/testsuite/ChangeLog:

* g++.dg/asan/comdat.C: New test.
---
 gcc/testsuite/g++.dg/asan/comdat.C | 35 +++++++++++++++++++++++++++++++++++
 gcc/varasm.cc                      |  6 +++++-
 2 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/asan/comdat.C

diff --git a/gcc/testsuite/g++.dg/asan/comdat.C b/gcc/testsuite/g++.dg/asan/comdat.C
new file mode 100644
index 00000000000..cd4f3f830a8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/asan/comdat.C
@@ -0,0 +1,35 @@
+/* Check that we don't emit non-COMDAT rodata.  */
+
+/* { dg-do compile } */
+/* { dg-final { scan-assembler-not {\.section\t\.rodata\._ZN1hlsIPKcEERS_RKT_\.cst[48],"[^"]*",@progbits,[48]\n} } } */
+
+const char *a;
+
+class b
+{
+public:
+  b ();
+};
+
+class h
+{
+public:
+  template <typename c>
+  h &
+  operator<< (const c &)
+  {
+    d (b ());
+    return *this;
+  }
+
+  void d (b);
+};
+
+h e ();
+
+h
+g ()
+{
+  e () << a << a << a;
+  throw;
+}
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index c41f17d64f7..f2614f0ee39 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -938,7 +938,11 @@ mergeable_constant_section (machine_mode mode ATTRIBUTE_UNUSED,
 
       sprintf (name, "%s.cst%d", prefix, (int) (align / 8));
       flags |= (align / 8) | SECTION_MERGE;
-      return get_section (name, flags, NULL);
+      if (current_function_decl
+          && DECL_COMDAT_GROUP (current_function_decl)
+          && HAVE_COMDAT_GROUP)
+        flags |= SECTION_LINKONCE;
+      return get_section (name, flags, current_function_decl);
     }
   return readonly_data_section;
 }
-- 
2.35.1



[PATCH][GCC11] IBM Z: fix `section type conflict` with -mindirect-branch-table

2022-02-02 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for
releases/gcc-11?



s390_code_end () puts indirect branch tables into separate sections and
tries to switch back to wherever it was in the beginning by calling
switch_to_section (current_function_section ()).

First of all, this is unnecessary - the other backends don't do it.

Furthermore, at this time there is no current function, but if the
last processed function was cold, in_cold_section_p remains set.  This
causes targetm.asm_out.function_section () to call
targetm.section_type_flags (), which in absence of current function
decl classifies the section as SECTION_WRITE.  This causes a section
type conflict with the existing SECTION_CODE.
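
To illustrate the failure mode described above, here is a minimal sketch of
a section registry that rejects a second request for the same section name
with different flags.  It is a toy model, not GCC's actual get_section ():
the names toy_get_section, toy_table, TOY_SECTION_CODE and
TOY_SECTION_WRITE are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values; GCC's real SECTION_* flags differ.  */
enum toy_flags
{
  TOY_SECTION_CODE = 1u << 0,
  TOY_SECTION_WRITE = 1u << 1
};

struct toy_section
{
  const char *name;
  unsigned flags;
};

struct toy_table
{
  struct toy_section entries[8];
  size_t count;
};

/* Register NAME with FLAGS on first use.  On later requests return 0 if
   the flags agree with the recorded ones and -1 if they conflict.  Also
   return -1 if the table is full.  */
int
toy_get_section (struct toy_table *t, const char *name, unsigned flags)
{
  for (size_t i = 0; i < t->count; i++)
    if (strcmp (t->entries[i].name, name) == 0)
      return t->entries[i].flags == flags ? 0 : -1;

  if (t->count == sizeof t->entries / sizeof t->entries[0])
    return -1;

  t->entries[t->count].name = name;
  t->entries[t->count].flags = flags;
  t->count++;
  return 0;
}
```

In this model, a cold function first registers ".init.text" as code; a
later request for the same name, classified as writable because no current
function decl is available, is reported as a conflict.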

gcc/ChangeLog:

* config/s390/s390.c (s390_code_end): Do not switch back to
code section.

gcc/testsuite/ChangeLog:

* gcc.target/s390/nobp-section-type-conflict.c: New test.

(cherry picked from commit 8753b13a31c777cdab0265dae0b68534247908f7)
---
 gcc/config/s390/s390.c|  1 -
 .../s390/nobp-section-type-conflict.c | 22 +++
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 8895dd7cc76..2d2e6522eb4 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16700,7 +16700,6 @@ s390_code_end (void)
  assemble_name_raw (asm_out_file, label_start);
  fputs ("-.\n", asm_out_file);
}
- switch_to_section (current_function_section ());
}
 }
 }
diff --git a/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c b/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c
new file mode 100644
index 000..5d78bc99bb5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c
@@ -0,0 +1,22 @@
+/* Checks that we don't get error: section type conflict with ‘put_page’.  */
+
+/* { dg-do compile } */
+/* { dg-options "-mindirect-branch=thunk-extern -mfunction-return=thunk-extern -mindirect-branch-table -O2" } */
+
+int a;
+int b (void);
+void c (int);
+
+static void
+put_page (void)
+{
+  if (b ())
+c (a);
+}
+
+__attribute__ ((__section__ (".init.text"), __cold__)) void
+d (void)
+{
+  put_page ();
+  put_page ();
+}
-- 
2.34.1



[PATCH] IBM Z: fix `section type conflict` with -mindirect-branch-table

2022-02-01 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?


s390_code_end () puts indirect branch tables into separate sections and
tries to switch back to wherever it was in the beginning by calling
switch_to_section (current_function_section ()).

First of all, this is unnecessary - the other backends don't do it.

Furthermore, at this time there is no current function, but if the
last processed function was cold, in_cold_section_p remains set.  This
causes targetm.asm_out.function_section () to call
targetm.section_type_flags (), which in absence of current function
decl classifies the section as SECTION_WRITE.  This causes a section
type conflict with the existing SECTION_CODE.

gcc/ChangeLog:

* config/s390/s390.cc (s390_code_end): Do not switch back to
code section.

gcc/testsuite/ChangeLog:

* gcc.target/s390/nobp-section-type-conflict.c: New test.
---
 gcc/config/s390/s390.cc   |  1 -
 .../s390/nobp-section-type-conflict.c | 22 +++
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 43c5c72554a..2db12d4ba4b 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -16809,7 +16809,6 @@ s390_code_end (void)
  assemble_name_raw (asm_out_file, label_start);
  fputs ("-.\n", asm_out_file);
}
- switch_to_section (current_function_section ());
}
 }
 }
diff --git a/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c b/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c
new file mode 100644
index 000..5d78bc99bb5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c
@@ -0,0 +1,22 @@
+/* Checks that we don't get error: section type conflict with ‘put_page’.  */
+
+/* { dg-do compile } */
+/* { dg-options "-mindirect-branch=thunk-extern -mfunction-return=thunk-extern -mindirect-branch-table -O2" } */
+
+int a;
+int b (void);
+void c (int);
+
+static void
+put_page (void)
+{
+  if (b ())
+c (a);
+}
+
+__attribute__ ((__section__ (".init.text"), __cold__)) void
+d (void)
+{
+  put_page ();
+  put_page ();
+}
-- 
2.34.1



[PATCH gcc-11 2/2] IBM Z: Use @PLT symbols for local functions in 64-bit mode

2021-09-30 Thread Ilya Leoshkevich via Gcc-patches
This helps with generating code for kernel hotpatches, which contain
individual functions and are loaded more than 2G away from vmlinux.
This should not create performance regressions for the normal use
cases, because for local functions ld replaces @PLT calls with direct
calls.

gcc/ChangeLog:

* config/s390/predicates.md (bras_sym_operand): Accept all
functions in 64-bit mode, use UNSPEC_PLT31.
(larl_operand): Use UNSPEC_PLT31.
* config/s390/s390.c (s390_loadrelative_operand_p): Likewise.
(legitimize_pic_address): Likewise.
(s390_emit_tls_call_insn): Mark __tls_get_offset as function,
use UNSPEC_PLT31.
(s390_delegitimize_address): Use UNSPEC_PLT31.
(s390_output_addr_const_extra): Likewise.
(print_operand): Add @PLT to TLS calls, handle %K.
(s390_function_profiler): Mark __fentry__/_mcount as function,
use %K, use UNSPEC_PLT31.
(s390_output_mi_thunk): Use only UNSPEC_GOT, use %K.
(s390_emit_call): Use UNSPEC_PLT31.
(s390_emit_tpf_eh_return): Mark __tpf_eh_return as function.
* config/s390/s390.md (UNSPEC_PLT31): Rename from UNSPEC_PLT.
(*movdi_64): Use %K.
(reload_base_64): Likewise.
(*sibcall_brc): Likewise.
(*sibcall_brcl): Likewise.
(*sibcall_value_brc): Likewise.
(*sibcall_value_brcl): Likewise.
(*bras): Likewise.
(*brasl): Likewise.
(*bras_r): Likewise.
(*brasl_r): Likewise.
(*bras_tls): Likewise.
(*brasl_tls): Likewise.
(main_base_64): Likewise.
(reload_base_64): Likewise.
(@split_stack_call): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/visibility/noPLT.C: Skip on s390x.
* g++.target/s390/mi-thunk.C: New test.
* gcc.target/s390/nodatarel-1.c: Move foostatic to the new
tests.
* gcc.target/s390/pr80080-4.c: Allow @PLT suffix.
* gcc.target/s390/risbg-ll-3.c: Likewise.
* gcc.target/s390/call.h: Common code for the new tests.
* gcc.target/s390/call-z10-pic-nodatarel.c: New test.
* gcc.target/s390/call-z10-pic.c: New test.
* gcc.target/s390/call-z10.c: New test.
* gcc.target/s390/call-z9-pic-nodatarel.c: New test.
* gcc.target/s390/call-z9-pic.c: New test.
* gcc.target/s390/call-z9.c: New test.
* gcc.target/s390/mfentry-m64-pic.c: New test.
* gcc.target/s390/tls.h: Common code for the new TLS tests.
* gcc.target/s390/tls-pic.c: New test.
* gcc.target/s390/tls.c: New test.

(cherry picked from commit 0990d93dd8a)
---
 gcc/config/s390/predicates.md |  9 ++-
 gcc/config/s390/s390.c| 81 +--
 gcc/config/s390/s390.md   | 32 
 gcc/testsuite/g++.dg/ext/visibility/noPLT.C   |  2 +-
 gcc/testsuite/g++.target/s390/mi-thunk.C  | 23 ++
 .../gcc.target/s390/call-z10-pic-nodatarel.c  | 20 +
 gcc/testsuite/gcc.target/s390/call-z10-pic.c  | 20 +
 gcc/testsuite/gcc.target/s390/call-z10.c  | 20 +
 .../gcc.target/s390/call-z9-pic-nodatarel.c   | 18 +
 gcc/testsuite/gcc.target/s390/call-z9-pic.c   | 18 +
 gcc/testsuite/gcc.target/s390/call-z9.c   | 20 +
 gcc/testsuite/gcc.target/s390/call.h  | 40 +
 .../gcc.target/s390/mfentry-m64-pic.c |  9 +++
 gcc/testsuite/gcc.target/s390/nodatarel-1.c   | 26 +-
 gcc/testsuite/gcc.target/s390/pr80080-4.c |  2 +-
 gcc/testsuite/gcc.target/s390/risbg-ll-3.c|  6 +-
 gcc/testsuite/gcc.target/s390/tls-pic.c   | 14 
 gcc/testsuite/gcc.target/s390/tls.c   | 10 +++
 gcc/testsuite/gcc.target/s390/tls.h   | 23 ++
 19 files changed, 320 insertions(+), 73 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/s390/mi-thunk.C
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call.h
 create mode 100644 gcc/testsuite/gcc.target/s390/mfentry-m64-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/tls-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/tls.c
 create mode 100644 gcc/testsuite/gcc.target/s390/tls.h

diff --git a/gcc/config/s390/predicates.md b/gcc/config/s390/predicates.md
index 15093cb4b30..99c343aa32c 100644
--- a/gcc/config/s390/predicates.md
+++ b/gcc/config/s390/predicates.md
@@ -101,10 +101,13 @@
 
 (define_special_predicate "bras_sym_operand"
   (ior (and (match_code "symbol_ref")
-   (match_test "!flag_pic || SYMBOL_REF_LOCAL_P (op)"))
+   (ior (match_test "!flag_pic")
+(match_test 

[PATCH gcc-11 1/2] IBM Z: Define NO_PROFILE_COUNTERS

2021-09-30 Thread Ilya Leoshkevich via Gcc-patches
s390 glibc does not need counters in the .data section, since it stores
edge hits in its own data structure.  Therefore counters only waste
space and confuse diffing tools (e.g. kpatch), so don't generate them.

gcc/ChangeLog:

* config/s390/s390.c (s390_function_profiler): Ignore labelno
parameter.
* config/s390/s390.h (NO_PROFILE_COUNTERS): Define.

gcc/testsuite/ChangeLog:

* gcc.target/s390/mnop-mcount-m31-mzarch.c: Adapt to the new
prologue size.
* gcc.target/s390/mnop-mcount-m64.c: Likewise.
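
The new prologue size follows from dropping the larl instruction that used
to load the counter label.  The sketch below redoes that arithmetic using
the per-instruction halfword counts that appear as comments in the
output_asm_nops () calls of this patch.  The helper names and the
2-bytes-per-halfword constant are assumptions made for this example, not
part of the patch.

```c
/* Instruction lengths in halfwords, as annotated in the patch.  */
enum
{
  HW_ST = 2,
  HW_L = 2,
  HW_STG = 3,
  HW_LG = 3,
  HW_LARL = 3,
  HW_BRASL = 3
};

/* Assumption for this sketch: one halfword is 2 bytes.  */
enum { BYTES_PER_HALFWORD = 2 };

/* Length in halfwords of the profiler call sequence: a save, an optional
   larl, a brasl and a restore.  */
int
mcount_seq_halfwords (int is_64bit, int with_larl)
{
  int hw = is_64bit ? HW_STG + HW_LG : HW_ST + HW_L;

  hw += HW_BRASL;
  if (with_larl)
    hw += HW_LARL;
  return hw;
}

/* The same length expressed in bytes.  */
int
mcount_seq_bytes (int is_64bit, int with_larl)
{
  return mcount_seq_halfwords (is_64bit, with_larl) * BYTES_PER_HALFWORD;
}
```

Under these assumptions the 64-bit sequence shrinks from 12 to 9 halfwords
and the 31-bit sequence from 10 to 7.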

(cherry picked from commit a1c1b7a888a)
---
 gcc/config/s390/s390.c| 42 +++
 gcc/config/s390/s390.h|  2 +
 .../gcc.target/s390/mnop-mcount-m31-mzarch.c  |  2 +-
 .../gcc.target/s390/mnop-mcount-m64.c |  2 +-
 4 files changed, 20 insertions(+), 28 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index c5d4c439bcc..a863dfce9a2 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -13120,33 +13120,25 @@ output_asm_nops (const char *user, int hw)
 }
 }
 
-/* Output assembler code to FILE to increment profiler label # LABELNO
-   for profiling a function entry.  */
+/* Output assembler code to FILE to call a profiler hook.  */
 
 void
-s390_function_profiler (FILE *file, int labelno)
+s390_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
 {
-  rtx op[8];
-
-  char label[128];
-  ASM_GENERATE_INTERNAL_LABEL (label, "LP", labelno);
+  rtx op[4];
 
   fprintf (file, "# function profiler \n");
 
   op[0] = gen_rtx_REG (Pmode, RETURN_REGNUM);
   op[1] = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
   op[1] = gen_rtx_MEM (Pmode, plus_constant (Pmode, op[1], UNITS_PER_LONG));
-  op[7] = GEN_INT (UNITS_PER_LONG);
-
-  op[2] = gen_rtx_REG (Pmode, 1);
-  op[3] = gen_rtx_SYMBOL_REF (Pmode, label);
-  SYMBOL_REF_FLAGS (op[3]) = SYMBOL_FLAG_LOCAL;
+  op[3] = GEN_INT (UNITS_PER_LONG);
 
-  op[4] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? "__fentry__" : "_mcount");
+  op[2] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? "__fentry__" : "_mcount");
   if (flag_pic)
 {
-  op[4] = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op[4]), UNSPEC_PLT);
-  op[4] = gen_rtx_CONST (Pmode, op[4]);
+  op[2] = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op[2]), UNSPEC_PLT);
+  op[2] = gen_rtx_CONST (Pmode, op[2]);
 }
 
   if (flag_record_mcount)
@@ -13160,20 +13152,19 @@ s390_function_profiler (FILE *file, int labelno)
warning (OPT_Wcannot_profile, "nested functions cannot be profiled "
 "with %<-mfentry%> on s390");
   else
-   output_asm_insn ("brasl\t0,%4", op);
+   output_asm_insn ("brasl\t0,%2", op);
 }
   else if (TARGET_64BIT)
 {
   if (flag_nop_mcount)
-   output_asm_nops ("-mnop-mcount", /* stg */ 3 + /* larl */ 3 +
-/* brasl */ 3 + /* lg */ 3);
+   output_asm_nops ("-mnop-mcount", /* stg */ 3 + /* brasl */ 3 +
+/* lg */ 3);
   else
{
  output_asm_insn ("stg\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
-   output_asm_insn (".cfi_rel_offset\t%0,%7", op);
- output_asm_insn ("larl\t%2,%3", op);
- output_asm_insn ("brasl\t%0,%4", op);
+   output_asm_insn (".cfi_rel_offset\t%0,%3", op);
+ output_asm_insn ("brasl\t%0,%2", op);
  output_asm_insn ("lg\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
output_asm_insn (".cfi_restore\t%0", op);
@@ -13182,15 +13173,14 @@ s390_function_profiler (FILE *file, int labelno)
   else
 {
   if (flag_nop_mcount)
-   output_asm_nops ("-mnop-mcount", /* st */ 2 + /* larl */ 3 +
-/* brasl */ 3 + /* l */ 2);
+   output_asm_nops ("-mnop-mcount", /* st */ 2 + /* brasl */ 3 +
+/* l */ 2);
   else
{
  output_asm_insn ("st\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
-   output_asm_insn (".cfi_rel_offset\t%0,%7", op);
- output_asm_insn ("larl\t%2,%3", op);
- output_asm_insn ("brasl\t%0,%4", op);
+   output_asm_insn (".cfi_rel_offset\t%0,%3", op);
+ output_asm_insn ("brasl\t%0,%2", op);
  output_asm_insn ("l\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
output_asm_insn (".cfi_restore\t%0", op);
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index 3b876160420..fb16a455a03 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -787,6 +787,8 @@ CUMULATIVE_ARGS;
 
 #define PROFILE_BEFORE_PROLOGUE 1
 
+#define NO_PROFILE_COUNTERS 1
+
 
 /* Trampolines for nested functions.  */
 
diff --git a/gcc/testsuite/gcc.target/s390/mnop-mcount-m31-mzarch.c b/gcc/testsuite/gcc.target/s390/mnop-mcount-m31-mzarch.c
index b2ad9f5bced..874ceb96fe8 100644
--- a/gcc/testsuite/gcc.target/s390/mnop-mcount-m31-mzarch.c
+++ b/gcc/testsuite/gcc.target/s390/mnop-mcount-m31-mzarch.c
@@ -4,5 +4,5 @@
 void
 profileme 

[PATCH gcc-11 0/2] Backport kpatch changes

2021-09-30 Thread Ilya Leoshkevich via Gcc-patches
Hi,

This series contains a backport of kpatch changes needed to support
https://github.com/dynup/kpatch/pull/1203 so that it could be used in
RHEL 9.  The patches have been in master for 4 months now without
issues.

Bootstrapped and regtested on s390x-redhat-linux.

Ok for gcc-11?

Best regards,
Ilya

Ilya Leoshkevich (2):
  IBM Z: Define NO_PROFILE_COUNTERS
  IBM Z: Use @PLT symbols for local functions in 64-bit mode

 gcc/config/s390/predicates.md |   9 +-
 gcc/config/s390/s390.c| 115 +++---
 gcc/config/s390/s390.h|   2 +
 gcc/config/s390/s390.md   |  32 ++---
 gcc/testsuite/g++.dg/ext/visibility/noPLT.C   |   2 +-
 gcc/testsuite/g++.target/s390/mi-thunk.C  |  23 
 .../gcc.target/s390/call-z10-pic-nodatarel.c  |  20 +++
 gcc/testsuite/gcc.target/s390/call-z10-pic.c  |  20 +++
 gcc/testsuite/gcc.target/s390/call-z10.c  |  20 +++
 .../gcc.target/s390/call-z9-pic-nodatarel.c   |  18 +++
 gcc/testsuite/gcc.target/s390/call-z9-pic.c   |  18 +++
 gcc/testsuite/gcc.target/s390/call-z9.c   |  20 +++
 gcc/testsuite/gcc.target/s390/call.h  |  40 ++
 .../gcc.target/s390/mfentry-m64-pic.c |   9 ++
 .../gcc.target/s390/mnop-mcount-m31-mzarch.c  |   2 +-
 .../gcc.target/s390/mnop-mcount-m64.c |   2 +-
 gcc/testsuite/gcc.target/s390/nodatarel-1.c   |  26 +---
 gcc/testsuite/gcc.target/s390/pr80080-4.c |   2 +-
 gcc/testsuite/gcc.target/s390/risbg-ll-3.c|   6 +-
 gcc/testsuite/gcc.target/s390/tls-pic.c   |  14 +++
 gcc/testsuite/gcc.target/s390/tls.c   |  10 ++
 gcc/testsuite/gcc.target/s390/tls.h   |  23 
 22 files changed, 336 insertions(+), 97 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/s390/mi-thunk.C
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call.h
 create mode 100644 gcc/testsuite/gcc.target/s390/mfentry-m64-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/tls-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/tls.c
 create mode 100644 gcc/testsuite/gcc.target/s390/tls.h

-- 
2.31.1



Re: [PATCH v3 3/3] reassoc: Test rank biasing

2021-09-28 Thread Ilya Leoshkevich via Gcc-patches
On Tue, 2021-09-28 at 13:28 +0200, Richard Biener wrote:
> On Sun, 26 Sep 2021, Ilya Leoshkevich wrote:
> 
> > Add both positive and negative tests.
> 
> The tests will likely be quite fragile with respect to what is
> actually vectorized on which target.  If you move the tests
> to gcc.dg/vect/ you could at least do
> 
> /* { dg-require-effective-target vect_int } */
> 
> do you need to look for the exact GIMPLE IL or is it enough to
> verify we are vectorizing the reduction?

Actually I don't think vectorization is that important here, and I
only check how many times sum_x = sum_y + _z appears.  So I use
(?:vect_)?, which may or may not be there.

An alternative I considered was to use -fno-tree-vectorize to get
smaller regexes, but I thought it would be nice to know that
vectorization does not mess up reassociation results.
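
As a rough illustration of what the optional (?:vect_)? prefix buys, the
sketch below counts dump lines that look like "sum_x = _z + sum_y" or
"sum_x = sum_y + _z", with or without the vect_ prefix.  It is only an
approximation: the testsuite directive is evaluated by DejaGnu over the
whole dump, whereas this example uses POSIX extended regular expressions
from <regex.h> line by line, so (?:...) becomes a plain group and \d
becomes 0-9.  The function name count_sum_adds is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Count how many of the N_LINES strings in LINES contain an addition of
   the loop accumulator and one other SSA name.  Return -1 if the pattern
   cannot be compiled.  */
int
count_sum_adds (const char *const *lines, size_t n_lines)
{
  /* POSIX ERE rendering of the pattern used in the tests.  */
  static const char pattern[] =
    "(vect_)?sum_[0-9._]+ = "
    "((vect_)?_[0-9._]+ \\+ (vect_)?sum_[0-9._]+"
    "|(vect_)?sum_[0-9._]+ \\+ (vect_)?_[0-9._]+)";
  regex_t re;
  int count = 0;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  for (size_t i = 0; i < n_lines; i++)
    if (regexec (&re, lines[i], 0, NULL, 0) == 0)
      count++;

  regfree (&re);
  return count;
}
```

Both a scalar line such as "sum_12 = _5 + sum_11;" and a vectorized line
such as "vect_sum_12.5_30 = vect__4.3_29 + vect_sum_11.4_28;" are counted,
while an addition that does not involve the accumulator is not.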

Best regards,
Ilya



[PATCH v3 3/3] reassoc: Test rank biasing

2021-09-26 Thread Ilya Leoshkevich via Gcc-patches
Add both positive and negative tests.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/reassoc-46.c: New test.
* gcc.dg/tree-ssa/reassoc-46.h: Common code for new tests.
* gcc.dg/tree-ssa/reassoc-47.c: New test.
* gcc.dg/tree-ssa/reassoc-48.c: New test.
* gcc.dg/tree-ssa/reassoc-49.c: New test.
* gcc.dg/tree-ssa/reassoc-50.c: New test.
* gcc.dg/tree-ssa/reassoc-51.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c |  7 +
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h | 33 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c |  9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c |  9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c | 11 
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c | 10 +++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c | 11 
 7 files changed, 90 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
new file mode 100644
index 000..97563dd929f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */
+
+#include "reassoc-46.h"
+
+/* Check that the loop accumulator is added last.  */
+/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d._]+ \+ (?:vect_)?_[\d._]+)} 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
new file mode 100644
index 000..e60b490ea0d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
@@ -0,0 +1,33 @@
+#define M 1024
+unsigned int arr1[M];
+unsigned int arr2[M];
+volatile unsigned int sink;
+
+unsigned int
+test (void)
+{
+  unsigned int sum = 0;
+  for (int i = 0; i < M; i++)
+{
+#ifdef MODIFY
+  /* Modify the loop accumulator using a chain of operations - this should
+ not affect its rank biasing.  */
+  sum |= 1;
+  sum ^= 2;
+#endif
+#ifdef STORE
+  /* Save the loop accumulator into a global variable - this should not
+ affect its rank biasing.  */
+  sink = sum;
+#endif
+#ifdef USE
+  /* Add a tricky use of the loop accumulator - this should prevent its
+ rank biasing.  */
+  i = (i + sum) % M;
+#endif
+  /* Use addends with different ranks.  */
+  sum += arr1[i];
+  sum += arr2[((i ^ 1) + 1) % M];
+}
+  return sum;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
new file mode 100644
index 000..1b0f0fdabe1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */
+
+#define MODIFY
+#include "reassoc-46.h"
+
+/* Check that if the loop accumulator is modified using a chain of operations
+   other than addition, its new value is still added last.  */
+/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d._]+ \+ (?:vect_)?_[\d._]+)} 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
new file mode 100644
index 000..13836ebe8e6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */
+
+#define STORE
+#include "reassoc-46.h"
+
+/* Check that if the loop accumulator is saved into a global variable, it's
+   still added last.  */
+/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d._]+ \+ (?:vect_)?_[\d._]+)} 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
new file mode 100644
index 000..c1136a447a2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */
+
+#define MODIFY
+#define STORE
+#include "reassoc-46.h"
+
+/* Check that if the loop accumulator is both modified using a chain of
+   operations other than addition and stored into a global variable, its new
+   value is still added last.  */
+/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = 
(?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d

[PATCH v3 2/3] reassoc: Propagate PHI_LOOP_BIAS along single uses

2021-09-26 Thread Ilya Leoshkevich via Gcc-patches
PR tree-optimization/49749 introduced code that shortens dependency
chains containing loop accumulators by placing them last on operand
lists of associative operations.
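
To make the effect concrete, here is a hand-written sketch of the two
operand orders for a loop with two addends.  It is written at the source
level purely for illustration; the pass itself works on GIMPLE, and the
function names are made up for this example.  In the first form both
additions of an iteration depend on the accumulator; in the second, the
addends are combined first and the accumulator enters only the final
addition, so just one add per iteration sits on the loop-carried chain.

```c
/* Accumulator first: sum -> (sum + a[i]) -> (... + b[i]); two dependent
   additions per iteration on the loop-carried chain.  */
unsigned int
sum_acc_first (const unsigned int *a, const unsigned int *b, int n)
{
  unsigned int sum = 0;

  for (int i = 0; i < n; i++)
    sum = (sum + a[i]) + b[i];
  return sum;
}

/* Accumulator last: a[i] + b[i] does not depend on the previous
   iteration, so only the final addition is on the loop-carried chain.  */
unsigned int
sum_acc_last (const unsigned int *a, const unsigned int *b, int n)
{
  unsigned int sum = 0;

  for (int i = 0; i < n; i++)
    sum = (a[i] + b[i]) + sum;
  return sum;
}
```

Because unsigned addition wraps and is associative, both functions return
the same value for any input; only the shape of the dependency chain
differs.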

The 456.hmmer benchmark on s390 could benefit from this; however, the
code that needs it modifies the loop accumulator before using it, and
since only so-called loop-carried phis are treated as loop accumulators,
the code in its present form doesn't really help.  According to Bill
Schmidt - the original author - such a conservative approach was chosen
so as to avoid unnecessarily swapping operands, which might cause
unpredictable effects.  However, giving special treatment to forms of
loop accumulators is acceptable.

The definition of loop-carried phi is: it's a single-use phi, which is
used in the same innermost loop it's defined in, at least one argument
of which is defined in the same innermost loop as the phi itself.
Given this, it seems natural to treat single uses of such phis as phis
themselves.

gcc/ChangeLog:

* tree-ssa-reassoc.c (biased_names): New global.
(propagate_bias_p): New function.
(loop_carried_phi): Remove.
(propagate_rank): Propagate bias along single uses.
(get_rank): Update biased_names when needed.
---
 gcc/tree-ssa-reassoc.c | 109 -
 1 file changed, 74 insertions(+), 35 deletions(-)

diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 420c14e8cf5..db9fb4e1cac 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -211,6 +211,10 @@ static int64_t *bb_rank;
 /* Operand->rank hashtable.  */
 static hash_map<tree, int64_t> *operand_rank;
 
+/* SSA_NAMEs that are forms of loop accumulators and whose ranks need to be
+   biased.  */
+static auto_bitmap biased_names;
+
 /* Vector of SSA_NAMEs on which after reassociate_bb is done with
all basic blocks the CFG should be adjusted - basic blocks
split right after that SSA_NAME's definition statement and before
@@ -256,6 +260,53 @@ reassoc_remove_stmt (gimple_stmt_iterator *gsi)
the rank difference between two blocks.  */
 #define PHI_LOOP_BIAS (1 << 15)
 
+/* Return TRUE iff PHI_LOOP_BIAS should be propagated from one of the STMT's
+   operands to the STMT's left-hand side.  The goal is to preserve bias in code
+   like this:
+
+ x_1 = phi(x_0, x_2)
+ a = x_1 | 1
+ b = a ^ 2
+ .MEM = b
+ c = b + d
+ x_2 = c + e
+
+   That is, we need to preserve bias along single-use chains originating from
+   loop-carried phis.  Only GIMPLE_ASSIGNs to SSA_NAMEs are considered to be
+   uses, because only they participate in rank propagation.  */
+static bool
+propagate_bias_p (gimple *stmt)
+{
+  use_operand_p use;
+  imm_use_iterator use_iter;
+  gimple *single_use_stmt = NULL;
+
+  if (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_reference)
+return false;
+
+  FOR_EACH_IMM_USE_FAST (use, use_iter, gimple_assign_lhs (stmt))
+{
+  gimple *current_use_stmt = USE_STMT (use);
+
+  if (is_gimple_assign (current_use_stmt)
+ && TREE_CODE (gimple_assign_lhs (current_use_stmt)) == SSA_NAME)
+   {
+ if (single_use_stmt != NULL && single_use_stmt != current_use_stmt)
+   return false;
+ single_use_stmt = current_use_stmt;
+   }
+}
+
+  if (single_use_stmt == NULL)
+return false;
+
+  if (gimple_bb (stmt)->loop_father
+  != gimple_bb (single_use_stmt)->loop_father)
+return false;
+
+  return true;
+}
+
 /* Rank assigned to a phi statement.  If STMT is a loop-carried phi of
an innermost loop, and the phi has only a single use which is inside
the loop, then the rank is the block rank of the loop latch plus an
@@ -313,49 +364,27 @@ phi_rank (gimple *stmt)
   return bb_rank[bb->index];
 }
 
-/* If EXP is an SSA_NAME defined by a PHI statement that represents a
-   loop-carried dependence of an innermost loop, return TRUE; else
-   return FALSE.  */
-static bool
-loop_carried_phi (tree exp)
-{
-  gimple *phi_stmt;
-  int64_t block_rank;
-
-  if (TREE_CODE (exp) != SSA_NAME
-  || SSA_NAME_IS_DEFAULT_DEF (exp))
-return false;
-
-  phi_stmt = SSA_NAME_DEF_STMT (exp);
-
-  if (gimple_code (SSA_NAME_DEF_STMT (exp)) != GIMPLE_PHI)
-return false;
-
-  /* Non-loop-carried phis have block rank.  Loop-carried phis have
- an additional bias added in.  If this phi doesn't have block rank,
- it's biased and should not be propagated.  */
-  block_rank = bb_rank[gimple_bb (phi_stmt)->index];
-
-  if (phi_rank (phi_stmt) != block_rank)
-return true;
-
-  return false;
-}
-
 /* Return the maximum of RANK and the rank that should be propagated
from expression OP.  For most operands, this is just the rank of OP.
For loop-carried phis, the value is zero to avoid undoing the bias
in favor of the phi.  */
 static int64_t
-propagate_rank (int64_t rank, tree op)
+propagate_rank (int64_t rank, tree op, bool *maybe_biased_p)
 {
   int64_t op_rank;
 
-  if (loop_carried_phi (op))
- 

[PATCH v3 1/3] reassoc: Do not bias loop-carried PHIs early

2021-09-26 Thread Ilya Leoshkevich via Gcc-patches
Biasing loop-carried PHIs during the 1st reassociation pass interferes
with reduction chains and does not bring measurable benefits, so do it
only during the 2nd reassociation pass.

gcc/ChangeLog:

* passes.def (pass_reassoc): Rename parameter to early_p.
* tree-ssa-reassoc.c (reassoc_bias_loop_carried_phi_ranks_p):
New variable.
(phi_rank): Don't bias loop-carried phi ranks
before vectorization pass.
(execute_reassoc): Add bias_loop_carried_phi_ranks_p parameter.
(pass_reassoc::pass_reassoc): Add bias_loop_carried_phi_ranks_p
initializer.
(pass_reassoc::set_param): Set bias_loop_carried_phi_ranks_p
value.
(pass_reassoc::execute): Pass bias_loop_carried_phi_ranks_p to
execute_reassoc.
(pass_reassoc::bias_loop_carried_phi_ranks_p): New member.
---
 gcc/passes.def |  4 ++--
 gcc/tree-ssa-reassoc.c | 16 ++--
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index d7a1f8c97a6..c5f915d04c6 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -242,7 +242,7 @@ along with GCC; see the file COPYING3.  If not see
   /* Identify paths that should never be executed in a conforming
 program and isolate those paths.  */
   NEXT_PASS (pass_isolate_erroneous_paths);
-  NEXT_PASS (pass_reassoc, true /* insert_powi_p */);
+  NEXT_PASS (pass_reassoc, true /* early_p */);
   NEXT_PASS (pass_dce);
   NEXT_PASS (pass_forwprop);
   NEXT_PASS (pass_phiopt, false /* early_p */);
@@ -325,7 +325,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_lower_vector_ssa);
   NEXT_PASS (pass_lower_switch);
   NEXT_PASS (pass_cse_reciprocals);
-  NEXT_PASS (pass_reassoc, false /* insert_powi_p */);
+  NEXT_PASS (pass_reassoc, false /* early_p */);
   NEXT_PASS (pass_strength_reduction);
   NEXT_PASS (pass_split_paths);
   NEXT_PASS (pass_tracer);
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 8498cfc7aa8..420c14e8cf5 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -180,6 +180,10 @@ along with GCC; see the file COPYING3.  If not see
point 3a in the pass header comment.  */
 static bool reassoc_insert_powi_p;
 
+/* Enable biasing ranks of loop accumulators.  We don't want this before
+   vectorization, since it interferes with reduction chains.  */
+static bool reassoc_bias_loop_carried_phi_ranks_p;
+
 /* Statistics */
 static struct
 {
@@ -269,6 +273,9 @@ phi_rank (gimple *stmt)
   use_operand_p use;
   gimple *use_stmt;
 
+  if (!reassoc_bias_loop_carried_phi_ranks_p)
+return bb_rank[bb->index];
+
   /* We only care about real loops (those with a latch).  */
   if (!father->latch)
 return bb_rank[bb->index];
@@ -6940,9 +6947,10 @@ fini_reassoc (void)
optimization of a gimple conditional.  Otherwise returns zero.  */
 
 static unsigned int
-execute_reassoc (bool insert_powi_p)
+execute_reassoc (bool insert_powi_p, bool bias_loop_carried_phi_ranks_p)
 {
   reassoc_insert_powi_p = insert_powi_p;
+  reassoc_bias_loop_carried_phi_ranks_p = bias_loop_carried_phi_ranks_p;
 
   init_reassoc ();
 
@@ -6983,15 +6991,19 @@ public:
 {
   gcc_assert (n == 0);
   insert_powi_p = param;
+  bias_loop_carried_phi_ranks_p = !param;
 }
   virtual bool gate (function *) { return flag_tree_reassoc != 0; }
   virtual unsigned int execute (function *)
-{ return execute_reassoc (insert_powi_p); }
+  {
+return execute_reassoc (insert_powi_p, bias_loop_carried_phi_ranks_p);
+  }
 
  private:
   /* Enable insertion of __builtin_powi calls during execute_reassoc.  See
  point 3a in the pass header comment.  */
   bool insert_powi_p;
+  bool bias_loop_carried_phi_ranks_p;
 }; // class pass_reassoc
 
 } // anon namespace
-- 
2.31.1



[PATCH v3 0/3] reassoc: Propagate PHI_LOOP_BIAS along single uses

2021-09-26 Thread Ilya Leoshkevich via Gcc-patches
v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579976.html
Changes in v3:
* Do not propagate bias along tcc_references.
* Call get_rank () before checking biased_names.
* Add loop-carried phis to biased_names.
* Move the propagate_bias_p () call outside of the loop.
* Test with -ftree-vectorize, adjust expectations.

Ilya Leoshkevich (3):
  reassoc: Do not bias loop-carried PHIs early
  reassoc: Propagate PHI_LOOP_BIAS along single uses
  reassoc: Test rank biasing

 gcc/passes.def |   4 +-
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c |   7 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h |  33 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c |   9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c |   9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c |  11 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c |  10 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c |  11 ++
 gcc/tree-ssa-reassoc.c | 125 +++--
 9 files changed, 180 insertions(+), 39 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c

-- 
2.31.1



Re: [PATCH v2 2/3] reassoc: Propagate PHI_LOOP_BIAS along single uses

2021-09-24 Thread Ilya Leoshkevich via Gcc-patches
On Thu, 2021-09-23 at 13:55 +0200, Richard Biener wrote:
> On Wed, 22 Sep 2021, Ilya Leoshkevich wrote:
> 
> > PR tree-optimization/49749 introduced code that shortens dependency
> > chains containing loop accumulators by placing them last on operand
> > lists of associative operations.
> > 
> > 456.hmmer benchmark on s390 could benefit from this; however, the code
> > that needs it modifies the loop accumulator before using it, and since
> > only so-called loop-carried phis are treated as loop accumulators, the
> > code in its present form doesn't really help.  According to Bill
> > Schmidt - the original author - such a conservative approach was chosen
> > so as to avoid unnecessarily swapping operands, which might cause
> > unpredictable effects.  However, giving special treatment to forms of
> > loop accumulators is acceptable.
> > 
> > The definition of loop-carried phi is: it's a single-use phi, which
> > is
> > used in the same innermost loop it's defined in, at least one
> > argument
> > of which is defined in the same innermost loop as the phi itself.
> > Given this, it seems natural to treat single uses of such phis as
> > phis
> > themselves.
> > 
> > gcc/ChangeLog:
> > 
> > * tree-ssa-reassoc.c (biased_names): New global.
> > (propagate_bias_p): New function.
> > (loop_carried_phi): Remove.
> > (propagate_rank): Propagate bias along single uses.
> > (get_rank): Update biased_names when needed.
> > ---
> >  gcc/tree-ssa-reassoc.c | 97 --
> > 
> >  1 file changed, 64 insertions(+), 33 deletions(-)
> > 
> > diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
> > index 420c14e8cf5..2f7a8882aac 100644
> > --- a/gcc/tree-ssa-reassoc.c
> > +++ b/gcc/tree-ssa-reassoc.c
> > @@ -211,6 +211,10 @@ static int64_t *bb_rank;
> >  /* Operand->rank hashtable.  */
> >  static hash_map *operand_rank;
> >  
> > +/* SSA_NAMEs that are forms of loop accumulators and whose ranks
> > need to be
> > +   biased.  */
> > +static auto_bitmap biased_names;
> > +
> >  /* Vector of SSA_NAMEs on which after reassociate_bb is done with
> >     all basic blocks the CFG should be adjusted - basic blocks
> >     split right after that SSA_NAME's definition statement and
> > before
> > @@ -256,6 +260,50 @@ reassoc_remove_stmt (gimple_stmt_iterator
> > *gsi)
> >     the rank difference between two blocks.  */
> >  #define PHI_LOOP_BIAS (1 << 15)
> >  
> > +/* Return TRUE iff PHI_LOOP_BIAS should be propagated from one of
> > the STMT's
> > +   operands to the STMT's left-hand side.  The goal is to preserve
> > bias in code
> > +   like this:
> > +
> > + x_1 = phi(x_0, x_2)
> > + a = x_1 | 1
> > + b = a ^ 2
> > + .MEM = b
> > + c = b + d
> > + x_2 = c + e
> > +
> > +   That is, we need to preserve bias along single-use chains
> > originating from
> > +   loop-carried phis.  Only GIMPLE_ASSIGNs to SSA_NAMEs are
> > considered to be
> > +   uses, because only they participate in rank propagation.  */
> > +static bool
> > +propagate_bias_p (gimple *stmt)
> > +{
> > +  use_operand_p use;
> > +  imm_use_iterator use_iter;
> > +  gimple *single_use_stmt = NULL;
> > +
> > +  FOR_EACH_IMM_USE_FAST (use, use_iter, gimple_assign_lhs (stmt))
> > +    {
> > +  gimple *current_use_stmt = USE_STMT (use);
> > +
> > +  if (is_gimple_assign (current_use_stmt)
> > + && TREE_CODE (gimple_assign_lhs (current_use_stmt)) ==
> > SSA_NAME)
> > +   {
> > + if (single_use_stmt != NULL)
> 
> what if single_use_stmt == current_use_stmt?  We might have two
> uses on a stmt after all - should that still be biased?  I guess not
> and thus the check is correct?

Come to think of it, it should be ok to bias it.  Things like
x = x + x are fine (this particular case can be transformed into
something else earlier, but I think the overall point still holds).
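
To make the relaxed check concrete, here is a minimal sketch on a toy model
rather than on GCC's immediate-use API.  struct toy_stmt and
toy_single_use_stmt_p () are hypothetical names used only for illustration;
the point is merely that repeated uses by the same statement are counted
once, while a second, different statement still disables biasing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a statement that uses a value; not a GCC type.  */
struct toy_stmt
{
  int id;
  /* True for an assignment whose left-hand side is an SSA name.  */
  bool assigns_ssa_name;
};

/* Return true iff all qualifying uses in USES[0..N) belong to exactly one
   statement.  Repeated uses by the same statement (as in x = x + x) are
   counted once.  */
bool
toy_single_use_stmt_p (const struct toy_stmt *const *uses, size_t n)
{
  const struct toy_stmt *single = NULL;

  for (size_t i = 0; i < n; i++)
    {
      if (!uses[i]->assigns_ssa_name)
        continue; /* E.g. a store: not considered to be a use.  */
      if (single != NULL && single != uses[i])
        return false; /* A second, different statement.  */
      single = uses[i];
    }

  return single != NULL;
}
```

With this shape, a use list containing the same statement twice is still
treated as a single use, and a store next to it does not change the answer.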
> 
> > +   return false;
> > + single_use_stmt = current_use_stmt;
> > +   }
> > +    }
> > +
> > +  if (single_use_stmt == NULL)
> > +    return false;
> > +
> > +  if (gimple_bb (stmt)->loop_father
> > +  != gimple_bb (single_use_stmt)->loop_father)
> > +    return false;
> > +
> > +  return true;
> > +}
> > +
> >  /* Rank assigned to a phi statement.  If STMT is a loop-carried
> > phi of
> >     an innermost loop, and the phi has only a single use which is
> > inside
> >     the loop, then the rank is the block rank of the loop latch
> > plus an
> > @@ -313,46 +361,23 @@ phi_rank (gimple *stmt)
> >    return bb_rank[bb->index];
> >  }
> >  
> > -/* If EXP is an SSA_NAME defined by a PHI statement that
> > represents a
> > -   loop-carried dependence of an innermost loop, return TRUE; else
> > -   return FALSE.  */
> > -static bool
> > -loop_carried_phi (tree exp)
> > -{
> > -  gimple *phi_stmt;
> > -  int64_t block_rank;
> > -
> > -  if (TREE_CODE (exp) != SSA_NAME
> > -  || SSA_NAME_IS_DEFAULT_DEF (exp))
> > -    return fals

[PATCH v2 3/3] reassoc: Test rank biasing

2021-09-21 Thread Ilya Leoshkevich via Gcc-patches
Add both positive and negative tests.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/reassoc-46.c: New test.
* gcc.dg/tree-ssa/reassoc-46.h: Common code for new tests.
* gcc.dg/tree-ssa/reassoc-47.c: New test.
* gcc.dg/tree-ssa/reassoc-48.c: New test.
* gcc.dg/tree-ssa/reassoc-49.c: New test.
* gcc.dg/tree-ssa/reassoc-50.c: New test.
* gcc.dg/tree-ssa/reassoc-51.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c |  7 +
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h | 33 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c |  9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c |  9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c | 11 
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c | 10 +++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c | 11 
 7 files changed, 90 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
new file mode 100644
index 000..69e02bc4d4a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include "reassoc-46.h"
+
+/* Check that the loop accumulator is added last.  */
+/* { dg-final { scan-tree-dump-times {sum_\d+ = (?:_\d+ \+ sum_\d+|sum_\d+ \+ _\d+)} 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
new file mode 100644
index 000..e60b490ea0d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
@@ -0,0 +1,33 @@
+#define M 1024
+unsigned int arr1[M];
+unsigned int arr2[M];
+volatile unsigned int sink;
+
+unsigned int
+test (void)
+{
+  unsigned int sum = 0;
+  for (int i = 0; i < M; i++)
+    {
+#ifdef MODIFY
+      /* Modify the loop accumulator using a chain of operations - this should
+         not affect its rank biasing.  */
+      sum |= 1;
+      sum ^= 2;
+#endif
+#ifdef STORE
+      /* Save the loop accumulator into a global variable - this should not
+         affect its rank biasing.  */
+      sink = sum;
+#endif
+#ifdef USE
+      /* Add a tricky use of the loop accumulator - this should prevent its
+         rank biasing.  */
+      i = (i + sum) % M;
+#endif
+      /* Use addends with different ranks.  */
+      sum += arr1[i];
+      sum += arr2[((i ^ 1) + 1) % M];
+    }
+  return sum;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
new file mode 100644
index 000..84b51ccddb0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define MODIFY
+#include "reassoc-46.h"
+
+/* Check that if the loop accumulator is modified using a chain of operations
+   other than addition, its new value is still added last.  */
+/* { dg-final { scan-tree-dump-times {sum_\d+ = (?:_\d+ \+ sum_\d+|sum_\d+ \+ _\d+)} 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
new file mode 100644
index 000..53ae8820281
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define STORE
+#include "reassoc-46.h"
+
+/* Check that if the loop accumulator is saved into a global variable, it's
+   still added last.  */
+/* { dg-final { scan-tree-dump-times {sum_\d+ = (?:_\d+ \+ sum_\d+|sum_\d+ \+ _\d+)} 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
new file mode 100644
index 000..a6941d5ac2b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define MODIFY
+#define STORE
+#include "reassoc-46.h"
+
+/* Check that if the loop accumulator is both modified using a chain of
+   operations other than addition and stored into a global variable, its new
+   value is still added last.  */
+/* { dg-final { scan-tree-dump-times {sum_\d+ = (?:_\d+ \+ sum_\d+|sum_\d+ \+ _\d+)} 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c
new file mode 100644
index 000..68cd308c4f1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimize

[PATCH v2 2/3] reassoc: Propagate PHI_LOOP_BIAS along single uses

2021-09-21 Thread Ilya Leoshkevich via Gcc-patches
PR tree-optimization/49749 introduced code that shortens dependency
chains containing loop accumulators by placing them last on operand
lists of associative operations.

456.hmmer benchmark on s390 could benefit from this; however, the code
that needs it modifies the loop accumulator before using it, and since only
so-called loop-carried phis are treated as loop accumulators, the
code in its present form doesn't really help.  According to Bill
Schmidt - the original author - such a conservative approach was chosen
so as to avoid unnecessarily swapping operands, which might cause
unpredictable effects.  However, giving special treatment to forms of
loop accumulators is acceptable.

The definition of loop-carried phi is: it's a single-use phi, which is
used in the same innermost loop it's defined in, at least one argument
of which is defined in the same innermost loop as the phi itself.
Given this, it seems natural to treat single uses of such phis as phis
themselves.
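
For illustration only, here is a source-level sketch of why the position of
the accumulator on the operand list matters.  The function names are made up
for this example, and the compiler is of course free to reassociate either
form on its own; the sketch only shows the two dependency-chain shapes that
the pass chooses between.

```c
#include <stddef.h>

/* Accumulator first: both additions of an iteration depend on the value of
   SUM from the previous iteration, so the loop-carried chain is two
   additions long.  */
unsigned int
sum_accumulator_first (const unsigned int *a, const unsigned int *b, size_t n)
{
  unsigned int sum = 0;
  for (size_t i = 0; i < n; i++)
    sum = (sum + a[i]) + b[i];
  return sum;
}

/* Accumulator last: a[i] + b[i] does not depend on SUM, so only one
   addition per iteration is on the loop-carried chain.  */
unsigned int
sum_accumulator_last (const unsigned int *a, const unsigned int *b, size_t n)
{
  unsigned int sum = 0;
  for (size_t i = 0; i < n; i++)
    sum = (a[i] + b[i]) + sum;
  return sum;
}
```

Since unsigned arithmetic wraps, both functions return the same value for
any input; only the length of the dependency chain differs.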

gcc/ChangeLog:

* tree-ssa-reassoc.c (biased_names): New global.
(propagate_bias_p): New function.
(loop_carried_phi): Remove.
(propagate_rank): Propagate bias along single uses.
(get_rank): Update biased_names when needed.
---
 gcc/tree-ssa-reassoc.c | 97 --
 1 file changed, 64 insertions(+), 33 deletions(-)

diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 420c14e8cf5..2f7a8882aac 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -211,6 +211,10 @@ static int64_t *bb_rank;
 /* Operand->rank hashtable.  */
 static hash_map *operand_rank;
 
+/* SSA_NAMEs that are forms of loop accumulators and whose ranks need to be
+   biased.  */
+static auto_bitmap biased_names;
+
 /* Vector of SSA_NAMEs on which after reassociate_bb is done with
all basic blocks the CFG should be adjusted - basic blocks
split right after that SSA_NAME's definition statement and before
@@ -256,6 +260,50 @@ reassoc_remove_stmt (gimple_stmt_iterator *gsi)
the rank difference between two blocks.  */
 #define PHI_LOOP_BIAS (1 << 15)
 
+/* Return TRUE iff PHI_LOOP_BIAS should be propagated from one of the STMT's
+   operands to the STMT's left-hand side.  The goal is to preserve bias in code
+   like this:
+
+ x_1 = phi(x_0, x_2)
+ a = x_1 | 1
+ b = a ^ 2
+ .MEM = b
+ c = b + d
+ x_2 = c + e
+
+   That is, we need to preserve bias along single-use chains originating from
+   loop-carried phis.  Only GIMPLE_ASSIGNs to SSA_NAMEs are considered to be
+   uses, because only they participate in rank propagation.  */
+static bool
+propagate_bias_p (gimple *stmt)
+{
+  use_operand_p use;
+  imm_use_iterator use_iter;
+  gimple *single_use_stmt = NULL;
+
+  FOR_EACH_IMM_USE_FAST (use, use_iter, gimple_assign_lhs (stmt))
+    {
+      gimple *current_use_stmt = USE_STMT (use);
+
+      if (is_gimple_assign (current_use_stmt)
+          && TREE_CODE (gimple_assign_lhs (current_use_stmt)) == SSA_NAME)
+        {
+          if (single_use_stmt != NULL)
+            return false;
+          single_use_stmt = current_use_stmt;
+        }
+    }
+
+  if (single_use_stmt == NULL)
+    return false;
+
+  if (gimple_bb (stmt)->loop_father
+      != gimple_bb (single_use_stmt)->loop_father)
+    return false;
+
+  return true;
+}
+
 /* Rank assigned to a phi statement.  If STMT is a loop-carried phi of
an innermost loop, and the phi has only a single use which is inside
the loop, then the rank is the block rank of the loop latch plus an
@@ -313,46 +361,23 @@ phi_rank (gimple *stmt)
   return bb_rank[bb->index];
 }
 
-/* If EXP is an SSA_NAME defined by a PHI statement that represents a
-   loop-carried dependence of an innermost loop, return TRUE; else
-   return FALSE.  */
-static bool
-loop_carried_phi (tree exp)
-{
-  gimple *phi_stmt;
-  int64_t block_rank;
-
-  if (TREE_CODE (exp) != SSA_NAME
-  || SSA_NAME_IS_DEFAULT_DEF (exp))
-return false;
-
-  phi_stmt = SSA_NAME_DEF_STMT (exp);
-
-  if (gimple_code (SSA_NAME_DEF_STMT (exp)) != GIMPLE_PHI)
-return false;
-
-  /* Non-loop-carried phis have block rank.  Loop-carried phis have
- an additional bias added in.  If this phi doesn't have block rank,
- it's biased and should not be propagated.  */
-  block_rank = bb_rank[gimple_bb (phi_stmt)->index];
-
-  if (phi_rank (phi_stmt) != block_rank)
-return true;
-
-  return false;
-}
-
 /* Return the maximum of RANK and the rank that should be propagated
from expression OP.  For most operands, this is just the rank of OP.
For loop-carried phis, the value is zero to avoid undoing the bias
in favor of the phi.  */
 static int64_t
-propagate_rank (int64_t rank, tree op)
+propagate_rank (int64_t rank, tree op, gimple *stmt, bool *bias_p)
 {
   int64_t op_rank;
 
-  if (loop_carried_phi (op))
-return rank;
+  if (TREE_CODE (op) == SSA_NAME
+  && bitmap_bit_p (biased_names, SSA_NAME_VERSION (op)))
+{
+  i

[PATCH v2 1/3] reassoc: Do not bias loop-carried PHIs early

2021-09-21 Thread Ilya Leoshkevich via Gcc-patches
Biasing loop-carried PHIs during the 1st reassociation pass interferes
with reduction chains and does not bring measurable benefits, so do it
only during the 2nd reassociation pass.

gcc/ChangeLog:

* passes.def (pass_reassoc): Rename parameter to early_p.
* tree-ssa-reassoc.c (reassoc_bias_loop_carried_phi_ranks_p):
New variable.
(phi_rank): Don't bias loop-carried phi ranks
before vectorization pass.
(execute_reassoc): Add bias_loop_carried_phi_ranks_p parameter.
(pass_reassoc::pass_reassoc): Add bias_loop_carried_phi_ranks_p
initializer.
(pass_reassoc::set_param): Set bias_loop_carried_phi_ranks_p
value.
(pass_reassoc::execute): Pass bias_loop_carried_phi_ranks_p to
execute_reassoc.
(pass_reassoc::bias_loop_carried_phi_ranks_p): New member.
---
 gcc/passes.def |  4 ++--
 gcc/tree-ssa-reassoc.c | 16 ++--
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index d7a1f8c97a6..c5f915d04c6 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -242,7 +242,7 @@ along with GCC; see the file COPYING3.  If not see
   /* Identify paths that should never be executed in a conforming
 program and isolate those paths.  */
   NEXT_PASS (pass_isolate_erroneous_paths);
-  NEXT_PASS (pass_reassoc, true /* insert_powi_p */);
+  NEXT_PASS (pass_reassoc, true /* early_p */);
   NEXT_PASS (pass_dce);
   NEXT_PASS (pass_forwprop);
   NEXT_PASS (pass_phiopt, false /* early_p */);
@@ -325,7 +325,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_lower_vector_ssa);
   NEXT_PASS (pass_lower_switch);
   NEXT_PASS (pass_cse_reciprocals);
-  NEXT_PASS (pass_reassoc, false /* insert_powi_p */);
+  NEXT_PASS (pass_reassoc, false /* early_p */);
   NEXT_PASS (pass_strength_reduction);
   NEXT_PASS (pass_split_paths);
   NEXT_PASS (pass_tracer);
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 8498cfc7aa8..420c14e8cf5 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -180,6 +180,10 @@ along with GCC; see the file COPYING3.  If not see
point 3a in the pass header comment.  */
 static bool reassoc_insert_powi_p;
 
+/* Enable biasing ranks of loop accumulators.  We don't want this before
+   vectorization, since it interferes with reduction chains.  */
+static bool reassoc_bias_loop_carried_phi_ranks_p;
+
 /* Statistics */
 static struct
 {
@@ -269,6 +273,9 @@ phi_rank (gimple *stmt)
   use_operand_p use;
   gimple *use_stmt;
 
+  if (!reassoc_bias_loop_carried_phi_ranks_p)
+return bb_rank[bb->index];
+
   /* We only care about real loops (those with a latch).  */
   if (!father->latch)
 return bb_rank[bb->index];
@@ -6940,9 +6947,10 @@ fini_reassoc (void)
optimization of a gimple conditional.  Otherwise returns zero.  */
 
 static unsigned int
-execute_reassoc (bool insert_powi_p)
+execute_reassoc (bool insert_powi_p, bool bias_loop_carried_phi_ranks_p)
 {
   reassoc_insert_powi_p = insert_powi_p;
+  reassoc_bias_loop_carried_phi_ranks_p = bias_loop_carried_phi_ranks_p;
 
   init_reassoc ();
 
@@ -6983,15 +6991,19 @@ public:
 {
   gcc_assert (n == 0);
   insert_powi_p = param;
+  bias_loop_carried_phi_ranks_p = !param;
 }
   virtual bool gate (function *) { return flag_tree_reassoc != 0; }
   virtual unsigned int execute (function *)
-{ return execute_reassoc (insert_powi_p); }
+  {
+return execute_reassoc (insert_powi_p, bias_loop_carried_phi_ranks_p);
+  }
 
  private:
   /* Enable insertion of __builtin_powi calls during execute_reassoc.  See
  point 3a in the pass header comment.  */
   bool insert_powi_p;
+  bool bias_loop_carried_phi_ranks_p;
 }; // class pass_reassoc
 
 } // anon namespace
-- 
2.31.1



[PATCH v2 0/3] reassoc: Propagate PHI_LOOP_BIAS along single uses

2021-09-21 Thread Ilya Leoshkevich via Gcc-patches
This is an update to my very old patch with the review comments
addressed.  Bootstrapped and regtested x86_64-redhat-linux,
ppc64le-redhat-linux and s390x-redhat-linux.

v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548785.html
Changes in v2:
* Disable PHI biasing in the early pass instance in a separate patch.
* Replace s390-specific tests with the generic tree-ssa ones.
* Replace the fragile (op_rank & PHI_LOOP_BIAS) test with auto_bitmap
  biased_names.  The review suggestion was rather to check whether op
  is defined by a loop-carried phi, but this would allow detecting only
  single assignments, and not assignment chains.  Another alternative
  that would make the check less fragile was to use saturating addition
  in order to prevent overflows into the PHI_LOOP_BIAS bit, but
  auto_bitmap of SSA_NAMEs allows graceful processing of large basic
  blocks, and its memory overhead looks acceptable.
* Restructure the code to make it a bit more readable.  The overall
  logic is the same as in v1.  I considered implementing an idea from
  [1], more specifically, detecting single-use chains in
  is_phi_for_stmt() so that swap_ops_for_binary_stmt() shifts the
  corresponding operand towards the end.  These two functions actually
  seem to serve a very related purpose.  However, for single-use chain
  detection we would still need to recursively traverse
  SSA_NAME_DEF_STMTs of operands, which propagate_rank() and friends
  already do.  So this would not have resulted in a significant code
  simplification.
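
To illustrate why the (op_rank & PHI_LOOP_BIAS) test is fragile, here is a
minimal standalone sketch.  It does not use GCC's auto_bitmap; the toy_*
names and the 64-entry limit are assumptions made for this example only.
Only the value (1 << 15) is taken from tree-ssa-reassoc.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as PHI_LOOP_BIAS in tree-ssa-reassoc.c.  */
#define TOY_PHI_LOOP_BIAS (1 << 15)

/* Fragile test: any rank that has grown large enough to have bit 15 set
   looks biased, whether or not a bias was ever added.  */
bool
toy_biased_by_bit_p (int64_t rank)
{
  return (rank & TOY_PHI_LOOP_BIAS) != 0;
}

/* Explicit bookkeeping: one bit per (toy) SSA name version.  This toy set
   only supports versions 0..63.  */
static uint64_t toy_biased_names;

void
toy_mark_biased (unsigned int version)
{
  toy_biased_names |= UINT64_C (1) << version;
}

bool
toy_biased_by_set_p (unsigned int version)
{
  return ((toy_biased_names >> version) & 1) != 0;
}
```

A rank of 32768 that was reached purely by accumulating ordinary ranks is
reported as biased by the bit test, while the set-based test answers only
for names that were explicitly marked.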

[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-June/549149.html

Ilya Leoshkevich (3):
  reassoc: Do not bias loop-carried PHIs early
  reassoc: Propagate PHI_LOOP_BIAS along single uses
  reassoc: Test rank biasing

 gcc/passes.def |   4 +-
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c |   7 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h |  33 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c |   9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c |   9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c |  11 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c |  10 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c |  11 ++
 gcc/tree-ssa-reassoc.c | 113 ++---
 9 files changed, 170 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c

-- 
2.31.1



[PATCH] IBM Z: Enable LSan and TSan

2021-07-27 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?

libsanitizer/ChangeLog:

* configure.tgt (s390*-*-linux*): Enable LSan and TSan for
s390x.
---
 libsanitizer/configure.tgt | 5 +
 1 file changed, 5 insertions(+)

diff --git a/libsanitizer/configure.tgt b/libsanitizer/configure.tgt
index 0ca5d9fd924..f635e412bdc 100644
--- a/libsanitizer/configure.tgt
+++ b/libsanitizer/configure.tgt
@@ -41,6 +41,11 @@ case "${target}" in
   sparc*-*-linux*)
;;
   s390*-*-linux*)
+   if test x$ac_cv_sizeof_void_p = x8; then
+   TSAN_SUPPORTED=yes
+   LSAN_SUPPORTED=yes
+   TSAN_TARGET_DEPENDENT_OBJECTS=tsan_rtl_s390x.lo
+   fi
;;
   sparc*-*-solaris2.11*)
;;
-- 
2.31.1



[PATCH] IBM Z: Fix 5 tests in 31-bit mode

2021-07-23 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?



gcc/testsuite/ChangeLog:

* gcc.target/s390/global-array-element-pic2.c: Add -mzarch, add
an expectation for 31-bit mode.
* gcc.target/s390/load-imm64-1.c: Use unsigned long long.
* gcc.target/s390/load-imm64-2.c: Likewise.
* gcc.target/s390/vector/long-double-vx-macro-off-on.c: Use
-mzarch.
* gcc.target/s390/vector/long-double-vx-macro-on-off.c:
Likewise.
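
As a side note on the load-imm64 changes, here is a small sketch of the
underlying issue, assuming that unsigned long is 32 bits wide in 31-bit
mode.  The helper magic_truncated_as_unsigned_long_p () is a hypothetical
name that exists only in this example, not in the testsuite.

```c
#include <stdbool.h>

/* The constant used by load-imm64-1.c and load-imm64-2.c.  */
#define MAGIC 0x3f08c5392f756cdULL

/* unsigned long long is at least 64 bits wide in every data model, so the
   constant is returned intact.  */
unsigned long long
magic (void)
{
  return MAGIC;
}

/* Return true iff the constant would lose its upper bits when converted to
   unsigned long, i.e. when unsigned long is narrower than 64 bits.  */
bool
magic_truncated_as_unsigned_long_p (void)
{
  return (unsigned long long) (unsigned long) MAGIC != MAGIC;
}
```

On an LP64 target the helper returns false; where unsigned long is 32 bits
wide it returns true, which is the situation the tests ran into.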
---
 gcc/testsuite/gcc.target/s390/global-array-element-pic2.c| 5 +++--
 gcc/testsuite/gcc.target/s390/load-imm64-1.c | 4 ++--
 gcc/testsuite/gcc.target/s390/load-imm64-2.c | 4 ++--
 .../gcc.target/s390/vector/long-double-vx-macro-off-on.c | 2 +-
 .../gcc.target/s390/vector/long-double-vx-macro-on-off.c | 2 +-
 5 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/global-array-element-pic2.c b/gcc/testsuite/gcc.target/s390/global-array-element-pic2.c
index 72b87d40b85..0ee10841cac 100644
--- a/gcc/testsuite/gcc.target/s390/global-array-element-pic2.c
+++ b/gcc/testsuite/gcc.target/s390/global-array-element-pic2.c
@@ -1,6 +1,6 @@
 /* Test accesses to global array elements in PIC code.  */
 /* { dg-do compile } */
-/* { dg-options "-O1 -march=z10 -fPIC" } */
+/* { dg-options "-O1 -march=z10 -mzarch -fPIC" } */
 
 extern char a[] __attribute__ ((aligned (2)));
 extern char *b;
@@ -8,6 +8,7 @@ extern char *b;
 void c()
 {
   b = a + 4;
-  /* { dg-final { scan-assembler "(?n)\n\tlgrl\t%r\\d+,a@GOTENT\n" } } */
+  /* { dg-final { scan-assembler "(?n)\n\tlgrl\t%r\\d+,a@GOTENT\n" { target lp64 } } } */
+  /* { dg-final { scan-assembler "(?n)\n\tlrl\t%r\\d+,a@GOTENT\n" { target { ! lp64 } } } } */
   /* { dg-final { scan-assembler-not "(?n)\n\tlarl\t%r\\d+,a\[^@\]" } } */
 }
diff --git a/gcc/testsuite/gcc.target/s390/load-imm64-1.c b/gcc/testsuite/gcc.target/s390/load-imm64-1.c
index 03d17f59096..8e812f2f01d 100644
--- a/gcc/testsuite/gcc.target/s390/load-imm64-1.c
+++ b/gcc/testsuite/gcc.target/s390/load-imm64-1.c
@@ -4,10 +4,10 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -march=z9-109" } */
 
-unsigned long
+unsigned long long
 magic (void)
 {
-  return 0x3f08c5392f756cd;
+  return 0x3f08c5392f756cdULL;
 }
 
 /* { dg-final { scan-assembler-times {\n\tllihf\t} 1 { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target/s390/load-imm64-2.c b/gcc/testsuite/gcc.target/s390/load-imm64-2.c
index ee0ff3b0a91..c3536b4d031 100644
--- a/gcc/testsuite/gcc.target/s390/load-imm64-2.c
+++ b/gcc/testsuite/gcc.target/s390/load-imm64-2.c
@@ -4,10 +4,10 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -march=z10" } */
 
-unsigned long
+unsigned long long
 magic (void)
 {
-  return 0x3f08c5392f756cd;
+  return 0x3f08c5392f756cdULL;
 }
 
 /* { dg-final { scan-assembler-times {\n\tllihf\t} 1 { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c b/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c
index 2d67679bb11..513912e669d 100644
--- a/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c
+++ b/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target target_attribute } */
-/* { dg-options "-march=z14" } */
+/* { dg-options "-march=z14 -mzarch" } */
 #if !defined(__LONG_DOUBLE_VX__)
 #error
 #endif
diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c b/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c
index 6f264313408..6b3cb321338 100644
--- a/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c
+++ b/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target target_attribute } */
-/* { dg-options "-march=z13" } */
+/* { dg-options "-march=z13 -mzarch" } */
 #if defined(__LONG_DOUBLE_VX__)
 #error
 #endif
-- 
2.31.1



[PATCH v3] IBM Z: Use @PLT symbols for local functions in 64-bit mode

2021-07-12 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?

v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573614.html
v1 -> v2: Do not use UNSPEC_PLT in 64-bit code and rename it to
  UNSPEC_PLT31 (Ulrich, Andreas).  Do not append @PLT only to
  weak symbols in non-PIC code (Ulrich).  Add TLS tests.

v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574646.html
v2 -> v3: Use %K in function_profiler() and s390_output_mi_thunk(),
  add tests for these cases.



This helps with generating code for kernel hotpatches, which contain
individual functions and are loaded more than 2G away from vmlinux.
This should not create performance regressions for the normal use
cases, because for local functions ld replaces @PLT calls with direct
calls.

gcc/ChangeLog:

* config/s390/predicates.md (bras_sym_operand): Accept all
functions in 64-bit mode, use UNSPEC_PLT31.
(larl_operand): Use UNSPEC_PLT31.
* config/s390/s390.c (s390_loadrelative_operand_p): Likewise.
(legitimize_pic_address): Likewise.
(s390_emit_tls_call_insn): Mark __tls_get_offset as function,
use UNSPEC_PLT31.
(s390_delegitimize_address): Use UNSPEC_PLT31.
(s390_output_addr_const_extra): Likewise.
(print_operand): Add @PLT to TLS calls, handle %K.
(s390_function_profiler): Mark __fentry__/_mcount as function,
use %K, use UNSPEC_PLT31.
(s390_output_mi_thunk): Use only UNSPEC_GOT, use %K.
(s390_emit_call): Use UNSPEC_PLT31.
(s390_emit_tpf_eh_return): Mark __tpf_eh_return as function.
* config/s390/s390.md (UNSPEC_PLT31): Rename from UNSPEC_PLT.
(*movdi_64): Use %K.
(reload_base_64): Likewise.
(*sibcall_brc): Likewise.
(*sibcall_brcl): Likewise.
(*sibcall_value_brc): Likewise.
(*sibcall_value_brcl): Likewise.
(*bras): Likewise.
(*brasl): Likewise.
(*bras_r): Likewise.
(*brasl_r): Likewise.
(*bras_tls): Likewise.
(*brasl_tls): Likewise.
(main_base_64): Likewise.
(reload_base_64): Likewise.
(@split_stack_call): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/visibility/noPLT.C: Skip on s390x.
* g++.target/s390/mi-thunk.C: New test.
* gcc.target/s390/nodatarel-1.c: Move foostatic to the new
tests.
* gcc.target/s390/pr80080-4.c: Allow @PLT suffix.
* gcc.target/s390/risbg-ll-3.c: Likewise.
* gcc.target/s390/call.h: Common code for the new tests.
* gcc.target/s390/call-z10-pic-nodatarel.c: New test.
* gcc.target/s390/call-z10-pic.c: New test.
* gcc.target/s390/call-z10.c: New test.
* gcc.target/s390/call-z9-pic-nodatarel.c: New test.
* gcc.target/s390/call-z9-pic.c: New test.
* gcc.target/s390/call-z9.c: New test.
* gcc.target/s390/mfentry-m64-pic.c: New test.
* gcc.target/s390/tls.h: Common code for the new TLS tests.
* gcc.target/s390/tls-pic.c: New test.
* gcc.target/s390/tls.c: New test.
---
 gcc/config/s390/predicates.md |  9 ++-
 gcc/config/s390/s390.c| 81 +--
 gcc/config/s390/s390.md   | 32 
 gcc/testsuite/g++.dg/ext/visibility/noPLT.C   |  2 +-
 gcc/testsuite/g++.target/s390/mi-thunk.C  | 23 ++
 .../gcc.target/s390/call-z10-pic-nodatarel.c  | 20 +
 gcc/testsuite/gcc.target/s390/call-z10-pic.c  | 20 +
 gcc/testsuite/gcc.target/s390/call-z10.c  | 20 +
 .../gcc.target/s390/call-z9-pic-nodatarel.c   | 18 +
 gcc/testsuite/gcc.target/s390/call-z9-pic.c   | 18 +
 gcc/testsuite/gcc.target/s390/call-z9.c   | 20 +
 gcc/testsuite/gcc.target/s390/call.h  | 40 +
 .../gcc.target/s390/mfentry-m64-pic.c |  9 +++
 gcc/testsuite/gcc.target/s390/nodatarel-1.c   | 26 +-
 gcc/testsuite/gcc.target/s390/pr80080-4.c |  2 +-
 gcc/testsuite/gcc.target/s390/risbg-ll-3.c|  6 +-
 gcc/testsuite/gcc.target/s390/tls-pic.c   | 14 
 gcc/testsuite/gcc.target/s390/tls.c   | 10 +++
 gcc/testsuite/gcc.target/s390/tls.h   | 23 ++
 19 files changed, 320 insertions(+), 73 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/s390/mi-thunk.C
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call.h
 create mode 100644 gcc/testsuite/gcc.target/s390/mfentry-m64-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/tls-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/tls.c
 create mode 1006

Re: [PATCH v2] IBM Z: Use @PLT symbols for local functions in 64-bit mode

2021-07-07 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2021-07-07 at 21:03 +0200, Ilya Leoshkevich wrote:
> Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?
> 
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573614.html
> v1 -> v2: Do not use UNSPEC_PLT in 64-bit code and rename it to
>   UNSPEC_PLT31 (Ulrich, Andreas).  Do not append @PLT only to
>   weak symbols in non-PIC code (Ulrich).  Add TLS tests.
> 
> 
> 
> This helps with generating code for kernel hotpatches, which contain
> individual functions and are loaded more than 2G away from vmlinux.
> This should not create performance regressions for the normal use
> cases, because for local functions ld replaces @PLT calls with direct
> calls.

Please disregard this patch, I just realized I missed two
output_asm_insn () calls in s390.c: one in function_profiler () and
one in s390_output_mi_thunk ().  I'll send a v3.



[PATCH v2] IBM Z: Use @PLT symbols for local functions in 64-bit mode

2021-07-07 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?

v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573614.html
v1 -> v2: Do not use UNSPEC_PLT in 64-bit code and rename it to
  UNSPEC_PLT31 (Ulrich, Andreas).  Do not append @PLT only to
  weak symbols in non-PIC code (Ulrich).  Add TLS tests.



This helps with generating code for kernel hotpatches, which contain
individual functions and are loaded more than 2G away from vmlinux.
This should not create performance regressions for the normal use
cases, because for local functions ld replaces @PLT calls with direct
calls.

gcc/ChangeLog:

* config/s390/predicates.md (bras_sym_operand): Accept all
functions in 64-bit mode, use UNSPEC_PLT31.
(larl_operand): Use UNSPEC_PLT31.
* config/s390/s390.c (s390_loadrelative_operand_p): Likewise.
(legitimize_pic_address): Likewise.
(s390_emit_tls_call_insn): Mark __tls_get_offset as function,
use UNSPEC_PLT31.
(s390_delegitimize_address): Use UNSPEC_PLT31.
(s390_output_addr_const_extra): Likewise.
(print_operand): Add @PLT to TLS calls, handle %K.
(s390_function_profiler): Mark __fentry__/_mcount as function,
use UNSPEC_PLT31.
(s390_output_mi_thunk): Use only UNSPEC_GOT.
(s390_emit_call): Use UNSPEC_PLT31.
(s390_emit_tpf_eh_return): Mark __tpf_eh_return as function.
* config/s390/s390.md (UNSPEC_PLT31): Rename from UNSPEC_PLT.
(*movdi_64): Use %K.
(reload_base_64): Likewise.
(*sibcall_brc): Likewise.
(*sibcall_brcl): Likewise.
(*sibcall_value_brc): Likewise.
(*sibcall_value_brcl): Likewise.
(*bras): Likewise.
(*brasl): Likewise.
(*bras_r): Likewise.
(*brasl_r): Likewise.
(*bras_tls): Likewise.
(*brasl_tls): Likewise.
(main_base_64): Likewise.
(reload_base_64): Likewise.
(@split_stack_call): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/visibility/noPLT.C: Skip on s390x.
* gcc.target/s390/nodatarel-1.c: Move foostatic to the new
tests.
* gcc.target/s390/pr80080-4.c: Allow @PLT suffix.
* gcc.target/s390/risbg-ll-3.c: Likewise.
* gcc.target/s390/call.h: Common code for the new tests.
* gcc.target/s390/call31-z10-pic-nodatarel.c: New test.
* gcc.target/s390/call31-z10-pic.c: New test.
* gcc.target/s390/call31-z10.c: New test.
* gcc.target/s390/call31-z9-pic-nodatarel.c: New test.
* gcc.target/s390/call31-z9-pic.c: New test.
* gcc.target/s390/call31-z9.c: New test.
* gcc.target/s390/call64-z10-pic-nodatarel.c: New test.
* gcc.target/s390/call64-z10-pic.c: New test.
* gcc.target/s390/call64-z10.c: New test.
* gcc.target/s390/call64-z9-pic-nodatarel.c: New test.
* gcc.target/s390/call64-z9-pic.c: New test.
* gcc.target/s390/call64-z9.c: New test.
* gcc.target/s390/tls.h: Common code for the new TLS tests.
* gcc.target/s390/tls31-pic.c: New test.
* gcc.target/s390/tls31.c: New test.
* gcc.target/s390/tls64-pic.c: New test.
* gcc.target/s390/tls64.c: New test.
---
 gcc/config/s390/predicates.md |  9 ++-
 gcc/config/s390/s390.c| 73 ++-
 gcc/config/s390/s390.md   | 32 
 gcc/testsuite/g++.dg/ext/visibility/noPLT.C   |  2 +-
 gcc/testsuite/gcc.target/s390/call.h  | 40 ++
 .../s390/call31-z10-pic-nodatarel.c   | 16 
 .../gcc.target/s390/call31-z10-pic.c  | 16 
 gcc/testsuite/gcc.target/s390/call31-z10.c| 15 
 .../gcc.target/s390/call31-z9-pic-nodatarel.c | 16 
 gcc/testsuite/gcc.target/s390/call31-z9-pic.c | 16 
 gcc/testsuite/gcc.target/s390/call31-z9.c | 15 
 .../s390/call64-z10-pic-nodatarel.c   | 17 +
 .../gcc.target/s390/call64-z10-pic.c  | 17 +
 gcc/testsuite/gcc.target/s390/call64-z10.c| 15 
 .../gcc.target/s390/call64-z9-pic-nodatarel.c | 17 +
 gcc/testsuite/gcc.target/s390/call64-z9-pic.c | 17 +
 gcc/testsuite/gcc.target/s390/call64-z9.c | 15 
 gcc/testsuite/gcc.target/s390/nodatarel-1.c   | 26 +--
 gcc/testsuite/gcc.target/s390/pr80080-4.c |  2 +-
 gcc/testsuite/gcc.target/s390/risbg-ll-3.c|  6 +-
 gcc/testsuite/gcc.target/s390/tls.h   | 23 ++
 gcc/testsuite/gcc.target/s390/tls31-pic.c | 14 
 gcc/testsuite/gcc.target/s390/tls31.c |  9 +++
 gcc/testsuite/gcc.target/s390/tls64-pic.c | 14 
 gcc/testsuite/gcc.target/s390/tls64.c |  9 +++
 25 files changed, 382 insertions(+), 69 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/call.h
 create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10-pic.c
 create mode 100644 gcc/tes

[PATCH] IBM Z: Use @PLT symbols for local functions in 64-bit mode

2021-06-24 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?



This helps with generating the code for kernel hotpatches, which
contain individual functions and are loaded more than 2G away from
vmlinux.  This should not create performance regressions for the
normal use cases, because for local functions ld replaces @PLT calls
with direct calls.
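
As a rough illustration of the %K output modifier added in the diff below,
here is a standalone C sketch that mirrors the condition in the new
print_operand () case.  The struct, its fields and both function names are
hypothetical stand-ins for TARGET_64BIT, flag_pic,
GET_CODE (x) == SYMBOL_REF and SYMBOL_REF_FUNCTION_P (x); this is a model
of the decision, not the GCC code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of the state that the 'K' case inspects.  */
struct operand_model
{
  bool target_64bit;   /* stands in for TARGET_64BIT */
  bool pic;            /* stands in for flag_pic */
  bool is_symbol_ref;  /* stands in for GET_CODE (x) == SYMBOL_REF */
  bool is_function;    /* stands in for SYMBOL_REF_FUNCTION_P (x) */
};

/* Return the suffix that the modeled 'K' case would print.  */
const char *
k_suffix (const struct operand_model *op)
{
  assert (op != NULL);
  if (op->target_64bit && op->pic && op->is_symbol_ref && op->is_function)
    return "@PLT";
  return "";
}

/* Format an illustrative call instruction into BUF and return the
   number of characters that snprintf () reports.  */
int
format_call (char *buf, size_t size, const char *symbol,
             const struct operand_model *op)
{
  return snprintf (buf, size, "brasl\t%%r14,%s%s", symbol, k_suffix (op));
}
```

In this model a function symbol in 64-bit PIC code gets the suffix, while
a data symbol or a non-PIC compilation is printed unchanged.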

gcc/ChangeLog:

* config/s390/s390.c (print_operand): Handle %K.
* config/s390/s390.md (*movdi_64): Use %K for larl.
(reload_base_64): Likewise.
(*sibcall_brc): Use %K for j.
(*sibcall_brcl): Use %K for jg.
(*sibcall_value_brc): Use %K for j.
(*sibcall_value_brcl): Use %K for jg.
(*bras): Use %K.
(*brasl): Likewise.
(*bras_r): Likewise.
(*brasl_r): Likewise.
(main_base_64): Use %K for larl.
(reload_base_64): Likewise.
(@split_stack_call): Use %K for jg.

gcc/testsuite/ChangeLog:

* g++.dg/ext/visibility/noPLT.C: Skip on s390x.
* gcc.target/s390/nodatarel-1.c: Move foostatic to the new
tests.
* gcc.target/s390/pr80080-4.c: Allow @PLT suffix.
* gcc.target/s390/risbg-ll-3.c: Likewise.
* gcc.target/s390/call.h: Common code for the new tests.
* gcc.target/s390/call31-z10-pic-nodatarel.c: New test.
* gcc.target/s390/call31-z10-pic.c: New test.
* gcc.target/s390/call31-z10.c: New test.
* gcc.target/s390/call31-z9-pic-nodatarel.c: New test.
* gcc.target/s390/call31-z9-pic.c: New test.
* gcc.target/s390/call31-z9.c: New test.
* gcc.target/s390/call64-z10-pic-nodatarel.c: New test.
* gcc.target/s390/call64-z10-pic.c: New test.
* gcc.target/s390/call64-z10.c: New test.
* gcc.target/s390/call64-z9-pic-nodatarel.c: New test.
* gcc.target/s390/call64-z9-pic.c: New test.
* gcc.target/s390/call64-z9.c: New test.
---
 gcc/config/s390/s390.c|  9 +
 gcc/config/s390/s390.md   | 26 ++---
 gcc/testsuite/g++.dg/ext/visibility/noPLT.C   |  2 +-
 gcc/testsuite/gcc.target/s390/call.h  | 38 +++
 .../s390/call31-z10-pic-nodatarel.c   | 16 
 .../gcc.target/s390/call31-z10-pic.c  | 16 
 gcc/testsuite/gcc.target/s390/call31-z10.c| 15 
 .../gcc.target/s390/call31-z9-pic-nodatarel.c | 16 
 gcc/testsuite/gcc.target/s390/call31-z9-pic.c | 16 
 gcc/testsuite/gcc.target/s390/call31-z9.c | 15 
 .../s390/call64-z10-pic-nodatarel.c   | 17 +
 .../gcc.target/s390/call64-z10-pic.c  | 17 +
 gcc/testsuite/gcc.target/s390/call64-z10.c| 15 
 .../gcc.target/s390/call64-z9-pic-nodatarel.c | 17 +
 gcc/testsuite/gcc.target/s390/call64-z9-pic.c | 17 +
 gcc/testsuite/gcc.target/s390/call64-z9.c | 15 
 gcc/testsuite/gcc.target/s390/nodatarel-1.c   | 26 +
 gcc/testsuite/gcc.target/s390/pr80080-4.c |  2 +-
 gcc/testsuite/gcc.target/s390/risbg-ll-3.c|  6 +--
 19 files changed, 258 insertions(+), 43 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/call.h
 create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call31-z9-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call31-z9-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call31-z9.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call64-z10-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call64-z10-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call64-z10.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call64-z9-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call64-z9-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call64-z9.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 6bbeb640e1f..e7839044a40 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -7943,6 +7943,7 @@ print_operand_address (FILE *file, rtx addr)
 'E': print opcode suffix for branch on index instruction.
 'G': print the size of the operand in bytes.
 'J': print tls_load/tls_gdcall/tls_ldcall suffix
+'K': print @PLT suffix for call targets and load address values.
 'M': print the second word of a TImode operand.
 'N': print the second word of a DImode operand.
 'O': print only the displacement of a memory reference or address.
@@ -8129,6 +8130,14 @@ print_operand (FILE *file, rtx x, int code)
 case 'Y':
   print_shift_count_operand (file, x);
   return;
+
+case 'K':
+  if (TARGET_64BIT
+ && flag_pic
+ && GET_CODE (x) == SYMBOL_REF
+ && SYMBOL_REF_FUNCTION_P (x))
+   fprintf (file, "@PLT");
+  return

[PATCH v2] IBM Z: Define NO_PROFILE_COUNTERS

2021-06-23 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?

v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573348.html
v1 -> v2: Use ATTRIBUTE_UNUSED, compact op[] array (Andreas).
  I've also noticed that one of the nops that we generate for
  -mnop-mcount is not needed now and removed it.  A couple
  tests needed to be adjusted after that.




s390 glibc does not need counters in the .data section, since it stores
edge hits in its own data structure.  Therefore counters only waste
space and confuse diffing tools (e.g. kpatch), so don't generate them.
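
To make the prologue size change concrete: the -mnop-mcount paths in the
diff below pass a sum of per-instruction counts to output_asm_nops (), and
dropping larl removes three units from each sum.  A minimal sketch of that
arithmetic, assuming the hw argument counts 2-byte halfwords (the counts
are copied from the comments in the diff; the function and enum names are
hypothetical):

```c
#include <assert.h>

enum { BYTES_PER_HALFWORD = 2 };

/* Per-instruction sizes in halfwords, as annotated in the diff.  */
enum { STG_HW = 3, LG_HW = 3, ST_HW = 2, L_HW = 2, LARL_HW = 3, BRASL_HW = 3 };

/* Halfwords of nops for the save/call/restore sequence.  WITH_LARL
   selects the old sequence that still loaded the counter address.  */
int
nop_halfwords (int is_64bit, int with_larl)
{
  int hw = is_64bit ? STG_HW + BRASL_HW + LG_HW
                    : ST_HW + BRASL_HW + L_HW;
  if (with_larl)
    hw += LARL_HW;
  assert (hw > 0);
  return hw;
}

/* The same size expressed in bytes.  */
int
nop_bytes (int is_64bit, int with_larl)
{
  return nop_halfwords (is_64bit, with_larl) * BYTES_PER_HALFWORD;
}
```

Under that assumption the 64-bit sequence shrinks from 12 to 9 halfwords
and the 31-bit one from 10 to 7, which is why the two mnop-mcount tests
need adjusting.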

gcc/ChangeLog:

* config/s390/s390.c (s390_function_profiler): Ignore labelno
parameter.
* config/s390/s390.h (NO_PROFILE_COUNTERS): Define.

gcc/testsuite/ChangeLog:

* gcc.target/s390/mnop-mcount-m31-mzarch.c: Adapt to the new
prologue size.
* gcc.target/s390/mnop-mcount-m64.c: Likewise.
---
 gcc/config/s390/s390.c| 42 +++
 gcc/config/s390/s390.h|  2 +
 .../gcc.target/s390/mnop-mcount-m31-mzarch.c  |  2 +-
 .../gcc.target/s390/mnop-mcount-m64.c |  2 +-
 4 files changed, 20 insertions(+), 28 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 6bbeb640e1f..590dd8f35bc 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -13110,33 +13110,25 @@ output_asm_nops (const char *user, int hw)
 }
 }
 
-/* Output assembler code to FILE to increment profiler label # LABELNO
-   for profiling a function entry.  */
+/* Output assembler code to FILE to call a profiler hook.  */
 
 void
-s390_function_profiler (FILE *file, int labelno)
+s390_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
 {
-  rtx op[8];
-
-  char label[128];
-  ASM_GENERATE_INTERNAL_LABEL (label, "LP", labelno);
+  rtx op[4];
 
   fprintf (file, "# function profiler \n");
 
   op[0] = gen_rtx_REG (Pmode, RETURN_REGNUM);
   op[1] = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
   op[1] = gen_rtx_MEM (Pmode, plus_constant (Pmode, op[1], UNITS_PER_LONG));
-  op[7] = GEN_INT (UNITS_PER_LONG);
-
-  op[2] = gen_rtx_REG (Pmode, 1);
-  op[3] = gen_rtx_SYMBOL_REF (Pmode, label);
-  SYMBOL_REF_FLAGS (op[3]) = SYMBOL_FLAG_LOCAL;
+  op[3] = GEN_INT (UNITS_PER_LONG);
 
-  op[4] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? "__fentry__" : "_mcount");
+  op[2] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? "__fentry__" : "_mcount");
   if (flag_pic)
 {
-  op[4] = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op[4]), UNSPEC_PLT);
-  op[4] = gen_rtx_CONST (Pmode, op[4]);
+  op[2] = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op[2]), UNSPEC_PLT);
+  op[2] = gen_rtx_CONST (Pmode, op[2]);
 }
 
   if (flag_record_mcount)
@@ -13150,20 +13142,19 @@ s390_function_profiler (FILE *file, int labelno)
warning (OPT_Wcannot_profile, "nested functions cannot be profiled "
 "with %<-mfentry%> on s390");
   else
-   output_asm_insn ("brasl\t0,%4", op);
+   output_asm_insn ("brasl\t0,%2", op);
 }
   else if (TARGET_64BIT)
 {
   if (flag_nop_mcount)
-   output_asm_nops ("-mnop-mcount", /* stg */ 3 + /* larl */ 3 +
-/* brasl */ 3 + /* lg */ 3);
+   output_asm_nops ("-mnop-mcount", /* stg */ 3 + /* brasl */ 3 +
+/* lg */ 3);
   else
{
  output_asm_insn ("stg\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
-   output_asm_insn (".cfi_rel_offset\t%0,%7", op);
- output_asm_insn ("larl\t%2,%3", op);
- output_asm_insn ("brasl\t%0,%4", op);
+   output_asm_insn (".cfi_rel_offset\t%0,%3", op);
+ output_asm_insn ("brasl\t%0,%2", op);
  output_asm_insn ("lg\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
output_asm_insn (".cfi_restore\t%0", op);
@@ -13172,15 +13163,14 @@ s390_function_profiler (FILE *file, int labelno)
   else
 {
   if (flag_nop_mcount)
-   output_asm_nops ("-mnop-mcount", /* st */ 2 + /* larl */ 3 +
-/* brasl */ 3 + /* l */ 2);
+   output_asm_nops ("-mnop-mcount", /* st */ 2 + /* brasl */ 3 +
+/* l */ 2);
   else
{
  output_asm_insn ("st\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
-   output_asm_insn (".cfi_rel_offset\t%0,%7", op);
- output_asm_insn ("larl\t%2,%3", op);
- output_asm_insn ("brasl\t%0,%4", op);
+   output_asm_insn (".cfi_rel_offset\t%0,%3", op);
+ output_asm_insn ("brasl\t%0,%2", op);
  output_asm_insn ("l\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
output_asm_insn (".cfi_restore\t%0", op);
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index 3b876160420..fb16a455a03 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -787,6 +787,8 @@ CUMULATIVE_ARGS;
 
 #define PROFILE_BEFORE_PROLOGUE 1
 
+#define NO_PROFILE_COUNTERS 1
+
 
 /* Trampolines 

[PATCH] IBM Z: Define NO_PROFILE_COUNTERS

2021-06-21 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?



s390 glibc does not need counters in the .data section, since it stores
edge hits in its own data structure.  Therefore counters only waste
space and confuse diffing tools (e.g. kpatch), so don't generate them.

gcc/ChangeLog:

* config/s390/s390.c (s390_function_profiler): Ignore labelno
parameter.
* config/s390/s390.h (NO_PROFILE_COUNTERS): Define.
---
 gcc/config/s390/s390.c | 14 ++
 gcc/config/s390/s390.h |  2 ++
 2 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 6bbeb640e1f..96c9a9db53b 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -13110,17 +13110,13 @@ output_asm_nops (const char *user, int hw)
 }
 }
 
-/* Output assembler code to FILE to increment profiler label # LABELNO
-   for profiling a function entry.  */
+/* Output assembler code to FILE to call a profiler hook.  */
 
 void
-s390_function_profiler (FILE *file, int labelno)
+s390_function_profiler (FILE *file, int /* labelno */)
 {
   rtx op[8];
 
-  char label[128];
-  ASM_GENERATE_INTERNAL_LABEL (label, "LP", labelno);
-
   fprintf (file, "# function profiler \n");
 
   op[0] = gen_rtx_REG (Pmode, RETURN_REGNUM);
@@ -13128,10 +13124,6 @@ s390_function_profiler (FILE *file, int labelno)
   op[1] = gen_rtx_MEM (Pmode, plus_constant (Pmode, op[1], UNITS_PER_LONG));
   op[7] = GEN_INT (UNITS_PER_LONG);
 
-  op[2] = gen_rtx_REG (Pmode, 1);
-  op[3] = gen_rtx_SYMBOL_REF (Pmode, label);
-  SYMBOL_REF_FLAGS (op[3]) = SYMBOL_FLAG_LOCAL;
-
   op[4] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? "__fentry__" : "_mcount");
   if (flag_pic)
 {
@@ -13162,7 +13154,6 @@ s390_function_profiler (FILE *file, int labelno)
  output_asm_insn ("stg\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
output_asm_insn (".cfi_rel_offset\t%0,%7", op);
- output_asm_insn ("larl\t%2,%3", op);
  output_asm_insn ("brasl\t%0,%4", op);
  output_asm_insn ("lg\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
@@ -13179,7 +13170,6 @@ s390_function_profiler (FILE *file, int labelno)
  output_asm_insn ("st\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
output_asm_insn (".cfi_rel_offset\t%0,%7", op);
- output_asm_insn ("larl\t%2,%3", op);
  output_asm_insn ("brasl\t%0,%4", op);
  output_asm_insn ("l\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index 3b876160420..fb16a455a03 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -787,6 +787,8 @@ CUMULATIVE_ARGS;
 
 #define PROFILE_BEFORE_PROLOGUE 1
 
+#define NO_PROFILE_COUNTERS 1
+
 
 /* Trampolines for nested functions.  */
 
-- 
2.31.1



[PATCH] IBM Z: Remove match_scratch workaround

2021-06-01 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?



Since commit dd1ef00c45ba ("Fix bug in the define_subst handling that
made match_scratch unusable for multi-alternative patterns.") the
workaround for that bug in *ashrdi3_31 is not only no
longer necessary, but actually breaks the build.

Get rid of it by using only one alternative in (match_scratch).  It
will be replicated as many times as needed in order to match the
pattern with which (define_subst) is used.

gcc/ChangeLog:

* config/s390/s390.md (*ashrdi3_31): Use a single
constraint.
* config/s390/subst.md (cconly_subst): Use a single constraint
in (match_scratch).

gcc/testsuite/ChangeLog:

* gcc.target/s390/ashr.c: New test.
---
 gcc/config/s390/s390.md  | 14 --
 gcc/config/s390/subst.md |  2 +-
 gcc/testsuite/gcc.target/s390/ashr.c | 11 +++
 3 files changed, 16 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/ashr.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 7faf775fbf2..0c5b4dc9029 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -9328,19 +9328,13 @@
   ""
   "")
 
-; FIXME: The number of alternatives is doubled here to match the fix
-; number of 2 in the subst pattern for the (clobber (match_scratch...
-; The right fix should be to support match_scratch in the output
-; pattern of a define_subst.
 (define_insn "*ashrdi3_31"
-  [(set (match_operand:DI 0 "register_operand"   "=d, d")
-(ashiftrt:DI (match_operand:DI 1 "register_operand"   "0, 0")
- (match_operand:QI 2 "shift_count_operand" "jsc,jsc")))
+  [(set (match_operand:DI 0 "register_operand"   "=d")
+(ashiftrt:DI (match_operand:DI 1 "register_operand"   "0")
+ (match_operand:QI 2 "shift_count_operand" "jsc")))
(clobber (reg:CC CC_REGNUM))]
   "!TARGET_ZARCH"
-  "@
-   srda\t%0,%Y2
-   srda\t%0,%Y2"
+  "srda\t%0,%Y2"
   [(set_attr "op_type" "RS")
(set_attr "atype"   "reg")])
 
diff --git a/gcc/config/s390/subst.md b/gcc/config/s390/subst.md
index 384af11c198..3ea6fc40ba8 100644
--- a/gcc/config/s390/subst.md
+++ b/gcc/config/s390/subst.md
@@ -45,7 +45,7 @@
   "s390_match_ccmode(insn, CCSmode)"
   [(set (reg CC_REGNUM)
(compare (match_dup 1) (const_int 0)))
-   (clobber (match_scratch:DSI 0 "=d,d"))])
+   (clobber (match_scratch:DSI 0 "=d"))])
 
 (define_subst_attr "cconly" "cconly_subst" "" "_cconly")
 
diff --git a/gcc/testsuite/gcc.target/s390/ashr.c b/gcc/testsuite/gcc.target/s390/ashr.c
new file mode 100644
index 000..8cffdfa9a1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/ashr.c
@@ -0,0 +1,11 @@
+/* Test the arithmetic shift right pattern.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int e(void);
+
+int f (long c, int b)
+{
+  return (c >> b) && e ();
+}
-- 
2.31.1



Re: [PATCH v2] IBM Z: Handle hard registers in s390_md_asm_adjust()

2021-05-03 Thread Ilya Leoshkevich via Gcc-patches
On Fri, 2021-04-30 at 08:49 +0200, Andreas Krebbel wrote:
> On 4/28/21 3:48 AM, Ilya Leoshkevich wrote:
> > Bootstrapped and regtested on s390x-redhat-linux.  Tested with valgrind
> > too (PR 100278 is now fixed).  Ok for master?
> > 
> > v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568771.html
> > v1 -> v2: Use the UNSPEC pattern, which is less efficient, but is more
> >   on the "obviously correct" side than gen_raw_SUBREG().
> > 
> > 
> > 
> > gen_fprx2_to_tf() and gen_tf_to_fprx2() cannot handle hard registers,
> > since the subregs they create do not pass validation.  Change
> > s390_md_asm_adjust() to manually copy between hard VRs and FPRs instead
> > of using these two functions.
> > 
> > gcc/ChangeLog:
> > 
> > PR target/100217
> > * config/s390/s390.c (s390_hard_fp_reg_p): New function.
> > (s390_md_asm_adjust): Handle hard registers.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR target/100217
> > * gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c: New test.
> > * gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c: New test.
> 
> Ok. Thanks!
> 
> Andreas

Thanks!

I forgot to ask: ok for gcc-11 branch?



[PATCH v2] IBM Z: Handle hard registers in s390_md_asm_adjust()

2021-04-27 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Tested with valgrind
too (PR 100278 is now fixed).  Ok for master?

v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568771.html
v1 -> v2: Use the UNSPEC pattern, which is less efficient, but is more
  on the "obviously correct" side than gen_raw_SUBREG().



gen_fprx2_to_tf() and gen_tf_to_fprx2() cannot handle hard registers,
since the subregs they create do not pass validation.  Change
s390_md_asm_adjust() to manually copy between hard VRs and FPRs instead
of using these two functions.
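
For the output case, the manual copy keeps the first doubleword of the
destination vector register and fills the second one from the first
doubleword of the other register of the pair.  A minimal C model of that
single permute, assuming a 128-bit vector register can be viewed as two
64-bit doublewords and that a floating-point register overlays doubleword
0 (the vreg type and both function names are hypothetical, and only the
selector-0 case is modeled):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of a 128-bit vector register.  */
typedef struct
{
  uint64_t dw[2];
} vreg;

/* Model of the doubleword permute for selector 0 only:
   result = { a.dw[0], b.dw[0] }.  Other selectors are not modeled.  */
vreg
permute_dw (vreg a, vreg b, int selector)
{
  assert (selector == 0);
  vreg r;
  r.dw[0] = a.dw[0];
  r.dw[1] = b.dw[0];
  return r;
}

/* Merge a register pair into one 128-bit value.  FIRST already holds
   the first half in doubleword 0, so only the second half, found in
   doubleword 0 of SECOND, has to be moved.  */
vreg
pair_to_vr (vreg first, vreg second)
{
  return permute_dw (first, second, 0);
}
```

Whatever was in the second doubleword of either input is discarded, which
matches the "copy only the second one" comment in the patch.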

gcc/ChangeLog:

PR target/100217
* config/s390/s390.c (s390_hard_fp_reg_p): New function.
(s390_md_asm_adjust): Handle hard registers.

gcc/testsuite/ChangeLog:

PR target/100217
* gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c: New test.
* gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c: New test.
---
 gcc/config/s390/s390.c| 52 +--
 .../long-double-asm-in-out-hard-fp-reg.c  | 33 
 .../long-double-asm-inout-hard-fp-reg.c   | 31 +++
 3 files changed, 112 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index a9c945c5ee9..88361f98c7e 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16754,6 +16754,23 @@ f_constraint_p (const char *constraint)
   return seen_f_p && !seen_v_p;
 }
 
+/* Return TRUE iff X is a hard floating-point (and not a vector) register.  */
+
+static bool
+s390_hard_fp_reg_p (rtx x)
+{
+  if (!(REG_P (x) && HARD_REGISTER_P (x) && REG_ATTRS (x)))
+return false;
+
+  tree decl = REG_EXPR (x);
+  if (!(HAS_DECL_ASSEMBLER_NAME_P (decl) && DECL_ASSEMBLER_NAME_SET_P (decl)))
+return false;
+
+  const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  return name[0] == '*' && name[1] == 'f';
+}
+
 /* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f"
constraints when long doubles are stored in vector registers.  */
 
@@ -16787,9 +16804,24 @@ s390_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &inputs,
   gcc_assert (allows_reg);
   gcc_assert (!is_inout);
   /* Copy output value from a FPR pair into a vector register.  */
-  rtx fprx2 = gen_reg_rtx (FPRX2mode);
+  rtx fprx2;
   push_to_sequence2 (after_md_seq, after_md_end);
-  emit_insn (gen_fprx2_to_tf (outputs[i], fprx2));
+  if (s390_hard_fp_reg_p (outputs[i]))
+   {
+ fprx2 = gen_rtx_REG (FPRX2mode, REGNO (outputs[i]));
+ /* The first half is already at the correct location, copy only the
+  * second one.  Use the UNSPEC pattern instead of the SUBREG one,
+  * since s390_can_change_mode_class() rejects
+  * (subreg:DF (reg:TF %fN) 8) and thus subreg validation fails.  */
+ rtx v1 = gen_rtx_REG (V2DFmode, REGNO (outputs[i]));
+ rtx v3 = gen_rtx_REG (V2DFmode, REGNO (outputs[i]) + 1);
+ emit_insn (gen_vec_permiv2df (v1, v1, v3, const0_rtx));
+   }
+  else
+   {
+ fprx2 = gen_reg_rtx (FPRX2mode);
+ emit_insn (gen_fprx2_to_tf (outputs[i], fprx2));
+   }
   after_md_seq = get_insns ();
   after_md_end = get_last_insn ();
   end_sequence ();
@@ -16813,8 +16845,20 @@ s390_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &inputs,
continue;
   gcc_assert (allows_reg);
   /* Copy input value from a vector register into a FPR pair.  */
-  rtx fprx2 = gen_reg_rtx (FPRX2mode);
-  emit_insn (gen_tf_to_fprx2 (fprx2, inputs[i]));
+  rtx fprx2;
+  if (s390_hard_fp_reg_p (inputs[i]))
+   {
+ fprx2 = gen_rtx_REG (FPRX2mode, REGNO (inputs[i]));
+ /* Copy only the second half.  */
+ rtx v1 = gen_rtx_REG (V2DFmode, REGNO (inputs[i]) + 1);
+ rtx v2 = gen_rtx_REG (V2DFmode, REGNO (inputs[i]));
+ emit_insn (gen_vec_permiv2df (v1, v2, v1, GEN_INT (3)));
+   }
+  else
+   {
+ fprx2 = gen_reg_rtx (FPRX2mode);
+ emit_insn (gen_tf_to_fprx2 (fprx2, inputs[i]));
+   }
   inputs[i] = fprx2;
   input_modes[i] = FPRX2mode;
 }
diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c b/gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c
new file mode 100644
index 000..2dcaf08f00b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzarch --save-temps" } */
+/* { dg-do run { target { s390_z14_hw } } } */
+#include 
+#include 
+
+__attribute__ ((noipa)) static long double
+sqxbr (long double x)
+{
+  register long double in asm("f0") = x;
+  register long double out asm("f1");
+
+  asm("sqxbr\t%0,%1" :

[PATCH] IBM Z: Handle hard registers in s390_md_asm_adjust()

2021-04-26 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Tested with valgrind
on top of 52a5515ed (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100278).  Ok for master?



gen_fprx2_to_tf() and gen_tf_to_fprx2() cannot handle hard registers,
since the subregs they create do not pass validation.  Change
s390_md_asm_adjust() to manually copy between hard VRs and FPRs instead
of using these two functions.

gcc/ChangeLog:

PR target/100217
* config/s390/s390.c (s390_hard_fp_reg_p): New function.
(s390_md_asm_adjust): Handle hard registers.
* config/s390/vector.md (*df_to_tf_1): New pattern.

gcc/testsuite/ChangeLog:

PR target/100217
* gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c: New test.
* gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c: New test.
---
 gcc/config/s390/s390.c| 50 +--
 gcc/config/s390/vector.md |  8 +++
 .../long-double-asm-in-out-hard-fp-reg.c  | 28 +++
 .../long-double-asm-inout-hard-fp-reg.c   | 27 ++
 4 files changed, 109 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index a9c945c5ee9..ed6cea9b1f7 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16754,6 +16754,23 @@ f_constraint_p (const char *constraint)
   return seen_f_p && !seen_v_p;
 }
 
+/* Return TRUE iff X is a hard floating-point (and not a vector) register.  */
+
+static bool
+s390_hard_fp_reg_p (rtx x)
+{
+  if (!(REG_P (x) && HARD_REGISTER_P (x) && REG_ATTRS (x)))
+return false;
+
+  tree decl = REG_EXPR (x);
+  if (!(HAS_DECL_ASSEMBLER_NAME_P (decl) && DECL_ASSEMBLER_NAME_SET_P (decl)))
+return false;
+
+  const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  return name[0] == '*' && name[1] == 'f';
+}
+
 /* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f"
constraints when long doubles are stored in vector registers.  */
 
@@ -16787,9 +16804,23 @@ s390_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &inputs,
   gcc_assert (allows_reg);
   gcc_assert (!is_inout);
   /* Copy output value from a FPR pair into a vector register.  */
-  rtx fprx2 = gen_reg_rtx (FPRX2mode);
+  rtx fprx2;
   push_to_sequence2 (after_md_seq, after_md_end);
-  emit_insn (gen_fprx2_to_tf (outputs[i], fprx2));
+  if (s390_hard_fp_reg_p (outputs[i]))
+   {
+ fprx2 = gen_rtx_REG (FPRX2mode, REGNO (outputs[i]));
+ /* The first half is already at the correct location, copy only the
+  * second one.  Use gen_rtx_raw_SUBREG() in order to skip subreg
+  * validation - we need to build (subreg:DF (reg:TF %fN) 8), which
+  * will otherwise be rejected by s390_can_change_mode_class().  */
+ emit_move_insn (gen_rtx_raw_SUBREG (DFmode, outputs[i], 8),
+ simplify_gen_subreg (DFmode, fprx2, FPRX2mode, 8));
+   }
+  else
+   {
+ fprx2 = gen_reg_rtx (FPRX2mode);
+ emit_insn (gen_fprx2_to_tf (outputs[i], fprx2));
+   }
   after_md_seq = get_insns ();
   after_md_end = get_last_insn ();
   end_sequence ();
@@ -16813,8 +16844,19 @@ s390_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &inputs,
continue;
   gcc_assert (allows_reg);
   /* Copy input value from a vector register into a FPR pair.  */
-  rtx fprx2 = gen_reg_rtx (FPRX2mode);
-  emit_insn (gen_tf_to_fprx2 (fprx2, inputs[i]));
+  rtx fprx2;
+  if (s390_hard_fp_reg_p (inputs[i]))
+   {
+ fprx2 = gen_rtx_REG (FPRX2mode, REGNO (inputs[i]));
+ /* Copy only the second half.  */
+ emit_move_insn (gen_rtx_raw_SUBREG (DFmode, fprx2, 8),
+ gen_rtx_raw_SUBREG (DFmode, inputs[i], 8));
+   }
+  else
+   {
+ fprx2 = gen_reg_rtx (FPRX2mode);
+ emit_insn (gen_tf_to_fprx2 (fprx2, inputs[i]));
+   }
   inputs[i] = fprx2;
   input_modes[i] = FPRX2mode;
 }
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index c80d582a300..648e00625e1 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -634,6 +634,14 @@
 }
   [(set_attr "op_type" "VRR,*")])
 
+(define_insn "*df_to_tf_1"
+  [(set (subreg:DF (match_operand:TF 0 "nonimmediate_operand" "+v") 8)
+   (match_operand:DF 1 "general_operand"   "f"))]
+  "TARGET_VXE"
+  ; M4 == 0 corresponds to %v0[0] = %v0[0]; %v0[1] = %v1[0];
+  "vpdi\t%v0,%v0,%v1,0"
+  [(set_attr "op_type" "VRR")])
+
 (define_insn "*vec_ti_to_v1ti"
   [(set (match_operand:V1TI   0 "nonimmediate_operand" "=v,v,R,  v,  v,v")
 (vec_duplicate:V1TI (match_operand:TI 1 "general_operand"   "v,R,v,j00,jm1,d")))]
diff --git 
a/gcc/testsuite/gcc.tar

Re: [PATCH v3] fwprop: Fix single_use_p calculation

2021-03-23 Thread Ilya Leoshkevich via Gcc-patches
On Tue, 2021-03-23 at 12:48 +, Richard Sandiford wrote:
> Ilya Leoshkevich writes:
> > +inline use_info *
> > +set_info::single_nondebug_use () const
> > +{
> > +  use_info *nondebug_insn = single_nondebug_insn_use ();
> > +  if (nondebug_insn)
> > +    return has_phi_uses () ? nullptr : nondebug_insn;
> > +  use_info *phi = single_phi_use ();
> > +  if (phi)
> > +    return has_nondebug_insn_uses() ? nullptr : phi;
> > +  return nullptr;
> 
> Very minor, but I think this is simpler as:
> 
>   if (!has_phi_uses ())
>     return single_nondebug_insn_use ();
>   if (!has_nondebug_insn_uses ())
>     return single_phi_use ();
>   return nullptr;
> 
> OK with that change (or without if you prefer the original).
> Thanks for the fix and for your patience. :-)
> 
> Richard

Retested with the change above and pushed as:

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b61461ac7f9bdd0e98145be79423d19b933afaa0

Thanks for all the suggestions!

Best regards,
Ilya



[PATCH v3] fwprop: Fix single_use_p calculation

2021-03-22 Thread Ilya Leoshkevich via Gcc-patches
Bootstrap and regtest running on x86_64-redhat-linux,
ppc64le-redhat-linux and s390x-redhat-linux.  Ok for master?

v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566127.html
v1 -> v2: Pass a set_info instead of a def_info around.
  Add single_nondebug_insn_use () - maybe this could be improved
  further? [1]
  Simplify def->insn ()->ebb ().
  Improve formatting.

v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567121.html
v2 -> v3: Introduce single_nondebug_use and single_phi_use methods.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567118.html

---

Commit efb6bc55a93a ("fwprop: Allow (subreg (mem)) simplifications")
introduced a check that was supposed to look at the propagated def's
number of uses.  It uses insn_info::num_uses (), which in reality
returns the number of uses def's insn has.  The whole change therefore
works only by accident.

Fix by looking at set_info's uses instead of insn_info's uses.  This
requires passing around set_info instead of insn_info.
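
To illustrate the difference, here is a toy C model (not the rtl-ssa
API).  It assumes that single_nondebug_insn_use () and single_phi_use ()
yield a use only when there is exactly one use of that kind; the struct,
enum and function names are hypothetical.  The old check looked at how
many values the defining insn itself reads, while the new one counts the
users of the defined value.

```c
#include <assert.h>

/* Toy description of one definition.  */
struct def_model
{
  int insn_operand_uses;   /* values read by the defining insn */
  int nondebug_insn_uses;  /* nondebug insns that read the def */
  int phi_uses;            /* phis that read the def */
};

enum use_kind { USE_NONE, USE_INSN, USE_PHI };

/* Old, accidental check: counts the wrong thing.  */
int
old_single_use_p (const struct def_model *d)
{
  return d->insn_operand_uses == 1;
}

/* New check: exactly one nondebug insn use or exactly one phi use,
   and no uses of the other kind.  */
enum use_kind
single_nondebug_use (const struct def_model *d)
{
  assert (d->nondebug_insn_uses >= 0 && d->phi_uses >= 0);
  if (d->phi_uses == 0)
    return d->nondebug_insn_uses == 1 ? USE_INSN : USE_NONE;
  if (d->nondebug_insn_uses == 0)
    return d->phi_uses == 1 ? USE_PHI : USE_NONE;
  return USE_NONE;
}
```

For a def whose insn reads one register but whose result is read by three
insns, the old check reports a single use while the new one does not.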

gcc/ChangeLog:

2021-03-02  Ilya Leoshkevich  

* fwprop.c (fwprop_propagation::fwprop_propagation): Look at
set_info's uses.
(try_fwprop_subst_note): Use set_info instead of insn_info.
(try_fwprop_subst_pattern): Likewise.
(try_fwprop_subst_notes): Likewise.
(try_fwprop_subst): Likewise.
(forward_propagate_subreg): Likewise.
(forward_propagate_and_simplify): Likewise.
(forward_propagate_into): Likewise.
* rtl-ssa/accesses.h (set_info::single_nondebug_use): New
method.
(set_info::single_nondebug_insn_use): Likewise.
(set_info::single_phi_use): Likewise.
* rtl-ssa/member-fns.inl (set_info::single_nondebug_use): New
method.
(set_info::single_nondebug_insn_use): Likewise.
(set_info::single_phi_use): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/long-double-asm-abi.c: New test.
---
 gcc/fwprop.c  | 81 +--
 gcc/rtl-ssa/accesses.h| 13 +++
 gcc/rtl-ssa/member-fns.inl| 30 +++
 .../s390/vector/long-double-asm-abi.c | 26 ++
 4 files changed, 109 insertions(+), 41 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-abi.c

diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index 4b8a554e823..d7203672886 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -175,7 +175,7 @@ namespace
 static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1;
 static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2;
 
-fwprop_propagation (insn_info *, insn_info *, rtx, rtx);
+fwprop_propagation (insn_info *, set_info *, rtx, rtx);
 
 bool changed_mem_p () const { return result_flags & CHANGED_MEM; }
 bool folded_to_constants_p () const;
@@ -191,13 +191,13 @@ namespace
   };
 }
 
-/* Prepare to replace FROM with TO in INSN.  */
+/* Prepare to replace FROM with TO in USE_INSN.  */
 
 fwprop_propagation::fwprop_propagation (insn_info *use_insn,
-   insn_info *def_insn, rtx from, rtx to)
+   set_info *def, rtx from, rtx to)
   : insn_propagation (use_insn->rtl (), from, to),
-single_use_p (def_insn->num_uses () == 1),
-single_ebb_p (use_insn->ebb () == def_insn->ebb ())
+single_use_p (def->single_nondebug_use ()),
+single_ebb_p (use_insn->ebb () == def->ebb ())
 {
   should_check_mems = true;
   should_note_simplifications = true;
@@ -368,24 +368,25 @@ contains_paradoxical_subreg_p (rtx x)
   return false;
 }
 
-/* Try to substitute (set DEST SRC) from DEF_INSN into note NOTE of USE_INSN.
-   Return the number of substitutions on success, otherwise return -1 and
-   leave USE_INSN unchanged.
+/* Try to substitute (set DEST SRC), which defines DEF, into note NOTE of
+   USE_INSN.  Return the number of substitutions on success, otherwise return
+   -1 and leave USE_INSN unchanged.
 
-   If REQUIRE_CONSTANT is true, require all substituted occurences of SRC
+   If REQUIRE_CONSTANT is true, require all substituted occurrences of SRC
to fold to a constant, so that the note does not use any more registers
than it did previously.  If REQUIRE_CONSTANT is false, also allow the
substitution if it's something we'd normally allow for the main
instruction pattern.  */
 
 static int
-try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn,
+try_fwprop_subst_note (insn_info *use_insn, set_info *def,
   rtx note, rtx dest, rtx src, bool require_constant)
 {
   rtx_insn *use_rtl = use_insn->rtl ();
+  insn_info *def_insn = def->insn ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_insn, def_insn, dest, src);
+  fwprop_propagation prop (use_insn, def, dest, src);
   if (!prop.apply_to_rvalue (&XEXP (note, 0)))
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -

Re: [PATCH] fwprop: Fix single_use_p calculation

2021-03-22 Thread Ilya Leoshkevich via Gcc-patches
On Mon, 2021-03-22 at 22:55 +, Richard Sandiford wrote:
> Ilya Leoshkevich  writes:
> > On Mon, 2021-03-22 at 18:23 +, Richard Sandiford wrote:
> > > Ilya Leoshkevich  writes:
> > 
> > [...]
> > 
> > > > Do you still want me to add single_nondebug_use() for
> > > > completeness
> > > > in
> > > > this patch, or would it be better to add it later when it's
> > > > actually
> > > > needed?
> > > 
> > > I was thinking that the fwprop.c code would use
> > > def->single_nondebug_use () instead of
> > > def->single_nondebug_insn_use () && !def->has_phi_uses ().
> > 
> > But these two are not equivalent, are they?  single_nondebug_use()
> > that you proposed explicitly allows phis:
> > 
> >   // If there is exactly one nondebug use of the set's result,
> >   // return that use, otherwise return null.  The use might be in
> >   // instruction or a phi node.
> >   use_info *single_nondebug_use () const;
> > 
> > but I don't think we want to propagate into phis here.
> > Or should the check be a bit bigger, like the following?
> 
> But we're in the process of substituting the definition into an
> insn use.  So we know that an insn use exists.  I think the
> question we're trying to answer is: is this insn use the only
> nondebug use?  I'd rather test that with a single accessor rather
> than break it down into individual data structure tests.

Ah, you are absolutely right - now I get it.  Please ignore the v2
then, I will send a v3.
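
For the record, the argument can be checked with a small standalone
sketch.  It is a toy model, not the rtl-ssa API: toy_set, use_kind and
all function names are hypothetical, the accessors return bool rather
than use_info *, and phi uses are assumed to count as nondebug uses.
Given that at least one nondebug insn use exists, the v2 check and the
single-accessor check agree; they differ only when the sole use is a
phi, which cannot happen while substituting into an insn use.

```cpp
// Toy model only; hypothetical names, not the real rtl-ssa classes.
#include <algorithm>
#include <cassert>
#include <vector>

enum class use_kind { NONDEBUG_INSN, DEBUG_INSN, PHI };

struct toy_set
{
  std::vector<use_kind> uses;  // every use of the set's result
};

int
count_uses (const toy_set &s, use_kind k)
{
  return static_cast<int> (std::count (s.uses.begin (), s.uses.end (), k));
}

// v2 check: exactly one nondebug insn use and no phi uses.
bool
v2_check (const toy_set &s)
{
  return (count_uses (s, use_kind::NONDEBUG_INSN) == 1
	  && count_uses (s, use_kind::PHI) == 0);
}

// v3 check: exactly one nondebug use, be it in an insn or in a phi.
bool
v3_check (const toy_set &s)
{
  return (count_uses (s, use_kind::NONDEBUG_INSN)
	  + count_uses (s, use_kind::PHI)) == 1;
}

// Precondition: we are substituting into an insn use, so at least one
// nondebug insn use exists.  Returns whether the two checks agree.
bool
checks_agree_given_insn_use (const toy_set &s)
{
  assert (count_uses (s, use_kind::NONDEBUG_INSN) >= 1);
  return v2_check (s) == v3_check (s);
}
```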



[PATCH] fwprop: Fix single_use_p calculation

2021-03-22 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux
and s390x-redhat-linux.  Ok for master?

v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566127.html
v1 -> v2: Pass a set_info instead of a def_info around.
  Add single_nondebug_insn_use () - maybe this could be improved
  further? [1]
  Simplify def->insn ()->ebb ().
  Improve formatting.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567118.html

---

Commit efb6bc55a93a ("fwprop: Allow (subreg (mem)) simplifications")
introduced a check that was supposed to look at the propagated def's
number of uses.  It uses insn_info::num_uses (), which in reality
returns the number of uses def's insn has.  The whole change therefore
works only by accident.

Fix by looking at set_info's uses instead of insn_info's uses.  This
requires passing around set_info instead of insn_info.

gcc/ChangeLog:

2021-03-02  Ilya Leoshkevich  

* fwprop.c (fwprop_propagation::fwprop_propagation): Look at
set_info's uses.
(try_fwprop_subst_note): Use set_info instead of insn_info.
(try_fwprop_subst_pattern): Likewise.
(try_fwprop_subst_notes): Likewise.
(try_fwprop_subst): Likewise.
(forward_propagate_subreg): Likewise.
(forward_propagate_and_simplify): Likewise.
(forward_propagate_into): Likewise.
* rtl-ssa/accesses.h (set_info::single_nondebug_insn_use): New
method.
* rtl-ssa/member-fns.inl (set_info::single_nondebug_insn_use):
Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/long-double-asm-abi.c: New test.
---
 gcc/fwprop.c  | 79 +--
 gcc/rtl-ssa/accesses.h|  4 +
 gcc/rtl-ssa/member-fns.inl|  9 +++
 .../s390/vector/long-double-asm-abi.c | 26 ++
 4 files changed, 78 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-abi.c

diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index 4b8a554e823..6173c9248eb 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -175,7 +175,7 @@ namespace
 static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1;
 static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2;
 
-fwprop_propagation (insn_info *, insn_info *, rtx, rtx);
+fwprop_propagation (insn_info *, set_info *, rtx, rtx);
 
 bool changed_mem_p () const { return result_flags & CHANGED_MEM; }
 bool folded_to_constants_p () const;
@@ -191,13 +191,13 @@ namespace
   };
 }
 
-/* Prepare to replace FROM with TO in INSN.  */
+/* Prepare to replace FROM with TO in USE_INSN.  */
 
 fwprop_propagation::fwprop_propagation (insn_info *use_insn,
-   insn_info *def_insn, rtx from, rtx to)
+   set_info *def, rtx from, rtx to)
   : insn_propagation (use_insn->rtl (), from, to),
-single_use_p (def_insn->num_uses () == 1),
-single_ebb_p (use_insn->ebb () == def_insn->ebb ())
+single_use_p (def->single_nondebug_insn_use () && !def->has_phi_uses ()),
+single_ebb_p (use_insn->ebb () == def->ebb ())
 {
   should_check_mems = true;
   should_note_simplifications = true;
@@ -368,9 +368,9 @@ contains_paradoxical_subreg_p (rtx x)
   return false;
 }
 
-/* Try to substitute (set DEST SRC) from DEF_INSN into note NOTE of USE_INSN.
-   Return the number of substitutions on success, otherwise return -1 and
-   leave USE_INSN unchanged.
+/* Try to substitute (set DEST SRC), which defines DEF, into note NOTE of
+   USE_INSN.  Return the number of substitutions on success, otherwise return
+   -1 and leave USE_INSN unchanged.
 
If REQUIRE_CONSTANT is true, require all substituted occurences of SRC
to fold to a constant, so that the note does not use any more registers
@@ -379,13 +379,14 @@ contains_paradoxical_subreg_p (rtx x)
instruction pattern.  */
 
 static int
-try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn,
+try_fwprop_subst_note (insn_info *use_insn, set_info *def,
   rtx note, rtx dest, rtx src, bool require_constant)
 {
   rtx_insn *use_rtl = use_insn->rtl ();
+  insn_info *def_insn = def->insn ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_insn, def_insn, dest, src);
+  fwprop_propagation prop (use_insn, def, dest, src);
   if (!prop.apply_to_rvalue (&XEXP (note, 0)))
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -436,19 +437,20 @@ try_fwprop_subst_note (insn_info *use_insn, insn_info 
*def_insn,
   return prop.num_replacements;
 }
 
-/* Try to substitute (set DEST SRC) from DEF_INSN into location LOC of
+/* Try to substitute (set DEST SRC), which defines DEF, into location LOC of
USE_INSN's pattern.  Return true on success, otherwise leave USE_INSN
unchanged.  */
 
 static bool
 try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change,
-   

Re: [PATCH] fwprop: Fix single_use_p calculation

2021-03-22 Thread Ilya Leoshkevich via Gcc-patches
On Mon, 2021-03-22 at 18:23 +, Richard Sandiford wrote:
> Ilya Leoshkevich  writes:

[...]

> > Do you still want me to add single_nondebug_use() for completeness
> > in
> > this patch, or would it be better to add it later when it's
> > actually
> > needed?
> 
> I was thinking that the fwprop.c code would use
> def->single_nondebug_use () instead of
> def->single_nondebug_insn_use () && !def->has_phi_uses ().

But these two are not equivalent, are they?  single_nondebug_use()
that you proposed explicitly allows phis:

  // If there is exactly one nondebug use of the set's result,
  // return that use, otherwise return null.  The use might be in
  // instruction or a phi node.
  use_info *single_nondebug_use () const;

but I don't think we want to propagate into phis here.
Or should the check be a bit bigger, like the following?

use_info *single = def->single_nondebug_use ();
single_use_p = single && !single->is_in_phi ();


[...]

Best regards,
Ilya



Re: [PATCH] fwprop: Fix single_use_p calculation

2021-03-22 Thread Ilya Leoshkevich via Gcc-patches
On Sun, 2021-03-21 at 13:19 +, Richard Sandiford wrote:
> Ilya Leoshkevich  writes:
> > Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-
> > linux
> > and s390x-redhat-linux.  Ok for master?
> 
> Given what was said downthread, I agree we should fix this for GCC
> 11.
> Sorry for missing this problem in the initial review.
> 
> > Commit efb6bc55a93a ("fwprop: Allow (subreg (mem))
> > simplifications")
> > introduced a check that was supposed to look at the propagated
> > def's
> > number of uses.  It uses insn_info::num_uses (), which in reality
> > returns the number of uses def's insn has.  The whole change
> > therefore
> > works only by accident.
> > 
> > Fix by looking at def_info's uses instead of insn_info's uses. 
> > This
> > requires passing around def_info instead of insn_info.
> > 
> > gcc/ChangeLog:
> > 
> > 2021-03-02  Ilya Leoshkevich  
> > 
> > * fwprop.c (def_has_single_use_p): New function.
> > (fwprop_propagation::fwprop_propagation): Look at
> > def_info's uses.
> > (try_fwprop_subst_note): Use def_info instead of insn_info.
> > (try_fwprop_subst_pattern): Likewise.
> > (try_fwprop_subst_notes): Likewise.
> > (try_fwprop_subst): Likewise.
> > (forward_propagate_subreg): Likewise.
> > (forward_propagate_and_simplify): Likewise.
> > (forward_propagate_into): Likewise.
> > * iterator-utils.h (single_element_p): New function.
> > ---
> >  gcc/fwprop.c | 89 ++--
> > 
> >  gcc/iterator-utils.h | 10 +
> >  2 files changed, 62 insertions(+), 37 deletions(-)
> > 
> > diff --git a/gcc/fwprop.c b/gcc/fwprop.c
> > index 4b8a554e823..478dcdd96cc 100644
> > --- a/gcc/fwprop.c
> > +++ b/gcc/fwprop.c
> > @@ -175,7 +175,7 @@ namespace
> >  static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1;
> >  static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2;
> >  
> > -    fwprop_propagation (insn_info *, insn_info *, rtx, rtx);
> > +    fwprop_propagation (insn_info *, def_info *, rtx, rtx);
> 
> use->def () returns a set_info *, and since you want set_info stuff,
> I think it would probably be better to pass around a set_info *
> instead.
> (Let's keep the variable names the same though.  “def” is still
> accurate
> and IMO the natural choice.)
> 
> > @@ -191,13 +191,27 @@ namespace
> >    };
> >  }
> >  
> > -/* Prepare to replace FROM with TO in INSN.  */
> > +/* Return true if DEF has a single non-debug non-phi use.  */
> > +
> > +static bool
> > +def_has_single_use_p (def_info *def)
> > +{
> > +  if (!is_a <set_info *> (def))
> > +    return false;
> > +
> > +  set_info *set = as_a <set_info *> (def);
> > +
> > +  return single_element_p (set->nondebug_insn_uses ())
> > +    && !set->has_phi_uses ();
> 
> I think instead we should add:
> 
>   // If exactly one nondebug instruction uses the set's result,
> return
>   // the use by that instruction, otherwise return null.
>   use_info *single_nondebug_insn_use () const;
> 
>   // If there is exactly one nondebug use of the set's result,
>   // return that use, otherwise return null.  The use might be in
>   // instruction or a phi node.
>   use_info *single_nondebug_use () const;
> 
> before the declaration of set_info::is_local_to_ebb.
> 
> > +}
> > +
> > +/* Prepare to replace FROM with TO in USE_INSN.  */
> >  
> >  fwprop_propagation::fwprop_propagation (insn_info *use_insn,
> > -   insn_info *def_insn, rtx
> > from, rtx to)
> > +   def_info *def, rtx from,
> > rtx to)
> >    : insn_propagation (use_insn->rtl (), from, to),
> > -    single_use_p (def_insn->num_uses () == 1),
> > -    single_ebb_p (use_insn->ebb () == def_insn->ebb ())
> > +    single_use_p (def_has_single_use_p (def)),
> > +    single_ebb_p (use_insn->ebb () == def->insn ()->ebb ())
> 
> Just def->ebb ()
> 
> > @@ -538,7 +554,7 @@ try_fwprop_subst_pattern (obstack_watermark
> > &attempt, insn_change &use_change,
> >  {
> >    if ((REG_NOTE_KIND (note) == REG_EQUAL
> >    || REG_NOTE_KIND (note) == REG_EQUIV)
> > - && try_fwprop_subst_note (use_insn, def_insn, note,
> > + && try_fwprop_subst_note (use_insn, def, note,
> >     dest, src, false) < 0)
> 
> Very minor, sorry, but this now fits on one line.
> 
> > @@ -584,10 +600,11 @@ try_fwprop_subst_notes (insn_info *use_insn,
> > insn_info *def_insn,
> >     Return true on success, otherwise leave USE_INSN unchanged.  */
> >  
> >  static bool
> > -try_fwprop_subst (use_info *use, insn_info *def_insn,
> > +try_fwprop_subst (use_info *use, def_info *def,
> >   rtx *loc, rtx dest, rtx src)
> 
> Same here.
> 
> Thanks,
> Richard

Thanks for reviewing!  I'm currently regtesting a v2.

One thing though: I don't think we need single_nondebug_use() for this
fix, only single_nondebug_insn_use() - the fwprop check that I'm now
using is def->si

[PATCH] IBM Z: Fix "+fvm" constraint with long doubles

2021-03-15 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?



When a long double is passed to an asm statement with a "+fvm"
constraint, an LRA loop occurs.  This happens because LRA chooses the
widest register class in this case (VEC_REGS), but the code generated
by s390_md_asm_adjust() always wants FP_REGS.  Mismatching register
classes cause infinite reloading.

Fix by treating "fv" constraints as "v" in s390_md_asm_adjust().
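
The intended behaviour can be sketched outside of GCC as follows.  This
is a simplified stand-in, not the patched function: toy_f_constraint_p
is a hypothetical name, and the sketch assumes that every constraint
character has length 1, whereas the real code advances by
CONSTRAINT_LEN.

```cpp
// Simplified stand-in for the patched f_constraint_p; hypothetical
// name, and single-character constraints are assumed.
#include <cassert>
#include <cstddef>
#include <cstring>

bool
toy_f_constraint_p (const char *constraint)
{
  assert (constraint != nullptr);

  bool seen_f_p = false;
  bool seen_v_p = false;

  for (std::size_t i = 0, len = std::strlen (constraint); i < len; i++)
    {
      if (constraint[i] == 'f')
	seen_f_p = true;
      if (constraint[i] == 'v')
	seen_v_p = true;
    }

  // "fv" is treated like "v": only a constraint that has 'f' but no
  // 'v' needs the FPR-pair fixup.
  return seen_f_p && !seen_v_p;
}
```

With this, "=f" and "=&f" still request the fixup, while "+fvm" no
longer does.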

gcc/ChangeLog:

* config/s390/s390.c (f_constraint_p): Treat "fv" constraints
as "v".

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/long-double-asm-fprvrmem.c: New test.
---
 gcc/config/s390/s390.c   | 12 ++--
 .../s390/vector/long-double-asm-fprvrmem.c   | 11 +++
 2 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 151136bedbc..f7b1c03561e 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16714,13 +16714,21 @@ s390_shift_truncation_mask (machine_mode mode)
 static bool
 f_constraint_p (const char *constraint)
 {
+  bool seen_f_p = false;
+  bool seen_v_p = false;
+
   for (size_t i = 0, c_len = strlen (constraint); i < c_len;
i += CONSTRAINT_LEN (constraint[i], constraint + i))
 {
   if (constraint[i] == 'f')
-   return true;
+   seen_f_p = true;
+  if (constraint[i] == 'v')
+   seen_v_p = true;
 }
-  return false;
+
+  /* Treat "fv" constraints as "v", because LRA will choose the widest register
+   * class.  */
+  return seen_f_p && !seen_v_p;
 }
 
 /* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f"
diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c 
b/gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c
new file mode 100644
index 000..f95656c5723
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzarch" } */
+
+long double
+foo (long double x)
+{
+  x = x * x;
+  asm("# %0" : "+fvm"(x));
+  x = x + x;
+  return x;
+}
-- 
2.29.2



[PATCH v3] IBM Z: Fix usage of "f" constraint with long doubles

2021-03-04 Thread Ilya Leoshkevich via Gcc-patches
v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html
v1 -> v2:
- Handle constraint modifiers, use AR constraint instead of R, add
  testcases for & and %.

v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564380.html
v2 -> v3:
- The main prereq is now committed:
  https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566237.html
- Dropped long-double-asm-abi.c test, because its prereq is not
  approved (yet):
  https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566218.html
- Removed superfluous constraint pointer increment.



After switching the s390 backend to store long doubles in vector
registers, the "f" constraint broke when used with long doubles: they
correspond to TFmode, which in combination with "f" corresponds to
hard regs %v0-%v15, whereas asm users expect a %f0-%f15 pair.

Fix by using the TARGET_MD_ASM_ADJUST hook to convert TFmode values to
FPRX2mode and back.

gcc/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* config/s390/s390.c (f_constraint_p): New function.
(s390_md_asm_adjust): Implement TARGET_MD_ASM_ADJUST.
(TARGET_MD_ASM_ADJUST): Likewise.
* config/s390/vector.md (fprx2_to_tf): Rename from *fprx2_to_tf,
add memory alternative.
(tf_to_fprx2): New pattern.

gcc/testsuite/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* gcc.target/s390/vector/long-double-asm-commutative.c: New
test.
* gcc.target/s390/vector/long-double-asm-earlyclobber.c: New
test.
* gcc.target/s390/vector/long-double-asm-in-out.c: New test.
* gcc.target/s390/vector/long-double-asm-inout.c: New test.
* gcc.target/s390/vector/long-double-asm-matching.c: New test.
* gcc.target/s390/vector/long-double-asm-regmem.c: New test.
* gcc.target/s390/vector/long-double-volatile-from-i64.c: New
test.
---
 gcc/config/s390/s390.c| 86 +++
 .../s390/vector/long-double-asm-commutative.c | 16 
 .../vector/long-double-asm-earlyclobber.c | 17 
 .../s390/vector/long-double-asm-in-out.c  | 14 +++
 .../s390/vector/long-double-asm-inout.c   | 14 +++
 .../s390/vector/long-double-asm-matching.c| 13 +++
 .../s390/vector/long-double-asm-regmem.c  |  8 ++
 .../vector/long-double-volatile-from-i64.c| 22 +
 8 files changed, 190 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-asm-commutative.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-asm-earlyclobber.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-inout.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-asm-matching.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-asm-regmem.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-volatile-from-i64.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index f3d0d1ba596..68dc3c58c1b 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16698,6 +16698,89 @@ s390_shift_truncation_mask (machine_mode mode)
   return mode == DImode || mode == SImode ? 63 : 0;
 }
 
+/* Return TRUE iff CONSTRAINT is an "f" constraint, possibly with additional
+   modifiers.  */
+
+static bool
+f_constraint_p (const char *constraint)
+{
+  for (size_t i = 0, c_len = strlen (constraint); i < c_len;
+   i += CONSTRAINT_LEN (constraint[i], constraint + i))
+{
+  if (constraint[i] == 'f')
+   return true;
+}
+  return false;
+}
+
+/* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f"
+   constraints when long doubles are stored in vector registers.  */
+
+static rtx_insn *
+s390_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &inputs,
+   vec<machine_mode> &input_modes,
+   vec<const char *> &constraints, vec<rtx> & /*clobbers*/,
+   HARD_REG_SET & /*clobbered_regs*/)
+{
+  if (!TARGET_VXE)
+/* Long doubles are stored in FPR pairs - nothing to do.  */
+return NULL;
+
+  rtx_insn *after_md_seq = NULL, *after_md_end = NULL;
+
+  unsigned ninputs = inputs.length ();
+  unsigned noutputs = outputs.length ();
+  for (unsigned i = 0; i < noutputs; i++)
+{
+  if (GET_MODE (outputs[i]) != TFmode)
+   /* Not a long double - nothing to do.  */
+   continue;
+  const char *constraint = constraints[i];
+  bool allows_mem, allows_reg, is_inout;
+  bool ok = parse_output_constraint (&constraint, i, ninputs, noutputs,
+&allows_mem, &allows_reg, &is_inout);
+  gcc_assert (ok);
+  if (!f_constraint_p (constraint))
+   /* Long double with a constraint other than "=f" - nothing to do.  */
+   continue;
+  gcc_assert (allows_reg);
+  gcc_assert (!is_inout);
+  /* Copy output value from a FPR pair into a vector register.  */
+  rtx fprx2 = gen_reg_rtx (FPRX2mode);
+  push_to_sequence2 

Re: [PATCH PING^3] Add input_modes parameter to TARGET_MD_ASM_ADJUST hook

2021-03-03 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2021-03-03 at 21:26 +0100, Ilya Leoshkevich via Gcc-patches
wrote:
> On Wed, 2021-03-03 at 13:02 -0700, Jeff Law wrote:
> > 
> > 
> > On 3/2/21 4:50 PM, Ilya Leoshkevich via Gcc-patches wrote:
> > > Hello,
> > > 
> > > I would like to ping the following patch:
> > > 
> > > Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
> > >  https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html
> > > 
> > > It is needed for the following regression fix:
> > > 
> > > IBM Z: Fix usage of "f" constraint with long doubles
> > >  https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564380.html
> > > 
> > > 
> > > Jakub, who would be the right person to review this change?  I've
> > > decided to ask you, since `git shortlog -ns gcc/cfgexpand.c` shows
> > > that
> > > you deal with this code a lot.
> > > 
> > > Best regards,
> > > Ilya
> > > 
> > > 
> > > 
> > > 
> > > If TARGET_MD_ASM_ADJUST changes a mode of an input operand (which
> > > should be ok as long as the hook itself as well as after_md_seq
> > > make up
> > > for it), input_mode will contain stale information.
> > > 
> > > It might be tempting to fix this by removing input_mode altogether
> > > and
> > > just using GET_MODE (), but this will not work correctly with
> > > constants.
> > > So add input_modes parameter and document that it should be updated
> > > whenever inputs parameter is updated.
> > > 
> > > gcc/ChangeLog:
> > > 
> > > 2021-01-05  Ilya Leoshkevich  
> > > 
> > > * cfgexpand.c (expand_asm_loc): Pass new parameter.
> > > (expand_asm_stmt): Likewise.
> > > * config/arm/aarch-common-protos.h (arm_md_asm_adjust): Add
> > > new
> > > parameter.
> > > * config/arm/aarch-common.c (arm_md_asm_adjust): Likewise.
> > > * config/arm/arm.c (thumb1_md_asm_adjust): Likewise.
> > > * config/cris/cris.c (cris_md_asm_adjust): Likewise.
> > > * config/i386/i386.c (ix86_md_asm_adjust): Likewise.
> > > * config/mn10300/mn10300.c (mn10300_md_asm_adjust):
> > > Likewise.
> > > * config/nds32/nds32.c (nds32_md_asm_adjust): Likewise.
> > > * config/pdp11/pdp11.c (pdp11_md_asm_adjust): Likewise.
> > > * config/rs6000/rs6000.c (rs6000_md_asm_adjust): Likewise.
> > > * config/vax/vax.c (vax_md_asm_adjust): Likewise.
> > > * config/visium/visium.c (visium_md_asm_adjust): Likewise.
> > > * target.def (md_asm_adjust): Likewise.
> > Ugh.    A couple questions
> > Are there any cases where you're going to want to change modes for
> > arguments that were constants?   I'm a bit surprised that we don't
> > have
> > a mode for constants for the cases that we care about.  Presumably we
> > can get a (modeless) CONST_INT here and we're not restricted to
> > CONST_DOUBLE and friends (which have modes).
> 
> Yes, this might happen.  For example, here:
> 
>     asm("sqxbr\t%0,%1" : "=f"(res) : "f"(0x1.1p+0L));
> 
> the (const_double) and the corresponding operand will initially have 
> the mode TFmode.  s390_md_asm_adjust () will add a conversion from
> TFmode to FPRX2mode and change the argument accordingly.

Just to be more precise: the mode of the (const_double) itself will not
change.  Here is the resulting RTL for the asm statement above:

# s390_md_asm_adjust () step 1: put the (const_double) operand into a
# new (reg) with the same mode
(insn (set (reg:TF 63)
   (const_double:TF ...)))

# s390_md_asm_adjust () step 2: convert a reg from TFmode to FPRX2mode
(insn (set (reg:FPRX2 65)
   (subreg:FPRX2 (reg:TF 63) 0)))

# s390_md_asm_adjust () step 3: replace the original operand with the
# resulting (reg), adjust (asm_input) accordingly
(insn (set (reg:FPRX2 64)
   (asm_operands:FPRX2 ("sqxbr %0,%1") ("=f") 0
   [(reg:FPRX2 65)]
   [(asm_input:FPRX2 ("f"))])))



Re: [PATCH PING^3] Add input_modes parameter to TARGET_MD_ASM_ADJUST hook

2021-03-03 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2021-03-03 at 13:02 -0700, Jeff Law wrote:
> 
> 
> On 3/2/21 4:50 PM, Ilya Leoshkevich via Gcc-patches wrote:
> > Hello,
> > 
> > I would like to ping the following patch:
> > 
> > Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html
> > 
> > It is needed for the following regression fix:
> > 
> > IBM Z: Fix usage of "f" constraint with long doubles
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564380.html
> > 
> > 
> > Jakub, who would be the right person to review this change?  I've
> > decided to ask you, since `git shortlog -ns gcc/cfgexpand.c` shows
> > that
> > you deal with this code a lot.
> > 
> > Best regards,
> > Ilya
> > 
> > 
> > 
> > 
> > If TARGET_MD_ASM_ADJUST changes a mode of an input operand (which
> > should be ok as long as the hook itself as well as after_md_seq
> > make up
> > for it), input_mode will contain stale information.
> > 
> > It might be tempting to fix this by removing input_mode altogether
> > and
> > just using GET_MODE (), but this will not work correctly with
> > constants.
> > So add input_modes parameter and document that it should be updated
> > whenever inputs parameter is updated.
> > 
> > gcc/ChangeLog:
> > 
> > 2021-01-05  Ilya Leoshkevich  
> > 
> > * cfgexpand.c (expand_asm_loc): Pass new parameter.
> > (expand_asm_stmt): Likewise.
> > * config/arm/aarch-common-protos.h (arm_md_asm_adjust): Add
> > new
> > parameter.
> > * config/arm/aarch-common.c (arm_md_asm_adjust): Likewise.
> > * config/arm/arm.c (thumb1_md_asm_adjust): Likewise.
> > * config/cris/cris.c (cris_md_asm_adjust): Likewise.
> > * config/i386/i386.c (ix86_md_asm_adjust): Likewise.
> > * config/mn10300/mn10300.c (mn10300_md_asm_adjust):
> > Likewise.
> > * config/nds32/nds32.c (nds32_md_asm_adjust): Likewise.
> > * config/pdp11/pdp11.c (pdp11_md_asm_adjust): Likewise.
> > * config/rs6000/rs6000.c (rs6000_md_asm_adjust): Likewise.
> > * config/vax/vax.c (vax_md_asm_adjust): Likewise.
> > * config/visium/visium.c (visium_md_asm_adjust): Likewise.
> > * target.def (md_asm_adjust): Likewise.
> Ugh.    A couple questions
> Are there any cases where you're going to want to change modes for
> arguments that were constants?   I'm a bit surprised that we don't
> have
> a mode for constants for the cases that we care about.  Presumably we
> can get a (modeless) CONST_INT here and we're not restricted to
> CONST_DOUBLE and friends (which have modes).

Yes, this might happen.  For example, here:

asm("sqxbr\t%0,%1" : "=f"(res) : "f"(0x1.1p+0L));

the (const_double) and the corresponding operand will initially have 
the mode TFmode.  s390_md_asm_adjust () will add a conversion from
TFmode to FPRX2mode and change the argument accordingly.

However, this is not the problematic case that I refer to in the
commit message:  I caught some failures in the testsuite that I
tracked down to (const_int)s, which, like you mentioned, don't have
a mode.

> Is input_modes read after the call to md_asm_adjust?   I'm trying to
> figure out why we'd need to update it.

Yes, its contents go into (asm_operands)'s (asm_input). If we don't
adjust it, (asm_input)s will no longer be consistent with input operand
RTXes.

> Not acking or naking at this point, I just want to make sure I
> understand what's going on.
> 
> jeff



Re: [PATCH] fwprop: Fix single_use_p calculation

2021-03-03 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2021-03-03 at 11:34 -0700, Jeff Law wrote:
> 
> 
> On 3/2/21 3:37 PM, Ilya Leoshkevich via Gcc-patches wrote:
> > Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-
> > linux
> > and s390x-redhat-linux.  Ok for master?
> > 
> > 
> > 
> > Commit efb6bc55a93a ("fwprop: Allow (subreg (mem))
> > simplifications")
> > introduced a check that was supposed to look at the propagated
> > def's
> > number of uses.  It uses insn_info::num_uses (), which in reality
> > returns the number of uses def's insn has.  The whole change
> > therefore
> > works only by accident.
> > 
> > Fix by looking at def_info's uses instead of insn_info's uses. 
> > This
> > requires passing around def_info instead of insn_info.
> > 
> > gcc/ChangeLog:
> > 
> > 2021-03-02  Ilya Leoshkevich  
> > 
> > * fwprop.c (def_has_single_use_p): New function.
> > (fwprop_propagation::fwprop_propagation): Look at
> > def_info's uses.
> > (try_fwprop_subst_note): Use def_info instead of insn_info.
> > (try_fwprop_subst_pattern): Likewise.
> > (try_fwprop_subst_notes): Likewise.
> > (try_fwprop_subst): Likewise.
> > (forward_propagate_subreg): Likewise.
> > (forward_propagate_and_simplify): Likewise.
> > (forward_propagate_into): Likewise.
> > * iterator-utils.h (single_element_p): New function.
> Given we're well into stage4, I'd recommend deferring to gcc-12
> unless
> this fixes a code correctness issue.
> 
> Jeff
> 

Fortunately the issue here is not a miscompilation, but it's still
a regression: on s390 small functions that use long doubles get
a number of useless load/stores as well as a stack frame, where none
was required before.  Basically, the same issue efb6bc55a93a failed to
fully fix due to the num_uses() / nondebug_insn_uses() mixup.



Re: [PATCH] IBM Z: Run mul-signed-overflow-*.c only on z14+

2021-03-03 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2021-03-03 at 07:50 +0100, Andreas Krebbel wrote:
> On 3/2/21 11:59 PM, Ilya Leoshkevich wrote:
> > mul-signed-overflow-*.c execution tests fail on z13, because they
> > contain z14-specific instructions.  Fix by requiring s390_z14_hw
> > target.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/s390/mul-signed-overflow-1.c: Run only on
> > z14+.
> > * gcc.target/s390/mul-signed-overflow-2.c: Likewise.
> 
> I did that change yesterday already.

Ah, I hadn't noticed.  One difference between our patches is, though,
that I also have `dg-do compile` - this way, compile tests still run on
z13.

[...]



[PATCH PING^3] Add input_modes parameter to TARGET_MD_ASM_ADJUST hook

2021-03-02 Thread Ilya Leoshkevich via Gcc-patches
Hello,

I would like to ping the following patch:

Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html

It is needed for the following regression fix:

IBM Z: Fix usage of "f" constraint with long doubles
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564380.html


Jakub, who would be the right person to review this change?  I've
decided to ask you, since `git shortlog -ns gcc/cfgexpand.c` shows that
you deal with this code a lot.

Best regards,
Ilya




If TARGET_MD_ASM_ADJUST changes a mode of an input operand (which
should be ok as long as the hook itself as well as after_md_seq make up
for it), input_mode will contain stale information.

It might be tempting to fix this by removing input_mode altogether and
just using GET_MODE (), but this will not work correctly with constants.
So add input_modes parameter and document that it should be updated
whenever inputs parameter is updated.
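
The staleness problem described above can be illustrated with a toy
model.  This is only a sketch, not GCC code: Mode, Operand,
adjust_inputs and modes_consistent_p are hypothetical stand-ins for
machine_mode, rtx and a target hook.  It assumes that constants carry
no mode of their own, which is why the mode cannot simply be re-read
from the operand and has to live in a separate vector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for machine_mode and rtx, for illustration only.
enum class Mode { Void, DI, TF, FPRX2 };

struct Operand
{
  Mode mode;         // Mode::Void models a constant that carries no mode.
  bool is_constant;
};

// Hypothetical hook: rewrite TF inputs to FPRX2 and keep the parallel
// vector of recorded modes in sync with the operands.
void
adjust_inputs (std::vector<Operand> &inputs, std::vector<Mode> &input_modes)
{
  for (std::size_t i = 0; i < inputs.size (); i++)
    if (inputs[i].mode == Mode::TF)
      {
        inputs[i].mode = Mode::FPRX2;
        input_modes[i] = Mode::FPRX2;  // Without this the entry goes stale.
      }
}

// Return true if every non-constant operand agrees with its recorded mode.
// Constants are skipped: their mode is known only from input_modes.
bool
modes_consistent_p (const std::vector<Operand> &inputs,
                    const std::vector<Mode> &input_modes)
{
  if (inputs.size () != input_modes.size ())
    return false;
  for (std::size_t i = 0; i < inputs.size (); i++)
    if (!inputs[i].is_constant && inputs[i].mode != input_modes[i])
      return false;
  return true;
}
```

If the hook changed inputs[i] without touching input_modes[i], the
consistency check would fail for that operand, which is the situation
the new parameter is meant to avoid.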

gcc/ChangeLog:

2021-01-05  Ilya Leoshkevich  

* cfgexpand.c (expand_asm_loc): Pass new parameter.
(expand_asm_stmt): Likewise.
* config/arm/aarch-common-protos.h (arm_md_asm_adjust): Add new
parameter.
* config/arm/aarch-common.c (arm_md_asm_adjust): Likewise.
* config/arm/arm.c (thumb1_md_asm_adjust): Likewise.
* config/cris/cris.c (cris_md_asm_adjust): Likewise.
* config/i386/i386.c (ix86_md_asm_adjust): Likewise.
* config/mn10300/mn10300.c (mn10300_md_asm_adjust): Likewise.
* config/nds32/nds32.c (nds32_md_asm_adjust): Likewise.
* config/pdp11/pdp11.c (pdp11_md_asm_adjust): Likewise.
* config/rs6000/rs6000.c (rs6000_md_asm_adjust): Likewise.
* config/vax/vax.c (vax_md_asm_adjust): Likewise.
* config/visium/visium.c (visium_md_asm_adjust): Likewise.
* target.def (md_asm_adjust): Likewise.
---
 gcc/cfgexpand.c  | 16 
 gcc/config/arm/aarch-common-protos.h |  8 
 gcc/config/arm/aarch-common.c|  7 ---
 gcc/config/arm/arm.c | 14 --
 gcc/config/cris/cris.c   |  7 ---
 gcc/config/i386/i386.c   |  7 ---
 gcc/config/mn10300/mn10300.c |  7 ---
 gcc/config/nds32/nds32.c |  1 +
 gcc/config/pdp11/pdp11.c |  9 +
 gcc/config/rs6000/rs6000.c   |  7 ---
 gcc/config/vax/vax.c |  3 ++-
 gcc/config/visium/visium.c   | 12 +++-
 gcc/doc/tm.texi  | 10 ++
 gcc/target.def   | 13 -
 14 files changed, 69 insertions(+), 52 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index aef9e916fcd..a6b48d3e48f 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -2880,6 +2880,7 @@ expand_asm_loc (tree string, int vol, location_t locus)
   rtx asm_op, clob;
   unsigned i, nclobbers;
   auto_vec<rtx> input_rvec, output_rvec;
+  auto_vec<machine_mode> input_mode;
   auto_vec<const char *> constraints;
   auto_vec<rtx> clobber_rvec;
   HARD_REG_SET clobbered_regs;
@@ -2889,9 +2890,8 @@ expand_asm_loc (tree string, int vol, location_t locus)
   clobber_rvec.safe_push (clob);
 
   if (targetm.md_asm_adjust)
-   targetm.md_asm_adjust (output_rvec, input_rvec,
-  constraints, clobber_rvec,
-  clobbered_regs);
+   targetm.md_asm_adjust (output_rvec, input_rvec, input_mode,
+  constraints, clobber_rvec, clobbered_regs);
 
   asm_op = body;
   nclobbers = clobber_rvec.length ();
@@ -3068,8 +3068,8 @@ expand_asm_stmt (gasm *stmt)
   return;
 }
 
-  /* There are some legacy diagnostics in here, and also avoids a
- sixth parameger to targetm.md_asm_adjust.  */
+  /* There are some legacy diagnostics in here, and also avoids an extra
+ parameter to targetm.md_asm_adjust.  */
   save_input_location s_i_l(locus);
 
   unsigned noutputs = gimple_asm_noutputs (stmt);
@@ -3420,9 +3420,9 @@ expand_asm_stmt (gasm *stmt)
  the flags register.  */
   rtx_insn *after_md_seq = NULL;
   if (targetm.md_asm_adjust)
-after_md_seq = targetm.md_asm_adjust (output_rvec, input_rvec,
- constraints, clobber_rvec,
- clobbered_regs);
+after_md_seq
+   = targetm.md_asm_adjust (output_rvec, input_rvec, input_mode,
+constraints, clobber_rvec, clobbered_regs);
 
   /* Do not allow the hook to change the output and input count,
  lest it mess up the operand numbering.  */
diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h
index 7a9cf3d324c..b6171e8668d 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -144,9 +144,9 @@ struct cpu_cost_table
   const struct vector_cost_table vect;
 };
 
-rtx_insn *
-arm_md_as

[PATCH] IBM Z: Run mul-signed-overflow-*.c only on z14+

2021-03-02 Thread Ilya Leoshkevich via Gcc-patches
mul-signed-overflow-*.c execution tests fail on z13, because they
contain z14-specific instructions.  Fix by requiring s390_z14_hw
target.

gcc/testsuite/ChangeLog:

* gcc.target/s390/mul-signed-overflow-1.c: Run only on z14+.
* gcc.target/s390/mul-signed-overflow-2.c: Likewise.
---
 gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c | 3 ++-
 gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c b/gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c
index fdf56d6e695..e8b1938dab7 100644
--- a/gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c
+++ b/gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c
@@ -1,4 +1,5 @@
-/* { dg-do run } */
+/* { dg-do compile } */
+/* { dg-do run { target { s390_z14_hw } } } */
 /* z14 only because we need msrkc, msc, msgrkc, msgc  */
 /* { dg-options "-O3 -march=z14 -mzarch --save-temps" } */
 
diff --git a/gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c b/gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c
index d0088188aa2..01328e1d286 100644
--- a/gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c
+++ b/gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c
@@ -1,4 +1,5 @@
-/* { dg-do run } */
+/* { dg-do compile } */
+/* { dg-do run { target { s390_z14_hw } } } */
 /* z14 only because we need msrkc, msc, msgrkc, msgc  */
 /* { dg-options "-O3 -march=z14 -mzarch --save-temps" } */
 
-- 
2.29.2



[PATCH] fwprop: Fix single_use_p calculation

2021-03-02 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux
and s390x-redhat-linux.  Ok for master?



Commit efb6bc55a93a ("fwprop: Allow (subreg (mem)) simplifications")
introduced a check that was supposed to look at the propagated def's
number of uses.  It uses insn_info::num_uses (), which in reality
returns the number of uses def's insn has.  The whole change therefore
works only by accident.

Fix by looking at def_info's uses instead of insn_info's uses.  This
requires passing around def_info instead of insn_info.
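
The difference between the two counts can be shown with a toy model.
This is a sketch only: ToyDef, ToyInsn and old_single_use_p are
hypothetical names that do not exist in GCC, and the data structures
are far simpler than the real rtl-ssa ones.  The old predicate counts
what the defining instruction reads, the new one counts who reads the
definition.

```cpp
#include <cstddef>
#include <vector>

struct ToyInsn;

// A value set by one instruction and read by others.
struct ToyDef
{
  ToyInsn *insn;                     // Instruction that sets the value.
  std::vector<ToyInsn *> use_insns;  // Non-debug instructions reading it.
  bool has_phi_uses;                 // True if a phi node also reads it.
};

// An instruction together with the values it reads.
struct ToyInsn
{
  std::vector<ToyDef *> uses;
};

// What the old check effectively computed: the number of values that
// the defining instruction itself reads.
bool
old_single_use_p (const ToyDef &def)
{
  return def.insn->uses.size () == 1;
}

// What was intended: the definition has exactly one non-debug use and
// no phi uses.
bool
def_has_single_use_p (const ToyDef &def)
{
  return def.use_insns.size () == 1 && !def.has_phi_uses;
}
```

For a load that reads one address register and whose result is read by
two instructions, the old predicate answers true while the new one
answers false.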

gcc/ChangeLog:

2021-03-02  Ilya Leoshkevich  

* fwprop.c (def_has_single_use_p): New function.
(fwprop_propagation::fwprop_propagation): Look at
def_info's uses.
(try_fwprop_subst_note): Use def_info instead of insn_info.
(try_fwprop_subst_pattern): Likewise.
(try_fwprop_subst_notes): Likewise.
(try_fwprop_subst): Likewise.
(forward_propagate_subreg): Likewise.
(forward_propagate_and_simplify): Likewise.
(forward_propagate_into): Likewise.
* iterator-utils.h (single_element_p): New function.
---
 gcc/fwprop.c | 89 ++--
 gcc/iterator-utils.h | 10 +
 2 files changed, 62 insertions(+), 37 deletions(-)

diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index 4b8a554e823..478dcdd96cc 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -175,7 +175,7 @@ namespace
 static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1;
 static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2;
 
-fwprop_propagation (insn_info *, insn_info *, rtx, rtx);
+fwprop_propagation (insn_info *, def_info *, rtx, rtx);
 
 bool changed_mem_p () const { return result_flags & CHANGED_MEM; }
 bool folded_to_constants_p () const;
@@ -191,13 +191,27 @@ namespace
   };
 }
 
-/* Prepare to replace FROM with TO in INSN.  */
+/* Return true if DEF has a single non-debug non-phi use.  */
+
+static bool
+def_has_single_use_p (def_info *def)
+{
+  if (!is_a<set_info *> (def))
+    return false;
+
+  set_info *set = as_a<set_info *> (def);
+
+  return single_element_p (set->nondebug_insn_uses ())
+&& !set->has_phi_uses ();
+}
+
+/* Prepare to replace FROM with TO in USE_INSN.  */
 
 fwprop_propagation::fwprop_propagation (insn_info *use_insn,
-   insn_info *def_insn, rtx from, rtx to)
+   def_info *def, rtx from, rtx to)
   : insn_propagation (use_insn->rtl (), from, to),
-single_use_p (def_insn->num_uses () == 1),
-single_ebb_p (use_insn->ebb () == def_insn->ebb ())
+single_use_p (def_has_single_use_p (def)),
+single_ebb_p (use_insn->ebb () == def->insn ()->ebb ())
 {
   should_check_mems = true;
   should_note_simplifications = true;
@@ -368,9 +382,9 @@ contains_paradoxical_subreg_p (rtx x)
   return false;
 }
 
-/* Try to substitute (set DEST SRC) from DEF_INSN into note NOTE of USE_INSN.
-   Return the number of substitutions on success, otherwise return -1 and
-   leave USE_INSN unchanged.
+/* Try to substitute (set DEST SRC), which defines DEF, into note NOTE of
+   USE_INSN.  Return the number of substitutions on success, otherwise return
+   -1 and leave USE_INSN unchanged.
 
If REQUIRE_CONSTANT is true, require all substituted occurences of SRC
to fold to a constant, so that the note does not use any more registers
@@ -379,13 +393,14 @@ contains_paradoxical_subreg_p (rtx x)
instruction pattern.  */
 
 static int
-try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn,
+try_fwprop_subst_note (insn_info *use_insn, def_info *def,
   rtx note, rtx dest, rtx src, bool require_constant)
 {
   rtx_insn *use_rtl = use_insn->rtl ();
+  insn_info *def_insn = def->insn ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_insn, def_insn, dest, src);
+  fwprop_propagation prop (use_insn, def, dest, src);
   if (!prop.apply_to_rvalue (&XEXP (note, 0)))
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -436,19 +451,20 @@ try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn,
   return prop.num_replacements;
 }
 
-/* Try to substitute (set DEST SRC) from DEF_INSN into location LOC of
+/* Try to substitute (set DEST SRC), which defines DEF, into location LOC of
USE_INSN's pattern.  Return true on success, otherwise leave USE_INSN
unchanged.  */
 
 static bool
 try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change,
- insn_info *def_insn, rtx *loc, rtx dest, rtx src)
+ def_info *def, rtx *loc, rtx dest, rtx src)
 {
   insn_info *use_insn = use_change.insn ();
   rtx_insn *use_rtl = use_insn->rtl ();
+  insn_info *def_insn = def->insn ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_insn, def_insn, dest, src);
+  fwprop_propagation prop (use_insn, def, dest, src);
   if (!prop.apply_to_pattern (loc))
 {
   if (dump_f

[PATCH 2/2] IBM Z: Fix long double <-> DFP conversions

2021-02-18 Thread Ilya Leoshkevich via Gcc-patches
When switching the s390 backend to store long doubles in vector
registers, the patterns for long double <-> DFP conversions were
forgotten.  This did not cause observable problems so far, because
libdfp calls are emitted instead of pfpo.  However, when building
libdfp itself, this leads to infinite recursion.

gcc/ChangeLog:

* config/s390/vector.md (trunctf2_vr): New
pattern.
(trunctf2): Likewise.
(trunctdtf2_vr): Likewise.
(trunctdtf2): Likewise.
(extendtf2_vr): Likewise.
(extendtf2): Likewise.
(extendtftd2_vr): Likewise.
(extendtftd2): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/long-double-from-decimal128.c: New test.
* gcc.target/s390/vector/long-double-from-decimal32.c: New test.
* gcc.target/s390/vector/long-double-from-decimal64.c: New test.
* gcc.target/s390/vector/long-double-to-decimal128.c: New test.
* gcc.target/s390/vector/long-double-to-decimal32.c: New test.
* gcc.target/s390/vector/long-double-to-decimal64.c: New test.
---
 gcc/config/s390/vector.md | 72 +++
 .../s390/vector/long-double-from-decimal128.c | 20 ++
 .../s390/vector/long-double-from-decimal32.c  | 20 ++
 .../s390/vector/long-double-from-decimal64.c  | 20 ++
 .../s390/vector/long-double-to-decimal128.c   | 19 +
 .../s390/vector/long-double-to-decimal32.c| 19 +
 .../s390/vector/long-double-to-decimal64.c| 19 +
 7 files changed, 189 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal128.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal32.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal64.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal128.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal32.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal64.c

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index e48c965db00..bc52211c55e 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -2480,6 +2480,42 @@
   "HAVE_TF (trunctfsf2)"
   { EXPAND_TF (trunctfsf2, 2); })
 
+(define_expand "trunctf2_vr"
+  [(match_operand:DFP_ALL 0 "nonimmediate_operand" "")
+   (match_operand:TF 1 "nonimmediate_operand" "")]
+  "TARGET_HARD_DFP
+   && GET_MODE_SIZE (TFmode) > GET_MODE_SIZE (mode)
+   && TARGET_VXE"
+{
+  rtx fprx2 = gen_reg_rtx (FPRX2mode);
+  emit_insn (gen_tf_to_fprx2 (fprx2, operands[1]));
+  emit_insn (gen_truncfprx22 (operands[0], fprx2));
+  DONE;
+})
+
+(define_expand "trunctf2"
+  [(match_operand:DFP_ALL 0 "nonimmediate_operand" "")
+   (match_operand:TF 1 "nonimmediate_operand" "")]
+  "HAVE_TF (trunctf2)"
+  { EXPAND_TF (trunctf2, 2); })
+
+(define_expand "trunctdtf2_vr"
+  [(match_operand:TF 0 "nonimmediate_operand" "")
+   (match_operand:TD 1 "nonimmediate_operand" "")]
+  "TARGET_HARD_DFP && TARGET_VXE"
+{
+  rtx fprx2 = gen_reg_rtx (FPRX2mode);
+  emit_insn (gen_trunctdfprx22 (fprx2, operands[1]));
+  emit_insn (gen_fprx2_to_tf (operands[0], fprx2));
+  DONE;
+})
+
+(define_expand "trunctdtf2"
+  [(match_operand:TF 0 "nonimmediate_operand" "")
+   (match_operand:TD 1 "nonimmediate_operand" "")]
+  "HAVE_TF (trunctdtf2)"
+  { EXPAND_TF (trunctdtf2, 2); })
+
 ; load lengthened
 
 (define_insn "extenddftf2_vr"
@@ -2511,6 +2547,42 @@
   "HAVE_TF (extendsftf2)"
   { EXPAND_TF (extendsftf2, 2); })
 
+(define_expand "extendtf2_vr"
+  [(match_operand:TF 0 "nonimmediate_operand" "")
+   (match_operand:DFP_ALL 1 "nonimmediate_operand" "")]
+  "TARGET_HARD_DFP
+   && GET_MODE_SIZE (mode) < GET_MODE_SIZE (TFmode)
+   && TARGET_VXE"
+{
+  rtx fprx2 = gen_reg_rtx (FPRX2mode);
+  emit_insn (gen_extendfprx22 (fprx2, operands[1]));
+  emit_insn (gen_fprx2_to_tf (operands[0], fprx2));
+  DONE;
+})
+
+(define_expand "extendtf2"
+  [(match_operand:TF 0 "nonimmediate_operand" "")
+   (match_operand:DFP_ALL 1 "nonimmediate_operand" "")]
+  "HAVE_TF (extendtf2)"
+  { EXPAND_TF (extendtf2, 2); })
+
+(define_expand "extendtftd2_vr"
+  [(match_operand:TD 0 "nonimmediate_operand" "")
+   (match_operand:TF 1 "nonimmediate_operand" "")]
+  "TARGET_HARD_DFP && TARGET_VXE"
+{
+  rtx fprx2 = gen_reg_rtx (FPRX2mode);
+  emit_insn (gen_tf_to_fprx2 (fprx2, operands[1]));
+  emit_insn (gen_extendfprx2td2 (operands[0], fprx2));
+  DONE;
+})
+
+(define_expand "extendtftd2"
+  [(match_operand:TD 0 "nonimmediate_operand" "")
+   (match_operand:TF 1 "nonimmediate_operand" "")]
+  "HAVE_TF (extendtftd2)"
+  { EXPAND_TF (extendtftd2, 2); })
+
 ; test data class
 
 (define_expand "signbittf2_vr"
diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal128.c b/gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal128.c
new file mode 100644
index 000..3cd2c68f5c6
--- /dev/null
+++ b/gcc/testsui

[PATCH 1/2] IBM Z: Improve FPRX2 <-> TF conversions

2021-02-18 Thread Ilya Leoshkevich via Gcc-patches
gcc/ChangeLog:

* config/s390/vector.md (*fprx2_to_tf): Rename to fprx2_to_tf,
add memory alternative.
(tf_to_fprx2): New pattern.
---
 gcc/config/s390/vector.md | 36 +++-
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 0e3c31f5d4f..e48c965db00 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -616,12 +616,23 @@
vlvgp\t%v0,%1,%N1"
   [(set_attr "op_type" "VRR,VRX,VRX,VRI,VRR")])
 
-(define_insn "*fprx2_to_tf"
-  [(set (match_operand:TF   0 "nonimmediate_operand" "=v")
-   (subreg:TF (match_operand:FPRX2 1 "general_operand"   "f") 0))]
+(define_insn_and_split "fprx2_to_tf"
+  [(set (match_operand:TF   0 "nonimmediate_operand" "=v,AR")
+   (subreg:TF (match_operand:FPRX2 1 "general_operand"   "f,f") 0))]
   "TARGET_VXE"
-  "vmrhg\t%v0,%1,%N1"
-  [(set_attr "op_type" "VRR")])
+  "@
+   vmrhg\t%v0,%1,%N1
+   #"
+  "!(MEM_P (operands[0]) && MEM_VOLATILE_P (operands[0]))"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 4) (match_dup 5))]
+{
+  operands[2] = simplify_gen_subreg (DFmode, operands[0], TFmode, 0);
+  operands[3] = simplify_gen_subreg (DFmode, operands[1], FPRX2mode, 0);
+  operands[4] = simplify_gen_subreg (DFmode, operands[0], TFmode, 8);
+  operands[5] = simplify_gen_subreg (DFmode, operands[1], FPRX2mode, 8);
+}
+  [(set_attr "op_type" "VRR,*")])
 
 (define_insn "*vec_ti_to_v1ti"
   [(set (match_operand:V1TI   0 "nonimmediate_operand" 
"=v,v,R,  v,  v,v")
@@ -753,6 +764,21 @@
   "vpdi\t%V0,%v1,%V0,5"
   [(set_attr "op_type" "VRR")])
 
+(define_insn_and_split "tf_to_fprx2"
+  [(set (match_operand:FPRX2 0 "nonimmediate_operand" "=f,f")
+   (subreg:FPRX2 (match_operand:TF 1 "general_operand"   "v,AR") 0))]
+  "TARGET_VXE"
+  "#"
+  "!(MEM_P (operands[1]) && MEM_VOLATILE_P (operands[1]))"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 4) (match_dup 5))]
+{
+  operands[2] = simplify_gen_subreg (DFmode, operands[0], FPRX2mode, 0);
+  operands[3] = simplify_gen_subreg (DFmode, operands[1], TFmode, 0);
+  operands[4] = simplify_gen_subreg (DFmode, operands[0], FPRX2mode, 8);
+  operands[5] = simplify_gen_subreg (DFmode, operands[1], TFmode, 8);
+})
+
 ; vec_perm_const for V2DI using vpdi?
 
 ;;
-- 
2.29.2



[PATCH 0/2] IBM Z: Fix long double <-> DFP conversions

2021-02-18 Thread Ilya Leoshkevich via Gcc-patches
This series fixes PR99134.  Patch 1 is factored out from the pending
[1], patch 2 is the actual fix.  Bootstrapped and regtested on
s390x-redhat-linux.  Ok for master?

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564380.html

Ilya Leoshkevich (2):
  IBM Z: Improve FPRX2 <-> TF conversions
  IBM Z: Fix long double <-> DFP conversions

 gcc/config/s390/vector.md | 108 +-
 .../s390/vector/long-double-from-decimal128.c |  20 
 .../s390/vector/long-double-from-decimal32.c  |  20 
 .../s390/vector/long-double-from-decimal64.c  |  20 
 .../s390/vector/long-double-to-decimal128.c   |  19 +++
 .../s390/vector/long-double-to-decimal32.c|  19 +++
 .../s390/vector/long-double-to-decimal64.c|  19 +++
 7 files changed, 220 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal128.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal32.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal64.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal128.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal32.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal64.c

-- 
2.29.2



[PATCH] PING^2 Add input_modes parameter to TARGET_MD_ASM_ADJUST hook

2021-02-15 Thread Ilya Leoshkevich via Gcc-patches
Hello,

I would like to ping the following patch:

Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html

It is needed for the following regression fix:

IBM Z: Fix usage of "f" constraint with long doubles
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html

Best regards,
Ilya



[PATCH] PING lra: clear lra_insn_recog_data after simplifying a mem subreg

2021-01-28 Thread Ilya Leoshkevich via Gcc-patches
Hello,

I would like to ping the following patch:

lra: clear lra_insn_recog_data after simplifying a mem subreg
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563428.html

Best regards,
Ilya



[PATCH v2] IBM Z: Fix usage of "f" constraint with long doubles

2021-01-27 Thread Ilya Leoshkevich via Gcc-patches
v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html

v1 -> v2: Handle constraint modifiers, use AR constraint instead of R,
add testcases for & and %.




After switching the s390 backend to store long doubles in vector
registers, the "f" constraint broke when used with long doubles: they
correspond to TFmode, which in combination with "f" corresponds to
hard regs %v0-%v15, whereas asm users expect a %f0-%f15 pair.

Fix by using TARGET_MD_ASM_ADJUST hook to convert TFmode values to
FPRX2mode and back.

gcc/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* config/s390/s390.c (f_constraint_p): New function.
(s390_md_asm_adjust): Implement TARGET_MD_ASM_ADJUST.
(TARGET_MD_ASM_ADJUST): Likewise.
* config/s390/vector.md (fprx2_to_tf): Rename from *fprx2_to_tf,
add memory alternative.
(tf_to_fprx2): New pattern.

gcc/testsuite/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* gcc.target/s390/vector/long-double-asm-abi.c: New test.
* gcc.target/s390/vector/long-double-asm-commutative.c: New
test.
* gcc.target/s390/vector/long-double-asm-earlyclobber.c: New
test.
* gcc.target/s390/vector/long-double-asm-in-out.c: New test.
* gcc.target/s390/vector/long-double-asm-inout.c: New test.
* gcc.target/s390/vector/long-double-volatile-from-i64.c: New
test.
---
 gcc/config/s390/s390.c| 88 +++
 gcc/config/s390/vector.md | 36 ++--
 .../s390/vector/long-double-asm-abi.c | 26 ++
 .../s390/vector/long-double-asm-commutative.c | 16 
 .../vector/long-double-asm-earlyclobber.c | 17 
 .../s390/vector/long-double-asm-in-out.c  | 14 +++
 .../s390/vector/long-double-asm-inout.c   | 14 +++
 .../s390/vector/long-double-asm-matching.c| 13 +++
 .../vector/long-double-volatile-from-i64.c| 22 +
 9 files changed, 241 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-abi.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-commutative.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-earlyclobber.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-inout.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-matching.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-volatile-from-i64.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 9d2cee950d0..d4b098325e8 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16688,6 +16688,91 @@ s390_shift_truncation_mask (machine_mode mode)
   return mode == DImode || mode == SImode ? 63 : 0;
 }
 
+/* Return TRUE iff CONSTRAINT is an "f" constraint, possibly with additional
+   modifiers.  */
+
+static bool
+f_constraint_p (const char *constraint)
+{
+  for (size_t i = 0, c_len = strlen (constraint); i < c_len;
+   i += CONSTRAINT_LEN (constraint[i], constraint + i))
+{
+  if (constraint[i] == 'f')
+   return true;
+}
+  return false;
+}
+
+/* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f"
+   constraints when long doubles are stored in vector registers.  */
+
+static rtx_insn *
+s390_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &inputs,
+   vec<machine_mode> &input_modes,
+   vec<const char *> &constraints, vec<rtx> & /*clobbers*/,
+   HARD_REG_SET & /*clobbered_regs*/)
+{
+  if (!TARGET_VXE)
+/* Long doubles are stored in FPR pairs - nothing to do.  */
+return NULL;
+
+  rtx_insn *after_md_seq = NULL, *after_md_end = NULL;
+
+  unsigned ninputs = inputs.length ();
+  unsigned noutputs = outputs.length ();
+  for (unsigned i = 0; i < noutputs; i++)
+{
+  if (GET_MODE (outputs[i]) != TFmode)
+   /* Not a long double - nothing to do.  */
+   continue;
+  const char *constraint = constraints[i];
+  bool allows_mem, allows_reg, is_inout;
+  bool ok = parse_output_constraint (&constraint, i, ninputs, noutputs,
+&allows_mem, &allows_reg, &is_inout);
+  gcc_assert (ok);
+  if (!f_constraint_p (constraint + 1))
+   /* Long double with a constraint other than "=f" - nothing to do.  */
+   continue;
+  gcc_assert (allows_reg);
+  gcc_assert (!allows_mem);
+  gcc_assert (!is_inout);
+  /* Copy output value from a FPR pair into a vector register.  */
+  rtx fprx2 = gen_reg_rtx (FPRX2mode);
+  push_to_sequence2 (after_md_seq, after_md_end);
+  emit_insn (gen_fprx2_to_tf (outputs[i], fprx2));
+  after_md_seq = get_insns ();
+  after_md_end = get_last_insn ();
+  end_sequence ();
+  outputs[i] = fprx2;
+}
+
+  for (unsigned i = 0; i < ninputs; i++)
+{
+  if (GET_MODE (inputs[i]) != TFmode)
+   /* Not a long double - not

Re: [PATCH] IBM Z: Fix usage of "f" constraint with long doubles

2021-01-27 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2021-01-27 at 08:58 +0100, Andreas Krebbel wrote:
> On 1/18/21 10:54 PM, Ilya Leoshkevich wrote:
> ...
> 
> > +static rtx_insn *
> > +s390_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &inputs,
> > +   vec<machine_mode> &input_modes,
> > +   vec<const char *> &constraints, vec<rtx> & /*clobbers*/,
> > +   HARD_REG_SET & /*clobbered_regs*/)
> > +{
> > +  if (!TARGET_VXE)
> > +/* Long doubles are stored in FPR pairs - nothing to do.  */
> > +return NULL;
> > +
> > +  rtx_insn *after_md_seq = NULL, *after_md_end = NULL;
> > +
> > +  unsigned ninputs = inputs.length ();
> > +  unsigned noutputs = outputs.length ();
> > +  for (unsigned i = 0; i < noutputs; i++)
> > +{
> > +  if (GET_MODE (outputs[i]) != TFmode)
> > +   /* Not a long double - nothing to do.  */
> > +   continue;
> > +  const char *constraint = constraints[i];
> > +  bool allows_mem, allows_reg, is_inout;
> > +  bool ok = parse_output_constraint (&constraint, i, ninputs, noutputs,
> > +&allows_mem, &allows_reg, &is_inout);
> > +  gcc_assert (ok);
> > +  if (strcmp (constraint, "=f") != 0)
> > +   /* Long double with a constraint other than "=f" - nothing to do.  */
> > +   continue;
> 
> What about other constraint modifiers like & and %? Don't we need to
> handle matching constraints as well here?

Oh, right - we need to account for %?!*&# and maybe some others.  I'll
just copy the code from parse_output_constraint() that skips over all
of them, because I don't think they need any special handling - we just
need to make sure they don't mess up the recognition of "=f".

I don't think we need to explicitly support matching constraints,
because parse_input_constraint() will resolve them for us.  I'll add
a test for this just in case.
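
To make the modifier problem concrete, here is a small sketch that
contrasts an exact string comparison with a scan that tolerates
modifiers.  Both helper names are hypothetical, and the scan makes
a simplifying assumption that every constraint is one character long;
GCC steps through the string with CONSTRAINT_LEN because constraints
can span several characters.

```cpp
#include <cstring>

// The exact-match approach: accepts only the literal string "=f".
bool
exact_f_p (const char *constraint)
{
  return std::strcmp (constraint, "=f") == 0;
}

// A modifier-tolerant scan.  Simplifying assumption: each constraint
// occupies exactly one character, so any 'f' in the string counts.
bool
contains_f_p (const char *constraint)
{
  for (const char *p = constraint; *p != '\0'; p++)
    if (*p == 'f')
      return true;
  return false;
}
```

With an earlyclobber output such as "=&f", exact_f_p returns false
while contains_f_p returns true, which is the case the review comment
points at.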

Do we make use of multi-alternative constraints on s390?  I think not,
because our instructions are fairly rigid, but maybe I'm missing
something?

...

> > diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
> > index 0e3c31f5d4f..1332a65a1d1 100644
> > --- a/gcc/config/s390/vector.md
> > +++ b/gcc/config/s390/vector.md
> > @@ -616,12 +616,23 @@ (define_insn "*vec_tf_to_v1tf_vr"
> > vlvgp\t%v0,%1,%N1"
> >[(set_attr "op_type" "VRR,VRX,VRX,VRI,VRR")])
> >  
> > -(define_insn "*fprx2_to_tf"
> > -  [(set (match_operand:TF 0 "nonimmediate_operand" "=v")
> > -   (subreg:TF (match_operand:FPRX2 1 "general_operand" "f") 0))]
> > +(define_insn_and_split "fprx2_to_tf"
> > +  [(set (match_operand:TF 0 "nonimmediate_operand" "=v,R")
> > +   (subreg:TF (match_operand:FPRX2 1 "general_operand" "f,f") 0))]
> >"TARGET_VXE"
> > -  "vmrhg\t%v0,%1,%N1"
> > -  [(set_attr "op_type" "VRR")])
> > +  "@
> > +   vmrhg\t%v0,%1,%N1
> > +   #"
> > +  "!(MEM_P (operands[0]) && MEM_VOLATILE_P (operands[0]))"
> > +  [(set (match_dup 2) (match_dup 3))
> > +   (set (match_dup 4) (match_dup 5))]
> > +{
> > +  operands[2] = simplify_gen_subreg (DFmode, operands[0], TFmode, 0);
> > +  operands[3] = simplify_gen_subreg (DFmode, operands[1], FPRX2mode, 0);
> > +  operands[4] = simplify_gen_subreg (DFmode, operands[0], TFmode, 8);
> > +  operands[5] = simplify_gen_subreg (DFmode, operands[1], FPRX2mode, 8);
> > +}
> > +  [(set_attr "op_type" "VRR,*")])
> 
> Splitting an address like this might cause the displacement to
> overflow in the second part.  This would require an additional reg to
> make the address valid again, which in turn will be a problem after
> reload.  You can use the 'AR' constraint for the memory alternative.
> That way reload will make sure the address is offsettable.

Ok, thanks for the hint!
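
The overflow concern can be checked with simple arithmetic.  The sketch
below assumes that a short displacement is an unsigned 12-bit field
(0..4095); that range and both helper names are assumptions made for
illustration, not taken from the patch.  Splitting a 16-byte TF access
into two 8-byte DF accesses means the second half uses displacement
d + 8, which can fall outside the range even when d itself is valid.

```cpp
// Assumed range of a short displacement: unsigned 12 bits.
constexpr long MAX_SHORT_DISP = 4095;

// True if D fits into the assumed short displacement field.
constexpr bool
short_disp_p (long d)
{
  return d >= 0 && d <= MAX_SHORT_DISP;
}

// True if both 8-byte halves of a 16-byte access at displacement D can
// be addressed without an extra register.
constexpr bool
both_halves_short_disp_p (long d)
{
  return short_disp_p (d) && short_disp_p (d + 8);
}

// 4080 and 4088 are both in range, so the split is fine.
static_assert (both_halves_short_disp_p (4080), "both halves fit");
// 4092 is in range, but 4100 is not: the second half overflows.
static_assert (short_disp_p (4092) && !both_halves_short_disp_p (4092),
               "second half overflows");
```

An offsettable-memory constraint shifts this check to reload, so the
split never has to repair the address afterwards.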



[PATCH v3] fwprop: Allow (subreg (mem)) simplifications

2021-01-21 Thread Ilya Leoshkevich via Gcc-patches
On Thu, 2021-01-21 at 12:29 +, Richard Sandiford wrote:
> Given what you said in the other message about combine, I agree this
> is a reasonable workaround.  I don't know whether it's suitable for
> stage 4 or whether it would need to wait for stage 1.

Thanks for reviewing!  I've implemented your suggestions in the patch
below.

Regarding stage 4, this can be seen as part of the IBM Z

https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html

regression fix: before long doubles were moved to vector registers and
"f" constraints were fixed up at the RTL level, code generation for
small glibc functions like __ieee754_sqrtl was fairly efficient.  I'm
not sure whether that issue is big enough to justify this common code
change at this point, but still...



v2 -> v3: Added single_ebb_p, added paradoxical subreg check, fixed
formatting.  Bootstrapped and regtested on x86_64-redhat-linux,
ppc64le-redhat-linux and s390x-redhat-linux.




Suppose we have:

(set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
(set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))

It is clearly profitable to propagate the first insn into the second
one and get:

(set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))

fwprop actually manages to perform this, but doesn't think the result is
worth it, which results in unnecessary store/load sequences on s390.
Improve the situation by classifying SUBREG -> MEM changes as
profitable.
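
The conditions under which this patch treats a SUBREG -> MEM change as
profitable can be summarized as a single predicate.  The following is
a sketch over a hypothetical ToyChange record rather than real RTL; the
field names are invented for illustration and mirror the checks in the
patch.

```cpp
// Hypothetical summary of one propagation attempt.
struct ToyChange
{
  bool single_use;          // The propagated def has exactly one use.
  bool single_ebb;          // Def and use are in the same EBB.
  bool old_is_subreg;       // The replaced expression is a (subreg).
  bool old_is_paradoxical;  // ... and it is a paradoxical one.
  bool new_is_mem;          // The replacement is a (mem).
  bool new_is_volatile;     // ... and it is a (mem/v).
};

// Return true if the change is a (subreg (mem)) -> (mem) simplification
// that none of the four exceptions rules out.
constexpr bool
subreg_to_mem_profitable_p (const ToyChange &c)
{
  return c.single_use
         && c.single_ebb
         && c.old_is_subreg
         && !c.old_is_paradoxical
         && c.new_is_mem
         && !c.new_is_volatile;
}
```

Only the volatile exception is about correctness; the others are
profitability heuristics.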

gcc/ChangeLog:

2021-01-15  Ilya Leoshkevich  

* fwprop.c (fwprop_propagation::classify_result): Allow
(subreg (mem)) simplifications.
---
 gcc/fwprop.c | 33 -
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index eff8f7cc141..123cc228630 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -176,7 +176,7 @@ namespace
 static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1;
 static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2;
 
-fwprop_propagation (rtx_insn *, rtx, rtx);
+fwprop_propagation (insn_info *, insn_info *, rtx, rtx);
 
 bool changed_mem_p () const { return result_flags & CHANGED_MEM; }
 bool folded_to_constants_p () const;
@@ -185,13 +185,20 @@ namespace
 bool check_mem (int, rtx) final override;
 void note_simplification (int, uint16_t, rtx, rtx) final override;
 uint16_t classify_result (rtx, rtx);
+
+  private:
+const bool single_use_p;
+const bool single_ebb_p;
   };
 }
 
 /* Prepare to replace FROM with TO in INSN.  */
 
-fwprop_propagation::fwprop_propagation (rtx_insn *insn, rtx from, rtx to)
-  : insn_propagation (insn, from, to)
+fwprop_propagation::fwprop_propagation (insn_info *use_insn,
+   insn_info *def_insn, rtx from, rtx to)
+  : insn_propagation (use_insn->rtl (), from, to),
+single_use_p (def_insn->num_uses () == 1),
+single_ebb_p (use_insn->ebb () == def_insn->ebb ())
 {
   should_check_mems = true;
   should_note_simplifications = true;
@@ -262,6 +269,22 @@ fwprop_propagation::classify_result (rtx old_rtx, rtx new_rtx)
   && GET_MODE (new_rtx) == GET_MODE_INNER (GET_MODE (from)))
 return PROFITABLE;
 
+  /* Allow (subreg (mem)) -> (mem) simplifications with the following
+ exceptions:
+ 1) Propagating (mem)s into multiple uses is not profitable.
+ 2) Propagating (mem)s across EBBs may not be profitable if the source EBB
+   runs less frequently.
+ 3) Propagating (mem)s into paradoxical (subreg)s is not profitable.
+ 4) Creating new (mem/v)s is not correct, since DCE will not remove the old
+   ones.  */
+  if (single_use_p
+  && single_ebb_p
+  && SUBREG_P (old_rtx)
+  && !paradoxical_subreg_p (old_rtx)
+  && MEM_P (new_rtx)
+  && !MEM_VOLATILE_P (new_rtx))
+return PROFITABLE;
+
   return 0;
 }
 
@@ -363,7 +386,7 @@ try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn,
   rtx_insn *use_rtl = use_insn->rtl ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_rtl, dest, src);
+  fwprop_propagation prop (use_insn, def_insn, dest, src);
   if (!prop.apply_to_rvalue (&XEXP (note, 0)))
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -426,7 +449,7 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change,
   rtx_insn *use_rtl = use_insn->rtl ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_rtl, dest, src);
+  fwprop_propagation prop (use_insn, def_insn, dest, src);
   if (!prop.apply_to_pattern (loc))
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
-- 
2.26.2
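
The four exceptions listed in the comment added to classify_result can be read as one boolean predicate.  The following is only a minimal sketch of that reading, not code from the patch: toy_rtx and subreg_to_mem_profitable_p are hypothetical names, and the struct models just the properties the new check inspects.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for an rtx.  Only the properties that
// the new check looks at are modelled.
struct toy_rtx
{
  bool is_subreg;      // models SUBREG_P
  bool is_paradoxical; // models paradoxical_subreg_p
  bool is_mem;         // models MEM_P
  bool is_volatile;    // models MEM_VOLATILE_P
};

// Mirrors the condition added to classify_result: a (subreg) -> (mem)
// change counts as profitable only if the definition has a single use,
// use and definition share an EBB, the subreg is not paradoxical and the
// resulting mem is not volatile.
bool
subreg_to_mem_profitable_p (bool single_use_p, bool single_ebb_p,
                            const toy_rtx &old_rtx, const toy_rtx &new_rtx)
{
  // In this toy model a volatile flag only makes sense on a (mem).
  assert (!new_rtx.is_volatile || new_rtx.is_mem);
  return single_use_p
         && single_ebb_p
         && old_rtx.is_subreg
         && !old_rtx.is_paradoxical
         && new_rtx.is_mem
         && !new_rtx.is_volatile;
}
```

Each of the numbered exceptions corresponds to exactly one conjunct, so dropping any single condition re-enables the case that the comment rules out.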



Re: [PATCH] fwprop: Allow (subreg (mem)) simplifications

2021-01-21 Thread Ilya Leoshkevich via Gcc-patches
On Thu, 2021-01-21 at 10:49 +, Richard Sandiford wrote:
> Ilya Leoshkevich via Gcc-patches  writes:
> > On Tue, 2021-01-19 at 09:41 +0100, Richard Biener wrote:
> > > On Mon, Jan 18, 2021 at 11:04 PM Ilya Leoshkevich via Gcc-patches
> > >  wrote:
> > > > Suppose we have:
> > > > 
> > > > (set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
> > > > (set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))
> > > > 
> > > > It is clearly profitable to propagate the first insn into the
> > > > second one and get:
> > > > 
> > > > (set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))
> > > > 
> > > > fwprop actually manages to perform this, but doesn't think the
> > > > result is worth it, which results in unnecessary store/load
> > > > sequences on s390.  Improve the situation by classifying
> > > > SUBREG -> MEM changes as profitable.
> > > 
> > > IIRC fwprop also propagates into multiple uses and replacing a
> > > non-MEM with a MEM is only good when the original MEM goes away -
> > > is that properly dealt with here?
> > 
> > This is because of efficiency and not correctness reasons, right?
> > For correctness I already check MEM_VOLATILE_P (new_rtx).  For
> > efficiency I think it would be reasonable to add
> > def_insn->num_uses () == 1 check (this passes my tests, I'm yet to
> > do a full regtest though).
> 
> That sounds plausible, but I think there's also the issue that the
> mem could be in a less frequently executed block.
> 
> A potential problem with checking num_uses is that it might make the
> boundary between fwprop and combine more fuzzy.  If the propagation
> makes the original instruction redundant then we should remove it
> and take the cost of the removal into account when costing the
> propagation (as combine does).  fwprop is instead set up for cases
> in which propagations are profitable even if the original instruction
> is kept.
> 
> What prevents combine from handling this?  Are the instructions in
> different blocks?

I wanted to do this before combine, because in __ieee754_sqrtl case
fwprop turns this (example from the commit message + the insn after
it):

(set (reg:TF 63) (mem:TF (reg:DI 62)))
(set (reg:FPRX2 66) (subreg:FPRX2 (reg:TF 63) 0))
(set (reg:FPRX2 65)
 (asm_operands:FPRX2 ("sqxbr %0,%1") ("=f") 0
 [(reg:FPRX2 66)]
 [(asm_input:FPRX2 ("f"))]
 []))

into this:

(set (reg:TF 63) (mem:TF (reg:DI 62)))
(set (reg:FPRX2 65)
 (asm_operands:FPRX2 ("sqxbr %0,%1") ("=f") 0
 [(subreg:FPRX2 (reg:TF 63) 0)]
 [(asm_input:FPRX2 ("f"))]
 []))

by propagating (reg:FPRX2 66), and there is not much combine can do
about this anymore:

(set (reg:FPRX2 65)
 (asm_operands:FPRX2 ("sqxbr %0,%1") ("=f") 0
 [(mem:FPRX2 (reg:DI 62))]
 [(asm_input:FPRX2 ("f"))]
 []))

is not a valid insn.
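
For context, the asm_operands in the RTL above has the shape of a one-instruction inline asm with an "=f" output and an "f" input.  The following is a rough sketch of such a source-level function, assuming the asm template and constraints shown in the dumps; toy_sqrtl is a hypothetical name, and the non-s390 branch is only a fallback that keeps the sketch portable.

```cpp
#include <cassert>
#include <cmath>

// Square root of a long double.  On s390x this uses the same asm template
// and constraints as the asm_operands in the RTL above; elsewhere it falls
// back to the standard library (an assumption made for portability).
long double
toy_sqrtl (long double x)
{
  assert (x >= 0.0L);
#if defined (__s390x__)
  long double res;
  // Both operands must be registers ("f"), so a (mem) cannot be
  // substituted for the input operand.
  __asm__ ("sqxbr %0,%1" : "=f" (res) : "f" (x));
  return res;
#else
  return std::sqrt (x);
#endif
}
```

Because the "f" constraint does not allow memory, the load has to be folded into the insn that feeds the asm rather than into the asm itself, which is the situation described above.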



[PATCH] PING Add input_modes parameter to TARGET_MD_ASM_ADJUST hook

2021-01-20 Thread Ilya Leoshkevich via Gcc-patches
Hello,

I would like to ping the following patch:

Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html

It is needed for the following regression fix:

IBM Z: Fix usage of "f" constraint with long doubles
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html

Best regards,
Ilya



[PATCH v2] fwprop: Allow (subreg (mem)) simplifications

2021-01-19 Thread Ilya Leoshkevich via Gcc-patches
v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563800.html

v1 -> v2: Allow (mem) -> (subreg) propagation only for single uses.

Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux
and s390x-redhat-linux.  Ok for master?



Suppose we have:

(set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
(set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))

It is clearly profitable to propagate the first insn into the second
one and get:

(set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))

fwprop actually manages to perform this, but doesn't think the result is
worth it, which results in unnecessary store/load sequences on s390.
Improve the situation by classifying SUBREG -> MEM changes as
profitable.

gcc/ChangeLog:

2021-01-15  Ilya Leoshkevich  

* fwprop.c (fwprop_propagation::classify_result): Allow
(subreg (mem)) simplifications.
---
 gcc/fwprop.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index eff8f7cc141..02d3d507cbc 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -176,7 +176,7 @@ namespace
 static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1;
 static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2;
 
-fwprop_propagation (rtx_insn *, rtx, rtx);
+fwprop_propagation (rtx_insn *, insn_info *, rtx, rtx);
 
 bool changed_mem_p () const { return result_flags & CHANGED_MEM; }
 bool folded_to_constants_p () const;
@@ -185,13 +185,18 @@ namespace
 bool check_mem (int, rtx) final override;
 void note_simplification (int, uint16_t, rtx, rtx) final override;
 uint16_t classify_result (rtx, rtx);
+
+  private:
+const bool single_use_p;
   };
 }
 
 /* Prepare to replace FROM with TO in INSN.  */
 
-fwprop_propagation::fwprop_propagation (rtx_insn *insn, rtx from, rtx to)
-  : insn_propagation (insn, from, to)
+fwprop_propagation::fwprop_propagation (rtx_insn *insn, insn_info *def_insn,
+   rtx from, rtx to)
+: insn_propagation (insn, from, to),
+  single_use_p (def_insn->num_uses () == 1)
 {
   should_check_mems = true;
   should_note_simplifications = true;
@@ -262,6 +267,13 @@ fwprop_propagation::classify_result (rtx old_rtx, rtx new_rtx)
   && GET_MODE (new_rtx) == GET_MODE_INNER (GET_MODE (from)))
 return PROFITABLE;
 
+  /* Allow (subreg (mem)) -> (mem) simplifications.  Do not allow propagation
+ of (mem)s into multiple uses, since those are not profitable, as well as
+ creating new (mem/v)s, since DCE will not remove the old ones.  */
+  if (single_use_p && SUBREG_P (old_rtx) && MEM_P (new_rtx)
+  && !MEM_VOLATILE_P (new_rtx))
+return PROFITABLE;
+
   return 0;
 }
 
@@ -363,7 +375,7 @@ try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn,
   rtx_insn *use_rtl = use_insn->rtl ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_rtl, dest, src);
+  fwprop_propagation prop (use_rtl, def_insn, dest, src);
   if (!prop.apply_to_rvalue (&XEXP (note, 0)))
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -426,7 +438,7 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change,
   rtx_insn *use_rtl = use_insn->rtl ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_rtl, dest, src);
+  fwprop_propagation prop (use_rtl, def_insn, dest, src);
   if (!prop.apply_to_pattern (loc))
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
-- 
2.26.2



Re: [PATCH] fwprop: Allow (subreg (mem)) simplifications

2021-01-19 Thread Ilya Leoshkevich via Gcc-patches
On Tue, 2021-01-19 at 09:41 +0100, Richard Biener wrote:
> On Mon, Jan 18, 2021 at 11:04 PM Ilya Leoshkevich via Gcc-patches
>  wrote:
> > 
> > Suppose we have:
> > 
> > (set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
> > (set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))
> > 
> > It is clearly profitable to propagate the first insn into the
> > second one and get:
> > 
> > (set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))
> > 
> > fwprop actually manages to perform this, but doesn't think the
> > result is worth it, which results in unnecessary store/load
> > sequences on s390.  Improve the situation by classifying
> > SUBREG -> MEM changes as profitable.
> 
> IIRC fwprop also propagates into multiple uses and replacing a
> non-MEM with a MEM is only good when the original MEM goes away -
> is that properly dealt with here?

This is because of efficiency and not correctness reasons, right?  For
correctness I already check MEM_VOLATILE_P (new_rtx).  For efficiency I
think it would be reasonable to add def_insn->num_uses () == 1 check
(this passes my tests, I'm yet to do a full regtest though).  What do
you think about this?



[PATCH] fwprop: Allow (subreg (mem)) simplifications

2021-01-18 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux
and s390x-redhat-linux.  I realize it might be too late for a change
like this, but it's desirable to have this in conjunction with the
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html s390
regression fix, which otherwise produces unnecessary store/load
sequences in certain glibc routines, e.g. __ieee754_sqrtl.  Ok for
master?



Suppose we have:

(set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
(set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))

It is clearly profitable to propagate the first insn into the second
one and get:

(set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))

fwprop actually manages to perform this, but doesn't think the result is
worth it, which results in unnecessary store/load sequences on s390.
Improve the situation by classifying SUBREG -> MEM changes as
profitable.

gcc/ChangeLog:

2021-01-15  Ilya Leoshkevich  

* fwprop.c (fwprop_propagation::classify_result): Allow
(subreg (mem)) simplifications.

gcc/testsuite/ChangeLog:

2021-01-15  Ilya Leoshkevich  

* gcc.target/s390/vector/long-double-to-i64.c: Expect that
float-vector moves do *not* happen.
---
 gcc/fwprop.c  | 5 +
 gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c | 3 +--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index eff8f7cc141..46b8ec7eccf 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -262,6 +262,11 @@ fwprop_propagation::classify_result (rtx old_rtx, rtx new_rtx)
   && GET_MODE (new_rtx) == GET_MODE_INNER (GET_MODE (from)))
 return PROFITABLE;
 
+  /* Allow (subreg (mem)) -> (mem) simplifications.  However, do not allow
+ creating new (mem/v)s, since DCE will not remove the old ones.  */
+  if (SUBREG_P (old_rtx) && MEM_P (new_rtx) && !MEM_VOLATILE_P (new_rtx))
+return PROFITABLE;
+
   return 0;
 }
 
diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c b/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c
index 2dbbb5d1c03..8f4e377ed72 100644
--- a/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c
+++ b/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c
@@ -10,8 +10,7 @@ long_double_to_i64 (long double x)
   return x;
 }
 
-/* { dg-final { scan-assembler-times {\n\tvpdi\t%v\d+,%v\d+,%v\d+,1\n} 1 } } */
-/* { dg-final { scan-assembler-times {\n\tvpdi\t%v\d+,%v\d+,%v\d+,5\n} 1 } } */
+/* { dg-final { scan-assembler-not {\n\tvpdi\t} } } */
 /* { dg-final { scan-assembler-times {\n\tcgxbr\t} 1 } } */
 
 int
-- 
2.26.2



[PATCH] IBM Z: Fix usage of "f" constraint with long doubles

2021-01-18 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Depends on
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html;
ok for master once the dependency is committed?



After the s390 backend was switched to store long doubles in vector
registers, the "f" constraint broke when used with long doubles: they
correspond to TFmode, which in combination with "f" maps to hard regs
%v0-%v15, whereas asm users expect a %f0-%f15 pair.

Fix by using TARGET_MD_ASM_ADJUST hook to convert TFmode values to
FPRX2mode and back.

gcc/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* config/s390/s390.c (s390_md_asm_adjust): Implement
TARGET_MD_ASM_ADJUST.
(TARGET_MD_ASM_ADJUST): Likewise.
* config/s390/vector.md (fprx2_to_tf): Rename from *fprx2_to_tf,
add memory alternative.
(tf_to_fprx2): New pattern.

gcc/testsuite/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* gcc.target/s390/vector/long-double-asm-abi.c: New test.
* gcc.target/s390/vector/long-double-asm-in-out.c: New test.
* gcc.target/s390/vector/long-double-asm-inout.c: New test.
* gcc.target/s390/vector/long-double-volatile-from-i64.c: New
test.
---
 gcc/config/s390/s390.c| 73 +++
 gcc/config/s390/vector.md | 36 +++--
 .../s390/vector/long-double-asm-abi.c | 26 +++
 .../s390/vector/long-double-asm-in-out.c  | 14 
 .../s390/vector/long-double-asm-inout.c   | 14 
 .../vector/long-double-volatile-from-i64.c| 22 ++
 6 files changed, 180 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-abi.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-inout.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-volatile-from-i64.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 9d2cee950d0..a22fd9fe391 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16688,6 +16688,76 @@ s390_shift_truncation_mask (machine_mode mode)
   return mode == DImode || mode == SImode ? 63 : 0;
 }
 
+/* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f"
+   constraints when long doubles are stored in vector registers.  */
+
+static rtx_insn *
+s390_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &inputs,
+   vec<machine_mode> &input_modes,
+   vec<const char *> &constraints, vec<rtx> & /*clobbers*/,
+   HARD_REG_SET & /*clobbered_regs*/)
+{
+  if (!TARGET_VXE)
+/* Long doubles are stored in FPR pairs - nothing to do.  */
+return NULL;
+
+  rtx_insn *after_md_seq = NULL, *after_md_end = NULL;
+
+  unsigned ninputs = inputs.length ();
+  unsigned noutputs = outputs.length ();
+  for (unsigned i = 0; i < noutputs; i++)
+{
+  if (GET_MODE (outputs[i]) != TFmode)
+   /* Not a long double - nothing to do.  */
+   continue;
+  const char *constraint = constraints[i];
+  bool allows_mem, allows_reg, is_inout;
+  bool ok = parse_output_constraint (&constraint, i, ninputs, noutputs,
+&allows_mem, &allows_reg, &is_inout);
+  gcc_assert (ok);
+  if (strcmp (constraint, "=f") != 0)
+   /* Long double with a constraint other than "=f" - nothing to do.  */
+   continue;
+  gcc_assert (allows_reg);
+  gcc_assert (!allows_mem);
+  gcc_assert (!is_inout);
+  /* Copy output value from a FPR pair into a vector register.  */
+  rtx fprx2 = gen_reg_rtx (FPRX2mode);
+  push_to_sequence2 (after_md_seq, after_md_end);
+  emit_insn (gen_fprx2_to_tf (outputs[i], fprx2));
+  after_md_seq = get_insns ();
+  after_md_end = get_last_insn ();
+  end_sequence ();
+  outputs[i] = fprx2;
+}
+
+  for (unsigned i = 0; i < ninputs; i++)
+{
+  if (GET_MODE (inputs[i]) != TFmode)
+   /* Not a long double - nothing to do.  */
+   continue;
+  const char *constraint = constraints[noutputs + i];
+  bool allows_mem, allows_reg;
+  bool ok = parse_input_constraint (&constraint, i, ninputs, noutputs, 0,
+   constraints.address (), &allows_mem,
+   &allows_reg);
+  gcc_assert (ok);
+  if (strcmp (constraint, "f") != 0 && strcmp (constraint, "=f") != 0)
+   /* Long double with a constraint other than "f" (or "=f" for inout
+  operands) - nothing to do.  */
+   continue;
+  gcc_assert (allows_reg);
+  gcc_assert (!allows_mem);
+  /* Copy input value from a vector register into a FPR pair.  */
+  rtx fprx2 = gen_reg_rtx (FPRX2mode);
+  emit_insn (gen_tf_to_fprx2 (fprx2, inputs[i]));
+  inputs[i] = fprx2;
+  input_modes[i] = FPRX2mode;
+}
+
+  return after_md_seq;
+}
+
 /* Initialize GCC target structure.  */
 
 #undef  TARGET_ASM_ALIGNED_HI_OP
@@ 

[PATCH] lra: clear lra_insn_recog_data after simplifying a mem subreg

2021-01-13 Thread Ilya Leoshkevich via Gcc-patches
Hello,

I ran into this problem when writing new patterns for s390.  I'm not
100% sure this fix is correct, but it resolves my issue and survives
bootstrap and regtest on x86_64-redhat-linux, ppc64le-redhat-linux and
s390x-redhat-linux.  Could you please take a look?

Best regards,
Ilya




Suppose we have:

(insn (set (reg:FPRX2 70) (subreg:FPRX2 (reg/v:TF 63) 0)))

where operand_loc[0] points to r70 and operand_loc[1] points to r63.
If r63 is spilled, remove_pseudos() will change this insn to:

  (insn (set (reg:FPRX2 70)
 (subreg:FPRX2 (mem/c:TF (plus:DI (reg:DI %fp)
  (const_int 144))

This is fine so far: rtx pointed to by operand_loc[1] has been changed
from (reg) to (mem), but its slot is still under (subreg).  However,
alter_subreg() will simplify this insn to:

  (insn (set (reg:FPRX2 70)
 (mem/c:FPRX2 (plus:DI (reg:DI %fp) (const_int 144)

The (subreg) is gone, and therefore operand_loc[1] is no longer valid.
This will prevent process_insn_for_elimination() from updating the spill
slot offset, causing miscompilation: different instructions will refer
to the same spill slot using different offsets.

Fix by clearing all the cached data, and not just used_insn_alternative.

gcc/ChangeLog:

2021-01-13  Ilya Leoshkevich  

* lra-spills.c (remove_pseudos): Call lra_update_insn_recog_data()
after calling alter_subreg() on a (mem).
---
 gcc/lra-spills.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/lra-spills.c b/gcc/lra-spills.c
index 26f56b2df02..01bd82574e7 100644
--- a/gcc/lra-spills.c
+++ b/gcc/lra-spills.c
@@ -431,7 +431,7 @@ remove_pseudos (rtx *loc, rtx_insn *insn)
  alter_subreg (loc, false);
  if (GET_CODE (*loc) == MEM)
{
- lra_get_insn_recog_data (insn)->used_insn_alternative = -1;
+ lra_update_insn_recog_data (insn);
  if (lra_dump_file != NULL)
fprintf (lra_dump_file,
 "Memory subreg was simplified in insn #%u\n",
-- 
2.26.2
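
The failure mode described in the commit message is a cached pointer into an expression that has since been rebuilt.  The following is a toy model of that situation, assuming nothing about the real LRA data structures: node, insn, update_cache and simplify_mem_subreg are hypothetical names, with cached_operand playing the role of operand_loc[1].

```cpp
#include <cassert>
#include <memory>

// Toy expression node: a (reg) or (mem) leaf, or a (subreg) wrapper with
// one child.
struct node
{
  enum kind_t { reg_node, mem_node, subreg_node } kind;
  int offset;                  // spill slot offset, meaningful for mem_node
  std::unique_ptr<node> inner; // operand wrapped by a subreg_node
};

// Toy insn with one source operand and a cached pointer to that operand.
struct insn
{
  std::unique_ptr<node> src;
  node *cached_operand;
};

// Recompute the cache from scratch, in the spirit of refreshing all the
// cached recog data instead of a single field.
void
update_cache (insn &i)
{
  node *n = i.src.get ();
  i.cached_operand = (n->kind == node::subreg_node) ? n->inner.get () : n;
}

// In the spirit of alter_subreg on (subreg (mem)): replace the wrapper by
// a new (mem).  The old wrapper is parked in GRAVEYARD so that the stale
// cached pointer still refers to live memory in this toy.
void
simplify_mem_subreg (insn &i, std::unique_ptr<node> &graveyard)
{
  assert (i.src->kind == node::subreg_node
          && i.src->inner->kind == node::mem_node);
  std::unique_ptr<node> new_mem (
    new node {node::mem_node, i.src->inner->offset, nullptr});
  graveyard = std::move (i.src);
  i.src = std::move (new_mem);
}
```

After simplify_mem_subreg, a write through cached_operand changes the detached node and leaves the insn untouched, which is the toy analogue of the spill slot offset not being updated; calling update_cache first makes the write land in the insn.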



[PATCH] IBM Z: Fix constraints in vpdi patterns

2021-01-08 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?



The destination register is only partially overwritten, so + should be
used instead of =.

gcc/ChangeLog:

2021-01-08  Ilya Leoshkevich  

* config/s390/vector.md (*tf_to_fprx2_0): Rename from
*mov_tf_to_fprx2_0 for consistency, fix constraint.
(*tf_to_fprx2_1): Rename from *mov_tf_to_fprx2_1 for
consistency, fix constraint.
---
 gcc/config/s390/vector.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 5b8d75f18f0..0e3c31f5d4f 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -737,16 +737,16 @@ (define_insn "*vec_perm"
   "vperm\t%v0,%v1,%v2,%v3"
   [(set_attr "op_type" "VRR")])
 
-(define_insn "*mov_tf_to_fprx2_0"
-  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 0)
+(define_insn "*tf_to_fprx2_0"
+  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
(subreg:DF (match_operand:TF1 "general_operand"   "v") 0))]
   "TARGET_VXE"
   ; M4 == 1 corresponds to %v0[0] = %v1[0]; %v0[1] = %v0[1];
   "vpdi\t%v0,%v1,%v0,1"
   [(set_attr "op_type" "VRR")])
 
-(define_insn "*mov_tf_to_fprx2_1"
-  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 8)
+(define_insn "*tf_to_fprx2_1"
+  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
(subreg:DF (match_operand:TF1 "general_operand"   "v") 8))]
   "TARGET_VXE"
   ; M4 == 5 corresponds to %V0[0] = %v1[1]; %V0[1] = %V0[1];
-- 
2.26.2
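
Why the destination is an in-out operand can be seen from a small model of the instruction.  The following sketch is written to agree with the two "M4 == ..." comments in the patterns above; the vreg type and the vpdi function are hypothetical, and the behaviour for M4 values other than 1 and 5 is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// A vector register modelled as two doublewords.
typedef std::array<std::uint64_t, 2> vreg;

// Toy model of "vpdi result,v2,v3,m4": doubleword 0 of the result comes
// from V2 and doubleword 1 from V3; m4 selects which doubleword of each
// source is taken (assumed encoding: value 4 for V2, value 1 for V3).
vreg
vpdi (const vreg &v2, const vreg &v3, unsigned m4)
{
  assert (m4 < 16);
  vreg result = {{ v2[(m4 >> 2) & 1], v3[m4 & 1] }};
  return result;
}
```

In both patterns the destination register is also passed as the third operand, so its old doubleword 1 flows into the result.  The old value is therefore read, which is what "+" expresses and "=" does not.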



[PATCH v2] IBM Z: Introduce __LONG_DOUBLE_VX__ macro

2021-01-08 Thread Ilya Leoshkevich via Gcc-patches
v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563034.html
v1 -> v2: Use TARGET_VXE_P instead of TARGET_Z14_P.



Give end users the opportunity to find out whether long doubles are
stored in floating-point register pairs or in vector registers, so that
they could fine-tune their asm statements.

gcc/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* config/s390/s390-c.c (s390_def_or_undef_macro): Accept
callables instead of mask values.
(struct target_flag_set_p): New predicate.
(s390_cpu_cpp_builtins_internal): Define or undefine
__LONG_DOUBLE_VX__ macro.

gcc/testsuite/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* gcc.target/s390/vector/long-double-vx-macro-off.c: New test.
* gcc.target/s390/vector/long-double-vx-macro-on.c: New test.
---
 gcc/config/s390/s390-c.c  | 59 ---
 .../s390/vector/long-double-vx-macro-off-on.c | 11 
 .../s390/vector/long-double-vx-macro-on-off.c | 11 
 3 files changed, 60 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c

diff --git a/gcc/config/s390/s390-c.c b/gcc/config/s390/s390-c.c
index 95cd2df505d..a5f5f56311a 100644
--- a/gcc/config/s390/s390-c.c
+++ b/gcc/config/s390/s390-c.c
@@ -294,9 +294,9 @@ s390_macro_to_expand (cpp_reader *pfile, const cpp_token *tok)
 /* Helper function that defines or undefines macros.  If SET is true, the macro
MACRO_DEF is defined.  If SET is false, the macro MACRO_UNDEF is undefined.
Nothing is done if SET and WAS_SET have the same value.  */
+template <typename F>
 static void
-s390_def_or_undef_macro (cpp_reader *pfile,
-unsigned int mask,
+s390_def_or_undef_macro (cpp_reader *pfile, F is_set,
 const struct cl_target_option *old_opts,
 const struct cl_target_option *new_opts,
 const char *macro_def, const char *macro_undef)
@@ -304,8 +304,8 @@ s390_def_or_undef_macro (cpp_reader *pfile,
   bool was_set;
   bool set;
 
-  was_set = (!old_opts) ? false : old_opts->x_target_flags & mask;
-  set = new_opts->x_target_flags & mask;
+  was_set = (!old_opts) ? false : is_set (old_opts);
+  set = is_set (new_opts);
   if (was_set == set)
 return;
   if (set)
@@ -314,6 +314,19 @@ s390_def_or_undef_macro (cpp_reader *pfile,
 cpp_undef (pfile, macro_undef);
 }
 
+struct target_flag_set_p
+{
+  target_flag_set_p (unsigned int mask) : m_mask (mask) {}
+
+  bool
+  operator() (const struct cl_target_option *opts) const
+  {
+return opts->x_target_flags & m_mask;
+  }
+
+  unsigned int m_mask;
+};
+
 /* Internal function to either define or undef the appropriate system
macros.  */
 static void
@@ -321,18 +334,18 @@ s390_cpu_cpp_builtins_internal (cpp_reader *pfile,
struct cl_target_option *opts,
const struct cl_target_option *old_opts)
 {
-  s390_def_or_undef_macro (pfile, MASK_OPT_HTM, old_opts, opts,
-  "__HTM__", "__HTM__");
-  s390_def_or_undef_macro (pfile, MASK_OPT_VX, old_opts, opts,
-  "__VX__", "__VX__");
-  s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
-  "__VEC__=10303", "__VEC__");
-  s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
-  "__vector=__attribute__((vector_size(16)))",
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_OPT_HTM), old_opts,
+  opts, "__HTM__", "__HTM__");
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_OPT_VX), old_opts,
+  opts, "__VX__", "__VX__");
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_ZVECTOR), old_opts,
+  opts, "__VEC__=10303", "__VEC__");
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_ZVECTOR), old_opts,
+  opts, "__vector=__attribute__((vector_size(16)))",
   "__vector__");
-  s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
-  "__bool=__attribute__((s390_vector_bool)) unsigned",
-  "__bool");
+  s390_def_or_undef_macro (
+  pfile, target_flag_set_p (MASK_ZVECTOR), old_opts, opts,
+  "__bool=__attribute__((s390_vector_bool)) unsigned", "__bool");
   {
 char macro_def[64];
 gcc_assert (s390_arch != PROCESSOR_NATIVE);
@@ -340,16 +353,20 @@ s390_cpu_cpp_builtins_internal (cpp_reader *pfile,
 cpp_undef (pfile, "__ARCH__");
 cpp_define (pfile, macro_def);
   }
+  s390_def_or_undef_macro (
+  pfile,
+  [] (const struct cl_target_option *opts) { return TARGET_VXE_P (opts); },
+  old_opts, opts, "__LONG_DOUBLE_VX__", "__LONG_DOUBLE_VX__");
 
   if (!flag_iso)
 {
-  s390_def_or_undef_macro (pfile,
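
The refactoring in this patch replaces a mask parameter with an arbitrary callable, so that both "is this flag bit set" and "is this a VXE target" can drive the same define/undefine logic.  A minimal self-contained sketch of that idea follows; toy_options, def_or_undef and flag_set_p are hypothetical stand-ins, and the return value replaces the cpp_define/cpp_undef calls.

```cpp
#include <cassert>

// Hypothetical stand-in for cl_target_option.
struct toy_options
{
  unsigned int flags;
  int arch_level;
};

// Returns 1 if the macro would be defined, -1 if it would be undefined and
// 0 if nothing changes.  IS_SET is any callable taking the options.
template <typename F>
int
def_or_undef (F is_set, const toy_options *old_opts,
              const toy_options *new_opts)
{
  assert (new_opts);
  bool was_set = old_opts ? is_set (old_opts) : false;
  bool set = is_set (new_opts);
  if (was_set == set)
    return 0;
  return set ? 1 : -1;
}

// Function object covering the former "mask" use case.
struct flag_set_p
{
  explicit flag_set_p (unsigned int mask) : m_mask (mask) {}

  bool
  operator() (const toy_options *opts) const
  {
    return (opts->flags & m_mask) != 0;
  }

  unsigned int m_mask;
};
```

A lambda such as `[] (const toy_options *o) { return o->arch_level >= 12; }` can be passed in place of flag_set_p, which corresponds to how the __LONG_DOUBLE_VX__ check is wired in above.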

[PATCH] IBM Z: Introduce __LONG_DOUBLE_VX__ macro

2021-01-07 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?



Give end users the opportunity to find out whether long doubles are
stored in floating-point register pairs or in vector registers, so that
they could fine-tune their asm statements.

gcc/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* config/s390/s390-c.c (s390_def_or_undef_macro): Accept
callables instead of mask values.
(struct target_flag_set_p): New predicate.
(s390_cpu_cpp_builtins_internal): Define or undefine
__LONG_DOUBLE_VX__ macro.

gcc/testsuite/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* gcc.target/s390/vector/long-double-vx-macro-off.c: New test.
* gcc.target/s390/vector/long-double-vx-macro-on.c: New test.
---
 gcc/config/s390/s390-c.c  | 59 ---
 .../s390/vector/long-double-vx-macro-off-on.c | 11 
 .../s390/vector/long-double-vx-macro-on-off.c | 11 
 3 files changed, 60 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c

diff --git a/gcc/config/s390/s390-c.c b/gcc/config/s390/s390-c.c
index 95cd2df505d..29b87d76ab1 100644
--- a/gcc/config/s390/s390-c.c
+++ b/gcc/config/s390/s390-c.c
@@ -294,9 +294,9 @@ s390_macro_to_expand (cpp_reader *pfile, const cpp_token *tok)
 /* Helper function that defines or undefines macros.  If SET is true, the macro
MACRO_DEF is defined.  If SET is false, the macro MACRO_UNDEF is undefined.
Nothing is done if SET and WAS_SET have the same value.  */
+template <typename F>
 static void
-s390_def_or_undef_macro (cpp_reader *pfile,
-unsigned int mask,
+s390_def_or_undef_macro (cpp_reader *pfile, F is_set,
 const struct cl_target_option *old_opts,
 const struct cl_target_option *new_opts,
 const char *macro_def, const char *macro_undef)
@@ -304,8 +304,8 @@ s390_def_or_undef_macro (cpp_reader *pfile,
   bool was_set;
   bool set;
 
-  was_set = (!old_opts) ? false : old_opts->x_target_flags & mask;
-  set = new_opts->x_target_flags & mask;
+  was_set = (!old_opts) ? false : is_set (old_opts);
+  set = is_set (new_opts);
   if (was_set == set)
 return;
   if (set)
@@ -314,6 +314,19 @@ s390_def_or_undef_macro (cpp_reader *pfile,
 cpp_undef (pfile, macro_undef);
 }
 
+struct target_flag_set_p
+{
+  target_flag_set_p (unsigned int mask) : m_mask (mask) {}
+
+  bool
+  operator() (const struct cl_target_option *opts) const
+  {
+return opts->x_target_flags & m_mask;
+  }
+
+  unsigned int m_mask;
+};
+
 /* Internal function to either define or undef the appropriate system
macros.  */
 static void
@@ -321,18 +334,18 @@ s390_cpu_cpp_builtins_internal (cpp_reader *pfile,
struct cl_target_option *opts,
const struct cl_target_option *old_opts)
 {
-  s390_def_or_undef_macro (pfile, MASK_OPT_HTM, old_opts, opts,
-  "__HTM__", "__HTM__");
-  s390_def_or_undef_macro (pfile, MASK_OPT_VX, old_opts, opts,
-  "__VX__", "__VX__");
-  s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
-  "__VEC__=10303", "__VEC__");
-  s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
-  "__vector=__attribute__((vector_size(16)))",
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_OPT_HTM), old_opts,
+  opts, "__HTM__", "__HTM__");
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_OPT_VX), old_opts,
+  opts, "__VX__", "__VX__");
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_ZVECTOR), old_opts,
+  opts, "__VEC__=10303", "__VEC__");
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_ZVECTOR), old_opts,
+  opts, "__vector=__attribute__((vector_size(16)))",
   "__vector__");
-  s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
-  "__bool=__attribute__((s390_vector_bool)) unsigned",
-  "__bool");
+  s390_def_or_undef_macro (
+  pfile, target_flag_set_p (MASK_ZVECTOR), old_opts, opts,
+  "__bool=__attribute__((s390_vector_bool)) unsigned", "__bool");
   {
 char macro_def[64];
 gcc_assert (s390_arch != PROCESSOR_NATIVE);
@@ -340,16 +353,20 @@ s390_cpu_cpp_builtins_internal (cpp_reader *pfile,
 cpp_undef (pfile, "__ARCH__");
 cpp_define (pfile, macro_def);
   }
+  s390_def_or_undef_macro (
+  pfile,
+  [] (const struct cl_target_option *opts) { return TARGET_Z14_P (opts); },
+  old_opts, opts, "__LONG_DOUBLE_VX__", "__LONG_DOUBLE_VX__");
 
   if (!flag_iso)
 {
-  s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
- 
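
From the user's side, the macro is meant to be tested with the preprocessor.  The following is a sketch of such a test; both function names are hypothetical, and the choice of "v" as the constraint for the vector-register case is an assumption made for illustration only.

```cpp
// True if this translation unit was compiled with long doubles living in
// vector registers.  The macro is only provided by a compiler carrying
// this patch; everywhere else the function returns false.
constexpr bool
long_double_in_vector_regs_p ()
{
#ifdef __LONG_DOUBLE_VX__
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: the register constraint an asm statement working
// on a long double might want ("v" is an assumption, "f" matches the
// FPR-pair case described above).
constexpr const char *
long_double_constraint ()
{
  return long_double_in_vector_regs_p () ? "v" : "f";
}
```

Since asm constraints must be string literals, real code would more likely select between two complete asm statements with #ifdef; the helper above only makes the decision visible.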

[PATCH] Add input_modes parameter to TARGET_MD_ASM_ADJUST hook

2021-01-05 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on x86_64-redhat-linux.  I also built
cross-compilers for arm-linux-gnueabi, cris-elf, mn10300-elf,
nds32-linux-gnu, pdp11-aout (didn't fully work due to
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg251887.html,
but the changed code compiled fine), powerpc-linux-gnu, vax-linux-gnu
and visium-elf, but didn't test them.  I ran into this issue while
implementing TARGET_MD_ASM_ADJUST for s390.  Ok for master?



If TARGET_MD_ASM_ADJUST changes a mode of an input operand (which
should be ok as long as the hook itself as well as after_md_seq make up
for it), input_mode will contain stale information.

It might be tempting to fix this by removing input_mode altogether and
just using GET_MODE (), but this will not work correctly with constants.
So add input_modes parameter and document that it should be updated
whenever inputs parameter is updated.

gcc/ChangeLog:

2021-01-05  Ilya Leoshkevich  

* cfgexpand.c (expand_asm_loc): Pass new parameter.
(expand_asm_stmt): Likewise.
* config/arm/aarch-common-protos.h (arm_md_asm_adjust): Add new
parameter.
* config/arm/aarch-common.c (arm_md_asm_adjust): Likewise.
* config/arm/arm.c (thumb1_md_asm_adjust): Likewise.
* config/cris/cris.c (cris_md_asm_adjust): Likewise.
* config/i386/i386.c (ix86_md_asm_adjust): Likewise.
* config/mn10300/mn10300.c (mn10300_md_asm_adjust): Likewise.
* config/nds32/nds32.c (nds32_md_asm_adjust): Likewise.
* config/pdp11/pdp11.c (pdp11_md_asm_adjust): Likewise.
* config/rs6000/rs6000.c (rs6000_md_asm_adjust): Likewise.
* config/vax/vax.c (vax_md_asm_adjust): Likewise.
* config/visium/visium.c (visium_md_asm_adjust): Likewise.
* target.def (md_asm_adjust): Likewise.
---
 gcc/cfgexpand.c  | 16 
 gcc/config/arm/aarch-common-protos.h |  8 
 gcc/config/arm/aarch-common.c|  7 ---
 gcc/config/arm/arm.c | 14 --
 gcc/config/cris/cris.c   |  7 ---
 gcc/config/i386/i386.c   |  7 ---
 gcc/config/mn10300/mn10300.c |  7 ---
 gcc/config/nds32/nds32.c |  1 +
 gcc/config/pdp11/pdp11.c |  9 +
 gcc/config/rs6000/rs6000.c   |  7 ---
 gcc/config/vax/vax.c |  3 ++-
 gcc/config/visium/visium.c   | 12 +++-
 gcc/doc/tm.texi  | 10 ++
 gcc/target.def   | 13 -
 14 files changed, 69 insertions(+), 52 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index b73019b241f..e25528261a0 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -2879,6 +2879,7 @@ expand_asm_loc (tree string, int vol, location_t locus)
   rtx asm_op, clob;
   unsigned i, nclobbers;
   auto_vec<rtx> input_rvec, output_rvec;
+  auto_vec<machine_mode> input_mode;
   auto_vec<const char *> constraints;
   auto_vec<rtx> clobber_rvec;
   HARD_REG_SET clobbered_regs;
@@ -2888,9 +2889,8 @@ expand_asm_loc (tree string, int vol, location_t locus)
   clobber_rvec.safe_push (clob);
 
   if (targetm.md_asm_adjust)
-   targetm.md_asm_adjust (output_rvec, input_rvec,
-  constraints, clobber_rvec,
-  clobbered_regs);
+   targetm.md_asm_adjust (output_rvec, input_rvec, input_mode,
+  constraints, clobber_rvec, clobbered_regs);
 
   asm_op = body;
   nclobbers = clobber_rvec.length ();
@@ -3067,8 +3067,8 @@ expand_asm_stmt (gasm *stmt)
   return;
 }
 
-  /* There are some legacy diagnostics in here, and also avoids a
- sixth parameger to targetm.md_asm_adjust.  */
+  /* There are some legacy diagnostics in here, and also avoids an extra
+ parameter to targetm.md_asm_adjust.  */
   save_input_location s_i_l(locus);
 
   unsigned noutputs = gimple_asm_noutputs (stmt);
@@ -3419,9 +3419,9 @@ expand_asm_stmt (gasm *stmt)
  the flags register.  */
   rtx_insn *after_md_seq = NULL;
   if (targetm.md_asm_adjust)
-after_md_seq = targetm.md_asm_adjust (output_rvec, input_rvec,
- constraints, clobber_rvec,
- clobbered_regs);
+after_md_seq
+   = targetm.md_asm_adjust (output_rvec, input_rvec, input_mode,
+constraints, clobber_rvec, clobbered_regs);
 
   /* Do not allow the hook to change the output and input count,
  lest it mess up the operand numbering.  */
diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h
index 251de3d61a8..cbef50dde71 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -143,9 +143,9 @@ struct cpu_cost_table
   const struct vector_cost_table vect;
 };
 
-rtx_insn *
-arm_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &/*inputs*/,
-   vec<const char *> &constraints,
-

[PATCH] IBM Z: Fix check_effective_target_s390_z14_hw

2021-01-05 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on z14.  Ok for master?



Commit 2f473f4b065d ("IBM Z: Do not run long double tests on old
machines") introduced a predicate for tests that must run only on z14+.
However, due to a syntax error, the predicate always returns false.
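
A minimal sketch of why the doubled percent signs break the probe: in GNU extended asm (a template that has operands), `%0` is replaced by operand 0, while `%%` emits a literal `%`.  The old template therefore handed the assembler the text `msgrkc %0,%0,%0`, which presumably does not assemble, so the check could never succeed.  The helper below is hypothetical and uses an assembler comment as its template so that it is not tied to s390; it assumes an assembler that accepts C-style comments, as GNU as does.

```c
/* Pass X through an extended asm statement whose template is only an
   assembler comment.  Hypothetical helper, for illustration only.  */
static int
through_asm (int x)
{
  /* "%0" is replaced by the register holding operand 0.  Writing "%%0"
     instead would emit the literal text "%0" and never reference the
     operand in the generated assembly.  */
  __asm__ volatile ("/* operand 0 lives in %0 */" : "+r" (x));
  return x;
}
```

Calling `through_asm (42)` returns 42 unchanged, since the template contains no instructions; the point is only how the template text is expanded.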

gcc/testsuite/ChangeLog:

2020-12-10  Ilya Leoshkevich  

* gcc.target/s390/s390.exp: Replace %% with %.
---
 gcc/testsuite/gcc.target/s390/s390.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/s390/s390.exp b/gcc/testsuite/gcc.target/s390/s390.exp
index ba493de9f95..57b2690f8ab 100644
--- a/gcc/testsuite/gcc.target/s390/s390.exp
+++ b/gcc/testsuite/gcc.target/s390/s390.exp
@@ -197,7 +197,7 @@ proc check_effective_target_s390_z14_hw { } {
int main (void)
{
int x = 0;
-   asm ("msgrkc %%0,%%0,%%0" : "+r" (x) : );
+   asm ("msgrkc %0,%0,%0" : "+r" (x) : );
return x;
}
 }] "-march=z14 -m64 -mzarch" ] } { return 0 } else { return 1 }
-- 
2.26.2



[PATCH v2] aix: Fixinclude updates [PR98208]

2020-12-14 Thread Ilya Leoshkevich via Gcc-patches
On Fri, 2020-12-11 at 07:51 -0500, Nathan Sidwell wrote:
>
> I'm pretty sure this is wrong.  I think the test_text in
> inclhack.def
> should be a pre-fixed string that the testsuite presumably checks is
> converted.

You're right; I've added your change from the Bugzilla and updated the
expectation.  Does the following look better?
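
To illustrate the point about test_text, here is a small sketch that checks both candidate strings against the fix's select pattern.  It assumes the pattern is interpreted as a POSIX extended regular expression; the names `physadr_select` and `select_matches` are hypothetical and are not part of fixincludes.

```c
#include <regex.h>
#include <stddef.h>

/* The select pattern of the aix_physadr_t fix, written as a C string.  */
static const char physadr_select[] =
  "typedef[ \t]*struct[ \t]*([{][^}]*[}][ \t]*\\*[ \t]*physadr_t;)";

/* Return 1 if TEXT matches the select pattern, 0 if it does not, and -1
   if the pattern fails to compile.  */
static int
select_matches (const char *text)
{
  regex_t re;
  if (regcomp (&re, physadr_select, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int matched = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}
```

Under that assumption, the old test_text `typedef struct __physadr_s {` is not selected, because the pattern requires `{` right after `struct`, whereas `typedef   struct { random stuff } *   physadr_t;` is selected and can therefore be rewritten by the fix.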



After 92648faa1cb2 ("aix: Fixinclude"), make check-fixincludes began to
fail (at least on the gcc121 machine).  Fix by updating fixincludes/tests
and rerunning genfixes.

Co-developed-by: Nathan Sidwell 

fixincludes/ChangeLog:

2020-12-11  Ilya Leoshkevich  

* fixincl.x: Rerun genfixes.
* inclhack.def(aix_physadr_t): Change test_text to something
that needs to be replaced.
* tests/base/sys/types.h(aix_physadr_t): Add expectation.
---
 fixincludes/fixincl.x  | 4 ++--
 fixincludes/inclhack.def   | 2 +-
 fixincludes/tests/base/sys/types.h | 5 +
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/fixincludes/fixincl.x b/fixincludes/fixincl.x
index 21439652bce..cc17edfba0b 100644
--- a/fixincludes/fixincl.x
+++ b/fixincludes/fixincl.x
@@ -2,11 +2,11 @@
  *
  * DO NOT EDIT THIS FILE   (fixincl.x)
  *
- * It has been AutoGen-ed  October 21, 2020 at 10:43:22 AM by AutoGen 5.18.16
+ * It has been AutoGen-ed  December  9, 2020 at 11:16:08 AM by AutoGen 5.18.16
  * From the definitionsinclhack.def
  * and the template file   fixincl
  */
-/* DO NOT SVN-MERGE THIS FILE, EITHER Wed Oct 21 10:43:22 EDT 2020
+/* DO NOT SVN-MERGE THIS FILE, EITHER Wed Dec  9 11:16:08 EST 2020
  *
  * You must regenerate it.  Use the ./genfixes script.
  *
diff --git a/fixincludes/inclhack.def b/fixincludes/inclhack.def
index 80c9adfb07c..3a4cfe06542 100644
--- a/fixincludes/inclhack.def
+++ b/fixincludes/inclhack.def
@@ -731,7 +731,7 @@ fix = {
 select= "typedef[ \t]*struct[ \t]*([{][^}]*[}][ \t]*\\*[ \t]*physadr_t;)";
 c_fix = format;
 c_fix_arg = "typedef struct __physadr_s %1";
-test_text = "typedef struct __physadr_s {";
+test_text = "typedef   struct { random stuff } *   physadr_t;";
 };
 
 /*
diff --git a/fixincludes/tests/base/sys/types.h b/fixincludes/tests/base/sys/types.h
index 683b5e93ecd..7340e76b175 100644
--- a/fixincludes/tests/base/sys/types.h
+++ b/fixincludes/tests/base/sys/types.h
@@ -9,6 +9,11 @@
 
 
 
+#if defined( AIX_PHYSADR_T_CHECK )
+typedef struct __physadr_s { random stuff } *  physadr_t;
+#endif  /* AIX_PHYSADR_T_CHECK */
+
+
 #if defined( GNU_TYPES_CHECK )
 #if !defined(_GCC_PTRDIFF_T)
 #define _GCC_PTRDIFF_T
-- 
2.25.4



[PATCH] aix: Fixinclude updates [PR98208]

2020-12-10 Thread Ilya Leoshkevich via Gcc-patches
Tested on gcc121 (x86_64 CentOS Linux 7).  Ok for master?



After 92648faa1cb2 ("aix: Fixinclude"), make check-fixincludes began to
fail (at least on the gcc121 machine).  Fix by updating fixincludes/tests
and rerunning genfixes.

fixincludes/ChangeLog:

2020-12-11  Ilya Leoshkevich  

* fixincl.x: Rerun genfixes.
* tests/base/sys/types.h: Add AIX_PHYSADR_T_CHECK.
---
 fixincludes/fixincl.x  | 4 ++--
 fixincludes/tests/base/sys/types.h | 5 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fixincludes/fixincl.x b/fixincludes/fixincl.x
index 21439652bce..cc17edfba0b 100644
--- a/fixincludes/fixincl.x
+++ b/fixincludes/fixincl.x
@@ -2,11 +2,11 @@
  *
  * DO NOT EDIT THIS FILE   (fixincl.x)
  *
- * It has been AutoGen-ed  October 21, 2020 at 10:43:22 AM by AutoGen 5.18.16
+ * It has been AutoGen-ed  December  9, 2020 at 11:16:08 AM by AutoGen 5.18.16
  * From the definitionsinclhack.def
  * and the template file   fixincl
  */
-/* DO NOT SVN-MERGE THIS FILE, EITHER Wed Oct 21 10:43:22 EDT 2020
+/* DO NOT SVN-MERGE THIS FILE, EITHER Wed Dec  9 11:16:08 EST 2020
  *
  * You must regenerate it.  Use the ./genfixes script.
  *
diff --git a/fixincludes/tests/base/sys/types.h b/fixincludes/tests/base/sys/types.h
index 683b5e93ecd..a318f9b713b 100644
--- a/fixincludes/tests/base/sys/types.h
+++ b/fixincludes/tests/base/sys/types.h
@@ -9,6 +9,11 @@
 
 
 
+#if defined( AIX_PHYSADR_T_CHECK )
+typedef struct __physadr_s {
+#endif  /* AIX_PHYSADR_T_CHECK */
+
+
 #if defined( GNU_TYPES_CHECK )
 #if !defined(_GCC_PTRDIFF_T)
 #define _GCC_PTRDIFF_T
-- 
2.25.4



[PATCH] Limit perf data buffer during feature checking

2020-12-09 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on x86_64-redhat-linux.  Ok for master?

Commit 2ead1ab91123 ("Limit perf data buffer during profiling") added
-m8 to perf invocations during running tests, but the same problem
exists for checking whether perf is working in the first place.

gcc/testsuite/ChangeLog:

2020-12-08  Ilya Leoshkevich  

* lib/target-supports.exp(check_profiling_available): Limit
perf data buffer.
---
 gcc/testsuite/lib/target-supports.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 89c4f67554f..75b4f5d0e85 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -654,7 +654,7 @@ proc check_profiling_available { test_what } {
return 0
}
 global srcdir
-   set status [remote_exec host "$srcdir/../config/i386/gcc-auto-profile" "true -v >/dev/null"]
+   set status [remote_exec host "$srcdir/../config/i386/gcc-auto-profile" "-m8 true -v >/dev/null"]
if { [lindex $status 0] != 0 } {
verbose "autofdo not supported because perf does not work"
return 0
-- 
2.25.4



Re: [PATCH v4 1/2] asan: specify alignment for LASANPC labels

2020-12-08 Thread Ilya Leoshkevich via Gcc-patches
On Thu, 2020-07-09 at 14:07 +0200, Ilya Leoshkevich wrote:
> On Wed, 2020-07-01 at 21:48 +0200, Ilya Leoshkevich wrote:
> > On Wed, 2020-07-01 at 11:57 -0600, Jeff Law wrote:
> > > On Wed, 2020-07-01 at 14:29 +0200, Ilya Leoshkevich via Gcc-
> > > patches
> > > wrote:
> > > > gcc/ChangeLog:
> > > > 
> > > > 2020-06-30  Ilya Leoshkevich  
> > > > 
> > > > * asan.c (asan_emit_stack_protection): Use
> > > > CODE_LABEL_BOUNDARY.
> > > > * defaults.h (CODE_LABEL_BOUNDARY): New macro.
> > > > * doc/tm.texi: Document CODE_LABEL_BOUNDARY.
> > > > * doc/tm.texi.in: Likewise.
> > > Don't we already have the ability to set label alignments?  See
> > > LABEL_ALIGN.
> > 
> > The following works with -falign-labels=2:
> > 
> > --- a/gcc/asan.c
> > +++ b/gcc/asan.c
> > @@ -1524,7 +1524,7 @@ asan_emit_stack_protection (rtx base, rtx
> > pbase,
> > unsigned int alignb,
> >DECL_INITIAL (decl) = decl;
> >TREE_ASM_WRITTEN (decl) = 1;
> >TREE_ASM_WRITTEN (id) = 1;
> > -  SET_DECL_ALIGN (decl, CODE_LABEL_BOUNDARY);
> > +  SET_DECL_ALIGN (decl, (1 << LABEL_ALIGN (gen_label_rtx ())) *
> > BITS_PER_UNIT);
> >emit_move_insn (mem, expand_normal (build_fold_addr_expr
> > (decl)));
> >shadow_base = expand_binop (Pmode, lshr_optab, base,
> >   gen_int_shift_amount (Pmode,
> > ASAN_SHADOW_SHIFT),
> > 
> > In order to go this way, we would need to raise `-falign-labels=`
> > default to 2 for s390, which is not incorrect, but would
> > unnecessarily
> > clutter asm with `.align 2` before each label.  So IMHO it would be
> > nicer to simply ask the backend "what is your target's instruction
> > alignment?".
> 
> Besides that it would clutter asm with .align 2, another argument
> against using LABEL_ALIGN here is that it's semantically different
> from
> what is needed: -falign-labels value, which it returns, is specified
> by
> user for optimization purposes, whereas here we need to query the
> architecture's property.
> 
> In practical terms, if user specifies -falign-labels=4096, this would
> affect how the code is generated here. However, this would be
> completely unnecessary: we never jump to decl, its address is only
> saved for reporting.

Hi Jeff,

Could you please have another look at this one?

Best regards,
Ilya
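
The alignment expression in the quoted diff, `(1 << LABEL_ALIGN (...)) * BITS_PER_UNIT`, turns a log2 byte alignment into a bit alignment.  A worked sketch of that arithmetic follows, assuming 8 bits per unit and treating the label alignment as a log2 value as the quoted code does; the names are hypothetical.

```c
/* Assumption: 8 bits per addressable unit.  */
enum { SKETCH_BITS_PER_UNIT = 8 };

/* Convert a log2 byte alignment into an alignment in bits, mirroring
   (1 << LABEL_ALIGN (...)) * BITS_PER_UNIT from the quoted diff.  */
static unsigned int
log2_bytes_to_bits (unsigned int align_log2)
{
  return (1u << align_log2) * SKETCH_BITS_PER_UNIT;
}
```

With a 2-byte label alignment (log2 value 1) this gives 16 bits, whereas a user-chosen alignment of 4096 bytes (log2 value 12) gives 32768 bits, which is the over-alignment the message argues is unnecessary for a label that is never jumped to.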



Re: [PATCH RESEND] tree-ssa-threadbackward.c (profitable_jump_thread_path): Do not allow __builtin_constant_p.

2020-12-03 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2020-12-02 at 11:42 -0700, Jeff Law wrote:
> 
> On 12/1/20 7:09 PM, Ilya Leoshkevich wrote:
> > On Tue, 2020-12-01 at 15:34 -0700, Jeff Law wrote:
> > > No strong opinions.  I think whichever is less invasive in terms
> > > of
> > > code
> > > quality is probably the way to go.  What we want to avoid is
> > > suppressing
> > > threading unnecessarily as that often leads to false positives
> > > from
> > > middle-end based warnings.  Suppressing threading can also lead
> > > to
> > > build
> > > failures in the kernel due to the way they use b_c_p.
> > I think v1 is better then.  Would you mind approving the following?
> > That's the same code as in v1, but with the improved commit message
> > and
> > comments.
> > 
> > 
> > 
> > Linux Kernel (specifically, drivers/leds/trigger/ledtrig-cpu.c)
> > build
> > with GCC 10 fails on s390 with "impossible constraint".
> > 
> > Explanation by Jeff Law:
> > 
> > ```
> > So what we have is a b_c_p at the start of an if-else
> > chain.  Subsequent
> > tests on the "true" arm of the the b_c_p test may throw us off the
> > constant path (because the constants are out of range).  Once all
> > the
> > tests are passed (it's constant and the constant is in range) the
> > true
> > arm's terminal block has a special asm that requires a constant
> > argument.   In the case where we get to the terminal block on the
> > true
> > arm, the argument to the b_c_p is used as the constant argument to
> > the
> > special asm.
> > 
> > At first glace jump threading seems to be doing the right
> > thing.  Except
> > that we end up with two paths to that terminal block with the
> > special
> > asm, one for each of the two constant arguments to the b_c_p call.
> > Naturally since that same value is used in the asm, we have to
> > introduce
> > a PHI to select between them at the head of the terminal
> > block.   Now
> > the argument in the asm is no longer constant and boom we fail.
> > ```
> > 
> > Fix by disallowing __builtin_constant_p on threading paths.
> > 
> > gcc/ChangeLog:
> > 
> > 2020-06-03  Ilya Leoshkevich  
> > 
> > * tree-ssa-threadbackward.c
> > (thread_jumps::profitable_jump_thread_path):
> > Do not allow __builtin_constant_p on a threading path.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > 2020-06-03  Ilya Leoshkevich  
> > 
> > * gcc.target/s390/builtin-constant-p-threading.c: New test.
> OK.  I think the old forward threader has the same problem.  Which I
> think can be fixed by returning NULL from
> record_temporary_equivalences_from_stmts_at_dest when we see the
> B_C_P
> call.  Fixing that in the obvious way is pre-approved once it's gone
> through the usual testing.

Thanks!

I've committed both:

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=70a62009181f66d1d1c90d3c74de38e153c96eb0
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=614aff0adf8fba5d843ec894603160151c20f0aa

Best regards,
Ilya



[PATCH] IBM Z: Build autovec-*-signaling-eq.c tests with exceptions

2020-12-02 Thread Ilya Leoshkevich via Gcc-patches
According to
https://gcc.gnu.org/pipermail/gcc/2020-November/234344.html, GCC is
allowed to perform optimizations that remove floating point traps,
since they do not affect the modeled control flow.  This interferes with
two signaling comparison tests, where (a <= b && a >= b) is turned into
(a <= b && a == b) by test_for_singularity, into ((a <= b) & (a == b))
by the vectorizer and then into (a == b) by eliminate_redundant_comparison.

Fix by making traps affect the control flow by turning them into
exceptions.
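
A small sketch of the two forms involved, with hypothetical function names.  Both yield the same truth value for every pair of inputs, including NaNs; as far as IEEE 754 semantics go, they differ only in that the ordered comparisons `<=` and `>=` signal an invalid-operation exception on a quiet NaN while `==` does not.  That side effect is what the tests look for and what the optimizers are allowed to drop.

```c
#include <math.h>

/* Equality written with two ordered (signaling) comparisons, as in the
   tests.  */
static int
eq_via_ordered (double a, double b)
{
  return a <= b && a >= b;
}

/* Equality written with the quiet comparison.  */
static int
eq_direct (double a, double b)
{
  return a == b;
}

/* Return 1 if both forms agree when one operand is a NaN.  */
static int
forms_agree_on_nan (void)
{
  return eq_via_ordered (NAN, 1.0) == eq_direct (NAN, 1.0);
}
```

Because the results are identical, a compiler that does not model traps as control flow may legitimately rewrite the first form into the second.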

gcc/testsuite/ChangeLog:

2020-12-03  Ilya Leoshkevich  

* gcc.target/s390/zvector/autovec-double-signaling-eq.c: Build
with exceptions.
* gcc.target/s390/zvector/autovec-float-signaling-eq.c:
Likewise.
---
 .../gcc.target/s390/zvector/autovec-double-signaling-eq.c   | 2 +-
 .../gcc.target/s390/zvector/autovec-float-signaling-eq.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
index a8402b9f705..3645d3cc393 100644
--- a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -march=z14 -mzvector -mzarch" } */
+/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions -fnon-call-exceptions" } */
 
 #include "autovec.h"
 
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
index 7dd91a5e6f3..d98aa0c494e 100644
--- a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -march=z14 -mzvector -mzarch" } */
+/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions -fnon-call-exceptions" } */
 
 #include "autovec.h"
 
-- 
2.25.4



[PATCH] Fix division by 0 in printf_strlen_execute when dumping

2020-12-02 Thread Ilya Leoshkevich via Gcc-patches
Bootstrap and regtest running on x86_64-redhat-linux.  Ok for master?

gcc/ChangeLog:

2020-12-03  Ilya Leoshkevich  

* tree-ssa-strlen.c (printf_strlen_execute): Avoid division by
0.
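
The guard can be sketched in isolation as follows; `percent_used` is a hypothetical helper name, not a function in tree-ssa-strlen.c.

```c
/* Return the percentage of NIDXS entries that are used, or 0 when there
   are no entries at all, instead of dividing by zero.  */
static unsigned int
percent_used (unsigned int nused, unsigned int nidxs)
{
  return nidxs == 0 ? 0 : (nused * 100) / nidxs;
}
```

For example, `percent_used (1, 4)` is 25, and `percent_used (0, 0)` is 0 rather than undefined behavior.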
---
 gcc/tree-ssa-strlen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index 741b47bca4a..522b2d45b3a 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c
@@ -5684,7 +5684,7 @@ printf_strlen_execute (function *fun, bool warn_only)
   "  failures:  %u\n"
   "  max_depth: %u\n",
   nidxs,
-  (nused * 100) / nidxs,
+  nidxs == 0 ? 0 : (nused * 100) / nidxs,
   walker.ptr_qry.var_cache->access_refs.length (),
   walker.ptr_qry.hits, walker.ptr_qry.misses,
   walker.ptr_qry.failures, walker.ptr_qry.max_depth);
-- 
2.25.4



[PATCH v2] IBM Z: Use llihf and oilf to load large immediates into GPRs

2020-12-02 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?

v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-December/560822.html

v1 -> v2:
- Use SYMBOL_REF_P.
- Fix usage of gcc_assert.
- Use GEN_INT.



Currently GCC loads large immediates into GPRs from the literal pool,
which is not as efficient as loading two halves with llihf and oilf.
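
As a rough model of the two-instruction sequence: llihf loads an immediate into the high 32 bits of a register and clears the low 32 bits, and oilf then ORs an immediate into the low 32 bits.  The sketch below uses plain C integers and hypothetical helper names; the masks are an assumption about how the constant is split into halves.

```c
#include <stdint.h>

/* The part of VAL that the first instruction (llihf) would load.  */
static uint64_t
high_half (uint64_t val)
{
  return val & 0xFFFFFFFF00000000ULL;
}

/* The part of VAL that the second instruction (oilf) would OR in.  */
static uint64_t
low_half (uint64_t val)
{
  return val & 0x00000000FFFFFFFFULL;
}

/* Model of the sequence: load the high half, then OR in the low half.  */
static uint64_t
load_in_two_steps (uint64_t val)
{
  uint64_t reg = high_half (val); /* models llihf */
  reg |= low_half (val);          /* models oilf */
  return reg;
}
```

Since the two halves do not overlap, ORing them reproduces the original constant for any 64-bit value.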

gcc/ChangeLog:

2020-11-30  Ilya Leoshkevich  

* config/s390/s390-protos.h (s390_const_int_pool_entry_p): New
function.
* config/s390/s390.c (s390_const_int_pool_entry_p): New
function.
* config/s390/s390.md: Add define_peephole2 that produces llihf
and oilf.

gcc/testsuite/ChangeLog:

2020-11-30  Ilya Leoshkevich  

* gcc.target/s390/load-imm64-1.c: New test.
* gcc.target/s390/load-imm64-2.c: New test.
---
 gcc/config/s390/s390-protos.h|  1 +
 gcc/config/s390/s390.c   | 31 
 gcc/config/s390/s390.md  | 23 +++
 gcc/testsuite/gcc.target/s390/load-imm64-1.c | 14 +
 gcc/testsuite/gcc.target/s390/load-imm64-2.c | 14 +
 5 files changed, 83 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/load-imm64-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/load-imm64-2.c

diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index ad2f7f77c18..eb10c3f4bbb 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -135,6 +135,7 @@ extern void s390_split_access_reg (rtx, rtx *, rtx *);
 extern void print_operand_address (FILE *, rtx);
 extern void print_operand (FILE *, rtx, int);
 extern void s390_output_pool_entry (rtx, machine_mode, unsigned int);
+extern bool s390_const_int_pool_entry_p (rtx, HOST_WIDE_INT *);
 extern int s390_label_align (rtx_insn *);
 extern int s390_agen_dep_p (rtx_insn *, rtx_insn *);
 extern rtx_insn *s390_load_got (void);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 02f18366aa1..fb48102559d 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -9400,6 +9400,37 @@ s390_output_pool_entry (rtx exp, machine_mode mode, unsigned int align)
 }
 }
 
+/* Return true if MEM refers to an integer constant in the literal pool.  If
+   VAL is not nullptr, then also fill it with the constant's value.  */
+
+bool
+s390_const_int_pool_entry_p (rtx mem, HOST_WIDE_INT *val)
+{
+  /* Try to match the following:
+ - (mem (unspec [(symbol_ref) (reg)] UNSPEC_LTREF)).
+ - (mem (symbol_ref)).  */
+
+  if (!MEM_P (mem))
+return false;
+
+  rtx addr = XEXP (mem, 0);
+  rtx sym;
+  if (GET_CODE (addr) == UNSPEC && XINT (addr, 1) == UNSPEC_LTREF)
+sym = XVECEXP (addr, 0, 0);
+  else
+sym = addr;
+
+  if (!SYMBOL_REF_P (sym) || !CONSTANT_POOL_ADDRESS_P (sym))
+return false;
+
+  rtx val_rtx = get_pool_constant (sym);
+  if (!CONST_INT_P (val_rtx))
+return false;
+
+  if (val != nullptr)
+*val = INTVAL (val_rtx);
+  return true;
+}
 
 /* Return an RTL expression representing the value of the return address
for the frame COUNT steps up from the current frame.  FRAME is the
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 910415a5974..d4cfbdf6732 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -2116,6 +2116,29 @@ (define_peephole2
   [(set (match_dup 0) (plus:DI (match_dup 1) (match_dup 2)))]
   "")
 
+; Split loading of 64-bit constants into GPRs into llihf + oilf -
+; counterintuitively, using oilf is faster than iilf.  oilf clobbers
+; cc, so cc must be dead.
+(define_peephole2
+  [(set (match_operand:DI 0 "register_operand" "")
+   (match_operand:DI 1 "memory_operand" ""))]
+  "TARGET_64BIT
+   && TARGET_EXTIMM
+   && GENERAL_REG_P (operands[0])
+   && s390_const_int_pool_entry_p (operands[1], nullptr)
+   && peep2_reg_dead_p (1, gen_rtx_REG (CCmode, CC_REGNUM))"
+  [(set (match_dup 0) (match_dup 2))
+   (parallel
+[(set (match_dup 0) (ior:DI (match_dup 0) (match_dup 3)))
+ (clobber (reg:CC CC_REGNUM))])]
+{
+  HOST_WIDE_INT val;
+  bool ok = s390_const_int_pool_entry_p (operands[1], &val);
+  gcc_assert (ok);
+  operands[2] = GEN_INT (val & 0xFFFFFFFF00000000ULL);
+  operands[3] = GEN_INT (val & 0x00000000FFFFFFFFULL);
+})
+
 ;
 ; movsi instruction pattern(s).
 ;
diff --git a/gcc/testsuite/gcc.target/s390/load-imm64-1.c b/gcc/testsuite/gcc.target/s390/load-imm64-1.c
new file mode 100644
index 00000000000..03d17f59096
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/load-imm64-1.c
@@ -0,0 +1,14 @@
+/* Test that large 64-bit constants are loaded with llihf + oilf when lgrl is
+   not available.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z9-109" } */
+
+unsigned long
+magic (void)
+{
+  return 0x3f08c5392f756cd;
+}
+
+/* { dg-final { scan-assembler-times {\n\tllihf\t} 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\n\toilf\t} 1 { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target

Re: [PATCH] IBM Z: Use llihf and oilf to load large immediates into GPRs

2020-12-02 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2020-12-02 at 08:15 +0100, Andreas Krebbel wrote:
> On 12/2/20 2:34 AM, Ilya Leoshkevich wrote:
> > Bootstrapped and regtesed on s390x-redhat-linux.  There are slight
> > improvements in all SPEC benchmarks, no regressions that could not
> > be
> > "fixed" by adding nops.  Ok for master?
> > 
> > 
> > 
> > Currently GCC loads large immediates into GPRs from the literal
> > pool,
> > which is not as efficient as loading two halves with llihf and
> > oilf.
> > 
> > gcc/ChangeLog:
> > 
> > 2020-11-30  Ilya Leoshkevich  
> > 
> > * config/s390/s390-protos.h (s390_const_int_pool_entry_p): New
> > function.
> > * config/s390/s390.c (s390_const_int_pool_entry_p): New
> > function.
> > * config/s390/s390.md: Add define_peephole2 that produces llihf
> > and oilf.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > 2020-11-30  Ilya Leoshkevich  
> > 
> > * gcc.target/s390/load-imm64-1.c: New test.
> > * gcc.target/s390/load-imm64-2.c: New test.
> > ---
> >  gcc/config/s390/s390-protos.h|  1 +
> >  gcc/config/s390/s390.c   | 31
> > 
> >  gcc/config/s390/s390.md  | 22 ++
> >  gcc/testsuite/gcc.target/s390/load-imm64-1.c | 10 +++
> >  gcc/testsuite/gcc.target/s390/load-imm64-2.c | 10 +++
> >  5 files changed, 74 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/s390/load-imm64-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/s390/load-imm64-2.c
> > 
> > diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-
> > protos.h
> > index ad2f7f77c18..eb10c3f4bbb 100644
> > --- a/gcc/config/s390/s390-protos.h
> > +++ b/gcc/config/s390/s390-protos.h
> > @@ -135,6 +135,7 @@ extern void s390_split_access_reg (rtx, rtx *,
> > rtx *);
> >  extern void print_operand_address (FILE *, rtx);
> >  extern void print_operand (FILE *, rtx, int);
> >  extern void s390_output_pool_entry (rtx, machine_mode, unsigned
> > int);
> > +extern bool s390_const_int_pool_entry_p (rtx, HOST_WIDE_INT *);
> >  extern int s390_label_align (rtx_insn *);
> >  extern int s390_agen_dep_p (rtx_insn *, rtx_insn *);
> >  extern rtx_insn *s390_load_got (void);
> > diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
> > index 02f18366aa1..e3d68d3543b 100644
> > --- a/gcc/config/s390/s390.c
> > +++ b/gcc/config/s390/s390.c
> > @@ -9400,6 +9400,37 @@ s390_output_pool_entry (rtx exp,
> > machine_mode mode, unsigned int align)
> >  }
> >  }
> >  
> > +/* Return true if MEM refers to an integer constant in the literal
> > pool.  If
> > +   VAL is not nullptr, then also fill it with the constant's
> > value.  */
> > +
> > +bool
> > +s390_const_int_pool_entry_p (rtx mem, HOST_WIDE_INT *val)
> > +{
> > +  /* Try to match the following:
> > + - (mem (unspec [(symbol_ref) (reg)] UNSPEC_LTREF)).
> > + - (mem (symbol_ref)).  */
> > +
> > +  if (!MEM_P (mem))
> > +return false;
> > +
> > +  rtx addr = XEXP (mem, 0);
> > +  rtx sym;
> > +  if (GET_CODE (addr) == UNSPEC && XINT (addr, 1) == UNSPEC_LTREF)
> > +sym = XVECEXP (addr, 0, 0);
> > +  else
> > +sym = addr;
> > +
> > +  if (GET_CODE (sym) != SYMBOL_REF || !CONSTANT_POOL_ADDRESS_P
> > (sym))
> !SYMBOL_REF_P (sym)

Ok.

> 
> > +return false;
> > +
> > +  rtx val_rtx = get_pool_constant (sym);
> > +  if (!CONST_INT_P (val_rtx))
> > +return false;
> > +
> > +  if (val != nullptr)
> > +*val = INTVAL (val_rtx);
> > +  return true;
> > +}
> Alternatively you probably could have returned the RTX instead and
> use gen_highpart / gen_lowpart in
> the peephole. But no need to change that.

I'll give it a try and see if the code looks better.

> 
> >  
> >  /* Return an RTL expression representing the value of the return
> > address
> > for the frame COUNT steps up from the current frame.  FRAME is
> > the
> > diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
> > index 910415a5974..79e9a75ba2f 100644
> > --- a/gcc/config/s390/s390.md
> > +++ b/gcc/config/s390/s390.md
> > @@ -2116,6 +2116,28 @@ (define_peephole2
> >[(set (match_dup 0) (plus:DI (match_dup 1) (match_dup 2)))]
> >"")
> >  
> > +; Split loading of 64-bit constants into GPRs into llihf + oilf -
> > +; counterintuitively, using oilf is faster than iilf.  oilf
> > clobbers
> > +; cc, so cc must be dead.
> > +(define_peephole2
> > +  [(set (match_operand:DI 0 "register_operand" "")
> > +(match_operand:DI 1 "memory_operand" ""))]
> > +  "TARGET_64BIT
> > +   && TARGET_EXTIMM
> > +   && GENERAL_REG_P (operands[0])
> > +   && s390_const_int_pool_entry_p (operands[1], nullptr)
> > +   && peep2_reg_dead_p (1, gen_rtx_REG (CCmode, CC_REGNUM))"
> > +  [(set (match_dup 0) (match_dup 2))
> > +   (parallel
> > +[(set (match_dup 0) (ior:DI (match_dup 0) (match_dup 3)))
> > + (clobber (reg:CC CC_REGNUM))])]
> > +{
> > +  HOST_WIDE_INT val;
> > +  gcc_assert (s390_const_int_pool_entry_p (operands[1], &val));
> 
> This probably breaks with checking dis

[PATCH RESEND] tree-ssa-threadbackward.c (profitable_jump_thread_path): Do not allow __builtin_constant_p.

2020-12-01 Thread Ilya Leoshkevich via Gcc-patches
On Tue, 2020-12-01 at 15:34 -0700, Jeff Law wrote:
> 
> No strong opinions.  I think whichever is less invasive in terms of
> code
> quality is probably the way to go.  What we want to avoid is
> suppressing
> threading unnecessarily as that often leads to false positives from
> middle-end based warnings.  Suppressing threading can also lead to
> build
> failures in the kernel due to the way they use b_c_p.

I think v1 is better then.  Would you mind approving the following?
That's the same code as in v1, but with the improved commit message and
comments.



Linux Kernel (specifically, drivers/leds/trigger/ledtrig-cpu.c) build
with GCC 10 fails on s390 with "impossible constraint".

Explanation by Jeff Law:

```
So what we have is a b_c_p at the start of an if-else chain.  Subsequent
tests on the "true" arm of the b_c_p test may throw us off the
constant path (because the constants are out of range).  Once all the
tests are passed (it's constant and the constant is in range) the true
arm's terminal block has a special asm that requires a constant
argument.   In the case where we get to the terminal block on the true
arm, the argument to the b_c_p is used as the constant argument to the
special asm.

At first glance jump threading seems to be doing the right thing.  Except
that we end up with two paths to that terminal block with the special
asm, one for each of the two constant arguments to the b_c_p call.
Naturally since that same value is used in the asm, we have to introduce
a PHI to select between them at the head of the terminal block.   Now
the argument in the asm is no longer constant and boom we fail.
```

Fix by disallowing __builtin_constant_p on threading paths.
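
A reduced sketch of the shape of code involved, without the s390 inline asm so that it builds anywhere GCC's `__builtin_constant_p` is available.  The function names are hypothetical.  The argument `is_active ? 1 : -1` is not a single constant, yet on each of the two paths leading to the call it is one; threading both paths through the b_c_p check is what would leave the "constant" arm with a PHI of 1 and -1.

```c
/* Return 1 if the "constant" arm was taken and 0 otherwise; in both
   cases *COUNTER is advanced by I.  Which arm is taken may depend on
   the optimization level.  */
static inline int
add_small_const_or_generic (int i, int *counter)
{
  if (__builtin_constant_p (i) && i > -129 && i < 128)
    {
      /* In the kernel code this arm uses an asm with an "i" constraint,
	 which requires I to be a compile-time constant.  */
      *counter += i;
      return 1;
    }
  *counter += i;
  return 0;
}

/* Caller whose argument is one of two constants, selected at run time.  */
int
toggle (int is_active, int *counter)
{
  return add_small_const_or_generic (is_active ? 1 : -1, counter);
}
```

In this reduced form both arms do the same arithmetic, so the program stays correct either way; with an immediate-only asm in the first arm, a merged value that is no longer constant cannot satisfy the constraint.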

gcc/ChangeLog:

2020-06-03  Ilya Leoshkevich  

* tree-ssa-threadbackward.c (thread_jumps::profitable_jump_thread_path):
Do not allow __builtin_constant_p on a threading path.

gcc/testsuite/ChangeLog:

2020-06-03  Ilya Leoshkevich  

* gcc.target/s390/builtin-constant-p-threading.c: New test.
---
 .../s390/builtin-constant-p-threading.c   | 46 +++
 gcc/tree-ssa-threadbackward.c |  7 ++-
 2 files changed, 52 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/builtin-constant-p-threading.c

diff --git a/gcc/testsuite/gcc.target/s390/builtin-constant-p-threading.c b/gcc/testsuite/gcc.target/s390/builtin-constant-p-threading.c
new file mode 100644
index 00000000000..5f0acdce0b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/builtin-constant-p-threading.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=z196 -mzarch" } */
+
+typedef struct
+{
+  int counter;
+} atomic_t;
+
+static inline __attribute__ ((__gnu_inline__)) int
+__atomic_add (int val, int *ptr)
+{
+  int old;
+  asm volatile("laa %[old],%[val],%[ptr]\n"
+  : [old] "=d" (old), [ptr] "+Q"(*ptr)
+  : [val] "d" (val)
+  : "cc", "memory");
+  return old;
+}
+
+static inline __attribute__ ((__gnu_inline__)) void
+__atomic_add_const (int val, int *ptr)
+{
+  asm volatile("asi %[ptr],%[val]\n"
+  : [ptr] "+Q" (*ptr)
+  : [val] "i" (val)
+  : "cc", "memory");
+}
+
+static inline __attribute__ ((__gnu_inline__)) void
+atomic_add (int i, atomic_t *v)
+{
+  if (__builtin_constant_p (i) && (i > -129) && (i < 128))
+{
+  __atomic_add_const (i, &v->counter);
+  return;
+}
+  __atomic_add (i, &v->counter);
+}
+
+static atomic_t num_active_cpus = { (0) };
+
+void
+ledtrig_cpu (_Bool is_active)
+{
+  atomic_add (is_active ? 1 : -1, &num_active_cpus);
+}
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 327628f1662..30f692672d9 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -259,8 +259,13 @@ thread_jumps::profitable_jump_thread_path (basic_block bbi, tree name,
   !gsi_end_p (gsi);
   gsi_next_nondebug (&gsi))
{
+ /* Do not allow OpenACC loop markers and __builtin_constant_p on
+threading paths.  The latter is disallowed, because an
+expression might be constant on two threading paths, and
+become non-constant (i.e.: phi) when they merge.  */
  gimple *stmt = gsi_stmt (gsi);
- if (gimple_call_internal_p (stmt, IFN_UNIQUE))
+ if (gimple_call_internal_p (stmt, IFN_UNIQUE)
+ || gimple_call_builtin_p (stmt, BUILT_IN_CONSTANT_P))
{
  m_path.pop ();
  return NULL;
-- 
2.25.4



[PATCH] IBM Z: Use llihf and oilf to load large immediates into GPRs

2020-12-01 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  There are slight
improvements in all SPEC benchmarks, no regressions that could not be
"fixed" by adding nops.  Ok for master?



Currently GCC loads large immediates into GPRs from the literal pool,
which is not as efficient as loading two halves with llihf and oilf.

gcc/ChangeLog:

2020-11-30  Ilya Leoshkevich  

* config/s390/s390-protos.h (s390_const_int_pool_entry_p): New
function.
* config/s390/s390.c (s390_const_int_pool_entry_p): New
function.
* config/s390/s390.md: Add define_peephole2 that produces llihf
and oilf.

gcc/testsuite/ChangeLog:

2020-11-30  Ilya Leoshkevich  

* gcc.target/s390/load-imm64-1.c: New test.
* gcc.target/s390/load-imm64-2.c: New test.
---
 gcc/config/s390/s390-protos.h|  1 +
 gcc/config/s390/s390.c   | 31 
 gcc/config/s390/s390.md  | 22 ++
 gcc/testsuite/gcc.target/s390/load-imm64-1.c | 10 +++
 gcc/testsuite/gcc.target/s390/load-imm64-2.c | 10 +++
 5 files changed, 74 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/load-imm64-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/load-imm64-2.c

diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index ad2f7f77c18..eb10c3f4bbb 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -135,6 +135,7 @@ extern void s390_split_access_reg (rtx, rtx *, rtx *);
 extern void print_operand_address (FILE *, rtx);
 extern void print_operand (FILE *, rtx, int);
 extern void s390_output_pool_entry (rtx, machine_mode, unsigned int);
+extern bool s390_const_int_pool_entry_p (rtx, HOST_WIDE_INT *);
 extern int s390_label_align (rtx_insn *);
 extern int s390_agen_dep_p (rtx_insn *, rtx_insn *);
 extern rtx_insn *s390_load_got (void);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 02f18366aa1..e3d68d3543b 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -9400,6 +9400,37 @@ s390_output_pool_entry (rtx exp, machine_mode mode, unsigned int align)
 }
 }
 
+/* Return true if MEM refers to an integer constant in the literal pool.  If
+   VAL is not nullptr, then also fill it with the constant's value.  */
+
+bool
+s390_const_int_pool_entry_p (rtx mem, HOST_WIDE_INT *val)
+{
+  /* Try to match the following:
+ - (mem (unspec [(symbol_ref) (reg)] UNSPEC_LTREF)).
+ - (mem (symbol_ref)).  */
+
+  if (!MEM_P (mem))
+return false;
+
+  rtx addr = XEXP (mem, 0);
+  rtx sym;
+  if (GET_CODE (addr) == UNSPEC && XINT (addr, 1) == UNSPEC_LTREF)
+sym = XVECEXP (addr, 0, 0);
+  else
+sym = addr;
+
+  if (GET_CODE (sym) != SYMBOL_REF || !CONSTANT_POOL_ADDRESS_P (sym))
+return false;
+
+  rtx val_rtx = get_pool_constant (sym);
+  if (!CONST_INT_P (val_rtx))
+return false;
+
+  if (val != nullptr)
+*val = INTVAL (val_rtx);
+  return true;
+}
 
 /* Return an RTL expression representing the value of the return address
for the frame COUNT steps up from the current frame.  FRAME is the
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 910415a5974..79e9a75ba2f 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -2116,6 +2116,28 @@ (define_peephole2
   [(set (match_dup 0) (plus:DI (match_dup 1) (match_dup 2)))]
   "")
 
+; Split loading of 64-bit constants into GPRs into llihf + oilf -
+; counterintuitively, using oilf is faster than iilf.  oilf clobbers
+; cc, so cc must be dead.
+(define_peephole2
+  [(set (match_operand:DI 0 "register_operand" "")
+(match_operand:DI 1 "memory_operand" ""))]
+  "TARGET_64BIT
+   && TARGET_EXTIMM
+   && GENERAL_REG_P (operands[0])
+   && s390_const_int_pool_entry_p (operands[1], nullptr)
+   && peep2_reg_dead_p (1, gen_rtx_REG (CCmode, CC_REGNUM))"
+  [(set (match_dup 0) (match_dup 2))
+   (parallel
+[(set (match_dup 0) (ior:DI (match_dup 0) (match_dup 3)))
+ (clobber (reg:CC CC_REGNUM))])]
+{
+  HOST_WIDE_INT val;
+  gcc_assert (s390_const_int_pool_entry_p (operands[1], &val));
+  operands[2] = gen_rtx_CONST_INT (DImode, val & 0x);
+  operands[3] = gen_rtx_CONST_INT (DImode, val & 0x);
+})
+
 ;
 ; movsi instruction pattern(s).
 ;
diff --git a/gcc/testsuite/gcc.target/s390/load-imm64-1.c b/gcc/testsuite/gcc.target/s390/load-imm64-1.c
new file mode 100644
index 000..db0a89395aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/load-imm64-1.c
@@ -0,0 +1,10 @@
+/* Test that large 64-bit constants are loaded with llihf + oilf when lgrl is
+   not available.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z9-109" } */
+
+unsigned long magic (void) { return 0x3f08c5392f756cd; }
+
+/* { dg-final { scan-assembler-times {\n\tllihf\t} 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\n\toilf\t} 1 { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target/s390/load-imm64-2.c 
b

[PATCH] Introduce can_vec_cmp_compare_p

2020-11-26 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on x86_64-redhat-linux and
s390x-redhat-linux.  Ok for master?



This is the same as dcd2ca63ec5c ("Introduce can_vcond_compare_p
function"), but for vec_cmp.  The reason it's needed is that since
5d9ade39b872 ("IBM Z: Fix PR97326: Enable fp compares in vec_cmp")
and 4acba4859013 ("IBM Z: Restrict vec_cmp on z13") s390's vec_cmp
expander advertises that it supports floating point comparisons except
signaling ones on z13, but the common code ignores the latter
restriction.
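
The "return UNKNOWN instead of asserting" idea can be sketched outside
of GCC as follows.  This is only a toy model: the toy_* enums and
functions are invented for the example, and toy_backend_supports stands
in for whatever restriction a real expander predicate would impose.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for tree codes and RTL codes (not GCC's real enums).  */
enum toy_tree_code { TOY_EQ_EXPR, TOY_LT_EXPR, TOY_PLUS_EXPR };
enum toy_rtx_code { TOY_UNKNOWN, TOY_EQ, TOY_LT, TOY_LTU };

/* Non-asserting mapping: return TOY_UNKNOWN for codes that are not
   comparisons, so that predicates can simply answer "no".  */
static enum toy_rtx_code
toy_get_rtx_code_1 (enum toy_tree_code code, bool unsignedp)
{
  switch (code)
    {
    case TOY_EQ_EXPR:
      return TOY_EQ;
    case TOY_LT_EXPR:
      return unsignedp ? TOY_LTU : TOY_LT;
    default:
      return TOY_UNKNOWN;
    }
}

/* Asserting wrapper for callers that already know CODE is valid.  */
static enum toy_rtx_code
toy_get_rtx_code (enum toy_tree_code code, bool unsignedp)
{
  enum toy_rtx_code rcode = toy_get_rtx_code_1 (code, unsignedp);
  assert (rcode != TOY_UNKNOWN);
  return rcode;
}

/* Hypothetical backend restriction: everything but LTU is supported.  */
static bool
toy_backend_supports (enum toy_rtx_code rcode)
{
  return rcode != TOY_LTU;
}

/* Predicate in the style of vec_cmp_icode_p: first reject unknown
   codes, then ask the backend about the specific comparison.  */
static bool
toy_vec_cmp_icode_p (enum toy_tree_code code, bool unsignedp)
{
  enum toy_rtx_code rcode = toy_get_rtx_code_1 (code, unsignedp);
  if (rcode == TOY_UNKNOWN)
    return false;
  return toy_backend_supports (rcode);
}
```

The point of the split is that a query function can be called with
arbitrary codes without triggering an assertion, while the comparison
code itself still reaches the backend check.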

gcc/ChangeLog:

2020-11-25  Ilya Leoshkevich  

* optabs-tree.c (vec_cmp_icode_p): New function.
(vec_cmp_eq_icode_p): New function.
(expand_vec_cmp_expr_p): Use vec_cmp_icode_p and
vec_cmp_eq_icode_p.
(vcond_icode_p): Use get_rtx_code_1, just to be uniform with
vec_cmp_icode_p.
* optabs.c (unsigned_optab_p): New function.
(insn_predicate_matches_p): New function.
(can_vec_cmp_compare_p): New function.
(can_vcond_compare_p): Use unsigned_optab_p and
insn_predicate_matches_p.
(get_rtx_code): Use get_rtx_code_1.
(get_rtx_code_1): Version of get_rtx_code that returns UNKNOWN
instead of asserting.
* optabs.h (can_vec_cmp_compare_p): New function.
(get_rtx_code_1): New function.
---
 gcc/optabs-tree.c | 47 ++--
 gcc/optabs.c  | 78 ++-
 gcc/optabs.h  | 12 ++--
 3 files changed, 109 insertions(+), 28 deletions(-)

diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index b797d018c84..a8968f3dd1a 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -337,6 +337,35 @@ supportable_convert_operation (enum tree_code code,
   return false;
 }
 
+/* Return true iff vec_cmp_optab/vec_cmpu_optab can handle a vector comparison
+   for code CODE, comparing operands of type VALUE_TYPE and producing a result
+   of type MASK_TYPE.  */
+
+static bool
+vec_cmp_icode_p (tree value_type, tree mask_type, enum tree_code code)
+{
+  enum rtx_code rcode = get_rtx_code_1 (code, TYPE_UNSIGNED (value_type));
+  if (rcode == UNKNOWN)
+return false;
+
+  return can_vec_cmp_compare_p (rcode, TYPE_MODE (value_type),
+   TYPE_MODE (mask_type));
+}
+
+/* Return true iff vec_cmpeq_optab can handle a vector comparison for code
+   CODE, comparing operands of type VALUE_TYPE and producing a result of type
+   MASK_TYPE.  */
+
+static bool
+vec_cmp_eq_icode_p (tree value_type, tree mask_type, enum tree_code code)
+{
+  if (code != EQ_EXPR && code != NE_EXPR)
+return false;
+
+  return get_vec_cmp_eq_icode (TYPE_MODE (value_type), TYPE_MODE (mask_type))
+!= CODE_FOR_nothing;
+}
+
 /* Return TRUE if appropriate vector insn is available
for vector comparison expr with vector type VALUE_TYPE
and resulting mask with MASK_TYPE.  */
@@ -344,14 +373,8 @@ supportable_convert_operation (enum tree_code code,
 bool
 expand_vec_cmp_expr_p (tree value_type, tree mask_type, enum tree_code code)
 {
-  if (get_vec_cmp_icode (TYPE_MODE (value_type), TYPE_MODE (mask_type),
-TYPE_UNSIGNED (value_type)) != CODE_FOR_nothing)
-return true;
-  if ((code == EQ_EXPR || code == NE_EXPR)
-  && (get_vec_cmp_eq_icode (TYPE_MODE (value_type), TYPE_MODE (mask_type))
- != CODE_FOR_nothing))
-return true;
-  return false;
+  return vec_cmp_icode_p (value_type, mask_type, code)
+|| vec_cmp_eq_icode_p (value_type, mask_type, code);
 }
 
 /* Return true iff vcond_optab/vcondu_optab can handle a vector
@@ -361,8 +384,12 @@ expand_vec_cmp_expr_p (tree value_type, tree mask_type, enum tree_code code)
 static bool
 vcond_icode_p (tree value_type, tree cmp_op_type, enum tree_code code)
 {
-  return can_vcond_compare_p (get_rtx_code (code, TYPE_UNSIGNED (cmp_op_type)),
- TYPE_MODE (value_type), TYPE_MODE (cmp_op_type));
+  enum rtx_code rcode = get_rtx_code_1 (code, TYPE_UNSIGNED (cmp_op_type));
+  if (rcode == UNKNOWN)
+return false;
+
+  return can_vcond_compare_p (rcode, TYPE_MODE (value_type),
+ TYPE_MODE (cmp_op_type));
 }
 
 /* Return true iff vcondeq_optab can handle a vector comparison for code CODE,
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 1820b91877a..76045596980 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -3834,23 +3834,59 @@ can_compare_p (enum rtx_code code, machine_mode mode,
   return 0;
 }
 
-/* Return whether the backend can emit a vector comparison for code CODE,
-   comparing operands of mode CMP_OP_MODE and producing a result with
-   VALUE_MODE.  */
+/* Return whether RTL code CODE corresponds to an unsigned optab.  */
+
+static bool
+unsigned_optab_p (enum rtx_code code)
+{
+  return code == LTU || code == LEU || code == GTU || code == GEU;
+}
+
+/* Return whether the backend-emitted comparison for code CODE, comparing
+   operands of mode VALUE_MODE and producing a result with MASK_MODE, mat

[PATCH] rtl_dump_bb: fix segfault when reporting internal error

2020-11-26 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on x86_64-redhat-linux and
s390x-redhat-linux.  Ok for master?



During ICE reporting, sometimes rtl_dump_bb is called on partially
initialized basic blocks.  This produces another ICE, obscuring the
original problem.

Fix by checking that basic blocks are initialized before touching
their bb_infos.
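
The guard can be pictured with a small stand-alone model.  The toy_bb
and toy_rtl_info types below are invented for this sketch and only
mimic the idea that the per-block RTL info pointer may still be null
while an ICE is being reported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-ins for the per-block RTL info and the basic block.  */
struct toy_rtl_info
{
  const char *end_insn;
};

struct toy_bb
{
  int index;
  struct toy_rtl_info *rtl;	/* NULL until the block is initialized.  */
};

/* Counterpart of rtl_bb_info_initialized_p in this toy model.  */
static bool
toy_bb_info_initialized_p (const struct toy_bb *bb)
{
  return bb->rtl != NULL;
}

/* Dump BB to OUTF.  Return the number of insn lines printed, which is 0
   for a partially initialized block instead of a crash.  */
static int
toy_dump_bb (FILE *outf, const struct toy_bb *bb)
{
  assert (bb != NULL);
  fprintf (outf, "bb %d\n", bb->index);
  if (!toy_bb_info_initialized_p (bb))
    return 0;
  fprintf (outf, "  last insn: %s\n", bb->rtl->end_insn);
  return 1;
}
```

A dump routine that runs during error reporting should degrade to
printing less, rather than dereference state that may not exist yet.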

gcc/ChangeLog:

2020-11-25  Ilya Leoshkevich  

* cfgrtl.c (rtl_bb_info_initialized_p): New function.
(rtl_dump_bb): Use rtl_bb_info_initialized_p before accessing bb
insns.
---
 gcc/cfgrtl.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
index 45d84d39b22..5e909e25882 100644
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -97,6 +97,7 @@ static basic_block rtl_split_block (basic_block, void *);
 static void rtl_dump_bb (FILE *, basic_block, int, dump_flags_t);
 static int rtl_verify_flow_info_1 (void);
 static void rtl_make_forwarder_block (edge);
+static bool rtl_bb_info_initialized_p (basic_block bb);
 
 /* Return true if NOTE is not one of the ones that must be kept paired,
so that we may simply delete it.  */
@@ -2149,7 +2150,8 @@ rtl_dump_bb (FILE *outf, basic_block bb, int indent, dump_flags_t flags)
   putc ('\n', outf);
 }
 
-  if (bb->index != ENTRY_BLOCK && bb->index != EXIT_BLOCK)
+  if (bb->index != ENTRY_BLOCK && bb->index != EXIT_BLOCK
+  && rtl_bb_info_initialized_p (bb))
 {
   rtx_insn *last = BB_END (bb);
   if (last)
@@ -5135,6 +5137,12 @@ init_rtl_bb_info (basic_block bb)
   bb->il.x.rtl = ggc_cleared_alloc ();
 }
 
+static bool
+rtl_bb_info_initialized_p (basic_block bb)
+{
+  return bb->il.x.rtl;
+}
+
 /* Returns true if it is possible to remove edge E by redirecting
it to the destination of the other edge from E->src.  */
 
-- 
2.25.4



[PATCH] profopt-execute: unset testname_with_flags if create_gcov fails

2020-11-26 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on x86_64-redhat-linux and
s390x-redhat-linux.  Ok for master?



When diffing test results, spurious "New tests that PASS" / "Old tests
that passed, that have disappeared" messages sometimes appear.
The reason is that if create_gcov is not installed, then the cached
testname_with_flags is not cleared and is carried over to the next
test.

gcc/testsuite/ChangeLog:

2020-11-26  Ilya Leoshkevich  

* lib/profopt.exp: Unset testname_with_flags if create_gcov
fails.
---
 gcc/testsuite/lib/profopt.exp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/lib/profopt.exp b/gcc/testsuite/lib/profopt.exp
index d6863439d04..98e1a0e63af 100644
--- a/gcc/testsuite/lib/profopt.exp
+++ b/gcc/testsuite/lib/profopt.exp
@@ -456,6 +456,7 @@ proc profopt-execute { src } {
set id [remote_spawn "" $cmd]
if { $id < 0 } {
unsupported "$testcase -fauto-profile: cannot run create_gcov"
+   unset testname_with_flags
set status "fail"
return
}
-- 
2.25.4



[PATCH] IBM Z: Restrict vec_cmp on z13

2020-11-24 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?



Commit 5d9ade39b872 ("IBM Z: Fix PR97326: Enable fp compares in
vec_cmp") made it possible to create rtxes that describe signaling
comparisons on z13, which are not supported by the hardware.  Restrict
this by using vcond_comparison_operator predicate.

gcc/ChangeLog:

2020-11-24  Ilya Leoshkevich  

* config/s390/vector.md: Use vcond_comparison_operator
predicate.
---
 gcc/config/s390/vector.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index fef68644625..029ee0886c2 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -1561,7 +1561,7 @@ (define_expand "copysign3"
 
 (define_expand "vec_cmp"
   [(set (match_operand:  0 "register_operand" "")
-   (match_operator: 1 ""
+   (match_operator: 1 "vcond_comparison_operator"
  [(match_operand:V_HW 2 "register_operand" "")
   (match_operand:V_HW 3 "register_operand" "")]))]
   "TARGET_VX"
-- 
2.25.4



[PATCH] IBM Z: Update autovec-*-quiet-uneq expectations

2020-11-23 Thread Ilya Leoshkevich via Gcc-patches
Commit 229752afe315 ("VEC_COND_EXPR optimizations") has improved code
generation: we no longer need "vx x,x,-1", which turned out to be
superfluous.  Instead, we simply swap 0 and -1 arguments of the
preceding "vsel".

gcc/testsuite/ChangeLog:

2020-11-23  Ilya Leoshkevich  

* gcc.target/s390/zvector/autovec-double-quiet-uneq.c: Expect
that "vx" is not emitted.
* gcc.target/s390/zvector/autovec-float-quiet-uneq.c: Likewise.
---
 .../gcc.target/s390/zvector/autovec-double-quiet-uneq.c  | 5 -
 .../gcc.target/s390/zvector/autovec-float-quiet-uneq.c   | 5 -
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-uneq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-uneq.c
index 3d6da30beac..7c9b20fd2e0 100644
--- a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-uneq.c
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-uneq.c
@@ -5,6 +5,9 @@
 
 AUTOVEC_DOUBLE (QUIET_UNEQ);
 
+/* { dg-final { scan-assembler {\n\tvzero\t} } } */
+/* { dg-final { scan-assembler {\n\tvgmg\t} } } */
 /* { dg-final { scan-assembler-times {\n\tvfchdb\t} 2 } } */
 /* { dg-final { scan-assembler {\n\tvo\t} } } */
-/* { dg-final { scan-assembler {\n\tvx\t} } } */
+/* { dg-final { scan-assembler {\n\tvsel\t} } } */
+/* { dg-final { scan-assembler-not {\n\tvx\t} } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-uneq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-uneq.c
index 1df53a99bc8..5ab9337880d 100644
--- a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-uneq.c
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-uneq.c
@@ -5,6 +5,9 @@
 
 AUTOVEC_FLOAT (QUIET_UNEQ);
 
+/* { dg-final { scan-assembler {\n\tvzero\t} } } */
+/* { dg-final { scan-assembler {\n\tvgmf\t} } } */
 /* { dg-final { scan-assembler-times {\n\tvfchsb\t} 2 } } */
 /* { dg-final { scan-assembler {\n\tvo\t} } } */
-/* { dg-final { scan-assembler {\n\tvx\t} } } */
+/* { dg-final { scan-assembler {\n\tvsel\t} } } */
+/* { dg-final { scan-assembler-not {\n\tvx\t} } } */
-- 
2.25.4



Re: [PATCH v2] tree-ssa-threadbackward.c (profitable_jump_thread_path): Do not allow __builtin_constant_p () before IPA.

2020-11-23 Thread Ilya Leoshkevich via Gcc-patches
On Fri, 2020-11-20 at 12:14 -0700, Jeff Law wrote:
> 
> On 6/30/20 12:46 PM, Ilya Leoshkevich wrote:
> > v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html
> > 
> > This is the implementation of Jakub's suggestion: allow
> > __builtin_constant_p () after IPA, but fold it into 0.  Smoke test
> > passed on s390x-redhat-linux, full regtest and bootstrap are
> > running on
> > x86_64-redhat-linux.
> > 
> > ---
> > 
> > Linux Kernel (specifically, drivers/leds/trigger/ledtrig-cpu.c)
> > build
> > with GCC 10 fails on s390 with "impossible constraint".
> > 
> > The problem is that jump threading makes __builtin_constant_p ()
> > lie
> > when it splits a path containing a non-constant expression in a way
> > that on each of the resulting paths this expression is constant.
> > 
> > Fix by disallowing __builtin_constant_p () on threading paths
> > before
> > IPA and fold it into 0 after IPA.
> > 
> > gcc/ChangeLog:
> > 
> > 2020-06-30  Ilya Leoshkevich  
> > 
> > * tree-ssa-threadbackward.c (thread_jumps::m_allow_bcp_p): New
> > member.
> > (thread_jumps::profitable_jump_thread_path): Do not allow
> > __builtin_constant_p () on threading paths unless m_allow_bcp_p
> > is set.
> > (thread_jumps::find_jump_threads_backwards): Set m_allow_bcp_p.
> > (pass_thread_jumps::execute): Allow __builtin_constant_p () on
> > threading paths after IPA.
> > (pass_early_thread_jumps::execute): Do not allow
> > __builtin_constant_p () on threading paths before IPA.
> > * tree-ssa-threadupdate.c (duplicate_thread_path): Fold
> > __builtin_constant_p () on threading paths into 0.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > 2020-06-30  Ilya Leoshkevich  
> > 
> > * gcc.target/s390/builtin-constant-p-threading.c: New test.
> So I'm finally getting back to this.  Thanks for your patience.
> 
> It's a nasty little problem, and I suspect there's actually some
> deeper
> issues here.  While I'd like to claim it's a bad use of b_c_p, I don't
> think I can reasonably make that argument.
> 
> So what we have is a b_c_p at the start of an if-else chain. 
> Subsequent
> tests on the "true" arm of the b_c_p test may throw us off the
> constant path (because the constants are out of range).  Once all the
> tests are passed (it's constant and the constant is in range) the
> true
> arm's terminal block has a special asm that requires a constant
> argument.   In the case where we get to the terminal block on the
> true
> arm, the argument to the b_c_p is used as the constant argument to
> the
> special asm.
> 
> At first glance jump threading seems to be doing the right thing. 
> Except
> that we end up with two paths to that terminal block with the special
> asm, one for each of the two constant arguments to the b_c_p call. 
> Naturally since that same value is used in the asm, we have to
> introduce
> a PHI to select between them at the head of the terminal block.   Now
> the argument in the asm is no longer constant and boom we fail.
> 
> I briefly pondered if we should only throttle when the argument to
> the
> b_c_p is not used elsewhere.  But I think that just hides the problem
> and with a little work I could probably extend the testcase to still
> fail in that scenario.
> 
> I also briefly pondered if we should isolate the terminal block as
> well
> (essentially creating one for each unique PHI argument).  We'd likely
> only need to do that when there's an ASM in the terminal block, but
> that
> likely just papers over the problem as well since the ASM could be in
> a
> successor of the terminal block.
> 
> I haven't thought real deeply about it, but I wouldn't be surprised
> if
> there's other passes that can trigger similar problems.  Aggressive
> cross-jumping would be the most obvious, but some of the
> hoisting/sinking
> of operations past PHIs would seem potentially problematical as well.
> 
> Jakub's suggestion might be the best one in this space.   I don't have
> anything better right now.  The deeper questions about other passes
> setting up similar scenarios can probably be punted, I'd expect
> threading to be far and above the most common way for this to happen
> and
> I'd be comfortable faulting in investigation of other cases if/when
> they
> happen.
> 
> So I retract my initial objections.  Let's go with the V2 patch.
> 
> 
> jeff

Hi Jeff,

Thanks for having another look!

I did x86_64 builds of SPEC and vmlinux, and it seems that in practice
v2 does not have any benefit over v1.

What do you think about going with v1, which is less complex?

Best regards,
Ilya



[PATCH] IBM Z: Do not run long double tests on old machines

2020-11-13 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on z13 s390x-redhat-linux.  Ok for master?

gcc/testsuite/ChangeLog:

2020-11-12  Ilya Leoshkevich  

* gcc.target/s390/s390.exp (check_effective_target_s390_z14_hw):
New predicate.
* gcc.target/s390/vector/long-double-caller-abi-run.c: Use the
new predicate.
* gcc.target/s390/vector/long-double-copysign.c: Likewise.
* gcc.target/s390/vector/long-double-from-double.c: Likewise.
* gcc.target/s390/vector/long-double-from-float.c: Likewise.
* gcc.target/s390/vector/long-double-from-i16.c: Likewise.
* gcc.target/s390/vector/long-double-from-i32.c: Likewise.
* gcc.target/s390/vector/long-double-from-i64.c: Likewise.
* gcc.target/s390/vector/long-double-from-i8.c: Likewise.
* gcc.target/s390/vector/long-double-from-u16.c: Likewise.
* gcc.target/s390/vector/long-double-from-u32.c: Likewise.
* gcc.target/s390/vector/long-double-from-u64.c: Likewise.
* gcc.target/s390/vector/long-double-from-u8.c: Likewise.
* gcc.target/s390/vector/long-double-to-double.c: Likewise.
* gcc.target/s390/vector/long-double-to-float.c: Likewise.
* gcc.target/s390/vector/long-double-to-i16.c: Likewise.
* gcc.target/s390/vector/long-double-to-i32.c: Likewise.
* gcc.target/s390/vector/long-double-to-i64.c: Likewise.
* gcc.target/s390/vector/long-double-to-i8.c: Likewise.
* gcc.target/s390/vector/long-double-to-u16.c: Likewise.
* gcc.target/s390/vector/long-double-to-u32.c: Likewise.
* gcc.target/s390/vector/long-double-to-u64.c: Likewise.
* gcc.target/s390/vector/long-double-to-u8.c: Likewise.
* gcc.target/s390/vector/long-double-wfaxb.c: Likewise.
* gcc.target/s390/vector/long-double-wfdxb.c: Likewise.
* gcc.target/s390/vector/long-double-wfsxb-1.c: Likewise.
---
 gcc/testsuite/gcc.target/s390/s390.exp | 10 ++
 .../s390/vector/long-double-caller-abi-run.c   |  3 ++-
 .../gcc.target/s390/vector/long-double-copysign.c  |  3 ++-
 .../gcc.target/s390/vector/long-double-from-double.c   |  3 ++-
 .../gcc.target/s390/vector/long-double-from-float.c|  3 ++-
 .../gcc.target/s390/vector/long-double-from-i16.c  |  3 ++-
 .../gcc.target/s390/vector/long-double-from-i32.c  |  3 ++-
 .../gcc.target/s390/vector/long-double-from-i64.c  |  3 ++-
 .../gcc.target/s390/vector/long-double-from-i8.c   |  3 ++-
 .../gcc.target/s390/vector/long-double-from-u16.c  |  3 ++-
 .../gcc.target/s390/vector/long-double-from-u32.c  |  3 ++-
 .../gcc.target/s390/vector/long-double-from-u64.c  |  3 ++-
 .../gcc.target/s390/vector/long-double-from-u8.c   |  3 ++-
 .../gcc.target/s390/vector/long-double-to-double.c |  3 ++-
 .../gcc.target/s390/vector/long-double-to-float.c  |  3 ++-
 .../gcc.target/s390/vector/long-double-to-i16.c|  3 ++-
 .../gcc.target/s390/vector/long-double-to-i32.c|  3 ++-
 .../gcc.target/s390/vector/long-double-to-i64.c|  3 ++-
 .../gcc.target/s390/vector/long-double-to-i8.c |  3 ++-
 .../gcc.target/s390/vector/long-double-to-u16.c|  3 ++-
 .../gcc.target/s390/vector/long-double-to-u32.c|  3 ++-
 .../gcc.target/s390/vector/long-double-to-u64.c|  3 ++-
 .../gcc.target/s390/vector/long-double-to-u8.c |  3 ++-
 .../gcc.target/s390/vector/long-double-wfaxb.c |  3 ++-
 .../gcc.target/s390/vector/long-double-wfdxb.c |  3 ++-
 .../gcc.target/s390/vector/long-double-wfsxb-1.c   |  3 ++-
 26 files changed, 60 insertions(+), 25 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/s390.exp b/gcc/testsuite/gcc.target/s390/s390.exp
index 387a720b8e3..00e0555d55c 100644
--- a/gcc/testsuite/gcc.target/s390/s390.exp
+++ b/gcc/testsuite/gcc.target/s390/s390.exp
@@ -192,6 +192,16 @@ proc check_effective_target_s390_z13_hw { } {
}
 }] "-march=z13 -m64 -mzarch" ] } { return 0 } else { return 1 }
 }
+proc check_effective_target_s390_z14_hw { } {
+if { ![check_runtime s390_check_s390_z14_hw [subst {
+   int main (void)
+   {
+   int x = 0;
+   asm ("msgrkc %%0,%%0,%%0" : "+r" (x) : );
+   return x;
+   }
+}] "-march=z14 -m64 -mzarch" ] } { return 0 } else { return 1 }
+}
 
 # If a testcase doesn't have special options, use these.
 global DEFAULT_CFLAGS
diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-caller-abi-run.c b/gcc/testsuite/gcc.target/s390/vector/long-double-caller-abi-run.c
index f3a41bacc2f..f7315f6c2e9 100644
--- a/gcc/testsuite/gcc.target/s390/vector/long-double-caller-abi-run.c
+++ b/gcc/testsuite/gcc.target/s390/vector/long-double-caller-abi-run.c
@@ -1,4 +1,5 @@
-/* { dg-do run } */
+/* { dg-do compile } */
 /* { dg-options "-O3 -march=z14 -mzarch" } */
+/* { dg-do run { target { s390_z14_hw } } } */
 #include "long-double-callee-abi-scan.c"
 #include "long-double-caller-abi-s

[PATCH] IBM Z: Fix bootstrap breakage due to HAVE_TF macro

2020-11-10 Thread Ilya Leoshkevich via Gcc-patches
Bootstrap and regtest running on s390x-redhat-linux with --enable-shared
--with-system-zlib --enable-threads=posix --enable-__cxa_atexit
--enable-checking=yes,rtl --enable-gnu-indirect-function
--disable-werror --enable-languages=c,c++,fortran,objc,obj-c++
--with-arch=arch13.  Ok for master?



Commit e627cda56865 ("IBM Z: Store long doubles in vector registers
when possible") introduced HAVE_TF macro which expands to a logical
"or" of HAVE_ constants.  Not all of these constants are available in
GENERATOR_FILE context, so a hack was used: simply expand to true in
this case, because the actual value matters only during compiler
runtime and not during generation.

However, one aspect of this value matters during generation after all:
whether or not it's a constant, which in this case it appears to be.
This results in incorrect values in insn-flags.h and broken bootstrap
for some configurations.

Fix by using a dummy value that is not a constant.
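
For readers unfamiliar with the macro, the non-generator variant is
plain token pasting.  The sketch below shows that mechanism in
isolation; the HAVE_addtf3_* and HAVE_sqrttf2_* values are made-up
stand-ins for what insn-flags.h would normally define, and the pattern
names are assumptions chosen for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for flags normally generated into
   insn-flags.h.  */
#define HAVE_addtf3_fpr 0
#define HAVE_addtf3_vr 1
#define HAVE_sqrttf2_fpr 0
#define HAVE_sqrttf2_vr 0

/* Same shape as the non-GENERATOR_FILE definition in s390.h: an insn is
   available if either its FPR-pair or its vector variant is.  */
#define HAVE_TF(icode) (HAVE_##icode##_fpr || HAVE_##icode##_vr)

/* Expands to (HAVE_addtf3_fpr || HAVE_addtf3_vr).  */
static bool
have_addtf3 (void)
{
  return HAVE_TF (addtf3);
}

/* Expands to (HAVE_sqrttf2_fpr || HAVE_sqrttf2_vr).  */
static bool
have_sqrttf2 (void)
{
  return HAVE_TF (sqrttf2);
}
```

In this toy setting both expansions are compile-time constants.  The
generator case described above is the opposite: there the macro must
not look like a constant, which is what the dummy value achieves.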

gcc/ChangeLog:

2020-11-10  Ilya Leoshkevich  

* config/s390/s390.h (HAVE_TF): Use opaque value when
GENERATOR_FILE is defined.
---
 gcc/config/s390/s390.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index 8c028317b6b..bc579a3dadd 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -1187,8 +1187,9 @@ struct GTY(()) machine_function
 #define TARGET_INDIRECT_BRANCH_TABLE s390_indirect_branch_table
 
 #ifdef GENERATOR_FILE
-/* gencondmd.c is built before insn-flags.h.  */
-#define HAVE_TF(icode) true
+/* gencondmd.c is built before insn-flags.h.  Use an arbitrary opaque value
+   that cannot be optimized away by gen_insn.  */
+#define HAVE_TF(icode) TARGET_HARD_FLOAT
 #else
 #define HAVE_TF(icode) (HAVE_##icode##_fpr || HAVE_##icode##_vr)
 #endif
-- 
2.25.4



[PATCH 2/2] IBM Z: Test long doubles in vector registers

2020-11-09 Thread Ilya Leoshkevich via Gcc-patches
gcc/testsuite/ChangeLog:

2020-11-05  Ilya Leoshkevich  

* gcc.target/s390/vector/long-double-callee-abi-scan.c: New test.
* gcc.target/s390/vector/long-double-caller-abi-run.c: New test.
* gcc.target/s390/vector/long-double-caller-abi-scan.c: New test.
* gcc.target/s390/vector/long-double-copysign.c: New test.
* gcc.target/s390/vector/long-double-fprx2-constant.c: New test.
* gcc.target/s390/vector/long-double-from-double.c: New test.
* gcc.target/s390/vector/long-double-from-float.c: New test.
* gcc.target/s390/vector/long-double-from-i16.c: New test.
* gcc.target/s390/vector/long-double-from-i32.c: New test.
* gcc.target/s390/vector/long-double-from-i64.c: New test.
* gcc.target/s390/vector/long-double-from-i8.c: New test.
* gcc.target/s390/vector/long-double-from-u16.c: New test.
* gcc.target/s390/vector/long-double-from-u32.c: New test.
* gcc.target/s390/vector/long-double-from-u64.c: New test.
* gcc.target/s390/vector/long-double-from-u8.c: New test.
* gcc.target/s390/vector/long-double-to-double.c: New test.
* gcc.target/s390/vector/long-double-to-float.c: New test.
* gcc.target/s390/vector/long-double-to-i16.c: New test.
* gcc.target/s390/vector/long-double-to-i32.c: New test.
* gcc.target/s390/vector/long-double-to-i64.c: New test.
* gcc.target/s390/vector/long-double-to-i8.c: New test.
* gcc.target/s390/vector/long-double-to-u16.c: New test.
* gcc.target/s390/vector/long-double-to-u32.c: New test.
* gcc.target/s390/vector/long-double-to-u64.c: New test.
* gcc.target/s390/vector/long-double-to-u8.c: New test.
* gcc.target/s390/vector/long-double-vec-duplicate.c: New test.
* gcc.target/s390/vector/long-double-wf.h: New test.
* gcc.target/s390/vector/long-double-wfaxb.c: New test.
* gcc.target/s390/vector/long-double-wfcxb-0001.c: New test.
* gcc.target/s390/vector/long-double-wfcxb-0111.c: New test.
* gcc.target/s390/vector/long-double-wfcxb-1011.c: New test.
* gcc.target/s390/vector/long-double-wfcxb-1101.c: New test.
* gcc.target/s390/vector/long-double-wfdxb.c: New test.
* gcc.target/s390/vector/long-double-wfixb.c: New test.
* gcc.target/s390/vector/long-double-wfkxb-0111.c: New test.
* gcc.target/s390/vector/long-double-wfkxb-1011.c: New test.
* gcc.target/s390/vector/long-double-wfkxb-1101.c: New test.
* gcc.target/s390/vector/long-double-wflcxb.c: New test.
* gcc.target/s390/vector/long-double-wflpxb.c: New test.
* gcc.target/s390/vector/long-double-wfmaxb-2.c: New test.
* gcc.target/s390/vector/long-double-wfmaxb-3.c: New test.
* gcc.target/s390/vector/long-double-wfmaxb-disabled.c: New test.
* gcc.target/s390/vector/long-double-wfmaxb.c: New test.
* gcc.target/s390/vector/long-double-wfmsxb-disabled.c: New test.
* gcc.target/s390/vector/long-double-wfmsxb.c: New test.
* gcc.target/s390/vector/long-double-wfmxb.c: New test.
* gcc.target/s390/vector/long-double-wfnmaxb-disabled.c: New test.
* gcc.target/s390/vector/long-double-wfnmaxb.c: New test.
* gcc.target/s390/vector/long-double-wfnmsxb-disabled.c: New test.
* gcc.target/s390/vector/long-double-wfnmsxb.c: New test.
* gcc.target/s390/vector/long-double-wfsqxb.c: New test.
* gcc.target/s390/vector/long-double-wfsxb-1.c: New test.
* gcc.target/s390/vector/long-double-wfsxb.c: New test.
* gcc.target/s390/vector/long-double-wftcixb-1.c: New test.
* gcc.target/s390/vector/long-double-wftcixb.c: New test.
---
 .../s390/vector/long-double-callee-abi-scan.c | 20 +++
 .../s390/vector/long-double-caller-abi-run.c  |  4 ++
 .../s390/vector/long-double-caller-abi-scan.c | 13 
 .../s390/vector/long-double-copysign.c| 21 +++
 .../s390/vector/long-double-fprx2-constant.c  | 11 
 .../s390/vector/long-double-from-double.c | 18 ++
 .../s390/vector/long-double-from-float.c  | 19 ++
 .../s390/vector/long-double-from-i16.c| 19 ++
 .../s390/vector/long-double-from-i32.c| 19 ++
 .../s390/vector/long-double-from-i64.c| 19 ++
 .../s390/vector/long-double-from-i8.c | 19 ++
 .../s390/vector/long-double-from-u16.c| 19 ++
 .../s390/vector/long-double-from-u32.c| 19 ++
 .../s390/vector/long-double-from-u64.c| 19 ++
 .../s390/vector/long-double-from-u8.c | 19 ++
 .../s390/vector/long-double-to-double.c   | 18 ++
 .../s390/vector/long-double-to-float.c| 19 ++
 .../s390/vector/long-double-to-i16.c  | 19 ++
 .../s390/vector/long-double-to-i32.c  | 19 ++
 .../s390/vector/long-double-to-i64.c  | 21 +++
 .../s390/vector/long-double-to-i8.c   | 1

[PATCH 1/2] IBM Z: Store long doubles in vector registers when possible

2020-11-09 Thread Ilya Leoshkevich via Gcc-patches
On z14+, there are instructions for working with 128-bit floats (long
doubles) in vector registers.  It's beneficial to use them instead of
instructions that operate on floating point register pairs, because it
allows storing 4 times more data in registers at a time, relieving
register pressure.  The raw performance of the new instructions is
almost the same as that of the old ones.

Implement by storing TFmode values in vector registers on z14+.  Since
not all operations are available with the new instructions, keep the
old ones available using the new FPRX2 mode, and convert between it and
TFmode when necessary (these are called "forwarder" expanders below).
Change the existing TFmode expanders to call either new- or old-style
ones depending on whether we are on z14+ or older machines
("dispatcher" expanders).
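
One way to picture the dispatcher/forwarder split is as a three-way
decision per operation.  The following is only a reading of the
description above expressed in C, not code from the patch; the toy_path
enum and toy_expand_tf are invented names.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible ways to expand a TFmode operation in this toy model.  */
enum toy_path
{
  TOY_FPR_DIRECT,	/* Pre-z14: TFmode lives in an FPR pair.  */
  TOY_VR_DIRECT,	/* z14+: a vector instruction exists.  */
  TOY_VR_VIA_FPRX2	/* z14+: forwarder converts TFmode <-> FPRX2.  */
};

/* Dispatcher: choose the expansion based on the machine level and on
   whether the operation has a vector-register instruction.  */
static enum toy_path
toy_expand_tf (bool z14_or_newer, bool has_vector_insn)
{
  if (!z14_or_newer)
    return TOY_FPR_DIRECT;
  if (has_vector_insn)
    return TOY_VR_DIRECT;
  /* No vector variant: fall back to the old instruction by forwarding
     through the FPRX2 mode.  */
  return TOY_VR_VIA_FPRX2;
}
```

Under this reading, older machines never see the FPRX2 conversions, and
on z14+ the conversion cost is paid only for operations that lack a
vector-register form.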

gcc/ChangeLog:

2020-11-03  Ilya Leoshkevich  

* config/s390/s390-modes.def (FPRX2): New mode.
* config/s390/s390-protos.h (s390_fma_allowed_p): New function.
* config/s390/s390.c (s390_fma_allowed_p): Likewise.
(s390_build_signbit_mask): Support 128-bit masks.
(print_operand): Support printing the second word of a TFmode
operand as vector register.
(constant_modes): Add FPRX2mode.
(s390_class_max_nregs): Return 1 for TFmode on z14+.
(s390_is_fpr128): New function.
(s390_is_vr128): Likewise.
(s390_can_change_mode_class): Use s390_is_fpr128 and
s390_is_vr128 in order to determine whether mode refers to a FPR
pair or to a VR.
(s390_emit_compare): Force TFmode operands into registers on
z14+.
* config/s390/s390.h (HAVE_TF): New macro.
(EXPAND_MOVTF): New macro.
(EXPAND_TF): Likewise.
* config/s390/s390.md (PFPO_OP_TYPE_FPRX2): PFPO_OP_TYPE_TF
alias.
(ALL): Add FPRX2.
(FP_ALL): Add FPRX2 for z14+, restrict TFmode to z13-.
(FP): Likewise.
(FP_ANYTF): New mode iterator.
(BFP): Add FPRX2 for z14+, restrict TFmode to z13-.
(TD_TF): Likewise.
(xde): Add FPRX2.
(nBFP): Likewise.
(nDFP): Likewise.
(DSF): Likewise.
(DFDI): Likewise.
(SFSI): Likewise.
(DF): Likewise.
(SF): Likewise.
(fT0): Likewise.
(bt): Likewise.
(_d): Likewise.
(HALF_TMODE): Likewise.
(tf_fpr): New mode_attr.
(type): New mode_attr.
(*cmp_ccz_0): Use type instead of mode with fsimp.
(*cmp_ccs_0_fastmath): Likewise.
(*cmptf_ccs): New pattern for wfcxb.
(*cmptf_ccsfps): New pattern for wfkxb.
(mov): Rename to mov.
(signbit2): Rename to signbit2.
(isinf2): Rename to isinf2.
(*TDC_insn_): Use type instead of mode with fsimp.
(fixuns_trunc2): Rename to
fixuns_trunc2.
(fix_trunctf2): Rename to fix_trunctf2_fpr.
(floatdi2): Rename to floatdi2, use type
instead of mode with itof.
(floatsi2): Rename to floatsi2, use type
instead of mode with itof.
(*floatuns2): Use type instead of mode for
itof.
(floatuns2): Rename to
floatuns2.
(trunctf2): Rename to trunctf2_fpr, use type instead
of mode with fsimp.
(extend2): Rename to
extend2.
(2): Rename to
2, use type instead of
mode with fsimp.
(rint2): Rename to rint2, use
type instead of mode with fsimp.
(2): Use type instead of mode for
fsimp.
(rint2): Likewise.
(trunc2): Rename to
trunc2.
(trunc2): Rename to
trunc2.
(extend2): Rename to
extend2.
(extend2): Rename to
extend2.
(add3): Rename to add3, use type instead of
mode with fsimp.
(*add3_cc): Use type instead of mode with fsimp.
(*add3_cconly): Likewise.
(sub3): Rename to sub3, use type instead of
mode with fsimp.
(*sub3_cc): Use type instead of mode with fsimp.
(*sub3_cconly): Likewise.
(mul3): Rename to mul3, use type instead of
mode with fsimp.
(fma4): Restrict using s390_fma_allowed_p.
(fms4): Restrict using s390_fma_allowed_p.
(div3): Rename to div3, use type instead of
mode with fdiv.
(neg2): Rename to neg2.
(*neg2_cc): Use type instead of mode with fsimp.
(*neg2_cconly): Likewise.
(*neg2_nocc): Likewise.
(*neg2): Likewise.
(abs2): Rename to abs2, use type instead of
mode with fdiv.
(*abs2_cc): Use type instead of mode with fsimp.
(*abs2_cconly): Likewise.
(*abs2_nocc): Likewise.
(*abs2): Likewise.
(*negabs2_cc): Likewise.
(*negabs2_cconly): Likewise.
(*negabs2_nocc): Likewise.
(*negabs2): Likewise.
(sqrt2): Rename to sqrt2, use type instead
of mode with fsqrt.
(cbranch4): Us

[PATCH 0/2] IBM Z: Store long doubles in vector registers when possible

2020-11-09 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux with --with-arch=z15.
Ok for master?

This patch series implements storing long doubles in vector registers
on z14+.  Patch 1 is the actual implementation, patch 2 adds tests.

v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557968.html
v1 -> v2:
* Committed cleanups.
* Do not use general_operand for *cmptf_ccs.
* Fix expander condition mismatches.
* Move tests from zvector to vector, do not use -mzvector.
* Merge scan and run tests where possible.

Ilya Leoshkevich (2):
  IBM Z: Store long doubles in vector registers when possible
  IBM Z: Test long doubles in vector registers

 gcc/config/s390/s390-modes.def|   5 +-
 gcc/config/s390/s390-protos.h |   1 +
 gcc/config/s390/s390.c|  57 ++-
 gcc/config/s390/s390.h|  35 ++
 gcc/config/s390/s390.md   | 209 ++
 gcc/config/s390/s390.opt  |  11 +
 gcc/config/s390/vector.md | 382 --
 gcc/config/s390/vx-builtins.md|  38 +-
 .../s390/vector/long-double-callee-abi-scan.c |  20 +
 .../s390/vector/long-double-caller-abi-run.c  |   4 +
 .../s390/vector/long-double-caller-abi-scan.c |  13 +
 .../s390/vector/long-double-copysign.c|  21 +
 .../s390/vector/long-double-fprx2-constant.c  |  11 +
 .../s390/vector/long-double-from-double.c |  18 +
 .../s390/vector/long-double-from-float.c  |  19 +
 .../s390/vector/long-double-from-i16.c|  19 +
 .../s390/vector/long-double-from-i32.c|  19 +
 .../s390/vector/long-double-from-i64.c|  19 +
 .../s390/vector/long-double-from-i8.c |  19 +
 .../s390/vector/long-double-from-u16.c|  19 +
 .../s390/vector/long-double-from-u32.c|  19 +
 .../s390/vector/long-double-from-u64.c|  19 +
 .../s390/vector/long-double-from-u8.c |  19 +
 .../s390/vector/long-double-to-double.c   |  18 +
 .../s390/vector/long-double-to-float.c|  19 +
 .../s390/vector/long-double-to-i16.c  |  19 +
 .../s390/vector/long-double-to-i32.c  |  19 +
 .../s390/vector/long-double-to-i64.c  |  21 +
 .../s390/vector/long-double-to-i8.c   |  19 +
 .../s390/vector/long-double-to-u16.c  |  20 +
 .../s390/vector/long-double-to-u32.c  |  20 +
 .../s390/vector/long-double-to-u64.c  |  20 +
 .../s390/vector/long-double-to-u8.c   |  20 +
 .../s390/vector/long-double-vec-duplicate.c   |  13 +
 .../gcc.target/s390/vector/long-double-wf.h   |  60 +++
 .../s390/vector/long-double-wfaxb.c   |  17 +
 .../s390/vector/long-double-wfcxb-0001.c  |   9 +
 .../s390/vector/long-double-wfcxb-0111.c  |   9 +
 .../s390/vector/long-double-wfcxb-1011.c  |   9 +
 .../s390/vector/long-double-wfcxb-1101.c  |   9 +
 .../s390/vector/long-double-wfdxb.c   |  17 +
 .../s390/vector/long-double-wfixb.c   |   7 +
 .../s390/vector/long-double-wfkxb-0111.c  |   9 +
 .../s390/vector/long-double-wfkxb-1011.c  |   9 +
 .../s390/vector/long-double-wfkxb-1101.c  |   9 +
 .../s390/vector/long-double-wflcxb.c  |   7 +
 .../s390/vector/long-double-wflpxb.c  |   7 +
 .../s390/vector/long-double-wfmaxb-2.c|  24 ++
 .../s390/vector/long-double-wfmaxb-3.c|  14 +
 .../s390/vector/long-double-wfmaxb-disabled.c |   8 +
 .../s390/vector/long-double-wfmaxb.c  |   7 +
 .../s390/vector/long-double-wfmsxb-disabled.c |   8 +
 .../s390/vector/long-double-wfmsxb.c  |   7 +
 .../s390/vector/long-double-wfmxb.c   |   7 +
 .../vector/long-double-wfnmaxb-disabled.c |   9 +
 .../s390/vector/long-double-wfnmaxb.c |   7 +
 .../vector/long-double-wfnmsxb-disabled.c |   9 +
 .../s390/vector/long-double-wfnmsxb.c |   7 +
 .../s390/vector/long-double-wfsqxb.c  |   7 +
 .../s390/vector/long-double-wfsxb-1.c |  21 +
 .../s390/vector/long-double-wfsxb.c   |   7 +
 .../s390/vector/long-double-wftcixb-1.c   |  15 +
 .../s390/vector/long-double-wftcixb.c |   7 +
 63 files changed, 1412 insertions(+), 134 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-callee-abi-scan.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-caller-abi-run.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-caller-abi-scan.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-copysign.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-fprx2-constant.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-double.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-float.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-i16.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-i32.c
 create mode 100644 gcc/testsuite/gcc

Re: [PATCH 4/4] IBM Z: Test long doubles in vector registers

2020-11-04 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2020-11-04 at 18:28 +0100, Andreas Krebbel wrote:
> These tests all use the -mzvector option but do not appear to make
> use of the z vector language
> extensions. I think that option could be removed. Then these tests
> should be moved to the vector subdir.

Will change, thanks!

> You could do the asm scanning also in dg-do run tests.

This doesn't seem to work.  For example, if I add 

/* { dg-final { scan-assembler-times {aaa} 999 } } */

to long-double-from-double-run.c, it won't fail.

> 
> Andreas
> 
> 
> On 03.11.20 22:46, Ilya Leoshkevich wrote:
> > gcc/testsuite/ChangeLog:
> > 
> > 2020-11-03  Ilya Leoshkevich  
> > 
> > * gcc.target/s390/zvector/long-double-callee-abi-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-caller-abi-run.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-caller-abi-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-copysign-run.c: New test.
> > * gcc.target/s390/zvector/long-double-copysign-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-fprx2-constant.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-from-double-run.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-from-double-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-from-float-run.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-from-float-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-from-i16-run.c: New test.
> > * gcc.target/s390/zvector/long-double-from-i16-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-from-i32-run.c: New test.
> > * gcc.target/s390/zvector/long-double-from-i32-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-from-i64-run.c: New test.
> > * gcc.target/s390/zvector/long-double-from-i64-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-from-i8-run.c: New test.
> > * gcc.target/s390/zvector/long-double-from-i8-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-from-u16-run.c: New test.
> > * gcc.target/s390/zvector/long-double-from-u16-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-from-u32-run.c: New test.
> > * gcc.target/s390/zvector/long-double-from-u32-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-from-u64-run.c: New test.
> > * gcc.target/s390/zvector/long-double-from-u64-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-from-u8-run.c: New test.
> > * gcc.target/s390/zvector/long-double-from-u8-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-to-double-run.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-to-double-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-to-float-run.c: New test.
> > * gcc.target/s390/zvector/long-double-to-float-scan.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-to-i16-run.c: New test.
> > * gcc.target/s390/zvector/long-double-to-i16-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-to-i32-run.c: New test.
> > * gcc.target/s390/zvector/long-double-to-i32-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-to-i64-run.c: New test.
> > * gcc.target/s390/zvector/long-double-to-i64-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-to-i8-run.c: New test.
> > * gcc.target/s390/zvector/long-double-to-i8-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-to-u16-run.c: New test.
> > * gcc.target/s390/zvector/long-double-to-u16-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-to-u32-run.c: New test.
> > * gcc.target/s390/zvector/long-double-to-u32-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-to-u64-run.c: New test.
> > * gcc.target/s390/zvector/long-double-to-u64-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-to-u8-run.c: New test.
> > * gcc.target/s390/zvector/long-double-to-u8-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-vec-duplicate.c: New
> > test.
> > * gcc.target/s390/zvector/long-double-wf.h: New test.
> > * gcc.target/s390/zvector/long-double-wfaxb-run.c: New test.
> > * gcc.target/s390/zvector/long-double-wfaxb-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-wfaxb.c: New test.
> > * gcc.target/s390/zvector/long-double-wfcxb-0001.c: New test.
> > * gcc.target/s390/zvector/long-double-wfcxb-0111.c: New test.
> > * gcc.target/s390/zvector/long-double-wfcxb-1011.c: New test.
> > * gcc.target/s390/zvector/long-double-wfcxb-1101.c: New test.
> > * gcc.target/s390/zvector/long-double-wfdxb-run.c: New test.
> > * gcc.target/s390/zvector/long-double-wfdxb-scan.c: New test.
> > * gcc.target/s390/zvector/long-double-wfdxb.c: New test.
> > * gcc.target/s390/zvector/long-double-wfixb.c: New test.
> > * gcc.target/s390/zvector/long-double-wfkxb-0111.c: New test.
> >   

Re: [PATCH 3/4] IBM Z: Store long doubles in vector registers when possible

2020-11-04 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2020-11-04 at 18:16 +0100, Andreas Krebbel wrote:
> On 03.11.20 22:45, Ilya Leoshkevich wrote:
> > On z14+, there are instructions for working with 128-bit floats
> > (long
> > doubles) in vector registers.  It's beneficial to use them instead
> > of
> > instructions that operate on floating point register pairs, because
> > it
> > allows to store 4 times more data in registers at a time,
> > relieving
> > register pressure.  The performance of new instructions is almost
> > the
> > same.
> > 
> > Implement by storing TFmode values in vector registers on
> > z14+.  Since
> > not all operations are available with the new instructions, keep
> > the old
> > ones using the new FPRX2 mode, and convert between it and TFmode
> > when
> > necessary (this is called "forwarder" expanders below).  Change the
> > existing TFmode expanders to call either new- or old-style ones
> > depending on whether we are on z14+ or older machines ("dispatcher"
> > expanders).
> > 
> > gcc/ChangeLog:
> > 
> > 2020-11-03  Ilya Leoshkevich  
> > 
> > * config/s390/s390-modes.def (FPRX2): New mode.
> > * config/s390/s390-protos.h (s390_fma_allowed_p): New function.
> > * config/s390/s390.c (s390_fma_allowed_p): Likewise.
> > (s390_build_signbit_mask): Support 128-bit masks.
> > (print_operand): Support printing the second word of a TFmode
> > operand as vector register.
> > (constant_modes): Add FPRX2mode.
> > (s390_class_max_nregs): Return 1 for TFmode on z14+.
> > (s390_is_fpr128): New function.
> > (s390_is_vr128): Likewise.
> > (s390_can_change_mode_class): Use s390_is_fpr128 and
> > s390_is_vr128 in order to determine whether mode refers to a
> > FPR
> > pair or to a VR.
> > * config/s390/s390.h (EXPAND_MOVTF): New macro.
> > (EXPAND_TF): Likewise.
> > * config/s390/s390.md (PFPO_OP_TYPE_FPRX2): PFPO_OP_TYPE_TF
> > alias.
> > (ALL): Add FPRX2.
> > (FP_ALL): Add FPRX2 for z14+, restrict TFmode to z13-.
> > (FP): Likewise.
> > (FP_ANYTF): New mode iterator.
> > (BFP): Add FPRX2 for z14+, restrict TFmode to z13-.
> > (TD_TF): Likewise.
> > (xde): Add FPRX2.
> > (nBFP): Likewise.
> > (nDFP): Likewise.
> > (DSF): Likewise.
> > (DFDI): Likewise.
> > (SFSI): Likewise.
> > (DF): Likewise.
> > (SF): Likewise.
> > (fT0): Likewise.
> > (bt): Likewise.
> > (_d): Likewise.
> > (HALF_TMODE): Likewise.
> > (tf_fpr): New mode_attr.
> > (type): New mode_attr.
> > (*cmp_ccz_0): Use type instead of mode with fsimp.
> > (*cmp_ccs_0_fastmath): Likewise.
> > (*cmptf_ccs): New pattern for wfcxb.
> > (*cmptf_ccsfps): New pattern for wfkxb.
> > (mov): Rename to mov.
> > (signbit2): Rename to signbit2.
> > (isinf2): Rename to isinf2.
> > (*TDC_insn_): Use type instead of mode with fsimp.
> > (fixuns_trunc2): Rename to
> > fixuns_trunc2.
> > (fix_trunctf2): Rename to fix_trunctf2_fpr.
> > (floatdi2): Rename to floatdi2, use type
> > instead of mode with itof.
> > (floatsi2): Rename to floatsi2, use type
> > instead of mode with itof.
> > (*floatuns2): Use type instead of mode for
> > itof.
> > (floatuns2): Rename to
> > floatuns2.
> > (trunctf2): Rename to trunctf2_fpr, use type
> > instead
> > of mode with fsimp.
> > (extend2): Rename to
> > extend2.
> > (2): Rename to
> > 2, use type instead of
> > mode with fsimp.
> > (rint2): Rename to rint2, use
> > type instead of mode with fsimp.
> > (2): Use type instead of mode for
> > fsimp.
> > (rint2): Likewise.
> > (trunc2): Rename to
> > trunc2.
> > (trunc2): Rename to
> > trunc2.
> > (extend2): Rename to
> > extend2.
> > (extend2): Rename to
> > extend2.
> > (add3): Rename to add3, use type instead of
> > mode with fsimp.
> > (*add3_cc): Use type instead of mode with fsimp.
> > (*add3_cconly): Likewise.
> > (sub3): Rename to sub3, use type instead of
> > mode with fsimp.
> > (*sub3_cc): Use type instead of mode with fsimp.
> > (*sub3_cconly): Likewise.
> > (mul3): Rename to mul3, use type instead of
> > mode with fsimp.
> > (fma4): Restrict using s390_fma_allowed_p.
> > (fms4): Restrict using s390_fma_allowed_p.
> > (div3): Rename to div3, use type instead of
> > mode with fdiv.
> > (neg2): Rename to neg2.
> > (*neg2_cc): Use type instead of mode with fsimp.
> > (*neg2_cconly): Likewise.
> > (*neg2_nocc): Likewise.
> > (*neg2): Likewise.
> > (abs2): Rename to abs2, use type instead of
> > mode with fdiv.
> > (*abs2_cc): Use type instead of mode with fsimp.
> > (*abs2_cconly): Likewise.
> > (*abs2_nocc): Likewise.
> > (*abs2): Likewise.
> > (*negabs2_cc): Likewise.
> > (*negabs2_cconly): Likewise.
> > (*negabs2_nocc): Likewise.
> > (*negabs2): Likewise.
> > (sqrt2): Rename to sqrt2, u
