Re: [PATCH] s390: Fix TF to FPRX2 conversion [PR115860]

2024-09-11 Thread Ilya Leoshkevich
On Wed, 2024-09-11 at 16:44 +0200, Stefan Schulze Frielinghaus wrote:
> On Wed, Sep 11, 2024 at 01:59:48PM +0200, Ilya Leoshkevich wrote:
> > On Wed, 2024-09-11 at 13:34 +0200, Stefan Schulze Frielinghaus
> > wrote:
> > > On Wed, Sep 11, 2024 at 01:22:30PM +0200, Ilya Leoshkevich wrote:
> > > > On Wed, 2024-09-11 at 12:35 +0200, Stefan Schulze Frielinghaus
> > > > wrote:
> > > > > On Wed, Sep 11, 2024 at 11:47:54AM +0200, Ilya Leoshkevich
> > > > > wrote:
> > > > > > On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze
> > > > > > Frielinghaus
> > > > > > wrote:

Re: [PATCH] s390: Fix TF to FPRX2 conversion [PR115860]

2024-09-11 Thread Ilya Leoshkevich
On Wed, 2024-09-11 at 13:34 +0200, Stefan Schulze Frielinghaus wrote:
> On Wed, Sep 11, 2024 at 01:22:30PM +0200, Ilya Leoshkevich wrote:
> > On Wed, 2024-09-11 at 12:35 +0200, Stefan Schulze Frielinghaus
> > wrote:
> > > On Wed, Sep 11, 2024 at 11:47:54AM +0200, Ilya Leoshkevich wrote:
> > > > On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze Frielinghaus
> > > > wrote:

Re: [PATCH] s390: Fix TF to FPRX2 conversion [PR115860]

2024-09-11 Thread Ilya Leoshkevich
On Wed, 2024-09-11 at 12:35 +0200, Stefan Schulze Frielinghaus wrote:
> On Wed, Sep 11, 2024 at 11:47:54AM +0200, Ilya Leoshkevich wrote:
> > On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze Frielinghaus
> > wrote:
> > > Currently subregs originating from *tf_to_fprx2_0 and
> > > *tf_to_fprx2_1
> > > survive register allocation.  This in turn leads to wrong
> > > register
> > > renaming.  Keeping the current approach would mean we need two
> > > insns
> > > for
> > > *tf_to_fprx2_0 and *tf_to_fprx2_1, respectively.  Something along
> > > the
> > > lines
> > > 
> > > (define_insn "*tf_to_fprx2_0"
> > >   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 0)
> > >     (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
> > >    UNSPEC_TF_TO_FPRX2_0))]
> > >   "TARGET_VXE"
> > >   "#")
> > > 
> > > (define_insn "*tf_to_fprx2_0"
> > >   [(set (match_operand:DF 0 "nonimmediate_operand" "=f")
> > >     (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
> > >    UNSPEC_TF_TO_FPRX2_0))]
> > >   "TARGET_VXE"
> > >   "vpdi\t%v0,%v1,%v0,1"
> > >   [(set_attr "op_type" "VRR")])
> > > 
> > > and similar for *tf_to_fprx2_1.  Note, pre register allocation
> > > operand 0
> > > has mode FPRX2 and afterwards DF once subregs have been
> > > eliminated.
> > > 
> > > Since we always copy a whole vector register into a floating-
> > > point
> > > register pair, another way to fix this is to merge *tf_to_fprx2_0
> > > and
> > > *tf_to_fprx2_1 into a single insn which means we don't have to
> > > use
> > > subregs at all.  The downside of this is that the assembler
> > > template
> > > contains two instructions, now.  The upside is that we don't have
> > > to
> > > come up with some artificial insn before RA which might be more
> > > readable/maintainable.  That is implemented by this patch.
> > > 
> > > In commit r11-4872-ge627cda5686592, the output operand specifier
> > > %V
> > > was
> > > introduced which is used in tf_to_fprx2 only, now.  I didn't come
> > > up
> > > with its counterpart like %F for floating-point registers. 
> > > Instead I
> > > printed the register pair in the output function directly.  This
> > > spares
> > > us a new and "rare" format specifier for a single insn.  I don't
> > > have
> > > a
> > > strong opinion which option to choose, however, we should either
> > > add
> > > %F
> > > in order to mimic the same behaviour as %V or getting rid of %V
> > > and
> > > inline the logic in the output function.  I lean towards the
> > > latter.
> > > Any preferences?
> > > ---
> > >  gcc/config/s390/s390.md                    |  2 +
> > >  gcc/config/s390/vector.md                  | 66 +++---
> > >  gcc/testsuite/gcc.target/s390/pr115860-1.c | 26 +
> > >  3 files changed, 60 insertions(+), 34 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/s390/pr115860-1.c
> > 
> > [...]
> > 
> > > > +  char buf[64];
> > > > +  switch (which_alternative)
> > > > +    {
> > > > +    case 0:
> > > > +      if (REGNO (operands[0]) == REGNO (operands[1]))
> > > > +        return "vpdi\t%V0,%v1,%V0,5";
> > > > +      else
> > > > +        return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5";
> > > > +    case 1:
> > > > +      {
> > > > +        const char *reg_pair = reg_names[REGNO (operands[0]) + 1];
> > > > +        snprintf (buf, sizeof (buf), "ld\t%%f0,%%1;ld\t%%%s,8+%%1",
> > > > +                  reg_pair);
> > 
> > I wonder if there is a corner case where 8+ does not fit into short
> > displacement?
> 
> That is covered by constraint AR, i.e., for short displacement, and AT
> for long displacement.

Don't they cover only %1, and not 8+%1? Can't there be a situation
where %1 barely fits and 8+%1 doesn't fit? A quick glance shows that
the code doesn't leave any allowance for this:

"AR"
  s390_mem_constraint("AR")
    s390_check_qrst_address('R')
      s390_short_displacement()
        INTVAL (disp) >= 0 && INTVAL (disp) < 4096
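
A minimal sketch of the corner case raised above, written as plain C rather
than GCC code.  It assumes only the range check quoted in the call chain
(0 <= disp < 4096); the helper names fits_short_disp and
pair_fits_short_disp are hypothetical and do not exist in the s390 backend.

```c
#include <stdbool.h>

/* Same range as the check quoted above: 0 <= disp < 4096.  */
static bool
fits_short_disp (long disp)
{
  return disp >= 0 && disp < 4096;
}

/* The second ld of the pair addresses 8 bytes past the first one, so
   its displacement has to fit as well.  */
static bool
pair_fits_short_disp (long disp)
{
  return fits_short_disp (disp) && fits_short_disp (disp + 8);
}
```

With disp = 4088 the first load passes the check, while 4088 + 8 = 4096
falls just outside the short range, which is the situation the question
is about.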


Re: [PATCH] s390: Fix TF to FPRX2 conversion [PR115860]

2024-09-11 Thread Ilya Leoshkevich
On Fri, 2024-08-16 at 09:41 +0200, Stefan Schulze Frielinghaus wrote:
> Currently subregs originating from *tf_to_fprx2_0 and *tf_to_fprx2_1
> survive register allocation.  This in turn leads to wrong register
> renaming.  Keeping the current approach would mean we need two insns for
> *tf_to_fprx2_0 and *tf_to_fprx2_1, respectively.  Something along the
> lines
> 
> (define_insn "*tf_to_fprx2_0"
>   [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 0)
>     (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
>    UNSPEC_TF_TO_FPRX2_0))]
>   "TARGET_VXE"
>   "#")
> 
> (define_insn "*tf_to_fprx2_0"
>   [(set (match_operand:DF 0 "nonimmediate_operand" "=f")
>     (unspec:DF [(match_operand:TF 1 "general_operand" "v")]
>    UNSPEC_TF_TO_FPRX2_0))]
>   "TARGET_VXE"
>   "vpdi\t%v0,%v1,%v0,1"
>   [(set_attr "op_type" "VRR")])
> 
> and similar for *tf_to_fprx2_1.  Note, pre register allocation operand 0
> has mode FPRX2 and afterwards DF once subregs have been eliminated.
> 
> Since we always copy a whole vector register into a floating-point
> register pair, another way to fix this is to merge *tf_to_fprx2_0 and
> *tf_to_fprx2_1 into a single insn which means we don't have to use
> subregs at all.  The downside of this is that the assembler template
> contains two instructions, now.  The upside is that we don't have to
> come up with some artificial insn before RA which might be more
> readable/maintainable.  That is implemented by this patch.
> 
> In commit r11-4872-ge627cda5686592, the output operand specifier %V was
> introduced which is used in tf_to_fprx2 only, now.  I didn't come up
> with its counterpart like %F for floating-point registers.  Instead I
> printed the register pair in the output function directly.  This spares
> us a new and "rare" format specifier for a single insn.  I don't have a
> strong opinion which option to choose, however, we should either add %F
> in order to mimic the same behaviour as %V or get rid of %V and
> inline the logic in the output function.  I lean towards the latter.
> Any preferences?
> ---
>  gcc/config/s390/s390.md                    |  2 +
>  gcc/config/s390/vector.md                  | 66 +++---
>  gcc/testsuite/gcc.target/s390/pr115860-1.c | 26 +
>  3 files changed, 60 insertions(+), 34 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/s390/pr115860-1.c

[...]

> +  char buf[64];
> +  switch (which_alternative)
> +    {
> +    case 0:
> +      if (REGNO (operands[0]) == REGNO (operands[1]))
> +        return "vpdi\t%V0,%v1,%V0,5";
> +      else
> +        return "ldr\t%f0,%f1;vpdi\t%V0,%v1,%V0,5";
> +    case 1:
> +      {
> +        const char *reg_pair = reg_names[REGNO (operands[0]) + 1];
> +        snprintf (buf, sizeof (buf), "ld\t%%f0,%%1;ld\t%%%s,8+%%1",
> +                  reg_pair);

I wonder if there is a corner case where 8+ does not fit into short
displacement?

[...]


Re: [PATCH] s390: Fix s390_const_int_pool_entry_p and movdi peephole2 [PR114605]

2024-04-08 Thread Ilya Leoshkevich
On Sat, 2024-04-06 at 18:58 +0200, Jakub Jelinek wrote:
> Hi!
> 
> The following testcase is miscompiled, because we have initially
> a movti which loads the 0x3f803f80ULL TImode constant
> from constant pool.  Later on we split it into a pair of DImode
> loads.  Now, for the first load (why just that?, though not stage4
> material) we trigger the peephole2 which uses
> s390_const_int_pool_entry_p.
> That function doesn't check at all the constant pool mode though, sees
> the constant pool at that address has a CONST_INT value and just assumes
> that is the value to return, which is especially wrong for big-endian,
> if it is a DImode load from offset 0, it should be loading 0 rather than
> 0x3f803f80ULL.
> The following patch adds checks if we are extracting a MODE_INT mode,
> if the constant pool has MODE_INT mode as well, punts if constant pool
> has smaller mode size than the extraction one (then it would be UB),
> if it has the same mode as before keeps using what it did before,
> if constant pool has a larger mode than the one being extracted, uses
> simplify_subreg.  I'd have used avoid_constant_pool_reference
> instead which can handle also offsets into the constant pool constants,
> but it can't handle UNSPEC_LTREF.
> 
> Another thing is that once that is fixed, we ICE when we extract constant
> like 0, ior insn predicate require non-0 constant.  So, the patch also
> fixes the peephole2 so that if either 32-bit half is zero, it uses a mere
> load of the constant into register rather than a pair of such load
> and ior.
> 
> Bootstrapped/regtested on s390x-linux, ok for trunk?

Hi Jakub, thanks for the patch, it looks good to me.
Since I'm not a maintainer, we need to wait for Andreas' opinion.

> 
> 2024-04-06  Jakub Jelinek  
> 
>   PR target/114605
>   * config/s390/s390.cc (s390_const_int_pool_entry_p): Punt
>   if mem doesn't have MODE_INT mode, or pool constant doesn't
>   have MODE_INT mode, or if pool constant mode is smaller than
>   mem mode.  If mem mode is different from pool constant mode,
>   try to simplify subreg.  If that doesn't work, punt, if it
>   does, use the simplified constant instead of the constant pool
>   constant.
>   * config/s390/s390.md (movdi from const pool peephole): If
>   either low or high 32-bit part is zero, just emit move insn
>   instead of move + ior.
> 
>   * gcc.dg/pr114605.c: New test.
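
The big-endian point in the quoted description can be illustrated with a
small standalone C sketch.  It is not GCC code: store_be128 and load_be64
are hypothetical helpers that model a 16-byte constant pool entry and an
8-byte load from it, under the assumption that the TImode constant is laid
out in big-endian byte order.

```c
#include <stdint.h>

/* Store a 128-bit value, given as two 64-bit halves, in big-endian
   byte order: the most significant byte comes first.  */
static void
store_be128 (uint8_t out[16], uint64_t hi, uint64_t lo)
{
  for (int i = 0; i < 8; i++)
    {
      out[i] = (uint8_t) (hi >> (56 - 8 * i));
      out[8 + i] = (uint8_t) (lo >> (56 - 8 * i));
    }
}

/* Read 8 bytes starting at P as a big-endian 64-bit integer.  */
static uint64_t
load_be64 (const uint8_t *p)
{
  uint64_t v = 0;
  for (int i = 0; i < 8; i++)
    v = (v << 8) | p[i];
  return v;
}
```

Storing the value 0x3f803f80 as a 128-bit entry and loading 8 bytes from
offset 0 yields 0; the nonzero half only shows up at offset 8.  Returning
the whole pool constant for the offset-0 load is therefore wrong.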


[PATCH] libsanitizer: Do not mention MSan and DFSan in an error message

2024-04-04 Thread Ilya Leoshkevich
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?


libsanitizer/ChangeLog:

* sanitizer_common/sanitizer_linux_s390.cpp (AvoidCVE_2016_2143):
Do not mention MSan and DFSan, which are not supported by GCC.
---
 libsanitizer/sanitizer_common/sanitizer_linux_s390.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libsanitizer/sanitizer_common/sanitizer_linux_s390.cpp b/libsanitizer/sanitizer_common/sanitizer_linux_s390.cpp
index 74db831b0aa..65ba825fa97 100644
--- a/libsanitizer/sanitizer_common/sanitizer_linux_s390.cpp
+++ b/libsanitizer/sanitizer_common/sanitizer_linux_s390.cpp
@@ -212,7 +212,7 @@ void AvoidCVE_2016_2143() {
 return;
   Report(
 "ERROR: Your kernel seems to be vulnerable to CVE-2016-2143.  Using ASan,\n"
-"MSan, TSan, DFSan or LSan with such kernel can and will crash your\n"
+"TSan or LSan with such kernel can and will crash your\n"
 "machine, or worse.\n"
 "\n"
 "If you are certain your kernel is not vulnerable (you have compiled it\n"
-- 
2.44.0



[PATCH] IBM Z: Preserve exceptions in autovec-*-signaling-eq.c tests

2024-02-19 Thread Ilya Leoshkevich
DSE, DCE, and other passes are removing redundant signaling comparisons
from these tests, but the whole point is to check that GCC knows how to
emit them.  Use -fno-delete-dead-exceptions to prevent that.

gcc/testsuite/ChangeLog:

* gcc.target/s390/zvector/autovec-double-signaling-eq.c:
Preserve exceptions.
* gcc.target/s390/zvector/autovec-float-signaling-eq.c:
Likewise.
---
 .../gcc.target/s390/zvector/autovec-double-signaling-eq.c   | 2 +-
 .../gcc.target/s390/zvector/autovec-float-signaling-eq.c    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
index 3645d3cc393..b23568e06b4 100644
--- a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions -fnon-call-exceptions" } */
+/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions -fnon-call-exceptions -fno-delete-dead-exceptions" } */
 
 #include "autovec.h"
 
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
index d98aa0c494e..cd25d10c577 100644
--- a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions -fnon-call-exceptions" } */
+/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions -fnon-call-exceptions -fno-delete-dead-exceptions" } */
 
 #include "autovec.h"
 
-- 
2.43.2



[PATCH] Mark ASM_OUTPUT_FUNCTION_LABEL ()'s DECL argument as used

2024-01-15 Thread Ilya Leoshkevich
Compile tested for the ia64-elf target; bootstrap and regtest running
on x86_64-redhat-linux.  Ok for trunk when successful?



ia64-elf build fails with the following warning, which -Werror turns into
an error:

[all 2024-01-12 16:32:34] ../../gcc/gcc/config/ia64/ia64.cc:3889:59: error: unused parameter 'decl' [-Werror=unused-parameter]
[all 2024-01-12 16:32:34]  3889 | ia64_start_function (FILE *file, const char *fnname, tree decl)

decl is passed to ASM_OUTPUT_FUNCTION_LABEL (), whose default
implementation does not use it.  Mark it as used in order to avoid the
warning.

Reported-by: Jan-Benedict Glaw 
Suggested-by: Jan-Benedict Glaw 
Fixes: c659dd8bfb55 ("Implement ASM_DECLARE_FUNCTION_NAME using ASM_OUTPUT_FUNCTION_LABEL")
Signed-off-by: Ilya Leoshkevich 

gcc/ChangeLog:

* defaults.h (ASM_OUTPUT_FUNCTION_LABEL): Mark DECL as used.
---
 gcc/defaults.h | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/defaults.h b/gcc/defaults.h
index 92f3e07f742..1a2ea68a543 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -149,8 +149,11 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
NAME, such as the label on a function.  */
 
 #ifndef ASM_OUTPUT_FUNCTION_LABEL
-#define ASM_OUTPUT_FUNCTION_LABEL(FILE, NAME, DECL) \
-  assemble_function_label_raw ((FILE), (NAME))
+#define ASM_OUTPUT_FUNCTION_LABEL(FILE, NAME, DECL)    \
+  do {                                                 \
+    (void) (DECL);                                     \
+    assemble_function_label_raw ((FILE), (NAME));      \
+  } while (0)
 #endif
 
 /* Output the definition of a compiler-generated label named NAME.  */
-- 
2.43.0
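
The effect of the (void) cast can be reproduced outside of GCC with a small
C sketch.  All names here (emit_label, OUTPUT_LABEL_OLD, OUTPUT_LABEL_NEW,
start_function, labels_emitted) are hypothetical stand-ins and not the real
GCC definitions; the sketch only assumes the macro shapes shown in the diff
above.

```c
#include <stdio.h>

/* Number of labels printed so far, kept only for illustration.  */
static int labels_emitted;

/* Hypothetical stand-in for assemble_function_label_raw ().  */
static void
emit_label (FILE *file, const char *name)
{
  fprintf (file, "%s:\n", name);
  labels_emitted++;
}

/* Old shape: DECL is dropped by the preprocessor, so a caller's 'decl'
   parameter may end up unused.  */
#define OUTPUT_LABEL_OLD(FILE, NAME, DECL) \
  emit_label ((FILE), (NAME))

/* New shape: DECL is evaluated and discarded, which counts as a use.  */
#define OUTPUT_LABEL_NEW(FILE, NAME, DECL) \
  do { \
    (void) (DECL); \
    emit_label ((FILE), (NAME)); \
  } while (0)

/* With OUTPUT_LABEL_OLD here, -Wunused-parameter would flag 'decl'.  */
static void
start_function (FILE *file, const char *name, void *decl)
{
  OUTPUT_LABEL_NEW (file, name, decl);
}
```

The do { ... } while (0) wrapper keeps the two statements usable as a
single statement, for example in an unbraced if/else.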



[PATCH v2] rs6000: Fix ASAN linker errors for Power ELF V1 ABI [PR113284]

2024-01-09 Thread Ilya Leoshkevich
v1: https://inbox.sourceware.org/gcc-patches/20240109105253.332676-1-...@linux.ibm.com/
v1 -> v2: Move the .LASANPC label to the .text section (Jakub).
  Jakub okay-ed this version in the GCC Bugzilla.

Bootstrap and regtest running on ppc64le-redhat-linux and
powerpc64-linux-gnu.  Ok for trunk when successful?



rs6000_elf_declare_function_name () outputs Power ELF V1 ABI function
entry labels without using ASM_OUTPUT_FUNCTION_LABEL ().  As a result,
.LASANPC labels are not emitted, causing linker errors.

In theory, it is possible to reuse ASM_OUTPUT_FUNCTION_LABEL () by
changing rs6000_output_function_entry () to generate label names
without outputting them, but this would be quite a large change.

Instead, factor out the .LASANPC emitting code from
ASM_OUTPUT_FUNCTION_LABEL () and call it manually.

Fixes: c659dd8bfb55 ("Implement ASM_DECLARE_FUNCTION_NAME using ASM_OUTPUT_FUNCTION_LABEL")
Suggested-by: Jakub Jelinek 
Signed-off-by: Ilya Leoshkevich 

gcc/ChangeLog:

PR sanitizer/113284
* config/rs6000/rs6000.cc (rs6000_elf_declare_function_name):
Use assemble_function_label_final () for Power ELF V1 ABI.
* output.h (assemble_function_label_final): New function.
* varasm.cc (assemble_function_label_raw): Use
assemble_function_label_final ().
(assemble_function_label_final): New function.
---
 gcc/config/rs6000/rs6000.cc | 1 +
 gcc/output.h| 4 
 gcc/varasm.cc   | 9 +
 3 files changed, 14 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 94fbf46f2b6..5d975dab921 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -21357,6 +21357,7 @@ rs6000_elf_declare_function_name (FILE *file, const char *name, tree decl)
   ASM_DECLARE_RESULT (file, DECL_RESULT (decl));
   rs6000_output_function_entry (file, name);
   fputs (":\n", file);
+  assemble_function_label_final ();
   return;
 }
 
diff --git a/gcc/output.h b/gcc/output.h
index c8fe1d2643d..46b0033b221 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -182,6 +182,10 @@ extern const char *get_fnname_from_decl (tree);
code or data is output after the label.  */
 extern void assemble_function_label_raw (FILE *, const char *);
 
+/* Finish outputting function label.  Needs to be called when outputting
+   function label without using assemble_function_label_raw ().  */
+extern void assemble_function_label_final (void);
+
 /* Output assembler code for the constant pool of a function and associated
with defining the name of the function.  DECL describes the function.
NAME is the function's name.  For the constant pool, we use the current
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 1a869ae458a..2b633822434 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -1843,6 +1843,15 @@ void
 assemble_function_label_raw (FILE *file, const char *name)
 {
   ASM_OUTPUT_LABEL (file, name);
+  assemble_function_label_final ();
+}
+
+/* Finish outputting function label.  Needs to be called when outputting
+   function label without using assemble_function_label_raw ().  */
+
+void
+assemble_function_label_final (void)
+{
   if ((flag_sanitize & SANITIZE_ADDRESS)
   /* Notify ASAN only about the first function label.  */
   && (in_cold_section_p == first_function_block_is_cold)
-- 
2.43.0
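
The refactoring pattern used by this patch, reduced to a standalone C
sketch.  The names label_final, label_raw and target_declare_function are
hypothetical and only mirror the roles of assemble_function_label_final (),
assemble_function_label_raw () and a target hook that prints its own label;
the counter stands in for the ASAN notification.

```c
#include <stdio.h>

/* How often the shared tail has run, kept only for illustration.  */
static int final_calls;

/* Shared tail: work that must follow every function label.  */
static void
label_final (void)
{
  final_calls++;
}

/* Generic path: print the label, then run the shared tail.  */
static void
label_raw (FILE *file, const char *name)
{
  fprintf (file, "%s:\n", name);
  label_final ();
}

/* Target-specific path: prints the label in its own format, so it has
   to call the shared tail manually.  */
static void
target_declare_function (FILE *file, const char *name)
{
  fprintf (file, ".L.%s:\n", name);
  label_final ();
}
```

Both paths end in the same tail, so a target that cannot go through the
generic label routine still gets the bookkeeping done.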



Re: [PATCH v2 2/2] asan: Align .LASANPC on function boundary

2024-01-09 Thread Ilya Leoshkevich
On Tue, 2024-01-09 at 11:55 -0700, Jeff Law wrote:
> 
> 
> On 1/2/24 12:41, Ilya Leoshkevich wrote:
> > GCC can emit code between the function label and the .LASANPC label,
> > making the latter unaligned.  Some architectures cannot load unaligned
> > labels directly and require literal pool entries, which is inefficient.
> > 
> > Move the invocation of asan_function_start to
> > ASM_OUTPUT_FUNCTION_LABEL, which guarantees that no additional code is
> > emitted.  This allows setting the .LASANPC label alignment to the
> > respective function alignment.
> > ---
> >   gcc/asan.cc             |  6 ++
> >   gcc/config/i386/i386.cc |  2 +-
> >   gcc/config/s390/s390.cc |  2 +-
> >   gcc/defaults.h          |  2 +-
> >   gcc/final.cc            |  3 ---
> >   gcc/output.h            |  4 
> >   gcc/varasm.cc           | 14 ++
> >   7 files changed, 23 insertions(+), 10 deletions(-)
> So this needs a ChangeLog obviously.  I assume you've tested on s390[x].
> It should also be tested on x86 since it's the only other platform
> that redefined ASM_OUTPUT_FUNCTION_LABEL.
> 
> Assuming those tests pass without regression, then this is fine for the
> trunk.
> 
> Thanks,
> Jeff

Hi Jeff,

Since Jakub already approved this 2/2, you approved 1/2, and
x86_64/ppc64le/s390x regtests were successful, I've already pushed this
series (with ChangeLogs).

Unfortunately people discovered two regressions on i686 [1] and ppc64be
[2].  The first one is already sorted out, I'm currently regtesting the
fix for the second one and will push it as soon as it's done.

Best regards,
Ilya

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113251
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113284


[PATCH] rs6000: Fix ASAN linker errors for Power ELF V1 ABI [PR113284]

2024-01-09 Thread Ilya Leoshkevich
Bootstrap and regtest running on ppc64le-redhat-linux and
powerpc64-linux-gnu.  Ok for trunk when successful?



Use ASM_OUTPUT_FUNCTION_LABEL () instead of ASM_OUTPUT_LABEL () in
the Power ELF V1 ABI branch of rs6000_elf_declare_function_name () to
ensure that the .LASANPC label is emitted.  The other branches already
use the correct macro.

Fixes: c659dd8bfb55 ("Implement ASM_DECLARE_FUNCTION_NAME using ASM_OUTPUT_FUNCTION_LABEL")
Signed-off-by: Ilya Leoshkevich 

gcc/ChangeLog:

PR sanitizer/113284
* config/rs6000/rs6000.cc (rs6000_elf_declare_function_name):
Use ASM_OUTPUT_FUNCTION_LABEL () for Power ELF V1 ABI.
---
 gcc/config/rs6000/rs6000.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 94fbf46f2b6..fd9bb807957 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -21334,7 +21334,7 @@ rs6000_elf_declare_function_name (FILE *file, const char *name, tree decl)
   if (TARGET_64BIT && DEFAULT_ABI != ABI_ELFv2)
 {
   fputs ("\t.section\t\".opd\",\"aw\"\n\t.align 3\n", file);
-  ASM_OUTPUT_LABEL (file, name);
+  ASM_OUTPUT_FUNCTION_LABEL (file, name, decl);
   fputs (DOUBLE_INT_ASM_OP, file);
   rs6000_output_function_entry (file, name);
   fputs (",.TOC.@tocbase,0\n\t.previous\n", file);
-- 
2.43.0



[PATCH] asan: Do not call asan_function_start () without the current function [PR113251]

2024-01-08 Thread Ilya Leoshkevich
Bootstrap and regtest running on x86_64-redhat-linux,
ppc64le-redhat-linux and s390x-redhat-linux.  Ok for trunk when
successful?



Using ASAN on i686-linux with -fPIC causes an ICE, because when
pc_thunks are generated, there is no current function anymore, but
asan_function_start () expects one.

Fix by not calling asan_function_start () without one.

A narrower fix would be to temporarily disable ASAN around pc_thunk
generation.  However, the issue looks generic enough, and may affect
less often tested configurations, so go for a broader fix.

Fixes: e66dc37b299c ("asan: Align .LASANPC on function boundary")
Suggested-by: Jakub Jelinek 
Signed-off-by: Ilya Leoshkevich 

gcc/ChangeLog:

PR sanitizer/113251
* varasm.cc (assemble_function_label_raw): Do not call
asan_function_start () without the current function.
---
 gcc/varasm.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 25c1e05628d..1a869ae458a 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -1845,7 +1845,9 @@ assemble_function_label_raw (FILE *file, const char *name)
   ASM_OUTPUT_LABEL (file, name);
   if ((flag_sanitize & SANITIZE_ADDRESS)
   /* Notify ASAN only about the first function label.  */
-  && (in_cold_section_p == first_function_block_is_cold))
+  && (in_cold_section_p == first_function_block_is_cold)
+  /* Do not notify ASAN when called from, e.g., code_end ().  */
+  && cfun)
 asan_function_start ();
 }
 
-- 
2.43.0



[PATCH v2 1/2] Implement ASM_DECLARE_FUNCTION_NAME using ASM_OUTPUT_FUNCTION_LABEL

2024-01-02 Thread Ilya Leoshkevich
gccint recommends using ASM_OUTPUT_FUNCTION_LABEL in
ASM_DECLARE_FUNCTION_NAME, but many implementations use
ASM_OUTPUT_LABEL instead.  It's inconsistent and prevents changes to
ASM_OUTPUT_FUNCTION_LABEL from affecting the respective targets.
---
 gcc/config/aarch64/aarch64.cc       |  2 +-
 gcc/config/alpha/alpha.cc           |  5 ++---
 gcc/config/arm/aout.h               |  2 +-
 gcc/config/arm/arm.cc               |  2 +-
 gcc/config/bfin/bfin.h              | 16 
 gcc/config/c6x/c6x.h                |  2 +-
 gcc/config/gcn/gcn.cc               |  5 ++---
 gcc/config/h8300/h8300.h            |  2 +-
 gcc/config/ia64/ia64.cc             |  5 ++---
 gcc/config/mcore/mcore-elf.h        |  2 +-
 gcc/config/microblaze/microblaze.cc |  3 +--
 gcc/config/mips/mips.cc             | 19 ++-
 gcc/config/pa/pa.cc                 |  3 ++-
 gcc/config/riscv/riscv.cc           |  2 +-
 gcc/config/rs6000/rs6000.cc         |  4 ++--
 15 files changed, 36 insertions(+), 38 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 298477d88bb..e3c72f60d4e 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -24207,7 +24207,7 @@ aarch64_declare_function_name (FILE *stream, const char* name,
 
   /* Don't forget the type directive for ELF.  */
   ASM_OUTPUT_TYPE_DIRECTIVE (stream, name, "function");
-  ASM_OUTPUT_LABEL (stream, name);
+  ASM_OUTPUT_FUNCTION_LABEL (stream, name, fndecl);
 
   cfun->machine->label_is_assembled = true;
 }
diff --git a/gcc/config/alpha/alpha.cc b/gcc/config/alpha/alpha.cc
index 6aa93783226..8118255e737 100644
--- a/gcc/config/alpha/alpha.cc
+++ b/gcc/config/alpha/alpha.cc
@@ -7986,8 +7986,7 @@ int num_source_filenames = 0;
 /* Output the textual info surrounding the prologue.  */
 
 void
-alpha_start_function (FILE *file, const char *fnname,
- tree decl ATTRIBUTE_UNUSED)
+alpha_start_function (FILE *file, const char *fnname, tree decl)
 {
   unsigned long imask, fmask;
   /* Complete stack size needed.  */
@@ -8052,7 +8051,7 @@ alpha_start_function (FILE *file, const char *fnname,
   if (TARGET_ABI_OPEN_VMS)
 strcat (entry_label, "..en");
 
-  ASM_OUTPUT_LABEL (file, entry_label);
+  ASM_OUTPUT_FUNCTION_LABEL (file, entry_label, decl);
   inside_function = TRUE;
 
   if (TARGET_ABI_OPEN_VMS)
diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h
index 49896bb9620..380147aed7d 100644
--- a/gcc/config/arm/aout.h
+++ b/gcc/config/arm/aout.h
@@ -152,7 +152,7 @@
   do   \
 {  \
   ARM_DECLARE_FUNCTION_NAME (STREAM, NAME, DECL);   \
-  ASM_OUTPUT_LABEL (STREAM, NAME); \
+  ASM_OUTPUT_FUNCTION_LABEL (STREAM, NAME, DECL);  \
 }  \
   while (0)
 #endif
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 0c0cb14a8a4..7ca607b3de1 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -21800,7 +21800,7 @@ arm_asm_declare_function_name (FILE *file, const char *name, tree decl)
   ARM_DECLARE_FUNCTION_NAME (file, name, decl);
   ASM_OUTPUT_TYPE_DIRECTIVE (file, name, "function");
   ASM_DECLARE_RESULT (file, DECL_RESULT (decl));
-  ASM_OUTPUT_LABEL (file, name);
+  ASM_OUTPUT_FUNCTION_LABEL (file, name, decl);
 
   if (cmse_name)
 ASM_OUTPUT_LABEL (file, cmse_name);
diff --git a/gcc/config/bfin/bfin.h b/gcc/config/bfin/bfin.h
index c25f41f6839..60a8d716819 100644
--- a/gcc/config/bfin/bfin.h
+++ b/gcc/config/bfin/bfin.h
@@ -995,14 +995,14 @@ typedef enum directives {
 fputc ('\n',FILE); \
   } while (0)
 
-#define ASM_DECLARE_FUNCTION_NAME(FILE,NAME,DECL) \
-  do { \
-fputs (".type ", FILE);\
-assemble_name (FILE, NAME); \
-fputs (", STT_FUNC", FILE); \
-fputc (';',FILE);   \
-fputc ('\n',FILE); \
-ASM_OUTPUT_LABEL(FILE, NAME);  \
+#define ASM_DECLARE_FUNCTION_NAME(FILE, NAME, DECL)\
+  do { \
+fputs (".type ", FILE);\
+assemble_name (FILE, NAME);\
+fputs (", STT_FUNC", FILE);\
+fputc (';', FILE); \
+fputc ('\n', FILE);\
+ASM_OUTPUT_FUNCTION_LABEL (FILE, NAME, DECL);  \
   } while (0)
 
 #define ASM_OUTPUT_LABEL(FILE, NAME)\
diff --git a/gcc/config/c6x/c6x.h b/gcc/config/c6x/c6x.h
index 26b2f2f0700..790b9627ebe 100644
--- a/gcc/config/c6x/c6x.h
+++ b/gcc/config/c6x/c6x.h
@@ -459,7 +459,7 @@ struct GTY(()) machine_function
   c6x_output_file_unwind (FILE);   \
   ASM_OUTPUT_TYPE_DIRECTIVE (FILE, NAME, "function");  \
   ASM_DECLARE_RESULT (FILE, D

[PATCH v2 2/2] asan: Align .LASANPC on function boundary

2024-01-02 Thread Ilya Leoshkevich
GCC can emit code between the function label and the .LASANPC label,
making the latter unaligned.  Some architectures cannot load unaligned
labels directly and require literal pool entries, which is inefficient.

Move the invocation of asan_function_start to
ASM_OUTPUT_FUNCTION_LABEL, which guarantees that no additional code is
emitted.  This allows setting the .LASANPC label alignment to the
respective function alignment.
---
 gcc/asan.cc |  6 ++
 gcc/config/i386/i386.cc |  2 +-
 gcc/config/s390/s390.cc |  2 +-
 gcc/defaults.h  |  2 +-
 gcc/final.cc|  3 ---
 gcc/output.h|  4 
 gcc/varasm.cc   | 14 ++
 7 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/gcc/asan.cc b/gcc/asan.cc
index 8d0ffb497cc..48738244aba 100644
--- a/gcc/asan.cc
+++ b/gcc/asan.cc
@@ -1481,10 +1481,7 @@ asan_clear_shadow (rtx shadow_mem, HOST_WIDE_INT len)
 void
 asan_function_start (void)
 {
-  section *fnsec = function_section (current_function_decl);
-  switch_to_section (fnsec);
-  ASM_OUTPUT_DEBUG_LABEL (asm_out_file, "LASANPC",
-current_function_funcdef_no);
+  ASM_OUTPUT_DEBUG_LABEL (asm_out_file, "LASANPC", current_function_funcdef_no);
 }
 
 /* Return number of shadow bytes that are occupied by a local variable
@@ -2006,6 +2003,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
   DECL_INITIAL (decl) = decl;
   TREE_ASM_WRITTEN (decl) = 1;
   TREE_ASM_WRITTEN (id) = 1;
+  DECL_ALIGN_RAW (decl) = DECL_ALIGN_RAW (current_function_decl);
   emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
   shadow_base = expand_binop (Pmode, lshr_optab, base,
  gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 38d515dac04..09fc2b63ee3 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -1640,7 +1640,7 @@ ix86_asm_output_function_label (FILE *out_file, const char *fname,
   SUBTARGET_ASM_UNWIND_INIT (out_file);
 #endif
 
-  ASM_OUTPUT_LABEL (out_file, fname);
+  assemble_function_label_raw (out_file, fname);
 
   /* Output magic byte marker, if hot-patch attribute is set.  */
   if (is_ms_hook)
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index a5c36b43972..c871a10506a 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -8323,7 +8323,7 @@ s390_asm_output_function_label (FILE *out_file, const char *fname,
   asm_fprintf (out_file, "\t# fn:%s wd%d\n", fname,
   s390_warn_dynamicstack_p);
 }
-  ASM_OUTPUT_LABEL (out_file, fname);
+  assemble_function_label_raw (out_file, fname);
   if (hw_after > 0)
 asm_fprintf (out_file,
 "\t# post-label NOPs for hotpatch (%d halfwords)\n",
diff --git a/gcc/defaults.h b/gcc/defaults.h
index 6f095969410..b76734908cd 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -150,7 +150,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 #ifndef ASM_OUTPUT_FUNCTION_LABEL
 #define ASM_OUTPUT_FUNCTION_LABEL(FILE, NAME, DECL) \
-  ASM_OUTPUT_LABEL ((FILE), (NAME))
+  assemble_function_label_raw ((FILE), (NAME))
 #endif
 
 /* Output the definition of a compiler-generated label named NAME.  */
diff --git a/gcc/final.cc b/gcc/final.cc
index e6f1b1e166b..5e21aedf8ed 100644
--- a/gcc/final.cc
+++ b/gcc/final.cc
@@ -1686,9 +1686,6 @@ final_start_function_1 (rtx_insn **firstp, FILE *file, int *seen,
 
   high_block_linenum = high_function_linenum = last_linenum;
 
-  if (flag_sanitize & SANITIZE_ADDRESS)
-asan_function_start ();
-
   rtx_insn *first = *firstp;
   if (in_initial_view_p (first))
 {
diff --git a/gcc/output.h b/gcc/output.h
index 76cfd58c1e6..bfdecc5ea74 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -178,6 +178,10 @@ extern void assemble_asm (tree);
 /* Get the function's name from a decl, as described by its RTL.  */
 extern const char *get_fnname_from_decl (tree);
 
+/* Output function label, possibly with accompanying metadata.  No additional
+   code or data is output after the label.  */
+extern void assemble_function_label_raw (FILE *, const char *);
+
 /* Output assembler code for the constant pool of a function and associated
with defining the name of the function.  DECL describes the function.
NAME is the function's name.  For the constant pool, we use the current
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 69f8f8ee018..d0d670d009c 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -61,6 +61,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "alloc-pool.h"
 #include "toplev.h"
 #include "opts.h"
+#include "asan.h"
 
 /* The (assembler) name of the first globally-visible object output.  */
 extern GTY(()) const char *first_global_object_name;
@@ -1835,6 +1836,19 @@ get_fnname_from_decl (tree decl)
   return XSTR (x, 0);
 }
 
+/* Output function label, possibly with accompanying metadata.  No additional
+  

[PATCH v2 0/2] asan: Align .LASANPC on function boundary

2024-01-02 Thread Ilya Leoshkevich
v1: https://inbox.sourceware.org/gcc-patches/20231207121005.3425208-1-...@linux.ibm.com/
v1 -> v2: Fix style issues (Jakub).
  Jakub has reviewed patch 2 and mentioned that he'd defer the
  patch 1 review to Jeff.



Hi,

this is another attempt to fix the .LASANPC alignment on s390x.
Currently it's not only inefficient ([1]-[5]), but also causes linker
errors in template-heavy code ([6]).

The previous attempts to add a new constant for minimum code alignment
value ([1]-[5]) did not arouse considerable enthusiasm, and fixing the
fallout ([6]) is probably just the wrong thing to do.

So here I'm taking another approach: making sure that .LASANPC is
aligned on a function boundary in the first place.  This requires moving
the asan_function_start() invocation to ASM_OUTPUT_FUNCTION_LABEL().
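
To make the alignment argument concrete, here is a minimal sketch in
plain C.  It is a toy model and not GCC code: the helper names and the
addresses are assumptions made up for illustration.  It only shows that
any code emitted between the function label and .LASANPC shifts the
latter off the function's alignment.

```c
#include <assert.h>

/* Toy model, not GCC code.  All numbers used with these helpers are
   illustrative assumptions.  */

/* Return nonzero if ADDR is a multiple of ALIGN_BYTES.  */
static int
is_aligned (unsigned long addr, unsigned long align_bytes)
{
  assert (align_bytes != 0);
  return addr % align_bytes == 0;
}

/* Address of .LASANPC when BYTES_BETWEEN bytes of code are emitted
   between the function label at FN_ADDR and the .LASANPC label.  */
static unsigned long
lasanpc_address (unsigned long fn_addr, unsigned long bytes_between)
{
  return fn_addr + bytes_between;
}
```

With a function placed at the made-up address 0x1000 and an 8-byte
function alignment, lasanpc_address (0x1000, 0) is still 8-byte
aligned, while lasanpc_address (0x1000, 2) is only 2-byte aligned.
Emitting the label right next to the function label corresponds to the
first case.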

Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux
and s390x-redhat-linux.  Compile tested for platforms listed in [7].

Best regards,
Ilya

[1] https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525016.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525069.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548338.html
[4] https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549252.html
[5] https://patchwork.ozlabs.org/project/gcc/list/?series=320223
[6] https://patchwork.ozlabs.org/project/gcc/list/?series=297132
[7] http://toolchain.lug-owl.de/laminar/jobs

Ilya Leoshkevich (2):
  Implement ASM_DECLARE_FUNCTION_NAME using ASM_OUTPUT_FUNCTION_LABEL
  asan: Align .LASANPC on function boundary

 gcc/asan.cc |  6 ++
 gcc/config/aarch64/aarch64.cc   |  2 +-
 gcc/config/alpha/alpha.cc   |  5 ++---
 gcc/config/arm/aout.h   |  2 +-
 gcc/config/arm/arm.cc   |  2 +-
 gcc/config/bfin/bfin.h  | 16 
 gcc/config/c6x/c6x.h|  2 +-
 gcc/config/gcn/gcn.cc   |  5 ++---
 gcc/config/h8300/h8300.h|  2 +-
 gcc/config/i386/i386.cc |  2 +-
 gcc/config/ia64/ia64.cc |  5 ++---
 gcc/config/mcore/mcore-elf.h|  2 +-
 gcc/config/microblaze/microblaze.cc |  3 +--
 gcc/config/mips/mips.cc | 19 ++-
 gcc/config/pa/pa.cc |  3 ++-
 gcc/config/riscv/riscv.cc   |  2 +-
 gcc/config/rs6000/rs6000.cc |  4 ++--
 gcc/config/s390/s390.cc |  2 +-
 gcc/defaults.h  |  2 +-
 gcc/final.cc|  3 ---
 gcc/output.h|  4 
 gcc/varasm.cc   | 14 ++
 22 files changed, 59 insertions(+), 48 deletions(-)

-- 
2.43.0



[PATCH 2/2] asan: Align .LASANPC on function boundary

2023-12-07 Thread Ilya Leoshkevich
GCC can emit code between the function label and the .LASANPC label,
making the latter unaligned.  Some architectures cannot load unaligned
labels directly and require literal pool entries, which is inefficient.

Move the invocation of asan_function_start to
ASM_OUTPUT_FUNCTION_LABEL, which guarantees that no additional code is
emitted.  This allows setting the .LASANPC label alignment to the
respective function alignment.
---
 gcc/asan.cc |  6 ++
 gcc/config/i386/i386.cc |  2 +-
 gcc/config/s390/s390.cc |  2 +-
 gcc/defaults.h  |  2 +-
 gcc/final.cc|  3 ---
 gcc/output.h|  4 
 gcc/varasm.cc   | 10 ++
 7 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/gcc/asan.cc b/gcc/asan.cc
index 8d0ffb497cc..48738244aba 100644
--- a/gcc/asan.cc
+++ b/gcc/asan.cc
@@ -1481,10 +1481,7 @@ asan_clear_shadow (rtx shadow_mem, HOST_WIDE_INT len)
 void
 asan_function_start (void)
 {
-  section *fnsec = function_section (current_function_decl);
-  switch_to_section (fnsec);
-  ASM_OUTPUT_DEBUG_LABEL (asm_out_file, "LASANPC",
-current_function_funcdef_no);
+  ASM_OUTPUT_DEBUG_LABEL (asm_out_file, "LASANPC", current_function_funcdef_no);
 }
 
 /* Return number of shadow bytes that are occupied by a local variable
@@ -2006,6 +2003,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
   DECL_INITIAL (decl) = decl;
   TREE_ASM_WRITTEN (decl) = 1;
   TREE_ASM_WRITTEN (id) = 1;
+  DECL_ALIGN_RAW (decl) = DECL_ALIGN_RAW (current_function_decl);
   emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
   shadow_base = expand_binop (Pmode, lshr_optab, base,
  gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 7c5cab4e2c6..a552a300b69 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -1640,7 +1640,7 @@ ix86_asm_output_function_label (FILE *out_file, const char *fname,
   SUBTARGET_ASM_UNWIND_INIT (out_file);
 #endif
 
-  ASM_OUTPUT_LABEL (out_file, fname);
+  assemble_function_label_raw (out_file, fname);
 
   /* Output magic byte marker, if hot-patch attribute is set.  */
   if (is_ms_hook)
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 044de874590..a022db230db 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -8323,7 +8323,7 @@ s390_asm_output_function_label (FILE *out_file, const char *fname,
   asm_fprintf (out_file, "\t# fn:%s wd%d\n", fname,
   s390_warn_dynamicstack_p);
 }
-  ASM_OUTPUT_LABEL (out_file, fname);
+  assemble_function_label_raw (out_file, fname);
   if (hw_after > 0)
 asm_fprintf (out_file,
 "\t# post-label NOPs for hotpatch (%d halfwords)\n",
diff --git a/gcc/defaults.h b/gcc/defaults.h
index dc6f09cacae..153d3cd32c0 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -150,7 +150,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 #ifndef ASM_OUTPUT_FUNCTION_LABEL
 #define ASM_OUTPUT_FUNCTION_LABEL(FILE, NAME, DECL) \
-  ASM_OUTPUT_LABEL ((FILE), (NAME))
+  assemble_function_label_raw ((FILE), (NAME))
 #endif
 
 /* Output the definition of a compiler-generated label named NAME.  */
diff --git a/gcc/final.cc b/gcc/final.cc
index e6f1b1e166b..5e21aedf8ed 100644
--- a/gcc/final.cc
+++ b/gcc/final.cc
@@ -1686,9 +1686,6 @@ final_start_function_1 (rtx_insn **firstp, FILE *file, int *seen,
 
   high_block_linenum = high_function_linenum = last_linenum;
 
-  if (flag_sanitize & SANITIZE_ADDRESS)
-asan_function_start ();
-
   rtx_insn *first = *firstp;
   if (in_initial_view_p (first))
 {
diff --git a/gcc/output.h b/gcc/output.h
index 76cfd58c1e6..bfdecc5ea74 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -178,6 +178,10 @@ extern void assemble_asm (tree);
 /* Get the function's name from a decl, as described by its RTL.  */
 extern const char *get_fnname_from_decl (tree);
 
+/* Output function label, possibly with accompanying metadata.  No additional
+   code or data is output after the label.  */
+extern void assemble_function_label_raw (FILE *, const char *);
+
 /* Output assembler code for the constant pool of a function and associated
with defining the name of the function.  DECL describes the function.
NAME is the function's name.  For the constant pool, we use the current
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 167aea87091..28c29883df9 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -61,6 +61,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "alloc-pool.h"
 #include "toplev.h"
 #include "opts.h"
+#include "asan.h"
 
 /* The (assembler) name of the first globally-visible object output.  */
 extern GTY(()) const char *first_global_object_name;
@@ -1835,6 +1836,15 @@ get_fnname_from_decl (tree decl)
   return XSTR (x, 0);
 }
 
+void assemble_function_label_raw (FILE *file, const char *name)
+{
+  ASM_OUTPUT_LABE

[PATCH 1/2] Implement ASM_DECLARE_FUNCTION_NAME using ASM_OUTPUT_FUNCTION_LABEL

2023-12-07 Thread Ilya Leoshkevich
gccint recommends using ASM_OUTPUT_FUNCTION_LABEL in
ASM_DECLARE_FUNCTION_NAME, but many implementations use
ASM_OUTPUT_LABEL instead.  This is inconsistent and prevents changes to
ASM_OUTPUT_FUNCTION_LABEL from affecting the respective targets.
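
As an illustration of why this matters, here is a small self-contained
sketch.  The macros below are toy stand-ins that merely share names
with the GCC ones; they append to a string buffer instead of writing to
a FILE, and the ".Lextra" label and the two declare_function_name_*
helpers are hypothetical.  The point is only that a target which calls
ASM_OUTPUT_LABEL directly never sees whatever ASM_OUTPUT_FUNCTION_LABEL
adds.

```c
#include <assert.h>
#include <string.h>

enum { OUT_SIZE = 256 };

/* Append TEXT to the NUL-terminated buffer OUT of size OUT_SIZE.  */
static void
out_append (char *out, const char *text)
{
  assert (strlen (out) + strlen (text) < OUT_SIZE);
  strcat (out, text);
}

/* Toy stand-in: emit a plain label.  */
#define ASM_OUTPUT_LABEL(OUT, NAME) \
  do { out_append ((OUT), (NAME)); out_append ((OUT), ":\n"); } while (0)

/* Toy stand-in: emit a function label plus hypothetical extra
   metadata that must stay right next to it.  */
#define ASM_OUTPUT_FUNCTION_LABEL(OUT, NAME, DECL) \
  do { \
    (void) (DECL); \
    ASM_OUTPUT_LABEL ((OUT), (NAME)); \
    out_append ((OUT), ".Lextra:\n"); \
  } while (0)

/* Target that bypasses the function-label macro.  */
static void
declare_function_name_old (char *out, const char *name)
{
  out_append (out, "\t.type\t");
  out_append (out, name);
  out_append (out, ", @function\n");
  ASM_OUTPUT_LABEL (out, name);
}

/* Target that goes through the function-label macro.  */
static void
declare_function_name_new (char *out, const char *name, const void *decl)
{
  out_append (out, "\t.type\t");
  out_append (out, name);
  out_append (out, ", @function\n");
  ASM_OUTPUT_FUNCTION_LABEL (out, name, decl);
}
```

Calling both helpers on empty buffers with the name "foo" yields output
that differs only in the ".Lextra:" line, which appears solely in the
second variant.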
---
 gcc/config/aarch64/aarch64.cc   |  2 +-
 gcc/config/alpha/alpha.cc   |  5 ++---
 gcc/config/arm/aout.h   |  2 +-
 gcc/config/arm/arm.cc   |  2 +-
 gcc/config/bfin/bfin.h  | 16 
 gcc/config/c6x/c6x.h|  2 +-
 gcc/config/gcn/gcn.cc   |  5 ++---
 gcc/config/h8300/h8300.h|  2 +-
 gcc/config/ia64/ia64.cc |  5 ++---
 gcc/config/mcore/mcore-elf.h|  2 +-
 gcc/config/microblaze/microblaze.cc |  3 +--
 gcc/config/mips/mips.cc | 19 ++-
 gcc/config/pa/pa.cc |  3 ++-
 gcc/config/riscv/riscv.cc   |  2 +-
 gcc/config/rs6000/rs6000.cc |  4 ++--
 15 files changed, 36 insertions(+), 38 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 8f50a70083d..bf247a8fd17 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -23285,7 +23285,7 @@ aarch64_declare_function_name (FILE *stream, const char* name,
 
   /* Don't forget the type directive for ELF.  */
   ASM_OUTPUT_TYPE_DIRECTIVE (stream, name, "function");
-  ASM_OUTPUT_LABEL (stream, name);
+  ASM_OUTPUT_FUNCTION_LABEL (stream, name, fndecl);
 
   cfun->machine->label_is_assembled = true;
 }
diff --git a/gcc/config/alpha/alpha.cc b/gcc/config/alpha/alpha.cc
index 6aa93783226..8118255e737 100644
--- a/gcc/config/alpha/alpha.cc
+++ b/gcc/config/alpha/alpha.cc
@@ -7986,8 +7986,7 @@ int num_source_filenames = 0;
 /* Output the textual info surrounding the prologue.  */
 
 void
-alpha_start_function (FILE *file, const char *fnname,
- tree decl ATTRIBUTE_UNUSED)
+alpha_start_function (FILE *file, const char *fnname, tree decl)
 {
   unsigned long imask, fmask;
   /* Complete stack size needed.  */
@@ -8052,7 +8051,7 @@ alpha_start_function (FILE *file, const char *fnname,
   if (TARGET_ABI_OPEN_VMS)
 strcat (entry_label, "..en");
 
-  ASM_OUTPUT_LABEL (file, entry_label);
+  ASM_OUTPUT_FUNCTION_LABEL (file, entry_label, decl);
   inside_function = TRUE;
 
   if (TARGET_ABI_OPEN_VMS)
diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h
index 49896bb9620..380147aed7d 100644
--- a/gcc/config/arm/aout.h
+++ b/gcc/config/arm/aout.h
@@ -152,7 +152,7 @@
   do   \
 {  \
   ARM_DECLARE_FUNCTION_NAME (STREAM, NAME, DECL);   \
-  ASM_OUTPUT_LABEL (STREAM, NAME); \
+  ASM_OUTPUT_FUNCTION_LABEL (STREAM, NAME, DECL);  \
 }  \
   while (0)
 #endif
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 6e3e2e8fb1b..7fd9bc19882 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -21801,7 +21801,7 @@ arm_asm_declare_function_name (FILE *file, const char *name, tree decl)
   ARM_DECLARE_FUNCTION_NAME (file, name, decl);
   ASM_OUTPUT_TYPE_DIRECTIVE (file, name, "function");
   ASM_DECLARE_RESULT (file, DECL_RESULT (decl));
-  ASM_OUTPUT_LABEL (file, name);
+  ASM_OUTPUT_FUNCTION_LABEL (file, name, decl);
 
   if (cmse_name)
 ASM_OUTPUT_LABEL (file, cmse_name);
diff --git a/gcc/config/bfin/bfin.h b/gcc/config/bfin/bfin.h
index c25f41f6839..60a8d716819 100644
--- a/gcc/config/bfin/bfin.h
+++ b/gcc/config/bfin/bfin.h
@@ -995,14 +995,14 @@ typedef enum directives {
 fputc ('\n',FILE); \
   } while (0)
 
-#define ASM_DECLARE_FUNCTION_NAME(FILE,NAME,DECL) \
-  do { \
-fputs (".type ", FILE);\
-assemble_name (FILE, NAME); \
-fputs (", STT_FUNC", FILE); \
-fputc (';',FILE);   \
-fputc ('\n',FILE); \
-ASM_OUTPUT_LABEL(FILE, NAME);  \
+#define ASM_DECLARE_FUNCTION_NAME(FILE, NAME, DECL)\
+  do { \
+fputs (".type ", FILE);\
+assemble_name (FILE, NAME);\
+fputs (", STT_FUNC", FILE);\
+fputc (';', FILE); \
+fputc ('\n', FILE);\
+ASM_OUTPUT_FUNCTION_LABEL (FILE, NAME, DECL);  \
   } while (0)
 
 #define ASM_OUTPUT_LABEL(FILE, NAME)\
diff --git a/gcc/config/c6x/c6x.h b/gcc/config/c6x/c6x.h
index 26b2f2f0700..790b9627ebe 100644
--- a/gcc/config/c6x/c6x.h
+++ b/gcc/config/c6x/c6x.h
@@ -459,7 +459,7 @@ struct GTY(()) machine_function
   c6x_output_file_unwind (FILE);   \
   ASM_OUTPUT_TYPE_DIRECTIVE (FILE, NAME, "function");  \
   ASM_DECLARE_RESULT (FILE, D

[PATCH 0/2] asan: Align .LASANPC on function boundary

2023-12-07 Thread Ilya Leoshkevich
Hi,

this is another attempt to fix the .LASANPC alignment on s390x.
Currently it's not only inefficient ([1]-[5]), but also causes linker
errors in template-heavy code ([6]).

The previous attempts to add a new constant for minimum code alignment
value ([1]-[5]) did not arouse considerable enthusiasm, and fixing the
fallout ([6]) is probably just the wrong thing to do.

So here I'm taking another approach: making sure that .LASANPC is
aligned on a function boundary in the first place.  This requires moving
the asan_function_start() invocation to ASM_OUTPUT_FUNCTION_LABEL().

Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux
and s390x-redhat-linux.  Compile tested for platforms listed in [7].

Best regards,
Ilya

[1] https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525016.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525069.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548338.html
[4] https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549252.html
[5] https://patchwork.ozlabs.org/project/gcc/list/?series=320223
[6] https://patchwork.ozlabs.org/project/gcc/list/?series=297132
[7] http://toolchain.lug-owl.de/laminar/jobs

Ilya Leoshkevich (2):
  Implement ASM_DECLARE_FUNCTION_NAME using ASM_OUTPUT_FUNCTION_LABEL
  asan: Align .LASANPC on function boundary

 gcc/asan.cc |  6 ++
 gcc/config/aarch64/aarch64.cc   |  2 +-
 gcc/config/alpha/alpha.cc   |  5 ++---
 gcc/config/arm/aout.h   |  2 +-
 gcc/config/arm/arm.cc   |  2 +-
 gcc/config/bfin/bfin.h  | 16 
 gcc/config/c6x/c6x.h|  2 +-
 gcc/config/gcn/gcn.cc   |  5 ++---
 gcc/config/h8300/h8300.h|  2 +-
 gcc/config/i386/i386.cc |  2 +-
 gcc/config/ia64/ia64.cc |  5 ++---
 gcc/config/mcore/mcore-elf.h|  2 +-
 gcc/config/microblaze/microblaze.cc |  3 +--
 gcc/config/mips/mips.cc | 19 ++-
 gcc/config/pa/pa.cc |  3 ++-
 gcc/config/riscv/riscv.cc   |  2 +-
 gcc/config/rs6000/rs6000.cc |  4 ++--
 gcc/config/s390/s390.cc |  2 +-
 gcc/defaults.h  |  2 +-
 gcc/final.cc|  3 ---
 gcc/output.h|  4 
 gcc/varasm.cc   | 10 ++
 22 files changed, 55 insertions(+), 48 deletions(-)

-- 
2.43.0



PING [PATCH v5 0/2] IBM zSystems: Improve storing asan frame_pc

2022-10-17 Thread Ilya Leoshkevich via Gcc-patches
On Tue, 2022-09-27 at 02:23 +0200, Ilya Leoshkevich wrote:
> Hi,
> 
> This is a resend of v4 with slightly adjusted commit messages:
> 
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525016.html
> v2: https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525069.html
> v3: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548338.html
> v4: https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549252.html
> 
> It still survives the bootstrap and the regtest on x86_64-redhat-linux,
> s390x-redhat-linux and ppc64le-redhat-linux.  It also fixes [1].
> 
> I also tried the approach with moving .LASANPC closer to the function
> label and using FUNCTION_BOUNDARY instead of introducing
> CODE_LABEL_BOUNDARY, but the problem there is that it's hard to catch
> the moment where the function label is written.  Architectures can do
> it by calling ASM_OUTPUT_LABEL() or assemble_name() in
> ASM_DECLARE_FUNCTION_NAME(), ASM_OUTPUT_FUNCTION_LABEL() or
> TARGET_ASM_FUNCTION_PROLOGUE().  epiphany_start_function() does that
> twice, but passes the same decl to both calls.  Note that simply
> moving asan_function_start() to final_start_function_1() is not
> enough,
> since an architecture can write something after the function label.
> This all means that for this approach to work, all the architectures
> need to be adjusted, which looks like overkill to me.
> 
> Best regards,
> Ilya
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593666.html
> 
> 
> Ilya Leoshkevich (2):
>   asan: specify alignment for LASANPC labels
>   IBM zSystems: Define CODE_LABEL_BOUNDARY
> 
>  gcc/asan.cc    |  1 +
>  gcc/config/s390/s390.h |  3 +++
>  gcc/defaults.h |  5 +
>  gcc/doc/tm.texi    |  4 
>  gcc/doc/tm.texi.in |  4 
>  gcc/testsuite/gcc.target/s390/asan-no-gotoff.c | 15 +++
>  6 files changed, 32 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c
> 



[PATCH v5 2/2] IBM zSystems: Define CODE_LABEL_BOUNDARY

2022-09-26 Thread Ilya Leoshkevich via Gcc-patches
Currently s390 emits the following sequence to store a frame_pc:

a:
.LASANPC0:

lg  %r1,.L5-.L4(%r13)
la  %r1,0(%r1,%r12)
stg %r1,176(%r11)

.L5:
.quad   .LASANPC0@GOTOFF

The reason GOT indirection is used instead of larl is that gcc does not
know that .LASANPC0, being a code label, is aligned on a 2-byte
boundary, and larl can load only even addresses.

Define CODE_LABEL_BOUNDARY in order to get rid of GOT indirection:

larl %r1,.LASANPC0
stg %r1,176(%r11)
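
The even-address restriction can be sketched as follows.  This is an
illustration and not GCC code: it assumes that larl encodes its target
as a signed 32-bit count of halfwords relative to the instruction
address, and the function name larl_encode_offset is made up for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Try to compute the halfword displacement that a larl at INSN_ADDR
   would need in order to load TARGET.  Return 1 and store the
   displacement in *IMM on success, 0 if TARGET is not reachable
   (odd distance or out of range).  */
static int
larl_encode_offset (uint64_t insn_addr, uint64_t target, int32_t *imm)
{
  assert (imm != NULL);
  assert (insn_addr % 2 == 0);  /* Instructions are halfword-aligned.  */

  int64_t delta = (int64_t) (target - insn_addr);
  if (delta % 2 != 0)
    return 0;                   /* Odd targets cannot be expressed.  */

  int64_t halfwords = delta / 2;
  if (halfwords < INT32_MIN || halfwords > INT32_MAX)
    return 0;

  *imm = (int32_t) halfwords;
  return 1;
}
```

For example, with made-up addresses, a larl at 0x1000 can reach 0x1010
with a displacement of 8 halfwords, but it cannot reach 0x1011.  This
is why the compiler has to know that the label is at least 2-byte
aligned before it may pick larl.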

gcc/ChangeLog:

2020-06-30  Ilya Leoshkevich  

* config/s390/s390.h (CODE_LABEL_BOUNDARY): Specify that s390
requires code labels to be aligned on a 2-byte boundary.

gcc/testsuite/ChangeLog:

2019-06-30  Ilya Leoshkevich  

* gcc.target/s390/asan-no-gotoff.c: New test.
---
 gcc/config/s390/s390.h |  3 +++
 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c | 15 +++
 2 files changed, 18 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c

diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index be566215df2..7d078ce6868 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -368,6 +368,9 @@ extern const char *s390_host_detect_local_cpu (int argc, const char **argv);
 /* Allocation boundary (in *bits*) for the code of a function.  */
 #define FUNCTION_BOUNDARY 64
 
+/* Alignment required for a code label, in bits.  */
+#define CODE_LABEL_BOUNDARY 16
+
 /* There is no point aligning anything to a rounder boundary than this.  */
 #define BIGGEST_ALIGNMENT 64
 
diff --git a/gcc/testsuite/gcc.target/s390/asan-no-gotoff.c b/gcc/testsuite/gcc.target/s390/asan-no-gotoff.c
new file mode 100644
index 000..f555e4e96f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/asan-no-gotoff.c
@@ -0,0 +1,15 @@
+/* Test that ASAN labels are referenced without unnecessary indirections.  */
+
+/* { dg-do compile } */
+/* { dg-options "-fPIE -O2 -fsanitize=kernel-address --param asan-stack=1" } */
+
+extern void c (int *);
+
+void a ()
+{
+  int b;
+  c (&b);
+}
+
+/* { dg-final { scan-assembler {\tlarl\t%r\d+,\.LASANPC\d+} } } */
+/* { dg-final { scan-assembler-not {\.LASANPC\d+@GOTOFF} } } */
-- 
2.37.2



[PATCH v5 1/2] asan: specify alignment for LASANPC labels

2022-09-26 Thread Ilya Leoshkevich via Gcc-patches
gcc/ChangeLog:

2020-06-30  Ilya Leoshkevich  

* asan.cc (asan_emit_stack_protection): Use CODE_LABEL_BOUNDARY.
* defaults.h (CODE_LABEL_BOUNDARY): New macro.
* doc/tm.texi: Document CODE_LABEL_BOUNDARY.
* doc/tm.texi.in: Likewise.
---
 gcc/asan.cc| 1 +
 gcc/defaults.h | 5 +
 gcc/doc/tm.texi| 4 
 gcc/doc/tm.texi.in | 4 
 4 files changed, 14 insertions(+)

diff --git a/gcc/asan.cc b/gcc/asan.cc
index 8276f12cc69..62f50ee769b 100644
--- a/gcc/asan.cc
+++ b/gcc/asan.cc
@@ -1960,6 +1960,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned int alignb,
   DECL_INITIAL (decl) = decl;
   TREE_ASM_WRITTEN (decl) = 1;
   TREE_ASM_WRITTEN (id) = 1;
+  SET_DECL_ALIGN (decl, CODE_LABEL_BOUNDARY);
   emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
   shadow_base = expand_binop (Pmode, lshr_optab, base,
  gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
diff --git a/gcc/defaults.h b/gcc/defaults.h
index 953605c1627..52a471cf08e 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -1455,4 +1455,9 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 typedef TARGET_UNIT target_unit;
 #endif
 
+/* Alignment required for a code label, in bits.  */
+#ifndef CODE_LABEL_BOUNDARY
+#define CODE_LABEL_BOUNDARY BITS_PER_UNIT
+#endif
+
 #endif  /* ! GCC_DEFAULTS_H */
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 858bfb80cec..cc588ee23b5 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -1075,6 +1075,10 @@ to a value equal to or larger than @code{STACK_BOUNDARY}.
 Alignment required for a function entry point, in bits.
 @end defmac
 
+@defmac CODE_LABEL_BOUNDARY
+Alignment required for a code label, in bits.
+@end defmac
+
 @defmac BIGGEST_ALIGNMENT
 Biggest alignment that any data type can require on this machine, in
 bits.  Note that this is not the biggest alignment that is supported,
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 21b849ea32a..a0b725b0685 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -971,6 +971,10 @@ to a value equal to or larger than @code{STACK_BOUNDARY}.
 Alignment required for a function entry point, in bits.
 @end defmac
 
+@defmac CODE_LABEL_BOUNDARY
+Alignment required for a code label, in bits.
+@end defmac
+
 @defmac BIGGEST_ALIGNMENT
 Biggest alignment that any data type can require on this machine, in
 bits.  Note that this is not the biggest alignment that is supported,
-- 
2.37.2



[PATCH v5 0/2] IBM zSystems: Improve storing asan frame_pc

2022-09-26 Thread Ilya Leoshkevich via Gcc-patches
Hi,

This is a resend of v4 with slightly adjusted commit messages:

v1: https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525016.html
v2: https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525069.html
v3: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548338.html
v4: https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549252.html

It still survives the bootstrap and the regtest on x86_64-redhat-linux,
s390x-redhat-linux and ppc64le-redhat-linux.  It also fixes [1].

I also tried the approach with moving .LASANPC closer to the function
label and using FUNCTION_BOUNDARY instead of introducing
CODE_LABEL_BOUNDARY, but the problem there is that it's hard to catch
the moment where the function label is written.  Architectures can do
it by calling ASM_OUTPUT_LABEL() or assemble_name() in
ASM_DECLARE_FUNCTION_NAME(), ASM_OUTPUT_FUNCTION_LABEL() or
TARGET_ASM_FUNCTION_PROLOGUE().  epiphany_start_function() does that
twice, but passes the same decl to both calls.  Note that simply
moving asan_function_start() to final_start_function_1() is not enough,
since an architecture can write something after the function label.
This all means that for this approach to work, all the architectures
need to be adjusted, which looks like overkill to me.

Best regards,
Ilya

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-April/593666.html


Ilya Leoshkevich (2):
  asan: specify alignment for LASANPC labels
  IBM zSystems: Define CODE_LABEL_BOUNDARY

 gcc/asan.cc|  1 +
 gcc/config/s390/s390.h |  3 +++
 gcc/defaults.h |  5 +
 gcc/doc/tm.texi|  4 
 gcc/doc/tm.texi.in |  4 
 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c | 15 +++
 6 files changed, 32 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/asan-no-gotoff.c

-- 
2.37.2



Re: [PATCH] PR106342 - IBM zSystems: Provide vsel for all vector modes

2022-08-17 Thread Ilya Leoshkevich via Gcc-patches
On Thu, 2022-08-11 at 07:45 +0200, Andreas Krebbel wrote:
> On 8/10/22 13:42, Ilya Leoshkevich wrote:
> > On Wed, 2022-08-03 at 12:20 +0200, Ilya Leoshkevich wrote:
> > > Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?
> > > 
> > > 
> > > 
> > > dg.exp=pr104612.c fails with an ICE on s390x, because
> > > copysignv2sf3
> > > produces an insn that vsel is supposed to recognize, but
> > > can't,
> > > because it's not defined for V2SF.  Fix by defining it for all
> > > vector
> > > modes supported by copysign<mode>3.
> > > 
> > > gcc/ChangeLog:
> > > 
> > > * config/s390/vector.md (V_HW_FT): New iterator.
> > > * config/s390/vx-builtins.md (vsel): Use V instead
> > > of
> > > V_HW.
> > > ---
> > >  gcc/config/s390/vector.md  |  6 ++
> > >  gcc/config/s390/vx-builtins.md | 12 ++--
> > >  2 files changed, 12 insertions(+), 6 deletions(-)
> > 
> > Jakub pointed out that this is broken in gcc-12 as well.
> > The patch applies cleanly, and I started a bootstrap/regtest.
> > Ok for gcc-12?
> 
> Yes. Thanks!
> 
> Andreas

Hi,

I've committed this today without realizing that gcc-12 branch is
closed.  Sorry!  Please let me know if I should revert this.

Best regards,
Ilya


Re: [PATCH] PR106342 - IBM zSystems: Provide vsel for all vector modes

2022-08-10 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2022-08-03 at 12:20 +0200, Ilya Leoshkevich wrote:
> Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?
> 
> 
> 
> dg.exp=pr104612.c fails with an ICE on s390x, because copysignv2sf3
> produces an insn that vsel is supposed to recognize, but can't,
> because it's not defined for V2SF.  Fix by defining it for all vector
> modes supported by copysign<mode>3.
> 
> gcc/ChangeLog:
> 
> * config/s390/vector.md (V_HW_FT): New iterator.
> * config/s390/vx-builtins.md (vsel): Use V instead of
> V_HW.
> ---
>  gcc/config/s390/vector.md  |  6 ++
>  gcc/config/s390/vx-builtins.md | 12 ++--
>  2 files changed, 12 insertions(+), 6 deletions(-)

Jakub pointed out that this is broken in gcc-12 as well.
The patch applies cleanly, and I started a bootstrap/regtest.
Ok for gcc-12?


[PATCH] PR106342 - IBM zSystems: Provide vsel for all vector modes

2022-08-03 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?



dg.exp=pr104612.c fails with an ICE on s390x, because copysignv2sf3
produces an insn that vsel is supposed to recognize, but can't,
because it's not defined for V2SF.  Fix by defining it for all vector
modes supported by copysign<mode>3.
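
For readers unfamiliar with the pattern, here is a scalar sketch of the
bitwise select that the vsel RTL describes, and of how a copysign can
be expressed with it.  This is an illustration in plain C, not the
s390 expander: the helper names are made up, and it assumes a 32-bit
IEEE 754 float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bitwise select with the operand roles of the vsel pattern: take the
   bits of A where MASK is set and the bits of B elsewhere.  */
static uint32_t
bit_select (uint32_t a, uint32_t b, uint32_t mask)
{
  return (a & mask) | (b & ~mask);
}

/* copysign built on bit_select: the sign bit comes from SIGN, all
   other bits come from MAGNITUDE.  */
static float
copysign_via_select (float magnitude, float sign)
{
  uint32_t m, s, r;
  float result;

  assert (sizeof (float) == sizeof (uint32_t));
  memcpy (&m, &magnitude, sizeof m);
  memcpy (&s, &sign, sizeof s);
  r = bit_select (s, m, UINT32_C (0x80000000));
  memcpy (&result, &r, sizeof result);
  return result;
}
```

For instance, copysign_via_select (1.5f, -2.0f) gives -1.5f.  The
vector version does the same thing per element, which is why the
select pattern has to exist for every mode the copysign expander
handles, V2SF included.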

gcc/ChangeLog:

* config/s390/vector.md (V_HW_FT): New iterator.
* config/s390/vx-builtins.md (vsel): Use V instead of
V_HW.
---
 gcc/config/s390/vector.md  |  6 ++
 gcc/config/s390/vx-builtins.md | 12 ++--
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index a6c4b4eb974..624729814af 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -63,6 +63,12 @@
   V1DF V2DF
   (V1TF "TARGET_VXE") (TF "TARGET_VXE")])
 
+; All modes present in V_HW and VFT.
+(define_mode_iterator V_HW_FT [V16QI V8HI V4SI V2DI (V1TI "TARGET_VXE") V1DF
+  V2DF (V1SF "TARGET_VXE") (V2SF "TARGET_VXE")
+  (V4SF "TARGET_VXE") (V1TF "TARGET_VXE")
+  (TF "TARGET_VXE")])
+
 ; FP vector modes directly supported by the HW.  This does not include
 ; vector modes using only part of a vector register and should be used
 ; for instructions which might trigger IEEE exceptions.
diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index d5130799804..98ee08b2683 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -517,12 +517,12 @@
 ; swapped in s390-c.cc when we get here.
 
 (define_insn "vsel<mode>"
-  [(set (match_operand:V_HW  0 "register_operand" "=v")
-        (ior:V_HW
-         (and:V_HW (match_operand:V_HW   1 "register_operand"  "v")
-                   (match_operand:V_HW   3 "register_operand"  "v"))
-         (and:V_HW (not:V_HW (match_dup 3))
-                   (match_operand:V_HW   2 "register_operand"  "v"))))]
+  [(set (match_operand:V_HW_FT   0 "register_operand" "=v")
+        (ior:V_HW_FT
+         (and:V_HW_FT (match_operand:V_HW_FT 1 "register_operand"  "v")
+                      (match_operand:V_HW_FT 3 "register_operand"  "v"))
+         (and:V_HW_FT (not:V_HW_FT (match_dup 3))
+                      (match_operand:V_HW_FT 2 "register_operand"  "v"))))]
   "TARGET_VX"
   "vsel\t%v0,%1,%2,%3"
   [(set_attr "op_type" "VRR")])
-- 
2.35.3
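
For readers unfamiliar with the pattern: the RTL that vsel matches, (ior (and op1 op3) (and (not op3) op2)), is a plain bitwise select with operand 3 as the mask. The following is a minimal scalar sketch in C of how a copysign can be phrased as such a select. The helper names are hypothetical, and a single 32-bit float lane stands in for a V2SF vector; it is not the code GCC generates.

```c
#include <stdint.h>
#include <string.h>

/* Bitwise select with the same shape as the RTL in the vsel pattern:
   (a & mask) | (~mask & b).  Hypothetical helper, scalar instead of
   vector.  */
static uint32_t
bit_select (uint32_t a, uint32_t b, uint32_t mask)
{
  return (a & mask) | (~mask & b);
}

/* copysign for one float lane: magnitude bits from MAG, sign bit from
   SGN.  Assumes IEEE 754 binary32 floats.  */
static float
copysign_via_select (float mag, float sgn)
{
  uint32_t m, s, r;
  float out;

  memcpy (&m, &mag, sizeof m);
  memcpy (&s, &sgn, sizeof s);
  /* The mask takes the sign bit from S and every other bit from M.  */
  r = bit_select (s, m, UINT32_C (0x80000000));
  memcpy (&out, &r, sizeof out);
  return out;
}
```

Because the operation is purely bitwise, nothing about it depends on the element type, which is consistent with defining the insn for more modes than V_HW.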



Re: [PATCH] Honor COMDAT for mergeable constant sections

2022-04-29 Thread Ilya Leoshkevich via Gcc-patches
On Fri, 2022-04-29 at 13:56 +0200, Jakub Jelinek wrote:
> On Fri, Apr 29, 2022 at 01:52:49PM +0200, Ilya Leoshkevich wrote:
> > > This doesn't resolve the problem, unfortunately, because
> > > references to discarded comdat symbols are still kept in .rodata:
> > > 
> > > `.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_' referenced
> > > in
> > > section `.rodata' of ../lib/libgtest.a(gtest-all.cc.o): defined
> > > in
> > > discarded section
> > > `.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_[_ZN7testing15
> > > Asse
> > > rt
> > > ionResultlsIPKcEERS0_RKT_]' of ../lib/libgtest.a(gtest-all.cc.o)
> > > 
> > > (That's from building zlib-ng with ASan and your patch on s390).
> > > 
> > > So I was rather thinking about adding a reloc parameter to
> > > mergeable_constant_section () and slightly changing the section
> > > name when it's nonzero, e.g. from .cst to .cstrel.
> > 
> > After some experimenting, I don't think that what I propose here
> > is a good solution anymore, since it won't work with
> > -fno-merge-constants.
> > 
> > What do you think about something like this?
> > 
> > --- a/gcc/varasm.cc
> > +++ b/gcc/varasm.cc
> > @@ -7326,6 +7326,10 @@ default_elf_select_rtx_section (machine_mode
> > mode, rtx x,
> >     return get_named_section (NULL, ".data.rel.ro", 3);
> >  }
> >  
> > +  if (reloc)
> > +    return targetm.asm_out.function_rodata_section
> > (current_function_decl,
> > +   false);
> > +
> >    return mergeable_constant_section (mode, align, 0);
> >  }
> > 
> > This would put constants with relocations into .rodata..
> > default_function_rodata_section () already ensures that these
> > sections
> > are in the right comdat group.
> 
> We don't really know if the emitted constant is purely for the
> current
> function, or also other functions (say emitted in as constant pool
> constant
> where constant pool constants are shared across the whole TU).
> For the former, putting it into current function's comdat is fine,
> for the
> latter certainly isn't.

mergeable_constant_section (), that the existing code calls in the same
context, already relies on this being known and calls
function_rodata_section () with exactly the same arguments.  If
!current_function_decl && !relocatable, we get readonly_data_section.
Of course, mergeable_constant_section () does not handle comdat
currently, so this point might be moot.

However, looking at the callers of output_constant_pool_contents (), it
seems that !current_function_decl happens in and only in the
shared_constant_pool case, so it looks as if we know whether the
constant is tied to a single function or not.
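
To make the decision order under discussion easier to follow, here is a toy model in C.  It is only a sketch: the enum and function names are hypothetical, the two boolean inputs stand in for "reloc & targetm.asm_out.reloc_rw_mask ()" and "current_function_decl != NULL", and the real function_rodata_section () hook has more cases than shown.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the sections discussed above.  */
enum toy_section
{
  TOY_DATA_REL_RO,      /* .data.rel.ro* (relocation needs a writable section) */
  TOY_FUNCTION_RODATA,  /* per-function rodata section (comdat-aware) */
  TOY_RODATA,           /* plain .rodata, i.e. readonly_data_section */
  TOY_MERGEABLE         /* .rodata.cstN mergeable section */
};

/* Toy model of the decision order with the proposed "if (reloc)" branch.  */
static enum toy_section
toy_select_rtx_section (bool reloc, bool reloc_needs_rw, bool have_function)
{
  if (reloc && reloc_needs_rw)
    return TOY_DATA_REL_RO;
  if (reloc)
    /* Proposed branch: keep relocated constants with their function;
       without a current function, fall back to plain .rodata.  */
    return have_function ? TOY_FUNCTION_RODATA : TOY_RODATA;
  /* No relocations: the constant is safe to merge.  */
  return TOY_MERGEABLE;
}
```

In this model only relocation-free constants ever reach a mergeable section, which is the property both proposals in the thread try to guarantee.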


Re: [PATCH] Honor COMDAT for mergeable constant sections

2022-04-29 Thread Ilya Leoshkevich via Gcc-patches
On Thu, 2022-04-28 at 14:05 +0200, Ilya Leoshkevich wrote:
> On Thu, 2022-04-28 at 13:27 +0200, Jakub Jelinek wrote:
> > On Thu, Apr 28, 2022 at 01:03:26PM +0200, Ilya Leoshkevich wrote:
> > > This is determined by default_elf_select_rtx_section ().  If we
> > > don't
> > > want to mix non-reloc and reloc constants, we need to define a
> > > special
> > > section there.
> > > 
> > > It seems to me, however, that this all would be made purely for
> > > the
> > > sake of .LASANPC, which is quite special: it's local, but at the
> > > same
> > > time it might need to be comdat.  I don't think anything like
> > > this
> > > can
> > > appear from compiling C/C++ code.
> > > 
> > > Therefore I wonder if we could just drop it altogether like this?
> > > 
> > > @@ -1928,22 +1919,7 @@ asan_emit_stack_protection (rtx base, rtx
> > > pbase,
> > > unsigned int alignb,
> > > ...
> > > -  emit_move_insn (mem, expand_normal (build_fold_addr_expr
> > > (decl)));
> > > +  emit_move_insn (mem, expand_normal (build_fold_addr_expr
> > > (current_function_decl)));
> > > ...
> > > 
> > > That's what LLVM is already doing.  This will also solve the
> > > alignment
> > > problem I referred to earlier.
> > 
> > LLVM is doing a wrong thing here.  The global symbol can be
> > overridden by
> > a symbol in another shared library, that is definitely not what we
> > want,
> > because the ASAN record is for the particular implementation, not
> > the
> > other
> > one which could be quite different.
> 
> I see; this must be relevant when the overriding library calls
> the original one through dlsym (RTLD_NEXT).
> 
> > I think the right fix would be:
> > > --- gcc/varasm.cc.jj 2022-03-07 15:00:17.255592497 +0100
> > +++ gcc/varasm.cc   2022-04-28 13:22:44.463147066 +0200
> > @@ -7326,6 +7326,9 @@ default_elf_select_rtx_section (machine_
> > return get_named_section (NULL, ".data.rel.ro", 3);
> >  }
> >  
> > +  if (reloc)
> > +    return readonly_data_section;
> > +
> >    return mergeable_constant_section (mode, align, 0);
> >  }
> >  
> > which matches what we do in categorize_decl_for_section:
> >   else if (reloc & targetm.asm_out.reloc_rw_mask ())
> >     ret = reloc == 1 ? SECCAT_DATA_REL_RO_LOCAL :
> > SECCAT_DATA_REL_RO;
> >   else if (reloc || flag_merge_constants < 2
> > ...
> >     /* C and C++ don't allow different variables to share the
> > same
> >    location.  -fmerge-all-constants allows even that (at
> > the
> >    expense of not conforming).  */
> >     ret = SECCAT_RODATA;
> >   else if (DECL_INITIAL (decl)
> >    && TREE_CODE (DECL_INITIAL (decl)) == STRING_CST)
> >     ret = SECCAT_RODATA_MERGE_STR_INIT;
> >   else
> >     ret = SECCAT_RODATA_MERGE_CONST;
> > i.e. if reloc is true, it goes into .data.rel.ro* for -fpic and
> > .rodata
> > for non-pic, and mergeable sections are only used if there are no
> > relocations.
> 
> This doesn't resolve the problem, unfortunately, because
> references to discarded comdat symbols are still kept in .rodata:
> 
> `.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_' referenced in
> section `.rodata' of ../lib/libgtest.a(gtest-all.cc.o): defined in
> discarded section
> `.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_[_ZN7testing15Asse
> rt
> ionResultlsIPKcEERS0_RKT_]' of ../lib/libgtest.a(gtest-all.cc.o)
> 
> (That's from building zlib-ng with ASan and your patch on s390).
> 
> So I was rather thinking about adding a reloc parameter to
> mergeable_constant_section () and slightly changing the section
> name when it's nonzero, e.g. from .cst to .cstrel.

After some experimenting, I don't think that what I propose here
is a good solution anymore, since it won't work with
-fno-merge-constants.

What do you think about something like this?

--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -7326,6 +7326,10 @@ default_elf_select_rtx_section (machine_mode
mode, rtx x,
return get_named_section (NULL, ".data.rel.ro", 3);
 }
 
+  if (reloc)
+return targetm.asm_out.function_rodata_section
(current_function_decl,
+   false);
+
   return mergeable_constant_section (mode, align, 0);
 }

This would put constants with relocations into .rodata..
default_function_rodata_section () already ensures that these sections
are in the right comdat group.


Re: [PATCH] Honor COMDAT for mergeable constant sections

2022-04-28 Thread Ilya Leoshkevich via Gcc-patches
On Thu, 2022-04-28 at 13:27 +0200, Jakub Jelinek wrote:
> On Thu, Apr 28, 2022 at 01:03:26PM +0200, Ilya Leoshkevich wrote:
> > This is determined by default_elf_select_rtx_section ().  If we
> > don't
> > want to mix non-reloc and reloc constants, we need to define a
> > special
> > section there.
> > 
> > It seems to me, however, that this all would be made purely for the
> > sake of .LASANPC, which is quite special: it's local, but at the
> > same
> > time it might need to be comdat.  I don't think anything like this
> > can
> > appear from compiling C/C++ code.
> > 
> > Therefore I wonder if we could just drop it altogether like this?
> > 
> > @@ -1928,22 +1919,7 @@ asan_emit_stack_protection (rtx base, rtx
> > pbase,
> > unsigned int alignb,
> > ...
> > -  emit_move_insn (mem, expand_normal (build_fold_addr_expr
> > (decl)));
> > +  emit_move_insn (mem, expand_normal (build_fold_addr_expr
> > (current_function_decl)));
> > ...
> > 
> > That's what LLVM is already doing.  This will also solve the
> > alignment
> > problem I referred to earlier.
> 
> LLVM is doing a wrong thing here.  The global symbol can be
> overridden by
> a symbol in another shared library, that is definitely not what we
> want,
> because the ASAN record is for the particular implementation, not the
> other
> one which could be quite different.

I see; this must be relevant when the overriding library calls
the original one through dlsym (RTLD_NEXT).

> I think the right fix would be:
> --- gcc/varasm.cc.jj 2022-03-07 15:00:17.255592497 +0100
> +++ gcc/varasm.cc   2022-04-28 13:22:44.463147066 +0200
> @@ -7326,6 +7326,9 @@ default_elf_select_rtx_section (machine_
> return get_named_section (NULL, ".data.rel.ro", 3);
>  }
>  
> +  if (reloc)
> +    return readonly_data_section;
> +
>    return mergeable_constant_section (mode, align, 0);
>  }
>  
> which matches what we do in categorize_decl_for_section:
>   else if (reloc & targetm.asm_out.reloc_rw_mask ())
>     ret = reloc == 1 ? SECCAT_DATA_REL_RO_LOCAL :
> SECCAT_DATA_REL_RO;
>   else if (reloc || flag_merge_constants < 2
> ...
>     /* C and C++ don't allow different variables to share the
> same
>    location.  -fmerge-all-constants allows even that (at the
>    expense of not conforming).  */
>     ret = SECCAT_RODATA;
>   else if (DECL_INITIAL (decl)
>    && TREE_CODE (DECL_INITIAL (decl)) == STRING_CST)
>     ret = SECCAT_RODATA_MERGE_STR_INIT;
>   else
>     ret = SECCAT_RODATA_MERGE_CONST;
> i.e. if reloc is true, it goes into .data.rel.ro* for -fpic and
> .rodata
> for non-pic, and mergeable sections are only used if there are no
> relocations.

This doesn't resolve the problem, unfortunately, because
references to discarded comdat symbols are still kept in .rodata:

`.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_' referenced in
section `.rodata' of ../lib/libgtest.a(gtest-all.cc.o): defined in
discarded section
`.text._ZN7testing15AssertionResultlsIPKcEERS0_RKT_[_ZN7testing15Assert
ionResultlsIPKcEERS0_RKT_]' of ../lib/libgtest.a(gtest-all.cc.o)

(That's from building zlib-ng with ASan and your patch on s390).

So I was rather thinking about adding a reloc parameter to
mergeable_constant_section () and slightly changing the section
name when it's nonzero, e.g. from .cst to .cstrel.
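
A rough sketch of what that naming idea could look like, in C.  The "%s.cst%d" shape follows the sprintf in mergeable_constant_section () shown in the original patch; the reloc parameter, the ".cstrel" spelling and the helper name are only the idea floated here, not existing GCC code.

```c
#include <stdio.h>

/* Build a mergeable-constant section name into BUF.  ALIGN_BITS is the
   alignment in bits, as in the original sprintf.  Hypothetical helper;
   returns the value of snprintf.  */
static int
toy_mergeable_section_name (char *buf, size_t size, const char *prefix,
                            unsigned int align_bits, int reloc)
{
  return snprintf (buf, size, "%s.%s%u", prefix,
                   reloc ? "cstrel" : "cst", align_bits / 8);
}
```

With prefix ".rodata" and 64-bit alignment this yields ".rodata.cst8" without relocations and ".rodata.cstrel8" with them, so the two kinds of constants would never share a section.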

> Anyway, I'd feel much safer to change it only in GCC 13, at least
> initially.

That's fine with me.

> Or are some linkers (say lld or mold, for ld.bfd I'm pretty sure it
> doesn't,
> for gold no idea but unlikely) able to merge even constants with
> relocations against them?

I'm not sure, but putting constants with relocations into a separate
mergeable section shouldn't hurt too much.  And if such a linker is
implemented some day, there would be no need to tweak gcc.


Re: [PATCH] Honor COMDAT for mergeable constant sections

2022-04-28 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2022-04-27 at 14:46 +0200, Jakub Jelinek wrote:
> On Wed, Apr 27, 2022 at 02:23:00PM +0200, Jakub Jelinek wrote:
> > On Wed, Apr 27, 2022 at 11:59:49AM +0200, Ilya Leoshkevich wrote:
> > > I get a .LASANPC reloc there in the first place because of
> > > https://patchwork.ozlabs.org/project/gcc/patch/20190702085154.26981-1-...@linux.ibm.com/
> > > but of course it may happen for other reasons as well.
> > 
> > In that case I don't see any benefit to put that into a mergeable
> > section.
> > Why does that happen?
> 
> Because, when a mergeable section doesn't contain any relocations, I
> don't
> see any point in making it comdat.  Because mergeable sections
> themselves
> are garbage collected, if some constant isn't referenced at all, it
> isn't
> emitted, or if referenced, multiple copies of the constant are merged
> (or
> for mergeable strings even string tail merging is performed).
> 
> Jakub
> 

This is determined by default_elf_select_rtx_section ().  If we don't
want to mix non-reloc and reloc constants, we need to define a special
section there.

It seems to me, however, that this all would be made purely for the
sake of .LASANPC, which is quite special: it's local, but at the same
time it might need to be comdat.  I don't think anything like this can
appear from compiling C/C++ code.

Therefore I wonder if we could just drop it altogether like this?

@@ -1928,22 +1919,7 @@ asan_emit_stack_protection (rtx base, rtx pbase,
unsigned int alignb,
...
-  emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
+  emit_move_insn (mem, expand_normal (build_fold_addr_expr
(current_function_decl)));
...

That's what LLVM is already doing.  This will also solve the alignment
problem I referred to earlier.


Re: [PATCH] Honor COMDAT for mergeable constant sections

2022-04-27 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2022-04-27 at 11:59 +0200, Ilya Leoshkevich via Gcc-patches
wrote:
> On Wed, 2022-04-27 at 11:33 +0200, Jakub Jelinek wrote:
> > On Wed, Apr 27, 2022 at 11:27:49AM +0200, Ilya Leoshkevich via Gcc-
> > patches wrote:
> > > Bootstrapped and regtested on x86_64-redhat-linux and
> > > s390x-redhat-linux.  Ok for master (or GCC 13 in case this
> > > doesn't
> > > fit
> > > stage4 criteria)?
> > 
> > I'd prefer to defer this to GCC 13 at this point.
> > Furthermore, does the linker then actually merge the constants with
> > the same constants from other mergeable linkonce sections or other
> > mergeable sections?  I'm afraid it would only merge constants
> > within
> > each comdat group and not across the whole ELF object.
> > 
> > Jakub
> > 
> 
> I experimented with this a little, and actually having a reloc
> prevents
> merging altogether (the check happens in _bfd_add_merge_section).
> 
> I get a .LASANPC reloc there in the first place because of
> https://patchwork.ozlabs.org/project/gcc/patch/20190702085154.26981-1-...@linux.ibm.com/
> but of course it may happen for other reasons as well.

I just realized I forgot to mention the "normal" case.
There, "aMG" seems to work fine with the whole ELF:

$ cat 1.s
.globl _start
_start:
ret
.section .rodata.xxx,"aMG",@progbits,8,.xxx,comdat
.quad 42

$ cat 2.s
.section .rodata.yyy,"aMG",@progbits,8,.yyy,comdat
.quad 42
.quad 43
.section .rodata.xxx,"aMG",@progbits,8,.xxx,comdat
.quad 42

$ gcc -nostartfiles -fPIE 1.s 2.s
$ objdump -D a.out

2000 <.rodata>:
2000:   2a 00   sub    (%rax),%al
2002:   00 00   add    %al,(%rax)
2004:   00 00   add    %al,(%rax)
2006:   00 00   add    %al,(%rax)
2008:   2b 00   sub    (%rax),%eax
200a:   00 00   add    %al,(%rax)
200c:   00 00   add    %al,(%rax)
...
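
The dump shows only two quads (42 and 43) surviving in .rodata.  As a toy illustration of that visible effect, the C sketch below deduplicates fixed-size 8-byte entries across several inputs.  It is an assumption-laden model with a hypothetical helper name: a real linker does not operate on value arrays like this, and it also discards duplicate comdat groups before any merging happens.

```c
#include <stddef.h>
#include <stdint.h>

/* Append each value from IN to OUT unless it is already present.
   OUT must have room for N_OUT + N_IN entries.  Returns the new number
   of entries in OUT.  */
static size_t
toy_merge_quads (uint64_t *out, size_t n_out, const uint64_t *in, size_t n_in)
{
  for (size_t i = 0; i < n_in; i++)
    {
      size_t j;

      /* Look for an identical entry that was already emitted.  */
      for (j = 0; j < n_out; j++)
        if (out[j] == in[i])
          break;
      if (j == n_out)
        out[n_out++] = in[i];
    }
  return n_out;
}
```

Feeding it the three input sections from 1.s and 2.s ({42}, {42, 43}, {42}) leaves two entries, matching the 16 bytes seen at 0x2000.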



Re: [PATCH] Honor COMDAT for mergeable constant sections

2022-04-27 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2022-04-27 at 11:33 +0200, Jakub Jelinek wrote:
> On Wed, Apr 27, 2022 at 11:27:49AM +0200, Ilya Leoshkevich via Gcc-
> patches wrote:
> > Bootstrapped and regtested on x86_64-redhat-linux and
> > s390x-redhat-linux.  Ok for master (or GCC 13 in case this doesn't
> > fit
> > stage4 criteria)?
> 
> I'd prefer to defer this to GCC 13 at this point.
> Furthermore, does the linker then actually merge the constants with
> the same constants from other mergeable linkonce sections or other
> mergeable sections?  I'm afraid it would only merge constants within
> each comdat group and not across the whole ELF object.
> 
> Jakub
> 

I experimented with this a little, and actually having a reloc prevents
merging altogether (the check happens in _bfd_add_merge_section).

I get a .LASANPC reloc there in the first place because of
https://patchwork.ozlabs.org/project/gcc/patch/20190702085154.26981-1-...@linux.ibm.com/
but of course it may happen for other reasons as well.


[PATCH] Honor COMDAT for mergeable constant sections

2022-04-27 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on x86_64-redhat-linux and
s390x-redhat-linux.  Ok for master (or GCC 13 in case this doesn't fit
stage4 criteria)?



Building C++ template-heavy code with ASan sometimes leads to bogus
"defined in discarded section" linker errors.

The reason is that .rodata.FUNC.cstN sections are not placed into
COMDAT group sections FUNC.  This is important, because ASan puts
references to .LASANPC labels into these sections.  Discarding the
respective .text.FUNC section causes the linker error.

Fix by adding SECTION_LINKONCE to .rodata.FUNC.cstN sections in
mergeable_constant_section () if the current function has an associated
COMDAT group.  This is similar to what switch_to_exception_section ()
is currently doing with .gcc_except_table.FUNC sections.

gcc/ChangeLog:

* varasm.cc (mergeable_constant_section): Honor COMDAT.

gcc/testsuite/ChangeLog:

* g++.dg/asan/comdat.C: New test.
---
 gcc/testsuite/g++.dg/asan/comdat.C | 35 ++
 gcc/varasm.cc  |  6 -
 2 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/asan/comdat.C

diff --git a/gcc/testsuite/g++.dg/asan/comdat.C b/gcc/testsuite/g++.dg/asan/comdat.C
new file mode 100644
index 000..cd4f3f830a8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/asan/comdat.C
@@ -0,0 +1,35 @@
+/* Check that we don't emit non-COMDAT rodata.  */
+
+/* { dg-do compile } */
+/* { dg-final { scan-assembler-not {\.section\t\.rodata\._ZN1hlsIPKcEERS_RKT_\.cst[48],"[^"]*",@progbits,[48]\n} } } */
+
+const char *a;
+
+class b
+{
+public:
+  b ();
+};
+
+class h
+{
+public:
+  template <typename c>
+  h &
+  operator<< (const c &)
+  {
+    d (b ());
+    return *this;
+  }
+
+  void d (b);
+};
+
+h e ();
+
+h
+g ()
+{
+  e () << a << a << a;
+  throw;
+}
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index c41f17d64f7..f2614f0ee39 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -938,7 +938,11 @@ mergeable_constant_section (machine_mode mode ATTRIBUTE_UNUSED,
 
   sprintf (name, "%s.cst%d", prefix, (int) (align / 8));
   flags |= (align / 8) | SECTION_MERGE;
-  return get_section (name, flags, NULL);
+  if (current_function_decl
+ && DECL_COMDAT_GROUP (current_function_decl)
+ && HAVE_COMDAT_GROUP)
+   flags |= SECTION_LINKONCE;
+  return get_section (name, flags, current_function_decl);
 }
   return readonly_data_section;
 }
-- 
2.35.1



[PATCH][GCC11] IBM Z: fix `section type conflict` with -mindirect-branch-table

2022-02-02 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for
releases/gcc-11?



s390_code_end () puts indirect branch tables into separate sections and
tries to switch back to wherever it was in the beginning by calling
switch_to_section (current_function_section ()).

First of all, this is unnecessary - the other backends don't do it.

Furthermore, at this time there is no current function, but if the
last processed function was cold, in_cold_section_p remains set.  This
causes targetm.asm_out.function_section () to call
targetm.section_type_flags (), which in the absence of a current
function decl classifies the section as SECTION_WRITE.  This causes a
section type conflict with the existing SECTION_CODE.

gcc/ChangeLog:

* config/s390/s390.c (s390_code_end): Do not switch back to
code section.

gcc/testsuite/ChangeLog:

* gcc.target/s390/nobp-section-type-conflict.c: New test.

(cherry picked from commit 8753b13a31c777cdab0265dae0b68534247908f7)
---
 gcc/config/s390/s390.c|  1 -
 .../s390/nobp-section-type-conflict.c | 22 +++
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 8895dd7cc76..2d2e6522eb4 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16700,7 +16700,6 @@ s390_code_end (void)
  assemble_name_raw (asm_out_file, label_start);
  fputs ("-.\n", asm_out_file);
}
- switch_to_section (current_function_section ());
}
 }
 }
diff --git a/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c b/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c
new file mode 100644
index 000..5d78bc99bb5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c
@@ -0,0 +1,22 @@
+/* Checks that we don't get error: section type conflict with ‘put_page’.  */
+
+/* { dg-do compile } */
+/* { dg-options "-mindirect-branch=thunk-extern -mfunction-return=thunk-extern -mindirect-branch-table -O2" } */
+
+int a;
+int b (void);
+void c (int);
+
+static void
+put_page (void)
+{
+  if (b ())
+    c (a);
+}
+
+__attribute__ ((__section__ (".init.text"), __cold__)) void
+d (void)
+{
+  put_page ();
+  put_page ();
+}
-- 
2.34.1



[PATCH] IBM Z: fix `section type conflict` with -mindirect-branch-table

2022-02-01 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?


s390_code_end () puts indirect branch tables into separate sections and
tries to switch back to wherever it was in the beginning by calling
switch_to_section (current_function_section ()).

First of all, this is unnecessary - the other backends don't do it.

Furthermore, at this time there is no current function, but if the
last processed function was cold, in_cold_section_p remains set.  This
causes targetm.asm_out.function_section () to call
targetm.section_type_flags (), which in the absence of a current
function decl classifies the section as SECTION_WRITE.  This causes a
section type conflict with the existing SECTION_CODE.

gcc/ChangeLog:

* config/s390/s390.cc (s390_code_end): Do not switch back to
code section.

gcc/testsuite/ChangeLog:

* gcc.target/s390/nobp-section-type-conflict.c: New test.
---
 gcc/config/s390/s390.cc   |  1 -
 .../s390/nobp-section-type-conflict.c | 22 +++
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 43c5c72554a..2db12d4ba4b 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -16809,7 +16809,6 @@ s390_code_end (void)
  assemble_name_raw (asm_out_file, label_start);
  fputs ("-.\n", asm_out_file);
}
- switch_to_section (current_function_section ());
}
 }
 }
diff --git a/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c b/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c
new file mode 100644
index 000..5d78bc99bb5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c
@@ -0,0 +1,22 @@
+/* Checks that we don't get error: section type conflict with ‘put_page’.  */
+
+/* { dg-do compile } */
+/* { dg-options "-mindirect-branch=thunk-extern -mfunction-return=thunk-extern -mindirect-branch-table -O2" } */
+
+int a;
+int b (void);
+void c (int);
+
+static void
+put_page (void)
+{
+  if (b ())
+    c (a);
+}
+
+__attribute__ ((__section__ (".init.text"), __cold__)) void
+d (void)
+{
+  put_page ();
+  put_page ();
+}
-- 
2.34.1
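
As a rough illustration of why the stale in_cold_section_p ends in a "section type conflict", the C sketch below models a section registry in which a name is bound to one set of flags on first use.  Everything here (flag values, table, function name) is hypothetical and far simpler than GCC's get_section (); it only shows the shape of the failure.

```c
#include <string.h>

/* Hypothetical flag values; GCC's real SECTION_* constants differ.  */
enum { TOY_SECTION_CODE = 1, TOY_SECTION_WRITE = 2 };

struct toy_section
{
  const char *name;
  unsigned int flags;
};

static struct toy_section toy_table[16];
static int toy_count;

/* Return 0 on success, -1 on a "section type conflict": NAME was already
   registered with different flags (or the toy table is full).  */
static int
toy_get_section (const char *name, unsigned int flags)
{
  for (int i = 0; i < toy_count; i++)
    if (strcmp (toy_table[i].name, name) == 0)
      return toy_table[i].flags == flags ? 0 : -1;
  if (toy_count == 16)
    return -1;
  toy_table[toy_count].name = name;
  toy_table[toy_count].flags = flags;
  toy_count++;
  return 0;
}
```

In the test case, the section holding d () is first created as code; asking for the same name again with write flags, as happens without a current function decl, is what this model rejects.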



[PATCH gcc-11 2/2] IBM Z: Use @PLT symbols for local functions in 64-bit mode

2021-09-30 Thread Ilya Leoshkevich via Gcc-patches
This helps with generating code for kernel hotpatches, which contain
individual functions and are loaded more than 2G away from vmlinux.
This should not create performance regressions for the normal use
cases, because for local functions ld replaces @PLT calls with direct
calls.

gcc/ChangeLog:

* config/s390/predicates.md (bras_sym_operand): Accept all
functions in 64-bit mode, use UNSPEC_PLT31.
(larl_operand): Use UNSPEC_PLT31.
* config/s390/s390.c (s390_loadrelative_operand_p): Likewise.
(legitimize_pic_address): Likewise.
(s390_emit_tls_call_insn): Mark __tls_get_offset as function,
use UNSPEC_PLT31.
(s390_delegitimize_address): Use UNSPEC_PLT31.
(s390_output_addr_const_extra): Likewise.
(print_operand): Add @PLT to TLS calls, handle %K.
(s390_function_profiler): Mark __fentry__/_mcount as function,
use %K, use UNSPEC_PLT31.
(s390_output_mi_thunk): Use only UNSPEC_GOT, use %K.
(s390_emit_call): Use UNSPEC_PLT31.
(s390_emit_tpf_eh_return): Mark __tpf_eh_return as function.
* config/s390/s390.md (UNSPEC_PLT31): Rename from UNSPEC_PLT.
(*movdi_64): Use %K.
(reload_base_64): Likewise.
(*sibcall_brc): Likewise.
(*sibcall_brcl): Likewise.
(*sibcall_value_brc): Likewise.
(*sibcall_value_brcl): Likewise.
(*bras): Likewise.
(*brasl): Likewise.
(*bras_r): Likewise.
(*brasl_r): Likewise.
(*bras_tls): Likewise.
(*brasl_tls): Likewise.
(main_base_64): Likewise.
(reload_base_64): Likewise.
(@split_stack_call): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/visibility/noPLT.C: Skip on s390x.
* g++.target/s390/mi-thunk.C: New test.
* gcc.target/s390/nodatarel-1.c: Move foostatic to the new
tests.
* gcc.target/s390/pr80080-4.c: Allow @PLT suffix.
* gcc.target/s390/risbg-ll-3.c: Likewise.
* gcc.target/s390/call.h: Common code for the new tests.
* gcc.target/s390/call-z10-pic-nodatarel.c: New test.
* gcc.target/s390/call-z10-pic.c: New test.
* gcc.target/s390/call-z10.c: New test.
* gcc.target/s390/call-z9-pic-nodatarel.c: New test.
* gcc.target/s390/call-z9-pic.c: New test.
* gcc.target/s390/call-z9.c: New test.
* gcc.target/s390/mfentry-m64-pic.c: New test.
* gcc.target/s390/tls.h: Common code for the new TLS tests.
* gcc.target/s390/tls-pic.c: New test.
* gcc.target/s390/tls.c: New test.

(cherry picked from commit 0990d93dd8a)
---
 gcc/config/s390/predicates.md |  9 ++-
 gcc/config/s390/s390.c| 81 +--
 gcc/config/s390/s390.md   | 32 
 gcc/testsuite/g++.dg/ext/visibility/noPLT.C   |  2 +-
 gcc/testsuite/g++.target/s390/mi-thunk.C  | 23 ++
 .../gcc.target/s390/call-z10-pic-nodatarel.c  | 20 +
 gcc/testsuite/gcc.target/s390/call-z10-pic.c  | 20 +
 gcc/testsuite/gcc.target/s390/call-z10.c  | 20 +
 .../gcc.target/s390/call-z9-pic-nodatarel.c   | 18 +
 gcc/testsuite/gcc.target/s390/call-z9-pic.c   | 18 +
 gcc/testsuite/gcc.target/s390/call-z9.c   | 20 +
 gcc/testsuite/gcc.target/s390/call.h  | 40 +
 .../gcc.target/s390/mfentry-m64-pic.c |  9 +++
 gcc/testsuite/gcc.target/s390/nodatarel-1.c   | 26 +-
 gcc/testsuite/gcc.target/s390/pr80080-4.c |  2 +-
 gcc/testsuite/gcc.target/s390/risbg-ll-3.c|  6 +-
 gcc/testsuite/gcc.target/s390/tls-pic.c   | 14 
 gcc/testsuite/gcc.target/s390/tls.c   | 10 +++
 gcc/testsuite/gcc.target/s390/tls.h   | 23 ++
 19 files changed, 320 insertions(+), 73 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/s390/mi-thunk.C
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call.h
 create mode 100644 gcc/testsuite/gcc.target/s390/mfentry-m64-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/tls-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/tls.c
 create mode 100644 gcc/testsuite/gcc.target/s390/tls.h

diff --git a/gcc/config/s390/predicates.md b/gcc/config/s390/predicates.md
index 15093cb4b30..99c343aa32c 100644
--- a/gcc/config/s390/predicates.md
+++ b/gcc/config/s390/predicates.md
@@ -101,10 +101,13 @@
 
 (define_special_predicate "bras_sym_operand"
   (ior (and (match_code "symbol_ref")
-   (match_test "!flag_pic || SYMBOL_REF_LOCAL_P (op)"))
+   (ior (match_test "!flag_pic")
+(match_test 

[PATCH gcc-11 1/2] IBM Z: Define NO_PROFILE_COUNTERS

2021-09-30 Thread Ilya Leoshkevich via Gcc-patches
s390 glibc does not need counters in the .data section, since it stores
edge hits in its own data structure.  Therefore counters only waste
space and confuse diffing tools (e.g. kpatch), so don't generate them.

gcc/ChangeLog:

* config/s390/s390.c (s390_function_profiler): Ignore labelno
parameter.
* config/s390/s390.h (NO_PROFILE_COUNTERS): Define.

gcc/testsuite/ChangeLog:

* gcc.target/s390/mnop-mcount-m31-mzarch.c: Adapt to the new
prologue size.
* gcc.target/s390/mnop-mcount-m64.c: Likewise.

(cherry picked from commit a1c1b7a888a)
---
 gcc/config/s390/s390.c| 42 +++
 gcc/config/s390/s390.h|  2 +
 .../gcc.target/s390/mnop-mcount-m31-mzarch.c  |  2 +-
 .../gcc.target/s390/mnop-mcount-m64.c |  2 +-
 4 files changed, 20 insertions(+), 28 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index c5d4c439bcc..a863dfce9a2 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -13120,33 +13120,25 @@ output_asm_nops (const char *user, int hw)
 }
 }
 
-/* Output assembler code to FILE to increment profiler label # LABELNO
-   for profiling a function entry.  */
+/* Output assembler code to FILE to call a profiler hook.  */
 
 void
-s390_function_profiler (FILE *file, int labelno)
+s390_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
 {
-  rtx op[8];
-
-  char label[128];
-  ASM_GENERATE_INTERNAL_LABEL (label, "LP", labelno);
+  rtx op[4];
 
   fprintf (file, "# function profiler \n");
 
   op[0] = gen_rtx_REG (Pmode, RETURN_REGNUM);
   op[1] = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
   op[1] = gen_rtx_MEM (Pmode, plus_constant (Pmode, op[1], UNITS_PER_LONG));
-  op[7] = GEN_INT (UNITS_PER_LONG);
-
-  op[2] = gen_rtx_REG (Pmode, 1);
-  op[3] = gen_rtx_SYMBOL_REF (Pmode, label);
-  SYMBOL_REF_FLAGS (op[3]) = SYMBOL_FLAG_LOCAL;
+  op[3] = GEN_INT (UNITS_PER_LONG);
 
-  op[4] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? "__fentry__" : "_mcount");
+  op[2] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? "__fentry__" : "_mcount");
   if (flag_pic)
 {
-  op[4] = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op[4]), UNSPEC_PLT);
-  op[4] = gen_rtx_CONST (Pmode, op[4]);
+  op[2] = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op[2]), UNSPEC_PLT);
+  op[2] = gen_rtx_CONST (Pmode, op[2]);
 }
 
   if (flag_record_mcount)
@@ -13160,20 +13152,19 @@ s390_function_profiler (FILE *file, int labelno)
warning (OPT_Wcannot_profile, "nested functions cannot be profiled "
 "with %<-mfentry%> on s390");
   else
-   output_asm_insn ("brasl\t0,%4", op);
+   output_asm_insn ("brasl\t0,%2", op);
 }
   else if (TARGET_64BIT)
 {
   if (flag_nop_mcount)
-   output_asm_nops ("-mnop-mcount", /* stg */ 3 + /* larl */ 3 +
-/* brasl */ 3 + /* lg */ 3);
+   output_asm_nops ("-mnop-mcount", /* stg */ 3 + /* brasl */ 3 +
+/* lg */ 3);
   else
{
  output_asm_insn ("stg\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
-   output_asm_insn (".cfi_rel_offset\t%0,%7", op);
- output_asm_insn ("larl\t%2,%3", op);
- output_asm_insn ("brasl\t%0,%4", op);
+   output_asm_insn (".cfi_rel_offset\t%0,%3", op);
+ output_asm_insn ("brasl\t%0,%2", op);
  output_asm_insn ("lg\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
output_asm_insn (".cfi_restore\t%0", op);
@@ -13182,15 +13173,14 @@ s390_function_profiler (FILE *file, int labelno)
   else
 {
   if (flag_nop_mcount)
-   output_asm_nops ("-mnop-mcount", /* st */ 2 + /* larl */ 3 +
-/* brasl */ 3 + /* l */ 2);
+   output_asm_nops ("-mnop-mcount", /* st */ 2 + /* brasl */ 3 +
+/* l */ 2);
   else
{
  output_asm_insn ("st\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
-   output_asm_insn (".cfi_rel_offset\t%0,%7", op);
- output_asm_insn ("larl\t%2,%3", op);
- output_asm_insn ("brasl\t%0,%4", op);
+   output_asm_insn (".cfi_rel_offset\t%0,%3", op);
+ output_asm_insn ("brasl\t%0,%2", op);
  output_asm_insn ("l\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
output_asm_insn (".cfi_restore\t%0", op);
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index 3b876160420..fb16a455a03 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -787,6 +787,8 @@ CUMULATIVE_ARGS;
 
 #define PROFILE_BEFORE_PROLOGUE 1
 
+#define NO_PROFILE_COUNTERS 1
+
 
 /* Trampolines for nested functions.  */
 
diff --git a/gcc/testsuite/gcc.target/s390/mnop-mcount-m31-mzarch.c b/gcc/testsuite/gcc.target/s390/mnop-mcount-m31-mzarch.c
index b2ad9f5bced..874ceb96fe8 100644
--- a/gcc/testsuite/gcc.target/s390/mnop-mcount-m31-mzarch.c
+++ b/gcc/testsuite/gcc.target/s390/mnop-mcount-m31-mzarch.c
@@ -4,5 +4,5 @@
 void
 profileme 

[PATCH gcc-11 0/2] Backport kpatch changes

2021-09-30 Thread Ilya Leoshkevich via Gcc-patches
Hi,

This series contains a backport of kpatch changes needed to support
https://github.com/dynup/kpatch/pull/1203 so that it could be used in
RHEL 9.  The patches have been in master for 4 months now without
issues.

Bootstrapped and regtested on s390x-redhat-linux.

Ok for gcc-11?

Best regards,
Ilya

Ilya Leoshkevich (2):
  IBM Z: Define NO_PROFILE_COUNTERS
  IBM Z: Use @PLT symbols for local functions in 64-bit mode

 gcc/config/s390/predicates.md |   9 +-
 gcc/config/s390/s390.c| 115 +++---
 gcc/config/s390/s390.h|   2 +
 gcc/config/s390/s390.md   |  32 ++---
 gcc/testsuite/g++.dg/ext/visibility/noPLT.C   |   2 +-
 gcc/testsuite/g++.target/s390/mi-thunk.C  |  23 
 .../gcc.target/s390/call-z10-pic-nodatarel.c  |  20 +++
 gcc/testsuite/gcc.target/s390/call-z10-pic.c  |  20 +++
 gcc/testsuite/gcc.target/s390/call-z10.c  |  20 +++
 .../gcc.target/s390/call-z9-pic-nodatarel.c   |  18 +++
 gcc/testsuite/gcc.target/s390/call-z9-pic.c   |  18 +++
 gcc/testsuite/gcc.target/s390/call-z9.c   |  20 +++
 gcc/testsuite/gcc.target/s390/call.h  |  40 ++
 .../gcc.target/s390/mfentry-m64-pic.c |   9 ++
 .../gcc.target/s390/mnop-mcount-m31-mzarch.c  |   2 +-
 .../gcc.target/s390/mnop-mcount-m64.c |   2 +-
 gcc/testsuite/gcc.target/s390/nodatarel-1.c   |  26 +---
 gcc/testsuite/gcc.target/s390/pr80080-4.c |   2 +-
 gcc/testsuite/gcc.target/s390/risbg-ll-3.c|   6 +-
 gcc/testsuite/gcc.target/s390/tls-pic.c   |  14 +++
 gcc/testsuite/gcc.target/s390/tls.c   |  10 ++
 gcc/testsuite/gcc.target/s390/tls.h   |  23 
 22 files changed, 336 insertions(+), 97 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/s390/mi-thunk.C
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call.h
 create mode 100644 gcc/testsuite/gcc.target/s390/mfentry-m64-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/tls-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/tls.c
 create mode 100644 gcc/testsuite/gcc.target/s390/tls.h

-- 
2.31.1



Re: [PATCH v3 3/3] reassoc: Test rank biasing

2021-09-28 Thread Ilya Leoshkevich via Gcc-patches
On Tue, 2021-09-28 at 13:28 +0200, Richard Biener wrote:
> On Sun, 26 Sep 2021, Ilya Leoshkevich wrote:
> 
> > Add both positive and negative tests.
> 
> The tests will likely be quite fragile with respect to what is
> actually vectorized on which target.  If you move the tests
> to gcc.dg/vect/ you could at least do
> 
> /* { dg-require-effective-target vect_int } */
> 
> do you need to look for the exact GIMPLE IL or is it enough to
> verify we are vectorizing the reduction?

Actually I don't think vectorization is that important here, and I
only check how many times sum_x = sum_y + _z appears.  So I use
(?:vect_)?, which may or may not be there.

An alternative I considered was to use -fno-tree-vectorize to get
smaller regexes, but I thought it would be nice to know that
vectorization does not mess up reassociation results.
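To make the optional prefix concrete, here is a rough sketch in plain C.  It
is not what the testsuite runs (DejaGnu uses Tcl regexes); it is a POSIX ERE
approximation in which "(?:...)" becomes "(...)" and "\d" becomes "0-9".  The
name toy_line_matches and the sample dump lines in the note below are made
up for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* POSIX ERE approximation of the dg-final pattern (an assumption of this
   sketch, not the pattern the testsuite actually evaluates).  */
static const char toy_pattern[] =
  "(vect_)?sum_[0-9._]+ = "
  "((vect_)?_[0-9._]+ \\+ (vect_)?sum_[0-9._]+"
  "|(vect_)?sum_[0-9._]+ \\+ (vect_)?_[0-9._]+)";

/* Return true iff LINE contains "sum = tmp + sum" or "sum = sum + tmp",
   with or without the vect_ prefix on each name.  */
bool
toy_line_matches (const char *line)
{
  assert (line != NULL);

  regex_t re;
  if (regcomp (&re, toy_pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;

  bool matched = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}
```

With this, a hypothetical scalar line such as "sum_12 = _5 + sum_9;" and a
hypothetical vectorized line such as
"vect_sum_12.7 = vect_sum_9.4 + vect__5.3;" are both accepted, while
"sum_12 = _5 + _6;" is not, because no accumulator appears on the right-hand
side.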

Best regards,
Ilya



[PATCH v3 3/3] reassoc: Test rank biasing

2021-09-26 Thread Ilya Leoshkevich via Gcc-patches
Add both positive and negative tests.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/reassoc-46.c: New test.
* gcc.dg/tree-ssa/reassoc-46.h: Common code for new tests.
* gcc.dg/tree-ssa/reassoc-47.c: New test.
* gcc.dg/tree-ssa/reassoc-48.c: New test.
* gcc.dg/tree-ssa/reassoc-49.c: New test.
* gcc.dg/tree-ssa/reassoc-50.c: New test.
* gcc.dg/tree-ssa/reassoc-51.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c |  7 +
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h | 33 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c |  9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c |  9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c | 11 
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c | 10 +++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c | 11 
 7 files changed, 90 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
new file mode 100644
index 000..97563dd929f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */
+
+#include "reassoc-46.h"
+
+/* Check that the loop accumulator is added last.  */
+/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d._]+ \+ (?:vect_)?_[\d._]+)} 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
new file mode 100644
index 000..e60b490ea0d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
@@ -0,0 +1,33 @@
+#define M 1024
+unsigned int arr1[M];
+unsigned int arr2[M];
+volatile unsigned int sink;
+
+unsigned int
+test (void)
+{
+  unsigned int sum = 0;
+  for (int i = 0; i < M; i++)
+{
+#ifdef MODIFY
+  /* Modify the loop accumulator using a chain of operations - this should
+ not affect its rank biasing.  */
+  sum |= 1;
+  sum ^= 2;
+#endif
+#ifdef STORE
+  /* Save the loop accumulator into a global variable - this should not
+ affect its rank biasing.  */
+  sink = sum;
+#endif
+#ifdef USE
+  /* Add a tricky use of the loop accumulator - this should prevent its
+ rank biasing.  */
+  i = (i + sum) % M;
+#endif
+  /* Use addends with different ranks.  */
+  sum += arr1[i];
+  sum += arr2[((i ^ 1) + 1) % M];
+}
+  return sum;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
new file mode 100644
index 000..1b0f0fdabe1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */
+
+#define MODIFY
+#include "reassoc-46.h"
+
+/* Check that if the loop accumulator is modified using a chain of operations
+   other than addition, its new value is still added last.  */
+/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d._]+ \+ (?:vect_)?_[\d._]+)} 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
new file mode 100644
index 000..13836ebe8e6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */
+
+#define STORE
+#include "reassoc-46.h"
+
+/* Check that if the loop accumulator is saved into a global variable, it's
+   still added last.  */
+/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = (?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d._]+ \+ (?:vect_)?_[\d._]+)} 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
new file mode 100644
index 000..c1136a447a2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized -ftree-vectorize" } */
+
+#define MODIFY
+#define STORE
+#include "reassoc-46.h"
+
+/* Check that if the loop accumulator is both modified using a chain of
+   operations other than addition and stored into a global variable, its new
+   value is still added last.  */
+/* { dg-final { scan-tree-dump-times {(?:vect_)?sum_[\d._]+ = 
(?:(?:vect_)?_[\d._]+ \+ (?:vect_)?sum_[\d._]+|(?:vect_)?sum_[\d

[PATCH v3 2/3] reassoc: Propagate PHI_LOOP_BIAS along single uses

2021-09-26 Thread Ilya Leoshkevich via Gcc-patches
PR tree-optimization/49749 introduced code that shortens dependency
chains containing loop accumulators by placing them last on operand
lists of associative operations.

The 456.hmmer benchmark on s390 could benefit from this; however, the code
that needs it modifies the loop accumulator before using it, and since only
so-called loop-carried phis are treated as loop accumulators, the code in
its present form doesn't really help.  According to Bill Schmidt - the
original author - such a conservative approach was chosen so as to avoid
unnecessarily swapping operands, which might cause unpredictable effects.
However, giving special treatment to forms of loop accumulators is
acceptable.

The definition of loop-carried phi is: it's a single-use phi, which is
used in the same innermost loop it's defined in, at least one argument
of which is defined in the same innermost loop as the phi itself.
Given this, it seems natural to treat single uses of such phis as phis
themselves.
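
As a rough illustration of why operand order matters, here is a sketch in
plain C rather than GIMPLE.  The function names are made up for this example
and nothing here is taken from tree-ssa-reassoc.c.  Both functions return
the same value; they differ only in where the loop-carried value enters the
expression.

```c
#include <assert.h>
#include <stddef.h>

/* Accumulator added first: both additions in the body depend on the value
   carried over from the previous iteration.  */
unsigned int
toy_sum_acc_first (const unsigned int *a, const unsigned int *b, size_t n)
{
  assert (n == 0 || (a != NULL && b != NULL));

  unsigned int sum = 0;
  for (size_t i = 0; i < n; i++)
    sum = (sum + a[i]) + b[i];
  return sum;
}

/* Accumulator added last: a[i] + b[i] does not depend on sum, so only one
   addition per iteration sits on the loop-carried chain.  */
unsigned int
toy_sum_acc_last (const unsigned int *a, const unsigned int *b, size_t n)
{
  assert (n == 0 || (a != NULL && b != NULL));

  unsigned int sum = 0;
  for (size_t i = 0; i < n; i++)
    sum = (a[i] + b[i]) + sum;
  return sum;
}
```

Since unsigned addition wraps, the two forms are interchangeable, which is
what lets a reassociation pass pick the second one.  An optimizing compiler
may of course rewrite either source form, so the sketch only shows the shape
of the two dependency chains.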

gcc/ChangeLog:

* tree-ssa-reassoc.c (biased_names): New global.
(propagate_bias_p): New function.
(loop_carried_phi): Remove.
(propagate_rank): Propagate bias along single uses.
(get_rank): Update biased_names when needed.
---
 gcc/tree-ssa-reassoc.c | 109 -
 1 file changed, 74 insertions(+), 35 deletions(-)

diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 420c14e8cf5..db9fb4e1cac 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -211,6 +211,10 @@ static int64_t *bb_rank;
 /* Operand->rank hashtable.  */
 static hash_map<tree, int64_t> *operand_rank;
 
+/* SSA_NAMEs that are forms of loop accumulators and whose ranks need to be
+   biased.  */
+static auto_bitmap biased_names;
+
 /* Vector of SSA_NAMEs on which after reassociate_bb is done with
all basic blocks the CFG should be adjusted - basic blocks
split right after that SSA_NAME's definition statement and before
@@ -256,6 +260,53 @@ reassoc_remove_stmt (gimple_stmt_iterator *gsi)
the rank difference between two blocks.  */
 #define PHI_LOOP_BIAS (1 << 15)
 
+/* Return TRUE iff PHI_LOOP_BIAS should be propagated from one of the STMT's
+   operands to the STMT's left-hand side.  The goal is to preserve bias in code
+   like this:
+
+ x_1 = phi(x_0, x_2)
+ a = x_1 | 1
+ b = a ^ 2
+ .MEM = b
+ c = b + d
+ x_2 = c + e
+
+   That is, we need to preserve bias along single-use chains originating from
+   loop-carried phis.  Only GIMPLE_ASSIGNs to SSA_NAMEs are considered to be
+   uses, because only they participate in rank propagation.  */
+static bool
+propagate_bias_p (gimple *stmt)
+{
+  use_operand_p use;
+  imm_use_iterator use_iter;
+  gimple *single_use_stmt = NULL;
+
+  if (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_reference)
+return false;
+
+  FOR_EACH_IMM_USE_FAST (use, use_iter, gimple_assign_lhs (stmt))
+{
+  gimple *current_use_stmt = USE_STMT (use);
+
+  if (is_gimple_assign (current_use_stmt)
+ && TREE_CODE (gimple_assign_lhs (current_use_stmt)) == SSA_NAME)
+   {
+ if (single_use_stmt != NULL && single_use_stmt != current_use_stmt)
+   return false;
+ single_use_stmt = current_use_stmt;
+   }
+}
+
+  if (single_use_stmt == NULL)
+return false;
+
+  if (gimple_bb (stmt)->loop_father
+  != gimple_bb (single_use_stmt)->loop_father)
+return false;
+
+  return true;
+}
+
 /* Rank assigned to a phi statement.  If STMT is a loop-carried phi of
an innermost loop, and the phi has only a single use which is inside
the loop, then the rank is the block rank of the loop latch plus an
@@ -313,49 +364,27 @@ phi_rank (gimple *stmt)
   return bb_rank[bb->index];
 }
 
-/* If EXP is an SSA_NAME defined by a PHI statement that represents a
-   loop-carried dependence of an innermost loop, return TRUE; else
-   return FALSE.  */
-static bool
-loop_carried_phi (tree exp)
-{
-  gimple *phi_stmt;
-  int64_t block_rank;
-
-  if (TREE_CODE (exp) != SSA_NAME
-  || SSA_NAME_IS_DEFAULT_DEF (exp))
-return false;
-
-  phi_stmt = SSA_NAME_DEF_STMT (exp);
-
-  if (gimple_code (SSA_NAME_DEF_STMT (exp)) != GIMPLE_PHI)
-return false;
-
-  /* Non-loop-carried phis have block rank.  Loop-carried phis have
- an additional bias added in.  If this phi doesn't have block rank,
- it's biased and should not be propagated.  */
-  block_rank = bb_rank[gimple_bb (phi_stmt)->index];
-
-  if (phi_rank (phi_stmt) != block_rank)
-return true;
-
-  return false;
-}
-
 /* Return the maximum of RANK and the rank that should be propagated
from expression OP.  For most operands, this is just the rank of OP.
For loop-carried phis, the value is zero to avoid undoing the bias
in favor of the phi.  */
 static int64_t
-propagate_rank (int64_t rank, tree op)
+propagate_rank (int64_t rank, tree op, bool *maybe_biased_p)
 {
   int64_t op_rank;
 
-  if (loop_carried_phi (op))
- 

[PATCH v3 1/3] reassoc: Do not bias loop-carried PHIs early

2021-09-26 Thread Ilya Leoshkevich via Gcc-patches
Biasing loop-carried PHIs during the 1st reassociation pass interferes
with reduction chains and does not bring measurable benefits, so do it
only during the 2nd reassociation pass.
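
A minimal sketch of how the single pass parameter drives both flags,
mirroring the set_param hunk in this patch.  The struct and function names
are hypothetical stand-ins, written in plain C instead of the C++ pass
class.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two per-pass flags.  */
struct toy_reassoc_pass
{
  bool insert_powi_p;
  bool bias_loop_carried_phi_ranks_p;
};

/* The early pass inserts __builtin_powi calls and does not bias; the late
   pass does the opposite.  */
void
toy_set_param (struct toy_reassoc_pass *pass, bool early_p)
{
  assert (pass != NULL);

  pass->insert_powi_p = early_p;
  pass->bias_loop_carried_phi_ranks_p = !early_p;
}
```

Calling toy_set_param with true models the first NEXT_PASS (pass_reassoc)
entry in passes.def and with false the second one.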

gcc/ChangeLog:

* passes.def (pass_reassoc): Rename parameter to early_p.
* tree-ssa-reassoc.c (reassoc_bias_loop_carried_phi_ranks_p):
New variable.
(phi_rank): Don't bias loop-carried phi ranks
before vectorization pass.
(execute_reassoc): Add bias_loop_carried_phi_ranks_p parameter.
(pass_reassoc::pass_reassoc): Add bias_loop_carried_phi_ranks_p
initializer.
(pass_reassoc::set_param): Set bias_loop_carried_phi_ranks_p
value.
(pass_reassoc::execute): Pass bias_loop_carried_phi_ranks_p to
execute_reassoc.
(pass_reassoc::bias_loop_carried_phi_ranks_p): New member.
---
 gcc/passes.def |  4 ++--
 gcc/tree-ssa-reassoc.c | 16 ++--
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index d7a1f8c97a6..c5f915d04c6 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -242,7 +242,7 @@ along with GCC; see the file COPYING3.  If not see
   /* Identify paths that should never be executed in a conforming
 program and isolate those paths.  */
   NEXT_PASS (pass_isolate_erroneous_paths);
-  NEXT_PASS (pass_reassoc, true /* insert_powi_p */);
+  NEXT_PASS (pass_reassoc, true /* early_p */);
   NEXT_PASS (pass_dce);
   NEXT_PASS (pass_forwprop);
   NEXT_PASS (pass_phiopt, false /* early_p */);
@@ -325,7 +325,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_lower_vector_ssa);
   NEXT_PASS (pass_lower_switch);
   NEXT_PASS (pass_cse_reciprocals);
-  NEXT_PASS (pass_reassoc, false /* insert_powi_p */);
+  NEXT_PASS (pass_reassoc, false /* early_p */);
   NEXT_PASS (pass_strength_reduction);
   NEXT_PASS (pass_split_paths);
   NEXT_PASS (pass_tracer);
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 8498cfc7aa8..420c14e8cf5 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -180,6 +180,10 @@ along with GCC; see the file COPYING3.  If not see
point 3a in the pass header comment.  */
 static bool reassoc_insert_powi_p;
 
+/* Enable biasing ranks of loop accumulators.  We don't want this before
+   vectorization, since it interferes with reduction chains.  */
+static bool reassoc_bias_loop_carried_phi_ranks_p;
+
 /* Statistics */
 static struct
 {
@@ -269,6 +273,9 @@ phi_rank (gimple *stmt)
   use_operand_p use;
   gimple *use_stmt;
 
+  if (!reassoc_bias_loop_carried_phi_ranks_p)
+return bb_rank[bb->index];
+
   /* We only care about real loops (those with a latch).  */
   if (!father->latch)
 return bb_rank[bb->index];
@@ -6940,9 +6947,10 @@ fini_reassoc (void)
optimization of a gimple conditional.  Otherwise returns zero.  */
 
 static unsigned int
-execute_reassoc (bool insert_powi_p)
+execute_reassoc (bool insert_powi_p, bool bias_loop_carried_phi_ranks_p)
 {
   reassoc_insert_powi_p = insert_powi_p;
+  reassoc_bias_loop_carried_phi_ranks_p = bias_loop_carried_phi_ranks_p;
 
   init_reassoc ();
 
@@ -6983,15 +6991,19 @@ public:
 {
   gcc_assert (n == 0);
   insert_powi_p = param;
+  bias_loop_carried_phi_ranks_p = !param;
 }
   virtual bool gate (function *) { return flag_tree_reassoc != 0; }
   virtual unsigned int execute (function *)
-{ return execute_reassoc (insert_powi_p); }
+  {
+return execute_reassoc (insert_powi_p, bias_loop_carried_phi_ranks_p);
+  }
 
  private:
   /* Enable insertion of __builtin_powi calls during execute_reassoc.  See
  point 3a in the pass header comment.  */
   bool insert_powi_p;
+  bool bias_loop_carried_phi_ranks_p;
 }; // class pass_reassoc
 
 } // anon namespace
-- 
2.31.1



[PATCH v3 0/3] reassoc: Propagate PHI_LOOP_BIAS along single uses

2021-09-26 Thread Ilya Leoshkevich via Gcc-patches
v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579976.html
Changes in v3:
* Do not propagate bias along tcc_references.
* Call get_rank () before checking biased_names.
* Add loop-carried phis to biased_names.
* Move the propagate_bias_p () call outside of the loop.
* Test with -ftree-vectorize, adjust expectations.

Ilya Leoshkevich (3):
  reassoc: Do not bias loop-carried PHIs early
  reassoc: Propagate PHI_LOOP_BIAS along single uses
  reassoc: Test rank biasing

 gcc/passes.def |   4 +-
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c |   7 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h |  33 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c |   9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c |   9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c |  11 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c |  10 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c |  11 ++
 gcc/tree-ssa-reassoc.c | 125 +++--
 9 files changed, 180 insertions(+), 39 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c

-- 
2.31.1



Re: [PATCH v2 2/3] reassoc: Propagate PHI_LOOP_BIAS along single uses

2021-09-24 Thread Ilya Leoshkevich via Gcc-patches
On Thu, 2021-09-23 at 13:55 +0200, Richard Biener wrote:
> On Wed, 22 Sep 2021, Ilya Leoshkevich wrote:
> 
> > PR tree-optimization/49749 introduced code that shortens dependency
> > chains containing loop accumulators by placing them last on operand
> > lists of associative operations.
> > 
> > The 456.hmmer benchmark on s390 could benefit from this; however,
> > the code that needs it modifies the loop accumulator before using
> > it, and since only so-called loop-carried phis are treated as loop
> > accumulators, the code in its present form doesn't really help.
> > According to Bill Schmidt - the original author - such a
> > conservative approach was chosen so as to avoid unnecessarily
> > swapping operands, which might cause unpredictable effects.
> > However, giving special treatment to forms of loop accumulators is
> > acceptable.
> > 
> > The definition of loop-carried phi is: it's a single-use phi, which
> > is used in the same innermost loop it's defined in, at least one
> > argument of which is defined in the same innermost loop as the phi
> > itself.  Given this, it seems natural to treat single uses of such
> > phis as phis themselves.
> > 
> > gcc/ChangeLog:
> > 
> > * tree-ssa-reassoc.c (biased_names): New global.
> > (propagate_bias_p): New function.
> > (loop_carried_phi): Remove.
> > (propagate_rank): Propagate bias along single uses.
> > (get_rank): Update biased_names when needed.
> > ---
> >  gcc/tree-ssa-reassoc.c | 97 --
> > 
> >  1 file changed, 64 insertions(+), 33 deletions(-)
> > 
> > diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
> > index 420c14e8cf5..2f7a8882aac 100644
> > --- a/gcc/tree-ssa-reassoc.c
> > +++ b/gcc/tree-ssa-reassoc.c
> > @@ -211,6 +211,10 @@ static int64_t *bb_rank;
> >  /* Operand->rank hashtable.  */
> >  static hash_map<tree, int64_t> *operand_rank;
> >  
> > +/* SSA_NAMEs that are forms of loop accumulators and whose ranks
> > need to be
> > +   biased.  */
> > +static auto_bitmap biased_names;
> > +
> >  /* Vector of SSA_NAMEs on which after reassociate_bb is done with
> >     all basic blocks the CFG should be adjusted - basic blocks
> >     split right after that SSA_NAME's definition statement and
> > before
> > @@ -256,6 +260,50 @@ reassoc_remove_stmt (gimple_stmt_iterator
> > *gsi)
> >     the rank difference between two blocks.  */
> >  #define PHI_LOOP_BIAS (1 << 15)
> >  
> > +/* Return TRUE iff PHI_LOOP_BIAS should be propagated from one of
> > the STMT's
> > +   operands to the STMT's left-hand side.  The goal is to preserve
> > bias in code
> > +   like this:
> > +
> > + x_1 = phi(x_0, x_2)
> > + a = x_1 | 1
> > + b = a ^ 2
> > + .MEM = b
> > + c = b + d
> > + x_2 = c + e
> > +
> > +   That is, we need to preserve bias along single-use chains
> > originating from
> > +   loop-carried phis.  Only GIMPLE_ASSIGNs to SSA_NAMEs are
> > considered to be
> > +   uses, because only they participate in rank propagation.  */
> > +static bool
> > +propagate_bias_p (gimple *stmt)
> > +{
> > +  use_operand_p use;
> > +  imm_use_iterator use_iter;
> > +  gimple *single_use_stmt = NULL;
> > +
> > +  FOR_EACH_IMM_USE_FAST (use, use_iter, gimple_assign_lhs (stmt))
> > +    {
> > +  gimple *current_use_stmt = USE_STMT (use);
> > +
> > +  if (is_gimple_assign (current_use_stmt)
> > + && TREE_CODE (gimple_assign_lhs (current_use_stmt)) ==
> > SSA_NAME)
> > +   {
> > + if (single_use_stmt != NULL)
> 
> what if single_use_stmt == current_use_stmt?  We might have two
> uses on a stmt after all - should that still be biased?  I guess not
> and thus the check is correct?

Come to think of it, it should be ok to bias it.  Things like
x = x + x are fine (this particular case can be transformed into
something else earlier, but I think the overall point still holds).
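
A toy model of the relaxed check, in plain C rather than GIMPLE.  The names
toy_use and toy_single_use_stmt_p are made up for illustration and are not
GCC APIs; statement ids are assumed to be non-negative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for one immediate use of a value.  */
struct toy_use
{
  int stmt_id;            /* Statement containing the use.  */
  bool assigns_ssa_name;  /* True if that statement assigns to an SSA name.  */
};

/* Return true iff all uses that count (assignments to SSA names) sit in one
   and the same statement.  Several uses within that statement are fine.  */
bool
toy_single_use_stmt_p (const struct toy_use *uses, size_t n)
{
  assert (n == 0 || uses != NULL);

  int single = -1;
  for (size_t i = 0; i < n; i++)
    {
      if (!uses[i].assigns_ssa_name)
        continue;
      if (single != -1 && single != uses[i].stmt_id)
        return false;
      single = uses[i].stmt_id;
    }
  return single != -1;
}
```

Two uses in the same statement (the x = x + x case) are accepted, two uses
in different statements are rejected, and a store alone does not count as a
use at all.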
> 
> > +   return false;
> > + single_use_stmt = current_use_stmt;
> > +   }
> > +    }
> > +
> > +  if (single_use_stmt == NULL)
> > +    return false;
> > +
> > +  if (gimple_bb (stmt)->loop_father
> > +  != gimple_bb (single_use_stmt)->loop_father)
> > +    return false;
> > +
> > +  return true;
> > +}
> >

[PATCH v2 3/3] reassoc: Test rank biasing

2021-09-21 Thread Ilya Leoshkevich via Gcc-patches
Add both positive and negative tests.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/reassoc-46.c: New test.
* gcc.dg/tree-ssa/reassoc-46.h: Common code for new tests.
* gcc.dg/tree-ssa/reassoc-47.c: New test.
* gcc.dg/tree-ssa/reassoc-48.c: New test.
* gcc.dg/tree-ssa/reassoc-49.c: New test.
* gcc.dg/tree-ssa/reassoc-50.c: New test.
* gcc.dg/tree-ssa/reassoc-51.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c |  7 +
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h | 33 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c |  9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c |  9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c | 11 
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c | 10 +++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c | 11 
 7 files changed, 90 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
new file mode 100644
index 000..69e02bc4d4a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include "reassoc-46.h"
+
+/* Check that the loop accumulator is added last.  */
+/* { dg-final { scan-tree-dump-times {sum_\d+ = (?:_\d+ \+ sum_\d+|sum_\d+ \+ _\d+)} 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
new file mode 100644
index 000..e60b490ea0d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
@@ -0,0 +1,33 @@
+#define M 1024
+unsigned int arr1[M];
+unsigned int arr2[M];
+volatile unsigned int sink;
+
+unsigned int
+test (void)
+{
+  unsigned int sum = 0;
+  for (int i = 0; i < M; i++)
+{
+#ifdef MODIFY
+  /* Modify the loop accumulator using a chain of operations - this should
+ not affect its rank biasing.  */
+  sum |= 1;
+  sum ^= 2;
+#endif
+#ifdef STORE
+  /* Save the loop accumulator into a global variable - this should not
+ affect its rank biasing.  */
+  sink = sum;
+#endif
+#ifdef USE
+  /* Add a tricky use of the loop accumulator - this should prevent its
+ rank biasing.  */
+  i = (i + sum) % M;
+#endif
+  /* Use addends with different ranks.  */
+  sum += arr1[i];
+  sum += arr2[((i ^ 1) + 1) % M];
+}
+  return sum;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
new file mode 100644
index 000..84b51ccddb0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define MODIFY
+#include "reassoc-46.h"
+
+/* Check that if the loop accumulator is modified using a chain of operations
+   other than addition, its new value is still added last.  */
+/* { dg-final { scan-tree-dump-times {sum_\d+ = (?:_\d+ \+ sum_\d+|sum_\d+ \+ _\d+)} 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
new file mode 100644
index 000..53ae8820281
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define STORE
+#include "reassoc-46.h"
+
+/* Check that if the loop accumulator is saved into a global variable, it's
+   still added last.  */
+/* { dg-final { scan-tree-dump-times {sum_\d+ = (?:_\d+ \+ sum_\d+|sum_\d+ \+ _\d+)} 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
new file mode 100644
index 000..a6941d5ac2b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define MODIFY
+#define STORE
+#include "reassoc-46.h"
+
+/* Check that if the loop accumulator is both modified using a chain of
+   operations other than addition and stored into a global variable, its new
+   value is still added last.  */
+/* { dg-final { scan-tree-dump-times {sum_\d+ = (?:_\d+ \+ sum_\d+|sum_\d+ \+ _\d+)} 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c
new file mode 100644
index 000..68cd308c4f1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimize

[PATCH v2 2/3] reassoc: Propagate PHI_LOOP_BIAS along single uses

2021-09-21 Thread Ilya Leoshkevich via Gcc-patches
PR tree-optimization/49749 introduced code that shortens dependency
chains containing loop accumulators by placing them last on operand
lists of associative operations.

The 456.hmmer benchmark on s390 could benefit from this; however, the code
that needs it modifies the loop accumulator before using it, and since only
so-called loop-carried phis are treated as loop accumulators, the code in
its present form doesn't really help.  According to Bill Schmidt - the
original author - such a conservative approach was chosen so as to avoid
unnecessarily swapping operands, which might cause unpredictable effects.
However, giving special treatment to forms of loop accumulators is
acceptable.

The definition of loop-carried phi is: it's a single-use phi, which is
used in the same innermost loop it's defined in, at least one argument
of which is defined in the same innermost loop as the phi itself.
Given this, it seems natural to treat single uses of such phis as phis
themselves.

gcc/ChangeLog:

* tree-ssa-reassoc.c (biased_names): New global.
(propagate_bias_p): New function.
(loop_carried_phi): Remove.
(propagate_rank): Propagate bias along single uses.
(get_rank): Update biased_names when needed.
---
 gcc/tree-ssa-reassoc.c | 97 --
 1 file changed, 64 insertions(+), 33 deletions(-)

diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 420c14e8cf5..2f7a8882aac 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -211,6 +211,10 @@ static int64_t *bb_rank;
 /* Operand->rank hashtable.  */
 static hash_map<tree, int64_t> *operand_rank;
 
+/* SSA_NAMEs that are forms of loop accumulators and whose ranks need to be
+   biased.  */
+static auto_bitmap biased_names;
+
 /* Vector of SSA_NAMEs on which after reassociate_bb is done with
all basic blocks the CFG should be adjusted - basic blocks
split right after that SSA_NAME's definition statement and before
@@ -256,6 +260,50 @@ reassoc_remove_stmt (gimple_stmt_iterator *gsi)
the rank difference between two blocks.  */
 #define PHI_LOOP_BIAS (1 << 15)
 
+/* Return TRUE iff PHI_LOOP_BIAS should be propagated from one of the STMT's
+   operands to the STMT's left-hand side.  The goal is to preserve bias in code
+   like this:
+
+ x_1 = phi(x_0, x_2)
+ a = x_1 | 1
+ b = a ^ 2
+ .MEM = b
+ c = b + d
+ x_2 = c + e
+
+   That is, we need to preserve bias along single-use chains originating from
+   loop-carried phis.  Only GIMPLE_ASSIGNs to SSA_NAMEs are considered to be
+   uses, because only they participate in rank propagation.  */
+static bool
+propagate_bias_p (gimple *stmt)
+{
+  use_operand_p use;
+  imm_use_iterator use_iter;
+  gimple *single_use_stmt = NULL;
+
+  FOR_EACH_IMM_USE_FAST (use, use_iter, gimple_assign_lhs (stmt))
+{
+  gimple *current_use_stmt = USE_STMT (use);
+
+  if (is_gimple_assign (current_use_stmt)
+ && TREE_CODE (gimple_assign_lhs (current_use_stmt)) == SSA_NAME)
+   {
+ if (single_use_stmt != NULL)
+   return false;
+ single_use_stmt = current_use_stmt;
+   }
+}
+
+  if (single_use_stmt == NULL)
+return false;
+
+  if (gimple_bb (stmt)->loop_father
+  != gimple_bb (single_use_stmt)->loop_father)
+return false;
+
+  return true;
+}
+
 /* Rank assigned to a phi statement.  If STMT is a loop-carried phi of
an innermost loop, and the phi has only a single use which is inside
the loop, then the rank is the block rank of the loop latch plus an
@@ -313,46 +361,23 @@ phi_rank (gimple *stmt)
   return bb_rank[bb->index];
 }
 
-/* If EXP is an SSA_NAME defined by a PHI statement that represents a
-   loop-carried dependence of an innermost loop, return TRUE; else
-   return FALSE.  */
-static bool
-loop_carried_phi (tree exp)
-{
-  gimple *phi_stmt;
-  int64_t block_rank;
-
-  if (TREE_CODE (exp) != SSA_NAME
-  || SSA_NAME_IS_DEFAULT_DEF (exp))
-return false;
-
-  phi_stmt = SSA_NAME_DEF_STMT (exp);
-
-  if (gimple_code (SSA_NAME_DEF_STMT (exp)) != GIMPLE_PHI)
-return false;
-
-  /* Non-loop-carried phis have block rank.  Loop-carried phis have
- an additional bias added in.  If this phi doesn't have block rank,
- it's biased and should not be propagated.  */
-  block_rank = bb_rank[gimple_bb (phi_stmt)->index];
-
-  if (phi_rank (phi_stmt) != block_rank)
-return true;
-
-  return false;
-}
-
 /* Return the maximum of RANK and the rank that should be propagated
from expression OP.  For most operands, this is just the rank of OP.
For loop-carried phis, the value is zero to avoid undoing the bias
in favor of the phi.  */
 static int64_t
-propagate_rank (int64_t rank, tree op)
+propagate_rank (int64_t rank, tree op, gimple *stmt, bool *bias_p)
 {
   int64_t op_rank;
 
-  if (loop_carried_phi (op))
-return rank;
+  if (TREE_CODE (op) == SSA_NAME
+  && bitmap_bit_p (biased_names, SSA_NAME_VERSION (op)))
+{
+  i

[PATCH v2 1/3] reassoc: Do not bias loop-carried PHIs early

2021-09-21 Thread Ilya Leoshkevich via Gcc-patches
Biasing loop-carried PHIs during the 1st reassociation pass interferes
with reduction chains and does not bring measurable benefits, so do it
only during the 2nd reassociation pass.
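
As a rough illustration of the behaviour described above, the single pass parameter can be seen as selecting both settings at once. The struct and function below are a hypothetical stand-alone model; only the flag names are taken from the patch, and this is not the actual pass_reassoc class.

```cpp
#include <cassert>

// Hypothetical stand-alone model of how one pass parameter selects both
// behaviours.  The flag names mirror the patch; the struct does not exist
// in GCC.
struct reassoc_params
{
  bool insert_powi_p;
  bool bias_loop_carried_phi_ranks_p;
};

// EARLY_P is true for the 1st reassoc instance (before vectorization)
// and false for the 2nd one.
constexpr reassoc_params
params_for_pass (bool early_p)
{
  return { early_p, !early_p };
}
```

With this model, the early instance inserts powi calls but does not bias, and the late instance does the opposite.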

gcc/ChangeLog:

* passes.def (pass_reassoc): Rename parameter to early_p.
* tree-ssa-reassoc.c (reassoc_bias_loop_carried_phi_ranks_p):
New variable.
(phi_rank): Don't bias loop-carried phi ranks
before vectorization pass.
(execute_reassoc): Add bias_loop_carried_phi_ranks_p parameter.
(pass_reassoc::pass_reassoc): Add bias_loop_carried_phi_ranks_p
initializer.
(pass_reassoc::set_param): Set bias_loop_carried_phi_ranks_p
value.
(pass_reassoc::execute): Pass bias_loop_carried_phi_ranks_p to
execute_reassoc.
(pass_reassoc::bias_loop_carried_phi_ranks_p): New member.
---
 gcc/passes.def |  4 ++--
 gcc/tree-ssa-reassoc.c | 16 ++--
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index d7a1f8c97a6..c5f915d04c6 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -242,7 +242,7 @@ along with GCC; see the file COPYING3.  If not see
   /* Identify paths that should never be executed in a conforming
 program and isolate those paths.  */
   NEXT_PASS (pass_isolate_erroneous_paths);
-  NEXT_PASS (pass_reassoc, true /* insert_powi_p */);
+  NEXT_PASS (pass_reassoc, true /* early_p */);
   NEXT_PASS (pass_dce);
   NEXT_PASS (pass_forwprop);
   NEXT_PASS (pass_phiopt, false /* early_p */);
@@ -325,7 +325,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_lower_vector_ssa);
   NEXT_PASS (pass_lower_switch);
   NEXT_PASS (pass_cse_reciprocals);
-  NEXT_PASS (pass_reassoc, false /* insert_powi_p */);
+  NEXT_PASS (pass_reassoc, false /* early_p */);
   NEXT_PASS (pass_strength_reduction);
   NEXT_PASS (pass_split_paths);
   NEXT_PASS (pass_tracer);
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 8498cfc7aa8..420c14e8cf5 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -180,6 +180,10 @@ along with GCC; see the file COPYING3.  If not see
point 3a in the pass header comment.  */
 static bool reassoc_insert_powi_p;
 
+/* Enable biasing ranks of loop accumulators.  We don't want this before
+   vectorization, since it interferes with reduction chains.  */
+static bool reassoc_bias_loop_carried_phi_ranks_p;
+
 /* Statistics */
 static struct
 {
@@ -269,6 +273,9 @@ phi_rank (gimple *stmt)
   use_operand_p use;
   gimple *use_stmt;
 
+  if (!reassoc_bias_loop_carried_phi_ranks_p)
+return bb_rank[bb->index];
+
   /* We only care about real loops (those with a latch).  */
   if (!father->latch)
 return bb_rank[bb->index];
@@ -6940,9 +6947,10 @@ fini_reassoc (void)
optimization of a gimple conditional.  Otherwise returns zero.  */
 
 static unsigned int
-execute_reassoc (bool insert_powi_p)
+execute_reassoc (bool insert_powi_p, bool bias_loop_carried_phi_ranks_p)
 {
   reassoc_insert_powi_p = insert_powi_p;
+  reassoc_bias_loop_carried_phi_ranks_p = bias_loop_carried_phi_ranks_p;
 
   init_reassoc ();
 
@@ -6983,15 +6991,19 @@ public:
 {
   gcc_assert (n == 0);
   insert_powi_p = param;
+  bias_loop_carried_phi_ranks_p = !param;
 }
   virtual bool gate (function *) { return flag_tree_reassoc != 0; }
   virtual unsigned int execute (function *)
-{ return execute_reassoc (insert_powi_p); }
+  {
+return execute_reassoc (insert_powi_p, bias_loop_carried_phi_ranks_p);
+  }
 
  private:
   /* Enable insertion of __builtin_powi calls during execute_reassoc.  See
  point 3a in the pass header comment.  */
   bool insert_powi_p;
+  bool bias_loop_carried_phi_ranks_p;
 }; // class pass_reassoc
 
 } // anon namespace
-- 
2.31.1



[PATCH v2 0/3] reassoc: Propagate PHI_LOOP_BIAS along single uses

2021-09-21 Thread Ilya Leoshkevich via Gcc-patches
This is an update to my very old patch with the review comments
addressed.  Bootstrapped and regtested x86_64-redhat-linux,
ppc64le-redhat-linux and s390x-redhat-linux.

v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548785.html
Changes in v2:
* Disable PHI biasing in the early pass instance in a separate patch.
* Replace s390-specific tests with the generic tree-ssa ones.
* Replace the fragile (op_rank & PHI_LOOP_BIAS) test with auto_bitmap
  biased_names.  The review suggestion was rather to check whether op
  is defined by a loop-carried phi, but this would allow detecting only
  single assignments, and not assignment chains.  Another alternative
  that would make the check less fragile was to use saturating addition
  in order to prevent overflows into the PHI_LOOP_BIAS bit, but
  auto_bitmap of SSA_NAMEs allows graceful processing of large basic
  blocks, and its memory overhead looks acceptable.
* Restructure the code to make it a bit more readable.  The overall
  logic is the same as in v1.  I considered implementing an idea from
  [1], more specifically, detecting single-use chains in
  is_phi_for_stmt() so that swap_ops_for_binary_stmt() shifts the
  corresponding operand towards the end.  These two functions actually
  seem to serve a very related purpose.  However, for single-use chain
  detection we would still need to recursively traverse
  SSA_NAME_DEF_STMTs of operands, which propagate_rank() and friends
  already do.  So this would not have resulted in a significant code
  simplification.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-June/549149.html
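
To make the fragility of the (op_rank & PHI_LOOP_BIAS) test more concrete, here is a minimal sketch.  It assumes that an unbiased rank can grow large enough to reach bit 15 on its own; biased_set is a hypothetical stand-in for the auto_bitmap, built on std::set so that the example is self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Same value as PHI_LOOP_BIAS in tree-ssa-reassoc.c.
constexpr int64_t phi_loop_bias = 1 << 15;

// The fragile v1-style test: any rank with the bias bit set looks biased.
inline bool
looks_biased (int64_t rank)
{
  return (rank & phi_loop_bias) != 0;
}

// Hypothetical stand-in for auto_bitmap biased_names, keyed by SSA version.
struct biased_set
{
  std::set<unsigned> versions;

  void mark (unsigned version) { versions.insert (version); }

  bool contains (unsigned version) const
  {
    return versions.count (version) != 0;
  }
};

// Assumed example of an unbiased rank that reached bit 15 by itself.
constexpr int64_t large_unbiased_rank = phi_loop_bias + 3;
```

In this model looks_biased (large_unbiased_rank) gives a false positive, whereas a name is reported by biased_set only if it was explicitly marked.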

Ilya Leoshkevich (3):
  reassoc: Do not bias loop-carried PHIs early
  reassoc: Propagate PHI_LOOP_BIAS along single uses
  reassoc: Test rank biasing

 gcc/passes.def |   4 +-
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c |   7 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h |  33 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c |   9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c |   9 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c |  11 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c |  10 ++
 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c |  11 ++
 gcc/tree-ssa-reassoc.c | 113 ++---
 9 files changed, 170 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-46.h
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-47.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-48.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-49.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-50.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/reassoc-51.c

-- 
2.31.1



[PATCH] IBM Z: Enable LSan and TSan

2021-07-27 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?

libsanitizer/ChangeLog:

* configure.tgt (s390*-*-linux*): Enable LSan and TSan for
s390x.
---
 libsanitizer/configure.tgt | 5 +
 1 file changed, 5 insertions(+)

diff --git a/libsanitizer/configure.tgt b/libsanitizer/configure.tgt
index 0ca5d9fd924..f635e412bdc 100644
--- a/libsanitizer/configure.tgt
+++ b/libsanitizer/configure.tgt
@@ -41,6 +41,11 @@ case "${target}" in
   sparc*-*-linux*)
;;
   s390*-*-linux*)
+   if test x$ac_cv_sizeof_void_p = x8; then
+   TSAN_SUPPORTED=yes
+   LSAN_SUPPORTED=yes
+   TSAN_TARGET_DEPENDENT_OBJECTS=tsan_rtl_s390x.lo
+   fi
;;
   sparc*-*-solaris2.11*)
;;
-- 
2.31.1



[PATCH] IBM Z: Fix 5 tests in 31-bit mode

2021-07-23 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?



gcc/testsuite/ChangeLog:

* gcc.target/s390/global-array-element-pic2.c: Add -mzarch, add
an expectation for 31-bit mode.
* gcc.target/s390/load-imm64-1.c: Use unsigned long long.
* gcc.target/s390/load-imm64-2.c: Likewise.
* gcc.target/s390/vector/long-double-vx-macro-off-on.c: Use
-mzarch.
* gcc.target/s390/vector/long-double-vx-macro-on-off.c:
Likewise.
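
For the load-imm64 tests, the following sketch shows why unsigned long is not wide enough in 31-bit mode.  It assumes that unsigned long is 32 bits wide there and models it with uint32_t; the constant is the one used by the tests.

```cpp
#include <cassert>
#include <cstdint>

// The constant returned by magic () in load-imm64-1.c and load-imm64-2.c.
constexpr uint64_t magic_value = 0x3f08c5392f756cdULL;

// uint32_t models unsigned long on a 31-bit target (an assumption of this
// sketch).
constexpr uint32_t truncated_value = static_cast<uint32_t> (magic_value);

// The constant does not fit into 32 bits, so only its low word survives.
static_assert (magic_value > UINT32_MAX, "constant needs more than 32 bits");
static_assert (truncated_value == 0x92f756cdU, "only the low word survives");
```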
---
 gcc/testsuite/gcc.target/s390/global-array-element-pic2.c| 5 +++--
 gcc/testsuite/gcc.target/s390/load-imm64-1.c | 4 ++--
 gcc/testsuite/gcc.target/s390/load-imm64-2.c | 4 ++--
 .../gcc.target/s390/vector/long-double-vx-macro-off-on.c | 2 +-
 .../gcc.target/s390/vector/long-double-vx-macro-on-off.c | 2 +-
 5 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/global-array-element-pic2.c b/gcc/testsuite/gcc.target/s390/global-array-element-pic2.c
index 72b87d40b85..0ee10841cac 100644
--- a/gcc/testsuite/gcc.target/s390/global-array-element-pic2.c
+++ b/gcc/testsuite/gcc.target/s390/global-array-element-pic2.c
@@ -1,6 +1,6 @@
 /* Test accesses to global array elements in PIC code.  */
 /* { dg-do compile } */
-/* { dg-options "-O1 -march=z10 -fPIC" } */
+/* { dg-options "-O1 -march=z10 -mzarch -fPIC" } */
 
 extern char a[] __attribute__ ((aligned (2)));
 extern char *b;
@@ -8,6 +8,7 @@ extern char *b;
 void c()
 {
   b = a + 4;
-  /* { dg-final { scan-assembler "(?n)\n\tlgrl\t%r\\d+,a@GOTENT\n" } } */
+  /* { dg-final { scan-assembler "(?n)\n\tlgrl\t%r\\d+,a@GOTENT\n" { target lp64 } } } */
+  /* { dg-final { scan-assembler "(?n)\n\tlrl\t%r\\d+,a@GOTENT\n" { target { ! lp64 } } } } */
   /* { dg-final { scan-assembler-not "(?n)\n\tlarl\t%r\\d+,a\[^@\]" } } */
 }
diff --git a/gcc/testsuite/gcc.target/s390/load-imm64-1.c b/gcc/testsuite/gcc.target/s390/load-imm64-1.c
index 03d17f59096..8e812f2f01d 100644
--- a/gcc/testsuite/gcc.target/s390/load-imm64-1.c
+++ b/gcc/testsuite/gcc.target/s390/load-imm64-1.c
@@ -4,10 +4,10 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -march=z9-109" } */
 
-unsigned long
+unsigned long long
 magic (void)
 {
-  return 0x3f08c5392f756cd;
+  return 0x3f08c5392f756cdULL;
 }
 
 /* { dg-final { scan-assembler-times {\n\tllihf\t} 1 { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target/s390/load-imm64-2.c b/gcc/testsuite/gcc.target/s390/load-imm64-2.c
index ee0ff3b0a91..c3536b4d031 100644
--- a/gcc/testsuite/gcc.target/s390/load-imm64-2.c
+++ b/gcc/testsuite/gcc.target/s390/load-imm64-2.c
@@ -4,10 +4,10 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -march=z10" } */
 
-unsigned long
+unsigned long long
 magic (void)
 {
-  return 0x3f08c5392f756cd;
+  return 0x3f08c5392f756cdULL;
 }
 
 /* { dg-final { scan-assembler-times {\n\tllihf\t} 1 { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c b/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c
index 2d67679bb11..513912e669d 100644
--- a/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c
+++ b/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target target_attribute } */
-/* { dg-options "-march=z14" } */
+/* { dg-options "-march=z14 -mzarch" } */
 #if !defined(__LONG_DOUBLE_VX__)
 #error
 #endif
diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c b/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c
index 6f264313408..6b3cb321338 100644
--- a/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c
+++ b/gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target target_attribute } */
-/* { dg-options "-march=z13" } */
+/* { dg-options "-march=z13 -mzarch" } */
 #if defined(__LONG_DOUBLE_VX__)
 #error
 #endif
-- 
2.31.1



[PATCH v3] IBM Z: Use @PLT symbols for local functions in 64-bit mode

2021-07-12 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?

v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573614.html
v1 -> v2: Do not use UNSPEC_PLT in 64-bit code and rename it to
  UNSPEC_PLT31 (Ulrich, Andreas).  Do not append @PLT only to
  weak symbols in non-PIC code (Ulrich).  Add TLS tests.

v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574646.html
v2 -> v3: Use %K in function_profiler() and s390_output_mi_thunk(),
  add tests for these cases.



This helps with generating code for kernel hotpatches, which contain
individual functions and are loaded more than 2G away from vmlinux.
This should not create performance regressions for the normal use
cases, because for local functions ld replaces @PLT calls with direct
calls.

gcc/ChangeLog:

* config/s390/predicates.md (bras_sym_operand): Accept all
functions in 64-bit mode, use UNSPEC_PLT31.
(larl_operand): Use UNSPEC_PLT31.
* config/s390/s390.c (s390_loadrelative_operand_p): Likewise.
(legitimize_pic_address): Likewise.
(s390_emit_tls_call_insn): Mark __tls_get_offset as function,
use UNSPEC_PLT31.
(s390_delegitimize_address): Use UNSPEC_PLT31.
(s390_output_addr_const_extra): Likewise.
(print_operand): Add @PLT to TLS calls, handle %K.
(s390_function_profiler): Mark __fentry__/_mcount as function,
use %K, use UNSPEC_PLT31.
(s390_output_mi_thunk): Use only UNSPEC_GOT, use %K.
(s390_emit_call): Use UNSPEC_PLT31.
(s390_emit_tpf_eh_return): Mark __tpf_eh_return as function.
* config/s390/s390.md (UNSPEC_PLT31): Rename from UNSPEC_PLT.
(*movdi_64): Use %K.
(reload_base_64): Likewise.
(*sibcall_brc): Likewise.
(*sibcall_brcl): Likewise.
(*sibcall_value_brc): Likewise.
(*sibcall_value_brcl): Likewise.
(*bras): Likewise.
(*brasl): Likewise.
(*bras_r): Likewise.
(*brasl_r): Likewise.
(*bras_tls): Likewise.
(*brasl_tls): Likewise.
(main_base_64): Likewise.
(reload_base_64): Likewise.
(@split_stack_call): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/visibility/noPLT.C: Skip on s390x.
* g++.target/s390/mi-thunk.C: New test.
* gcc.target/s390/nodatarel-1.c: Move foostatic to the new
tests.
* gcc.target/s390/pr80080-4.c: Allow @PLT suffix.
* gcc.target/s390/risbg-ll-3.c: Likewise.
* gcc.target/s390/call.h: Common code for the new tests.
* gcc.target/s390/call-z10-pic-nodatarel.c: New test.
* gcc.target/s390/call-z10-pic.c: New test.
* gcc.target/s390/call-z10.c: New test.
* gcc.target/s390/call-z9-pic-nodatarel.c: New test.
* gcc.target/s390/call-z9-pic.c: New test.
* gcc.target/s390/call-z9.c: New test.
* gcc.target/s390/mfentry-m64-pic.c: New test.
* gcc.target/s390/tls.h: Common code for the new TLS tests.
* gcc.target/s390/tls-pic.c: New test.
* gcc.target/s390/tls.c: New test.
---
 gcc/config/s390/predicates.md |  9 ++-
 gcc/config/s390/s390.c| 81 +--
 gcc/config/s390/s390.md   | 32 
 gcc/testsuite/g++.dg/ext/visibility/noPLT.C   |  2 +-
 gcc/testsuite/g++.target/s390/mi-thunk.C  | 23 ++
 .../gcc.target/s390/call-z10-pic-nodatarel.c  | 20 +
 gcc/testsuite/gcc.target/s390/call-z10-pic.c  | 20 +
 gcc/testsuite/gcc.target/s390/call-z10.c  | 20 +
 .../gcc.target/s390/call-z9-pic-nodatarel.c   | 18 +
 gcc/testsuite/gcc.target/s390/call-z9-pic.c   | 18 +
 gcc/testsuite/gcc.target/s390/call-z9.c   | 20 +
 gcc/testsuite/gcc.target/s390/call.h  | 40 +
 .../gcc.target/s390/mfentry-m64-pic.c |  9 +++
 gcc/testsuite/gcc.target/s390/nodatarel-1.c   | 26 +-
 gcc/testsuite/gcc.target/s390/pr80080-4.c |  2 +-
 gcc/testsuite/gcc.target/s390/risbg-ll-3.c|  6 +-
 gcc/testsuite/gcc.target/s390/tls-pic.c   | 14 
 gcc/testsuite/gcc.target/s390/tls.c   | 10 +++
 gcc/testsuite/gcc.target/s390/tls.h   | 23 ++
 19 files changed, 320 insertions(+), 73 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/s390/mi-thunk.C
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z10.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call-z9.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call.h
 create mode 100644 gcc/testsuite/gcc.target/s390/mfentry-m64-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/tls-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/tls.c
 create mode 1006

Re: [PATCH v2] IBM Z: Use @PLT symbols for local functions in 64-bit mode

2021-07-07 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2021-07-07 at 21:03 +0200, Ilya Leoshkevich wrote:
> Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?
> 
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573614.html
> v1 -> v2: Do not use UNSPEC_PLT in 64-bit code and rename it to
>   UNSPEC_PLT31 (Ulrich, Andreas).  Do not append @PLT only to
>   weak symbols in non-PIC code (Ulrich).  Add TLS tests.
> 
> 
> 
> This helps with generating code for kernel hotpatches, which contain
> individual functions and are loaded more than 2G away from vmlinux.
> This should not create performance regressions for the normal use
> cases, because for local functions ld replaces @PLT calls with direct
> calls.

Please disregard this patch, I just realized I missed two
output_asm_insn () calls in s390.c: one in function_profiler () and
one in s390_output_mi_thunk ().  I'll send a v3.



[PATCH v2] IBM Z: Use @PLT symbols for local functions in 64-bit mode

2021-07-07 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?

v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573614.html
v1 -> v2: Do not use UNSPEC_PLT in 64-bit code and rename it to
  UNSPEC_PLT31 (Ulrich, Andreas).  Do not append @PLT only to
  weak symbols in non-PIC code (Ulrich).  Add TLS tests.



This helps with generating code for kernel hotpatches, which contain
individual functions and are loaded more than 2G away from vmlinux.
This should not create performance regressions for the normal use
cases, because for local functions ld replaces @PLT calls with direct
calls.

gcc/ChangeLog:

* config/s390/predicates.md (bras_sym_operand): Accept all
functions in 64-bit mode, use UNSPEC_PLT31.
(larl_operand): Use UNSPEC_PLT31.
* config/s390/s390.c (s390_loadrelative_operand_p): Likewise.
(legitimize_pic_address): Likewise.
(s390_emit_tls_call_insn): Mark __tls_get_offset as function,
use UNSPEC_PLT31.
(s390_delegitimize_address): Use UNSPEC_PLT31.
(s390_output_addr_const_extra): Likewise.
(print_operand): Add @PLT to TLS calls, handle %K.
(s390_function_profiler): Mark __fentry__/_mcount as function,
use UNSPEC_PLT31.
(s390_output_mi_thunk): Use only UNSPEC_GOT.
(s390_emit_call): Use UNSPEC_PLT31.
(s390_emit_tpf_eh_return): Mark __tpf_eh_return as function.
* config/s390/s390.md (UNSPEC_PLT31): Rename from UNSPEC_PLT.
(*movdi_64): Use %K.
(reload_base_64): Likewise.
(*sibcall_brc): Likewise.
(*sibcall_brcl): Likewise.
(*sibcall_value_brc): Likewise.
(*sibcall_value_brcl): Likewise.
(*bras): Likewise.
(*brasl): Likewise.
(*bras_r): Likewise.
(*brasl_r): Likewise.
(*bras_tls): Likewise.
(*brasl_tls): Likewise.
(main_base_64): Likewise.
(reload_base_64): Likewise.
(@split_stack_call): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/visibility/noPLT.C: Skip on s390x.
* gcc.target/s390/nodatarel-1.c: Move foostatic to the new
tests.
* gcc.target/s390/pr80080-4.c: Allow @PLT suffix.
* gcc.target/s390/risbg-ll-3.c: Likewise.
* gcc.target/s390/call.h: Common code for the new tests.
* gcc.target/s390/call31-z10-pic-nodatarel.c: New test.
* gcc.target/s390/call31-z10-pic.c: New test.
* gcc.target/s390/call31-z10.c: New test.
* gcc.target/s390/call31-z9-pic-nodatarel.c: New test.
* gcc.target/s390/call31-z9-pic.c: New test.
* gcc.target/s390/call31-z9.c: New test.
* gcc.target/s390/call64-z10-pic-nodatarel.c: New test.
* gcc.target/s390/call64-z10-pic.c: New test.
* gcc.target/s390/call64-z10.c: New test.
* gcc.target/s390/call64-z9-pic-nodatarel.c: New test.
* gcc.target/s390/call64-z9-pic.c: New test.
* gcc.target/s390/call64-z9.c: New test.
* gcc.target/s390/tls.h: Common code for the new TLS tests.
* gcc.target/s390/tls31-pic.c: New test.
* gcc.target/s390/tls31.c: New test.
* gcc.target/s390/tls64-pic.c: New test.
* gcc.target/s390/tls64.c: New test.
---
 gcc/config/s390/predicates.md |  9 ++-
 gcc/config/s390/s390.c| 73 ++-
 gcc/config/s390/s390.md   | 32 
 gcc/testsuite/g++.dg/ext/visibility/noPLT.C   |  2 +-
 gcc/testsuite/gcc.target/s390/call.h  | 40 ++
 .../s390/call31-z10-pic-nodatarel.c   | 16 
 .../gcc.target/s390/call31-z10-pic.c  | 16 
 gcc/testsuite/gcc.target/s390/call31-z10.c| 15 
 .../gcc.target/s390/call31-z9-pic-nodatarel.c | 16 
 gcc/testsuite/gcc.target/s390/call31-z9-pic.c | 16 
 gcc/testsuite/gcc.target/s390/call31-z9.c | 15 
 .../s390/call64-z10-pic-nodatarel.c   | 17 +
 .../gcc.target/s390/call64-z10-pic.c  | 17 +
 gcc/testsuite/gcc.target/s390/call64-z10.c| 15 
 .../gcc.target/s390/call64-z9-pic-nodatarel.c | 17 +
 gcc/testsuite/gcc.target/s390/call64-z9-pic.c | 17 +
 gcc/testsuite/gcc.target/s390/call64-z9.c | 15 
 gcc/testsuite/gcc.target/s390/nodatarel-1.c   | 26 +--
 gcc/testsuite/gcc.target/s390/pr80080-4.c |  2 +-
 gcc/testsuite/gcc.target/s390/risbg-ll-3.c|  6 +-
 gcc/testsuite/gcc.target/s390/tls.h   | 23 ++
 gcc/testsuite/gcc.target/s390/tls31-pic.c | 14 
 gcc/testsuite/gcc.target/s390/tls31.c |  9 +++
 gcc/testsuite/gcc.target/s390/tls64-pic.c | 14 
 gcc/testsuite/gcc.target/s390/tls64.c |  9 +++
 25 files changed, 382 insertions(+), 69 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/call.h
 create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10-pic.c
 create mode 100644 gcc/tes

[PATCH] IBM Z: Use @PLT symbols for local functions in 64-bit mode

2021-06-24 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?



This helps with generating the code for kernel hotpatches, which
contain individual functions and are loaded more than 2G away from
vmlinux.  This should not create performance regressions for the
normal use cases, because for local functions ld replaces @PLT calls
with direct calls.
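
A simplified model of the new %K operand modifier in this version of the patch might look as follows.  The operand_info struct and the plt_suffix function are hypothetical; the real code inspects an rtx inside print_operand, and only the condition itself is taken from the diff.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified description of an operand.  The real code works
// on an rtx and uses GET_CODE and SYMBOL_REF_FUNCTION_P.
struct operand_info
{
  bool is_symbol_ref;
  bool is_function;
};

// Mirrors the condition in the 'K' case of print_operand in this patch
// version: the suffix is emitted only for function symbols in 64-bit PIC
// code.
inline std::string
plt_suffix (bool target_64bit, bool pic, const operand_info &op)
{
  if (target_64bit && pic && op.is_symbol_ref && op.is_function)
    return "@PLT";
  return "";
}
```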

gcc/ChangeLog:

* config/s390/s390.c (print_operand): Handle %K.
* config/s390/s390.md (*movdi_64): Use %K for larl.
(reload_base_64): Likewise.
(*sibcall_brc): Use %K for j.
(*sibcall_brcl): Use %K for jg.
(*sibcall_value_brc): Use %K for j.
(*sibcall_value_brcl): Use %K for jg.
(*bras): Use %K.
(*brasl): Likewise.
(*bras_r): Likewise.
(*brasl_r): Likewise.
(main_base_64): Use %K for larl.
(reload_base_64): Likewise.
(@split_stack_call): Use %K for jg.

gcc/testsuite/ChangeLog:

* g++.dg/ext/visibility/noPLT.C: Skip on s390x.
* gcc.target/s390/nodatarel-1.c: Move foostatic to the new
tests.
* gcc.target/s390/pr80080-4.c: Allow @PLT suffix.
* gcc.target/s390/risbg-ll-3.c: Likewise.
* gcc.target/s390/call.h: Common code for the new tests.
* gcc.target/s390/call31-z10-pic-nodatarel.c: New test.
* gcc.target/s390/call31-z10-pic.c: New test.
* gcc.target/s390/call31-z10.c: New test.
* gcc.target/s390/call31-z9-pic-nodatarel.c: New test.
* gcc.target/s390/call31-z9-pic.c: New test.
* gcc.target/s390/call31-z9.c: New test.
* gcc.target/s390/call64-z10-pic-nodatarel.c: New test.
* gcc.target/s390/call64-z10-pic.c: New test.
* gcc.target/s390/call64-z10.c: New test.
* gcc.target/s390/call64-z9-pic-nodatarel.c: New test.
* gcc.target/s390/call64-z9-pic.c: New test.
* gcc.target/s390/call64-z9.c: New test.
---
 gcc/config/s390/s390.c|  9 +
 gcc/config/s390/s390.md   | 26 ++---
 gcc/testsuite/g++.dg/ext/visibility/noPLT.C   |  2 +-
 gcc/testsuite/gcc.target/s390/call.h  | 38 +++
 .../s390/call31-z10-pic-nodatarel.c   | 16 
 .../gcc.target/s390/call31-z10-pic.c  | 16 
 gcc/testsuite/gcc.target/s390/call31-z10.c| 15 
 .../gcc.target/s390/call31-z9-pic-nodatarel.c | 16 
 gcc/testsuite/gcc.target/s390/call31-z9-pic.c | 16 
 gcc/testsuite/gcc.target/s390/call31-z9.c | 15 
 .../s390/call64-z10-pic-nodatarel.c   | 17 +
 .../gcc.target/s390/call64-z10-pic.c  | 17 +
 gcc/testsuite/gcc.target/s390/call64-z10.c| 15 
 .../gcc.target/s390/call64-z9-pic-nodatarel.c | 17 +
 gcc/testsuite/gcc.target/s390/call64-z9-pic.c | 17 +
 gcc/testsuite/gcc.target/s390/call64-z9.c | 15 
 gcc/testsuite/gcc.target/s390/nodatarel-1.c   | 26 +
 gcc/testsuite/gcc.target/s390/pr80080-4.c |  2 +-
 gcc/testsuite/gcc.target/s390/risbg-ll-3.c|  6 +--
 19 files changed, 258 insertions(+), 43 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/call.h
 create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call31-z10.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call31-z9-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call31-z9-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call31-z9.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call64-z10-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call64-z10-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call64-z10.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call64-z9-pic-nodatarel.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call64-z9-pic.c
 create mode 100644 gcc/testsuite/gcc.target/s390/call64-z9.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 6bbeb640e1f..e7839044a40 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -7943,6 +7943,7 @@ print_operand_address (FILE *file, rtx addr)
 'E': print opcode suffix for branch on index instruction.
 'G': print the size of the operand in bytes.
 'J': print tls_load/tls_gdcall/tls_ldcall suffix
+'K': print @PLT suffix for call targets and load address values.
 'M': print the second word of a TImode operand.
 'N': print the second word of a DImode operand.
 'O': print only the displacement of a memory reference or address.
@@ -8129,6 +8130,14 @@ print_operand (FILE *file, rtx x, int code)
 case 'Y':
   print_shift_count_operand (file, x);
   return;
+
+case 'K':
+  if (TARGET_64BIT
+ && flag_pic
+ && GET_CODE (x) == SYMBOL_REF
+ && SYMBOL_REF_FUNCTION_P (x))
+   fprintf (file, "@PLT");
+  return

[PATCH v2] IBM Z: Define NO_PROFILE_COUNTERS

2021-06-23 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?

v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573348.html
v1 -> v2: Use ATTRIBUTE_UNUSED, compact op[] array (Andreas).
  I've also noticed that one of the nops that we generate for
  -mnop-mcount is no longer needed, and removed it.  A couple of
  tests needed to be adjusted after that.




s390 glibc does not need counters in the .data section, since it stores
edge hits in its own data structure.  Therefore counters only waste
space and confuse diffing tools (e.g. kpatch), so don't generate them.
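
The effect on the -mnop-mcount prologue size can be worked out from the halfword counts annotated in the diff.  The sketch below is only arithmetic; the constant and function names are made up for illustration.

```cpp
#include <cassert>

// Instruction lengths in halfwords, taken from the comments in the patch.
constexpr int stg_hw = 3;
constexpr int lg_hw = 3;
constexpr int st_hw = 2;
constexpr int l_hw = 2;
constexpr int larl_hw = 3;
constexpr int brasl_hw = 3;

// One halfword is two bytes.
constexpr int bytes_per_hw = 2;

// Size in bytes of the nop area emitted for -mnop-mcount.  WITH_LARL selects
// the old sequence, which also loaded the address of the counter label.
constexpr int
nop_bytes (bool is_64bit, bool with_larl)
{
  return ((is_64bit ? stg_hw + lg_hw : st_hw + l_hw)
	  + (with_larl ? larl_hw : 0)
	  + brasl_hw) * bytes_per_hw;
}
```

Under these counts the nop area shrinks from 24 to 18 bytes in 64-bit mode and from 20 to 14 bytes in the other (st/l) branch.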

gcc/ChangeLog:

* config/s390/s390.c (s390_function_profiler): Ignore labelno
parameter.
* config/s390/s390.h (NO_PROFILE_COUNTERS): Define.

gcc/testsuite/ChangeLog:

* gcc.target/s390/mnop-mcount-m31-mzarch.c: Adapt to the new
prologue size.
* gcc.target/s390/mnop-mcount-m64.c: Likewise.
---
 gcc/config/s390/s390.c| 42 +++
 gcc/config/s390/s390.h|  2 +
 .../gcc.target/s390/mnop-mcount-m31-mzarch.c  |  2 +-
 .../gcc.target/s390/mnop-mcount-m64.c |  2 +-
 4 files changed, 20 insertions(+), 28 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 6bbeb640e1f..590dd8f35bc 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -13110,33 +13110,25 @@ output_asm_nops (const char *user, int hw)
 }
 }
 
-/* Output assembler code to FILE to increment profiler label # LABELNO
-   for profiling a function entry.  */
+/* Output assembler code to FILE to call a profiler hook.  */
 
 void
-s390_function_profiler (FILE *file, int labelno)
+s390_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
 {
-  rtx op[8];
-
-  char label[128];
-  ASM_GENERATE_INTERNAL_LABEL (label, "LP", labelno);
+  rtx op[4];
 
   fprintf (file, "# function profiler \n");
 
   op[0] = gen_rtx_REG (Pmode, RETURN_REGNUM);
   op[1] = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
   op[1] = gen_rtx_MEM (Pmode, plus_constant (Pmode, op[1], UNITS_PER_LONG));
-  op[7] = GEN_INT (UNITS_PER_LONG);
-
-  op[2] = gen_rtx_REG (Pmode, 1);
-  op[3] = gen_rtx_SYMBOL_REF (Pmode, label);
-  SYMBOL_REF_FLAGS (op[3]) = SYMBOL_FLAG_LOCAL;
+  op[3] = GEN_INT (UNITS_PER_LONG);
 
-  op[4] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? "__fentry__" : "_mcount");
+  op[2] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? "__fentry__" : "_mcount");
   if (flag_pic)
 {
-  op[4] = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op[4]), UNSPEC_PLT);
-  op[4] = gen_rtx_CONST (Pmode, op[4]);
+  op[2] = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op[2]), UNSPEC_PLT);
+  op[2] = gen_rtx_CONST (Pmode, op[2]);
 }
 
   if (flag_record_mcount)
@@ -13150,20 +13142,19 @@ s390_function_profiler (FILE *file, int labelno)
warning (OPT_Wcannot_profile, "nested functions cannot be profiled "
 "with %<-mfentry%> on s390");
   else
-   output_asm_insn ("brasl\t0,%4", op);
+   output_asm_insn ("brasl\t0,%2", op);
 }
   else if (TARGET_64BIT)
 {
   if (flag_nop_mcount)
-   output_asm_nops ("-mnop-mcount", /* stg */ 3 + /* larl */ 3 +
-/* brasl */ 3 + /* lg */ 3);
+   output_asm_nops ("-mnop-mcount", /* stg */ 3 + /* brasl */ 3 +
+/* lg */ 3);
   else
{
  output_asm_insn ("stg\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
-   output_asm_insn (".cfi_rel_offset\t%0,%7", op);
- output_asm_insn ("larl\t%2,%3", op);
- output_asm_insn ("brasl\t%0,%4", op);
+   output_asm_insn (".cfi_rel_offset\t%0,%3", op);
+ output_asm_insn ("brasl\t%0,%2", op);
  output_asm_insn ("lg\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
output_asm_insn (".cfi_restore\t%0", op);
@@ -13172,15 +13163,14 @@ s390_function_profiler (FILE *file, int labelno)
   else
 {
   if (flag_nop_mcount)
-   output_asm_nops ("-mnop-mcount", /* st */ 2 + /* larl */ 3 +
-/* brasl */ 3 + /* l */ 2);
+   output_asm_nops ("-mnop-mcount", /* st */ 2 + /* brasl */ 3 +
+/* l */ 2);
   else
{
  output_asm_insn ("st\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
-   output_asm_insn (".cfi_rel_offset\t%0,%7", op);
- output_asm_insn ("larl\t%2,%3", op);
- output_asm_insn ("brasl\t%0,%4", op);
+   output_asm_insn (".cfi_rel_offset\t%0,%3", op);
+ output_asm_insn ("brasl\t%0,%2", op);
  output_asm_insn ("l\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
output_asm_insn (".cfi_restore\t%0", op);
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index 3b876160420..fb16a455a03 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -787,6 +787,8 @@ CUMULATIVE_ARGS;
 
 #define PROFILE_BEFORE_PROLOGUE 1
 
+#define NO_PROFILE_COUNTERS 1
+
 
 /* Trampolines 

[PATCH] IBM Z: Define NO_PROFILE_COUNTERS

2021-06-21 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?



s390 glibc does not need counters in the .data section, since it stores
edge hits in its own data structure.  Therefore counters only waste
space and confuse diffing tools (e.g. kpatch), so don't generate them.
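
Dropping the counter also drops the larl that loaded its address, so the
-mnop-mcount padding has to shrink to match.  A rough sketch of that
arithmetic, using the per-instruction halfword sizes from the comments in
s390_function_profiler ().  The function and parameter names are made up
for illustration; only the sizes come from the patch.

```cpp
#include <cassert>

// Sketch only: models the halfword count handed to output_asm_nops ().
// profiler_halfwords and its parameters are hypothetical names.
int profiler_halfwords (bool is_64bit, bool with_counter_load)
{
  const int store = is_64bit ? 3 : 2;          // stg : st
  const int load = is_64bit ? 3 : 2;           // lg : l
  const int larl = with_counter_load ? 3 : 0;  // only needed for the counter
  const int brasl = 3;
  return store + larl + brasl + load;
}

// Sizes with and without the counter load.
void check_profiler_sizes ()
{
  assert (profiler_halfwords (true, true) == 12);
  assert (profiler_halfwords (true, false) == 9);
  assert (profiler_halfwords (false, true) == 10);
  assert (profiler_halfwords (false, false) == 7);
}
```

So each profiled function entry gets three halfwords shorter, in addition
to the counter word that no longer lands in .data.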

gcc/ChangeLog:

* config/s390/s390.c (s390_function_profiler): Ignore labelno
parameter.
* config/s390/s390.h (NO_PROFILE_COUNTERS): Define.
---
 gcc/config/s390/s390.c | 14 ++
 gcc/config/s390/s390.h |  2 ++
 2 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 6bbeb640e1f..96c9a9db53b 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -13110,17 +13110,13 @@ output_asm_nops (const char *user, int hw)
 }
 }
 
-/* Output assembler code to FILE to increment profiler label # LABELNO
-   for profiling a function entry.  */
+/* Output assembler code to FILE to call a profiler hook.  */
 
 void
-s390_function_profiler (FILE *file, int labelno)
+s390_function_profiler (FILE *file, int /* labelno */)
 {
   rtx op[8];
 
-  char label[128];
-  ASM_GENERATE_INTERNAL_LABEL (label, "LP", labelno);
-
   fprintf (file, "# function profiler \n");
 
   op[0] = gen_rtx_REG (Pmode, RETURN_REGNUM);
@@ -13128,10 +13124,6 @@ s390_function_profiler (FILE *file, int labelno)
   op[1] = gen_rtx_MEM (Pmode, plus_constant (Pmode, op[1], UNITS_PER_LONG));
   op[7] = GEN_INT (UNITS_PER_LONG);
 
-  op[2] = gen_rtx_REG (Pmode, 1);
-  op[3] = gen_rtx_SYMBOL_REF (Pmode, label);
-  SYMBOL_REF_FLAGS (op[3]) = SYMBOL_FLAG_LOCAL;
-
   op[4] = gen_rtx_SYMBOL_REF (Pmode, flag_fentry ? "__fentry__" : "_mcount");
   if (flag_pic)
 {
@@ -13162,7 +13154,6 @@ s390_function_profiler (FILE *file, int labelno)
  output_asm_insn ("stg\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
output_asm_insn (".cfi_rel_offset\t%0,%7", op);
- output_asm_insn ("larl\t%2,%3", op);
  output_asm_insn ("brasl\t%0,%4", op);
  output_asm_insn ("lg\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
@@ -13179,7 +13170,6 @@ s390_function_profiler (FILE *file, int labelno)
  output_asm_insn ("st\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
output_asm_insn (".cfi_rel_offset\t%0,%7", op);
- output_asm_insn ("larl\t%2,%3", op);
  output_asm_insn ("brasl\t%0,%4", op);
  output_asm_insn ("l\t%0,%1", op);
  if (flag_dwarf2_cfi_asm)
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index 3b876160420..fb16a455a03 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -787,6 +787,8 @@ CUMULATIVE_ARGS;
 
 #define PROFILE_BEFORE_PROLOGUE 1
 
+#define NO_PROFILE_COUNTERS 1
+
 
 /* Trampolines for nested functions.  */
 
-- 
2.31.1



[PATCH] IBM Z: Remove match_scratch workaround

2021-06-01 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?



Since commit dd1ef00c45ba ("Fix bug in the define_subst handling that
made match_scratch unusable for multi-alternative patterns.") the
workaround for that bug in *ashrdi3_31 is not only no
longer necessary, but actually breaks the build.

Get rid of it by using only one alternative in (match_scratch).  It
will be replicated as many times as needed in order to match the
pattern with which (define_subst) is used.
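
To illustrate what "replicated" means here, a simplified model of turning
a single-alternative constraint into an N-alternative one.  This is not
the gensupport code; replicate_constraint is a hypothetical helper, and
treating a leading '=' or '+' as a once-per-operand modifier is my
assumption, chosen because it reproduces the old hand-written "=d,d".

```cpp
#include <cassert>
#include <string>

// Sketch only: replicate a single-alternative constraint N times.
std::string replicate_constraint (const std::string &constraint, int n)
{
  assert (n >= 1);
  // A leading '=' or '+' applies to the whole operand, so it is kept
  // once at the front and not repeated per alternative.
  std::string::size_type start = 0;
  if (!constraint.empty ()
      && (constraint[0] == '=' || constraint[0] == '+'))
    start = 1;
  const std::string body = constraint.substr (start);

  std::string result = constraint;
  for (int i = 1; i < n; i++)
    result += "," + body;
  return result;
}
```

With this model, replicate_constraint ("=d", 2) gives "=d,d" and
replicate_constraint ("jsc", 2) gives "jsc,jsc", which are the strings the
workaround used to spell out by hand.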

gcc/ChangeLog:

* config/s390/s390.md (*ashrdi3_31): Use a single
constraint.
* config/s390/subst.md (cconly_subst): Use a single constraint
in (match_scratch).

gcc/testsuite/ChangeLog:

* gcc.target/s390/ashr.c: New test.
---
 gcc/config/s390/s390.md  | 14 --
 gcc/config/s390/subst.md |  2 +-
 gcc/testsuite/gcc.target/s390/ashr.c | 11 +++
 3 files changed, 16 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/ashr.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 7faf775fbf2..0c5b4dc9029 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -9328,19 +9328,13 @@
   ""
   "")
 
-; FIXME: The number of alternatives is doubled here to match the fix
-; number of 2 in the subst pattern for the (clobber (match_scratch...
-; The right fix should be to support match_scratch in the output
-; pattern of a define_subst.
 (define_insn "*ashrdi3_31"
-  [(set (match_operand:DI 0 "register_operand"   "=d, d")
-(ashiftrt:DI (match_operand:DI 1 "register_operand"   "0, 0")
- (match_operand:QI 2 "shift_count_operand" "jsc,jsc")))
+  [(set (match_operand:DI 0 "register_operand"   "=d")
+(ashiftrt:DI (match_operand:DI 1 "register_operand"   "0")
+ (match_operand:QI 2 "shift_count_operand" "jsc")))
(clobber (reg:CC CC_REGNUM))]
   "!TARGET_ZARCH"
-  "@
-   srda\t%0,%Y2
-   srda\t%0,%Y2"
+  "srda\t%0,%Y2"
   [(set_attr "op_type" "RS")
(set_attr "atype"   "reg")])
 
diff --git a/gcc/config/s390/subst.md b/gcc/config/s390/subst.md
index 384af11c198..3ea6fc40ba8 100644
--- a/gcc/config/s390/subst.md
+++ b/gcc/config/s390/subst.md
@@ -45,7 +45,7 @@
   "s390_match_ccmode(insn, CCSmode)"
   [(set (reg CC_REGNUM)
(compare (match_dup 1) (const_int 0)))
-   (clobber (match_scratch:DSI 0 "=d,d"))])
+   (clobber (match_scratch:DSI 0 "=d"))])
 
 (define_subst_attr "cconly" "cconly_subst" "" "_cconly")
 
diff --git a/gcc/testsuite/gcc.target/s390/ashr.c b/gcc/testsuite/gcc.target/s390/ashr.c
new file mode 100644
index 000..8cffdfa9a1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/ashr.c
@@ -0,0 +1,11 @@
+/* Test the arithmetic shift right pattern.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int e(void);
+
+int f (long c, int b)
+{
+  return (c >> b) && e ();
+}
-- 
2.31.1



Re: [PATCH v2] IBM Z: Handle hard registers in s390_md_asm_adjust()

2021-05-03 Thread Ilya Leoshkevich via Gcc-patches
On Fri, 2021-04-30 at 08:49 +0200, Andreas Krebbel wrote:
> On 4/28/21 3:48 AM, Ilya Leoshkevich wrote:
> > Bootstrapped and regtested on s390x-redhat-linux.  Tested with
> > valgrind
> > too (PR 100278 is now fixed).  Ok for master?
> > 
> > v1:
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568771.html
> > v1 -> v2: Use the UNSPEC pattern, which is less efficient, but is
> > more
> >   on the "obviously correct" side than gen_raw_SUBREG().
> > 
> > 
> > 
> > gen_fprx2_to_tf() and gen_tf_to_fprx2() cannot handle hard
> > registers,
> > since the subregs they create do not pass validation.  Change
> > s390_md_asm_adjust() to manually copy between hard VRs and FPRs
> > instead
> > of using these two functions.
> > 
> > gcc/ChangeLog:
> > 
> > PR target/100217
> > * config/s390/s390.c (s390_hard_fp_reg_p): New function.
> > (s390_md_asm_adjust): Handle hard registers.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR target/100217
> > * gcc.target/s390/vector/long-double-asm-in-out-hard-fp-
> > reg.c: New test.
> > * gcc.target/s390/vector/long-double-asm-inout-hard-fp-
> > reg.c: New test.
> 
> Ok. Thanks!
> 
> Andreas

Thanks!

I forgot to ask: ok for gcc-11 branch?



[PATCH v2] IBM Z: Handle hard registers in s390_md_asm_adjust()

2021-04-27 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Tested with valgrind
too (PR 100278 is now fixed).  Ok for master?

v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568771.html
v1 -> v2: Use the UNSPEC pattern, which is less efficient, but is more
  on the "obviously correct" side than gen_raw_SUBREG().



gen_fprx2_to_tf() and gen_tf_to_fprx2() cannot handle hard registers,
since the subregs they create do not pass validation.  Change
s390_md_asm_adjust() to manually copy between hard VRs and FPRs instead
of using these two functions.
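
The manual copy boils down to a doubleword permute between two vector
registers.  Below is a small model of VPDI's selection, as I understand
the instruction: bit value 4 of M4 picks the doubleword taken from the
second operand, bit value 1 the one taken from the third.  The M4 == 0
case agrees with the comment in the *df_to_tf_1 pattern from v1; the type
and function names are invented for the sketch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using v2di = std::array<std::uint64_t, 2>;

// Sketch only: result[0] comes from V2, result[1] comes from V3.
v2di vpdi (const v2di &v2, const v2di &v3, unsigned m4)
{
  assert ((m4 & ~5u) == 0);   // only bit values 4 and 1 are modelled
  return { v2[(m4 >> 2) & 1], v3[m4 & 1] };
}
```

For the output case, with the first half already in place, this is
v1 = vpdi (v1, v3, 0): doubleword 0 of v1 stays, doubleword 1 is filled
from doubleword 0 of the second register of the pair.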

gcc/ChangeLog:

PR target/100217
* config/s390/s390.c (s390_hard_fp_reg_p): New function.
(s390_md_asm_adjust): Handle hard registers.

gcc/testsuite/ChangeLog:

PR target/100217
* gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c: New test.
* gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c: New test.
---
 gcc/config/s390/s390.c| 52 +--
 .../long-double-asm-in-out-hard-fp-reg.c  | 33 
 .../long-double-asm-inout-hard-fp-reg.c   | 31 +++
 3 files changed, 112 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index a9c945c5ee9..88361f98c7e 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16754,6 +16754,23 @@ f_constraint_p (const char *constraint)
   return seen_f_p && !seen_v_p;
 }
 
+/* Return TRUE iff X is a hard floating-point (and not a vector) register.  */
+
+static bool
+s390_hard_fp_reg_p (rtx x)
+{
+  if (!(REG_P (x) && HARD_REGISTER_P (x) && REG_ATTRS (x)))
+return false;
+
+  tree decl = REG_EXPR (x);
+  if (!(HAS_DECL_ASSEMBLER_NAME_P (decl) && DECL_ASSEMBLER_NAME_SET_P (decl)))
+return false;
+
+  const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  return name[0] == '*' && name[1] == 'f';
+}
+
 /* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f"
constraints when long doubles are stored in vector registers.  */
 
@@ -16787,9 +16804,24 @@ s390_md_asm_adjust (vec &outputs, vec &inputs,
   gcc_assert (allows_reg);
   gcc_assert (!is_inout);
   /* Copy output value from a FPR pair into a vector register.  */
-  rtx fprx2 = gen_reg_rtx (FPRX2mode);
+  rtx fprx2;
   push_to_sequence2 (after_md_seq, after_md_end);
-  emit_insn (gen_fprx2_to_tf (outputs[i], fprx2));
+  if (s390_hard_fp_reg_p (outputs[i]))
+   {
+ fprx2 = gen_rtx_REG (FPRX2mode, REGNO (outputs[i]));
+ /* The first half is already at the correct location, copy only the
+  * second one.  Use the UNSPEC pattern instead of the SUBREG one,
+  * since s390_can_change_mode_class() rejects
+  * (subreg:DF (reg:TF %fN) 8) and thus subreg validation fails.  */
+ rtx v1 = gen_rtx_REG (V2DFmode, REGNO (outputs[i]));
+ rtx v3 = gen_rtx_REG (V2DFmode, REGNO (outputs[i]) + 1);
+ emit_insn (gen_vec_permiv2df (v1, v1, v3, const0_rtx));
+   }
+  else
+   {
+ fprx2 = gen_reg_rtx (FPRX2mode);
+ emit_insn (gen_fprx2_to_tf (outputs[i], fprx2));
+   }
   after_md_seq = get_insns ();
   after_md_end = get_last_insn ();
   end_sequence ();
@@ -16813,8 +16845,20 @@ s390_md_asm_adjust (vec &outputs, vec &inputs,
continue;
   gcc_assert (allows_reg);
   /* Copy input value from a vector register into a FPR pair.  */
-  rtx fprx2 = gen_reg_rtx (FPRX2mode);
-  emit_insn (gen_tf_to_fprx2 (fprx2, inputs[i]));
+  rtx fprx2;
+  if (s390_hard_fp_reg_p (inputs[i]))
+   {
+ fprx2 = gen_rtx_REG (FPRX2mode, REGNO (inputs[i]));
+ /* Copy only the second half.  */
+ rtx v1 = gen_rtx_REG (V2DFmode, REGNO (inputs[i]) + 1);
+ rtx v2 = gen_rtx_REG (V2DFmode, REGNO (inputs[i]));
+ emit_insn (gen_vec_permiv2df (v1, v2, v1, GEN_INT (3)));
+   }
+  else
+   {
+ fprx2 = gen_reg_rtx (FPRX2mode);
+ emit_insn (gen_tf_to_fprx2 (fprx2, inputs[i]));
+   }
   inputs[i] = fprx2;
   input_modes[i] = FPRX2mode;
 }
diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c b/gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c
new file mode 100644
index 000..2dcaf08f00b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzarch --save-temps" } */
+/* { dg-do run { target { s390_z14_hw } } } */
+#include 
+#include 
+
+__attribute__ ((noipa)) static long double
+sqxbr (long double x)
+{
+  register long double in asm("f0") = x;
+  register long double out asm("f1");
+
+  asm("sqxbr\t%0,%1" :

[PATCH] IBM Z: Handle hard registers in s390_md_asm_adjust()

2021-04-26 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Tested with valgrind
on top of 52a5515ed (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100278).  Ok for master?



gen_fprx2_to_tf() and gen_tf_to_fprx2() cannot handle hard registers,
since the subregs they create do not pass validation.  Change
s390_md_asm_adjust() to manually copy between hard VRs and FPRs instead
of using these two functions.
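
The hard-register detection rests on the variable's assembler name.  A
sketch of just that final name test, applied to a plain C string.  It
assumes that a declaration like `register long double in asm("f0")` ends
up with the assembler name "*f0" (the leading '*' marking a
user-specified name); names_fp_register_p is a hypothetical helper, not
part of the patch.

```cpp
#include <cassert>
#include <cstring>

// Sketch only: the name check at the end of s390_hard_fp_reg_p ().
bool names_fp_register_p (const char *assembler_name)
{
  assert (assembler_name != nullptr);
  // Check the length first so that the second character is never read
  // past the terminator.
  return std::strlen (assembler_name) >= 2
         && assembler_name[0] == '*'
         && assembler_name[1] == 'f';
}
```

Under that assumption "*f0" is accepted, while "*v16" (a vector register)
and a name without the '*' prefix are rejected.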

gcc/ChangeLog:

PR target/100217
* config/s390/s390.c (s390_hard_fp_reg_p): New function.
(s390_md_asm_adjust): Handle hard registers.
* config/s390/vector.md (*df_to_tf_1): New pattern.

gcc/testsuite/ChangeLog:

PR target/100217
* gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c: New test.
* gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c: New test.
---
 gcc/config/s390/s390.c| 50 +--
 gcc/config/s390/vector.md |  8 +++
 .../long-double-asm-in-out-hard-fp-reg.c  | 28 +++
 .../long-double-asm-inout-hard-fp-reg.c   | 27 ++
 4 files changed, 109 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out-hard-fp-reg.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-inout-hard-fp-reg.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index a9c945c5ee9..ed6cea9b1f7 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16754,6 +16754,23 @@ f_constraint_p (const char *constraint)
   return seen_f_p && !seen_v_p;
 }
 
+/* Return TRUE iff X is a hard floating-point (and not a vector) register.  */
+
+static bool
+s390_hard_fp_reg_p (rtx x)
+{
+  if (!(REG_P (x) && HARD_REGISTER_P (x) && REG_ATTRS (x)))
+return false;
+
+  tree decl = REG_EXPR (x);
+  if (!(HAS_DECL_ASSEMBLER_NAME_P (decl) && DECL_ASSEMBLER_NAME_SET_P (decl)))
+return false;
+
+  const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  return name[0] == '*' && name[1] == 'f';
+}
+
 /* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f"
constraints when long doubles are stored in vector registers.  */
 
@@ -16787,9 +16804,23 @@ s390_md_asm_adjust (vec &outputs, vec &inputs,
   gcc_assert (allows_reg);
   gcc_assert (!is_inout);
   /* Copy output value from a FPR pair into a vector register.  */
-  rtx fprx2 = gen_reg_rtx (FPRX2mode);
+  rtx fprx2;
   push_to_sequence2 (after_md_seq, after_md_end);
-  emit_insn (gen_fprx2_to_tf (outputs[i], fprx2));
+  if (s390_hard_fp_reg_p (outputs[i]))
+   {
+ fprx2 = gen_rtx_REG (FPRX2mode, REGNO (outputs[i]));
+ /* The first half is already at the correct location, copy only the
+  * second one.  Use gen_rtx_raw_SUBREG() in order to skip subreg
+  * validation - we need to build (subreg:DF (reg:TF %fN) 8), which
+  * will otherwise be rejected by s390_can_change_mode_class().  */
+ emit_move_insn (gen_rtx_raw_SUBREG (DFmode, outputs[i], 8),
+ simplify_gen_subreg (DFmode, fprx2, FPRX2mode, 8));
+   }
+  else
+   {
+ fprx2 = gen_reg_rtx (FPRX2mode);
+ emit_insn (gen_fprx2_to_tf (outputs[i], fprx2));
+   }
   after_md_seq = get_insns ();
   after_md_end = get_last_insn ();
   end_sequence ();
@@ -16813,8 +16844,19 @@ s390_md_asm_adjust (vec &outputs, vec &inputs,
continue;
   gcc_assert (allows_reg);
   /* Copy input value from a vector register into a FPR pair.  */
-  rtx fprx2 = gen_reg_rtx (FPRX2mode);
-  emit_insn (gen_tf_to_fprx2 (fprx2, inputs[i]));
+  rtx fprx2;
+  if (s390_hard_fp_reg_p (inputs[i]))
+   {
+ fprx2 = gen_rtx_REG (FPRX2mode, REGNO (inputs[i]));
+ /* Copy only the second half.  */
+ emit_move_insn (gen_rtx_raw_SUBREG (DFmode, fprx2, 8),
+ gen_rtx_raw_SUBREG (DFmode, inputs[i], 8));
+   }
+  else
+   {
+ fprx2 = gen_reg_rtx (FPRX2mode);
+ emit_insn (gen_tf_to_fprx2 (fprx2, inputs[i]));
+   }
   inputs[i] = fprx2;
   input_modes[i] = FPRX2mode;
 }
diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index c80d582a300..648e00625e1 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -634,6 +634,14 @@
 }
   [(set_attr "op_type" "VRR,*")])
 
+(define_insn "*df_to_tf_1"
+  [(set (subreg:DF (match_operand:TF 0 "nonimmediate_operand" "+v") 8)
+   (match_operand:DF 1 "general_operand"   "f"))]
+  "TARGET_VXE"
+  ; M4 == 0 corresponds to %v0[0] = %v0[0]; %v0[1] = %v1[0];
+  "vpdi\t%v0,%v0,%v1,0"
+  [(set_attr "op_type" "VRR")])
+
 (define_insn "*vec_ti_to_v1ti"
   [(set (match_operand:V1TI   0 "nonimmediate_operand" 
"=v,v,R,  v,  v,v")
(vec_duplicate:V1TI (match_operand:TI 1 "general_operand"   
"v,R,v,j00,jm1,d")))]
diff --git 
a/gcc/testsuite/gcc.tar

Re: [PATCH v3] fwprop: Fix single_use_p calculation

2021-03-23 Thread Ilya Leoshkevich via Gcc-patches
On Tue, 2021-03-23 at 12:48 +, Richard Sandiford wrote:
> Ilya Leoshkevich  writes:
> > +inline use_info *
> > +set_info::single_nondebug_use () const
> > +{
> > +  use_info *nondebug_insn = single_nondebug_insn_use ();
> > +  if (nondebug_insn)
> > +    return has_phi_uses () ? nullptr : nondebug_insn;
> > +  use_info *phi = single_phi_use ();
> > +  if (phi)
> > +    return has_nondebug_insn_uses() ? nullptr : phi;
> > +  return nullptr;
> 
> Very minor, but I think this is simpler as:
> 
>   if (!has_phi_uses ())
>     return single_nondebug_insn_use ();
>   if (!has_nondebug_insn_uses ())
>     return single_phi_use ();
>   return nullptr;
> 
> OK with that change (or without if you prefer the original).
> Thanks for the fix and for your patience. :-)
> 
> Richard

Retested with the change above and pushed as:

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b61461ac7f9bdd0e98145be79423d19b933afaa0

Thanks for all the suggestions!

Best regards,
Ilya
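
For the record, a toy model that checks the two formulations of
single_nondebug_use () against each other.  toy_use and toy_set are
stand-ins invented for this sketch and only mimic the accessor names from
the patch; they are not the rtl-ssa classes.

```cpp
#include <cassert>

// Sketch only: toy stand-ins for rtl-ssa's use_info and set_info.
struct toy_use { bool in_phi; };

struct toy_set
{
  int num_insn_uses;   // nondebug instruction uses
  int num_phi_uses;    // phi uses
  toy_use insn_use;
  toy_use phi_use;

  toy_set (int i, int p)
    : num_insn_uses (i), num_phi_uses (p), insn_use{false}, phi_use{true} {}

  bool has_nondebug_insn_uses () const { return num_insn_uses > 0; }
  bool has_phi_uses () const { return num_phi_uses > 0; }
  const toy_use *single_nondebug_insn_use () const
  { return num_insn_uses == 1 ? &insn_use : nullptr; }
  const toy_use *single_phi_use () const
  { return num_phi_uses == 1 ? &phi_use : nullptr; }

  // The form posted in v3.
  const toy_use *single_nondebug_use_v3 () const
  {
    if (const toy_use *u = single_nondebug_insn_use ())
      return has_phi_uses () ? nullptr : u;
    if (const toy_use *u = single_phi_use ())
      return has_nondebug_insn_uses () ? nullptr : u;
    return nullptr;
  }

  // The form suggested in review.
  const toy_use *single_nondebug_use_simplified () const
  {
    if (!has_phi_uses ())
      return single_nondebug_insn_use ();
    if (!has_nondebug_insn_uses ())
      return single_phi_use ();
    return nullptr;
  }
};

// Return true if both forms agree for 0..MAX_USES uses of each kind.
bool forms_agree (int max_uses)
{
  for (int i = 0; i <= max_uses; i++)
    for (int p = 0; p <= max_uses; p++)
      {
        toy_set s (i, p);
        if (s.single_nondebug_use_v3 ()
            != s.single_nondebug_use_simplified ())
          return false;
      }
  return true;
}
```

Both forms return the insn use only for exactly one insn use and no phi
uses, the phi use only for exactly one phi use and no insn uses, and null
otherwise.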



[PATCH v3] fwprop: Fix single_use_p calculation

2021-03-22 Thread Ilya Leoshkevich via Gcc-patches
Bootstrap and regtest running on x86_64-redhat-linux,
ppc64le-redhat-linux and s390x-redhat-linux.  Ok for master?

v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566127.html
v1 -> v2: Pass a set_info instead of a def_info around.
  Add single_nondebug_insn_use () - maybe this could be improved
  further? [1]
  Simplify def->insn ()->ebb ().
  Improve formatting.

v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567121.html
v2 -> v3: Introduce single_nondebug_use and single_phi_use methods.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567118.html

---

Commit efb6bc55a93a ("fwprop: Allow (subreg (mem)) simplifications")
introduced a check that was supposed to look at the propagated def's
number of uses.  It uses insn_info::num_uses (), which in reality
returns the number of uses def's insn has.  The whole change therefore
works only by accident.

Fix by looking at set_info's uses instead of insn_info's uses.  This
requires passing around set_info instead of insn_info.
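
To spell out the mix-up, a toy model with invented types (toy_insn and
toy_def are not the rtl-ssa classes): the number of values an instruction
reads says nothing about how many instructions read the value it defines.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct toy_insn;

// Sketch only: a definition knows its defining insn and its readers.
struct toy_def
{
  const toy_insn *insn;                 // instruction that sets the value
  std::vector<const toy_insn *> users;  // instructions that read it
};

struct toy_insn
{
  std::vector<const toy_def *> inputs;  // definitions this insn reads

  // What the old check looked at: how many values the insn itself reads.
  std::size_t num_uses () const { return inputs.size (); }
};

// What single_use_p is meant to express: the definition has one reader.
bool def_has_single_user (const toy_def &def)
{
  return def.users.size () == 1;
}

// One input feeding a definition that is read by two instructions.
bool old_and_new_checks_differ ()
{
  toy_def input{nullptr, {}};
  toy_insn def_insn{{&input}};
  toy_insn user1{{}}, user2{{}};
  toy_def def{&def_insn, {&user1, &user2}};
  return (def.insn->num_uses () == 1) != def_has_single_user (def);
}
```

Here the defining instruction reads exactly one value, so the old test
would report a single use even though the definition has two readers.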

gcc/ChangeLog:

2021-03-02  Ilya Leoshkevich  

* fwprop.c (fwprop_propagation::fwprop_propagation): Look at
set_info's uses.
(try_fwprop_subst_note): Use set_info instead of insn_info.
(try_fwprop_subst_pattern): Likewise.
(try_fwprop_subst_notes): Likewise.
(try_fwprop_subst): Likewise.
(forward_propagate_subreg): Likewise.
(forward_propagate_and_simplify): Likewise.
(forward_propagate_into): Likewise.
* rtl-ssa/accesses.h (set_info::single_nondebug_use): New
method.
(set_info::single_nondebug_insn_use): Likewise.
(set_info::single_phi_use): Likewise.
* rtl-ssa/member-fns.inl (set_info::single_nondebug_use): New
method.
(set_info::single_nondebug_insn_use): Likewise.
(set_info::single_phi_use): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/long-double-asm-abi.c: New test.
---
 gcc/fwprop.c  | 81 +--
 gcc/rtl-ssa/accesses.h| 13 +++
 gcc/rtl-ssa/member-fns.inl| 30 +++
 .../s390/vector/long-double-asm-abi.c | 26 ++
 4 files changed, 109 insertions(+), 41 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-abi.c

diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index 4b8a554e823..d7203672886 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -175,7 +175,7 @@ namespace
 static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1;
 static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2;
 
-fwprop_propagation (insn_info *, insn_info *, rtx, rtx);
+fwprop_propagation (insn_info *, set_info *, rtx, rtx);
 
 bool changed_mem_p () const { return result_flags & CHANGED_MEM; }
 bool folded_to_constants_p () const;
@@ -191,13 +191,13 @@ namespace
   };
 }
 
-/* Prepare to replace FROM with TO in INSN.  */
+/* Prepare to replace FROM with TO in USE_INSN.  */
 
 fwprop_propagation::fwprop_propagation (insn_info *use_insn,
-   insn_info *def_insn, rtx from, rtx to)
+   set_info *def, rtx from, rtx to)
   : insn_propagation (use_insn->rtl (), from, to),
-single_use_p (def_insn->num_uses () == 1),
-single_ebb_p (use_insn->ebb () == def_insn->ebb ())
+single_use_p (def->single_nondebug_use ()),
+single_ebb_p (use_insn->ebb () == def->ebb ())
 {
   should_check_mems = true;
   should_note_simplifications = true;
@@ -368,24 +368,25 @@ contains_paradoxical_subreg_p (rtx x)
   return false;
 }
 
-/* Try to substitute (set DEST SRC) from DEF_INSN into note NOTE of USE_INSN.
-   Return the number of substitutions on success, otherwise return -1 and
-   leave USE_INSN unchanged.
+/* Try to substitute (set DEST SRC), which defines DEF, into note NOTE of
+   USE_INSN.  Return the number of substitutions on success, otherwise return
+   -1 and leave USE_INSN unchanged.
 
-   If REQUIRE_CONSTANT is true, require all substituted occurences of SRC
+   If REQUIRE_CONSTANT is true, require all substituted occurrences of SRC
to fold to a constant, so that the note does not use any more registers
than it did previously.  If REQUIRE_CONSTANT is false, also allow the
substitution if it's something we'd normally allow for the main
instruction pattern.  */
 
 static int
-try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn,
+try_fwprop_subst_note (insn_info *use_insn, set_info *def,
   rtx note, rtx dest, rtx src, bool require_constant)
 {
   rtx_insn *use_rtl = use_insn->rtl ();
+  insn_info *def_insn = def->insn ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_insn, def_insn, dest, src);
+  fwprop_propagation prop (use_insn, def, dest, src);
   i

Re: [PATCH] fwprop: Fix single_use_p calculation

2021-03-22 Thread Ilya Leoshkevich via Gcc-patches
On Mon, 2021-03-22 at 22:55 +, Richard Sandiford wrote:
> Ilya Leoshkevich  writes:
> > On Mon, 2021-03-22 at 18:23 +, Richard Sandiford wrote:
> > > Ilya Leoshkevich  writes:
> > 
> > [...]
> > 
> > > > Do you still want me to add single_nondebug_use() for
> > > > completeness
> > > > in
> > > > this patch, or would it be better to add it later when it's
> > > > actually
> > > > needed?
> > > 
> > > I was thinking that the fwprop.c code would use
> > > def->single_nondebug_use () instead of
> > > def->single_nondebug_insn_use () && !def->has_phi_uses ().
> > 
> > But these two are not equivalent, are they?  single_nondebug_use()
> > that you proposed explicitly allows phis:
> > 
> >   // If there is exactly one nondebug use of the set's result,
> >   // return that use, otherwise return null.  The use might be in
> >   // instruction or a phi node.
> >   use_info *single_nondebug_use () const;
> > 
> > but I don't think we want to propagate into phis here.
> > Or should the check be a bit bigger, like the following?
> 
> But we're in the process of substituting the definition into an
> insn use.  So we know that an insn use exists.  I think the
> question we're trying to answer is: is this insn use the only
> nondebug use?  I'd rather test that with a single accessor rather
> than break it down into individual data structure tests.

Ah, you are absolutely right - now I get it.  Please ignore the v2
then, I will send a v3.



[PATCH] fwprop: Fix single_use_p calculation

2021-03-22 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux
and s390x-redhat-linux.  Ok for master?

v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566127.html
v1 -> v2: Pass a set_info instead of a def_info around.
  Add single_nondebug_insn_use () - maybe this could be improved
  further? [1]
  Simplify def->insn ()->ebb ().
  Improve formatting.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567118.html

---

Commit efb6bc55a93a ("fwprop: Allow (subreg (mem)) simplifications")
introduced a check that was supposed to look at the propagated def's
number of uses.  It uses insn_info::num_uses (), which in reality
returns the number of uses def's insn has.  The whole change therefore
works only by accident.

Fix by looking at set_info's uses instead of insn_info's uses.  This
requires passing around set_info instead of insn_info.

gcc/ChangeLog:

2021-03-02  Ilya Leoshkevich  

* fwprop.c (fwprop_propagation::fwprop_propagation): Look at
set_info's uses.
(try_fwprop_subst_note): Use set_info instead of insn_info.
(try_fwprop_subst_pattern): Likewise.
(try_fwprop_subst_notes): Likewise.
(try_fwprop_subst): Likewise.
(forward_propagate_subreg): Likewise.
(forward_propagate_and_simplify): Likewise.
(forward_propagate_into): Likewise.
* rtl-ssa/accesses.h (set_info::single_nondebug_insn_use): New
method.
* rtl-ssa/member-fns.inl (set_info::single_nondebug_insn_use):
Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/long-double-asm-abi.c: New test.
---
 gcc/fwprop.c  | 79 +--
 gcc/rtl-ssa/accesses.h|  4 +
 gcc/rtl-ssa/member-fns.inl|  9 +++
 .../s390/vector/long-double-asm-abi.c | 26 ++
 4 files changed, 78 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-abi.c

diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index 4b8a554e823..6173c9248eb 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -175,7 +175,7 @@ namespace
 static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1;
 static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2;
 
-fwprop_propagation (insn_info *, insn_info *, rtx, rtx);
+fwprop_propagation (insn_info *, set_info *, rtx, rtx);
 
 bool changed_mem_p () const { return result_flags & CHANGED_MEM; }
 bool folded_to_constants_p () const;
@@ -191,13 +191,13 @@ namespace
   };
 }
 
-/* Prepare to replace FROM with TO in INSN.  */
+/* Prepare to replace FROM with TO in USE_INSN.  */
 
 fwprop_propagation::fwprop_propagation (insn_info *use_insn,
-   insn_info *def_insn, rtx from, rtx to)
+   set_info *def, rtx from, rtx to)
   : insn_propagation (use_insn->rtl (), from, to),
-single_use_p (def_insn->num_uses () == 1),
-single_ebb_p (use_insn->ebb () == def_insn->ebb ())
+single_use_p (def->single_nondebug_insn_use () && !def->has_phi_uses ()),
+single_ebb_p (use_insn->ebb () == def->ebb ())
 {
   should_check_mems = true;
   should_note_simplifications = true;
@@ -368,9 +368,9 @@ contains_paradoxical_subreg_p (rtx x)
   return false;
 }
 
-/* Try to substitute (set DEST SRC) from DEF_INSN into note NOTE of USE_INSN.
-   Return the number of substitutions on success, otherwise return -1 and
-   leave USE_INSN unchanged.
+/* Try to substitute (set DEST SRC), which defines DEF, into note NOTE of
+   USE_INSN.  Return the number of substitutions on success, otherwise return
+   -1 and leave USE_INSN unchanged.
 
If REQUIRE_CONSTANT is true, require all substituted occurences of SRC
to fold to a constant, so that the note does not use any more registers
@@ -379,13 +379,14 @@ contains_paradoxical_subreg_p (rtx x)
instruction pattern.  */
 
 static int
-try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn,
+try_fwprop_subst_note (insn_info *use_insn, set_info *def,
   rtx note, rtx dest, rtx src, bool require_constant)
 {
   rtx_insn *use_rtl = use_insn->rtl ();
+  insn_info *def_insn = def->insn ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_insn, def_insn, dest, src);
+  fwprop_propagation prop (use_insn, def, dest, src);
   if (!prop.apply_to_rvalue (&XEXP (note, 0)))
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -436,19 +437,20 @@ try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn,
   return prop.num_replacements;
 }
 
-/* Try to substitute (set DEST SRC) from DEF_INSN into location LOC of
+/* Try to substitute (set DEST SRC), which defines DEF, into location LOC of
USE_INSN's pattern.  Return true on success, otherwise leave US

Re: [PATCH] fwprop: Fix single_use_p calculation

2021-03-22 Thread Ilya Leoshkevich via Gcc-patches
On Mon, 2021-03-22 at 18:23 +, Richard Sandiford wrote:
> Ilya Leoshkevich  writes:

[...]

> > Do you still want me to add single_nondebug_use() for completeness
> > in
> > this patch, or would it be better to add it later when it's
> > actually
> > needed?
> 
> I was thinking that the fwprop.c code would use
> def->single_nondebug_use () instead of
> def->single_nondebug_insn_use () && !def->has_phi_uses ().

But these two are not equivalent, are they?  single_nondebug_use()
that you proposed explicitly allows phis:

  // If there is exactly one nondebug use of the set's result,
  // return that use, otherwise return null.  The use might be in
  // instruction or a phi node.
  use_info *single_nondebug_use () const;

but I don't think we want to propagate into phis here.
Or should the check be a bit bigger, like the following?

use_info *single = def->single_nondebug_use ();
single_use_p = single && !single->is_in_phi ();


[...]

Best regards,
Ilya



Re: [PATCH] fwprop: Fix single_use_p calculation

2021-03-22 Thread Ilya Leoshkevich via Gcc-patches
On Sun, 2021-03-21 at 13:19 +, Richard Sandiford wrote:
> Ilya Leoshkevich  writes:
> > Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-
> > linux
> > and s390x-redhat-linux.  Ok for master?
> 
> Given what was said downthread, I agree we should fix this for GCC
> 11.
> Sorry for missing this problem in the initial review.
> 
> > Commit efb6bc55a93a ("fwprop: Allow (subreg (mem))
> > simplifications")
> > introduced a check that was supposed to look at the propagated
> > def's
> > number of uses.  It uses insn_info::num_uses (), which in reality
> > returns the number of uses def's insn has.  The whole change
> > therefore
> > works only by accident.
> > 
> > Fix by looking at def_info's uses instead of insn_info's uses. 
> > This
> > requires passing around def_info instead of insn_info.
> > 
> > gcc/ChangeLog:
> > 
> > 2021-03-02  Ilya Leoshkevich  
> > 
> > * fwprop.c (def_has_single_use_p): New function.
> > (fwprop_propagation::fwprop_propagation): Look at
> > def_info's uses.
> > (try_fwprop_subst_note): Use def_info instead of insn_info.
> > (try_fwprop_subst_pattern): Likewise.
> > (try_fwprop_subst_notes): Likewise.
> > (try_fwprop_subst): Likewise.
> > (forward_propagate_subreg): Likewise.
> > (forward_propagate_and_simplify): Likewise.
> > (forward_propagate_into): Likewise.
> > * iterator-utils.h (single_element_p): New function.
> > ---
> >  gcc/fwprop.c | 89 ++--
> > 
> >  gcc/iterator-utils.h | 10 +
> >  2 files changed, 62 insertions(+), 37 deletions(-)
> > 
> > diff --git a/gcc/fwprop.c b/gcc/fwprop.c
> > index 4b8a554e823..478dcdd96cc 100644
> > --- a/gcc/fwprop.c
> > +++ b/gcc/fwprop.c
> > @@ -175,7 +175,7 @@ namespace
> >  static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1;
> >  static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2;
> >  
> > -    fwprop_propagation (insn_info *, insn_info *, rtx, rtx);
> > +    fwprop_propagation (insn_info *, def_info *, rtx, rtx);
> 
> use->def () returns a set_info *, and since you want set_info stuff,
> I think it would probably be better to pass around a set_info *
> instead.
> (Let's keep the variable names the same though.  “def” is still
> accurate
> and IMO the natural choice.)
> 
> > @@ -191,13 +191,27 @@ namespace
> >    };
> >  }
> >  
> > -/* Prepare to replace FROM with TO in INSN.  */
> > +/* Return true if DEF has a single non-debug non-phi use.  */
> > +
> > +static bool
> > +def_has_single_use_p (def_info *def)
> > +{
> > +  if (!is_a <set_info *> (def))
> > +    return false;
> > +
> > +  set_info *set = as_a <set_info *> (def);
> > +
> > +  return single_element_p (set->nondebug_insn_uses ())
> > +    && !set->has_phi_uses ();
> 
> I think instead we should add:
> 
>   // If exactly one nondebug instruction uses the set's result,
> return
>   // the use by that instruction, otherwise return null.
>   use_info *single_nondebug_insn_use () const;
> 
>   // If there is exactly one nondebug use of the set's result,
>   // return that use, otherwise return null.  The use might be in
>   // instruction or a phi node.
>   use_info *single_nondebug_use () const;
> 
> before the declaration of set_info::is_local_to_ebb.
> 
> > +}
> > +
> > +/* Prepare to replace FROM with TO in USE_INSN.  */
> >  
> >  fwprop_propagation::fwprop_propagation (insn_info *use_insn,
> > -   insn_info *def_insn, rtx
> > from, rtx to)
> > +   def_info *def, rtx from,
> > rtx to)
> >    : insn_propagation (use_insn->rtl (), from, to),
> > -    single_use_p (def_insn->num_uses () == 1),
> > -    single_ebb_p (use_insn->ebb () == def_insn->ebb ())
> > +    single_use_p (def_has_single_use_p (def)),
> > +    single_ebb_p (use_insn->ebb () == def->insn ()->ebb ())
> 
> Just def->ebb ()
> 
> > @@ -538,7 +554,7 @@ try_fwprop_subst_pattern (obstack_watermark
> > &attempt, insn_change &use_change,
> >  {
> >    if ((REG_NOTE_KIND (note) == REG_EQUAL
> >    || REG_NOTE_KIND (note) == REG_EQUIV)
> > - && try_fwprop_subst_note (use_insn, def_insn, note,
> > + && try_fwprop_subst_note (use_insn, 

[PATCH] IBM Z: Fix "+fvm" constraint with long doubles

2021-03-15 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?



When a long double is passed to an asm statement with a "+fvm"
constraint, an LRA loop occurs.  This happens because LRA chooses the
widest register class in this case (VEC_REGS), but the code generated
by s390_md_asm_adjust() always wants FP_REGS.  Mismatching register
classes cause infinite reloading.

Fix by treating "fv" constraints as "v" in s390_md_asm_adjust().
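
For illustration only, the new behaviour of the predicate can be sketched
as a standalone function.  This is a simplified model, not the code in
s390.c: the name f_only_constraint_p is hypothetical, and it assumes that
every constraint letter is one character long instead of stepping by
CONSTRAINT_LEN.

```cpp
#include <cstring>

// Simplified stand-in for the patched predicate: true only when the
// constraint string contains 'f' and does not also contain 'v'.
// Assumption: each constraint letter is a single character, so the loop
// advances by 1 rather than by CONSTRAINT_LEN as the real code does.
static bool
f_only_constraint_p (const char *constraint)
{
  bool seen_f_p = false;
  bool seen_v_p = false;

  for (std::size_t i = 0, c_len = std::strlen (constraint); i < c_len; i++)
    {
      if (constraint[i] == 'f')
        seen_f_p = true;
      if (constraint[i] == 'v')
        seen_v_p = true;
    }

  // "fv" is treated like "v": no FPRX2 fix-up is requested for it.
  return seen_f_p && !seen_v_p;
}
```

With this model, "=f" and "+fm" still request the fix-up, while "+fvm"
and "=v" do not.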

gcc/ChangeLog:

* config/s390/s390.c (f_constraint_p): Treat "fv" constraints
as "v".

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/long-double-asm-fprvrmem.c: New test.
---
 gcc/config/s390/s390.c   | 12 ++--
 .../s390/vector/long-double-asm-fprvrmem.c   | 11 +++
 2 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 151136bedbc..f7b1c03561e 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16714,13 +16714,21 @@ s390_shift_truncation_mask (machine_mode mode)
 static bool
 f_constraint_p (const char *constraint)
 {
+  bool seen_f_p = false;
+  bool seen_v_p = false;
+
   for (size_t i = 0, c_len = strlen (constraint); i < c_len;
i += CONSTRAINT_LEN (constraint[i], constraint + i))
 {
   if (constraint[i] == 'f')
-   return true;
+   seen_f_p = true;
+  if (constraint[i] == 'v')
+   seen_v_p = true;
 }
-  return false;
+
+  /* Treat "fv" constraints as "v", because LRA will choose the widest register
+   * class.  */
+  return seen_f_p && !seen_v_p;
 }
 
 /* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f"
diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c 
b/gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c
new file mode 100644
index 000..f95656c5723
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/long-double-asm-fprvrmem.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=z14 -mzarch" } */
+
+long double
+foo (long double x)
+{
+  x = x * x;
+  asm("# %0" : "+fvm"(x));
+  x = x + x;
+  return x;
+}
-- 
2.29.2



[PATCH v3] IBM Z: Fix usage of "f" constraint with long doubles

2021-03-04 Thread Ilya Leoshkevich via Gcc-patches
v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html
v1 -> v2:
- Handle constraint modifiers, use AR constraint instead of R, add
  testcases for & and %.

v2: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564380.html
v2 -> v3:
- The main prereq is now committed:
  https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566237.html
- Dropped long-double-asm-abi.c test, because its prereq is not
  approved (yet):
  https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566218.html
- Removed superfluous constraint pointer increment.



After switching the s390 backend to store long doubles in vector
registers, the "f" constraint broke when used with long doubles: long
doubles correspond to TFmode, which in combination with "f" corresponds
to hard regs %v0-%v15, whereas asm users expect a %f0-%f15 pair.

Fix by using TARGET_MD_ASM_ADJUST hook to convert TFmode values to
FPRX2mode and back.

gcc/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* config/s390/s390.c (f_constraint_p): New function.
(s390_md_asm_adjust): Implement TARGET_MD_ASM_ADJUST.
(TARGET_MD_ASM_ADJUST): Likewise.
* config/s390/vector.md (fprx2_to_tf): Rename from *fprx2_to_tf,
add memory alternative.
(tf_to_fprx2): New pattern.

gcc/testsuite/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* gcc.target/s390/vector/long-double-asm-commutative.c: New
test.
* gcc.target/s390/vector/long-double-asm-earlyclobber.c: New
test.
* gcc.target/s390/vector/long-double-asm-in-out.c: New test.
* gcc.target/s390/vector/long-double-asm-inout.c: New test.
* gcc.target/s390/vector/long-double-asm-matching.c: New test.
* gcc.target/s390/vector/long-double-asm-regmem.c: New test.
* gcc.target/s390/vector/long-double-volatile-from-i64.c: New
test.
---
 gcc/config/s390/s390.c| 86 +++
 .../s390/vector/long-double-asm-commutative.c | 16 
 .../vector/long-double-asm-earlyclobber.c | 17 
 .../s390/vector/long-double-asm-in-out.c  | 14 +++
 .../s390/vector/long-double-asm-inout.c   | 14 +++
 .../s390/vector/long-double-asm-matching.c| 13 +++
 .../s390/vector/long-double-asm-regmem.c  |  8 ++
 .../vector/long-double-volatile-from-i64.c| 22 +
 8 files changed, 190 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-asm-commutative.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-asm-earlyclobber.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-inout.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-asm-matching.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-asm-regmem.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-volatile-from-i64.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index f3d0d1ba596..68dc3c58c1b 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16698,6 +16698,89 @@ s390_shift_truncation_mask (machine_mode mode)
   return mode == DImode || mode == SImode ? 63 : 0;
 }
 
+/* Return TRUE iff CONSTRAINT is an "f" constraint, possibly with additional
+   modifiers.  */
+
+static bool
+f_constraint_p (const char *constraint)
+{
+  for (size_t i = 0, c_len = strlen (constraint); i < c_len;
+   i += CONSTRAINT_LEN (constraint[i], constraint + i))
+{
+  if (constraint[i] == 'f')
+   return true;
+}
+  return false;
+}
+
+/* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f"
+   constraints when long doubles are stored in vector registers.  */
+
+static rtx_insn *
+s390_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &inputs,
+   vec<machine_mode> &input_modes,
+   vec<const char *> &constraints, vec<rtx> & /*clobbers*/,
+   HARD_REG_SET & /*clobbered_regs*/)
+{
+  if (!TARGET_VXE)
+/* Long doubles are stored in FPR pairs - nothing to do.  */
+return NULL;
+
+  rtx_insn *after_md_seq = NULL, *after_md_end = NULL;
+
+  unsigned ninputs = inputs.length ();
+  unsigned noutputs = outputs.length ();
+  for (unsigned i = 0; i < noutputs; i++)
+{
+  if (GET_MODE (outputs[i]) != TFmode)
+   /* Not a long double - nothing to do.  */
+   continue;
+  const char *constraint = constraints[i];
+  bool allows_mem, allows_reg, is_inout;
+  bool ok = parse_output_constraint (&constraint, i, ninputs, noutputs,
+&allows_mem, &allows_reg, &is_inout);
+  gcc_assert (ok);
+  if (!f_constraint_p (constraint))
+   /* Long double with a constraint other than "=f" - nothing to do.  */
+   continue;
+  gcc_assert (allows_reg);
+  gcc_assert (!is_inout);
+  /* Copy output va

Re: [PATCH PING^3] Add input_modes parameter to TARGET_MD_ASM_ADJUST hook

2021-03-03 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2021-03-03 at 21:26 +0100, Ilya Leoshkevich via Gcc-patches
wrote:
> On Wed, 2021-03-03 at 13:02 -0700, Jeff Law wrote:
> > 
> > 
> > On 3/2/21 4:50 PM, Ilya Leoshkevich via Gcc-patches wrote:
> > > Hello,
> > > 
> > > I would like to ping the following patch:
> > > 
> > > Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
> > >  https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html
> > > 
> > > It is needed for the following regression fix:
> > > 
> > > IBM Z: Fix usage of "f" constraint with long doubles
> > >  https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564380.html
> > > 
> > > 
> > > Jakub, who would be the right person to review this change?  I've
> > > decided to ask you, since `git shortlog -ns gcc/cfgexpand.c` shows
> > > that
> > > you deal with this code a lot.
> > > 
> > > Best regards,
> > > Ilya
> > > 
> > > 
> > > 
> > > 
> > > If TARGET_MD_ASM_ADJUST changes a mode of an input operand (which
> > > should be ok as long as the hook itself as well as after_md_seq
> > > make up
> > > for it), input_mode will contain stale information.
> > > 
> > > It might be tempting to fix this by removing input_mode altogether
> > > and
> > > just using GET_MODE (), but this will not work correctly with
> > > constants.
> > > So add input_modes parameter and document that it should be updated
> > > whenever inputs parameter is updated.
> > > 
> > > gcc/ChangeLog:
> > > 
> > > 2021-01-05  Ilya Leoshkevich  
> > > 
> > > * cfgexpand.c (expand_asm_loc): Pass new parameter.
> > > (expand_asm_stmt): Likewise.
> > > * config/arm/aarch-common-protos.h (arm_md_asm_adjust): Add
> > > new
> > > parameter.
> > > * config/arm/aarch-common.c (arm_md_asm_adjust): Likewise.
> > > * config/arm/arm.c (thumb1_md_asm_adjust): Likewise.
> > > * config/cris/cris.c (cris_md_asm_adjust): Likewise.
> > > * config/i386/i386.c (ix86_md_asm_adjust): Likewise.
> > > * config/mn10300/mn10300.c (mn10300_md_asm_adjust):
> > > Likewise.
> > > * config/nds32/nds32.c (nds32_md_asm_adjust): Likewise.
> > > * config/pdp11/pdp11.c (pdp11_md_asm_adjust): Likewise.
> > > * config/rs6000/rs6000.c (rs6000_md_asm_adjust): Likewise.
> > > * config/vax/vax.c (vax_md_asm_adjust): Likewise.
> > > * config/visium/visium.c (visium_md_asm_adjust): Likewise.
> > > * target.def (md_asm_adjust): Likewise.
> > Ugh.    A couple questions
> > Are there any cases where you're going to want to change modes for
> > arguments that were constants?   I'm a bit surprised that we don't
> > have
> > a mode for constants for the cases that we care about.  Presumably we
> > can get a (modeless) CONST_INT here and we're not restricted to
> > CONST_DOUBLE and friends (which have modes).
> 
> Yes, this might happen.  For example, here:
> 
>     asm("sqxbr\t%0,%1" : "=f"(res) : "f"(0x1.1p+0L));
> 
> the (const_double) and the corresponding operand will initially have 
> the mode TFmode.  s390_md_asm_adjust () will add a conversion from
> TFmode to FPRX2mode and change the argument accordingly.

Just to be more precise: the mode of the (const_double) itself will not
change.  Here is the resulting RTL for the asm statement above:

# s390_md_asm_adjust () step 1: put the (const_double) operand into a
# new (reg) with the same mode
(insn (set (reg:TF 63)
   (const_double:TF ...)))

# s390_md_asm_adjust () step 2: convert a reg from TFmode to FPRX2mode
(insn (set (reg:FPRX2 65)
   (subreg:FPRX2 (reg:TF 63) 0)))

# s390_md_asm_adjust () step 3: replace the original operand with the
# resulting (reg), adjust (asm_input) accordingly
(insn (set (reg:FPRX2 64)
   (asm_operands:FPRX2 ("sqxbr %0,%1") ("=f") 0
   [(reg:FPRX2 65)]
   [(asm_input:FPRX2 ("f"))])))



Re: [PATCH PING^3] Add input_modes parameter to TARGET_MD_ASM_ADJUST hook

2021-03-03 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2021-03-03 at 13:02 -0700, Jeff Law wrote:
> 
> 
> On 3/2/21 4:50 PM, Ilya Leoshkevich via Gcc-patches wrote:
> > Hello,
> > 
> > I would like to ping the following patch:
> > 
> > Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html
> > 
> > It is needed for the following regression fix:
> > 
> > IBM Z: Fix usage of "f" constraint with long doubles
> > https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564380.html
> > 
> > 
> > Jakub, who would be the right person to review this change?  I've
> > decided to ask you, since `git shortlog -ns gcc/cfgexpand.c` shows
> > that
> > you deal with this code a lot.
> > 
> > Best regards,
> > Ilya
> > 
> > 
> > 
> > 
> > If TARGET_MD_ASM_ADJUST changes a mode of an input operand (which
> > should be ok as long as the hook itself as well as after_md_seq
> > make up
> > for it), input_mode will contain stale information.
> > 
> > It might be tempting to fix this by removing input_mode altogether
> > and
> > just using GET_MODE (), but this will not work correctly with
> > constants.
> > So add input_modes parameter and document that it should be updated
> > whenever inputs parameter is updated.
> > 
> > gcc/ChangeLog:
> > 
> > 2021-01-05  Ilya Leoshkevich  
> > 
> > * cfgexpand.c (expand_asm_loc): Pass new parameter.
> > (expand_asm_stmt): Likewise.
> > * config/arm/aarch-common-protos.h (arm_md_asm_adjust): Add
> > new
> > parameter.
> > * config/arm/aarch-common.c (arm_md_asm_adjust): Likewise.
> > * config/arm/arm.c (thumb1_md_asm_adjust): Likewise.
> > * config/cris/cris.c (cris_md_asm_adjust): Likewise.
> > * config/i386/i386.c (ix86_md_asm_adjust): Likewise.
> > * config/mn10300/mn10300.c (mn10300_md_asm_adjust):
> > Likewise.
> > * config/nds32/nds32.c (nds32_md_asm_adjust): Likewise.
> > * config/pdp11/pdp11.c (pdp11_md_asm_adjust): Likewise.
> > * config/rs6000/rs6000.c (rs6000_md_asm_adjust): Likewise.
> > * config/vax/vax.c (vax_md_asm_adjust): Likewise.
> > * config/visium/visium.c (visium_md_asm_adjust): Likewise.
> > * target.def (md_asm_adjust): Likewise.
> Ugh.    A couple questions
> Are there any cases where you're going to want to change modes for
> arguments that were constants?   I'm a bit surprised that we don't
> have
> a mode for constants for the cases that we care about.  Presumably we
> can get a (modeless) CONST_INT here and we're not restricted to
> CONST_DOUBLE and friends (which have modes).

Yes, this might happen.  For example, here:

asm("sqxbr\t%0,%1" : "=f"(res) : "f"(0x1.1p+0L));

the (const_double) and the corresponding operand will initially have 
the mode TFmode.  s390_md_asm_adjust () will add a conversion from
TFmode to FPRX2mode and change the argument accordingly.

However, this is not the problematic case that I refer to in the
commit message:  I caught some failures in the testsuite that I
tracked down to (const_int)s, which, like you mentioned, don't have
a mode.

> Is input_modes read after the call to md_asm_adjust?   I'm trying to
> figure out why we'd need to update it.

Yes, its contents go into the (asm_input)s of (asm_operands).  If we don't
adjust it, (asm_input)s will no longer be consistent with input operand
RTXes.
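
A toy sketch of why a separate modes vector is needed and why it has to
be updated together with the inputs.  Nothing below is GCC code: toy_rtx,
toy_mode, toy_get_mode and toy_md_asm_adjust are hypothetical names that
only model a modeless constant and a hook that rewrites TFmode inputs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for machine modes.
enum toy_mode { TOY_VOIDmode, TOY_DImode, TOY_TFmode, TOY_FPRX2mode };

// Hypothetical stand-in for an operand RTX.
struct toy_rtx
{
  bool const_int_p;   // a CONST_INT-like operand carries no mode
  toy_mode reg_mode;  // meaningful only when !const_int_p
};

// Models GET_MODE (): a constant integer reports no mode at all, which
// is why the mode cannot simply be recomputed from the operand.
static toy_mode
toy_get_mode (const toy_rtx &x)
{
  return x.const_int_p ? TOY_VOIDmode : x.reg_mode;
}

// Models a hook that replaces TFmode inputs with FPRX2mode registers and
// keeps the parallel INPUT_MODES vector consistent with INPUTS.
static void
toy_md_asm_adjust (std::vector<toy_rtx> &inputs,
                   std::vector<toy_mode> &input_modes)
{
  for (std::size_t i = 0; i < inputs.size (); i++)
    if (input_modes[i] == TOY_TFmode)
      {
        inputs[i] = toy_rtx { false, TOY_FPRX2mode };
        input_modes[i] = TOY_FPRX2mode;
      }
}
```

In this model, a constant input keeps the mode recorded for it in the
modes vector even though toy_get_mode () reports none, and a rewritten
register input ends up with matching entries in both vectors.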

> Not acking or naking at this point, I just want to make sure I
> understand what's going on.
> 
> jeff



Re: [PATCH] fwprop: Fix single_use_p calculation

2021-03-03 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2021-03-03 at 11:34 -0700, Jeff Law wrote:
> 
> 
> On 3/2/21 3:37 PM, Ilya Leoshkevich via Gcc-patches wrote:
> > Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-
> > linux
> > and s390x-redhat-linux.  Ok for master?
> > 
> > 
> > 
> > Commit efb6bc55a93a ("fwprop: Allow (subreg (mem))
> > simplifications")
> > introduced a check that was supposed to look at the propagated
> > def's
> > number of uses.  It uses insn_info::num_uses (), which in reality
> > returns the number of uses def's insn has.  The whole change
> > therefore
> > works only by accident.
> > 
> > Fix by looking at def_info's uses instead of insn_info's uses. 
> > This
> > requires passing around def_info instead of insn_info.
> > 
> > gcc/ChangeLog:
> > 
> > 2021-03-02  Ilya Leoshkevich  
> > 
> > * fwprop.c (def_has_single_use_p): New function.
> > (fwprop_propagation::fwprop_propagation): Look at
> > def_info's uses.
> > (try_fwprop_subst_note): Use def_info instead of insn_info.
> > (try_fwprop_subst_pattern): Likewise.
> > (try_fwprop_subst_notes): Likewise.
> > (try_fwprop_subst): Likewise.
> > (forward_propagate_subreg): Likewise.
> > (forward_propagate_and_simplify): Likewise.
> > (forward_propagate_into): Likewise.
> > * iterator-utils.h (single_element_p): New function.
> Given we're well into stage4, I'd recommend deferring to gcc-12
> unless
> this fixes a code correctness issue.
> 
> Jeff
> 

Fortunately the issue here is not a miscompilation, but it's still
a regression: on s390, small functions that use long doubles get
a number of useless loads/stores as well as a stack frame where none
was required before.  Basically, this is the same issue that
efb6bc55a93a failed to fully fix due to the num_uses() /
nondebug_insn_uses() mixup.
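
To make the mixup concrete, here is a toy model.  It does not use the
rtl-ssa API; toy_insn, toy_def and the helper functions are hypothetical
names that only distinguish "how many values the defining insn reads"
from "how many insns read the defined value".

```cpp
#include <cstddef>
#include <vector>

struct toy_insn;

// Hypothetical stand-in for a definition (set) of a value.
struct toy_def
{
  toy_insn *insn = nullptr;        // the insn that sets this value
  std::vector<toy_insn *> users;   // insns that read this value
};

// Hypothetical stand-in for an instruction.
struct toy_insn
{
  std::vector<toy_def *> uses;     // values this insn reads
};

// What the old check effectively looked at: the number of values read
// by the insn that defines DEF.
static std::size_t
num_uses_of_def_insn (const toy_def &def)
{
  return def.insn->uses.size ();
}

// What the check was meant to look at: whether exactly one insn reads DEF.
static bool
def_has_single_user_p (const toy_def &def)
{
  return def.users.size () == 1;
}

struct toy_result { std::size_t old_count; bool single_user; };

// Build "A: r1 = r2;  B: ... = r1;  C: ... = r1" and report both answers
// for r1.
static toy_result
run_toy_example ()
{
  toy_insn a, b, c;
  toy_def r1, r2;

  r2.users.push_back (&a);
  a.uses.push_back (&r2);

  r1.insn = &a;
  r1.users = { &b, &c };
  b.uses.push_back (&r1);
  c.uses.push_back (&r1);

  return { num_uses_of_def_insn (r1), def_has_single_user_p (r1) };
}
```

In this example the old-style count is 1, because insn A reads only r2,
although r1 itself is read by two insns.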



Re: [PATCH] IBM Z: Run mul-signed-overflow-*.c only on z14+

2021-03-03 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2021-03-03 at 07:50 +0100, Andreas Krebbel wrote:
> On 3/2/21 11:59 PM, Ilya Leoshkevich wrote:
> > mul-signed-overflow-*.c execution tests fail on z13, because they
> > contain z14-specific instructions.  Fix by requiring s390_z14_hw
> > target.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/s390/mul-signed-overflow-1.c: Run only on
> > z14+.
> > * gcc.target/s390/mul-signed-overflow-2.c: Likewise.
> 
> I did that change yesterday already.

Ah, I hadn't noticed.  One difference between our patches, though, is
that mine also has `dg-do compile`, so the compile tests still run on
z13.

[...]



[PATCH PING^3] Add input_modes parameter to TARGET_MD_ASM_ADJUST hook

2021-03-02 Thread Ilya Leoshkevich via Gcc-patches
Hello,

I would like to ping the following patch:

Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html

It is needed for the following regression fix:

IBM Z: Fix usage of "f" constraint with long doubles
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564380.html


Jakub, who would be the right person to review this change?  I've
decided to ask you, since `git shortlog -ns gcc/cfgexpand.c` shows that
you deal with this code a lot.

Best regards,
Ilya




If TARGET_MD_ASM_ADJUST changes a mode of an input operand (which
should be ok as long as the hook itself as well as after_md_seq make up
for it), input_mode will contain stale information.

It might be tempting to fix this by removing input_mode altogether and
just using GET_MODE (), but this will not work correctly with constants.
So add input_modes parameter and document that it should be updated
whenever inputs parameter is updated.

gcc/ChangeLog:

2021-01-05  Ilya Leoshkevich  

* cfgexpand.c (expand_asm_loc): Pass new parameter.
(expand_asm_stmt): Likewise.
* config/arm/aarch-common-protos.h (arm_md_asm_adjust): Add new
parameter.
* config/arm/aarch-common.c (arm_md_asm_adjust): Likewise.
* config/arm/arm.c (thumb1_md_asm_adjust): Likewise.
* config/cris/cris.c (cris_md_asm_adjust): Likewise.
* config/i386/i386.c (ix86_md_asm_adjust): Likewise.
* config/mn10300/mn10300.c (mn10300_md_asm_adjust): Likewise.
* config/nds32/nds32.c (nds32_md_asm_adjust): Likewise.
* config/pdp11/pdp11.c (pdp11_md_asm_adjust): Likewise.
* config/rs6000/rs6000.c (rs6000_md_asm_adjust): Likewise.
* config/vax/vax.c (vax_md_asm_adjust): Likewise.
* config/visium/visium.c (visium_md_asm_adjust): Likewise.
* target.def (md_asm_adjust): Likewise.
---
 gcc/cfgexpand.c  | 16 
 gcc/config/arm/aarch-common-protos.h |  8 
 gcc/config/arm/aarch-common.c|  7 ---
 gcc/config/arm/arm.c | 14 --
 gcc/config/cris/cris.c   |  7 ---
 gcc/config/i386/i386.c   |  7 ---
 gcc/config/mn10300/mn10300.c |  7 ---
 gcc/config/nds32/nds32.c |  1 +
 gcc/config/pdp11/pdp11.c |  9 +
 gcc/config/rs6000/rs6000.c   |  7 ---
 gcc/config/vax/vax.c |  3 ++-
 gcc/config/visium/visium.c   | 12 +++-
 gcc/doc/tm.texi  | 10 ++
 gcc/target.def   | 13 -
 14 files changed, 69 insertions(+), 52 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index aef9e916fcd..a6b48d3e48f 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -2880,6 +2880,7 @@ expand_asm_loc (tree string, int vol, location_t locus)
   rtx asm_op, clob;
   unsigned i, nclobbers;
   auto_vec<rtx> input_rvec, output_rvec;
+  auto_vec<machine_mode> input_mode;
   auto_vec<const char *> constraints;
   auto_vec<rtx> clobber_rvec;
   HARD_REG_SET clobbered_regs;
@@ -2889,9 +2890,8 @@ expand_asm_loc (tree string, int vol, location_t locus)
   clobber_rvec.safe_push (clob);
 
   if (targetm.md_asm_adjust)
-   targetm.md_asm_adjust (output_rvec, input_rvec,
-  constraints, clobber_rvec,
-  clobbered_regs);
+   targetm.md_asm_adjust (output_rvec, input_rvec, input_mode,
+  constraints, clobber_rvec, clobbered_regs);
 
   asm_op = body;
   nclobbers = clobber_rvec.length ();
@@ -3068,8 +3068,8 @@ expand_asm_stmt (gasm *stmt)
   return;
 }
 
-  /* There are some legacy diagnostics in here, and also avoids a
- sixth parameger to targetm.md_asm_adjust.  */
+  /* There are some legacy diagnostics in here, and also avoids an extra
+ parameter to targetm.md_asm_adjust.  */
   save_input_location s_i_l(locus);
 
   unsigned noutputs = gimple_asm_noutputs (stmt);
@@ -3420,9 +3420,9 @@ expand_asm_stmt (gasm *stmt)
  the flags register.  */
   rtx_insn *after_md_seq = NULL;
   if (targetm.md_asm_adjust)
-after_md_seq = targetm.md_asm_adjust (output_rvec, input_rvec,
- constraints, clobber_rvec,
- clobbered_regs);
+after_md_seq
+   = targetm.md_asm_adjust (output_rvec, input_rvec, input_mode,
+constraints, clobber_rvec, clobbered_regs);
 
   /* Do not allow the hook to change the output and input count,
  lest it mess up the operand numbering.  */
diff --git a/gcc/config/arm/aarch-common-protos.h 
b/gcc/config/arm/aarch-common-protos.h
index 7a9cf3d324c..b6171e8668d 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -144,9 +144,9 @@ struct cpu_cost_table
   const struct vector_cost_table vect;

[PATCH] IBM Z: Run mul-signed-overflow-*.c only on z14+

2021-03-02 Thread Ilya Leoshkevich via Gcc-patches
mul-signed-overflow-*.c execution tests fail on z13, because they
contain z14-specific instructions.  Fix by requiring s390_z14_hw
target.

gcc/testsuite/ChangeLog:

* gcc.target/s390/mul-signed-overflow-1.c: Run only on z14+.
* gcc.target/s390/mul-signed-overflow-2.c: Likewise.
---
 gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c | 3 ++-
 gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c 
b/gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c
index fdf56d6e695..e8b1938dab7 100644
--- a/gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c
+++ b/gcc/testsuite/gcc.target/s390/mul-signed-overflow-1.c
@@ -1,4 +1,5 @@
-/* { dg-do run } */
+/* { dg-do compile } */
+/* { dg-do run { target { s390_z14_hw } } } */
 /* z14 only because we need msrkc, msc, msgrkc, msgc  */
 /* { dg-options "-O3 -march=z14 -mzarch --save-temps" } */
 
diff --git a/gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c 
b/gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c
index d0088188aa2..01328e1d286 100644
--- a/gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c
+++ b/gcc/testsuite/gcc.target/s390/mul-signed-overflow-2.c
@@ -1,4 +1,5 @@
-/* { dg-do run } */
+/* { dg-do compile } */
+/* { dg-do run { target { s390_z14_hw } } } */
 /* z14 only because we need msrkc, msc, msgrkc, msgc  */
 /* { dg-options "-O3 -march=z14 -mzarch --save-temps" } */
 
-- 
2.29.2



[PATCH] fwprop: Fix single_use_p calculation

2021-03-02 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux
and s390x-redhat-linux.  Ok for master?



Commit efb6bc55a93a ("fwprop: Allow (subreg (mem)) simplifications")
introduced a check that was supposed to look at the propagated def's
number of uses.  It uses insn_info::num_uses (), which in reality
returns the number of uses def's insn has.  The whole change therefore
works only by accident.

Fix by looking at def_info's uses instead of insn_info's uses.  This
requires passing around def_info instead of insn_info.

gcc/ChangeLog:

2021-03-02  Ilya Leoshkevich  

* fwprop.c (def_has_single_use_p): New function.
(fwprop_propagation::fwprop_propagation): Look at
def_info's uses.
(try_fwprop_subst_note): Use def_info instead of insn_info.
(try_fwprop_subst_pattern): Likewise.
(try_fwprop_subst_notes): Likewise.
(try_fwprop_subst): Likewise.
(forward_propagate_subreg): Likewise.
(forward_propagate_and_simplify): Likewise.
(forward_propagate_into): Likewise.
* iterator-utils.h (single_element_p): New function.
---
 gcc/fwprop.c | 89 ++--
 gcc/iterator-utils.h | 10 +
 2 files changed, 62 insertions(+), 37 deletions(-)

diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index 4b8a554e823..478dcdd96cc 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -175,7 +175,7 @@ namespace
 static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1;
 static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2;
 
-fwprop_propagation (insn_info *, insn_info *, rtx, rtx);
+fwprop_propagation (insn_info *, def_info *, rtx, rtx);
 
 bool changed_mem_p () const { return result_flags & CHANGED_MEM; }
 bool folded_to_constants_p () const;
@@ -191,13 +191,27 @@ namespace
   };
 }
 
-/* Prepare to replace FROM with TO in INSN.  */
+/* Return true if DEF has a single non-debug non-phi use.  */
+
+static bool
+def_has_single_use_p (def_info *def)
+{
+  if (!is_a <set_info *> (def))
+return false;
+
+  set_info *set = as_a <set_info *> (def);
+
+  return single_element_p (set->nondebug_insn_uses ())
+&& !set->has_phi_uses ();
+}
+
+/* Prepare to replace FROM with TO in USE_INSN.  */
 
 fwprop_propagation::fwprop_propagation (insn_info *use_insn,
-   insn_info *def_insn, rtx from, rtx to)
+   def_info *def, rtx from, rtx to)
   : insn_propagation (use_insn->rtl (), from, to),
-single_use_p (def_insn->num_uses () == 1),
-single_ebb_p (use_insn->ebb () == def_insn->ebb ())
+single_use_p (def_has_single_use_p (def)),
+single_ebb_p (use_insn->ebb () == def->insn ()->ebb ())
 {
   should_check_mems = true;
   should_note_simplifications = true;
@@ -368,9 +382,9 @@ contains_paradoxical_subreg_p (rtx x)
   return false;
 }
 
-/* Try to substitute (set DEST SRC) from DEF_INSN into note NOTE of USE_INSN.
-   Return the number of substitutions on success, otherwise return -1 and
-   leave USE_INSN unchanged.
+/* Try to substitute (set DEST SRC), which defines DEF, into note NOTE of
+   USE_INSN.  Return the number of substitutions on success, otherwise return
+   -1 and leave USE_INSN unchanged.
 
If REQUIRE_CONSTANT is true, require all substituted occurences of SRC
to fold to a constant, so that the note does not use any more registers
@@ -379,13 +393,14 @@ contains_paradoxical_subreg_p (rtx x)
instruction pattern.  */
 
 static int
-try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn,
+try_fwprop_subst_note (insn_info *use_insn, def_info *def,
   rtx note, rtx dest, rtx src, bool require_constant)
 {
   rtx_insn *use_rtl = use_insn->rtl ();
+  insn_info *def_insn = def->insn ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_insn, def_insn, dest, src);
+  fwprop_propagation prop (use_insn, def, dest, src);
   if (!prop.apply_to_rvalue (&XEXP (note, 0)))
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -436,19 +451,20 @@ try_fwprop_subst_note (insn_info *use_insn, insn_info 
*def_insn,
   return prop.num_replacements;
 }
 
-/* Try to substitute (set DEST SRC) from DEF_INSN into location LOC of
+/* Try to substitute (set DEST SRC), which defines DEF, into location LOC of
USE_INSN's pattern.  Return true on success, otherwise leave USE_INSN
unchanged.  */
 
 static bool
 try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change,
- insn_info *def_insn, rtx *loc, rtx dest, rtx src)
+ def_info *def, rtx *loc, rtx dest, rtx src)
 {
   insn_info *use_insn = use_change.insn ();
   rtx_insn *use_rtl = use_insn->rtl ();
+  insn_info *def_insn = def->insn ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_insn, def

[PATCH 2/2] IBM Z: Fix long double <-> DFP conversions

2021-02-18 Thread Ilya Leoshkevich via Gcc-patches
When switching the s390 backend to store long doubles in vector
registers, the patterns for long double <-> DFP conversions were
forgotten.  This did not cause observable problems so far, because
libdfp calls are emitted instead of pfpo.  However, when building
libdfp itself, this leads to infinite recursion.

gcc/ChangeLog:

* config/s390/vector.md (trunctf<DFP_ALL:mode>2_vr): New
pattern.
(trunctf<DFP_ALL:mode>2): Likewise.
(trunctdtf2_vr): Likewise.
(trunctdtf2): Likewise.
(extend<DFP_ALL:mode>tf2_vr): Likewise.
(extend<DFP_ALL:mode>tf2): Likewise.
(extendtftd2_vr): Likewise.
(extendtftd2): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/long-double-from-decimal128.c: New test.
* gcc.target/s390/vector/long-double-from-decimal32.c: New test.
* gcc.target/s390/vector/long-double-from-decimal64.c: New test.
* gcc.target/s390/vector/long-double-to-decimal128.c: New test.
* gcc.target/s390/vector/long-double-to-decimal32.c: New test.
* gcc.target/s390/vector/long-double-to-decimal64.c: New test.
---
 gcc/config/s390/vector.md | 72 +++
 .../s390/vector/long-double-from-decimal128.c | 20 ++
 .../s390/vector/long-double-from-decimal32.c  | 20 ++
 .../s390/vector/long-double-from-decimal64.c  | 20 ++
 .../s390/vector/long-double-to-decimal128.c   | 19 +
 .../s390/vector/long-double-to-decimal32.c| 19 +
 .../s390/vector/long-double-to-decimal64.c| 19 +
 7 files changed, 189 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal128.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal32.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal64.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal128.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal32.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal64.c

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index e48c965db00..bc52211c55e 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -2480,6 +2480,42 @@
   "HAVE_TF (trunctfsf2)"
   { EXPAND_TF (trunctfsf2, 2); })
 
+(define_expand "trunctf<DFP_ALL:mode>2_vr"
+  [(match_operand:DFP_ALL 0 "nonimmediate_operand" "")
+   (match_operand:TF 1 "nonimmediate_operand" "")]
+  "TARGET_HARD_DFP
+   && GET_MODE_SIZE (TFmode) > GET_MODE_SIZE (<DFP_ALL:MODE>mode)
+   && TARGET_VXE"
+{
+  rtx fprx2 = gen_reg_rtx (FPRX2mode);
+  emit_insn (gen_tf_to_fprx2 (fprx2, operands[1]));
+  emit_insn (gen_truncfprx2<DFP_ALL:mode>2 (operands[0], fprx2));
+  DONE;
+})
+
+(define_expand "trunctf<DFP_ALL:mode>2"
+  [(match_operand:DFP_ALL 0 "nonimmediate_operand" "")
+   (match_operand:TF 1 "nonimmediate_operand" "")]
+  "HAVE_TF (trunctf<DFP_ALL:mode>2)"
+  { EXPAND_TF (trunctf<DFP_ALL:mode>2, 2); })
+
+(define_expand "trunctdtf2_vr"
+  [(match_operand:TF 0 "nonimmediate_operand" "")
+   (match_operand:TD 1 "nonimmediate_operand" "")]
+  "TARGET_HARD_DFP && TARGET_VXE"
+{
+  rtx fprx2 = gen_reg_rtx (FPRX2mode);
+  emit_insn (gen_trunctdfprx22 (fprx2, operands[1]));
+  emit_insn (gen_fprx2_to_tf (operands[0], fprx2));
+  DONE;
+})
+
+(define_expand "trunctdtf2"
+  [(match_operand:TF 0 "nonimmediate_operand" "")
+   (match_operand:TD 1 "nonimmediate_operand" "")]
+  "HAVE_TF (trunctdtf2)"
+  { EXPAND_TF (trunctdtf2, 2); })
+
 ; load lengthened
 
 (define_insn "extenddftf2_vr"
@@ -2511,6 +2547,42 @@
   "HAVE_TF (extendsftf2)"
   { EXPAND_TF (extendsftf2, 2); })
 
+(define_expand "extend<DFP_ALL:mode>tf2_vr"
+  [(match_operand:TF 0 "nonimmediate_operand" "")
+   (match_operand:DFP_ALL 1 "nonimmediate_operand" "")]
+  "TARGET_HARD_DFP
+   && GET_MODE_SIZE (<DFP_ALL:MODE>mode) < GET_MODE_SIZE (TFmode)
+   && TARGET_VXE"
+{
+  rtx fprx2 = gen_reg_rtx (FPRX2mode);
+  emit_insn (gen_extend<DFP_ALL:mode>fprx22 (fprx2, operands[1]));
+  emit_insn (gen_fprx2_to_tf (operands[0], fprx2));
+  DONE;
+})
+
+(define_expand "extend<DFP_ALL:mode>tf2"
+  [(match_operand:TF 0 "nonimmediate_operand" "")
+   (match_operand:DFP_ALL 1 "nonimmediate_operand" "")]
+  "HAVE_TF (extend<DFP_ALL:mode>tf2)"
+  { EXPAND_TF (extend<DFP_ALL:mode>tf2, 2); })
+
+(define_expand "extendtftd2_vr"
+  [(match_operand:TD 0 "nonimmediate_operand" "")
+   (match_operand:TF 1 "nonimmediate_operand" "")]
+  "TARGET_HARD_DFP && TARGET_VXE"
+{
+  rtx fprx2 = gen_reg_rtx (FPRX2mode);
+  emit_insn (gen_tf_to_fprx2 (fprx2, operands[1]));
+  emit_insn (gen_extendfprx2td2 (operands[0], fprx2));
+  DONE;
+})
+
+(define_expand "extendtftd2"
+  [(match_operand:TD 0 "nonimmediate_operand" "")
+   (match_operand:TF 1 "nonimmediate_operand" "")]
+  "HAVE_TF (extendtftd2)"
+  { EXPAND_TF (extendtftd2, 2); })
+
 ; test data class
 
 (define_expand "signbittf2_vr"
diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal128.c b/gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal128.c
new file mode 100644
index 000..3cd2c68f5c6
--- /dev/null
+++ b/gcc/testsui

[PATCH 1/2] IBM Z: Improve FPRX2 <-> TF conversions

2021-02-18 Thread Ilya Leoshkevich via Gcc-patches
gcc/ChangeLog:

* config/s390/vector.md (*fprx2_to_tf): Rename to fprx2_to_tf,
add memory alternative.
(tf_to_fprx2): New pattern.
---
 gcc/config/s390/vector.md | 36 +++-
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 0e3c31f5d4f..e48c965db00 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -616,12 +616,23 @@
vlvgp\t%v0,%1,%N1"
   [(set_attr "op_type" "VRR,VRX,VRX,VRI,VRR")])
 
-(define_insn "*fprx2_to_tf"
-  [(set (match_operand:TF   0 "nonimmediate_operand" "=v")
-   (subreg:TF (match_operand:FPRX2 1 "general_operand"   "f") 0))]
+(define_insn_and_split "fprx2_to_tf"
+  [(set (match_operand:TF   0 "nonimmediate_operand" "=v,AR")
+   (subreg:TF (match_operand:FPRX2 1 "general_operand"   "f,f") 0))]
   "TARGET_VXE"
-  "vmrhg\t%v0,%1,%N1"
-  [(set_attr "op_type" "VRR")])
+  "@
+   vmrhg\t%v0,%1,%N1
+   #"
+  "!(MEM_P (operands[0]) && MEM_VOLATILE_P (operands[0]))"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 4) (match_dup 5))]
+{
+  operands[2] = simplify_gen_subreg (DFmode, operands[0], TFmode, 0);
+  operands[3] = simplify_gen_subreg (DFmode, operands[1], FPRX2mode, 0);
+  operands[4] = simplify_gen_subreg (DFmode, operands[0], TFmode, 8);
+  operands[5] = simplify_gen_subreg (DFmode, operands[1], FPRX2mode, 8);
+}
+  [(set_attr "op_type" "VRR,*")])
 
 (define_insn "*vec_ti_to_v1ti"
   [(set (match_operand:V1TI   0 "nonimmediate_operand" "=v,v,R,  v,  v,v")
@@ -753,6 +764,21 @@
   "vpdi\t%V0,%v1,%V0,5"
   [(set_attr "op_type" "VRR")])
 
+(define_insn_and_split "tf_to_fprx2"
+  [(set (match_operand:FPRX2 0 "nonimmediate_operand" "=f,f")
+   (subreg:FPRX2 (match_operand:TF 1 "general_operand"   "v,AR") 0))]
+  "TARGET_VXE"
+  "#"
+  "!(MEM_P (operands[1]) && MEM_VOLATILE_P (operands[1]))"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 4) (match_dup 5))]
+{
+  operands[2] = simplify_gen_subreg (DFmode, operands[0], FPRX2mode, 0);
+  operands[3] = simplify_gen_subreg (DFmode, operands[1], TFmode, 0);
+  operands[4] = simplify_gen_subreg (DFmode, operands[0], FPRX2mode, 8);
+  operands[5] = simplify_gen_subreg (DFmode, operands[1], TFmode, 8);
+})
+
 ; vec_perm_const for V2DI using vpdi?
 
 ;;
-- 
2.29.2



[PATCH 0/2] IBM Z: Fix long double <-> DFP conversions

2021-02-18 Thread Ilya Leoshkevich via Gcc-patches
This series fixes PR99134.  Patch 1 is factored out from the pending
[1], patch 2 is the actual fix.  Bootstrapped and regtested on
s390x-redhat-linux.  Ok for master?

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/564380.html

Ilya Leoshkevich (2):
  IBM Z: Improve FPRX2 <-> TF conversions
  IBM Z: Fix long double <-> DFP conversions

 gcc/config/s390/vector.md | 108 +-
 .../s390/vector/long-double-from-decimal128.c |  20 
 .../s390/vector/long-double-from-decimal32.c  |  20 
 .../s390/vector/long-double-from-decimal64.c  |  20 
 .../s390/vector/long-double-to-decimal128.c   |  19 +++
 .../s390/vector/long-double-to-decimal32.c|  19 +++
 .../s390/vector/long-double-to-decimal64.c|  19 +++
 7 files changed, 220 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal128.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal32.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-from-decimal64.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal128.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal32.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-to-decimal64.c

-- 
2.29.2



[PATCH] PING^2 Add input_modes parameter to TARGET_MD_ASM_ADJUST hook

2021-02-15 Thread Ilya Leoshkevich via Gcc-patches
Hello,

I would like to ping the following patch:

Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html

It is needed for the following regression fix:

IBM Z: Fix usage of "f" constraint with long doubles
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html

Best regards,
Ilya



[PATCH] PING lra: clear lra_insn_recog_data after simplifying a mem subreg

2021-01-28 Thread Ilya Leoshkevich via Gcc-patches
Hello,

I would like to ping the following patch:

lra: clear lra_insn_recog_data after simplifying a mem subreg
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563428.html

Best regards,
Ilya



[PATCH v2] IBM Z: Fix usage of "f" constraint with long doubles

2021-01-27 Thread Ilya Leoshkevich via Gcc-patches
v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html

v1 -> v2: Handle constraint modifiers, use AR constraint instead of R,
add testcases for & and %.




After switching the s390 backend to store long doubles in vector
registers, "f" constraint broke when used with the former: long doubles
correspond to TFmode, which in combination with "f" corresponds to
hard regs %v0-%v15, however, asm users expect a %f0-%f15 pair.

Fix by using TARGET_MD_ASM_ADJUST hook to convert TFmode values to
FPRX2mode and back.
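
For illustration only, here is a minimal sketch of the kind of user code this is about.  The function name and the choice of lxr (a plain copy between FPR pairs) are assumptions made for the example, not something taken from the testcases; the non-s390 branch exists only so that the sketch builds elsewhere.

```c
/* Sketch: an inline asm that takes and returns a long double through
   "f" constraints.  The asm body assumes %0 and %1 name FPR pairs,
   which is what asm users expect from "f" on s390.  */
static long double
copy_via_fpr_pair (long double x)
{
  long double r;
#if defined (__s390x__)
  /* lxr copies one FPR pair into another (hypothetical example insn).  */
  __asm__ ("lxr %0,%1" : "=f" (r) : "f" (x));
#else
  /* Assumption: fallback for non-s390 hosts, a plain copy.  */
  r = x;
#endif
  return r;
}
```

With long doubles living in vector registers, the value has to be moved into an FPR pair before the asm and back afterwards, which is what the hook below arranges.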

gcc/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* config/s390/s390.c (f_constraint_p): New function.
(s390_md_asm_adjust): Implement TARGET_MD_ASM_ADJUST.
(TARGET_MD_ASM_ADJUST): Likewise.
* config/s390/vector.md (fprx2_to_tf): Rename from *fprx2_to_tf,
add memory alternative.
(tf_to_fprx2): New pattern.

gcc/testsuite/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* gcc.target/s390/vector/long-double-asm-abi.c: New test.
* gcc.target/s390/vector/long-double-asm-commutative.c: New
test.
* gcc.target/s390/vector/long-double-asm-earlyclobber.c: New
test.
* gcc.target/s390/vector/long-double-asm-in-out.c: New test.
* gcc.target/s390/vector/long-double-asm-inout.c: New test.
* gcc.target/s390/vector/long-double-volatile-from-i64.c: New
test.
---
 gcc/config/s390/s390.c| 88 +++
 gcc/config/s390/vector.md | 36 ++--
 .../s390/vector/long-double-asm-abi.c | 26 ++
 .../s390/vector/long-double-asm-commutative.c | 16 
 .../vector/long-double-asm-earlyclobber.c | 17 
 .../s390/vector/long-double-asm-in-out.c  | 14 +++
 .../s390/vector/long-double-asm-inout.c   | 14 +++
 .../s390/vector/long-double-asm-matching.c| 13 +++
 .../vector/long-double-volatile-from-i64.c| 22 +
 9 files changed, 241 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-abi.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-commutative.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-earlyclobber.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-inout.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-matching.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-volatile-from-i64.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 9d2cee950d0..d4b098325e8 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16688,6 +16688,91 @@ s390_shift_truncation_mask (machine_mode mode)
   return mode == DImode || mode == SImode ? 63 : 0;
 }
 
+/* Return TRUE iff CONSTRAINT is an "f" constraint, possibly with additional
+   modifiers.  */
+
+static bool
+f_constraint_p (const char *constraint)
+{
+  for (size_t i = 0, c_len = strlen (constraint); i < c_len;
+   i += CONSTRAINT_LEN (constraint[i], constraint + i))
+{
+  if (constraint[i] == 'f')
+   return true;
+}
+  return false;
+}
+
+/* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f"
+   constraints when long doubles are stored in vector registers.  */
+
+static rtx_insn *
+s390_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &inputs,
+   vec<machine_mode> &input_modes,
+   vec<const char *> &constraints, vec<rtx> & /*clobbers*/,
+   HARD_REG_SET & /*clobbered_regs*/)
+{
+  if (!TARGET_VXE)
+/* Long doubles are stored in FPR pairs - nothing to do.  */
+return NULL;
+
+  rtx_insn *after_md_seq = NULL, *after_md_end = NULL;
+
+  unsigned ninputs = inputs.length ();
+  unsigned noutputs = outputs.length ();
+  for (unsigned i = 0; i < noutputs; i++)
+{
+  if (GET_MODE (outputs[i]) != TFmode)
+   /* Not a long double - nothing to do.  */
+   continue;
+  const char *constraint = constraints[i];
+  bool allows_mem, allows_reg, is_inout;
+  bool ok = parse_output_constraint (&constraint, i, ninputs, noutputs,
+&allows_mem, &allows_reg, &is_inout);
+  gcc_assert (ok);
+  if (!f_constraint_p (constraint + 1))
+   /* Long double with a constraint other than "=f" - nothing to do.  */
+   continue;
+  gcc_assert (allows_reg);
+  gcc_assert (!allows_mem);
+  gcc_assert (!is_inout);
+  /* Copy output value from a FPR pair into a vector register.  */
+  rtx fprx2 = gen_reg_rtx (FPRX2mode);
+  push_to_sequence2 (after_md_seq, after_md_end);
+  emit_insn (gen_fprx2_to_tf (outputs[i], fprx2));
+  after_md_seq = get_insns ();
+  after_md_end = get_last_insn ();
+  end_sequence ();
+  outputs[i] = fprx2;
+}
+
+  for 

Re: [PATCH] IBM Z: Fix usage of "f" constraint with long doubles

2021-01-27 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2021-01-27 at 08:58 +0100, Andreas Krebbel wrote:
> On 1/18/21 10:54 PM, Ilya Leoshkevich wrote:
> ...
> 
> > +static rtx_insn *
> > +s390_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &inputs,
> > +   vec<machine_mode> &input_modes,
> > +   vec<const char *> &constraints, vec<rtx> & /*clobbers*/,
> > +   HARD_REG_SET & /*clobbered_regs*/)
> > +{
> > +  if (!TARGET_VXE)
> > +/* Long doubles are stored in FPR pairs - nothing to do.  */
> > +return NULL;
> > +
> > +  rtx_insn *after_md_seq = NULL, *after_md_end = NULL;
> > +
> > +  unsigned ninputs = inputs.length ();
> > +  unsigned noutputs = outputs.length ();
> > +  for (unsigned i = 0; i < noutputs; i++)
> > +{
> > +  if (GET_MODE (outputs[i]) != TFmode)
> > +   /* Not a long double - nothing to do.  */
> > +   continue;
> > +  const char *constraint = constraints[i];
> > +  bool allows_mem, allows_reg, is_inout;
> > +  bool ok = parse_output_constraint (&constraint, i, ninputs,
> > noutputs,
> > +&allows_mem, &allows_reg,
> > &is_inout);
> > +  gcc_assert (ok);
> > +  if (strcmp (constraint, "=f") != 0)
> > +   /* Long double with a constraint other than "=f" - nothing to
> > do.  */
> > +   continue;
> 
> What about other constraint modifiers like & and %? Don't we need to
> handle matching constraints as
> well here?

Oh, right - we need to account for %?!*&# and maybe some others.  I'll
just copy the code from parse_output_constraint() that skips over all
of them, because I don't think they need any special handling - we just
need to make sure they don't mess up the recognition of "=f".

I don't think we need to explicitly support matching constraints,
because parse_input_constraint() will resolve them for us.  I'll add
a test for this just in case.
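
Roughly what I have in mind for the modifier handling, as a standalone sketch.  The helper names are made up for this mail, and constraint_len () is a simplified stand-in that assumes every constraint is a single character; the real code would step with CONSTRAINT_LEN instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: every constraint letter is one character long.  This is
   only a stand-in for GCC's CONSTRAINT_LEN.  */
static size_t
constraint_len (char c, const char *s)
{
  (void) c;
  (void) s;
  return 1;
}

/* Return true iff CONSTRAINT contains an 'f', no matter which
   modifiers (&, %, ?, !, *, #, ...) surround it.  */
static bool
has_f_constraint (const char *constraint)
{
  for (size_t i = 0, len = strlen (constraint); i < len;
       i += constraint_len (constraint[i], constraint + i))
    if (constraint[i] == 'f')
      return true;
  return false;
}
```

So "=&f" and "%f" would be recognized the same way as a plain "=f", while "=v" would still be left alone.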

Do we make use of multi-alternative constraints on s390?  I think not,
because our instructions are fairly rigid, but maybe I'm missing
something?

...

> > diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
> > index 0e3c31f5d4f..1332a65a1d1 100644
> > --- a/gcc/config/s390/vector.md
> > +++ b/gcc/config/s390/vector.md
> > @@ -616,12 +616,23 @@ (define_insn "*vec_tf_to_v1tf_vr"
> > vlvgp\t%v0,%1,%N1"
> >[(set_attr "op_type" "VRR,VRX,VRX,VRI,VRR")])
> >  
> > -(define_insn "*fprx2_to_tf"
> > -  [(set (match_operand:TF   0 "nonimmediate_operand"
> > "=v")
> > -   (subreg:TF (match_operand:FPRX2 1 "general_operand"   "f")
> > 0))]
> > +(define_insn_and_split "fprx2_to_tf"
> > +  [(set (match_operand:TF   0 "nonimmediate_operand"
> > "=v,R")
> > +   (subreg:TF (match_operand:FPRX2 1
> > "general_operand"   "f,f") 0))]
> >"TARGET_VXE"
> > -  "vmrhg\t%v0,%1,%N1"
> > -  [(set_attr "op_type" "VRR")])
> > +  "@
> > +   vmrhg\t%v0,%1,%N1
> > +   #"
> > +  "!(MEM_P (operands[0]) && MEM_VOLATILE_P (operands[0]))"
> > +  [(set (match_dup 2) (match_dup 3))
> > +   (set (match_dup 4) (match_dup 5))]
> > +{
> > +  operands[2] = simplify_gen_subreg (DFmode, operands[0], TFmode,
> > 0);
> > +  operands[3] = simplify_gen_subreg (DFmode, operands[1],
> > FPRX2mode, 0);
> > +  operands[4] = simplify_gen_subreg (DFmode, operands[0], TFmode,
> > 8);
> > +  operands[5] = simplify_gen_subreg (DFmode, operands[1],
> > FPRX2mode, 8);
> > +}
> > +  [(set_attr "op_type" "VRR,*")])
> 
> Splitting an address like this might cause the displacement to
> overflow in the second part. This
> would require an additional reg to make the address valid again.
> Which in turn will be a problem
> after reload. You can use the 'AR' constraint for the memory
> alternative. That way reload will make
> sure the address is offsetable.

Ok, thanks for the hint!
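
To make sure I understand the overflow case, here is the arithmetic as a small sketch.  It assumes the memory alternative ends up with a short displacement, i.e. a 12-bit unsigned field; the names are invented for illustration and this is not how the backend performs the check.

```c
#include <stdbool.h>

/* Assumption: a short displacement is a 12-bit unsigned field.  */
enum { SHORT_DISP_MAX = 4095 };

static bool
fits_short_disp (long disp)
{
  return disp >= 0 && disp <= SHORT_DISP_MAX;
}

/* A 16-byte access at DISP is split into two 8-byte accesses at DISP
   and DISP + 8.  Both displacements must be encodable, otherwise an
   extra register would be needed to form the second address.  */
static bool
split_is_encodable (long disp)
{
  return fits_short_disp (disp) && fits_short_disp (disp + 8);
}
```

With these assumptions a displacement of 4088 is fine for the 16-byte access itself, but the second half would land at 4096 and no longer fit.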



[PATCH v3] fwprop: Allow (subreg (mem)) simplifications

2021-01-21 Thread Ilya Leoshkevich via Gcc-patches
On Thu, 2021-01-21 at 12:29 +, Richard Sandiford wrote:
> Given what you said in the other message about combine, I agree this
> is a reasonable workaround.  I don't know whether it's suitable for
> stage 4 or whether it would need to wait for stage 1.

Thanks for reviewing!  I've implemented your suggestions in the patch
below.

Regarding stage 4, this can be seen as a part of IBM Z

https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html

regression fix - before moving long doubles to vector registers and
fixing up "f" constraints on RTL level, code generation for small
glibc functions like __ieee754_sqrtl has been fairly efficient.  Not
sure if that issue is big enough to justify this common code change at
this point, but still..



v2 -> v3: Added single_ebb_p, added paradoxical subreg check, fixed
formatting.  Bootstrapped and regtested on x86_64-redhat-linux,
ppc64le-redhat-linux and s390x-redhat-linux.
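
As a rough model of the resulting check: a (subreg) -> (mem) replacement is treated as profitable only if the definition has a single use, both insns are in the same EBB, the subreg is not paradoxical, and the new mem is not volatile.  The struct and function below are a toy restatement with plain booleans and hypothetical names, not the fwprop code itself.

```c
#include <stdbool.h>

/* Hypothetical summary of the facts the check looks at.  */
struct propagation_facts
{
  bool single_use;             /* The def has exactly one use.  */
  bool single_ebb;             /* Use and def are in the same EBB.  */
  bool old_is_subreg;          /* The old rtx is a (subreg).  */
  bool old_is_paradoxical;     /* ... and it is paradoxical.  */
  bool new_is_mem;             /* The new rtx is a (mem).  */
  bool new_is_volatile_mem;    /* ... and it is a (mem/v).  */
};

/* Return true iff the replacement counts as profitable.  */
static bool
subreg_to_mem_profitable (const struct propagation_facts *f)
{
  return f->single_use
	 && f->single_ebb
	 && f->old_is_subreg
	 && !f->old_is_paradoxical
	 && f->new_is_mem
	 && !f->new_is_volatile_mem;
}
```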




Suppose we have:

(set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
(set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))

It is clearly profitable to propagate the first insn into the second
one and get:

(set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))

fwprop actually manages to perform this, but doesn't think the result is
worth it, which results in unnecessary store/load sequences on s390.
Improve the situation by classifying SUBREG -> MEM changes as
profitable.

gcc/ChangeLog:

2021-01-15  Ilya Leoshkevich  

* fwprop.c (fwprop_propagation::classify_result): Allow
(subreg (mem)) simplifications.
---
 gcc/fwprop.c | 33 -
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index eff8f7cc141..123cc228630 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -176,7 +176,7 @@ namespace
 static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1;
 static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2;
 
-fwprop_propagation (rtx_insn *, rtx, rtx);
+fwprop_propagation (insn_info *, insn_info *, rtx, rtx);
 
 bool changed_mem_p () const { return result_flags & CHANGED_MEM; }
 bool folded_to_constants_p () const;
@@ -185,13 +185,20 @@ namespace
 bool check_mem (int, rtx) final override;
 void note_simplification (int, uint16_t, rtx, rtx) final override;
 uint16_t classify_result (rtx, rtx);
+
+  private:
+const bool single_use_p;
+const bool single_ebb_p;
   };
 }
 
 /* Prepare to replace FROM with TO in INSN.  */
 
-fwprop_propagation::fwprop_propagation (rtx_insn *insn, rtx from, rtx to)
-  : insn_propagation (insn, from, to)
+fwprop_propagation::fwprop_propagation (insn_info *use_insn,
+   insn_info *def_insn, rtx from, rtx to)
+  : insn_propagation (use_insn->rtl (), from, to),
+single_use_p (def_insn->num_uses () == 1),
+single_ebb_p (use_insn->ebb () == def_insn->ebb ())
 {
   should_check_mems = true;
   should_note_simplifications = true;
@@ -262,6 +269,22 @@ fwprop_propagation::classify_result (rtx old_rtx, rtx new_rtx)
   && GET_MODE (new_rtx) == GET_MODE_INNER (GET_MODE (from)))
 return PROFITABLE;
 
+  /* Allow (subreg (mem)) -> (mem) simplifications with the following
+ exceptions:
+ 1) Propagating (mem)s into multiple uses is not profitable.
+ 2) Propagating (mem)s across EBBs may not be profitable if the source EBB
+   runs less frequently.
+ 3) Propagating (mem)s into paradoxical (subreg)s is not profitable.
+ 4) Creating new (mem/v)s is not correct, since DCE will not remove the old
+   ones.  */
+  if (single_use_p
+  && single_ebb_p
+  && SUBREG_P (old_rtx)
+  && !paradoxical_subreg_p (old_rtx)
+  && MEM_P (new_rtx)
+  && !MEM_VOLATILE_P (new_rtx))
+return PROFITABLE;
+
   return 0;
 }
 
@@ -363,7 +386,7 @@ try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn,
   rtx_insn *use_rtl = use_insn->rtl ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_rtl, dest, src);
+  fwprop_propagation prop (use_insn, def_insn, dest, src);
   if (!prop.apply_to_rvalue (&XEXP (note, 0)))
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -426,7 +449,7 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change,
   rtx_insn *use_rtl = use_insn->rtl ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_rtl, dest, src);
+  fwprop_propagation prop (use_insn, def_insn, dest, src);
   if (!prop.apply_to_pattern (loc))
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
-- 
2.26.2



Re: [PATCH] fwprop: Allow (subreg (mem)) simplifications

2021-01-21 Thread Ilya Leoshkevich via Gcc-patches
On Thu, 2021-01-21 at 10:49 +, Richard Sandiford wrote:
> Ilya Leoshkevich via Gcc-patches  writes:
> > On Tue, 2021-01-19 at 09:41 +0100, Richard Biener wrote:
> > > On Mon, Jan 18, 2021 at 11:04 PM Ilya Leoshkevich via Gcc-patches
> > >  wrote:
> > > > Suppose we have:
> > > > (set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
> > > > (set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))
> > > > 
> > > > It is clearly profitable to propagate the first insn into the
> > > > second
> > > > one and get:
> > > > 
> > > > (set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))
> > > > 
> > > > fwprop actually manages to perform this, but doesn't think the
> > > > result is
> > > > worth it, which results in unnecessary store/load sequences on
> > > > s390.
> > > > Improve the situation by classifying SUBREG -> MEM changes as
> > > > profitable.
> > > 
> > > IIRC fwprop also propagates into multiple uses and replacing a
> > > non-
> > > MEM
> > > with a MEM is only good when the original MEM goes away - is that
> > > properly
> > > dealt with here?
> > 
> > This is because of efficiency and not correctness reasons,
> > right?  For
> > correctness I already check MEM_VOLATILE_P (new_rtx).  For
> > efficiency I
> > think it would be reasonable to add def_insn->num_uses () == 1
> > check
> > (this passes my tests, I'm yet to do a full regtest though).
> 
> That sounds plausible, but I think there's also the issue that the
> mem could be in a less frequently executed block.
> 
> A potential problem with checking num_uses is that it might make the
> boundary between fwprop and combine more fuzzy.  If the propagation
> makes the original instruction redundant then we should remove it
> and take the cost of the removal into account when costing the
> propagation (as combine does).  fwprop is instead set up for cases
> in which propagations are profitable even if the original instruction
> is kept.
> 
> What prevents combine from handling this?  Are the instructions in
> different blocks?

I wanted to do this before combine, because in __ieee754_sqrtl case
fwprop turns this (example from the commit message + the insn after
it):

(set (reg:TF 63) (mem:TF (reg:DI 62)))
(set (reg:FPRX2 66) (subreg:FPRX2 (reg:TF 63) 0))
(set (reg:FPRX2 65)
 (asm_operands:FPRX2 ("sqxbr %0,%1") ("=f") 0
 [(reg:FPRX2 66)]
 [(asm_input:FPRX2 ("f"))]
 []))

into this:

(set (reg:TF 63) (mem:TF (reg:DI 62)))
(set (reg:FPRX2 65)
 (asm_operands:FPRX2 ("sqxbr %0,%1") ("=f") 0
 [(subreg:FPRX2 (reg:TF 63) 0)]
 [(asm_input:FPRX2 ("f"))]
 []))

by propagating (reg:FPRX2 66), and there is not much combine can do
about this anymore:

(set (reg:FPRX2 65)
 (asm_operands:FPRX2 ("sqxbr %0,%1") ("=f") 0
 [(mem:FPRX2 (reg:DI 62))]
 [(asm_input:FPRX2 ("f"))]
 []))

is not a valid insn.



[PATCH] PING Add input_modes parameter to TARGET_MD_ASM_ADJUST hook

2021-01-20 Thread Ilya Leoshkevich via Gcc-patches
Hello,

I would like to ping the following patch:

Add input_modes parameter to TARGET_MD_ASM_ADJUST hook
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html

It is needed for the following regression fix:

IBM Z: Fix usage of "f" constraint with long doubles
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html

Best regards,
Ilya



[PATCH v2] fwprop: Allow (subreg (mem)) simplifications

2021-01-19 Thread Ilya Leoshkevich via Gcc-patches
v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563800.html

v1 -> v2: Allow (mem) -> (subreg) propagation only for single uses.

Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux
and s390x-redhat-linux.  Ok for master?



Suppose we have:

(set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
(set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))

It is clearly profitable to propagate the first insn into the second
one and get:

(set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))

fwprop actually manages to perform this, but doesn't think the result is
worth it, which results in unnecessary store/load sequences on s390.
Improve the situation by classifying SUBREG -> MEM changes as
profitable.

gcc/ChangeLog:

2021-01-15  Ilya Leoshkevich  

* fwprop.c (fwprop_propagation::classify_result): Allow
(subreg (mem)) simplifications.
---
 gcc/fwprop.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index eff8f7cc141..02d3d507cbc 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -176,7 +176,7 @@ namespace
 static const uint16_t CONSTANT = FIRST_SPARE_RESULT << 1;
 static const uint16_t PROFITABLE = FIRST_SPARE_RESULT << 2;
 
-fwprop_propagation (rtx_insn *, rtx, rtx);
+fwprop_propagation (rtx_insn *, insn_info *, rtx, rtx);
 
 bool changed_mem_p () const { return result_flags & CHANGED_MEM; }
 bool folded_to_constants_p () const;
@@ -185,13 +185,18 @@ namespace
 bool check_mem (int, rtx) final override;
 void note_simplification (int, uint16_t, rtx, rtx) final override;
 uint16_t classify_result (rtx, rtx);
+
+  private:
+const bool single_use_p;
   };
 }
 
 /* Prepare to replace FROM with TO in INSN.  */
 
-fwprop_propagation::fwprop_propagation (rtx_insn *insn, rtx from, rtx to)
-  : insn_propagation (insn, from, to)
+fwprop_propagation::fwprop_propagation (rtx_insn *insn, insn_info *def_insn,
+   rtx from, rtx to)
+: insn_propagation (insn, from, to),
+  single_use_p (def_insn->num_uses () == 1)
 {
   should_check_mems = true;
   should_note_simplifications = true;
@@ -262,6 +267,13 @@ fwprop_propagation::classify_result (rtx old_rtx, rtx new_rtx)
   && GET_MODE (new_rtx) == GET_MODE_INNER (GET_MODE (from)))
 return PROFITABLE;
 
+  /* Allow (subreg (mem)) -> (mem) simplifications.  Do not allow propagation
+ of (mem)s into multiple uses, since those are not profitable, as well as
+ creating new (mem/v)s, since DCE will not remove the old ones.  */
+  if (single_use_p && SUBREG_P (old_rtx) && MEM_P (new_rtx)
+  && !MEM_VOLATILE_P (new_rtx))
+return PROFITABLE;
+
   return 0;
 }
 
@@ -363,7 +375,7 @@ try_fwprop_subst_note (insn_info *use_insn, insn_info *def_insn,
   rtx_insn *use_rtl = use_insn->rtl ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_rtl, dest, src);
+  fwprop_propagation prop (use_rtl, def_insn, dest, src);
   if (!prop.apply_to_rvalue (&XEXP (note, 0)))
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
@@ -426,7 +438,7 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change,
   rtx_insn *use_rtl = use_insn->rtl ();
 
   insn_change_watermark watermark;
-  fwprop_propagation prop (use_rtl, dest, src);
+  fwprop_propagation prop (use_rtl, def_insn, dest, src);
   if (!prop.apply_to_pattern (loc))
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
-- 
2.26.2



Re: [PATCH] fwprop: Allow (subreg (mem)) simplifications

2021-01-19 Thread Ilya Leoshkevich via Gcc-patches
On Tue, 2021-01-19 at 09:41 +0100, Richard Biener wrote:
> On Mon, Jan 18, 2021 at 11:04 PM Ilya Leoshkevich via Gcc-patches
>  wrote:
> > 
> > Suppose we have:
> > 
> > (set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
> > (set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))
> > 
> > It is clearly profitable to propagate the first insn into the
> > second
> > one and get:
> > 
> > (set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))
> > 
> > fwprop actually manages to perform this, but doesn't think the
> > result is
> > worth it, which results in unnecessary store/load sequences on
> > s390.
> > Improve the situation by classifying SUBREG -> MEM changes as
> > profitable.
> 
> IIRC fwprop also propagates into multiple uses and replacing a non-
> MEM
> with a MEM is only good when the original MEM goes away - is that
> properly
> dealt with here?

This is because of efficiency and not correctness reasons, right?  For
correctness I already check MEM_VOLATILE_P (new_rtx).  For efficiency I
think it would be reasonable to add def_insn->num_uses () == 1 check
(this passes my tests, I'm yet to do a full regtest though).  What do
you think about this?



[PATCH] fwprop: Allow (subreg (mem)) simplifications

2021-01-18 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux
and s390x-redhat-linux.  I realize it might be too late for a change
like this, but it's desirable to have this in conjunction with the
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563799.html s390
regression fix, which otherwise produces unnecessary store/load
sequences in certain glibc routines, e.g. __ieee754_sqrtl.  Ok for
master?



Suppose we have:

(set (reg/v:TF 63) (mem/c:TF (reg/v:DI 62)))
(set (reg:FPRX2 66) (subreg:FPRX2 (reg/v:TF 63) 0))

It is clearly profitable to propagate the first insn into the second
one and get:

(set (reg:FPRX2 66) (mem/c:FPRX2 (reg/v:DI 62)))

fwprop actually manages to perform this, but doesn't think the result is
worth it, which results in unnecessary store/load sequences on s390.
Improve the situation by classifying SUBREG -> MEM changes as
profitable.

gcc/ChangeLog:

2021-01-15  Ilya Leoshkevich  

* fwprop.c (fwprop_propagation::classify_result): Allow
(subreg (mem)) simplifications.

gcc/testsuite/ChangeLog:

2021-01-15  Ilya Leoshkevich  

* gcc.target/s390/vector/long-double-to-i64.c: Expect that
float-vector moves do *not* happen.
---
 gcc/fwprop.c  | 5 +
 gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c | 3 +--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index eff8f7cc141..46b8ec7eccf 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -262,6 +262,11 @@ fwprop_propagation::classify_result (rtx old_rtx, rtx new_rtx)
   && GET_MODE (new_rtx) == GET_MODE_INNER (GET_MODE (from)))
 return PROFITABLE;
 
+  /* Allow (subreg (mem)) -> (mem) simplifications.  However, do not allow
+ creating new (mem/v)s, since DCE will not remove the old ones.  */
+  if (SUBREG_P (old_rtx) && MEM_P (new_rtx) && !MEM_VOLATILE_P (new_rtx))
+return PROFITABLE;
+
   return 0;
 }
 
diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c b/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c
index 2dbbb5d1c03..8f4e377ed72 100644
--- a/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c
+++ b/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c
@@ -10,8 +10,7 @@ long_double_to_i64 (long double x)
   return x;
 }
 
-/* { dg-final { scan-assembler-times {\n\tvpdi\t%v\d+,%v\d+,%v\d+,1\n} 1 } } */
-/* { dg-final { scan-assembler-times {\n\tvpdi\t%v\d+,%v\d+,%v\d+,5\n} 1 } } */
+/* { dg-final { scan-assembler-not {\n\tvpdi\t} } } */
 /* { dg-final { scan-assembler-times {\n\tcgxbr\t} 1 } } */
 
 int
-- 
2.26.2



[PATCH] IBM Z: Fix usage of "f" constraint with long doubles

2021-01-18 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Depends on
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/562898.html;
ok for master once the dependency is committed?



After switching the s390 backend to store long doubles in vector
registers, "f" constraint broke when used with the former: long doubles
correspond to TFmode, which in combination with "f" corresponds to
hard regs %v0-%v15, however, asm users expect a %f0-%f15 pair.

Fix by using TARGET_MD_ASM_ADJUST hook to convert TFmode values to
FPRX2mode and back.

gcc/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* config/s390/s390.c (s390_md_asm_adjust): Implement
TARGET_MD_ASM_ADJUST.
(TARGET_MD_ASM_ADJUST): Likewise.
* config/s390/vector.md (fprx2_to_tf): Rename from *fprx2_to_tf,
add memory alternative.
(tf_to_fprx2): New pattern.

gcc/testsuite/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* gcc.target/s390/vector/long-double-asm-abi.c: New test.
* gcc.target/s390/vector/long-double-asm-in-out.c: New test.
* gcc.target/s390/vector/long-double-asm-inout.c: New test.
* gcc.target/s390/vector/long-double-volatile-from-i64.c: New
test.
---
 gcc/config/s390/s390.c| 73 +++
 gcc/config/s390/vector.md | 36 +++--
 .../s390/vector/long-double-asm-abi.c | 26 +++
 .../s390/vector/long-double-asm-in-out.c  | 14 
 .../s390/vector/long-double-asm-inout.c   | 14 
 .../vector/long-double-volatile-from-i64.c| 22 ++
 6 files changed, 180 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-abi.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-in-out.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-asm-inout.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-volatile-from-i64.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 9d2cee950d0..a22fd9fe391 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -16688,6 +16688,76 @@ s390_shift_truncation_mask (machine_mode mode)
   return mode == DImode || mode == SImode ? 63 : 0;
 }
 
+/* Implement TARGET_MD_ASM_ADJUST hook in order to fix up "f"
+   constraints when long doubles are stored in vector registers.  */
+
+static rtx_insn *
+s390_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &inputs,
+   vec<machine_mode> &input_modes,
+   vec<const char *> &constraints, vec<rtx> & /*clobbers*/,
+   HARD_REG_SET & /*clobbered_regs*/)
+{
+  if (!TARGET_VXE)
+/* Long doubles are stored in FPR pairs - nothing to do.  */
+return NULL;
+
+  rtx_insn *after_md_seq = NULL, *after_md_end = NULL;
+
+  unsigned ninputs = inputs.length ();
+  unsigned noutputs = outputs.length ();
+  for (unsigned i = 0; i < noutputs; i++)
+{
+  if (GET_MODE (outputs[i]) != TFmode)
+   /* Not a long double - nothing to do.  */
+   continue;
+  const char *constraint = constraints[i];
+  bool allows_mem, allows_reg, is_inout;
+  bool ok = parse_output_constraint (&constraint, i, ninputs, noutputs,
+&allows_mem, &allows_reg, &is_inout);
+  gcc_assert (ok);
+  if (strcmp (constraint, "=f") != 0)
+   /* Long double with a constraint other than "=f" - nothing to do.  */
+   continue;
+  gcc_assert (allows_reg);
+  gcc_assert (!allows_mem);
+  gcc_assert (!is_inout);
+  /* Copy output value from a FPR pair into a vector register.  */
+  rtx fprx2 = gen_reg_rtx (FPRX2mode);
+  push_to_sequence2 (after_md_seq, after_md_end);
+  emit_insn (gen_fprx2_to_tf (outputs[i], fprx2));
+  after_md_seq = get_insns ();
+  after_md_end = get_last_insn ();
+  end_sequence ();
+  outputs[i] = fprx2;
+}
+
+  for (unsigned i = 0; i < ninputs; i++)
+{
+  if (GET_MODE (inputs[i]) != TFmode)
+   /* Not a long double - nothing to do.  */
+   continue;
+  const char *constraint = constraints[noutputs + i];
+  bool allows_mem, allows_reg;
+  bool ok = parse_input_constraint (&constraint, i, ninputs, noutputs, 0,
+   constraints.address (), &allows_mem,
+   &allows_reg);
+  gcc_assert (ok);
+  if (strcmp (constraint, "f") != 0 && strcmp (constraint, "=f") != 0)
+   /* Long double with a constraint other than "f" (or "=f" for inout
+  operands) - nothing to do.  */
+   continue;
+  gcc_assert (allows_reg);
+  gcc_assert (!allows_mem);
+  /* Copy input value from a vector register into a FPR pair.  */
+  rtx fprx2 = gen_reg_rtx (FPRX2mode);
+  emit_insn (gen_tf_to_fprx2 (fprx2, inputs[i]));
+  inputs[i] = 

[PATCH] lra: clear lra_insn_recog_data after simplifying a mem subreg

2021-01-13 Thread Ilya Leoshkevich via Gcc-patches
Hello,

I ran into this problem when writing new patterns for s390.  I'm not
100% sure this fix is correct, but it resolves my issue and survives
bootstrap and regtest on x86_64-redhat-linux, ppc64le-redhat-linux and
s390x-redhat-linux.  Could you please take a look?

Best regards,
Ilya




Suppose we have:

(insn (set (reg:FPRX2 70) (subreg:FPRX2 (reg/v:TF 63) 0)))

where operand_loc[0] points to r70 and operand_loc[1] points to r63.
If r63 is spilled, remove_pseudos() will change this insn to:

  (insn (set (reg:FPRX2 70)
 (subreg:FPRX2 (mem/c:TF (plus:DI (reg:DI %fp)
  (const_int 144))

This is fine so far: rtx pointed to by operand_loc[1] has been changed
from (reg) to (mem), but its slot is still under (subreg).  However,
alter_subreg() will simplify this insn to:

  (insn (set (reg:FPRX2 70)
 (mem/c:FPRX2 (plus:DI (reg:DI %fp) (const_int 144)

The (subreg) is gone, and therefore operand_loc[1] is no longer valid.
This will prevent process_insn_for_elimination() from updating the spill
slot offset, causing miscompilation: different instructions will refer
to the same spill slot using different offsets.

Fix by clearing all the cached data, and not just used_insn_alternative.
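
The stale-pointer problem described above can be sketched with a toy model in plain C. This is only an illustration, not GCC code: the toy_* types and functions are hypothetical stand-ins for rtx, operand_loc[], alter_subreg() and the offset update done during elimination, and the sketch assumes that simplifying the subreg yields a fresh (mem) node.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression node; a hypothetical stand-in for rtx.  */
enum toy_code { TOY_REG, TOY_MEM, TOY_SUBREG };

struct toy_node
{
  enum toy_code code;
  int value;              /* Register number or stack slot offset.  */
  struct toy_node *inner; /* Operand of a SUBREG, otherwise NULL.  */
};

/* Toy insn with a cached pointer to the operand slot, loosely modelled
   on operand_loc[].  */
struct toy_insn
{
  struct toy_node *src;
  struct toy_node **operand_loc;
};

/* (Re)compute the cache: the operand sits under the SUBREG if any.  */
static void
toy_update_cache (struct toy_insn *insn)
{
  assert (insn->src != NULL);
  insn->operand_loc
    = insn->src->code == TOY_SUBREG ? &insn->src->inner : &insn->src;
}

/* Model of simplifying (subreg (mem)) to a fresh (mem) stored in FRESH.
   The cache is deliberately left untouched.  */
static void
toy_alter_subreg (struct toy_insn *insn, struct toy_node *fresh)
{
  struct toy_node *sub = insn->src;
  if (sub->code == TOY_SUBREG && sub->inner->code == TOY_MEM)
    {
      *fresh = *sub->inner;
      insn->src = fresh;
    }
}

/* Model of a later pass adjusting the slot offset through the cache.  */
static void
toy_adjust_offset (struct toy_insn *insn, int delta)
{
  (*insn->operand_loc)->value += delta;
}

/* Return the offset that the insn ends up using.  */
static int
toy_demo (int refresh_cache)
{
  struct toy_node reg = { TOY_REG, 63, NULL };
  struct toy_node subreg = { TOY_SUBREG, 0, &reg };
  struct toy_node fresh;
  struct toy_insn insn = { &subreg, NULL };

  toy_update_cache (&insn);
  /* Spill: the register becomes a stack slot at offset 144.  */
  reg.code = TOY_MEM;
  reg.value = 144;
  toy_alter_subreg (&insn, &fresh);
  if (refresh_cache)
    toy_update_cache (&insn);
  toy_adjust_offset (&insn, 16);
  return insn.src->value;
}
```

In this model toy_demo (0) leaves the insn at offset 144, because the update goes to the orphaned node, while toy_demo (1) moves it to 160.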

gcc/ChangeLog:

2021-01-13  Ilya Leoshkevich  

* lra-spills.c (remove_pseudos): Call lra_update_insn_recog_data()
after calling alter_subreg() on a (mem).
---
 gcc/lra-spills.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/lra-spills.c b/gcc/lra-spills.c
index 26f56b2df02..01bd82574e7 100644
--- a/gcc/lra-spills.c
+++ b/gcc/lra-spills.c
@@ -431,7 +431,7 @@ remove_pseudos (rtx *loc, rtx_insn *insn)
  alter_subreg (loc, false);
  if (GET_CODE (*loc) == MEM)
{
- lra_get_insn_recog_data (insn)->used_insn_alternative = -1;
+ lra_update_insn_recog_data (insn);
  if (lra_dump_file != NULL)
fprintf (lra_dump_file,
 "Memory subreg was simplified in insn #%u\n",
-- 
2.26.2



[PATCH] IBM Z: Fix constraints in vpdi patterns

2021-01-08 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?



The destination register is only partially overwritten, so + should be
used instead of =.
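
To see why the destination is also an input, here is a small C model of the two vpdi uses, written from the M4 comments in the patterns below. The names toy_vreg, toy_vpdi and the two wrappers are made up for illustration, and the sketch assumes that an FPR overlaps doubleword 0 of the corresponding vector register.

```c
#include <stdint.h>

/* A 128-bit vector register viewed as two doublewords.  */
struct toy_vreg
{
  uint64_t dw[2];
};

/* Toy model of "vpdi result,a,b,m4": bit value 4 of M4 selects the
   doubleword taken from A, bit value 1 selects the one taken from B.  */
static struct toy_vreg
toy_vpdi (struct toy_vreg a, struct toy_vreg b, unsigned m4)
{
  struct toy_vreg r;
  r.dw[0] = a.dw[(m4 >> 2) & 1];
  r.dw[1] = b.dw[m4 & 1];
  return r;
}

/* M4 == 1: dst[0] = src[0]; dst[1] = dst[1].  */
static struct toy_vreg
toy_tf_to_fprx2_0 (struct toy_vreg dst, struct toy_vreg src)
{
  return toy_vpdi (src, dst, 1);
}

/* M4 == 5: dst[0] = src[1]; dst[1] = dst[1].  */
static struct toy_vreg
toy_tf_to_fprx2_1 (struct toy_vreg dst, struct toy_vreg src)
{
  return toy_vpdi (src, dst, 5);
}
```

In both cases doubleword 1 of the result comes from the old value of the destination, so the instruction reads the register it writes; that is what "+" expresses and "=" does not.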

gcc/ChangeLog:

2021-01-08  Ilya Leoshkevich  

* config/s390/vector.md (*tf_to_fprx2_0): Rename from
*mov_tf_to_fprx2_0 for consistency, fix constraint.
(*tf_to_fprx2_1): Rename from *mov_tf_to_fprx2_1 for
consistency, fix constraint.
---
 gcc/config/s390/vector.md | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index 5b8d75f18f0..0e3c31f5d4f 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -737,16 +737,16 @@ (define_insn "*vec_perm"
   "vperm\t%v0,%v1,%v2,%v3"
   [(set_attr "op_type" "VRR")])
 
-(define_insn "*mov_tf_to_fprx2_0"
-  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 0)
+(define_insn "*tf_to_fprx2_0"
+  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 0)
(subreg:DF (match_operand:TF1 "general_operand"   "v") 0))]
   "TARGET_VXE"
   ; M4 == 1 corresponds to %v0[0] = %v1[0]; %v0[1] = %v0[1];
   "vpdi\t%v0,%v1,%v0,1"
   [(set_attr "op_type" "VRR")])
 
-(define_insn "*mov_tf_to_fprx2_1"
-  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "=f") 8)
+(define_insn "*tf_to_fprx2_1"
+  [(set (subreg:DF (match_operand:FPRX2 0 "nonimmediate_operand" "+f") 8)
(subreg:DF (match_operand:TF1 "general_operand"   "v") 8))]
   "TARGET_VXE"
   ; M4 == 5 corresponds to %V0[0] = %v1[1]; %V0[1] = %V0[1];
-- 
2.26.2



[PATCH v2] IBM Z: Introduce __LONG_DOUBLE_VX__ macro

2021-01-08 Thread Ilya Leoshkevich via Gcc-patches
v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563034.html
v1 -> v2: Use TARGET_VXE_P instead of TARGET_Z14_P.



Give end users the opportunity to find out whether long doubles are
stored in floating-point register pairs or in vector registers, so that
they can fine-tune their asm statements.
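
A minimal sketch of how user code might consume the macro. The helper names are hypothetical, and the choice of the "v" constraint for the vector-register case is an assumption made for illustration rather than something this patch prescribes.

```c
#include <string.h>

/* Hypothetical helper: says where this compiler configuration keeps
   long doubles, based solely on __LONG_DOUBLE_VX__.  */
static const char *
long_double_location (void)
{
#ifdef __LONG_DOUBLE_VX__
  return "vector register";
#else
  return "floating-point register pair";
#endif
}

/* Hypothetical helper: picks an asm constraint to match.  */
static const char *
long_double_constraint (void)
{
#ifdef __LONG_DOUBLE_VX__
  return "v";
#else
  return "f";
#endif
}
```

On targets other than s390 the macro is simply undefined, so the second branch is taken and the result says nothing about the actual register layout.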

gcc/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* config/s390/s390-c.c (s390_def_or_undef_macro): Accept
callables instead of mask values.
(struct target_flag_set_p): New predicate.
(s390_cpu_cpp_builtins_internal): Define or undefine
__LONG_DOUBLE_VX__ macro.

gcc/testsuite/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* gcc.target/s390/vector/long-double-vx-macro-off.c: New test.
* gcc.target/s390/vector/long-double-vx-macro-on.c: New test.
---
 gcc/config/s390/s390-c.c  | 59 ---
 .../s390/vector/long-double-vx-macro-off-on.c | 11 
 .../s390/vector/long-double-vx-macro-on-off.c | 11 
 3 files changed, 60 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c

diff --git a/gcc/config/s390/s390-c.c b/gcc/config/s390/s390-c.c
index 95cd2df505d..a5f5f56311a 100644
--- a/gcc/config/s390/s390-c.c
+++ b/gcc/config/s390/s390-c.c
@@ -294,9 +294,9 @@ s390_macro_to_expand (cpp_reader *pfile, const cpp_token *tok)
 /* Helper function that defines or undefines macros.  If SET is true, the macro
MACRO_DEF is defined.  If SET is false, the macro MACRO_UNDEF is undefined.
Nothing is done if SET and WAS_SET have the same value.  */
+template 
 static void
-s390_def_or_undef_macro (cpp_reader *pfile,
-unsigned int mask,
+s390_def_or_undef_macro (cpp_reader *pfile, F is_set,
 const struct cl_target_option *old_opts,
 const struct cl_target_option *new_opts,
 const char *macro_def, const char *macro_undef)
@@ -304,8 +304,8 @@ s390_def_or_undef_macro (cpp_reader *pfile,
   bool was_set;
   bool set;
 
-  was_set = (!old_opts) ? false : old_opts->x_target_flags & mask;
-  set = new_opts->x_target_flags & mask;
+  was_set = (!old_opts) ? false : is_set (old_opts);
+  set = is_set (new_opts);
   if (was_set == set)
 return;
   if (set)
@@ -314,6 +314,19 @@ s390_def_or_undef_macro (cpp_reader *pfile,
 cpp_undef (pfile, macro_undef);
 }
 
+struct target_flag_set_p
+{
+  target_flag_set_p (unsigned int mask) : m_mask (mask) {}
+
+  bool
+  operator() (const struct cl_target_option *opts) const
+  {
+return opts->x_target_flags & m_mask;
+  }
+
+  unsigned int m_mask;
+};
+
 /* Internal function to either define or undef the appropriate system
macros.  */
 static void
@@ -321,18 +334,18 @@ s390_cpu_cpp_builtins_internal (cpp_reader *pfile,
struct cl_target_option *opts,
const struct cl_target_option *old_opts)
 {
-  s390_def_or_undef_macro (pfile, MASK_OPT_HTM, old_opts, opts,
-  "__HTM__", "__HTM__");
-  s390_def_or_undef_macro (pfile, MASK_OPT_VX, old_opts, opts,
-  "__VX__", "__VX__");
-  s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
-  "__VEC__=10303", "__VEC__");
-  s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
-  "__vector=__attribute__((vector_size(16)))",
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_OPT_HTM), old_opts,
+  opts, "__HTM__", "__HTM__");
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_OPT_VX), old_opts,
+  opts, "__VX__", "__VX__");
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_ZVECTOR), old_opts,
+  opts, "__VEC__=10303", "__VEC__");
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_ZVECTOR), old_opts,
+  opts, "__vector=__attribute__((vector_size(16)))",
   "__vector__");
-  s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
-  "__bool=__attribute__((s390_vector_bool)) unsigned",
-  "__bool");
+  s390_def_or_undef_macro (
+  pfile, target_flag_set_p (MASK_ZVECTOR), old_opts, opts,
+  "__bool=__attribute__((s390_vector_bool)) unsigned", "__bool");
   {
 char macro_def[64];
 gcc_assert (s390_arch != PROCESSOR_NATIVE);
@@ -340,16 +353,20 @@ s390_cpu_cpp_builtins_internal (cpp_reader *pfile,
 cpp_undef (pfile, "__ARCH__");
 cpp_define (pfile, macro_def);
   }
+  s390_def_or_undef_macro (
+  

[PATCH] IBM Z: Introduce __LONG_DOUBLE_VX__ macro

2021-01-07 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?



Give end users the opportunity to find out whether long doubles are
stored in floating-point register pairs or in vector registers, so that
they can fine-tune their asm statements.

gcc/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* config/s390/s390-c.c (s390_def_or_undef_macro): Accept
callables instead of mask values.
(struct target_flag_set_p): New predicate.
(s390_cpu_cpp_builtins_internal): Define or undefine
__LONG_DOUBLE_VX__ macro.

gcc/testsuite/ChangeLog:

2020-12-14  Ilya Leoshkevich  

* gcc.target/s390/vector/long-double-vx-macro-off.c: New test.
* gcc.target/s390/vector/long-double-vx-macro-on.c: New test.
---
 gcc/config/s390/s390-c.c  | 59 ---
 .../s390/vector/long-double-vx-macro-off-on.c | 11 
 .../s390/vector/long-double-vx-macro-on-off.c | 11 
 3 files changed, 60 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-off-on.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/long-double-vx-macro-on-off.c

diff --git a/gcc/config/s390/s390-c.c b/gcc/config/s390/s390-c.c
index 95cd2df505d..29b87d76ab1 100644
--- a/gcc/config/s390/s390-c.c
+++ b/gcc/config/s390/s390-c.c
@@ -294,9 +294,9 @@ s390_macro_to_expand (cpp_reader *pfile, const cpp_token *tok)
 /* Helper function that defines or undefines macros.  If SET is true, the macro
MACRO_DEF is defined.  If SET is false, the macro MACRO_UNDEF is undefined.
Nothing is done if SET and WAS_SET have the same value.  */
+template 
 static void
-s390_def_or_undef_macro (cpp_reader *pfile,
-unsigned int mask,
+s390_def_or_undef_macro (cpp_reader *pfile, F is_set,
 const struct cl_target_option *old_opts,
 const struct cl_target_option *new_opts,
 const char *macro_def, const char *macro_undef)
@@ -304,8 +304,8 @@ s390_def_or_undef_macro (cpp_reader *pfile,
   bool was_set;
   bool set;
 
-  was_set = (!old_opts) ? false : old_opts->x_target_flags & mask;
-  set = new_opts->x_target_flags & mask;
+  was_set = (!old_opts) ? false : is_set (old_opts);
+  set = is_set (new_opts);
   if (was_set == set)
 return;
   if (set)
@@ -314,6 +314,19 @@ s390_def_or_undef_macro (cpp_reader *pfile,
 cpp_undef (pfile, macro_undef);
 }
 
+struct target_flag_set_p
+{
+  target_flag_set_p (unsigned int mask) : m_mask (mask) {}
+
+  bool
+  operator() (const struct cl_target_option *opts) const
+  {
+return opts->x_target_flags & m_mask;
+  }
+
+  unsigned int m_mask;
+};
+
 /* Internal function to either define or undef the appropriate system
macros.  */
 static void
@@ -321,18 +334,18 @@ s390_cpu_cpp_builtins_internal (cpp_reader *pfile,
struct cl_target_option *opts,
const struct cl_target_option *old_opts)
 {
-  s390_def_or_undef_macro (pfile, MASK_OPT_HTM, old_opts, opts,
-  "__HTM__", "__HTM__");
-  s390_def_or_undef_macro (pfile, MASK_OPT_VX, old_opts, opts,
-  "__VX__", "__VX__");
-  s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
-  "__VEC__=10303", "__VEC__");
-  s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
-  "__vector=__attribute__((vector_size(16)))",
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_OPT_HTM), old_opts,
+  opts, "__HTM__", "__HTM__");
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_OPT_VX), old_opts,
+  opts, "__VX__", "__VX__");
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_ZVECTOR), old_opts,
+  opts, "__VEC__=10303", "__VEC__");
+  s390_def_or_undef_macro (pfile, target_flag_set_p (MASK_ZVECTOR), old_opts,
+  opts, "__vector=__attribute__((vector_size(16)))",
   "__vector__");
-  s390_def_or_undef_macro (pfile, MASK_ZVECTOR, old_opts, opts,
-  "__bool=__attribute__((s390_vector_bool)) unsigned",
-  "__bool");
+  s390_def_or_undef_macro (
+  pfile, target_flag_set_p (MASK_ZVECTOR), old_opts, opts,
+  "__bool=__attribute__((s390_vector_bool)) unsigned", "__bool");
   {
 char macro_def[64];
 gcc_assert (s390_arch != PROCESSOR_NATIVE);
@@ -340,16 +353,20 @@ s390_cpu_cpp_builtins_internal (cpp_reader *pfile,
 cpp_undef (pfile, "__ARCH__");
 cpp_define (pfile, macro_def);
   }
+  s390_def_or_undef_macro (
+  pfile,
+  [] (const struc

[PATCH] Add input_modes parameter to TARGET_MD_ASM_ADJUST hook

2021-01-05 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on x86_64-redhat-linux.  I also built
cross-compilers for arm-linux-gnueabi, cris-elf, mn10300-elf,
nds32-linux-gnu, pdp11-aout (didn't fully work due to
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg251887.html,
but the changed code compiled fine), powerpc-linux-gnu, vax-linux-gnu
and visium-elf, but didn't test them.  I ran into this issue while
implementing TARGET_MD_ASM_ADJUST for s390.  Ok for master?



If TARGET_MD_ASM_ADJUST changes a mode of an input operand (which
should be ok as long as the hook itself as well as after_md_seq make up
for it), input_mode will contain stale information.

It might be tempting to fix this by removing input_mode altogether and
just using GET_MODE (), but this will not work correctly with constants.
So add input_modes parameter and document that it should be updated
whenever inputs parameter is updated.
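
The reasoning can be illustrated with a toy model. Nothing below is GCC API: the toy_* names are hypothetical, and the only fact borrowed from GCC is that integer constants carry no mode of their own (VOIDmode), which is why a separate array of modes is needed at all.

```c
#include <stddef.h>

/* Hypothetical stand-ins for machine modes.  */
enum toy_mode { TOY_VOID, TOY_SI, TOY_TF, TOY_FPRX2 };

/* Toy operand: constants report TOY_VOID, like CONST_INT does.  */
struct toy_operand
{
  int is_constant;
  enum toy_mode mode;
};

static struct toy_operand
toy_const (void)
{
  return (struct toy_operand) { 1, TOY_VOID };
}

static struct toy_operand
toy_reg (enum toy_mode mode)
{
  return (struct toy_operand) { 0, mode };
}

/* Toy hook: replace TF register inputs with FPRX2 ones and keep
   INPUT_MODES in sync, as the documentation change asks.  */
static void
toy_md_asm_adjust (struct toy_operand *inputs, enum toy_mode *input_modes,
		   size_t ninputs)
{
  for (size_t i = 0; i < ninputs; i++)
    if (!inputs[i].is_constant && inputs[i].mode == TOY_TF)
      {
	inputs[i] = toy_reg (TOY_FPRX2);
	input_modes[i] = TOY_FPRX2;
      }
}
```

The constant operand shows why dropping the array in favour of a GET_MODE-style query would lose information: its mode lives only in input_modes[].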

gcc/ChangeLog:

2021-01-05  Ilya Leoshkevich  

* cfgexpand.c (expand_asm_loc): Pass new parameter.
(expand_asm_stmt): Likewise.
* config/arm/aarch-common-protos.h (arm_md_asm_adjust): Add new
parameter.
* config/arm/aarch-common.c (arm_md_asm_adjust): Likewise.
* config/arm/arm.c (thumb1_md_asm_adjust): Likewise.
* config/cris/cris.c (cris_md_asm_adjust): Likewise.
* config/i386/i386.c (ix86_md_asm_adjust): Likewise.
* config/mn10300/mn10300.c (mn10300_md_asm_adjust): Likewise.
* config/nds32/nds32.c (nds32_md_asm_adjust): Likewise.
* config/pdp11/pdp11.c (pdp11_md_asm_adjust): Likewise.
* config/rs6000/rs6000.c (rs6000_md_asm_adjust): Likewise.
* config/vax/vax.c (vax_md_asm_adjust): Likewise.
* config/visium/visium.c (visium_md_asm_adjust): Likewise.
* target.def (md_asm_adjust): Likewise.
---
 gcc/cfgexpand.c  | 16 
 gcc/config/arm/aarch-common-protos.h |  8 
 gcc/config/arm/aarch-common.c|  7 ---
 gcc/config/arm/arm.c | 14 --
 gcc/config/cris/cris.c   |  7 ---
 gcc/config/i386/i386.c   |  7 ---
 gcc/config/mn10300/mn10300.c |  7 ---
 gcc/config/nds32/nds32.c |  1 +
 gcc/config/pdp11/pdp11.c |  9 +
 gcc/config/rs6000/rs6000.c   |  7 ---
 gcc/config/vax/vax.c |  3 ++-
 gcc/config/visium/visium.c   | 12 +++-
 gcc/doc/tm.texi  | 10 ++
 gcc/target.def   | 13 -
 14 files changed, 69 insertions(+), 52 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index b73019b241f..e25528261a0 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -2879,6 +2879,7 @@ expand_asm_loc (tree string, int vol, location_t locus)
   rtx asm_op, clob;
   unsigned i, nclobbers;
   auto_vec input_rvec, output_rvec;
+  auto_vec input_mode;
   auto_vec constraints;
   auto_vec clobber_rvec;
   HARD_REG_SET clobbered_regs;
@@ -2888,9 +2889,8 @@ expand_asm_loc (tree string, int vol, location_t locus)
   clobber_rvec.safe_push (clob);
 
   if (targetm.md_asm_adjust)
-   targetm.md_asm_adjust (output_rvec, input_rvec,
-  constraints, clobber_rvec,
-  clobbered_regs);
+   targetm.md_asm_adjust (output_rvec, input_rvec, input_mode,
+  constraints, clobber_rvec, clobbered_regs);
 
   asm_op = body;
   nclobbers = clobber_rvec.length ();
@@ -3067,8 +3067,8 @@ expand_asm_stmt (gasm *stmt)
   return;
 }
 
-  /* There are some legacy diagnostics in here, and also avoids a
- sixth parameger to targetm.md_asm_adjust.  */
+  /* There are some legacy diagnostics in here, and also avoids an extra
+ parameter to targetm.md_asm_adjust.  */
   save_input_location s_i_l(locus);
 
   unsigned noutputs = gimple_asm_noutputs (stmt);
@@ -3419,9 +3419,9 @@ expand_asm_stmt (gasm *stmt)
  the flags register.  */
   rtx_insn *after_md_seq = NULL;
   if (targetm.md_asm_adjust)
-after_md_seq = targetm.md_asm_adjust (output_rvec, input_rvec,
- constraints, clobber_rvec,
- clobbered_regs);
+after_md_seq
+   = targetm.md_asm_adjust (output_rvec, input_rvec, input_mode,
+constraints, clobber_rvec, clobbered_regs);
 
   /* Do not allow the hook to change the output and input count,
  lest it mess up the operand numbering.  */
diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h
index 251de3d61a8..cbef50dde71 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -143,9 +143,9 @@ struct cpu_cost_table
   const struct vector_cost_table vect;
 };
 
-rtx_insn *
-arm_md_asm_adjust (vec &outputs, vec &/*inputs*/,
- 

[PATCH] IBM Z: Fix check_effective_target_s390_z14_hw

2021-01-05 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on z14.  Ok for master?



Commit 2f473f4b065d ("IBM Z: Do not run long double tests on old
machines") introduced a predicate for tests that must run only on z14+.
However, due to a syntax error, the predicate always returns false.
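
For illustration, here is a toy expander for asm templates that have operands. It is a simplified sketch, not GCC's implementation, and toy_expand is a hypothetical name; it handles only "%%" (a literal percent sign) and single-digit operand references.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Expand TMPL into OUT (at most OUTSZ bytes including the terminator):
   "%%" becomes "%", "%N" becomes OPS[N], everything else is copied.  */
static void
toy_expand (const char *tmpl, const char *const *ops, size_t nops,
	    char *out, size_t outsz)
{
  size_t n = 0;

  assert (outsz > 0);
  for (const char *p = tmpl; *p != '\0' && n + 1 < outsz; p++)
    {
      if (p[0] == '%' && p[1] == '%')
	{
	  /* Escaped percent sign.  */
	  out[n++] = '%';
	  p++;
	}
      else if (p[0] == '%' && isdigit ((unsigned char) p[1])
	       && (size_t) (p[1] - '0') < nops)
	{
	  /* Operand reference; truncate if the buffer is too small.  */
	  const char *op = ops[p[1] - '0'];
	  size_t len = strlen (op);
	  if (len > outsz - 1 - n)
	    len = outsz - 1 - n;
	  memcpy (out + n, op, len);
	  n += len;
	  p++;
	}
      else
	out[n++] = *p;
    }
  out[n] = '\0';
}
```

With the single operand "%r1" (an arbitrary register chosen for the example), the template "msgrkc %0,%0,%0" expands to "msgrkc %r1,%r1,%r1", whereas "msgrkc %%0,%%0,%%0" expands to "msgrkc %0,%0,%0", which no assembler would accept.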

gcc/testsuite/ChangeLog:

2020-12-10  Ilya Leoshkevich  

* gcc.target/s390/s390.exp: Replace %% with %.
---
 gcc/testsuite/gcc.target/s390/s390.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/s390/s390.exp b/gcc/testsuite/gcc.target/s390/s390.exp
index ba493de9f95..57b2690f8ab 100644
--- a/gcc/testsuite/gcc.target/s390/s390.exp
+++ b/gcc/testsuite/gcc.target/s390/s390.exp
@@ -197,7 +197,7 @@ proc check_effective_target_s390_z14_hw { } {
int main (void)
{
int x = 0;
-   asm ("msgrkc %%0,%%0,%%0" : "+r" (x) : );
+   asm ("msgrkc %0,%0,%0" : "+r" (x) : );
return x;
}
 }] "-march=z14 -m64 -mzarch" ] } { return 0 } else { return 1 }
-- 
2.26.2



[PATCH v2] aix: Fixinclude updates [PR98208]

2020-12-14 Thread Ilya Leoshkevich via Gcc-patches
On Fri, 2020-12-11 at 07:51 -0500, Nathan Sidwell wrote:
>
> I'm pretty sure this is wrong.  I think the test_text in
> inclhack.def
> should be a pre-fixed string that the testsuite presumably checks is
> converted.

You're right; I've added your change from the Bugzilla and updated the
expectation.  Does the following look better?



After commit 92648faa1cb2 ("aix: Fixinclude"), make check-fixincludes began
to fail (at least on the gcc121 machine).  Fix by updating fixincludes/tests
and rerunning genfixes.
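
The difference between the old and the new test_text can be checked with a small POSIX regex sketch. It reuses the select expression and the c_fix_arg from inclhack.def below, but toy_apply_fix is a hypothetical helper, and it assumes that the format fix replaces the matched span with the format string, with %1 standing for the first group.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* The select expression of the aix_physadr_t fix.  */
static const char toy_select[]
  = "typedef[ \t]*struct[ \t]*([{][^}]*[}][ \t]*\\*[ \t]*physadr_t;)";

/* If TEXT matches, write the fixed text to OUT and return 1;
   otherwise return 0 and leave OUT alone.  */
static int
toy_apply_fix (const char *text, char *out, size_t outsz)
{
  regex_t re;
  regmatch_t m[2];
  int matched;

  if (regcomp (&re, toy_select, REG_EXTENDED) != 0)
    return 0;
  matched = regexec (&re, text, 2, m, 0) == 0;
  regfree (&re);
  if (!matched)
    return 0;
  /* Prefix, replacement with group 1 substituted, suffix.  */
  snprintf (out, outsz, "%.*s%s%.*s%s",
	    (int) m[0].rm_so, text,
	    "typedef struct __physadr_s ",
	    (int) (m[1].rm_eo - m[1].rm_so), text + m[1].rm_so,
	    text + m[0].rm_eo);
  return 1;
}
```

The old test_text, "typedef struct __physadr_s {", is not matched by the select expression at all, so the fix never fires on it; the new one is matched and rewritten.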

Co-developed-by: Nathan Sidwell 

fixincludes/ChangeLog:

2020-12-11  Ilya Leoshkevich  

* fixincl.x: Rerun genfixes.
* inclhack.def(aix_physadr_t): Change test_text to something
that needs to be replaced.
* tests/base/sys/types.h(aix_physadr_t): Add expectation.
---
 fixincludes/fixincl.x  | 4 ++--
 fixincludes/inclhack.def   | 2 +-
 fixincludes/tests/base/sys/types.h | 5 +
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/fixincludes/fixincl.x b/fixincludes/fixincl.x
index 21439652bce..cc17edfba0b 100644
--- a/fixincludes/fixincl.x
+++ b/fixincludes/fixincl.x
@@ -2,11 +2,11 @@
  *
  * DO NOT EDIT THIS FILE   (fixincl.x)
  *
- * It has been AutoGen-ed  October 21, 2020 at 10:43:22 AM by AutoGen 5.18.16
+ * It has been AutoGen-ed  December  9, 2020 at 11:16:08 AM by AutoGen 5.18.16
  * From the definitionsinclhack.def
  * and the template file   fixincl
  */
-/* DO NOT SVN-MERGE THIS FILE, EITHER Wed Oct 21 10:43:22 EDT 2020
+/* DO NOT SVN-MERGE THIS FILE, EITHER Wed Dec  9 11:16:08 EST 2020
  *
  * You must regenerate it.  Use the ./genfixes script.
  *
diff --git a/fixincludes/inclhack.def b/fixincludes/inclhack.def
index 80c9adfb07c..3a4cfe06542 100644
--- a/fixincludes/inclhack.def
+++ b/fixincludes/inclhack.def
@@ -731,7 +731,7 @@ fix = {
 select= "typedef[ \t]*struct[ \t]*([{][^}]*[}][ \t]*\\*[ \t]*physadr_t;)";
 c_fix = format;
 c_fix_arg = "typedef struct __physadr_s %1";
-test_text = "typedef struct __physadr_s {";
+test_text = "typedef   struct { random stuff } *   physadr_t;";
 };
 
 /*
diff --git a/fixincludes/tests/base/sys/types.h b/fixincludes/tests/base/sys/types.h
index 683b5e93ecd..7340e76b175 100644
--- a/fixincludes/tests/base/sys/types.h
+++ b/fixincludes/tests/base/sys/types.h
@@ -9,6 +9,11 @@
 
 
 
+#if defined( AIX_PHYSADR_T_CHECK )
+typedef struct __physadr_s { random stuff } *  physadr_t;
+#endif  /* AIX_PHYSADR_T_CHECK */
+
+
 #if defined( GNU_TYPES_CHECK )
 #if !defined(_GCC_PTRDIFF_T)
 #define _GCC_PTRDIFF_T
-- 
2.25.4



[PATCH] aix: Fixinclude updates [PR98208]

2020-12-10 Thread Ilya Leoshkevich via Gcc-patches
Tested on gcc121 (x86_64 CentOS Linux 7).  Ok for master?



After commit 92648faa1cb2 ("aix: Fixinclude"), make check-fixincludes began
to fail (at least on the gcc121 machine).  Fix by updating fixincludes/tests
and rerunning genfixes.

fixincludes/ChangeLog:

2020-12-11  Ilya Leoshkevich  

* fixincl.x: Rerun genfixes.
* tests/base/sys/types.h: Add AIX_PHYSADR_T_CHECK.
---
 fixincludes/fixincl.x  | 4 ++--
 fixincludes/tests/base/sys/types.h | 5 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fixincludes/fixincl.x b/fixincludes/fixincl.x
index 21439652bce..cc17edfba0b 100644
--- a/fixincludes/fixincl.x
+++ b/fixincludes/fixincl.x
@@ -2,11 +2,11 @@
  *
  * DO NOT EDIT THIS FILE   (fixincl.x)
  *
- * It has been AutoGen-ed  October 21, 2020 at 10:43:22 AM by AutoGen 5.18.16
+ * It has been AutoGen-ed  December  9, 2020 at 11:16:08 AM by AutoGen 5.18.16
  * From the definitionsinclhack.def
  * and the template file   fixincl
  */
-/* DO NOT SVN-MERGE THIS FILE, EITHER Wed Oct 21 10:43:22 EDT 2020
+/* DO NOT SVN-MERGE THIS FILE, EITHER Wed Dec  9 11:16:08 EST 2020
  *
  * You must regenerate it.  Use the ./genfixes script.
  *
diff --git a/fixincludes/tests/base/sys/types.h b/fixincludes/tests/base/sys/types.h
index 683b5e93ecd..a318f9b713b 100644
--- a/fixincludes/tests/base/sys/types.h
+++ b/fixincludes/tests/base/sys/types.h
@@ -9,6 +9,11 @@
 
 
 
+#if defined( AIX_PHYSADR_T_CHECK )
+typedef struct __physadr_s {
+#endif  /* AIX_PHYSADR_T_CHECK */
+
+
 #if defined( GNU_TYPES_CHECK )
 #if !defined(_GCC_PTRDIFF_T)
 #define _GCC_PTRDIFF_T
-- 
2.25.4



[PATCH] Limit perf data buffer during feature checking

2020-12-09 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on x86_64-redhat-linux.  Ok for master?

Commit 2ead1ab91123 ("Limit perf data buffer during profiling") added
-m8 to perf invocations when running tests, but the same problem
exists when checking whether perf works in the first place.

gcc/testsuite/ChangeLog:

2020-12-08  Ilya Leoshkevich  

* lib/target-supports.exp(check_profiling_available): Limit
perf data buffer.
---
 gcc/testsuite/lib/target-supports.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 89c4f67554f..75b4f5d0e85 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -654,7 +654,7 @@ proc check_profiling_available { test_what } {
return 0
}
 global srcdir
-   set status [remote_exec host "$srcdir/../config/i386/gcc-auto-profile" "true -v >/dev/null"]
+   set status [remote_exec host "$srcdir/../config/i386/gcc-auto-profile" "-m8 true -v >/dev/null"]
if { [lindex $status 0] != 0 } {
verbose "autofdo not supported because perf does not work"
return 0
-- 
2.25.4



Re: [PATCH v4 1/2] asan: specify alignment for LASANPC labels

2020-12-08 Thread Ilya Leoshkevich via Gcc-patches
On Thu, 2020-07-09 at 14:07 +0200, Ilya Leoshkevich wrote:
> On Wed, 2020-07-01 at 21:48 +0200, Ilya Leoshkevich wrote:
> > On Wed, 2020-07-01 at 11:57 -0600, Jeff Law wrote:
> > > On Wed, 2020-07-01 at 14:29 +0200, Ilya Leoshkevich via Gcc-
> > > patches
> > > wrote:
> > > > gcc/ChangeLog:
> > > > 
> > > > 2020-06-30  Ilya Leoshkevich  
> > > > 
> > > > * asan.c (asan_emit_stack_protection): Use
> > > > CODE_LABEL_BOUNDARY.
> > > > * defaults.h (CODE_LABEL_BOUNDARY): New macro.
> > > > * doc/tm.texi: Document CODE_LABEL_BOUNDARY.
> > > > * doc/tm.texi.in: Likewise.
> > > Don't we already have the ability to set label alignments?  See
> > > LABEL_ALIGN.
> > 
> > The following works with -falign-labels=2:
> > 
> > --- a/gcc/asan.c
> > +++ b/gcc/asan.c
> > @@ -1524,7 +1524,7 @@ asan_emit_stack_protection (rtx base, rtx
> > pbase,
> > unsigned int alignb,
> >DECL_INITIAL (decl) = decl;
> >TREE_ASM_WRITTEN (decl) = 1;
> >TREE_ASM_WRITTEN (id) = 1;
> > -  SET_DECL_ALIGN (decl, CODE_LABEL_BOUNDARY);
> > +  SET_DECL_ALIGN (decl, (1 << LABEL_ALIGN (gen_label_rtx ())) *
> > BITS_PER_UNIT);
> >emit_move_insn (mem, expand_normal (build_fold_addr_expr
> > (decl)));
> >shadow_base = expand_binop (Pmode, lshr_optab, base,
> >   gen_int_shift_amount (Pmode,
> > ASAN_SHADOW_SHIFT),
> > 
> > In order to go this way, we would need to raise `-falign-labels=`
> > default to 2 for s390, which is not incorrect, but would
> > unnecessarily
> > clutter asm with `.align 2` before each label.  So IMHO it would be
> > nicer to simply ask the backend "what is your target's instruction
> > alignment?".
> 
> Besides that it would clutter asm with .align 2, another argument
> against using LABEL_ALIGN here is that it's semantically different
> from
> what is needed: -falign-labels value, which it returns, is specified
> by
> user for optimization purposes, whereas here we need to query the
> architecture's property.
> 
> In practical terms, if user specifies -falign-labels=4096, this would
> affect how the code is generated here. However, this would be
> completely unnecessary: we never jump to decl, its address is only
> saved for reporting.

Hi Jeff,

Could you please have another look at this one?

Best regards,
Ilya



Re: [PATCH RESEND] tree-ssa-threadbackward.c (profitable_jump_thread_path): Do not allow __builtin_constant_p.

2020-12-03 Thread Ilya Leoshkevich via Gcc-patches
On Wed, 2020-12-02 at 11:42 -0700, Jeff Law wrote:
> 
> On 12/1/20 7:09 PM, Ilya Leoshkevich wrote:
> > On Tue, 2020-12-01 at 15:34 -0700, Jeff Law wrote:
> > > No strong opinions.  I think whichever is less invasive in terms
> > > of
> > > code
> > > quality is probably the way to go.  What we want to avoid is
> > > suppressing
> > > threading unnecessarily as that often leads to false positives
> > > from
> > > middle-end based warnings.  Suppressing threading can also lead
> > > to
> > > build
> > > failures in the kernel due to the way they use b_c_p.
> > I think v1 is better then.  Would you mind approving the following?
> > That's the same code as in v1, but with the improved commit message
> > and
> > comments.
> > 
> > 
> > 
> > Linux Kernel (specifically, drivers/leds/trigger/ledtrig-cpu.c)
> > build
> > with GCC 10 fails on s390 with "impossible constraint".
> > 
> > Explanation by Jeff Law:
> > 
> > ```
> > So what we have is a b_c_p at the start of an if-else
> > chain.  Subsequent
> > tests on the "true" arm of the b_c_p test may throw us off the
> > constant path (because the constants are out of range).  Once all
> > the
> > tests are passed (it's constant and the constant is in range) the
> > true
> > arm's terminal block has a special asm that requires a constant
> > argument.   In the case where we get to the terminal block on the
> > true
> > arm, the argument to the b_c_p is used as the constant argument to
> > the
> > special asm.
> > 
> > At first glance jump threading seems to be doing the right
> > thing.  Except
> > that we end up with two paths to that terminal block with the
> > special
> > asm, one for each of the two constant arguments to the b_c_p call.
> > Naturally since that same value is used in the asm, we have to
> > introduce
> > a PHI to select between them at the head of the terminal
> > block.   Now
> > the argument in the asm is no longer constant and boom we fail.
> > ```
> > 
> > Fix by disallowing __builtin_constant_p on threading paths.
> > 
> > gcc/ChangeLog:
> > 
> > 2020-06-03  Ilya Leoshkevich  
> > 
> > * tree-ssa-threadbackward.c
> > (thread_jumps::profitable_jump_thread_path):
> > Do not allow __builtin_constant_p on a threading path.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > 2020-06-03  Ilya Leoshkevich  
> > 
> > * gcc.target/s390/builtin-constant-p-threading.c: New test.
> OK.  I think the old forward threader has the same problem.  Which I
> think can be fixed by returning NULL from
> record_temporary_equivalences_from_stmts_at_dest when we see the
> B_C_P
> call.  Fixing that in the obvious way is pre-approved once it's gone
> through the usual testing.

Thanks!

I've committed both:

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=70a62009181f66d1d1c90d3c74de38e153c96eb0
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=614aff0adf8fba5d843ec894603160151c20f0aa

Best regards,
Ilya



[PATCH] IBM Z: Build autovec-*-signaling-eq.c tests with exceptions

2020-12-02 Thread Ilya Leoshkevich via Gcc-patches
According to
https://gcc.gnu.org/pipermail/gcc/2020-November/234344.html, GCC is
allowed to perform optimizations that remove floating point traps,
since they do not affect the modeled control flow.  This interferes with
two signaling comparison tests, where (a <= b && a >= b) is turned into
(a <= b && a == b) by test_for_singularity, into ((a <= b) & (a == b))
by the vectorizer and then into (a == b) by eliminate_redundant_comparison.

Fix by making traps affect the control flow by turning them into
exceptions.
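
For reference, the idiom in question looks roughly as follows in C. The function names are made up for this sketch, which only demonstrates that the two forms agree on their values; whether the "invalid" exception is actually raised depends on the target and on the optimizations discussed above, so the sketch does not try to observe it.

```c
#include <math.h>

/* Signaling equality: per IEEE 754, <= and >= are signaling
   comparisons, i.e. they raise "invalid" for quiet NaN operands.  */
static int
toy_signaling_eq (double a, double b)
{
  return a <= b && a >= b;
}

/* Quiet equality: == does not raise "invalid" for quiet NaNs.  */
static int
toy_quiet_eq (double a, double b)
{
  return a == b;
}
```

Both functions return 1 for equal operands and 0 otherwise, including when an operand is NaN, which is why a compiler that ignores traps may legitimately fold the first into the second.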

gcc/testsuite/ChangeLog:

2020-12-03  Ilya Leoshkevich  

* gcc.target/s390/zvector/autovec-double-signaling-eq.c: Build
with exceptions.
* gcc.target/s390/zvector/autovec-float-signaling-eq.c:
Likewise.
---
 .../gcc.target/s390/zvector/autovec-double-signaling-eq.c   | 2 +-
 .../gcc.target/s390/zvector/autovec-float-signaling-eq.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
index a8402b9f705..3645d3cc393 100644
--- a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-eq.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -march=z14 -mzvector -mzarch" } */
+/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions -fnon-call-exceptions" } */
 
 #include "autovec.h"
 
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
index 7dd91a5e6f3..d98aa0c494e 100644
--- a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-eq.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -march=z14 -mzvector -mzarch" } */
+/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fexceptions -fnon-call-exceptions" } */
 
 #include "autovec.h"
 
-- 
2.25.4


