Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.

2017-04-11 Thread Andreas Krebbel
On 04/10/2017 11:37 PM, Ulrich Weigand wrote:
> Dominik Vogt wrote:
> 
>> So, we could add a special case for const0_rtx that generates the
>> LT pattern and does not rely on Combine, and get rid of the
>> peephole.  I'm not sure this is worthwhile thoug, because the
>> peephole has other beneficial effects (as discussed), and until
>> we've solved the problems preventing Combine from merging L+LTR in
>> some cases, this is the best we have.  What do you think?
> 
> If we removed the peephole (for now), the patch now only touches
> parts of the backend used to emit atomic instructions, so code
> generation for any code that doesn't use those is guaranteed to
> be unchanged.  Given that we're quite late in the cycle, this
> might be a good idea at this point ...
> 
> But I don't see anything actually incorrect in the peephole, and
> it might indeed be a good thing in general -- just maybe more
> appropriate for the next stage1. 
> 
> Andreas, do you have an opinion on this?

Yes, the change is unrelated to the rest of the patch and therefore should go 
into a separate patch.
But I'm fine with applying both right now.
The peephole surprisingly often helped code generation.  There appear to be 
plenty of cases where
the L+LTR combination could not be handled otherwise. Dominik reviewed the 
speccpu diffs and there
were only improvements.

-Andreas-



Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.

2017-04-10 Thread Ulrich Weigand
Dominik Vogt wrote:

> So, we could add a special case for const0_rtx that generates the
> LT pattern and does not rely on Combine, and get rid of the
> peephole.  I'm not sure this is worthwhile thoug, because the
> peephole has other beneficial effects (as discussed), and until
> we've solved the problems preventing Combine from merging L+LTR in
> some cases, this is the best we have.  What do you think?

If we removed the peephole (for now), the patch now only touches
parts of the backend used to emit atomic instructions, so code
generation for any code that doesn't use those is guaranteed to
be unchanged.  Given that we're quite late in the cycle, this
might be a good idea at this point ...

But I don't see anything actually incorrect in the peephole, and
it might indeed be a good thing in general -- just maybe more
appropriate for the next stage1. 

Andreas, do you have an opinion on this?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.

2017-04-10 Thread Dominik Vogt
On Mon, Apr 10, 2017 at 10:13:01AM +0100, Dominik Vogt wrote:
> On Fri, Apr 07, 2017 at 07:22:23PM +0200, Ulrich Weigand wrote:
> > Dominik Vogt wrote:
> > > On Fri, Apr 07, 2017 at 04:34:44PM +0200, Ulrich Weigand wrote:
> > > > > +; Peephole to combine a load-and-test from volatile memory which 
> > > > > combine does
> > > > > +; not do.
> > > > > +(define_peephole2
> > > > > +  [(set (match_operand:GPR 0 "register_operand")
> > > > > + (match_operand:GPR 2 "memory_operand"))
> > > > > +   (set (reg CC_REGNUM)
> > > > > + (compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))]
> > > > > +  "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM
> > > > > +   && GENERAL_REG_P (operands[0])
> > > > > +   && satisfies_constraint_T (operands[2])"
> > > > > +  [(parallel
> > > > > +[(set (reg:CCS CC_REGNUM)
> > > > > +   (compare:CCS (match_dup 2) (match_dup 1)))
> > > > > + (set (match_dup 0) (match_dup 2))])])
> > > > 
> > > > Still wondering why this is necessary.
> > > 
> > > It's necessary vecause Combine refuses to match anything that
> > > contains a volatile memory reference, using a global flag for
> > > Recog.
> > 
> > So is this specifically to match the pre-test load emitted here?
> > 
> > +  emit_move_insn (output, mem);
> > +  emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (CCZmode, output, cmp)));
> > 
> > If so, since you already know that this should always map to a
> > LOAD AND TEST, could simply just emit the LT pattern here,
> > instead of relying on combine to do it ...
> 
> Well, only if the value to compare to is constant zero (which is
> what Glibc does).  In all other cases this won't result in
> load-and-test.

So, we could add a special case for const0_rtx that generates the
LT pattern and does not rely on Combine, and get rid of the
peephole.  I'm not sure this is worthwhile thoug, because the
peephole has other beneficial effects (as discussed), and until
we've solved the problems preventing Combine from merging L+LTR in
some cases, this is the best we have.  What do you think?

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.

2017-04-10 Thread Dominik Vogt
On Fri, Apr 07, 2017 at 07:22:23PM +0200, Ulrich Weigand wrote:
> Dominik Vogt wrote:
> > On Fri, Apr 07, 2017 at 04:34:44PM +0200, Ulrich Weigand wrote:
> > > > +; Peephole to combine a load-and-test from volatile memory which 
> > > > combine does
> > > > +; not do.
> > > > +(define_peephole2
> > > > +  [(set (match_operand:GPR 0 "register_operand")
> > > > +   (match_operand:GPR 2 "memory_operand"))
> > > > +   (set (reg CC_REGNUM)
> > > > +   (compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))]
> > > > +  "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM
> > > > +   && GENERAL_REG_P (operands[0])
> > > > +   && satisfies_constraint_T (operands[2])"
> > > > +  [(parallel
> > > > +[(set (reg:CCS CC_REGNUM)
> > > > + (compare:CCS (match_dup 2) (match_dup 1)))
> > > > + (set (match_dup 0) (match_dup 2))])])
> > > 
> > > Still wondering why this is necessary.
> > 
> > It's necessary vecause Combine refuses to match anything that
> > contains a volatile memory reference, using a global flag for
> > Recog.
> 
> So is this specifically to match the pre-test load emitted here?
> 
> +  emit_move_insn (output, mem);
> +  emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (CCZmode, output, cmp)));
> 
> If so, since you already know that this should always map to a
> LOAD AND TEST, could simply just emit the LT pattern here,
> instead of relying on combine to do it ...

Well, only if the value to compare to is constant zero (which is
what Glibc does).  In all other cases this won't result in
load-and-test.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.

2017-04-07 Thread Ulrich Weigand
Dominik Vogt wrote:
> On Fri, Apr 07, 2017 at 04:34:44PM +0200, Ulrich Weigand wrote:
> > > +; Peephole to combine a load-and-test from volatile memory which combine 
> > > does
> > > +; not do.
> > > +(define_peephole2
> > > +  [(set (match_operand:GPR 0 "register_operand")
> > > + (match_operand:GPR 2 "memory_operand"))
> > > +   (set (reg CC_REGNUM)
> > > + (compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))]
> > > +  "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM
> > > +   && GENERAL_REG_P (operands[0])
> > > +   && satisfies_constraint_T (operands[2])"
> > > +  [(parallel
> > > +[(set (reg:CCS CC_REGNUM)
> > > +   (compare:CCS (match_dup 2) (match_dup 1)))
> > > + (set (match_dup 0) (match_dup 2))])])
> > 
> > Still wondering why this is necessary.
> 
> It's necessary vecause Combine refuses to match anything that
> contains a volatile memory reference, using a global flag for
> Recog.

So is this specifically to match the pre-test load emitted here?

+  emit_move_insn (output, mem);
+  emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (CCZmode, output, cmp)));

If so, since you already know that this should always map to a
LOAD AND TEST, could simply just emit the LT pattern here,
instead of relying on combine to do it ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.

2017-04-07 Thread Dominik Vogt
On Fri, Apr 07, 2017 at 04:34:44PM +0200, Ulrich Weigand wrote:
> > +; Peephole to combine a load-and-test from volatile memory which combine 
> > does
> > +; not do.
> > +(define_peephole2
> > +  [(set (match_operand:GPR 0 "register_operand")
> > +   (match_operand:GPR 2 "memory_operand"))
> > +   (set (reg CC_REGNUM)
> > +   (compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))]
> > +  "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM
> > +   && GENERAL_REG_P (operands[0])
> > +   && satisfies_constraint_T (operands[2])"
> > +  [(parallel
> > +[(set (reg:CCS CC_REGNUM)
> > + (compare:CCS (match_dup 2) (match_dup 1)))
> > + (set (match_dup 0) (match_dup 2))])])
> 
> Still wondering why this is necessary.

It's necessary vecause Combine refuses to match anything that
contains a volatile memory reference, using a global flag for
Recog.

> > @@ -6518,13 +6533,30 @@
> >[(parallel
> >  [(set (match_operand:SI 0 "register_operand" "")
> >   (match_operator:SI 1 "s390_eqne_operator"
> > -   [(match_operand:CCZ1 2 "register_operand")
> > +   [(match_operand 2 "cc_reg_operand")
> > (match_operand 3 "const0_operand")]))
> >   (clobber (reg:CC CC_REGNUM))])]
> >""
> > -  "emit_insn (gen_sne (operands[0], operands[2]));
> > -   if (GET_CODE (operands[1]) == EQ)
> > - emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
> > +  "machine_mode mode = GET_MODE (operands[2]);
> > +   if (TARGET_Z196)
> > + {
> > +   rtx cond, ite;
> > +
> > +   if (GET_CODE (operands[1]) == NE)
> > +cond = gen_rtx_NE (VOIDmode, operands[2], const0_rtx);
> > +   else
> > +cond = gen_rtx_EQ (VOIDmode, operands[2], const0_rtx);
> > +   ite = gen_rtx_IF_THEN_ELSE (SImode, cond, const1_rtx, const0_rtx);
> > +   emit_insn (gen_rtx_SET (operands[0], ite));
> > + }
> > +   else
> > + {
> > +   if (mode != CCZ1mode)
> > +FAIL;
> > +   emit_insn (gen_sne (operands[0], operands[2]));
> > +   if (GET_CODE (operands[1]) == EQ)
> > +emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
> > + }
> > DONE;")
> 
> >From what I can see in the rest of the patch, none of the CS changes now
> actually *rely* on this change to cstorecc4 ... s390_expand_cs_tdsi only
> calls cstorecc4 on !TARGET_Z196, where the above change is a no-op, and
> in the TARGET_Z196 case it deliberates does *not* use cstorecc4.

You're right.  After all the refactoring, this part of the patch
has become unused.

> Now, in general this improvement to cstorecc4 is of course valuable
> in itself.  But I think at this point it might be better to separate
> this out into an independent patch (and measure its effect separately).

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.

2017-04-07 Thread Ulrich Weigand
Dominik Vogt wrote:

> v4:
> 
>   * Remoce CCZZ1 iterator. 
>   * Remove duplicates of CS patterns. 
>   * Move the skip_cs_label so that output is moved to vtarget even 
> if the CS instruction was not used. 
>   * Removed leftover from "sne" (from an earlier version of the
>   * patch). 

Thanks, this looks quite good to me now.  I do still have two questions:

> +; Peephole to combine a load-and-test from volatile memory which combine does
> +; not do.
> +(define_peephole2
> +  [(set (match_operand:GPR 0 "register_operand")
> + (match_operand:GPR 2 "memory_operand"))
> +   (set (reg CC_REGNUM)
> + (compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))]
> +  "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM
> +   && GENERAL_REG_P (operands[0])
> +   && satisfies_constraint_T (operands[2])"
> +  [(parallel
> +[(set (reg:CCS CC_REGNUM)
> +   (compare:CCS (match_dup 2) (match_dup 1)))
> + (set (match_dup 0) (match_dup 2))])])

Still wondering why this is necessary.  On the other hand, I guess it
cannot hurt to have the peephole either ...

> @@ -6518,13 +6533,30 @@
>[(parallel
>  [(set (match_operand:SI 0 "register_operand" "")
> (match_operator:SI 1 "s390_eqne_operator"
> -   [(match_operand:CCZ1 2 "register_operand")
> +   [(match_operand 2 "cc_reg_operand")
>   (match_operand 3 "const0_operand")]))
>   (clobber (reg:CC CC_REGNUM))])]
>""
> -  "emit_insn (gen_sne (operands[0], operands[2]));
> -   if (GET_CODE (operands[1]) == EQ)
> - emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
> +  "machine_mode mode = GET_MODE (operands[2]);
> +   if (TARGET_Z196)
> + {
> +   rtx cond, ite;
> +
> +   if (GET_CODE (operands[1]) == NE)
> +  cond = gen_rtx_NE (VOIDmode, operands[2], const0_rtx);
> +   else
> +  cond = gen_rtx_EQ (VOIDmode, operands[2], const0_rtx);
> +   ite = gen_rtx_IF_THEN_ELSE (SImode, cond, const1_rtx, const0_rtx);
> +   emit_insn (gen_rtx_SET (operands[0], ite));
> + }
> +   else
> + {
> +   if (mode != CCZ1mode)
> +  FAIL;
> +   emit_insn (gen_sne (operands[0], operands[2]));
> +   if (GET_CODE (operands[1]) == EQ)
> +  emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
> + }
> DONE;")

>From what I can see in the rest of the patch, none of the CS changes now
actually *rely* on this change to cstorecc4 ... s390_expand_cs_tdsi only
calls cstorecc4 on !TARGET_Z196, where the above change is a no-op, and
in the TARGET_Z196 case it deliberates does *not* use cstorecc4.

Now, in general this improvement to cstorecc4 is of course valuable
in itself.  But I think at this point it might be better to separate
this out into an independent patch (and measure its effect separately).

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.

2017-04-07 Thread Dominik Vogt
On Wed, Apr 05, 2017 at 02:52:00PM +0100, Dominik Vogt wrote:
> On Mon, Mar 27, 2017 at 09:27:35PM +0100, Dominik Vogt wrote:
> > The attached patch optimizes the atomic_exchange and
> > atomic_compare patterns on s390 and s390x (mostly limited to
> > SImode and DImode).  Among general optimizaation, the changes fix
> > most of the problems reported in PR 80080:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080
> > 
> > Bootstrapped and regression tested on a zEC12 with s390 and s390x
> > biarch.

New version attached.

v4:

  * Remoce CCZZ1 iterator. 
  * Remove duplicates of CS patterns. 
  * Move the skip_cs_label so that output is moved to vtarget even 
if the CS instruction was not used. 
  * Removed leftover from "sne" (from an earlier version of the
  * patch). 

Bootstrapped and regression tested on a zEC12 with s390 and s390x
biarch.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog-dv-atomic-gcc7

* s390-protos.h (s390_expand_cs_hqi): Removed.
(s390_expand_cs, s390_expand_atomic_exchange_tdsi): New prototypes.
* config/s390/s390.c (s390_emit_compare_and_swap): Handle all integer
modes as well as CCZ1mode and CCZmode.
(s390_expand_atomic_exchange_tdsi, s390_expand_atomic): Adapt to new
signature of s390_emit_compare_and_swap.
(s390_expand_cs_hqi): Likewise, make static.
(s390_expand_cs_tdsi): Generate an explicit compare before trying
compare-and-swap, in some cases.
(s390_expand_cs): Wrapper function.
(s390_expand_atomic_exchange_tdsi): New backend specific expander for
atomic_exchange.
(s390_match_ccmode_set): Allow CCZmode <-> CCZ1 mode.
* config/s390/s390.md (define_peephole2): New peephole to help
combining the load-and-test pattern with volatile memory.
("cstorecc4"): Use load-on-condition and deal with CCZmode for
TARGET_Z196.
("atomic_compare_and_swap"): Merge the patterns for small and
large integers.  Forbid symref memory operands.  Move expander to
s390.c.  Require cc register.
("atomic_compare_and_swap_internal")
("*atomic_compare_and_swap_1")
("*atomic_compare_and_swapdi_2")
("*atomic_compare_and_swapsi_3"): Use s_operand to forbid
symref memory operands.  Remove CC mode and call s390_match_ccmode
instead.
("atomic_exchange"): Allow and implement all integer modes.
gcc/testsuite/ChangeLog-dv-atomic-gcc7

* gcc.target/s390/md/atomic_compare_exchange-1.c: New test.
* gcc.target/s390/md/atomic_compare_exchange-1.inc: New test.
* gcc.target/s390/md/atomic_exchange-1.inc: New test.
>From 0dbcab9152b3d1b7c3a6e72f6d45b8eb56ab40ae Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Thu, 23 Feb 2017 17:23:11 +0100
Subject: [PATCH] S/390: Optimize atomic_compare_exchange and
 atomic_compare builtins.

1) Use the load-and-test instructions for atomic_exchange if the value is 0.
2) If IS_WEAK is true, compare the memory contents before a compare-and-swap
   and skip the CS instructions if the value is not the expected one.
---
 gcc/config/s390/s390-protos.h  |   4 +-
 gcc/config/s390/s390.c | 171 ++-
 gcc/config/s390/s390.md| 150 +
 .../gcc.target/s390/md/atomic_compare_exchange-1.c |  84 ++
 .../s390/md/atomic_compare_exchange-1.inc  | 336 +
 .../gcc.target/s390/md/atomic_exchange-1.c | 309 +++
 6 files changed, 977 insertions(+), 77 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
 create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c

diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 7f06a20..3fdb320 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -112,8 +112,8 @@ extern void s390_expand_vec_strlen (rtx, rtx, rtx);
 extern void s390_expand_vec_movstr (rtx, rtx, rtx);
 extern bool s390_expand_addcc (enum rtx_code, rtx, rtx, rtx, rtx, rtx);
 extern bool s390_expand_insv (rtx, rtx, rtx, rtx);
-extern void s390_expand_cs_hqi (machine_mode, rtx, rtx, rtx,
-   rtx, rtx, bool);
+extern void s390_expand_cs (machine_mode, rtx, rtx, rtx, rtx, rtx, bool);
+extern void s390_expand_atomic_exchange_tdsi (rtx, rtx, rtx);
 extern void s390_expand_atomic (machine_mode, enum rtx_code,
rtx, rtx, rtx, bool);
 extern void s390_expand_tbegin (rtx, rtx, rtx, bool);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 2cb8947..fe16647 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -1306,6 +1306,7 @@ s390_match_ccmode_set (rtx set, machine_mode req_mode)
   set_mode = GET_MODE 

Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.

2017-04-06 Thread Ulrich Weigand
I wrote (incorrectly):

> >[(parallel
> >  [(set (match_operand:SI 0 "register_operand" "")
> >   (match_operator:SI 1 "s390_eqne_operator"
> > -   [(match_operand:CCZ1 2 "register_operand")
> > +   [(match_operand 2 "cc_reg_operand")
> > (match_operand 3 "const0_operand")]))
> >   (clobber (reg:CC CC_REGNUM))])]
> >""
> > -  "emit_insn (gen_sne (operands[0], operands[2]));
> > -   if (GET_CODE (operands[1]) == EQ)
> > - emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx));
> > +  "machine_mode mode = GET_MODE (operands[2]);
> > +   if (TARGET_Z196)
> > + {
> > +   rtx cond, ite;
> > +
> > +   if (GET_CODE (operands[1]) == NE)
> > +cond = gen_rtx_NE (VOIDmode, operands[2], const0_rtx);
> > +   else
> > +cond = gen_rtx_EQ (VOIDmode, operands[2], const0_rtx);
> 
> I guess as a result cond is now always the same as operands[1] and
> could be just taken from there?

This is wrong -- I didn't notice the mode changes (in the cstore
pattern, the mode on the operator is SImode, but of the if_then_else
we want VOIDmode.

> > +   ite = gen_rtx_IF_THEN_ELSE (SImode, cond, const1_rtx, const0_rtx);
> > +   emit_insn (gen_rtx_SET (operands[0], ite));
> 
> Also, since you're just emitting RTL directly, maybe you could simply use
> the expander pattern above to do so (and not use emit_insn followed
> by DONE in this case?)

Therefore this doesn't work either.

Sorry for the confusion.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.

2017-04-06 Thread Ulrich Weigand
Dominik Vogt wrote:

> > v3:
> > 
> >   * Remove sne* patterns.
> >   * Move alignment check from s390_expand_cs to s390.md.
> >   * Use s_operand instead of memory_nosymref_operand.
> >   * Remove memory_nosymref_operand.
> >   * Allow any CC-mode in cstorecc4 for TARGET_Z196.
> >   * Fix EQ with TARGET_Z196 in cstorecc4.
> >   * Duplicate CS patterns for CCZmode.
> > 
> > Bootstrapped and regression tested on a zEC12 with s390 and s390x
> > biarch.

>  s390_emit_compare_and_swap (enum rtx_code code, rtx old, rtx mem,
> - rtx cmp, rtx new_rtx)
> + rtx cmp, rtx new_rtx, machine_mode ccmode)
>  {
> -  emit_insn (gen_atomic_compare_and_swapsi_internal (old, mem, cmp, 
> new_rtx));
> -  return s390_emit_compare (code, gen_rtx_REG (CCZ1mode, CC_REGNUM),
> - const0_rtx);
> +  switch (GET_MODE (mem))
> +{
> +case SImode:
> +  if (ccmode == CCZ1mode)
> + emit_insn (gen_atomic_compare_and_swapsiccz1_internal (old, mem, cmp,
> +new_rtx));
> +  else
> + emit_insn (gen_atomic_compare_and_swapsiccz_internal (old, mem, cmp,
> +new_rtx));
> +  break;
> +case DImode:
> +  if (ccmode == CCZ1mode)
> + emit_insn (gen_atomic_compare_and_swapdiccz1_internal (old, mem, cmp,
> +new_rtx));
> +  else
> + emit_insn (gen_atomic_compare_and_swapdiccz_internal (old, mem, cmp,
> +   new_rtx));
> +  break;
> +case TImode:
> +  if (ccmode == CCZ1mode)
> + emit_insn (gen_atomic_compare_and_swapticcz1_internal (old, mem, cmp,
> +new_rtx));
> +  else
> + emit_insn (gen_atomic_compare_and_swapticcz_internal (old, mem, cmp,
> +   new_rtx));
> +  break;

These expanders don't really do anything different depending on the
mode of the accessed word (SI/DI/TImode), so this seems like a bit of
unncessary duplication.  The original code was correct in always
calling the SImode variant, even if this looks a bit odd.  Maybe a
better fix is to just remove the mode from this expander.

> +  if (TARGET_Z196
> +  && (mode == SImode || mode == DImode))
> +do_const_opt = (is_weak && CONST_INT_P (cmp));
> +
> +  if (do_const_opt)
> +{
> +  const int very_unlikely = REG_BR_PROB_BASE / 100 - 1;
> +  rtx cc = gen_rtx_REG (CCZmode, CC_REGNUM);
> +
> +  skip_cs_label = gen_label_rtx ();
> +  emit_move_insn (output, mem);
> +  emit_move_insn (btarget, const0_rtx);
> +  emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (CCZmode, output, cmp)));
> +  s390_emit_jump (skip_cs_label, gen_rtx_NE (VOIDmode, cc, const0_rtx));
> +  add_int_reg_note (get_last_insn (), REG_BR_PROB, very_unlikely);
> +  /* If the jump is not taken, OUTPUT is the expected value.  */
> +  cmp = output;
> +  /* Reload newval to a register manually, *after* the compare and jump
> +  above.  Otherwise Reload might place it before the jump.  */
> +}
> +  else
> +cmp = force_reg (mode, cmp);
> +  new_rtx = force_reg (mode, new_rtx);
> +  s390_emit_compare_and_swap (EQ, output, mem, cmp, new_rtx,
> +   (do_const_opt) ? CCZmode : CCZ1mode);
> +
> +  /* We deliberately accept non-register operands in the predicate
> + to ensure the write back to the output operand happens *before*
> + the store-flags code below.  This makes it easier for combine
> + to merge the store-flags code with a potential test-and-branch
> + pattern following (immediately!) afterwards.  */
> +  if (output != vtarget)
> +emit_move_insn (vtarget, output);
> +
> +  if (skip_cs_label != NULL)
> +  emit_label (skip_cs_label);

So if do_const_opt is true, but output != vtarget, the code above will
write to output, but this is then never moved to vtarget.  This looks
incorrect.

> +  if (TARGET_Z196 && do_const_opt)

do_const_opt seems to always imply TARGET_Z196.

> +; Peephole to combine a load-and-test from volatile memory which combine does
> +; not do.
> +(define_peephole2
> +  [(set (match_operand:GPR 0 "register_operand")
> + (match_operand:GPR 2 "memory_operand"))
> +   (set (reg CC_REGNUM)
> + (compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))]
> +  "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM
> +   && GENERAL_REG_P (operands[0])
> +   && satisfies_constraint_T (operands[2])"
> +  [(parallel
> +[(set (reg:CCS CC_REGNUM)
> +   (compare:CCS (match_dup 2) (match_dup 1)))
> + (set (match_dup 0) (match_dup 2))])])

We should really try to understand why this isn't done earlier and
fix the problem there ...

>[(parallel
>  [(set (match_operand:SI 0 "register_operand" "")
> 

Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.

2017-04-06 Thread Dominik Vogt
On Wed, Apr 05, 2017 at 02:52:00PM +0100, Dominik Vogt wrote:
> On Mon, Mar 27, 2017 at 09:27:35PM +0100, Dominik Vogt wrote:
> > The attached patch optimizes the atomic_exchange and
> > atomic_compare patterns on s390 and s390x (mostly limited to
> > SImode and DImode).  Among general optimizaation, the changes fix
> > most of the problems reported in PR 80080:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080
> > 
> > Bootstrapped and regression tested on a zEC12 with s390 and s390x
> > biarch.
> 
> New version attached.

This time it really is.  :-)

> v3:
> 
>   * Remove sne* patterns.
>   * Move alignment check from s390_expand_cs to s390.md.
>   * Use s_operand instead of memory_nosymref_operand.
>   * Remove memory_nosymref_operand.
>   * Allow any CC-mode in cstorecc4 for TARGET_Z196.
>   * Fix EQ with TARGET_Z196 in cstorecc4.
>   * Duplicate CS patterns for CCZmode.
> 
> Bootstrapped and regression tested on a zEC12 with s390 and s390x
> biarch.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog-dv-atomic-gcc7

* s390-protos.h (s390_expand_cs_hqi): Removed.
(s390_expand_cs, s390_expand_atomic_exchange_tdsi): New prototypes.
* config/s390/s390.c (s390_emit_compare_and_swap): Handle all integer
modes as well as CCZ1mode and CCZmode.
(s390_expand_atomic_exchange_tdsi, s390_expand_atomic): Adapt to new
signature of s390_emit_compare_and_swap.
(s390_expand_cs_hqi): Likewise, make static.
(s390_expand_cs_tdsi): Generate an explicit compare before trying
compare-and-swap, in some cases.
(s390_expand_cs): Wrapper function.
(s390_expand_atomic_exchange_tdsi): New backend specific expander for
atomic_exchange.
* config/s390/s390.md (CCZZ1): New mode iterator for compare-and-swap.
(define_peephole2): New peephole to help combining the load-and-test
pattern with volatile memory.
("cstorecc4"): Use load-on-condition and deal with CCZmode for
TARGET_Z196.
("atomic_compare_and_swap"): Merge the patterns for small and
large integers.  Forbid symref memory operands.  Move expander to
s390.c.  Require cc register.
("atomic_compare_and_swap_internal")
("*atomic_compare_and_swap_1")
("*atomic_compare_and_swapdi_2")
("*atomic_compare_and_swapsi_3"): Duplicate for CCZ1mode and
CCZmode.  Use s_operand to forbid symref memory operands.
("atomic_exchange"): Allow and implement all integer modes.
gcc/testsuite/ChangeLog-dv-atomic-gcc7

* gcc.target/s390/md/atomic_compare_exchange-1.c: New test.
* gcc.target/s390/md/atomic_compare_exchange-1.inc: New test.
* gcc.target/s390/md/atomic_exchange-1.inc: New test.
>From d5e4c5785eaee076112d8493b5104db6689fe209 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Thu, 23 Feb 2017 17:23:11 +0100
Subject: [PATCH] S/390: Optimize atomic_compare_exchange and
 atomic_compare builtins.

1) Use the load-and-test instructions for atomic_exchange if the value is 0.
2) If IS_WEAK is true, compare the memory contents before a compare-and-swap
   and skip the CS instructions if the value is not the expected one.
---
 gcc/config/s390/s390-protos.h  |   4 +-
 gcc/config/s390/s390.c | 176 ++-
 gcc/config/s390/s390.md| 150 -
 .../gcc.target/s390/md/atomic_compare_exchange-1.c |  84 ++
 .../s390/md/atomic_compare_exchange-1.inc  | 336 +
 .../gcc.target/s390/md/atomic_exchange-1.c | 309 +++
 6 files changed, 980 insertions(+), 79 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc
 create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c

diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h
index 7f06a20..3fdb320 100644
--- a/gcc/config/s390/s390-protos.h
+++ b/gcc/config/s390/s390-protos.h
@@ -112,8 +112,8 @@ extern void s390_expand_vec_strlen (rtx, rtx, rtx);
 extern void s390_expand_vec_movstr (rtx, rtx, rtx);
 extern bool s390_expand_addcc (enum rtx_code, rtx, rtx, rtx, rtx, rtx);
 extern bool s390_expand_insv (rtx, rtx, rtx, rtx);
-extern void s390_expand_cs_hqi (machine_mode, rtx, rtx, rtx,
-   rtx, rtx, bool);
+extern void s390_expand_cs (machine_mode, rtx, rtx, rtx, rtx, rtx, bool);
+extern void s390_expand_atomic_exchange_tdsi (rtx, rtx, rtx);
 extern void s390_expand_atomic (machine_mode, enum rtx_code,
rtx, rtx, rtx, bool);
 extern void s390_expand_tbegin (rtx, rtx, rtx, bool);
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 2cb8947..b1d6088 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -1762,11 +1762,40 @@ 

Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.

2017-04-05 Thread Ulrich Weigand
Dominik Vogt wrote:
> On Mon, Mar 27, 2017 at 09:27:35PM +0100, Dominik Vogt wrote:
> > The attached patch optimizes the atomic_exchange and
> > atomic_compare patterns on s390 and s390x (mostly limited to
> > SImode and DImode).  Among general optimizaation, the changes fix
> > most of the problems reported in PR 80080:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080
> > 
> > Bootstrapped and regression tested on a zEC12 with s390 and s390x
> > biarch.
> 
> New version attached.

No, it isn't :-)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.

2017-04-05 Thread Dominik Vogt
On Mon, Mar 27, 2017 at 09:27:35PM +0100, Dominik Vogt wrote:
> The attached patch optimizes the atomic_exchange and
> atomic_compare patterns on s390 and s390x (mostly limited to
> SImode and DImode).  Among general optimizaation, the changes fix
> most of the problems reported in PR 80080:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080
> 
> Bootstrapped and regression tested on a zEC12 with s390 and s390x
> biarch.

New version attached.

v3:

  * Remove sne* patterns.
  * Move alignment check from s390_expand_cs to s390.md.
  * Use s_operand instead of memory_nosymref_operand.
  * Remove memory_nosymref_operand.
  * Allow any CC-mode in cstorecc4 for TARGET_Z196.
  * Fix EQ with TARGET_Z196 in cstorecc4.
  * Duplicate CS patterns for CCZmode.

Bootstrapped and regression tested on a zEC12 with s390 and s390x
biarch.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany