Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
On 04/10/2017 11:37 PM, Ulrich Weigand wrote: > Dominik Vogt wrote: > >> So, we could add a special case for const0_rtx that generates the >> LT pattern and does not rely on Combine, and get rid of the >> peephole. I'm not sure this is worthwhile thoug, because the >> peephole has other beneficial effects (as discussed), and until >> we've solved the problems preventing Combine from merging L+LTR in >> some cases, this is the best we have. What do you think? > > If we removed the peephole (for now), the patch now only touches > parts of the backend used to emit atomic instructions, so code > generation for any code that doesn't use those is guaranteed to > be unchanged. Given that we're quite late in the cycle, this > might be a good idea at this point ... > > But I don't see anything actually incorrect in the peephole, and > it might indeed be a good thing in general -- just maybe more > appropriate for the next stage1. > > Andreas, do you have an opinion on this? Yes, the change is unrelated to the rest of the patch and therefore should go into a separate patch. But I'm fine with applying both right now. The peephole surprisingly often helped code generation. There appear to be plenty of cases where the L+LTR combination could not be handled otherwise. Dominik reviewed the speccpu diffs and there were only improvements. -Andreas-
Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
Dominik Vogt wrote: > So, we could add a special case for const0_rtx that generates the > LT pattern and does not rely on Combine, and get rid of the > peephole. I'm not sure this is worthwhile thoug, because the > peephole has other beneficial effects (as discussed), and until > we've solved the problems preventing Combine from merging L+LTR in > some cases, this is the best we have. What do you think? If we removed the peephole (for now), the patch now only touches parts of the backend used to emit atomic instructions, so code generation for any code that doesn't use those is guaranteed to be unchanged. Given that we're quite late in the cycle, this might be a good idea at this point ... But I don't see anything actually incorrect in the peephole, and it might indeed be a good thing in general -- just maybe more appropriate for the next stage1. Andreas, do you have an opinion on this? Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
On Mon, Apr 10, 2017 at 10:13:01AM +0100, Dominik Vogt wrote: > On Fri, Apr 07, 2017 at 07:22:23PM +0200, Ulrich Weigand wrote: > > Dominik Vogt wrote: > > > On Fri, Apr 07, 2017 at 04:34:44PM +0200, Ulrich Weigand wrote: > > > > > +; Peephole to combine a load-and-test from volatile memory which > > > > > combine does > > > > > +; not do. > > > > > +(define_peephole2 > > > > > + [(set (match_operand:GPR 0 "register_operand") > > > > > + (match_operand:GPR 2 "memory_operand")) > > > > > + (set (reg CC_REGNUM) > > > > > + (compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))] > > > > > + "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM > > > > > + && GENERAL_REG_P (operands[0]) > > > > > + && satisfies_constraint_T (operands[2])" > > > > > + [(parallel > > > > > +[(set (reg:CCS CC_REGNUM) > > > > > + (compare:CCS (match_dup 2) (match_dup 1))) > > > > > + (set (match_dup 0) (match_dup 2))])]) > > > > > > > > Still wondering why this is necessary. > > > > > > It's necessary vecause Combine refuses to match anything that > > > contains a volatile memory reference, using a global flag for > > > Recog. > > > > So is this specifically to match the pre-test load emitted here? > > > > + emit_move_insn (output, mem); > > + emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (CCZmode, output, cmp))); > > > > If so, since you already know that this should always map to a > > LOAD AND TEST, could simply just emit the LT pattern here, > > instead of relying on combine to do it ... > > Well, only if the value to compare to is constant zero (which is > what Glibc does). In all other cases this won't result in > load-and-test. So, we could add a special case for const0_rtx that generates the LT pattern and does not rely on Combine, and get rid of the peephole. I'm not sure this is worthwhile thoug, because the peephole has other beneficial effects (as discussed), and until we've solved the problems preventing Combine from merging L+LTR in some cases, this is the best we have. What do you think? Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany
Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
On Fri, Apr 07, 2017 at 07:22:23PM +0200, Ulrich Weigand wrote: > Dominik Vogt wrote: > > On Fri, Apr 07, 2017 at 04:34:44PM +0200, Ulrich Weigand wrote: > > > > +; Peephole to combine a load-and-test from volatile memory which > > > > combine does > > > > +; not do. > > > > +(define_peephole2 > > > > + [(set (match_operand:GPR 0 "register_operand") > > > > + (match_operand:GPR 2 "memory_operand")) > > > > + (set (reg CC_REGNUM) > > > > + (compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))] > > > > + "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM > > > > + && GENERAL_REG_P (operands[0]) > > > > + && satisfies_constraint_T (operands[2])" > > > > + [(parallel > > > > +[(set (reg:CCS CC_REGNUM) > > > > + (compare:CCS (match_dup 2) (match_dup 1))) > > > > + (set (match_dup 0) (match_dup 2))])]) > > > > > > Still wondering why this is necessary. > > > > It's necessary vecause Combine refuses to match anything that > > contains a volatile memory reference, using a global flag for > > Recog. > > So is this specifically to match the pre-test load emitted here? > > + emit_move_insn (output, mem); > + emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (CCZmode, output, cmp))); > > If so, since you already know that this should always map to a > LOAD AND TEST, could simply just emit the LT pattern here, > instead of relying on combine to do it ... Well, only if the value to compare to is constant zero (which is what Glibc does). In all other cases this won't result in load-and-test. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany
Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
Dominik Vogt wrote: > On Fri, Apr 07, 2017 at 04:34:44PM +0200, Ulrich Weigand wrote: > > > +; Peephole to combine a load-and-test from volatile memory which combine > > > does > > > +; not do. > > > +(define_peephole2 > > > + [(set (match_operand:GPR 0 "register_operand") > > > + (match_operand:GPR 2 "memory_operand")) > > > + (set (reg CC_REGNUM) > > > + (compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))] > > > + "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM > > > + && GENERAL_REG_P (operands[0]) > > > + && satisfies_constraint_T (operands[2])" > > > + [(parallel > > > +[(set (reg:CCS CC_REGNUM) > > > + (compare:CCS (match_dup 2) (match_dup 1))) > > > + (set (match_dup 0) (match_dup 2))])]) > > > > Still wondering why this is necessary. > > It's necessary vecause Combine refuses to match anything that > contains a volatile memory reference, using a global flag for > Recog. So is this specifically to match the pre-test load emitted here? + emit_move_insn (output, mem); + emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (CCZmode, output, cmp))); If so, since you already know that this should always map to a LOAD AND TEST, could simply just emit the LT pattern here, instead of relying on combine to do it ... Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
On Fri, Apr 07, 2017 at 04:34:44PM +0200, Ulrich Weigand wrote: > > +; Peephole to combine a load-and-test from volatile memory which combine > > does > > +; not do. > > +(define_peephole2 > > + [(set (match_operand:GPR 0 "register_operand") > > + (match_operand:GPR 2 "memory_operand")) > > + (set (reg CC_REGNUM) > > + (compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))] > > + "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM > > + && GENERAL_REG_P (operands[0]) > > + && satisfies_constraint_T (operands[2])" > > + [(parallel > > +[(set (reg:CCS CC_REGNUM) > > + (compare:CCS (match_dup 2) (match_dup 1))) > > + (set (match_dup 0) (match_dup 2))])]) > > Still wondering why this is necessary. It's necessary vecause Combine refuses to match anything that contains a volatile memory reference, using a global flag for Recog. > > @@ -6518,13 +6533,30 @@ > >[(parallel > > [(set (match_operand:SI 0 "register_operand" "") > > (match_operator:SI 1 "s390_eqne_operator" > > - [(match_operand:CCZ1 2 "register_operand") > > + [(match_operand 2 "cc_reg_operand") > > (match_operand 3 "const0_operand")])) > > (clobber (reg:CC CC_REGNUM))])] > >"" > > - "emit_insn (gen_sne (operands[0], operands[2])); > > - if (GET_CODE (operands[1]) == EQ) > > - emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx)); > > + "machine_mode mode = GET_MODE (operands[2]); > > + if (TARGET_Z196) > > + { > > + rtx cond, ite; > > + > > + if (GET_CODE (operands[1]) == NE) > > +cond = gen_rtx_NE (VOIDmode, operands[2], const0_rtx); > > + else > > +cond = gen_rtx_EQ (VOIDmode, operands[2], const0_rtx); > > + ite = gen_rtx_IF_THEN_ELSE (SImode, cond, const1_rtx, const0_rtx); > > + emit_insn (gen_rtx_SET (operands[0], ite)); > > + } > > + else > > + { > > + if (mode != CCZ1mode) > > +FAIL; > > + emit_insn (gen_sne (operands[0], operands[2])); > > + if (GET_CODE (operands[1]) == EQ) > > +emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx)); > > + } > > DONE;") > > >From what I can see in the rest of the patch, none of the CS changes now > actually *rely* on this change to cstorecc4 ... s390_expand_cs_tdsi only > calls cstorecc4 on !TARGET_Z196, where the above change is a no-op, and > in the TARGET_Z196 case it deliberates does *not* use cstorecc4. You're right. After all the refactoring, this part of the patch has become unused. > Now, in general this improvement to cstorecc4 is of course valuable > in itself. But I think at this point it might be better to separate > this out into an independent patch (and measure its effect separately). Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany
Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
Dominik Vogt wrote: > v4: > > * Remoce CCZZ1 iterator. > * Remove duplicates of CS patterns. > * Move the skip_cs_label so that output is moved to vtarget even > if the CS instruction was not used. > * Removed leftover from "sne" (from an earlier version of the > * patch). Thanks, this looks quite good to me now. I do still have two questions: > +; Peephole to combine a load-and-test from volatile memory which combine does > +; not do. > +(define_peephole2 > + [(set (match_operand:GPR 0 "register_operand") > + (match_operand:GPR 2 "memory_operand")) > + (set (reg CC_REGNUM) > + (compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))] > + "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM > + && GENERAL_REG_P (operands[0]) > + && satisfies_constraint_T (operands[2])" > + [(parallel > +[(set (reg:CCS CC_REGNUM) > + (compare:CCS (match_dup 2) (match_dup 1))) > + (set (match_dup 0) (match_dup 2))])]) Still wondering why this is necessary. On the other hand, I guess it cannot hurt to have the peephole either ... > @@ -6518,13 +6533,30 @@ >[(parallel > [(set (match_operand:SI 0 "register_operand" "") > (match_operator:SI 1 "s390_eqne_operator" > - [(match_operand:CCZ1 2 "register_operand") > + [(match_operand 2 "cc_reg_operand") > (match_operand 3 "const0_operand")])) > (clobber (reg:CC CC_REGNUM))])] >"" > - "emit_insn (gen_sne (operands[0], operands[2])); > - if (GET_CODE (operands[1]) == EQ) > - emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx)); > + "machine_mode mode = GET_MODE (operands[2]); > + if (TARGET_Z196) > + { > + rtx cond, ite; > + > + if (GET_CODE (operands[1]) == NE) > + cond = gen_rtx_NE (VOIDmode, operands[2], const0_rtx); > + else > + cond = gen_rtx_EQ (VOIDmode, operands[2], const0_rtx); > + ite = gen_rtx_IF_THEN_ELSE (SImode, cond, const1_rtx, const0_rtx); > + emit_insn (gen_rtx_SET (operands[0], ite)); > + } > + else > + { > + if (mode != CCZ1mode) > + FAIL; > + emit_insn (gen_sne (operands[0], operands[2])); > + if (GET_CODE (operands[1]) == EQ) > + emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx)); > + } > DONE;") >From what I can see in the rest of the patch, none of the CS changes now actually *rely* on this change to cstorecc4 ... s390_expand_cs_tdsi only calls cstorecc4 on !TARGET_Z196, where the above change is a no-op, and in the TARGET_Z196 case it deliberates does *not* use cstorecc4. Now, in general this improvement to cstorecc4 is of course valuable in itself. But I think at this point it might be better to separate this out into an independent patch (and measure its effect separately). Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
On Wed, Apr 05, 2017 at 02:52:00PM +0100, Dominik Vogt wrote: > On Mon, Mar 27, 2017 at 09:27:35PM +0100, Dominik Vogt wrote: > > The attached patch optimizes the atomic_exchange and > > atomic_compare patterns on s390 and s390x (mostly limited to > > SImode and DImode). Among general optimizaation, the changes fix > > most of the problems reported in PR 80080: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080 > > > > Bootstrapped and regression tested on a zEC12 with s390 and s390x > > biarch. New version attached. v4: * Remoce CCZZ1 iterator. * Remove duplicates of CS patterns. * Move the skip_cs_label so that output is moved to vtarget even if the CS instruction was not used. * Removed leftover from "sne" (from an earlier version of the * patch). Bootstrapped and regression tested on a zEC12 with s390 and s390x biarch. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany gcc/ChangeLog-dv-atomic-gcc7 * s390-protos.h (s390_expand_cs_hqi): Removed. (s390_expand_cs, s390_expand_atomic_exchange_tdsi): New prototypes. * config/s390/s390.c (s390_emit_compare_and_swap): Handle all integer modes as well as CCZ1mode and CCZmode. (s390_expand_atomic_exchange_tdsi, s390_expand_atomic): Adapt to new signature of s390_emit_compare_and_swap. (s390_expand_cs_hqi): Likewise, make static. (s390_expand_cs_tdsi): Generate an explicit compare before trying compare-and-swap, in some cases. (s390_expand_cs): Wrapper function. (s390_expand_atomic_exchange_tdsi): New backend specific expander for atomic_exchange. (s390_match_ccmode_set): Allow CCZmode <-> CCZ1 mode. * config/s390/s390.md (define_peephole2): New peephole to help combining the load-and-test pattern with volatile memory. ("cstorecc4"): Use load-on-condition and deal with CCZmode for TARGET_Z196. ("atomic_compare_and_swap"): Merge the patterns for small and large integers. Forbid symref memory operands. Move expander to s390.c. Require cc register. ("atomic_compare_and_swap_internal") ("*atomic_compare_and_swap_1") ("*atomic_compare_and_swapdi_2") ("*atomic_compare_and_swapsi_3"): Use s_operand to forbid symref memory operands. Remove CC mode and call s390_match_ccmode instead. ("atomic_exchange"): Allow and implement all integer modes. gcc/testsuite/ChangeLog-dv-atomic-gcc7 * gcc.target/s390/md/atomic_compare_exchange-1.c: New test. * gcc.target/s390/md/atomic_compare_exchange-1.inc: New test. * gcc.target/s390/md/atomic_exchange-1.inc: New test. >From 0dbcab9152b3d1b7c3a6e72f6d45b8eb56ab40ae Mon Sep 17 00:00:00 2001 From: Dominik VogtDate: Thu, 23 Feb 2017 17:23:11 +0100 Subject: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins. 1) Use the load-and-test instructions for atomic_exchange if the value is 0. 2) If IS_WEAK is true, compare the memory contents before a compare-and-swap and skip the CS instructions if the value is not the expected one. --- gcc/config/s390/s390-protos.h | 4 +- gcc/config/s390/s390.c | 171 ++- gcc/config/s390/s390.md| 150 + .../gcc.target/s390/md/atomic_compare_exchange-1.c | 84 ++ .../s390/md/atomic_compare_exchange-1.inc | 336 + .../gcc.target/s390/md/atomic_exchange-1.c | 309 +++ 6 files changed, 977 insertions(+), 77 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h index 7f06a20..3fdb320 100644 --- a/gcc/config/s390/s390-protos.h +++ b/gcc/config/s390/s390-protos.h @@ -112,8 +112,8 @@ extern void s390_expand_vec_strlen (rtx, rtx, rtx); extern void s390_expand_vec_movstr (rtx, rtx, rtx); extern bool s390_expand_addcc (enum rtx_code, rtx, rtx, rtx, rtx, rtx); extern bool s390_expand_insv (rtx, rtx, rtx, rtx); -extern void s390_expand_cs_hqi (machine_mode, rtx, rtx, rtx, - rtx, rtx, bool); +extern void s390_expand_cs (machine_mode, rtx, rtx, rtx, rtx, rtx, bool); +extern void s390_expand_atomic_exchange_tdsi (rtx, rtx, rtx); extern void s390_expand_atomic (machine_mode, enum rtx_code, rtx, rtx, rtx, bool); extern void s390_expand_tbegin (rtx, rtx, rtx, bool); diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 2cb8947..fe16647 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -1306,6 +1306,7 @@ s390_match_ccmode_set (rtx set, machine_mode req_mode) set_mode = GET_MODE
Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
I wrote (incorrectly): > >[(parallel > > [(set (match_operand:SI 0 "register_operand" "") > > (match_operator:SI 1 "s390_eqne_operator" > > - [(match_operand:CCZ1 2 "register_operand") > > + [(match_operand 2 "cc_reg_operand") > > (match_operand 3 "const0_operand")])) > > (clobber (reg:CC CC_REGNUM))])] > >"" > > - "emit_insn (gen_sne (operands[0], operands[2])); > > - if (GET_CODE (operands[1]) == EQ) > > - emit_insn (gen_xorsi3 (operands[0], operands[0], const1_rtx)); > > + "machine_mode mode = GET_MODE (operands[2]); > > + if (TARGET_Z196) > > + { > > + rtx cond, ite; > > + > > + if (GET_CODE (operands[1]) == NE) > > +cond = gen_rtx_NE (VOIDmode, operands[2], const0_rtx); > > + else > > +cond = gen_rtx_EQ (VOIDmode, operands[2], const0_rtx); > > I guess as a result cond is now always the same as operands[1] and > could be just taken from there? This is wrong -- I didn't notice the mode changes (in the cstore pattern, the mode on the operator is SImode, but of the if_then_else we want VOIDmode. > > + ite = gen_rtx_IF_THEN_ELSE (SImode, cond, const1_rtx, const0_rtx); > > + emit_insn (gen_rtx_SET (operands[0], ite)); > > Also, since you're just emitting RTL directly, maybe you could simply use > the expander pattern above to do so (and not use emit_insn followed > by DONE in this case?) Therefore this doesn't work either. Sorry for the confusion. Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
Dominik Vogt wrote: > > v3: > > > > * Remove sne* patterns. > > * Move alignment check from s390_expand_cs to s390.md. > > * Use s_operand instead of memory_nosymref_operand. > > * Remove memory_nosymref_operand. > > * Allow any CC-mode in cstorecc4 for TARGET_Z196. > > * Fix EQ with TARGET_Z196 in cstorecc4. > > * Duplicate CS patterns for CCZmode. > > > > Bootstrapped and regression tested on a zEC12 with s390 and s390x > > biarch. > s390_emit_compare_and_swap (enum rtx_code code, rtx old, rtx mem, > - rtx cmp, rtx new_rtx) > + rtx cmp, rtx new_rtx, machine_mode ccmode) > { > - emit_insn (gen_atomic_compare_and_swapsi_internal (old, mem, cmp, > new_rtx)); > - return s390_emit_compare (code, gen_rtx_REG (CCZ1mode, CC_REGNUM), > - const0_rtx); > + switch (GET_MODE (mem)) > +{ > +case SImode: > + if (ccmode == CCZ1mode) > + emit_insn (gen_atomic_compare_and_swapsiccz1_internal (old, mem, cmp, > +new_rtx)); > + else > + emit_insn (gen_atomic_compare_and_swapsiccz_internal (old, mem, cmp, > +new_rtx)); > + break; > +case DImode: > + if (ccmode == CCZ1mode) > + emit_insn (gen_atomic_compare_and_swapdiccz1_internal (old, mem, cmp, > +new_rtx)); > + else > + emit_insn (gen_atomic_compare_and_swapdiccz_internal (old, mem, cmp, > + new_rtx)); > + break; > +case TImode: > + if (ccmode == CCZ1mode) > + emit_insn (gen_atomic_compare_and_swapticcz1_internal (old, mem, cmp, > +new_rtx)); > + else > + emit_insn (gen_atomic_compare_and_swapticcz_internal (old, mem, cmp, > + new_rtx)); > + break; These expanders don't really do anything different depending on the mode of the accessed word (SI/DI/TImode), so this seems like a bit of unncessary duplication. The original code was correct in always calling the SImode variant, even if this looks a bit odd. Maybe a better fix is to just remove the mode from this expander. > + if (TARGET_Z196 > + && (mode == SImode || mode == DImode)) > +do_const_opt = (is_weak && CONST_INT_P (cmp)); > + > + if (do_const_opt) > +{ > + const int very_unlikely = REG_BR_PROB_BASE / 100 - 1; > + rtx cc = gen_rtx_REG (CCZmode, CC_REGNUM); > + > + skip_cs_label = gen_label_rtx (); > + emit_move_insn (output, mem); > + emit_move_insn (btarget, const0_rtx); > + emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (CCZmode, output, cmp))); > + s390_emit_jump (skip_cs_label, gen_rtx_NE (VOIDmode, cc, const0_rtx)); > + add_int_reg_note (get_last_insn (), REG_BR_PROB, very_unlikely); > + /* If the jump is not taken, OUTPUT is the expected value. */ > + cmp = output; > + /* Reload newval to a register manually, *after* the compare and jump > + above. Otherwise Reload might place it before the jump. */ > +} > + else > +cmp = force_reg (mode, cmp); > + new_rtx = force_reg (mode, new_rtx); > + s390_emit_compare_and_swap (EQ, output, mem, cmp, new_rtx, > + (do_const_opt) ? CCZmode : CCZ1mode); > + > + /* We deliberately accept non-register operands in the predicate > + to ensure the write back to the output operand happens *before* > + the store-flags code below. This makes it easier for combine > + to merge the store-flags code with a potential test-and-branch > + pattern following (immediately!) afterwards. */ > + if (output != vtarget) > +emit_move_insn (vtarget, output); > + > + if (skip_cs_label != NULL) > + emit_label (skip_cs_label); So if do_const_opt is true, but output != vtarget, the code above will write to output, but this is then never moved to vtarget. This looks incorrect. > + if (TARGET_Z196 && do_const_opt) do_const_opt seems to always imply TARGET_Z196. > +; Peephole to combine a load-and-test from volatile memory which combine does > +; not do. > +(define_peephole2 > + [(set (match_operand:GPR 0 "register_operand") > + (match_operand:GPR 2 "memory_operand")) > + (set (reg CC_REGNUM) > + (compare (match_dup 0) (match_operand:GPR 1 "const0_operand")))] > + "s390_match_ccmode(insn, CCSmode) && TARGET_EXTIMM > + && GENERAL_REG_P (operands[0]) > + && satisfies_constraint_T (operands[2])" > + [(parallel > +[(set (reg:CCS CC_REGNUM) > + (compare:CCS (match_dup 2) (match_dup 1))) > + (set (match_dup 0) (match_dup 2))])]) We should really try to understand why this isn't done earlier and fix the problem there ... >[(parallel > [(set (match_operand:SI 0 "register_operand" "") >
Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
On Wed, Apr 05, 2017 at 02:52:00PM +0100, Dominik Vogt wrote: > On Mon, Mar 27, 2017 at 09:27:35PM +0100, Dominik Vogt wrote: > > The attached patch optimizes the atomic_exchange and > > atomic_compare patterns on s390 and s390x (mostly limited to > > SImode and DImode). Among general optimizaation, the changes fix > > most of the problems reported in PR 80080: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080 > > > > Bootstrapped and regression tested on a zEC12 with s390 and s390x > > biarch. > > New version attached. This time it really is. :-) > v3: > > * Remove sne* patterns. > * Move alignment check from s390_expand_cs to s390.md. > * Use s_operand instead of memory_nosymref_operand. > * Remove memory_nosymref_operand. > * Allow any CC-mode in cstorecc4 for TARGET_Z196. > * Fix EQ with TARGET_Z196 in cstorecc4. > * Duplicate CS patterns for CCZmode. > > Bootstrapped and regression tested on a zEC12 with s390 and s390x > biarch. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany gcc/ChangeLog-dv-atomic-gcc7 * s390-protos.h (s390_expand_cs_hqi): Removed. (s390_expand_cs, s390_expand_atomic_exchange_tdsi): New prototypes. * config/s390/s390.c (s390_emit_compare_and_swap): Handle all integer modes as well as CCZ1mode and CCZmode. (s390_expand_atomic_exchange_tdsi, s390_expand_atomic): Adapt to new signature of s390_emit_compare_and_swap. (s390_expand_cs_hqi): Likewise, make static. (s390_expand_cs_tdsi): Generate an explicit compare before trying compare-and-swap, in some cases. (s390_expand_cs): Wrapper function. (s390_expand_atomic_exchange_tdsi): New backend specific expander for atomic_exchange. * config/s390/s390.md (CCZZ1): New mode iterator for compare-and-swap. (define_peephole2): New peephole to help combining the load-and-test pattern with volatile memory. ("cstorecc4"): Use load-on-condition and deal with CCZmode for TARGET_Z196. ("atomic_compare_and_swap"): Merge the patterns for small and large integers. Forbid symref memory operands. Move expander to s390.c. Require cc register. ("atomic_compare_and_swap_internal") ("*atomic_compare_and_swap_1") ("*atomic_compare_and_swapdi_2") ("*atomic_compare_and_swapsi_3"): Duplicate for CCZ1mode and CCZmode. Use s_operand to forbid symref memory operands. ("atomic_exchange"): Allow and implement all integer modes. gcc/testsuite/ChangeLog-dv-atomic-gcc7 * gcc.target/s390/md/atomic_compare_exchange-1.c: New test. * gcc.target/s390/md/atomic_compare_exchange-1.inc: New test. * gcc.target/s390/md/atomic_exchange-1.inc: New test. >From d5e4c5785eaee076112d8493b5104db6689fe209 Mon Sep 17 00:00:00 2001 From: Dominik VogtDate: Thu, 23 Feb 2017 17:23:11 +0100 Subject: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins. 1) Use the load-and-test instructions for atomic_exchange if the value is 0. 2) If IS_WEAK is true, compare the memory contents before a compare-and-swap and skip the CS instructions if the value is not the expected one. --- gcc/config/s390/s390-protos.h | 4 +- gcc/config/s390/s390.c | 176 ++- gcc/config/s390/s390.md| 150 - .../gcc.target/s390/md/atomic_compare_exchange-1.c | 84 ++ .../s390/md/atomic_compare_exchange-1.inc | 336 + .../gcc.target/s390/md/atomic_exchange-1.c | 309 +++ 6 files changed, 980 insertions(+), 79 deletions(-) create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.c create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_compare_exchange-1.inc create mode 100644 gcc/testsuite/gcc.target/s390/md/atomic_exchange-1.c diff --git a/gcc/config/s390/s390-protos.h b/gcc/config/s390/s390-protos.h index 7f06a20..3fdb320 100644 --- a/gcc/config/s390/s390-protos.h +++ b/gcc/config/s390/s390-protos.h @@ -112,8 +112,8 @@ extern void s390_expand_vec_strlen (rtx, rtx, rtx); extern void s390_expand_vec_movstr (rtx, rtx, rtx); extern bool s390_expand_addcc (enum rtx_code, rtx, rtx, rtx, rtx, rtx); extern bool s390_expand_insv (rtx, rtx, rtx, rtx); -extern void s390_expand_cs_hqi (machine_mode, rtx, rtx, rtx, - rtx, rtx, bool); +extern void s390_expand_cs (machine_mode, rtx, rtx, rtx, rtx, rtx, bool); +extern void s390_expand_atomic_exchange_tdsi (rtx, rtx, rtx); extern void s390_expand_atomic (machine_mode, enum rtx_code, rtx, rtx, rtx, bool); extern void s390_expand_tbegin (rtx, rtx, rtx, bool); diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index 2cb8947..b1d6088 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -1762,11 +1762,40 @@
Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
Dominik Vogt wrote: > On Mon, Mar 27, 2017 at 09:27:35PM +0100, Dominik Vogt wrote: > > The attached patch optimizes the atomic_exchange and > > atomic_compare patterns on s390 and s390x (mostly limited to > > SImode and DImode). Among general optimizaation, the changes fix > > most of the problems reported in PR 80080: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080 > > > > Bootstrapped and regression tested on a zEC12 with s390 and s390x > > biarch. > > New version attached. No, it isn't :-) Bye, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [PATCH] S/390: Optimize atomic_compare_exchange and atomic_compare builtins.
On Mon, Mar 27, 2017 at 09:27:35PM +0100, Dominik Vogt wrote: > The attached patch optimizes the atomic_exchange and > atomic_compare patterns on s390 and s390x (mostly limited to > SImode and DImode). Among general optimizaation, the changes fix > most of the problems reported in PR 80080: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80080 > > Bootstrapped and regression tested on a zEC12 with s390 and s390x > biarch. New version attached. v3: * Remove sne* patterns. * Move alignment check from s390_expand_cs to s390.md. * Use s_operand instead of memory_nosymref_operand. * Remove memory_nosymref_operand. * Allow any CC-mode in cstorecc4 for TARGET_Z196. * Fix EQ with TARGET_Z196 in cstorecc4. * Duplicate CS patterns for CCZmode. Bootstrapped and regression tested on a zEC12 with s390 and s390x biarch. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany