Richard Henderson wrote: > On 08/06/2012 11:34 AM, Ulrich Weigand wrote: > > There is one particular inefficiency I have noticed. This code: > > > > if (!__atomic_compare_exchange_n (&v, &expected, max, 0 , 0, 0)) > > abort (); > > > > from atomic-compare-exchange-3.c gets compiled into: > > > > l %r3,0(%r2) > > larl %r1,v > > cs %r3,%r4,0(%r1) > > ipm %r1 > > sra %r1,28 > > st %r3,0(%r2) > > ltr %r1,%r1 > > jne .L3 > > > > which is extremely inefficient; it converts the condition code into > > an integer using the slow ipm, sra sequence, just so that it can > > convert the integer back into a condition code via ltr and branch > > on it ... > > This was caused (or perhaps abetted by) the representation of EQ > as NE ^ 1. With the subsequent truncation and zero-extend, I > think combine reached its insn limit of 3 before seeing everything > it needed to see.
Yes, combine isn't able to handle everything in one go, but it finds an intermediate insn. With the old __sync_compare_and_swap, it is then able to optimize everything away in a second step; the only reason this doesn't work any more is the intervening store. The following patch changes the builtin expander to pass a MEM oldval as-is to the back-end expander, so that the back-end can move the store to before the CC operation. With that patch I'm also seeing all the IPMs disappear. [ Well, all except for this one: > This happens because CSE notices the cbranch vs 0, and sets r116 > to zero along the path to > > 32 if (!__atomic_compare_exchange_n (&v, &expected, 0, STRONG, > __ATOMIC_RELEASE, > __ATOMIC_ACQUIRE)) > > at which point CSE decides that it would be cheaper to "re-use" > the zero already in r116 instead of load another constant 0 here. > After that, combine is ham-strung because r116 is not dead. As you point out, this does indeed fix this problem as well: > A short-term possibility is to have the CAS insns accept general_operand, > so that the 0 gets merged. ] What do you think about this solution? It has the advantage that we still get the same xor code if we actually do need the ipm ... Bye, Ulrich diff -ur gcc/builtins.c gcc.new/builtins.c --- gcc/builtins.c 2012-08-07 16:04:45.054348099 +0200 +++ gcc.new/builtins.c 2012-08-07 15:44:01.304349225 +0200 @@ -5376,6 +5376,7 @@ expect = expand_normal (CALL_EXPR_ARG (exp, 1)); expect = convert_memory_address (Pmode, expect); + expect = gen_rtx_MEM (mode, expect); desired = expand_expr_force_mode (CALL_EXPR_ARG (exp, 2), mode); weak = CALL_EXPR_ARG (exp, 3); @@ -5383,14 +5384,15 @@ if (host_integerp (weak, 0) && tree_low_cst (weak, 0) != 0) is_weak = true; - oldval = copy_to_reg (gen_rtx_MEM (mode, expect)); - + oldval = expect; if (!expand_atomic_compare_and_swap ((target == const0_rtx ? NULL : &target), &oldval, mem, oldval, desired, is_weak, success, failure)) return NULL_RTX; - emit_move_insn (gen_rtx_MEM (mode, expect), oldval); + if (oldval != expect) + emit_move_insn (expect, oldval); + return target; } diff -ur gcc/config/s390/s390.md gcc.new/config/s390/s390.md --- gcc/config/s390/s390.md 2012-08-07 16:04:54.204348621 +0200 +++ gcc.new/config/s390/s390.md 2012-08-07 16:00:21.934348628 +0200 @@ -8870,7 +8870,7 @@ (define_expand "atomic_compare_and_swap<mode>" [(match_operand:SI 0 "register_operand") ;; bool success output - (match_operand:DGPR 1 "register_operand") ;; oldval output + (match_operand:DGPR 1 "nonimmediate_operand");; oldval output (match_operand:DGPR 2 "memory_operand") ;; memory (match_operand:DGPR 3 "register_operand") ;; expected intput (match_operand:DGPR 4 "register_operand") ;; newval intput @@ -8879,9 +8879,17 @@ (match_operand:SI 7 "const_int_operand")] ;; failure model "" { - rtx cc, cmp; + rtx cc, cmp, output = operands[1]; + + if (!register_operand (output, <MODE>mode)) + output = gen_reg_rtx (<MODE>mode); + emit_insn (gen_atomic_compare_and_swap<mode>_internal - (operands[1], operands[2], operands[3], operands[4])); + (output, operands[2], operands[3], operands[4])); + + if (output != operands[1]) + emit_move_insn (operands[1], output); + cc = gen_rtx_REG (CCZ1mode, CC_REGNUM); cmp = gen_rtx_EQ (SImode, cc, const0_rtx); emit_insn (gen_cstorecc4 (operands[0], cmp, cc, const0_rtx)); -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com