Richard Henderson wrote:
Some more comments on this patch.

> +; Different from movdi_31 in that we have no splitters.
> +(define_insn "atomic_loaddi_1"
> +  [(set (match_operand:DI 0 "register_operand" "=d,d,!*f,!*f")
> +	(unspec:DI [(match_operand:DI 1 "s_operand" "Q,S,Q,m")]

The constraint line doesn't look right.  ld(y) accept base+index, so they
should get R and T constraints, respectively.

In fact, I wouldn't use "s_operand" as a predicate, since this excludes
valid addresses for those cases, and since in general I've found it
preferable to have reload fix up addresses.  So I'd just use
"memory_operand" here, and actually in all the other patterns as well ...
(This also simplifies the expander.)

> +		   UNSPEC_MOVA))]
> +  "!TARGET_ZARCH"
> +  "@
> +   lm\t%0,%M0,%S1
> +   lmy\t%0,%M0,%S1
> +   ld\t%0,%1
> +   ldy\t%0,%1"
> +  [(set_attr "op_type" "RS,RSY,RS,RSY")
> +   (set_attr "type" "lm,lm,floaddf,floaddf")])
> +
> +(define_insn "atomic_loadti_1"
> +  [(set (match_operand:TI 0 "register_operand" "=r")
> +	(unspec:TI [(match_operand:TI 1 "memory_operand" "m")]

"m" is wrong here (and basically everywhere), since it includes
SYMBOL_REFs for lrl etc. on z10.  For lpq we need "RT".

> +  "lpq\t%0,%1"
> +  [(set_attr "op_type" "RXY")])

LPQ and STPQ probably should get a "type" of "other".  This may not model
the pipeline exactly, but it's better than just assuming "integer" by
default.  These instructions are special (and slow).

> +(define_expand "atomic_compare_and_swap<mode>"
> +  [(match_operand:SI 0 "register_operand")	;; bool success output
> +   (match_operand:DGPR 1 "register_operand")	;; oldval output
> +   (match_operand:DGPR 2 "s_operand")		;; memory

memory_operand probably better again.
> +   (match_operand:DGPR 3 "register_operand")	;; expected input
> +   (match_operand:DGPR 4 "register_operand")	;; newval input
> +   (match_operand:SI 5 "const_int_operand")	;; is_weak
> +   (match_operand:SI 6 "const_int_operand")	;; success model
> +   (match_operand:SI 7 "const_int_operand")]	;; failure model
> +  ""
> +{
> +  rtx cc, cmp;
> +  emit_insn (gen_atomic_compare_and_swap<mode>_internal
> +	     (operands[1], operands[2], operands[3], operands[4]));
> +  cc = gen_rtx_REG (CCZ1mode, CC_REGNUM);
> +  cmp = gen_rtx_NE (SImode, cc, const0_rtx);

As already pointed out, this needs to be EQ.

> -  ""
> -  "cds<tg>\t%0,%3,%S1"
> -  [(set_attr "op_type" "RS<TE>")
> +  "TARGET_ZARCH"
> +  "c<td>sg\t%0,%3,%S1"
> +  [(set_attr "op_type" "RXY")

Should be "RSY".

> +  "@
> +   cds\t%0,%3,%S1
> +   cdsy\t%0,%3,%S1"
> +  [(set_attr "op_type" "RX,RXY")

Should be "RS,RSY".

> +  "@
> +   cs\t%0,%3,%S1
> +   csy\t%0,%3,%S1"
> +  [(set_attr "op_type" "RX,RXY")

Likewise.

> +(define_insn "atomic_fetch_<atomic><mode>_iaf"
> +  [(set (match_operand:GPR 0 "register_operand" "=d")
> +	(match_operand:GPR 1 "s_operand" "+S"))
> +   (set (match_dup 1)
> +	(unspec_volatile:GPR
> +	 [(ATOMIC_Z196:GPR (match_dup 1)
> +			   (match_operand:GPR 2 "general_operand" "d"))]
> +	 UNSPECV_ATOMIC_OP))
> +   (clobber (reg:CC CC_REGNUM))]
> +  "TARGET_Z196"
> -  "la<noxa><g>\t%0,%2,%1")
> +  "la<noxa><g>\t%0,%2,%1"
> +  [(set_attr "op_type" "RXY")

Again RSY.

There is one particular inefficiency I have noticed.  This code:

  if (!__atomic_compare_exchange_n (&v, &expected, max, 0 , 0, 0))
    abort ();

from atomic-compare-exchange-3.c gets compiled into:

	l	%r3,0(%r2)
	larl	%r1,v
	cs	%r3,%r4,0(%r1)
	ipm	%r1
	sra	%r1,28
	st	%r3,0(%r2)
	ltr	%r1,%r1
	jne	.L3

which is extremely inefficient; it converts the condition code into an
integer using the slow ipm, sra sequence, just so that it can convert
the integer back into a condition code via ltr and branch on it ...
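For reference, the source-level semantics that force that store: on failure, GCC's __atomic_compare_exchange_n builtin writes the observed memory value back into "expected", so the middle-end emits a store between the compare-and-swap and the branch on its result.  A minimal, target-independent sketch (the wrapper name try_cas is made up for illustration; the builtin's behavior is as documented by GCC):

```c
/* Wraps __atomic_compare_exchange_n so that both the success flag
   and the (possibly updated) 'expected' value are visible.  */
static int
try_cas (unsigned int *v, unsigned int *expected, unsigned int desired)
{
  /* Strong CAS, seq_cst on both success and failure paths.
     On failure, *expected is overwritten with the value observed
     in *v -- this is the store that ends up between cs and the
     conditional branch in the generated s390 code.  */
  return __atomic_compare_exchange_n (v, expected, desired, /*weak=*/0,
				      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

On success, *expected is left untouched and *v takes the desired value; on failure, *v is unchanged and *expected is updated -- either way the CC result of cs directly encodes the boolean outcome, which is why an ipm/sra round-trip is pure overhead.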
Now, with the old code this sequence used to be optimized away by combine,
and we'd just have the cs followed by a branch.  This is not done any more
because we have the store to "expected" in between.

This is because of this code in combine:

	  /* Make sure that the value that is to be substituted for the register
	     does not use any registers whose values alter in between.  However,
	     If the insns are adjacent, a use can't cross a set even though we
	     think it might (this can happen for a sequence of insns each setting
	     the same destination; last_set of that register might point to
	     a NOTE).  If INSN has a REG_EQUIV note, the register is always
	     equivalent to the memory so the substitution is valid even if there
	     are intervening stores.  Also, don't move a volatile asm or
	     UNSPEC_VOLATILE across any other insns.  */
	  || (! all_adjacent
	      && (((!MEM_P (src)
		    || ! find_reg_note (insn, REG_EQUIV, src))
		   && use_crosses_set_p (src, DF_INSN_LUID (insn)))
		  || (GET_CODE (src) == ASM_OPERANDS && MEM_VOLATILE_P (src))
		  || GET_CODE (src) == UNSPEC_VOLATILE))

Note that we have exactly the case mentioned, where a series of insns to
be combined all set the same (CC) register.  If they're all adjacent,
this still gets optimized away -- but they no longer are, due to the
intervening store.

Is there a way of structuring the atomic optabs/expander so that the
store gets done either before or after all the CC operations?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com