On Thu, 17 Apr 2014, Richard Biener wrote:

> The patch below increases the number of coalesces we attempt,
> to also cover unary and binary operations.  This improves
> initial code generation for code like
>
> int foo (int i, int j, int k, int l)
> {
>   int res = i;
>   res += j;
>   res += k;
>   res += l;
>   return res;
> }
>
> from
>
> ;; res_3 = i_1(D) + j_2(D);
>
> (insn 9 8 0 (parallel [
>             (set (reg/v:SI 83 [ res ])
>                 (plus:SI (reg/v:SI 87 [ i ])
>                     (reg/v:SI 88 [ j ])))
>             (clobber (reg:CC 17 flags))
>         ]) t.c:4 -1
>      (nil))
>
> ;; res_5 = res_3 + k_4(D);
>
> (insn 10 9 0 (parallel [
>             (set (reg/v:SI 84 [ res ])
>                 (plus:SI (reg/v:SI 83 [ res ])
>                     (reg/v:SI 89 [ k ])))
>             (clobber (reg:CC 17 flags))
>         ]) t.c:5 -1
>      (nil))
> ...
>
> to
>
> ;; res_3 = i_1(D) + j_2(D);
>
> (insn 9 8 0 (parallel [
>             (set (reg/v:SI 83 [ res ])
>                 (plus:SI (reg/v:SI 85 [ i ])
>                     (reg/v:SI 86 [ j ])))
>             (clobber (reg:CC 17 flags))
>         ]) t.c:4 -1
>      (nil))
>
> ;; res_5 = res_3 + k_4(D);
>
> (insn 10 9 0 (parallel [
>             (set (reg/v:SI 83 [ res ])
>                 (plus:SI (reg/v:SI 83 [ res ])
>                     (reg/v:SI 87 [ k ])))
>             (clobber (reg:CC 17 flags))
>         ]) t.c:5 -1
>      (nil))
>
> re-using the same pseudo for the LHS.
>
> Expansion has special code to improve coalescing of op1 with
> target, which is what we try to match here.
>
> Overall there are positive and negative size effects during
> a bootstrap on x86_64, but on balance it seems to be a loss
> - stage3 cc1 text size is 18261647 bytes without the patch
> compared to 18265751 bytes with the patch.
>
> Now the question is: what does this tell us?  That not re-using
> the same pseudo as op and target is always better?
>
> Btw, I tried this to find a convincing metric for an intra-BB
> scheduling pass (during out-of-SSA) on GIMPLE (to be able
> to kill that odd scheduling code we now have in reassoc).
> And to have something that TER does not immediately undo we have
> to disable TER, which conveniently happens for coalesced
> SSA names.
> Thus -> schedule for "register pressure", and thus
> reduce SSA name lifetime - with the goal that out-of-SSA can
> do more coalescing.  But it won't even try to coalesce
> anything other than PHI copies (not affected by scheduling)
> or plain SSA name copies (which shouldn't happen anyway due to
> copy propagation).
>
> So - any ideas?  Or is the overall negative for cc1 just
> an artifact to ignore, and we _should_ coalesce as much
> as possible (even if it doesn't avoid copies - thus the
> "cost" of 0 used in the patch)?
One example where it delivers bad initial expansion on x86_64 is

int foo (int *p)
{
  int res = p[0];
  res += p[1];
  res += p[2];
  res += p[3];
  return res;
}

where i386.c:ix86_fixup_binary_operands tries to be clever and
"improve address combine", generating two instructions for

  (plus:SI (reg/v:SI 83 [ res ])
           (mem:SI (...)))

and thus triggering the following in expand_binop_directly:

      pat = maybe_gen_insn (icode, 3, ops);
      if (pat)
        {
          /* If PAT is composed of more than one insn, try to add an
             appropriate REG_EQUAL note to it.  If we can't because TEMP
             conflicts with an operand, call expand_binop again, this
             time without a target.  */
          if (INSN_P (pat) && NEXT_INSN (pat) != NULL_RTX
              && ! add_equal_note (pat, ops[0].value,
                                   optab_to_code (binoptab),
                                   ops[1].value, ops[2].value))
            {
              delete_insns_since (last);
              return expand_binop (mode, binoptab, op0, op1, NULL_RTX,
                                   unsignedp, methods);
            }

so we end up with

(insn 9 6 10 (set (reg:SI 91)
        (mem:SI (plus:DI (reg/v/f:DI 88 [ p ])
                (const_int 4 [0x4])) [0 MEM[(int *)p_2(D) + 4B]+0 S4 A32])) t.c:4 -1
     (nil))

(insn 10 9 11 (parallel [
            (set (reg:SI 90)
                (plus:SI (reg/v:SI 83 [ res ])
                    (reg:SI 91)))
            (clobber (reg:CC 17 flags))
        ]) t.c:4 -1
     (expr_list:REG_EQUAL (plus:SI (reg/v:SI 83 [ res ])
            (mem:SI (plus:DI (reg/v/f:DI 88 [ p ])
                    (const_int 4 [0x4])) [0 MEM[(int *)p_2(D) + 4B]+0 S4 A32]))
        (nil)))

(insn 11 10 0 (set (reg/v:SI 83 [ res ])
        (reg:SI 90)) t.c:4 -1
     (nil))

Unpatched, we avoid the last move (the tiny testcase of course ends up
optimizing to the same code anyway).  I am not sure whether that strong
desire to add a REG_EQUAL note makes up for the losses.  At least it
looks backward compared to the code preceding it:

  /* If operation is commutative,
     try to make the first operand a register.
     Even better, try to make it the same as the target.
     Also try to make the last operand a constant.  */
  if (commutative_p
      && swap_commutative_operands_with_target (target, xop0, xop1))
    {
      swap = xop1;
      xop1 = xop0;
      xop0 = swap;
    }

"Even better"?
It will trigger the REG_EQUAL failure path and make that swap moot ...
(the same happens in expand_unop_directly).

Looks like REG_EQUAL notes are more important than register coalescing
(of course at out-of-SSA time I cannot know whether a multi-insn
expansion will happen :/)

Richard.

> Otherwise the patch bootstraps and tests fine on
> x86_64-unknown-linux-gnu.
>
> Thanks,
> Richard.
>
> 2014-04-17  Richard Biener  <rguent...@suse.de>
>
> 	* tree-ssa-coalesce.c (create_outofssa_var_map): Try to
> 	coalesce SSA name uses with SSA name results in all unary
> 	and binary operations.
>
> Index: gcc/tree-ssa-coalesce.c
> ===================================================================
> *** gcc/tree-ssa-coalesce.c	(revision 209469)
> --- gcc/tree-ssa-coalesce.c	(working copy)
> *************** create_outofssa_var_map (coalesce_list_p
> *** 991,1007 ****
>   	case GIMPLE_ASSIGN:
>   	  {
>   	    tree lhs = gimple_assign_lhs (stmt);
>   	    tree rhs1 = gimple_assign_rhs1 (stmt);
> ! 	    if (gimple_assign_ssa_name_copy_p (stmt)
>   		&& gimple_can_coalesce_p (lhs, rhs1))
>   	      {
>   		v1 = SSA_NAME_VERSION (lhs);
>   		v2 = SSA_NAME_VERSION (rhs1);
> ! 		cost = coalesce_cost_bb (bb);
> ! 		add_coalesce (cl, v1, v2, cost);
>   		bitmap_set_bit (used_in_copy, v1);
>   		bitmap_set_bit (used_in_copy, v2);
>   	      }
>   	  }
>   	  break;
>
> --- 993,1031 ----
>   	case GIMPLE_ASSIGN:
>   	  {
>   	    tree lhs = gimple_assign_lhs (stmt);
> + 	    if (TREE_CODE (lhs) != SSA_NAME)
> + 	      break;
> +
> + 	    /* Expansion handles target == op1 properly and also
> + 	       target == op2 for commutative binary ops.  */
>   	    tree rhs1 = gimple_assign_rhs1 (stmt);
> ! 	    enum tree_code code = gimple_assign_rhs_code (stmt);
> ! 	    enum gimple_rhs_class klass = get_gimple_rhs_class (code);
> ! 	    if (TREE_CODE (rhs1) == SSA_NAME
>   		&& gimple_can_coalesce_p (lhs, rhs1))
>   	      {
>   		v1 = SSA_NAME_VERSION (lhs);
>   		v2 = SSA_NAME_VERSION (rhs1);
> ! 		add_coalesce (cl, v1, v2,
> ! 			      klass == GIMPLE_SINGLE_RHS
> ! 			      ? coalesce_cost_bb (bb) : 0);
>   		bitmap_set_bit (used_in_copy, v1);
>   		bitmap_set_bit (used_in_copy, v2);
>   	      }
> + 	    if (klass == GIMPLE_BINARY_RHS
> + 		&& commutative_tree_code (code))
> + 	      {
> + 		tree rhs2 = gimple_assign_rhs2 (stmt);
> + 		if (TREE_CODE (rhs2) == SSA_NAME
> + 		    && gimple_can_coalesce_p (lhs, rhs2))
> + 		  {
> + 		    v1 = SSA_NAME_VERSION (lhs);
> + 		    v2 = SSA_NAME_VERSION (rhs2);
> + 		    add_coalesce (cl, v1, v2, 0);
> + 		    bitmap_set_bit (used_in_copy, v1);
> + 		    bitmap_set_bit (used_in_copy, v2);
> + 		  }
> + 	      }
>   	  }
>   	  break;
>

-- 
Richard Biener <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer