Hi, originally inspired by the wish to transform
vmv v3, a0 ; = vec_duplicate vadd.vv v1, v2, v3 into vadd.vx v1, v2, a0 via fwprop for riscv, this patch enables the forward propagation of UNARY_P sources. As this involves potentially replacing a vector register with a scalar register the ira_hoist_pressure machinery is used to calculate the change in register pressure. If the propagation would increase the pressure beyond the number of hard regs, we don't perform it. The regpressure commit this patch depends on is not yet pushed because I figured I'd post the code using it in case of further comments. The testsuite is unchanged on i386 and power10 but aarch64 has some new FAILs but I'm not terribly familiar with aarch64, hence some examples here: The following cases shrn-combine-1/2/3 seem worse as we emit one instruction more. Before: shrn v30.8b, v30.8h, 2 shrn2 v30.16b, v31.8h, 2 str q30, [x1, x3] After: ushr v30.8h, v30.8h, 2 ushr v31.8h, v31.8h, 2 uzp1 v31.16b, v30.16b, v31.16b str q31, [x1, x3] Here, one optimization already happens in fwprop so combine cannot do the same work it did before. vec-init-22-speed.c changes from sxth w0, w0 sxth w1, w1 fmov d31, x0 fmov d0, x1 to: dup v31.4h, w0 dup v0.4h, w1 which I hope has the same semantics and is shorter. Apart from that there are numerous check-asm testcases where a new, earlier propagation prevents a later, supposedly better propagation. One such example is from ld4_s8.c: (insn 11 10 12 2 (set (reg:DI 100) (neg:DI (reg:DI 102))) 385 {negdi2} (expr_list:REG_EQUAL (const_poly_int:DI [-576, -576]) (nil))) (insn 12 11 13 2 (set (reg/f:DI 99) (plus:DI (reg/v/f:DI 97 [ x0 ]) (reg:DI 100))) 153 {*adddi3_aarch64} (expr_list:REG_EQUAL (plus:DI (reg/v/f:DI 97 [ x0 ]) (const_poly_int:DI [-576, -576])) (nil))) (insn 13 12 14 2 (set (reg/v:VNx64QI 94 [ z0 ]) (unspec:VNx64QI [ (reg/v:VNx16BI 96 [ p0 ]) (mem:VNx64QI (reg/f:DI 99) [0 MEM <svint8_t[4]> [(signed char *)_1]+0 S[64, 64] A8]) ] UNSPEC_LDN)) 5885 {vec_mask_load_lanesvnx64qivnx16qi} (nil)) where we now do: propagating insn 11 into insn 12, replacing: (set (reg/f:DI 99) (plus:DI (reg/v/f:DI 97 [ x0 ]) (reg:DI 100))) successfully matched this instruction to subdi3: (set (reg/f:DI 99) (minus:DI (reg/v/f:DI 97 [ x0 ]) (reg:DI 102))) vs before: cannot propagate from insn 11 into insn 12: would increase complexity of pattern propagating insn 12 into insn 13, replacing: (set (reg/v:VNx64QI 94 [ z0 ]) (unspec:VNx64QI [ (reg/v:VNx16BI 96 [ p0 ]) (mem:VNx64QI (reg/f:DI 99) [0 MEM <svint8_t[4]> [(signed char *)_1]+0 S[64, 64] A8]) ] UNSPEC_LDN)) successfully matched this instruction to vec_mask_load_lanesvnx64qivnx16qi: (set (reg/v:VNx64QI 94 [ z0 ]) (unspec:VNx64QI [ (reg/v:VNx16BI 96 [ p0 ]) (mem:VNx64QI (plus:DI (reg/v/f:DI 97 [ x0 ]) (reg:DI 100)) [0 MEM <svint8_t[4]> [(signed char *)_1]+0 S[64, 64] A8]) ] UNSPEC_LDN)) All in all this seems like a general problem with earlier optimizations preventing later ones and surely all of those could be fixed individually. Still, the question remains if the general approach is useful or desired or if we not rather prevent more optimizations that we gain. Suggestions welcome. I have some riscv tests for it but didn't attach them yet in order to focus on the main part first. Regards Robin gcc/ChangeLog: * fwprop.cc (fwprop_propagation::profitable_p): Add unary handling. (fwprop_propagation::update_register_pressure): New function. (fwprop_propagation::register_pressure_high_p): New function (reg_single_def_for_src_p): Look through unary expressions. (try_fwprop_subst_pattern): Check register pressure. (forward_propagate_into): Call new function. (fwprop_init): Init register pressure. (fwprop_done): Clean up register pressure. (fwprop_insn): Add comment. --- gcc/fwprop.cc | 307 ++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 300 insertions(+), 7 deletions(-) diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc index 0707a234726..413fe4e7335 100644 --- a/gcc/fwprop.cc +++ b/gcc/fwprop.cc @@ -36,6 +36,10 @@ along with GCC; see the file COPYING3. If not see #include "tree-pass.h" #include "rtl-iter.h" #include "target.h" +#include "dominance.h" + +#include "ira.h" +#include "regpressure.h" /* This pass does simple forward propagation and simplification when an operand of an insn can only come from a single def. This pass uses @@ -103,6 +107,10 @@ using namespace rtl_ssa; static int num_changes; +/* Keep track of which registers already increased the pressure to avoid double + booking. */ +sbitmap pressure_accounted; + /* Do not try to replace constant addresses or addresses of local and argument slots. These MEM expressions are made only once and inserted in many instructions, as well as being used to control symbol table @@ -181,6 +189,8 @@ namespace bool changed_mem_p () const { return result_flags & CHANGED_MEM; } bool folded_to_constants_p () const; bool profitable_p () const; + bool register_pressure_high_p (rtx, rtx, rtx_insn *, rtx_insn *) const; + void update_register_pressure (rtx, rtx, rtx_insn *, rtx_insn *) const; bool check_mem (int, rtx) final override; void note_simplification (int, uint16_t, rtx, rtx) final override; @@ -332,25 +342,243 @@ fwprop_propagation::profitable_p () const && (result_flags & PROFITABLE)) return true; - if (REG_P (to)) + /* Only continue with an unary operation if we consider register + pressure. */ + rtx what = copy_rtx (to); + if (UNARY_P (what) && flag_ira_hoist_pressure) + what = XEXP (what, 0); + + if (REG_P (what)) return true; - if (GET_CODE (to) == SUBREG - && REG_P (SUBREG_REG (to)) - && !paradoxical_subreg_p (to)) + if (GET_CODE (what) == SUBREG + && REG_P (SUBREG_REG (what)) + && !paradoxical_subreg_p (what)) return true; - if (CONSTANT_P (to)) + if (CONSTANT_P (what)) return true; return false; } -/* Check that X has a single def. */ +/* Check if the register pressure in any predecessor block of USE's block + until DEF's block is equal or higher to the number of hardregs in NU's + register class. */ +bool +fwprop_propagation::register_pressure_high_p (rtx nu, rtx old, rtx_insn *def, + rtx_insn *use) const +{ + enum reg_class nu_class, old_class; + int nu_nregs, old_nregs; + nu_class = regpressure_get_regno_pressure_class (REGNO (nu), &nu_nregs); + old_class + = regpressure_get_regno_pressure_class (REGNO (old), &old_nregs); + + if (nu_class == NO_REGS && old_class == NO_REGS) + return true; + + if (nu_class == old_class) + return false; + + basic_block bbfrom = BLOCK_FOR_INSN (def); + basic_block bbto = BLOCK_FOR_INSN (use); + + basic_block bb; + + sbitmap visited = sbitmap_alloc (last_basic_block_for_fn (cfun)); + bitmap_clear (visited); + auto_vec<basic_block> q; + q.safe_push (bbto); + + while (!q.is_empty ()) + { + bb = q.pop (); + + if (bitmap_bit_p (visited, bb->index)) + continue; + + /* Nothing to do if the register to be replaced is not live + in this BB. */ + if (bb != bbfrom && !regpressure_is_live_in (bb, REGNO (old))) + continue; + + /* Nothing to do if the replacement register is already live in + this BB. */ + if (regpressure_is_live_in (bb, REGNO (nu))) + continue; + + bitmap_set_bit (visited, bb->index); + + int press = regpressure_get (bb, nu_class); + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "pressure for reg %d in bb %d: %d\n", + REGNO (nu), bb->index, press); + + if (press + nu_nregs > ira_class_hard_regs_num[nu_class]) + return true; + + if (bb == bbfrom || bb == ENTRY_BLOCK_PTR_FOR_FN (cfun)) + break; + + edge e; + edge_iterator ei; + + FOR_EACH_EDGE (e, ei, bb->preds) + { + basic_block pred = e->src; + q.safe_push (pred); + } + } + + sbitmap_free (visited); + + return false; +} + +/* Update the register pressure when propagating from DEF to USE + replacing OLD with NU. */ +void +fwprop_propagation::update_register_pressure (rtx nu, rtx old, rtx_insn *def, + rtx_insn *use) const +{ + basic_block bb; + + enum reg_class nu_class, old_class; + int nu_nregs, old_nregs; + nu_class = regpressure_get_regno_pressure_class (REGNO (nu), &nu_nregs); + old_class = regpressure_get_regno_pressure_class (REGNO (old), &old_nregs); + + if (nu_class == old_class) + return; + + basic_block bbfrom = BLOCK_FOR_INSN (def); + basic_block bbto = BLOCK_FOR_INSN (use); + + sbitmap visited = sbitmap_alloc (last_basic_block_for_fn (cfun)); + bitmap_clear (visited); + auto_vec<basic_block> q; + q.safe_push (bbto); + + /* Precompute the users. */ + auto_vec<rtx_insn *> users; + df_ref duse; + bool should_decrease = true; + FOR_EACH_INSN_USE (duse, def) + { + rtx dreg = DF_REF_REAL_REG (duse); + if (REGNO (dreg) != REGNO (old)) + continue; + + df_ref op_ref = DF_REG_USE_CHAIN (REGNO (dreg)); + for (; op_ref; op_ref = DF_REF_NEXT_REG (op_ref)) + { + if (!DF_REF_INSN_INFO (op_ref)) + continue; + + rtx_insn *insn = DF_REF_INSN (op_ref); + + /* If there are other users in BBTO, never decrease the + pressure. */ + if (BLOCK_FOR_INSN (insn) == bbto && insn != use) + { + should_decrease = false; + break; + } + + users.safe_push (insn); + } + } + + while (!q.is_empty ()) + { + bb = q.pop (); + + if (bitmap_bit_p (visited, bb->index)) + continue; + + /* Nothing to do if the register to be replaced is not live + in this BB. */ + if (bb != bbfrom && !regpressure_is_live_in (bb, REGNO (old))) + continue; + + /* Nothing to do if the new register is already live in this BB. */ + if (regpressure_is_live_in (bb, REGNO (nu))) + continue; + + bitmap_set_bit (visited, bb->index); + + if (regpressure_is_live_in (bb, REGNO (old))) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "increasing pressure for " + "reg %d in bb %d by %d regs.\n", + REGNO (nu), bb->index, nu_nregs); + regpressure_set_live_in (bb, REGNO (nu)); + regpressure_increase (bb, nu_class, nu_nregs); + + if (should_decrease) + { + rtx_insn *user; + int ix; + bool should_decrease_local = true; + + /* If this BB dominates no BB where OLD is used (except BBTO) + the register pressure is decreased. */ + FOR_EACH_VEC_ELT(users, ix, user) + { + basic_block ubb = BLOCK_FOR_INSN (user); + if (ubb == bbto) + continue; + if (dominated_by_p (CDI_DOMINATORS, ubb, bb)) + { + should_decrease_local = false; + break; + } + } + + if (should_decrease_local) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "decreasing pressure for " + "reg %d in bb %d by %d regs.\n", + REGNO (nu), bb->index, old_nregs); + regpressure_clear_live_in (bb, REGNO (old)); + regpressure_decrease (bb, old_class, old_nregs); + } + } + } + + if (bb == bbfrom || bb == ENTRY_BLOCK_PTR_FOR_FN (cfun)) + break; + + edge e; + edge_iterator ei; + + FOR_EACH_EDGE (e, ei, bb->preds) + { + basic_block pred = e->src; + q.safe_push (pred); + } + } + + sbitmap_free (visited); +} + +/* Check that X has a single def. Also look through unary operations if + we take register pressure into consideration. */ static bool reg_single_def_p (rtx x) { + if (UNARY_P (x) && flag_ira_hoist_pressure) + { + x = XEXP (x, 0); + + if (SUBREG_P (x) && !contains_paradoxical_subreg_p (x)) + x = SUBREG_REG (x); + } + return REG_P (x) && crtl->ssa->single_dominating_def (REGNO (x)); } @@ -490,6 +718,53 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change, } } + if (ok && UNARY_P (src)) + { + /* Currently the register classes can only be different with a + unary operation like vec_duplicate. + In that case the propagation can increase the register pressure + on the way to the target BB. + Therefore, we first check if the pressure of the respective register + is already high, i.e. equals or exceeds the number of hardregs for + that class. If not, we allow the propagation and update the + pressure. */ + + rtx src1 = src; + + /* Strip unary operation and possible subreg. */ + src1 = XEXP (src, 0); + if (GET_CODE (src1) == SUBREG) + src1 = SUBREG_REG (src1); + + rtx_insn *def_rtl = def_insn->rtl (); + + if (REG_P (dest) && REG_P (src1)) + { + if (!bitmap_bit_p (pressure_accounted, REGNO (src1))) + { + if (!prop.register_pressure_high_p (src1, dest, def_rtl, use_rtl)) + { + prop.update_register_pressure (src1, dest, def_rtl, use_rtl); + bitmap_set_bit (pressure_accounted, REGNO (src1)); + } + else + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "cannot propagate from insn %d into" + " insn %d: %s\n", def_insn->uid (), + use_insn->uid (), + "would increase register pressure beyond number" + " of hard regs"); + ok = false; + } + } + else + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "pressure for reg %d has already" + " been accounted\n", REGNO (src1)); + } + } + if (!ok) { /* The pattern didn't match, but if all uses of SRC folded to @@ -859,7 +1134,7 @@ forward_propagate_into (use_info *use, bool reg_prop_only = false) if ((reg_prop_only || (def_loop != use_loop && !flow_loop_nested_p (use_loop, def_loop))) - && (!reg_single_def_p (dest) || !reg_single_def_p (src))) + && !reg_single_def_p (dest)) return false; /* Don't substitute into a non-local goto, this confuses CFG. */ @@ -890,6 +1165,14 @@ fwprop_init (void) df_analyze (); crtl->ssa = new rtl_ssa::function_info (cfun); + + /* Calculate register pressure for each basic block. */ + if (flag_ira_hoist_pressure) + { + regpressure_init (); + + pressure_accounted = sbitmap_alloc (DF_REG_SIZE (df)); + } } static void @@ -910,6 +1193,12 @@ fwprop_done (void) fprintf (dump_file, "\nNumber of successful forward propagations: %d\n\n", num_changes); + + if (flag_ira_hoist_pressure) + { + regpressure_cleanup (); + sbitmap_free (pressure_accounted); + } } /* Try to optimize INSN, returning true if something changes. @@ -919,6 +1208,10 @@ fwprop_done (void) static bool fwprop_insn (insn_info *insn, bool fwprop_addr_p) { + /* TODO: when rejecting propagations due to register pressure + we would actually like to try the propagation with most + potential = uses first instead of going through them + sequentially. */ for (use_info *use : insn->uses ()) { if (use->is_mem ()) -- 2.41.0