Richard Henderson writes:
> On 07/15/2016 01:59 AM, Pranith Kumar wrote:
>> This patch applies on top of the fence generation patch series.
>>
>> This commit optimizes fence instructions. Two optimizations are
>> currently implemented. These are:
>>
>> 1. Unnecessary duplicate fence instructions
>>
>> If the same fence instruction is detected consecutively, we remove
>> one instance of it.
>>
>> ex: mb; mb => mb, strl; strl => strl
>>
>> 2. Merging weaker fence with subsequent/previous stronger fence
>>
>> load-acquire/store-release fence can be combined with a full fence
>> without relaxing the ordering constraint.
>>
>> ex: a) ld; ldaq; mb => ld; mb
>>     b) mb; strl; st => mb; st
>>
>> Signed-off-by: Pranith Kumar <bobby.pr...@gmail.com>
>> ---
>>  tcg/optimize.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  tcg/tcg.h      |  1 +
>>  2 files changed, 60 insertions(+)
>>
>> diff --git a/tcg/optimize.c b/tcg/optimize.c
>> index c0d975b..a655829 100644
>> --- a/tcg/optimize.c
>> +++ b/tcg/optimize.c
>> @@ -569,6 +569,63 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
>>      return false;
>>  }
>>
>> +/* Eliminate duplicate and unnecessary fence instructions */
>> +void tcg_optimize_mb(TCGContext *s)
>> +{
>> +    int oi, oi_next;
>> +    TCGArg prev_op_mb = -1;
>> +    TCGOp *prev_op;
>> +
>> +    for (oi = s->gen_first_op_idx; oi >= 0; oi = oi_next) {
>> +        TCGOp *op = &s->gen_op_buf[oi];
>> +        TCGArg *args = &s->gen_opparam_buf[op->args];
>> +        TCGOpcode opc = op->opc;
>> +
>> +        switch (opc) {
>> +        case INDEX_op_mb:
>> +        {
>> +            TCGBar curr_mb_type = args[0] & 0xF0;
>> +            TCGBar prev_mb_type = prev_op_mb & 0xF0;
>
> No check to make sure that prev_op_mb != -1?
>
I don't think that is necessary, since if prev_op_mb is -1 then
prev_mb_type will be 0xF0, which will not match any of the cases we
compare against below.

>> +
>> +            if (curr_mb_type == prev_mb_type ||
>> +                (curr_mb_type == TCG_BAR_STRL && prev_mb_type == TCG_BAR_SC)) {
>> +                /* Remove the current weaker barrier op. The previous
>> +                 * barrier is stronger and sufficient.
>> +                 * mb; strl => mb; st
>> +                 */
>> +                tcg_op_remove(s, op);
>> +            } else if (curr_mb_type == TCG_BAR_SC &&
>> +                       prev_mb_type == TCG_BAR_LDAQ) {
>> +                /* Remove the previous weaker barrier op. The current
>> +                 * barrier is stronger and sufficient.
>> +                 * ldaq; mb => ld; mb
>> +                 */
>> +                tcg_op_remove(s, prev_op);
>> +            } else if (curr_mb_type == TCG_BAR_STRL &&
>> +                       prev_mb_type == TCG_BAR_LDAQ) {
>> +                /* Consecutive load-acquire and store-release barriers
>> +                 * can be merged into one stronger SC barrier
>> +                 * ldaq; strl => ld; mb; st
>> +                 */
>> +                args[0] = (args[0] & 0x0F) | TCG_BAR_SC;
>> +                tcg_op_remove(s, prev_op);
>
> Don't you need to merge the bottom 4 bits as well?
>

That is an interesting point. Ideally, the TCG_MO_* flags for both
barriers should be the same. At least for now, the flag for the LDAQ and
STRL barriers is TCG_MO_ALL, since we do not have any other variant.