Richard Henderson writes:

> On 07/15/2016 01:59 AM, Pranith Kumar wrote:
>> This patch applies on top of the fence generation patch series.
>>
>> This commit optimizes fence instructions. Two optimizations are
>> currently implemented. These are:
>>
>> 1. Unnecessary duplicate fence instructions
>>
>>    If the same fence instruction is detected consecutively, we remove
>>    one instance of it.
>>
>>    ex: mb; mb => mb, strl; strl => strl
>>
>> 2. Merging weaker fence with subsequent/previous stronger fence
>>
>>    load-acquire/store-release fence can be combined with a full fence
>>    without relaxing the ordering constraint.
>>
>>    ex: a) ld; ldaq; mb => ld; mb
>>        b) mb; strl; st => mb; st
>>
>> Signed-off-by: Pranith Kumar <bobby.pr...@gmail.com>
>> ---
>>  tcg/optimize.c | 59 
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  tcg/tcg.h      |  1 +
>>  2 files changed, 60 insertions(+)
>>
>> diff --git a/tcg/optimize.c b/tcg/optimize.c
>> index c0d975b..a655829 100644
>> --- a/tcg/optimize.c
>> +++ b/tcg/optimize.c
>> @@ -569,6 +569,63 @@ static bool swap_commutative2(TCGArg *p1, TCGArg *p2)
>>      return false;
>>  }
>>
>> +/* Eliminate duplicate and unnecessary fence instructions */
>> +void tcg_optimize_mb(TCGContext *s)
>> +{
>> +    int oi, oi_next;
>> +    TCGArg prev_op_mb = -1;
>> +    TCGOp *prev_op;
>> +
>> +    for (oi = s->gen_first_op_idx; oi >= 0; oi = oi_next) {
>> +        TCGOp *op = &s->gen_op_buf[oi];
>> +        TCGArg *args = &s->gen_opparam_buf[op->args];
>> +        TCGOpcode opc = op->opc;
>> +
>> +        switch (opc) {
>> +        case INDEX_op_mb:
>> +        {
>> +            TCGBar curr_mb_type = args[0] & 0xF0;
>> +            TCGBar prev_mb_type = prev_op_mb & 0xF0;
>
> No check to make sure that prev_op_mb != -1?
>

I don't think that is necessary since if prev_op_mb is -1 then prev_mb_type
will be 0x70 which will not match any of the cases we compare against below.

>> +
>> +            if (curr_mb_type == prev_mb_type ||
>> +                (curr_mb_type == TCG_BAR_STRL && prev_mb_type == 
>> TCG_BAR_SC)) {
>> +                /* Remove the current weaker barrier op. The previous
>> +                 * barrier is stronger and sufficient.
>> +                 * mb; strl => mb; st
>> +                 */
>> +                tcg_op_remove(s, op);
>> +            } else if (curr_mb_type == TCG_BAR_SC &&
>> +                       prev_mb_type == TCG_BAR_LDAQ) {
>> +                /* Remove the previous weaker barrier op. The current
>> +                 * barrier is stronger and sufficient.
>> +                 * ldaq; mb => ld; mb
>> +                 */
>> +                tcg_op_remove(s, prev_op);
>> +            } else if (curr_mb_type == TCG_BAR_STRL &&
>> +                       prev_mb_type == TCG_BAR_LDAQ) {
>> +                /* Consecutive load-acquire and store-release barriers
>> +                 * can be merged into one stronger SC barrier
>> +                 * ldaq; strl => ld; mb; st
>> +                 */
>> +                args[0] = (args[0] & 0x0F) | TCG_BAR_SC;
>> +                tcg_op_remove(s, prev_op);
>
> Don't you need to merge the bottom 4 bits as well?
>

That is an interesting point. Ideally, the TCG_MO_* flags for both the
barriers should be the same. Atleast now, for LDAR and STRL barriers this flag
is TCG_MO_ALL since we do not have any other variant.

>> +        default:
>> +            prev_op_mb = -1;
>> +            prev_op = NULL;
>> +            break;
>
> A reasonable start, I suppose, but there's nothing stopping you from 
> considering barriers that aren't exactly adjacent.
>
> You only need to reset prev_op_mb when you see qemu_ld, qemu_st, call, or 
> tcg_op_defs[opc].flags & TCG_OPF_BB_END.
>

That sounds like a good idea to extend this to. I will work on it.

Thanks!
-- 
Pranith

Reply via email to