Hi Maciej:

LGTM, thanks for modeling this in cost model!

On Tue, Jul 19, 2022 at 12:48 AM Maciej W. Rozycki <ma...@embecosm.com> wrote:
>
> Fix a performance regression from commit 391500af1932 ("Do not ignore
> costs of jump insns in combine."), a part of the m68k series for MODE_CC
> conversion (<https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01028.html>),
> observed in soft-fp code in libgcc used by some of the embench-iot
> benchmarks.
>
> The immediate origin of the regression is the middle end, which in the
> absence of cost information from the backend estimates the cost of an
> RTL expression by assuming a single machine instruction for each of the
> expression's subexpression.
>
> So for `if_then_else', which takes 3 operands, the estimated cost is 3
> instructions (i.e. 12 units) even though a branch instruction evaluates
> it in a single machine cycle (ignoring the cost of actually taking the
> branch of course, which is handled elsewhere).  Consequently an insn
> sequence like:
>
> (insn 595 594 596 43 (set (reg:DI 305)
>         (lshiftrt:DI (reg/v:DI 160 [ R_f ])
>             (const_int 55 [0x37]))) ".../libgcc/soft-fp/adddf3.c":46:3 216 
> {lshrdi3}
>      (nil))
> (insn 596 595 597 43 (set (reg:DI 304)
>         (and:DI (reg:DI 305)
>             (const_int 1 [0x1]))) ".../libgcc/soft-fp/adddf3.c":46:3 109 
> {anddi3}
>      (expr_list:REG_DEAD (reg:DI 305)
>         (nil)))
> (jump_insn 597 596 598 43 (set (pc)
>         (if_then_else (eq (reg:DI 304)
>                 (const_int 0 [0]))
>             (label_ref:DI 1644)
>             (pc))) ".../libgcc/soft-fp/adddf3.c":46:3 237 {*branchdi}
>      (expr_list:REG_DEAD (reg:DI 304)
>         (int_list:REG_BR_PROB 536870916 (nil)))
>  -> 1644)
>
> does not (anymore, as from the commit referred) get combined into:
>
> (note 595 594 596 43 NOTE_INSN_DELETED)
> (note 596 595 597 43 NOTE_INSN_DELETED)
> (jump_insn 597 596 598 43 (parallel [
>             (set (pc)
>                 (if_then_else (eq (zero_extract:DI (reg/v:DI 160 [ R_f ])
>                             (const_int 1 [0x1])
>                             (const_int 55 [0x37]))
>                         (const_int 0 [0]))
>                     (label_ref:DI 1644)
>                     (pc)))
>             (clobber (scratch:DI))
>         ]) ".../libgcc/soft-fp/adddf3.c":46:3 243 {*branch_on_bitdi}
>      (int_list:REG_BR_PROB 536870916 (nil))
>  -> 1644)
>
> This is because the new cost is incorrectly calculated as 28 units while
> the cost of the original 3 instructions was 24:
>
> rejecting combination of insns 595, 596 and 597
> original costs 4 + 4 + 16 = 24
> replacement cost 28
>
> Before the commit referred the cost of jump instruction was ignored and
> considered 0 (i.e. unknown) and a sequence of instructions of a known
> cost used to win:
>
> allowing combination of insns 595, 596 and 597
> original costs 4 + 4 + 0 = 0
> replacement cost 28
>
> Add the missing costs for the 3 variants of `if_then_else' expressions
> we currently define in the backend.
>
> With the fix in place the cost of this particular `if_then_else' pattern
> is 2 instructions or 8 units (because of the shift operation) and
> therefore the ultimate cost of the original 3 RTL insns will work out at
> 16 units (4 + 4 + 8), however the replacement single RTL insn will cost
> 8 units only.
>
>         gcc/
>         * config/riscv/riscv.cc (riscv_rtx_costs) <IF_THEN_ELSE>: New
>         case.
> ---
>  gcc/config/riscv/riscv.cc |   27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
>
> gcc-riscv-rtx-costs-if-then-else.diff
> Index: gcc/gcc/config/riscv/riscv.cc
> ===================================================================
> --- gcc.orig/gcc/config/riscv/riscv.cc
> +++ gcc/gcc/config/riscv/riscv.cc
> @@ -1853,6 +1853,33 @@ riscv_rtx_costs (rtx x, machine_mode mod
>        /* Otherwise use the default handling.  */
>        return false;
>
> +    case IF_THEN_ELSE:
> +      if (TARGET_SFB_ALU
> +         && register_operand (XEXP (x, 1), mode)
> +         && sfb_alu_operand (XEXP (x, 2), mode)
> +         && comparison_operator (XEXP (x, 0), VOIDmode))
> +       {
> +         /* For predicated conditional-move operations we assume the cost
> +            of a single instruction even though there are actually two.  */
> +         *total = COSTS_N_INSNS (1);
> +         return true;
> +       }
> +      else if (LABEL_REF_P (XEXP (x, 1)) && XEXP (x, 2) == pc_rtx)
> +       {
> +         if (equality_operator (XEXP (x, 0), mode)
> +             && GET_CODE (XEXP (XEXP (x, 0), 0)) == ZERO_EXTRACT)
> +           {
> +             *total = COSTS_N_INSNS (SINGLE_SHIFT_COST + 1);
> +             return true;
> +           }
> +         if (order_operator (XEXP (x, 0), mode))
> +           {
> +             *total = COSTS_N_INSNS (1);
> +             return true;
> +           }
> +       }
> +      return false;
> +
>      case NOT:
>        *total = COSTS_N_INSNS (GET_MODE_SIZE (mode) > UNITS_PER_WORD ? 2 : 1);
>        return false;

Reply via email to