[Bug target/112935] [14 Regression] Performance regression in Coremarks crcu8 function

xry111 at gcc dot gnu.org via Gcc-bugs Fri, 08 Dec 2023 23:48:13 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112935


--- Comment #8 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #7)
> (In reply to Xi Ruoyao from comment #5)
> > 
> > so we still slightly penalize multiplication.  To me we should code
> > COSTS_N_INSNS (1) + 1 into loongarch_rtx_cost_optimize_size instead of
> > special casing it in loongarch_rtx_costs.
> 
> Oh yes, a slight penalty is definitely not going to make a huge difference if
> the cost of a mult instruction is worse than an and and a neg.
> 
> > 
> > For the default value (used when -O2) I'll do some micro-benchmark...

I've changed it to

/* Default RTX cost initializer.  */
loongarch_rtx_cost_data::loongarch_rtx_cost_data ()
  : fp_add (COSTS_N_INSNS (5)),
    fp_mult_sf (COSTS_N_INSNS (5)),
    fp_mult_df (COSTS_N_INSNS (5)),
    fp_div_sf (COSTS_N_INSNS (8)),
    fp_div_df (COSTS_N_INSNS (8)),
    int_mult_si (COSTS_N_INSNS (4)),
    int_mult_di (COSTS_N_INSNS (4)),
    int_div_si (COSTS_N_INSNS (5)),
    int_div_di (COSTS_N_INSNS (5)),
    branch_cost (6),
    memory_latency (4) {}

based on micro-benchmark results.  This fixes the int * _Bool and int * 17
cases, but for the original test case I still get a multiplication instruction.
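
For readers following along, the int * _Bool and int * 17 cases can be reduced
to roughly the following.  This is only a sketch: these are not the actual test
cases from the bug report, and the function names are made up for illustration.
Each "_expanded" variant spells out by hand the multiplication-free sequence
(and + neg, or shift + add) that the compiler may choose instead of a mult,
depending on the relative RTX costs.

```c
#include <stdbool.h>

/* Hypothetical reductions; the names are illustrative, not from the PR.  */

/* int * _Bool: b is either 0 or 1.  */
int mul_bool (int x, bool b)
{
  return x * b;
}

/* Same result without a multiplication: -b is 0 or all-ones,
   so the AND yields either 0 or x.  */
int mul_bool_expanded (int x, bool b)
{
  return x & -(int) b;
}

/* int * 17: a small constant multiplier.  */
int mul_17 (int x)
{
  return x * 17;
}

/* Same result as x * 16 + x.  The arithmetic is done in unsigned to
   avoid left-shifting a negative value; converting back to int relies
   on the usual two's complement behaviour (as GCC implements it).  */
int mul_17_expanded (int x)
{
  return (int) (((unsigned) x << 4) + (unsigned) x);
}
```

Comparing the assembly for each pair (for example with -O2 -S) should show
whether the mult is kept or expanded under a given set of cost values.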
