On Mon, 2024-03-18 at 20:54 -0600, Jeff Law wrote:
> > +/* Costs to use when optimizing for xiangshan nanhu.  */
> > +static const struct riscv_tune_param xiangshan_nanhu_tune_info = {
> > +  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},  /* fp_add */
> > +  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},  /* fp_mul */
> > +  {COSTS_N_INSNS (10), COSTS_N_INSNS (20)},        /* fp_div */
> > +  {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},  /* int_mul */
> > +  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},  /* int_div */
> > +  6,                                               /* issue_rate */
> > +  3,                                               /* branch_cost */
> > +  3,                                               /* memory_cost */
> > +  3,                                               /* fmv_cost */
> > +  true,                                            /* slow_unaligned_access */
> > +  false,                                   /* use_divmod_expansion */
> > +  RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH,          /* fusible_ops */
> > +  NULL,                                            /* vector cost */

> Is your integer division really that fast?  The table above essentially 
> says that your cpu can do integer division in 6 cycles.

Hmm, I've just seen that I coded some even smaller values for LoongArch
CPUs, so forgive me for "hijacking" this thread...

The problem seems to be that integer division may take a different number
of cycles for different inputs: on LoongArch LA664 I've observed 5 cycles
for some inputs and 39 cycles for other inputs.

So should we use the minimum value, the maximum value, or something in
between for TARGET_RTX_COSTS and pipeline descriptions?
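
To make the three options concrete, here is a rough sketch of what each
would mean in cost units.  COSTS_N_INSNS (N) is (N) * 4 in rtl.h; the enum
names and the helper are made up for illustration, and the plain midpoint is
only one possible "in-between" choice, not a recommendation.

```c
/* Same definition as in GCC's rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Integer division latencies observed on LA664 (see above).
   The names are hypothetical, not taken from any GCC source.  */
enum
{
  DIV_MIN_CYCLES = 5,
  DIV_MAX_CYCLES = 39
};

/* Hypothetical helper: midpoint of the observed range, as one
   possible "in-between" value.  */
static int
div_mid_cycles (void)
{
  return (DIV_MIN_CYCLES + DIV_MAX_CYCLES) / 2;
}
```

With these numbers, COSTS_N_INSNS (DIV_MIN_CYCLES) is 20,
COSTS_N_INSNS (DIV_MAX_CYCLES) is 156, and
COSTS_N_INSNS (div_mid_cycles ()) is 88, so the choice changes the cost seen
by the optimizers by almost a factor of 8.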

-- 
Xi Ruoyao <xry...@xry111.site>
School of Aerospace Science and Technology, Xidian University
