https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127

--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Michael_S from comment #8)
> What are values of gcc "loop" cost of the relevant instructions now?
> 1. AVX256 Load
> 2. FMA3 ymm,ymm,ymm
> 3. AVX256 Regmove
> 4. FMA3 mem,ymm,ymm

For Skylake, outside of register allocation, they are:
1. AVX256 Load         --- 10
2. FMA3 ymm,ymm,ymm    --- 16
3. AVX256 Regmove      --- 2
4. FMA3 mem,ymm,ymm    --- 32
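
To make the numbers easier to compare, here is a small sketch that simply adds up the costs quoted above for two instruction sequences. It assumes a naive additive model, and the pairing of instructions into sequences is only an illustration; the names `SKYLAKE_COSTS` and `sequence_cost` are hypothetical and are not GCC internals, and this is not how GCC itself combines costs.

```python
# Costs as quoted in this comment for Skylake, outside of register allocation.
# The dictionary keys are illustrative labels, not GCC identifiers.
SKYLAKE_COSTS = {
    "avx256_load": 10,     # 1. AVX256 Load
    "fma3_reg": 16,        # 2. FMA3 ymm,ymm,ymm
    "avx256_regmove": 2,   # 3. AVX256 Regmove
    "fma3_mem": 32,        # 4. FMA3 mem,ymm,ymm
}


def sequence_cost(*names):
    """Sum the quoted costs of the named instructions (assumed additive)."""
    return sum(SKYLAKE_COSTS[name] for name in names)


# Separate load followed by the register-only FMA: 10 + 16.
load_then_fma = sequence_cost("avx256_load", "fma3_reg")

# Register move followed by the FMA with a memory operand: 2 + 32.
regmove_then_fma_mem = sequence_cost("avx256_regmove", "fma3_mem")

print(load_then_fma, regmove_then_fma_mem)  # prints "26 34"
```

Under that assumption the memory-operand FMA alone (32) already costs more than a load plus a register FMA (26).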

In RA there is no direct cost for FMA instructions, but we could disparage
the memory alternative of the FMA instructions. Again, though, that may hurt
performance in some cases.

1. AVX256 Load         --- 10
3. AVX256 Regmove      --- 2

BTW: we have done a lot of experiments with different cost models and saw no
significant performance impact on SPEC2017.
