https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97127
--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Michael_S from comment #8)
> What are values of gcc "loop" cost of the relevant instructions now?
> 1. AVX256 Load
> 2. FMA3 ymm,ymm,ymm
> 3. AVX256 Regmove
> 4. FMA3 mem,ymm,ymm

For Skylake, outside of register allocation, they are:
1. AVX256 Load ---- 10
2. FMA3 ymm,ymm,ymm --- 16
3. AVX256 Regmove --- 2
4. FMA3 mem,ymm,ymm --- 32

In RA, there is no direct cost for FMA instructions, but we can disparage the memory alternative in FMA instructions. Again, that may hurt performance in some cases.
1. AVX256 Load ---- 10
3. AVX256 Regmove --- 2

BTW: we have done a lot of experiments with different cost models and saw no significant performance impact on SPEC2017.
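
To make the numbers above easier to compare, here is a rough sketch of the arithmetic. It assumes the per-instruction costs simply add up over a sequence, which is a simplification and not how GCC's cost hooks necessarily combine them. The names `COSTS` and `sequence_cost` are made up for illustration and do not exist in GCC.

```python
# Per-instruction costs quoted in this comment for Skylake
# (outside of register allocation).
COSTS = {
    "avx256_load": 10,
    "fma3_reg": 16,       # FMA3 ymm,ymm,ymm
    "avx256_regmove": 2,
    "fma3_mem": 32,       # FMA3 mem,ymm,ymm
}


def sequence_cost(instructions):
    """Sum per-instruction costs (assumption: costs are purely additive)."""
    return sum(COSTS[name] for name in instructions)


# Separate load followed by the register form of FMA.
separate_load = sequence_cost(["avx256_load", "fma3_reg"])

# Load folded into the memory form of FMA.
folded_load = sequence_cost(["fma3_mem"])

print(separate_load, folded_load)
```

Under that additive assumption the separate load plus register FMA comes to 26 and the memory-form FMA to 32. That only describes these cost-table entries, not measured performance.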