Hi,
this patch enables the logic that avoids FMA in matrix multiplication
loops for 256-bit vectors.  The underlying issue is the same as with
znver1.  While the combined latency of separate multiply and add
operations is higher than that of a single FMA, the loop-carried
dependency chain in matrix multiplication goes only through the
additions, which are faster.
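
To illustrate the kind of loop in question, here is a minimal sketch (not
taken from the patch or from any benchmark; the function name matmul_acc
and the row-major layout are assumptions made for the example).  The
accumulator sum is the only value carried from one iteration of the inner
loop to the next.  If the multiply and add are fused, every iteration has
to wait for the previous FMA to finish; if they are kept separate, the
multiply does not depend on sum and only the add sits on the chain.

```c
#include <stddef.h>

/* Hypothetical example: C = A * B for square n x n row-major matrices.
   The inner loop over k is a reduction into `sum`.  */
void
matmul_acc (size_t n, const double *a, const double *b, double *c)
{
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      {
        double sum = 0.0;
        for (size_t k = 0; k < n; k++)
          {
            /* The product is independent of `sum`, so it can be computed
               ahead of time for later iterations.  */
            double prod = a[i * n + k] * b[k * n + j];
            /* Only this addition is on the loop-carried chain when the
               compiler does not contract the two operations into an FMA.  */
            sum += prod;
          }
        c[i * n + j] = sum;
      }
}
```

Whether the compiler actually fuses the two operations depends on the
target, the tuning flags and the floating-point contraction settings, so
the comments describe the intended effect rather than guaranteed output.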

Bootstrapped/regtested x86_64-linux, committed.

        * config/i386/i386-options.c (ix86_option_override_internal): Default
        PARAM_AVOID_FMA_MAX_BITS to 256 for znver2.
        * config/i386/x86-tune.def (X86_TUNE_AVOID_256FMA_CHAINS): Set for
        ZNVER2.

Index: config/i386/i386-options.c
===================================================================
--- config/i386/i386-options.c  (revision 273727)
+++ config/i386/i386-options.c  (working copy)
@@ -2779,7 +2779,11 @@ ix86_option_override_internal (bool main
     opts->x_flag_cf_protection
       = (cf_protection_level) (opts->x_flag_cf_protection | CF_SET);
 
-  if (ix86_tune_features [X86_TUNE_AVOID_128FMA_CHAINS])
+  if (ix86_tune_features [X86_TUNE_AVOID_256FMA_CHAINS])
+    maybe_set_param_value (PARAM_AVOID_FMA_MAX_BITS, 256,
+                          opts->x_param_values,
+                          opts_set->x_param_values);
+  else if (ix86_tune_features [X86_TUNE_AVOID_128FMA_CHAINS])
     maybe_set_param_value (PARAM_AVOID_FMA_MAX_BITS, 128,
                           opts->x_param_values,
                           opts_set->x_param_values);
Index: config/i386/x86-tune.def
===================================================================
--- config/i386/x86-tune.def    (revision 273727)
+++ config/i386/x86-tune.def    (working copy)
@@ -431,6 +431,10 @@ DEF_TUNE (X86_TUNE_USE_GATHER, "use_gath
    smaller FMA chain.  */
 DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER)
 
+/* X86_TUNE_AVOID_256FMA_CHAINS: Avoid creating loops with tight 256bit or
+   smaller FMA chain.  */
+DEF_TUNE (X86_TUNE_AVOID_256FMA_CHAINS, "avoid_fma256_chains", m_ZNVER2)
+
 /*****************************************************************************/
 /* AVX instruction selection tuning (some of SSE flags affects AVX, too)     */
 /*****************************************************************************/
