> 
> This is the patch I am going to check into GCC 8.
> 
> -- 
> H.J.

> From 9ecbfa1fd04dc4370a9ec4f3d56189cc07aee668 Mon Sep 17 00:00:00 2001
> From: "H.J. Lu" <hjl.to...@gmail.com>
> Date: Thu, 17 May 2018 09:52:09 -0700
> Subject: [PATCH] x86: Re-enable partial_reg_dependency and movx for Haswell
> 
> r254152 disabled partial_reg_dependency and movx for Haswell and newer
> Intel processors.  r258972 restored them for skylake-avx512.  For Haswell,
> movx improves performance.  But partial_reg_stall may be better than
> partial_reg_dependency in theory.  We will investigate performance impact
> of partial_reg_stall vs partial_reg_dependency on Haswell for GCC 9.  In
> the meantime, this patch restores both partial_reg_dependency and movx for
> Haswell in GCC 8.
> 
> On Haswell, improvements for EEMBC benchmarks with
> 
> -mtune-ctrl=movx,partial_reg_dependency -Ofast -march=haswell
> 
> vs
> 
> -Ofast -mtune=haswell
> 
> are
> 
> automotive
> =========
>   aifftr01 (default) - goodperf: Runtime improvement of   2.6% (time).
>   aiifft01 (default) - goodperf: Runtime improvement of   2.2% (time).
> 
> networking
> =========
>   ip_pktcheckb1m (default) - goodperf: Runtime improvement of   3.8% (time).
>   ip_pktcheckb2m (default) - goodperf: Runtime improvement of   5.2% (time).
>   ip_pktcheckb4m (default) - goodperf: Runtime improvement of   4.4% (time).
>   ip_pktcheckb512k (default) - goodperf: Runtime improvement of   4.2% (time).
> 
> telecom
> =========
>   fft00data_1 (default) - goodperf: Runtime improvement of   8.4% (time).
>   fft00data_2 (default) - goodperf: Runtime improvement of   8.6% (time).
>   fft00data_3 (default) - goodperf: Runtime improvement of   9.0% (time).

Thanks for the data. Why did you commit the patch to the release branch only?
The patch is OK for mainline too.

I do not have access to the benchmark, so I cannot check. Why do we get
the improvements here, and how does that behave on skylake+?

honza
> 
>       PR target/85829
>       * config/i386/x86-tune.def: Re-enable partial_reg_dependency
>       and movx for Haswell.
> ---
>  gcc/config/i386/x86-tune.def | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
> index 5649fdcf416..60625668236 100644
> --- a/gcc/config/i386/x86-tune.def
> +++ b/gcc/config/i386/x86-tune.def
> @@ -48,7 +48,7 @@ DEF_TUNE (X86_TUNE_SCHEDULE, "schedule",
>     over partial stores.  For example preffer MOVZBL or MOVQ to load 8bit
>     value over movb.  */
>  DEF_TUNE (X86_TUNE_PARTIAL_REG_DEPENDENCY, "partial_reg_dependency",
> -          m_P4_NOCONA | m_CORE2 | m_NEHALEM  | m_SANDYBRIDGE
> +          m_P4_NOCONA | m_CORE2 | m_NEHALEM  | m_SANDYBRIDGE | m_HASWELL
>         | m_BONNELL | m_SILVERMONT | m_INTEL
>         | m_KNL | m_KNM | m_AMD_MULTIPLE | m_SKYLAKE_AVX512 | m_GENERIC)
>  
> @@ -84,7 +84,7 @@ DEF_TUNE (X86_TUNE_PARTIAL_FLAG_REG_STALL, "partial_flag_reg_stall",
>     partial dependencies.  */
>  DEF_TUNE (X86_TUNE_MOVX, "movx",
>            m_PPRO | m_P4_NOCONA | m_CORE2 | m_NEHALEM  | m_SANDYBRIDGE
> -       | m_BONNELL | m_SILVERMONT | m_KNL | m_KNM | m_INTEL
> +       | m_BONNELL | m_SILVERMONT | m_KNL | m_KNM | m_INTEL | m_HASWELL
>         | m_GEODE | m_AMD_MULTIPLE | m_SKYLAKE_AVX512 | m_GENERIC)
>  
>  /* X86_TUNE_MEMORY_MISMATCH_STALL: Avoid partial stores that are followed by
> -- 
> 2.17.0
> 
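To make the question about where the gains come from more concrete, here is
a minimal sketch of the kind of byte-oriented loop that the movx and
partial_reg_dependency knobs can influence. It is an illustration only:
byte_sum is a hypothetical function, not taken from EEMBC or from GCC, and
whether the generated code actually differs between the two option sets
depends on the compiler version and on whether the loop gets vectorized.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: sum the bytes of a buffer.
 *
 * Each p[i] is an 8-bit load that must be widened to 32 bits before the
 * add.  The compiler can either load into a byte register (movb), which
 * leaves the upper bits of the destination register untouched, or use a
 * zero-extending load (movzbl), which writes the whole register.  The
 * tuning flags discussed in this thread bias that choice.
 */
uint32_t byte_sum(const uint8_t *p, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += p[i];    /* 8-bit load, zero-extended to 32 bits */

    return sum;
}
```

Compiling this with -S under the two option sets quoted in the commit message
(-mtune-ctrl=movx,partial_reg_dependency -Ofast -march=haswell versus
-Ofast -mtune=haswell) and comparing the byte loads in the output is one way
to see what the flags change.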
