>
> This is the patch I am going to check into GCC 8.
>
> --
> H.J.
> From 9ecbfa1fd04dc4370a9ec4f3d56189cc07aee668 Mon Sep 17 00:00:00 2001
> From: "H.J. Lu" <hjl.to...@gmail.com>
> Date: Thu, 17 May 2018 09:52:09 -0700
> Subject: [PATCH] x86: Re-enable partial_reg_dependency and movx for Haswell
>
> r254152 disabled partial_reg_dependency and movx for Haswell and newer
> Intel processors. r258972 restored them for skylake-avx512. For Haswell,
> movx improves performance. But partial_reg_stall may be better than
> partial_reg_dependency in theory. We will investigate performance impact
> of partial_reg_stall vs partial_reg_dependency on Haswell for GCC 9. In
> the meantime, this patch restores both partial_reg_dependency and movx for
> Haswell in GCC 8.
>
> On Haswell, improvements for EEMBC benchmarks with
>
> -mtune-ctrl=movx,partial_reg_dependency -Ofast -march=haswell
>
> vs
>
> -Ofast -mtune=haswell
>
> are
>
> automotive
> =========
> aifftr01 (default) - goodperf: Runtime improvement of 2.6% (time).
> aiifft01 (default) - goodperf: Runtime improvement of 2.2% (time).
>
> networking
> =========
> ip_pktcheckb1m (default) - goodperf: Runtime improvement of 3.8% (time).
> ip_pktcheckb2m (default) - goodperf: Runtime improvement of 5.2% (time).
> ip_pktcheckb4m (default) - goodperf: Runtime improvement of 4.4% (time).
> ip_pktcheckb512k (default) - goodperf: Runtime improvement of 4.2% (time).
>
> telecom
> =========
> fft00data_1 (default) - goodperf: Runtime improvement of 8.4% (time).
> fft00data_2 (default) - goodperf: Runtime improvement of 8.6% (time).
> fft00data_3 (default) - goodperf: Runtime improvement of 9.0% (time).

Thanks for the data. Why did you commit the patch to the release branch only?
The patch is OK for mainline too.

I do not have access to the benchmark, so I cannot check. Why do we get the
improvements here, and how does that behave on skylake+?

honza
>
> PR target/85829
> * config/i386/x86-tune.def: Re-enable partial_reg_dependency
> and movx for Haswell.
> ---
>  gcc/config/i386/x86-tune.def | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
> index 5649fdcf416..60625668236 100644
> --- a/gcc/config/i386/x86-tune.def
> +++ b/gcc/config/i386/x86-tune.def
> @@ -48,7 +48,7 @@ DEF_TUNE (X86_TUNE_SCHEDULE, "schedule",
>     over partial stores. For example preffer MOVZBL or MOVQ to load 8bit
>     value over movb. */
>  DEF_TUNE (X86_TUNE_PARTIAL_REG_DEPENDENCY, "partial_reg_dependency",
> -          m_P4_NOCONA | m_CORE2 | m_NEHALEM | m_SANDYBRIDGE
> +          m_P4_NOCONA | m_CORE2 | m_NEHALEM | m_SANDYBRIDGE | m_HASWELL
>            | m_BONNELL | m_SILVERMONT | m_INTEL
>            | m_KNL | m_KNM | m_AMD_MULTIPLE | m_SKYLAKE_AVX512 | m_GENERIC)
>
> @@ -84,7 +84,7 @@ DEF_TUNE (X86_TUNE_PARTIAL_FLAG_REG_STALL, "partial_flag_reg_stall",
>     partial dependencies. */
>  DEF_TUNE (X86_TUNE_MOVX, "movx",
>            m_PPRO | m_P4_NOCONA | m_CORE2 | m_NEHALEM | m_SANDYBRIDGE
> -          | m_BONNELL | m_SILVERMONT | m_KNL | m_KNM | m_INTEL
> +          | m_BONNELL | m_SILVERMONT | m_KNL | m_KNM | m_INTEL | m_HASWELL
>            | m_GEODE | m_AMD_MULTIPLE | m_SKYLAKE_AVX512 | m_GENERIC)
>
>  /* X86_TUNE_MEMORY_MISMATCH_STALL: Avoid partial stores that are followed by
> --
> 2.17.0
>
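
For readers unfamiliar with x86-tune.def, here is a rough sketch of what the
two hunks amount to: each DEF_TUNE entry carries a mask of processors, and the
patch ORs the Haswell bit back into two of those masks. This is not GCC's
actual code. The CPU_* bits, the MOVX_* masks and tune_enabled() are
hypothetical names made up for illustration, and the processor list is cut
down to three entries.

```c
#include <stdbool.h>

/* Hypothetical one-bit-per-processor values, standing in for GCC's m_*
   masks.  Only three processors are modelled here.  */
enum {
  CPU_SANDYBRIDGE    = 1 << 0,
  CPU_HASWELL        = 1 << 1,
  CPU_SKYLAKE_AVX512 = 1 << 2
};

/* A reduced "movx" mask before and after the patch: the only difference
   is that the Haswell bit is ORed in again.  */
enum {
  MOVX_BEFORE = CPU_SANDYBRIDGE | CPU_SKYLAKE_AVX512,
  MOVX_AFTER  = CPU_SANDYBRIDGE | CPU_HASWELL | CPU_SKYLAKE_AVX512
};

/* A tuning applies to a processor when that processor's bit is set in
   the tuning's mask.  */
static bool tune_enabled(unsigned tune_mask, unsigned cpu_bit)
{
  return (tune_mask & cpu_bit) != 0;
}
```

Under these assumptions, tune_enabled(MOVX_BEFORE, CPU_HASWELL) is false and
tune_enabled(MOVX_AFTER, CPU_HASWELL) is true, while the result for the other
two processors is the same with both masks.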