On Tue, Sep 11, 2018 at 9:01 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
> On Tue, Sep 4, 2018 at 9:01 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
>> On Tue, Aug 28, 2018 at 11:04 AM, H.J. Lu <hongjiu...@intel.com> wrote:
>>> With -mavx, for
>>>
>>> [hjl@gnu-cfl-1 skx-2]$ cat foo.i
>>> extern float f;
>>> extern double d;
>>> extern int i;
>>>
>>> void
>>> foo (void)
>>> {
>>>   d = f;
>>>   f = i;
>>> }
>>>
>>> we need to generate
>>>
>>>         vxorp[ds]       %xmmN, %xmmN, %xmmN
>>>         ...
>>>         vcvtss2sd       f(%rip), %xmmN, %xmmX
>>>         ...
>>>         vcvtsi2ss       i(%rip), %xmmN, %xmmY
>>>
>>> to avoid partial XMM register stall.  This patch adds a pass to generate
>>> a single
>>>
>>>         vxorps          %xmmN, %xmmN, %xmmN
>>>
>>> at function entry, which is shared by all SF and DF conversions, instead
>>> of generating one
>>>
>>>         vxorp[ds]       %xmmN, %xmmN, %xmmN
>>>
>>> for each SF/DF conversion.
>>>
>>> Performance impacts on SPEC CPU 2017 rate with 1 copy using
>>>
>>> -Ofast -march=native -mfpmath=sse -fno-associative-math -funroll-loops
>>>
>>> are
>>>
>>> 1. On Broadwell server:
>>>
>>> 500.perlbench_r (-0.82%)
>>> 502.gcc_r (0.73%)
>>> 505.mcf_r (-0.24%)
>>> 520.omnetpp_r (-2.22%)
>>> 523.xalancbmk_r (-1.47%)
>>> 525.x264_r (0.31%)
>>> 531.deepsjeng_r (0.27%)
>>> 541.leela_r (0.85%)
>>> 548.exchange2_r (-0.11%)
>>> 557.xz_r (-0.34%)
>>> Geomean: (-0.23%)
>>>
>>> 503.bwaves_r (0.00%)
>>> 507.cactuBSSN_r (-1.88%)
>>> 508.namd_r (0.00%)
>>> 510.parest_r (-0.56%)
>>> 511.povray_r (0.49%)
>>> 519.lbm_r (-1.28%)
>>> 521.wrf_r (-0.28%)
>>> 526.blender_r (0.55%)
>>> 527.cam4_r (-0.20%)
>>> 538.imagick_r (2.52%)
>>> 544.nab_r (-0.18%)
>>> 549.fotonik3d_r (-0.51%)
>>> 554.roms_r (-0.22%)
>>> Geomean: (0.00%)
>>>
>>> 2. On Skylake client:
>>>
>>> 500.perlbench_r (-0.29%)
>>> 502.gcc_r (-0.36%)
>>> 505.mcf_r (1.77%)
>>> 520.omnetpp_r (-0.26%)
>>> 523.xalancbmk_r (-3.69%)
>>> 525.x264_r (-0.32%)
>>> 531.deepsjeng_r (0.00%)
>>> 541.leela_r (-0.46%)
>>> 548.exchange2_r (0.00%)
>>> 557.xz_r (0.00%)
>>> Geomean: (-0.34%)
>>>
>>> 503.bwaves_r (0.00%)
>>> 507.cactuBSSN_r (-0.56%)
>>> 508.namd_r (0.87%)
>>> 510.parest_r (0.00%)
>>> 511.povray_r (-0.73%)
>>> 519.lbm_r (0.84%)
>>> 521.wrf_r (0.00%)
>>> 526.blender_r (-0.81%)
>>> 527.cam4_r (-0.43%)
>>> 538.imagick_r (2.55%)
>>> 544.nab_r (0.28%)
>>> 549.fotonik3d_r (0.00%)
>>> 554.roms_r (0.32%)
>>> Geomean: (0.12%)
>>>
>>> 3. On Skylake server:
>>>
>>> 500.perlbench_r (-0.55%)
>>> 502.gcc_r (0.69%)
>>> 505.mcf_r (0.00%)
>>> 520.omnetpp_r (-0.33%)
>>> 523.xalancbmk_r (-0.21%)
>>> 525.x264_r (-0.27%)
>>> 531.deepsjeng_r (0.00%)
>>> 541.leela_r (0.00%)
>>> 548.exchange2_r (-0.11%)
>>> 557.xz_r (0.00%)
>>> Geomean: (0.00%)
>>>
>>> 503.bwaves_r (0.58%)
>>> 507.cactuBSSN_r (0.00%)
>>> 508.namd_r (0.00%)
>>> 510.parest_r (0.18%)
>>> 511.povray_r (-0.58%)
>>> 519.lbm_r (0.25%)
>>> 521.wrf_r (0.40%)
>>> 526.blender_r (0.34%)
>>> 527.cam4_r (0.19%)
>>> 538.imagick_r (5.87%)
>>> 544.nab_r (0.17%)
>>> 549.fotonik3d_r (0.00%)
>>> 554.roms_r (0.00%)
>>> Geomean: (0.62%)
>>>
>>> On Skylake client, impacts on 538.imagick_r are
>>>
>>> size before:
>>>
>>>    text    data     bss     dec     hex filename
>>> 2555577   10876    5576 2572029  273efd imagick_r.exe
>>>
>>> size after:
>>>
>>>    text    data     bss     dec     hex filename
>>> 2511825   10876    5576 2528277  269415 imagick_r.exe
>>>
>>> number of vxorp[ds]:
>>>
>>> before   after   difference
>>> 14570    4515    -69%
>>>
>>> OK for trunk?
>>>
>>> Thanks.
>>>
>>>
>>> H.J.
>>> ---
>>> gcc/
>>>
>>> 2018-08-28  H.J. Lu  <hongjiu...@intel.com>
>>> 	    Sunil K Pandey  <sunil.k.pan...@intel.com>
>>>
>>> 	PR target/87007
>>> 	* config/i386/i386-passes.def: Add
>>> 	pass_remove_partial_avx_dependency.
>>> 	* config/i386/i386-protos.h
>>> 	(make_pass_remove_partial_avx_dependency): New.
>>> 	* config/i386/i386.c (make_pass_remove_partial_avx_dependency):
>>> 	New function.
>>> 	(pass_data_remove_partial_avx_dependency): New.
>>> 	(pass_remove_partial_avx_dependency): Likewise.
>>> 	(make_pass_remove_partial_avx_dependency): Likewise.
>>> 	* config/i386/i386.md (SF/DF conversion splitters): Disabled
>>> 	for TARGET_AVX.
>>>
>>> gcc/testsuite/
>>>
>>> 2018-08-28  H.J. Lu  <hongjiu...@intel.com>
>>> 	    Sunil K Pandey  <sunil.k.pan...@intel.com>
>>>
>>> 	PR target/87007
>>> 	* gcc.target/i386/pr87007.c: New file.
>>
>>
>> PING:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01781.html
>>
>
> PING.
>
Hi Kirill, Jakub, Jan,

Can you take a look?

Thanks.

-- 
H.J.