https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89071

--- Comment #21 from hjl at gcc dot gnu.org <hjl at gcc dot gnu.org> ---
Author: hjl
Date: Fri Feb 22 15:54:08 2019
New Revision: 269119

URL: https://gcc.gnu.org/viewcvs?rev=269119&root=gcc&view=rev
Log:
i386: Add pass_remove_partial_avx_dependency

With -mavx, for

$ cat foo.i
extern float f;
extern double d;
extern int i;

void
foo (void)
{
  d = f;
  f = i;
}

we need to generate

        vxorp[ds]       %xmmN, %xmmN, %xmmN
        ...
        vcvtss2sd       f(%rip), %xmmN, %xmmX
        ...
        vcvtsi2ss       i(%rip), %xmmN, %xmmY

to avoid a partial XMM register stall.  This patch adds a pass to generate
a single

        vxorps          %xmmN, %xmmN, %xmmN

at the entry of the nearest dominator for basic blocks with SF/DF conversions,
which is in the fake loop that contains the whole function, instead of
generating one

        vxorp[ds]       %xmmN, %xmmN, %xmmN

for each SF/DF conversion.
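
The placement rule described above can be sketched in plain C.  This is
only an illustration under stated assumptions, not the code added to
i386.c: struct block_info, NO_LOOP, nearest_common_dominator,
zeroing_block and example_cfg are hypothetical names over a hand-built
CFG, whereas the real pass works on GCC's own CFG, dominator and loop
data.  The sketch also assumes the entry block is never a loop header.

```c
#include <assert.h>

#define NO_LOOP (-1)

/* Hypothetical, simplified CFG description: one entry per basic block.  */
struct block_info
{
  int idom;         /* Immediate dominator; the entry block points to itself.  */
  int dom_depth;    /* Depth in the dominator tree; the entry block is 0.  */
  int loop_header;  /* Header of the innermost real loop, or NO_LOOP.  */
};

/* Nearest common dominator of blocks A and B: walk the deeper block up
   the dominator tree until both walks meet.  */
static int
nearest_common_dominator (const struct block_info *cfg, int a, int b)
{
  while (a != b)
    {
      if (cfg[a].dom_depth >= cfg[b].dom_depth)
        a = cfg[a].idom;
      else
        b = cfg[b].idom;
    }
  return a;
}

/* Block that should receive the single zeroing instruction for all
   blocks listed in CONV_BLOCKS.  */
static int
zeroing_block (const struct block_info *cfg, const int *conv_blocks,
               int count)
{
  assert (count > 0);
  int bb = conv_blocks[0];
  for (int i = 1; i < count; i++)
    bb = nearest_common_dominator (cfg, bb, conv_blocks[i]);

  /* Leave every real loop, so that only the fake loop containing the
     whole function still contains the chosen block.  */
  while (cfg[bb].loop_header != NO_LOOP)
    bb = cfg[cfg[bb].loop_header].idom;
  return bb;
}

/* Example CFG: a loop whose body converts under a condition.
   0 = entry, 1 = loop header, 2 = "if" test, 3 = conversion,
   4 = latch, 5 = exit.  */
static const struct block_info example_cfg[] = {
  { 0, 0, NO_LOOP },
  { 0, 1, 1 },
  { 1, 2, 1 },
  { 2, 3, 1 },
  { 2, 3, 1 },
  { 1, 2, NO_LOOP },
};
```

For example_cfg, calling zeroing_block with block 3 as the only
conversion block yields block 0, the block in front of the loop, so the
zeroing instruction would execute once rather than on every iteration.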

NB: The LCM algorithm isn't appropriate here since it may place a vxorps
inside a loop.  A simple testcase shows this:

$ cat badcase.c

extern float f;
extern double d;

void
foo (int n, int k)
{
  for (int j = 0; j != n; j++)
    if (j < k)
      d = f;
}

With LCM, GCC generates

    ...
    loop:
      if(j < k)
        vxorps    %xmm0, %xmm0, %xmm0
        vcvtss2sd f(%rip), %xmm0, %xmm0
      ...
    loopend
    ...

This is because LCM moves code only when the benefit is certain.  Since
the conversion sits under a conditional branch, LCM won't move

   vxorps  %xmm0, %xmm0, %xmm0

out of the loop.  SPEC CPU 2017 on Intel Xeon with AVX512 shows:

1. The nearest dominator

|RATE                   |Improvement|
|500.perlbench_r        | 0.55% |
|538.imagick_r          | 8.43% |
|544.nab_r              | 0.71% |

2. LCM

|RATE                   |Improvement|
|500.perlbench_r        | -0.76% |
|538.imagick_r          | 7.96%  |
|544.nab_r              | -0.13% |

Performance impacts of SPEC CPU 2017 rate on Intel Xeon with AVX512
using

-Ofast -flto -march=skylake-avx512 -funroll-loops

before

commit e739972ad6ad05e32a1dd5c29c0b950a4c4bd576
Author: uros <uros@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Jan 31 20:06:42 2019 +0000

            PR target/89071
            * config/i386/i386.md (*extendsfdf2): Split out reg->reg
            alternative to avoid partial SSE register stall for TARGET_AVX.
            (truncdfsf2): Ditto.
            (sse4_1_round<mode>2): Ditto.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@268427
138bc75d-0d04-0410-961f-82ee72b054a4

are:

|INT RATE               |Improvement|
|500.perlbench_r        | 0.55% |
|502.gcc_r              | 0.14% |
|505.mcf_r              | 0.08% |
|523.xalancbmk_r        | 0.18% |
|525.x264_r             |-0.49% |
|531.deepsjeng_r        |-0.04% |
|541.leela_r            |-0.26% |
|548.exchange2_r        |-0.3%  |
|557.xz_r               |BuildSame|

|FP RATE                |Improvement|
|503.bwaves_r           |-0.29% |
|507.cactuBSSN_r        | 0.04% |
|508.namd_r             |-0.74% |
|510.parest_r           |-0.01% |
|511.povray_r           | 2.23% |
|519.lbm_r              | 0.1%  |
|521.wrf_r              | 0.49% |
|526.blender_r          | 0.13% |
|527.cam4_r             | 0.65% |
|538.imagick_r          | 8.43% |
|544.nab_r              | 0.71% |
|549.fotonik3d_r        | 0.15% |
|554.roms_r             | 0.08% |

After commit e739972ad6ad05e32a1dd5c29c0b950a4c4bd576, on Skylake client,
impacts on 538.imagick_r with

-fno-unsafe-math-optimizations -march=native -Ofast -funroll-loops -flto

1. Size comparison:

before:

   text    data     bss     dec     hex filename
2436377    8352    4528 2449257  255f69 imagick_r

after:

   text    data     bss     dec     hex filename
2425249    8352    4528 2438129  2533f1 imagick_r

2. Number of vxorps:

before          after           difference
4948            4135            -19.66%

3. Performance improvement:

|RATE                   |Improvement|
|538.imagick_r          | 5.5%  |

gcc/

2019-02-22  H.J. Lu  <hongjiu...@intel.com>
            Hongtao Liu  <hongtao....@intel.com>
            Sunil K Pandey  <sunil.k.pan...@intel.com>

        PR target/87007
        * config/i386/i386-passes.def: Add
        pass_remove_partial_avx_dependency.
        * config/i386/i386-protos.h
        (make_pass_remove_partial_avx_dependency): New.
        * config/i386/i386.c (make_pass_remove_partial_avx_dependency):
        New function.
        (pass_data_remove_partial_avx_dependency): New.
        (pass_remove_partial_avx_dependency): Likewise.
        (make_pass_remove_partial_avx_dependency): Likewise.
        * config/i386/i386.md (avx_partial_xmm_update): New attribute.
        (*extendsfdf2): Add avx_partial_xmm_update.
        (truncdfsf2): Likewise.
        (*float<SWI48:mode><MODEF:mode>2): Likewise.
        (SF/DF conversion splitters): Disabled for TARGET_AVX.

gcc/testsuite/

2019-02-22  H.J. Lu  <hongjiu...@intel.com>
            Hongtao Liu  <hongtao....@intel.com>
            Sunil K Pandey  <sunil.k.pan...@intel.com>

        PR target/87007
        * gcc.target/i386/pr87007-1.c: New test.
        * gcc.target/i386/pr87007-2.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr87007-1.c
    trunk/gcc/testsuite/gcc.target/i386/pr87007-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386-passes.def
    trunk/gcc/config/i386/i386-protos.h
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.md
    trunk/gcc/testsuite/ChangeLog
