On Sun, 6 Jan 2019, Jan Hubicka wrote:
> Hello,
> while running benchmarks for inliner tuning I also ran benchmarks
> comparing -O2 and -O2 -ftree-vectorize -ftree-slp-vectorize using Martin
> Liska's LNT setup (https://lnt.opensuse.org/). The results are
> summarized below, but you can also see a colorful table produced
> by Martin's LNT magic:
>
> https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?num_runs=3_percentage_change=0.02=746f%2C55f=IwAR1EhvEnavV5Fg5g404cTrguOXG2cW7b3mRZZvtYn1qy93zihyAanZ7AiWQ
> https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?num_runs=10_percentage_change=0.02=746f%2C55f
>
> Overall we got the following SPECrate improvements:
>
> SPECfp2k6 kabylake generic +7.15%
> SPECfp2k6 kabylake native +9.36%
> SPECfp2k17 kabylake generic +5.36%
> SPECfp2k17 kabylake native +6.03%
> SPECint2k17 kabylake generic +4.13%
>
> SPECfp2k6 zen generic +9.98%
> SPECfp2k6 zen native +7.04%
> SPECfp2k17 zen generic +6.11%
> SPECfp2k17 zen native +5.46%
> SPECint2k17 zen generic +3.61%
> SPECint2k17 zen native +5.18%
>
> The performance results seem surprisingly strongly in favor of
> vectorization. Martin's setup also checks code size, which goes up
> by as much as 26% on leslie3d, but since many of the benchmarks are
> small, this is not very representative of the overall code size/compile
> time costs of vectorization.
>
> I measured compile time/size on the larger programs I have available:
> there are notable changes on DealII, but otherwise the increases are
> below 1%. I also benchmarked Firefox, but there are no significant
> differences because its build system already uses -O3 where it matters
> (graphics libraries etc.).
Well, as much as compile time/size on SPEC is not representative,
the performance improvements are.
>                    Compile time   Code segment size
> Firefox mainline   in noise       0.8%
> gcc from spec2k6   0.5%           0.6%
> gdb                0.8%           0.3%
> crafty             0%             0%
> DealII             3.2%           4%
>
> Note that I benchmarked -ftree-slp-vectorize separately before and the
> results were hit or miss, so perhaps enabling only -ftree-vectorize would
> give a better compile time tradeoff. I was worried about partial memory
> stalls, but I will benchmark it and also benchmark the difference between
> cost models.
>
> There are some performance regressions, most notably in SPEC:
> - exchange (all settings),
> - gamess (all settings),
> - calculix (Zen native only),
> - bwaves (Zen native),
> and in Polyhedron: induct2 (all settings) and ffft2 (Zen only). Botan
> seems very noisy, but it is rather special code.
>
> Exchange can be fixed by adding a heuristic that it is a bad idea to
> vectorize within a loop nest of depth 10 containing a recursive call. I
> believe the gamess and calculix regressions are understood, and I can
> look into the remaining cases.
>
> Overall I am surprised how much improvement vectorization at -O2 can
> bring; clearly, increasingly parallel CPUs depend on it. In my experience
> from analyzing regressions of gcc -O2 compared to clang -O2 builds,
> vectorization is one of the most common reasons. Having gcc -O2 produce
> lower SPEC scores and binaries comparable in size to clang -O2 does not
> feel OK, and I think the problem is not limited to artificial
> benchmarks.
>
> Even though it is late in the release cycle, I wonder if we can do this
> for GCC 9. Since vectorization performance is very architecture-specific,
> I would propose enabling vectorization for Zen, Core-based chips and the
> generic tuning on x86-64. I can also run benchmarks on Bulldozer. I can
> then tune down the cheap cost model to avoid some of the more expensive
> transformations.
I'd rather not do this now, it's _way_ too late (also considering
you are again doing inliner tuning so late).
See our last attempts at this btw.
Richard.
> Honza
>
>
> Kabylake SPEC2k6, generic tuning
>
> improvements:
> SPEC2006/FP/481.wrf        -31.33%
> SPEC2006/FP/436.cactusADM  -28.17%
> SPEC2006/FP/437.leslie3d   -17.21%
> SPEC2006/FP/434.zeusmp     -12.90%
> SPEC2006/FP/454.calculix    -6.44%
> SPEC2006/FP/433.milc        -6.03%
> SPEC2006/FP/459.GemsFDTD    -4.65%
> SPEC2006/FP/450.soplex      -2.11%
> SPEC2006/INT/403.gcc        -6.54%
> SPEC2006/INT/456.hmmer      -5.45%
> SPEC2006/INT/464.h264ref    -2.23%
> regressions:
> SPEC2006/FP/416.gamess       8.51%
> SPEC2006/FP/447.dealII       2.73%
>
> Kabylake SPEC2k6, -march=native
>
> improvements:
> SPEC2006/FP/436.cactusADM  -45.52%
> SPEC2006/FP/481.wrf        -34.13%
> SPEC2006/FP/434.zeusmp     -20.25%
> SPEC2006/FP/437.leslie3d   -19.44%
>