[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

rguenth at gcc dot gnu.org Tue, 21 Apr 2020 00:23:40 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359


--- Comment #27 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #24)
> (In reply to Richard Biener from comment #22)
> > Created attachment 48311 [details]
> > patch
> > 
> > Note that apart from the possible bad impact on optimization when fixing
> > this bug an actual fix is complicated by the custom "optimized" dependence
> > analysis code in the loop invariant motion pass.
> > 
> > A conservative "simple" patch would be the attached but that doesn't
> > preserve store-motion for the following (because the LIM data dependence
> > code doesn't care about stmt order):
> > 
> > typedef int A;
> > typedef float B;
> > 
> > void __attribute__((noinline,noclone))
> > foo(A *p, B *q, long unk)
> > {
> >   for (long i = 0; i < unk; ++i) {
> >       q[i] = 42;
> >       *p = 1;
> >   }
> > }
> > 
> > Usually this bug doesn't manifest itself, but of course the fix will be
> > experienced everywhere.  Benchmarking the simple patch might reveal
> > it's not an issue (but I doubt that...).
> 
> One case like this is gcc.dg/tree-ssa/pr81744.c which fails after the patch
> because we do not SM the global induction variable update which is already
> last before exit.  Similarly gcc.dg/graphite/pr80906.c and
> gcc.target/i386/pr64110.c - that's all of the GCC testsuite fallout on
> x86_64.  I do not think those regressions are acceptable on their own, but
> I'll throw the patch on SPEC CPU 2006 to get more data (I fear even a
> solution avoiding the cited regressions will regress actual code too much).
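
For reference, the store motion at stake in the quoted foo() example can be
sketched by hand as below.  This is only an approximation written for
illustration, not actual GCC output, and the name foo_after_sm is made up.
The int store is the last statement of every iteration, so doing it once
on loop exit leaves the same final memory contents, even if p and q point
into the same storage.

```c
typedef int A;
typedef float B;

/* Original loop from the quoted test case. */
void foo(A *p, B *q, long unk)
{
  for (long i = 0; i < unk; ++i) {
      q[i] = 42;
      *p = 1;
  }
}

/* Hand-written approximation of the loop after store motion: the
   loop-invariant store to *p is performed once after the loop.
   foo_after_sm is a hypothetical name used only for this sketch. */
void foo_after_sm(A *p, B *q, long unk)
{
  if (unk <= 0)
    return;               /* loop body never runs, *p stays untouched */
  for (long i = 0; i < unk; ++i)
    q[i] = 42;
  *p = 1;                 /* single store on loop exit */
}
```

Both functions leave *p == 1 and q[0..unk-1] == 42 whenever unk > 0, and
leave *p alone otherwise.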

Results on an x86 Haswell CPU (-Ofast -march=native -flto), with base being
unpatched and peak being patched current trunk:

                                  Estimated                       Estimated
                Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
410.bwaves      13590        170       80.0 *   13590        175       77.6 *
416.gamess      19580        614       31.9 *   19580        614       31.9 *
433.milc         9180        335       27.4 *    9180        338       27.2 *
434.zeusmp       9100        227       40.0 *    9100        228       39.8 *
435.gromacs      7140        244       29.2 *    7140        245       29.2 *
436.cactusADM   11950        225       53.2 *   11950        224       53.3 *
437.leslie3d     9400        217       43.4 *    9400        225       41.8 *
444.namd         8020        304       26.4 *    8020        302       26.5 *
447.dealII      11440        201       56.8 *   11440        202       56.6 *
450.soplex       8340        226       36.9 *    8340        227       36.7 *
453.povray       5320        101       52.8 *    5320        101       52.9 *
454.calculix     8250        265       31.1 *    8250        265       31.1 *
459.GemsFDTD    10610        316       33.5 *   10610        315       33.6 *
465.tonto        9840        258       38.1 *    9840        258       38.1 *
470.lbm         13740        256       53.7 *   13740        261       52.7 *
481.wrf         11170        235       47.5 *   11170        237       47.2 *
482.sphinx3     19490        370       52.7 *   19490        373       52.3 *
 Est. SPECfp_base2006                  41.3
 Est. SPECfp2006                                                       41.0

400.perlbench    9770        249       39.2 *    9770        248       39.3 *
401.bzip2        9650        388       24.9 *    9650        389       24.8 *
403.gcc          8050        228       35.3 *    8050        230       35.0 *
429.mcf          9120        246       37.1 *    9120        241       37.9 *
445.gobmk       10490        388       27.1 *   10490        388       27.0 *
456.hmmer        9330        152       61.3 *    9330        151       61.7 *
458.sjeng       12100        426       28.4 *   12100        428       28.3 *
462.libquantum  20720        314       66.0 *   20720        308       67.3 *
464.h264ref     22130        414       53.5 *   22130        414       53.4 *
471.omnetpp      6250        290       21.5 *    6250        301       20.8 *
473.astar        7020        308       22.8 *    7020        308       22.8 *
483.xalancbmk    6900        180       38.4 *    6900        181       38.2 *
 Est. SPECint(R)_base2006              35.5
 Est. SPECint2006                                                      35.5

The "positive" ones are actually noise, where the SPEC median did not pick
the fastest run for base.  There are enough negatives to rule out this
simple solution.  The actual consistent negatives to look at (to make sure
a better solution handles the required cases) are 437.leslie3d and
471.omnetpp; the rest are too much in the noise.
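
To put the run times from the table in relative terms, a small helper along
these lines could be used.  It is only a sketch; the names
runtime_change_pct and print_change are made up for illustration and are
not part of any SPEC tooling.

```c
#include <stdio.h>

/* Relative run-time change of peak versus base, in percent.
   A positive value means the patched (peak) build is slower. */
double runtime_change_pct(double base_s, double peak_s)
{
  return (peak_s - base_s) / base_s * 100.0;
}

/* Print one benchmark's change, e.g. for run times taken from the table. */
void print_change(const char *name, double base_s, double peak_s)
{
  printf("%-14s %+.1f%%\n", name, runtime_change_pct(base_s, peak_s));
}
```

With the run times above, 437.leslie3d (217 s to 225 s) comes out at about
3.7% slower and 471.omnetpp (290 s to 301 s) at about 3.8% slower.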
