https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359
--- Comment #27 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #24)
> (In reply to Richard Biener from comment #22)
> > Created attachment 48311 [details]
> > patch
> >
> > Note that apart from the possible bad impact on optimization when fixing
> > this bug an actual fix is complicated by the custom "optimized" dependence
> > analysis code in the loop invariant motion pass.
> >
> > A conservative "simple" patch would be the attached but that doesn't
> > preserve store-motion for the following (because the LIM data dependence
> > code doesn't care about stmt order):
> >
> > typedef int A;
> > typedef float B;
> >
> > void __attribute__((noinline,noclone))
> > foo(A *p, B *q, long unk)
> > {
> >   for (long i = 0; i < unk; ++i) {
> >     q[i] = 42;
> >     *p = 1;
> >   }
> > }
> >
> > usually this bug doesn't manifest itself but of course the fix will be
> > experienced everywhere.  Benchmarking the simple patch might reveal
> > it's not an issue (but I doubt that...).
>
> One case like this is gcc.dg/tree-ssa/pr81744.c which fails after the patch
> because we do not SM the global induction variable update which is already
> last before exit.  Similarly gcc.dg/graphite/pr80906.c and
> gcc.target/i386/pr64110.c - that's all of the GCC testsuite fallout on
> x86_64.  I do not think those regressions are acceptable on its own but
> I'll throw the patch on SPEC CPU 2006 to get more data (I fear even a
> solution preserving the cited regressions will regress actual code too
> much).

Results on an x86 Haswell CPU (-Ofast -march=native -flto), base unpatched
and peak patched (current trunk):

                                  Estimated                       Estimated
                 Base       Base       Base      Peak       Peak       Peak
Benchmarks       Ref.   Run Time      Ratio      Ref.   Run Time      Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
410.bwaves      13590        170       80.0 *   13590        175       77.6 *
416.gamess      19580        614       31.9 *   19580        614       31.9 *
433.milc         9180        335       27.4 *    9180        338       27.2 *
434.zeusmp       9100        227       40.0 *    9100        228       39.8 *
435.gromacs      7140        244       29.2 *    7140        245       29.2 *
436.cactusADM   11950        225       53.2 *   11950        224       53.3 *
437.leslie3d     9400        217       43.4 *    9400        225       41.8 *
444.namd         8020        304       26.4 *    8020        302       26.5 *
447.dealII      11440        201       56.8 *   11440        202       56.6 *
450.soplex       8340        226       36.9 *    8340        227       36.7 *
453.povray       5320        101       52.8 *    5320        101       52.9 *
454.calculix     8250        265       31.1 *    8250        265       31.1 *
459.GemsFDTD    10610        316       33.5 *   10610        315       33.6 *
465.tonto        9840        258       38.1 *    9840        258       38.1 *
470.lbm         13740        256       53.7 *   13740        261       52.7 *
481.wrf         11170        235       47.5 *   11170        237       47.2 *
482.sphinx3     19490        370       52.7 *   19490        373       52.3 *
 Est. SPECfp_base2006                  41.3
 Est. SPECfp2006                                                       41.0

400.perlbench    9770        249       39.2 *    9770        248       39.3 *
401.bzip2        9650        388       24.9 *    9650        389       24.8 *
403.gcc          8050        228       35.3 *    8050        230       35.0 *
429.mcf          9120        246       37.1 *    9120        241       37.9 *
445.gobmk       10490        388       27.1 *   10490        388       27.0 *
456.hmmer        9330        152       61.3 *    9330        151       61.7 *
458.sjeng       12100        426       28.4 *   12100        428       28.3 *
462.libquantum  20720        314       66.0 *   20720        308       67.3 *
464.h264ref     22130        414       53.5 *   22130        414       53.4 *
471.omnetpp      6250        290       21.5 *    6250        301       20.8 *
473.astar        7020        308       22.8 *    7020        308       22.8 *
483.xalancbmk    6900        180       38.4 *    6900        181       38.2 *
 Est. SPECint(R)_base2006              35.5
 Est. SPECint2006                                                      35.5

The "positive" ones are actually noise where the SPEC median did not pick
the fastest run for base.  There are enough negatives to rule out this
simple solution.  The consistent negatives to look at (to make sure a
better solution handles the required cases) are 437.leslie3d and
471.omnetpp; the rest are too far in the noise.