https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85175
Jeffrey A. Law <law at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at redhat dot com

--- Comment #10 from Jeffrey A. Law <law at redhat dot com> ---
So in general you're going to see fewer false positives with -O2 when compared
to -Os.  And the testcase in this BZ is no exception.

Looking at the .thread2 dump we have this:

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  i = 0;
  goto <bb 4>; [100.00%]
;;    succ:       4

;;   basic block 3, loop depth 1
;;    pred:       4
  __builtin_sprintf (&clkname, "di%d_sel", i.2_3);
  clkname ={v} {CLOBBER};
  i.1_1 = i;
  _2 = i.1_1 + 1;
  i = _2;
;;    succ:       4

;;   basic block 4, loop depth 1
;;    pred:       2
;;                3
  i.2_3 = i;
  if (i.2_3 <= 3)
    goto <bb 3>; [89.00%]
  else
    goto <bb 5>; [11.00%]
;;    succ:       3
;;                5

What we need to expose is the range of i.2_3 as [0..3].  The addressability of
"i" is an issue, but not a show-stopper.

The key here is block #4.  It's got two preds, one from outside the loop, one
from inside the loop.  And it's possible to thread the edge from outside the
loop.  When we do that it's largely mirroring loop header copying.

So why don't we thread that edge?  DOM discovers the jump thread, but -Os kicks
in.  When -Os is enabled we severely throttle jump threading because it
increases codesize due to its inherent block copying.  But in this case I don't
think it's going to result in any net new code, in fact, it should simplify the
result.

Sadly, reality is different.  If I hack up the compiler to allow threading in
this case the resultant code is 3 bytes longer.  But that's entirely because of
how we initialize "i" in the stack.  Instead of using the value 0 that's
conveniently in a register we instead generate a move-immediate into the stack
slot which is 4 bytes longer.

I'm very tempted to try and fix the threader and move the size regression issue
to the x86 maintainers.
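
For readers who want to see the two loop shapes at the source level, here is a
rough C sketch.  It is not the testcase from this BZ and not what the threader
literally emits; it only mirrors the CFG in the dump.  The buffer size
CLKNAME_SIZE, the escape() helper, the function names, the iteration counter
and the "last" out-parameter are all made up for illustration, and snprintf is
used in place of sprintf so the sketch stays safe to run.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical size: fits "di0_sel".."di3_sel" plus the NUL, but would be
   too small for an arbitrary int.  Not taken from the bug report.  */
enum { CLKNAME_SIZE = 8 };

/* Hypothetical helper that lets the address of i escape, so i lives in
   memory and is loaded/stored as in the dump.  */
static int *volatile escaped;
static void escape (int *p) { escaped = p; }

/* Shape in the dump: the test sits in a header block (block 4) that is
   reached both from the entry (block 2) and from the loop body (block 3).  */
static int
loop_test_at_top (char last[CLKNAME_SIZE])
{
  int i;
  int iterations = 0;

  escape (&i);
  for (i = 0; i <= 3; i++)
    {
      char clkname[CLKNAME_SIZE];
      int n = snprintf (clkname, sizeof clkname, "di%d_sel", i);
      assert (n > 0 && (size_t) n < sizeof clkname);
      strcpy (last, clkname);
      iterations++;
    }
  return iterations;
}

/* Shape after threading the entry edge: i was just set to 0, so the first
   test "0 <= 3" is known to be true and the entry can jump straight into
   the body.  The remaining test is only reached from inside the loop,
   much like after loop header copying.  */
static int
loop_test_at_bottom (char last[CLKNAME_SIZE])
{
  int i;
  int iterations = 0;

  escape (&i);
  i = 0;
  do
    {
      char clkname[CLKNAME_SIZE];
      int n = snprintf (clkname, sizeof clkname, "di%d_sel", i);
      assert (n > 0 && (size_t) n < sizeof clkname);
      strcpy (last, clkname);
      iterations++;
      i++;
    }
  while (i <= 3);
  return iterations;
}
```

Both functions run the body four times and leave "di3_sel" in the caller's
buffer.  The difference is only in where the exit test lives: in the second
form every path into the body comes either from the i = 0 store or from a
test that just succeeded.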