https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85175

Jeffrey A. Law <law at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at redhat dot com

--- Comment #10 from Jeffrey A. Law <law at redhat dot com> ---
So in general you're going to see fewer false positives with -O2 when compared
to -Os.  And the testcase in this BZ is no exception.

Looking at the .thread2 dump we have this:

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  i = 0;
  goto <bb 4>; [100.00%]
;;    succ:       4

;;   basic block 3, loop depth 1
;;    pred:       4
  __builtin_sprintf (&clkname, "di%d_sel", i.2_3);
  clkname ={v} {CLOBBER};
  i.1_1 = i;
  _2 = i.1_1 + 1;
  i = _2;
;;    succ:       4

;;   basic block 4, loop depth 1
;;    pred:       2
;;                3
  i.2_3 = i;
  if (i.2_3 <= 3)
    goto <bb 3>; [89.00%]
  else
    goto <bb 5>; [11.00%]
;;    succ:       3
;;                5


What we need to expose is the range of i.2_3 as [0..3].  The addressability of
"i" is an issue, but not a show-stopper.

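For orientation, a C loop of roughly the following shape would give a dump like the one above.  This is a reconstruction from the dump, not the testcase attached to the BZ: the function name, the observe() helper that makes "i" addressable, and the buffer size are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: taking &i forces "i" to live in memory, which is
   why the dump reloads it (i.2_3 = i) on every trip through block 4.  */
static void observe(int *p) { (void)p; }

/* Hypothetical function name.  Returns the number of names formatted.  */
int format_clock_names(void)
{
    int i;
    int count = 0;

    observe(&i);
    for (i = 0; i <= 3; i++) {
        char clkname[32];              /* assumed size, not from the BZ */

        /* The fact the compiler has to prove about i.2_3: [0..3].  */
        assert(i >= 0 && i <= 3);
        sprintf(clkname, "di%d_sel", i);
        count++;
    }
    return count;
}
```

With that range known, the formatted string is at most 7 characters plus the terminating NUL.
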
The key here is block #4.  It's got two preds, one from outside the loop, one
from inside the loop.  And it's possible to thread the edge from outside the
loop.  When we do that it's largely mirroring loop header copying.  So why
don't we thread that edge?
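
As a rough sketch of what threading the entry edge buys, here are the two shapes written by hand in C.  The function names and the start parameter are assumptions added so the two forms can be compared; in the dump the initial value is the constant 0, so the copied test on the entry path would fold away.

```c
#include <assert.h>

/* Shape in the dump: one test block (bb 4) reached both from the entry
   (bb 2) and from the loop body (bb 3).  */
int count_unthreaded(int start)
{
    int i = start, n = 0;

    goto test;
body:
    n++;
    i++;
test:
    if (i <= 3)
        goto body;
    return n;
}

/* After threading the entry edge: the entry path gets its own copy of
   the test, much like loop header copying, and the loop becomes
   bottom-tested.  */
int count_threaded(int start)
{
    int i = start, n = 0;

    if (!(i <= 3))          /* copied test; constant when start is 0 */
        return n;
    do {
        assert(i <= 3);     /* holds on every edge into the body */
        n++;
        i++;
    } while (i <= 3);
    return n;
}
```

Both forms compute the same result; the second simply has a separate test on the entry path, which is the copy that becomes trivially true for an initial value of 0.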

DOM discovers the jump thread, but then -Os kicks in.  When -Os is enabled we
severely throttle jump threading because its inherent block copying increases
code size.

But in this case I don't think it's going to result in any net new code; in
fact, it should simplify the result.

Sadly, reality is different.  If I hack up the compiler to allow threading in
this case the resultant code is 3 bytes longer.  But that's entirely because of
how we initialize "i" on the stack.  Instead of using the value 0 that's
conveniently in a register, we generate a move-immediate into the stack
slot, which is 4 bytes longer.

I'm very tempted to try to fix the threader and hand the size regression off
to the x86 maintainers.
