https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82004

--- Comment #36 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #35)
> Created attachment 43763 [details]
> pr82004_dumps.tar.xz
> 
> Dumps.  For lto I've just added the init_sw_absorption function parts of the
> dump, the dumps are too large.

Skipping partial redundancy for expression
{plus_expr,logchl_926,1.00000000000000002081668171172168513294309377670288085938e-2}
(0812), no redundancy on to be optimized for speed edge
Skipping partial redundancy for expression
{call_expr<__builtin_pow>,real_cst<1.0e+1>,logchl_1040} (0813), no redundancy
on to be optimized for speed edge

So with LTO we have "better" profile estimates, and the entry edge is considered
cold...

LTO:

  <bb 33> [local count: 3813]:
...
  <bb 34> [local count: 16255]:
  # n_925 = PHI <0(33), _1128(129)>
  # logchl_926 = PHI
<-3.0099999999999997868371792719699442386627197265625e+0(33), logchl_1040(129)>

non-LTO:

  <bb 33> [local count: 10616]:
...
  <bb 34> [local count: 85892]:
  # logchl_591 = PHI
<-3.0099999999999997868371792719699442386627197265625e+0(33), logchl_701(129)>

In general, optimizing the redundancy on the entry edge isn't worth it, given
that it often increases register pressure by introducing loop-carried
dependences.  So I think LTO is "correct" here, even if that's unfortunate... :/
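
To make the trade-off concrete, here is a rough hand-written sketch in C of
what removing this kind of partial redundancy amounts to at the source level.
It is not the code from this PR: the function names are made up, and
expensive() is only a stand-in for the __builtin_pow (10.0, x) call in the dump
so that the sketch builds without libm.  The constants -3.01 and 0.01 are the
ones visible in the dump; everything else is an assumption for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for __builtin_pow (10.0, x); any pure function
   of x serves the illustration.  */
static double
expensive (double x)
{
  return 1.0 + x * (1.0 + x * 0.5);
}

/* Loop as written: the add and the call are evaluated inside every
   iteration.  On the first iteration their operand is the constant
   -3.01 that comes in over the entry edge.  */
double
sum_plain (int n)
{
  double logchl = -3.01;
  double sum = 0.0;
  for (int i = 0; i < n; i++)
    {
      logchl = logchl + 0.01;
      sum += expensive (logchl);
    }
  return sum;
}

/* Hand-applied PRE: the first evaluation is moved to the entry edge
   (where the compiler could fold it to a constant) and later ones are
   computed just before the back edge.  Besides 'next', the result
   'val' now has to stay live across the back edge as well.  */
double
sum_pre (int n)
{
  if (n <= 0)
    return 0.0;
  double next = -3.01 + 0.01;
  double val = expensive (next);
  double sum = 0.0;
  int i = 0;
  for (;;)
    {
      sum += val;
      if (++i >= n)
        break;
      next = next + 0.01;
      val = expensive (next);
    }
  return sum;
}

/* Both forms perform the same operations in the same order, so the
   results should agree; a small tolerance is used to be safe.  */
void
check_forms_agree (int n)
{
  double d = sum_plain (n) - sum_pre (n);
  assert (d < 1e-9 && d > -1e-9);
}
```

The second form saves the first in-loop evaluation at most, but pays for it
with an extra value carried around the loop, which is the register pressure
cost mentioned above.  When the entry edge is rarely taken compared to the
loop body, that saving is small.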
