https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154

--- Comment #24 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #12)
> (In reply to Richard Biener from comment #11)
> > _1 should be [-Inf, nextafter (0.0, -Inf)], not [-Inf, -0.0]
> The reduced testcase is invalid because it uses uninitialized l.

Sure, let's fix that; it was reduced a bit too far:

https://godbolt.org/z/he3rT5Exq

It has the extracted codegen part.

Note how GCC 14 does at least twice as many floating-point comparisons in
the hot loops.

Off the top of my head, the scalar code doesn't look that bad, but the
additional entries in the PHI nodes still cause major headaches for the
vector code.

  # iftmp.2_36 = PHI <1(10), _95(11), 0(9)>
  # iftmp.0_97 = PHI <2.0e+0(10), 2.0e+0(11), 4.0e+0(9)>
  # iftmp.1_101 = PHI <5.0e-1(10), 5.0e-1(11), 2.5e-1(9)>

vs before

  # iftmp.2_38 = PHI <1(11), _95(12)>
  # iftmp.0_96 = PHI <2.0e+0(11), iftmp.0_94(12)>
  # iftmp.1_100 = PHI <5.0e-1(11), iftmp.1_98(12)>

The extra PHI arguments lead to this vector code:

        fcmge   p3.s, p0/z, z0.s, z6.s
        fcmlt   p1.s, p0/z, z0.s, z6.s
        fcmge   p1.s, p1/z, z0.s, #0.0
        fcmge   p1.s, p3/z, z0.s, #0.0
        fcmlt   p3.s, p0/z, z0.s, #0.0

        vs before

        fcmge   p3.s, p0/z, z0.s, #0.0
        fcmlt   p2.s, p0/z, z0.s, z16.s

The split introduced by threading causes it to miss that the comparison with
0 only needs to be done once for all the elements.
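For illustration only, here is a hypothetical kernel of roughly the shape the PHIs suggest. It is NOT the testcase behind the godbolt link. The function name `pick_scales`, its parameters, and the mapping of the `x >= 0` predicate to the 2.0/4.0 and 0.5/0.25 constants are all assumptions; only the constants are taken from the dumps above. It shows the form we would like to end up with: one test against 0 whose result is reused by every select. Ideally, that form needs only one `fcmge ... #0.0` and one `fcmlt` against `limit` per vector, as in the "before" sequence.

```c
#include <stddef.h>

/* Hypothetical kernel, not the reduced testcase: the constants mirror the
   iftmp.0 / iftmp.1 PHIs above, the predicate mapping is assumed.  */
void pick_scales(const float *restrict x, float limit,
                 float *restrict scale, float *restrict inv_scale,
                 int *restrict flag, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Single comparison with 0, shared by all selects below.  */
        int nonneg = x[i] >= 0.0f;
        /* Single comparison against the loop-invariant limit.  */
        int below = x[i] < limit;

        scale[i]     = nonneg ? 2.0f : 4.0f;
        inv_scale[i] = nonneg ? 0.5f : 0.25f;
        flag[i]      = nonneg && below;
    }
}
```

For NaN inputs, `x >= 0` and `x < 0` are both false, so neither predicate is the negation of the other. Once the control flow is split, the compiler therefore cannot simply invert one `fcmge` to get the other arm.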
