https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154
--- Comment #24 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #12)
> (In reply to Richard Biener from comment #11)
> > _1 shoud be [-Inf, nextafter (0.0, -Inf)], not [-Inf, -0.0]
> The reduced testcase is invalid because it uses uninitialized l.

Sure, let's fix that; it was reduced a bit too far:
https://godbolt.org/z/he3rT5Exq has the extracted codegen part.

Note how GCC 14 does at least 2x the number of floating point comparisons in
the hot loops.

The scalar code doesn't look (off the top of my head) that bad, but the
additional entries in the phi nodes are still causing major headaches for
vector code.

  # iftmp.2_36 = PHI <1(10), _95(11), 0(9)>
  # iftmp.0_97 = PHI <2.0e+0(10), 2.0e+0(11), 4.0e+0(9)>
  # iftmp.1_101 = PHI <5.0e-1(10), 5.0e-1(11), 2.5e-1(9)>

vs before

  # iftmp.2_38 = PHI <1(11), _95(12)>
  # iftmp.0_96 = PHI <2.0e+0(11), iftmp.0_94(12)>
  # iftmp.1_100 = PHI <5.0e-1(11), iftmp.1_98(12)>

which causes it to generate:

        fcmge   p3.s, p0/z, z0.s, z6.s
        fcmlt   p1.s, p0/z, z0.s, z6.s
        fcmge   p1.s, p1/z, z0.s, #0.0
        fcmge   p1.s, p3/z, z0.s, #0.0
        fcmlt   p3.s, p0/z, z0.s, #0.0

vs

        fcmge   p3.s, p0/z, z0.s, #0.0
        fcmlt   p2.s, p0/z, z0.s, z16.s

The split in threading is causing it to miss that it can do the comparison
with 0 just once on all the elements, instead of repeating it under each
predicate.
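
To make the last point concrete, here is a rough scalar model of a single
vector lane. It is only a sketch: the function names and the `limit`
parameter are made up for illustration and are not taken from the testcase
or the godbolt link. It assumes `limit` is not a NaN, in which case the two
complementary masks cover every lane for which `x >= 0` can hold, so testing
against 0 under each mask gives the same answer as testing once.

```c
#include <stdbool.h>

/* Hypothetical model of the split form: the lane is first classified
   against `limit`, then the zero test is repeated under each of the two
   complementary masks, roughly as the extra fcmge ..., #0.0 do.  */
bool nonneg_split(float x, float limit)
{
    bool ge = x >= limit;          /* compare against the limit (>=) */
    bool lt = x < limit;           /* compare against the limit (<)  */
    bool lo = lt & (x >= 0.0f);    /* zero test under the "lt" mask  */
    bool hi = ge & (x >= 0.0f);    /* same zero test under "ge" mask */
    return lo | hi;
}

/* Hypothetical model of the merged form: one zero test for the lane.
   `limit` is unused and kept only so both models share a signature.  */
bool nonneg_once(float x, float limit)
{
    (void)limit;
    return x >= 0.0f;
}
```

For non-NaN `limit`, both functions return the same value for any `x`, but
the split form spends four floating point comparisons per lane where the
merged form spends one.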