https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118125
--- Comment #4 from Martin Jambor <jamborm at gcc dot gnu.org> ---
Redirecting the call to operator delete[](void*) to
__builtin_unreachable(), which seems to be the correct thing to do, leads to
one more SLP vectorization in the function experiencing the slow-down.
Comparing the -fopt-info-optimized output gives:
@@ -188,6 +188,7 @@
/home/mjambor/gcc/mine/inst/include/c++/15.0.1/bits/stl_tree.h:206:27: optimized: basic block part vectorized using 16 byte vectors
include/lac/vector.h:990:12: optimized: basic block part vectorized using 8 byte vectors
include/lac/vector.h:979:31: optimized: basic block part vectorized using 8 byte vectors
+include/lac/solver_gmres.h:498:14: optimized: basic block part vectorized using 16 byte vectors
include/lac/vector.h:949:3: optimized: basic block part vectorized using 8 byte vectors
/home/mjambor/gcc/mine/inst/include/c++/15.0.1/bits/stl_tree.h:206:27: optimized: basic block part vectorized using 16 byte vectors
include/lac/vector.h:990:12: optimized: basic block part vectorized using 8 byte vectors
I have managed to avoid that one particular SLP vectorization using
-fdbg-cnt=ipa_update_vr:2175-2175:3013-3013,vect_slp:1-61,63-9999 and
was able to get the original performance back. Unfortunately, when I
then looked at SLP vectorization with all IPA-VR propagations
allowed again, this particular case was not there (but there were
plenty of others).
If I am reading the SLP dump correctly (which is a big if), the vectorization
produced the following change:
@@ -30911,15 +26466,16 @@
# DEBUG this => NULL
# DEBUG i => NULL
c_790 = *_789;
+ _272 = {c_790, c_790};
# DEBUG c => c_790
# DEBUG this => D#400
# DEBUG i => D#396
_792 = _229 + _785;
# DEBUG this => NULL
# DEBUG i => NULL
- # DEBUG dummy => D__lsm0.1125_879
- _794 = c_790 * D__lsm0.1125_879;
- _795 = i_758 + 1;
+ # DEBUG dummy => D__lsm0.1125_214
+ _794 = D__lsm0.1125_214 * c_790;
+ _795 = i_754 + 1;
# DEBUG D#397 => (unsigned int) _795
# DEBUG this => D#400
# DEBUG i => D#397
@@ -30929,13 +26485,16 @@
# DEBUG this => NULL
# DEBUG i => NULL
_799 = *_798;
+ _273 = {D__lsm0.1125_214, _799};
_800 = s_787 * _799;
_801 = _794 + _800;
- *_792 = _801;
- _1111 = s_787 * D__lsm0.1125_879;
+ _1110 = D__lsm0.1125_214 * s_787;
+ _252 = {_800, _1110};
+ vect__2237.1174_737 = .VEC_FMSUBADD (_273, _272, _252);
_805 = c_790 * _799;
- _41 = _805 - _1111;
- *_798 = _41;
+ _41 = _805 - _1110;
+ vectp.1176_781 = _792;
+ MEM <vector(2) double> [(double &)vectp.1176_781] = vect__2237.1174_737;
# DEBUG i => _795
if (_298 > _795)
goto <bb 405>; [89.00%]
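
To make the dump above easier to follow, here is a minimal C++ sketch of what the two sides of the diff appear to compute. It is only a model under stated assumptions, not the actual code from solver_gmres.h: the function and type names are hypothetical, `fmsubadd2` assumes that .VEC_FMSUBADD adds the third operand in the even lane and subtracts it in the odd lane, and the single vector store through `_792` is assumed to cover the location that `*_798` wrote in the scalar version.

```cpp
#include <array>
#include <cassert>

// Hypothetical holder for the two values stored per iteration.
struct Pair { double first; double second; };

// Scalar form, mirroring the "-" side of the dump.
Pair rotate_scalar(double c, double s, double dummy, double h_next)
{
  const double first  = c * dummy + s * h_next;  // _801, stored via *_792
  const double second = c * h_next - s * dummy;  // _41, stored via *_798
  return {first, second};
}

// Model of a 2-lane .VEC_FMSUBADD (assumed lane semantics):
// lane 0 computes a*b + c, lane 1 computes a*b - c.
std::array<double, 2> fmsubadd2(const std::array<double, 2> &a,
                                const std::array<double, 2> &b,
                                const std::array<double, 2> &c)
{
  return {a[0] * b[0] + c[0], a[1] * b[1] - c[1]};
}

// SLP form, mirroring the "+" side of the dump.
Pair rotate_slp(double c, double s, double dummy, double h_next)
{
  const std::array<double, 2> v272 = {c, c};                   // _272
  const std::array<double, 2> v273 = {dummy, h_next};          // _273
  const std::array<double, 2> v252 = {s * h_next, dummy * s};  // _252
  const std::array<double, 2> r = fmsubadd2(v273, v272, v252);
  return {r[0], r[1]};  // one vector(2) double store through _792
}

// Compares both forms. The inputs are exactly representable so that every
// product and sum is exact; they are not meant to be a real rotation.
bool forms_agree()
{
  const Pair a = rotate_scalar(0.5, 0.25, 4.0, 8.0);
  const Pair b = rotate_slp(0.5, 0.25, 4.0, 8.0);
  assert(a.first == b.first && a.second == b.second);
  return a.first == b.first && a.second == b.second;
}
```

In this model both forms perform the same arithmetic. The vectorized side differs in that it builds three two-element vectors from scalars and issues one 16-byte store in place of two scalar stores, while the scalar computation of `_41` is still present in the dump.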