https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100173
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Note that store commoning in the code sinking pass will sink the last store
anyway:

@@ -546,7 +233,6 @@
   _27 = _26 << 1;
   _28 = (short int) _27;
   _29 = _28 | 1;
-  MEM[(struct StatePathMetricData *)pOut_90 + 4B].m_esState = _29;
   goto <bb 9>; [100.00%]

 <bb 8> [local count: 505302904]:
@@ -556,21 +242,19 @@
   _32 = _31 << 1;
   _33 = (short int) _32;
   _34 = _33 | 1;
-  MEM[(struct StatePathMetricData *)pOut_90 + 4B].m_esState = _34;

 <bb 9> [local count: 1010605809]:
+  # _94 = PHI <_29(7), _34(8)>
+  MEM[(struct StatePathMetricData *)pOut_90 + 4B].m_esState = _94;

but yes, cselim will also sink the first store, moving it across the scalar
compute in the block.

I might note that ideally we'd sink all of the compute as well and end up
with just a conditional load of either pIn1->m_esState or pIn2_89->m_esState.
That might then allow scheduling to recover the original performance.

You can try that as a source transform, like

  e_s16 tem1, tem2;
  if (esMetric1 >= esMetric2)
    {
      tem1 = esMetric1;
      tem2 = pIn1->m_esState;
    }
  else
    {
      tem1 = esMetric2;
      tem2 = pIn2->m_esState;
    }
  pOut->m_esPathMetric = tem1;
  pOut->m_esState = (tem2 << 1) | 1;