https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100173
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Note that store commoning in the code sinking pass will sink the last store
anyway:

@@ -546,7 +233,6 @@
   _27 = _26 << 1;
   _28 = (short int) _27;
   _29 = _28 | 1;
-  MEM[(struct StatePathMetricData *)pOut_90 + 4B].m_esState = _29;
   goto <bb 9>; [100.00%]

 <bb 8> [local count: 505302904]:
@@ -556,21 +242,19 @@
   _32 = _31 << 1;
   _33 = (short int) _32;
   _34 = _33 | 1;
-  MEM[(struct StatePathMetricData *)pOut_90 + 4B].m_esState = _34;

 <bb 9> [local count: 1010605809]:
+  # _94 = PHI <_29(7), _34(8)>
+  MEM[(struct StatePathMetricData *)pOut_90 + 4B].m_esState = _94;

but yes, cselim will also sink the first store, moving it across the scalar
compute in the block.

I might note that ideally we'd sink all of the compute as well and end up
with just a conditional load of either pIn1->m_esState or pIn2_89->m_esState.
That might then allow scheduling to recover the original performance.

You can try that as a source transform, like

  e_s16 tem1, tem2;
  if (esMetric1 >= esMetric2)
    {
      tem1 = esMetric1;
      tem2 = pIn1->m_esState;
    }
  else
    {
      tem1 = esMetric2;
      tem2 = pIn2->m_esState;
    }
  pOut->m_esPathMetric = tem1;
  pOut->m_esState = (tem2 << 1) | 1;