[Bug rtl-optimization/91460] gcc -mpreferred-vector-width=256 is slower than -mpreferred-vector-width=128 for some loops

hjl.tools at gmail dot com Thu, 15 Aug 2019 12:18:39 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91460


H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #1 from H.J. Lu <hjl.tools at gmail dot com> ---
Consider this testcase:

---
int block[9][9][9];
void foo(int row, int k, int h)
{
  /* nrow ranges from 4 to 9.  */
  int nrow = ((row - 1)/3 + 1)*3 + 1;

  for (int i = nrow; i < 9; i++)
    block[k][h][i] = block[k][h][i] - 10;
}
---

Since nrow ranges from 4 to 9, the loop runs at most 5 iterations, fewer
than the 8 int elements a 256-bit vector holds, so the 256-bit vector loop
is never executed.  256-bit vectorization is therefore equivalent to no
vectorization plus additional branch cost.  Even with epilogue
vectorization, the 256-bit version still has more overhead.  When this is
a hot function, 256-bit vectorization can reduce performance by 6%.
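
The trip-count argument can be checked with a small sketch.  This is only a
simplified model and not how GCC's vectorizer or its cost model is
implemented.  The helper names (lanes_per_vector, trip_count,
vector_iterations) are hypothetical, and it assumes a 32-bit int.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: number of int lanes in one vector of the given
   width, e.g. 256 / 32 = 8 when int is 32 bits wide.  */
static int lanes_per_vector(int vector_bits)
{
  int int_bits = (int)(sizeof(int) * CHAR_BIT);
  assert(vector_bits > 0 && vector_bits % int_bits == 0);
  return vector_bits / int_bits;
}

/* Iterations of "for (int i = nrow; i < 9; i++)" from the testcase.  */
static int trip_count(int nrow)
{
  return nrow < 9 ? 9 - nrow : 0;
}

/* Hypothetical helper: iterations that a full-width vector loop could
   cover.  The remainder is left to a scalar or epilogue loop.  */
static int vector_iterations(int nrow, int vector_bits)
{
  int lanes = lanes_per_vector(vector_bits);
  return trip_count(nrow) / lanes * lanes;
}
```

Under these assumptions, vector_iterations(nrow, 256) is 0 for every nrow
from 4 to 9, while vector_iterations(4, 128) and vector_iterations(5, 128)
are both 4, so only the 128-bit loop ever does useful vector work here.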
