https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108938

--- Comment #14 from Hongtao.liu <crazylht at gmail dot com> ---
Got 1 performance opportunity in GCC itself with bswap + bit_and + rotate; the
intermediate values are all single-use, so they can be DCEd.

Got 4 performance opportunities in SPEC2017:
bswap + bit_and + rotate + single_use: 1
bswap + rotate + single_use: 1
bswap + rotate + not single_use: 2

For the 2 not-single-use cases, the testcase is like

void foo1 (char* a, unsigned int* __restrict b)
{
  a[0] = b[0] >> 24;
  a[1] = b[0] >> 16;
  a[2] = b[0] >> 8;
  a[3] = b[0];
  a[4] = b[1] >> 24;
  a[5] = b[1] >> 16;
  a[6] = b[1] >> 8;
  a[7] = b[1];
}

b[0] is used by multiple shift stmts, but nowhere else, so it can actually
be DCEd. So for GCC itself and SPEC2017 with -O2, the bswap + bit_and + rotate
optimization won't cause extra stmts.
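
To illustrate why the shifted values are dead once the pattern is recognized,
here is a hand-written sketch, not the GIMPLE that GCC actually emits. The
names store_bytes, store_bswap and stores_match are hypothetical, and the
sketch assumes a compiler providing __builtin_bswap32 and __BYTE_ORDER__
(GCC or Clang). It compares the byte-by-byte stores from the testcase with one
byte-swapped word store per element:

```c
#include <stdint.h>
#include <string.h>

/* Byte-by-byte form, as in the testcase: each word is stored most
   significant byte first.  b[0] and b[1] each feed several shifts.  */
static void store_bytes (unsigned char *a, const uint32_t *__restrict b)
{
  a[0] = b[0] >> 24;
  a[1] = b[0] >> 16;
  a[2] = b[0] >> 8;
  a[3] = b[0];
  a[4] = b[1] >> 24;
  a[5] = b[1] >> 16;
  a[6] = b[1] >> 8;
  a[7] = b[1];
}

/* Hand-written equivalent: one word store per element.  None of the
   individual shift results is needed any more.  */
static void store_bswap (unsigned char *a, const uint32_t *__restrict b)
{
  for (int i = 0; i < 2; i++)
    {
      uint32_t v = b[i];
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
      /* Little-endian memory order is LSB first, so swap to get MSB first.  */
      v = __builtin_bswap32 (v);
#endif
      memcpy (a + 4 * i, &v, sizeof v);
    }
}

/* Returns nonzero when both forms write the same 8 bytes.  */
static int stores_match (uint32_t x, uint32_t y)
{
  uint32_t b[2] = { x, y };
  unsigned char a1[8], a2[8];
  store_bytes (a1, b);
  store_bswap (a2, b);
  return memcmp (a1, a2, sizeof a1) == 0;
}
```

In store_bswap the only remaining use of each b[i] is the single load, which
matches the observation that the multi-use shifts leave nothing behind.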