https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108938
--- Comment #14 from Hongtao.liu <crazylht at gmail dot com> ---
Got 1 performance opportunity in GCC itself, with bswap + bit_and + rotate; the intermediate values are all single-use, so they can be DCEd.

Got 4 performance opportunities in SPEC2017:

bswap + bit_and + rotate + single_use: 1
bswap + rotate + single_use: 1
bswap + rotate + not single_use: 2

For the 2 not-single-use cases, the testcase is like

void
foo1 (char* a, unsigned int* __restrict b)
{
  a[0] = b[0] >> 24;
  a[1] = b[0] >> 16;
  a[2] = b[0] >> 8;
  a[3] = b[0];
  a[4] = b[1] >> 24;
  a[5] = b[1] >> 16;
  a[6] = b[1] >> 8;
  a[7] = b[1];
}

b[0] is used by multiple shift stmts, but nowhere else, so it can actually be DCEd.

So for GCC itself and SPEC2017 with -O2, the bswap + bit_and + rotate optimization won't cause extra stmts.
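To illustrate why the foo1 pattern falls into the "bswap + rotate" bucket, here is a minimal hand-written sketch. It assumes a little-endian target with 32-bit unsigned int, and it relies on the GCC builtin __builtin_bswap64. The names foo1_bswap_rotate and check_equivalence are hypothetical, chosen for illustration only; this is not necessarily the exact sequence the pass emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise stores: writes b[0] and b[1] in big-endian byte order. */
static void foo1(char *a, unsigned int *__restrict b)
{
    a[0] = b[0] >> 24;
    a[1] = b[0] >> 16;
    a[2] = b[0] >> 8;
    a[3] = b[0];
    a[4] = b[1] >> 24;
    a[5] = b[1] >> 16;
    a[6] = b[1] >> 8;
    a[7] = b[1];
}

/* Hypothetical equivalent (assumption: little-endian, 32-bit unsigned int):
   one 64-bit load, one bswap, one rotate, one 64-bit store. */
static void foo1_bswap_rotate(char *a, const unsigned int *b)
{
    uint64_t x;
    memcpy(&x, b, sizeof x);     /* load b[0] and b[1] as one 64-bit value */
    x = __builtin_bswap64(x);    /* reverse all 8 bytes */
    x = (x << 32) | (x >> 32);   /* rotate by 32 to restore the word order */
    memcpy(a, &x, sizeof x);     /* single 8-byte store */
}

/* Checks that both versions write the same 8 bytes for the given input. */
static void check_equivalence(unsigned int *b)
{
    char r1[8], r2[8];
    foo1(r1, b);
    foo1_bswap_rotate(r2, b);
    assert(memcmp(r1, r2, sizeof r1) == 0);
}
```

Once the stores are combined this way, the individual shift results have no remaining users, which matches the observation that the intermediate stmts can be DCEd.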