https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108874
--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #7)
> Can we recognize it as bswap32 + rotate 16 in match.pd when the backend
> supports both, and then it should be easy for aarch64/arm to transform bswap
> + rotate into rev16 at rtl level.

That definitely would be better in the general case (note it might not be in
match.pd though). That said, rewriting:

```
(set (reg:SI 98)
     (ior:SI (and:SI (lshiftrt:SI (reg/v:SI 97 [ x ])
                                  (const_int 8 [0x8]))
                     (const_int 16711935 [0xff00ff]))
             (reg:SI 102)))
```

as

```
(set (reg:SI 98)
     (ior:SI (lshiftrt:SI (and:SI (reg/v:SI 97 [ x ])
                                  (const_int 0xff00ff00))
                          (const_int 8 [0x8]))
             (reg:SI 102)))
```

in the aarch64 backend would produce better code for some other examples too,
not just for rev16 generation. Take:

```
unsigned f(unsigned x, unsigned b)
{
  return ((x & 0xff00ff00U) >> 8) | b;
}
```

GCC 5 used to produce:

```
        and     w0, w0, -16711936
        orr     w0, w1, w0, lsr 8
        ret
```

while trunk does:

```
        lsr     w0, w0, 8
        and     w0, w0, 16711935
        orr     w0, w0, w1
        ret
```

Note that xor and addition should be handled in a similar way too; these have
a similar regression:

```
unsigned f(unsigned x, unsigned b)
{
  return ((x & 0xff00ff00U) >> 8) ^ b;
}

unsigned f1(unsigned x, unsigned b)
{
  return ((x & 0xff00ff00U) >> 8) + b;
}
```
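The two bit-level identities this discussion relies on can be checked with a small C sketch (not part of the original report). First, `(x >> 8) & 0x00ff00ff` equals `(x & 0xff00ff00) >> 8`, which is what makes the RTL rewrite above value-preserving. Second, rev16 (swap the bytes inside each 16-bit half) equals bswap32 followed by a rotate by 16, as proposed in comment #7. The helper names are hypothetical; `__builtin_bswap32` is the GCC/Clang builtin. The check samples inputs with a simple LCG rather than testing all 2^32 values, so it is evidence, not a proof, and it says nothing about code generation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shape trunk currently emits for f(): shift first, then mask with 0x00ff00ff. */
uint32_t shift_then_mask(uint32_t x)
{
    return (x >> 8) & 0x00ff00ffu;
}

/* Shape GCC 5 emitted: mask with 0xff00ff00 first, then shift, which lets
   the shift fold into the orr/eor/add shifted-register operand. */
uint32_t mask_then_shift(uint32_t x)
{
    return (x & 0xff00ff00u) >> 8;
}

/* Reference rev16 semantics: swap the two bytes inside each 16-bit half. */
uint32_t rev16_ref(uint32_t x)
{
    return ((x >> 8) & 0x00ff00ffu) | ((x & 0x00ff00ffu) << 8);
}

/* bswap32 followed by a rotate by 16, the pattern suggested in comment #7. */
uint32_t bswap_rotate16(uint32_t x)
{
    uint32_t s = __builtin_bswap32(x);
    return (s >> 16) | (s << 16);
}

/* Sample many inputs and confirm both identities hold for each one. */
bool identities_hold(void)
{
    uint32_t x = 0x12345678u;
    for (int i = 0; i < 1000000; i++) {
        if (shift_then_mask(x) != mask_then_shift(x))
            return false;
        if (rev16_ref(x) != bswap_rotate16(x))
            return false;
        x = x * 1664525u + 1013904223u; /* LCG step to vary the input */
    }
    return true;
}
```

For example, both `shift_then_mask(0x12345678u)` and `mask_then_shift(0x12345678u)` give `0x00120056`, and both `rev16_ref` and `bswap_rotate16` map `0x12345678` to `0x34127856`. Because the first identity only concerns the shifted operand, it applies equally to the `|`, `^` and `+` variants of `f`.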