https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108874

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #7)
> Can we recognize it as bswap32 + rotate 16 in match.pd when the backend
> supports both, and then it should be easy for aarch64/arm to transform bswap
> + rotate into rev16 at the RTL level.

That would definitely be better in the general case (note it might not be in
match.pd, though), but doing:
(set (reg:SI 98)
    (ior:SI (and:SI (lshiftrt:SI (reg/v:SI 97 [ x ])
                (const_int 8 [0x8]))
            (const_int 16711935 [0xff00ff]))
        (reg:SI 102)))

as
(set (reg:SI 98)
     (ior:SI
      (lshiftrt:SI
       (and:SI (reg/v:SI 97 [ x ])  (const_int 0xff00ff00) )
       (const_int 8 [0x8]))
      (reg:SI 102)))

in the aarch64 backend would also produce better code for some other examples,
not just for rev16 generation.
Take:
```
unsigned f(unsigned x, unsigned b)
{
  return ((x & 0xff00ff00U) >> 8) | b;
}
```

GCC 5 used to produce:
        and     w0, w0, -16711936
        orr     w0, w1, w0, lsr 8
        ret

While the trunk does:
        lsr     w0, w0, 8
        and     w0, w0, 16711935
        orr     w0, w0, w1
        ret
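
The two sequences compute the same value because shifting the mask along with
the data gives 0xff00ff00 >> 8 == 0x00ff00ff. A minimal sketch of that
equivalence follows; the function names are made up for illustration and are
not part of the bug report or of GCC:

```c
#include <assert.h>
#include <stdint.h>

/* Mask first, then shift: the form written in the source, and the
   shape of the GCC 5 output (and, then orr with an lsr operand).  */
static uint32_t mask_then_shift(uint32_t x, uint32_t b)
{
  return ((x & 0xff00ff00U) >> 8) | b;
}

/* Shift first, then mask with 0xff00ff00 >> 8 == 0x00ff00ff: the
   shape of the trunk output (lsr, and, orr).  */
static uint32_t shift_then_mask(uint32_t x, uint32_t b)
{
  return ((x >> 8) & 0x00ff00ffU) | b;
}

/* Spot-check that both forms agree on a few hand-picked inputs.  */
static void check_forms_agree(void)
{
  const uint32_t samples[] = { 0U, 0x12345678U, 0xff00ff00U, 0xffffffffU };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert(mask_then_shift(samples[i], 0x80U)
           == shift_then_mask(samples[i], 0x80U));
}
```

The mask-first form matters on aarch64 because the shift can then be folded
into the shifted-register operand of orr, which saves one instruction.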

Note that xor and addition should be handled in a similar way too.
That is, these have a similar regression:
unsigned f(unsigned x, unsigned b)
{
  return ((x & 0xff00ff00U) >> 8) ^ b;
}
unsigned f1(unsigned x, unsigned b)
{
  return ((x & 0xff00ff00U) >> 8) + b;
}
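
For reference, the bswap32 + rotate 16 equivalence suggested in comment #7 can
be checked with a small sketch. It assumes the source pattern is the usual
shift-and-mask byte swap within each 16-bit half; the function names are
hypothetical, and __builtin_bswap32 is the GCC/Clang builtin:

```c
#include <assert.h>
#include <stdint.h>

/* Swap the two bytes inside each 16-bit half using shifts and masks.  */
static uint32_t rev16_masks(uint32_t x)
{
  return ((x & 0xff00ff00U) >> 8) | ((x & 0x00ff00ffU) << 8);
}

/* Rotate a 32-bit value by 16, which swaps its two halves.  */
static uint32_t rotate16(uint32_t x)
{
  return (x >> 16) | (x << 16);
}

/* The same operation expressed as a full byte swap followed by a
   rotate by 16.  */
static uint32_t rev16_bswap_rotate(uint32_t x)
{
  return rotate16(__builtin_bswap32(x));
}

/* Spot-check that both forms agree on a few hand-picked inputs.  */
static void check_rev16_forms_agree(void)
{
  const uint32_t samples[] = { 0U, 0x11223344U, 0xff00ff00U, 0xdeadbeefU };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert(rev16_masks(samples[i]) == rev16_bswap_rotate(samples[i]));
}
```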
