On 2024/7/31 at 6:25 PM, Xi Ruoyao wrote:
On Wed, 2024-07-31 at 16:57 +0800, Lulu Cheng wrote:
On 2024/7/29 at 3:58 PM, Xi Ruoyao wrote:
Per a gcc-help thread we are generating sub-optimal code for
__builtin_bswap{32,64}. To fix it:
- Use a single revb.d instruction for bswapdi2.
- Use a single revb.2w instruction for bswapsi2 for TARGET_64BIT,
revb.2h + rotri.w for !TARGET_64BIT.
- Use a single revb.2h instruction for bswapsi2 (x) r>> 16, and a single
revb.2w instruction for bswapdi2 (x) r>> 32.
Unfortunately I cannot figure out a way to make the compiler generate
revb.4h or revh.{2w,d} instructions.
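For readers following along, here is a minimal C sketch of the source-level shapes the list above refers to. The function and helper names are made up for illustration, and the instruction named in each comment is only what the description above says should be selected, not verified compiler output.

```c
#include <stdint.h>

/* Hypothetical rotate-right helpers; n must be in 1..31 (resp. 1..63). */
static inline uint32_t rotr32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

static inline uint64_t rotr64(uint64_t x, unsigned n)
{
    return (x >> n) | (x << (64 - n));
}

/* bswapdi2: expected to become a single revb.d. */
uint64_t swap64(uint64_t x)
{
    return __builtin_bswap64(x);
}

/* bswapsi2: expected to become revb.2w on TARGET_64BIT. */
uint32_t swap32(uint32_t x)
{
    return __builtin_bswap32(x);
}

/* bswapsi2 (x) r>> 16: swaps the bytes inside each 16-bit half (revb.2h). */
uint32_t swap_bytes_in_halves32(uint32_t x)
{
    return rotr32(__builtin_bswap32(x), 16);
}

/* bswapdi2 (x) r>> 32: swaps the bytes inside each 32-bit word (revb.2w). */
uint64_t swap_bytes_in_words64(uint64_t x)
{
    return rotr64(__builtin_bswap64(x), 32);
}
```

Compiling something like this at -O2 and reading the assembly should show whether each function ends up as one instruction.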
This optimization is really ingenious, and I have no objections.
I also haven't figured out how to generate revb.4h or revh.{2w,d}.
I think we can merge this patch first.
Pushed r15-2433.
Ok. Thanks!
FWIW I tried a naive pattern for revh.2w:
(set (match_operand:DI 0 "register_operand" "=r")
     (ior:DI
       (and:DI
         (ashift:DI (match_operand:DI 1 "register_operand" "r")
                    (const_int 16))
         (const_int 18446462603027742720))
       (and:DI
         (lshiftrt:DI (match_dup 1)
                      (const_int 16))
         (const_int 281470681808895))))
But it seems too complex to be recognized.
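For reference, a C sketch of the operation this pattern describes (the function name is made up for illustration). The two decimal constants above are 0xffff0000ffff0000 and 0x0000ffff0000ffff, so the expression swaps the two 16-bit halves inside each 32-bit word. Whether GCC maps this source form to a single revh.2w is exactly the open question here; the sketch only pins down the semantics.

```c
#include <stdint.h>

/* Swap the 16-bit halves within each 32-bit word of x. */
uint64_t swap_halves_in_words64(uint64_t x)
{
    return ((x << 16) & UINT64_C(0xffff0000ffff0000))   /* low halves move up   */
         | ((x >> 16) & UINT64_C(0x0000ffff0000ffff));  /* high halves move down */
}
```

For example, 0x1122334455667788 becomes 0x3344112277885566.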
I think it needs to be recognized as a bswap operation in the tree-bswap
phase, but that seems a bit difficult to do.