https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117251
--- Comment #10 from Michael Meissner <meissner at gcc dot gnu.org> ---
Power10 added an instruction, XXEVAL, that evaluates an arbitrary three-input
bitwise logic function on VSX registers, so a single XXEVAL can do the work of
the fused ANDC->XOR and XOR->XOR pairs. I have coded up patches to support
this and will be submitting them shortly.
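A minimal scalar sketch of the three-input "truth table" idea, assuming the Power ISA style encoding where the result bit for inputs (a, b, c) is IMM bit number (a<<2 | b<<1 | c), with bit 0 as the most significant bit of the 8-bit immediate. The helper names (`make_imm`, `eval3`, `andc_xor`, `xor_xor`) are hypothetical, and 64-bit words stand in for 128-bit VSX registers; this only illustrates why one instruction can replace a two-instruction logic chain.

```c
#include <stdint.h>

/* Hypothetical model of a 3-input ternary-logic op such as XXEVAL.
   Assumption: result bit = IMM bit (a<<2 | b<<1 | c), IMM bit 0 = MSB. */

typedef uint8_t (*bool3_fn)(uint8_t a, uint8_t b, uint8_t c);

/* Build the 8-bit truth-table immediate for a 3-input boolean function. */
static uint8_t make_imm(bool3_fn f)
{
    uint8_t imm = 0;
    for (unsigned idx = 0; idx < 8; idx++) {
        uint8_t a = (idx >> 2) & 1, b = (idx >> 1) & 1, c = idx & 1;
        if (f(a, b, c) & 1)
            imm |= (uint8_t)(0x80u >> idx);   /* bit idx, MSB-first */
    }
    return imm;
}

/* Apply the truth table independently to every bit position. */
static uint64_t eval3(uint64_t a, uint64_t b, uint64_t c, uint8_t imm)
{
    uint64_t r = 0;
    for (unsigned i = 0; i < 64; i++) {
        unsigned idx = (unsigned)((((a >> i) & 1) << 2) |
                                  (((b >> i) & 1) << 1) |
                                  ((c >> i) & 1));
        if ((imm >> (7 - idx)) & 1)
            r |= (uint64_t)1 << i;
    }
    return r;
}

/* The two fused sequences from the comment, as 1-bit functions. */
static uint8_t andc_xor(uint8_t a, uint8_t b, uint8_t c)
{
    return (uint8_t)((a & ~b) ^ c);           /* ANDC -> XOR */
}

static uint8_t xor_xor(uint8_t a, uint8_t b, uint8_t c)
{
    return (uint8_t)(a ^ b ^ c);              /* XOR -> XOR */
}
```

Under this assumed numbering, `make_imm(xor_xor)` gives 0x69 and `make_imm(andc_xor)` gives 0x59, and `eval3` with either immediate reproduces the corresponding two-instruction expression on every bit.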
      XXEVAL   Trunk   GCC14   GCC13   GCC12   GCC11
      ------   -----   -----   -----   -----   -----
-O3:    5.53    6.15    6.28    5.57    5.61    9.56
XXEVAL has slightly higher latency than a fused VANDC/VXOR or VXOR/VXOR pair,
so the patch prefers the Altivec instructions whenever they don't need a
temporary register.
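A hypothetical sketch of that preference rule for d = (a & ~b) ^ c, where d, a, b and c are register numbers. It is not GCC's code; it assumes one simple way a temporary becomes necessary: the two-instruction sequence keeps the VANDC result in d, which is unsafe when d is also the register holding c. The names `choose_fusion`, `altivec_pair_needs_temp` and `run_altivec_pair` are illustrative only, and 64-bit words stand in for vector registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model (not GCC's implementation) of choosing how to
   emit d = (a & ~b) ^ c. */
enum fusion_choice { ALTIVEC_PAIR, XXEVAL_SINGLE };

/* The VANDC result must live somewhere until VXOR reads it.  Reusing d
   is safe unless d also holds c: writing d first would clobber c. */
static bool altivec_pair_needs_temp(int d, int c)
{
    return d == c;
}

/* The fused Altivec pair has slightly lower latency, so prefer it when
   no extra temporary is needed; otherwise use a single XXEVAL. */
static enum fusion_choice choose_fusion(int d, int c)
{
    return altivec_pair_needs_temp(d, c) ? XXEVAL_SINGLE : ALTIVEC_PAIR;
}

/* Run the two-instruction sequence on a scalar register file, using d
   as the intermediate, to show what goes wrong when d == c. */
static void run_altivec_pair(uint64_t *regs, int d, int a, int b, int c)
{
    regs[d] = regs[a] & ~regs[b];   /* VANDC d,a,b */
    regs[d] = regs[d] ^ regs[c];    /* VXOR  d,d,c */
}
```

With d distinct from c the pair computes the right value; with d == c the VXOR reads the already-overwritten c and the result collapses to zero, which is the situation where the single XXEVAL (or an extra temporary) is needed.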
                                        XXEVAL  Trunk  GCC14  GCC13  GCC12
                                        ------  -----  -----  -----  -----
Fuse VANDC -> VXOR                         209    600    600    600    600
Fuse VXOR -> VXOR                          ---    240    240    120    120
XXEVAL to fuse ANDC -> XOR                 391    ---    ---    ---    ---
XXEVAL to fuse XOR -> XOR                  240    ---    ---    ---    ---
Spill vector to stack                       78    364    364    172    184
Load spilled vector from stack             431    962    962    713    723
Vector moves                                10    100    100     70     72
Vector rotate right                        696    696    696    696    696
XXLANDC or VANDC                           209    600    600    600    600
XXLXOR or VXOR                             953  1,824  1,824  1,824  1,824
XXEVAL                                     631    ---    ---    ---    ---
XXSPLTIB and VEXTSB2D to load constants     24     24     24     24     24