https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99718
--- Comment #11 from luoxhu at gcc dot gnu.org ---
Created attachment 50474
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50474&action=edit
32bit variable vec_insert

LLVM also generates a store-hit-load sequence:

        addi 3, 1, -16
        rlwinm 4, 5, 2, 28, 29
        stvx 2, 0, 3
        stwx 6, 3, 4
        lvx 2, 0, 3
        blr
        .long 0
        .quad 0

I didn't use "can't" in my reply; sorry that it caused confusion. We thought it
was inefficient to move SF to SI in 32-bit mode, but it turns out to give a huge
performance gain there as well (46.704s -> 4.369s).

Attached is a patch that also supports variable vec_insert for 32-bit. I am
testing it on P8BE/P8LE/P9LE; could you please verify it on AIX? I will refine
it and send it to the mailing list to fix this P1 issue fundamentally.
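
For readers less familiar with the pattern, the store-hit-load sequence quoted above can be sketched in portable C using GCC's generic vector extensions. This is only an illustration under stated assumptions: the type names `v4sf`/`v4si` and both function names are hypothetical, and the register-only variant is a generic compare-and-select idiom, not the instruction sequence the attached patch emits. The `n & 3` wrap matches the `rlwinm 4, 5, 2, 28, 29` in the listing, which computes the byte offset `(n * 4) & 12`.

```c
#include <string.h>

/* Hypothetical names; GCC/Clang generic vector extensions, not <altivec.h>. */
typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));

/* Through memory: spill the vector, overwrite one element, reload.
   This mirrors the stvx / stwx / lvx shape in the listing; the reload
   reads data that was just stored, which is the store-hit-load hazard. */
v4sf insert_via_memory(v4sf v, float f, unsigned n)
{
    float tmp[4];
    memcpy(tmp, &v, sizeof tmp);   /* vector store (cf. stvx)          */
    tmp[n & 3u] = f;               /* scalar store into slot (cf. stwx) */
    memcpy(&v, tmp, sizeof tmp);   /* vector reload (cf. lvx)           */
    return v;
}

/* Register-only alternative (illustrative assumption): splat the scalar,
   build a lane mask by comparing lane numbers with the index, then blend. */
v4sf insert_via_select(v4sf v, float f, unsigned n)
{
    const v4si lanes = {0, 1, 2, 3};
    int  i     = (int)(n & 3u);
    v4si idx   = {i, i, i, i};
    v4sf splat = {f, f, f, f};
    v4si mask  = (lanes == idx);   /* all-ones in the selected lane, else 0 */

    /* Same-size vector casts reinterpret the bits, so this is a bitwise blend. */
    return (v4sf)(((v4si)v & ~mask) | ((v4si)splat & mask));
}
```

Both functions return the same vector for any index; the second one simply never round-trips the vector through the stack, which is the property the variable vec_insert fix is after.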