https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99718

--- Comment #11 from luoxhu at gcc dot gnu.org ---
Created attachment 50474
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50474&action=edit
32bit variable vec_insert

LLVM also generates a store-hit-load sequence:

        addi 3, 1, -16
        rlwinm 4, 5, 2, 28, 29
        stvx 2, 0, 3
        stwx 6, 3, 4
        lvx 2, 0, 3
        blr
        .long   0
        .quad   0
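
As a rough illustration only, the sequence above can be modelled in portable C
roughly as follows. This is a sketch, not the code from the attachment: the
names vec4u32 and insert_via_memory are hypothetical, and a 16-byte stack
buffer stands in for the slot addressed by "addi 3, 1, -16".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit vector of four 32-bit elements,
   laid out in memory order. */
typedef struct { uint32_t w[4]; } vec4u32;

/* Insert `value` at a run-time index by going through memory, mirroring
   the stvx / stwx / lvx sequence. */
static vec4u32 insert_via_memory(vec4u32 v, uint32_t value, unsigned idx)
{
    unsigned char slot[16];                      /* addi 3, 1, -16 */
    size_t offset = ((size_t)idx << 2) & 0xC;    /* rlwinm 4, 5, 2, 28, 29 */

    memcpy(slot, &v, sizeof slot);               /* stvx: spill whole vector */
    memcpy(slot + offset, &value, sizeof value); /* stwx: overwrite one word */
    memcpy(&v, slot, sizeof slot);               /* lvx: reload whole vector */
    return v;
}
```

The full-width reload immediately after the narrow store to the same slot is
the part that stalls on the hardware.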

I didn't use "can't" in my reply; sorry that it caused confusion. We thought it
was inefficient to move SF to SI in 32-bit mode, but it turns out to also give
a huge performance gain (46.704s -> 4.369s).

Attached is the patch that also supports variable vec_insert for 32-bit. I am
testing it on P8BE/P8LE/P9LE; could you please verify it on AIX? I will refine
it and send it to the mailing list to fix this P1 issue fundamentally.
