Hi,

On Sat, Jul 14, 2012 at 9:29 PM, Justin Ruggles <justin.rugg...@gmail.com> wrote:
> +    mova       m0, [srcq         ]  ; m0 =  0,  1,  2,  3,  4,  5,  6,  7
> +    mova       m1, [srcq+  mmsize]  ; m1 =  8,  9, 10, 11, 12, 13, 14, 15
> +    mova       m2, [srcq+2*mmsize]  ; m2 = 16, 17, 18, 19, 20, 21, 22, 23
> +    movhlps    m3, m1
> +    movlhps    m3, m2               ; m3 = 12, 13, 14, 15, 16, 17, 18, 19
> +    movlhps    m1, m1
> +    movhlps    m1, m0               ; m1 =  4,  5,  6,  7,  8,  9, 10, 11
> +    psrldq     m1, 4                ; m1 =  6,  7,  8,  9, 10, 11,  x,  x
> +    psrldq     m2, 4                ; m2 = 18, 19, 20, 21, 22, 23,  x,  x

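As a sanity check on the lane comments in the hunk above, here is a small scalar model in C (not the x86inc assembly itself). It runs the quoted shuffle sequence and one possible palignr-based sequence (two palignr plus one psrldq) on word lanes 0..23 and checks that both agree on every lane the later code reads. The `reg` type, the helper names, and the exact palignr operands are assumptions made for illustration; none of them come from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One 128-bit register modelled as eight 16-bit lanes; lane 0 is the lowest. */
typedef struct { int16_t w[8]; } reg;

/* movhlps dst, src: low 64 bits of dst = high 64 bits of src. */
static void movhlps(reg *dst, const reg *src) { memcpy(&dst->w[0], &src->w[4], 8); }

/* movlhps dst, src: high 64 bits of dst = low 64 bits of src. */
static void movlhps(reg *dst, const reg *src) { memcpy(&dst->w[4], &src->w[0], 8); }

/* psrldq dst, bytes: byte-wise right shift with zero fill (even shifts only here). */
static void psrldq(reg *dst, int bytes)
{
    assert(bytes % 2 == 0 && bytes >= 0 && bytes <= 16);
    int n = bytes / 2;
    for (int i = 0; i < 8; i++)
        dst->w[i] = (i + n < 8) ? dst->w[i + n] : 0;
}

/* palignr dst, src, bytes: shift the 32-byte value dst:src right, keep the low 16 bytes. */
static void palignr(reg *dst, const reg *src, int bytes)
{
    assert(bytes % 2 == 0 && bytes >= 0 && bytes <= 16);
    int n = bytes / 2;
    reg r;
    for (int i = 0; i < 8; i++)
        r.w[i] = (i + n < 8) ? src->w[i + n] : dst->w[i + n - 8];
    *dst = r;
}

/* Fill m0..m2 with lane indices 0..23, as in the quoted comments. */
static void load(reg m[4])
{
    memset(m, 0, 4 * sizeof(reg));
    for (int i = 0; i < 8; i++) {
        m[0].w[i] = (int16_t)i;
        m[1].w[i] = (int16_t)(8 + i);
        m[2].w[i] = (int16_t)(16 + i);
    }
}

/* The sequence from the quoted hunk: four movhlps/movlhps plus two psrldq. */
static void shuffle_quoted(reg m[4])
{
    load(m);
    movhlps(&m[3], &m[1]);
    movlhps(&m[3], &m[2]);   /* m3 = 12..19 */
    movlhps(&m[1], &m[1]);
    movhlps(&m[1], &m[0]);   /* m1 = 4..11 */
    psrldq(&m[1], 4);        /* m1 = 6..11, x, x */
    psrldq(&m[2], 4);        /* m2 = 18..23, x, x */
}

/* Assumed alternative: two palignr plus one psrldq. */
static void shuffle_palignr(reg m[4])
{
    load(m);
    m[3] = m[2];                  /* copy needed for a two-operand palignr */
    palignr(&m[3], &m[1], 8);     /* m3 = 12..19 */
    palignr(&m[1], &m[0], 12);    /* m1 = 6..13 (lanes 6 and 7 are don't-care) */
    psrldq(&m[2], 4);             /* m2 = 18..23, x, x */
}

/* Returns 1 if both sequences agree on all lanes that are not marked x. */
static int shuffles_agree(void)
{
    reg a[4], b[4];
    shuffle_quoted(a);
    shuffle_palignr(b);
    return memcmp(a[1].w, b[1].w, 6 * sizeof(int16_t)) == 0 &&
           memcmp(a[2].w, b[2].w, 6 * sizeof(int16_t)) == 0 &&
           memcmp(a[3].w, b[3].w, 8 * sizeof(int16_t)) == 0;
}
```

`shuffles_agree()` returns 1 when the two sequences match. Lanes 6 and 7 of m1 and m2 are excluded from the comparison because the quoted comments mark them as x. Note that palignr requires SSSE3, so a variant like this would not apply to a plain SSE2 version of the function.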
See 10/15; you should be able to do this using palignr x2 + psrldq x1 instead.

> +    punpcklwd  m4, m0, m1           ; m4 =  0,  6,  1,  7,  2,  8,  3,  9
> +    punpckhwd  m0, m1               ; m0 =  4, 10,  5, 11,  x,  x,  x,  x
> +    punpcklwd  m1, m3, m2           ; m1 = 12, 18, 13, 19, 14, 20, 15, 21
> +    punpckhwd  m3, m2               ; m3 = 16, 22, 17, 23,  x,  x,  x,  x
> +    punpckldq  m2, m4, m1           ; m2 =  0,  6, 12, 18,  1,  7, 13, 19
> +    punpckhdq  m4, m1               ; m4 =  2,  8, 14, 20,  3,  9, 15, 21
> +    punpckldq  m0, m3               ; m0 =  4, 10, 16, 22,  5, 11, 17, 23
> +    movhlps    m3, m2               ; m3 =  1,  7, 13, 19,  x,  x,  x,  x
> +    movhlps    m5, m4               ; m5 =  3,  9, 15, 21,  x,  x,  x,  x
> +    movhlps    m1, m0               ; m1 =  5, 11, 17, 23,  x,  x,  x,  x
> +    PMOVSXWD   m0, m0
> +    PMOVSXWD   m1, m1
> +    PMOVSXWD   m2, m2
> +    PMOVSXWD   m3, m3
> +    PMOVSXWD   m4, m4
> +    PMOVSXWD   m5, m5
> +    cvtdq2ps   m0, m0
> +    cvtdq2ps   m1, m1
> +    cvtdq2ps   m2, m2
> +    cvtdq2ps   m3, m3
> +    cvtdq2ps   m4, m4
> +    cvtdq2ps   m5, m5
> +    mulps      m0, m6
> +    mulps      m1, m6
> +    mulps      m2, m6
> +    mulps      m3, m6
> +    mulps      m4, m6
> +    mulps      m5, m6
> +    mova  [dstq      ], m2
> +    mova  [dstq+dst1q], m3
> +    mova  [dstq+dst2q], m4
> +    mova  [dstq+dst3q], m5
> +    mova  [dstq+dst4q], m0
> +    mova  [dstq+dst5q], m1
> +    add      srcq, mmsize*3
> +    add      dstq, mmsize
> +    sub      lend, mmsize/4

Pointer munging would let you remove one of the add/sub instructions here.

Ronald
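To illustrate the pointer-munging remark, here is a minimal C sketch of the idea, assuming a 6-channel int16-to-float deinterleave like the quoted loop. The destination pointers are moved to the end of their buffers and a single negative offset counts up to zero, so one increment both advances the output position and drives the loop condition. The function names, the array-of-pointers layout, and the `scale` parameter are hypothetical; the patch itself uses register offsets (`dst1q` to `dst5q`) rather than separate pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CHANNELS = 6 };

/* Plain loop: source pointer, output index and remaining count are all updated. */
static void deinterleave_plain(float *dst[CHANNELS], const int16_t *src,
                               int len, float scale)
{
    assert(len >= 0);
    int i = 0;
    while (len > 0) {
        for (int ch = 0; ch < CHANNELS; ch++)
            dst[ch][i] = src[ch] * scale;
        src += CHANNELS;   /* corresponds to "add srcq" */
        i++;               /* corresponds to "add dstq" */
        len--;             /* corresponds to "sub lend" */
    }
}

/* Munged loop: outputs are addressed relative to the buffer ends, and the
 * negative offset doubles as the loop counter, so one update disappears. */
static void deinterleave_munged(float *dst[CHANNELS], const int16_t *src,
                                int len, float scale)
{
    assert(len >= 0);
    float *end[CHANNELS];
    for (int ch = 0; ch < CHANNELS; ch++)
        end[ch] = dst[ch] + len;          /* one-time setup before the loop */

    for (ptrdiff_t i = -(ptrdiff_t)len; i < 0; i++) {
        for (int ch = 0; ch < CHANNELS; ch++)
            end[ch][i] = src[ch] * scale; /* end + i stays inside the buffer */
        src += CHANNELS;                  /* source stride differs, keeps its own add */
    }
}

/* Returns 1 if both loops produce identical output on a small test buffer. */
static int munged_matches_plain(void)
{
    enum { LEN = 8 };
    int16_t src[LEN * CHANNELS];
    float a[CHANNELS][LEN], b[CHANNELS][LEN];
    float *pa[CHANNELS], *pb[CHANNELS];

    for (int i = 0; i < LEN * CHANNELS; i++)
        src[i] = (int16_t)(i * 37 - 500);
    for (int ch = 0; ch < CHANNELS; ch++) {
        pa[ch] = a[ch];
        pb[ch] = b[ch];
    }
    deinterleave_plain(pa, src, LEN, 1.0f / 32768.0f);
    deinterleave_munged(pb, src, LEN, 1.0f / 32768.0f);
    return memcmp(a, b, sizeof a) == 0;
}
```

In the assembly, the equivalent change would be to add the byte length to `dstq` once before the loop, negate the length register, index stores through it, and loop on its sign. The source pointer keeps its own `add` because it advances by `mmsize*3` per iteration rather than `mmsize`.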