Hi,

On Sat, Jul 14, 2012 at 9:29 PM, Justin Ruggles
<justin.rugg...@gmail.com> wrote:
> +    mova       m0, [srcq         ]  ; m0 =  0,  1,  2,  3,  4,  5,  6,  7
> +    mova       m1, [srcq+  mmsize]  ; m1 =  8,  9, 10, 11, 12, 13, 14, 15
> +    mova       m2, [srcq+2*mmsize]  ; m2 = 16, 17, 18, 19, 20, 21, 22, 23
> +    movhlps    m3, m1
> +    movlhps    m3, m2               ; m3 = 12, 13, 14, 15, 16, 17, 18, 19
> +    movlhps    m1, m1
> +    movhlps    m1, m0               ; m1 =  4,  5,  6,  7,  8,  9, 10, 11
> +    psrldq     m1, 4                ; m1 =  6,  7,  8,  9, 10, 11,  x,  x
> +    psrldq     m2, 4                ; m2 = 18, 19, 20, 21, 22, 23,  x,  x

See 10/15: you should be able to do this with 2x palignr + 1x psrldq
instead of the movhlps/movlhps pairs.
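To make that concrete, here is a rough scalar model in C that checks one possible palignr sequence against the quoted shuffles. The instruction sequence in the comment is my own guess at what is meant, not something taken from the patch, and it would need an SSSE3 (or later) version of the function since palignr is not available in plain SSE2. The type `xmm_t` and the helper `shuffles_match()` are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One 128-bit register modelled as 8 words (int16 samples). */
typedef struct { uint16_t w[8]; } xmm_t;

/* movhlps dst, src: low half of dst = high half of src. */
static xmm_t movhlps(xmm_t dst, xmm_t src)
{
    memcpy(dst.w, src.w + 4, 4 * sizeof(uint16_t));
    return dst;
}

/* movlhps dst, src: high half of dst = low half of src. */
static xmm_t movlhps(xmm_t dst, xmm_t src)
{
    memcpy(dst.w + 4, src.w, 4 * sizeof(uint16_t));
    return dst;
}

/* psrldq dst, n: shift right by n bytes, zero-filling the top.
 * Modelled in whole words, so n must be even here. */
static xmm_t psrldq(xmm_t a, int bytes)
{
    xmm_t r = { {0} };
    int s = bytes / 2;
    for (int i = 0; i + s < 8; i++)
        r.w[i] = a.w[i + s];
    return r;
}

/* palignr dst, src, n: concatenate dst:src (dst is the high part),
 * shift right by n bytes, keep the low 128 bits. n must be even here. */
static xmm_t palignr(xmm_t dst, xmm_t src, int bytes)
{
    uint16_t cat[16];
    xmm_t r;
    memcpy(cat, src.w, sizeof(src.w));
    memcpy(cat + 8, dst.w, sizeof(dst.w));
    memcpy(r.w, cat + bytes / 2, sizeof(r.w));
    return r;
}

/* Returns 1 if the assumed palignr variant yields the same useful words
 * as the quoted movhlps/movlhps/psrldq sequence.
 *
 * Assumed replacement (hypothetical, not from the patch):
 *     mova     m3, m2
 *     palignr  m3, m1, 8      ; m3 = 12 .. 19
 *     palignr  m1, m0, 12     ; m1 =  6 .. 13 (top two words unused)
 *     psrldq   m2, 4          ; m2 = 18 .. 23, 0, 0
 */
static int shuffles_match(void)
{
    xmm_t m0, m1, m2, m3 = { {0} };
    for (int i = 0; i < 8; i++) {
        m0.w[i] = (uint16_t)i;
        m1.w[i] = (uint16_t)(8 + i);
        m2.w[i] = (uint16_t)(16 + i);
    }

    /* Sequence as quoted in the patch. */
    xmm_t a3 = movlhps(movhlps(m3, m1), m2);
    xmm_t a1 = psrldq(movhlps(movlhps(m1, m1), m0), 4);
    xmm_t a2 = psrldq(m2, 4);

    /* Assumed palignr variant; m3 is built first because it needs the
     * original m1. */
    xmm_t b3 = palignr(m2, m1, 8);
    xmm_t b1 = palignr(m1, m0, 12);
    xmm_t b2 = psrldq(m2, 4);

    assert(b3.w[0] == 12 && b3.w[7] == 19);

    /* Only the low 6 words of m1 are consumed by the later unpacks. */
    return memcmp(a3.w, b3.w, 8 * sizeof(uint16_t)) == 0 &&
           memcmp(a1.w, b1.w, 6 * sizeof(uint16_t)) == 0 &&
           memcmp(a2.w, b2.w, 8 * sizeof(uint16_t)) == 0;
}
```

Calling `shuffles_match()` from a small `main()` should return 1. The top two words of m1 differ between the two variants (0, 0 versus 12, 13), which is fine as long as the following punpck steps keep treating them as don't-care.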

> +    punpcklwd  m4, m0, m1           ; m4 =  0,  6,  1,  7,  2,  8,  3,  9
> +    punpckhwd  m0, m1               ; m0 =  4, 10,  5, 11,  x,  x,  x,  x
> +    punpcklwd  m1, m3, m2           ; m1 = 12, 18, 13, 19, 14, 20, 15, 21
> +    punpckhwd  m3, m2               ; m3 = 16, 22, 17, 23,  x,  x,  x,  x
> +    punpckldq  m2, m4, m1           ; m2 =  0,  6, 12, 18,  1,  7, 13, 19
> +    punpckhdq  m4, m1               ; m4 =  2,  8, 14, 20,  3,  9, 15, 21
> +    punpckldq  m0, m3               ; m0 =  4, 10, 16, 22,  5, 11, 17, 23
> +    movhlps    m3, m2               ; m3 =  1,  7, 13, 19,  x,  x,  x,  x
> +    movhlps    m5, m4               ; m5 =  3,  9, 15, 21,  x,  x,  x,  x
> +    movhlps    m1, m0               ; m1 =  5, 11, 17, 23,  x,  x,  x,  x
> +    PMOVSXWD   m0, m0
> +    PMOVSXWD   m1, m1
> +    PMOVSXWD   m2, m2
> +    PMOVSXWD   m3, m3
> +    PMOVSXWD   m4, m4
> +    PMOVSXWD   m5, m5
> +    cvtdq2ps   m0, m0
> +    cvtdq2ps   m1, m1
> +    cvtdq2ps   m2, m2
> +    cvtdq2ps   m3, m3
> +    cvtdq2ps   m4, m4
> +    cvtdq2ps   m5, m5
> +    mulps      m0, m6
> +    mulps      m1, m6
> +    mulps      m2, m6
> +    mulps      m3, m6
> +    mulps      m4, m6
> +    mulps      m5, m6
> +    mova  [dstq      ], m2
> +    mova  [dstq+dst1q], m3
> +    mova  [dstq+dst2q], m4
> +    mova  [dstq+dst3q], m5
> +    mova  [dstq+dst4q], m0
> +    mova  [dstq+dst5q], m1
> +    add      srcq, mmsize*3
> +    add      dstq, mmsize
> +    sub      lend, mmsize/4

Pointer munging would let you drop one of these add/sub instructions.
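As a rough illustration of the general idea, here is a scalar C sketch: bias the pointers to the end of their buffers and let a single negative index that counts up to zero act as both offset and loop counter. The function names and the 1/32768 scale are assumptions for the example, not taken from the patch. In the asm the three strides differ (mmsize*3, mmsize, mmsize/4), so which of the three updates can actually be folded depends on the free registers and addressing modes; the C version folds more than the asm realistically can.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHANNELS 6

/* Baseline shape: source pointer, destination position and remaining
 * count are three separately updated loop variables, like the quoted
 * add srcq / add dstq / sub lend. */
static void conv_three_counters(float *const dst[CHANNELS],
                                const int16_t *src, int len, float scale)
{
    size_t pos = 0;
    while (len > 0) {
        for (int ch = 0; ch < CHANNELS; ch++)
            dst[ch][pos] = src[ch] * scale;
        src += CHANNELS;
        pos += 1;
        len -= 1;
    }
}

/* Munged shape: every pointer is moved to the end of its buffer once,
 * then one negative index runs up to zero. */
static void conv_one_counter(float *const dst[CHANNELS],
                             const int16_t *src, int len, float scale)
{
    float *dst_end[CHANNELS];
    const int16_t *src_end = src + (ptrdiff_t)len * CHANNELS;

    for (int ch = 0; ch < CHANNELS; ch++)
        dst_end[ch] = dst[ch] + len;

    for (ptrdiff_t i = -(ptrdiff_t)len; i < 0; i++)
        for (int ch = 0; ch < CHANNELS; ch++)
            dst_end[ch][i] = src_end[i * CHANNELS + ch] * scale;
}

/* Returns 1 if both loop shapes produce identical planar output. */
static int conv_variants_agree(void)
{
    enum { LEN = 8 };
    int16_t src[LEN * CHANNELS];
    float a[CHANNELS][LEN], b[CHANNELS][LEN];
    float *pa[CHANNELS], *pb[CHANNELS];
    const float scale = 1.0f / 32768.0f;   /* assumed value of m6 */

    for (int i = 0; i < LEN * CHANNELS; i++)
        src[i] = (int16_t)(i * 257 - 6000);
    for (int ch = 0; ch < CHANNELS; ch++) {
        pa[ch] = a[ch];
        pb[ch] = b[ch];
    }

    conv_three_counters(pa, src, LEN, scale);
    conv_one_counter(pb, src, LEN, scale);

    for (int ch = 0; ch < CHANNELS; ch++)
        for (int i = 0; i < LEN; i++)
            if (a[ch][i] != b[ch][i])
                return 0;

    /* Spot check: sample 0 of channel 1 is src[1] scaled. */
    assert(b[1][0] == src[1] * scale);
    return 1;
}
```

Calling `conv_variants_agree()` should return 1, confirming that the biased-pointer loop walks the same samples in the same order.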

Ronald
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
