Hi,

On Fri, Nov 30, 2012 at 1:14 PM, Christophe Gisquet
<christophe.gisq...@gmail.com> wrote:
> Hello,
>
> 2012/11/30 Loren Merritt <lor...@u.washington.edu>:
>> If you increment an index into W and z rather than the pointers
>> themselves, then you can eliminate an add and a cmp.
>
> I had already tested that, and redid it:
> cglobal sbr_qmf_post_shuffle, 2,4,3,W,z
>   mov       r3q, 32*4
>   lea       r2q, [zq + (64-4)*4]
>   add        zq, r3q
>   lea        Wq, [Wq + 2*r3q]
>   neg       r3q
> .loop:
>   mova       m0, [r2q]
>   mova       m1, [zq  + r3q]
>   xorps      m0, [ps_neg]
>   shufps     m0, m0, 0x1B
>   mova       m2, m0
>   unpcklps   m0, m1
>   unpckhps   m2, m1
>   mova  [Wq + 2*r3q +  0], m0
>   mova  [Wq + 2*r3q + 16], m2
>   sub       r2q, 16
>   add       r3q, 16
>   jl      .loop
>   REP_RET
>
> It's 2 cycles slower on Penryn/Win64 (154 vs 152).

Try adding an "ALIGN 16" just above ".loop:", maybe that fixes it?
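
For anyone following along, the indexing change Loren suggested (quoted above) can be sketched in scalar C. This is only an illustration, not the SIMD code under review: the function names and the flat 64-float layout for W are assumptions made for the example. The first version advances separate pointers and compares against an end pointer; the second biases the bases to their ends and uses one negative index that counts up to zero, which is what lets the asm loop finish with a single add followed by jl.

```c
#include <stddef.h>

/* Scalar sketch: W[2*k] = -z[63-k], W[2*k+1] = z[k] for k = 0..31.
 * W is treated as 64 interleaved floats (assumption for this example). */

/* Pointer-increment form: each pointer is bumped every iteration and the
 * loop needs an explicit compare against an end pointer. */
void post_shuffle_ptr(float *W, const float *z)
{
    const float *lo  = z;        /* walks up from z[0]     */
    const float *hi  = z + 63;   /* walks down from z[63]  */
    const float *end = z + 32;

    while (lo < end) {
        W[0] = -*hi--;
        W[1] = *lo++;
        W += 2;
    }
}

/* Index form: point the bases at their ends and run one negative index
 * up to zero, so the loop condition falls out of the index update.
 * The descending pointer is kept separate, mirroring r2q in the asm. */
void post_shuffle_idx(float *W, const float *z)
{
    const float *zend = z + 32;  /* zend[i] == z[32 + i]       */
    float       *wend = W + 64;  /* wend[2*i] == W[64 + 2*i]   */
    const float *hi   = z + 63;

    for (ptrdiff_t i = -32; i < 0; i++) {
        wend[2 * i]     = -*hi--;
        wend[2 * i + 1] = zend[i];
    }
}
```

Both functions should produce identical output; whether the index form is actually faster depends on the CPU and on loop alignment, as the 154 vs 152 cycle measurement above shows.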

Ronald