Hi,

On Fri, Nov 30, 2012 at 1:14 PM, Christophe Gisquet <christophe.gisq...@gmail.com> wrote:
> Hello,
>
> 2012/11/30 Loren Merritt <lor...@u.washington.edu>:
>> If you increment an index into W and z rather than the pointers
>> themselves, then you can eliminate an add and a cmp.
>
> I had already tested that, and redid it:
> cglobal sbr_qmf_post_shuffle, 2,4,3,W,z
>     mov       r3q, 32*4
>     lea       r2q, [zq + (64-4)*4]
>     add       zq, r3q
>     lea       Wq, [Wq + 2*r3q]
>     neg       r3q
> .loop:
>     mova      m0, [r2q]
>     mova      m1, [zq + r3q]
>     xorps     m0, [ps_neg]
>     shufps    m0, m0, 0x1B
>     mova      m2, m0
>     unpcklps  m0, m1
>     unpckhps  m2, m1
>     mova      [Wq + 2*r3q + 0], m0
>     mova      [Wq + 2*r3q + 16], m2
>     sub       r2q, 16
>     add       r3q, 16
>     jl        .loop
>     REP_RET
>
> It's 2 cycles slower on Penryn/Win64 (154 vs 152).
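For readers following along, here is a scalar C sketch of what the quoted loop appears to compute, assuming (from reading the SSE code, not from the libav C source) that it fills W[k] = { -z[63-k], z[k] } for k = 0..31; the xorps with ps_neg flips the sign bit, which is what unary minus does for IEEE floats. The second function mirrors Loren's suggestion: bias the base pointers once and walk a single negative index up to zero. The names post_shuffle_ref, post_shuffle_indexed and check_post_shuffle are hypothetical, made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference (assumed semantics, read off the asm):
 * W[k] = { -z[63 - k], z[k] } for k = 0..31. */
static void post_shuffle_ref(float W[32][2], const float *z)
{
    for (int k = 0; k < 32; k++) {
        W[k][0] = -z[63 - k];
        W[k][1] =  z[k];
    }
}

/* Same computation in the "index, not pointers" style: move the base
 * pointers past the end once, then count one negative index up to zero.
 * In asm the index add sets the flags for the loop branch, so no separate
 * cmp is needed; the reversed read keeps its own pointer because x86
 * addressing has no [base - index] form. */
static void post_shuffle_indexed(float W[32][2], const float *z)
{
    float (*Wend)[2]  = W + 32;   /* cf. lea Wq, [Wq + 2*r3q] */
    const float *zend = z + 32;   /* cf. add zq, r3q          */
    const float *zrev = z + 63;   /* cf. r2q, walks downwards */

    for (ptrdiff_t i = -32; i < 0; i++, zrev--) {
        Wend[i][0] = -*zrev;      /* negate, like xorps with ps_neg */
        Wend[i][1] =  zend[i];
    }
}

/* Hypothetical self-check: both versions must agree element by element. */
static void check_post_shuffle(const float *z)
{
    float a[32][2], b[32][2];
    post_shuffle_ref(a, z);
    post_shuffle_indexed(b, z);
    for (int k = 0; k < 32; k++)
        assert(a[k][0] == b[k][0] && a[k][1] == b[k][1]);
}
```

Whether the indexed form actually wins is a scheduling question; as the measurement above shows, saving an instruction does not guarantee fewer cycles.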
Try adding an "ALIGN 16" just above ".loop:", maybe that fixes it?

Ronald
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel