On Sat, Oct 3, 2015 at 2:12 AM, Ronald S. Bultje <rsbul...@gmail.com> wrote:
> Well, the prototype is different. For H/V, it's not critical, but for the
> directional ones, the edge handling is very quirky so I wanted to do that
> in C, so l/a are arguments instead of part of the source buffer.
>
> (And because we do in-loop filtering, doing V as-is from h264 won't work,
> since a can be post-loopfilter, whereas in h264 it's required to be pre-,
> and we don't swap in vp9.)

Oh, I see. Then it's fine.

> +cglobal vp9_ipred_v_32x32_16, 2, 4, 4, dst, stride, l, a
[...]
> +.loop:
> +    mova   [dstq+strideq*0+ 0], m0
> +    mova   [dstq+strideq*0+16], m1
> +    mova   [dstq+strideq*0+32], m2
> +    mova   [dstq+strideq*0+48], m3
> +    mova   [dstq+strideq*1+ 0], m0
> +    mova   [dstq+strideq*1+16], m1
> +    mova   [dstq+strideq*1+32], m2
> +    mova   [dstq+strideq*1+48], m3
> +    mova   [dstq+strideq*2+ 0], m0
> +    mova   [dstq+strideq*2+16], m1
> +    mova   [dstq+strideq*2+32], m2
> +    mova   [dstq+strideq*2+48], m3
> +    mova   [dstq+stride3q + 0], m0
> +    mova   [dstq+stride3q +16], m1
> +    mova   [dstq+stride3q +32], m2
> +    mova   [dstq+stride3q +48], m3
> +    lea                   dstq, [dstq+strideq*4]
> +    dec                   cntd
> +    jg .loop
> +    RET

Missed this one before, but you could cut the number of stores/iteration in
half here as well.

Feel free to push after that.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
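
For reference, a rough C model of the "half the stores per iteration" suggestion for the 32x32 vertical predictor. This is only a sketch, not the actual asm: the function names, the flat m[] buffer standing in for m0..m3, and the loop counts (8 vs. 16) are assumptions, since the counter setup is elided in the quoted patch. It models writing two rows per iteration and running twice as many iterations, which should produce the same output as the four-row loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { ROWS = 32, ROW_BYTES = 64, REG_BYTES = 16 };

/* Stand-in for four 16-byte mova stores: writes one 64-byte row
 * (32 pixels at 16 bits each) from the m[] buffer that models m0..m3. */
static void store_row(uint8_t *dst, const uint8_t *m)
{
    for (int i = 0; i < ROW_BYTES / REG_BYTES; i++)
        memcpy(dst + i * REG_BYTES, m + i * REG_BYTES, REG_BYTES);
}

/* Model of the quoted loop: 4 rows (16 stores) per iteration, 8 iterations.
 * stride is in bytes, as in the asm. */
void pred_v_32x32_4rows(uint8_t *dst, ptrdiff_t stride, const uint16_t *above)
{
    uint8_t m[ROW_BYTES];
    memcpy(m, above, ROW_BYTES);            /* load the above row once */
    for (int cnt = ROWS / 4; cnt > 0; cnt--) {
        store_row(dst + stride * 0, m);
        store_row(dst + stride * 1, m);
        store_row(dst + stride * 2, m);
        store_row(dst + stride * 3, m);
        dst += stride * 4;
    }
}

/* Model of the suggested loop: 2 rows (8 stores) per iteration,
 * 16 iterations. Same output, half the loop body. */
void pred_v_32x32_2rows(uint8_t *dst, ptrdiff_t stride, const uint16_t *above)
{
    uint8_t m[ROW_BYTES];
    memcpy(m, above, ROW_BYTES);            /* load the above row once */
    for (int cnt = ROWS / 2; cnt > 0; cnt--) {
        store_row(dst + stride * 0, m);
        store_row(dst + stride * 1, m);
        dst += stride * 2;
    }
}
```

In asm terms this would presumably mean dropping the strideq*2 and stride3q stores, advancing dstq by strideq*2, and doubling the initial cntd, which also frees the stride3 register.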