On Tue, May 1, 2018 at 10:02 AM, Paul B Mahol <one...@gmail.com> wrote:
> +cglobal overlay_row_22, 6, 8, 8, 0, d, da, s, a, w, al, r, x
[...]
> +    movu        m2, [aq+2*xq]
> +    pand        m2, m3
> +    movu        m6, [aq+2*xq]
> +    pand        m6, m7
> +    psrlw       m6, 8
> +    paddw       m2, m6
> +    psrlw       m2, 1
> +    movu        m6, [aq+2*xq]
> +    pand        m6, m3
> +    paddw       m2, m6
> +    psrlw       m2, 1
I believe this can be simplified to something like (untested):

    movu        m1, [aq+2*xq]
    pandn       m2, m3, m1
    psllw       m1, 8
    pavgw       m2, m1
    pavgw       m2, m1
    psrlw       m2, 8

> +cglobal overlay_row_20, 6, 8, 8, 0, d, da, s, a, w, al, r, x
[...]
> +    movu        m2, [aq+2*xq]
> +    pand        m2, m3
> +    movu        m6, [aq+2*xq]
> +    pand        m6, m7
> +    psrlw       m6, 8
> +    paddw       m2, m6
> +    movu        m6, [daq+2*xq]
> +    pand        m6, m3
> +    paddw       m2, m6
> +    movu        m6, [daq+2*xq]
> +    pand        m6, m7
> +    psrlw       m6, 8
> +    paddw       m2, m6
> +    psrlw       m2, 2

And this to (untested):

    mova        m6, [pb_1]
    ...
    movu        m2, [aq+2*xq]
    movu        m1, [daq+2*xq]
    pmaddubsw   m2, m6
    pmaddubsw   m1, m6
    paddw       m2, m1
    psrlw       m2, 2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
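As a quick sanity check on the two untested suggestions, here is a scalar C model of one 16-bit lane of each sequence. It compares the patch and the suggested form for overlay_row_22 exhaustively, and for overlay_row_20 it uses a sampled comparison. The model rests on a few assumptions. m3 is taken to hold 0x00FF and m7 to hold 0xFF00 in every word, since their setup is snipped from the quote. pb_1 is taken to have every byte set to 1. The helper names are hypothetical and are not FFmpeg functions.

Worked by hand, both row_22 forms reduce to (3*a0 + a1) >> 2. In the suggested form, both pavgw inputs are multiples of 128, so their sum is even and the +1 rounding term is always discarded. The largest intermediate is 64 * (3*255 + 255) = 65280, which still fits in an unsigned word. For row_20, pmaddubsw (SSSE3) against a vector of byte 1s sums each pair of adjacent alpha bytes into a word. That sum is at most 510, so the instruction's signed saturation never triggers.

```c
#include <stdint.h>

/* Scalar model of pavgw on one word: unsigned average, rounded up. */
static uint16_t pavgw(uint16_t x, uint16_t y)
{
    return (uint16_t)(((uint32_t)x + y + 1) >> 1);
}

/* Scalar model of pmaddubsw on one output word: the unsigned bytes of x
 * times the signed bytes of y, adjacent products summed, then
 * saturated to int16. */
static int16_t pmaddubsw(uint16_t x, uint16_t y)
{
    int32_t sum = (int32_t)(x & 0xFF) * (int8_t)(y & 0xFF)
                + (int32_t)(x >> 8) * (int8_t)(y >> 8);
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}

/* Patch's overlay_row_22 lane. w is the word at [aq+2*xq]:
 * low byte a0, high byte a1. Assumes m3 = 0x00FF and m7 = 0xFF00. */
static uint16_t row22_patch(uint16_t w)
{
    uint16_t m2 = w & 0x00FF;             /* pand m2, m3        -> a0 */
    uint16_t m6 = (w & 0xFF00) >> 8;      /* pand m6, m7; psrlw -> a1 */
    m2 = (uint16_t)((m2 + m6) >> 1);      /* paddw; psrlw m2, 1       */
    m6 = w & 0x00FF;                      /* pand m6, m3        -> a0 */
    m2 = (uint16_t)((m2 + m6) >> 1);      /* paddw; psrlw m2, 1       */
    return m2;
}

/* Suggested overlay_row_22 lane. */
static uint16_t row22_suggested(uint16_t w)
{
    uint16_t m1 = w;
    uint16_t m2 = m1 & 0xFF00;            /* pandn m2, m3, m1 -> a1 << 8 */
    m1 = (uint16_t)(m1 << 8);             /* psllw m1, 8      -> a0 << 8 */
    m2 = pavgw(m2, m1);                   /* (a0 + a1) << 7              */
    m2 = pavgw(m2, m1);                   /* (3*a0 + a1) << 6            */
    return (uint16_t)(m2 >> 8);           /* psrlw m2, 8                 */
}

/* Patch's overlay_row_20 lane. dw is the word at [daq+2*xq]. */
static uint16_t row20_patch(uint16_t w, uint16_t dw)
{
    uint16_t m2 = (uint16_t)((w & 0x00FF) + ((w & 0xFF00) >> 8));
    m2 = (uint16_t)(m2 + (dw & 0x00FF) + ((dw & 0xFF00) >> 8));
    return (uint16_t)(m2 >> 2);
}

/* Suggested overlay_row_20 lane; pb_1 assumed to be all byte 1s. */
static uint16_t row20_suggested(uint16_t w, uint16_t dw)
{
    const uint16_t pb_1 = 0x0101;
    int16_t m2 = pmaddubsw(w, pb_1);      /* a0 + a1   */
    int16_t m1 = pmaddubsw(dw, pb_1);     /* da0 + da1 */
    uint16_t sum = (uint16_t)(m2 + m1);   /* paddw     */
    return (uint16_t)(sum >> 2);          /* psrlw m2, 2 */
}

/* Number of lanes where the suggestion disagrees with the patch. */
static unsigned count_mismatches(void)
{
    unsigned bad = 0;
    for (uint32_t w = 0; w <= 0xFFFF; w++) {
        if (row22_patch((uint16_t)w) != row22_suggested((uint16_t)w))
            bad++;
        /* row_20 takes two words; step the second by 255
         * (0x0000 .. 0xFFFF) instead of trying all 2^32 pairs. */
        for (uint32_t dw = 0; dw <= 0xFFFF; dw += 255)
            if (row20_patch((uint16_t)w, (uint16_t)dw) !=
                row20_suggested((uint16_t)w, (uint16_t)dw))
                bad++;
    }
    return bad;
}
```

To run it, call count_mismatches() from a small main and check that it returns 0. The row_22 half covers every input word. The row_20 half only samples the second word, so on its own it is a spot check rather than a proof; the hand derivation above covers the rest.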