On 04/09/2013 03:51 PM, Christophe Gisquet wrote:
> 2013/4/9 Justin Ruggles <[email protected]>:
>> On 04/07/2013 04:29 PM, Christophe Gisquet wrote:
>>> +    mova    m0, [src0q+cq]
>>> +    mova    m1, [src1q]
>>> +    mova    m4, [src0q+cq+mmsize]
>>> +    mova    m5, [src1q+mmsize]
>>> +%if cpuflag(sse2)
>>> +    pshufd  m2, m0, q0123
>>> +    pshufd  m3, m1, q0123
>>> +    pshufd  m6, m4, q0123
>>> +    pshufd  m7, m5, q0123
>>> +%else
>>> +    shufps  m2, m0, m0, q0123
>>> +    shufps  m3, m1, m1, q0123
>>> +    shufps  m6, m4, m4, q0123
>>> +    shufps  m7, m5, m5, q0123
>>> +%endif
>>
>> You can use memory args for the pshufd.
>
> Because of the subps, it's not strictly commutative here, and I ended
> up with this using 6 xmm regs:
>     mova    m0, [src0q+cq]
>     mova    m2, [src0q+cq+mmsize]
>     pshufd  m4, [src1q], q0123
>     pshufd  m5, [src1q+mmsize], q0123
>     pshufd  m3, m0, m0, q0123
>     pshufd  m1, m2, m2, q0123
>     addps   m3, [src1q+mmsize]
>     subps   m0, m5
>     addps   m1, [src1q]
>     subps   m2, m4
> This is 79 cycles compared to the 68 of the original version. Nothing
> that better scheduling could help.

Yeah, doing that doesn't make sense. I didn't notice the original memory
values were being used below as well.

Patch looks ok.

Thanks,
Justin
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
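
For readers following the thread, here is a rough scalar C model of what the quoted shuffles compute. It is only a sketch, not code from the patch: `shuffle4`, `Q`, and `add_sub_reversed` are hypothetical names made up for illustration, and the model covers just one addps/subps pair from the quoted 6-register variant. It shows that q0123 reverses the four lanes, and that each input is needed both reversed and in its original order, which is why folding the load into pshufd does not remove the need for the unshuffled values.

```c
#include <stdint.h>

/* x86inc-style qABCD shuffle constant (illustrative macro, not from the
 * patch): A selects the source for lane 3, D selects it for lane 0. */
#define Q(a, b, c, d) (((a) << 6) | ((b) << 4) | ((c) << 2) | (d))

/* Scalar model of a 4-lane 32-bit shuffle with a single source
 * (pshufd, or shufps with both source operands equal):
 * lane i of dst receives src[(imm >> (2 * i)) & 3]. */
static void shuffle4(float dst[4], const float src[4], uint8_t imm)
{
    float tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm >> (2 * i)) & 3];
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];
}

/* Models one add/sub pair from the quoted snippet, with a standing in for
 * [src0q+cq] and b for [src1q+mmsize]:
 *   sum  = reverse(a) + b    (pshufd m3, m0 ; addps m3, [src1q+mmsize])
 *   diff = a - reverse(b)    (pshufd m5, [src1q+mmsize] ; subps m0, m5)
 * Both a and b are used reversed and unreversed. */
static void add_sub_reversed(float sum[4], float diff[4],
                             const float a[4], const float b[4])
{
    float ra[4], rb[4];
    shuffle4(ra, a, Q(0, 1, 2, 3));
    shuffle4(rb, b, Q(0, 1, 2, 3));
    for (int i = 0; i < 4; i++) {
        sum[i]  = ra[i] + b[i];
        diff[i] = a[i] - rb[i];
    }
}
```

Calling `add_sub_reversed` with a = {1, 2, 3, 4} and b = {10, 20, 30, 40} exercises both paths. Because subps is not commutative, the reversed operand has to sit in a register as the second source, which matches the extra register pressure described in the quoted reply.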
