On 04/09/2013 03:51 PM, Christophe Gisquet wrote:
> 2013/4/9 Justin Ruggles <[email protected]>:
>> On 04/07/2013 04:29 PM, Christophe Gisquet wrote:
>>> +    mova       m0, [src0q+cq]
>>> +    mova       m1, [src1q]
>>> +    mova       m4, [src0q+cq+mmsize]
>>> +    mova       m5, [src1q+mmsize]
>>> +%if cpuflag(sse2)
>>> +    pshufd     m2, m0, q0123
>>> +    pshufd     m3, m1, q0123
>>> +    pshufd     m6, m4, q0123
>>> +    pshufd     m7, m5, q0123
>>> +%else
>>> +    shufps     m2, m0, m0, q0123
>>> +    shufps     m3, m1, m1, q0123
>>> +    shufps     m6, m4, m4, q0123
>>> +    shufps     m7, m5, m5, q0123
>>> +%endif
>>
>> You can use memory args for the pshufd.
> 
> Because of the subps, it's not strictly commutative here, and I ended
> up with this using 6 xmm regs:
>     mova       m0, [src0q+cq]
>     mova       m2, [src0q+cq+mmsize]
>     pshufd     m4, [src1q], q0123
>     pshufd     m5, [src1q+mmsize], q0123
>     pshufd     m3, m0, m0, q0123
>     pshufd     m1, m2, m2, q0123
>     addps      m3, [src1q+mmsize]
>     subps      m0, m5
>     addps      m1, [src1q]
>     subps      m2, m4
> This is 79 cycles compared to the 68 of the original version. Nothing
> that better scheduling could help.

Yeah, doing that doesn't make sense. I didn't notice the original memory
values were being used below as well. Patch looks ok.
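
For reference, a minimal scalar sketch in C of what the q0123 shuffle does
to one 4-float vector. It assumes x86inc's qXYZW notation maps to the usual
imm8 encoding (q0123 = 0x1B); the names pshufd_model, Q0123 and
check_reverse are illustrative only and are not part of the patch.

```c
#include <assert.h>
#include <string.h>

/* Assumed encoding: qXYZW -> (X << 6) | (Y << 4) | (Z << 2) | W */
#define Q0123 0x1B

/* Scalar model of pshufd/shufps with identical sources:
 * dst[i] = src[(imm >> (2 * i)) & 3] for each of the 4 lanes. */
static void pshufd_model(float dst[4], const float src[4], unsigned imm)
{
    float tmp[4]; /* temporary, so dst may alias src */
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm >> (2 * i)) & 3];
    memcpy(dst, tmp, sizeof(tmp));
}

/* q0123 reverses the lane order of the vector. */
static void check_reverse(void)
{
    const float in[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    float out[4];

    pshufd_model(out, in, Q0123);
    assert(out[0] == 4.0f && out[1] == 3.0f);
    assert(out[2] == 2.0f && out[3] == 1.0f);
}
```

With a memory operand, only the reversed copy ends up in a register, so any
later use of the unreversed values has to read memory again, as the addps
lines in the 6-register version do.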

Thanks,
Justin
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel