"Ronald S. Bultje" <[email protected]> writes:

> Hi,
>
> On Thu, Feb 23, 2012 at 9:28 AM, Christophe Gisquet
> <[email protected]> wrote:
>> again, a simple function which takes approximately 2% of the decoding
>> time. The timings go down from 117 (32-bit) / 109 (64-bit) cycles to
>> 68 cycles. See patch comments for some further experiments.
> [..]
>> +    movsxd      r4, r4d
>
> Let's change the argument to intptr_t, the caller can likely do this for free.
>
>> +.loop4:
>> +    movq        m0, [r2 + 0]
>> +    movq        m1, [r2 + 8]
>> +    movq        m2, [r1 + 0*STEP]
>> +    movq        m3, [r1 + 2*STEP]
>> +    movhps      m2, [r1 + 1*STEP]
>> +    movhps      m3, [r1 + 3*STEP]
>> +    punpckldq   m0, m0
>> +    punpckldq   m1, m1
>> +    mulps       m0, m2
>> +    mulps       m1, m3
>> +    movu [r0 +  0], m0
>> +    movu [r0 + 16], m1
>> +    add         r1, 4*STEP
>> +    add         r2, 4*1*4
>> +    add         r0, 4*2*4
>> +    dec         r3
>> +    jnz .loop4
>
> shl r3, 2
> lea r2, [r2+r3*4]
> lea r0, [r0+r3*8]
> neg r3
> .loop4:
> movq ..., [r2+r3*4+0/8]
> ...
> movu [r0+r3*8+0/16], ...
> add r1, 4*STEP
> add r3, 4
> jnz .loop4
>
> Is that faster? It saves one add and one sub in the loop, at the cost
> of more complex movh/movus. Also, can the movu be changed to a mova,
> i.e. can we somehow guarantee alignment?

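64-bit alignment is guaranteed. That is not enough for mova on xmm registers,
which needs 16-byte alignment, so movu has to stay unless the caller can
promise more.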
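
For reference, the indexing idea in the quoted loop rewrite can be sketched in
C. This is only an illustration under assumptions: the function names, the
argument order and the stride parameter are made up here and are not taken
from the patch. The first version advances three pointers per iteration, as
the posted loop does; the second moves dst and gain to their ends once and
walks a single negative index up to zero, so the loop counter doubles as the
offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop: each gain scales one (re, im) pair.
 * stride is counted in floats between consecutive source pairs. */
static void scale_pairs_ptr(float *dst, const float *src, const float *gain,
                            ptrdiff_t n, ptrdiff_t stride)
{
    assert(n >= 0);
    while (n--) {
        dst[0] = src[0] * gain[0];
        dst[1] = src[1] * gain[0];
        dst  += 2;      /* one complex output   */
        src  += stride; /* next strided input   */
        gain += 1;      /* next scalar gain     */
    }
}

/* Same result with one negative index that counts up to zero.
 * dst and gain are pointed past their last element before the loop. */
static void scale_pairs_idx(float *dst, const float *src, const float *gain,
                            ptrdiff_t n, ptrdiff_t stride)
{
    ptrdiff_t i;

    assert(n >= 0);
    dst  += 2 * n;
    gain += n;
    for (i = -n; i != 0; i++) {
        dst[2 * i]     = src[0] * gain[i];
        dst[2 * i + 1] = src[1] * gain[i];
        src += stride;  /* the strided pointer still needs its own add */
    }
}
```

Whether the second form is actually faster in assembly depends on how the
more complex addressing modes decode on the target CPU, so it needs timing
rather than guessing.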

-- 
Måns Rullgård
[email protected]
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel