2012/4/17 Ronald S. Bultje <rsbul...@gmail.com>:
>> +    lea      myd, [sixtap_filter_v+myq]
>
> lea myq, ...

Also as an answer to Jason (the actual reply is pending completion of
the updated patch): I did do my homework here as best I could, rather
than just cargo-culting vp8's code. I did try to use add instead of
lea, but there is a trick to this code. myd holds the correct value,
and the high bits of myq should be 0, but at least on win64 they may
in fact contain garbage. That's the classical problem that normally
requires sign-extension. Here the values are always positive, so
zero-extension is enough, and a write to a 32-bit register clears the
upper half of the 64-bit one. By using that lea instruction with myd
as the destination, both the add and the garbage handling are done in
one step.

That's my explanation of this trick used by vp8's mc code. By just
doing an add, you get garbage in myq and crashes.
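
To make the difference concrete, here is a rough C model of the two
variants. It is only a sketch of the register semantics, not the actual
asm: the function names are made up for illustration, and it assumes
the table address plus the offset fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a plain 64-bit "add myq, table":
 * whatever garbage sits in bits 63..32 of myq ends up in the result. */
static uint64_t add_64bit(uint64_t table, uint64_t my_reg)
{
    return table + my_reg;
}

/* Hypothetical model of "lea myd, [table+myq]": the sum is computed,
 * truncated to 32 bits, and the write to the 32-bit register
 * zero-extends it into the full 64-bit register. */
static uint64_t lea_32bit_dest(uint64_t table, uint64_t my_reg)
{
    /* Assumption: table plus the real (low 32-bit) offset fits in 32 bits. */
    assert(table + (uint32_t)my_reg <= UINT32_MAX);
    return (uint32_t)(table + my_reg);
}
```

With an assumed table address of 0x00400000 and my = 0x30 carrying
garbage in its upper half (0xdeadbeef00000030), add_64bit() returns
0xdeadbeef00400030, which is not a usable pointer, while
lea_32bit_dest() returns 0x00400030.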

The other solution is to have the last argument be ptrdiff_t, but I'd
imagine this to be a loss.

>> +.nextrow:
>> +    mova      m6, m1
>> +    movh      m5, [srcq+2*srcstrideq]      ; read new row
>> +    paddw     m6, m4
>
> Can we use 3-arg stuff here to prepare for AVX functions? I.e.
> paddw m6, m1, m4.

Probably. I'm not used to either avx or that syntax, and I don't
really intend to validate the avx code. I could, using obe2, but that
has proven to be too much trouble.

-- 
Christophe
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
