2012/4/17 Ronald S. Bultje <rsbul...@gmail.com>:
>> +    lea      myd, [sixtap_filter_v+myq]
>
> lea myq, ...

Also as an answer to Jason (the actual reply is pending completion of
the updated patch): I did my homework here as best I could, rather than
just cargo-culting vp8's code. I did try to use add instead of lea, but
this code has a trick to it. myd has the correct value, but the high
bits of myq, which should be 0, may in fact contain garbage, at least on
win64. That's the classic problem that normally requires sign-extension.
Here the values are always positive, so zero-extension is enough, and
writing to the 32-bit register myd clears the high bits of myq. By using
that lea instruction, both the add and the garbage handling are done in
one step. That's my explanation of this trick used by vp8's mc code. By
just doing an add, you get garbage in myq and crashes.

The other solution is to have the last argument be ptrdiff_t, but I'd
imagine this to be a loss.

>> +.nextrow:
>> +    mova      m6, m1
>> +    movh      m5, [srcq+2*srcstrideq] ; read new row
>> +    paddw     m6, m4
>
> Can we use 3-arg stuff here to prepare for AVX functions? I.e. paddw m6, m1, m4.

Probably. I'm not used to either avx or that syntax, and I don't really
intend to validate the avx code. I could, using obe2, but that has
proven to be too much trouble.

-- 
Christophe
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
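
To make the lea-versus-add point above concrete, here is a small C model
of the register behaviour. It is only a sketch, not the actual asm: the
helper names (reg_with_garbage, offset_with_add64, offset_with_lea32) are
made up for illustration, and it assumes the table address fits in 32
bits, which the 32-bit lea destination also relies on.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit register: the caller wrote only the low
 * 32 bits (the int argument), and the upper 32 bits hold leftover garbage. */
static uint64_t reg_with_garbage(uint32_t arg, uint32_t garbage)
{
    return ((uint64_t)garbage << 32) | arg;
}

/* Models a plain 64-bit add of the table address to myq:
 * the garbage in the upper half ends up in the result. */
static uint64_t offset_with_add64(uint64_t table, uint64_t myq)
{
    return table + myq;
}

/* Models "lea myd, [table+myq]": the sum is truncated to 32 bits, and the
 * write to the 32-bit register zero-extends into the full 64-bit register.
 * Assumption: the table address itself fits in 32 bits. */
static uint64_t offset_with_lea32(uint64_t table, uint64_t myq)
{
    return (uint64_t)(uint32_t)(table + myq);
}
```

With an argument of 5, garbage 0xdeadbeef in the upper half and a table
at 0x1000, offset_with_lea32 gives 0x1005 while offset_with_add64 gives
0xdeadbeef00001005, which is the kind of pointer that crashes.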