On 23/08/15 3:27 PM, Anton Khirnov wrote:
> Quoting James Almer (2015-08-22 23:58:41)
>> On 22/08/15 1:16 PM, Anton Khirnov wrote:
>>>>> +%macro QPEL_8 2
>>>>> +%if %2
>>>>> +    %define postfix v
>>>>> +    %define mvfrac myq
>>>>
>>>> Same here and below the else, rename this to mvfracq and add a mvfracd.
>>>>
>>>>> +    %define pixstride srcstrideq
>>>>> +    %define pixstride3 sstride3q
>>>>> +    %define src_m3 srcm3q
>>>>> +%else
>>>>> +    %define postfix h
>>>>> +    %define mvfrac mxq
>>>>> +    %define pixstride 1
>>>>> +    %define pixstride3 3
>>>>> +    %define src_m3 (srcq - 3)
>>>>> +%endif
>>>>> +
>>>>> +cglobal hevc_qpel_ %+ postfix %+ _ %+ %1 %+ _8, 8, 10, 7, dst, dststride, src, srcstride, height, mx, my, sstride3, srcm3, coeffsreg
>>
>> This should be 7, 10, 7. Otherwise you're loading sstride3 from stack as
>> if it were a function argument.
>> Ideally though, for vertical you'd use 5, 9, 7 then manually load either
>> mx or my instead of both, saving one register, or even 5, 8, 7, since
>> coeffsreg and mvfrac are only used during init, and you can easily reuse
>> one of those two registers for sstride3 or srcm3.
>> You can also push it down to 4, 7, 7 if you manually load height before
>> or after the SPLATWs and reuse the regs for coeffsreg and mvfrac. As a
>> plus, this would make the functions work with x86_32.
>>
>> For horizontal you don't even need sstride3 or srcm3, so you definitely
>> should declare and use fewer registers.
>>
>> Didn't check other functions but I'm sure similar optimizations can be
>> done.
>>
>>>>> +%if %2
>>>>> +    and mvfrac, 0x3
>>>>> +%endif
>>>>> +    dec mvfrac
>>>>> +    shl mvfrac, 4
>>>>
>>>> Use mvfracd on these three, it will clear the high bits for the mova
>>>> below.
>>>
>>> anding the whole register with 3/7 should also work fine, with less
>>> clutter.
>>
>> "and mvfrac, 0x3" is only in ff_hevc_qpel_v_* functions, but not
>> ff_hevc_qpel_h_*.
>> It's the same with the "and mvfrac, 0x7" cases below.
>
> Sure, I meant to change the code so it's done in both paths.

It's not necessary. Just use the 32-bit GPRs.

>> You need to use the d suffix instead of q on the register names to make
>> sure the high bits are cleared.
>
> Eh? Perhaps I'm misunderstanding something, but I'd expect that using d
> here would do exactly the opposite and keep the random data in the high
> bits.

No, using d to write a GPR on x86_64 will clear the high bits (32 to 63), in
a similar way that using VEX-coded instructions to write xmm registers will
clear bits 128 to 255 of the corresponding ymm registers.
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
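
For illustration only, the zero-extension rule described above can be checked outside the patch with a small C sketch. The helper name and_low32 is hypothetical and not part of the patch or of x86inc. On x86_64 with a GCC-compatible compiler it performs a 32-bit "and" on a register whose upper half holds junk; on any other target it falls back to a plain C model of the same semantics, which is an assumption made so the sketch builds everywhere.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): apply a 32-bit AND to a
 * 64-bit value and return the whole 64-bit register afterwards.
 */
uint64_t and_low32(uint64_t value, uint32_t mask)
{
#if defined(__x86_64__) && defined(__GNUC__)
    /*
     * %k0 selects the 32-bit name of the register holding `value`
     * (e.g. eax for rax). Writing that 32-bit register zeroes
     * bits 32..63 of the full register.
     */
    __asm__("andl %1, %k0" : "+r"(value) : "r"(mask) : "cc");
    return value;
#else
    /* Assumed portable model of the same behaviour for other targets. */
    return (uint64_t)((uint32_t)value & mask);
#endif
}
```

Calling and_low32(0xdeadbeef00000007, 0x3) returns 3: the junk in the upper 32 bits is gone without any separate masking of the full 64-bit register, which is why using the d-suffixed register names in both the h and v paths is enough.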