On 23/08/15 3:27 PM, Anton Khirnov wrote:
> Quoting James Almer (2015-08-22 23:58:41)
>> On 22/08/15 1:16 PM, Anton Khirnov wrote:
>>>>> +%macro QPEL_8 2
>>>>> +%if %2
>>>>> +    %define postfix    v
>>>>> +    %define mvfrac     myq
>>>>
>>>> Same here and below the else, rename this to mvfracq and add a mvfracd.
>>>>
>>>>> +    %define pixstride  srcstrideq
>>>>> +    %define pixstride3 sstride3q
>>>>> +    %define src_m3     srcm3q
>>>>> +%else
>>>>> +    %define postfix    h
>>>>> +    %define mvfrac     mxq
>>>>> +    %define pixstride  1
>>>>> +    %define pixstride3 3
>>>>> +    %define src_m3     (srcq - 3)
>>>>> +%endif
>>>>> +
>>>>> +cglobal hevc_qpel_ %+ postfix %+ _ %+ %1 %+ _8, 8, 10, 7, dst, dststride, src, srcstride, height, mx, my, sstride3, srcm3, coeffsreg
>>
>> This should be 7, 10, 7. Otherwise you're loading sstride3 from the stack as
>> if it were a function argument.
>> Ideally though, for vertical you'd use 5, 9, 7 and then manually load either
>> mx or my instead of both, saving one register, or even 5, 8, 7, since
>> coeffsreg and mvfrac are only used during init, and you can easily reuse one
>> of those two registers for sstride3 or srcm3.
>> You can also push it down to 4, 7, 7 if you manually load height before or
>> after the SPLATWs and reuse the regs for coeffsreg and mvfrac. As a plus,
>> this would make the functions work with x86_32.
>>
>> For horizontal you don't even need sstride3 or srcm3, so you definitely
>> should declare and use fewer registers.
>>
>> Didn't check other functions but I'm sure similar optimizations can be done.
>>
>>>>> +%if %2
>>>>> +    and       mvfrac, 0x3
>>>>> +%endif
>>>>> +    dec       mvfrac
>>>>> +    shl       mvfrac, 4
>>>>
>>>> Use mvfracd on these three, it will clear the high bits for the mova below.
>>>
>>> anding the whole register with 3/7 should also work fine, with less
>>> clutter.
>>
>> "and mvfrac, 0x3" is only in ff_hevc_qpel_v_* functions, but not in
>> ff_hevc_qpel_h_*. It's the same with the "and mvfrac, 0x7" cases below.
> 
> Sure, I meant to change the code so it's done in both paths.

It's not necessary. Just use the 32-bit GPRs.

>> You need to use the d suffix
>> instead of q on the register names to make sure the high bits are cleared.
> 
> Eh? Perhaps I'm misunderstanding something, but I'd expect that using d
> here would do exactly the opposite and keep the random data in the high bits.

No, writing a GPR through its d (32-bit) name on x86_64 clears the high bits
(32 to 63), in a similar way that using VEX-coded instructions to write xmm
registers clears bits 128 to 255 of the corresponding ymm registers.
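
To illustrate, here is a minimal sketch in C (not part of the patch) of the
"dec / shl 4" pair from the horizontal path, applied to a register whose high
half holds garbage. The function names dec_shl4_q and dec_shl4_d are made up
for this example. On x86_64 with GCC or clang the d variant runs the real
32-bit instructions through inline asm; on anything else it falls back to
plain C that models the same zero-extension, which is an assumption of the
sketch rather than something the hardware is doing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: "dec mvfracq; shl mvfracq, 4" on the full 64-bit
 * register. Whatever was in bits 32..63 stays in the result. */
static uint64_t dec_shl4_q(uint64_t reg)
{
    return (reg - 1) << 4;
}

/* Hypothetical helper: "dec mvfracd; shl mvfracd, 4" on the low 32 bits.
 * Any 32-bit write zero-extends into the full 64-bit register. */
static uint64_t dec_shl4_d(uint64_t reg)
{
#if defined(__x86_64__) && defined(__GNUC__)
    /* %k0 selects the 32-bit name of the register holding reg. */
    __asm__ ("decl %k0\n\t"
             "shll $4, %k0"
             : "+r"(reg)
             :
             : "cc");
    return reg;
#else
    /* Portable model of the same behaviour (assumption, not measured). */
    uint32_t low = (uint32_t)reg;
    low = (low - 1u) << 4;
    return (uint64_t)low;
#endif
}

/* Small self-check: low half holds 2, high half holds garbage. */
static void check_examples(void)
{
    const uint64_t dirty = 0xDEADBEEF00000002ULL;

    /* 32-bit ops: (2 - 1) << 4 = 16, high bits cleared. */
    assert(dec_shl4_d(dirty) == 16);
    /* 64-bit ops: the garbage survives, so this is not a usable offset. */
    assert(dec_shl4_q(dirty) != 16);
}
```

Calling check_examples() passes on both paths. With the q names you would
need the extra "and" in the horizontal functions as well to get a clean
offset for the mova; with the d names the write itself takes care of it.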
