On 21/04/15 7:49 PM, James Almer wrote:
> On 21/04/15 7:12 PM, Michael Niedermayer wrote:
>> On Tue, Apr 21, 2015 at 06:51:52PM -0300, James Almer wrote:
>>> On 21/04/15 6:38 PM, Michael Niedermayer wrote:
>>>> On Tue, Apr 21, 2015 at 05:45:25PM -0300, James Almer wrote:
>>>>> Signed-off-by: James Almer <jamr...@gmail.com>
>>>>> diff --git a/libavcodec/ffv1.h b/libavcodec/ffv1.h
>>>>> index 5081397..bfc4d71 100644
>>>>> --- a/libavcodec/ffv1.h
>>>>> +++ b/libavcodec/ffv1.h
>>>>> @@ -143,7 +143,7 @@ static av_always_inline int fold(int diff, int bits)
>>>>>          diff = (int8_t)diff;
>>>>>      else {
>>>>>          diff +=  1 << (bits  - 1);
>>>>> -        diff &= (1 <<  bits) - 1;
>>>>> +        diff  = av_mod_uintp2(diff, bits);
>>>>>          diff -=  1 << (bits  - 1);
>>>>>      }
>>>>>  
>>>>
>>>> iam not sure some of these changes are are a good idea
>>>> for example above the mask has to be calculated anyway what can be
>>>> gained from av_mod_uintp2() use here ?
>>>> i think this should be bechmarked when its in performance critical code
>>>
>>> "1 << (bits  - 1)" is not the same as "(1 <<  bits) - 1". The latter is not 
>>> going to be
>>> calculated if the arch optimized version of av_mod_uintp2 is used.
>>
>> yep, i should not review patches when i am in a (IRC) meeting
>>
>> before the patch fold looks like this: (for the non 8bit case)
>>         addl    100(%rsp), %ecx
>>         andl    104(%rsp), %ecx
>>         subl    100(%rsp), %ecx
>> .L869:
>> afterwards it looks like this:
>>
>>         addl    104(%rsp), %ecx
>>         andl    64(%rsp), %ecx
>>         subl    104(%rsp), %ecx
>> .L869:
>>
>> so it seems ok
>>
>>
>>>
>>> The one file where i don't know if there's going to be any gain is in one 
>>> of the opus_celt.c
>>> changes, where the mask needs to be calculated even if arch optimized 
>>> av_mod_uintp2 is used.
>>> That one can be removed, but every other change is pretty straightforward.
>>
>> ok, then iam fine with the patch
>>
>> Thanks
> 
> I'm going to apply the changes that are straightforward (single line changes 
> that didn't use
> mask variables prior to this patch, or that created one but then used it 
> once). av_mod_uintp2
> should generate the same assembly with the non bmi2 optimized version.

Done and pushed, thanks.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Reply via email to