On 25/06/14 3:44 PM, Luca Barbato wrote:
> On 25/06/14 20:33, James Almer wrote:
>> On 24/06/14 11:26 AM, Luca Barbato wrote:
>>> From: Pierre Edouard Lepere <pierre-edouard.lep...@insa-rennes.fr>
>>>
>>> The functions only support x86_64.
>>>
>>> Fixes from Hendrik Leppkes and James Almer
>>>
>>> Signed-off-by: Luca Barbato <lu_z...@gentoo.org>
>>> ---
>>>  libavcodec/hevcdsp.c          |    6 +-
>>>  libavcodec/hevcdsp.h          |    3 +
>>>  libavcodec/x86/Makefile       |    2 +
>>>  libavcodec/x86/hevc_mc.asm    | 1256 +++++++++++++++++++++++++++++++++++++++++
>>>  libavcodec/x86/hevcdsp.h      |  164 ++++++
>>>  libavcodec/x86/hevcdsp_init.c |  373 ++++++++++++
>>>  6 files changed, 1803 insertions(+), 1 deletion(-)
>>>  create mode 100644 libavcodec/x86/hevc_mc.asm
>>>  create mode 100644 libavcodec/x86/hevcdsp.h
>>>  create mode 100644 libavcodec/x86/hevcdsp_init.c
>>>
>>
>> Many of these functions are SSSE3 and a couple even SSE2 at most.
> 
> Can you guide me in this regard?

The SSE4 functions are those using pextrw (with memory operand) and packusdw.

hevc_put_hevc_bi_w2_{8,10}
hevc_put_hevc_bi_w4_{8,10}
hevc_put_hevc_bi_w6_{8,10}
hevc_put_hevc_bi_w8_{8,10}
hevc_put_hevc_uni_w2_{8,10}
hevc_put_hevc_uni_w4_{8,10}
hevc_put_hevc_uni_w6_{8,10}
hevc_put_hevc_uni_w8_{8,10}
hevc_put_hevc_uni_qpel_v{4,8}_10
hevc_put_hevc_uni_qpel_hv2_{8,10}
hevc_put_hevc_uni_qpel_hv4_{8,10}
hevc_put_hevc_uni_qpel_hv6_{8,10}
hevc_put_hevc_uni_qpel_hv8_{8,10}
hevc_put_hevc_uni_pel_pixels{2,6}_8
hevc_put_hevc_bi_pel_pixels{2,6}_8
hevc_put_hevc_{uni,bi}_epel_h2_8
hevc_put_hevc_{uni,bi}_epel_v2_8
hevc_put_hevc_{uni,bi}_epel_h6_8
hevc_put_hevc_{uni,bi}_epel_v6_8
hevc_put_hevc_{uni,bi}_epel_hv{2,6}_8

I think I'm not missing any.
Both instructions can be emulated using SSE2, so the relevant functions could be
duplicated to create an SSE2/SSSE3 variant, but that's for another time/patch.
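
To make the emulation point concrete, here is a rough sketch in C intrinsics rather than in the asm syntax the patch uses. It is not code from the patch; the helper names packus_epi32_sse2 and store_word0_sse2 are made up for illustration. The packusdw replacement relies on the usual bias trick and assumes the 32-bit inputs are not within 32768 of INT32_MIN, which I would expect to hold for the intermediate values here but have not checked against every function.

```c
#include <emmintrin.h> /* SSE2 intrinsics only, nothing newer */
#include <stdint.h>

/*
 * Hypothetical SSE2 stand-in for packusdw: pack signed 32-bit lanes to
 * unsigned 16-bit lanes with saturation.
 * Steps: shift the inputs down by 32768 so that [0, 65535] maps onto the
 * signed 16-bit range, pack with signed saturation (packssdw), then shift
 * the 16-bit results back up with wrapping arithmetic.
 * Assumption: every input is >= INT32_MIN + 32768, otherwise the
 * subtraction wraps around and the result is wrong.
 */
static inline __m128i packus_epi32_sse2(__m128i a, __m128i b)
{
    const __m128i bias32 = _mm_set1_epi32(32768);
    const __m128i bias16 = _mm_set1_epi16((short)-32768); /* 0x8000 */

    a = _mm_sub_epi32(a, bias32);
    b = _mm_sub_epi32(b, bias32);
    return _mm_add_epi16(_mm_packs_epi32(a, b), bias16);
}

/*
 * Hypothetical SSE2 stand-in for pextrw with a memory destination:
 * extract word 0 into a general purpose register (the SSE2 form of
 * pextrw), then do an ordinary 16-bit store.
 */
static inline void store_word0_sse2(uint16_t *dst, __m128i v)
{
    *dst = (uint16_t)_mm_extract_epi16(v, 0);
}
```

In the asm this would cost a couple of extra instructions per pack or store, which is why it makes sense as a separate SSE2/SSSE3 variant rather than a change to the SSE4 code.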

The rest are mostly SSSE3 because of pmaddubsw and pmulhrsw, and a few only SSE2.

The qpel and epel tables also need to be renamed to remove the sse4 suffix
(which is unneeded).
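
For illustration only, this is roughly how a split by instruction set could look on the init side. It is a standalone toy, not the macros from hevcdsp_init.c: the flag values, the DemoDSPContext struct, the slot names and demo_dsp_init are all hypothetical, and which slot needs which instruction set is invented for the example. The idea is just that each level overrides the previous one, so a CPU with SSSE3 but no SSE4 still gets everything up to SSSE3.

```c
#include <stddef.h>

/* Hypothetical CPU feature bits; real code would use the project's flags. */
enum {
    DEMO_CPU_SSE2  = 1 << 0,
    DEMO_CPU_SSSE3 = 1 << 1,
    DEMO_CPU_SSE4  = 1 << 2
};

/* Each demo function only reports which implementation was selected. */
typedef const char *(*demo_func)(void);

static const char *copy_c(void)       { return "copy_c"; }
static const char *copy_sse2(void)    { return "copy_sse2"; }
static const char *filter_c(void)     { return "filter_c"; }
static const char *filter_ssse3(void) { return "filter_ssse3"; }
static const char *weight_c(void)     { return "weight_c"; }
static const char *weight_sse4(void)  { return "weight_sse4"; }

/* Hypothetical DSP context with one slot per class of function. */
typedef struct DemoDSPContext {
    demo_func copy;   /* stands for a function needing SSE2 at most     */
    demo_func filter; /* stands for a pmaddubsw/pmulhrsw user (SSSE3)   */
    demo_func weight; /* stands for a packusdw/pextrw-to-mem user (SSE4) */
} DemoDSPContext;

/*
 * Start from the C versions, then let each supported instruction set
 * override only the slots it actually provides.
 */
static void demo_dsp_init(DemoDSPContext *c, int cpu_flags)
{
    c->copy   = copy_c;
    c->filter = filter_c;
    c->weight = weight_c;

    if (cpu_flags & DEMO_CPU_SSE2)
        c->copy = copy_sse2;
    if (cpu_flags & DEMO_CPU_SSSE3)
        c->filter = filter_ssse3;
    if (cpu_flags & DEMO_CPU_SSE4)
        c->weight = weight_sse4;
}
```

With flags for SSE2 and SSSE3 only, the copy and filter slots get the asm versions and only the weight slot stays on the C fallback, instead of all three staying on C as happens when everything is gated on SSE4.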

> 
>> It will require some init macros rewriting to change, but leaving things as
>> is will make atom, conroe and bobcat cpus miss a considerable performance
>> boost.
> 
> Probably I can do myself but your help would be welcome =)

I don't have the time, nor do I really want to deal with the init macros, but I
can help you with the necessary changes to the asm file if needed.

> lu
> 
> _______________________________________________
> libav-devel mailing list
> libav-devel@libav.org
> https://lists.libav.org/mailman/listinfo/libav-devel
> 
