As far as I can see, the only reason those functions require SSE4 is the
pextrw store needed for the following block widths:
- 2, used only by chroma;
- 6, used by chroma and indirectly by luma;
- 12, used by both.
The better solution would be to convert all chroma handling to NV12, but
it is vastly simpler to modify the above cases to not use pextrw.

This is done in 2 steps:
- Fix width of 12 to do 8+4 instead of 6+6;
- Modify the store macros for widths 2 and 6 to pass the data through
  a GPR (alas, for some functions, at the cost of an additional GPR).
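For illustration only, the two steps above can be sketched in C with SSE2
intrinsics. This is not the code from the patches (those change the x86inc
macros in hevc_mc.asm); the helper names store_w2/w4/w6/w8 and put_row are
hypothetical, 8-bit pixels are assumed (1 byte per pixel), and the load
through a temporary buffer merely stands in for the real filtering:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>  /* SSE2 only; nothing from SSE4.1 is used */

/* Hypothetical helpers; 8-bit pixels assumed (1 byte per pixel). */

/* 8 pixels: a 64-bit movq store. */
static void store_w8(uint8_t *dst, __m128i v)
{
    _mm_storel_epi64((__m128i *)dst, v);
}

/* 4 pixels: the low 32 bits, as a movd store would write them. */
static void store_w4(uint8_t *dst, __m128i v)
{
    uint32_t gpr = (uint32_t)_mm_cvtsi128_si32(v);
    memcpy(dst, &gpr, 4);
}

/* 2 pixels: movd into a GPR, then store only its low 16 bits. */
static void store_w2(uint8_t *dst, __m128i v)
{
    uint32_t gpr = (uint32_t)_mm_cvtsi128_si32(v);
    uint16_t lo  = (uint16_t)gpr;
    memcpy(dst, &lo, 2);
}

/* 6 pixels: 4 bytes directly, then shift right by 4 bytes and
 * store the remaining 2 through the GPR path. */
static void store_w6(uint8_t *dst, __m128i v)
{
    store_w4(dst, v);
    store_w2(dst + 4, _mm_srli_si128(v, 4));
}

/* Copy one row of `width` pixels. Width 12 is split as 8 + 4
 * instead of 6 + 6, so it never reaches the width-6 store. */
static void put_row(uint8_t *dst, const uint8_t *src, int width)
{
    uint8_t tmp[16] = { 0 };
    __m128i v;

    if (width == 12) {
        put_row(dst,     src,     8);
        put_row(dst + 8, src + 8, 4);
        return;
    }
    assert(width == 2 || width == 4 || width == 6 || width == 8);

    /* Stand-in for the real filter: load exactly `width` bytes. */
    memcpy(tmp, src, (size_t)width);
    v = _mm_loadu_si128((const __m128i *)tmp);

    switch (width) {
    case 2: store_w2(dst, v); break;
    case 4: store_w4(dst, v); break;
    case 6: store_w6(dst, v); break;
    case 8: store_w8(dst, v); break;
    }
}
```

Calling put_row(dst, src, 12) writes exactly 12 bytes using only the movq
and movd style stores. The GPR detour is confined to widths 2 and 6.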

Christophe Gisquet (2):
  x86: hevc_mc: split differently calls
  x86: hevc_mc: convert to ssse3

 libavcodec/x86/hevc_mc.asm    |  63 +++--
 libavcodec/x86/hevcdsp.h      |  48 ++--
 libavcodec/x86/hevcdsp_init.c | 561 ++++++++++++++++++++++--------------------
 3 files changed, 362 insertions(+), 310 deletions(-)

-- 
1.9.2.msysgit.0

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
