On Mon, Oct 19, 2015 at 10:00 PM, Timothy Gu <timothyg...@gmail.com> wrote:
> About 16% faster on large clips (>1200px width), more than 2x slower on
> small clips (352px).

The reason for this is likely that you fall back to scalar as soon as
you have less than 2*mmsize bytes left to process, which leads to a
larger portion being done in scalar with larger vector sizes.

A possible workaround for this is to gradually decrease the amount you
process with SIMD as you approach the end, e.g. fall back to using xmm
registers, then half of an xmm register, and maybe even a quarter of an
xmm register (as always, benchmark to see what helps) before doing
scalar for the last few bytes.
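
To illustrate the idea, here is a rough sketch in portable C rather than
the actual asm. Everything in it is an assumption for illustration: the
byte-wise subtraction kernel, the names diff_block and
diff_bytes_tapered, and the 32/16/8/4 step sizes. Each fixed-width call
stands in for one SIMD operation of that width (ymm, xmm, half xmm,
quarter xmm), so only the control flow is meant to carry over.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: dst[i] = a[i] - b[i] over exactly `width` bytes.
 * One fixed-width call stands in for one SIMD operation of that width. */
static void diff_block(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                       size_t width)
{
    for (size_t i = 0; i < width; i++)
        dst[i] = (uint8_t)(a[i] - b[i]);
}

/* Processes exactly n bytes; never reads or writes past the end. */
void diff_bytes_tapered(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                        size_t n)
{
    size_t i = 0;

    /* Main loop at the widest size (32 bytes, standing in for ymm). */
    for (; n - i >= 32; i += 32)
        diff_block(dst + i, a + i, b + i, 32);

    /* Tail: at most one step each at 16 (xmm), 8 (half) and 4 (quarter). */
    for (size_t w = 16; w >= 4; w /= 2) {
        if (n - i >= w) {
            diff_block(dst + i, a + i, b + i, w);
            i += w;
        }
    }

    /* At most 3 bytes are left for the scalar loop. */
    for (; i < n; i++)
        dst[i] = (uint8_t)(a[i] - b[i]);
}
```

With this layout the scalar loop handles at most 3 bytes regardless of
the widest vector size. Whether the extra branches pay off on short rows
is something only a benchmark will tell.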

This assumes that you cannot overread src and/or overwrite dst; if
you're allowed to do that, then it's a bit easier of course.
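
For comparison, a sketch of the easier case, again in portable C with
made-up names (VEC_WIDTH, padded_size, diff_bytes_padded) and the same
hypothetical subtraction kernel. It assumes every buffer is allocated
with at least padded_size(n) bytes, so the last full-width step may
touch bytes past n without harm.

```c
#include <stddef.h>
#include <stdint.h>

#define VEC_WIDTH 32  /* assumed vector width in bytes (ymm-sized) */

/* Round n up to the next multiple of VEC_WIDTH. */
size_t padded_size(size_t n)
{
    return (n + VEC_WIDTH - 1) / VEC_WIDTH * VEC_WIDTH;
}

/* Assumes dst, a and b each hold at least padded_size(n) bytes.
 * The final step may read and write up to VEC_WIDTH - 1 bytes past n. */
void diff_bytes_padded(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                       size_t n)
{
    for (size_t i = 0; i < n; i += VEC_WIDTH)
        /* The inner loop stands in for one full-width SIMD operation. */
        for (size_t j = 0; j < VEC_WIDTH; j++)
            dst[i + j] = (uint8_t)(a[i + j] - b[i + j]);
}
```

No tail handling is needed at all here, at the cost of requiring padded
allocations from every caller.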
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
