2016-04-18 19:15 GMT+02:00 James Almer <jamr...@gmail.com>:
> On 4/18/2016 10:07 AM, Christophe Gisquet wrote:
>> The loops are guaranteed to be at least multiples of 8, so this
>> unrolling is safe but allows exploiting execution ports.
>>
>> For int32 version: 72 -> 57c.
>
> What compiler are you using, and what cpu at configure time?

gcc 5.1 on Win64, with haswell as the cpu at configure time. I don't
use the mingw-w64 compiler.

> We're currently enabling tree vectorization for gcc 4.9 or newer on x86,
> and at least with gcc 5.3.0 on mingw-w64 the resulting code now seems worse.
> I didn't bench it, but after this patch it's not being vectorized anymore.

The int32 code I benchmarked at 72c was vectorized before the patch and
is still vectorized here with it. The new vectorized output actually
looks better than the previous one.

The 16_c version is no longer vectorized, but its vectorized output was
really a mess here anyway.
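
To illustrate the kind of unrolling under discussion, here is a minimal
sketch. It is not the patched function: dot_ref and dot_unroll8 are
hypothetical names, and the int32 dot product is only an assumed stand-in
for the real loop. It relies on the same guarantee as the patch, namely
that the length is a multiple of 8, and uses four independent
accumulators so the multiply-adds do not form a single dependency chain.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference loop: one accumulator, one element per iteration. */
static int64_t dot_ref(const int32_t *a, const int32_t *b, int len)
{
    int64_t sum = 0;
    for (int i = 0; i < len; i++)
        sum += (int64_t)a[i] * b[i];
    return sum;
}

/* Hypothetical unrolled variant: 8 elements per iteration, spread over
 * four independent accumulators. Assumes len is a multiple of 8, so no
 * tail loop is needed. */
static int64_t dot_unroll8(const int32_t *a, const int32_t *b, int len)
{
    int64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;

    assert(len % 8 == 0);
    for (int i = 0; i < len; i += 8) {
        s0 += (int64_t)a[i + 0] * b[i + 0] + (int64_t)a[i + 4] * b[i + 4];
        s1 += (int64_t)a[i + 1] * b[i + 1] + (int64_t)a[i + 5] * b[i + 5];
        s2 += (int64_t)a[i + 2] * b[i + 2] + (int64_t)a[i + 6] * b[i + 6];
        s3 += (int64_t)a[i + 3] * b[i + 3] + (int64_t)a[i + 7] * b[i + 7];
    }
    return (s0 + s1) + (s2 + s3);
}
```

Whether this beats what the tree vectorizer produces from the plain loop
depends on the compiler and target, which is exactly the point being
argued above, so it needs benchmarking per setup.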

-- 
Christophe
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
