2016-04-18 19:15 GMT+02:00 James Almer <jamr...@gmail.com>:
> On 4/18/2016 10:07 AM, Christophe Gisquet wrote:
>> The loops are guaranteed to be at least multiples of 8, so this
>> unrolling is safe but allows exploiting execution ports.
>>
>> For int32 version: 72 -> 57c.
>
> What compiler are you using, and what cpu at configure time?

gcc 5.1, Win64, haswell. I don't use mingw64 compiler.

> We're currently enabling tree vectorization for gcc 4.9 or newer on x86,
> and at least with gcc 5.3.0 on mingw-w64 the resulting code now seems worse.
> I didn't bench it, but after this patch it's not being vectorized anymore.

The code I benchmarked as being 72c is vectorized and keeps being
vectorized here. It actually looks better than the previously
vectorized one.

The 16_c version is no longer vectorized, but is really a mess here
when vectorized.

-- 
Christophe
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
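
As an illustration of the unrolling argument quoted above (this sketch is
not taken from the patch): when the trip count is known to be a multiple
of 8, a loop can be unrolled by any factor that divides 8 without needing
a remainder loop, and the independent statements in the body give the CPU
more work to spread across its execution ports. The kernel and the names
scale_add_ref and scale_add_unrolled are hypothetical and chosen only for
this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel, not from the patch: dst[i] += src[i] * k. */
static void scale_add_ref(int32_t *dst, const int32_t *src,
                          int32_t k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i] * k;
}

/* Same kernel, manually unrolled by 4.
 * Assumption: n is a multiple of 8, as stated in the thread. Since 4
 * divides 8, no remainder loop is needed after the unrolled body. */
static void scale_add_unrolled(int32_t *dst, const int32_t *src,
                               int32_t k, size_t n)
{
    assert(n % 8 == 0);

    for (size_t i = 0; i < n; i += 4) {
        /* The four updates touch different elements, so they carry no
         * dependency on each other within one iteration. */
        dst[i + 0] += src[i + 0] * k;
        dst[i + 1] += src[i + 1] * k;
        dst[i + 2] += src[i + 2] * k;
        dst[i + 3] += src[i + 3] * k;
    }
}
```

Whether such a rewrite is a win depends on the compiler and target. As
the exchange above shows, manual unrolling can change whether gcc's tree
vectorizer still vectorizes the loop, so it is worth checking the
generated assembly and benchmarking both forms.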