On Tue, Nov 3, 2015 at 12:45 AM, Timothy Gu <timothyg...@gmail.com> wrote:
> On Mon, Nov 2, 2015 at 8:23 PM Rostislav Pehlivanov <atomnu...@gmail.com>
> wrote:
>
>> >if one removes the crippling
>> >-fno-tree-vectorize
>> Yes, I think a config option to turn this flag on (like the unsafe
>> bitstream reader) would be good. Defaulting to off by default if it doesn't
>> break anything for at least a few people (and compilers) who test it. It's
>> not a big performance impact but every little bit counts nowadays.
>>
>
> FWIW, I recently (i.e. 2 days ago) did some tests with auto-vectorization
> and a few compilers. Fortunately, none of the compilers I tested caused any
> miscompilation, when purely measured by FATE:
>
> compiling:
> clang3.7         4m3.034s
> gcc5vectorize    5m50.637s (1.14x gcc5)
> gcc5             5m7.262s
> gcc4.9vectorize  5m29.669s (1.11x gcc4.9)
> gcc4.9           4m54.602s
> gcc4.8vectorize  5m18.848s (1.09x gcc4.8)
> gcc4.8           4m53.940s
>
> FATE:
> clang3.7         3m13.923s
> gcc5vectorize    3m5.988s (0.980x gcc5)
> gcc5             3m9.618s
> gcc4.9vectorize  3m12.880s (0.983x gcc4.9)
> gcc4.9           3m16.563s
> gcc4.8vectorize  3m10.321s (0.993x gcc4.8)
> gcc4.8           3m11.608s
>
> Tested with:
> - Debian jessie/stable/8.2
> - Dual-core Haswell i7 ultra low voltage
> - clang-3.7 3.7.0-svn251177-1~exp1 (from the official clang apt repo)
> - gcc-5 (Debian 5.2.1-22) 5.2.1 20151010 (Debian testing stock)
> - gcc-4.9 (Debian 4.9.2-10) 4.9.2 (Debian stable stock)
> - gcc-4.8 (Debian 4.8.4-1) 4.8.4 (Debian stable stock)
>
> Note that FATE is probably the worst benchmark one can find, but it does
> show something.
>
> Some observations:
>
> - GCC vectorization slows down compilation A LOT in all versions. The newer
> the worse.

A ~20% slowdown on a build for a ~20% improvement in an overall FATE bench
sounds like a win to me, especially with ccache. Contrast this with LTO,
which is buggier than vectorization on recent compilers, and is usually at
best 5-10% better on a "kitchen sink" bench like the FATE run you described,
while using a ton of resources.

> - If you are developing, use clang, and DON'T use GCC 5 with vectorization.

This is an opinion, so I will state mine here: if you are developing, use
ccache + GCC > ccache + clang > clang = gcc. The reason for the first
ordering is the terrible interaction ccache has with clang. I will still use
GCC 5.2 + ccache (with vectorization) for my builds, and will inform the Arch
packagers once we have finalized configure in this respect :).

> - For release builds, an option to turn it on (or rather to not turn it
> off) would be helpful; but if you really care about performance _that_ much
> then you should probably use some other compilers instead.

No, not true at all. Why do we bother with asm? Many times it is for exactly
such "last mile" optimizations. A 20% improvement in FATE across the board is
nothing to sneeze at given what I have seen in FFmpeg. This one is virtually
free. Furthermore, barring ICC (for Intel), Clang and GCC are among the best
quality compilers today. I don't know what "other compilers" you are
referring to.

>
> FYI, as I have told Ganesh so in our private exchanges, I did also test
> vectorization on GCC 4.6 on a Ubuntu 12.04/Precise box, which miscompiled
> the code hilariously, _and_ made the code slower, just as illustrated in
> Mans's commit message.

A good point, but I did comment on this. By "recent compiler" I meant ~4.8
and beyond. In other words, I take Debian stable as a reference.
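
For anyone who wants to re-derive the relative figures from the quoted
wall-clock times, here is a minimal sketch. It only does the arithmetic
(vectorized time divided by plain time for the same GCC version); the helper
names `to_seconds` and `ratios` are illustrative and not part of FFmpeg,
FATE, or configure.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style string such as '5m50.637s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized duration: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Wall-clock figures copied from the quoted table: (plain, vectorized).
COMPILE = {
    "gcc5": ("5m7.262s", "5m50.637s"),
    "gcc4.9": ("4m54.602s", "5m29.669s"),
    "gcc4.8": ("4m53.940s", "5m18.848s"),
}
FATE = {
    "gcc5": ("3m9.618s", "3m5.988s"),
    "gcc4.9": ("3m16.563s", "3m12.880s"),
    "gcc4.8": ("3m11.608s", "3m10.321s"),
}


def ratios(table):
    """Return vectorized/plain time per compiler; >1 is slower, <1 is faster."""
    return {
        name: to_seconds(vectorized) / to_seconds(plain)
        for name, (plain, vectorized) in table.items()
    }


if __name__ == "__main__":
    for label, table in (("compile", COMPILE), ("FATE", FATE)):
        for name, ratio in ratios(table).items():
            print(f"{label:<8} {name:<7} {ratio:.3f}x")
```

A ratio above 1 means the vectorized build took longer than the plain one,
and a ratio below 1 means it was faster.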
>
> Timothy
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel