Re: [FFmpeg-devel] [PATCH] swresample/resample: improve bessel function accuracy

Ganesh Ajjanagadde Tue, 03 Nov 2015 04:46:47 -0800

On Tue, Nov 3, 2015 at 12:45 AM, Timothy Gu <timothyg...@gmail.com> wrote:
> On Mon, Nov 2, 2015 at 8:23 PM Rostislav Pehlivanov <atomnu...@gmail.com>
> wrote:
>
>> >if one removes the crippling
>> >-fno-tree-vectorize
>> Yes, I think a config option to turn this flag on (like the unsafe
>> bitstream reader) would be good. Defaulting to off by default if it doesn't
>> break anything for at least a few people (and compilers) who test it. It's
>> not a big performance impact but every little bit counts nowadays.
>>
>
> FWIW, I recently (i.e. 2 days ago) did some tests with auto-vectorization
> and a few compilers. Fortunately, none of the compilers I tested caused any
> miscompilation, when purely measured by FATE:
>
> compiling:
> clang3.7 4m3.034s
> gcc5vectorize 5m50.637s (1.14x gcc5)
> gcc5 5m7.262s
> gcc4.9vectorize 5m29.669s (1.11x gcc4.9)
> gcc4.9 4m54.602s
> gcc4.8vectorize 5m18.848s (1.09x gcc4.8)
> gcc4.8 4m53.940s
>
> FATE:
> clang3.7 3m13.923s
> gcc5vectorize 3m5.988s (0.980x gcc5)
> gcc5 3m9.618s
> gcc4.9vectorize 3m12.880s (0.983x gcc4.9)
> gcc4.9 3m16.563s
> gcc4.8vectorize 3m10.321s (0.993x gcc4.8)
> gcc4.8 3m11.608s
>
> Tested with:
> - Debian jessie/stable/8.2
> - Dual-core Haswell i7 ultra low voltage
> - clang-3.7 3.7.0-svn251177-1~exp1 (from the official clang apt repo)
> - gcc-5 (Debian 5.2.1-22) 5.2.1 20151010 (Debian testing stock)
> - gcc-4.9 (Debian 4.9.2-10) 4.9.2 (Debian stable stock)
> - gcc-4.8 (Debian 4.8.4-1) 4.8.4 (Debian stable stock)
>
> Note that FATE is probably the worst benchmark one can find, but it does
> show something.
>
> Some observations:
>
> - GCC vectorization slows down compilation A LOT in all versions. The newer
> the worse.


A ~20% slowdown on a build for a ~20% improvement in an overall FATE
bench sounds like a win to me, especially with ccache. Contrast this
with LTO, which is buggier than vectorization on recent compilers and
is usually at best 5-10% better on a "kitchen sink" bench like the FATE
run you described, while consuming a ton of resources.
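
For anyone who wants to redo the arithmetic, the vectorized/baseline
ratios can be recomputed from the raw wall-clock times in the quoted
table. This is only a minimal sketch: the timings are copied verbatim
from the table above, while the names `to_seconds` and `ratios` are
made up for illustration and are not part of any FFmpeg tooling.

```python
# Wall-clock timings copied from the quoted table: (vectorized, baseline).
COMPILE = {
    "gcc5":   ("5m50.637s", "5m7.262s"),
    "gcc4.9": ("5m29.669s", "4m54.602s"),
    "gcc4.8": ("5m18.848s", "4m53.940s"),
}
FATE = {
    "gcc5":   ("3m5.988s", "3m9.618s"),
    "gcc4.9": ("3m12.880s", "3m16.563s"),
    "gcc4.8": ("3m10.321s", "3m11.608s"),
}


def to_seconds(stamp):
    """Convert a `time`-style string such as '5m50.637s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def ratios(table):
    """Return vectorized/baseline time per compiler (below 1.0 is faster)."""
    return {cc: to_seconds(vec) / to_seconds(base)
            for cc, (vec, base) in table.items()}


if __name__ == "__main__":
    for label, table in (("compile", COMPILE), ("FATE", FATE)):
        for cc, r in ratios(table).items():
            print(f"{label:8s}{cc:8s}{r:.3f}x")
```

A ratio above 1.0 means the vectorized build took longer and a ratio
below 1.0 means it was faster, which is the same convention the quoted
table uses.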

> - If you are developing, use clang, and DON'T use GCC 5 with vectorization.

This is an opinion, so I will state mine here: if you are developing,
use ccache + GCC > ccache + clang > clang = gcc. The reason for the
first inequality is the terrible interaction ccache has with clang. I
will still use GCC 5.2 + ccache (with vectorization) for my builds, and
will inform the Arch packagers once we have finalized configure in this
respect :).

> - For release builds, an option to turn it on (or rather to not turn it
> off) would be helpful; but if you really care about performance _that_ much
> then you should probably use some other compilers instead.

No, not true at all. Why do we bother with asm? Many times it is for
exactly such "last mile" optimizations. A 20% improvement in FATE
across the board is nothing to sneeze at given what I have seen in
FFmpeg. This one is virtually free.

Furthermore, barring ICC (for Intel), Clang and GCC are among the best
quality compilers today. I don't know which "other compilers" you are
referring to.

>
> FYI, as I have told Ganesh so in our private exchanges, I did also test
> vectorization on GCC 4.6 on an Ubuntu 12.04/Precise box, which miscompiled
> the code hilariously, _and_ made the code slower, just as illustrated in
> Mans's commit message.

A good point, but I did comment on this. By "recent compiler" I meant
~4.8 and beyond. In other words, I take Debian stable as the
reference.

>
> Timothy
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel