On Mon, Nov 02, 2015 at 10:10:33PM -0500, Ganesh Ajjanagadde wrote:
> On Mon, Nov 2, 2015 at 9:32 PM, Ganesh Ajjanagadde <gajja...@mit.edu> wrote:
> > On Mon, Nov 2, 2015 at 6:49 PM, Ganesh Ajjanagadde <gajja...@mit.edu> wrote:
> >> On Mon, Nov 2, 2015 at 6:37 PM, wm4 <nfx...@googlemail.com> wrote:
> >>> On Mon, 2 Nov 2015 14:49:54 -0500
> >>> Ganesh Ajjanagadde <gajjanaga...@gmail.com> wrote:
> >>>
> >>>> This improves accuracy for the bessel function, and this in turn should
> >>>> improve the quality of the Kaiser window.
> >>>
> >>>
> >>> "Should"? Does it or does it not? If you don't know, why the patch?
> >>
> >> It improves the window in the sense of mathematically matching the
> >> Kaiser window expression due to the improved bessel function accuracy.
> >> That claim I have verified and placed in the commit message with
> >> evidence.
> >>
> >> What that translates into in terms of resampling accuracy, I don't
> >> know. Normally, such things do help reduce the noise floor, but I
> >> don't know an easy way to test via FATE or associated tools. I put
> >> that caveat in the bottom lines of the patch.
> >
> > Turns out the init speed is definitely improved as well (~20% boost
> > with default settings).
> > I did a cheap trick of calling build filter 1000 times to get a large
> > number of runs.
> > Results (x86-64, Haswell, GNU/Linux):
> >
> > test: fate-swr-resample-dblp-44100-2626
> >
> > new:
> > 995894468 decicycles in build_filter(loop 1000), 256 runs, 0 skips
> > 1029719302 decicycles in build_filter(loop 1000), 512 runs, 0 skips
> > 984101131 decicycles in build_filter(loop 1000), 1024 runs, 0 skips
> >
> > old:
> > 1250020763 decicycles in build_filter(loop 1000), 256 runs, 0 skips
> > 1246353282 decicycles in build_filter(loop 1000), 512 runs, 0 skips
> > 1220017565 decicycles in build_filter(loop 1000), 1024 runs, 0 skips
> >
> > Thus, this patch has both things going for it luckily. Will leave to
> > the maintainer (Michael I believe) to give details of accuracy
> > benefits as translated to the actual resampling if easily testable,
> > and I will add the perf numbers to the message.
>
> One last comment on performance (this is something I have discussed
> with Timothy privately), if one removes the crippling
> -fno-tree-vectorize, FATE still passes on my GCC 5.2 setup, and one
> gets further ~5 % perf improvement here:
> 949318408 decicycles in build_filter(loop 1000), 256 runs, 0 skips
> 948795082 decicycles in build_filter(loop 1000), 512 runs, 0 skips
> 928925076 decicycles in build_filter(loop 1000), 1024 runs, 0 skips
>
> I am sure a lot of other places may benefit.

I've wanted to remove -fno-tree-vectorize for a while. We've already
observed relatively small performance improvements in various places.
If it's still buggy, compilers need to be fixed anyway.
--extra-cflags=-fno-tree-vectorize is a good enough workaround if some
people explicitly want to disable it.

About the bessel patch itself, I'm happy that you benched it, because
its performance is actually relevant: last time I tried to replace the
current unrolled loop with a real loop (trying to match libavresample
code), I observed a noticeable overall slowdown while resampling (the
init is very slow). Yes, no numbers, sorry :)

[...]

-- 
Clément B.
_______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel