On Mon, Nov 02, 2015 at 10:10:33PM -0500, Ganesh Ajjanagadde wrote:
> On Mon, Nov 2, 2015 at 9:32 PM, Ganesh Ajjanagadde <gajja...@mit.edu> wrote:
> > On Mon, Nov 2, 2015 at 6:49 PM, Ganesh Ajjanagadde <gajja...@mit.edu> wrote:
> >> On Mon, Nov 2, 2015 at 6:37 PM, wm4 <nfx...@googlemail.com> wrote:
> >>> On Mon,  2 Nov 2015 14:49:54 -0500
> >>> Ganesh Ajjanagadde <gajjanaga...@gmail.com> wrote:
> >>>
> >>>> This improves accuracy for the bessel function, and this in turn should
> >>>> improve the quality of the Kaiser window.
> >>>
> >>>
> >>> "Should"? Does it or does it not? If you don't know, why the patch?
> >>
> >> It improves the window in the sense of mathematically matching the
> >> Kaiser window expression due to the improved bessel function accuracy.
> >> That claim I have verified and placed in the commit message with
> >> evidence.
> >>
> >> What that translates into in terms of resampling accuracy, I don't
> >> know. Normally, such things do help reduce the noise floor, but I
> >> don't know an easy way to test via FATE or associated tools. I put
> >> that caveat in the bottom lines of the patch.
> >
> > Turns out the init speed is definitely improved as well (~20% boost
> > with default settings).
> > I did a cheap trick of calling build filter 1000 times to get a large
> > number of runs.
> > Results (x86-64, Haswell, GNU/Linux):
> >
> > test: fate-swr-resample-dblp-44100-2626
> >
> > new:
> > 995894468 decicycles in build_filter(loop 1000),     256 runs,      0 skips
> > 1029719302 decicycles in build_filter(loop 1000),     512 runs,      0 skips
> > 984101131 decicycles in build_filter(loop 1000),    1024 runs,      0 skips
> >
> > old:
> > 1250020763 decicycles in build_filter(loop 1000),     256 runs,      0 skips
> > 1246353282 decicycles in build_filter(loop 1000),     512 runs,      0 skips
> > 1220017565 decicycles in build_filter(loop 1000),    1024 runs,      0 skips
> >
> > Thus, this patch has both things going for it luckily. Will leave to
> > the maintainer (Michael I believe) to give details of accuracy
> > benefits as translated to the actual resampling if easily testable,
> > and I will add the perf numbers to the message.
> 
> One last comment on performance (this is something I have discussed
> with Timothy privately), if one removes the crippling
> -fno-tree-vectorize, FATE still passes on my GCC 5.2 setup, and one
> gets further ~5 % perf improvement here:
> 949318408 decicycles in build_filter(loop 1000),     256 runs,      0 skips
> 948795082 decicycles in build_filter(loop 1000),     512 runs,      0 skips
> 928925076 decicycles in build_filter(loop 1000),    1024 runs,      0 skips
> 
> I am sure a lot of other places may benefit.

I've been wanting to remove -fno-tree-vectorize for a while. We've already
observed relatively small performance improvements in various places. If
it's still buggy, compilers need to be fixed anyway.

--extra-cflags=-fno-tree-vectorize is a good enough workaround if some
people explicitly want to disable it.

About the bessel patch itself, I'm happy that you benched it, because its
performance is actually relevant: the last time I tried to replace the
current unrolled loop with a real loop (trying to match the libavresample
code), I observed a noticeable overall slowdown while resampling (the
init is very slow). Yes, no numbers, sorry :)
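
For readers following along: the Kaiser window is defined in terms of the
zeroth-order modified Bessel function I0, and a "real loop" version is
typically the textbook power series I0(x) = sum over k >= 0 of
((x/2)^2)^k / (k!)^2. Below is a minimal sketch of that series only. The
name bessel_i0_series, the relative tolerance and the iteration cap are
assumptions made for illustration; this is not the code from libswresample,
libavresample, or the patch under discussion.

```c
/* Hypothetical helper (not FFmpeg's implementation): evaluates
 * I0(x) = sum_{k>=0} ((x/2)^2)^k / (k!)^2 with a plain loop. */
static double bessel_i0_series(double x)
{
    const double q = 0.25 * x * x;  /* (x/2)^2 */
    double term = 1.0;              /* k = 0 term of the series */
    double sum  = 1.0;
    int k;

    /* Each term is the previous one times q / k^2. The cap of 500
     * iterations is an arbitrary safety bound chosen for this sketch. */
    for (k = 1; k < 500; k++) {
        term *= q / ((double)k * k);
        sum  += term;
        if (term < 1e-17 * sum)     /* assumed relative stopping tolerance */
            break;
    }
    return sum;
}
```

The iteration count here depends on x, which is one reason a loop like this
can end up slower than a fixed, unrolled evaluation when it is called once
per filter tap during init.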

[...]

-- 
Clément B.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
