Re: [FFmpeg-devel] [PATCH] swresample/arm: add ff_resample_common_apply_filter_{x4, x8}_{float, s16}_neon

Matthieu Bouron Wed, 11 May 2016 13:15:44 -0700

On Wed, May 11, 2016 at 9:04 PM, Reimar Döffinger <[email protected]>
wrote:


> On 11.05.2016, at 20:37, Michael Niedermayer <[email protected]>
> wrote:
>
> > On Wed, May 11, 2016 at 06:39:20PM +0200, Matthieu Bouron wrote:
> >> From: Matthieu Bouron <[email protected]>
> >>
> >> ---
> >>
> >> Hello,
> >>
> >> Here are some benchmarks of the attached patch on a Raspberry Pi 2.
> >>
> >> ./ffmpeg -f lavfi -i sine=440,aformat=sample_fmts=fltp,asetnsamples=4096,abench=start,aresample=48000,abench=stop -t 1000 -f null -
> >>
> >> With patch:    avg=0.001159 speed=44,1x
> >> Without patch: avg=0.001297 speed=40,8x
> >>
> >> ./ffmpeg -f lavfi -i sine=440,aformat=sample_fmts=s16p,asetnsamples=4096,abench=start,aresample=48000,abench=stop -t 1000 -f null -
> >>
> >> With patch:    avg=0.001374 speed=45,6x
> >> Without patch: avg=0.000782 speed=64,6x
> >
> > so its slower ? or am i misreading this ?
>
> Yes, that seems weird.
> Also, what are common filter lengths?
>

Sorry, I swapped the two results for the s16p benchmark; the NEON version is actually faster:

With*out* patch:    avg=0.001374 speed=45,6x
With patch: avg=0.000782 speed=64,6x



> Because for a length of 4 or 8 or 16 I'd think this would be much better
> fully unrolled.
> And for longer ones at least partially unrolled.
>


The common filter length seems to be 32, but it may vary.
As for the small performance gain on the float version, it seems to be
due to switching between VFP and NEON instructions (I'm not 100% sure).
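
To make the unrolling discussion concrete, here is a minimal C sketch of the
kind of inner loop involved: a dot product of input samples with filter taps,
first as a plain scalar loop and then partially unrolled by 4. The function
names are hypothetical and this is not the code from the patch; the four
independent accumulators only roughly mirror the four lanes of a NEON
q register, and any rounding or scaling done by the real resampler is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: dot product of `len` samples with `len` filter taps.
 * Hypothetical name, for illustration only. */
static float apply_filter_float_ref(const float *src, const float *filter,
                                    size_t len)
{
    float acc = 0.0f;
    for (size_t i = 0; i < len; i++)
        acc += src[i] * filter[i];
    return acc;
}

/* Partially unrolled by 4. Assumes `len` is a multiple of 4.
 * Four independent accumulators avoid a serial dependency on one sum. */
static float apply_filter_float_x4(const float *src, const float *filter,
                                   size_t len)
{
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    for (size_t i = 0; i < len; i += 4) {
        a0 += src[i + 0] * filter[i + 0];
        a1 += src[i + 1] * filter[i + 1];
        a2 += src[i + 2] * filter[i + 2];
        a3 += src[i + 3] * filter[i + 3];
    }
    /* Horizontal add of the four partial sums. */
    return (a0 + a1) + (a2 + a3);
}

/* Same idea for 16-bit samples, accumulating in 32 bits.
 * Assumes `len` is a multiple of 4 and that the sum fits in int32_t. */
static int32_t apply_filter_s16_x4(const int16_t *src, const int16_t *filter,
                                   size_t len)
{
    int32_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    for (size_t i = 0; i < len; i += 4) {
        a0 += (int32_t)src[i + 0] * filter[i + 0];
        a1 += (int32_t)src[i + 1] * filter[i + 1];
        a2 += (int32_t)src[i + 2] * filter[i + 2];
        a3 += (int32_t)src[i + 3] * filter[i + 3];
    }
    return (a0 + a1) + (a2 + a3);
}
```

With a filter length of 32, the unrolled loops above run 8 iterations per
output sample. Note that the float variants can differ in the last bits for
general input, since the summation order is not the same.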

Matthieu
