aarch64: add neon {lum, chr}ConvertRange

Ramiro Polla Tue, 11 Jun 2024 05:33:27 -0700

On Mon, Jun 10, 2024 at 1:56 PM Martin Storsjö <[email protected]> wrote:
> On Fri, 7 Jun 2024, Ramiro Polla wrote:
>
> > chrRangeFromJpeg_8_c: 28.5
> > chrRangeFromJpeg_8_neon: 21.2
> > chrRangeFromJpeg_24_c: 81.2
> > chrRangeFromJpeg_24_neon: 34.7
> > chrRangeFromJpeg_128_c: 425.2
> > chrRangeFromJpeg_128_neon: 162.0
> > chrRangeFromJpeg_144_c: 480.2
> > chrRangeFromJpeg_144_neon: 180.2
> > chrRangeFromJpeg_256_c: 838.2
> > chrRangeFromJpeg_256_neon: 318.0
> > chrRangeFromJpeg_512_c: 1698.2
> > chrRangeFromJpeg_512_neon: 630.0
> > chrRangeToJpeg_8_c: 56.0
> > chrRangeToJpeg_8_neon: 23.5
> > chrRangeToJpeg_24_c: 147.7
> > chrRangeToJpeg_24_neon: 38.2
> > chrRangeToJpeg_128_c: 760.2
> > chrRangeToJpeg_128_neon: 182.5
> > chrRangeToJpeg_144_c: 857.7
> > chrRangeToJpeg_144_neon: 204.5
> > chrRangeToJpeg_256_c: 1504.2
> > chrRangeToJpeg_256_neon: 358.5
> > chrRangeToJpeg_512_c: 3025.7
> > chrRangeToJpeg_512_neon: 710.5
> > lumRangeFromJpeg_8_c: 24.0
> > lumRangeFromJpeg_8_neon: 18.2
> > lumRangeFromJpeg_24_c: 64.0
> > lumRangeFromJpeg_24_neon: 22.2
> > lumRangeFromJpeg_128_c: 289.2
> > lumRangeFromJpeg_128_neon: 79.2
> > lumRangeFromJpeg_144_c: 334.7
> > lumRangeFromJpeg_144_neon: 87.7
> > lumRangeFromJpeg_256_c: 579.5
> > lumRangeFromJpeg_256_neon: 152.0
> > lumRangeFromJpeg_512_c: 1208.0
> > lumRangeFromJpeg_512_neon: 299.0
> > lumRangeToJpeg_8_c: 30.0
> > lumRangeToJpeg_8_neon: 19.0
> > lumRangeToJpeg_24_c: 82.2
> > lumRangeToJpeg_24_neon: 24.0
> > lumRangeToJpeg_128_c: 440.7
> > lumRangeToJpeg_128_neon: 90.5
> > lumRangeToJpeg_144_c: 502.0
> > lumRangeToJpeg_144_neon: 102.2
> > lumRangeToJpeg_256_c: 893.7
> > lumRangeToJpeg_256_neon: 178.0
> > lumRangeToJpeg_512_c: 1793.7
> > lumRangeToJpeg_512_neon: 355.0
> > ---
> > libswscale/aarch64/Makefile             |   1 +
> > libswscale/aarch64/range_convert_neon.S | 103 ++++++++++++++++++++++++
> > libswscale/aarch64/swscale.c            |  21 +++++
> > libswscale/swscale_internal.h           |   1 +
> > libswscale/utils.c                      |   4 +-
> > 5 files changed, 129 insertions(+), 1 deletion(-)
> > create mode 100644 libswscale/aarch64/range_convert_neon.S
> >
> > diff --git a/libswscale/aarch64/Makefile b/libswscale/aarch64/Makefile
> > index da1d909561..6923827f82 100644
> > --- a/libswscale/aarch64/Makefile
> > +++ b/libswscale/aarch64/Makefile
> > @@ -4,5 +4,6 @@ OBJS        += aarch64/rgb2rgb.o                \
> >
> > NEON-OBJS   += aarch64/hscale.o                 \
> >                aarch64/output.o                 \
> > +               aarch64/range_convert_neon.o     \
> >                aarch64/rgb2rgb_neon.o           \
> >                aarch64/yuv2rgb_neon.o           \
> > diff --git a/libswscale/aarch64/range_convert_neon.S 
> > b/libswscale/aarch64/range_convert_neon.S
> > new file mode 100644
> > index 0000000000..5e104971f0
> > --- /dev/null
> > +++ b/libswscale/aarch64/range_convert_neon.S
> > @@ -0,0 +1,103 @@
> > +/*
> > + * Copyright (c) 2024 Ramiro Polla
> > + *
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * FFmpeg is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with FFmpeg; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 
> > 02110-1301 USA
> > + */
> > +
> > +#include "libavutil/aarch64/asm.S"
> > +
> > +.macro lumConvertRange name max mult offset shift
>
> We usually use commas between the macro arguments here. Apparently it
> doesn't make any difference for any of the tools we support, but it would
> be nice for consistency. (When invoking macros, commas between arguments
> are optional for most platforms, but not when targeting Apple platforms,
> so being strict with consistent use of commas is generally good.)


Fixed in the new patchset.

> > +const offset_\name, align=4
> > +        .word \offset, \offset, \offset, \offset
> > +endconst
> > +function ff_\name, export=1
> > +.if \max != 0
> > +        mov             w3, #\max
> > +        dup             v24.8h, w3
> > +.endif
> > +        mov             w3, #\mult
> > +        dup             v25.4s, w3
> > +        movrel          x3, offset_\name
> > +        ld1             {v26.4s}, [x3]
>
> FWIW, I did see that you were recommended this form, over ld1r, based on
> some microarchitectural performance numbers. However in our preexisting
> assembly, manually pre-splatting vectors like this is unusual I would say.
> I don't have a strong opinion on the matter though.
>
> Anyway, the assembly looks reasonable to me.

I changed it to movz/movk/dup in the new patchset (tested on rpi5, but
not on macos).

Thanks,
Ramiro
_______________________________________________
ffmpeg-devel mailing list
[email protected]
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
[email protected] with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH 4/4] swscale/aarch64: add neon {lum, chr}ConvertRange

Reply via email to