On Sun, Sep 29, 2024 at 10:16 PM Martin Storsjö <mar...@martin.st> wrote:

> On Sun, 29 Sep 2024, Zhao Zhili wrote:
>
> > From: Zhao Zhili <zhiliz...@tencent.com>
> >
> > w_avg_8_2x2_c:                                           0.0 ( 0.00x)
> > w_avg_8_2x2_neon:                                        0.0 ( 0.00x)
> > w_avg_8_4x4_c:                                           0.2 ( 1.00x)
> > w_avg_8_4x4_neon:                                        0.0 ( 0.00x)
> > w_avg_8_8x8_c:                                           1.2 ( 1.00x)
> > w_avg_8_8x8_neon:                                        0.2 ( 5.00x)
> > w_avg_8_16x16_c:                                         4.2 ( 1.00x)
> > w_avg_8_16x16_neon:                                      0.8 ( 5.67x)
> > w_avg_8_32x32_c:                                        16.2 ( 1.00x)
> > w_avg_8_32x32_neon:                                      2.5 ( 6.50x)
> > w_avg_8_64x64_c:                                        64.5 ( 1.00x)
> > w_avg_8_64x64_neon:                                      9.0 ( 7.17x)
> > w_avg_8_128x128_c:                                     269.5 ( 1.00x)
> > w_avg_8_128x128_neon:                                   35.5 ( 7.59x)
> > w_avg_10_2x2_c:                                          0.2 ( 1.00x)
> > w_avg_10_2x2_neon:                                       0.2 ( 1.00x)
> > w_avg_10_4x4_c:                                          0.2 ( 1.00x)
> > w_avg_10_4x4_neon:                                       0.2 ( 1.00x)
> > w_avg_10_8x8_c:                                          1.0 ( 1.00x)
> > w_avg_10_8x8_neon:                                       0.2 ( 4.00x)
> > w_avg_10_16x16_c:                                        4.2 ( 1.00x)
> > w_avg_10_16x16_neon:                                     0.8 ( 5.67x)
> > w_avg_10_32x32_c:                                       16.2 ( 1.00x)
> > w_avg_10_32x32_neon:                                     2.5 ( 6.50x)
> > w_avg_10_64x64_c:                                       66.2 ( 1.00x)
> > w_avg_10_64x64_neon:                                    10.0 ( 6.62x)
> > w_avg_10_128x128_c:                                    277.8 ( 1.00x)
> > w_avg_10_128x128_neon:                                  39.8 ( 6.99x)
> > w_avg_12_2x2_c:                                          0.0 ( 0.00x)
> > w_avg_12_2x2_neon:                                       0.2 ( 0.00x)
> > w_avg_12_4x4_c:                                          0.2 ( 1.00x)
> > w_avg_12_4x4_neon:                                       0.0 ( 0.00x)
> > w_avg_12_8x8_c:                                          1.2 ( 1.00x)
> > w_avg_12_8x8_neon:                                       0.5 ( 2.50x)
> > w_avg_12_16x16_c:                                        4.8 ( 1.00x)
> > w_avg_12_16x16_neon:                                     0.8 ( 6.33x)
> > w_avg_12_32x32_c:                                       17.0 ( 1.00x)
> > w_avg_12_32x32_neon:                                     2.8 ( 6.18x)
> > w_avg_12_64x64_c:                                       64.0 ( 1.00x)
> > w_avg_12_64x64_neon:                                    10.0 ( 6.40x)
> > w_avg_12_128x128_c:                                    269.2 ( 1.00x)
> > w_avg_12_128x128_neon:                                  42.0 ( 6.41x)
> >
> > Signed-off-by: Zhao Zhili <zhiliz...@tencent.com>
> > ---
> > libavcodec/aarch64/vvc/dsp_init.c | 36 +++++++++++
> > libavcodec/aarch64/vvc/inter.S    | 99 +++++++++++++++++++++++++------
> > 2 files changed, 118 insertions(+), 17 deletions(-)
> >
> > diff --git a/libavcodec/aarch64/vvc/dsp_init.c
> b/libavcodec/aarch64/vvc/dsp_init.c
> > index ad767d17e2..ebe58a2ba5 100644
> > --- a/libavcodec/aarch64/vvc/dsp_init.c
> > +++ b/libavcodec/aarch64/vvc/dsp_init.c
> > @@ -52,6 +52,39 @@ void ff_vvc_avg_12_neon(uint8_t *dst, ptrdiff_t
> dst_stride,
> >                         const int16_t *src0, const int16_t *src1, int
> width,
> >                         int height);
> >
> > +void ff_vvc_w_avg_8_neon(uint8_t *_dst, ptrdiff_t _dst_stride,
> > +                         const int16_t *src0, const int16_t *src1,
> > +                         int width, int height,
> > +                         uintptr_t w0_w1, uintptr_t offset_shift);
> > +void ff_vvc_w_avg_10_neon(uint8_t *_dst, ptrdiff_t _dst_stride,
> > +                         const int16_t *src0, const int16_t *src1,
> > +                         int width, int height,
> > +                         uintptr_t w0_w1, uintptr_t offset_shift);
> > +void ff_vvc_w_avg_12_neon(uint8_t *_dst, ptrdiff_t _dst_stride,
> > +                          const int16_t *src0, const int16_t *src1,
> > +                          int width, int height,
> > +                          uintptr_t w0_w1, uintptr_t offset_shift);
> > +/* When passing arguments to functions, Apple platforms diverge from
> the ARM64
> > + * standard ABI for functions that require passing arguments on the
> stack. To
> > + * simplify portability in the assembly function interface, use a
> different
> > + * function signature that doesn't require passing arguments on the
> stack.
> > + */
> > +#define W_AVG_FUN(bit_depth) \
> > +static void vvc_w_avg_ ## bit_depth(uint8_t *dst, ptrdiff_t dst_stride,
> \
> > +    const int16_t *src0, const int16_t *src1, int width, int height, \
> > +    int denom, int w0, int w1, int o0, int o1) \
> > +{ \
> > +    int shift = denom + FFMAX(3, 15 - bit_depth); \
> > +    int offset = ((o0 + o1) * (1 << (bit_depth - 8)) + 1) * (1 <<
> (shift - 1)); \
> > +    uintptr_t w0_w1 = ((uintptr_t)w0 << 32) | (uint32_t)w1; \
> > +    uintptr_t offset_shift = ((uintptr_t)offset << 32) |
> (uint32_t)shift; \
> > +    ff_vvc_w_avg_ ## bit_depth ## _neon(dst, dst_stride, src0, src1,
> width, height, w0_w1, offset_shift); \
> > +}
>
> This LGTM now (and the rest was ok already before), thanks!
>
Applied.
Thank you, Zhili and Martin.

>
> // Martin
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
>
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Reply via email to