On Mon, 17 Oct 2022, Hubert Mazur wrote:

Provide arm64 neon optimized implementations for hscale16To19 with
filter sizes 4, 8 and X4.

The tests and benchmarks run on AWS Graviton 2 instances.
The results from a checkasm tool are shown below.

hscale_16_to_19__fs_4_dstW_512_c: 6216.0
hscale_16_to_19__fs_4_dstW_512_neon: 2257.0
hscale_16_to_19__fs_8_dstW_512_c: 10417.7
hscale_16_to_19__fs_8_dstW_512_neon: 3112.5
hscale_16_to_19__fs_12_dstW_512_c: 14890.5
hscale_16_to_19__fs_12_dstW_512_neon: 3899.0
hscale_16_to_19__fs_16_dstW_512_c: 19006.5
hscale_16_to_19__fs_16_dstW_512_neon: 5341.2
hscale_16_to_19__fs_32_dstW_512_c: 36629.5
hscale_16_to_19__fs_32_dstW_512_neon: 9502.7
hscale_16_to_19__fs_40_dstW_512_c: 45477.5
hscale_16_to_19__fs_40_dstW_512_neon: 11552.0

Signed-off-by: Hubert Mazur <h...@semihalf.com>
---
libswscale/aarch64/hscale.S  | 402 +++++++++++++++++++++++++++++++++++
libswscale/aarch64/swscale.c |  70 +++++-
2 files changed, 471 insertions(+), 1 deletion(-)

+void ff_hscale16to19_4_neon_asm(int shift, int16_t *_dst, int dstW,
+                      const uint8_t *_src, const int16_t *filter,
+                      const int32_t *filterPos, int filterSize);
+void ff_hscale16to19_X8_neon_asm(int shift, int16_t *_dst, int dstW,
+                      const uint8_t *_src, const int16_t *filter,
+                      const int32_t *filterPos, int filterSize);
+void ff_hscale16to19_X4_neon_asm(int shift, int16_t *_dst, int dstW,
+                      const uint8_t *_src, const int16_t *filter,
+                      const int32_t *filterPos, int filterSize);
+
#define SCALE_FUNC(filter_n, from_bpc, to_bpc, opt) \
void ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n ## _ ## opt( \
                                                SwsContext *c, int16_t *data, \
@@ -43,7 +53,8 @@ void ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n 
## _ ## opt( \
#define SCALE_FUNCS(filter_n, opt) \
    SCALE_FUNC(filter_n,  8, 15, opt); \
    SCALE_FUNC(filter_n, 8, 19, opt); \
-    SCALE_FUNC(filter_n, 16, 15, opt);
+    SCALE_FUNC(filter_n, 16, 15, opt); \
+    SCALE_FUNC(filter_n, 16, 19, opt);

So this declares the functions we're implementing as C wrappers below, and the manual declarations further up declare the actual asm functions?

I guess that works, although it makes unnecessary extern functions. In such cases, we usually have the C functions be static functions, placed above the code that uses them. But it's not a big deal.

Other than that, this patchset mostly seems fine.

However, I tested the patches on x86, and the new checkasm tests do fail on x86 (both i386 and x86_64) - so that needs to be fixed anyway. So since we'll need to do a new round anyway, please do try to fix up the minor cosmetics I mentioned.

// Martin

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Reply via email to