On Mon, 17 Oct 2022, Hubert Mazur wrote:
Provide arm64 neon optimized implementations for hscale16To19 with
filter sizes 4, 8 and X4.
The tests and benchmarks run on AWS Graviton 2 instances.
The results from a checkasm tool are shown below.
hscale_16_to_19__fs_4_dstW_512_c: 6216.0
hscale_16_to_19__fs_4_dstW_512_neon: 2257.0
hscale_16_to_19__fs_8_dstW_512_c: 10417.7
hscale_16_to_19__fs_8_dstW_512_neon: 3112.5
hscale_16_to_19__fs_12_dstW_512_c: 14890.5
hscale_16_to_19__fs_12_dstW_512_neon: 3899.0
hscale_16_to_19__fs_16_dstW_512_c: 19006.5
hscale_16_to_19__fs_16_dstW_512_neon: 5341.2
hscale_16_to_19__fs_32_dstW_512_c: 36629.5
hscale_16_to_19__fs_32_dstW_512_neon: 9502.7
hscale_16_to_19__fs_40_dstW_512_c: 45477.5
hscale_16_to_19__fs_40_dstW_512_neon: 11552.0
Signed-off-by: Hubert Mazur <h...@semihalf.com>
---
libswscale/aarch64/hscale.S | 402 +++++++++++++++++++++++++++++++++++
libswscale/aarch64/swscale.c | 70 +++++-
2 files changed, 471 insertions(+), 1 deletion(-)
+void ff_hscale16to19_4_neon_asm(int shift, int16_t *_dst, int dstW,
+ const uint8_t *_src, const int16_t *filter,
+ const int32_t *filterPos, int filterSize);
+void ff_hscale16to19_X8_neon_asm(int shift, int16_t *_dst, int dstW,
+ const uint8_t *_src, const int16_t *filter,
+ const int32_t *filterPos, int filterSize);
+void ff_hscale16to19_X4_neon_asm(int shift, int16_t *_dst, int dstW,
+ const uint8_t *_src, const int16_t *filter,
+ const int32_t *filterPos, int filterSize);
+
#define SCALE_FUNC(filter_n, from_bpc, to_bpc, opt) \
void ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n ## _ ## opt( \
SwsContext *c, int16_t *data, \
@@ -43,7 +53,8 @@ void ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n
## _ ## opt( \
#define SCALE_FUNCS(filter_n, opt) \
SCALE_FUNC(filter_n, 8, 15, opt); \
SCALE_FUNC(filter_n, 8, 19, opt); \
- SCALE_FUNC(filter_n, 16, 15, opt);
+ SCALE_FUNC(filter_n, 16, 15, opt); \
+ SCALE_FUNC(filter_n, 16, 19, opt);
So this declares the functions we're implementing as C wrappers below, and
the manual declarations further up declare the actual asm functions?
I guess that works, although it makes unnecessary extern functions. In
such cases, we usually have the C functions be static functions, placed
above the code that uses them. But it's not a big deal.
Other than that, this patchset mostly seems fine.
However, I tested the patches on x86, and the new checkasm tests do fail
on x86 (both i386 and x86_64) - so that needs to be fixed anyway. So since
we'll need to do a new round anyway, please do try to fix up the minor
cosmetics I mentioned.
// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".