On Sun, 10 Jan 2021, reimar.doeffin...@gmx.de wrote:

From: Reimar Döffinger <reimar.doeffin...@gmx.de>

The speedup is small, around 1.5%, but these functions are fairly simple.
---
libavcodec/aarch64/hevcdsp_idct_neon.S    | 190 ++++++++++++++++++++++
libavcodec/aarch64/hevcdsp_init_aarch64.c |  24 +++
2 files changed, 214 insertions(+)

diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
index 9f67e45..edd03a0 100644
--- a/libavcodec/aarch64/hevcdsp_idct_neon.S
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -36,6 +36,196 @@ const trans, align=4
        .short 31, 22, 13, 4
endconst

+.macro clip10 in1, in2, c1, c2
+        smax        \in1, \in1, \c1
+        smax        \in2, \in2, \c1
+        smin        \in1, \in1, \c2
+        smin        \in2, \in2, \c2
+.endm
+
+function ff_hevc_add_residual_4x4_8_neon, export=1
+        ld1             {v0.8H-v1.8H}, [x1]
+        ld1             {v2.S}[0], [x0], x2
+        ld1             {v2.S}[1], [x0], x2
+        ld1             {v2.S}[2], [x0], x2
+        ld1             {v2.S}[3], [x0], x2
+        sub             x0, x0, x2, lsl #2
+        uxtl            v8.8H, v2.8B
+        uxtl2           v9.8H, v2.16B
+        sqadd           v0.8H, v0.8H, v8.8H

FWIW, as a matter of taste, I dislike the shouty uppercase form of, e.g., element specifiers, like .8H here. The code base contains both styles, but I'd say the lowercase form is more prevalent.

Overall, this patch looks good; I don't think there's much to comment on. I haven't tested it fully, though, since it depends on the other patch, which still has a few issues (and fails checkasm).
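
For context, the operation these NEON functions vectorize is a clamped add of a block of int16 residuals onto the reconstructed pixels. Below is a minimal scalar sketch in C. It assumes the residuals are stored contiguously, row by row, and that the stride is in bytes (matching x1 and x2 in the quoted assembly). The function names add_residual_ref_8, add_residual_ref_hbd and clip_int are hypothetical; this is not FFmpeg's actual C template.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp v to [lo, hi]: the scalar counterpart of the smax/smin pairs in
 * the clip10 macro (presumably lo = 0, hi = 1023 for 10-bit content). */
static int clip_int(int v, int lo, int hi)
{
    return v < lo ? lo : v > hi ? hi : v;
}

/* Hypothetical 8-bit reference: add a size x size block of residuals to
 * the pixels at dst and clamp each result to [0, 255].
 * stride is the distance between pixel rows, in bytes. */
static void add_residual_ref_8(uint8_t *dst, const int16_t *res,
                               ptrdiff_t stride, int size)
{
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            dst[x] = (uint8_t)clip_int(dst[x] + *res++, 0, 255);
        dst += stride;
    }
}

/* Hypothetical high-bit-depth reference: pixels are uint16_t and are
 * clamped to [0, 2^bit_depth - 1]; stride is still given in bytes. */
static void add_residual_ref_hbd(uint16_t *dst, const int16_t *res,
                                 ptrdiff_t stride, int size, int bit_depth)
{
    const int max = (1 << bit_depth) - 1;

    assert(bit_depth > 8 && bit_depth <= 12);
    stride /= sizeof(*dst);  /* convert the byte stride to elements */
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            dst[x] = (uint16_t)clip_int(dst[x] + *res++, 0, max);
        dst += stride;
    }
}
```

In the quoted 8-bit function, uxtl widens the pixels to 16 bits and sqadd adds the residuals with saturation. The unquoted remainder presumably narrows back to 8 bits with unsigned saturation, which gives the same result as the clamp above. For 10-bit content, clip10 would play the role of clip_int.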

// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

