aarch64: add HEVC idct_dc NEON

Martin Storsjö Sat, 16 Jan 2021 15:04:59 -0800

On Thu, 7 Jan 2021, Josh Dekker wrote:

Signed-off-by: Josh Dekker <[email protected]>
---
libavcodec/aarch64/Makefile            |  3 +-
libavcodec/aarch64/hevcdsp_idct_neon.S | 74 ++++++++++++++++++++++++++
libavcodec/aarch64/hevcdsp_init.c      | 19 +++++++
3 files changed, 95 insertions(+), 1 deletion(-)
create mode 100644 libavcodec/aarch64/hevcdsp_idct_neon.S


diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index 4bdd554e7e..42d80bf74c 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -54,7 +54,8 @@ NEON-OBJS-$(CONFIG_VP8DSP)              += aarch64/vp8dsp_neon.o
# decoders/encoders
NEON-OBJS-$(CONFIG_AAC_DECODER)         += aarch64/aacpsdsp_neon.o
NEON-OBJS-$(CONFIG_DCA_DECODER)         += aarch64/synth_filter_neon.o
-NEON-OBJS-$(CONFIG_HEVC_DECODER)        += aarch64/hevcdsp_add_res_neon.o
+NEON-OBJS-$(CONFIG_HEVC_DECODER)        += aarch64/hevcdsp_add_res_neon.o      \
+                                           aarch64/hevcdsp_idct_neon.o
NEON-OBJS-$(CONFIG_OPUS_DECODER)        += aarch64/opusdsp_neon.o
NEON-OBJS-$(CONFIG_VORBIS_DECODER)      += aarch64/vorbisdsp_neon.o
NEON-OBJS-$(CONFIG_VP9_DECODER)         += aarch64/vp9itxfm_16bpp_neon.o       \
diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
new file mode 100644
index 0000000000..cd886bb6dc
--- /dev/null
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -0,0 +1,74 @@
+/* -*-arm64-*-
+ *
+ * AArch64 NEON optimised IDCT functions for HEVC decoding
+ *
+ * Copyright (c) 2020 Josh Dekker <[email protected]>
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+.macro idct_dc size bitdepth
+function ff_hevc_idct_\size\()x\size\()_dc_\bitdepth\()_neon, export=1

As a bonus, it'd be nice to have the function prototype in a comment above here, as a quick reference for what the inputs are.
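For reference, here is a rough scalar sketch in C of what the assembly below appears to compute, which could also serve as the basis for such a comment. The prototype in the comment is an assumption (a single `int16_t *coeffs` argument, matching the one pointer the assembly reads from x0), and `idct_dc_model` is a hypothetical helper name, not something in the patch.

```c
#include <stdint.h>

/* Hypothetical scalar model of the idct_dc macro; not part of the patch.
 * Assumed prototype of the NEON functions (an assumption, x0 = coeffs):
 *     void ff_hevc_idct_4x4_dc_8_neon(int16_t *coeffs);
 * Note: this relies on >> being an arithmetic shift for negative values,
 * which mirrors the asr instructions but is implementation-defined in C. */
static void idct_dc_model(int16_t *coeffs, int size, int bitdepth)
{
    int shift = 14 - bitdepth;             /* asr w1, w1, #(14 - bitdepth) */
    int add   = 1 << (shift - 1);          /* mov w2, #(1 << (13 - bitdepth)) */
    int dc    = (((coeffs[0] + 1) >> 1) + add) >> shift;

    /* Every coefficient of the size x size block gets the same DC value. */
    for (int i = 0; i < size * size; i++)
        coeffs[i] = (int16_t)dc;
}
```

Calling `idct_dc_model(coeffs, 4, 8)` on a block whose first coefficient is 64 fills all 16 entries with 1.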

+    ldrsh w1, [x0]
+    mov   w2, #(1 << (13 - \bitdepth))
+    add   w1, w1, #1
+    asr   w1, w1, #1
+    add   w1, w1, w2

As commented on the other patch, please align things like in existing assembly, and add another extra space after the commas here, to keep things aligned even if you'd be using e.g. w10.

+    asr   w1, w1, #(14 - \bitdepth)
+    dup   v0.8h, w1
+    dup   v1.8h, w1
+.if \size > 4
+    dup   v2.8h, w1
+    dup   v3.8h, w1
+.if \size > 16 /* dc 32x32 */
+    mov x2, #4
+1:
+    subs x2, x2, #1
+.endif
+.if \size > 8 /* dc 16x16 */
+    st1   {v0.8h-v3.8h}, [x0], #64
+    st1   {v0.8h-v3.8h}, [x0], #64
+    st1   {v0.8h-v3.8h}, [x0], #64
+    st1   {v0.8h-v3.8h}, [x0], #64
+    st1   {v0.8h-v3.8h}, [x0], #64
+    st1   {v0.8h-v3.8h}, [x0], #64
+.endif /* dc 8x8 */
+    st1   {v0.8h-v3.8h}, [x0], #64
+    st1   {v0.8h-v3.8h}, [x0], #64

In a series of stores like this, each instruction updates the pointer x0, which means that the next instruction can't start executing until the previous one is done writing back the value to that register. So for a sequence like this, it's sometimes useful to have two registers and interleave the stores between them, e.g. like this:


add x12,  x0,  #64
mov x13,  #128
st1 {}, [x0],  x13
st1 {}, [x12], x13
st1 {}, [x0],  x13
st1 {}, [x12], x13
...

Depending on the context and so on, it may or may not be worth doing that. For such a small trivial function like this, I'd definitely try at least.
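To make the addressing in that suggestion concrete, here is a small C model of it. It only models which addresses get written (two cursors, 64 bytes apart, each stepping by 128 bytes); it says nothing about timing, which has to be measured on real hardware. The function name `fill_interleaved` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the two-pointer store pattern; addressing only.
 * Writes n_chunks chunks of 64 bytes (one st1 {v0.8h-v3.8h} each).
 * n_chunks must be even, since stores are issued in pairs.
 * Returns the number of 64-byte chunks written. */
static size_t fill_interleaved(int16_t *dst, size_t n_chunks, int16_t dc)
{
    int16_t chunk[32];                     /* v0..v3 as 8h = 64 bytes */
    for (int i = 0; i < 32; i++)
        chunk[i] = dc;

    uint8_t *p0 = (uint8_t *)dst;          /* x0 */
    uint8_t *p1 = p0 + 64;                 /* add x12, x0, #64 */
    const size_t stride = 128;             /* mov x13, #128 */
    size_t written = 0;

    for (size_t i = 0; i < n_chunks; i += 2) {
        memcpy(p0, chunk, 64); p0 += stride;   /* st1 {}, [x0],  x13 */
        memcpy(p1, chunk, 64); p1 += stride;   /* st1 {}, [x12], x13 */
        written += 2;
    }
    return written;
}
```

For a 16x16 block (512 bytes), `fill_interleaved(coeffs, 8, dc)` covers the same eight 64-byte chunks as eight post-incremented stores through x0 alone, but no two consecutive stores depend on the same pointer register.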

Btw, you didn't mention what core you had benchmarked your functions on. When doing micro tuning like this, it's usually good to benchmark on both big and small cores, as some optimizations can make things faster on one but slower on another one.

Functionally, the patch looks ok - but as this goes into the same file that Reimar's patches also add things to (and his patches do things that are fairly 1:1 with the existing arm assembly), it might be good to hold off on pushing this one until his patches are in, and then rebase this one on top of those.


// Martin


Re: [FFmpeg-devel] [PATCH 2/4] lavc/aarch64: add HEVC idct_dc NEON
