[FFmpeg-devel] [PATCH 1/2] lavu/checkasm: add (private) kperf timing for macOS
Signed-off-by: Josh Dekker
---
 configure                 |   2 +
 libavutil/Makefile        |   1 +
 libavutil/macos_kperf.c   | 140 ++
 libavutil/macos_kperf.h   |  23 +++
 libavutil/timer.h         |  17 -
 tests/checkasm/checkasm.c |  14 +++-
 tests/checkasm/checkasm.h |   7 +-
 7 files changed, 200 insertions(+), 4 deletions(-)
 create mode 100644 libavutil/macos_kperf.c
 create mode 100644 libavutil/macos_kperf.h

diff --git a/configure b/configure
index 820f719a32..a79052ad28 100755
--- a/configure
+++ b/configure
@@ -489,6 +489,7 @@ Developer options (useful when working on FFmpeg itself):
   --ignore-tests=TESTS     comma-separated list (without "fate-" prefix
                            in the name) of tests whose result is ignored
   --enable-linux-perf      enable Linux Performance Monitor API
+  --enable-macos-kperf     enable macOS kperf (private) API
   --disable-large-tests    disable tests that use a large amount of memory
 
 NOTE: Object files are built at the place where configure is launched.
@@ -1947,6 +1948,7 @@ CONFIG_LIST="
     fontconfig
     large_tests
     linux_perf
+    macos_kperf
     memory_poisoning
     neon_clobber_test
     ossfuzz
diff --git a/libavutil/Makefile b/libavutil/Makefile
index 47efb718d2..18dc5f22d9 100644
--- a/libavutil/Makefile
+++ b/libavutil/Makefile
@@ -181,6 +181,7 @@ OBJS-$(CONFIG_D3D11VA) += hwcontext_d3d11va.o
 OBJS-$(CONFIG_DXVA2)                    += hwcontext_dxva2.o
 OBJS-$(CONFIG_LIBDRM)                   += hwcontext_drm.o
 OBJS-$(CONFIG_LZO)                      += lzo.o
+OBJS-$(CONFIG_MACOS_KPERF)              += macos_kperf.o
 OBJS-$(CONFIG_MEDIACODEC)               += hwcontext_mediacodec.o
 OBJS-$(CONFIG_OPENCL)                   += hwcontext_opencl.o
 OBJS-$(CONFIG_QSV)                      += hwcontext_qsv.o
diff --git a/libavutil/macos_kperf.c b/libavutil/macos_kperf.c
new file mode 100644
index 00..d5de491e12
--- /dev/null
+++ b/libavutil/macos_kperf.c
@@ -0,0 +1,140 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include "macos_kperf.h"
+#include
+#include
+#include
+
+#define KPERF_LIST                                             \
+    F(int, kpc_get_counting, void)                             \
+    F(int, kpc_force_all_ctrs_set, int)                        \
+    F(int, kpc_set_counting, uint32_t)                         \
+    F(int, kpc_set_thread_counting, uint32_t)                  \
+    F(int, kpc_set_config, uint32_t, void *)                   \
+    F(int, kpc_get_config, uint32_t, void *)                   \
+    F(int, kpc_set_period, uint32_t, void *)                   \
+    F(int, kpc_get_period, uint32_t, void *)                   \
+    F(uint32_t, kpc_get_counter_count, uint32_t)               \
+    F(uint32_t, kpc_get_config_count, uint32_t)                \
+    F(int, kperf_sample_get, int *)                            \
+    F(int, kpc_get_thread_counters, int, unsigned int, void *)
+
+#define F(ret, name, ...)                                      \
+    typedef ret name##proc(__VA_ARGS__);                       \
+    static name##proc *name = NULL;
+KPERF_LIST
+#undef F
+
+#define CFGWORD_EL0A32EN_MASK (0x1)
+#define CFGWORD_EL0A64EN_MASK (0x2)
+#define CFGWORD_EL1EN_MASK    (0x4)
+#define CFGWORD_EL3EN_MASK    (0x8)
+#define CFGWORD_ALLMODES_MASK (0xf)
+
+#define CPMU_NONE                0
+#define CPMU_CORE_CYCLE          0x02
+#define CPMU_INST_A64            0x8c
+#define CPMU_INST_BRANCH         0x8d
+#define CPMU_SYNC_DC_LOAD_MISS   0xbf
+#define CPMU_SYNC_DC_STORE_MISS  0xc0
+#define CPMU_SYNC_DTLB_MISS      0xc1
+#define CPMU_SYNC_ST_HIT_YNGR_LD 0xc4
+#define CPMU_SYNC_BR_ANY_MISP    0xcb
+#define CPMU_FED_IC_MISS_DEM     0xd3
+#define CPMU_FED_ITLB_MISS       0xd4
+
+#define KPC_CLASS_FIXED_MASK        (1 << 0)
+#define KPC_CLASS_CONFIGURABLE_MASK (1 << 1)
+#define KPC_CLASS_POWER_MASK        (1 << 2)
+#define KPC_CLASS_RAWPMU_MASK       (1 << 3)
+
+#define COUNTERS_COUNT 10
+#define CONFIG_COUNT 8
+#define KPC_MASK (KPC_CLASS_CONFIGURABLE_MASK | KPC_CLASS_FIXED_MASK)
+
+int ff_kperf_setup()
+{
+    uint64_t config[COUNTERS_COUNT] = {0};
+    config[0] = CPMU_CORE_CYCLE | CFGWORD_EL0A64EN_MASK;
+    // con
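KPERF_LIST above is an X-macro: one list of (return type, name, parameter types) entries is expanded several times under different definitions of `F`. The patch's first expansion declares a function-pointer type and a NULL pointer per kperf entry point; the excerpt is truncated before those pointers get filled in (presumably by a runtime symbol lookup inside `ff_kperf_init()`, which is not shown). The self-contained sketch below reproduces the pattern; the `demo_*` functions, the `p_` prefix and the name-to-address table are hypothetical stand-ins, not the private kperf API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the private entry points. */
static int demo_get_counting(void) { return 3; }
static int demo_set_counting(unsigned classes) { return classes ? 0 : -1; }

/* Single source of truth: return type, name, parameter types. */
#define DEMO_LIST                           \
    F(int, demo_get_counting, void)         \
    F(int, demo_set_counting, unsigned)

/* Expansion 1: one typedef plus one NULL function pointer per entry,
 * mirroring the patch's `name##proc` typedef and static pointer. */
#define F(ret, name, ...)                   \
    typedef ret name##_proc(__VA_ARGS__);   \
    static name##_proc *p_##name = NULL;
DEMO_LIST
#undef F

/* Expansion 2: a name -> address table standing in for a dynamic
 * symbol lookup such as dlsym(). */
typedef void (*demo_fn)(void);
static const struct { const char *name; demo_fn addr; } demo_syms[] = {
#define F(ret, name, ...) { #name, (demo_fn)name },
    DEMO_LIST
#undef F
};

static demo_fn demo_lookup(const char *name)
{
    for (size_t i = 0; i < sizeof(demo_syms) / sizeof(demo_syms[0]); i++)
        if (!strcmp(demo_syms[i].name, name))
            return demo_syms[i].addr;
    return NULL;
}

/* Expansion 3: bind every pointer, failing if any symbol is missing. */
static int demo_init(void)
{
#define F(ret, name, ...)                                  \
    if (!(p_##name = (name##_proc *)demo_lookup(#name)))   \
        return -1;
    DEMO_LIST
#undef F
    return 0;
}
```

The benefit of the pattern is that adding an entry point means touching exactly one line of the list; the typedef, the pointer and the binding code all follow automatically.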
[FFmpeg-devel] [PATCH 0/2] ARM64 HEVC QPEL/EPEL
This is a patch originally submitted in 2017 (author/date info left intact). At the time, it didn't get much attention, I assume due to its sheer size. I have split out only its QPEL/EPEL parts, rebased them, and cleaned the patches up as much as is reasonable for a 9001-line diff. I also have SAO band (non-working) and 32x32 IDCT (working, but honestly in a worse state than these patches).

This patch gives a large overall speedup, roughly 30% in my testing. The only problems are that, as previously stated, 1) it's a lot of code, since the original author didn't make use of macros, and 2) it's only 8-bit. I will be writing 10-bit assembly, and whilst I do that I will clean up/macro-ify the current 8-bit assembly. There is still lots to be done, though. Our current IDCTs for HEVC aren't great either; I had a 40% speedup on the 16x16 one in testing. The assembly is far from 'done', but at least we're slowly getting closer.

There were some suggestions for smaller improvements in the previous reviews, and I have not applied those. The first course of action is to refactor the code so that it is possible to work on it without going insane. I think it's fine to use it whilst I'm refactoring it, given the large speedup: the code weight in the binary should be roughly the same even after that anyway.

Also, I updated the kperf patch as per Lynne's request.

--
Josh

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
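QPEL and EPEL are HEVC's luma (quarter-sample, 8-tap) and chroma (eighth-sample, 4-tap) interpolation filters. As a point of reference for what the 8-bit luma horizontal case computes, here is a scalar sketch using the standard HEVC luma filter taps; the function name, the `mx` indexing (1..3) and the padding requirement are assumptions of this sketch, not the interface of the NEON functions in the patch set.

```c
#include <assert.h>
#include <stdint.h>

/* HEVC luma 8-tap interpolation filters for the 1/4, 1/2 and 3/4
 * sample positions; each set of taps sums to 64. */
static const int8_t qpel_filters[3][8] = {
    { -1, 4, -10, 58, 17,  -5, 1,  0 },
    { -1, 4, -11, 40, 40, -11, 4, -1 },
    {  0, 1,  -5, 17, 58, -10, 4, -1 },
};

/* Horizontal 8-bit qpel into a 16-bit intermediate; mx is 1..3.
 * src must have 3 valid samples to the left and 4 to the right of
 * the `width` output positions. */
static void qpel_h_8bit(int16_t *dst, const uint8_t *src, int width, int mx)
{
    const int8_t *f = qpel_filters[mx - 1];

    for (int x = 0; x < width; x++) {
        int sum = 0;
        for (int i = 0; i < 8; i++)
            sum += f[i] * src[x + i - 3];
        /* no extra shift: BIT_DEPTH - 8 == 0 for 8-bit input */
        dst[x] = (int16_t)sum;
    }
}
```

Because the taps sum to 64, a flat input of value v produces 64 * v; the scaling back to pixel range happens in a later stage (the uni/bi variants).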
Re: [FFmpeg-devel] [PATCH v4 1/2] lavc/aarch64: change h264pred_init structure
Set applied.

--
Josh
[FFmpeg-devel] [PATCH] checkasm: add (private) kperf timing for macOS
Signed-off-by: Josh Dekker
---
 configure                    |   2 +
 tests/checkasm/Makefile      |   1 +
 tests/checkasm/checkasm.c    |  19 -
 tests/checkasm/checkasm.h    |  10 ++-
 tests/checkasm/macos_kperf.c | 143 +++
 tests/checkasm/macos_kperf.h |  23 ++
 6 files changed, 195 insertions(+), 3 deletions(-)
 create mode 100644 tests/checkasm/macos_kperf.c
 create mode 100644 tests/checkasm/macos_kperf.h

diff --git a/configure b/configure
index d7a3f507e8..a47e3dea67 100755
--- a/configure
+++ b/configure
@@ -490,6 +490,7 @@ Developer options (useful when working on FFmpeg itself):
   --ignore-tests=TESTS     comma-separated list (without "fate-" prefix
                            in the name) of tests whose result is ignored
   --enable-linux-perf      enable Linux Performance Monitor API
+  --enable-macos-kperf     enable macOS kperf (private) API
   --disable-large-tests    disable tests that use a large amount of memory
 
 NOTE: Object files are built at the place where configure is launched.
@@ -1949,6 +1950,7 @@ CONFIG_LIST="
     fontconfig
     large_tests
     linux_perf
+    macos_kperf
     memory_poisoning
     neon_clobber_test
     ossfuzz
diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 1827a4e134..4abaef9c63 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -58,6 +58,7 @@ CHECKASMOBJS-$(CONFIG_AVUTIL) += $(AVUTILOBJS)
 CHECKASMOBJS-$(ARCH_AARCH64)            += aarch64/checkasm.o
 CHECKASMOBJS-$(HAVE_ARMV5TE_EXTERNAL)   += arm/checkasm.o
 CHECKASMOBJS-$(HAVE_X86ASM)             += x86/checkasm.o
+CHECKASMOBJS-$(CONFIG_MACOS_KPERF)      += macos_kperf.o
 
 CHECKASMOBJS += $(CHECKASMOBJS-yes) checkasm.o
 CHECKASMOBJS := $(sort $(CHECKASMOBJS:%=tests/checkasm/%))
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 8338e8ff58..4c42040244 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -26,6 +26,8 @@
 # ifndef _GNU_SOURCE
 # define _GNU_SOURCE // for syscall (performance monitoring API)
 # endif
+#elif CONFIG_MACOS_KPERF
+#include "macos_kperf.h"
 #endif
 
 #include
@@ -637,9 +639,20 @@ static int bench_init_linux(void)
     }
     return 0;
 }
-#endif
+#elif CONFIG_MACOS_KPERF
+static int bench_init_kperf(void)
+{
+    if (ff_kperf_init() || ff_kperf_setup())
+        return -1;
 
-#if !CONFIG_LINUX_PERF
+    if (ff_kperf_cycles(NULL)) {
+        fprintf(stderr, "checkasm must be run as root to use kperf on macOS\n");
+        return -1;
+    }
+
+    return 0;
+}
+#else
 static int bench_init_ffmpeg(void)
 {
 #ifdef AV_READ_TIME
@@ -656,6 +669,8 @@ static int bench_init(void)
 {
 #if CONFIG_LINUX_PERF
     int ret = bench_init_linux();
+#elif CONFIG_MACOS_KPERF
+    int ret = bench_init_kperf();
 #else
     int ret = bench_init_ffmpeg();
 #endif
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index ef6645e3a2..4127081d74 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -31,6 +31,8 @@
 #include
 #include
 #include
+#elif CONFIG_MACOS_KPERF
+#include "macos_kperf.h"
 #endif
 
 #include "libavutil/avstring.h"
@@ -224,7 +226,7 @@ typedef struct CheckasmPerf {
     int iterations;
 } CheckasmPerf;
 
-#if defined(AV_READ_TIME) || CONFIG_LINUX_PERF
+#if defined(AV_READ_TIME) || CONFIG_LINUX_PERF || CONFIG_MACOS_KPERF
 
 #if CONFIG_LINUX_PERF
 #define PERF_START(t) do {                              \
@@ -235,6 +237,12 @@ typedef struct CheckasmPerf {
     ioctl(sysfd, PERF_EVENT_IOC_DISABLE, 0);            \
     read(sysfd, &t, sizeof(t));                         \
 } while (0)
+#elif CONFIG_MACOS_KPERF
+#define PERF_START(t) do {                              \
+    t = 0;                                              \
+    ff_kperf_cycles(&t);                                \
+} while (0)
+#define PERF_STOP(t) ff_kperf_cycles(&t)
 #else
 #define PERF_START(t) t = AV_READ_TIME()
 #define PERF_STOP(t)  t = AV_READ_TIME() - t
diff --git a/tests/checkasm/macos_kperf.c b/tests/checkasm/macos_kperf.c
new file mode 100644
index 00..e6ae316608
--- /dev/null
+++ b/tests/checkasm/macos_kperf.c
@@ -0,0 +1,143 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 US
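From the macros alone, `PERF_START` zeroes `t` and calls `ff_kperf_cycles(&t)`, `PERF_STOP` calls it again on the same variable, and `bench_init_kperf()` treats a non-zero return from `ff_kperf_cycles(NULL)` as "not running as root". That suggests a contract where a zero `*t` means "record a start value" and a non-zero `*t` means "replace with the elapsed count". The body of `ff_kperf_cycles` is not in this excerpt, so the following is only a sketch of that assumed contract with a fake counter; `mock_kperf_cycles`, `fake_counter` and `time_region` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical monotonic counter standing in for the hardware cycle count. */
static uint64_t fake_counter = 1000;

/* Assumed contract, inferred from the macros only: NULL probes
 * availability, *t == 0 records a start value, otherwise *t becomes
 * the elapsed count. */
static int mock_kperf_cycles(uint64_t *t)
{
    if (!t)
        return 0;               /* counters available */
    if (*t == 0)
        *t = fake_counter;      /* start */
    else
        *t = fake_counter - *t; /* stop: elapsed */
    return 0;
}

#define PERF_START(t) do { t = 0; mock_kperf_cycles(&t); } while (0)
#define PERF_STOP(t)  mock_kperf_cycles(&t)

static uint64_t time_region(uint64_t work)
{
    uint64_t t;
    PERF_START(t);
    fake_counter += work;       /* pretend the benchmarked code ran */
    PERF_STOP(t);
    return t;
}
```

One consequence of this (assumed) design is that a start reading of exactly 0 would be indistinguishable from "not started"; with a free-running cycle counter that is unlikely, but it is a corner case worth keeping in mind.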
Re: [FFmpeg-devel] [PATCH v2 0/4] avcodec/aarch64/hevcdsp
Set pushed with all Martin's changes implemented. More NEON & updates soon.

--
Josh

On 2021-02-04 12:32, Josh Dekker wrote:
> Hi,
>
> Rebases the unpushed part of my patches on top of Reimar's set. Also implements Martin's suggestions except 'unrolling the loop' for SAO band function, will update the band function when I fix non 8x8 cases.
[FFmpeg-devel] [PATCH v2 4/4] avcodec/aarch64/hevcdsp: add sao_band NEON
Only works for 8x8.

Signed-off-by: Josh Dekker
---
 libavcodec/aarch64/Makefile               |  3 +-
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  7 ++
 libavcodec/aarch64/hevcdsp_sao_neon.S     | 87 +++
 3 files changed, 96 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/aarch64/hevcdsp_sao_neon.S

diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index 2ea1d74a38..954461f81d 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -62,4 +62,5 @@ NEON-OBJS-$(CONFIG_VP9_DECODER) += aarch64/vp9itxfm_16bpp_neon.o \
                                            aarch64/vp9mc_16bpp_neon.o \
                                            aarch64/vp9mc_neon.o
 NEON-OBJS-$(CONFIG_HEVC_DECODER)        += aarch64/hevcdsp_idct_neon.o \
-                                           aarch64/hevcdsp_init_aarch64.o
+                                           aarch64/hevcdsp_init_aarch64.o \
+                                           aarch64/hevcdsp_sao_neon.o
diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index fe111bd1ac..c785e46f79 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -53,6 +53,12 @@ void ff_hevc_idct_4x4_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_8x8_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_16x16_dc_10_neon(int16_t *coeffs);
 void ff_hevc_idct_32x32_dc_10_neon(int16_t *coeffs);
+void ff_hevc_sao_band_filter_8x8_8_neon(uint8_t *_dst, uint8_t *_src,
+                                        ptrdiff_t stride_dst, ptrdiff_t stride_src,
+                                        int16_t *sao_offset_val, int sao_left_class,
+                                        int width, int height);
+
+
 
 av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
 {
@@ -69,6 +75,7 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->idct_dc[1] = ff_hevc_idct_8x8_dc_8_neon;
         c->idct_dc[2] = ff_hevc_idct_16x16_dc_8_neon;
         c->idct_dc[3] = ff_hevc_idct_32x32_dc_8_neon;
+        c->sao_band_filter[0] = ff_hevc_sao_band_filter_8x8_8_neon;
     }
     if (bit_depth == 10) {
         c->add_residual[0] = ff_hevc_add_residual_4x4_10_neon;
diff --git a/libavcodec/aarch64/hevcdsp_sao_neon.S b/libavcodec/aarch64/hevcdsp_sao_neon.S
new file mode 100644
index 00..f142c1e8c2
--- /dev/null
+++ b/libavcodec/aarch64/hevcdsp_sao_neon.S
@@ -0,0 +1,87 @@
+/* -*-arm64-*-
+ * vim: syntax=arm64asm
+ *
+ * AArch64 NEON optimised SAO functions for HEVC decoding
+ *
+ * Copyright (c) 2020 Josh Dekker
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+// void sao_band_filter(uint8_t *_dst, uint8_t *_src,
+//                      ptrdiff_t stride_dst, ptrdiff_t stride_src,
+//                      int16_t *sao_offset_val, int sao_left_class,
+//                      int width, int height)
+function ff_hevc_sao_band_filter_8x8_8_neon, export=1
+        sub             sp, sp, #64
+        stp             xzr, xzr, [sp]
+        stp             xzr, xzr, [sp, #16]
+        stp             xzr, xzr, [sp, #32]
+        stp             xzr, xzr, [sp, #48]
+        mov             w8, #4
+0:
+        ldrsh           x9, [x4, x8, lsl #1]      // x9 = sao_offset_val[k+1]
+        subs            w8, w8, #1
+        add             w10, w8, w5               // x10 = k + sao_left_class
+        and             w10, w10, #0x1F
+        strh            w9, [sp, x10, lsl #1]
+        bne             0b
+        ld1             {v16.16b-v19.16b}, [sp], #64
+        movi            v20.8h, #1
+1:      // beginning of line
+        mov             w8, w6
+2:
+        // Simple layout for accessing 16bit values
+        // with 8bit LUT.
+        //
+        //     00  01  02  03  04  05  06  07
+        //    +--->
+        //    |xDE#xAD|xCA#xFE|xBE#xEF|xFE#xED|
+        //    +--->
+        //       i-0     i-1     i-2     i-3
+        // dst[x] = av_clip_pixel(src[x] + offset_table[src[x] >> shift]);
+        ld1             {v2.8b}, [x1]
+        // load src[x]
+        uxtl            v0.8h, v2.8b
+        // >> shift
+        ushr            v2.8h, v0.8h, #3 // BIT_DEPTH - 3
+        // x2 (access lower short)
+        shl             v1.8h, v2.8h, #1 // low (x2, accessing short)
+        // +1 acces
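The comments in the assembly give the scalar formula, `dst[x] = av_clip_pixel(src[x] + offset_table[src[x] >> shift])`, and the first loop builds `offset_table` by storing `sao_offset_val[k+1]` at index `(k + sao_left_class) & 31` for k = 0..3. A plain C sketch of the same 8-bit operation, useful as a mental model when reading the NEON code; the function name and the local clip helper are illustrative, not FFmpeg's template code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t clip_u8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

/* 8-bit SAO band offset: 0..255 is split into 32 bands of 8 values
 * (shift = BIT_DEPTH - 5 = 3); four consecutive bands starting at
 * sao_left_class (wrapping modulo 32) receive sao_offset_val[1..4]. */
static void sao_band_filter_8bit(uint8_t *dst, const uint8_t *src,
                                 ptrdiff_t stride_dst, ptrdiff_t stride_src,
                                 const int16_t *sao_offset_val, int sao_left_class,
                                 int width, int height)
{
    int offset_table[32] = { 0 };
    const int shift = 8 - 5;

    for (int k = 0; k < 4; k++)
        offset_table[(k + sao_left_class) & 31] = sao_offset_val[k + 1];

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = clip_u8(src[x] + offset_table[src[x] >> shift]);
        dst += stride_dst;
        src += stride_src;
    }
}
```

Bands outside the four selected ones keep an offset of 0, which is why the assembly zeroes its 64-byte stack table (32 int16 entries) before filling it.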
[FFmpeg-devel] [PATCH v2 1/4] avcodec/aarch64/hevcdsp: port SIMD idct functions
From: Reimar Döffinger

Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth available on aarch64.
For a UHD HDR (10 bit) sample video these were consuming the most time and this optimization reduced overall decode time from 19.4s to 16.4s, approximately 15% speedup.
Test sample was the first 300 frames of "LG 4K HDR Demo - New York.ts", running on Apple M1.

Signed-off-by: Josh Dekker
---
 libavcodec/aarch64/Makefile               |   2 +
 libavcodec/aarch64/hevcdsp_idct_neon.S    | 380 ++
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  45 +++
 libavcodec/hevcdsp.c                      |   2 +
 libavcodec/hevcdsp.h                      |   1 +
 5 files changed, 430 insertions(+)
 create mode 100644 libavcodec/aarch64/hevcdsp_idct_neon.S
 create mode 100644 libavcodec/aarch64/hevcdsp_init_aarch64.c

diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index f6434e40da..2ea1d74a38 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -61,3 +61,5 @@ NEON-OBJS-$(CONFIG_VP9_DECODER) += aarch64/vp9itxfm_16bpp_neon.o \
                                            aarch64/vp9lpf_neon.o \
                                            aarch64/vp9mc_16bpp_neon.o \
                                            aarch64/vp9mc_neon.o
+NEON-OBJS-$(CONFIG_HEVC_DECODER)        += aarch64/hevcdsp_idct_neon.o \
+                                           aarch64/hevcdsp_init_aarch64.o
diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
new file mode 100644
index 00..c70d6a906d
--- /dev/null
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -0,0 +1,380 @@
+/*
+ * ARM NEON optimised IDCT functions for HEVC decoding
+ * Copyright (c) 2014 Seppo Tomperi
+ * Copyright (c) 2017 Alexandra Hájková
+ *
+ * Ported from arm/hevcdsp_idct_neon.S by
+ * Copyright (c) 2020 Reimar Döffinger
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+const trans, align=4
+        .short          64, 83, 64, 36
+        .short          89, 75, 50, 18
+        .short          90, 87, 80, 70
+        .short          57, 43, 25, 9
+        .short          90, 90, 88, 85
+        .short          82, 78, 73, 67
+        .short          61, 54, 46, 38
+        .short          31, 22, 13, 4
+endconst
+
+.macro sum_sub out, in, c, op, p
+  .ifc \op, +
+        smlal\p         \out, \in, \c
+  .else
+        smlsl\p         \out, \in, \c
+  .endif
+.endm
+
+.macro fixsqrshrn d, dt, n, m
+  .ifc \dt, .8h
+        sqrshrn2        \d\dt, \n\().4s, \m
+  .else
+        sqrshrn         \n\().4h, \n\().4s, \m
+        mov             \d\().d[0], \n\().d[0]
+  .endif
+.endm
+
+// uses and clobbers v28-v31 as temp registers
+.macro tr_4x4_8 in0, in1, in2, in3, out0, out1, out2, out3, p1, p2
+        sshll\p1        v28.4s, \in0, #6
+        mov             v29.16b, v28.16b
+        smull\p1        v30.4s, \in1, v0.h[1]
+        smull\p1        v31.4s, \in1, v0.h[3]
+        smlal\p2        v28.4s, \in2, v0.h[0] //e0
+        smlsl\p2        v29.4s, \in2, v0.h[0] //e1
+        smlal\p2        v30.4s, \in3, v0.h[3] //o0
+        smlsl\p2        v31.4s, \in3, v0.h[1] //o1
+
+        add             \out0, v28.4s, v30.4s
+        add             \out1, v29.4s, v31.4s
+        sub             \out2, v29.4s, v31.4s
+        sub             \out3, v28.4s, v30.4s
+.endm
+
+.macro transpose8_4x4 r0, r1, r2, r3
+        trn1            v2.8h, \r0\().8h, \r1\().8h
+        trn2            v3.8h, \r0\().8h, \r1\().8h
+        trn1            v4.8h, \r2\().8h, \r3\().8h
+        trn2            v5.8h, \r2\().8h, \r3\().8h
+        trn1            \r0\().4s, v2.4s, v4.4s
+        trn2            \r2\().4s, v2.4s, v4.4s
+        trn1            \r1\().4s, v3.4s, v5.4s
+        trn2            \r3\().4s, v3.4s, v5.4s
+.endm
+
+.macro transpose_8x8 r0, r1, r2, r3, r4, r5, r6, r7
+        transpose8_4x4  \r0, \r1, \r2, \r3
+        transpose8_4x4  \r4, \r5, \r6, \r7
+.endm
+
+.macro tr_8x4 shift, in0,in0t, in1,in1t, in2,in2t, in3,in3t, in4,in4t, in5,in5t, in6,in6t, in7,in7t, p1, p2
+        tr_4x4_8        \in0\in0t, \in2\in2t, \in4\in4t, \in6\in6t, v24.4s, v25.4s, v26.4s, v27.4s, \p1, \p2
+
+        smull\p1
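`tr_4x4_8` computes the 4-point even/odd butterfly using the first row of the `trans` table (64, 83, 64, 36, read as `v0.h[0..3]`). Written out in scalar C, one column of that stage looks like the sketch below. It covers only the unscaled 32-bit sums produced by the macro (the rounding shift is applied later, e.g. by `fixsqrshrn`), and `tr_4x4_ref` is a made-up name for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar equivalent of one lane of tr_4x4_8:
 * e0/e1 use 64 (v0.h[0], and in0 << 6), o0/o1 use 83 (v0.h[1]) and 36 (v0.h[3]). */
static void tr_4x4_ref(const int32_t in[4], int32_t out[4])
{
    int32_t e0 = 64 * in[0] + 64 * in[2];
    int32_t e1 = 64 * in[0] - 64 * in[2];
    int32_t o0 = 83 * in[1] + 36 * in[3];
    int32_t o1 = 36 * in[1] - 83 * in[3];

    out[0] = e0 + o0;
    out[1] = e1 + o1;
    out[2] = e1 - o1;
    out[3] = e0 - o0;
}
```

Splitting the 4-point transform into even (e0, e1) and odd (o0, o1) halves needs four multiplies per half instead of sixteen for a direct matrix product, which is what the smull/smlal/smlsl sequence exploits.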
[FFmpeg-devel] [PATCH v2 2/4] avcodec/aarch64/hevcdsp: port add_residual functions
From: Reimar Döffinger

Speedup is fairly small, around 1.5%, but these are fairly simple.

Signed-off-by: Josh Dekker
---
 libavcodec/aarch64/hevcdsp_idct_neon.S    | 190 ++
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  24 +++
 2 files changed, 214 insertions(+)

diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
index c70d6a906d..329038a958 100644
--- a/libavcodec/aarch64/hevcdsp_idct_neon.S
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -36,6 +36,196 @@ const trans, align=4
         .short          31, 22, 13, 4
 endconst
 
+.macro clip10 in1, in2, c1, c2
+        smax            \in1, \in1, \c1
+        smax            \in2, \in2, \c1
+        smin            \in1, \in1, \c2
+        smin            \in2, \in2, \c2
+.endm
+
+function ff_hevc_add_residual_4x4_8_neon, export=1
+        ld1             {v0.8h-v1.8h}, [x1]
+        ld1             {v2.s}[0], [x0], x2
+        ld1             {v2.s}[1], [x0], x2
+        ld1             {v2.s}[2], [x0], x2
+        ld1             {v2.s}[3], [x0], x2
+        sub             x0, x0, x2, lsl #2
+        uxtl            v6.8h, v2.8B
+        uxtl2           v7.8h, v2.16B
+        sqadd           v0.8h, v0.8h, v6.8h
+        sqadd           v1.8h, v1.8h, v7.8h
+        sqxtun          v0.8B, v0.8h
+        sqxtun2         v0.16B, v1.8h
+        st1             {v0.s}[0], [x0], x2
+        st1             {v0.s}[1], [x0], x2
+        st1             {v0.s}[2], [x0], x2
+        st1             {v0.s}[3], [x0], x2
+        ret
+endfunc
+
+function ff_hevc_add_residual_4x4_10_neon, export=1
+        mov             x12, x0
+        ld1             {v0.8h-v1.8h}, [x1]
+        ld1             {v2.d}[0], [x12], x2
+        ld1             {v2.d}[1], [x12], x2
+        ld1             {v3.d}[0], [x12], x2
+        sqadd           v0.8h, v0.8h, v2.8h
+        ld1             {V3.d}[1], [x12], x2
+        movi            v4.8h, #0
+        sqadd           v1.8h, v1.8h, v3.8h
+        mvni            v5.8h, #0xFC, LSL #8 // movi #0x3FF
+        clip10          v0.8h, v1.8h, v4.8h, v5.8h
+        st1             {v0.d}[0], [x0], x2
+        st1             {v0.d}[1], [x0], x2
+        st1             {v1.d}[0], [x0], x2
+        st1             {v1.d}[1], [x0], x2
+        ret
+endfunc
+
+function ff_hevc_add_residual_8x8_8_neon, export=1
+        add             x12, x0, x2
+        add             x2, x2, x2
+        mov             x3, #8
+1:      subs            x3, x3, #2
+        ld1             {v2.d}[0], [x0]
+        ld1             {v2.d}[1], [x12]
+        uxtl            v3.8h, v2.8B
+        ld1             {v0.8h-v1.8h}, [x1], #32
+        uxtl2           v2.8h, v2.16B
+        sqadd           v0.8h, v0.8h, v3.8h
+        sqadd           v1.8h, v1.8h, v2.8h
+        sqxtun          v0.8B, v0.8h
+        sqxtun2         v0.16B, v1.8h
+        st1             {v0.d}[0], [x0], x2
+        st1             {v0.d}[1], [x12], x2
+        bne             1b
+        ret
+endfunc
+
+function ff_hevc_add_residual_8x8_10_neon, export=1
+        add             x12, x0, x2
+        add             x2, x2, x2
+        mov             x3, #8
+        movi            v4.8h, #0
+        mvni            v5.8h, #0xFC, LSL #8 // movi #0x3FF
+1:      subs            x3, x3, #2
+        ld1             {v0.8h-v1.8h}, [x1], #32
+        ld1             {v2.8h}, [x0]
+        sqadd           v0.8h, v0.8h, v2.8h
+        ld1             {v3.8h}, [x12]
+        sqadd           v1.8h, v1.8h, v3.8h
+        clip10          v0.8h, v1.8h, v4.8h, v5.8h
+        st1             {v0.8h}, [x0], x2
+        st1             {v1.8h}, [x12], x2
+        bne             1b
+        ret
+endfunc
+
+function ff_hevc_add_residual_16x16_8_neon, export=1
+        mov             x3, #16
+        add             x12, x0, x2
+        add             x2, x2, x2
+1:      subs            x3, x3, #2
+        ld1             {v16.16B}, [x0]
+        ld1             {v0.8h-v3.8h}, [x1], #64
+        ld1             {v19.16B}, [x12]
+        uxtl            v17.8h, v16.8B
+        uxtl2           v18.8h, v16.16B
+        uxtl            v20.8h, v19.8B
+        uxtl2           v21.8h, v19.16B
+        sqadd           v0.8h, v0.8h, v17.8h
+        sqadd           v1.8h, v1.8h, v18.8h
+        sqadd           v2.8h, v2.8h, v20.8h
+        sqadd           v3.8h, v3.8h, v21.8h
+        sqxtun          v0.8B, v0.8h
+        sqxtun2         v0.16B, v1.8h
+        sqxtun          v1.8B, v2.8h
+        sqxtun2         v1.16B, v3.8h
+        st1             {v0.16B}, [x0], x2
+        st1             {v1.16B}, [x12], x2
+        bne             1b
+        ret
+endfunc
+
+function ff_hevc_add_residual_16x16_10_neon, export=1
+        mov             x3, #16
+        movi            v20.8h, #0
+        mvni            v21.8h, #0xFC, LSL #8 // movi #0x3FF
+        add             x12, x0, x2
+        add
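All the add_residual functions implement dst = clip(dst + res). The 8-bit versions get the clip for free from saturating arithmetic (`sqadd`, then `sqxtun` narrowing to 0..255), while the 10-bit versions clamp explicitly with `clip10` against 0 and 0x3FF; `mvni v5.8h, #0xFC, LSL #8` writes ~0xFC00 = 0x03FF into each 16-bit lane, hence the `// movi #0x3FF` comment. A scalar sketch with hypothetical helper names, assuming `res` is a contiguous size x size block (the assembly reads it with post-incremented loads from x1):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 8-bit: dst = clip(dst + res, 0, 255); stride is in bytes. */
static void add_residual_8(uint8_t *dst, const int16_t *res, ptrdiff_t stride, int size)
{
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            int v = dst[x] + res[x];
            dst[x] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
        }
        dst += stride;
        res += size;
    }
}

/* 10-bit: samples are uint16_t clipped to [0, 0x3FF]. Note that this
 * sketch's stride is in elements, unlike the assembly's byte stride. */
static void add_residual_10(uint16_t *dst, const int16_t *res, ptrdiff_t stride, int size)
{
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            int v = dst[x] + res[x];
            dst[x] = (uint16_t)(v < 0 ? 0 : v > 0x3FF ? 0x3FF : v);
        }
        dst += stride;
        res += size;
    }
}
```

The saturating path and the explicit clamp give the same result here: any sum large enough to saturate at 32767 in `sqadd` would have been clipped to the pixel maximum anyway.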
[FFmpeg-devel] [PATCH v2 3/4] avcodec/aarch64/hevcdsp: add idct_dc NEON
Signed-off-by: Josh Dekker
---
 libavcodec/aarch64/hevcdsp_idct_neon.S    | 54 +++
 libavcodec/aarch64/hevcdsp_init_aarch64.c | 16 +++
 2 files changed, 70 insertions(+)

diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
index 329038a958..d3902a9e0f 100644
--- a/libavcodec/aarch64/hevcdsp_idct_neon.S
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -5,6 +5,7 @@
  *
  * Ported from arm/hevcdsp_idct_neon.S by
  * Copyright (c) 2020 Reimar Döffinger
+ * Copyright (c) 2020 Josh Dekker
  *
  * This file is part of FFmpeg.
  *
@@ -568,3 +569,56 @@ tr_16x4 secondpass_10, 20 - 10, 512, 1
 
 idct_16x16 8
 idct_16x16 10
+
+// void ff_hevc_idct_NxN_dc_DEPTH_neon(int16_t *coeffs)
+.macro idct_dc size bitdepth
+function ff_hevc_idct_\size\()x\size\()_dc_\bitdepth\()_neon, export=1
+        ldrsh           w1, [x0]
+        mov             w2, #(1 << (13 - \bitdepth))
+        add             w1, w1, #1
+        asr             w1, w1, #1
+        add             w1, w1, w2
+        asr             w1, w1, #(14 - \bitdepth)
+        dup             v0.8h, w1
+        dup             v1.8h, w1
+.if \size > 4
+        dup             v2.8h, w1
+        dup             v3.8h, w1
+.if \size > 16 /* dc 32x32 */
+        mov             x2, #4
+1:
+        subs            x2, x2, #1
+.endif
+        add             x12, x0, #64
+        mov             x13, #128
+.if \size > 8 /* dc 16x16 */
+        st1             {v0.8h-v3.8h}, [ x0], x13
+        st1             {v0.8h-v3.8h}, [x12], x13
+        st1             {v0.8h-v3.8h}, [ x0], x13
+        st1             {v0.8h-v3.8h}, [x12], x13
+        st1             {v0.8h-v3.8h}, [ x0], x13
+        st1             {v0.8h-v3.8h}, [x12], x13
+.endif /* dc 8x8 */
+        st1             {v0.8h-v3.8h}, [ x0], x13
+        st1             {v0.8h-v3.8h}, [x12], x13
+.if \size > 16 /* dc 32x32 */
+        bne             1b
+.endif
+.else /* dc 4x4 */
+        st1             {v0.8h-v1.8h}, [x0]
+.endif
+        ret
+endfunc
+.endm
+
+idct_dc 4 8
+idct_dc 4 10
+
+idct_dc 8 8
+idct_dc 8 10
+
+idct_dc 16 8
+idct_dc 16 10
+
+idct_dc 32 8
+idct_dc 32 10
diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
index 4c29daa6d5..fe111bd1ac 100644
--- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
+++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
@@ -45,6 +45,14 @@ void ff_hevc_idct_8x8_8_neon(int16_t *coeffs, int col_limit);
 void ff_hevc_idct_8x8_10_neon(int16_t *coeffs, int col_limit);
 void ff_hevc_idct_16x16_8_neon(int16_t *coeffs, int col_limit);
 void ff_hevc_idct_16x16_10_neon(int16_t *coeffs, int col_limit);
+void ff_hevc_idct_4x4_dc_8_neon(int16_t *coeffs);
+void ff_hevc_idct_8x8_dc_8_neon(int16_t *coeffs);
+void ff_hevc_idct_16x16_dc_8_neon(int16_t *coeffs);
+void ff_hevc_idct_32x32_dc_8_neon(int16_t *coeffs);
+void ff_hevc_idct_4x4_dc_10_neon(int16_t *coeffs);
+void ff_hevc_idct_8x8_dc_10_neon(int16_t *coeffs);
+void ff_hevc_idct_16x16_dc_10_neon(int16_t *coeffs);
+void ff_hevc_idct_32x32_dc_10_neon(int16_t *coeffs);
 
 av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
 {
@@ -57,6 +65,10 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->add_residual[3] = ff_hevc_add_residual_32x32_8_neon;
         c->idct[1] = ff_hevc_idct_8x8_8_neon;
         c->idct[2] = ff_hevc_idct_16x16_8_neon;
+        c->idct_dc[0] = ff_hevc_idct_4x4_dc_8_neon;
+        c->idct_dc[1] = ff_hevc_idct_8x8_dc_8_neon;
+        c->idct_dc[2] = ff_hevc_idct_16x16_dc_8_neon;
+        c->idct_dc[3] = ff_hevc_idct_32x32_dc_8_neon;
     }
     if (bit_depth == 10) {
         c->add_residual[0] = ff_hevc_add_residual_4x4_10_neon;
@@ -65,5 +77,9 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
         c->add_residual[3] = ff_hevc_add_residual_32x32_10_neon;
         c->idct[1] = ff_hevc_idct_8x8_10_neon;
         c->idct[2] = ff_hevc_idct_16x16_10_neon;
+        c->idct_dc[0] = ff_hevc_idct_4x4_dc_10_neon;
+        c->idct_dc[1] = ff_hevc_idct_8x8_dc_10_neon;
+        c->idct_dc[2] = ff_hevc_idct_16x16_dc_10_neon;
+        c->idct_dc[3] = ff_hevc_idct_32x32_dc_10_neon;
     }
 }
-- 
2.24.3 (Apple Git-128)
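The `idct_dc` macro reduces a DC-only block to a single value: it loads `coeffs[0]`, computes `(coeffs[0] + 1) >> 1`, adds `1 << (13 - bitdepth)` and shifts right by `14 - bitdepth`, then broadcasts that value over the whole NxN block. The same arithmetic in C, as a sketch (`idct_dc_ref` is an illustrative name; `>>` on a negative `int` is implementation-defined in C, and the sketch assumes the arithmetic shift that `asr` performs):

```c
#include <assert.h>
#include <stdint.h>

/* DC-only inverse transform following the idct_dc macro's arithmetic. */
static void idct_dc_ref(int16_t *coeffs, int size, int bit_depth)
{
    const int shift = 14 - bit_depth;
    const int add   = 1 << (shift - 1);   /* == 1 << (13 - bit_depth) */
    int dc = (coeffs[0] + 1) >> 1;

    dc = (dc + add) >> shift;
    for (int i = 0; i < size * size; i++)
        coeffs[i] = (int16_t)dc;
}
```

The assembly's store pattern (two base pointers 64 bytes apart, each advancing by 128) is just a fast way of performing the final fill loop for each block size.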
[FFmpeg-devel] [PATCH v2 0/4] avcodec/aarch64/hevcdsp
Hi,

This rebases the unpushed part of my patches on top of Reimar's set. It also implements Martin's suggestions, except 'unrolling the loop' for the SAO band function; I will update the band function when I fix the non-8x8 cases.

--
Josh
Re: [FFmpeg-devel] Patch for FFmpeg
On 2021-01-13 17:06, Robin Cooksey wrote:
> I’ve attached a patch which makes avformat handle the 308 Permanent Redirect HTTP status code – which is more recently defined in https://tools.ietf.org/html/rfc7538
>
> The change just treats 308 in the same way as the other 30x status codes.

Thanks. Applied with a slightly edited commit message to conform to our conventions, and a small reference to the spec.

--
Josh
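The quoted patch makes avformat treat 308 Permanent Redirect (RFC 7538) like the other redirect codes. As a generic illustration of that kind of check, here is a hypothetical C predicate; it is not the libavformat http.c code, whose exact structure isn't shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: which HTTP status codes a client follows as
 * redirects. 308 is simply added alongside the other 30x redirects. */
static bool is_followed_redirect(int status)
{
    switch (status) {
    case 301: /* Moved Permanently */
    case 302: /* Found */
    case 303: /* See Other */
    case 307: /* Temporary Redirect */
    case 308: /* Permanent Redirect, RFC 7538 */
        return true;
    default:
        return false;
    }
}
```

RFC 7538 defines 308 as the permanent counterpart of 307: unlike 301, it does not allow the client to change the request method from POST to GET when following the redirect.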
Re: [FFmpeg-devel] [PATCH 4/4] checkasm: add hevc_pel tests
On 2021-01-07 13:10, Josh Dekker wrote:
> Co-authored-by: Niklas Haas
> Signed-off-by: Josh Dekker
> ---
>  tests/checkasm/Makefile   |   2 +-
>  tests/checkasm/checkasm.c |  10 +
>  tests/checkasm/checkasm.h |  10 +
>  tests/checkasm/hevc_pel.c | 523 ++
>  4 files changed, 544 insertions(+), 1 deletion(-)
>  create mode 100644 tests/checkasm/hevc_pel.c
> [...]

Pushed (only this patch).

--
Josh
Re: [FFmpeg-devel] [PATCH] configure: add fallback to $arch in msvc assembler check.
On 2021-01-23 14:14, Martin Storsjö wrote:
> On Sat, 23 Jan 2021, Reimar Döffinger wrote:
>
>> Setting the defaults for $arch happens only later, so the current code would not set AS correctly if --arch was not specified on the command-line.
>> Fix it by adding an explicit fallback to $arch_default.
>> ---
>>  configure | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/configure b/configure
>> index 54fbbd6b5f..df298b4b9b 100755
>> --- a/configure
>> +++ b/configure
>> @@ -4268,7 +4268,7 @@ case "$toolchain" in
>>          ld_default="$source_path/compat/windows/mslink"
>>          nm_default="dumpbin.exe -symbols"
>>          ar_default="lib.exe"
>> -        case "$arch" in
>> +        case "${arch:-$arch_default}" in
>>          aarch64|arm64)
>>              as_default="armasm64.exe"
>>              ;;
>> --
>> 2.30.0
>
> LGTM, thanks!
>
> // Martin

Thanks, pushed.

--
Josh
Re: [FFmpeg-devel] [PATCH] libavcodec/hevcdsp: port SIMD idct functions from 32-bit.
Hi,

On 2021-01-08 21:36, reimar.doeffin...@gmx.de wrote:
> From: Reimar Döffinger
>
> Makes SIMD-optimized 8x8 and 16x16 idcts for 8 and 10 bit depth available on aarch64.
> For a UHD HDR (10 bit) sample video these were consuming the most time and this optimization reduced overall decode time from 19.4s to 16.4s, approximately 15% speedup.
> Test sample was the first 300 frames of "LG 4K HDR Demo - New York.ts", running on Apple M1.
> ---
>  libavcodec/aarch64/Makefile               |   2 +
>  libavcodec/aarch64/hevcdsp_idct_neon.S    | 426 ++
>  libavcodec/aarch64/hevcdsp_init_aarch64.c |  45 +++
>  libavcodec/hevcdsp.c                      |   2 +
>  libavcodec/hevcdsp.h                      |   1 +
>  5 files changed, 476 insertions(+)
>  create mode 100644 libavcodec/aarch64/hevcdsp_idct_neon.S
>  create mode 100644 libavcodec/aarch64/hevcdsp_init_aarch64.c
> [...]

AS      libavcodec/aarch64/hevcdsp_idct_neon.o
libavcodec/aarch64/hevcdsp_idct_neon.S: Assembler messages:
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Error: operand mismatch -- `mov v29.4S,v28.4S'
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:    did you mean this?
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:        mov v29.8b, v28.8b
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:    other valid variant(s):
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:        mov v29.16b, v28.16b
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Error: operand mismatch -- `mov v29.4S,v28.4S'
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:    did you mean this?
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:        mov v29.8b, v28.8b
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:    other valid variant(s):
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:        mov v29.16b, v28.16b
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Error: operand mismatch -- `mov v29.4S,v28.4S'
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:    did you mean this?
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:        mov v29.8b, v28.8b
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:    other valid variant(s):
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:        mov v29.16b, v28.16b
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Error: operand mismatch -- `mov v29.4S,v28.4S'
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:    did you mean this?
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:        mov v29.8b, v28.8b
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:    other valid variant(s):
libavcodec/aarch64/hevcdsp_idct_neon.S:418: Info:        mov v29.16b, v28.16b

This doesn't build with GNU assembler (GNU Binutils for Ubuntu) 2.34 (aarch64).

Thanks for porting this. I was in the process of writing HEVC assembly (see my set on the ML) and would be interested in rebasing this on top of that set.

--
Josh
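For reference, the timing figures in the quoted commit message work out as follows: the "approximately 15%" is the reduction in decode time, which corresponds to a throughput speedup of roughly 1.18x.

```latex
\frac{19.4\,\mathrm{s} - 16.4\,\mathrm{s}}{19.4\,\mathrm{s}} \approx 0.155,
\qquad
\frac{19.4\,\mathrm{s}}{16.4\,\mathrm{s}} \approx 1.18
```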
[FFmpeg-devel] [PATCH 4/4] checkasm: add hevc_pel tests
Co-authored-by: Niklas Haas
Signed-off-by: Josh Dekker
---
 tests/checkasm/Makefile   |   2 +-
 tests/checkasm/checkasm.c |  10 +
 tests/checkasm/checkasm.h |  10 +
 tests/checkasm/hevc_pel.c | 523 ++
 4 files changed, 544 insertions(+), 1 deletion(-)
 create mode 100644 tests/checkasm/hevc_pel.c

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 9e9569777b..1827a4e134 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -24,7 +24,7 @@ AVCODECOBJS-$(CONFIG_HUFFYUV_DECODER) += huffyuvdsp.o
 AVCODECOBJS-$(CONFIG_JPEG2000_DECODER) += jpeg2000dsp.o
 AVCODECOBJS-$(CONFIG_OPUS_DECODER) += opusdsp.o
 AVCODECOBJS-$(CONFIG_PIXBLOCKDSP) += pixblockdsp.o
-AVCODECOBJS-$(CONFIG_HEVC_DECODER) += hevc_add_res.o hevc_idct.o hevc_sao.o
+AVCODECOBJS-$(CONFIG_HEVC_DECODER) += hevc_add_res.o hevc_idct.o hevc_sao.o hevc_pel.o
 AVCODECOBJS-$(CONFIG_UTVIDEO_DECODER) += utvideodsp.o
 AVCODECOBJS-$(CONFIG_V210_DECODER) += v210dec.o
 AVCODECOBJS-$(CONFIG_V210_ENCODER) += v210enc.o
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index b3ac76c325..8338e8ff58 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -116,6 +116,16 @@ static const struct {
 #if CONFIG_HEVC_DECODER
 { "hevc_add_res", checkasm_check_hevc_add_res },
 { "hevc_idct", checkasm_check_hevc_idct },
+{ "hevc_qpel", checkasm_check_hevc_qpel },
+{ "hevc_qpel_uni", checkasm_check_hevc_qpel_uni },
+{ "hevc_qpel_uni_w", checkasm_check_hevc_qpel_uni_w },
+{ "hevc_qpel_bi", checkasm_check_hevc_qpel_bi },
+{ "hevc_qpel_bi_w", checkasm_check_hevc_qpel_bi_w },
+{ "hevc_epel", checkasm_check_hevc_epel },
+{ "hevc_epel_uni", checkasm_check_hevc_epel_uni },
+{ "hevc_epel_uni_w", checkasm_check_hevc_epel_uni_w },
+{ "hevc_epel_bi", checkasm_check_hevc_epel_bi },
+{ "hevc_epel_bi_w", checkasm_check_hevc_epel_bi_w },
 { "hevc_sao", checkasm_check_hevc_sao },
 #endif
 #if CONFIG_HUFFYUV_DECODER
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index 0190bc912c..ef6645e3a2 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -58,6 +58,16 @@ void checkasm_check_h264pred(void);
 void checkasm_check_h264qpel(void);
 void checkasm_check_hevc_add_res(void);
 void checkasm_check_hevc_idct(void);
+void checkasm_check_hevc_qpel(void);
+void checkasm_check_hevc_qpel_uni(void);
+void checkasm_check_hevc_qpel_uni_w(void);
+void checkasm_check_hevc_qpel_bi(void);
+void checkasm_check_hevc_qpel_bi_w(void);
+void checkasm_check_hevc_epel(void);
+void checkasm_check_hevc_epel_uni(void);
+void checkasm_check_hevc_epel_uni_w(void);
+void checkasm_check_hevc_epel_bi(void);
+void checkasm_check_hevc_epel_bi_w(void);
 void checkasm_check_hevc_sao(void);
 void checkasm_check_huffyuvdsp(void);
 void checkasm_check_jpeg2000dsp(void);
diff --git a/tests/checkasm/hevc_pel.c b/tests/checkasm/hevc_pel.c
new file mode 100644
index 00..236404f8ff
--- /dev/null
+++ b/tests/checkasm/hevc_pel.c
@@ -0,0 +1,523 @@
+/*
+ * Copyright (c) 2015 Henrik Gramner
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include
+#include "checkasm.h"
+#include "libavcodec/hevcdsp.h"
+#include "libavutil/common.h"
+#include "libavutil/internal.h"
+#include "libavutil/intreadwrite.h"
+
+static const uint32_t pixel_mask[] = { 0x, 0x01ff01ff, 0x03ff03ff, 0x07ff07ff, 0x0fff0fff };
+static const uint32_t pixel_mask16[] = { 0x00ff00ff, 0x01ff01ff, 0x03ff03ff, 0x07ff07ff, 0x0fff0fff };
+static const int sizes[] = { -1, 4, 6, 8, 12, 16, 24, 32, 48, 64 };
+static const int weights[] = { 0, 128, 255, -1 };
+static const int denoms[] = {0, 7, 12, -1 };
+static const int offsets[] = {0, 255, -1 };
+
+#define SIZEOF_PIXEL ((bit_depth + 7) / 8)
+#define BUF_SIZE (2 * MAX_PB_SIZE * (2 * 4 + MAX_PB_SIZE))
+
+#define randomize_buffers() \
+do { \
+uint32_t mask = p
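The mask tables keep random test input inside the legal sample range for each bit depth: every 32-bit word holds two 16-bit lanes, and each lane is masked down to bit_depth bits, so the assembly under test never sees out-of-range pixels. A minimal sketch of that masking idea follows, using the values of pixel_mask16 and assuming MAX_PB_SIZE is 64 (its value in libavcodec/hevcdsp.h). The sketch_* names are hypothetical, and the real randomize_buffers() macro (truncated above) may differ in detail.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: MAX_PB_SIZE is 64, as in libavcodec/hevcdsp.h. */
#define SKETCH_MAX_PB_SIZE 64
/* Same formula as BUF_SIZE in the patch: 2 * 64 * (2 * 4 + 64) = 9216 bytes. */
#define SKETCH_BUF_SIZE (2 * SKETCH_MAX_PB_SIZE * (2 * 4 + SKETCH_MAX_PB_SIZE))

/* Copy of pixel_mask16: entry i masks two 16-bit samples of depth 8 + i. */
static const uint32_t sketch_mask16[] = {
    0x00ff00ff, 0x01ff01ff, 0x03ff03ff, 0x07ff07ff, 0x0fff0fff
};

/* Hypothetical helper: 32 random bits built from two rand() calls. */
static uint32_t sketch_rnd32(void)
{
    return ((uint32_t)rand() << 16) ^ (uint32_t)rand();
}

/* Hypothetical helper: fill a buffer of 16-bit samples 4 bytes at a time,
 * keeping every sample within [0, (1 << bit_depth) - 1]. */
static void sketch_fill16(uint8_t *buf, size_t size, int bit_depth)
{
    assert(bit_depth >= 8 && bit_depth <= 12 && size % 4 == 0);
    uint32_t mask = sketch_mask16[bit_depth - 8];
    for (size_t k = 0; k < size; k += 4) {
        uint32_t r = sketch_rnd32() & mask;
        memcpy(buf + k, &r, sizeof(r)); /* like AV_WN32A, minus the alignment */
    }
}
```

Masking whole 32-bit words rather than individual samples keeps the fill loop cheap, and because both 16-bit halves carry the same mask, the result does not depend on host endianness.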
[FFmpeg-devel] [PATCH 3/4] lavc/aarch64: add HEVC sao_band NEON
Only works for 8x8.

Signed-off-by: Josh Dekker
---
 libavcodec/aarch64/Makefile           |  3 +-
 libavcodec/aarch64/hevcdsp_init.c     |  7 +++
 libavcodec/aarch64/hevcdsp_sao_neon.S | 87 +++
 3 files changed, 96 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/aarch64/hevcdsp_sao_neon.S

diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index 42d80bf74c..1f54fc31f4 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -55,7 +55,8 @@ NEON-OBJS-$(CONFIG_VP8DSP) += aarch64/vp8dsp_neon.o
 NEON-OBJS-$(CONFIG_AAC_DECODER) += aarch64/aacpsdsp_neon.o
 NEON-OBJS-$(CONFIG_DCA_DECODER) += aarch64/synth_filter_neon.o
 NEON-OBJS-$(CONFIG_HEVC_DECODER)+= aarch64/hevcdsp_add_res_neon.o \
- aarch64/hevcdsp_idct_neon.o
+ aarch64/hevcdsp_idct_neon.o \
+ aarch64/hevcdsp_sao_neon.o
 NEON-OBJS-$(CONFIG_OPUS_DECODER)+= aarch64/opusdsp_neon.o
 NEON-OBJS-$(CONFIG_VORBIS_DECODER) += aarch64/vorbisdsp_neon.o
 NEON-OBJS-$(CONFIG_VP9_DECODER) += aarch64/vp9itxfm_16bpp_neon.o \
diff --git a/libavcodec/aarch64/hevcdsp_init.c b/libavcodec/aarch64/hevcdsp_init.c
index 2cd7ef3a6c..8f0a923ab1 100644
--- a/libavcodec/aarch64/hevcdsp_init.c
+++ b/libavcodec/aarch64/hevcdsp_init.c
@@ -23,6 +23,11 @@
 #include "libavcodec/hevcdsp.h"
 #include "libavcodec/avcodec.h"
 
+void ff_hevc_sao_band_filter_8x8_8_neon(uint8_t *_dst, uint8_t *_src,
+ ptrdiff_t stride_dst, ptrdiff_t stride_src,
+ int16_t *sao_offset_val, int sao_left_class,
+ int width, int height);
+
 void ff_hevc_idct_4x4_dc_8_neon(int16_t *coeffs);
 void ff_hevc_idct_8x8_dc_8_neon(int16_t *coeffs);
 void ff_hevc_idct_16x16_dc_8_neon(int16_t *coeffs);
@@ -53,6 +58,8 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
 {
 int cpu_flags = av_get_cpu_flags();
 if (have_neon(cpu_flags) && bit_depth == 8) {
+c->sao_band_filter[0] = ff_hevc_sao_band_filter_8x8_8_neon;
+
 c->add_residual[0] = ff_hevc_add_residual_4x4_8_neon;
 c->add_residual[1] = ff_hevc_add_residual_8x8_8_neon;
 c->add_residual[2] = ff_hevc_add_residual_16x16_8_neon;
diff --git a/libavcodec/aarch64/hevcdsp_sao_neon.S b/libavcodec/aarch64/hevcdsp_sao_neon.S
new file mode 100644
index 00..25b6c25117
--- /dev/null
+++ b/libavcodec/aarch64/hevcdsp_sao_neon.S
@@ -0,0 +1,87 @@
+/* -*-arm64-*-
+ * vim: syntax=arm64asm
+ *
+ * AArch64 NEON optimised SAO functions for HEVC decoding
+ *
+ * Copyright (c) 2020 Josh Dekker
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+// void sao_band_filter(uint8_t *_dst, uint8_t *_src,
+// ptrdiff_t stride_dst, ptrdiff_t stride_src,
+// int16_t *sao_offset_val, int sao_left_class,
+// int width, int height)
+function ff_hevc_sao_band_filter_8x8_8_neon, export=1
+sub sp, sp, #64
+stp xzr, xzr, [sp]
+stp xzr, xzr, [sp, #16]
+stp xzr, xzr, [sp, #32]
+stp xzr, xzr, [sp, #48]
+mov w8, #4
+.setup:
+ldrsh x9, [x4, x8, lsl #1] // x9 = sao_offset_val[k+1]
+subs w8, w8, #1
+add w10, w8, w5 // x10 = k + sao_left_class
+and w10, w10, #0x1F
+strh w9, [sp, x10, lsl #1]
+bne .setup
+ld1 {v16.16B-v19.16B}, [sp], #64
+movi v20.8H, #1
+0: // beginning of line
+mov w8, w6
+8:
+// Simple layout for accessing 16bit values
+// with 8bit LUT.
+//
+// 00 01 02 03 04 05 06 07
+// +--->
+// |xDE#xAD|xCA#xFE|xBE#xEF|xFE#xED|
+// +--->
+//i-0 i-1 i-2 i-3
+// dst[x] = av_clip_pixel(src[x] + offset_table[src[x] >> shift]);
+ld1 {v2.8B}, [x1]
+// load src[x]
+ushll v0.8H, v2.8B, #0
+// >> shift
+ushr v2.8H, v0.8H, #3 // BIT_DEPTH
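The setup loop in the NEON code builds a 32-entry offset table on the stack, and the main loop applies the formula quoted in its comment. A scalar C sketch of the same 8-bit band-offset filter may help when reading the assembly; it is written from that comment and the setup loop, not copied from FFmpeg's C template, and sao_band_filter_8_ref and clip_u8 are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clip to the 8-bit pixel range, like av_clip_pixel() for BIT_DEPTH 8. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Scalar sketch of 8-bit SAO band offset: 32 bands of 8 values each. */
static void sao_band_filter_8_ref(uint8_t *dst, const uint8_t *src,
                                  ptrdiff_t stride_dst, ptrdiff_t stride_src,
                                  const int16_t *sao_offset_val,
                                  int sao_left_class, int width, int height)
{
    const int shift = 8 - 5;      /* BIT_DEPTH - 5: src >> 3 selects the band */
    int offset_table[32] = { 0 }; /* bands without an offset add 0 */

    /* Four consecutive bands starting at sao_left_class get offsets 1..4,
     * wrapping modulo 32 (the "and w10, w10, #0x1F" in the NEON setup). */
    for (int k = 0; k < 4; k++)
        offset_table[(k + sao_left_class) & 31] = sao_offset_val[k + 1];

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = clip_u8(src[x] + offset_table[src[x] >> shift]);
        dst += stride_dst;
        src += stride_src;
    }
}
```

With sao_left_class = 30, for example, bands 30, 31, 0 and 1 receive the four offsets, which is why the assembly masks the index with 0x1F rather than clamping it.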
[FFmpeg-devel] [PATCH 2/4] lavc/aarch64: add HEVC idct_dc NEON
Signed-off-by: Josh Dekker
---
 libavcodec/aarch64/Makefile            |  3 +-
 libavcodec/aarch64/hevcdsp_idct_neon.S | 74 ++
 libavcodec/aarch64/hevcdsp_init.c      | 19 +++
 3 files changed, 95 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/aarch64/hevcdsp_idct_neon.S

diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index 4bdd554e7e..42d80bf74c 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -54,7 +54,8 @@ NEON-OBJS-$(CONFIG_VP8DSP) += aarch64/vp8dsp_neon.o
 # decoders/encoders
 NEON-OBJS-$(CONFIG_AAC_DECODER) += aarch64/aacpsdsp_neon.o
 NEON-OBJS-$(CONFIG_DCA_DECODER) += aarch64/synth_filter_neon.o
-NEON-OBJS-$(CONFIG_HEVC_DECODER)+= aarch64/hevcdsp_add_res_neon.o
+NEON-OBJS-$(CONFIG_HEVC_DECODER)+= aarch64/hevcdsp_add_res_neon.o \
+ aarch64/hevcdsp_idct_neon.o
 NEON-OBJS-$(CONFIG_OPUS_DECODER)+= aarch64/opusdsp_neon.o
 NEON-OBJS-$(CONFIG_VORBIS_DECODER) += aarch64/vorbisdsp_neon.o
 NEON-OBJS-$(CONFIG_VP9_DECODER) += aarch64/vp9itxfm_16bpp_neon.o \
diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
new file mode 100644
index 00..cd886bb6dc
--- /dev/null
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -0,0 +1,74 @@
+/* -*-arm64-*-
+ *
+ * AArch64 NEON optimised IDCT functions for HEVC decoding
+ *
+ * Copyright (c) 2020 Josh Dekker
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+.macro idct_dc size bitdepth
+function ff_hevc_idct_\size\()x\size\()_dc_\bitdepth\()_neon, export=1
+ldrsh w1, [x0]
+mov w2, #(1 << (13 - \bitdepth))
+add w1, w1, #1
+asr w1, w1, #1
+add w1, w1, w2
+asr w1, w1, #(14 - \bitdepth)
+dup v0.8h, w1
+dup v1.8h, w1
+.if \size > 4
+dup v2.8h, w1
+dup v3.8h, w1
+.if \size > 16 /* dc 32x32 */
+mov x2, #4
+1:
+subs x2, x2, #1
+.endif
+.if \size > 8 /* dc 16x16 */
+st1 {v0.8h-v3.8h}, [x0], #64
+st1 {v0.8h-v3.8h}, [x0], #64
+st1 {v0.8h-v3.8h}, [x0], #64
+st1 {v0.8h-v3.8h}, [x0], #64
+st1 {v0.8h-v3.8h}, [x0], #64
+st1 {v0.8h-v3.8h}, [x0], #64
+.endif /* dc 8x8 */
+st1 {v0.8h-v3.8h}, [x0], #64
+st1 {v0.8h-v3.8h}, [x0], #64
+.if \size > 16 /* dc 32x32 */
+bne 1b
+.endif
+.else /* dc 4x4 */
+st1 {v0.8h-v1.8h}, [x0]
+.endif
+ret
+endfunc
+.endm
+
+idct_dc 4 8
+idct_dc 4 10
+
+idct_dc 8 8
+idct_dc 8 10
+
+idct_dc 16 8
+idct_dc 16 10
+
+idct_dc 32 8
+idct_dc 32 10
diff --git a/libavcodec/aarch64/hevcdsp_init.c b/libavcodec/aarch64/hevcdsp_init.c
index f0a617ab39..2cd7ef3a6c 100644
--- a/libavcodec/aarch64/hevcdsp_init.c
+++ b/libavcodec/aarch64/hevcdsp_init.c
@@ -23,6 +23,15 @@
 #include "libavcodec/hevcdsp.h"
 #include "libavcodec/avcodec.h"
 
+void ff_hevc_idct_4x4_dc_8_neon(int16_t *coeffs);
+void ff_hevc_idct_8x8_dc_8_neon(int16_t *coeffs);
+void ff_hevc_idct_16x16_dc_8_neon(int16_t *coeffs);
+void ff_hevc_idct_32x32_dc_8_neon(int16_t *coeffs);
+void ff_hevc_idct_4x4_dc_10_neon(int16_t *coeffs);
+void ff_hevc_idct_8x8_dc_10_neon(int16_t *coeffs);
+void ff_hevc_idct_16x16_dc_10_neon(int16_t *coeffs);
+void ff_hevc_idct_32x32_dc_10_neon(int16_t *coeffs);
+
 void ff_hevc_add_residual_4x4_8_neon(uint8_t *_dst, int16_t *coeffs,
 ptrdiff_t stride);
 void ff_hevc_add_residual_4x4_10_neon(uint8_t *_dst, int16_t *coeffs,
@@ -48,6 +57,11 @@ av_cold void ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
 c->add_residual[1] = ff_hevc_add_residual_8x8_8_neon;
 c->add_residual[2] = ff_hevc_add_residual_16x16_8_neon;
 c->add_residual[3] = ff_hevc_add_residual_32x32_8_neon;
+
+c->idct_dc[0] = ff_hevc_idct_4x4_dc_8_neon;
+c->idct_dc[1] = ff_hevc_idct_8x8_dc_8_neon;
+c->idct_dc[2] = ff_hevc_idct_16x16_dc_8_neon;
+c->idct_dc[3] = ff_hevc_idct_32x32_dc_8_neon;
 }
 if (have_
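The idct_dc macro computes one value from coeffs[0] and broadcasts it over the whole size x size block. Its arithmetic maps directly onto a few lines of C: shift = 14 - bitdepth, and a rounding constant of 1 << (shift - 1), which is the `mov w2, #(1 << (13 - \bitdepth))` above. The sketch below is a reading aid under that interpretation (idct_dc_ref is a hypothetical name). It assumes arithmetic right shifts of negative values, which `asr` guarantees but C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar sketch of what the idct_dc macro computes: a DC-only block
 * becomes a constant block. Assumes arithmetic right shift. */
static void idct_dc_ref(int16_t *coeffs, int size, int bit_depth)
{
    assert(bit_depth == 8 || bit_depth == 10);
    const int shift = 14 - bit_depth;   /* asr w1, w1, #(14 - \bitdepth) */
    const int add = 1 << (shift - 1);   /* mov w2, #(1 << (13 - \bitdepth)) */
    int dc = (((coeffs[0] + 1) >> 1) + add) >> shift;

    /* The NEON code stores the same value with st1 of v0-v3 (v0-v1 for 4x4). */
    for (int i = 0; i < size * size; i++)
        coeffs[i] = (int16_t)dc;
}
```

For instance, coeffs[0] = 64 at 8-bit gives (65 >> 1) = 32, then (32 + 32) >> 6 = 1 for every coefficient.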
[FFmpeg-devel] [PATCH 1/4] lavc/aarch64: add HEVC add_residual NEON
Signed-off-by: Josh Dekker
---
 libavcodec/aarch64/Makefile               |   2 +
 libavcodec/aarch64/hevcdsp_add_res_neon.S | 298 ++
 libavcodec/aarch64/hevcdsp_init.c         |  59 +
 libavcodec/hevcdsp.c                      |   2 +
 libavcodec/hevcdsp.h                      |   1 +
 5 files changed, 362 insertions(+)
 create mode 100644 libavcodec/aarch64/hevcdsp_add_res_neon.S
 create mode 100644 libavcodec/aarch64/hevcdsp_init.c

diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index f6434e40da..4bdd554e7e 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -17,6 +17,7 @@ OBJS-$(CONFIG_VP8DSP) += aarch64/vp8dsp_init_aarch64.o
 OBJS-$(CONFIG_AAC_DECODER) += aarch64/aacpsdsp_init_aarch64.o \
 aarch64/sbrdsp_init_aarch64.o
 OBJS-$(CONFIG_DCA_DECODER) += aarch64/synth_filter_init.o
+OBJS-$(CONFIG_HEVC_DECODER) += aarch64/hevcdsp_init.o
 OBJS-$(CONFIG_OPUS_DECODER) += aarch64/opusdsp_init.o
 OBJS-$(CONFIG_RV40_DECODER) += aarch64/rv40dsp_init_aarch64.o
 OBJS-$(CONFIG_VC1DSP) += aarch64/vc1dsp_init_aarch64.o
@@ -53,6 +54,7 @@ NEON-OBJS-$(CONFIG_VP8DSP) += aarch64/vp8dsp_neon.o
 # decoders/encoders
 NEON-OBJS-$(CONFIG_AAC_DECODER) += aarch64/aacpsdsp_neon.o
 NEON-OBJS-$(CONFIG_DCA_DECODER) += aarch64/synth_filter_neon.o
+NEON-OBJS-$(CONFIG_HEVC_DECODER)+= aarch64/hevcdsp_add_res_neon.o
 NEON-OBJS-$(CONFIG_OPUS_DECODER)+= aarch64/opusdsp_neon.o
 NEON-OBJS-$(CONFIG_VORBIS_DECODER) += aarch64/vorbisdsp_neon.o
 NEON-OBJS-$(CONFIG_VP9_DECODER) += aarch64/vp9itxfm_16bpp_neon.o \
diff --git a/libavcodec/aarch64/hevcdsp_add_res_neon.S b/libavcodec/aarch64/hevcdsp_add_res_neon.S
new file mode 100644
index 00..4005366192
--- /dev/null
+++ b/libavcodec/aarch64/hevcdsp_add_res_neon.S
@@ -0,0 +1,298 @@
+/* -*-arm64-*-
+ *
+ * AArch64 NEON optimised add residual functions for HEVC decoding
+ *
+ * Copyright (c) 2020 Josh Dekker
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+.macro clip10 in1, in2, c1, c2
+smax \in1, \in1, \c1
+smax \in2, \in2, \c1
+smin \in1, \in1, \c2
+smin \in2, \in2, \c2
+.endm
+
+function ff_hevc_add_residual_4x4_8_neon, export=1
+mov x3, x0
+ld1 {v0.S}[0], [x3], x2
+ld1 {v0.S}[1], [x3], x2
+ld1 {v1.S}[0], [x3], x2
+ld1 {v1.S}[1], [x3], x2
+ld1 { v2.8H-v3.8H}, [x1]
+ushll v4.8H, v0.8B, #0
+ushll v5.8H, v1.8B, #0
+add v6.8H, v4.8H, v2.8H
+add v7.8H, v5.8H, v3.8H
+sqxtun v0.8B, v6.8H
+sqxtun v1.8B, v7.8H
+st1 {v0.S}[0], [x0], x2
+st1 {v0.S}[1], [x0], x2
+st1 {v1.S}[0], [x0], x2
+st1 {v1.S}[1], [x0], x2
+ret
+endfunc
+
+function ff_hevc_add_residual_4x4_10_neon, export=1
+mov x3, x0
+movi v4.8H, #0
+mvni v5.8H, #0xFC, lsl #8
+ld1 {v0.D}[0], [x3], x2
+ld1 {v0.D}[1], [x3], x2
+ld1 {v1.D}[0], [x3], x2
+ld1 {v1.D}[1], [x3], x2
+ld1 { v2.8H-v3.8H}, [x1]
+add v2.8H, v0.8H, v2.8H
+add v3.8H, v1.8H, v3.8H
+clip10 v2.8H, v3.8H, v4.8H, v5.8H
+st1 {v2.D}[0], [x0], x2
+st1 {v2.D}[1], [x0], x2
+st1 {v3.D}[0], [x0], x2
+st1 {v3.D}[1], [x0], x2
+ret
+endfunc
+
+function ff_hevc_add_residual_8x8_8_neon, export=1
+mov x3, x0
+ld1 {v0.8B}, [x3], x2
+ld1 {v1.8B}, [x3], x2
+ld1 {v2.8B}, [x3], x2
+ld1 {v3.8B}, [x3], x2
+ld1 {v4.8B}, [x3], x2
+ld1 {v5.8B}, [x3], x2
+ld1 {v6.8B}, [x3], x2
+ld1 {v7.8B}, [x3], x2
+ld1 { v16.8H-v19.8H}, [x1], #64
+ld1 { v20.8H-v23.8H}, [x1]
+ushll v24.8H, v0.8B, #0
+ushll v25.8H, v1.8B, #0
+ushll v26.8H, v2.8B, #0
+ushll v27.8H, v3.8B, #0
+ushll v28.8H, v4.8B, #0
+ushll v29.8H, v5.8B, #0
+ushll v30.8H, v6.8B, #0
+ushll v31.8H, v7.8B, #0
+add v0.8H, v24.8H, v16.8H
+add v1.8H, v25.8H, v17.8H
+add v2.8H, v26.8H, v18.8H
+add v3.8H, v27.8H, v19.8H
+add v4.8H, v28.8H, v20.8H
+add v5.8H, v29.8H, v21.8H
+add v6.8H, v30.8H, v22.8H
+add v7.8H, v31.8H, v23.8H
+sqxtun v24.8B, v0.8H
+sqxtun v25
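In scalar terms, add_residual adds an int16 residual block to the reconstructed pixels and clips the result to the pixel range. The 8-bit NEON path widens with `ushll`, adds, and narrows with `sqxtun` (saturate signed to unsigned 8-bit), while the 10-bit path clips with the clip10 macro. The following is a minimal C sketch of that behaviour, not FFmpeg's C template. The names are hypothetical, and the stride is taken in pixels for simplicity, whereas the NEON functions take a byte stride.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: 8-bit add_residual. Residuals are stored densely (size per row). */
static void add_residual_8_ref(uint8_t *dst, const int16_t *res,
                               ptrdiff_t stride, int size)
{
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            int v = dst[x] + res[x];
            dst[x] = v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; /* like sqxtun */
        }
        dst += stride;
        res += size;
    }
}

/* Sketch: 10-bit variant; pixels are uint16_t, clipped to [0, 1023]
 * as the clip10 macro does with smax/smin. Stride is in pixels here. */
static void add_residual_10_ref(uint16_t *dst, const int16_t *res,
                                ptrdiff_t stride, int size)
{
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++) {
            int v = dst[x] + res[x];
            dst[x] = v < 0 ? 0 : v > 1023 ? 1023 : (uint16_t)v;
        }
        dst += stride;
        res += size;
    }
}
```

One subtle difference: the NEON 8-bit path adds in 16-bit lanes, so it only matches this int-based sketch while pixel + residual stays within int16 range, which holds for normal HEVC residuals.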
[FFmpeg-devel] [PATCH 0/4] AArch64 NEON for HEVC
checkasm: all 657 tests passed
hevc_add_res_4x4_8_c: 49.7
hevc_add_res_4x4_8_neon: 20.5
hevc_add_res_4x4_10_c: 45.7
hevc_add_res_4x4_10_neon: 18.7
hevc_add_res_8x8_8_c: 211.0
hevc_add_res_8x8_8_neon: 24.5
hevc_add_res_8x8_10_c: 195.7
hevc_add_res_8x8_10_neon: 24.0
hevc_add_res_16x16_8_c: 787.2
hevc_add_res_16x16_8_neon: 79.0
hevc_add_res_16x16_10_c: 714.7
hevc_add_res_16x16_10_neon: 77.7
hevc_add_res_32x32_8_c: 3444.2
hevc_add_res_32x32_8_neon: 306.5
hevc_add_res_32x32_10_c: 3820.7
hevc_add_res_32x32_10_neon: 299.5
hevc_idct_4x4_dc_8_c: 16.2
hevc_idct_4x4_dc_8_neon: 13.7
hevc_idct_4x4_dc_10_c: 16.2
hevc_idct_4x4_dc_10_neon: 14.5
hevc_idct_8x8_dc_8_c: 40.7
hevc_idct_8x8_dc_8_neon: 18.5
hevc_idct_8x8_dc_10_c: 39.2
hevc_idct_8x8_dc_10_neon: 19.2
hevc_idct_16x16_dc_8_c: 136.7
hevc_idct_16x16_dc_8_neon: 35.7
hevc_idct_16x16_dc_10_c: 136.0
hevc_idct_16x16_dc_10_neon: 36.0
hevc_idct_32x32_dc_8_c: 1386.7
hevc_idct_32x32_dc_8_neon: 132.0
hevc_idct_32x32_dc_10_c: 1366.2
hevc_idct_32x32_dc_10_neon: 132.0
hevc_sao_band_8x8_8_c: 230.7
hevc_sao_band_8x8_8_neon: 92.7

Please disregard my previous email with the subject 'lavc/aarch64: add HEVC add_residual NEON'; the patch was split incorrectly.

IDCT (first) and QPEL functions are in the works, followed by SAO edge and whatever else is needed for parity with the existing ARM NEON code.

-- 
Josh
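The benchmark lines pair a C and a NEON timing for each function, so the speedup is the ratio of the two values. Both come from the same checkasm run, so their units cancel. Computed this way, hevc_add_res_32x32_8 is roughly 11.2x faster, hevc_sao_band_8x8_8 about 2.5x, and hevc_idct_4x4_dc_8 only about 1.2x. The two helpers below are hypothetical conveniences for tabulating such ratios, not part of checkasm.

```c
#include <assert.h>
#include <stdio.h>

/* Speedup of the NEON version over C: ratio of the two benchmark values
 * printed by checkasm for the same function (same units, so they cancel). */
static double speedup(double c_time, double neon_time)
{
    assert(c_time > 0.0 && neon_time > 0.0);
    return c_time / neon_time;
}

/* Hypothetical helper: print one line per benchmarked function. */
static void print_speedup(const char *name, double c_time, double neon_time)
{
    printf("%-24s %6.2fx\n", name, speedup(c_time, neon_time));
}
```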
[FFmpeg-devel] [PATCH] lavc/aarch64: add HEVC add_residual NEON
Signed-off-by: Josh Dekker
---
checkasm: all 648 tests passed
hevc_add_res_4x4_8_c: 49.7
hevc_add_res_4x4_8_neon: 20.5
hevc_add_res_4x4_10_c: 46.0
hevc_add_res_4x4_10_neon: 19.0
hevc_add_res_8x8_8_c: 209.0
hevc_add_res_8x8_8_neon: 24.5
hevc_add_res_8x8_10_c: 192.7
hevc_add_res_8x8_10_neon: 27.0
hevc_add_res_16x16_8_c: 791.5
hevc_add_res_16x16_8_neon: 79.0
hevc_add_res_16x16_10_c: 711.0
hevc_add_res_16x16_10_neon: 77.7
hevc_add_res_32x32_8_c: 3431.2
hevc_add_res_32x32_8_neon: 306.5
hevc_add_res_32x32_10_c: 3825.0
hevc_add_res_32x32_10_neon: 299.5

 libavcodec/aarch64/Makefile               |   3 +
 libavcodec/aarch64/hevcdsp_add_res_neon.S | 298 ++
 libavcodec/aarch64/hevcdsp_idct_neon.S    |  24 ++
 libavcodec/aarch64/hevcdsp_init.c         |  59 +
 libavcodec/hevcdsp.c                      |   2 +
 libavcodec/hevcdsp.h                      |   1 +
 6 files changed, 387 insertions(+)
 create mode 100644 libavcodec/aarch64/hevcdsp_add_res_neon.S
 create mode 100644 libavcodec/aarch64/hevcdsp_idct_neon.S
 create mode 100644 libavcodec/aarch64/hevcdsp_init.c

diff --git a/libavcodec/aarch64/Makefile b/libavcodec/aarch64/Makefile
index f6434e40da..0eaafce74b 100644
--- a/libavcodec/aarch64/Makefile
+++ b/libavcodec/aarch64/Makefile
@@ -17,6 +17,7 @@ OBJS-$(CONFIG_VP8DSP) += aarch64/vp8dsp_init_aarch64.o
 OBJS-$(CONFIG_AAC_DECODER) += aarch64/aacpsdsp_init_aarch64.o \
 aarch64/sbrdsp_init_aarch64.o
 OBJS-$(CONFIG_DCA_DECODER) += aarch64/synth_filter_init.o
+OBJS-$(CONFIG_HEVC_DECODER) += aarch64/hevcdsp_init.o
 OBJS-$(CONFIG_OPUS_DECODER) += aarch64/opusdsp_init.o
 OBJS-$(CONFIG_RV40_DECODER) += aarch64/rv40dsp_init_aarch64.o
 OBJS-$(CONFIG_VC1DSP) += aarch64/vc1dsp_init_aarch64.o
@@ -53,6 +54,8 @@ NEON-OBJS-$(CONFIG_VP8DSP) += aarch64/vp8dsp_neon.o
 # decoders/encoders
 NEON-OBJS-$(CONFIG_AAC_DECODER) += aarch64/aacpsdsp_neon.o
 NEON-OBJS-$(CONFIG_DCA_DECODER) += aarch64/synth_filter_neon.o
+NEON-OBJS-$(CONFIG_HEVC_DECODER)+= aarch64/hevcdsp_add_res_neon.o \
+ aarch64/hevcdsp_idct_neon.o
 NEON-OBJS-$(CONFIG_OPUS_DECODER)+= aarch64/opusdsp_neon.o
 NEON-OBJS-$(CONFIG_VORBIS_DECODER) += aarch64/vorbisdsp_neon.o
 NEON-OBJS-$(CONFIG_VP9_DECODER) += aarch64/vp9itxfm_16bpp_neon.o \
diff --git a/libavcodec/aarch64/hevcdsp_add_res_neon.S b/libavcodec/aarch64/hevcdsp_add_res_neon.S
new file mode 100644
index 00..dc7e8127b9
--- /dev/null
+++ b/libavcodec/aarch64/hevcdsp_add_res_neon.S
@@ -0,0 +1,298 @@
+/* -*-armv8-*-
+ *
+ * AArch64 NEON optimised add residual functions for HEVC decoding
+ *
+ * Copyright (c) 2020 Josh Dekker
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/aarch64/asm.S"
+
+.macro clip10 in1, in2, c1, c2
+smax \in1, \in1, \c1
+smax \in2, \in2, \c1
+smin \in1, \in1, \c2
+smin \in2, \in2, \c2
+.endm
+
+function ff_hevc_add_residual_4x4_8_neon, export=1
+mov x3, x0
+ld1 {v0.S}[0], [x3], x2
+ld1 {v0.S}[1], [x3], x2
+ld1 {v1.S}[0], [x3], x2
+ld1 {v1.S}[1], [x3], x2
+ld1 { v2.8H-v3.8H}, [x1]
+ushll v4.8H, v0.8B, #0
+ushll v5.8H, v1.8B, #0
+add v6.8H, v4.8H, v2.8H
+add v7.8H, v5.8H, v3.8H
+sqxtun v0.8B, v6.8H
+sqxtun v1.8B, v7.8H
+st1 {v0.S}[0], [x0], x2
+st1 {v0.S}[1], [x0], x2
+st1 {v1.S}[0], [x0], x2
+st1 {v1.S}[1], [x0], x2
+ret
+endfunc
+
+function ff_hevc_add_residual_4x4_10_neon, export=1
+mov x3, x0
+movi v4.8H, #0
+mvni v5.8H, #0xFC, lsl #8
+ld1 {v0.D}[0], [x3], x2
+ld1 {v0.D}[1], [x3], x2
+ld1 {v1.D}[0], [x3], x2
+ld1 {v1.D}[1], [x3], x2
+ld1 { v2.8H-v3.8H}, [x1]
+add v2.8H, v0.8H, v2.8H
+add v3.8H, v1.8H, v3.8H
+clip10 v2.8H, v3.8H, v4.8H, v5.8H
+st1 {v2.D}[0], [x0], x2
+st1 {v2.D}[1], [x0], x2
+st1 {v3.D}[0], [x0], x2
+st1 {v3.D}[1], [x0], x2
+ret
+endfunc
+
+function ff_hevc_add_residual_8x8_8_neon, export=1
+mov x3, x0
+ld1 {v0.8B}, [x3], x2
+ld1 {v1.8B}, [x3], x2
+ld1 {v2.8B}, [x3], x2
+
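The 10-bit function never loads its upper clip bound from memory. `mvni v5.8H, #0xFC, lsl #8` sets every 16-bit lane to the bitwise NOT of 0xFC00, which is 0x03FF = 1023, the largest 10-bit sample value. The clip10 macro then applies `smax` against zero and `smin` against that bound. The short C sketch below checks the arithmetic; mvni_16 and clip10 are hypothetical names that only emulate a single lane.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates one 16-bit lane of MVNI with a shifted 8-bit immediate:
 * the lane becomes the bitwise NOT of (imm8 << shift). */
static uint16_t mvni_16(uint8_t imm8, int shift)
{
    assert(shift == 0 || shift == 8);
    return (uint16_t)~((unsigned)imm8 << shift);
}

/* One lane of the clip10 macro: smax against lo, then smin against hi. */
static int16_t clip10(int16_t v, int16_t lo, int16_t hi)
{
    if (v < lo)
        v = lo; /* smax \in, \in, \c1 */
    if (v > hi)
        v = hi; /* smin \in, \in, \c2 */
    return v;
}
```

Generating the constant with one immediate instruction avoids a literal-pool load, which is a common idiom in hand-written NEON.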
Re: [FFmpeg-devel] FFmpeg buying an Apple M1 Mac Mini
On 2021/01/03 20:18, Michael Niedermayer wrote:
> On Sun, Jan 03, 2021 at 06:32:11PM +0100, Kieran Kunhya wrote:
>> Hello,
>>
>> As it's 2021 I would like to propose FFmpeg purchase one or more (e.g
>> FATE + development) Apple M1 Mac Minis and provide access to developers.
>> This is something I have done a few years ago when AVX2 was a new
>> instruction set. I can host these in the UK 24/7 and provide access and
>> label them as belonging to the project and not me.
>> < To clarify these will be hosted in a proper datacentre, with proper
>> < connectivity, cooling etc.
>>
>> I would propose buying and getting reimbursed one or more of:
>> - Apple M1 chip with 8‑core CPU, 8‑core GPU and 16‑core Neural Engine
>> - 16GB unified memory
>> - 1TB SSD storage
>> - Gigabit Ethernet
>>
>> This is £1,299.00 in the UK right now on the Apple Site.
>
> assuming noone has objections or better suggestions
>
> LGTM
>
> thx

OK from me too. I would suggest getting two: one only for FATE and the other for general access and development by FFmpeg developers.

-- 
Josh
Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.
On 2020/12/09 11:19, Alan Kelly wrote:
> ---
> Activates avx2 version of yuv2yuvX
> Adds checkasm for yuv2yuvX
> Modifies ff_yuv2yuvX_* signature to match yuv2yuvX_*
> Replaces non-temporal stores with temporal stores
>
>  libswscale/x86/Makefile     |   1 +
>  libswscale/x86/swscale.c    | 106 +---
>  libswscale/x86/yuv2yuvX.asm | 118
>  tests/checkasm/sw_scale.c   | 101 +-
>  4 files changed, 249 insertions(+), 77 deletions(-)
>  create mode 100644 libswscale/x86/yuv2yuvX.asm
>
> [...]
>
> diff --git a/tests/checkasm/sw_scale.c b/tests/checkasm/sw_scale.c
> index 9efa2b4def..7009169361 100644
> --- a/tests/checkasm/sw_scale.c
> +++ b/tests/checkasm/sw_scale.c
> [...]
> +static void check_yuv2yuvX(void)
> +{
> +struct SwsContext *ctx;
> +int fsi, osi;
> +#define LARGEST_FILTER 8
> +#define FILTER_SIZES 4
> +static const int filter_sizes[FILTER_SIZES] = {1, 4, 8, 16};
> +
> +declare_func_emms(AV_CPU_FLAG_MMX, void, const int16_t *filter,
> + int filterSize, const int16_t **src, uint8_t *dest,
> + int dstW, const uint8_t *dither, int offset);
> +
> +int dstW = SRC_PIXELS;
> +const int16_t **src;
> +LOCAL_ALIGNED_32(int16_t, filter_coeff, [LARGEST_FILTER]);
> +LOCAL_ALIGNED_32(uint8_t, dst0, [SRC_PIXELS]);
> +LOCAL_ALIGNED_32(uint8_t, dst1, [SRC_PIXELS]);
> +LOCAL_ALIGNED_32(uint8_t, dither, [SRC_PIXELS]);
> +union VFilterData{
> +const int16_t *src;
> +uint16_t coeff[8];
> +} *vFilterData;
> +uint8_t d_val = rnd();
> +randomize_buffers(filter_coeff, LARGEST_FILTER);
> +ctx = sws_alloc_context();
> +if (sws_init_context(ctx, NULL, NULL) < 0)
> +fail();
> +
> +ff_sws_init_swscale_x86(ctx);

This should be ff_getSwsFunc() instead.

> +for(int i = 0; i < SRC_PIXELS; ++i){
> +dither[i] = d_val;
> +}
> [...]

-- 
Josh
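The function declared with declare_func_emms() has the same signature as libswscale's C vertical scaler for 8-bit output, yuv2planeX_8_c in libswscale/output.c. The scalar sketch below is assumed to mirror that C function and shows what every SIMD version has to reproduce. Each output pixel is a weighted sum of filterSize input rows, with an 8-entry dither pattern added in at the same 19-bit fixed-point scale. yuv2planeX_8_sketch and sketch_clip_uint8 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t sketch_clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Sketch of a vertical-scaler output function with the signature tested by
 * check_yuv2yuvX(); assumed to mirror libswscale's C yuv2planeX_8_c. */
static void yuv2planeX_8_sketch(const int16_t *filter, int filterSize,
                                const int16_t **src, uint8_t *dest, int dstW,
                                const uint8_t *dither, int offset)
{
    for (int i = 0; i < dstW; i++) {
        /* Dither repeats every 8 pixels; << 12 puts it on the same scale
         * as the filtered sum, which is shifted down by 19 below. */
        int val = dither[(i + offset) & 7] << 12;
        for (int j = 0; j < filterSize; j++)
            val += src[j][i] * filter[j];
        dest[i] = sketch_clip_uint8(val >> 19);
    }
}
```

Because only dither[(i + offset) & 7] is read, filling the whole dither buffer with one value (as the test does) makes the result independent of offset.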
[FFmpeg-devel] [RFC] Machines & Platforms of interest for testing
Hi,

As discussed in the meeting, I'm starting an RFC for machines & platforms of interest for testing, developer access and FATE. These would be funded by SPI.

The two platforms mentioned were a Mac Mini (M1 Apple Silicon platform) and a TALOS II (POWER9 platform).

My personal suggestion would be a machine with both a modern nVidia GPU and a modern AMD GPU for testing hardware acceleration integration.

Kieran offered to host one Mac Mini, though I'm unsure what his hosting capacity is.

Any comments and suggestions are welcome.

-- 
Josh
[FFmpeg-devel] [IMPORTANT] Meeting Notes - December 2020
Hi all,

Here are the notes from the FFmpeg developer meeting of the 5th of December 2020, 15:00 UTC:

# Notes

A recording of the call and the IRC logs are available on YouTube and the Wiki respectively:
- https://youtu.be/1EjIdYuWXEM
- https://trac.ffmpeg.org/wiki/FFmeeting/2020-12

Some extra topics were discussed in the meeting. The notes have been cleaned up to contain actionable points.

## Proposed Agenda

https://ffmpeg.org/pipermail/ffmpeg-devel/2020-November/272272.html

- Tech / Comm committees, how-to in practice
- GSoC adaptations for upcoming years
- Splitting libraries (lavf->lavio)
- Deprecating libpostproc
- Writing down development rules
- Switching to a merge request-like system
- Propose FFmpeg/SPI purchase a Mac Mini ARM machine for development (and FATE if required).

## People Present (15)

- Jean-Baptiste Kempf
- Josh Dekker (Illya)
- Jan Ekström
- Michael Niedermayer
- Gyan Doshi
- Lynne
- Kieran Kunhya
- Mark Thompson
- Anton Khirnov
- Paul B Mahol
- Steven Liu
- Marvin Scholz
- Andriy Gelman
- Linjie Fu
- James Almer

## Topics discussed

### Topic 1.0: Technical Committee

Clarify the Technical Process on the mailing list -> vote in one week, in a Yes/No fashion. Question in 100 words.

Tech limit on the number of words:
* The question is limited to one hundred words, for example of the form "should we do X rather than Y?".
* The background to the question will likely be complex and can be explained in detail elsewhere.
* The intent of this restriction is to avoid an unclear question or any ambiguity in the answer.

Action points
- [Mark and Lynne] Submit an updated patch to clarify how the 100 word limit works.

### Topic 1.1: Community committees

A Community Code of Conduct was written by j-b and needs review. The CoC MUST HAVE abuse limits of TC and CC.
Action points
* [Josh, JEEB, Kieran and Michael] Pre-review
* [j-b] Post CoC on mailing list for general review
* [GA] Vote on CoC in the same format as technical process

**Goal: everything voted on by the end of Dec 2020**

### Topic 2: GSoC

GSoC was shortened to 5 weeks; a project should take around 150 hours, and the overall time is halved. Small projects are generally more boring and less useful. Kieran and Lynne believe we should stop doing GSoC. Anton noted that GSoC is not a burden on people who don't care about it.

Going forward, projects would have to be restructured compared to the types of projects done previously. They should be more integrated with the community & more specifically detailed. Suggestions included smaller optimisation projects (a lot of assembly is still unwritten).

Action points
- [Carl] Set up a wiki page for *small*, *self-contained*, *specific* GSoC project ideas.

### Topic 3: splitting and merging libraries

#### Topic 3.1: libavdevice

Libavdevice is very tightly coupled to libavformat, so it should be merged into it. Nobody appears to be against this. Anton is working on the merge.

#### Topic 3.2: merging all libraries into one

Steven mentions that building a single libffmpeg.so is useful for his use case. JB replies that this makes sense for some cases but not for others; multiple separate libraries are preferable in many other cases. Discussion of various open source libraries moving to meson.

Action points
- [Who?] More discussion on how to do a mega-library and whether we should document it. Lynne suggests this mega-library should only be provided as a static library.
- [Requires further discussion] Do we support libffmpeg.a|.so in the build system?

#### Topic 3.3: Splitting libavformat IO into libavio

Some API users would prefer this functionality to be separate, as it involves network communication and such. Additionally, it would allow proper IO in other libraries, such as lavc and lavfi. Question whether IO can be moved to libavutil.
Conclusion -> libavutil is big enough already.

Action points
- [Anton] Reflect on how to split IO out of lavf into libavio
- [Requires further discussion] Point raised whether hwcontext should be split off from libavutil

#### Topic 3.4: Splitting libavutil hwcontext into its own library

Mark asks whether hardware contexts should move to a separate library. Anton says it is probably inconvenient for distros that lavu links to many hwaccel libs -> libavhwcontext makes sense. Nobody raises objections to this.

Action points
- [Who?] Reflect on whether it makes sense, and if so, split hardware contexts into a separate library

### Topic 4: deprecating libpostproc

Libpostproc does not have any external users (Kodi was thought to use it directly, but was confirmed not to be a user). There is no need for it to be a standalone library. A few options were suggested:
- remove it altogether
- merge it into libavfilter
- move it to another repo

Kieran wants to move it to a different repo; Anton noted that libpostproc is already in an external repo, the one from Libav. Michael wants to leave it as it is or integrate it into libavfilter.

Action poin