On Thu, 20 Mar 2014, Ben Avison wrote:
Profiling results for overall audio decode and the mlp_filter_channel(_arm)
function in particular are as follows:
                  Before          After
                  Mean   StdDev   Mean   StdDev  Confidence  Change
6:2 total         380.4  22.0     370.8  17.0     87.4%       +2.6%  (insignificant)
6:2 function       60.7   7.2      36.6   8.1    100.0%      +65.8%
8:2 total         357.0  17.5     343.2  19.0     97.8%       +4.0%  (insignificant)
8:2 function       60.3   8.8      37.3   3.8    100.0%      +61.8%
6:6 total         717.2  23.2     658.4  15.7    100.0%       +8.9%
6:6 function      140.4  12.9      81.5   9.2    100.0%      +72.4%
8:8 total         981.9  16.2     896.2  24.5    100.0%       +9.6%
8:8 function      193.4  15.0     103.3  11.5    100.0%      +87.2%
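
As an illustration that is not part of the patch: the Change column looks consistent with a speedup computed as before/after - 1, assuming the Mean columns are costs where lower is better. The helper name change_percent below is hypothetical, and the result matches the posted figures only to within rounding of the printed means.

```c
#include <assert.h>

/*
 * Assumed formula (not stated in the commit message):
 *   change = (before_mean / after_mean - 1) * 100
 * A positive result means the "after" build is faster.
 */
static double change_percent(double before_mean, double after_mean)
{
    assert(after_mean > 0.0);   /* a mean cost of zero would make no sense */
    return (before_mean / after_mean - 1.0) * 100.0;
}
```

For the 6:2 function row, change_percent(60.7, 36.6) comes out close to the +65.8% shown in the table.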
Experiments with adding preload instructions to this function yielded no
useful benefit, so these have not been included.
The assembly version has also been tested with a fuzz tester to ensure that
any combinations of inputs not exercised by my available test streams still
generate mathematically identical results to the C version.
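
The fuzz tester itself is not shown in this patch. A minimal sketch of the general idea (differential testing of an optimised routine against a reference) could look like the following. Everything here is a stand-in: dot_ref, dot_opt, MAX_ORDER and fuzz_compare are hypothetical names, and the routines are simple dot products rather than the real mlp_filter_channel.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ORDER 8   /* hypothetical upper bound on the filter order */

/* Hypothetical reference routine: plain dot product. */
static int64_t dot_ref(const int32_t *coeff, const int32_t *x, int order)
{
    int64_t acc = 0;
    for (int i = 0; i < order; i++)
        acc += (int64_t)coeff[i] * x[i];
    return acc;
}

/* Hypothetical "optimised" routine: same maths, unrolled by two. */
static int64_t dot_opt(const int32_t *coeff, const int32_t *x, int order)
{
    int64_t acc0 = 0, acc1 = 0;
    int i = 0;
    for (; i + 1 < order; i += 2) {
        acc0 += (int64_t)coeff[i]     * x[i];
        acc1 += (int64_t)coeff[i + 1] * x[i + 1];
    }
    if (i < order)                       /* odd order: one element left */
        acc0 += (int64_t)coeff[i] * x[i];
    return acc0 + acc1;
}

/* Small deterministic PRNG (xorshift32) so failures are reproducible. */
static uint32_t rng_state = 0x12345678u;
static uint32_t rng_next(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return rng_state;
}

/* Run both routines on random inputs; return the number of mismatches. */
static int fuzz_compare(int iterations)
{
    int mismatches = 0;
    assert(iterations >= 0);
    for (int it = 0; it < iterations; it++) {
        int32_t coeff[MAX_ORDER], x[MAX_ORDER];
        int order = (int)(rng_next() % (MAX_ORDER + 1));   /* 0..MAX_ORDER */
        for (int i = 0; i < MAX_ORDER; i++) {
            /* 16-bit coefficients and 24-bit samples keep sums well inside int64_t */
            coeff[i] = (int32_t)(rng_next() % 65536u) - 32768;
            x[i]     = (int32_t)(rng_next() % (1u << 24)) - (1 << 23);
        }
        if (dot_ref(coeff, x, order) != dot_opt(coeff, x, order))
            mismatches++;
    }
    return mismatches;
}
```

Randomising the order as well as the data is what covers parameter combinations that test streams may never exercise; a real harness would also vary shifts, masks and block sizes.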
---
libavcodec/arm/Makefile | 2 +
libavcodec/arm/mlpdsp_arm.S | 435 ++++++++++++++++++++++++++++++++++++++
libavcodec/arm/mlpdsp_init_arm.c | 36 +++
libavcodec/mlpdsp.c | 2 +
libavcodec/mlpdsp.h | 1 +
5 files changed, 476 insertions(+), 0 deletions(-)
create mode 100644 libavcodec/arm/mlpdsp_arm.S
create mode 100644 libavcodec/arm/mlpdsp_init_arm.c
diff --git a/libavcodec/arm/Makefile b/libavcodec/arm/Makefile
index 8bdccbd..c6cc96e 100644
--- a/libavcodec/arm/Makefile
+++ b/libavcodec/arm/Makefile
@@ -21,6 +21,8 @@ OBJS-$(CONFIG_H264PRED) += arm/h264pred_init_arm.o
OBJS-$(CONFIG_H264QPEL) += arm/h264qpel_init_arm.o
OBJS-$(CONFIG_HPELDSP) += arm/hpeldsp_init_arm.o \
arm/hpeldsp_arm.o
+OBJS-$(CONFIG_MLP_DECODER) += arm/mlpdsp_init_arm.o \
+ arm/mlpdsp_arm.o
OBJS-$(CONFIG_MPEGAUDIODSP) += arm/mpegaudiodsp_init_arm.o
OBJS-$(CONFIG_MPEGVIDEO) += arm/mpegvideo_arm.o
OBJS-$(CONFIG_NEON_CLOBBER_TEST) += arm/neontest.o
diff --git a/libavcodec/arm/mlpdsp_arm.S b/libavcodec/arm/mlpdsp_arm.S
new file mode 100644
index 0000000..9e0bf57
--- /dev/null
+++ b/libavcodec/arm/mlpdsp_arm.S
@@ -0,0 +1,435 @@
+/*
+ * Copyright (c) 2014 RISC OS Open Ltd
+ * Author: Ben Avison <bavi...@riscosopen.org>
+ *
+ * This file is part of Libav.
+ *
+ * Libav is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * Libav is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with Libav; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavutil/arm/asm.S"
+
+// This code uses too many ARM-only tricks to easily assemble as Thumb
+.arm
Just to be clear, the tricks that don't work in Thumb mode are the
non-constant shifts and the jump tables using "ldr pc, [pc, ...]", right?
Forcing ARM mode like this isn't OK in all configurations. For example,
when building for WinRT/Windows Phone 8, everything really has to be built
in Thumb mode; the linker there doesn't handle everything needed for
mixing the two modes.
Would it be acceptable to build and run this code only if CONFIG_THUMB is
disabled? That's the case for most Raspberry Pi builds at least, although
I guess it would mean this code isn't used at all on other builds, e.g.
ARMv7 on Linux, where it could still have been beneficial.
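
To make the suggestion concrete, here is a rough sketch of what gating on CONFIG_THUMB could look like on the init side. It is not taken from the patch: filter_fn, filter_c, filter_arm_stub, select_filter and the cpu_has_armv5te flag are all hypothetical stand-ins, and CONFIG_THUMB is assumed to be a 0/1 macro provided by the build configuration.

```c
#include <assert.h>
#include <stddef.h>

#ifndef CONFIG_THUMB
#define CONFIG_THUMB 0   /* assumption: 1 when the whole build targets Thumb */
#endif

typedef void (*filter_fn)(int *samples, int n);

/* Stand-in for the portable C implementation. */
static void filter_c(int *samples, int n)
{
    for (int i = 0; i < n; i++)
        samples[i] &= ~1;
}

#if !CONFIG_THUMB
/* Stand-in for the ARM-mode assembly; only built when Thumb is off. */
static void filter_arm_stub(int *samples, int n)
{
    for (int i = n - 1; i >= 0; i--)
        samples[i] &= ~1;
}
#endif

/* Pick the implementation; the ARM version is never referenced in Thumb builds. */
static filter_fn select_filter(int cpu_has_armv5te)
{
    filter_fn fn = filter_c;             /* portable default */
#if !CONFIG_THUMB
    if (cpu_has_armv5te)
        fn = filter_arm_stub;
#else
    (void)cpu_has_armv5te;               /* unused in Thumb builds */
#endif
    assert(fn != NULL);
    return fn;
}
```

With a guard like this, Thumb-only configurations would simply fall back to the C version, at the cost of losing the speedup on Thumb builds that could have mixed modes.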
// Martin