On Wed, 9 Dec 2015, Janne Grunau wrote:

Quite a bit faster than int32_to_float_fmul_array8_c calling
ff_int32_to_float_fmul_scalar_neon through FmtConvertContext.
Number of cycles per int32_to_float_fmul_array8 call while decoding
padded.dts on exynos5422:

              before  after   change
cortex-a7:     1270     951    -25%
cortex-a15:     434     285    -34%

checkasm --bench cycle counts:     cortex-a15   cortex-a7
int32_to_float_fmul_array8_c:      1730.4       4384.5
int32_to_float_fmul_array8_neon_c:  571.5       1694.3
int32_to_float_fmul_array8_neon:    374.0       1448.8

Interesting are the differences between
int32_to_float_fmul_array8_neon_c and int32_to_float_fmul_array8_neon.
The former is current behaviour of calling
ff_int32_to_float_fmul_scalar_neon repeatedly from the c function,
The raw numbers differ since checkasm uses different lengths than the
dca decoder.
---
libavcodec/arm/fmtconvert_init_arm.c |  4 ++++
libavcodec/arm/fmtconvert_neon.S     | 37 ++++++++++++++++++++++++++++++++++++
2 files changed, 41 insertions(+)

Ok

// Martin
_______________________________________________
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel

Reply via email to