Hi,

On Wed, Jul 13, 2016 at 12:37 PM, Ronald S. Bultje <rsbul...@gmail.com> wrote:
> About 1.8x speedup compared to AVX version for full IDCT. Other
> sub-IDCT scenarios also see speedups. Full --bench output for
> idct_32x32_add_{bpp}_${subidct}_${opt} (50k cycles):
>
> nop: 16.5
> vp9_inv_dct_dct_32x32_add_8_1_c: 2284.4
> vp9_inv_dct_dct_32x32_add_8_1_sse2: 145.0
> vp9_inv_dct_dct_32x32_add_8_1_ssse3: 137.4
> vp9_inv_dct_dct_32x32_add_8_1_avx: 137.1
> vp9_inv_dct_dct_32x32_add_8_1_avx2: 73.2
> vp9_inv_dct_dct_32x32_add_8_2_c: 14680.8
> vp9_inv_dct_dct_32x32_add_8_2_sse2: 2617.2
> vp9_inv_dct_dct_32x32_add_8_2_ssse3: 982.9
> vp9_inv_dct_dct_32x32_add_8_2_avx: 958.5
> vp9_inv_dct_dct_32x32_add_8_2_avx2: 704.2
> vp9_inv_dct_dct_32x32_add_8_4_c: 14443.1
> vp9_inv_dct_dct_32x32_add_8_4_sse2: 2717.1
> vp9_inv_dct_dct_32x32_add_8_4_ssse3: 965.7
> vp9_inv_dct_dct_32x32_add_8_4_avx: 1000.7
> vp9_inv_dct_dct_32x32_add_8_4_avx2: 717.1
> vp9_inv_dct_dct_32x32_add_8_8_c: 14436.4
> vp9_inv_dct_dct_32x32_add_8_8_sse2: 2671.8
> vp9_inv_dct_dct_32x32_add_8_8_ssse3: 1038.5
> vp9_inv_dct_dct_32x32_add_8_8_avx: 983.0
> vp9_inv_dct_dct_32x32_add_8_8_avx2: 729.4
> vp9_inv_dct_dct_32x32_add_8_16_c: 14614.7
> vp9_inv_dct_dct_32x32_add_8_16_sse2: 2701.7
> vp9_inv_dct_dct_32x32_add_8_16_ssse3: 1334.4
> vp9_inv_dct_dct_32x32_add_8_16_avx: 1276.7
> vp9_inv_dct_dct_32x32_add_8_16_avx2: 719.5
> vp9_inv_dct_dct_32x32_add_8_32_c: 14363.6
> vp9_inv_dct_dct_32x32_add_8_32_sse2: 2575.6
> vp9_inv_dct_dct_32x32_add_8_32_ssse3: 2633.9
> vp9_inv_dct_dct_32x32_add_8_32_avx: 2539.6
> vp9_inv_dct_dct_32x32_add_8_32_avx2: 1395.0
> ---
>  libavcodec/x86/vp9dsp_init.c |   2 +
>  libavcodec/x86/vp9itxfm.asm  | 223 ++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 222 insertions(+), 3 deletions(-)

Ping.

Ronald
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel