Hi,

On Tue, Jul 5, 2016 at 10:37 PM, Dan Parrot <dan.par...@mail.com> wrote:
> rgb24ToY_c 0.92

OK, so let's be data-driven from now on; I really don't like this name-calling and stuff. Your speedup on average is close to 1, so let's compare this to x86. I ran this patch:

diff --git a/libswscale/hscale.c b/libswscale/hscale.c
index eca0635..5d0b39d 100644
--- a/libswscale/hscale.c
+++ b/libswscale/hscale.c
@@ -105,7 +105,9 @@ static int lum_convert(SwsContext *c, SwsFilterDescriptor *desc, int sliceY, int
         uint8_t * dst = desc->dst->plane[0].line[i];
         if (c->lumToYV12) {
+START_TIMER
             c->lumToYV12(dst, src[0], src[1], src[2], srcW, pal);
+STOP_TIMER("rgb24toy");
         } else if (c->readLumPlanar) {
             c->readLumPlanar(dst, src, srcW, c->input_rgb2yuv_table);
         }

And then I ran these command lines:

$ ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p -f null -vframes 100 -v error -nostats - 2>&1 | tail -n1
13890 decicycles in rgb24toy, 65428 runs, 108 skips

$ ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -cpuflags 0 -s hd1080 -i /dev/zero -pix_fmt yuv420p -f null -vframes 100 -v error -nostats - 2>&1 | tail -n1
62186 decicycles in rgb24toy, 65497 runs, 39 skips

As you can see, I get a ~4x speedup in this function from using the SIMD version (ff_rgb24ToY_avx) instead of the C equivalent (rgb24ToY_c). The AVX function uses a register width of 16 bytes (i.e. not avx2). For PPC64, whose altivec instruction set has the same register width, I'd expect a roughly equal speedup.

I now want to figure out why you're not seeing a ~4x speedup in your altivec/ppc64 implementation of rgb24ToY; hopefully that can serve as a template for understanding why, in general, you're not seeing any speedups.

Ronald
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel