Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize float yuv2plane1
On Mon, Dec 24, 2018 at 07:39:18PM +0200, Lauri Kasanen wrote:
> On Sun, 16 Dec 2018 11:06:53 +0200
> Lauri Kasanen wrote:
>
> > This function wouldn't benefit from VSX instructions, so I put it
> > under altivec.
> >
> > ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt
> > grayf32le \
> > -f null -vframes 100 -v error -nostats -
> >
> > 3743 UNITS in planar1, 65495 runs, 41 skips
> >
> > -cpuflags 0
> >
> > 23511 UNITS in planar1, 65530 runs, 6 skips
> >
> > grayf32be
> >
> > 4647 UNITS in planar1, 65449 runs, 87 skips
> >
> > -cpuflags 0
> >
> > 28608 UNITS in planar1, 65530 runs, 6 skips
> >
> > The native speedup is 6.28133, and the bswapping one 6.15623.
> > Fate passes, each format tested with an image to video conversion.
> >
> > Signed-off-by: Lauri Kasanen
> > ---
> >
> > Tested on POWER8 LE. Testing on earlier ppc and/or BE appreciated.
> >
> > v2: Added #undef vzero, that define broke the build on older gcc. Thanks
> > Michael
>
> Ping. And of course it's not gcc version dependant, but rather it was
> the BE ifdef; it was too early in the morning.

seems working, will apply

thx

[...]
--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize float yuv2plane1
On Sun, 16 Dec 2018 11:06:53 +0200
Lauri Kasanen wrote:

> This function wouldn't benefit from VSX instructions, so I put it
> under altivec.
>
> ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt
> grayf32le \
> -f null -vframes 100 -v error -nostats -
>
> 3743 UNITS in planar1, 65495 runs, 41 skips
>
> -cpuflags 0
>
> 23511 UNITS in planar1, 65530 runs, 6 skips
>
> grayf32be
>
> 4647 UNITS in planar1, 65449 runs, 87 skips
>
> -cpuflags 0
>
> 28608 UNITS in planar1, 65530 runs, 6 skips
>
> The native speedup is 6.28133, and the bswapping one 6.15623.
> Fate passes, each format tested with an image to video conversion.
>
> Signed-off-by: Lauri Kasanen
> ---
>
> Tested on POWER8 LE. Testing on earlier ppc and/or BE appreciated.
>
> v2: Added #undef vzero, that define broke the build on older gcc. Thanks
> Michael

Ping. And of course it's not gcc version dependant, but rather it was
the BE ifdef; it was too early in the morning.

- Lauri
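For anyone wondering why the v2 `#undef vzero` matters: the patch context defines `vzero` as a macro inside the big-endian block, and the new functions declare a local variable with the same name. The sketch below is only my reading of that breakage, not something stated in the thread. It uses a plain `0` as a stand-in for `vec_splat_s32(0)` so it builds without AltiVec, and the function names `uses_macro`, `uses_local` and `self_check` are hypothetical.

```c
#include <assert.h>

/* Stand-in for the macro from the big-endian block of the patch context
 * (there it expands to vec_splat_s32(0); a plain 0 keeps this portable). */
#define vzero 0

static int uses_macro(void)
{
    return vzero;               /* preprocessed to "return 0;" */
}

/* Without this #undef, the declaration in uses_local() would be
 * preprocessed into "const float 0 = 0.0f;" and fail to compile. */
#undef vzero

static float uses_local(void)
{
    const float vzero = 0.0f;   /* ordinary local; the name is free again */
    return vzero + 1.0f;
}

/* Hypothetical helper that exercises both functions. */
static void self_check(void)
{
    assert(uses_macro() == 0);
    assert(uses_local() == 1.0f);
}
```

Under this reading the collision only shows up where the macro is actually defined, which would fit the remark that the cause was the BE ifdef rather than the gcc version.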
Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize float yuv2plane1
On Mon, 17 Dec 2018 14:52:49 +0100
Carl Eugen Hoyos wrote:

> >> Note that this function / this pix_fmt currently has no real use-case
> >> afaict.
> >
> > Is there a list of which pix fmts are useful? Of course I don't want to
> > waste both my and reviewers' time, if the format is considered for
> > removal or otherwise broken.
>
> The pix_fmt is not deprecated (it's new), what I meant was that it is
> currently only used for obscure monochrome Photoshop images
> and one filter, so I am not sure optimizing this colour conversion
> will help often.

Oh, thanks for the clarification. I'm going roughly in difficulty
order, doing the easy functions first.

- Lauri
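For context on what the conversion under discussion does per sample, here is a minimal portable sketch modelled on the scalar helper in the patch: round, shift right by 3, clamp to 16 bits, then scale into [0, 1]. The names `clip_u16` and `plane1_to_float` are hypothetical; `clip_u16` stands in for FFmpeg's `av_clip_uint16`, and the endian-aware store from the patch is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to the unsigned 16-bit range (stand-in for av_clip_uint16). */
static uint16_t clip_u16(int v)
{
    if (v < 0)
        return 0;
    if (v > 65535)
        return 65535;
    return (uint16_t)v;
}

/* Convert 19-bit intermediate samples to floats in [0, 1]. */
static void plane1_to_float(const int32_t *src, float *dest, int width)
{
    const int shift = 3;
    const float scale = 1.0f / 65535.0f;
    int i;

    assert(width >= 0);
    for (i = 0; i < width; i++) {
        /* Add half of the divisor so the shift rounds to nearest.
         * Note: >> on a negative int is implementation-defined in C;
         * mainstream compilers use an arithmetic shift. */
        int val = src[i] + (1 << (shift - 1));
        dest[i] = scale * (float)clip_u16(val >> shift);
    }
}
```

With this sketch an input of `65535 * 8` maps to roughly 1.0, zero maps to 0.0, and values beyond the 16-bit range saturate.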
Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize float yuv2plane1
2018-12-17 8:37 GMT+01:00, Lauri Kasanen:
> On Mon, 17 Dec 2018 01:03:36 +0100
> Carl Eugen Hoyos wrote:
>
>> 2018-12-16 10:06 GMT+01:00, Lauri Kasanen:
>> > This function wouldn't benefit from VSX instructions, so I put it
>> > under altivec.
>> >
>> > ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt
>> > grayf32le \
>> > -f null -vframes 100 -v error -nostats -
>> >
>> > 3743 UNITS in planar1, 65495 runs, 41 skips
>> >
>> > -cpuflags 0
>> >
>> > 23511 UNITS in planar1, 65530 runs, 6 skips
>> >
>> > grayf32be
>> >
>> > 4647 UNITS in planar1, 65449 runs, 87 skips
>> >
>> > -cpuflags 0
>> >
>> > 28608 UNITS in planar1, 65530 runs, 6 skips
>> >
>> > The native speedup is 6.28133, and the bswapping one 6.15623.
>>
>> > Fate passes
>>
>> I wonder a little how, given that grayf32 already breaks fate as-is...
>
> Are the tests for it disabled? fate.ffmpeg.org reports 100% success for
> many platforms.

Iirc, it is broken with --disable-sse

>> Note that this function / this pix_fmt currently has no real use-case
>> afaict.
>
> Is there a list of which pix fmts are useful? Of course I don't want to
> waste both my and reviewers' time, if the format is considered for
> removal or otherwise broken.

The pix_fmt is not deprecated (it's new), what I meant was that it is
currently only used for obscure monochrome Photoshop images
and one filter, so I am not sure optimizing this colour conversion
will help often.

But this is of course not very much related to this patch, sorry
for the noise!

Carl Eugen
Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize float yuv2plane1
On Mon, 17 Dec 2018 01:03:36 +0100
Carl Eugen Hoyos wrote:

> 2018-12-16 10:06 GMT+01:00, Lauri Kasanen:
> > This function wouldn't benefit from VSX instructions, so I put it
> > under altivec.
> >
> > ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt
> > grayf32le \
> > -f null -vframes 100 -v error -nostats -
> >
> > 3743 UNITS in planar1, 65495 runs, 41 skips
> >
> > -cpuflags 0
> >
> > 23511 UNITS in planar1, 65530 runs, 6 skips
> >
> > grayf32be
> >
> > 4647 UNITS in planar1, 65449 runs, 87 skips
> >
> > -cpuflags 0
> >
> > 28608 UNITS in planar1, 65530 runs, 6 skips
> >
> > The native speedup is 6.28133, and the bswapping one 6.15623.
>
> > Fate passes
>
> I wonder a little how, given that grayf32 already breaks fate as-is...

Are the tests for it disabled? fate.ffmpeg.org reports 100% success for
many platforms.

> Note that this function / this pix_fmt currently has no real use-case
> afaict.

Is there a list of which pix fmts are useful? Of course I don't want to
waste both my and reviewers' time, if the format is considered for
removal or otherwise broken.

- Lauri
Re: [FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize float yuv2plane1
2018-12-16 10:06 GMT+01:00, Lauri Kasanen:
> This function wouldn't benefit from VSX instructions, so I put it
> under altivec.
>
> ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt
> grayf32le \
> -f null -vframes 100 -v error -nostats -
>
> 3743 UNITS in planar1, 65495 runs, 41 skips
>
> -cpuflags 0
>
> 23511 UNITS in planar1, 65530 runs, 6 skips
>
> grayf32be
>
> 4647 UNITS in planar1, 65449 runs, 87 skips
>
> -cpuflags 0
>
> 28608 UNITS in planar1, 65530 runs, 6 skips
>
> The native speedup is 6.28133, and the bswapping one 6.15623.

> Fate passes

I wonder a little how, given that grayf32 already breaks fate as-is...

Note that this function / this pix_fmt currently has no real use-case
afaict.

Carl Eugen
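The two speedup figures quoted above follow directly from the timer output, assuming the UNITS values are average per-call costs where lower is better. A minimal sketch of the arithmetic (the function name `speedup` is hypothetical):

```c
#include <assert.h>

/* Ratio of the baseline cost (-cpuflags 0) to the optimized cost. */
static double speedup(double baseline_units, double optimized_units)
{
    assert(optimized_units > 0.0);
    return baseline_units / optimized_units;
}
```

Here `speedup(23511, 3743)` gives about 6.2813 for grayf32le and `speedup(28608, 4647)` about 6.1562 for grayf32be, matching the native and bswapping numbers in the patch description.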
[FFmpeg-devel] [PATCH v2] swscale/output: Altivec-optimize float yuv2plane1
This function wouldn't benefit from VSX instructions, so I put it
under altivec.

./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt grayf32le \
-f null -vframes 100 -v error -nostats -

3743 UNITS in planar1, 65495 runs, 41 skips

-cpuflags 0

23511 UNITS in planar1, 65530 runs, 6 skips

grayf32be

4647 UNITS in planar1, 65449 runs, 87 skips

-cpuflags 0

28608 UNITS in planar1, 65530 runs, 6 skips

The native speedup is 6.28133, and the bswapping one 6.15623.
Fate passes, each format tested with an image to video conversion.

Signed-off-by: Lauri Kasanen
---

Tested on POWER8 LE. Testing on earlier ppc and/or BE appreciated.

v2: Added #undef vzero, that define broke the build on older gcc. Thanks
Michael

 libswscale/ppc/swscale_altivec.c | 141 ++-
 1 file changed, 139 insertions(+), 2 deletions(-)

diff --git a/libswscale/ppc/swscale_altivec.c b/libswscale/ppc/swscale_altivec.c
index 1d2b2fa..d72ed1e 100644
--- a/libswscale/ppc/swscale_altivec.c
+++ b/libswscale/ppc/swscale_altivec.c
@@ -31,7 +31,8 @@
 #include "yuv2rgb_altivec.h"
 #include "libavutil/ppc/util_altivec.h"

-#if HAVE_ALTIVEC && HAVE_BIGENDIAN
+#if HAVE_ALTIVEC
+#if HAVE_BIGENDIAN
 #define vzero vec_splat_s32(0)

 #define GET_LS(a,b,c,s) {\
@@ -102,7 +103,137 @@
 #include "swscale_ppc_template.c"
 #undef FUNC

-#endif /* HAVE_ALTIVEC && HAVE_BIGENDIAN */
+#undef vzero
+
+#endif /* HAVE_BIGENDIAN */
+
+#define output_pixel(pos, val, bias, signedness) \
+    if (big_endian) { \
+        AV_WB16(pos, bias + av_clip_ ## signedness ## 16(val >> shift)); \
+    } else { \
+        AV_WL16(pos, bias + av_clip_ ## signedness ## 16(val >> shift)); \
+    }
+
+static void
+yuv2plane1_float_u(const int32_t *src, float *dest, int dstW, int start)
+{
+    static const int big_endian = HAVE_BIGENDIAN;
+    static const int shift = 3;
+    static const float float_mult = 1.0f / 65535.0f;
+    int i, val;
+    uint16_t val_uint;
+
+    for (i = start; i < dstW; ++i){
+        val = src[i] + (1 << (shift - 1));
+        output_pixel(&val_uint, val, 0, uint);
+        dest[i] = float_mult * (float)val_uint;
+    }
+}
+
+static void
+yuv2plane1_float_bswap_u(const int32_t *src, uint32_t *dest, int dstW, int start)
+{
+    static const int big_endian = HAVE_BIGENDIAN;
+    static const int shift = 3;
+    static const float float_mult = 1.0f / 65535.0f;
+    int i, val;
+    uint16_t val_uint;
+
+    for (i = start; i < dstW; ++i){
+        val = src[i] + (1 << (shift - 1));
+        output_pixel(&val_uint, val, 0, uint);
+        dest[i] = av_bswap32(av_float2int(float_mult * (float)val_uint));
+    }
+}
+
+static void yuv2plane1_float_altivec(const int32_t *src, float *dest, int dstW)
+{
+    const int dst_u = -(uintptr_t)dest & 3;
+    const int shift = 3;
+    const int add = (1 << (shift - 1));
+    const int clip = (1 << 16) - 1;
+    const float fmult = 1.0f / 65535.0f;
+    const vector uint32_t vadd = (vector uint32_t) {add, add, add, add};
+    const vector uint32_t vshift = (vector uint32_t) vec_splat_u32(shift);
+    const vector uint32_t vlargest = (vector uint32_t) {clip, clip, clip, clip};
+    const vector float vmul = (vector float) {fmult, fmult, fmult, fmult};
+    const vector float vzero = (vector float) {0, 0, 0, 0};
+    vector uint32_t v;
+    vector float vd;
+    int i;
+
+    yuv2plane1_float_u(src, dest, dst_u, 0);
+
+    for (i = dst_u; i < dstW - 3; i += 4) {
+        v = vec_ld(0, (const uint32_t *) &src[i]);
+        v = vec_add(v, vadd);
+        v = vec_sr(v, vshift);
+        v = vec_min(v, vlargest);
+
+        vd = vec_ctf(v, 0);
+        vd = vec_madd(vd, vmul, vzero);
+
+        vec_st(vd, 0, &dest[i]);
+    }
+
+    yuv2plane1_float_u(src, dest, dstW, i);
+}
+
+static void yuv2plane1_float_bswap_altivec(const int32_t *src, uint32_t *dest, int dstW)
+{
+    const int dst_u = -(uintptr_t)dest & 3;
+    const int shift = 3;
+    const int add = (1 << (shift - 1));
+    const int clip = (1 << 16) - 1;
+    const float fmult = 1.0f / 65535.0f;
+    const vector uint32_t vadd = (vector uint32_t) {add, add, add, add};
+    const vector uint32_t vshift = (vector uint32_t) vec_splat_u32(shift);
+    const vector uint32_t vlargest = (vector uint32_t) {clip, clip, clip, clip};
+    const vector float vmul = (vector float) {fmult, fmult, fmult, fmult};
+    const vector float vzero = (vector float) {0, 0, 0, 0};
+    const vector uint32_t vswapbig = (vector uint32_t) {16, 16, 16, 16};
+    const vector uint16_t vswapsmall = vec_splat_u16(8);
+    vector uint32_t v;
+    vector float vd;
+    int i;
+
+    yuv2plane1_float_bswap_u(src, dest, dst_u, 0);
+
+    for (i = dst_u; i < dstW - 3; i += 4) {
+        v = vec_ld(0, (const uint32_t *) &src[i]);
+        v = vec_add(v, vadd);
+        v = vec_sr(v, vshift);
+        v = vec_min(v, vlargest);
+
+        vd = vec_ctf(v, 0);
+        vd = vec_madd(vd, vmul, vzero);
+
+        vd = (vector fl