[FFmpeg-devel] [PATCH] zscale video filter performance optimization 4x
Optimizations: by ffmpeg threading support implementation via frame slicing
and moving zimg_filter_graph_build to the filter initialization phase
from each frame processing
the performance increase vs original version
in video downscale and color conversion up to 4x is seen
on 64 cores Intel Xeon, 3x on i7-6700K (4 cores with HT)

Signed-off-by: Victoria Zhislina
---
 libavfilter/vf_zscale.c | 779 ++--
 1 file changed, 433 insertions(+), 346 deletions(-)

diff --git a/libavfilter/vf_zscale.c b/libavfilter/vf_zscale.c
index 1288c5efc1..1a2de1fe21 100644
--- a/libavfilter/vf_zscale.c
+++ b/libavfilter/vf_zscale.c
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2015 Paul B Mahol
- *
+ * * 2022 Victoria Zhislina, Intel Corporation - performance optimization
+
  * This file is part of FFmpeg.
  *
  * FFmpeg is free software; you can redistribute it and/or
@@ -44,6 +45,8 @@
 #include "libavutil/imgutils.h"
 
 #define ZIMG_ALIGNMENT 32
+#define MIN_TILESIZE 64
+#define MAX_THREADS 64
 
 static const char *const var_names[] = {
     "in_w", "iw",
@@ -113,13 +116,14 @@ typedef struct ZScaleContext {
     int force_original_aspect_ratio;
 
-    void *tmp;
-    size_t tmp_size;
+    void *tmp[MAX_THREADS]; //separate for each thread;
+    int nb_threads;
+    int slice_h;
 
     zimg_image_format src_format, dst_format;
     zimg_image_format alpha_src_format, alpha_dst_format;
     zimg_graph_builder_params alpha_params, params;
-    zimg_filter_graph *alpha_graph, *graph;
+    zimg_filter_graph *alpha_graph[MAX_THREADS], *graph[MAX_THREADS];
 
     enum AVColorSpace in_colorspace, out_colorspace;
     enum AVColorTransferCharacteristic in_trc, out_trc;
@@ -128,10 +132,167 @@ typedef struct ZScaleContext {
     enum AVChromaLocation in_chromal, out_chromal;
 } ZScaleContext;
 
+
+typedef struct ThreadData {
+    const AVPixFmtDescriptor *desc, *odesc;
+    AVFrame *in, *out;
+} ThreadData;
+
+static int convert_chroma_location(enum AVChromaLocation chroma_location)
+{
+    switch (chroma_location) {
+    case AVCHROMA_LOC_UNSPECIFIED:
+    case AVCHROMA_LOC_LEFT:
+        return ZIMG_CHROMA_LEFT;
+    case AVCHROMA_LOC_CENTER:
+        return ZIMG_CHROMA_CENTER;
+    case AVCHROMA_LOC_TOPLEFT:
+        return ZIMG_CHROMA_TOP_LEFT;
+    case AVCHROMA_LOC_TOP:
+        return ZIMG_CHROMA_TOP;
+    case AVCHROMA_LOC_BOTTOMLEFT:
+        return ZIMG_CHROMA_BOTTOM_LEFT;
+    case AVCHROMA_LOC_BOTTOM:
+        return ZIMG_CHROMA_BOTTOM;
+    }
+    return ZIMG_CHROMA_LEFT;
+}
+
+static int convert_matrix(enum AVColorSpace colorspace)
+{
+    switch (colorspace) {
+    case AVCOL_SPC_RGB:
+        return ZIMG_MATRIX_RGB;
+    case AVCOL_SPC_BT709:
+        return ZIMG_MATRIX_709;
+    case AVCOL_SPC_UNSPECIFIED:
+        return ZIMG_MATRIX_UNSPECIFIED;
+    case AVCOL_SPC_FCC:
+        return ZIMG_MATRIX_FCC;
+    case AVCOL_SPC_BT470BG:
+        return ZIMG_MATRIX_470BG;
+    case AVCOL_SPC_SMPTE170M:
+        return ZIMG_MATRIX_170M;
+    case AVCOL_SPC_SMPTE240M:
+        return ZIMG_MATRIX_240M;
+    case AVCOL_SPC_YCGCO:
+        return ZIMG_MATRIX_YCGCO;
+    case AVCOL_SPC_BT2020_NCL:
+        return ZIMG_MATRIX_2020_NCL;
+    case AVCOL_SPC_BT2020_CL:
+        return ZIMG_MATRIX_2020_CL;
+    case AVCOL_SPC_CHROMA_DERIVED_NCL:
+        return ZIMG_MATRIX_CHROMATICITY_DERIVED_NCL;
+    case AVCOL_SPC_CHROMA_DERIVED_CL:
+        return ZIMG_MATRIX_CHROMATICITY_DERIVED_CL;
+    case AVCOL_SPC_ICTCP:
+        return ZIMG_MATRIX_ICTCP;
+    }
+    return ZIMG_MATRIX_UNSPECIFIED;
+}
+
+static int convert_trc(enum AVColorTransferCharacteristic color_trc)
+{
+    switch (color_trc) {
+    case AVCOL_TRC_UNSPECIFIED:
+        return ZIMG_TRANSFER_UNSPECIFIED;
+    case AVCOL_TRC_BT709:
+        return ZIMG_TRANSFER_709;
+    case AVCOL_TRC_GAMMA22:
+        return ZIMG_TRANSFER_470_M;
+    case AVCOL_TRC_GAMMA28:
+        return ZIMG_TRANSFER_470_BG;
+    case AVCOL_TRC_SMPTE170M:
+        return ZIMG_TRANSFER_601;
+    case AVCOL_TRC_SMPTE240M:
+        return ZIMG_TRANSFER_240M;
+    case AVCOL_TRC_LINEAR:
+        return ZIMG_TRANSFER_LINEAR;
+    case AVCOL_TRC_LOG:
+        return ZIMG_TRANSFER_LOG_100;
+    case AVCOL_TRC_LOG_SQRT:
+        return ZIMG_TRANSFER_LOG_316;
+    case AVCOL_TRC_IEC61966_2_4:
+        return ZIMG_TRANSFER_IEC_61966_2_4;
+    case AVCOL_TRC_BT2020_10:
+        return ZIMG_TRANSFER_2020_10;
+    case AVCOL_TRC_BT2020_12:
+        return ZIMG_TRANSFER_2020_12;
+    case AVCOL_TRC_SMPTE2084:
+        return ZIMG_TRANSFER_ST2084;
+    case AVCOL_TRC_ARIB_STD_B67:
+        return ZIMG_TRANSFER_ARIB_B67;
+    case AVCOL_TRC_IEC61966_2_1:
+        return ZIMG_TRANSFER_IEC_61966_2_1;
+    }
+    return ZIMG_TRANSFER_UNSPECIFIED;
+}
+
+static int convert_primaries(enum AVColorPrimaries color_primaries)
+{
+    switch (color_primaries) {
+    case AVCOL_PRI_UNSPECIFIED:
+        return ZIMG_PRIMARIES_UNSPECIFIED;
+    case AV
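The MIN_TILESIZE and MAX_THREADS constants and the nb_threads and slice_h fields added above suggest that the frame is cut into horizontal slices, one per thread. The following is only a minimal sketch of how such values could be chosen; it is not taken from the patch. The helper names pick_nb_threads and pick_slice_h, and the choice to round the slice height up to an even number, are assumptions made for illustration.

```c
#include <assert.h>

#define MIN_TILESIZE 64   /* from the patch: smallest slice height worth a thread */
#define MAX_THREADS  64   /* from the patch: size of the per-thread arrays */

/* Hypothetical helper (not in the patch): choose how many slices to use for
 * an output frame of out_h rows when avail_threads threads are available. */
static int pick_nb_threads(int out_h, int avail_threads)
{
    int by_height, n;

    assert(out_h > 0 && avail_threads > 0);
    by_height = out_h / MIN_TILESIZE;   /* each slice gets at least MIN_TILESIZE rows */
    n = avail_threads < by_height ? avail_threads : by_height;
    if (n < 1)
        n = 1;                          /* tiny frames still get one job */
    if (n > MAX_THREADS)
        n = MAX_THREADS;                /* never index past the per-thread arrays */
    return n;
}

/* Hypothetical helper (not in the patch): rows per slice, rounded up to an
 * even number so that vertically subsampled chroma rows are not split
 * between two slices. The even rounding is an assumption of this sketch. */
static int pick_slice_h(int out_h, int nb_threads)
{
    int h = (out_h + nb_threads - 1) / nb_threads;  /* ceiling division */
    return (h + 1) & ~1;
}
```

With these assumptions the last slice may be shorter than the others, so a worker would need to clamp its end row to the frame height.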
Re: [FFmpeg-devel] [PATCH] zscale video filter performance optimization 4x
Paul, thanks a lot for reviewing.
The answer is yes: the move improves single-threaded performance by up to 30%, so it is necessary.
I wasn't aware of cases where width/height can change between frames; I have never seen them in real life. But you are right, and I will change my code accordingly so that it re-initializes when such a change happens.
Thanks again!

On Fri, Feb 4, 2022 at 8:17 PM Paul B Mahol wrote:

> Is moving all this code really needed?
>
> Note that width/height can change between frames so that is supported
> by scale filter.
> With your change that is not more possible?

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH] libavfilter: zscale performance optimization >4x
By ffmpeg threading support implementation via frame slicing and doing
zimg_filter_graph_build that used to take 30-60% of each frame processing
only if necessary (some parameters changed)
the performance increase vs original version
in video downscale and color conversion >4x is seen
on 64 cores Intel Xeon, 3x on i7-6700K (4 cores with HT)

Signed-off-by: Victoria Zhislina
---
 libavfilter/vf_zscale.c | 786
 1 file changed, 475 insertions(+), 311 deletions(-)

diff --git a/libavfilter/vf_zscale.c b/libavfilter/vf_zscale.c
index 1288c5efc1..ce4c0b2c76 100644
--- a/libavfilter/vf_zscale.c
+++ b/libavfilter/vf_zscale.c
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2015 Paul B Mahol
- *
+ * * 2022 Victoria Zhislina, Intel - performance optimization
+
  * This file is part of FFmpeg.
  *
  * FFmpeg is free software; you can redistribute it and/or
@@ -44,6 +45,8 @@
 #include "libavutil/imgutils.h"
 
 #define ZIMG_ALIGNMENT 32
+#define MIN_TILESIZE 64
+#define MAX_THREADS 64
 
 static const char *const var_names[] = {
     "in_w", "iw",
@@ -113,13 +116,17 @@ typedef struct ZScaleContext {
     int force_original_aspect_ratio;
 
-    void *tmp;
-    size_t tmp_size;
+    void *tmp[MAX_THREADS]; //separate for each thread;
+    int nb_threads;
+    int slice_h;
 
     zimg_image_format src_format, dst_format;
     zimg_image_format alpha_src_format, alpha_dst_format;
+    zimg_image_format src_format_tmp, dst_format_tmp;
+    zimg_image_format alpha_src_format_tmp, alpha_dst_format_tmp;
     zimg_graph_builder_params alpha_params, params;
-    zimg_filter_graph *alpha_graph, *graph;
+    zimg_graph_builder_params alpha_params_tmp, params_tmp;
+    zimg_filter_graph *alpha_graph[MAX_THREADS], *graph[MAX_THREADS];
 
     enum AVColorSpace in_colorspace, out_colorspace;
     enum AVColorTransferCharacteristic in_trc, out_trc;
@@ -128,10 +135,181 @@ typedef struct ZScaleContext {
     enum AVChromaLocation in_chromal, out_chromal;
 } ZScaleContext;
 
+
+typedef struct ThreadData {
+    const AVPixFmtDescriptor *desc, *odesc;
+    AVFrame *in, *out;
+} ThreadData;
+
+static int convert_chroma_location(enum AVChromaLocation chroma_location)
+{
+    switch (chroma_location) {
+    case AVCHROMA_LOC_UNSPECIFIED:
+    case AVCHROMA_LOC_LEFT:
+        return ZIMG_CHROMA_LEFT;
+    case AVCHROMA_LOC_CENTER:
+        return ZIMG_CHROMA_CENTER;
+    case AVCHROMA_LOC_TOPLEFT:
+        return ZIMG_CHROMA_TOP_LEFT;
+    case AVCHROMA_LOC_TOP:
+        return ZIMG_CHROMA_TOP;
+    case AVCHROMA_LOC_BOTTOMLEFT:
+        return ZIMG_CHROMA_BOTTOM_LEFT;
+    case AVCHROMA_LOC_BOTTOM:
+        return ZIMG_CHROMA_BOTTOM;
+    }
+    return ZIMG_CHROMA_LEFT;
+}
+
+static int convert_matrix(enum AVColorSpace colorspace)
+{
+    switch (colorspace) {
+    case AVCOL_SPC_RGB:
+        return ZIMG_MATRIX_RGB;
+    case AVCOL_SPC_BT709:
+        return ZIMG_MATRIX_709;
+    case AVCOL_SPC_UNSPECIFIED:
+        return ZIMG_MATRIX_UNSPECIFIED;
+    case AVCOL_SPC_FCC:
+        return ZIMG_MATRIX_FCC;
+    case AVCOL_SPC_BT470BG:
+        return ZIMG_MATRIX_470BG;
+    case AVCOL_SPC_SMPTE170M:
+        return ZIMG_MATRIX_170M;
+    case AVCOL_SPC_SMPTE240M:
+        return ZIMG_MATRIX_240M;
+    case AVCOL_SPC_YCGCO:
+        return ZIMG_MATRIX_YCGCO;
+    case AVCOL_SPC_BT2020_NCL:
+        return ZIMG_MATRIX_2020_NCL;
+    case AVCOL_SPC_BT2020_CL:
+        return ZIMG_MATRIX_2020_CL;
+    case AVCOL_SPC_CHROMA_DERIVED_NCL:
+        return ZIMG_MATRIX_CHROMATICITY_DERIVED_NCL;
+    case AVCOL_SPC_CHROMA_DERIVED_CL:
+        return ZIMG_MATRIX_CHROMATICITY_DERIVED_CL;
+    case AVCOL_SPC_ICTCP:
+        return ZIMG_MATRIX_ICTCP;
+    }
+    return ZIMG_MATRIX_UNSPECIFIED;
+}
+
+static int convert_trc(enum AVColorTransferCharacteristic color_trc)
+{
+    switch (color_trc) {
+    case AVCOL_TRC_UNSPECIFIED:
+        return ZIMG_TRANSFER_UNSPECIFIED;
+    case AVCOL_TRC_BT709:
+        return ZIMG_TRANSFER_709;
+    case AVCOL_TRC_GAMMA22:
+        return ZIMG_TRANSFER_470_M;
+    case AVCOL_TRC_GAMMA28:
+        return ZIMG_TRANSFER_470_BG;
+    case AVCOL_TRC_SMPTE170M:
+        return ZIMG_TRANSFER_601;
+    case AVCOL_TRC_SMPTE240M:
+        return ZIMG_TRANSFER_240M;
+    case AVCOL_TRC_LINEAR:
+        return ZIMG_TRANSFER_LINEAR;
+    case AVCOL_TRC_LOG:
+        return ZIMG_TRANSFER_LOG_100;
+    case AVCOL_TRC_LOG_SQRT:
+        return ZIMG_TRANSFER_LOG_316;
+    case AVCOL_TRC_IEC61966_2_4:
+        return ZIMG_TRANSFER_IEC_61966_2_4;
+    case AVCOL_TRC_BT2020_10:
+        return ZIMG_TRANSFER_2020_10;
+    case AVCOL_TRC_BT2020_12:
+        return ZIMG_TRANSFER_2020_12;
+    case AVCOL_TRC_SMPTE2084:
+        return ZIMG_TRANSFER_ST2084;
+    case AVCOL_TRC_ARIB_STD_B67:
+        return ZIMG_TRANSFER_ARIB_B67;
+    case AVCOL_TRC_IEC61966_2_1:
+        return ZIMG_TRANSFER_IEC_61966_2_1;
+    }
+    return ZIMG_TRANSFER_UNSPECIFIED;
+}
+
+static int conver
Re: [FFmpeg-devel] [PATCH] zscale video filter performance optimization 4x
Yes, yes, yes. Moreover, the graph should be rebuilt if some other parameters change (the interpolation method, etc.), and also when a single input is split by zscale into two streams of different resolutions. So I've submitted a new patch version that fixes everything.
It has a lot of checks inside, but they are absolutely necessary: they save 30-60% of performance in the single-threaded case!

On Wed, Feb 9, 2022 at 12:38 PM Guillaume POIRIER wrote:

> Hello Victoria,
>
> On Sun, 6 Feb 2022 at 16:12, Victoria Zhislina wrote:
>
> > I wasn't aware of the cases where width/height can change between frames -
> > never seen them in real life, but right you are, I will change my code
> > accordingly - to make re-init if some change happened.
>
> If you want to create such a sample, take 2 short HEVC clips with
> different resolutions, extract the ES, and concat them together:
>
> ffmpeg -i INPUT1.mp4 -codec copy -bsf:v hevc_mp4toannexb OUTPUT1.265
> ffmpeg -i INPUT2.mp4 -codec copy -bsf:v hevc_mp4toannexb OUTPUT2.265
>
> cat OUTPUT1.265 OUTPUT2.265 > mixed_res.265
>
> Guillaume
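The idea discussed in this message, rebuilding the graph only when the frame size or some other parameter has changed, can be illustrated with a small sketch. It does not reproduce the patch: GraphKey, GraphCache and ensure_graph are hypothetical names, and the four fields are simplified stand-ins for the zimg image formats and builder parameters that the real filter would compare.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the settings that determine a graph. */
typedef struct GraphKey {
    int width, height;
    int pixel_type;
    int resample_filter;
} GraphKey;

typedef struct GraphCache {
    bool     valid;   /* false until the first build */
    GraphKey key;     /* settings the cached graph was built for */
    int      builds;  /* number of times the expensive build step ran */
} GraphCache;

static bool key_differs(const GraphKey *a, const GraphKey *b)
{
    return a->width != b->width || a->height != b->height ||
           a->pixel_type != b->pixel_type ||
           a->resample_filter != b->resample_filter;
}

/* Called once per frame; returns true if a rebuild was needed. */
static bool ensure_graph(GraphCache *c, const GraphKey *wanted)
{
    if (c->valid && !key_differs(&c->key, wanted))
        return false;          /* nothing changed: reuse the cached graph */
    /* A real filter would free the old graph here and build a new one,
     * for example with zimg_filter_graph_build(). */
    c->key = *wanted;
    c->valid = true;
    c->builds++;
    return true;
}
```

Feeding this sketch a stream that switches resolution once, such as the mixed_res.265 sample described in the quoted message, would trigger exactly two builds: one for the first frame and one at the switch.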
[FFmpeg-devel] [PATCH] libavfilter: zscale performance optimization >4x
By ffmpeg threading support implementation via frame slicing and doing
zimg_filter_graph_build that used to take 30-60% of each frame processing
only if necessary (some parameters changed)
the performance increase vs original version
in video downscale and color conversion >4x is seen
on 64 cores Intel Xeon, 3x on i7-6700K (4 cores with HT)

Signed-off-by: Victoria Zhislina
---
 libavfilter/vf_zscale.c | 787
 1 file changed, 475 insertions(+), 312 deletions(-)

diff --git a/libavfilter/vf_zscale.c b/libavfilter/vf_zscale.c
index 1288c5efc1..ea2565025f 100644
--- a/libavfilter/vf_zscale.c
+++ b/libavfilter/vf_zscale.c
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2015 Paul B Mahol
- *
+ * 2022 Victoria Zhislina, Intel
+
  * This file is part of FFmpeg.
  *
  * FFmpeg is free software; you can redistribute it and/or
@@ -44,6 +45,8 @@
 #include "libavutil/imgutils.h"
 
 #define ZIMG_ALIGNMENT 32
+#define MIN_TILESIZE 64
+#define MAX_THREADS 64
 
 static const char *const var_names[] = {
     "in_w", "iw",
@@ -113,13 +116,17 @@ typedef struct ZScaleContext {
     int force_original_aspect_ratio;
 
-    void *tmp;
-    size_t tmp_size;
+    void *tmp[MAX_THREADS]; //separate for each thread;
+    int nb_threads;
+    int slice_h;
 
     zimg_image_format src_format, dst_format;
     zimg_image_format alpha_src_format, alpha_dst_format;
+    zimg_image_format src_format_tmp, dst_format_tmp;
+    zimg_image_format alpha_src_format_tmp, alpha_dst_format_tmp;
     zimg_graph_builder_params alpha_params, params;
-    zimg_filter_graph *alpha_graph, *graph;
+    zimg_graph_builder_params alpha_params_tmp, params_tmp;
+    zimg_filter_graph *alpha_graph[MAX_THREADS], *graph[MAX_THREADS];
 
     enum AVColorSpace in_colorspace, out_colorspace;
     enum AVColorTransferCharacteristic in_trc, out_trc;
@@ -128,10 +135,181 @@ typedef struct ZScaleContext {
     enum AVChromaLocation in_chromal, out_chromal;
 } ZScaleContext;
 
+
+typedef struct ThreadData {
+    const AVPixFmtDescriptor *desc, *odesc;
+    AVFrame *in, *out;
+} ThreadData;
+
+static int convert_chroma_location(enum AVChromaLocation chroma_location)
+{
+    switch (chroma_location) {
+    case AVCHROMA_LOC_UNSPECIFIED:
+    case AVCHROMA_LOC_LEFT:
+        return ZIMG_CHROMA_LEFT;
+    case AVCHROMA_LOC_CENTER:
+        return ZIMG_CHROMA_CENTER;
+    case AVCHROMA_LOC_TOPLEFT:
+        return ZIMG_CHROMA_TOP_LEFT;
+    case AVCHROMA_LOC_TOP:
+        return ZIMG_CHROMA_TOP;
+    case AVCHROMA_LOC_BOTTOMLEFT:
+        return ZIMG_CHROMA_BOTTOM_LEFT;
+    case AVCHROMA_LOC_BOTTOM:
+        return ZIMG_CHROMA_BOTTOM;
+    }
+    return ZIMG_CHROMA_LEFT;
+}
+
+static int convert_matrix(enum AVColorSpace colorspace)
+{
+    switch (colorspace) {
+    case AVCOL_SPC_RGB:
+        return ZIMG_MATRIX_RGB;
+    case AVCOL_SPC_BT709:
+        return ZIMG_MATRIX_709;
+    case AVCOL_SPC_UNSPECIFIED:
+        return ZIMG_MATRIX_UNSPECIFIED;
+    case AVCOL_SPC_FCC:
+        return ZIMG_MATRIX_FCC;
+    case AVCOL_SPC_BT470BG:
+        return ZIMG_MATRIX_470BG;
+    case AVCOL_SPC_SMPTE170M:
+        return ZIMG_MATRIX_170M;
+    case AVCOL_SPC_SMPTE240M:
+        return ZIMG_MATRIX_240M;
+    case AVCOL_SPC_YCGCO:
+        return ZIMG_MATRIX_YCGCO;
+    case AVCOL_SPC_BT2020_NCL:
+        return ZIMG_MATRIX_2020_NCL;
+    case AVCOL_SPC_BT2020_CL:
+        return ZIMG_MATRIX_2020_CL;
+    case AVCOL_SPC_CHROMA_DERIVED_NCL:
+        return ZIMG_MATRIX_CHROMATICITY_DERIVED_NCL;
+    case AVCOL_SPC_CHROMA_DERIVED_CL:
+        return ZIMG_MATRIX_CHROMATICITY_DERIVED_CL;
+    case AVCOL_SPC_ICTCP:
+        return ZIMG_MATRIX_ICTCP;
+    }
+    return ZIMG_MATRIX_UNSPECIFIED;
+}
+
+static int convert_trc(enum AVColorTransferCharacteristic color_trc)
+{
+    switch (color_trc) {
+    case AVCOL_TRC_UNSPECIFIED:
+        return ZIMG_TRANSFER_UNSPECIFIED;
+    case AVCOL_TRC_BT709:
+        return ZIMG_TRANSFER_709;
+    case AVCOL_TRC_GAMMA22:
+        return ZIMG_TRANSFER_470_M;
+    case AVCOL_TRC_GAMMA28:
+        return ZIMG_TRANSFER_470_BG;
+    case AVCOL_TRC_SMPTE170M:
+        return ZIMG_TRANSFER_601;
+    case AVCOL_TRC_SMPTE240M:
+        return ZIMG_TRANSFER_240M;
+    case AVCOL_TRC_LINEAR:
+        return ZIMG_TRANSFER_LINEAR;
+    case AVCOL_TRC_LOG:
+        return ZIMG_TRANSFER_LOG_100;
+    case AVCOL_TRC_LOG_SQRT:
+        return ZIMG_TRANSFER_LOG_316;
+    case AVCOL_TRC_IEC61966_2_4:
+        return ZIMG_TRANSFER_IEC_61966_2_4;
+    case AVCOL_TRC_BT2020_10:
+        return ZIMG_TRANSFER_2020_10;
+    case AVCOL_TRC_BT2020_12:
+        return ZIMG_TRANSFER_2020_12;
+    case AVCOL_TRC_SMPTE2084:
+        return ZIMG_TRANSFER_ST2084;
+    case AVCOL_TRC_ARIB_STD_B67:
+        return ZIMG_TRANSFER_ARIB_B67;
+    case AVCOL_TRC_IEC61966_2_1:
+        return ZIMG_TRANSFER_IEC_61966_2_1;
+    }
+    return ZIMG_TRANSFER_UNSPECIFIED;
+}
+
+static int convert_primaries(enum AVColorPrimarie
[FFmpeg-devel] [PATCH] libavfilter: zscale performance optimization >4x
By ffmpeg threading support implementation via frame slicing and doing
zimg_filter_graph_build that used to take 30-60% of each frame processing
only if necessary (some parameters changed)
the performance increase vs original version
in video downscale and color conversion >4x is seen
on 64 cores Intel Xeon, 3x on i7-6700K (4 cores with HT)

Signed-off-by: Victoria Zhislina
---
 libavfilter/vf_zscale.c | 417 +++-
 1 file changed, 288 insertions(+), 129 deletions(-)

diff --git a/libavfilter/vf_zscale.c b/libavfilter/vf_zscale.c
index 1288c5efc1..61418d4a4a 100644
--- a/libavfilter/vf_zscale.c
+++ b/libavfilter/vf_zscale.c
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2015 Paul B Mahol
- *
+ * 2022 Victoria Zhislina, Intel
+
  * This file is part of FFmpeg.
  *
  * FFmpeg is free software; you can redistribute it and/or
@@ -44,6 +45,8 @@
 #include "libavutil/imgutils.h"
 
 #define ZIMG_ALIGNMENT 32
+#define MIN_TILESIZE 64
+#define MAX_THREADS 64
 
 static const char *const var_names[] = {
     "in_w", "iw",
@@ -113,13 +116,17 @@ typedef struct ZScaleContext {
     int force_original_aspect_ratio;
 
-    void *tmp;
-    size_t tmp_size;
+    void *tmp[MAX_THREADS]; //separate for each thread;
+    int nb_threads;
+    int slice_h;
 
     zimg_image_format src_format, dst_format;
     zimg_image_format alpha_src_format, alpha_dst_format;
+    zimg_image_format src_format_tmp, dst_format_tmp;
+    zimg_image_format alpha_src_format_tmp, alpha_dst_format_tmp;
     zimg_graph_builder_params alpha_params, params;
-    zimg_filter_graph *alpha_graph, *graph;
+    zimg_graph_builder_params alpha_params_tmp, params_tmp;
+    zimg_filter_graph *alpha_graph[MAX_THREADS], *graph[MAX_THREADS];
 
     enum AVColorSpace in_colorspace, out_colorspace;
     enum AVColorTransferCharacteristic in_trc, out_trc;
@@ -128,10 +135,36 @@ typedef struct ZScaleContext {
     enum AVChromaLocation in_chromal, out_chromal;
 } ZScaleContext;
 
+typedef struct ThreadData {
+    const AVPixFmtDescriptor *desc, *odesc;
+    AVFrame *in, *out;
+} ThreadData;
+
 static av_cold int init(AVFilterContext *ctx)
 {
     ZScaleContext *s = ctx->priv;
     int ret;
+    int i;
+
+    for (i = 0; i < MAX_THREADS; i++) {
+        s->tmp[i] = NULL;
+        s->graph[i] = NULL;
+        s->alpha_graph[i] = NULL;
+    }
+    zimg_image_format_default(&s->src_format, ZIMG_API_VERSION);
+    zimg_image_format_default(&s->dst_format, ZIMG_API_VERSION);
+    zimg_image_format_default(&s->src_format_tmp, ZIMG_API_VERSION);
+    zimg_image_format_default(&s->dst_format_tmp, ZIMG_API_VERSION);
+
+    zimg_image_format_default(&s->alpha_src_format, ZIMG_API_VERSION);
+    zimg_image_format_default(&s->alpha_dst_format, ZIMG_API_VERSION);
+    zimg_image_format_default(&s->alpha_src_format_tmp, ZIMG_API_VERSION);
+    zimg_image_format_default(&s->alpha_dst_format_tmp, ZIMG_API_VERSION);
+
+    zimg_graph_builder_params_default(&s->params, ZIMG_API_VERSION);
+    zimg_graph_builder_params_default(&s->params_tmp, ZIMG_API_VERSION);
+    zimg_graph_builder_params_default(&s->alpha_params, ZIMG_API_VERSION);
+    zimg_graph_builder_params_default(&s->alpha_params_tmp, ZIMG_API_VERSION);
 
     if (s->size_str && (s->w_expr || s->h_expr)) {
         av_log(ctx, AV_LOG_ERROR,
@@ -158,7 +191,6 @@ static av_cold int init(AVFilterContext *ctx)
         av_opt_set(s, "w", "iw", 0);
     if (!s->h_expr)
         av_opt_set(s, "h", "ih", 0);
-
     return 0;
 }
 
@@ -471,6 +503,51 @@ static enum AVColorRange convert_range_from_zimg(enum zimg_pixel_range_e color_r
     return AVCOL_RANGE_UNSPECIFIED;
 }
 
+/* returns 0 if image formats are the same and 1 otherwise */
+static int compare_zimg_image_formats(zimg_image_format *img_fmt0, zimg_image_format *img_fmt1)
+{
+    return ((img_fmt0->chroma_location != img_fmt1->chroma_location) ||
+#if ZIMG_API_VERSION >= 0x204
+        (img_fmt0->alpha != img_fmt1->alpha) ||
+#endif
+        (img_fmt0->color_family != img_fmt1->color_family) ||
+        (img_fmt0->color_primaries != img_fmt1->color_primaries) ||
+        (img_fmt0->depth != img_fmt1->depth) ||
+        (img_fmt0->field_parity != img_fmt1->field_parity) ||
+        (img_fmt0->height != img_fmt1->height) ||
+        (img_fmt0->matrix_coefficients != img_fmt1->matrix_coefficients) ||
+        (img_fmt0->pixel_range != img_fmt1->pixel_range) ||
+        (img_fmt0->pixel_type != img_fmt1->pixel_type) ||
+        (img_fmt0->subsample_h != img_fmt1->subsample_h) ||
+        (img_fmt0->subsample_w != img_fmt1->subsample_w) ||
+        (img_fmt0->transfer_characteristics != img_fmt1->transfer_characteristics) ||
+        (img_fmt0->width != img_fmt1->width));
+}
+
[FFmpeg-devel] [PATCH] libavfilter: zscale performance optimization >4x
By ffmpeg threading support implementation via frame slicing and doing
zimg_filter_graph_build that used to take 30-60% of each frame processing
only if necessary (some parameters changed)
the performance increase vs original version
in video downscale and color conversion >4x is seen
on 64 cores Intel Xeon, 3x on i7-6700K (4 cores with HT)

Signed-off-by: Victoria Zhislina
---
 libavfilter/vf_zscale.c | 413 +++-
 1 file changed, 284 insertions(+), 129 deletions(-)

diff --git a/libavfilter/vf_zscale.c b/libavfilter/vf_zscale.c
index 1288c5efc1..dd0017607e 100644
--- a/libavfilter/vf_zscale.c
+++ b/libavfilter/vf_zscale.c
@@ -1,6 +1,7 @@
 /*
  * Copyright (c) 2015 Paul B Mahol
- *
+ * 2022 Victoria Zhislina, Intel
+
  * This file is part of FFmpeg.
  *
  * FFmpeg is free software; you can redistribute it and/or
@@ -44,6 +45,8 @@
 #include "libavutil/imgutils.h"
 
 #define ZIMG_ALIGNMENT 32
+#define MIN_TILESIZE 64
+#define MAX_THREADS 64
 
 static const char *const var_names[] = {
     "in_w", "iw",
@@ -113,13 +116,17 @@ typedef struct ZScaleContext {
     int force_original_aspect_ratio;
 
-    void *tmp;
-    size_t tmp_size;
+    void *tmp[MAX_THREADS]; //separate for each thread;
+    int nb_threads;
+    int slice_h;
 
     zimg_image_format src_format, dst_format;
     zimg_image_format alpha_src_format, alpha_dst_format;
+    zimg_image_format src_format_tmp, dst_format_tmp;
+    zimg_image_format alpha_src_format_tmp, alpha_dst_format_tmp;
     zimg_graph_builder_params alpha_params, params;
-    zimg_filter_graph *alpha_graph, *graph;
+    zimg_graph_builder_params alpha_params_tmp, params_tmp;
+    zimg_filter_graph *alpha_graph[MAX_THREADS], *graph[MAX_THREADS];
 
     enum AVColorSpace in_colorspace, out_colorspace;
     enum AVColorTransferCharacteristic in_trc, out_trc;
@@ -128,10 +135,35 @@ typedef struct ZScaleContext {
     enum AVChromaLocation in_chromal, out_chromal;
 } ZScaleContext;
 
+typedef struct ThreadData {
+    const AVPixFmtDescriptor *desc, *odesc;
+    AVFrame *in, *out;
+} ThreadData;
+
 static av_cold int init(AVFilterContext *ctx)
 {
     ZScaleContext *s = ctx->priv;
     int ret;
+    int i;
+    for (i = 0; i < MAX_THREADS; i++) {
+        s->tmp[i] = NULL;
+        s->graph[i] = NULL;
+        s->alpha_graph[i] = NULL;
+    }
+    zimg_image_format_default(&s->src_format, ZIMG_API_VERSION);
+    zimg_image_format_default(&s->dst_format, ZIMG_API_VERSION);
+    zimg_image_format_default(&s->src_format_tmp, ZIMG_API_VERSION);
+    zimg_image_format_default(&s->dst_format_tmp, ZIMG_API_VERSION);
+
+    zimg_image_format_default(&s->alpha_src_format, ZIMG_API_VERSION);
+    zimg_image_format_default(&s->alpha_dst_format, ZIMG_API_VERSION);
+    zimg_image_format_default(&s->alpha_src_format_tmp, ZIMG_API_VERSION);
+    zimg_image_format_default(&s->alpha_dst_format_tmp, ZIMG_API_VERSION);
+
+    zimg_graph_builder_params_default(&s->params, ZIMG_API_VERSION);
+    zimg_graph_builder_params_default(&s->params_tmp, ZIMG_API_VERSION);
+    zimg_graph_builder_params_default(&s->alpha_params, ZIMG_API_VERSION);
+    zimg_graph_builder_params_default(&s->alpha_params_tmp, ZIMG_API_VERSION);
 
     if (s->size_str && (s->w_expr || s->h_expr)) {
         av_log(ctx, AV_LOG_ERROR,
@@ -158,7 +190,6 @@ static av_cold int init(AVFilterContext *ctx)
         av_opt_set(s, "w", "iw", 0);
     if (!s->h_expr)
         av_opt_set(s, "h", "ih", 0);
-
     return 0;
 }
 
@@ -471,6 +502,51 @@ static enum AVColorRange convert_range_from_zimg(enum zimg_pixel_range_e color_r
     return AVCOL_RANGE_UNSPECIFIED;
 }
 
+/* returns 0 if image formats are the same and 1 otherwise */
+static int compare_zimg_image_formats(zimg_image_format *img_fmt0, zimg_image_format *img_fmt1)
+{
+    return ((img_fmt0->chroma_location != img_fmt1->chroma_location) ||
+#if ZIMG_API_VERSION >= 0x204
+        (img_fmt0->alpha != img_fmt1->alpha) ||
+#endif
+        (img_fmt0->color_family != img_fmt1->color_family) ||
+        (img_fmt0->color_primaries != img_fmt1->color_primaries) ||
+        (img_fmt0->depth != img_fmt1->depth) ||
+        (img_fmt0->field_parity != img_fmt1->field_parity) ||
+        (img_fmt0->height != img_fmt1->height) ||
+        (img_fmt0->matrix_coefficients != img_fmt1->matrix_coefficients) ||
+        (img_fmt0->pixel_range != img_fmt1->pixel_range) ||
+        (img_fmt0->pixel_type != img_fmt1->pixel_type) ||
+        (img_fmt0->subsample_h != img_fmt1->subsample_h) ||
+        (img_fmt0->subsample_w != img_fmt1->subsample_w) ||
+        (img_fmt0->transfer_characteristics != img_fmt1->transfer_characteristics) ||
+        (img_fmt0->width != img_fmt1->width));
+}
+
Re: [FFmpeg-devel] [PATCH] libavfilter: zscale performance optimization >4x
Hi, Anton.
Thanks for your input. But the patch does the single thing described in the commit message: it improves performance >4x :) Sorry.
This patch is based on real experience and on real measurements. Please note that I don't write ">40x", so it is not an advertisement :).
You are right that under the hood it does two main things and one small additional one, which combine to achieve the performance gain mentioned in the commit message :) However, the changes are extremely local: they cover just a couple of functions in a single file, and it doesn't make sense to split them. It seems to me that a split would make the ffmpeg-devel mailing list and the ffmpeg git log dirtier, not cleaner.
So let's wait for Paul B Mahol's opinion; it is his code that I've modified.

On Mon, Feb 21, 2022 at 2:22 PM Anton Khirnov wrote:

> > libavfilter: zscale performance optimization >4x
>
> This reads like an advertisement rather than a useful description. It
> should say what the patch does, performance improvement numbers should
> be mentioned in the commit message body.
>
> Quoting Victoria Zhislina (2022-02-21 09:20:55)
> > By ffmpeg threading support implementation via frame slicing and doing
> > zimg_filter_graph_build that used to take 30-60% of each frame processing
> > only if necessary (some parameters changed)
> > the performance increase vs original version
> > in video downscale and color conversion >4x is seen
> > on 64 cores Intel Xeon, 3x on i7-6700K (4 cores with HT)
>
> This implies the patch does multiple unrelated things. Then it should be
> split in multiple patches, unless some important factor prevents that
> (then that factor should be described in the commit message).
>
> --
> Anton Khirnov
Re: [FFmpeg-devel] [PATCH] libavfilter: zscale performance optimization >4x
Paul, I have exactly the same feelings about using zimg threading myself, and decided to go along the standard ffmpeg threading route. It looks more consistent here.
Megathanks for reviewing my patch; I've fixed everything you've mentioned and even more.

On Tue, Feb 22, 2022 at 11:15 AM Paul B Mahol wrote:

> On Tue, Feb 22, 2022 at 9:15 AM Paul B Mahol wrote:
>
> > On Tue, Feb 22, 2022 at 6:25 AM Lynne wrote:
> >
> >> 19 Feb 2022, 14:58 by niva...@gmail.com:
> >>
> >> > By ffmpeg threading support implementation via frame slicing and doing
> >> > zimg_filter_graph_build that used to take 30-60% of each frame processing
> >> > only if necessary (some parameters changed)
> >> > the performance increase vs original version
> >> > in video downscale and color conversion >4x is seen
> >> > on 64 cores Intel Xeon, 3x on i7-6700K (4 cores with HT)
> >> >
> >> > Signed-off-by: Victoria Zhislina
> >>
> >> Can't you patch such a feature into the upstream instead?
> >
> > zscale already have own threading ability, but is very hard to use it,
> > last time i tried.
>
> I mean zimg.
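For readers unfamiliar with the standard ffmpeg threading route mentioned here: a slice worker receives a job number and a job count, and derives the range of rows it owns from them. The sketch below shows only that row arithmetic. The worker's parameter list mirrors the usual (ctx, arg, jobnr, nb_jobs) shape of a filter slice callback, but SliceJob, process_slice and run_jobs are hypothetical names, and run_jobs is a sequential stand-in for the framework's executor, not an FFmpeg function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-call data, loosely modelled on the ThreadData struct
 * from the patch; rows_done exists only to make the result observable. */
typedef struct SliceJob {
    int height;          /* number of rows in the output frame */
    int rows_done[64];   /* rows handled by each job */
} SliceJob;

/* Worker with the (ctx, arg, jobnr, nb_jobs) shape; ctx is unused here. */
static int process_slice(void *ctx, void *arg, int jobnr, int nb_jobs)
{
    SliceJob *job = arg;
    /* Consecutive jobs get adjacent, non-overlapping row ranges. */
    int start = (job->height *  jobnr     ) / nb_jobs;
    int end   = (job->height * (jobnr + 1)) / nb_jobs;

    (void)ctx;
    job->rows_done[jobnr] = end - start;  /* a real worker would filter rows [start, end) */
    return 0;
}

/* Sequential stand-in for a slice executor (an assumption of this sketch):
 * a real framework would run the jobs on worker threads. */
static int run_jobs(int (*fn)(void *, void *, int, int), void *arg, int nb_jobs)
{
    assert(nb_jobs > 0 && nb_jobs <= 64);
    for (int i = 0; i < nb_jobs; i++) {
        int ret = fn(NULL, arg, i, nb_jobs);
        if (ret < 0)
            return ret;
    }
    return 0;
}
```

Because each job writes only its own rows and, in the patch, uses its own tmp buffer and graph, the workers need no locking between them.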
Re: [FFmpeg-devel] [PATCH] libavfilter: zscale performance optimization >4x
Paul and all, have you had a chance to look at my patch from Feb 19? I believe I've fixed everything you kindly pointed out, and even more. Please correct me if I'm wrong.
The only remaining question is whether you are OK with combining threading and conditional filter operation (= do something only if it is really required) in one patch, or whether you would prefer to split it into two separate patches. I'd prefer the first option, because it keeps the ffmpeg git repo and ffmpeg development cleaner, not dirtier...
Re: [FFmpeg-devel] [PATCH] libavfilter: zscale performance optimization >4x
Awesome, thanks!

On Thu, Mar 10, 2022 at 9:45 PM Paul B Mahol wrote:

> On Thu, Mar 10, 2022 at 7:41 PM Victoria Zhislina wrote:
>
>> Paul and all, do you have any chances to view my patch from Feb,19? I
>> assume I've fixed all you've kindly pointed out and even more. Please
>> correct me if I'm wrong. The only question remaining is - are you ok
>> with the combination of threading and conditional filter operation (= do
>> something if it is really required only) or you prefer to split it to 2
>> separate corresponding patches. I'd prefer the first option because it
>> makes git ffmpeg repo and ffmpeg development cleaner not dirtier...
>
> Patch was already applied and some found issues fixed.
Re: [FFmpeg-devel] [PATCH 2/4] avfilter/vf_zscale: fix number of threads
Thanks for the fix, lgtm as well.

On Tue, Mar 15, 2022 at 12:56 AM Paul B Mahol wrote:

> lgtm