> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf
> Of Li, Zhong
> Sent: Thursday, November 15, 2018 8:22 PM
> To: FFmpeg development discussions and patches <ffmpeg-de...@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH V5] Add a filter implementing HDR
> image generation from a single exposure using deep CNNs
>
> > From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf
> > Of Liu Steven
> > Sent: Thursday, November 15, 2018 5:40 PM
> > To: FFmpeg development discussions and patches
> > <ffmpeg-devel@ffmpeg.org>
> > Cc: Liu Steven <l...@chinaffmpeg.org>
> > Subject: Re: [FFmpeg-devel] [PATCH V5] Add a filter implementing HDR
> > image generation from a single exposure using deep CNNs
> >
> > > On Nov 14, 2018, at 8:15 PM, Guo, Yejun <yejun....@intel.com> wrote:
> > >
> > > See the algorithm's paper and code below.
> > >
> > > The filter's parameters look like:
> > >
> > > sdr2hdr=model_filename=/path_to_tensorflow_graph.pb:out_fmt=gbrp10le
> > >
> > > The input of the deep CNN model is RGB24, while the output is float
> > > for each color channel. By default the filter outputs gbrpf32le.
> > > gbrp10le is also supported as an output format, so that the rendering
> > > result can be viewed in a player as a reference.
> > >
> > > To generate the model file, the original script needs to be modified a little:
> > > - set name='y' for y_final within the script at
> > >   https://github.com/gabrieleilertsen/hdrcnn/blob/master/network.py
> > > - add the following code to the script at
> > >   https://github.com/gabrieleilertsen/hdrcnn/blob/master/hdrcnn_predict.py
> > >
> > > graph = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, ["y"])
> > > tf.train.write_graph(graph, '.', 'graph.pb', as_text=False)
> > >
> > > I also uploaded the model file to
> > > https://drive.google.com/drive/folders/1URsRY5g-VdE-kHlP5vQoLoimMIZ-SX00?usp=sharing.
> > >
> > > The filter only works when the TensorFlow C API is available on the
> > > system. The native backend is not supported, since the deep CNN model
> > > contains several layer types besides CONV and DEPTH_TO_SPACE.
> > >
> > > https://arxiv.org/pdf/1710.07480.pdf:
> > >   author = "Eilertsen, Gabriel and Kronander, Joel, and Denes, Gyorgy and Mantiuk, Rafał and Unger, Jonas",
> > >   title = "HDR image reconstruction from a single exposure using deep CNNs",
> > >   journal = "ACM Transactions on Graphics (TOG)",
> > >   number = "6",
> > >   volume = "36",
> > >   articleno = "178",
> > >   year = "2017"
> > >
> > > https://github.com/gabrieleilertsen/hdrcnn
> > >
> > > Btw, as a whole solution, metadata should also be generated from the
> > > SDR video so that it can be encoded as an HDR video. That is not
> > > supported yet; this patch focuses only on the paper.
> > >
> > > This filter accepts 8-bit frames (RGB24) and outputs 10-bit/float
> > > frames, and there is no reference image, so it is not feasible to use
> > > criteria such as PSNR or SSIM.
> > >
> > > I chose the same method described in the paper to demo the filter's
> > > effect: the frames before/after the filter are reduced by 3 stops.
> > >
> > > The native video (test.native.mp4) is created from 7 png files at
> > > https://github.com/gabrieleilertsen/hdrcnn/tree/master/data (each
> > > image is enlarged to 1920*1080, with the extra area filled with white)
> > > with the command line:
> > > ffmpeg -f image2 -i ./img_%03d.png -c:v libx264 -preset veryslow -crf 1 test.native.mp4
> > >
> > > Two rgb24 videos are generated before/after the filter with -3
> > > stops by modifying the code a little; see the video folder on the
> > > google drive (the same place where the model file is located).
> > >
> > > For your convenience, I also dumped png files from the generated videos
> > > and combined the before/after pngs into one file; see the png folder on
> > > the google drive.
>
> I see three limitations in the code that are not noted in the texi or the
> commit message:
> 1. Only one resolution is supported: 1920x1080. Other resolutions are not
> supported.
> 2. RGB24 is the only supported input format.
> 3. No metadata is generated, which may break the encoder.

thanks, will add into texi.

> (It would be good if any of them could be removed.)
>
> > > Signed-off-by: Guo, Yejun <yejun....@intel.com>
> > > ---
> > >  configure                |   1 +
> > >  doc/filters.texi         |  36 +++++++
> > >  libavfilter/Makefile     |   1 +
> > >  libavfilter/allfilters.c |   1 +
> > >  libavfilter/vf_sdr2hdr.c | 268 +++++++++++++++++++++++++++++++++++++++++++++++
> > >  5 files changed, 307 insertions(+)
> > >  create mode 100644 libavfilter/vf_sdr2hdr.c
> > >
> > > diff --git a/configure b/configure
> > > index b02b4cc..19138e8 100755
> > > --- a/configure
> > > +++ b/configure
> > > @@ -3446,6 +3446,7 @@ sab_filter_deps="gpl swscale"
> > >  scale2ref_filter_deps="swscale"
> > >  scale_filter_deps="swscale"
> > >  scale_qsv_filter_deps="libmfx"
> > > +sdr2hdr_filter_deps="libtensorflow"
> > >  select_filter_select="scene_sad"
> > >  sharpness_vaapi_filter_deps="vaapi"
> > >  showcqt_filter_deps="avcodec avformat swscale"
> > > diff --git a/doc/filters.texi b/doc/filters.texi
> > > index 0d9ff43..2e6a6af 100644
> > > --- a/doc/filters.texi
> > > +++ b/doc/filters.texi
> > > @@ -14868,6 +14868,42 @@ Scale a subtitle stream (b) to match the main video (a) in size before overlayin
> > >  @end example
> > >  @end itemize
> > >
> > > +@section sdr2hdr
> > > +
> > > +HDR image generation from a single exposure using deep CNNs with TensorFlow C library.
> > > +
> > > +@itemize
> > > +@item
> > > +paper: see @url{https://arxiv.org/pdf/1710.07480.pdf}
> > > +
> > > +@item
> > > +code with model and trained parameters: see @url{https://github.com/gabrieleilertsen/hdrcnn}
> > > +@end itemize
> > > +
> > > +The filter accepts the following options:
> > > +
> > > +@table @option
> > > +
> > > +@item model_filename
> > > +Set path to model file specifying network architecture and its parameters, can download from
> > > +@url{https://drive.google.com/drive/folders/1URsRY5g-VdE-kHlP5vQoLoimMIZ-SX00?usp=sharing}
> > > +
> > > +@item out_fmt
> > > +the data format of the filter's output.
> > > +
> > > +It accepts the following values:
> > > +@table @samp
> > > +@item gbrpf32le
> > > +force gbrpf32le output
> > > +
> > > +@item gbrp10le
> > > +force gbrp10le output
> > > +@end table
> > > +
> > > +Default value is @samp{gbrpf32le}.
> > > +
> > > +@end table
> > > +
> > >  @anchor{selectivecolor}
> > >  @section selectivecolor
> > >
> > > diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> > > index 7c6fc83..936a525 100644
> > > --- a/libavfilter/Makefile
> > > +++ b/libavfilter/Makefile
> > > @@ -365,6 +365,7 @@ OBJS-$(CONFIG_SOBEL_OPENCL_FILTER) += vf_convolution_opencl.o opencl.o
> > >  OBJS-$(CONFIG_SPLIT_FILTER) += split.o
> > >  OBJS-$(CONFIG_SPP_FILTER) += vf_spp.o
> > >  OBJS-$(CONFIG_SR_FILTER) += vf_sr.o
> > > +OBJS-$(CONFIG_SDR2HDR_FILTER) += vf_sdr2hdr.o
> > >  OBJS-$(CONFIG_SSIM_FILTER) += vf_ssim.o framesync.o
> > >  OBJS-$(CONFIG_STEREO3D_FILTER) += vf_stereo3d.o
> > >  OBJS-$(CONFIG_STREAMSELECT_FILTER) += f_streamselect.o framesync.o
> > > diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> > > index 484b080..622f9f3 100644
> > > --- a/libavfilter/allfilters.c
> > > +++ b/libavfilter/allfilters.c
> > > @@ -322,6 +322,7 @@ extern AVFilter ff_vf_scale_npp;
> > >  extern AVFilter ff_vf_scale_qsv;
> > >  extern AVFilter ff_vf_scale_vaapi;
> > >  extern AVFilter ff_vf_scale2ref;
> > > +extern AVFilter ff_vf_sdr2hdr;
> > >  extern AVFilter ff_vf_select;
> > >  extern AVFilter ff_vf_selectivecolor;
> > >  extern AVFilter ff_vf_sendcmd;
> > > diff --git a/libavfilter/vf_sdr2hdr.c b/libavfilter/vf_sdr2hdr.c
> > > new file mode 100644
> > > index 0000000..85a58ea
> > > --- /dev/null
> > > +++ b/libavfilter/vf_sdr2hdr.c
> > > @@ -0,0 +1,268 @@
> > > +/*
> > > + * Copyright (c) 2018 Guo Yejun
> > > + *
> > > + * This file is part of FFmpeg.
> > > + *
> > > + * FFmpeg is free software; you can redistribute it and/or
> > > + * modify it under the terms of the GNU Lesser General Public
> > > + * License as published by the Free Software Foundation; either
> > > + * version 2.1 of the License, or (at your option) any later version.
> > > + *
> > > + * FFmpeg is distributed in the hope that it will be useful,
> > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > + * Lesser General Public License for more details.
> > > + *
> > > + * You should have received a copy of the GNU Lesser General Public
> > > + * License along with FFmpeg; if not, write to the Free Software
> > > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > > + */
> > > +
> > > +/**
> > > + * @file
> > > + * Filter implementing HDR image generation from a single exposure using deep CNNs.
> > > + * https://arxiv.org/pdf/1710.07480.pdf
> > > + */
> > > +
> > > +#include "avfilter.h"
> > > +#include "formats.h"
> > > +#include "internal.h"
> > > +#include "libavutil/opt.h"
> > > +#include "libavutil/qsort.h"
> > > +#include "libavformat/avio.h"
> > > +#include "libswscale/swscale.h"
> > > +#include "dnn_interface.h"
> > > +#include <math.h>
> > > +
> > > +typedef struct SDR2HDRContext {
> > > +    const AVClass *class;
> > > +
> > > +    char* model_filename;
> > > +    enum AVPixelFormat out_fmt;
> > > +    DNNModule* dnn_module;
> > > +    DNNModel* model;
> > > +    DNNData input, output;
> > > +} SDR2HDRContext;
> > > +
> > > +#define OFFSET(x) offsetof(SDR2HDRContext, x)
> > > +#define FLAGS AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM
> > > +static const AVOption sdr2hdr_options[] = {
> > > +    { "model_filename", "path to model file specifying network architecture and its parameters", OFFSET(model_filename), AV_OPT_TYPE_STRING, {.str=NULL}, 0, 0, FLAGS },
> > > +    { "out_fmt", "the data format of the filter's output, it could be gbrpf32le [default] or gbrp10le", OFFSET(out_fmt), AV_OPT_TYPE_PIXEL_FMT, {.i64=AV_PIX_FMT_GBRPF32LE}, AV_PIX_FMT_NONE, AV_PIX_FMT_NB - 1, FLAGS },
> > > +    { NULL }
> > > +};
> > > +
> > > +AVFILTER_DEFINE_CLASS(sdr2hdr);
> > > +
> > > +static av_cold int init(AVFilterContext* context)
> > > +{
> > > +    SDR2HDRContext* ctx = context->priv;
> > > +
> > > +    if (ctx->out_fmt != AV_PIX_FMT_GBRPF32LE && ctx->out_fmt != AV_PIX_FMT_GBRP10LE) {
> > > +        av_log(context, AV_LOG_ERROR, "could not support the output format\n");
> > > +        return AVERROR(ENOSYS);
> > > +    }
> > > +
> > > +    ctx->dnn_module = ff_get_dnn_module(DNN_TF);
> > > +    if (!ctx->dnn_module){
> > > +        av_log(context, AV_LOG_ERROR, "could not create DNN module for tensorflow backend\n");
> > > +        return AVERROR(ENOMEM);
> > > +    }
> > > +    if (!ctx->model_filename){
> > > +        av_log(context, AV_LOG_ERROR, "model file for network was not specified\n");
> > > +        return AVERROR(EIO);
> > > +    }
> > > +    if (!ctx->dnn_module->load_model) {
> > > +        av_log(context, AV_LOG_ERROR, "load_model for network was not specified\n");
> > > +        return AVERROR(EIO);
> > > +    }
> > > +    ctx->model = (ctx->dnn_module->load_model)(ctx->model_filename);
> > > +    if (!ctx->model){
> > > +        av_log(context, AV_LOG_ERROR, "could not load DNN model\n");
> > > +        return AVERROR(EIO);
> > > +    }
> > > +    return 0;
> > > +}
> > > +
> > > +static int query_formats(AVFilterContext* context)
> > > +{
> > > +    const enum AVPixelFormat in_formats[] = {AV_PIX_FMT_RGB24,
> > > +                                             AV_PIX_FMT_NONE};
> > > +    enum AVPixelFormat out_formats[2];
> > > +    SDR2HDRContext* ctx = context->priv;
> > > +    AVFilterFormats* formats_list;
> > > +    int ret = 0;
> > > +
> > > +    formats_list = ff_make_format_list(in_formats);
> > > +    if ((ret = ff_formats_ref(formats_list, &context->inputs[0]->out_formats)) < 0)
> > > +        return ret;
> > > +
> > > +    out_formats[0] = ctx->out_fmt;
> > > +    out_formats[1] = AV_PIX_FMT_NONE;
> > > +    formats_list = ff_make_format_list(out_formats);
> > > +    if ((ret = ff_formats_ref(formats_list, &context->outputs[0]->in_formats)) < 0)
> > > +        return ret;
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +static int config_props(AVFilterLink* inlink)
> > > +{
> > > +    AVFilterContext* context = inlink->dst;
> > > +    SDR2HDRContext* ctx = context->priv;
> > > +    AVFilterLink* outlink = context->outputs[0];
> > > +    DNNReturnType result;
> > > +
> > > +    // the dnn model is tied with resolution due to deconv layer of tensorflow
> > > +    // now just support 1920*1080 and so the magic numbers within this file
> > > +    if (inlink->w != 1920 || inlink->h != 1080) {
> > > +        av_log(context, AV_LOG_ERROR, "only support frame size with 1920*1080\n");
> > > +        return AVERROR(ENOSYS);
> > > +    }
> > > +
> > > +    ctx->input.width = 1920;
> > > +    ctx->input.height = 1088; //the model requires height is a multiple of 32,
>
> It would be better to avoid hard-coded values. I would prefer something like:
>
> ctx->input.width = inlink->w;
> ctx->input.height = FFALIGN(inlink->h, 32);

will fix.

> > > +    ctx->input.channels = 3;
> > > +
> > > +    result = (ctx->model->set_input_output)(ctx->model->model, &ctx->input, &ctx->output);
> > > +    if (result != DNN_SUCCESS){
> > > +        av_log(context, AV_LOG_ERROR, "could not set input and output for the model\n");
> > > +        return AVERROR(EIO);
> > > +    }
> > > +
> > > +    memset(ctx->input.data, 0, ctx->input.channels * ctx->input.width * ctx->input.height * sizeof(float));
> > > +    outlink->h = 1080;
> > > +    outlink->w = 1920;
>
> And also here:
> outlink->h = inlink->h;
> outlink->w = inlink->w;
>
> Then, when one more resolution is supported, little code needs to be changed.

will fix, I left the magic numbers just as a clear note.

> > > +    return 0;
> > > +}
> > > +
> > > +static float qsort_comparison_function_float(const void *a, const void *b)
> > > +{
> > > +    return *(const float *)a - *(const float *)b;
> > > +}
> > > +
> > > +static int filter_frame(AVFilterLink* inlink, AVFrame* in)
> > > +{
> > > +    DNNReturnType dnn_result = DNN_SUCCESS;
> > > +    AVFilterContext* context = inlink->dst;
> > > +    SDR2HDRContext* ctx = context->priv;
> > > +    AVFilterLink* outlink = context->outputs[0];
> > > +    AVFrame* out = ff_get_video_buffer(outlink, outlink->w, outlink->h);
> > > +    int total_pixels = in->height * in->width;
> > > +
> > > +    if (!out){
> > > +        av_log(context, AV_LOG_ERROR, "could not allocate memory for output frame\n");
> > > +        av_frame_free(&in);
> > > +        return AVERROR(ENOMEM);
> > > +    }
> > > +
> > > +    av_frame_copy_props(out, in);
> > > +
> > > +    for (int i = 0; i < in->linesize[0] * in->height; ++i) {
> > > +        ctx->input.data[i] = in->data[0][i] / 255.0f;
> > > +    }
> > > +
> > > +    dnn_result = (ctx->dnn_module->execute_model)(ctx->model);
> > > +    if (dnn_result != DNN_SUCCESS){
> > > +        av_log(context, AV_LOG_ERROR, "failed to execute loaded model\n");
> > > +        return AVERROR(EIO);
> > > +    }
> > > +
> > > +    if (ctx->out_fmt == AV_PIX_FMT_GBRPF32LE) {
> > > +        float* outg = (float*)out->data[0];
> > > +        float* outb = (float*)out->data[1];
> > > +        float* outr = (float*)out->data[2];
> > > +        for (int i = 0; i < total_pixels; ++i) {
> > > +            float r = ctx->output.data[i*3];
> > > +            float g = ctx->output.data[i*3+1];
> > > +            float b = ctx->output.data[i*3+2];
> > > +            outr[i] = r;
> > > +            outg[i] = g;
> > > +            outb[i] = b;
> > > +        }
> > > +    } else {
>
> Would it be better to change this to "else if (ctx->out_fmt == AV_PIX_FMT_GBRP10LE)"
> and add an assert below?
> (I believe the format should be checked again, even though it has already
> been checked in the initialization stage.)

ok, will fix.

> > > +        // here, we just use a rough mapping to the 10bit contents
> > > +        // meta data generation for HDR video encoding is not supported yet
> > > +        float* converted_data = (float*)av_malloc(total_pixels * 3 * sizeof(float));
> > > +        int16_t* outg = (int16_t*)out->data[0];
> > > +        int16_t* outb = (int16_t*)out->data[1];
> > > +        int16_t* outr = (int16_t*)out->data[2];
> > > +
> > > +        float max = 1.0f;
> > > +        for (int i = 0; i < total_pixels * 3; ++i) {
> > > +            float d = ctx->output.data[i];
> > > +            d = sqrt(d);
> > > +            converted_data[i] = d;
> > > +            max = FFMAX(d, max);
> > > +        }
> > > +
> > > +        if (max > 1.0f) {
> > > +            AV_QSORT(converted_data, total_pixels * 3, float, qsort_comparison_function_float);
> > > +            // 0.5% pixels are clipped
> > > +            max = converted_data[(int)(total_pixels * 3 * 0.995)];
> > > +            max = FFMAX(max, 1.0f);
> > > +
> > > +            for (int i = 0; i < total_pixels * 3; ++i) {
> > > +                float d = ctx->output.data[i];
> > > +                d = sqrt(d);
> > > +                d = FFMIN(d, max);
> > > +                converted_data[i] = d;
> > > +            }
> > > +        }
> > > +
> > > +        for (int i = 0; i < total_pixels; ++i) {
> > > +            float r = converted_data[i*3];
> > > +            float g = converted_data[i*3+1];
> > > +            float b = converted_data[i*3+2];
> > > +            outr[i] = r / max * 1023;
> > > +            outg[i] = g / max * 1023;
> > > +            outb[i] = b / max * 1023;
> > > +        }
> > > +
> > > +        av_free(converted_data);
> > > +    }
> > > +
> > > +    av_frame_free(&in);
> > > +    return ff_filter_frame(outlink, out);
> > > +}
> > > +
> > > +static av_cold void uninit(AVFilterContext* context)
> > > +{
> > > +    SDR2HDRContext* ctx = context->priv;
> > > +
> > > +    if (ctx->dnn_module){
> > > +        (ctx->dnn_module->free_model)(&ctx->model);
> > > +        av_freep(&ctx->dnn_module);
> > > +    }
> > > +}
> > > +
> > > +static const AVFilterPad sdr2hdr_inputs[] = {
> > > +    {
> > > +        .name = "default",
> > > +        .type = AVMEDIA_TYPE_VIDEO,
> > > +        .config_props = config_props,
> > > +        .filter_frame = filter_frame,
> > > +    },
> > > +    { NULL }
> > > +};
> > > +
> > > +static const AVFilterPad sdr2hdr_outputs[] = {
> > > +    {
> > > +        .name = "default",
> > > +        .type = AVMEDIA_TYPE_VIDEO,
> > > +    },
> > > +    { NULL }
> > > +};
> > > +
> > > +AVFilter ff_vf_sdr2hdr = {
> > > +    .name = "sdr2hdr",
> > > +    .description = NULL_IF_CONFIG_SMALL("HDR image generation from a single exposure using deep CNNs."),
> > > +    .priv_size = sizeof(SDR2HDRContext),
> > > +    .init = init,
> > > +    .uninit = uninit,
> > > +    .query_formats = query_formats,
> > > +    .inputs = sdr2hdr_inputs,
> > > +    .outputs = sdr2hdr_outputs,
> > > +    .priv_class = &sdr2hdr_class,
> > > +    .flags = AVFILTER_FLAG_SUPPORT_TIMELINE_GENERIC,
> > > +};
> > > --
> > > 2.7.4
> > >
> > > _______________________________________________
> > > ffmpeg-devel mailing list
> > > ffmpeg-devel@ffmpeg.org
> > > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >
> > I have tested this patch and have some questions:
> >
> > 1. It must use TensorFlow; why is native mode not supported?

the model file includes nearly 20 ops, while only two ops are supported in native mode.

> > 2. It requires 1920x1080 input; can support for more resolutions be
> > added?
>
> Yup, it would be better if we could support more resolutions.

With current tensorflow, the model file is tied to one resolution. A trick is to generate a separate model file for each resolution, but that is not user-friendly. The nice solution is to fix tensorflow first, so that a single model file can support all resolutions.

> > 3. I looked into the hdrcnn project; its license is BSD-3-Clause.
> > Will this cause a license problem?
>
> It should not be a problem, since no source code from that project is
> involved.

I think so, thanks.

> > Thanks
> >
> > Steven
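
The FFALIGN suggestion from the review (pad the model input height to a multiple of 32 instead of hard-coding 1088) can be sketched outside of FFmpeg roughly as below. This is an illustration under stated assumptions, not code from the patch: ALIGN_UP is a local macro that uses the same arithmetic as libavutil's FFALIGN, and rgb24_to_padded_float is a hypothetical helper name. It copies row by row using the line stride, so any per-row padding bytes in the source are skipped and the extra rows at the bottom stay zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round x up to the next multiple of a (a must be a power of two).
 * Same arithmetic as libavutil's FFALIGN, redefined here so that the
 * sketch builds without FFmpeg headers. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Hypothetical helper: convert a packed RGB24 image into a float buffer
 * with values in [0, 1]. The destination holds padded_height rows of
 * width * 3 floats; rows beyond the source height are left at zero. */
static void rgb24_to_padded_float(const uint8_t *src, int linesize,
                                  int width, int height,
                                  float *dst, int padded_height)
{
    const int row_floats = width * 3;

    assert(padded_height >= height);
    assert(linesize >= row_floats);

    /* Zero the whole buffer first so the padding rows are black. */
    memset(dst, 0, (size_t)row_floats * padded_height * sizeof(*dst));

    for (int y = 0; y < height; y++) {
        /* Source rows are linesize bytes apart, which may exceed width * 3. */
        const uint8_t *row = src + (size_t)y * linesize;
        float *out = dst + (size_t)y * row_floats;

        for (int x = 0; x < row_floats; x++)
            out[x] = row[x] / 255.0f;
    }
}
```

For a 1920x1080 frame, ALIGN_UP(1080, 32) gives 1088, which matches the constant in the patch. As noted in the thread, the TensorFlow model file itself is still tied to one resolution, so this alone would not make other frame sizes work.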