Re: [FFmpeg-devel] [PATCH V3] avfilter/vf_dnn_processing: add a generic filter for image processing with dnn networks
Em qui., 7 de nov. de 2019 às 13:17, Guo, Yejun escreveu:
> > From: Pedro Arthur [mailto:bygran...@gmail.com]
> > Sent: Thursday, November 07, 2019 1:18 AM
> > To: FFmpeg development discussions and patches
> > Cc: Guo, Yejun
> > Subject: Re: [FFmpeg-devel] [PATCH V3] avfilter/vf_dnn_processing: add a
> > generic filter for image processing with dnn networks
> >
> > Hi,
> >
> > Em qui., 31 de out. de 2019 às 05:39, Guo, Yejun escreveu:
> > > [quoted patch description, python script, and ffmpeg example commands snipped]
> >
> > It would be great if you could transform the above steps into a fate test; that
> > way one can automatically ensure the filter is always working properly.
>
> sure, I'll add a fate test to test this filter with halve_first_channel.model. There will
> be no test for the tensorflow part since the fate test requires no external dependency.
>
> furthermore, more industry-famous models can be added into this fate test after we
> support them by adding more layers into native mode, and after we optimize the
> conv2d layer, which is now very, very slow.
>
> > > +};
> > > +
> > > +AVFilter ff_vf_dnn_processing = {
> > > +    .name          = "dnn_processing",
> > > +    .description   = NULL_IF_CONFIG_SMALL("Apply DNN processing filter to the input."),
> > > +    .priv_size     = sizeof(DnnProcessingContext),
> > > +    .init          = init,
> > > +    .uninit        = uninit,
> > > +    .query_formats = query_formats,
> > > +    .inputs        = dnn_processing_inputs,
> > > +    .outputs       = dnn_processing_outputs,
> > > +    .priv_class    = &dnn_processing_class,
> > > +};
> > > --
> > > 2.7.4
> >
> > rest LGTM.
>
> thanks, could we first push this patch?

patch pushed, thanks. I slightly edited the commit message, changed "scipy.misc" to
"imageio" as the former is deprecated and not present in newer versions.

> I plan to add two more changes for this filter next:
> - add gray8 and gray32 support
> - add y_from_yuv support; in other words, the network only handles the Y channel,
>   and the uv parts are not changed (or just scaled), just like what vf_sr does.
>
> I currently do not have a plan to add specific yuv formats, since I do not see a famous
> network which handles all the y, u, v channels.
>
> > BTW do you have already concrete use cases (or plans) for this filter?
>
> not yet, the idea of this filter is that it is general for image processing and should be
> very useful, and my basic target is to at least cover the features provided by vf_sr
> and vf_derain.
>
> actually, I do have a use case plan for a general video analytic filter; the side data
> type might be a big challenge, I'm still thinking about it. I chose this image processing
> filter first because it is simpler and the community can become familiar with dnn-based
> filters step by step.
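For readers without a tensorflow setup, the effect of the model discussed in this thread is easy to reproduce: the script's 1x1 conv2d with that 3x3 kernel is just a per-pixel matrix multiply. A minimal numpy-only sketch of the same arithmetic (illustrative only, not code from the patch):

```python
import numpy as np

# The 3x3 matrix the thread's script uses as a 1x1 convolution kernel:
# it scales channel 0 by 0.5 and leaves channels 1 and 2 untouched.
weights = np.array([0.5, 0, 0,
                    0, 1., 0,
                    0, 0, 1.]).reshape(3, 3).astype(np.float32)

# A tiny stand-in for a decoded rgb24 frame (height=2, width=2, 3 channels).
frame = np.full((2, 2, 3), 255, dtype=np.uint8)

# Same normalization the script applies before feeding the network.
x = frame.astype(np.float32) / 255.0

# A 1x1 conv with this kernel is a per-pixel matrix multiply.
y = x @ weights

# Back to uint8, as the script does before writing out.bmp.
out = (y * 255.0).astype(np.uint8)
print(out[0, 0])  # → [127 255 255]: first channel halved, others unchanged
```

This is the same result the `halve_first_channel` model produces through either dnn backend.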
Re: [FFmpeg-devel] [PATCH V3] avfilter/vf_dnn_processing: add a generic filter for image processing with dnn networks
> From: Pedro Arthur [mailto:bygran...@gmail.com]
> Sent: Thursday, November 07, 2019 1:18 AM
> To: FFmpeg development discussions and patches
> Cc: Guo, Yejun
> Subject: Re: [FFmpeg-devel] [PATCH V3] avfilter/vf_dnn_processing: add a
> generic filter for image processing with dnn networks
>
> Hi,
>
> Em qui., 31 de out. de 2019 às 05:39, Guo, Yejun escreveu:
> > [quoted patch description, python script, and ffmpeg example commands snipped]
>
> It would be great if you could transform the above steps into a fate test; that
> way one can automatically ensure the filter is always working properly.

sure, I'll add a fate test to test this filter with halve_first_channel.model. There will
be no test for the tensorflow part since the fate test requires no external dependency.

furthermore, more industry-famous models can be added into this fate test after we
support them by adding more layers into native mode, and after we optimize the
conv2d layer, which is now very, very slow.

> > +};
> > +
> > +AVFilter ff_vf_dnn_processing = {
> > +    .name          = "dnn_processing",
> > +    .description   = NULL_IF_CONFIG_SMALL("Apply DNN processing filter to the input."),
> > +    .priv_size     = sizeof(DnnProcessingContext),
> > +    .init          = init,
> > +    .uninit        = uninit,
> > +    .query_formats = query_formats,
> > +    .inputs        = dnn_processing_inputs,
> > +    .outputs       = dnn_processing_outputs,
> > +    .priv_class    = &dnn_processing_class,
> > +};
> > --
> > 2.7.4
>
> rest LGTM.

thanks, could we first push this patch?

I plan to add two more changes for this filter next:
- add gray8 and gray32 support
- add y_from_yuv support; in other words, the network only handles the Y channel,
  and the uv parts are not changed (or just scaled), just like what vf_sr does.

I currently do not have a plan to add specific yuv formats, since I do not see a famous
network which handles all the y, u, v channels.

> BTW do you have already concrete use cases (or plans) for this filter?

not yet, the idea of this filter is that it is general for image processing and should be
very useful, and my basic target is to at least cover the features provided by vf_sr
and vf_derain.

actually, I do have a use case plan for a general video analytic filter; the side data
type might be a big challenge, I'm still thinking about it. I chose this image processing
filter first because it is simpler and the community can become familiar with dnn-based
filters step by step.

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
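The planned y_from_yuv behavior described above (the network sees only luma, chroma passes through, as in vf_sr) can be illustrated in a few lines of numpy. The `process_y_only` helper and the stand-in "network" here are hypothetical sketches, not code from the patch:

```python
import numpy as np

def process_y_only(y_plane, u_plane, v_plane, network):
    """Run the DNN on the luma plane only; pass chroma through untouched
    (it could instead be scaled if the network changes the frame size)."""
    y_in = y_plane.astype(np.float32) / 255.0   # same normalization as the thread's script
    y_out = network(y_in)                       # the network sees a single channel
    y_out = np.clip(y_out * 255.0, 0.0, 255.0).astype(np.uint8)
    return y_out, u_plane, v_plane

# Stand-in "network": the thread's halving operation, applied to luma.
halve = lambda y: y * 0.5

# A flat white 4x4 frame in a yuv420-like layout (2x2 subsampled chroma).
y = np.full((4, 4), 255, dtype=np.uint8)
u = np.full((2, 2), 128, dtype=np.uint8)
v = np.full((2, 2), 128, dtype=np.uint8)

y2, u2, v2 = process_y_only(y, u, v, halve)
print(y2[0, 0])   # → 127: luma halved
print(u2[0, 0])   # → 128: chroma untouched
```

The real filter would additionally have to convert between the AVFrame plane layout and the float buffer the backend expects.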
Re: [FFmpeg-devel] [PATCH V3] avfilter/vf_dnn_processing: add a generic filter for image processing with dnn networks
Hi,

Em qui., 31 de out. de 2019 às 05:39, Guo, Yejun escreveu:
> [quoted patch description, python script, and ffmpeg example commands snipped]

It would be great if you could transform the above steps into a fate test; that
way one can automatically ensure the filter is always working properly.

> Signed-off-by: Guo, Yejun
> ---
>  configure                       |   1 +
>  doc/filters.texi                |  44 ++
>  libavfilter/Makefile            |   1 +
>  libavfilter/allfilters.c        |   1 +
>  libavfilter/vf_dnn_processing.c | 331
>  5 files changed, 378 insertions(+)
>  create mode 100644 libavfilter/vf_dnn_processing.c
>
> [quoted diff snipped]
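The filter options documented in the quoted doc/filters.texi hunk are joined into a single colon-separated option string on the command line. A small shell sketch of assembling it (model and tensor names are the thread's example values; the input/output paths are hypothetical):

```shell
# Build the dnn_processing option string from the documented keys.
model="halve_first_channel.model"
filter="dnn_processing=model=${model}:input=dnn_in:output=dnn_out:fmt=rgb24:dnn_backend=native"
echo "$filter"

# It would then be used as, e.g. (not run here; requires an ffmpeg build
# that includes the dnn_processing filter):
#   ./ffmpeg -i input.jpg -vf "$filter" -y out.native.png
```

`input` and `output` must match the tensor names baked into the model (`dnn_in`/`dnn_out` in the thread's script).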
Re: [FFmpeg-devel] [PATCH V3] avfilter/vf_dnn_processing: add a generic filter for image processing with dnn networks
> -----Original Message-----
> From: Guo, Yejun
> Sent: Thursday, October 31, 2019 4:33 PM
> To: ffmpeg-devel@ffmpeg.org
> Cc: Guo, Yejun
> Subject: [PATCH V3] avfilter/vf_dnn_processing: add a generic filter for image
> processing with dnn networks
>
> [quoted patch description, python script, and ffmpeg example commands snipped]
>
> Signed-off-by: Guo, Yejun
> ---
>  configure                       |   1 +
>  doc/filters.texi                |  44 ++
>  libavfilter/Makefile            |   1 +
>  libavfilter/allfilters.c        |   1 +
>  libavfilter/vf_dnn_processing.c | 331

this patch asks for review, thanks.
[FFmpeg-devel] [PATCH V3] avfilter/vf_dnn_processing: add a generic filter for image processing with dnn networks
This filter accepts all the dnn networks which do image processing.
Currently, frame with formats rgb24 and bgr24 are supported. Other
formats such as gray and YUV will be supported next. The dnn network
can accept data in float32 or uint8 format. And the dnn network can
change frame size.

The following is a python script to halve the value of the first
channel of the pixel. It demos how to setup and execute dnn model
with python+tensorflow. It also generates .pb file which will be
used by ffmpeg.

import tensorflow as tf
import numpy as np
import scipy.misc
in_img = scipy.misc.imread('in.bmp')
in_img = in_img.astype(np.float32)/255.0
in_data = in_img[np.newaxis, :]
filter_data = np.array([0.5, 0, 0, 0, 1., 0, 0, 0, 1.]).reshape(1,1,3,3).astype(np.float32)
filter = tf.Variable(filter_data)
x = tf.placeholder(tf.float32, shape=[1, None, None, 3], name='dnn_in')
y = tf.nn.conv2d(x, filter, strides=[1, 1, 1, 1], padding='VALID', name='dnn_out')
sess=tf.Session()
sess.run(tf.global_variables_initializer())
output = sess.run(y, feed_dict={x: in_data})
graph_def = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, ['dnn_out'])
tf.train.write_graph(graph_def, '.', 'halve_first_channel.pb', as_text=False)
output = output * 255.0
output = output.astype(np.uint8)
scipy.misc.imsave("out.bmp", np.squeeze(output))

To do the same thing with ffmpeg:
- generate halve_first_channel.pb with the above script
- generate halve_first_channel.model with tools/python/convert.py
- try with following commands
./ffmpeg -i input.jpg -vf dnn_processing=model=halve_first_channel.model:input=dnn_in:output=dnn_out:fmt=rgb24:dnn_backend=native -y out.native.png
./ffmpeg -i input.jpg -vf dnn_processing=model=halve_first_channel.pb:input=dnn_in:output=dnn_out:fmt=rgb24:dnn_backend=tensorflow -y out.tf.png

Signed-off-by: Guo, Yejun
---
 configure                       |   1 +
 doc/filters.texi                |  44 ++
 libavfilter/Makefile            |   1 +
 libavfilter/allfilters.c        |   1 +
 libavfilter/vf_dnn_processing.c | 331
 5 files changed, 378 insertions(+)
 create mode 100644 libavfilter/vf_dnn_processing.c

diff --git a/configure b/configure
index 875b77f..4b3964d 100755
--- a/configure
+++ b/configure
@@ -3463,6 +3463,7 @@ derain_filter_select="dnn"
 deshake_filter_select="pixelutils"
 deshake_opencl_filter_deps="opencl"
 dilation_opencl_filter_deps="opencl"
+dnn_processing_filter_select="dnn"
 drawtext_filter_deps="libfreetype"
 drawtext_filter_suggest="libfontconfig libfribidi"
 elbg_filter_deps="avcodec"

diff --git a/doc/filters.texi b/doc/filters.texi
index 9d387be..15771ab 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -8928,6 +8928,50 @@ ffmpeg -i INPUT -f lavfi -i nullsrc=hd720,geq='r=128+80*(sin(sqrt((X-W/2)*(X-W/2
 @end example
 @end itemize

+@section dnn_processing
+
+Do image processing with deep neural networks. Currently only AVFrame with RGB24
+and BGR24 are supported, more formats will be added later.
+
+The filter accepts the following options:
+
+@table @option
+@item dnn_backend
+Specify which DNN backend to use for model loading and execution. This option accepts
+the following values:
+
+@table @samp
+@item native
+Native implementation of DNN loading and execution.
+
+@item tensorflow
+TensorFlow backend. To enable this backend you
+need to install the TensorFlow for C library (see
+@url{https://www.tensorflow.org/install/install_c}) and configure FFmpeg with
+@code{--enable-libtensorflow}
+@end table
+
+Default value is @samp{native}.
+
+@item model
+Set path to model file specifying network architecture and its parameters.
+Note that different backends use different file formats. TensorFlow and native
+backend can load files for only its format.
+
+Native model file (.model) can be generated from TensorFlow model file (.pb) by using tools/python/convert.py
+
+@item input
+Set the input name of the dnn network.
+
+@item output
+Set the output name of the dnn network.
+
+@item fmt
+Set the pixel format for the Frame. Allowed values are @code{AV_PIX_FMT_RGB24}, and @code{AV_PIX_FMT_BGR24}.
+Default value is @code{AV_PIX_FMT_RGB24}.
+
+@end table
+
 @section drawbox

 Draw a colored box on the input image.

diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index 2080eed..3eff398 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -223,6 +223,7 @@ OBJS-$(CONFIG_DILATION_FILTER)               += vf_neighbor.o
 OBJS-$(CONFIG_DILATION_OPENCL_FILTER)        += vf_neighbor_opencl.o opencl.o \
                                                 opencl/neighbor.o
 OBJS-$(CONFIG_DISPLACE_FILTER)               += vf_displace.o framesync.o
+OBJS-$(CONFIG_DNN_PROCESSING_FILTER)         += vf_dnn_processing.o
 OBJS-$(CONFIG_DOUBLEWEAVE_FILTER)            += vf_weave.o
 OBJS-$(CONFIG_DRAWBOX_FILTER)                += vf_drawbox.o
 OBJS-$(CONFIG_DRAWGRAPH_FILTER)              += f_drawgraph.o
diff --git
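The commit message notes the dnn network can change the frame size; with the script's padding='VALID' conv2d, the output size follows the standard no-padding convolution formula. A small pure-python sketch of that arithmetic (the helper name is ours, not from the patch):

```python
def valid_conv2d_size(height, width, kernel_h, kernel_w, stride_h=1, stride_w=1):
    """Output size of a conv2d with padding='VALID' (no padding):
    each dimension shrinks by kernel-1, then is divided by the stride."""
    out_h = (height - kernel_h) // stride_h + 1
    out_w = (width - kernel_w) // stride_w + 1
    return out_h, out_w

# The thread's 1x1 kernel with stride 1 leaves the frame size unchanged,
# which is why in.bmp and out.bmp have the same dimensions.
print(valid_conv2d_size(720, 1280, 1, 1))   # → (720, 1280)

# A 3x3 'VALID' kernel would shrink the frame by 2 in each dimension,
# a case the filter must handle when allocating the output AVFrame.
print(valid_conv2d_size(720, 1280, 3, 3))   # → (718, 1278)
```

This is why the filter queries the network for its output dimensions instead of assuming they match the input.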