Re: [FFmpeg-devel] [PATCH 1/2] dnn/native: add native support for avg_pool

2020-07-20 Thread Fu, Ting


> -----Original Message-----
> From: ffmpeg-devel  On Behalf Of Guo,
> Yejun
> Sent: Monday, July 20, 2020 01:46 PM
> To: FFmpeg development discussions and patches 
> Subject: Re: [FFmpeg-devel] [PATCH 1/2] dnn/native: add native support for
> avg_pool
> 
> 
> 
> > -----Original Message-----
> > From: ffmpeg-devel  On Behalf Of Ting
> > Fu
> > Sent: July 17, 2020 23:23
> > To: ffmpeg-devel@ffmpeg.org
> > Subject: [FFmpeg-devel] [PATCH 1/2] dnn/native: add native support for
> > avg_pool
> >
> > It can be tested with the model generated with below python script:
> >
> > import tensorflow as tf
> > import numpy as np
> > import imageio
> >
> > in_img = imageio.imread('input_odd.jpg')
> > in_img = in_img.astype(np.float32)/255.0
> > in_data = in_img[np.newaxis, :]
> >
> > x = tf.placeholder(tf.float32, shape=[1, None, None, 3], name='dnn_in')
> > x_pool = tf.nn.avg_pool(x, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')
> > #please alter the params as needed
> > y = tf.identity(x_pool, name='dnn_out')
> >
> > sess=tf.Session()
> > sess.run(tf.global_variables_initializer())
> >
> > graph_def = tf.graph_util.convert_variables_to_constants(sess,
> > sess.graph_def,
> > ['dnn_out'])
> > tf.train.write_graph(graph_def, '.', 'image_process.pb',
> > as_text=False)
> >
> > print("image_process.pb generated, please use \
> > path_to_ffmpeg/tools/python/convert.py to generate image_process.model\n")
> >
> > output = sess.run(y, feed_dict={x: in_data})
> > imageio.imsave("out.jpg", np.squeeze(output))
> >
> > Signed-off-by: Ting Fu 
> > ---
> >  libavfilter/dnn/Makefile  |   1 +
> >  libavfilter/dnn/dnn_backend_native.h  |   2 +
> >  .../dnn/dnn_backend_native_layer_avgpool.c| 136 ++
> >  .../dnn/dnn_backend_native_layer_avgpool.h|  35 +
> >  .../dnn/dnn_backend_native_layer_conv2d.h |   3 +-
> >  libavfilter/dnn/dnn_backend_native_layers.c   |   2 +
> >  tools/python/convert_from_tensorflow.py   |  31 +++-
> >  7 files changed, 207 insertions(+), 3 deletions(-)
> >  create mode 100644 libavfilter/dnn/dnn_backend_native_layer_avgpool.c
> >  create mode 100644 libavfilter/dnn/dnn_backend_native_layer_avgpool.h
> >
[...]
> > +int32_t input_operand_index = input_operand_indexes[0];
> > +int number = operands[input_operand_index].dims[0];
> > +int height = operands[input_operand_index].dims[1];
> > +int width = operands[input_operand_index].dims[2];
> > +int channel = operands[input_operand_index].dims[3];
> 
> the input channel should come from here, not in AvgPoolParams.
> And so as output channel.

Hi Yejun,

Understood, in_channel should come from here. Does "and so as output channel"
mean that out_channel = in_channel here, since pooling across channels is not
supported?
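
For illustration only, a minimal pure-Python sketch of spatial average pooling without padding. The helper name avg_pool_2d and the nested-list H x W x C layout are assumptions made for this example, not code from the patch. It shows that each channel is averaged independently, so the output keeps the input's channel count:

```python
def avg_pool_2d(image, ksize, stride):
    """Average-pool an H x W x C nested list without any zero padding."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    out = []
    for y in range(0, height - ksize + 1, stride):
        row = []
        for x in range(0, width - ksize + 1, stride):
            pixel = []
            # Each channel is averaged on its own; channels are never mixed.
            for c in range(channels):
                total = sum(image[y + dy][x + dx][c]
                            for dy in range(ksize) for dx in range(ksize))
                pixel.append(total / (ksize * ksize))
            row.append(pixel)
        out.append(row)
    return out


# A 2 x 2 image with 3 channels per pixel.
image = [[[1.0, 10.0, 100.0], [3.0, 30.0, 300.0]],
         [[5.0, 50.0, 500.0], [7.0, 70.0, 700.0]]]
pooled = avg_pool_2d(image, ksize=2, stride=2)
```

Here pooled holds a single output pixel that still has three channel values, which is the out_channel = in_channel case asked about above.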

> 
> > +const float *input = operands[input_operand_index].data;
> > +const AvgPoolParams *avgpool_params = (const AvgPoolParams
> > *)parameters;
> > +
> > +float kernel_strides = avgpool_params->strides;
> 
> why float?

So that height / kernel_strides is evaluated as a float division before it is
passed to the following ceil(). Or should I keep kernel_strides as an int and
multiply it by 1.0 when calling ceil()?
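
One possible alternative, sketched here as an assumption rather than as what the patch does: for positive integers the ceiling of a division can be computed without any float, which in C would read (height + strides - 1) / strides. The helper name ceil_div is hypothetical. The Python sketch below checks that form against math.ceil:

```python
import math


def ceil_div(numerator, denominator):
    """Ceiling of numerator / denominator for positive integers, no floats."""
    return (numerator + denominator - 1) // denominator


# Compare the integer-only form with the float-based ceil() for a few sizes.
for size in (1, 2, 10, 11, 1078, 1079, 1080, 1081):
    for stride in (1, 2, 3, 11):
        assert ceil_div(size, stride) == math.ceil(size / stride)
```

With this form kernel_strides could stay an int, and the (int) casts in the radius computation would no longer be needed.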

> 
> > +int src_linesize = width * avgpool_params->in_channels;
> > +DnnOperand *output_operand = &operands[output_operand_index];
> > +
> > +if (avgpool_params->padding_method == SAME) {
> > +height_end = height;
> > +width_end = width;
> > +height_radius = (avgpool_params->kernel_size - ((height - 1) % (int)kernel_strides + 1));
> 
> don't need the first '(' and last ')'.

OK

> 
> why we need to consider kernel_strides here?

Because with padding_method=SAME, TensorFlow pads only as many zero pixels as
the kernel needs beyond the remainder pixels. It puts half of them (rounded
down) before the image and the rest after it.
E.g. if the width is 1080 and strides=11, then 1080%11=2.
If ksize=5, the total padding is 5-2=3, so it fills (5-2)>>1=1 column before
the image and 2 columns after the image.
If ksize=2, then 2-2=0: the remainder pixels are exactly enough for one more
pooling window, so no zero pixels are filled.
So the number of zero pixels to fill depends on the remainder pixels.
Does the example make sense?
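
To make the arithmetic concrete, a small sketch (not part of the patch; both function names are made up for illustration). same_padding mirrors the radius expression from the patch, and same_padding_reference uses the usual SAME rule of out = ceil(size / stride) with total padding (out - 1) * stride + ksize - size, clamped at zero:

```python
def same_padding(size, ksize, stride):
    """Return (pad_before, pad_after) along one axis, as the patch computes it."""
    total = ksize - ((size - 1) % stride + 1)
    total = max(total, 0)          # never pad a negative amount
    before = total >> 1            # smaller half goes before the image
    return before, total - before


def same_padding_reference(size, ksize, stride):
    """The same quantity derived from the output size under SAME padding."""
    out_size = -(-size // stride)  # integer ceil(size / stride)
    total = max((out_size - 1) * stride + ksize - size, 0)
    return total // 2, total - total // 2


# Both forms agree over a range of sizes, kernel sizes and strides.
for size in range(1, 200):
    for ksize in range(1, 8):
        for stride in range(1, 13):
            assert same_padding(size, ksize, stride) == \
                same_padding_reference(size, ksize, stride)
```

For width 1080 and strides=11 this gives (1, 2) with ksize=5 and (0, 0) with ksize=2, matching the two cases above.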

> 
> > +width_radius = (avgpool_params->kernel_size - ((width - 1) % (int)kernel_strides + 1));
> 
> same as above.
> 
> > +height_radius = height_radius < 0 ? 0 : height_radi

Re: [FFmpeg-devel] [PATCH 1/2] dnn/native: add native support for avg_pool

2020-07-19 Thread Guo, Yejun


> -----Original Message-----
> From: ffmpeg-devel  On Behalf Of Ting Fu
> Sent: July 17, 2020 23:23
> To: ffmpeg-devel@ffmpeg.org
> Subject: [FFmpeg-devel] [PATCH 1/2] dnn/native: add native support for
> avg_pool
> 
> It can be tested with the model generated with below python script:
> 
> import tensorflow as tf
> import numpy as np
> import imageio
> 
> in_img = imageio.imread('input_odd.jpg')
> in_img = in_img.astype(np.float32)/255.0
> in_data = in_img[np.newaxis, :]
> 
> x = tf.placeholder(tf.float32, shape=[1, None, None, 3], name='dnn_in')
> x_pool = tf.nn.avg_pool(x, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')
> #please alter the params as needed
> y = tf.identity(x_pool, name='dnn_out')
> 
> sess=tf.Session()
> sess.run(tf.global_variables_initializer())
> 
> graph_def = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def,
> ['dnn_out'])
> tf.train.write_graph(graph_def, '.', 'image_process.pb', as_text=False)
> 
> print("image_process.pb generated, please use \
> path_to_ffmpeg/tools/python/convert.py to generate image_process.model\n")
> 
> output = sess.run(y, feed_dict={x: in_data})
> imageio.imsave("out.jpg", np.squeeze(output))
> 
> Signed-off-by: Ting Fu 
> ---
>  libavfilter/dnn/Makefile  |   1 +
>  libavfilter/dnn/dnn_backend_native.h  |   2 +
>  .../dnn/dnn_backend_native_layer_avgpool.c| 136 ++
>  .../dnn/dnn_backend_native_layer_avgpool.h|  35 +
>  .../dnn/dnn_backend_native_layer_conv2d.h |   3 +-
>  libavfilter/dnn/dnn_backend_native_layers.c   |   2 +
>  tools/python/convert_from_tensorflow.py   |  31 +++-
>  7 files changed, 207 insertions(+), 3 deletions(-)
>  create mode 100644 libavfilter/dnn/dnn_backend_native_layer_avgpool.c
>  create mode 100644 libavfilter/dnn/dnn_backend_native_layer_avgpool.h
> 
> diff --git a/libavfilter/dnn/Makefile b/libavfilter/dnn/Makefile
> index d90137ec42..e0957073ee 100644
> --- a/libavfilter/dnn/Makefile
> +++ b/libavfilter/dnn/Makefile
> @@ -1,6 +1,7 @@
>  OBJS-$(CONFIG_DNN)   += dnn/dnn_interface.o
>  OBJS-$(CONFIG_DNN)   += dnn/dnn_backend_native.o
>  OBJS-$(CONFIG_DNN)   += dnn/dnn_backend_native_layers.o
> +OBJS-$(CONFIG_DNN)   += dnn/dnn_backend_native_layer_avgpool.o
>  OBJS-$(CONFIG_DNN)   += dnn/dnn_backend_native_layer_pad.o
>  OBJS-$(CONFIG_DNN)   += dnn/dnn_backend_native_layer_conv2d.o
>  OBJS-$(CONFIG_DNN)   += dnn/dnn_backend_native_layer_depth2space.o
> diff --git a/libavfilter/dnn/dnn_backend_native.h b/libavfilter/dnn/dnn_backend_native.h
> index 62191ffe88..26e9a33387 100644
> --- a/libavfilter/dnn/dnn_backend_native.h
> +++ b/libavfilter/dnn/dnn_backend_native.h
> @@ -43,10 +43,12 @@ typedef enum {
>  DLT_MAXIMUM = 4,
>  DLT_MATH_BINARY = 5,
>  DLT_MATH_UNARY = 6,
> +DLT_AVG_POOL = 7,
>  DLT_COUNT
>  } DNNLayerType;
> 
>  typedef enum {DOT_INPUT = 1, DOT_OUTPUT = 2, DOT_INTERMEDIATE = DOT_INPUT | DOT_OUTPUT} DNNOperandType;
> +typedef enum {VALID, SAME, SAME_CLAMP_TO_EDGE} DNNPaddingParam;
> 
>  typedef struct Layer{
>  DNNLayerType type;
> diff --git a/libavfilter/dnn/dnn_backend_native_layer_avgpool.c b/libavfilter/dnn/dnn_backend_native_layer_avgpool.c
> new file mode 100644
> index 00..f5a3f4a0dc
> --- /dev/null
> +++ b/libavfilter/dnn/dnn_backend_native_layer_avgpool.c
> @@ -0,0 +1,136 @@
> +/*
> + * Copyright (c) 2020
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +/**
> + * @file
> + * DNN native backend implementation.
> + */
> +
> +#include "libavutil/avassert.h"
> +#include "dnn_backend_native_layer_avgpool.h"
> +
> +int dnn_load_layer_avg_pool(Layer

[FFmpeg-devel] [PATCH 1/2] dnn/native: add native support for avg_pool

2020-07-17 Thread Ting Fu
It can be tested with the model generated with below python script:

import tensorflow as tf
import numpy as np
import imageio

in_img = imageio.imread('input_odd.jpg')
in_img = in_img.astype(np.float32)/255.0
in_data = in_img[np.newaxis, :]

x = tf.placeholder(tf.float32, shape=[1, None, None, 3], name='dnn_in')
x_pool = tf.nn.avg_pool(x, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME') 
#please alter the params as needed
y = tf.identity(x_pool, name='dnn_out')

sess=tf.Session()
sess.run(tf.global_variables_initializer())

graph_def = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, 
['dnn_out'])
tf.train.write_graph(graph_def, '.', 'image_process.pb', as_text=False)

print("image_process.pb generated, please use \
path_to_ffmpeg/tools/python/convert.py to generate image_process.model\n")

output = sess.run(y, feed_dict={x: in_data})
imageio.imsave("out.jpg", np.squeeze(output))

Signed-off-by: Ting Fu 
---
 libavfilter/dnn/Makefile  |   1 +
 libavfilter/dnn/dnn_backend_native.h  |   2 +
 .../dnn/dnn_backend_native_layer_avgpool.c| 136 ++
 .../dnn/dnn_backend_native_layer_avgpool.h|  35 +
 .../dnn/dnn_backend_native_layer_conv2d.h |   3 +-
 libavfilter/dnn/dnn_backend_native_layers.c   |   2 +
 tools/python/convert_from_tensorflow.py   |  31 +++-
 7 files changed, 207 insertions(+), 3 deletions(-)
 create mode 100644 libavfilter/dnn/dnn_backend_native_layer_avgpool.c
 create mode 100644 libavfilter/dnn/dnn_backend_native_layer_avgpool.h

diff --git a/libavfilter/dnn/Makefile b/libavfilter/dnn/Makefile
index d90137ec42..e0957073ee 100644
--- a/libavfilter/dnn/Makefile
+++ b/libavfilter/dnn/Makefile
@@ -1,6 +1,7 @@
 OBJS-$(CONFIG_DNN)   += dnn/dnn_interface.o
 OBJS-$(CONFIG_DNN)   += dnn/dnn_backend_native.o
 OBJS-$(CONFIG_DNN)   += dnn/dnn_backend_native_layers.o
+OBJS-$(CONFIG_DNN)   += dnn/dnn_backend_native_layer_avgpool.o
 OBJS-$(CONFIG_DNN)   += dnn/dnn_backend_native_layer_pad.o
 OBJS-$(CONFIG_DNN)   += dnn/dnn_backend_native_layer_conv2d.o
 OBJS-$(CONFIG_DNN)   += dnn/dnn_backend_native_layer_depth2space.o
diff --git a/libavfilter/dnn/dnn_backend_native.h b/libavfilter/dnn/dnn_backend_native.h
index 62191ffe88..26e9a33387 100644
--- a/libavfilter/dnn/dnn_backend_native.h
+++ b/libavfilter/dnn/dnn_backend_native.h
@@ -43,10 +43,12 @@ typedef enum {
 DLT_MAXIMUM = 4,
 DLT_MATH_BINARY = 5,
 DLT_MATH_UNARY = 6,
+DLT_AVG_POOL = 7,
 DLT_COUNT
 } DNNLayerType;
 
 typedef enum {DOT_INPUT = 1, DOT_OUTPUT = 2, DOT_INTERMEDIATE = DOT_INPUT | DOT_OUTPUT} DNNOperandType;
+typedef enum {VALID, SAME, SAME_CLAMP_TO_EDGE} DNNPaddingParam;
 
 typedef struct Layer{
 DNNLayerType type;
diff --git a/libavfilter/dnn/dnn_backend_native_layer_avgpool.c b/libavfilter/dnn/dnn_backend_native_layer_avgpool.c
new file mode 100644
index 00..f5a3f4a0dc
--- /dev/null
+++ b/libavfilter/dnn/dnn_backend_native_layer_avgpool.c
@@ -0,0 +1,136 @@
+/*
+ * Copyright (c) 2020
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * DNN native backend implementation.
+ */
+
+#include "libavutil/avassert.h"
+#include "dnn_backend_native_layer_avgpool.h"
+
+int dnn_load_layer_avg_pool(Layer *layer, AVIOContext *model_file_context, int file_size, int operands_num)
+{
+AvgPoolParams *avgpool_params;
+int dnn_size = 0;
+avgpool_params = av_malloc(sizeof(*avgpool_params));
+if(!avgpool_params)
+return 0;
+
+avgpool_params->strides = (int32_t)avio_rl32(model_file_context);
+avgpool_params->padding_method = (int32_t)avio_rl32(model_file_context);
+avgpool_params->in_channels = (int32_t)avio_rl32(model_file_context);
+avgpool_params->out_channels = (int32_t)avio_rl32(model_file_context);
+avgpool_params->kernel_size = (int32_t)avio_rl32(model_file_context);
+dnn_size += 20;
+
+if (dnn_size > file_size || avgpool_params->in_channels <= 0 ||
+avgpool_params->out_channels <= 0 || avgpool_params->kernel_size <= 0 ||
+avgpool_params->strides <=0){
+