> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-boun...@ffmpeg.org> On Behalf Of Guo, Yejun
> Sent: Monday, July 20, 2020 01:46 PM
> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH 1/2] dnn/native: add native support for avg_pool
>
> > -----Original Message-----
> > From: ffmpeg-devel <ffmpeg-devel-boun...@ffmpeg.org> On Behalf Of Ting Fu
> > Sent: July 17, 2020 23:23
> > To: ffmpeg-devel@ffmpeg.org
> > Subject: [FFmpeg-devel] [PATCH 1/2] dnn/native: add native support for avg_pool
> >
> > It can be tested with the model generated with below python script:
> >
> > import tensorflow as tf
> > import numpy as np
> > import imageio
> >
> > in_img = imageio.imread('input_odd.jpg')
> > in_img = in_img.astype(np.float32)/255.0
> > in_data = in_img[np.newaxis, :]
> >
> > x = tf.placeholder(tf.float32, shape=[1, None, None, 3], name='dnn_in')
> > x_pool = tf.nn.avg_pool(x, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME') #please alter the params as needed
> > y = tf.identity(x_pool, name='dnn_out')
> >
> > sess=tf.Session()
> > sess.run(tf.global_variables_initializer())
> >
> > graph_def = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, ['dnn_out'])
> > tf.train.write_graph(graph_def, '.', 'image_process.pb', as_text=False)
> >
> > print("image_process.pb generated, please use \
> > path_to_ffmpeg/tools/python/convert.py to generate image_process.model\n")
> >
> > output = sess.run(y, feed_dict={x: in_data})
> > imageio.imsave("out.jpg", np.squeeze(output))
> >
> > Signed-off-by: Ting Fu <ting...@intel.com>
> > ---
> >  libavfilter/dnn/Makefile                      |   1 +
> >  libavfilter/dnn/dnn_backend_native.h          |   2 +
> >  .../dnn/dnn_backend_native_layer_avgpool.c    | 136 ++++++++++++++++++
> >  .../dnn/dnn_backend_native_layer_avgpool.h    |  35 +++++
> >  .../dnn/dnn_backend_native_layer_conv2d.h     |   3 +-
> >  libavfilter/dnn/dnn_backend_native_layers.c   |   2 +
> >  tools/python/convert_from_tensorflow.py       |  31 +++-
> >  7 files changed, 207 insertions(+), 3 deletions(-)
> >  create mode 100644 libavfilter/dnn/dnn_backend_native_layer_avgpool.c
> >  create mode 100644 libavfilter/dnn/dnn_backend_native_layer_avgpool.h
> [...]
> > +    int32_t input_operand_index = input_operand_indexes[0];
> > +    int number = operands[input_operand_index].dims[0];
> > +    int height = operands[input_operand_index].dims[1];
> > +    int width = operands[input_operand_index].dims[2];
> > +    int channel = operands[input_operand_index].dims[3];
>
> the input channel should come from here, not in AvgPoolParams.
> And so as output channel.
Hi Yejun,

I got it that the in_channel should come from here. Does 'so as output channel' mean out_channel = in_channel here (since pooling over the channel dimension is not supported)?

> > +    const float *input = operands[input_operand_index].data;
> > +    const AvgPoolParams *avgpool_params = (const AvgPoolParams *)parameters;
> > +
> > +    float kernel_strides = avgpool_params->strides;
>
> why float?

So that height / kernel_strides produces a float result for the following ceil() call. Or should I keep it as an int and multiply it by 1.0 when calling ceil()?

> > +    int src_linesize = width * avgpool_params->in_channels;
> > +    DnnOperand *output_operand = &operands[output_operand_index];
> > +
> > +    if (avgpool_params->padding_method == SAME) {
> > +        height_end = height;
> > +        width_end = width;
> > +        height_radius = (avgpool_params->kernel_size - ((height - 1) % (int) kernel_strides + 1));
>
> don't need the first '(' and last ')'.

OK.

> why we need to consider kernel_strides here?

Because when padding_method=SAME, TensorFlow only pads with the zero pixels that are still needed after the remainder pixels are used, and splits them before and after the image.
For example, if the width is 1080 and strides=11, then 1080 % 11 = 2 remainder pixels.
With ksize=5, it needs 5 - 2 = 3 padding columns, so (5 - 2) >> 1 = 1 column is filled before the image and 2 columns after it.
With ksize=2, 2 - 2 = 0: the remainder pixels are exactly enough for one more pooling window, so no zero pixels are filled.
So the number of zero pixels to fill depends on the remainder pixels, which is why kernel_strides is involved here. Does the example make sense? (I also put a short sketch of this computation further down in this mail.)

> > +        width_radius = (avgpool_params->kernel_size - ((width - 1) % (int) kernel_strides + 1));
>
> same as above.
>
> > +        height_radius = height_radius < 0 ? 0 : height_radius >> 1;
> > +        width_radius = width_radius < 0 ? 0 : width_radius >> 1;
[...]
> > +    for (int y = 0; y < height_end; y += kernel_strides) {
> > +        for (int x = 0; x < width_end; x += kernel_strides) {
> > +            for (int n_filter = 0; n_filter < avgpool_params->out_channels; ++n_filter) {
> [...]
> better to use n_channel, instead of n_filter.

Sure.

> > +                output[n_filter] = 0.0;
> > +                kernel_area = 0;
[...]
> > +    def dump_avg_pool_to_file(self, node, f):
> > +        assert(node.op == 'AvgPool')
> > +        self.layer_number = self.layer_number + 1
> > +        self.converted_nodes.add(node.name)
> > +        node0 = self.name_node_dict[node.input[0]]
> > +        strides = node.attr['strides']
> > +        assert(strides.list.i[1]==strides.list.i[2])
> > +        strides = strides.list.i[1]
> > +        filter_node = node.attr['ksize']
> > +        input_name = node.input[0]
> [...]
> we can save strides[4] and ksize[4] in .model file, and do part support in .c file.

Do you mean saving all 4 values of strides and ksize in the .model file, and then extracting the ones we need in the .c file?
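For example, something like the layout below for the avg_pool record. This is just a rough sketch of what I have in mind, not tested, with placeholder values for the op code and padding code (the real ones would still come from self.op2code and self.pool_paddings):

import numpy as np

# hypothetical .model record for one AvgPool layer:
# op code, 4 stride values, 4 ksize values, padding method
op_avgpool   = 9            # placeholder, real value is self.op2code[node.op]
padding_same = 0            # placeholder, real value is self.pool_paddings[padding]
strides = [1, 2, 2, 1]      # node.attr['strides'].list.i, as in the test script
ksize   = [1, 2, 2, 1]      # node.attr['ksize'].list.i, as in the test script

with open('avgpool_record.bin', 'wb') as f:
    np.array([op_avgpool] + strides + ksize + [padding_same],
             dtype=np.uint32).tofile(f)

The .c side would then only use strides[1]/strides[2] and ksize[1]/ksize[2] for now, and reject the combinations it does not support yet.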
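Also, going back to the SAME padding question above, here is the worked example as a small sketch of my understanding of how TensorFlow computes the SAME padding amounts (the expression in the patch, kernel_size - ((in - 1) % stride + 1), gives the same total padding):

import math

def same_padding(in_size, ksize, stride):
    # TensorFlow 'SAME' padding: pad just enough so that
    # out_size == ceil(in_size / stride)
    out_size = math.ceil(in_size / stride)
    pad_total = max((out_size - 1) * stride + ksize - in_size, 0)
    pad_before = pad_total // 2        # what height_radius/width_radius hold in the patch
    pad_after = pad_total - pad_before
    return pad_before, pad_after

# the example from above: width 1080, strides 11
print(same_padding(1080, 5, 11))   # (1, 2): 1 column before, 2 after
print(same_padding(1080, 2, 11))   # (0, 0): the remainder fits exactly, no padding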
> > +
> > +        filter_height = filter_node.list.i[1]
> > +        filter_width = filter_node.list.i[2]
> > +
> > +        in_channels = node0.attr['shape'].shape.dim[3].size
> > +        out_channels = in_channels
> > +        padding = node.attr['padding'].s.decode("utf-8")
> > +        np.array([self.op2code[node.op], strides, self.pool_paddings[padding], in_channels, out_channels, filter_height],dtype=np.uint32).tofile(f)
> > +
> > +        input_operand_index = self.add_operand(input_name, Operand.IOTYPE_INPUT)
> > +        output_operand_index = self.add_operand(node.name, Operand.IOTYPE_OUTPUT)
> > +        np.array([input_operand_index, output_operand_index],dtype=np.uint32).tofile(f)
> > +
> > +
> >      def dump_layers_to_file(self, f):
> >          for node in self.nodes:
> >              if node.name in self.converted_nodes:
> > @@ -311,6 +338,8 @@ class TFConverter:
> >
> >              if node.op == 'Conv2D':
> >                  self.dump_simple_conv2d_to_file(node, f)
> > +            if node.op == 'AvgPool':
> > +                self.dump_avg_pool_to_file(node, f)
> >              elif node.op == 'DepthToSpace':
> >                  self.dump_depth2space_to_file(node, f)
> >              elif node.op == 'MirrorPad':
> > --
> > 2.17.1

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".