[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r223959389

File path: src/executor/graph_executor.cc

```diff
@@ -1561,15 +1561,32 @@ static nnvm::Symbol PartitionGraph(const nnvm::Symbol& src,
     } else {
       CHECK(i1 < arg_names.size());
       CHECK_EQ(arg_names[i1], input_names[i]);
-      arg_shapes.push_back(in_args[i1].shape());
-      arg_dtypes.push_back(in_args[i1].dtype());
-      arg_stypes.push_back(in_args[i1].storage_type());
-      in_arg_ctxes[i1] = in_args[i1].ctx();
+      arg_shapes.push_back(in_args->at(i1).shape());
+      arg_dtypes.push_back(in_args->at(i1).dtype());
+      arg_stypes.push_back(in_args->at(i1).storage_type());
+      in_arg_ctxes[i1] = in_args->at(i1).ctx();
       ++i1;
     }
   }
-  return PartitionGraph(src, prop_name, arg_shapes, arg_dtypes, arg_stypes,
-                        default_ctx, ctx_map, in_arg_ctxes, aux_state_ctxes);
+
+  // setup in_args_map
+  std::unordered_map<std::string, NDArray> in_args_map;
+  for (size_t i = 0; i < in_args->size(); ++i) {
+    in_args_map[arg_names[i]] = in_args->at(i);
```

Review comment:
@zheng-da Graph partitioning may now change the order of inputs. This is a basic requirement of graph optimization. For example, if we find a redundant op that doesn't contribute to the graph output, shouldn't we remove it? If the answer is yes, then the input list will change. Besides, the current API design provides poor support for inputs with the same name (if you treat the undocumented behavior as a kind of support). We should fix this anyway.
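To see why keying by name matters at the user level, here is a minimal Python sketch (the one-convolution network is hypothetical) showing that positional binding depends on an argument order a graph pass is free to change, while name-keyed binding does not:

```python
import mxnet as mx

data = mx.sym.Variable('data')
weight = mx.sym.Variable('weight')
sym = mx.sym.Convolution(data=data, weight=weight, no_bias=True,
                         kernel=(3, 3), num_filter=8, name='conv0')

# Positional args must follow list_arguments() order, which a graph
# pass may change when it removes or fuses nodes ...
print(sym.list_arguments())  # ['data', 'weight']

# ... whereas binding by name stays valid regardless of input order.
args = {'data': mx.nd.ones((1, 3, 8, 8)),
        'weight': mx.nd.ones((8, 3, 3, 3))}
exe = sym.bind(mx.cpu(), args=args)
exe.forward()
```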
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r223937667

File path: src/executor/graph_executor.cc

```diff
@@ -1561,15 +1561,32 @@ static nnvm::Symbol PartitionGraph(const nnvm::Symbol& src,
     } else {
       CHECK(i1 < arg_names.size());
       CHECK_EQ(arg_names[i1], input_names[i]);
-      arg_shapes.push_back(in_args[i1].shape());
-      arg_dtypes.push_back(in_args[i1].dtype());
-      arg_stypes.push_back(in_args[i1].storage_type());
-      in_arg_ctxes[i1] = in_args[i1].ctx();
+      arg_shapes.push_back(in_args->at(i1).shape());
+      arg_dtypes.push_back(in_args->at(i1).dtype());
+      arg_stypes.push_back(in_args->at(i1).storage_type());
+      in_arg_ctxes[i1] = in_args->at(i1).ctx();
       ++i1;
     }
   }
-  return PartitionGraph(src, prop_name, arg_shapes, arg_dtypes, arg_stypes,
-                        default_ctx, ctx_map, in_arg_ctxes, aux_state_ctxes);
+
+  // setup in_args_map
+  std::unordered_map<std::string, NDArray> in_args_map;
+  for (size_t i = 0; i < in_args->size(); ++i) {
+    in_args_map[arg_names[i]] = in_args->at(i);
```

Review comment:
Yeah, I understand that you can find the order from the code, but it's not guaranteed, and you shouldn't make any assumptions based on it: it may change in the future, because the order isn't part of the list_arguments() spec, at least for now. I think we should answer this question first: should we support inputs with the same name? If the answer is no, then we need to rename the duplicated inputs in the unit test and add a corresponding check and documentation to disallow users from doing that. If the answer is yes, then we need to define a clear way to distinguish inputs other than by name at the API level, instead of relying on the undocumented DFS order. Judging from the current API design, I'd say inputs with the same name shouldn't be supported, as the order of list_arguments() is unique only if the inputs have unique names. Adding DFS order to the list_arguments() spec isn't user friendly either: it's hard for an end user to work out the final DFS order of a complex topology, because a single op at the user level (e.g. rand_zipfian) may consist of many small ops in the final computation graph.
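To illustrate the last point, the number of internal nodes behind a single user-level op is visible through get_internals(). A sketch, assuming mx.sym.contrib.rand_zipfian keeps its documented (true_classes, num_sampled, range_max) signature:

```python
import mxnet as mx

true_classes = mx.sym.Variable('true_classes')
# One call at the user level ...
sampled, exp_true, exp_sampled = mx.sym.contrib.rand_zipfian(
    true_classes, num_sampled=5, range_max=10)
# ... expands into many small ops in the final computation graph, so
# the DFS order of arguments is opaque to the end user.
print(sampled.get_internals().list_outputs())
```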
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r223912091

File path: src/executor/graph_executor.cc

```diff
@@ -1561,15 +1561,32 @@ static nnvm::Symbol PartitionGraph(const nnvm::Symbol& src,
     } else {
      CHECK(i1 < arg_names.size());
       CHECK_EQ(arg_names[i1], input_names[i]);
-      arg_shapes.push_back(in_args[i1].shape());
-      arg_dtypes.push_back(in_args[i1].dtype());
-      arg_stypes.push_back(in_args[i1].storage_type());
-      in_arg_ctxes[i1] = in_args[i1].ctx();
+      arg_shapes.push_back(in_args->at(i1).shape());
+      arg_dtypes.push_back(in_args->at(i1).dtype());
+      arg_stypes.push_back(in_args->at(i1).storage_type());
+      in_arg_ctxes[i1] = in_args->at(i1).ctx();
       ++i1;
     }
   }
-  return PartitionGraph(src, prop_name, arg_shapes, arg_dtypes, arg_stypes,
-                        default_ctx, ctx_map, in_arg_ctxes, aux_state_ctxes);
+
+  // setup in_args_map
+  std::unordered_map<std::string, NDArray> in_args_map;
+  for (size_t i = 0; i < in_args->size(); ++i) {
+    in_args_map[arg_names[i]] = in_args->at(i);
```

Review comment:
> If the input type is a list of NDArray, the order should be same as the order of list_arguments().

If there are inputs with the same name, then in your case list_arguments() will look like ['data', 'data', 'data', ...]. How can we know which 'data' is data2?
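A minimal sketch of the ambiguity (hypothetical symbols; nothing currently stops a user from reusing a name):

```python
import mxnet as mx

data1 = mx.sym.Variable('data')
data2 = mx.sym.Variable('data')  # a distinct node with the same name
out = data1 + data2

print(out.list_arguments())  # ['data', 'data'] -- which entry is data2?

# A dict of str->NDArray cannot express two different values for 'data',
# and a list of NDArray relies on an undocumented DFS order.
```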
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r223906681

File path: src/executor/graph_executor.cc

```diff
@@ -1561,15 +1561,32 @@ static nnvm::Symbol PartitionGraph(const nnvm::Symbol& src,
     } else {
       CHECK(i1 < arg_names.size());
       CHECK_EQ(arg_names[i1], input_names[i]);
-      arg_shapes.push_back(in_args[i1].shape());
-      arg_dtypes.push_back(in_args[i1].dtype());
-      arg_stypes.push_back(in_args[i1].storage_type());
-      in_arg_ctxes[i1] = in_args[i1].ctx();
+      arg_shapes.push_back(in_args->at(i1).shape());
+      arg_dtypes.push_back(in_args->at(i1).dtype());
+      arg_stypes.push_back(in_args->at(i1).storage_type());
+      in_arg_ctxes[i1] = in_args->at(i1).ctx();
       ++i1;
     }
   }
-  return PartitionGraph(src, prop_name, arg_shapes, arg_dtypes, arg_stypes,
-                        default_ctx, ctx_map, in_arg_ctxes, aux_state_ctxes);
+
+  // setup in_args_map
+  std::unordered_map<std::string, NDArray> in_args_map;
+  for (size_t i = 0; i < in_args->size(); ++i) {
+    in_args_map[arg_names[i]] = in_args->at(i);
```

Review comment:
@mbrookhart I think MXNet requires inputs to have unique names; otherwise, how can we distinguish inputs with the same name? According to the symbol.bind API:
```
args (list of NDArray or dict of str to NDArray) - Input arguments to the symbol.
If the input type is a list of NDArray, the order should be same as the order of list_arguments().
If the input type is a dict of str to NDArray, then it maps the name of arguments to the corresponding NDArray.
In either case, all the arguments must be provided.
```
So basically, the name is the only ID for each input.
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r223569976

File path: src/operator/quantization/mkldnn/mkldnn_quantize-inl.h

```diff
@@ -75,6 +75,11 @@ static void MKLDNNQuantizeComputeKer(const std::vector<NDArray>& inputs,
   auto i_mpd = i_mem->get_primitive_desc();
   auto i_desc = i_mpd.desc();
   mkldnn::memory::format i_fmt = static_cast<mkldnn::memory::format>(i_desc.data.format);
+  if (i_fmt == mkldnn::memory::format::nchw ||
+      i_fmt == mkldnn::memory::format::nChw8c ||
+      i_fmt == mkldnn_nChw16c) {
+    i_fmt = mkldnn::memory::format::nhwc;
+  }
```

Review comment:
@KellenSunderland Sorry for the late response. In the MKL-DNN quantization flow, we don't quantize any params offline; instead we keep fp32 params (e.g. weight and bias) for the quantized convolution and quantize them online during the first forward pass. The code here sets the default output format of the 'quantize' op when it's used in the MKL-DNN quantization flow; it won't affect the non-MKL-DNN quantization flow.
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r223378669

File path: src/operator/subgraph/mkldnn/mkldnn_conv_property.cc

```diff
@@ -0,0 +1,239 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#if MXNET_USE_MKLDNN == 1
+
+#include "../common.h"
+#include "../subgraph_property.h"
+#include "../../nn/activation-inl.h"
+
+namespace mxnet {
+namespace op {
+class SgMKLDNNConvSelector : public SubgraphSelector {
+ public:
+  /*! \brief pattern match status */
+  enum SelectStatus {
+    kFail = 0,
+    kStart,
+    kBN,
+    kSum,
+    kSuccess,
+  };
+
+ private:
+  bool disable_all;
+  bool disable_conv_bn;
+  bool disable_conv_relu;
+  bool disable_conv_sum;
+  SelectStatus status;
+  std::vector<const nnvm::Node *> matched_list;
+
+ public:
+  SgMKLDNNConvSelector(int dis_all, int dis_conv_bn, int dis_conv_relu, int dis_conv_sum)
+      : disable_all(dis_all),
+        disable_conv_bn(dis_conv_bn),
+        disable_conv_relu(dis_conv_relu),
+        disable_conv_sum(dis_conv_sum) {}
+
+  bool Select(const nnvm::Node &n) override {
+    if (n.op() && n.op()->name == "Convolution") {
+      status = disable_all ? kSuccess : kStart;
+      matched_list.clear();
+      matched_list.push_back(&n);
+      return true;
+    }
+    return false;
+  }
+
+  bool SelectInput(const nnvm::Node &n, const nnvm::Node &new_node) override {
+    return false;
+  }
+
+  bool SelectOutput(const nnvm::Node &n, const nnvm::Node &new_node) override {
+    if (status == kFail || status == kSuccess || new_node.is_variable())
+      return false;
+    // If n isn't the last matched node, then we encountered an internal
+    // branch, we should pop out the node behind n and stop fusion.
+    if (matched_list.back() != &n) {
+      while (matched_list.back() != &n) {
+        matched_list.pop_back();
+      }
+      status = kSuccess;
+      return false;
+    }
+    // Use status machine to do selection. The status change is
+    // kStart -> kBN -> kSum -> kSuccess
+    switch (status) {
+      case kStart:
+        if ((!disable_conv_bn) && new_node.op()->name == "BatchNorm") {
+          matched_list.push_back(&new_node);
+          status = kBN;
+          return true;
+        }
+      case kBN:
+        if ((!disable_conv_sum) && new_node.op()->name == "elemwise_add") {
+          matched_list.push_back(&new_node);
+          status = kSum;
+          return true;
+        }
+      case kSum:
+      default:
+        if ((!disable_conv_relu) && new_node.op()->name == "Activation") {
+          const ActivationParam &param =
+              nnvm::get<ActivationParam>(new_node.attrs.parsed);
+          if (param.act_type == activation::kReLU) {
+            matched_list.push_back(&new_node);
+            // If we find conv+relu, then we can't match bn anymore.
+            if (status == kStart) status = kBN;
+            return true;
+          } else {
+            status = kSuccess;
+            return false;
+          }
+        }
+        status = kSuccess;
+        return false;
+    }
+  }
+
+  std::vector<nnvm::Node *> Filter(
+      const std::vector<nnvm::Node *> &candidates) override {
+    if (status == kFail) {
+      return std::vector<nnvm::Node *>(0);
+    } else {
+      return candidates;
+    }
+  }
+};
+
+class SgMKLDNNConvProperty : public SubgraphProperty {
+ public:
+  SgMKLDNNConvProperty() {
+    disable_all = dmlc::GetEnv("MXNET_DISABLE_MKLDNN_OPT", 0);
+    disable_conv_bn = dmlc::GetEnv("MXNET_DISABLE_MKLDNN_FUSE_CONV_BN", 0);
+    disable_conv_relu = dmlc::GetEnv("MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU", 0);
+    disable_conv_sum = dmlc::GetEnv("MXNET_DISABLE_MKLDNN_FUSE_CONV_SUM", 0);
+
+    disable_all =
+        disable_all && disable_conv_bn && disable_conv_relu && disable_conv_sum;
+    if (disable_all) {
+      LOG(INFO) << "MKLDNN Convolution optimization pass is disabled.";
+    } else {
+      LOG(INFO) << "Start to execute MKLDNN Convolution optimization pass.";
+    }
+  }
+  static SubgraphPropertyPtr Create() {
+    return std::make_shared<SgMKLDNNConvProperty>();
+  }
+  nnvm::NodePtr CreateSubgraphNode(const nnvm::Symbol &sym,
+                                   const i
```
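For reference, a sketch of how the environment switches read by SgMKLDNNConvProperty in the diff above can be exercised from Python; they must be set before the property is created, and `sym` is assumed to be an already-built FP32 symbol:

```python
import os

# Switches read via dmlc::GetEnv in SgMKLDNNConvProperty (see diff above);
# set them before MXNet creates the property.
os.environ['MXNET_DISABLE_MKLDNN_FUSE_CONV_BN'] = '1'   # keep conv+bn unfused
os.environ['MXNET_DISABLE_MKLDNN_FUSE_CONV_RELU'] = '0'
os.environ['MXNET_DISABLE_MKLDNN_FUSE_CONV_SUM'] = '0'

# `sym` is assumed to be an already-built FP32 symbol.
sym_sg = sym.get_backend_symbol("MKLDNN")  # runs the fusion pass
```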
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r223241107

File path: src/operator/subgraph/subgraph_property.h

```diff
@@ -92,6 +92,22 @@ class SubgraphProperty {
   // execute the operators in the subgraph.
   virtual nnvm::NodePtr CreateSubgraphNode(const nnvm::Symbol &s,
                                            const int subgraph_id = 0) const = 0;
+  // Connect subgraph internal output with external output entries. By default,
+  // each output entry will connect to an unique internal output.
+  virtual void ConnectSubgraphOutputs(
+      const nnvm::NodePtr n,
```

Review comment:
@reminisce The code style is not consistent in this file: SubgraphSelector uses a different coding style from SubgraphProperty, which makes it confusing which style to follow. For example, SubgraphSelector uses /*! */ style comments with parameter descriptions and long parameter names (e.g. nnvm::Node &input_node), while SubgraphProperty uses // style comments without parameter descriptions and short parameter names (e.g. const nnvm::Symbol &s). Changing just these two SubgraphProperty functions to the SubgraphSelector style doesn't make sense to me. If you insist, I'd rather adjust the code style for the whole file.
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r223165686

File path: src/c_api/c_api_symbolic.cc

```diff
@@ -696,3 +696,21 @@ int MXSetCalibTableToQuantizedSymbol(SymbolHandle qsym_handle,
   *ret_qsym_handle = s;
   API_END_HANDLE_ERROR(delete s);
 }
+
+int MXGenBackendSubgraph(SymbolHandle sym_handle, const char *backend,
```

Review comment:
Yes, your understanding is correct.
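For context, the Python-level entry point for this C API is Symbol.get_backend_symbol. A sketch of the raw ctypes call, reusing the mxnet.base helpers imported elsewhere in this PR's tests; the third ret-handle parameter is an assumption, since the quoted diff truncates after the second one:

```python
import ctypes
import mxnet as mx
from mxnet.base import SymbolHandle, check_call, _LIB, c_str

def gen_backend_subgraph(sym, backend):
    # Assumed signature: MXGenBackendSubgraph(handle, backend, &out),
    # mirroring what Symbol.get_backend_symbol does internally.
    out = SymbolHandle()
    check_call(_LIB.MXGenBackendSubgraph(sym.handle, c_str(backend),
                                         ctypes.byref(out)))
    return mx.sym.Symbol(out)
```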
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r222604179

File path: tests/python/mkl/test_subgraph.py

```diff
@@ -0,0 +1,472 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import sys
+import os
+import mxnet as mx
+import numpy as np
+import unittest
+import ctypes
+from mxnet.io import NDArrayIter
+from mxnet.module import Module
+from mxnet.symbol import Symbol
+from importlib import import_module
+from numpy.testing import assert_allclose
+from mxnet.base import SymbolHandle, check_call, _LIB, mx_uint, c_str
+from mxnet.test_utils import DummyIter
+curr_path = os.path.dirname(os.path.abspath(os.path.expanduser(__file__)))
+sys.path.append(os.path.join(curr_path, '../unittest/'))
+from common import with_seed
+
+DATA_SHAPE=[(4, 4, 10, 10), (32, 3, 24, 24), (64, 8, 64, 64)]
+DATA_LABEL=[(4, 10), (32, 10), (64, 10)]
+MIN_VALUE=-1.0
+MAX_VALUE=1.0
+
+def check_qsym_calibrated(qsym):
+  assert ''.join(qsym.attr_dict().keys()).find('quantized_sg_mkldnn_conv') != -1
+  for k, v in qsym.attr_dict().items():
+    if k.find('quantized_sg_mkldnn_conv') != -1:
+      assert 'min_calib_range' in v
+      assert 'max_calib_range' in v
+    if k.find('_quantize') != -1:
+      assert v['out_type'] == 'uint8'
+
+def check_qsym_forward(qsym, qarg_params, qaux_params, batch, data_shape, label_shape):
+  mod = mx.mod.Module(symbol=qsym, context=mx.current_context())
+  mod.bind(for_training=False,
+           data_shapes=[('data', data_shape)],
+           label_shapes=[('softmax_label', label_shape)])
+  mod.set_params(qarg_params, qaux_params)
+  mod.forward(batch, is_train=False)
+  for output in mod.get_outputs():
+    output.wait_to_read()
+  return output
+
+def check_quantize(sym, arg_params, aux_params, data_shape, label_shape, batch, sym_output):
+  excluded_sym_names = []
+  if mx.current_context() == mx.cpu():
+    excluded_sym_names += ['fc']
+  calib_data = mx.nd.random.uniform(shape=data_shape)
+  calib_data = NDArrayIter(data=calib_data)
+  calib_data = DummyIter(calib_data)
+  calib_layer = lambda name: name.endswith('_output')
+  qsym, qarg_params, qaux_params = mx.contrib.quant.quantize_model(sym=sym,
+                                                                   arg_params=arg_params,
+                                                                   aux_params=aux_params,
+                                                                   ctx=mx.current_context(),
+                                                                   excluded_sym_names=excluded_sym_names,
+                                                                   quantized_dtype='uint8',
+                                                                   calib_mode='naive',
+                                                                   calib_data=calib_data,
+                                                                   calib_layer=calib_layer,
+                                                                   calib_quantize_op=True,
+                                                                   num_calib_examples=20)
+  qsym = qsym.get_backend_symbol("MKLDNN_POST_QUANTIZE")
+  check_qsym_calibrated(qsym)
+  qsym_output = check_qsym_forward(qsym, qarg_params, qaux_params, batch, data_shape, label_shape)
+
+  diff = mx.nd.abs(sym_output - qsym_output.astype(sym_output.dtype))
+  cond = mx.nd.lesser(2, diff).sum().asscalar()
+  assert cond == 0
+
+@with_seed()
+def check_fusion(sym, data_shape, label_shape, attrs_op):
+  dev = mx.cpu()
+  mod = Module(symbol=sym)
+  mod.bind(data_shapes=[('data', data_shape)], label_shapes=[('softmax_label', label_shape)])
+  mod.init_params(mx.init.Normal(0.5))
+  arg_params, aux_params = mod.get_params()
+
+  data = [mx.random.uniform(MIN_VALUE, MAX_VALUE, shape=shape, ctx=dev) for _, shape in mod.data_shapes]
+  batch = mx.io.DataBatch(data, [])
+
+  mod.forward(batch, is_train=False)
+  for output in mod.get_outputs():
+    output.wait_to_read()
+
+  sym_sg = sym.get_backend_symbol("MKLDNN")
+  mod_sg = Module(symbol=sym)
+  mod_sg.bind(data_shapes=[('dat
```
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r222603978

File path: src/ndarray/ndarray.cc

```diff
@@ -1592,8 +1601,12 @@ void NDArray::Save(dmlc::Stream *strm) const {
     save_data = nd_cpu.data();
   } else {
     this->WaitToRead();
-    save_data = this->data();
     nd_cpu = *this;
+#if MXNET_USE_MKLDNN == 1
+    if (nd_cpu.IsMKLDNNData())
```

Review comment:
I don't see such a rule in the code itself. Single-line conditionals without braces can be found everywhere, even in this file; see lines 1591, 1654, 1669, ...
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r222594664

File path: src/c_api/c_api_symbolic.cc

```diff
@@ -696,3 +696,21 @@ int MXSetCalibTableToQuantizedSymbol(SymbolHandle qsym_handle,
   *ret_qsym_handle = s;
   API_END_HANDLE_ERROR(delete s);
 }
+
+int MXGenBackendSubgraph(SymbolHandle sym_handle, const char *backend,
```

Review comment:
What do you mean by a placeholder for the fused mkldnn call? This API is intended to convert a symbol into a backend-specific symbol. In the quantization flow, we need to do fusion first and then quantization. So at the Python level, we need to obtain the backend-specific symbol with all backend-specific fusions applied, and pass it to the quantization pass. That's why we need this API.
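The flow described here, at the Python level (mirroring the calls in tests/python/mkl/test_subgraph.py; `sym`, `arg_params`, `aux_params`, and `calib_data` are assumed to be already prepared, and quantize_model's arguments are abbreviated):

```python
import mxnet as mx

# 1. Fusion: convert the FP32 symbol into a backend-specific symbol.
sym_sg = sym.get_backend_symbol("MKLDNN")

# 2. Quantization runs on the fused symbol.
qsym, qarg_params, qaux_params = mx.contrib.quant.quantize_model(
    sym=sym_sg, arg_params=arg_params, aux_params=aux_params,
    ctx=mx.cpu(), quantized_dtype='uint8',
    calib_mode='naive', calib_data=calib_data)

# 3. Post-quantization fusion, e.g. folding requantize into the conv op.
qsym = qsym.get_backend_symbol("MKLDNN_POST_QUANTIZE")
```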
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r222592299

File path: tests/python/mkl/test_subgraph.py

```diff
@@ -0,0 +1,472 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import sys
+import os
+import mxnet as mx
+import numpy as np
+import unittest
+import ctypes
+from mxnet.io import NDArrayIter
+from mxnet.module import Module
+from mxnet.symbol import Symbol
+from importlib import import_module
+from numpy.testing import assert_allclose
+from mxnet.base import SymbolHandle, check_call, _LIB, mx_uint, c_str
+from mxnet.test_utils import DummyIter
+curr_path = os.path.dirname(os.path.abspath(os.path.expanduser(__file__)))
+sys.path.append(os.path.join(curr_path, '../unittest/'))
+from common import with_seed
+
+DATA_SHAPE=[(4, 4, 10, 10), (32, 3, 24, 24), (64, 8, 64, 64)]
+DATA_LABEL=[(4, 10), (32, 10), (64, 10)]
+MIN_VALUE=-1.0
+MAX_VALUE=1.0
+
+def check_qsym_calibrated(qsym):
+  assert ''.join(qsym.attr_dict().keys()).find('quantized_sg_mkldnn_conv') != -1
+  for k, v in qsym.attr_dict().items():
+    if k.find('quantized_sg_mkldnn_conv') != -1:
+      assert 'min_calib_range' in v
+      assert 'max_calib_range' in v
+    if k.find('_quantize') != -1:
+      assert v['out_type'] == 'uint8'
+
+def check_qsym_forward(qsym, qarg_params, qaux_params, batch, data_shape, label_shape):
+  mod = mx.mod.Module(symbol=qsym, context=mx.current_context())
+  mod.bind(for_training=False,
+           data_shapes=[('data', data_shape)],
+           label_shapes=[('softmax_label', label_shape)])
+  mod.set_params(qarg_params, qaux_params)
+  mod.forward(batch, is_train=False)
+  for output in mod.get_outputs():
+    output.wait_to_read()
+  return output
+
+def check_quantize(sym, arg_params, aux_params, data_shape, label_shape, batch, sym_output):
+  excluded_sym_names = []
+  if mx.current_context() == mx.cpu():
+    excluded_sym_names += ['fc']
+  calib_data = mx.nd.random.uniform(shape=data_shape)
+  calib_data = NDArrayIter(data=calib_data)
+  calib_data = DummyIter(calib_data)
+  calib_layer = lambda name: name.endswith('_output')
+  qsym, qarg_params, qaux_params = mx.contrib.quant.quantize_model(sym=sym,
+                                                                   arg_params=arg_params,
+                                                                   aux_params=aux_params,
+                                                                   ctx=mx.current_context(),
+                                                                   excluded_sym_names=excluded_sym_names,
+                                                                   quantized_dtype='uint8',
+                                                                   calib_mode='naive',
+                                                                   calib_data=calib_data,
+                                                                   calib_layer=calib_layer,
+                                                                   calib_quantize_op=True,
+                                                                   num_calib_examples=20)
+  qsym = qsym.get_backend_symbol("MKLDNN_POST_QUANTIZE")
+  check_qsym_calibrated(qsym)
+  qsym_output = check_qsym_forward(qsym, qarg_params, qaux_params, batch, data_shape, label_shape)
+
+  diff = mx.nd.abs(sym_output - qsym_output.astype(sym_output.dtype))
+  cond = mx.nd.lesser(2, diff).sum().asscalar()
+  assert cond == 0
+
+@with_seed()
+def check_fusion(sym, data_shape, label_shape, attrs_op):
+  dev = mx.cpu()
+  mod = Module(symbol=sym)
+  mod.bind(data_shapes=[('data', data_shape)], label_shapes=[('softmax_label', label_shape)])
+  mod.init_params(mx.init.Normal(0.5))
+  arg_params, aux_params = mod.get_params()
+
+  data = [mx.random.uniform(MIN_VALUE, MAX_VALUE, shape=shape, ctx=dev) for _, shape in mod.data_shapes]
+  batch = mx.io.DataBatch(data, [])
+
+  mod.forward(batch, is_train=False)
+  for output in mod.get_outputs():
+    output.wait_to_read()
+
+  sym_sg = sym.get_backend_symbol("MKLDNN")
+  mod_sg = Module(symbol=sym)
+  mod_sg.bind(data_shapes=[('dat
```
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r222592005

File path: python/mxnet/contrib/quantization.py

```diff
@@ -53,6 +53,7 @@ def _quantize_params(qsym, params):
     qsym : Symbol
         Quantized symbol from FP32 symbol.
     params : dict of str->NDArray
+    th_dict : dict of min/max pairs of layers' output
```

Review comment:
It applies to the output as well.
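For illustration, a th_dict entry might look like this (layer names and values hypothetical):

```python
# Maps each collected layer output to its calibration (min, max) range.
th_dict = {
    'conv0_output': (-3.2, 3.1),
    'fc1_output': (-7.8, 8.0),
}
```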
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r222591350

File path: src/operator/subgraph/subgraph_property.h

```diff
@@ -92,6 +92,22 @@ class SubgraphProperty {
   // execute the operators in the subgraph.
   virtual nnvm::NodePtr CreateSubgraphNode(const nnvm::Symbol &s,
                                            const int subgraph_id = 0) const = 0;
+  // Connect subgraph internal output with external output entries. By default,
+  // each output entry will connect to an unique internal output.
+  virtual void ConnectSubgraphOutputs(
+      const nnvm::NodePtr n,
+      std::vector<nnvm::NodeEntry *> *output_entries) const {
```

Review comment:
This can't be guaranteed, as output_entries may contain duplicated entries from n.
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r221443657

File path: src/operator/subgraph/mkldnn/mkldnn_conv_property.cc

```diff
@@ -0,0 +1,239 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#if MXNET_USE_MKLDNN == 1
+
+#include "../common.h"
+#include "../subgraph_property.h"
+#include "../../nn/activation-inl.h"
+
+namespace mxnet {
+namespace op {
+class SgMKLDNNConvSelector : public SubgraphSelector {
+ public:
+  /*! \brief pattern match status */
+  enum SelectStatus {
+    kFail = 0,
+    kStart,
+    kBN,
+    kSum,
+    kSuccess,
+  };
+
+ private:
+  bool disable_all;
+  bool disable_conv_bn;
+  bool disable_conv_relu;
+  bool disable_conv_sum;
+  SelectStatus status;
+  std::vector<const nnvm::Node *> matched_list;
+
+ public:
+  SgMKLDNNConvSelector(int dis_all, int dis_conv_bn, int dis_conv_relu, int dis_conv_sum)
+      : disable_all(dis_all),
+        disable_conv_bn(dis_conv_bn),
+        disable_conv_relu(dis_conv_relu),
+        disable_conv_sum(dis_conv_sum) {}
+
+  bool Select(const nnvm::Node &n) override {
+    if (n.op() && n.op()->name == "Convolution") {
+      status = disable_all ? kSuccess : kStart;
+      matched_list.clear();
+      matched_list.push_back(&n);
+      return true;
+    }
+    return false;
+  }
+
+  bool SelectInput(const nnvm::Node &n, const nnvm::Node &new_node) override {
+    return false;
+  }
+
+  bool SelectOutput(const nnvm::Node &n, const nnvm::Node &new_node) override {
```

Review comment:
See lines 71 to 79.
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r221416274

File path: src/operator/quantization/quantize_graph_pass.cc

```diff
@@ -89,17 +89,41 @@ std::vector<NodeEntry> OfflineParams(std::vector<NodeEntry>&& outputs,
   return outputs;
 }
 
-inline bool NeedQuantize(NodePtr node, const std::unordered_set<NodePtr> excluded_nodes) {
+inline bool NeedQuantize(NodePtr node, const std::unordered_set<std::string>& excluded_nodes) {
   static auto& quantized_op_map = Op::GetAttr<FQuantizedOp>("FQuantizedOp");
-  return quantized_op_map.count(node->op()) && !excluded_nodes.count(node);
+  if (quantized_op_map.count(node->op())) {
+    bool excluded = false;
+    if (node->attrs.subgraphs.size()) {
+      // This is a subgraph node, try to match subgraph name first,
+      // and then try to match inner node.
+      if (excluded_nodes.count(node->attrs.name)) {
+        excluded = true;
+      } else {
+        auto subgraph_sym = node->attrs.subgraphs[0];
```

Review comment:
CHECK is added.
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r221407486

File path: src/operator/quantization/quantize_graph_pass.cc

```diff
@@ -89,17 +89,41 @@ std::vector<NodeEntry> OfflineParams(std::vector<NodeEntry>&& outputs,
   return outputs;
 }
 
-inline bool NeedQuantize(NodePtr node, const std::unordered_set<NodePtr> excluded_nodes) {
+inline bool NeedQuantize(NodePtr node, const std::unordered_set<std::string>& excluded_nodes) {
   static auto& quantized_op_map = Op::GetAttr<FQuantizedOp>("FQuantizedOp");
-  return quantized_op_map.count(node->op()) && !excluded_nodes.count(node);
+  if (quantized_op_map.count(node->op())) {
+    bool excluded = false;
+    if (node->attrs.subgraphs.size()) {
+      // This is a subgraph node, try to match subgraph name first,
+      // and then try to match inner node.
+      if (excluded_nodes.count(node->attrs.name)) {
+        excluded = true;
+      } else {
+        auto subgraph_sym = node->attrs.subgraphs[0];
```

Review comment:
@zheng-da For a single node, is it possible that it contains both a control-flow subgraph and a common subgraph? Another question: how do we identify whether a subgraph node is a control-flow operator? We should have a method to distinguish the two kinds of subgraph.
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r221130858

File path: src/operator/subgraph/mkldnn/mkldnn_conv-inl.h

```diff
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#ifndef MXNET_OPERATOR_SUBGRAPH_MKLDNN_MKLDNN_CONV_INL_H_
+#define MXNET_OPERATOR_SUBGRAPH_MKLDNN_MKLDNN_CONV_INL_H_
+#if MXNET_USE_MKLDNN == 1
+
+#include <string>
+#include <vector>
+#include <memory>
+#include "../../nn/convolution-inl.h"
+#include "../../nn/batch_norm-inl.h"
+#include "../../nn/mkldnn/mkldnn_convolution-inl.h"
+
+namespace mxnet {
+namespace op {
+
+struct MKLDNNConvFusionParam {
+  MKLDNNConvFullParam full_conv_param;
+  std::shared_ptr<BatchNormParam> bn_param;
+};
```

Review comment:
@zheng-da Yes, you're right. MKLDNNConvFullParam contains all the parameters for the mkldnn convolution primitive.
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r221126691

File path: src/operator/quantization/quantize_graph_pass.cc

```diff
@@ -89,17 +89,41 @@ std::vector<NodeEntry> OfflineParams(std::vector<NodeEntry>&& outputs,
   return outputs;
 }
 
-inline bool NeedQuantize(NodePtr node, const std::unordered_set<NodePtr> excluded_nodes) {
+inline bool NeedQuantize(NodePtr node, const std::unordered_set<std::string>& excluded_nodes) {
   static auto& quantized_op_map = Op::GetAttr<FQuantizedOp>("FQuantizedOp");
-  return quantized_op_map.count(node->op()) && !excluded_nodes.count(node);
+  if (quantized_op_map.count(node->op())) {
+    bool excluded = false;
+    if (node->attrs.subgraphs.size()) {
+      // This is a subgraph node, try to match subgraph name first,
+      // and then try to match inner node.
+      if (excluded_nodes.count(node->attrs.name)) {
+        excluded = true;
+      } else {
+        auto subgraph_sym = node->attrs.subgraphs[0];
```

Review comment:
@zheng-da If we don't need more than one subgraph in a node, why do we define subgraphs as a vector? @reminisce If a subgraph were produced and handled only by its own backend, then each backend could decide its own subgraph ordering. But the thing is, we need to add code to the common part to handle subgraphs from all backends; this change is an example. So it's better to have a rule that all backends follow, and then we can add checks based on it.
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r220422029

File path: src/operator/subgraph/mkldnn/mkldnn_conv_post_quantize_property.cc

```diff
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#if MXNET_USE_MKLDNN == 1
+
+#include "../common.h"
+#include "../subgraph_property.h"
+#include "../../nn/mkldnn/mkldnn_convolution-inl.h"
+#include "mkldnn_conv-inl.h"
+#include "../../quantization/requantize-inl.h"
+
+namespace mxnet {
+namespace op {
+
+class SgMKLDNNConvPostQuantizeSelector : public SubgraphSelector {
+ public:
+  /*! \brief pattern match status */
+  enum SelectStatus {
+    kFail = 0,
+    kStart,
+    kSuccess,
+  };
+
+ private:
+  bool disable_all;
+  SelectStatus status;
+  std::vector<const nnvm::Node *> matched_list;
+
+ public:
+  explicit SgMKLDNNConvPostQuantizeSelector(int dis_all)
+      : disable_all(dis_all) {}
+
+  bool Select(const nnvm::Node &n) override {
+    if ((!disable_all) && n.op() && n.op()->name == "_sg_mkldnn_conv") {
+      auto const &param = nnvm::get<MKLDNNConvFusionParam>(n.attrs.parsed);
+      if (param.full_conv_param.mkldnn_param.quantized) {
+        status = kStart;
+        matched_list.clear();
+        matched_list.push_back(&n);
+        return true;
+      }
+    }
+    return false;
+  }
+
+  bool SelectInput(const nnvm::Node &n, const nnvm::Node &new_node) override {
+    return false;
+  }
+
+  bool SelectOutput(const nnvm::Node &n, const nnvm::Node &new_node) override {
+    if (status == kFail || status == kSuccess || new_node.is_variable())
+      return false;
+    // If n isn't the last matched node, then we encountered an internal
+    // branch, we should pop out the node behind n and stop fusion.
+    if (matched_list.back() != &n) {
+      status = kFail;
+      return false;
+    }
+    if (new_node.op()->name == "_contrib_requantize") {
+      auto const &param = nnvm::get<RequantizeParam>(new_node.attrs.parsed);
+      if (param.min_calib_range.has_value() &&
+          param.max_calib_range.has_value()) {
+        matched_list.push_back(&new_node);
+        status = kSuccess;
+        return true;
+      } else {
+        status = kFail;
+      }
+    }
+    return false;
+  }
+
+  std::vector<nnvm::Node *> Filter(
+      const std::vector<nnvm::Node *> &candidates) override {
+    if (status != kSuccess) {
+      return std::vector<nnvm::Node *>(0);
+    } else {
+      return candidates;
+    }
+  }
+};
+
+class SgMKLDNNConvPostQuantizeProperty : public SubgraphProperty {
+ public:
+  SgMKLDNNConvPostQuantizeProperty() {
+    disable_all = dmlc::GetEnv("MXNET_DISABLE_MKLDNN_OPT", 0);
+    if (disable_all) {
+      LOG(INFO) << "MKLDNN Convolution post-quantization optimization pass is disabled.";
+    } else {
+      LOG(INFO) << "Start to execute MKLDNN Convolution post-quantization optimization pass.";
+    }
+  }
+  static SubgraphPropertyPtr Create() {
+    return std::make_shared<SgMKLDNNConvPostQuantizeProperty>();
+  }
+  nnvm::NodePtr CreateSubgraphNode(const nnvm::Symbol &sym,
+                                   const int subgraph_id = 0) const override {
+    nnvm::NodePtr conv_node = nullptr;
+    nnvm::NodePtr requantize_node = nullptr;
+    DFSVisit(sym.outputs, [&](const nnvm::NodePtr &node) {
+      if (node->is_variable()) return;
+      auto &op_name = node->op()->name;
+      if (op_name == "_sg_mkldnn_conv") {
+        conv_node = node;
+      } else if (op_name == "_contrib_requantize") {
+        requantize_node = node;
+      }
+    });
+    CHECK_NOTNULL(conv_node);
+    CHECK_NOTNULL(requantize_node);
+    auto const &requantize_param =
+        nnvm::get<RequantizeParam>(requantize_node->attrs.parsed);
+    CHECK(requantize_param.min_calib_range.has_value());
+    CHECK(requantize_param.max_calib_range.has_value());
+    conv_node->attrs.dict["min_calib_range"] =
+        std::to_string(requantize_param.min_calib_range.value());
+    conv_node->attrs.dict["max_calib_range"] =
+        std::to_string(requantize_param.max_calib_range.value());
+    conv_node->op()->attr_parser(&(conv_node->attrs));
+    return conv_node;
+  }
+
+  SubgraphSelectorPtr CreateSubgraphSelector() const override {
+    auto selector =
+        std::make_s
```
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r220414425

File path: src/operator/quantization/quantize_graph_pass.cc

```diff
@@ -199,33 +245,45 @@ Graph QuantizeGraph(Graph &&src) {
       // the new_node.
       *new_node = *node;
       new_node->inputs.clear();
-      for (const auto& e : node->inputs) {
-        NodePtr mirror_node = mirror_map.at(e.node.get());
-        NodeEntry mirror_entry = NodeEntry{
-          mirror_node, e.index, e.version};
-        // if input node is quantized operator, add dequantize node
-        if (NeedQuantize(e.node, excluded_nodes)) {
-          // here we calculate the output number (exclude min/max, in order to
-          // calculate min/max index from mirror node) based on assumption that
-          // there is only 1 min and 1 max output from mirror node (which is
-          // currently true)
-          size_t num_outputs = mirror_node->num_outputs() - 2;
-          uint32_t min_index = num_outputs + 2 * e.index;
-          uint32_t max_index = num_outputs + 2 * e.index + 1;
-          NodePtr dequantize_node = CreateNode("_contrib_dequantize",
-            e.node->attrs.name + "_dequantize");
-          dequantize_node->inputs.emplace_back(mirror_entry);
-          dequantize_node->inputs.emplace_back(NodeEntry{mirror_node, min_index, 0});
-          dequantize_node->inputs.emplace_back(NodeEntry{mirror_node, max_index, 0});
-          dequantize_node->op()->attr_parser(&(dequantize_node->attrs));
+      if (node->is_variable() && node->attrs.name == "data") {
```

Review comment:
This solves the first-layer calibration problem. If we try to quantize the first layer (e.g. conv0), we insert a quantize op between data and conv0. With calib_quantize_op enabled, we need to collect calibration info on data, which can't be done via set_monitor_callback, as it only monitors operator outputs, and data is not the output of any op. To solve this, we insert an identity op after data, turning data into the identity op's output, so that set_monitor_callback can be used to calibrate it.
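A sketch of the underlying limitation (network and names hypothetical): the executor's monitor callback fires only for operator outputs, so a bare data variable is invisible to calibration until an identity op turns it into one:

```python
import mxnet as mx

data = mx.sym.Variable('data')
conv0 = mx.sym.Convolution(data=data, kernel=(3, 3), num_filter=8,
                           no_bias=True, name='conv0')
exe = conv0.simple_bind(mx.cpu(), data=(1, 3, 8, 8))

def collect(name, arr):
    # Only operator outputs such as 'conv0_output' arrive here; the
    # input variable 'data' never triggers the callback by itself.
    print(name)

exe.set_monitor_callback(collect)
exe.forward(is_train=False)
```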
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r220409445

File path: src/operator/quantization/quantize_graph_pass.cc

```diff
@@ -89,17 +89,41 @@ std::vector<NodeEntry> OfflineParams(std::vector<NodeEntry>&& outputs,
   return outputs;
 }
 
-inline bool NeedQuantize(NodePtr node, const std::unordered_set<NodePtr> excluded_nodes) {
+inline bool NeedQuantize(NodePtr node, const std::unordered_set<std::string>& excluded_nodes) {
   static auto& quantized_op_map = Op::GetAttr<FQuantizedOp>("FQuantizedOp");
-  return quantized_op_map.count(node->op()) && !excluded_nodes.count(node);
+  if (quantized_op_map.count(node->op())) {
+    bool excluded = false;
+    if (node->attrs.subgraphs.size()) {
+      // This is a subgraph node, try to match subgraph name first,
+      // and then try to match inner node.
+      if (excluded_nodes.count(node->attrs.name)) {
+        excluded = true;
+      } else {
+        auto subgraph_sym = node->attrs.subgraphs[0];
```

Review comment:
Also, I don't know what kind of CHECK could be added here. CHECK(node->attrs.subgraphs.size() == 1)? Why couldn't we support more than one subgraph symbol? All we need to verify here is that the demanded symbol is at index 0, but that is not checkable.
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r220405988

File path: src/operator/quantization/quantize_graph_pass.cc

```diff
@@ -89,17 +89,41 @@ std::vector<NodeEntry> OfflineParams(std::vector<NodeEntry>&& outputs,
   return outputs;
 }
 
-inline bool NeedQuantize(NodePtr node, const std::unordered_set<NodePtr> excluded_nodes) {
+inline bool NeedQuantize(NodePtr node, const std::unordered_set<std::string>& excluded_nodes) {
   static auto& quantized_op_map = Op::GetAttr<FQuantizedOp>("FQuantizedOp");
-  return quantized_op_map.count(node->op()) && !excluded_nodes.count(node);
+  if (quantized_op_map.count(node->op())) {
+    bool excluded = false;
+    if (node->attrs.subgraphs.size()) {
+      // This is a subgraph node, try to match subgraph name first,
+      // and then try to match inner node.
+      if (excluded_nodes.count(node->attrs.name)) {
+        excluded = true;
+      } else {
+        auto subgraph_sym = node->attrs.subgraphs[0];
```

Review comment:
OK. But it's not a good design that we don't have a rule for which index holds the subgraph symbol, as I commented in your subgraph PR. We have to assume that the first entry (index 0) is the subgraph symbol and that no developer will break this.
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r220091220

File path: python/mxnet/contrib/quantization.py

```diff
@@ -408,12 +420,23 @@ def _load_params(params, logger=logging):
         raise ValueError('Unsupported params provided. Must be either a path to the param file or'
                          ' a pair of dictionaries representing arg_params and aux_params')
 
+def save_params(fname, arg_params, aux_params, logger=None):
```

Review comment:
This already exists in imagenet_gen_qsym.py. Will remove it.
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r220079910

File path: python/mxnet/contrib/quantization.py

```diff
@@ -40,7 +40,7 @@
 from ..module import Module
 
 
-def _quantize_params(qsym, params):
+def _quantize_params(qsym, params, th_dict):
```

Review comment:
I guess it means threshold_dict. @reminisce, can you explain it?
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization. URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r220079292 ## File path: example/quantization/imagenet_gen_qsym_mkldnn.py ## @@ -0,0 +1,213 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +import argparse +import os +import logging +from common import modelzoo +import mxnet as mx +from mxnet.contrib.quantization import * +from mxnet.base import SymbolHandle, check_call, _LIB, mx_uint, c_str_array +import ctypes + + +def download_calib_dataset(dataset_url, calib_dataset, logger=None): +if logger is not None: +logger.info('Downloading calibration dataset from %s to %s' % (dataset_url, calib_dataset)) +mx.test_utils.download(dataset_url, calib_dataset) + + +def download_model(model_name, logger=None): +dir_path = os.path.dirname(os.path.realpath(__file__)) +model_path = os.path.join(dir_path, 'model') +if logger is not None: +logger.info('Downloading model %s... into path %s' % (model_name, model_path)) +return modelzoo.download_model(args.model, os.path.join(dir_path, 'model')) + + +def save_symbol(fname, sym, logger=None): +if logger is not None: +logger.info('Saving symbol into file at %s' % fname) +sym.save(fname) + + +def save_params(fname, arg_params, aux_params, logger=None): +if logger is not None: +logger.info('Saving params into file at %s' % fname) +save_dict = {('arg:%s' % k): v.as_in_context(cpu()) for k, v in arg_params.items()} +save_dict.update({('aux:%s' % k): v.as_in_context(cpu()) for k, v in aux_params.items()}) +mx.nd.save(fname, save_dict) + + +if __name__ == '__main__': +parser = argparse.ArgumentParser(description='Generate a calibrated quantized model from a FP32 model with MKL-DNN support') Review comment: We do want to provide resnet50v1 as an example, but we don't know where to put the pre-trained model and its parameter file. Do you have any suggestions on where to upload them? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization. URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r220034121 ## File path: src/operator/subgraph/mkldnn/mkldnn_conv-inl.h ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +#ifndef MXNET_OPERATOR_SUBGRAPH_MKLDNN_MKLDNN_CONV_INL_H_ +#define MXNET_OPERATOR_SUBGRAPH_MKLDNN_MKLDNN_CONV_INL_H_ +#if MXNET_USE_MKLDNN == 1 + +#include +#include +#include +#include "../../nn/convolution-inl.h" +#include "../../nn/batch_norm-inl.h" +#include "../../nn/mkldnn/mkldnn_convolution-inl.h" + +namespace mxnet { +namespace op { + +struct MKLDNNConvFusionParam { + MKLDNNConvFullParam full_conv_param; + std::shared_ptr bn_param; +}; Review comment: It's an abstraction to separate the mkldnn convolution params from the fusion params: MKLDNNConvFullParam is defined in mkldnn_convolution-inl.h, contains only the options the convolution itself needs, and is passed to MKLDNNConvolutionForwardFullFeature. MKLDNNConvFusionParam is defined in mkldnn_conv-inl.h, is used by SgMKLDNNConvParamParser, and supports the fusion-related functionality in SgMKLDNNConvOperator::Forward. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
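The layering described here can be read off the field accesses elsewhere in the PR (full_conv_param.conv_param, full_conv_param.mkldnn_param). The following sketch of the two structs is inferred from those accesses, with the shared_ptr template argument (stripped by the archive extraction in the quoted header) assumed to be BatchNormParam; the real structs may carry additional members:

// Defined in mkldnn_convolution-inl.h: everything the convolution itself consumes.
struct MKLDNNConvFullParam {
  ConvolutionParam conv_param;   // stock convolution options
  MKLDNNConvParam mkldnn_param;  // fusion/quantization flags: with_bn, with_relu, ...
};

// Defined in mkldnn_conv-inl.h: adds what only the fused subgraph op needs.
struct MKLDNNConvFusionParam {
  MKLDNNConvFullParam full_conv_param;
  std::shared_ptr<BatchNormParam> bn_param;  // set only when a BatchNorm was folded in
};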
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization. URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r220034161 ## File path: src/operator/subgraph/mkldnn/mkldnn_conv.cc ## @@ -0,0 +1,678 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one +* or more contributor license agreements. See the NOTICE file +* distributed with this work for additional information +* regarding copyright ownership. The ASF licenses this file +* to you under the Apache License, Version 2.0 (the +* "License"); you may not use this file except in compliance +* with the License. You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, +* software distributed under the License is distributed on an +* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +* KIND, either express or implied. See the License for the +* specific language governing permissions and limitations +* under the License. +*/ + +#if MXNET_USE_MKLDNN == 1 + +#include +#include +#include +#include "../common.h" +#include "../../nn/mkldnn/mkldnn_base-inl.h" +#include "../../nn/mkldnn/mkldnn_ops-inl.h" +#include "../../quantization/quantization_utils.h" +#include "mkldnn_conv-inl.h" + +namespace mxnet { +namespace op { + +template +static void UpdateConvWeightBias(NDArray *weight, NDArray *bias, bool no_bias, + const NDArray &gamma, const NDArray &beta, + const NDArray &mean, const NDArray &variance, + const BatchNormParam *param) { + // TODO(Zhennan): Handle the case weight is not in dims 4. + NDArray update_weight = NDArray(weight->storage_type(), weight->shape(), + weight->ctx(), true, weight->dtype()); + NDArray update_bias = NDArray(beta.storage_type(), beta.shape(), beta.ctx(), +true, beta.dtype()); + DType *weight_ptr = weight->data().dptr(); + DType *bias_ptr = no_bias ? nullptr : bias->data().dptr(); + DType *gamma_ptr = gamma.Reorder2Default().data().dptr(); + DType *beta_ptr = beta.Reorder2Default().data().dptr(); + DType *mean_ptr = mean.Reorder2Default().data().dptr(); + DType *var_ptr = variance.Reorder2Default().data().dptr(); + DType *update_weight_ptr = update_weight.data().dptr(); + DType *update_bias_ptr = update_bias.data().dptr(); + size_t channel = gamma.shape()[0]; + size_t offset = weight->shape()[1] * weight->shape()[2] * weight->shape()[3]; +#pragma omp parallel for num_threads(engine::OpenMP::Get()->GetRecommendedOMPThreadCount()) + for (int c = 0; c < static_cast(channel); ++c) { +DType *p1 = reinterpret_cast(weight_ptr + c * offset); +DType *p2 = reinterpret_cast(update_weight_ptr + c * offset); +DType alpha = (param->fix_gamma ? static_cast(1.0f) : gamma_ptr[c]) / + sqrt(var_ptr[c] + param->eps); + +if (bias_ptr) + update_bias_ptr[c] = beta_ptr[c] + alpha * (bias_ptr[c] - mean_ptr[c]); +else + update_bias_ptr[c] = beta_ptr[c] - alpha * mean_ptr[c]; + +for (size_t k = 0; k < offset; ++k) { + p2[k] = p1[k] * alpha; +} + } + *weight = update_weight; + *bias = update_bias; +} + +static inline size_t GetInSumIndex(const MKLDNNConvFusionParam ¶m) { + return 2 + (param.full_conv_param.conv_param.no_bias ? 0 : 1) + + (param.full_conv_param.mkldnn_param.with_bn ? 
4 : 0); +} + +template +static void QuantizeConvWeightBias(NDArray *weight, NDArray *bias, + bool has_bias, float data_min, + float data_max, + bool weight_channelwise_scale, + std::vector *weight_scales) { + using red::limits::MaxValue; + using red::limits::MinValue; + DType *weight_ptr = weight->data().dptr(); + NDArray quantized_weight = NDArray(weight->storage_type(), weight->shape(), + weight->ctx(), true, mshadow::kInt8); + int8_t *quan_weight_ptr = quantized_weight.data().dptr(); + size_t channel = weight->shape()[0]; + + // TODO(Zhennan): Handle the case weight is not in dims 4. + size_t offset = weight->shape()[1] * weight->shape()[2] * weight->shape()[3]; + std::vector weight_c_min(channel, MaxValue()); + std::vector weight_c_max(channel, MinValue()); +#pragma omp parallel for num_threads(engine::OpenMP::Get()->GetRecommendedOMPThreadCount()) + for (int c = 0; c < static_cast(channel); ++c) { +DType *p1 = weight_ptr + c * offset; +for (size_t k = 0; k < offset; ++k) { + if (weight_c_min[c] > p1[k]) +weight_c_min[c] = p1[k]; + if (weight_c_max[c] < p1[k]) +weight_c_max[c] = p1[k]; +} + } + + if (weight_channelwise_scale) { +weight_scales->resize(channel); +#pragma omp parallel for num_t
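The truncated diff above carries the whole BN-folding kernel. In equation form, what UpdateConvWeightBias computes per output channel c (with gamma, beta, mu, sigma^2, and eps taken from the folded BatchNorm, and gamma_c replaced by 1 when fix_gamma is set) is:

\alpha_c = \frac{\gamma_c}{\sqrt{\sigma_c^2 + \varepsilon}}, \qquad
W'_{c} = \alpha_c \, W_{c}, \qquad
b'_c = \begin{cases}
  \beta_c + \alpha_c\,(b_c - \mu_c), & \text{bias present} \\
  \beta_c - \alpha_c\,\mu_c, & \text{no bias}
\end{cases}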
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization. URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r220034180 ## File path: src/operator/subgraph/mkldnn/mkldnn_conv.cc ## @@ -0,0 +1,678 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one +* or more contributor license agreements. See the NOTICE file +* distributed with this work for additional information +* regarding copyright ownership. The ASF licenses this file +* to you under the Apache License, Version 2.0 (the +* "License"); you may not use this file except in compliance +* with the License. You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, +* software distributed under the License is distributed on an +* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +* KIND, either express or implied. See the License for the +* specific language governing permissions and limitations +* under the License. +*/ + +#if MXNET_USE_MKLDNN == 1 + +#include +#include +#include +#include "../common.h" +#include "../../nn/mkldnn/mkldnn_base-inl.h" +#include "../../nn/mkldnn/mkldnn_ops-inl.h" +#include "../../quantization/quantization_utils.h" +#include "mkldnn_conv-inl.h" + +namespace mxnet { +namespace op { + +template +static void UpdateConvWeightBias(NDArray *weight, NDArray *bias, bool no_bias, + const NDArray &gamma, const NDArray &beta, + const NDArray &mean, const NDArray &variance, + const BatchNormParam *param) { + // TODO(Zhennan): Handle the case weight is not in dims 4. + NDArray update_weight = NDArray(weight->storage_type(), weight->shape(), + weight->ctx(), true, weight->dtype()); + NDArray update_bias = NDArray(beta.storage_type(), beta.shape(), beta.ctx(), +true, beta.dtype()); + DType *weight_ptr = weight->data().dptr(); + DType *bias_ptr = no_bias ? nullptr : bias->data().dptr(); + DType *gamma_ptr = gamma.Reorder2Default().data().dptr(); + DType *beta_ptr = beta.Reorder2Default().data().dptr(); + DType *mean_ptr = mean.Reorder2Default().data().dptr(); + DType *var_ptr = variance.Reorder2Default().data().dptr(); + DType *update_weight_ptr = update_weight.data().dptr(); + DType *update_bias_ptr = update_bias.data().dptr(); + size_t channel = gamma.shape()[0]; + size_t offset = weight->shape()[1] * weight->shape()[2] * weight->shape()[3]; +#pragma omp parallel for num_threads(engine::OpenMP::Get()->GetRecommendedOMPThreadCount()) + for (int c = 0; c < static_cast(channel); ++c) { +DType *p1 = reinterpret_cast(weight_ptr + c * offset); +DType *p2 = reinterpret_cast(update_weight_ptr + c * offset); +DType alpha = (param->fix_gamma ? static_cast(1.0f) : gamma_ptr[c]) / + sqrt(var_ptr[c] + param->eps); + +if (bias_ptr) + update_bias_ptr[c] = beta_ptr[c] + alpha * (bias_ptr[c] - mean_ptr[c]); +else + update_bias_ptr[c] = beta_ptr[c] - alpha * mean_ptr[c]; + +for (size_t k = 0; k < offset; ++k) { + p2[k] = p1[k] * alpha; +} + } + *weight = update_weight; + *bias = update_bias; +} + +static inline size_t GetInSumIndex(const MKLDNNConvFusionParam ¶m) { + return 2 + (param.full_conv_param.conv_param.no_bias ? 0 : 1) + + (param.full_conv_param.mkldnn_param.with_bn ? 
4 : 0); +} + +template +static void QuantizeConvWeightBias(NDArray *weight, NDArray *bias, + bool has_bias, float data_min, + float data_max, + bool weight_channelwise_scale, + std::vector *weight_scales) { + using red::limits::MaxValue; + using red::limits::MinValue; + DType *weight_ptr = weight->data().dptr(); + NDArray quantized_weight = NDArray(weight->storage_type(), weight->shape(), + weight->ctx(), true, mshadow::kInt8); + int8_t *quan_weight_ptr = quantized_weight.data().dptr(); + size_t channel = weight->shape()[0]; + + // TODO(Zhennan): Handle the case weight is not in dims 4. + size_t offset = weight->shape()[1] * weight->shape()[2] * weight->shape()[3]; + std::vector weight_c_min(channel, MaxValue()); + std::vector weight_c_max(channel, MinValue()); +#pragma omp parallel for num_threads(engine::OpenMP::Get()->GetRecommendedOMPThreadCount()) + for (int c = 0; c < static_cast(channel); ++c) { +DType *p1 = weight_ptr + c * offset; +for (size_t k = 0; k < offset; ++k) { + if (weight_c_min[c] > p1[k]) +weight_c_min[c] = p1[k]; + if (weight_c_max[c] < p1[k]) +weight_c_max[c] = p1[k]; +} + } + + if (weight_channelwise_scale) { +weight_scales->resize(channel); +#pragma omp parallel for num_t
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization. URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r220032850 ## File path: src/operator/quantization/mkldnn/mkldnn_quantize-inl.h ## @@ -75,6 +75,11 @@ static void MKLDNNQuantizeComputeKer(const std::vector& inputs, auto i_mpd = i_mem->get_primitive_desc(); auto i_desc = i_mpd.desc(); mkldnn::memory::format i_fmt = static_cast(i_desc.data.format); + if (i_fmt == mkldnn::memory::format::nchw || + i_fmt == mkldnn::memory::format::nChw8c || + i_fmt == mkldnn_nChw16c) { +i_fmt = mkldnn::memory::format::nhwc; + } Review comment: For mkldnn, yes. nhwc should be the default int8 layout, just like nchw for fp32. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
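Restated with descriptive comments and a consistent enum spelling (the bare mkldnn_nChw16c in the quoted diff is the C-style name for the same value), the change he is defending reads roughly:

mkldnn::memory::format i_fmt =
    static_cast<mkldnn::memory::format>(i_desc.data.format);
if (i_fmt == mkldnn::memory::format::nchw ||
    i_fmt == mkldnn::memory::format::nChw8c ||
    i_fmt == mkldnn::memory::format::nChw16c) {
  // int8 tensors default to NHWC in MKL-DNN (as NCHW does for fp32), so the
  // quantize kernel requests that layout for its output instead of
  // propagating the fp32 blocking format.
  i_fmt = mkldnn::memory::format::nhwc;
}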
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization. URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r220032675 ## File path: src/executor/attach_op_execs_pass.cc ## @@ -180,12 +184,14 @@ class StatefulComputeExExecutor : public OpExecutor { return state_; } - explicit StatefulComputeExExecutor(const OpStatePtr& state, + explicit StatefulComputeExExecutor(const NodeAttrs& attrs, + const OpStatePtr& state, const FStatefulComputeEx& fcompute, ExecType exec_type) - : state_(state), fcompute_(fcompute), exec_type_(exec_type) {} + : attrs_(attrs), state_(state), fcompute_(fcompute), exec_type_(exec_type) {} private: + NodeAttrs attrs_; Review comment: To query TIsMKLDNN for StatefulComputeExExecutor. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization. URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r220032476 ## File path: src/executor/attach_op_execs_pass.cc ## @@ -159,9 +159,13 @@ class StatefulComputeExExecutor : public OpExecutor { op_ctx.run_ctx = rctx; #if MXNET_USE_MKLDNN == 1 InvalidateOutputs(out_array, req); -CreateDefaultInputs(in_array, &in_array_fallback); -fcompute_(state_, op_ctx, in_array_fallback, req, out_array); -return; +// TODO(alex): (MXNET-847) Remove this fallback feature after subgraph implemented +const auto is_mkldnn = Op::GetAttr("TIsMKLDNN"); +if (!is_mkldnn.get(attrs_.op, false)) { + CreateDefaultInputs(in_array, &in_array_fallback); + fcompute_(state_, op_ctx, in_array_fallback, req, out_array); + return; +} Review comment: OK. I will move this part out. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
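With the template argument that the archive stripped restored (TIsMKLDNN appears to be registered as a bool op attribute, which matches the is_mkldnn.get(attrs_.op, false) default), the guard being discussed reads roughly:

static const auto& is_mkldnn = Op::GetAttr<bool>("TIsMKLDNN");
if (!is_mkldnn.get(attrs_.op, false)) {
  // Not an MKLDNN-aware stateful op: reorder any MKLDNN-layout inputs back to
  // the default layout before invoking the fallback compute function.
  CreateDefaultInputs(in_array, &in_array_fallback);
  fcompute_(state_, op_ctx, in_array_fallback, req, out_array);
  return;
}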
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization. URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r220032426 ## File path: include/mxnet/c_api.h ## @@ -1535,6 +1534,15 @@ MXNET_DLL int MXSetCalibTableToQuantizedSymbol(SymbolHandle qsym_handle, const float* high_quantiles, SymbolHandle* ret_sym_handle); +/*! + * \brief Run subgraph pass based on the backend provided + * \param sym_handle symbol to be converted + * \param backend backend names for subgraph pass + * \param ret_sym_handle returned symbol + */ +MXNET_DLL int MXGenBackendSubgraph(SymbolHandle sym_handle, const char *backend, + SymbolHandle *ret_sym_handle); + Review comment: It's not for testing, but for the quantization script. For mkldnn quantization, we agreed to do fusion first and then do quantization. So on the python side, we need an API to generate the fused graph and then pass it to the quantization pass. Otherwise, we would have to allow simple_bind to return the graph after the subgraph pass. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
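A minimal sketch of the intended call sequence from C/C++, using only the API declared in the diff above; the backend string "MKLDNN" is an assumption (it is whatever name the subgraph property registers under):

#include <mxnet/c_api.h>

SymbolHandle fused_sym = nullptr;
// Run the backend's subgraph (fusion) pass first...
int rc = MXGenBackendSubgraph(sym, "MKLDNN", &fused_sym);
if (rc != 0) {
  // consult MXGetLastError() for the failure reason
}
// ...then hand fused_sym to the quantization pass, which is exactly the
// fusion-before-quantization ordering described in the comment.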
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization. URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r220031940 ## File path: src/operator/nn/mkldnn/mkldnn_convolution-inl.h ## @@ -35,19 +36,79 @@ namespace mxnet { namespace op { -mkldnn::convolution_forward::primitive_desc GetConvFwdImpl( -const ConvolutionParam& param, const bool is_train, const NDArray &data, -const NDArray &weights, const NDArray *bias, const NDArray &output); +struct MKLDNNConvParam : public dmlc::Parameter { + // When adding more members into this class, please double check GetHash() + // won't overflow. + bool with_bn; + bool with_relu; + bool with_sum; + bool with_postsum_relu; + bool quantized; + bool weight_channelwise_scale; + + dmlc::optional min_calib_range; // min float value calculated from calibration dataset + dmlc::optional max_calib_range; // max float value calculated from calibration dataset + + DMLC_DECLARE_PARAMETER(MKLDNNConvParam) { +DMLC_DECLARE_FIELD(with_bn).set_default(false) +.describe("Add post batchnorm."); +DMLC_DECLARE_FIELD(with_relu).set_default(false) +.describe("Add post relu"); +DMLC_DECLARE_FIELD(with_sum).set_default(false) +.describe("Add post sum"); +DMLC_DECLARE_FIELD(with_postsum_relu).set_default(false) +.describe("Add post relu after sum"); +DMLC_DECLARE_FIELD(quantized).set_default(false) +.describe("enable quantization"); +DMLC_DECLARE_FIELD(weight_channelwise_scale).set_default(true) +.describe("Quantize weight with channel wise scales."); +DMLC_DECLARE_FIELD(min_calib_range) +.set_default(dmlc::optional()) +.describe("The minimum scalar value in the form of float32 obtained " + "through calibration. If present, it will be used to by " + "quantized convolution op to calculate primitive scale"); +DMLC_DECLARE_FIELD(max_calib_range) +.set_default(dmlc::optional()) +.describe("The maximum scalar value in the form of float32 obtained " + "through calibration. If present, it will be used to by " + "quantized convolution op to calculate primitive scale"); + } + const int GetBoolHash() const { +int hash = 0; +hash = hash * 2 + this->with_bn ? 1 : 0; Review comment: This function is not used anymore, so I will simply remove it. Sorry about that. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization. URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r219376896 ## File path: src/operator/nn/mkldnn/mkldnn_convolution-inl.h ## @@ -35,19 +36,79 @@ namespace mxnet { namespace op { -mkldnn::convolution_forward::primitive_desc GetConvFwdImpl( -const ConvolutionParam& param, const bool is_train, const NDArray &data, -const NDArray &weights, const NDArray *bias, const NDArray &output); +struct MKLDNNConvParam : public dmlc::Parameter { + // When adding more members into this class, please double check GetHash() + // won't overflow. + bool with_bn; + bool with_relu; + bool with_sum; + bool with_postsum_relu; + bool quantized; + bool weight_channelwise_scale; + + dmlc::optional min_calib_range; // min float value calculated from calibration dataset + dmlc::optional max_calib_range; // max float value calculated from calibration dataset + + DMLC_DECLARE_PARAMETER(MKLDNNConvParam) { +DMLC_DECLARE_FIELD(with_bn).set_default(false) +.describe("Add post batchnorm."); +DMLC_DECLARE_FIELD(with_relu).set_default(false) +.describe("Add post relu"); +DMLC_DECLARE_FIELD(with_sum).set_default(false) +.describe("Add post sum"); +DMLC_DECLARE_FIELD(with_postsum_relu).set_default(false) +.describe("Add post relu after sum"); +DMLC_DECLARE_FIELD(quantized).set_default(false) +.describe("enable quantization"); +DMLC_DECLARE_FIELD(weight_channelwise_scale).set_default(true) +.describe("Quantize weight with channel wise scales."); +DMLC_DECLARE_FIELD(min_calib_range) +.set_default(dmlc::optional()) +.describe("The minimum scalar value in the form of float32 obtained " + "through calibration. If present, it will be used to by " + "quantized convolution op to calculate primitive scale"); +DMLC_DECLARE_FIELD(max_calib_range) +.set_default(dmlc::optional()) +.describe("The maximum scalar value in the form of float32 obtained " + "through calibration. If present, it will be used to by " + "quantized convolution op to calculate primitive scale"); + } + const int GetBoolHash() const { +int hash = 0; +hash = hash * 2 + this->with_bn ? 1 : 0; +hash = hash * 2 + this->with_relu ? 1 : 0; +hash = hash * 2 + this->with_sum ? 1 : 0; +hash = hash * 2 + this->with_postsum_relu ? 1 : 0; +hash = hash * 2 + this->quantized ? 1 : 0; +return hash; Review comment: Used for calculating param hash when caching and reusing mkldnn op primitive. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
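Worth noting while the code is still visible here: the ternaries in the quoted GetBoolHash are effectively dead because of C++ operator precedence. `hash = hash * 2 + this->with_bn ? 1 : 0;` parses as `hash = ((hash * 2 + this->with_bn) ? 1 : 0);`, so hash collapses to 0 or 1 at every step instead of accumulating one bit per flag. Removing the helper (as agreed above) sidesteps this; if it were kept, a fixed sketch would be:

int GetBoolHash() const {
  int hash = 0;
  // Parenthesize so each flag contributes one bit, rather than the whole
  // expression collapsing through the conditional operator.
  hash = hash * 2 + (with_bn ? 1 : 0);
  hash = hash * 2 + (with_relu ? 1 : 0);
  hash = hash * 2 + (with_sum ? 1 : 0);
  hash = hash * 2 + (with_postsum_relu ? 1 : 0);
  hash = hash * 2 + (quantized ? 1 : 0);
  return hash;
}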
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization. URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r219376787 ## File path: src/operator/subgraph/mkldnn/mkldnn_conv.cc ## @@ -0,0 +1,670 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one +* or more contributor license agreements. See the NOTICE file +* distributed with this work for additional information +* regarding copyright ownership. The ASF licenses this file +* to you under the Apache License, Version 2.0 (the +* "License"); you may not use this file except in compliance +* with the License. You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, +* software distributed under the License is distributed on an +* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +* KIND, either express or implied. See the License for the +* specific language governing permissions and limitations +* under the License. +*/ + +#if MXNET_USE_MKLDNN == 1 +#include "../common.h" +#include "../../nn/convolution-inl.h" +#include "../../nn/batch_norm-inl.h" +#include "../../nn/activation-inl.h" +#include "../../nn/mkldnn/mkldnn_base-inl.h" +#include "../../nn/mkldnn/mkldnn_ops-inl.h" +#include "../../nn/mkldnn/mkldnn_convolution-inl.h" +#include "../../quantization/quantization_utils.h" +namespace mxnet { +namespace op { + +struct MKLDNNConvFusionParam { + MKLDNNConvFullParam full_conv_param; + std::shared_ptr bn_param; +}; + +static const size_t uint8_range = 255; +static const size_t int8_range = 127; + +enum MKLDNNConvOpOutputs { kOut, kMin, kMax }; + +template +static void UpdateConvWeightBias(NDArray *weight, NDArray *bias, bool no_bias, + const NDArray &gamma, const NDArray &beta, + const NDArray &mean, const NDArray &variance, + const BatchNormParam *param) { + // TODO(Zhennan): Handle the case weight is not in dims 4. + NDArray update_weight = NDArray(weight->storage_type(), weight->shape(), + weight->ctx(), true, weight->dtype()); + NDArray update_bias = NDArray(beta.storage_type(), beta.shape(), beta.ctx(), +true, beta.dtype()); + DType *weight_ptr = weight->data().dptr(); + DType *bias_ptr = no_bias ? nullptr : bias->data().dptr(); + DType *gamma_ptr = gamma.Reorder2Default().data().dptr(); + DType *beta_ptr = beta.Reorder2Default().data().dptr(); + DType *mean_ptr = mean.Reorder2Default().data().dptr(); + DType *var_ptr = variance.Reorder2Default().data().dptr(); + DType *update_weight_ptr = update_weight.data().dptr(); + DType *update_bias_ptr = update_bias.data().dptr(); + size_t channel = gamma.shape()[0]; + size_t offset = weight->shape()[1] * weight->shape()[2] * weight->shape()[3]; +#pragma omp parallel for + for (int c = 0; c < static_cast(channel); ++c) { +DType *p1 = reinterpret_cast(weight_ptr + c * offset); +DType *p2 = reinterpret_cast(update_weight_ptr + c * offset); +DType alpha = (param->fix_gamma ? static_cast(1.0f) : gamma_ptr[c]) / + sqrt(var_ptr[c] + param->eps); + +if (bias_ptr) + update_bias_ptr[c] = beta_ptr[c] + alpha * (bias_ptr[c] - mean_ptr[c]); +else + update_bias_ptr[c] = beta_ptr[c] - alpha * mean_ptr[c]; + +for (size_t k = 0; k < offset; ++k) { + p2[k] = p1[k] * alpha; +} + } + *weight = update_weight; + *bias = update_bias; +} + +static inline size_t GetInSumIndex(const MKLDNNConvFusionParam ¶m) { + return 2 + (param.full_conv_param.conv_param.no_bias ? 0 : 1) + + (param.full_conv_param.mkldnn_param.with_bn ? 
4 : 0); +} + +template +static void QuantizeConvWeightBias(NDArray *weight, NDArray *bias, + bool has_bias, float data_min, + float data_max, + bool weight_channelwise_scale, + std::vector *weight_scales) { + using red::limits::MaxValue; + using red::limits::MinValue; + DType *weight_ptr = weight->data().dptr(); + NDArray quantized_weight = NDArray(weight->storage_type(), weight->shape(), + weight->ctx(), true, mshadow::kInt8); + int8_t *quan_weight_ptr = quantized_weight.data().dptr(); + size_t channel = weight->shape()[0]; + + // TODO(Zhennan): Handle the case weight is not in dims 4. + size_t offset = weight->shape()[1] * weight->shape()[2] * weight->shape()[3]; + std::vector weight_c_min(channel, MaxValue()); + std::vector weight_c_max(channel, MinValue()); +#pragma omp parallel for + for (int c = 0; c < static_cast(channel); ++c) { +DType *p1 = weight_ptr + c * offset; +for (size_t k = 0; k < offset; ++k) { + if (weight_c_min[c] > p1[k]) +weight_c_m
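The diff is truncated here as well, but the visible pieces (per-channel weight min/max plus int8_range = 127) imply the standard symmetric int8 scheme. What follows is inferred from that visible code, not recovered from the truncated text; presumably, per output channel c:

s_c = \frac{127}{\max\left(\lvert w_c^{\min}\rvert,\; \lvert w_c^{\max}\rvert\right)}, \qquad
\hat{w}_{c,k} = \operatorname{round}\left(s_c \, w_{c,k}\right) \in [-127, 127]

with a single global scale over all channels when weight_channelwise_scale is false.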
[GitHub] ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization.
ZhennanQin commented on a change in pull request #12530: Implement mkldnn convolution fusion and quantization. URL: https://github.com/apache/incubator-mxnet/pull/12530#discussion_r219376419 ## File path: src/operator/nn/mkldnn/mkldnn_convolution.cc ## @@ -257,32 +300,59 @@ void MKLDNNConvolutionForward(const nnvm::NodeAttrs& attrs, const OpContext &ctx // This asks the engine to change the layout of the weight array after // it's used. weight.Reorder2DefaultAsync(); -weight_mem = GetWeights(weight, fwd.fwd_pd.weights_primitive_desc(), param.num_group); +weight_mem = GetWeights(weight, fwd->fwd_pd.weights_primitive_desc(), +param.conv_param.num_group); } else { // For inference, we want to reorder the weight array so we don't need to // reorder data every time. if (weight.IsDefaultData()) { - weight_mem = GetWeights(weight, fwd.fwd_pd.weights_primitive_desc(), param.num_group); + weight_mem = GetWeights(weight, fwd->fwd_pd.weights_primitive_desc(), + param.conv_param.num_group); // We also need to modify the layout on the original weight array. The // data conversion happens after the weight array is used. - weight.MKLDNNDataReorderAsync(fwd.fwd_pd.weights_primitive_desc()); + weight.MKLDNNDataReorderAsync(fwd->fwd_pd.weights_primitive_desc()); } else { weight_mem = weight.GetMKLDNNData(); - CHECK(weight_mem->get_primitive_desc() == fwd.fwd_pd.weights_primitive_desc()); + CHECK(weight_mem->get_primitive_desc() == fwd->fwd_pd.weights_primitive_desc()); } } - auto out_mem = CreateMKLDNNMem(out_data[conv::kOut], fwd.fwd_pd.dst_primitive_desc(), - req[conv::kOut]); + mkldnn_output_t out_mem; + if (param.mkldnn_param.with_sum) { +out_mem = mkldnn_output_t( +OutDataOp::Noop, +const_cast(out_data[conv::kOut].GetMKLDNNDataReorder( +fwd->fwd_pd.dst_primitive_desc(; Review comment: Actually, out_data always has the same memory descriptor as fwd->fwd_pd, so we should use GetMKLDNNData instead of GetMKLDNNDataReorder. Then out_mem will always be the memory held in out_data. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
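Concretely, the fix described amounts to the following (a sketch, assuming the usual MXNet MKLDNN helper semantics: GetMKLDNNDataReorder may hand back a reordered temporary when layouts differ, while GetMKLDNNData returns the array's own memory):

mkldnn_output_t out_mem;
if (param.mkldnn_param.with_sum) {
  // The in-place post-sum must accumulate into out_data's own buffer, so take
  // the array's memory directly instead of a possibly-temporary reordered copy.
  out_mem = mkldnn_output_t(
      OutDataOp::Noop,
      const_cast<mkldnn::memory*>(out_data[conv::kOut].GetMKLDNNData()));
}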