Great job, well done everyone!!

Lv, Tao A <tao.a...@intel.com> wrote on Fri., Nov 1, 2019, 03:50:
> Hi dev,
>
> The feature branch mkldnn-v1.0 has been merged to master. Really appreciate your support for this task.
>
> Branch: https://github.com/apache/incubator-mxnet/tree/mkldnn-v1.0
> Project: https://github.com/apache/incubator-mxnet/projects/16
> PR: https://github.com/apache/incubator-mxnet/pull/16555
>
> If possible, could downstream projects please help verify the latest master branch, and feel free to report any issues.
>
> Thanks,
> -tao
>
> -----Original Message-----
> From: Lv, Tao A <tao.a...@intel.com>
> Sent: Sunday, July 28, 2019 11:55 PM
> To: dev@mxnet.incubator.apache.org
> Cc: Zhao, Patric <patric.z...@intel.com>; Ye, Jason Y <jason.y...@intel.com>
> Subject: RE: [Discuss] Upgrade MKL-DNN submodule to its v1.0 release
>
> Update:
>
> I just cut the feature branch for the MKL-DNN 1.0 integration:
> https://github.com/apache/incubator-mxnet/tree/mkldnn-v1.0
>
> Thanks,
> -tao
>
> -----Original Message-----
> From: Lv, Tao A <tao.a...@intel.com>
> Sent: Friday, July 26, 2019 10:21 PM
> To: dev@mxnet.incubator.apache.org
> Cc: Zhao, Patric <patric.z...@intel.com>; Ye, Jason Y <jason.y...@intel.com>
> Subject: RE: [Discuss] Upgrade MKL-DNN submodule to its v1.0 release
>
> It seems we don't have any objections. I will try to cut the feature branch in the following days.
>
> Thanks,
> -tao
>
> -----Original Message-----
> From: Lv, Tao A <tao.a...@intel.com>
> Sent: Saturday, July 20, 2019 11:06 PM
> To: dev@mxnet.incubator.apache.org
> Cc: Zhao, Patric <patric.z...@intel.com>; Ye, Jason Y <jason.y...@intel.com>
> Subject: [Discuss] Upgrade MKL-DNN submodule to its v1.0 release
>
> Hi dev,
>
> MKL-DNN just published its first major release this month: https://github.com/intel/mkl-dnn/releases/tag/v1.0. Here I would like to start a discussion about upgrading the MKL-DNN integration from the current v0.20 to v1.0.
>
> Motivation
>
> To improve the general look and feel of the library and to solve a few important design issues, the v1.0 major release changes some of the data structures, the primitive APIs and the execution model, and compatibility with v0.x versions is broken accordingly. The changes in MKL-DNN v1.0 are mostly covered in the RFC for v1.0 <https://github.com/intel/mkl-dnn/tree/rfc-api-changes-v1.0/doc/rfc/api-v1.0>. The major changes are listed below:
> * Support large tensors with int64_t dimension sizes.
> * Expose the scratchpad to support stateless primitives and better memory management, and hence thread safety.
> * Pass memory and stream to primitives at execution time.
> * Rework the MKL-DNN memory descriptor.
> * Split LSTM/GRU/RNN into different primitives.
> * Remove the MKLML dependency and stop releasing MKLML and iomp packages in the MKL-DNN repository.
> * Support Intel integrated graphics.
>
> With these changes, we can resolve or mitigate several existing issues in MXNet, e.g. #15576 for thread safety, #15544 for the MKLML/iomp5 license issue, and the int64 tensor size limit of the MKL-DNN backend. Besides that, all new features will go to v1.x and will not be back-ported to v0.x. MXNet needs to update its MKL-DNN dependency to v1.0 to better leverage the new features and performance improvements.
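>
> To give a feel for the new programming model, here is a minimal standalone sketch against the v1.0 C++ API (the ReLU primitive, shapes and variable names are purely illustrative and are not the code we will land in MXNet). It touches three of the changes listed above: int64_t dimensions, memory and stream passed at execution time, and the user-managed scratchpad:
>
>   #include <unordered_map>
>   #include "mkldnn.hpp"
>
>   int main() {
>       using namespace mkldnn;
>
>       // Engine and stream are explicit objects in v1.0; the stream is
>       // handed to the primitive at execution time.
>       engine eng(engine::kind::cpu, 0);
>       stream strm(eng);
>
>       // Dimensions are int64_t now, so large tensors are representable.
>       memory::dims shape = {8, 96, 55, 55};
>       memory::desc md(shape, memory::data_type::f32,
>                       memory::format_tag::nchw);
>       memory src(md, eng), dst(md, eng);
>
>       // Let the caller own the scratchpad, so the primitive itself
>       // stays stateless (this is what helps with thread safety).
>       primitive_attr attr;
>       attr.set_scratchpad_mode(scratchpad_mode::user);
>
>       auto relu_d = eltwise_forward::desc(prop_kind::forward_inference,
>                                           algorithm::eltwise_relu, md, 0.f);
>       auto relu_pd = eltwise_forward::primitive_desc(relu_d, attr, eng);
>       memory scratchpad(relu_pd.scratchpad_desc(), eng);
>       eltwise_forward relu(relu_pd);
>
>       // Memory objects are passed as execution arguments instead of
>       // being baked into the primitive at construction as in v0.x.
>       relu.execute(strm, {{MKLDNN_ARG_SRC, src},
>                           {MKLDNN_ARG_DST, dst},
>                           {MKLDNN_ARG_SCRATCHPAD, scratchpad}});
>       strm.wait();
>       return 0;
>   }
>
> For contrast, in the v0.x model the same primitive would be constructed with the memory primitives bound up front and submitted to a global eager stream, which is part of why v0.x primitives are stateful.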
>
> Development
>
> Basically, we will follow the same integration methodology we used for the v0.x integration, including operator implementation, registration, NDArray modification and graph partitioning. For better collaboration within the community, we will have a feature branch for the development and validation of the MKL-DNN 1.0 integration. All PRs to the feature branch should pass code review and CI and finally get committers' approval. The development can be divided into 3 parts, and all the work will be done before Q3'19 ends. During the development, the feature branch will be synced with the master branch periodically.
> * P1: make/cmake build with MKL-DNN v1.0, and integration of all FP32 CNN operators (in src/operator/nn/mkldnn/). We can do FP32 training and inference for CNN models after P1 is done.
> * P2: quantization pass and INT8 operator integration (in src/operator/quantization/mkldnn). We can do INT8 quantization and INT8 inference after P2 is done.
> * P3: RNN operator integration.
>
> If needed, documents will be revised accordingly during the development.
>
> Validation:
> * Use the feature branch for development - all PRs should pass MXNet CI.
> * Disable MKL-DNN related tests at the beginning of development and re-enable them incrementally as development proceeds.
> * Intel internal validation: mainly focused on performance and convergence validation on CPU, with models from the MXNet examples, Gluon-CV and Gluon-NLP.
>
> Criteria for development done:
> * MXNet CI: pass all existing unit tests and nightly tests.
> * Accuracy: pass training convergence and inference accuracy validation.
> * Performance: achieve FP32/INT8 performance similar to the v0.x integration.
>
> Upstreaming to master branch:
>
> After development is done, we will start to upstream the feature branch to the master branch. Since we cannot have two MKL-DNN libraries in MXNet simultaneously, the upstreaming should be done in a single PR. That PR will probably be large, so I hope the community can take the time to review and comment during the development of the feature branch.
>
> We need to do our best to make this happen before the 1.6.0 release so that we can address the license issue raised in the 1.5.0 vote.
>
> Please let me know what you think about this plan. If you think something should be fixed or improved in this integration, please let me know as well.
>
> Thanks,
> -tao (on behalf of the Intel MXNet team)