Hi dev,



MKL-DNN just published its first major release, v1.0, this month: 
https://github.com/intel/mkl-dnn/releases/tag/v1.0. Here I would like to start 
a discussion about upgrading the MKL-DNN integration in MXNet from the current 
v0.20 to v1.0.



Motivation

To improve the general look and feel of the library and to solve a few 
important design issues, the v1.0 major release changes some of the data 
structures, the primitive APIs, and the execution model, and accordingly breaks 
compatibility with v0.x. The details are mostly covered in the RFC for v1.0 
<https://github.com/intel/mkl-dnn/tree/rfc-api-changes-v1.0/doc/rfc/api-v1.0>. 
The major changes are listed below:
*        Support large tensors with int64_t dimension sizes.
*        Expose the scratchpad to support stateless primitives and better 
memory management, and hence thread safety.
*        Pass memory and stream to primitives at execution time.
*        Rework the MKL-DNN memory descriptor.
*        Split LSTM/GRU/RNN into separate primitives.
*        Remove the MKLML dependency and stop releasing MKLML and iomp 
packages in the MKL-DNN repository.
*        Support Intel integrated graphics.
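
For those who have not gone through the RFC yet, here is a minimal sketch of 
what the new execution model looks like, based on my reading of the v1.0 
documentation (details may differ):

#include "mkldnn.hpp"
using namespace mkldnn;

int main() {
    engine eng(engine::kind::cpu, 0);   // v1.0: engines are created from a kind + index
    stream strm(eng);                   // v1.0: streams are tied to an engine

    // v1.0 memory descriptor: int64_t dims plus a format tag, replacing
    // the v0.x memory primitive descriptor
    memory::dims shape = {2, 16, 7, 7}; // memory::dim is int64_t in v1.0
    memory::desc md(shape, memory::data_type::f32, memory::format_tag::nchw);
    memory src(md, eng), dst(md, eng);  // library-allocated buffers

    // primitives are stateless: memory and stream are bound at execution time
    auto relu_d = eltwise_forward::desc(prop_kind::forward_inference,
                                        algorithm::eltwise_relu, md, 0.f);
    eltwise_forward relu(eltwise_forward::primitive_desc(relu_d, eng));
    relu.execute(strm, {{MKLDNN_ARG_SRC, src}, {MKLDNN_ARG_DST, dst}});
    strm.wait();
    return 0;
}

Note that the stream and the argument map are supplied at execute() time, 
which is what makes the primitive object itself stateless.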

With these changes, we can resolve or mitigate several existing MXNet issues, 
e.g. #15576 (thread safety), #15544 (the MKLML/iomp5 license issue), and int64 
tensor size support for the MKL-DNN backend. Besides that, all new MKL-DNN 
features will go into v1.x and will not be back-ported to v0.x, so MXNet needs 
to update its MKL-DNN dependency to v1.0 to leverage new features and 
performance improvements.
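
To make the thread-safety point (#15576) concrete: with the scratchpad exposed 
and set to user mode, each worker thread can own its scratchpad buffer while 
sharing a single primitive object. A hedged sketch, assuming the v1.0 attribute 
API (the helper name is mine):

#include "mkldnn.hpp"
using namespace mkldnn;

// Run a ReLU with a caller-owned scratchpad so the primitive holds no
// internal state and can be shared across threads.
void relu_with_user_scratchpad(const engine &eng, stream &strm,
                               const memory &src, const memory &dst) {
    auto relu_d = eltwise_forward::desc(prop_kind::forward_inference,
                                        algorithm::eltwise_relu,
                                        src.get_desc(), 0.f);
    primitive_attr attr;
    attr.set_scratchpad_mode(scratchpad_mode::user); // caller manages scratchpad
    auto pd = eltwise_forward::primitive_desc(relu_d, attr, eng);

    // each thread can allocate its own scratchpad and pass it at execution
    // (for eltwise the scratchpad is typically empty; the pattern matters
    // for heavier primitives such as convolution)
    memory scratchpad(pd.scratchpad_desc(), eng);
    eltwise_forward relu(pd);
    relu.execute(strm, {{MKLDNN_ARG_SRC, src},
                        {MKLDNN_ARG_DST, dst},
                        {MKLDNN_ARG_SCRATCHPAD, scratchpad}});
}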



Development

Basically, we will follow the same integration methodology we used for the 
v0.x integration, including operator implementation, registration, NDArray 
modification, and graph partitioning. For better collaboration within the 
community, we will use a feature branch for the development and validation of 
the MKL-DNN 1.0 integration. All PRs to the feature branch should pass code 
review and CI and finally get committer approval. The development can be 
divided into three parts, and all the work should be done before the end of 
Q3'19. During development, the feature branch will be synced with the master 
branch periodically.
*        P1: make/cmake build with MKL-DNN v1.0, and integration of all FP32 
CNN operators (in src/operator/nn/mkldnn/). After P1 is done, we can do FP32 
training and inference for CNN models (see the sketch after this list for the 
kind of change each operator needs).
*        P2: the quantization pass and integration of INT8 operators (in 
src/operator/quantization/mkldnn). After P2 is done, we can do INT8 
quantization and INT8 inference.
*        P3: integration of RNN operators.
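
As a rough illustration of the per-operator change in P1 (a hypothetical 
sketch, not actual MXNet code; the helper and variable names are invented):

#include "mkldnn.hpp"

// v0.x style used in the current integration: collect primitives into a net
// and submit it to an eager stream, e.g.
//   std::vector<mkldnn::primitive> net{conv};
//   mkldnn::stream(mkldnn::stream::kind::eager).submit(net).wait();
//
// v1.0 style: execute each primitive with an explicit stream and an argument
// map; nothing is cached inside the primitive itself.
void ExecuteConv(const mkldnn::convolution_forward &conv,
                 const mkldnn::engine &eng,
                 const mkldnn::memory &src,
                 const mkldnn::memory &weights,
                 const mkldnn::memory &dst) {
    mkldnn::stream strm(eng);
    conv.execute(strm, {{MKLDNN_ARG_SRC, src},
                        {MKLDNN_ARG_WEIGHTS, weights},
                        {MKLDNN_ARG_DST, dst}});
    strm.wait();
}

The operator wrappers in src/operator/nn/mkldnn/ would need to move from the 
first pattern to the second.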

If needed, documentation will be revised accordingly during development.



Validation:
*        Use the feature branch for development; all PRs should pass MXNet CI.
*        Disable MKL-DNN-related tests at the beginning of development and 
re-enable them incrementally as the work proceeds.
*        Intel internal validation: mainly performance and convergence 
validation on CPU, with models from the MXNet examples, Gluon-CV, and 
Gluon-NLP.



Criteria for development done:
*        MXNet CI: pass all existing unit tests and nightly tests.
*        Accuracy: pass training convergence and inference accuracy validation.
*        Performance: achieve FP32/INT8 performance similar to the v0.x 
integration.



Upstreaming to master branch:

After development is done, we will start to upstream the feature branch to the 
master branch. Since we cannot have two MKL-DNN libraries in MXNet 
simultaneously, the upstreaming should be done in a single PR. That PR will 
likely be large, so I hope the community can take the time to review and 
comment during the development of the feature branch.



We need to do our best to make this happen before the 1.6.0 release so we can 
address the license issue raised in the 1.5.0 vote.



Please let me know what you think about this plan. If you think something 
should be fixed or improved in this integration, please let me know as well.



Thanks,

-tao (on behalf of the Intel MXNet team)
