Hi dev,
MKL-DNN just published its first major release this month: https://github.com/intel/mkl-dnn/releases/tag/v1.0. Here I would like to start a discussion about upgrading the MKL-DNN integration from the current v0.20 to v1.0.

Motivation

To improve the general look and feel of the library and solve a few important design issues, the v1.0 major release changes some data structures, primitive APIs, and the execution model, so compatibility with v0.x is broken. The changes in MKL-DNN v1.0 are mostly covered in the RFC for v1.0<https://github.com/intel/mkl-dnn/tree/rfc-api-changes-v1.0/doc/rfc/api-v1.0>. The major changes are:

* Support large tensors with int64_t dimension sizes.
* Expose the scratchpad to support stateless primitives and better memory management, and hence thread safety.
* Pass memory and stream to the primitive at execution time.
* Rework the MKL-DNN memory descriptor.
* Split LSTM/GRU/RNN into separate primitives.
* Remove the MKLML dependency and stop releasing the MKLML and iomp packages in the MKL-DNN repository.
* Support Intel integrated graphics.

With these changes, we can resolve or mitigate several existing MXNet issues, e.g. #15576 for thread safety, #15544 for the MKLML/iomp5 license issue, and int64 tensor sizes for the MKL-DNN backend. Besides that, all new features will go into v1.x and will not be back-ported to v0.x, so MXNet needs to update its MKL-DNN dependency to v1.0 to leverage the new features and performance improvements.

Development

Basically we will follow the same integration methodology we used for the v0.x integration, including operator implementation, registration, NDArray modification, and graph partitioning. For better collaboration within the community, we will use a feature branch for the development and validation of the MKL-DNN 1.0 integration. All PRs to the feature branch should pass code review and CI, and finally get committers' approval.
The development can be divided into three parts, and all the work will be done before the end of Q3'19. During development, the feature branch will be synced with the master branch periodically.

* P1: make/cmake build with MKL-DNN v1.0, and integration of all FP32 CNN operators (in src/operator/nn/mkldnn/). After P1 is done, we can run FP32 training and inference for CNN models.
* P2: quantization pass and INT8 operator integration (in src/operator/quantization/mkldnn). After P2 is done, we can run INT8 quantization and INT8 inference.
* P3: RNN operator integration.

If needed, documents will be revised accordingly during the development.

Validation:

* Use the feature branch for development; all PRs should pass MXNet CI.
* Disable MKL-DNN related tests at the beginning of development and re-enable them incrementally as the work progresses.
* Intel internal validation: mainly focused on performance and convergence validation on CPU, with models from the MXNet examples, Gluon-CV, and Gluon-NLP.

Criteria for development done:

* MXNet CI: pass all existing unit tests and nightly tests.
* Accuracy: pass training convergence and inference accuracy validation.
* Performance: achieve FP32/INT8 performance similar to the v0.x integration.

Upstreaming to the master branch:

After development is done, we will start upstreaming the feature branch to the master branch. Since we cannot have two MKL-DNN libraries in MXNet simultaneously, the upstream must be done in a single PR. The PR will likely be large, so I hope the community can take time to review and comment while the feature branch is being developed. We need to do our best to make this happen before the 1.6.0 release so we can address the license issue raised in the 1.5.0 vote.

Please let me know what you think about this plan. If you think something should be fixed or improved in this integration, let me know as well.

thanks,
-tao (on behalf of the Intel MXNet team)