This is huge. Thanks for working on this. Is there a similar plan for, e.g., TensorRT support being ported into the main cuda-9.x packages?
On October 17, 2018 at 2:10:20 PM, Alex Zai (aza...@gmail.com) wrote:

Hey all,

We have been working hard these past few months to integrate and stabilize Intel's MKLDNN deep learning CPU accelerator into MXNet and have made incredible progress. On CPUs with AVX512 instructions (such as c5.18x) we have seen performance increases of up to 12x, and on other platforms (Macs, AVX2) we have seen speedups of 1.5x or more. The full list of benchmarks can be found here (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764 and https://github.com/apache/incubator-mxnet/pull/12591).

Currently, using this accelerator requires the developer either to pip install the mxnet-mkl version of MXNet or to build it from source. Given that we should try to provide the best performance "out of the box" with MXNet, we should include this in the default build. The MKLDNN library is included within the pip package build, so it does not require an external dependency.

There were concerns that MKLDNN could cause regressions on certain platforms (as it did with the TensorFlow version a while back), but we added an environment flag (MXNET_MKLDNN_ENABLED) that allows users to turn off this feature at runtime.

Please bring up any other concerns you may have, along with your thoughts on including this accelerator in the default build.

Best,
Alex

--
Alfredo Luque
Software Engineer
Machine Learning Infrastructure
Airbnb
San Francisco, CA