If I understand the context correctly, we should acknowledge that the significant performance improvement comes from the Intel MKLDNN team's contribution in this PR: https://github.com/apache/incubator-mxnet/pull/12530.

On Wed, Oct 17, 2018 at 3:12 PM kellen sunderland <kellen.sunderl...@gmail.com> wrote:

> First of all thanks to Intel for these improvements, really a great effort.
>
> What would the compatibility story look like for users that don't have
> these AVX instructions? Would there be any negative effect for AMD users?
>
> Regarding TensorRT: It's a possibility but not planned in the short term. A
> few considerations would be the limits on PyPI package sizes and the bloat
> incurred with TRT, the requirements of TRT to be installed on the user
> side, and the TRT engine build times which are non-trivial. We can work
> towards fixing or working around these issues in the future if default TRT
> is something the user community would like to see for CUDA packages. While
> the feature is experimental we'll likely continue to use
> 'mxnet-tensorrt-cu92' and 'mxnet-tensorrt-cu90'.
>
> On Wed, Oct 17, 2018 at 2:12 PM Alfredo Luque
> <alfredo.lu...@airbnb.com.invalid> wrote:
>
> > This is huge. Thanks for working on this. Is there a similar plan with
> > e.g. tensor-rt support being ported into the main cuda-9.x packages?
> >
> > On October 17, 2018 at 2:10:20 PM, Alex Zai (aza...@gmail.com) wrote:
> >
> > Hey all,
> > We have been working hard these past few months to integrate and
> > stabilize Intel's MKLDNN deep learning CPU accelerator into MXNet and
> > have made incredible progress. On CPUs with AVX512 instructions (such as
> > c5.18x) we have seen performance increase up to 12x and on other
> > platforms (Macs, AVX2) we have seen a speedup of 1.5x+. Full list of
> > benchmarks can be found here
> > (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
> > and https://github.com/apache/incubator-mxnet/pull/12591).
> >
> > Currently, using this accelerator requires the developer to either pip
> > install the mxnet-mkl version of mxnet or to build it themselves from
> > source. Given that we should try to provide the best performance "out of
> > the box" with mxnet we should include this in the default build. The
> > mkldnn library is included within the pip package build so it does not
> > require an external dependency.
> >
> > There were concerns that MKLDNN could cause regressions on certain
> > platforms (as it did with the tensorflow version a while back); but we
> > added an env flag (MXNET_MKLDNN_ENABLED) that allows users to turn off
> > this feature during runtime. Please bring up any other concerns you may
> > have and your thoughts on including this accelerator in the default
> > build.
> >
> > Best,
> > Alex
> >
> > --
> > Alfredo Luque
> > Software Engineer
> > Machine Learning Infrastructure
> > Airbnb
> > San Francisco, CA
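
For anyone who wants to try the MXNET_MKLDNN_ENABLED flag mentioned in the thread, here is a minimal sketch of one way to toggle it per process. It assumes the flag takes "0" to disable and "1" to enable, and that it is read when mxnet starts up; neither detail is spelled out in the thread. The helper names (env_with_mkldnn, run_with_mkldnn) are hypothetical and are not part of mxnet.

```python
import os
import subprocess
import sys

# Flag name taken from the thread. The "0"/"1" values are an assumption.
MKLDNN_FLAG = "MXNET_MKLDNN_ENABLED"


def env_with_mkldnn(enabled, base_env=None):
    """Return a copy of the environment with the MKLDNN flag set.

    The caller's mapping (or os.environ) is left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env[MKLDNN_FLAG] = "1" if enabled else "0"
    return env


def run_with_mkldnn(enabled, argv):
    """Run a command in a child process with the flag set (hypothetical helper)."""
    return subprocess.run(
        argv,
        env=env_with_mkldnn(enabled),
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Stand-in for a real training or inference script: the child process
    # only echoes the flag value it was started with.
    child = [sys.executable, "-c",
             "import os; print(os.environ['MXNET_MKLDNN_ENABLED'])"]
    print(run_with_mkldnn(False, child).stdout.strip())
```

Setting the variable in a child process (or in the shell before starting Python) avoids having to guess whether changing it after mxnet has been imported has any effect.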