Agree with your point, Tao, that other repos are also not pinned to versioned releases. I would point out that I've given similar feedback to some of the ones I've worked with: https://github.com/onnx/onnx-tensorrt/issues/68

On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy <mnnav...@gmail.com> wrote:

> Tao,
>
> You are right, there are many submodules in 3rd party. We have to start
> somewhere, and I believe this one is a good candidate to start with. This
> is not to cater to releases of MXNet or to tie them to the releases of
> the submodules, but instead to pick only stable releases and not to pick
> up bleeding-edge commits from the tip of master. This gives us confidence
> in the submodule that MXNet users are depending on, especially if we make
> MKLDNN the default.
>
> Good to know it is already known as a regression. Alex has created this
> issue: https://github.com/apache/incubator-mxnet/issues/13369. Please add
> details and link the corresponding issue in MKLDNN (I couldn't find it).
>
> -Naveen
>
> On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A <tao.a...@intel.com> wrote:
>
> > Here are my answers to the questions from Kellen and Naveen about
> > MKL-DNN. It doesn't mean that I'm supportive of making MKL-DNN the
> > default here.
> >
> > @Kellen,
> >
> > FYI, here is a list of the platforms which are officially supported by
> > MKL-DNN.
> > https://github.com/intel/mkl-dnn#system-requirements
> >
> > Most of the computation-intensive kernels in MKL-DNN are JITed, so they
> > are supposed to generate code according to the platform at runtime. For
> > non-JIT code in MKL-DNN, as with other code in MXNet, the compiler
> > generates instructions according to its options/flags. We can set
> > -DARCH_OPT_FLAGS when building MKL-DNN to avoid optimizing for the
> > build machine. That's exactly what we are doing for the MKL-DNN build
> > in MXNet. Even without MKL-DNN, I noticed there were issues about
> > illegal instructions in MXNet when users import the pip package on a
> > lower-end machine which probably only supports SSE.
> >
> > @Naveen,
> >
> > The LSTM issue has already been identified as a regression from the
> > recent version of MKL-DNN. Hopefully it will be fixed soon with a new
> > update of MKL-DNN.
> >
> > MXNet has many submodule dependencies under the 3rd party folder. It
> > seems we don't require release versions for most of these dependencies.
> > The release cycles of MKL-DNN and MXNet are not matched very well. I
> > think it would be a risk for an MXNet release if it strictly depended
> > on the release of a submodule, not to mention the releases of all
> > submodules.
> >
> > -tao
> >
> > -----Original Message-----
> > From: Naveen Swamy [mailto:mnnav...@gmail.com]
> > Sent: Thursday, November 22, 2018 9:08 AM
> > To: dev@mxnet.incubator.apache.org
> > Cc: d...@mxnet.apache.org
> > Subject: Re: Include MKLDNN into default mxnet pip package
> >
> > Hi Alex,
> >
> > Thanks for promptly running the numbers on AMD and reporting here.
> >
> > Can you please update the AMD numbers here for posterity?
> > https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking
> >
> > Are there any outstanding issues when MKLDNN is enabled? From my
> > offline conversation I am briefly aware of performance issues with
> > LSTM. Is there a GitHub issue for it?
> >
> > MKLDNN is a submodule dependency. Are we pulling the latest commit or
> > releases? If not releases, we should move to releases before we make it
> > the default. Ideally we should use platform-specific distributions
> > (-dev packages); at least we should rely on well-tested releases.
> >
> > Thanks, Naveen
> >
> > On Wed, Nov 21, 2018 at 4:55 PM Zai, Alexander
> > <alex...@amazon.com.invalid> wrote:
> >
> > > AMD benchmarks have been published. We are seeing a 15.8x speedup
> > > with Resnet50 (batch size 32) on AWS's new m5a.24xlarge machine. With
> > > a smaller network (Mobilenet, batch size 32) the speedup is more
> > > significant at 38.7x. Let's have a vote to see if the PR to have
> > > MKLDNN enabled by default
> > > (https://github.com/apache/incubator-mxnet/pull/12591) can be merged
> > > before the 1.4.0 release.
> > >
> > > On 10/19/18, 9:17 AM, "Pedro Larroy" <pedro.larroy.li...@gmail.com>
> > > wrote:
> > >
> > >     I did pip install mxnet-mkl==1.3.1b20181018 on an AMD Ryzen 1950X
> > >     and unit tests are passing.
> > >
> > >     Is this build using AVX512? In /proc/cpuinfo I see only the "avx"
> > >     flag. There's no "avx2" like on recent Intel CPUs.
> > >
> > >     Pedro.
> > >
> > > On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko <lupe...@gmail.com>
> > > wrote:
> > >
> > > > Awesome collaborative effort across many contributors and
> > > > companies!
> > > >
> > > > The boost is impressive, and for MXNet users to get this boost "out
> > > > of the box" is a great benefit and makes MXNet an even better
> > > > choice.
> > > >
> > > > Alex - can you clarify whether there are any downsides with regard
> > > > to non-AVX-512 architectures, AMD CPUs, etc.? Will it gracefully
> > > > fall back?
> > > >
> > > > Hagay
> > > >
> > > > On Fri, Oct 19, 2018, 15:46 Sergio Fernández <wik...@apache.org>
> > > > wrote:
> > > >
> > > > > If there is no downside on platforms not supporting AVX512
> > > > > instructions, then +1
> > > > >
> > > > > On Wed, Oct 17, 2018, 14:10 Alex Zai <aza...@gmail.com> wrote:
> > > > >
> > > > > > Hey all,
> > > > > > We have been working hard these past few months to integrate
> > > > > > and stabilize Intel's MKLDNN deep learning CPU accelerator into
> > > > > > MXNet and have made incredible progress. On CPUs with AVX512
> > > > > > instructions (such as c5.18x) we have seen performance increase
> > > > > > up to 12x, and on other platforms (Macs, AVX2) we have seen a
> > > > > > speedup of 1.5x+. The full list of benchmarks can be found here
> > > > > > (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
> > > > > > and https://github.com/apache/incubator-mxnet/pull/12591).
> > > > > >
> > > > > > Currently, using this accelerator requires the developer to
> > > > > > either pip install the mxnet-mkl version of mxnet or to build
> > > > > > it themselves from source. Given that we should try to provide
> > > > > > the best performance "out of the box" with mxnet, we should
> > > > > > include this in the default build. The mkldnn library is
> > > > > > included within the pip package build, so it does not require
> > > > > > an external dependency.
> > > > > >
> > > > > > There were concerns that MKLDNN could cause regressions on
> > > > > > certain platforms (as it did with the tensorflow version a
> > > > > > while back), but we added an env flag (MXNET_MKLDNN_ENABLED)
> > > > > > that allows users to turn off this feature during runtime.
> > > > > > Please bring up any other concerns you may have and your
> > > > > > thoughts on including this accelerator in the default build.
> > > > > >
> > > > > > Best,
> > > > > > Alex
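
For anyone following the AVX discussion above, here is a minimal sketch of how a user might check which SIMD flags /proc/cpuinfo reports (as Pedro did by hand) and opt out through the MXNET_MKLDNN_ENABLED flag Alex mentions. The helper names and the "disable when no AVX variant is reported" policy are assumptions made up for illustration, not anything MXNet ships, and the value "0" is assumed to mean "off". Presumably the variable needs to be set before mxnet is imported.

```python
import os

def simd_flags(cpuinfo_text):
    """Return the SIMD-related flags found in /proc/cpuinfo-style text."""
    wanted = {"sse4_2", "avx", "avx2", "avx512f"}
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            found |= wanted & set(value.split())
    return found

def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the Linux cpuinfo file; return an empty string if unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""

def configure_mkldnn(flags, environ=os.environ):
    """Hypothetical policy: turn MKL-DNN off when no AVX variant is reported.

    Assumes MXNET_MKLDNN_ENABLED=0 disables the feature.
    """
    if not any(flag.startswith("avx") for flag in flags):
        environ["MXNET_MKLDNN_ENABLED"] = "0"
    return environ.get("MXNET_MKLDNN_ENABLED")

if __name__ == "__main__":
    flags = simd_flags(read_cpuinfo())
    print("SIMD flags:", sorted(flags))
    print("MXNET_MKLDNN_ENABLED:", configure_mkldnn(flags))
```

Since Tao notes that the MKL-DNN kernels are JITed per platform, a fallback like this may well be unnecessary in practice; it only shows where the env flag would fit if a regression did show up on a particular machine.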