Hi Tao,

I agree with Steffen that we can start with a stable MKLDNN release for 1.4.0. Regarding your suggestion to use 0.17, can you provide some info on the versioning mechanism MKLDNN uses? Once an MKLDNN release is out and a regression is found (like the LSTM regression), would it be possible to do a patch release for it, or to maintain a release branch for it?
Anirudh

On Sun, Nov 25, 2018 at 5:03 PM Lv, Tao A <tao.a...@intel.com> wrote:

> Hi Steffen,
>
> I think all the commits on the MKL-DNN master branch are well tested by
> the MKL-DNN development team. If we really want a release commit in the
> coming MXNet 1.4 release, my suggestion is the MKL-DNN 0.17 release.
>
> Thank you,
> Tao
>
> Sent from my iPhone
>
>> On Nov 26, 2018, at 8:09 AM, Steffen Rochel <steffenroc...@gmail.com> wrote:
>>
>> +1 to make MKL-DNN default.
>> I'm tracking https://github.com/apache/incubator-mxnet/issues/13369 as an
>> open issue to be addressed for 1.4.0.
>> I do agree that we should move to a model of including released
>> dependencies instead of just taking bleeding-edge snapshots.
>> However, speed of development is important as well.
>> As a compromise for the 1.4.0 release with MKL-DNN: can the MKL-DNN
>> development team provide us with a well-tested tag/commit id to include
>> in the 1.4.0 release?
>> Steffen
>>
>>> On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A <tao.a...@intel.com> wrote:
>>>
>>> Thanks for the information, Kellen and Naveen.
>>>
>>> Unlike onnx-tensorrt, MKL-DNN already provides versioning and release
>>> tags. My concern is that MKL-DNN is still under intensive development:
>>> if a new feature or bug fix lands on its master branch, do we really
>>> want to wait for the next release to get it supported in MXNet?
>>>
>>> Take the LSTM regression as an example. MKL-DNN will probably deliver a
>>> fix or improvement on its master branch soon; do we need to wait for
>>> the 0.18 release to get it fixed for MXNet users? AFAIK, TensorFlow
>>> also pins a plain commit id, not a release, as its MKL-DNN dependency.
>>>
>>> Regarding the LSTM regression, we are using internal JIRA tickets
>>> rather than GitHub issues to track MKL-DNN defects. But I agree with
>>> you, we need to update its progress in Alex's issue.
>>>
>>> Thanks,
>>> -tao
>>>
>>> -----Original Message-----
>>> From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
>>> Sent: Thursday, November 22, 2018 10:55 AM
>>> To: dev@mxnet.incubator.apache.org
>>> Subject: Re: Include MKLDNN into default mxnet pip package
>>>
>>> Agree with your point about other repos also not being based on
>>> versioning, Tao. I would point out that I've given some repos I've
>>> worked with similar feedback:
>>> https://github.com/onnx/onnx-tensorrt/issues/68
>>>
>>>> On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy <mnnav...@gmail.com> wrote:
>>>>
>>>> Tao,
>>>>
>>>> You are right that there are many submodules in 3rdparty. We have to
>>>> start somewhere, and I believe this one is a good candidate to start
>>>> with. This is not about tying MXNet releases to the releases of the
>>>> submodules; it is about picking only stable releases instead of
>>>> bleeding-edge commits from the tip of master. That gives us confidence
>>>> in a submodule that MXNet users depend on, especially if we make
>>>> MKLDNN the default.
>>>>
>>>> Good to know it is already identified as a regression. Alex has
>>>> created https://github.com/apache/incubator-mxnet/issues/13369;
>>>> please add details there and link the corresponding MKLDNN issue
>>>> (I couldn't find one).
>>>>
>>>> -Naveen
>>>>
>>>>> On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A <tao.a...@intel.com> wrote:
>>>>>
>>>>> Here are my answers to the questions from Kellen and Naveen about
>>>>> MKL-DNN. They don't mean that I'm supportive of making MKL-DNN the
>>>>> default here.
>>>>>
>>>>> @Kellen,
>>>>>
>>>>> FYI, here is the list of platforms officially supported by MKL-DNN:
>>>>> https://github.com/intel/mkl-dnn#system-requirements
>>>>>
>>>>> Most of the computation-intensive kernels in MKL-DNN are JITed, so
>>>>> they generate code for the actual platform at runtime. Non-JIT code
>>>>> in MKL-DNN, like the rest of the MXNet code base, is compiled
>>>>> according to the compiler options/flags. We can set -DARCH_OPT_FLAGS
>>>>> when building MKL-DNN to avoid optimizing for the build machine,
>>>>> which is exactly what we do for the MKL-DNN build in MXNet. Even
>>>>> without MKL-DNN, I have noticed issues with illegal instructions
>>>>> when users import the pip package on lower-end machines that
>>>>> probably only support SSE.
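(For reference, a minimal sketch of the build knob Tao mentions above, not
the exact MXNet build invocation: ARCH_OPT_FLAGS is the CMake option from
MKL-DNN 0.x, and the submodule path 3rdparty/mkldnn is assumed from the
MXNet source tree.)

    # Build the MKL-DNN submodule with host-specific tuning disabled, so
    # the non-JIT code does not assume the build machine's instruction set:
    cd 3rdparty/mkldnn
    mkdir -p build && cd build
    cmake -DARCH_OPT_FLAGS="" ..
    make -j"$(nproc)"

Leaving ARCH_OPT_FLAGS empty keeps the JITed kernels (which dispatch at
runtime) while preventing the compiler from baking, say, AVX2 instructions
into the non-JIT paths.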
>>>>> @Naveen,
>>>>>
>>>>> The LSTM issue has already been identified as a regression from the
>>>>> recent version of MKL-DNN. Hopefully it will be fixed soon in a new
>>>>> MKL-DNN update.
>>>>>
>>>>> MXNet has many submodule dependencies under the 3rdparty folder, and
>>>>> it seems we don't require release versions for most of them. The
>>>>> release cycles of MKL-DNN and MXNet are not well aligned, so I think
>>>>> it would be a risk for an MXNet release to depend strictly on the
>>>>> release of one submodule, let alone on the releases of all
>>>>> submodules.
>>>>>
>>>>> -tao
>>>>>
>>>>> -----Original Message-----
>>>>> From: Naveen Swamy [mailto:mnnav...@gmail.com]
>>>>> Sent: Thursday, November 22, 2018 9:08 AM
>>>>> To: dev@mxnet.incubator.apache.org
>>>>> Cc: d...@mxnet.apache.org
>>>>> Subject: Re: Include MKLDNN into default mxnet pip package
>>>>>
>>>>> Hi Alex,
>>>>>
>>>>> Thanks for promptly running the numbers on AMD and reporting here.
>>>>>
>>>>> Can you please update the AMD numbers here for posterity:
>>>>> https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking
>>>>>
>>>>> Are there any outstanding issues when MKLDNN is enabled? From my
>>>>> offline conversations I am briefly aware of performance issues with
>>>>> LSTM; is there a GitHub issue for it?
>>>>>
>>>>> MKLDNN is a submodule dependency; are we pulling the latest commit or
>>>>> a release? If not, we should move to releases before we make it the
>>>>> default. Ideally we should use platform-specific distributions (-dev
>>>>> packages); at the least we should rely on well-tested releases.
>>>>>
>>>>> Thanks, Naveen
>>>>>
>>>>> On Wed, Nov 21, 2018 at 4:55 PM Zai, Alexander
>>>>> <alex...@amazon.com.invalid> wrote:
>>>>>
>>>>>> AMD benchmarks have been published. We are seeing a 15.8x speedup
>>>>>> with Resnet50 (batch size 32) on AWS's new m5a.24xlarge machine.
>>>>>> With a smaller network (Mobilenet, batch size 32) the speedup is
>>>>>> more significant at 38.7x. Let's have a vote to see if the PR to
>>>>>> have MKLDNN enabled by default
>>>>>> (https://github.com/apache/incubator-mxnet/pull/12591) can be
>>>>>> merged before the 1.4.0 release.
>>>>>>
>>>>>> On 10/19/18, 9:17 AM, "Pedro Larroy" <pedro.larroy.li...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> I did pip install mxnet-mkl==1.3.1b20181018 on an AMD Ryzen 1950X
>>>>>> and the unit tests are passing.
>>>>>>
>>>>>> Is this build using AVX512? In /proc/cpuinfo I see only the "avx"
>>>>>> flag; there's no "avx2" like on recent Intel CPUs.
>>>>>>
>>>>>> Pedro.
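(A quick way to reproduce Pedro's check on any Linux box, using plain
coreutils; nothing here is MXNet-specific, and the flags printed will
differ per CPU.)

    # List the AVX-family flags the kernel reports for this CPU. An empty
    # result means an SSE-only machine, which is where the
    # illegal-instruction reports mentioned above come from.
    grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep '^avx' | sort -u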
>>>>>> On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko <lupe...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Awesome collaborative effort across many contributors and
>>>>>>> companies!
>>>>>>>
>>>>>>> The boost is impressive, and for MXNet users to get this boost
>>>>>>> "out of the box" is a great benefit and makes MXNet an even better
>>>>>>> choice.
>>>>>>>
>>>>>>> Alex - can you clarify whether there are any downsides with regard
>>>>>>> to non-AVX-512 architectures, AMD CPUs, etc.? Will it gracefully
>>>>>>> fall back?
>>>>>>>
>>>>>>> Hagay
>>>>>>>
>>>>>>> On Fri, Oct 19, 2018, 15:46 Sergio Fernández <wik...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> If there is no downside on platforms not supporting AVX512
>>>>>>>> instructions, then +1
>>>>>>>>
>>>>>>>> On Wed, Oct 17, 2018, 14:10 Alex Zai <aza...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hey all,
>>>>>>>>> We have been working hard these past few months to integrate and
>>>>>>>>> stabilize Intel's MKLDNN deep learning CPU accelerator into MXNet
>>>>>>>>> and have made incredible progress. On CPUs with AVX512
>>>>>>>>> instructions (such as c5.18x) we have seen performance increases
>>>>>>>>> of up to 12x, and on other platforms (Macs, AVX2) we have seen a
>>>>>>>>> speedup of 1.5x+. The full list of benchmarks can be found here:
>>>>>>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
>>>>>>>>> and https://github.com/apache/incubator-mxnet/pull/12591
>>>>>>>>>
>>>>>>>>> Currently, using this accelerator requires the developer either
>>>>>>>>> to pip install the mxnet-mkl version of MXNet or to build it from
>>>>>>>>> source. Given that we should try to provide the best performance
>>>>>>>>> "out of the box" with MXNet, we should include this in the
>>>>>>>>> default build. The mkldnn library is included within the pip
>>>>>>>>> package build, so it does not add an external dependency.
>>>>>>>>>
>>>>>>>>> There were concerns that MKLDNN could cause regressions on
>>>>>>>>> certain platforms (as it did with the TensorFlow version a while
>>>>>>>>> back), but we added an env flag (MXNET_MKLDNN_ENABLED) that
>>>>>>>>> allows users to turn off this feature at runtime. Please bring up
>>>>>>>>> any other concerns you may have, along with your thoughts on
>>>>>>>>> including this accelerator in the default build.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Alex
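(For anyone who wants to try the fallback path Alex describes, a minimal
sketch; the flag name is as given above, but its exact semantics and
default may differ across MXNet versions.)

    # Install the MKL-DNN-enabled build, then run once with the MKL-DNN
    # execution path switched off via the runtime environment flag:
    pip install mxnet-mkl
    MXNET_MKLDNN_ENABLED=0 python -c "import mxnet as mx; print(mx.nd.ones((2, 2)) * 2)"

Leaving the flag unset keeps MKL-DNN operators active, so the same binary
can be A/B-tested against the regressions discussed in this thread.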