Hi Steffen,

I think all commits on the MKL-DNN master branch are well tested by the MKL-DNN development team. If we really want to pin a release commit for the upcoming MXNet 1.4 release, my suggestion is the MKL-DNN 0.17 release.

Thank you,
Tao

> On Nov 26, 2018, at 8:09 AM, Steffen Rochel <steffenroc...@gmail.com> wrote:
>
> +1 to make MKL-DNN default.
> I'm tracking https://github.com/apache/incubator-mxnet/issues/13369 as an open issue to be addressed for 1.4.0.
> I do agree that we should move to a model to include released dependencies instead of just taking bleeding edge snapshots.
> However, speed of development is important as well.
> As a compromise for the 1.4.0 release with MKL-DNN: can the MKL-DNN development team provide us with a well tested tag/commit id to include in the 1.4.0 release?
> Steffen
>
>> On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A <tao.a...@intel.com> wrote:
>>
>> Thanks for the information, Kellen and Naveen.
>>
>> Better than onnx-tensorrt, MKL-DNN has already provided versioning and release tags. My concern is that as MKL-DNN is still under intensive development, if it has a new feature or bug fix on its master branch, do we really want to wait for the next release to get it supported in MXNet?
>>
>> Take the LSTM regression as an example: MKL-DNN will probably give a fix or improvement on its master branch soon. Do we need to wait for the 0.18 release to get it fixed for MXNet users? AFAIK, TensorFlow is also using a normal commit id, not a release, as the dependency for MKL-DNN.
>>
>> Regarding the LSTM regression, we are using internal JIRA tickets rather than GitHub issues to track the defects of MKL-DNN. But I agree with you, we need to update the progress of it in Alex's issue.
>>
>> Thanks,
>> -tao
>>
>> -----Original Message-----
>> From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
>> Sent: Thursday, November 22, 2018 10:55 AM
>> To: dev@mxnet.incubator.apache.org
>> Subject: Re: Include MKLDNN into default mxnet pip package
>>
>> Agree with your point about other repos also not being based on versioning, Tao. I would point out that I've given similar feedback to some that I've worked with: https://github.com/onnx/onnx-tensorrt/issues/68
>>
>>> On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy <mnnav...@gmail.com> wrote:
>>>
>>> Tao,
>>>
>>> You are right, there are many submodules in 3rd party. We have to start somewhere and I believe this one is a good candidate to start with. This is not to cater to releases of MXNet or to tie them to the releases of the submodules, but instead to pick only stable releases and not to pick up bleeding edge commits from the tip of master. This gives us confidence in the submodule that MXNet users are depending on, especially if we make MKLDNN the default.
>>>
>>> Good to know it is already known as a regression. Alex has created this issue: https://github.com/apache/incubator-mxnet/issues/13369. Please add details and link the corresponding issue in MKLDNN (I couldn't find one).
>>>
>>> -Naveen
>>>
>>>> On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A <tao.a...@intel.com> wrote:
>>>>
>>>> Here are my answers to the questions from Kellen and Naveen about MKL-DNN. It doesn't mean that I'm supportive of making MKL-DNN default here.
>>>>
>>>> @Kellen,
>>>>
>>>> FYI, here is a list of the platforms which are officially supported by MKL-DNN.
>>>> https://github.com/intel/mkl-dnn#system-requirements
>>>>
>>>> Most of the computation-intensive kernels in MKL-DNN are JITed, so they are supposed to generate code according to the platform at runtime. For non-JIT code in MKL-DNN, same as other code in MXNet, the compiler will generate instructions according to its options/flags. We can set -DARCH_OPT_FLAGS when building MKL-DNN to avoid optimizing for the compiling machine. That's exactly what we are doing for the MKL-DNN build in MXNet. Even without MKL-DNN, I noticed there were issues about illegal instructions in MXNet when users import the pip package on a lower-end machine which probably only supports SSE.
>>>>
>>>> @Naveen,
>>>>
>>>> The LSTM issue has already been identified as a regression from the recent version of MKL-DNN. Hopefully it will be fixed soon with a new update of MKL-DNN.
>>>>
>>>> MXNet has many submodule dependencies under the 3rd party folder. It seems we don't require release versions for most of these dependencies. The release cycles of MKL-DNN and MXNet do not match very well. I think it would be a risk for an MXNet release if it had a hard dependency on the release of a submodule, not to mention the releases of all submodules.
>>>>
>>>> -tao
>>>>
>>>> -----Original Message-----
>>>> From: Naveen Swamy [mailto:mnnav...@gmail.com]
>>>> Sent: Thursday, November 22, 2018 9:08 AM
>>>> To: dev@mxnet.incubator.apache.org
>>>> Cc: d...@mxnet.apache.org
>>>> Subject: Re: Include MKLDNN into default mxnet pip package
>>>>
>>>> Hi Alex,
>>>>
>>>> Thanks for promptly running the numbers on AMD and reporting here.
>>>>
>>>> Can you please update the AMD numbers here for posterity?
>>>> https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking
>>>>
>>>> Are there any outstanding issues when MKLDNN is enabled? From my offline conversation I am vaguely aware of performance issues with LSTM; is there a GitHub issue for it?
>>>>
>>>> MKLDNN is a submodule dependency. Are we pulling the latest commit or releases? If not releases, we should move to releases before we make it a default. Ideally we should use platform-specific distributions (-dev packages); at least we should rely on well tested releases.
>>>>
>>>> Thanks, Naveen
>>>>
>>>> On Wed, Nov 21, 2018 at 4:55 PM Zai, Alexander <alex...@amazon.com.invalid> wrote:
>>>>
>>>>> AMD benchmarks have been published. We are seeing a 15.8x speedup with Resnet50 (batch size 32) on AWS's new m5a.24xlarge machine. With a smaller network (Mobilenet, batch size 32) the speedup is more significant at 38.7x. Let's have a vote to see if the PR to have MKLDNN enabled by default (https://github.com/apache/incubator-mxnet/pull/12591) can be merged before the 1.4.0 release.
>>>>>
>>>>> On 10/19/18, 9:17 AM, "Pedro Larroy" <pedro.larroy.li...@gmail.com> wrote:
>>>>>
>>>>> I did pip install mxnet-mkl==1.3.1b20181018 on an AMD Ryzen 1950X and unit tests are passing.
>>>>>
>>>>> Is this build using AVX512? In /proc/cpuinfo I see only the "avx" flag. There's no "avx2" like on recent Intel CPUs.
>>>>>
>>>>> Pedro.
>>>>>
>>>>> On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko <lupe...@gmail.com> wrote:
>>>>>
>>>>>> Awesome collaborative effort across many contributors and companies!
>>>>>>
>>>>>> The boost is impressive, and for MXNet users to get this boost "out of the box" is a great benefit and makes MXNet an even better choice.
>>>>>>
>>>>>> Alex - can you clarify whether there are any downsides with regards to non-AVX-512 architectures, AMD CPUs, etc.? Will it gracefully fall back?
>>>>>>
>>>>>> Hagay
>>>>>>
>>>>>> On Fri, Oct 19, 2018, 15:46 Sergio Fernández <wik...@apache.org> wrote:
>>>>>>
>>>>>>> If there is no downside on platforms not supporting AVX512 instructions, then +1
>>>>>>>
>>>>>>> On Wed, Oct 17, 2018, 14:10 Alex Zai <aza...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hey all,
>>>>>>>> We have been working hard these past few months to integrate and stabilize Intel's MKLDNN deep learning CPU accelerator into MXNet and have made incredible progress. On CPUs with AVX512 instructions (such as c5.18x) we have seen performance increase up to 12x, and on other platforms (Macs, AVX2) we have seen a speedup of 1.5x or more. The full list of benchmarks can be found here (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764 and https://github.com/apache/incubator-mxnet/pull/12591).
>>>>>>>>
>>>>>>>> Currently, using this accelerator requires the developer to either pip install the mxnet-mkl version of mxnet or to build it themselves from source. Given that we should try to provide the best performance "out of the box" with mxnet, we should include this in the default build. The mkldnn library is included within the pip package build, so it does not require an external dependency.
>>>>>>>>
>>>>>>>> There were concerns that MKLDNN could cause regressions on certain platforms (as it did with the tensorflow version a while back), but we added an env flag (MXNET_MKLDNN_ENABLED) that allows users to turn off this feature during runtime. Please bring up any other concerns you may have and your thoughts on including this accelerator in the default build.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Alex
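
For anyone following the thread who wants to try the MXNET_MKLDNN_ENABLED flag from Alex's original proposal, here is a rough sketch of building an environment with the flag switched off for a child process. The helper name env_with_mkldnn is hypothetical, and the "1"/"0" values are an assumption about how the flag is read, so please check the MXNet environment variable docs before relying on it.

```python
import os


def env_with_mkldnn(enabled, base=None):
    """Return a copy of an environment mapping with MXNET_MKLDNN_ENABLED set.

    Hypothetical helper for illustration only.
    Assumption: the flag is read as "1" (on) or "0" (off) when MXNet starts.
    """
    # Copy so that neither os.environ nor the caller's mapping is modified.
    env = dict(os.environ if base is None else base)
    env["MXNET_MKLDNN_ENABLED"] = "1" if enabled else "0"
    return env


# Example: environment for a child process with MKL-DNN switched off.
child_env = env_with_mkldnn(False, base={"PATH": "/usr/bin"})
print(child_env["MXNET_MKLDNN_ENABLED"])
```

The resulting mapping can be passed as the env argument of subprocess.run when launching a training or inference script. Setting the variable in the shell before starting Python should have the same effect, assuming the flag is read when MXNet starts.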