Hi Tao,

I agree with Steffen that we can start with a stable MKLDNN release for 1.4.0. Regarding your suggestion to use 0.17, can you provide some info on the versioning mechanism MKLDNN uses? Once an MKLDNN release is out and a regression is found (like the LSTM regression), would it be possible to do a patch release for it, or to maintain a release branch for it?
Anirudh

On Sun, Nov 25, 2018 at 5:03 PM Lv, Tao A <tao.a...@intel.com> wrote:

> Hi Steffen,
>
> I think all the commits on the MKL-DNN master branch are well tested by
> the MKL-DNN development team. If we really want a release commit in the
> coming MXNet 1.4 release, my suggestion is the MKL-DNN 0.17 release.
>
> Thank you,
> Tao
>
> Sent from my iPhone
>
>> On Nov 26, 2018, at 8:09 AM, Steffen Rochel <steffenroc...@gmail.com> wrote:
>>
>> +1 to make MKL-DNN default.
>> I'm tracking https://github.com/apache/incubator-mxnet/issues/13369 as an
>> open issue to be addressed for 1.4.0.
>> I do agree that we should move to a model of including released
>> dependencies instead of just taking bleeding-edge snapshots.
>> However, speed of development is important as well.
>> As a compromise for the 1.4.0 release with MKL-DNN: can the MKL-DNN
>> development team provide us with a well-tested tag/commit id to include
>> in the 1.4.0 release?
>> Steffen
>>
>>> On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A <tao.a...@intel.com> wrote:
>>>
>>> Thanks for the information, Kellen and Naveen.
>>>
>>> Unlike onnx-tensorrt, MKL-DNN already provides versioning and release
>>> tags. My concern is that MKL-DNN is still under intensive development:
>>> if a new feature or bug fix lands on its master branch, do we really
>>> want to wait for the next release to get it supported in MXNet?
>>>
>>> Take the LSTM regression as an example. MKL-DNN will probably deliver a
>>> fix or improvement on its master branch soon; do we need to wait for
>>> the 0.18 release to get it fixed for MXNet users? AFAIK, TensorFlow
>>> also pins a plain commit id, not a release, as its MKL-DNN dependency.
>>>
>>> Regarding the LSTM regression, we are using internal JIRA tickets
>>> rather than GitHub issues to track MKL-DNN defects. But I agree with
>>> you, we need to update its progress in Alex's issue.
>>>
>>> Thanks,
>>> -tao
>>>
>>> -----Original Message-----
>>> From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
>>> Sent: Thursday, November 22, 2018 10:55 AM
>>> To: dev@mxnet.incubator.apache.org
>>> Subject: Re: Include MKLDNN into default mxnet pip package
>>>
>>> Agree with your point about other repos also not being based on
>>> versioning, Tao. I would point out that I've given some repos I've
>>> worked with similar feedback:
>>> https://github.com/onnx/onnx-tensorrt/issues/68
>>>
>>>> On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy <mnnav...@gmail.com> wrote:
>>>>
>>>> Tao,
>>>>
>>>> You are right that there are many submodules in 3rdparty. We have to
>>>> start somewhere, and I believe this one is a good candidate to start
>>>> with. This is not about tying MXNet releases to the releases of the
>>>> submodules; it is about picking only stable releases instead of
>>>> bleeding-edge commits from the tip of master. That gives us confidence
>>>> in a submodule that MXNet users depend on, especially if we make
>>>> MKLDNN the default.
>>>>
>>>> Good to know it is already identified as a regression. Alex has
>>>> created https://github.com/apache/incubator-mxnet/issues/13369;
>>>> please add details there and link the corresponding MKLDNN issue
>>>> (I couldn't find one).
>>>>
>>>> -Naveen
>>>>
>>>>> On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A <tao.a...@intel.com> wrote:
>>>>>
>>>>> Here are my answers to the questions from Kellen and Naveen about
>>>>> MKL-DNN. They don't mean that I'm supportive of making MKL-DNN the
>>>>> default here.
>>>>>
>>>>> @Kellen,
>>>>>
>>>>> FYI, here is the list of platforms officially supported by MKL-DNN:
>>>>> https://github.com/intel/mkl-dnn#system-requirements
>>>>>
>>>>> Most of the computation-intensive kernels in MKL-DNN are JITed, so
>>>>> they generate code for the actual platform at runtime. Non-JIT code
>>>>> in MKL-DNN, like the rest of the MXNet code base, is compiled
>>>>> according to the compiler options/flags. We can set -DARCH_OPT_FLAGS
>>>>> when building MKL-DNN to avoid optimizing for the build machine,
>>>>> which is exactly what we do for the MKL-DNN build in MXNet. Even
>>>>> without MKL-DNN, I have noticed issues with illegal instructions
>>>>> when users import the pip package on lower-end machines that
>>>>> probably only support SSE.
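(For reference, a minimal sketch of the build knob Tao mentions above, not
the exact MXNet build invocation: ARCH_OPT_FLAGS is the CMake option from
MKL-DNN 0.x, and the submodule path 3rdparty/mkldnn is assumed from the
MXNet source tree.)

    # Build the MKL-DNN submodule with host-specific tuning disabled, so
    # the non-JIT code does not assume the build machine's instruction set:
    cd 3rdparty/mkldnn
    mkdir -p build && cd build
    cmake -DARCH_OPT_FLAGS="" ..
    make -j"$(nproc)"

Leaving ARCH_OPT_FLAGS empty keeps the JITed kernels (which dispatch at
runtime) while preventing the compiler from baking, say, AVX2 instructions
into the non-JIT paths.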
>>>>> @Naveen,
>>>>>
>>>>> The LSTM issue has already been identified as a regression from the
>>>>> recent version of MKL-DNN. Hopefully it will be fixed soon in a new
>>>>> MKL-DNN update.
>>>>>
>>>>> MXNet has many submodule dependencies under the 3rdparty folder, and
>>>>> it seems we don't require release versions for most of them. The
>>>>> release cycles of MKL-DNN and MXNet are not well aligned, so I think
>>>>> it would be a risk for an MXNet release to depend strictly on the
>>>>> release of one submodule, let alone on the releases of all
>>>>> submodules.
>>>>>
>>>>> -tao
>>>>>
>>>>> -----Original Message-----
>>>>> From: Naveen Swamy [mailto:mnnav...@gmail.com]
>>>>> Sent: Thursday, November 22, 2018 9:08 AM
>>>>> To: dev@mxnet.incubator.apache.org
>>>>> Cc: d...@mxnet.apache.org
>>>>> Subject: Re: Include MKLDNN into default mxnet pip package
>>>>>
>>>>> Hi Alex,
>>>>>
>>>>> Thanks for promptly running the numbers on AMD and reporting here.
>>>>>
>>>>> Can you please update the AMD numbers here for posterity:
>>>>> https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking
>>>>>
>>>>> Are there any outstanding issues when MKLDNN is enabled? From my
>>>>> offline conversations I am briefly aware of performance issues with
>>>>> LSTM; is there a GitHub issue for it?
>>>>>
>>>>> MKLDNN is a submodule dependency; are we pulling the latest commit or
>>>>> a release? If not, we should move to releases before we make it the
>>>>> default. Ideally we should use platform-specific distributions (-dev
>>>>> packages); at the least we should rely on well-tested releases.
>>>>>
>>>>> Thanks, Naveen
>>>>>
>>>>> On Wed, Nov 21, 2018 at 4:55 PM Zai, Alexander
>>>>> <alex...@amazon.com.invalid> wrote:
>>>>>
>>>>>> AMD benchmarks have been published. We are seeing a 15.8x speedup
>>>>>> with Resnet50 (batch size 32) on AWS's new m5a.24xlarge machine.
>>>>>> With a smaller network (Mobilenet, batch size 32) the speedup is
>>>>>> more significant at 38.7x. Let's have a vote to see if the PR to
>>>>>> have MKLDNN enabled by default
>>>>>> (https://github.com/apache/incubator-mxnet/pull/12591) can be
>>>>>> merged before the 1.4.0 release.
>>>>>>
>>>>>> On 10/19/18, 9:17 AM, "Pedro Larroy" <pedro.larroy.li...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> I did pip install mxnet-mkl==1.3.1b20181018 on an AMD Ryzen 1950X
>>>>>> and the unit tests are passing.
>>>>>>
>>>>>> Is this build using AVX512? In /proc/cpuinfo I see only the "avx"
>>>>>> flag; there's no "avx2" like on recent Intel CPUs.
>>>>>>
>>>>>> Pedro.
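(A quick way to reproduce Pedro's check on any Linux box, using plain
coreutils; nothing here is MXNet-specific, and the flags printed will
differ per CPU.)

    # List the AVX-family flags the kernel reports for this CPU. An empty
    # result means an SSE-only machine, which is where the
    # illegal-instruction reports mentioned above come from.
    grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep '^avx' | sort -u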
>>>>>> On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko <lupe...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Awesome collaborative effort across many contributors and
>>>>>>> companies!
>>>>>>>
>>>>>>> The boost is impressive, and for MXNet users to get this boost
>>>>>>> "out of the box" is a great benefit and makes MXNet an even better
>>>>>>> choice.
>>>>>>>
>>>>>>> Alex - can you clarify whether there are any downsides with regard
>>>>>>> to non-AVX-512 architectures, AMD CPUs, etc.? Will it gracefully
>>>>>>> fall back?
>>>>>>>
>>>>>>> Hagay
>>>>>>>
>>>>>>> On Fri, Oct 19, 2018, 15:46 Sergio Fernández <wik...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> If there is no downside on platforms not supporting AVX512
>>>>>>>> instructions, then +1
>>>>>>>>
>>>>>>>> On Wed, Oct 17, 2018, 14:10 Alex Zai <aza...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hey all,
>>>>>>>>> We have been working hard these past few months to integrate and
>>>>>>>>> stabilize Intel's MKLDNN deep learning CPU accelerator into MXNet
>>>>>>>>> and have made incredible progress. On CPUs with AVX512
>>>>>>>>> instructions (such as c5.18x) we have seen performance increases
>>>>>>>>> of up to 12x, and on other platforms (Macs, AVX2) we have seen a
>>>>>>>>> speedup of 1.5x+. The full list of benchmarks can be found here:
>>>>>>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650764
>>>>>>>>> and https://github.com/apache/incubator-mxnet/pull/12591
>>>>>>>>>
>>>>>>>>> Currently, using this accelerator requires the developer either
>>>>>>>>> to pip install the mxnet-mkl version of MXNet or to build it from
>>>>>>>>> source. Given that we should try to provide the best performance
>>>>>>>>> "out of the box" with MXNet, we should include this in the
>>>>>>>>> default build. The mkldnn library is included within the pip
>>>>>>>>> package build, so it does not add an external dependency.
>>>>>>>>>
>>>>>>>>> There were concerns that MKLDNN could cause regressions on
>>>>>>>>> certain platforms (as it did with the TensorFlow version a while
>>>>>>>>> back), but we added an env flag (MXNET_MKLDNN_ENABLED) that
>>>>>>>>> allows users to turn off this feature at runtime. Please bring up
>>>>>>>>> any other concerns you may have, along with your thoughts on
>>>>>>>>> including this accelerator in the default build.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Alex
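(For anyone who wants to try the fallback path Alex describes, a minimal
sketch; the flag name is as given above, but its exact semantics and
default may differ across MXNet versions.)

    # Install the MKL-DNN-enabled build, then run once with the MKL-DNN
    # execution path switched off via the runtime environment flag:
    pip install mxnet-mkl
    MXNET_MKLDNN_ENABLED=0 python -c "import mxnet as mx; print(mx.nd.ones((2, 2)) * 2)"

Leaving the flag unset keeps MKL-DNN operators active, so the same binary
can be A/B-tested against the regressions discussed in this thread.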