Re: Include MKLDNN into default mxnet pip package

Anirudh Subramanian Tue, 27 Nov 2018 16:26:54 -0800

Hi Tao,

I was suggesting we can start using a release tag from mkldnn for major and
minor releases of mxnet starting with 1.4.0. But this would require a
versioning mechanism similar to semver for MKLDNN and  MKLDNN to do patch
release to backport the bug fixes/regressions. I dont know if this is going
to happen anytime soon (It would be nice if you can obtain some timeline
from MKLDNN team on this). As long as the PIP still has two different
packages for mkl and without mkl my vote is +1 for adding it as a default.


Anirudh


On Tue, Nov 27, 2018 at 5:04 AM Lv, Tao A <tao.a...@intel.com> wrote:

> Hi Anirudh,
>
> Just to confirm, you're focusing on the 1.4.0 release of MXNet and want to
> have a release version of MKL-DNN there, right? Or do you mean all the
> development in the future should base on the release version of MKL-DNN?
> For the former one, I think 0.17 release of MKL-DNN is a good choice. But
> it will not have fix for the LSTM regression mentioned in previous email.
>
> I'm talking about the versioning mechanism with MKL-DNN maintainers and
> will be back to you if I get any response. But from the releasing history
> of MKL-DNN, I cannot find any evidence about patch release.
>
> -tao
>
> -----Original Message-----
> From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> Sent: Tuesday, November 27, 2018 6:16 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Include MKLDNN into default mxnet pip package
>
> Hi Tao,
>
> I agree with Steffen that we can start with a stable release for MKLDNN
> for 1.4.0. For your suggestion on using 0.17, can you provide info on what
> versioning mechanism MKLDNN uses. Once a MKLDNN release is out and there
> are some regressions found like the LSTM regression, would it be possible
> to do a patch release for it or maintain a release branch for it ?
>
> Anirudh
>
> On Sun, Nov 25, 2018 at 5:03 PM Lv, Tao A <tao.a...@intel.com> wrote:
>
> > Hi Steffen,
> >
> > I think all the commits on MKL-DNN master branch are well tested for
> > MKL-DNN development team. If we really want to have a release commit
> > in the coming 1.4 mxnet release, my suggestion is 0.17 MKL-DNN release.
> >
> > Thank you,
> > Tao
> >
> > Sent from my iPhone
> >
> > > On Nov 26, 2018, at 8:09 AM, Steffen Rochel
> > > <steffenroc...@gmail.com>
> > wrote:
> > >
> > > +1 to make MKL-DNN default.
> > > I'm tracking  https://github.com/apache/incubator-mxnet/issues/13369
> > > as open issue to be addressed for 1.4.0 I do agree that we should
> > > move to a model to include released
> > dependencies
> > > instead of just taking bleeding edge snapshots.
> > > However, speed of development is important as well.
> > > As a compromise for 1.4.0 release with MKL-DNN: can the MKL-DNN
> > development
> > > team provide us with a well tested tag/commit id to include in 1.4.0
> > > release?
> > > Steffen
> > >
> > >> On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A <tao.a...@intel.com>
> wrote:
> > >>
> > >> Thanks for the information, Kellen and Naveen.
> > >>
> > >> Better than onnx-tensorrt, MKL-DNN has already provided versioning
> > >> and release tags. My concern is that as MKL-DNN is still under
> > >> intensive development, if it has a new feature or bug fix on its
> > >> master branch,
> > do we
> > >> really want to wait for next release to get it supported in MXNet?
> > >>
> > >> Take the LSTM regression as an example, probably MKL-DNN will give
> > >> a fix or improvement on its master branch soon, do we need to wait
> > >> for 0.18 release to get it fixed for mxnet user? AFAIK, tensorflow
> > >> is also using normal commit id, not release, as the dependency for
> MKL-DNN.
> > >>
> > >> Regarding the LSTM regression, we are using internal JIRA tickets
> > >> rather than github issues to track the defects of MKL-DNN. But I
> > >> agree with
> > you,
> > >> we need update the progress of it in Alex's issue.
> > >>
> > >> Thanks,
> > >> -tao
> > >>
> > >> -----Original Message-----
> > >> From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
> > >> Sent: Thursday, November 22, 2018 10:55 AM
> > >> To: dev@mxnet.incubator.apache.org
> > >> Subject: Re: Include MKLDNN into default mxnet pip package
> > >>
> > >> Agree with your point about other repos also not being based on
> > versioning
> > >> Tao.  I would point out that I've given some that I've worked with
> > similar
> > >> feedback: https://github.com/onnx/onnx-tensorrt/issues/68
> > >>
> > >>> On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy <mnnav...@gmail.com>
> > wrote:
> > >>>
> > >>> Tao,
> > >>>
> > >>> You are right there are many submodules in 3rd party. We have to
> > >>> start somewhere and I believe this one is a good candidate to start
> with.
> > >>> This is not to cater to release of MXNet or to tie them with the
> > >>> releases of the submodules but instead to pick only stable
> > >>> releases and not to pick up bleeding edge commits from the tip of
> > >>> the master, this gives us confidence in the submodule that MXNet
> > >>> users are depending on that especially if we make MKLDNN the default.
> > >>>
> > >>> Good to know it is known already as a regression.Alex has created
> > >>> this issue https://github.com/apache/incubator-mxnet/issues/13369,
> > >>> please add details and link the corresponding issue in MKLDNN(I
> > >>> couldn't
> > find).
> > >>>
> > >>> -Naveen
> > >>>
> > >>>> On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A <tao.a...@intel.com>
> wrote:
> > >>>>
> > >>>> Here are my answers for the questions from Kellen and Naveen
> > >>>> about MKL-DNN. It doesn't mean that I'm supportive for making
> > >>>> MKL-DNN default here.
> > >>>>
> > >>>> @Kellen,
> > >>>>
> > >>>> FYI, here is a list for those platforms which are officially
> > >>>> supported by MKL-DNN.
> > >>>> https://github.com/intel/mkl-dnn#system-requirements
> > >>>>
> > >>>> Most of computation intensive kernels in MKL-DNN are JITed. So
> > >>>> they are supposed to generate code according to the platform
> > >>>> during runtime. For non-JIT code in MKL-DNN, same as other code
> > >>>> in MXNet, it will generate instructions according to the
> > >>>> options/flags of compiler. We can set -DARCH_OPT_FLAGS when build
> > >>>> MKL-DNN to avoid optimization for compiling machine. That's
> > >>>> exactly what we are doing
> > >> for MKL-DNN build in MXNet.
> > >>> Even
> > >>>> without MKL-DNN, I noticed there were issues about illegal
> > >>>> instructions
> > >>> of
> > >>>> MXNet when users import the pip package on a lower end machine
> > >>>> which probably only supports SSE.
> > >>>>
> > >>>> @Naveen,
> > >>>>
> > >>>> The LSTM issue has already been identified as a regression from
> > >>>> the
> > >>> recent
> > >>>> version of MKL-DNN. Hopefully it will be fixed soon with a new
> > >>>> update of MKL-DNN.
> > >>>>
> > >>>> MXNet has many submodule dependencies under the 3rd party folder.
> > >>>> Seems
> > >>> we
> > >>>> don't require release versions for most of these dependencies.
> > >>>> The
> > >>> release
> > >>>> period of MKL-DNN and MXNet are not matched very well. I think it
> > >>>> would
> > >>> be
> > >>>> a risk for MXNet release if it hardly depends on the release of a
> > >>>> submodule, no need to say depends on the releases of all submodules.
> > >>>>
> > >>>> -tao
> > >>>>
> > >>>> -----Original Message-----
> > >>>> From: Naveen Swamy [mailto:mnnav...@gmail.com]
> > >>>> Sent: Thursday, November 22, 2018 9:08 AM
> > >>>> To: dev@mxnet.incubator.apache.org
> > >>>> Cc: d...@mxnet.apache.org
> > >>>> Subject: Re: Include MKLDNN into default mxnet pip package
> > >>>>
> > >>>> Hi Alex,
> > >>>>
> > >>>> Thanks for promptly running the numbers on AMD and reporting here.
> > >>>>
> > >>>> Can you please update the AMD numbers here for posterity
> > >>>>
> > >>> https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel
> > >>> +MKL
> > >>> -DNN+-+Performance+Benchmarking
> > >>>> ?
> > >>>>
> > >>>> are there any outstanding issues when MKLDNN is enabled? from my
> > >>>> offline conversation I am briefly aware performance issues with
> > >>>> LSTM, is there an GitHub issue for it?
> > >>>>
> > >>>> MKLDNN is a submodule dependency, are we pulling the latest
> > >>>> commit or releases  ? If not we should move to releases before we
> > >>>> make it a
> > >>> default.
> > >>>> Ideally we should use platform specific distributions (-dev
> > >>>> packages) at least we should rely on well tested releases.
> > >>>>
> > >>>>
> > >>>> Thanks, Naveen
> > >>>>
> > >>>> On Wed, Nov 21, 2018 at 4:55 PM Zai, Alexander
> > >>> <alex...@amazon.com.invalid
> > >>>>>
> > >>>> wrote:
> > >>>>
> > >>>>> AMD benchmarks have been published. We are seeing a x15.8
> > >>>>> speedup with
> > >>>>> Resnet50 (batch size 32) on AWS's new m5a.24xlarge machine. With
> > >>>>> a smaller network (Mobilenet - batch size 32) the speedup is
> > >>>>> more significant at x38.7. Let's have a vote to see if the PR to
> > >>>>> have MKLDNN enabled by default
> > >>>>> (https://github.com/apache/incubator-mxnet/pull/12591) can be
> > >>>>> merged before 1.4.0 release.
> > >>>>>
> > >>>>> On 10/19/18, 9:17 AM, "Pedro Larroy"
> > >>>>> <pedro.larroy.li...@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>    I did  pip install mxnet-mkl==1.3.1b20181018 on an AMD Ryzen
> > >>>>> 1950X and unit
> > >>>>>    tests are passing.
> > >>>>>
> > >>>>>    Is this build using AVX512?  in /proc/cpuinfo I see only "avx"
> > >>> flag.
> > >>>>>    There's no "avx2" like on recent intel cpus.
> > >>>>>
> > >>>>>    Pedro.
> > >>>>>
> > >>>>>    On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko
> > >>>>> <lupe...@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Awesome collaborative effort across many contributors and
> > >>>> companies!
> > >>>>>>
> > >>>>>> The boost is impressive and for MXNet users to get this
> > >>>>> boost "out of the
> > >>>>>> box" is a great benefit and makes MXNet an even better choice.
> > >>>>>>
> > >>>>>> Alex - can you clarify whether there are any down sides with
> > >>>>> regards to
> > >>>>>> noon AVX-512 architectures, AMD CPUs, etc? Will it
> > >>>>> gracefully fallback?
> > >>>>>>
> > >>>>>> Hagay
> > >>>>>>
> > >>>>>>
> > >>>>>> On Fri, Oct 19, 2018, 15:46 Sergio Fernández
> > >>>>> <wik...@apache.org>
> > >>>>> wrote:
> > >>>>>>
> > >>>>>>> If there is no downside on platforms not supporting AVX512
> > >>>>> instructions,
> > >>>>>>> then +1
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Wed, Oct 17, 2018, 14:10 Alex Zai <aza...@gmail.com>
> > >> wrote:
> > >>>>>>>
> > >>>>>>>> Hey all,
> > >>>>>>>> We have been working hard these past few months to
> > >>>>> integrate
> > >>>> and
> > >>>>>>> stabilize
> > >>>>>>>> Intel’s MKLDNN deep learning CPU accelerator into Mxnet
> > >>>>> and have made
> > >>>>>>>> incredible progress. On CPUs with AVX512 instructions
> > >>>>> (such as
> > >>>>> c5.18x)
> > >>>>>> we
> > >>>>>>>> have seen performance increase up to 12x and on other
> > >>>>> platforms (Macs,
> > >>>>>>>> AVX2) we seen a speedup of 1.5+. Full list of benchmarks
> > >>>>> can be found
> > >>>>>>> here
> > >>>>>>>> (
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=9
> > >>> 5650
> > >>> 764
> > >>>>>>>> and https://github.com/apache/incubator-mxnet/pull/12591
> > >> ).
> > >>>>>>>>
> > >>>>>>>> Currently, using this accelerator requires the developer
> > >>>>> to either pip
> > >>>>>>>> install the mxnet-mkl version of mxnet or to build it
> > >>>>> themselves from
> > >>>>>>>> source. Given that we should try to provide the best
> > >>>>> performance "out
> > >>>>>> of
> > >>>>>>>> the box” with mxnet we should include this in the
> > >>>>> default
> > >>>> build.
> > >>>>> The
> > >>>>>>> mkldnn
> > >>>>>>>> library is included with in the pip package build so it
> > >>>>> does
> > >>>> not
> > >>>>>> require
> > >>>>>>> an
> > >>>>>>>> external dependency.
> > >>>>>>>>
> > >>>>>>>> There were concerns that MKLDNN could cause regressions
> > >>>>> on certain
> > >>>>>>>> platforms (as it did with the tensorflow version a while
> > >>>>> back); but we
> > >>>>>>>> added a env flag (MXNET_MKLDNN_ENABLED) that allows
> > >>>>> users to turn of
> > >>>>>> this
> > >>>>>>>> feature during runtime. Please bring up any other
> > >>>>> concerns you may have
> > >>>>>>> and
> > >>>>>>>> your thoughts on including this accelerator in the
> > >>>>> default
> > >>>> build.
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>> Alex
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
>

Re: Include MKLDNN into default mxnet pip package

Reply via email to