Thanks for all the great input.

Regarding AMD, here is some data in the wiki:
https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking


> -----Original Message-----
> From: Alfredo Luque <alfredo.lu...@airbnb.com.INVALID>
> Sent: Tuesday, November 19, 2019 10:40 AM
> To: Tao Lv <mutou...@gmail.com>; dev@mxnet.incubator.apache.org
> Subject: Re: Proposal to make MKLDNN as default CPU backend
> 
> For AMD CPUs, you’d want to perform validation because MKL-DNN would now
> be enabled by default. Historically, other Intel libraries (along with the
> ICC compiler) have had performance issues on AMD CPUs. It’s just worth
> double checking to make sure that’s not the case here. Perhaps some
> MKL-DNN authors can chime in though. It’s not sufficient to double check
> that an AVX2 package passes tests.
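As a quick first step when validating on AMD hardware, one can check which SIMD extensions the host CPU actually reports. A minimal Linux-only sketch (the helper name and the flag whitelist are mine, not from the thread):

```python
def cpu_simd_flags(path="/proc/cpuinfo"):
    """Return the SIMD-related flags the (Linux) kernel reports for the CPU."""
    flags = set()
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    flags.update(line.split(":", 1)[1].split())
                    break
    except OSError:
        pass  # not Linux, or /proc unavailable: return an empty set
    return flags & {"sse4_2", "avx", "avx2", "avx512f"}

# On a Zen/Epyc machine this would typically include "avx2" but not "avx512f".
print(cpu_simd_flags())
```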
> 
> Agreed in the case we’re not releasing ARM binaries.
> 
> The reproducibility argument is around the results being numerically
> reproducible. That is, e.g., if I train a model with some fixed set of
> data, some random seed, etc., and then run inference on it, do I get the
> exact same floating-point values for the weights and results? Does MXNet
> already offer this without MKL-DNN?
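The check being asked about can be sketched concretely. The toy "training" below stands in for a real MXNet run (which would additionally need `mx.random.seed(...)` and fixed data ordering); the point is that reproducibility here means bitwise-equal weights, not merely approximately equal ones:

```python
import numpy as np

def train_toy_model(seed):
    """Tiny stand-in for a training run: fixed data, fixed seed."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((64, 8)).astype(np.float32)
    w = rng.standard_normal((8, 1)).astype(np.float32)
    # one "gradient step" on a linear model
    y = x @ w
    grad = x.T @ y / len(x)
    return w - np.float32(0.01) * grad

a = train_toy_model(seed=42)
b = train_toy_model(seed=42)
# bitwise comparison of the raw buffers, not just np.allclose
print("bitwise identical:", a.tobytes() == b.tobytes())
```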
> 
> On November 18, 2019 at 6:32:07 PM, Tao Lv (mutou...@gmail.com) wrote:
> 
> Regarding the cases listed by Marco:
> - AMD CPU
> From my architecture knowledge, what works on C4 instances (with AVX2
> support) should also work well on m5a, right? I think the mxnet-mkl and
> mxnet-cuxxmkl packages have been fully validated on AVX2 machines.
> Also, we didn't perform any validation on AMD CPUs before; why do we need
> to do that this time?
> 
> - ARM CPU
> I don't think we're releasing any convenience binaries for ARM CPUs. This
> proposal mainly targets the PyPI packages.
> 
> - Windows
> Already validated by CI. We're also releasing mxnet-mkl packages for Win.
> 
> - GPU and MKLDNN enabled
> Already validated by CI and mxnet-cuxxmkl packages have been released for
> several versions.
> 
> - Fully reproducible results (medical and financial sector requested that
> and we have some flags for CUDA)
> Not sure I understand this case. We have had the MKL-DNN backend for a
> while now, and its functionality and correctness have been verified by
> MXNet users.
> 
> -tao
> 
> On Tue, Nov 19, 2019 at 4:41 AM Marco de Abreu
> <marco.g.ab...@gmail.com> wrote:
> 
> > Sorry, my intent with the "non-standard" phrase was not about general
> > MXNet but rather from MKLDNN's point of view; considering that it's
> > being developed by Intel, I assumed that MKLDNN might consider
> > non-Intel use cases non-standard.
> >
> > -Marco
> >
> > Skalicky, Sam <sska...@amazon.com.invalid> wrote on Mon., 18 Nov.
> > 2019, 21:34:
> >
> > > Thanks Alfredo, if you can create a GitHub issue with notes/steps, we
> > > can add this to the todo list for integrating with the MXNet CI to
> > > test on m5a instances too. Then we can start tracking this on a
> > > regular basis. It would be great to actually test on ARM instances
> > > now that AWS has A1 instances too… I'll add it to the wish list ;-D
> > >
> > > Sam
> > >
> > > > On Nov 18, 2019, at 12:32 PM, Alfredo Luque
> > > > <alfredo.lu...@airbnb.com.INVALID> wrote:
> > > >
> > > > Happy to run some benchmarks on an AWS m5a instance (Epyc) and a
> > > > first-generation AMD Threadripper if someone has something easy to
> > > > run and representative.
> > > >
> > > > On November 18, 2019 at 12:29:31 PM, Skalicky, Sam (
> > > > sska...@amazon.com.invalid) wrote:
> > > >
> > > > Thanks, good idea Alfredo. Are you able to help test on AMD CPUs?
> > > > Or is there someone else in the mxnet dev@ community who can help?
> > > >
> > > > Sam
> > > >
> > > >> On Nov 18, 2019, at 12:27 PM, Alfredo Luque
> > > >> <alfredo.lu...@airbnb.com.INVALID> wrote:
> > > >>
> > > >> Verifying that there isn't a slowdown on AMD CPUs (e.g., Ryzen /
> > > >> Epyc) would definitely make sense as a requirement. It seems odd
> > > >> to classify that as a "nonstandard" use case.
> > > >>
> > > >> On November 18, 2019 at 12:20:33 PM, Skalicky, Sam (
> > > >> sska...@amazon.com.invalid) wrote:
> > > >>
> > > >> Thanks Patric & team for your work over the years to make MXNet
> > > >> fast with MKLDNN!
> > > >>
> > > >> I think it would be great to make MKLDNN enabled by default. We
> > > >> will need to continue producing variants without MKLDNN for those
> > > >> who don't want it (Marco enumerated some use cases). How do you
> > > >> propose to identify the pip wheels with/without MKLDNN? Previously
> > > >> we had mxnet-mkl and mxnet-cu101mkl with MKLDNN. If the plain
> > > >> "mxnet" pip wheel now contains MKLDNN, what do you propose we call
> > > >> the build without MKLDNN? mxnet-nomkl?
> > > >>
> > > >> Thanks!
> > > >> Sam
> > > >>
> > > >>> On Nov 18, 2019, at 11:08 AM, Marco de Abreu
> > > >>> <marco.g.ab...@gmail.com> wrote:
> > > >>>
> > > >>> Hi Patric,
> > > >>>
> > > >>> First of all, thanks a lot to you and your team for all the
> > > >>> effort on MXNet and mkldnn!
> > > >>>
> > > >>> Generally I'm inclined towards your proposal, but I'm thinking
> > > >>> about the non-standard use cases:
> > > >>> - AMD CPU
> > > >>> - ARM CPU
> > > >>> - Windows
> > > >>> - GPU and MKLDNN enabled
> > > >>> - Fully reproducible results (the medical and financial sectors
> > > >>> requested that, and we have some flags for CUDA)
> > > >>>
> > > >>> Is mkldnn fully compatible with these use cases? If not, what
> > > >>> would happen? If yes, do we have performance numbers?
> > > >>>
> > > >>> Best regards,
> > > >>> Marco
> > > >>>
> > > >>> Zhao, Patric <patric.z...@intel.com> wrote on Mon., 18 Nov.
> > > >>> 2019, 14:00:
> > > >>>
> > > >>>> Hi MXNet community,
> > > >>>>
> > > >>>> Since the first MKLDNN backend was integrated in release 1.2,
> > > >>>> the community has been continuously improving the quality and
> > > >>>> performance of the MKLDNN CPU backend. Nowadays, the MKLDNN
> > > >>>> backend is widely used for inference, especially INT8 inference,
> > > >>>> and we have received lots of very positive feedback from MXNet
> > > >>>> users.
> > > >>>>
> > > >>>> Achieved milestones:
> > > >>>>
> > > >>>> - MKLDNN integrated into Apache MXNet in release 1.2, Feb 2018 [1]
> > > >>>> - MKLDNN backend as the default CPU backend when building from
> > > >>>> source, Jan 2019 [2]
> > > >>>> - MKLDNN subgraph optimization as the default for inference,
> > > >>>> Jul 2019 [3]
> > > >>>> - MKLDNN major version upgrade in release 1.6, Oct 2019 [4]
> > > >>>>
> > > >>>> To strengthen Apache MXNet's technical leadership in the
> > > >>>> industry, I propose to make MKLDNN the default CPU backend in
> > > >>>> all binary distributions from the next release.
> > > >>>> The new milestone includes:
> > > >>>>
> > > >>>> - Statically link the MKLDNN library into the binary, avoiding
> > > >>>> version mismatches at runtime [5]
> > > >>>> - Make the nightly build from master use MKLDNN by default
> > > >>>> before the 1.7 release
> > > >>>> - Binary distribution with MKLDNN default from the 1.7 release
> > > >>>>
> > > >>>> What will change:
> > > >>>>
> > > >>>> - The mxnet and mxnet-cuXX binaries will be built with MKLDNN=1
> > > >>>> - mxnet-mkl and mxnet-cuXXmkl will not change in the minor
> > > >>>> releases (1.x); we plan to remove them in the next major
> > > >>>> release (2.0)
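For users who need to tell the builds apart at runtime, MXNet (1.5 and later, if I recall the API correctly) exposes its compiled-in feature flags. A hedged sketch that degrades gracefully when mxnet is not installed:

```python
def mkldnn_enabled():
    """True/False if mxnet is importable, None otherwise."""
    try:
        from mxnet.runtime import Features  # available in MXNet >= 1.5
    except ImportError:
        return None  # mxnet not installed in this environment
    return bool(Features().is_enabled("MKLDNN"))

print(mkldnn_enabled())
```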
> > > >>>>
> > > >>>> Suggestions and comments are highly appreciated.
> > > >>>>
> > > >>>> Thanks,
> > > >>>>
> > > >>>> --Patric
> > > >>>>
> > > >>>>
> > > >>>> [1] https://github.com/apache/incubator-mxnet/pull/9677
> > > >>>> [2] https://lists.apache.org/thread.html/bfeae6ee46374112eb4dff1470c262959101e4bffb19930926963535@%3Cdev.mxnet.apache.org%3E
> > > >>>> [3] https://github.com/apache/incubator-mxnet/pull/15518
> > > >>>> [4] https://lists.apache.org/thread.html/f46ab920f18795496eafe713e6e9e561c684e06189085cec17b401dc@%3Cdev.mxnet.apache.org%3E
> > > >>>> [5] https://github.com/apache/incubator-mxnet/pull/16731
> > > >>>>
> > > >>
> > > >> —
> > > >> Alfredo Luque
> > > >> Software Engineer
> > > >> Machine Learning Infrastructure
> > > >> Airbnb
> > > >> San Francisco, CA
> > > >
> > >
> > >
> >
> 
