Regarding the cases listed by Marco:

- AMD CPU
From my architecture knowledge, what works on C4 instances (with AVX2
support) should also work well on m5a, right? I think the mxnet-mkl and
mxnet-cuxxmkl packages have been fully validated on AVX2 machines. Also, we
didn't perform any validation on AMD CPUs before; why do we need to do that
this time?
- ARM CPU
I don't know that we're releasing any convenience binaries for ARM CPUs.
This proposal mainly targets the PyPI packages.

- Windows
Already validated by CI. We're also releasing mxnet-mkl packages for
Windows.

- GPU and MKLDNN enabled
Already validated by CI, and mxnet-cuxxmkl packages have been released for
several versions.

- Fully reproducible results (medical and financial sector requested that
and we have some flags for cuda)
Not sure I understand this case. We have had the MKL-DNN backend for a
while, and its functionality and correctness have been verified by MXNet
users.

-tao

On Tue, Nov 19, 2019 at 4:41 AM Marco de Abreu <marco.g.ab...@gmail.com>
wrote:

> Sorry, my intent with the "non-standard" phrase was not about general
> MXNet but rather from MKLDNN's point of view: considering that it's being
> developed by Intel, I assumed that MKLDNN might consider non-Intel
> use-cases non-standard.
>
> -Marco
>
> Skalicky, Sam <sska...@amazon.com.invalid> schrieb am Mo., 18. Nov. 2019,
> 21:34:
>
> > Thanks Alfredo, if you can create a GitHub issue with notes/steps, we
> > can add this to the todo list for integrating with the MXNet CI to test
> > on m5a instances too. Then we can start tracking this on a regular
> > basis. It would be great to actually test on ARM instances now that AWS
> > has A1 instances too… I'll add it to the wish list ;-D
> >
> > Sam
> >
> > > On Nov 18, 2019, at 12:32 PM, Alfredo Luque
> > > <alfredo.lu...@airbnb.com.INVALID> wrote:
> > >
> > > Happy to run some benchmarks on an AWS m5a instance (Epyc) and a
> > > first-generation AMD Threadripper if someone has something easy to
> > > run and representative.
> > >
> > > On November 18, 2019 at 12:29:31 PM, Skalicky, Sam (
> > > sska...@amazon.com.invalid) wrote:
> > >
> > > That's a good idea Alfredo, are you able to help test on AMD CPUs? Or
> > > is there someone else in the mxnet dev@ community who can help?
> > >
> > > Sam
> > >
> > >> On Nov 18, 2019, at 12:27 PM, Alfredo Luque
> > >> <alfredo.lu...@airbnb.com.INVALID> wrote:
> > >>
> > >> Verifying that there isn't a slowdown on AMD CPUs (e.g. Ryzen / Epyc)
> > >> would definitely make sense as a requirement. It seems odd to
> > >> classify that as a "non-standard" use case.
> > >>
> > >> On November 18, 2019 at 12:20:33 PM, Skalicky, Sam (
> > >> sska...@amazon.com.invalid) wrote:
> > >>
> > >> Thanks Patric & team for your work over the years to make MXNet fast
> > >> with MKLDNN!
> > >>
> > >> I think it would be great to make MKLDNN enabled by default. We will
> > >> need to continue producing variants without MKLDNN for those who
> > >> don't want it (Marco enumerated some use cases). How do you propose
> > >> to identify the pip wheels with/without MKLDNN? Previously we had
> > >> mxnet-mkl and mxnet-cu101mkl with MKLDNN. If the plain "mxnet" pip
> > >> wheel now contains MKLDNN, what do you propose we call the build
> > >> without MKLDNN? mxnet-nomkl?
> > >>
> > >> Thanks!
> > >> Sam
> > >>
> > >>> On Nov 18, 2019, at 11:08 AM, Marco de Abreu
> > >>> <marco.g.ab...@gmail.com> wrote:
> > >>>
> > >>> Hi Patric,
> > >>>
> > >>> First of all, thanks a lot to you and your team for all the effort
> > >>> on MXNet and mkldnn!
> > >>>
> > >>> Generally I'm inclined towards your proposal, but I'm thinking about
> > >>> the non-standard use cases:
> > >>> - AMD CPU
> > >>> - ARM CPU
> > >>> - Windows
> > >>> - GPU and MKLDNN enabled
> > >>> - Fully reproducible results (the medical and financial sectors
> > >>> requested that, and we have some flags for cuda)
> > >>>
> > >>> Is mkldnn fully compatible with these use cases? If not, what would
> > >>> happen? If yes, do we have performance numbers?
> > >>>
> > >>> Best regards,
> > >>> Marco
> > >>>
> > >>> Zhao, Patric <patric.z...@intel.com> schrieb am Mo., 18. Nov. 2019,
> > >>> 14:00:
> > >>>
> > >>>> Hi MXNet community,
> > >>>>
> > >>>> Since the MKLDNN backend was first integrated in release 1.2, the
> > >>>> community has been continuously improving the quality and
> > >>>> performance of the MKLDNN CPU backend. Nowadays, the MKLDNN backend
> > >>>> is widely used for inference, especially INT8 inference, and we
> > >>>> have received a lot of very positive feedback from MXNet users.
> > >>>>
> > >>>> Milestones achieved so far:
> > >>>>
> > >>>> - MKLDNN integrated into Apache MXNet in release 1.2, Feb 2018 [1]
> > >>>> - MKLDNN made the default CPU backend when building from source,
> > >>>> Jan 2019 [2]
> > >>>> - MKLDNN subgraph optimization enabled by default for inference,
> > >>>> Jul 2019 [3]
> > >>>> - MKLDNN major version upgrade in release 1.6, Oct 2019 [4]
> > >>>>
> > >>>> To strengthen Apache MXNet's technical leadership in the industry,
> > >>>> I propose making MKLDNN the default CPU backend in all binary
> > >>>> distributions from the next release.
> > >>>> The new milestone includes:
> > >>>>
> > >>>> - Statically link the MKLDNN library into the binary, avoiding
> > >>>> version mismatches at runtime [5]
> > >>>> - Make MKLDNN the default in nightly builds from master before the
> > >>>> 1.7 release
> > >>>> - Ship binary distributions with MKLDNN by default from the 1.7
> > >>>> release
> > >>>>
> > >>>> What will change:
> > >>>>
> > >>>> - The mxnet and mxnet-cuXX binaries will be built with MKLDNN=1
> > >>>> - mxnet-mkl and mxnet-cuXXmkl will remain unchanged in the 1.x
> > >>>> minor releases and are planned for removal in the next major
> > >>>> release (2.0)
> > >>>>
> > >>>> Suggestions and comments are highly appreciated.
> > >>>>
> > >>>> Thanks,
> > >>>>
> > >>>> --Patric
> > >>>>
> > >>>>
> > >>>> [1] https://github.com/apache/incubator-mxnet/pull/9677
> > >>>> [2] https://lists.apache.org/thread.html/bfeae6ee46374112eb4dff1470c262959101e4bffb19930926963535@%3Cdev.mxnet.apache.org%3E
> > >>>> [3] https://github.com/apache/incubator-mxnet/pull/15518
> > >>>> [4] https://lists.apache.org/thread.html/f46ab920f18795496eafe713e6e9e561c684e06189085cec17b401dc@%3Cdev.mxnet.apache.org%3E
> > >>>> [5] https://github.com/apache/incubator-mxnet/pull/16731
> > >>
> > >> —
> > >> Alfredo Luque
> > >> Software Engineer
> > >> Machine Learning Infrastructure
> > >> Airbnb
> > >> San Francisco, CA
> > >
> > > —
> > > Alfredo Luque
> > > Software Engineer
> > > Machine Learning Infrastructure
> > > Airbnb
> > > San Francisco, CA
> >
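P.S. A quick way to check the two properties this thread keeps coming back
to, whether the host CPU offers AVX2 and whether an installed MXNet binary
was built with the MKLDNN backend, is sketched below. This is an
illustrative sketch, not part of the proposal: it assumes a Linux host for
the /proc/cpuinfo check, and the mxnet.runtime.Features API (available from
MXNet 1.5 onward, to my knowledge) for the build check.

```python
# Sketch: detect AVX2 on the host and, if MXNet is importable, whether the
# binary was built with MKLDNN. Linux-only for the cpuinfo check.

def host_has_avx2(cpuinfo_path="/proc/cpuinfo"):
    """Return True if any 'flags' line in /proc/cpuinfo lists avx2."""
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                # e.g. "flags : fpu vme ... avx2 ..."
                if line.startswith("flags") and "avx2" in line.split():
                    return True
    except OSError:
        pass  # not Linux, or cpuinfo unreadable
    return False

def mxnet_has_mkldnn():
    """Return True/False for the installed MXNet build, or None if MXNet
    is not importable. Assumes mxnet.runtime.Features exists (MXNet 1.5+)."""
    try:
        from mxnet.runtime import Features
    except ImportError:
        return None
    return Features().is_enabled("MKLDNN")

if __name__ == "__main__":
    print("AVX2 on host:", host_has_avx2())
    print("MKLDNN build:", mxnet_has_mkldnn())
```

On an m5a (EPYC) instance the AVX2 check should report True, which is the
point Tao makes above; on a machine without MXNet installed, the build
check simply returns None rather than failing.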