Also per Sam's suggestion, we could still release a build without MKLDNN (name it mxnet-nomkldnn?) and track the usage/download for one or two releases. If there is no usage, we could drop that build in the future.
Best, Lin On Tue, Nov 19, 2019 at 1:23 PM Lin Yuan <apefor...@gmail.com> wrote: > Just to summarize base on the concerns Marco raised and discussed abvove: > > - AMD CPU (it should work with MKLDNN: > https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL-DNN+-+Performance+Benchmarking > ) > - ARM CPU (we don't have it today w/o MKLDNN either) > - Windows (Windows support is there regardless of MKLDNN or not) > - GPU and MKLDNN enabled (already supported) > - Fully reproducible results (medical and financial sector requested that > and we have some flags for cuda) (The nondeterminism exists even today w/o > MKLDNN. We should address it regardless of MLKDNN) > > Marco, please let us know if your concerns are properly addressed? > > Given that MKLDNN gives significant performance speed up in CPU, I am > inclined to make it default in pip build. > > Best, > > Lin > > On Tue, Nov 19, 2019 at 8:08 AM Chris Olivier <cjolivie...@gmail.com> > wrote: > >> Thanks, Patric. I was just trying to point out that there was currently no >> guarantee of deterministic results without MKL, so there’s not necessarily >> an expectation of determinism with MKL (ie requirement isn’t relaxed). >> >> On Mon, Nov 18, 2019 at 9:38 PM Zhao, Patric <patric.z...@intel.com> >> wrote: >> >> > It may be a concern but little noise can't affect the final results if >> the >> > algorithm is stable in numerical. >> > The MKLDNN backend with mxnet-mkl has been used for 2 years and we >> didn't >> > see the coverage issue caused by multiple threading. >> > In other words, GPU programming mode works well on training where the >> > non-deterministic also exists from multiple threads. >> > >> > Parts of training accuracy was pasted in the first PR when MKLDNN is >> > integrated. >> > >> https://github.com/apache/incubator-mxnet/pull/8302#issuecomment-359674818 >> > >> > In conclusion, it may happen with very little probability. I believe we >> > can get a solution in case it happens someday. >> > >> > Thanks, >> > >> > --Patric >> > >> > >> > > -----Original Message----- >> > > From: Chris Olivier <cjolivie...@gmail.com> >> > > Sent: Tuesday, November 19, 2019 11:51 AM >> > > To: dev@mxnet.incubator.apache.org >> > > Cc: Tao Lv <mutou...@gmail.com> >> > > Subject: Re: Proposal to make MKLDNN as default CPU backend >> > > >> > > (for non mkl dropout, for instance) >> > > >> > > On Mon, Nov 18, 2019 at 7:50 PM Chris Olivier <cjolivie...@gmail.com> >> > > wrote: >> > > >> > > > To address the deterministic item, I know for a fact that training >> > > > will not be deterministic in some cases where the “parallel random” >> > > > class is utilized in parallel threads, such as OMP, if the number of >> > > > cores is different, even with the same seed, because threads are >> > > > seeded independently and different number of threads will end up >> > > > generating different random number sequences. Dropout operator being >> > > an example. >> > > > >> > > > On Mon, Nov 18, 2019 at 6:39 PM Alfredo Luque >> > > > <alfredo.lu...@airbnb.com.invalid> wrote: >> > > > >> > > >> For AMD CPUs, you’d want to perform validation because now MKL-DNN >> > > >> would be enabled by default. Historically, other intel libraries >> > > >> (along with the ICC >> > > >> compiler) have had performance issues on AMD CPUs. It’s just worth >> > > >> double checking to make sure that’s not the case here. Perhaps some >> > > >> MKL-DNN authors can chime in though. It’s not sufficient to double >> > > >> check that an >> > > >> AVX2 package passes tests. >> > > >> >> > > >> Agreed in the case we’re not releasing ARM binaries. >> > > >> >> > > >> The reproducibility argument is around the results being >> numerically >> > > >> reproducible. That is, eg; if I train a model with some fixed set >> of >> > > >> data, some random seed, etc. and then run inference on it do I get >> > > >> the exact same floating point values for the weights and results? >> > > >> Does MxNet already offer this without MKL-DNN? >> > > >> >> > > >> On November 18, 2019 at 6:32:07 PM, Tao Lv (mutou...@gmail.com) >> > > wrote: >> > > >> >> > > >> Regarding the cases listed by Marco: >> > > >> - AMD CPU >> > > >> From my architecture knowledge, what works on C4 instances (with >> AVX2 >> > > >> support) should also work well on m5a, right? I think mxnet-mkl and >> > > >> mxnet-cuxxmkl packages have been fully validated on AVX2 machines. >> > > >> Also, we didn't perform any validation on AMD CPU before, why we >> need >> > > >> do that for this time? >> > > >> >> > > >> - ARM CPU >> > > >> I don't know we're releasing any convenience binaries for ARM CPU. >> > > >> This proposal mainly targets those pypi packages. >> > > >> >> > > >> - Windows >> > > >> Already validated by CI. We're also releasing mxnet-mkl packages >> for >> > Win. >> > > >> >> > > >> - GPU and MKLDNN enabled >> > > >> Already validated by CI and mxnet-cuxxmkl packages have been >> released >> > > >> for several versions. >> > > >> >> > > >> - Fully reproducible results (medical and financial sector >> requested >> > > >> that and we have some flags for cuda) Not sure I understand this >> > > >> case. We already have MKL-DNN backend for a while. Functionality >> and >> > > >> correctness of it have been verified by MXNet users. >> > > >> >> > > >> -tao >> > > >> >> > > >> On Tue, Nov 19, 2019 at 4:41 AM Marco de Abreu >> > > >> <marco.g.ab...@gmail.com> >> > > >> wrote: >> > > >> >> > > >> > Sorry, my intent with the "non-standard" phrase was not about >> > > >> > general >> > > >> MXNet >> > > >> > but rather from MKLDNNs point of view, considering that it's >> being >> > > >> > developed by Intel, I assumed that MKLDNN might consider >> non-intel >> > > >> > use-cases non standard. >> > > >> > >> > > >> > -Marco >> > > >> > >> > > >> > Skalicky, Sam <sska...@amazon.com.invalid> schrieb am Mo., 18. >> Nov. >> > > >> 2019, >> > > >> > 21:34: >> > > >> > >> > > >> > > Thanks Alfredo, if you can create a GitHub issue with >> notes/steps >> > > >> > > we >> > > >> can >> > > >> > > add this to the todo list for integrating with the MXNet CI to >> > > >> > > test on >> > > >> > m5a >> > > >> > > instances too. Then we can start tracking this on a regular >> > > >> > > basis. It >> > > >> > would >> > > >> > > be great to actually test on ARM instances now that AWS has A1 >> > > >> instances >> > > >> > > too…..ill add it to the wish list ;-D >> > > >> > > >> > > >> > > Sam >> > > >> > > >> > > >> > > > On Nov 18, 2019, at 12:32 PM, Alfredo Luque < >> > > >> alfredo.lu...@airbnb.com >> > > >> > .INVALID> >> > > >> > > wrote: >> > > >> > > > >> > > >> > > > Happy to run some benchmarks on an AWS m5a instance (Epyc) >> and >> > > >> > > > first generation AMD Threadripper Gen 1 if someone has >> > > >> > > > something easy to >> > > >> run >> > > >> > > and >> > > >> > > > representative. >> > > >> > > > >> > > >> > > > On November 18, 2019 at 12:29:31 PM, Skalicky, Sam ( >> > > >> > > > sska...@amazon.com.invalid) wrote: >> > > >> > > > >> > > >> > > > Thanks a good idea Alfredo, are you able to help test on AMD >> > CPUs? >> > > >> Or >> > > >> > is >> > > >> > > > there someone else in the mxnet dev@ community who can help? >> > > >> > > > >> > > >> > > > Sam >> > > >> > > > >> > > >> > > >> On Nov 18, 2019, at 12:27 PM, Alfredo Luque >> > > >> > > > <alfredo.lu...@airbnb.com.INVALID> wrote: >> > > >> > > >> >> > > >> > > >> Verifying that there isn’t a slowdown on AMD CPUs (eg; >> Ryzen / >> > > >> Epyc) >> > > >> > > > would >> > > >> > > >> definitely make sense as a requirement. It seems odd to >> > > >> > > >> classify >> > > >> that >> > > >> > as >> > > >> > > > a >> > > >> > > >> “nonstandard” use case. >> > > >> > > >> >> > > >> > > >> On November 18, 2019 at 12:20:33 PM, Skalicky, Sam ( >> > > >> > > >> sska...@amazon.com.invalid) wrote: >> > > >> > > >> >> > > >> > > >> Thanks Patric & team for your work over the years to make >> > > >> > > >> MXNet >> > > >> fast >> > > >> > > with >> > > >> > > >> MKLDNN! >> > > >> > > >> >> > > >> > > >> I think it would be great to make MKLDNN enabled by default. >> > > >> > > >> We >> > > >> will >> > > >> > > need >> > > >> > > >> to continue producing variants without MKLDNN for those who >> > > >> > > >> don’t >> > > >> want >> > > >> > > it >> > > >> > > >> (Marco enumerated some use cases). How do you propose to >> > > >> > > >> identify >> > > >> the >> > > >> > > pip >> > > >> > > >> wheels with/without MKLDNN? Previously we had: mxnet-mkl and >> > > >> > > > mxnet-cu101mkl >> > > >> > > >> with MKLDNN. If the plain “mxnet” pip wheel now contains >> > > >> > > >> MKLDNN >> > > >> what >> > > >> > do >> > > >> > > > you >> > > >> > > >> propose we call the build without MKLDNN? mxnet-nomkl? >> > > >> > > >> >> > > >> > > >> Thanks! >> > > >> > > >> Sam >> > > >> > > >> >> > > >> > > >>> On Nov 18, 2019, at 11:08 AM, Marco de Abreu < >> > > >> > marco.g.ab...@gmail.com> >> > > >> > > >> wrote: >> > > >> > > >>> >> > > >> > > >>> Hi Patric, >> > > >> > > >>> >> > > >> > > >>> First of all, thanks a lot to you and your team for all the >> > > >> > > >>> effort >> > > >> on >> > > >> > > >> MXNet >> > > >> > > >>> and mkldnn! >> > > >> > > >>> >> > > >> > > >>> Generally I'm inclined towards your proposal, but I'm >> > > >> > > >>> thinking >> > > >> about >> > > >> > > the >> > > >> > > >>> non-standard use cases: >> > > >> > > >>> - AMD CPU >> > > >> > > >>> - ARM CPU >> > > >> > > >>> - Windows >> > > >> > > >>> - GPU and MKLDNN enabled >> > > >> > > >>> - Fully reproducible results (medical and financial sector >> > > >> requested >> > > >> > > > that >> > > >> > > >>> and we have some flags for cuda) >> > > >> > > >>> >> > > >> > > >>> Is mkldnn fully compatible with these use cases? If not, >> what >> > > >> would >> > > >> > > >> happen? >> > > >> > > >>> If yes, do we have performance numbers? >> > > >> > > >>> >> > > >> > > >>> Best regards, >> > > >> > > >>> Marco >> > > >> > > >>> >> > > >> > > >>> Zhao, Patric <patric.z...@intel.com> schrieb am Mo., 18. >> Nov. >> > > >> 2019, >> > > >> > > >> 14:00: >> > > >> > > >>> >> > > >> > > >>>> Hi MXNet community, >> > > >> > > >>>> >> > > >> > > >>>> From the first MKLDNN backend integrated in release 1.2, >> the >> > > >> > community >> > > >> > > >> is >> > > >> > > >>>> continuously improving the quality and performance of >> MKLDNN >> > > >> > > >>>> CPU >> > > >> > > >> backend. >> > > >> > > >>>> Nowadays, the MKLDNN backend is widely used for the >> > > >> > > >>>> inference, >> > > >> > > >> especially >> > > >> > > >>>> for INT8 inference, and we got lots of very positive >> > > >> > > >>>> feedbacks >> > > >> from >> > > >> > > >> MXNet >> > > >> > > >>>> users. >> > > >> > > >>>> >> > > >> > > >>>> Achieved milestones as below: >> > > >> > > >>>> >> > > >> > > >>>> - MKLDNN integrated into Apache MXNet from release 1.2, >> Feb, >> > > >> > > >>>> 2018 >> > > >> > [1] >> > > >> > > >>>> - MKLDNN backend as default CPU backend from source >> > > >> > > >>>> building, >> > > >> Jan, >> > > >> > > 2019 >> > > >> > > >> [2] >> > > >> > > >>>> - MKLDNN subgraph optimization as default for the >> inference, >> > > >> > > >>>> Jul, >> > > >> > 2019 >> > > >> > > >> [3] >> > > >> > > >>>> - MKLDNN major version upgrade in release 1.6, Oct, 2019 >> [4] >> > > >> > > >>>> >> > > >> > > >>>> To make more successful and technical leadership for >> Apache >> > > >> > > >>>> MXNet >> > > >> in >> > > >> > > > the >> > > >> > > >>>> industry, I propose to make MKLDNN as default CPU backend >> in >> > > >> > > >>>> all >> > > >> > > binary >> > > >> > > >>>> distribution from the next release. >> > > >> > > >>>> The new milestone includes: >> > > >> > > >>>> >> > > >> > > >>>> - Static link MKLDNN library in the binary avoiding the >> > > >> > > >>>> mismatch >> > > >> > > > version >> > > >> > > >>>> in the runtime [5] >> > > >> > > >>>> - Make nightly build with MKLDNN default from master pre >> 1.7 >> > > >> release >> > > >> > > >>>> - Binary distribution with MKLDNN default from 1.7 >> release. >> > > >> > > >>>> >> > > >> > > >>>> What will be changed: >> > > >> > > >>>> >> > > >> > > >>>> - mxnet and mxnet-cuXX binary will be built with MKLDNN=1 >> > > >> > > >>>> - mxnet-mkl and mxnet-cuXXmkl will be not changed in the >> > > >> > > >>>> minor >> > > >> > release >> > > >> > > >>>> (1.x) and plan to remove in next major release (2.0) >> > > >> > > >>>> >> > > >> > > >>>> Suggestions and comments are highly appreciated. >> > > >> > > >>>> >> > > >> > > >>>> Thanks, >> > > >> > > >>>> >> > > >> > > >>>> --Patric >> > > >> > > >>>> >> > > >> > > >>>> >> > > >> > > >>>> [1] https://github.com/apache/incubator-mxnet/pull/9677 >> > > >> > > >>>> [2] >> > > >> > > >>>> >> > > >> > > >> >> > > >> > > > >> > > >> > > >> > > >> > >> > > >> >> > > >> >> > > https://lists.apache.org/thread.html/bfeae6ee46374112eb4dff1470c26295 >> > > >> 9101e4bffb19930926963535@%3Cdev.mxnet.apache.org%3E >> > > >> > > >>>> [3] https://github.com/apache/incubator-mxnet/pull/15518 >> > > >> > > >>>> [4] >> > > >> > > >>>> >> > > >> > > >> >> > > >> > > > >> > > >> > > >> > > >> > >> > > >> >> > > >> >> > > https://lists.apache.org/thread.html/f46ab920f18795496eafe713e6e9e561 >> > > >> c684e06189085cec17b401dc@%3Cdev.mxnet.apache.org%3E >> > > >> > > >>>> [5] https://github.com/apache/incubator-mxnet/pull/16731 >> > > >> > > >>>> >> > > >> > > >> >> > > >> > > >> — >> > > >> > > >> Alfredo Luque >> > > >> > > >> Software Engineer >> > > >> > > >> Machine Learning Infrastructure Airbnb San Francisco, CA >> > > >> > > > >> > > >> > > > — >> > > >> > > > Alfredo Luque >> > > >> > > > Software Engineer >> > > >> > > > Machine Learning Infrastructure Airbnb San Francisco, CA >> > > >> > > >> > > >> > > >> > > >> > >> > > >> >> > > >> — >> > > >> Alfredo Luque >> > > >> Software Engineer >> > > >> Machine Learning Infrastructure >> > > >> Airbnb >> > > >> San Francisco, CA >> > > >> >> > > > >> > >> >