Re: Include MKLDNN into default mxnet pip package

Lv, Tao A Sun, 25 Nov 2018 17:04:23 -0800

Hi Steffen, 

I think all the commits on MKL-DNN master branch are well tested for MKL-DNN 
development team. If we really want to have a release commit in the coming 1.4 
mxnet release, my suggestion is 0.17 MKL-DNN release.


Thank you,
Tao 

Sent from my iPhone

> On Nov 26, 2018, at 8:09 AM, Steffen Rochel <steffenroc...@gmail.com> wrote:
> 
> +1 to make MKL-DNN default.
> I'm tracking  https://github.com/apache/incubator-mxnet/issues/13369 as
> open issue to be addressed for 1.4.0
> I do agree that we should move to a model to include released dependencies
> instead of just taking bleeding edge snapshots.
> However, speed of development is important as well.
> As a compromise for 1.4.0 release with MKL-DNN: can the MKL-DNN development
> team provide us with a well tested tag/commit id to include in 1.4.0
> release?
> Steffen
> 
>> On Wed, Nov 21, 2018 at 11:42 PM Lv, Tao A <tao.a...@intel.com> wrote:
>> 
>> Thanks for the information, Kellen and Naveen.
>> 
>> Better than onnx-tensorrt, MKL-DNN has already provided versioning and
>> release tags. My concern is that as MKL-DNN is still under intensive
>> development, if it has a new feature or bug fix on its master branch, do we
>> really want to wait for next release to get it supported in MXNet?
>> 
>> Take the LSTM regression as an example, probably MKL-DNN will give a fix
>> or improvement on its master branch soon, do we need to wait for 0.18
>> release to get it fixed for mxnet user? AFAIK, tensorflow is also using
>> normal commit id, not release, as the dependency for MKL-DNN.
>> 
>> Regarding the LSTM regression, we are using internal JIRA tickets rather
>> than github issues to track the defects of MKL-DNN. But I agree with you,
>> we need update the progress of it in Alex's issue.
>> 
>> Thanks,
>> -tao
>> 
>> -----Original Message-----
>> From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
>> Sent: Thursday, November 22, 2018 10:55 AM
>> To: dev@mxnet.incubator.apache.org
>> Subject: Re: Include MKLDNN into default mxnet pip package
>> 
>> Agree with your point about other repos also not being based on versioning
>> Tao.  I would point out that I've given some that I've worked with similar
>> feedback: https://github.com/onnx/onnx-tensorrt/issues/68
>> 
>>> On Wed, Nov 21, 2018 at 6:48 PM Naveen Swamy <mnnav...@gmail.com> wrote:
>>> 
>>> Tao,
>>> 
>>> You are right there are many submodules in 3rd party. We have to start
>>> somewhere and I believe this one is a good candidate to start with.
>>> This is not to cater to release of MXNet or to tie them with the
>>> releases of the submodules but instead to pick only stable releases
>>> and not to pick up bleeding edge commits from the tip of the master,
>>> this gives us confidence in the submodule that MXNet users are
>>> depending on that especially if we make MKLDNN the default.
>>> 
>>> Good to know it is known already as a regression.Alex has created this
>>> issue https://github.com/apache/incubator-mxnet/issues/13369, please
>>> add details and link the corresponding issue in MKLDNN(I couldn't find).
>>> 
>>> -Naveen
>>> 
>>>> On Wed, Nov 21, 2018 at 6:04 PM Lv, Tao A <tao.a...@intel.com> wrote:
>>>> 
>>>> Here are my answers for the questions from Kellen and Naveen about
>>>> MKL-DNN. It doesn't mean that I'm supportive for making MKL-DNN
>>>> default here.
>>>> 
>>>> @Kellen,
>>>> 
>>>> FYI, here is a list for those platforms which are officially
>>>> supported by MKL-DNN.
>>>> https://github.com/intel/mkl-dnn#system-requirements
>>>> 
>>>> Most of computation intensive kernels in MKL-DNN are JITed. So they
>>>> are supposed to generate code according to the platform during
>>>> runtime. For non-JIT code in MKL-DNN, same as other code in MXNet,
>>>> it will generate instructions according to the options/flags of
>>>> compiler. We can set -DARCH_OPT_FLAGS when build MKL-DNN to avoid
>>>> optimization for compiling machine. That's exactly what we are doing
>> for MKL-DNN build in MXNet.
>>> Even
>>>> without MKL-DNN, I noticed there were issues about illegal
>>>> instructions
>>> of
>>>> MXNet when users import the pip package on a lower end machine which
>>>> probably only supports SSE.
>>>> 
>>>> @Naveen,
>>>> 
>>>> The LSTM issue has already been identified as a regression from the
>>> recent
>>>> version of MKL-DNN. Hopefully it will be fixed soon with a new
>>>> update of MKL-DNN.
>>>> 
>>>> MXNet has many submodule dependencies under the 3rd party folder.
>>>> Seems
>>> we
>>>> don't require release versions for most of these dependencies. The
>>> release
>>>> period of MKL-DNN and MXNet are not matched very well. I think it
>>>> would
>>> be
>>>> a risk for MXNet release if it hardly depends on the release of a
>>>> submodule, no need to say depends on the releases of all submodules.
>>>> 
>>>> -tao
>>>> 
>>>> -----Original Message-----
>>>> From: Naveen Swamy [mailto:mnnav...@gmail.com]
>>>> Sent: Thursday, November 22, 2018 9:08 AM
>>>> To: dev@mxnet.incubator.apache.org
>>>> Cc: d...@mxnet.apache.org
>>>> Subject: Re: Include MKLDNN into default mxnet pip package
>>>> 
>>>> Hi Alex,
>>>> 
>>>> Thanks for promptly running the numbers on AMD and reporting here.
>>>> 
>>>> Can you please update the AMD numbers here for posterity
>>>> 
>>> https://cwiki.apache.org/confluence/display/MXNET/MXNet+with+Intel+MKL
>>> -DNN+-+Performance+Benchmarking
>>>> ?
>>>> 
>>>> are there any outstanding issues when MKLDNN is enabled? from my
>>>> offline conversation I am briefly aware performance issues with
>>>> LSTM, is there an GitHub issue for it?
>>>> 
>>>> MKLDNN is a submodule dependency, are we pulling the latest commit
>>>> or releases  ? If not we should move to releases before we make it a
>>> default.
>>>> Ideally we should use platform specific distributions (-dev
>>>> packages) at least we should rely on well tested releases.
>>>> 
>>>> 
>>>> Thanks, Naveen
>>>> 
>>>> On Wed, Nov 21, 2018 at 4:55 PM Zai, Alexander
>>> <alex...@amazon.com.invalid
>>>>> 
>>>> wrote:
>>>> 
>>>>> AMD benchmarks have been published. We are seeing a x15.8 speedup
>>>>> with
>>>>> Resnet50 (batch size 32) on AWS's new m5a.24xlarge machine. With a
>>>>> smaller network (Mobilenet - batch size 32) the speedup is more
>>>>> significant at x38.7. Let's have a vote to see if the PR to have
>>>>> MKLDNN enabled by default
>>>>> (https://github.com/apache/incubator-mxnet/pull/12591) can be
>>>>> merged before 1.4.0 release.
>>>>> 
>>>>> On 10/19/18, 9:17 AM, "Pedro Larroy"
>>>>> <pedro.larroy.li...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>    I did  pip install mxnet-mkl==1.3.1b20181018 on an AMD Ryzen
>>>>> 1950X and unit
>>>>>    tests are passing.
>>>>> 
>>>>>    Is this build using AVX512?  in /proc/cpuinfo I see only "avx"
>>> flag.
>>>>>    There's no "avx2" like on recent intel cpus.
>>>>> 
>>>>>    Pedro.
>>>>> 
>>>>>    On Fri, Oct 19, 2018 at 5:12 PM Hagay Lupesko
>>>>> <lupe...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Awesome collaborative effort across many contributors and
>>>> companies!
>>>>>> 
>>>>>> The boost is impressive and for MXNet users to get this
>>>>> boost "out of the
>>>>>> box" is a great benefit and makes MXNet an even better choice.
>>>>>> 
>>>>>> Alex - can you clarify whether there are any down sides with
>>>>> regards to
>>>>>> noon AVX-512 architectures, AMD CPUs, etc? Will it
>>>>> gracefully fallback?
>>>>>> 
>>>>>> Hagay
>>>>>> 
>>>>>> 
>>>>>> On Fri, Oct 19, 2018, 15:46 Sergio Fernández
>>>>> <wik...@apache.org>
>>>>> wrote:
>>>>>> 
>>>>>>> If there is no downside on platforms not supporting AVX512
>>>>> instructions,
>>>>>>> then +1
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Oct 17, 2018, 14:10 Alex Zai <aza...@gmail.com>
>> wrote:
>>>>>>> 
>>>>>>>> Hey all,
>>>>>>>> We have been working hard these past few months to
>>>>> integrate
>>>> and
>>>>>>> stabilize
>>>>>>>> Intel’s MKLDNN deep learning CPU accelerator into Mxnet
>>>>> and have made
>>>>>>>> incredible progress. On CPUs with AVX512 instructions
>>>>> (such as
>>>>> c5.18x)
>>>>>> we
>>>>>>>> have seen performance increase up to 12x and on other
>>>>> platforms (Macs,
>>>>>>>> AVX2) we seen a speedup of 1.5+. Full list of benchmarks
>>>>> can be found
>>>>>>> here
>>>>>>>> (
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650
>>> 764
>>>>>>>> and https://github.com/apache/incubator-mxnet/pull/12591
>> ).
>>>>>>>> 
>>>>>>>> Currently, using this accelerator requires the developer
>>>>> to either pip
>>>>>>>> install the mxnet-mkl version of mxnet or to build it
>>>>> themselves from
>>>>>>>> source. Given that we should try to provide the best
>>>>> performance "out
>>>>>> of
>>>>>>>> the box” with mxnet we should include this in the
>>>>> default
>>>> build.
>>>>> The
>>>>>>> mkldnn
>>>>>>>> library is included with in the pip package build so it
>>>>> does
>>>> not
>>>>>> require
>>>>>>> an
>>>>>>>> external dependency.
>>>>>>>> 
>>>>>>>> There were concerns that MKLDNN could cause regressions
>>>>> on certain
>>>>>>>> platforms (as it did with the tensorflow version a while
>>>>> back); but we
>>>>>>>> added a env flag (MXNET_MKLDNN_ENABLED) that allows
>>>>> users to turn of
>>>>>> this
>>>>>>>> feature during runtime. Please bring up any other
>>>>> concerns you may have
>>>>>>> and
>>>>>>>> your thoughts on including this accelerator in the
>>>>> default
>>>> build.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Alex
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>

Re: Include MKLDNN into default mxnet pip package

Reply via email to