RE: RE: MXNet 1.6.0 release

2019-10-31 Thread Zhao, Patric
Sure, I will look into the issue.

> -Original Message-
> From: Przemysław Trędak 
> Sent: Friday, November 1, 2019 11:27 AM
> To: d...@mxnet.apache.org
> Subject: Re: RE: MXNet 1.6.0 release
> 
> Hi Patric,
> 
> Actually the nightly tests show some problems with machines not being able
> to find libmkldnn.so.1 (see e.g. here: http://jenkins.mxnet-ci.amazon-
> ml.com/blue/organizations/jenkins/NightlyTestsForBinaries/detail/master/4
> 92/pipeline). I'm not sure if this is just a problem with configuration of CI
> machines for nightly tests, but please take a look at this.
> 
> Przemek
> 
> On 2019/11/01 02:41:01, "Zhao, Patric"  wrote:
> > Hi Przemek,
> >
> > The MKLDNN upgrade PR was merged on Oct 31. Please double-check the
> > nightly build and move forward with the release process.
> >
> > Feel free to ping me if there is anything we can help with.
> >
> > Thanks,
> >
> > --Patric
> >
> > > -Original Message-
> > > From: Przemysław Trędak 
> > > Sent: Friday, October 25, 2019 10:25 PM
> > > To: d...@mxnet.apache.org
> > > Subject: Re: MXNet 1.6.0 release
> > >
> > > Dear MXNet Community
> > >
> > > Last night I updated 1.6.x branch to point to current master. The
> > > code freeze is now in effect.
> > >
> > > That said, since most of the features intended for 1.6 release are
> > > still not fully finished (a few PRs for BERT GPU performance,
> > > multiple MKLDNN PRs, multiple PRs tagged NumPy etc.) we decided to
> > > go with a "soft" code freeze approach. Only the PRs that are in the
> > > scope of 1.6 release will now be accepted into 1.6.x branch. The
> > > hard code freeze is planned next week, Oct 31st.
> > >
> > > While contributors of those in-scope PRs and their reviewers work to
> > > meet that deadline, I would like to call for action for the rest of
> > > the MXNet Community to test, raise issues and fix the bugs in the release.
> > >
> > > Thank you
> > > Przemek
> > >
> > > On 2019/10/11 00:00:34, Przemysław Trędak wrote:
> > > > Hi MXNet Community,
> > > >
> > > > As the 1.5.1 patch release is done (many thanks Tao!), it is time
> > > > to prepare
> > > for the next minor release of MXNet - 1.6.0.
> > > >
> > > > I (ptrendx@github / ptredak@mxnet Slack) would like to manage the
> > > release of 1.6.0. As it will be the first time for me to manage a
> > > release, Sam
> > > (samskalicky) and Lin (apeforest) agreed to help guide me through
> > > the process.
> > > >
> > > > Thanks to Sheng there is a GitHub issue[1] listing major features
> > > > that
> > > should go into the 1.6.0, please add any features that you want
> > > included there.
> > > >
> > > > That said, as we target November for the release, to accommodate
> > > extensive testing and bugfixing, the code freeze date is set to
> > > October 24th 23:59PST. Please reach out to me as soon as possible if
> > > you feel that you will need an extension of that deadline for your 
> > > feature.
> > > >
> > > > Sheng created a page on cwiki[2] about the release, I will
> > > > populate it with
> > > the information and tracked issues and PRs.
> > > >
> > > > Thank you and let's make the great 1.6.0 release together!
> > > > Przemek
> > > >
> > > > [1] https://github.com/apache/incubator-mxnet/issues/15589
> > > > [2]
> > > https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+Plan
> > > +a
> > > nd+Status
> > > >
> >


Re: RE: MXNet 1.6.0 release

2019-10-31 Thread Przemysław Trędak
Hi Patric,

Actually the nightly tests show some problems with machines not being able to 
find libmkldnn.so.1 (see e.g. here: 
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/NightlyTestsForBinaries/detail/master/492/pipeline).
 I'm not sure if this is just a problem with configuration of CI machines for 
nightly tests, but please take a look at this.
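
A quick way to reproduce this kind of loader failure outside of Jenkins is to probe the library directly (a minimal sketch; the library name is taken from the error above, and the usual fix is adding the library's directory to LD_LIBRARY_PATH or running ldconfig):

```python
import ctypes

def can_load(libname):
    """Return True if the dynamic loader can find and open `libname`."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False

# On a machine with the same misconfiguration this prints False until
# the directory containing libmkldnn.so.1 is on the loader search path.
print(can_load("libmkldnn.so.1"))
```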

Przemek

On 2019/11/01 02:41:01, "Zhao, Patric"  wrote: 
> Hi Przemek,
> 
> The MKLDNN upgrade PR was merged on Oct 31. Please double-check the nightly 
> build and move forward with the release process.
> 
> Feel free to ping me if there is anything we can help with.
> 
> Thanks,
> 
> --Patric
> 
> > -Original Message-
> > From: Przemysław Trędak 
> > Sent: Friday, October 25, 2019 10:25 PM
> > To: d...@mxnet.apache.org
> > Subject: Re: MXNet 1.6.0 release
> > 
> > Dear MXNet Community
> > 
> > Last night I updated 1.6.x branch to point to current master. The code 
> > freeze
> > is now in effect.
> > 
> > That said, since most of the features intended for 1.6 release are still 
> > not fully
> > finished (a few PRs for BERT GPU performance, multiple MKLDNN PRs,
> > multiple PRs tagged NumPy etc.) we decided to go with a "soft" code freeze
> > approach. Only the PRs that are in the scope of 1.6 release will now be
> > accepted into 1.6.x branch. The hard code freeze is planned next week, Oct
> > 31st.
> > 
> > While contributors of those in-scope PRs and their reviewers work to meet
> > that deadline, I would like to call for action for the rest of the MXNet
> > Community to test, raise issues and fix the bugs in the release.
> > 
> > Thank you
> > Przemek
> > 
> > On 2019/10/11 00:00:34, Przemysław Trędak wrote:
> > > Hi MXNet Community,
> > >
> > > As the 1.5.1 patch release is done (many thanks Tao!), it is time to 
> > > prepare
> > for the next minor release of MXNet - 1.6.0.
> > >
> > > I (ptrendx@github / ptredak@mxnet Slack) would like to manage the
> > release of 1.6.0. As it will be the first time for me to manage a release, 
> > Sam
> > (samskalicky) and Lin (apeforest) agreed to help guide me through the
> > process.
> > >
> > > Thanks to Sheng there is a GitHub issue[1] listing major features that
> > should go into the 1.6.0, please add any features that you want included
> > there.
> > >
> > > That said, as we target November for the release, to accommodate
> > extensive testing and bugfixing, the code freeze date is set to October 24th
> > 23:59PST. Please reach out to me as soon as possible if you feel that you 
> > will
> > need an extension of that deadline for your feature.
> > >
> > > Sheng created a page on cwiki[2] about the release, I will populate it 
> > > with
> > the information and tracked issues and PRs.
> > >
> > > Thank you and let's make the great 1.6.0 release together!
> > > Przemek
> > >
> > > [1] https://github.com/apache/incubator-mxnet/issues/15589
> > > [2]
> > https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+Plan+a
> > nd+Status
> > >
> 


Re: [Discuss] Upgrade MKL-DNN submodule to its v1.0 release

2019-10-31 Thread Marco de Abreu
Great job, well done everyone!!

Lv, Tao A  wrote on Fri., Nov. 1, 2019, 03:50:

> Hi dev,
>
> The feature branch mkldnn-v1.0 has been merged to master. Really
> appreciate your support for this task.
> Branch: https://github.com/apache/incubator-mxnet/tree/mkldnn-v1.0
> Project: https://github.com/apache/incubator-mxnet/projects/16
> PR: https://github.com/apache/incubator-mxnet/pull/16555
>
> If possible, we would ask downstream projects to help verify the latest master
> branch, and feel free to report any issues.
>
> Thanks,
> -tao
>
> -Original Message-
> From: Lv, Tao A 
> Sent: Sunday, July 28, 2019 11:55 PM
> To: dev@mxnet.incubator.apache.org
> Cc: Zhao, Patric ; Ye, Jason Y <
> jason.y...@intel.com>
> Subject: RE: [Discuss] Upgrade MKL-DNN submodule to its v1.0 release
>
> Update:
>
> I just cut out the feature branch for MKL-DNN 1.0 integration:
> https://github.com/apache/incubator-mxnet/tree/mkldnn-v1.0
>
> Thanks,
> -tao
>
> -Original Message-
> From: Lv, Tao A 
> Sent: Friday, July 26, 2019 10:21 PM
> To: dev@mxnet.incubator.apache.org
> Cc: Zhao, Patric ; Ye, Jason Y <
> jason.y...@intel.com>
> Subject: RE: [Discuss] Upgrade MKL-DNN submodule to its v1.0 release
>
> It seems there are no objections. I will try to cut the feature branch in
> the coming days.
>
> Thanks,
> -tao
>
> -Original Message-
> From: Lv, Tao A 
> Sent: Saturday, July 20, 2019 11:06 PM
> To: dev@mxnet.incubator.apache.org
> Cc: Zhao, Patric ; Ye, Jason Y <
> jason.y...@intel.com>
> Subject: [Discuss] Upgrade MKL-DNN submodule to its v1.0 release
>
>
>
> Hi dev,
>
>
>
> MKL-DNN just published its first major release this month:
> https://github.com/intel/mkl-dnn/releases/tag/v1.0. Here I would like to
> start a discussion about upgrading MKL-DNN integration from the current
> v0.20 to v1.0.
>
>
>
> Motivation
>
> To improve the general look and feel of the library and solve a few
> important design issues, the coming v1.0 major release changes some of the
> data structures, primitive APIs and the execution model, breaking
> compatibility with v0.x versions accordingly. The changes in MKL-DNN v1.0
> are mostly covered in the RFC for v1.0 <
> https://github.com/intel/mkl-dnn/tree/rfc-api-changes-v1.0/doc/rfc/api-v1.0>.
> The major changes are listed below:
> * Support large tensors with int64_t dimension sizes.
> * Expose the scratchpad to support stateless primitives and better
> memory management, and hence thread safety.
> * Pass memory and stream to the primitive at execution time.
> * Rework the MKL-DNN memory descriptor.
> * Split LSTM/GRU/RNN into different primitives.
> * Remove the MKLML dependency and stop releasing the MKLML and iomp
> packages in the MKL-DNN repository.
> * Support Intel integrated graphics.
>
> With these changes, we can resolve or mitigate several existing MXNet
> issues, e.g. #15576 for thread safety, #15544 for the MKLML/iomp5 license
> issue, and the int64 tensor size for the MKL-DNN backend. Besides that, all
> new features will go to v1.x and will not be backported to v0.x, so MXNet
> needs to update its MKL-DNN dependency to v1.0 to better leverage new
> features and performance improvements.
>
>
>
> Development
>
> Basically we will follow the same integration methodology we used for v0.x
> integration, including operator implementation, registration, NDArray
> modification and graph partitioning. For better collaboration among the
> community, we will have a feature branch for the development and validation
> of MKL-DNN 1.0 integration. All PRs to the feature branch should pass
> code review and CI and finally get committer approval. The development
> can be divided into 3 parts, and all the work will be done before
> Q3'19 ends. During the development, the feature branch will be synced
> with the master branch periodically.
> * P1: make/cmake build with MKL-DNN v1.0, and integration of all FP32
> CNN operators (in src/operator/nn/mkldnn/). We can do FP32 training and
> inference for CNN models after P1 is done.
> * P2: the quantization pass and INT8 operator integration (in
> src/operator/quantization/mkldnn). We can do INT8 quantization and INT8
> inference after P2 is done.
> * P3: RNN operator integration.
>
> If needed, documents will be revised accordingly during the development.
>
>
>
> Validation:
> * Use the feature branch for development - all PRs should pass MXNet CI.
> * Disable MKL-DNN related tests at the beginning of development and
> re-enable them incrementally during development.
> * Intel internal validation: mainly focused on performance and
> convergence validation on CPU, with models from the MXNet examples,
> Gluon-CV and Gluon-NLP.
>
>
>
> Criteria for development done:
> * MXNet CI: pass all existing unit tests and nightly tests.
> * Accuracy: pass training convergence and inference accuracy validation.
> * Performance: achieve similar FP32/INT8 performance to the v0.x
> integration.
> 

RE: [Discuss] Upgrade MKL-DNN submodule to its v1.0 release

2019-10-31 Thread Lv, Tao A
Hi dev,

The feature branch mkldnn-v1.0 has been merged to master. Really appreciate 
your support for this task.
Branch: https://github.com/apache/incubator-mxnet/tree/mkldnn-v1.0
Project: https://github.com/apache/incubator-mxnet/projects/16
PR: https://github.com/apache/incubator-mxnet/pull/16555

If possible, we would ask downstream projects to help verify the latest master 
branch, and feel free to report any issues.

Thanks,
-tao

-Original Message-
From: Lv, Tao A  
Sent: Sunday, July 28, 2019 11:55 PM
To: dev@mxnet.incubator.apache.org
Cc: Zhao, Patric ; Ye, Jason Y 
Subject: RE: [Discuss] Upgrade MKL-DNN submodule to its v1.0 release

Update:

I just cut out the feature branch for MKL-DNN 1.0 integration: 
https://github.com/apache/incubator-mxnet/tree/mkldnn-v1.0

Thanks,
-tao

-Original Message-
From: Lv, Tao A  
Sent: Friday, July 26, 2019 10:21 PM
To: dev@mxnet.incubator.apache.org
Cc: Zhao, Patric ; Ye, Jason Y 
Subject: RE: [Discuss] Upgrade MKL-DNN submodule to its v1.0 release

It seems there are no objections. I will try to cut the feature branch in the 
coming days.

Thanks,
-tao

-Original Message-
From: Lv, Tao A  
Sent: Saturday, July 20, 2019 11:06 PM
To: dev@mxnet.incubator.apache.org
Cc: Zhao, Patric ; Ye, Jason Y 
Subject: [Discuss] Upgrade MKL-DNN submodule to its v1.0 release



Hi dev,



MKL-DNN just published its first major release this month: 
https://github.com/intel/mkl-dnn/releases/tag/v1.0. Here I would like to start 
a discussion about upgrading MKL-DNN integration from the current v0.20 to v1.0.



Motivation

To improve the general look and feel of the library and solve a few important 
design issues, the coming v1.0 major release changes some of the data 
structures, primitive APIs and the execution model, breaking compatibility 
with v0.x versions accordingly. The changes in MKL-DNN v1.0 are mostly covered 
in the RFC for v1.0: 
https://github.com/intel/mkl-dnn/tree/rfc-api-changes-v1.0/doc/rfc/api-v1.0
The major changes are listed below:
* Support large tensors with int64_t dimension sizes.
* Expose the scratchpad to support stateless primitives and better memory 
management, and hence thread safety.
* Pass memory and stream to the primitive at execution time.
* Rework the MKL-DNN memory descriptor.
* Split LSTM/GRU/RNN into different primitives.
* Remove the MKLML dependency and stop releasing the MKLML and iomp 
packages in the MKL-DNN repository.
* Support Intel integrated graphics.
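
To make the execution-model items above concrete, here is a conceptual sketch in Python (not the actual MKL-DNN C++ API; the class and argument names are invented for illustration) of a stateless primitive: the primitive object holds only an immutable description, while memory, stream and scratchpad are supplied by the caller at execution time, mirroring the argument map ({MKLDNN_ARG_SRC: ..., MKLDNN_ARG_DST: ...}) of the v1.0 API:

```python
# Conceptual sketch of the v1.0 execution model. Because the primitive
# carries no data pointers, one instance can be cached and shared
# safely across threads.

class ConvPrimitiveV1:
    """Stateless: memory, stream and scratchpad arrive at execute time."""

    def __init__(self, kernel):
        self.kernel = kernel  # immutable description only, no buffers

    def execute(self, stream, args, scratchpad):
        # `args` maps argument names to caller-owned buffers; a toy 1-D
        # correlation stands in for the real convolution kernel.
        src, dst = args["src"], args["dst"]
        for i in range(len(dst)):
            dst[i] = sum(self.kernel[j] * src[i + j]
                         for j in range(len(self.kernel)))
        return dst

conv = ConvPrimitiveV1(kernel=[1.0, -1.0])
out = conv.execute(stream=None,
                   args={"src": [3.0, 5.0, 2.0], "dst": [0.0, 0.0]},
                   scratchpad=[])
print(out)  # [-2.0, 3.0]
```

In the v0.x model the primitive instead held pointers to its input/output memory, which is why sharing a primitive between threads was unsafe.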

With these changes, we can resolve or mitigate several existing MXNet issues, 
e.g. #15576 for thread safety, #15544 for the MKLML/iomp5 license issue, and 
the int64 tensor size for the MKL-DNN backend. Besides that, all new features 
will go to v1.x and will not be backported to v0.x, so MXNet needs to update 
its MKL-DNN dependency to v1.0 to better leverage new features and performance 
improvements.



Development

Basically we will follow the same integration methodology we used for v0.x 
integration, including operator implementation, registration, NDArray 
modification and graph partitioning. For better collaboration among the 
community, we will have a feature branch for the development and validation of 
MKL-DNN 1.0 integration. All PRs to the feature branch should pass code 
review and CI and finally get committer approval. The development can be 
divided into 3 parts, and all the work will be done before Q3'19 ends. 
During the development, the feature branch will be synced with the master 
branch periodically.
* P1: make/cmake build with MKL-DNN v1.0, and integration of all FP32 CNN 
operators (in src/operator/nn/mkldnn/). We can do FP32 training and inference 
for CNN models after P1 is done.
* P2: the quantization pass and INT8 operator integration (in 
src/operator/quantization/mkldnn). We can do INT8 quantization and INT8 
inference after P2 is done.
* P3: RNN operator integration.
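
As a rough illustration of what the P2 quantization pass produces, here is a self-contained sketch of symmetric INT8 quantization (pure Python for clarity; MXNet's actual pass operates on NDArrays and chooses scales via calibration, and the function names here are invented):

```python
# Symmetric INT8 quantization sketch: map FP32 values into [-127, 127]
# using a single scale derived from the max absolute value.
# Assumes `x` has at least one nonzero element.

def quantize_int8(x):
    scale = max(abs(v) for v in x) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in x]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

data = [0.5, -1.27, 0.0, 1.27]
q, scale = quantize_int8(data)
print(q)  # [50, -127, 0, 127]
print(dequantize(q, scale))  # approximately recovers `data`
```

The quantization pass inserts such quantize/dequantize boundaries around INT8 operators so that the rest of the graph can stay in FP32.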

If needed, documents will be revised accordingly during the development.



Validation:
* Use the feature branch for development - all PRs should pass MXNet CI.
* Disable MKL-DNN related tests at the beginning of development and 
re-enable them incrementally during development.
* Intel internal validation: mainly focused on performance and convergence 
validation on CPU, with models from the MXNet examples, Gluon-CV and Gluon-NLP.



Criteria for development done:
* MXNet CI: pass all existing unit tests and nightly tests.
* Accuracy: pass training convergence and inference accuracy validation.
* Performance: achieve similar FP32/INT8 performance to the v0.x integration.



Upstreaming to master branch:

After development is done, we will start to upstream the feature branch to the 
master branch. Since we cannot have two MKL-DNN libraries in MXNet 
simultaneously, the upstreaming should be done in a single PR. The PR will 
possibly be large, so I hope the community can take time to review and comment 
during 

RE: MXNet 1.6.0 release

2019-10-31 Thread Zhao, Patric
Hi Przemek,

The MKLDNN upgrade PR was merged on Oct 31. Please double-check the nightly 
build and move forward with the release process.

Feel free to ping me if there is anything we can help with.

Thanks,

--Patric

> -Original Message-
> From: Przemysław Trędak 
> Sent: Friday, October 25, 2019 10:25 PM
> To: d...@mxnet.apache.org
> Subject: Re: MXNet 1.6.0 release
> 
> Dear MXNet Community
> 
> Last night I updated 1.6.x branch to point to current master. The code freeze
> is now in effect.
> 
> That said, since most of the features intended for 1.6 release are still not 
> fully
> finished (a few PRs for BERT GPU performance, multiple MKLDNN PRs,
> multiple PRs tagged NumPy etc.) we decided to go with a "soft" code freeze
> approach. Only the PRs that are in the scope of 1.6 release will now be
> accepted into 1.6.x branch. The hard code freeze is planned next week, Oct
> 31st.
> 
> While contributors of those in-scope PRs and their reviewers work to meet
> that deadline, I would like to call for action for the rest of the MXNet
> Community to test, raise issues and fix the bugs in the release.
> 
> Thank you
> Przemek
> 
> On 2019/10/11 00:00:34, Przemysław Trędak wrote:
> > Hi MXNet Community,
> >
> > As the 1.5.1 patch release is done (many thanks Tao!), it is time to prepare
> for the next minor release of MXNet - 1.6.0.
> >
> > I (ptrendx@github / ptredak@mxnet Slack) would like to manage the
> release of 1.6.0. As it will be the first time for me to manage a release, Sam
> (samskalicky) and Lin (apeforest) agreed to help guide me through the
> process.
> >
> > Thanks to Sheng there is a GitHub issue[1] listing major features that
> should go into the 1.6.0, please add any features that you want included
> there.
> >
> > That said, as we target November for the release, to accommodate
> extensive testing and bugfixing, the code freeze date is set to October 24th
> 23:59PST. Please reach out to me as soon as possible if you feel that you will
> need an extension of that deadline for your feature.
> >
> > Sheng created a page on cwiki[2] about the release, I will populate it with
> the information and tracked issues and PRs.
> >
> > Thank you and let's make the great 1.6.0 release together!
> > Przemek
> >
> > [1] https://github.com/apache/incubator-mxnet/issues/15589
> > [2]
> https://cwiki.apache.org/confluence/display/MXNET/1.6.0+Release+Plan+a
> nd+Status
> >