Re: RE: RE: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0

Patrick Mu Mon, 13 Jul 2020 12:33:10 -0700

The regression happens only on CPU. After more runs I found that the runtime 
fluctuates heavily from run to run, but the average regression is ~10%.
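
For anyone who wants to reproduce this kind of comparison, here is a minimal 
sketch of how repeated runs can be timed and averaged when single measurements 
are noisy. It is not the script behind the numbers above; the function names 
(time_runs, summarize, relative_regression) are hypothetical, and the workload 
is passed in as any zero-argument callable.

```python
import statistics
import time


def time_runs(fn, warmup=3, runs=20):
    """Return per-run wall-clock times (seconds) for fn(), after warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Mean, sample standard deviation, and coefficient of variation."""
    mean = statistics.mean(timings)
    stdev = statistics.stdev(timings) if len(timings) > 1 else 0.0
    return {"mean": mean, "stdev": stdev, "cv": stdev / mean}


def relative_regression(baseline, candidate):
    """Fractional slowdown of candidate vs. baseline, based on mean run time."""
    base_mean = statistics.mean(baseline)
    return (statistics.mean(candidate) - base_mean) / base_mean
```

Timing the same inference callable under each installed version and passing 
the two timing lists to relative_regression gives the average slowdown as a 
fraction (0.10 for 10%). A large "cv" from summarize is a hint that more runs 
are needed before trusting the average.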


Looking through previous benchmarks, I also found some worse regressions from 
1.5 to 1.6, such as Inception inference on CPU, and those regressions were not 
caught. 

My 2 cents: it might not be a blocker for the release, and we have room 
for improvement in the upcoming 2.0 and, if necessary, 1.7.1.

Ziyi

On 2020/07/13 08:40:32, "Chen, Ciyong" <ciyong.c...@intel.com> wrote: 
> Thanks Ziyi,
> 
> May I know on which platform you noticed the performance regression, CPU or 
> GPU? A ~20% regression would be a large gap.
> 
> Thanks,
> -Ciyong
> 
> -----Original Message-----
> From: Patrick Mu <zm2...@columbia.edu> 
> Sent: Monday, July 13, 2020 4:13 PM
> To: d...@mxnet.apache.org
> Subject: Re: RE: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0
> 
> Hi Ciyong,
> 
> I have reverted the commit, and I am able to train Yolov3 with no problem.
> 
> However, I also noticed a ~20% regression in 1.7 compared with 1.6 
> in Yolov3 inference with the Module API, so we are going to discuss tomorrow 
> whether that would be an issue for 1.7.
> 
> Thanks,
> Ziyi
> 
> On 2020/07/13 02:19:28, "Chen, Ciyong" <ciyong.c...@intel.com> wrote: 
> > Hi Ziyi, Xingjian,
> > 
> > Thanks for reporting the issues from GluonCV/AutoGluon perspective.
> > I just did a quick try by reverting 
> > https://github.com/apache/incubator-mxnet/pull/18358, and the behavior is 
> > then the same as 1.6.0 for the cases in the gist 
> > (https://gist.github.com/sxjscience/944066c82e566f1b89b01fa226678890).
> > 
> > Considering that many end users rely on Gluon-based APIs/models, and that 
> > introducing a new patch to fix this issue could be risky, I agree that 
> > reverting this PR (#18358) might be the best option for the 1.7.0 release.
> > But I wonder whether there are any other test cases covering this feature, 
> > which could help track this kind of code change in the future. Also, can 
> > you help verify that this revert does resolve the breakage on your side?
> > 
> > > Thus, the real issue is: Should we support pickling a Gluon Block? If 
> > > not, should we support combining multiprocessing.pool with the Gluon 
> > > Block?
> > This seems more like a new feature for the MXNet Gluon Block; perhaps we can 
> > make it available in the next patch/minor release?
> > 
> > Thanks,
> > -Ciyong
> > 
> > -----Original Message-----
> > From: Xingjian SHI <xsh...@connect.ust.hk> 
> > Sent: Saturday, July 11, 2020 4:27 AM
> > To: dev@mxnet.incubator.apache.org; d...@mxnet.apache.org
> > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0
> > 
> > Thanks Ziyi,
> > 
> > I discovered the same issue when trying to use AutoGluon with 
> > 1.7.0rc0 and would like to share my findings:
> > 
> > Basically, I don't think Gluon Block is designed to be picklable. But 
> > pickling does work for some cases in the old version:
> > 
> > I've included two cases in the gist 
> > (https://gist.github.com/sxjscience/944066c82e566f1b89b01fa226678890).
> > 
> > - Case1: we construct a gluon block, hybridize it and feed one NDArray to 
> > help initialize the block. After that, it will no longer be picklable. 
> > - Case2: we just construct a gluon block; it is picklable in 1.6.0, 
> > but not in 1.7.0.
> > 
> > Thus, the real issue is: Should we support pickling a Gluon Block? If 
> > not, should we support combining multiprocessing.pool with the Gluon Block? 
> > For reference, PyTorch supports pickling the nn.Module as shown in: 
> > https://gist.github.com/sxjscience/90b812a66d445e759c55eedc3ef93668 and 
> > also in the doc 
> > (https://pytorch.org/tutorials/beginner/saving_loading_models.html). 
> > 
> > Best,
> > Xingjian
> > 
> > 
> > On 7/10/20, 11:31 AM, "Patrick Mu" <zm2...@columbia.edu> wrote:
> > 
> >     Hi Ciyong, 
> > 
> >     I just discovered an issue with 1.7 that causes Yolo training 
> > with the latest GluonCV Yolo to fail.
> > 
> >     The PR that causes the failure is 
> > https://github.com/apache/incubator-mxnet/pull/18358, which modifies basic 
> > blocks of Gluon to fix a memory leak issue.
> > 
> >     I talked with Leonard, the author of the PR, and he said he found the 
> > root cause, but patching that PR would modify those Gluon basic blocks 
> > further, which might be risky for existing models and various customer 
> > models.
> > 
> >     So my 2 cents is to revert this PR in 1.7 and try patching the PR in 
> > 1.x and 2.0, meaning that 1.7 won't have the memory usage optimization 
> > from that feature.
> > 
> >     I'd like to hear what you think about this issue.
> > 
> >     Thanks,
> >     Ziyi
> > 
> > 
> >     On 2020/07/10 06:18:02, "Chen, Ciyong" <ciyong.c...@intel.com> wrote: 
> >     > Hi Community,
> >     > 
> >     > I would like to call for action to test/validate/vote for the release 
> > candidate (1.7.0.rc0).
> >     > As there has been no voting result during the scheduled time window, I 
> > would like to extend the time window to July 13, 23:59:59 PST.
> >     > Please set aside some time and provide feedback if you've tried the 
> > pre-release code base, thanks!
> >     > 
> >     > Best regards,
> >     > Ciyong
> >     > 
> >     > -----Original Message-----
> >     > From: Chen, Ciyong <ciyong.c...@intel.com> 
> >     > Sent: Monday, July 6, 2020 10:48 PM
> >     > To: d...@mxnet.apache.org
> >     > Cc: Bob Paulin <b...@apache.org>; Henri Yandell <bay...@apache.org>; 
> > Jason Dai <jason...@apache.org>; Markus Weimer <wei...@apache.org>; Michael 
> > Wall <mjw...@apache.org>
> >     > Subject: RE: [VOTE] Release Apache MXNet (incubating) version 
> > 1.7.0.rc0
> >     > 
> >     > For the language bindings and windows platform, may I have your 
> > support to help verify these features? Thanks!
> >     > 
> >     > @lanking520 to help verify the Scala/Java @gigasquid to help verify 
> > the Clojure
> >     > @hetong007 to help verify the R
> >     > @yajiedesign to help verify the windows platform
> >     > 
> >     > Best regards,
> >     > Ciyong Chen
> >     > 
> >     > -----Original Message-----
> >     > From: Chen, Ciyong <ciyong.c...@intel.com>
> >     > Sent: Monday, July 6, 2020 10:39 PM
> >     > To: d...@mxnet.apache.org
> >     > Cc: Bob Paulin <b...@apache.org>; Henri Yandell <bay...@apache.org>; 
> > Jason Dai <jason...@apache.org>; Markus Weimer <wei...@apache.org>; Michael 
> > Wall <mjw...@apache.org>
> >     > Subject: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0
> >     > 
> >     > Dear MXNet community,
> >     > 
> >     > This is the vote to release Apache MXNet (incubating) version 1.7.0. 
> > Voting will start July 6, 23:59:59 PST and close on July 9, 23:59:59 PST.
> >     > 
> >     > Link to release notes:
> >     > https://cwiki.apache.org/confluence/display/MXNET/1.7.0+Release+notes
> >     > 
> >     > Link to release candidate:
> >     > https://github.com/apache/incubator-mxnet/releases/tag/1.7.0.rc0
> >     > 
> >     > Link to source and signatures on apache dist server:
> >     > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.7.0.rc0/
> >     > 
> >     > Please remember to TEST first before voting accordingly:
> >     > +1 = approve
> >     > +0 = no opinion
> >     > -1 = disapprove (provide reason)
> >     > 
> >     > Additional notes:
> >     > 
> >     >   *   There was an issue and discussion[1] regarding a few numpy 
> > operators that fail with numpy 1.19.0, released on Jun 20, 2020; it exists 
> > in all branches (they work with numpy <= 1.18.5). As the numpy operator is 
> > still an experimental feature in the 1.7.0 release and mainly targets the 
> > MXNet 2.0 release, I decided not to block the voting and instead let the 
> > Community decide whether this is a blocker for the release.
> >     > 
> >     > [1] https://github.com/apache/incubator-mxnet/issues/18600
> >     > 
> >     > Best regards,
> >     > Ciyong Chen
> >     > 
> >     > 
> > 
> > 
>
