It happens only on CPU. I did more runs and found that the runtime fluctuates a lot, but the average regression is ~10%.

Through previous benchmarks I also found some worse regressions when comparing 1.6 to 1.5, such as Inception inference on CPU, and those regressions were not caught.

My 2 cents: it might not be a blocker for the release, and there is room to improve this in the upcoming 2.0 and 1.7.1 if necessary.

Ziyi

On 2020/07/13 08:40:32, "Chen, Ciyong" <ciyong.c...@intel.com> wrote:
> Thanks Ziyi,
>
> May I know on which platform you noticed the performance regression, CPU or
> GPU? A ~20% regression would be a large gap.
>
> Thanks,
> -Ciyong
>
> -----Original Message-----
> From: Patrick Mu <zm2...@columbia.edu>
> Sent: Monday, July 13, 2020 4:13 PM
> To: d...@mxnet.apache.org
> Subject: Re: RE: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0
>
> Hi Ciyong,
>
> I have reverted the commit, and I am able to train Yolov3 with no problem.
>
> However, I also noticed a ~20% regression in 1.7 compared with 1.6 in
> Yolov3 inference with the Module API, so we are going to discuss tomorrow
> whether that would be an issue for 1.7.
>
> Thanks,
> Ziyi
>
> On 2020/07/13 02:19:28, "Chen, Ciyong" <ciyong.c...@intel.com> wrote:
> > Hi Ziyi, Xingjian,
> >
> > Thanks for reporting the issues from the GluonCV/AutoGluon perspective.
> > I just did a quick try by reverting
> > https://github.com/apache/incubator-mxnet/pull/18358, and the behavior is
> > then the same as 1.6.0 for the cases in the gist
> > (https://gist.github.com/sxjscience/944066c82e566f1b89b01fa226678890).
> >
> > Considering that many end users rely on Gluon-based APIs/models, and that
> > introducing a new patch to fix this issue could be risky, I agree that
> > reverting this PR (#18358) might be the best option for the 1.7.0 release.
> > But I wonder whether there are any other test cases that cover this
> > feature, which could help track this kind of code change in the future.
> > Or can you help verify that this revert does resolve the breakage on your
> > side?
> >
> > > Thus, the real issue is: Should we support pickling a Gluon Block? If
> > > not, should we support combining multiprocessing.pool with the Gluon
> > > Block?
> > It seems more like a new feature for the MXNet Gluon Block; perhaps we can
> > make it available in the next patch/minor release?
> >
> > Thanks,
> > -Ciyong
> >
> > -----Original Message-----
> > From: Xingjian SHI <xsh...@connect.ust.hk>
> > Sent: Saturday, July 11, 2020 4:27 AM
> > To: dev@mxnet.incubator.apache.org; d...@mxnet.apache.org
> > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0
> >
> > Thanks Ziyi,
> >
> > I discovered the same issue when trying to use AutoGluon with 1.7.0rc0
> > and would like to share my findings:
> >
> > Basically, I don't think Gluon Block is designed to be picklable. But
> > pickling does work for some cases in the old version.
> >
> > I've included two cases in the gist
> > (https://gist.github.com/sxjscience/944066c82e566f1b89b01fa226678890).
> >
> > - Case1: we construct a gluon block, hybridize it and feed one NDArray to
> >   help initialize the block. After that, it will no longer be picklable.
> > - Case2: we just construct a gluon block; it will be picklable in 1.6.0,
> >   but won't be picklable in 1.7.0.
> >
> > Thus, the real issue is: Should we support pickling a Gluon Block? If
> > not, should we support combining multiprocessing.pool with the Gluon Block?
> > For reference, PyTorch supports pickling the nn.Module as shown in
> > https://gist.github.com/sxjscience/90b812a66d445e759c55eedc3ef93668 and
> > also in the doc
> > (https://pytorch.org/tutorials/beginner/saving_loading_models.html).
> >
> > Best,
> > Xingjian
> >
> >
> > On 7/10/20, 11:31 AM, "Patrick Mu" <zm2...@columbia.edu> wrote:
> >
> > Hi Ciyong,
> >
> > I just discovered an issue with 1.7 that causes Yolo training with the
> > latest Gluon CV Yolo to fail.
> >
> > The PR that causes the failure is
> > https://github.com/apache/incubator-mxnet/pull/18358, which modifies basic
> > blocks of Gluon to fix a memory leak issue.
> >
> > I talked with Leonard, the author of the PR, and he said he found the
> > root cause, but patching that PR would modify those Gluon basic blocks
> > further, which might be risky for existing models and various customer
> > models.
> >
> > So my 2 cents is to revert this PR in 1.7 and try patching the PR in
> > 1.x and 2.0, meaning that 1.7 won't have the memory usage optimization
> > from that feature.
> >
> > I'd like to hear what you think about this issue.
> >
> > Thanks,
> > Ziyi
> >
> >
> > On 2020/07/10 06:18:02, "Chen, Ciyong" <ciyong.c...@intel.com> wrote:
> > > Hi Community,
> > >
> > > I would like to call for action to test/validate/vote for the release
> > > candidate (1.7.0.rc0).
> > > As there has not been any voting result during the scheduled time
> > > window, I would like to extend the time window to July 13, 23:59:59 PST.
> > > Please set aside some time and provide feedback if you've tried the
> > > pre-release code base, thanks!
> > >
> > > Best regards,
> > > Ciyong
> > >
> > > -----Original Message-----
> > > From: Chen, Ciyong <ciyong.c...@intel.com>
> > > Sent: Monday, July 6, 2020 10:48 PM
> > > To: d...@mxnet.apache.org
> > > Cc: Bob Paulin <b...@apache.org>; Henri Yandell <bay...@apache.org>;
> > > Jason Dai <jason...@apache.org>; Markus Weimer <wei...@apache.org>;
> > > Michael Wall <mjw...@apache.org>
> > > Subject: RE: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0
> > >
> > > For the language bindings and Windows platform, may I have your
> > > support to help verify these features? Thanks!
> > >
> > > @lanking520 to help verify the Scala/Java
> > > @gigasquid to help verify the Clojure
> > > @hetong007 to help verify the R
> > > @yajiedesign to help verify the windows platform
> > >
> > > Best regards,
> > > Ciyong Chen
> > >
> > > -----Original Message-----
> > > From: Chen, Ciyong <ciyong.c...@intel.com>
> > > Sent: Monday, July 6, 2020 10:39 PM
> > > To: d...@mxnet.apache.org
> > > Cc: Bob Paulin <b...@apache.org>; Henri Yandell <bay...@apache.org>;
> > > Jason Dai <jason...@apache.org>; Markus Weimer <wei...@apache.org>;
> > > Michael Wall <mjw...@apache.org>
> > > Subject: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0
> > >
> > > Dear MXNet community,
> > >
> > > This is the vote to release Apache MXNet (incubating) version 1.7.0.
> > > Voting will start July 6, 23:59:59 PST and close on July 9, 23:59:59 PST.
> > >
> > > Link to release notes:
> > > https://cwiki.apache.org/confluence/display/MXNET/1.7.0+Release+notes
> > >
> > > Link to release candidate:
> > > https://github.com/apache/incubator-mxnet/releases/tag/1.7.0.rc0
> > >
> > > Link to source and signatures on apache dist server:
> > > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.7.0.rc0/
> > >
> > > Please remember to TEST first before voting accordingly:
> > > +1 = approve
> > > +0 = no opinion
> > > -1 = disapprove (provide reason)
> > >
> > > Additional notes:
> > >
> > > * There was an issue and discussion [1] about a few numpy operators
> > >   failing due to numpy 1.19.0, released on Jun 20, 2020; the issue exists
> > >   in all branches (which work with numpy <= 1.18.5). As the numpy operator
> > >   is still an experimental feature in the 1.7.0 release and mainly targets
> > >   the MXNet 2.0 release, I decided not to block the voting and instead let
> > >   the Community decide whether this is a blocker for the release.
> > >
> > > [1] https://github.com/apache/incubator-mxnet/issues/18600
> > >
> > > Best regards,
> > > Ciyong Chen