RE: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0

Chen, Ciyong Mon, 13 Jul 2020 19:13:36 -0700

Thanks, all, for the effort to double-check the performance status and for the
valuable comments. Let's not treat it as a blocker and move forward with the
1.7.0 release process.


Thanks,
-Ciyong

-----Original Message-----
From: Skalicky, Sam <sska...@amazon.com.INVALID> 
Sent: Tuesday, July 14, 2020 4:41 AM
To: dev@mxnet.incubator.apache.org; lau...@apache.org; d...@mxnet.apache.org
Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0

That’s a good point. 1.6 did have a performance regression, since it dropped
MKLML to simplify the build and fix licensing. 2.0 will also have performance
degradation in favor of new features. Clearly the community is focusing on
features rather than performance; at least we're consistent :-)

I would prefer we move forward with the 1.7.0 release and consider performance
fixes for 1.7.1 (as we did for 1.3.1/1.4.1).

Sam

On 7/13/20, 1:36 PM, "Leonard Lausen" <lau...@apache.org> wrote:

    One of the selling points of MXNet is (or used to be) speed, and having
    multiple releases in a row with speed regressions may not be acceptable to
    users who adopted MXNet for its speed advantage. Should we vote on a 1.7
    Beta release and only vote on the 1.7 final release once the regressions
    have been fixed?

    On Mon, 2020-07-13 at 19:33 +0000, Patrick Mu wrote:
    > It happens only on CPU. I did more runs and found that the runtime
    > fluctuates badly, but the average regression is ~10%.
    >
    >
    > In previous benchmarks I also found some worse regressions from 1.5 to
    > 1.6, such as Inception inference on CPU, and those regressions were not
    > caught.
    >
    > My 2 cents: it might not be a blocker for the release, and we can address
    > it in the upcoming 2.0 and 1.7.1 releases if necessary.
    >
    > Ziyi
    >
    > On 2020/07/13 08:40:32, "Chen, Ciyong" <ciyong.c...@intel.com> wrote:
    > > Thanks Ziyi,
    > >
    > > May I know on which platform you noticed the performance regression,
    > > CPU or GPU? A ~20% regression would be a large gap.
    > >
    > > Thanks,
    > > -Ciyong
    > >
    > > -----Original Message-----
    > > From: Patrick Mu <zm2...@columbia.edu>
    > > Sent: Monday, July 13, 2020 4:13 PM
    > > To: d...@mxnet.apache.org
    > > Subject: Re: RE: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0
    > >
    > > Hi Ciyong,
    > >
    > > I have reverted the commit, and I am able to train Yolov3 with no problem.
    > >
    > > However, I also noticed a ~20% regression in 1.7 compared with 1.6 for
    > > Yolov3 inference with the Module API, so we are going to discuss tomorrow
    > > whether that would be an issue for 1.7.
    > >
    > > Thanks,
    > > Ziyi
    > >
    > > On 2020/07/13 02:19:28, "Chen, Ciyong" <ciyong.c...@intel.com> wrote:
    > > > Hi Ziyi, Xingjian,
    > > >
    > > > Thanks for reporting the issues from the GluonCV/AutoGluon perspective.
    > > > I just did a quick try reverting
    > > > https://github.com/apache/incubator-mxnet/pull/18358, and the behavior
    > > > is then the same as 1.6.0 for the cases in the gist (
    > > > https://gist.github.com/sxjscience/944066c82e566f1b89b01fa226678890).
    > > >
    > > > Considering there are many end users of Gluon-based APIs/models, and
    > > > introducing a new patch to fix this issue could be risky, I agree that
    > > > reverting this PR (#18358) might be the best option for the 1.7.0
    > > > release. But are there any other test cases covering this feature,
    > > > which could help track this kind of code change in the future? Or can
    > > > you help verify whether this revert resolves the breakage on your side?
    > > >
    > > > > Thus, the real issue is: Should we support pickling a Gluon Block? If
    > > > > not, should we support combining multiprocessing.pool with the Gluon
    > > > > Block?
    > > > This seems more like a new feature for MXNet Gluon Block; perhaps we can
    > > > make it available in the next patch/minor release?
    > > >
    > > > Thanks,
    > > > -Ciyong
    > > >
    > > > -----Original Message-----
    > > > From: Xingjian SHI <xsh...@connect.ust.hk>
    > > > Sent: Saturday, July 11, 2020 4:27 AM
    > > > To: dev@mxnet.incubator.apache.org; d...@mxnet.apache.org
    > > > Subject: Re: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0
    > > >
    > > > Thanks Ziyi,
    > > >
    > > > I discovered the same issue when trying to use AutoGluon with
    > > > 1.7.0.rc0 and would like to share my findings:
    > > >
    > > > Basically, I don't think Gluon Block is designed to be picklable, but
    > > > pickling does work for some cases in the old version:
    > > >
    > > > I've included two cases in the gist (
    > > > https://gist.github.com/sxjscience/944066c82e566f1b89b01fa226678890).
    > > >
    > > > - Case1: we construct a Gluon block, hybridize it, and feed one NDArray
    > > > to initialize the block. After that, it is no longer picklable.
    > > > - Case2: we just construct a Gluon block; it is picklable in 1.6.0 but
    > > > not in 1.7.0.
    > > >
    > > > Thus, the real issue is: Should we support pickling a Gluon Block? If
    > > > not, should we support combining multiprocessing.pool with the Gluon
    > > > Block? For reference, PyTorch supports pickling nn.Module, as shown in
    > > > https://gist.github.com/sxjscience/90b812a66d445e759c55eedc3ef93668 and
    > > > in the docs (
    > > > https://pytorch.org/tutorials/beginner/saving_loading_models.html).
    > > >
    > > > Best,
    > > > Xingjian
    > > >
    > > >
    > > > On 7/10/20, 11:31 AM, "Patrick Mu" <zm2...@columbia.edu> wrote:
    > > >
    > > >     Hi Ciyong,
    > > >
    > > >     I just discovered an issue with 1.7 that causes Yolo training with
    > > > the latest GluonCV Yolo to fail.
    > > >
    > > >     The PR that causes the failure is
    > > > https://github.com/apache/incubator-mxnet/pull/18358, which modifies
    > > > basic blocks of Gluon to fix a memory leak.
    > > >
    > > >     I talked with Leonard, the author of the PR, and he said he found
    > > > the root cause, but patching that PR would modify those Gluon basic
    > > > blocks further, which might be risky for existing models and various
    > > > customer models.
    > > >
    > > >     So my 2 cents: revert this PR in 1.7 and try patching it in 1.x
    > > > and 2.0, which means 1.7 won't have the memory usage optimization from
    > > > that PR.
    > > >
    > > >     I'd like to hear what you think about this issue.
    > > >
    > > >     Thanks,
    > > >     Ziyi
    > > >
    > > >
    > > >     On 2020/07/10 06:18:02, "Chen, Ciyong" <ciyong.c...@intel.com> wrote:
    > > >     > Hi Community,
    > > >     >
    > > >     > I would like to call for action to test/validate/vote for the
    > > >     > release candidate (1.7.0.rc0). As there were no votes during the
    > > >     > scheduled time window, I would like to extend the voting window to
    > > >     > July 13, 23:59:59 PST.
    > > >     > Please set aside some time and provide feedback if you've tried the
    > > >     > pre-release code base. Thanks!
    > > >     >
    > > >     > Best regards,
    > > >     > Ciyong
    > > >     >
    > > >     > -----Original Message-----
    > > >     > From: Chen, Ciyong <ciyong.c...@intel.com>
    > > >     > Sent: Monday, July 6, 2020 10:48 PM
    > > >     > To: d...@mxnet.apache.org
    > > >     > Cc: Bob Paulin <b...@apache.org>; Henri Yandell <bay...@apache.org>;
    > > >     > Jason Dai <jason...@apache.org>; Markus Weimer <wei...@apache.org>;
    > > >     > Michael Wall <mjw...@apache.org>
    > > >     > Subject: RE: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0
    > > >     >
    > > >     > For the language bindings and the Windows platform, may I have your
    > > >     > support in verifying these features? Thanks!
    > > >     >
    > > >     > @lanking520 to help verify Scala/Java
    > > >     > @gigasquid to help verify Clojure
    > > >     > @hetong007 to help verify R
    > > >     > @yajiedesign to help verify the Windows platform
    > > >     >
    > > >     > Best regards,
    > > >     > Ciyong Chen
    > > >     >
    > > >     > -----Original Message-----
    > > >     > From: Chen, Ciyong <ciyong.c...@intel.com>
    > > >     > Sent: Monday, July 6, 2020 10:39 PM
    > > >     > To: d...@mxnet.apache.org
    > > >     > Cc: Bob Paulin <b...@apache.org>; Henri Yandell <bay...@apache.org>;
    > > >     > Jason Dai <jason...@apache.org>; Markus Weimer <wei...@apache.org>;
    > > >     > Michael Wall <mjw...@apache.org>
    > > >     > Subject: [VOTE] Release Apache MXNet (incubating) version 1.7.0.rc0
    > > >     >
    > > >     > Dear MXNet community,
    > > >     >
    > > >     > This is the vote to release Apache MXNet (incubating) version 1.7.0.
    > > >     > Voting will start July 6, 23:59:59 PST and close on July 9, 23:59:59 PST.
    > > >     >
    > > >     > Link to release notes:
    > > >     >
    > > >     > https://cwiki.apache.org/confluence/display/MXNET/1.7.0+Release+notes
    > > >     >
    > > >     > Link to release candidate:
    > > >     > https://github.com/apache/incubator-mxnet/releases/tag/1.7.0.rc0
    > > >     >
    > > >     > Link to source and signatures on the Apache dist server:
    > > >     >
    > > >     > https://dist.apache.org/repos/dist/dev/incubator/mxnet/1.7.0.rc0/
    > > >     >
    > > >     > Please remember to TEST first before voting accordingly:
    > > >     > +1 = approve
    > > >     > +0 = no opinion
    > > >     > -1 = disapprove (provide reason)
    > > >     >
    > > >     > Additional notes:
    > > >     >
    > > >     >   *   There was an issue and discussion [1] regarding a few numpy
    > > >     >       operators that fail with numpy 1.19.0 (released on Jun 20, 2020);
    > > >     >       this affects all branches (they work with numpy <= 1.18.5). As
    > > >     >       the numpy operators are still an experimental feature in the
    > > >     >       1.7.0 release and mainly target the MXNet 2.0 release, I decided
    > > >     >       not to block the vote and instead let the community decide
    > > >     >       whether this is a blocker for the release.
    > > >     >
    > > >     > [1] https://github.com/apache/incubator-mxnet/issues/18600
    > > >     >
    > > >     > Best regards,
    > > >     > Ciyong Chen
    > > >     >
    > > >     >
    > > >
    > > >
