Thanks everyone for testing and voting for the release. I am working with Sheng to finalize and post the release. Announcement will follow soon.
Regards,
Roshani

On Mon, Sep 10, 2018 at 7:03 AM kellen sunderland <kellen.sunderl...@gmail.com> wrote:

Tracked down the issue referred to above and it's not a bug. I'll update the ticket.

Changing to +1.

On Mon, Sep 10, 2018 at 3:00 PM kellen sunderland <kellen.sunderl...@gmail.com> wrote:

-0.1

There's one test failure I've run into (details below). Following Indhu's logic I don't think this should block the release as it's not related to a feature introduced in this version.

I'm trying to use the cpp-package examples as reference code for how to run MXNet models from a native context. I'd like to run them with ASAN as a sanity check for memory leaks and pointer errors. I was continually running into segfaults and crashes w/ and w/o ASAN. A little googling shows me that this issue has already been reported, and is related to running tests on CPU, not to any changes I made:
https://github.com/apache/incubator-mxnet/issues/9814
Having what are effectively our reference examples crash is not a good practice IMO.

I also share some concerns around the fp16 failures. I know developers who are currently porting their models to Gluon who use fp16. They'll be disappointed with the error.

In general though, release looks good. Big thanks to Sheng and Roshani for putting it together (and sorry for the late testing).

-Kellen

On Fri, Sep 7, 2018 at 4:31 AM Anirudh <anirudh2...@gmail.com> wrote:

-1 Considering that using fp16 with gluon is much easier than the alternative where you need access to the model code, this fix is really useful. I understand the pain of doing an mxnet release and appreciate Roshani and Sheng's efforts, but this seems like something we should fix.

On Thu, Sep 6, 2018, 4:57 PM Haibin Lin <haibin.lin....@gmail.com> wrote:

+1 built from source and passes dist_sync_kvstore test on Ubuntu.

Best,
Haibin

On Thu, Sep 6, 2018 at 1:32 PM Indhu <indhubhara...@gmail.com> wrote:

+1

The release candidate looks good. I'm able to build and run basic models.

On the FP16 issue:

Like others have pointed out, releases are expensive in terms of time and effort. There needs to be a high and more objective bar on what qualifies as a release blocker to make sure we are not setting a precedent for a lot of release blockers in future.

I think a release blocker is justified only if there is a serious bug discovered in one of the features included in the release or if there is a regression. Given FP16 support is not a new feature claimed in this release and this is not a regression in this release candidate, I'm inclined to release this candidate and include the FP16 fix in a subsequent release.

Thanks,
Indu

On Wed, Sep 5, 2018 at 10:21 AM Aaron Markham <aaron.s.mark...@gmail.com> wrote:

0 (non-binding) If we have a problem that blocks users, and a solution in hand... then we should fix it, but not at the expense of starting the release cycle again just for one fix. Users can cherry pick or build from master if they want the fix right away, right? I'd change my mind to -1 if this wasn't the case, with good reason, and if the user impact was critical to adoption or risks abandonment.

On Wed, Sep 5, 2018 at 9:57 AM Roshani Nagmote <roshaninagmo...@gmail.com> wrote:

I believe everyone here is working hard to make MXNet a better framework for users. It's completely okay to have different opinions; we can decide together if this issue is a blocker or not after voting time is over.

As I mentioned before, voting will end at 7 pm today. So there is still time to test the release. If there are any other issues anyone finds, I will be happy to start the process again and work on RC1. For now, I want to encourage everyone to utilize this time and vote. :)

Thanks,
Roshani

On Tue, Sep 4, 2018 at 10:35 PM sandeep krishnamurthy <sandeep.krishn...@gmail.com> wrote:

1. As an Apache MXNet community member, I raised the concern of broken functionality for the user. I explained and provided the data points on the issue, the workaround and why I think it is important. If after all this, you think my vote is biased by my employer just because a user I quoted is from Amazon, this is more concerning to me on my voting abilities.
2. My -1 nowhere undermines the huge amount of effort that goes on behind the scenes for a release to happen. Great respect and recognition for everyone involved in all the releases of MXNet in the past and this one. I voted on my judgement of what may be good for the users of MXNet.
3. As pointed out by Naveen & Chris, -1 votes are NOT vetoes. Feel free to decide and progress on the release as we already have >3 +1 in this thread.

Best,

Sandeep

On Tue, Sep 4, 2018 at 8:29 PM Chris Olivier <cjolivie...@gmail.com> wrote:

btw, there are no vetoes on package releases:

VOTES ON PACKAGE RELEASES
<https://www.apache.org/foundation/voting.html#ReleaseVotes>

Votes on whether a package is ready to be released use majority approval <https://www.apache.org/foundation/glossary.html#MajorityApproval> -- i.e. at least three PMC members must vote affirmatively for release, and there must be more positive than negative votes. Releases may not be vetoed. Generally the community will cancel the release vote if anyone identifies serious problems, but in most cases the ultimate decision lies with the individual serving as release manager. The specifics of the process may vary from project to project, but the 'minimum quorum of three +1 votes' rule is universal.

On Tue, Sep 4, 2018 at 7:12 PM Sheng Zha <szha....@gmail.com> wrote:

Thanks for sharing your opinions, Thomas. Your recognition and respect of people's efforts on preparing the release candidate are certainly appreciated.

Now that the vote is set to fail thanks to the veto, there will be plenty of opportunities to include those bug fixes, including the one Zhi mentioned [1], which was already merged into master, and yet Zhi chose not to block this release with it [2]. I will be happy to work with Roshani to prepare another release candidate once ready.

-sz

[1] https://lists.apache.org/thread.html/f02e952bec22c82cb00a6741390a78f55373311c97464997bb455a6c@%3Cdev.mxnet.apache.org%3E
[2] https://lists.apache.org/thread.html/85d3fcabb3437ba7f1af455cf69aa13eb3afd1ea1d1f6f891e9c339c@%3Cdev.mxnet.apache.org%3E

On Tue, Sep 4, 2018 at 6:02 PM Thomas DELTEIL <thomas.delte...@gmail.com> wrote:

-0
(non-binding)

If I may add some nuancing plus a personal data point as one of the users commenting in the bug report in question:

- Performance vs. Basic functionality => I don't think high performance use-cases and basic functionality are two obviously opposed concepts and see no contradiction in Hagay's and Sandeep's statements.
Float16 support is a feature of MXNet that provides more than twice the performance of Float32 on supported platforms, hence the high performance use-case. The bug is that the basic functionality of reloading a saved float16 model is currently broken.

- This bug vs Other bugs => Contrary to the vast majority of the 140 open bugs that are mentioned above, I would put to Sandeep's credit that this one bug has a PR open that provides a fix for it. This would make it a better candidate to get included in this release than a bug that has no fix ready for it.

- Personal datapoint: I recently did some experimentation with float16 [1] and actually coincidentally just published a video on optimizing performance for Gluon. Float16 conversion is one of the most, if not the most effective way to get performance out of MXNet [2]. I believe there is a lot of value in publicizing its use more and hence making sure at least the basic support for normal use-cases is present.

Of course this needs to be balanced with the overhead of preparing a new release candidate once the fix is reviewed and merged, which seems to be a lengthy and complex process in its own right, and the delay with providing the other features present in 1.3 for users that are not running off the nightly builds.

All the best,

Thomas

[1] https://github.com/ThomasDelteil/PerformanceTricksMXNetGluon
[2] https://www.youtube.com/watch?v=Cqo7FPftNyo&t=0s&list=PLkEvNnRk8uVk6U515Pj-jHQUxFC4eDi3m

On Tue, Sep 4, 2018 at 5:11 PM Sheng Zha <szha....@gmail.com> wrote:

Sandeep,

Thanks for explaining your veto. We have open bugs that impacted a lot more than just 3 customers, just by referring to the number of commenters on the issue [1].

You said that this is for "high performance use cases", which contradicts Hagay's assessment that this is "basic functionality broken".
Given that this is for advanced use cases of using half-precision training, why is it so much more important than any other open bug reports, that for this specific bug fix, we have to delay the access of regular users to the new MXNet 1.3 release by at least another week?

Honestly, I'm concerned that your vote is biased by Amazon involvement, given that you quoted Amazon Rekognition.

-sz

[1] https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+is%3Aopen+label%3ABug+sort%3Acomments-desc

On Tue, Sep 4, 2018 at 4:51 PM sandeep krishnamurthy <sandeep.krishn...@gmail.com> wrote:

My initial vote of "-0" was due to lack of info from a user who had said he overcame this issue for an FP16 model.

However, the suggested workaround [1] for the issue is not straightforward and generally usable for all users. Also, the issue is not simple and isolated enough to be listed in the Release Notes as a known issue with a workaround.

Changing my vote to: "-1 (binding)" owing to the user impact [3]

@Sheng:

1. Agreed, the bug has existed for a long time. However, FP16 and such optimizations were added later on, followed by users [2] using this feature for high performance use cases. It is not ok to measure the severity of the bug based on its past existence; rather, we can see who is impacted now and whether it is a small subset with a simple workaround or a large user-impacting issue.

2. Agreed, the bug was reported 7/21. However, I became aware of this issue on 08/29 and submitted the fix on 08/30. Also, I did bring this to the notice of the community, you and the 1.3 release manager (Roshani) on the RC0 proposal thread. Also, I would focus on the issue and user impact rather than who identified and who is fixing the issue.

Based on my discussion with 2 users, I think it is an important feature for them to see in Apache MXNet v1.3.0.

Best,

Sandeep

[1] Workaround used by the user.

    import mxnet as mx

    ctx = mx.cpu()  # assumption: the original snippet used `ctx` without defining it

    net_fp16 = mx.gluon.SymbolBlock.imports('resnet34_fp16-symbol.json', ['data'])
    params_fp16 = mx.nd.load('resnet34_fp16-0000.params')

    # Cast each parameter to the dtype it was saved with before loading.
    for k, v in params_fp16.items():
        new_key = k.split(':')[1]  # drop the prefix before ':' in the saved name
        net_fp16.collect_params()[new_key].cast(v.dtype)

    net_fp16.collect_params().load('resnet34_fp16-0000.params', ctx)

[2] Amazon Rekognition

[3] User story: Train a model -> Cast it to FP16 -> Save the model -> Load back the model does not work. They have to cast every parameter with the workaround mentioned above [1].

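The `k.split(':')[1]` step in workaround [1] relies on each saved parameter name carrying a prefix before the colon. Below is a minimal sketch of that renaming in plain Python, with no MXNet needed. It assumes prefixes of the form `arg:` and `aux:`; the helper name and the parameter names are made up for illustration and are not taken from the user's model.

```python
def strip_saved_prefix(saved_keys):
    """Map each saved key such as 'arg:conv0_weight' to the bare
    parameter name 'conv0_weight' by dropping everything up to the
    first colon (the same idea as k.split(':')[1] in the workaround)."""
    return {key: key.split(':', 1)[1] for key in saved_keys}


# Hypothetical saved keys, for illustration only.
saved = ['arg:conv0_weight', 'arg:fc_bias', 'aux:bn0_running_mean']
bare_names = strip_saved_prefix(saved)

for saved_key, bare in bare_names.items():
    print(saved_key, '->', bare)
```

In the workaround, the bare name is what gets looked up in `collect_params()` so that the parameter can be cast to the saved dtype before loading.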
On Tue, Sep 4, 2018 at 4:14 PM Hagay Lupesko <lupe...@gmail.com> wrote:

Hi Sheng,

Addressing your questions:

- "why this specific bug is more important than all the other known bugs, that this becomes a release blocker"
I do not consider it to be more or less important than other fixes. It can be fixed and included in the release alongside the rest of the release content, right?
From the description of the issue it seems important since it is blocking users from loading models that were previously trained and saved. There is nothing stopping the community from including this fix into 1.3.0, alongside the rest of the features and fixes.

- "The bug exists since SymbolBlock was introduced a year ago and has survived at least three releases, so this is not a regression."
I do not think I said it is a regression.
However, the fact a bug existed before does not mean it is OK to release it rather than fix it.

- "Timeline-wise, this bug was reported on 7/21, but was not reported as release-blocker in the release discussion thread until 8/31 [1]. Neither its reporting as release-blocker nor its fix made it for the 8/3 code freeze."
You are right, it would have been better to have this identified and fixed earlier and included before code freeze.

- "The PR is still not ready yet as it doesn't have approval."
I think it is waiting for your review.

- "it would be great if you could provide some additional reasoning besides "X mentions the issue" or "fix was done by X""
I have. Repeating what I wrote in my previous email for clarity: Basic functionality broken: loading a model (albeit one that was saved as non FP32)

So, yes - this issue seems to have been out there for a while, somehow went under the radar...
but I think the key question is whether this blocks a basic functionality in MXNet. I believe so, hence my -1 vote.

Hagay

On Tue, Sep 4, 2018 at 1:19 PM Sheng Zha <szha....@gmail.com> wrote:

Hi Hagay and Sandeep,

Could you help us understand why this specific bug is more important than all the other known bugs, that this becomes a release blocker?

Some facts to consider:
- The bug exists since SymbolBlock was introduced a year ago and has survived at least three releases, so this is not a regression.
- Timeline-wise, this bug was reported on 7/21, but was not reported as release-blocker in the release discussion thread until 8/31 [1]. Neither its reporting as release-blocker nor its fix made it for the 8/3 code freeze.
- The PR is still not ready yet as it doesn't have approval.

Hagay, it would be great if you could provide some additional reasoning besides "X mentions the issue" or "fix was done by X". Thanks.

-sz

[1] https://lists.apache.org/thread.html/d1ed611f98c20d5d85c294b0c07c8bdebca13a209cf66a3872c9123e@%3Cdev.mxnet.apache.org%3E

On Tue, Sep 4, 2018 at 12:39 PM Hagay Lupesko <lupe...@gmail.com> wrote:

Sandeep mentions the issue of an error when a user tries to load model params trained/saved as FP16.
https://github.com/apache/incubator-mxnet/issues/11849
The fix was done by Sandeep: https://github.com/apache/incubator-mxnet/pull/12412 and is ready to be cherry picked into the release branch.

This seems like a release blocker to me:
- Basic functionality broken: loading a model (albeit one that was saved as non FP32)
- Reported by 3 users (wgchang@, nicklhy@ and ThomasDelteil@)

-1 (non binding)

Hagay

On Tue, Sep 4, 2018 at 12:01 PM sandeep krishnamurthy <sandeep.krishn...@gmail.com> wrote:

"- 0"

I believe the bug #11849 <https://github.com/apache/incubator-mxnet/issues/11849>, unable to import non-fp32 models into Gluon, fixed in this PR #12412 <https://github.com/apache/incubator-mxnet/pull/12412>, is important for the users. I would rather pick this fix in this release than plan a minor release later.

Best,
Sandeep

On Mon, Sep 3, 2018 at 2:34 PM Philip Cho <chohy...@cs.washington.edu> wrote:

Actually, the command "git clone --recursive https://github.com/apache/incubator-mxnet -b 1.3.0.rc0" works fine now, never mind.

On Mon, Sep 3, 2018 at 1:45 PM Philip Cho <chohy...@cs.washington.edu> wrote:

Unfortunately, MXNet was depending on a branch of TVM that is now deleted. We will have to merge #12448 <https://github.com/apache/incubator-mxnet/pull/12448> before the release.

Background: See dmlc/tvm#1394 <https://github.com/dmlc/tvm/issues/1394>.

Philip.

On Mon, Sep 3, 2018 at 7:26 AM Carin Meier <carinme...@gmail.com> wrote:

Checked out the tag, built and tested the Clojure package. +1

On Fri, Aug 31, 2018 at 10:59 PM Roshani Nagmote <roshaninagmo...@gmail.com> wrote:

Hi all,

I would like to propose a vote to release Apache MXNet (incubating) version 1.3.0.RC0. Voting will start now (Friday, Aug 31st) and end at 7:00 PM PDT, Wednesday, Sept 5th.

Link to release notes:
https://github.com/apache/incubator-mxnet/releases

Link to release candidate 1.3.0.rc0:
https://github.com/apache/incubator-mxnet/releases/tag/1.3.0.rc0

View this page, click on "Build from Source", and use the source code obtained from 1.3.0.rc0 tag:
https://mxnet.incubator.apache.org/install/index.html

Please remember to TEST first before voting accordingly:

+1 = approve
+0 = no opinion
-1 = disapprove (provide reason)

Thanks,
Roshani
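The majority-approval rule quoted earlier in this thread (at least three PMC members voting affirmatively, and more positive than negative votes) can be sketched as a small tally. This is only an illustration under stated assumptions: the function name and the `(value, binding)` encoding are hypothetical, only binding (PMC) votes are counted on either side, and fractional votes such as -0.1 are recorded as 0.

```python
from typing import Iterable, Tuple


def release_vote_passes(votes: Iterable[Tuple[int, bool]]) -> bool:
    """Return True if a release vote passes under majority approval.

    Each vote is a (value, binding) pair where value is +1, 0 or -1 and
    binding marks a PMC member's vote. Assumption: non-binding votes
    are ignored when counting.
    """
    votes = list(votes)  # allow any iterable, including generators
    plus = sum(1 for value, binding in votes if binding and value > 0)
    minus = sum(1 for value, binding in votes if binding and value < 0)
    # At least three affirmative binding votes, and more +1 than -1.
    # There is no veto: a single -1 does not fail the vote by itself.
    return plus >= 3 and plus > minus


# Made-up ballot: three binding +1, one binding -1, one non-binding -1.
example = [(1, True), (1, True), (1, True), (-1, True), (-1, False)]
print(release_vote_passes(example))
```

The release manager still has discretion to cancel a vote when serious problems are identified; the sketch only covers the counting rule.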