I found two bugs in the Gluon Trainer when it is used with a distributed
KVStore. Basically, if someone uses Gluon for distributed training with a
learning rate schedule (e.g. training ResNet50 for image classification),
it won't work.

https://github.com/apache/incubator-mxnet/issues/12713

I have the fix for the first bug locally, but I don't have the fix for the
second one.

Best,
Haibin

On Mon, Oct 1, 2018 at 10:14 AM Afrooze, Sina <sina....@gmail.com> wrote:

> This post suggests there is a regression from 1.1.0 to 1.2.1 related to
> MKLDNN integration:
> https://discuss.mxnet.io/t/mxnet-1-2-1-module-get-outputs/1882
>
> The error is related to MKLDNN layout not being converted back to MXNet
> layout in some operator: " !IsMKLDNNData() We can’t generate TBlob for
> MKLDNN data. Please use Reorder2Default() to generate a new NDArray first"
>
> Sina
>
>
>
>
> On 9/30/18, 6:55 PM, "Steffen Rochel" <steffenroc...@gmail.com> wrote:
>
>     Thanks Patrick.
>     Updated roadmap and next release content.
>
>     Patrick - I suggest sending a reminder to review the design doc and
>     collect feedback.
>     Are there still known issues or gaps before we declare MKL-DNN
>     integration as GA?
>
>     Regards,
>     Steffen
>
>     On Sat, Sep 29, 2018 at 1:31 AM Zhao, Patric <patric.z...@intel.com> wrote:
>
>     > Thanks, Steffen.
>     >
>     > Regarding the next release note, two items from our side:
>     >
>     > 1. (-remove) MKL-DNN integration is done. I think we can remove this item.
>     > 2. (+add) MKL-DNN based graph optimization and quantization by subgraph
>     >     Design doc:
>     >     https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN
>     >     Lead Contributor: Patric Zhao, https://github.com/pengzhao-intel/
>     >
>     > Regarding the Roadmap
>     > (+add) Q1 2019: MKL-DNN RNN API support
>     >
>     > BR,
>     >
>     > Thanks,
>     >
>     > --Patric
>     >
>     >
>     > > -----Original Message-----
>     > > From: kellen sunderland [mailto:kellen.sunderl...@gmail.com]
>     > > Sent: Saturday, September 29, 2018 11:31 AM
>     > > To: dev@mxnet.incubator.apache.org
>     > > Subject: Re: [Discuss] Next MXNet release
>     > >
>     > > Sorry I meant to say next 'Regarding the *minor* release'.
>     > >
>     > > On Sat, Sep 29, 2018 at 5:27 AM kellen sunderland <
>     > > kellen.sunderl...@gmail.com> wrote:
>     > >
>     > > > Thanks for transparently setting a rough timeline Steffen.  I think
>     > > > this will go a long way in helping the community plan their work, even
>     > > > if the details change somewhat on the road to the release.
>     > > >
>     > > > Regarding the major release: I would propose we unify TensorRT with
>     > > > the subgraph operator work.
>     > > >
>     > > > Regarding the patch release:  There were a few minor stack/buffer
>     > > > overflows exposed by ASAN that have been addressed.  It's probably a
>     > > > good idea to include them in a patch release, as they at best result
>     > > > in non-deterministic behaviour.
>     > > >
>     > > > -Kellen
>     > > >
>     > > >
>     > > > On Sat, Sep 29, 2018 at 1:39 AM Steffen Rochel
>     > > > <steffenroc...@gmail.com>
>     > > > wrote:
>     > > >
>     > > >> I updated
>     > > >> https://cwiki.apache.org/confluence/display/MXNET/Project+Proposals+for+next+MXNet+Release ,
>     > > >> removed the completed items from the 1.3 release and would like to kick
>     > > >> off discussion about the next release. Please suggest what you would
>     > > >> like to see included in the next release, together with a link to the
>     > > >> design proposal (appropriate for the size and complexity of the
>     > > >> proposal), or suggest changes.
>     > > >> I suggest targeting the next release for December 2018 to frame the
>     > > >> discussion.
>     > > >> Let's include a review of
>     > > >> https://cwiki.apache.org/confluence/display/MXNET/MXNet+Roadmap -
>     > > >> time to update and discuss changes.
>     > > >>
>     > > >> From the 1.3 release we had discussion regarding
>     > > >> https://github.com/apache/incubator-mxnet/issues/11849 and resolution in
>     > > >> https://github.com/apache/incubator-mxnet/pull/12412 .
>     > > >> Are you aware of critical issues and feedback from users which we
>     > > >> should consider for a potential 1.3.1 patch release? Should we
>     > > >> include PR 12412 in a potential patch release?
>     > > >>
>     > > >> Regards,
>     > > >> Steffen
>     > > >>
>     > > >
>     >
>
>
>
>
