Hi Seb: please use a different email thread for new topics of discussion.

Hi Jun: I think Seb may be referring to Volta V100 support in MXNet and NOT
P4/P40 inference accelerators.

Corrections/clarifications welcome.

Bhavin Thaker.

On Mon, Oct 2, 2017 at 8:22 PM Jun Wu <wujun....@gmail.com> wrote:

> Thanks for your attention, Seb. We are inclined to be cautious about what
> we can claim for this project. TensorRT already supports converting
> TensorFlow and Caffe models to its own format for fast inference, but it
> does not yet support MXNet models. In that sense, it may not be fair to
> claim that MXNet is the first framework to support Nvidia Volta.
>
> What we are working on is more experimental and research oriented. We want
> first-hand experience, so we are building an INT-8 inference prototype
> ourselves to develop a thorough understanding of its strengths and
> limitations, rather than handing everything off to TensorRT, which is
> opaque to us. Since the project is experimental, it's still too early to
> draw conclusions: there are plenty of known and unknown issues and plenty
> of unfinished work.
>
> On the other hand, we are glad to hear that Nvidia is working on supporting
> model conversion from MXNet to TensorRT (Dom, please correct me if I'm
> mistaken). It would be super beneficial to MXNet's INT-8 work if they could
> open-source that effort, since we would then be able to maintain it and add
> new features on our side.
>
>
> On Mon, Oct 2, 2017 at 8:04 PM, Dominic Divakaruni <
> dominic.divakar...@gmail.com> wrote:
>
> > 👏
> >
> > On Mon, Oct 2, 2017 at 8:02 PM Seb Kiureghian <sebou...@gmail.com>
> wrote:
> >
> > > It would be awesome if MXNet were the first DL framework to support
> > > Nvidia Volta. What do you all think about cutting a v0.12 release once
> > > that integration is ready?
> > >
> > > On Wed, Sep 27, 2017 at 10:38 PM, Jun Wu <wujun....@gmail.com> wrote:
> > >
> > > > I had been working on the sparse tensor project with Haibin. After
> > > > its first stage wrapped up, I started work on the quantization
> > > > project (INT-8 inference). The benefits of using quantized models for
> > > > inference are much higher throughput than FP32 models, with
> > > > acceptable accuracy loss, and more compact models that fit on small
> > > > devices. The work currently targets quantizing ConvNets, and we will
> > > > consider expanding it to RNNs after getting good results on images.
> > > > It is also expected to support quantization on CPU, GPU, and mobile
> > > > devices.
> > > >
> > >
> > --
> >
> >
> > Dominic Divakaruni
> > 206.475.9200 Cell
> >
>
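For context on the INT-8 inference idea Jun describes above, here is a
minimal sketch of symmetric per-tensor quantization. It uses plain NumPy,
not any MXNet API; the function names (quantize_int8, dequantize_int8) and
the single per-tensor scale are illustrative assumptions, not the actual
design of the prototype discussed in the thread.

    # Illustrative sketch of symmetric per-tensor INT-8 quantization.
    # Plain NumPy only; not the MXNet prototype described above.
    import numpy as np

    def quantize_int8(x_fp32):
        # One scale per tensor, chosen so the largest magnitude maps to 127.
        scale = max(float(np.max(np.abs(x_fp32))), 1e-8) / 127.0
        x_int8 = np.clip(np.round(x_fp32 / scale), -127, 127).astype(np.int8)
        return x_int8, scale

    def dequantize_int8(x_int8, scale):
        # Recover an FP32 approximation from the INT8 values and the scale.
        return x_int8.astype(np.float32) * scale

    x = np.random.randn(4, 4).astype(np.float32)
    q, s = quantize_int8(x)
    x_hat = dequantize_int8(q, s)
    print("max abs error:", float(np.max(np.abs(x - x_hat))))

The appeal for inference is that the heavy matrix math can then run on INT-8
arithmetic units (as on the P4/P40 accelerators mentioned above), trading a
small rounding error, visible in the print statement, for higher throughput
than FP32 and smaller stored models.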
