Hi Seb: please use a different email thread for new topics of discussion. Hi Jun: I think Seb may be referring to Volta V100 support in MXNet and NOT P4/P40 inference accelerators.
Corrections/clarifications welcome.

Bhavin Thaker

On Mon, Oct 2, 2017 at 8:22 PM Jun Wu <wujun....@gmail.com> wrote:

> Thanks for your attention, Seb. We are inclined to be cautious about what
> we can claim for this project. TensorRT already supports converting
> TensorFlow and Caffe models to its compatible format for fast inference,
> but not MXNet models. In that sense, it may not be fair to claim that
> MXNet is the first framework to support Nvidia Volta.
>
> What we are working on is more experimental and research oriented. We
> want to gather first-hand material by building an INT-8 inference
> prototype and developing a thorough understanding of its strengths and
> limitations, rather than handing everything off to TensorRT, whose
> internals are opaque to us. Given that the project is experimental, it is
> still too early to draw conclusions here; there are plenty of known and
> unknown issues and unfinished work.
>
> On the other hand, we are glad to hear that Nvidia is working on
> supporting model conversion from MXNet to TensorRT (Dom, please correct
> me if I'm mistaken). It would be hugely beneficial to MXNet's INT-8
> effort if they open-sourced that work, as we would then be able to
> maintain it and add new features on our side.
>
> On Mon, Oct 2, 2017 at 8:04 PM, Dominic Divakaruni <
> dominic.divakar...@gmail.com> wrote:
>
> > 👏
> >
> > On Mon, Oct 2, 2017 at 8:02 PM Seb Kiureghian <sebou...@gmail.com>
> > wrote:
> >
> > > It would be awesome if MXNet were the first DL framework to support
> > > Nvidia Volta. What do you all think about cutting a v0.12 release
> > > once that integration is ready?
> > >
> > > On Wed, Sep 27, 2017 at 10:38 PM, Jun Wu <wujun....@gmail.com> wrote:
> > >
> > > > I had been working on the sparse tensor project with Haibin. After
> > > > it was wrapped up for the first stage, I started work on the
> > > > quantization project (INT-8 inference). The benefits of using
> > > > quantized models for inference are much higher throughput than
> > > > FP32 models, with acceptable accuracy loss, and compact models
> > > > that fit on small devices. The work currently targets ConvNets,
> > > > and we will consider expanding it to RNNs after getting good
> > > > results on images. It is expected to support quantization on CPU,
> > > > GPU, and mobile devices.
> >
> > --
> > Dominic Divakaruni
> > 206.475.9200 Cell
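For anyone following the INT-8 discussion above, here is a minimal NumPy sketch of the kind of arithmetic Jun is describing: symmetric, per-tensor quantization of FP32 values into INT8, with INT32 accumulation and a rescale back to FP32. This is an illustration only, not MXNet's (or TensorRT's) actual implementation; the function names and the choice of symmetric max-abs scaling are assumptions.

    import numpy as np

    def quantize_int8(x):
        # Symmetric per-tensor quantization: map FP32 values into [-127, 127].
        # (Assumes x is not all zeros; real code would guard the division.)
        scale = np.max(np.abs(x)) / 127.0
        q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize_int8(q, scale):
        # Recover an FP32 approximation of the original tensor.
        return q.astype(np.float32) * scale

    # Example: quantize activations and weights, multiply in integer
    # arithmetic, then rescale the result back to FP32.
    x = np.random.randn(4, 8).astype(np.float32)
    w = np.random.randn(8, 3).astype(np.float32)

    qx, sx = quantize_int8(x)
    qw, sw = quantize_int8(w)

    # Accumulate in INT32 to avoid overflow, then rescale to FP32.
    acc = qx.astype(np.int32) @ qw.astype(np.int32)
    y_int8 = acc.astype(np.float32) * (sx * sw)
    y_fp32 = x @ w
    print(np.max(np.abs(y_int8 - y_fp32)))  # small quantization error

Production implementations typically calibrate the scales on a representative dataset ahead of time rather than from each batch's max, but the core idea is the same.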