Thank you Anirudh! I'm just a little surprised that when we talk about mixed
precision models we don't talk about training, and when we talk about
inference, INT8 quantization is not mentioned.

-----Original Message-----
From: Anirudh Subramanian [mailto:anirudh2...@gmail.com] 
Sent: Tuesday, April 30, 2019 8:27 PM
To: dev@mxnet.incubator.apache.org
Subject: Re: Proposal for Conversion from FP32 to Mixed Precision Models

Hi Zach,

I checked the QuantizeGraph pass and I think it could probably benefit from
a CSE pass to eliminate the additional quantize/quantize_v2 nodes. Having
said that, I think it may still be overkill to add another NNVM pass just to
have a generic common subexpression elimination pass. Currently, this
elimination logic takes only an additional 3 to 6 lines of code in each of
the two NNVM passes. Also, a generic common subexpression elimination pass
has its own associated maintenance costs. I think it is better to continue
with the current approach and revisit this need in the future as we add more
NNVM passes.
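
For anyone following along, that elimination logic essentially amounts to a
small lookup table keyed on (input entry, target dtype) so an existing cast
node is reused instead of duplicated. A rough Python sketch of the idea (the
real code lives in the C++ NNVM passes; make_cast_node and graph.add are
hypothetical helpers):

    # Illustrative pseudocode only, not the actual NNVM pass code.
    def get_or_create_cast(entry, target_dtype, cast_cache, graph):
        # Reuse an existing cast of `entry` to `target_dtype` if one was
        # already created, so the pass never emits duplicate cast nodes.
        key = (entry, target_dtype)
        if key not in cast_cache:
            cast_node = make_cast_node(entry, target_dtype)  # hypothetical
            graph.add(cast_node)                             # hypothetical
            cast_cache[key] = cast_node
        return cast_cache[key]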

Anirudh

On Mon, Apr 29, 2019 at 2:22 PM Anirudh Subramanian <anirudh2...@gmail.com>
wrote:

> Hi Zach,
>
> You raise an interesting point. Thank you for the pointer!
>
> Incorporating a CSE pass comes with its own cost, and the advantage it
> brings is to make the ReducePrecision NNVM pass more lightweight. Since
> the amortized cost of the ReducePrecision pass is O(1), it shouldn't
> matter much from a performance point of view whether we add it or not.
>
> From a maintenance point of view, I would agree that separating these two
> pieces of logic can be helpful if we have other workflows which require
> the original pass followed by a CSE pass. Currently, as far as I know,
> only the ReducePrecision pass would use it. I will check whether a CSE
> pass can benefit other NNVM passes as well, such as the quantization
> pass, apart from ReducePrecision, and will get back.
>
> Anirudh
>
> On Mon, Apr 29, 2019 at 11:18 AM Zach Kimberg 
> <zachary.kimb...@gmail.com>
> wrote:
>
>> I have one suggestion. In the current design, there are additional maps
>> from each input entry to its cast copy for each target dtype, in order
>> to avoid creating duplicate casts. Instead of creating these, another
>> option is to apply a general purpose Common Subexpression Elimination
>> (CSE) [1] pass afterwards. So, you would run the mixed precision pass,
>> which creates the duplicates, and then the CSE pass, which would remove
>> all of them.
>>
>> This design is common in existing compilers like LLVM because
>> maintaining and testing the passes is much easier when they are kept as
>> simple as possible. The CSE pass can also be reused as necessary for
>> other passes that could create duplicates, or to remove duplicate
>> expressions in general. This tutorial [2] talks about it a bit.
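>>
>> To make this concrete, a generic CSE over a toy graph would look roughly
>> like the sketch below (illustrative Python only, not NNVM code; the Node
>> shape with .op, .inputs and .attrs is assumed, and it only applies to
>> pure, side-effect-free nodes):
>>
>>     # Rough sketch of a generic CSE pass; `nodes` is a topologically
>>     # sorted list of made-up Node objects.
>>     def eliminate_common_subexpressions(nodes):
>>         seen = {}      # (op, inputs, attrs) -> canonical node
>>         replace = {}   # duplicate node -> canonical node
>>         for node in nodes:
>>             inputs = tuple(replace.get(i, i) for i in node.inputs)
>>             key = (node.op, inputs, tuple(sorted(node.attrs.items())))
>>             if key in seen:
>>                 replace[node] = seen[key]  # duplicate, drop it
>>             else:
>>                 node.inputs = list(inputs)
>>                 seen[key] = node
>>         return [n for n in nodes if n not in replace]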
>>
>> Zach
>>
>> [1] - https://en.wikipedia.org/wiki/Common_subexpression_elimination
>> [2] - https://blog.regehr.org/archives/1603
>>
>> On Mon, Apr 29, 2019 at 9:26 AM Anirudh Subramanian < 
>> anirudh2...@gmail.com>
>> wrote:
>>
>> > Hi Tao,
>> >
>> > Thanks for raising this question! I thought about the existing
>> > quantization workflow and whether it can be included with the AMP API.
>> > Although quantization can be considered a form of mixed precision,
>> > there are differences. For example, only a small number of operators
>> > can be quantized compared to the operators that can run in FP16
>> > precision. Thus, overriding the operators to run in the original dtype
>> > vs the target dtype doesn't make much sense for quantization.
>> >
>> > Also, the quantization workflow may require a calibration dataset to
>> > calibrate the min and max values, along with a calib_mode. Arriving at
>> > a common API for quantization with calibration and mixed precision
>> > inference (FP16 and BF16) may make the API too complicated and not
>> > very easy to use. I understand that this may cause some confusion, as
>> > people may try to use a target_dtype of int8, but I think it's still
>> > better than causing user confusion with the API usage.
>> >
>> > Also, when we move the quantize_model APIs outside contrib, we can
>> > consider adding them under the AMP namespace. The challenge would then
>> > be to educate users on the difference between "quantize" and "convert".
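>> >
>> > To illustrate the difference, the two user flows would look roughly
>> > like this (argument names are taken loosely from the proposal doc and
>> > the existing contrib quantization API, so treat the exact signatures
>> > and module paths as illustrative):
>> >
>> >     import mxnet as mx
>> >     from mxnet.contrib.quantization import quantize_model
>> >     from mxnet.contrib import amp  # assuming the proposal lands here
>> >
>> >     # sym / arg_params / aux_params: a previously loaded FP32 model.
>> >     # Mixed precision conversion: only a target dtype is needed.
>> >     fp16_sym, fp16_args, fp16_auxs = amp.convert_model(
>> >         sym, arg_params, aux_params, target_dtype='float16')
>> >
>> >     # INT8 quantization: additionally needs calibration data and mode.
>> >     qsym, qargs, qauxs = quantize_model(
>> >         sym, arg_params, aux_params, ctx=mx.cpu(),
>> >         calib_data=calib_iter, calib_mode='entropy',
>> >         quantized_dtype='int8')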
>> >
>> > Anirudh
>> >
>> > On Mon, Apr 29, 2019 at 7:45 AM Lv, Tao A <tao.a...@intel.com> wrote:
>> >
>> > > Thank you for the explanation. Sorry, I didn't realize the proposal
>> > > is for inference only.
>> > >
>> > > Then how do you think the amp_cast and amp_multicast in this proposal
>> > > can work with the existing INT8 quantization workflow, which I think
>> > > should also be considered 'mixed precision'?
>> > >
>> > > -----Original Message-----
>> > > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
>> > > Sent: Monday, April 29, 2019 10:25 PM
>> > > To: dev@mxnet.incubator.apache.org
>> > > Subject: Re: Proposal for Conversion from FP32 to Mixed Precision Models
>> > >
>> > > Hi Tao,
>> > >
>> > > The proposed APIs, "convert_model" and "convert_block", are mainly
>> > > for inference use cases, where customers bring an FP32 model and
>> > > convert it to a mixed precision model to get improved performance
>> > > while not losing out on accuracy. The PR
>> > > https://github.com/apache/incubator-mxnet/pull/14173 is supposed to
>> > > handle the training use cases, and this proposal doesn't cover the
>> > > AMP feature added in that PR. I think ptrendx@ and canoerst@ are
>> > > better equipped to answer questions 1 and 2.
>> > >
>> > > > - more generally, what will be saved when users want to serialize
>> > > > their model to disk?
>> > >
>> > > Let's say users want to save a converted mixed precision model used
>> > > for inference to disk. It will save both the symbol, with the
>> > > amp_cast and amp_multicast operators, and the params (which are cast
>> > > if necessary).
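>> > >
>> > > As a minimal sketch (MXNet 1.x API names, meant only as an
>> > > illustration), that save step would look something like:
>> > >
>> > >     import mxnet as mx
>> > >     # converted_sym holds the amp_cast / amp_multicast nodes;
>> > >     # arg_params / aux_params are the (possibly cast) parameters.
>> > >     mx.model.save_checkpoint('model_fp16', 0, converted_sym,
>> > >                              arg_params, aux_params)
>> > >     # -> model_fp16-symbol.json and model_fp16-0000.params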
>> > >
>> > > Anirudh
>> > >
>> > >
>> > > On Mon, Apr 29, 2019 at 6:55 AM Lv, Tao A <tao.a...@intel.com> wrote:
>> > >
>> > > > Thank you for sharing this, Anirudh.
>> > > >
>> > > > Curious to know:
>> > > > - what will be saved in a training checkpoint or snapshot? Can 
>> > > > it be resumed on another platform which might not support the 
>> > > > lower precision the previous one used?
>> > > > - what will be saved in the final symbol.json and params file 
>> > > > when training is finished?
>> > > > - more generally, what will be saved when users want to 
>> > > > serialize their model to disk?
>> > > >
>> > > > Thank you,
>> > > > -tao
>> > > >
>> > > > -----Original Message-----
>> > > > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
>> > > > Sent: Monday, April 29, 2019 7:00 PM
>> > > > To: dev@mxnet.incubator.apache.org
>> > > > Subject: Proposal for Conversion from FP32 to Mixed Precision Models
>> > > >
>> > > > Hi all,
>> > > >
>> > > > I have created a doc for conversion from FP32 to Mixed Precision
>> > > > Models:
>> > > >
>> > > > https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models
>> > > >
>> > > > I look forward to your feedback on the same.
>> > > >
>> > > > Thanks,
>> > > > Anirudh
>> > > >
>> > >
>> >
>>
>
