Hi Zach,

I checked the QuantizeGraph pass, and it could probably benefit from a CSE pass to eliminate the duplicate quantize/quantize_v2 nodes. Having said that, I still think adding another NNVM pass for a generic common subexpression elimination would be overkill. Currently, the elimination logic takes only an additional 3 to 6 lines of code in each of the two NNVM passes, whereas a generic CSE pass carries its own maintenance cost. I think it is better to continue with the current approach and revisit this need in the future as we add more NNVM passes.
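To make the trade-off concrete, here is a minimal, library-agnostic sketch of what that dedup logic amounts to. The Node, make_cast and cast_inputs names are hypothetical stand-ins for illustration only, not MXNet's actual NNVM C++ code; the point is just that a small (input, target dtype) -> cast-node map inside the pass avoids ever creating the duplicates that a separate CSE pass would otherwise have to clean up:

from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    op: str
    dtype: str
    inputs: Tuple["Node", ...] = ()


def make_cast(src: Node, target_dtype: str) -> Node:
    # Stand-in for emitting an amp_cast/quantize node into the graph.
    return Node(op="amp_cast", dtype=target_dtype, inputs=(src,))


def cast_inputs(consumers: List[Tuple[Node, str]]) -> List[Node]:
    """Return one cast node per consumer, creating at most one cast per
    unique (input, target_dtype) pair -- the few lines of dedup logic."""
    cast_cache: Dict[Tuple[Node, str], Node] = {}
    result = []
    for src, dtype in consumers:
        key = (src, dtype)
        if key not in cast_cache:       # first consumer creates the cast
            cast_cache[key] = make_cast(src, dtype)
        result.append(cast_cache[key])  # later consumers reuse it
    return result


if __name__ == "__main__":
    data = Node(op="data", dtype="float32")
    # Two consumers both need `data` in float16: only one cast node is
    # created, so there is no duplicate for a later CSE pass to remove.
    casts = cast_inputs([(data, "float16"), (data, "float16")])
    assert casts[0] is casts[1]
    print("unique cast nodes:", len({id(c) for c in casts}))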
Anirudh

On Mon, Apr 29, 2019 at 2:22 PM Anirudh Subramanian <anirudh2...@gmail.com> wrote:

> Hi Zach,
>
> You raise an interesting point. Thank you for the pointer!
>
> Incorporating a CSE pass comes with its own cost, and the advantage it brings is to make the ReducePrecision NNVM pass more lightweight. Since the amortized cost of the ReducePrecision pass is O(1), it shouldn't matter much from a performance point of view whether we add it or not.
>
> From a maintenance point of view, I would agree that separating these two pieces of logic can be helpful if we have other workflows which require the original pass followed by a CSE pass. Currently, as far as I know, only the ReducePrecision pass would use it. I will check whether a CSE pass can also benefit other NNVM passes besides ReducePrecision, such as the quantization pass, and will get back.
>
> Anirudh
>
> On Mon, Apr 29, 2019 at 11:18 AM Zach Kimberg <zachary.kimb...@gmail.com> wrote:
>
>> I have one suggestion. In the current design, there are additional maps from each input entry to each target casted entry dtype in order to avoid creating duplicate casts. Instead of creating these, another option is to apply a general-purpose Common Subexpression Elimination (CSE) [1] pass afterwards. So, you would run the mixed precision pass, which creates the duplicates, and then the CSE pass, which would remove all duplicates.
>>
>> This design is common in existing compilers like LLVM because maintaining and testing the passes is much easier when they are kept as simple as possible. The CSE pass can also be reused as necessary for other passes that could create duplicates, or to remove duplicate expressions in general. This tutorial [2] talks about it a bit.
>>
>> Zach
>>
>> [1] - https://en.wikipedia.org/wiki/Common_subexpression_elimination
>> [2] - https://blog.regehr.org/archives/1603
>>
>> On Mon, Apr 29, 2019 at 9:26 AM Anirudh Subramanian <anirudh2...@gmail.com> wrote:
>>
>> > Hi Tao,
>> >
>> > Thanks for raising this question! I thought about the existing quantization workflow and whether it can be included with the AMP API. Although quantization can be considered as mixed precision, there are differences. For example, only a small number of operators can be quantized compared to the operators that can run in FP16 precision. Thus, overriding the operators to run in the original dtype vs the target dtype doesn't make much sense for quantization.
>> >
>> > Also, the quantization workflow may require a calibration dataset to calibrate the min and max and a calib_mode. Arriving at a common API for quantization with calibration and mixed precision inference (FP16 and BF16) may make the API too complicated and not very easy to use. I understand that this may cause some confusion as people may try to use a target_dtype of int8, but I think it is still better than causing user confusion with the API usage.
>> >
>> > Also, when we move the quantize_model APIs outside contrib we can consider adding them under the AMP namespace. The challenge would then be to educate users on the difference between "quantize" and "convert".
>> >
>> > Anirudh
>> >
>> > On Mon, Apr 29, 2019 at 7:45 AM Lv, Tao A <tao.a...@intel.com> wrote:
>> >
>> > > Thank you for the explanation. Sorry I didn't realize the proposal is for inference only.
>> > >
>> > > Then how do you think the amp_cast and amp_multicast in this proposal can work with the existing INT8 quantization workflow, which I think should also be considered as 'mixed precision'?
>> > >
>> > > -----Original Message-----
>> > > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
>> > > Sent: Monday, April 29, 2019 10:25 PM
>> > > To: dev@mxnet.incubator.apache.org
>> > > Subject: Re: Proposal for Conversion from FP32 to Mixed Precision Models
>> > >
>> > > Hi Tao,
>> > >
>> > > The proposed APIs, "convert_model" and "convert_block", are mainly for inference use cases, where customers bring an FP32 model and convert it to a mixed precision model to get improved performance while not losing out on accuracy. The PR https://github.com/apache/incubator-mxnet/pull/14173 is supposed to handle the training use cases, and this proposal doesn't cover the AMP feature added in that PR. I think ptrendx@ and canoerst@ are better equipped to answer questions 1 and 2.
>> > >
>> > > > - more generally, what will be saved when users want to serialize their model to disk?
>> > >
>> > > Let's say users want to save a converted mixed precision model used for inference to disk. It will save both the symbol, with the amp_cast and amp_multicast operators, and the params (which are casted if necessary).
>> > >
>> > > Anirudh
>> > >
>> > > On Mon, Apr 29, 2019 at 6:55 AM Lv, Tao A <tao.a...@intel.com> wrote:
>> > >
>> > > > Thank you for sharing this, Anirudh.
>> > > >
>> > > > Curious to know:
>> > > > - what will be saved in a training checkpoint or snapshot? Can it be resumed on another platform which might not support the lower precision the previous one used?
>> > > > - what will be saved in the final symbol.json and params file when training is finished?
>> > > > - more generally, what will be saved when users want to serialize their model to disk?
>> > > >
>> > > > Thank you,
>> > > > -tao
>> > > >
>> > > > -----Original Message-----
>> > > > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
>> > > > Sent: Monday, April 29, 2019 7:00 PM
>> > > > To: dev@mxnet.incubator.apache.org
>> > > > Subject: Proposal for Conversion from FP32 to Mixed Precision Models
>> > > >
>> > > > Hi all,
>> > > >
>> > > > I have created a doc for conversion from FP32 to Mixed Precision Models:
>> > > > https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models
>> > > >
>> > > > I look forward to your feedback on the same.
>> > > >
>> > > > Thanks,
>> > > > Anirudh