Thank you, Anirudh! I'm just a little surprised that when we talk about mixed precision models we don't talk about training, and when we talk about inference, INT8 quantization is not mentioned.
-----Original Message-----
From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
Sent: Tuesday, April 30, 2019 8:27 PM
To: dev@mxnet.incubator.apache.org
Subject: Re: Proposal for Conversion from FP32 to Mixed Precision Models

Hi Zach,

I checked the QuantizeGraph pass and I think it can probably benefit from a CSE pass to eliminate additional quantize/quantize_v2 nodes. Having said that, I think it may still be overkill to add another NNVM pass for generic common subexpression elimination. Currently, this elimination logic takes only an additional 3 to 6 lines of code in each of the two NNVM passes. Also, a generic common subexpression elimination pass has its own associated maintenance costs. I think it is better to continue with the current approach and revisit this need in the future as we add more NNVM passes.

Anirudh

On Mon, Apr 29, 2019 at 2:22 PM Anirudh Subramanian <anirudh2...@gmail.com> wrote:

> Hi Zach,
>
> You raise an interesting point. Thank you for the pointer!
>
> Incorporating a CSE pass comes with its own cost, and the advantage it
> brings is to make the ReducePrecision NNVM pass more lightweight.
> Since the amortized cost of the ReducePrecision pass is O(1), it
> shouldn't matter much from a performance point of view whether we add
> it or not.
>
> From a maintenance point of view, I would agree that separating these
> two pieces of logic can be helpful if we have other workflows which
> require the original pass followed by a CSE pass. Currently, as far as
> I know, only the ReducePrecision pass would use it. I will check
> whether a CSE pass can benefit other NNVM passes besides
> ReducePrecision, such as the quantization pass, and will get back.
>
> Anirudh
>
> On Mon, Apr 29, 2019 at 11:18 AM Zach Kimberg <zachary.kimb...@gmail.com> wrote:
>
>> I have one suggestion. In the current design, there are additional
>> maps from each input entry to each target casted entry dtype, in
>> order to avoid creating duplicate casts.
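[Editor's note] The duplicate-avoidance bookkeeping Zach refers to (and the "3 to 6 lines" Anirudh mentions) can be sketched in plain Python: a memo table keyed on the input entry and target dtype, so a cast node is created at most once per input. This is a toy graph representation, not MXNet's actual NNVM data structures; `Node` and `insert_casts` are hypothetical names used only for illustration.

```python
# Toy sketch (NOT MXNet's NNVM code) of the dedup bookkeeping inside a
# ReducePrecision-style pass: a memo table keyed on (input, dtype) so a
# cast node is created at most once per input, however many consumers
# need it.

class Node:
    """Hypothetical graph node: operator name, inputs, attributes."""
    def __init__(self, op, inputs=(), attrs=None):
        self.op = op
        self.inputs = list(inputs)
        self.attrs = dict(attrs or {})

def insert_casts(nodes, low_precision_ops, target_dtype="float16"):
    """Route every input of a low-precision op through an amp_cast-style
    node, reusing an existing cast when one was already created for the
    same (input, dtype) pair. Returns the list of casts inserted."""
    cast_memo = {}  # (id(input node), dtype) -> cast node
    inserted = []
    for node in nodes:
        if node.op not in low_precision_ops:
            continue
        for i, inp in enumerate(node.inputs):
            key = (id(inp), target_dtype)
            if key not in cast_memo:
                cast_memo[key] = Node("amp_cast", [inp],
                                      {"dtype": target_dtype})
                inserted.append(cast_memo[key])
            node.inputs[i] = cast_memo[key]
    return inserted
```

With two consumers sharing the same two inputs, only two casts are created instead of four; dropping the memo table is exactly what would produce the duplicates that a follow-up CSE pass would then have to clean up.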
>> Instead of creating these, another option is to run a general-purpose
>> Common Subexpression Elimination (CSE) [1] pass afterwards. So, you
>> would run the mixed precision pass, which creates the duplicates, and
>> then the CSE pass, which would remove all duplicates.
>>
>> This design is common in existing compilers like LLVM, because
>> maintaining and testing the passes is much easier when they are kept
>> as simple as possible. The CSE pass can also be reused as necessary
>> for other passes that could create duplicates, or to remove duplicate
>> expressions in general. This tutorial [2] talks about it a bit.
>>
>> Zach
>>
>> [1] - https://en.wikipedia.org/wiki/Common_subexpression_elimination
>> [2] - https://blog.regehr.org/archives/1603
>>
>> On Mon, Apr 29, 2019 at 9:26 AM Anirudh Subramanian <anirudh2...@gmail.com> wrote:
>>
>>> Hi Tao,
>>>
>>> Thanks for raising this question! I thought about the existing
>>> quantization workflow and whether it can be included with the AMP
>>> API. Although quantization can be considered a form of mixed
>>> precision, there are differences. For example, only a small number of
>>> operators can be quantized, compared to the operators that can run in
>>> FP16 precision. Thus, overriding the operators to run in the original
>>> dtype vs. the target dtype doesn't make much sense for quantization.
>>>
>>> Also, the quantization workflow may require a calibration dataset to
>>> calibrate the min and max values, and a calib_mode. Arriving at a
>>> common API for quantization with calibration and mixed precision
>>> inference (FP16 and BF16) may make the API too complicated and not
>>> very easy to use. I understand that this may cause some confusion, as
>>> people may try to use a target_dtype of int8, but I think it's still
>>> better than causing user confusion with the API usage.
>>>
>>> Also, when we move the quantize_model APIs outside contrib, we can
>>> consider adding them under the AMP namespace.
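[Editor's note] The CSE alternative Zach proposes above can be illustrated with a small sketch: hash each node by its operator, canonicalized inputs, and attributes, and merge structurally identical nodes in one topological sweep. Again, this is a hypothetical toy graph, not NNVM; `Node` and `cse` are illustrative names.

```python
# Toy illustration of the CSE alternative: let the mixed precision pass
# create duplicate casts freely, then merge nodes with identical
# (op, inputs, attrs). Assumes `nodes` is in topological order, so each
# node's inputs are canonicalized before the node itself is keyed.

class Node:
    """Hypothetical graph node: operator name, inputs, attributes."""
    def __init__(self, op, inputs=(), attrs=None):
        self.op = op
        self.inputs = list(inputs)
        self.attrs = dict(attrs or {})

def cse(nodes):
    """Return the node list with structural duplicates merged into a
    single canonical node; consumers are rewired to the survivor."""
    seen = {}       # structural key -> canonical node
    canonical = {}  # id(duplicate node) -> canonical replacement
    out = []
    for node in nodes:
        # Rewire inputs to their canonical representatives first.
        node.inputs = [canonical.get(id(i), i) for i in node.inputs]
        key = (node.op,
               tuple(id(i) for i in node.inputs),
               tuple(sorted(node.attrs.items())))
        if key in seen:
            canonical[id(node)] = seen[key]  # drop the duplicate
        else:
            seen[key] = node
            out.append(node)
    return out
```

Run on a graph where two consumers each received their own `amp_cast` of the same input, the pass keeps one cast and rewires both consumers to it, which is precisely the cleanup the ReducePrecision pass currently avoids needing via its internal maps.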
>>> The challenge would then be to educate users on the difference
>>> between "quantize" and "convert".
>>>
>>> Anirudh
>>>
>>> On Mon, Apr 29, 2019 at 7:45 AM Lv, Tao A <tao.a...@intel.com> wrote:
>>>
>>>> Thank you for the explanation. Sorry, I didn't realize the proposal
>>>> is for inference only.
>>>>
>>>> Then how do you think the amp_cast and amp_multicast in this
>>>> proposal can work with the existing INT8 quantization workflow,
>>>> which I think should also be considered 'mixed precision'?
>>>>
>>>> -----Original Message-----
>>>> From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
>>>> Sent: Monday, April 29, 2019 10:25 PM
>>>> To: dev@mxnet.incubator.apache.org
>>>> Subject: Re: Proposal for Conversion from FP32 to Mixed Precision Models
>>>>
>>>> Hi Tao,
>>>>
>>>> The proposed APIs, "convert_model" and "convert_block", are mainly
>>>> for inference use cases, where customers bring an FP32 model and
>>>> convert it to a mixed precision model to get improved performance
>>>> while not losing out on accuracy. The PR
>>>> https://github.com/apache/incubator-mxnet/pull/14173 is supposed to
>>>> handle the training use cases, and this proposal doesn't cover the
>>>> AMP feature added in that PR. I think ptrendx@ and canoerst@ are
>>>> better equipped to answer questions 1 and 2.
>>>>
>>>> > - more generally, what will be saved when users want to serialize
>>>> > their model to disk?
>>>>
>>>> Let's say users want to save a converted mixed precision model used
>>>> for inference to disk. It will save both the symbol, with the
>>>> amp_cast and amp_multicast operators, and the params (which are
>>>> casted if necessary).
>>>>
>>>> Anirudh
>>>>
>>>> On Mon, Apr 29, 2019 at 6:55 AM Lv, Tao A <tao.a...@intel.com> wrote:
>>>>
>>>>> Thank you for sharing this, Anirudh.
>>>>> Curious to know:
>>>>> - what will be saved in a training checkpoint or snapshot? Can it
>>>>>   be resumed on another platform which might not support the lower
>>>>>   precision the previous one used?
>>>>> - what will be saved in the final symbol.json and params file when
>>>>>   training is finished?
>>>>> - more generally, what will be saved when users want to serialize
>>>>>   their model to disk?
>>>>>
>>>>> Thank you,
>>>>> -tao
>>>>>
>>>>> -----Original Message-----
>>>>> From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
>>>>> Sent: Monday, April 29, 2019 7:00 PM
>>>>> To: dev@mxnet.incubator.apache.org
>>>>> Subject: Proposal for Conversion from FP32 to Mixed Precision Models
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I have created a doc for conversion from FP32 to Mixed Precision Models:
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models
>>>>>
>>>>> I look forward to your feedback on the same.
>>>>>
>>>>> Thanks,
>>>>> Anirudh
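[Editor's note] The serialization behavior discussed in the thread (a symbol that carries the amp_cast/amp_multicast operators, plus params that are casted where necessary) can be sketched with a deliberately simplified format. The JSON-plus-hex layout below is hypothetical and far simpler than MXNet's real symbol.json / .params files; it only illustrates that the precision of each parameter array is recorded alongside the data, which is what would let a platform lacking FP16 support up-cast on load (one of Tao's questions).

```python
# Minimal sketch of "save the symbol and the casted params". Uses
# Python's struct format 'e' (IEEE 754 half precision) for float16
# arrays and 'f' for float32. Toy 1-D arrays only; NOT MXNet's actual
# serialization format.

import struct

def convert_params(params, cast_set):
    """Pack each 1-D float list as float16 if its name is in cast_set,
    else float32, recording the dtype alongside the packed bytes."""
    blob = {}
    for name, values in params.items():
        dtype, fmt = (("float16", "e") if name in cast_set
                      else ("float32", "f"))
        data = struct.pack("<%d%s" % (len(values), fmt), *values)
        blob[name] = {"dtype": dtype, "shape": [len(values)],
                      "data": data.hex()}
    return blob

def load_array(entry):
    """Round-trip helper: decode one packed array back to floats."""
    fmt = "e" if entry["dtype"] == "float16" else "f"
    raw = bytes.fromhex(entry["data"])
    return list(struct.unpack("<%d%s" % (entry["shape"][0], fmt), raw))
```

The symbol side would simply be the graph description already containing the amp_cast/amp_multicast nodes, serialized as JSON next to this params blob; on a platform without low-precision support, a loader could read the recorded dtype and unpack into float32 instead.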