Hi Zach,

I checked the QuantizeGraph pass, and it could probably benefit from a CSE pass to eliminate the duplicate quantize/quantize_v2 nodes. Having said that, I still think adding another NNVM pass for a generic common subexpression elimination would be overkill. Currently, the elimination logic takes only an additional 3 to 6 lines of code in each of the two NNVM passes, whereas a generic CSE pass carries its own maintenance cost. I think it is better to continue with the current approach and revisit this need in the future as we add more NNVM passes.
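To make the trade-off concrete, here is a minimal, library-agnostic sketch of what that dedup logic amounts to. The Node, make_cast and cast_inputs names are hypothetical stand-ins for illustration only, not MXNet's actual NNVM C++ code; the point is just that a small (input, target dtype) -> cast-node map inside the pass avoids ever creating the duplicates that a separate CSE pass would otherwise have to clean up:

from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    op: str
    dtype: str
    inputs: Tuple["Node", ...] = ()


def make_cast(src: Node, target_dtype: str) -> Node:
    # Stand-in for emitting an amp_cast/quantize node into the graph.
    return Node(op="amp_cast", dtype=target_dtype, inputs=(src,))


def cast_inputs(consumers: List[Tuple[Node, str]]) -> List[Node]:
    """Return one cast node per consumer, creating at most one cast per
    unique (input, target_dtype) pair -- the few lines of dedup logic."""
    cast_cache: Dict[Tuple[Node, str], Node] = {}
    result = []
    for src, dtype in consumers:
        key = (src, dtype)
        if key not in cast_cache:       # first consumer creates the cast
            cast_cache[key] = make_cast(src, dtype)
        result.append(cast_cache[key])  # later consumers reuse it
    return result


if __name__ == "__main__":
    data = Node(op="data", dtype="float32")
    # Two consumers both need `data` in float16: only one cast node is
    # created, so there is no duplicate for a later CSE pass to remove.
    casts = cast_inputs([(data, "float16"), (data, "float16")])
    assert casts[0] is casts[1]
    print("unique cast nodes:", len({id(c) for c in casts}))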
Anirudh

On Mon, Apr 29, 2019 at 2:22 PM Anirudh Subramanian <anirudh2...@gmail.com> wrote:

> Hi Zach,
>
> You raise an interesting point. Thank you for the pointer!
>
> Incorporating a CSE pass comes with its own cost, and the advantage it brings is to make the ReducePrecision NNVM pass more lightweight. Since the amortized cost of the ReducePrecision pass is O(1), it shouldn't matter much from a performance point of view whether we add it or not.
>
> From a maintenance point of view, I would agree that separating these two pieces of logic can be helpful if we have other workflows which require the original pass followed by a CSE pass. Currently, as far as I know, only the ReducePrecision pass would use it. I will check whether a CSE pass can also benefit other NNVM passes besides ReducePrecision, such as the quantization pass, and will get back.
>
> Anirudh
>
> On Mon, Apr 29, 2019 at 11:18 AM Zach Kimberg <zachary.kimb...@gmail.com> wrote:
>
>> I have one suggestion. In the current design, there are additional maps from each input entry to each target casted entry dtype in order to avoid creating duplicate casts. Instead of creating these, another option is to apply a general-purpose Common Subexpression Elimination (CSE) [1] pass afterwards. So, you would run the mixed precision pass, which creates the duplicates, and then the CSE pass, which would remove all duplicates.
>>
>> This design is common in existing compilers like LLVM because maintaining and testing the passes is much easier when they are kept as simple as possible. The CSE pass can also be reused as necessary for other passes that could create duplicates, or to remove duplicate expressions in general. This tutorial [2] talks about it a bit.
>>
>> Zach
>>
>> [1] - https://en.wikipedia.org/wiki/Common_subexpression_elimination
>> [2] - https://blog.regehr.org/archives/1603
>>
>> On Mon, Apr 29, 2019 at 9:26 AM Anirudh Subramanian <anirudh2...@gmail.com> wrote:
>>
>> > Hi Tao,
>> >
>> > Thanks for raising this question! I thought about the existing quantization workflow and whether it can be included with the AMP API. Although quantization can be considered as mixed precision, there are differences. For example, only a small number of operators can be quantized compared to the operators that can run in FP16 precision. Thus, overriding the operators to run in the original dtype vs the target dtype doesn't make much sense for quantization.
>> >
>> > Also, the quantization workflow may require a calibration dataset to calibrate the min and max and a calib_mode. Arriving at a common API for quantization with calibration and mixed precision inference (FP16 and BF16) may make the API too complicated and not very easy to use. I understand that this may cause some confusion as people may try to use a target_dtype of int8, but I think it is still better than causing user confusion with the API usage.
>> >
>> > Also, when we move the quantize_model APIs outside contrib we can consider adding them under the AMP namespace. The challenge would then be to educate users on the difference between "quantize" and "convert".
>> >
>> > Anirudh
>> >
>> > On Mon, Apr 29, 2019 at 7:45 AM Lv, Tao A <tao.a...@intel.com> wrote:
>> >
>> > > Thank you for the explanation. Sorry I didn't realize the proposal is for inference only.
>> > >
>> > > Then how do you think the amp_cast and amp_multicast in this proposal can work with the existing INT8 quantization workflow, which I think should also be considered as 'mixed precision'?
>> > >
>> > > -----Original Message-----
>> > > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
>> > > Sent: Monday, April 29, 2019 10:25 PM
>> > > To: dev@mxnet.incubator.apache.org
>> > > Subject: Re: Proposal for Conversion from FP32 to Mixed Precision Models
>> > >
>> > > Hi Tao,
>> > >
>> > > The proposed APIs, "convert_model" and "convert_block", are mainly for inference use cases, where customers bring an FP32 model and convert it to a mixed precision model to get improved performance while not losing out on accuracy. The PR https://github.com/apache/incubator-mxnet/pull/14173 is supposed to handle the training use cases, and this proposal doesn't cover the AMP feature added in that PR. I think ptrendx@ and canoerst@ are better equipped to answer questions 1 and 2.
>> > >
>> > > > - more generally, what will be saved when users want to serialize their model to disk?
>> > >
>> > > Let's say users want to save a converted mixed precision model used for inference to disk. It will save both the symbol, with the amp_cast and amp_multicast operators, and the params (which are casted if necessary).
>> > >
>> > > Anirudh
>> > >
>> > > On Mon, Apr 29, 2019 at 6:55 AM Lv, Tao A <tao.a...@intel.com> wrote:
>> > >
>> > > > Thank you for sharing this, Anirudh.
>> > > >
>> > > > Curious to know:
>> > > > - what will be saved in a training checkpoint or snapshot? Can it be resumed on another platform which might not support the lower precision the previous one used?
>> > > > - what will be saved in the final symbol.json and params file when training is finished?
>> > > > - more generally, what will be saved when users want to serialize their model to disk?
>> > > >
>> > > > Thank you,
>> > > > -tao
>> > > >
>> > > > -----Original Message-----
>> > > > From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
>> > > > Sent: Monday, April 29, 2019 7:00 PM
>> > > > To: dev@mxnet.incubator.apache.org
>> > > > Subject: Proposal for Conversion from FP32 to Mixed Precision Models
>> > > >
>> > > > Hi all,
>> > > >
>> > > > I have created a doc for conversion from FP32 to Mixed Precision Models:
>> > > > https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models
>> > > >
>> > > > I look forward to your feedback on the same.
>> > > >
>> > > > Thanks,
>> > > > Anirudh