TaoLv commented on issue #17559: [MXNET-1446] Quantization: intgemm matrix multiply wrappers
URL: https://github.com/apache/incubator-mxnet/pull/17559#issuecomment-586591074
 
 
   Hi @mjdenkowski,
   
   > We're particularly interested in Kenneth's (@kpuatamazon) intgemm because 
it provides functionality we weren't able to find in the existing libraries.
   
   Could you please be more specific about what that functionality is?
   
   > we're seeing a roughly 3X inference speedup on an already significantly 
optimized transformer implementation.
   
   That sounds like a great achievement. Could you please share more detailed 
numbers, i.e. how much time is spent in the fp32/int8 GEMM before and after 
the optimization?
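   
   For reference, one way to collect such per-operator numbers is the built-in 
MXNet profiler. A minimal sketch, assuming you run your own inference workload 
where the placeholder comment sits:
   
   ```python
   import mxnet as mx
   
   # Enable operator-level profiling; the output file name is arbitrary.
   mx.profiler.set_config(profile_all=True, aggregate_stats=True,
                          filename='gemm_profile.json')
   mx.profiler.set_state('run')
   
   # Placeholder: run the inference you want to measure here,
   # e.g. net(data) for a hybridized Gluon model.
   
   mx.nd.waitall()                # wait for all asynchronous work to finish
   mx.profiler.set_state('stop')
   print(mx.profiler.dumps())     # aggregated per-operator timings
   ```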
   
   > Like many other Gluon users, our inference model is not currently 
expressible as a static graph.
   
   I know there is a plan/effort to move the existing quantization flow from 
the symbolic executor to Gluon; it's part of the MXNet 2.0 roadmap. If 
possible, please share more about how you did the quantization in your Gluon 
model. It will help MXNet improve. Thanks!
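   
   For context, the existing flow I mentioned operates on symbolic models. A 
minimal sketch, assuming a saved checkpoint on disk; the dummy calibration 
data and shapes here are placeholders for a representative dataset:
   
   ```python
   import mxnet as mx
   from mxnet.contrib.quantization import quantize_model
   
   # Placeholder: a trained symbolic model checkpoint ('model-symbol.json',
   # 'model-0000.params') and its parameters.
   sym, arg_params, aux_params = mx.model.load_checkpoint('model', 0)
   
   # Dummy calibration iterator; in practice, use representative inputs.
   calib_data = mx.io.NDArrayIter(
       data=mx.nd.random.uniform(shape=(100, 3, 224, 224)), batch_size=10)
   
   qsym, qarg_params, aux_params = quantize_model(
       sym=sym, arg_params=arg_params, aux_params=aux_params,
       ctx=mx.cpu(), quantized_dtype='int8',
       calib_mode='naive', calib_data=calib_data,
       num_calib_examples=100)
   ```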
   
   > Are there particular concerns we can address about adding intgemm as a 
third party library?
   
   Sorry, I haven't read the code yet, but I would like to know: what is the 
advantage of this library over the existing MKL BLAS and DNNL? What is the 
adoption status of the library? And who will maintain it, Amazon or 
@kpuatamazon himself?
   
   > Is there another path to using intgemm with MXNet that you recommend?
   
   You may consider the custom operator / subgraph features developed by 
@samskalicky's team; see the sketch below.
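   
   For example, a Python custom operator lets you plug an external GEMM into 
an imperative/Gluon model without modifying MXNet itself. A minimal 
inference-only sketch, where the `np.dot` call marks the spot an intgemm 
entry point would go (the operator name `intgemm_matmul` is just 
illustrative):
   
   ```python
   import mxnet as mx
   import numpy as np
   
   class IntgemmMatMul(mx.operator.CustomOp):
       def forward(self, is_train, req, in_data, out_data, aux):
           a, b = in_data[0].asnumpy(), in_data[1].asnumpy()
           # Placeholder: call the external int8 GEMM here instead of np.dot.
           self.assign(out_data[0], req[0], mx.nd.array(np.dot(a, b)))
   
       def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
           raise NotImplementedError('inference-only sketch')
   
   @mx.operator.register('intgemm_matmul')
   class IntgemmMatMulProp(mx.operator.CustomOpProp):
       def __init__(self):
           super(IntgemmMatMulProp, self).__init__(need_top_grad=False)
   
       def list_arguments(self):
           return ['a', 'b']
   
       def list_outputs(self):
           return ['output']
   
       def infer_shape(self, in_shape):
           a_shape, b_shape = in_shape
           out_shape = (a_shape[0], b_shape[1])
           return [a_shape, b_shape], [out_shape], []
   
       def create_operator(self, ctx, shapes, dtypes):
           return IntgemmMatMul()
   ```
   
   It can then be invoked imperatively as 
`mx.nd.Custom(a, b, op_type='intgemm_matmul')`.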
