kpuatamazon commented on issue #17559: [MXNET-1446] Quantization: intgemm matrix multiply wrappers
URL: https://github.com/apache/incubator-mxnet/pull/17559#issuecomment-587020032
 
 
   I'm the same person as @kpu but work part-time as @kpuatamazon. Typically you'll hear from my Amazon hat on Mondays, though I plan to work flexibly so I can respond more quickly.
   
   Overall, I think this is going to come down to an end-to-end benchmark.  
   
   Here are some numbers from a c5.12xlarge instance (with VNNI).
   
   Sockeye in fp32 on one core:
   ```
   real    14m21.688s
   user    14m24.608s
   sys     0m1.329s
   ```
   Sockeye in int8 (intgemm) on one core:
   ```
   real    5m2.986s
   user    5m6.203s
   sys     0m1.036s
   ```
   And BLEU was essentially unchanged (oddly, it went up 0.1%).
   
   I'll work on measuring how much time is spent in GEMM and on a version backed by DNNL.
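   
   If it's useful, one way I could get the GEMM share (a sketch of the approach, not a measurement yet) is MXNet's built-in profiler; the aggregate time attributed to FullyConnected is a reasonable proxy for GEMM:
   ```
   import mxnet as mx

   # Collect per-operator timings while the fp32 model translates.
   mx.profiler.set_config(profile_all=True, aggregate_stats=True,
                          filename='sockeye_profile.json')
   mx.profiler.set_state('run')
   # ... run a few translation batches here ...
   mx.profiler.set_state('stop')
   # The aggregated stats report total time per operator; the
   # FullyConnected share approximates time spent in GEMM.
   print(mx.profiler.dumps())
   ```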
   
   > Also, the intgemm library seems to be more a personal project than a product. I'm not sure how it will be maintained or what its adoption status is in other projects.
   > What's the adoption status of the library? And who will maintain it, Amazon or @kpuatamazon himself?
   
   The intgemm library started as code inside the Marian machine translation project (https://marian-nmt.github.io/). It has since been extracted as a standalone library. Marian runs in production at Microsoft, the European Union, the World Intellectual Property Organization, the US Air Force, and others listed on the site. I've introduced @pengzhao-intel to our collaborators at Intel, which has funded some of the development.
   
   I coordinate a 3-year, EUR 3 million project funded by the EU to add client-side machine translation to web browsers (https://browser.mt/ ; see also https://www.zdnet.com/article/firefox-to-get-page-translation-feature-like-chrome/). This project uses Marian. Since we want to run on people's desktops, intgemm is mostly optimized for pre-VNNI CPUs, though we have VNNI support and further register optimization in a branch.
   
   > Could you please be more specific about what the functionality is?
   > If possible, please share more about how you did the quantization in your Gluon model.
   
   I'm calling the quantization operators directly from Gluon instead of doing a graph transformation. Please see the Sockeye code that uses this pull request: https://github.com/awslabs/sockeye/pull/771 and https://github.com/kpuatamazon/sockeye/tree/heafield-quantize
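   
   In outline: the weight matrix is prepared once offline, then activations are quantized on the fly at each call. Here's a minimal sketch of that calling pattern (illustrative only; the real code is in the Sockeye branch above, and the exact signatures and scaling convention should be read as my shorthand for what this PR exposes):
   ```
   import mxnet as mx

   def prepare_weight(weight):
       """Offline: find the scale and convert an fp32 weight to intgemm's layout."""
       w_maxabs = mx.nd.contrib.intgemm_maxabsolute(weight)
       return mx.nd.contrib.intgemm_prepare_weight(weight, w_maxabs), w_maxabs

   def int8_fully_connected(data, prepared_weight, w_maxabs):
       """Online: quantize activations, multiply, and unscale the result."""
       d_maxabs = mx.nd.contrib.intgemm_maxabsolute(data)
       quant = mx.nd.contrib.intgemm_prepare_data(data, d_maxabs)
       # Each side was scaled by 127/maxabs, so undo both scales on output.
       scaling = d_maxabs * w_maxabs / (127.0 * 127.0)
       return mx.nd.contrib.intgemm_fully_connected(
           quant, prepared_weight, scaling,
           num_hidden=prepared_weight.shape[0], no_bias=True,
           flatten=False, out_type='float32')
   ```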
