kpuatamazon commented on issue #17559: [MXNET-1446] Quantization: intgemm matrix multiply wrappers

URL: https://github.com/apache/incubator-mxnet/pull/17559#issuecomment-587020032

I'm the same person as @kpu but work part time as @kpuatamazon. Typically you'll hear from my Amazon hat on Mondays, though I plan to work flexibly to respond more quickly.

Overall, I think this is going to come down to an end-to-end benchmark. Here are some numbers from a c5.12xlarge (with VNNI).

Sockeye in fp32 on one core:
```
real    14m21.688s
user    14m24.608s
sys     0m1.329s
```

Sockeye in int8 (intgemm) on one core:
```
real    5m2.986s
user    5m6.203s
sys     0m1.036s
```

That is roughly a 2.8x end-to-end speedup, and BLEU was unchanged (it went up 0.1%, oddly). I'll work on measuring how much time is spent in GEMM and on a version backed by DNNL.

> Also, the intgemm library seems to be a personal project more than a product. I'm not sure how will it be maintained and what's the adoption status in other projects.

> What's the adoption status of the library? And who will maintain the library? Amazon or @kpuatamazon himself?

The intgemm library started as code inside the Marian machine translation project (https://marian-nmt.github.io/) and has since been extracted as a standalone library. Marian runs in production at Microsoft, the European Union, the World Intellectual Property Organization, the US Air Force, and others listed on the site. I've introduced @pengzhao-intel to our collaborators at Intel, which has funded some of the development.

I coordinate a 3-year, EUR 3 million project funded by the EU to add client-side machine translation to web browsers (https://browser.mt/ , https://www.zdnet.com/article/firefox-to-get-page-translation-feature-like-chrome/). This project uses Marian. Since we want to run on people's desktops, intgemm is mostly optimized for pre-VNNI CPUs, though we have VNNI support and further register optimization in a branch.

> Could you please be more specific what the functionality is?
> If possible, please share more about how you did the quantization in your gluon model.

I'm calling the quantization operators directly from Gluon instead of doing a graph transformation. Please see the Sockeye code that uses this pull request: https://github.com/awslabs/sockeye/pull/771 and https://github.com/kpuatamazon/sockeye/tree/heafield-quantize
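For readers unfamiliar with the pattern, here is a minimal NumPy sketch of the quantize-multiply-dequantize scheme that int8 GEMM wrappers like these implement. This is an illustrative re-implementation, not the PR's code: the operator names, layouts, and the actual intgemm API are not shown, and the saturation to [-127, 127] is an assumption matching common symmetric int8 quantization.

```python
import numpy as np

def quantize(x, scale):
    # Symmetric quantization: scale into the int8 range, round,
    # and saturate at [-127, 127] (illustrative, not intgemm's exact code).
    return np.clip(np.round(x * scale), -127, 127).astype(np.int8)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)).astype(np.float32)  # weights: output x input
A = rng.standard_normal((2, 8)).astype(np.float32)  # a batch of activations

# Weights can be quantized once, offline; activations per batch.
w_scale = 127.0 / np.abs(W).max()
a_scale = 127.0 / np.abs(A).max()
W_q = quantize(W, w_scale)
A_q = quantize(A, a_scale)

# int8 x int8 multiply, accumulated in int32, then dequantized once.
C_int32 = A_q.astype(np.int32) @ W_q.astype(np.int32).T
C = C_int32.astype(np.float32) / (w_scale * a_scale)

# The int8 result closely tracks the fp32 product A @ W.T.
print(np.max(np.abs(C - A @ W.T)))
```

The speed comes from doing the inner matrix multiply entirely in int8/int32 (which VNNI and pre-VNNI SIMD instructions accelerate) and paying the float conversion cost only once per output, which is consistent with the unchanged BLEU reported above.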
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services