TaoLv commented on issue #17559: [MXNET-1446] Quantization: intgemm matrix multiply wrappers
URL: https://github.com/apache/incubator-mxnet/pull/17559#issuecomment-586591074

Hi @mjdenkowski,

> We're particularly interested in Kenneth's (@kpuatamazon) intgemm because it provides functionality we weren't able to find in the existing libraries.

Could you please be more specific about what that functionality is?

> we're seeing a roughly 3X inference speedup on an already significantly optimized transformer implementation.

That sounds like a great achievement. Could you please share more detailed numbers, i.e., how much time is spent in fp32/int8 GEMM before and after the optimization?

> Like many other Gluon users, our inference model is not currently expressible as a static graph.

I know there is a plan/effort to move the existing quantization flow from the symbolic executor to Gluon; it's part of the MXNet 2.0 roadmap. If possible, please share more about how you did the quantization in your Gluon model. It will help MXNet improve. Thanks!

> Are there particular concerns we can address about adding intgemm as a third party library?

Sorry, I haven't read the code yet. But I would like to know: what is the advantage of this library over the existing MKL BLAS and DNNL? What is the adoption status of the library? And who will maintain it, Amazon or @kpuatamazon himself?

> Is there another path to using intgemm with MXNet that you recommend?

You may consider the custom operator / subgraph feature developed by @samskalicky's team.
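For readers following the thread, the "int8 GEMM" under discussion replaces a float32 matrix multiply with a quantized integer one. The sketch below is a minimal numpy illustration of that idea only; it is not intgemm's actual API (intgemm uses hand-tuned SSSE3/AVX2/AVX512 kernels), and the scale choice shown is just one common convention:

```python
import numpy as np

def quantize(a, scale):
    # Map float32 values into the int8 range, saturating at the type limits.
    return np.clip(np.round(a * scale), -127, 127).astype(np.int8)

def int8_gemm(a, b, scale_a, scale_b):
    # Quantize both operands, accumulate in int32 to avoid overflow,
    # then rescale the product back into float32 units.
    qa = quantize(a, scale_a).astype(np.int32)
    qb = quantize(b, scale_b).astype(np.int32)
    return (qa @ qb).astype(np.float32) / (scale_a * scale_b)

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8)).astype(np.float32)
b = rng.standard_normal((8, 3)).astype(np.float32)

# One common scale choice: spend the full int8 range on the max magnitude.
scale_a = 127.0 / np.abs(a).max()
scale_b = 127.0 / np.abs(b).max()

approx = int8_gemm(a, b, scale_a, scale_b)
exact = a @ b
print(np.max(np.abs(approx - exact)))  # small quantization error
```

The speedup in practice comes from the integer kernels processing several int8 values per SIMD instruction, at the cost of the small approximation error visible above.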