TaoLv commented on issue #16749: Ask for advice about using my int8gemm URL: https://github.com/apache/incubator-mxnet/issues/16749#issuecomment-550998461 I see. Given that you already have an INT8 GEMM, I would suggest starting from the normal FP32 FullyConnected operator and replacing its implementation with yours. Inputs/outputs stay FP32; inside the operator you perform quantization (FP32 -> INT8), the INT8 GEMM, and de-quantization (INT8/INT32 -> FP32). With this method, you can verify the accuracy and performance benefit of your implementation. Once everything looks good, you can extract the quantization/de-quantization code into standalone operators, which is already done for the MKL-DNN/cuDNN path. For convolution, implement an INT8 Conv API with your INT8 GEMM and run the same experiment as above.
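The quantize -> INT8 GEMM -> de-quantize flow described above can be sketched in NumPy. This is only an illustrative mock-up, not MXNet code: the function names (`quantize`, `int8_fully_connected`) and the symmetric per-tensor scaling scheme are assumptions, and `np.matmul` on int32 stands in for a real INT8 GEMM kernel.

```python
# Illustrative sketch of an FP32 FullyConnected whose inner product is
# computed with an INT8 GEMM, as suggested in the comment above.
# All names and the scaling scheme are hypothetical, not from MXNet.
import numpy as np

def quantize(x):
    """Symmetric per-tensor quantization FP32 -> INT8; returns data and scale."""
    scale = 127.0 / max(np.abs(x).max(), 1e-8)
    q = np.clip(np.round(x * scale), -127, 127).astype(np.int8)
    return q, scale

def int8_fully_connected(data, weight):
    """FP32 in, FP32 out; INT8 GEMM with INT32 accumulation inside."""
    qd, sd = quantize(data)
    qw, sw = quantize(weight)
    # INT8 x INT8 -> INT32 accumulation (stand-in for a real INT8 GEMM kernel)
    acc = qd.astype(np.int32) @ qw.astype(np.int32).T
    # De-quantize: divide by the product of the two quantization scales
    return acc.astype(np.float32) / (sd * sw)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)   # batch of inputs
w = rng.standard_normal((3, 8)).astype(np.float32)   # FC weights
ref = x @ w.T                                        # FP32 reference
out = int8_fully_connected(x, w)
print(np.max(np.abs(out - ref)))                     # quantization error
```

Comparing `out` against the FP32 reference is exactly the accuracy experiment the comment recommends before extracting quantize/dequantize into separate operators.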