TaoLv commented on issue #16749: Ask for advice about using my int8gemm 
URL: 
https://github.com/apache/incubator-mxnet/issues/16749#issuecomment-550998461
 
 
   I see. Since you already have an INT8 GEMM, I would suggest starting from 
the normal FP32 FullyConnected operator and replacing its implementation with 
yours. Inputs/outputs stay FP32; inside the operator you do quantization 
(FP32 -> INT8), the INT8 GEMM, then de-quantization (INT8/INT32 -> FP32). 
With this method you can verify the accuracy and performance benefit of your 
implementation. Once everything looks good, you can extract the 
quantization/de-quantization code into standalone operators, as is already 
done for the MKL-DNN/cuDNN paths.
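   A minimal NumPy sketch of that FP32-in/FP32-out flow (the helper names are 
hypothetical, and `np.matmul` on int32 stands in for your custom INT8 GEMM):

```python
import numpy as np

def quantize_symmetric(x):
    """FP32 -> INT8 with a per-tensor symmetric scale (hypothetical helper)."""
    absmax = float(np.max(np.abs(x)))
    scale = absmax / 127.0 if absmax > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_fully_connected(x_fp32, w_fp32):
    """FullyConnected with FP32 inputs/outputs whose inner product runs in INT8.

    The int32 matmul below is a stand-in for the custom INT8 GEMM that
    accumulates into an INT32 result.
    """
    xq, sx = quantize_symmetric(x_fp32)
    wq, sw = quantize_symmetric(w_fp32)
    acc = np.matmul(xq.astype(np.int32), wq.astype(np.int32).T)  # INT32 accumulator
    return acc.astype(np.float32) * (sx * sw)  # de-quantize back to FP32
```

   Comparing its output against `x @ w.T` in plain FP32 gives you the accuracy 
check mentioned above.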
   
   For convolution, implement an INT8 Conv API with your INT8 GEMM and do the 
same experiment as above.
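   One common way to build that Conv on top of a GEMM is im2col: unfold the 
input patches into a matrix so the convolution becomes a single matrix 
multiply. A sketch (stride 1, no padding; the FP32 `@` below marks the spot 
where the quantize / INT8 GEMM / de-quantize sequence from the FullyConnected 
experiment would slot in):

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold (C, H, W) input into (C*kh*kw, OH*OW) patch columns."""
    c, h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, oh * ow), dtype=x.dtype)
    idx = 0
    for i in range(oh):
        for j in range(ow):
            cols[:, idx] = x[:, i:i + kh, j:j + kw].ravel()
            idx += 1
    return cols, oh, ow

def conv2d_via_gemm(x, weight):
    """2-D convolution expressed as im2col + GEMM.

    weight has shape (OC, C, KH, KW). The FP32 matmul here is where the
    INT8 GEMM, wrapped in quantize/de-quantize, would replace it.
    """
    oc, _, kh, kw = weight.shape
    cols, oh, ow = im2col(x, kh, kw)
    return (weight.reshape(oc, -1) @ cols).reshape(oc, oh, ow)
```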

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services