Speed.  All those `std::string` and `std::unordered_map` objects don't come 
cheap.  

I compared an integrated fork with a custom operator.  

End-to-end Sockeye performance of the integrated version, 
https://github.com/kpuatamazon/incubator-mxnet/tree/intgemm (based on 1.6.0):
```
real    2m57.962s
user    7m3.986s
sys     0m6.724s
```
Custom operator version (based on 1.7.x, because custom operators require 
it):
```
real    3m16.879s
user    7m43.727s
sys     0m8.273s
```
Conditions:
`unset MXNET_ENGINE_TYPE; export OMP_NUM_THREADS=2; numactl -C 0-7 translate.sh`
Both were compiled with the MKL backend hack for the remaining fp32 operations. 
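
To put a number on the gap, the two `time` outputs can be converted to seconds and compared. This is only a rough sketch: the helper names `to_seconds` and `slowdown_percent` are made up for illustration, and the values are copied from the two runs above.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '2m57.962s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def slowdown_percent(baseline: str, candidate: str) -> float:
    """Relative increase of candidate over baseline, in percent."""
    base = to_seconds(baseline)
    return (to_seconds(candidate) - base) / base * 100.0


# Timings copied from the two runs reported above.
integrated = {"real": "2m57.962s", "user": "7m3.986s", "sys": "0m6.724s"}
custom_op = {"real": "3m16.879s", "user": "7m43.727s", "sys": "0m8.273s"}

for key in ("real", "user", "sys"):
    pct = slowdown_percent(integrated[key], custom_op[key])
    print(f"{key}: custom operator is {pct:.1f}% slower")
```

By this arithmetic the custom operator run takes about 10.6% more wall-clock time. The two builds are based on different MXNet versions (1.6.0 vs 1.7.x), so not all of that difference is necessarily due to the custom operator interface.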
 
