rahul003 commented on issue #9774: mx.io.ImageRecordIter does not respect dtype argument / FP16 performance on Volta
URL: https://github.com/apache/incubator-mxnet/issues/9774#issuecomment-371654013

Unfortunately, neither suggestion improved the speed.

Using MXNET_CUDNN_AUTOTUNE_DEFAULT=2 helped in some cases, but we can't say this setting helps consistently. If it picks the fastest algorithm, why would it not help in all cases? I understand cases where it would be the same speed as the other algorithms, but sometimes it is slower than setting it to 1. Everything else should remain the same, right?

I'm writing a tutorial for fp16 usage in MXNet. While doing so, I am trying to understand some of the changes you made. Here,
https://github.com/apache/incubator-mxnet/blob/649b08665bad016a71fa8b7a29a184d25217e335/example/image-classification/symbols/resnet.py#L141
why does the softmax input need to be cast to fp32? Is it for precision reasons?

Is the double buffering you mention with the identity operator general enough to go in as an official guide?

Thanks for your help :)
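As a rough illustration of the precision question about softmax, here is a minimal sketch in plain NumPy (not MXNet, and not taken from the linked file). It assumes a hypothetical two-class logit vector and a hand-written `softmax` helper, and only shows one way fp16 could lose information that fp32 keeps; it does not confirm that this is the reason for the cast in resnet.py.

```python
import numpy as np

def softmax(logits):
    # Hypothetical helper: max-shifted softmax computed in the dtype of `logits`.
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

# Illustrative logits (an assumption, chosen so the gap between classes is large).
logits_fp16 = np.array([0.0, -20.0], dtype=np.float16)

probs_fp16 = softmax(logits_fp16)                     # everything stays in fp16
probs_fp32 = softmax(logits_fp16.astype(np.float32))  # cast to fp32 first

print(probs_fp16.dtype, probs_fp16)
print(probs_fp32.dtype, probs_fp32)
```

In fp16 the smaller probability falls below the smallest representable positive value and becomes exactly zero, so a cross-entropy loss on that class would take the log of zero. After the cast to fp32 the same probability stays small but nonzero.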