KellenSunderland commented on issue #9774: mx.io.ImageRecordIter does not respect dtype argument
URL: https://github.com/apache/incubator-mxnet/issues/9774#issuecomment-366532991

@rahul003 If you're only seeing ~15% speedups, I'd recommend profiling your training run with nvprof. Take a look at the GEMM kernels and check that they have s884 in the name. If they don't, then one of these rules is probably not being followed:

```
A Few Simple Rules

cuBLAS users will notice a few changes from their existing cuBLAS GEMM code:

- The routine must be a GEMM; currently, only GEMMs support Tensor Core execution.
- The math mode must be set to CUBLAS_TENSOR_OP_MATH. Floating point math is not associative, so the results of the Tensor Core math routines are not quite bit-equivalent to the results of the analogous non-Tensor Core math routines. cuBLAS requires the user to "opt in" to the use of tensor cores.
- All of k, lda, ldb, and ldc must be a multiple of eight; m must be a multiple of four. The Tensor Core math routines stride through input data in steps of eight values, so the dimensions of the matrices must be multiples of eight.
- The input and output data types for the matrices must be either half precision or single precision. (Only CUDA_R_16F is shown above, but CUDA_R_32F also is supported.)
```
(from https://devblogs.nvidia.com/programming-tensor-cores-cuda-9/)

GEMMs that do not satisfy the above rules will fall back to a non-Tensor Core implementation.

Tensor cores are tricky to use in a lot of cases. Let us know if nvprof shows that your model isn't running on tensor cores and we might be able to suggest some next steps.
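
As a rough illustration of the dimension rule quoted above, here is a minimal Python sketch that checks a GEMM's sizes before you go hunting through profiler output. The function name `tensor_core_dim_violations` and its signature are hypothetical and not part of MXNet or cuBLAS. It covers only the "multiple of eight / multiple of four" rule, not the math mode or data type rules, and it leaves `n` out because the quoted text places no constraint on it.

```python
def tensor_core_dim_violations(m, k, lda, ldb, ldc):
    """Return the dimension-rule violations for one GEMM call.

    Hypothetical helper: it encodes only the quoted rule that k, lda,
    ldb and ldc must be multiples of eight and m a multiple of four.
    An empty list means the sizes pass this one rule; it does not
    prove that the Tensor Core path will actually be taken.
    """
    violations = []
    # k and the three leading dimensions must all be multiples of 8.
    for name, value in (("k", k), ("lda", lda), ("ldb", ldb), ("ldc", ldc)):
        if value % 8 != 0:
            violations.append(f"{name}={value} is not a multiple of 8")
    # m has the looser requirement of being a multiple of 4.
    if m % 4 != 0:
        violations.append(f"m={m} is not a multiple of 4")
    return violations


if __name__ == "__main__":
    # Sizes that satisfy the rule.
    print(tensor_core_dim_violations(m=1024, k=512, lda=1024, ldb=512, ldc=1024))
    # An odd inner dimension (for example 1001) breaks it.
    print(tensor_core_dim_violations(m=1024, k=1001, lda=1024, ldb=1001, ldc=1024))
```

If a layer in your model reports a violation here, that layer's GEMM is a likely candidate for the fallback path, which you can then confirm by looking for a missing s884 in the corresponding kernel name.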