KellenSunderland commented on issue #9774: does not 
respect dtype argument
   @rahul003 If you're only seeing ~15% speedups, I'd recommend profiling your 
training run with nvprof.  Take a look at the GEMMs and ensure they have s884 in 
the name.  If they don't, then one of these rules is probably not being followed:
   A Few Simple Rules
   cuBLAS users will notice a few changes from their existing cuBLAS GEMM code:
       The routine must be a GEMM; currently, only GEMMs support Tensor Core 
execution.
       The math mode must be set to CUBLAS_TENSOR_OP_MATH. Floating point math 
is not associative, so the results of the Tensor Core math routines are not 
quite bit-equivalent to the results of the analogous non-Tensor Core math 
routines.  cuBLAS requires the user to "opt in" to the use of tensor cores.
       All of k, lda, ldb, and ldc must be a multiple of eight; m must be a 
multiple of four. The Tensor Core math routines stride through input data in 
steps of eight values, so the dimensions of the matrices must be multiples of 
eight.
       The input and output data types for the matrices must be either half 
precision or single precision. (Only CUDA_R_16F is shown above, but CUDA_R_32F 
also is supported.)
   GEMMs that do not satisfy the above rules will fall back to a non-Tensor 
Core implementation.
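
As a rough illustration of the size and type rules quoted above, here is a minimal Python sketch that checks a GEMM's parameters against them. The function name `tensor_core_violations` and the dtype strings `"float16"` / `"float32"` are hypothetical, chosen only for this example. The sketch cannot tell whether the routine is actually a GEMM or whether the math mode has been set to CUBLAS_TENSOR_OP_MATH, so a clean result here does not guarantee that Tensor Cores are used.

```python
def tensor_core_violations(m, k, lda, ldb, ldc, dtype="float16"):
    """Return a list of the quoted size/type rules that the GEMM breaks.

    An empty list means the size and type rules are satisfied. The math
    mode and the "must be a GEMM" rule are NOT checked here.
    """
    problems = []

    # Rule: k, lda, ldb, and ldc must each be a multiple of eight.
    for name, value in (("k", k), ("lda", lda), ("ldb", ldb), ("ldc", ldc)):
        if value % 8 != 0:
            problems.append(f"{name}={value} is not a multiple of 8")

    # Rule: m must be a multiple of four.
    if m % 4 != 0:
        problems.append(f"m={m} is not a multiple of 4")

    # Rule: matrices must be half or single precision.
    # The string labels below are an assumption made for this sketch.
    if dtype not in ("float16", "float32"):
        problems.append(f"dtype={dtype} is not half or single precision")

    return problems


if __name__ == "__main__":
    # Hypothetical layer sizes: k=300 and ldb=300 are not multiples of 8.
    for problem in tensor_core_violations(m=512, k=300, lda=512, ldb=300, ldc=512):
        print(problem)
```

If a layer fails the check, padding the offending dimension up to the next multiple of eight is one common way to make it eligible.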
   The tensor cores are a little tricky to use in a lot of cases. Let us know 
if nvprof shows that your model isn't being run on tensor cores and we might be 
able to give you some next steps.
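
To make the nvprof check less manual, here is a small Python sketch that splits GEMM kernel names into those carrying the s884 marker mentioned above and those without it. It assumes you have already copied the kernel names from the nvprof summary into a list of strings. The function name `split_gemm_kernels` is hypothetical, and parsing of nvprof's actual output is deliberately left out.

```python
def split_gemm_kernels(kernel_names):
    """Split GEMM kernel names into (tensor_core, other) lists.

    A kernel counts as a GEMM if its name contains "gemm", and as a
    Tensor Core GEMM if the name also contains "s884" (the naming hint
    described in the comment above). Non-GEMM kernels are ignored.
    """
    tensor_core, other = [], []
    for name in kernel_names:
        lowered = name.lower()
        if "gemm" not in lowered:
            continue  # not a GEMM, so the s884 check does not apply
        if "s884" in lowered:
            tensor_core.append(name)
        else:
            other.append(name)
    return tensor_core, other
```

Any names that land in the second list are GEMMs that probably fell back to a non-Tensor Core implementation and are worth checking against the rules.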