ptrendx opened a new pull request #17028: Workaround problem with fusion in 
CUDA 9
URL: https://github.com/apache/incubator-mxnet/pull/17028
 
 
   ## Description ##
   Fixes #17020 
   
   The problem comes from a bug in how NVRTC in CUDA 9 handles the 
`default-device` flag. That flag is supposed to mark all the functions in the 
file as `__device__` functions, while leaving functions that are decorated 
differently (like kernels decorated with `__global__`) alone. This is the 
behavior in CUDA 10+. In CUDA 9, however, the `__device__` attribute is 
applied to every function (including kernels), which is incompatible with 
the `__launch_bounds__()` attribute that we use for kernels.
   
   This PR removes the usage of the `default-device` flag for NVRTC compilation 
and instead manually decorates all the required functions as `__device__`.
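
A minimal sketch of what the manual decoration looks like, not the actual code from this PR. The helper names (`relu`, `add`), the kernel name `fused_add_relu`, the bound of 256 threads, and the `nvrtc_options()` function are all hypothetical and chosen only for illustration. The source string stands in for the runtime-generated code that would be handed to NVRTC:

```cpp
#include <string>
#include <vector>

// Hypothetical fused-kernel source, as it might be passed to NVRTC.
// Every helper carries an explicit __device__ specifier, so the source
// no longer relies on the compiler adding one by default. The kernel
// keeps __global__ together with __launch_bounds__.
static const std::string kFusedSource = R"code(
__device__ inline float relu(const float x) {
  return x > 0.f ? x : 0.f;
}

__device__ inline float add(const float a, const float b) {
  return a + b;
}

__global__ void __launch_bounds__(256)
fused_add_relu(const float* a, const float* b, float* out, const int n) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    out[i] = relu(add(a[i], b[i]));
  }
}
)code";

// Hypothetical option list for nvrtcCompileProgram.
// The default-device flag is intentionally absent.
std::vector<const char*> nvrtc_options() {
  return {"--std=c++11"};
}
```

With every helper decorated explicitly, the same source should compile the same way whether or not the compiler applies a default execution space, which is what makes the flag unnecessary.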
