ptrendx opened a new pull request #17028: Workaround problem with fusion in CUDA 9

URL: https://github.com/apache/incubator-mxnet/pull/17028

## Description ##

Fixes #17020

The problem comes from a bug in how NVRTC in CUDA 9 handles the `default-device` flag. That flag is supposed to mark every function in the source as a `__device__` function while leaving functions with a different decoration (such as kernels decorated with `__global__`) alone. CUDA 10 and later behave this way. In CUDA 9, however, the `__device__` attribute is applied to every function, including kernels, and that conflicts with the `__launch_bounds__()` attribute we use for kernels.

This PR stops passing the `default-device` flag to NVRTC and instead explicitly decorates all the required functions as `__device__`.
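A minimal sketch of the workaround follows. It assumes the fused kernel source is generated as a string at runtime and later handed to `nvrtcCreateProgram`/`nvrtcCompileProgram`. `BuildFusedKernelSource`, `BuildCompileOptions`, `UsesDefaultDevice` and the body of `fused_op` are hypothetical illustrations, not MXNet's actual fusion code. The point is that every helper carries an explicit `__device__`, the kernel keeps `__global__ __launch_bounds__(...)`, and the option list omits `-default-device` (long form `--device-as-default-execution-space`).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical generator for a fused kernel's CUDA source.
// Helpers are explicitly marked __device__ rather than relying on NVRTC's
// default-device option; the kernel keeps __global__ and __launch_bounds__.
std::string BuildFusedKernelSource(int block_size) {
  assert(block_size > 0);
  const std::string bs = std::to_string(block_size);
  std::string src;
  // Non-kernel helper: decorated by hand as a device function.
  src += "__device__ inline float fused_op(float x, float y) {\n"
         "  return x * y + 1.0f;\n"
         "}\n\n";
  // Kernel entry point: __global__ plus launch bounds, which CUDA 9 NVRTC
  // would reject if default-device also forced __device__ onto it.
  src += "extern \"C\" __global__ void __launch_bounds__(" + bs + ")\n"
         "fused_kernel(const float* a, const float* b, float* out, int n) {\n"
         "  int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
         "  if (i < n) out[i] = fused_op(a[i], b[i]);\n"
         "}\n";
  return src;
}

// Hypothetical NVRTC option list: note that no default-device option is passed.
std::vector<std::string> BuildCompileOptions(int compute_arch) {
  return {"--std=c++11",
          "--gpu-architecture=compute_" + std::to_string(compute_arch)};
}

// Returns true if any option enables NVRTC's default-device mode.
bool UsesDefaultDevice(const std::vector<std::string>& opts) {
  return std::any_of(opts.begin(), opts.end(), [](const std::string& o) {
    return o == "-default-device" ||
           o == "--device-as-default-execution-space";
  });
}
```

A caller would pass `BuildFusedKernelSource(256)` as the program source and the strings from `BuildCompileOptions(70)` (converted to `const char*`) as the options, so the same source compiles the same way under CUDA 9 and CUDA 10+.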
This is an automated message from the Apache Git Service.