masahi opened a new pull request #9087:
URL: https://github.com/apache/tvm/pull/9087
In the CUDA conv2d NHWC schedule, the number of blocks in the z dimension is
(roughly) `H * W`. According to `deviceQuery`, the maximum grid size in the z
dimension is 65535:
```
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
```
When H and W are large, the schedule is very likely to be invalid, because it
tries to launch too many blocks in the z dimension. Auto-tuning does not help,
since the number of blocks in z stays fixed during tuning. Without the change
in this PR, I cannot run this model at all unless I use the auto scheduler.
For example, here is an error that I hit:
```
Check failed: ret == 0 (-1 vs. 0) : TVMError: CUDALaunch Error: CUDA_ERROR_INVALID_VALUE
grid=(8,1,150000), block=(4,4,1)
```
My solution is simply to swap the roles of the block x and z dimensions, since
far more blocks can be launched along x, as the `deviceQuery` output shows.
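To illustrate why the swap helps, here is a minimal sketch that checks a grid against the per-axis limits quoted from `deviceQuery` above. The helper `grid_is_valid` and the swapped grid are hypothetical and only for illustration; the swapped tuple simply reverses the failing grid from the error message and is not the actual launch configuration produced by this PR.

```python
# Per-axis grid limits (x, y, z) as reported by deviceQuery above.
MAX_GRID_DIM = (2147483647, 65535, 65535)


def grid_is_valid(grid, limits=MAX_GRID_DIM):
    """Return True if every grid axis lies within [1, limit] for that axis."""
    return all(1 <= g <= lim for g, lim in zip(grid, limits))


# Grid from the error message: 150000 blocks along z exceeds the 65535 limit.
failing_grid = (8, 1, 150000)

# Hypothetical grid with the x and z extents exchanged (an assumption,
# not taken from the PR): the large extent now sits on the x axis.
swapped_grid = failing_grid[::-1]  # (150000, 1, 8)

print(grid_is_valid(failing_grid))  # False
print(grid_is_valid(swapped_grid))  # True
```

The total number of blocks is unchanged; only the axis that carries the large `H * W` extent differs.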
cc @vinx13 @junrushao1994 @Hzfengsy